Data Linking Systems Analysis Coursework Example | Topics and Well Written Essays

RESEARCH, COMPUTER SCIENCES AND INFORMATION TECHNOLOGY

Abstract
Question answering (QA) systems play a vital role in modern search engines. A QA system typically applies natural language processing to interpret the user’s question, and then follows a number of steps to transform the question into a query that returns a precise answer. This paper analyzes several question answering systems that are based on the Semantic Web and on ontologies, and that use different query formats.

Introduction
Semantic web search aims to improve search precision by taking into account the user’s intent and the meaning of the words in the search phrase. Two kinds of search are distinguished: navigational search and research search. In navigational search, the user treats the search engine as a navigation tool for reaching a particular target document; semantic search is not used for navigational searches. In research search, the user enters a phrase that denotes an object about which the user wants to gather information. Whereas Google ranks results with the PageRank algorithm, semantic web search uses semantics to produce highly relevant results, and it can retrieve information from data resources such as ontologies. An ontology¹ is a technology for representing domain knowledge; in semantic question answering systems it is used to improve query time.

Data linking systems analysis
Data linking systems apply some of the techniques identified above in order to interlink Web data described in RDF. The following analysis examines several systems that perform automated or semi-automated data linking.

1. AquaLog
AquaLog can learn the user’s vocabulary in order to improve its knowledge over time.
AquaLog’s learning mechanism uses ontology reasoning to learn more general patterns, which can then be reused for questions with a similar context.² In this system, the Linguistic Component (LC) converts natural-language (NL) questions into query-triples, which the Relation Similarity Service (RSS) then maps onto the ontology. The data model is triple-based (subject, relation, object). Performance is evaluated in terms of accuracy and recall, and failure types are reported individually. On average, about 64% of the questions are answered correctly from the ontology in a closed-domain setting.

2. ORAKEL
ORAKEL is used to compute answers to user queries. It treats questions as logical queries; knowledge is represented in F-Logic and processed with the Ontobroker system. The system translates questions into queries, which are then fed to a bottom-up generalization model to retrieve intensional answers for the user. An inference engine evaluates the queries against the knowledge base. Customization is performed interactively with the FrameMapper tool, in which linguistic argument structures, such as those of verbs or nouns, are mapped to relations in the ontology.³

3. SMART
SMART is a Semantic Web information management tool with automated reasoning and an integrated query form. Semantic queries are expressed as description logic (DL) queries, which are mapped to SPARQL.⁴ The system’s distinguishing features are semantic query validation using a graphical DL representation of the query, and the mapping of DL queries to SPARQL. Pre-computed inferences are retrieved from RDF triples. Ontologies are referenced by URIs and stored in a file-based system. Users can write syntactic and logical queries in a suitable form.

4. Querix
Querix is an ontology-based question answering system that relies on clarification dialogs whenever a question is ambiguous.
Querix consists of an ontology manager (OM), a user interface (UI), a query analyzer (QA) and an ontology access layer (OAL).⁵ NL queries are translated into SPARQL, with WordNet used to identify synonyms. The system also uses the Stanford parser, which provides a syntax tree for the NL query. Querix does not use logic-based semantic methods.

5. OKKAM
OKKAM offers an architecture based on distributed servers that maintain sets of equivalent resources.⁶ These servers are called Entity Name Servers (ENS), and each set of equivalent resources is assigned an identifier. An ENS stores entity descriptions in the form of key-value pairs. New entities are added using a matching algorithm that computes a similarity between the candidate resource and the ENS entity. The similarity measure applies string matching to the key/property pairs and is then weighted by the probability that the key denotes a name for the entity; for this purpose, the system maintains a small vocabulary of naming features. Finally, the similarities are aggregated by summing the maximal similarities between the properties of the two entities.

6. Silk
Silk is a framework for linking datasets and maintaining the links between them. It consists of a tool and a specification language, the Silk Link Specification Language (Silk-LSL).⁷ Before matching two datasets, the user specifies the entities to be linked in a Silk-LSL file. The tool supports a variety of similarity measures: string matching methods, numeric equality, date equality, taxonomic distance, and set similarity. The user parameterizes all of these measures in a defined format, and can specify preprocessing transformations to improve the quality of the matching. Matching algorithms can be combined using a set of operators (MAX, MIN, AVG).

Recently, very large, structured, and semantically rich knowledge bases have become available.
Examples are Yago⁸, DBpedia⁹ and Freebase. DBpedia forms the nucleus of the Web of Linked Data¹⁰, which interconnects hundreds of RDF data sources with a total of 30 billion subject-property-object (SPO) triples. The diversity of linked-data sources and their high heterogeneity make it difficult for humans to search for and discover relevant information. Because linked data is in RDF format, the standard approach would be to run structured queries in triple-pattern-based languages such as SPARQL, but only expert programmers are able to specify their information needs precisely and cope with the high heterogeneity of the data (and the absence or very high complexity of schema information). For less experienced users, the only option for querying this rich data is keyword search.¹¹ Neither approach is satisfactory. By far the most convenient approach would instead be to search knowledge bases and the Web of linked data by means of natural-language questions.

For example, consider a quiz question such as “Which female actor played in Casablanca and is married to a writer who was born in Rome?”. The answer could be found by querying several linked data sources together, such as the IMDB-style LinkedMDB movie database and the DBpedia knowledge base, exploiting the entity-level sameAs links between these collections. The example question can be formulated in different ways, for instance “Which actress from Casablanca is married to a writer from Rome?”. A possible SPARQL formulation, assuming a user familiar with the schema of the underlying knowledge base(s), could consist of the following six triple patterns, joined by shared-variable bindings:

    ?x hasGender female .
    ?x isa actor .
    ?x actedIn Casablanca (film) .
    ?x marriedTo ?w .
    ?w isa writer .
    ?w bornIn Rome .

This complex query, which involves multiple joins, would yield good results, but it is difficult for the user to come up with the precise choices of relations, classes, and entities.
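To make “joined by shared-variable bindings” concrete, here is a minimal Python sketch, under stated assumptions, that evaluates the six triple patterns over a tiny in-memory list of triples with a naive nested-loop join. It only illustrates the join semantics; it is not how a SPARQL engine or a DBpedia endpoint evaluates queries. The toy data and all entity names in it (ActorA, WriterB, ActorC, WriterD, Casablanca_(film)) are hypothetical placeholders, not facts from any knowledge base.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy knowledge base; every entity name here is a hypothetical placeholder.
KB: List[Triple] = [
    ("ActorA", "hasGender", "female"),
    ("ActorA", "isa", "actor"),
    ("ActorA", "actedIn", "Casablanca_(film)"),
    ("ActorA", "marriedTo", "WriterB"),
    ("WriterB", "isa", "writer"),
    ("WriterB", "bornIn", "Rome"),
    ("ActorC", "hasGender", "female"),
    ("ActorC", "isa", "actor"),
    ("ActorC", "actedIn", "Casablanca_(film)"),
    ("ActorC", "marriedTo", "WriterD"),
    ("WriterD", "isa", "writer"),
    ("WriterD", "bornIn", "Paris"),
]

# The six triple patterns of the example; terms starting with "?" are variables.
QUERY: List[Triple] = [
    ("?x", "hasGender", "female"),
    ("?x", "isa", "actor"),
    ("?x", "actedIn", "Casablanca_(film)"),
    ("?x", "marriedTo", "?w"),
    ("?w", "isa", "writer"),
    ("?w", "bornIn", "Rome"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.get(p, t) != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def evaluate(patterns: List[Triple], kb: List[Triple]) -> List[Binding]:
    """Join the patterns through their shared variables (naive nested-loop join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in kb:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    for b in evaluate(QUERY, KB):
        print(b["?x"], "is married to", b["?w"])
```

Running it prints `ActorA is married to WriterB`. ActorC also satisfies the first four patterns, but its spouse WriterD fails `?w bornIn Rome`, which shows that a constraint on ?w only takes effect once `?x marriedTo ?w` has bound it. Writing such patterns by hand, however, presupposes knowing the exact names hasGender, actedIn, marriedTo and bornIn.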
Choosing the right relations, classes, and entities requires familiarity with the contents of the knowledge base, which an average user cannot be expected to have. The goal is therefore to create structured SPARQL queries automatically by mapping the user’s question into a triple-pattern representation. Keyword search is usually not a viable alternative when the information need involves joining multiple triples to construct the final result, notwithstanding good attempts. In the Casablanca example, the obvious keyword query “female actress Casablanca married writer born Rome” lacks a clear specification of the relations among the different entities.

The approach described here introduces new elements that make the translation of questions into SPARQL triple patterns more expressive and robust. Most importantly, it attempts to solve the disambiguation and mapping tasks jointly by encoding them in a comprehensive integer linear program (ILP): the segmentation of questions into meaningful phrases, the mapping of phrases to semantic entities, classes, and relations, and the construction of SPARQL triple patterns. The ILP harnesses the richness of large knowledge bases such as Yago2, which contain information not only about entities and relations but also about the surface names and textual patterns by which web sources refer to them. For example, Yago2 knows that “Casablanca” can refer to the city or the film, and that “played in” is a pattern that can denote the actedIn relation. In addition, the approach can leverage the rich type system of semantic classes. For example, when translating “played in”, knowing that Casablanca is a film allows the system to focus on relations whose type signature has a range that includes films, as opposed to sports teams. Such information is encoded in carefully designed constraints of the ILP. Although the approach relies heavily on Yago2, it does not depend on a specific knowledge base or language resource for type information and phrase/name dictionaries; other knowledge bases such as DBpedia can easily be plugged in.
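DEANNA encodes segmentation, mapping and disambiguation as a single ILP. The Python sketch below does not use an ILP solver; as a deliberately simplified stand-in, it enumerates every combination of candidate meanings for two phrases and keeps the highest-scoring combination that satisfies a type-signature constraint. The candidate lists, prior scores and type tables are hypothetical values chosen to mirror the “played in” and “Casablanca” example, and `disambiguate` is a hypothetical helper.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical candidate meanings for each phrase, with made-up prior scores.
CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "played in": [("actedIn", 0.6), ("playedForTeam", 0.4)],
    "Casablanca": [("Casablanca_(city)", 0.7), ("Casablanca_(film)", 0.3)],
}

# Hypothetical type information: the range of each relation and the type of each entity.
RELATION_RANGE: Dict[str, str] = {"actedIn": "film", "playedForTeam": "sportsTeam"}
ENTITY_TYPE: Dict[str, str] = {"Casablanca_(city)": "city", "Casablanca_(film)": "film"}


def disambiguate(relation_phrase: str, entity_phrase: str) -> Optional[Tuple[str, str]]:
    """Pick the best (relation, entity) pair whose types are compatible.

    Brute-force enumeration stands in for the ILP: the type check acts as a
    hard constraint and the summed priors act as the objective function.
    """
    best: Optional[Tuple[str, str]] = None
    best_score = float("-inf")
    for (rel, rel_score), (ent, ent_score) in product(
        CANDIDATES[relation_phrase], CANDIDATES[entity_phrase]
    ):
        if RELATION_RANGE[rel] != ENTITY_TYPE[ent]:
            continue  # violates the relation's type signature
        score = rel_score + ent_score
        if score > best_score:
            best, best_score = (rel, ent), score
    return best


if __name__ == "__main__":
    print(disambiguate("played in", "Casablanca"))
```

Taken in isolation, “Casablanca” would map to the city because of its higher prior, but under the type constraint the film is the only entity compatible with any candidate relation, so the joint choice is (actedIn, Casablanca_(film)). Brute-force enumeration grows exponentially with the number of phrases, which is one reason to formulate the problem as an ILP and hand it to a solver.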
Based on these ideas, a framework and system called DEANNA (Deep Answers for many Naturally Asked questions) was developed. It comprises a full suite of components for question decomposition, mapping constituents into the semantic concept space, generating alternative candidate mappings, and computing a coherent mapping of all constituents into a set of SPARQL triple patterns that can be executed directly on one or more linked data sources.

A question sentence is a sequence of tokens. The input question is fed into the following six-step pipeline:
1. Phrase detection. Phrases that potentially correspond to semantic items are detected, such as ‘Who’, ‘played in’, ‘movie’ and ‘Casablanca’.
2. Phrase mapping to semantic items. This includes finding that the phrase ‘played in’ can refer either to the semantic relation actedIn or to playedForTeam, and that the phrase ‘Casablanca’ can refer to Casablanca (film) or to Casablanca, Morocco. This step merely constructs a candidate space for the mapping.
3. Q-unit generation. Intuitively, a q-unit is a triple composed of phrases.
4. Joint disambiguation, where the ambiguities in the phrase-to-semantic-item mapping are resolved. This entails resolving the ambiguity of phrase boundaries and, above all, choosing the best-fitting candidates from the semantic space of entities, classes, and relations.
5. Grouping of semantic items into semantic triples. For example, the system determines that the relation marriedTo connects the person referred to by ‘Who’ and the writer, forming the semantic triple person marriedTo writer. This is done via q-units.
6. Query generation. For SPARQL queries, semantic triples such as person marriedTo writer have to be mapped to suitable triple patterns with appropriate join conditions expressed through common variables: ?x type person, ?x marriedTo ?w, and ?w type writer for the example.

1. Phrase Detection
A detected phrase p is a pair consisting of the phrase itself and a label from {concept, relation}, which indicates whether it is a concept phrase or a relation phrase. One special type of detected relation phrase is the null phrase, where no relation is explicitly mentioned but one can be induced. The most prominent example is the case of adjectives, such as ‘Australian movie’, where a relation is expressed between ‘Australia’ and ‘movie’. Multiple detectors are used to detect phrases of different types. For concept detection, a detector works against a phrase-concept dictionary of the following form:

    {‘Rome’, ‘eternal city’} → Rome
    {‘Casablanca’} → Casablanca (film)

2. Phrase Mapping
After phrases are detected, each phrase is mapped to a set of semantic items. The mapping of concept phrases also relies on the phrase-concept dictionary. To map relation phrases, DEANNA relies on a corpus of textual-pattern-to-relation mappings of the form:

    {‘play’, ‘star in’, ‘act’, ‘leading role’} → actedIn
    {‘married’, ‘spouse’, ‘wife’} → marriedTo

Distinct phrase occurrences map to different semantic item instances. This distinction matters later, for the construction of the disambiguation graph and for variable assignment in the structured query.

3. Dependency Parsing & Q-Unit Generation
Dependency parsing identifies triples of tokens, called triploids, which are collected by looking for specific patterns in dependency graphs. The most prominent pattern is a verb and its arguments; other patterns include adjectives and their arguments, prepositionally modified tokens, and objects of prepositions. Combining triploids with detected phrases yields q-units. A q-unit is a triple of sets of phrases. Conceptually, one can view a q-unit as a placeholder node (a q-node) with three sets of edges, each connecting the q-node to a relation or concept phrase in the same q-unit.
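The dictionary-based phrase detection and phrase mapping steps described above can be sketched in Python as a greedy longest-match scan over the question’s tokens. This is a simplified illustration, not DEANNA’s implementation: the dictionaries contain only the example entries given in the text (lower-cased for matching, with entity identifiers written with underscores), matching is exact with no stemming (so “played” does not match ‘play’), and null phrases and the type-specific detectors are omitted. The function name `detect_phrases` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Phrase-concept dictionary (entries from the text, lower-cased for matching).
CONCEPT_DICT: Dict[str, List[str]] = {
    "rome": ["Rome"],
    "eternal city": ["Rome"],
    "casablanca": ["Casablanca_(film)"],
}

# Textual-pattern-to-relation dictionary (entries from the text).
RELATION_DICT: Dict[str, List[str]] = {
    "play": ["actedIn"],
    "star in": ["actedIn"],
    "act": ["actedIn"],
    "leading role": ["actedIn"],
    "married": ["marriedTo"],
    "spouse": ["marriedTo"],
    "wife": ["marriedTo"],
}

Detection = Tuple[str, str, List[str]]  # (phrase, label, candidate semantic items)


def detect_phrases(question: str, max_len: int = 3) -> List[Detection]:
    """Greedy longest-match detection and mapping of concept and relation phrases."""
    tokens = re.findall(r"[a-z]+", question.lower())
    detections: List[Detection] = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in CONCEPT_DICT:
                detections.append((phrase, "concept", CONCEPT_DICT[phrase]))
                break
            if phrase in RELATION_DICT:
                detections.append((phrase, "relation", RELATION_DICT[phrase]))
                break
        else:
            i += 1  # no phrase starts here; move to the next token
            continue
        i += n  # skip past the detected phrase
    return detections


if __name__ == "__main__":
    q = ("Which actress had a leading role in Casablanca "
         "and is married to a writer from the eternal city?")
    for detection in detect_phrases(q):
        print(detection)
```

Each result keeps the label from {concept, relation} next to the phrase, mirroring the pair described in the text. A fuller version would keep several candidates per phrase (for example both Casablanca (film) and Casablanca, Morocco) and pass them, together with the q-units, to the disambiguation step, where they become the nodes and edges of a graph.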
4. Disambiguation of Phrase Mappings
The core contribution of DEANNA is a framework for disambiguating phrases into semantic items, covering relations, classes, and entities in a unified manner. This can be seen as a joint task that combines named entity disambiguation for entities, word sense disambiguation for classes (common nouns), and relation extraction. The node-and-edge view of q-units becomes concrete in the disambiguation graph that this framework constructs.

5. Query Generation
Once phrases are mapped to unique semantic items, queries are generated in two steps. First, semantic items are grouped into triples, using the triploids generated earlier. Here the rich type system of the knowledge base makes it possible to tell whether two semantic items are compatible: each relation has a type signature, and the candidate items are checked for compatibility with that signature. Second, the semantic triples are mapped to SPARQL triple patterns whose join conditions are expressed through shared variables.

Semantic web search is becoming attractive for commercial search engines. For instance, Google’s Knowledge Graph can be seen as a vast knowledge base that Google uses to enhance search results, moving from a search engine towards a knowledge engine. WolframAlpha is an inference engine that computes answers to factual queries from a structured knowledge base about the world, rather than returning a list of documents.

For evaluation, the goal of QALD (Question Answering over Linked Data) is to analyze and compare question answering systems that mediate between semantic data and users who express their information needs in natural language. Given a natural-language question and an RDF data source, each participating system must return a list of entries that answer the question: URIs or labels, numbers, dates, or literals such as strings.

1. Data sets
The selected datasets are required to contain real, large-scale data that is challenging enough to evaluate the abilities and shortcomings of the systems.
Two different datasets with complementary properties and requirements were selected: DBpedia and MusicBrainz. DBpedia¹² is becoming the central interlinking hub of the emerging Linked Data cloud. The main DBpedia dataset for English describes more than 4 million entities extracted from Wikipedia.¹³ MusicBrainz is a collaborative open-content music database.¹⁴ An RDF export of the MusicBrainz dataset contains all of MusicBrainz’s artists and albums as well as a subset of its tracks, for a total of roughly 15 million RDF triples. This data is modeled with respect to a small ontology with just a few classes and relations: the MusicBrainz ontology in the case of QALD-1, and the more standard Music Ontology¹⁴ in the case of QALD-2.¹⁵

The following examples show questions together with SPARQL queries that answer them. The queries use the namespace prefixes of the respective datasets without PREFIX declarations; for DBpedia, res:, dbo: and dbp: stand for http://dbpedia.org/resource/, http://dbpedia.org/ontology/ and http://dbpedia.org/property/.

Example 1: DBpedia

1. Who is the daughter of Bill Clinton married to?

    SELECT DISTINCT ?uri ?string WHERE {
      res:Bill_Clinton dbo:child ?child .
      ?child dbp:spouse ?string .
      ?uri rdfs:label ?string .
    }

2. Who produced the most films?

    SELECT DISTINCT ?uri ?string WHERE {
      ?film rdf:type dbo:Film .
      ?film dbo:producer ?uri .
      OPTIONAL { ?uri rdfs:label ?string . FILTER (lang(?string) = 'en') }
    }
    GROUP BY ?uri ?string
    ORDER BY DESC(COUNT(?film))
    LIMIT 1

Example 2: MusicBrainz

1. Give me all live albums by Michael Jackson.

    SELECT DISTINCT ?album ?title WHERE {
      ?album rdf:type mm:Album .
      ?album mm:releaseType mm:TypeLive .
      ?album dc:title ?title .
      ?album dc:creator ?artist .
      ?artist dc:title 'Michael Jackson' .
    }

Open questions
1. What is Linked Data?
2. What is RDF?
3. What is the relationship between Linked Data and the Semantic Web?
4. What is the relationship between Linked Data and RDF?
5. Does linked data require RDF?
6. Is publishing RDF sufficient to create linked data?
7. How does one publish or deploy linked data?
8. Is linked data just another term or branding for the Semantic Web?
9. Does linked data only apply to instance data?
10. What role do ontologies play with linked data?
11. Is linked data a centralized or federated approach?
12. How does one maintain context when federating linked data?
13. Does data need to be open to qualify as linked data?
14. Can legacy data be expressed as linked data?
15. Can enterprise and open or public data be intermixed as Linked Data?
16. How does one query or access linked data?
17. How is access control or security maintained around Linked Data?
18. What are the enterprise benefits of linked data? (Why adopt it?)
19. What are early applications or uses of linked data?

Conclusion
In the Semantic Web domain, data linking is a comparatively new trend that builds on research in ontology matching and, in particular, in databases. Many problems currently studied in these areas therefore also apply to data linking, for example unreliable data sources, erroneous data values, and the need to take links into account when processing queries. As for the tools, there is still room to improve and extend their functionality through stronger reasoning algorithms.

Bibliography
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. G. (2007). DBpedia: A nucleus for a Web of open data. In ISWC/ASWC.
Battista, A. D. L., Villanueva-Rosales, N., Palenychka, M., & Dumontier, M. (2007). SMART: A web-based, ontology-driven, semantic web query answering application.
Bernstein, A., Kaufmann, E., Kaiser, C., & Kiefer, C. (2006). Ginseng: A guided input natural language search engine. In Proc. of the 15th Workshop on Information Technologies and Systems (WITS 2005) (pp. 45–50).
Cimiano, P., Haase, P., Heizmann, J., Mantel, M., & Studer, R. (2007). Towards portable natural language interfaces to knowledge bases: The case of the ORAKEL system. Data Knowl. Eng., 65, 325–354.
Fernandez, O., Izquierdo, R., Ferrandez, S., & Vicedo, J. L. (2009). Addressing ontology-based question answering with collections of user queries. Inform.
Halpin, H., Herzig, D., Mika, P., Blanco, R., Pound, Thompson, H. S., & Duc, T.-T. Evaluating ad-hoc object retrieval. In Proc.
Heath, T., & Bizer, C. (2011). Linked Data: Evolving the Web into a global data space (1st ed.). San Rafael, CA: Morgan & Claypool.
Ioannou, E., Nejdl, W., Niederée, C., & Velegrakis, Y. (2010). On-the-fly entity-aware query processing in the presence of linkage. In 36th International Conference on Very Large Data Bases (VLDB 2010) (pp. 429–438).
Kaufmann, E., Bernstein, A., & Zumstein, R. (2006). Querix: A natural language interface to query ontologies based on clarification dialogs. In Proceedings of the 5th International Semantic Web Conference (ISWC 2006) (pp. 980–981).
Lopez, V., Nikolov, A., Fernandez, M., Sabou, M., Uren, V., & Motta, E. (2009). Merging and ranking answers in the Semantic Web: The wisdom of crowds. In Proceedings of the 4th Asian Semantic Web Conference (ASWC 2009) (pp. 135–152). Shanghai, China.
Suchanek, F. M., Kasneci, G., & Weikum, G. (2007). Yago: A core of semantic knowledge.
Tummarello, G., Cyganiak, R., Catasta, M., Danielczyk, S., Delbru, R., & Decker, S. (2010). Sig.ma: Live views on the Web of data. J. Web Sem., 8(4).
Uren, V., Sabou, M., Motta, E., Fernandez, M., Lopez, V., & Lei, Y. (2010). Reflections on five years of evaluating semantic search systems. International Journal of Metadata, Semantics and Ontologies (IJMSO), 5(2), 87–98.
Volz, J., Bizer, C., & Gaedke, M. (2009). Web of data link maintenance protocol (Technical Report). Freie Universität Berlin.
