Information Retrieval - an Overview Assignment Example | Topics and Well Written Essays

Component One

Information retrieval systems are an essential feature of online content management, and any company that deals in information, such as this one, must have a reliable and user-friendly process for managing knowledge, content and data. As a company we produce content and knowledge, and how well we streamline the availability and quality of access for our researchers will determine our effectiveness as a business. Content is best managed with an efficient Information Retrieval System (IRS) and proper, effective indexing of all our electronic data. In the term "Information Retrieval System" the word "information" may be misleading, because it refers to a document or image stored in the content database rather than to the data itself.

Knowing that you have information is one thing; knowing where it is so that it can be retrieved is another. The Records and Archives Office of the establishment must be able to locate and retrieve the information you require, whether it is held in active, inactive or archived records, and whether those records are administrative files, committee documents or older archives. Because the establishment is an online research establishment, electronic record keeping will be key to classification. Staff and students should be encouraged to generate and disseminate information in electronic formats to enhance learning and research and to improve productivity.

The principles of managing electronic records are no different from those for paper records. Records must be created, captured and maintained in a manner that ensures their ongoing integrity and retrievability for as long as they are required to meet the research and accountability requirements of the Establishment. Electronic records must remain available, accessible, retrievable and usable for as long as a business need exists or as long as legislative, policy and archival requirements exist.
An 'Electronic Record' means any document or record created, communicated and maintained by means of electronic equipment and includes, but is not limited to, electronic organisers, computer-based diaries, appointment books and calendars, electronic mail, facsimile transmissions and databases. The office information retrieval services of a research establishment are mainly used for day-to-day information requests from laboratories, including requests for information on various topics, updated information and research. The merit of building an information retrieval system lies in anticipating problems and in end-user satisfaction. In a research establishment the end user will generally be a research scholar, and the problems to be anticipated generally concern keeping up with the latest developments, because up-to-date information sets the pace of research.

The prime purpose of a classification scheme is to provide control and consistency over the vocabulary used for titling files and indexing records. It does so by providing a list of approved terms for file titling and indexing. Knowing which terms to search on makes searches more efficient and makes it easier to retrieve the right records. Classification can be defined as: systematic identification and arrangement of activities and/or records into categories according to logically structured conventions, methods and procedural rules represented in a classification system. Classification is therefore a valuable records management tool that can be used for a variety of activities. By applying classification schemes based on research, records can be indexed and titled, and we can determine how they should be kept, how they should be stored, who should have access to them and how long they need to be retained.
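The idea of a list of approved terms can be illustrated with a minimal sketch. It assumes a hypothetical `ClassificationScheme` class holding the approved vocabulary; the class, its methods and the sample terms are illustrative only and are not part of any scheme described here.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationScheme:
    """A hypothetical controlled vocabulary of approved file-titling terms."""

    approved_terms: set[str] = field(default_factory=set)

    @staticmethod
    def normalise(term: str) -> str:
        # Compare terms case-insensitively and ignore surrounding whitespace.
        return term.strip().lower()

    def add_term(self, term: str) -> None:
        """Register a term as approved for titling and indexing."""
        self.approved_terms.add(self.normalise(term))

    def unapproved(self, title_terms: list[str]) -> list[str]:
        """Return the terms of a proposed file title that are not approved."""
        return [t for t in title_terms
                if self.normalise(t) not in self.approved_terms]


# Illustrative terms only (an assumption, not an actual scheme).
scheme = ClassificationScheme()
for term in ("Training", "Manuals", "Research"):
    scheme.add_term(term)

print(scheme.unapproved(["Training", "manuals", "Holidays"]))
```

A title is acceptable when `unapproved` returns an empty list; anything it returns would need to be replaced by an approved term or added to the scheme in consultation with the records unit.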
A one-size-fits-all approach is not being forced upon every area of the establishment. This Classification Scheme is the base scheme for the establishment, and it is accepted that specific areas may need to add terms or substitute other terms in their place. This should be done in consultation with the central Records and Archives management unit. The file Classification Scheme is a starting point for the creation of files across the Establishment; it is not meant to be a complete tool that meets every file-titling need, and it does not try to provide for every file title across the establishment. The Classification Scheme is not an index of file titles but simply a tool to help staff title files consistently and build their area's file index. The following classification scheme can be made available and followed for research establishments.

Alphabetical File Classification Scheme Research Classification scheme Format Online Version of the File Classification Scheme.

Indexing of the database is what enables good content management. Creating content often amounts to gathering data and storing it in electronic form. The database then needs to be organised so that all the stored information can be easily located and retrieved. This is achieved by indexing the content, and the efficiency and performance of the IRS depend largely on good indexing. The index for an IRS can be created manually or by enabling the IRS software to index or classify documents automatically. Manual indexing can be done using specific criteria, as long as they can be easily understood by all users, or using a classification and query process defined by a competent person or group. While manual indexing is a time-honoured method of cataloguing, it has its limitations. For instance, its efficiency is restricted by the vocabulary of the user, who may not know or be familiar with the terms that can be used to query the database.
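To make automatic indexing concrete, here is a minimal sketch of a single-word inverted index with very crude stemming, two of the techniques this essay associates with automatic indexing. The function names, the suffix list and the sample documents are assumptions made for illustration; a production IRS would use a proper stemmer and far richer analysis.

```python
import re
from collections import defaultdict


def naive_stem(word: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lower-case the text and reduce each alphabetic word to its stem."""
    return [naive_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each stemmed word to the ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for stem in tokenize(text):
            index[stem].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents that contain every stemmed query word."""
    stems = tokenize(query)
    if not stems:
        return set()
    result = set(index.get(stems[0], set()))
    for stem in stems[1:]:
        result &= index.get(stem, set())
    return result


# Hypothetical sample documents.
docs = {
    "d1": "Indexing research records",
    "d2": "Archived records of training",
}
index = build_index(docs)
print(search(index, "indexed research"))
```

Because both the documents and the query are stemmed, a user who searches for "indexed" still finds a document titled with "Indexing", which eases the vocabulary problem described for manual indexing.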
This vocabulary limitation may not be disqualifying in a small work group where all users can easily be taught the indexing terms. However, it does limit the efficiency of the system and requires every new user in a group to be trained before using the IRS. The advantage of manual indexing is that it is easier to group documents that need to be kept together even when there is no readily identifiable reason for classifying them as a group.

Automatic indexing can be considered an extension of manual indexing, because automation reduces the manpower needed but does not eliminate it. Only a minimal amount of manpower is then required to keep an automatic index up to date. Automatic indexing distinguishes between classification and indexing: classification answers the question of which group a document belongs in, while indexing answers the question of what name or tag the document or group of documents should have. The separation between classification and indexing becomes relevant when seeking related documents. In automatic indexing, single-word indexing is efficient and may be used rather than phrase indexing. The concepts used to create the index are stemming, statistical analysis, syntactical analysis, probabilistic analysis, and thesaurus use and construction. The main advantage of automatic indexing seems to be that it is quickly done. However, the artificial intelligence in the software has not yet achieved sufficiently advanced capability to differentiate between similar-looking words as efficiently as a human.

INDEXING SYSTEMS

Alphabetic Indexing Systems
Numeric Indexing Systems
Subject-Numeric and Alpha-Numeric Indexing Systems
Chronological Indexing Systems

ALPHABETIC INDEXING SYSTEMS

1. METHOD OF INDEXING: BY NAME
Documents are filed by name of firms or individuals in alphabetical order.
METHOD OF RETRIEVAL: Documents are requested by name.
APPLICATION: Case records. Topic records. Employee records.
FACTORS TO CONSIDER: Easy to use. Direct indexing and retrieval. Keeps records together that pertain to the same individual or project. No manual or index is required for effective classification or retrieval of records, although one might be useful if records are voluminous.
EXAMPLE OF INDEXING BY NAME: Topic Records: TOPIC 1, TOPIC 2

2. METHOD OF INDEXING: BY SUBJECT
Documents are grouped and indexed alphabetically by related topics or subjects in either encyclopedia or dictionary order.
METHOD OF RETRIEVAL: Documents are requested by subject.
APPLICATION AND FACTORS TO CONSIDER: Direct indexing and retrieval. Information retrieval is fast because all related information is together. Easy to expand by adding subdivisions or secondary divisions. Time consuming to read each topic before selecting a heading. Filing may be complicated if the subject matter is too technical or vague. Cross-referencing is needed if more than one subject is involved. A subject file manual and an index must be kept up to date. The manual clearly states the system in use and what decisions have been made with regard to indexing. The index is an alphabetically arranged list of the subjects and the headings under which they are indexed; it serves as a control listing for classifying and retrieving records. Neither subjects nor headings should be added to or removed from the indexing system without due consideration.
EXAMPLES OF INDEXING BY SUBJECT: Subjects and topics. Categories. Journals, monographs, slides, videos.

NUMERIC INDEXING SYSTEMS

1. METHOD OF INDEXING FOR NON-CONFIDENTIAL RECORDS: Documents are indexed by number in ascending order. The number corresponds to the number on the file itself.
METHOD OF RETRIEVAL: Documents are requested by number, for example 1-100.
APPLICATION: Queries on a particular topic. Requests for information about a classification. Retrieving information on a particular category.
FACTORS TO CONSIDER: Fast and direct indexing and retrieving.
EXAMPLE OF INDEXING SYSTEM: Classification Units 400-1, 400-2, 410, 451-2

2. METHOD OF INDEXING FOR CONFIDENTIAL RECORDS: A number is assigned to each name or subject, and documents are indexed by number in ascending order. Numbering within the classification system can be SERIAL (consecutive numbering) or DUPLEX (numbers are separated into parts).
METHOD OF RETRIEVAL: Documents are usually requested by number but may be requested by name or subject. Thus, a cross-reference index of names and numbers, or of subjects and numbers, must be maintained. There is an increased chance of errors in both filing and retrieving with numbers. When there are insufficient records to warrant individual folders, a separate alphabetic index must be maintained until it is time to create a separate folder.
APPLICATION: Subject records. Topic records. Information available on the above.
FACTORS TO CONSIDER: Definite need for greater security. Must maintain a permanent record of all assigned numbers and those still available. Must maintain a cross-reference index by name and number or by subject and number. Must locate missing records when numbers or indexes are inaccurate. Increased chance of errors when using numbers.
EXAMPLE OF SERIAL NUMBERING: Records by subject number: 20390, 20391, -----, 20393, 20394

SUBJECT-NUMERIC AND ALPHANUMERIC FILING SYSTEMS

1. METHOD OF INDEXING: SUBJECT-NUMERIC
Documents are arranged by related subjects in encyclopedic fashion and assigned numbers to maintain their sequence.
METHOD OF RETRIEVAL: An index listing all main and division headings must be consulted.
FACTORS TO CONSIDER: Subject-numeric is more commonly used than alphanumeric. Time consuming to classify, file and retrieve. The index should provide some information about the contents of the file.
EXAMPLE OF A SUBJECT-NUMERIC SYSTEM:
TRA 1515 Training
TRA 1515-1 Apprenticeship
TRA 1515-2 Educational Data Network
TRA 1515-3 Manuals

2.
METHOD OF INDEXING: ALPHANUMERIC
Documents are arranged by related subjects and assigned letters to maintain their sequence. Sub-categories of the subject are assigned identifying numbers.
METHOD OF RETRIEVAL: An index listing all main and division headings must be consulted.
APPLICATION: Limited.
FACTORS TO CONSIDER: Same as above.
EXAMPLE OF AN ALPHANUMERIC SYSTEM: Same as above.

CHRONOLOGICAL INDEXING SYSTEM

METHOD OF INDEXING: Documents are filed in sequence by date.
METHOD OF RETRIEVAL: Documents are requested by date.
APPLICATION: Not often used for a main classification system because people have difficulty remembering dates. However, it is appropriate for transaction files, such as records of information registered in and received from the database, and within individual files where the most recent record is filed at the front of a folder.
FACTORS TO CONSIDER: No need for cross-referencing. Filing is easy and accurate because coding is not required. Fast and accurate retrieval. There is no index; therefore, if a record is requested by name, a physical search of the file is necessary. Consider another filing system if records are frequently requested by a method other than by date.

Component 2

A subject gateway is a web site with links to various references and resources grouped around a particular subject. These resources can be browsed and searched. The subject gateway is a useful tool for a research establishment such as this one. It will enable our researchers to read and search points of interest on a particular subject, and it is an efficient method of information handling. A subject gateway must be set up with certain criteria in mind: the operational framework, standards and guidelines, quality of service delivery, and scope. Organising and labelling the information and easy navigation within the web site are important features. The design of the website should aid the assimilation of information and give intuitive access to resources.
The subject gateway can be developed using a combination of human resources and automated processes. Resources for the subject gateway are selected manually. All the collected resources are maintained regularly, including deletion of old or superseded resources; an automated process can do this efficiently. A short description of each resource, and the classification and indexing of the resources, will need to be done manually to get good results. To achieve the best results for our research establishment, a single subject portal can be used as a link for related subjects in a particular area. This would entail setting up different subject portals for different fields of research, thus increasing the scope of linked information while keeping different areas of research separate. An operational framework can be drawn up for the successful establishment and maintenance of a subject gateway.

An operational framework

The framework can be defined as "a comprehensive guide to what is available and where. This will be an essential tool for identifying resources that already exist and who to approach as potential partners." [1]. The development of the framework may be measured by examining:
1. Gateway coverage,
2. The quantity of resources and their update strategies,
3. The percentage of Parent establishment and non-Parent establishment content,
4. The ability to cross-search gateways simultaneously,
5. Current partners,
6. Plans for expansion within Establishment, and
7. Planned external partnerships.

1. Gateway coverage

The gateway can comprise a database of quality sources of technical (research-related) and Information Technology information on the Web. Due to the practical and applied nature of their subject, the information needs and information-seeking habits of researchers and information technology professionals often differ from those of other disciplines.
The gateway could be designed to help researchers save time and find relevant information on the Web quickly. A website pointing to thousands of resources identified and contributed by Establishment researchers can be maintained. It can also network the bodies responsible for education within the Establishment. Discussions and notice boards can be offered on the site, making it a meta-network of the Establishment's education partners. It serves and creates communities of educators online, which are further networks. The site can be organised around the Establishment curriculum, and its tools can be offered free to the Establishment's educators. Through this, the Establishment's education systems and sectors collaborate on a range of online education, communications and information technology issues, forming a powerful network. A single Web-based focal point for access to information and resources of all kinds related to a particular subject can also be maintained. The gateway provides access to Internet information such as electronic publications and databases, research projects, data sources, software, online teaching modules, directories and conferences for the particular subject on which the focal point is based.

2. The quantity of resources and their update strategies

One update strategy is three-pronged: (a) use of the update information associated with the Web pages on the fly (a feature that identifies URL materials that have been modified and then sends an alert to the webmaster that attention is required); (b) notification by users; (c) use of the administrative review element and report functionality. Gathering software is likely to be added at a later stage to pull in sites identified by accredited contributors. An alternative three-pronged update strategy is: (a) the use of the ADMIN Core to signal expired resources; (b) the use of a link package to identify broken links; (c) feedback from users. The evaluated sites form the basis from which further linked items can be indexed.
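A three-pronged update check of this kind (expired resources, broken links, user feedback) could be sketched as follows. The `Resource` fields, the `needs_attention` function and the injected `link_is_alive` checker are all hypothetical names chosen for illustration; the link check is passed in as a function so that the sketch does not assume any particular link-checking package.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Resource:
    url: str
    review_by: Optional[date] = None  # hypothetical expiry date from the metadata
    user_reports: int = 0             # problem reports received from users


def needs_attention(
    resource: Resource,
    today: date,
    link_is_alive: Callable[[str], bool],
) -> list[str]:
    """Return the reasons, if any, why a resource should be reviewed."""
    reasons = []
    # (a) the metadata signals that the resource has expired
    if resource.review_by is not None and resource.review_by < today:
        reasons.append("expired")
    # (b) the link checker reports a broken link
    if not link_is_alive(resource.url):
        reasons.append("broken link")
    # (c) users have sent feedback about the resource
    if resource.user_reports > 0:
        reasons.append("user feedback")
    return reasons


# Usage with a stand-in link checker; a real one would issue HTTP requests.
dead_links = {"http://example.org/old"}
checker = lambda url: url not in dead_links

old = Resource("http://example.org/old", review_by=date(2020, 1, 1))
print(needs_attention(old, date(2021, 1, 1), checker))
```

Resources for which the function returns a non-empty list would be queued for a human reviewer, which keeps the process semi-automated rather than fully automatic.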
The update strategy is an indexing process for the sites identified above, and it is a cyclic, semi-automated process. Complementing this is a customised robot, which can be developed to harvest metadata-enriched content from accredited contributors' websites. A gatherer can be used to pull in sites that already contain recognisable data. The addition of new resources is likely to slow down after the initial gateway establishment. Its update strategy consists of (a) software for broken link checking, which is being assessed; (b) random checks by contributors as a result of the selection process.

3. The percentage of Parent Establishment content and non-Parent Establishment content

Relationships with other gateways will be implemented as links in the first instance. The Establishment will be largely home-based initially, but should also investigate the addition of resources for other establishments. The gateway should have global content, tailored to the target group. An optimum mix of Establishment content and non-Establishment content must be identified. Local contributions (within the establishment) can be expected to increase as liaison develops with the Departments of the particular subjects with which the Establishment deals.

4. The ability to cross-search gateways simultaneously

The gateway should examine the possibility of using advanced techniques in the future. In the interim, sites will be linked to until more items become available. The Establishment should consider the feasibility of mirroring the research held within it. In addition, there are three possible cross-searching options: (i) No relevance-ranking algorithm is applied to rank one gateway against another. Rather, the results are interleaved by removing duplicates and then listing the results alphabetically. (This is a simple option to apply when the ranking algorithms of individual sites are not accessible or are unknown.) The criteria for removing duplicates can be: (a).
If the resources are Web-based, a match on URLs is recommended; if they are not (and are therefore mostly print-based), a match on local network is recommended. (ii) It is possible to activate three different searches simultaneously, using the query language expected by each database, and then combine the results into one set. An extension to this is ranking: (iii) in the software, results should be ranked according to different criteria (because different software and records are being searched, it is unfortunately impossible to rank them in exactly the same way); however, the end product is usually to rank the most useful records first. The catalogue should not count the instances of search words and weigh them according to where they appear. The gateway should be enabled by a combination of centralised searching through the transfer of metadata from distributed sites, or the addition of metadata to a central repository hosted by gateway Online. Searching can be considered an enhancement, particularly by the research sector. [7].

For international sites, mirroring may be preferred in order to overcome network response time difficulties. This will be more of an issue for the gateway. The decision should be based on knowledge of the users. The need for an ability to conduct searches across gateways is based on an assumption that none of the gateways will host content. Rather, they will host it as part of a separate service (so the content will be linked to in the same way as if it were remote) where the content is instantly accessible. Whether to support other strategies, such as mirroring of gateway resources or centralised searching options, has not yet been decided, but issues such as best response times, technical content maintenance, refresh/update rates and synchronisation of data transfer will need to be considered by the gateway owners. The needs of target audiences should be reflected in the decision to incorporate other gateways' resources.
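The first cross-searching option (no relevance ranking; duplicates removed, then results listed alphabetically) could be sketched roughly as below. The `Hit` record, `normalise_url` and `merge_results` are hypothetical names, duplicates are detected by a match on URLs as the text suggests for Web-based resources, and the example.org addresses are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    title: str
    url: str
    gateway: str  # which gateway returned this result


def normalise_url(url: str) -> str:
    # Simplification: ignore case and a trailing slash when comparing URLs.
    # (Real URL paths can be case-sensitive, so this may over-merge.)
    return url.strip().lower().rstrip("/")


def merge_results(*result_sets: list[Hit]) -> list[Hit]:
    """Combine result lists, drop URL duplicates, and sort by title."""
    seen: set[str] = set()
    merged: list[Hit] = []
    for hits in result_sets:
        for hit in hits:
            key = normalise_url(hit.url)
            if key not in seen:  # keep only the first hit for each URL
                seen.add(key)
                merged.append(hit)
    return sorted(merged, key=lambda h: h.title.lower())


gateway_a = [
    Hit("Zoology index", "http://example.org/z", "A"),
    Hit("Algae", "http://example.org/a/", "A"),
]
gateway_b = [
    Hit("Algae database", "http://EXAMPLE.org/a", "B"),
    Hit("Botany", "http://example.org/b", "B"),
]
for hit in merge_results(gateway_a, gateway_b):
    print(hit.title, hit.gateway)
```

Because no gateway's ranking is trusted over another's, the only ordering applied is alphabetical by title, which is what makes this option usable when the individual ranking algorithms are unknown.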
As a secondary process, the gateways would like to explore the viability of delivering content by other technical means such as intelligent agents. The gateways should adopt a standard as a baseline, which could be used as a data exchange format to support interoperability. Gateway and focal point Online are underpinned by relational database technology.

5. Current partners

The Establishment's partners can be university libraries and other such organizations. The Establishment's partners are the universities of various provinces.

6. Plans for expansion within the Establishment

The gateway should arrange discussions on potential collaboration with State-based Departments of research and other educational centres. The gateway should provide a 'meta-network' for all education networks and gateways. The subject gateway initiatives within the establishment have a natural affinity, which provides an opportunity for added value through aggregation. The gateway can be considered a link to the Establishment's research project.

7. Planned partnerships

The Establishment's partnerships can be explored continuously. E-mail exchanges can be held with various research organizations. There should be further dialogue with the Centres for excellence in research on their website. Potential connections with various establishments can also be pursued. The gateway can begin collaborating with the research libraries and can consider possible cross-searching with various websites. Within the Establishment, each gateway can be strengthened by its collaborative nature. In addition, strong partnerships can be considered essential for development and continued synergy. The Establishment's projects can be considered national by virtue of the breadth of their participants, but they must be committed to both sharing and learning from the expertise and experience of the longer-established gateways in other countries.
Standards and guidelines

The guidelines (including Evaluation) can be considered as: "the wealth of experience in creating and managing Internet gateways, and the costs of these activities, which exist in the community." The standards and guidelines are reflected in:
. the Gateway schemas used for resource description,
. solutions for technical issues arising in the establishment of the gateways, and
. the evaluation criteria for the gateways.

Component 3

In the entity relationship model for a database of staff CVs, the entity set would be the employee. The association between an employee and his or her qualification would be one relationship set, while the association with a field of expertise would be another. An entity-relationship (ER) model is a graphical representation of entities and their relationships to each other, typically used in computing to organise data within databases and information systems. An entity is a piece of data, an object or concept about which data is stored. A relationship is how the data is shared between entities. The relationships between entities can be termed as follows:

One-to-one: one instance of an entity (A) is associated with one other instance of another entity (B). For example, in a database of employees, each employee name (A) is associated with only one qualification or experience record (B).

One-to-many: one instance of an entity (A) is associated with zero, one or many instances of another entity (B), but for one instance of entity B there is only one instance of entity A. For example, for a company with all employees working in one building, the building name (A) is associated with many different employees (B), but those employees all share the same singular association with entity A.

Many-to-many: one instance of an entity (A) is associated with zero, one or many instances of another entity (B), and one instance of entity B is associated with zero, one or many instances of entity A.
For example, for a company in which all employees work on multiple projects, each instance of an employee (A) is associated with many instances of a project (B), and at the same time each instance of a project (B) has multiple employees (A) associated with it.

ER modeling was first created to provide a diagramming notation that could map directly to a relational database schema. In the ensuing years, it has taken on many disparate flavors, each emphasizing an individual developer's priorities for modeling data. Its building blocks are entities, attributes and relationships. The definitions have become increasingly sophisticated as we refine our knowledge of data modeling and pursue the "best" method (a pursuit that, like the pursuit of the "best" operating system, depends on a host of factors, some objective and some subjective). "An entity is a 'thing', which can be distinctly identified. There are many 'things' in the real world. It is the responsibility of the database designer to select the entity types which are most suitable for his/her company." Relationships receive similarly helpful definitions: "Relationships may exist between entities." Representing cardinality, that is, the trivial debate over which symbols should convey one-to-one, one-to-many and many-to-many relationships between entities, has probably spawned more controversy than any fundamental characteristic of the notation. The most popular version is found in Information Engineering, in which crow's feet and crossbars are used to indicate cardinality. Attributes are the properties of the entities, and relationships represent the "connections, links, or associations" between entities. Still, despite the increasing specificity and the pages of details describing how to model a wide variety of specific situations, a simple data-modeling problem helps demonstrate how an ER diagram is developed.
The following common example serves as a baseline against which you can compare other data modeling approaches. Consider the popular department, building, and employee problem. You have three entities: department, building and employee. Possible attributes of department are Department Name and employee experience in years; possible attributes of building are Department Name, employeeID and Experience. Possible attributes of employee are Employee Name, Employee, Employee Department, EmployeeHireDate and Employee experience. The relationships among Department, Employee and Building can be defined in noun/verb format as: "Department consists of zero or more Employees," "Employees belong to one and only one Department," and "Employees work in one and only one Building." These three entities, their attributes, and the relationships between them are illustrated in Figure.
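The three relationship statements can also be expressed in code. The sketch below is one possible reading, not a definitive schema: the class and field names are assumptions, only a `name` attribute is modelled, and the "one and only one" constraints are enforced simply by requiring a department and a building when an employee is created.

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based comparison, which avoids recursive
# equality checks between objects that refer to each other.
@dataclass(eq=False)
class Department:
    name: str
    # "Department consists of zero or more Employees"
    employees: list["Employee"] = field(default_factory=list)


@dataclass(eq=False)
class Building:
    name: str
    employees: list["Employee"] = field(default_factory=list)


@dataclass(eq=False)
class Employee:
    name: str
    department: Department  # "belong to one and only one Department"
    building: Building      # "work in one and only one Building"

    def __post_init__(self) -> None:
        # Register the employee on the "many" side of each relationship.
        self.department.employees.append(self)
        self.building.employees.append(self)


# Hypothetical data for illustration.
research = Department("Research")
main = Building("Main Building")
alice = Employee("Alice", research, main)
bob = Employee("Bob", research, main)

print([e.name for e in research.employees])
```

A department can exist with an empty `employees` list (zero employees), while an `Employee` cannot be constructed without exactly one department and one building, which mirrors the one-to-many cardinalities in the text.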
