Data Warehousing and Data Mining - Essay Example


Extract of sample "Data Warehousing and Data Mining"

Q1: A data warehouse is a repository of an organization's historical and current data that management considers important for decision support. Data is extracted from the organization's operational systems and stored as snapshots that build up a history. This helps in handling urgent queries and scheduled reports, and it also supports complex analysis. A data warehouse supports data analysis and decision support by keeping data organized in a form that is ready for analytical processing through activities such as querying, data mining and reporting. It is characterized by being subject-oriented, integrated, time-variant and non-volatile.
Subject-oriented - data is arranged by the subjects that are relevant to decision support, which enables users to determine the how and why of the organization's performance.
Integrated - data drawn from different sources is placed in a consistent format. This is only possible if the data warehouse resolves major conflicts such as naming discrepancies.
Time-variant - the data warehouse maintains historical data. Time is an important dimension of the warehouse because each record reflects the status of the data at a particular point in time. This characteristic makes it possible to detect trends, long-term relationships and deviations, which support the comparisons and forecasting that form an important part of decision making.
Non-volatile - once data has been entered into the data warehouse, users cannot update or change it. Changes in the data are recorded as new data, and obsolete data is discarded.
Q2: Natural language processing (NLP) is the approach used in text mining to introduce structure to text-based documents so that they can be clustered into natural groupings or classified into predetermined classes. Early text mining applications used a simplified representation called bag-of-words, in which natural language is treated as a cluster of words. A text such as a sentence, a paragraph or a complete document is represented as a collection of words, disregarding the grammar of the text and the order in which the words appear. Although this model is dated, some document classification tools still use it. One example is spam filtering, where an e-mail message is modeled as an unordered bag of words and compared with two predetermined bags. One bag is filled with words found in spam messages, and the other is filled with words found in legitimate e-mails. Some words are likely to appear in both bags, but the bag that more closely matches the bag of words from the message determines whether the e-mail is classified as legitimate or as spam.
Challenges of natural language processing - several difficulties come with the use of natural language processing. These include:
Word sense disambiguation - many words have more than one meaning, and it is difficult to select the meaning that is intended in a given context.
Part-of-speech tagging - it is difficult to mark the words in a text as nouns, adjectives, verbs or adverbs, because the part of speech depends on the context in which a word is used.
Syntactic ambiguity - the grammar of natural language is ambiguous, so a sentence can often be parsed in more than one way.
Speech acts - a sentence can often be considered an action by the speaker, but the sentence alone may not contain enough information to define that action clearly.
Imperfect or irregular input - foreign accents and unfamiliar vocabulary in speech, and grammatical or typographical errors in text, make natural language harder to process. It is also difficult to analyze speech that lacks clear boundaries between words.

Q3: Text mining is the semi-automated process of extracting useful information and knowledge from large amounts of unstructured data sources. Data mining is the process of identifying novel, potentially useful, valid and understandable patterns in data stored in structured databases. Web mining is the process of discovering interesting and useful information from web data, which is expressed in the form of linkage, textual and usage information. Text mining, data mining and web mining are similar in that they use the same processes and serve the same purpose. Their commonalities lie in the way they handle content. All three apply a similar data extraction process throughout, and they operate over the same range of extraction techniques, which makes it easier to process language. In text mining the data is first placed into categories, which makes the extraction process much easier. All three rely on similar frequency counts to aggregate the data, which is then clustered into probabilistic data samples. They can all extract useful information automatically, and they all use web tools and business tools to do so. The major difference between data mining, text mining and web mining is in their implementation: they are implemented under distinct paradigms, even where they use the same tools to extract the needed information.

Q4: Web 2.0 is an emerging technological trend in which the World Wide Web permits and enhances interaction among users so that they can share information over online platforms.
Its major components include blogs, wikis and web services. This technology is considered a very important aspect of online business, and its use has had a major impact on the business world in both positive and negative ways.
Wikis - a component of Web 2.0 that aids collaborative interaction across a company's structures. A wiki encourages users to post hypotheses and requests for help, and it then invites suggestions and commentary. This helps users find solutions to the different problems they encounter in their daily business undertakings.
Blogs - an interactive tool that is centered on a certain kind of data and a key metric. Blogs provide an interface for posting information and then collecting comments. This is important to business intelligence because it allows different information to be shared and accessed, which widens the scope of business knowledge.
RSS feeds - tools embedded in dashboards that help in the inquiry process. They can show various aspects of an inquiry that are useful to the business by initiating collective analysis of transaction characteristics and of the selling behaviors that can produce high sales.
Virtual worlds - the aspect of the technology that provides telepresence and participation at a distance. This kind of artificial world is created by computers, and it allows users to create avatars that interact with other computer-generated individuals. The avatars can be hired and used as employees in the business field. They can manage businesses and carry out functions designated by the user, which makes them a very important aspect of business intelligence.

Q5: How data can be divided between training and test sets. Data sets can be divided into training data and test data, and there are several methods for making this division.
The division is done to obtain an accurate estimate of how well the analysis performs. The methods that can be used include residual evaluation and cross validation, and each has distinctive drawbacks. Residual evaluation measures error on the same data that was used for training, so it gives no indication of how well the learner will perform when it is asked to make predictions for data it has not seen. This makes it inferior to cross validation. The problem can be avoided by using only a portion of the data set when training the learner. The best method is cross validation, which divides the data set into a training set and a test set. Some of the data is removed before training begins, and once training is finished the removed data is used to test the performance of the learner on new data. There are several kinds of cross validation, including the holdout method, k-fold cross validation and leave-one-out cross validation. The best and most common of these is k-fold cross validation.
K-fold cross validation - the data is divided into k subsets and the procedure is repeated k times. Each time, one of the k subsets is used as the test set and the other k-1 subsets are combined to form the training set. The average error across all k trials is then computed. The outstanding advantage of this method is that the way the data is divided matters much less, because every data point appears in a test set exactly once.

Q6: The ETL process is the data integration process that involves extracting data from external sources, transforming the extracted data into an appropriate format, and loading the transformed data into the warehouse repository. It involves the physical movement of data from its sources to the target. Extraction is the first process, in which the data is collected from the source.
The second process is transformation, in which the data is converted into a form that is compatible with the target database. The final process is loading, in which the transformed data is imported into the data warehouse. The processes are described below.
Extraction - in this stage a connection is made to the source systems, and the data is selected and collected for processing within the data mart or the data warehouse. The data is collected from various sources, which may store it in different forms. This stage converts the extracted data into a format that can be processed in the transformation stage. The amount of data to be extracted determines the complexity of this process.
Transformation - in this stage a series of functions is applied to the extracted data to convert it into the right format. Records are either validated or rejected in this stage. The amount of data determines the manipulations required. The processes involved in this stage include data filtering, data standardization, data sorting, data translation and data consistency checks.
Loading - in this stage the data that has been extracted from the source and transformed is imported into the target data warehouse. The load process can also insert data as new rows. However, this process does not allow for integrity checks.
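The three ETL stages described under Q6 can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and does not come from the essay: the two CSV strings stand in for operational source systems, the field names and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse. The transformation step shows filtering, standardization, validation or rejection of records, and sorting.

```python
import csv
import io
import sqlite3

# Hypothetical source data: two "operational systems" exporting CSV text.
SOURCE_A = "customer,amount,country\nAlice, 120.50 ,us\nBob,not-a-number,UK\n"
SOURCE_B = "customer,amount,country\ncarol,75,de\n,10,fr\n"


def extract(sources):
    """Extraction: read every source and collect its raw records."""
    rows = []
    for text in sources:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def transform(rows):
    """Transformation: standardize fields, validate or reject records, sort."""
    valid, rejected = [], []
    for row in rows:
        name = row["customer"].strip().title()    # standardize name casing
        country = row["country"].strip().upper()  # standardize country codes
        try:
            amount = float(row["amount"])         # translate text to a number
        except ValueError:
            rejected.append(row)                  # reject unparseable amounts
            continue
        if not name:
            rejected.append(row)                  # reject records with no customer
            continue
        valid.append((name, amount, country))
    valid.sort()                                  # sort by customer name
    return valid, rejected


def load(conn, records):
    """Loading: insert the transformed records into the target table as new rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl():
    """Run the three stages in order against an in-memory stand-in warehouse."""
    conn = sqlite3.connect(":memory:")
    valid, rejected = transform(extract([SOURCE_A, SOURCE_B]))
    load(conn, valid)
    loaded = conn.execute(
        "SELECT customer, amount, country FROM sales ORDER BY customer"
    ).fetchall()
    conn.close()
    return loaded, rejected


if __name__ == "__main__":
    loaded, rejected = run_etl()
    print("loaded:", loaded)
    print("rejected:", len(rejected))
```

Validation is placed in the transformation step rather than in the load step, which matches the essay's description that records are validated or rejected during transformation while loading only inserts rows.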