
Data Mining, Its Purpose and Its Working Methodology - Coursework Example


This paper demonstrates an in-depth understanding of data mining concepts, supported with real-world examples. Data mining commands, languages, and techniques are discussed in detail. The benefits of data mining are discussed briefly, along with various challenges and their solutions.

2 Introduction

Data mining is a knowledge discovery process, also known as Knowledge Discovery in Databases (KDD). The primary function of data mining, or KDD, is to search and analyze a large number of data patterns in a database. It uses computerized data analysis techniques to expose relationships among data items that were previously hidden or undetected. The data analyzed by these techniques is fetched from data warehouses, where many databases are interconnected. The major techniques involved in data mining are regression, classification, and clustering. Data mining is used to extract in-depth patterns for market intelligence from data warehouses that contain massive amounts of data. The difficulty is not the quantity of data, since massive amounts are already available to work with; it is the methodology required to learn from that data.

3 Data Mining

Third normal form (3NF) is usually recommended for a corporate environment that manages massive amounts of replicated data. With 3NF there is no need to store the same data several times, but queries require more joins. By comparison, first normal form (1NF) allows replicated data to be stored, so fewer joins are needed. It is up to the database administrator to evaluate which form is right, whether 3NF or 1NF. Normalization comprises five rules that are applied to a relational database. The main objective is to eliminate or minimize redundancy while increasing database efficiency. The drawback is that too much normalization can cause issues.
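The trade-off between replicated storage and extra joins can be illustrated with a small sketch. It uses Python's standard sqlite3 module; the table names, column names, and sample rows are hypothetical and are not taken from the paper. The flat table satisfies 1NF but repeats each customer's city on every order, while the normalized pair of tables stores the city once and needs a join to answer the same question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical flat table: in 1NF, but the customer's city is repeated per order.
cur.execute(
    "CREATE TABLE orders_flat ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(1, "Amal", "Dubai", 120.0), (2, "Amal", "Dubai", 80.0), (3, "Omar", "Abu Dhabi", 50.0)],
)

# Hypothetical 3NF design: customer attributes are stored once and referenced by key.
cur.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
cur.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), amount REAL)"
)
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Amal", "Dubai"), (2, "Omar", "Abu Dhabi")],
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0)],
)

# Same question against both designs: total order amount per city.
flat_totals = cur.execute(
    "SELECT city, SUM(amount) FROM orders_flat GROUP BY city ORDER BY city"
).fetchall()

# The normalized design needs a join to bring the city back next to the amount.
normalized_totals = cur.execute(
    "SELECT c.city, SUM(o.amount) "
    "FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id "
    "GROUP BY c.city ORDER BY c.city"
).fetchall()

print(flat_totals)
print(normalized_totals)
```

Both queries return the same totals. The difference lies in where the cost is paid: the flat table stores "Dubai" twice, and the normalized design pays with one join.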
The objective is to deploy the highest acceptable level of normalization. Comparing the three normal forms, 1NF removes repeating groups, 2NF reduces data redundancy, and 3NF removes columns from tables that do not depend on the primary key. Database design should therefore demonstrate the highest level of normalization possible, in order to make the database efficient and robust.

To maintain three large databases as very large databases (VLDBs) and keep them efficient for two years if required, a 'store and forward' mechanism must be constructed that processes the data from and through each distribution center database, and at the same time holds that data pending completion of the enterprise data warehouse (EDW). Data archiving is also required as each distribution center database grows into a VLDB. An EDW is efficient enough to support this scenario.

In data mining, data about people carries serious ethical implications. Data mining experts need to adopt norms that keep the application of such data defensible (Keating, 2008). Where people are concerned, data mining can be associated with discrimination, including behaviors such as racism, which run against accepted norms. How a use is perceived depends on the classification applied: separating out patients with a disease that needs urgent attention is an accepted use, whereas in a financial institution or bank, applying the same kind of classification to loan decisions is regarded as unethical. Numerous other factors may be relevant to data mining. For instance, a report from a leading consumer source indicated that in France, customers who own a red car are more likely to default on their loans. Whether using such a finding is ethical is debatable. Similarly, insurance companies are inherently selective and discriminating: their statistics differentiate, for example, between a young person and an elderly lady, because young people have a higher likelihood of accidents, which results in higher insurance payouts for their damaged cars. Other issues in data mining pertain to techniques, tools, user involvement, performance, and the various types of data.
Each of these is discussed below.

3.1 Methods of Mining and Interaction Concerns

This issue pertains to the information extracted from databases and to the capability to gain information. It covers mining information at many levels, the use of domain-related knowledge, and knowledge conception.

3.2 Mining Several Kinds of Information

Data mining should address several different criteria that arise from a variety of data analysis and information discovery tasks. These tasks comprise characterization, association and correlation analysis, cluster analysis, classification, and forecasting. The tasks may use a single database and present their results in several ways.

3.3 Collaborative Mining for Extracting Information

It is difficult to predict in advance the form and composition of what a database contains, so the mining process is meant to be iterative. The data can be divided into parts by applying sampling techniques, which makes the work easier for the data mining specialist and saves memory space as well.

3.4 Synchronizing Contextual Information

Retaining contextual information about the problem within its domain guides the discovery process and allows hidden patterns to be expressed in summarized terms. Domain knowledge is essential for a robust data mining process, because it helps identify useful patterns that meet a set criterion.

3.5 Query Language for Data Mining

A relational query language is a valuable option: Structured Query Language (SQL) allows data mining specialists to write various queries that retrieve a specific set of data. For high-level data processing in data mining, however, the query languages are more complex and far more advanced. Such queries allow data mining experts to implement data mining tasks that draw on domain knowledge.
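As a rough illustration of how far plain SQL can go before a dedicated data mining query language is needed, the sketch below counts which pairs of items are bought together. It is a minimal example using Python's standard sqlite3 module, not a data mining extension language; the `purchases` table, its columns, the sample baskets, and the `frequent_pairs` helper are all hypothetical.

```python
import sqlite3


def frequent_pairs(conn, min_baskets):
    """Return (item_a, item_b, baskets) for pairs bought together at least min_baskets times."""
    query = """
        SELECT a.item, b.item, COUNT(*) AS baskets
        FROM purchases AS a
        JOIN purchases AS b
          ON a.basket_id = b.basket_id   -- same basket
         AND a.item < b.item             -- count each unordered pair once
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY baskets DESC, a.item, b.item
    """
    return conn.execute(query, (min_baskets,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "milk"), (2, "eggs"),
        (3, "milk"), (3, "eggs"),
    ],
)

pairs = frequent_pairs(conn, min_baskets=2)
print(pairs)  # [('bread', 'milk', 2), ('eggs', 'milk', 2)]
```

The self-join does the work that an association-mining task would otherwise perform. It stays readable for pairs, but extending it to larger item sets quickly becomes awkward, which is one reason more advanced query languages exist for this purpose.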
Data mining queries are also easily integrated with state-of-the-art applications that operate in data warehouses, and they help data mining experts run queries that acquire quality data (Data mining extensions, 2007).

3.6 Information Visualization

The high-level information extracted from databases is presented in high-level visual representations that enable the data mining specialist to understand and interpret the data (Data mining extensions, 2007). Interacting at this level requires the aid of representations such as graphs, bar charts, tables, and rules.

3.7 Organizing Ineffectual Data

Databases also contain a large amount of extraneous or incomplete data, which creates hurdles in the data mining process. Because the algorithms and criteria are set up to gather information from complete data, incomplete data makes the process complex and, in some cases, inaccurate (Data mining extensions, 2007). The best option is to clean the data first, using data cleaning and analysis techniques, before using it in the data mining process (Fowler, Karadayi, Chen, Meng, & Fowler, 2000).

3.8 Evaluating Patterns

Evaluating patterns is an absolutely essential task. Because numerous patterns are extracted by data mining processes and techniques, data mining specialists analyze only the relevant and adequate ones. This process involves high-level expertise along with application knowledge, domain background knowledge, and the limitations associated with specific users. All of these factors can limit the search for valid patterns in the data (Fowler, Karadayi, Chen, Meng, & Fowler, 2000).

4 Performance Limitations

Data mining performance bottlenecks are linked with the scalability, capability, and analogy of the data mining methods and procedures.
For the data mining process to be effective, information must be acquired from data warehouses that hold numerous databases. Accessing data in large data warehouses brings certain challenges, including long delays when running data mining algorithms. The solution to this challenge is to incorporate distributed and parallel data mining techniques, which divide the data into segments to make the process faster.

4.1 Challenges

Data mining technology is not yet up to the mark in addressing individual privacy, which also makes data mining a social issue. Business transactions are processed, and associated information is procured, in order to study customers' buying habits and preferences or to predict valuable patterns that support future decisions. Data integrity plays a crucial role in data mining by providing authentic data that can be trusted. The challenge is to consolidate inconsistent data collected from several sources. For instance, a financial institution or bank uses various techniques, tools, databases, and methods to capture data in order to manage its customers' credit card accounts. The data is captured in differing formats because of dissimilar software products. To address these issues, software products must be able to cope with different systems running on different platforms, so that data is gathered in a standardized format.

A database in which data is maintained in tabular form is called a relational database. Relational databases can be used with data mining techniques to answer specific queries.
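The idea of dividing data into segments, described under Performance Limitations above, can be sketched as a split, process, and merge pattern. The example below is a minimal illustration under stated assumptions: the function names and the sample baskets are hypothetical, and a thread pool from Python's standard library stands in for what would normally be separate processes or machines in a real distributed setup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(records, n_segments):
    """Divide the records into at most n_segments contiguous chunks."""
    size = max(1, -(-len(records) // n_segments))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_items(segment):
    """Count in how many baskets of this segment each item appears."""
    counts = Counter()
    for basket in segment:
        counts.update(set(basket))  # set() so an item counts once per basket
    return counts


def parallel_item_counts(records, n_segments=4):
    """Count items per segment concurrently, then merge the partial results."""
    segments = split_into_segments(records, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(count_items, segments))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread"],
]
print(parallel_item_counts(baskets, n_segments=2))
```

The merge step works because item counts are additive across segments. Mining tasks whose partial results cannot simply be added together need a more careful combination step, and CPU-bound workloads in Python would typically use processes rather than threads.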
As client/server technology progresses at a rapid pace, storage is managed at a single location, which can be a preferred location for data mining. Because there are many hardware manufacturers in the market, tough competition has reduced the cost of hardware; as a result, the structure of data mining queries has also changed to make optimal use of hardware features. Data mining queries are now more powerful than before and extract optimal value from the data at high speed. Speed increases the amount of data that can be acquired during the data mining process and, at the same time, provides valuable information for making informed decisions.

4.2 Explanation of the Problematic Domain

Camacho, Gharib et al. (2007) define diabetes mellitus (DM) as follows: "DM is a chronic disorder of glucose homeostasis characterized by hyperglycemia and impaired insulin action, with abnormal pancreatic insulin secretion as well as increased rates of hepatic glucose production. Unlike type 1 DM, no absolute physiological lack of insulin is present". Diabetes may be considered a primary driving factor for a number of other health issues. For instance, diabetes imposes an almost 100% risk of vascular diseases such as cardiovascular disease, and it also creates problems in the kidneys and the brain. A recently published survey puts the number of diabetic patients at more than 285 million globally. Diabetes is more common in well-developed countries, although studies show that the number of patients may grow significantly in Africa and Asia. In the Middle East and North Africa, at least 27 million patients currently suffer from diabetes mellitus. In the United Arab Emirates, almost one third of the population suffers from the disease. A study found that the overall worldwide cost of the disease is $376 billion annually.

It is now well established that people over the age of 60 are more likely to develop the disease, which is considered the fourth largest cause of death globally. The most common form of diabetes is type 2. Because almost 20% of inhabitants in the United Arab Emirates alone suffer from it, many research studies and debates are conducted every year in Dubai and Abu Dhabi, and awareness sessions are held in towns and cities to inform people about the disease (Zain 2009; MoH launches second phase of diabetes campaign, 2010).

This case study examines diabetes and the medical data associated with patients from the Middle East region, specifically the United Arab Emirates, in order to discover concealed patterns and valuable information that can be used in decision making. These informed decisions are made by medical personnel and practitioners. The case study can therefore be used to illustrate the medication requirements for each type of diabetes and to forecast future trends reflected in the extracted data (MoH launches second phase of diabetes campaign, 2010).

5 Conclusion

The first phase of this brief gives a comprehensive introduction to data mining, its purpose, and its working methodology. The second phase presents a detailed discussion of data mining methods, the kinds of data mined, information extraction methods, the query language used for extracting data, information visualization, and the evaluation of patterns. The third phase discusses the challenges that may occur while performing data mining to extract meaningful data. To elaborate on these challenges, a comprehensive discussion of the problem domain is also provided.

References

Camacho, P. M., Gharib, H., & Sizemore, G. W. (2007). Evidence-based endocrinology. Philadelphia: Lippincott Williams & Wilkins.
Data mining extensions. (2007). Network Dictionary, 134-134.
Fowler, R. H., Karadayi, T., Chen, Z., Meng, X., & Fowler, W. A. L. (2000). A visualization system using data mining techniques for identifying information sources.
Keating, B. (2008). Data mining: What is it and how is it used? Journal of Business Forecasting, 27(3), 33-35.
MoH launches second phase of diabetes campaign. (2010). Arabia 2000.