Data Mining, Its Purpose and Its Working Methodology

This paper demonstrates an in-depth understanding of data mining concepts, supported with real-world examples. Data mining commands, languages, and techniques are discussed in detail, and the benefits of data mining, along with its challenges and their solutions, are discussed briefly.

2 Introduction

Data mining is a knowledge discovery process, also known as Knowledge Discovery in Databases (KDD). The primary function of data mining, or KDD, is to search and analyze large volumes of data in a database for patterns. It uses computerized data analysis techniques to expose relationships between data items that were previously hidden or undetected. The data analyzed is typically fetched from data warehouses, where many databases are interconnected. The major techniques involved in data mining are regression, classification, and clustering. Data mining is used to extract in-depth patterns for market intelligence from data warehouses containing massive amounts of data. The issue that arises is not the quantity of data, since massive amounts are already available, but the methodology required to learn from it.

3 Data Mining

3NF is usually recommended for a corporate environment that would otherwise manage massive amounts of replicated data. In 3NF there is no need to store the same data several times; however, more joins are required. By comparison, 1NF allows replicated data to be stored and avoids the additional joins. It is up to the database administrator to evaluate which form is right, whether 3NF or 1NF. Normalization comprises five rules (normal forms) that are applied to a relational database. The main objective is to eliminate or minimize redundancy while increasing database efficiency. The drawback is that too much normalization can cause problems of its own, since each extra join adds query cost.
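The trade-off between storing replicated data and performing joins can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the tables (orders_flat, customers, orders), their columns, and the sample rows are hypothetical and do not come from the paper.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat design: the customer's city is repeated on every order row.
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_city TEXT,
        amount REAL
    );
    -- 3NF-style design: each customer fact is stored once.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        city TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
""")

conn.executemany("INSERT INTO orders_flat VALUES (?, ?, ?, ?)", [
    (1, "Aisha", "Dubai", 120.0),
    (2, "Aisha", "Dubai", 80.0),
    (3, "Omar", "Abu Dhabi", 50.0),
])
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Aisha", "Dubai"),
    (2, "Omar", "Abu Dhabi"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 120.0),
    (2, 1, 80.0),
    (3, 2, 50.0),
])

# Flat table: no join needed, but "Dubai" is stored twice.
flat_totals = conn.execute("""
    SELECT customer_city, SUM(amount)
    FROM orders_flat
    GROUP BY customer_city
    ORDER BY customer_city
""").fetchall()

# Normalized tables: "Dubai" is stored once, but the query needs a join.
normalized_totals = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()

# How many times the city value "Dubai" is physically stored in each design.
dubai_rows_flat = conn.execute(
    "SELECT COUNT(*) FROM orders_flat WHERE customer_city = 'Dubai'"
).fetchone()[0]
dubai_rows_normalized = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE city = 'Dubai'"
).fetchone()[0]
```

Both queries return the same totals per city; the designs differ only in how often the city is stored and whether a join is needed to reach it.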
The objective is to deploy the highest acceptable level of normalization. Comparing the first three normal forms: 1NF removes repeating groups, 2NF reduces redundancy by removing columns that depend on only part of a composite key, and 3NF removes columns that do not depend directly on the primary key. Database design should therefore aim for the highest level of normalization that remains practical, in order to keep the database efficient and robust.

To maintain three large databases as part of a very large database (VLDB) and keep them efficient for up to two years, a ‘store and forward’ mechanism is required. It processes the data from and through each distribution center database and, at the same time, holds that data until the enterprise data warehouse (EDW) is complete. Data archiving is also required to manage each distribution center database as it grows into a VLDB. An EDW is capable of supporting this scenario.

In data mining, the use of data about people carries serious ethical implications. Data mining experts need to establish norms so that applications of such data remain sound (Keating, 2008). Where people are concerned, classification is associated with discrimination, including behaviors such as racism, which run against social norms. How such discrimination is perceived depends on the application. Classifying patients to single out a disease that needs urgent attention is generally accepted, whereas applying similar classification to loan decisions in a bank or other financial institution can be considered unethical. Many factors may turn out to be relevant in data mining. For instance, a published consumer report illustrated that in France, customers who own a red car are more likely to default on their loans. Whether acting on such a finding is ethical is debatable. Similarly, insurance companies routinely differentiate between customers, for example between a young person and an elderly lady, because the associated statistics show that young people are more likely to have accidents, resulting in higher insurance payouts for their damaged cars. Other issues in data mining concern techniques, tools, user involvement, performance, and the variety of data types.
A comprehensive discussion of each follows.

3.1 Methods of Mining and Interaction Concerns

These concerns relate to how information is extracted from databases and to the ability to gain knowledge from it. They cover mining information at multiple levels, making use of domain knowledge, and the way discovered knowledge is formed.

3.2 Mining several kinds of Information

Data mining must address several different criteria across a variety of data analysis and information discovery tasks. These tasks include characterization, association and correlation analysis, cluster analysis, classification, and prediction. They may all draw on the same database while producing results in several different forms.

3.3 Collaborative Mining for Extracting Information

Because it is difficult to predict the form and composition of a database in advance, the mining process should be iterative. The data can be divided into parts by applying sampling techniques, which makes exploration easier for the data mining specialist and saves memory.

3.4 Synchronizing Contextual Information

Contextual information about the problem domain guides the discovery process: it helps define the criteria for finding hidden patterns and allows those patterns to be expressed in summarized terms. Domain knowledge is therefore essential for a robust data mining process, as it helps identify the useful patterns that match a given criterion.

3.5 Query Language for Data Mining

A relational query language is a valuable option: Structured Query Language (SQL) lets the data mining specialist issue various queries to retrieve a specific set of data. For high-level data processing, however, data mining query languages are more complex and far more advanced. Such queries let data mining experts specify data mining tasks together with the relevant domain knowledge.
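As a rough illustration of how a simple mining task can be phrased as a relational query, the sketch below counts how often pairs of items appear in the same basket by joining a table to itself. It uses plain SQL through Python's sqlite3 module, not a dedicated data mining query language. The basket_items table, its columns, the sample rows, and the min_support threshold are all hypothetical, and the query assumes each item appears at most once per basket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transaction data: one row per (basket, item).
conn.execute("CREATE TABLE basket_items (basket_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO basket_items VALUES (?, ?)", [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "milk"), (3, "eggs"),
])

# Minimum number of baskets a pair must appear in to be reported.
min_support = 2

# Self-join: pair up items from the same basket. The condition
# a.item < b.item keeps each unordered pair exactly once.
pair_counts = conn.execute("""
    SELECT a.item, b.item, COUNT(*) AS support
    FROM basket_items AS a
    JOIN basket_items AS b
      ON a.basket_id = b.basket_id AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY a.item, b.item
""", (min_support,)).fetchall()
```

Even this small task needs a self-join, grouping, and a threshold, which hints at why dedicated mining queries become more involved than ordinary retrieval queries.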
Moreover, these queries integrate easily with the state-of-the-art applications running in data warehouses. Beyond integration, they also help data mining experts retrieve high-quality data (Data mining extensions, 2007).

3.6 Information Visualization

The high-level information extracted from databases is presented in visual form, which enables the data mining specialist to understand and interpret the data (Data mining extensions, 2007). Interacting at this level requires representations such as graphs, bar charts, tables, and rules.

3.7 Organizing Ineffectual Data

Databases also contain a large amount of extraneous or incomplete data, which creates hurdles for data mining. Because the algorithms and criteria are designed to gather information from complete data, incomplete data complicates the process and in some cases makes the results inaccurate (Data mining extensions, 2007). The best option is to clean the data first, using data cleaning and analysis techniques, before using it in the data mining process (Fowler, Karadayi, Chen, Meng, & Fowler, 2000).

3.8 Evaluating Patterns

Evaluating patterns is an essential task. Because data mining processes extract numerous patterns, the data mining specialist analyzes only those that are relevant and adequate. This requires a high level of expertise along with application knowledge, domain background knowledge, and awareness of the limitations associated with specific users. All of these factors can narrow the search for valid patterns (Fowler, Karadayi, Chen, Meng, & Fowler, 2000).

4 Performance Limitations

Data mining performance bottlenecks are linked to the scalability, efficiency, and parallelization of data mining methods and procedures.
For the data mining process to be effective, information must be acquired from data warehouses that draw on numerous databases. Accessing data in large data warehouses brings challenges, including long delays when running data mining algorithms. The solution is to use distributed and parallel data mining techniques, which divide the data into segments so that processing is faster.

4.1 Challenges

Data mining technology does not yet adequately address individual privacy, which makes data mining a social issue as well as a technical one. Business transactions and the associated information are collected and analyzed in order to study customers' buying habits and preferences, or to predict patterns that support future decisions.

Data integrity plays a crucial role in data mining, because results can be trusted only if the underlying data is authentic. The challenge is to consolidate inconsistent data collected from several sources. For instance, a bank or other financial institution may use various techniques, tools, databases, and methods to capture data for managing its customers' credit card accounts. Because different software products are involved, the data is captured in different formats. To address this, software products must be able to work with systems running on different platforms and gather data in a standardized format.

A database in which data is maintained in tabular form is called a relational database. Relational databases can be used in data mining to answer specific queries.
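The consolidation problem described above, where the same kind of record arrives in different formats from different systems, can be sketched as follows. Everything here is assumed for illustration: the two source systems, their field names (acct, date, amount, AccountNo, PostedDate, AmountCents), and their date formats are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class CardTransaction:
    """Standardized record shared by all (hypothetical) source systems."""
    account_id: str
    posted_on: date
    amount_cents: int


def from_system_a(row: dict) -> CardTransaction:
    # Hypothetical system A: ISO dates, amounts as decimal strings.
    return CardTransaction(
        account_id=row["acct"].strip(),
        posted_on=datetime.strptime(row["date"], "%Y-%m-%d").date(),
        amount_cents=int(Decimal(row["amount"]) * 100),
    )


def from_system_b(row: dict) -> CardTransaction:
    # Hypothetical system B: day/month/year dates, amounts already in cents.
    return CardTransaction(
        account_id=str(row["AccountNo"]).strip(),
        posted_on=datetime.strptime(row["PostedDate"], "%d/%m/%Y").date(),
        amount_cents=int(row["AmountCents"]),
    )


# Records from both systems end up in one list with one schema.
unified = [
    from_system_a({"acct": " 4001 ", "date": "2010-03-15", "amount": "19.99"}),
    from_system_b({"AccountNo": 4001, "PostedDate": "16/03/2010", "AmountCents": 500}),
]
total_cents = sum(t.amount_cents for t in unified)
```

Once every source is mapped to the same record type, later mining steps can treat the data uniformly without knowing which system it came from.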
Client/server technology is progressing at a rapid pace, and when storage is managed at a single location, that location becomes a preferred place for data mining. With many hardware manufacturers in the market, tough competition has reduced the cost of hardware, and as a result the structure of data mining queries has changed to make the most of hardware features. Data mining queries are more powerful than before and extract value from the data at high speed. Speed increases the amount of data that can be acquired during the data mining process and, at the same time, provides valuable information for making informed decisions.

4.2 Explanation of the Problematic Domain

Camacho, Gharib et al. (2007) define diabetes mellitus (DM) as follows: “DM is a chronic disorder of glucose homeostasis characterized by hyperglycemia and impaired insulin action, with abnormal pancreatic insulin secretion as well as increased rates of hepatic glucose production. Unlike type 1 DM, no absolute physiological lack of insulin is present”. Diabetes may be considered a primary driving factor for a number of other health issues. For instance, diabetes imposes almost 100% risk for vascular diseases such as cardiovascular disease, and it also causes problems in the kidneys and the brain. A recently published survey puts the number of diabetic patients at more than 285 million globally. Diabetes is more common in well-developed countries; however, studies show that the number of patients in Africa and Asia may grow significantly. In the Middle East and North Africa, at least 27 million patients currently suffer from diabetes mellitus, and in the United Arab Emirates almost one third of the population is suffering from the disease. A study demonstrated that the overall cost of the disease throughout the world is $376 billion annually.

It is now well established that people over the age of 60 are more likely to develop the disease, which is considered the fourth leading cause of death globally. The most common form is type 2 diabetes. Because almost 20% of inhabitants suffer from it in the United Arab Emirates alone, many research studies and debates are conducted yearly in Dubai and Abu Dhabi. Awareness sessions are also conducted in every town to inform people about the disease (Zain 2009; MoH launches second phase of diabetes campaign, 2010). This case study examines diabetes and the medical data associated with patients from the Middle East region, specifically the United Arab Emirates, in order to discover concealed patterns and valuable information that can be used in decision making. These informed decisions are made by medical personnel and practitioners. The case study can therefore be used to illustrate the medication requirements for each type of diabetes and to forecast future trends reflected in the extracted data (MoH launches second phase of diabetes campaign, 2010).

5 Conclusion

The first phase of the brief gives a comprehensive introduction to data mining, its purpose, and its working methodology. The second phase provides a detailed discussion of data mining methods, the kinds of data mined, information extraction methods, the query language used for extracting data, information visualization, and the evaluation of patterns. The third phase discusses the challenges that may occur while performing data mining to extract meaningful data. To elaborate on the challenges further, a comprehensive discussion of the problem domain is also provided.

References

Camacho, P. M., Gharib, H., & Sizemore, G. W. (2007). Evidence-based endocrinology. Philadelphia: Lippincott Williams & Wilkins.

Data mining extensions. (2007). Network Dictionary, 134.

Fowler, R. H., Karadayi, T., Chen, Z., Meng, X., & Fowler, W. A. L. (2000). A visualization system using data mining techniques for identifying information sources.

Keating, B. (2008). Data mining: What is it and how is it used? Journal of Business Forecasting, 27(3), 33-35.

MoH launches second phase of diabetes campaign. (2010). Arabia 2000.
