Data Mining Process and Algorithms - Essay Example

Summary
The paper "Data Mining Process and Algorithms" states that child welfare organizations are using predictive analysis to minimize the dropout of children from foster parent care because it increases their risk of falling out of the program as a whole. …
Extract of sample "Data Mining Process and Algorithms"

1. (a). Using predictive analytics to understand behavior: To remain competitive, sellers need to understand present consumer behavior and predict future behavior. Accurate prediction and a full understanding of customer behavior can help them improve sales, retain customers, and extend customer relationships. This is achieved through predictive analytics, a form of data mining that offers users useful insights across the organization (Greene, 2012). Predictive analytics applies statistics and mathematics to business and marketing in order to establish patterns in data and extrapolate those patterns to future business cases, so as to reduce costs, improve response rates, increase process efficiency, and consequently boost revenue. Data mining has several components, but the most significant are defining the problem, evaluating the available data, and developing predictive models.

(b). Association discovery depends on the retailer or other business capturing the unique identifier of each product sold to consumers. By analyzing this information, the seller can learn the purchasing behavior of its customers. The information derived is used to support business strategies and applications such as inventory management, marketing promotions, and customer relationship management.

(c). Mining information on web usage is important for managing websites effectively, planning the development of adaptive websites, administering business and support services, increasing personalization, and analyzing the flow of network traffic. Further, rapid business growth places businesses and customers in a new situation, where competition plays a major role in determining the strategies that businesses adopt (Greene, 2012).
On the other hand, customers have more options to choose from and will therefore favor the businesses that offer more value. For example, discovering that many of a business's customers are teenagers may help the company adjust its targeting so that it reaches this group better.

(d). Cluster analysis finds groups of data entities or objects that are similar in certain respects. Members of a cluster should be more similar to one another than to members of other clusters. The aim of clustering is to discover high-quality groups, where inter-cluster similarity is lowest and intra-cluster similarity is highest. Once clusters with high intra-cluster similarity are established, the shared characteristics of their members serve as customer information that can be tracked or targeted to increase the impact of the business within a given high-quality cluster.

2. The reliability of data mining algorithms can be assessed by validating the data mining models. The process involves assessing the performance of the mining models against real data, so that the characteristics and quality of the algorithms are understood before they are deployed into the production environment (Chung & Gray, 1999). Different statistical validity measures are applied to determine whether there are problems in the model or the data. Reliability is also judged by the scalability of the clustering techniques. This is particularly true of large data sets, where the demands on memory and speed are high. For example, an algorithm that shows linear or close to linear time complexity on a database containing millions of records demonstrates high reliability.
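
The cluster-quality criterion described above (low inter-cluster similarity, high intra-cluster similarity) can be checked with a simple validity measure. The following is a minimal sketch, not a method taken from the essay or its sources: it assumes Euclidean distance as the dissimilarity measure, and the function name and the six-customer data set are hypothetical illustrations.

```python
import math
from itertools import combinations


def cluster_cohesion_separation(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Assumption: Euclidean distance stands in for dissimilarity, so a good
    clustering has a small intra-cluster mean and a large inter-cluster mean.
    """
    intra, inter = [], []
    # Compare every pair of points once and sort the distance into the
    # "same cluster" or "different cluster" bucket.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        distance = math.dist(p, q)
        (intra if lp == lq else inter).append(distance)
    mean_intra = sum(intra) / len(intra) if intra else 0.0
    mean_inter = sum(inter) / len(inter) if inter else 0.0
    return mean_intra, mean_inter


# Hypothetical customer records: (age, monthly spend).
points = [(16, 40), (17, 45), (18, 42), (45, 200), (47, 210), (50, 205)]

# A labeling that separates the teen customers from the older ones.
good_labels = [0, 0, 0, 1, 1, 1]
good_intra, good_inter = cluster_cohesion_separation(points, good_labels)

# A deliberately poor labeling that mixes the two groups.
bad_labels = [0, 1, 0, 1, 0, 1]
bad_intra, bad_inter = cluster_cohesion_separation(points, bad_labels)

print(f"good labeling: intra={good_intra:.1f}, inter={good_inter:.1f}")
print(f"bad labeling:  intra={bad_intra:.1f}, inter={bad_inter:.1f}")
```

Because the check compares every pair of points, its cost grows quadratically with the number of records. On a database with millions of records it would more plausibly be run on a sample, which is the kind of scalability concern raised above.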
The reliability of an algorithm can also be determined by applying tests that check whether a given point belongs to the cluster in question. In such a case, a reliable algorithm will detect outliers and noise, and either eliminate their negative effects or delete them entirely. The validity of the algorithm can be tested in this way while clustering is ongoing, or at the post-processing phase. Since the advent of computer systems, algorithms have been able to sort the mass of digitally stored data; provided that the reliability of these models is high, their trustworthiness is not in question. For example, to overcome the problem of unpredictability in handling data, the development of structured databases and database management mechanisms has increased the ability of algorithms to offer factual information (Han & Kamber, 2000). Therefore, given the established nature of database management systems, data mining algorithms offer an effective and trustworthy way of providing assorted information and data. The errors likely to arise from the use of data mining algorithms include asking the wrong question, failing to test whether the outcomes are reasonable, ignoring differences in the data, building overly complex models, overgeneralizing the results, and relying on a single data analysis (Zhou, Jiang & Chen, 2003).

3. (a). The first privacy concern is whether the information collected for the purpose in question will be used for secondary or additional purposes in the future, including further mining after collection. The second is whether privacy protection is built into the systems at the development stage. The third is whether the information may be combined with other information from private or government agencies.

(b). The first privacy concern is valid because customers may reasonably worry that the information collected will be sold or communicated to other parties, for instance marketers, who will then solicit business from them (GAO, 2004). The second privacy concern is valid because customer information may be diverted to other agents that use it before any privacy check takes place. For example, a customer’s private information may be collected by government agencies investigating transactions that appear suspicious. The third privacy concern is valid because combining the collected information with that held by public or private enterprises can lead to harassment or undue investigation of customers through profiling based on their private information. For example, a person with an Arab name may be subjected to investigation, on the basis of that name and other collected information, in the course of investigating terrorist activities and dealings (GAO, 2004).

(c). The first privacy concern is addressed by collecting only vital information, so that the customer can be assured that the information will not be used for other purposes in the future. The second privacy concern is addressed by integrating the data privacy system into the data collection process from the development stage onward. This ensures that the system does not leak any information. The third concern is addressed by ensuring that customers’ information is stored safely and in a way that keeps it from reaching third parties who may seek it.

4. In hospitals, predictive analysis has been used to identify the side effects of medication, with the aim of improving the outcomes of medical practice. The strategy is effective because the hospital acquires more refined information on the treatment types that work best for particular diseases.
Street clothing lines in Paris place cameras on the streets, from where they can observe the fashion trends of the time (Siegel, 2013). Using the historical data, the designers recognize the colors, trends, and designs that are coming in and those that are losing the market. The patterns drawn are used to predict the next breakthrough in fashion design.

Child welfare organizations are using predictive analysis to minimize the dropout of children from foster parents’ care, because dropping out increases the children's risk of falling out of the program as a whole. They do this by using a predictive model to match parents with the children they suit best, depending on age, income, ethnicity, and education, among other variables. The strategy has been effective because it has improved the success of foster parenthood by increasing the duration of stay and the overall outcomes of the service for society, the government, and the children (Siegel, 2013).

References

Chung, M., & Gray, P. (1999). Special Section: Data Mining. Journal of Management Information Systems, 16(1), 11-16.
Greene, W. (2012). Econometric Analysis, 7th Ed. London: Prentice Hall.
Han, J., & Kamber, M. (2000). Data Mining: Concepts and Techniques. Massachusetts: Morgan Kaufmann.
Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. New York: John Wiley.
U.S. General Accounting Office (GAO). (2004). Data Mining: Federal Efforts Cover a Wide Range of Uses. GAO-04-548, May 2004.
Zhou, Z.-H., Jiang, Y., & Chen, S.-F. (2003). Extracting symbolic rules from trained neural network ensembles. AI Communications, 16(1), 3-15.
