Computing power has been increasing at roughly the rate described by Moore's law, commonly quoted as a doubling every eighteen months. The shift to parallel processing has contributed further to more powerful machines. A number of statistical applications and algorithms had been waiting for this level of computing power to arrive, and data mining puts them to use. At the same time, data is being collected on a very large scale at all levels, and "the more data, the better the data mining exercise" has been the watchword of much of the work carried out. Greater computing power, mature algorithms and abundant data together make data mining possible. Its results are obtained by applying appropriate models to the collected data. This enables businesses to identify patterns in customers' buying behaviour, identify customer demographic characteristics and predict customer response to mailings.
In most cases, both commercial and scientific establishments report that a large quantity of data is collected and stored, yet hardly any information is available for people to make use of. At its most basic, a data mining effort starts by employing appropriate data models that help in understanding the system and its behaviour (Hand D J, 2001). This in turn improves the work carried out and makes the future behaviour of the object more predictable, but only if the object is well understood and the model represents it as accurately as possible. A number of modelling tools help in data mining, typically decision trees, rule induction, regression models and neural networks. All of these contribute to extracting the needed information from databases using data mining tools, and they go beyond simple, straightforward SQL statements (Australian Academy of Science, 2006).
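To illustrate how a model differs from a plain query, the following is a minimal sketch of the simplest form of decision tree, a one-level "stump" that learns a single spending threshold from examples. The customer records, the field meanings and the function name `fit_stump` are hypothetical and chosen only for illustration; real data mining tools use far richer models.

```python
# Hypothetical toy data: (monthly_spend, responded_to_mailing)
customers = [
    (120, 0), (150, 0), (200, 0), (260, 1),
    (310, 1), (330, 0), (400, 1), (450, 1),
]


def fit_stump(rows):
    """Return (threshold, errors) for the rule 'predict 1 if spend > threshold'
    that misclassifies the fewest rows."""
    values = sorted({spend for spend, _ in rows})
    # Candidate thresholds are the midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        # Count rows where the rule's prediction disagrees with the label.
        errors = sum((spend > t) != bool(label) for spend, label in rows)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


threshold, errors = fit_stump(customers)
print(f"Predict a response when spend > {threshold} ({errors} training error(s))")
```

A SQL statement could retrieve every customer above a threshold that the analyst already knows; the point of the sketch is that here the threshold itself is derived from the data.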
Qualitative analysis is possible with the predicate data, which is used to identify and obtain an objective visualisation of the object being modelled. In quantitative analysis, by contrast, the data is processed automatically based on specific input data or time. Based on the model, the information and data available in the system are extracted to meet the requirements. For banks, this helps in identifying and detecting patterns of fraudulent credit card usage. Banks may also want to identify loyal customers, as well as those who might switch loyalty over even a minor issue. Data mining further helps in analysing credit card spending by customer group and in finding correlations between different financial indicators. A closer look at how data mining works will make these advantages clearer.
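As a rough illustration of the credit card case, the sketch below flags transactions that lie far from a customer's usual spending, measured in standard deviations (a z-score). The amounts, the cutoff of 3.0 and the function name `flag_unusual` are assumptions made for the example; it is not a description of how any particular bank detects fraud.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, z_cutoff=3.0):
    """Return the new amounts lying more than z_cutoff sample standard
    deviations away from the mean of the customer's past transactions."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical, so any different amount is unusual.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > z_cutoff]


# Hypothetical past card transactions for one customer.
history = [42.0, 38.5, 51.0, 47.25, 40.0, 44.75]
flagged = flag_unusual(history, [45.0, 49.9, 900.0])
print(flagged)
```

A fixed cutoff is the simplest possible choice; in practice the pattern would be learned from many customers and many more attributes than the amount alone.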
Data mining involves the following steps of operation (Jeffrey Ullman, 2003):
1. Data Gathering: Also called data warehousing, this step in most cases involves collecting information from the sources and assimilating it into a given database or data store. Data is gathered through web crawling and data entry processes, among many other methods.
2. Data Cleansing: This involves cleaning the available data by removing unwanted data or information and retaining only what is needed. The primary aim is to remove errors in the data and ensure that data integrity is maintained throughout.
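The gathering and cleansing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the two sources, the record fields `name` and `age`, the validity range for age and the function names `gather` and `cleanse` are all hypothetical.

```python
def gather(*sources):
    """Combine records from several sources into one list (the data store)."""
    store = []
    for source in sources:
        store.extend(source)
    return store


def cleanse(records):
    """Drop incomplete or invalid records, normalise names, remove duplicates."""
    seen = set()
    clean = []
    for rec in records:
        name = (rec.get("name") or "").strip().title()
        age = rec.get("age")
        # Reject records with a missing name or an implausible or missing age.
        if not name or not isinstance(age, int) or not 0 < age < 120:
            continue
        key = (name, age)
        if key in seen:
            # The same person was already collected from another source.
            continue
        seen.add(key)
        clean.append({"name": name, "age": age})
    return clean


# Hypothetical records from two sources.
web_form = [{"name": " alice smith ", "age": 34}, {"name": "Bob Jones", "age": None}]
crm_export = [
    {"name": "Alice Smith", "age": 34},
    {"name": "carol diaz", "age": 210},
    {"name": "Dan Wu", "age": 51},
]
clean = cleanse(gather(web_form, crm_export))
print(clean)
```

Normalising the name before checking for duplicates is what lets the two spellings of the same customer be recognised as one record.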