Technology Review for Improving the Quality of Data

Enterprise Data Management INFS 5089
2017 Assignment 2 (Internal and External)
Technology Review for Improving Data Quality
[Your Name]
[Date]

Contents
Executive Summary
About Machine Learning
Benefits of Using Machine Learning Algorithms to Improve Data Quality (Large Organisations)
Benefits of Using the Machine Learning Algorithm to Improve Data Quality (Small Organisations)
Reference List

Executive Summary

Organisations continue to depend increasingly on data, and this trend makes the quality of that data ever more important. Modern organisations rely on data for almost every activity, and the growing importance of data has also led to the generation of huge volumes of it. Organisations therefore need a comprehensive data management program that guarantees the quality of their data, so that decision makers can make and implement effective decisions. Machine learning (ML) algorithms open a new frontier of opportunities for small and large organisations to improve data quality. ML technologies enable organisations to accumulate data, analyse it, and use the results to derive significant business insights. The benefits of ML systems include enhanced data quality, high scalability, and efficient identification and correction of errors.

Improving Data Quality in Enterprises

Both small and large organisations look after their enterprise data by safeguarding and improving it. Improving data quality requires data managers to attend to several dimensions of enterprise data: timeliness, accuracy, relevance, comparability, and completeness (Tee et al. 2005).
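Two of these dimensions, completeness and timeliness, can be expressed as simple ratios over a set of records. The Python sketch below is illustrative only: the record fields (`customer_id`, `email`, `updated`), the sample values, and the 90-day freshness window are hypothetical assumptions, not part of the frameworks cited in this review.

```python
from datetime import datetime, timedelta

# Hypothetical customer records; field names and values are illustrative only.
RECORDS = [
    {"customer_id": "C1", "email": "a@example.com", "updated": datetime(2017, 5, 1)},
    {"customer_id": "C2", "email": None, "updated": datetime(2016, 1, 15)},
    {"customer_id": "C3", "email": None, "updated": datetime(2017, 4, 20)},
    {"customer_id": None, "email": "d@example.com", "updated": datetime(2017, 5, 2)},
]


def completeness(records, field):
    """Share of records in which `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def timeliness(records, as_of, max_age):
    """Share of records updated no more than `max_age` before `as_of`."""
    if not records:
        return 0.0
    fresh = sum(1 for r in records if as_of - r["updated"] <= max_age)
    return fresh / len(records)


if __name__ == "__main__":
    as_of = datetime(2017, 5, 31)  # assumed reporting date
    for field in ("customer_id", "email"):
        print(field, "completeness:", completeness(RECORDS, field))
    print("timeliness:", timeliness(RECORDS, as_of, timedelta(days=90)))
```

Reporting one ratio per field makes it easy to track each dimension over time; in practice the freshness window would depend on how quickly the underlying business data changes.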
Regarding relevance, the main concerns are whether the data meets the purpose for which it was collected and whether a database is used effectively to store the data and retrieve it when necessary. Relevance also concerns whether the data can serve additional purposes, such as market analysis, that differ from the one for which it was collected. Where the organisation cannot yet use the data for such purposes, data managers should be able to state the time and cost required to make it usable for them.

Some errors in enterprise data are inevitable. Improving accuracy nevertheless requires, for example, determining the exact number of customers who purchased a particular product or service within a specified period, whether daily, weekly, monthly, or annually. Such figures are vital in helping decision makers identify which products need an increase in supply and which need a reduction, and therefore in implementing effective decisions.

Data improvement also covers timeliness. Collected data must be current if it is to be used to predict demand for products and services; outdated data lacks timeliness and is therefore unsuitable for immediate business decisions. For example, data managers should assess whether current opinion poll results still reflect the actual situation on the ground.

Improving data quality also entails improving data comparability.
Comparability is the ability to combine several databases into a single data warehouse so that the data can be used in data modelling, exploratory analysis, and statistical estimation. Improving comparability requires consistent data fields across all databases so that the different data sets can be linked effectively, and the accuracy of those shared fields must also be ensured.

Finally, improving data quality entails making datasets complete, which means ensuring that all records and data elements are available. Missing records or missing items in a database reveal a quality lapse. In financial databases, for example, the absence of some records or fields could have disastrous consequences for the organisation. One measure for improving completeness is training users of the database software to use the applications effectively.

About Machine Learning

Machine learning (ML) has begun to transform data quality management, and organisations continue to integrate it into their data environments to improve the quality of their data. One area where ML can help is automating the data matching process. To do so, an ML system first learns which records match before predicting matches on a routine basis. The process begins with labels being set up manually; the ML model then learns from the new data entered for standardisation purposes (The Royal Society 2017). As the quantity of data submitted for standardisation increases, the ML algorithm becomes better at performing as expected and delivering the desired results.
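The idea of learning matches from manually labelled examples and then predicting them routinely can be illustrated with a deliberately small Python sketch. It is not the Talend implementation: it assumes a single similarity feature (the standard-library `difflib.SequenceMatcher` ratio on trimmed, lower-cased strings) and "learns" only a decision threshold from hypothetical steward-labelled pairs, whereas production tools train richer classification models.

```python
from difflib import SequenceMatcher

# Hypothetical pairs labelled by a data steward: ((record_a, record_b), is_match).
LABELLED = [
    (("Acme Pty Ltd", "ACME Pty. Ltd"), True),
    (("Jon Smith", "John Smith"), True),
    (("Jon Smith", "Jane Smyth"), False),
    (("Acme Pty Ltd", "Apex Holdings"), False),
]


def match_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two trimmed, lower-cased strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def learn_threshold(labelled):
    """Choose the score threshold that classifies the most labelled pairs correctly.

    Candidate thresholds are the observed scores; on ties the first best is kept.
    """
    scored = [(match_score(a, b), is_match) for (a, b), is_match in labelled]
    best_t, best_correct = 1.0, -1
    for t, _ in scored:
        correct = sum((s >= t) == is_match for s, is_match in scored)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def predict(pair, threshold):
    """Predict whether a new pair of records refers to the same entity."""
    return match_score(*pair) >= threshold


if __name__ == "__main__":
    threshold = learn_threshold(LABELLED)
    print("learned threshold:", round(threshold, 3))
    print(predict(("Acme Pty Ltd", "Acme Pty Limited"), threshold))
```

Adding more labelled pairs gives the threshold search more evidence, which loosely mirrors the point that more data improves the algorithm's performance.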
This gives ML an advantage over traditional data quality solutions, so organisations should not restrict the volume of entries or data used to identify matching rules (Kudikala 2017). Effective use of ML also requires systems that measure how well the ML technologies are performing.

Various organisations have recognised the importance of ML in data management. NASA is a good example: it has found that ML can be applied in contexts such as assessing the quality of scientific data, and it uses ML algorithms to detect anomalies and other unusual data values (Kudikala 2017). Finding ways to remove or avoid anomalies is essential for reaching business or project milestones, so firms that use ML algorithms are better placed to identify obstacles to their success. Data processing engines that handle massive volumes of data, such as Spark, also make it possible for developers to use ML libraries on those platforms.

Data matching with ML algorithms is a four-step process. The first step is to pre-analyse the dataset with the tMatchPairing component, which identifies suspect record pairs whose match score lies between a suspect threshold and the match threshold; the match scores also form part of the output dataset. In the second step, data stewards manually label each suspect pair as either ‘match’ or ‘non-match’, and the Talend Data Stewardship console can be used to streamline this labelling. In the third step, a sample of the labelled results from step 2 is fed into the tMatchModel component so that it can ‘learn’; the output of this step is an ML classification model.
The model is also validated in this step. In the fourth and final step, the tMatchPredict component uses the model generated in step 3 to predict matches for new data entries (Kudikala 2017). The figure below shows the ML architecture.

Figure 1: The Machine Learning (ML) Architecture

Benefits of Using Machine Learning Algorithms to Improve Data Quality (Large Organisations)

One significant benefit of using ML algorithms is more accurate product and inventory data. With the help of these algorithms, large organisations can reduce costs and achieve greater efficiency (UST Global 2017). ML technology creates new data rules in the organisation, which makes error detection and correction more efficient. Both data accuracy and the speed with which it is achieved have a positive impact on businesses.

One case study involved a global retailer with supply chain and store operations across the world, which was expending massive effort on the quality of its operational data before it adopted an ML solution. Its worldwide network of stores and diverse range of products generated huge data volumes, and keeping that data accurate was critical to the business. The retailer's manual processes had several adverse consequences, such as duplicated effort and misplaced inventory, that caused significant losses. Given these issues, the retailer needed an ML architecture to improve data quality, together with new data quality rules that would make error detection and prevention more efficient. The ML architecture stood out as the most effective solution to the organisation's problems.
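The case study does not describe the retailer's data quality rules in detail. As an illustration only, the Python sketch below shows what simple rules for inventory records might look like: they cover the two problems mentioned (duplicated entries and misplaced inventory) plus an impossible negative stock count. The field names, sample rows, and rules are all hypothetical assumptions, and the rules are hand-written rather than generated by an ML system.

```python
from collections import Counter

# Hypothetical inventory rows; field names are illustrative, not from the case study.
INVENTORY = [
    {"sku": "A100", "store": "S1", "qty": 12, "location": "Aisle 3"},
    {"sku": "A100", "store": "S1", "qty": 12, "location": "Aisle 3"},  # duplicated entry
    {"sku": "B200", "store": "S2", "qty": -4, "location": "Aisle 1"},  # impossible count
    {"sku": "C300", "store": "S1", "qty": 7, "location": None},        # location unknown
]

# Row-level rules: each predicate returns True when the row passes.
ROW_RULES = {
    "non_negative_qty": lambda row: row["qty"] >= 0,
    "has_location": lambda row: bool(row["location"]),
}


def find_violations(rows):
    """Return (row_index, rule_name) pairs for every failed check."""
    violations = []
    for i, row in enumerate(rows):
        for name, passes in ROW_RULES.items():
            if not passes(row):
                violations.append((i, name))
    # Cross-row rule: each SKU should be recorded only once per store.
    counts = Counter((row["sku"], row["store"]) for row in rows)
    for i, row in enumerate(rows):
        if counts[(row["sku"], row["store"])] > 1:
            violations.append((i, "duplicate_sku_store"))
    return violations


if __name__ == "__main__":
    for index, rule in find_violations(INVENTORY):
        print(f"row {index}: failed {rule}")
```

Keeping rules as named predicates means new rules can be added without changing the checking loop, which is the kind of extensibility an organisation needs when its rule set keeps growing.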
Speed and accuracy turned out to be the main advantages of the system throughout its implementation. High scalability was another benefit realised by the organisation (UST Global 2017); it arose from the ability of the cloud-based solution to meet the organisation's growing business demands. Detecting and correcting errors also became more efficient: in test runs, the system detected and corrected errors in 30% of cases and filled in missing data in 5% of cases. Overall, the organisation benefited from an improvement in the quality of its data because the system could detect and correct errors.

Benefits of Using the Machine Learning Algorithm to Improve Data Quality (Small Organisations)

For small organisations, ML algorithms have played a major role in gaining competitive advantage in their markets by helping them address common data quality constraints. First, the technology helps small organisations accumulate and process large amounts of information on their products (MIT Technology Review 2016). Firms find it difficult to collect, process, and store ever-increasing volumes of data (Kerr 2000), and integrating that data is another challenge. Customers now interact with the manufacturers of their brands through multiple devices, touch points, screens, and channels, and each interaction creates data. This volume of information exchange makes it difficult for organisations to collect all of it and use it for the intended purpose. Companies have to collect, integrate, and use customer survey information, application data, and attribution and advertising data.
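Integrating data from several channels depends on shared, consistently formatted fields, as discussed under comparability. The Python sketch below merges three hypothetical extracts (survey, application, and advertising data) into one profile per customer after normalising the customer key. The source names, field names, and normalisation rule (trim whitespace and upper-case) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical per-channel extracts; each source names and formats the key differently.
SURVEY = [{"CustomerID": "c001", "satisfaction": 4}]
APP = [{"cust_id": "C001 ", "sessions": 12}, {"cust_id": "C002", "sessions": 3}]
ADS = [{"customer": "C002", "clicks": 5}]

# Each source name maps to (rows, name of the field holding the customer key).
SOURCES = {
    "survey": (SURVEY, "CustomerID"),
    "app": (APP, "cust_id"),
    "ads": (ADS, "customer"),
}


def normalise_key(raw: str) -> str:
    """Standardise customer keys so records from different channels can be linked."""
    return raw.strip().upper()


def integrate(sources):
    """Merge all sources into one profile per customer.

    Returns (profiles, seen_in): profiles maps each key to its merged fields
    (a later source overwrites an earlier field with the same name), and
    seen_in maps each key to the set of sources that mentioned it.
    """
    profiles = defaultdict(dict)
    seen_in = defaultdict(set)
    for name, (rows, key_field) in sources.items():
        for row in rows:
            key = normalise_key(row[key_field])
            for field, value in row.items():
                if field != key_field:
                    profiles[key][field] = value
            seen_in[key].add(name)
    return dict(profiles), dict(seen_in)


if __name__ == "__main__":
    profiles, seen_in = integrate(SOURCES)
    for key, profile in profiles.items():
        missing = set(SOURCES) - seen_in[key]
        print(key, profile, "missing from:", sorted(missing))
```

Comparing each customer's contributing sources with the full list of sources also gives a quick completeness check across channels.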
Besides the large volume of data, small organisations also have to deal with its increasing variety. This is where ML algorithms step in, integrating and using all the data for the benefit of the organisation. ML systems also help small organisations analyse large volumes of data to identify valuable insights (MIT Technology Review 2016); the challenge extends beyond mere analysis to identifying insights that are valuable to the business. Machine learning is a type of artificial intelligence that enables computers to learn without being explicitly programmed. Its algorithms learn from the submitted data and adapt as they are exposed to new data, which makes it possible to identify the important insights. Machine learning is therefore beneficial to both business analysts and business teams.

Machine learning also lets organisations concentrate only on the issues that matter. If nothing needs attention and the system reports zero anomalies on a given day, the organisation saves time that it can devote to critical decisions. ML technologies also correct errors rather than merely detecting them, so small organisations that adopt them can focus on the issues that need human attention while routine errors are corrected automatically. As mentioned earlier, machine learning automates the analysis process. However, analysis is useless if the system or organisation cannot use its results to take the necessary action. Converting analysis into action is the measure of success for any organisation (MIT Technology Review 2016).
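Reporting zero anomalies on a given day presupposes a rule for deciding what counts as unusual. The Python sketch below uses a simple rolling z-score, which is a statistical baseline and not a machine learning model, to flag days whose value lies more than three standard deviations from the mean of the previous seven days. The window length, the limit, and the sample daily order counts are assumptions chosen for illustration.

```python
from statistics import mean, stdev

# Hypothetical daily order counts; day 8 contains an obvious spike.
ORDERS = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]


def flag_anomalies(daily_values, window=7, z_limit=3.0):
    """Return indices of days whose value is more than `z_limit` standard
    deviations from the mean of the preceding `window` days.

    `window` must be at least 2 because `statistics.stdev` needs two values.
    """
    flagged = []
    for i in range(window, len(daily_values)):
        history = daily_values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is treated as unusual.
            if daily_values[i] != mu:
                flagged.append(i)
            continue
        if abs(daily_values[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    print("anomalous days:", flag_anomalies(ORDERS))
```

An empty result corresponds to a day with nothing to report. An ML-based detector would replace the fixed z-score rule with a learned model of normal behaviour.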
With the help of analytics officers, business partners can use machine learning systems to set project deliverables and to determine how sophisticated analysis and algorithms can support the business's efforts.

Reference List

Kerr, K., 2000. The development of a data quality framework and strategy for the New Zealand Ministry of Health.
Kudikala, N., 2017. Using Machine Learning for Data Quality. Talend. Available from: https://www.talend.com/blog/2017/03/20/machine-learning-impact-data-quality-matching/
MIT Technology Review, 2016. How Analytics and Machine Learning Help Organizations Reap Competitive Advantage.
Tee, S.W., Bowen, P.L., Rohde, F.H. and Doyle, P., 2005. An empirical investigation of factors influencing organisations to improve data quality in their information systems.
The Royal Society, 2017. Machine learning: the power and promise of computers that learn by example.
UST Global, 2017. Analytics and Machine Learning to Improve Data Quality. Available from: http://www.ust-global.com/analytics-analytics-and-machine-learning-improve-data-quality
