Data Mining and Prediction Modeling in Health Care

Background Information
Sickle Cell Anemia (SCA) is a lifelong, hereditary hematological disorder that results from an abnormality in the oxygen-carrying hemoglobin molecules in an individual’s red blood cells (American Accreditation Health care Commission [AAHCC], 2013). The abnormality arises from a mutation that is passed on through the inheritance of an abnormal hemoglobin (beta-globin) gene. This gene directs the production of HbS, or sickle hemoglobin, which converts the soft, rounded, biconcave red blood cells into rigid, brittle, crescent-shaped cells (Solanki, 2014). Although the disorder is genetic, individuals who inherit the sickle gene from only one parent carry the HbAS genotype and are generally asymptomatic. They carry the sickle cell gene, but the disorder does not manifest phenotypically. Such people are termed carriers (Solanki, 2014).
Statement of Problem
SCA increases a person’s susceptibility to infections and disease-related complications. Patients also experience episodes of intense pain (AAHCC, 2013). In some cases, SCA can be fatal because acute oxygen depletion leads to organ failure (AAHCC, 2013). Unlike nutritional anemias, which can be cured or alleviated with diets rich in iron, vitamin B12, and vitamin C, SCA can neither be cured nor alleviated through diet (Dampier et al., 2011). Close to 300,000 children are born with a subtype of SCA annually, and many of them do not live beyond the age of five because of complications resulting from their increased vulnerability to related diseases (Dampier et al., 2011). Fortunately, recent research has focused on disease-modifying drugs and proposed curative strategies and therapies that can minimize morbidity and improve prognosis (Maakaron, 2014).
Nonetheless, there is an acute dearth of efficient systems for data collection and data mining in this field. Such practices make it possible to collect massive data sets and convert them into useful knowledge, which in turn supports the development of disease-modifying drugs and curative strategies and therapies. In doing so, they could also reduce medical expenditure, reduce morbidity among SCA patients, and improve the quality of patient care.
Significance of Study
As noted above, SCA is a lifelong disorder. An accurate prognosis is therefore very important if SCA patients are to live normal lives with minimal morbidity. Improving SCA prognosis through data mining and predictive modeling, using strategies such as the Classification and Regression Tree (CART), has the potential to improve the lives of SCA patients (Berk, 2008). Once resources are channeled towards prognostic strategies, there is a high likelihood that the average lifespan of SCA patients will increase. Statistics indicate that in 1973 the average global lifespan for patients with SCA was only 14 years (Lewis, 2000). However, owing to technological advancement and growing investment in medical research and development, the average lifespan of SCA patients is now 48 years for women and 42 years for men (Lewis, 2000). Few studies focus on data mining and its relationship with SCA, SCA-related complications, and SCA patients’ health-related quality of life. This study aims to strengthen disease management strategies and allow for early detection of SCA-related complications. Data mining on information relating to SCA will benefit not only patients but also medical practitioners. Through data mining and CART systems, well-structured, adequately defined, and reliable clinical decision rules can be developed (Loh, 2011).
These reliable rules will play a major role in ensuring that new patients are appropriately classified into clinically important categories. Easier patient classification will support sound decisions about treatment methods or hospitalization, even in emergency scenarios. This will in turn reduce the instances of ethical dilemmas, which have been shown to cause moral distress among medical practitioners. Moreover, through computer-assisted analysis with CART systems and data mining programs, the data collected by health care institutions can be converted into useful pointers (Loh, 2011). Data mining, aided by computer programs, allows for the integration, synthesis, and synchronization of highly uncertain, high-dimensional, and widely distributed raw health data. This will go a long way towards uncovering undiscovered and unexpected prognostic dynamics, so that proper patient care systems can be established and SCA patients and their caregivers are placed under minimal stress.
Elements of the Health care System
To build a proper prognostic system, it is essential to identify the main predictor factors. Since the prognostic system at hand focuses on SCA, the main predictor factors are pregnancy, dactylitis, hemoglobin levels, and White Blood Cell (WBC) count (Pekelis, 2013). In pregnancy, the rates of fetal loss, premature births, and underweight children are critical factors to look out for in SCA prognosis. For the hand-foot syndrome (dactylitis), the system will mainly focus on infants below the age of one year. For hemoglobin levels, the system will focus on patients with Hb levels below 7 g/dL. Finally, for WBC count, the system will focus on patients showing signs of leukocytosis even in the absence of an infection.
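The four predictor factors listed above can be expressed as simple screening checks. The following is a minimal sketch, not a clinical tool: the 7 g/dL hemoglobin cutoff and the one-year age limit come from the text, while the `PatientRecord` fields, the `flag_predictors` function, and the leukocytosis cutoff of 11,000 cells per microliter are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Cutoffs taken from the text.
HB_THRESHOLD_G_DL = 7.0
DACTYLITIS_AGE_LIMIT_YEARS = 1.0
# Assumed placeholder: the text gives no numeric definition of leukocytosis.
WBC_LEUKOCYTOSIS_PER_UL = 11_000


@dataclass
class PatientRecord:
    """Hypothetical record layout; field names are illustrative only."""
    age_years: float
    hemoglobin_g_dl: float
    wbc_per_ul: float
    has_dactylitis: bool
    has_infection: bool
    is_pregnant: bool = False


def flag_predictors(p: PatientRecord) -> List[str]:
    """Return the names of the predictor factors that apply to this patient."""
    flags = []
    if p.is_pregnant:
        flags.append("pregnancy")
    if p.has_dactylitis and p.age_years < DACTYLITIS_AGE_LIMIT_YEARS:
        flags.append("early_dactylitis")
    if p.hemoglobin_g_dl < HB_THRESHOLD_G_DL:
        flags.append("low_hemoglobin")
    if p.wbc_per_ul > WBC_LEUKOCYTOSIS_PER_UL and not p.has_infection:
        flags.append("leukocytosis_without_infection")
    return flags
```

A record for a ten-month-old with dactylitis, a hemoglobin level of 6.5 g/dL, and an elevated WBC count without infection would return three flags. In a real system the cutoffs would come from clinical guidance rather than from constants like these.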
The predictor factors above were selected for the following reasons. First, the hand-foot syndrome (dactylitis) mainly affects children under five years of age. Identifying children with the syndrome before the age of one year will ensure that they are placed under appropriate care and medication, reducing their vulnerability to infections and SCA-related diseases. In short, the hand-foot syndrome is a significant indicator of SCA severity, since children who have it before the age of one year are most likely to have a severe clinical course. Second, hemoglobin levels play a significant predictive role because the disorder detrimentally affects hemoglobin (Hb) levels. If a child records a baseline Hb level below 7 g/dL, there is a high probability that the child will suffer from severe SCA in future (Pekelis, 2013). Once such low Hb levels are recorded, the child will be placed under medical scrutiny to minimize future morbidity. Finally, the WBC count indicates the level of immune defense activity in a patient’s body (Solanki, 2014). If a child’s WBC count is higher than normal in the absence of an infection, it is likely that the immune system is responding to the changes brought about by the HbS (sickle hemoglobin) gene. Such a child should be placed under medical scrutiny if a severe or fatal case of SCA is to be avoided.
The selection of these parameters was based not only on health facts but also on statistics from research studies conducted in different parts of the world. According to Maakaron (2014) of MedScape, SCA mortality is very high during childhood. Reasons for this, particularly in Sub-Saharan Africa, include the lack of diagnosis, misdiagnosis, and the sporadic and insufficient nature of data on child mortality.
In other cases, mortality from SCA, a disorder dubbed “the suffering,” is considered taboo, so death from it is attributed to other diseases such as malaria. This corrupts infant mortality data and hinders prognostic studies, which contributes to high child mortality due to SCA. Data from the Brazilian National Newborn Screening Program indicate that of the 3,500 children born with SCA annually, 20 percent die of it before the age of five years (Maakaron, 2014). The infant mortality rate due to SCA was 25 percent until the Rio de Janeiro Blood Center shed light on the importance of SCA prognosis. When the organization initiated a program that provided proper treatment, adequate attention, and care to children suffering from SCA, the infant mortality rate due to SCA fell to just 2.5 percent. Additionally, a 1995 study by the Cooperative Study of Sickle Cell Disease (CSSCD) indicates that the introduction of penicillin prophylaxis and pneumococcal vaccinations for children with SCA or SCA traits reduced the instances of acute chest syndrome, which was the main fatal disease associated with SCA. Early diagnosis and treatment thus boosted survival rates, which justifies prognostic efforts focused on children.
To cope with the numerous features interacting in complicated and non-linear ways, the suggested system will use a partitioned system design. Prediction trees, as the name suggests, use tree-like algorithms to represent the recursive partitioning of the data into smaller hierarchical clusters and regions (Maakaron, 2014). Each path from the root of the tree ends at a terminal node, or leaf, which corresponds to a particular cell. A point x belongs to a leaf if x falls in the corresponding cell. To find the cell for a given point, one starts at the root node and asks a sequence of questions about the characteristics of that point.
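The traversal described above, from the root through a sequence of questions to a leaf, can be sketched as follows. This is an illustrative sketch only: the `Node` and `Leaf` classes, the `classify` function, and the example tree with its thresholds and leaf labels are hypothetical and are not taken from any fitted model.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node: the cell a point ends up in, with its label."""
    label: str


@dataclass
class Node:
    """Interior node: asks 'is x[feature] < threshold?'."""
    feature: str
    threshold: float
    below: Union["Node", Leaf]        # followed when the answer is yes
    at_or_above: Union["Node", Leaf]  # followed when the answer is no


def classify(tree: Union[Node, Leaf], x: Dict[str, float]) -> str:
    """Walk from the root to a leaf, answering one question per interior node."""
    node = tree
    while isinstance(node, Node):
        node = node.below if x[node.feature] < node.threshold else node.at_or_above
    return node.label


# Hypothetical tree for illustration; thresholds and labels are assumptions.
EXAMPLE_TREE = Node(
    feature="hemoglobin_g_dl",
    threshold=7.0,
    below=Leaf("closer monitoring"),
    at_or_above=Node(
        feature="wbc_per_ul",
        threshold=11_000,
        below=Leaf("routine follow-up"),
        at_or_above=Leaf("closer monitoring"),
    ),
)
```

Calling `classify(EXAMPLE_TREE, {"hemoglobin_g_dl": 9.0, "wbc_per_ul": 5000})` follows the right branch at the root and the left branch at the second node. In practice the tree would be learned from data by a CART implementation rather than written by hand.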
Using the CART system, all the interior nodes are labeled with questions, and the answers are labeled on the edges and branches between them. The answer given at one node therefore dictates the question asked at the next. The health care sector has many “predictor” variables, so the suggested system ought to be capable of making multiple comparisons across different data sets. Since different groups of patients differ in both variance and variation, the system has to accommodate randomly distributed predictor variables (Lewis, 2000). For instance, the value of variable A, such as a patient’s age, may greatly affect the importance of variable B, such as the same patient’s weight. As the number of interactions between the variables increases, it becomes more challenging to model them (Lewis, 2000). Alongside CART analysis, multivariate logistic regression models can be used to project a patient’s probability of disease. This probability is calculated from pre-recorded patient characteristics and regression coefficients, and it replaces the usual “high risk” versus “low risk” perception in current clinical practice with a graded estimate.
Using CART analysis in combination with existing health data, clinical decision rule frameworks can be built from large data sets. The dependent variable for every patient in a particular dataset could be drawn from the patient’s medical history. It can therefore be whether or not the patient has a history of the condition that medical practitioners hope to predict accurately in other patients, in this case SCA (Loh, 2011). Examples of such variables include elevated WBC counts and hemoglobin levels lower than 7 g/dL. Other variables could include patient characteristics that play a role in predicting the value of the dependent variable.
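The probability calculation mentioned above, which combines patient characteristics with regression coefficients, can be sketched with the standard logistic function. The function name, the predictor names, and every coefficient value below are hypothetical; real coefficients would have to be estimated from patient data.

```python
import math
from typing import Dict


def logistic_probability(intercept: float,
                         coefficients: Dict[str, float],
                         patient: Dict[str, float]) -> float:
    """Return 1 / (1 + exp(-z)) where z = intercept + sum(coef * value)."""
    z = intercept + sum(coef * patient[name] for name, coef in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values only; these are not fitted estimates.
EXAMPLE_INTERCEPT = 2.0
EXAMPLE_COEFFICIENTS = {
    "hemoglobin_g_dl": -0.5,   # assumed: lower hemoglobin raises the probability
    "wbc_thousands_per_ul": 0.1,
}

p = logistic_probability(
    EXAMPLE_INTERCEPT,
    EXAMPLE_COEFFICIENTS,
    {"hemoglobin_g_dl": 6.5, "wbc_thousands_per_ul": 15.0},
)
```

The result is a number between 0 and 1 rather than a two-way "high risk" or "low risk" label, which is the contrast the paragraph above draws.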
As an example, a medical practitioner who wished to predict the likelihood that a patient has SCA could use, as a predictor variable, a sudden elevation in WBC count or a sudden weight loss in a patient with no record of recent infection.
Advantages of Using CART Analysis in SCA Prognostic Systems
Medical analysts can deploy a wide array of methods to create prognostic programmes for the early detection and prediction of SCA (Loh, 2011). However, the CART system of analysis has the potential to make accurate predictions from a massive dataset based on a handful of simple if-then conditions. This system has a number of advantages, discussed below. First, the results of the CART system are very simple (Loh, 2011). Outputs such as survival rates, surgical urgency, and myocardial infarction are easy to read. This simplicity makes the system very useful in health care, where rapid patient classification is required, especially in emergency situations. Using this system of analysis, the practitioner evaluates just one or two conditions at a time, which is much easier than computing classification scores for all datasets (Loh, 2011). The output is also dominated by simple if-then statements, as opposed to the complex non-linear model equations produced by other analysis methods. As mentioned earlier, the output of the CART system of analysis is a series of simple if-then conditions called tree nodes. There is therefore no assumption that the relationship between the predictor variables and the dependent variable is linear, that it follows a specific non-linear link function, or that it is monotonic (Pekelis, 2013). The CART system is thus non-linear and non-parametric.
This non-parametric character makes CART suitable for data mining tasks in which there is little prior knowledge of the subject matter and no coherent set of theories, as is the case for SCA. Moreover, interest in the CART system of analysis has grown over the last decade because it uncovers some of the interactions between predictor variables. This has made it more popular than other traditional techniques.
Disadvantages of Using CART Analysis in SCA Prognostic Systems
The CART system also has its shortcomings. One of the major issues arises when it is applied to actual data, which is much more random than anticipated. In such instances, it becomes very difficult to decide when to stop splitting the dataset (Solanki, 2014). For example, with 10 medical SCA cases, up to 9 if-then conditions can be developed so that every single case is predicted exactly. The theory behind this is that continued splitting allows analysts to reproduce the data and hence predict the most probable outcomes (Solanki, 2014). However, a tree that splits until it reproduces the cases it was built from will not necessarily predict new cases well, which escalates the risks captured in the decision cost matrix. The decision cost matrix outlines the costs associated with misclassifying a new patient. Classifying a patient with an emergent health condition as non-urgent is a more costly error than classifying a patient with a non-urgent condition as urgent. Additionally, most statisticians lack adequate knowledge of how CART analysis systems work. This has hindered the acceptance of the method and the credibility of its output with the public (Solanki, 2014). Until recently, CART analysis systems have been very difficult to use, so most practitioners prefer other traditional techniques.
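The decision cost matrix described above can be sketched as a lookup table that weights each kind of misclassification. The cost values and the `total_cost` function below are hypothetical; the only property taken from the text is that classifying an urgent patient as non-urgent costs more than the reverse error.

```python
from typing import Dict, Iterable, Tuple

# Keys are (actual, predicted). The numeric weights are assumptions for illustration.
COST_MATRIX: Dict[Tuple[str, str], float] = {
    ("urgent", "urgent"): 0.0,
    ("non_urgent", "non_urgent"): 0.0,
    ("urgent", "non_urgent"): 10.0,  # missed emergency: the more costly error
    ("non_urgent", "urgent"): 1.0,   # false alarm: the less costly error
}


def total_cost(pairs: Iterable[Tuple[str, str]],
               cost_matrix: Dict[Tuple[str, str], float] = COST_MATRIX) -> float:
    """Sum the cost of each (actual, predicted) classification."""
    return sum(cost_matrix[pair] for pair in pairs)


# One missed emergency, one false alarm, one correct classification.
example_pairs = [
    ("urgent", "non_urgent"),
    ("non_urgent", "urgent"),
    ("urgent", "urgent"),
]
example_cost = total_cost(example_pairs)
```

Evaluating competing trees on patients that were not used to build them, and comparing their total cost under such a matrix, is one way to decide when further splitting stops being worthwhile.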
The relative novelty of CART analytical systems has made it difficult to find statisticians with expertise in them. This makes it challenging for people willing to use CART analytical systems to locate advisors and assistance. Since it is not considered a standard analysis technique, CART is normally excluded from most statistical software packages, such as SAS.
Classification and Regression Tree (CART) analysis is a highly potent system, especially in the clinical research arena. The CART system can be easily integrated into the operations and databases of health care organizations, since its uses are diverse and it is particularly beneficial in prognostic studies of Sickle Cell Anemia. Using classification algorithms, medical analysts and practitioners can continuously analyse blood samples with respect to age and create prediction models that support early diagnosis, thus reducing morbidity in SCA patients. The application of CART will also play a significant role in patient classification, which in turn streamlines health institutions’ operations.
References
American Accreditation Health care Commission. (2013). Sickle Cell Anemia. Health Guide, New York Times. Retrieved on 12th June 2015 from: http://www.nytimes.com/health/guides/disease/sickle-cell-anemia/prognosis.html
Berk, R. D. (2008). Statistical Learning from a Regression Perspective. Springer Series in Statistics. New York: Springer-Verlag.
Dampier, C. K., LeBeau, P. H., Rhee, S. T., Lieff, S. B., Kesler, K. T., Ballas, S. J., Comprehensive Sickle Cell Centers (CSCC) Clinical Trial Consortium (CTC) Site Investigators. (2011). Health-Related Quality of Life in Adults with Sickle Cell Disease (SCD): A Report from the Comprehensive Sickle Cell Centers Clinical Trial Consortium. American Journal of Hematology, 86(2), 203–205. doi:10.1002/ajh.21905
Lewis, R. G. (2000). An Introduction to Classification and Regression Trees (CART) Analysis. Harbor-UCLA Medical Center, Department of Emergency Medicine, 1(1), 1-13. Retrieved on 12th June 2015 from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.4103&rep=rep1&type=pdf
Loh, W. K. (2011). Classification and Regression Trees. WIRES Data Mining Knowledge Discovery. John Wiley and Sons, Inc, 1(1), 14-23. Retrieved on 12th June 2015 from: http://www.stat.wisc.edu/~loh/treeprogs/guide/wires11.pdf
Maakaron, J. K. (2014). Sickle Cell Anemia. MedScape. Drugs and Diseases. Retrieved on 12th June 2015 from: http://emedicine.medscape.com/article/205926-overview#aw2aab6b2b7aa
Pekelis, L. J. (2013). Classification and Regression Trees: A Practical Guide for Describing a Dataset. Classification and Regression Trees, Biocoastal Datafest, Stanford University. Retrieved on 12th June 2015 from: http://statweb.stanford.edu/~lpekelis/talks/13_datafest_cart_talk.pdf
Solanki, A. D. (2014). Data Mining Techniques Using WEKA Classification for Sickle Cell Disease. Research Scholar, JJT University. International Journal of Computer Science and Information Technologies, 5(4), 5857-5860. Retrieved on 12th June 2015 from: http://www.ijcsit.com/docs/Volume%205/vol5issue04/ijcsit20140504222.pdf
