Properties of the Decision Tree in the WEKA - Essay Example

Summary
This paper, "Properties of the Decision Tree in the WEKA", focuses on the fact that the class attribute “Diabetic” is taken as the reference for all other attributes in the histogram. The association algorithm, along with the J48 decision tree, is used for the comparative analysis of the final_medicaldata file. …
Extract of sample "Properties of the Decision Tree in the WEKA"

Examination of Each Attribute

The class attribute “Diabetic” is taken as the reference for all other attributes in the histogram. Apriori, as the association algorithm, and the J48 decision tree are used for the comparative analysis of the final_medicaldata file, using the data mining package WEKA. The main characteristics are described below.

J48: J48 is a decision tree learner. It uses the gain ratio to choose the attribute on which to split and grows the tree depth-first. A pruning method replaces subtrees with leaves, which reduces overfitting. WEKA offers a choice between a pruned and an unpruned tree, as shown in Figure 1.

Figure 1: Properties of the Decision tree in the WEKA (J48)

In addition to the above features, WEKA provides the following test options for evaluating a classifier.

Use Training Set: The classifier is evaluated on how well it predicts the class of the instances it was trained on.

Supplied Test Set: The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file.

Cross Validation: The classifier is evaluated by cross validation, using the number of folds entered in the Folds text field of the WEKA Explorer.

Percentage Split: The classifier is evaluated on how well it predicts a percentage of the data that is held out for testing. The percentage field specifies the share of the data used for training, and the remainder is reserved for testing. The default value is 66%, so 66% of the data is used for training and the remaining 34% for testing.

Figure 2: WEKA with testing options

Decision tree performance is determined by examining cross validation and percentage split on the provided medical dataset.
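The cross validation and percentage split options described above can be illustrated with a minimal sketch. This is not WEKA's implementation: the function names, the fixed seed, the rounding rule, and the plain (unstratified) shuffling are assumptions made for illustration only.

```python
import random


def percentage_split(n_instances, train_percent=66.0, seed=1):
    """Shuffle instance indices and split them into (train, test) lists.

    Assumption: the training size is rounded to the nearest instance.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = round(n_instances * train_percent / 100.0)
    return indices[:n_train], indices[n_train:]


def cross_validation_folds(n_instances, n_folds=10, seed=1):
    """Yield (train, test) index lists so each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds groups of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 100 instances.
train, test = percentage_split(100)
print(len(train), len(test))
```

With cross validation, every instance is used for testing once and for training in the remaining folds, whereas a percentage split tests on a single held-out portion.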
Usage of Cross Validation for Generation of the Decision Tree

With cross validation, J48 offers flexibility in controlling factors such as the size of the training set and the confidence factor. The confidence factor is used to reduce the classification error rate, and it governs how the tree is pruned. Increasing the confidence factor and removing noise from the training data gives the classifier the opportunity to classify instances more accurately. A confidence factor of 95% was used for the dataset and led to an outstanding outcome: 89.2% of the instances were classified correctly and only 10.7% incorrectly, as shown in Figure 3.

Figure 3: Use of cross validation with the J48 decision tree option to generate the results in WEKA.

Figure 3 shows the detailed results of the J48 decision tree. The confusion matrix is the key element of the figure: it describes the ways in which a classifier errs when predicting a class. According to Dunham (2003), the confusion matrix indicates the correctness of the solution to a given classification problem. It is also known as a contingency table. For a dataset with two classes, the confusion matrix has two rows and two columns, as shown in Figure 4.

Figure 4: Confusion Matrix (predicted class versus actual class)

Here FP is the number of negatives incorrectly classified as positives; these are called commission errors. TP is the number of positives classified correctly, and TN is the number of negatives classified correctly. FN is the number of positives incorrectly classified as negatives; these are called omission errors. Predictive accuracy is one way of measuring the performance of a classifier.
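The four cells of the confusion matrix can be counted directly from paired actual and predicted labels. The sketch below is illustrative only: the function name, the choice of class "a" as the positive class, and the five toy labels are assumptions, not data from the essay.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN and FN for one class treated as 'positive'."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # positive correctly classified
        elif p == positive:
            fp += 1  # negative classified as positive (commission error)
        elif a == positive:
            fn += 1  # positive classified as negative (omission error)
        else:
            tn += 1  # negative correctly classified
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical labels for five instances.
actual = ["a", "a", "b", "b", "b"]
predicted = ["a", "b", "b", "a", "b"]
counts = confusion_counts(actual, predicted, positive="a")
print(counts)  # {'TP': 1, 'FP': 1, 'TN': 2, 'FN': 1}
```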
Predictive accuracy is the success rate calculated from the confusion matrix.

Predictive Accuracy = (TP + TN) / (TP + TN + FN + FP) × 100

Figure 3 shows that the decision tree predicted 323 instances correctly and 39 incorrectly. According to the confusion matrix in Figure 3, for class a (patients with type 1 diabetes) only 26 instances were predicted correctly and 22 incorrectly, while for class b (patients with type 2 diabetes) 297 instances were predicted correctly and 17 incorrectly. Cross validation is thus used with J48 to measure the predictive accuracy.

Figure 4: Measuring the predictive accuracy

Figure 5: Decision Tree

Use of Percentage Split to Generate the Decision Tree

By default, the percentage split technique uses 34% of the data for testing and 66% for training. Here 200 instances are used for training and 123 for testing. At this stage, the J48 decision tree gives better results than with cross validation, an increase of about 1% in classification accuracy. Figure 6 shows that 111 of the 123 instances are classified correctly, a test result of 90.2%, and only 12 are classified incorrectly. The confusion matrix in Figure 7 shows that for class a (patients with type 1 diabetes) 7 instances were predicted correctly and 5 incorrectly, while for class b (patients with type 2 diabetes) 104 were predicted correctly and 7 incorrectly. Hence the predictive accuracy is measured with J48 through the percentage split.

Figure 6: Predictive Accuracy

Figure 7: WEKA used for generation of results from the Percentage Split option on the J48 decision tree.
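As a check on the figures above, the predictive accuracy formula can be applied to the counts reported for both runs. The function name and the choice of class a as the positive class are assumptions for illustration. Accuracy does not depend on which class is called positive.

```python
def predictive_accuracy(tp, tn, fp, fn):
    """Percentage of instances classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn) * 100


# Cross validation run: class a 26 correct / 22 incorrect,
# class b 297 correct / 17 incorrect (class a assumed positive).
cv_accuracy = predictive_accuracy(tp=26, tn=297, fp=17, fn=22)

# Percentage split run: class a 7 correct / 5 incorrect,
# class b 104 correct / 7 incorrect.
split_accuracy = predictive_accuracy(tp=7, tn=104, fp=7, fn=5)

print(round(cv_accuracy, 1))     # 89.2
print(round(split_accuracy, 1))  # 90.2
```

The two values differ by about one percentage point, which agrees with the improvement reported for the percentage split.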
Association Rules

As discussed in the earlier chapters, association rules are used to make predictions across all attributes and so establish relationships between them. Confidence and support are the key measures for relating the different attributes; they reflect the usefulness and certainty of the discovered rules. Apriori was used as the association rule algorithm and produced good relationships between the attributes. Because the association rule algorithm does not work on numerical data, the data was first put into ranges: WEKA's Discretize filter separated the data into ranges (bins), and the association rule algorithm was applied after this transformation.

References

Dunham, M.H. (2003). Data Mining: Introductory and Advanced Topics. Pearson Education, Inc.
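The binning step and the support and confidence measures from the Association Rules section can be sketched as follows. This is not Apriori itself and not WEKA's Discretize filter. The equal-width binning rule, the function names, and the four toy records (including the attribute names) are hypothetical and chosen only to illustrate the calculations.

```python
def equal_width_bin(value, low, high, n_bins):
    """Map a numeric value to a bin index in 0..n_bins-1 (equal-width bins)."""
    width = (high - low) / n_bins
    index = int((value - low) // width)
    return max(0, min(n_bins - 1, index))  # clamp values at or beyond the edges


def support(records, itemset):
    """Fraction of records that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for r in records if itemset <= set(r)) / len(records)


def confidence(records, antecedent, consequent):
    """Support of antecedent and consequent together, divided by the antecedent's support."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)


# Hypothetical records after discretization (not from final_medicaldata).
records = [
    {"glucose=high", "diabetic=type2"},
    {"glucose=high", "diabetic=type2"},
    {"glucose=high", "diabetic=type1"},
    {"glucose=low", "diabetic=type1"},
]

rule_support = support(records, {"glucose=high", "diabetic=type2"})
rule_confidence = confidence(records, {"glucose=high"}, {"diabetic=type2"})
print(rule_support)  # 0.5
```

Apriori searches for itemsets whose support exceeds a minimum threshold and then keeps the rules whose confidence is high enough. The sketch only evaluates a single, hand-picked rule.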