Development of a Predictive Model by Making Use of a Data from Canberra Weather Measurements - Coursework Example

Summary

The paper "Development of a Predictive Model by Making Use of a Data from Canberra Weather Measurements" carries out data mining on a weather dataset. A Decision Tree and a Random Forest were built with the software tool Rattle to predict rain. …

Subject: Information Technology
Type: Coursework
Level: Undergraduate
Pages: 10 (2500 words)

Extract of sample "Development of a Predictive Model by Making Use of a Data from Canberra Weather Measurements"

Table of contents
1.0 Abstract
2.0 Introduction
3.1 Data Preparation
3.2 Data Pre-processing
3.3 Variable Analysis
3.4 Modeling
4.0 Result
4.1 Missing Values
4.2 Outliers
4.3 Variable Analysis
4.4 Model Building
4.4.1 Decision Tree
4.4.2 Random Forest
4.5 Evaluation of Models
4.5.1 Confusion Matrix
4.5.2 Matthews Correlation Coefficient
4.5.3 Precision, Recall and Specificity
4.5.4 ROC Curve
5.0 Discussion
6.0 Conclusion
7.0 Reference

1.0 Abstract
This assignment involved developing a predictive model from Canberra weather measurements to predict the probability of rain tomorrow. The data were obtained from the Australian Bureau of Meteorology. Several software tools were used to carry out data mining on the weather dataset. Two models, a Decision Tree and a Random Forest, were built and evaluated in Rattle, a data mining tool, to predict whether it will rain tomorrow.

2.0 Introduction
Weather prediction uses mathematical models of the atmosphere and oceans to forecast the weather from the prevailing conditions at a particular moment. It was first attempted in the 1920s, but realistic results were only produced after the advent of computer simulation in the 1950s. Several global and regional models are in use in different countries around the world; current weather observations from radiosondes and weather satellites are used as inputs to these models. Mathematical models based on the same physical principles can generate either short-term weather forecasts or long-term climate predictions. Improvements in regional models have led to significant improvements in forecasts of air quality and tropical cyclone tracks.
This report describes a deliberate attempt to build a sound predictive model from Canberra weather measurements that predicts whether it will rain tomorrow. Weather prediction is important because a wide range of users rely on it, including sports organizations, travelers and farmers. Several techniques can be used to build predictive models, including decision trees and random forests. Rattle and JMP are two of the software tools that can be used to build the models and to carry out data mining on the weather dataset. The model-building process consists of three steps:
a) Data collection: collecting and preparing the data.
b) Data preparation: handling the data and putting it into an appropriate form for further analysis and processing.
c) Preliminary data analysis: an initial analysis of the data based on the needs and requirements.
At the end of the modeling process, the aim is to obtain accurate and adequate models and to use appropriate tools to determine the possibility of rain tomorrow.

3.0 Method
The dataset consists of 13 months of Canberra weather, from 1/6/2013 to 23/7/2012, and was obtained online from the Australian Bureau of Meteorology. It contains 420 days (rows) and 22 variables (columns). The detailed method and the procedure for building the models are described below.

3.1 Data Preparation
The columns were given shorter, easier-to-understand names, which made the data easier to manipulate in the software tools mentioned earlier. A new variable, Rain Tomorrow, was added to the dataset.
The rainfall variable was used to create this new categorical variable: it was set to "Yes" if it rained the next day and "No" if it did not.

3.2 Data Pre-processing
The pre-processing steps were:
· Identifying missing values in the dataset and rectifying them by deletion.
· Locating and eliminating outliers.
· Transforming data values from the format of the source system into the format required by the target system.

3.3 Variable Analysis
The variables were analysed with Rattle and SPSS to establish their distributions and the correlations between them; bar plots and histograms were used to visualize the data.

3.4 Modeling
The models were built with Rattle, which supports decision trees, random forests, boosted decision trees and other models. A decision tree and a random forest were built for this study. The models were evaluated in Rattle using the error matrix and the ROC (receiver operating characteristic) curve.

4.0 Result
The Canberra weather dataset was analysed using Rattle and SPSS, and the observations are reported in this section.

4.1 Missing Values
Rattle showed that 26 observations in the dataset had missing values, which was small compared with the total number of observations (0.26% of total observations).

4.2 Outliers
The distribution graphs of several variables, chosen at random but strongly related to whether it rains tomorrow, were checked for possible outliers. Figure 3 shows that evaporation and 9am relative humidity were the only variables with outliers. However, the number of possible outliers was negligible compared with the total number of observations.
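The labelling, deletion and outlier steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the Rattle workflow the report used: the daily records are assumed to be dictionaries in date order, the column names Rainfall and Evaporation and all function names are hypothetical, a day is labelled "Yes" when the next day's rainfall exceeds a threshold (0 mm here, also an assumption), and outliers are flagged with Tukey's 1.5 × IQR box-plot rule, which is swapped in for the visual inspection of distribution graphs described in 4.2.

```python
from statistics import quantiles


def add_rain_tomorrow(days, rain_key="Rainfall", threshold_mm=0.0):
    """Return copies of the daily records with a RainTomorrow label taken from
    the next day's rainfall. Column name and threshold are assumptions."""
    labelled = []
    for i, day in enumerate(days):
        record = dict(day)
        # The last day has no following day, so its label stays None (missing).
        nxt = days[i + 1].get(rain_key) if i + 1 < len(days) else None
        if nxt is None:
            record["RainTomorrow"] = None
        else:
            record["RainTomorrow"] = "Yes" if nxt > threshold_mm else "No"
        labelled.append(record)
    return labelled


def drop_missing(records):
    """Delete every record that has at least one missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's box-plot rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical three-day example: d3 has a missing Evaporation value and no next day.
days = [
    {"Date": "d1", "Rainfall": 0.0, "Evaporation": 4.0},
    {"Date": "d2", "Rainfall": 3.2, "Evaporation": 5.0},
    {"Date": "d3", "Rainfall": 0.0, "Evaporation": None},
]
clean = drop_missing(add_rain_tomorrow(days))
print([d["RainTomorrow"] for d in clean])        # ['Yes', 'No']
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # [100]
```

Deleting incomplete rows is the simplest reading of "rectification (through deleting)"; note that it always removes the final day, whose label cannot be known.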
Figure 3: Distribution graphs of selected variables

4.3 Variable Analysis
The distribution graphs of the numeric variables, as in the examples in Figure 3, show that the distributions were unimodal: "9am relative humidity" and "9am MSL pressure" were almost normally distributed, while the others were either left- or right-skewed. The bar plot of Rain Tomorrow (Figure 4) shows that 32% of the observations were "Yes" and 67% were "No".

Figure 4: Bar plot of Rain Tomorrow

4.4 Model Building
The Canberra dataset was randomly partitioned into three independent subsets: a training dataset, a testing dataset and a validation dataset. As is usual, the training dataset was used to build the models. The following models were built:
· Decision Tree
· Random Forest

4.4.1 Decision Tree
The Decision Tree model was built using Rattle. Figure 5.1 shows the settings Min Split = 20, Min Bucket = 7, Max Depth = 30 and Complexity = 0.0100, and a root node error of 0.39249 (115/293). Figure 5.2 shows the decision tree produced by pressing the Execute button with these settings.

Figure 5.1: Decision tree settings in Rattle
Figure 5.2: Decision tree

The main path through the decision tree above is to the right, i.e. there is an 88% probability of rain tomorrow if 9am humidity …
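The random partition and root node error in 4.4 and 4.4.1, and the error matrix and ROC curve named in 3.4 for evaluation, can be illustrated with a small standard-library Python sketch. It is not the Rattle/rpart code the report used: the 70/15/15 proportions, the seed 42, the positive class label "Yes" and every function name are assumptions for illustration. The root node error is the error of always predicting the majority class, so 115 minority cases out of 293 training rows gives 115/293 ≈ 0.39249. The AUC is computed with the pairwise (Mann-Whitney) formulation instead of tracing the curve point by point.

```python
import math
import random
from collections import Counter


def partition(rows, train=0.70, validate=0.15, seed=42):
    """Shuffle rows and split them into independent training, validation and
    test sets. Proportions and seed are illustrative assumptions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validate)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def root_node_error(labels):
    """Error rate of always predicting the majority class at the root."""
    majority = Counter(labels).most_common(1)[0][1]
    return (len(labels) - majority) / len(labels)


def error_matrix(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives, treating `positive` as rain."""
    m = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            m["TP" if a == positive else "FP"] += 1
        else:
            m["FN" if a == positive else "TN"] += 1
    return m


def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def metrics(m):
    """Error rate, precision, recall, specificity and Matthews correlation."""
    tp, fp, tn, fn = m["TP"], m["FP"], m["TN"], m["FN"]
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "error_rate": _ratio(fp + fn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(actual, scores, positive="Yes"):
    """Area under the ROC curve: probability that a random rainy day scores
    higher than a random dry day (ties count as half)."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train, val, test = partition(range(420))
print(len(train), len(val), len(test))                         # 294 63 63
print(round(root_node_error(["Yes"] * 115 + ["No"] * 178), 5))  # 0.39249
```

Keeping the test set untouched until the end, and tuning only against the validation set, is what makes the reported error rates an honest estimate.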

Cite this document

(Development of a Predictive Model by Making Use of a Data from Coursework Example | Topics and Well Written Essays - 2500 words, n.d.)
Development of a Predictive Model by Making Use of a Data from Coursework Example | Topics and Well Written Essays - 2500 words. https://studentshare.org/information-technology/2064055-in-this-assignment-you-will-be-working-with-real-world-data-and-real-world-problems-you-will-be
