Bivariate Data Analysis of Sprint Cup Drivers

Assignment (Teresa Phares)
Math215

Table of Contents
Introduction
Data Collection
Statistical Analysis
    Independent and Dependent Variables
    Scatter Plot of Average Pit Stop Time and Number of Wins
    The Coefficient of Correlation
    The Equation of the Regression Line
    The Scatter Plot and Regression Trend Line
    Prediction Using Regression Model
    Residuals (Largest)
Conclusion
References
Appendix

Introduction

Bivariate data analysis examines the relationship between two variables. Visual displays, correlation analysis, and regression analysis are used to analyze bivariate data. A visual display such as a scatter plot gives a visual indication of the strength of the relationship or association between the two variables. The correlation coefficient measures the degree of linearity in the relationship between two variables. A linear regression fits the paired data of two variables to provide a model for prediction. In this paper, the average pit stop time and the number of wins for the top 30 Sprint Cup drivers (the Nextel Cup Series) over the whole 2008 season are used for bivariate correlation and regression analysis.

Data Collection

For this paper, the average pit stop time and the number of wins for the top 30 Sprint Cup drivers (the Nextel Cup Series) over the whole 2008 season were collected (Race Results, 2008; Sprint Cup Drivers, 2008). Table 1 (Driver’s Number of Win(s) and the Pit Stop Time) in the Appendix shows the data for these 30 drivers.

Statistical Analysis

Independent and Dependent Variables

The bivariate correlation and regression analysis determines whether the average pit stop time is related to the number of wins. Therefore, the average time is taken as the independent variable (x) and the number of wins as the dependent variable (y).

Scatter Plot of Average Pit Stop Time and Number of Wins

Figure 1: Scatter Plot of Average Pit Stop Time vs. Number of Wins

Figure 1 shows the scatter plot of average pit stop time and number of wins. As the average pit stop time increases, the number of wins decreases. Therefore, a negative relationship exists between average time and number of wins.

The Coefficient of Correlation

The correlation coefficient r is given by

$$ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} $$

where

$$ SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad SS_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 $$

(Doane & Seward, 2007). Substituting the values of SS_xy, SS_xx, and SS_yy from Table 2 (Sum of Squares) in the Appendix, the correlation coefficient is

$$ r = \frac{-31.020}{\sqrt{17.208 \times 184.800}} = -0.550 $$

The sample correlation coefficient r = -0.550 indicates a negative relationship between average pit stop time and number of wins for the top 30 Sprint Cup drivers. The correlation is significant at the α = 0.01 level of significance. For a two-tailed test at α = 0.01 with 28 degrees of freedom, the critical value of r is ±0.463. The value r = -0.550 is less than the left-tail critical value of -0.463, so the null hypothesis of no correlation is rejected, and the data provide sufficient evidence of correlation between average pit stop time and number of wins (Table 3: Correlation Matrix).

The Equation of the Regression Line

The regression equation is given by

$$ \hat{y} = b_0 + b_1 x $$

where the slope is $b_1 = SS_{xy} / SS_{xx}$ and the intercept is $b_0 = \bar{y} - b_1 \bar{x}$ (Doane & Seward, 2007). Substituting the values of SS_xy, SS_xx, x̄, and ȳ from Table 2 (Sum of Squares) in the Appendix, the slope and intercept of the linear regression equation are

$$ b_1 = \frac{-31.020}{17.208} = -1.803 $$

$$ b_0 = 1.200 - (-1.8026)(14.120) \approx 26.653 $$

Therefore, the regression equation is

$$ \hat{y} = 26.653 - 1.803x $$

or

Number of Wins = 26.653 - 1.803*(Average Time)

The slope of -1.803 suggests that each additional second of average pit stop time decreases the number of wins by approximately 1.8. The intercept implies approximately 26.7 wins at an average pit stop time of zero. However, the intercept is not meaningful, because the average pit stop time can never be zero.

The regression is significant at the α = 0.01 level of significance.
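The correlation and regression arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not part of the original assignment: it takes the sums of squares and means as printed in Table 2, the function name `correlation_and_line` is made up for illustration, and the t critical value 2.763 (two-tailed, α = 0.01, 28 degrees of freedom) is an assumption read from a standard t table rather than a number given in the paper.

```python
import math


def correlation_and_line(ss_xy, ss_xx, ss_yy, x_bar, y_bar, n):
    """Return r, slope, intercept and the t statistic for H0: no correlation."""
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    df = n - 2
    # t statistic for testing whether the population correlation is zero.
    t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return r, slope, intercept, t_stat


def critical_r(t_crit, df):
    """Convert a two-tailed t critical value into the matching critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Totals as printed in Table 2 (Sum of Squares).
r, slope, intercept, t_stat = correlation_and_line(
    ss_xy=-31.020, ss_xx=17.208, ss_yy=184.800,
    x_bar=14.120, y_bar=1.200, n=30,
)

# Assumption: t critical value for alpha = 0.01 (two-tailed), df = 28,
# taken from a standard t table.
r_crit = critical_r(2.763, 28)

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"t = {t_stat:.3f}, critical r = +/-{r_crit:.3f}")
print("reject H0 of no correlation:", abs(r) > r_crit)
```

The t statistic computed this way should agree with the t Stat reported for the slope in Table 4, since in simple linear regression the test of zero correlation and the test of zero slope are the same test.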
The high F statistic (12.148) for the overall regression indicates that the regression is significant at the α = 0.01 level of significance, which the p-value (0.002) confirms. The p-values for the slope and intercept are 0.002 and 0.001, respectively, so both are significant at α = 0.01 (Table 4: Regression Summary Output). The coefficient of determination is R² = 0.303. Therefore, the average pit stop time explains 30.3 percent of the variation in the number of wins for Sprint Cup drivers. The remaining 69.7 percent of the variation in the number of wins is not explained by average pit stop time.

The Scatter Plot and Regression Trend Line

Figure 2: Scatter Plot of Average Time vs. Number of Wins and Linear Trend Line

Figure 2 shows the scatter plot of average time vs. number of wins together with the linear trend line. The trend line approximately fits the data points.

Prediction Using Regression Model

Using the regression model, the predicted number of wins for average pit stop times of 12, 13, 14, and 15 seconds is:

For average pit stop time = 12 seconds: Number of Wins = 26.653 - 1.803*(12) = 5.017 ≈ 5
For average pit stop time = 13 seconds: Number of Wins = 26.653 - 1.803*(13) = 3.214 ≈ 3
For average pit stop time = 14 seconds: Number of Wins = 26.653 - 1.803*(14) = 1.411 ≈ 1
For average pit stop time = 15 seconds: Number of Wins = 26.653 - 1.803*(15) = -0.392 ≈ 0

For average pit stop times of 12 seconds (less than 13) and 13 seconds, the predicted number of wins is 5 and 3, respectively, which agrees with the scatter plot (Figure 2). For an average pit stop time of 14 seconds, the predicted number of wins is 1, which is true for some drivers. For an average pit stop time of 15 seconds, the predicted number of wins is zero, which the scatter plot (Figure 2) also confirms. The standard error of the regression is ±2.145. Therefore, the regression model can be used to predict the number of wins for Sprint Cup drivers from average pit stop time with a typical error of about ±2 wins.

Residuals (Largest)

Table 5 in the Appendix shows the residuals of the number of wins under the regression model. The points with the largest residuals are (13.1, 9), (13.2, 9), and (13.0, 7), with residuals of 6, 6, and 4, respectively. The unexplained variation in the number of wins is the sum of squared residuals (the error sum of squares). Because residuals are squared, these large residuals contribute far more to that sum than the other residuals do, which increases the unexplained variation in the number of wins.

Conclusion

In conclusion, the number of wins and the average pit stop time for Sprint Cup drivers are related. Further, the number of wins over a whole season can be approximately predicted from a driver's average pit stop time.

References:

Doane, D. P., & Seward, L. E. (2007). Applied Statistics in Business and Economics. New York: McGraw-Hill/Irwin.
Race Results. Retrieved December 4, 2008, from http://www.nascar.com/races/cup/2008/rr_index.html
Sprint Cup Drivers. Retrieved December 4, 2008, from http://store.nascar.com/sm-nextel-cup-drivers--ci-1736852_cp-2056638.html

Appendix:
Table 1: Driver’s Number of Win(s) and the Pit Stop Time

Driver                 Wins in 2008   Pit Stop Time (seconds)
Greg Biffle            2              13.2
Clint Bowyer           1              13.5
Jeff Burton            2              13.1
Kurt Busch             0              14.0
Kyle Busch             9              13.1
Dale Earnhardt, Jr.    1              13.6
Carl Edwards           9              13.2
Bill Elliott           0              13.9
Jeff Gordon            0              13.5
Denny Hamlin           1              14.5
Kevin Harvick          0              14.8
Dale Jarrett           0              13.9
Jimmie Johnson         7              13.0
Kasey Kahne            2              13.8
Matt Kenseth           0              14.9
Travis Kvapil          0              15.2
Bobby Labonte          0              13.8
Mark Martin            0              15.1
Jeremy Mayfield        0              13.7
Jamie McMurray         0              15.2
Casey Mears            0              15.6
Paul Menard            0              14.9
Juan Pablo Montoya     0              14.1
Joe Nemechek           0              15.1
Ryan Newman            1              13.9
Kyle Petty             0              14.8
David Ragan            0              15.1
David Reutimann        0              13.2
Tony Stewart           1              13.9
Martin Truex, Jr.      0              14.0

Table 2: Sum of Squares

Average Time (x)   Number of Wins (y)   x - x̄     y - ȳ     (x - x̄)(y - ȳ)   (x - x̄)²   (y - ȳ)²
13.2               2                    -0.920    0.800     -0.736           0.846      0.640
13.5               1                    -0.620    -0.200    0.124            0.384      0.040
13.1               2                    -1.020    0.800     -0.816           1.040      0.640
14.0               0                    -0.120    -1.200    0.144            0.014      1.440
13.1               9                    -1.020    7.800     -7.956           1.040      60.840
13.6               1                    -0.520    -0.200    0.104            0.270      0.040
13.2               9                    -0.920    7.800     -7.176           0.846      60.840
13.9               0                    -0.220    -1.200    0.264            0.048      1.440
13.5               0                    -0.620    -1.200    0.744            0.384      1.440
14.5               1                    0.380     -0.200    -0.076           0.144      0.040
14.8               0                    0.680     -1.200    -0.816           0.462      1.440
13.9               0                    -0.220    -1.200    0.264            0.048      1.440
13.0               7                    -1.120    5.800     -6.496           1.254      33.640
13.8               2                    -0.320    0.800     -0.256           0.102      0.640
14.9               0                    0.780     -1.200    -0.936           0.608      1.440
15.2               0                    1.080     -1.200    -1.296           1.166      1.440
13.8               0                    -0.320    -1.200    0.384            0.102      1.440
15.1               0                    0.980     -1.200    -1.176           0.960      1.440
13.7               0                    -0.420    -1.200    0.504            0.176      1.440
15.2               0                    1.080     -1.200    -1.296           1.166      1.440
15.6               0                    1.480     -1.200    -1.776           2.190      1.440
14.9               0                    0.780     -1.200    -0.936           0.608      1.440
14.1               0                    -0.020    -1.200    0.024            0.000      1.440
15.1               0                    0.980     -1.200    -1.176           0.960      1.440
13.9               1                    -0.220    -0.200    0.044            0.048      0.040
14.8               0                    0.680     -1.200    -0.816           0.462      1.440
15.1               0                    0.980     -1.200    -1.176           0.960      1.440
13.2               0                    -0.920    -1.200    1.104            0.846      1.440
13.9               1                    -0.220    -0.200    0.044            0.048      0.040
14.0               0                    -0.120    -1.200    0.144            0.014      1.440

x̄ = 14.120   ȳ = 1.200   SS_xy = -31.020   SS_xx = 17.208   SS_yy = 184.800

Table 3: Correlation Matrix

                  Average Time   Number of Wins
Average Time      1.000
Number of Wins    -.550          1.000

30       sample size
± .361   critical value .05 (two-tail)
± .463   critical value .01 (two-tail)

Table 4: Regression Summary Output

Regression Statistics
Multiple R          0.550
R Square            0.303
Adjusted R Square   0.278
Standard Error      2.145
Observations        30

ANOVA
             df   SS        MS       F        Significance F
Regression   1    55.918    55.918   12.148   0.002
Residual     28   128.882   4.603
Total        29   184.8

               Coefficients   Standard Error   t Stat   P-value   Lower 95%
Intercept      26.653         7.313            3.645    0.001     11.673
Average Time   -1.803         0.517            -3.485   0.002     -2.862

Table 5: Residuals Output

Number of Wins (y)   Predicted (ŷ)   Residual (y - ŷ)
2                    3               -1
1                    2               -1
2                    3               -1
0                    1               -1
9                    3               6
1                    2               -1
9                    3               6
0                    2               -2
0                    2               -2
1                    1               0
0                    0               0
0                    2               -2
7                    3               4
2                    2               0
0                    0               0
0                    -1              1
0                    2               -2
0                    -1              1
0                    2               -2
0                    -1              1
0                    -1              1
0                    0               0
0                    1               -1
0                    -1              1
1                    2               -1
0                    0               0
0                    -1              1
0                    3               -3
1                    2               -1
0                    1               -1
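The appendix tables can be reproduced from the raw data in Table 1. The sketch below is an illustration only, assuming the 30 (pit stop time, wins) pairs are transcribed correctly from Table 1; the paper itself used an Excel spreadsheet, and the names `fit`, `predict`, and `largest_residuals` are made up for this example. It recomputes the sums of squares, the regression line, R², the F statistic, and the standard error, then lists the largest residuals.

```python
import math

# (average pit stop time in seconds, wins in 2008), in the row order of Table 1.
DATA = [
    (13.2, 2), (13.5, 1), (13.1, 2), (14.0, 0), (13.1, 9), (13.6, 1),
    (13.2, 9), (13.9, 0), (13.5, 0), (14.5, 1), (14.8, 0), (13.9, 0),
    (13.0, 7), (13.8, 2), (14.9, 0), (15.2, 0), (13.8, 0), (15.1, 0),
    (13.7, 0), (15.2, 0), (15.6, 0), (14.9, 0), (14.1, 0), (15.1, 0),
    (13.9, 1), (14.8, 0), (15.1, 0), (13.2, 0), (13.9, 1), (14.0, 0),
]


def fit(data):
    """Least-squares line plus the summary statistics reported in Table 4."""
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    y_bar = sum(y for _, y in data) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    ss_xx = sum((x - x_bar) ** 2 for x, _ in data)
    ss_yy = sum((y - y_bar) ** 2 for _, y in data)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    ss_regression = slope * ss_xy          # explained variation
    ss_error = ss_yy - ss_regression       # unexplained variation
    ms_error = ss_error / (n - 2)
    return {
        "slope": slope,
        "intercept": intercept,
        "r": ss_xy / math.sqrt(ss_xx * ss_yy),
        "r_squared": ss_regression / ss_yy,
        "f_stat": ss_regression / ms_error,
        "std_error": math.sqrt(ms_error),
    }


def predict(model, x):
    """Predicted number of wins for an average pit stop time of x seconds."""
    return model["intercept"] + model["slope"] * x


def largest_residuals(data, model, k=3):
    """Return the k points with the largest absolute residual y - y_hat."""
    rows = [(x, y, y - predict(model, x)) for x, y in data]
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)[:k]


model = fit(DATA)
for name, value in model.items():
    print(f"{name}: {value:.3f}")
for seconds in (12, 13, 14, 15):
    print(f"{seconds} s -> {predict(model, seconds):.3f} wins")
for x, y, residual in largest_residuals(DATA, model):
    print(f"({x}, {y}) residual {residual:.2f}")
```

Because this sketch carries the slope and intercept at full precision, its predictions can differ in the second or third decimal place from the hand calculations in the paper, which use the rounded coefficients 26.653 and -1.803.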
