Descriptive Statistics of the Business Model Project

Task 1-a. The descriptive summaries of the data set for the current model and the new model are given below.

Table 1.1: Descriptives of driving distance (yards) for each model

Descriptive statistic              Current    New
Mean                               272.85     269.00
SE (Mean)                          2.194      2.557
95% CI for mean, lower bound       268.26     263.65
95% CI for mean, upper bound       277.44     274.35
5% trimmed mean                    273.17     269.22
Median                             274.00     269.00
Variance                           96.239     130.737
Std. deviation                     9.810      11.434
Minimum                            255        243
Maximum                            285        291
Range                              30         48
Interquartile range                16         17
Skewness                           -0.474     -0.279
Kurtosis                           -1.008     0.301
Coefficient of variation           3.6%       4.25%

Interpretation of descriptive statistics: From Table 1.1, the mean driving distance is 272.85 for the current model and 269.00 for the new model, so the mean of the new model is 3.85 lower than that of the current model. The standard deviations of the current and new models are 9.810 and 11.434 respectively.

The coefficient of variation (CV) is a measure of relative dispersion: it describes how consistent a series of data is relative to its mean. A series with a lower CV is considered more consistent, and in that sense more reliable, than a series with a higher CV. It is computed as the standard deviation divided by the mean, multiplied by 100, and is expressed as a percentage. The CV is 3.6% for the current model and 4.25% for the new model, so the current model appears somewhat more consistent than the new model. Whether the difference of 3.85 between the means is statistically significant is tested below with an independent-samples t-test. The five highest and five lowest values for each model are given in Table 1.2.
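The CV, standard error and 95% confidence interval in Table 1.1 can be checked from the means and standard deviations alone. The following is a minimal Python sketch; the function names are illustrative, and it assumes the two-sided 95% critical value of t for 19 degrees of freedom (about 2.093) is taken from a t table rather than computed.

```python
import math

# Summary values from Table 1.1 (driving distance, n = 20 balls per model).
N = 20
T_CRIT_19 = 2.093  # assumed two-sided 95% critical value of t with 19 df (t table)


def coefficient_of_variation(sd: float, mean: float) -> float:
    """CV in percent: standard deviation divided by the mean, times 100."""
    return sd / mean * 100


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def mean_ci(mean: float, sd: float, n: int, t_crit: float) -> tuple[float, float]:
    """Confidence interval for the mean: mean +/- t_crit * SE."""
    half_width = t_crit * standard_error(sd, n)
    return mean - half_width, mean + half_width


for name, mean, sd in [("Current", 272.85, 9.810), ("New", 269.00, 11.434)]:
    cv = coefficient_of_variation(sd, mean)
    se = standard_error(sd, N)
    lo, hi = mean_ci(mean, sd, N, T_CRIT_19)
    print(f"{name}: CV = {cv:.2f}%, SE = {se:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The printed values agree with Table 1.1 to the reported precision, which suggests the SPSS intervals are ordinary t-based intervals around each mean.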
Table 1.2: Extreme values of driving distance

Type of ball   Extreme   Rank   Case number   Value
Current        Highest   1      2             285
                         2      6             285
                         3      14            284
                         4      19            283
                         5      8             281 (a)
               Lowest    1      11            255
                         2      7             256
                         3      17            260
                         4      5             260
                         5      20            265
New            Highest   1      22            291
                         2      35            284
                         3      32            281
                         4      21            280
                         5      37            279
               Lowest    1      36            243
                         2      38            252
                         3      26            257
                         4      39            260
                         5      28            261
a. Only a partial list of cases with the value 281 is shown in the table of upper extremes.

The table lists the five highest and five lowest values for each type of ball.

Graph 1.1: Box plot of driving distance by type of ball

Interpretation of the box plot: From Graph 1.1, box plots allow us to compare the groups using a five-number summary: the median, the 25th and 75th percentiles, and the minimum and maximum observed values that are not statistical outliers. Outliers and extreme values are marked separately. The heavy black line inside each box marks the 50th percentile, or median, of that distribution: 274.00 for the current model and 269.00 for the new model, so the medians differ by only 5 yards. The lower and upper hinges, or box boundaries, mark the 25th and 75th percentiles of each distribution. For the current model, the lower hinge is 265 and the upper hinge is 281. Whiskers extend above and below the hinges and end at the largest and smallest observed values that are not statistical outliers. Box plots provide a quick visual summary of any number of groups, and because all groups within a single factor are drawn on the same axis, comparisons are easy. While box plots give some evidence about the shape of the distributions, the Explore procedure in SPSS allows a more detailed look at how the groups differ from each other or from expectation.
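Whether a point is drawn as an outlier depends on how far it lies beyond the hinges. The sketch below assumes the usual box-plot convention (SPSS marks values more than 1.5 box lengths beyond a hinge as outliers and more than 3 box lengths as extreme values); the function names are illustrative. It uses the current model's hinges (265 and 281) and its extremes from Table 1.2. The new model's hinges are not reported, so it is not checked here.

```python
# Sketch: Tukey-style fences used by box plots to flag unusual values.
# Assumption: beyond 1.5 box lengths (IQR) from a hinge = outlier,
# beyond 3 box lengths = extreme value.


def fences(lower_hinge: float, upper_hinge: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences lying k * IQR beyond the hinges."""
    iqr = upper_hinge - lower_hinge
    return lower_hinge - k * iqr, upper_hinge + k * iqr


def flag(values: list[float], lower_hinge: float, upper_hinge: float) -> list[float]:
    """Return the values lying outside the 1.5 * IQR fences."""
    lo, hi = fences(lower_hinge, upper_hinge)
    return [v for v in values if v < lo or v > hi]


# Current model: hinges 265 and 281; five lowest and five highest values from Table 1.2.
current_extremes = [255, 256, 260, 260, 265, 281, 283, 284, 285, 285]
print(fences(265, 281))          # outlier fences: (241.0, 305.0)
print(fences(265, 281, k=3.0))   # extreme-value fences: (217.0, 329.0)
print(flag(current_extremes, 265, 281))  # []
```

None of the current model's extremes falls outside the fences, which is consistent with the box plot showing no outliers for that group.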
From Graph 1.1, the current type shows a higher median than the new type, and its spread is smaller (standard deviation 9.810 versus 11.434). The values of the new type are therefore more widely scattered, whereas the values of the current type lie closer to the centre, as the box plot clearly shows. Neither the current nor the new type of ball has any outliers.

Table 1.3: Group statistics of driving distance

Type of ball   n    Mean     Std. deviation   Std. error mean
Current        20   272.85   9.810            2.194
New            20   269.00   11.434           2.557

General descriptives about each type: Table 1.3 displays the sample size, mean, standard deviation and standard error for both groups. On average, the driving distance is 272.85 for the current type of ball and 269.00 for the new type, and the new type varies a little more around its average.

Table 1.4: Independent samples test, driving distance

Levene F   Levene Sig.   t       df   Sig. (2-tailed)   Mean diff.   SE diff.   95% CI lower   95% CI upper
0.017      0.898         1.143   38   0.260             3.850        3.369      -2.970         10.670

1-b. Testing the difference of means using the independent t-test:
Null hypothesis H0: There is no difference in the mean driving distance between the two types of balls (current and new), μ1 = μ2.
Alternative hypothesis H1: There is a difference in the mean driving distance between the two types of balls, μ1 ≠ μ2.
Level of significance: 5%, α = 0.05.

The procedure produces two tests of the difference between the two groups. One test assumes that the variances of the two groups are equal, and Levene's statistic tests this assumption. In this problem, the significance value of the statistic is 0.898.
Because this value is greater than 0.05, we can assume that the groups have equal variances and ignore the second test. The t column displays the observed t statistic, calculated as the difference between the sample means divided by the standard error of the difference. The df column displays the degrees of freedom; for the independent-samples t-test this equals the total number of cases in both samples minus 2, here 20 + 20 - 2 = 38. The column labelled Sig. (2-tailed) gives the probability, under the t distribution with 38 degrees of freedom, of obtaining a t statistic at least as large in absolute value as the one observed if the population means were equal. This probability is 0.260, which is greater than 0.05 (equivalently, the observed t of 1.143 is smaller than the critical value of about 2.024), so H0 is not rejected. The mean difference is obtained by subtracting the sample mean for group 2 (the new ball) from the sample mean for group 1 (the current ball). The 95% confidence interval of the difference, -2.970 to 10.670, comes from a procedure that captures the true mean difference in 95% of all possible random samples of 20 balls per group; because it includes zero, it is consistent with no difference. Since the significance value of the test is greater than 0.05, the new ball's shortfall of 3.85 in the average can be attributed to chance variation.

In summary, Table 1.3 shows a mean driving distance of 272.85 for the current type and 269.00 for the new type, a difference of 3.85. Table 1.4 shows that this difference is not statistically significant, so the two types perform on par.

Table 1.5: Mann-Whitney test ranks, driving distance

Type of ball   N    Mean rank   Sum of ranks
Current        20   22.65       453.00
New            20   18.35       367.00
Total          40

Table 1.6: Test statistics for the Mann-Whitney U test (b)

                                  Driving distance
Mann-Whitney U                    157.000
Wilcoxon W                        367.000
Z                                 -1.164
Asymp. Sig. (2-tailed)            0.244
Exact Sig. [2*(1-tailed Sig.)]    0.253 (a)
a. Not corrected for ties.
b. Grouping variable: Type of ball

1-c. Testing the difference using a non-parametric test: To check whether a non-parametric test changes the conclusion, the Mann-Whitney U test is used. It compares the two groups on the ranks of the observations rather than on their means. The SPSS output is displayed in Tables 1.5 and 1.6. The mean rank of the current type is higher than that of the new type by 4.30 (22.65 versus 18.35). From Table 1.6, the Z statistic is -1.164 and it is not significant (p = 0.244 > 0.05), so we conclude that there is no significant difference between the two types of balls. The observed difference in driving distance can be attributed to random variation.

1-d. Recommendations: Both the independent t-test and the Mann-Whitney U test show that, although the current ball drives 3.85 yards further on average than the new ball, the difference is not statistically significant with samples of 20 balls each; more samples are needed to reach a firmer conclusion. As it stands, the observed difference could be due only to random fluctuations. Hence there is no need to change the ball from the current type to the new type.

Task 2.

2-a. Null hypothesis H0: The regression coefficients are zero, βi = 0.
Alternative hypothesis H1: The regression coefficients are not zero, βi ≠ 0.
Level of significance: 5%, α = 0.05.

Table 2.1: Model summary of the regression equation

Model   R       R square   Adjusted R square   Std. error of the estimate
1       0.795   0.632      0.536               11.914
a. Predictors: (Constant), Occupation, No. of years in last job, Married, Age, Head of household, Education

Table 2.2: ANOVA

Model        Sum of squares   df   Mean square   F       Sig.
Regression   5609.859         6    934.977       6.587   0.000**
Residual     3264.841         23   141.950
Total        8874.700         29
a. Predictors: (Constant), Occupation, No. of years in last job, Married, Age, Head of household, Education
b. Dependent variable: No. of weeks jobless

Interpretation of R square and ANOVA: The multiple regression model fits the dependent variable, no. of weeks jobless, against the independent variables age, education, married, head of household, no. of years in last job and occupation. Overall, the regression is highly significant: in Table 2.2 the F statistic of 6.587 has a probability of 0.000 (p < 0.001), so the model as a whole explains a significant part of the variation. This is supported by the multiple R and R square values of 0.795 and 0.632 in Table 2.1: about 63.2% of the variation in the dependent variable is explained by the independent variables. The most significant contributions come from occupation and head of household. The other variables (age, education, married and no. of years in last job) do not contribute significantly to the regression equation, as their probabilities in Table 2.3 are greater than 0.05.

Table 2.3: Coefficients (a)

Model                      B         Std. error   Beta     t        Sig.
(Constant)                 -11.812   25.292                -0.467   0.645
Age                        0.410     0.350        0.154    1.169    0.254
Education                  -0.115    1.011        -0.016   -0.114   0.910
Married                    -0.796    4.817        -0.021   -0.165   0.870
Head of household          -16.354   5.502        -0.420   -2.972   0.007
No. of years in last job   0.723     0.889        0.105    0.813    0.425
Occupation                 14.509    4.067        0.522    3.567    0.002
B and Std. error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent variable: No. of weeks jobless

Interpretation of the regression model: The linear regression model assumes that there is a linear, or "straight line", relationship between the dependent variable and each predictor. This relationship is described in the following formula.
$$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + b_3 x_{3i} + b_4 x_{4i} + b_5 x_{5i} + b_6 x_{6i} + e_i$$

where $y_i$ is the value of the ith case of the dependent scale variable, $b_j$ is the value of the jth coefficient ($j = 0, \dots, 6$), $x_{ji}$ is the value of the ith case of the jth predictor, and $e_i$ is the error in the observed value for the ith case.

The model is linear because increasing the value of the jth predictor by 1 unit, with the other predictors held fixed, changes the value of the dependent variable by $b_j$ units. Note that $b_0$ is the intercept, the model-predicted value of the dependent variable when every predictor is equal to 0. For the purpose of testing hypotheses about the values of the model parameters, the linear regression model also assumes the following:
1. The error term has a normal distribution with a mean of 0.
2. The variance of the error term is constant across cases and independent of the variables in the model. An error term with non-constant variance is said to be heteroscedastic.
3. The value of the error term for a given case is independent of the values of the variables in the model and of the values of the error term for other cases.

Here the dependent variable, no. of weeks jobless, is modelled with six independent variables: age, education, married, head of household, no. of years in last job and occupation. Before running the regression, a scatterplot can be used to judge whether a linear model is reasonable: in SPSS, select no. of weeks jobless as the y variable and a predictor as the x variable, and click OK. To overlay a best-fit line, activate the graph by double-clicking it, select a point in the scatterplot, click the Add Fit Line at Total tool, and then close the Chart Editor. The resulting scatterplot appears suitable for linear regression, with some cause for concern about how the variability of no. of weeks jobless changes across the independent variables. We will investigate this further during diagnostic checking of the regression model.
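The statement that a one-unit increase in $x_j$ changes the prediction by $b_j$ can be seen directly by evaluating the fitted equation. Below is a minimal Python sketch using the unstandardized coefficients from Table 2.3. The dictionary keys and the example case are hypothetical, and because the source does not say how married, head of household and occupation are coded, the predicted values themselves should not be read as real forecasts.

```python
# Unstandardized coefficients (B) from Table 2.3; the keys are illustrative labels.
COEFFICIENTS = {
    "age": 0.410,
    "education": -0.115,
    "married": -0.796,
    "head_of_household": -16.354,
    "years_in_last_job": 0.723,
    "occupation": 14.509,
}
INTERCEPT = -11.812


def predict_weeks_jobless(case: dict[str, float]) -> float:
    """Model-predicted value: b0 + sum of b_j * x_j over the six predictors."""
    return INTERCEPT + sum(b * case[name] for name, b in COEFFICIENTS.items())


# Hypothetical case with illustrative values (predictor coding is an assumption).
base = {"age": 40, "education": 12, "married": 1,
        "head_of_household": 1, "years_in_last_job": 5, "occupation": 1}
bumped = dict(base, age=base["age"] + 1)  # the same case, one year older

# Holding the other predictors fixed, the prediction changes by b_age = 0.410.
print(round(predict_weeks_jobless(bumped) - predict_weeks_jobless(base), 3))
```

Setting every predictor to 0 returns the intercept, -11.812, which illustrates why $b_0$ has no practical meaning here: no case has all predictors equal to zero.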
To run the linear regression analysis, choose from the menus: Analyze > Regression > Linear. Select no. of weeks jobless as the dependent variable and age to occupation as the independent variables. Click Plots, select *SDRESID as the y variable and *ZPRED as the x variable, and select Histogram and Normal probability plot. Click Continue. Click Save in the Linear Regression dialog box, select Standardized in both the Predicted Values group and the Residuals group, and click Continue. Click OK in the Linear Regression dialog box. These selections produce a linear regression model for no. of weeks jobless based on the independent variables, request diagnostic plots of the studentized residuals against the model-predicted values, and save values for further diagnostic testing.

The coefficients table (Table 2.3) gives the fitted regression equation:

$$\widehat{\text{weeks jobless}} = -11.812 + 0.410\,\text{age} - 0.115\,\text{education} - 0.796\,\text{married} - 16.354\,\text{head of household} + 0.723\,\text{years in last job} + 14.509\,\text{occupation}$$

Graphs: Histogram of residuals, normal P-P plot and scatter plot of residuals

Interpretation of the histogram and P-P plot: A residual is the difference between the observed and model-predicted values of the dependent variable; the residual for a given case is the observed value of the error term for that case. A histogram or P-P plot of the residuals helps to check the assumption that the error term is normally distributed. The shape of the histogram should approximately follow the shape of the normal curve, and this histogram is acceptably close to it. The points of the P-P plot of the residuals should follow the 45-degree line.
Neither the histogram nor the P-P plot indicates that the normality assumption is violated. The plot of residuals against the predicted values shows that the variance of the errors increases with increasing predicted no. of weeks jobless; otherwise there is good scatter. The plot of standardized residuals against no. of weeks jobless shows the same pattern.

Interpretation of the residual plot and ANOVA: The Residual row of the ANOVA table displays information about the variation that is not accounted for by the model, while the Regression row displays the variation accounted for by it. The regression sum of squares (5609.859) is greater than the residual sum of squares (3264.841). The significance value of the F statistic is less than 0.05, which means that the variation explained by the model is unlikely to be due to chance. While the ANOVA table is a useful test of the model's ability to explain any variation in the dependent variable, it does not directly address the strength of that relationship. The model summary table reports the strength of the relationship between the model and the dependent variable. R, the multiple correlation coefficient, is the linear correlation between the observed and model-predicted values of the dependent variable; its large value (0.795) indicates a strong relationship. R square, the coefficient of determination, is the square of the multiple correlation coefficient. It shows that about 63.2% of the variation in no. of weeks jobless is explained by the model.
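Every statistic in Table 2.1 follows from the sums of squares and degrees of freedom in Table 2.2. The following Python sketch (the function name is illustrative) reproduces R, R square, adjusted R square, F and the standard error of the estimate, and also computes the standard deviation of the dependent variable, which is the natural benchmark for the standard error of the estimate.

```python
import math

# ANOVA quantities from Table 2.2 (n = 30 cases, k = 6 predictors).
SS_REG, SS_RES, SS_TOT = 5609.859, 3264.841, 8874.700
N, K = 30, 6


def fit_summary(ss_reg: float, ss_res: float, ss_tot: float,
                n: int, k: int) -> dict[str, float]:
    """Derive the model-summary statistics from the ANOVA sums of squares."""
    df_reg, df_res = k, n - k - 1
    ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res
    r2 = ss_reg / ss_tot
    return {
        "R": math.sqrt(r2),
        "R2": r2,
        "adj_R2": 1 - (1 - r2) * (n - 1) / df_res,
        "F": ms_reg / ms_res,
        "SE_estimate": math.sqrt(ms_res),     # root mean square residual
        "SD_y": math.sqrt(ss_tot / (n - 1)),  # spread of y with no predictors
    }


for name, value in fit_summary(SS_REG, SS_RES, SS_TOT, N, K).items():
    print(f"{name}: {value:.3f}")
```

The adjusted R square (0.536) is noticeably lower than R square (0.632) because six predictors are fitted to only 30 cases; the adjustment penalises predictors that add little explanatory power.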
As a further measure of the strength of the model fit, compare the standard error of the estimate with the standard deviation of the dependent variable. Without knowledge of the independent variables, our best guess for the no. of weeks jobless would be its mean, with a standard deviation of about 17.5 (the square root of 8874.700/29 from Table 2.2); with the model, the standard error of the estimate is considerably lower, about 11.9.

2-b. Summary of findings: Three variables have positive coefficients in the regression equation: age, no. of years in last job and occupation. However, only head of household (negative coefficient, p = 0.007) and occupation (positive coefficient, p = 0.002) contribute significantly to the regression equation. For greater clarity more data must be collected, since a sample size of 30 is not enough to come to a decision in such a critical study. Further investigation is therefore needed to decide on the factors influencing the no. of weeks jobless.

Task 3

3-a. Relationship between machine type and machine number
Null hypothesis H0: There is no association between the machine number and the type of machine.
Alternative hypothesis H1: There is an association between the machine number and the type of machine.
Level of significance: 5%, α = 0.05.

Table 3.1: Cross tabulation of machine number vs. machine type

              Manual   Automatic   Total
Machine 1     927      1000        1927
Machine 2     840      950         1790
Total         1767     1950        3717

Table 3.2: Chi-square tests

                     Value   df   Asymp. Sig. (2-sided)
Pearson chi-square   0.517   1    0.472 (NS)

Interpretation of the chi-square test: To test the association between machine number and machine type, the Crosstabs procedure in SPSS produces a crosstabulation table, a chi-square test and nominal-by-nominal measures of association. The crosstabulation shows the frequency for each machine number within each machine type. If there is no association, the split between manual and automatic should be similar for both machine numbers, and it is: manual accounts for 927/1927 (about 48.1%) of machine 1's total and 840/1790 (about 46.9%) of machine 2's. From the crosstabulation alone, however, it is impossible to tell whether these differences are real or due to chance variation, so the chi-square test is needed. The chi-square test measures the discrepancy between the observed cell counts and the counts expected if the rows and columns were unrelated. From Table 3.2, the chi-square value is 0.517 with a two-sided asymptotic significance of 0.472 (> 0.05), so the differences can be attributed to chance variation, and we conclude that there is no significant association between machine type and machine number. The independent t-test provides another criterion for comparing the manual and automatic machines.
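Before turning to that test, the Pearson chi-square value in Table 3.2 can be reproduced from the counts in Table 3.1. The following is a minimal Python sketch with an illustrative function name. It assumes no continuity correction (the Pearson row of the SPSS output is uncorrected) and computes the p-value with a standard-library closed form that is valid only for 1 degree of freedom, that is, a 2 × 2 table.

```python
import math


def pearson_chi_square(table: list[list[float]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence for a two-way table of counts.

    Returns (statistic, degrees of freedom, p-value). The p-value uses the
    closed form for 1 degree of freedom, so only 2 x 2 tables are supported.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df != 1:
        raise ValueError("this p-value formula is only valid for 1 degree of freedom")
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, df, p_value


# Counts from Table 3.1: rows = machine 1, machine 2; columns = manual, automatic.
stat, df, p = pearson_chi_square([[927, 1000], [840, 950]])
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
```

For a general r × c table, the p-value would instead need the chi-square survival function with (r - 1)(c - 1) degrees of freedom, for example from a statistics library.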
Testing the difference between the two machine types
Null hypothesis H0: There is no difference in the mean no. of defective parts between the manual and automatic machines, μ1 = μ2.
Alternative hypothesis H1: There is a difference in the mean no. of defective parts between the manual and automatic machines, μ1 ≠ μ2.
Level of significance: 5%, α = 0.05.

Table 3.3: Group statistics, no. of defective parts produced

Machine type   N   Mean     Std. deviation   Std. error mean
Manual         6   294.50   30.547           12.471
Automatic      6   325.00   28.235           11.527

Table 3.4: Independent samples test, no. of defective parts produced

Levene F   Levene Sig.   t        df   Sig. (2-tailed)   Mean diff.   SE diff.   95% CI lower   95% CI upper
0.171      0.688         -1.796   10   0.103             -30.500      16.982     -68.338        7.338

Interpretation: The best available test for comparing the two groups statistically is the independent t-test. From Table 3.3, the mean no. of defective parts produced is 294.50 for the manual type and 325.00 for the automatic type, a difference of 30.5. Table 3.4 shows whether this difference is significant: although the means differ by 30.5, the difference is not statistically significant, so the null hypothesis H0 is not rejected and the two machine types cannot be said to differ in the mean no. of defective parts produced. The problem calls for the Independent-Samples T Test procedure, which tests the significance of the difference between two sample means.
The procedure also displays:
• descriptive statistics for each test variable;
• a test of equality of variances;
• a confidence interval for the difference between the two group means (95% or a value you specify).

Usually, the groups in a two-sample t-test are fixed by design, and the grouping variable has one value for each group. In the Independent-Samples T Test procedure the two groups can be defined either by specifying these two values or by a cut point, in which case the program divides the sample in two at the cut point. A cut point has the advantage that it can easily be changed without re-creating the grouping variable by hand.

To run the test in SPSS, choose Analyze > Compare Means > Independent-Samples T Test. Select no. of defective parts produced as the test variable and machine type as the grouping variable. Click Define Groups, enter manual as the Group 1 value and automatic as the Group 2 value, and click Continue. Click OK in the Independent-Samples T Test dialog box.

The Descriptives table displays the sample size, mean, standard deviation and standard error for both groups. On average, the manual machine produces 294.5 defective parts and the automatic machine 325, and the two groups vary by a similar amount around their averages (standard deviations 30.547 and 28.235). The procedure produces two tests of the difference between the two groups. One test assumes that the variances of the two groups are equal, and Levene's statistic tests this assumption. In this problem, the significance value of the statistic is 0.688. Because this value is greater than 0.05, we can assume that the groups have equal variances and ignore the second test. Using the pivoting trays, the default layout of the table can be changed so that only the "equal variances" test is displayed: with the assumptions moved to the layer, the Equal variances assumed panel is shown.
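The equal-variances row of Table 3.4 can be reproduced from the group statistics in Table 3.3 alone. Below is a minimal Python sketch of the pooled-variance t-test; the function name is illustrative. It assumes the two-sided 95% critical values of t (about 2.228 for 10 df and 2.024 for 38 df) are taken from a t table, since the standard library has no t distribution. The same function also reproduces Table 1.4 for the two types of ball.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Equal-variances (pooled) independent-samples t-test from summary statistics.

    t_crit is the two-sided critical value of t for n1 + n2 - 2 df, supplied
    by the caller (e.g. from a t table).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2  # group 1 minus group 2, as in SPSS
    t = diff / se_diff
    ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)
    return {"t": t, "df": df, "mean_diff": diff, "se_diff": se_diff, "ci95": ci}


# Task 3: manual (group 1) vs automatic (group 2), Table 3.3.
machines = pooled_t_test(294.50, 30.547, 6, 325.00, 28.235, 6, t_crit=2.228)
# Task 1: current (group 1) vs new ball (group 2), Table 1.3.
balls = pooled_t_test(272.85, 9.810, 20, 269.00, 11.434, 20, t_crit=2.024)

for label, res in [("machines", machines), ("balls", balls)]:
    lo, hi = res["ci95"]
    print(f"{label}: t = {res['t']:.3f}, df = {res['df']}, "
          f"SE = {res['se_diff']:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Only the equal-variances test is sketched, matching the Levene results (0.688 and 0.898) that justify ignoring the unequal-variances row.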
The t column displays the observed t statistic, calculated as the difference between the sample means divided by the standard error of the difference. The df column displays the degrees of freedom; for the independent-samples t-test this equals the total number of cases in both samples minus 2, here 6 + 6 - 2 = 10. The column labelled Sig. (2-tailed) gives the probability, under the t distribution with 10 degrees of freedom, of obtaining a t statistic at least as large in absolute value as the one observed if the difference between the sample means were purely random. The mean difference is obtained by subtracting the sample mean for group 2 (automatic) from the sample mean for group 1 (manual). The 95% confidence interval of the difference, -68.338 to 7.338, comes from a procedure that captures the true mean difference in 95% of all possible random samples of size 6 per group; it includes zero. Since the significance value of the test (0.103) is greater than 0.05, the 30.5 more defective parts produced on average by the automatic type can be attributed to chance variation. Table 3.4 therefore shows no significant difference in the mean no. of defective parts produced between the manual and automatic machines, and both types of machine perform on par.

3-b. Summary of findings: From a statistical point of view, the mean numbers of defective parts produced by the manual and automatic machines do not differ significantly, although the manual type produced 30.5 fewer defective parts on average than the automatic type. Hence, from a production point of view, the manual type is recommended.
