Statistics Problems Report Example | Topics and Well Written Essays

Statistics Problems – Comprehensive Problem Set, Week 6

88. Refer to the Baseball 2005 data, which reports information on the 30 major league teams for the 2005 baseball season.

a. Select the variable team salary and find the mean, median, and the standard deviation.
Mean = 73,063,563.27
Median = (15th value + 16th value) / 2 = 66,191,416.50
Standard deviation = 34,233,970.30

b. Select the variable that refers to the year the stadium was built. (Hint: Subtract the year in which the stadium was built from the current year to find the stadium age and work with that variable.) Find the mean, median, and the standard deviation.
Mean = 28.20
Median = (15th value + 16th value) / 2 = 17.50
Standard deviation = 25.94

c. Select the variable that refers to the seating capacity of the stadium. Find the mean, median, and the standard deviation.
Mean = 45,912.60
Median = (15th value + 16th value) / 2 = 44,174
Standard deviation = 5,894.20

56. Assume the likelihood that any flight on Northwest Airlines arrives within 15 minutes of the scheduled time is .90. We select four flights from yesterday for study.

The number of on-time flights X follows a binomial distribution with n = 4 and p = 0.90. The binomial probability is P(X = x) = C(n, x) p^x (1 - p)^(n - x).

X      P(X)       Cumulative probability
0      0.00010    0.00010
1      0.00360    0.00370
2      0.04860    0.05230
3      0.29160    0.34390
4      0.65610    1.00000
Total  1.00000

a. What is the likelihood all four of the selected flights arrived within 15 minutes of the scheduled time?
P(X = 4) = 0.65610

b. What is the likelihood that none of the selected flights arrived within 15 minutes of the scheduled time?
P(X = 0) = 0.00010

c. What is the likelihood at least one of the selected flights did not arrive within 15 minutes of the scheduled time?
At least one late flight means that fewer than four flights were on time: P(X < 4) = 1 - P(X = 4) = 1 - 0.65610 = 0.34390

64. An internal study by the Technology Services department at Lahey Electronics revealed company employees receive an average of two emails per hour.
Assume the arrival of these emails is approximated by the Poisson distribution.

The number of emails per hour X follows a Poisson distribution with λ = 2. The Poisson probability is P(X = x) = λ^x e^(-λ) / x!.

X      P(X)       Cumulative probability
0      0.13534    0.13534
1      0.27067    0.40601
2      0.27067    0.67668
3      0.18045    0.85712
4      0.09022    0.94735
5      0.03609    0.98344
6      0.01203    0.99547
7      0.00344    0.99890
8      0.00086    0.99976
9      0.00019    0.99995
10     0.00004    0.99999
11     0.00001    1.00000
12     0.00000    1.00000
13     0.00000    1.00000
14     0.00000    1.00000
Total  1.00000

a. What is the probability Linda Lahey, company president, received exactly 1 email between 4 P.M. and 5 P.M. yesterday?
P(X = 1) = 0.27067

b. What is the probability she received 5 or more emails during the same period?
P(X ≥ 5) = 1 - P(X ≤ 4) = 1 - 0.94735 = 0.05265

c. What is the probability she did not receive any email during the period?
P(X = 0) = 0.13534

50. Fast Service Truck Lines uses the Ford Super Duty F-750 exclusively. Management made a study of the maintenance costs and determined the number of miles traveled during the year followed the normal distribution. The mean of the distribution was 60,000 miles and the standard deviation 2,000 miles.

The miles traveled X follow a normal distribution with µ = 60,000 and σ = 2,000. A value is standardized with Z = (X - µ) / σ.

a. What percent of the Ford Super Duty F-750s logged 65,200 miles or more?
Z = (65,200 - 60,000) / 2,000 = 2.60
P(X ≥ 65,200) = 0.0047 = 0.47%

b. What percent of the trucks logged more than 57,060 but less than 58,280 miles?
Z1 = (57,060 - 60,000) / 2,000 = -1.47
Z2 = (58,280 - 60,000) / 2,000 = -0.86
P(57,060 < X < 58,280) = 0.1949 - 0.0708 = 0.1241 = 12.41%

c. What percent of the Fords traveled 62,000 miles or less during the year?
Z = (62,000 - 60,000) / 2,000 = 1.00
P(X ≤ 62,000) = 0.8413 = 84.13%

d. Is it reasonable to conclude that any of the trucks were driven more than 70,000 miles? Explain.
Z = (70,000 - 60,000) / 2,000 = 5.00
P(X > 70,000) = 0.0000 (to four decimal places)
Because the probability that a truck was driven more than 70,000 miles is negligible, it is not reasonable to conclude that any of the trucks were driven that far.

38. The mean amount purchased by a typical customer at Churchill’s Grocery Store is $23.50 with a standard deviation of $5.00. Assume the distribution of amounts purchased follows the normal distribution. For a sample of 50 customers, answer the following questions.

The sample mean follows a normal distribution with:
Mean = 23.50
Standard error = σ / √n = 5 / √50 = 0.707
A sample mean is standardized with Z = (X̄ - µ) / (σ / √n).

a. What is the likelihood the sample mean is at least $25.00?
Z = (25.00 - 23.50) / 0.707 = 2.1216
P(X̄ ≥ 25.00) = 0.0169 = 1.69%

b. What is the likelihood the sample mean is greater than $22.50 but less than $25.00?
Z1 = (22.50 - 23.50) / 0.707 = -1.4144
Z2 = (25.00 - 23.50) / 0.707 = 2.1216
P(22.50 < X̄ < 25.00) = 0.9044 = 90.44%

c. Within what limits will 90 percent of the sample means occur?
Because the population standard deviation is known, the limits use the z value that encloses the middle 90 percent of the normal distribution.
Standard error = 0.707
z value = 1.645
Interval half width = 1.645 × 0.707 = 1.163
Lower limit = 23.50 - 1.163 = 22.34
Upper limit = 23.50 + 1.163 = 24.66

54. Families USA, a monthly magazine that discusses issues related to health and health costs, surveyed 20 of its subscribers. It found that the annual health insurance premiums for a family with coverage through an employer averaged $10,979. The standard deviation of the sample was $1,000.

n = 20
Mean = 10,979
Standard deviation s = 1,000
Because the sample is small (n = 20) and only the sample standard deviation is available, the t distribution is used.
Degrees of freedom df = 19
Standard error = s / √n = 1,000 / √20 = 223.61

a. Based on this sample information, develop a 90 percent confidence interval for the population mean yearly premium.
Critical value of t (df = 19, α = 0.10, two-tailed) = 1.729
Confidence interval limits = 10,979 ± (1.729)(223.61) = 10,979 ± 386.6 = (10,592.4, 11,365.6)

b. How large a sample is needed to find the population mean within $250 at 99 percent confidence?
Allowable error E = 250
Critical value at 99% confidence: z = 2.58
Sample size n = (z s / E)^2 = (2.58 × 1,000 / 250)^2 = (10.32)^2 = 106.5, which rounds up to 107.

42. During recent seasons, Major League Baseball has been criticized for the length of the games. A report indicated that the average game lasts 3 hours and 30 minutes. A sample of 17 games revealed the following times to completion. (Note that the minutes have been changed to fractions of hours, so that a game that lasted 2 hours and 24 minutes is reported at 2.40 hours.)

2.98 2.40 2.70 2.25 3.23 3.17 2.93 3.18 2.80 2.38 3.75 3.20 3.27 2.52 2.58 4.45 2.45

Can we conclude that the mean time for a game is less than 3.50 hours? Use the .05 significance level.

Hypotheses:
Null hypothesis H0: μ = 3.50
Alternate hypothesis H1: μ < 3.50

Sample mean X̄ = 50.24 / 17 = 2.955
Sample standard deviation s = 0.5596
Test statistic: t = (X̄ - μ) / (s / √n) = (2.955 - 3.50) / (0.5596 / √17) = -4.01
Degrees of freedom = n - 1 = 17 - 1 = 16
Critical value (df = 16, α = 0.05, one-tailed) = -1.746

Since the t value is less than the critical value, the null hypothesis is rejected. Hence it can be concluded that the mean game time is less than 3.50 hours.

58. The amount of income spent on housing is an important component of the cost of living. The total costs of housing for homeowners might include mortgage payments, property taxes, and utility costs (water, heat, electricity). An economist selected a sample of 20 homeowners in New England and then calculated these total housing costs as a percent of monthly income, five years ago and now. The information is reported below. Is it reasonable to conclude the percent is less now than five years ago?
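For the game-length test in problem 42 above, the sample mean, standard deviation, and t statistic can be recomputed directly from the 17 listed times. The following is a minimal sketch using only the Python standard library; the function name `one_sample_t` is an illustrative assumption, and the critical value is not computed but copied from a t table, as in the solution.

```python
import math
import statistics

# Game lengths in hours, copied from the problem statement.
times = [2.98, 2.40, 2.70, 2.25, 3.23, 3.17, 2.93, 3.18, 2.80,
         2.38, 3.75, 3.20, 3.27, 2.52, 2.58, 4.45, 2.45]

MU_0 = 3.50          # mean claimed under the null hypothesis
T_CRITICAL = -1.746  # table value for df = 16, alpha = 0.05, lower tail


def one_sample_t(sample, mu_0):
    """Return (mean, sample standard deviation, t statistic)."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # divides by n - 1
    t = (mean - mu_0) / (s / math.sqrt(n))
    return mean, s, t


mean, s, t_stat = one_sample_t(times, MU_0)
reject_h0 = t_stat < T_CRITICAL  # lower-tailed test

print(f"mean = {mean:.3f}, s = {s:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Values computed this way may differ in the last digit from hand calculations that round intermediate results.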
Hypotheses (d = percent now minus percent five years ago):
Null hypothesis H0: μ(d) = 0
Alternate hypothesis H1: μ(d) < 0

Mean of X1 (now) = 537 / 20 = 26.85
Mean of X2 (five years ago) = 663 / 20 = 33.15
Mean difference = sum of differences / n = -126 / 20 = -6.3
Standard deviation of d = 12.4778
Critical value (df = 19, α = 0.05, one-tailed) = -1.729
Test statistic: t = (mean difference) / (s(d) / √n) = -6.3 / 2.79 = -2.2581

Since the t value is less than the critical value, the null hypothesis is rejected. Hence it can be concluded that the percent of income spent on housing is lower now than it was five years ago.

42. Martin Motors has in stock three cars of the same make and model. The president would like to compare the gas consumption of the three cars (labeled car A, car B, and car C) using four different types of gasoline. For each trial, a gallon of gasoline was added to an empty tank, and the car was driven until it ran out of gas. The following table shows the number of miles driven in each trial. Using the .05 level of significance:

a. Is there a difference among types of gasoline?
A two-way analysis of variance (ANOVA) is used, with gasoline types and cars as the two factors.
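The two-way ANOVA arithmetic for this problem can be sketched in Python, with the mileage figures taken from the problem's table. This is a sketch under stated assumptions rather than the original analysis: the function name `two_way_anova` and the dictionary layout are illustrative, and the design is treated as two factors with one observation per cell.

```python
# Miles driven per gallon: rows are gasoline types, columns are cars A, B, C.
mileage = {
    "Regular":          [22.4, 20.8, 21.5],
    "Super Regular":    [17.0, 19.4, 20.7],
    "Unleaded":         [19.2, 20.2, 21.2],
    "Premium Unleaded": [20.3, 18.6, 20.4],
}


def two_way_anova(table):
    """Sums of squares and F ratios for a two-factor layout, one value per cell."""
    rows = list(table.values())
    r, c = len(rows), len(rows[0])
    grand_total = sum(sum(row) for row in rows)
    correction = grand_total ** 2 / (r * c)          # T^2 / N

    ss_total = sum(x * x for row in rows for x in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in rows) / c - correction
    col_totals = [sum(row[j] for row in rows) for j in range(c)]
    ss_cols = sum(t * t for t in col_totals) / r - correction
    ss_error = ss_total - ss_rows - ss_cols

    ms_error = ss_error / ((r - 1) * (c - 1))
    return {
        "ss_total": ss_total,
        "ss_rows": ss_rows,      # gasoline types
        "ss_cols": ss_cols,      # cars
        "ss_error": ss_error,
        "f_rows": (ss_rows / (r - 1)) / ms_error,
        "f_cols": (ss_cols / (c - 1)) / ms_error,
    }


result = two_way_anova(mileage)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Each F ratio is then compared with the tabled critical value for its degrees of freedom.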
Type of gasoline     Car A    Car B    Car C    Total    Mean
Regular              22.4     20.8     21.5     64.7     21.57
Super Regular        17.0     19.4     20.7     57.1     19.03
Unleaded             19.2     20.2     21.2     60.6     20.20
Premium Unleaded     20.3     18.6     20.4     59.3     19.77
Total                78.9     79.0     83.8     241.7 (grand total)
Mean                 19.73    19.75    20.95    20.14 (grand mean)

Gasoline type (row) totals: T1 = 64.7, T2 = 57.1, T3 = 60.6, T4 = 59.3
T1² = (64.7)² = 4186.09, T2² = (57.1)² = 3260.41, T3² = (60.6)² = 3672.36, T4² = (59.3)² = 3516.49

Car (column) totals: T1 = 78.9, T2 = 79.0, T3 = 83.8
T1² = (78.9)² = 6225.21, T2² = (79.0)² = 6241.00, T3² = (83.8)² = 7022.44
Each car has four observations: n1 = 4, n2 = 4, n3 = 4.

Overall: T = 241.7, T² = (241.7)² = 58418.89, N = 12, so T² / N = 4868.24

SST = ΣX² - T² / N = 4890.83 - 4868.24 = 22.59
SSA (cars) = (6225.21 + 6241.00 + 7022.44) / 4 - T² / N = 4872.16 - 4868.24 = 3.92
SSB (gasoline types) = (4186.09 + 3260.41 + 3672.36 + 3516.49) / 3 - T² / N = 4878.45 - 4868.24 = 10.21
SSE = SST - SSA - SSB = 22.59 - 3.92 - 10.21 = 8.46

The analysis of variance (ANOVA) table is shown below:

Source of variation              SS       df                   MS      F ratio
Among treatment groups (cars)    3.92     3 - 1 = 2            1.96    1.39
Among blocks (gasoline type)     10.21    4 - 1 = 3            3.40    2.41
Sampling error                   8.46     (4 - 1)(3 - 1) = 6   1.41
Total                            22.59    12 - 1 = 11

Hypotheses:
Null hypothesis H0: There is no difference among gasoline types.
Alternate hypothesis H1: There is a difference among gasoline types.
Significance level = 0.05
Critical value of F (df = 3, 6; α = 0.05) = 4.76
The F ratio computed in the ANOVA table is 2.41.
Since the F statistic is less than the critical value, the null hypothesis is not rejected. Hence there is no significant evidence of a difference among the gasoline types.

b. Is there a difference in the cars?
Hypotheses:
Null hypothesis H0: There is no difference among the cars.
Alternate hypothesis H1: There is a difference among the cars.
Significance level = 0.05
Critical value of F (df = 2, 6; α = 0.05) = 5.14
The F ratio computed in the ANOVA table is 1.39.
Since the F statistic is less than the critical value, the null hypothesis is not rejected. Hence there is no significant evidence of a difference among the cars.

37.
A regional commuter airline selected a random sample of 25 flights and found that the correlation between the number of passengers and the total weight, in pounds, of luggage stored in the luggage compartment is 0.94. Using the .05 significance level, can we conclude that there is a positive association between the two variables?

Hypotheses:
Null hypothesis H0: ρ ≤ 0 (there is no positive association between the two variables).
Alternate hypothesis H1: ρ > 0 (there is a positive association between the two variables).
Significance level = 0.05
n = 25, r = 0.94
Critical value (df = 23, α = 0.05, one-tailed) = 1.714
Test statistic: t = r √(n - 2) / √(1 - r²) = 0.94 √23 / √(1 - 0.8836) = 13.21
Since the t statistic is greater than the critical value, the null hypothesis is rejected. Hence it can be concluded that there is a positive association between the two variables.

40. A suburban hotel derives its gross income from its hotel and restaurant operations. The owners are interested in the relationship between the number of rooms occupied on a nightly basis and the revenue per day in the restaurant. Below is a sample of 25 days (Monday through Thursday) from last year showing the restaurant income and number of rooms occupied. Use a statistical software package to answer the following questions.

a. Does the breakfast revenue seem to increase as the number of occupied rooms increases? Draw a scatter diagram to support your conclusion.
The scatter plot shows that breakfast revenue tends to increase as the number of occupied rooms increases.

b. Determine the coefficient of correlation between the two variables. Interpret the value.
Pearson’s coefficient of correlation r for the given data is 0.437. This indicates a weak positive correlation between the breakfast revenue and the number of occupied rooms.

c. Is it reasonable to conclude that there is a positive relationship between revenue and occupied rooms? Use the .10 significance level.
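Both correlation tests in this part of the problem set rest on the same statistic, t = r √(n - 2) / √(1 - r²) with n - 2 degrees of freedom. The sketch below assumes that standard formula (the formula itself did not survive in the source text) and uses an illustrative function name, `correlation_t`.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a correlation coefficient, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Sample correlations and sample sizes quoted in problems 37 and 40.
t_luggage = correlation_t(0.94, 25)
t_hotel = correlation_t(0.437, 25)

print(f"problem 37: t = {t_luggage:.2f}")
print(f"problem 40: t = {t_hotel:.2f}")
```

Each value is compared with the tabled t critical value for 23 degrees of freedom at the stated significance level.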
Hypotheses:
Null hypothesis H0: ρ ≤ 0 (there is no positive association between the two variables).
Alternate hypothesis H1: ρ > 0 (there is a positive association between the two variables).
Significance level = 0.10
Correlation coefficient r = 0.437
Critical value (df = 23, α = 0.10, one-tailed) = 1.319
Test statistic: t = r √(n - 2) / √(1 - r²) = 0.437 √23 / √(1 - 0.1910) = 2.33
As the t statistic exceeds the critical value, the null hypothesis is rejected. Hence it can be concluded that there is a positive association between the two variables.

d. What percent of the variation in revenue in the restaurant is accounted for by the number of rooms occupied?
Coefficient of determination R² = (0.437)² = 0.1910
Hence 19.10% of the variation in breakfast revenue is explained by the number of rooms occupied.

17. The district manager of Jasons, a large discount electronics chain, is investigating why certain stores in her region are performing better than others. She believes that three factors are related to total sales: the number of competitors in the region, the population in the surrounding area, and the amount spent on advertising. From her district, consisting of several hundred stores, she selects a random sample of 30 stores. For each store she gathered the following information.

Y = total sales last year (in $ thousands).
X1 = number of competitors in the region.
X2 = population of the region (in millions).
X3 = advertising expense (in $ thousands).

a. What are the estimated sales for the Bryne store, which has four competitors, a regional population of 0.4 (400,000), and advertising expense of 30 ($30,000)?
The regression equation for the data is computed using Minitab and the results are presented below:

Analysis of variance
Source        DF    SS         MS
Regression    3     3050.00    1016.67
Error         26    2200.00    84.62
Total         29    5250.00

Predictor    Coef     StDev    t-ratio
Constant     14.00    7.00     2.00
X1           -1.00    0.70     -1.43
X2           30.00    5.20     5.77
X3           0.20     0.08     2.50

Y = 14 - 1 X1 + 30 X2 + 0.20 X3
When X1 = 4, X2 = 0.4, X3 = 30:
Y = 14 - 4 + 30(0.4) + 0.20(30) = 10 + 12 + 6 = 28 ($ thousands)
Therefore, the estimated total sales are $28,000.

b. Compute the R² value.
R² = SSR / SS total = 3050 / 5250 = 0.581

c. Compute the multiple standard error of estimate.
Standard error of estimate = √MSE = √84.62 = 9.20

d. Conduct a global test of hypothesis to determine whether any of the regression coefficients are not equal to zero. Use the .05 level of significance.
Hypotheses:
Null hypothesis H0: β1 = β2 = β3 = 0 (none of the independent variables has a linear relationship with the dependent variable).
Alternate hypothesis H1: At least one β ≠ 0 (at least one independent variable has a linear relationship with the dependent variable).
Critical value of F (df = 3, 26; α = 0.05) = 2.98
F statistic = MSR / MSE = 1016.67 / 84.62 = 12.01
Since the F statistic is greater than the critical value, the null hypothesis is rejected. Hence it can be concluded that at least one regression coefficient is not equal to zero.

e. Conduct tests of hypotheses to determine which of the independent variables have significant regression coefficients. Which variables would you consider eliminating? Use the .05 significance level.

Hypotheses 1 – Competitors in the Region:
Null hypothesis H0: β1 = 0 (there is no effect).
Alternate hypothesis H1: β1 ≠ 0 (there is an effect).
Significance level = 0.05
Critical value (df = 26, α = 0.05, two-tailed) = ±2.056
Test statistic: t = -1.43 (from the table)
Since the t statistic lies within the critical range, the null hypothesis is not rejected. Hence there is no significant evidence that the number of competitors in the region has an effect on the dependent variable.
Hypotheses 2 – Population of the Region:
Null hypothesis H0: β2 = 0 (there is no effect).
Alternate hypothesis H1: β2 ≠ 0 (there is an effect).
Significance level = 0.05
Critical value (df = 26, α = 0.05, two-tailed) = ±2.056
Test statistic: t = 5.77 (from the table)
Since the t statistic lies outside the critical range, the null hypothesis is rejected. Hence it can be concluded that the population of the region has an effect on the dependent variable.

Hypotheses 3 – Advertising Expense:
Null hypothesis H0: β3 = 0 (there is no effect).
Alternate hypothesis H1: β3 ≠ 0 (there is an effect).
Significance level = 0.05
Critical value (df = 26, α = 0.05, two-tailed) = ±2.056
Test statistic: t = 2.50 (from the table)
Since the t statistic lies outside the critical range, the null hypothesis is rejected. Hence it can be concluded that advertising expense has an effect on the dependent variable.

Only the variable "competitors in the region" does not have a significant effect on the dependent variable, total sales. Hence it is the variable to consider eliminating.

18. Suppose that the sales manager of a large automotive parts distributor wants to estimate as early as April the total annual sales of a region. On the basis of regional sales, the total sales for the company can also be estimated. If, based on past experience, it is found that the April estimates of annual sales are reasonably accurate, then in future the April forecast could be used to revise production schedules and maintain the correct inventory at the retail outlets. Several factors appear to be related to sales, including the number of retail outlets in the region stocking the company’s parts, the number of automobiles in the region registered as of April 1, and the total personal income for the first quarter of the year. Five independent variables were finally selected as being the most important (according to the sales manager). Then the data were gathered for a recent year.
The total annual sales for that year for each region were also recorded. Note in the following table that for region 1 there were 1,739 retail outlets stocking the company’s automotive parts, there were 9,270,000 registered automobiles in the region as of April 1, and so on. The sales for that year were $37,702,000.

a. Consider the following correlation matrix. Which single variable has the strongest correlation with the dependent variable? The correlations between the independent variables outlets and income and between cars and outlets are fairly strong. Could this be a problem? What is this condition called?
The variable with the strongest correlation with the dependent variable is income, with a correlation coefficient of 0.964. Regression analysis assumes that the independent variables are not strongly correlated with one another. However, there is a strong correlation between outlets and income, and between cars and outlets, so the independent variables are related to each other. This can be a problem because it makes the individual coefficient estimates unreliable. The condition is called multicollinearity.

b. The output for all five variables is on the following page. What percent of the variation is explained by the regression equation?
Coefficient of determination R² = SSR / SS total = 1593.81 / 1602.89 = 0.9943
Hence 99.43% of the variation is explained by the regression equation.

c. Conduct a global test of hypothesis to determine whether any of the regression coefficients are not zero. Use the .05 significance level.
F statistic = MSR / MSE (from the ANOVA table) = 318.76 / 2.27 = 140.42
Critical value of F (df = 5, 4; α = 0.05) = 6.26
Since the F statistic is greater than the critical value, the null hypothesis is rejected. Hence it can be concluded that at least one regression coefficient is not equal to zero.

d. Conduct a test of hypothesis on each of the independent variables. Would you consider eliminating “outlets” and “bosses”? Use the .05 significance level.
Hypotheses (for each independent variable i):
Null hypothesis H0: βi = 0
Alternate hypothesis H1: βi ≠ 0
Critical value (df = 4, α = 0.05, two-tailed) = ±2.776
Since the absolute t values for outlets and bosses are less than the critical value, the null hypothesis is not rejected for either variable. Their regression coefficients are not significantly different from zero, so both variables can be eliminated from the model.

e. The regression has been rerun below with “outlets” and “bosses” eliminated. Compute the coefficient of determination. How much has R² changed from the previous analysis?
Coefficient of determination R² = 1593.66 / 1602.89 = 0.9942
Thus 99.42% of the variability in sales is explained by the model. R² decreases by 0.01 percentage points compared with the previous analysis.

f. Following is a histogram and a stem-and-leaf chart of the residuals. Does the normality assumption appear reasonable?
The normality assumption is reasonable, as the histogram and stem-and-leaf chart are roughly symmetric.

g. Following is a plot of the fitted values of Y (i.e., Ŷ) and the residuals. Do you see any violations of the assumptions?
There are no apparent violations of the assumptions, as there is no pattern in the plot of the fitted values of Y against the residuals.

22. Banner Mattress and Furniture Company wishes to study the number of credit applications received per day for the last 300 days. The information is reported on the next page. To interpret, there were 50 days on which no credit applications were received, 77 days on which only one application was received, and so on. Would it be reasonable to conclude that the population distribution is Poisson with a mean of 2.0? Use the .05 significance level.

Hint: To find the expected frequencies use the Poisson distribution with a mean of 2.0. Find the probability of exactly one success given a Poisson distribution with a mean of 2.0. Multiply this probability by 300 to find the expected frequency for the number of days in which there was exactly one application. Determine the expected frequency for the other days in a similar manner.
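The procedure described in the hint can be sketched in a few lines of Python. This is a minimal illustration, assuming the categories 0 through 4 plus a combined "5 or more" class; the helper name `poisson_pmf` is an assumption made for the example.

```python
import math

LAMBDA = 2.0   # hypothesized Poisson mean
DAYS = 300     # number of days observed


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson mean of lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Probabilities for 0, 1, 2, 3 and 4 applications per day.
probs = [poisson_pmf(k, LAMBDA) for k in range(5)]
# The last class, "5 or more", takes the remaining probability.
probs.append(1.0 - sum(probs))

# Expected frequency = probability x number of days.
expected = [DAYS * p for p in probs]

labels = ["0", "1", "2", "3", "4", "5 or more"]
for label, p, e in zip(labels, probs, expected):
    print(f"{label:>9}  P = {p:.4f}  expected = {e:.1f}")
```

Treating "5 or more" as the complement of the first five probabilities keeps the expected frequencies summing to 300.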
The hypothesized distribution is Poisson with λ = 2. The Poisson probability is P(X = x) = λ^x e^(-λ) / x!.

P(X = 5 or more) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)] = 1 - (0.1353 + 0.2707 + 0.2707 + 0.1804 + 0.0902) = 1 - 0.9473 = 0.0527

The expected frequencies are computed as shown below:

No. of credit applications    Observed frequency (fo)    Probability (P)    Expected frequency (fe) = 300 × P
0                             50                         0.1353             300 × 0.1353 = 41
1                             77                         0.2707             300 × 0.2707 = 81
2                             81                         0.2707             300 × 0.2707 = 81
3                             48                         0.1804             300 × 0.1804 = 54
4                             31                         0.0902             300 × 0.0902 = 27
5 or more                     13                         0.0527             300 × 0.0527 = 16

Hypotheses:
Null hypothesis H0: The distribution of the number of credit applications per day is Poisson with a mean of 2.0.
Alternate hypothesis H1: The distribution of the number of credit applications per day is not Poisson with a mean of 2.0.
Level of significance = 0.05
Degrees of freedom = 6 - 1 = 5 (six categories, with the mean specified rather than estimated from the data)
Critical value (chi-square) = 11.07
Chi-square statistic = Σ (fo - fe)² / fe = 3.9949

Since the chi-square value is less than the critical value, the null hypothesis is not rejected. Hence the observed distribution of the number of credit applications per day is consistent with a Poisson distribution with a mean of 2.0.
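The chi-square statistic for the credit application data can be checked with a short sketch. It uses the observed counts and the rounded expected frequencies from the table; the function name `chi_square_statistic` is an illustrative assumption.

```python
# Observed days for 0, 1, 2, 3, 4 and "5 or more" applications.
observed = [50, 77, 81, 48, 31, 13]
# Expected frequencies under Poisson(2.0), rounded as in the table.
expected = [41, 81, 81, 54, 27, 16]


def chi_square_statistic(obs, exp):
    """Goodness-of-fit statistic: sum of (fo - fe)^2 / fe over all classes."""
    if len(obs) != len(exp):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.4f}")
```

The statistic is then compared with the tabled chi-square critical value for the chosen degrees of freedom.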
