The Multiple Linear Regression Model Specification Essay Example

DATA ANALYSIS

Regress vote on the variables in the Table above. Assume that a constant should be included in the model. Provide the multiple linear regression model specification including the independent variables in the same order as they are presented in the table above. [10 marks]

a) Provide an interpretation to the coefficient estimates of that regression. What is the meaning of the estimated intercept in this equation?

SOLUTION The estimated equation is $\widehat{vote} = 48.7337 - 0.61945\,inflation + 0.48663\,growth + 0.64031\,goodnews - 2.66658\,war + 3.04593\,person$. Each slope coefficient measures the change in the predicted vote share, in percentage points, associated with a one-unit increase in that variable while the other variables in the model are held constant. The change is additive, not a multiplicative factor.

Inflation: the coefficient for inflation is -0.61945, so inflation and vote are negatively related. For every one-unit increase in inflation, vote decreases by 0.61945 percentage points, other things equal, and a one-unit decrease raises it by the same amount.

Growth: the coefficient for growth is 0.48663, so growth and vote are positively related. For every one-unit increase in growth, vote increases by 0.48663 percentage points, other things equal, and vice versa.

Goodnews: the coefficient for goodnews is 0.64031, so goodnews and vote are positively related. For every one-unit increase in goodnews, vote increases by 0.64031 percentage points, other things equal, and vice versa.

War: the coefficient for war is -2.66658, so war and vote are negatively related. For every one-unit increase in war, vote decreases by 2.66658 percentage points, other things equal, and vice versa.

Person: the coefficient for person is 3.04593, so person and vote are positively related. For every one-unit increase in person, vote increases by 3.04593 percentage points, other things equal. If person indicates whether the incumbent is running for election (as question 11 suggests), this is the estimated difference in vote between elections in which the incumbent runs and those in which they do not.

Estimated intercept (constant): the constant is 48.7337. This is the predicted vote when all five independent variables are equal to zero. It has a direct interpretation only insofar as that combination of values is meaningful.
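As a minimal illustration (not part of the original EViews workflow), the fitted equation can be coded directly. This assumes the regressors enter in the order interpreted above: vote = b0 + b1·inflation + b2·growth + b3·goodnews + b4·war + b5·person + u. The function names and the baseline values are hypothetical. The sketch shows two things. First, each slope is the change in predicted vote for a one-unit change in its regressor, whatever values the other regressors take. Second, the intercept is the prediction when every regressor is zero.

```python
# Coefficients as reported in question 1 (a); the ordering of regressors is an assumption.
COEFS = {
    "const": 48.7337,
    "inflation": -0.61945,
    "growth": 0.48663,
    "goodnews": 0.64031,
    "war": -2.66658,
    "person": 3.04593,
}
REGRESSORS = ["inflation", "growth", "goodnews", "war", "person"]


def predicted_vote(values):
    """Fitted vote share (percentage points) for a dict of regressor values."""
    return COEFS["const"] + sum(COEFS[name] * values[name] for name in REGRESSORS)


def partial_effect(name, baseline, delta=1.0):
    """Change in fitted vote when `name` rises by `delta`, others held at `baseline`."""
    shifted = dict(baseline)
    shifted[name] += delta
    return predicted_vote(shifted) - predicted_vote(baseline)


# Hypothetical baseline values, used only to show that the effects do not depend on them.
baseline = {"inflation": 2.0, "growth": 1.0, "goodnews": 3.0, "war": 0.0, "person": 1.0}

for name in REGRESSORS:
    print(f"{name:>9}: {partial_effect(name, baseline):+.5f}")
print("intercept:", predicted_vote({name: 0.0 for name in REGRESSORS}))
```

Because the model is linear in the regressors, changing the baseline leaves every partial effect unchanged.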
b) Perform tests for the statistical significance of the parameters of the independent variables inflation, growth and goodnews using the critical value of the corresponding t-distribution and the test p-value. Interpret the test results.

SOLUTION For each variable we test $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$ at the 5% significance level. The critical value comes from the t-distribution with 25 degrees of freedom, which are the residual degrees of freedom shown in the F(5, 25) statistic below. For a two-sided test this critical value is 2.0595.

Inflation: the computed t-statistic is -1.37, and |-1.37| = 1.37 is less than the critical value of 2.0595, so we fail to reject the null hypothesis. Similarly, the p-value is 0.183 > 0.05 (significance level), which also leads us to fail to reject the null hypothesis. We conclude that the parameter of inflation is not statistically significant at the 5% level.

Growth: the computed t-statistic is 3.03, which is greater than the critical value of 2.0595, so we reject the null hypothesis. Similarly, the p-value is 0.006 < 0.05 (significance level), which also leads us to reject the null hypothesis. We conclude that the parameter of growth is statistically significant at the 5% level.

Goodnews: the p-value is greater than 0.05 (significance level), and equivalently |t| is below the critical value. We therefore fail to reject the null hypothesis and conclude that the parameter of goodnews is not statistically significant at the 5% level.

2. Perform a joint significance test for the independent variables of the model using both the p-value and the critical value of the F-distribution. [5 marks]

a) Comment on the goodness-of-fit of the model. What other factors might affect vote?

SOLUTION The null hypothesis is that all five slope coefficients are zero. The statistic F(5, 25) = 5.79 is greater than the 5% critical F-value of 2.6030 from the tables, so we reject the null hypothesis. Similarly, the p-value is 0.0011, less than the 5% significance level, which also leads us to reject the null hypothesis. We conclude that the independent variables are jointly significant: taken together, they help explain the dependent variable (vote). In terms of goodness of fit, the R-squared of 0.5365 means that the model explains about 54% of the variation in vote, leaving a substantial share to factors outside the model.
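The critical values and p-values used in these tests can be reproduced with a short Python sketch. It relies only on the standard library; in practice SciPy's scipy.stats.t and scipy.stats.f would normally be used instead. The sketch evaluates the t and F tail probabilities through the regularized incomplete beta function, using the standard continued-fraction method, and finds critical values by bisection. The 25 residual degrees of freedom are an inference from the reported F(5, 25) statistic (31 observations and 6 estimated coefficients). With 30 degrees of freedom, the 5% two-sided t critical value would instead be 2.0423.

```python
from math import exp, lgamma, log

TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a, b, x, max_iter=300, eps=3e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """P(|T| > |t|) for Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def f_upper_p(f, d1, d2):
    """P(F > f) for the F(d1, d2) distribution."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def _solve_decreasing(func, target, lo, hi, iters=200):
    """Bisection for func(x) = target when func decreases in x."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_critical(df, alpha=0.05):
    return _solve_decreasing(lambda t: t_two_sided_p(t, df), alpha, 0.0, 100.0)


def f_critical(d1, d2, alpha=0.05):
    return _solve_decreasing(lambda f: f_upper_p(f, d1, d2), alpha, 0.0, 1000.0)


DF_RESID = 25  # inferred from the reported F(5, 25)
t_crit = t_critical(DF_RESID)
for name, t_stat in [("inflation", -1.37), ("growth", 3.03)]:
    p = t_two_sided_p(t_stat, DF_RESID)
    verdict = "reject H0" if abs(t_stat) > t_crit else "fail to reject H0"
    print(f"{name}: |t|={abs(t_stat):.2f}, t_crit={t_crit:.4f}, p={p:.3f} -> {verdict}")

f_crit = f_critical(5, DF_RESID)
print(f"F(5, 25) = 5.79: F_crit={f_crit:.4f}, p={f_upper_p(5.79, 5, DF_RESID):.4f}")
```

Only the reported test statistics are used as inputs. The regression data themselves are not reproduced in this document.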
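Other factors that might affect vote could be:
i) media coverage of the candidates or political parties;
ii) the age and background of voters;
iii) social class (for example, employment and unemployment).

b) What are the consequences of the results of this F-test together with those of the t-tests (from question 1) for the specification of the model?

SOLUTION The F-test evaluates the null hypothesis that all slope coefficients (every coefficient except the intercept) are equal to zero, against the alternative that at least one is not. An equivalent null hypothesis is that the population R-squared equals zero. A significant F-test indicates that the observed R-squared is unlikely to be a chance result of sampling variation in the data set. The F-test therefore determines whether the proposed relationship between the response variable and the set of predictors is statistically reliable. This is useful whether the research objective is prediction or explanation. Overall, we would say that the model is reliable for predicting the dependent variable (vote).

However, the t-tests show that four variables (inflation, goodnews, war and person) are individually insignificant at the 5% level, with only growth significant, so we should consider removing them from the model. A significant F-test combined with mostly insignificant t-tests can also reflect correlation among the regressors. For that reason, the four variables should first be tested jointly, with an F-test of the restriction that all four coefficients are zero, before any are dropped.

3. Test the hypothesis that an extra 0.25% in the real per capita GDP growth rate has double the effect of a decrease of 0.5% in inflation on vote. [9 marks]

a) Use the command available in EViews to test for the corresponding coefficient restriction.

SOLUTION The restriction derived in b) can be tested with the Wald test for coefficient restrictions in EViews.

b) Perform the test analytically.

SOLUTION An extra 0.25 percentage points of growth changes predicted vote by $0.25\beta_{growth}$, and a 0.5-point fall in inflation changes it by $-0.5\beta_{inflation}$. The hypothesis $0.25\beta_{growth} = 2(-0.5\beta_{inflation})$ therefore reduces to $H_0: \beta_{growth} + 4\beta_{inflation} = 0$ against $H_1: \beta_{growth} + 4\beta_{inflation} \neq 0$.

The test statistic is $t = (b_{growth} + 4b_{inflation})/\mathrm{se}(b_{growth} + 4b_{inflation})$, where $\mathrm{se}(b_{growth} + 4b_{inflation}) = \sqrt{\mathrm{Var}(b_{growth}) + 16\,\mathrm{Var}(b_{inflation}) + 8\,\mathrm{Cov}(b_{growth}, b_{inflation})}$. It is compared with the t-distribution with 25 degrees of freedom; equivalently, $t^2$ is compared with F(1, 25). To obtain the answer to this question, we also conducted a marginal effects test.

c) Interpret the test results.

SOLUTION The p-value related to the coefficient of growth in the marginal effects output is 0.000, implying that the marginal effect of growth is significant at the 5% significance level. In a linear model, however, the marginal effects (dy/dx) are simply the coefficients. The growth coefficient (0.48663) is not close to four times the magnitude of the inflation coefficient (4 × 0.61945 = 2.4778), and the point estimate of $b_{growth} + 4b_{inflation}$ is -1.99117.

Whether this departure is significant depends on the covariance of the two estimates. The t-statistics in question 1 imply standard errors of about 0.161 for growth and 0.452 for inflation. For any admissible covariance, these keep $\mathrm{se}(b_{growth} + 4b_{inflation})$ between about 1.65 and 1.97, so |t| lies between roughly 1.01 and 1.21. Both bounds are below the 5% critical value of 2.0595.

We therefore fail to reject the hypothesis. The data are consistent with an extra 0.25% in growth having double the effect of a 0.5% decrease in inflation. This, however, reflects the imprecise estimate of the inflation coefficient rather than close agreement between the point estimates.

4.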
Answer the sub questions below on multicollinearity analysis in the model. [8 marks]

a) Test for multicollinearity between the independent variables growth and inflation in the model. Explain your answer using EViews outputs.

SOLUTION Klein's rule of thumb says that multicollinearity is probably a problem if the R-squared of an auxiliary regression is greater than the R-squared of the original regression. Here the auxiliary regression is inflation regressed on the other regressors, including growth.

The variance inflation factor (VIF) of a predictor is $1/(1 - R_j^2)$, where $R_j^2$ is the R-squared of its auxiliary regression. It shows by how much the variance of that predictor's coefficient (and, through its square root, the standard error) is increased because the predictor is correlated with the other regressors. The VIF output shows that growth does not materially inflate the variance of the inflation coefficient, so there is no evidence of multicollinearity between the two variables.

b) Assuming that there is multicollinearity between those variables:

i) Explain how you would resolve this problem. Explain your answer using EViews outputs.

SOLUTION Centering means subtracting the mean from the predictor values before generating squared or interaction terms. It resolves multicollinearity only when the problem is structural, that is, when it arises from such terms built from the same variable, and this model contains none. For correlated regressors such as growth and inflation, the usual remedies are:
- dropping one of the variables;
- combining them into a single index;
- collecting more data.
The effect of each remedy can be judged by re-estimating the equation and comparing the standard errors and VIFs in the EViews output.

ii) What are the consequences of multicollinearity for the OLS estimator?

SOLUTION The OLS estimator is still BLUE. In the presence of (imperfect) multicollinearity, the OLS estimator remains unbiased. Within the class of linear unbiased estimators it still has minimum variance, so no alternative linear unbiased estimator does better. However, even though the OLS estimator is the "best" such estimator, it may not be very good.

The fit of the sample regression equation is unaffected. The overall fit of the sample regression equation, as measured by the R-squared statistic, is not affected by the presence of multicollinearity. Thus, if the sole objective of the empirical study is prediction or forecasting, as in this case, multicollinearity does not matter.
The variances and standard errors of the parameter estimates will increase. The worst effect of multicollinearity is that it increases the variances and standard errors of the OLS estimates. High variance means the estimates are imprecise and therefore, to some extent, unreliable. High variances and standard errors imply low t-statistics. As a result, multicollinearity increases the chance of a type II error: failing to reject the null hypothesis when it is false, and therefore concluding that Y is not affected by X when in fact it is. In other words, multicollinearity makes it difficult to detect an effect if one exists.

5. Perform a graphical analysis to detect the presence of heteroscedasticity in the model using at least two different plots. Do you find evidence of heteroscedasticity? Why? Explain the consequences of heteroscedasticity on the OLS estimator. [4 marks]

SOLUTION Looking at the plots, we see no extreme outliers, and we conclude that there is no evidence of heteroscedasticity. Strictly, the feature to look for is a systematic change in the spread of the residuals. An example is a funnel shape when the residuals (or squared residuals) are plotted against the fitted values or a regressor. The absence of outliers alone does not rule heteroscedasticity out.

Consequences of heteroscedasticity on the OLS estimator:
i) The OLS estimators are still unbiased and consistent, because heteroscedasticity does not make any independent variable correlated with the error term. A correctly specified equation therefore still gives coefficient estimates that are centered on the true parameters.
ii) The OLS estimators are no longer efficient. Other linear unbiased estimators (such as weighted least squares) have smaller variances, so OLS is not BLUE.
iii) The usual formulas for the variances and covariances of the OLS estimates are biased and inconsistent. They often underestimate the true variances, which leads to inflated t and F statistics.
iv) Hypothesis tests and confidence intervals based on the usual standard errors are therefore not valid.

6. Perform a White test for heteroscedasticity. [8 marks]

a) Provide the auxiliary regression and explain the meaning of the null hypothesis for this test.
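SOLUTION The auxiliary regression regresses the squared OLS residuals on the original regressors, their squares and, in the version with cross terms, their pairwise cross products:
$\hat{u}_t^2 = \alpha_0 + \alpha_1 inflation_t + \alpha_2 growth_t + \alpha_3 goodnews_t + \alpha_4 war_t + \alpha_5 person_t + (\text{squares and cross products of the regressors}) + v_t$.

The null hypothesis is homoscedasticity: all slope coefficients of the auxiliary regression are zero, so the error variance does not depend on the regressors. The alternative is heteroscedasticity. The test statistic is $nR^2$ from the auxiliary regression, which is asymptotically chi-square with degrees of freedom equal to the number of slope terms. We evaluate the null hypothesis using the p-value. It is 0.6834 > 0.05 (significance level), so we fail to reject the null hypothesis and conclude that there is no evidence of heteroscedasticity in the data.

b) Why is the White test preferred to the Breusch-Pagan and Goldfeld-Quandt tests for heteroscedasticity? Explain your answer.

SOLUTION The White test is a general test for heteroscedasticity and has the following advantages over the Breusch-Pagan and Goldfeld-Quandt tests:
i) It does not require one to specify a model of the structure of the heteroscedasticity, if any exists.
ii) It does not rely on the assumption that the errors are normally distributed, on which the standard forms of the other two tests depend.
iii) It tests directly whether heteroscedasticity makes the usual OLS formulas for the variances and covariances of the estimates incorrect.
Lastly, the Breusch-Pagan test works well when the error variance is a linear function of the chosen variables but not for non-linear forms. The Goldfeld-Quandt test requires ordering the observations by a single variable and splitting the sample, which makes it less flexible than the White test.

7. Assume that there is heteroscedasticity of the form: How would you resolve the problem of heteroscedasticity in this case? Explain your answer analytically. [4 marks]

SOLUTION Suppose the error variance has the assumed form $\mathrm{Var}(u_t) = \sigma^2 h_t$, where $h_t$ is known from that form. The problem is then resolved by weighted least squares. Dividing every term of the model, including the constant, by $\sqrt{h_t}$ gives a transformed error $u_t/\sqrt{h_t}$ with variance $\sigma^2$ for every t. The transformed model is therefore homoscedastic, and OLS applied to it is BLUE. We then put all t transformed observations in matrices $y^*$ and $X^*$ and obtain the estimates of the coefficients (alpha) by solving the normal equations $X^{*\prime}X^*\hat{\alpha} = X^{*\prime}y^*$.

8. Estimate the model using White's autocorrelation and heteroscedasticity consistent standard errors. Comment on the results of that estimation in relation to the estimation results in question 1. When do we use White standard errors? [5 marks]

SOLUTION The table above gives the results of the autocorrelation test.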
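The p-values are greater than the 5% significance level, so we fail to reject the null hypothesis of no serial correlation. There is no evidence of autocorrelation in the model. For heteroscedasticity, the p-value is 0.9772 (greater than the 5% significance level), so we fail to reject the null hypothesis and conclude that there is no evidence of heteroscedasticity in the model.

Relative to question 1, using robust standard errors leaves the coefficient estimates unchanged; only the standard errors, t-statistics and p-values can differ. We should use White standard errors when heteroscedasticity of unknown form is detected or suspected, because they keep the t and F tests valid in large samples without requiring a model of the variance. When autocorrelation is also present, heteroscedasticity and autocorrelation consistent (Newey-West) standard errors are used instead.

9. Provide a graphical analysis of the residuals to detect the presence of autocorrelation using at least two different plots. What are the consequences of autocorrelation on the OLS estimator? [4 marks]

SOLUTION The graphs show no evidence of autocorrelation in the specified model.

Consequences of autocorrelation on the OLS estimator:
i) The OLS estimators remain linear and unbiased.
ii) They no longer have minimum variance, so they are not efficient.
iii) The usual formulas for estimating the variances of the estimators are biased. Depending on the sign and pattern of the autocorrelation, they can understate or overstate the true variances; with positive autocorrelation they usually understate them.
iv) Confidence intervals and hypothesis tests based on the t and F distributions are unreliable.
v) Since the usual estimate of the error variance, $\hat{\sigma}^2$, is affected, so is R-squared, which may misstate the true fit.
vi) Computed standard errors and variances of forecasts may be inaccurate.

10. Test for autocorrelation in the residuals using an appropriate procedure. [4 marks]

SOLUTION We tested for first-order autocorrelation using the Durbin-Watson test, $d = \sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2 / \sum_{t=1}^{n}\hat{u}_t^2$. The decision rule against positive first-order autocorrelation is:
- if d is greater than the tabulated upper bound $d_U$, we fail to reject the null hypothesis of non-autocorrelated errors;
- if d is below the lower bound $d_L$, we reject it;
- if d lies between the bounds, the test is inconclusive.

Since d = 2.375817 is greater than $d_U$ = 1.920, we fail to reject the null hypothesis against positive autocorrelation. Because d is above 2, negative autocorrelation should also be checked by comparing 4 - d = 1.624 with the same bounds. This value lies below $d_U$ = 1.920 but above $d_L$, so this part of the test is inconclusive. The serial correlation test reported in question 8 has p-values above 5%. Together, these results give no evidence that the errors in the model are autocorrelated.

11. All other factors being equal, is there evidence that the incumbent running for the election is a determinant of the percentage share of the vote won by the incumbent party?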
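How strong is the evidence? Show all steps of the corresponding test to answer this question. [8 marks]

SOLUTION Because the question holds all other factors equal, the appropriate test is the t-test on the coefficient of person in the multiple regression of question 1, not a simple correlation.
Step 1: state $H_0: \beta_{person} = 0$ (the incumbent running has no effect on vote) against $H_1: \beta_{person} \neq 0$.
Step 2: compute $t = b_{person}/\mathrm{se}(b_{person})$ from the regression output.
Step 3: compare |t| with the critical value of the t-distribution with 25 degrees of freedom (2.0595 for a two-sided 5% test), or compare the p-value with 0.05.
Step 4: draw the conclusion.

The estimated coefficient is 3.04593. Other factors equal, the incumbent party's vote share is therefore about 3 percentage points higher when the incumbent runs. However, the t-tests discussed in question 2 found person insignificant at the 5% level, so the evidence that the incumbent running is a determinant of the vote is weak.

We also ran a correlation test between vote and person. The Pearson correlation coefficient is 0.2746, a positive but weak association. This correlation does not hold other factors constant. With 31 observations (implied by F(5, 25)), its t-statistic, $r\sqrt{n-2}/\sqrt{1-r^2} \approx 1.54$, is below the 5% critical value of about 2.05 (29 degrees of freedom), so on its own it is not statistically significant either.

12. Describe step by step how you could test for the best functional form for the model in Question 1. [9 marks]

SOLUTION We use the Ramsey RESET test. Ramsey argued that various specification errors (omitted variables, incorrect functional form, correlation between X and u) give rise to a disturbance vector u with a nonzero mean. The null and alternative hypotheses are $H_0: u \sim N(0, \sigma^2 I)$ vs $H_1: u \sim N(\mu, \sigma^2 I)$ with $\mu \neq 0$. The test of $H_0$ is based on an augmented regression $y = X\beta + Z\alpha + u$. The test for specification error is then $\alpha = 0$.

Ramsey's suggestion is that Z should contain powers of the predicted values of the dependent variable. Using the second, third, and fourth powers gives $Z = [\hat{y}^2 \; \hat{y}^3 \; \hat{y}^4]$, where $\hat{y} = Xb$ and $\hat{y}^2 = [\hat{y}_1^2 \; \hat{y}_2^2 \; \cdots \; \hat{y}_n^2]'$, etc. The first power, $\hat{y}$, is not included because it is an exact linear combination of the columns of X. Its inclusion would make the regressor matrix [X Z] have less than full rank.

Based on the table above, the p-value is 0.3897 > 0.05 (significance level). We therefore fail to reject the null hypothesis and conclude that there is no evidence of misspecification, such as an incorrect functional form or omitted variables, in the model.

13. Test the assumption of normality in the residuals of the selected model in question 1 by using the Jarque-Bera (JB) tests. Comment on the implications of your JB test results on the properties of the OLS estimator. [6 marks]

SOLUTION The results above indicate that the p-value is greater than 5%, so we fail to reject the null hypothesis and conclude that the residuals in the model are consistent with a normal distribution. With normally distributed errors, the OLS estimator coincides with the maximum likelihood estimator and is itself normally distributed. The t and F tests used above are therefore exact in finite samples rather than only approximately valid in large samples.

14. For what purpose can your analysis above be used by political parties in general elections? Explain your answer.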
[6 marks] SOLUTION The above analysis can help political parties plan election strategies, since close to 54% (R-squared = 0.5365) of the variation in vote is explained by the five independent variables in the model. Knowing which factors affect voting patterns, and by how much, helps parties decide where to focus their efforts.
