Internal Code: MAS5363
Business Statistics Assignment:
A local used car dealer in Berlin has asked us to evaluate the price of premium vehicles. We have collected data on 1049 BMWs listed on a used car website in 2016. The data cover the price of each vehicle and characteristics such as age, kilometres driven, fuel used and body style. You will use descriptive statistics, inferential statistics and your knowledge of multiple linear regression.
A. Calculate the descriptive statistics from the data and display them in a table. Be sure to comment on the central tendency, variability and shape for all of the variables excluding Year, Name and Model. Include information regarding the quartiles for Price, Kilometers and PowerKW. How would you interpret the mean of dummy variables such as Automatic or Petrol?
B. Draw a graph that displays the distribution of Price. Be sure to comment on the distribution. Does it appear normally distributed?
C. Create a box-and-whisker plot for the distribution of Age and describe the shape. Is there evidence of outliers in the data?
D. What is the probability that a randomly selected vehicle is a convertible? What is the likelihood that the age of a convertible exceeds 25 years? Is the age of a vehicle statistically independent of whether it is a convertible? Use a Contingency Table or Pivot Table to show the relative frequencies of these events.
E. Estimate the 95% confidence interval for the population mean price of Hatchbacks. How does this compare to the 95% confidence interval for the population mean price of Coupes?
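As a rough illustration of the interval calculation in Part E, the sketch below builds a confidence interval from summary statistics. It is only a sketch under stated assumptions: it uses the large-sample normal (z) critical value rather than the t critical value, so for a small body-style subsample the t distribution with n - 1 degrees of freedom should be used instead. The function name and all of the numbers are hypothetical and are not taken from the BMW data set.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean.

    Assumption: n is large enough for the normal approximation; with a
    small sample, swap the z critical value for a t critical value.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical summary statistics, NOT computed from the assignment data.
hatch_low, hatch_high = mean_confidence_interval(15000, 6000, 100)
coupe_low, coupe_high = mean_confidence_interval(22000, 9000, 64)

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = hatch_low <= coupe_high and coupe_low <= hatch_high

print(f"Hatchback 95% CI: ({hatch_low:.2f}, {hatch_high:.2f})")
print(f"Coupe 95% CI:     ({coupe_low:.2f}, {coupe_high:.2f})")
print("Intervals overlap:", intervals_overlap)
```

Comparing the two intervals then comes down to their location and width: a wider interval reflects a smaller subsample or more variable prices.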
F. It is traditionally believed that fewer than half of the convertibles in Germany have a manual transmission. Test the claim that the population proportion of convertibles with a manual transmission is less than 50% at the 5% level of significance.
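The test in Part F can be sketched as a one-sample, left-tailed z test for a proportion, with H0: p >= 0.50 against H1: p < 0.50. The function name and the counts below are hypothetical placeholders, not values from the data; the sketch also assumes the sample is large enough for the normal approximation (n·p0 and n·(1 - p0) both at least 5).

```python
from math import sqrt
from statistics import NormalDist


def left_tailed_proportion_test(successes, n, p0=0.50, alpha=0.05):
    """One-sample z test of H0: p >= p0 against H1: p < p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)   # uses p0, as H0 is assumed true
    z_stat = (p_hat - p0) / standard_error
    p_value = NormalDist().cdf(z_stat)         # area to the left of z_stat
    z_critical = NormalDist().inv_cdf(alpha)   # about -1.645 when alpha = 0.05
    return z_stat, p_value, z_critical, p_value < alpha


# Hypothetical counts: 40 manual cars among 100 convertibles.
z_stat, p_value, z_critical, reject_h0 = left_tailed_proportion_test(40, 100)

print(f"z = {z_stat:.3f}, critical value = {z_critical:.3f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The critical value and p-value approaches always agree: H0 is rejected when the test statistic falls below the critical value, which is exactly when the p-value is below alpha.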
G. Run a multiple linear regression using the data and show the output from Excel. Important: Exclude the dummy variable Coupe from the regression, as well as Year, Name and Model.
H. Is the coefficient estimate for Age statistically different from zero at the 5% level of significance? Set up the appropriate hypothesis test using the results in the table from Part (G), applying both the critical value and p-value approaches. Interpret the slope coefficient estimate.
I. Interpret the remaining slope coefficient estimates. Discuss whether the signs are what you are expecting and explain your reasoning.
J. Interpret the value of the Adjusted R². Is there a large difference between the R² and the Adjusted R²? If so, what might explain it?
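For reference when answering Part J, the standard adjustment is Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The short sketch below shows how the gap grows with k. Only n = 1049 comes from the assignment; the R² value and the predictor counts are hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# n_obs = 1049 is the sample size stated in the assignment.
# The R² of 0.80 and the predictor counts are hypothetical.
for k in (5, 10, 50):
    adj = adjusted_r_squared(0.80, 1049, k)
    print(f"k = {k:2d}: adjusted R² = {adj:.4f}, gap = {0.80 - adj:.4f}")
```

With a sample this large relative to the number of predictors, the penalty is small, so a large gap would usually point to many weak predictors or a small sample.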
K. Is the overall model statistically significant at the 5% level of significance? Use the p-value approach.
L. Based on the regression results, what other factors may have influenced the sale price of the used vehicles? Provide a couple of possible examples and indicate their predicted relationship with sale price if they were included.
M. Predict the average price of a vehicle that is 5 years old, has an automatic transmission, has 75,000 kilometers, uses Petrol, has no damage, has a 110 kW engine, and the body style is a sedan. Discuss whether it is appropriate to predict under these conditions. Show the predicted regression equation.
N. Do the results suggest that the data satisfy the assumptions of a linear regression: Linearity, Normality of the Errors, and Homoscedasticity of Errors? Show using scatter diagrams, normal probability plots and/or histograms, and explain.
O. Do these data reflect the true population distribution of vehicle prices of BMWs in Berlin? Explain and, if not, describe a sampling procedure that could lead to more accurate results. Would you expect these results to hold for Mercedes as well?
P. The car dealer wants to display a random selection of 5 “high performance” vehicles on their website. They define “high performance” as having an engine exceeding 200 kW. The dealer would generally like a mixture of body styles to show up on the website. What is the probability that all 5 of the selected vehicles would be sedans? What is the probability that none would be sedans? Compute these using a Binomial Table and construct a bar chart to show the probability distribution of the number of vehicles that are sedans.
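The binomial table asked for in Part P can be sketched as follows, with n = 5 trials and p equal to the share of sedans among the high performance vehicles. The value p = 0.30 is a hypothetical placeholder; the actual share has to be computed from the data. The sketch also assumes each selection is independent with a constant probability, which is what the binomial model requires.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n_selected = 5     # vehicles shown on the website (from the assignment)
p_sedan = 0.30     # HYPOTHETICAL share of sedans among cars over 200 kW

# Binomial table: number of sedans -> probability
table = {k: binomial_pmf(k, n_selected, p_sedan) for k in range(n_selected + 1)}

print("Sedans  Probability")
for k, prob in table.items():
    bar = "#" * round(prob * 50)   # crude text bar chart
    print(f"{k:>6}  {prob:.5f}  {bar}")

print(f"P(all 5 sedans) = {table[5]:.5f}")
print(f"P(no sedans)    = {table[0]:.5f}")
```

The same table can be reproduced in Excel with BINOM.DIST(k, 5, p, FALSE) for k = 0 to 5, and the bar chart drawn from those six probabilities.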