Task: About the Project

The project is an opportunity to conduct a complete regression analysis of a real data set using what you have learned in this course. It will require all of the skills we have covered over the quarter: plotting the data, translating scientific questions into statistical regression terminology, answering those questions, and presenting the results concisely.

This is a group project; groups can be no larger than 4 people. The project comprises 10% of the total course grade.

Again, there will be more than one viable model for a data set. The final model will be acceptable as long as you can justify it with what we’ve learned over the quarter. Whichever regression methods you use to find a final model, keep in mind that we prefer parsimonious models to overly complicated ones.

Before we use a linear regression model to answer questions of interest, the four conditions of a linear regression model must be met. Transformations are necessary when any of the four conditions is not met.

Data Sets

Two data sets are available in the “Project Data Sets” folder on GauchoSpace. Each group chooses one of the two data sets.

1. Real Estate Sales

The city tax assessor was interested in predicting residential home sales prices in a midwestern city as a function of various characteristics of the home and surrounding property. Data on 521 arms-length transactions were obtained for home sales during the year 2002.
The 12 variables are:

   1. Sales price: Sales price of residence (dollars)
   2. Finished square feet: Finished area of residence (square feet)
   3. Number of bedrooms: Total number of bedrooms in residence
   4. Number of bathrooms: Total number of bathrooms in residence
   5. Air conditioning: Presence or absence of air conditioning: 1 if yes; 0 otherwise
   6. Garage size: Number of cars that garage will hold
   7. Pool: Presence or absence of swimming pool: 1 if yes; 0 otherwise
   8. Year built: Year property was originally constructed
   9. Quality: Index for quality of construction: 1 indicates high quality; 2 indicates medium quality; 3 indicates low quality
   10. Style: Qualitative indicator of architectural style
   11. Lot size: Lot size (square feet)
   12. Adjacent to highway: Presence or absence of adjacency to highway: 1 if yes; 0 otherwise

2. Infection Risk

The primary objective of the Study on the Efficacy of Nosocomial Infection Control was to determine whether infection surveillance and control programs have reduced the rates of nosocomial (hospital-acquired) infection in United States hospitals. This data set consists of a random sample of 113 hospitals selected from the original 338 hospitals surveyed. The data are for the 1975-1976 study period.
The 12 variables are:

   1. Identification number: 1-113
   2. Length of stay: Average length of stay of all patients in hospital (in days)
   3. Age: Average age of patients (in years)
   4. Infection risk: Average estimated probability of acquiring infection in hospital (in percent)
   5. Routine culturing ratio: Ratio of number of cultures performed to number of patients without signs or symptoms of hospital-acquired infection, times 100
   6. Routine chest X-ray ratio: Ratio of number of X-rays performed to number of patients without signs or symptoms of pneumonia, times 100
   7. Number of beds: Average number of beds in hospital during study period
   8. Medical school affiliation: 1 = Yes, 2 = No
   9. Region: Geographic region, where: 1 = NE, 2 = NC, 3 = S, 4 = W
   10. Average daily census: Average number of patients in hospital per day during study period
   11. Number of nurses: Average number of full-time equivalent registered and licensed practical nurses during study period (number full-time plus one half the number part time)
   12. Available facilities and services: Percent of 35 potential facilities and services that are provided by the hospital
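
As a rough illustration of the condition checks and transformations mentioned in the project description, the sketch below fits a one-predictor least-squares line and compares the residual spread in the lower and upper halves of the fitted values, before and after a log transformation of the response. It assumes that the four conditions are the usual linearity, independence, normality, and equal-variance conditions, and it checks only equal variance. The data are synthetic and made up for illustration (they are not either project data set), and the helper names `fit_simple_ols` and `residual_spread_ratio` are hypothetical.

```python
import math
import random
import statistics


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_spread_ratio(x, y):
    """Residual SD in the upper half of fitted values divided by the lower half.

    A ratio far from 1 hints at non-constant variance. This is only a crude
    numeric stand-in for reading a residuals-vs-fitted plot.
    """
    b0, b1 = fit_simple_ols(x, y)
    # Pair each fitted value with its residual, then order by fitted value.
    pairs = sorted((b0 + b1 * xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))
    half = len(pairs) // 2
    low = statistics.pstdev([r for _, r in pairs[:half]])
    high = statistics.pstdev([r for _, r in pairs[half:]])
    return high / low


# Synthetic data with multiplicative error (an assumption chosen so that
# the raw-scale spread grows with the fitted values).
rng = random.Random(0)
x = [rng.uniform(1000, 4000) for _ in range(1000)]
y = [math.exp(10 + 0.0005 * xi + rng.gauss(0, 0.3)) for xi in x]

raw_ratio = residual_spread_ratio(x, y)
log_ratio = residual_spread_ratio(x, [math.log(yi) for yi in y])

print(f"raw response: spread ratio = {raw_ratio:.2f}")
print(f"log response: spread ratio = {log_ratio:.2f}")
```

A spread ratio well above 1 on the raw scale and close to 1 on the log scale would suggest that the log transformation helps with the equal-variance condition. A full analysis would still need residual plots and checks of the remaining conditions.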
