Variable | Type | Definition |
---|---|---|
MPG | continuous | fuel economy in miles per gallon |
cylindrs | ordinal | number of engine cylinders |
displace | continuous | total volume of the engine cylinders (its displacement) in cubic inches |
horsepwr | continuous | engine horsepower |
weight | continuous | the car's weight in pounds |
accel | continuous | time required to accelerate from 0 to 60 mph (in seconds) |
modlyear | ordinal | year of manufacture |
make | nominal | manufacturer of car |
origin | nominal | region of origin (US, Europe or Japan) |
This data set was used by the Committee on Statistical Graphics of the American Statistical Association (ASA) in its Second (1983) Exposition of Statistical Graphics Technology, a forum for vendors of statistical graphics software to demonstrate their packages. The data are distributed by StatLib, a service of the Department of Statistics at Carnegie Mellon University.
Your report should be about 5 or 6 typed pages in length (excluding figures and appendix material) and include:
Presume your reader is comfortable with basic statistical methods but is not an expert in them, and assume that your reader is not familiar with the data. When you use a statistical method like regression or ANOVA, explain why and carefully interpret your results. Present only important summaries, plots, and their interpretations; don't burden your reader with unnecessary facts and analysis.
Your project report is due in class on April 27th. Late reports will not be accepted.
To print plots and figures created in SAS, first set your PRINTER environment variable so that printed output is directed to the printer of your choice, most likely the one that serves the cluster in which you are working. To print in the teer lab, type "setenv PRINTER teerlp1" or "setenv PRINTER teerlp2" before typing "sas project &". If you are working in another cluster, replace teerlp1 with the appropriate printer name; a list of printer names and info on printing files can be found here. More information on printing, especially output containing text, can be found here.
To get started, type "sas project &" in one of the terminals open on your screen.
2) Explore relationships between MPG and other variables in the data set. Look at scatter plots of MPG by continuous covariates, and box plots or histograms of fuel economy grouped by categorical covariates. What do you see? Do any of the covariates show promise as predictors of MPG?
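The project itself is done in SAS. Purely as an illustration of the kind of numerical summary that can accompany the plots in this question, here is a minimal Python sketch that computes a correlation between MPG and one continuous covariate and the median MPG within each level of a categorical covariate. The records in `cars` are made-up toy values, not rows from the project data set, and the helper name `pearson` is hypothetical.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical toy records for illustration only; NOT from the project data set.
cars = [
    {"mpg": 15.0, "weight": 4000.0, "origin": "US"},
    {"mpg": 18.0, "weight": 3500.0, "origin": "US"},
    {"mpg": 21.0, "weight": 3200.0, "origin": "US"},
    {"mpg": 26.0, "weight": 2400.0, "origin": "Europe"},
    {"mpg": 24.0, "weight": 2700.0, "origin": "Europe"},
    {"mpg": 32.0, "weight": 2000.0, "origin": "Japan"},
    {"mpg": 30.0, "weight": 2200.0, "origin": "Japan"},
    {"mpg": 28.0, "weight": 2300.0, "origin": "Japan"},
]


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Continuous covariate: strength of the linear association with MPG.
r = pearson([c["weight"] for c in cars], [c["mpg"] for c in cars])

# Categorical covariate: median MPG within each group.
by_origin = defaultdict(list)
for c in cars:
    by_origin[c["origin"]].append(c["mpg"])
medians = {origin: statistics.median(v) for origin, v in by_origin.items()}

print(f"correlation(weight, MPG) = {r:.3f}")
for origin, med in sorted(medians.items()):
    print(f"median MPG, {origin}: {med}")
```

A correlation near zero does not rule out a curved relationship, so a number like this supplements the scatter plots rather than replacing them.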
3) Use linear regression to explore the relationship between fuel economy and factors that may influence it. While data for most of the technological and physical variables that determine fuel economy are not available to us in this data set, we can use the variables we have to correlate engine size (cylindrs, displace, horsepwr), size of car (weight), engine performance (accel) and year and place of manufacture (modlyear, origin) with fuel economy.
Start by fitting a linear regression model for MPG including (a) promising covariates identified in Question 2 and (b) those that you have strong prior beliefs should be included in the model as predictor variables. Plot residuals against fitted values for this regression; verify that this plot looks like the right panel of Figure 14-10 on page 463 of the book. Create a new response variable logMPG = log(MPG). Repeat the previous regression using logMPG as the response variable. Look at plots of residuals against all predictor variables to verify that each residual relationship is flat (for example, the relationship between weight and fuel economy may be different for light, moderately heavy and heavy autos; look for patterns in the residual plots like those pictured in Figure 14-7(b) on page 462 of the book). Enter appropriate quadratic terms into the model, if necessary. Search for a parsimonious model by removing, one at a time, variables that (a) have little explanatory ability and (b) you had only weak prior reason to include in the model.
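To make the mechanics of the log-response fit concrete, here is a minimal Python sketch with a single predictor. It is an illustration only: the project is done in SAS with several predictors, the `weight` and `mpg` values are made-up toy numbers rather than project data, and the helper names `fit_line` and `r_squared` are hypothetical.

```python
import math

# Hypothetical toy values for illustration only; NOT from the project data set.
weight = [2000.0, 2500.0, 3000.0, 3500.0, 4000.0]  # pounds
mpg = [34.0, 28.0, 23.0, 19.0, 16.0]               # miles per gallon


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(y, resid):
    """Fraction of the variability in y explained by the fit."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum(e ** 2 for e in resid)
    return 1.0 - sse / sst


# New response variable: natural log of MPG.
log_mpg = [math.log(m) for m in mpg]

intercept, slope = fit_line(weight, log_mpg)
fitted = [intercept + slope * w for w in weight]
resid = [y - f for y, f in zip(log_mpg, fitted)]
r2 = r_squared(log_mpg, resid)

# Residuals paired with fitted values are what a residual plot displays.
for f, e in zip(fitted, resid):
    print(f"fitted={f:.3f}  residual={e:+.4f}")
print(f"slope per pound={slope:.6f}  R^2={r2:.3f}")
```

With an intercept in the model, the residuals always sum to zero and are uncorrelated with the predictor, so a flat residual plot has to be judged by the absence of curvature or fanning, not by the average level.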
Summarize the regression results: What fraction of variability in logMPG does your final model explain? Is the residual plot OK? Interpret the model: on average, controlling for other factors, how much higher or lower is the fuel economy you predict for a car weighing 3000 pounds as opposed to 2500 pounds, for a car built in 1980 as opposed to 1975, and so on? Remember that your response variable is the log of fuel economy.
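Because the response is on the log scale, a difference in a predictor translates into a ratio of predicted fuel economies. The following is a sketch of that back-transformation for the weight comparison. It assumes a natural log, that weight enters the final model linearly, and that all other predictors are held fixed; the symbols \hat\beta_0 and \hat\beta_w (the fitted intercept and weight coefficient) are notation introduced here for illustration.

```latex
\begin{aligned}
\log\bigl(\widehat{\mathrm{MPG}}\bigr)
  &= \hat\beta_0 + \hat\beta_w \cdot \mathrm{weight} + \cdots \\
\log\bigl(\widehat{\mathrm{MPG}}_{3000}\bigr) - \log\bigl(\widehat{\mathrm{MPG}}_{2500}\bigr)
  &= \hat\beta_w\,(3000 - 2500) = 500\,\hat\beta_w \\
\frac{\widehat{\mathrm{MPG}}_{3000}}{\widehat{\mathrm{MPG}}_{2500}}
  &= e^{500\,\hat\beta_w},
  \qquad
  \text{percent change} = 100\,\bigl(e^{500\,\hat\beta_w} - 1\bigr)\,\%
\end{aligned}
```

If the final model also contains a quadratic weight term with coefficient \hat\beta_{ww}, the difference on the log scale becomes 500\,\hat\beta_w + \hat\beta_{ww}\,(3000^2 - 2500^2), and the ratio depends on the two particular weights being compared, not only on their difference. The same reasoning applies to the model-year comparison.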