STA 242/ENV 255
Lab 6
3/8/2000
There is nothing to turn in for this lab, but you will be expected to know how to perform stepwise selection. Bring the results from Exercise 17 to class Thursday.
Assignment for Lab:
Chapter 12, Exercise 11. Work as a group. (Repeat the exercise on your own, but stop when no F-statistics greater than 2 remain for inclusion of another variable.) This does not require the computer; it is intended to help you understand the mechanics of stepwise selection.
Chapter 12, Exercise 17, part b only for now. Use the S-Plus stepwise functions to select a model using the meteorological and socio-economic variables, then test whether there is a significant effect of pollution on mortality.
And of course, the conceptual exercises! In particular discuss exercises 1-6. Exercise 6 is particularly relevant to discussion of Exercise 17.
Preliminaries:
Read in the dataset Ex1217. In the options, set the Name Column to 17, so that city names are used as row labels.
Fit the model MORTAL ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + SOUND + DENSITY + NONWHITE + WHITECOL + POOR + HUMIDITY + NOX + HC + SO2. Does everything look OK? We will first use stepwise selection to identify which confounding variables are important, then use an F-test to see whether the pollution variables are significant.
Stepwise Selection in S-Plus Using Menus:
To use the Stepwise procedure, go to the Statistics menu and select Regression, then Stepwise Linear.
Create the formula for the Upper Model, using MORTAL as the response and adding all explanatory variables except the 3 pollution variables as Main Effects. The Upper model includes all predictors under consideration, while the Lower model, NULL, allows the procedure to run until no variables remain in the model (just an intercept). Together these define the biggest and smallest models under consideration. Under Stepping Options, select both to perform stepwise selection. (The forward option appears to be broken under S-Plus 4.5.) Click OK to run the procedure. The output appears in the Report window and shows the models at each step.
Fit the model using the variables selected by the stepwise procedure and the three pollution variables NOX, HC, and SO2. Obtain the F-statistic and p-value for testing whether there is a pollution effect.
If you repeat the stepwise selection with all of the meteorological, socio-economic, and pollution variables in the Upper model, do you end up with the same selected model? Do you think that one approach is better than the other in this case?
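The pollution F-test described above can also be run by typing commands instead of using the menus. The following is only a sketch: the four confounders shown (PRECIP, JANTEMP, EDUC, NONWHITE) are a hypothetical selection used for illustration and should be replaced with whatever your own stepwise run keeps. The object names mort.reduced and mort.full are also made up here, and the pollution columns are assumed to be named NOX, HC, and SO2.

```r
# Reduced model: the confounders kept by stepwise selection.
# HYPOTHETICAL selection -- substitute the variables from your own run.
mort.reduced <- lm(MORTAL ~ PRECIP + JANTEMP + EDUC + NONWHITE,
                   data = Ex1217)

# Full model: the same confounders plus the three pollution variables.
mort.full <- lm(MORTAL ~ PRECIP + JANTEMP + EDUC + NONWHITE +
                  NOX + HC + SO2,
                data = Ex1217)

# Extra-sum-of-squares F-test comparing the two nested models.
# The table reports the F statistic (on 3 numerator df) and its p-value.
anova(mort.reduced, mort.full)
```

The F statistic in the table is the drop in residual sum of squares from the reduced to the full model, divided by 3, and then divided by the residual mean square of the full model.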
Selection Using the Command Line:
I have sometimes found that the menu commands do not work reliably, so as an alternative you can perform forward, backward, and stepwise selection from the command line, which is also more flexible. Open the Command Line window.
The syntax is a little tricky, so be patient and check for typos! To read about the command, first enter help(stepwise). After reading that, I am sure the command makes perfect sense! No? OK, let's try an example.
To repeat the menu version of stepwise, enter:
mort.step <- stepwise(Ex1217[,c(1:11,15)], Ex1217[,"MORTAL"])
The first argument, Ex1217[,c(1:11,15)], specifies which columns of the dataframe contain the explanatory variables. In this case it is columns 1-11, and 15. The function c( ) is used to combine the sequences 1:11 and 15 into one sequence. Try entering c(1:11, 15) on the command line and see what you get! So this creates the matrix X with all variables under current consideration for variable selection.
The second argument is the response variable Y; we can specify this by column number or by name: Ex1217[, 16] or Ex1217[, "MORTAL"]
To change the critical F value for testing, include the option f.crit=4 (i.e., each t-statistic should be larger than 2 in absolute value). The default is 2, based on AIC. You can also change the method to forward selection, method="forward", or backward selection, method="backward". For example:
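As a quick check of the two arguments described above, you can look at the pieces before passing them to stepwise. This is a minimal sketch that assumes Ex1217 has already been read in and that MORTAL is column 16, as stated in this lab.

```r
# Combine the sequence 1:11 with the single value 15
c(1:11, 15)
# [1]  1  2  3  4  5  6  7  8  9 10 11 15

# Names of the candidate explanatory variables (the columns of X)
names(Ex1217)[c(1:11, 15)]

# The response selected by position and by name should be identical;
# this returns T if column 16 really is MORTAL
all(Ex1217[, 16] == Ex1217[, "MORTAL"])
```

Checking the names first is a cheap way to catch a wrong column index before running the selection.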
stepwise(Ex1217[,c(1:11,15)], Ex1217[,"MORTAL"], f.crit=4)
stepwise(Ex1217[,c(1:11,15)], Ex1217[,"MORTAL"], f.crit=4, method="forward")
stepwise(Ex1217[,c(1:11,15)], Ex1217[,"MORTAL"], f.crit=4, method="backward")
The output of the first command is saved in the object mort.step. What does it contain? The trace showing the models at each step, in a slightly different format than before. Type the names below to display their contents. Alternatively, do not save the output to an object: just give the stepwise command and the results will spill out in the command window.
mort.step$which: a table (matrix) with columns corresponding to variables and rows corresponding to models. Each element is T (true) if the variable is included in the model for that row and F (false) if it is excluded.
mort.step$rss: the Residual Sum of Squares for each model in the search.
mort.step$size: the number of variables in each model.
mort.step$f.stat: a vector with the F statistics for testing the change (adding or dropping a variable) made at each step.
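The components listed above can be combined to read off the model at the end of the search. This is a sketch only: it assumes that the last row of mort.step$which corresponds to the final step and that the columns of that matrix are labelled with the variable names, so check both against the printed trace. The names n.steps and final are made up for this example.

```r
# Number of models recorded in the search (one row per model)
n.steps <- nrow(mort.step$which)

# Logical row for the last model: T = variable included, F = excluded
final <- mort.step$which[n.steps, ]
final

# Names of the variables included in that model
# (assumes the columns of mort.step$which carry variable names)
dimnames(mort.step$which)[[2]][final]

# Residual sum of squares and number of variables for that model
mort.step$rss[n.steps]
mort.step$size[n.steps]
```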
Given the best model, you can refit it using the menus.