Lab 3/6/02

ENV255 Lab Agenda
03/06/02

Today's lab will focus on items related to Homework 4, which involves a 1-page writeup of the forest damage and elevation data.

Start by converting the percentages into proportions. Also divide the elevations by 100, since we are likely to be interested in the effect of a 100-meter increase in elevation on percent damaged, rather than the effect of a 1-meter increase.
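A minimal sketch of this step in Python (the lab itself uses Splus). It assumes, hypothetically, that each observation is a dict keyed by the variable names of the spruce data (percent, elevation, location), with percent on a 0 to 100 scale and elevation in meters:

```python
def prepare(records):
    """Return new records with percent as a proportion and elevation
    in units of 100 meters (record layout is a hypothetical dict form)."""
    prepared = []
    for rec in records:
        prepared.append({
            "percent": rec["percent"] / 100.0,      # 0-100 scale -> 0-1 proportion
            "elevation": rec["elevation"] / 100.0,  # meters -> hundreds of meters
            "location": rec["location"],
        })
    return prepared


# Hypothetical single observation, for illustration only.
rows = [{"percent": 25.0, "elevation": 1200.0, "location": "North"}]
print(prepare(rows))
# [{'percent': 0.25, 'elevation': 12.0, 'location': 'North'}]
```

After this step, a coefficient on elevation describes the change in the response per 100 m.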
Sequence of model selection: you'll start with the "rich" model, which includes all main effects and interaction terms. With this rich model, you'll first consider transformations of the "elevation" variable, then transformations of the "percent" variable. At this stage you will look only at residual plots, to check that the model assumptions are met.
1. Start with percent~elevation*location. Examine residuals vs. fitted and qq-normal plots. Leave these plot windows open on your desktop. Why do you see granularity in the qq-plot?
  At this point you can do a "sanity check" of your model by looking at the F-statistic at the bottom of the parameter estimate output. This F-statistic tests the null hypothesis that all of the coefficients except the intercept are zero; a small p-value tells you that at least one of the covariates you have included has a non-zero coefficient.
2. Consider the following model: percent~I(log(elevation))*location. Examine residuals vs. fitted and qq-normal plots. Leave these plot windows open on your desktop.
3. Now consider transforming the response. Which transformations are appropriate? Try log(percent/(1-percent))~elevation*location. Compare the residual vs. fitted and qq-normal plots to those in step 1. Do you see an improvement?
4. Try log(percent/(1-percent))~I(log(elevation))*location. Compare the residual vs. fitted and qq-normal plots to those in step 3. Do you see an improvement?
5. Here is a script file for you to look at all of the residual plots at once. Note that the data set is named "spruce", with variables percent, elevation, and location.
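The script file itself is not reproduced here. As a rough substitute, here is a minimal Python sketch (standard library only) that fits the four rich models from steps 1 to 4 by ordinary least squares and returns, for each one, the fitted values and residuals, plus a helper for the points of a qq-normal plot. The helper names (`ols`, `rich_model_residuals`, `normal_scores`, `all_residuals`), the dict record layout, and the 0/1 South indicator are assumptions for illustration; the indicator coding does not change the fitted values or residuals. It expects elevation already divided by 100 and percent already converted to a proportion strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def logit(prop):
    return math.log(prop / (1.0 - prop))


def identity(v):
    return v


# Steps 1-4: (transform applied to elevation, transform applied to percent).
MODELS = {
    "1: percent ~ elevation*location": (identity, identity),
    "2: percent ~ log(elevation)*location": (math.log, identity),
    "3: logit(percent) ~ elevation*location": (identity, logit),
    "4: logit(percent) ~ log(elevation)*location": (math.log, logit),
}


def rich_model_residuals(data, f_x, f_y, south="South"):
    """Fit f_y(percent) ~ f_x(elevation) * location; return (fitted, residuals)."""
    X, y = [], []
    for rec in data:
        x = f_x(rec["elevation"])
        s = 1.0 if rec["location"] == south else 0.0  # hypothetical 0/1 coding
        X.append([1.0, x, s, x * s])  # intercept, elevation, location, interaction
        y.append(f_y(rec["percent"]))
    b = ols(X, y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


def normal_scores(residuals):
    """(theoretical N(0,1) quantile, ordered residual) pairs for a qq-normal plot."""
    n = len(residuals)
    nd = NormalDist()
    return [(nd.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(sorted(residuals))]


def all_residuals(data):
    """Map each candidate model label to its (fitted, residuals)."""
    return {name: rich_model_residuals(data, f_x, f_y)
            for name, (f_x, f_y) in MODELS.items()}
```

Plot each (fitted, residuals) pair, and each list of normal scores, with any plotting tool. The `(i + 0.5) / n` plotting positions are one common convention and may differ slightly from those Splus uses for small samples.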
Now that we have determined the form of the "rich model" in terms of transformations, we will go on to examine whether a more parsimonious model is more appropriate; that is, whether the interaction term belongs in the model.
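In symbols, here is a sketch written for logit(percent) ~ elevation*location (the model plotted later in this agenda), where π is the proportion damaged and S is a hypothetical 0/1 indicator for the South. Splus may code the location factor differently; that changes the individual coefficients but not the fitted lines, and it leaves the t-test for the interaction term unchanged. Dropping the interaction gives parallel lines:

```latex
\begin{aligned}
\operatorname{logit}(\pi) &= \log\frac{\pi}{1-\pi}
  = \beta_0 + \beta_1\,\mathrm{elev} + \beta_2 S + \beta_3\,(\mathrm{elev}\times S) \\
H_0 &: \beta_3 = 0 \quad \text{(parallel lines, no interaction)} \\
t &= \frac{\hat\beta_3}{\mathrm{SE}(\hat\beta_3)} \ \text{on } n-4 \text{ d.f.},
  \qquad t^2 = F_{\mathrm{ESS}} \ \text{on } (1,\ n-4) \text{ d.f.}
\end{aligned}
```

Because only one coefficient is dropped, the t-test on the interaction term and the extra-sum-of-squares F-test for removing it give the same p-value.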
Once you have decided on the model of interest, double-check its residuals and make a coded residual plot (residuals vs. fitted values, with a different plotting symbol for each location).
Now interpret the coefficients in the model we have selected and what they mean for the North and South.
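Under a hypothetical 0/1 indicator S for the South (North as the reference level; an assumption, since the Splus coding may differ), the interaction model for logit(percent damaged) splits into one line per location. Here π is the proportion damaged, and because elevation is measured in units of 100 m, each slope is the change in logit(π) per 100 m:

```latex
\begin{aligned}
\text{North } (S = 0):&\quad \operatorname{logit}(\pi) = \beta_0 + \beta_1\,\mathrm{elev} \\
\text{South } (S = 1):&\quad \operatorname{logit}(\pi) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\mathrm{elev}
\end{aligned}
```

So a 100 m increase in elevation multiplies the odds of damage, π/(1 - π), by e^{β1} in the North and by e^{β1+β3} in the South. Strictly, because the model is fitted by least squares on the logit scale, this describes the median odds, assuming the errors are symmetric on that scale.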
Note that the slope for the South is not very steep. Pursue this by calculating a CI for the slope in the South. What can you conclude? Do the same for the North.
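One way to get the South interval, sketched in Python under the hypothetical 0/1 South indicator: the South slope is β1 + β3, so its standard error combines both variances and their covariance. The function name, the argument layout, and passing in the t multiplier (the t quantile on n - 4 degrees of freedom, read from a table or from Splus) are assumptions.

```python
import math


def slope_ci(b, cov, t_mult, south):
    """Confidence interval for the elevation slope in one location.

    b:      coefficients [b0, b1, b2, b3] for intercept, elevation,
            South indicator, elevation x South (hypothetical 0/1 coding)
    cov:    4x4 estimated covariance matrix of those coefficients
    t_mult: t quantile on n - 4 d.f. for the desired confidence level
    """
    if south:
        est = b[1] + b[3]                              # South slope
        var = cov[1][1] + cov[3][3] + 2.0 * cov[1][3]  # Var(b1 + b3)
    else:
        est = b[1]                                     # North slope
        var = cov[1][1]
    half_width = t_mult * math.sqrt(var)
    return est - half_width, est + half_width
```

An equivalent shortcut is to refit the model with South as the reference level of location, so that the elevation coefficient and its standard error are the South slope directly.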
Plot the model logit(percent damaged)~elevation*location on the original percent-damaged scale. Here is a script file to accomplish this. This plot is not required for your 1-page writeup (but wouldn't it be great to include in a journal paper or MP?). You should at least turn in a coded scatterplot of logit(percent damaged) vs. elevation, with the fitted lines superimposed.
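The back-transformation behind a plot on the original scale can be sketched as follows, assuming a hypothetical 0/1 South indicator and coefficients ordered (intercept, elevation, South, elevation x South) from the logit model with elevation in hundreds of meters. The inverse logit maps a fitted logit η back to a proportion, exp(η)/(1 + exp(η)). Because the model is fitted on the logit scale, the curve estimates the median, not the mean, percent damaged (assuming symmetric errors on that scale). The function name is hypothetical.

```python
import math


def fitted_percent_curve(b, elevations_m, south):
    """Fitted percent damaged on the original scale along a grid of elevations.

    b:            [b0, b1, b2, b3] from logit(percent) ~ elevation*location,
                  with elevation in hundreds of meters and a hypothetical
                  0/1 South indicator
    elevations_m: elevations in meters at which to evaluate the curve
    """
    s = 1.0 if south else 0.0
    curve = []
    for elev_m in elevations_m:
        x = elev_m / 100.0                               # back to model units
        eta = b[0] + b[1] * x + b[2] * s + b[3] * x * s  # fitted logit
        prop = 1.0 / (1.0 + math.exp(-eta))              # inverse logit
        curve.append((elev_m, 100.0 * prop))             # proportion -> percent
    return curve
```

Evaluating this over a fine grid of elevations for each location and drawing both curves over the observed percents gives the fitted model on the original scale.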

Important: Practice with Extra Sum of Squares F-tests

(not required for your writeup, but will be required for your take-home midterm)

Save the fitted regression model objects in Splus for

log(percent/(1-percent))~elevation and for
log(percent/(1-percent))~elevation*location.

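(You can save them using the option on the same page where you enter the regression; give them descriptive names.) Perform an ESS F-test to compare the "equal lines" model to the "separate lines" model. Writing the separate-lines model as log(percent/(1-percent)) = beta0 + beta1*elevation + beta2*location + beta3*elevation*location, this is a test of whether beta2 = beta3 = 0.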

Go to "Statistics" and then "Compare Models". In the "Model Objects" drop-down menu, shift-click to select both models. Make sure that "F" is checked and click OK. The output shows the F-statistic of interest.
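The statistic that Compare Models reports can be reproduced by hand from the residual sums of squares (RSS) and residual degrees of freedom of the two fits. A minimal Python sketch (the function name is hypothetical): for the equal-lines vs. separate-lines comparison with n observations, the reduced model has n - 2 residual d.f. and the full model n - 4, so the numerator has 2 d.f. The overall F-statistic in the regression output is the same calculation with the intercept-only model (n - 1 d.f.) as the reduced model.

```python
def ess_f_test(rss_reduced, df_reduced, rss_full, df_full):
    """Extra-sum-of-squares F statistic for two nested linear models.

    Returns (F, numerator d.f., denominator d.f.); compare F with an
    F distribution on those d.f. (table or Splus) to get a p-value.
    """
    extra_df = df_reduced - df_full    # number of coefficients set to zero
    extra_ss = rss_reduced - rss_full  # drop in RSS from adding them back
    f_stat = (extra_ss / extra_df) / (rss_full / df_full)
    return f_stat, extra_df, df_full
```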

Go back to last week's lab and read through the problem again. Make sure you know how to plot a fitted polynomial regression, how to save the residuals, and why the residuals are plotted against time. You are expected to know how to do the tasks in the last two lab agendas.