STA 242/ENV 255: February 25, 1998

Regression Criticism

Assignment: Due Tuesday, March 3, same Bat time, same Bat place

  1. CH 4: Problems 1-3 (also construct a partial leverage plot and interpret)
  2. For the production data in Problems 1-3, fit the simple linear regression models of Production on Year and Production on Area (with and without outliers, if you find any). Explore these data further to find a model that seems reasonable on statistical and scientific grounds. What are the implications of such a model for forecasting future production?
  3. CH 4: Problems 13-14
  4. For both data sets, turn in a one-paragraph summary suitable for "Ranger Rick the Regression Raccoon" that interprets your results and explains how they can be used to address the problems of 1) forecasting food production (Problems 1-3) and 2) whether air pollution affects illness (Problems 13-14).
Suggested Reading:


Partial Leverage Plots

Partial leverage plots can be obtained from the Graphs menu of the FIT window (Graphs:Partial Leverage). Remove any outliers using "Exclude from Calculations" and "Hide in Graphs" to see how the plots change. Do the changes agree with the DFBeta results?

Case Statistics and Diagnostics

All of the diagnostics needed can be obtained from the Vars menu of the FIT window. If the response variable is Y, and we have two predictor variables X1 and X2, the case statistic columns will appear in the DATA window with the following titles:

You will also need to get p-values for the studentized residuals. Here's how:
The first step is to save your data in a new file. To do this, select File:Save:Data. When the window comes up, change the data file name to HWout, then press Enter. (You may call the file whatever you like instead of HWout.)

Now it's time to use the PROGRAM EDITOR! First, exit INSIGHT. Then submit the following lines:

NOTE: You will need to change the code to use the name of your Y variable (that is, whatever name SAS has given the studentized residuals) and to put in the correct number for the degrees of freedom. (Use df = n-k-1, as derived on page 132 of the text.)


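	/* Rewrite HWout with one added column, pval: the two-sided   */
	/* t p-value for each studentized residual. Replace RT_Y and  */
	/* df with your own column name and the number n-k-1.         */
	data sasuser.HWout;
	set sasuser.HWout;
	pval = 2*(probt(-abs(RT_Y),df));
	run;
	/* List the data, including the new pval column. */
	proc print data=sasuser.HWout;
	run;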
These commands will create a new variable called pval. To judge whether there are outliers using the Studentized residuals, use the Bonferroni inequality; see page 132 and note 16 on page 143.
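
As a rough sketch of one way to apply the Bonferroni idea to pval: multiply each p-value by the number of cases n and compare the result with the overall level you want. The names bonf and outlier and the 0.05 level are assumptions made for illustration, not part of the assignment. The sketch also assumes that every row of HWout was used in the fit; if you excluded cases, put in the correct n by hand instead of using nobs=.

	data sasuser.HWout;
	set sasuser.HWout nobs=n;      /* n = number of rows in HWout (not saved) */
	bonf = min(1, n*pval);         /* Bonferroni-adjusted p-value, capped at 1 */
	outlier = (bonf < 0.05);       /* 1 if flagged at the assumed 0.05 level */
	run;
	/* Print only the flagged cases. */
	proc print data=sasuser.HWout;
	where outlier = 1;
	run;

If nothing prints, no case is flagged at that level. Check the result against the discussion on page 132 before relying on it.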

You can now start up INSIGHT and open HWout. I find that rather than scanning through the data file, it is often easier to identify influential points using scatter plots or boxplots. You can plot Cook's Distance versus pval, leverage, Year, or the Studentized residuals. (You may want to rename columns so that the labels on the plots make more sense.) Have fun!
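
If you would rather take a quick look from the PROGRAM EDITOR before going back to INSIGHT, a minimal sketch along the following lines draws a rough text plot. It assumes the Cook's Distance column is named D_Y; check the DATA window for the name SAS actually used and substitute it.

	/* Text scatter plot: vertical variable * horizontal variable. */
	proc plot data=sasuser.HWout;
	plot D_Y*pval;       /* D_Y is an assumed name for Cook's Distance */
	run;
	quit;

The INSIGHT plots are easier to read and let you click on points, so treat this only as a quick check.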