Statistics 242 -- Applied Regression Analysis

Lab 2

Simple linear regression analysis is built on the assumptions of linearity, constant variance, independence, and normality. We won't discuss independence for now; in this lab/HW we focus on ways to examine the other three assumptions using graphical displays.

Topics

  1. Residual Plots
  2. R^2
  3. ANOVA Tables

Assignment

  1. Work through all the Conceptual Exercises in Chapter 8. (Do not turn in.)
  2. Problem 17, Chapter 8. Verify that the fit is OK using the residual and normal probability plots.
  3. Problem 21, Chapter 8.

General Hints for Obtaining Output Related to Chapter 8

To obtain the regression ANOVA table and to save the residuals and fitted values, create the regression:

Select Regression from the Statistics menu. Enter the model formula (remember it has the form y ~ x).

From the Regression dialog, select the Results tab and check the boxes for the ANOVA table, residuals, and fitted values; then specify the name of the dataframe in which to save the results.

For residual plots, in the Regression dialog, select the Plot tab. Click the boxes for "Residual vs Fit" and "Residual Normal QQ". When you do the regression, these plots will be created.

For the plot of residuals versus the explanatory variable, you will need to use the Graph main menu to create a 2-D scatter plot. The residuals and fitted values are created as variables when you fit the regression. Select the Graph menu, then 2D Plot, and Scatter Plot. Put the X variable on the x-axis and the residuals on the y-axis. For simple linear regression this plot is not necessary, as the same information is in the plot of residuals versus fitted values.

Note: if you refit the model and save the residuals and fitted values every time, the variables will be saved in other columns, but may not have the correct names. The best approach may be to create a new dataframe for the output.
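The menu steps above also have Command Window equivalents. The following is only a minimal sketch, assuming a hypothetical dataframe called mydata with invented values in columns x and y. The functions lm, anova, summary, resid, fitted, plot and qqnorm are standard S functions (the same calls run in R); the names mydata, fit and out are placeholders and do not come from the lab.

```r
# Hypothetical data: these values are invented purely for illustration.
mydata <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                     y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1))

# Fit the simple linear regression y ~ x.
fit <- lm(y ~ x, data = mydata)

# Regression ANOVA table (regression and residual sums of squares, F-test).
anova(fit)

# R^2 for the fitted model.
r2 <- summary(fit)$r.squared

# Save residuals and fitted values in a NEW dataframe, so that refitting
# never leaves mislabeled columns in the original data.
out <- data.frame(x = mydata$x,
                  fitted = fitted(fit),
                  residual = resid(fit))

# Residuals vs fitted values, then a normal QQ plot of the residuals.
plot(out$fitted, out$residual, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0)
qqnorm(out$residual)
qqline(out$residual)

# Residuals vs the explanatory variable (redundant for simple linear
# regression, since it shows the same pattern as residuals vs fitted).
plot(out$x, out$residual, xlab = "x", ylab = "Residuals")
abline(h = 0)
```

Keeping the output in its own dataframe (out here) is one way to follow the advice in the note above about refitting.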

Lack of Fit Test

If you want to do a Lack of Fit test, make sure that your data include replicate observations at values of the explanatory variable. To obtain the pure error sum of squares, you will need the ANOVA table from fitting the linear regression AND the ANOVA table from fitting a One-Way ANOVA model. To obtain the One-Way ANOVA table, select Analysis of Variance from the Statistics menu, then select Fixed Effects. Select the dataframe, and then create the model formula. For example, in the lack of fit test for 8.16 covered in class, you would type in the model as ph ~ as.factor(time) (rather than using Create Formula).

The function as.factor converts the explanatory variable time into a "grouping variable" or "factor" with a discrete number of levels (6 in this case): one level for each time point, with 2 replicates at each point in time. The actual values of "time" do not enter into the picture at all in the One-Way ANOVA when you use as.factor(time). The Residual Sum of Squares from the One-Way ANOVA table is the pure error sum of squares for the lack of fit test, and the associated degrees of freedom are the pure error degrees of freedom.
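The lack of fit calculation described above can be sketched in the Command Window as follows. This is an illustration under stated assumptions, not the class solution: the dataframe name lof and the ph values are invented (they are NOT the 8.16 data), chosen only to mimic 6 time points with 2 replicates each. The one-way model is fit here with lm rather than through the Analysis of Variance menu; its residual sum of squares is the same quantity as the Residual Sum of Squares in the One-Way ANOVA table. The F-statistic is the usual extra-sum-of-squares ratio: the drop in residual sum of squares from the linear fit to the one-way fit, divided by the difference in degrees of freedom, over the pure error mean square.

```r
# Hypothetical data: 6 time points, 2 replicates each; ph values are invented.
lof <- data.frame(
  time = c(1, 1, 2, 2, 4, 4, 6, 6, 8, 8, 10, 10),
  ph   = c(7.0, 6.9, 6.7, 6.8, 6.4, 6.5, 6.2, 6.1, 5.9, 6.0, 5.8, 5.7))

# Linear regression of ph on time (2 parameters).
lin <- lm(ph ~ time, data = lof)

# One-way model: a separate mean for each time point (6 parameters).
one <- lm(ph ~ as.factor(time), data = lof)

# Residual sums of squares and degrees of freedom from each fit.
ss.lin  <- sum(resid(lin)^2)
df.lin  <- lin$df.residual       # 12 - 2 = 10
ss.pure <- sum(resid(one)^2)     # pure error sum of squares
df.pure <- one$df.residual       # 12 - 6 = 6

# Lack of fit F-statistic and its p-value.
df.lof  <- df.lin - df.pure      # 10 - 6 = 4
f.stat  <- ((ss.lin - ss.pure) / df.lof) / (ss.pure / df.pure)
p.value <- 1 - pf(f.stat, df.lof, df.pure)

f.stat
p.value
```

A small p-value suggests that the straight line does not adequately describe the group means; a large one gives no evidence of lack of fit.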

Hints for HW Problems

Exercise 8.17

  1. Download and read in the datafile Ex0817.asc.
  2. Use Transform under the Data menu to create all of the new variables such as log(mass), sqrt(mass), 1/mass, sqrt(load), etc. In S-Plus 4.5, clicking the "Apply" button creates the transformation without leaving the dialog box. You can then enter the name and expression for a new transformation, click "Apply", and repeat, so that you do not have to repeatedly bring up the dialog box.
  3. To put multiple scatter plots in one graph, select 2D and Scatterplot as you have done before. To add further plots, make sure that you use the same Graph Sheet (for example GS1) as for the previous plot (scroll down to find it, rather than starting a new graph sheet), then select 2D and Scatterplot with the next set of variables. Repeat until you have all of the plots you want. You can change some aspects of the layout from the Format menu by selecting Arrange Graphs and specifying how many plots go across a page. Another approach altogether is to use Graph, 2D, and select Matrix. Do this on a new graph sheet. In the dialog box, in the X-columns field ONLY, give the names of all the columns that you want to plot, separated by a comma and a space. This creates a table of plots of all the variables against each other. Because it contains some plots that you may not care about, the former method is better for formal presentation, but this one may be faster to produce :-) For more fun, try Brush and Spin under the Graph menu, which is useful for detecting outliers. To specify only some of the columns, use Control-click to select them. De-select the Spin box for now, and click OK. To highlight points, click on the Big Points box.
  4. Look at residual plots and normal probability plots for some of the other possible models to see the types of patterns that may arise.
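The transformation and multi-plot steps in items 2 and 3 can also be done from the Command Window. The sketch below assumes a hypothetical dataframe ex with columns named mass and load; the actual column names in Ex0817.asc may differ, and the values here are invented. The new column names (log.mass and so on) are placeholders.

```r
# Hypothetical stand-in for the Ex0817 data; values are invented.
ex <- data.frame(mass = c(0.5, 1.2, 2.0, 3.5, 5.0, 8.0),
                 load = c(3.1, 6.0, 8.4, 12.2, 15.5, 21.0))

# Create the transformed variables as new columns.
ex$log.mass  <- log(ex$mass)
ex$sqrt.mass <- sqrt(ex$mass)
ex$inv.mass  <- 1 / ex$mass
ex$sqrt.load <- sqrt(ex$load)

# Several scatter plots on one page: 2 rows by 2 columns.
par(mfrow = c(2, 2))
plot(ex$mass,      ex$load, xlab = "mass",       ylab = "load")
plot(ex$log.mass,  ex$load, xlab = "log(mass)",  ylab = "load")
plot(ex$sqrt.mass, ex$load, xlab = "sqrt(mass)", ylab = "load")
plot(ex$inv.mass,  ex$load, xlab = "1/mass",     ylab = "load")
par(mfrow = c(1, 1))

# Scatterplot matrix: every column plotted against every other column.
pairs(ex)
```

As with the Matrix plot from the menus, pairs shows every pairing, including ones you may not care about, so the 2-by-2 layout is likely the better choice for a write-up.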

Exercise 8.21

  1. Make sure that you have read Case Study 8.1 before starting this.
  2. Download and read in the datafile Ex0821.asc.
  3. Start with a scatterplot of Species vs Area. Do you need to transform both species and area? (The residual plots may tell you more than the scatterplot.) If transformations are needed, you might start with log transformations of both variables, as the theory in Case Study 8.1 suggests, rather than starting with a shotgun approach of trying all transformations. Look at residual and normal probability plots to guide you. (To get a feel for how transformations work, you may find it valuable to try several anyway.)
  4. Note: R^2 values cannot be directly compared between regressions with and without a transformed response variable. They can be used to compare models that differ only in the transformation of the explanatory variable.
  5. Because there are replicates, you can do a formal Lack of Fit Test. This may help in deciding upon a transformation and indicating if the model is appropriate.
  6. To get the p-value for an F-test where the F-ratio is 10.10, with 4 df in the numerator and 6 df in the denominator (as in the class example), use 1 - pf(10.10, 4, 6) in the Command Window.
  7. Limit your write-up to 1 page, including any figures. You do not need to explain all the steps you went through to get your final model, but you should justify that it is appropriate. Be sure to report confidence intervals for parameter estimates, and give interpretations of the final model (in the original units). See the case studies for examples, and read through Section 8.4 for explanations of how to interpret results using log transformations.
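Items 6 and 7 can be sketched in the Command Window as follows. The first line is the p-value calculation from item 6. The rest is an illustration under assumptions: the dataframe isl and its values are invented (they are not the Ex0821 data), and the log-log model is used only because it is the suggested starting point. For a log-log fit, a doubling of the explanatory variable is associated with a multiplicative change of 2^slope in the median of the response, which is one way to state the result in the original units.

```r
# p-value for F = 10.10 on 4 and 6 degrees of freedom (class example).
p.value <- 1 - pf(10.10, 4, 6)
p.value

# Hypothetical data with replicates; values are invented for illustration.
isl <- data.frame(area    = c(10, 10, 100, 100, 1000, 1000, 10000, 10000),
                  species = c(8, 10, 17, 15, 30, 33, 60, 55))

# Log-log regression of species on area.
fit <- lm(log(species) ~ log(area), data = isl)

# Slope estimate and standard error from the coefficient table
# (row 2 is the slope; column 1 the estimate, column 2 the std. error).
coefs <- summary(fit)$coefficients
b1  <- coefs[2, 1]
se1 <- coefs[2, 2]

# 95% confidence interval for the slope on the log scale.
t.crit <- qt(0.975, fit$df.residual)
ci <- c(b1 - t.crit * se1, b1 + t.crit * se1)
ci

# Back to the original units: multiplicative change in median species
# associated with a doubling of area, with its 95% confidence interval.
2^b1
2^ci
```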
Remember, there is no "true" model, but some are more useful than others.