Statistics 242 -- Applied Regression Analysis
Lab 2
Simple linear regression analysis is built on the assumptions of
linearity, constant variance, independence, and normality. We
won't discuss independence now; in this lab/HW we will focus on ways to
examine the other assumptions using graphical displays.
Topics
- Residual Plots
  - residuals versus fitted values
  - residuals versus explanatory variable
- Normal Probability Plots or Normal Quantile plots of residuals
- R^2
- ANOVA Tables
  - The Regression ANOVA TABLE
  - The ONE-WAY Analysis of Variance ANOVA TABLE
  - Combining the above for a LACK of FIT TEST
Assignment
- Work through all the Conceptual Exercises in Chapter 8 (do not turn
in)
- Problem 17, Chapter 8. Verify that the fit is OK using the
residual and normal probability plots.
- Problem 21, Chapter 8.
General Hints for Obtaining Output Related to Chapter 8
To obtain the regression ANOVA table and to save the residuals and
fitted values, create the regression:
Select Regression from the Statistics menu. Enter the
model formula (remember, it has the form y ~ x). From the Regression
dialog, select the Results tab and check the boxes for ANOVA table,
Residuals, and Fitted Values; specify the dataframe name to save the
results.
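To see what the dialog is computing, the arithmetic behind the fitted values, residuals, regression ANOVA table, and R^2 can be sketched outside S-Plus. This is a minimal Python sketch, not S-Plus output; the data values and all variable names are made up for illustration.

```python
# Hypothetical data, made up for illustration only (not from the textbook).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least squares estimates for the model y ~ x
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = ybar - slope * xbar

# Fitted values and residuals (what the Results tab saves)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Pieces of the regression ANOVA table
ss_total = sum((yi - ybar) ** 2 for yi in y)
ss_resid = sum(r ** 2 for r in residuals)
ss_reg = ss_total - ss_resid
df_reg, df_resid = 1, n - 2
f_ratio = (ss_reg / df_reg) / (ss_resid / df_resid)
r_squared = ss_reg / ss_total

print("slope:", slope, "intercept:", intercept)
print("SS regression:", ss_reg, "on", df_reg, "df")
print("SS residual:", ss_resid, "on", df_resid, "df")
print("F ratio:", f_ratio, "R^2:", r_squared)
```

Plotting `residuals` against `fitted` gives the "Residual vs Fit" display described in these hints.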
For residual plots, in the
Regression dialog, select the Plot tab. Click the boxes for
"Residual vs Fit" and "Residual Normal QQ". When you do the
regression, these plots will be created.
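The coordinates behind a normal quantile plot of residuals can be sketched as follows. This is a minimal Python sketch under stated assumptions: the residuals are made up, and the plotting positions (i - 0.5)/n are one common convention; S-Plus may use a slightly different one.

```python
from statistics import NormalDist

# Hypothetical residuals, made up for illustration only.
residuals = [0.04, -0.13, 0.19, -0.19, 0.13, -0.04]

n = len(residuals)
sorted_resid = sorted(residuals)

# Standard normal quantiles at plotting positions (i - 0.5)/n.
# This choice of plotting position is an assumption, not the S-Plus definition.
std_normal = NormalDist(mu=0.0, sigma=1.0)
normal_quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each pair is (x, y) for one point on the normal quantile plot.
qq_points = list(zip(normal_quantiles, sorted_resid))
for q, r in qq_points:
    print(round(q, 3), r)
```

If the residuals are roughly normal, these points should fall close to a straight line.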
For the plot of residuals versus the explanatory variable, you will
need to use the Graph main menu to create the 2-D scatter plot. After
you fit the regression, the residual and fitted value variables will be
created. Select the Graph menu, then 2D Plot, and scatter plot. Put the
X variable on the x-axis and the residuals on the y-axis. For simple
linear regression this is not necessary: the fitted values are a linear
function of x, so the same information is in the residuals versus
fitted values plot.
Note: if you refit the model and save the residuals and fitted values
every time, the variables will be saved in other columns, but may not
have the correct names. The best approach may be to create a new
dataframe for the output.
Lack of Fit Test
If you want to do a Lack of Fit test, make sure that you have
observations with replicates of the explanatory variable. To obtain the
pure error sum of squares, you will need to obtain the ANOVA table from
fitting the linear regression AND the ANOVA table from fitting a One-Way
ANOVA model. To obtain the One-Way ANOVA table, select Analysis of
Variance from the Statistics menu, then select Fixed
Effects. Select the dataframe, and then create the model formula.
For example, in the lack of fit test for 8.16 covered in class, you
would type in the model as ph ~ as.factor(time) (rather than using the
create formula). The function as.factor converts the explanatory
variable time into a "grouping variable" or "factor" with a discrete
number of levels (6 in this case): one level for each time point, with
2 replicates at each point in time. The actual values of "time" do not
enter into the picture at all in the One-Way ANOVA when you use
as.factor(time).
The Residual Sum of Squares from the One-Way ANOVA table is the pure
error sum of squares for the lack of fit test, and the associated df
are the pure error degrees of freedom.
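The way the two ANOVA tables combine can be sketched numerically. This is a minimal Python sketch, not the class example: the values below are made up (they are not the Exercise 8.16 data) and only mimic its layout of 6 time points with 2 replicates each; the names time_vals and ph_vals are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: 6 time points, 2 replicates each (NOT the Exercise 8.16 data).
time_vals = [1, 1, 2, 2, 4, 4, 6, 6, 8, 8, 10, 10]
ph_vals = [7.0, 6.9, 6.5, 6.6, 6.1, 6.2, 5.9, 5.8, 5.7, 5.8, 5.6, 5.5]

n = len(ph_vals)
xbar = sum(time_vals) / n
ybar = sum(ph_vals) / n

# Residual SS from the straight-line regression ph ~ time
sxx = sum((x - xbar) ** 2 for x in time_vals)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(time_vals, ph_vals))
slope = sxy / sxx
intercept = ybar - slope * xbar
ss_resid = sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(time_vals, ph_vals))
df_resid = n - 2

# Pure error SS: the residual SS from the one-way ANOVA ph ~ as.factor(time),
# i.e. squared deviations of each response from its own group mean.
groups = defaultdict(list)
for x, y in zip(time_vals, ph_vals):
    groups[x].append(y)
ss_pure = sum(
    sum((y - sum(ys) / len(ys)) ** 2 for y in ys) for ys in groups.values()
)
df_pure = n - len(groups)

# Lack of fit: what the straight line leaves unexplained beyond pure error
ss_lof = ss_resid - ss_pure
df_lof = df_resid - df_pure
f_lof = (ss_lof / df_lof) / (ss_pure / df_pure)

print("lack of fit SS:", ss_lof, "on", df_lof, "df")
print("pure error SS:", ss_pure, "on", df_pure, "df")
print("F ratio:", f_lof)
```

With 12 observations and 6 groups, the lack of fit F ratio has 4 numerator and 6 denominator degrees of freedom.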
Hints for HW Problems
Exercise 8.17
- Download and read in the datafile Ex0817.asc.
- Use Transform under the Data menu to create all of
the new variables such as log(mass), sqrt(mass), 1/mass, sqrt(load),
etc. In S-Plus 4.5, if you click on the "Apply" button, it will create
the transformation without leaving the dialog box. You can then enter
in the name and expression for a new transformation, click "Apply",
and repeat, so that you do not have to bring up the dialog box each time.
- To do multiple scatter plots within one graph, select 2D and
Scatterplot as you have done before. To add additional graphs, make
sure that you use the same Graph Sheet, i.e. GS1, as used in the
previous graph (scroll down to get it, rather than starting a new graph
sheet). Select 2D and Scatterplot with the next set of variables.
Repeat until you have all of the plots you want. You can change some
aspects of the layout using the Format menu by selecting Arrange
Graphs, and then specifying how many to go across a page.
Another approach altogether is to use Graph, 2D, and select Matrix. Do
this on a new graph sheet. In the dialog box, in the X-columns ONLY,
give the names of all columns that you want to plot, separated by a
comma and a space. This will create a table of plots of all variables
against each other. As this contains some plots that you may not care
about, the former method is better for formal presentation; however,
this may be faster to produce :-)
For more fun, try Brush and Spin under the Graph menu. This is useful
for detecting outliers. To specify only some of the columns, use
Control-clicks to select the columns. De-select the Spin box for now.
Click on OK. To highlight points, click on the Big Points box.
- Look at residual plots and normal probability plots for some of the
other possible models to see the types of patterns that may arise.
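The transformed variables mentioned in the hints above are simple elementwise calculations. A minimal Python sketch, assuming made-up positive values for mass and load (not the contents of Ex0817.asc) and hypothetical column names:

```python
import math

# Hypothetical values, made up for illustration (NOT the Ex0817.asc data).
mass = [0.5, 1.2, 3.4, 8.0, 15.5]
load = [2.0, 4.1, 9.8, 20.3, 36.0]

# One new column per transformation; all values must be positive
# for the log, square root, and reciprocal to be defined.
transformed = {
    "log_mass": [math.log(m) for m in mass],
    "sqrt_mass": [math.sqrt(m) for m in mass],
    "inv_mass": [1.0 / m for m in mass],
    "sqrt_load": [math.sqrt(v) for v in load],
}

for name, column in transformed.items():
    print(name, [round(v, 3) for v in column])
```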
Exercise 8.21
- Make sure that you have read Case Study 8.1 before starting this.
- Download and read in the datafile Ex0821.asc.
- Start with a scatterplot of Species vs Area. Do you need to
transform both species and area? (The residual plots may tell you more
than the scatterplot.)
If transformations are needed, you might start with log transformations of both
variables, as the theory in Case Study 8.1 suggests, rather than
starting with a shotgun approach of trying all transformations.
Look at residual and normal probability plots to guide you. (To get a
feel for how transformations work, you may find it valuable to try
several anyway.)
- Note: R^2 values cannot be directly compared for regressions with and
without transformed response variables. They can be used to compare
models with the same response and different transformations of the
explanatory variable.
- Because there are replicates, you can do a formal Lack of Fit
Test. This may help in deciding upon a transformation and indicating if
the model is appropriate.
- To get the p-value for an F-test where the F-ratio is 10.10, with
4 df in the numerator and 6 df in the denominator (as in the class
example), use 1 - pf(10.10, 4, 6) in the Command Window.
- Limit your write-up to 1 page, including any figures. You do not
need to explain all the steps you went through to get your final model,
but you should justify that it is appropriate. Be sure to report
confidence intervals for parameter estimates, and give interpretations
of the final model (in the original units). See the case studies for
examples and read through Section 8.4 for explanations of how to
interpret results using log transformations.
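The F-test p-value in the hints above, 1 - pf(10.10, 4, 6), can be cross-checked outside S-Plus. This is a minimal Python sketch, not a general replacement for pf: the helper f_upper_tail is hypothetical and only handles the case where both degrees of freedom are even, using the identity between the F upper tail and a binomial sum.

```python
from math import comb

def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with df1 and df2 degrees of freedom.

    Hypothetical helper: valid ONLY when df1 and df2 are both even.
    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f),
    where for integers a, b the regularized incomplete beta I_x(a, b)
    equals P(Binomial(a + b - 1, x) >= a).
    """
    if df1 <= 0 or df2 <= 0 or df1 % 2 or df2 % 2:
        raise ValueError("this sketch needs positive, even degrees of freedom")
    if f <= 0:
        return 1.0
    a, b = df2 // 2, df1 // 2
    x = df2 / (df2 + df1 * f)
    m = a + b - 1
    return sum(comb(m, j) * x ** j * (1 - x) ** (m - j) for j in range(a, m + 1))

# Same quantities as the class example: F ratio 10.10 on 4 and 6 df.
p_value = f_upper_tail(10.10, 4, 6)
print(p_value)  # roughly 0.0078
```

For odd degrees of freedom, use pf in the Command Window as described in the hints.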
Remember, there is no "true" model, but some are more useful than others.