Statistics 242 -- Applied Regression Analysis
Assignment
- Exercises 9:1-10
(do not turn in, self-check answers on page 253)
- Exercise 9:13
- Exercise 9:14
Lab Goals
- Creating and understanding a scatterplot matrix
- Fitting multiple regression in S-Plus
- Continuing to use residual plots to assess model adequacy
In S-Plus, fitting multiple regression models involves the same steps as
fitting a simple linear regression model. Select the Statistics
Menu, choose Regression and Linear. The only step that
changes is in the specification of the model formula. Recall, the model
formula in simple linear regression,
Y ~ X
reads "the response Y is modeled as a linear function of X". For
multiple regression, we allow the mean of Y to be a function of multiple
explanatory variables, so the model formula might be
Y ~ X1 + X2
which means that the response Y is modeled (the tilde) as a linear
function of X1 and X2. X1 and X2 may be any explanatory variables,
including transformations of other variables. In Exercise 13 we will
examine quadratic regression, where we fit a model with time and time
squared as variables; this is a special case of multiple regression. In
Exercise 14, we have 3 different explanatory variables. In Exercise 16
(covered in class), we created dummy variables and interaction variables
to use as explanatory variables.
We will focus first on fitting multiple regressions, and then come back
to interpreting the output in more detail in later labs.
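
To make the formula notation concrete, here is a minimal sketch in Python (not S-PLUS) of what a fit of Y ~ X1 + X2 produces: an intercept plus one coefficient per explanatory variable, chosen by least squares. The function names solve and fit_linear and the data values are made up for illustration. The sketch solves the normal equations directly, which is fine for a tiny example but is not necessarily how S-PLUS computes the fit.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest entry in this column into the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(y, *xs):
    """Least-squares fit of y ~ x1 + x2 + ...; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + [float(v) for v in vals] for vals in zip(*xs)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Made-up values for illustration only: Y is built exactly as 1 + 2*X1 + 3*X2,
# so the fitted coefficients should come back close to 1, 2 and 3.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

coef = fit_linear(y, x1, x2)
print("intercept, X1, X2 coefficients:", [round(c, 6) for c in coef])
```

Adding another explanatory variable to the formula simply adds another column to the fit and another coefficient to the output.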
Specific Tips for the Homework
Exercise 13
- The data are in file Case0702.asc. Download the file, and then
read it into S-PLUS.
- From the Data menu, select Transform and create the
variables for the exercise; we will need time^2, log(time) and
(log(time))^2. (These are also the expressions to type for the
transformation in the dialog box.) Recall that if you click Apply,
S-PLUS creates the first transformation, and you may then continue with
another. In what follows I will use time2, log.time and
log.time2 as the names for these transformed variables.
- For (a): To fit the multiple regression, select
the Statistics Menu, choose
Regression and Linear. To enter the formula you may type
it in directly:
ph ~ time + time2
or click "create" to add the parts; select ph as the response and both
time and time2 as Main effects.
- Select the residual plots that you would like to view, and then click on
OK. The output will be in the Report Window.
- Locate the output that
lists the coefficient estimates, standard errors, t-statistics, and
p-values. You will need the p-value for time2. This p-value
corresponds to the null hypothesis that the regression coefficient for
time^2 is 0 (vs not 0) given that time is included in the model. In
other words, we are looking for evidence that a squared term in time is
needed, in addition to time itself, to model a possible curvilinear
relationship between time and pH.
- For (b): substitute log.time and log.time2 for time and
time2 in the above.
- In addition to the p-values, do the residual or normal probability plots
indicate whether it is better to use the logarithm of time or leave it untransformed?
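
The steps above can be sketched in Python (not S-PLUS) as follows. The sketch builds the three transformed variables and then compares the model with time alone to the model with time and time2. For a single added term, the extra-sum-of-squares F-statistic equals the square of the t-statistic reported for time2, which is one way to see what "given that time is included in the model" means. The helper names solve and rss are hypothetical, and the data values are made up; they are NOT the Case0702 data.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(y, columns):
    """Residual sum of squares from a least-squares fit of y on an intercept plus columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical values for illustration only; NOT the Case0702 data.
time = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 16.0]
ph = [7.0, 6.6, 6.1, 5.9, 5.8, 5.7, 5.7, 5.6]

# The three transformations from the Transform dialog (log is the natural log).
time2 = [t ** 2 for t in time]
log_time = [math.log(t) for t in time]
log_time2 = [lt ** 2 for lt in log_time]

# Part (a): does time2 help once time is already in the model?
n = len(ph)
rss_reduced = rss(ph, [time])        # ph ~ time
rss_full = rss(ph, [time, time2])    # ph ~ time + time2
f_stat = (rss_reduced - rss_full) / (rss_full / (n - 3))
t_abs = math.sqrt(f_stat)            # |t| for the time2 coefficient
print(f"F = {f_stat:.2f}, |t| for time2 = {t_abs:.2f}")
```

The p-value in the S-PLUS output comes from comparing this t-statistic to a t distribution with n - 3 degrees of freedom; the sketch stops short of that step because the Python standard library has no t distribution. For part (b), pass log_time and log_time2 to rss in place of time and time2.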
Exercise 14
- The data are in file Ex0914.asc. Download the file, and then
read it into S-PLUS.
- To create a scatterplot matrix, select the Graph menu and then
choose "Scatterplot matrix". In the dialog box, select the variables
bank, walk, talk, and heart for the X-axis.
In this matrix or table, each entry
corresponds to a simple scatterplot. For a particular column of the
matrix, the variable for the column is plotted on the X-axis in all
scatterplots in the column. For a particular row, the corresponding
variable is plotted on the Y-axis. The diagonal elements would
correspond to plotting each variable against itself, which is not very
interesting, so instead that space contains the variable name, which lets
you identify the variable for that row or column. The output from
Brush and Spin is similar but includes only the panels below the diagonal.
- To fit the multiple regression of heart on bank, walk, and talk, go
to the Statistics menu, select Regression, Linear.
The model formula should be
heart ~ bank + walk + talk
- Make sure that the selected Plots include the residuals vs fitted
values, and also check out the normal probability plot (normally part of
the default output).
- The output in the Report Window contains the estimates of the
coefficients and standard errors as needed for part (d).
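
As a reading aid for the scatterplot matrix described above, the following Python sketch (not S-PLUS) prints which pair of variables appears in each panel, using the row-is-Y, column-is-X convention from the text. The function name scatterplot_matrix_layout is hypothetical, and the variable order is assumed to be the order in which they were selected in the dialog.

```python
variables = ["bank", "walk", "talk", "heart"]


def scatterplot_matrix_layout(names):
    """Return a grid of panel labels: the row variable is on the Y-axis,
    the column variable is on the X-axis, and the diagonal holds the name."""
    grid = []
    for row_var in names:
        row = []
        for col_var in names:
            if row_var == col_var:
                row.append(row_var)  # diagonal cell: just the variable name
            else:
                row.append(f"{row_var} vs {col_var}")  # Y vs X
        grid.append(row)
    return grid


layout = scatterplot_matrix_layout(variables)
for row in layout:
    print(" | ".join(cell.ljust(14) for cell in row))
```

With this ordering, the bottom row shows heart on the Y-axis against each of bank, walk, and talk, which makes it the row most directly related to the regression of heart on those three variables.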