Agenda for STA242/ENV255 Lab, Wednesday 2/11/04

By now you have read Chapter 9 of Sleuth. We will focus our discussion today on Case Study 9.1.1, the Meadowfoam Flowering Example (p. 236).

Topic: The Use of "Dummy" or "Indicator" Variables in Regression

Problem: Multiple regression with 2 explanatory variables, where one of the explanatory variables is a categorical variable with 2 levels.
There are 3 ways the categorical variable can enter into the regression, and each implies a different model.
Handout to print: Three models under consideration for meadowfoam problem
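As a quick reference, the three models can be written out as below. This is a sketch in the notation of the text, assuming "light" stands for light intensity and "early" is an indicator equal to 1 for the Early group and 0 for the Late group; the symbols are labels chosen here for illustration.

```latex
\begin{align*}
\text{Equal lines:} \quad
  & \mu\{\text{flowers} \mid \text{light}, \text{early}\}
    = \beta_0 + \beta_1\,\text{light} \\
\text{Parallel lines:} \quad
  & \mu\{\text{flowers} \mid \text{light}, \text{early}\}
    = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early} \\
\text{Separate lines:} \quad
  & \mu\{\text{flowers} \mid \text{light}, \text{early}\}
    = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early}
      + \beta_3\,(\text{light} \times \text{early})
\end{align*}
```

Under the separate lines model, setting early = 0 gives the Late line, with intercept \beta_0 and slope \beta_1, and setting early = 1 gives the Early line, with intercept \beta_0 + \beta_2 and slope \beta_1 + \beta_3.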

For the Meadowfoam Data, variables are:

Explanatory: Light Intensity (continuous)
Explanatory: Time (categorical). Two levels: Late (at photoperiodic floral induction, PFI) = 1 and Early (before PFI) = 2.
Response: Average number of flowers per plant.

Recode the Time variable so that Late=0 and Early=1. You can do this by hand, or look at the functions under "Data".
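If you recode by hand, the mapping is simply 1 -> 0 and 2 -> 1. Here is a minimal sketch of that mapping, written in Python rather than Splus for illustration only; the six codes in `time_raw` are made-up placeholders, not the meadowfoam data, and the variable names are hypothetical.

```python
# Hypothetical Time codes as stored in the data set: 1 = Late, 2 = Early.
time_raw = [1, 1, 1, 2, 2, 2]

# Recode to an indicator variable: Late -> 0, Early -> 1.
recode = {1: 0, 2: 1}
early = [recode[t] for t in time_raw]

print(early)  # [0, 0, 0, 1, 1, 1]
```

With this coding, the coefficient of the indicator in a regression measures how the Early group differs from the Late group.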

Equal Lines Model.
1. [LAB] Regress flowers on intensity. Check the residual plots for violations of the assumptions.
2. [LAB] Note that this model implies that the same linear relationship between intensity and flowers holds for both time levels (same slope, same intercept).
3. [LAB] Produce a coded scatterplot with regression line superimposed. You will need to run the following Splus commands. Compare your plot to the plot at the bottom of Display 9.8, "Equal lines model".
4. Interpret the slope. Increasing light intensity has what effect on the mean number of flowers per plant? Give a confidence interval for the slope.
5. Use the centering trick to investigate the mean number of flowers at a light intensity of 500, by fitting the model: flowers~I(intensity-500).
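To see why the centering trick works, here is a small sketch in Python (not Splus, and using made-up numbers rather than the meadowfoam data). It fits a least-squares line twice, once on intensity and once on intensity minus 500, using a hypothetical helper `fit_line`. The slope should be unchanged, and the intercept of the centered fit should equal the fitted mean at intensity 500.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up illustrative values, NOT the meadowfoam data.
intensity = [150, 300, 450, 600, 750, 900]
flowers = [70.0, 62.0, 57.0, 49.0, 45.0, 38.0]

# Uncentered fit: flowers ~ intensity
b0, b1 = fit_line(intensity, flowers)

# Centered fit: flowers ~ I(intensity - 500)
c0, c1 = fit_line([x - 500 for x in intensity], flowers)

print("slope (uncentered, centered):", b1, c1)
print("fitted mean at 500 from uncentered fit:", b0 + 500 * b1)
print("intercept of centered fit:", c0)
```

The payoff in the lab is that the standard error reported for the intercept of the centered fit is the standard error of the estimated mean at intensity 500, so the regression output gives you the pieces for a confidence interval directly.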
Parallel Lines Model.
1. [LAB] Now regress flowers on intensity and time. Check residuals for violations of assumptions.
2. [LAB] Note that this model implies that the linear relationship between the number of flowers and intensity is the same for both levels of time, but that the mean numbers of flowers for the early and late time groups differ by a fixed amount.
3. [LAB] What is the slope for the late group? For the early group?
4. [LAB] What about intercepts for early and late time groups?
5. [LAB] Use your regression output to give the separate regression equations for the late and early groups.
6. [LAB] Produce a coded scatterplot with parallel regression lines superimposed. Compare your plot to the plot at the middle of Display 9.8, "Parallel lines model".
7. [LAB] Note that this model implies that the effect of "time" is to shift the regression line for the early time group up by a fixed amount. What is this amount? How would you get a confidence interval for this amount? Is the shift statistically significant (test of the coefficient for time)? That is, is the parallel lines model preferable to the equal lines model?
8. Looking at the F-statistic and p-value at the bottom of the "Parameter Estimates" output, what are the null and alternative hypotheses?
9. Use the centering trick to investigate the mean number of flowers at a light intensity of 500, by fitting the model: flowers~I(intensity-500)+time. Note that the value will depend on the level of the time variable.
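The way one set of coefficients yields two regression equations can be sketched as follows. This is an illustration in Python rather than Splus, on made-up numbers (not the meadowfoam data), with time coded Late = 0 and Early = 1. The helper `ols` is a hypothetical name for a small normal-equations solver written here so the example has no dependencies; in the lab, your regression output supplies these coefficients.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Made-up illustrative values, NOT the meadowfoam data.
intensity = [150, 300, 450, 600, 750, 900] * 2
early = [0] * 6 + [1] * 6                        # indicator: Late = 0, Early = 1
flowers = [70.0, 62.0, 57.0, 49.0, 45.0, 38.0,   # late group
           82.0, 75.0, 68.0, 63.0, 55.0, 51.0]   # early group

# Design matrix columns: intercept, intensity, early indicator.
X = [[1.0, x, d] for x, d in zip(intensity, early)]
b0, b1, b2 = ols(X, flowers)

# One common slope, two intercepts.
print(f"late:  flowers = {b0:.3f} + ({b1:.5f}) * intensity")
print(f"early: flowers = {b0 + b2:.3f} + ({b1:.5f}) * intensity")
print(f"shift between the parallel lines: {b2:.3f}")
```

Both groups share the slope `b1`; the coefficient of the indicator, `b2`, is the vertical distance between the two lines. In this toy example both groups were measured at the same intensities, so `b2` works out to the difference between the two group means.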
Separate Lines Model.
1. [LAB] Now fit this regression model: flowers~intensity*time. Check residuals for violations of assumptions. This model notation is equivalent to fitting the model: flowers~intensity+time+intensity:time
2. [LAB] Note that this model implies that the effect of light intensity on the number of flowers depends on whether the timing is "early" or "late." That is, this model assumes that there is an interaction between intensity and time.
3. [LAB] Read Section 9.3.4. Use your regression output to give the regression equations for the early and late groups. Note that the regression lines will differ by both slope and intercept; that is, the linear relationship between flowers and intensity differs according to time.
4. [LAB] Produce a coded scatterplot with regression lines superimposed. Compare your plot to the plot at the top of Display 9.8, "Separate lines model". Now compare your plot to the parallel lines model plot and explain the difference after looking at the magnitude and p-value of the coefficient of "intensity:time."
5. [LAB] Test the significance of the interaction term to determine if the parallel lines or separate lines model is more appropriate for these data.
Note: The parallel lines and separate lines models are different from an analysis where 2 separate regression lines are fit, one to each time group. If I fit 2 separate regression lines, I would be estimating two model errors (one error variance from each group's data). If I fit the parallel lines or separate lines models, I am using all of the data to estimate a single model error.
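The distinction can be made concrete with a small sketch, again in Python rather than Splus and on made-up numbers (not the meadowfoam data). Fitting a line to each group separately gives the same two lines as the separate lines model, so the interaction-model coefficients can be read off as differences. What changes is the error estimate: two variances, each on n_g - 2 degrees of freedom, versus one pooled variance on n - 4. The helper `fit_line` and all variable names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept, slope, and residual sum of squares."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse

# Made-up illustrative values, NOT the meadowfoam data.
intensity = [150, 300, 450, 600, 750, 900]
late = [70.0, 62.0, 57.0, 49.0, 45.0, 38.0]
early = [82.0, 75.0, 68.0, 63.0, 55.0, 51.0]

a_late, s_late, sse_late = fit_line(intensity, late)
a_early, s_early, sse_early = fit_line(intensity, early)

# Coefficients of flowers ~ intensity*time with Late = 0, Early = 1.
beta = {
    "intercept": a_late,                  # intercept of the late line
    "intensity": s_late,                  # slope of the late line
    "time": a_early - a_late,             # difference in intercepts
    "intensity:time": s_early - s_late,   # difference in slopes
}

n = len(late) + len(early)
# Two separate fits: one error variance per group, each on n_g - 2 df.
var_late = sse_late / (len(late) - 2)
var_early = sse_early / (len(early) - 2)
# Separate lines model: a single pooled error variance on n - 4 df.
var_pooled = (sse_late + sse_early) / (n - 4)

print(beta)
print("separate-fit variances:", var_late, var_early)
print("pooled variance:", var_pooled)
```

The pooled estimate uses all of the data and has more degrees of freedom, which is what the single-model approach buys you, provided the assumption of equal spread in the two groups is reasonable.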