Sta 242 / Env 255
Homework 3
Due Thursday, February 28 in class
Last modified: Thu Feb 28 16:46:34 EST 2002
This homework corresponds to Chapter 9 of Sleuth. You are responsible for Conceptual Exercises in Chapter 9.
To maximize your grade, read these homework guidelines.
Please put all plots on pages at the end of your homework.
Report all fitted regression lines in the format of the box above Section 7.4 (on page 180).
Data in pollen.txt. Variables: "removed", "duration", "code" (0=queens; 1=workers).
What you should learn from this problem: Familiarity with a logit transformation; interpreting the meaning of coefficients in this type of model. Walking through consideration of three types of models: single line, parallel lines, and non-parallel lines. Knowledge of how to plot these models will be useful to your projects. Understanding dynamics of how p-values change with addition/deletion of variables. Often a categorical variable (such as queen/worker status) can be a "lurking" variable in a regression problem; ignoring it can be misleading. Many regressions you will see this semester will contain both categorical and continuous variables.
These problems (I-IV) are strongly recommended, but you are not required to turn them in. You can work in groups on this; the important thing is to understand how the logit transformation changes the fit of the regression. Also, you should recognize that I-IV represent a sequence of steps you might follow to choose a model.
Now we will focus on the model in which the logit of p is a linear function of the log of duration. Recall that this means that p is a non-linear function of duration. The slope of the fitted line will indicate whether p is an increasing or decreasing non-linear function of duration. A final writeup of such a study would include information about the function fitted (plotted on original scale) as well as a qualitative description of the behavior of p as a function of duration and type of bee.
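As a minimal sketch of the transformations involved, the Python below computes the logit of a proportion and the log of duration, and maps a logit back to a proportion. The function names and the example values are assumptions made for illustration; the values are not taken from pollen.txt.

```python
import math

def logit(p):
    """Log-odds of a proportion p, defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined for 0 < p < 1")
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map a log-odds value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical (removed, duration) pairs, NOT taken from pollen.txt.
example_rows = [(0.25, 2.0), (0.50, 10.0), (0.80, 40.0)]

# Transformed response and explanatory variable used in the regression:
# (logit of proportion removed, log of duration).
transformed = [(logit(removed), math.log(duration))
               for removed, duration in example_rows]
```

Because inv_logit is a curved function of its argument, a straight line on the (log duration, logit) scale becomes a curve when expressed as p against duration.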
Fit the following 3 models and put the report output on a separate page.
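For reference, the three models can be sketched in terms of the mean of the logit of the proportion removed. This rests on assumptions: the labels A, B, and C are inferred from how the questions below refer to the models, and the coefficient subscripts are chosen here for illustration. The variable code is the indicator from the data file (0=queens; 1=workers).

```latex
\begin{align*}
\text{Model A (single line):} \quad
  & \mu\{\operatorname{logit}(\text{removed})\}
    = \beta_0 + \beta_1 \log(\text{duration}) \\
\text{Model B (parallel lines):} \quad
  & \mu\{\operatorname{logit}(\text{removed})\}
    = \beta_0 + \beta_1 \log(\text{duration}) + \beta_2\,\text{code} \\
\text{Model C (non-parallel lines):} \quad
  & \mu\{\operatorname{logit}(\text{removed})\}
    = \beta_0 + \beta_1 \log(\text{duration}) + \beta_2\,\text{code}
      + \beta_3\,\bigl(\text{code} \times \log(\text{duration})\bigr)
\end{align*}
```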
Ho: The effect of duration on the fraction of pollen removed is the same for both types of bees. (same model for both types of bees; for transformed data we have same slope, same intercept)
Ha: While the linear relationship between log(duration) and the logit of the proportion of pollen removed is the same for the two types of bees (same slope), the logit of the proportion of pollen removed for workers at each level of log(duration) is different from that of queens. (different intercepts) Write out the fitted model that corresponds to the null hypothesis, and the fitted model that corresponds to the alternative hypothesis. Perform the test suggested by the hypotheses above using the model output for Model B. What coefficient are you testing? Give hypotheses in statistical notation (Greek), test statistic, p-value and conclusion. Which model is more appropriate?
How do the hypotheses differ when we wish to test whether the logit of the proportion of pollen removed for workers exceeds that of queens, after accounting for the effect of duration? What is the p-value for this test?
Is the p-value for the significance of the log(duration) term different in Model A than in Model B? Why?
Ho: While the linear relationship between log(duration) and the logit of the proportion of pollen removed is the same for the two types of bees (same slope), the logit of the proportion of pollen removed at each level of log(duration) differs between the two types of bees. (different intercepts)
Ha: The linear relationship between log(duration) and the logit of the proportion of pollen removed is different for workers and for queens. (different slopes, different intercepts)
Write out the fitted models implied by the null and alternative hypotheses. Perform this test by comparing the model output for Models B and C. What coefficient are you testing? Give hypotheses in statistical notation (Greek), test statistic, p-value and conclusion. Which model is more appropriate? Why is the p-value for the significance of the indicator variable so different in Model B than in Model C (the one with the interaction term)?
Interpret the intercept both on the transformed scale and on the original scale. Note that by interpreting the intercept, you are extrapolating.
A note: Hypothesis tests do not tell the whole story; "effect sizes", given in terms of confidence intervals, are more informative.
Optional but strongly suggested: Make a plot of the fitted model on the original scale of measurement. This can be accomplished using the command line. For the model considered in (1), we can write Y=proportion pollen removed as a function of X=duration as follows.
p = ( exp(beta0) X^beta1 ) / ( 1 + exp(beta0) X^beta1 )
See Splus directions
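For those working outside Splus, the back-transformed curve can be sketched in Python as follows. This is only an illustration: the function name and the coefficient values are hypothetical, and you would substitute your own fitted beta0 and beta1. The formula is the one given above, which follows from logit(p) = beta0 + beta1 log(X), so that the odds p/(1-p) equal exp(beta0) X^beta1.

```python
import math

def fitted_proportion(duration, beta0, beta1):
    """p = exp(beta0) * X**beta1 / (1 + exp(beta0) * X**beta1), for X > 0."""
    odds = math.exp(beta0) * duration ** beta1
    return odds / (1.0 + odds)

# Hypothetical coefficients for illustration only; use your fitted values.
beta0, beta1 = -3.0, 1.0

# Grid of durations at which to evaluate the fitted curve.
durations = [1.0, 5.0, 10.0, 20.0, 40.0, 80.0]
curve = [(x, fitted_proportion(x, beta0, beta1)) for x in durations]

for x, p in curve:
    print(f"duration={x:6.1f}  fitted p={p:.3f}")
```

The (duration, p) pairs in curve can be passed to any plotting tool. With a positive beta1 the curve rises toward 1 as duration grows; with a negative beta1 it falls toward 0.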
Data in EX0914.ASC. Variables: "bank", "walk", "talk", "heart".