Today's lab will focus on topics in Chapter 11. Specifically, we will work through the results for Case Study 11.1, "Alcohol Metabolism in Men and Women: An Observational Study." The data are located here.
First, go over conceptual exercises 1, 2, and 3 on p. 317.
- The first step of the analysis is to produce a coded scatterplot as in Display 11.2. It is assumed that you know how to do this, so you don't need to do it for lab.
- The first model we will fit is a model with all main effects, all two-way interactions, and the three-way interaction. This model is coded as: metabol~gastric*female*alcohol, or equivalently, metabol~gastric+female+alcohol+gastric:female+gastric:alcohol+female:alcohol+gastric:female:alcohol. Print out the model output. Write down the fitted model for a male non-alcoholic. Compare this to the fitted model for a female alcoholic.
- Create a residual plot for this model, and confirm that it looks like Display 11.7. Also create a QQ normal plot. Check for violations of the regression assumptions.
- Produce case influence statistics for the model metabol~gastric*female*alcohol. Can you describe why cases 31 and 32 appear unusual in terms of the information from each case influence statistic?
Directions for Case Diagnostics. First, if your data are named "data1", type "attach(data1)" at the command line. Then, to run the script file, enter the command: diagplot.fun(cbind(gastric,female,alcohol,gastric*female,gastric*alcohol,female*alcohol,gastric*female*alcohol),metabol)
Off on a slight tangent... Later in the chapter, the authors investigate case diagnostic plots for the model metabol~gastric*female. See Display 11.11. Although this step is out of sequence for this lab agenda, it is useful to reproduce this plot for comparison. Do this with cases 31 and 32 included by typing: diagplot.fun(cbind(gastric,female,gastric*female),metabol). Note that observation 17 has high leverage but does not seem to be influential.
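The model fit, residual checks, and case influence statistics described above can also be sketched with base R alone. This is only a sketch: diagplot.fun is the course script and is not reproduced here, so hatvalues(), rstandard(), and cooks.distance() are swapped in for it. The data frame below is simulated so that the code runs on its own; it is NOT the case study data, and the 2p/n leverage cutoff and Cook's distance cutoff of 1 are common rules of thumb rather than anything prescribed by this lab.

```r
# Simulated stand-in for the lab data (assumption: columns metabol, gastric,
# female, alcohol, with female and alcohol coded 0/1). Replace with the real data.
set.seed(11)
data1 <- data.frame(
  gastric = c(runif(30, 0.8, 3.0), 4.5, 5.2),   # rows 31-32 placed beyond 3
  female  = rep(c(1, 0), times = c(18, 14)),
  alcohol = rep(c(0, 0, 0, 1), times = 8)
)
data1$metabol <- with(data1, 1.5 * gastric - 0.7 * gastric * female) +
  rnorm(32, sd = 0.8)

# Full model: all main effects, two-way interactions, and the three-way interaction
fullmod <- lm(metabol ~ gastric * female * alcohol, data = data1)
print(summary(fullmod))

# Residuals vs. fitted values, and a normal QQ plot of the residuals
op <- par(mfrow = c(1, 2))
plot(fitted(fullmod), resid(fullmod),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(resid(fullmod))
qqline(resid(fullmod))
par(op)

# Case influence statistics, one row per case
p <- length(coef(fullmod))
n <- nrow(data1)
infl <- data.frame(
  case       = seq_len(n),
  leverage   = hatvalues(fullmod),       # diagonal of the hat matrix
  stud_resid = rstandard(fullmod),       # internally studentized residuals
  cooks_d    = cooks.distance(fullmod)   # Cook's distance
)

# Cases flagged by the rule-of-thumb cutoffs
print(infl[infl$leverage > 2 * p / n | infl$cooks_d > 1, ])
```

With the real data, cases 31 and 32 are the rows to look for in this table; comparing their three statistics side by side is one way to answer the question in the step above.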
- Now refit the model without cases 31 and 32. (Save the fitted model object as "fullmod.wo3132".) Create a residual vs. fitted plot and a QQ normal plot. For a formal report, you would also check for violations of assumptions and produce new case diagnostics for this model, but move on for now.
Based on the report output, write down the fitted model for a male non-alcoholic. Compare this to the fitted model for a female alcoholic. Describe how the exclusion of these 2 cases changes your model interpretations.
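One possible way to refit without cases 31 and 32 and to read off the fitted line for each group is sketched below. The data are simulated placeholders (not the case study data), and group_line() is a hypothetical helper written for this illustration; it assumes female and alcohol are numeric 0/1 indicators, so the coefficient names match those shown.

```r
# Simulated stand-in for the lab data; replace with the real data frame.
set.seed(11)
data1 <- data.frame(
  gastric = c(runif(30, 0.8, 3.0), 4.5, 5.2),
  female  = rep(c(1, 0), times = c(18, 14)),
  alcohol = rep(c(0, 0, 0, 1), times = 8)
)
data1$metabol <- with(data1, 1.5 * gastric - 0.7 * gastric * female) +
  rnorm(32, sd = 0.8)

fullmod <- lm(metabol ~ gastric * female * alcohol, data = data1)

# Drop rows 31 and 32 by negative indexing, then refit the same model
fullmod.wo3132 <- lm(metabol ~ gastric * female * alcohol,
                     data = data1[-c(31, 32), ])

# Hypothetical helper: intercept and slope (in gastric) of the fitted line
# for a given female/alcohol combination, built from the coefficients.
group_line <- function(fit, female, alcohol) {
  b <- coef(fit)
  c(intercept = unname(b["(Intercept)"] + female * b["female"] +
                       alcohol * b["alcohol"] +
                       female * alcohol * b["female:alcohol"]),
    slope     = unname(b["gastric"] + female * b["gastric:female"] +
                       alcohol * b["gastric:alcohol"] +
                       female * alcohol * b["gastric:female:alcohol"]))
}

comparison <- rbind(
  male_nonalc_all = group_line(fullmod,        female = 0, alcohol = 0),
  male_nonalc_wo  = group_line(fullmod.wo3132, female = 0, alcohol = 0),
  female_alc_all  = group_line(fullmod,        female = 1, alcohol = 1),
  female_alc_wo   = group_line(fullmod.wo3132, female = 1, alcohol = 1)
)
print(round(comparison, 3))
```

Setting the indicators to 0 or 1 and collecting terms is the same bookkeeping you would do by hand when writing down the fitted model for each group.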
- Proceeding from the guideline that it is unwise to state conclusions that hinge on one or two points, we exclude cases 31 and 32, and thus restrict our model building and inference to gastric AD activity less than 3. You will need to remove these two rows from the data.
We will proceed by evaluating the model we fit in step 5. Since alcoholism is not of primary concern, we will evaluate whether the set of terms relating to alcoholism (alcohol, gastric:alcohol, female:alcohol, and gastric:female:alcohol) belongs in the model.
Perform an extra sum of squares F-test to test the significance of these terms. (Fit the model metabol~gastric*female, save it as "reducedmod.wo3132", and proceed as discussed at the bottom of the 3/6 lab agenda.) What are the null and alternative hypotheses? What are the test statistic and its degrees of freedom? Compare your answer to that in the middle of page 308.
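A minimal sketch of the extra sum of squares F-test follows, using anova() on the two nested fits and then repeating the calculation by hand from the residual sums of squares. The data are simulated placeholders (not the case study data), so the numbers printed will not match page 308; the column name female is assumed, as in the diagplot.fun calls.

```r
# Simulated stand-in for the lab data; replace with the real data frame.
set.seed(11)
data1 <- data.frame(
  gastric = c(runif(30, 0.8, 3.0), 4.5, 5.2),
  female  = rep(c(1, 0), times = c(18, 14)),
  alcohol = rep(c(0, 0, 0, 1), times = 8)
)
data1$metabol <- with(data1, 1.5 * gastric - 0.7 * gastric * female) +
  rnorm(32, sd = 0.8)
d30 <- data1[-c(31, 32), ]   # cases 31 and 32 removed

fullmod.wo3132    <- lm(metabol ~ gastric * female * alcohol, data = d30)
reducedmod.wo3132 <- lm(metabol ~ gastric * female, data = d30)

# Extra sum of squares F-test: reduced model first, full model second
ftab <- anova(reducedmod.wo3132, fullmod.wo3132)
print(ftab)

# The same test by hand
rss_reduced <- deviance(reducedmod.wo3132)   # residual SS, reduced model
rss_full    <- deviance(fullmod.wo3132)      # residual SS, full model
df_extra    <- df.residual(reducedmod.wo3132) - df.residual(fullmod.wo3132)
df_full     <- df.residual(fullmod.wo3132)

f_hand <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df_full)
p_hand <- pf(f_hand, df_extra, df_full, lower.tail = FALSE)
cat("F =", f_hand, "on", df_extra, "and", df_full, "df; p =", p_hand, "\n")
```

The numerator degrees of freedom equal the number of coefficients set to zero under the null hypothesis, and the denominator degrees of freedom are the residual degrees of freedom of the full model.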
- Proceeding with the model metabol~gastric*female, it appears from the p-values for the coefficients of female and gastric:female that we can eliminate female from the model entirely. WRONG! First test the significance of the interaction gastric:female in the presence of the main effect, female. Then examine the model metabol~gastric+female to determine whether female is significant in a model that includes gastric. The final model should be metabol~gastric+female. Create a residual vs. fitted plot and a QQ normal plot for this model. If you see observations that may be influential, double-check using case influence diagnostics.
- In this setting the authors of the Sleuth claim that the model without an intercept is most appropriate, since it "seems somewhat more likely to apply in the broader range." See p. 309. Fit a model without an intercept: metabol~-1+gastric+female.
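The sequence in the last two steps (test the interaction first, then the main effect, then fit the no-intercept model) might look like the sketch below. The data are simulated placeholders rather than the case study data, and the object names intmod, addmod, and nointmod are made up for this illustration.

```r
# Simulated stand-in for the lab data; replace with the real data frame.
set.seed(11)
data1 <- data.frame(
  gastric = c(runif(30, 0.8, 3.0), 4.5, 5.2),
  female  = rep(c(1, 0), times = c(18, 14)),
  alcohol = rep(c(0, 0, 0, 1), times = 8)
)
data1$metabol <- with(data1, 1.5 * gastric - 0.7 * gastric * female) +
  rnorm(32, sd = 0.8)
d30 <- data1[-c(31, 32), ]   # cases 31 and 32 removed

intmod   <- lm(metabol ~ gastric * female, data = d30)        # with interaction
addmod   <- lm(metabol ~ gastric + female, data = d30)        # main effects only
nointmod <- lm(metabol ~ -1 + gastric + female, data = d30)   # no intercept

# 1. Interaction gastric:female, with both main effects kept in the model
print(anova(addmod, intmod))

# 2. Is female significant in a model that already includes gastric?
print(summary(addmod)$coefficients)

# 3. No-intercept fit; comparing it with addmod tests whether the intercept is zero
print(summary(nointmod)$coefficients)
print(anova(nointmod, addmod))

# Residual checks for the additive model
op <- par(mfrow = c(1, 2))
plot(fitted(addmod), resid(addmod),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(resid(addmod))
qqline(resid(addmod))
par(op)
```

One caution when reading the output: for a model without an intercept, summary() computes R-squared from the uncentered total sum of squares, so it should not be compared directly with the R-squared of a model that has an intercept.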
- Now for fun, let's go back to the Palm Beach data, which give the numbers of votes for Buchanan and Bush in 67 counties in Florida. We will produce case influence diagnostics for the model buchanan2000~bush2000.
The command is: diagplot.fun(votes$bush2000,votes$buchanan2000)
Explain what each of the case influence diagnostics mean for Palm Beach in terms of leverage and influence.