Sta 240 / Env 298.01
Homework 7
Due Wednesday, November 8th at 5pm
To give you more time to work on your projects, this homework is limited to one problem.
Extra credit is below.
Some red spruce forests in the Appalachian Mountains show signs of decline, with many dead or dying trees. Environmental stress may contribute to this decline; deposition of airborne pollutants such as metals or acids tends to be heavier at higher elevations, where red spruce predominate. The dataset, spruce.txt, contains data on elevation and the percentage of dead or badly damaged trees, from 64 Appalachian sites (Johnson and Siccama, reported by the Committee on Monitoring and Assessment of Trends in Acid Deposition, 1986). Eight of the sites are in southern states (West Virginia, Virginia and North Carolina); the remainder are northern (New Hampshire, Vermont and New York).
Dataset: "spruce.txt", describing elevation and percentage of dead or damaged red spruce trees
Regress percentage damaged (Y) on location (X1) and elevation (X2). Write out the fitted regression equation, and interpret the values of the Y-intercept, both regression coefficients, and R^2.
At α = 0.01, which of these null hypotheses can we reject? Describe what each of these tests means: give the equations for the models being compared, explain the meaning of the test, and provide the p-value for each test.
Draw a Y vs. X2 scatterplot, and show the 2 regression lines derived from your analysis in part (a): one for the South (X1=0) and one for the North (X1=1).
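As a reminder of why the two lines are parallel, the model from part (a) can be written out for each region. This is only a sketch of the algebra: the symbols b_0, b_1 and b_2 are notation assumed here for the fitted intercept and coefficients, and their values must come from your own regression output.

```latex
\begin{align*}
\text{Fitted model:} \quad & \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 \\
\text{South } (X_1 = 0): \quad & \hat{Y} = b_0 + b_2 X_2 \\
\text{North } (X_1 = 1): \quad & \hat{Y} = (b_0 + b_1) + b_2 X_2
\end{align*}
```

Both lines share the slope b_2; they differ only in their intercepts, and the vertical gap between them is b_1.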
Test whether the parallel regression lines model in (a) is significantly different from a single regression line model.
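One common way to compare two nested models is a partial F-test based on their residual sums of squares. The sketch below shows the arithmetic only. The function name partial_f and the SSE values are hypothetical placeholders, not results from spruce.txt, and the degrees of freedom assume that all 64 sites are used.

```python
def partial_f(sse_reduced, sse_full, df_reduced, df_full):
    """F statistic for a reduced model nested inside a full model.

    sse_*: residual sums of squares of each fitted model.
    df_*:  residual degrees of freedom (n minus number of fitted coefficients).
    """
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more residual df")
    # Drop in SSE per extra coefficient, relative to the full model's error variance.
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


n = 64                 # number of sites, from the problem statement
df_single = n - 2      # single line: intercept + elevation slope
df_parallel = n - 3    # parallel lines: intercept + location + elevation

# Made-up SSE values, for illustration only (NOT from the spruce data).
f_stat = partial_f(sse_reduced=120.0, sse_full=100.0,
                   df_reduced=df_single, df_full=df_parallel)
print(f_stat)
```

The statistic is compared with an F distribution on (1, 61) degrees of freedom. Because the two models differ by a single coefficient, this F statistic equals the square of the t statistic for the location coefficient in the parallel lines model.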
Comment on the fit of the regression line in (a) using plots of residuals vs. fitted values, as well as a normal probability plot of residuals.
The regression in (a) assumes that the relationship between elevation and percentage damaged is the same for southern and northern sites. Do the data support this assumption? Draw a scatterplot of Y versus X2, using different symbols for southern and northern sites, and describe what you see.
To allow for the possibility of interaction (the elevation/damage relationship changing with location), we can redo the regression as a separate regression lines model. That is, each region is represented by its own slope and its own intercept.
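A separate regression lines model is usually fitted by adding the product X1*X2 as a third predictor. The sketch below illustrates the idea with a small least-squares solver written in plain Python. The helper names (solve_linear_system, fit_ols, separate_lines) and the six data points are hypothetical and made up for illustration; they are not taken from spruce.txt, and in practice a statistics package would do the fitting.

```python
# Model: Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2), with X1 = 0 (South), 1 (North).

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def separate_lines(x1, x2, y):
    """Return (intercept, slope) for each region under the interaction model."""
    rows = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    b0, b1, b2, b3 = fit_ols(rows, y)
    return {"south": (b0, b2), "north": (b0 + b1, b2 + b3)}


# Made-up data: three southern and three northern sites.
x1 = [0, 0, 0, 1, 1, 1]                  # location indicator
x2 = [0.8, 1.0, 1.2, 0.8, 1.0, 1.2]      # elevation, in hypothetical units
y = [26.0, 30.0, 34.0, 37.0, 45.0, 53.0] # percentage damaged (invented)

lines = separate_lines(x1, x2, y)
print(lines)
```

Under this coding, b1 is the difference in intercepts between North and South and b3 is the difference in slopes, so a test of b3 = 0 asks whether the parallel lines model is adequate.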