Sta 240 / Env 298.01

Homework 7

Wednesday, November 8th at 5pm

    To give you more time to work on your projects, this homework is limited to one problem.

    Extra credit is below.

  1. Some red spruce forests in the Appalachian Mountains show signs of decline, with many dead or dying trees. Environmental stress may contribute to this decline; deposition of airborne pollutants such as metals or acids tends to be heavier at higher elevations, where red spruce predominate. The dataset, spruce.txt, contains data on elevation and the percentage of dead or badly damaged trees, from 64 Appalachian sites (Johnson and Siccama, reported by the Committee on Monitoring and Assessment of Trends in Acid Deposition, 1986). Eight of the sites are in southern states (West Virginia, Virginia and North Carolina); the remainder are northern (New Hampshire, Vermont and New York).

    Dataset: "spruce.txt" describing elevation and percentage dead or damaged red spruce trees

    a. Regress percentage damaged (Y) on location (X1) and elevation (X2). Write out the fitted regression equation, and interpret the values of the Y-intercept, both regression coefficients, and R².

    b. At alpha = 0.01, which of these null hypotheses can we reject? Describe what each of these tests means: give the equations for the models being compared, explain the meaning of the test, and provide the p-value for each test.

      i. H0: beta0 = 0
      ii. H0: beta1 = 0
      iii. H0: beta2 = 0
      iv. H0: beta1 = beta2 = 0

    c. Draw a scatterplot of Y vs. X2, and show the two regression lines derived from your analysis in part (a): one for the South (X1=0) and one for the North (X1=1).

    d. Test whether the parallel regression lines model in (a) is significantly different from a single regression line model.

    e. Comment on the fit of the regression in (a) using a plot of residuals vs. fitted values and a normal probability plot of the residuals.

    f. The regression in (a) assumes that the relationship between elevation and percentage damaged is the same for southern and northern sites. Do the data support this assumption? Draw a scatterplot of Y versus X2, using different symbols for southern and northern sites, and describe what you see.

    g. To allow for the possibility of interaction (the elevation/damage relationship changing with location), we can redo the regression as a separate regression lines model. That is, each region is represented by its own slope and its own intercept.

      i. Generate a slope dummy variable X1X2, and regress Y on X1 (location), X2 (elevation) and X1X2 (location times elevation). Write out the regression equation.
      ii. What are the implications of the regression equation for southern and northern forests (substitute X1=0 and X1=1)? Write out the regression equation in each case and explain.
      iii. How does the model with an interaction effect compare with the model in (a), as measured by R²?
      iv. Produce a plot of residuals vs. fitted values and a normal probability plot of the residuals, and comment.
      v. Assess the significance of the coefficient on X1X2. Give the results of a relevant hypothesis test and carefully explain what this test means.
      vi. Assess the significance of the coefficients on both X1 and X1X2 using an F-test. Give the results of a relevant hypothesis test and carefully explain what this test means.
      vii. Run separate regressions of damage on elevation for southern and northern sites, and confirm that the equations from these regressions match those derived from your slope and intercept dummy variable regression in (g) part (i). What does the regression in (g) part (i) tell us that separate North and South regressions do not?
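
As a rough illustration of the separate regression lines model (not a solution to the problem, and written in Python rather than S-Plus), the sketch below fits Y on X1, X2 and X1X2 to a handful of made-up sites. It then compares the equations obtained by substituting X1=0 and X1=1 with two separate within-region fits. The data values, the elevation units, and the helper names `solve` and `ols` are all assumptions for illustration; none of them come from spruce.txt or the course materials.

```python
# Illustrative sketch only: the numbers below are made up, NOT the spruce.txt data.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical sites: (X1 location dummy, X2 elevation in arbitrary units, Y percent damaged)
sites = [
    (0, 15.0, 10.0), (0, 17.0, 12.0), (0, 19.0, 19.0), (0, 20.0, 21.0),
    (1, 8.0, 15.0), (1, 9.0, 28.0), (1, 10.0, 35.0), (1, 11.0, 52.0), (1, 12.0, 60.0),
]

# Interaction model: Y = b0 + b1*X1 + b2*X2 + b3*X1*X2
b0, b1, b2, b3 = ols(
    [[1.0, x1, x2, x1 * x2] for x1, x2, _ in sites],
    [y for _, _, y in sites],
)

# Separate simple regressions of Y on X2 within each region
south = ols([[1.0, x2] for x1, x2, _ in sites if x1 == 0],
            [y for x1, _, y in sites if x1 == 0])
north = ols([[1.0, x2] for x1, x2, _ in sites if x1 == 1],
            [y for x1, _, y in sites if x1 == 1])

print(f"South (X1=0): Y = {b0:.3f} + {b2:.3f} X2")
print(f"North (X1=1): Y = {b0 + b1:.3f} + {b2 + b3:.3f} X2")
print("Separate fits (intercept, slope):", south, north)
```

The normal equations are solved by hand here only to keep the sketch free of third-party libraries; for the actual assignment, a statistics package's regression routine would also report the standard errors, R² and test statistics that the questions ask for.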

      Splus Hints

      Splus Hints for graphs (d) and (f)

      5 point extra credit problem