Sta 242 / Env 255 Lab 2
Burning fuels produce air pollutants that return to earth as acid rain, affecting wildlife in lakes and streams. Declines in aquatic life have occurred downwind of industrial regions in parts of Europe, the Rocky Mountains, and the northeastern U.S.
The data come from 15 tributaries of Millers River in north-central Massachusetts. For each stream, we know the average pH and the number of fish species observed during the summer of 1983. pH measures acidity; the more acidic the stream, the lower its pH. Pure water has a pH of 7.0; vinegar is about 3.0.
The data in acidfish.txt are taken from D. Halliwell (Committee on Monitoring and Assessment Trends in Acid Deposition, 1986) and consist of:
We wish to explore the relationship between acidity and fish species.
Calculate the linear correlation coefficient, r. See the Splus directions. A range of pH 6.5 to pH 8.2 is optimal for most organisms. Suppose an EPA scientist wishes to rescale the data relative to a lower bound for pH. If 6.5 is subtracted from each of the average summer pH values, will the correlation coefficient change? Why or why not?
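One way to check your reasoning about rescaling is to compute r before and after the shift. The sketch below is in Python rather than Splus, and the pH and species values in it are made-up placeholders, not the acidfish.txt data; the function name pearson_r is also just an illustrative choice.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up placeholder values, NOT the acidfish.txt data.
ph = [5.5, 6.0, 6.4, 6.8, 7.0, 7.3, 7.6]
species = [2, 4, 7, 9, 10, 12, 13]

r_raw = pearson_r(ph, species)
# Rescale pH relative to the lower bound of 6.5.
r_shifted = pearson_r([p - 6.5 for p in ph], species)
print(r_raw, r_shifted)
```

Compare the two printed values, then explain the result using the formula for r: the deviations from the mean are what enter the calculation.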
Does the correlation coefficient give a fair estimate of the strength of the association between number of species and pH? Why or why not? One issue to think about is "ecological correlation," which is summarized here. Look halfway down the web page under the heading "Ecological Correlation" and note the applet you can use to choose variables and look at different datasets.
Under what conditions would it be more appropriate to use the median or minimum of pH values as the x-variable?
Produce a regression line plot of fish species (Y) versus average pH (X).
Regress fish species (Y) on average pH (X). Write out the fitted regression equation in the format shown at the bottom of page 185. In one sentence each, interpret the slope, the intercept, and R^2.
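If you want to verify the Splus output by hand, the least-squares slope, intercept, and R^2 can be computed directly from sums of squares. This is a minimal sketch in Python under stated assumptions: the helper name fit_line is hypothetical and the toy numbers are placeholders, not the stream data.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = my - b1 * mx            # intercept
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # residual SS
    sst = sum((b - my) ** 2 for b in y)                        # total SS
    return b0, b1, 1 - sse / sst

# Toy placeholder values, NOT the acidfish.txt data.
toy_x = [0, 1, 2, 3]
toy_y = [1, 3, 4, 8]

b0, b1, r2 = fit_line(toy_x, toy_y)
print(f"fitted line: Y = {b0:.3f} + {b1:.3f} X, R^2 = {r2:.3f}")
```

With the real data, substitute the 15 pH values for toy_x and the species counts for toy_y, then compare against the regression summary.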
Test whether there is evidence of a linear association between number of fish species and average summer pH. Use alpha=0.05. Write out hypotheses, rejection region, and test statistic. Also, give the one-sided and two-sided p-values and comment on which is more appropriate for this example.
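The test statistic for the slope is the estimated slope divided by its standard error, referred to a t distribution with n - 2 degrees of freedom. The sketch below (Python, with a hypothetical helper slope_t_statistic and toy placeholder data) computes the statistic; the critical value is taken from a t table rather than computed. For the lab's 15 streams the degrees of freedom are 13, for which the tabled two-sided 0.05 critical value is 2.160.

```python
import math

def slope_t_statistic(x, y):
    """Return (t, df) for testing H0: slope = 0 in simple linear regression."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # residual standard deviation
    se_b1 = s / math.sqrt(sxx)          # standard error of the slope
    return b1 / se_b1, n - 2

# Toy placeholder values, NOT the acidfish.txt data.
toy_x = [0, 1, 2, 3]
toy_y = [1, 3, 4, 8]

t_stat, df = slope_t_statistic(toy_x, toy_y)
# Tabled two-sided 0.05 critical value for df = 2 (toy data only).
t_crit = 4.303
print(t_stat, df, abs(t_stat) > t_crit)
```

The rejection region is |t| greater than the tabled critical value for a two-sided test; a one-sided test uses the 0.95 quantile instead and looks only at the sign predicted by the alternative.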
For any regression, we verify assumptions about the residuals of the regression by creating 2 plots: a plot of residuals vs. fitted values, and a qq normal plot of residuals. How well are they met in this example? Produce these plots by running the regression and using the "Plot" tab on the pop-up window.
Produce a plot of the confidence intervals for the mean response at each level of the x-variable. Under "Graph", "2D Plot", "Fit-Linear Least Squares", specify the x-Column and the y-column. Then click on the "By Conf Bound" tab. Choose "Confidence 0.90" at the bottom. Then choose a line style (other than "None"), color and width. Click on OK. For this homework, all plots go on a single page.
Now compare the CIs for the mean response to prediction intervals, using a function in Splus. See Splus instructions. What is the difference between these two sets of intervals?
Note that neither the CIs for the mean response nor the prediction intervals simultaneously cover 90% of all predicted Y values; instead these plots show many individual CIs. For simultaneous intervals, you need a confidence band for the regression line, based on the Scheffé multiple comparison procedure. The Splus function mentioned above produces such a plot. Create a plot that shows all three sets of bands, as in Display 7.11.
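For reference, the three sets of bands differ only in the multiplier and the standard error used around the fitted value. The notation below is introduced here for illustration and may differ from the textbook's: X_0 is the pH value of interest, \hat{\sigma} is the residual standard deviation, n is the number of streams, and 1 - \alpha is the confidence level (0.90 in this lab).

```latex
\hat{\mu}\{Y \mid X_0\} = \hat{\beta}_0 + \hat{\beta}_1 X_0, \qquad
\mathrm{SE}\bigl[\hat{\mu}\{Y \mid X_0\}\bigr]
  = \hat{\sigma}\sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}

% Pointwise confidence interval for the mean response
\hat{\mu}\{Y \mid X_0\} \pm t_{n-2}(1 - \alpha/2)\,
  \mathrm{SE}\bigl[\hat{\mu}\{Y \mid X_0\}\bigr]

% Prediction interval for a single new observation
\hat{\mu}\{Y \mid X_0\} \pm t_{n-2}(1 - \alpha/2)\,
  \sqrt{\hat{\sigma}^2 + \mathrm{SE}\bigl[\hat{\mu}\{Y \mid X_0\}\bigr]^2}

% Scheffe simultaneous confidence band for the whole regression line
\hat{\mu}\{Y \mid X_0\} \pm \sqrt{2\,F_{2,\,n-2}(1 - \alpha)}\,
  \mathrm{SE}\bigl[\hat{\mu}\{Y \mid X_0\}\bigr]
```

The Scheffé multiplier is larger than the t multiplier at the same confidence level, so the simultaneous band lies outside the pointwise confidence intervals for the mean.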
Construct by hand 90% confidence intervals for the mean number of fish species in streams with pH=6.0 and with pH=8.0. Compare your values with your plot.
Calculate by hand 90% prediction intervals for the number of fish species in a stream with pH=6.0 and with pH=8.0. Compare your values with your plot.
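The hand calculations for both kinds of interval follow the same steps: fitted value, standard error, then fitted value plus or minus a t multiplier times the standard error. The sketch below lays those steps out in Python so you can check your arithmetic. The helper name intervals_at and the toy numbers are assumptions for illustration, not the stream data, and the t multiplier is supplied from a table. For the lab's 15 streams, df = 13 and the tabled 0.95 quantile for a 90% interval is 1.771.

```python
import math

def intervals_at(x, y, x0, t_crit):
    """Return (fit, ci, pi) at x0: the fitted value, the CI for the mean
    response, and the prediction interval for a single new observation."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))                  # residual standard deviation
    fit = b0 + b1 * x0
    se_mean = s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)
    se_pred = math.sqrt(s ** 2 + se_mean ** 2)    # adds new-observation scatter
    ci = (fit - t_crit * se_mean, fit + t_crit * se_mean)
    pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)
    return fit, ci, pi

# Toy placeholder values, NOT the acidfish.txt data.
toy_x = [0, 1, 2, 3]
toy_y = [1, 3, 4, 8]

# 2.920 is the tabled 0.95 quantile of t with df = 2 (toy data only).
fit, ci, pi = intervals_at(toy_x, toy_y, x0=1.5, t_crit=2.920)
print(fit, ci, pi)
```

Both intervals are centered at the same fitted value; the prediction interval is always wider because it includes the variability of an individual stream around the mean.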
What streams correspond to the two observations with the lowest pH and smallest number of species? Repeat the regression without these two values and create a second regression line plot. Discuss how these cases affect your results. Should they be left in or eliminated in drawing final conclusions about fish and pH?