Late homework will not be accepted.
Assignments must be typed and stapled, or they will not be graded. Mathematical symbols and calculations may be neatly written out by hand, or you can use an equation editor.
You may discuss the problems in HW1 with colleagues, but everything you turn in must be your own.
(70 points) Burning fuels produce air pollutants that return to earth as acid rain, affecting wildlife in lakes and streams. Declines in aquatic life have occurred downwind of industrial regions in parts of Europe, the Rocky Mountains, and the northeastern U.S.
The following table lists data from 15 tributaries of Millers River in north-central Massachusetts. For each stream, we know the average pH and the number of fish species observed during the summer of 1983. pH measures acidity; the more acidic the stream, the lower its pH. Pure water has a pH of 7.0; vinegar is about 3.0.
The data in acidfish.txt are taken from D. Halliwell (Committee on Monitoring and Assessment Trends in Acid Deposition, 1986) and consist of:
We wish to explore the relationship between acidity and fish species.
Start by creating a directory in your Z: drive specifically for this dataset and homework problem, say HW1.acidfish.
Use Splus to calculate the linear correlation coefficient, r. A range of pH 6.5 to pH 8.2 is optimal for most organisms. Suppose an EPA scientist wishes to rescale the data relative to a lower bound for pH. If 6.5 is subtracted from each of the average summer pH values, will the correlation coefficient change? Why or why not?
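If you want to check your reasoning numerically before writing your explanation, a minimal sketch along the following lines may help. It is written in Python rather than Splus purely for illustration, the helper name pearson_r is hypothetical, and the five data pairs are made-up placeholders, not the values in acidfish.txt.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample linear correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Placeholder values for illustration only (NOT the acidfish.txt data).
ph = [6.0, 6.4, 6.8, 7.1, 7.5]
species = [4, 7, 9, 8, 12]

r_raw = pearson_r(ph, species)
# Rescale pH relative to the lower bound of 6.5, as the EPA scientist proposes.
r_shifted = pearson_r([p - 6.5 for p in ph], species)

print("r with raw pH:    ", r_raw)
print("r with pH - 6.5:  ", r_shifted)
```

Comparing the two printed values shows what happens on one small example. Your written answer should still explain why, using the formula for r.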
Produce a regression line plot of fish species (Y) versus average pH (X). For this homework, all plots go on a single page.
Use Splus to regress fish species (Y) on average pH (X). Write out the fitted regression equation in the format of the box before Section 7.4 (bottom of page 185). In one sentence each, interpret the slope, the intercept, and R^2. (Look under "Case Studies", "Summary of Statistical Findings", and "Scope of Inference" for examples of interpretations.)
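The quantities Splus reports for a simple linear regression can be reproduced from the usual least-squares formulas. The sketch below is one way to do that in Python, offered only as an illustration. The function name fit_line is hypothetical and the data are made-up placeholders, not the acidfish.txt values.

```python
def fit_line(x, y):
    """Least-squares fit of y on x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SSE/SST: the share of variation in y explained by the line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - sse / sst

# Placeholder values for illustration only (NOT the acidfish.txt data).
ph = [6.0, 6.4, 6.8, 7.1, 7.5]
species = [4, 7, 9, 8, 12]

b0, b1, r2 = fit_line(ph, species)
print(f"fitted species = {b0:.3f} + {b1:.3f} * pH,  R^2 = {r2:.3f}")
```

Your Splus output for the real data should agree with a hand calculation of this kind up to rounding.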
Test whether there is evidence of a linear association between number of fish species and average summer pH. Use alpha=0.05. Write out hypotheses, rejection region, and test statistic. Also, give the one-sided and two-sided p-values and comment on which is more appropriate for this example.
Construct by hand a 95% confidence interval for the slope. How does this confirm the two-sided test result of the previous problem?
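The hand calculation follows the pattern estimate plus or minus (t critical value) times (standard error of the slope), with n - 2 degrees of freedom. The Python sketch below lays out the arithmetic so you can check your steps. The function name slope_ci is hypothetical, the data are made-up placeholders rather than the acidfish.txt values, and the critical value must be looked up in a t table by you.

```python
from math import sqrt

def slope_ci(x, y, t_crit):
    """Confidence interval for the slope: b1 +/- t_crit * SE(b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = sqrt(sse / (n - 2))   # residual standard deviation, n - 2 df
    se_b1 = s / sqrt(sxx)     # standard error of the slope
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1

# Placeholder values for illustration only (NOT the acidfish.txt data).
ph = [6.0, 6.4, 6.8, 7.1, 7.5]
species = [4, 7, 9, 8, 12]

# With 5 placeholder points, df = 3 and the 97.5th percentile of t is about 3.182.
# For the 15 streams in the homework, df = 13 and the table value is about 2.160.
lower, upper = slope_ci(ph, species, t_crit=3.182)
print(f"95% CI for slope: ({lower:.3f}, {upper:.3f})")
```

Whether the interval contains zero is the link to the two-sided test in the previous problem.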
Check the assumptions on the residuals of the regression by creating two plots: a plot of residuals vs. fitted values, and a normal Q-Q plot of the residuals. Write out the assumptions and discuss how well they are met in this example. For this homework, all plots go on a single page.
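Both plots are built from a few simple quantities: the fitted values, the residuals, and the standard normal quantiles that the sorted residuals are compared against. The Python sketch below computes those quantities without drawing anything. The function names are hypothetical, the plotting positions (i - 0.5)/n are one common convention that may differ from what Splus uses, and the data are made-up placeholders, not the acidfish.txt values.

```python
from statistics import NormalDist

def fitted_and_residuals(x, y):
    """Return (fitted values, residuals) from a least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    return fitted, resid

def normal_quantiles(n):
    """Standard normal quantiles at plotting positions (i - 0.5) / n."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Placeholder values for illustration only (NOT the acidfish.txt data).
ph = [6.0, 6.4, 6.8, 7.1, 7.5]
species = [4, 7, 9, 8, 12]

fitted, resid = fitted_and_residuals(ph, species)
quantiles = normal_quantiles(len(resid))

# Pairs for the residuals vs. fitted plot.
for f, e in zip(fitted, resid):
    print(f"fitted = {f:.3f}, residual = {e:.3f}")

# Pairs for the normal Q-Q plot: sorted residuals against normal quantiles.
for q, e in zip(quantiles, sorted(resid)):
    print(f"normal quantile = {q:.3f}, sorted residual = {e:.3f}")
```

In the first plot, look for curvature or changing spread. In the second, look for departures from a straight line.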
Which streams correspond to the two observations with the lowest pH and the smallest number of species? Repeat the regression without these two observations and create a second regression line plot. Discuss how these cases affect your results. Should they be left in or eliminated when drawing final conclusions about fish and pH? The second regression line plot goes on the single page of plots for this homework.
Does the correlation coefficient give a fair estimate of the strength of the association between number of species and pH? Why or why not?