Due Friday, October 20 (but I suggest doing it by the 18th)
Remember: do not include all of your S-Plus output. Include only the
most relevant graphs and tables, and be sure to discuss them in your write-up.
- Get the 1978 and 1979 Old Faithful geyser eruption data (note that
they are in the same file).
- For the 1978 data, fit a simple linear regression to predict the
time to the next eruption based on the duration of the current
eruption. Does there appear to be a useful predictive relationship?
- Find a 95% prediction interval for the waiting time to the next
eruption when the current eruption has just lasted 2.5 minutes.
- Comment on any features of the data (from plots) or fitted
residuals from the 1978 model that might be relevant in exploring ways
to improve these predictions.
- Fit a similar model to the 1979 Old Faithful geyser eruption data.
Does the analysis suggest any apparent differences between the two
years in the relationship between x and y?
- For the mercury in fish data, we are again interested in comparing
the various measurement stations. This time, use the numerical values
of mercury as the response, and the rest of the variables as predictors.
- Ignoring the station number (i.e., using only river, length, and
weight), find a regression model that you are happy with for predicting mercury
concentration. You may want to consider transformations and model
selection, and don't forget to look at residuals. Discuss your
conclusions from this model.
- Fit your model separately for each river. Does there appear to
be a difference between the two rivers?
- Now include the station number as a factor variable (on whichever
model you feel is most appropriate from before). What are your
conclusions about the different stations?
- A concentration over 1 part per million is considered unsafe for
human consumption. In light of this, what recommendations can you
make for fish caught from these rivers?
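
For the prediction-interval part of the Old Faithful question, it may help to see the arithmetic spelled out. The sketch below is in Python purely for illustration (the assignment itself is meant to be done in S-Plus) and assumes the usual simple linear regression model with independent, constant-variance normal errors. The function names `fit_simple_regression` and `prediction_interval` are hypothetical, the numbers are made up and are NOT the Old Faithful data, and the t critical value is passed in by hand from a table rather than computed.

```python
import math


def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x.

    Returns the quantities needed later for a prediction interval.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual standard error on n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1, "s": s}


def prediction_interval(fit, x0, t_crit):
    """Interval for a single new response at x0.

    t_crit is the upper 2.5% point of the t distribution with n - 2
    degrees of freedom (for a 95% interval), looked up by the caller.
    """
    y_hat = fit["b0"] + fit["b1"] * x0
    # The leading "1 +" accounts for the variability of the new observation
    # itself; dropping it would give a confidence interval for the mean.
    half_width = t_crit * fit["s"] * math.sqrt(
        1 + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return y_hat - half_width, y_hat + half_width


# Made-up durations (minutes) and waiting times (minutes), for illustration only.
durations = [1.8, 2.0, 2.3, 3.1, 3.5, 3.9, 4.1, 4.3, 4.5, 4.7]
waits = [54, 57, 59, 68, 72, 77, 79, 82, 83, 86]

fit = fit_simple_regression(durations, waits)
# With n = 10 there are 8 degrees of freedom; the tabled value is about 2.306.
low, high = prediction_interval(fit, x0=2.5, t_crit=2.306)
print(f"95% prediction interval at 2.5 minutes: ({low:.1f}, {high:.1f})")
```

Note that the interval is for one future waiting time, not for the average waiting time, so it stays fairly wide even with a lot of data. It is also only as trustworthy as the model assumptions, which is why the residual plots asked for above matter.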