Due Tuesday, October 9
Remember: do not include all of your S-Plus output. Include only the
most relevant graphs and tables, and be sure to discuss them in your write-up.
- Get the 1978 and 1979 Old Faithful geyser eruption data (note that
both years are in the same file).
- (a) For the 1978 data, fit a simple linear regression to predict the
time to the next eruption based on the duration of the current
eruption. Does there appear to be a useful predictive relationship?
- (b) Comment on any features of the data (from plots) or fitted
residuals from the 1978 model that might be relevant in exploring ways
to improve these predictions.
- (c) Fit a similar model to the 1979 Old Faithful geyser eruption data.
Does the analysis suggest any apparent differences between the two
years in the relationship between eruption duration (x) and time to
the next eruption (y)?
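For parts (a) through (c), the least-squares fit and its residuals can be sketched in a few lines. The snippet below is a minimal illustration written in Python rather than S-Plus; the function name `fit_simple_regression` is hypothetical, and the numbers are made up for the example and are not the Old Faithful data.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with residuals and R^2."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    r_squared = 1 - sum(r ** 2 for r in residuals) / syy
    return {"intercept": intercept, "slope": slope,
            "r_squared": r_squared, "residuals": residuals}

# Made-up illustrative values (NOT the real geyser data):
# x = duration of the current eruption, y = time to the next eruption.
duration = [1.8, 2.0, 3.5, 4.0, 4.5]
wait = [55.0, 52.0, 75.0, 80.0, 83.0]

fit = fit_simple_regression(duration, wait)
print("intercept:", round(fit["intercept"], 2))
print("slope:", round(fit["slope"], 2))
print("R^2:", round(fit["r_squared"], 3))
```

Running the same function once on the 1978 rows and once on the 1979 rows gives two sets of coefficients to compare, and plotting the residuals against duration is one way to look for the features asked about in part (b).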
- One way to measure concentrations of trace elements is to measure
accumulation in animals. For example, the fish data contain
information on mercury concentrations in fish in North Carolina rivers.
- (a) Suppose we are interested only in comparing the proportion of fish with
mercury concentration greater than 1.0 ppm by location (measuring
station). For part (a), use only columns 2 and 5. Analyze the
data as follows:
- Read in the data and create a dummy variable for mercury concentration (> 1.0 ppm vs. <= 1.0 ppm).
- Create a vector of total counts by station, and a vector of counts of fish with large mercury concentration by station (these are your n and y binomial data; you may want to use the tabulate command for this).
- Create an error.bar plot and comment on it.
- Analyze the posterior for the rankings and describe your
findings. (For example, use 1000 simulations from the posterior, plot
95% credible intervals for the posterior ranks for all stations, and
then examine the posteriors for stations 5 and 12 in particular.)
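The steps in part (a) can be sketched as follows. This is a minimal illustration in Python rather than S-Plus, and it rests on assumptions the assignment does not state: independent uniform Beta(1, 1) priors on each station's proportion (so each posterior is Beta(y + 1, n - y + 1)), and rank 1 meaning the lowest proportion. The function names and the counts are made up for the example and are not the fish data.

```python
import random

def counts_by_station(stations, mercury, threshold=1.0):
    """Return station labels, y (fish above threshold), and n (total fish)."""
    n, y = {}, {}
    for s, hg in zip(stations, mercury):
        n[s] = n.get(s, 0) + 1
        y[s] = y.get(s, 0) + (1 if hg > threshold else 0)  # dummy variable
    labels = sorted(n)
    return labels, [y[s] for s in labels], [n[s] for s in labels]

def posterior_rank_intervals(y, n, n_sims=1000, seed=0):
    """Approximate 95% intervals for each station's posterior rank.

    Assumes a uniform prior, so theta_j | y_j ~ Beta(y_j + 1, n_j - y_j + 1).
    Rank 1 is the station with the smallest simulated proportion.
    """
    rng = random.Random(seed)
    k = len(y)
    ranks = [[] for _ in range(k)]
    for _ in range(n_sims):
        theta = [rng.betavariate(yj + 1, nj - yj + 1) for yj, nj in zip(y, n)]
        order = sorted(range(k), key=lambda j: theta[j])
        for rank, j in enumerate(order, start=1):
            ranks[j].append(rank)
    intervals = []
    for draws in ranks:
        draws.sort()
        lo = draws[int(0.025 * n_sims)]      # empirical 2.5% quantile
        hi = draws[int(0.975 * n_sims) - 1]  # empirical 97.5% quantile
        intervals.append((lo, hi))
    return intervals

# Made-up counts for three hypothetical stations (NOT the real fish data).
y_demo = [1, 5, 9]
n_demo = [10, 10, 10]
intervals = posterior_rank_intervals(y_demo, n_demo)
for station, (lo, hi) in enumerate(intervals, start=1):
    print("station", station, "rank interval:", lo, "to", hi)
```

Wide rank intervals indicate stations whose ordering is poorly determined by the data, which is the kind of finding the rankings question asks you to describe.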
- (b) Now we are interested in fully comparing the various
measurement stations. This time, use the numerical values of mercury
as the response, and the rest of the variables as predictors.
- (i) Ignoring the station number (i.e., using only river, length, and
weight), find a regression model that you are happy with for predicting mercury
concentration. You may want to consider transformations and model
selection, and don't forget to look at residuals. Discuss your
conclusions from this model.
- (ii) Fit your model separately for each river. Does there appear to
be a difference between the two rivers?
- (iii) Now include the station number as a factor variable (in whichever
of your earlier models you feel is most appropriate). What are your
conclusions about the different stations?
- (iv) A concentration over 1 part per million is considered unsafe for
human consumption. In light of this, what recommendations can you
make for fish caught from these rivers?