Homework 4

Due Tuesday, October 9

Remember: do not include all of your S-Plus output. Include only the most relevant graphs and tables, and be sure to discuss them in your write-up.

  1. Get the 1978 and 1979 Old Faithful geyser eruption data (note that they are in the same file).
    (a)
    For the 1978 data, fit a simple linear regression to predict the time to the next eruption based on the duration of the current eruption. Does there appear to be a useful predictive relationship?
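    As a rough illustration of the quantities worth reporting for a fit like this (slope, intercept, R-squared, the standard error of the slope, and the residuals), here is a minimal sketch in Python rather than S-Plus. The function name simple_linear_regression and the variable names are hypothetical, and the six data values are made-up placeholders, not the Old Faithful measurements.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rss = sum(r ** 2 for r in residuals)

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - rss / syy,
        # Standard error of the slope, using n - 2 residual degrees of freedom.
        "slope_se": sqrt(rss / (n - 2) / sxx),
        "residuals": residuals,
    }


# Made-up placeholder values (minutes), NOT the real geyser data.
duration = [1.8, 2.0, 3.4, 3.9, 4.1, 4.6]
interval = [54, 57, 71, 78, 80, 85]

fit = simple_linear_regression(duration, interval)
print("slope:", round(fit["slope"], 3))
print("intercept:", round(fit["intercept"], 3))
print("R-squared:", round(fit["r_squared"], 3))
```

    Plotting the returned residuals against duration is one way to look for structure that the straight-line fit misses.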
    (b)
    Comment on any features of the data (from plots) or fitted residuals from the 1978 model that might be relevant in exploring ways to improve these predictions.
    (c)
    Fit a similar model to the 1979 Old Faithful geyser eruption data. Does the analysis suggest any apparent differences between the two years in the relationship between eruption duration (x) and time to the next eruption (y)?

  2. One way to measure concentrations of trace elements is to measure accumulation in animals. For example, the fish data contains information on mercury concentrations in fish in North Carolina rivers.
    (a)
    Suppose we are only interested in comparing the proportion of fish with mercury concentration greater than 1.0 ppm by location (measuring station). For part (a), we will only use columns 2 and 5. Analyze the data as follows:
    • Read in the data and create a dummy variable for mercury concentration (> 1.0 ppm vs. <= 1.0 ppm).
    • Create a vector of total counts by station, and a vector of counts of fish with large mercury concentration by station (these are your n and y binomial data; you may want to use the tabulate command for this).
    • Create an error.bar plot and comment on it.
    • Analyze the posterior for the rankings and describe your findings. (For example, use 1000 simulations from the posterior, plot 95% credible intervals for the posterior ranks for all stations, and then examine the posteriors for stations 5 and 12 in particular.)
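
    The steps above might be sketched as follows, in Python rather than S-Plus. This is only one possible approach: it assumes an independent uniform Beta(1, 1) prior on each station's proportion (the assignment does not specify a prior), so that each posterior is Beta(1 + y, 1 + n - y). The function names station_counts, posterior_rank_draws, and rank_interval are hypothetical, and the nine observations are made-up placeholders, not the fish data. Rank 1 here means the lowest proportion of high-mercury fish.

```python
import random


def station_counts(stations, mercury, threshold=1.0):
    """Return sorted station ids, totals n, and counts y above the threshold."""
    ids = sorted(set(stations))
    n = {s: 0 for s in ids}
    y = {s: 0 for s in ids}
    for s, m in zip(stations, mercury):
        n[s] += 1
        y[s] += int(m > threshold)  # dummy variable: 1 if above threshold
    return ids, [n[s] for s in ids], [y[s] for s in ids]


def posterior_rank_draws(n, y, n_sims=1000, seed=0):
    """Simulate station ranks from independent Beta posteriors.

    Assumes a uniform Beta(1, 1) prior on each proportion (an assumption,
    not something the assignment specifies).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        theta = [rng.betavariate(1 + yi, 1 + ni - yi) for ni, yi in zip(n, y)]
        order = sorted(range(len(theta)), key=lambda j: theta[j])
        ranks = [0] * len(theta)
        for r, j in enumerate(order, start=1):
            ranks[j] = r  # rank 1 = smallest simulated proportion
        draws.append(ranks)
    return draws


def rank_interval(draws, j, level=0.95):
    """Approximate central credible interval for the rank of station index j."""
    values = sorted(d[j] for d in draws)
    lo = values[int((1 - level) / 2 * (len(values) - 1))]
    hi = values[int((1 + level) / 2 * (len(values) - 1))]
    return lo, hi


# Made-up placeholder observations, NOT the real fish data.
stations = [1, 1, 1, 2, 2, 2, 3, 3, 3]
mercury = [0.3, 0.5, 1.2, 1.4, 1.6, 0.9, 0.2, 0.4, 0.6]

ids, n, y = station_counts(stations, mercury)
draws = posterior_rank_draws(n, y, n_sims=1000)
for j, s in enumerate(ids):
    print("station", s, "95% rank interval:", rank_interval(draws, j))
```

    Wide, overlapping rank intervals indicate that the data cannot reliably order the stations, which is worth keeping in mind when examining individual stations.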
    (b)
    Now we are interested in fully comparing the various measurement stations. This time, use the numerical values of mercury as the response, and the rest of the variables as predictors.
    (i)
    Ignoring the station number (i.e., using only river, length, and weight), find a regression model that you are happy with for predicting mercury concentration. You may want to consider transformations and model selection, and don't forget to look at residuals. Discuss your conclusions from this model.
    (ii)
    Fit your model separately for each river. Does there appear to be a difference between the two rivers?
    (iii)
    Now include the station number as a factor variable (in whichever of your earlier models you feel is most appropriate). What are your conclusions about the different stations?
    (iv)
    A concentration over 1 part per million is considered unsafe for human consumption. In light of this, what recommendations can you make for fish caught from these rivers?