Take-Home Exam 1

Due by 5 pm February 10th

You may work in groups of 2 or 3, but there should be no discussion between groups. This is an exam. If you have any questions, ask the instructor or the TAs.

Problem

Pollution of waterways is one of the most serious problems facing the world today. Billions of dollars have been spent on cleanup, antipollution laws have been passed, and technological innovations have been sought to prevent pollution; still, the world is probably getting more, not less, polluted. Studying pollutant levels in various bodies of water is important for understanding the pollution problem. In particular, predicting future levels from current characteristics is crucial for determining strategies to address pollution. The file pcb.dat contains PCB concentrations (measured in parts per billion) in samples from US bays and estuaries in 1984 and 1985. The three columns are named Bay, pcb84, and pcb85.

Fit a linear (or curvilinear) regression model that uses the 1984 data to predict pcb85, in order to answer the question "Does use of past data help in predicting the next year's level?" Use residual plots to check the assumptions of the model and to look for "outliers" (data points whose residuals are large in absolute value). You may need to consider transforming both variables. Since both variables are measured in the same units, you should use the same transformation on both. If there are outliers, refit the model without these cases and see whether the fitted model changes. For example, Boston Harbor may be such a case: it is known that in 1984 Boston stopped dumping raw sewage into Boston Harbor and opened a sewage treatment plant. If you transform the data, make sure that you interpret the model back in the original units.
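As a rough illustration of the workflow described above (transform, fit, inspect residuals, refit without flagged cases, back-transform), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the numbers are invented and are NOT taken from pcb.dat, the log transformation is only one possible choice (it requires strictly positive values), the helper name fit_loglog is hypothetical, and the "2 residual standard deviations" cutoff is a common rule of thumb rather than a required criterion.

```python
import numpy as np

# Invented values for illustration only; these are NOT the pcb.dat data.
pcb84 = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 40.0, 80.0, 150.0, 300.0, 600.0])
pcb85 = np.array([1.2, 1.8, 5.5, 9.0, 1.0, 45.0, 70.0, 160.0, 280.0, 650.0])

def fit_loglog(x, y):
    """Least-squares fit of log(y) = b0 + b1*log(x).

    Hypothetical helper; assumes all values are strictly positive.
    Returns (b0, b1, residuals on the log scale).
    """
    lx, ly = np.log(x), np.log(y)
    b1, b0 = np.polyfit(lx, ly, 1)  # polyfit returns highest degree first
    resid = ly - (b0 + b1 * lx)
    return b0, b1, resid

# Fit using all cases.
b0, b1, resid = fit_loglog(pcb84, pcb85)

# Residual standard deviation with n - 2 degrees of freedom.
s = np.sqrt(np.sum(resid**2) / (len(resid) - 2))

# Flag cases with large absolute residuals (assumed cutoff: 2 * s).
outlier = np.abs(resid) > 2 * s

# Refit without the flagged cases and compare coefficients.
b0_r, b1_r, _ = fit_loglog(pcb84[~outlier], pcb85[~outlier])

print("all cases:        b0 = %.3f, b1 = %.3f" % (b0, b1))
print("without outliers: b0 = %.3f, b1 = %.3f" % (b0_r, b1_r))

# Back-transform to original units: pcb85 is approximately
# exp(b0) * pcb84**b1, so doubling pcb84 multiplies the predicted
# pcb85 by 2**b1.
print("doubling pcb84 multiplies predicted pcb85 by %.2f" % (2 ** b1_r))
```

On the log-log scale the slope has a multiplicative interpretation in the original units, which is why the last line reports the effect of doubling pcb84. Note that exponentiating a prediction made on the log scale estimates the median, not the mean, of pcb85. In an actual analysis, a plot of residuals against fitted values would accompany the numerical cutoff used here.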

Write a one-page (max!) summary of your regression model(s) that interprets your analysis. In your summary, you may also want to comment on other variables that may be important but were not included in the study, ways to improve upon the analysis, and limitations of the model. Please include any plots or details of the computer analysis in an appendix for reference.