STA 242/ENV 255: March 25, 1998

Takehome 2

Assignment: Due Tuesday, April 7

We will continue studying the relationship between mortality and various pollution indices (source: McDonald, G.C. and Schwing, R.C. (1973), 'Instabilities of regression estimates relating air pollution to mortality', Technometrics, vol. 15, pp. 463-482).

Variables in order:

  1. PREC Average annual precipitation in inches
  2. JANT Average January temperature in degrees F
  3. JULT Average July temperature in degrees F
  4. OVR65 % of 1960 SMSA population aged 65 or older
  5. POPN Average household size
  6. EDUC Median school years completed by those over 22
  7. HOUS % of housing units which are sound & with all facilities
  8. DENS Population per sq. mile in urbanized areas, 1960
  9. NONW % non-white population in urbanized areas, 1960
  10. WWDRK % employed in white collar occupations
  11. POOR % of families with income < $3000
  12. HC Relative hydrocarbon pollution potential
  13. NOX Relative nitric oxides pollution potential
  14. SO2 Relative sulphur dioxide pollution potential
  15. HUMID Annual average % relative humidity at 1pm
  16. MORT Total age-adjusted mortality rate per 100,000

MORT is the response variable. We are particularly interested in whether mortality is related to the pollution variables HC, NOX, and SO2 after adjusting for the other variables. The data cover 60 US cities.

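A minimal Python sketch of the starting point for question 1, assuming the data have been read into a 60 x 16 numeric array whose columns follow the order listed above (MORT last). The column-name list and the helpers `prepare` and `ols` are hypothetical, not part of the assignment. The sketch fits MORT on all 15 predictors by ordinary least squares, optionally after deleting cases by their 1-based case numbers and/or taking logs of HC, NOX, and SO2, and reports a t-statistic for each coefficient, i.e. the evidence for each pollution variable after adjusting for the others.

```python
import numpy as np

# Hypothetical column names, in the order listed in the handout (MORT last).
COLUMNS = ["PREC", "JANT", "JULT", "OVR65", "POPN", "EDUC", "HOUS", "DENS",
           "NONW", "WWDRK", "POOR", "HC", "NOX", "SO2", "HUMID", "MORT"]
POLLUTION = ["HC", "NOX", "SO2"]


def prepare(data, drop_cases=(), log_pollution=False):
    """Split a cities x 16 array laid out as COLUMNS into (X, y, predictor names).

    drop_cases uses 1-based case numbers, as in the handout (e.g. 29, 47, 48, 49).
    """
    data = np.asarray(data, dtype=float)
    keep = np.ones(data.shape[0], dtype=bool)
    for case in drop_cases:
        keep[case - 1] = False           # convert 1-based case number to row index
    data = data[keep]
    names = COLUMNS[:-1]
    X = data[:, :-1].copy()
    if log_pollution:
        for name in POLLUTION:
            j = names.index(name)
            X[:, j] = np.log(X[:, j])    # assumes strictly positive pollution values
    y = data[:, -1]
    return X, y, names


def ols(X, y):
    """OLS with an intercept; returns coefficients, standard errors and t-statistics."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])  # intercept column first
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - (p + 1))  # residual variance on n - (p + 1) df
    cov = sigma2 * np.linalg.inv(A.T @ A)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se
```

For example, `ols(*prepare(data, drop_cases=(29, 47, 48, 49), log_pollution=True)[:2])` refits without the four unusual cities and with logged pollution. Comparing the HC, NOX, and SO2 t-statistics across such fits is one simple way to see how the conclusions shift.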
  1. Continue exploring the relationship between pollution, mortality and the other variables. Scatter plots of the three pollution variables show that cases 29, 47, 48, and 49 have a pollution pattern different from that of the other cities. Case 29 is Los Angeles and case 47 is San Francisco. Delete these four cases and repeat the variable selection procedures on the remaining 56 cities. How do your conclusions change? Check for any outliers or influential cases. Investigate whether taking logs of the pollution variables, which should reduce the influence of Los Angeles and San Francisco, changes your conclusions. Construct models that make sense on scientific as well as statistical grounds.
  2. Based on your analyses above, write a typed summary (one page maximum!) at the level of a government policy analyst, describing your findings and suggesting appropriate model(s). If pollution levels were decreased by 10%, how would this affect mortality? Explain in the context of particular cases, and report a prediction interval for future mortality.

    NOTE: Summarize your output and turn in only what you need to support your answer. We will not search through pages of output in an appendix to find your answer. Your summary should stand on its own; the reader should not have to sift through pages of output to understand your results. Include all important numbers in the text!
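
For the influence check in question 1 and the 10% scenario in question 2, here is a second minimal sketch, assuming numpy and scipy are available; `design`, `influence`, `prediction_interval` and `reduce_pollution` are hypothetical helper names. It computes leverage h_ii and Cook's distance D_i = e_i^2 h_ii / (k s^2 (1 - h_ii)^2) for each city, where k counts the coefficients including the intercept, and the standard OLS prediction interval yhat0 ± t(n-k) s sqrt(1 + x0'(X'X)^(-1) x0) for a new response at predictor values x0.

```python
import numpy as np
from scipy import stats


def design(X):
    """Prepend an intercept column to the predictor matrix."""
    return np.column_stack([np.ones(len(X)), X])


def influence(X, y):
    """Leverage h_ii and Cook's distance D_i for an OLS fit with intercept."""
    A = design(X)
    n, k = A.shape
    H = A @ np.linalg.inv(A.T @ A) @ A.T   # hat matrix
    h = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - k)
    cooks = resid**2 / (k * s2) * h / (1 - h) ** 2
    return h, cooks


def prediction_interval(X, y, x0, level=0.95):
    """Point prediction and prediction interval for a new response at predictors x0."""
    A = design(X)
    n, k = A.shape
    XtX_inv = np.linalg.inv(A.T @ A)
    beta = XtX_inv @ A.T @ y
    resid = y - A @ beta
    s = np.sqrt(resid @ resid / (n - k))
    a0 = np.concatenate([[1.0], x0])
    fit = a0 @ beta
    se_pred = s * np.sqrt(1.0 + a0 @ XtX_inv @ a0)  # new-observation error included
    tcrit = stats.t.ppf(0.5 + level / 2, df=n - k)
    return fit, fit - tcrit * se_pred, fit + tcrit * se_pred


def reduce_pollution(x, cols, factor=0.9, logged=False):
    """Copy of predictor vector x with columns `cols` cut to `factor` of their level.

    If those columns hold logs, the cut adds log(factor) instead of scaling.
    """
    x = np.array(x, dtype=float)
    for j in cols:
        x[j] = (x[j] + np.log(factor)) if logged else (x[j] * factor)
    return x
```

To apply the scenario to a particular city, take that city's row of predictors, cut its pollution columns (indices 11, 12 and 13 in the order HC, NOX, SO2 above) with `reduce_pollution`, and pass the result to `prediction_interval`. If the model uses logged pollution, a 10% cut shifts each logged value by log(0.9) ≈ -0.105, so the predicted change in MORT is about -0.105 times the sum of the three log-pollution coefficients.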