Sta 242 / Env 255
Homework 1
Due Tuesday, January 23, 5pm
Please review the Homework Policies.
For this homework, you should be familiar with Chapter 2, Sections 3.1-3.4, and Chapter 5 (up to the bottom of page 128).
Conceptual exercises in Chapters 2, 3 and 5 are recommended as review. The answers are at the end of each chapter.
Researchers are interested in obtaining intraday information about pollution levels in a specific town. To do so, they take measurements at forty random locations on the same day: twenty measurements are taken at noon in the eastern part of the town and twenty are taken at midnight in the western part. The data, pollutant.txt, consist of the day of measurement, the measurement at noon, and the measurement at midnight.
Note that the measurements are in milligrams of pollutant per cubic meter (mg/m3).
We are interested in making inferences about the mean difference in pollutant levels between Noon and Midnight under two different scenarios.
Draw boxplots of the data at noon and midnight. Comment on the distributions of the two groups, referencing the center, spread, skewness and outliers as necessary. (Review Section 1.5 of Sleuth for concepts here. An easy way to make the plot, once you have selected "Graph" and "2D-Plot" and "Boxplot", is to select both the Noon column and the Midnight column for the y-axis by holding down shift and clicking on each variable. Be sure to label your axes with descriptive labels and appropriate units of measurement.)
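If you prefer the command line to the menus, the same kind of plot can be sketched as follows. The vectors noon and midnight below are made-up placeholder values, not the contents of pollutant.txt; with the real dataset attached you would use the Noon and Midnight columns instead. The code is written in the S language with R in mind, so small differences in Splus are possible.

```r
# Hypothetical placeholder values (mg/m3), NOT the homework data
noon     <- c(4100, 3950, 4300, 4020, 4210, 3890)
midnight <- c(3150, 3300, 3080, 3220, 3400, 3010)

# Side-by-side boxplots with a descriptive axis label and units
boxplot(list(Noon = noon, Midnight = midnight),
        ylab = "Pollutant concentration (mg/m3)",
        main = "Pollutant levels at noon and midnight")
```

Passing a named list gives each box its own label on the x-axis.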
Give a 99% confidence interval for the difference in mean pollutant levels between noon and midnight. Do this by hand. You can use Splus to check your answer.
Do these data provide sufficient evidence to indicate that the mean difference between Noon and Midnight measurements is less than 3 grams per cubic meter? Clearly write out your null and alternative hypotheses and test using a significance level alpha=0.01. Write out the test statistic and the rejection region. In one sentence, give the result of your test.
Give and formally interpret the p-value associated with the previous test.
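One possible way to check the hand calculations for the confidence interval, test statistic, and p-value is sketched below. It assumes the pooled two-sample t-tools for two independent groups and uses made-up placeholder values, not the contents of pollutant.txt. The hypothesized value mu0 is chosen arbitrarily for the illustration; whatever value you test must be expressed in the same units as the data. The code is written in the S language with R in mind, so small differences in Splus are possible.

```r
# Hypothetical placeholder values (mg/m3), NOT the homework data
noon     <- c(4100, 3950, 4300, 4020, 4210, 3890)
midnight <- c(3150, 3300, 3080, 3220, 3400, 3010)

n1 <- length(noon)
n2 <- length(midnight)
df <- n1 + n2 - 2

# Pooled standard deviation and standard error of the difference in means
sp  <- sqrt(((n1 - 1) * var(noon) + (n2 - 1) * var(midnight)) / df)
se  <- sp * sqrt(1 / n1 + 1 / n2)
est <- mean(noon) - mean(midnight)

# 99% confidence interval: estimate +/- t-quantile * standard error
ci <- est + c(-1, 1) * qt(0.995, df) * se

# One-sided test of H0: difference = mu0 against HA: difference < mu0
mu0    <- 1000                   # hypothetical value, same units as the data
tstat  <- (est - mu0) / se
pval   <- pt(tstat, df)          # lower-tail p-value
reject <- tstat < qt(0.01, df)   # rejection region at alpha = 0.01

# Built-in check of the hand computation
chk <- t.test(noon, midnight, var.equal = TRUE, conf.level = 0.99)
```

The interval stored in chk$conf.int should agree with ci; if it does not, recheck the pooled standard deviation and the degrees of freedom first.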
For the remainder of this problem, consider this setting. Say that for each of 20 western locations, a measurement was taken at noon and a measurement was taken at midnight (e.g., on day 1, location 1 gave a measurement of 4239 mg/m3 at noon and a measurement of 3201 mg/m3 at midnight).
This part hints at a future linear regression problem. Produce a plot of the Noon observations on the x-axis and the Midnight observations on the y-axis.
You will do this plot using the "Command Line" in Splus. Let's say your dataset "pollutant" has 3 variables, labeled "DayNum", "Noon", and "Midnight". Enter the following commands:
> attach(pollutant)
> par(pty="s")
> plot(Noon, Midnight, main="Noon and Midnight Values \n for 20 locations",xlab="Noon (mg/m3)", ylab="Midnight (mg/m3)")
The setting pty="s", passed to "par", ensures a square plot. Why is this beneficial for answering this question?
The command "attach" lets you refer to the variables by name at the command line, without prefixing them with the dataset name.
If, at each of the 20 locations, the Noon and Midnight observations were close to each other, what would the plot look like? Give the equation for a line fitting the data in this case.
Sketch a graph in which Midnight measurements are systematically higher than Noon measurements.
Dissolved oxygen analysis measures the amount of gaseous oxygen (O2) dissolved in an aqueous solution. Dissolved oxygen can range from 0-18 parts per million (ppm), but most natural water systems require 5-6 parts per million to support a diverse population. As dissolved oxygen levels in water drop below 5.0 ppm, aquatic life is put under stress.
Ten replicate water samples were taken at each of four locations in a river to determine whether the quantity of dissolved oxygen varied from one location to another (the higher the level of pollution, the lower the dissolved oxygen reading).
Location 1 was adjacent to the wastewater discharge point for a certain industrial plant. Locations 2, 3, and 4 were selected at points 10, 20, and 30 miles downstream from this discharge point. The resulting data appear in water.txt, where Column 1 is the dissolved oxygen concentration in ppm, and Column 2 is the location number.
For this problem, put all plots on one page and label it "Problem 2 Plots".
Make side by side boxplots of the four groups.
Make normal probability plots of the four groups.
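A minimal command-line sketch for placing the boxplots and the four normal probability plots on a single labeled page is given below. The vectors oxygen and location are made-up placeholders, not the contents of water.txt, and the 2 x 3 layout is just one convenient choice. The code is written in the S language with R in mind, so small differences in Splus are possible.

```r
# Hypothetical placeholder data, NOT the contents of water.txt
oxygen   <- c(4.8, 5.2, 5.0, 5.1, 4.9,
              6.0, 6.2, 5.9, 6.1, 5.8,
              6.3, 6.1, 6.4, 6.2, 6.5,
              6.6, 6.5, 6.7, 6.4, 6.8)
location <- rep(1:4, rep(5, 4))   # five placeholder readings per location

par(mfrow = c(2, 3))              # a 2 x 3 grid holds all five plots

# Side-by-side boxplots, one box per location
boxplot(split(oxygen, location),
        xlab = "Location", ylab = "Dissolved oxygen (ppm)")

# One normal probability plot per location
for (k in 1:4) {
  y <- oxygen[location == k]
  qqnorm(y, main = paste("Location", k), ylab = "Dissolved oxygen (ppm)")
  qqline(y)
}

# Page title written into the top margin of the page
mtext("Problem 2 Plots", outer = TRUE, line = -1.5)
```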
Perform a one-way ANOVA to determine whether there is sufficient evidence that values for mean dissolved oxygen content differ significantly among the four locations.
Write out the null and alternative hypotheses using statistical notation.
Construct an analysis of variance table for the data analogous to Display 5.9 on p. 121.
Summarize the result of the test in a couple of sentences, giving the F-statistic, its p-value and your interpretation of the results.
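The analysis of variance table can be obtained at the command line roughly as sketched below. The data are made-up placeholders, not the contents of water.txt, and the object names are hypothetical. The code is written in the S language with R in mind, so small differences in Splus are possible.

```r
# Hypothetical placeholder data, NOT the contents of water.txt
oxygen   <- c(4.8, 5.2, 5.0, 5.1, 4.9,
              6.0, 6.2, 5.9, 6.1, 5.8,
              6.3, 6.1, 6.4, 6.2, 6.5,
              6.6, 6.5, 6.7, 6.4, 6.8)

# The grouping variable must be a factor; a numeric location code
# would be fitted as a straight-line trend instead of four group means
location <- factor(rep(1:4, rep(5, 4)))

fit <- aov(oxygen ~ location)

# Rows: location (between groups) and Residuals (within groups);
# columns include degrees of freedom, sums of squares, mean squares,
# the F-statistic and its p-value
summary(fit)
```

With I groups and n observations in total, the between-groups row should show I - 1 degrees of freedom and the residual row n - I, which is a quick check that the location variable was treated as a factor.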
How large a difference in mean dissolved oxygen content is seen between location 1 and location 4? Give a confidence interval. (Review Display 5.6, p. 116 of Sleuth.)
Plot the residuals of the ANOVA against the estimated means, and comment on the fit of the model. (Review Section 5.4.2 of Sleuth and Conceptual Exercise #7 on page 135.)
Comment on how well the model meets each of the four ANOVA assumptions. In particular, critique the assumption of independence between groups in this sampling scheme.
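The confidence interval for the difference between two location means and the residual plot can be sketched at the command line as follows. The data are made-up placeholders, not the contents of water.txt, and the 95% level is an assumption made for the illustration since the problem does not state one. The interval uses the pooled standard deviation from all four groups. The code is written in the S language with R in mind, so small differences in Splus are possible.

```r
# Hypothetical placeholder data, NOT the contents of water.txt
oxygen   <- c(4.8, 5.2, 5.0, 5.1, 4.9,
              6.0, 6.2, 5.9, 6.1, 5.8,
              6.3, 6.1, 6.4, 6.2, 6.5,
              6.6, 6.5, 6.7, 6.4, 6.8)
location <- factor(rep(1:4, rep(5, 4)))

fit <- aov(oxygen ~ location)

# Pooled SD: square root of the residual mean square, on n - I d.f.
dfres <- fit$df.residual
sp    <- sqrt(sum(residuals(fit)^2) / dfres)

means <- tapply(oxygen, location, mean)
n     <- tapply(oxygen, location, length)

# Confidence interval for mean(location 4) - mean(location 1)
# (95% level assumed here for illustration)
est <- as.numeric(means["4"] - means["1"])
se  <- as.numeric(sp * sqrt(1 / n["4"] + 1 / n["1"]))
ci  <- est + c(-1, 1) * qt(0.975, dfres) * se

# Residuals against the estimated group means
plot(fitted(fit), residuals(fit),
     xlab = "Estimated mean (ppm)", ylab = "Residual (ppm)")
abline(h = 0, lty = 2)
```

Because every observation in a group shares the same fitted value, the residual plot shows one vertical strip of points per location, which makes it easy to compare spread across groups.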