Note that you are responsible for all conceptual exercises at the end of Chapters 7-9.
[40 points EXTRA CREDIT] We have two monitors for measuring indoor concentrations of carbon monoxide (CO). Each takes a measurement every minute. Monitor A is a newer, more accurate, and more expensive monitor. Monitor B is an older monitor that is cheaper to run. In a quality assurance experiment to verify that the monitors measure the same concentrations, both monitors are co-located near a CO source and turned on simultaneously. The data file, monitor.txt, gives CO concentrations (in ppm) measured by Monitor A (column 1) and Monitor B (column 2). Label the columns A and B.
Produce a regression line plot of Monitor B versus Monitor A. Do this using the command line:
> attach(monitor)
> par(pty="s")
> plot(A,B, main="Comparison of New (A) and Old (B) \n CO Monitors",xlab="Monitor A (ppm CO)", ylab="Monitor B (ppm CO)",xlim=c(0,75),ylim=c(0,75))
> abline(0,1,lty=1) # solid line, slope 1.
## Data will follow the line above if the monitors are in complete agreement.
> abline(lm(B~A),lty=2) # dotted line, fitted regression line
Regress Monitor B on Monitor A concentration measurements and report the fitted line.
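The fit requested above can be sketched as follows. Because monitor.txt is not reproduced here, the sketch uses a simulated stand-in data frame (the numbers are made up for illustration and are not the assignment data); the `read.table` call in the comment is one assumed way to load the real file.

```r
# Hypothetical stand-in for monitor.txt. With the real file one might use:
#   monitor <- read.table("monitor.txt", col.names = c("A", "B"))
set.seed(1)
A <- runif(50, 5, 70)
monitor <- data.frame(A = A, B = 1 + 0.95 * A + rnorm(50, sd = 2))

# Regress Monitor B on Monitor A.
fit <- lm(B ~ A, data = monitor)

# Intercept (b0) and slope (b1) of the fitted line B = b0 + b1 * A.
coef(fit)
```

The fitted line is reported by plugging the two coefficients into B = b0 + b1*A.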
Perform a test of whether the slope of the regression is equal to 1. Let alpha=0.05. Use the p-value to make your conclusion, and write a sentence in the format of a writeup summarizing your result.
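Note that the regression summary tests whether the slope equals 0, not 1, so the test statistic has to be formed by hand. A minimal sketch, again on simulated stand-in data (made-up values, not the assignment data): the statistic is t = (b1 - 1)/SE(b1), compared to a t distribution with n - 2 degrees of freedom.

```r
# Simulated stand-in data; these values are hypothetical.
set.seed(1)
A <- runif(50, 5, 70)
B <- 1 + 0.95 * A + rnorm(50, sd = 2)
fit <- lm(B ~ A)

# Slope estimate and its standard error from the coefficient table.
est <- coef(summary(fit))["A", "Estimate"]
se  <- coef(summary(fit))["A", "Std. Error"]

# t-statistic for Ho: slope = 1, and the two-sided p-value.
t_stat <- (est - 1) / se
p_val  <- 2 * pt(-abs(t_stat), df = df.residual(fit))
c(t = t_stat, p = p_val)
```

An equivalent check is to regress B - A on A: the slope of that fit is b1 - 1, so its usual t-test against 0 gives the same p-value.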
Determine 95% confidence intervals for b0 and b1, such that the two confidence intervals simultaneously capture the slope and intercept of the regression line with 95% probability. Use the Bonferroni procedure to construct the intervals: use the Bonferroni t-multiplier on page 163 with k=2 (for 2 parameters) to construct your interval. What do these intervals say about how well the two monitors agree? That is, simultaneously, does the interval for the intercept include zero, and does the interval for the slope include 1?
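A sketch of the Bonferroni construction, assuming the multiplier is the usual 1 - alpha/(2k) quantile of the t distribution with n - 2 degrees of freedom (check this against the formula in the text). The data are a simulated stand-in, not the assignment data.

```r
# Simulated stand-in data; these values are hypothetical.
set.seed(1)
A <- runif(50, 5, 70)
B <- 1 + 0.95 * A + rnorm(50, sd = 2)
fit <- lm(B ~ A)

k <- 2          # number of parameters covered simultaneously
alpha <- 0.05   # overall error rate

# Bonferroni t-multiplier: each interval gets alpha / k, split over two tails.
mult <- qt(1 - alpha / (2 * k), df = df.residual(fit))

est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

# Simultaneous intervals for the intercept and the slope.
bonf <- cbind(lower = est - mult * se, upper = est + mult * se)
bonf
```

Under this assumption the same intervals come from `confint(fit, level = 1 - alpha / k)`, since each individual interval is simply built at the 97.5% level.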
We wish to predict a measurement from Monitor B when Monitor A reads 20 ppm. Give a 95% interval for Monitor B's measurement.
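For predicting a single new Monitor B reading, the relevant interval is a prediction interval rather than a confidence interval for the mean. A minimal sketch on simulated stand-in data (made-up values, not the assignment data):

```r
# Simulated stand-in data; these values are hypothetical.
set.seed(1)
A <- runif(50, 5, 70)
monitor <- data.frame(A = A, B = 1 + 0.95 * A + rnorm(50, sd = 2))
fit <- lm(B ~ A, data = monitor)

# 95% prediction interval for one new Monitor B reading when A = 20 ppm.
new_point <- data.frame(A = 20)
pred <- predict(fit, newdata = new_point, interval = "prediction", level = 0.95)
pred   # columns: fit (point prediction), lwr, upr

# For comparison, the narrower interval for the mean of B at A = 20.
conf <- predict(fit, newdata = new_point, interval = "confidence", level = 0.95)
```

The prediction interval is wider because it accounts for the scatter of an individual reading about the line as well as the uncertainty in the line itself.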
We wish to predict a measurement from Monitor A when Monitor B reads 20 ppm. Give a 95% interval for Monitor A's measurement.
Data in pollen.txt. Variables: "removed", "duration", "code" (0=queens; 1=workers).
What you should learn from this problem: Familiarity with a logit transformation; interpreting the meaning of coefficients in this type of model. Walking through consideration of three types of models: single line, parallel lines, and non-parallel lines. Knowledge of how to plot these models will be useful to your projects. Understanding dynamics of how p-values change with addition/deletion of variables. Often a categorical variable (such as queen/worker status) can be a "lurking" variable in a regression problem; ignoring it can be misleading. Many regressions you will see this semester will contain both categorical and continuous variables.
The logit transformation. If p is the proportion, then the logit transform is log[p/(1-p)]. This is the log of the ratio of the amount of pollen removed to the amount not removed. We can refer to it as the "log of the pollen removal ratio", or the "logit of the proportion of pollen removed". Often this transformation allows us to make the needed assumptions for a regression model. All problems below should be turned in. We will consider the model, log(removed/(1-removed))~log(duration), in depth. Fit the following 3 models and put the report output on a separate page.
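The transformation and the three model types can be sketched as below. The labels A, B, and C are an assumption inferred from the single-line, parallel-lines, and non-parallel-lines description and from how the later questions refer to the models. The data frame is a simulated stand-in for pollen.txt (made-up values), using the variable names given above.

```r
# Simulated stand-in for pollen.txt; these values are hypothetical.
set.seed(2)
n <- 40
code <- rep(c(0, 1), each = n / 2)             # 0 = queens, 1 = workers
duration <- exp(rnorm(n, mean = 2, sd = 0.8))
eta <- -2 + 0.8 * log(duration) + 1.0 * code + rnorm(n, sd = 0.5)
pollen <- data.frame(removed = plogis(eta), duration = duration, code = code)

# Logit of the proportion of pollen removed.
pollen$logit <- log(pollen$removed / (1 - pollen$removed))

# Assumed labels: A = single line, B = parallel lines, C = non-parallel lines.
modelA <- lm(logit ~ log(duration), data = pollen)
modelB <- lm(logit ~ log(duration) + code, data = pollen)
modelC <- lm(logit ~ log(duration) * code, data = pollen)

summary(modelB)$coefficients
```

In the formula for the third model, `log(duration) * code` expands to both main effects plus their interaction, which is what allows the two bee types to have different slopes.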
updated 2/13/04, 6pm:
Ho: The model describing the linear relationship between log(duration) and the logit of proportion pollen removed is the same for both types of bees. (for transformed data we have same slope, same intercept.)
Ha: While the linear relationship between log(duration) and the logit of proportion pollen removed is the same for the two types of bees (same slope), the mean logit of the proportion of pollen removed for workers at each level of log(duration) differs from that of queens. (different intercepts)
Write out the model that corresponds to the null hypothesis, and the model that corresponds to the alternative hypothesis. Perform the test suggested by the hypotheses above using the model output for Model B. What coefficient are you testing? Give hypotheses in statistical notation (Greek), test statistic, p-value and conclusion. Which model is more appropriate?
How do the hypotheses change if we instead wish to test whether the logit of the proportion of pollen removed for workers exceeds that for queens, after accounting for the effect of duration? What is the p-value for this test?
Is the p-value for the significance of log(duration) term different in Model A than in Model B? Why?
Ho: While the linear relationship between log(duration) and the logit of the proportion of pollen removed is the same for the two types of bees (same slope), the logit of the proportion of pollen removed at each level of log(duration) differs between the two types of bees. (different intercepts)
Ha: The linear relationship between log(duration) and the logit of the proportion of pollen removed is different for workers and for queens. (different slopes, different intercepts)
Write out the models implied by the null and alternative hypotheses. Perform this test by comparing the model output for Models B and C. What coefficient are you testing? Give hypotheses in statistical notation (Greek), test statistic, p-value and conclusion. Which model is more appropriate? Why is the p-value for the significance of the indicator variable so different in Model B than in Model C, the model with the interaction term?
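One way to carry out the comparison is sketched below, assuming Model B is the parallel-lines fit and Model C adds the interaction term. The data are simulated stand-in values, not the pollen data. Because the two models differ by a single coefficient, the extra-sum-of-squares F-test and the t-test on the interaction term agree.

```r
# Simulated stand-in data; these values are hypothetical.
set.seed(2)
n <- 40
code <- rep(c(0, 1), each = n / 2)             # 0 = queens, 1 = workers
duration <- exp(rnorm(n, mean = 2, sd = 0.8))
logit <- -2 + 0.8 * log(duration) + 1.0 * code + rnorm(n, sd = 0.5)

modelB <- lm(logit ~ log(duration) + code)     # parallel lines
modelC <- lm(logit ~ log(duration) * code)     # non-parallel lines

# t-test on the interaction coefficient in Model C.
int_row <- coef(summary(modelC))["log(duration):code", ]
int_row

# Extra-sum-of-squares F-test comparing the two models.
cmp <- anova(modelB, modelC)
cmp
```

With one extra coefficient, the F-statistic equals the square of the t-statistic and the two p-values coincide.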
Interpret the intercept both on the transformed scale and on the original scale. Note that by interpreting the intercept, you are extrapolating.
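The back-transformation can be sketched with a made-up intercept; the value -2 below is hypothetical and is not a fitted result. On the transformed scale the intercept is the estimated logit when log(duration) = 0, that is, when duration = 1 in the units of the data (and for queens, if the model includes the indicator).

```r
b0 <- -2                     # hypothetical intercept on the logit scale

# Pollen removal ratio p / (1 - p) when duration = 1.
ratio <- exp(b0)

# Proportion of pollen removed; this is the inverse logit of b0.
prop <- ratio / (1 + ratio)

c(ratio = ratio, proportion = prop)
```

The built-in `plogis(b0)` computes the same inverse logit directly.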
A note: Hypothesis tests do not tell the whole story; "effect sizes", given in terms of confidence intervals, are more informative.
Data in EX0914.ASC. Variables: "bank", "walk", "talk", "heart".
Note that all variables are standardized; that is, the mean is subtracted from each value and the result is divided by the standard deviation. This means that if we talk about a one-unit change in X, we are talking about a change of one sample standard deviation in X. The same goes for Y. The model statement you will enter in Splus is: heart~bank+walk+talk
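The standardization described above can be illustrated on a few made-up numbers (these are not values from EX0914.ASC):

```r
x <- c(2, 4, 4, 5, 7, 9)     # made-up values for illustration

# Subtract the mean, then divide by the sample standard deviation.
z <- (x - mean(x)) / sd(x)

# The standardized values have mean 0 and sample standard deviation 1,
# so a one-unit change in z is a one-standard-deviation change in x.
c(mean = mean(z), sd = sd(z))
```

The built-in `scale(x)` performs the same centering and scaling.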