Homework #1—Suggested Answers

ISBE 1.2, 1.3, 1.6, 1.7, 1.9, 1.10

1.2

63/(63+19)=77%, p = 77% +/- 9.1%
71%, p = 71% +/- 7.2%
58%, p = 58% +/- 6.1%
52%, p = 52% +/- 6.2%
55%, p = 55% +/- 4.4%

Notice this is the average between % of males and % of females who prefer popular music (there are an equal number of males and females).
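The margins above follow the usual large-sample formula for a proportion, p +/- 1.96 * sqrt(p(1-p)/n). As a minimal sketch, the first answer (63 of 63+19 respondents) can be checked as follows; the function name `proportion_ci` is illustrative only, and the other answers are not recomputed because their sample sizes are not restated here.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Return (p_hat, margin) for an approximate 95% confidence interval.

    Uses the normal approximation: margin = z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin

# First answer in 1.2: 63 of 63 + 19 = 82 respondents.
p_hat, margin = proportion_ci(63, 63 + 19)
print(f"p = {p_hat:.0%} +/- {margin:.1%}")
```

The margin shrinks as n grows, which is why the pooled estimates carry tighter intervals than the subgroup estimates.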

b. We assume no sampling bias, that is, that the sample was selected so that it is representative of the population of listeners (allowing us to make inferences about the various p's estimated above).

1.3

The sample is based on unsolicited constituents' opinions. Individuals with strong opinions were more likely to contact the senator, hence mail received by the senator is not likely to be broadly representative of his constituency. To reduce bias, the senator should poll a randomly selected sample of his constituents.
The survey was only conducted between the hours of 9am-5pm, failing to capture the population who are at work at these hours. These may be the people most desiring a child day care center. The survey interviews should be conducted at other times as well.
Individuals attending a reunion may be different in important ways from the population of graduates (for example, less successful graduates may not attend). In addition, a large fraction of those polled (42 of 56) did not reply. It may be that more successful individuals were more (or less) likely to reply. A better way to estimate average income would be to sample graduates from alumni lists, making efforts to achieve a high "response rate" (by, for example, ensuring confidentiality).

1.6

The purpose of a placebo is to "blind" those participating in the study to the treatment they are receiving (eliminating the possibility, for example, that the treatment group might behave differently and thereby affect cold-status) so that any observed effect for the treatment can be attributed to it. The placebo also provides a baseline estimate of the fraction of cold-free individuals in the study population. The critical statistic is the difference between the fraction of cold-free individuals among those receiving vitamin C and the fraction of cold-free individuals among those receiving the placebo; this is the "effect" mentioned above.
p = 26% +/- 1.96 * sqrt((.26 * .74)/400), or 26% +/- 4.3%
p = 18% +/- 1.96 * sqrt((.18 * .82)/400), or 18% +/- 3.8%
There appears to be a protective effect (against colds) associated with vitamin C use. 26% of the study participants taking vitamin C (the "treatment group") remained cold-free over winter, while 18% of the placebo group remained cold-free. These fractions are statistical estimates subject to sampling error. How likely is it that another study of the same size (drawn from the same population) would result in estimates suggesting a different conclusion (i.e. that vitamin C was no better than a placebo)? The confidence intervals calculated above help us to address this issue. We are 95% confident that the fraction of cold-free vitamin C users is between 21.7% and 30.3%, and 95% confident that the fraction of cold-free placebo users is between 14.2% and 21.8%. There is almost no overlap between these intervals, hence we can be fairly certain that, in this population, there is a protective effect against development of colds associated with vitamin C. [Note: later on in the course we will be able to make direct comparisons between the two fractions using a different confidence interval (one for a difference between two proportions)].
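The interval arithmetic in this answer can be reproduced with a short sketch, taking n = 400 per group as in the formulas above. The last function previews the interval for a difference between two proportions mentioned in the note; it uses the standard large-sample formula and is an assumption about what the course will cover, not something derived here. All function names are illustrative.

```python
import math

Z95 = 1.96  # critical value for an approximate 95% interval

def ci(p, n, z=Z95):
    """Normal-approximation confidence interval (low, high) for one proportion."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

def diff_ci(p1, n1, p2, n2, z=Z95):
    """Interval for p1 - p2 (assumed standard formula, independent samples)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se

vit_c = ci(0.26, 400)    # cold-free fraction, vitamin C group
placebo = ci(0.18, 400)  # cold-free fraction, placebo group

# Width of the region shared by the two intervals (0 if they are disjoint).
overlap = max(0.0, min(vit_c[1], placebo[1]) - max(vit_c[0], placebo[0]))

print(f"vitamin C: {vit_c[0]:.1%} to {vit_c[1]:.1%}")
print(f"placebo:   {placebo[0]:.1%} to {placebo[1]:.1%}")
print(f"overlap width: {overlap:.2%}")

low, high = diff_ci(0.26, 400, 0.18, 400)
print(f"difference: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval for the difference lies entirely above zero, which agrees with the conclusion drawn from the two separate intervals.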

1.7 The statement "To find out what happens when you change something, it is necessary to change it" is appropriate to both randomized controlled experiments and observational studies. If we want to study a phenomenon under a set of conditions, it helps to observe the phenomenon under those conditions. In a controlled experiment we set the experimental conditions and assign, at random, members of the sample to receive (or be exposed to) the combinations of experimental conditions that we are interested in studying. We don't have this liberty in an observational study, and it might be that we don't observe some interesting combinations of experimental factors. In a controlled experiment we may purposely omit sampling certain combinations of factors. If our sample, whether from a controlled experiment or an observational study, does not contain observations for a particular combination of conditions, it is hard to comment reliably on what happens under those conditions. However, extrapolation and interpolation from similar conditions can be done (the latter more defensibly than the former). Regression is one statistical technique that can be used to estimate what happens under certain unobserved conditions, but there is clearly less uncertainty if we observe what happens under the conditions than if we make a statistical guess. [Regression is also useful in observational studies to control for confounding factors].

1.9

Select because the disease causes these people to move to Arizona and overwhelms
Select drive out, low, maintain, reduce, impossible

Select is counterproductive, higher, high, impossible

1.10

X might cause Y: X → Y
Some confounding factor (Z) might cause both X and Y:
Z → X and Z → Y

Y might cause X: Y → X

Yes, Z could affect X and Y, and X could affect Y (or vice versa).

Yes. Imagine a confounding variable, Z. Let's say that Z has a negative effect on Y (while X has a positive effect on Y) and that the relationship between X, Y and Z is of the form Y = X - 2*Z. We might observe a negative relationship between X and Y as, for example, in the following "data set":

X Y Z
---------
1 1 0
2 0 1

As X increases, Y decreases because of the effect of Z. In order to correctly deduce the direction of the causal relationship between X and Y we have to measure and properly account for the effect of all "confounding variables," here just Z.
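The two-row data set can be checked with a small sketch. It builds Y from the stated rule Y = X - 2*Z, computes the naive slope of Y on X, and then the slope after removing Z's contribution. The adjustment assumes the coefficient on Z (-2) is known, which is true in this constructed example but would have to be estimated in practice; the helper `slope` is illustrative only.

```python
# Rows of the example data set; Y is generated from the rule Y = X - 2*Z.
rows = [{"X": 1, "Z": 0}, {"X": 2, "Z": 1}]
for r in rows:
    r["Y"] = r["X"] - 2 * r["Z"]

def slope(xs, ys):
    """Slope of the line through two points (xs[0], ys[0]) and (xs[1], ys[1])."""
    return (ys[1] - ys[0]) / (xs[1] - xs[0])

xs = [r["X"] for r in rows]
ys = [r["Y"] for r in rows]

# Ignoring Z: Y appears to fall as X rises.
naive = slope(xs, ys)

# Accounting for Z: add back Z's known effect (2*Z) before comparing with X.
ys_adjusted = [r["Y"] + 2 * r["Z"] for r in rows]
adjusted = slope(xs, ys_adjusted)

print("Y values:", ys)             # [1, 0], matching the table
print("naive slope:", naive)       # -1.0
print("adjusted slope:", adjusted) # 1.0
```

The sign flips from negative to positive once Z is accounted for, which is the point of the example: the observed X-Y relationship alone does not reveal the direction of X's effect.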