Homework #1—Suggested Answers
ISBE 1.2, 1.3, 1.6, 1.7, 1.9, 1.10
1.2
- 63/(63+19) = 77%, p = 77% +/- 9.1%
- 71%, p = 71% +/- 7.2%
- 58%, p = 58% +/- 6.1%
- 52%, p = 52% +/- 6.2%
- 55%, p = 55% +/- 4.4%
Notice this is the average between % of males and % of females who
prefer popular music (there are an equal number of males and females).
b. We assume no sampling bias, that is, that the sample
was selected so that it is representative of the population of
listeners (allowing us to make inference about the various
p's estimated above).
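The first interval in 1.2 can be reproduced with the usual normal-approximation formula, p +/- 1.96 * sqrt(p(1-p)/n). The sketch below is only an illustration: the function name `wald_interval` is made up here, and the only counts used are the 63 and 19 given in the first answer (the sample sizes behind the other intervals are not stated in this answer key).

```python
import math

def wald_interval(successes, n, z=1.96):
    """Return (p_hat, margin) for a normal-approximation 95% interval.

    `wald_interval` is a hypothetical helper name, not something from the text.
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin

# Counts from the first answer in 1.2: 63 out of 63 + 19 = 82.
p_hat, margin = wald_interval(63, 63 + 19)
print(f"{p_hat:.0%} +/- {margin:.1%}")  # prints "77% +/- 9.1%"
```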
1.3
- The sample is based on unsolicited constituents' opinions.
Individuals with strong opinions were more likely to contact the
senator, hence mail received by the senator is not likely to be
broadly representative of his constituency. To reduce bias, the
senator should poll a randomly selected sample of his constituents.
- The survey was conducted only between 9am and 5pm, failing
to capture people who are at work during those hours. These may
be the people who most want a child day care center. The survey
interviews should be conducted at other times as well.
- Individuals attending a reunion may be different in important
ways from the population of graduates (for example, less successful
graduates may not attend). In addition, a large fraction of those polled (42
of 56) did not reply. It may be that more successful individuals were
more (or less) likely to reply. A better way to estimate average income
would be to sample graduates from alumni lists and work to
achieve a high "response rate" (for example, by
ensuring confidentiality).
1.6
- The purpose of a placebo is to "blind" those participating
in the study to the treatment they are receiving (eliminating the possibility,
for example, that the treatment group might behave differently and
thereby affect cold-status) so that any observed effect for the treatment
can be attributed to it. The placebo also provides a baseline estimate
of the fraction of cold-free individuals in the study population. The critical
statistic is the difference between the fraction of cold-free individuals
among those receiving vitamin C and the fraction of cold-free individuals
among those receiving the placebo; this is the "effect" mentioned above.
- p = 26% +/- 1.96 * sqrt((.26*.74)/400), or 26% +/- 4.3%
- p = 18% +/- 1.96 * sqrt((.18*.82)/400), or 18% +/- 3.8%
- There appears to be a protective (against colds) effect associated with
vitamin C use. 26% of the study participants taking vitamin C (the "treatment
group") remained cold-free over winter, while 18% of the placebo group remained
cold-free. These fractions are statistical estimates subject to sampling error.
How likely is it that another study of the same size (drawn from the
same population) would result in estimates suggesting a different conclusion
(i.e. that vitamin C was no better than a placebo)? The confidence intervals
calculated above help us to address this issue. We are 95% confident that
the fraction of cold-free vitamin C users is between 21.7% and 30.3%, while
we are 95% confident that the fraction of cold-free placebo users is
between 14.2% and 21.8%. There is almost no overlap between these intervals,
hence we can be pretty certain that, in this population, there is a protective
effect against development of colds associated with vitamin C. [Note: later
on in the course we will be able to make direct comparisons between the two
fractions using a different confidence interval (one for a difference between
two proportions)].
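The two intervals in 1.6, and the interval for a difference between two proportions that the note mentions, can be sketched as follows. This is an illustration under stated assumptions, not the course's prescribed method: it uses n = 400 for both groups (the figure that appears in both calculations above), the standard unpooled normal-approximation standard error for the difference, and helper names (`wald_interval`, `difference_interval`) invented for this example.

```python
import math

Z95 = 1.96   # multiplier used in the answers above
N = 400      # group size used in both calculations above

def wald_interval(p_hat, n, z=Z95):
    """Normal-approximation interval for one proportion (hypothetical helper)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def difference_interval(p1, n1, p2, n2, z=Z95):
    """Interval for p1 - p2 using the unpooled standard error (an assumption)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

vitamin_c = wald_interval(0.26, N)   # cold-free fraction, vitamin C group
placebo = wald_interval(0.18, N)     # cold-free fraction, placebo group
difference = difference_interval(0.26, N, 0.18, N)

for label, (low, high) in [("vitamin C", vitamin_c),
                           ("placebo", placebo),
                           ("difference", difference)]:
    print(f"{label}: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval for the difference runs from roughly 2.3% to 13.7% and excludes zero, which agrees with the conclusion drawn from the two nearly non-overlapping intervals.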
1.7 The statement "To find out what happens when you change something,
it is necessary to change it" is appropriate to both randomized
controlled experiments and observational studies. If we want to study a
phenomenon under a set of conditions, it helps to observe the phenomenon under
those conditions. In a controlled experiment we set the experimental conditions
and assign, at random, members of the sample to receive (or be exposed to) the
combinations of experimental conditions that we are interested in studying.
We don't have this liberty in an observational study, and it might be
that we don't observe some interesting combinations of experimental factors.
In a controlled experiment we may purposely omit sampling certain
combinations of factors. If our sample, whether from a controlled experiment
or an observational study, does not contain observations for a particular
combination of conditions, it is hard to comment reliably on what happens
under those conditions. However, extrapolation and interpolation from
similar conditions can be done (the latter more defensibly than the former).
Regression is one statistical technique that can be used to estimate what
happens under certain unobserved conditions, but there is clearly less
uncertainty if we observe what happens under the conditions than in
making a statistical guess. [Regression is also useful in observational
studies to control for confounding factors].
1.9
- Select: because the disease causes these people to move to Arizona; overwhelms
- Select: drive out, low, maintain, reduce, impossible
- Select: is counterproductive, higher, high, impossible
1.10
- X might cause Y: X -> Y
- Some confounding factor (Z) might cause both X and Y: Z -> X and Z -> Y
- Y might cause X: Y -> X
- Yes, Z could affect X and Y, and X could affect Y (or vice versa).
- Yes. Imagine a confounding variable, Z. Let's say that Z has a
negative effect on Y, X has a positive effect on Y, and the
relationship between X, Y and Z is of the form Y = X - 2*Z. We might
observe a negative relationship between X and Y as, for example, in
the following "data set":
X Y Z
---------
1 1 0
2 0 1
As X increases, Y decreases because of the effect of Z. In
order to correctly deduce the direction of the causal relationship
between X and Y we have to measure and properly account for the
effect of all "confounding variables," here just Z.
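The two-row "data set" above can be checked with a few lines of code. This is a minimal sketch: the function name `y_from` is made up for illustration, and the only inputs are the X and Z values from the table and the stated relationship Y = X - 2*Z.

```python
def y_from(x, z):
    """The relationship assumed in the answer above: Y = X - 2*Z."""
    return x - 2 * z

# (X, Z) pairs from the table; Y is computed from the formula.
rows = [(1, 0), (2, 1)]
data = [(x, y_from(x, z), z) for x, z in rows]
print(data)  # [(1, 1, 0), (2, 0, 1)], matching the table

# Ignoring Z: change in Y per unit change in X across the two rows.
(x1, y1, _), (x2, y2, _) = data
naive_slope = (y2 - y1) / (x2 - x1)
print(naive_slope)  # -1.0, an apparently negative effect of X on Y

# Holding Z fixed at 0 and raising X from 1 to 2 isolates the effect of X.
effect_holding_z = y_from(2, 0) - y_from(1, 0)
print(effect_holding_z)  # 1, the positive effect built into the formula
```

The sign flips only because Z rises together with X in the observed rows; once Z is held constant, the positive effect of X is recovered.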