Data Analysis and Statistical Inference

Extra problems on confidence intervals and hypothesis tests

1. Conceptual questions on confidence intervals

Decide whether the following statements are true or false. Explain your reasoning.

Problems:

i) For a given standard error, lower confidence levels produce wider confidence intervals.

ii) If you increase sample size, the width of confidence intervals will increase.

iii) The statement, "the 95% confidence interval for the population mean is (350, 400)", is equivalent to the statement, "there is a 95% probability that the population mean is between 350 and 400".

iv) To reduce the width of a confidence interval by a factor of two (i.e., in half), you have to quadruple the sample size.

v) Assuming the central limit theorem applies, confidence intervals are always valid.

2. Conceptual questions on hypothesis testing

Decide whether the following statements are true or false. Explain your reasoning.

Problems:

i) A p-value of .08 is more evidence against the null hypothesis than a p-value of .04.

ii) If two independent studies are done on the same population with the purpose of testing the same hypotheses, the study with the larger sample size is more likely to have a smaller p-value than the study with the smaller sample size.

iii) The statement, "the p-value is .003", is equivalent to the statement, "there is a 0.3% probability that the null hypothesis is true".

iv) Even though you rejected the null hypothesis, it may still be true.

v) Assuming the central limit theorem applies, hypothesis tests are valid.

vi) A researcher who tried to learn statistics without taking a formal course does a hypothesis test and gets a p-value of .024. He says, "there is a 97.6% chance that the alternative hypothesis is false, so the null hypothesis is true." What, if anything, is wrong with his statement?

vii) You perform a hypothesis test using a sample size of four units, and you do not reject the null hypothesis. Your research colleague says this statistical test provides conclusive evidence against the hypothesis. Do you agree or disagree with his conclusion? Explain your reasoning in three or fewer sentences.

viii) You are the head of the Food and Drug Administration (F.D.A.), in charge of deciding whether new drugs are effective and should be allowed to be sold to people. A pharmaceutical company trying to win approval for a new drug they manufacture claims that their drug is better than the standard drug at curing a certain disease. The company bases this claim on a study in which they gave their drug to 1000 volunteers with the disease. They compared these volunteers to a group of 1000 hospital patients who were treated with the standard drug and whose information was obtained from existing hospital records. The company found a "statistically significant" difference between the percentage of volunteers who were cured and the percentage of the comparison group who were cured. That is, they did a statistical hypothesis test and rejected the null hypothesis that the percentages are equal. As director of the F.D.A., should you permit the new drug to be sold? Explain your reasoning in three or fewer sentences.

3. True or False:

On a Raleigh-Durham television news program, 75% of the 200 people living in the Triangle who responded to a call-in poll favored new U.S. military action against Iraq. A 95% confidence interval for the population proportion of people in the Triangle who favor new U.S. military action against Iraq is

.75 +/- 1.96 sqrt(.75*.25/200).

4. Frightening information about our citizenry.

On Dec. 16, 1991, the Associated Press reported that in a random sample of 507 adult Americans, only 142 correctly described the Bill of Rights as the first ten amendments to the U.S. Constitution.

Problems:

1. Calculate a 95% confidence interval for the proportion of all U.S. adults that could correctly describe the Bill of Rights.

2. What assumptions need to hold for this confidence interval to be valid? Do they seem reasonable?

3. Comment on anything you would like to know before accepting this confidence interval.
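One way to carry out the arithmetic in problem 1 is sketched below, assuming the usual large-sample (Wald) interval for a proportion. The function name `wald_interval` and the success-failure threshold of 10 are illustrative choices, not part of the original problem, and the sketch says nothing about whether the sampling assumptions in problems 2 and 3 actually hold.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided normal critical value, about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Counts from the Associated Press report: 142 correct out of 507 sampled.
correct, sample_size = 142, 507

# Rule-of-thumb check (an assumption of this sketch) that the normal
# approximation is plausible: at least 10 successes and 10 failures.
large_enough = min(correct, sample_size - correct) >= 10

lower, upper = wald_interval(correct, sample_size)
print(f"normal approximation plausible: {large_enough}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

The same function can be reused for any single-proportion interval in these problems by changing the two counts.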

Reference:

Devore, J. *Probability and Statistics for Engineering and the Sciences.* Pacific Grove, CA: Duxbury, 2000, p. 294.

5. Improving response rates in surveys

There are many methods researchers use to increase response rates to
mailed surveys. One idea is to make the surveys attractive in hopes
that people don't throw them out right away. In the article "The
Impact of Cover Design and First Questions on Response Rates for a Mail
Survey of Skydivers" in *Leisure Sciences*, 1991, pp. 67--76, researchers
reported on the results of an experiment with cover design. Of 420
skydivers, 207 were randomly selected to receive a survey with a plain
cover, and the remaining 213 received a cover with a picture of a skydiver.
The outcome of the experiment is shown below:

Group | Number sent | Number returned |

Plain | 207 | 104 |

Skydiver | 213 | 109 |

The researchers are interested in seeing if there is any difference between the response rates to the survey for plain versus skydiver covers.

Problems:

1. What are the null and alternative hypotheses?

2. Calculate the value of the test statistic for the hypothesis test.

3. Using a significance level of .05, is there evidence of a detectable difference between response rates for plain and skydiver covers?

4. Give a 95% confidence interval for the difference between the proportion of skydivers who respond to plain covers and the proportion of skydivers who respond to a skydiver cover.

5. What assumptions are you making in these analyses? Do they seem reasonable?

6. What, if anything, would you like to know about the study design before accepting conclusions?

7. What limitations does this study have, if any?
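The calculations in problems 2 through 4 can be sketched as follows, assuming the standard large-sample two-proportion z-test (pooled standard error under the null hypothesis of equal response rates) and the unpooled standard error for the confidence interval. The function names are hypothetical, and this is one common textbook approach rather than necessarily the method intended by the course.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def difference_interval(x1, n1, x2, n2, level=0.95):
    """Large-sample interval for p1 - p2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


# Counts from the table: plain cover first, skydiver cover second.
z, p_value = two_proportion_z(104, 207, 109, 213)
ci_low, ci_high = difference_interval(104, 207, 109, 213)

print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
print(f"95% interval for plain minus skydiver: ({ci_low:.3f}, {ci_high:.3f})")
```

The test uses the pooled estimate because the null hypothesis says the two rates are equal, while the interval does not assume equality and so uses each group's own estimate.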

Reference:

Devore, J. *Probability and Statistics for Engineering and the Sciences.* Pacific Grove, CA: Duxbury, 2000, pp. 390--391.

6. Questionnaire wording in action

In a February 1999 poll, the Gallup Organization--one of the major U.S. polling companies--used two different wordings of a question about the death penalty. The two wordings were asked of two different random samples of Americans. Results reported at the Gallup website were:

*Question 1: Are you in favor of the death penalty for a person convicted of murder?*

__Results:__ 385 people favor death penalty, 119 against death penalty, 39 with no opinion

*Question 2: What do you think should be the penalty for murder--the death penalty or life imprisonment with absolutely no parole?*

__Results:__ 286 people favor death penalty, 194 favor life imprisonment, 31 with no opinion

For purposes of this problem let's consider answers to be either favoring the death penalty or not favoring the death penalty. That is, in Question 1 combine the 39 "no opinion" people with the 119 against the death penalty people. In Question 2 combine the 31 "no opinion" people with the 194 favoring life imprisonment people.

__Problems__

1. Give a 95% confidence interval for the true percentage of Americans who favor the death penalty according to the results from Question 1. What conclusions do you make about the percentage of people in favor of the death penalty when Question 1 is asked?

2. Give a 95% confidence interval for the true percentage of Americans who favor the death penalty according to the results from Question 2. What conclusions do you make about the percentage of people in favor of the death penalty when Question 2 is asked?

3. Based on these results, does the wording of the question appear to affect the percentage of people who say they're in favor of the death penalty? Justify your answer in __three or fewer__ sentences.

Reference:

Utts, J.M. and Heckard, R.F. *Mind on Statistics.* Duxbury Press, 2002, p. 310.

7. Composition of Ancient Earth's Atmosphere

Has the composition of Earth's atmosphere changed over time? To study this question, geologists Robert Berner and Gary Landis (1988) examine the composition of gas bubbles in ancient pieces of amber (hardened tree resin preserved in sedimentary rocks).

To determine the composition of the gas bubbles, they crush the amber in a vacuum and analyze the released gases with "time-resolved quadrupole mass spectrometry" (Berner and Landis, 1988, p. 1406). After arguing that the air in the bubbles is not contaminated by modern air, Berner and Landis (1988) present the percentages of nitrogen and carbon dioxide plus oxygen in nine gas bubbles in amber from the Upper Cretaceous age (about 75 to 95 million years ago). These data are shown below. In the sample labels, the Roman numerals correspond to the piece of amber that is crushed (there are three pieces), and the letter corresponds to the gas bubble within the amber that is analyzed.

Sample Label

Gas | IA | IB | IIA | IIB | IIC | IID | IIIA | IIIB | IIIC |

N_{2} | 63.4 | 65.0 | 64.4 | 63.3 | 54.8 | 64.5 | 60.8 | 49.1 | 51.0 |

CO_{2} + O_{2} | 33.5 | 30.5 | 28.3 | 28.4 | 32.3 | 25.5 | 36.6 | 27.8 | 25.5 |

Berner and Landis (1988) argue that the carbon dioxide is respired oxygen
from trapped microorganisms, so that the original levels of oxygen in the
amber equal the CO_{2} + O_{2} percentages. Thus,
they claim these are percentages of the two major gases from nine samples
of ancient air.

The data are in the file *ancientair*.

Problems:

1. Modern air is known to contain 78.1% nitrogen and 20.9% oxygen. Is there evidence that the percentages of nitrogen and oxygen in ancient air differed from the composition of modern air? Use two-sided t-tests, since we are looking for any differences from the modern percentages.

2. It is not universally accepted by geologists
that the gas bubbles represent samples of air from ancient times.
This question, which is one of measurement bias, can be answered only by
experts in the field. However, the appropriateness of t-tests can
be criticized for these data from a statistical point of view. Criticize
the use of t-tests. (I can think of one legitimate criticism, and I can
see a possible case made for another.)
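For problem 1, the one-sample t statistics can be computed as in the sketch below. It treats the nine bubbles as a simple random sample, which is precisely the kind of assumption problem 2 asks you to scrutinize, so the output is only as trustworthy as that assumption. The function name `one_sample_t` is hypothetical, and the critical value 2.306 is the standard table value for a two-sided 5% test with 8 degrees of freedom, entered by hand because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error, n - 1


# Percentages from the table of nine gas bubbles.
nitrogen = [63.4, 65.0, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51.0]
oxygen = [33.5, 30.5, 28.3, 28.4, 32.3, 25.5, 36.6, 27.8, 25.5]

# Modern-air percentages given in problem 1.
t_n2, df = one_sample_t(nitrogen, 78.1)
t_o2, _ = one_sample_t(oxygen, 20.9)

# Table value t_{0.975, 8}; an assumption typed in from a t table.
T_CRIT_8 = 2.306

for gas, t in (("N2", t_n2), ("CO2 + O2", t_o2)):
    verdict = "reject" if abs(t) > T_CRIT_8 else "do not reject"
    print(f"{gas}: t = {t:.2f} on {df} df -> {verdict} H0 at the 5% level")
```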

Reference:

Berner, R. A. and Landis, G. P. (1988) "Gas Bubbles in Fossil Amber as Possible Indicators of the Major Gas Composition of Ancient Air." *Science* **239**, pp. 1406--1409.

8. The effects of logging on tropical rainforests

How badly does logging damage tropical rainforests? Cannon, Peart,
and Leighton (1998) compare plots of forests in Borneo that had never been
logged to similar, nearby forests that had been logged eight years earlier.
The numbers of tree species in each of 12 unlogged plots and 9 logged plots
are shown below. The data also are in the file *logging*.

Unlogged | 22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15 |

Logged | 17, 4, 18, 14, 18, 15, 15, 10, 12 |

The researchers argue that the plots can be considered random samples
because "patches that escape logging are determined by the placement of
logging roads and the haphazard search patterns of operators, rather than
by intrinsic differences among patches" (Cannon *et al.*, 1998, p.
1366) and that the loggers did not know the effects of logging would be
assessed.

Problems:

1. Why is it important that the loggers did not know the effects of logging would be assessed?

2. Test the hypothesis that logged plots have a significantly lower number of tree species after 8 years. What do you conclude about this hypothesis?

3. Give a 99% confidence interval for the difference in the average number of species in logged and unlogged forests. Use the degrees of freedom from the t-test you performed above.

4. What assumptions are you making in this analysis? Do they seem reasonable?

5. There is an outlier in the data for the logged plots. When outliers exist, it is useful to perform analyses with and without the outliers to see how much they affect the results. After excluding the outlier, re-test the hypothesis that logged plots have a significantly lower number of tree species after 8 years. What do you conclude?

6. Given the results of all these analyses, what would you say about the differences in logged and unlogged plots?
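The test statistics for problems 2 and 5 can be sketched as below, assuming a two-sample t-test that does not pool the variances (Welch's test) with the Welch-Satterthwaite approximation for the degrees of freedom. The course may intend a different variant (for example, pooled variances or a conservative degrees-of-freedom rule), so treat this as one reasonable choice. The function name `welch_t` is hypothetical, and the outlier is taken to be the plot with 4 species.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch t statistic for mean(a) - mean(b) and its approximate df."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group contributions to the squared standard error.
    v_a = variance(sample_a) / n_a
    v_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(v_a + v_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df


unlogged = [22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15]
logged = [17, 4, 18, 14, 18, 15, 15, 10, 12]

# Problem 2: all plots. A negative t means fewer species in logged plots.
t_all, df_all = welch_t(logged, unlogged)

# Problem 5: the same comparison without the plot that has 4 species.
logged_trimmed = [count for count in logged if count != 4]
t_trim, df_trim = welch_t(logged_trimmed, unlogged)

print(f"all plots:       t = {t_all:.3f}, df = {df_all:.1f}")
print(f"outlier removed: t = {t_trim:.3f}, df = {df_trim:.1f}")
```

Compare each statistic with a one-sided t critical value at the reported degrees of freedom, since the alternative is that logged plots have fewer species.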

Reference:

Cannon, C. H., Peart, D. R., and Leighton, M. (1998) "Tree species
diversity in commercially logged Bornean rainforest." *Science* **281**, pp. 1366--1367.

9. Reading Journal Articles

Many journal articles describing experimental results contain terse
reports of confidence intervals and hypothesis tests that readers are supposed
to understand. For example, in Hotamisligil *et al.* (1996),
the researchers compare normal mice with similar mice genetically altered
to remove the gene called *aP2.* The researchers are interested
in this gene because it might be connected with diabetes.

Mice of both types are allowed to become obese by eating a high-fat diet. The researchers then measure the levels of insulin and glucose in their blood plasma. The article reports the following results in its Table 1:

"Each value is the mean +/- the SEM (*standard error of the mean*)
of measurements on at least 10 mice. Mean values of each plasma component
are compared between *aP2^{-/-}* mice (

Parameter | Wild-type | aP2^{-/-} |

Insulin (ng/ml) | 5.9 +/- 0.9 | 0.75 +/- 0.2 ** |

Glucose (mg/dl) | 230 +/- 25 | 150 +/- 17 * |

Despite much greater circulating amounts of insulin, the wild-type mice
had higher blood glucose than the *aP2^{-/-}* animals.
These results indicate that the absence of

The high levels of glucose indicate diabetes, and the high levels of insulin indicate obesity-induced insulin resistance. As a comparison, lean mice of both types have glucose levels around 140 and insulin levels around .80.

Problem:

Explain to a biologist who knows nothing about inferential statistics how to interpret the * and the **.

Reference:

Hotamisligil, G. S., Johnson, R. S., Distel, R. J., Ellis, R., Papaioannou, V. E., and Spiegelman, B. M. (1996) "Uncoupling of obesity from insulin resistance through a targeted mutation in *aP2*, the adipocyte fatty acid binding protein." *Science* **274**, pp. 1377--1379.

10. Glaucoma

To determine whether glaucoma affects corneal thickness, measurements
were made in 8 people affected by glaucoma in one eye but not in the other.
The corneal thicknesses are in the file *glaucoma*.

Problems:

1. Make a 95% confidence interval for the difference in the average thickness of eyes with glaucoma and the average thickness of eyes without glaucoma.

2. Test if there is any difference in the average thickness of eyes with glaucoma and eyes without glaucoma. Use a 1% significance level.

3. What assumptions are you making for these analyses? Do they seem reasonable?

4. What, if any, limitations does this study have?

Reference:

Tamhane, A. and Dunlop, D. *Statistics and Data Analysis.* Upper Saddle River, NJ: Prentice Hall, 2000, p. 292.