Problems on confidence intervals and hypothesis tests
1. Conceptual questions on confidence intervals
Decide whether the following statements are true or false. Explain your reasoning.
i) For a given standard error, lower confidence levels produce wider confidence intervals.
ii) If you increase sample size, the width of confidence intervals will increase.
iii) The statement, "the 95% confidence interval for the population mean is (350, 400)", is equivalent to the statement, "there is a 95% probability that the population mean is between 350 and 400".
iv) To reduce the width of a confidence interval by a factor of two (i.e., in half), you have to quadruple the sample size.
v) Assuming the central limit theorem applies, confidence intervals are always valid.
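If you want to check your reasoning on statements i), ii), and iv) numerically, a minimal sketch along the following lines may help. It assumes a normal-based interval for a mean, and the standard deviation of 10 is an arbitrary illustrative value, not a number from any problem in this set.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, level):
    """Half-width of a normal-based confidence interval for a mean."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return z_star * sd / sqrt(n)


# sd = 10 is an arbitrary illustrative value (an assumption).
for level in (0.90, 0.95, 0.99):
    width = ci_half_width(10, 100, level)
    print(f"level = {level:.2f}, n = 100: half-width = {width:.3f}")

for n in (100, 400):
    width = ci_half_width(10, n, 0.95)
    print(f"level = 0.95, n = {n}: half-width = {width:.3f}")
```

Compare how the half-width moves as the confidence level changes with n fixed, and as n changes with the level fixed.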
2. Conceptual questions on hypothesis testing
Decide whether the following statements are true or false. Explain your reasoning.
i) A p-value of .08 is more evidence against the null hypothesis than a p-value of .04.
ii) If two independent studies are done on the same population with the purpose of testing the same hypotheses, the study with the larger sample size is more likely to have a smaller p-value than the study with the smaller sample size.
iii) The statement, "the p-value is .003", is equivalent to the statement, "there is a 0.3% probability that the null hypothesis is true".
iv) Even though you rejected the null hypothesis, it may still be true.
v) Assuming the central limit theorem applies, hypothesis tests are valid.
vi) A researcher who tried to learn statistics without taking a formal course does a hypothesis test and gets a p-value of .024. He says, "there is a 97.6% chance that the alternative hypothesis is false, so the null hypothesis is true." What, if anything, is wrong with his statement?
vii) You perform a hypothesis test using a sample size of four units, and you do not reject the null hypothesis. Your research colleague says this statistical test provides conclusive evidence against the hypothesis. Do you agree or disagree with his conclusion? Explain your reasoning in three or fewer sentences.
viii) You are the head of the Food and Drug Administration (F.D.A.), in charge of deciding whether new drugs are effective and should be allowed to be sold to people. A pharmaceutical company trying to win approval for a new drug they manufacture claims that their drug is better than the standard drug at curing a certain disease. The company bases this claim on a study in which they gave their drug to 1000 volunteers with the disease. They compared these volunteers to a group of 1000 hospital patients who were treated with the standard drug and whose information is obtained from existing hospital records. The company found a "statistically significant" difference between the percentage of volunteers who were cured and the percentage of the comparison group who were cured. That is, they did a statistical hypothesis test and rejected the null hypothesis that the percentages are equal. As director of the F.D.A., should you permit the new drug to be sold? Explain your reasoning in three or fewer sentences.
3. True or False:
On a Raleigh-Durham television news program, 75% of the 200 people living in the Triangle who responded to a call-in poll favored new U.S. military action against Iraq. A 95% confidence interval for the population proportion of people in the Triangle who favor new U.S. military action against Iraq is
.75 +/- 1.96 sqrt(.75*.25/200).
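For reference, the expression above can be evaluated as in the short sketch below. This is only the arithmetic; whether the result deserves to be called a 95% confidence interval for the population proportion is the question you are asked to answer.

```python
from math import sqrt

p_hat = 0.75  # proportion of call-in respondents in favor
n = 200       # number of respondents

# Margin of error exactly as written in the problem statement.
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"margin = {margin:.3f}, endpoints = ({lower:.3f}, {upper:.3f})")
```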
4. Frightening information about our citizenry.
On Dec. 16, 1991, the Associated Press reported that in a random sample of 507 adult Americans, only 142 correctly described the Bill of Rights as the first ten amendments to the U.S. Constitution.
1. Calculate a 95% confidence interval for the proportion of all U.S. adults that could correctly describe the Bill of Rights.
2. What assumptions need to hold for this confidence interval to be valid? Do they seem reasonable?
3. Comment on anything you would like to know before accepting this confidence interval.
Devore, J. Probability and Statistics for Engineering and the Sciences. Pacific Grove, CA: Duxbury, 2000, p. 294.
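One way to set up the calculation in part 1 is sketched below. It assumes the usual large-sample (normal approximation) interval for a proportion; the function name `proportion_ci` is made up for this illustration. Other interval methods exist and would give slightly different endpoints.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 142 of 507 sampled adults described the Bill of Rights correctly.
lower, upper = proportion_ci(142, 507)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The code does not address parts 2 and 3: the interval is only as trustworthy as the sampling behind the 507 responses.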
5. Improving response rates in surveys
There are many methods researchers use to increase response rates to mailed surveys. One idea is to make the surveys attractive in hopes that people don't throw them out right away. In the article "The Impact of Cover Design and First Questions on Response Rates for a Mail Survey of Skydivers" in Leisure Sciences, 1991, pp. 67--76, researchers reported on the results of an experiment with cover design. Of 420 skydivers, 207 were randomly selected to receive a survey with a plain cover, and the remaining 213 received a cover with a picture of a skydiver. The outcome of the experiment is shown below:
Group      Number sent   Number returned
Plain      207           104
Skydiver   213           109
The researchers are interested in seeing if there is any difference between the response rates to the survey for plain versus skydiver covers.
1. What are the null and alternative hypotheses?
2. Calculate the value of the test statistic for the hypothesis test.
3. Using a significance level of .05, is there evidence of a detectable difference between response rates for plain and skydiver covers?
4. Give a 95% confidence interval for the difference between the proportion of skydivers who respond to plain covers and the proportion of skydivers who respond to a skydiver cover.
5. What assumptions are you making in these analyses? Do they seem reasonable?
6. What, if anything, would you like to know about the study design before accepting conclusions?
7. What limitations does this study have, if any?
Devore, J. Probability and Statistics for Engineering and the Sciences. Pacific Grove, CA: Duxbury, 2000, pp. 390--391.
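A possible setup for parts 2 through 4 is sketched below. It assumes the second number in each table row is the count of surveys returned, uses a pooled standard error for the test statistic (because the null hypothesis says the two rates are equal), and uses an unpooled standard error for the confidence interval. Your course may prefer a different convention.

```python
from math import sqrt
from statistics import NormalDist

# Counts read from the table, assuming the second number in each row
# is the number of surveys returned.
sent_plain, returned_plain = 207, 104
sent_sky, returned_sky = 213, 109

p_plain = returned_plain / sent_plain
p_sky = returned_sky / sent_sky
diff = p_plain - p_sky

# Test statistic: pooled standard error under H0 (equal response rates).
p_pool = (returned_plain + returned_sky) / (sent_plain + sent_sky)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / sent_plain + 1 / sent_sky))
z = diff / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

# Confidence interval: unpooled standard error.
se_unpooled = sqrt(p_plain * (1 - p_plain) / sent_plain
                   + p_sky * (1 - p_sky) / sent_sky)
z_star = NormalDist().inv_cdf(0.975)
ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)

print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
print(f"95% CI for the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```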
6. Questionnaire wording in action
In a February 1999 poll, the Gallup Organization--one of the major U.S. polling companies--used two different wordings of a question about the death penalty. The two wordings were asked of two different random samples of Americans. Results reported at the Gallup website were:
Question 1: Are you in favor of the death penalty for a person convicted of murder?
Results: 385 people favor death penalty, 119 against death penalty, 39 with no opinion
Question 2: What do you think should be the penalty for murder--the death penalty or life imprisonment with absolutely no parole?
Results: 286 people favor death penalty, 194 favor life imprisonment, 31 with no opinion
For purposes of this problem let's consider answers to be either favoring the death penalty or not favoring the death penalty. That is, in Question 1 combine the 39 "no opinion" people with the 119 against the death penalty people. In Question 2 combine the 31 "no opinion" people with the 194 favoring life imprisonment people.
1. Give a 95% confidence interval for the true percentage of Americans who favor the death penalty according to the results from Question 1. What conclusions do you make about the percentage of people in favor of the death penalty when Question 1 is asked?
2. Give a 95% confidence interval for the true percentage of Americans who favor the death penalty according to the results from Question 2. What conclusions do you make about the percentage of people in favor of the death penalty when Question 2 is asked?
3. Based on these results, does the wording of the question appear to affect the percentage of people who say they're in favor of the death penalty? Justify your answer in three or fewer sentences.
Utts, J.M. and Heckard, R.F. Mind on Statistics. Duxbury Press. 2002, p. 310.
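The combining step described above, followed by the two intervals, might be organized as in this sketch. It assumes the large-sample interval for a proportion; the dictionary keys and variable names are chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

# Counts as reported for each wording: (favor, opposed or alternative, no opinion).
counts = {
    "Question 1": (385, 119, 39),
    "Question 2": (286, 194, 31),
}

z_star = NormalDist().inv_cdf(0.975)
intervals = {}
for label, (favor, other, no_opinion) in counts.items():
    n = favor + other + no_opinion  # "no opinion" is pooled with "not favoring"
    p_hat = favor / n
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    intervals[label] = (100 * (p_hat - margin), 100 * (p_hat + margin))
    print(f"{label}: n = {n}, 95% CI = "
          f"({intervals[label][0]:.1f}%, {intervals[label][1]:.1f}%)")
```

Comparing the two intervals is one informal way to approach part 3. A confidence interval for the difference between the two proportions would be a more direct tool.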
7. Composition of Ancient Earth's Atmosphere
Has the composition of Earth's atmosphere changed over time? To study this question, geologists Robert Berner and Gary Landis (1988) examine the composition of gas bubbles in ancient pieces of amber (hardened tree resin preserved in sedimentary rocks).
To determine the composition of the gas bubbles, they crush the amber in a vacuum and analyze the released gases with "time-resolved quadrupole mass spectrometry" (Berner and Landis, 1988, p. 1406). After arguing that the air in the bubbles is not contaminated by modern air, Berner and Landis (1988) present the percentages of nitrogen and carbon dioxide plus oxygen in nine gas bubbles in amber from the Upper Cretaceous age (about 75 to 95 million years ago). These data are shown below. In the sample labels, the Roman numerals correspond to the piece of amber that is crushed (there are three pieces), and the letter corresponds to the gas bubble within the amber that is analyzed.
|CO2 + O2||33.5||30.5||28.3||28.4||32.3||25.5||36.6||27.8||25.5|
Berner and Landis (1988) argue that the carbon dioxide is respired oxygen from trapped microorganisms, so that the original levels of oxygen in the amber equal the CO2 + O2 percentages. Thus, they claim these are percentages of the two major gases from nine samples of ancient air.
The data are in the file ancientair.
1. Modern air is known to contain 78.1% nitrogen and 20.9% oxygen. Is there evidence that the percentages of nitrogen and oxygen in ancient air differed from the composition of modern air? Use two-sided t-tests, since we are looking for any differences from the modern percentages.
2. It is not universally accepted by geologists that the gas bubbles represent samples of air from ancient times. This question, which is one of measurement bias, can be answered only by experts in the field. However, the appropriateness of t-tests can be criticized for these data from a statistical point of view. Criticize the use of t-tests. (I can think of one legitimate criticism, and I can see a possible case made for another.)
Berner, R. A. and Landis, G. P. (1988) "Gas Bubbles in Fossil Amber as Possible Indicators of the Major Gas Composition of Ancient Air." Science 239, pp. 1406--1409.
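For the oxygen half of part 1, the one-sample t statistic can be computed by hand as in the sketch below. It uses only the nine CO2 + O2 percentages listed in this problem; the nitrogen test would follow the same pattern using the values in the ancientair file. The critical value 2.306 is the standard two-sided 5% t-table value for 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# CO2 + O2 percentages for the nine gas bubbles listed in the problem.
oxygen = [33.5, 30.5, 28.3, 28.4, 32.3, 25.5, 36.6, 27.8, 25.5]
modern_oxygen = 20.9  # percent oxygen in modern air

n = len(oxygen)
se = stdev(oxygen) / sqrt(n)  # standard error of the sample mean
t_stat = (mean(oxygen) - modern_oxygen) / se
df = n - 1

T_CRIT_8DF = 2.306  # two-sided 5% critical value from a t table, 8 df
print(f"mean = {mean(oxygen):.2f}, t = {t_stat:.2f} on {df} df")
print("reject H0 at the 5% level" if abs(t_stat) > T_CRIT_8DF
      else "do not reject H0 at the 5% level")
```

The calculation treats the nine bubbles as nine independent observations. Keep that in mind for part 2.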
8. The effects of logging on tropical rainforests
How badly does logging damage tropical rainforests? Cannon, Peart, and Leighton (1998) compare plots of forests in Borneo that had never been logged to similar, nearby forests that had been logged eight years earlier. The numbers of tree species in each of 12 unlogged plots and 9 logged plots are shown below. The data also are in the file logging.
|Unlogged||22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15|
|Logged||17, 4, 18, 14, 18, 15, 15, 10, 12|
The researchers argue that the plots can be considered random samples because "patches that escape logging are determined by the placement of logging roads and the haphazard search patterns of operators, rather than by intrinsic differences among patches" (Cannon et al., 1998, p. 1366) and that the loggers did not know the effects of logging would be assessed.
1. Why is it important that the loggers did not know the effects of logging would be assessed?
2. Test the hypothesis that logged plots have a significantly lower number of tree species after 8 years. What do you conclude about this hypothesis?
3. Give a 99% confidence interval for the difference in the average number of species in logged and unlogged forests. Use the degrees of freedom from the t-test you performed above.
4. What assumptions are you making in this analysis? Do they seem reasonable?
5. There is an outlier in the data for the logged plots. When outliers exist, it is useful to perform analyses with and without the outliers to see how much they affect the results. After excluding the outlier, re-test the hypothesis that logged plots have a significantly lower number of tree species after 8 years. What do you conclude?
6. Given the results of all these analyses, what would you say about the differences in logged and unlogged plots?
Cannon, C. H., Peart, D. R., and Leighton, M. (1998) "Tree species diversity in commercially logged Bornean rainforest." Science 281, pp. 1366--1367.
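The test statistic and degrees of freedom for parts 2 and 5 can be sketched as follows. This version assumes the unequal-variance (Welch) form of the two-sample t-test and takes the value 4 as the outlier in the logged plots; both are assumptions, and your course may use a different degrees-of-freedom rule. The helper name `welch` is made up for this illustration.

```python
from math import sqrt
from statistics import mean, variance

unlogged = [22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15]
logged = [17, 4, 18, 14, 18, 15, 15, 10, 12]


def welch(x, y):
    """Welch two-sample t statistic, approximate df, and standard error."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    se = sqrt(vx + vy)
    t_stat = (mean(x) - mean(y)) / se
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_stat, df, se


t_all, df_all, se_all = welch(unlogged, logged)
print(f"all plots: t = {t_all:.2f}, df = {df_all:.1f}, SE = {se_all:.2f}")

# Assumption: the outlier is the logged plot with 4 species.
logged_trimmed = [count for count in logged if count != 4]
t_trim, df_trim, se_trim = welch(unlogged, logged_trimmed)
print(f"without outlier: t = {t_trim:.2f}, df = {df_trim:.1f}, SE = {se_trim:.2f}")
```

For the one-sided alternative in part 2, compare t with the upper-tail critical value for the chosen degrees of freedom. For part 3, the interval is the difference in means plus or minus the 99% t critical value times the standard error.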
9. Reading Journal Articles
Many journal articles describing experimental results contain terse reports of confidence intervals and hypothesis tests that readers are supposed to understand. For example, in Hotamisligil et al. (1996), the researchers compare normal mice with similar mice genetically altered to remove the gene called aP2. The researchers are interested in this gene because it might be connected with diabetes.
Mice of both types are allowed to become obese by eating a high-fat diet. The researchers then measure the levels of insulin and glucose in their blood plasma. The article reports the following results in its Table 1:
"Each value is the mean +/- the SEM (standard error of the mean) of measurements on at least 10 mice. Mean values of each plasma component are compared between aP2-/- mice (the mice with the gene removed) and wild-type controls (the normal mice) by Student's t test (* P<.05 and ** P<.005).
|Plasma component||Wild-type||aP2-/-|
|Insulin (ng/ml)||5.9 +/- 0.9||0.75 +/- 0.2 **|
|Glucose (mg/dl)||230 +/- 25||150 +/- 17 *|
Despite much greater circulating amounts of insulin, the wild-type mice had higher blood glucose than the aP2-/- animals. These results indicate that the absence of aP2 interferes with the development of dietary obesity-induced insulin resistance." (Hotamisligil et al. 1996, p. 1378. Italics included by Jerry Reiter.)
The high levels of glucose indicate diabetes, and the high levels of insulin indicate obesity-induced insulin resistance. As a comparison, lean mice of both types have glucose levels around 140 and insulin levels around .80.
Explain to a biologist who knows nothing about inferential statistics how to interpret the * and the **.
Hotamisligil, G. S., Johnson, R. S., Distel, R. J., Ellis, R., Papaioannou, V. E., and Spiegelman, B. M. (1996) "Uncoupling of obesity from insulin resistance through a targeted mutation in aP2, the adipocyte fatty acid binding protein." Science 274, pp. 1377--1379.
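To see roughly where the stars come from, you can rebuild an approximate t statistic from the reported means and SEMs, as in the sketch below. It assumes the two groups are independent, so the standard error of the difference is the square root of the sum of the squared SEMs. It also assumes, following the quoted text, that the first value in each row is for wild-type mice and the second for aP2-/- mice. The exact group sizes are not given ("at least 10 mice"), so this is only a rough check, not the authors' calculation.

```python
from math import sqrt

# (mean, SEM) pairs: wild-type first, aP2-/- second (assumed column order).
reported = {
    "Insulin (ng/ml)": ((5.9, 0.9), (0.75, 0.2)),
    "Glucose (mg/dl)": ((230, 25), (150, 17)),
}

t_approx = {}
for component, ((m_wild, sem_wild), (m_ko, sem_ko)) in reported.items():
    se_diff = sqrt(sem_wild ** 2 + sem_ko ** 2)  # SE of the difference in means
    t_approx[component] = (m_wild - m_ko) / se_diff
    print(f"{component}: approximate t = {t_approx[component]:.2f}")
```

A larger t statistic corresponds to a smaller p-value, which is consistent with insulin carrying two stars and glucose one.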
10. Glaucoma and corneal thickness
To determine whether glaucoma affects corneal thickness, measurements were made in 8 people affected by glaucoma in one eye but not in the other. The corneal thicknesses are in the file glaucoma.
1. Make a 95% confidence interval for the difference in the average thickness of eyes with glaucoma and the average thickness of eyes without glaucoma.
2. Test if there is any difference in the average thickness of eyes with glaucoma and eyes without glaucoma. Use a 1% significance level.
3. What assumptions are you making for these analyses? Do they seem reasonable?
4. What, if any, limitations does this study have?
Tamhane, A. and Dunlop, D. Statistics and Data Analysis. Upper Saddle River, NJ: Prentice Hall, 2000, p. 292.
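Because each person contributes one eye of each type, a paired analysis of the within-person differences is one natural setup. The sketch below shows the mechanics only. The eight pairs of thickness values are made-up placeholders, NOT the contents of the glaucoma file, and the function name `paired_summary` is hypothetical. The critical values 2.365 (95%) and 3.499 (99%) are the standard two-sided t-table values for 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(first, second, t_crit):
    """Mean difference, t statistic, and CI for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    t_stat = d_bar / se
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    return d_bar, t_stat, ci


# HYPOTHETICAL placeholder values; replace with the data in the glaucoma file.
with_glaucoma = [500, 480, 470, 450, 440, 430, 460, 490]
without_glaucoma = [505, 478, 480, 455, 438, 440, 462, 500]

T_95_7DF = 2.365  # two-sided 95% critical value, 7 df
T_99_7DF = 3.499  # two-sided 99% critical value, 7 df

d_bar, t_stat, ci95 = paired_summary(with_glaucoma, without_glaucoma, T_95_7DF)
print(f"mean difference = {d_bar:.2f}, t = {t_stat:.2f}")
print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print("reject H0 at the 1% level" if abs(t_stat) > T_99_7DF
      else "do not reject H0 at the 1% level")
```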