This week's data set has 8 variables. Four correspond to sample means, the other 4 to sample medians, drawn from a population with mean 61.6 and variance 211.2. The data represent age at diagnosis of a particular disease; the population consists of women living in a large U.S. government study region. Mu4 is a column of 100 sample means calculated from random samples of size 4 drawn from this population; Mu16, Mu100, and Mu1600 each hold 100 sample means for samples of size 16, 100, and 1600, respectively. Columns Med4, Med16, Med100, and Med1600 hold the sample medians for samples of the indicated size.
1) Distribution of the Sample Mean. Before you begin this problem, open a new spreadsheet to record your findings. Calculate summaries of the distribution of each of the four columns Mu4, Mu16, Mu100, and Mu1600. Note the shape of each histogram. In the empty spreadsheet, enter the associated n (4, 16, 100, or 1600) in column 1 and the standard deviation of each column in column 2. The central limit theorem says that these histograms (at least the latter two) should have approximately what distribution? Are the histograms consistent with this? The variance of the sample mean, as a function of n, is the population variance divided by n. For each of the four values of n, calculate the theoretical variance of the sample mean (you can do this in the second spreadsheet). Compare the theoretical values to the observed ones; since you recorded standard deviations, square them first (or take the square root of the theoretical variance) so that both are on the same scale.
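The theoretical side of this comparison can also be checked outside the spreadsheet. The following is a minimal Python sketch, not part of the lab software; the names POP_VARIANCE, SAMPLE_SIZES, and theoretical_mean_variance are illustrative choices, and only the population variance (211.2) and the four sample sizes come from the text above.

```python
import math

POP_VARIANCE = 211.2               # population variance given in the lab text
SAMPLE_SIZES = [4, 16, 100, 1600]  # the four sample sizes used for Mu4 ... Mu1600


def theoretical_mean_variance(pop_variance, n):
    """Variance of the sample mean for samples of size n: sigma^2 / n."""
    return pop_variance / n


# One row per n: (n, theoretical variance, theoretical standard deviation).
rows = []
for n in SAMPLE_SIZES:
    var = theoretical_mean_variance(POP_VARIANCE, n)
    rows.append((n, var, math.sqrt(var)))

for n, var, sd in rows:
    print(f"n={n:5d}  variance={var:8.4f}  sd={sd:7.4f}")
```

The sd column is the number to set beside the observed standard deviations recorded in column 2 of the spreadsheet; the variance column is the one to use if you square the observed values instead.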
2) Distribution of the Sample Median. Repeat problem 1 for
the columns of sample medians (Med4, Med16, Med100, and Med1600).
Note that, since the parent population is approximately normal, the
theoretical variance of the sample median is approximately 1.57 times
the population variance divided by n (a result from class). Also note
that the central limit theorem applies to means, not medians.
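The approximation for the median can be sketched the same way. The Python below is a hedged illustration, not the lab's method: it evaluates 1.57 times the population variance divided by n, and then, as a rough sanity check, simulates 100 medians per sample size from a normal population with the stated mean and variance. The exactly normal population, the seed, and all function and constant names are assumptions made for this example; the actual lab data come from a population that is only approximately normal.

```python
import math
import random
import statistics

POP_MEAN = 61.6        # population mean given in the lab text
POP_VARIANCE = 211.2   # population variance given in the lab text
MEDIAN_FACTOR = 1.57   # rounded factor quoted in the lab (pi/2 is about 1.5708)
SAMPLE_SIZES = [4, 16, 100, 1600]


def theoretical_median_variance(pop_variance, n, factor=MEDIAN_FACTOR):
    """Large-sample approximation: factor * sigma^2 / n."""
    return factor * pop_variance / n


def simulated_median_variance(n, n_samples=100, seed=0):
    """Variance of n_samples medians, each from a simulated normal sample of size n.

    Assumption: the population is exactly normal, which the lab only
    describes as approximately true.
    """
    rng = random.Random(seed)
    sd = math.sqrt(POP_VARIANCE)
    medians = []
    for _ in range(n_samples):
        sample = [rng.gauss(POP_MEAN, sd) for _ in range(n)]
        medians.append(statistics.median(sample))
    return statistics.variance(medians)


for n in SAMPLE_SIZES:
    theory = theoretical_median_variance(POP_VARIANCE, n)
    observed = simulated_median_variance(n)
    print(f"n={n:5d}  theoretical={theory:9.4f}  simulated={observed:9.4f}")
```

With only 100 medians per column, the simulated variances will scatter around the theoretical ones, and the approximation is a large-sample result, so expect the roughest agreement at n = 4.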