Graded Assignment

Assignment (due 3 November)

Complete the tasks described below and write a short (at most four pages, including graphs) report explaining what you've done and what it demonstrates.

Your paper's goals should be as follows:
- to explain how to interpret the meaning of the confidence level attached to a confidence interval,
- to demonstrate that if you are estimating a population mean and you are using the sample standard deviation instead of the population standard deviation, then critical values from a t distribution will give you more accurate confidence levels than critical values from a standard normal distribution,
- to demonstrate what happens to confidence intervals when the sample size remains the same but the confidence level changes, and
- to demonstrate what happens to confidence intervals when the confidence level remains the same but the sample size changes.
Graphs will be an essential part of your paper, of course. These should be titled and clearly labeled.
Graphs should be referred to at appropriate places in your paper and described clearly. They should be incorporated into the flow of your paper, not dumped in with the reader left to figure out what they mean.
The best way to present a graph in a scientific paper is to precede the graph with a description of what the graph shows, and to follow the graph with an interpretation of what the graph means.
You may work with others while doing the MATLAB simulations and you may discuss your results with others. However, you must write and understand your own MATLAB code, generate your own graphs, and write your own paper.
Don't forget to put your name on your paper.
The paper is to be turned in with your homework in class at 1:15 on Tuesday, 2 November.
Your paper will be graded in large part on how well it communicates its main ideas. Your writing should follow the three C's of scientific writing: Clear, Concise, and Correct.

The tasks

Throughout this lab assignment you will be simulating the heights of men randomly drawn from a large population of college-age men whose heights are normally distributed with mean 70 inches and standard deviation 2.5 inches.

1. Simulate 100 different samples of 5 randomly selected men. For each of the 100 samples, determine the 95% confidence interval estimate of the population mean that it yields, using the "known" population standard deviation of 2.5 and the appropriate critical value determined from the standard normal distribution. Show a plot of the 100 confidence intervals as parallel line segments, indicating which ones did and which ones did not capture the population mean of 70 inches. (The "Helpful hints" section below contains code for a .m file that will plot this graph for you.) Determine what proportion of the intervals successfully captured the population mean. Repeat this with 1000 samples, but do not include the graph in your paper, only the proportion of intervals that captured the mean.
2. In reality, the population standard deviation would not be known. Simulate 100 different samples of 5 randomly selected men once more. For each of the 100 samples, determine the 95% confidence interval estimate of the population mean that it yields, using the sample standard deviation s for each interval instead of sigma=2.5. For now, choose your critical value from a standard normal distribution as you did in step 1, even though that isn't the appropriate thing to do. Plot the intervals as you did in step 1 and report the proportion of them that capture the population mean. Repeat with 1000 random samples, but do not show the graph; just report the proportion of the intervals that capture the mean.
3. Repeat everything you did in step 2, only this time use the appropriate critical value from the t distribution with 4 degrees of freedom instead of from the standard normal distribution. You should now be able to explain why the t distribution, not the standard normal, is the appropriate one from which to choose your critical values when using the sample standard deviation.
4. Repeat step 3 above but with a 99% confidence level instead of a 95% confidence level. (Do only 100 simulations, not 1000.) Show your confidence intervals. What do you find is different about the intervals? About the proportion capturing the mean?
5. Repeat step 3 above (returning to a 95% confidence level) but with samples of size 20 instead of size 5. Note that the number of degrees of freedom is no longer 4. (Do only 100 simulations, not 1000.) Show your confidence intervals. What do you find is different about the intervals?
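
As a reminder of the arithmetic behind a single interval of the kind used in step 1, here is a minimal sketch for one sample. The five heights are made-up illustrative values, not simulation output, and the variable names (x, zcrit, halfwidth, and so on) are assumptions chosen for this example. The critical value 1.96 is the usual rounded table value for 95% confidence.

```matlab
% One hypothetical sample of 5 heights (illustrative values, not simulated)
x = [68 71 70 72 69];
n = length(x);
sigma = 2.5;     % "known" population standard deviation from the assignment
zcrit = 1.96;    % rounded 95% critical value from a standard normal table

xbar = mean(x);                        % sample mean, here 70
halfwidth = zcrit * sigma / sqrt(n);   % margin of error, about 2.19
lower = xbar - halfwidth;              % about 67.81
upper = xbar + halfwidth;              % about 72.19

% Does this one interval capture the population mean of 70?
captured = (lower <= 70) && (upper >= 70);
```

With a known sigma, every sample of the same size produces an interval of the same width. Once the sample standard deviation s takes the place of sigma, the width changes from sample to sample.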

Helpful hints

The code below may be copied into a .m file called plotconfints.m and used to make your graphs.

% This function takes as arguments a vector of left endpoints,
% a vector of right endpoints, and a number, and plots parallel
% line segments for the left/right endpoint pairs, with a horizontal
% line segment at the third argument.  The function returns the
% proportion of the intervals that contain the third argument.

function p = plotconfints(L,R,parameter)

figure
hold on

n = length(L);
for i = 1:n
  if L(i)<=parameter && R(i)>=parameter
    plot([i i],[L(i) R(i)],'g-')
  else
    plot([i i],[L(i) R(i)],'xr-')
  end
end

bottom = min(L) - 0.1*(max(R)-min(L));
top = max(R) + 0.1*(max(R)-min(L));
left = floor(-0.1*n);
right = ceil(1.1*n);
axis([left right bottom top])

plot([left right],[parameter parameter],'bd-')

hold off

numbercontainingparameter = sum(L<=parameter & R>=parameter);
p = numbercontainingparameter/length(L);
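
To see what value plotconfints should return, the sketch below computes the capture proportion by hand for four made-up intervals. The endpoint values and the names mu, captured, and pCheck are assumptions for illustration only. The commented-out last line shows how the function would presumably be called once plotconfints.m is saved on your MATLAB path.

```matlab
% Four hypothetical intervals (illustrative endpoints, not simulation output)
L = [68.1 69.0 70.4 67.5];   % left endpoints
R = [71.2 72.3 73.0 69.8];   % right endpoints
mu = 70;                     % parameter the intervals are trying to capture

% Logical vector: true where an interval contains mu
captured = (L <= mu) & (R >= mu);

% Proportion of intervals containing mu (2 of 4 here, so 0.5)
pCheck = mean(captured);

% With plotconfints.m on the path, this call should draw the figure
% and return the same proportion:
% p = plotconfints(L, R, mu);
```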

Recall that a way to simulate 100 samples of size 5 is to create a matrix of simulated values in which each row represents a different sample of 5 men:
```
heights = normrnd(70, 2.5, 100, 5);
```
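
If you want your graphs to be reproducible each time you rerun your code, one option is to seed the random number generator before simulating. This is a sketch under two assumptions: that your MATLAB version provides the rng function (older releases do not), and that the seed value 1 is an arbitrary choice made for this example.

```matlab
% Fix the random number stream so reruns give the same simulated heights
% (assumes rng is available; the seed 1 is arbitrary)
rng(1);

% 100 samples (rows) of 5 men each (columns)
heights = normrnd(70, 2.5, 100, 5);

% Confirm the layout: one row per sample, one column per man
dims = size(heights);   % [100 5]
```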
Recall that row (i.e., sample) means can be found with the mean command, where a second argument of dimension=2 indicates to take row means instead of the default column means:
```
samplemeans = mean(heights,2);
```
Row (i.e., sample) standard deviations can be found with the std command, which works like the mean command except that it takes three arguments. The first is the matrix of data, the second is a "flag" of either 0 or 1 indicating whether the standard deviation should be computed using n-1 or n, respectively, as the divisor, and the third is the dimension (2 for rows). You should use n-1, so the flag should be 0:
```
stdheights = std(heights,0,2);
```
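
To check your understanding of the dimension argument and the flag, it can help to try both commands on a matrix small enough to verify by hand. The matrix X below is a made-up example, and the names rowmeans, s0, and s1 are assumptions for illustration.

```matlab
% Tiny made-up data set: 2 "samples" (rows) of 3 values each
X = [1 2 3;
     2 4 6];

rowmeans = mean(X, 2);   % row means: [2; 4]

% Flag 0 divides by n-1 (the sample standard deviation)
s0 = std(X, 0, 2);       % [1; 2]

% Flag 1 divides by n, which gives smaller values
s1 = std(X, 1, 2);       % [sqrt(2/3); 2*sqrt(2/3)]
```

For the first row the squared deviations sum to 2, so dividing by n-1 = 2 gives a standard deviation of 1, while dividing by n = 3 gives about 0.816.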
You can look up the critical values in the standard normal and t tables in your text if you like. (There are only four critical values involved in this lab.) But if you'd like to compute them using MATLAB, it isn't hard. Type help norminv and help tinv to see the syntax. Remember that for a 95% confidence interval, the critical value has area 0.975 to its left, not 0.95.
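
A minimal sketch of the calculation for a 95% confidence level is shown here. It assumes the Statistics Toolbox is installed (the same toolbox that provides normrnd), and the variable names conf, p, zcrit, and tcrit are chosen for this example.

```matlab
% Critical values via inverse CDFs (norminv and tinv need the Statistics Toolbox)
conf = 0.95;             % confidence level
p = 1 - (1 - conf)/2;    % area to the left of the critical value, 0.975

zcrit = norminv(p);      % standard normal critical value, about 1.96
tcrit = tinv(p, 4);      % t critical value with 4 degrees of freedom, about 2.776
```

Changing conf, or the second argument of tinv, should give the other critical values you need. It is worth comparing the results against the tables in your text.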