STA114/MTH136: Statistics Lab 5, Feb 8 2001


Let's consider the problem of estimating the mean µ of a normal distribution with known variance sig², on the basis of n=10 observations. The usual 90% *frequentist* interval estimate (so-called Confidence Interval) is

[ x-bar - 1.645 sig/sqrt n, x-bar + 1.645 sig/sqrt n ]

where 1.645=qnorm(0.95) is chosen so that P[-1.645 < Z < +1.645] = 0.90 for a standard normal distribution. Let's simulate some data and plot the intervals in S-Plus.

First, S-Plus; start it within Emacs or, for command-line die-hards, try "Splus -e" from the command line. Then, once S-Plus is running, type or cut-and-paste:


  motif();              #  Under Windows, use "graph.sheet();" instead
  help.start();         #  Nicer, windowed help system
  plot(c(50,0),type="n",xlim=c(50,100), ylim=c(0,1));
  lines(c(70,70), c(0,1), lwd=2, col=2);
  for(i in 1:100) {
    x <- rnorm(10, 70, 10);  # Random sample of ten No(70,1) rv's
    mid <- mean(x);         # X-bar
    wid <- 1.645 * 10. / sqrt(10.);
    lines(c(mid-wid, mid+wid), rep(i/100, 2)) # What does this do?
    points(c(mid-wid, mid, mid+wid), rep(i/100, 3));  # And this?
  }

How many of your 100 confidence intervals contain the true mean, µ=70? How many did you expect to?

Here's a careless pre-Student version where we estimate the variance from the data:


  motif();              #  Under Windows, use "graph.sheet();" instead
  help.start();         #  Nicer, windowed help system
  plot(c(50,0),type="n",xlim=c(50,100), ylim=c(0,1));
  lines(c(70,70), c(0,1), lwd=2, col=2);
  for(i in 1:100) {
    x <- rnorm(10, 70, 10);  # Random sample of ten No(70,1) rv's
    mid <- mean(x);         # X-bar
    sd2 <- var(x);          # Sample variance
    wid <- 1.645 * sqrt(sd2) / sqrt(10.);
    lines(c(mid-wid, mid+wid), rep(i/100, 2)) # What does this do?
    points(c(mid-wid, mid, mid+wid), rep(i/100, 3));  # And this?
  }

NOW how many of your 100 confidence intervals contain the true mean, µ=70? How many did you expect to? Were you surprised? What went wrong? Did you notice how the intervals now have varying width? Why?


Now let's try the Bayesian version of the problem:

  plot(c(50,0),type="n",xlim=c(50,100), ylim=c(0,1));
  x <- rnorm(10, 70, 10);  # Random sample of ten No(70,1) rv's
  xbar <- mean(x);
  wid <- 1.645 * 10. / sqrt(10.);
  segments(c(xbar-wid,xbar+wid), c(0,0),
           c(xbar-wid,xbar+wid), c(1,1), c(0,1), lwd=2, col=2);
  for(i in 1:100) {
    th <- rnorm(1, xbar, 10./sqrt(10.0));
    points(th, i/100);
  }

How many of the THETA points fall inside the interval? Can you explain what you've just done, and contrast these two plots?