Housekeeping

Central Limit Theorem

Variability of sample statistics

  • Each sample from the population yields a slightly different sample statistic (sample mean, sample proportion, etc.)

  • The variability of these sample statistics is measured by the standard error

  • Previously we quantified this value via simulation

  • Today we talk about the theory underlying sampling distributions
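The simulation approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the course's original code: the population, the sample size `n`, the number of replicates, and the seed are all hypothetical choices. It compares the simulated standard error of the sample mean with the value theory predicts, the population standard deviation divided by the square root of `n`.

```r
set.seed(1)                                     # arbitrary seed, for reproducibility only
population = rnorm(100000, mean = 50, sd = 5)   # hypothetical population
n = 40                                          # hypothetical sample size

# draw many samples and record the mean of each one
sample_means = replicate(5000, mean(sample(population, size = n)))

se_simulated = sd(sample_means)        # standard error estimated by simulation
se_theory = sd(population) / sqrt(n)   # standard error predicted by theory
c(simulated = se_simulated, theory = se_theory)
```

The two numbers should be close but not identical, since the simulated value is itself subject to sampling variability.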

Aside: Normal probability plots

Normal probability plot

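library(ggplot2)
temp = rnorm(100, mean = 50, sd = 5)
# normal probability plot
g = ggplot(data.frame(temp = temp), aes(sample = temp)) + stat_qq()
# reference line: a theoretical quantile z maps to mean + sd * z
g + geom_abline(intercept = mean(temp), slope = sd(temp), linetype = "dashed")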


Alternative code for normal probability plot

qqnorm(temp)
qqline(temp)


Anatomy of a normal probability plot

  • Data are plotted on the y-axis of a normal probability plot and theoretical quantiles (following a normal distribution) on the x-axis.

  • If there is a linear (one-to-one) relationship between the data and the theoretical quantiles, then the data follow a nearly normal distribution.

  • Since a one-to-one relationship would appear as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.

Constructing a normal probability plot

Data (y-coordinates)   Percentile             Theoretical quantiles (x-coordinates)
37.5                   0.5 / 100 = 0.005      qnorm(0.005) = -2.58
38.0                   1.5 / 100 = 0.015      qnorm(0.015) = -2.17
38.3                   2.5 / 100 = 0.025      qnorm(0.025) = -1.96
39.5                   3.5 / 100 = 0.035      qnorm(0.035) = -1.81
...                    ...                    ...
61.9                   99.5 / 100 = 0.995     qnorm(0.995) = 2.58
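The construction in the table can be reproduced in a few lines. The sketch below assumes 100 observations generated the same way as `temp` earlier (the seed is an arbitrary choice, so the data values will differ from those in the table), and the name `qq_table` is purely illustrative. It also checks the hand-computed quantiles against the ones `qqnorm()` uses.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
temp = rnorm(100, mean = 50, sd = 5)

n = length(temp)
percentile = (1:n - 0.5) / n           # 0.005, 0.015, ..., 0.995
theoretical = qnorm(percentile)        # x-coordinates of the plot

# sorted data paired with its percentile and theoretical quantile
qq_table = data.frame(data = sort(temp), percentile, theoretical)
head(qq_table, 4)
tail(qq_table, 1)

# qqnorm() computes the same x-coordinates (returned in the original data order)
qq = qqnorm(temp, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
```

The percentiles are offset by 0.5 so that neither the smallest nor the largest observation is assigned a percentile of 0 or 1, where `qnorm()` would return an infinite value.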

Constructing a normal probability plot

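qqnorm(temp)
qqline(temp)
temp_sorted = sort(temp)  # sorted data; the name avoids masking base R's t()
# vertical lines at the theoretical quantiles from the table
abline(v = c(-2.58, -2.17, -1.96, -1.81, 2.58), lty = 2, col = 1:5)
# horizontal lines at the four smallest and the largest observations
abline(h = c(temp_sorted[1:4], temp_sorted[100]), lty = 2, col = 1:5)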


Fat tails

It is best to focus on the most extreme values: here the largest values are larger than we would expect under a normal distribution, and the smallest values are smaller than we would expect.
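One way to see this pattern is to plot data known to have fat tails. The sketch below is an assumed example, not the data behind the original slide: it draws from a t distribution with 3 degrees of freedom, and the sample size and seed are arbitrary.

```r
set.seed(1)             # arbitrary seed, for reproducibility only
fat = rt(200, df = 3)   # t distribution with 3 df: assumed example of fat tails

qqnorm(fat)
qqline(fat)
```

In plots like this, the points on the far right typically sit above the reference line and the points on the far left sit below it.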


Skinny tails

Here the largest values are smaller than we would expect under a normal distribution, and the smallest values are larger than we would expect.
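The opposite pattern can be illustrated with data that cannot take extreme values at all. The sketch below is an assumed example rather than the slide's original data: it uses a uniform distribution on (-1, 1), with an arbitrary sample size and seed.

```r
set.seed(1)                              # arbitrary seed, for reproducibility only
skinny = runif(200, min = -1, max = 1)   # bounded data: assumed example of skinny tails

qqnorm(skinny)
qqline(skinny)
```

In plots like this, the points typically flatten out at both ends, falling below the reference line on the right and above it on the left.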