# Central Limit Theorem

## Variability of sample statistics

• Each sample from the population yields a slightly different sample statistic (sample mean, sample proportion, etc.)

• The variability of these sample statistics is measured by the standard error

• Previously we quantified this value via simulation

• Today we talk about the theory underlying sampling distributions
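As a rough illustration of the points above, the sketch below repeats the simulation idea in base R and compares the result with theory. The population (normal with mean 50 and sd 5), the sample size of 25, and the number of repetitions are assumptions chosen only for this example. The comparison value uses the standard result that the standard error of a sample mean is σ/√n.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: normal with mean 50 and sd 5 (an assumption)
pop_mean <- 50
pop_sd <- 5
n <- 25          # size of each sample
n_sims <- 10000  # number of repeated samples

# Draw many samples and record the mean of each one
sample_means <- replicate(n_sims, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

se_simulated <- sd(sample_means)  # spread of the simulated sample means
se_theory <- pop_sd / sqrt(n)     # sigma / sqrt(n) = 5 / 5 = 1

c(simulated = se_simulated, theory = se_theory)
```

The two numbers should be close but not identical, since the simulated value is itself an estimate based on a finite number of samples.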

# Aside: Normal probability plots

## Normal probability plot

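```r
library(ggplot2)

temp = rnorm(100, mean = 50, sd = 5)
# normal probability plot
g = ggplot(data.frame(temp = temp), aes(sample = temp)) + stat_qq()
# reference line: for normal data the points fall near intercept = mean, slope = sd
g + geom_abline(intercept = mean(temp), slope = sd(temp), linetype = "dashed")
```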

## Alternative code for normal probability plot

```r
qqnorm(temp)
qqline(temp)
```

## Anatomy of a normal probability plot

• Data are plotted on the y-axis of a normal probability plot and theoretical quantiles (following a normal distribution) on the x-axis.

• If the relationship between the data and the theoretical quantiles is linear, then the data follow a nearly normal distribution.

• Since a linear relationship appears as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.
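One way to put a number on "close to a straight line" is to compute the correlation between the sorted data and the theoretical quantiles, and to fit a line through the points. The sketch below is only an illustration: the seed and the simulated `temp` vector are assumptions, and `ppoints()` is used to generate the percentiles. For normal data the fitted intercept and slope should land near the sample mean and standard deviation, which is why those two values can serve as a dashed reference line.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
temp <- rnorm(100, mean = 50, sd = 5)

sorted_temp <- sort(temp)                    # data, smallest to largest (y-axis)
theoretical <- qnorm(ppoints(length(temp)))  # standard normal quantiles (x-axis)

# Strength of the straight-line relationship in the plot
qq_cor <- cor(theoretical, sorted_temp)

# Least-squares line through the points of the plot
fit <- lm(sorted_temp ~ theoretical)
intercept <- unname(coef(fit)[1])
slope <- unname(coef(fit)[2])

c(correlation = qq_cor,
  intercept = intercept, mean = mean(temp),
  slope = slope, sd = sd(temp))
```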

## Constructing a normal probability plot

| Data (y-coordinates) | Percentile | Theoretical Quantiles (x-coordinates) |
|---|---|---|
| 37.5 | 0.5 / 100 = 0.005 | `qnorm(0.005) = -2.58` |
| 38.0 | 1.5 / 100 = 0.015 | `qnorm(0.015) = -2.17` |
| 38.3 | 2.5 / 100 = 0.025 | `qnorm(0.025) = -1.96` |
| 39.5 | 3.5 / 100 = 0.035 | `qnorm(0.035) = -1.81` |
| ⋮ | ⋮ | ⋮ |
| 61.9 | 99.5 / 100 = 0.995 | `qnorm(0.995) = 2.58` |
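The columns of this table can be built by hand in a few lines. The sketch below assumes a simulated `temp` vector of 100 values (the seed is arbitrary, so the data values will differ from those shown in the table) and checks the hand-built x-coordinates against what `qqnorm()` computes.

```r
set.seed(1)  # arbitrary seed; the data will not match the table above
temp <- rnorm(100, mean = 50, sd = 5)
n <- length(temp)

sorted_temp <- sort(temp)         # y-coordinates: data from smallest to largest
percentile <- (1:n - 0.5) / n     # 0.005, 0.015, ..., 0.995
theoretical <- qnorm(percentile)  # x-coordinates: standard normal quantiles

qq_table <- data.frame(data = sorted_temp, percentile, theoretical)
head(qq_table, 4)  # the four smallest observations
tail(qq_table, 1)  # the largest observation

# qqnorm() returns its x values in the original data order, so sort them first
qq <- qqnorm(temp, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
```

The theoretical quantiles depend only on the sample size, so the last column is the same for any sample of 100 observations.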

## Constructing a normal probability plot

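```r
qqnorm(temp)
qqline(temp)
sorted_temp = sort(temp)
# vertical lines at the theoretical quantiles of the 4 smallest and the largest observation
abline(v = qnorm(c(0.005, 0.015, 0.025, 0.035, 0.995)), lty = 2, col = 1:5)
# horizontal lines at the corresponding sorted data values
abline(h = c(sorted_temp[1:4], sorted_temp[100]), lty = 2, col = 1:5)
```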

## Fat tails

It is best to think about what is happening with the most extreme values: here the largest values are larger than we would expect under a normal distribution, and the smallest values are smaller than we would expect.
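To see this pattern without sampling noise, the sketch below plots idealized fat-tailed "data" against normal theoretical quantiles. The choice of a t distribution with 3 degrees of freedom is an assumption made only for illustration; it is rescaled to standard deviation 1 so the comparison with the standard normal is fair.

```r
n <- 100
p <- (1:n - 0.5) / n  # percentiles 0.005, 0.015, ..., 0.995
normal_q <- qnorm(p)  # theoretical quantiles (x-axis)

# Idealized fat-tailed data: quantiles of a t distribution with 3 df,
# divided by sqrt(3), the standard deviation of that distribution
fat_q <- qt(p, df = 3) / sqrt(3)

plot(normal_q, fat_q, xlab = "Theoretical Quantiles", ylab = "Data")
abline(0, 1, lty = 2)  # where the points would fall for normal data with sd 1

range(fat_q)     # extremes of the fat-tailed quantiles
range(normal_q)  # extremes a normal distribution would give
```

The points bend below the dashed line on the left and above it on the right, which is the signature of fat tails.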

## Skinny tails

Here the largest values are smaller than we would expect under a normal distribution, and the smallest values are larger than we would expect.
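The opposite pattern can be sketched the same way. The uniform distribution on [0, 1] is used here as an assumed example of a skinny-tailed distribution; it is standardized to mean 0 and standard deviation 1 before being compared with the standard normal.

```r
n <- 100
p <- (1:n - 0.5) / n  # percentiles 0.005, 0.015, ..., 0.995
normal_q <- qnorm(p)  # theoretical quantiles (x-axis)

# Idealized skinny-tailed data: quantiles of a uniform(0, 1) distribution,
# standardized using its mean 0.5 and standard deviation sqrt(1/12)
thin_q <- (qunif(p) - 0.5) / sqrt(1 / 12)

plot(normal_q, thin_q, xlab = "Theoretical Quantiles", ylab = "Data")
abline(0, 1, lty = 2)  # where the points would fall for normal data with sd 1

range(thin_q)    # extremes of the skinny-tailed quantiles
range(normal_q)  # extremes a normal distribution would give
```

Here the points bend above the dashed line on the left and below it on the right, giving an S-shape that is the mirror image of the fat-tailed case.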