15. Central Limit Theorem + CLT based inference, Pt. 1

- Central Limit Theorem
- Aside: Evaluating normality graphically
- Application exercise: proving the CLT via simulation
- Inference based on the Central Limit Theorem

**Due Thursday:** Read Sections 2.5 - 2.8 of OpenIntro: Intro Stat with Randomization and Simulation: http://www.openintro.org/isrs

- Means:
  - Population: mean = \(\mu\), standard deviation = \(\sigma\)
  - Sample: mean = \(\bar{x}\), standard deviation = \(s\)
- Proportions:
  - Population: \(p\)
  - Sample: \(\hat{p}\)
- Standard error: \(SE\)

Each sample from the population yields a slightly different sample statistic (sample mean, sample proportion, etc.).

The variability of these sample statistics is measured by the **standard error**.

Previously we quantified this value via simulation.

Today we talk about the theory underlying **sampling distributions**.

A **sampling distribution** is the distribution of a sample statistic across random samples of size \(n\) taken from a population.

In practice it is impossible to construct sampling distributions, since doing so would require access to the entire population.

Today, for demonstration purposes, we will assume we have access to the population data, construct sampling distributions, and examine their shapes, centers, and spreads.
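As a rough illustration of that idea, the sketch below builds a simulated sampling distribution of the sample mean. The population (100,000 right-skewed values drawn with `rexp`), the sample size `n`, the number of repetitions `reps`, and the seed are all assumptions chosen for illustration, not values from the slides. It compares the center of the sample means with \(\mu\), and their spread with \(\sigma / \sqrt{n}\).

```r
set.seed(20)  # arbitrary seed, for reproducibility only

# Hypothetical population: 100,000 right-skewed values (an assumption, not course data)
population <- rexp(100000, rate = 1 / 10)
mu <- mean(population)
sigma <- sqrt(mean((population - mu)^2))  # population SD (divides by N, not N - 1)

n <- 50       # size of each random sample
reps <- 5000  # number of samples drawn

# Draw `reps` random samples of size n and record each sample mean
sample_means <- replicate(reps, mean(sample(population, size = n)))

# Center: the mean of the sample means should be close to mu
c(center = mean(sample_means), population_mean = mu)

# Spread: the SD of the sample means should be close to sigma / sqrt(n)
c(spread = sd(sample_means), sigma_over_sqrt_n = sigma / sqrt(n))

# Shape: histogram of the simulated sampling distribution
hist(sample_means, breaks = 40,
     main = "Simulated sampling distribution of the mean")
```

Changing `n` and rerunning shows how the spread and shape of the sampling distribution respond to sample size.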

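```
library(ggplot2)
# 100 draws from a normal distribution with mean 50 and SD 5
d <- data.frame(norm_samp = rnorm(100, mean = 50, sd = 5))
# Normal probability plot: sample quantiles (y) vs. theoretical normal quantiles (x)
ggplot(data = d, aes(sample = norm_samp)) +
  geom_point(alpha = 0.7, stat = "qq")
```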

Data are plotted on the y-axis of a normal probability plot and theoretical quantiles (following a normal distribution) on the x-axis.

If the relationship between the data and the theoretical quantiles is linear, then the data follow a nearly normal distribution.

Since a linear relationship appears as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.

Data (y-coordinates) | Percentile | Theoretical Quantiles (x-coordinates) |
---|---|---|
37.5 | 0.5 / 100 = 0.005 | `qnorm(0.005) = -2.58` |
38.0 | 1.5 / 100 = 0.015 | `qnorm(0.015) = -2.17` |
38.3 | 2.5 / 100 = 0.025 | `qnorm(0.025) = -1.96` |
39.5 | 3.5 / 100 = 0.035 | `qnorm(0.035) = -1.81` |
… | … | … |
61.9 | 99.5 / 100 = 0.995 | `qnorm(0.995) = 2.58` |
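The percentile and theoretical-quantile columns of the table can be reproduced with a short sketch like the one below. The seed and the variable names are assumptions for illustration. Because the sample is random, the data column will not match the values in the table, but the percentiles \((i - 0.5) / n\) and their `qnorm` values depend only on \(n = 100\).

```r
set.seed(1)  # arbitrary seed; the data values will differ from the table above

n <- 100
norm_samp <- rnorm(n, mean = 50, sd = 5)

# y-coordinates: the observed data, sorted from smallest to largest
sorted_data <- sort(norm_samp)

# Percentile assigned to the i-th smallest observation: (i - 0.5) / n
percentiles <- (seq_len(n) - 0.5) / n

# x-coordinates: standard normal quantiles at those percentiles
theoretical <- qnorm(percentiles)

qq_table <- data.frame(
  data = sorted_data,
  percentile = percentiles,
  theoretical = theoretical
)

head(qq_table, 4)  # first rows, as in the table
tail(qq_table, 1)  # last row
```

For a sample of this size, `ppoints(n)` in base R returns the same percentiles, so `qnorm(ppoints(n))` is an equivalent shortcut.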

It is best to think about what is happening with the most extreme values, compared with what we would expect for a normal distribution:

- Heavy tails: the biggest values are bigger than we would expect and the smallest values are smaller than we would expect.
- Light tails: the biggest values are smaller than we would expect and the smallest values are bigger than we would expect.
- Right skew: the biggest values are bigger than we would expect and the smallest values are also bigger than we would expect.
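These three patterns can be checked numerically with a small sketch. The example distributions (a t distribution with 3 degrees of freedom, a Uniform(0, 1), and an Exponential(1)) are assumptions chosen for illustration and are not necessarily the ones behind the plots. Each extreme quantile is standardized by its distribution's own mean and standard deviation so that it can be compared with the normal values on the same scale.

```r
# Percentiles attached to the smallest and largest of 100 observations
p <- c(lower = 0.005, upper = 0.995)

# What a normal distribution predicts, in standard-deviation units
normal <- qnorm(p)

# Standardized extreme quantiles, (quantile - mean) / SD, for three other shapes
heavy_tails <- qt(p, df = 3) / sqrt(3)          # t(3): mean 0, SD sqrt(3)
light_tails <- (qunif(p) - 0.5) / sqrt(1 / 12)  # Uniform(0, 1): mean 0.5, SD sqrt(1/12)
right_skew  <- (qexp(p) - 1) / 1                # Exponential(1): mean 1, SD 1

extremes <- round(rbind(normal, heavy_tails, light_tails, right_skew), 2)
extremes
```

Reading each row against the `normal` row gives the same comparisons as above: both extremes further out than expected, both closer in than expected, or both shifted upward.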