# Today’s agenda

• Central Limit Theorem
• Aside: Evaluating normality graphically
• Application exercise: demonstrating the CLT via simulation

• Inference based on the Central Limit Theorem

• Due Thursday: Read Sections 2.5 - 2.8 on OpenIntro: Intro Stat with Randomization and Simulation: http://www.openintro.org/isrs

# Notation

• Means:
  • Population: mean = $$\mu$$, standard deviation = $$\sigma$$
  • Sample: mean = $$\bar{x}$$, standard deviation = $$s$$
• Proportions:
  • Population: $$p$$
  • Sample: $$\hat{p}$$
• Standard error: $$SE$$

# Central Limit Theorem

## Variability of sample statistics

• Each sample from the population yields a slightly different sample statistic (sample mean, sample proportion, etc.)

• The variability of these sample statistics is measured by the standard error

• Previously we quantified this value via simulation

• Today we talk about the theory underlying sampling distributions
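To make the idea of sample-to-sample variability concrete, here is a minimal sketch in base R. The population `pop`, the sample size of 50, and the 1,000 repetitions are assumptions chosen purely for illustration; they do not come from the course data.

```r
set.seed(1234)

# Hypothetical population (an assumption): 10,000 right-skewed values
pop <- rexp(10000, rate = 1 / 10)

# Three samples of size 50 yield three slightly different sample means
sapply(1:3, function(i) mean(sample(pop, size = 50)))

# Repeat the sampling many times; the standard deviation of the resulting
# sample means is a simulation-based estimate of the standard error
sample_means <- replicate(1000, mean(sample(pop, size = 50)))
se_sim <- sd(sample_means)
se_sim
```

The simulated standard error is much smaller than the standard deviation of the individual values in `pop`, since averaging smooths out observation-level variability.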

## Sampling distribution

• A sampling distribution is the distribution of a sample statistic across random samples of size $$n$$ taken from a population

• In practice we cannot construct the exact sampling distribution, since doing so would require access to the entire population

• Today, for demonstration purposes, we will assume we have access to the population data, construct sampling distributions, and examine their shapes, centers, and spreads
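A sketch of what such a demonstration might look like follows. It assumes a made-up, right-skewed `population` vector and a hypothetical helper `sampling_dist()`; neither is part of the course materials, and base R graphics are used only to keep the example dependency-free.

```r
set.seed(2024)

# Hypothetical population data (assumed right-skewed for illustration)
population <- rgamma(10000, shape = 2, rate = 0.1)

# Hypothetical helper: sampling distribution of the mean for sample size n
sampling_dist <- function(n, reps = 2000) {
  replicate(reps, mean(sample(population, size = n)))
}

means_10  <- sampling_dist(10)
means_100 <- sampling_dist(100)

# Center: both sampling distributions sit near the population mean
c(population = mean(population), n10 = mean(means_10), n100 = mean(means_100))

# Spread: the sample means vary less as n grows
c(n10 = sd(means_10), n100 = sd(means_100))

# Shape: compare the histograms; the larger n typically looks closer to
# symmetric and bell-shaped
hist(means_10,  main = "Sample means, n = 10")
hist(means_100, main = "Sample means, n = 100")
```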

# Evaluating normality: Normal probability plots

## Normal probability plot

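library(ggplot2)

# simulate 100 observations from a normal distribution
d <- data.frame(norm_samp = rnorm(100, mean = 50, sd = 5))

# normal probability plot: data (y) against theoretical normal quantiles (x)
ggplot(data = d, aes(sample = norm_samp)) +
  geom_point(alpha = 0.7, stat = "qq")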

## Anatomy of a normal probability plot

• Data are plotted on the y-axis of a normal probability plot and theoretical quantiles (following a normal distribution) on the x-axis.

• If there is a one-to-one relationship between the data and the theoretical quantiles, then the data follow a nearly normal distribution.

• Since a one-to-one relationship would appear as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.
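As a rough illustration of judging straightness, the sketch below uses base R's `qqnorm()` and `qqline()` on simulated data. The correlation at the end is an informal check added here for illustration, not a method from the slides.

```r
set.seed(99)

# Simulated nearly normal data, as in the earlier rnorm() example
x <- rnorm(100, mean = 50, sd = 5)

# Normal probability plot: theoretical quantiles on x, data on y
qq <- qqnorm(x)
qqline(x)  # reference line through the first and third quartiles

# Informal straightness check (an added illustration): correlation between
# theoretical quantiles and data; values near 1 mean the points hug a line
qq_cor <- cor(qq$x, qq$y)
qq_cor
```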

## Constructing a normal probability plot

| Data (y-coordinates) | Percentile | Theoretical quantiles (x-coordinates) |
|---|---|---|
| 37.5 | 0.5 / 100 = 0.005 | qnorm(0.005) = -2.58 |
| 38.0 | 1.5 / 100 = 0.015 | qnorm(0.015) = -2.17 |
| 38.3 | 2.5 / 100 = 0.025 | qnorm(0.025) = -1.96 |
| 39.5 | 3.5 / 100 = 0.035 | qnorm(0.035) = -1.81 |
| ... | ... | ... |
| 61.9 | 99.5 / 100 = 0.995 | qnorm(0.995) = 2.58 |
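The construction can be reproduced by hand in a few lines of base R. This is a sketch on freshly simulated data, so the data values will differ from those in the table; the variable names `pctl` and `theo` are chosen here for illustration.

```r
set.seed(42)

# Simulated data, mirroring the rnorm(100, mean = 50, sd = 5) example
x <- rnorm(100, mean = 50, sd = 5)
n <- length(x)

# y-coordinates: the data sorted from smallest to largest
y <- sort(x)

# Percentile of the i-th smallest observation: (i - 0.5) / n
pctl <- (seq_len(n) - 0.5) / n

# x-coordinates: standard normal quantiles at those percentiles
theo <- qnorm(pctl)

# First few rows of the construction table
head(data.frame(data = y, percentile = pctl, theoretical = round(theo, 2)), 4)

# Plotting the pairs gives the normal probability plot
plot(theo, y, xlab = "Theoretical quantiles", ylab = "Data")
```

For a sample of this size, these percentiles are the same ones that `ppoints(n)` returns, which is what `qqnorm()` uses for its x-coordinates.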

## Fat tails

It is best to focus on what is happening with the most extreme values. Here the biggest values are bigger than we would expect, and the smallest values are smaller than we would expect, under a normal distribution.

## Skinny tails

Here the biggest values are smaller than we would expect and the smallest values are bigger than we would expect.

## Right skew

Here the biggest values are bigger than we would expect and the smallest values are also bigger than we would expect.
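The three patterns (fat tails, skinny tails, right skew) can be reproduced with simulated data. In the sketch below the choice of distributions (t with 3 degrees of freedom, uniform, exponential) and the sample size are assumptions made for illustration. The most extreme standardized values are compared with the extremes a normal sample of the same size would be expected to show.

```r
set.seed(7)
n <- 5000

# Simulated examples of each pattern (distribution choices are assumptions)
samples <- list(
  fat_tails    = rt(n, df = 3),     # heavier tails than a normal
  skinny_tails = runif(n, -1, 1),   # lighter tails than a normal
  right_skew   = rexp(n, rate = 1)  # long right tail, short left tail
)

# Expected smallest and largest z-scores for a normal sample of size n
expected <- qnorm(c(0.5, n - 0.5) / n)

# Observed smallest and largest standardized values in each sample
extremes <- t(sapply(samples, function(x) {
  z <- (x - mean(x)) / sd(x)
  c(smallest = min(z), largest = max(z))
}))
round(rbind(normal_expected = expected, extremes), 2)

# Normal probability plots of the three samples, side by side
op <- par(mfrow = c(1, 3))
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)
  qqline(samples[[nm]])
}
par(op)
```

Reading the table row by row against `normal_expected` mirrors the verbal descriptions: compare each sample's smallest and largest values with what the normal model predicts.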