Today’s agenda

  • Central Limit Theorem
  • Aside: Evaluating normality graphically
  • Application exercise: demonstrating the CLT via simulation

  • Inference based on the Central Limit Theorem

  • Due Thursday: Read Sections 2.5-2.8 of OpenIntro: Intro Stat with Randomization and Simulation: http://www.openintro.org/isrs

Notation

  • Means:
    • Population: mean = \(\mu\), standard deviation = \(\sigma\)
    • Sample: mean = \(\bar{x}\), standard deviation = \(s\)
  • Proportions:
    • Population: \(p\)
    • Sample: \(\hat{p}\)
  • Standard error: \(SE\)

Central Limit Theorem

Variability of sample statistics

  • Each sample from the population yields a slightly different sample statistic (sample mean, sample proportion, etc.)

  • The variability of these sample statistics is measured by the standard error

  • Previously we quantified this value via simulation

  • Today we talk about the theory underlying sampling distributions
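
The points above can be sketched in a few lines of R. The population here is hypothetical (a right-skewed gamma distribution chosen only for illustration), and the names pop, five_means, sample_means, and se_sim are assumptions, not part of the course materials. The standard deviation of the simulated sample means serves as a simulation-based estimate of the standard error.

```r
set.seed(42)  # for reproducibility

# Hypothetical population (an assumption, for illustration only)
pop <- rgamma(10000, shape = 2, rate = 0.5)

n <- 30  # sample size

# Five samples from the same population give five different sample means
five_means <- replicate(5, mean(sample(pop, size = n)))
print(round(five_means, 2))

# Repeat many times: the SD of the sample means estimates the standard error
sample_means <- replicate(5000, mean(sample(pop, size = n)))
se_sim <- sd(sample_means)
print(se_sim)
```

The simulated standard error is much smaller than the standard deviation of the population itself, since averaging over 30 observations smooths out individual variability.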

Sampling distribution

  • A sampling distribution is the distribution of a sample statistic across random samples of size \(n\) taken from a population

  • In practice it is impossible to construct a sampling distribution exactly, since doing so would require access to the entire population

  • Today, for demonstration purposes, we will assume we have access to the population data, construct sampling distributions, and examine their shapes, centers, and spreads
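
A minimal sketch of that demonstration, assuming a made-up population: pop, sampling_dist, and results are hypothetical names, and the exponential population is an arbitrary choice. The theory column uses the standard result that the standard error of a sample mean is \(\sigma / \sqrt{n}\), which is not stated in the slides above and is included here only for comparison.

```r
set.seed(1)

# Hypothetical population data (assumed available for demonstration)
pop <- rexp(10000, rate = 1 / 10)    # right-skewed, mean near 10
mu <- mean(pop)
sigma <- sqrt(mean((pop - mu)^2))    # population SD (divides by N, not N - 1)

# Sampling distribution of the mean for samples of size n
sampling_dist <- function(n, reps = 2000) {
  replicate(reps, mean(sample(pop, size = n)))
}

# Center and spread of the sampling distribution for several sample sizes
results <- do.call(rbind, lapply(c(10, 50, 200), function(n) {
  xbars <- sampling_dist(n)
  data.frame(n = n,
             center = mean(xbars),        # should be close to mu
             spread = sd(xbars),          # simulated standard error
             theory = sigma / sqrt(n))    # standard error from theory
}))
print(results)
```

The center should stay near the population mean for every \(n\), while the spread shrinks as \(n\) grows. A histogram or normal probability plot of the simulated means for each \(n\) would show the shape.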

Evaluating normality: Normal probability plots

Normal probability plot

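library(ggplot2)

# Simulate 100 draws from a normal distribution (mean 50, SD 5)
d <- data.frame(norm_samp = rnorm(100, mean = 50, sd = 5))

# Normal probability plot: data on y, theoretical quantiles on x
ggplot(data = d, aes(sample = norm_samp)) +
  geom_point(alpha = 0.7, stat = "qq")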

Anatomy of a normal probability plot

  • Data are plotted on the y-axis of a normal probability plot and theoretical quantiles (following a normal distribution) on the x-axis.

  • If the relationship between the data and the theoretical quantiles is approximately linear, then the data follow a nearly normal distribution.

  • The closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.

Constructing a normal probability plot

Data (y-coordinates)    Percentile               Theoretical quantiles (x-coordinates)
37.5                    0.5 / 100 = 0.005        qnorm(0.005) = -2.58
38.0                    1.5 / 100 = 0.015        qnorm(0.015) = -2.17
38.3                    2.5 / 100 = 0.025        qnorm(0.025) = -1.96
39.5                    3.5 / 100 = 0.035        qnorm(0.035) = -1.81
...                     ...                      ...
61.9                    99.5 / 100 = 0.995       qnorm(0.995) = 2.58
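
The construction in the table can be reproduced by hand. The sketch below assumes a freshly simulated sample of 100 values, so the data column will not match the table; x, pctl, q, and qq are hypothetical names. The percentile for the i-th smallest observation is taken to be (i - 0.5) / n, as in the table.

```r
set.seed(7)

# Hypothetical sample of 100 values (an assumption, not the slide's data)
x <- rnorm(100, mean = 50, sd = 5)
n <- length(x)

y <- sort(x)                      # ordered data (y-coordinates)
pctl <- (seq_len(n) - 0.5) / n    # percentiles: 0.005, 0.015, ..., 0.995
q <- qnorm(pctl)                  # theoretical quantiles (x-coordinates)

qq <- data.frame(data = y, percentile = pctl, theoretical = round(q, 2))
print(head(qq, 4))
print(tail(qq, 1))

# Base R's qqnorm() computes the same x-coordinates when n > 10
built_in <- qqnorm(x, plot.it = FALSE)
print(all.equal(sort(built_in$x), q))
```

Plotting qq$data against qq$theoretical gives the same picture as the ggplot2 version shown earlier.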

Fat tails

It is best to focus on the most extreme values. Here the biggest values are bigger than we would expect and the smallest values are smaller than we would expect (for a normal distribution).

Skinny tails

Here the biggest values are smaller than we would expect and the smallest values are bigger than we would expect.

Right skew

Here the biggest values are bigger than we would expect and the smallest values are also bigger than we would expect.
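
These three patterns can be reproduced with simulated data. The sketch below is an illustration under assumptions: the distributions (t with 3 degrees of freedom, uniform, exponential) are arbitrary stand-ins for fat tails, skinny tails, and right skew, and shapes and p are hypothetical names. It adds a reference line with geom_qq_line() so that departures at the extremes are easier to see.

```r
library(ggplot2)

set.seed(123)
n <- 500

# Hypothetical samples chosen to illustrate each pattern
shapes <- data.frame(
  value = c(rt(n, df = 3), runif(n), rexp(n)),
  shape = rep(c("Fat tails (t, df = 3)",
                "Skinny tails (uniform)",
                "Right skew (exponential)"), each = n)
)

# One normal probability plot per shape, each with a reference line
p <- ggplot(shapes, aes(sample = value)) +
  geom_qq(alpha = 0.5) +
  geom_qq_line() +
  facet_wrap(~ shape, scales = "free_y")
print(p)
```

In each panel, compare the points at the far left and far right with the line: points above the line are bigger than a normal model would predict, and points below it are smaller.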