September 30, 2014

Due 10/21: https://stat.duke.edu/courses/Fall14/sta112.01/project/mt_project.html

Each sample from the population yields a slightly different sample statistic (sample mean, sample proportion, etc.)

The variability of these sample statistics is measured by the **standard error**. Previously we quantified this value via simulation.
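As a reminder, the simulation approach can be sketched as follows (the population, sample size, and number of repetitions here are arbitrary choices for illustration):

```r
# Estimate the standard error of the sample mean by simulation:
# draw many samples from a population and look at the spread
# of the resulting sample means.
set.seed(1)
population = rnorm(100000, mean = 50, sd = 5)  # hypothetical population

sample_means = replicate(1000, mean(sample(population, size = 25)))

sd(sample_means)  # simulated standard error
# compare to the theoretical SE = sigma / sqrt(n) = 5 / sqrt(25) = 1
```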

Today we talk about the theory underlying **sampling distributions**.

```r
library(ggplot2)

temp = rnorm(100, mean = 50, sd = 5)

# normal probability plot
g = qplot(sample = temp, stat = "qq")
g + geom_abline(intercept = mean(temp), slope = sd(temp), linetype = "dashed")
```

```r
qqnorm(temp)
qqline(temp)
```

Data are plotted on the y-axis of a normal probability plot and theoretical quantiles (following a normal distribution) on the x-axis.

If there is a one-to-one relationship between the data and the theoretical quantiles, then the data follow a nearly normal distribution.

Since a one-to-one relationship would appear as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.
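One way to see this numerically is to compare simulated data against the theoretical quantiles directly (the distributions below are arbitrary choices for illustration): the correlation between the sorted data and the theoretical quantiles is very close to 1 for nearly normal data and noticeably lower for skewed data.

```r
set.seed(2)
x_norm = rnorm(100)  # nearly normal data
x_skew = rexp(100)   # right-skewed data

# theoretical quantiles at the percentiles (i - 0.5) / n
probs = (1:100 - 0.5) / 100
theoretical = qnorm(probs)

# closer to 1 means closer to a straight line on the probability plot
cor(sort(x_norm), theoretical)
cor(sort(x_skew), theoretical)
```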

| Data (y-coordinates) | Percentile | Theoretical Quantiles (x-coordinates) |
|---|---|---|
| 37.5 | 0.5 / 100 = 0.005 | `qnorm(0.005) = -2.58` |
| 38.0 | 1.5 / 100 = 0.015 | `qnorm(0.015) = -2.17` |
| 38.3 | 2.5 / 100 = 0.025 | `qnorm(0.025) = -1.96` |
| 39.5 | 3.5 / 100 = 0.035 | `qnorm(0.035) = -1.81` |
| … | … | … |
| 61.9 | 99.5 / 100 = 0.995 | `qnorm(0.995) = 2.58` |
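The percentile column follows the rule (i − 0.5)/n for the i-th smallest of n = 100 observations, so the theoretical quantiles can be reproduced in one line:

```r
n = 100
i = c(1:4, 100)              # first four and last observations
percentile = (i - 0.5) / n   # 0.005 0.015 0.025 0.035 0.995
round(qnorm(percentile), 2)  # -2.58 -2.17 -1.96 -1.81  2.58
```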

```r
qqnorm(temp)
qqline(temp)
t = sort(temp)
abline(v = c(-2.58, -2.17, -1.96, -1.81, 2.58), lty = 2, col = 1:5)
abline(h = c(t[1:4], t[100]), lty = 2, col = 1:5)
```

It is best to think about what is happening with the most extreme values. Here the biggest values are bigger than we would expect and the smallest values are smaller than we would expect (for a normal distribution): the data have heavier tails than the normal model.

Here the biggest values are smaller than we would expect and the smallest values are bigger than we would expect: the data have shorter tails than the normal model.
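Both tail patterns are easy to produce with simulated data (the distributions below are arbitrary choices for illustration, not from the notes): a t distribution with few degrees of freedom has heavier tails than the normal, while a uniform distribution has shorter tails.

```r
set.seed(3)
par(mfrow = c(1, 2))

# heavier tails than normal: extreme points bend away from the line
heavy = rt(100, df = 3)
qqnorm(heavy, main = "Heavy tails")
qqline(heavy)

# shorter tails than normal: extreme points bend toward the center
short = runif(100)
qqnorm(short, main = "Short tails")
qqline(short)
```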