This application exercise explores the Central Limit Theorem (for means) via simulation.
pops <- read.csv("https://stat.duke.edu/~mc301/data/pops.csv")
Assume each column of this dataset represents a population. Each team will work with one of the populations (columns):
normal
some_rs
very_ls
wonky
Use 15,000 samples for each sampling distribution.
Note that sampling distributions are created by taking random samples, with replacement, from the original population. (This is just like taking a bootstrap sample, except that you resample from the population instead of from a sample.)
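A minimal sketch of that resampling step, under stated assumptions: a simulated right-skewed population stands in for one column of `pops`, the sample size `n = 50` is a hypothetical choice (the text here does not fix one), and the variable names are illustrative only.

```r
set.seed(42)  # make the simulation reproducible

# Stand-in population (assumption): right-skewed values playing the role
# of one column of pops
population <- rexp(10000, rate = 1)

n      <- 50     # sample size (hypothetical choice)
n_sims <- 15000  # number of samples, as the exercise asks

# Draw n_sims random samples with replacement and record each sample mean
sample_means <- replicate(
  n_sims,
  mean(sample(population, size = n, replace = TRUE))
)

# Store the means in a data frame so they can be passed to ggplot()
sampling_dist <- data.frame(sample_means = sample_means)
```

To work with a real population, replace `population` with your team's column, for example `pops$normal`.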
Make histograms and normal probability plots of these distributions. You should know how to make histograms by now, and remember from the slides that you can make normal probability plots using
library(ggplot2)  # provides ggplot(), aes(), and geom_point()
ggplot(data = name_of_dataframe, aes(sample = name_of_variable)) +
  geom_point(stat = "qq")
Describe the shapes of these distributions, and calculate the centers (mean) and the spreads (standard deviation). Compare these to the shapes, centers, and spreads of the parent population distributions from (1).
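For the comparison, the Central Limit Theorem suggests that the sample means should be centered near the population mean, with a standard deviation close to the population standard deviation divided by the square root of the sample size. The sketch below checks this on a simulated stand-in population; the population, `n = 50`, and all variable names are assumptions for illustration, not part of the exercise.

```r
set.seed(1)  # reproducible simulation

# Stand-in population (assumption) and a hypothetical sample size
population <- rexp(10000, rate = 1)
n <- 50

# Sampling distribution of the mean from 15,000 samples drawn with replacement
sample_means <- replicate(
  15000,
  mean(sample(population, size = n, replace = TRUE))
)

# Side-by-side centers and spreads of the population and the sampling distribution
comparison <- data.frame(
  distribution = c("population", "sampling"),
  center       = c(mean(population), mean(sample_means)),
  spread       = c(sd(population), sd(sample_means))
)
comparison

# Spread predicted by the CLT for the sample means
predicted_se <- sd(population) / sqrt(n)
predicted_se
```

The `spread` value in the `sampling` row should land close to `predicted_se`, while the two `center` values should nearly match.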
[Optional] If time allows (i.e. your team finishes before others), repeat the same exercise with the other populations (columns) as well.
You do not need to write a function, but if you do, it will be really easy to repeat for other sample sizes (or other populations).
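One possible shape for such a function is sketched below. The helper name `sim_sample_means`, the stand-in population, and the sample sizes 10, 50, and 200 are all hypothetical choices made for illustration.

```r
# Hypothetical helper: simulate the sampling distribution of the mean by
# drawing n_sims samples of size n, with replacement, from a population
sim_sample_means <- function(population, n, n_sims = 15000) {
  replicate(n_sims, mean(sample(population, size = n, replace = TRUE)))
}

set.seed(2024)  # reproducible simulation

# Stand-in population (assumption); swap in a column such as pops$normal
population <- rexp(10000, rate = 1)

# Hypothetical sample sizes to compare
sample_sizes <- c(10, 50, 200)

# One row per sample size: the center and spread of the simulated means
summaries <- t(sapply(sample_sizes, function(n) {
  means <- sim_sample_means(population, n)
  c(n = n, center = mean(means), spread = sd(means))
}))
summaries
```

Because the population is an argument, repeating the exercise for another column only requires changing the first argument of the call.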
Your submission should be an R Markdown file in your team App Ex repo, in a folder called AppEx_09.
End of class today
Merge conflicts on GitHub – you’re working in the same repo now!
Issues will arise, and that’s fine! Commit and push often, and ask questions when stuck.