Lab 06: Simulation-based inference

Due: Thu, Mar 18 at 11:59pm ET

Goals

Getting started

Every team member should go to the course GitHub organization and locate their lab_06 repository, which should be named lab_06-<team name>. Copy the URL of the repository and clone the remote repo in RStudio.

As you work on this lab, merge conflicts may arise. Refer back to Lab 05 for how to resolve them. You and your team are free to divide up the work however you think is best. However, everyone should understand all of the code in the lab's final submission.

Exercises

Packages

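library(tidyverse)  # data wrangling and plotting; also loads forcats, which provides gss_cat
library(infer)      # tidy, simulation-based statistical inference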

Data

In this lab, you'll work with two datasets.

The ToothGrowth dataset can be loaded into R with data("ToothGrowth"). It contains the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC). For the purposes of this lab, we will ignore the dose variable.
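Before starting the exercises, it can help to confirm what the data look like. The sketch below loads ToothGrowth and counts the animals in each delivery group; glimpse() and count() come from the tidyverse packages loaded above. In the data, len is the odontoblast length and supp is the delivery method.

```r
library(tidyverse)

# Load the built-in ToothGrowth data into the session
data("ToothGrowth")

# Columns: len (odontoblast length), supp (OJ or VC), dose (mg/day)
glimpse(ToothGrowth)

# Number of guinea pigs in each delivery group (30 per group)
ToothGrowth %>%
  count(supp)
```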

The second dataset is a subset of gss_cat from the forcats package. It contains categorical variables from the General Social Survey in 2014.

# Keep only the responses from the 2014 survey
gss_2014 <- gss_cat %>% 
  filter(year == 2014)

General instructions

Use ToothGrowth for Exercises 1 - 4.

  1. Suppose you are interested in computing a confidence interval for the mean length of odontoblasts in guinea pigs that received vitamin C. Given this description and the ToothGrowth data, identify the population, parameter of interest, sample, sample size, and observed sample mean.

  2. Create a 99% confidence interval for the mean length of odontoblasts in guinea pigs that received vitamin C. Interpret your interval in the context of the data.
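As a sketch of the general infer workflow for a bootstrap confidence interval for a single mean, the example below uses the built-in mtcars data (mean mpg) as a stand-in rather than ToothGrowth, so the exercise itself is left to you. The seed value and the 1,000 replicates are illustrative choices, not requirements.

```r
library(tidyverse)
library(infer)

set.seed(1234)  # make the bootstrap resampling reproducible

# Bootstrap distribution of the sample mean (stand-in data: mtcars mpg)
boot_means <- mtcars %>%
  specify(response = mpg) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# 99% percentile interval: the middle 99% of the bootstrap means
ci_99 <- boot_means %>%
  get_ci(level = 0.99, type = "percentile")

ci_99
```

For ToothGrowth, the same pipeline applies with the response changed to the odontoblast length variable.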

  3. Look at the example in the infer documentation for creating a confidence interval when you have one numeric variable and one categorical variable with two levels. Create a 95% confidence interval for the difference in mean length of odontoblasts between guinea pigs that received vitamin C via OJ and those that received it via VC. Define the difference as OJ - VC. Your answer should show the observed sample statistic, a histogram of the bootstrap distribution of the difference in means, the 95% confidence interval, and an interpretation of that interval.
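Here is a minimal sketch of that pattern, again on stand-in data: mpg by transmission type in mtcars. The trans variable is hypothetical and is created here from am (1 = manual); the order argument sets the direction of the difference (first level minus second), which plays the role of OJ - VC in the exercise.

```r
library(tidyverse)
library(infer)

set.seed(1234)

# Stand-in data with a hypothetical two-level factor `trans`
mtcars_trans <- mtcars %>%
  mutate(trans = factor(if_else(am == 1, "manual", "automatic")))

# Observed statistic: mean(manual) - mean(automatic)
obs_diff <- mtcars_trans %>%
  specify(mpg ~ trans) %>%
  calculate(stat = "diff in means", order = c("manual", "automatic"))

# Bootstrap distribution of the difference in means
boot_diffs <- mtcars_trans %>%
  specify(mpg ~ trans) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "diff in means", order = c("manual", "automatic"))

# 95% percentile interval
ci_95 <- get_ci(boot_diffs, level = 0.95, type = "percentile")

# Histogram of the bootstrap differences with the interval shaded
visualize(boot_diffs) +
  shade_confidence_interval(endpoints = ci_95)
```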

  4. Based on your results in Exercise 3, can you conclude that orange juice is a better delivery method of vitamin C than ascorbic acid as it relates to tooth growth in guinea pigs? Why or why not?

Use gss_2014 for Exercises 5 - 7.

  5. The 2010 census found that the proportion of U.S. adults who were married was 0.48. Using the 2014 sample data, perform a hypothesis test at the \(\alpha = 0.01\) significance level to determine whether this proportion has changed. Write your hypotheses using the notation introduced in the course. Your answer should include a simulated null distribution, a p-value, and a written conclusion. Because the sample size is large, generate only 1,000 replicates.
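The sketch below shows the infer steps for a simulation-based test of a single proportion. It uses mtcars as stand-in data, with a hypothetical two-level variable trans and an arbitrary illustrative null value of p = 0.5; neither comes from the exercise. For gss_2014 you would first need a two-level variable derived from marital and the null value from the census.

```r
library(tidyverse)
library(infer)

set.seed(1234)

# Stand-in data with a hypothetical two-level factor `trans`
mtcars_trans <- mtcars %>%
  mutate(trans = factor(if_else(am == 1, "manual", "automatic")))

# Observed sample proportion of "manual"
p_hat <- mtcars_trans %>%
  specify(response = trans, success = "manual") %>%
  calculate(stat = "prop")

# Null distribution under the illustrative H0: p = 0.5
null_dist <- mtcars_trans %>%
  specify(response = trans, success = "manual") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 1000, type = "draw") %>%
  calculate(stat = "prop")

# Null distribution with the two-sided p-value region shaded
visualize(null_dist) +
  shade_p_value(obs_stat = p_hat, direction = "two-sided")

# Two-sided p-value: share of simulated proportions at least as extreme as p_hat
p_value <- get_p_value(null_dist, obs_stat = p_hat, direction = "two-sided")
p_value
```

The two-sided direction matches a "has changed" alternative, since a change in either direction counts as evidence against the null.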

  6. Given your conclusion in Exercise 5, what type of error could have been made?

  7. Suppose the significance level in Exercise 5 were 0.10. Would your conclusion change? If so, how?

Submission

Upload your team’s PDF to Gradescope. Include every team member’s name in the Gradescope submission and identify which problems are on each page in Gradescope. Associate the “Overall” section with the first page of your PDF.

Include all team members’ names with the team name in the author portion of the YAML header.

There should only be one submission per team on Gradescope.

References

“Infer - Tidy Statistical Inference”. Infer.Netlify.App, 2021, https://infer.netlify.app/index.html.