Getting Started

To accept this assignment click here: https://classroom.github.com/a/M5yxnS1F.

Navigate to your repository whose name begins with hw4- and clone it into RStudio Cloud. Configure git with the use_git_config() function from the usethis package, then give your project a meaningful name (hw4-[name]). You may cache your login credentials, but remember that they are stored only for this single project.
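
As a minimal sketch of the git configuration step, the call below sets the name and email that git attaches to your commits. The name and email shown are placeholders (assumptions for illustration, not values from this assignment); replace them with your own.

```r
library(usethis)

# Placeholder identity (hypothetical values); use your own name and the
# email address associated with your GitHub account.
use_git_config(
  user.name = "Your Name",
  user.email = "your.email@example.com"
)
```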

Be sure to follow good coding style and commit often.

Packages

Your code should use only functions from the R packages loaded below, unless an Exercise explicitly states otherwise.

```
library(tidyverse)
library(infer)
```

Control RNG

To control R’s random number generation process and ensure reproducibility, set the seed with

```
set.seed(5768952)
```
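
A small sketch of what setting the seed buys you: resetting the same seed before a random draw reproduces that draw exactly. The variable names and the use of sample() here are illustrative assumptions, not part of the assignment.

```r
# Draw three values, reset the seed, and draw again.
set.seed(5768952)
first_draw <- sample(1:100, size = 3)

set.seed(5768952)
second_draw <- sample(1:100, size = 3)

# TRUE: the same seed reproduces the same draws
identical(first_draw, second_draw)
```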

Exercises

Conceptual understanding of simulation-based inference

Describe precisely how you would set up and perform the full simulation process for the following inference procedures. You may put your explanation in the context of using index cards or chips to represent the data. In each of the scenarios you can assume the original sample size is 100 and the number of simulation replications is 15,000.

1. Describe the simulation process for a hypothesis test on a single population standard deviation. Suppose the research question asks whether the standard deviation of IQ scores is less than 10, and the observed sample standard deviation is 7.5.
2. Describe the simulation process for a hypothesis test on a single population proportion. Suppose the research question asks whether successes are a majority, and the observed sample proportion of successes is 0.52.
3. Describe the simulation process for creating a 95% confidence interval for the population intercept in a simple linear regression model. Assume the population model is of the form \(y = \beta_0 + \beta_1x\).

Interpreting polls and surveys

Read “Diabetes Rates Rise in 18 States in Past Decade” (Gallup, 2018) along with its Survey Methods section at the bottom of the article. Use the information provided in the article to complete Exercises 4-7.

4. What sample size did Gallup use in the 2016-2017 nationwide study?
5. Gallup states, “Nationwide, the diabetes rate rose to 11.5% in 2016-2017, up 0.7 percentage points compared with the 10.8% measured in 2008-2009 and representing a net increase of about 1.7 million U.S. adults who report having been diagnosed with the disease over that time.” What do the quantities 11.5% and 10.8% represent?
6. Provide and interpret a 95% confidence interval for the proportion of U.S. adults who have diabetes, using the 2016-2017 data. Hint: the sampling margin of error is a function of the variability in the sample statistic; if it is known, the confidence interval is point estimate \(\pm\) sampling margin of error.
7. Provide and interpret a 95% confidence interval for the proportion of adults in Alaska who have diabetes, using the 2016-2017 data. Hint: the sampling margin of error is a function of the variability in the sample statistic; if it is known, the confidence interval is point estimate \(\pm\) sampling margin of error.
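
The arithmetic in the hint can be sketched as follows. Both numbers are hypothetical placeholders chosen for illustration; they are not taken from the Gallup article, so substitute the values you find there.

```r
# Hypothetical values for illustration only (not from the article)
point_estimate  <- 0.40  # sample proportion
margin_of_error <- 0.03  # sampling margin of error at the 95% level

# Confidence interval: point estimate plus or minus the margin of error
ci <- c(
  lower = point_estimate - margin_of_error,
  upper = point_estimate + margin_of_error
)
ci
```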

Simulation-based inference

8. One duty of the Nevada Gaming Commission (NGC) is to ensure that casino games are fair as stated by the rules of the game. Suppose an NGC employee records a random sample of 200 spins of a single American Roulette wheel. Based on the sample data, perform a simulation-based hypothesis test at the 0.001 significance level to investigate whether the wheel is biased with regard to the ball landing on red. To get full credit on this problem you must:
- correctly write out the hypotheses,
- plot the simulated null distribution based on 5,000 replications,
- compute and display the p-value, and
- write a conclusion within the context of the problem.
```
roulette <- read_csv("data/roulette.csv")
```
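
For orientation, here is a hedged sketch of the general infer workflow for a simulation-based test of one proportion. It uses a toy coin-flip data set invented for this example, not the roulette data; the data frame flips, its column side, the null value 0.5, and the 1,000 replications are all assumptions made for illustration. The generation type is left unset so that infer chooses the one appropriate for a point null.

```r
library(tidyverse)
library(infer)

set.seed(1)  # arbitrary seed for this sketch only

# Toy data (hypothetical): 60 coin flips, not the roulette data
flips <- tibble(side = sample(c("heads", "tails"), size = 60, replace = TRUE))

# Observed sample proportion of heads
obs_stat <- flips %>%
  specify(response = side, success = "heads") %>%
  calculate(stat = "prop")

# Simulated null distribution under H0: p = 0.5
null_dist <- flips %>%
  specify(response = side, success = "heads") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 1000) %>%
  calculate(stat = "prop")

# Plot the null distribution with the p-value region shaded
visualize(null_dist) +
  shade_p_value(obs_stat = obs_stat, direction = "two-sided")

# Two-sided p-value
p_val <- null_dist %>%
  get_p_value(obs_stat = obs_stat, direction = "two-sided")
p_val
```

Adapting this to another problem means changing the data, the success level, the null value, the direction of the alternative, and the number of replications to match the stated hypotheses.
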
9. Use the object women (available in R: see ?women) to create a 95% confidence interval for the population correlation coefficient between the height and weight of all American women. What assumptions must you make about this dataset? To get full credit on this problem you must:
- state any necessary assumptions,
- plot the simulated distribution based on 10,000 replications,
- compute and display the confidence interval, and
- write an interpretation of the interval within the context of the problem.
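
As a rough sketch of a bootstrap confidence interval for a correlation with infer, the example below uses the built-in cars data (speed and stopping distance) rather than women. The choice of data set, the 1,000 replications, and the percentile method are assumptions made for illustration, not requirements of the exercise.

```r
library(tidyverse)
library(infer)

set.seed(1)  # arbitrary seed for this sketch only

# Bootstrap distribution of the sample correlation between speed and
# stopping distance in the built-in cars data (not the women data)
boot_dist <- cars %>%
  specify(dist ~ speed) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "correlation")

# Percentile-method 95% confidence interval
ci <- boot_dist %>%
  get_confidence_interval(level = 0.95, type = "percentile")

# Plot the bootstrap distribution with the interval shaded
visualize(boot_dist) +
  shade_confidence_interval(endpoints = ci)

ci
```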

Submission

Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.

Please upload only your PDF document to Gradescope. Before you submit, mark the page or pages on which your answer to each exercise appears. If an answer spans multiple pages, mark all of them. Make sure to associate the “Overall” section with the first page.

References

Gallup, Inc. (2018). Diabetes Rates Rise in 18 States in Past Decade. Gallup.com. Retrieved 30 March 2020, from https://news.gallup.com/poll/243911/diabetes-rates-rise-states-past-decade.aspx

HW 04 - Simulation-based Inference

Due: Thursday, Apr 09 at 11:59pm EST