Here are the steps for getting started:
and voilà, you’re done! Once you push your changes back, you do not need to do anything else to “submit” your work. You can, of course, push multiple times throughout the assignment. At the deadline we will take whatever is in your repo as your final submission and grade the state of your work at that time (so even if you made mistakes earlier, you won’t be penalized for them as long as the final state of your work is correct).
Describe precisely how you would set up the simulation for each of the following hypothesis tests. Imagine using index cards or chips to represent the data. Also specify whether the null hypothesis would be independence or point, and whether the simulation type would be bootstrap, simulate, or permute. In each scenario you can assume that the sample size is 100 and the number of simulations is 15,000.
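To make the index-card picture concrete, here is a minimal sketch of the three simulation types in plain Python. It only illustrates the mechanics and is not a substitute for the written descriptions the exercise asks for. The function names, the toy data, and the choice of Python are all assumptions made for this sketch; none of them come from the course materials.

```python
import random


def bootstrap_sample(cards, rng):
    """Bootstrap: draw a card, record it, put it back; repeat len(cards) times."""
    return [rng.choice(cards) for _ in cards]


def simulate_sample(p_null, n, rng):
    """Simulate: draw n chips from a bag whose share of successes is p_null."""
    return ["success" if rng.random() < p_null else "failure" for _ in range(n)]


def permute_sample(responses, groups, rng):
    """Permute: shuffle the response cards and re-deal them to the fixed group labels."""
    shuffled = responses[:]
    rng.shuffle(shuffled)
    return list(zip(groups, shuffled))


rng = random.Random(2012)
n = 100  # sample size assumed in the exercise

# Hypothetical toy data: one numerical variable and a two-level grouping variable.
values = [rng.gauss(50, 10) for _ in range(n)]
groups = ["a"] * 50 + ["b"] * 50

boot = bootstrap_sample(values, rng)        # one resample of a numerical variable
sim = simulate_sample(0.5, n, rng)          # one sample under a point null, p = 0.5
perm = permute_sample(values, groups, rng)  # one shuffle under an independence null
```

Each call produces a single simulated sample. A full simulation would repeat the draw 15,000 times and record one statistic per repetition. For a point null on a numerical variable, the cards are usually shifted first so that their center matches the null value; that step is left out here.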
Each year since 2005, the US Census Bureau has surveyed about 3.5 million households with the American Community Survey (ACS). Data collected from the ACS have been crucial in government and policy decisions, helping to determine the allocation of more than $400 billion in federal and state funds each year. For example, funds for the Adult Education and Family Literacy Act are distributed to states taking into consideration ACS data on the number of adults 16 and over without a high school diploma. This act is the primary source of federal funding for adults with low basic skills seeking further education or English language services, and the Department of Education uses ACS data to ensure the efficient distribution of funds.
The ACS received a surge of media attention in Spring 2012 when the House of Representatives voted to eliminate the survey. Daniel Webster, a first-term Republican congressman from Florida, sponsored the legislation citing the following reasons:
In this part we use data from the 2012 ACS to answer a few questions about American adults. For each of the questions make sure to use the infer package in your answers and interpret your results in context of the data. For hypothesis tests make sure to state your hypotheses clearly. You can use 1,000 repetitions in your simulations for computational efficiency reasons.
The dataset you will use is called acs12 and it’s in the openintro package.
Do these data provide convincing evidence of a difference in median income of employed Americans who do and do not speak English at home?
Construct a 95% confidence interval for the median travel time to work of Americans who are employed.
Do these data provide convincing evidence of a difference in the proportions of Americans who are employed between those who are citizens and those who are not?
Construct a 90% confidence interval for the difference in the proportions of Americans who are citizens between those who speak English at home and those who do not.
Construct a 95% confidence interval for the difference in median incomes of employed Americans born in the first half of the year vs. those born in the second half.
Pick your own: State a question you want to answer with these data and answer it using the appropriate inference method.
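The questions above come down to two computations on simulated statistics: a percentile interval from a bootstrap distribution and a p-value from a permutation distribution. The sketch below shows both in plain Python on hypothetical toy data. It does not use the acs12 values or the infer package, and the helper names and the two-sided p-value convention (share of shuffled differences at least as large in absolute value as the observed one) are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)
REPS = 1000  # number of repetitions suggested in the assignment


def percentile_interval(stats, level):
    """Return the middle `level` share of the simulated statistics."""
    ordered = sorted(stats)
    tail = (1 - level) / 2
    lower = ordered[round(tail * len(ordered))]
    upper = ordered[round((1 - tail) * len(ordered)) - 1]
    return lower, upper


def prop_diff(labels, outcomes):
    """Difference in success proportions: group 'a' minus group 'b'."""
    a = [o for g, o in zip(labels, outcomes) if g == "a"]
    b = [o for g, o in zip(labels, outcomes) if g == "b"]
    return sum(a) / len(a) - sum(b) / len(b)


# Hypothetical toy data standing in for a numerical variable.
times = [rng.randint(5, 60) for _ in range(100)]

# Bootstrap: resample with replacement and record the median each time.
boot_medians = [
    statistics.median(rng.choices(times, k=len(times))) for _ in range(REPS)
]
ci = percentile_interval(boot_medians, 0.95)

# Hypothetical toy data standing in for a binary outcome in two groups.
labels = ["a"] * 60 + ["b"] * 40
outcomes = [1 if rng.random() < 0.5 else 0 for _ in labels]
observed = prop_diff(labels, outcomes)

# Permutation: shuffle outcomes across the fixed labels and recompute the difference.
null_diffs = []
for _ in range(REPS):
    shuffled = outcomes[:]
    rng.shuffle(shuffled)
    null_diffs.append(prop_diff(labels, shuffled))

# Two-sided p-value under the convention stated above.
p_value = sum(abs(d) >= abs(observed) for d in null_diffs) / REPS
```

Changing the `level` argument to 0.90 gives a 90% interval, and swapping `statistics.median` for another statistic changes what is being estimated without changing the resampling logic.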
Ask your questions on the #questions channel on Slack. If your question is about an error you’re getting, make sure to clearly explain what generated the error as well as what the error says.
You are also welcome to discuss the homework with each other broadly (no sharing code!) as well as ask questions at office hours.
This is an individual assignment. You are welcome to exchange ideas with classmates and ask questions on the getting help channels discussed above; however, you may not share your text or code answers directly with classmates.
The Duke Community Standard and the course academic integrity policies apply. Please review them here, in particular the note on sharing / reusing code.