To receive credit, you must submit to Gradescope a knitted file rendered from a Quarto Markdown file. Be sure to “associate” questions appropriately on Gradescope. As a reminder, late work is not accepted outside of the 24-hour grace period for homework assignments.

The Quarto template for this assignment may be found in the repository at the following link: https://classroom.github.com/a/09cDpPCK

Today’s data are county-level health data for North Carolina, compiled by the Robert Wood Johnson Foundation from a variety of sources, including the American Community Survey and public health surveillance systems. There are eight variables in this dataset:

Important: Please continue to make regular commits. To keep your code from being cut off, you may insert line breaks as needed; good places to include them are after plus signs when specifying model predictors (+) or after commas when separating function arguments/options (,). We will be taking off points if your code runs off the page!
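The line-break advice above can be sketched as follows. This is only an illustration of the formatting: it uses R's built-in `mtcars` dataset and its variables (`mpg`, `wt`, `hp`, `am`), which are stand-ins and not the assignment data.

```r
# Illustration only: mtcars is a built-in R dataset, not the assignment data.
# Break lines after "+" between predictors and after "," between arguments
# so that no single line runs off the page in the knitted file.
fit <- lm(
  mpg ~ wt +
    hp +
    factor(am),
  data = mtcars
)

# The model summary prints the same way regardless of how the call is wrapped.
summary(fit)
```

R keeps reading the expression on the next line because each line ends with an operator or comma inside open parentheses, so the wrapped call behaves the same as a one-line call.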

Important: Please suppress warnings and messages in your R code chunks by including the options message = F, warning = F in the chunk header. For instance, `{r chunk-name, message = F, warning = F}`
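As a minimal sketch, a complete chunk with these options might look like the one below. The chunk label `load-packages` and the `tidyverse` package are assumptions chosen for illustration; use whatever label and packages your own document needs.

````markdown
```{r load-packages, message = F, warning = F}
# Assumed example: loading a package normally prints startup messages,
# which message = F keeps out of the knitted output.
library(tidyverse)
```
````

The options apply only to the chunk whose header contains them, so they need to appear in each chunk that would otherwise print messages or warnings.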

  1. Fit a model with life expectancy as the outcome variable and urbanicity as a predictor. What is the average difference in life expectancy, comparing rural vs. urban counties? What percentage of the variability in life expectancy is explained by the linear relationship with urbanicity? Is there evidence for a statistically significant difference at the \(\alpha\) = 0.05 level (no need to write out all steps in a formal hypothesis test)?
  2. Fit a model with life expectancy as the outcome variable and urbanicity and median household income as predictors. What is the average difference in life expectancy comparing rural vs. urban counties while adjusting for median household income? What percentage of the variability in life expectancy is explained by the linear relationship with urbanicity and median household income? Is there evidence for a statistically significant difference at the \(\alpha\) = 0.05 level (no need to write out all steps in a formal hypothesis test)?
  3. Comment on the differences between models 1 and 2 in terms of the estimated difference between rural and urban counties, whether this difference is statistically significant, and how the model as a whole “performs” in terms of explaining variability in life expectancy at the county level. Summarize these findings in plain English.
  4. Fit a model that predicts life expectancy based on obesity percentage, the percentage of adults who get no physical activity in their leisure time, the percentage with a long commute, median household income, and urbanicity. Are any of the slope terms statistically significant at the \(\alpha\) = 0.05 significance level? Interpret these slope terms in the context of the model. Be sure to specify units.
  5. Use your model in Exercise 4 to conduct a formal hypothesis test at the \(\alpha\) = 0.05 significance level that again evaluates the difference between urban and rural counties in terms of mean life expectancy. As part of this hypothesis test, specifically interpret the parameter estimate from your model in the context of the data.
  6. Create a high-quality visualization that summarizes the relationship between four variables: life expectancy, obesity percentage, median household income, and urbanicity.