You must submit a rendered (knitted) file, generated from your Quarto Markdown file, to Gradescope in order to receive credit. Be sure to “associate” questions appropriately on Gradescope. As a reminder, late work is not accepted outside of the 24-hour grace period for homework assignments.

The Quarto template for this assignment may be found in the repository at the following link: https://classroom.github.com/a/PQSjmLHJ

We will again use the data from last week’s homework on county-level indicators of health in North Carolina. As a reminder, here are the variables of interest from the dataset:

Important: Please continue to make regular commits and follow good coding practices (e.g., not letting code run off the page). Also, suppress warnings and messages in your R code chunks.
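
One way to suppress warnings and messages for every chunk at once is to use the `execute` options in the YAML header of the Quarto document. This is only a sketch, assuming you want the behavior document-wide; the course template may already set some of these fields, in which case add to the existing header rather than replacing it.

```yaml
# Goes inside the YAML header at the top of the .qmd file
execute:
  warning: false   # hide warnings in the rendered output
  message: false   # hide messages (e.g., package startup notes)
```

The same options can instead be set for an individual chunk by placing `#| warning: false` and `#| message: false` at the top of that R chunk.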

  1. Fit a linear model with life expectancy as the outcome variable and urbanicity as the only predictor. Display the residual plot. What do you notice? Why do you think it looks like this?
  2. Fit a linear model with life expectancy as the outcome variable and urbanicity and median household income as the only predictors considered. At the \(\alpha = 0.05\) level, is there sufficient evidence to suggest that the relationship between median household income and average life expectancy depends on the urbanicity of the county? Simply explain (no need to conduct a formal hypothesis test).
  3. From your model in Exercise 2, what is the relationship between household income and average life expectancy? Specifically, describe how average life expectancy is expected to change for each $1,000 increase in county median household income.
  4. Fit one final linear model with life expectancy as the outcome and median household income, obesity, urbanicity, % with long commute, % physically inactive, and an interaction between obesity and % physically inactive as the predictors. Interpret the slope corresponding to obesity.
  5. In HW 3, we were interested in comparing life expectancy for rural vs. urban counties (perhaps adjusted for other predictors). In your model from Exercise 4, do you find sufficient statistical evidence to suggest such a difference? Conduct a formal hypothesis test at the \(\alpha = 0.05\) level.
  6. Evaluate whether the linear model assumptions are satisfied for your model in Exercise 4. Support your answer with any plots as necessary.
  7. (Optional, if you’re bored) Are there any counties for which your model did a particularly bad job in terms of prediction? Identify them (if any), and do some research that might explain why your model performed this way.