To receive credit, you must submit to Gradescope a knitted file generated
from a Quarto Markdown file. Be sure to “associate”
questions appropriately on Gradescope. As a reminder, late work
is not accepted outside of the 24-hour grace period for homework
assignments.
The Quarto template for this assignment may be found in the
repository at the following link: https://classroom.github.com/a/PQSjmLHJ
We will again use the data from last week’s homework on county-level
indicators of health in North Carolina. As a reminder, here are the
variables of interest from the dataset:
- county: the name of the county
- life_expect: the mean life expectancy, in years, for this county
- obesity: the percentage of the county that is considered obese (BMI 30+)
- phys_inactive: the percentage of adults age 20+ reporting no
leisure-time physical activity
- uninsured: the percentage of adults under age 65 who do not have
health insurance
- long_commute: among workers who commute alone in their car, the
percentage who commute more than 30 minutes
- hhi: the median household income in the county, in thousands of dollars
- urbanicity: a categorical variable with levels “rural,” “semirural,”
“semiurban,” and “urban,” corresponding to the percentage of residents
who live in cities: rural is defined as 0-25% living in cities,
semirural as 25-50%, semiurban as 50-75%, and urban as 75-100%
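As an illustration only, the urbanicity levels described above could be derived from a city-residence percentage with base R's cut(). This is a sketch under assumptions, not how the dataset was actually built: the vector pct_city and its values are hypothetical, and because the stated ranges share their endpoints, assigning a boundary value such as exactly 25% to the lower category is a choice made here for the example.

```r
# Hypothetical input: percentage of each county's residents living in cities.
pct_city <- c(10, 30, 60, 90)

# Bin the percentages into the four urbanicity levels.
# Intervals are right-closed by default: [0, 25], (25, 50], (50, 75], (75, 100].
# That boundary handling is an assumption; the assignment does not specify it.
urbanicity <- cut(
  pct_city,
  breaks = c(0, 25, 50, 75, 100),
  labels = c("rural", "semirural", "semiurban", "urban"),
  include.lowest = TRUE  # so that exactly 0% is classified as rural
)

# Count how many (hypothetical) counties fall in each level.
table(urbanicity)
```

The result is a factor whose first level is "rural," so R would treat rural as the reference category if a factor built this way were used as a predictor in lm().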
Important: Please continue to make regular commits
and follow good coding practices (e.g., do not let code run off the
page). Also, suppress warnings and messages in your R code
chunks.
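One common way to suppress warnings and messages is to set Quarto chunk options at the top of a code chunk. The sketch below shows the body of a hypothetical chunk (in the .qmd file it would sit between a ```{r} opening fence and the closing fence); the two lines of R code are placeholders chosen only because they would otherwise print a message and a warning.

```r
#| warning: false
#| message: false

# Placeholder code: without the options above, the rendered document
# would show this message and the coercion warning below.
message("loading data")
x <- as.numeric("not a number")  # yields NA with a coercion warning
```

The same two options can also be set once for the whole document under the execute key in the YAML header, which avoids repeating them in every chunk.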
- Fit a linear model with life expectancy as the outcome variable and
urbanicity as the only predictor. Display the residual plot. What do you
notice? Why do you think it looks like this?
- Fit a linear model with life expectancy as the outcome variable and
urbanicity and median household income as the only predictors
considered. At the \(\alpha = 0.05\) level, is there sufficient evidence
to suggest that the relationship
between median household income and average life expectancy depends on
the urbanicity of the county? Briefly explain (no need to conduct a
formal hypothesis test).
- From your model in Exercise 2, what is the relationship between
household income and average life expectancy? Specifically, interpret the
expected change in average life expectancy for each $1,000 increase in
county median household income.
- Fit one final linear model with life expectancy as the outcome and
median household income, obesity, urbanicity, % with long commute, %
physically inactive, and an interaction between obesity
and % physically inactive as the predictors. Interpret the slope
corresponding to obesity.
- In HW 3, we were interested in comparing life expectancy for rural
vs. urban counties (perhaps adjusted for other predictors). In your
model from Exercise 4, do you find sufficient statistical evidence to
suggest such a difference? Conduct a formal hypothesis test at the \(\alpha = 0.05\) level.
- Evaluate whether the linear model assumptions are satisfied for your
model in Exercise 4. Support your answer with any plots as
necessary.
- (Optional, if you’re bored) Are there any counties
for which your model did a particularly bad job of prediction?
Identify them (if any), and do some research that might explain why your
model performed this way.