To receive credit, you must submit a knitted file, generated from a
Quarto Markdown file, to Gradescope. Be sure to “associate”
questions appropriately on Gradescope. As a reminder, late work
is not accepted outside of the 24-hour grace period for homework
assignments.
The Quarto template for this assignment may be found in the
repository at the following link: https://classroom.github.com/a/09cDpPCK
Today’s data are county-level health data for North Carolina,
compiled by the Robert Wood Johnson Foundation from a variety of
sources, including the American Community Survey and public health
surveillance systems. There are eight variables in this dataset:
- county: the name of the county
- life_expect: the mean life expectancy, in years, for this county
- obesity: the percentage of the county that is considered obese
  (BMI 30+)
- phys_inactive: the percentage of adults age 20+ reporting no
  leisure-time physical activity
- uninsured: the percentage of adults under age 65 who do not have
  health insurance
- long_commute: among workers who commute alone by car, the
  percentage whose commute is longer than 30 minutes
- hhi: the median household income in the county, in thousands of
  dollars
- urbanicity: a categorical variable with levels “rural,”
  “semirural,” “semiurban,” and “urban,” corresponding to the
  percentage of residents who live in cities: “rural” is 0-25%,
  “semirural” is 25-50%, “semiurban” is 50-75%, and “urban” is
  75-100%
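As a rough illustration of how the urbanicity categories map onto percentages, the sketch below builds a four-level factor with base R's cut(). The vector pct_city and its values are hypothetical, and the handling of the boundary values (25, 50, and 75) is an assumption, since the definitions above overlap at the endpoints.

```r
# Hypothetical percentages of residents living in cities (not real data)
pct_city <- c(10, 30, 60, 90)

# Assumption: a boundary value such as 25 falls in the lower category
# (right-closed intervals); the assignment text does not specify this.
urbanicity <- cut(
  pct_city,
  breaks = c(0, 25, 50, 75, 100),
  labels = c("rural", "semirural", "semiurban", "urban"),
  include.lowest = TRUE
)

# The order of the levels determines the reference category in a model
levels(urbanicity)
```

Because “rural” is the first level of this factor, lm() would treat it as the reference category if the factor were used as a predictor.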
Important: Please continue to make regular commits.
To keep your code from being “cut off,” you may insert line
breaks as needed; good places to include them are after plus signs (+)
when specifying model predictors or after commas (,) when separating
function arguments/options. We will be taking off
points if your code runs off the page!
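A minimal sketch of the line-break advice follows. The data frame toy and its values are made up for illustration; only the column names come from the variable list above.

```r
# Toy data with made-up values; only the column names follow the assignment
toy <- data.frame(
  life_expect = c(76.1, 78.4, 75.2, 79.9, 77.0, 80.3),
  obesity = c(34, 29, 36, 25, 31, 24),
  hhi = c(41, 52, 38, 67, 47, 71)
)

# Break after commas between arguments and after plus signs between
# predictors so that no single line runs off the page
fit <- lm(
  life_expect ~ obesity +
    hhi,
  data = toy
)
```

R keeps reading onto the next line whenever a line ends with a plus sign or a comma, so the call behaves exactly as it would on one line.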
Important: Please suppress warnings and messages in
your R code chunks by including the options
message = F, warning = F in your chunks. For instance,
```{r chunk-name, message = F, warning = F}
# your code here
```
- Fit a model with life expectancy as the outcome variable and
urbanicity as a predictor. What is the average difference in life
expectancy, comparing rural vs. urban counties? What percentage of the
variability in life expectancy is explained by the linear relationship
with urbanicity? Is there evidence for a statistically significant
difference at the \(\alpha\) = 0.05
level (no need to write out all steps in a formal hypothesis test)?
- Fit a model with life expectancy as the outcome variable and
urbanicity and median household income as predictors. What is
the average difference in life expectancy comparing rural vs. urban
counties while adjusting for median household income? What percentage of
the variability in life expectancy is explained by the linear
relationship with urbanicity and median household income? Is there
evidence for a statistically significant difference at the \(\alpha\) = 0.05 level (no need to write out
all steps in a formal hypothesis test)?
- Comment on the differences between the models from Exercises 1 and 2 in terms of the
estimated difference between rural and urban counties, whether this
difference is statistically significant, and how the model as a whole
“performs” in terms of explaining variability in life expectancy at the
county level. Summarize these findings in plain English.
- Fit a model that predicts life expectancy based on obesity
percentage, the percentage of adults that get zero physical activity in
their leisure time, the percentage with a long commute, median household
income, and urbanicity. Are any of the slope terms statistically
significant at the \(\alpha\) = 0.05
significance level? Interpret these slope terms in the context of the model.
Be sure to specify units.
- Use your model in Exercise 4 to conduct a formal hypothesis test at the
\(\alpha\) = 0.05 significance level
that again evaluates the difference between urban and rural counties in
terms of mean life expectancy. As part of this hypothesis test,
specifically interpret the parameter estimate from your model in the
context of the data.
- Create a high-quality visualization that summarizes the relationship
between four variables: life expectancy, obesity percentage,
median household income, and urbanicity.
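The general workflow behind these exercises might be sketched as follows. This is only an illustration on simulated data: the data frame sim and every value in it are made up, and only the column names follow the variable list above. The plot at the end assumes the ggplot2 package is installed and shows just one of many reasonable ways to display four variables at once.

```r
# Simulated stand-in data: values are made up, only the column names
# follow the assignment. Swap in the real data frame for `sim`.
set.seed(1)
n <- 100
sim <- data.frame(
  urbanicity = factor(
    sample(c("rural", "semirural", "semiurban", "urban"), n, replace = TRUE),
    levels = c("rural", "semirural", "semiurban", "urban")
  ),
  hhi = runif(n, min = 30, max = 80),
  obesity = runif(n, min = 20, max = 40)
)
sim$life_expect <- 70 + 0.1 * sim$hhi - 0.05 * sim$obesity +
  rnorm(n, sd = 1)

# Urbanicity only. "rural" is the first factor level, so it is the
# reference; `urbanicityurban` estimates urban minus rural.
m1 <- lm(life_expect ~ urbanicity, data = sim)

# The same comparison, adjusting for median household income
m2 <- lm(
  life_expect ~ urbanicity +
    hhi,
  data = sim
)

# Estimate, standard error, t statistic, and p-value for urban vs. rural
summary(m1)$coefficients["urbanicityurban", ]
summary(m2)$coefficients["urbanicityurban", ]

# Proportion of variability in the outcome explained by each model
summary(m1)$r.squared
summary(m2)$r.squared

# One possible four-variable display: position for two variables,
# color for a third, and one panel per urbanicity level
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sim, aes(x = obesity, y = life_expect, color = hhi)) +
    geom_point() +
    facet_wrap(~urbanicity) +
    labs(
      x = "Obesity (%)",
      y = "Life expectancy (years)",
      color = "Median household\nincome (thousands)"
    )
  print(p)
}
```

Since the data here are simulated, the printed estimates say nothing about North Carolina counties; the sketch only shows where each quantity asked about in the exercises can be found in the model output.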