The Quarto template for this assignment may be found in the repository at the following link: https://classroom.github.com/a/9yj-vFdX

In today’s homework we will revisit the NC county-level data compiled by the Robert Wood Johnson Foundation from a variety of sources, including the American Community Survey and public health surveillance systems. There are eight variables in this dataset:

  1. Fit an ordinal regression model predicting urbanicity with all other variables in the model (except county ID). Interpret the slope parameter corresponding to life expectancy. Using your model, what urbanicity is Durham County predicted to be? (Note: luckily, in this case the levels of the factor variable are already in order. This won’t always be the case!)
  2. Explain what the proportional odds assumption means in the context of this regression problem. As part of this answer, explain what it might mean in context if this assumption were to be violated.
  3. Now fit a multinomial regression model predicting urbanicity with all other variables in the model (except county ID). Interpret each of the slope parameters corresponding to life expectancy. Using your model, what urbanicity is Durham County predicted to be? Provide the predicted probabilities for each of the four urbanicity categories.
  4. Explain what the independence of irrelevant alternatives assumption means in the context of this regression problem. As part of this answer, explain what it might mean in context if this assumption were to be violated.
  5. Which model do you think is more appropriate to use for these data? Explain.
  6. Now fit a linear model predicting life expectancy based on the other variables in your model, as in HW 3. Which two counties have the highest influence in this model? Are you particularly worried that they are too influential? Explain.
  7. Whom are you working with for your project (if anyone)? What dataset are you using? How many observations are there, how many variables are there, and what is your potential research question?
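
For reference on questions 1 through 4, the two models can be written out as follows. This is a notational sketch only, under assumptions not stated in the prompt: the four urbanicity categories are coded $j = 1, \dots, 4$ in increasing order, $x_i$ collects the predictors for county $i$, category 1 serves as the multinomial baseline, and the ordinal model uses the "cutpoint minus linear predictor" sign convention. Your software may use a different baseline or sign convention, so check its documentation before interpreting coefficients.

```latex
% Proportional odds (ordinal) model: one slope vector \beta shared by
% all three cumulative logits; only the cutpoints \alpha_j differ.
\log \frac{P(Y_i \le j)}{P(Y_i > j)} = \alpha_j - x_i^\top \beta,
\qquad j = 1, 2, 3

% Multinomial (baseline-category) model: a separate intercept and
% slope vector for each non-baseline category k.
\log \frac{P(Y_i = k)}{P(Y_i = 1)} = \beta_{0k} + x_i^\top \beta_k,
\qquad k = 2, 3, 4
```

In this notation the ordinal model has a single life expectancy slope, while the multinomial model has three, one for each non-baseline category.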