--- title: "Multinomial Logistic Regression" subtitle: "Assumptions & Model Selection" date: "`r Sys.Date()`" output: html_document: theme: readable --- ```{r global-options, include=FALSE} knitr::opts_chunk$set(warning=FALSE, message=FALSE) ``` The goal of this exercise is to walk through a multinomial logistic regression analysis. It will give you a basic idea of the analysis steps and thought-process; however, due to class time constraints, this analysis is not exhaustive. ```{r load-packages} library(tidyverse) library(knitr) library(broom) library(patchwork) library(nnet) ``` This data is from the `NHANES` dataset found in the NHANES R package. Type `?NHANES` in the console to read more about the data and variables. We'll focus on the following variables for this analysis: - `HealthGen`: Self-reported rating of participant's health in general. Excellent, Vgood, Good, Fair, or Poor. - `Age`: Age at time of screening (in years). Participants 80 or older were recorded as 80. - `PhysActive`: Participant does moderate to vigorous-intensity sports, fitness or recreational activities - `Education`: Educational level of study participant. Reported for participants aged 20 years or older. 8thGrade, 9-11thGrade, HighSchool, SomeCollege, or CollegeGrad. ## Load and prepare data ```{r load-data} library(NHANES) nhanes_adult <- NHANES %>% filter(Age >= 20) %>% #only using participants 20 or older for Education select(HealthGen, Age, PhysActive, Education) %>% drop_na() %>% mutate(obs_num = 1:n()) ``` ## Fit multinomial logistic model ```{r results = "hide"} health_m <- multinom(HealthGen ~ Age + PhysActive, data = nhanes_adult) ``` ## Checking Assumptions ### Add predcited probabilities and residuals to data ```{r} #calculate residuals residuals <- as_tibble(residuals(health_m)) %>% #calculate residuals setNames(paste('resid.', names(.), sep = "")) %>% #update column names mutate(obs_num = 1:n()) #add obs number ``` ```{r} #calculate predicted probabilities pred_probs <- as_tibble(predict(health_m, type = "probs")) %>% mutate(obs_num = 1:n()) ``` ```{r} health_m_aug <- inner_join(nhanes_adult, pred_probs) #add probs health_m_aug <- inner_join(health_m_aug, residuals) #add resid ``` ### Binned residuals vs. pred. probabilities ```{r echo} par(mfrow = c(3,2)) arm::binnedplot(x = health_m_aug$Excellent, y = health_m_aug$resid.Excellent, xlab="Predicted P(Excellent)", ylab="Residuals", main="Excellent: Residuals vs. Pred. Probabilities", col.int = FALSE) arm::binnedplot(x = health_m_aug$Vgood, y = health_m_aug$resid.Vgood, xlab="Predicted P(Vgood)", ylab="Residuals", main="Very Good: Residuals vs. Pred. Probabilities", col.int = FALSE) arm::binnedplot(x = health_m_aug$Good, y = health_m_aug$resid.Good, xlab="Predicted P(Good)", ylab="Residuals", main="Good: Residuals vs. Pred. Probabilities", col.int = FALSE) arm::binnedplot(x = health_m_aug$Fair, y = health_m_aug$resid.Fair, xlab="Predicted P(Fair)", ylab="Residuals", main="Fair: Residuals vs. Pred. Probabilities", col.int = FALSE) arm::binnedplot(x = health_m_aug$Poor, y = health_m_aug$resid.Poor, xlab="Predicted P(Poor)", ylab="Residuals", main="Poor: Residuals vs. Pred. Probabilities", col.int = FALSE) ``` ### Binned residuals vs. `Age` ```{r} par(mfrow = c(3,2)) arm::binnedplot(x = health_m_aug$Age, y = health_m_aug$resid.Excellent, xlab="Age", ylab="Residuals", main="Excellent: Residuals vs. Age", col.int = FALSE) arm::binnedplot(x = health_m_aug$Age, y = health_m_aug$resid.Vgood, xlab="Age", ylab="Residuals", main="Very Good: Residuals vs. Age", col.int = FALSE) arm::binnedplot(x = health_m_aug$Age, y = health_m_aug$resid.Good, xlab="Age", ylab="Residuals", main="Good: Residuals vs. Age", col.int = FALSE) arm::binnedplot(x = health_m_aug$Age, y = health_m_aug$resid.Fair, xlab="Age", ylab="Residuals", main="Fair: Residuals vs. Age", col.int = FALSE) arm::binnedplot(x = health_m_aug$Age, y = health_m_aug$resid.Poor, xlab="Age", ylab="Residuals", main="Poor: Residuals vs. Age", col.int = FALSE) ``` ### Residuals vs. `PhysActive` ```{r} health_m_aug %>% group_by(PhysActive) %>% summarise(mean.Excellent = mean(resid.Excellent), mean.Vgood = mean(resid.Vgood), mean.Good = mean(resid.Good), mean.Fair = mean(resid.Fair), mean.Poor = mean(resid.Poor)) %>% t() ``` Is the linearity assumption met? Is the randomness assumption met? Is the independence assumption met? ## Add `Education` to the model? ```{r results = "hide"} model_red <- multinom(HealthGen ~ Age + PhysActive, data = nhanes_adult) model_full <- multinom(HealthGen ~ Age + PhysActive + Education, data = nhanes_adult) ``` ```{r} anova(model_red, model_full, test = "Chisq") %>% kable(format = "markdown") ``` ### Model with `Education` ```{r echo = F} tidy(model_full, conf.int = T, exponentiate = F) %>% kable(format = "markdown", digits = 3) ``` ### Compare NHANES models using AIC ```{r} glance(model_red) glance(model_full) ```