class: center, middle, inverse, title-slide

# Multinomial Regression

### Yue Jiang

### STA 210 / Duke University / Spring 2023

---

### Alligator diets

<img src="img/alligator.jpg" width="80%" style="display: block; margin: auto;" />

---

### Alligator diets

<img src="multinomial_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

### Multinomial data

Here, our outcome is categorical but *not* ordered. As well, it has more than two categories, so we can't simply use a straightforward logistic regression.

.question[
Suppose you wanted to predict the preferred diet of an alligator based on its sex and its length. How might you do this?
]

---

### Just a bunch of logistic regressions

Our outcome `\(Y\)` has `\(J\)` total categories. We might intuitively choose one of these categories to be the reference category, and then compare each of the other categories against it in a pairwise comparison using logistic regression.

Suppose category 1 is the reference category. Then we fit each of the following models for `\(j = 2, \cdots, J\)`:

`\begin{align*}
\log\left(\frac{P(Y_i = j)}{P(Y_i = 1)}\right) = \beta_{0;j} + \beta_{1;j}x_{i1} + \cdots + \beta_{p;j}x_{ip}
\end{align*}`

.question[
Take a look at this model - what does each term represent? Why do we only fit `\(J - 1\)` separate models?
]

---

### Fitting a multinomial regression model

```r
library(nnet)
m1 <- multinom(food ~ sex + length, data = gators)
```

```
## # weights:  12 (6 variable)
## initial  value 64.818125
## iter  10 value 48.651915
## final  value 48.651292
## converged
```

---

### Fitting a multinomial regression model

```r
summary(m1)
```

```
## Call:
## multinom(formula = food ~ sex + length, data = gators)
## 
## Coefficients:
##              (Intercept)       sexM     length
## invertibrate    4.070252  0.2313371 -2.4218022
## other          -1.543849 -0.7175895  0.2512553
## 
## Std. Errors:
##              (Intercept)      sexM    length
## invertibrate    1.476859 0.6746705 0.8283374
## other           1.313245 0.8485217 0.5486893
## 
## Residual Deviance: 97.30258 
## AIC: 109.3026
```

---

### Fitting a multinomial regression model

```r
exp(coef(m1))
```

```
##              (Intercept)     sexM     length
## invertibrate  58.5717389 1.260284 0.08876151
## other          0.2135574 0.487927 1.28563828
```

---

### Fitting a multinomial regression model

```r
head(round(fitted(m1), 3))
```

```
##    fish invertibrate other
## 1 0.238        0.692 0.069
## 2 0.262        0.660 0.078
## 3 0.232        0.735 0.033
## 4 0.240        0.725 0.035
## 5 0.271        0.649 0.081
## 6 0.305        0.602 0.093
```

---

### An important assumption

One important assumption in the multinomial model is that the probability of being in category A or B shouldn't depend on whether category C is included as a potential option. This is known as the *independence of irrelevant alternatives* assumption.

.question[
What does this mean in our alligator diet example?
]

---

### Outliers and leverage

Now for something completely different (though relevant to any regression setting)

---

### Outliers and leverage

.vocab[Outliers] are points that don't follow the general pattern of the rest of the data

Points are said to have high .vocab[leverage] when they are extreme in some sense (e.g., unusual variable values)

.vocab[Influential] points are those that disproportionately influence the results from regression fits (e.g., slope estimates, etc.)
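---

### Outliers and leverage

These quantities are easy to compute directly in base R. Below is a minimal sketch, using the same placeholder `lm(y ~ x)` fit as the examples on the following slides:

```r
m <- lm(y ~ x) # your model object goes here

# Leverage: diagonal of the hat matrix; values far above the
# average of (p + 1)/n deserve a closer look
hatvalues(m)

# Standardized residuals: outliers often have |value| above 2 or 3
rstandard(m)
```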
---

### Outliers and leverage

<img src="multinomial_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

### Outliers and leverage

<img src="multinomial_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

### Outliers and leverage

<img src="multinomial_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

### Outliers and leverage

<img src="multinomial_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

### Cook's distance

.vocab[Cook's distance] is an estimate of how influential each observation is in a linear regression model. It measures how much all of the fitted values change when the `\(i^{th}\)` observation is removed: the larger the Cook's distance, the larger the influence (a Cook's distance greater than about 0.5 is a common rule of thumb for flagging a potentially influential point).

```r
library(car)
plot(cooks.distance(lm(y ~ x)),
     xlab = "Observation Index",
     ylab = "Cook's distance for regression model")
```

---

### Cook's distance

<img src="multinomial_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

### Cook's distance

<img src="multinomial_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---

### Cook's distance

<img src="multinomial_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

### Cook's distance

<img src="multinomial_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

### Remember "augment"?

```r
library(tidymodels)
augment(lm(y ~ x)) # your model object goes here
```

```
## # A tibble: 20 x 8
##        y     x .fitted  .resid   .hat .sigma  .cooksd .std.resid
##    <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl>    <dbl>      <dbl>
##  1  14.8  3.86    13.8  0.986  0.0871  0.946 0.0562       1.09
##  2  18.5  5.36    17.7  0.780  0.0802  0.958 0.0319       0.855
##  3  15.1  4.23    14.8  0.322  0.0608  0.975 0.00396      0.350
##  4  19.1  5.65    18.4  0.655  0.109   0.964 0.0327       0.730
##  5  18.9  5.82    18.8  0.0645 0.131   0.978 0.000400     0.0728
##  6  13.2  3.14    12.0  1.23   0.187   0.921 0.236        1.44
##  7  16.3  4.58    15.7  0.576  0.0503  0.968 0.0102       0.622
##  8  17.1  5.68    18.5 -1.41   0.113   0.909 0.157       -1.57
##  9  16.7  4.65    15.9  0.810  0.0500  0.957 0.0201       0.874
## 10  14.6  4.37    15.1 -0.489  0.0548  0.971 0.00809     -0.529
## 11  18.5  5.87    19.0 -0.426  0.138   0.972 0.0187      -0.483
## 12  14.9  4.36    15.1 -0.238  0.0551  0.977 0.00194     -0.258
## 13  16.1  5.03    16.8 -0.751  0.0586  0.960 0.0206      -0.814
## 14  15.4  4.72    16.0 -0.592  0.0503  0.967 0.0108      -0.639
## 15  11.3  3.31    12.4 -1.11   0.157   0.934 0.150       -1.27
## 16  17.4  5.70    18.5 -1.12   0.115   0.935 0.102       -1.25
## 17  14.1  3.74    13.5  0.545  0.0997  0.968 0.0202       0.604
## 18  11.5  3.13    11.9 -0.408  0.189   0.972 0.0263      -0.476
## 19  12.8  3.98    14.1 -1.32   0.0766  0.920 0.0870      -1.45
## 20  20.8  5.86    19.0  1.89   0.137   0.844 0.365        2.14
```

---

### Another diagnostic plotting function

```r
library(ggfortify)
autoplot(lm(y ~ x)) # your model object goes here
```

---

### Another diagnostic plotting function

<img src="multinomial_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---

### What to do with outliers?

We can often detect outliers visually (e.g., in a residual plot) or with statistics such as Cook's distance, leverage, and other diagnostics.

Do not ignore outliers when you find them, and do not automatically delete them! Outliers are often very interesting points that you might want to learn more about, and they aren't necessarily mistakes in the data (although sometimes they are).
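One gentle first step is to flag such points for a closer look instead of removing them. A minimal sketch using the `augment()` output from before (the `y ~ x` fit is again a placeholder for your own model):

```r
library(tidymodels)

# Flag observations above the Cook's distance rule of thumb
augment(lm(y ~ x)) %>%
  mutate(obs = row_number()) %>%
  filter(.cooksd > 0.5)
```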
You may want to perform .vocab[sensitivity analyses] after removing outliers. Do your results or overall message change? How .vocab[robust] are your conclusions to the outliers?
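---

### What to do with outliers?

One way to carry out such a sensitivity analysis, as a minimal sketch (again with the placeholder `y ~ x` fit): refit the model without the flagged points and compare the estimates.

```r
full_fit <- lm(y ~ x) # your model object goes here

# Refit using only observations at or below the Cook's
# distance rule of thumb of 0.5
keep <- cooks.distance(full_fit) <= 0.5
reduced_fit <- lm(y ~ x, subset = keep)

# Do the coefficient estimates (and your conclusions) change much?
cbind(full = coef(full_fit), reduced = coef(reduced_fit))
```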