class: center, middle, inverse, title-slide

# Multinomial Regression

### Yue Jiang

### STA 210 / Duke University / Spring 2023

---

### Alligator diets

<img src="img/alligator.jpg" width="80%" style="display: block; margin: auto;" />

---

### Alligator diets

<img src="multinomial_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

### Multinomial data

Here, our outcome is categorical but *not* ordered. As well, it has more than two categories, so we can't simply use a straightforward logistic regression.

.question[
Suppose you wanted to predict the preferred diet of an alligator based on its sex and its length. How might you do this?
]

---

### Just a bunch of logistic regressions

Our outcome `\(Y\)` has `\(J\)` total categories. We might intuitively choose one of these categories to be the reference category, and then compare each of the other categories against it in a pairwise comparison using logistic regression.

Suppose category 1 is the reference category. Then we fit each of the following models for `\(j = 2, \cdots, J\)`:

`\begin{align*}
\log\left(\frac{P(Y_i = j)}{P(Y_i = 1)}\right) = \beta_{0;j} + \beta_{1;j}x_{i1} + \cdots + \beta_{p;j}x_{ip}
\end{align*}`

.question[
Take a look at this model - what does each term represent? Why do we only fit `\(J - 1\)` separate models?
]

---

### Fitting a multinomial regression model

```r
library(nnet)
m1 <- multinom(food ~ sex + length, data = gators)
```

```
## # weights:  12 (6 variable)
## initial  value 64.818125
## iter  10 value 48.651915
## final  value 48.651292
## converged
```

---

### Fitting a multinomial regression model

```r
summary(m1)
```

```
## Call:
## multinom(formula = food ~ sex + length, data = gators)
## 
## Coefficients:
##              (Intercept)       sexM     length
## invertibrate    4.070252  0.2313371 -2.4218022
## other          -1.543849 -0.7175895  0.2512553
## 
## Std. Errors:
##              (Intercept)      sexM    length
## invertibrate    1.476859 0.6746705 0.8283374
## other           1.313245 0.8485217 0.5486893
## 
## Residual Deviance: 97.30258 
## AIC: 109.3026
```

---

### Fitting a multinomial regression model

```r
exp(coef(m1))
```

```
##              (Intercept)     sexM     length
## invertibrate  58.5717389 1.260284 0.08876151
## other          0.2135574 0.487927 1.28563828
```

---

### Fitting a multinomial regression model

```r
head(round(fitted(m1), 3))
```

```
##    fish invertibrate other
## 1 0.238        0.692 0.069
## 2 0.262        0.660 0.078
## 3 0.232        0.735 0.033
## 4 0.240        0.725 0.035
## 5 0.271        0.649 0.081
## 6 0.305        0.602 0.093
```

---

### An important assumption

One important assumption in the multinomial model is that the probability of being in category A or B shouldn't depend on whether category C is included as a potential option. This is known as the *independence of irrelevant alternatives* assumption.

.question[
What does this mean in our alligator diet example?
]

---

### Outliers and leverage

Now for something completely different (though relevant to any regression setting)

---

### Outliers and leverage

.vocab[Outliers] are points that don't follow the general pattern of the rest of the data

Points are said to have high .vocab[leverage] when they are extreme in some sense (e.g., unusual variable values)

.vocab[Influential] points are those that disproportionately influence the results from regression fits (e.g., slope estimates, etc.)
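---

### Outliers and leverage

These quantities are easy to compute directly in base R. Below is a minimal sketch, using the same placeholder `lm(y ~ x)` fit as the examples on the following slides:

```r
m <- lm(y ~ x) # your model object goes here

# Leverage: diagonal of the hat matrix; values far above the
# average of (p + 1)/n deserve a closer look
hatvalues(m)

# Standardized residuals: outliers often have |value| above 2 or 3
rstandard(m)
```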
---

### Outliers and leverage

<img src="multinomial_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

### Outliers and leverage

<img src="multinomial_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

### Outliers and leverage

<img src="multinomial_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

### Outliers and leverage

<img src="multinomial_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

### Cook's distance

.vocab[Cook's distance] is an estimate of how influential each observation is in a linear regression model. It measures how much all of the fitted values change when the `\(i^{th}\)` observation is removed: the larger the Cook's distance, the larger the influence (a Cook's distance greater than about 0.5 is a common rule of thumb for flagging a potentially influential point).

```r
library(car)
plot(cooks.distance(lm(y ~ x)),
     xlab = "Observation Index",
     ylab = "Cook's distance for regression model")
```

---

### Cook's distance

<img src="multinomial_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

### Cook's distance

<img src="multinomial_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---

### Cook's distance

<img src="multinomial_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

### Cook's distance

<img src="multinomial_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

### Remember "augment"?

```r
library(tidymodels)
augment(lm(y ~ x)) # your model object goes here
```

```
## # A tibble: 20 x 8
##        y     x .fitted  .resid   .hat .sigma  .cooksd .std.resid
##    <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl>    <dbl>      <dbl>
##  1  14.8  3.86    13.8  0.986  0.0871  0.946 0.0562       1.09
##  2  18.5  5.36    17.7  0.780  0.0802  0.958 0.0319       0.855
##  3  15.1  4.23    14.8  0.322  0.0608  0.975 0.00396      0.350
##  4  19.1  5.65    18.4  0.655  0.109   0.964 0.0327       0.730
##  5  18.9  5.82    18.8  0.0645 0.131   0.978 0.000400     0.0728
##  6  13.2  3.14    12.0  1.23   0.187   0.921 0.236        1.44
##  7  16.3  4.58    15.7  0.576  0.0503  0.968 0.0102       0.622
##  8  17.1  5.68    18.5 -1.41   0.113   0.909 0.157       -1.57
##  9  16.7  4.65    15.9  0.810  0.0500  0.957 0.0201       0.874
## 10  14.6  4.37    15.1 -0.489  0.0548  0.971 0.00809     -0.529
## 11  18.5  5.87    19.0 -0.426  0.138   0.972 0.0187      -0.483
## 12  14.9  4.36    15.1 -0.238  0.0551  0.977 0.00194     -0.258
## 13  16.1  5.03    16.8 -0.751  0.0586  0.960 0.0206      -0.814
## 14  15.4  4.72    16.0 -0.592  0.0503  0.967 0.0108      -0.639
## 15  11.3  3.31    12.4 -1.11   0.157   0.934 0.150       -1.27
## 16  17.4  5.70    18.5 -1.12   0.115   0.935 0.102       -1.25
## 17  14.1  3.74    13.5  0.545  0.0997  0.968 0.0202       0.604
## 18  11.5  3.13    11.9 -0.408  0.189   0.972 0.0263      -0.476
## 19  12.8  3.98    14.1 -1.32   0.0766  0.920 0.0870      -1.45
## 20  20.8  5.86    19.0  1.89   0.137   0.844 0.365        2.14
```

---

### Another diagnostic plotting function

```r
library(ggfortify)
autoplot(lm(y ~ x)) # your model object goes here
```

---

### Another diagnostic plotting function

<img src="multinomial_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---

### What to do with outliers?

We can often detect outliers visually (e.g., in a residual plot) or with statistics such as Cook's distance, leverage, and other diagnostics.

Do not ignore outliers when you find them, and do not automatically delete them! Outliers are often very interesting points that you might want to learn more about, and they aren't necessarily mistakes in the data (although sometimes they are).
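One gentle first step is to flag such points for a closer look instead of removing them. A minimal sketch using the `augment()` output from before (the `y ~ x` fit is again a placeholder for your own model):

```r
library(tidymodels)

# Flag observations above the Cook's distance rule of thumb
augment(lm(y ~ x)) %>%
  mutate(obs = row_number()) %>%
  filter(.cooksd > 0.5)
```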
You may want to perform .vocab[sensitivity analyses] after removing outliers. Do your results or overall message change? How .vocab[robust] are your conclusions to the outliers?
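---

### What to do with outliers?

One way to carry out such a sensitivity analysis, as a minimal sketch (again with the placeholder `y ~ x` fit): refit the model without the flagged points and compare the estimates.

```r
full_fit <- lm(y ~ x) # your model object goes here

# Refit using only observations at or below the Cook's
# distance rule of thumb of 0.5
keep <- cooks.distance(full_fit) <= 0.5
reduced_fit <- lm(y ~ x, subset = keep)

# Do the coefficient estimates (and your conclusions) change much?
cbind(full = coef(full_fit), reduced = coef(reduced_fit))
```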