Multinomial Logistic Regression

# Multinomial Logistic Regression
### Dr. Maria Tackett
### 03.27.19

---

### Announcements

- Lab 07 due **today**

- Project proposal due Fri, March 29

---

## ROC curve & thresholds

```r
library(pROC)
roc_results <- roc(model_aug$y,model_aug$.fitted) 
roc_results$thresholds[2546] #2546 is the index for the threshold that has the chosen true positive rate = 0.6 and chosen false positive rate = 0.25
```

```
## [1] 0.207046
```

```r
roc_results$sensitivities[2546]
```

```
## [1] 0.6050269
```

```r
1 - roc_results$specificities[2546]
```

```
## [1] 0.2499194
```
]

---

### Confusion matrix at chosen threshold

| TenYearCHD| At Risk| No Risk|
|----------:|-------:|-------:|
|          0|     775|    2326|
|          1|     337|     220|

The true positive rate is `$\frac{337}{337 + 220} = 0.6050$`

The false positive rate is `$\frac{775}{775 + 2326} = 0.2499$`

---

## Multinomial Logistic Regression

---

### Generalized Linear Models (GLM)

- In practice, there are many different types of response variables including:

+ **Binary**: Win or Lose
  + **Nominal**: Democrat, Republican or Third Party candidate
  + **Ordered**: Movie rating (1 - 5 stars)
  + and others...

- These are all examples of **generalized linear models**, a broader class of models that generalize the multiple linear regression model

- See [*Generalized Linear Models: A Unifying Theory*](https://bookdown.org/roback/bookdown-bysh/ch-glms.html) for more details about GLMs

---

### Binary Response (Logistic)

- Given `$P(Y_i=1|X_i)=p_i\hspace{5mm} \text{ and } \hspace{5mm}P(Y_i=0|X_i) = 1-p_i$`

`$$\log\Big(\frac{p_i}{1-p_i}\Big) = \beta_0 + \beta_1 X_i$$`

- We can calculate `$p_i$` by solving the logit equation:

`$$p_i = \frac{\exp\{\beta_0 + \beta_1 X_i\}}{1 + \exp\{\beta_0 + \beta_1 X_i\}}$$`

---

### Binary Response (Logistic)

- Suppose we consider `$Y=0$` the *baseline category* such that

`$$P(Y_i=0|X_i) = p_{i0} \text{ and } P(Y_i=1|X_i) = p_{i1}$$`

- Then the logit model is

`$$\log\bigg(\frac{p_{i1}}{p_{i0}}\bigg) = \beta_0 + \beta_1 X_i$$`

- Slope, `$\beta_1$`: When `$X$` increases by one unit, the odds of `$Y=1$` versus the baseline `$Y=0$` are expected to multiply by a factor of `$\exp\{\beta_1\}$`

- Intercept, `$\beta_0$`: When `$X=0$`, the odds of `$Y=1$` versus the baseline `$Y=0$` are expected to be `$\exp\{\beta_0\}$`

---

### Multinomial Logistic Regression

- Multinomial Distribution:

`$$P(Y=1) = p_1, P(Y=2) = p_2, \ldots, P(Y=k) = p_k$$`

such that `$\sum\limits_{j=1}^{k} p_j = 1$`

- Suppose we have no explanatory variables. What would be the best estimate of `$p_j$`?

---

### Multinomial Logistic Regression

- If we have an explanatory variable `$X$`, then we want `$p_j$` to be a function of `$X$`

- Choose a baseline category. Let's choose `$Y=1$`.  Then,

`$$\log\bigg(\frac{p_{ij}}{p_{i1}}\bigg) = \beta_{0j} + \beta_{1j} X_i$$`
 
--

- We have a separate model for each category of the response

---

### Multinomial Logistic Regression

- Suppose we have a response variable `$Y$` that can take three possible outcomes that are coded as 1,2,3

- Let 1 be the baseline category. Then

`$$\log\bigg(\frac{p_{i2}}{p_{i1}}\bigg) = \beta_{02} + \beta_{12} X_i \\[10pt]
\log\bigg(\frac{p_{i3}}{p_{i1}}\bigg) = \beta_{03} + \beta_{13} X_i$$`

- Slope, `$\beta_{1j}$`: When `$X$` increases by one unit, the odds of `$Y=j$` versus the baseline are expected to multiply by a factor of `$\exp\{\beta_{1j}\}$`

- Intercept, `$\beta_{0j}$`: When `$X=0$`, the odds of `$Y=j$` versus the baseline are expected to be `$\exp\{\beta_{0j}\}$`

---

### Multinomial Regression in R

- Use the `multinom()` function in the `nnet` package

```r
library(nnet)
my.model <- multinom(Y ~ X1 + X2 + ... + XP,data=my.data)
tidy(my.model, exponentiate = FALSE) #display log-odds model
```

```r
# calculate predicted probabilities
pred.probs <- predict(my.model, type = "probs")
```

---

### Iris Data

- To illustrate multinomial logistic regression, we will use the `iris` data set.

- **Response:**
 + Species: versicolor, setosa, virginica
 
- **Predictors** (all measured in centimeters):
 + Sepal.Length
 + Sepal.Width
 + Petal.Length
 + Petal.Width

---

### Iris Data

```r
glimpse(iris)
```

```
## Observations: 150
## Variables: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5…
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1…
## $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, set…
```

---

### Iris Data

---

### Iris Data

- Goal: Use characteristics of the petals and sepal to distinguish between the three species

---

### Iris: Model

**Suppose we have already done exploratory data analysis.**

```r
library(nnet)
model1 <- multinom(Species ~ Sepal.Length + Sepal.Width + 
 Petal.Length + Petal.Width, data = iris)
```

---

### Iris: Model

```r
kable(tidy(model1, exponentiate = FALSE), format = "markdown", 
      digits = 3)
```

|y.level    |term         | estimate| std.error| statistic| p.value|
|:----------|:------------|--------:|---------:|---------:|-------:|
|versicolor |(Intercept)  |   18.690|    34.971|     0.534|   0.593|
|versicolor |Sepal.Length |   -5.458|    89.892|    -0.061|   0.952|
|versicolor |Sepal.Width  |   -8.707|   157.042|    -0.055|   0.956|
|versicolor |Petal.Length |   14.245|    60.192|     0.237|   0.813|
|versicolor |Petal.Width  |   -3.098|    45.489|    -0.068|   0.946|
|virginica  |(Intercept)  |  -23.836|    35.766|    -0.666|   0.505|
|virginica  |Sepal.Length |   -7.924|    89.912|    -0.088|   0.930|
|virginica  |Sepal.Width  |  -15.371|   157.120|    -0.098|   0.922|
|virginica  |Petal.Length |   23.660|    60.468|     0.391|   0.696|
|virginica  |Petal.Width  |   15.135|    45.934|     0.330|   0.742|

---

### Iris: Interpreting Coefficients

2. Write the model for the log odds.

3. Interpret the coefficient for `Sepal.Length` in terms of the odds that an iris is a `versicolor` versus the baseline species.
]

---

## Calculating Probabilities

- For `$j=2,\ldots,k$`, we calculate the probabilities, `$p_{ij}$` as

`$$p_{ij} = \frac{\exp\{\beta_{0j} + \beta_{1j}X_i\}}{1 + \sum\limits_{j=2}^k \exp\{\beta_{0j} + \beta_{1j}X_i\}}$$`

- For the baseline category, `$j=1$`, we calculate the probability `$p_{i1}$` as
`$$p_{i1} = 1- \sum\limits_{j=2}^k p_{ij}$$`
---

### Model Diagnostics

For each category of the response, `$j$`:

- Analyze a plot of the binned residuals vs. predicted probabilities

- Analyze a plot of the binned residuals vs. each continuous predictor variable

- Look for any patterns in the residuals plots

- For each categorical predictor variable, examine the average residuals for each category of the response variable

---

### Drop-in-deviance Test

- Suppose there are two models: 
    - Model 1 includes predictors `$x_1, \ldots, x_q$`
    - Model 2 includes predictors `$x_1, \ldots, x_q, x_{q+1}, \ldots, x_p$`

- We want to test the hypotheses
`$$\begin{aligned}&H_0: \beta_{q+1} = \dots = \beta_p = 0 \\
& H_a: \text{ at least 1 }\beta_j \text{ is not} 0\end{aligned}$$`

- Use the **drop-in-deviance test** to compare models (similar to logistic regression)

---

### Iris: Predicted probabilities & residuals

```r
#calculate predicted probabilities
pred_probs <- data.frame(predict(model1, type = "probs"))
```

```r
iris <- iris %>%
 mutate(virginica = ifelse(Species == "virginica", 1, 0), 
 versicolor = ifelse(Species == "versicolor", 1, 0), 
 setosa = ifelse(Species == "setosa", 1, 0), 
 resid_virginica = virginica - pred_probs$virginica, 
 resid_versicolor = versicolor - pred_probs$versicolor, 
 resid_setosa = setosa - pred_probs$setosa)
```

---

### Binned Residuals vs. Predictors

- Below are the plots for *Sepal.Length* These plots should be repeated for each explanatory variable

.small[
<img src="18-multinomial-logistic-pt2_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />
]

---

### Interactions with Sepal.Length?

We want to test if there are any significant interactions with `Sepal.Length`

```r
#test for Sepal.Length interactions
model1 <- multinom(Species ~ Sepal.Length + Sepal.Width + 
 Petal.Length + Petal.Width,data=iris)

model2 <- multinom(Species ~ Sepal.Length + Sepal.Width + Petal.Length +
 Petal.Width + 
 Sepal.Length * Sepal.Width + 
 Sepal.Length * Petal.Length +
 Sepal.Length * Petal.Width,
 data=iris)
```
]

---

### Interactions with Sepal.Length?

```r
anova(model1, model2, test = "Chisq")
```

```
## Likelihood ratio tests of Multinomial Models
## 
## Response: Species
## Model
## 1 Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
## 2 Sepal.Length + Sepal.Width + Petal.Length + Petal.Width + Sepal.Length * Sepal.Width + Sepal.Length * Petal.Length + Sepal.Length * Petal.Width
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 290 11.899734 
## 2 284 9.279243 1 vs 2 6 2.620492 0.8547465
```

**No significant interactions with Sepal.Length**

---

### Actual vs. Predicted Species

- We can use our model to predict the species

- The predicted species is the one with the highest predicted probability

```r
pred_species <- predict(model1, type = "class") 
table(iris$Species, pred_species)
```

```
##             pred_species
##              setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         49         1
##   virginica       0          1        49
```

---

### Example: *Sesame Street*

- We will analyze data from an experiment to test the effectiveness of the children's program *Sesame Street*.

- As part of the experiment, children were assigned to one of two groups: those who were encouraged to watch the program and those who were not

- We want to understand what effect the encouragement had on the frequency of viewing after adjusting for other characteristics

---

### Response Variable

- `viewcat`
 + 1: rarely watched show
 + 2: once or twice a week
 + 3: three to five times a week
 + 4: watched show on average more than five times a week
 
---

### Predictor Variables

- `age`: child's age in months
- `prenumb`: score on numbers pretest (0 to 54)
- `prelet`: score on letters pretest (0 to 58)
- `viewenc`: 1: encouraged to watch, 2: not encouraged
- `site:`
 + 1: three to five year old from disadvantaged inner city area
 + 2: four year old from advantaged suburban area
 + 3: from advantaged rural area
 + 4: from disadvantaged rural area
 + 5: from Spanish speaking home

---

[*Sesame Street* analysis](https://www2.stat.duke.edu/courses/Spring19/sta210.001/appex/18-multinom-logistic.html)