Categorical Data (3)

class: center, middle, inverse, title-slide

# Categorical Data (3)
### Yue Jiang
### Duke University

---

### Licorice and post-op sore throat

---


### Ordinal data

In this case, our outcome is *ordered* (and categorical). Although we have
"numbers" on a pain scale from 0 to 10, these numbers don't have a linear
relationship: a 4 isn't necessarily twice as painful as a 2, and a 3 isn't
necessarily three times as painful as a 1.

However, these data *are* ordered. We do know that a 4 is more painful than a 3,
which is more painful than a 2, etc.

.question[
What are some potential pitfalls of using an ordinary least squares regression?
How about using a multinomial regression approach?
]

---

### A cumulative link model

We might consider an outcome `$Y$` through its *cumulative* distribution. For
`$J$` total ordered categories, we might model the cumulative probability that
observation `$i$` falls in category `$j$` or below:

`\begin{align*}
\gamma_{ij} &= P(Y_i \le j)\\
&= P(Y_i = 1) + P(Y_i = 2) + \cdots + P(Y_i = j).
\end{align*}`

Note that `$\gamma_{ij}$` is limited to values from 0 to 1, as it is a probability.
We might consider a model

`\begin{align*}
g(\gamma_{ij}) = \theta_j + X_i^T\beta,
\end{align*}`

where `$g()$` is a link function mapping the interval `$(0, 1)$` to `$\mathbb{R}$`.
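
As a rough illustration of the two steps above, the sketch below builds the cumulative probabilities `$\gamma_{ij}$` from a set of category probabilities and then applies a link. The probabilities are made up for illustration, and the logit is just one possible choice of `$g()$`.

```python
import math

# Hypothetical category probabilities for one observation (J = 4 ordered categories).
probs = [0.1, 0.2, 0.3, 0.4]

def cumulative(p):
    """Return gamma_j = P(Y <= j) for j = 1, ..., J."""
    total, out = 0.0, []
    for pj in p:
        total += pj
        out.append(total)
    return out

def logit(g):
    """One possible link g(): maps (0, 1) onto the real line."""
    return math.log(g / (1.0 - g))

gamma = cumulative(probs)

# The last cumulative probability is always 1, so only the first J - 1 are transformed.
linked = [logit(g) for g in gamma[:-1]]
```

Because `$\gamma_{iJ} = 1$` for every observation, only the first `$J - 1$` cumulative probabilities carry information.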

.question[
What are the `$\theta$` terms? What might each `$X_i$` look like?
]

---

### A cumulative link model

The `$\theta$` terms are constants representing the "baseline" value for each
category (on the transformed scale); they play the role of category-specific
intercepts. This implies that the design matrix will **not** contain an
intercept term, so the `$\beta$`s correspond only to observed covariates.

`\begin{align*}
g(\gamma_{ij}) = \theta_j + X_i^T\beta
\end{align*}`

.question[
What is the interpretation of `$\beta$`? What are we implicitly assuming?
]

---

### A cumulative link model

In this case, we have the same covariate effects `$\beta$` across **all** of the
categories. This means that `$\beta_k$` is the conditional change in the
(transformed) cumulative probability, for every category `$j$`, given a 1-unit
difference in `$X_{ik}$`.
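
To see why, compare two observations that differ by one unit in the `$k$`th covariate and agree on everything else. Writing `$c$` for the (shared) contribution of the other covariates, a notation introduced here only for this sketch:

`\begin{align*}
g(\gamma_{ij} \mid X_{ik} = x + 1) - g(\gamma_{ij} \mid X_{ik} = x)
&= \left(\theta_j + (x + 1)\beta_k + c\right) - \left(\theta_j + x\beta_k + c\right)\\
&= \beta_k.
\end{align*}`

The `$\theta_j$` terms cancel, so the difference does not depend on `$j$`.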

---

### Ordered logistic regression

The ordered logistic regression model is a cumulative link model that assumes a
logit transformation of the cumulative probabilities:

`\begin{align*}
\mathrm{logit}(\gamma_{ij}) &= \theta_j + X_i^T\beta\\
\log\left(\frac{\gamma_{ij}}{1 - \gamma_{ij}}\right) &= \theta_j + X_i^T\beta\\
\log\left(\frac{P(Y_i \le j)}{P(Y_i > j)}\right) &= \theta_j + X_i^T\beta
\end{align*}`

Exponentiating, we have

`\begin{align*}
\frac{P(Y_i \le j)}{P(Y_i > j)} &= \exp(\theta_j)\exp(X_i^T\beta)
\end{align*}`
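
A minimal sketch of how the model turns parameters into category probabilities, assuming made-up values for `$\theta$` and `$\beta$` and a single covariate. The function names `expit` and `category_probs` are hypothetical helpers, not part of any package.

```python
import math

def expit(z):
    """Inverse logit: maps the real line back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(theta, beta, x):
    """P(Y = j) for j = 1..J under logit(P(Y <= j)) = theta_j + x'beta."""
    eta = sum(b * xk for b, xk in zip(beta, x))
    # Cumulative probabilities for j = 1..J-1, then P(Y <= J) = 1.
    gamma = [expit(t + eta) for t in theta] + [1.0]
    # Successive differences of the cumulative probabilities give P(Y = j).
    return [gamma[0]] + [gamma[j] - gamma[j - 1] for j in range(1, len(gamma))]

theta = [-1.0, 0.0, 1.5]  # hypothetical cut points; must be increasing
beta = [0.8]              # hypothetical coefficient for one covariate

p0 = category_probs(theta, beta, [0.0])  # covariate equal to 0
p1 = category_probs(theta, beta, [1.0])  # covariate equal to 1
```

With the sign convention used on these slides (`$\theta_j + X_i^T\beta$`), a positive `$\beta$` raises `$P(Y_i \le j)$` and so shifts probability toward *lower* categories. Some software instead fits `$\theta_j - X_i^T\beta$`, which flips the sign of the reported coefficients, so it is worth checking the parameterization before interpreting output.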

.question[
What is the outcome here? How might we interpret `$\exp(\theta_j)$`? How might
we interpret the `$\beta$` terms here?
]

---

### The proportional odds assumption

Remember that we have only one `$\beta$` term for each predictor across *all*
categories. This implies that changes in `$X_k$` have the same conditional
relationship with the cumulative odds at every cut point: the odds of being in
category 1 or below vs. above 1, 6 or below vs. above 6, or any `$\le j$` vs.
`$> j$`.
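
The sketch below illustrates what the assumption means numerically, using made-up parameter values: the odds ratio for a 1-unit difference in a covariate is `$\exp(\beta)$` at every cut point, regardless of `$\theta_j$`.

```python
import math

def cumulative_odds(theta_j, beta, x):
    """Odds of Y <= j vs. Y > j under logit(P(Y <= j)) = theta_j + beta * x."""
    return math.exp(theta_j + beta * x)

theta = [-1.0, 0.0, 1.5]  # hypothetical cut points
beta = 0.8                # hypothetical coefficient

# Odds ratio comparing x = 1 to x = 0, computed separately at each cut point.
ratios = [
    cumulative_odds(t, beta, 1.0) / cumulative_odds(t, beta, 0.0)
    for t in theta
]
```

Every entry of `ratios` equals `$\exp(0.8)$`, up to floating-point error; the baseline odds `$\exp(\theta_j)$` differ across cut points, but the multiplicative effect of the covariate does not.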

.question[
When might this be a reasonable assumption? When might this assumption be
violated? How might we modify the model in the case that this assumption does
not hold? How might we gut-check this assumption using the data?
]

---