class: center, middle, inverse, title-slide

.title[
# Partial Likelihood
]
.author[
### Yue Jiang
]
.date[
### STA 490/690
]

---

### A proportional hazards model

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`

.question[
When might we care about the `\(\boldsymbol\beta\)`? When might we care about `\(\lambda(t)\)`?
]

---

### A proportional hazards model

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`

In a parametric survival model (such as the ones we've seen so far), survival times are assumed to follow a specific distribution, an assumption we may be unwilling to make.

.question[
Is it possible to estimate `\(\boldsymbol\beta\)` without knowing what `\(\lambda_0(t)\)` looks like? Intuitively, how might we be able to do so, given that `\(\lambda_0(t)\)` is infinite-dimensional?
]

---

### Estimating `\(\boldsymbol\beta\)`

For now, assume that there are no tied failure times, and once again order the failure times as `\(t_1 < t_2 < \cdots < t_J\)`. We will again assume that observations/failures are independent.

.question[
Consider the interval `\((t_{j-1}^+, t_j^-)\)`: that is, an interval in which no failures occur. What information, if any, do we obtain regarding `\(\boldsymbol\beta\)`, `\(\mathbf{X}\)`, and `\(\lambda(t)\)` (relationships between predictors and the hazard)?

**Remember**: `\(\lambda_0(t)\)` might take on *any arbitrary* shape.
]

---

### Estimating `\(\boldsymbol\beta\)`

Now suppose we know that there **is** one failure at time `\(t_j\)`. For this failure time, the contribution to the "likelihood" is the probability that individual `\(j\)` fails, *given* that there is exactly one failure in the risk set at that time:

`\begin{align*}
P(\text{individual }j\text{ fails} | \text{one failure from }\mathcal{R}(t_j))
\end{align*}`

where `\(\mathcal{R}(t_j) = \left\{i: T_i \ge t_j \right\}\)` is the risk set at time `\(t_j\)`.

.question[
How else might you more formally express this as a ratio of two probabilities (the joint and the marginal)?
]

---

### Estimating `\(\boldsymbol\beta\)`

This is

`\begin{align*}
\frac{P(\text{individual }j\text{ fails and one failure from }\mathcal{R}(t_j))}{P(\text{one failure from }\mathcal{R}(t_j)) }
\end{align*}`

We are conditioning on exactly one failure at time `\(t_j\)`, but we don't know who it is.

.question[
What are the numerator and denominator, more specifically? (We'll go into more detail on the next two slides.)
]

---

### The numerator and the denominator

We know the following about/from the numerator:

- One specific person failed at time `\(t_j\)`
- This person has specific characteristics: let's call them `\(\mathbf{x}_{t_j}\)`, the covariates corresponding to the person who failed at time `\(t_j\)`
- They had to still have been eligible to fail at time `\(t_j\)`

We know the following about/from the denominator:

- The characteristics of people still eligible to fail at that time
- They all had to still have been eligible to fail at time `\(t_j\)`

.question[
What are these quantities?
]

---

### Estimating `\(\boldsymbol\beta\)`

We have

`\begin{align*}
&\mathrel{\phantom{=}} \frac{P(\text{individual }j\text{ fails and one failure from }\mathcal{R}(t_j))}{P(\text{one failure from }\mathcal{R}(t_j)) }\\
&= \frac{P(\text{individual }j\text{ fails} | \text{at risk at }t_j)}{\sum_{k \in \mathcal{R}(t_j)}P(\text{individual }k\text{ fails} | \text{at risk at }t_j)}\\
&= \frac{\lambda(t_j | \mathbf{x}_j)}{\sum_{k \in \mathcal{R}(t_j)}\lambda(t_j | \mathbf{x}_k)}
\end{align*}`

.question[
Can you simplify these expressions? We still don't know anything about `\(\lambda_0\)`!
]

---

### Estimating `\(\boldsymbol\beta\)`

Under proportional hazards, `\(\lambda(t | \mathbf{x}) = \lambda_0(t)\exp(\mathbf{x}\boldsymbol\beta)\)`. Thus, for each failure time `\(t_j\)`, we have

`\begin{align*}
&\mathrel{\phantom{=}} \frac{\lambda(t_j | \mathbf{x}_j)}{\sum_{k \in \mathcal{R}(t_j)}\lambda(t_j | \mathbf{x}_k)}\\
&= \frac{\lambda_0(t_j)\exp(\mathbf{x}_j\boldsymbol\beta)}{\sum_{k \in \mathcal{R}(t_j)}\lambda_0(t_j)\exp(\mathbf{x}_k\boldsymbol\beta)}\\
&= \frac{\exp(\mathbf{x}_j\boldsymbol\beta)}{\sum_{k \in \mathcal{R}(t_j)}\exp(\mathbf{x}_k\boldsymbol\beta)}
\end{align*}`

---

### Estimating `\(\boldsymbol\beta\)`

If there are `\(J\)` total failure times, the "likelihood" is thus the product over all failure times:

`\begin{align*}
\mathcal{L}^\star(\boldsymbol\beta) &= \prod_{j = 1}^J \frac{\exp(\mathbf{x}_j\boldsymbol\beta)}{\sum_{k \in \mathcal{R}(t_j)}\exp(\mathbf{x}_k\boldsymbol\beta)}\\
&= \prod_{i = 1}^n \left(\frac{\exp(\mathbf{x}_i\boldsymbol\beta)}{\sum_{k \in \mathcal{R}(t_i)}\exp(\mathbf{x}_k\boldsymbol\beta)}\right)^{\delta_i}\\
\log\mathcal{L}^\star(\boldsymbol\beta) &= \sum_{i = 1}^n \delta_i\left(\mathbf{x}_i\boldsymbol\beta - \log \sum_{k \in \mathcal{R}(t_i)}\exp(\mathbf{x}_k\boldsymbol\beta) \right)
\end{align*}`

---

### The partial log-likelihood

We can maximize this quantity with respect to `\(\boldsymbol\beta\)` *without* having to estimate (or even specify) the baseline hazard: `\(\lambda_0(t)\)` cancels from every term, so the failure times themselves only enter through the risk sets.

Censored individuals never contribute to a numerator, but they do contribute to the risk sets of failure times occurring before their censoring times.

This expression is known as the .vocab[partial likelihood].

(See board for diagram)
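
---

### The partial log-likelihood, by hand

With no tied failure times, the expression on the previous slide is exactly what `coxph()` maximizes. As a sanity check, here is a minimal sketch (the toy data and the helper `partial_loglik()` are made up for illustration) that evaluates the no-ties partial log-likelihood directly from its definition and compares it with the value `coxph()` reports.

```r
library(survival)

# Toy data with distinct follow-up times, so the no-ties formula applies exactly
toy <- data.frame(
  time   = c(2, 5, 7, 9, 12, 15),  # follow-up time
  status = c(1, 1, 0, 1, 0, 1),    # 1 = failure, 0 = censored
  x      = c(1, 0, 1, 1, 0, 0)     # a single binary covariate
)

# log L*(beta) = sum over failures of [ x_i beta - log sum_{k in R(t_i)} exp(x_k beta) ]
partial_loglik <- function(beta, time, status, x) {
  eta <- x * beta
  sum(sapply(which(status == 1), function(i) {
    at_risk <- time >= time[i]           # risk set R(t_i)
    eta[i] - log(sum(exp(eta[at_risk])))
  }))
}

coxm_toy <- coxph(Surv(time, status) ~ x, data = toy)
partial_loglik(coef(coxm_toy), toy$time, toy$status, toy$x)  # evaluated at the MLE
coxm_toy$loglik[2]                                           # coxph's maximized value
```

With no tied event times the two values agree; with ties, `coxph()` uses the Efron approximation by default.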

---

### The Cox proportional hazards model

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`

- Fewer assumptions than fully parametric models, but still requires the PH assumption
- By far the most commonly used regression model for survival data
- Attractive interpretations using hazard ratios
- Can be extended to accommodate time-varying covariates, recurrent events, etc.

---

### Fitting the Cox PH model

```r
coxm1 <- coxph(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
               data = dig)
summary(coxm1)
```

```
## Call:
## coxph(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
##     data = dig)
## 
##   n= 6799, number of events= 2332 
##    (1 observation deleted due to missingness)
## 
##             coef exp(coef)  se(coef)       z Pr(>|z|)    
## TRTMT  -0.289745  0.748454  0.041666  -6.954 3.55e-12 ***
## EJFPER -0.038061  0.962654  0.002381 -15.985  < 2e-16 ***
## PREVMI  0.047921  1.049087  0.043638   1.098    0.272    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## TRTMT     0.7485     1.3361    0.6898    0.8121
## EJFPER    0.9627     1.0388    0.9582    0.9672
## PREVMI    1.0491     0.9532    0.9631    1.1428
## 
## Concordance= 0.605  (se = 0.006 )
## Likelihood ratio test= 304.4  on 3 df,   p=<2e-16
## Wald test            = 302.5  on 3 df,   p=<2e-16
## Score (logrank) test = 306.2  on 3 df,   p=<2e-16
```

.question[
How might we interpret these coefficients? How do they compare to our previous models?
]
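
---

### Hazard ratios from the fitted model

One way to start answering the question on the previous slide: exponentiate the fitted coefficients to obtain hazard ratios. A quick sketch using the `coxm1` fit from the previous slide (nothing new is estimated here):

```r
exp(coef(coxm1))     # hazard ratios; e.g., exp(-0.290) is about 0.75 for TRTMT
exp(confint(coxm1))  # Wald 95% CIs; match the lower/upper .95 columns in summary()
```

For example, holding `EJFPER` and `PREVMI` constant, the estimated hazard for `TRTMT = 1` is about 25% lower than for `TRTMT = 0` (HR 0.75, 95% CI 0.69 to 0.81).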

---

### Fitting the Cox PH model

One strength of the Cox model is that we can ignore estimation of `\(\lambda_0\)` completely (it doesn't matter for valid inference on the `\(\beta\)`s). If we *do* want to estimate survival probabilities, however, we must estimate the baseline hazard.

A non-parametric estimate of the baseline cumulative hazard (the .vocab[Breslow estimator]; its derivation is beyond the scope of this class), implemented by the `basehaz` function in the `survival` package, is

`\begin{align*}
\hat{\Lambda}_0(t) = \sum_{i:t_{(i)} < t} \frac{1}{\sum_{k \in \mathcal{R}(t_{(i)})} \exp(\mathbf{x}_k\hat{\boldsymbol\beta})}
\end{align*}`

We can then estimate survival in the Cox model by

`\begin{align*}
\hat{S}(t|\mathbf{x}) = \exp\left(-\exp(\mathbf{x}\hat{\boldsymbol\beta}) \hat{\Lambda}_0(t)\right)
\end{align*}`
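
---

### Baseline hazard and survival in R

To make the last two formulas concrete, here is a hedged sketch (the object names `H0` and `new_pt` and the covariate values are made up for illustration; `coxm1` is the fit from earlier):

```r
# Estimated baseline cumulative hazard; centered = FALSE returns the hazard
# at covariate values of zero rather than at the mean covariate values
H0 <- basehaz(coxm1, centered = FALSE)
head(H0)

# Estimated survival curve for a hypothetical new patient (values made up)
new_pt <- data.frame(TRTMT = 1, EJFPER = 30, PREVMI = 1)
plot(survfit(coxm1, newdata = new_pt),
     xlab = "Days", ylab = "Estimated survival probability")
```

`survfit()` handles the combination of the estimated cumulative hazard and `\(\exp(\mathbf{x}\hat{\boldsymbol\beta})\)` internally, so in practice we rarely assemble `\(\hat{S}(t|\mathbf{x})\)` by hand (its defaults may differ slightly from the Breslow formula above depending on how ties are handled).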