Parametric PH Models

class: center, middle, inverse, title-slide

.title[
# Parametric PH Models
]
.author[
### Yue Jiang
]
.date[
### STA 490/690
]

---

### An AFT model

```r
library(survival)
aft_w <- survreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
                 data = dig, dist = "lognormal")
summary(aft_w)
```

```
## 
## Call:
## survreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
##     data = dig, dist = "lognormal")
##                Value Std. Error     z       p
## (Intercept)  6.06283    0.12297 49.30 < 2e-16
## TRTMT        0.55329    0.06741  8.21 2.2e-16
## EJFPER       0.05932    0.00385 15.42 < 2e-16
## PREVMI      -0.11991    0.07059 -1.70   0.089
## Log(scale)   0.82037    0.01636 50.14 < 2e-16
## 
## Scale= 2.27 
## 
## Log Normal distribution
## Loglik(model)= -20343.2   Loglik(intercept only)= -20501
## 	Chisq= 315.72 on 3 degrees of freedom, p= 3.9e-68 
## Number of Newton-Raphson Iterations: 3 
## n=6799 (1 observation deleted due to missingness)
```

---

### Visualizing the AFT model

---

### A similar model for hazards?

---

### A proportional hazards model

.question[
How might we relate hazards to have a proportional relationship given a linear predictor?

What do you think about the model below, where `\(\lambda_0\)` is the 
.vocab[baseline hazard], the hazard for an individual 
with `\(\mathbf{x}_i = \mathbf{0}\)`.?

`\begin{align*}
\lambda(t) &= \lambda_0(t)(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p)
\end{align*}`
]

.question[
What about the following? Any issues below?

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`
]

---

### A proportional hazards model

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`

A .vocab[proportional hazards] model is a linear model for the log hazard ratio

(though *technically* you could technically set up other models that satisfy proportional hazards - what might some other candidate functions look like? Why doesn't anyone do that?).

.question[
How might we interpret `\(\beta\)` terms in such a model? 
]

---

### A proportional hazards model

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`

Covariates in a PH model have a multiplicative effect on *hazards*. For
instance, if `\(\beta_k = 0.4\)`, then `\(\exp(\beta_k) \approx 1.5\)`. Holding all else 
equal, an individual with covariate `\(x_k\)` one unit greater than another is 
expected to have approximately 1.5 times the *hazard* of the event than the
other. This multiplicative effect holds *regardless* of the time `\(t\)`.

.question[
Would you rather have a higher linear predictor or a lower linear predictor
in a PH model? How does this compare to the AFT model?
]

---

### A proportional hazards model

---

### Likelihood derivations

The proportional hazard model for the model above assumes an exponential distribution with parameter `\(\lambda\)` (sorry again about notation). In particular,

`\begin{align*}
\lambda_i(t) &= \lambda\exp\left(\beta_1x_{1i} + \beta_2x_{2i} + \cdots + \beta_px_{pi} \right)
\end{align*}`

.question[
What is the distribution function for individual `\(i\)`? How about the survival function?
]

---

### Likelihood derivation

Individual `\(i\)` also has an exponentially-distributed failure time (why?); this time, with parameter `\(\lambda\exp(\mathbf{x}_i\boldsymbol\beta)\)`. Thus

`\begin{align*}
\lambda_i(t) &= \lambda\exp(\mathbf{x}_i\boldsymbol\beta)
\end{align*}`

where `\(\mathbf{x}_i\boldsymbol\beta = \beta_1x_{1i} + \cdots + \beta_px_{pi}\)`.

It's sometimes easier to rewrite and think of the linear predictor using `\(\lambda = \exp(\beta_0)\)` so that

`\begin{align*}
\lambda_i(t) &= \exp(\mathbf{x}_i\boldsymbol\beta)\\
f_i(t) &= \exp(\mathbf{x}_i\boldsymbol\beta)\exp(-\exp(\mathbf{x}_i\boldsymbol\beta)t)\\
S_i(t) &= \exp(-\exp(\mathbf{x}_i\boldsymbol\beta)t)
\end{align*}`

where `\(\mathbf{x}_i\boldsymbol\beta = \beta_0 + \beta_1x_{1i} + \cdots + \beta_px_{pi}\)` and `\(\beta_0 = \log(\lambda)\)`.

---

### Likelihood derivation

.question[
What does the likelihood function for `\(\boldsymbol\beta = (\beta_0, \beta_1, \cdots, \beta_p)^T\)` look like for independently right-censored data with no truncation? How about the log-likelihood? 
]

---

### Likelihood derivation

`\begin{align*}
\mathcal{L}(\boldsymbol\beta | Y, \mathbf{X}) &= \prod_{i = 1}^n f_i(t_i)^{\delta_i}S_i(t_i)^{1 - \delta_i}\\
&= \prod_{i = 1}^n \lambda_i(t_i)^{\delta_i}S_i(t_i)\\
&= \prod_{i = 1}^n \exp(\mathbf{x}_i\boldsymbol\beta)^{\delta_i}\Big(\exp(-\exp(\mathbf{x}_i\boldsymbol\beta)t_i) \Big)\\
\log\mathcal{L}(\boldsymbol\beta | Y, \mathbf{X}) &= \sum_{i = 1}^n \delta_i(\mathbf{x}_i\boldsymbol\beta) - \exp(\mathbf{x}_i\boldsymbol\beta)t_i
\end{align*}`

---

### Fitting a PH model

```r
library(eha)
ph_exp <- phreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
               data = dig, dist = "weibull", shape = 1)
ph_exp
```

```
## Call:
## phreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
##     data = dig, dist = "weibull", shape = 1)
## 
## Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
## TRTMT               0.518    -0.299     0.742     0.042     0.000 
## EJFPER             29.455    -0.040     0.960     0.002     0.000 
## PREVMI              0.645     0.050     1.051     0.044     0.254 
## 
## log(scale)                    6.629               0.075     0.000 
## 
##  Shape is fixed at  1 
## 
## Events                    2332 
## Total time at risk        6092212 
## Max. log. likelihood      -20512 
## LR test statistic         337.51 
## Degrees of freedom        3 
## Overall p-value           0
```

.vocab[
How might we interpret the coefficient estimates here? 
]

---

### Comparing to the AFT model

```r
library(survival)
aft_e <- survreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
                 data = dig, dist = "exponential")
summary(aft_e)
```

```
## 
## Call:
## survreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
##     data = dig, dist = "exponential")
##                Value Std. Error     z       p
## (Intercept)  6.62850    0.07495 88.43 < 2e-16
## TRTMT        0.29856    0.04167  7.17 7.7e-13
## EJFPER       0.04037    0.00239 16.92 < 2e-16
## PREVMI      -0.04978    0.04364 -1.14    0.25
## 
## Scale fixed at 1 
## 
## Exponential distribution
## Loglik(model)= -20511.5   Loglik(intercept only)= -20680.3
## 	Chisq= 337.51 on 3 degrees of freedom, p= 7.6e-73 
## Number of Newton-Raphson Iterations: 5 
## n=6799 (1 observation deleted due to missingness)
```