Introduction to survival analysis (4)

class: center, middle, inverse, title-slide

# Introduction to survival analysis (4)
### Yue Jiang
### Duke University

---

### The DIG Trial

Investigators compared the **primary outcome** of the number of days from the
start of the study to either death or hospitalization from worsening heart
failure.

---

### Review: accelerated failure time model

An .vocab[accelerated failure time] (AFT) model assumes

`\begin{align*}
\log(T_i) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p + \epsilon_i
\end{align*}`

where `\(\epsilon_i\)` are commonly assumed to be i.i.d. and follow some specified
distribution.

Covariates in an AFT model have a multiplicative effect on *time*. For instance,
if `\(\beta_k = 0.4\)`, then `\(\exp(\beta_k) \approx 1.5\)`. Holding all else equal,
an individual with covariate `\(x_k\)` one unit greater than another is expected
to survive approximately 1.5 times longer than the other.

---

### Review: fitting an AFT model

```r
library(survival)
aft_w <- survreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, 
                 data = dig, dist = "lognormal")
summary(aft_w)
```

```
## 
## Call:
## survreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, 
##     data = dig, dist = "lognormal")
##                Value Std. Error     z       p
## (Intercept)  6.06283    0.12297 49.30 < 2e-16
## TRTMT        0.55329    0.06741  8.21 2.2e-16
## EJF_PER      0.05932    0.00385 15.42 < 2e-16
## PREVMI      -0.11991    0.07059 -1.70   0.089
## Log(scale)   0.82037    0.01636 50.14 < 2e-16
## 
## Scale= 2.27 
## 
## Log Normal distribution
## Loglik(model)= -20343.2   Loglik(intercept only)= -20501
## 	Chisq= 315.72 on 3 degrees of freedom, p= 3.9e-68 
## Number of Newton-Raphson Iterations: 3 
## n=6799 (1 observation deleted due to missingness)
```

---

### Review: visualizing the AFT model

---

### Review: proportional hazards model

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`

where `\(\lambda_0\)` is the .vocab[baseline hazard]. We've seen both parametric
and semi-parametric versions of this model.

Covariates in a PH model have a multiplicative effect on *hazards*. For
instance, if `\(\beta_k = 0.4\)`, then `\(\exp(\beta_k) \approx 1.5\)`. Holding all else 
equal, an individual with covariate `\(x_k\)` one unit greater than another is 
expected to have approximately 1.5 times the *hazard* of the event than the
other. This multiplicative effect holds *regardless* of the time `\(t\)`.

---

### Review: fitting a PH model

```r
library(eha)
ph_ln <- phreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, 
               data = dig, dist = "lognormal")
ph_ln
```

```
## Call:
## phreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, 
##     data = dig, dist = "lognormal")
## 
## Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
## (Intercept)                   4.480               1.109     0.000 
## TRTMT               0.518    -0.290     0.748     0.042     0.000 
## EJF_PER            29.455    -0.038     0.963     0.002     0.000 
## PREVMI              0.645     0.048     1.049     0.044     0.271 
## 
## log(scale)                   15.421               2.721     0.000 
## log(shape)                   -1.358               0.123     0.000 
## 
## Events                    2332 
## Total time at risk        6092212 
## Max. log. likelihood      -20339 
## LR test statistic         303.79 
## Degrees of freedom        3 
## Overall p-value           0
```

---

### Review: fitting the Cox PH model

```r
coxm1 <- coxph(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, 
                    data = dig)
summary(coxm1)
```

```
## Call:
## coxph(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, 
##     data = dig)
## 
##   n= 6799, number of events= 2332 
##    (1 observation deleted due to missingness)
## 
##              coef exp(coef)  se(coef)       z Pr(>|z|)    
## TRTMT   -0.289745  0.748454  0.041666  -6.954 3.55e-12 ***
## EJF_PER -0.038061  0.962654  0.002381 -15.985  < 2e-16 ***
## PREVMI   0.047921  1.049087  0.043638   1.098    0.272    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##         exp(coef) exp(-coef) lower .95 upper .95
## TRTMT      0.7485     1.3361    0.6898    0.8121
## EJF_PER    0.9627     1.0388    0.9582    0.9672
## PREVMI     1.0491     0.9532    0.9631    1.1428
## 
## Concordance= 0.605  (se = 0.006 )
## Likelihood ratio test= 304.4  on 3 df,   p=<2e-16
## Wald test            = 302.5  on 3 df,   p=<2e-16
## Score (logrank) test = 306.2  on 3 df,   p=<2e-16
```

---

### Review: visualizing the PH model

---

### Bayesian survival analysis

As in all Bayesian models, we must specify our prior information for various
model components.

.question[
What pieces of the model should have priors specified on them? What types of 
prior might be reasonable?
]

- Intercept (baseline hazard)
- Regression "slopes"
- Additional distribution-specific parameters (e.g., shape parameters for 
Weibull)

We can extend models to more flexible situations, but these'll suffice for the
vast majority of time-independent regression situations you'll encounter.

---

### Back to `rstanarm`

```r
# you may need to change dir
remove.packages("rstanarm", lib="~/R/win-library/4.1") 
install.packages("rstanarm", 
                 repos = c("https://mc-stan.org/r-packages/", 
                           getOption("repos")))
```

---

### Back to `rstanarm`

```r
library(rstanarm)
m1 <- stan_surv(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, 
               data = dig, basehaz = "weibull",
               prior = normal(autoscale = T),
               prior_intercept = normal(autoscale = T),
               prior_aux = cauchy(0, 5, autoscale = T),
               chains = 2, iter = 2000, seed = 123,
               prior_PD = F)
```

```
## 
## SAMPLING FOR MODEL 'surv' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 51.431 seconds (Warm-up)
## Chain 1:                57.209 seconds (Sampling)
## Chain 1:                108.64 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'surv' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.002 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 20 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 55.483 seconds (Warm-up)
## Chain 2:                49.767 seconds (Sampling)
## Chain 2:                105.25 seconds (Total)
## Chain 2:
```

See documentation [here](https://cran.r-project.org/web/packages/rstanarm/rstanarm.pdf) 
for further details. By default, PH is assumed.

---

### Back to `rstanarm`

```r
prior_summary(m1)
```

```
## Priors for model 'm1' 
## ------
## Intercept
##  ~ normal(location = 0, scale = 20)
## 
## Coefficients
##   Specified prior:
##     ~ normal(location = [0,0,0], scale = [2.5,2.5,2.5])
##   Adjusted prior:
##     ~ normal(location = [0,0,0], scale = [2.50,0.28,2.50])
## 
## Auxiliary (weibull-shape)
##  ~ half-cauchy(location = 0, scale = 5)
## ------
## See help('prior_summary.stanreg') for more details
```

---

### Back to `rstanarm`

```r
summary(m1)
```

```
## 
## Model Info:
## 
##  function:        stan_surv
##  baseline hazard: weibull
##  formula:         Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI
##  algorithm:       sampling
##  sample:          2000 (posterior sample size)
##  priors:          see help('prior_summary')
##  observations:    6799
##  events:          2332 (34.3%)
##  right censored:  4467 (65.7%)
##  delayed entry:   no
## 
## Estimates:
##                 mean   sd   10%   50%   90%
## (Intercept)   -4.8    0.1 -4.9  -4.8  -4.6 
## TRTMT         -0.3    0.0 -0.3  -0.3  -0.2 
## EJF_PER        0.0    0.0  0.0   0.0   0.0 
## PREVMI         0.1    0.0  0.0   0.0   0.1 
## weibull-shape  0.7    0.0  0.7   0.7   0.7 
## 
## MCMC diagnostics
##               mcse Rhat n_eff
## (Intercept)   0.0  1.0  1045 
## TRTMT         0.0  1.0  1723 
## EJF_PER       0.0  1.0  1961 
## PREVMI        0.0  1.0  1707 
## weibull-shape 0.0  1.0  1066 
## log-posterior 0.1  1.0   662 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
```

---

### Back to `rstanarm`

```r
round(as.data.frame(summary(m1)), 2)
```

```
##                    mean mcse   sd       10%       50%       90% n_eff Rhat
## (Intercept)       -4.79 0.00 0.12     -4.94     -4.79     -4.64  1045    1
## TRTMT             -0.29 0.00 0.04     -0.34     -0.29     -0.23  1723    1
## EJF_PER           -0.04 0.00 0.00     -0.04     -0.04     -0.04  1961    1
## PREVMI             0.05 0.00 0.04     -0.01      0.05      0.11  1707    1
## weibull-shape      0.73 0.00 0.01      0.71      0.73      0.74  1066    1
## log-posterior -20363.45 0.06 1.61 -20365.69 -20363.11 -20361.73   662    1
```

---

### Back to `rstanarm`

```r
plot(m1, plotfun = "basehaz")
```

---

### Back to `rstanarm`

```r
ps_check(m1)
```

---

Bakc to `rstanarm`

```r
plot(m1, "dens_overlay")
```

---

### Back to `rstanarm`

```r
plot(m1, "trace")
```

---

### Back to `rstanarm`

```r
plot(m1, "trace", pars = "TRTMT") +
  labs(title = "Trace plot: Digoxin", 
       y = "Estimate", x = "Draw")
```

---

### An AFT model

```r
m2 <- stan_surv(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, 
               data = dig, basehaz = "weibull-aft",
               prior = normal(autoscale = T),
               prior_intercept = normal(autoscale = T),
               prior_aux = cauchy(0, 5, autoscale = T),
               chains = 2, iter = 2000, seed = 123,
               prior_PD = F)
```

```
## 
## SAMPLING FOR MODEL 'surv' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.002 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 20 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 21.788 seconds (Warm-up)
## Chain 1:                19.422 seconds (Sampling)
## Chain 1:                41.21 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'surv' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.003 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 30 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 22.586 seconds (Warm-up)
## Chain 2:                20.415 seconds (Sampling)
## Chain 2:                43.001 seconds (Total)
## Chain 2:
```

---

### Comparing model results

```r
round(as.data.frame(summary(m1))[,c(1, 3, 7)], 3)
```

```
##                     mean    sd n_eff
## (Intercept)       -4.792 0.117  1045
## TRTMT             -0.289 0.042  1723
## EJF_PER           -0.038 0.002  1961
## PREVMI             0.051 0.044  1707
## weibull-shape      0.727 0.013  1066
## log-posterior -20363.455 1.610   662
```

```r
round(as.data.frame(summary(m2))[,c(1, 3, 7)], 3)
```

```
##                     mean    sd n_eff
## (Intercept)        6.591 0.102  2358
## TRTMT              0.400 0.057  1945
## EJF_PER            0.053 0.003  1800
## PREVMI            -0.067 0.062  2344
## weibull-shape      0.727 0.014  1057
## log-posterior -20363.427 1.549  1113
```