class: center, middle, inverse, title-slide # Introduction to survival analysis (2) ### Yue Jiang ### Duke University --- ### A disclaimer Today's (and last time's) lectures are introductory surface level treatments of survival analysis. We focus on applications and use cases -- there are no theoretical results presented (even for important subjects like variance estimation). There is much to discuss regarding survival analysis both theoretically and in application. In STA 440, we will focus on using and implementing commonly used methods to tackle real-world datasets instead of focusing on theoretical considerations. --- ### The DIG Trial <img src="img/02/dig_trial.png" width="120%" /> Investigators compared the **primary outcome** of the number of days from the start of the study to either death or hospitalization from worsening heart failure. --- ### Representing survival data Underlying data: - `\(T\)`: Failure time, a non-negative random variable - `\(C\)`: Censoring time, a non-negative random variable Observed data for individual `\(i\)`: - `\(Y_i\)`: `\((T_i \wedge C_i)\)`, the minimum of `\(T_i\)` and `\(C_i\)` - `\(\delta_i\)`: `\(1_{(T_i \le C_i)}\)`, whether we observe a failure If `\(\delta_i = 0\)`, then we have .vocab[right-censoring]: the survival time is longer than the censoring time. Commonly, we assume `\(C_i\)` are *i.i.d.* random variables with some distribution and that the censoring mechanism is *independent* of the failure mechanism. --- ### Characterizing continuous `\(T\)` - Density function: `\(f(t) = \lim_{\Delta t \to 0^+} \frac{P(t \le T < t + \Delta t)}{\Delta t}\)` - Distribution function: `\(F(t) = P(T \le t) = \int_0^t f(s)ds\)` - Survival function: `\(S(t) = P(T > t) = 1 - F(t)\)` - Hazard function: `\(\lambda(t) = \lim_{\Delta t \to 0^+} \frac{P(t \le T < t + \Delta t | T \ge t)}{\Delta t}\)` - Cumulative hazard function: `\(\Lambda(t) = \int_0^t \lambda(s)ds\)` Knowing one is equivalent to knowing the others. --- ### Hazard distributions Exponential distribution: - `\(f(t) = \lambda e^{-\lambda t}\)` (don't get the rate parameter `\(\lambda\)` confused with the hazard) - `\(F(t) = 1 - e^{-\lambda t}\)` - `\(S(t) = e^{-\lambda t}\)` - `\(\lambda(t) = \lambda\)` - `\(\Lambda(t) = \lambda t\)` .vocab[ What do you notice about the hazard for survival times that have an exponential distribution? ] --- ### Hazard distributions Weibull distribution: - `\(f(t) = p\lambda^p t^{p - 1}e^{-(\lambda t)^p}\)` - `\(F(t) = 1 - e^{-(\lambda t)^p}\)` - `\(S(t) = e^{-(\lambda t)^p}\)` - `\(\lambda(t) = p\lambda^p t^{p - 1}\)` - `\(\Lambda(t) = (\lambda t)^p\)` When the shape parameter `\(p\)` is 1, then we have the exponential distribution. The hazard increases monotonically over time if `\(p > 1\)` and decreases monotonically if `\(p < 1\)` (is this reasonable to assume?). --- ### Hazard distributions Plotting Weibull hazard with `\(\lambda = 1\)` and various shape parameters `\(p\)`: <img src="survival-2_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- ### Hazard distributions Log-normal distribution: - `\(f(t) = \frac{1}{t \sigma}\phi\left(\frac{\log(t) - \mu}{\sigma} \right)\)` - `\(F(t) = \Phi\left(\frac{\log(t) - \mu}{\sigma} \right)\)` - `\(S(t) = 1 - F(t)\)` - `\(\lambda(t) = f(t)/S(t)\)` - `\(\Lambda(t) = \int_0^t \lambda(s)ds\)` --- ### Hazard distributions Plotting log-normal hazard with `\(\mu = 0\)` and `\(\sigma^2 = 1\)`: <img src="survival-2_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- ### Review: comparing multiple groups ```r library(survminer) ggsurvplot(survfit(Surv(DWHFDAYS, DWHF) ~ TRTMT, data = dig), xlab = "Days", ylab = "Est. Survival Probability", ylim = c(0, 1), conf.int = T, censor = F, legend.labs = c("Placebo", "Digoxin")) ``` <img src="survival-2_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- ### Review: comparing multiple groups ```r survdiff(Surv(DWHFDAYS, DWHF) ~ TRTMT, data = dig) ``` ``` ## Call: ## survdiff(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT, data = dig) ## ## N Observed Expected (O-E)^2/E (O-E)^2/V ## TRTMT=0 3403 1291 1126 24.1 46.6 ## TRTMT=1 3397 1041 1206 22.5 46.6 ## ## Chisq= 46.6 on 1 degrees of freedom, p= 9e-12 ``` .question[ How might we adjust for potential confounders? Is there any way to create a predictive model for survival time? ] --- ### An accelerated failure time model An .vocab[accelerated failure time] (AFT) model assumes `\begin{align*} \log(T_i) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p + \epsilon_i \end{align*}` where `\(\epsilon_i\)` are commonly assumed to be i.i.d. and follow some specified distribution. There is a one-to-one relationship between the distribution of `\(T\)` and the assumed error distribution in the AFT model. For instance, if `\(\epsilon\)` has a normal distribution, then `\(T\)` has a log-normal distribution. If `\(\epsilon\)` has a gen. EV distribution, then `\(T\)` has a Weibull distribution, etc. In software packages, these models are often fit by specifying the distribution of `\(T\)`. --- ### An accelerated failure time model Note that we can also write the AFT model as `\begin{align*} T_i &= \exp\left(\beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \cdots + \beta_px_{pi} \right)e^{\epsilon_i}\\ &= e^{\beta_0}e^{\beta_1x_{1i}}e^{\beta_2x_{2i}}\cdots e^{\beta_px_{pi}}e^{\epsilon_i} \end{align*}` Covariates in an AFT model have a multiplicative effect on *time*. For instance, if `\(\beta_k = 0.4\)`, then `\(\exp(\beta_k) \approx 1.5\)`. Holding all else equal, an individual with covariate `\(x_k\)` one unit greater than another is expected to survive approximately 1.5 times longer than the other. --- ### Fitting an AFT model ```r library(survival) aft_e <- survreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, data = dig, dist = "exponential") summary(aft_e) ``` ``` ## ## Call: ## survreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, ## data = dig, dist = "exponential") ## Value Std. Error z p ## (Intercept) 6.62850 0.07495 88.43 < 2e-16 ## TRTMT 0.29856 0.04167 7.17 7.7e-13 ## EJF_PER 0.04037 0.00239 16.92 < 2e-16 ## PREVMI -0.04978 0.04364 -1.14 0.25 ## ## Scale fixed at 1 ## ## Exponential distribution ## Loglik(model)= -20511.5 Loglik(intercept only)= -20680.3 ## Chisq= 337.51 on 3 degrees of freedom, p= 7.6e-73 ## Number of Newton-Raphson Iterations: 5 ## n=6799 (1 observation deleted due to missingness) ``` --- ### Fitting an AFT model ```r library(survival) aft_w <- survreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, data = dig, dist = "weibull") summary(aft_w) ``` ``` ## ## Call: ## survreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, ## data = dig, dist = "weibull") ## Value Std. Error z p ## (Intercept) 6.58614 0.10266 64.15 < 2e-16 ## TRTMT 0.39910 0.05768 6.92 4.5e-12 ## EJF_PER 0.05263 0.00337 15.61 < 2e-16 ## PREVMI -0.06661 0.06001 -1.11 0.27 ## Log(scale) 0.31844 0.01891 16.84 < 2e-16 ## ## Scale= 1.37 ## ## Weibull distribution ## Loglik(model)= -20351.2 Loglik(intercept only)= -20504.9 ## Chisq= 307.43 on 3 degrees of freedom, p= 2.5e-66 ## Number of Newton-Raphson Iterations: 5 ## n=6799 (1 observation deleted due to missingness) ``` --- ### A proportional hazards model `\begin{align*} \lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right) \end{align*}` where the .vocab[baseline hazard] is assumed to have some distribution (or maybe not!...more on that in just a bit). Covariates in a PH model have a multiplicative effect on *hazards*. For instance, if `\(\beta_k = 0.4\)`, then `\(\exp(\beta_k) \approx 1.5\)`. Holding all else equal, an individual with covariate `\(x_k\)` one unit greater than another is expected to have approximately 1.5 times the *hazard* of the event than the other. .question[ Would you rather have a higher linear predictor or a lower linear predictor in a PH model? How does this compare to the AFT model? ] --- ### A proportional hazards model <img src="survival-2_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> --- ### Fitting a PH model ```r library(eha) ph_ln <- phreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, data = dig, dist = "lognormal") ph_ln ``` ``` ## Call: ## phreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, ## data = dig, dist = "lognormal") ## ## Covariate W.mean Coef Exp(Coef) se(Coef) Wald p ## (Intercept) 4.480 1.109 0.000 ## TRTMT 0.518 -0.290 0.748 0.042 0.000 ## EJF_PER 29.455 -0.038 0.963 0.002 0.000 ## PREVMI 0.645 0.048 1.049 0.044 0.271 ## ## log(scale) 15.421 2.721 0.000 ## log(shape) -1.358 0.123 0.000 ## ## Events 2332 ## Total time at risk 6092212 ## Max. log. likelihood -20339 ## LR test statistic 303.79 ## Degrees of freedom 3 ## Overall p-value 0 ``` .vocab[ How might we interpret the coefficient estimates here? How do they relate to our previous models? ] --- ### The Cox proportional hazards model `\begin{align*} \lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right) \end{align*}` In a parametric survival model (such as ones we've seen), the survival times are assumed to follow a specific distribution, which is a fairly strong assumption. -- Often times, we only care about the `\(\beta\)` terms and not so much the `\(\lambda_0\)`. Using the concept of partial likelihood, Cox (1972) found that we can "separate" inference for the `\(\beta\)` terms from `\(\lambda_0\)`. The Cox model is a *semiparametric* survival model; `\(\lambda_0(t)\)` is left completely unspecified, with no assumptions made on its shape. (A semi-parametric version of the AFT model also exists, but isn't very popular). --- ### The Cox proportional hazards model - By far the most commonly used regression model for survival data - Attractive interpretations using hazard ratios - Can be extended to accommodate time-varying covariates, recurrent events, etc. - Fewer assumptions than fully parametric models, but still requires PH assumption - Can compare nested models by using partial likelihood ratio statistic, which has asumptotic `\(\chi^2\)` distribution (df = difference in number of parameters) --- ### Fitting the Cox PH model ```r coxm1 <- coxph(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, data = dig) summary(coxm1) ``` ``` ## Call: ## coxph(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJF_PER + PREVMI, ## data = dig) ## ## n= 6799, number of events= 2332 ## (1 observation deleted due to missingness) ## ## coef exp(coef) se(coef) z Pr(>|z|) ## TRTMT -0.289745 0.748454 0.041666 -6.954 3.55e-12 *** ## EJF_PER -0.038061 0.962654 0.002381 -15.985 < 2e-16 *** ## PREVMI 0.047921 1.049087 0.043638 1.098 0.272 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## TRTMT 0.7485 1.3361 0.6898 0.8121 ## EJF_PER 0.9627 1.0388 0.9582 0.9672 ## PREVMI 1.0491 0.9532 0.9631 1.1428 ## ## Concordance= 0.605 (se = 0.006 ) ## Likelihood ratio test= 304.4 on 3 df, p=<2e-16 ## Wald test = 302.5 on 3 df, p=<2e-16 ## Score (logrank) test = 306.2 on 3 df, p=<2e-16 ``` .question[ How might we interpret these coefficients? How do they compare to our previous models? ] --- ### Fitting the Cox PH model The strength of the Cox model is that we can ignore estimation of `\(\lambda_0\)` completely (it doesn't matter for valid inference on the `\(\beta\)`s). If we want to estimate survival probabilities, then we must estimate the baseline hazard. A non-parametric estimator (the .vocab[Breslow estimator]) is implemented by the `basehaz` function in the `survival` package (confusingly, this is of the *cumulative* hazard). It is given by: `\begin{align*} \hat{\Lambda}_0(t) = \sum_{i:t_{(i)} < t} \frac{1}{\sum_{j \in R_i} \exp(\mathbf{X}_j\beta)} \end{align*}` We can then estimate survival in the Cox model by: `\begin{align*} \hat{S}(t|\mathbf{X}) = \exp(-\exp(\mathbf{X}\beta) \hat{\Lambda}_0(t)) \end{align*}` --- ### Cox model diagnostics Recall the inverse CDF result: if `\(T_i\)` has survival function `\(S_i(t)\)`, then `\(S_i(T_i)\)` should have a uniform distribution on (0, 1) and `\(\Lambda_i(T_i)\)` should follow Exp(1) distribution. Thus, if the model is correct, then the estimated cumulative hazard `\(\hat{\Lambda}\)` at observed times should be a censored sample from Exp(1). These `\(\hat{\Lambda}_i(Y_i)\)` are known as .vocab[Cox-Snell residuals]. -- Plotting `\(\log(-\log\hat{S}(Y_i))\)` against `\(\log(Y_i)\)` should thus follow a straight line through the origin at a 45-degree angle. Although Cox-Snell residuals can also be used for other models (e.g., checking whether distribution specified in AFT model is appropriate), they're not too useful in practice (for a variety of reasons). --- ### Cox model diagnostics To assess PH assumption, we can examine .vocab[Schoenfeld residuals]. Intuitively, each Schoenfeld residual is the difference between the observed covariate and the expected covariate for each observed failure, conditioned on the risk set at that time. In plotting Schoenfeld residuals vs. survival times, we expect to see them randomly distributed around 0. --- ### Cox model diagnostics .vocab[Martingale residuals] are based on the difference between observed number of events up until time `\(Y_i\)` and the expected count based on the fitted model. .vocab[Deviance residuals] are a normalized transformation of the martingale residuals that correct their skewness. They should be randomly distributed around mean 0 with a variance of 1. In practice, these residuals are useful for finding potential outliers: negative values "lived longer than expected" and positive values "died too soon." --- ### Cox model diagnostics ```r library(survminer) ggcoxdiagnostics(coxm1, type = "schoenfeld") ``` <!-- --> --- ### Cox model diagnostics ```r library(survminer) ggcoxdiagnostics(coxm1, type = "deviance", linear.predictions = F) ``` <!-- -->