class: center, middle, inverse, title-slide .title[ # Parametric AFT models ] .author[ ### Yue Jiang ] .date[ ### STA 490/690 ] --- ### Review: GLMs <img src="img/bike_crash.jpg" width="100%" /> <!-- .center[Image credit: Petar Milošević, [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0), via Wikimedia Commons] --> --- ### Data ``` ## # A tibble: 100 × 2 ## county crashes ## <chr> <dbl> ## 1 Alamance 77 ## 2 Alexander 1 ## 3 Alleghany 1 ## 4 Anson 7 ## 5 Ashe 4 ## 6 Avery 5 ## 7 Beaufort 37 ## 8 Bertie 10 ## 9 Bladen 9 ## 10 Brunswick 88 ## # ℹ 90 more rows ``` Suppose we thought these crashes came from a Poisson distribution. .question[ How might we estimate the parameter of that Poisson distribution, given our observed data? ] --- ### Maximum likelihood estimation We can maximize the .vocab[likelihood function]. Assuming the observations are i.i.d., in general we have: `\begin{align*} \mathcal{L}(\lambda | Y) &= f(y_1, y_2, \cdots, y_n | \lambda)\\ &= f(y_1 | \lambda)f(y_2 | \lambda) \cdots f(y_n | \lambda)\\ &= \prod\limits^{n}_{i = 1} f(y_i | \lambda). \end{align*}` The likelihood function is the probability of "seeing our observed data," **given** a value of `\(\lambda\)`. Do not get `\(f(y_i | \lambda)\)` confused with `\(f(\lambda | y_i)\)`! --- ### Maximum likelihood estimation `\begin{align*} \mathcal{L}(\lambda | Y) &= \prod\limits^{n}_{i = 1} f(y_i | \lambda)\\ &= \prod\limits_{i = 1}^n \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}\\ \log \mathcal{L}(\lambda | Y) &= \sum_{i = 1}^n \left( y_i\log \lambda - \lambda - \log y_i! \right)\\ &= \log \lambda \sum_{i = 1}^n y_i - n\lambda - \sum_{i = 1}^n \log y_i! \end{align*}` --- ### Maximum likelihood estimation Setting the .vocab[score function] equal to 0: `\begin{align*} \frac{\partial}{\partial\lambda}\log \mathcal{L}(\lambda | Y) = \frac{1}{\lambda}\sum_{i = 1}^n y_i - n &\stackrel{set}{=} 0\\ \implies \hat{\lambda} = \frac{1}{n}\sum_{i = 1}^n y_i, \end{align*}` as expected.
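---

### Maximum likelihood estimation

We can check this result numerically. The sketch below (simulated counts, not the crash data) maximizes the Poisson log-likelihood with `optimize()` and compares the maximizer to the sample mean:

```r
# Simulated Poisson counts -- a stand-in for the crash data
set.seed(1)
y <- rpois(100, lambda = 12)

# Log-likelihood of lambda given the observed counts
loglik <- function(lambda) sum(dpois(y, lambda, log = TRUE))

# Numerically maximize, then compare to the closed-form MLE (the sample mean)
opt <- optimize(loglik, interval = c(0.01, 100), maximum = TRUE)
c(numeric = opt$maximum, closed_form = mean(y))
```

The two values agree up to optimizer tolerance.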
Next, let's verify that `\(\hat{\lambda}\)` is indeed a maximum: `\begin{align*} \frac{\partial^2}{\partial\lambda^2}\log \mathcal{L}(\lambda | Y) &= -\frac{1}{\lambda^2}\sum_{i = 1}^n y_i\\ &< 0. \end{align*}` --- ### Can we do better? ``` ## # A tibble: 100 × 6 ## county pop med_hh_income traffic_vol pct_rural crashes ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Alamance 166436 50.5 182 29 77 ## 2 Alexander 37353 49.1 13 73 1 ## 3 Alleghany 11161 39.7 28 100 1 ## 4 Anson 24877 38 79 79 7 ## 5 Ashe 27109 41.9 18 85 4 ## 6 Avery 17505 41.7 35 89 5 ## 7 Beaufort 47079 46.4 53 66 37 ## 8 Bertie 19026 35.4 24 83 10 ## 9 Bladen 33190 37 19 91 9 ## 10 Brunswick 136744 60.2 43 43 88 ## # ℹ 90 more rows ``` We might expect more populous, more urban counties to have more crashes. There might also be a relationship with traffic volume. .question[ Can we incorporate this additional information while accounting for potential confounding? ] --- ### Poisson regression `\begin{align*} \log\big(\underbrace{E(Y | \mathbf{X})}_{\lambda}\big) = \mathbf{X}\boldsymbol\beta \end{align*}` .vocab[Generalized linear model] often used for count (or rate) data - Assumes outcome has Poisson distribution - Canonical link: log of conditional expectation of response has linear relationship with predictors .question[ Can we differentiate the (log) likelihood function, set it equal to zero, and solve for the MLEs for `\(\boldsymbol\beta = (\beta_0, \beta_1, \cdots, \beta_p)\)` as before?
] --- ### Poisson regression `\begin{align*} \log \mathcal{L}&= \sum_{i = 1}^n \left( y_i\mathbf{X}_i\boldsymbol\beta - e^{\mathbf{X}_i\boldsymbol\beta} - \log y_i! \right)\\ \nabla \log \mathcal{L}&= \sum_{i = 1}^n \left(y_i - e^{\mathbf{X}_i\boldsymbol\beta}\right)\mathbf{X}_i^T\\ \nabla^2 \log \mathcal{L} &= -\sum_{i = 1}^n e^{\mathbf{X}_i\boldsymbol\beta}\mathbf{X}_i\mathbf{X}_i^T \end{align*}` Newton-Raphson update steps for Poisson regression: `\begin{align*} \boldsymbol\beta^{(t+1)} &= \boldsymbol\beta^{(t)} - \left(-\sum_{i = 1}^n e^{\mathbf{X}_i\boldsymbol\beta^{(t)}}\mathbf{X}_i\mathbf{X}_i^T \right)^{-1}\left(\sum_{i = 1}^n \left(y_i - e^{\mathbf{X}_i\boldsymbol\beta^{(t)}}\right)\mathbf{X}_i^T \right) \end{align*}` --- ### Back to survival models Could we use the following model for survival outcomes? `\begin{align*} T_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \cdots + \beta_px_{pi} + \epsilon_i \end{align*}` How about this model? `\begin{align*} \log(T_i) = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \cdots + \beta_px_{pi} + \epsilon_i \end{align*}` --- ### An accelerated failure time model An .vocab[accelerated failure time] (AFT) model assumes `\begin{align*} \log(T_i) = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \cdots + \beta_px_{pi} + \epsilon_i \end{align*}` where `\(\epsilon_i\)` are commonly assumed to be i.i.d. and follow some specified distribution. .question[ How might we interpret coefficients in the AFT model? How might we "know" what distribution `\(\epsilon\)` takes on? What might an AFT model look like graphically in terms of survival time? ] --- ### Estimation in survival models An individual's contribution to the likelihood is `\begin{align*} \lambda(t_i)^{\delta_i}S(t_i), \end{align*}` and so the full likelihood function is given by `\begin{align*} \mathcal{L}(\theta | Y) = \prod_{i = 1}^n \lambda(t_i)^{\delta_i}S(t_i) \end{align*}` for independently right-censored data with no truncation.
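---

### Estimation in survival models

As a numerical check (toy Weibull data, not course data), each contribution in its hazard-survival form `\(\lambda(t_i)^{\delta_i}S(t_i)\)` matches the equivalent density-survival form `\(f(t_i)^{\delta_i}S(t_i)^{1 - \delta_i}\)`, since `\(f(t) = \lambda(t)S(t)\)`:

```r
# Toy right-censored data from a Weibull with known parameters
set.seed(2)
shape <- 1.5; scale <- 10
t_obs <- rweibull(200, shape, scale)
delta <- rbinom(200, 1, 0.7)  # 1 = event observed, 0 = right-censored

# Weibull hazard and survival functions, evaluated at each observed time
haz  <- (shape / scale) * (t_obs / scale)^(shape - 1)
surv <- exp(-(t_obs / scale)^shape)

# Log-likelihood two ways: hazard-survival vs. density-survival form
ll_hazard  <- sum(delta * log(haz) + log(surv))
ll_density <- sum(delta * dweibull(t_obs, shape, scale, log = TRUE) +
                  (1 - delta) * pweibull(t_obs, shape, scale,
                                         lower.tail = FALSE, log.p = TRUE))
all.equal(ll_hazard, ll_density)  # TRUE
```

The two expressions agree term by term because an event contributes the density `\(f(t_i)\)` while a censored observation contributes only `\(S(t_i)\)`.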
--- ### Fitting an AFT model We've specified a functional form between `\(Y\)`, `\(\beta\)`, and `\(\mathbf{X}\)`. For the model `\begin{align*} \log(T_i) = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \cdots + \beta_px_{pi} + \epsilon_i, \end{align*}` .question[ What does `\(\epsilon_i\)` represent? Suppose we assumed `\(T_i\)` had an exponential distribution. What might the likelihood function look like (this will be homework, so don't spend TOO much time on this)? ] --- ### The DIG trial again... ```r names(dig) ``` ``` ## [1] "ID" "TRTMT" "AGE" "RACE" "SEX" "EJFPER" ## [7] "EJFMETH" "CHESTX" "BMI" "KLEVEL" "CREAT" "DIGDOSER" ## [13] "CHFDUR" "RALES" "ELEVJVP" "PEDEMA" "RESTDYS" "EXERTDYS" ## [19] "ACTLIMIT" "S3" "PULCONG" "NSYM" "HEARTRTE" "DIABP" ## [25] "SYSBP" "FUNCTCLS" "CHFETIOL" "PREVMI" "ANGINA" "DIABETES" ## [31] "HYPERTEN" "DIGUSE" "DIURETK" "DIURET" "KSUPP" "ACEINHIB" ## [37] "NITRATES" "HYDRAL" "VASOD" "DIGDOSE" "CVD" "CVDDAYS" ## [43] "WHF" "WHFDAYS" "DIG" "DIGDAYS" "MI" "MIDAYS" ## [49] "UANG" "UANGDAYS" "STRK" "STRKDAYS" "SVA" "SVADAYS" ## [55] "VENA" "VENADAYS" "CREV" "CREVDAYS" "OCVD" "OCVDDAYS" ## [61] "RINF" "RINFDAYS" "OTH" "OTHDAYS" "HOSP" "HOSPDAYS" ## [67] "NHOSP" "DEATH" "DEATHDAY" "REASON" "DWHF" "DWHFDAYS" ``` --- ### Fitting an AFT model ```r library(survival) aft_e <- survreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, data = dig, dist = "exponential") summary(aft_e) ``` ``` ## ## Call: ## survreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, ## data = dig, dist = "exponential") ## Value Std. 
Error z p ## (Intercept) 6.62850 0.07495 88.43 < 2e-16 ## TRTMT 0.29856 0.04167 7.17 7.7e-13 ## EJFPER 0.04037 0.00239 16.92 < 2e-16 ## PREVMI -0.04978 0.04364 -1.14 0.25 ## ## Scale fixed at 1 ## ## Exponential distribution ## Loglik(model)= -20511.5 Loglik(intercept only)= -20680.3 ## Chisq= 337.51 on 3 degrees of freedom, p= 7.6e-73 ## Number of Newton-Raphson Iterations: 5 ## n=6799 (1 observation deleted due to missingness) ``` --- ### Fitting an AFT model ```r library(survival) aft_w <- survreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, data = dig, dist = "weibull") summary(aft_w) ``` ``` ## ## Call: ## survreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, ## data = dig, dist = "weibull") ## Value Std. Error z p ## (Intercept) 6.58614 0.10266 64.15 < 2e-16 ## TRTMT 0.39910 0.05768 6.92 4.5e-12 ## EJFPER 0.05263 0.00337 15.61 < 2e-16 ## PREVMI -0.06661 0.06001 -1.11 0.27 ## Log(scale) 0.31844 0.01891 16.84 < 2e-16 ## ## Scale= 1.37 ## ## Weibull distribution ## Loglik(model)= -20351.2 Loglik(intercept only)= -20504.9 ## Chisq= 307.43 on 3 degrees of freedom, p= 2.5e-66 ## Number of Newton-Raphson Iterations: 5 ## n=6799 (1 observation deleted due to missingness) ``` --- ### Fitting an AFT model ```r library(survival) aft_n <- survreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, data = dig, dist = "lognormal") summary(aft_n) ``` ``` ## ## Call: ## survreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, ## data = dig, dist = "lognormal") ## Value Std. 
Error z p ## (Intercept) 6.06283 0.12297 49.30 < 2e-16 ## TRTMT 0.55329 0.06741 8.21 2.2e-16 ## EJFPER 0.05932 0.00385 15.42 < 2e-16 ## PREVMI -0.11991 0.07059 -1.70 0.089 ## Log(scale) 0.82037 0.01636 50.14 < 2e-16 ## ## Scale= 2.27 ## ## Log Normal distribution ## Loglik(model)= -20343.2 Loglik(intercept only)= -20501 ## Chisq= 315.72 on 3 degrees of freedom, p= 3.9e-68 ## Number of Newton-Raphson Iterations: 3 ## n=6799 (1 observation deleted due to missingness) ``` --- ### Fitting an AFT model ```r library(survival) aft_l <- survreg(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, data = dig, dist = "loglogistic") summary(aft_l) ``` ``` ## ## Call: ## survreg(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, ## data = dig, dist = "loglogistic") ## Value Std. Error z p ## (Intercept) 6.04825 0.11365 53.22 < 2e-16 ## TRTMT 0.48004 0.06243 7.69 1.5e-14 ## EJFPER 0.05570 0.00359 15.50 < 2e-16 ## PREVMI -0.08625 0.06517 -1.32 0.19 ## Log(scale) 0.19883 0.01835 10.84 < 2e-16 ## ## Scale= 1.22 ## ## Log logistic distribution ## Loglik(model)= -20342 Loglik(intercept only)= -20498.1 ## Chisq= 312.28 on 3 degrees of freedom, p= 2.2e-67 ## Number of Newton-Raphson Iterations: 4 ## n=6799 (1 observation deleted due to missingness) ```
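---

### Comparing AFT fits

All four models above are fit to the same data by maximum likelihood, so their fits can be compared with AIC; and in any AFT model, `\(e^{\beta_j}\)` acts multiplicatively on survival time (a "time ratio"). A sketch on simulated data follows; the same two calls apply directly to `aft_e`, `aft_w`, `aft_n`, and `aft_l`:

```r
library(survival)

# Simulated right-censored Weibull data with one binary covariate
set.seed(3)
n <- 500
x <- rbinom(n, 1, 0.5)
t_true <- rweibull(n, shape = 1.5, scale = exp(2 + 0.5 * x))
cens   <- runif(n, 0, 60)
time   <- pmin(t_true, cens)
status <- as.numeric(t_true <= cens)

fit_e <- survreg(Surv(time, status) ~ x, dist = "exponential")
fit_w <- survreg(Surv(time, status) ~ x, dist = "weibull")

AIC(fit_e, fit_w)  # lower AIC: better fit after penalizing extra parameters
exp(coef(fit_w))   # exp(beta): multiplicative effect on event time
```

Here the Weibull fit should win on AIC, since the data were generated with a non-unit shape parameter.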