class: center, middle, inverse, title-slide .title[ # Adding a variable ] .author[ ### Yue Jiang ] .date[ ### STA 490/690 ] --- ### Midterm data ``` ## Call: ## coxph(formula = Surv(tstart, tstop, recur) ~ hrt + age + er + ## size, data = dat) ## ## n= 521, number of events= 153 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## hrt 0.4466339 1.5630419 0.1873331 2.384 0.01712 * ## age -0.0259687 0.9743656 0.0092221 -2.816 0.00486 ** ## er -0.0005051 0.9994950 0.0006654 -0.759 0.44778 ## size 0.0099042 1.0099534 0.0050970 1.943 0.05200 . ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## hrt 1.5630 0.6398 1.0827 2.2565 ## age 0.9744 1.0263 0.9569 0.9921 ## er 0.9995 1.0005 0.9982 1.0008 ## size 1.0100 0.9901 0.9999 1.0201 ## ## Concordance= 0.621 (se = 0.024 ) ## Likelihood ratio test= 18.54 on 4 df, p=0.001 ## Wald test = 18.65 on 4 df, p=9e-04 ## Score (logrank) test = 18.84 on 4 df, p=8e-04 ``` --- ### Midterm data ``` ## Call: ## coxph(formula = Surv(tstart, tstop, recur) ~ age + er + size, ## data = dat) ## ## n= 521, number of events= 153 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## age -0.0223473 0.9779006 0.0089966 -2.484 0.0130 * ## er -0.0005185 0.9994816 0.0006668 -0.778 0.4368 ## size 0.0101590 1.0102107 0.0051027 1.991 0.0465 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## age 0.9779 1.0226 0.9608 0.9953 ## er 0.9995 1.0005 0.9982 1.0008 ## size 1.0102 0.9899 1.0002 1.0204 ## ## Concordance= 0.594 (se = 0.024 ) ## Likelihood ratio test= 13.13 on 3 df, p=0.004 ## Wald test = 13.32 on 3 df, p=0.004 ## Score (logrank) test = 13.47 on 3 df, p=0.004 ``` --- ### Two models `\begin{align*} \lambda_i(t) &= \lambda_0(t)\exp(\beta_1\times HRT_i + \beta_2 \times age_i + \beta_3 \times ER_i + \beta_4 \times size_i)\\ \lambda_i(t) &= \lambda_0(t)\exp(0 \times HRT_i + \beta_2 \times age_i + \beta_3 \times ER_i + \beta_4 \times size_i) \end{align*}` We calculate the MLE in two models: an unrestricted model (the larger model) and a restricted model (the smaller model, where the `\(\beta\)` for HRT is restricted to be 0). .question[ Which model corresponds to the null hypothesis? Why does R report things like "negative 2 log-likelihood"? ] --- ### The Likelihood Ratio Test If the null hypothesis is true, then the (log) likelihood shouldn't change too much from the unrestricted MLE to the restricted MLE.
That is, the ratio between the likelihoods should be close to 1 (the difference between the log-likelihoods should be close to 0) if the null is true: `\begin{align*} \Lambda = -2\left\{\log\left(\sup_{\theta \in \Theta_0}\mathcal{L}(\theta)\right) - \log\left(\sup_{\theta \in \Theta}\mathcal{L}(\theta)\right)\right\} \end{align*}` --- ### The Likelihood Ratio Test Take a Taylor expansion of `\(l(\theta_0)\)` around `\(\widehat{\theta}_n\)`: `\begin{align*} l(\theta_0) &= l(\widehat{\theta}_n) + l^\prime(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n) + \frac{1}{2} l^{\prime\prime}(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n)^2 +\\ &\mathrel{\phantom{=}}\frac{1}{6}l^{\prime\prime\prime}(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n)^3 + \cdots\\ &= l(\widehat{\theta}_n) + l^\prime(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n) + \frac{1}{2} l^{\prime\prime}(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n)^2 +\\ &\mathrel{\phantom{=}}\frac{1}{6}l^{\prime\prime\prime}(\tilde{\theta})(\theta_0 - \widehat{\theta}_n)^3 \end{align*}` using the Lagrange form of the remainder term. .question[ So far we haven't mentioned any assumptions. What might be required? 
] --- ### The Likelihood Ratio Test Some assumptions, given that `\(H_0\)` is true: - The support does not depend on the parameter - The true `\(\theta_0\)` is not on the boundary of the parameter space - The relevant derivatives of the log-likelihood exist, and the Fisher information is non-zero and finite - The third derivative of the log-likelihood function close to `\(\theta_0\)` "behaves nicely" --- ### The Likelihood Ratio Test Take a Taylor expansion of `\(l(\theta_0)\)` around `\(\widehat{\theta}_n\)`: `\begin{align*} l(\theta_0) &= l(\widehat{\theta}_n) + l^\prime(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n) + \frac{1}{2} l^{\prime\prime}(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n)^2 +\\ &\mathrel{\phantom{=}}\frac{1}{6}l^{\prime\prime\prime}(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n)^3 + \cdots\\ &= l(\widehat{\theta}_n) + l^\prime(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n) + \frac{1}{2} l^{\prime\prime}(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n)^2 +\\ &\mathrel{\phantom{=}}\frac{1}{6}l^{\prime\prime\prime}(\tilde{\theta})(\theta_0 - \widehat{\theta}_n)^3 \end{align*}` using the Lagrange form of the remainder term. .question[ What is `\(l^\prime(\widehat{\theta}_n)\)` at the MLE? What are some properties of the MLE? What's going on with `\(l^{\prime\prime\prime}(\tilde{\theta})\)`? 
] --- ### The Likelihood Ratio Test (we did a lot of board work in class, but summary below) `\begin{align*} l(\theta_0) &= l(\widehat{\theta}_n) + \frac{1}{2} l^{\prime\prime}(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n)^2 + o_p(1)\\ -2(l(\theta_0) - l(\widehat{\theta}_n)) &= -l^{\prime\prime}(\widehat{\theta}_n)(\theta_0 - \widehat{\theta}_n)^2 + o_p(1) \\ &= \underbrace{-\frac{1}{nI(\theta_0)}l^{\prime\prime}(\widehat{\theta}_n)}_{\to_p 1}\left( \underbrace{\sqrt{nI(\theta_0)}(\theta_0 - \widehat{\theta}_n)}_{\to_d N(0, 1)}\right)^2 + o_p(1)\\ &\to_d \chi^2_1 \end{align*}` --- ### The Likelihood Ratio Test ``` r anova(m1, m2) ``` ``` ## Analysis of Deviance Table ## Cox model: response is Surv(tstart, tstop, recur) ## Model 1: ~ hrt + age + er + size ## Model 2: ~ age + er + size ## loglik Chisq Df Pr(>|Chi|) ## 1 -816.31 ## 2 -819.01 5.4129 1 0.01999 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` We skipped a bunch of technical details (some we covered in class, but not all of them); the proof for the multi-dimensional case is way, way worse.
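--- ### The Likelihood Ratio Test As a quick sanity check of the asymptotics (a sketch, not the midterm data): for `\(N(\mu, 1)\)` observations with `\(H_0: \mu = 0\)`, the unrestricted MLE is the sample mean, and we can simulate `\(\Lambda\)` under the null and compare it to the `\(\chi^2_1\)` reference distribution:

``` r
# Sketch with simulated N(mu, 1) data, not the midterm data.
# Under H0: mu = 0, Lambda = -2 * (restricted - unrestricted log-likelihood)
# should behave like a chi-squared draw with 1 df.
set.seed(490)
n <- 100
loglik <- function(mu, x) sum(dnorm(x, mean = mu, sd = 1, log = TRUE))

lambda <- replicate(5000, {
  x <- rnorm(n, mean = 0)                   # data generated under H0
  -2 * (loglik(0, x) - loglik(mean(x), x))  # restricted minus unrestricted
})

# Empirical rejection rate at the chi^2_1 critical value:
mean(lambda > qchisq(0.95, df = 1))
```

The empirical rejection rate should land near 0.05. This is exactly the comparison `anova()` carried out above, with the partial log-likelihoods of the two nested `coxph` fits playing the role of the restricted and unrestricted log-likelihoods.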