class: center, middle, inverse, title-slide

.title[
# Partial Likelihood
]
.author[
### Yue Jiang
]
.date[
### STA 490/690
]

---

### A proportional hazards model

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`

.question[
When might we care about the `\(\boldsymbol\beta\)`? When might we care about `\(\lambda(t)\)`?
]

---

### A proportional hazards model

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`

In a parametric survival model (such as the ones we've seen so far), survival times are assumed to follow a specific distribution, an assumption we may be unwilling to make.

.question[
Is it possible to estimate `\(\boldsymbol\beta\)` without knowing what `\(\lambda_0(t)\)` looks like? Intuitively, how might we be able to do so, given that `\(\lambda_0(t)\)` is infinite-dimensional?
]

---

### Estimating `\(\boldsymbol\beta\)`

For now, assume that there are no tied failure times, and once again order the failure times as `\(t_1 < t_2 < \cdots < t_J\)`. We will again assume that observations/failures are independent.

.question[
Consider the interval `\((t_{j-1}^+, t_j^-)\)`: that is, an interval in which no failures occur. What information, if any, do we obtain regarding `\(\boldsymbol\beta\)`, `\(\mathbf{X}\)`, and `\(\lambda(t)\)` (relationships between predictors and the hazard)?

**Remember**: `\(\lambda_0(t)\)` might take on *any arbitrary* shape.
]

---

### Estimating `\(\boldsymbol\beta\)`

Now suppose we know that there **is** one failure at time `\(t_j\)`. For this failure time, the contribution to the "likelihood" is the probability that individual `\(j\)` fails, *given* that there is exactly one failure in the risk set at that time:

`\begin{align*}
P(\text{individual }j\text{ fails} | \text{one failure from }\mathcal{R}(t_j))
\end{align*}`

where `\(\mathcal{R}(t_j) = \left\{i: T_i \ge t_j \right\}\)` is the risk set at time `\(t_j\)`.

.question[
How else might you more formally express this as a ratio of two probabilities (the joint and the marginal)?
]

---

### Estimating `\(\boldsymbol\beta\)`

This is

`\begin{align*}
\frac{P(\text{individual }j\text{ fails and one failure from }\mathcal{R}(t_j))}{P(\text{one failure from }\mathcal{R}(t_j)) }
\end{align*}`

We are conditioning on exactly one failure at time `\(t_j\)`, but we don't know who it is.

.question[
What are the numerator and denominator, more specifically? (We'll go into more detail on the next two slides.)
]

---

### The numerator and the denominator

We know the following about/from the numerator:

- One specific person failed at time `\(t_j\)`
- This person has specific characteristics: let's call them `\(\mathbf{x}_{t_j}\)`, the covariates corresponding to the person who failed at time `\(t_j\)`
- They had to still have been eligible to fail at time `\(t_j\)`

We know the following about/from the denominator:

- The characteristics of people still eligible to fail at that time
- They all had to still have been eligible to fail at time `\(t_j\)`

.question[
What are these quantities?
]

---

### Estimating `\(\boldsymbol\beta\)`

We have

`\begin{align*}
&\mathrel{\phantom{=}} \frac{P(\text{individual }j\text{ fails and one failure from }\mathcal{R}(t_j))}{P(\text{one failure from }\mathcal{R}(t_j)) }\\
&= \frac{P(\text{individual }j\text{ fails} | \text{at risk at }t_j)}{\sum_{k \in \mathcal{R}(t_j)}P(\text{individual }k\text{ fails} | \text{at risk at }t_j)}\\
&= \frac{\lambda(t_j | \mathbf{x}_j)}{\sum_{k \in \mathcal{R}(t_j)}\lambda(t_j | \mathbf{x}_k)}
\end{align*}`

.question[
Can you simplify these expressions? We still don't know anything about `\(\lambda_0\)`!
]

---

### Estimating `\(\boldsymbol\beta\)`

Under proportional hazards, `\(\lambda(t | \mathbf{x}) = \lambda_0(t)\exp(\mathbf{x}\boldsymbol\beta)\)`. Thus, for each failure time `\(t_j\)`, we have

`\begin{align*}
&\mathrel{\phantom{=}} \frac{\lambda(t_j | \mathbf{x}_j)}{\sum_{k \in \mathcal{R}(t_j)}\lambda(t_j | \mathbf{x}_k)}\\
&= \frac{\lambda_0(t_j)\exp(\mathbf{x}_j\boldsymbol\beta)}{\sum_{k \in \mathcal{R}(t_j)}\lambda_0(t_j)\exp(\mathbf{x}_k\boldsymbol\beta)}\\
&= \frac{\exp(\mathbf{x}_j\boldsymbol\beta)}{\sum_{k \in \mathcal{R}(t_j)}\exp(\mathbf{x}_k\boldsymbol\beta)}
\end{align*}`

---

### Estimating `\(\boldsymbol\beta\)`

If there are `\(J\)` total failure times, the "likelihood" is thus the product over all failure times:

`\begin{align*}
\mathcal{L}^\star(\boldsymbol\beta) &= \prod_{j = 1}^J \frac{\exp(\mathbf{x}_j\boldsymbol\beta)}{\sum_{k \in \mathcal{R}(t_j)}\exp(\mathbf{x}_k\boldsymbol\beta)}\\
&= \prod_{i = 1}^n \left(\frac{\exp(\mathbf{x}_i\boldsymbol\beta)}{\sum_{k \in \mathcal{R}(t_i)}\exp(\mathbf{x}_k\boldsymbol\beta)}\right)^{\delta_i}\\
\log\mathcal{L}^\star(\boldsymbol\beta) &= \sum_{i = 1}^n \delta_i\left(\mathbf{x}_i\boldsymbol\beta - \log \sum_{k \in \mathcal{R}(t_i)}\exp(\mathbf{x}_k\boldsymbol\beta) \right)
\end{align*}`

---

### The partial log-likelihood

We can maximize this quantity with respect to `\(\boldsymbol\beta\)` *without* having to estimate (or even specify) the baseline hazard: `\(\lambda_0(t)\)` cancels from every term, so the failure times themselves only enter through the risk sets.

Censored individuals never contribute to a numerator, but they do contribute to the risk sets of failure times occurring before their censoring times.

This expression is known as the .vocab[partial likelihood].

(See board for diagram)
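
---

### The partial log-likelihood, by hand

With no tied failure times, the expression on the previous slide is exactly what `coxph()` maximizes. As a sanity check, here is a minimal sketch (the toy data and the helper `partial_loglik()` are made up for illustration) that evaluates the no-ties partial log-likelihood directly from its definition and compares it with the value `coxph()` reports.

```r
library(survival)

# Toy data with distinct follow-up times, so the no-ties formula applies exactly
toy <- data.frame(
  time   = c(2, 5, 7, 9, 12, 15),  # follow-up time
  status = c(1, 1, 0, 1, 0, 1),    # 1 = failure, 0 = censored
  x      = c(1, 0, 1, 1, 0, 0)     # a single binary covariate
)

# log L*(beta) = sum over failures of [ x_i beta - log sum_{k in R(t_i)} exp(x_k beta) ]
partial_loglik <- function(beta, time, status, x) {
  eta <- x * beta
  sum(sapply(which(status == 1), function(i) {
    at_risk <- time >= time[i]           # risk set R(t_i)
    eta[i] - log(sum(exp(eta[at_risk])))
  }))
}

coxm_toy <- coxph(Surv(time, status) ~ x, data = toy)
partial_loglik(coef(coxm_toy), toy$time, toy$status, toy$x)  # evaluated at the MLE
coxm_toy$loglik[2]                                           # coxph's maximized value
```

With no tied event times the two values agree; with ties, `coxph()` uses the Efron approximation by default.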

---

### The Cox proportional hazards model

`\begin{align*}
\lambda(t) &= \lambda_0(t)\exp\left(\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p \right)
\end{align*}`

- Fewer assumptions than fully parametric models, but still requires the PH assumption
- By far the most commonly used regression model for survival data
- Attractive interpretations using hazard ratios
- Can be extended to accommodate time-varying covariates, recurrent events, etc.

---

### Fitting the Cox PH model

```r
coxm1 <- coxph(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
               data = dig)
summary(coxm1)
```

```
## Call:
## coxph(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI, 
##     data = dig)
## 
##   n= 6799, number of events= 2332 
##    (1 observation deleted due to missingness)
## 
##             coef exp(coef)  se(coef)       z Pr(>|z|)    
## TRTMT  -0.289745  0.748454  0.041666  -6.954 3.55e-12 ***
## EJFPER -0.038061  0.962654  0.002381 -15.985  < 2e-16 ***
## PREVMI  0.047921  1.049087  0.043638   1.098    0.272    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## TRTMT     0.7485     1.3361    0.6898    0.8121
## EJFPER    0.9627     1.0388    0.9582    0.9672
## PREVMI    1.0491     0.9532    0.9631    1.1428
## 
## Concordance= 0.605  (se = 0.006 )
## Likelihood ratio test= 304.4  on 3 df,   p=<2e-16
## Wald test            = 302.5  on 3 df,   p=<2e-16
## Score (logrank) test = 306.2  on 3 df,   p=<2e-16
```

.question[
How might we interpret these coefficients? How do they compare to our previous models?
]
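
---

### Hazard ratios from the fitted model

One way to start answering the question on the previous slide: exponentiate the fitted coefficients to obtain hazard ratios. A quick sketch using the `coxm1` fit from the previous slide (nothing new is estimated here):

```r
exp(coef(coxm1))     # hazard ratios; e.g., exp(-0.290) is about 0.75 for TRTMT
exp(confint(coxm1))  # Wald 95% CIs; match the lower/upper .95 columns in summary()
```

For example, holding `EJFPER` and `PREVMI` constant, the estimated hazard for `TRTMT = 1` is about 25% lower than for `TRTMT = 0` (HR 0.75, 95% CI 0.69 to 0.81).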

---

### Fitting the Cox PH model

One strength of the Cox model is that we can ignore estimation of `\(\lambda_0\)` completely (it doesn't matter for valid inference on the `\(\beta\)`s). If we *do* want to estimate survival probabilities, however, we must estimate the baseline hazard.

A non-parametric estimate of the baseline cumulative hazard (the .vocab[Breslow estimator]; its derivation is beyond the scope of this class), implemented by the `basehaz` function in the `survival` package, is

`\begin{align*}
\hat{\Lambda}_0(t) = \sum_{i:t_{(i)} < t} \frac{1}{\sum_{k \in \mathcal{R}(t_{(i)})} \exp(\mathbf{x}_k\hat{\boldsymbol\beta})}
\end{align*}`

We can then estimate survival in the Cox model by

`\begin{align*}
\hat{S}(t|\mathbf{x}) = \exp\left(-\exp(\mathbf{x}\hat{\boldsymbol\beta}) \hat{\Lambda}_0(t)\right)
\end{align*}`
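
---

### Baseline hazard and survival in R

To make the last two formulas concrete, here is a hedged sketch (the object names `H0` and `new_pt` and the covariate values are made up for illustration; `coxm1` is the fit from earlier):

```r
# Estimated baseline cumulative hazard; centered = FALSE returns the hazard
# at covariate values of zero rather than at the mean covariate values
H0 <- basehaz(coxm1, centered = FALSE)
head(H0)

# Estimated survival curve for a hypothetical new patient (values made up)
new_pt <- data.frame(TRTMT = 1, EJFPER = 30, PREVMI = 1)
plot(survfit(coxm1, newdata = new_pt),
     xlab = "Days", ylab = "Estimated survival probability")
```

`survfit()` handles the combination of the estimated cumulative hazard and `\(\exp(\mathbf{x}\hat{\boldsymbol\beta})\)` internally, so in practice we rarely assemble `\(\hat{S}(t|\mathbf{x})\)` by hand (its defaults may differ slightly from the Breslow formula above depending on how ties are handled).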