class: center, middle, inverse, title-slide

.title[
# Time-varying elements
]
.author[
### Yue Jiang
]
.date[
### STA 490/690
]

---

### Extending the Cox model

`\begin{align*}
\lambda_i(t) &= \lambda_0(t)\exp\left(\mathbf{x}_i\boldsymbol\beta \right)\\
\lambda_i(t) &= \lambda_0(t)\exp\left(\mathbf{x}_i(t)\boldsymbol\beta \right)?\\
\lambda_i(t) &= \lambda_0(t)\exp\left(\mathbf{x}_i\boldsymbol\beta(t) \right)?
\end{align*}`

.question[
What are we doing in each of these expressions?
]

---

### Schoenfeld residuals

<!-- plot: Schoenfeld residuals over time -->

.question[
Which of the extensions to the Cox model mentioned on the previous slide do you think is most relevant here? Why?
]

---

### Scaled Schoenfeld residuals

The vector of Schoenfeld residuals for individual `\(i\)` is given by the `\(i\)`th term of the first derivative of the log partial likelihood with respect to `\(\boldsymbol\beta\)`, evaluated at `\(\widehat{\boldsymbol\beta}\)`.

We can scale the vector of Schoenfeld residuals `\(\mathbf{r}_i\)` as

`\begin{align*}
\mathbf{r}^\star_i = D \times Var(\widehat{\boldsymbol\beta})\mathbf{r}_i,
\end{align*}`

where `\(D\)` is the number of observed events. These scaled residuals can be used to define a test of proportional hazards.

---

### The Grambsch-Therneau test

The expected value of the `\(i\)`th scaled Schoenfeld residual for `\(\beta_k\)` is approximately `\(\beta_k(t_i) - \widehat{\beta}_k\)`, where `\(\beta_k(t_i)\)` is the time-varying coefficient of `\(x_k\)` evaluated at the failure time `\(t_i\)` corresponding to individual `\(i\)` (note - these are *specifically* evaluated at failure times; censored observations do not contribute).

If proportional hazards is satisfied, then `\(r^\star_{ik} + \widehat{\beta}_k\)` should be constant over time. That is, we are testing whether there is a linear relationship between `\(E(r^\star_{ik})\)` and time. If `\(E(r^\star_{ik})\)` depends on time (i.e., is not constant), then proportional hazards is violated.

```r
cox.zph(coxm1)
```

```
##         chisq df       p
## TRTMT  39.827  1 2.8e-10
## EJFPER  0.716  1    0.40
## PREVMI  0.652  1    0.42
## GLOBAL 41.187  3 6.0e-09
```

---

### The Grambsch-Therneau test

```r
ggcoxzph(cox.zph(coxm1), var = "EJFPER", ylim = c(-2, 2))
```

<!-- plot: scaled Schoenfeld residuals for EJFPER over time -->

---

### The Grambsch-Therneau test

```r
ggcoxzph(cox.zph(coxm1), var = "TRTMT", ylim = c(-2, 2))
```

<!-- plot: scaled Schoenfeld residuals for TRTMT over time -->

---

### Fitting time-varying coefficients

Ok, so there may be a relationship (let's say a linear one) between `\(\beta_{TRTMT}\)` and time. Consider the following linear predictor incorporating this linear time dependence at time `\(t_j\)`:

`\begin{align*}
\eta_i &= TRT_i\beta_{TRT} + TRT_i\beta^\star t_j + \\
&\mathrel{\phantom{=}} EJF_i \beta_{EJF} + PMI_i \beta_{PMI},
\end{align*}`

where the contribution to the partial likelihood for individual `\(i\)` failing at time `\(t_j\)` is given by

`\begin{align*}
\frac{\exp(\eta_i)}{\sum_{k \in \mathcal{R}(t_j)}\exp(\eta_k)}.
\end{align*}`

---

### Fitting time-varying coefficients

.question[
Can we simply model this as an interaction with time? For instance, what do you think of the model implied by the code below (supposing we had a nice `time` variable)?
]

```r
coxm2 <- coxph(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI + 
                 TRTMT * time, 
               data = dig)
```

---

### Fitting time-varying coefficients

Note - fitting time-varying coefficients like this is very computationally (and space) intensive if you have a lot of failure times (not worth getting into, but I'm happy to handwave an answer - ask again after we talk about how `R` handles time-varying covariates).
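
To get a feel for where that cost comes from, here is a rough sketch - an illustration of the idea only, not literally what `coxph()` does internally. A coefficient that interacts with time requires expanding the data so that each subject contributes a separate row for every failure time at which they are still at risk; splitting the DIG data at every observed failure time with `survSplit()` shows how quickly the rows pile up.

```r
# Rough sketch: expand the data into counting-process (start, stop] form,
# with one row per subject per failure time at which they remain at risk
library(survival)

event_times <- sort(unique(dig$DWHFDAYS[dig$DWHF == 1]))
dig_split <- survSplit(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI,
                       data = dig, cut = event_times, id = "id")

nrow(dig)                                     # original number of rows
nrow(dig_split)                               # rows after splitting
format(object.size(dig_split), units = "MB")  # memory footprint
```

On an expanded dataset like this, a time-varying coefficient model is essentially just an ordinary Cox model with a treatment-by-time interaction - hence the size of the object below.

---

### Fitting time-varying coefficients
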
Fitting the code below actually creates an object that is almost half a gigabyte in size!

```r
coxm2 <- coxph(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI + 
                 tt(TRTMT), 
               tt = function(x, t, ...) { x * t }, 
               data = dig)
summary(coxm2)
```

---

### Fitting time-varying coefficients

```r
coxm2 <- coxph(Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI + 
                 tt(TRTMT), 
               tt = function(x, t, ...) { x * t }, 
               data = dig)
summary(coxm2)
```

```
## Call:
## coxph(formula = Surv(DWHFDAYS, DWHF) ~ TRTMT + EJFPER + PREVMI + 
##     tt(TRTMT), data = dig, tt = function(x, t, ...) {
##     x * t
## })
## 
##   n= 6799, number of events= 2332 
##    (1 observation deleted due to missingness)
## 
##                 coef exp(coef)  se(coef)       z Pr(>|z|)    
## TRTMT     -0.5722515 0.5642536 0.0652410  -8.771  < 2e-16 ***
## EJFPER    -0.0379380 0.9627726 0.0023806 -15.936  < 2e-16 ***
## PREVMI     0.0497461 1.0510042 0.0436404   1.140    0.254    
## tt(TRTMT)  0.0006029 1.0006031 0.0001060   5.686  1.3e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## TRTMT        0.5643     1.7723    0.4965    0.6412
## EJFPER       0.9628     1.0387    0.9583    0.9673
## PREVMI       1.0510     0.9515    0.9648    1.1449
## tt(TRTMT)    1.0006     0.9994    1.0004    1.0008
## 
## Concordance= 0.609  (se = 0.006 )
## Likelihood ratio test= 337.2  on 4 df,   p=<2e-16
## Wald test            = 333.7  on 4 df,   p=<2e-16
## Score (logrank) test = 338.6  on 4 df,   p=<2e-16
```

---

### Fitting time-varying coefficients

You can make the time-transform function `tt` follow any functional form you like; in this case, for instance, a log relationship or the inclusion of a quadratic term might make more sense.

With that said, for more complicated forms, numerical optimization becomes really horrible, especially for "non-small" datasets like this one.

---

### Onto time-dependent *covariates*

.question[
Actually, I'll let you handle this one. How might you (reasonably) incorporate time-varying *covariates* into the Cox model? Keep in mind the contribution of the individual `\(i\)` failing at an observed failure time `\(t_j\)` to the partial likelihood:

`\begin{align*}
\frac{\exp(\mathbf{x}_i\boldsymbol\beta)}{\sum_{k \in \mathcal{R}(t_j)}\exp(\mathbf{x}_k\boldsymbol\beta)}
\end{align*}`

and the fact that the partial likelihood is the product of all such contributions across failure times.
]

(we'll spend a lot of time in class on Thursday tackling a real-world dataset - please bring your laptops (or whatever you use for data analysis))
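
---

### Onto time-dependent *covariates*

As a hedged preview (a minimal sketch with made-up toy data, not the in-class solution): one standard way to handle time-varying covariates in `R` is the counting-process, or `(start, stop]`, data layout, where each subject contributes one row per interval over which their covariate values are constant. The variable names below (`id`, `tstart`, `tstop`, `event`, `dose`) are purely illustrative.

```r
# Minimal sketch of the counting-process / (start, stop] data layout:
# one row per subject per interval of constant covariate values
library(survival)

tv_dat <- data.frame(
  id     = c(1, 1, 2, 3, 3),
  tstart = c(0, 30, 0, 0, 45),      # interval start
  tstop  = c(30, 90, 60, 45, 100),  # interval end
  event  = c(0, 1, 1, 0, 0),        # did the event occur at tstop?
  dose   = c(0.125, 0.25, 0.125, 0, 0.125)
)

# At each failure time, the risk set consists of the intervals covering that
# time, so each at-risk subject contributes their *current* covariate value
coxph(Surv(tstart, tstop, event) ~ dose, data = tv_dat)
```

Note that this only changes the bookkeeping: the contribution at each failure time is still the expression above, just with `\(\mathbf{x}_k\)` replaced by `\(\mathbf{x}_k(t_j)\)`.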