Stratification in the Cox model

class: center, middle, inverse, title-slide

.title[
# Stratification in the Cox model
]
.author[
### Yue Jiang
]
.date[
### STA 490/690
]

---

### The Rossi et al. trial...

![](strata_files/figure-html/unnamed-chunk-2-1.png)

.question[
What do you notice?
]

---

### Financial aid intervention

``` r
m1 <- coxph(Surv(week, arrest) ~ fin + wexp, data = Rossi)
ggcoxzph(cox.zph(m1), var = "wexp")
```

![](strata_files/figure-html/unnamed-chunk-3-1.png)

.question[
What might we do in the presence of non-proportional hazards?
]

---

### Stratification in the Cox model

Previously, we saw how time-varying coefficients can help address violations of proportional hazards. Alternatively, we can simply drop the proportional hazards requirement for those "difficult" covariates by estimating a separate baseline hazard for each stratum:

`\begin{align*}
\lambda_{i, \text{prior work} = \text{yes}}(t) &= \lambda_{0, \text{prior work} = \text{yes}}(t)\exp(\mathbf{x}_i\boldsymbol\beta)\\
\lambda_{i, \text{prior work} = \text{no}}(t) &= \lambda_{0, \text{prior work} = \text{no}}(t)\exp(\mathbf{x}_i\boldsymbol\beta)
\end{align*}`

In this case, we estimate a separate baseline hazard for each level of prior work experience, while the regression coefficients are shared across strata.

.question[
What might the partial likelihood look like for this *stratified* model? (would've been a good homework question, rats!)
]
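
One way to write it down, as a sketch that assumes no tied event times. The symbols here are introduced only for illustration and do not come from the slides: `\(s\)` indexes the strata, `\(D_s\)` is the set of subjects in stratum `\(s\)` with an observed event, and `\(R_s(t)\)` is the set of subjects in stratum `\(s\)` still at risk at time `\(t\)`.

`\begin{align*}
L(\boldsymbol\beta) &= \prod_{s} \prod_{i \in D_s} \frac{\exp(\mathbf{x}_i\boldsymbol\beta)}{\sum_{j \in R_s(t_i)} \exp(\mathbf{x}_j\boldsymbol\beta)}
\end{align*}`

Each baseline hazard cancels within its own stratum, so every stratum contributes its own partial likelihood term, and the product is maximized over a single shared `\(\boldsymbol\beta\)`.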

---

### Stratification in the Cox model

``` r
m2 <- coxph(Surv(week, arrest) ~ fin + strata(wexp), data = Rossi)
summary(m2)
```

```
## Call:
## coxph(formula = Surv(week, arrest) ~ fin + strata(wexp), data = Rossi)
## 
##   n= 432, number of events= 114 
## 
##           coef exp(coef) se(coef)      z Pr(>|z|)  
## finyes -0.3781    0.6852   0.1897 -1.993   0.0463 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##        exp(coef) exp(-coef) lower .95 upper .95
## finyes    0.6852      1.459    0.4724    0.9938
## 
## Concordance= 0.547  (se = 0.024 )
## Likelihood ratio test= 4.03  on 1 df,   p=0.04
## Wald test            = 3.97  on 1 df,   p=0.05
## Score (logrank) test = 4.02  on 1 df,   p=0.05
```
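
As a rough sketch of what the stratified fit actually estimates, the block below pulls the baseline cumulative hazards out of the model with `basehaz()`. It assumes `Rossi` is loaded from the `carData` package (the slides do not show where it comes from), and `bh` is just an illustrative name.

``` r
library(survival)

# Assumption: Rossi is taken from the carData package
data("Rossi", package = "carData")

m2 <- coxph(Surv(week, arrest) ~ fin + strata(wexp), data = Rossi)

# No coefficient is estimated for the stratifying variable
names(coef(m2))

# One baseline cumulative hazard curve per stratum,
# evaluated with fin at its reference level
bh <- basehaz(m2, centered = FALSE)
head(bh)
table(bh$strata)
```

The `strata` column in `bh` identifies which work-experience group each row of the cumulative hazard belongs to.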

---

### Stratification in the Cox model

.question[
What do you notice? What are some potential drawbacks of stratification?

There's some evidence of non-proportional hazards due to the financial aid treatment. What would happen if we were to stratify by this variable?
]

---

### Additional applications

``` r
bladder2[1:15,]
```

```
##    id rx number size start stop event enum
## 1   1  1      1    3     0    1     0    1
## 2   2  1      2    1     0    4     0    1
## 3   3  1      1    1     0    7     0    1
## 4   4  1      5    1     0   10     0    1
## 5   5  1      4    1     0    6     1    1
## 6   5  1      4    1     6   10     0    2
## 7   6  1      1    1     0   14     0    1
## 8   7  1      1    1     0   18     0    1
## 9   8  1      1    3     0    5     1    1
## 10  8  1      1    3     5   18     0    2
## 11  9  1      1    1     0   12     1    1
## 12  9  1      1    1    12   16     1    2
## 13  9  1      1    1    16   18     0    3
## 14 10  1      3    3     0   23     0    1
## 15 11  1      1    3     0   10     1    1
```

---

### The Andersen-Gill model

``` r
m3 <- coxph(Surv(start, stop, event) ~ rx, data = bladder2)
summary(m3)
```

```
## Call:
## coxph(formula = Surv(start, stop, event) ~ rx, data = bladder2)
## 
##   n= 178, number of events= 112 
## 
##       coef exp(coef) se(coef)      z Pr(>|z|)  
## rx -0.3733    0.6885   0.1976 -1.889   0.0589 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##    exp(coef) exp(-coef) lower .95 upper .95
## rx    0.6885      1.452    0.4674     1.014
## 
## Concordance= 0.552  (se = 0.03 )
## Likelihood ratio test= 3.68  on 1 df,   p=0.06
## Wald test            = 3.57  on 1 df,   p=0.06
## Score (logrank) test = 3.61  on 1 df,   p=0.06
```
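
The fit `m3` treats all 178 intervals as independent, even though several come from the same patient. A minimal sketch of one common adjustment, assuming we only want a robust (sandwich) variance grouped by `id`; the name `m3_robust` is hypothetical.

``` r
library(survival)

# Fit as on the slide (naive standard errors)
m3 <- coxph(Surv(start, stop, event) ~ rx, data = bladder2)

# Same model, with a robust variance grouped by subject
m3_robust <- coxph(Surv(start, stop, event) ~ rx,
                   data = bladder2, cluster = id)

# Clustering changes the variance estimate, not the point estimate
all.equal(coef(m3), coef(m3_robust))

# Naive and robust standard errors side by side
sqrt(diag(m3_robust$naive.var))
sqrt(diag(m3_robust$var))
```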

---

### Adjusting for number of prior events?

.question[
What is being implied by this model?
]

``` r
m4 <- coxph(Surv(start, stop, event) ~ 
              rx + cluster(id) + strata(enum), data = bladder2)
summary(m4)
```

```
## Call:
## coxph(formula = Surv(start, stop, event) ~ rx + strata(enum), 
##     data = bladder2, cluster = id)
## 
##   n= 178, number of events= 112 
## 
##       coef exp(coef) se(coef) robust se      z Pr(>|z|)
## rx -0.2458    0.7821   0.2130    0.2095 -1.173    0.241
## 
##    exp(coef) exp(-coef) lower .95 upper .95
## rx    0.7821      1.279    0.5187     1.179
## 
## Concordance= 0.541  (se = 0.031 )
## Likelihood ratio test= 1.35  on 1 df,   p=0.2
## Wald test            = 1.38  on 1 df,   p=0.2
## Score (logrank) test = 1.34  on 1 df,   p=0.2,   Robust = 1.51  p=0.2
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).
```

---

### Stratifying by event...

.question[
What model is this?
]

``` r
m5 <- coxph(Surv(start, stop, event) ~ 
              rx + cluster(id) + strata(enum), data = bladder2)
summary(m5)
```

---

### Another method...

.question[
What model is this?
]

``` r
m6 <- coxph(Surv(rep(0, 178), stop-start, event) ~ 
              rx + cluster(id) + strata(enum), data = bladder2)
summary(m6)
```

```
## Call:
## coxph(formula = Surv(rep(0, 178), stop - start, event) ~ rx + 
##     strata(enum), data = bladder2, cluster = id)
## 
##   n= 178, number of events= 112 
## 
##       coef exp(coef) se(coef) robust se      z Pr(>|z|)
## rx -0.1635    0.8492   0.2020    0.2194 -0.745    0.456
## 
##    exp(coef) exp(-coef) lower .95 upper .95
## rx    0.8492      1.178    0.5524     1.305
## 
## Concordance= 0.521  (se = 0.03 )
## Likelihood ratio test= 0.66  on 1 df,   p=0.4
## Wald test            = 0.56  on 1 df,   p=0.5
## Score (logrank) test = 0.66  on 1 df,   p=0.4,   Robust = 0.59  p=0.4
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).
```
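
Since every interval in `m6` starts at 0 and lasts `stop - start`, the clock restarts after each event. As a sketch, the same fit can be written with the gap time as an explicit column; `bladder_gap`, `gap`, and `m6_gap` are hypothetical names introduced here.

``` r
library(survival)

# Fit from the slide: each interval runs from 0 to stop - start
m6 <- coxph(Surv(rep(0, 178), stop - start, event) ~ rx + strata(enum),
            data = bladder2, cluster = id)

# Alternative spelling: store the gap time and use a right-censored Surv()
bladder_gap <- transform(bladder2, gap = stop - start)
m6_gap <- coxph(Surv(gap, event) ~ rx + strata(enum),
                data = bladder_gap, cluster = id)

# The two parameterizations should agree up to numerical tolerance
all.equal(unname(coef(m6)), unname(coef(m6_gap)), tolerance = 1e-6)
```

Writing the gap explicitly avoids hard-coding the number of rows in `rep(0, 178)`.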