The log-rank test

class: center, middle, inverse, title-slide

.title[
# The log-rank test
]
.author[
### Yue Jiang
]
.date[
### Duke University
]

---

### From homework 2...

.question[
What would be reasonable null and alternative hypotheses for comparing two survival functions?
]

---

### Comparing two groups

`\begin{align*}
H_0: S_1(t) &= S_2(t)\\
H_1: S_1(t) &\neq S_2(t)
\end{align*}`

.question[
- What does the alternative hypothesis mean here? 
- What would some examples of this alternative hypothesis being true look like?
- Does a UMP test exist for the generic hypotheses written above?
]

---

### Comparing two groups

`\begin{align*}
H_0: S_1(t) &= S_2(t)\\
H_1: S_1(t) &\neq S_2(t)
\end{align*}`

.question[
What might be a reasonable .vocab[omnibus test]?
]

---

### Comparing two groups

For now, let's think about tests against a .vocab[specific alternative], for instance:

`\begin{align*}
H_1: S_2(t) = \left(S_1(t)\right)^\theta, \, \theta \neq 1
\end{align*}`

.question[
What does this alternative hypothesis imply? What might it look like visually?
]
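
One way to explore this alternative is a small numerical sketch. The exponential baseline, its rate, and the value of `theta` below are assumptions chosen only for illustration; the point is that `$S_2(t) = \left(S_1(t)\right)^\theta$` makes the ratio of cumulative hazards constant and equal to `$\theta$`.

```r
# Assumed baseline for illustration only: exponential survival with rate 0.1
t     <- seq(0.1, 30, by = 0.1)
S1    <- exp(-0.1 * t)
theta <- 2                      # hypothetical value of theta
S2    <- S1^theta

# Cumulative hazards H(t) = -log S(t); their ratio does not depend on t
H1 <- -log(S1)
H2 <- -log(S2)
range(H2 / H1)

# Visual check: the two survival curves separate and never cross
plot(t, S1, type = "l", ylim = c(0, 1), xlab = "t", ylab = "S(t)")
lines(t, S2, lty = 2)
legend("topright", legend = c("S1(t)", "S2(t) = S1(t)^theta"), lty = 1:2)
```

Changing `theta` to a value below 1 flips which curve lies on top, but the curves still do not cross.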

---

### Constructing the log-rank test

At each failure time `$t_j$`, those still at risk can be summarized by the following table:

|               | Failures at `$t_j$` | Non-failures at `$t_j$`  | \# at risk at `$t_j$` |
| ------------- | :-----:            | :-----:| ----: |
| Group 1       | `$d_{1,j}$`         | `$N_{1,j} - d_{1,j}$` | `$N_{1,j}$`    |
| Group 2       | `$d_{2,j}$`         | `$N_{2,j} - d_{2,j}$` | `$N_{2,j}$`    |
| **Total**     | `$d_j$`             | `$N_j - d_j$`   | `$N_j$`   |

Note that the first column counts *observed failures* at `$t_j$` only; it does **not** include
those who have failed previously.

.question[
Under the null hypothesis, what is the distribution of `$d_{1,j}$`?
]

---

### Constructing the log-rank test

Under `$H_0$`, and conditional on the margins of the table at `$t_j$`, `$d_{1,j}$` follows a hypergeometric distribution:

`\begin{align*}
f(d_{1,j}) = \frac{\binom{d_j}{d_{1,j}}\binom{N_j - d_j}{N_{1,j} - d_{1,j}}}{\binom{N_j}{N_{1,j}}}
\end{align*}`

.question[
Under the null hypothesis, what are the expectation and variance of `$d_{1,j}$` 
(given the marginals)?
]
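
As a rough check, the probability mass function for `$d_{1,j}$` can be evaluated in R. The counts `N1`, `N2`, and `d` below are hypothetical values for a single failure time, not taken from any data set; the sketch compares the binomial-coefficient form with base R's `dhyper()`.

```r
# Hypothetical risk-set counts at a single failure time t_j
N1 <- 20          # at risk in group 1
N2 <- 30          # at risk in group 2
d  <- 5           # total failures at t_j
N  <- N1 + N2

# Possible values of d_{1,j} given the margins
x <- max(0, d - N2):min(d, N1)

# pmf written with binomial coefficients
pmf_formula <- choose(d, x) * choose(N - d, N1 - x) / choose(N, N1)

# Same pmf via dhyper(x, m, n, k):
# m = at risk in group 1, n = at risk in group 2, k = total failures
pmf_dhyper <- dhyper(x, m = N1, n = N2, k = d)

all.equal(pmf_formula, pmf_dhyper)
sum(pmf_dhyper)
```

The two parameterizations look different (one draws the group 1 subjects, the other draws the failures), but they describe the same distribution.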

---

### Constructing the log-rank test

Under `$H_0$`, `$E(d_{1,j}) = N_{1,j}\left(d_j/N_j\right)$` (why?)

`$Var(d_{1,j}) = \frac{N_{1,j}N_{2,j}d_j(N_j - d_j)}{N_j^2(N_j - 1)}$`.
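
As a quick numerical check of the two moments above, the sketch below computes the mean and variance directly from the hypergeometric probabilities and compares them with the closed forms. The counts are hypothetical and chosen only for illustration.

```r
# Hypothetical counts at one failure time
N1 <- 20; N2 <- 30; d <- 5
N  <- N1 + N2

# Exact moments from the hypergeometric pmf
x <- max(0, d - N2):min(d, N1)
p <- dhyper(x, m = N1, n = N2, k = d)
mean_exact <- sum(x * p)
var_exact  <- sum((x - mean_exact)^2 * p)

# Closed-form expressions
mean_formula <- N1 * d / N
var_formula  <- N1 * N2 * d * (N - d) / (N^2 * (N - 1))

c(exact = mean_exact, formula = mean_formula)
c(exact = var_exact,  formula = var_formula)
```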

Now consider *all* failure times `$t_j$` for `$j = 1, 2, \ldots, J$`. Consider the expression:

`\begin{align*}
U = \sum_{j = 1}^J (d_{1,j} - \underbrace{N_{1,j}\left(d_j/N_j\right)}_{E(d_{1,j})})
\end{align*}`

.question[
Intuitively, what does `$U$` represent? What might be a reasonable expression for `$Var(U)$`?
]

---

### Constructing the log-rank test

Under `$H_0$` and in large samples, `$U$`, appropriately scaled, has approximately a standard normal distribution:

`\begin{align*}
\frac{U}{\sqrt{Var(U)}} \sim N(0, 1),
\end{align*}`

where `$\displaystyle Var(U) = \sum_{j = 1}^J Var(d_{1, j})$` (is this reasonable?).

---

### Constructing the log-rank test

`\begin{align*}
\mathrel{\phantom{=}} \frac{U^2}{Var(U)} &\mathrel{\phantom{\sim}}\\
= \frac{\left(\sum_{j = 1}^J (d_{1,j} - E(d_{1,j})) \right)^2}{\sum_{j = 1}^J Var(d_{1, j})} &\sim \chi^2_1
\end{align*}`

.question[
Think carefully about the information being used here. Do we need to know the exact times at which observations fail?
]
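
The construction can be sketched by hand in base R. The toy data below (`time`, `status`, `group`) are made up for illustration, and the variable names are not from the homework; the loop builds the table at each distinct failure time and accumulates the observed-minus-expected terms and their variances.

```r
# Hypothetical toy data: follow-up time, event indicator (1 = failure,
# 0 = censored), and group label
time   <- c(6, 7, 10, 15, 19, 25, 1, 2, 4, 6, 8, 12)
status <- c(1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1)
group  <- rep(c(1, 2), each = 6)

# Distinct failure times t_1 < ... < t_J
fail_times <- sort(unique(time[status == 1]))

# Contribution of each failure time to U and Var(U)
terms <- sapply(fail_times, function(tj) {
  at_risk <- time >= tj
  N1 <- sum(at_risk & group == 1)
  N2 <- sum(at_risk & group == 2)
  N  <- N1 + N2
  d1 <- sum(time == tj & status == 1 & group == 1)
  d  <- sum(time == tj & status == 1)
  e1 <- N1 * d / N
  # With a single subject at risk the variance term is zero (avoids 0/0)
  v1 <- if (N > 1) N1 * N2 * d * (N - d) / (N^2 * (N - 1)) else 0
  c(o_minus_e = d1 - e1, v = v1)
})

U       <- sum(terms["o_minus_e", ])
var_U   <- sum(terms["v", ])
chisq   <- U^2 / var_U
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
c(U = U, var_U = var_U, chisq = chisq, p_value = p_value)
```

Only the ordering of the times enters the calculation: any monotone transformation of `time` leaves every risk set, and therefore the statistic, unchanged.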

---

### From homework 2...

```r
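library(survival)

# Log-rank test of time to arrest by marital status
# (`dat` is the homework 2 data set, loaded beforehand)
lr <- survdiff(Surv(week, arrest) ~ mar, data = dat)
lr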
```

```
## Call:
## survdiff(formula = Surv(week, arrest) ~ mar, data = dat)
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## mar=married      53        8     15.2     3.394      3.94
## mar=not married 379      106     98.8     0.521      3.94
## 
##  Chisq= 3.9  on 1 degrees of freedom, p= 0.05
```

```r
names(lr)
```

```
## [1] "n"      "obs"    "exp"    "var"    "chisq"  "pvalue" "call"
```

```r
lr$pvalue
```

```
## [1] 0.04722271
```
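
The p-value can also be recovered from the test statistic and the `$\chi^2_1$` reference distribution. The sketch below uses the rounded statistic printed in the output above, since `dat` is not available here; with the fitted object one would pass `lr$chisq` instead, so the result is expected to be close to, but not exactly, `lr$pvalue`.

```r
# Rounded chi-square statistic copied from the printed output above
# (an approximation; the unrounded value is stored in lr$chisq)
chisq_rounded <- 3.94

# Upper-tail probability of a chi-square distribution with 1 degree of freedom
p_approx <- pchisq(chisq_rounded, df = 1, lower.tail = FALSE)
p_approx
```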