class: center, middle, inverse, title-slide

.title[
# The log-rank test
]
.author[
### Yue Jiang
]
.date[
### Duke University
]

---

### From homework 2...

<img src="log-rank_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

.question[
What would be reasonable null and alternative hypotheses for comparing two survival functions?
]

---

### Comparing two groups

`\begin{align*}
H_0: S_1(t) &= S_2(t)\\
H_1: S_1(t) &\neq S_2(t)
\end{align*}`

.question[
- What does the alternative hypothesis mean here?
- What would some examples of this alternative hypothesis being true look like?
- Does a UMP test exist for the generic hypotheses written above?
]

---

### Comparing two groups

`\begin{align*}
H_0: S_1(t) &= S_2(t)\\
H_1: S_1(t) &\neq S_2(t)
\end{align*}`

.question[
What might be a reasonable .vocab[omnibus test]?
]

---

### Comparing two groups

For now, let's think about tests against a .vocab[specific alternative], for instance:

`\begin{align*}
H_1: S_2(t) = \left(S_1(t)\right)^\theta, \, \theta \neq 1
\end{align*}`

.question[
What does this alternative hypothesis imply? What might it look like visually?
]

---

### Constructing the log-rank test

For time `\(t_j\)`, the at-risk population can be summarized by the following table:

| | Failures at `\(t_j\)` | Non-failures at `\(t_j\)` | \# at risk at `\(t_j\)` |
| ------------- | :-----: | :-----:| ----: |
| Group 1 | `\(d_{1,j}\)` | `\(N_{1,j} - d_{1,j}\)` | `\(N_{1,j}\)` |
| Group 2 | `\(d_{2,j}\)` | `\(N_{2,j} - d_{2,j}\)` | `\(N_{2,j}\)` |
| **Total** | `\(d_j\)` | `\(N_j - d_j\)` | `\(N_j\)` |

Note that the left column counts *observed failures* at that time only; it does **not** include those who failed previously.

.question[
Under the null hypothesis, what is the distribution of `\(d_{1,j}\)`?
]

---

### Constructing the log-rank test

For time `\(t_j\)`, the at-risk population can be summarized by the following table:

| | Failures at `\(t_j\)` | Non-failures at `\(t_j\)` | \# at risk at `\(t_j\)` |
| ------------- | :-----: | :-----:| ----: |
| Group 1 | `\(d_{1,j}\)` | `\(N_{1,j} - d_{1,j}\)` | `\(N_{1,j}\)` |
| Group 2 | `\(d_{2,j}\)` | `\(N_{2,j} - d_{2,j}\)` | `\(N_{2,j}\)` |
| **Total** | `\(d_j\)` | `\(N_j - d_j\)` | `\(N_j\)` |

`\begin{align*}
f(d_{1,j}) = \frac{\binom{d_j}{d_{1,j}}\binom{N_j - d_j}{N_{1,j} - d_{1,j}}}{\binom{N_j}{N_{1,j}}}
\end{align*}`

.question[
Under the null hypothesis, what are the expectation and variance of `\(d_{1,j}\)` (given the marginals)?
]

---

### Constructing the log-rank test

Under `\(H_0\)`, `\(E(d_{1,j}) = N_{1,j}\left(d_j/N_j\right)\)` (why?) and `\(Var(d_{1,j}) = \frac{N_{1,j}N_{2,j}d_j(N_j - d_j)}{N_j^2(N_j - 1)}\)`.

Now consider *all* failure times `\(t_j\)` for `\(j = 1, 2, \cdots, J\)`. Consider the expression:

`\begin{align*}
U = \sum_{j = 1}^J (d_{1,j} - \underbrace{N_{1,j}\left(d_j/N_j\right)}_{E(d_{1,j})})
\end{align*}`

.question[
Intuitively, what does `\(U\)` represent? What might be a reasonable expression for `\(Var(U)\)`?
]

---

### Constructing the log-rank test

Under `\(H_0\)` and in large samples, `\(U\)`, appropriately scaled, has approximately a standard normal distribution:

`\begin{align*}
\frac{U}{\sqrt{Var(U)}} \sim N(0, 1),
\end{align*}`

where `\(\displaystyle Var(U) = \sum_{j = 1}^J Var(d_{1, j})\)` (is this reasonable?).

---

### Constructing the log-rank test

`\begin{align*}
\mathrel{\phantom{=}} \frac{U^2}{Var(U)} &\mathrel{\phantom{\sim}}\\
= \frac{\left(\sum_{j = 1}^J (d_{1,j} - E(d_{1,j})) \right)^2}{\sum_{j = 1}^J Var(d_{1, j})} &\sim \chi^2_1
\end{align*}`

.question[
Think carefully about the information being used here. Do we need to know the exact times at which observations fail?
]

---

### From homework 2...

```r
library(survival)
lr <- survdiff(Surv(week, arrest) ~ mar, data = dat)
lr
```

```
## Call:
## survdiff(formula = Surv(week, arrest) ~ mar, data = dat)
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## mar=married      53        8     15.2     3.394      3.94
## mar=not married 379      106     98.8     0.521      3.94
## 
##  Chisq= 3.9  on 1 degrees of freedom, p= 0.05
```

```r
names(lr)
```

```
## [1] "n"      "obs"    "exp"    "var"    "chisq"  "pvalue" "call"
```

```r
lr$pvalue
```

```
## [1] 0.04722271
```
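
---

### Computing the log-rank statistic by hand

To connect the construction to the software output, here is a minimal sketch that builds `\(U\)` and `\(Var(U)\)` table by table and compares the result with `survdiff()`. The data frame `toy` and its columns `time`, `event`, and `group` are made up for illustration; they are not the homework data. The guard for `\(N_j = 1\)` is an assumption of this sketch, made because the variance formula is 0/0 when a single observation remains at risk.

```r
library(survival)

# Hypothetical toy data (NOT the homework data): follow-up time,
# event indicator (1 = failure, 0 = censored), and group label
toy <- data.frame(
  time  = c(2, 3, 3, 5, 6, 7, 8, 9, 10, 12,
            4, 6, 7, 9, 11, 13, 14, 15, 16, 18),
  event = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 1,
            1, 0, 1, 1, 0, 1, 1, 0, 1, 1),
  group = rep(c(1, 2), each = 10)
)

# Distinct failure times t_1 < ... < t_J (censoring-only times are skipped)
fail_times <- sort(unique(toy$time[toy$event == 1]))

# For each t_j, compute d_{1,j} and its hypergeometric mean and variance
pieces <- sapply(fail_times, function(tj) {
  at_risk <- toy$time >= tj                          # still under observation at t_j
  N1 <- sum(at_risk & toy$group == 1)
  N2 <- sum(at_risk & toy$group == 2)
  N  <- N1 + N2
  d  <- sum(toy$time == tj & toy$event == 1)         # total failures at t_j
  d1 <- sum(toy$time == tj & toy$event == 1 & toy$group == 1)
  E1 <- N1 * d / N
  # Variance is taken to be 0 when only one observation is at risk
  V1 <- if (N > 1) N1 * N2 * d * (N - d) / (N^2 * (N - 1)) else 0
  c(d1 = d1, E1 = E1, V1 = V1)
})

U     <- sum(pieces["d1", ] - pieces["E1", ])        # observed minus expected, summed
VarU  <- sum(pieces["V1", ])
chisq <- U^2 / VarU
pval  <- pchisq(chisq, df = 1, lower.tail = FALSE)

# Compare with survdiff() on the same toy data
fit <- survdiff(Surv(time, event) ~ group, data = toy)
c(by_hand = chisq, survdiff = fit$chisq)
```

Only the ordering of the failure times and the risk-set counts at each one enter the calculation, so applying any strictly increasing transformation to `time` should leave `chisq` unchanged.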