class: center, middle, inverse, title-slide

.title[
# Survivorship and hazard
]
.author[
### Yue Jiang
]
.date[
### STA 490 / STA 690
]

---

### Representing survival data

Underlying data:

- `\(T\)`: Failure time, a non-negative random variable
- `\(C\)`: Censoring time, a non-negative random variable

Observed data for individual `\(i\)`:

- `\(Y_i\)`: `\((T_i \wedge C_i)\)`, the minimum of `\(T_i\)` and `\(C_i\)`
- `\(\delta_i\)`: `\(1_{(T_i \le C_i)}\)`, whether we observe a failure

**Our goal is to make inferential statements about** `\(T\)`.

---

### Survival function

The .vocab[survival function] is given by `\(S(t) = P(T > t)\)` and has the following properties:

- `\(S(0) = 1\)`
- `\(\lim_{t \to \infty} S(t) = 0\)`
- It is non-increasing: `\(S(t_2) \le S(t_1)\)` for `\(t_2 \ge t_1\)`

.question[
What do these properties mean in plain English?
]

---

### Survival function

The survival function is simply the complement of the distribution function:

`\begin{align*}
F(t) = P(T \le t) = 1 - S(t)
\end{align*}`

Suppose we have absolutely continuous `\(T\)`. The density function `\(f(t)\)` is related to the distribution function by

`\begin{align*}
f(t) &= \lim_{dt \to 0^+} \frac{P(t\le T < t + dt)}{dt}\\
&= \frac{dF(t)}{dt}
\end{align*}`

Or equivalently,

`\begin{align*}
F(t) = \int_0^t f(u)du.
\end{align*}`

---

### Hazard function

The .vocab[hazard function] is given by

`\begin{align*}
\lambda(t) = \lim_{dt \to 0^+} \frac{P(t \le T < t + dt | T \ge t)}{dt}
\end{align*}`

Note that this is **not** a probability (for continuous `\(T\)`), and it can be unbounded.

.question[
- What does the hazard function represent in plain English?
- Can you give an example of something with increasing / decreasing hazard?
- Why might we want to think in terms of hazards for interpretability reasons?
]

---

### Cumulative hazard

Just as the distribution function accumulates the density, the .vocab[cumulative hazard] accumulates the hazard:

`\begin{align*}
\Lambda(t) = \int_0^t \lambda(u)du
\end{align*}`

.question[
- Intuitively, what is `\(\Lambda(0)\)`?
- Must `\(\Lambda(t)\)` be non-decreasing? Explain.
- What is `\(\lim_{t\to \infty} \Lambda(t)\)`? Explain.
]

---

### Hazard and survival

`\begin{align*}
\lambda(t) &= \lim_{dt \to 0^+} \frac{P(t \le T < t + dt | T \ge t)}{dt} \\
&= \lim_{dt \to 0^+} \frac{P(t \le T < t + dt, T \ge t)/P(T \ge t)}{dt} \\
&= \lim_{dt \to 0^+} \frac{P(t \le T < t + dt)/P(T \ge t)}{dt}\\
&= \frac{f(t)}{S(t)}
\end{align*}`

.question[
Show that `\(\Lambda(t) = -\log(S(t))\)`. As a hint, use the chain rule to express `\(\lambda(t)\)` as a function of `\(S(t)\)`, then integrate both sides from `\(0\)` to `\(t\)`.
]

---

### Hazard and survival

<br><br>

.question[
Consider a distribution with *constant* hazard, such that `\(\lambda(t) = c\)` for all times `\(t\)`.

- What is the density function associated with this hazard?
- Can you think of a real-world example of such a situation?
]

---

### A potential issue

<br><br><br>

.question[
What is wrong with the general statement `\(f(t) = \frac{dF(t)}{dt}\)` for all distributions? (We actually glossed over this when relating `\(\Lambda(t)\)` to `\(S(t)\)` as well.)
]

---

### Being more correct

Let `\(F(t)\)` be a non-decreasing càdlàg function with countably many jumps at `\(t_1, t_2, \ldots\)`

<img src="img/cadlag.png" width="60%" style="display: block; margin: auto;" />

---

### Being more correct

Define `\(\Delta F(t_j) = F(t_j) - F(t_j^-) > 0\)`. Then we can "always" write

`\begin{align*}
F(t) = \int_0^t f(u)du + \sum_{j: t_j \le t}\Delta F(t_j),
\end{align*}`

regardless of whether there are discontinuity points in `\((0, t]\)`.

.question[
What does the above expression mean in plain English?
]
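
---

### Being more correct: a numerical sketch

The decomposition on the previous slide can be checked numerically. Below is a minimal Python sketch under purely illustrative assumptions: the continuous part has sub-density `\(f(u) = 0.7e^{-u}\)`, and there are jumps of size 0.2 at `\(t = 1\)` and 0.1 at `\(t = 2\)`. These values, and the helper names `continuous_part`, `jump_part`, and `cdf`, are hypothetical and chosen only so that the total mass is 1.

```python
import math

# Assumed (illustrative) mixed distribution:
#   continuous part with sub-density f(u) = CONT_WEIGHT * exp(-u), u >= 0
#   jumps Delta F(t_j) at the times listed in JUMPS
CONT_WEIGHT = 0.7
JUMPS = {1.0: 0.2, 2.0: 0.1}  # t_j -> Delta F(t_j)


def continuous_part(t):
    """Integral of f(u) du from 0 to t, evaluated in closed form."""
    return CONT_WEIGHT * (1.0 - math.exp(-t))


def jump_part(t):
    """Sum of Delta F(t_j) over all jump times t_j <= t."""
    return sum(size for t_j, size in JUMPS.items() if t_j <= t)


def cdf(t):
    """F(t) as the sum of its continuous part and its jump part."""
    return continuous_part(t) + jump_part(t)


if __name__ == "__main__":
    for t in (0.5, 1.0, 1.5, 2.0, 5.0):
        print(f"F({t}) = {cdf(t):.4f}")
```

Because `jump_part` uses `t_j <= t`, the jump at `\(t_j\)` is already included in `\(F(t_j)\)`, so this `cdf` is right-continuous, and the difference between `cdf(1.0)` and its value just before `\(t = 1\)` recovers `\(\Delta F(1) = 0.2\)`.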