class: center, middle, inverse, title-slide

.title[
# Censored likelihoods
]
.author[
### Yue Jiang
]
.date[
### Duke University
]

---

### Customer service

<img src="img/callcenter.jpg" width="60%" style="display: block; margin: auto;" />

Your last ten tickets took 2.5, 3.1, 5.7, 8.0, 9.8, 11.4, 15.5, 18.3, 24.1, and 26.9 minutes each to resolve.

.question[
- Suppose you assume resolution times are exponentially distributed. What is the probability a ticket takes less than 10 minutes?
- What does the entire survival distribution look like?
- How does it compare to the Kaplan-Meier estimate?
]

---

### Estimation

Suppose we had a series of *non-censored* observations `\(t_1, t_2, \cdots, t_n\)`, and we thought that these survival times were i.i.d. and came from some distribution with density `\(f(t)\)`.

.question[
How might we estimate the parameter(s) `\(\theta\)` of that distribution?
]

---

### Review: maximum likelihood estimation

`\begin{align*}
\mathcal{L}(\theta | T) &= f(t_1, t_2, \cdots, t_n | \theta)\\
&= f(t_1 | \theta)f(t_2 | \theta) \cdots f(t_n | \theta)\\
&= \prod_{i = 1}^n f(t_i|\theta),
\end{align*}`

which, for many familiar distributions, we can maximize in closed form (and otherwise numerically).

---

### Review: maximum likelihood estimation

For instance, if we thought `\(T \stackrel{i.i.d.}{\sim} Exp(\lambda)\)`, then

`\begin{align*}
\mathcal{L}(\lambda | T) &= \lambda^n \exp \left(-\lambda\sum_{i = 1}^n t_i \right)\\
\log \mathcal{L}(\lambda | T) &= n\log(\lambda) - \lambda \sum_{i = 1}^n t_i,\\
\frac{\partial}{\partial \lambda}\log \mathcal{L}(\lambda | T) &= \frac{n}{\lambda} - \sum_{i = 1}^n t_i \stackrel{\text{set}}{=} 0,\\
\hat{\lambda}_{MLE} &= \frac{n}{\sum_{i = 1}^n t_i}
\end{align*}`

---

### Back to customer service

<img src="img/callcenter.jpg" width="60%" style="display: block; margin: auto;" />

Your last ten tickets took 2.5, 3.1, 5.7, 8.0, 9.8, 10+, 10+, 10+, 10+, and 10+ minutes each to resolve.

.question[
- What is the probability a ticket takes less than 10 minutes?
- How would you estimate the survival function here (without making any parametric assumptions yet)?
]

---

### Estimation for censored data

How might we perform maximum likelihood estimation for *censored* data? What would the likelihood look like?

Suppose we have `\(n\)` i.i.d. observations with the same `\(f(t)\)`, `\(S(t)\)`, and hazard `\(\lambda(t)\)`. Consider what might happen at time `\(t_i\)`.

.question[
Suppose individual `\(i\)` experiences the event at time `\(t_i\)` (so `\(\delta_i = 1\)`). What do they contribute to the likelihood? What would their contribution to the likelihood be at `\(t_i\)` if they were instead *censored* (so `\(\delta_i = 0\)`)?
]

---

### Estimation for censored data

Remember that

`\begin{align*}
f(t) &= \lambda(t)S(t)
\end{align*}`

--

so an individual's contribution to the likelihood is

`\begin{align*}
&\mathrel{\phantom{=}} f(t_i)^{\delta_i}S(t_i)^{1 - \delta_i}\\
&= \lambda(t_i)^{\delta_i}S(t_i)^{\delta_i}S(t_i)^{1 - \delta_i}\\
&= \lambda(t_i)^{\delta_i}S(t_i)
\end{align*}`

---

### Estimation for censored data

For our exponential example with `\(T \stackrel{i.i.d.}{\sim} Exp(\lambda)\)`, since the hazard is `\(\lambda\)` and the survival function is `\(e^{-\lambda t}\)`, we would have

`\begin{align*}
\mathcal{L}(\lambda | T) &= \prod_{i = 1}^n \lambda^{\delta_i}\exp(-\lambda t_i),
\end{align*}`

which we could then maximize using familiar methods.

---

### Back to customer service (again)

<img src="img/callcenter.jpg" width="60%" style="display: block; margin: auto;" />

Your last ten tickets took 2.5, 3.1, 5.7, 8.0, 9.8, 10+, 10+, 10+, 10+, and 10+ minutes each to resolve.

.question[
- What is the probability a ticket takes less than 10 minutes, assuming an exponential distribution for the resolution times?
]
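
---

### Sketch: censored exponential MLE in code

Setting the derivative of the censored log-likelihood `\(\log \mathcal{L}(\lambda | T) = \log(\lambda)\sum_{i = 1}^n \delta_i - \lambda \sum_{i = 1}^n t_i\)` to zero gives `\(\hat{\lambda}_{MLE} = \sum_{i = 1}^n \delta_i / \sum_{i = 1}^n t_i\)`, the number of observed events divided by the total observed time. The Python below is a minimal sketch of that estimate, with a Kaplan-Meier estimate alongside for comparison. It is not taken from the course materials: the function names are hypothetical, and it assumes each `10+` ticket is recorded as `\(t_i = 10\)` with `\(\delta_i = 0\)`.

```python
import math


def exp_rate_mle(times, events):
    """Censored exponential MLE: (number of events) / (total observed time).

    Maximizes log L = log(lam) * sum(delta_i) - lam * sum(t_i).
    """
    return sum(events) / sum(times)


def exp_cdf(t, lam):
    """P(T < t) when T ~ Exp(lam)."""
    return 1.0 - math.exp(-lam * t)


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, S-hat) pairs.

    Subjects censored at an event time are counted as still at risk there.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = leaving = 0
        # Group all observations tied at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Fully observed tickets (first customer service slide): every delta_i = 1.
full_times = [2.5, 3.1, 5.7, 8.0, 9.8, 11.4, 15.5, 18.3, 24.1, 26.9]
full_events = [1] * len(full_times)

# Censored tickets: each "10+" is recorded as t = 10 with delta = 0 (assumption).
cens_times = [2.5, 3.1, 5.7, 8.0, 9.8, 10, 10, 10, 10, 10]
cens_events = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for label, ts, ds in [("uncensored", full_times, full_events),
                      ("censored", cens_times, cens_events)]:
    lam = exp_rate_mle(ts, ds)
    km = kaplan_meier(ts, ds)
    # S-hat just before t = 10: the curve is non-increasing, so take the min
    # over steps strictly before 10 (1.0 if there are none).
    s_km = min((s for t, s in km if t < 10), default=1.0)
    print(f"{label}: lambda_hat = {lam:.4f}, "
          f"exponential P(T < 10) = {exp_cdf(10, lam):.4f}, "
          f"Kaplan-Meier P(T < 10) = {1 - s_km:.4f}")
```

For the uncensored tickets this gives `\(\hat{\lambda} = 10/125.3 \approx 0.080\)` and an exponential `\(P(T < 10) \approx 0.55\)`; for the censored tickets, `\(\hat{\lambda} = 5/79.1 \approx 0.063\)` and `\(P(T < 10) \approx 0.47\)`. In both data sets five of the ten tickets resolve before 10 minutes, so the Kaplan-Meier value is 0.5. Treating the `10+` tickets as if they were resolved at exactly 10 minutes (all `\(\delta_i = 1\)`) would instead give `\(10/79.1\)`, doubling `\(\hat{\lambda}\)`.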