Cox Model and Missing Data

Ovarian data

Recall the ovarian data. For simplicity, we create a new dataset containing only the survival outcome and our redefined predictors.

library(survival); library(mice) #ovarian is lazy-loaded with survival, so no data() call is needed

## Loading required package: lattice

## 
## Attaching package: 'mice'

## The following objects are masked from 'package:base':
## 
##     cbind, rbind

ovarian$residyes=ovarian$resid.ds-1 #2=yes goes to 1 and 1=no to 0
ovarian$trt1=2-ovarian$rx #1 stays 1 and 2 goes to 0
ovarian$ecog1=2-ovarian$ecog.ps
ovarian$age10=ovarian$age/10 #10-year difference for better interpretability
ovarian2=ovarian[,c(1:2,7:10)] #pick off just vars of interest
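Recodes like these are easy to get backwards, so a quick cross-tabulation of old against new codes can help. The sketch below uses a short hypothetical vector of 1/2 codes (a stand-in, not the actual ovarian values) to show the check; the same table() call could be applied to the real columns.

```r
# Hypothetical stand-in for a 1/2-coded variable such as rx (not real data)
rx <- c(1, 2, 2, 1, 2)
trt1 <- 2 - rx                      # 1 stays 1 and 2 goes to 0

# Cross-tabulate old vs new codes: each old code should map to exactly one new code
check <- table(rx, trt1)
print(check)

# An equivalent, more explicit recode gives the same result
trt1_alt <- as.numeric(rx == 1)
stopifnot(identical(trt1, trt1_alt))
```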

Missing data in survival analysis

Suppose some of the variables in the ovarian data were subject to missing values. For example, suppose the residual disease indicator has some values missing completely at random (MCAR).

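set.seed(1213)  #for reproducibility (set.seed is a function; assigning to it does not set the seed)
#randomly draw 5 indices to set to missing (MCAR)
#note: rounded runif draws can repeat an index, so fewer than 5 values may end up missing
msng=round(runif(5,min=1,max=length(ovarian$residyes)))
#set residyes values corresponding to those indices to missing
ovarian2$residyes[msng]=NA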
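If exactly 5 distinct missing values were wanted, one option is to draw the indices with sample(), which samples without replacement by default. The sketch below contrasts the two approaches, assuming n = 26 rows as in the ovarian data; the names msng_runif and msng_sample are illustrative only.

```r
set.seed(1213)
n <- 26  # number of rows in the ovarian data

# Rounded uniform draws can collide, giving fewer than 5 distinct indices
msng_runif <- round(runif(5, min = 1, max = n))

# sample() draws without replacement by default, so all 5 indices are distinct
msng_sample <- sample(n, 5)

length(unique(msng_runif))   # may be less than 5
length(unique(msng_sample))  # always 5
```

Rounding uniform draws also gives the two endpoints (1 and n) half the chance of the interior indices, which sample() avoids.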

Look at data

Count the missing values in residyes. Only 3 are missing rather than 5, because the sampled indices included repeats.

table(ovarian2$residyes,useNA="always")

## 
##    0    1 <NA> 
##   10   13    3

Just for comparison: complete case analysis

coxph(Surv(futime,fustat)~age10+residyes+trt1+ecog1,data=ovarian2)

## Call:
## coxph(formula = Surv(futime, fustat) ~ age10 + residyes + trt1 + 
##     ecog1, data = ovarian2)
## 
##            coef exp(coef) se(coef)     z     p
## age10     1.154     3.172    0.468  2.47 0.014
## residyes  0.601     1.823    0.848  0.71 0.479
## trt1      1.268     3.555    0.751  1.69 0.091
## ecog1    -0.219     0.803    0.729 -0.30 0.764
## 
## Likelihood ratio test=14.58  on 4 df, p=0.006
## n= 23, number of events= 10 
##    (3 observations deleted due to missingness)

Multiple imputation

imp=mice(ovarian2,m=20,printFlag=FALSE) #printFlag=FALSE suppresses the iteration log
#(default 5 iterations x 20 imputations; residyes is the only variable with missing values)

coximpute=with(imp,coxph(Surv(futime,fustat)~age10+residyes+trt1+ecog1))

Results

#note: pool() finds no residual df for coxph fits, so it assumes a large sample (infinite error df);
#that is not appropriate here, since this is a small sample!
summary(pool(coximpute))

## Warning: Unknown or uninitialised column: 'df.residual'.

## Warning in pool.fitlist(getfit(object), dfcom = dfcom): Large sample
## assumed.

##            estimate std.error  statistic          df     p.value
## age10     1.3489030 0.4879946  2.7641761  26239.0293 0.005706898
## residyes  0.4726947 0.8223105  0.5748373    960.2493 0.565401638
## trt1      0.8501472 0.6456585  1.3167133 463608.1790 0.187935392
## ecog1    -0.2485919 0.6572984 -0.3782026  51372.9554 0.705280298

Only age (per 10-year increase) is statistically significant in the pooled analysis, using \(p<0.05\) as the criterion. This agrees with the complete case analysis.
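To make the degrees-of-freedom issue concrete, here is a minimal base R sketch of Rubin's rules for a single coefficient, with the Barnard-Rubin small-sample adjustment applied when a finite complete-data df is supplied. The function name rubin_pool and the input numbers are hypothetical and are not taken from the analysis above; this illustrates the standard formulas and does not reproduce the internals of pool().

```r
# Pool one coefficient across m imputed-data fits using Rubin's rules.
# est:   vector of m point estimates
# se:    vector of m standard errors
# dfcom: complete-data degrees of freedom (Inf gives the large-sample df)
rubin_pool <- function(est, se, dfcom = Inf) {
  m      <- length(est)
  qbar   <- mean(est)               # pooled estimate
  ubar   <- mean(se^2)              # within-imputation variance
  b      <- var(est)                # between-imputation variance
  tvar   <- ubar + (1 + 1/m) * b    # total variance
  lambda <- (1 + 1/m) * b / tvar    # share of variance due to missingness
  df_old <- (m - 1) / lambda^2      # large-sample df
  if (is.finite(dfcom)) {
    # Barnard-Rubin adjustment for small samples
    df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
    df     <- 1 / (1 / df_old + 1 / df_obs)
  } else {
    df <- df_old
  }
  stat <- qbar / sqrt(tvar)
  c(estimate = qbar, std.error = sqrt(tvar), statistic = stat,
    df = df, p.value = 2 * pt(-abs(stat), df))
}

# Hypothetical estimates and standard errors from m = 3 imputed fits
est <- c(0.5, 0.4, 0.6)
se  <- c(0.8, 0.8, 0.8)
large <- rubin_pool(est, se)              # large-sample df
small <- rubin_pool(est, se, dfcom = 19)  # assumed complete-data df of 19
print(rbind(large, small))
```

The estimate and standard error are the same in both calls; only the df and p-value change. In the analysis above, a similar effect could presumably be obtained by passing a chosen dfcom value to pool(), though which value suits a Cox model is a judgment call.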