Causal inference concerns designing and evaluating treatments or interventions in randomized experiments and observational studies. It is central to decision making in many disciplines, including the social sciences, medicine, and policy, and goes by different names in different fields: comparative effectiveness research in health studies, program evaluation in economics, and A/B testing in online experiments. My research in causal inference, mostly motivated by real-world problems in medicine, policy, and economics, focuses on the following topics.
Propensity score methods
The propensity score is arguably one of the most widely used causal inference methods in observational studies. I have worked extensively on propensity score weighting methods. One of my main contributions is the overlap weighting method (Li, Morgan, and Zaslavsky, 2018), in which, for a binary treatment, each unit is weighted by its probability of being assigned to the opposite group. This method has many statistical and scientific advantages over the traditional inverse probability weighting method. We have extended the method to multiple treatments and to marginal structural models for time-varying treatments. I have also worked on propensity score weighting methods for multilevel or clustered data.
More information about overlap weighting is available here.
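To make the estimator concrete, here is a minimal sketch of binary-treatment overlap weighting, assuming a logistic propensity model and hypothetical inputs X (covariates), z (binary treatment indicator), and y (outcome); it is an illustration under those assumptions, not the paper's reference implementation.

```python
# Sketch of overlap weighting (Li, Morgan, and Zaslavsky, 2018) for a
# binary treatment; variable names are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def overlap_weight_effect(X, z, y):
    """Estimate the treatment effect in the overlap population (ATO):
    treated units are weighted by 1 - e(x) and controls by e(x),
    where e(x) is the estimated propensity score."""
    e = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]
    # Weight each unit by its probability of assignment to the
    # opposite group.
    w = np.where(z == 1, 1.0 - e, e)
    mu1 = np.sum(w * y * (z == 1)) / np.sum(w * (z == 1))  # weighted treated mean
    mu0 = np.sum(w * y * (z == 0)) / np.sum(w * (z == 0))  # weighted control mean
    return mu1 - mu0
```

A practical appeal of these weights is that they are bounded (unlike inverse probability weights, which can explode when estimated propensity scores are near 0 or 1).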
Natural- and quasi-experiments: RDD and DiD
Natural and quasi-experiments, often analyzed via instrumental variables (IV), are widely used in economics and the social sciences. Two closely related methods are regression discontinuity designs (RDD) and difference-in-differences (DiD). In the domain of RDD, I have developed an RDD model for ordinal running variables (Li et al., 2019). In the domain of DiD, I have proved a general bracketing relationship between DiD and the lagged-dependent-variable adjustment method (Ding and Li, 2019), and I have developed new doubly robust DiD estimators.
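As a toy illustration of the two estimators in the bracketing result, the sketch below contrasts a two-period DiD estimate with the lagged-dependent-variable (LDV) adjustment; y_pre, y_post, and z are hypothetical arrays, and this schematic does not capture the full generality of the estimators studied in the paper.

```python
# Sketch of two-period DiD versus lagged-dependent-variable (LDV)
# adjustment; hypothetical inputs y_pre, y_post (outcomes) and z
# (binary treatment indicator).
import numpy as np
import statsmodels.api as sm

def did_estimate(y_pre, y_post, z):
    """DiD: difference of over-time changes between the two groups."""
    return (y_post[z == 1].mean() - y_pre[z == 1].mean()) - \
           (y_post[z == 0].mean() - y_pre[z == 0].mean())

def ldv_estimate(y_pre, y_post, z):
    """LDV: regress the post-period outcome on treatment and the
    pre-period outcome; return the coefficient on treatment."""
    design = sm.add_constant(np.column_stack([z, y_pre]))
    return sm.OLS(y_post, design).fit().params[1]
```

DiD assumes parallel trends, while LDV assumes the lagged outcome suffices for confounding adjustment; the bracketing result relates the biases of the two when either assumption fails.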
Principal stratification
Principal stratification (Frangakis and Rubin, 2002) is a general framework for handling post-treatment intermediate variables, including noncompliance, informatively missing data, and truncation by death. Within this framework, I have developed new models and methods for continuous intermediate variables, sensitivity analysis, multiple outcomes, and cluster randomized trials.
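For orientation, with a binary treatment Z and a binary intermediate variable D, the principal strata are defined by the joint potential values of D, which are unaffected by treatment assignment, so conditioning on them is legitimate:

```latex
% Principal strata for binary treatment Z and binary intermediate D:
S_i = \bigl(D_i(0),\, D_i(1)\bigr)
    \in \{(0,0),\ (0,1),\ (1,0),\ (1,1)\},
% and stratum-specific causal effects condition on S, e.g.
\tau(s) = E\bigl[\,Y_i(1) - Y_i(0) \mid S_i = s\,\bigr].
```

In the noncompliance setting, these four strata are the familiar never-takers, compliers, defiers, and always-takers.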
Bayesian modeling for complex applications
I have been developing flexible Bayesian causal models for complex applications, including mediation with functional mediators, spatiotemporal data, and electronic health record data.
Bayesian methods for structured high-dimensional data
I have developed a number of Bayesian models for analyzing high-dimensional data. These models can exploit and/or infer underlying structure and statistical dependencies among a large number of predictors relevant to a response. I have developed several novel graphical prior distributions, including an Ising prior (Li and Zhang, 2010), an Ising-Dirichlet process prior (Li et al., 2015), and a Potts prior/penalty (Zhang et al., 2015) to represent the geometric structure between predictors. These models have been applied to problems in genomics and neuroimaging.
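As an example of such a graphical prior, an Ising prior on the vector of inclusion indicators γ ∈ {0,1}^p takes a form along the following lines, with a graph G encoding which predictors are neighbors (the exact parameterization in Li and Zhang, 2010 may differ in details):

```latex
% Ising prior on inclusion indicators, up to a normalizing constant:
p(\gamma) \propto \exp\Bigl( a \sum_{j=1}^{p} \gamma_j
    + b \sum_{(j,k) \in G} \gamma_j \gamma_k \Bigr),
% where a controls overall sparsity and b > 0 encourages neighboring
% predictors on the graph to be selected together.
```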
Missing data and multiple imputation
Missing data is prevalent in large public-use data sources such as national surveys, and must be addressed before standard statistical analyses can be applied. The state-of-the-art methodology for missing data is multiple imputation (MI), and the most popular MI algorithm is MICE (multiple imputation by chained equations). I have investigated the theoretical properties of MICE and developed a new family of MI algorithms, imputation by monotone blocks (IMB), that can impute a wide range of data types in a more compatible fashion than MICE. I have also conducted empirical comparisons between MICE and the more principled Bayesian joint-modeling MI approaches.
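For readers unfamiliar with MICE, the sketch below illustrates the chained-equations idea using scikit-learn's IterativeImputer as a stand-in; this is not the IMB algorithm, and production analyses would typically use dedicated MI software such as the R mice package.

```python
# MICE-style multiple imputation sketch using scikit-learn's
# IterativeImputer (a chained-equations stand-in, not IMB).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.1] = np.nan   # make ~10% of entries missing

# sample_posterior=True draws imputations from a predictive
# distribution; repeating with different seeds yields M completed
# datasets, which are then analyzed separately and pooled
# (Rubin's rules).
M = 5
completed = [
    IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
    for m in range(M)
]
```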
My research has been generously funded by NSF, NIH, and PCORI.