Li Ma

Research interests

  • Multi-scale inference

  • Graphical models

  • Large-scale hypothesis testing

  • Recursive partitioning and tree-related methods

  • Statistical modeling of biomedical data sets, especially microbiome sequencing data and flow cytometry

A recent methodological focus of my research is the use of multi-scale techniques to construct flexible probability models that can be applied to massive data sets. Traditional nonparametric approaches, while enjoying many established theoretical properties, are often computationally intractable for big data. Multi-scale inference provides a general framework for tackling this computational bottleneck while preserving the theoretical guarantees enjoyed by classical methods.

My applied interest focuses on modeling complex data sets from biomedical experiments. In particular, current efforts have been devoted to modeling and analyzing data from microbiome sequencing experiments and flow cytometry.


My research group is currently supported by both the NSF (Statistics Program grant DMS-2013930 and CAREER Award grant DMS-1749789) and the NIH (NIGMS grant 1R01GM135440). Prior support: NSF grants DMS-1309057, DMS-1612889, and a Google Faculty Research Award.


Awaya N and Ma L. (2021) Tree boosting for learning probability measures. [arxiv][R package]

Awaya N and Ma L. (2020) Hidden Markov Polya trees for high-dimensional distributions. [arxiv][R package][slides]

Mao J and Ma L. (2020) Dirichlet-tree multinomial mixtures for clustering microbiome compositions. [arxiv][R package][numerical examples]

Gorsky S, Chan C, and Ma L. (2020) Coarsened mixtures of hierarchical skew normal kernels for flow cytometry analyses. [arxiv][R package]

Gorsky S and Ma L. (2020) Multiscale Fisher's independence test for multivariate dependence. [arxiv][R package]

Li M and Ma L. (2020) Learning asymmetric and local features in multi-dimensional data through wavelets with recursive partitioning. [arxiv][Matlab toolbox & R package]

Liu R, Li M, and Ma L. (2020) Efficient in-situ image and video compression through probabilistic image representation. [arxiv][Matlab code] (This is the journal version of our previous CVPR paper.)

Luo K, Zhong J, Safi A, Hong LK, Tewari AK, Song L, Reddy TE, Ma L, Crawford GE, and Hartemink AJ. (2020) Quantitative occupancy of myriad transcription factors from one DNase experiment enables efficient comparisons across conditions. [bioRxiv]

Vaughan M, Mao J, Karstens L, Ma L, Amundsen C, Schmader K, and Siddiqui N. (2020) The urinary microbiome in postmenopausal women with recurrent urinary tract infections. [bioRxiv]


Giri V, Kegerreis K, Ren Y, Bohannon L, Lobaugh-Jin E, Messina J, Matthews A, Mowery Y, Sito E, Lassiter M, Saullo J, Jung S, Ma L, Greenberg M, Andermann T, van den Brink M, Peled J, Gomes A, Choi T, Gasparetto C, Horwitz M, Long G, Lopez R, Rizzieri D, Sarantopoulos S, Chao N, Allen D, and Sung A. (2021) Chlorhexidine gluconate bathing reduces the incidence of bloodstream infections in adults undergoing inpatient hematopoietic cell transplantation. Transplantation and Cellular Therapy. [online]

Liu R, Li M, and Ma L. (2020) CARP: Compression through adaptive recursive partitioning for multi-dimensional images. CVPR 2020: IEEE/CVF Conference on Computer Vision and Pattern Recognition. [online][Matlab code]

Christensen J and Ma L. (2020) A Bayesian hierarchical model for related densities using Polya trees. Journal of the Royal Statistical Society. Series B. Vol. 82, 127-153. [online][preprint][R package]

Mao J, Chen Y, and Ma L. (2020) Bayesian graphical compositional regression for microbiome data. Journal of the American Statistical Association, Applications and Case Studies. Vol. 115, No. 530, 610-624. [online][preprint][R package][Source code for examples]

Ma L. (2019) Discussion on “Latent Nested Nonparametric Priors” by Camerlanghi et al. Bayesian Analysis. [online][preprint]

Ma L and Mao J. (2019) Fisher exact scanning for dependency. Journal of the American Statistical Association, Theory and Methods. Vol. 114, No. 525, 245-258. [online][preprint][R code]

Soriano J and Ma L. (2019) Mixture modeling on related samples through psi-stick breaking and kernel perturbation. Bayesian Analysis. Vol. 14, No. 1, 161-180. [online][R package][examples]

Ma L and Soriano J. (2018) Analysis of distributional variation through multi-scale Beta-Binomial modeling. Journal of Computational and Graphical Statistics. Vol. 27, No. 3, 529-541. [online][preprint][R package]

Tang Y, Ma L, and Nicolae DL. (2018) A phylogenetic scan test on Dirichlet-tree multinomial model for microbiome data. Annals of Applied Statistics. Vol. 12, No. 1, 1-26. [online][preprint][R code]

Ma L and Soriano J. (2018) Efficient functional ANOVA through wavelet-domain Markov groves. Journal of the American Statistical Association, Theory and Methods. Vol. 113, No. 3, 802-818. [online][R package]

Soriano J and Ma L. (2017) Probabilistic multi-resolution scanning for two-sample differences. Journal of the Royal Statistical Society. Series B. Vol. 79, No. 2, 547-572. [online][R package]

Ma L. (2017) Recursive partitioning and multi-scale modeling on conditional densities. Electronic Journal of Statistics. Vol. 11, No. 1, 1297-1325. [online][R package]

Ma L. (2017) Adaptive shrinkage in Polya tree type models. Bayesian Analysis. Vol. 12, No. 3, 779-805. (Featured in the editor's invited session “Highlights from Bayesian Analysis” at JSM 2017.) [online][supplement][R package]

Ma L. (2015) Scalable Bayesian model averaging through local information propagation. Journal of the American Statistical Association, Theory and Methods. Vol. 110, No. 510, 795-809. [online][preprint][R package]

Ma L. (2013) Adaptive testing of conditional association through recursive mixture modeling. Journal of the American Statistical Association, Theory and Methods. Vol. 108, No. 504, 1493-1505. [online][R package]

Ma L, Wong WH, and Owen AB. (2012) A sparse transmission disequilibrium test for haplotypes based on Bradley-Terry graphs. Human Heredity. Vol. 73, No. 1, 52-61. [online][preprint]

Ma L and Wong WH. (2011) Coupling optional Polya trees and the two sample problem. Journal of the American Statistical Association, Theory and Methods. Vol. 106, No. 496, 1553-1565. [online][arxiv]

Ma L, Stein ML, Wang M, Shelton AO, Pfister CA, and Wilder KJ. (2011) A method for unbiased estimation of population abundance along curvy margins. Environmetrics. Vol. 22, No. 3, 330-339. [online]

Wong WH and Ma L. (2010) Optional Polya tree and Bayesian inference. Annals of Statistics. Vol. 38, No. 3, 1433-1459. [online][pdf][R package]

Ma L, Assimes T, Asadi NB, Iribarren C, Quertermous T, and Wong WH. (2010) An “almost exhaustive” search based sequential permutation method for detecting epistasis in disease association studies. Genetic Epidemiology. Vol. 34, No. 5, 434-443. [online][software]

Ma L, Mease D, and Russell D. (2010) A four group cross-over design for measuring irreversible treatments on web search tasks. Proceedings of HICSS-44. [online]