Li Ma

Research interests

  • Nonparametric modeling and inference

  • Multi-scale inference

  • Recursive partitioning and tree-related methods

  • Statistical modeling of biomedical data sets, especially microbiome sequencing data and flow cytometry data

A recent methodological focus of my research is the use of multi-scale techniques to construct flexible probability models that can be applied to massive data sets. Traditional nonparametric approaches, while enjoying many established theoretical properties, are often computationally intractable for big data. Multi-scale inference provides a general framework for overcoming this computational bottleneck while preserving the theoretical guarantees of classical methods.

My applied work focuses on modeling complex data sets from biomedical experiments. In particular, my current efforts are devoted to modeling and analyzing data from microbiome sequencing experiments and flow cytometry.

Support

My research group is currently supported by both the NSF (Statistics Program grant DMS-2013930, CDS&E Program grant DMS-2152999, CAREER Award grant DMS-1749789) and the NIH (NIGMS grant R01-GM135440). Prior support: NSF grants DMS-1309057, DMS-1612889, and a Google Faculty Research Award.

Preprints

Horiguchi A, Ma L, and Szabo B. (2024) Sampling depth trade-off in function estimation under a two-level design. [arxiv]

Horiguchi A, Chan C, and Ma L. (2023) A tree perspective on stick-breaking models in covariate-dependent mixtures. [arxiv]

Awaya N and Ma L. (2023) Unsupervised tree boosting for learning probability distributions. [30-min talk][15-min video][arxiv][R package] Winner of a student/postdoc best paper award at the 2021 ISBA World Meeting.

Wang Z, Mao J, and Ma L. (2022) Microbiome compositional analysis with logistic-tree normal models. [talk][15-min video][arxiv][R package][numerical examples] Winner of a student/postdoc best paper award at the 2021 ISBA World Meeting.

Publications

Liu R, Li M, and Ma L. (2024) Efficient in-situ image and video compression through probabilistic image representation. Signal Processing. Vol. 215, 109268. [online][arxiv][Matlab code]

Ji Z and Ma L. (2023) Controlling taxa abundance improves metatranscriptomics differential analysis. BMC Microbiology. 24:60. [online]

Gorsky S, Chan C, and Ma L. (2023) Coarsened mixtures of hierarchical skew normal kernels for flow cytometry analyses. Bayesian Analysis. (To appear) [15-min video][arxiv][R package]

LeBlanc P and Ma L. (2023) Microbiome subcommunity learning with logistic-tree normal latent Dirichlet allocation. Biometrics. Vol. 79, Iss. 3, 2321-2332. [online][arxiv][R package]

Awaya N and Ma L. (2023) Hidden Markov Pólya trees for high-dimensional distributions. Journal of the American Statistical Association, Theory and Methods. (To appear) [online][arxiv][R package][slides]

Gorsky S and Ma L. (2022) Multiscale Fisher's independence test for multivariate dependence (with discussion). Biometrika. Vol. 109, No. 3, 569-587. [online][arxiv][R package]

Gorsky S and Ma L. (2022) Rejoinder: “Multiscale Fisher's independence test for multivariate dependence.” Biometrika. Vol. 109, No. 3, 605-609. [online]

Mao J and Ma L. (2022) Dirichlet-tree multinomial mixtures for clustering microbiome compositions. Annals of Applied Statistics. Vol. 16, No. 3, 1476-1499. [talk][15-min video][online][arxiv][R package][numerical examples]

Siddiqui N, Ma L, Brubaker L, Mao J, Hoffman C, Wang Z, and Karstens L. (2022) Updating urinary microbiome analyses to enhance biologic interpretation. Frontiers in Cellular and Infection Microbiology. 12:789439. [online]

Vaughan M, Zemtsov GE, Dahl EM, Karstens L, Ma L, and Siddiqui N. (2022) Concordance of urinary microbiota detected by 16S rRNA amplicon sequencing versus expanded quantitative urine culture. American Journal of Obstetrics & Gynecology. [online]

Luo K, Zhong J, Safi A, Hong LK, Tewari AK, Song L, Reddy TE, Ma L, Crawford GE, and Hartemink AJ. (2022) Quantitative occupancy of myriad transcription factors from one DNase experiment enables efficient comparisons across conditions. Genome Research. 32: 1183-1198. [online][bioRxiv]

Li M and Ma L. (2022) Learning asymmetric and local features in multi-dimensional data through wavelets with recursive partitioning. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 44, No. 11, 7674-7687. [online][arxiv][Matlab toolbox & R package]

Vaughan M, Mao J, Karstens L, Ma L, Amundsen C, Schmader K, and Siddiqui N. (2021) The urinary microbiome in postmenopausal women with recurrent urinary tract infections. Journal of Urology. Vol. 206, No. 5, 1222-1231. [online][bioRxiv]

Ramalingam S, Siamakpour-Reihani S, Bohannon L, Ren Y, Sibley, Sheng J, Ma L, Nixon AB, Lyu J, Parker DC, Bain B, Muehlbauer M, Ilkayeva O, Kraus VB, Huebner J, Spitzer T, Brown J, Peled J, van den Brink M, Gomes A, Choi T, Gasparetto C, Horwitz M, Long G, Lopez R, Rizzieri D, Sarantopoulos S, Chao N, and Sung AD. (2021) A phase 2 trial of the somatostatin analog pasireotide to prevent GI toxicity and acute GVHD in allogeneic hematopoietic stem cell transplant. PLOS ONE. Vol. 16, No. 6. [online]

Giri V, Kegerreis K, Ren Y, Bohannon L, Lobaugh-Jin E, Messina J, Matthews A, Mowery Y, Sito E, Lassiter M, Saullo J, Jung S, Ma L, Greenberg M, Andermann T, van den Brink M, Peled J, Gomes A, Choi T, Gasparetto C, Horwitz M, Long G, Lopez R, Rizzieri D, Sarantopoulos S, Chao N, Allen D, and Sung A. (2021) Chlorhexidine gluconate bathing reduces the incidence of bloodstream infections in adults undergoing inpatient hematopoietic cell transplantation. Transplantation and Cellular Therapy. Vol. 27, No. 1, 262e1-e11. [online]

Liu R, Li M, and Ma L. (2020) CARP: Compression through adaptive recursive partitioning for multi-dimensional images. CVPR 2020: IEEE/CVF Conference on Computer Vision and Pattern Recognition. [online][Matlab code]

Christensen J and Ma L. (2020) A Bayesian hierarchical model for related densities using Pólya trees. Journal of the Royal Statistical Society. Series B. Vol. 82, 127-153. [online][preprint][R package]

Mao J, Chen Y, and Ma L. (2020) Bayesian graphical compositional regression for microbiome data. Journal of the American Statistical Association, Applications and Case Studies. Vol. 115, No. 530, 610-624. [talk][online][preprint][R package][Source code for examples]

Ma L. (2019) Discussion on “Latent Nested Nonparametric Priors” by Camerlenghi et al. Bayesian Analysis. Vol. 14, No. 4, 1303-1356. [online][preprint]

Ma L and Mao J. (2019) Fisher exact scanning for dependency. Journal of the American Statistical Association, Theory and Methods. Vol. 114, No. 525, 245-258. [online][preprint][R code]

Soriano J and Ma L. (2019) Mixture modeling on related samples through psi-stick breaking and kernel perturbation. Bayesian Analysis. Vol. 14, No. 1, 161-180. [online][R package][examples]

Ma L and Soriano J. (2018) Analysis of distributional variation through multi-scale Beta-Binomial modeling. Journal of Computational and Graphical Statistics. Vol. 27, No. 3, 529-541. [online][preprint][R package]

Tang Y, Ma L, and Nicolae DL. (2018) A phylogenetic scan test on Dirichlet-tree multinomial model for microbiome data. Annals of Applied Statistics. Vol. 12, No. 1, 1-26. [online][preprint][R code]

Ma L and Soriano J. (2018) Efficient functional ANOVA through wavelet-domain Markov groves. Journal of the American Statistical Association, Theory and Methods. Vol. 113, No. 3, 802-818. [online][R package]

Soriano J and Ma L. (2017) Probabilistic multi-resolution scanning for two-sample differences. Journal of the Royal Statistical Society. Series B. Vol. 79, No. 2, 547-572. [online][R package]

Ma L. (2017) Recursive partitioning and multi-scale modeling on conditional densities. Electronic Journal of Statistics. Vol. 11, No. 1, 1297-1325. [online][R package]

Ma L. (2017) Adaptive shrinkage in Pólya tree type models. Bayesian Analysis. Vol. 12, No. 3, 779-805. (Featured in the editor's invited session “Highlights from Bayesian Analysis” at JSM 2017.) [online][supplement][R package]

Ma L. (2015) Scalable Bayesian model averaging through local information propagation. Journal of the American Statistical Association, Theory and Methods. Vol. 110, No. 510, 795-809. [online][preprint][R package]

Ma L. (2013) Adaptive testing of conditional association through recursive mixture modeling. Journal of the American Statistical Association, Theory and Methods. Vol. 108, No. 504, 1493-1505. [online][R package]

Ma L, Wong WH, and Owen AB. (2012) A sparse transmission disequilibrium test for haplotypes based on Bradley-Terry graphs. Human Heredity. Vol. 73, No. 1, 52-61. [online][preprint]

Ma L and Wong WH. (2011) Coupling optional Pólya trees and the two sample problem. Journal of the American Statistical Association, Theory and Methods. Vol. 106, No. 496, 1553-1565. [online][arxiv]

Ma L, Stein ML, Wang M, Shelton AO, Pfister CA, and Wilder KJ. (2011) A method for unbiased estimation of population abundance along curvy margins. Environmetrics. Vol. 22, No. 3, 330-339. [online]

Wong WH and Ma L. (2010) Optional Pólya tree and Bayesian inference. Annals of Statistics. Vol. 38, No. 3, 1433-1459. [online][pdf][R package]

Ma L, Assimes T, Asadi NB, Iribarren C, Quertermous T, and Wong WH. (2010) An “almost exhaustive” search based sequential permutation method for detecting epistasis in disease association studies. Genetic Epidemiology. Vol. 34, No. 5, 434-443. [online][software]

Ma L, Mease D, and Russell D. (2010) A four group cross-over design for measuring irreversible treatments on web search tasks. Proceedings of HICSS-44. [online]