Selected publications

Some stage of review
  1. Topological Summaries of Tumor Images Improve Prediction of Disease Free Survival in Glioblastoma Multiforme.
  2. Development and assessment of fully automated and globally transitive geometric morphometric methods, with application to a biological comparative dataset with high interspecific variation.
  3. The Geometry of Synchronization Problems and Learning Group Actions.
  4. HOMINID: A framework for identifying associations between host genetic variation and microbiome composition.
  5. Differential Expression Analysis for RNAseq using Poisson Mixed Models.
  6. A phylogenetic transform enhances analysis of compositional microbiota data.
  7. Detecting Epistasis in Genome-wide Association Studies with the Marginal EPIstasis Test.
  8. Fast moment estimation for generalized latent Dirichlet models.
  9. Approximations of Markov Chains and High-Dimensional Bayesian Inference.
  10. Bayesian Approximate Kernel Regression with Variable Selection.
  11. Adaptive Randomized Dimension Reduction on Massive Data.
  12. Learning Subspaces of Different Dimension.
  13. Sufficient statistics for shapes and surfaces.
  14. Randomized Algorithms for Dimension Reduction on Massive Data.
  15. Towards stratification learning through homology inference.
  16. Multiscale factor models for molecular networks.
Published, accepted, or in press (since ~2001)
  1. Efficient genome-wide sequencing and low coverage pedigree analysis from non-invasively collected samples. (2016), Genetics.
  2. Fast principal components analysis reveals independent evolution of ADH1B gene in Europe and East Asia. (2016), American Journal of Human Genetics.
  3. Random Walks on Simplicial Complexes and Harmonics. (2016), Random Structures and Algorithms.
  4. Geometric representations of random hyper-graphs. (2016), JASA.
  5. Topological Consistency via Kernel Estimation. (2016), Bernoulli.
  6. Bayesian group latent factor analysis with structured sparse priors. (2016), JMLR.

  7. Statistical inference for dynamical systems: a review. (2015), Statistics Surveys.
  8. Contour Trees of Uncertain Terrains. (2015), ACM SIGSPATIAL in GIS 2015.
  9. Citizen Science as a New Tool in Dog Cognition Research. (2015), PLoS One.
  10. Probabilistic Frechet Means and Statistics on Vineyards. (2015), Electronic Journal of Statistics.
  11. The information geometry of mirror descent. (2015), IEEE Trans. on Info. Theory.
  12. Consistency of maximum likelihood estimation for some dynamical systems. (2015), Annals of Statistics.
  13. The Topology of Probability Distributions on Manifolds. (2015), Probability Theory and Related Fields.

  14. Cumulon: Cloud-Based Statistical Analysis from Users' Perspective. (2014), IEEE Data Eng. Bull.
  15. Core and region-enriched networks of behaviorally regulated genes and the singing genome. (2014), Science.
  16. Persistent Homology Transform for Modeling Shapes and Surfaces. (2014), Information and Inference: A Journal of the IMA.
  17. A new fully automated approach for aligning and comparing shapes. (2014), The Anatomical Record.
  18. Novel Distal eQTL Analysis Demonstrates Effect of Population Genetic Architecture on Detecting and Interpreting Associations. (2014), Genetics.
  19. GSAASeqSP: A Toolset for Gene Set Association Analysis of RNA-Seq Data. (2014), Scientific Reports.
  20. A Cheeger-Type Inequality on Simplicial Complexes. (2014), Advances in Applied Mathematics.
  21. Frechet Means for Distributions of Persistence Diagrams. (2014), Discrete and Computational Geometry.
  22. A Digital Network Approach to Infer Sex Behavior in Emerging HIV Epidemics. (2014), PLoS One.
  23. Statistical Analysis of Crystallization Database Links Protein Physico-Chemical Features with Crystallization Mechanisms. (2014), PLoS One.

  24. Distinct and Overlapping Sarcoma Subtypes Initiated from Muscle Stem and Progenitor Cells. (2013), Cell Reports.
  25. Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation. (2013), Bioinformatics.
  26. A comparative study of covariance selection models for the inference of gene regulatory networks. (2013), Journal of Biomedical Informatics.
  27. DNase-seq predicts regions of rotational nucleosome stability across diverse human cell types. (2013), Genome Research.
  28. Sustained-input switches for transcription factors and microRNAs are central building blocks of eukaryotic gene circuits. (2013), Genome Biology.
  29. Partial factor modeling: predictor-dependent shrinkage for linear regression. (2013), Journal of the American Statistical Association.
  30. Kernel Sliced Inverse Regression: Regularization and Consistency. (2013), Abstract and Applied Analysis.
  31. Assessing the radiation response of lung cancer with different gene mutations using genetically engineered mice. (2013), Frontiers in Oncology.
  32. Dissecting High-Dimensional Phenotypes with Bayesian Sparse Factor Analysis of Genetic Covariance Matrices. (2013), Genetics.

  33. Genetics of gene expression responses to temperature stress in a sea urchin gene network. (2012), Molecular Ecology.
  34. A Predictive Framework for Integrating Disparate Genomic Data Types Using Sample-Specific Gene Set Enrichment Analysis and Multi-Task Learning. (2012), PLoS One.
  35. Genetic effects on mating success and partner choice in a social mammal. (2012), American Naturalist.
  36. Cyclin-Dependent Kinases Are Regulators and Effectors of Oscillations Driven by a Transcription Factor Network. (2012), Molecular Cell.
  37. Local Homology Transfer and Stratification Learning. (2012), ACM-SIAM Symposium on Discrete Algorithms.
  38. Probability measures on the space of persistence diagrams. (2012), Inverse Problems.
  39. Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets. (2012), Genome Research.

  40. RS-SNP: a random-set method for genome-wide association studies. (2011), BMC Genomics.
  41. Discovering genetic variants in Crohn's disease by exploring genomic regions enriched of weak association signals. (2011), Digestive and Liver Disease.
  42. Cross Species Genomic Analysis Identifies a Mouse Model as Undifferentiated Pleomorphic Sarcoma/Malignant Fibrous Histiocytoma. (2011), PLoS One.
  43. Estimating variable structure and dependence in Multi-task learning via gradients. (2011), Machine Learning.
  44. Multiscale factor models for molecular networks. (2011), Proc of JSM.

  45. On the reproducibility of results of pathway analysis in genome-wide expression studies of colorectal cancers. (2010), Journal of Biomedical Informatics.
  46. Localized Sliced Inverse Regression. (2010), Journal of Computational and Graphical Statistics.
  47. Learning gradients: predictive models that infer geometry and dependence. (2010), Journal of Machine Learning Research.
  48. Bayesian mixture of inverse regressions. (2010), International Conference on Artificial Intelligence and Statistics.
  49. Learning Gradients and Feature Selection on Manifolds. (2010), Bernoulli.
  50. Evidence-ranked motif identification. (2010), Genome Biology.

  51. Comparative study of gene set enrichment methods. (2009), BMC Bioinformatics.
  52. Genomic features that predict allelic imbalance in humans suggest patterns of constraint on gene expression variation. (2009), Molecular Biology and Evolution.
  53. Do serum biomarkers really measure breast cancer? (2009), BMC Cancer.
  54. Characterizing the developmental pathways TTF-1, NKX2-8, and PAX9 in lung cancer. (2009), Proc. Natl. Acad. Sci. USA.
  55. Local sliced inverse regression. (2009), Proceedings of Advances in Neural Information Processing Systems.

  56. Modeling cancer progression via pathway dependencies. (2008), PLoS Comput Biol.

  57. Gene Expression Programs of Human Smooth Muscle Cells: Tissue-Specific Differentiation and Prognostic Significance in Breast Cancers. (2007), PLoS Genetics.
  58. Understanding the use of unlabelled data in predictive modelling. (2007), Statistical Science.
  59. Characterizing the Function Space for Bayesian Kernel Models. (2007), J Mach Learn Res.
  60. Genomic sweeping for hypermethylated genes. (2007), Bioinformatics.

  61. Evidence of influence of genomic DNA sequence on human X chromosome inactivation. (2006), PLoS Comput Biol.
  62. Analysis of Sample Set Enrichment Scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles. (2006), Bioinformatics.
  63. Gene expression changes and molecular pathways mediating activity-dependent plasticity in visual cortex. (2006), Nat Neurosci.
  64. Estimation of Gradients and Coordinate Covariation in Classification. (2006), J Mach Learn Res.
  65. Learning Coordinate Covariances via Gradients. (2006), J Mach Learn Res.
  66. Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization. (2006), Adv Comput Math.

  67. Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-wide Expression Profiles. (2005), Proc Natl Acad Sci USA.
  68. An oncogenic KRAS2 expression signature identified by cross-species gene-expression analysis. (2005), Nat Genet.
  69. Stability Results in Learning Theory. (2005), Anal App.
  70. Permutation Tests for Classification. (2005), Proceedings of the Conference on Learning Theory.
  71. Risk Bounds for Mixture Density Estimation. (2005), ESAIM: Probability and Statistics.
  72. Gene Selection via a Spectral Approach. (2005), IEEE Workshop on Computer Vision Methods for Bioinformatics.

  73. Androgen-Induced Differentiation and Tumorigenicity of Human Prostate Epithelial Cells. (2004), Cancer Research.
  74. Learning Theory: general conditions for predictivity. (2004), Nature.
  75. Estimating Dataset Size Requirements for Classifying DNA Microarray Data. (2003), J Comput Biol.
  76. An Analytical Method for Multi-class Molecular Cancer Classification. (2003), SIAM Review.
  77. Optimal gene expression analysis by microarrays. (2002), Cancer Cell.
  78. Gene Expression-Based Classification and Outcome Prediction of Central Nervous System Embryonal Tumors. (2002), Nature.
  79. Choosing Multiple Parameters for Support Vector Machines. (2002), Machine Learning.
  80. A Uniform Approach to Molecular Cancer Diagnosis Using Tumor Gene Expression Signatures. (2001), Proc Natl Acad Sci U S A.
  81. Molecular classification of multiple tumor types. (2001), Bioinformatics.
  82. Bounds on sample size for policy evaluation in Markov environments. (2001), Proceedings of the Conference on Learning Theory.
  83. Feature Selection for SVMs. J Weston, S Mukherjee, O Chapelle, M Pontil, T Poggio, V Vapnik. Proc Neural Information Processing Systems.
Book Chapters
  1. Classifying Microarray Data Using Support Vector Machines. Understanding and Using Microarray Analysis Techniques: A Practical Guide.
  2. Regression and Classification with Regularization. Nonlinear Estimation and Classification.
  3. Uncertainty in Geometric Computations.
Unpublished notes
  1. Statistical learning theory lecture notes, random notes.
  2. Non-parametric Bayesian kernel models, Working Paper.
  3. Support Vector Method for Multivariate Density Estimation, CBCL/AI Memo.
  4. Support Vector Machine Classification of Microarray Data, CBCL/AI Memo.