Selected publications
Some stage of review
- Topological Summaries
of Tumor Images Improve Prediction of Disease Free Survival in
Glioblastoma Multiforme
- Development
and assessment of fully automated and globally transitive geometric
morphometric methods, with application to a biological comparative
dataset with high interspecific variation
- The
Geometry of Synchronization Problems and Learning Group Actions
-
HOMINID: A framework for identifying associations between
host genetic variation and microbiome composition
-
Differential Expression Analysis for RNAseq using Poisson Mixed Models
-
A phylogenetic transform enhances analysis of compositional
microbiota data.
-
Detecting Epistasis in Genome-wide Association Studies with the
Marginal EPIstasis Test.
-
Fast moment estimation for generalized latent Dirichlet models.
- Approximations of Markov
Chains and High-Dimensional Bayesian Inference.
- Bayesian Approximate
Kernel Regression with Variable Selection.
- Adaptive Randomized
Dimension Reduction on Massive Data.
-
Learning Subspaces of Different Dimension.
-
Sufficient statistics for shapes and surfaces.
-
Randomized Algorithms for Dimension Reduction on Massive Data.
-
Towards stratification learning through homology inference.
-
Multiscale factor models for molecular networks.
Published, accepted, or in press (since ~2001)
2016
-
Efficient genome-wide sequencing and low coverage pedigree
analysis from non-invasively collected samples. (2016), Genetics.
- Fast principal components
analysis reveals independent evolution of ADH1B gene in Europe and
East Asia. (2016), American Journal of Human Genetics.
-
Random Walks on Simplicial Complexes and Harmonics. (2016),
Random Structures and Algorithms.
- Geometric representations of random hyper-graphs. (2016), JASA.
- Topological Consistency
via Kernel Estimation. (2016), Bernoulli.
- Bayesian group latent
factor analysis with structured sparse priors. (2016), JMLR.
2015
-
Statistical inference for dynamical systems: a review. (2015),
Statistics Surveys.
- Contour Trees of Uncertain
Terrains. (2015), ACM SIGSPATIAL in GIS 2015.
- Citizen Science as a New Tool in Dog Cognition Research. (2015), PLoS One.
-
Probabilistic Frechet Means and Statistics on Vineyards. (2015),
Electronic Journal of Statistics.
-
The information geometry of mirror descent. (2015), IEEE Trans. on
Info. Theory.
-
Consistency of maximum likelihood estimation for some dynamical
systems. (2015), Annals of Statistics.
-
The Topology of Probability Distributions on Manifolds. (2015),
Probability Theory and Related Fields.
2014
-
Cumulon: Cloud-Based Statistical Analysis from Users Perspective.
(2014), IEEE Data Eng. Bull.
- Core and region-enriched networks of behaviorally regulated genes
and the singing genome. (2014), Science.
- Persistent Homology Transform for Modeling Shapes and
Surfaces. (2014), Information and Inference: A Journal of the IMA.
- A new fully automated approach for aligning and comparing
shapes. (2014), The Anatomical Record.
-
Novel Distal eQTL Analysis Demonstrates Effect of Population Genetic
Architecture on Detecting and Interpreting Associations. (2014),
Genetics.
-
GSAASeqSP: A Toolset for Gene Set Association Analysis of
RNA-Seq Data. (2014), Scientific Reports.
-
A Cheeger-Type Inequality on Simplicial Complexes. (2014),
Advances in Applied Mathematics.
-
Frechet Means for Distributions of Persistence Diagrams.
(2014), Discrete and Computational Geometry.
-
A Digital Network Approach to Infer Sex Behavior in Emerging HIV Epidemics.
(2014), PLoS One.
-
Statistical Analysis of Crystallization Database Links Protein
Physico-Chemical Features with Crystallization Mechanisms.
(2014), PLoS One.
2013
-
Distinct and Overlapping Sarcoma Subtypes Initiated from Muscle Stem and Progenitor Cells.
(2013), Cell Reports.
-
Genome-wide identification and predictive modeling of tissue-specific
alternative polyadenylation.
(2013), Bioinformatics.
-
A comparative study of covariance selection models for the inference
of gene regulatory networks.
(2013), Journal of Biomedical Informatics.
- DNase-seq predicts regions of rotational nucleosome stability across diverse human cell types.
(2013), Genome Research.
- Sustained-input switches for transcription factors and microRNAs are central building blocks of eukaryotic gene circuits.
(2013), Genome Biology.
- Partial factor modeling: predictor-dependent shrinkage for linear
regression. (2013), Journal of the American Statistical Association.
- Kernel
Sliced Inverse Regression: Regularization and Consistency. (2013), Abstract and Applied Analysis.
- Assessing the
radiation response of lung cancer with different gene mutations
using genetically engineered mice. (2013), Frontiers in Oncology.
-
Dissecting High-Dimensional Phenotypes with Bayesian Sparse Factor Analysis of Genetic Covariance Matrices.
(2013), Genetics.
2012
-
Genetics of gene expression responses to temperature stress in a
sea urchin gene network. (2012), Molecular Ecology.
- A
Predictive Framework for Integrating Disparate Genomic Data Types
Using Sample-Specific Gene Set Enrichment Analysis and Multi-Task
Learning. (2012), PLoS One.
- Genetic effects on mating
success and partner choice in a social mammal. (2012), American Naturalist.
-
Cyclin-Dependent Kinases Are Regulators and Effectors of Oscillations
Driven by a Transcription Factor Network. (2012), Molecular Cell.
- Local
Homology Transfer and Stratification Learning. (2012),
ACM-SIAM Symposium on Discrete Algorithms.
-
Probability measures on the space of persistence diagrams. (2012),
Inverse Problems.
-
Integrating genetic and gene expression evidence into genome-wide
association analysis of gene sets. (2012), Genome Research.
2011
-
RS-SNP: a random-set method for genome-wide association studies.
(2011), BMC Genomics.
-
Discovering genetic variants in Crohn's disease by exploring genomic regions enriched of weak association signals.
(2011), Digestive and Liver Disease.
-
Cross Species Genomic Analysis Identifies a Mouse Model as Undifferentiated Pleomorphic Sarcoma/Malignant Fibrous Histiocytoma.
(2011), PLoS One.
-
Estimating variable structure and dependence in Multi-task learning
via gradients. (2011), Machine Learning.
- Multiscale factor models for molecular networks. (2011), Proc of JSM.
2010
-
On the reproducibility of results of pathway analysis in genome-wide
expression studies of colorectal cancers. (2010), Journal of Biomedical Informatics.
-
Localized Sliced Inverse Regression. (2010), Journal of
Computational and Graphical Statistics.
-
Learning gradients: predictive models that infer geometry and
dependence. (2010), Journal of Machine Learning Research.
-
Bayesian mixture of inverse regressions. (2010), International
Conference on Artificial Intelligence and Statistics.
- Learning Gradients and
Feature Selection on Manifolds. (2010), Bernoulli.
- Evidence-ranked
motif identification. (2010), Genome Biology.
2009
-
Comparative study of gene set enrichment methods. (2009), BMC Bioinformatics.
- Genomic
features that predict allelic imbalance in humans suggest patterns
of constraint on gene expression variation. (2009), Molecular
Biology and Evolution.
- Do
serum biomarkers really measure breast cancer? (2009), BMC Cancer.
- Characterizing
the developmental pathways TTF-1, NKX2-8, and PAX9 in lung
cancer. (2009), Proc. Natl. Acad. Sci. USA.
- Local
sliced inverse regression. (2009), Proceedings of Advances in Neural
Information Processing Systems.
2008
- Modeling
cancer progression via pathway dependencies. (2008), PLoS Comput Biol.
2007
- Gene
Expression Programs of Human Smooth Muscle Cells: Tissue-Specific
Differentiation and Prognostic Significance in Breast Cancers.
(2007), PLoS Genetics.
- Understanding the use of
unlabelled data in predictive modelling. (2007), Statistical Science.
- Characterizing
the Function Space for Bayesian Kernel Models. (2007), J Mach Learn Res.
-
Genomic sweeping for hypermethylated genes (2007), Bioinformatics.
2006
- Evidence
of influence of genomic DNA sequence on human X chromosome
inactivation. (2006), PLoS Comput Biol.
- Analysis of Sample Set Enrichment
Scores: assaying the enrichment of sets of genes for individual
samples in genome-wide expression profiles. (2006), Bioinformatics.
- Gene
expression changes and molecular pathways mediating
activity-dependent plasticity in visual cortex. (2006), Nat Neurosci.
-
Estimation of Gradients and Coordinate Covariation in
Classification. (2006), J Mach Learn Res.
-
Learning Coordinate Covariances via Gradients. (2006), J Mach Learn Res.
- Statistical Learning: Stability
is Sufficient for Generalization and Necessary and Sufficient for
Consistency of Empirical Risk Minimization. (2006), Adv Comput Math.
2005
- Gene Set
Enrichment Analysis: A Knowledge-Based Approach for Interpreting
Genome-wide Expression Profiles (2005), Proc Natl Acad Sci USA.
-
An oncogenic KRAS2 expression signature identified by cross-species
gene-expression analysis (2005), Nat Genet.
- Stability Results in Learning Theory (2005), Anal App.
- Permutation Tests for
Classification (2005), Proceedings of the Conference on Learning Theory.
-
Risk Bounds for Mixture Density Estimation (2005),
ESAIM: Probability and Statistics.
- Gene Selection via a Spectral
Approach (2005), IEEE Workshop on Computer Vision Methods for Bioinformatics.
2001-2004
- Androgen-Induced Differentiation and Tumorigenicity of Human Prostate
Epithelial Cells. (2004), Cancer Research.
-
Learning
Theory: general conditions for predictivity. (2004), Nature.
-
Estimating Dataset Size Requirements for Classifying DNA
Microarray Data. (2003), J Comput Biol.
- An
Analytical Method for Multi-class Molecular Cancer Classification. (2003), SIAM Review.
-
Optimal gene expression analysis by microarrays. (2002), Cancer Cell.
-
Gene Expression-Based Classification and Outcome Prediction of
Central Nervous System Embryonal Tumors. (2002), Nature.
- Choosing Multiple Parameters for
Support Vector Machines. (2002), Machine Learning.
-
A Uniform Approach to Molecular Cancer Diagnosis Using Tumor
Gene Expression Signatures. (2001), Proc Natl Acad Sci U S A.
-
Molecular classification of multiple tumor types. (2001), Bioinformatics.
-
Bounds on sample size for policy evaluation in Markov
environments. (2001), Proceedings of the Conference on Learning Theory.
- Feature Selection for SVMs. J Weston,
S Mukherjee, O Chapelle, M Pontil, T Poggio, V Vapnik. Proc Neural Information Processing Systems.
Book Chapters
- Classifying Microarray Data Using
Support Vector Machines. Understanding and Using Microarray Analysis Techniques: A Practical Guide.
- Regression and Classification with
Regularization. Nonlinear Estimation and Classification.
- Uncertainty in Geometric Computations.
Unpublished notes
-
Statistical learning theory lecture notes, random notes.
-
Non-parametric Bayesian kernel models, Working Paper.
- Support Vector Method for Multivariate Density
Estimation, CBCL/AI Memo.
- Support Vector Machine Classification of Microarray Data, CBCL/AI Memo.