Supervised dimension reduction
Software for SDR from Bayesian, algorithmic, and geometric perspectives.
Dimension reduction in massive data
Localized sliced inverse regression: Simple eigendecomposition algorithm for SDR.
Bayesian mixture of inverses: Probabilistic model for SDR.
learning: Geometric approach to SDR.
Kernel sliced inverse regression: One can kernelize anything.
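To illustrate the eigendecomposition idea behind the sliced inverse regression entries above, here is a minimal sketch of plain (non-localized, non-kernelized) sliced inverse regression in Python with NumPy. It is not the software listed here; the function name and the parameters `n_slices` and `n_directions` are assumptions made for illustration.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=2):
    """Plain SIR sketch: returns estimated reduction directions as columns."""
    n, p = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Whitening transform cov^{-1/2}, built from the eigendecomposition of cov.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = (X - mean) @ inv_sqrt
    # Slice the observations into groups of roughly equal size by sorted response.
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance of the slice means of the whitened predictors.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original predictor scale.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    return inv_sqrt @ top
```

The sketch assumes the sample covariance of the predictors is well conditioned; with more predictors than samples the whitening step would need regularization.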
Development of linear, nonlinear, supervised, semisupervised and unsupervised
dimension reduction for massive data. Software is under development.
Code in pre-beta mode for PCA:
code: C code for efficient eigendecomposition.
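For orientation, PCA by eigendecomposition can be sketched in a few lines of Python with NumPy. This is only an illustration of the technique, not the C code referred to above, and it makes no attempt at the efficiency needed for massive data; the function name `pca_eig` is hypothetical.

```python
import numpy as np

def pca_eig(X, n_components):
    """PCA sketch via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                  # center each column
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance matrix
    evals, evecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    components = evecs[:, order]             # principal directions as columns
    scores = Xc @ components                 # data projected onto the directions
    return scores, components, evals[order]
```

The returned eigenvalues are the variances of the corresponding score columns, so they can be used to judge how many components to keep.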
Bayesian Sparse Factor Analysis of Genetic Covariance Matrices:
Bayesian Sparse Factor Analysis of Genetic Covariance Matrices (BSFG) is a genetic sparse factor model that infers the matrix of genetic covariances among traits. The code implementing the model uses a Gibbs sampler to draw samples from the posterior distribution of a multivariate linear mixed-effects model, where the random effects are generally unobserved genetic values (breeding values) with known covariance (e.g., based on a pedigree). The focus of the model is on estimating the matrix of genetic (and residual) covariances among traits, called the G-matrix.
Gene set based approaches in high-throughput genomics
Automated 3D Geometric Morphometrics:
Software for comparative analysis of 3D digital models representing bones. Unlike other three-dimensional geometric morphometric (3DGM) methods, this software uses a fully automated procedure for placing landmarks on the bones. This allows bones to be aligned and the distances between them measured with minimal user intervention.
Probabilistic modeling and topology
Gene set enrichment analysis: Provides formal statistical evaluation and
confidence assessments for annotation of an expression data set by
measuring the overlap of significantly perturbed genes with those in
a database of gene sets.
Analysis of Sample Set Enrichment Scores: Similar to GSEA, but can
estimate enrichment scores on a per-sample basis for all samples.
ASSESS measures the variation in overlap of significantly perturbed
genes with those in a database of gene sets.
Evidence-ranked motif identification: Implements an enumerative strategy for
identifying cis-regulatory elements from high-throughput genomic data
such as chromatin-immunoprecipitation experiments.
Gene set association analysis: Integrates gene expression analysis with
genome-wide association studies (GWAS) to determine whether an a priori
defined set of genes shows statistically significant, concordant
differences with respect to gene expression profiles and genotypes
between two biological states.
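The tools above share the idea of measuring the overlap between significantly perturbed genes and predefined gene sets. As a rough illustration of that idea only, the following Python sketch computes a simple hypergeometric over-representation p-value for one gene set. This is a swapped-in, simpler technique and not the statistic used by any of the listed tools; the function name and arguments are hypothetical, and both gene lists are assumed to be drawn from the same universe of `universe_size` genes.

```python
from math import comb

def overlap_p_value(perturbed, gene_set, universe_size):
    """Upper-tail hypergeometric p-value for the overlap of two gene lists."""
    perturbed, gene_set = set(perturbed), set(gene_set)
    k = len(perturbed & gene_set)            # observed overlap
    n, K, N = len(perturbed), len(gene_set), universe_size
    total = comb(N, n)
    # P(overlap >= k) when n genes are drawn at random from N genes,
    # K of which belong to the gene set.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / total
```

In practice one such test would be run per gene set in the database, followed by a multiple-testing correction.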
Modeling complex disease traits
Graphical models can be extended to hypergraphs that model
higher-order interactions. The combinatorial nature and
exponential increase in complexity of hypergraph models
result in computational problems. We couple ideas from
computational geometry and topology with spatial processes
and classical Bayesian inference to learn hypergraphs.
We are working on extensions to directed hypergraphs using
Forman/Morse theory. This will be applied to inference of
dependence structure in social networks.
Probability and persistent homology
The integration of probabilistic modeling with algebraic
structure is of great interest in topological analysis. We
are developing algorithms, ideally with provable guarantees,
for inference of topological invariants such as the strata
of a Whitney stratified space. We are also very interested
in developing a probabilistic or distributional theory
for topological invariants.
Modeling cancer progression
Tumorigenesis is an example of a complex trait controlled by many
genes that interact to explain variation in phenotype. This
heterogeneity is the crux of the difficulty in modeling progression.
The genetic heterogeneity consists of two sources: stage of disease
and variability across individuals. Stratification across stage and
selection of genes that are consistently mutated may address the
heterogeneity. However, a great deal of power is lost and often
very few genes (5-10) are consistently mutated.
We address these issues by borrowing strength. We borrow strength
across genes by modeling progression in the space of a priori defined
gene sets such as members of signalling pathways or chromosomal
neighbors. We borrow strength across stages using Bayesian hierarchical models.
In addition, we are able to infer gene interaction networks.
Integration of high-throughput genomic data
We are using the same framework to integrate high-throughput genomic sources such as expression, copy number variation, single nucleotide polymorphism, and methylation data.