

Supervised dimension reduction
    Software for supervised dimension reduction (SDR) from Bayesian, algorithmic, and geometric perspectives:
    Localized sliced inverse regression: Simple eigen-decomposition algorithm for SDR.
    Bayesian mixture of inverses: Probabilistic model for SDR.
    Bayesian gradient learning: Geometric approach to SDR.
    Kernel sliced inverse regression: Kernelized SIR for nonlinear dimension reduction.
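To illustrate the eigen-decomposition at the core of these methods, here is a minimal sketch of classical sliced inverse regression (SIR), the method that localized SIR refines. It is written in Python with NumPy. The function name `sliced_inverse_regression` and its defaults are our own assumptions, not the API of the software listed above.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Classical SIR: estimate dimension reduction directions from slice means of whitened X."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Whitening matrix Sigma^{-1/2} from a symmetric eigendecomposition
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Slice the sample by the order of y into roughly equal-sized groups
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    # Weighted covariance of the slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original X scale
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)
```

With a linear link such as `y = X @ beta + noise`, the leading returned column should be close to `beta` up to sign. Plain SIR is known to miss directions when the link is symmetric about the center (for example `y = x1**2`).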


Dimension reduction in massive data
    Development of linear, nonlinear, supervised, semisupervised, and unsupervised dimension reduction methods for massive data. Software is under development.
    Pre-beta code for PCA:
    Eigendecomposition code: C code for efficient eigendecomposition.
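The C code above is not reproduced here. As a generic sketch of one common way to scale PCA to large matrices, the Python/NumPy function below uses a randomized range finder with a few power iterations, so that only a small projected matrix is decomposed exactly. The name `randomized_pca` and the `oversample`/`n_iter` defaults are illustrative assumptions.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k principal axes and variances of X with a randomized range finder."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    n, p = Xc.shape
    # Random test matrix: Xc @ Omega approximately spans the dominant column space of Xc
    Omega = rng.standard_normal((p, k + oversample))
    Y = Xc @ Omega
    # Power iterations (re-orthonormalized for stability) sharpen a slowly decaying spectrum
    for _ in range(n_iter):
        Q = np.linalg.qr(Y)[0]
        Y = Xc @ (Xc.T @ Q)
    Q = np.linalg.qr(Y)[0]
    # Small projected problem: exact SVD of a (k + oversample) x p matrix
    B = Q.T @ Xc
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    components = Vt[:k]
    explained_var = s[:k] ** 2 / (n - 1)
    return components, explained_var
```

Increasing `n_iter` trades extra passes over the data for accuracy when the singular values decay slowly.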


Quantitative genetics

    Bayesian Sparse Factor Analysis of Genetic Covariance Matrices: BSFG is a sparse factor model for inferring the matrix of genetic covariances among traits. The code uses a Gibbs sampler to draw samples from the posterior distribution of a multivariate linear mixed effects model, in which the random effects are typically unobserved genetic values (breeding values) with known covariance (e.g., derived from a pedigree). The model focuses on estimating the matrices of genetic (and residual) covariances among traits; the genetic covariance matrix is called the G-matrix.
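For orientation, the model class described above can be written as follows. This is a sketch of the standard multivariate animal model plus one common sparse-factor form for the covariance matrices; the symbols are our own, and the exact parameterization and priors used by BSFG may differ. Here $Y$ is $n \times p$ (records by traits), $Z$ links records to the $r$ individuals whose known relationship matrix $A$ comes, for example, from a pedigree, $U$ is $r \times p$, $\Lambda$ is a sparse $p \times k$ loading matrix with $k \ll p$, and $\Sigma_a, \Sigma_e, \Psi_a, \Psi_e$ are diagonal.

```latex
\begin{aligned}
Y &= XB + ZU + E, \\
\operatorname{vec}(U) &\sim \mathcal{N}\!\left(0,\; G \otimes A\right), \qquad
\operatorname{vec}(E) \sim \mathcal{N}\!\left(0,\; R \otimes I_n\right), \\
G &= \Lambda \Sigma_a \Lambda^{\top} + \Psi_a, \qquad
R = \Lambda \Sigma_e \Lambda^{\top} + \Psi_e .
\end{aligned}
```

Sparsity in $\Lambda$ means each latent factor loads on only a few traits; for fixed $k$ the number of loading parameters grows linearly in $p$ rather than quadratically, as it would for an unstructured $G$.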


Automated 3D Geometric Morphometrics

    Automated 3D Geometric Morphometrics: Software for the comparative analysis of 3D digital models of bones. Unlike other three-dimensional geometric morphometric (3DGM) methods, this software uses a fully automated procedure for placing landmarks on the bones. This allows bones to be aligned and the distances between them measured with minimal user intervention.
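The alignment-and-distance step can be illustrated with ordinary Procrustes superimposition. The sketch below assumes the two landmark sets are already placed and in point-to-point correspondence, which is the step the software automates, so it shows only the final step in its simplest form and is not the package's procedure. The function name `procrustes_distance` is hypothetical.

```python
import numpy as np

def procrustes_distance(A, B, allow_reflection=False):
    """Procrustes distance between two k x 3 landmark sets with known correspondence."""
    # Remove translation (center) and scale (unit Frobenius norm)
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    A0 /= np.linalg.norm(A0)
    B0 /= np.linalg.norm(B0)
    # Rotation R minimizing ||A0 @ R - B0||_F comes from the SVD of A0^T B0
    U, _, Vt = np.linalg.svd(A0.T @ B0)
    if not allow_reflection and np.linalg.det(U @ Vt) < 0:
        # Flip the last axis so that R is a proper rotation (det = +1)
        U[:, -1] *= -1
    R = U @ Vt
    return np.linalg.norm(A0 @ R - B0), R
```

A rotated, rescaled, and translated copy of a configuration has distance zero from the original, so the distance reflects shape differences only.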


Gene set based approaches in high-throughput genomics

    Gene set enrichment analysis: Provides formal statistical evaluation and confidence assessment for the annotation of an expression data set by measuring the overlap of significantly perturbed genes with those in a database of gene sets.
    Analysis of Sample Set Enrichment Scores: Similar to GSEA, but estimates enrichment scores for each sample individually. ASSESS measures the variation across samples in the overlap of significantly perturbed genes with those in a database of gene sets.
    Evidence-ranked motif identification: Implements an enumerative strategy for identifying cis-regulatory elements from high-throughput genomic data such as chromatin-immunoprecipitation experiments.
    Gene set association analysis: Integrates gene expression analysis with genome-wide association studies (GWAS) to determine whether an a priori defined set of genes shows statistically significant, concordant differences in both gene expression profiles and genotypes between two biological states.
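To make the idea of an enrichment score concrete, here is a minimal sketch of the unweighted running-sum statistic on which GSEA-style scores are built. The procedure walks down a ranked gene list, steps up at genes in the set and down otherwise, and reports the maximum deviation from zero. This is an illustrative assumption, not the code of any package listed above. Published GSEA weights the steps by each gene's correlation with the phenotype and assesses significance by permutation; neither is shown here.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-style) running-sum enrichment score."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for genes in the set, down for genes outside it
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        # Keep the largest deviation from zero, with its sign
        if abs(running) > abs(best):
            best = running
    return best
```

A set concentrated at the top of the ranking scores near +1, and one concentrated at the bottom scores near -1.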


Probabilistic modeling and topology
    Simplicial models
    Graphical models can be extended to hypergraphs that model higher-order interactions. The combinatorial nature of hypergraph models, whose complexity grows exponentially with the number of vertices, creates computational challenges. We couple ideas from computational geometry and topology with spatial processes and classical Bayesian inference to learn hypergraphs. We are working on extensions to directed hypergraphs using Forman's discrete Morse theory. We plan to apply this work to inferring dependence structure in social networks.
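As an illustration of how computational geometry turns points into higher-order structure, the sketch below builds a Vietoris-Rips complex, in which each simplex (a hyperedge) is included when all of its vertices lie pairwise within a given radius. This shows only the kind of geometric construction involved. Whether the models above use a Rips complex, a Čech complex, or another construction is not stated here, and the function name `rips_complex` is hypothetical. A Rips complex is determined by its pairwise distances; a Čech complex, which tests for a common intersection of balls, is the usual alternative when genuinely higher-order information matters.

```python
from itertools import combinations
import math

def rips_complex(points, radius, max_dim=2):
    """Vietoris-Rips complex up to dimension max_dim, as a list of vertex tuples."""
    n = len(points)
    # Pairs of points (i < j) within the given radius
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= radius}
    simplices = [(i,) for i in range(n)]
    # A k-vertex simplex is included when every pair of its vertices is close
    for k in range(2, max_dim + 2):
        for s in combinations(range(n), k):
            if all(pair in close for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices
```

For three mutually close points and one distant point, the complex contains four vertices, three edges, and a single triangle.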

    Probability and persistent homology
    The integration of probabilistic modeling with algebraic structure is of great interest in topological analysis. We are developing algorithms, ideally with provable guarantees, for inferring topological invariants such as the strata of a Whitney stratified space. We are also very interested in developing a probabilistic or distributional theory for topological invariants.
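For readers new to persistence, the simplest case is dimension 0: track when connected components of a point cloud merge as a distance scale grows. The sketch below computes these birth/death pairs with union-find over edges sorted by length, which is equivalent to Kruskal's minimum spanning tree algorithm. It uses the convention that an edge enters the Rips filtration at its full length (some authors use half the length). It illustrates what a persistence diagram records; it is not one of the inference algorithms described above, and the name `zero_dim_persistence` is hypothetical.

```python
from itertools import combinations
import math

def zero_dim_persistence(points):
    """Birth/death pairs of connected components in a Vietoris-Rips filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the root of i's component, halving paths as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            # Every point is born at scale 0; one component dies when two merge at scale d
            pairs.append((0.0, d))
    # The last surviving component never dies
    pairs.append((0.0, math.inf))
    return pairs
```

Long-lived pairs correspond to well-separated clusters, while short-lived pairs reflect fine-scale noise.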


Modeling complex disease traits
    Modeling cancer progression
    Tumorigenesis is an example of a complex trait controlled by many genes that interact to explain variation in phenotype. This heterogeneity is the crux of the difficulty in modeling progression.
    The genetic heterogeneity has two sources: stage of disease and variability across individuals. Stratifying by stage and selecting genes that are consistently mutated may address this heterogeneity. However, doing so sacrifices a great deal of statistical power, and often very few genes (5-10) are consistently mutated.
    We address these issues by borrowing strength. We borrow strength across genes by modeling progression in the space of a priori defined gene sets, such as members of signalling pathways or chromosomal neighbors. We borrow strength across stages using Bayesian hierarchical models.
    In addition, we are able to infer gene interaction networks.
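As a textbook illustration of borrowing strength across stages (not the group's progression model), the sketch below uses a normal-normal hierarchical model. It assumes that each stage j yields an estimate y_j of some effect, for example a gene set's association with the phenotype, with known standard error s_j, and that the stage effects share a common mean with a between-stage standard deviation tau fixed in advance. The function `partial_pool` is hypothetical.

```python
import numpy as np

def partial_pool(y, se, tau):
    """Posterior means of stage effects under a normal-normal model with fixed tau."""
    y = np.asarray(y, dtype=float)
    se = np.asarray(se, dtype=float)
    # Estimate the common mean by precision weighting with marginal variances se^2 + tau^2
    v = se**2 + tau**2
    mu = np.sum(y / v) / np.sum(1.0 / v)
    # Shrink each stage toward mu; noisier stages (larger se) are shrunk more
    shrink = se**2 / (se**2 + tau**2)
    return mu + (1.0 - shrink) * (y - mu)
```

A small `tau` pulls every stage toward the common mean, while a large `tau` leaves each stage close to its own estimate.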

    Integration of high-throughput genomic data
    We are using the same framework to integrate high-throughput genomic sources such as expression, copy number variation, single nucleotide polymorphism, and methylation data.