ABS04 - 2004 Applied Bayesian Statistics School

STATISTICS & GENE EXPRESSION GENOMICS:
METHODS AND COMPUTATIONS
Centro Congressi Panorama, Trento, Italy
15th-19th June 2004

Slides, notes, software, data, links

MIKE WEST



Lecture slides
  • Biology basics
  • Biological phenotypes
  • Affymetrix DNA microarray basics
  • Statistics, multivariate data exploration & regression approaches
    Binary regressions and molecular phenotyping studies
    Statistical prediction tree models and clinico-genomics
    Graphical models

    Data and examples with Matlab
    PNAS 2001 Breast Cancer:
    data and code, local version of the paper, the PNAS journal paper & low level array data (cel files)
    PNAS 2004 Breast Cancer:
    data and code, local version of the paper, and the PNAS journal paper
    Nat Gen 2003 Myc/Ras/E2F:
    data and code, local version of the paper, and the Nat Gen journal paper & low level array data (cel files)
    Science 1999 MIT/Whitehead Leukemia:
    data and paper
    Some Matlab utilities (functions and scripts) for data handling and exploration, and for some of the statistical summaries and modelling of the expression data listed here. Some additional functions and scripts cover binary tree analysis and examples.

    Statistics notes
    1. Basic Statistics: dvi and pdf
    2. Least Squares Regression: dvi and pdf
    3. Multiple Regression: dvi and pdf
    4. Clustering: dvi and pdf
    5. Empirical Factors - PCA and SVD: dvi and pdf
    6. Factor Regression: dvi and pdf
    7. Bayesian Regression & Shrinkage Estimation: dvi and pdf
    8. Binary Regression: dvi and pdf
    9. Gibbs Sampling in Linear Regression with Shrinkage Priors: dvi and pdf
    10. Gibbs Sampling in Binary Regression: dvi and pdf
    11. Multinormal Theory: dvi and pdf

    A few other relevant Duke papers
    A list including those above as well as others on statistical modelling and a range of applications in expression genomics

    Some Duke software and tools sites, and key gene/genomics data base sites
    The Duke CAGP GraphExplore software for displaying, exploring and manipulating general graphs (directed, undirected), and of particular use for graphs generated in analysis of gene expression associations and other genomic data sets
    The Duke Integrated Genomics (DIG) data base, for exploration of gene annotation, links to information servers, automated literature searches to generate biological information, etc
    MetaGeneCreator software (Adrian Dobra at Duke) for reclustering and improved definition of metagene clusters. This takes as input either raw data, in which case it utilises k-means clustering and then an iterative refinement of clustering, or data from covariance selection/graphical models such as generated by ...
    GGM software - C++ code implementing methods of stochastic computation (Metropolis-Hastings MCMC and shotgun stochastic search) for model exploration and selection in Gaussian graphical models (Duke graphical models group).
    Bayesian covariance selection in high-dimensions - initial code available as HdBCS (Adrian Dobra at Duke)
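
    The k-means starting step mentioned for MetaGeneCreator above can be sketched roughly as follows. This is only an illustrative sketch in Python with NumPy, not the MetaGeneCreator code or its refinement procedure: the function name kmeans_genes, the synthetic data, and the use of cluster means as metagene summaries are all assumptions made for the example.

```python
import numpy as np

def kmeans_genes(X, k, n_iter=50, seed=0):
    """Cluster the rows (genes) of X into k groups with plain Lloyd k-means.

    Hypothetical helper for illustration only; X is genes x samples.
    """
    rng = np.random.default_rng(seed)
    # Start from k distinct genes chosen at random as initial centres.
    centres = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    labels = np.full(X.shape[0], -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centres are too
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return labels, centres

# Synthetic example (assumed data): 40 genes on 10 samples, two opposite patterns.
rng = np.random.default_rng(1)
pattern_a = np.concatenate([np.ones(5), -np.ones(5)])
pattern_b = -pattern_a
X = np.vstack([
    pattern_a + 0.1 * rng.standard_normal((20, 10)),
    pattern_b + 0.1 * rng.standard_normal((20, 10)),
])

labels, centres = kmeans_genes(X, k=2)
# Each row of `centres` is the mean expression profile of one cluster,
# used here as a simple stand-in for a metagene summary.
print(labels)
print(centres.round(2))
```

    The cluster mean is only one simple choice of summary; any iterative refinement of the clusters would start from labels of this kind.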

    Matlab, R, Bioconductor and other useful software and tools sites
    Bioconductor web site: Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data, built on the R statistical programming environment
    Bioconductor lab materials - really useful introductory tutorial material for Bioconductor (and R). Look particularly at lab slides by Robert Gentleman and colleagues
    CRAN web site for R -- go there to download and install R (free). R is a widely used open source language and environment for statistical computing and graphics. It is available for Linux, Unix, Windows, and Macintosh computers. More information on R is available in the "R Basics" section of the R FAQ
    xcluster by Gavin Sherlock
    Cluster software, including Cluster 3.0 (Mac, Windows, Linux) and manual
    Java Treeview software and download site
    Eisen lab site for Cluster & Treeview software

    Some local microarray info
    Magic of Microarrays, a recent Scientific American overview article
    Duke template for Affymetrix files: a brief description of Affymetrix data files
    Some useful web sites on arrays, resources
    Microarray slides (powerpoint)
    Affymetrix data processing - basic details
    Affymetrix manual: more Affymetrix details

    Some gene/genomics data base sites
    The Entrez Gene site at NCBI. Gene provides a unified query environment for genes defined by sequence and/or in NCBI's Map Viewer. You can query on names, symbols, accessions, publications, GO terms, chromosome numbers, E.C. numbers, and many other attributes associated with genes and the products they encode. Gene is one of the Entrez systems and is likely to replace LocusLink soon.
    The LocusLink site at NCBI, providing a single query interface to curated sequence and descriptive information about genetic loci. It presents information on official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites.
    The KEGG site: Kyoto Encyclopedia of Genes and Genomes, for annotation and visualisation of functional metabolic pathways