Prof: Sayan Mukherjee, sayan@stat.duke.edu
Office hours: Wednesday 2:15-3:15pm, 112 Old Chem

Class: Tuesday 11:45am-2:40pm, 025 Old Chem

- formalize the question as a probabilistic model (typically via a likelihood);
- clarify the interpretation of model parameters and the model assumptions;
- develop methods for parameter estimation;
- quantify uncertainty in parameter estimation;
- interpret the parameters to address the biological question.

Statistics at the level of STA611 (Introduction to Statistical Methods) is expected, along with knowledge of linear algebra and multivariate calculus.

Some references for R will also be useful. First, you can download R from the CRAN website. There are many resources, such as RStudio, that can help with the programming interface, and R tutorials are widely available online. If you are getting bored with the standard graphics package, I really like using ggplot2 for beautiful graphics and figures. Finally, you can integrate R code and output with plain text using knitr, but that might be going a bit too far for beginners.

We will have daily readings for the course, but there is no formal text for this class. However, some texts and notes that may be useful include:

- Michael Lavine, Introduction to Statistical Thought (an introductory statistical textbook with plenty of R examples, and it's online too)
- Ewens and Grant, Statistical Methods in Bioinformatics
- Cristianini and Hahn, Introduction to Computational Genomics
- Sayan Mukherjee, Statistical Methods for Computational Biology
- Kevin Murphy, Machine Learning: A Probabilistic Perspective
- Durbin, Eddy, Krogh, and Mitchison, Biological Sequence Analysis
- Joseph Felsenstein, Inferring Phylogenies

Dog Population Splits and Mixtures from Genome-wide Allele Frequency Data

Parameter inference for a differential equations model of the NKCC2 cotransporter

Histone Occupancy and Gene Expression

Estimating the Effect of Single Nucleotide Variation on Transcription Factor Binding Affinity

Dealing With Censorship in Animal Models Involving Large Biological Datasets

Partial Factor Regression Model in a Genetics Context

Determining Critical Features for Protein Crystallization using Regression

Identification of cofactors of NPR1 by exploratory factor analysis of public microarrays

Classifying and Clustering Vegetation in Belize Rainforests using Support Vector Machine

Differential expression analysis for RNA-Seq of single olfactory sensory neurons

Modeling Shear Stress in Schlemm’s Canal

Multi-Model Gene Expression Data Generation Framework with Linear Regression and Mixed-Effect Models

Dynamics of correlated sets of reactions in metabolic networks

Application of Bayesian Sparse Latent Factor Models in Metabolomic Profiling of Peripheral Blood

Note: Please use the final project TeX template and style file to prepare your final project report. Follow the instructions and let me know if you have questions. There will be a poster session on April 24, and reports are due May 1.

This syllabus is *tentative* and will almost surely be superseded. Reload the page for the current version.

- (Jan 16) Modeling biological phenomena:
  - Notes
  - Readings
- (Jan 16) Inference of population structure:
  - Notes
  - Readings
  - Homework due Feb 6: Assignment 1
- (Jan 23) Multiple hypothesis testing
- (Jan 30) eQTL mapping:
  - Notes: Frequentist regression, Bayesian regression
  - Readings
- (Jan 30) Epistasis and nonlinear regression
- (Feb 6) Markov chain Monte Carlo:
  - Notes
  - Readings
  - Homework due Feb 20: Assignment 2
- (Feb 13) Linear mixed models, quantitative genetics, and statistical genetics:
  - Notes
  - Readings
    - [Rausher]
    - [Yang et al., 2014]
    - [Runcie et al., 2013]
  - Homework due Mar 27: Assignment 3
- (Feb 20) Motif finding, mixture models, EM
- (Feb 27) Hidden Markov models and gene finding
- (Mar 6) Reconstructing population histories and coalescent models:
  - Notes
  - Readings
  - Take-home midterm due Mar 27
- (Mar 20) Compositional data, time series models
- (Mar 27) Gene networks, path analysis, graphical models
- (Apr 3) Class cancelled due to machine learning day
- (Apr 10) New functional assays: single-cell expression:
  - Notes
  - Readings
- (Apr 17) New functional assays: 3D structure of the genome; also optimization