STA613/CBB540: Statistical methods in computational biology: Spring 2016
Prof: Sayan Mukherjee | sayan@stat.duke.edu | OH: Wednesday 2:15-3:15pm, 112 Old Chem
TA: Ryan Muraglia | OH: Wednesday 10:00am-12:00pm, SCC in Old Chem
Class: Tu/Thu 3:05-4:20pm | 025 Old Chem
Description
This course is based on case studies of statistical approaches to problems in computational biology. We will learn about statistical modeling in computational biology by formulating biological questions and repeating the following steps:
- formalize the question as a probabilistic model (typically via a likelihood);
- clarify the interpretation of model parameters and the model assumptions;
- develop methods for parameter estimation;
- quantify uncertainty in parameter estimation;
- interpret the parameters to address the biological question.
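As a small illustration of these five steps (not one of the course case studies), here is a minimal Python sketch that estimates an allele frequency from genotype counts. It assumes Hardy-Weinberg equilibrium and independent individuals. The genotype counts and the function names `hwe_log_likelihood`, `hwe_mle`, and `grid_mle` are hypothetical. Uncertainty is summarized with a Wald interval from the Fisher information, which is only one of several ways to do it.

```python
import math

def hwe_log_likelihood(p, n_AA, n_Aa, n_aa):
    """Step 1: log-likelihood of allele frequency p for genotype counts,
    ASSUMING Hardy-Weinberg equilibrium: P(AA)=p^2, P(Aa)=2p(1-p), P(aa)=(1-p)^2.
    The constant multinomial coefficient is dropped."""
    return (n_AA * 2 * math.log(p)
            + n_Aa * (math.log(2) + math.log(p) + math.log(1 - p))
            + n_aa * 2 * math.log(1 - p))

def hwe_mle(n_AA, n_Aa, n_aa):
    """Steps 3-4: closed-form MLE of p, its standard error, and a 95% Wald interval.
    Step 2: p is interpreted as the frequency of allele A in the population."""
    n = n_AA + n_Aa + n_aa
    p_hat = (2 * n_AA + n_Aa) / (2 * n)          # count of A alleles / 2n alleles
    # Fisher information for p from 2n allele draws is 2n / (p(1-p)).
    se = math.sqrt(p_hat * (1 - p_hat) / (2 * n))
    return p_hat, se, (p_hat - 1.96 * se, p_hat + 1.96 * se)

def grid_mle(n_AA, n_Aa, n_aa, steps=9999):
    """Numerical check of step 3: maximize the log-likelihood over a grid on (0, 1)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda p: hwe_log_likelihood(p, n_AA, n_Aa, n_aa))

if __name__ == "__main__":
    counts = (30, 50, 20)  # hypothetical counts of AA, Aa, aa individuals
    p_hat, se, ci = hwe_mle(*counts)
    print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
    print(f"grid maximizer = {grid_mle(*counts):.3f}")
```

For the hypothetical counts (30, 50, 20), the estimate is p̂ = (2·30 + 50)/200 = 0.55 with a standard error of about 0.035, and the grid search agrees with the closed form. Step 5, interpretation, then depends on whether the model assumptions (random mating, no population structure) are plausible for the sample.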
Statistics at the level of STA611 (Introduction to Statistical Methods) is expected, along with knowledge of linear algebra and multivariate calculus.
Course grade is based on a midterm (30%), a final project (50%), and biweekly homeworks (20%). The project can be either a reanalysis of the data in one of the case studies covered during the semester or a project of interest to the student (rotation projects are great). Homeworks are due to me at the beginning of class, exactly two weeks after they are handed out. Late homeworks will not be accepted, with one exception: you may turn in one homework late (by at most one week) during the course. Students may (and should) discuss the homework assignments collaboratively; however, I expect each student to program and write up their own homework solutions. Please write the names of the students you discussed the homework assignment with at the top of your solutions.
A second set of references, for R, will also be useful. First, you can download R from the CRAN website. There are many resources, such as RStudio, that can help with the programming interface, and tutorials on R are easy to find online. If you are getting bored with the standard graphics package, I really like using ggplot2 for beautiful graphics and figures. Finally, you can integrate R code and output with plain text using knitr, but that might be going a bit too far for beginners.
We will have daily readings for the course, but there is no formal text for this class. However, some texts and notes that may be useful include:
- Michael Lavine, Introduction to Statistical Thought (an introductory statistics textbook with plenty of R examples; it is also available online)
- Ewens and Grant, Statistical Methods in Bioinformatics
- Cristianini and Hahn, Introduction to Computational Genomics
- Sayan Mukherjee, Statistical methods for computational biology
- Kevin Murphy, Machine Learning: A Probabilistic Perspective
- Durbin, Eddy, Krogh, and Mitchison, Biological Sequence Analysis
- Joseph Felsenstein, Inferring Phylogenies
This syllabus is tentative and will almost surely be superseded. Reload the page in your browser for the current version.
A link to a list of possible projects will appear soon.
Note: The final project TeX template and final project style file should be used to prepare your final project report. Please follow the instructions and let me know if you have questions. Presentations will be on April 10th and 15th; the reports will be due on April 18th (Friday).
Lecture notes | Topic | Readings | Homework
Jan 14 | Modelling biological phenomena | [Pearson, 1893] [Turing, 1952] |
Jan 19 | Inference of population structure I | [Pritchard et al., 2000] [Stinging commentary to Nicholas Wade] | HW 1 due Jan 28
Jan 21 | eQTL mapping | [Stranger et al., 2007] |
Jan 26-28 | Hypothesis testing | [Storey et al., 2003] [Stephens, 2014] [Subramanian et al., 2005] |
Feb 2 | No class | |
Feb 4 | Markov chain Monte Carlo | [Rosenthal] | HW 2 due Feb 22 (genotypes, expression)
Feb 9, 11 | Linear mixed models | [Runcie, 2013] [Yang et al., 2014] |
Feb 16 | Epigenomics | [Lea et al., 2015] |
Feb 18 | Epistasis | [Sharp et al., 2016] [Crawford et al., 2015] |
Feb 23-25, Mar 1 | Motif finding and EM | [Bailey and Elkan, 1994] [Dempster et al., 1977] |
Mar 3 (HMM notes, Coalescent notes) | Inference of population histories and HMMs | [Li and Durbin, 2011] |
Mar 8 | Mixture models and EM | [Bailey et al., 1995] |
Mar 10 | Hidden Markov models | [Burge and Karlin, 1997] | Midterm due Mar 19
Mar 22 | Review and proof of EM | Proof of EM |
Mar 24 | Gene network models | [Schäfer and Strimmer, 2005] [Wright, 1918] |
Mar 29 (more notes) | HMMs | |
Mar 31 (more notes) | Morphometrics | [Bookstein, 1996] [Milnor, 2010] |
Apr 5 | Microbiomes | |
Apr 7 | Species models and the Enigma code | [I. J. Good, 1953] [I. J. Good, 1979] |
Apr 12 | Open | |
Apr 14 | Open | |
Apr 19 | Final project presentations | |
Apr 21 | Final project presentations | |