| Prof | Email | Office Hours | Location |
| --- | --- | --- | --- |
| Sayan Mukherjee | sayan@stat.duke.edu | M 9:30-11:30 | 112 Old Chem |

| TA | Email | Office Hours | Location |
| --- | --- | --- | --- |
| Peter Hase | peter.hase@duke.edu | Th 12:30-2:30 | OC 203B |
| Claire Lin | anqi.lin@duke.edu | Th 10:00-12:00 | OC 203B |
| Yi Luo | yi.luo4@duke.edu | Th 2:30-4:30 | OC 203B |
| Ethan McClure | ethan.mcclure@duke.edu | T 11:00-1:00 | OC 203B |
| Haozhe Wang | haozhe.wang@duke.edu | F 3:00-5:00 | OC 025 |
| Weiyu Yan | weiyu.yan@duke.edu | W 4:00-6:00 | OC 203B |
| Wei Wen | wei.wen@duke.edu | M 3:00-5:00 | OC 203B |

Class: W/F 10:05-11:20am, LSRC B101

The Academic Resource Center (ARC) offers free services to all
students during their undergraduate careers at Duke. Services include
Learning Consultations, Peer Tutoring and Study Groups, ADHD/LD
Coaching, Outreach Workshops, and more. Because learning is a process
unique to every individual, we work with each student to discover and
develop their own academic strategy for success at Duke. Contact the
ARC to schedule an appointment. Undergraduates in any year, studying
any discipline, can benefit!

211 Academic Advising Center Building, East Campus – behind
Marketplace
arc.duke.edu • theARC@duke.edu

All students: we will have one poster session, April 17 from 10:00-12:00, in Gross Hall, 3rd floor, Ahmadieh Grand Hall. For an example poster, see the tex example or the keynote example. If you are auditing the course, we'd love to have you at the poster session (bring your research groups too!).

Statistics at the level of STA611 (Introduction to Statistical Methods) is encouraged, along with knowledge of linear algebra and multivariate calculus.

The course grade is based on a take-home midterm (15%), a take-home final (35%), a final project (40%), and the poster session for the final project (10%). Homework will be assigned but not graded; we will post solutions.

There is a Piazza course discussion page. Please direct questions about homework and other course matters to that page. Otherwise, you can email the instructors (TAs and professor). Note that we are more likely to respond on Piazza than to email, and your classmates may respond too, so that is a good place to start.

The final projects should be written in LaTeX. If you have never used LaTeX before, there are online tutorials, Mac GUIs, and even online compilers that might help you.
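If you are starting from scratch, a minimal LaTeX document has the shape below (the file name `project.tex` is just an illustration; the course template provides the real preamble and style):

```latex
\documentclass{article}
\usepackage{amsmath}  % standard math environments

\title{Final Project}
\author{Your Name}

\begin{document}
\maketitle

We fit a linear model $y = X\beta + \varepsilon$ with
$\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$.

\end{document}
```

Compiling with `pdflatex project.tex` (or an online compiler such as Overleaf) produces the PDF.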

The course project will include a project proposal due mid-semester, a four page writeup of the project at the end of the semester, and an all-campus poster session where you will present your work. This is the most important part of the course; we strongly encourage you to come and discuss project ideas with us early and often throughout the semester. We expect some of these projects to become publications. You are absolutely permitted to use your current rotation or research project as course projects. Examples of previous projects can be found at projects.

The programming assignments in this course can be done in any language, but we will be doing simulations in PyTorch.
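If you have not used PyTorch, note that the assignments mostly assume comfort with gradient-based simulation. As a language-agnostic warm-up (plain Python, no PyTorch, all names ours), here is full-batch gradient descent on a least-squares objective; a PyTorch version would replace the hand-written gradients with autograd:

```python
import random

random.seed(0)

# Synthetic data from y = 2x + 1 plus small Gaussian noise.
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.05) for x in xs]

w, b = 0.0, 0.0  # initial parameters
lr = 0.1         # learning rate
n = len(xs)

for _ in range(500):
    # Gradients of mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should be close to the true values 2 and 1
```

In PyTorch the same loop would mark `w` and `b` with `requires_grad=True`, compute the loss, and call `loss.backward()` instead of deriving the gradients by hand.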

The course will follow my lecture notes (updated as the course proceeds): Lecture Notes. Some other texts and notes that may be useful include:

- Kevin Murphy, Machine Learning: a probabilistic perspective
- Michael Lavine, Introduction to Statistical Thought (an introductory statistical textbook with plenty of R examples, and it's online too)
- Chris Bishop, Pattern Recognition and Machine Learning
- Daphne Koller & Nir Friedman, Probabilistic Graphical Models
- Hastie, Tibshirani, Friedman, Elements of Statistical Learning (ESL) (PDF available online)
- David J.C. MacKay Information Theory, Inference, and Learning Algorithms (PDF available online)

The final project TeX template and final project style file should be used in preparation of your final project report. Please follow the instructions and let me know if you have questions.

This syllabus is *tentative*, and will almost surely be modified. Reload your browser for the current version.

- (Jan 11th) Introduction and review: Lecture
- Optional: (video) Christopher Bishop Embracing Uncertainty: The New Machine Intelligence
- Optional: (video) Sam Roweis Machine Learning, Probability and Graphical Models, Part 1
- Optional: (video) Mikaela Keller Basics of probability and statistics for statistical learning
- Optional: Alan Turing Computing Machinery and Intelligence
- (Jan 16th) Linear regression, the proceduralist approach: Lecture
- Optional: Norman R. Draper and R. Craig van Nostrand Ridge regression
- Optional: Elements of Statistical Learning Pages 61-67
- Optional: (video) Michael Jordan Bayesian or Frequentist: Which Are You?
- Optional: Proof that leave-k-out is unbiased Lecture notes based on: A. Luntz and V. Brailovsky. Technicheskaya Kibernetica, 3, 1969.
- (Jan 18th) Bayesian motivation for proceduralist approach: Lecture
- Optional: (video) Alex Smola Exponential Families
- Strongly suggested: Useful properties of the multivariate normal in notes
- Optional*: Persi Diaconis and Donald Ylvisaker Conjugate priors for exponential families
- (Jan 23rd) Bayesian linear regression: Lecture
- Optional: (video) LISA Short Course: Regression Using Bayesian Statistics in R
- Strongly suggested: Review of Functional analysis in notes
- (Jan 25th) Regularized logistic regression: Lecture, Support Vector Machines, and optimization notes
- Optional: (video) Nate Otten Introduction to conjugate gradient
- Optional*: Andrew Stuart and Jochen Voss Matrix analysis and algorithms pg. 75--83

- (Jan 30th) Gaussian process regression: Lecture
  - Optional: (video) Carl Rasmussen Gaussian processes
- Optional: (video) David MacKay Gaussian Process Basics
- Optional*: J.L. Doob The elementary Gaussian process
- (Feb 1st) Sparse regression: Lecture
- Optional: (video) Daniela Witten and Robert Tibshirani The Lasso
- Optional: (video) Trevor Hastie glmnet package
- (Feb 6th) Mixture models and latent space models I: Lecture
- Optional: (video) Victor Lavrenko Expectation maximization
- Optional: (slides) David Sontag Expectation maximization
- Optional*: Dempster, Laird, Rubin Maximum Likelihood from Incomplete Data via the EM Algorithm
- (Feb 8th) Mixture models and latent space models II: Lecture
- Optional: (video) Victor Lavrenko Expectation maximization
- Optional: (slides) David Sontag Expectation maximization
- Optional*: Dempster, Laird, Rubin Maximum Likelihood from Incomplete Data via the EM Algorithm
- (Feb 13th) Latent Dirichlet Allocation I: Lecture
- Optional: (video) Dave Blei Topic models
- Optional: (video) John Novembre Methods for the analysis of population structure and admixture
- Optional: (slides) Dave Blei Probabilistic Topic Models
- Optional: Pritchard, Stephens, Donnelly Inference of Population Structure Using Multilocus Genotype Data
- Optional: Blei, Ng, Jordan Latent Dirichlet Allocation
- (Feb 15th) Latent Dirichlet Allocation II: Lecture
- Optional: (video) Dave Blei Topic models
- Optional: (video) John Novembre Methods for the analysis of population structure and admixture
- Optional: (slides) Dave Blei Probabilistic Topic Models
- Optional: Pritchard, Stephens, Donnelly Inference of Population Structure Using Multilocus Genotype Data
- Optional: Blei, Ng, Jordan Latent Dirichlet Allocation
- (Feb 20th) Markov chain Monte Carlo I: Lecture
- Optional: (video) Iain Murray MCMC
- Optional: (slides) Iain Murray MCMC
- Optional: Casella, George Explaining the Gibbs Sampler
- Optional*: Levin, Peres, Wilmer Markov Chains and Mixing times
- Optional*: Metropolis, Rosenbluth, Rosenbluth, Teller, Teller Equation of State Calculations by Fast Computing Machines
- (Feb 22nd) Markov chain Monte Carlo II: Lecture
- Optional: (video) Iain Murray MCMC
- Optional: (slides) Iain Murray MCMC
- Optional: Casella, George Explaining the Gibbs Sampler
- Optional*: Levin, Peres, Wilmer Markov Chains and Mixing times
- Optional*: Metropolis, Rosenbluth, Rosenbluth, Teller, Teller Equation of State Calculations by Fast Computing Machines
- (Feb 27th) Hidden Markov models II: Lecture
- Optional: (video) Nando de Freitas HMMs
- Optional: (slides) Eric Xing HMMs
  - Optional: Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- (March 6th) Dimension reduction and embeddings I: Lecture
- Optional: (video) Juan Orduz Laplacian eigenmaps
  - Optional: (video) Yann LeCun Graph embeddings
- Optional: (video) Laurens van der Maaten t-SNE
- Optional: (video) Konstantinos Perifanos Word embeddings
- Optional: Dasgupta and Gupta Johnson Lindenstrauss Lemma
- (March 8th) Dimension reduction and embeddings II: Lecture
- Optional: (video) Juan Orduz Laplacian eigenmaps
  - Optional: (video) Yann LeCun Graph embeddings
- Optional: (video) Laurens van der Maaten t-SNE
- Optional: (video) Konstantinos Perifanos Word embeddings
- Optional: Dasgupta and Gupta Johnson Lindenstrauss Lemma
- (March 20th) Neural networks I: Lecture
- Trivedi and Kondor Backprop 1
- Trivedi and Kondor Backprop 2
- Trivedi and Kondor Convolutional nets
- Le Cun et al LeNet
- Krizhevsky et al AlexNet
- Leung Lecture on convolutional nets
- (March 22nd) Neural networks II: Lecture
- Trivedi and Kondor Backprop 1
- Trivedi and Kondor Backprop 2
- Trivedi and Kondor Convolutional nets
- Le Cun et al LeNet
- Krizhevsky et al AlexNet
- Leung Lecture on convolutional nets
- (March 27th) Variational methods and Generative Adversarial Networks I: Lecture
- (March 29th) Variational methods and Generative Adversarial Networks II: Lecture
- (April 3rd) Optimization I: Lecture
- Optional: (video) Sham Kakade Accelerating Stochastic Gradient Descent
- Optional: Leon Bottou Large Scale Machine Learning
- Optional: John Canny Stochastic Gradient Descent, Slide 49 is great
  - Optional: Trivedi and Kondor Stochastic Gradient Descent
- (April 5th) Optimization II: Lecture
- Optional: (video) Sham Kakade Accelerating Stochastic Gradient Descent
- Optional: Leon Bottou Large Scale Machine Learning
- Optional: John Canny Stochastic Gradient Descent, Slide 49 is great
  - Optional: Trivedi and Kondor Stochastic Gradient Descent
- (April 10th) Computational differentiation: Lecture
- Optional: Baydin, Pearlmutter, Radul, and Siskind Automatic Differentiation
- Optional: Maclaurin, Duvenaud, and Adams Reversible learning with exact arithmetic
- Optional: Kathura Getting Started with PyTorch
- Optional: Altexsoft Machine Learning Libraries
- (April 12th) Statistical learning theory I: Lecture
- Optional: (video) Leon Bottou and Vladimir Vapnik Foundations of Statistical Learning
  - Optional: Vladimir Vapnik and Ya. Chervonenkis On the Uniform Convergence of Relative Frequencies of Events to their Probabilities
- Optional*: Michel Talagrand The Glivenko-Cantelli Problem

- (April 17th) Poster session (10:05-12:00) in Gross Hall