STA561 COMPSCI571 ECE682: Probabilistic Machine Learning: Spring 2019

Prof: Sayan Mukherjee, OH: M 9:30-11:30, 112 Old Chem
Peter Hase, peter.hase@duke.edu, OH: T 6:00-7:00, OC 203B
Claire Lin, anqi.lin@duke.edu, OH: Th 10:00-12:00, OC 203B
Yi Luo, yi.luo4@duke.edu, OH: Th 2:30-4:30, OC 203B
Ethan McClure, ethan.mcclure@duke.edu, OH: T 11:00-1:00, OC 203B
Haozhe Wang, OH: F 3:00-5:00, OC 025
Weiyu Yan, weiyu.yan@duke.edu, OH: W 4:00-6:00, OC 203B
Wei Wen, wei.wen@duke.edu, OH: M 3:00-5:00, OC 203B
Class: W/F 10:05-11:20am, LSRC B101


Introduction to machine learning techniques. Graphical models, latent variable models, dimensionality reduction techniques, statistical learning, regression, kernel methods, state space models, HMMs, MCMC. Emphasis is on applying these techniques to real data in a variety of application areas.

Academic Resource Center

The Academic Resource Center (ARC) offers free services to all students during their undergraduate careers at Duke. Services include Learning Consultations, Peer Tutoring and Study Groups, ADHD/LD Coaching, Outreach Workshops, and more. Because learning is a process unique to every individual, we work with each student to discover and develop their own academic strategy for success at Duke. Contact the ARC to schedule an appointment. Undergraduates in any year, studying any discipline, can benefit!

211 Academic Advising Center Building, East Campus (behind Marketplace) • 919-684-5917

News and information

All students: we will have one poster session, April 17 from 10:00-12:00. The poster session will be in Gross Hall, 3rd floor, Ahmadieh Grand Hall. For an example poster, see the tex example or keynote example. If you are auditing the course, we'd love to have you at the poster session (bring your research groups too!).

Statistics at the level of STA611 (Introduction to Statistical Methods) is encouraged, along with knowledge of linear algebra and multivariate calculus.

The course grade is based on a take-home midterm (15%), a take-home final (35%), a final project (40%), and the poster session for the final project (10%). We will assign homeworks, but they will not be graded; we will post solutions.

There is a Piazza course discussion page. Please direct questions about homeworks and other matters to that page. Otherwise, you can email the instructors (TAs and professor). Note that we are more likely to respond to the Piazza questions than to the email, and your classmates may respond too, so that is a good place to start.

The final projects should be written in LaTeX. If you have never used LaTeX before, there are online tutorials, Mac GUIs, and even online compilers that might help you.
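If you have never written a LaTeX document, a minimal report skeleton looks like the following. This is only an illustrative sketch (the class, packages, and section names here are generic placeholders, not the course's required project template):

```latex
\documentclass[11pt]{article}
\usepackage{amsmath}   % display equations
\usepackage{graphicx}  % including figures

\title{Final Project Report}
\author{Your Name}

\begin{document}
\maketitle

\section{Introduction}
Math typesets inline, e.g. $y = X\beta + \epsilon$ with
$\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, or in display mode:
\begin{equation}
  \hat{\beta} = (X^\top X)^{-1} X^\top y .
\end{equation}

\end{document}
```

Compiling this file with `pdflatex` (or in an online compiler such as Overleaf) produces a one-page PDF; the course's own template and style file will replace the preamble shown here.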

The course project will include a project proposal due mid-semester, a four page writeup of the project at the end of the semester, and an all-campus poster session where you will present your work. This is the most important part of the course; we strongly encourage you to come and discuss project ideas with us early and often throughout the semester. We expect some of these projects to become publications. You are absolutely permitted to use your current rotation or research project as course projects. Examples of previous projects can be found at projects.

The programming assignments in this course can be done in any language, but we will be doing simulations in PyTorch.

The course will follow my lecture notes (updated as the course proceeds): Lecture Notes. Some other texts and notes that may be useful include:

  1. Kevin Murphy, Machine Learning: a probabilistic perspective
  2. Michael Lavine, Introduction to Statistical Thought (an introductory statistical textbook with plenty of R examples, and it's online too)
  3. Chris Bishop, Pattern Recognition and Machine Learning
  4. Daphne Koller & Nir Friedman, Probabilistic Graphical Models
  5. Hastie, Tibshirani, Friedman, Elements of Statistical Learning (ESL) (PDF available online)
  6. David J.C. MacKay, Information Theory, Inference, and Learning Algorithms (PDF available online)

The final project TeX template and final project style file should be used in preparation of your final project report. Please follow the instructions and let me know if you have questions.

This syllabus is tentative, and will almost surely be modified. Reload your browser for the current version.


  1. (Jan 11th) Introduction and review: Lecture

  2. Handout for Lab 1

  3. (Jan 16th) Linear regression, the proceduralist approach: Lecture

  4. (Jan 18th) Bayesian motivation for proceduralist approach: Lecture

  5. (Jan 20th) Bayesian linear regression: Lecture

  6. HW 1 (due Feb 1); Data for HW 1

  7. (Jan 23rd) Regularized logistic regression: Lecture and optimization notes
  8. (Jan 25th) Gaussian process regression: Lecture

  9. (Jan 30th) Sparse regression: Lecture

  10. (Feb 1st) Mixture models and latent space models I: Lecture

  11. (Feb 6th) Mixture models and latent space models II: Lecture

  12. (Feb 8th) Latent Dirichlet Allocation I: Lecture

  13. (Feb 13th) Latent Dirichlet Allocation II: Lecture

  14. (Feb 14-Feb 20) Take home midterm

  15. (Feb 15th) Markov chain Monte Carlo I: Lecture

  16. (Feb 20th) Markov chain Monte Carlo II: Lecture

  17. (Feb 22nd) Hidden Markov models I: Lecture

  18. (Feb 27th) Hidden Markov models II: Lecture

  19. (March 6th) Optimization I: Lecture

  20. (March 8th) Optimization II: Lecture

  21. (March 20th) Neural networks I: Lecture

  22. (March 22nd) Neural networks II: Lecture

  23. (March 25th) Variational methods and Generative Adversarial Networks I: Lecture

  24. (March 27th) Variational methods and Generative Adversarial Networks II: Lecture

  25. (March 29th) Dimension reduction and embeddings I: Lecture

  26. (April 3rd) Dimension reduction and embeddings II: Lecture

  27. (April 5th) Statistical learning theory I: Lecture
  28. (April 10th) Statistical learning theory II: Lecture
  29. (April 12th) TBD

  30. (April 17th) Poster session (10:05-12:00)

  31. (April 18-21) Take home final

    (April 28th) Final projects due