STA561 COMPSCI571 ECE682: Probabilistic Machine Learning: Spring 2019

Prof: Sayan Mukherjee, sayan@stat.duke.edu, OH: M 9:30-11:30, 112 Old Chem
TAs:
Peter Hase, peter.hase@duke.edu, OH: Th 12:30-2:30, OC 203B
Claire Lin, anqi.lin@duke.edu, OH: Th 10:00-12:00, OC 203B
Yi Luo, yi.luo4@duke.edu, OH: Th 2:30-4:30, OC 203B
Ethan McClure, ethan.mcclure@duke.edu, OH: T 11:00-1:00, OC 203B
Haozhe Wang, haozhe.wang@duke.edu, OH: F 3:00-5:00, OC 025
Weiyu Yan, weiyu.yan@duke.edu, OH: W 4:00-6:00, OC 203B
Wei Wen, wei.wen@duke.edu, OH: M 3:00-5:00, OC 203B
Class: W/F 10:05-11:20am, LSRC B101

Description

Introduction to machine learning techniques. Graphical models, latent variable models, dimensionality reduction techniques, statistical learning, regression, kernel methods, state space models, HMMs, MCMC. Emphasis is on applying these techniques to real data in a variety of application areas.


Academic Resource Center

The Academic Resource Center (ARC) offers free services to all students during their undergraduate careers at Duke. Services include Learning Consultations, Peer Tutoring and Study Groups, ADHD/LD Coaching, Outreach Workshops, and more. Because learning is a process unique to every individual, we work with each student to discover and develop their own academic strategy for success at Duke. Contact the ARC to schedule an appointment. Undergraduates in any year, studying any discipline, can benefit!

211 Academic Advising Center Building, East Campus – behind Marketplace arc.duke.edu • theARC@duke.edu • 919-684-5917


News and information

All students: we will have one poster session, April 17 from 10:00-12:00. The poster session will be in Gross Hall, 3rd floor, Ahmadieh Grand Hall. For a keynote version of an example poster, see the tex example or keynote example. If you are auditing the course, we'd love to have you at the poster session (bring your research groups too!).


Statistics at the level of STA611 (Introduction to Statistical Methods) is encouraged, along with knowledge of linear algebra and multivariate calculus.

The course grade is based on a take-home midterm (15%), a take-home final (35%), a final project (40%), and the poster session for the final project (10%). We will have homeworks, but they will not be graded; we will post solutions.

There is a Piazza course discussion page. Please direct questions about homeworks and other matters to that page. Otherwise, you can email the instructors (TAs and professor). Note that we are more likely to respond to the Piazza questions than to the email, and your classmates may respond too, so that is a good place to start.

The final projects should be written in LaTeX. If you have never used LaTeX before, there are online tutorials, Mac GUIs, and even online compilers that can help you.
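For those who have never seen LaTeX, a minimal report skeleton looks like the following (a generic sketch for illustration; the course template linked below is the one to actually use):

```latex
\documentclass[11pt]{article}
\usepackage{amsmath, amssymb, graphicx}

\title{Final Project Title}
\author{Your Name}
\date{\today}

\begin{document}
\maketitle

\section{Introduction}
Inline math such as $p(\theta \mid x) \propto p(x \mid \theta)\,p(\theta)$
works alongside displayed equations:
\begin{equation}
  \hat{\beta} = (X^\top X)^{-1} X^\top y .
\end{equation}

\end{document}
```

Compiling this with pdflatex (or an online compiler such as Overleaf) produces a titled PDF with numbered sections and equations.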

The course project will include a project proposal due mid-semester, a four page writeup of the project at the end of the semester, and an all-campus poster session where you will present your work. This is the most important part of the course; we strongly encourage you to come and discuss project ideas with us early and often throughout the semester. We expect some of these projects to become publications. You are absolutely permitted to use your current rotation or research project as course projects. Examples of previous projects can be found at projects.

The programming assignments in this course can be done in any language, but we will be doing simulations in PyTorch.
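If you have not used PyTorch before, here is a minimal sketch of the workflow the labs rely on (a hypothetical warm-up, not a course assignment): simulate data, define a model, and fit it by gradient descent with autograd.

```python
# Minimal PyTorch sketch: fit 1-D linear regression by gradient descent.
# This is an illustrative example, not part of any graded assignment.
import torch

torch.manual_seed(0)

# Simulate data from y = 2x + 1 plus Gaussian noise.
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 2.0 * x + 1.0 + 0.1 * torch.randn(100, 1)

model = torch.nn.Linear(1, 1)                       # weight and bias
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)   # mean squared error on the simulated data
    loss.backward()               # autograd fills in parameter gradients
    opt.step()                    # gradient descent update

print(model.weight.item(), model.bias.item())  # should approach 2 and 1
```

The same zero_grad / backward / step loop carries over unchanged to the larger models (logistic regression, neural networks) that appear later in the course.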

The course will follow my lecture notes (these will be updated as the course proceeds): Lecture Notes. Some other texts and notes that may be useful include:

  1. Kevin Murphy, Machine Learning: a probabilistic perspective
  2. Michael Lavine, Introduction to Statistical Thought (an introductory statistical textbook with plenty of R examples, and it's online too)
  3. Chris Bishop, Pattern Recognition and Machine Learning
  4. Daphne Koller & Nir Friedman, Probabilistic Graphical Models
  5. Hastie, Tibshirani, Friedman, Elements of Statistical Learning (ESL) (PDF available online)
  6. David J.C. MacKay Information Theory, Inference, and Learning Algorithms (PDF available online)

The final project TeX template and final project style file should be used in preparation of your final project report. Please follow the instructions and let me know if you have questions.

This syllabus is tentative, and will almost surely be modified. Reload your browser for the current version.


Syllabus

  1. (Jan 11th) Introduction and review: Lecture

  2. Handout for Lab 1

  3. (Jan 16th) Linear regression, the proceduralist approach: Lecture

  4. (Jan 18th) Bayesian motivation for proceduralist approach: Lecture

  5. (Jan 23rd) Bayesian linear regression: Lecture

  6. HW 1, Data for HW 1, HW 1 solutions

  7. (Jan 25th) Regularized logistic regression: Lecture, with notes on Support Vector Machines and optimization
  8. Handout for Lab 2

  9. (Jan 30th) Gaussian process regression: Lecture

  10. Handout for Lab 3

  11. (Feb 1st) Sparse regression: Lecture

  12. HW 2, Dataset 1 for HW 2, Dataset 2 for HW 2


    Practice midterm


  13. (Feb 6th) Mixture models and latent space models I: Lecture

  14. (Feb 8th) Mixture models and latent space models II: Lecture

  15. Handout for Lab 4


  16. (Feb 13th) Latent Dirichlet Allocation I: Lecture

  17. (Feb 15th) Latent Dirichlet Allocation II: Lecture

  18. (Feb 14-Feb 20) Take home midterm

    Handout for Lab 5

  19. (Feb 20th) Markov chain Monte Carlo I: Lecture

  20. (Feb 22nd) Markov chain Monte Carlo II: Lecture

  21. Handout for Lab 6

  22. (Feb 27th) Hidden Markov models II: Lecture

  23. (March 6th) Dimension reduction and embeddings I: Lecture

  24. (March 8th) Dimension reduction and embeddings II: Lecture

  25. (March 20th) Neural networks I: Lecture

  26. (March 22nd) Neural networks II: Lecture

  27. (March 27th) Variational methods and Generative Adversarial Networks I: Lecture

  28. (March 29th) Variational methods and Generative Adversarial Networks II: Lecture

  29. (April 3rd) Optimization I: Lecture

  30. (April 5th) Optimization II: Lecture

  31. (April 10th) Computational differentiation: Lecture

  32. (April 12th) Statistical learning theory I: Lecture

  33. (April 17th) Poster session (10:05-12:00) in Gross Hall

  34. (April 18-28) Take home final

    (April 28th) Final projects due