STA561 Probabilistic Machine Learning, Spring 2019

STA561 COMPSCI571 ECE682: Probabilistic Machine Learning: Spring 2019

Prof:	Sayan Mukherjee	sayan@stat.duke.edu		OH: M 9:30-11:30	112 Old Chem
TAs:
	Peter Hase	peter.hase@duke.edu	OH: Th 12:30-2:30 OC 203B
	Claire Lin	anqi.lin@duke.edu	OH: Th 10:00-12:00 OC 203B
	Yi Luo	yi.luo4@duke.edu	OH: Th 2:30-4:30 OC 203B
	Ethan McClure	ethan.mcclure@duke.edu	OH: T 11:00-1:00 OC 203B
	Haozhe Wang	haozhe.wang@duke.edu	OH: F 3:00-5:00 OC 025
	Weiyu Yan	weiyu.yan@duke.edu	OH: W 4:00-6:00 OC 203B
	Wei Wen	wei.wen@duke.edu	OH: M 3:00-5:00 OC 203B
Class:	W/F 10:05-11:20am				LSRC B101

Description

Introduction to machine learning techniques. Graphical models, latent variable models, dimensionality reduction techniques, statistical learning, regression, kernel methods, state space models, HMMs, MCMC. Emphasis is on applying these techniques to real data in a variety of application areas.

Academic Resource Center

The Academic Resource Center (ARC) offers free services to all students during their undergraduate careers at Duke. Services include Learning Consultations, Peer Tutoring and Study Groups, ADHD/LD Coaching, Outreach Workshops, and more. Because learning is a process unique to every individual, we work with each student to discover and develop their own academic strategy for success at Duke. Contact the ARC to schedule an appointment. Undergraduates in any year and in any discipline can benefit!

211 Academic Advising Center Building, East Campus (behind Marketplace) • arc.duke.edu • theARC@duke.edu • 919-684-5917

News and information

All students: we will have one poster session, on April 17 from 10:00-12:00, in the Ahmadieh Grand Hall on the 3rd floor of Gross Hall. For an example poster, see the TeX example or the Keynote example. If you are auditing the course, we'd love to have you at the poster session (bring your research groups too!).

Statistics at the level of STA611 (Introduction to Statistical Methods) is encouraged, along with knowledge of linear algebra and multivariate calculus.

The course grade is based on a take-home midterm (15%), a take-home final (35%), a final project (40%), and the poster session for the final project (10%). There will be homework assignments, but they will not be graded; we will post solutions.

There is a Piazza course discussion page. Please direct questions about homework and other matters there. Otherwise, you can email the instructors (TAs and professor). Note that we are more likely to respond to questions on Piazza than to email, and your classmates may respond too, so it is a good place to start.

Final project reports should be written in LaTeX. If you have never used LaTeX before, there are online tutorials, Mac GUIs, and even online compilers that might help you.

The course project will include a project proposal due mid-semester, a four-page write-up of the project at the end of the semester, and an all-campus poster session where you will present your work. This is the most important part of the course; we strongly encourage you to come and discuss project ideas with us early and often throughout the semester. We expect some of these projects to become publications. You are absolutely permitted to use your current rotation or research project as your course project. Examples of previous projects can be found on the projects page.

The programming assignments in this course can be done in any language, but we will be doing simulations in PyTorch.

The course will follow my Lecture Notes, which will be updated as the course proceeds. Other texts and notes that may be useful include:

Kevin Murphy, Machine Learning: A Probabilistic Perspective
Michael Lavine, Introduction to Statistical Thought (an introductory statistics textbook with plenty of R examples, available online)
Chris Bishop, Pattern Recognition and Machine Learning
Daphne Koller and Nir Friedman, Probabilistic Graphical Models
Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning (ESL) (PDF available online)
David J.C. MacKay, Information Theory, Inference, and Learning Algorithms (PDF available online)

Use the final project TeX template and style file to prepare your final project report. Please follow the instructions, and let me know if you have questions.

This syllabus is tentative, and will almost surely be modified. Reload your browser for the current version.

Syllabus

(Jan 11th) Introduction and review: Lecture

Optional: (video) Christopher Bishop Embracing Uncertainty: The New Machine Intelligence
Optional: (video) Sam Roweis Machine Learning, Probability and Graphical Models, Part 1
Optional: (video) Mikaela Keller Basics of probability and statistics for statistical learning
Optional: Alan Turing Computing Machinery and Intelligence

Handout for Lab 1

(Jan 16th) Linear regression, the proceduralist approach: Lecture

Optional: Norman R. Draper and R. Craig van Nostrand Ridge regression
Optional: Elements of Statistical Learning, pages 61-67
Optional: (video) Michael Jordan Bayesian or Frequentist: Which Are You?
Optional: Proof that leave-k-out is unbiased (lecture notes based on A. Luntz and V. Brailovsky, Technicheskaya Kibernetica, 3, 1969)

(Jan 18th) Bayesian motivation for proceduralist approach: Lecture

Optional: (video) Alex Smola Exponential Families
Strongly suggested: Useful properties of the multivariate normal in the notes
Optional*: Persi Diaconis and Donald Ylvisaker Conjugate priors for exponential families

(Jan 23rd) Bayesian linear regression: Lecture

Optional: (video) LISA Short Course: Regression Using Bayesian Statistics in R
Strongly suggested: Review of functional analysis in the notes

HW 1, Data for HW 1, HW 1 solutions

(Jan 25th) Regularized logistic regression: Lecture and Support Vector Machines and optimization notes

Optional: (video) Nate Otten Introduction to conjugate gradient
Optional*: Andrew Stuart and Jochen Voss Matrix analysis and algorithms, pages 75-83

Handout for Lab 2

(Jan 30th) Gaussian process regression: Lecture

Optional: (video) Carl Rasmussen Gaussian processes
Optional: (video) David MacKay Gaussian Process Basics
Optional*: J.L. Doob The elementary Gaussian process

Handout for Lab 3

(Feb 1st) Sparse regression: Lecture

Optional: (video) Daniela Witten and Robert Tibshirani The Lasso
Optional: (video) Trevor Hastie glmnet package

HW 2, Dataset 1 for HW 2, Dataset 2 for HW 2

Practice midterm

(Feb 6th) Mixture models and latent space models I: Lecture

Optional: (video) Victor Lavrenko Expectation maximization
Optional: (slides) David Sontag Expectation maximization
Optional*: Dempster, Laird, Rubin Maximum Likelihood from Incomplete Data via the EM Algorithm

(Feb 8th) Mixture models and latent space models II: Lecture

Optional: (video) Victor Lavrenko Expectation maximization
Optional: (slides) David Sontag Expectation maximization
Optional*: Dempster, Laird, Rubin Maximum Likelihood from Incomplete Data via the EM Algorithm

Handout for Lab 4

Practice midterm

(Feb 13th) Latent Dirichlet Allocation I: Lecture

Optional: (video) Dave Blei Topic models
Optional: (video) John Novembre Methods for the analysis of population structure and admixture
Optional: (slides) Dave Blei Probabilistic Topic Models
Optional: Pritchard, Stephens, Donnelly Inference of Population Structure Using Multilocus Genotype Data
Optional: Blei, Ng, Jordan Latent Dirichlet Allocation

(Feb 15th) Latent Dirichlet Allocation II: Lecture

Optional: (video) Dave Blei Topic models
Optional: (video) John Novembre Methods for the analysis of population structure and admixture
Optional: (slides) Dave Blei Probabilistic Topic Models
Optional: Pritchard, Stephens, Donnelly Inference of Population Structure Using Multilocus Genotype Data
Optional: Blei, Ng, Jordan Latent Dirichlet Allocation

(Feb 14-Feb 20) Take-home midterm

Handout for Lab 5

(Feb 20th) Markov chain Monte Carlo I: Lecture

Optional: (video) Iain Murray MCMC
Optional: (slides) Iain Murray MCMC
Optional: Casella, George Explaining the Gibbs Sampler
Optional*: Levin, Peres, Wilmer Markov Chains and Mixing Times
Optional*: Metropolis, Rosenbluth, Rosenbluth, Teller, Teller Equation of State Calculations by Fast Computing Machines

(Feb 22nd) Markov chain Monte Carlo II: Lecture

Optional: (video) Iain Murray MCMC
Optional: (slides) Iain Murray MCMC
Optional: Casella, George Explaining the Gibbs Sampler
Optional*: Levin, Peres, Wilmer Markov Chains and Mixing Times
Optional*: Metropolis, Rosenbluth, Rosenbluth, Teller, Teller Equation of State Calculations by Fast Computing Machines

Handout for Lab 6

(Feb 27th) Hidden Markov models II: Lecture

Optional: (video) Nando de Freitas HMMs
Optional: (slides) Eric Xing HMMs
Optional: Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

(March 6th) Dimension reduction and embeddings I: Lecture

Optional: (video) Juan Orduz Laplacian eigenmaps
Optional: (video) Yann LeCun Graph embeddings
Optional: (video) Laurens van der Maaten t-SNE
Optional: (video) Konstantinos Perifanos Word embeddings
Optional: Dasgupta and Gupta Johnson-Lindenstrauss Lemma

(March 8th) Dimension reduction and embeddings II: Lecture

Optional: (video) Juan Orduz Laplacian eigenmaps
Optional: (video) Yann LeCun Graph embeddings
Optional: (video) Laurens van der Maaten t-SNE
Optional: (video) Konstantinos Perifanos Word embeddings
Optional: Dasgupta and Gupta Johnson-Lindenstrauss Lemma

(March 20th) Neural networks I: Lecture

Trivedi and Kondor Backprop 1
Trivedi and Kondor Backprop 2
Trivedi and Kondor Convolutional nets
LeCun et al. LeNet
Krizhevsky et al. AlexNet
Leung Lecture on convolutional nets

(March 22nd) Neural networks II: Lecture

Trivedi and Kondor Backprop 1
Trivedi and Kondor Backprop 2
Trivedi and Kondor Convolutional nets
LeCun et al. LeNet
Krizhevsky et al. AlexNet
Leung Lecture on convolutional nets

(March 27th) Variational methods and Generative Adversarial Networks I: Lecture

(March 29th) Variational methods and Generative Adversarial Networks II: Lecture

(April 3rd) Optimization I: Lecture

Optional: (video) Sham Kakade Accelerating Stochastic Gradient Descent
Optional: Leon Bottou Large Scale Machine Learning
Optional: John Canny Stochastic Gradient Descent, Slide 49 is great
Optional: Trivedi and Kondor Stochastic Gradient Descent

(April 5th) Optimization II: Lecture

Optional: (video) Sham Kakade Accelerating Stochastic Gradient Descent
Optional: Leon Bottou Large Scale Machine Learning
Optional: John Canny Stochastic Gradient Descent, Slide 49 is great
Optional: Trivedi and Kondor Stochastic Gradient Descent

(April 10th) Computational differentiation: Lecture

Optional: Baydin, Pearlmutter, Radul, and Siskind Automatic Differentiation
Optional: Maclaurin, Duvenaud, and Adams Reversible learning with exact arithmetic
Optional: Kathura Getting Started with PyTorch
Optional: Altexsoft Machine Learning Libraries

(April 12th) Statistical learning theory I: Lecture

Optional: (video) Leon Bottou and Vladimir Vapnik Foundations of Statistical Learning
Optional: Vladimir Vapnik and Ya. Chervonenkis On the Uniform Convergence of Relative Frequencies of Events to their Probabilities
Optional*: Michel Talagrand The Glivenko-Cantelli Problem

(April 17th) Poster session (10:05-12:00) in Gross Hall

(April 18-28) Take-home final

(April 28th) Final projects due