STA561 COMPSCI571: Probabilistic Machine Learning: Fall 2015

Prof:	Sayan Mukherjee	sayan@stat.duke.edu		OH: Mon 10-12	112 Old Chem
TAs:
	Abhishek Dubey	abhisdub@cs.duke.edu	OH: Wednesday 10-11am LSRC D309
	Yuhao Liang	yuhao.liang@duke.edu	OH: Monday 7:00-9:00pm Old Chem 211a
	Xinyi Li	xinyi.li@duke.edu	OH:
Class:	M/W 8:30-9:45am				Social Sciences 136

Description

Introduction to machine learning techniques. Graphical models, latent variable models, dimensionality reduction techniques, statistical learning, regression, kernel methods, state space models, HMMs, MCMC. Emphasis is on applying these techniques to real data in a variety of application areas.

News and information

All students: we will have one poster session, Dec 4. The poster session will be in Gross Hall 3rd floor East Meeting Space. For an example poster, see the TeX example or the Keynote example. If you are auditing the course, we'd love to have you at the poster session (bring your research groups too!).

Statistics at the level of STA611 (Introduction to Statistical Methods) is encouraged, along with knowledge of linear algebra and multivariate calculus.

Course grade is based on an in-class midterm (15%), an in-class final (35%), a final project (40%), and the poster session for the final project (10%). We will have homeworks, but they will not be graded; we will post solutions.

There is a Piazza course discussion page. Please direct questions about homeworks and other matters to that page. Otherwise, you can email the instructors (TAs and professor) at sta561-ta@duke.edu. Note that we are more likely to respond to the Piazza questions than to the email, and your classmates may respond too, so that is a good place to start.

The final projects should be in LaTeX. If you have never used LaTeX before, there are online tutorials, Mac GUIs, and even online compilers that might help you.

The course project will include a project proposal due mid-semester, a four-page writeup of the project at the end of the semester, and an all-campus poster session where you will present your work. This is the most important part of the course; we strongly encourage you to come and discuss project ideas with us early and often throughout the semester. We expect some of these projects to become publications. You are absolutely permitted to use your current rotation or research project as your course project. Examples of last year's projects.

A second set of references for R may be useful. First, you can download R from the CRAN website. There are many resources, such as RStudio, that can help with the programming interface, and tutorials on R are all over the place. If you are getting bored with the standard graphics package, I really like using ggplot2 for beautiful graphics and figures. Finally, you can integrate R code and output with plain text using knitr, but that might be going a bit too far if you are a beginner.

The course will follow my lecture notes (these will be updated as the course proceeds): Lecture Notes. Some other texts and notes that may be useful include:

Kevin Murphy, Machine Learning: a probabilistic perspective
Michael Lavine, Introduction to Statistical Thought (an introductory statistical textbook with plenty of R examples, and it's online too)
Chris Bishop, Pattern Recognition and Machine Learning
Daphne Koller & Nir Friedman, Probabilistic Graphical Models
Hastie, Tibshirani, Friedman, Elements of Statistical Learning (ESL) (PDF available online)
David J.C. MacKay, Information Theory, Inference, and Learning Algorithms (PDF available online)

The final project TeX template and final project style file should be used in preparation of your final project report. Please follow the instructions and let me know if you have questions. We will have a poster session where you present your research project in lieu of a final exam.

This syllabus is tentative, and will almost surely be modified. Reload your browser for the current version.

This year's final projects

Predicting sales of Rossmann's stores
Gentrification Index Using Yelp Data
Risk estimates of tree mortality across species using Bayesian hierarchical models
Classification of TV Channels
Prediction of Coupon Purchasing Behavior
Classification of Cardiac Tissue Regions Based on Motion Profile in Ultrasound Images
Spectral Clustering of Chinese Herbal Medicine Network
Use of Machine Learning in Predicting Bankruptcy
Distinguishing malignant from benign breast tumors
Detection of Solar Panels from Satellite Imagery
Yelp Customer Review Bias Analysis through Linear Mixed Effect Models with Natural Language Sentiment Polarity Scores
Testing the CAPM Theory for German CDS Based on a Model with GARCH-type Volatilities and SSAEPD Errors
Bayesian Non-Parametrics and Dirichlet Process Clustering Techniques
Text Analysis of News Articles (Building a Protest Dataset through Machine Learning)
Information Popularity and Diffusion Size Prediction in Online Social Networks
Cascading Classifier for Face Detection
What's Cooking? Predicting Cuisines from Recipe Ingredients
Analysing Senator Community Structure from Roll Call Data
Handwritten Digits Recognition
A Neural Algorithm for Artistic Style
Machine Learning with Python
Predictive Modeling of Bank Marketing for Term Deposit
Air Pollution Distribution Analysis for Beijing Haze
Beyond SVD
Legislation approval ratings prediction via vote correlation
Categorical Prediction of Song Popularity Using Topological Data Analysis
Movie Recommender System
The Effect of Racial Diversity on High School Graduation Rates
Comparison of feature selection methods in modeling resting metabolic rate
Randomization as regularization
Designing an optimum traffic signal system using reinforcement learning
Topic modeling for community analysis and range estimation
Classifying Soccer Matches in the English Premier League
Spectral algorithms and tensor methods for learning in POMDPs
World Cup Recap
Dimension Reduction Methods on Handwritten Digits Recognition
ML methods for Drosophila Dorsal closure
The Animal Model for Censored Traits
Spectral Clustering and Community Detection in Labeled Graph
Cluster Analysis of Endogenous Taxi Driver Schedule Patterns

Syllabus

(August 24th) Introduction and review: Lecture 1 in notes

Optional: (video) Christopher Bishop Embracing Uncertainty: The New Machine Intelligence
Optional: (video) Sam Roweis Machine Learning, Probability and Graphical Models, Part 1
Optional: (video) Mikaela Keller Basics of probability and statistics for statistical learning
Optional: Alan Turing Computing Machinery and Intelligence
Homework: Due Sep. 7 Assignment 1 Solution 1
- Poisson problem HW1.txt
- Gene expression problem test.txt train.txt samples.txt

(August 26th) No class

Optional: (video) Michael Jordan Bayesian or Frequentist: Which Are You?

(August 31st) Linear regression, the proceduralist approach: Lecture 2 in notes

Optional: Norman R. Draper and R. Craig van Nostrand Ridge regression
Optional: Elements of Statistical Learning Pages 61-67
Optional: Proof that leave-k-out is unbiased Lecture notes based on: A. Luntz and V. Brailovsky. Technicheskaya Kibernetica, 3, 1969.

(September 2nd) Bayesian motivation for proceduralist approach: Lecture 3 in notes

Optional: (video) Alex Smola Exponential Families
Strongly suggested: Useful properties of the multivariate normal in notes
Optional*: Persi Diaconis and Donald Ylvisaker Conjugate priors for exponential families

(September 7th) Bayesian linear regression: Lecture 4 in notes

Optional: (video) LISA Short Course: Regression Using Bayesian Statistics in R
Strongly suggested: Review of Functional analysis in notes
Homework: Due Sep. 23 Assignment 2 Solution 2

(September 9th) Reproducing kernel Hilbert spaces: Lecture 5 in notes

Optional: (video) Partha Niyogi Introduction to Kernel Methods
Optional*: Nachman Aronszajn Theory of Reproducing Kernels

(September 14th) Nonlinear regression: Lecture 6 in notes

Optional: (video) Partha Niyogi Introduction to Kernel Methods
Optional: (video) John Shawe-Taylor Kernel Methods and Support Vector Machines
Strongly suggested: Review of convex optimization in notes
Strongly suggested if you don't know Lagrange Multipliers: Lagrange multipliers and KKT conditions

(September 16th, 21st) Support Vector Machines: Lecture 7 in notes

Optional: (video) Lieven Vandenberghe Convex optimization
Optional: (video) Stephen Boyd Domain Specific Languages for Convex Optimization

(September 23rd) Regularized logistic regression: Lecture 8 in notes

Optional: (video) Nate Otten Introduction to conjugate gradient
Optional*: Andrew Stuart and Jochen Voss Matrix analysis and algorithms pg. 75--83

(September 28th) Gaussian process regression: Lecture 9 in notes

Optional: (video) Carl Rasmussen Gaussian processes
Optional: (video) David MacKay Gaussian Process Basics
Optional*: J.L. Doob The elementary Gaussian process

(September 30th) Sparse regression: Lecture 10 in notes

Optional: (video) Daniela Witten and Robert Tibshirani The Lasso
Optional: (video) Trevor Hastie glmnet package
Homework: Due Oct. 7 Assignment 3

(October 5th) The boosting hypothesis and Adaboost: Lecture 11 in notes

Optional: (video) Rob Schapire Theory and Applications of Boosting
Optional: Leslie Valiant A Theory of the Learnable
Optional: Rob Schapire The Strength of Weak Learnability

(October 7th) In-class midterm

(October 14th, 19th) Statistical learning theory: Lecture 12 in notes

Optional: (video) Leon Bottou and Vladimir Vapnik Foundations of Statistical Learning
Optional: Vladimir Vapnik and Ya. Chervonenkis On the Uniform Convergence of Relative Frequencies of Events to their Probabilities
Optional*: Michel Talagrand The Glivenko-Cantelli Problem

(October 19th, 21st) Mixture models and latent space models: Lecture 13 in notes

Optional: (video) Victor Lavrenko Expectation maximization
Optional: (slides) David Sontag Expectation maximization
Optional*: Dempster, Laird, Rubin Maximum Likelihood from Incomplete Data via the EM Algorithm

(October 26th, 28th) Latent Dirichlet Allocation: Lecture 14 in notes

Optional: (video) Dave Blei Topic models
Optional: (video) John Novembre Methods for the analysis of population structure and admixture
Optional: (slides) Dave Blei Probabilistic Topic Models
Optional: Pritchard, Stephens, Donnelly Inference of Population Structure Using Multilocus Genotype Data
Optional: Blei, Ng, Jordan Latent Dirichlet Allocation

(November 2nd, 4th) Markov chain Monte Carlo: Lecture 15 in notes

Optional: (video) Iain Murray MCMC
Optional: (slides) Iain Murray MCMC
Optional: Casella, George Explaining the Gibbs Sampler
Optional*: Levin, Peres, Wilmer Markov Chains and Mixing times
Optional*: Metropolis, Rosenbluth, Rosenbluth, Teller, Teller Equation of State Calculations by Fast Computing Machines

(November 9th, 11th) Hidden Markov models: Lecture 16 in notes

Optional: (video) Nando de Freitas HMMs
Optional: (slides) Eric Xing HMMs
Optional: Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

(November 23rd) In-class final

(December 4th) Poster session (2pm)

(December 7th) Final projects due