STA561 COMPSCI571: Probabilistic Machine Learning: Fall 2015
| Prof: | Sayan Mukherjee | sayan@stat.duke.edu | OH: Mon 10-12, 112 Old Chem |
| TAs: | Abhishek Dubey | abhisdub@cs.duke.edu | OH: Wednesday 10-11am, LSRC D309 |
| | Yuhao Liang | yuhao.liang@duke.edu | OH: Monday 7:00-9:00pm, Old Chem 211a |
| | Xinyi Li | xinyi.li@duke.edu | OH: |
| Class: | M/W 8:30-9:45am | | Social Sciences 136 |
Description
Introduction to machine learning techniques. Graphical models, latent
variable models, dimensionality reduction techniques, statistical learning, regression, kernel methods, state space models, HMMs, MCMC. Emphasis is on applying these techniques to real data in a variety of application areas.
News and information
All students: we will have one poster session, on Dec 4. The poster
session will be in the Gross Hall 3rd floor East Meeting Space. For an
example poster, see the tex example or the keynote example.
If you are auditing the course, we'd love to have
you at the poster session (bring your research groups too!).
Statistics at the level of STA611 (Introduction to Statistical
Methods) is encouraged, along with knowledge of linear algebra and
multivariate calculus.
Course grade is based on an in class midterm (15%),
an in class final (35%), a final project (40%), and the poster session
for the final project (10%). We will have homeworks, but they will not
be graded; we will post solutions.
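As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the course grade is a plain weighted average; neither assumption is stated in the grading policy, and the function name `course_grade` and the example scores are hypothetical.

```python
# Component weights taken from the grading policy above.
WEIGHTS = {
    "midterm": 0.15,
    "final": 0.35,
    "project": 0.40,
    "poster": 0.10,
}


def course_grade(scores):
    """Weighted average of component scores.

    Assumption: every score is on a 0-100 scale and the grade is a
    simple weighted sum (the syllabus does not specify this).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"midterm": 80, "final": 90, "project": 85, "poster": 100}
print(round(course_grade(example), 2))
```

Under these assumptions the project and its poster together carry half of the grade, which is why the project matters as much as both exams combined.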
There is a Piazza course
discussion page. Please direct questions about homeworks and other
matters to that page. Otherwise, you can email the instructors (TAs
and professor) at sta561-ta@duke.edu. Note that we are more likely to
respond to the Piazza questions than to the email, and your classmates
may respond too, so that is a good place to start.
The final projects should be in LaTeX. If you have never used LaTeX before, there are online
tutorials, Mac GUIs, and even online compilers that might help you.
The course project will include a project proposal due mid-semester, a
four page writeup of the project at the end of the semester, and an
all-campus poster session where you will present your work. This is
the most important part of the course; we strongly encourage you to
come and discuss project ideas with us early and often throughout the
semester. We expect some of these projects to become publications. You
are absolutely permitted to use your current rotation or research
project as your course project. Examples of last year's projects.
A second set of references for R may be useful. First, you can
download R from the CRAN website. There are many resources, such as
RStudio, that can help with the programming interface, and tutorials
on R are widely available. If you are getting bored with the
standard graphics package, I really like using ggplot2 for beautiful graphics and
figures. Finally, you can integrate R code and output with plain text
using knitr, but that might be
going a bit too far if you are a beginner.
The course will follow my lecture notes (these will be updated as the
course proceeds): Lecture Notes.
Some other texts and notes that may be useful include:
- Kevin Murphy, Machine Learning: a probabilistic perspective
- Michael Lavine, Introduction to Statistical Thought (an introductory statistical textbook with plenty of R examples, and it's online too)
- Chris Bishop, Pattern Recognition and Machine Learning
- Daphne Koller & Nir Friedman, Probabilistic Graphical Models
- Hastie, Tibshirani, Friedman, Elements of Statistical Learning (ESL) (PDF available online)
- David J.C. MacKay, Information Theory, Inference, and Learning Algorithms (PDF available online)
The final project TeX template and final project style file should be used in preparation of your final project report. Please follow the instructions and let me know if you have questions. We will have a poster session where you present your research project in lieu of a final exam.
This syllabus is tentative, and will almost surely be modified. Reload your browser for the current version.
- Predicting sales of Rossman's stores
- Gentrification Index Using Yelp Data
- Risk estimates of tree mortality across species using Bayesian hierarchical models
- Classification of TV Channels
- Prediction of Coupon Purchasing Behavior
- Classification of Cardiac Tissue Regions Based on Motion Profile in Ultrasound Images
- Spectral Clustering of Chinese Herbal Medicine Network
- Use of Machine Learning in Predicting Bankruptcy
- Distinguishing malignant from benign breast tumors
- Detection of Solar Panes from Satellite Imagery
- Yelp Customer Review Bias Analysis through Linear Mixed Effect Models with Natural Language Sentiment Polarity Scores
- Testing the CAPM Theory for German CDS Based on a Model with GARCH-type Volatilities and SSAEPD Errors
- Bayesian Non-Parametrics and Dirichlet Process Clustering Techniques
- Text Analysis of News Articles (Building a Protest Dataset through Machine Learning)
- Information Popularity and Diffusion Size Prediction in Online Social Networks
- Cascading Classifier for Face Detection
- What's Cooking? Predicting Cuisines from Recipe Ingredients
- Analysing Senator Community Structure from Roll Call Data
- Handwritten Digits Recognition
- A Neural Algorithm for Artistic Style
- Machine Learning with Python
- Predictive Modeling of Bank Marketing for Term Deposit
- Air Pollution Distribution Analysis for Beijing Haze
- Beyond SVD
- Legislation approval ratings prediction via vote correlation
- Categorical Prediction of Song Popularity Using Topological Data Analysis
- Movie Recommender System
- The Effect of Racial Diversity on High School Graduation Rates
- Comparison of feature selection methods in modeling resting metabolic rate
- Randomization as regularization
- Designing an optimum traffic signal system using reinforcement learning
- Topic modeling for community analysis and range estimation
- Classifying Soccer Matches in the English Premier League
- Spectral algorithms and tensor methods for learning in POMDPs
- World Cup Recap
- Dimension Reduction Methods on Handwritten Digits Recognition
- ML methods for Drosophila Dorsal closure
- The Animal Model for Censored Traits
- Spectral Clustering and Community Detection in Labeled Graph
- Cluster Analysis of Endogenous Taxi Driver Schedule Patterns
- (August 24th) Introduction and review: Lecture 1 in notes
- (August 26th) No class
- (August 31st) Linear regression, the proceduralist approach: Lecture 2 in notes
- (September 2nd) Bayesian motivation for proceduralist approach: Lecture 3 in notes
- (September 7th) Bayesian linear regression: Lecture 4 in notes
- (September 9th) Reproducing kernel Hilbert spaces: Lecture 5 in notes
- (September 14th) Nonlinear regression: Lecture 6 in notes
- (September 16th, 21st) Support Vector Machines: Lecture 7 in notes
- (September 23rd) Regularized logistic regression: Lecture 8 in notes
- (September 28th) Gaussian process regression: Lecture 9 in notes
- (September 30th) Sparse regression: Lecture 10 in notes
- (October 5th) The boosting hypothesis and Adaboost: Lecture 11 in notes
- (October 7th) In class midterm
- (October 14th, 19th) Statistical learning theory: Lecture 12 in notes
- (October 19th, 21st) Mixture models and latent space models: Lecture 13 in notes
- (October 26th, 28th) Latent Dirichlet Allocation: Lecture 14 in notes
- (November 2nd, 4th) Markov chain Monte Carlo: Lecture 15 in notes
- (November 9th, 11th) Hidden Markov models: Lecture 16 in notes
- (November 23rd) In class final
- (December 4th) Poster session (2pm)
- (December 7th) Final projects due