STA 521L - Predictive Modeling and Statistical Learning

Course description

This is a master's-level introductory course in statistical learning methods for prediction and inference. It introduces students to the concepts and techniques of modern regression and predictive modeling, blending theory and application through a range of examples. Topics include exploratory data analysis and visualization; linear and generalized linear models; model selection; penalized estimation and shrinkage methods, including the Lasso, ridge regression, and Bayesian regression; and decision trees and ensemble methods. Other advanced topics, such as robust estimation, smoothing splines, support vector machines, and neural networks, will be discussed briefly. The R programming language is used throughout for computation and applications.

Corequisites: Statistical Science 323D or 523L, and Statistical Science 360, 601, or 602L. All students should be comfortable with linear/matrix algebra and with mathematical statistics at the level of STA 611, and should be familiar with the R programming language and with linear regression. Students should also be familiar with Bayesian statistics, either from having taken an introduction to Bayesian inference (STA 360/601/602) or by being co-registered in one of those courses. Please see me if you have questions about the prerequisites or your background.

Acknowledgement: This course webpage contains materials such as lecture slides, homework and datasets that were developed or adapted by Merlise Clyde, Bin Yu, Raaz Dwivedi and Ryan Tibshirani.

Class info


Yuansi Chen


Jose Pliego San Martin

Aihua Li

Nancy Huang

Main References

Main text:


Lecture Timeline

Tentative; please refresh the page for the latest version.

Week  Date     Topic                                               Other
1     Aug. 30  Course introduction                                 hw0 out
      Sep. 01  Predictive Modeling Overview
2     Sep. 06  Exploratory Data Analysis I, basics and PCA         hw0 due & hw1 out
      Sep. 08  Exploratory Data Analysis II, clustering methods 1  proj1 out
3     Sep. 13  Exploratory Data Analysis III, clustering methods 2
      Sep. 15  Linear Regression Review
4     Sep. 20  Linear Regression Review II                         hw1 due & hw2 out
      Sep. 22  Diagnostics + Modeling count data
5     Sep. 27  Bias-variance decomposition + Ridge regression
      Sep. 29  Relation to Bayesian Regression
6     Oct. 04  LASSO                                               hw2 due & hw3 out
      Oct. 06  In-class midterm
7     Oct. 11  Fall break 🎇
      Oct. 13  Model Assessment and Selection I                    proj1 due
8     Oct. 18  Model Assessment and Selection II
      Oct. 20  Classification starts here - Logistic Regression    proj2 out
9     Oct. 25  The Bayes classifier and k-nearest neighbors        hw3 due & hw4 out
      Oct. 27  Linear/Quadratic Discriminant Analysis
10    Nov. 01  Support Vector Machines & the Max-Margin Idea
      Nov. 03  Kernel Methods I
11    Nov. 08  Kernel Methods II                                   hw4 due & hw5 out
      Nov. 10  Decision trees
12    Nov. 15  Decision trees + Bagging
      Nov. 17  Bagging + Random Forests + Boosting
13    Nov. 22  Boosting II                                         hw5 due & hw6 out
      Nov. 24  Thanksgiving break 🎉
14    Nov. 29  Intro to Neural Networks
      Dec. 01  Final Review
15    Dec. 06  Graduate reading period                             hw6 due, proj2 due
      Dec. 08  Graduate reading period
16    Dec. 13  Graduate reading period
      Dec. 15  Final Exam 2-5 in Old Chem 116 🍾

Required workload

Grading Policy

Other Resources

Books & Tutorials for Learning R

Need Help with Linear/Matrix Algebra?

Related courses