An introduction to statistical learning methods for prediction and inference. This course introduces students to concepts and techniques of Classical and Bayesian approaches for modern regression and predictive modelling. The course will blend theory and application using a range of examples. Topics include exploratory data analysis and visualization, linear and generalized linear models, model selection, penalized estimation and shrinkage methods including Lasso, ridge regression and Bayesian regression, regression and classification based on decision trees, Bayesian Model Averaging and ensemble methods, and, time permitting, robust estimation, smoothing splines, support vector machines, neural nets or other advanced topics. The R programming language and applications are used throughout. Corequisite: Statistical Science 323D or 523L and Statistical Science 360, 601, or 602L.
All students should be comfortable with linear/matrix algebra and mathematical statistics at the level of STA 611 (Statistical Inference by Casella and Berger is an excellent resource), and should be familiar with linear regression and the R programming language. Students should also be familiar with Bayesian statistics, either by having taken an introduction to Bayesian inference (STA 360/601/602) or by being currently co-registered in one of those courses. Please see me if you have questions about the prerequisites/background.
Name | email | location | office hours |
---|---|---|---|
Dr. Merlise Clyde | clyde@duke.edu | 223E Old Chem | after class, Thu 2-3 or by appointment |
Eric Su (TA) | eric.su@duke.edu | 203B | Mon 3-5 |
Hanyu Song (TA) | hanyu.song@duke.edu | 203B | Wed 3-5 |
Evan Poworoznek (TA) | evan.poworoznek@duke.edu | 203B | Th 1-3 |
Pritam Dey (TA) | pritam.dey@duke.edu | Piazza | |
For questions outside of class/office hours, please post on Piazza.
We may switch lecture and lab times as needed; refer to the Course Calendar or Announcements in Sakai
Textbook | Ordering Information |
---|---|
An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | This is freely available as an eBook (Get it @Duke) through the Duke Library. You are welcome to download or print it out. If you prefer a paperback or hardback version, you may buy it from Springer or Amazon. For additional information, check out Videos for the ISL book. |
Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman | More advanced than ISL. This is freely available as an eBook. You are welcome to download or print it out. If you prefer a paperback version, you may buy it at cost from Springer (see links from library site) or purchase a hardback version at the Duke Bookstore or through Amazon. |
Applied Linear Regression by Sanford Weisberg (3rd Edition) | In-depth coverage of linear regression and extensions, model checking, and more. Get it @Duke. The associated Computing Primer for Applied Linear Regression Using R is useful for the companion R package. |
Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman & Jennifer Hill (2009), ISBN-13: 978-0521686891 | This is available from Amazon. We will refer to this book for practical aspects of regression and Bayesian hierarchical modelling, model checking and more. |
A First Course in Bayesian Statistical Methods by Peter Hoff (2009), Springer, ISBN 978-0-387-92299-7 | Review chapters via eBook through the Duke Library. Used in co-requisite course STA 601. |
Other resources for reference books, statistical computing using R, etc. are provided on the Resource tab.
We will use R as a programming language for computation and data analysis, using existing packages written in R to support the course. All students will have access to RStudio/R on a server within the department and support during the labs. You are free to run RStudio/R on your personal laptop or desktop. See the Resources page for books and other resources for learning R. You should bring a calculator to exams. There are no restrictions on the type of calculator, except that it may not be on a mobile device.
Syllabus and other course information and policies are available on the Course Syllabus
Lecture slides, reading assignments, homework, and other materials are available on the Calendar
Links to Online Course Textbooks and other references, software and other useful resources are provided on the Resource menu
Link to Sakai allows access to Video Lectures, Gradebook, the Piazza discussion forum, etc
GitHub links to the class GitHub Organization for team coding submissions
This course has achieved Duke’s Green Classroom Certification. The certification indicates that the faculty member teaching this course has taken significant steps to green the delivery of this course. Your faculty member has completed a checklist indicating their common practices in areas of this course that have an environmental impact, such as paper and energy consumption. Some common practices implemented by faculty to reduce the environmental impact of their course include allowing electronic submission of assignments, providing online readings and turning off lights and electronics in the classroom when they are not in use. The eco-friendly aspects of course delivery may vary by faculty, by course and throughout the semester. Learn more at http://sustainability.duke.edu/action/certifications/classroom/index.php.