SAMSI COURSE ON
DATA MINING
AND MACHINE LEARNING
- Instructors:
- Professor David Banks : banks@stat.duke.edu
Professor Feng Liang : feng@stat.duke.edu
- Class Time:
- Wednesdays, 4:30 - 7:00pm
Class begins August 27, 2003
- Class Location:
- NISS Building, Room 104
Maps and Directions
Distances:
-   Duke - SAMSI: ~ 8.5 miles (14 km)
-   NCSU - SAMSI: ~ 16.5 miles (26 km)
-   UNC - SAMSI: ~ 13.5 miles (22 km)
- Course Description
- Data mining represents an expanding partnership between statisticians and computer scientists. This SAMSI course aims to bring graduate students up to the research frontier in this area, drawing together the foundations (the curse of dimensionality, smoothing, flexible modeling, recursive partitioning, and parsimony) with more recent innovations (support vector machines, boosting and bagging, model stiffness, data streaming, and false discovery rate). The class will involve some applications and some illustrative use of software, but the focus will be on theory. Grading will be based on a research project: students will be expected to invent a new idea in this area, implement it, and then test it (this is easier than it may sound).
- Prerequisites:
- Knowledge of statistical inference, including familiarity with density functions, degrees of freedom, hypothesis testing and multiple regression.
- Comfort with linear models.
- Some experience with modern statistical computing.
- Text:
- The main text for the course is Hastie, Tibshirani, and Friedman's "The Elements of Statistical Learning," but it will be supplemented by current articles.
Lectures:
Instructor: David Banks
- Aug 27
- Sep 3
- Sep 10 : NO CLASS (Datamining Workshop)
- Sep 17
- Sep 24
- Oct 1
- Oct 8
Instructor: Feng Liang
- Oct 15 : Splines
Reading : Chap 5.1--5.7
HW : Ex 5.4
- Oct 22 : SVM
Reading : Chap 4, 12
HW : one problem on convex programming (see handout)
- Oct 29 : RKHS
Reading : Chap 5.8, 12.3.3
HW : Two problems (see handout)
- Nov 5 : Statistical learning theory, Shrinkage
Reading : Chap 2, 7
HW : Prove Hoeffding's inequality (see handout)
- Nov 12 : Clustering (guest lecture given by Andrew Nobel)
Reading : Chap 14
- Nov 19 : Boosting, Tree models
Reading : Chap 9, 10
HW : project proposal due
- Nov 26 : THANKSGIVING HOLIDAY
- Dec 3 : Presentation
Links: