STA 832: Multivariate Analysis

Class: Tu Th 10:05-11:20am, Old Chem 116
Prof: Robert L. Wolpert (wolpert@stat.duke.edu)
OH: By appointment, Old Chem 211c

Tentative Schedule

Description

Half a century ago the phrase Multivariate Statistics was generally understood to describe sampling-theory based statistical methods for studying multi-dimensional normally-distributed data. The fundamental tools for this aspect of the subject are a deep understanding of linear algebra and of the probability distributions associated with the normal, such as Wishart and its kin. The best-known methods arising in this area are PCA (Principal Components Analysis), FA (Factor Analysis), Hotelling's T² test, and perhaps relatives like Principal Components Regression and multivariate ANOVA.

Traditional MVA methods are tailored for problems in which the number of observations (traditionally denoted n) exceeds (maybe by a lot) the number of uncertain parameters (traditionally p). Recently there has been a great deal of interest in "big p, small n" problems where p ≫ n; these arise naturally in genomic applications, intrusion detection, and other emerging areas of interest.
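One way to see why the n versus p distinction matters: the sample covariance matrix of n observations has rank at most n-1, so it cannot be inverted once p reaches n. Here is a minimal sketch in plain Python, using made-up integer data and exact fractions so that the rank is not blurred by rounding. The function names (sample_covariance, matrix_rank, covariance_rank) are hypothetical, chosen only for this illustration.

```python
import random
from fractions import Fraction


def sample_covariance(rows):
    """Unbiased sample covariance (p x p) of an n x p data matrix, in exact arithmetic."""
    n, p = len(rows), len(rows[0])
    means = [sum(Fraction(r[j]) for r in rows) / n for j in range(p)]
    centered = [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def matrix_rank(matrix):
    """Rank of a matrix of Fractions, by Gaussian elimination."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def covariance_rank(n, p, seed=0):
    """Rank of the sample covariance of n made-up observations in p dimensions."""
    rng = random.Random(seed)  # illustrative data only, not from any real study
    data = [[rng.randint(-50, 50) for _ in range(p)] for _ in range(n)]
    return matrix_rank(sample_covariance(data))


if __name__ == "__main__":
    print("n=20, p=3 :", covariance_rank(20, 3))
    print("n=5,  p=8 :", covariance_rank(5, 8))
```

With n=5 and p=8 the reported rank can be no larger than 4, so the 8 x 8 sample covariance is singular, and statistics that need its inverse (Hotelling's T², for one) are not defined without some modification.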

My plan is to try to sketch the highlights of traditional (multivariate Gaussian) MVA in the first half of the semester, then segue to a discussion-format course in which students select papers or book chapters covering more recent material and present them orally to the class. I'll help identify a number of possible papers and topics for student presentation, but you're welcome to choose something outside those offerings. Here's the start of a list of Multivariate Papers and one of Consistency and Asymptotic Papers (pdf copies of each available on request).

Possible topics will include random-projection methods, the statistical modeling of computer output, random forests, linear discriminant analysis, kernel PCA, and others.

Students are expected to be (or become) comfortable with probability theory at the level of STA230 or STA711 or STA831, statistical inference at the level of STA250 or STA732, and linear models at the level of STA721. Some experience in computing in MatLab, Python, or R would be helpful.

Assessment

Assessment will be based on a few problem sets covering material from the first part of the course, and on the quality of the oral presentations and level of participation in the discussion part of the course. Presentations should include either a pdf of five-to-twenty slides (or so), or a paper of five-to-ten pages (or so), and should include an original example or illustration of the ideas being presented. I'm available for help.

Textbooks

The textbook for the first half of the course is:

Other interesting and related books with complementary strengths include:

Morrison's book is good but insanely expensive ($185 at Amazon.com or $157 straight from the publisher), but Chapters 1 & 2 of (3/e) are accessible from the publisher on-line.

Here's a helpful multimedia introduction to MVA.