STA 345: Multivariate Analysis

Class: Tu Th 2:50-4:05pm, Old Chem 025
Prof: Robert L. Wolpert (wolpert@stat.duke.edu), OH: Wed 4:15-5:00pm, Old Chem 211c
TA: Anirban Bhattacharya (anib86@gmail.com), OH: Mon 3:00-4:30pm, Old Chem 211a

Tentative Schedule

Description

Half a century ago the phrase Multivariate Statistics was generally understood to describe sampling-theory-based statistical methods for studying multi-dimensional normally-distributed data. The fundamental tools for this aspect of the subject are a deep understanding of linear algebra and of the probability distributions associated with the normal, such as Wishart and its kin. The best-known methods arising in this area are PCA (Principal Components Analysis), FA (Factor Analysis), Hotelling's T² test, and perhaps relatives like Principal Components Regression and multivariate ANOVA.

More recently, interest in computational methods, causality, and model formulation has led to a growth in the study of Graphical Models, in which the conditional (in)dependence structure for a family of random variables is encoded in the form of a graph: a collection of points (the vertices), some of which are connected (by edges, or possibly-ordered pairs of vertices). For non-Gaussian distributions it is sometimes necessary to go beyond graphs to "hypergraphs".
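To make the idea of a graph encoding conditional independence concrete, here is a minimal sketch for the Gaussian case, where a missing edge corresponds to a zero entry in the precision (inverse covariance) matrix. The three-variable chain X1 - X2 - X3, the particular matrix K, and the helper invert are illustrative assumptions, not course material.

```python
from math import sqrt

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I]
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical precision matrix for the chain X1 - X2 - X3.
# The zero in position (1, 3) means there is no edge between X1 and X3.
K = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]

Sigma = invert(K)  # covariance matrix implied by K

# Marginally, X1 and X3 are correlated (they are linked through X2).
marginal_corr_13 = Sigma[0][2] / sqrt(Sigma[0][0] * Sigma[2][2])

# Conditional covariance of X1 and X3 given X2, from the usual
# Gaussian conditioning formula; it vanishes for this graph.
cond_cov_13 = Sigma[0][2] - Sigma[0][1] * Sigma[1][2] / Sigma[1][1]

print("marginal correlation of X1 and X3:", marginal_corr_13)
print("conditional covariance of X1 and X3 given X2:", cond_cov_13)
```

In this sketch the marginal correlation is about 1/3, while the conditional covariance given X2 is zero up to rounding, matching the missing edge between X1 and X3.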

My plan is to try to cover the highlights of both traditional (multivariate Gaussian) MVA and of graphical models. This will be the first time I've taught this material and I'll be learning some of it as we go along, so don't expect a smooth ride or a polished syllabus. I hope to have some computing aspects for the course if I can manage it.

Students are expected to be (or become) comfortable with probability theory at the level of STA214 or STA205, statistical inference at the level of STA215, and linear models at the level of STA244. Some experience in computing in R or MATLAB would be helpful.

Assessment

This is a 300-level course and really shouldn't be graded, but since it is, there will be five problem sets (about once every two weeks) and an optional final project. The final project can be either a five page (or so) paper presenting a data analysis using methods from this course on data of interest to you (or I can help you find some, if you prefer), or a five page (or so) paper (or 15 minute oral presentation) on a journal article that either develops or applies interesting multivariate methods. Any student who turns in the homework with a good-faith effort at completing it will receive at least an A- in the course; any student who also completes an optional project will receive an A; students who do neither of these will receive a B or B+.

Textbooks

The textbooks for the course are:
• S. J. Press, Applied Multivariate Analysis: Using Bayesian and Frequentist Inference (2/e)
• S. L. Lauritzen, Graphical Models
Other interesting books with complementary strengths (which I'll use at times) include:
• T. W. Anderson, An Introduction to Multivariate Statistical Analysis (3/e)
• M. L. Eaton, Multivariate Statistics: A Vector Space Approach
• T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction (2/e)
• K. V. Mardia, J. T. Kent, J. M. Bibby, Multivariate Analysis
• D. F. Morrison, Multivariate Statistical Methods (4/e)
• J. Whittaker, Graphical Models in Applied Multivariate Statistics
Morrison's book is good but insanely expensive (\$185 at Amazon.com or \$157 straight from the publisher); however, Chapters 1 & 2 of (3/e) are accessible from the publisher on-line.
Here's a helpful multimedia introduction to MVA: click here.