Class: | Tu Th 10:05-11:20am | Old Chem 116 | |

Prof: | Robert L. Wolpert | (wolpert@stat.duke.edu) | |

OH: | By appointment | Old Chem 211c |

Tentative Schedule

Half a century ago the phrase *Multivariate Statistics* was generally
understood to describe sampling-theory based statistical methods for studying
multi-dimensional normally-distributed data. The fundamental tools for this
aspect of the subject are a deep understanding of linear algebra and of the
probability distributions associated with the normal, such as Wishart and its
kin. The best-known methods arising in this area are PCA (Principal
Components Analysis), FA (Factor Analysis), Hotelling's *T*^{2}
test, and perhaps relatives like Principal Components Regression and
multivariate ANOVA.

Traditional MVA methods are tailored for problems in which the number
of observations (traditionally denoted *n*) exceeds (maybe by a lot)
the number of uncertain parameters (traditionally *p*). Recently
there has been a great deal of interest in "Big *p* small *n*" problems,
where *p*»*n*; these arise naturally in genomic applications, intrusion
detection, and other emerging areas of interest.
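As a small illustration of why the classical tools strain once *p* exceeds *n*, here is a minimal sketch in Python. It uses simulated standard-normal data, and the function name and sample sizes are illustrative choices, not course material. With *n* observations the sample covariance matrix has rank at most *n* - 1, so when *p* > *n* it is singular, and anything that needs its inverse (Hotelling's *T*^{2}, for instance) is not directly available.

```python
import numpy as np

def sample_covariance_rank(n, p, seed=0):
    """Numerical rank of the p x p sample covariance of n draws from N(0, I_p).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))   # n observations (rows), p variables (columns)
    S = np.cov(X, rowvar=False)       # p x p sample covariance, divisor n - 1
    return np.linalg.matrix_rank(S)

# Classical regime: n comfortably larger than p, so S is full rank.
rank_classical = sample_covariance_rank(n=200, p=5)

# "Big p small n": the rank cannot exceed n - 1 = 19, far below p = 100.
rank_big_p = sample_covariance_rank(n=20, p=100)

print("n=200, p=5   -> rank", rank_classical)
print("n=20,  p=100 -> rank", rank_big_p)
```

Regularized or dimension-reducing methods, such as the random-projection and kernel approaches among the presentation topics, are one way around this rank deficiency.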

My plan is to sketch the highlights of traditional (multivariate Gaussian) MVA in the first half of the semester, then segue to a discussion-format course in which students select papers or book chapters covering more recent material and present them orally to the class. I'll help identify a number of possible papers and topics for student presentations, but you're welcome to choose something outside those offerings. Here's the start of a list of Multivariate Papers and one of Consistency and Asymptotic Papers (pdf copies of each available on request).

Possible topics will include random-projection methods, the statistical modeling of computer output, random forests, linear discriminant analysis, kernel PCA, and others.

Students are expected to be (or become) comfortable with probability theory
at the level of STA230 or STA711 or STA831, statistical inference at the
level of STA250 or STA732, and linear models at the level of STA721. Some
experience in computing in `MATLAB`, `Python`, or `R`
would be helpful.

- S. J. Press, Applied Multivariate Analysis: Using Bayesian and Frequentist Inference (2/e)

- T. W. Anderson, An Introduction to Multivariate Statistical Analysis (3/e)
- M. L. Eaton, Multivariate Statistics: A Vector Space Approach
- T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction (2/e)
- S. Lauritzen, Graphical Models
- K. V. Mardia, J. T. Kent, J. M. Bibby, Multivariate Analysis
- D. F. Morrison, Multivariate Statistical Methods (4/e)
- J. Whittaker, Graphical Models in Applied Multivariate Statistics
