Exercise 17

Exercise 19

Do a PCA on the *logged* values of the variables “DR1TPROT”, “DR1TCARB”, “DR1TFIBE”, “DR1TPFAT”, “DR1TCHOL” of the NHANES dataset. Specifically,

- compute the sample covariance matrix and its eigendecomposition;
- report the cumulative proportion of variance explained by the principal axes;
- plot the coefficients of the first two principal axes, and relate them to the covariance matrix.

Repeat the above calculations but on the sample correlation matrix instead. Explain any similarities/differences between the two sets of results.
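
As a starting point, the covariance-based and correlation-based calculations can be sketched as follows. This is only a minimal illustration in Python with NumPy, run on synthetic positive data standing in for the five NHANES variables; the array `X`, the `scales` vector, and the helper `pca_from_matrix` are all hypothetical names introduced here. With the real data, any zero or missing values would need to be handled before taking logs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the five NHANES columns (hypothetical data,
# strictly positive so that the logarithm is defined).
n, p = 200, 5
scales = np.array([1.0, 2.0, 0.5, 1.5, 3.0])
X = np.exp(3.0 + 0.3 * scales * rng.normal(size=(n, p)))

Y = np.log(X)  # logged values


def pca_from_matrix(S):
    """Eigendecomposition of a symmetric matrix, sorted by decreasing eigenvalue."""
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    cum_prop = np.cumsum(evals) / np.sum(evals)  # cumulative proportion of variance
    return evals, evecs, cum_prop


S = np.cov(Y, rowvar=False)       # sample covariance matrix (divisor n - 1)
R = np.corrcoef(Y, rowvar=False)  # sample correlation matrix

cov_evals, cov_axes, cov_cum = pca_from_matrix(S)
cor_evals, cor_axes, cor_cum = pca_from_matrix(R)

print("covariance PCA, cumulative proportion: ", np.round(cov_cum, 3))
print("correlation PCA, cumulative proportion:", np.round(cor_cum, 3))
```

The columns of `cov_axes` and `cor_axes` hold the principal axes, so the first two columns are the coefficients the exercise asks you to plot.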

The `heads.rds` file on the course website contains data on heads of both male and female soldiers in the Swiss army.

- Do a PCA on the combined dataset that includes both males and females. Provide the covariance matrix, and plots of the eigenvalues and first few principal axes. Plot the first two principal components, and indicate sex by either a plotting character or color on your figure. Describe the results.
- Now repeat the calculations but with separate PCAs for each sex. Describe similarities and differences between the pooled analysis and the separate analyses.
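
One possible way to organize the pooled and separate analyses is sketched below in Python with NumPy. The data here are simulated and purely hypothetical (three made-up measurements per soldier and a made-up `sex` label); the actual variables in `heads.rds` are not assumed. The helper `pca` is likewise just an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the head measurements: 3 variables, two groups.
n_per_group = 100
males = rng.normal(loc=[10.0, 12.0, 8.0], scale=[1.0, 1.5, 0.5], size=(n_per_group, 3))
females = rng.normal(loc=[9.0, 11.0, 7.5], scale=[1.0, 1.5, 0.5], size=(n_per_group, 3))
Y = np.vstack([males, females])
sex = np.array(["M"] * n_per_group + ["F"] * n_per_group)


def pca(Y):
    """Return eigenvalues, principal axes (columns), and principal component scores."""
    Yc = Y - Y.mean(axis=0)            # center each column
    S = np.cov(Yc, rowvar=False)       # sample covariance matrix
    evals, axes = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]    # sort by decreasing eigenvalue
    evals, axes = evals[order], axes[:, order]
    scores = Yc @ axes                 # principal components
    return evals, axes, scores


# Pooled analysis: one PCA on everyone.
pooled_evals, pooled_axes, pooled_scores = pca(Y)

# Separate analyses: one PCA per sex, each centered at its own group mean.
by_sex = {s: pca(Y[sex == s]) for s in ("M", "F")}

print("pooled eigenvalues:", np.round(pooled_evals, 3))
for s, (evals, axes, scores) in by_sex.items():
    print(s, "eigenvalues:", np.round(evals, 3))
```

The first two columns of `pooled_scores` are what would go on the scatterplot, with `sex` used to pick the color or plotting character. Keep in mind that the sign of each principal axis is arbitrary, so axes from different fits may need a sign flip before they are compared.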

(optional) Consider two ways of drawing a straight line through a scatterplot of a data matrix \(Y\in \mathbb R^{n\times 2}\). The first is the OLS regression line obtained by regressing the second column \(y_2\) on the first \(y_1\). The second is by finding the best one-dimensional approximation to the data matrix. Obtain formulas for the slopes of each line. When is one bigger (in magnitude) than the other? Hint: To simplify things, consider just the case of mean-zero data, or regression through the origin and best linear subspace.
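
Once you have derived your formulas, a numerical check along the following lines may be useful. This sketch (Python with NumPy, simulated mean-zero data, all variable names hypothetical) deliberately uses generic solvers, least squares and the SVD, rather than closed-form expressions, so that you can compare its output against your own formulas.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated two-column data matrix, centered so both columns have mean zero.
n = 500
y1 = rng.normal(size=n)
y2 = 0.6 * y1 + rng.normal(scale=0.8, size=n)
Y = np.column_stack([y1, y2])
Y = Y - Y.mean(axis=0)

# Line 1: OLS regression of y2 on y1 through the origin.
coef, *_ = np.linalg.lstsq(Y[:, [0]], Y[:, 1], rcond=None)
ols_slope = coef[0]

# Line 2: best one-dimensional approximation to Y, spanned by the
# first right singular vector (equivalently, the first principal axis).
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
v = Vt[0]
pca_slope = v[1] / v[0]  # rise over run; invariant to the sign of v

print("OLS slope:            ", round(float(ols_slope), 4))
print("principal axis slope: ", round(float(pca_slope), 4))
```

Rerunning with different noise scales or different true slopes gives a feel for how the two lines relate before you prove it.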