832 HW7 - Due Friday 2020/10/09

  1. Let \(Y \sim N_{n\times p}(0, \Psi \otimes I_n)\) where \(\Psi\) is an unknown diagonal matrix. Write out the \(-2\) log-likelihood for \(\Psi\) and find the MLE.

  2. Let \(Y\) follow a mean-zero \(q\)-factor model, so that \(Y = Z A^\top + E \Psi^{1/2}\). Find \(E[ Z | Y, A, \Psi]\) and \(E[ Z^\top Z | Y, A, \Psi]\).

  3. Perform a \(q=4\) factor analysis on the subset of the NHANES dataset in the file nhanesHW7.rds. In particular,

    1. Find the MLE of the variance matrix, and compare to the sample variance matrix.
    2. Interpret the coefficients of the loading matrix and relate them to the sample variance matrix.
    3. Compare the communality of each variable to its specific variance. Which variables are the “most explained” by the latent factors?
    4. Redo the analysis after scaling the columns to have unit variance. How does this change the fit, or the interpretation of the model coefficients?

  4. Identify a dataset that you would like to use for the data project. Describe the dataset in a few sentences, and provide the following information:

    1. The data source and how you access it.
    2. The primary variables you intend to analyze.
    3. A list of other secondary variables that may be important (continuous or categorical predictors, time, design variables, etc.).
    4. A description of what sort of dependencies you anticipate are present in the data.
    5. (optional) A scientific question or hypothesis you plan to evaluate with your data analysis.
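
As a reminder of the bookkeeping behind problem 3, the sketch below shows one way communalities and specific variances might be read off a fitted factor model. It assumes the usual convention that the model-implied variance matrix is \(A A^\top + \Psi\), so that the communality of variable \(j\) is the sum of squared loadings in row \(j\) of \(A\). The loading matrix `A`, the specific variances `psi`, and the helper names are hypothetical and made up for illustration; they are not taken from the NHANES data or from any particular fitting routine.

```python
# Hypothetical 3-variable, 2-factor example; all numbers are made up.
A = [[0.9, 0.1],
     [0.8, 0.3],
     [0.2, 0.7]]          # loading matrix, p x q (assumed already estimated)
psi = [0.18, 0.27, 0.47]  # specific variances, i.e. the diagonal of Psi


def communalities(loadings):
    """Communality of each variable: the row sum of squared loadings."""
    return [sum(a * a for a in row) for row in loadings]


def implied_covariance(loadings, specific):
    """Model-implied variance matrix A A^T + Psi, as a list of lists."""
    p = len(loadings)
    sigma = [[sum(x * y for x, y in zip(loadings[i], loadings[j]))
              for j in range(p)] for i in range(p)]
    for j in range(p):
        sigma[j][j] += specific[j]  # Psi only contributes on the diagonal
    return sigma


h2 = communalities(A)
Sigma = implied_covariance(A, psi)

for j, (h, s) in enumerate(zip(h2, psi)):
    # Share of the model-implied variance of variable j due to the factors.
    share = h / (h + s)
    print(f"variable {j}: communality={h:.2f}, "
          f"specific={s:.2f}, share explained={share:.2f}")
```

The diagonal of `Sigma` equals communality plus specific variance for each variable, which is the quantity to set against the diagonal of the sample variance matrix. The off-diagonal entries depend on the loadings alone.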