(Course page image: Bayes' Theorem, https://commons.wikimedia.org/wiki/File:Bayes%27_Theorem_MMB_01.jpg)

HW1

Due 09/06/2017 11:55:00 PM

  1. Exercise 1.5.8 in Christensen. Note the matrix $J_n^n = \mathbf{1}_n \mathbf{1}_n^T$, where $\mathbf{1}_n$ is a vector of length $n$ with entries that are all ones.

  2. We showed that $P_X = X(X^TX)^{-1}X^T$ is an orthogonal projection onto the column space of $X$ and that $\hat{Y} = P_X Y$. While useful for theory, the projection matrix should never be used in practice to find the MLE of $\boldsymbol{\mu}$ due to 1) computational complexity (inverses and matrix multiplication) and 2) numerical instability. To find $\hat{\beta}$ for $X$ of full column rank we solve $X \beta = P_X Y$, which leads to the normal equations $(X^TX) \beta = X^TY$, and then solve this system of equations for $\beta$. Instead, consider the following for $X$ ($n \times p$, $p < n$) of rank $p$:

    a. Any such $X$ may be written via a singular value decomposition (SVD) as $U \Lambda V^T$, where $U$ is an $n \times p$ orthonormal matrix ($U^TU = I_p$ and the columns of $U$ form an orthonormal basis (ONB) for $C(X)$), $\Lambda$ is a $p \times p$ diagonal matrix, and $V$ is a $p \times p$ orthogonal matrix ($V^TV = V V^T = I_p$). Note the difference between orthonormal and orthogonal. Show that $P_X$ may be expressed as a function of $U$ only and provide an expression for $\hat{Y}$. Similarly, find an expression for $\hat{\beta}$ in terms of $U$, $\Lambda$ and $V$. Your result should only require the inverse of a diagonal matrix! (An illustrative R sketch appears after this problem list.)

    b. $X$ may be written in a (reduced or thin) QR decomposition as $X = Q R$, where $Q$ is an $n \times p$ orthonormal matrix whose columns form an ONB for $C(X)$ and $R$ is a $p \times p$ upper triangular matrix (i.e., all elements below the diagonal are 0). Show that $P_X$ may be expressed as a function of $Q$ alone. Show that the normal equations reduce to solving the triangular system $R \beta = Z$ where $Z = Q^T Y$. Because $R$ is upper triangular, show that $\hat{\beta}$ may be obtained by back-solving, thus avoiding the matrix inverse of $X^TX$. (See the R sketch after this list.)

    c. Any symmetric positive definite matrix $A$ may be written via a Cholesky decomposition as $A = L L^T$ where $L$ is lower triangular. With $A = X^TX$ and $Z = X^TY$, show that the normal equations $L L^T \beta = Z$ can be solved via two triangular systems: first solve $L w = Z$ for $w$ by forward substitution, then solve $L^T \beta = w$ for $\hat{\beta}$ by back substitution, avoiding any matrix inversion. (See the R sketch after this list.)

    d. Use R to find $Q$ and $U$ for the matrices in Examples 1.0.1 and 1.0.2 in Christensen. Does $Q$ equal $U$? See the help pages via help(qr) and help(svd) for function documentation.
    (Try using R Markdown for writing up the solution!)

    e. Prove that the two projection matrices obtained by the SVD and QR methods are the same. (Review the theorems in the Christensen appendices about uniqueness of projections and cite appropriate results to show that they are the same.)
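
For concreteness, here is a minimal R sketch of the SVD route in part (a). It uses a small simulated design matrix `X` and response `y` as stand-ins (these are not the matrices from Christensen; any full-column-rank `X` would do):

```r
## Simulated stand-in data (assumed full column rank), not from Christensen
set.seed(1)
n <- 10; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- rnorm(n)

## SVD route: X = U %*% diag(d) %*% t(V)
s    <- svd(X)
U    <- s$u
P_X  <- U %*% t(U)                          # projection built from U alone
yhat <- P_X %*% y                           # equivalently U %*% (t(U) %*% y)
beta_svd <- s$v %*% ((t(U) %*% y) / s$d)    # V Lambda^{-1} U^T y: only a diagonal "inverse"
```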
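
A corresponding sketch of the QR route in part (b), continuing with the same `X` and `y`:

```r
## QR route: X = Q %*% R with Q orthonormal, R upper triangular
qrX <- qr(X)
Q   <- qr.Q(qrX)                 # n x p with orthonormal columns
R   <- qr.R(qrX)                 # p x p upper triangular
P_X_qr  <- Q %*% t(Q)            # should match U %*% t(U) up to rounding
z       <- t(Q) %*% y
beta_qr <- backsolve(R, z)       # back-solve R beta = z; no explicit inverse
```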
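
And a sketch of the Cholesky route in part (c), again with the same `X` and `y`. Note that base R's chol() returns the upper triangular factor, so $L$ is its transpose:

```r
## Cholesky route on the normal equations (X^T X) beta = X^T y
A  <- crossprod(X)                   # X^T X (symmetric positive definite here)
Z  <- crossprod(X, y)                # X^T y
Uc <- chol(A)                        # upper triangular with A = t(Uc) %*% Uc, so L = t(Uc)
w  <- forwardsolve(t(Uc), Z)         # forward substitution: L w = Z
beta_chol <- backsolve(Uc, w)        # back substitution: L^T beta = w
```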

Note: The Cholesky method is the fastest at $O(n p^2 + p^3/3)$ floating point operations (flops), but is numerically unstable if the matrix is poorly conditioned. R uses the QR method ($O(2 n p^2 - 2p^3/3)$ flops) in the function lm.fit(), which is the workhorse underneath the lm() function. The SVD method is the most expensive at $O(2 n p^2 + 11 p^3)$ flops, but can handle the rank deficient case. There are generalized Cholesky and QR methods for the rank deficient case that involve pivoting the columns of $X$.
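
As a quick sanity check (continuing with the simulated `X` and `y` from the sketches above), the QR-based lm.fit() should agree with all three routes up to numerical error:

```r
## All four sets of coefficients should agree to rounding error
fit <- lm.fit(X, y)
cbind(svd = drop(beta_svd), qr = drop(beta_qr),
      chol = drop(beta_chol), lm_fit = fit$coefficients)
```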

Review Chapter 1 and Appendix A (Vector Spaces) in Plane Answers to Complex Questions

Review Material from the StatSci Bootcamp