HW 1 Solutions
STA 211 Spring 2023 (Jiang)
Let \(A_{ij}\) be the element of the matrix \(\mathbf{A}\) in the \(i^{th}\) row and \(j^{th}\) column and let \(x_i\) be the \(i^{th}\) element of the vector \(\mathbf{x}\). Write out the matrix product in scalar form:
\[\begin{align*} \mathbf{x}^T\mathbf{A}\mathbf{x} &= \sum_{i = 1}^k\sum_{j = 1}^k x_iA_{ij}x_j \\ &= \sum_{i = 1}^k \left(A_{ii}x_i^2 + \sum_{j \neq i} x_iA_{ij}x_j\right) \end{align*}\]
Consider taking the partial derivative with respect to an arbitrary \(x_u\). The diagonal term contributes \(2A_{uu}x_u\), and in the off-diagonal sum the derivative survives in exactly two kinds of terms: those where \(x_u\) appears “on the left” of \(A_{ij}\) (i.e., \(i = u\)) and those where it appears “on the right” (i.e., \(j = u\)). Since scalar multiplication is commutative and \(\mathbf{A}\) is symmetric (i.e., \(A_{ij} = A_{ji}\)), the partial derivative is the following scalar expression:
\[\begin{align*} \frac{\partial}{\partial x_u} \left( \sum_{i = 1}^k \left(A_{ii}x_i^2 + \sum_{j \neq i} x_iA_{ij}x_j\right)\right) &= 2A_{uu}x_u + \sum_{i \neq u}x_iA_{iu} + \sum_{i \neq u}A_{ui}x_i\\ &= \sum_{i = 1}^k x_iA_{iu} + \sum_{i = 1}^k A_{ui}x_i\\ &= 2\sum_{i = 1}^k A_{ui}x_i \end{align*}\]
Thus, in arranging each of these \(k\) partial derivatives into a vector, we have:
\[\begin{align*} \nabla \sum_{i = 1}^k\sum_{j = 1}^k x_iA_{ij}x_j &= \begin{bmatrix} 2\sum_{i = 1}^k A_{1i}x_i\\ \vdots\\ 2\sum_{i = 1}^k A_{ki}x_i \end{bmatrix} = 2\mathbf{Ax} \end{align*}\]
as desired.
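As a quick numerical spot-check (separate from the derivation), the identity \(\nabla \mathbf{x}^T\mathbf{A}\mathbf{x} = 2\mathbf{Ax}\) can be compared against a finite-difference gradient. A minimal NumPy sketch, with a randomly generated symmetric \(\mathbf{A}\) and vector \(\mathbf{x}\) used purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative symmetric A and vector x.
k = 4
B = rng.standard_normal((k, k))
A = (B + B.T) / 2          # symmetrize so that A = A^T
x = rng.standard_normal(k)

def f(v):
    return v @ A @ v       # the quadratic form v^T A v

# Central finite-difference approximation of the gradient.
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(k)])

print(np.allclose(grad_fd, 2 * A @ x, atol=1e-6))  # True
```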
Symmetry of \(\mathbf{H}\):
\[\begin{align*} \mathbf{H}^T &= \left(\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \right)^T \\ &= \left(\mathbf{X}^T\right)^T\left(\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\right)^T \\ &= \mathbf{X} \left( \left( \left( \mathbf{X}^T\mathbf{X}\right)^{-1} \right)^T \mathbf{X}^T\right)\\ &= \mathbf{X}\left( \left( \mathbf{X}^T\mathbf{X}\right)^T \right)^{-1}\mathbf{X}^T\\ &= \mathbf{X}\left( \mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T = \mathbf{H} \end{align*}\]
Symmetry of \(\mathbf{I} - \mathbf{H}\) follows since \(\mathbf{I}\) and \(-\mathbf{H}\) are each symmetric, and the sum of two symmetric matrices is also symmetric: \(\left(\mathbf{I} - \mathbf{H}\right)^T = \mathbf{I}^T - \mathbf{H}^T = \mathbf{I} - \mathbf{H}\).
Idempotency of \(\mathbf{H}\):
\[\begin{align*} \mathbf{H}^2 &= \left(\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \right)\left(\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \right)\\ &= \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\left(\mathbf{X}^T\mathbf{X}\right)\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\\ &= \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T = \mathbf{H} \end{align*}\]
Idempotency of \(\mathbf{I} - \mathbf{H}\):
\[\begin{align*} (\mathbf{I} - \mathbf{H})^2 &= \left(\mathbf{I} - \mathbf{H} \right)\left(\mathbf{I} - \mathbf{H} \right)\\ &= \mathbf{I}^2 - 2\mathbf{H} + \mathbf{H}^2\\ &= \mathbf{I} - 2\mathbf{H} + \mathbf{H}\\ &= \mathbf{I} - \mathbf{H} \end{align*}\]
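All four properties are straightforward to spot-check numerically. A minimal NumPy sketch, assuming a randomly generated design matrix \(\mathbf{X}\) with full column rank:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative full-column-rank design matrix.
n, p = 20, 3
X = rng.standard_normal((n, p))

H = X @ np.linalg.inv(X.T @ X) @ X.T  # the hat matrix
I = np.eye(n)

print(np.allclose(H, H.T))                    # H is symmetric
print(np.allclose(I - H, (I - H).T))          # I - H is symmetric
print(np.allclose(H @ H, H))                  # H is idempotent
print(np.allclose((I - H) @ (I - H), I - H))  # I - H is idempotent
```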
Taking the gradient:
\[\begin{align*} \nabla\left(\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right) + \lambda \boldsymbol{\beta}^T\boldsymbol{\beta}\right) &= \underbrace{-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}}_{\mathrm{from}\:\mathrm{class}} + \lambda \nabla\left(\boldsymbol{\beta}^T\boldsymbol{\beta}\right)\\ &= -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} + 2\lambda \boldsymbol{\beta} \end{align*}\]
Note that \(\nabla\left(\boldsymbol{\beta}^T\boldsymbol{\beta}\right) = 2\boldsymbol{\beta}\) follows from the first problem with \(\mathbf{A} = \mathbf{I}\). Setting the gradient equal to \(\mathbf{0}\) and solving for a potential solution:
\[\begin{align*} \mathbf{0} &\stackrel{set}{=} -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\widehat{\boldsymbol{\beta}} + 2\lambda \widehat{\boldsymbol{\beta}}\\ \mathbf{X}^T\mathbf{y} &= \left(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I}\right)\widehat{\boldsymbol{\beta}}\\ \left(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I}\right)^{-1}\mathbf{X}^T\mathbf{y} &= \widehat{\boldsymbol{\beta}} \end{align*}\]
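As a sanity check, the ridge solution can be computed with np.linalg.solve (preferable to forming the inverse explicitly) and verified to make the gradient vanish. A minimal NumPy sketch with randomly generated data, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data; lam plays the role of the penalty lambda > 0.
n, p, lam = 50, 4, 0.5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Solve (X^T X + lam * I) beta_hat = X^T y, as derived above.
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The gradient of the penalized criterion should vanish at beta_hat.
grad = -2 * X.T @ y + 2 * X.T @ X @ beta_hat + 2 * lam * beta_hat
print(np.allclose(grad, 0))  # True
```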
The Hessian is as follows:
\[\begin{align*} \nabla\left(-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} + 2\lambda \boldsymbol{\beta}\right) &= \nabla\left(2\left(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I} \right) \boldsymbol{\beta}\right)\\ &= 2\left(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I} \right) \end{align*}\]
which is positive definite since, for any non-zero \(\mathbf{z}\) and \(\lambda > 0\),
\[\begin{align*} \mathbf{z}^T\left(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I}\right)\mathbf{z} &= \mathbf{z}^T\mathbf{X}^T\mathbf{X}\mathbf{z} + \lambda \mathbf{z}^T\mathbf{z}\\ &= \left(\mathbf{Xz}\right)^T\mathbf{Xz} + \lambda \mathbf{z}^T\mathbf{z} > 0 \end{align*}\]
where the first term is \(\left\lVert\mathbf{Xz}\right\rVert^2 \geq 0\) and the second is \(\lambda\left\lVert\mathbf{z}\right\rVert^2 > 0\). Positive definiteness also ensures that \(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I}\) is invertible, so \(\widehat{\boldsymbol{\beta}}\) above is well-defined, and we have found the unique minimizing solution.
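As a numerical spot-check, positive definiteness of the Hessian can be confirmed by verifying that every eigenvalue is strictly positive (equivalently, that a Cholesky factorization succeeds). A minimal NumPy sketch, with randomly generated \(\mathbf{X}\) and an arbitrary penalty \(\lambda > 0\) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative X and penalty; the Hessian derived above is 2(X^T X + lam * I).
n, p, lam = 50, 4, 0.5
X = rng.standard_normal((n, p))
hessian = 2 * (X.T @ X + lam * np.eye(p))

# All eigenvalues strictly positive <=> positive definite.
print(np.all(np.linalg.eigvalsh(hessian) > 0))  # True

# Cholesky succeeds only for (numerically) positive definite matrices.
np.linalg.cholesky(hessian)  # raises LinAlgError if not positive definite
```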