HW 1 Solutions

STA 211 Spring 2023 (Jiang)

Exercise 1

Show that \(\nabla\left(\mathbf{x}^T\mathbf{A}\mathbf{x}\right) = 2\mathbf{A}\mathbf{x}\) for a \(k\)-vector \(\mathbf{x}\) and symmetric \(k\times k\) matrix \(\mathbf{A}\).

Let \(A_{ij}\) be the element of the matrix \(\mathbf{A}\) in the \(i^{th}\) row and \(j^{th}\) column and let \(x_i\) be the \(i^{th}\) element of the vector \(\mathbf{x}\). Write out the matrix product in scalar form:

\[\begin{align*} \mathbf{x}^T\mathbf{A}\mathbf{x} &= \sum_{i = 1}^k\sum_{j = 1}^k x_iA_{ij}x_j \\ &= \sum_{i = 1}^k \left(A_{ii}x_i^2 + \sum_{j \neq i} x_iA_{ij}x_j\right) \end{align*}\]

Consider taking the partial derivative with respect to an arbitrary \(x_u\). The diagonal term \(A_{uu}x_u^2\) contributes \(2A_{uu}x_u\), and in the inner sum the partial derivative is nonzero in exactly two cases: when \(x_u\) appears “on the left” of \(A_{ij}\) (i.e., \(i = u\)) and when it appears “on the right” (i.e., \(j = u\)). Since scalar multiplication is commutative and \(\mathbf{A}\) is symmetric (i.e., \(A_{ij} = A_{ji}\)), the partial derivative is the following scalar expression:

\[\begin{align*} \frac{\partial}{\partial x_u} \left( \sum_{i = 1}^k \left(A_{ii}x_i^2 + \sum_{j \neq i} x_iA_{ij}x_j\right)\right) &= 2A_{uu}x_u + \sum_{i \neq u}x_iA_{iu} + \sum_{i \neq u}A_{ui}x_i\\ &= \sum_{i = 1}^k x_iA_{iu} + \sum_{i = 1}^k A_{ui}x_i\\ &= 2\sum_{i = 1}^k A_{ui}x_i \end{align*}\]

Thus, arranging these \(k\) partial derivatives into a vector, we have:

\[\begin{align*} \nabla \sum_{i = 1}^k\sum_{j = 1}^k x_iA_{ij}x_j &= \begin{bmatrix} 2\sum_{i = 1}^k A_{1i}x_i\\ \vdots\\ 2\sum_{i = 1}^k A_{ki}x_i \end{bmatrix} = 2\mathbf{Ax} \end{align*}\]

as desired.
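Although not part of the required derivation, the identity is easy to sanity-check numerically. The sketch below (illustrative only, assuming numpy, a small random symmetric \(\mathbf{A}\), and a random \(\mathbf{x}\)) compares a central finite-difference approximation of the gradient against \(2\mathbf{A}\mathbf{x}\).

```python
import numpy as np

# Illustrative check only, not part of the required derivation.
rng = np.random.default_rng(0)
k = 5

B = rng.standard_normal((k, k))
A = (B + B.T) / 2           # random symmetric k x k matrix
x = rng.standard_normal(k)

def f(v):
    """Quadratic form v^T A v."""
    return v @ A @ v

# Central finite-difference approximation of the gradient
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * np.eye(k)[i]) - f(x - eps * np.eye(k)[i])) / (2 * eps)
    for i in range(k)
])

print(np.allclose(grad_fd, 2 * A @ x, atol=1e-5))  # expected: True
```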

Exercise 2

Show that \(\mathbf{H}\) and \(\mathbf{I} - \mathbf{H}\) are symmetric (i.e., \(\mathbf{H}^T = \mathbf{H}\), etc.) and idempotent (i.e., \(\mathbf{H}^2 = \mathbf{H}\), etc.).

Symmetry of \(\mathbf{H}\):

\[\begin{align*} \mathbf{H}^T &= \left(\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \right)^T \\ &= \left(\mathbf{X}^T\right)^T\left(\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\right)^T \\ &= \mathbf{X} \left( \left( \left( \mathbf{X}^T\mathbf{X}\right)^{-1} \right)^T \mathbf{X}^T\right)\\ &= \mathbf{X}\left( \left( \mathbf{X}^T\mathbf{X}\right)^T \right)^{-1}\mathbf{X}^T\\ &= \mathbf{X}\left( \mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T = \mathbf{H} \end{align*}\]

Symmetry of \(\mathbf{I} - \mathbf{H}\) follows immediately, since the difference of two symmetric matrices is symmetric: \(\left(\mathbf{I} - \mathbf{H}\right)^T = \mathbf{I}^T - \mathbf{H}^T = \mathbf{I} - \mathbf{H}\).

Idempotency of \(\mathbf{H}\):

\[\begin{align*} \mathbf{H}^2 &= \left(\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \right)\left(\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \right)\\ &= \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\left(\mathbf{X}^T\mathbf{X}\right)\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\\ &= \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T = \mathbf{H} \end{align*}\]

Idempotency of \(\mathbf{I} - \mathbf{H}\):

\[\begin{align*} (\mathbf{I} - \mathbf{H})^2 &= \left(\mathbf{I} - \mathbf{H} \right)\left(\mathbf{I} - \mathbf{H} \right)\\ &= \mathbf{I}^2 - 2\mathbf{H} + \mathbf{H}^2\\ &= \mathbf{I} - 2\mathbf{H} + \mathbf{H}\\ &= \mathbf{I} - \mathbf{H} \end{align*}\]

using the idempotency of \(\mathbf{H}\) shown above.
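All four properties can also be checked numerically. The following sketch (illustrative only, assuming numpy and a simulated full-rank design matrix \(\mathbf{X}\)) builds \(\mathbf{H}\) directly and verifies symmetry and idempotency of both \(\mathbf{H}\) and \(\mathbf{I} - \mathbf{H}\) to machine precision.

```python
import numpy as np

# Illustrative check only; X is a simulated full-rank design matrix.
rng = np.random.default_rng(1)
n, p = 20, 4

X = rng.standard_normal((n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
I_n = np.eye(n)

print(np.allclose(H, H.T))                             # H symmetric
print(np.allclose(I_n - H, (I_n - H).T))               # I - H symmetric
print(np.allclose(H @ H, H))                           # H idempotent
print(np.allclose((I_n - H) @ (I_n - H), I_n - H))     # I - H idempotent
```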

Exercise 3

Instead of the MSE, suppose we wanted to minimize the following function with respect to \(\boldsymbol{\beta}\), for some scalar \(\lambda > 0\) (assuming full rank \(\mathbf{X}\)): \[\begin{align*} \left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right) + \lambda \boldsymbol{\beta}^T\boldsymbol{\beta}. \end{align*}\] Is there an analytical solution to this objective function? If so, provide the solution and demonstrate that it indeed minimizes the objective function. Otherwise, explain why not.

Taking the gradient:

\[\begin{align*} \nabla\left(\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right) + \lambda \boldsymbol{\beta}^T\boldsymbol{\beta}\right) &= \underbrace{-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}}_{\mathrm{from}\:\mathrm{class}} + \lambda \nabla \boldsymbol{\beta}^T\boldsymbol{\beta}\\ &= -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} + 2\lambda \boldsymbol{\beta} \end{align*}\]

Setting the gradient equal to \(\mathbf{0}\) and solving for a candidate solution \(\widehat{\boldsymbol{\beta}}\) (the inverse below exists because \(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I}\) is positive definite, as shown at the end):

\[\begin{align*} \mathbf{0} &\stackrel{set}{=} -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\widehat{\boldsymbol{\beta}} + 2\lambda \widehat{\boldsymbol{\beta}}\\ \mathbf{X}^T\mathbf{y} &= \left(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I}\right)\widehat{\boldsymbol{\beta}}\\ \left(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I}\right)^{-1}\mathbf{X}^T\mathbf{y} &= \widehat{\boldsymbol{\beta}} \end{align*}\]
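As a quick numerical sanity check (illustrative only, with simulated \(\mathbf{X}\) and \(\mathbf{y}\) and an arbitrary \(\lambda\)), the candidate solution can be computed in closed form and the gradient evaluated at it should be numerically zero:

```python
import numpy as np

# Illustrative check only; X and y are simulated.
rng = np.random.default_rng(2)
n, p, lam = 50, 3, 0.5

X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Candidate closed-form solution: (X^T X + lam I)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Gradient of the objective, evaluated at beta_hat
grad = -2 * X.T @ y + 2 * X.T @ X @ beta_hat + 2 * lam * beta_hat
print(np.allclose(grad, np.zeros(p), atol=1e-10))  # expected: True
```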

The Hessian is as follows:

\[\begin{align*} \nabla\left(-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} + 2\lambda \boldsymbol{\beta}\right) &= 2\nabla \left(\left(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I} \right) \boldsymbol{\beta}\right)\\ &= 2\left(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I} \right) \end{align*}\]

which is positive definite, since for any non-zero \(\mathbf{z}\),

\[\begin{align*} \mathbf{z}^T\left(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I}\right)\mathbf{z} &= \mathbf{z}^T\mathbf{X}^T\mathbf{X}\mathbf{z} + \lambda \mathbf{z}^T\mathbf{z}\\ &= \left(\mathbf{Xz}\right)^T\mathbf{Xz} + \lambda \mathbf{z}^T\mathbf{z} > 0 \end{align*}\]

The first term is \(\left(\mathbf{Xz}\right)^T\mathbf{Xz} = \|\mathbf{Xz}\|^2 \geq 0\) and the second is strictly positive because \(\lambda > 0\) and \(\mathbf{z} \neq \mathbf{0}\). The objective is therefore strictly convex, and \(\widehat{\boldsymbol{\beta}} = \left(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I}\right)^{-1}\mathbf{X}^T\mathbf{y}\) is its unique minimizer.
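A final numerical illustration (again not part of the proof, using the same simulated setup as above): the Hessian's eigenvalues are all strictly positive, and perturbing \(\widehat{\boldsymbol{\beta}}\) in random directions only increases the objective.

```python
import numpy as np

# Illustrative check only; X and y are simulated as in the previous sketch.
rng = np.random.default_rng(3)
n, p, lam = 50, 3, 0.5

X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def objective(b):
    """Penalized sum of squares (y - Xb)^T (y - Xb) + lam * b^T b."""
    r = y - X @ b
    return r @ r + lam * b @ b

# Hessian 2(X^T X + lam I) is positive definite: all eigenvalues > 0
hessian = 2 * (X.T @ X + lam * np.eye(p))
print(np.all(np.linalg.eigvalsh(hessian) > 0))     # expected: True

# Random perturbations of beta_hat always increase the objective
deltas = 0.1 * rng.standard_normal((200, p))
print(all(objective(beta_hat + d) > objective(beta_hat) for d in deltas))  # expected: True
```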