HW 5 Solutions

STA 211 Spring 2023 (Jiang)

Exercise 1

Suppose \(\mathbf{x}\) is a \(k\)-vector and \(\mathbf{A}\) is a fixed \(k \times k\) matrix (non-random). What is the expectation of the quadratic form \(\mathbf{x}^T\mathbf{Ax}\)? You may assume \(E(\mathbf{x})\) and \(Cov(\mathbf{x})\) exist.

\[\begin{align*} E(\mathbf{x}^T\mathbf{Ax}) &= E(trace(\mathbf{x}^T\mathbf{Ax}))\\ &= E(trace(\mathbf{Ax}\mathbf{x}^T))\\ &= trace(\mathbf{A}E(\mathbf{xx}^T)) \end{align*}\]

Note that \(Cov(\mathbf{x}) = E(\mathbf{xx}^T) - E(\mathbf{x})E(\mathbf{x})^T\), so \(E(\mathbf{xx}^T) = Cov(\mathbf{x}) + E(\mathbf{x})E(\mathbf{x})^T\). Substituting:

\[\begin{align*} &= trace(\mathbf{A}Cov(\mathbf{x})) + trace(\mathbf{A}E(\mathbf{x})E(\mathbf{x})^T)\\ &= trace(\mathbf{A}Cov(\mathbf{x})) + E(\mathbf{x}^T)\mathbf{A}E(\mathbf{x}). \end{align*}\]
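The identity can be sanity-checked numerically. Below is a minimal Monte Carlo sketch (the particular \(\mathbf{A}\), mean, and covariance are arbitrary choices, not from the exercise): draw many \(\mathbf{x}\) with known moments and compare the sample average of \(\mathbf{x}^T\mathbf{Ax}\) to \(trace(\mathbf{A}Cov(\mathbf{x})) + E(\mathbf{x})^T\mathbf{A}E(\mathbf{x})\).

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3

# Fixed (non-random) matrix A and known moments for x.
A = rng.standard_normal((k, k))
mu = rng.standard_normal(k)            # E(x)
L = rng.standard_normal((k, k))
Sigma = L @ L.T                        # Cov(x), positive semi-definite

# Draw many x with mean mu and covariance Sigma (Gaussian for convenience).
n = 200_000
x = rng.multivariate_normal(mu, Sigma, size=n)

# Monte Carlo estimate of E(x^T A x).
mc = np.mean(np.einsum("ni,ij,nj->n", x, A, x))

# Closed form: trace(A Cov(x)) + E(x)^T A E(x).
closed = np.trace(A @ Sigma) + mu @ A @ mu

print(mc, closed)  # the two agree up to Monte Carlo error
```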

Exercise 2

Suppose the assumptions on Slide 6 hold. What is the variance of the \(i^{th}\) residual? If we had a “good” estimate of \(\sigma^2\), which we’ll call \(\widehat{\sigma}^2\), the \(i^{th}\) standardized residual might then be defined by \(y_i - \widehat{y}_i\) divided by the square root of the answer to this exercise. Use this intuition to develop a notion of an outlier (no hard numbers required, just an idea).

Note from Ex. 3 that \(Cov(\widehat{\boldsymbol\epsilon}) = \sigma^2(\mathbf{I} - \mathbf{H})\). Thus, the variance of the \(i^{th}\) residual is given by the \(i^{th}\) diagonal entry of this matrix, \(\sigma^2(1 - H_{ii})\). An observation might then be flagged as an outlier if the quantity

\[\begin{align*} \frac{y_i - \widehat{y}_i}{\widehat{\sigma}\sqrt{1 - H_{ii}}} \end{align*}\]

is sufficiently large in magnitude (for instance, if it is above some percentile of some t-distribution).
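This recipe is easy to demonstrate on simulated data. The sketch below (design, coefficients, and the cutoff of 3 are all illustrative choices, not prescribed by the exercise) plants one contaminated response, computes standardized residuals with \(\widehat{\sigma}^2 = \|\widehat{\boldsymbol\epsilon}\|^2/(n-p)\), and flags large ones:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
sigma = 2.0

# Simulated design and response under the linear model y = X beta + eps.
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=sigma, size=n)
y[0] += 15.0                       # plant an outlier in the first response

# Hat matrix H = X (X^T X)^{-1} X^T, fitted values, and residuals.
H = X @ np.linalg.solve(X.T @ X, X.T)
resid = y - H @ y

# Estimate sigma^2, then form standardized residuals.
sigma2_hat = resid @ resid / (n - p)
std_resid = resid / np.sqrt(sigma2_hat * (1 - np.diag(H)))

# Flag observations whose standardized residual is large in magnitude.
outliers = np.flatnonzero(np.abs(std_resid) > 3)
print(outliers)
```

The planted observation is flagged; a more careful rule would compare each standardized residual to a t-distribution quantile, as suggested above.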

Exercise 3

What are the expectation and covariance of the residuals?

\[\begin{align*} E(\widehat{\boldsymbol\epsilon}) &= E(\mathbf{y} - \mathbf{X}\widehat{\boldsymbol\beta})\\ &= E(\mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon - \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y})\\ &= E(\mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon - \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon))\\ &= E(\mathbf{X}\boldsymbol\beta) + \mathbf{0} - E(\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon))\\ &= E(\mathbf{X}\boldsymbol\beta) - E(\mathbf{X}\boldsymbol\beta) - \underbrace{E(\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol\epsilon)}_{\mathbf{0}}\\ &= E(\mathbf{X}\boldsymbol\beta) - E(\mathbf{X}\boldsymbol\beta) = \mathbf{0}.\\ Cov(\widehat{\boldsymbol\epsilon}) &= Cov(\mathbf{y} - \mathbf{X}\widehat{\boldsymbol\beta})\\ &= Cov(\mathbf{y} - \underbrace{\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T}_{\mathbf{H}}\mathbf{y})\\ &= Cov((\mathbf{I} - \mathbf{H})\mathbf{y})\\ &= (\mathbf{I} - \mathbf{H})Cov(\mathbf{y})(\mathbf{I} - \mathbf{H})^T\\ &= (\mathbf{I} - \mathbf{H})Cov(\mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon)(\mathbf{I} - \mathbf{H})^T\\ &= (\mathbf{I} - \mathbf{H})Cov(\boldsymbol\epsilon)(\mathbf{I} - \mathbf{H})^T\\ &= (\mathbf{I} - \mathbf{H})\sigma^2\mathbf{I}(\mathbf{I} - \mathbf{H})^T\\ &= \sigma^2(\mathbf{I} - \mathbf{H})(\mathbf{I} - \mathbf{H})^T\\ &= \sigma^2(\mathbf{I} - \mathbf{H}) \end{align*}\]

The last step uses the fact that \(\mathbf{I} - \mathbf{H}\) is symmetric and idempotent (shown in an earlier homework).
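Both facts used here are easy to verify numerically. The sketch below (with an arbitrary simulated design, not from the assignment) checks that \(\mathbf{I} - \mathbf{H}\) is symmetric and idempotent, and that the empirical covariance of the residuals over many simulated draws of \(\boldsymbol\epsilon\) matches \(\sigma^2(\mathbf{I} - \mathbf{H})\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 3
sigma = 1.5

X = rng.standard_normal((n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H                      # I - H

# I - H is symmetric and idempotent, so M Cov(eps) M^T = sigma^2 M.
assert np.allclose(M, M.T) and np.allclose(M @ M, M)

# Monte Carlo check: empirical covariance of residuals vs sigma^2 (I - H).
beta = rng.standard_normal(p)
reps = 100_000
eps = rng.normal(scale=sigma, size=(reps, n))
resid = (X @ beta + eps) @ M.T         # each row is one draw of eps-hat
emp_cov = np.cov(resid, rowvar=False)
print(np.max(np.abs(emp_cov - sigma**2 * M)))  # small
```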