Simple Gauss-Markov: Let \(E[y] = X \beta\) and \(V[y] = \sigma^2 I\) for some known \(X\in \mathbb R^{n\times p}\) of full column rank, unknown \(\beta\in \mathbb R^{p}\) and unknown \(\sigma^2>0\).
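A minimal numpy sketch of this setup, with hypothetical data (the design, coefficients, and noise level below are illustrative choices, not part of the exercise): simulate \(y\) with \(E[y]=X\beta\), \(V[y]=\sigma^2 I\), and compute the OLS estimator \((X^\top X)^{-1}X^\top y\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))    # known design; full column rank a.s.
beta = np.array([1.0, -2.0, 0.5])  # unknown in the model; fixed here to simulate
sigma = 0.3
y = X @ beta + sigma * rng.standard_normal(n)  # E[y] = X beta, V[y] = sigma^2 I

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS: (X'X)^{-1} X'y
```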
Linear estimators of linear estimands: Consider a linear model \(E[y]=X\beta, \beta\in \mathbb R^p\) with \(V[y]=\sigma^2 I, \ \sigma^2>0\).
OLS and GLS: Let \(V\) be an \(n\times n\) positive definite covariance matrix, let \(\hat\beta_V = (X^\top V^{-1} X)^{-1} X^\top V^{-1} y\), and let \(\hat\beta\) be the OLS estimator. Show that \(\hat\beta_V=\hat\beta\) for all \(y\) if and only if \(V\) can be written \[ V= X\Psi X^\top + H \Phi H^\top \] for some \(H\) such that \(H^\top X = 0\) and some positive definite matrices \(\Psi\) and \(\Phi\) of the appropriate dimensions.
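One direction of the claim can be checked numerically: build a \(V\) of the stated form (here \(H\) comes from a complete QR decomposition of \(X\), so \(H^\top X=0\), and \(\Psi,\Phi\) are random positive definite matrices; all concrete choices are illustrative) and confirm that GLS and OLS coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
X = rng.standard_normal((n, p))

# Columns of H span the orthogonal complement of col(X), so H'X = 0.
Q, _ = np.linalg.qr(X, mode='complete')
H = Q[:, p:]                        # n x (n-p)

def random_pd(k, rng):
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)  # positive definite by construction

Psi, Phi = random_pd(p, rng), random_pd(n - p, rng)
V = X @ Psi @ X.T + H @ Phi @ H.T   # the claimed form; positive definite

y = rng.standard_normal(n)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
Vi = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)
# beta_gls agrees with beta_ols for any y when V has this form
```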
Centering: Most regression models used in practice include an “intercept” term, that is, a column of ones in the design matrix, whose coefficient represents the expected outcome when the values in the other columns are zero. Such a model can alternatively be written as \(y_{i} = \alpha + x_i^\top \beta + \epsilon_i\), or in matrix form as \(y= \alpha 1_n + X \beta + \epsilon\), so \(\alpha\) would be the expected value of \(y_i\) if \(x_i\) were zero. In the remainder of this exercise, assume \(E[ y] = 1_n \alpha + X\beta\) and \(V[y] = \sigma^2 I_n\).
Obtain an expression for the OLS estimator \((\hat\alpha,\hat\beta)\) of the \((p+1)\)-vector \((\alpha, \beta)\) in terms of \(X\), \(y\), and \(\bar x = X^\top 1_n/n\).
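Once you have a candidate expression, it can be checked numerically against a direct least-squares fit with an explicit column of ones. The sketch below uses the standard centered-regression identity as the candidate; the data are hypothetical, and this check is of course not a substitute for the derivation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 2
X = rng.standard_normal((n, p))
y = 1.5 + X @ np.array([2.0, -1.0]) + 0.1 * rng.standard_normal(n)

# Direct fit of (alpha, beta) with an explicit intercept column.
Z = np.column_stack([np.ones(n), X])
ab = np.linalg.lstsq(Z, y, rcond=None)[0]
alpha_hat, beta_hat = ab[0], ab[1:]

# Candidate closed form via centering:
xbar, ybar = X.mean(axis=0), y.mean()
Xc = X - xbar                              # columns centered at xbar
beta_cand = np.linalg.solve(Xc.T @ Xc, Xc.T @ (y - ybar))
alpha_cand = ybar - xbar @ beta_cand
```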
Obtain an expression for the orthogonal projection matrix onto the one-dimensional linear subspace spanned by the vector \(1_n\), and also find the complementary projection matrix onto the orthogonal complement of this subspace, i.e., the null space of \(1_n^\top\). Call this latter matrix “\(C\)” and let \(y_c = C y\). What linear model does \(y_c\) follow, i.e., what are the mean and variance of \(y_c\)?
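These two projections are small enough to write out directly; a brief numpy sketch (with an arbitrary \(y\) for illustration) confirms that both are idempotent and that \(C\) is the familiar centering matrix:

```python
import numpy as np

n = 6
one = np.ones((n, 1))
P = one @ one.T / n    # orthogonal projection onto span(1_n)
C = np.eye(n) - P      # complementary projection: the centering matrix

y = np.arange(1.0, n + 1)
yc = C @ y             # y_c: deviations of y from its sample mean
```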
Let \(H\) be an \(n\times (n-1)\) matrix whose columns are an orthonormal basis for the null space of \(1_n^\top\). Show that \(H H^\top = C\).
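One concrete way to build such an \(H\) (an illustrative choice, not the only one) is to take the trailing columns of a complete QR decomposition of \(1_n\); the identity \(HH^\top = C\) can then be verified numerically:

```python
import numpy as np

n = 5
one = np.ones((n, 1))
# Complete QR of 1_n: the first column of Q spans span(1_n), the rest
# form an orthonormal basis for its orthogonal complement.
Q, _ = np.linalg.qr(one, mode='complete')
H = Q[:, 1:]                        # n x (n-1), orthonormal columns, 1'H = 0

C = np.eye(n) - one @ one.T / n     # centering matrix
# H @ H.T reproduces C
```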
Let \(y_h = H^\top y\). What linear model does \(y_h\) follow, i.e., what are the mean and variance of \(y_h\)?
Find the OLS estimator \(\hat\beta_c\) of \(\beta\) based on \(y_c\) and the OLS estimator \(\hat\beta_h\) of \(\beta\) based on \(y_h\). Is \(\hat\beta_h\) the BLUE among estimators based on \(y_h\)? Is \(\hat\beta_c\) the BLUE among estimators based on \(y_c\)?
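A numerical scratchpad for experimenting with these estimators (simulated data and all specific choices are hypothetical, and computing them this way does not replace the analysis asked for above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 2
one = np.ones((n, 1))
X = rng.standard_normal((n, p))
y = 0.7 + X @ np.array([1.0, 2.0]) + 0.2 * rng.standard_normal(n)

C = np.eye(n) - one @ one.T / n     # centering projection
Q, _ = np.linalg.qr(one, mode='complete')
H = Q[:, 1:]                        # orthonormal basis of the null space of 1'

# Full fit with an explicit intercept; keep only the slope part.
Z = np.column_stack([one, X])
beta_full = np.linalg.lstsq(Z, y, rcond=None)[0][1:]

# beta_c: OLS of y_c = Cy on the centered design CX.
Xc, yc = C @ X, C @ y
beta_c = np.linalg.lstsq(Xc, yc, rcond=None)[0]

# beta_h: OLS of y_h = H'y on the (n-1) x p design H'X.
Xh, yh = H.T @ X, H.T @ y
beta_h = np.linalg.solve(Xh.T @ Xh, Xh.T @ yh)
```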
Describe any differences among \(\hat\beta\), \(\hat\beta_c\) and \(\hat\beta_h\). Is any precision in estimating \(\beta\) lost by using \(y_c\) or \(y_h\), instead of the “full” data \(y\)?