1. GLS shrinkage: Let \(y|\beta\sim N_n(X\beta,\sigma^2 V)\). Find the conditional distribution of \(\beta\) given \(y\) under the prior distribution \(\beta\sim N(\beta_0,\Psi)\). In the case that \(\beta_0=0\), compare the posterior expectation of \(\beta\) to the GLS estimator.
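
    As a numerical companion, here is a minimal sketch (Python/NumPy; the standard conjugate-normal posterior formula, an AR(1)-style \(V\), and all dimensions are illustrative assumptions, not part of the exercise) comparing the posterior mean with the GLS estimator as the prior variance grows:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, p, s2 = 50, 3, 1.0
    X = rng.standard_normal((n, p))
    # a simple positive-definite V (AR(1)-style correlation), chosen for illustration
    V = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Vi = np.linalg.inv(V)
    y = rng.multivariate_normal(X @ np.array([1.0, -2.0, 0.5]), s2 * V)

    # GLS estimator: (X' V^{-1} X)^{-1} X' V^{-1} y
    beta_gls = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)

    def post_mean(beta0, Psi):
        """Posterior mean under the conjugate-normal formula (assumed, not derived):
        E[beta|y] = (Psi^{-1} + X'V^{-1}X/s2)^{-1} (Psi^{-1} beta0 + X'V^{-1}y/s2)."""
        Pi = np.linalg.inv(Psi)
        return np.linalg.solve(Pi + X.T @ Vi @ X / s2, Pi @ beta0 + X.T @ Vi @ y / s2)

    # with beta0 = 0, the posterior mean shrinks the GLS fit toward zero and
    # recovers beta_gls as the prior variance tau2 grows
    for tau2 in [0.1, 1.0, 100.0]:
        print(tau2, post_mean(np.zeros(p), tau2 * np.eye(p)), beta_gls)
    ```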

  2. \(g\)-prior: Consider the estimator \(\hat\beta_g = \hat \beta_{OLS} \times g/(g+1)\), that is, the Bayes estimator under the \(g\)-prior. Compute the bias, variance and MSE of this estimator as a function of \(g\) and \(\beta\). Sketch the MSE as a function of \(g\), for fixed \(\beta\). For what values of \(g\) is the MSE of \(\hat\beta_g\) lower than that of \(\hat\beta_{OLS}\)? For fixed \(g\), for what values of \(\beta\) is the MSE of \(\hat\beta_g\) lower than that of \(\hat\beta_{OLS}\)?
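
    A small sketch (Python; a scalar \(\hat\beta_{OLS}\sim N(\beta, v)\) is assumed purely for illustration) that traces the squared-bias/variance trade-off over \(g\); the condition in the final comment follows directly from the bias and variance formulas:

    ```python
    import numpy as np

    # scalar sketch: beta_hat_OLS ~ N(beta, v); beta_g = beta_hat_OLS * g/(g+1)
    def mse_g(g, beta, v):
        c = g / (g + 1.0)
        return (1.0 - c) ** 2 * beta**2 + c**2 * v   # squared bias + variance

    beta, v = 1.5, 1.0
    gs = np.linspace(0.01, 50.0, 1000)
    mse = mse_g(gs, beta, v)
    print("OLS MSE:", v)
    print("min MSE %.3f at g = %.2f" % (mse.min(), gs[np.argmin(mse)]))

    # MSE(beta_g) < MSE(OLS) = v  iff  beta^2 < v*(2g+1):
    g = 2.0
    print("at g = %.1f, shrinkage beats OLS iff |beta| < %.3f" % (g, np.sqrt(v * (2*g + 1))))
    ```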

  3. Invariance: Suppose researcher 1 will estimate \(\beta\) from \(y\) and \(X\) in the linear model \(y\sim N(X\beta,\sigma^2I)\), whereas researcher 2 will estimate \(\alpha\) in the linear model \(y\sim N(W\alpha,\sigma^2I)\), where \(W=XA\) for some non-singular matrix \(A\). Note that if \(a\in \mathbb R^p\) is the true value of \(\alpha\) in the second model, then \(A a\) is the true value of \(\beta\) in the first model.

    1. Find the OLS estimator \(\hat \beta\) of \(\beta\) based on \((y,X)\) and the OLS estimator \(\hat\alpha\) of \(\alpha\) based on \((y,W)\), and describe the relationship between the two estimators. Is \(\hat\beta = A \hat\alpha\)?
    2. Now both researchers use \(g\)-priors with the same value of \(g\) to obtain their estimators \(\hat \beta_g\) and \(\hat \alpha_g\). Describe the relationship between the estimators. Is \(\hat\beta_g = A \hat\alpha_g\)?
    3. Now both researchers use ridge priors with the same value of \(\lambda\) to obtain their estimators \(\hat \beta_\lambda\) and \(\hat \alpha_\lambda\). Describe the relationship between the estimators. Under what conditions is \(\hat\beta_\lambda = A \hat\alpha_\lambda\)? (A numerical check of all three parts appears after this list.)
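
    The check below (Python; the simulated design, the nonsingular \(A\), and the tuning values \(g\) and \(\lambda\) are arbitrary illustrations) runs all three comparisons numerically:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 40, 3
    X = rng.standard_normal((n, p))
    A = rng.standard_normal((p, p))                  # nonsingular reparametrization
    W = X @ A
    y = X @ np.array([1.0, -1.0, 2.0]) + rng.standard_normal(n)

    ols = lambda M: np.linalg.solve(M.T @ M, M.T @ y)
    ridge = lambda M, lam: np.linalg.solve(M.T @ M + lam * np.eye(p), M.T @ y)

    b_hat, a_hat = ols(X), ols(W)
    g, lam = 5.0, 2.0
    print(np.allclose(b_hat, A @ a_hat))                           # OLS: True
    print(np.allclose(b_hat * g/(g+1), A @ (a_hat * g/(g+1))))     # g-prior: True
    print(np.allclose(ridge(X, lam), A @ ridge(W, lam)))           # ridge: False in general
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))               # but True for orthogonal A
    print(np.allclose(ridge(X, lam), Q @ ridge(X @ Q, lam)))
    ```
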
  4. Group ridge regression: Consider Bayesian estimation of \((\alpha, \beta)\) in the linear model \(y\sim N(W\alpha+ X\beta, \sigma^2 I)\) under the prior distribution \(\alpha\sim N(0,\lambda_\alpha I)\), \(\beta\sim N(0,\lambda_\beta I)\), with \(\alpha\) and \(\beta\) being a priori independent.

    1. Find the posterior distribution of \((\alpha,\beta)\) and, using results on partitioned matrices, express the posterior mean more explicitly than in terms of the inverse of a single large matrix; that is, your answer should go beyond something like \(([\, W \ X \,]^\top [\, W \ X \,])^{-1}\). (A numerical check appears after this list.)
    2. Describe the Bayes estimates as a function of \(\lambda_\alpha\) for fixed \(\lambda_\beta\) and as a function of \(\lambda_\beta\) for fixed \(\lambda_\alpha\).
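
    As a check on the partitioned-matrix algebra, here is a sketch (Python; the dimensions and the values of \(\lambda_\alpha\), \(\lambda_\beta\), \(\sigma^2\) are arbitrary, and \(\sigma^2\) is treated as known) verifying a block solve via the Schur complement against the joint posterior-mean computation:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n, q, p = 60, 2, 3
    W = rng.standard_normal((n, q))
    X = rng.standard_normal((n, p))
    y = W @ rng.standard_normal(q) + X @ rng.standard_normal(p) + rng.standard_normal(n)
    la, lb, s2 = 4.0, 0.5, 1.0                       # prior variances and error variance

    # joint solve: posterior mean = (Z'Z/s2 + Lambda^{-1})^{-1} Z'y/s2
    Z = np.hstack([W, X])
    Li = np.diag(np.r_[np.full(q, 1/la), np.full(p, 1/lb)])
    m = np.linalg.solve(Z.T @ Z / s2 + Li, Z.T @ y / s2)

    # partitioned solve, eliminating alpha via the Schur complement
    A11 = W.T @ W / s2 + np.eye(q) / la
    A12 = W.T @ X / s2
    A22 = X.T @ X / s2 + np.eye(p) / lb
    b1, b2 = W.T @ y / s2, X.T @ y / s2
    S = A22 - A12.T @ np.linalg.solve(A11, A12)      # Schur complement of A11
    beta_hat = np.linalg.solve(S, b2 - A12.T @ np.linalg.solve(A11, b1))
    alpha_hat = np.linalg.solve(A11, b1 - A12 @ beta_hat)

    print(np.allclose(m, np.r_[alpha_hat, beta_hat]))  # True
    ```
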
  5. Empirical Bayes ridge regression: Consider the normal linear model \(y \sim N(X\beta, \sigma^2 I)\) where both \(\sigma^2\) and \(\beta\) are unknown, with the ridge prior distribution \(\beta\sim N(0,\tau^2 I)\).

    1. Find the distribution of the OLS estimator \(\hat\beta\) conditional on \(\beta\), and also its marginal distribution unconditional on \(\beta\).
    2. Obtain an estimator \(\hat\tau^2\) of \(\tau^2\) that is “Bayes unbiased”, that is, \(E[\hat \tau^2 ] =\tau^2\), where the expectation is over both the conditional distribution of \(y\) given \(\beta\) and the prior distribution of \(\beta\). From this, construct an estimate \(\hat\lambda\) of \(\lambda=\sigma^2/\tau^2\).
    3. Evaluate the performance of \(\tilde \beta = (X^\top X + \hat\lambda I)^{-1} X^\top y\) relative to the OLS estimator in terms of MSE using a simulation study. Include both a scenario where \(\tilde \beta\) performs better than \(\hat\beta_{OLS}\) and one where it performs worse; a minimal sketch follows below.
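
    A minimal simulation sketch along these lines (Python; the sample sizes, true coefficients, and the small positive clip applied to \(\hat\tau^2\) are ad hoc choices, and which scenario wins can shift with them):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    def mse_compare(beta, n=30, s2=1.0, reps=2000):
        """Monte Carlo MSE of OLS vs empirical Bayes ridge at a fixed beta."""
        p = beta.size
        X = rng.standard_normal((n, p))
        XtXi = np.linalg.inv(X.T @ X)
        err_ols = err_eb = 0.0
        for _ in range(reps):
            y = X @ beta + np.sqrt(s2) * rng.standard_normal(n)
            b_ols = XtXi @ (X.T @ y)
            s2_hat = np.sum((y - X @ b_ols) ** 2) / (n - p)
            # moment estimate from E||b_ols||^2 = p*tau2 + s2*tr((X'X)^{-1}),
            # clipped to stay positive (an ad hoc fix, not part of the exercise)
            tau2_hat = max((b_ols @ b_ols - s2_hat * np.trace(XtXi)) / p, 1e-6)
            lam_hat = s2_hat / tau2_hat
            b_eb = np.linalg.solve(X.T @ X + lam_hat * np.eye(p), X.T @ y)
            err_ols += np.sum((b_ols - beta) ** 2)
            err_eb += np.sum((b_eb - beta) ** 2)
        return err_ols / reps, err_eb / reps

    # (MSE_OLS, MSE_EB): shrinkage typically helps when beta is small...
    print("p=5, small beta   :", mse_compare(np.full(5, 0.2)))
    # ...and can hurt when p is small and the signal moderate, since
    # lambda_hat is then very noisy
    print("p=1, moderate beta:", mse_compare(np.array([0.4]), n=10))
    ```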