1. Multigroup regression: Consider the multigroup regression model \(y_{i,j} = u_{i}^\top(\beta +\alpha_j)+\epsilon_{i,j}\), \(i=1,\ldots,r\), \(j=1,\ldots,m\), where \(u_1,\ldots, u_r,\beta, \alpha_1,\ldots,\alpha_m\) are all \(p\)-dimensional vectors and the \(\epsilon_{i,j}\)’s are uncorrelated random variables with mean zero and variance \(\sigma^2\). Note that the model for the data in group \(j\) can be written as \(E[y_j] = U (\beta+\alpha_j)\), \(V[y_j] = \sigma^2 I_r\), where \(y_j\in \mathbb R^r\) and \(U\) is the same \(r\times p\) matrix for every group \(j=1,\ldots,m\). (A simulation sketch of this model appears after part c below.)

    1. The model can be written as \(E[y]=W \alpha + X\beta\) where \(\alpha = (\alpha_1^\top ,\ldots, \alpha_m^\top )^\top\). Express \(W\) and \(X\) in terms of \(U\).
    2. Find the OLS estimates of \(\beta + \alpha_j\), \(j=1,\ldots,m\) (you don’t have to use the result from part a, but you can if you want).
    3. Extend the model to a linear mixed model by assuming the \(\alpha\) vector is randomly sampled with \(E[\alpha]=0\), \(V[\alpha_j]=\tau^2 I_p\) and \(E[ \alpha_j \alpha_{j'}^\top ] =0\) if \(j\neq j'\). Find the BLUE of \(\beta\) and the BLUP for \(\alpha_j\). Describe the relationship between the BLUP of \(\beta+\alpha_j\) and the OLS estimator from part b.
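
    The sketch below simulates data from this model in R, just to make the setup concrete; the particular values of \(r\), \(m\), \(p\), \(\sigma\) and the parameter vectors are illustrative assumptions, not part of the problem.

    ```r
    # Simulate y_{i,j} = u_i'(beta + alpha_j) + eps_{i,j} with a common design U
    # (all dimensions and parameter values are illustrative assumptions)
    set.seed(1)
    r <- 10; m <- 5; p <- 3; sigma <- 1
    U     <- matrix(rnorm(r * p), r, p)              # same r x p matrix for every group
    beta  <- rnorm(p)                                # shared coefficient vector
    alpha <- matrix(rnorm(p * m, sd = 0.5), p, m)    # column j holds alpha_j
    Y <- sapply(1:m, function(j) U %*% (beta + alpha[, j]) + rnorm(r, sd = sigma))
    dim(Y)  # r x m: column j is the response vector y_j for group j
    ```
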
  2. Oracle shrinkage: Let \(\bar y_1,\ldots, \bar y_p\) be uncorrelated with \(E[\bar y_j] = \mu_j\) and \(V[\bar y_j] = \sigma^2/r\). For each \(j=1,\ldots,p\) consider the estimator \(\hat\mu_j = (1-w) \bar y + w \bar y_j\), where \(\bar y = \sum_{j=1}^p \bar y_j/p\). (A simulation sketch of this estimator appears after part c below.)

    1. Find the value of \(w\) that minimizes \(\sum_{j=1}^p E[ (\hat\mu_j - \mu_j)^2]\).
     2. Compare the optimal “oracle” estimator from part a to the BLUPs of \(\mu_1,\ldots,\mu_p\) in the random effects model where \(\mu_j = \mu + \alpha_j\), \(E[\alpha] = 0\), \(V[ \alpha ] = \tau^2 I_p\).
     3. Since \(\mu_1,\ldots, \mu_p\) are unknown, so is the optimal value of \(w\). However, suppose an estimate \(s^2\) of \(\sigma^2\) is available. Describe a method to obtain an estimate \(\hat w\) of the optimal value of \(w\) using \(s^2\) and \(\bar y_1,\ldots, \bar y_p\). (Comment: the estimators \(\hat \mu_1,\ldots, \hat \mu_p\) computed with \(\hat w\) are related to the James-Stein shrinkage estimator and to empirical Bayes estimators.)
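
     A minimal Monte Carlo sketch, under assumed values of \(p\), \(r\), \(\sigma\) and the \(\mu_j\)’s, that approximates the total mean squared error of the shrinkage estimator over a grid of \(w\) (the closed form asked for in part a is left to you):

     ```r
     # Approximate sum_j E[(muhat_j - mu_j)^2] over a grid of w by simulation
     # (all parameter values are illustrative assumptions)
     set.seed(1)
     p <- 20; r <- 5; sigma <- 2
     mu <- rnorm(p, mean = 50, sd = 1)                    # assumed true means
     wgrid <- seq(0, 1, by = 0.05)
     risk <- sapply(wgrid, function(w) {
       mean(replicate(2000, {
         ybar_j <- rnorm(p, mu, sigma / sqrt(r))          # group sample means
         muhat  <- (1 - w) * mean(ybar_j) + w * ybar_j    # shrinkage estimator
         sum((muhat - mu)^2)
       }))
     })
     plot(wgrid, risk, type = "b", xlab = "w", ylab = "total MSE")
     ```
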
  3. Let \(y\) and \(z\) be mean-zero random vectors, and let \(\hat y = A z\) be an unbiased linear predictor of \(y\) based on \(z\). Suppose the cross-covariance \(E[ (y-\hat y) z^\top] = C\) is not zero. Adjust \(\hat y\) to construct a new predictor \(\check y = \hat y + Bz\) so that \(E[(y-\check y)(y-\check y)^\top] < E[ (y-\hat y) (y-\hat y)^\top ]\) in the Loewner order.
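
     A small numeric sketch of the setup (finding \(B\) is the exercise; the particular covariance structure and the suboptimal \(A\) below are illustrative assumptions):

     ```r
     # Check empirically that the prediction error y - A z can be correlated with z
     # when A is not optimal; rows of y and z are i.i.d. draws of the random vectors
     set.seed(1)
     n <- 1e5
     z <- matrix(rnorm(2 * n), n, 2)               # mean-zero z with V[z] = I
     K <- matrix(c(1, 0.5, -0.3, 2), 2, 2)
     y <- z %*% K + matrix(rnorm(2 * n), n, 2)     # mean-zero y, linearly related to z
     A <- diag(c(1, 2))                            # an unbiased but suboptimal choice of A
     res <- y - z %*% t(A)                         # prediction errors y - yhat
     crossprod(res, z) / n                         # estimate of C = E[(y - yhat) z^T]; not zero
     ```
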

  4. Variance component estimation: Consider the random effects model \(y_{i,j} = \mu + \alpha_j + \epsilon_{i,j}\), \(i=1,\ldots,r_j\), \(j=1,\ldots,p\), where \(\alpha\sim N_p(0,\tau^2 I)\), \(\epsilon\sim N_n(0,\sigma^2 I)\) with \(n = \sum_{j=1}^p r_j\), and \(\alpha\) and \(\epsilon\) are independent. Find unbiased estimators of \(\mu\), \(\sigma^2\) and \(\tau^2\) (e.g., by the method of moments or otherwise).
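
     A sketch of a data-generating function for this model, with illustrative parameter values; simulating repeatedly from it is one way to check candidate estimators for unbiasedness.

     ```r
     # Simulate the one-way random effects model with unbalanced group sizes r_j
     # (mu, tau2, sigma2 and the r_j's are illustrative assumptions)
     sim_oneway <- function(rj, mu = 50, tau2 = 4, sigma2 = 9) {
       p <- length(rj)
       alpha <- rnorm(p, 0, sqrt(tau2))            # group effects alpha_j
       data.frame(group = rep(1:p, rj),
                  y     = mu + rep(alpha, rj) + rnorm(sum(rj), 0, sqrt(sigma2)))
     }
     set.seed(1)
     dat <- sim_oneway(rj = sample(5:30, 20, replace = TRUE))   # 20 unbalanced groups
     ```
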

  5. Average versus conditional properties: The file “nels2002.RDS” contains data from the 2002 National Educational Longitudinal Study. For this exercise, we will analyze differences in math score (mscore) across schools using the one-way random effects model from the previous problem. (A getting-started code sketch appears after part c below.)

    1. Use a one-way ANOVA decomposition to test for across-school variability in mean math score.
     2. Obtain unbiased estimators of \(\mu\), \(\sigma^2\) and \(\tau^2\). Construct the BLUPs of \(\mu_j = \mu + \alpha_j\) for each school, and plot these versus the school sample means. Describe how the BLUPs differ from the sample means.
     3. Now pretend that the true mean vector \((\mu_1,\ldots, \mu_p)\) is equal to the vector of sample means \((\bar y_1,\ldots, \bar y_p)\), and that the sample sizes for the different schools are as they are in the dataset. Imagine sampling the sample mean math score for school \(j\) as \(\bar y_{j} \sim N( \mu_j ,\sigma^2/r_j)\) and then computing the BLUP for each school, \[ \hat \mu_j = \frac{ 1/\tau^2}{ r_j/\sigma^2 + 1/\tau^2 } \mu + \frac{ r_j/\sigma^2}{ r_j/\sigma^2 + 1/\tau^2 } \bar y_j, \] and the 90% “prediction” interval for each school, \[ \hat \mu_j \pm 1.64 /\sqrt{r_j/\sigma^2 + 1/\tau^2 }, \] where the values of \(\mu\), \(\tau^2\) and \(\sigma^2\) are from the previous part of this problem. Make a plot of \(\mu_j\) versus \(E[\hat \mu_j|\mu_j]\) and compute the average bias across schools. Make a plot of \(\mu_j\) versus the coverage probability of the prediction interval for each \(\mu_j\), and compute the average coverage probability across schools.
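
     A getting-started sketch in R for loading the data and forming per-school summaries; the school identifier column name (`school`) is an assumption about the file’s layout (only `mscore` is named in the problem), so adjust it to match the data.

     ```r
     # Read the data and compute per-school summaries
     nels <- readRDS("nels2002.RDS")
     ybar <- tapply(nels$mscore, nels$school, mean)     # school sample means
     rj   <- tapply(nels$mscore, nels$school, length)   # school sample sizes r_j
     # One-way ANOVA decomposition (part a)
     anova(lm(mscore ~ factor(school), data = nels))
     ```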