See Due date in Sakai
Please look at before class Tuesday in case there are questions or clarifications needed (or post on Piazza). Use LaTeX or write by hand (must be legible) and scan to submit via Sakai.
Let $X^s$ denote the centered and scaled design matrix (see slides or text) for the following problems.
$$\begin{aligned}
Y & = 1 \alpha + X^s \beta + \epsilon \\
U^TY & = U^T 1 \alpha + U^T X^s \beta + U^T\epsilon \\
\left(
\begin{array}{c}
y_0^* \\
y_1^* \\
\vdots \\
y_p^* \\
y_{p+1}^* \\
\vdots \\
y_{n-1}^*
\end{array} \right)
& =
\left[
\begin{array}{ccccc}
\sqrt{n} & 0 & 0 & \ldots & 0\\
0 & l_1 & 0 & \ldots & 0 \\
\vdots & 0 & \ddots & 0 & \ldots \\
\vdots& 0 & \vdots & \ddots & 0 \\
\vdots& \vdots & \vdots & 0 & l_p \\
\vdots & \vdots & \vdots &\vdots & 0 \\
\vdots & \vdots & \vdots &\vdots & \vdots \\
0 & \ldots & \ldots & \ldots & 0
\end{array} \right]
\left(
\begin{array}{c}
\alpha \\
\gamma_1 \\
\vdots \\
\gamma_p
\end{array}
\right)
+ \epsilon^*
\end{aligned}$$
where$X^s$ has been centered and standardized so that each column has
length 1 and mean 0 (see scale in R),
$U = [1_n/\sqrt{n} \, U_p \,U_{n-p-1}]$ is an orthogonal matrix
with $X^s = U_p L V^T$ from
the singular value decomposition, and $\gamma = V^T\beta$.
Verify that OLS estimates $\hat{\alpha} = \bar{y}$ and $\hat{\gamma}_i = y_i^*/l_i$ for $i = 1, \ldots, p$.
Show that $\textsf{SSE} = Y^TU_{n-p-1}U_{n-p-1}^T Y = Y(I - \frac{1}{n}1 1^T - U_p U_p^T)Y.$ Do you need to find $U_{n-p-1}^TY$ to compute SSE? Hint show that you can express SSE as a function of the MLEs of $\alpha$ and $\hat{\gamma}$s (and $Y^TY$)? \item Write the likelihood function as a product of three terms: a part that involves only $\alpha \mid \phi$, $\gamma \mid \phi$ and $\phi$ with the MLES and SSE using $Y^*$.
Derive conditional distributions for (a) $\alpha$, $\gamma$, $\phi$ given $\kappa_j, Y$ and the full conditionals for (b) $\kappa_j$ given $\alpha, \gamma, \phi, Y$
$$p(\alpha, \phi) \propto 1/\phi$$
$$\kappa_j \mid \phi, \alpha \sim G(\frac{1}{2}, \frac{1}{2})$$
$$\gamma_j \mid \kappa_j, \phi, \alpha \sim N(0, \frac{1}{\phi
\kappa_j})$$
You should have a name for the distribution and
expressions for all hyperparameters, not just an expression for the
density. Hint: write down likelihoods and priors,
but ignore any terms that do not involve the parameter of interest
(they go into constant of proportionality). Simplify until you
recognize the distribution.
What prior on $\kappa_j$’s leads to the g-prior?
Modify your Gamma prior on $\kappa_i$ to try to capture the
desired features of Goldstein & Smith (1974) (See Christensen
Chapter 15) where if
$$
\gamma_i^2 < \sigma^2\left[ \frac{2}{\kappa_i} + \frac{1}{l_i^2} \right]
$$ the fixed $\kappa_i$ Generalized Ridge shrinkage estimator (posterior mean) beats OLS:
if $l_i$ is small almost any $\kappa_i$ will improve over OLS and
if $l_i^2$ is large then only very small values of $\kappa_i$ will give an improvement
based on $l_i$, i.e. if $l_i^2$ is large $\kappa_i$ should be small. For $p=25$ with eigenvalues $l_i^2$ based on seq(0, 10, length=25)
, plot your prior densities, with an overlay of the G(1⁄2, 1⁄2) and comment.
Find the updated full conditionals based on your choice above. Do you need to update all of the full conditionals? Explain.
Implement your 2 generalized ridge models in R, JAGS or other language (see
earlier JAGS code as a starting point) and apply this to the
longley
data. How do your results compare to classical ridge?
Include histograms of the posterior distributions of coefficients,
plus means and credible intervals, as well as histograms of the
$\kappa$’s with the prior density overlaid. How sensitive are the
results to the prior assumptions? How do the estimates of
$\kappa_i$ compare to the best GCV estimate from lm.ridge
from class?
Explain the computational advantage of using the canonical parameterization in MCMC.
Provide interpretations of the estimates/credible intervals in the context of the problem. Does the idea of a “one-unit” change in a predictor with all other variables held fixed make sense for this data set?