HW5

Date

Thu, Sep 14, 2017

Links

Reading

Due Friday 09/22/2017 3:15:00 PM

Please try to complete before class in case there are questions or clarifications needed (or post on Piazza). Use LaTeX or write by hand (must be legible) and scan to submit via Sakai.

Consider the linear model $Y = X\beta + \epsilon$ with $E[\epsilon] = 0_n$ and Cov$(\epsilon) = \sigma^2 I_n$.

Show that $P_{X^T} = (X^TX) (X^TX)^{-}$ is a projection onto the column space of $X^T$ where $(X^TX)^{-}$ is a generalized inverse of $X^TX$. Does this depend on the actual choice of generalized inverse? (explain) Is this an orthogonal projection?
Show that for an estimable function $\lambda = X^T a$ with $a \in C(X)$, $(I - P_{X^T}) \lambda = 0$.
Using the spectral decomposition of $(X^TX)$ and the Moore-Penrose generalized inverse (see class notes) find a simple expression for $I - P_{X^T}$ in terms of a reduced set of the eigenvectors of $X^TX$.
If $X$ is full column rank $p$, does a Best Linear Unbiased Prediction (BLUP) exist for all $x_* \in \mathbb{R}^{p}$ ($x_* \neq 0$)? Prove or Disprove.
(optional) Write a function in R to find the projection $(I - P_{X^T}) \lambda$ with the design matrix $X$ (with intercept) and $\lambda$ (vector or matrix) as input (post the R code on Piazza). Apply your function to the example from class and compare to the conclusions from epredict. What sort of tolerance do you need to decide if $(I - P_{X^T}) \lambda = 0$? Extra challenge: have your function return the estimates, SEs, and confidence intervals!
(collaborative function creation encouraged)
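As a possible starting point for the collaborative function, here is a minimal sketch in base R. It builds the Moore-Penrose inverse of $X^TX$ from its eigendecomposition, forms $P_{X^T} = (X^TX)(X^TX)^{-}$ literally, and returns $(I - P_{X^T}) \lambda$. The function name `nonestimable_part`, the argument `rel_tol`, and its default value are assumptions made for illustration and are not from the class notes; choosing and justifying the tolerance is still part of the question.

```r
# Hypothetical helper: the name, arguments, and default tolerance are
# illustrative assumptions, not the course's reference solution.
nonestimable_part <- function(X, lambda, rel_tol = 1e-10) {
  X <- as.matrix(X)
  lambda <- as.matrix(lambda)              # a vector becomes a p x 1 matrix
  XtX <- crossprod(X)                      # X^T X, p x p and symmetric
  e <- eigen(XtX, symmetric = TRUE)

  # Treat eigenvalues below a relative cutoff as zero (assumed rule).
  keep <- e$values > rel_tol * max(abs(e$values))
  U1 <- e$vectors[, keep, drop = FALSE]
  d1 <- e$values[keep]

  # Moore-Penrose inverse from the retained eigenpairs.
  XtX_plus <- U1 %*% diag(1 / d1, nrow = length(d1)) %*% t(U1)

  # P_{X^T} = (X^T X)(X^T X)^+, then (I - P_{X^T}) lambda.
  P <- XtX %*% XtX_plus
  lambda - P %*% lambda
}

# Toy rank-deficient design: intercept plus two indicators that sum to it.
X_toy <- cbind(1, c(1, 1, 0, 0), c(0, 0, 1, 1))
nonestimable_part(X_toy, c(1, 1, 0))   # a row of X_toy, so numerically zero
nonestimable_part(X_toy, c(0, 1, 0))   # a single coefficient, clearly nonzero
```

The toy design is made up for the example; with the example from class, the entries returned for an estimable $\lambda$ will be small but not exactly zero, which is where the tolerance question comes in.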
For the Prostate data: create dummy or indicator variables for each of the levels of the gleason scores and add to the dataframe; e.g. Prostate$D7 = (gleason == 7) (use base R or explore mutate from dplyr). Show that they are linearly related to the intercept.
Fit a linear model with response lcavol including all of the dummy variables and the intercept. What are the coefficients? If you change the order in which the dummy variables enter the model formula, what happens to the coefficients? If you force the intercept to be zero (add -1 to the formula), what are the results?
Using as.factor(gleason) as a predictor in lm, what is the equivalent model formula using dummy variables? See model.matrix to extract the design matrix.
What is the interpretation of these coefficients? (Provide an explanation in a couple of sentences with the actual estimates. Full credit for interpretation in original units.)
In the model with all dummy variables and the intercept, use the theorem from class to show that each of the individual coefficients of the dummy variables and intercept are not estimable.
(For the energetic student; otherwise optional.) The epredict function assumes that the intercept is always included, so any linear combination of $\beta$ always has the intercept added, which means we cannot use the function to see if individual $\beta_j$ are estimable via a $\lambda = (0, 0, 1, \ldots, 0)^T$. Create a new variable that is a column of ones, Prostate$Int = rep(1, n), and fit the model using the formula lpsa ~ Int + D6 + D7 + D8 + D9 -1, where D7 is the dummy variable indicating that the gleason score is 7 and -1 drops the column of ones added by default. Create a data frame for predicting that will let you demonstrate with epredict that none of the individual $\beta_j$ are estimable.
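The mechanics of the indicator-variable questions can be tried out on a small example before turning to the Prostate data. The sketch below uses a made-up data frame `toy` (its values are invented for illustration and are not the Prostate data) to show one way to create the indicators in base R, check their relation to the intercept, and pull out design matrices with model.matrix.

```r
# Toy stand-in for the Prostate data; all values are made up.
toy <- data.frame(
  gleason = c(6, 6, 7, 7, 8, 9, 9, 7),
  lcavol  = c(0.5, 0.8, 1.4, 1.1, 2.0, 2.6, 2.3, 1.6)
)

# One indicator column per observed gleason level (D6, D7, D8, D9).
for (g in sort(unique(toy$gleason))) {
  toy[[paste0("D", g)]] <- as.numeric(toy$gleason == g)
}

# Every row has exactly one indicator switched on.
all(toy$D6 + toy$D7 + toy$D8 + toy$D9 == 1)

# Design matrix with the intercept and all four indicators:
# compare the number of columns with the numerical rank.
X_full <- model.matrix(lcavol ~ D6 + D7 + D8 + D9, data = toy)
ncol(X_full)
qr(X_full)$rank

# Design matrix R builds when gleason enters as a factor;
# inspect the column names to see which columns it keeps.
X_fac <- model.matrix(lcavol ~ as.factor(gleason), data = toy)
colnames(X_fac)
```

The same steps should carry over to the Prostate data frame once it is loaded, with the loop replaced by mutate from dplyr if preferred.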

Review Chapters 2 and 6 and the Appendices in Plane Answers to Complex Questions.