HW 2 Solutions

STA 211 Spring 2023 (Jiang)

Exercise 1

Must there always be a linear relationship between some predictor \(x_k\) and the outcome \(y\) in a linear regression model? If yes, provide a proof; if no, provide a counterexample of such a model and clearly demonstrate a non-linear relationship between the two.

No. The model

\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i1}^2 + \epsilon_i \]

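is linear in the \(\beta\) terms, so it is a linear model, but it is quadratic in \(x_1\). The partial derivative \(\partial\,\mathrm{E}[y_i]/\partial x_{i1} = \beta_1 + 2\beta_2 x_{i1}\) depends on \(x_{i1}\) whenever \(\beta_2 \neq 0\), so the change in \(y\) per unit change in \(x_1\) is not constant and the relationship between \(x_1\) and \(y\) is not linear.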
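A quick simulation makes the non-linearity concrete. This is a sketch under assumed values: the coefficients \((\beta_0, \beta_1, \beta_2) = (1, 2, -1.5)\), the sample size, the seed, and the range of \(x_1\) are arbitrary choices for illustration, not part of the exercise. Because the model is linear in the \(\beta\)'s, lm() fits it directly (I() makes ^ square the predictor instead of acting as a formula operator), and the estimated slope \(\hat\beta_1 + 2\hat\beta_2 x_1\) is then evaluated at a few values of \(x_1\).

```r
# Simulate from y = b0 + b1*x1 + b2*x1^2 + e.
# The true coefficients below are assumed values for illustration only.
set.seed(211)
n <- 200
x1 <- runif(n, min = -3, max = 3)
beta <- c(1, 2, -1.5)  # (beta_0, beta_1, beta_2)
y <- beta[1] + beta[2] * x1 + beta[3] * x1^2 + rnorm(n, sd = 1)

# Ordinary lm() fit: the model is linear in the coefficients,
# even though it contains a squared predictor
fit <- lm(y ~ x1 + I(x1^2))
b <- unname(coef(fit))  # (intercept, x1, x1^2)

# Estimated partial derivative of E[y] with respect to x1:
# b1 + 2 * b2 * x1, which changes as x1 changes
slope_at <- function(x) b[2] + 2 * b[3] * x
slopes <- slope_at(c(-2, 0, 2))
slopes
```

With these assumed values the true slopes at \(x_1 = -2, 0, 2\) are 8, 2 and \(-4\), so the estimates should land close to those: a one-unit increase in \(x_1\) raises \(y\) in one region and lowers it in another, which no straight line can do.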

Exercise 2

[n.b. Palmer Penguins dataset]. Create a linear model in R that predicts the body mass of a penguin based on its flipper length, bill length, and which species it is. Provide your design matrix, and clearly label what each column corresponds to. Display the estimated regression coefficients from the lm function, and recover these same estimates from the dataset directly using matrix operations on the underlying data.

library(palmerpenguins)
library(dplyr)

# Data preparation: keep the relevant variables, drop rows with missing
# values, create dummy variables for the Chinstrap and Gentoo species,
# and add a column of 1s for the intercept

dat <- penguins %>% 
  select(body_mass_g, flipper_length_mm, bill_length_mm, species) %>% 
  na.omit() %>% 
  mutate(chinstrap = ifelse(species == "Chinstrap", 1, 0),
         gentoo = ifelse(species == "Gentoo", 1, 0),
         intercept = 1) %>% 
  select(-species)

# Response vector: body mass in grams

y <- dat %>% 
  select(body_mass_g)

# Design matrix (first few rows shown). The first column is the vector
# of 1s for the intercept, the next two are flipper length (mm) and bill
# length (mm), and the last two are dummy variables for species (1 if the
# penguin is that species, 0 if not). Adelie is the baseline species, so
# it has no column of its own: an Adelie penguin has 0 in both dummies.

x <- dat %>% 
  select(intercept, flipper_length_mm, bill_length_mm,
         chinstrap, gentoo)

head(x)
# A tibble: 6 x 5
  intercept flipper_length_mm bill_length_mm chinstrap gentoo
      <dbl>             <int>          <dbl>     <dbl>  <dbl>
1         1               181           39.1         0      0
2         1               186           39.5         0      0
3         1               195           40.3         0      0
4         1               193           36.7         0      0
5         1               190           39.3         0      0
6         1               181           38.9         0      0
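As a cross-check on the hand-built design matrix, base R's model.matrix() constructs the design matrix that lm() would use from the same formula. This sketch assumes the palmerpenguins package is installed, and the object names (pen, X_mm, X_hand) are illustrative. Because species is a factor with levels Adelie, Chinstrap and Gentoo, R's default treatment contrasts make Adelie the baseline, so only Chinstrap and Gentoo receive dummy columns, matching the manual construction.

```r
library(palmerpenguins)

# Complete cases of the model variables (illustrative object names)
vars <- c("body_mass_g", "flipper_length_mm", "bill_length_mm", "species")
pen <- na.omit(as.data.frame(penguins[, vars]))

# Design matrix as lm() would build it (treatment contrasts, Adelie baseline)
X_mm <- model.matrix(body_mass_g ~ flipper_length_mm + bill_length_mm + species,
                     data = pen)

# Hand-built version: intercept, two measurements, two species dummies
X_hand <- cbind(intercept = 1,
                flipper_length_mm = pen$flipper_length_mm,
                bill_length_mm = pen$bill_length_mm,
                chinstrap = as.numeric(pen$species == "Chinstrap"),
                gentoo = as.numeric(pen$species == "Gentoo"))

colnames(X_mm)
# Same numbers column for column; names and extra attributes differ,
# so compare values only
all.equal(unname(X_mm), unname(X_hand), check.attributes = FALSE)
```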

# Closed-form least squares solution: beta_hat = (X'X)^{-1} X'y

x <- as.matrix(x)
y <- as.matrix(y)

solve(t(x) %*% x) %*% t(x) %*% y
                  body_mass_g
intercept         -3904.38680
flipper_length_mm    27.42883
bill_length_mm       61.73645
chinstrap          -748.56228
gentoo               90.43531
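Forming \((X^\top X)^{-1}\) explicitly works here, but the same estimates can be obtained by solving the normal equations \(X^\top X \hat\beta = X^\top y\) directly, or by a QR decomposition of \(X\), which avoids the explicit inverse and is generally better conditioned numerically; lm() itself relies on a QR decomposition. A minimal sketch using base R's crossprod(), solve(a, b) and qr.solve(), assuming palmerpenguins is installed; X is built with model.matrix() for brevity and the object names are illustrative.

```r
library(palmerpenguins)

# Complete cases and design matrix (illustrative object names)
vars <- c("body_mass_g", "flipper_length_mm", "bill_length_mm", "species")
pen <- na.omit(as.data.frame(penguins[, vars]))
X <- model.matrix(body_mass_g ~ flipper_length_mm + bill_length_mm + species,
                  data = pen)
y <- pen$body_mass_g

# Explicit inverse, as in the closed-form formula
b_inv <- solve(t(X) %*% X) %*% t(X) %*% y

# Solve the normal equations X'X b = X'y without forming the inverse
b_ne <- solve(crossprod(X), crossprod(X, y))

# Least squares via the QR decomposition of X
b_qr <- qr.solve(X, y)

cbind(inverse = drop(b_inv), normal_eq = drop(b_ne), qr = b_qr)
```

All three columns should agree to many significant digits; the difference between the approaches only matters when the columns of \(X\) are close to collinear.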
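
# Comparison with lm(). lm() creates the same two species dummies
# (Adelie is the baseline level) and by default drops rows with missing
# values in the model variables, so it uses the same observations as dat.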

summary(lm(body_mass_g ~ flipper_length_mm + bill_length_mm + species,
           data = penguins))$coef[,1]
      (Intercept) flipper_length_mm    bill_length_mm  speciesChinstrap 
      -3904.38680          27.42883          61.73645        -748.56228 
    speciesGentoo 
         90.43531
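Rather than comparing the two printouts by eye, the agreement can be checked programmatically. A sketch, again assuming palmerpenguins is installed and using illustrative object names: it compares the closed-form estimates with coef() from lm() via all.equal(), and confirms the property that characterizes least squares, namely that the residuals \(e\) are orthogonal to every column of \(X\), so \(X^\top e = 0\) up to rounding.

```r
library(palmerpenguins)

# Complete cases of the model variables (illustrative object names)
vars <- c("body_mass_g", "flipper_length_mm", "bill_length_mm", "species")
pen <- na.omit(as.data.frame(penguins[, vars]))
X <- model.matrix(body_mass_g ~ flipper_length_mm + bill_length_mm + species,
                  data = pen)
y <- pen$body_mass_g

# Closed-form estimate, using the same formula as above
beta_hat <- drop(solve(t(X) %*% X) %*% t(X) %*% y)
fit <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm + species,
          data = pen)

# TRUE when the estimates agree; the tolerance is loosened slightly
# because the explicit inverse costs some floating-point precision
all.equal(beta_hat, coef(fit), tolerance = 1e-6)

# Least-squares residuals are orthogonal to every column of X,
# so X'e is zero up to rounding (shown relative to the size of X'y)
xte <- drop(crossprod(X, residuals(fit)))
max(abs(xte)) / max(abs(crossprod(X, y)))
```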