class: center, middle, inverse, title-slide

# Comparing models

### Yue Jiang

### STA 210 / Duke University / Spring 2023

---

### Review: Multiple regression

The linear regression model is given by

`\begin{align*}
y_i &= \beta_0 + \beta_1x_{i1} + \cdots + \beta_px_{ip} + \epsilon_i,\\
\epsilon_i &\stackrel{i.i.d.}{\sim} N(0, \sigma^2)
\end{align*}`

Note the variability at the individual level. When using this model, we think about relationships between the predictors and the .vocab[conditional expectation] of the response:

`\begin{align*}
E(y | x_1, \cdots, x_p) &= \beta_0 + \beta_1x_1 + \cdots + \beta_px_p
\end{align*}`

(familiar visualization on the next slide...)

---

### Conditional expectation of the response

<!-- figure: regression line as the conditional expectation of the response -->

---

### Comparing models?

.question[
Suppose you had two models (that aimed to do the same thing). In what sorts of ways might you compare the models? How might one model be "better" (whatever this means) than another? Think about how models are being *used* (for explanation and prediction, etc.).
]

---

### Interpretation?

```r
ggplot(data = temp, aes(x = x, y = y)) + 
  geom_point() + 
  theme_bw() + 
  labs(x = "", y = "") + 
  stat_smooth(method = "lm", formula = y ~ I(x^0.5), se = F,
              color = "deepskyblue") + 
  stat_smooth(method = "lm", formula = y ~ log(x), lty = 3, se = F,
              color = "red") + 
  stat_smooth(method = "lm", formula = y ~ x, se = F, color = "darkgray")
```

---

### Interpretation?

<!-- figure: scatterplot with the three fitted curves -->

---

### Interpretation?
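One way the three models in the plot may have been fit (a sketch, not shown on the original slides; it assumes the data frame `temp` with columns `x` and `y` from the plotting code):

```r
# Hypothetical fits corresponding to the three smoothed curves in the plot
m1 <- lm(y ~ I(x^0.5), data = temp)  # square-root transform of x
m2 <- lm(y ~ log(x), data = temp)    # log transform of x
m3 <- lm(y ~ x, data = temp)         # untransformed x
```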
```r
tidy(m1)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -6.93    0.919      -7.54 8.79e-13
## 2 I(x^0.5)        3.09    0.346       8.94 8.97e-17
```

```r
tidy(m2)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -6.76    0.895      -7.56 7.89e-13
## 2 log(x)          4.12    0.458       9.00 6.15e-17
```

```r
tidy(m3)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -2.81     0.461      -6.08 4.40e- 9
## 2 x              0.580    0.0652      8.89 1.32e-16
```

---

### Interpretation?

Model 1 (square-root transform on `\(x\)`):

`\begin{align*}
\widehat{y}_i &= -6.93 + 3.09~\sqrt{x_i}
\end{align*}`

Model 2 (log transform on `\(x\)`):

`\begin{align*}
\widehat{y}_i &= -6.76 + 4.12~\log{x_i}
\end{align*}`

Model 3 (just a linear model):

`\begin{align*}
\widehat{y}_i &= -2.81 + 0.58~x_i
\end{align*}`

.question[
How would you interpret the slope estimate for `\(x\)` in each of these models? Is there one you might prefer?
]

---

### Other metrics?

.vocab[Mean squared error] (MSE) is given by

`\begin{align*}
MSE = \frac{1}{n}\sum_{i = 1}^n (y_i - \widehat{y}_i)^2
\end{align*}`

.question[
In plain English, what is the mean squared error? Why might we also be interested in the .vocab[root mean squared error] (RMSE), which is the square root of the MSE?
]

---

### Other metrics?
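The RMSE can also be computed directly from a fitted model's residuals; a minimal sketch (assuming the fitted model `m3` from earlier):

```r
# RMSE "by hand": residuals(m3) returns y_i - yhat_i for each observation
sqrt(mean(residuals(m3)^2))
```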
```r
m1_aug <- augment(m1)
m2_aug <- augment(m2)
m3_aug <- augment(m3)

rmse(m1_aug, truth = y, estimate = .fitted)
```

```
## # A tibble: 1 x 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard       0.441
```

```r
rmse(m2_aug, truth = y, estimate = .fitted)
```

```
## # A tibble: 1 x 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard       0.440
```

```r
rmse(m3_aug, truth = y, estimate = .fitted)
```

```
## # A tibble: 1 x 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard       0.441
```

---

### Review: Overall F-test for the model

`\begin{align*}
y_i &= \beta_0 + \beta_1x_{i1} + \cdots + \beta_px_{ip} + \epsilon_i
\end{align*}`

`\begin{align*}
H_0: \beta_1 = \cdots = \beta_p = 0\\
H_1: \text{at least one }~\beta_k \neq 0
\end{align*}`

.question[
What are the F-test hypotheses in plain English?
]

---

### Review: Overall F-test for the model

```
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0415 -0.3251  0.0177  0.2917  1.3161 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.8060     0.4612   -6.08  4.4e-09 ***
## x             0.5797     0.0652    8.89  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.443 on 248 degrees of freedom
## Multiple R-squared:  0.242,  Adjusted R-squared:  0.238
## F-statistic: 79 on 1 and 248 DF,  p-value: <2e-16
```

.question[
What is the test statistic and its distribution under `\(H_0\)`?
]

---

### Comparing nested models

Two linear models are .vocab[nested] if one of the models consists solely of terms found "within" another.
For instance:

`\begin{align*}
y_i &= \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \epsilon_i\\
y_i &= \beta_0 + \beta_1x_{i1} + \epsilon_i
\end{align*}`

or

`\begin{align*}
y_i &= \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3x_{i3} + \beta_4x_{i1}x_{i2} + \epsilon_i\\
y_i &= \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3x_{i3} + \epsilon_i
\end{align*}`

We can test *groups* of nested variables as well (why might we want to do this?):

`\begin{align*}
y_i &= \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3x_{i3} + \beta_4x_{i1}x_{i2} + \epsilon_i\\
y_i &= \beta_0 + \beta_1x_{i1} + \epsilon_i
\end{align*}`

---

### The nested F test

```
## 
## Call:
## lm(formula = PM2.5 ~ TEMP, data = dat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -103.0  -57.2  -21.6   34.1  616.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  94.1178     0.7030   133.9   <2e-16 ***
## TEMP         -0.8896     0.0394   -22.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 79.8 on 31813 degrees of freedom
## Multiple R-squared:  0.0158,  Adjusted R-squared:  0.0158
## F-statistic: 511 on 1 and 31813 DF,  p-value: <2e-16
```

---

### The nested F test

```
## 
## Call:
## lm(formula = PM2.5 ~ TEMP + PRES + RAIN + wd, data = dat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -145.4  -48.7  -18.3   29.2  615.1 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2902.2107    74.6836   38.86  < 2e-16 ***
## TEMP           -3.1551     0.0676  -46.69  < 2e-16 ***
## PRES           -2.7210     0.0731  -37.24  < 2e-16 ***
## RAIN           -2.5668     0.5036   -5.10  3.5e-07 ***
## wdENE          -5.8459     1.9820   -2.95   0.0032 ** 
## wdESE           0.2874     2.4198    0.12   0.9054    
## wdN           -52.7173     2.3173  -22.75  < 2e-16 ***
## wdNE          -18.0660     1.8951   -9.53  < 2e-16 ***
## wdNNE         -40.6988     2.2086  -18.43  < 2e-16 ***
## wdNNW         -72.1930     2.5190  -28.66  < 2e-16 ***
## wdNW          -70.6412     2.3786  -29.70  < 2e-16 ***
## wdS            -5.1466     2.6508   -1.94   0.0522 .  
## wdSE           -2.8372     2.6250   -1.08   0.2798    
## wdSSE          -1.4042     2.8888   -0.49   0.6269    
## wdSSW         -10.7876     2.3119   -4.67  3.1e-06 ***
## wdSW          -20.0180     2.0566   -9.73  < 2e-16 ***
## wdW           -32.7593     2.7593  -11.87  < 2e-16 ***
## wdWNW         -52.2191     2.8248  -18.49  < 2e-16 ***
## wdWSW         -27.7496     2.2769  -12.19  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 75 on 31796 degrees of freedom
## Multiple R-squared:  0.132,  Adjusted R-squared:  0.131
## F-statistic: 268 on 18 and 31796 DF,  p-value: <2e-16
```

---

### The nested F test

```r
anova(m0, m1, m2) # 3 nested models!
```

```
## Analysis of Variance Table
## 
## Model 1: PM2.5 ~ TEMP
## Model 2: PM2.5 ~ TEMP + PRES + RAIN
## Model 3: PM2.5 ~ TEMP + PRES + RAIN + wd
##   Res.Df       RSS Df Sum of Sq   F Pr(>F)    
## 1  31813 2.03e+08                             
## 2  31811 1.94e+08  2   8501865 756 <2e-16 ***
## 3  31796 1.79e+08 15  15376534 182 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

.question[
How do we interpret this table?
]
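The F statistics in this table come from comparing the drop in residual sum of squares (RSS) to the residual variance of the largest model. As a reference (this formula is a standard result, not shown on the slides), the nested F statistic comparing a reduced model to a full model is

`\begin{align*}
F = \frac{(RSS_{\text{reduced}} - RSS_{\text{full}}) / (df_{\text{reduced}} - df_{\text{full}})}{RSS_{\text{full}} / df_{\text{full}}}
\end{align*}`

where `\(df\)` denotes residual degrees of freedom. As a check against the last row of the table: `\((15376534 / 15) / (1.79\times 10^8 / 31796) \approx 182\)`, matching the printed F statistic.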