class: center, middle, inverse, title-slide

# Ordinary Least Squares
### Yue Jiang
### STA 210 / Duke University / Spring 2024

---

### Whatever this thing is...

<img src="img/diamonds2.jpg" width="90%" style="display: block; margin: auto;" />

---

### Diamonds!

<img src="img/marilyn.jpg" width="90%" style="display: block; margin: auto;" />

---

### Diamonds!

<img src="ols_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-7-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto;" />

---

### What is a linear model anyway?

Let's consider a situation with a single continuous predictor:

`\begin{align*}
y_i = \beta_0 + \beta_1x_i + \epsilon_i
\end{align*}`

- `\(y_i\)` is the outcome (dependent variable) of interest for observation `\(i\)`, for `\(i = 1, \cdots, n\)`
- `\(\beta_0\)` is the intercept parameter (more on what a "parameter" is on Thursday)
- `\(\beta_1\)` is the slope parameter
- `\(x_i\)` is the predictor variable
- `\(\epsilon_i\)` is the error (like `\(\beta_0\)` and `\(\beta_1\)`, it is not observed)

Averaging over the error term, the model describes the *mean* of `\(y_i\)` at each value of `\(x_i\)`:

`\begin{align*}
E(y_i | x_i) = \beta_0 + \beta_1x_i
\end{align*}`

We want to find "good" estimates `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)`.
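
---

### Simulating from a linear model

To make this concrete, here is a minimal sketch that simulates data from the model above, using made-up parameter values `\(\beta_0 = 1\)` and `\(\beta_1 = 2\)`. The points scatter around the mean line `\(E(y_i | x_i)\)` because of the unobserved errors:

```r
library(tidyverse)

# Simulate 100 observations from y_i = beta0 + beta1 * x_i + epsilon_i,
# with made-up values beta0 = 1, beta1 = 2, and normal errors
set.seed(210)
sim <- tibble(x = runif(100, min = 0, max = 10),
              y = 1 + 2 * x + rnorm(100, mean = 0, sd = 2))

# Points scatter around the true mean line E(y | x) = 1 + 2x
ggplot(sim, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(intercept = 1, slope = 2)
```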

---

### Slope-intercept form of a line

<img src="img/slopemap.png" width="100%" style="display: block; margin: auto;" />

`\begin{align*}
E(y_i | x_i) = \beta_0 + \beta_1x_i
\end{align*}`

.question[
How do you interpret the slope and intercept in this formulation?
]

---

### Slope-intercept form of a line

- The .vocab[intercept], `\(\beta_0\)`, represents the point where the line crosses the y-axis (that is, the value of `\(y\)` when `\(x = 0\)`).
- The .vocab[slope], `\(\beta_1\)`, represents the change in `\(y\)` as `\(x\)` changes by 1 unit.

Using the estimates for `\(\beta_0\)` and `\(\beta_1\)`, we can *predict* a value of `\(y_i\)` for each associated `\(x_i\)`:

`\begin{align*}
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1x_i
\end{align*}`

--

The difference between the *observed* `\(y_i\)` and the value predicted from the model (that is, `\(\hat{y}_i\)`) is known as the .vocab[residual]:

`\begin{align*}
\hat{\epsilon}_i &= y_i - \left(\hat{\beta}_0 + \hat{\beta}_1x_i\right) \\
&= y_i - \hat{y}_i
\end{align*}`

---

### Not a linear model

<img src="ols_files/figure-html/unnamed-chunk-12-1.png" width="90%" style="display: block; margin: auto;" />

---

### Some candidate lines

<img src="ols_files/figure-html/unnamed-chunk-13-1.png" width="90%" style="display: block; margin: auto;" />

---

### A simpler example

<img src="ols_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

### Loss functions

Remember that we are predicting each outcome `\(y_i\)` with a *linear* prediction from some model, `\(\hat{\beta}_0 + \hat{\beta}_1x_i\)`.

But how do we tell whether we've made a "good" prediction? What should count as good?

Keep in mind the notion of the **residual**:

`\begin{align*}
\hat{\epsilon}_i &= y_i - \left(\hat{\beta}_0 + \hat{\beta}_1x_i\right) \\
&= y_i - \hat{y}_i
\end{align*}`

---

### Some loss functions

What might be some candidate loss functions?

`\begin{align*}
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \mathbf{1}_{\hat{y}_i \neq 17} &?\\
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \mathbf{1}_{y_i \neq \hat{y}_i} &?\\
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \Big | y_i - \hat{y}_i \Big | &?\\
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \left(y_i - \hat{y}_i\right)^2 &?\\
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \log \left( \frac{1 + \exp(2(y_i - \hat{y}_i))}{2\exp(y_i - \hat{y}_i)} \right) &?
\end{align*}`

---

### One potential line

<img src="ols_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

### Another potential line

<img src="ols_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

---

### Yet another potential line

<img src="ols_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---

### Ordinary least squares

`\begin{align*}
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \left(y_i - \hat{y}_i\right)^2 \\
&= \frac{1}{n} \sum_{i = 1}^n \left(y_i - \left(\hat{\beta}_0 + \hat{\beta}_1x_{i} \right)\right)^2
\end{align*}`

We want to find the estimates `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` that minimize the mean (equivalently, the sum) of the squared residuals.

.question[
- Why use this loss function in particular?
- How would we go about finding these estimates?
]

---

### Back to diamonds

<img src="ols_files/figure-html/unnamed-chunk-18-1.png" width="90%" style="display: block; margin: auto;" />

`\begin{align*}
\widehat{Price}_i = \hat{\beta}_0 + \hat{\beta}_1(Carat)_i
\end{align*}`

---

### Back to diamonds

<img src="ols_files/figure-html/unnamed-chunk-19-1.png" width="90%" style="display: block; margin: auto;" />

`\begin{align*}
\widehat{Price}_i = -2458.2 + 8028.8(Carat)_i
\end{align*}`
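
---

### Where do those numbers come from?

For a single predictor, minimizing the squared-residual loss has a well-known closed-form solution:

`\begin{align*}
\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}
\end{align*}`

As a quick sketch (assuming the same `diamonds` data frame used for the plots above), we can compute the estimates directly:

```r
# Closed-form OLS estimates for a single predictor
x <- diamonds$carat
y <- diamonds$price

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Should match the fitted line above: -2458.2 and 8028.8
c(b0, b1)
```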

---

### Back to diamonds

`\begin{align*}
\widehat{Price}_i = -2458.2 + 8028.8(Carat)_i
\end{align*}`

.question[
- How might we interpret these parameter estimates?
- What would you predict the average price of a 1.5 carat diamond to be?
- What would you predict the average price of a 15 carat diamond to be?
]

---

### Who *doesn't* love Novuary?!

---

### Fitting linear models in R

```r
m1 <- lm(price ~ carat, data = diamonds)
summary(m1)
```

```
## 
## Call:
## lm(formula = price ~ carat, data = diamonds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5931.3  -900.4    -0.1   628.4  8574.0 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2458.19      97.11  -25.32   <2e-16
## carat        8028.78     105.73   75.94   <2e-16
## 
## Residual standard error: 1529 on 998 degrees of freedom
## Multiple R-squared:  0.8525, Adjusted R-squared:  0.8523 
## F-statistic:  5766 on 1 and 998 DF,  p-value: < 2.2e-16
```

---

### The tidymodels framework

```r
library(tidymodels)
```

- A collection of packages for modeling and machine learning using tidyverse principles
- Consistent syntax across many different model types
- Streamlines the modeling workflow: common tasks like splitting data into training and testing sets, transforming or creating variables, assessing model performance, and using the model for prediction or inference no longer need to be programmed by hand

Check out the (free!) online book [Tidy Modeling with R](https://www.tmwr.org/) or the [official website](https://www.tidymodels.org/) for more details.

---

### Fitting linear models with tidymodels

```r
tidy_mod <- linear_reg() |>
  set_engine("lm") |>
  fit(price ~ carat, data = diamonds)
tidy_mod
```

```
## parsnip model object
## 
## 
## Call:
## stats::lm(formula = price ~ carat, data = data)
## 
## Coefficients:
## (Intercept)        carat  
##       -2458         8029  
```

---

### Fitting linear models with tidymodels

```r
linear_reg() |>
  set_engine("lm") |>
  fit(price ~ carat, data = diamonds) |>
  tidy()
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)   -2458.      97.1     -25.3 1.32e-109
## 2 carat          8029.     106.       75.9 0        
```

---

### Predicting

```r
new_carats <- tibble(carat = c(0.5, 1, 2))
predict(tidy_mod, new_carats)
```

```
## # A tibble: 3 x 1
##   .pred
##   <dbl>
## 1 1556.
## 2 5571.
## 3 13599.
```
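
---

### Checking predictions by hand

As a quick sanity check, these predictions are just the fitted line evaluated at each new carat value. A minimal sketch using the coefficients from `m1`, the same model fit with `lm()` earlier:

```r
# price-hat = beta0-hat + beta1-hat * carat, evaluated at the new values
coef(m1)
coef(m1)["(Intercept)"] + coef(m1)["carat"] * c(0.5, 1, 2)
```

For example, for a 2-carat diamond: `\(-2458.19 + 8028.78 \times 2 \approx 13599\)`, matching the `.pred` column above.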