class: center, middle, inverse, title-slide

# Ordinary Least Squares
### Yue Jiang
### STA 210 / Duke University / Spring 2024

---

### Whatever this thing is...

<img src="img/diamonds2.jpg" width="90%" style="display: block; margin: auto;" />

---

### Diamonds!

<img src="img/marilyn.jpg" width="90%" style="display: block; margin: auto;" />

---

### Diamonds!

<img src="ols_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-7-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" />

---

### Predicting price based on mass

<img src="ols_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto;" />

---

### What is a linear model anyway?

Let's consider a situation with a single continuous predictor:

`\begin{align*}
y_i = \beta_0 + \beta_1x_i + \epsilon_i
\end{align*}`

- `\(y_i\)` is the outcome (dependent variable) of interest for observation `\(i\)`, for `\(i = 1, \cdots, n\)`
- `\(\beta_0\)` is the intercept parameter (more on what a "parameter" is on Thursday)
- `\(\beta_1\)` is the slope parameter
- `\(x_i\)` is the predictor variable
- `\(\epsilon_i\)` is the error (like `\(\beta_0\)` and `\(\beta_1\)`, it is not observed)

Averaging over the error term, the model describes the *mean* of `\(y_i\)` at each value of `\(x_i\)`:

`\begin{align*}
E(y_i | x_i) = \beta_0 + \beta_1x_i
\end{align*}`

We want to find "good" estimates `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)`.
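
---

### Simulating from a linear model

To make this concrete, here is a minimal sketch that simulates data from the model above, using made-up parameter values `\(\beta_0 = 1\)` and `\(\beta_1 = 2\)`. The points scatter around the mean line `\(E(y_i | x_i)\)` because of the unobserved errors:

```r
library(tidyverse)

# Simulate 100 observations from y_i = beta0 + beta1 * x_i + epsilon_i,
# with made-up values beta0 = 1, beta1 = 2, and normal errors
set.seed(210)
sim <- tibble(x = runif(100, min = 0, max = 10),
              y = 1 + 2 * x + rnorm(100, mean = 0, sd = 2))

# Points scatter around the true mean line E(y | x) = 1 + 2x
ggplot(sim, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(intercept = 1, slope = 2)
```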

---

### Slope-intercept form of a line

<img src="img/slopemap.png" width="100%" style="display: block; margin: auto;" />

`\begin{align*}
E(y_i | x_i) = \beta_0 + \beta_1x_i
\end{align*}`

.question[
How do you interpret the slope and intercept in this formulation?
]

---

### Slope-intercept form of a line

- The .vocab[intercept], `\(\beta_0\)`, represents the point where the line crosses the y-axis (that is, the value of `\(y\)` when `\(x = 0\)`).
- The .vocab[slope], `\(\beta_1\)`, represents the change in `\(y\)` as `\(x\)` changes by 1 unit.

Using the estimates for `\(\beta_0\)` and `\(\beta_1\)`, we can *predict* a value of `\(y_i\)` for each associated `\(x_i\)`:

`\begin{align*}
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1x_i
\end{align*}`

--

The difference between the *observed* `\(y_i\)` and the value predicted from the model (that is, `\(\hat{y}_i\)`) is known as the .vocab[residual]:

`\begin{align*}
\hat{\epsilon}_i &= y_i - \left(\hat{\beta}_0 + \hat{\beta}_1x_i\right) \\
&= y_i - \hat{y}_i
\end{align*}`

---

### Not a linear model

<img src="ols_files/figure-html/unnamed-chunk-12-1.png" width="90%" style="display: block; margin: auto;" />

---

### Some candidate lines

<img src="ols_files/figure-html/unnamed-chunk-13-1.png" width="90%" style="display: block; margin: auto;" />

---

### A simpler example

<img src="ols_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

### Loss functions

Remember that we are predicting each outcome `\(y_i\)` with a *linear* prediction from some model, `\(\hat{\beta}_0 + \hat{\beta}_1x_i\)`.

But how do we tell whether we've made a "good" prediction? What should count as good?

Keep in mind the notion of the **residual**:

`\begin{align*}
\hat{\epsilon}_i &= y_i - \left(\hat{\beta}_0 + \hat{\beta}_1x_i\right) \\
&= y_i - \hat{y}_i
\end{align*}`

---

### Some loss functions

What might be some candidate loss functions?

`\begin{align*}
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \mathbf{1}_{\hat{y}_i \neq 17} &?\\
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \mathbf{1}_{y_i \neq \hat{y}_i} &?\\
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \Big | y_i - \hat{y}_i \Big | &?\\
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \left(y_i - \hat{y}_i\right)^2 &?\\
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \log \left( \frac{1 + \exp(2(y_i - \hat{y}_i))}{2\exp(y_i - \hat{y}_i)} \right) &?
\end{align*}`

---

### One potential line

<img src="ols_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

### Another potential line

<img src="ols_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

---

### Yet another potential line

<img src="ols_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---

### Ordinary least squares

`\begin{align*}
f(y_1, \cdots, y_n, \hat{y}_1, \cdots, \hat{y}_n) &= \frac{1}{n} \sum_{i = 1}^n \left(y_i - \hat{y}_i\right)^2 \\
&= \frac{1}{n} \sum_{i = 1}^n \left(y_i - \left(\hat{\beta}_0 + \hat{\beta}_1x_{i} \right)\right)^2
\end{align*}`

We want to find the estimates `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` that minimize the mean (equivalently, the sum) of the squared residuals.

.question[
- Why use this loss function in particular?
- How would we go about finding these estimates?
]

---

### Back to diamonds

<img src="ols_files/figure-html/unnamed-chunk-18-1.png" width="90%" style="display: block; margin: auto;" />

`\begin{align*}
\widehat{Price}_i = \hat{\beta}_0 + \hat{\beta}_1(Carat)_i
\end{align*}`

---

### Back to diamonds

<img src="ols_files/figure-html/unnamed-chunk-19-1.png" width="90%" style="display: block; margin: auto;" />

`\begin{align*}
\widehat{Price}_i = -2458.2 + 8028.8(Carat)_i
\end{align*}`
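
---

### Where do those numbers come from?

For a single predictor, minimizing the squared-residual loss has a well-known closed-form solution:

`\begin{align*}
\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}
\end{align*}`

As a quick sketch (assuming the same `diamonds` data frame used for the plots above), we can compute the estimates directly:

```r
# Closed-form OLS estimates for a single predictor
x <- diamonds$carat
y <- diamonds$price

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Should match the fitted line above: -2458.2 and 8028.8
c(b0, b1)
```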

---

### Back to diamonds

`\begin{align*}
\widehat{Price}_i = -2458.2 + 8028.8(Carat)_i
\end{align*}`

.question[
- How might we interpret these parameter estimates?
- What would you predict the average price of a 1.5 carat diamond to be?
- What would you predict the average price of a 15 carat diamond to be?
]

---

### Who *doesn't* love Novuary?!

---

### Fitting linear models in R

```r
m1 <- lm(price ~ carat, data = diamonds)
summary(m1)
```

```
## 
## Call:
## lm(formula = price ~ carat, data = diamonds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5931.3  -900.4    -0.1   628.4  8574.0 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2458.19      97.11  -25.32   <2e-16
## carat        8028.78     105.73   75.94   <2e-16
## 
## Residual standard error: 1529 on 998 degrees of freedom
## Multiple R-squared:  0.8525, Adjusted R-squared:  0.8523 
## F-statistic:  5766 on 1 and 998 DF,  p-value: < 2.2e-16
```

---

### The tidymodels framework

```r
library(tidymodels)
```

- A collection of packages for modeling and machine learning using tidyverse principles
- Consistent syntax across many different model types
- Streamlines the modeling workflow: common tasks like splitting data into training and testing sets, transforming or creating variables, assessing model performance, and using the model for prediction or inference no longer need to be programmed by hand

Check out the (free!) online book [Tidy Modeling with R](https://www.tmwr.org/) or the [official website](https://www.tidymodels.org/) for more details.

---

### Fitting linear models with tidymodels

```r
tidy_mod <- linear_reg() |>
  set_engine("lm") |>
  fit(price ~ carat, data = diamonds)
tidy_mod
```

```
## parsnip model object
## 
## 
## Call:
## stats::lm(formula = price ~ carat, data = data)
## 
## Coefficients:
## (Intercept)        carat  
##       -2458         8029  
```

---

### Fitting linear models with tidymodels

```r
linear_reg() |>
  set_engine("lm") |>
  fit(price ~ carat, data = diamonds) |>
  tidy()
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)   -2458.      97.1     -25.3 1.32e-109
## 2 carat          8029.     106.       75.9 0        
```

---

### Predicting

```r
new_carats <- tibble(carat = c(0.5, 1, 2))
predict(tidy_mod, new_carats)
```

```
## # A tibble: 3 x 1
##   .pred
##   <dbl>
## 1 1556.
## 2 5571.
## 3 13599.
```
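
---

### Checking predictions by hand

As a quick sanity check, these predictions are just the fitted line evaluated at each new carat value. A minimal sketch using the coefficients from `m1`, the same model fit with `lm()` earlier:

```r
# price-hat = beta0-hat + beta1-hat * carat, evaluated at the new values
coef(m1)
coef(m1)["(Intercept)"] + coef(m1)["carat"] * c(0.5, 1, 2)
```

For example, for a 2-carat diamond: `\(-2458.19 + 8028.78 \times 2 \approx 13599\)`, matching the `.pred` column above.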