class: center, middle, inverse, title-slide

# Transformations

### Yue Jiang

### STA 210 / Duke University / Spring 2024

---

### Exam matters

---

### Diamonds!

<img src="img/marilyn.jpg" width="90%" style="display: block; margin: auto;" />

---

### Residual plots

<img src="transformations_files/figure-html/unnamed-chunk-3-1.png" width="90%" style="display: block; margin: auto;" />

---

### Residual plots

<img src="transformations_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" />

---

### A quick question

.question[
Do you think a change in income from 30k to 60k per year is more "similar" to a change from 50k to 100k, or to a change from 430k to 460k?
]

---

### What do logarithms actually do?

`\begin{align*}
10^4 &= 10,000 \\
10^3 &= 1,000 \\
10^2 &= 100 \\
10^1 &= 10 \\
10^0 &= 1 \\
10^{-1} &= \frac{1}{10}\\
10^{-2} &= \frac{1}{100}
\end{align*}`

.question[
How much "larger" is 1,000 than 10? How about 64 vs. 1/2?
]

---

### Logarithms in real life

- The Richter scale is a logarithmic scale of earthquake intensity such that a 1,000 times difference in earthquake intensity corresponds to a 2 unit difference on the Richter scale.
- The decibel scale is a logarithmic scale such that a 10 times difference in sound intensity corresponds to a 10 unit difference on the decibel scale (e.g., a 130 decibel sound is 10 times as powerful as a 120 decibel sound; a 70 decibel sound is 10 times as powerful as a 60 decibel sound).
- The apparent magnitude of stars is on a logarithmic scale such that a 100 times difference in visual brightness from Earth corresponds to a 5 unit difference in apparent magnitude (in this case a decrease, such that a lower magnitude means a brighter star).

.question[
The Sun has an apparent magnitude of -26.8. The full moon has an apparent magnitude of -12.7. How much brighter is the Sun to us compared to the moon (on a multiplicative scale)?
]

---

### Logarithms in real life

<img src="img/keys.png" width="90%" style="display: block; margin: auto;" />

Most guitars, pianos, and other instruments in Western music are tuned in 12-tone equal temperament. This is a logarithmic scale such that a 2 times difference in frequency (i.e., the next octave higher) corresponds to a 12 unit difference in note (there are 12 half-steps per octave).

Find out why the [twelfth root of two](https://en.wikipedia.org/wiki/Twelfth_root_of_two) has its own Wikipedia page!

---

### Logarithms in real life

<img src="img/moore.png" width="100%" style="display: block; margin: auto;" />

---

### What do logarithms actually do?

Logarithms allow us to "linearize" multiplicative relationships. For instance, on the log scale (base ten), 100 is just as far from 10,000 as 10 is from 1,000. In both cases, you have to multiply by ten twice to get from one number to the other.

We can see this by writing the numbers out using exponents and logs. In base 10, log(10,000) = 4 and log(100) = 2, for a linear difference of 2. Similarly, log(1,000) = 3 and log(10) = 1, again a linear difference of 2.

---

### Log-transformed responses

<img src="transformations_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

### Log-transformed responses

<img src="transformations_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

### Let's fit a new model...

<img src="transformations_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

### Let's fit a new model...

```r
m2 <- lm(log(price) ~ carat, data = diamonds)
summary(m2)
```

```
## 
## Call:
## lm(formula = log(price) ~ carat, data = diamonds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2052 -0.2128  0.0353  0.2548  1.0092 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  6.14621    0.02295  267.85   <2e-16
## carat        2.06657    0.02499   82.71   <2e-16
## 
## Residual standard error: 0.3614 on 998 degrees of freedom
## Multiple R-squared:  0.8727, Adjusted R-squared:  0.8726 
## F-statistic:  6841 on 1 and 998 DF,  p-value: < 2.2e-16
```

---

### Let's fit a new model...

`\begin{align*}
\log(price_i | carat_i) &= \beta_0 + \beta_1~carat_i + \epsilon_i\\
\widehat{\log(price_i)} &= 6.15 + 2.07~carat_i
\end{align*}`

.question[
How might we interpret parameters in this model? How might we "undo" the log-transformation?
]

**Important note**: `\(E(\log(Y)) \neq \log(E(Y))\)`! (...I know interpretations are already annoying enough.) I'm being intentionally vague about "predicted log price" here.

---

### Let's fit a new model...

`\begin{align*}
\widehat{\log(price_i)} &= 6.15 + 2.07~carat_i\\
\widehat{price_i} &= \exp(6.15 + 2.07~carat_i)\\
&= \exp(6.15)\exp(2.07~carat_i)
\end{align*}`

.question[
How might we compare two observations "one unit apart" in carat (plug in some numbers and try it!)?
]

---

### Let's fit a new model...

```r
exp(6.15 + 2.07*1)/exp(6.15 + 2.07*0)
```

```
## [1] 7.924823
```

```r
exp(6.15 + 2.07*2)/exp(6.15 + 2.07*1)
```

```
## [1] 7.924823
```

```r
exp(6.15 + 2.07*-8)/exp(6.15 + 2.07*-9)
```

```
## [1] 7.924823
```

```r
exp(2.07)
```

```
## [1] 7.924823
```

.question[
How might we interpret slopes with log-transformed responses?
]

---

### Interpretations on log-transformed responses

For two observations one unit apart (e.g., one diamond that's one carat larger than another), the larger diamond is predicted to be approximately 7.9 times as pricey as the smaller diamond. Of course, this would be adjusting for any other variables in our model.

Briefly: focus on the exponentiated (or other bases, depending on the exact model fit) regression coefficients. The exponentiated slopes correspond to *multiplicative differences* in the outcome.
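We can verify this constant-ratio property directly (a quick sketch using the rounded coefficients 6.15 and 2.07 from the summary above; `pred_price` is just a throwaway helper):

```r
# Predicted price from the log-scale model, using rounded coefficients
pred_price <- function(carat) exp(6.15 + 2.07 * carat)

# The ratio of predictions one carat apart is the same no matter where
# you start, and equals exp(slope)
pred_price(2) / pred_price(1)
exp(2.07)
# both equal 7.924823
```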
**For this specific case**, it's alright not to use interpretations involving conditional expectations. There are technical interpretations involving [expected geometric means](https://en.wikipedia.org/wiki/Geometric_mean), but don't worry about them.

---

### Taking a quick break

What if, instead of log-transforming the response, we log-transformed a predictor?

`\begin{align*}
y_i &= \beta_0 + \beta_1~\underbrace{\log_{10}(x_i)}_{z_i} + \epsilon_i
\end{align*}`

.question[
How might we interpret `\(\beta_1\)` in terms of `\(z_i\)`? How does this relate to `\(x_i\)`? How about `\(\beta_0\)`?

Note that the response is not log-transformed, so the notion of "expectation" fully applies (ugh) in these interpretations - we are indeed fitting models for the conditional expectation of `\(y_i\)` given the predictors.
]

---

### Taking a quick break

`\begin{align*}
y_i &= \beta_0 + \beta_1~\underbrace{\log_{2}(x_i)}_{w_i} + \epsilon_i
\end{align*}`

.question[
How about `\(\beta_1\)` in the model above? How about `\(\beta_0\)`?
]

---

### Taking a quick break

Let's think about an interpretation in a different way, using natural logs here. Compare the prediction at `\(x_i\)` to the prediction at `\(1.01 \times x_i\)` (a 1% difference in the predictor):

`\begin{align*}
\widehat{y}_i &= \widehat{\beta}_0 + \widehat{\beta}_1~\log(x_i)\\
\widehat{y}_i^\prime &= \widehat{\beta}_0 + \widehat{\beta}_1~\underbrace{\log(1.01 \times x_i)}_{1\%~difference}
\end{align*}`

Then their difference is:

`\begin{align*}
\widehat{y}_i^\prime - \widehat{y}_i &= \widehat{\beta}_1(\log(1.01x_i)) - \widehat{\beta}_1(\log(x_i))\\
&= \widehat{\beta}_1(\log(1.01x_i) - \log(x_i))\\
&= \widehat{\beta}_1(\log(1.01x_i / x_i))\\
&= \widehat{\beta}_1(\log~1.01)
\end{align*}`

---

### Another model

Instead of a log-transformed predictor, how about another transformation on the predictor? In what situation would this be a reasonable transformation?

`\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i
\end{align*}`

.question[
What is the expected change in `\(y\)` given a one unit change in `\(x\)` here?
(Yes, we're working with conditional expectations again; the only issue arises when the outcome itself has a transformation applied.)
]

---

### Transformations on both sides

`\begin{align*}
\log(y_i) &= \beta_0 + \beta_1~\log(x_i) + \epsilon_i
\end{align*}`

.question[
What would we predict to happen to `\(y\)` given changes in `\(x\)`?
]

---

### Back to carat vs. log10 price

<img src="transformations_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---

### Back to carat vs. log10 price

<img src="transformations_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

### Back to carat vs. log10 price

<img src="transformations_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

### Non-linear pattern in residuals

Constant variance looks ok, but linearity is violated!

.question[
What might we do?
]

---

### A third model...

`\begin{align*}
\log_{10}(y_i) = \beta_0 + \beta_1~carat_i + \beta_2~carat_i^2 + \epsilon_i
\end{align*}`

```r
m3 <- lm(log10(price) ~ carat + I(carat^2), data = diamonds)
summary(m3)
```

```
## 
## Call:
## lm(formula = log10(price) ~ carat + I(carat^2), data = diamonds)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.48842 -0.07275  0.00335  0.07390  0.35744 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.33675    0.01247  187.34   <2e-16
## carat        1.77882    0.02843   62.57   <2e-16
## I(carat^2)  -0.43790    0.01361  -32.17   <2e-16
## 
## Residual standard error: 0.11 on 997 degrees of freedom
## Multiple R-squared:  0.9375, Adjusted R-squared:  0.9374 
## F-statistic:  7483 on 2 and 997 DF,  p-value: < 2.2e-16
```

---

### A third model...

<img src="transformations_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---

### A third model...

<img src="transformations_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

---

### A third model...
`\begin{align*}
\widehat{\log_{10}(price)}_i = 2.34 + 1.78~carat_i - 0.44(carat_i)^2
\end{align*}`

.question[
The model seems to fit pretty well. What do we predict happens to price given changes in carat size?
]
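To see the difference from the purely log-linear model concretely, here is a small sketch (using the rounded coefficients 2.34, 1.78, and -0.44 from the summary above; `pred_price` is a throwaway helper): with the squared term, the multiplicative change in predicted price for a half-carat increase is no longer constant - it shrinks as carat grows.

```r
# Predicted price from the quadratic log10-scale model (rounded coefficients)
pred_price <- function(carat) 10^(2.34 + 1.78 * carat - 0.44 * carat^2)

# The ratio for the same half-carat step depends on where you start:
pred_price(1.0) / pred_price(0.5)  # step from 0.5 to 1 carat
pred_price(1.5) / pred_price(1.0)  # step from 1 to 1.5 carats: a smaller ratio
```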