class: center, middle, inverse, title-slide

# Linear Mixed Models (1)

### Yue Jiang

### STA 210 / Duke University / Spring 2023

---

### Coffee

<img src="img/coffee.jpg" width="80%" style="display: block; margin: auto;" />

---

### Dry vs. wet process

In processing coffee beans, there are two methods prior to roasting that affect the flavor profile of the coffee.

Dry process beans are immediately dried after picking. Since the skin of the fruit is attached the entire time, it locks in the sugars, resulting in a sweeter, less acidic taste.

Wet process beans have the skin and pulp of the fruit removed after picking; the seeds then ferment in a wet mass and are soaked in water before drying. This results in a more acidic taste.

We also know that the variety of coffee, the location of harvest, and other factors are associated with the final product.

---

### Coffee

<img src="lmm-1_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

### Coffee

<img src="lmm-1_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

### Processing vs. rating

```
## # A tibble: 2 x 5
##   term                estimate std.error statistic p.value
##   <chr>                  <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)           82.4       0.180    458.   0      
## 2 processWashed / Wet   -0.659     0.211     -3.12 0.00187
```

---

### Processing vs. rating

```
## # A tibble: 5 x 5
##   term                estimate std.error statistic  p.value
##   <chr>                  <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)           24.7       1.57      15.7  3.15e-48
## 2 processWashed / Wet    0.317     0.119      2.67 7.84e- 3
## 3 aroma                  1.28      0.247      5.17 3.04e- 7
## 4 flavor                 5.00      0.250     20.0  3.43e-71
## 5 body                   1.31      0.253      5.18 2.82e- 7
```

---

### Coffee

<img src="lmm-1_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

### Processing vs. rating

```
## # A tibble: 11 x 5
##    term                estimate std.error statistic  p.value
##    <chr>                  <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)          27.8       1.71      16.2   9.15e-51
##  2 processWashed / Wet   0.298     0.160      1.87  6.21e- 2
##  3 aroma                 1.20      0.245      4.91  1.10e- 6
##  4 flavor                4.90      0.249     19.6   2.42e-69
##  5 body                  1.10      0.259      4.26  2.31e- 5
##  6 countryColombia       0.212     0.226      0.937 3.49e- 1
##  7 countryEthiopia       0.0508    0.329      0.155 8.77e- 1
##  8 countryGuatemala     -0.186     0.229     -0.816 4.15e- 1
##  9 countryMexico        -0.512     0.222     -2.30  2.17e- 2
## 10 countryTaiwan        -0.126     0.267     -0.470 6.38e- 1
## 11 countryUS (Hawaii)   -0.815     0.247     -3.30  1.03e- 3
```

---

### Independence violations

In this case, coffee-level observations are *not* independent. We might expect beans grown in a particular country to share similar growing conditions, climate, technique, etc., such that this country-specific relationship violates independence.

It is not appropriate to simply "control for" country, as within each country, coffee beans are more similar to each other.

.question[
What are some consequences of this approach?
]

---

### Independence violations

We get inaccurate estimates of standard errors, leading to invalid inference.

We also fail to shed light on the actual mechanisms we're interested in: in this case, the difference between the within-group and between-group variability of our observations.

---

### Independence violations

How might we deal with a violation of independence due to this clustered structure?

1. Variability in score might be due to within-country variation or to between-country variation. However, at the *country* level, we might be able to assume independence (well...let's say that for now). Why not take the mean values within each country and fit a model on that scale?

2. If the violation of independence is due to nesting within country, why not simply fit a different model for each country (and then compare the results)?

.question[
What are some of the advantages/disadvantages of these two approaches? (Both are sketched below.)
]
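
To make the trade-offs concrete, here is a minimal sketch of both approaches in R. It assumes the `coffee` data frame used later in these slides (with `score`, `aroma`, and `country` columns), and it is an illustration rather than a recommended analysis.

```r
# Approach 1: collapse to one observation per country, then fit OLS
# on the country-level means (all within-country information is lost)
country_means <- aggregate(cbind(score, aroma) ~ country,
                           data = coffee, FUN = mean)
m_between <- lm(score ~ aroma, data = country_means)

# Approach 2: fit a completely separate regression within each country
# (nothing is shared across countries)
m_within <- lapply(split(coffee, coffee$country),
                   function(d) lm(score ~ aroma, data = d))
```

Note that with only 7 countries, approach 1 fits a model to just 7 points, while approach 2 estimates a full set of parameters per country without ever comparing countries on a common scale.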
---

### A different way of thinking

It's often easier to think of a mixed model intuitively in terms of the "levels" at which the observations/variability occur.

For our coffee example, we might be interested in modeling the score for the `\(i\)`th cup of coffee in the `\(j\)`th country:

`\begin{align*}
Score_{ij} &= \beta_{0j} + \beta_{1j}Process_{ij} + \beta_{2j}Aroma_{ij} + \\
&\mathrel{\phantom{=}} \beta_{3j}Flavor_{ij} + \beta_{4j}Body_{ij} + \epsilon_{ij}
\end{align*}`

How we think about the various `\(\beta\)` components might depend on our assumptions for the random effects.

---

### A different way of thinking

For now, let's think of a simplified model that looks at the association between the score (outcome), the aroma rating (predictor), and a random effect due to country. For the `\(i\)`th cup of coffee in the `\(j\)`th country:

`\begin{align*}
Score_{ij} &= \beta_{0j} + \beta_{1j}Aroma_{ij} + \epsilon_{ij}
\end{align*}`

Note that there is no `\(\beta\)` corresponding to country per se. Importantly, we are estimating the variability corresponding to the random effects.

---

### A random intercept model

For the `\(i\)`th cup of coffee in the `\(j\)`th country:

`\begin{align*}
Score_{ij} &= \beta_{0j} + \beta_{1j}Aroma_{ij} + \epsilon_{ij}\\
\beta_{0j} &= \gamma_{00} + u_{0j}\\
\beta_{1j} &= \gamma_{10}
\end{align*}`

--

`\begin{align*}
Score_{ij} &= (\gamma_{00} + u_{0j}) + \gamma_{10}Aroma_{ij} + \epsilon_{ij}
\end{align*}`

In this case, we may also specify `\(\epsilon_{ij} \sim N(0, \sigma^2_{\epsilon})\)` and `\(u_{0j} \sim N(0, \sigma^2_{u})\)`.

.question[
What can we answer with this type of model? How is it different from simply adding in country as a predictor? What is the dimension of `\(\mathbf{Z}\)` here?
]

---

### A random intercept model

`\begin{align*}
Score_{ij} &= (\gamma_{00} + u_{0j}) + \gamma_{10}Aroma_{ij} + \epsilon_{ij},\\
\epsilon_{ij} &\sim N(0, \sigma^2_{\epsilon})\\
u_{0j} &\sim N(0, \sigma^2_{u})
\end{align*}`

- *"What is the relationship between aroma and score (while taking into account the country-level clustering)?"*

- *"How much variability in coffee score occurs at the country level (while knowing that higher aroma ratings are associated with higher scores)?"*

Depending on what we're interested in estimating, we might treat certain pieces of the model as .vocab[nuisance parameters].
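
The second question is about variance partitioning. One standard summary is the .vocab[intraclass correlation], `\(\sigma^2_u / (\sigma^2_u + \sigma^2_{\epsilon})\)`: the share of total variance that occurs between countries. Here is a minimal sketch of computing it, assuming the random intercept fit `m2` defined later in these slides:

```r
library(lme4)

# variance components from the random intercept fit (m2 is defined
# a few slides later): one row for country, one for the residual
vc <- as.data.frame(VarCorr(m2))

# intraclass correlation: share of total variance at the country level
icc <- vc$vcov[vc$grp == "country"] / sum(vc$vcov)
icc
```

With the `m2` output shown later (variances of roughly 0.098 and 1.877), this works out to about 0.05: only around 5% of the variability in score is between countries.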
---

### A random intercept model

<img src="lmm-1_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

### Adding in a random slope

What if we thought that the relationship between aroma and score might vary depending on country?

Previously, we've considered this type of question using interaction effects, but for clustered/hierarchical data like these we may get invalid standard error estimates.

What if we were interested in the variation in score, *due to country*, as a function of the aroma rating of the coffee?

---

### Adding in a random slope

`\begin{align*}
Score_{ij} &= \beta_{0j} + \beta_{1j}Aroma_{ij} + \epsilon_{ij}\\
\beta_{0j} &= \gamma_{00} + u_{0j}\\
\beta_{1j} &= \gamma_{10} + u_{1j}
\end{align*}`

--

`\begin{align*}
Score_{ij} &= (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j})Aroma_{ij} + \epsilon_{ij}
\end{align*}`

with some correlation structure for the `\(u_{0j}\)`s and `\(u_{1j}\)`s (perhaps jointly) and the `\(\epsilon\)`s.

.question[
What can we answer with this type of model? How is it different from simply adding in interactions?
]

---

### Adding in a random slope

<img src="lmm-1_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

### Adding in a random slope

.question[
This looks very similar to an OLS model with interactions. What's the difference here?
]

--

In our random effects model, the slopes are modeled as variables that have a "combined mean" and associated variance parameters. This gives us the ability to "borrow information" about the slopes across the groups.

This leads to the phenomenon of "partial pooling" toward the group mean slope: the random slopes are shrunk toward the overall mean (and toward the other slopes). We can see this directly with the models defined on the next slide.

A simple OLS model with interactions doesn't include parameters corresponding to the group mean/variance of the slopes, so there is no pooling effect and no information is borrowed across groups.

---

### Fitting mixed models in R

```r
library(lme4)

# OLS with country as a fixed effect
m1 <- lm(score ~ process + aroma + flavor + body + country,
         data = coffee)

# random intercept for each country
m2 <- lmer(score ~ 1 + process + aroma + flavor + body + (1 | country),
           data = coffee)

# random intercept and a random slope for aroma, by country
m3 <- lmer(score ~ 1 + process + flavor + body + (1 + aroma | country),
           data = coffee)
```
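
Returning to the partial pooling idea: we can compare each country's partially pooled estimates from `m3` with completely unpooled per-country OLS fits. A minimal sketch (for an `lmer` fit, `coef()` returns each group's combined fixed-plus-random coefficients):

```r
# partially pooled country-specific intercepts and aroma slopes
# from m3; one row per country
coef(m3)$country[, c("(Intercept)", "aroma")]

# completely unpooled aroma slopes: a separate OLS fit per country
sapply(split(coffee, coffee$country),
       function(d) coef(lm(score ~ aroma, data = d))["aroma"])
```

The unpooled slopes will typically be more spread out than the partially pooled ones, especially for countries with few observations.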
---

### Fitting mixed models in R

```r
summary(m1)
```

```
## 
## Call:
## lm(formula = score ~ process + aroma + flavor + body + country, 
##     data = coffee)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.2591 -0.2129  0.2461  0.6575  3.8947 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         27.80043    1.71419  16.218  < 2e-16
## processWashed / Wet  0.29849    0.15975   1.868  0.06210
## aroma                1.20418    0.24502   4.915 1.10e-06
## flavor               4.90092    0.24946  19.646  < 2e-16
## body                 1.10387    0.25909   4.260 2.31e-05
## countryColombia      0.21154    0.22570   0.937  0.34892
## countryEthiopia      0.05084    0.32905   0.155  0.87725
## countryGuatemala    -0.18649    0.22868  -0.816  0.41505
## countryMexico       -0.51171    0.22248  -2.300  0.02173
## countryTaiwan       -0.12561    0.26702  -0.470  0.63820
## countryUS (Hawaii)  -0.81508    0.24732  -3.296  0.00103
## 
## Residual standard error: 1.37 on 732 degrees of freedom
## Multiple R-squared:  0.7228, Adjusted R-squared:  0.719 
## F-statistic: 190.8 on 10 and 732 DF,  p-value: < 2.2e-16
```

---

### Fitting mixed models in R

```r
summary(m2)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ 1 + process + aroma + flavor + body + (1 | country)
##    Data: coffee
## 
## REML criterion at convergence: 2591.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.7608 -0.1413  0.1801  0.4890  2.8766 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.09761  0.3124  
##  Residual             1.87680  1.3700  
## Number of obs: 743, groups:  country, 7
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          27.1050     1.6896  16.042
## processWashed / Wet   0.3174     0.1486   2.137
## aroma                 1.2240     0.2443   5.011
## flavor                4.9271     0.2482  19.855
## body                  1.1220     0.2574   4.359
## 
## Correlation of Fixed Effects:
##             (Intr) prcW/W aroma  flavor
## prcssWshd/W -0.208                     
## aroma       -0.283 -0.025              
## flavor      -0.038  0.036 -0.567       
## body        -0.548  0.123 -0.158 -0.393
```

---

### Fitting mixed models in R

```r
summary(m2)$coef
```

```
##                       Estimate Std. Error   t value
## (Intercept)         27.1049933  1.6896244 16.042023
## processWashed / Wet  0.3174144  0.1485559  2.136666
## aroma                1.2240265  0.2442577  5.011210
## flavor               4.9271276  0.2481529 19.855207
## body                 1.1219892  0.2573773  4.359316
```

---

### Fitting mixed models in R

```r
summary(m3)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ 1 + process + flavor + body + (1 + aroma | country)
##    Data: coffee
## 
## REML criterion at convergence: 2575.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.5716 -0.1567  0.1748  0.4798  2.6732 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  country  (Intercept) 154.954  12.448        
##           aroma         2.547   1.596   -1.00
##  Residual               1.807   1.344        
## Number of obs: 743, groups:  country, 7
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          36.6355     2.1943  16.696
## processWashed / Wet   0.3395     0.1373   2.472
## flavor                4.7971     0.2494  19.236
## body                  1.2599     0.2551   4.938
## 
## Correlation of Fixed Effects:
##             (Intr) prcW/W flavor
## prcssWshd/W -0.204              
## flavor      -0.551  0.037       
## body        -0.572  0.149 -0.366
```

---

### Fitting mixed models in R

```r
summary(m3)$coef
```

```
##                      Estimate Std. Error   t value
## (Intercept)         36.635509  2.1942540 16.696111
## processWashed / Wet  0.339477  0.1373031  2.472464
## flavor               4.797067  0.2493800 19.235973
## body                 1.259942  0.2551478  4.938086
```

.question[
Why isn't a p-value provided? (Actually, don't answer this question...it's very complicated)
]
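
That said, if you do eventually want approximate p-values, one common approach (not covered in these slides) is the `lmerTest` package, which reports Satterthwaite-approximated degrees of freedom and p-values. A minimal sketch:

```r
# lmerTest masks lme4::lmer; refitting m2 this way adds df and
# Pr(>|t|) columns to the coefficient table
library(lmerTest)
m2_p <- lmer(score ~ 1 + process + aroma + flavor + body + (1 | country),
             data = coffee)
summary(m2_p)$coef
```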