Linear Mixed Models (2)

class: center, middle, inverse, title-slide

# Linear Mixed Models (2)
### Yue Jiang
### STA 210 / Duke University / Spring 2023

---

### Coffee

---

### A mixed effects model

`\begin{align*}
Score_{ij} &= \beta_{0j} + \beta_{1j}Aroma_{ij} + \epsilon_{ij}\\
\beta_{0j} &= \gamma_{00} + u_{0j}\\
\beta_{1j} &= \gamma_{10} + u_{1j}
\end{align*}`

`\begin{align*}
Score_{ij} &= (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j})Aroma_{ij} + \epsilon_{ij}\\
\end{align*}`

with some correlational structure for the `$u_{0j}$`s and `$u_{1j}$`s (perhaps
jointly) and `$\epsilon$`s.

---

### Fitting mixed models in R

```r
library(lme4)

m1 <- lm(score ~ process + aroma + flavor + body + country, 
         data = coffee)
m2 <- lmer(score ~ 1 + process + aroma + flavor + body + 
             (1 | country), data = coffee)
m3 <- lmer(score ~ 1 + process + flavor + body + 
             (1 + aroma | country), data = coffee)
```

---

### Fitting mixed models in R

```r
summary(m1)
```

```
## 
## Call:
## lm(formula = score ~ process + aroma + flavor + body + country, 
##     data = coffee)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.2591 -0.2129  0.2461  0.6575  3.8947 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         27.80043    1.71419  16.218  < 2e-16
## processWashed / Wet  0.29849    0.15975   1.868  0.06210
## aroma                1.20418    0.24502   4.915 1.10e-06
## flavor               4.90092    0.24946  19.646  < 2e-16
## body                 1.10387    0.25909   4.260 2.31e-05
## countryColombia      0.21154    0.22570   0.937  0.34892
## countryEthiopia      0.05084    0.32905   0.155  0.87725
## countryGuatemala    -0.18649    0.22868  -0.816  0.41505
## countryMexico       -0.51171    0.22248  -2.300  0.02173
## countryTaiwan       -0.12561    0.26702  -0.470  0.63820
## countryUS (Hawaii)  -0.81508    0.24732  -3.296  0.00103
## 
## Residual standard error: 1.37 on 732 degrees of freedom
## Multiple R-squared:  0.7228,	Adjusted R-squared:  0.719 
## F-statistic: 190.8 on 10 and 732 DF,  p-value: < 2.2e-16
```

---

### Fitting mixed models in R

```r
summary(m2)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ 1 + process + aroma + flavor + body + (1 | country)
##    Data: coffee
## 
## REML criterion at convergence: 2591.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.7608 -0.1413  0.1801  0.4890  2.8766 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 0.09761  0.3124  
##  Residual             1.87680  1.3700  
## Number of obs: 743, groups:  country, 7
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          27.1050     1.6896  16.042
## processWashed / Wet   0.3174     0.1486   2.137
## aroma                 1.2240     0.2443   5.011
## flavor                4.9271     0.2482  19.855
## body                  1.1220     0.2574   4.359
## 
## Correlation of Fixed Effects:
##             (Intr) prcW/W aroma  flavor
## prcssWshd/W -0.208                     
## aroma       -0.283 -0.025              
## flavor      -0.038  0.036 -0.567       
## body        -0.548  0.123 -0.158 -0.393
```

---

### Fitting mixed models in R

```r
summary(m2)$coef
```

```
##                       Estimate Std. Error   t value
## (Intercept)         27.1049933  1.6896244 16.042023
## processWashed / Wet  0.3174144  0.1485559  2.136666
## aroma                1.2240265  0.2442577  5.011210
## flavor               4.9271276  0.2481529 19.855207
## body                 1.1219892  0.2573773  4.359316
```

---

### Fitting mixed models in R

```r
summary(m3)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ 1 + process + flavor + body + (1 + aroma | country)
##    Data: coffee
## 
## REML criterion at convergence: 2575.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.5716 -0.1567  0.1748  0.4798  2.6732 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  country  (Intercept) 154.954  12.448        
##           aroma         2.547   1.596   -1.00
##  Residual               1.807   1.344        
## Number of obs: 743, groups:  country, 7
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          36.6355     2.1943  16.696
## processWashed / Wet   0.3395     0.1373   2.472
## flavor                4.7971     0.2494  19.236
## body                  1.2599     0.2551   4.938
## 
## Correlation of Fixed Effects:
##             (Intr) prcW/W flavor
## prcssWshd/W -0.204              
## flavor      -0.551  0.037       
## body        -0.572  0.149 -0.366
```

---

### Fitting mixed models in R

```r
summary(m3)$coef
```

```
##                      Estimate Std. Error   t value
## (Intercept)         36.635509  2.1942540 16.696111
## processWashed / Wet  0.339477  0.1373031  2.472464
## flavor               4.797067  0.2493800 19.235973
## body                 1.259942  0.2551478  4.938086
```

.question[
Why isn't a p-value provided? (Actually, don't answer this question...it's very 
complicated)
]

---

### Review

A simple random slope + intercept model:

`\begin{align*}
y_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})x_{ij} + \epsilon_{ij}
\end{align*}`

- Random intercept: heterogeneity at `$x_{ij} = 0$` (context-specific)
- Random slope: heterogeneity in relationship between `$Y$` and `$X$`

Assumptions on `$F_{\mathbf{u}}$` and `$\epsilon$`

---

### Longitudinal data

Longitudinal data occur all the time, and are a flexible way of modeling data
that have multiple measures per observation (regardless of at what time they 
were measured). For instance:

- Examining participant weight loss in a bariatric surgery study
- Examining test scores in a cohort of schoolchildren
- Examining how opinions or attitudes toward climate change shift in a voting bloc

etc.

.question[
Why might we want to perform a longitudinal data over a cross-sectional one?
]

---

### Truck drivers

Belenky et al. (2003) conducted a study that examined the effects of sleep
deprivation on reaction time.

---

### The data

.question[
What do you notice? 
]

---

### The data

.question[
What do you notice?
]

---

### Driver-specific models

.question[
What do you notice? 
]

---

### Driver-specific models

(we could also examine individual CIs)

---

### Driver-specific models

(we could also examine individual CIs)

---

### Driver-specific models

.question[
What correlation structure might be appropriate for random effects?
]

---

### A linear mixed model

`\begin{align*}
Reaction_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})Day_{ij} + \epsilon_{ij}
\end{align*}`

.question[
What do the random intercept and random slope represent for this example? 
]

---

### A linear mixed model

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  251.405      6.825  36.838
## Days          10.467      1.546   6.771
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138
```

---

### A linear mixed model

```r
coef(m1)
```

```
## $Subject
##     (Intercept)       Days
## 308    253.6637 19.6662617
## 309    211.0064  1.8476053
## 310    212.4447  5.0184295
## 330    275.0957  5.6529356
## 331    273.6654  7.3973743
## 332    260.4447 10.1951090
## 333    268.2456 10.2436499
## 334    244.1725 11.5418676
## 335    251.0714 -0.2848792
## 337    286.2956 19.0955511
## 349    226.1949 11.6407181
## 350    238.3351 17.0815038
## 351    255.9830  7.4520239
## 352    272.2688 14.0032871
## 369    254.6806 11.3395008
## 370    225.7921 15.2897709
## 371    252.2122  9.4791297
## 372    263.7197 11.7513080
## 
## attr(,"class")
## [1] "coef.mer"
```

---

### GLMMs

```
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: I(Reaction > 250) ~ 1 + Days + (1 + Days | Subject)
##    Data: sleepstudy
## 
##      AIC      BIC   logLik deviance df.resid 
##    107.9    123.9    -49.0     97.9      175 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.38545  0.00194  0.01931  0.12337  1.25673 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr 
##  Subject (Intercept) 13.7152  3.7034        
##          Days         0.9035  0.9505   -0.14
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.8604     1.3104   0.657   0.5114
## Days          1.1188     0.4892   2.287   0.0222
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.329
```

---

### Conditional vs. marginal models

Interpreting the fixed effects are the *conditional* slopes for someone. Even 
though we have only a single fixed effect here, we are still performing 
estimation conditionally on something.

.question[
What are we "holding constant" in this model?
]

We are conditioning on the random effect - here, each trucker (or at the very
least, truckers with the same random effects).

---

### Conditional vs. marginal models

If there is large between-group variability (i.e., in reaction times based on
trucker), then maybe the relative associations due to the fixed effects (how
many days each trucker was subjected to sleep deprivation) would be small.

We might want to examine the slopes corresponding to fixed effects (of primary
interest in this case) at different levels of the random effects, or even what
the average fixed effects are while **marginalizing** out the random effects.

.question[
How might we do that?
]

---

### Generalized estimating equations

GEEs are often used for clustered or
longitudinal data when we are interested in marginal effects of parameter 
estimates - that is, a .vocab[population average effect].
This is in contrast to the cluster-specific models being estimated previously.
---

### Generalized estimating equations

They're useful in a lot of cases! Longitudinal data, clustered data, even
spatial data.

The key point: GEEs estimate the population **average** without trying to model
the covariance structure "within each subject" (like when we estimate subject-
specific random effects in mixed models).

.question[
"For a one-unit increase in X in the *population*, how much would the average
Y be expected to change?"
]

Be very careful with this difference in interpretation.

---

### Correlation structure

We can specify a correlation structure based on prior knowledge of the way 
observations are correlated:

- Independent (time points are independent)
- Exchangeable (pairs of points have identical correlations)
- Autoregressive (time points have autoregressive structure, so correlation
between two observations one unit apart is `$\rho$`, two units apart is `$\rho^2$`,
etc.)
- Unstructured (unique correlation estimated for each pair of points)
- ...and others

(good news: it's ok if we're wrong!)

---

### Fitting a GEE

```r
library(gee)
m3 <- gee(Reaction ~ Days, data = sleepstudy,
          id = Subject,
          family = "gaussian",
          corstr = "exchangeable")
```

```
## (Intercept)        Days 
##   251.40510    10.46729
```

---

### Fitting a GEE

```
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Identity 
##  Variance to Mean Relation: Gaussian 
##  Correlation Structure:     Exchangeable 
## 
## Call:
## gee(formula = Reaction ~ Days, id = Subject, data = sleepstudy, 
##     family = "gaussian", corstr = "exchangeable")
## 
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -110.847693  -27.482909    1.546014   26.141904  139.953079 
## 
## 
## Coefficients:
##              Estimate Naive S.E.  Naive z Robust S.E. Robust z
## (Intercept) 251.40510  9.5378078 26.35879    6.632277  37.9063
## Days         10.46729  0.8109579 12.90731    1.502237   6.9678
## 
## Estimated Scale Parameter:  2276.694
## Number of Iterations:  1
## 
## Working Correlation
##            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
##  [1,] 1.0000000 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385
##  [2,] 0.5710385 1.0000000 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385
##  [3,] 0.5710385 0.5710385 1.0000000 0.5710385 0.5710385 0.5710385 0.5710385
##  [4,] 0.5710385 0.5710385 0.5710385 1.0000000 0.5710385 0.5710385 0.5710385
##  [5,] 0.5710385 0.5710385 0.5710385 0.5710385 1.0000000 0.5710385 0.5710385
##  [6,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 1.0000000 0.5710385
##  [7,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 1.0000000
##  [8,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385
##  [9,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385
## [10,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385
##            [,8]      [,9]     [,10]
##  [1,] 0.5710385 0.5710385 0.5710385
##  [2,] 0.5710385 0.5710385 0.5710385
##  [3,] 0.5710385 0.5710385 0.5710385
##  [4,] 0.5710385 0.5710385 0.5710385
##  [5,] 0.5710385 0.5710385 0.5710385
##  [6,] 0.5710385 0.5710385 0.5710385
##  [7,] 0.5710385 0.5710385 0.5710385
##  [8,] 1.0000000 0.5710385 0.5710385
##  [9,] 0.5710385 1.0000000 0.5710385
## [10,] 0.5710385 0.5710385 1.0000000
```

---

### Fitting a GEE

```r
m4 <- gee(Reaction ~ Days, data = sleepstudy,
          id = Subject,
          family = "gaussian",
          corstr = "AR-M", Mv = 1)
```

```
## (Intercept)        Days 
##   251.40510    10.46729
```

---

### Fitting a GEE

```
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Identity 
##  Variance to Mean Relation: Gaussian 
##  Correlation Structure:     AR-M , M = 1 
## 
## Call:
## gee(formula = Reaction ~ Days, id = Subject, data = sleepstudy, 
##     family = "gaussian", corstr = "AR-M", Mv = 1)
## 
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -112.927733  -29.566326   -0.533766   24.060825  137.872000 
## 
## 
## Coefficients:
##              Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept) 253.48930   10.70967 23.669193    6.356466 39.878969
## Days         10.46677    1.67394  6.252775    1.439438  7.271427
## 
## Estimated Scale Parameter:  2281.077
## Number of Iterations:  2
## 
## Working Correlation
##            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
##  [1,] 1.0000000 0.7670216 0.5883221 0.4512558 0.3461229 0.2654837 0.2036318
##  [2,] 0.7670216 1.0000000 0.7670216 0.5883221 0.4512558 0.3461229 0.2654837
##  [3,] 0.5883221 0.7670216 1.0000000 0.7670216 0.5883221 0.4512558 0.3461229
##  [4,] 0.4512558 0.5883221 0.7670216 1.0000000 0.7670216 0.5883221 0.4512558
##  [5,] 0.3461229 0.4512558 0.5883221 0.7670216 1.0000000 0.7670216 0.5883221
##  [6,] 0.2654837 0.3461229 0.4512558 0.5883221 0.7670216 1.0000000 0.7670216
##  [7,] 0.2036318 0.2654837 0.3461229 0.4512558 0.5883221 0.7670216 1.0000000
##  [8,] 0.1561900 0.2036318 0.2654837 0.3461229 0.4512558 0.5883221 0.7670216
##  [9,] 0.1198011 0.1561900 0.2036318 0.2654837 0.3461229 0.4512558 0.5883221
## [10,] 0.0918900 0.1198011 0.1561900 0.2036318 0.2654837 0.3461229 0.4512558
##            [,8]      [,9]     [,10]
##  [1,] 0.1561900 0.1198011 0.0918900
##  [2,] 0.2036318 0.1561900 0.1198011
##  [3,] 0.2654837 0.2036318 0.1561900
##  [4,] 0.3461229 0.2654837 0.2036318
##  [5,] 0.4512558 0.3461229 0.2654837
##  [6,] 0.5883221 0.4512558 0.3461229
##  [7,] 0.7670216 0.5883221 0.4512558
##  [8,] 1.0000000 0.7670216 0.5883221
##  [9,] 0.7670216 1.0000000 0.7670216
## [10,] 0.5883221 0.7670216 1.0000000
```

---

### Fitting a GEE

```r
m5 <- gee(I(Reaction > 250) ~ Days, data = sleepstudy,
          id = Subject,
          family = "binomial",
          corstr = "exchangeable")
```

```
## (Intercept)        Days 
##   0.6020332   0.1947168
```

---

### Fitting a GEE

```
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Logit 
##  Variance to Mean Relation: Binomial 
##  Correlation Structure:     Exchangeable 
## 
## Call:
## gee(formula = I(Reaction > 250) ~ Days, id = Subject, data = sleepstudy, 
##     family = "binomial", corstr = "exchangeable")
## 
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -0.92668743  0.07331257  0.12696659  0.21094886  0.32951465 
## 
## 
## Coefficients:
##              Estimate Naive S.E.  Naive z Robust S.E. Robust z
## (Intercept) 0.7103810 0.43644464 1.627654   0.4648023 1.528351
## Days        0.2029448 0.06327235 3.207480   0.1126510 1.801536
## 
## Estimated Scale Parameter:  1.153384
## Number of Iterations:  4
## 
## Working Correlation
##            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
##  [1,] 1.0000000 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122
##  [2,] 0.5198122 1.0000000 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122
##  [3,] 0.5198122 0.5198122 1.0000000 0.5198122 0.5198122 0.5198122 0.5198122
##  [4,] 0.5198122 0.5198122 0.5198122 1.0000000 0.5198122 0.5198122 0.5198122
##  [5,] 0.5198122 0.5198122 0.5198122 0.5198122 1.0000000 0.5198122 0.5198122
##  [6,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 1.0000000 0.5198122
##  [7,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 1.0000000
##  [8,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122
##  [9,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122
## [10,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122
##            [,8]      [,9]     [,10]
##  [1,] 0.5198122 0.5198122 0.5198122
##  [2,] 0.5198122 0.5198122 0.5198122
##  [3,] 0.5198122 0.5198122 0.5198122
##  [4,] 0.5198122 0.5198122 0.5198122
##  [5,] 0.5198122 0.5198122 0.5198122
##  [6,] 0.5198122 0.5198122 0.5198122
##  [7,] 0.5198122 0.5198122 0.5198122
##  [8,] 1.0000000 0.5198122 0.5198122
##  [9,] 0.5198122 1.0000000 0.5198122
## [10,] 0.5198122 0.5198122 1.0000000
```