class: center, middle, inverse, title-slide # Linear Mixed Models (2) ### Yue Jiang ### STA 210 / Duke University / Spring 2023 --- ### Coffee <img src="img/coffee.jpg" width="80%" style="display: block; margin: auto;" /> --- ### A mixed effects model `\begin{align*} Score_{ij} &= \beta_{0j} + \beta_{1j}Aroma_{ij} + \epsilon_{ij}\\ \beta_{0j} &= \gamma_{00} + u_{0j}\\ \beta_{1j} &= \gamma_{10} + u_{1j} \end{align*}` `\begin{align*} Score_{ij} &= (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j})Aroma_{ij} + \epsilon_{ij}\\ \end{align*}` with some correlational structure for the `\(u_{0j}\)`s and `\(u_{1j}\)`s (perhaps jointly) and `\(\epsilon\)`s. --- ### Fitting mixed models in R ```r library(lme4) m1 <- lm(score ~ process + aroma + flavor + body + country, data = coffee) m2 <- lmer(score ~ 1 + process + aroma + flavor + body + (1 | country), data = coffee) m3 <- lmer(score ~ 1 + process + flavor + body + (1 + aroma | country), data = coffee) ``` --- ### Fitting mixed models in R ```r summary(m1) ``` ``` ## ## Call: ## lm(formula = score ~ process + aroma + flavor + body + country, ## data = coffee) ## ## Residuals: ## Min 1Q Median 3Q Max ## -9.2591 -0.2129 0.2461 0.6575 3.8947 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 27.80043 1.71419 16.218 < 2e-16 ## processWashed / Wet 0.29849 0.15975 1.868 0.06210 ## aroma 1.20418 0.24502 4.915 1.10e-06 ## flavor 4.90092 0.24946 19.646 < 2e-16 ## body 1.10387 0.25909 4.260 2.31e-05 ## countryColombia 0.21154 0.22570 0.937 0.34892 ## countryEthiopia 0.05084 0.32905 0.155 0.87725 ## countryGuatemala -0.18649 0.22868 -0.816 0.41505 ## countryMexico -0.51171 0.22248 -2.300 0.02173 ## countryTaiwan -0.12561 0.26702 -0.470 0.63820 ## countryUS (Hawaii) -0.81508 0.24732 -3.296 0.00103 ## ## Residual standard error: 1.37 on 732 degrees of freedom ## Multiple R-squared: 0.7228, Adjusted R-squared: 0.719 ## F-statistic: 190.8 on 10 and 732 DF, p-value: < 2.2e-16 ``` --- ### Fitting mixed models in R ```r summary(m2) ``` ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: score ~ 1 + process + aroma + flavor + body + (1 | country) ## Data: coffee ## ## REML criterion at convergence: 2591.2 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -6.7608 -0.1413 0.1801 0.4890 2.8766 ## ## Random effects: ## Groups Name Variance Std.Dev. ## country (Intercept) 0.09761 0.3124 ## Residual 1.87680 1.3700 ## Number of obs: 743, groups: country, 7 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 27.1050 1.6896 16.042 ## processWashed / Wet 0.3174 0.1486 2.137 ## aroma 1.2240 0.2443 5.011 ## flavor 4.9271 0.2482 19.855 ## body 1.1220 0.2574 4.359 ## ## Correlation of Fixed Effects: ## (Intr) prcW/W aroma flavor ## prcssWshd/W -0.208 ## aroma -0.283 -0.025 ## flavor -0.038 0.036 -0.567 ## body -0.548 0.123 -0.158 -0.393 ``` --- ### Fitting mixed models in R ```r summary(m2)$coef ``` ``` ## Estimate Std. Error t value ## (Intercept) 27.1049933 1.6896244 16.042023 ## processWashed / Wet 0.3174144 0.1485559 2.136666 ## aroma 1.2240265 0.2442577 5.011210 ## flavor 4.9271276 0.2481529 19.855207 ## body 1.1219892 0.2573773 4.359316 ``` --- ### Fitting mixed models in R ```r summary(m3) ``` ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: score ~ 1 + process + flavor + body + (1 + aroma | country) ## Data: coffee ## ## REML criterion at convergence: 2575.5 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -6.5716 -0.1567 0.1748 0.4798 2.6732 ## ## Random effects: ## Groups Name Variance Std.Dev. Corr ## country (Intercept) 154.954 12.448 ## aroma 2.547 1.596 -1.00 ## Residual 1.807 1.344 ## Number of obs: 743, groups: country, 7 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 36.6355 2.1943 16.696 ## processWashed / Wet 0.3395 0.1373 2.472 ## flavor 4.7971 0.2494 19.236 ## body 1.2599 0.2551 4.938 ## ## Correlation of Fixed Effects: ## (Intr) prcW/W flavor ## prcssWshd/W -0.204 ## flavor -0.551 0.037 ## body -0.572 0.149 -0.366 ``` --- ### Fitting mixed models in R ```r summary(m3)$coef ``` ``` ## Estimate Std. Error t value ## (Intercept) 36.635509 2.1942540 16.696111 ## processWashed / Wet 0.339477 0.1373031 2.472464 ## flavor 4.797067 0.2493800 19.235973 ## body 1.259942 0.2551478 4.938086 ``` .question[ Why isn't a p-value provided? (Actually, don't answer this question...it's very complicated) ] --- ### Review A simple random slope + intercept model: `\begin{align*} y_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})x_{ij} + \epsilon_{ij} \end{align*}` - Random intercept: heterogeneity at `\(x_{ij} = 0\)` (context-specific) - Random slope: heterogeneity in relationship between `\(Y\)` and `\(X\)` Assumptions on `\(F_{\mathbf{u}}\)` and `\(\epsilon\)` --- ### Longitudinal data Longitudinal data occur all the time, and are a flexible way of modeling data that have multiple measures per observation (regardless of at what time they were measured). For instance: - Examining participant weight loss in a bariatric surgery study - Examining test scores in a cohort of schoolchildren - Examining how opinions or attitudes toward climate change shift in a voting bloc etc. .question[ Why might we want to perform a longitudinal data over a cross-sectional one? ] --- ### Truck drivers <img src="img/truck.jpg" width="80%" style="display: block; margin: auto;" /> Belenky et al. (2003) conducted a study that examined the effects of sleep deprivation on reaction time. --- ### The data <img src="lmm-2_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> .question[ What do you notice? ] --- ### The data <img src="lmm-2_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> .question[ What do you notice? ] --- ### Driver-specific models <img src="lmm-2_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> .question[ What do you notice? ] --- ### Driver-specific models <img src="lmm-2_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> (we could also examine individual CIs) --- ### Driver-specific models <img src="lmm-2_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> (we could also examine individual CIs) --- ### Driver-specific models <img src="lmm-2_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> .question[ What correlation structure might be appropriate for random effects? ] --- ### A linear mixed model `\begin{align*} Reaction_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})Day_{ij} + \epsilon_{ij} \end{align*}` .question[ What do the random intercept and random slope represent for this example? ] --- ### A linear mixed model ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: Reaction ~ 1 + Days + (1 + Days | Subject) ## Data: sleepstudy ## ## REML criterion at convergence: 1743.6 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.9536 -0.4634 0.0231 0.4634 5.1793 ## ## Random effects: ## Groups Name Variance Std.Dev. Corr ## Subject (Intercept) 612.10 24.741 ## Days 35.07 5.922 0.07 ## Residual 654.94 25.592 ## Number of obs: 180, groups: Subject, 18 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 251.405 6.825 36.838 ## Days 10.467 1.546 6.771 ## ## Correlation of Fixed Effects: ## (Intr) ## Days -0.138 ``` --- ### A linear mixed model ```r coef(m1) ``` ``` ## $Subject ## (Intercept) Days ## 308 253.6637 19.6662617 ## 309 211.0064 1.8476053 ## 310 212.4447 5.0184295 ## 330 275.0957 5.6529356 ## 331 273.6654 7.3973743 ## 332 260.4447 10.1951090 ## 333 268.2456 10.2436499 ## 334 244.1725 11.5418676 ## 335 251.0714 -0.2848792 ## 337 286.2956 19.0955511 ## 349 226.1949 11.6407181 ## 350 238.3351 17.0815038 ## 351 255.9830 7.4520239 ## 352 272.2688 14.0032871 ## 369 254.6806 11.3395008 ## 370 225.7921 15.2897709 ## 371 252.2122 9.4791297 ## 372 263.7197 11.7513080 ## ## attr(,"class") ## [1] "coef.mer" ``` --- ### GLMMs ``` ## Generalized linear mixed model fit by maximum likelihood (Laplace ## Approximation) [glmerMod] ## Family: binomial ( logit ) ## Formula: I(Reaction > 250) ~ 1 + Days + (1 + Days | Subject) ## Data: sleepstudy ## ## AIC BIC logLik deviance df.resid ## 107.9 123.9 -49.0 97.9 175 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -2.38545 0.00194 0.01931 0.12337 1.25673 ## ## Random effects: ## Groups Name Variance Std.Dev. Corr ## Subject (Intercept) 13.7152 3.7034 ## Days 0.9035 0.9505 -0.14 ## Number of obs: 180, groups: Subject, 18 ## ## Fixed effects: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.8604 1.3104 0.657 0.5114 ## Days 1.1188 0.4892 2.287 0.0222 ## ## Correlation of Fixed Effects: ## (Intr) ## Days -0.329 ``` --- ### Conditional vs. marginal models Interpreting the fixed effects are the *conditional* slopes for someone. Even though we have only a single fixed effect here, we are still performing estimation conditionally on something. .question[ What are we "holding constant" in this model? ] -- We are conditioning on the random effect - here, each trucker (or at the very least, truckers with the same random effects). --- ### Conditional vs. marginal models If there is large between-group variability (i.e., in reaction times based on trucker), then maybe the relative associations due to the fixed effects (how many days each trucker was subjected to sleep deprivation) would be small. We might want to examine the slopes corresponding to fixed effects (of primary interest in this case) at different levels of the random effects, or even what the average fixed effects are while **marginalizing** out the random effects. .question[ How might we do that? ] --- ### Generalized estimating equations GEEs are often used for clustered or longitudinal data when we are interested in marginal effects of parameter estimates - that is, a .vocab[population average effect]. This is in contrast to the cluster-specific models being estimated previously. --- ### Generalized estimating equations They're useful in a lot of cases! Longitudinal data, clustered data, even spatial data. The key point: GEEs estimate the population **average** without trying to model the covariance structure "within each subject" (like when we estimate subject- specific random effects in mixed models). .question[ "For a one-unit increase in X in the *population*, how much would the average Y be expected to change?" ] Be very careful with this difference in interpretation. --- ### Correlation structure We can specify a correlation structure based on prior knowledge of the way observations are correlated: - Independent (time points are independent) - Exchangeable (pairs of points have identical correlations) - Autoregressive (time points have autoregressive structure, so correlation between two observations one unit apart is `\(\rho\)`, two units apart is `\(\rho^2\)`, etc.) - Unstructured (unique correlation estimated for each pair of points) - ...and others (good news: it's ok if we're wrong!) --- ### Fitting a GEE ```r library(gee) m3 <- gee(Reaction ~ Days, data = sleepstudy, id = Subject, family = "gaussian", corstr = "exchangeable") ``` ``` ## (Intercept) Days ## 251.40510 10.46729 ``` --- ### Fitting a GEE ``` ## ## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA ## gee S-function, version 4.13 modified 98/01/27 (1998) ## ## Model: ## Link: Identity ## Variance to Mean Relation: Gaussian ## Correlation Structure: Exchangeable ## ## Call: ## gee(formula = Reaction ~ Days, id = Subject, data = sleepstudy, ## family = "gaussian", corstr = "exchangeable") ## ## Summary of Residuals: ## Min 1Q Median 3Q Max ## -110.847693 -27.482909 1.546014 26.141904 139.953079 ## ## ## Coefficients: ## Estimate Naive S.E. Naive z Robust S.E. Robust z ## (Intercept) 251.40510 9.5378078 26.35879 6.632277 37.9063 ## Days 10.46729 0.8109579 12.90731 1.502237 6.9678 ## ## Estimated Scale Parameter: 2276.694 ## Number of Iterations: 1 ## ## Working Correlation ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] ## [1,] 1.0000000 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [2,] 0.5710385 1.0000000 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [3,] 0.5710385 0.5710385 1.0000000 0.5710385 0.5710385 0.5710385 0.5710385 ## [4,] 0.5710385 0.5710385 0.5710385 1.0000000 0.5710385 0.5710385 0.5710385 ## [5,] 0.5710385 0.5710385 0.5710385 0.5710385 1.0000000 0.5710385 0.5710385 ## [6,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 1.0000000 0.5710385 ## [7,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 1.0000000 ## [8,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [9,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [10,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [,8] [,9] [,10] ## [1,] 0.5710385 0.5710385 0.5710385 ## [2,] 0.5710385 0.5710385 0.5710385 ## [3,] 0.5710385 0.5710385 0.5710385 ## [4,] 0.5710385 0.5710385 0.5710385 ## [5,] 0.5710385 0.5710385 0.5710385 ## [6,] 0.5710385 0.5710385 0.5710385 ## [7,] 0.5710385 0.5710385 0.5710385 ## [8,] 1.0000000 0.5710385 0.5710385 ## [9,] 0.5710385 1.0000000 0.5710385 ## [10,] 0.5710385 0.5710385 1.0000000 ``` --- ### Fitting a GEE ```r m4 <- gee(Reaction ~ Days, data = sleepstudy, id = Subject, family = "gaussian", corstr = "AR-M", Mv = 1) ``` ``` ## (Intercept) Days ## 251.40510 10.46729 ``` --- ### Fitting a GEE ``` ## ## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA ## gee S-function, version 4.13 modified 98/01/27 (1998) ## ## Model: ## Link: Identity ## Variance to Mean Relation: Gaussian ## Correlation Structure: AR-M , M = 1 ## ## Call: ## gee(formula = Reaction ~ Days, id = Subject, data = sleepstudy, ## family = "gaussian", corstr = "AR-M", Mv = 1) ## ## Summary of Residuals: ## Min 1Q Median 3Q Max ## -112.927733 -29.566326 -0.533766 24.060825 137.872000 ## ## ## Coefficients: ## Estimate Naive S.E. Naive z Robust S.E. Robust z ## (Intercept) 253.48930 10.70967 23.669193 6.356466 39.878969 ## Days 10.46677 1.67394 6.252775 1.439438 7.271427 ## ## Estimated Scale Parameter: 2281.077 ## Number of Iterations: 2 ## ## Working Correlation ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] ## [1,] 1.0000000 0.7670216 0.5883221 0.4512558 0.3461229 0.2654837 0.2036318 ## [2,] 0.7670216 1.0000000 0.7670216 0.5883221 0.4512558 0.3461229 0.2654837 ## [3,] 0.5883221 0.7670216 1.0000000 0.7670216 0.5883221 0.4512558 0.3461229 ## [4,] 0.4512558 0.5883221 0.7670216 1.0000000 0.7670216 0.5883221 0.4512558 ## [5,] 0.3461229 0.4512558 0.5883221 0.7670216 1.0000000 0.7670216 0.5883221 ## [6,] 0.2654837 0.3461229 0.4512558 0.5883221 0.7670216 1.0000000 0.7670216 ## [7,] 0.2036318 0.2654837 0.3461229 0.4512558 0.5883221 0.7670216 1.0000000 ## [8,] 0.1561900 0.2036318 0.2654837 0.3461229 0.4512558 0.5883221 0.7670216 ## [9,] 0.1198011 0.1561900 0.2036318 0.2654837 0.3461229 0.4512558 0.5883221 ## [10,] 0.0918900 0.1198011 0.1561900 0.2036318 0.2654837 0.3461229 0.4512558 ## [,8] [,9] [,10] ## [1,] 0.1561900 0.1198011 0.0918900 ## [2,] 0.2036318 0.1561900 0.1198011 ## [3,] 0.2654837 0.2036318 0.1561900 ## [4,] 0.3461229 0.2654837 0.2036318 ## [5,] 0.4512558 0.3461229 0.2654837 ## [6,] 0.5883221 0.4512558 0.3461229 ## [7,] 0.7670216 0.5883221 0.4512558 ## [8,] 1.0000000 0.7670216 0.5883221 ## [9,] 0.7670216 1.0000000 0.7670216 ## [10,] 0.5883221 0.7670216 1.0000000 ``` --- ### Fitting a GEE ```r m5 <- gee(I(Reaction > 250) ~ Days, data = sleepstudy, id = Subject, family = "binomial", corstr = "exchangeable") ``` ``` ## (Intercept) Days ## 0.6020332 0.1947168 ``` --- ### Fitting a GEE ``` ## ## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA ## gee S-function, version 4.13 modified 98/01/27 (1998) ## ## Model: ## Link: Logit ## Variance to Mean Relation: Binomial ## Correlation Structure: Exchangeable ## ## Call: ## gee(formula = I(Reaction > 250) ~ Days, id = Subject, data = sleepstudy, ## family = "binomial", corstr = "exchangeable") ## ## Summary of Residuals: ## Min 1Q Median 3Q Max ## -0.92668743 0.07331257 0.12696659 0.21094886 0.32951465 ## ## ## Coefficients: ## Estimate Naive S.E. Naive z Robust S.E. Robust z ## (Intercept) 0.7103810 0.43644464 1.627654 0.4648023 1.528351 ## Days 0.2029448 0.06327235 3.207480 0.1126510 1.801536 ## ## Estimated Scale Parameter: 1.153384 ## Number of Iterations: 4 ## ## Working Correlation ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] ## [1,] 1.0000000 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [2,] 0.5198122 1.0000000 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [3,] 0.5198122 0.5198122 1.0000000 0.5198122 0.5198122 0.5198122 0.5198122 ## [4,] 0.5198122 0.5198122 0.5198122 1.0000000 0.5198122 0.5198122 0.5198122 ## [5,] 0.5198122 0.5198122 0.5198122 0.5198122 1.0000000 0.5198122 0.5198122 ## [6,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 1.0000000 0.5198122 ## [7,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 1.0000000 ## [8,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [9,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [10,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [,8] [,9] [,10] ## [1,] 0.5198122 0.5198122 0.5198122 ## [2,] 0.5198122 0.5198122 0.5198122 ## [3,] 0.5198122 0.5198122 0.5198122 ## [4,] 0.5198122 0.5198122 0.5198122 ## [5,] 0.5198122 0.5198122 0.5198122 ## [6,] 0.5198122 0.5198122 0.5198122 ## [7,] 0.5198122 0.5198122 0.5198122 ## [8,] 1.0000000 0.5198122 0.5198122 ## [9,] 0.5198122 1.0000000 0.5198122 ## [10,] 0.5198122 0.5198122 1.0000000 ```