class: center, middle, inverse, title-slide # Linear Mixed Models (2) ### Yue Jiang ### Duke University --- ### Review `\begin{align*} \mathbf{y} = \mathbf{X\beta} + \mathbf{Zu} + \mathbf{\epsilon} \end{align*}` - `\(\mathbf{y}\)`, `\(\mathbf{X}\)`, and `\(\mathbf{\beta}\)` are just as in "normal" regression (and `\(\mathbf{\epsilon}\)` still represent the individual residuals). - `\(\mathbf{u}\)` is a `\(q * K\)` vector of the `\(q\)` random effects for the `\(K\)` groups. - `\(\mathbf{Z}\)` is the `\(n \times (q * K)\)` design matrix corresponding to the random effects. Commonly, we might assume `\(\mathbf{u} \sim N_q(\mathbf{0}, \mathbf{G})\)`, where `\(\mathbf{G}\)` represents the covariance matrix of the `\(q\)` different random effects. We also might impose assumptions on the correlation structure for `\(\epsilon\)`. These types of structures come up, for instance, in **longitudinal data analysis**. --- ### Review A simple random slope + intercept model: `\begin{align*} y_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})x_{ij} + \epsilon_{ij} \end{align*}` - Random intercept: heterogeneity at `\(x_{ij} = 0\)` (context-specific) - Random slope: heterogeneity in relationship between `\(Y\)` and `\(X\)` Assumptions on `\(F_{\mathbf{u}}\)` and `\(\epsilon\)` --- ### Longitudinal data Longitudinal data occur all the time, and are a flexible way of modeling data that have multiple measures per observation (regardless of at what time they were measured). For instance: - Examining participant weight loss in a bariatric surgery study - Examining test scores in a cohort of schoolchildren - Examining how opinions or attitudes toward climate change shift in a voting bloc etc. .question[ Why might we want to perform a longitudinal data over a cross-sectional one? ] --- ### Truck drivers <img src="img/truck.jpg" width="80%" style="display: block; margin: auto;" /> Belenky et al. (2003) conducted a study that examined the effects of sleep deprivation on reaction time. --- ### The data <img src="lmm-2_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> .question[ What do you notice? ] --- ### The data <img src="lmm-2_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> .question[ What do you notice? ] --- ### Driver-specific models <img src="lmm-2_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> .question[ What do you notice? ] --- ### Driver-specific models <img src="lmm-2_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> (we could also examine individual CIs) --- ### Driver-specific models <img src="lmm-2_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> (we could also examine individual CIs) --- ### Driver-specific models <img src="lmm-2_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> .question[ What correlation structure might be appropriate for random effects? ] --- ### A linear mixed model `\begin{align*} Reaction_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})Day_{ij} + \epsilon_{ij} \end{align*}` .question[ What do the random intercept and random slope represent for this example? ] --- ### A linear mixed model ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: Reaction ~ 1 + Days + (1 + Days | Subject) ## Data: sleepstudy ## ## REML criterion at convergence: 1743.6 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.9536 -0.4634 0.0231 0.4634 5.1793 ## ## Random effects: ## Groups Name Variance Std.Dev. Corr ## Subject (Intercept) 612.10 24.741 ## Days 35.07 5.922 0.07 ## Residual 654.94 25.592 ## Number of obs: 180, groups: Subject, 18 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 251.405 6.825 36.838 ## Days 10.467 1.546 6.771 ## ## Correlation of Fixed Effects: ## (Intr) ## Days -0.138 ``` --- ### A linear mixed model ```r coef(m1) ``` ``` ## $Subject ## (Intercept) Days ## 308 253.6637 19.6662617 ## 309 211.0064 1.8476053 ## 310 212.4447 5.0184295 ## 330 275.0957 5.6529356 ## 331 273.6654 7.3973743 ## 332 260.4447 10.1951090 ## 333 268.2456 10.2436499 ## 334 244.1725 11.5418676 ## 335 251.0714 -0.2848792 ## 337 286.2956 19.0955511 ## 349 226.1949 11.6407181 ## 350 238.3351 17.0815038 ## 351 255.9830 7.4520239 ## 352 272.2688 14.0032871 ## 369 254.6806 11.3395008 ## 370 225.7921 15.2897709 ## 371 252.2122 9.4791297 ## 372 263.7197 11.7513080 ## ## attr(,"class") ## [1] "coef.mer" ``` --- ### GLMMs How might we "extend" linear mixed models to generalized linear mixed models? `\begin{align*} g(E(\mathbf{y})) = \mathbf{X\beta} + \mathbf{Zu} \end{align*}` --- ### GLMMs ``` ## Generalized linear mixed model fit by maximum likelihood (Laplace ## Approximation) [glmerMod] ## Family: binomial ( logit ) ## Formula: I(Reaction > 250) ~ 1 + Days + (1 + Days | Subject) ## Data: sleepstudy ## ## AIC BIC logLik deviance df.resid ## 107.9 123.9 -49.0 97.9 175 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -2.38545 0.00194 0.01931 0.12337 1.25673 ## ## Random effects: ## Groups Name Variance Std.Dev. Corr ## Subject (Intercept) 13.7152 3.7034 ## Days 0.9035 0.9505 -0.14 ## Number of obs: 180, groups: Subject, 18 ## ## Fixed effects: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.8604 1.3104 0.657 0.5114 ## Days 1.1188 0.4892 2.287 0.0222 ## ## Correlation of Fixed Effects: ## (Intr) ## Days -0.329 ``` --- ### Conditional vs. marginal models Interpreting the fixed effects are the *conditional* slopes for someone. Even though we have only a single fixed effect here, we are still performing estimation conditionally on something. .question[ What are we "holding constant" in this model? ] -- We are conditioning on the random effect - here, each trucker (or at the very least, truckers with the same random effects). --- ### Conditional vs. marginal models If there is large between-group variability (i.e., in reaction times based on trucker), then maybe the relative associations due to the fixed effects (how many days each trucker was subjected to sleep deprivation) would be small. We might want to examine the slopes corresponding to fixed effects (of primary interest in this case) at different levels of the random effects, or even what the average fixed effects are while **marginalizing** out the random effects. .question[ How might we do that? ] --- ### Generalized estimating equations Generalized estimating equations (**GEE**) are often used for clustered or longitudinal data when we are interested in marginal effects of parameter estimates - that is, a population average effect. This is in contrast to the cluster-specific models being estimated with conditional techniques (keep in mind, we could marginalize "by hand" if we wanted to from a mixed effects model). GEE doesn't rely on a likelihood-based approach, which has advantages (easier computation) and disadvantages (difficulties with inferential procedures). --- ### Correlation structure We can specify a correlation structure based on prior knowledge of the way observations are correlated: - Independent (time points are independent, i.e., all off-diagonals are zero) - Exchangeable (pairs of points have identical correlations, i.e., all off- diagonals have same estimated correlation) - Autoregressive (time points have autoregressive structure, so correlation between two observations one unit apart is `\(\rho\)`, two units apart is `\(\rho^2\)`, etc.) - Toeplitz (band structure in matrix, with 0s outside of the band) - Unstructured (unique correlation estimated for each pair of points) --- ### Fitting a GEE ```r library(gee) m3 <- gee(Reaction ~ Days, data = sleepstudy, id = Subject, family = "gaussian", corstr = "exchangeable") ``` ``` ## (Intercept) Days ## 251.40510 10.46729 ``` --- ### Fitting a GEE ``` ## ## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA ## gee S-function, version 4.13 modified 98/01/27 (1998) ## ## Model: ## Link: Identity ## Variance to Mean Relation: Gaussian ## Correlation Structure: Exchangeable ## ## Call: ## gee(formula = Reaction ~ Days, id = Subject, data = sleepstudy, ## family = "gaussian", corstr = "exchangeable") ## ## Summary of Residuals: ## Min 1Q Median 3Q Max ## -110.847693 -27.482909 1.546014 26.141904 139.953079 ## ## ## Coefficients: ## Estimate Naive S.E. Naive z Robust S.E. Robust z ## (Intercept) 251.40510 9.5378078 26.35879 6.632277 37.9063 ## Days 10.46729 0.8109579 12.90731 1.502237 6.9678 ## ## Estimated Scale Parameter: 2276.694 ## Number of Iterations: 1 ## ## Working Correlation ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] ## [1,] 1.0000000 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [2,] 0.5710385 1.0000000 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [3,] 0.5710385 0.5710385 1.0000000 0.5710385 0.5710385 0.5710385 0.5710385 ## [4,] 0.5710385 0.5710385 0.5710385 1.0000000 0.5710385 0.5710385 0.5710385 ## [5,] 0.5710385 0.5710385 0.5710385 0.5710385 1.0000000 0.5710385 0.5710385 ## [6,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 1.0000000 0.5710385 ## [7,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 1.0000000 ## [8,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [9,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [10,] 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 0.5710385 ## [,8] [,9] [,10] ## [1,] 0.5710385 0.5710385 0.5710385 ## [2,] 0.5710385 0.5710385 0.5710385 ## [3,] 0.5710385 0.5710385 0.5710385 ## [4,] 0.5710385 0.5710385 0.5710385 ## [5,] 0.5710385 0.5710385 0.5710385 ## [6,] 0.5710385 0.5710385 0.5710385 ## [7,] 0.5710385 0.5710385 0.5710385 ## [8,] 1.0000000 0.5710385 0.5710385 ## [9,] 0.5710385 1.0000000 0.5710385 ## [10,] 0.5710385 0.5710385 1.0000000 ``` --- ### Fitting a GEE ```r m4 <- gee(Reaction ~ Days, data = sleepstudy, id = Subject, family = "gaussian", corstr = "AR-M", Mv = 1) ``` ``` ## (Intercept) Days ## 251.40510 10.46729 ``` --- ### Fitting a GEE ``` ## ## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA ## gee S-function, version 4.13 modified 98/01/27 (1998) ## ## Model: ## Link: Identity ## Variance to Mean Relation: Gaussian ## Correlation Structure: AR-M , M = 1 ## ## Call: ## gee(formula = Reaction ~ Days, id = Subject, data = sleepstudy, ## family = "gaussian", corstr = "AR-M", Mv = 1) ## ## Summary of Residuals: ## Min 1Q Median 3Q Max ## -112.927733 -29.566326 -0.533766 24.060825 137.872000 ## ## ## Coefficients: ## Estimate Naive S.E. Naive z Robust S.E. Robust z ## (Intercept) 253.48930 10.70967 23.669193 6.356466 39.878969 ## Days 10.46677 1.67394 6.252775 1.439438 7.271427 ## ## Estimated Scale Parameter: 2281.077 ## Number of Iterations: 2 ## ## Working Correlation ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] ## [1,] 1.0000000 0.7670216 0.5883221 0.4512558 0.3461229 0.2654837 0.2036318 ## [2,] 0.7670216 1.0000000 0.7670216 0.5883221 0.4512558 0.3461229 0.2654837 ## [3,] 0.5883221 0.7670216 1.0000000 0.7670216 0.5883221 0.4512558 0.3461229 ## [4,] 0.4512558 0.5883221 0.7670216 1.0000000 0.7670216 0.5883221 0.4512558 ## [5,] 0.3461229 0.4512558 0.5883221 0.7670216 1.0000000 0.7670216 0.5883221 ## [6,] 0.2654837 0.3461229 0.4512558 0.5883221 0.7670216 1.0000000 0.7670216 ## [7,] 0.2036318 0.2654837 0.3461229 0.4512558 0.5883221 0.7670216 1.0000000 ## [8,] 0.1561900 0.2036318 0.2654837 0.3461229 0.4512558 0.5883221 0.7670216 ## [9,] 0.1198011 0.1561900 0.2036318 0.2654837 0.3461229 0.4512558 0.5883221 ## [10,] 0.0918900 0.1198011 0.1561900 0.2036318 0.2654837 0.3461229 0.4512558 ## [,8] [,9] [,10] ## [1,] 0.1561900 0.1198011 0.0918900 ## [2,] 0.2036318 0.1561900 0.1198011 ## [3,] 0.2654837 0.2036318 0.1561900 ## [4,] 0.3461229 0.2654837 0.2036318 ## [5,] 0.4512558 0.3461229 0.2654837 ## [6,] 0.5883221 0.4512558 0.3461229 ## [7,] 0.7670216 0.5883221 0.4512558 ## [8,] 1.0000000 0.7670216 0.5883221 ## [9,] 0.7670216 1.0000000 0.7670216 ## [10,] 0.5883221 0.7670216 1.0000000 ``` --- ### Fitting a GEE ```r m5 <- gee(I(Reaction > 250) ~ Days, data = sleepstudy, id = Subject, family = "binomial", corstr = "exchangeable") ``` ``` ## (Intercept) Days ## 0.6020332 0.1947168 ``` --- ### Fitting a GEE ``` ## ## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA ## gee S-function, version 4.13 modified 98/01/27 (1998) ## ## Model: ## Link: Logit ## Variance to Mean Relation: Binomial ## Correlation Structure: Exchangeable ## ## Call: ## gee(formula = I(Reaction > 250) ~ Days, id = Subject, data = sleepstudy, ## family = "binomial", corstr = "exchangeable") ## ## Summary of Residuals: ## Min 1Q Median 3Q Max ## -0.92668743 0.07331257 0.12696659 0.21094886 0.32951465 ## ## ## Coefficients: ## Estimate Naive S.E. Naive z Robust S.E. Robust z ## (Intercept) 0.7103810 0.43644464 1.627654 0.4648023 1.528351 ## Days 0.2029448 0.06327235 3.207480 0.1126510 1.801536 ## ## Estimated Scale Parameter: 1.153384 ## Number of Iterations: 4 ## ## Working Correlation ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] ## [1,] 1.0000000 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [2,] 0.5198122 1.0000000 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [3,] 0.5198122 0.5198122 1.0000000 0.5198122 0.5198122 0.5198122 0.5198122 ## [4,] 0.5198122 0.5198122 0.5198122 1.0000000 0.5198122 0.5198122 0.5198122 ## [5,] 0.5198122 0.5198122 0.5198122 0.5198122 1.0000000 0.5198122 0.5198122 ## [6,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 1.0000000 0.5198122 ## [7,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 1.0000000 ## [8,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [9,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [10,] 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 0.5198122 ## [,8] [,9] [,10] ## [1,] 0.5198122 0.5198122 0.5198122 ## [2,] 0.5198122 0.5198122 0.5198122 ## [3,] 0.5198122 0.5198122 0.5198122 ## [4,] 0.5198122 0.5198122 0.5198122 ## [5,] 0.5198122 0.5198122 0.5198122 ## [6,] 0.5198122 0.5198122 0.5198122 ## [7,] 0.5198122 0.5198122 0.5198122 ## [8,] 1.0000000 0.5198122 0.5198122 ## [9,] 0.5198122 1.0000000 0.5198122 ## [10,] 0.5198122 0.5198122 1.0000000 ```