Lab 6: Principal Stratification Analysis with the PStrata package

Goal

In this lab, we demonstrate how to use the PStrata R package to conduct Principle Stratification analysis under the general non-compliance setting, i.e., all strata are potentially observable (unlike the truncation by death setting) via Bayesian mixture models. More information and code can be found in the R vignette.

Example 1: Flu data

We demonstrate the package using a case study with the non-compliance setting. The study aims to estimate the effect of influenza vaccine with the data originally studied by Mcdonald, Hui, and Tierney (1992) and subsequently reanalyzed by Hirano et al. (2000). In this study, physicians were randomly selected to receive a letter encouraging them to vaccinate their patients against the flu (encouragement). The treatment of interest is the actual vaccination status (vaccination) of the patients, and the outcome is a binary variable indicating whether they had hospital visits for flu-related reasons (hospital). For each patient, several covariates are observed, including age (age) and the rate of chronic obstructive pulmonary disease (copd). These covariates are included into both the S-model and Y-model.

Given that flu vaccination is not mandatory and can be taken irrespective of encouragement, non-compliant patients can fall into either the always-vaccinated or never-vaccinated strata. We assume the standard monotonicity, which rules out defiers. Furthermore, we assume exclusion restriction (ER) for the always-vaccinated and never-vaccinated strata, implying that encouragement on vaccination does not exert any direct effect on flu-related hospital visits.

Load the pacakge

We need to first load the package

Now read in the flu data

flu <- read.table("flu-clean.txt", header = T)
flu <- flu %>% rename(
  encouragement = grp, 
  vaccination = fluy2,
  hospital = wcxho79
) %>% filter(age != '.', race != '.', sex != '.')

flu_new <- flu %>% mutate(
  age = as.numeric(scale(as.numeric(age))),
  race = as.numeric(race),
  sex = as.numeric(sex)
)

Generate data

PS analysis assuming monotonicity and ER. Call the main function PStrata:

fit <- PStrata(
  S.formula = encouragement + vaccination ~ age + copd,
  Y.formula = hospital ~ age + copd,
  Y.family = binomial(link = "logit"),
  data = flu_new,
  strata = c(n = "00*", c = "01", a = "11*"),
  prior_intercept = prior_normal(0, 1),
  prior_coefficient = prior_normal(0, 1),
  cores = 6, chains = 6, 
  warmup = 1000, iter = 2000, refresh = 10
)

summary(fit)

##    
##          mean         sd       2.5%       25%    median       75%     97.5%
##   n 0.6940892 0.01237863 0.66988407 0.6856538 0.6940852 0.7025499 0.7186058
##   c 0.1141814 0.01639040 0.08229839 0.1031532 0.1142454 0.1251446 0.1465781
##   a 0.1917294 0.01056631 0.17143661 0.1844705 0.1916644 0.1987990 0.2127114

From the fitted model, 69.4% (CI: 67.1% to 71.7%) of the patients are never-vaccinated, 19.2% (CI: 17.2% to 21.3%) of the patients are always-vaccinated, and the remaining 11.4% (CI: 8.0% to 14.6%) are compliant-vaccinated.

outcome <- PSOutcome(fit)
summary(outcome, 'matrix')

##           mean          sd       2.5%        25%     median        75%
## n:0 0.08153730 0.007550369 0.06699409 0.07648877 0.08139149 0.08661397
## n:1 0.08153730 0.007550369 0.06699409 0.07648877 0.08139149 0.08661397
## c:0 0.16716126 0.064600352 0.05866256 0.12042183 0.16223646 0.20739427
## c:1 0.06951129 0.027895640 0.02534001 0.04914282 0.06537610 0.08600195
## a:0 0.09957474 0.014238977 0.07318826 0.08973994 0.09911196 0.10882072
## a:1 0.09957474 0.014238977 0.07318826 0.08973994 0.09911196 0.10882072
##          97.5%
## n:0 0.09652278
## n:1 0.09652278
## c:0 0.30856553
## c:1 0.13243374
## a:0 0.12922551
## a:1 0.12922551

contrast <- PSContrast(outcome, Z = TRUE)
summary(contrast, 'matrix')

##                  mean         sd       2.5%        25%      median         75%
## n:{1}-{0}  0.00000000 0.00000000  0.0000000  0.0000000  0.00000000  0.00000000
## c:{1}-{0} -0.09764997 0.06902484 -0.2438766 -0.1411792 -0.09428071 -0.04997113
## a:{1}-{0}  0.00000000 0.00000000  0.0000000  0.0000000  0.00000000  0.00000000
##                97.5%
## n:{1}-{0} 0.00000000
## c:{1}-{0} 0.02317951
## a:{1}-{0} 0.00000000

The contrast for the compliers, also known as the complier average causal effect, is -0.098 (CI: -0.247, 0.023).

Example 2: Simulated data with two-sided non-compliance

Generate data

In this example, we use simulated data mimicking a case of two-sided noncompliance to illustrate . We generate 1000 units. Two covariates \(X_1, X_2\) are independently sampled from \(\mathcal{N}(0,1)\). We assign \(S \in \{(0, 0), (0, 1), (1, 1)\}\) independently with probability 0.3, 0.5 and 0.2, respectively. The treatment assignment is random to every unit with \(p(Z = 1) = 0.5\) to mimic a random clinical trial. Conditional on the stratum \(S\) and treatment assignment \(Z\), the outcome \(Y\) follows a gaussian distribution where \[\begin{aligned} Y\mid S = (0,0), Z = z, X_1, X_2 &\sim \mathcal{N}(X_1 - X_2 + X_1X_2, 0.3)\\ Y\mid S = (0,1), Z = z, X_1, X_2 &\sim \mathcal{N}(2X_1 - (1 + z)X_2 + 2 + 6z, 0.2)\\ Y\mid S = (1,1), Z = z, X_1, X_2 &\sim \mathcal{N}(X_1 + X_2 - 1, 0.2). \end{aligned}\]

set.seed(0)

N <- 500
S <- sample(c(0, 1, 3), N, replace = TRUE, prob = c(0.3, 0.5, 0.2))
Z <- sample(c(0, 1), N, replace = TRUE, prob = c(0.5, 0.5))
X1 <- rnorm(N)
X2 <- rnorm(N)
Y0 <- rnorm(N, X1 - X2 + X1 * X2, 0.3)
Y1 <- rnorm(N, 2 * X1 - (1 + Z) * X2 + 2 + 6 * Z, 0.2)
Y3 <- rnorm(N, X1 + X2 - 1, 0.2)
Y <- Y0 * (S == 0) + Y1 * (S == 1) + Y3 * (S == 3)
D <- 0 * (S == 0) + Z * (S == 1) + 1 * (S == 3)

data_sim_1 <- data.frame(S = S, Z = Z, D = D, X1 = X1, X2 = X2, Y = Y)

Fit principal stratification model

To fit the principal stratification model, we use the function PStrata.

# Expect the code to run for 50 seconds.
fit_sim_1 <- PStrata(
  S.formula = Z + D ~ 1, # strata-model
  Y.formula = Y ~ X1 * X2, # outcome-model
  Y.family = gaussian(), 
  data = data_sim_1,
  strata = c(n = "00*", c = "01", a = "11*"), # ER for 00 and 11
  prior_intercept = prior_normal(0, 1),
  warmup = 500, iter = 1000, # number of posterior draws
  cores = 1, chains = 1, refresh = 100 # additional params for rstan
)

The output shows that the fitted object is a PStrata object with three strata (“00”, “01” and “11”). The estimated proportion of these three strata are respectively 0.27, 0.54 and 0.19. More details containing the quantiles of the proportion can be obtained via summary() function.

summary(fit_sim_1)

##    
##          mean         sd      2.5%       25%    median       75%     97.5%
##   n 0.2713596 0.01959676 0.2325805 0.2574988 0.2710431 0.2843103 0.3087445
##   c 0.5363448 0.02096907 0.4955951 0.5217178 0.5367542 0.5513913 0.5746530
##   a 0.1922957 0.01677380 0.1603042 0.1814399 0.1915056 0.2026696 0.2281863

Outcome and contrast

After fitting the model, we can estimate the stratum-treatment specific mean outcome \(E[Y(z) \mid S]\). To do this, we first define a PSOutcome object.

outcome_sim_1 <- PSOutcome(fit_sim_1)
outcome_sim_1

## Non-survival PSOutcome Object (3 strata, 2 treatment arms, 500 iterations)

We obtain the estimates of \(E[Y(z)\mid S]\) via summary() function.

summary(outcome_sim_1, 'matrix')

##            mean         sd        2.5%         25%      median         75%
## n:0  0.01464621 0.03001140 -0.04090769 -0.00649922  0.01319651  0.03563019
## n:1  0.01464621 0.03001140 -0.04090769 -0.00649922  0.01319651  0.03563019
## c:0  1.98211734 0.01855278  1.94447775  1.97035089  1.98253144  1.99400254
## c:1  8.05230645 0.01863840  8.01400960  8.04024203  8.05200197  8.06629026
## a:0 -1.09266937 0.01953406 -1.13162101 -1.10426841 -1.09335022 -1.08166714
## a:1 -1.09266937 0.01953406 -1.13162101 -1.10426841 -1.09335022 -1.08166714
##           97.5%
## n:0  0.07059981
## n:1  0.07059981
## c:0  2.01806883
## c:1  8.08809754
## a:0 -1.05094900
## a:1 -1.05094900

Similarly, we shall obtain the stratum-specific treatment effect \(E[Y(1) - Y(0)\mid S]\) as contrasts of two stratum-treatment specific mean outcomes.

contrast_sim_1 <- PSContrast(outcome_sim_1, Z = TRUE)
summary(contrast_sim_1, 'matrix')

##               mean         sd     2.5%      25%   median      75%    97.5%
## n:{1}-{0} 0.000000 0.00000000 0.000000 0.000000 0.000000 0.000000 0.000000
## c:{1}-{0} 6.070189 0.02745014 6.017786 6.052244 6.069754 6.089891 6.124838
## a:{1}-{0} 0.000000 0.00000000 0.000000 0.000000 0.000000 0.000000 0.000000

Example 3: Simulated data with two post-treatment variables

Generate data

In this example, we use simulated data mimicking a case of two post-randomization variables (e.g. treatment switching \(D_1\) and discontinuation \(D_2\)) to illustrate Let the principal strata be defined by \(S = (D_1(0), D_1(1), D_2(0), D_2(1))\). In this study, we include five out of 16 strata, namely \(\mathcal{S} = \{0000, 0001, 0101, 0011, 1111\}\), and assume ER for 0000, 0011 and 1111. We do not include baseline covariates.

We simulate 10,000 sample units. We randomly assign principal stratum \(S\) and treatment status \(Z\) to each unit, with the stratum-assignment probability being \(p = (0.15, 0.3, 0.2, 0.2, 0.15)\) and the treatment assignment probability \(P(Z = 1) = 0.5\). The outcome variable \(Y\) is sampled from a Gaussian distribution where \[\begin{aligned} Y\mid S = 0000, Z = z &\sim \mathcal{N}(3, 1),\\ Y\mid S = 0001, Z = 0 &\sim \mathcal{N}(-1, 0.5),\\ Y\mid S = 0001, Z = 1 &\sim \mathcal{N}(-3, 0.5),\\ Y\mid S = 0011, Z = 0 &\sim \mathcal{N}(2, 0.5),\\ Y\mid S = 0011, Z = 1 &\sim \mathcal{N}(5, 0.5),\\ Y\mid S = 0101, Z = z &\sim \mathcal{N}(-1, 3),\\ Y\mid S = 1111, Z = z &\sim \mathcal{N}(1, 2).\\ \end{aligned}\]

set.seed(0)

N <- 500
S <- sample(c(0, 1, 2, 3, 4), N, replace = TRUE, prob = c(0.15, 0.3, 0.2, 0.2, 0.15))
Z <- sample(c(0, 1), N, replace = TRUE, prob = c(0.5, 0.5))
Y0 <- rnorm(N, 3, 1)
Y1 <- rnorm(N, -1 - 2 * Z, 0.5)
Y2 <- rnorm(N, 2 + 3 * Z, 0.5)
Y3 <- rnorm(N, -1, 3)
Y4 <- rnorm(N, 1, 2)
Y <- Y0 * (S == 0) + Y1 * (S == 1) + Y2 * (S == 2) + Y3 * (S == 3) + Y4 * (S == 4)
D1 <- 0 * (S == 0) + 0 * (S == 1) + Z * (S == 2) + 0 * (S == 3) + 1 * (S == 4)
D2 <- 0 * (S == 0) + Z * (S == 1) + Z * (S == 2) + 1 * (S == 3) + 1 * (S == 4)

data_sim_2 <- data.frame(S = S, Z = Z, D1 = D1, D2 = D2, Y = Y)

Fit principal stratification model

To fit the principal stratification model, we use the function PStrata.

# Expect the code to run for 1 minute.
fit_sim_2 <- PStrata(
  S.formula = Z + D1 + D2 ~ 1, # strata-model
  Y.formula = Y ~ 1, # outcome-model
  Y.family = gaussian(), 
  data = data_sim_2,
  strata = c(
    nn = "00|00*", nc = "00|01", cc = "01|01", na = "00|11*", aa = "11|11*"
  ), # ER for 0000, 0011 and 1111
  prior_intercept = prior_normal(0, 1),
  warmup = 1000, iter = 2000, # number of posterior draws
  cores = 6, chains = 6, refresh = 100 # additional params for rstan
)

fit_sim_2

The output shows that the fitted object is a PStrata object with five strata (“00”, “01” and “11”). The estimated proportion of these three strata are respectively 0.14, 0.29, 0.25, 0.17 and 0.14. More details containing the quantiles of the proportion can be obtained via summary() function.

summary(fit_sim_2)

##     
##           mean         sd      2.5%       25%    median       75%     97.5%
##   nn 0.1415638 0.01940088 0.1060333 0.1280018 0.1410142 0.1536411 0.1823727
##   nc 0.2835854 0.02452357 0.2355547 0.2671040 0.2836109 0.3001903 0.3318968
##   cc 0.2602745 0.02139645 0.2164755 0.2467728 0.2606711 0.2745099 0.3013636
##   na 0.1721335 0.01819867 0.1377121 0.1598360 0.1714303 0.1838966 0.2098265
##   aa 0.1424429 0.01599924 0.1131864 0.1312081 0.1418275 0.1527108 0.1756519

Outcome and contrast

After fitting the model, we can estimate the stratum-treatment specific mean outcome \(E[Y(z) \mid S]\). To do this, we first define a PSOutcome object.

outcome_sim_2 <- PSOutcome(fit_sim_2)
outcome_sim_2

## Non-survival PSOutcome Object (5 strata, 2 treatment arms, 6000 iterations)

We obtain the estimates of \(E[Y(z)\mid S]\) via summary() function.

summary(outcome_sim_2, 'matrix')

##            mean         sd       2.5%        25%     median        75%
## nn:0  2.7500485 0.15623388  2.4204813  2.6503968  2.7558580  2.8585265
## nn:1  2.7500485 0.15623388  2.4204813  2.6503968  2.7558580  2.8585265
## nc:0  1.5455674 1.17557741 -1.1334414  1.9190143  2.0351996  2.1291977
## nc:1 -3.0769840 0.05948234 -3.1950553 -3.1162267 -3.0780242 -3.0386311
## cc:0 -0.5529952 1.15833873 -1.1854335 -1.1001898 -1.0550343 -0.9936478
## cc:1  5.0304496 0.07375488  4.8823312  4.9817096  5.0322094  5.0800789
## na:0 -0.5248572 0.32391110 -1.1669752 -0.7433145 -0.5203816 -0.3115532
## na:1 -0.5248572 0.32391110 -1.1669752 -0.7433145 -0.5203816 -0.3115532
## aa:0  1.2072830 0.21985965  0.7821366  1.0550475  1.2032175  1.3519967
## aa:1  1.2072830 0.21985965  0.7821366  1.0550475  1.2032175  1.3519967
##           97.5%
## nn:0  3.0399067
## nn:1  3.0399067
## nc:0  2.3030854
## nc:1 -2.9547480
## cc:0  2.1561111
## cc:1  5.1690131
## na:0  0.1198462
## na:1  0.1198462
## aa:0  1.6483725
## aa:1  1.6483725

Similarly, we shall obtain the stratum-specific treatment effect \(E[Y(1) - Y(0)\mid S]\) as contrasts of two stratum-treatment specific mean outcomes.

contrast_sim_2 <- PSContrast(outcome_sim_2, Z = TRUE)
summary(contrast_sim_2, 'matrix')

##                 mean       sd      2.5%       25%    median       75%     97.5%
## nn:{1}-{0}  0.000000 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
## nc:{1}-{0} -4.622551 1.177847 -5.410777 -5.213474 -5.111587 -4.977055 -1.911335
## cc:{1}-{0}  5.583445 1.160861  2.842398  5.975988  6.076284  6.153460  6.280881
## na:{1}-{0}  0.000000 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
## aa:{1}-{0}  0.000000 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000

Lab 6: Principal Stratification Analysis with the PStrata package

Fan Li and Bo Liu

12/01/2023

Goal

Example 1: Flu data

Load the pacakge

Generate data

Example 2: Simulated data with two-sided non-compliance

Generate data

Fit principal stratification model

Outcome and contrast

Example 3: Simulated data with two post-treatment variables

Generate data

Fit principal stratification model

Outcome and contrast