In this lab, we demonstrate how to use the PStrata R package to conduct Principle Stratification analysis under the general non-compliance setting, i.e., all strata are potentially observable (unlike the truncation by death setting) via Bayesian mixture models. More information and code can be found in the R vignette.
We demonstrate the package using a case study with the non-compliance
setting. The study aims to estimate the effect of influenza vaccine with
the data originally studied by Mcdonald, Hui, and Tierney (1992) and
subsequently reanalyzed by Hirano et al. (2000). In this study,
physicians were randomly selected to receive a letter encouraging them
to vaccinate their patients against the flu
(encouragement
). The treatment of interest is the actual
vaccination status (vaccination
) of the patients, and the
outcome is a binary variable indicating whether they had hospital visits
for flu-related reasons (hospital
). For each patient,
several covariates are observed, including age (age
) and
the rate of chronic obstructive pulmonary disease (copd
).
These covariates are included into both the S-model and Y-model.
Given that flu vaccination is not mandatory and can be taken irrespective of encouragement, non-compliant patients can fall into either the always-vaccinated or never-vaccinated strata. We assume the standard monotonicity, which rules out defiers. Furthermore, we assume exclusion restriction (ER) for the always-vaccinated and never-vaccinated strata, implying that encouragement on vaccination does not exert any direct effect on flu-related hospital visits.
We need to first load the package
Now read in the flu data
flu <- read.table("flu-clean.txt", header = T)
flu <- flu %>% rename(
encouragement = grp,
vaccination = fluy2,
hospital = wcxho79
) %>% filter(age != '.', race != '.', sex != '.')
flu_new <- flu %>% mutate(
age = as.numeric(scale(as.numeric(age))),
race = as.numeric(race),
sex = as.numeric(sex)
)
PS analysis assuming monotonicity and ER. Call the main function PStrata:
fit <- PStrata(
S.formula = encouragement + vaccination ~ age + copd,
Y.formula = hospital ~ age + copd,
Y.family = binomial(link = "logit"),
data = flu_new,
strata = c(n = "00*", c = "01", a = "11*"),
prior_intercept = prior_normal(0, 1),
prior_coefficient = prior_normal(0, 1),
cores = 6, chains = 6,
warmup = 1000, iter = 2000, refresh = 10
)
summary(fit)
##
## mean sd 2.5% 25% median 75% 97.5%
## n 0.6940892 0.01237863 0.66988407 0.6856538 0.6940852 0.7025499 0.7186058
## c 0.1141814 0.01639040 0.08229839 0.1031532 0.1142454 0.1251446 0.1465781
## a 0.1917294 0.01056631 0.17143661 0.1844705 0.1916644 0.1987990 0.2127114
From the fitted model, 69.4% (CI: 67.1% to 71.7%) of the patients are never-vaccinated, 19.2% (CI: 17.2% to 21.3%) of the patients are always-vaccinated, and the remaining 11.4% (CI: 8.0% to 14.6%) are compliant-vaccinated.
outcome <- PSOutcome(fit)
summary(outcome, 'matrix')
## mean sd 2.5% 25% median 75%
## n:0 0.08153730 0.007550369 0.06699409 0.07648877 0.08139149 0.08661397
## n:1 0.08153730 0.007550369 0.06699409 0.07648877 0.08139149 0.08661397
## c:0 0.16716126 0.064600352 0.05866256 0.12042183 0.16223646 0.20739427
## c:1 0.06951129 0.027895640 0.02534001 0.04914282 0.06537610 0.08600195
## a:0 0.09957474 0.014238977 0.07318826 0.08973994 0.09911196 0.10882072
## a:1 0.09957474 0.014238977 0.07318826 0.08973994 0.09911196 0.10882072
## 97.5%
## n:0 0.09652278
## n:1 0.09652278
## c:0 0.30856553
## c:1 0.13243374
## a:0 0.12922551
## a:1 0.12922551
contrast <- PSContrast(outcome, Z = TRUE)
summary(contrast, 'matrix')
## mean sd 2.5% 25% median 75%
## n:{1}-{0} 0.00000000 0.00000000 0.0000000 0.0000000 0.00000000 0.00000000
## c:{1}-{0} -0.09764997 0.06902484 -0.2438766 -0.1411792 -0.09428071 -0.04997113
## a:{1}-{0} 0.00000000 0.00000000 0.0000000 0.0000000 0.00000000 0.00000000
## 97.5%
## n:{1}-{0} 0.00000000
## c:{1}-{0} 0.02317951
## a:{1}-{0} 0.00000000
The contrast for the compliers, also known as the complier average causal effect, is -0.098 (CI: -0.247, 0.023).
In this example, we use simulated data mimicking a case of two-sided noncompliance to illustrate . We generate 1000 units. Two covariates \(X_1, X_2\) are independently sampled from \(\mathcal{N}(0,1)\). We assign \(S \in \{(0, 0), (0, 1), (1, 1)\}\) independently with probability 0.3, 0.5 and 0.2, respectively. The treatment assignment is random to every unit with \(p(Z = 1) = 0.5\) to mimic a random clinical trial. Conditional on the stratum \(S\) and treatment assignment \(Z\), the outcome \(Y\) follows a gaussian distribution where \[\begin{aligned} Y\mid S = (0,0), Z = z, X_1, X_2 &\sim \mathcal{N}(X_1 - X_2 + X_1X_2, 0.3)\\ Y\mid S = (0,1), Z = z, X_1, X_2 &\sim \mathcal{N}(2X_1 - (1 + z)X_2 + 2 + 6z, 0.2)\\ Y\mid S = (1,1), Z = z, X_1, X_2 &\sim \mathcal{N}(X_1 + X_2 - 1, 0.2). \end{aligned}\]
set.seed(0)
N <- 500
S <- sample(c(0, 1, 3), N, replace = TRUE, prob = c(0.3, 0.5, 0.2))
Z <- sample(c(0, 1), N, replace = TRUE, prob = c(0.5, 0.5))
X1 <- rnorm(N)
X2 <- rnorm(N)
Y0 <- rnorm(N, X1 - X2 + X1 * X2, 0.3)
Y1 <- rnorm(N, 2 * X1 - (1 + Z) * X2 + 2 + 6 * Z, 0.2)
Y3 <- rnorm(N, X1 + X2 - 1, 0.2)
Y <- Y0 * (S == 0) + Y1 * (S == 1) + Y3 * (S == 3)
D <- 0 * (S == 0) + Z * (S == 1) + 1 * (S == 3)
data_sim_1 <- data.frame(S = S, Z = Z, D = D, X1 = X1, X2 = X2, Y = Y)
To fit the principal stratification model, we use the function
PStrata
.
# Expect the code to run for 50 seconds.
fit_sim_1 <- PStrata(
S.formula = Z + D ~ 1, # strata-model
Y.formula = Y ~ X1 * X2, # outcome-model
Y.family = gaussian(),
data = data_sim_1,
strata = c(n = "00*", c = "01", a = "11*"), # ER for 00 and 11
prior_intercept = prior_normal(0, 1),
warmup = 500, iter = 1000, # number of posterior draws
cores = 1, chains = 1, refresh = 100 # additional params for rstan
)
The output shows that the fitted object is a PStrata
object with three strata (“00”, “01” and “11”). The estimated proportion
of these three strata are respectively 0.27, 0.54 and 0.19. More details
containing the quantiles of the proportion can be obtained via
summary()
function.
summary(fit_sim_1)
##
## mean sd 2.5% 25% median 75% 97.5%
## n 0.2713596 0.01959676 0.2325805 0.2574988 0.2710431 0.2843103 0.3087445
## c 0.5363448 0.02096907 0.4955951 0.5217178 0.5367542 0.5513913 0.5746530
## a 0.1922957 0.01677380 0.1603042 0.1814399 0.1915056 0.2026696 0.2281863
After fitting the model, we can estimate the stratum-treatment
specific mean outcome \(E[Y(z) \mid
S]\). To do this, we first define a PSOutcome
object.
outcome_sim_1 <- PSOutcome(fit_sim_1)
outcome_sim_1
## Non-survival PSOutcome Object (3 strata, 2 treatment arms, 500 iterations)
We obtain the estimates of \(E[Y(z)\mid
S]\) via summary()
function.
summary(outcome_sim_1, 'matrix')
## mean sd 2.5% 25% median 75%
## n:0 0.01464621 0.03001140 -0.04090769 -0.00649922 0.01319651 0.03563019
## n:1 0.01464621 0.03001140 -0.04090769 -0.00649922 0.01319651 0.03563019
## c:0 1.98211734 0.01855278 1.94447775 1.97035089 1.98253144 1.99400254
## c:1 8.05230645 0.01863840 8.01400960 8.04024203 8.05200197 8.06629026
## a:0 -1.09266937 0.01953406 -1.13162101 -1.10426841 -1.09335022 -1.08166714
## a:1 -1.09266937 0.01953406 -1.13162101 -1.10426841 -1.09335022 -1.08166714
## 97.5%
## n:0 0.07059981
## n:1 0.07059981
## c:0 2.01806883
## c:1 8.08809754
## a:0 -1.05094900
## a:1 -1.05094900
Similarly, we shall obtain the stratum-specific treatment effect \(E[Y(1) - Y(0)\mid S]\) as contrasts of two stratum-treatment specific mean outcomes.
contrast_sim_1 <- PSContrast(outcome_sim_1, Z = TRUE)
summary(contrast_sim_1, 'matrix')
## mean sd 2.5% 25% median 75% 97.5%
## n:{1}-{0} 0.000000 0.00000000 0.000000 0.000000 0.000000 0.000000 0.000000
## c:{1}-{0} 6.070189 0.02745014 6.017786 6.052244 6.069754 6.089891 6.124838
## a:{1}-{0} 0.000000 0.00000000 0.000000 0.000000 0.000000 0.000000 0.000000
In this example, we use simulated data mimicking a case of two post-randomization variables (e.g. treatment switching \(D_1\) and discontinuation \(D_2\)) to illustrate Let the principal strata be defined by \(S = (D_1(0), D_1(1), D_2(0), D_2(1))\). In this study, we include five out of 16 strata, namely \(\mathcal{S} = \{0000, 0001, 0101, 0011, 1111\}\), and assume ER for 0000, 0011 and 1111. We do not include baseline covariates.
We simulate 10,000 sample units. We randomly assign principal stratum \(S\) and treatment status \(Z\) to each unit, with the stratum-assignment probability being \(p = (0.15, 0.3, 0.2, 0.2, 0.15)\) and the treatment assignment probability \(P(Z = 1) = 0.5\). The outcome variable \(Y\) is sampled from a Gaussian distribution where \[\begin{aligned} Y\mid S = 0000, Z = z &\sim \mathcal{N}(3, 1),\\ Y\mid S = 0001, Z = 0 &\sim \mathcal{N}(-1, 0.5),\\ Y\mid S = 0001, Z = 1 &\sim \mathcal{N}(-3, 0.5),\\ Y\mid S = 0011, Z = 0 &\sim \mathcal{N}(2, 0.5),\\ Y\mid S = 0011, Z = 1 &\sim \mathcal{N}(5, 0.5),\\ Y\mid S = 0101, Z = z &\sim \mathcal{N}(-1, 3),\\ Y\mid S = 1111, Z = z &\sim \mathcal{N}(1, 2).\\ \end{aligned}\]
set.seed(0)
N <- 500
S <- sample(c(0, 1, 2, 3, 4), N, replace = TRUE, prob = c(0.15, 0.3, 0.2, 0.2, 0.15))
Z <- sample(c(0, 1), N, replace = TRUE, prob = c(0.5, 0.5))
Y0 <- rnorm(N, 3, 1)
Y1 <- rnorm(N, -1 - 2 * Z, 0.5)
Y2 <- rnorm(N, 2 + 3 * Z, 0.5)
Y3 <- rnorm(N, -1, 3)
Y4 <- rnorm(N, 1, 2)
Y <- Y0 * (S == 0) + Y1 * (S == 1) + Y2 * (S == 2) + Y3 * (S == 3) + Y4 * (S == 4)
D1 <- 0 * (S == 0) + 0 * (S == 1) + Z * (S == 2) + 0 * (S == 3) + 1 * (S == 4)
D2 <- 0 * (S == 0) + Z * (S == 1) + Z * (S == 2) + 1 * (S == 3) + 1 * (S == 4)
data_sim_2 <- data.frame(S = S, Z = Z, D1 = D1, D2 = D2, Y = Y)
To fit the principal stratification model, we use the function
PStrata
.
# Expect the code to run for 1 minute.
fit_sim_2 <- PStrata(
S.formula = Z + D1 + D2 ~ 1, # strata-model
Y.formula = Y ~ 1, # outcome-model
Y.family = gaussian(),
data = data_sim_2,
strata = c(
nn = "00|00*", nc = "00|01", cc = "01|01", na = "00|11*", aa = "11|11*"
), # ER for 0000, 0011 and 1111
prior_intercept = prior_normal(0, 1),
warmup = 1000, iter = 2000, # number of posterior draws
cores = 6, chains = 6, refresh = 100 # additional params for rstan
)
fit_sim_2
The output shows that the fitted object is a PStrata
object with five strata (“00”, “01” and “11”). The estimated proportion
of these three strata are respectively 0.14, 0.29, 0.25, 0.17 and 0.14.
More details containing the quantiles of the proportion can be obtained
via summary()
function.
summary(fit_sim_2)
##
## mean sd 2.5% 25% median 75% 97.5%
## nn 0.1415638 0.01940088 0.1060333 0.1280018 0.1410142 0.1536411 0.1823727
## nc 0.2835854 0.02452357 0.2355547 0.2671040 0.2836109 0.3001903 0.3318968
## cc 0.2602745 0.02139645 0.2164755 0.2467728 0.2606711 0.2745099 0.3013636
## na 0.1721335 0.01819867 0.1377121 0.1598360 0.1714303 0.1838966 0.2098265
## aa 0.1424429 0.01599924 0.1131864 0.1312081 0.1418275 0.1527108 0.1756519
After fitting the model, we can estimate the stratum-treatment
specific mean outcome \(E[Y(z) \mid
S]\). To do this, we first define a PSOutcome
object.
outcome_sim_2 <- PSOutcome(fit_sim_2)
outcome_sim_2
## Non-survival PSOutcome Object (5 strata, 2 treatment arms, 6000 iterations)
We obtain the estimates of \(E[Y(z)\mid
S]\) via summary()
function.
summary(outcome_sim_2, 'matrix')
## mean sd 2.5% 25% median 75%
## nn:0 2.7500485 0.15623388 2.4204813 2.6503968 2.7558580 2.8585265
## nn:1 2.7500485 0.15623388 2.4204813 2.6503968 2.7558580 2.8585265
## nc:0 1.5455674 1.17557741 -1.1334414 1.9190143 2.0351996 2.1291977
## nc:1 -3.0769840 0.05948234 -3.1950553 -3.1162267 -3.0780242 -3.0386311
## cc:0 -0.5529952 1.15833873 -1.1854335 -1.1001898 -1.0550343 -0.9936478
## cc:1 5.0304496 0.07375488 4.8823312 4.9817096 5.0322094 5.0800789
## na:0 -0.5248572 0.32391110 -1.1669752 -0.7433145 -0.5203816 -0.3115532
## na:1 -0.5248572 0.32391110 -1.1669752 -0.7433145 -0.5203816 -0.3115532
## aa:0 1.2072830 0.21985965 0.7821366 1.0550475 1.2032175 1.3519967
## aa:1 1.2072830 0.21985965 0.7821366 1.0550475 1.2032175 1.3519967
## 97.5%
## nn:0 3.0399067
## nn:1 3.0399067
## nc:0 2.3030854
## nc:1 -2.9547480
## cc:0 2.1561111
## cc:1 5.1690131
## na:0 0.1198462
## na:1 0.1198462
## aa:0 1.6483725
## aa:1 1.6483725
Similarly, we shall obtain the stratum-specific treatment effect \(E[Y(1) - Y(0)\mid S]\) as contrasts of two stratum-treatment specific mean outcomes.
contrast_sim_2 <- PSContrast(outcome_sim_2, Z = TRUE)
summary(contrast_sim_2, 'matrix')
## mean sd 2.5% 25% median 75% 97.5%
## nn:{1}-{0} 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## nc:{1}-{0} -4.622551 1.177847 -5.410777 -5.213474 -5.111587 -4.977055 -1.911335
## cc:{1}-{0} 5.583445 1.160861 2.842398 5.975988 6.076284 6.153460 6.280881
## na:{1}-{0} 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## aa:{1}-{0} 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000