Identification and multiply robust estimation in causal mediation analysis across principal strata
Abstract
We consider assessing causal mediation in the presence of a post-treatment event (examples include noncompliance, a clinical event, or death). We identify natural mediation effects for the entire study population and for each principal stratum characterized by the joint potential values of the post-treatment event. We derive the efficient influence function for each mediation estimand, which motivates a set of multiply robust estimators for inference. The multiply robust estimators are consistent under four types of working model misspecification and are efficient when all nuisance models are correctly specified. We also develop a nonparametric efficient estimator that leverages data-adaptive machine learners to achieve efficient inference, and we discuss sensitivity methods to assess violations of the key identification assumptions. We illustrate our methods via simulations and two real data examples.
Keywords: Causal inference, efficient influence function, endogenous subgroups, moderated mediation analysis, natural indirect effect, principal ignorability.
1 Introduction
1.1 Background and motivation
Causal mediation analysis (Imai et al., 2010) is widely used to investigate the role of a mediator ($M$) in explaining the causal mechanism from a treatment ($Z$) to an outcome ($Y$). Under the potential outcomes framework, a primary step in causal mediation analysis is to decompose the total treatment effect into an indirect effect that works through $M$ and a direct effect that works around $M$. While alternative definitions exist, the natural indirect and direct effects are the most relevant for studying causal mechanisms (Nguyen et al., 2021). The natural indirect effect compares potential outcomes by switching $M$ from the value it would have taken under the control condition to that under the treated condition, while fixing $Z$ to the treated condition. The natural direct effect compares potential outcomes by switching $Z$ from the control to the treated condition, while fixing $M$ to the value it would have taken under the control condition. Parametric regressions (e.g., Cheng et al., 2021, 2023b), semiparametric methods (e.g., Tchetgen Tchetgen and Shpitser, 2012), and nonparametric methods (e.g., Kim et al., 2017) have been proposed for estimating natural mediation effects, typically by assuming that $M$ is the only variable sitting on the causal chain connecting treatment and outcome.
Increasingly in experimental and observational studies, a post-treatment event ($D$) may occur prior to the measurement of the mediator. This event may be a post-treatment action or decision regarding uptake (e.g., noncompliance or treatment discontinuation), a clinical event (e.g., worsening of disease, adverse medication effect), or a terminal event precluding the observation of any data afterward (e.g., death). In each context, the post-treatment event provides important information in defining partially observed subgroups of the study population. An emerging interest lies in learning the treatment effect within each subgroup, and possibly the effect heterogeneity across subgroups. Under the principal stratification framework, one can characterize each subgroup with the joint potential values of $D$ under alternative conditions, referred to as principal strata more generally (Frangakis and Rubin, 2002), or as endogenous subgroups in social sciences (Page et al., 2015). Beyond understanding the total effect within each subgroup, here we are further interested in evaluating the natural mediation effects via $M$ within each subgroup characterized by a post-treatment event $D$. Below, we give two examples that motivate such an objective (an additional example on mediation analysis with death-truncated mediator and outcome is provided in the Supplementary Material).
Example 1
(Mediation analysis with noncompliance) Noncompliance occurs if the actual treatment received ($D$) differs from the treatment assignment ($Z$). By Angrist et al. (1996), the study population is partitioned into four subgroups: (i) always-takers who take the treatment regardless of assignment; (ii) never-takers who take the control regardless of assignment; (iii) compliers who comply with the assignment; (iv) defiers who take the opposite of their assignment. Typically, the compliers are of central interest because this is the only group for whom the average causal effect due to assignment reflects the average causal effect due to actual treatment received. As noncompliance is a post-randomization action or decision, a relevant research question in the presence of a mediator $M$, measured prior to the outcome, is whether the treatment works through $M$ among the compliers (i.e., the complier natural mediation effect). A follow-up question is whether there is heterogeneity in the treatment effect mechanisms among the subgroups formed by noncompliance patterns.
Example 2
(Mediation analysis with an intercurrent event) In health research, disease progression, adverse reaction, or other early outcomes may occur due to treatment, which are collectively referred to as an intercurrent event by the ICH E9 Estimands Framework (Kahan et al., 2023). Section 6.2 studies the role of perceived control ($M$) in mediating the effect of residence in a damp and moldy dwelling ($Z$) on depression ($Y$), but dampness/mold related disease ($D$) occurred among some study units. It is then of interest to study the indirect effect due to $M$ among those who would always develop dampness/mold related disease regardless of their living condition (i.e., the doomed stratum), those who never develop dampness/mold related disease regardless of their living condition (i.e., the immune stratum), as well as those who would develop dampness/mold related disease only if living in such condition but otherwise not (i.e., the harmed stratum). A follow-up question is whether there is variation in the natural indirect effects among these subgroups.
In both examples, the principal causal effect (PCE), i.e., the total treatment effect within a principal stratum, can be decomposed into a principal natural indirect effect (PNIE) through the mediator and a principal natural direct effect (PNDE) around the mediator. Addressing the PNIEs and their variation allows us to unpack the overall natural indirect effect to understand for whom and under what circumstances $M$ plays a crucial role in explaining the underlying mechanism. Such analyses could also help determine whether the overall natural indirect effect is driven by one particular subgroup, in which case future interventions might be restructured to better serve an intended subpopulation. To this end, estimating principal natural mediation effects is informative in itself, but comparing them across strata may also provide additional insights. In fact, studying variations in principal natural mediation effects is related to moderated mediation analysis given measured baseline covariates (Qin and Wang, 2024). However, the difference is that we focus on endogenous subgroups defined by a post-treatment event rather than those defined by covariates. Consequently, the scientific question addressed by principal natural mediation effects generally cannot be answered by merely exploring the conditional mediation effects given observed covariates alone (an empirical comparison is provided in Section 6.2).
1.2 Prior work and our contribution
The post-treatment event poses unique challenges for identification of the principal natural mediation effects, because (i) $D$ is a treatment-induced variable confounding the mediator-outcome relationship and (ii) the principal stratum membership is only partially observed. To tackle these challenges, a prevalent approach, usually developed under the noncompliance setting (Example 1), is to view the treatment assignment as an instrumental variable for $D$ and use the exclusion restriction to identify the mediation effects among the compliers. The exclusion restriction requires that all causal pathways from the treatment assignment to the mediator and outcome pass only through $D$ (see Figure 1 for illustration), and therefore no treatment or mediation effects exist among principal strata where $Z$ does not affect $D$. For example, Yamamoto (2013) and Park and Kürüm (2018, 2020) considered a combination of exclusion restriction, monotonicity of treatment assignment on the treatment receipt, and mediator ignorability to nonparametrically identify the complier natural mediation effects. Instead of assuming mediator ignorability, Frölich and Huber (2017) identified the complier natural mediation effect by introducing a second instrumental variable for the mediator. Under monotonicity and exclusion restriction, Rudolph et al. (2024) further proposed semiparametrically efficient estimators of (i) complier natural/interventional mediation effects with a single instrumental variable for the treatment and (ii) double complier interventional mediation effects with two instrumental variables for the treatment and mediator.
A critical assumption in the prior work is the exclusion restriction, which may not always hold in open-label studies where the assignment can exert a direct psychological effect on the mediator and the outcome not through the treatment receipt. In Example 2, the PCE and PNIE among the doomed and immune strata are of interest and may be non-zero; in this instance, the exclusion restriction does not apply either. Relaxing exclusion restriction, Park and Palardy (2020) developed a maximum likelihood approach with full distributional assumptions to empirically identify principal natural mediation effects; however, the consistency of their estimators requires all parametric models to be correctly specified and bias may arise under misspecification.
Our primary interest is to identify the principal natural mediation effects in endogenous subgroups defined by a post-treatment event. We assume monotonicity and ignorability conditions for nonparametric identification, but require neither the exclusion restriction nor fully parametric modeling assumptions. In a similar context, Tchetgen Tchetgen and VanderWeele (2014) studied the marginal natural mediation effects in the presence of a post-treatment confounder based on a nonparametric structural equation model with independent errors and monotonicity, but did not provide identification results for the finer causal mediation estimands within each principal stratum. As we explain in Section 3, our identification assumptions are weaker than those considered by Tchetgen Tchetgen and VanderWeele (2014) in a technical sense, but remain sufficient to unpack the stratum-specific mediation effects that contribute to the marginal natural mediation effect. Leveraging semiparametric efficiency theory (Bickel et al., 1993; see Hines et al., 2022, for an overview), we characterize the efficient influence function for each estimand under the nonparametric model to motivate semiparametric estimators. Our estimators are consistent under four types of working model misspecification, and are thus quadruply robust. As a further improvement, a nonparametric extension is also provided to incorporate data-adaptive machine learners for efficient inference (Chernozhukov et al., 2018). Finally, we develop strategies for sensitivity analyses under violations of the key ignorability assumptions in cases when insufficient baseline covariates are collected, or when there is unmeasured treatment-induced confounding of the mediator-outcome relationship.
2 Notation, causal estimands and identification
Suppose that we observe $n$ independent copies of the quintuple $O = \{X, Z, D, M, Y\}$, where $Z \in \{0, 1\}$ represents treatment assignment with 1 indicating the treated condition and 0 indicating the control condition, $D \in \{0, 1\}$ is the occurrence of the post-treatment event, $M$ is a mediator measured after $D$, $Y$ is the final outcome of interest, and $X$ is a vector of pre-treatment covariates. A directed acyclic graph (DAG) summarizing the relationships among variables is in Figure 1, where $Z$ is allowed to affect $Y$ either directly or through the intermediate variables $D$ and $M$. For a generic variable $W$, we use $F_W(w)$ to denote its distribution function, $f_W(w)$ to denote the probability mass/density function, and $E[W]$ to denote its expectation. Whenever applicable, we abbreviate $F_{W|V}(w|v)$, $f_{W|V}(w|v)$, and $E_{W|V}[W|V=v]$ as $F(w|v)$, $f(w|v)$, and $E[W|v]$ without ambiguity. Moreover, we define $\mathbb{P}_n[W] = n^{-1}\sum_{i=1}^n W_i$ as the empirical average operator, $I(\cdot)$ as the indicator function, $|\cdot|$ as the absolute value, and $\|\cdot\|$ as the $L_2$-norm such that $\|g\|^2 = \int g(w)^2 \, dF_W(w)$.
We use the potential outcomes framework to define causal mediation estimands (Imai et al., 2010). Let $D_z$ be the potential value of the post-treatment event under treatment $z$, $M_{zd}$ the potential value of the mediator when the treatment is $z$ and $D$ is set to $d$, and $Y_{zdm}$ the potential outcome when the treatment is set to $z$, the post-treatment event is set to $d$, and the mediator is set to $m$. Furthermore, we write $M_z = M_{zD_z}$ such that the potential value of the mediator under treatment $z$ is identical to that under treatment $z$ when $D$ takes its natural value under treatment $z$. Similarly, we have $Y_{zm} = Y_{zD_zm}$ and $Y_z = Y_{zM_z}$. The equalities $M_z = M_{zD_z}$, $Y_{zm} = Y_{zD_zm}$, and $Y_z = Y_{zM_z}$ are collectively referred to as the composition of potential values (VanderWeele and Vansteelandt, 2009).
To proceed, we adopt the principal stratification framework (Frangakis and Rubin, 2002) and use the joint potential values of the post-treatment event to partition the study population into four subgroups, $(D_1, D_0) \in \{(1,0), (1,1), (0,0), (0,1)\}$. To contextualize the development, we take noncompliance as a running example throughout, in which case these four strata are named compliers, always-takers, never-takers, and defiers, respectively. For notational convenience, we re-express the joint potential values as $U = D_1D_0$ so that $U \in \{10, 11, 00, 01\}$. A central property is that $U$ is unaffected by the treatment and can be treated as a baseline covariate; therefore causal comparisons conditional on $U$ are well-defined subgroup causal effects. We define $e_{d_1d_0}(X) = f(U = d_1d_0 \mid X)$ and $e_{d_1d_0} = f(U = d_1d_0) = E[e_{d_1d_0}(X)]$ as the proportion of principal stratum $d_1d_0$ conditional on $X$ and marginalized over covariates $X$, respectively, where $e_{d_1d_0}(X)$ is referred to as the principal score (Ding and Lu, 2017). Since the stratum membership $U$ is only partially observed, the principal score and its marginal counterpart cannot be estimated without further assumptions.
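The PCE is defined as the effect of treatment assignment in each principal stratum (Jo and Stuart, 2009; Ding and Lu, 2017) and is written as:
$$\mathrm{PCE}_{d_1d_0} = E[Y_1 - Y_0 \mid U = d_1d_0],$$
which equals $E[Y_{1M_1} - Y_{0M_0} \mid U = d_1d_0]$ by composition of the potential outcome. To assess mediation, we decompose $\mathrm{PCE}_{d_1d_0}$ into a principal natural indirect effect ($\mathrm{PNIE}_{d_1d_0}$) and a principal natural direct effect ($\mathrm{PNDE}_{d_1d_0}$):
$$\mathrm{PCE}_{d_1d_0} = \underbrace{E[Y_{1M_1} - Y_{1M_0} \mid U = d_1d_0]}_{\mathrm{PNIE}_{d_1d_0}} + \underbrace{E[Y_{1M_0} - Y_{0M_0} \mid U = d_1d_0]}_{\mathrm{PNDE}_{d_1d_0}}. \qquad (1)$$
Intuitively, $\mathrm{PNDE}_{d_1d_0}$ captures the effect of treatment assignment on outcome among units in stratum $d_1d_0$ when the mediator is fixed to its natural value without treatment. The $\mathrm{PNIE}_{d_1d_0}$, on the other hand, captures the mean difference of the potential outcomes among units in stratum $d_1d_0$ when the assignment is fixed at $Z = 1$, but the mediator changes from its natural value under treatment to its counterfactual value under control. Therefore, $\mathrm{PNIE}_{d_1d_0}$ measures the extent to which the causal effect of treatment assignment is mediated through $M$ among the subpopulation in stratum $d_1d_0$. Similarly, the intention-to-treat effect (ITT), defined as $\mathrm{ITT} = E[Y_1 - Y_0]$, can be decomposed in the usual fashion into an intention-to-treat natural indirect effect (ITT-NIE) and an intention-to-treat natural direct effect (ITT-NDE):
$$\mathrm{ITT} = \underbrace{E[Y_{1M_1} - Y_{1M_0}]}_{\text{ITT-NIE}} + \underbrace{E[Y_{1M_0} - Y_{0M_0}]}_{\text{ITT-NDE}}. \qquad (2)$$
One can verify that ITT is a weighted average of the PCEs such that $\mathrm{ITT} = \sum_{d_1d_0} e_{d_1d_0} \times \mathrm{PCE}_{d_1d_0}$, where the sum runs over all four strata $d_1d_0 \in \{10, 11, 00, 01\}$. Similarly, $\text{ITT-NIE} = \sum_{d_1d_0} e_{d_1d_0} \times \mathrm{PNIE}_{d_1d_0}$ and $\text{ITT-NDE} = \sum_{d_1d_0} e_{d_1d_0} \times \mathrm{PNDE}_{d_1d_0}$.
In what follows, we focus on identification of $\theta_{d_1d_0}^{(zz')} = E[Y_{zM_{z'}} \mid U = d_1d_0]$ for any $d_1d_0 \in \{10, 11, 00, 01\}$ and $z, z' \in \{0, 1\}$, based on which all PCEs and their effect decompositions in (1) can be obtained. Notice that ITT, ITT-NIE, and ITT-NDE can also be obtained as they are weighted averages of $\theta_{d_1d_0}^{(zz')}$. Identification of $\theta_{d_1d_0}^{(zz')}$ requires the following structural assumptions.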
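The decompositions in (1) and (2), and the weighted-average relation between the ITT effects and their stratum-specific counterparts, can be checked numerically on simulated potential outcomes. The following is a minimal sketch under a purely hypothetical data-generating process (the stratum shares, the mediator shift, and the linear outcome rule are illustrative assumptions, not taken from the paper); it uses cross-world potential outcomes that are never jointly observable in real data.

```python
import random

def simulate_units(n, seed=0):
    """Draw hypothetical units together with cross-world potential outcomes."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        r = rng.random()
        # Stratum U = D_1 D_0; all four strata are drawn because the identity is general.
        u = "10" if r < 0.5 else "00" if r < 0.7 else "11" if r < 0.9 else "01"
        shift = {"10": 1.0, "00": 0.0, "11": 0.5, "01": -0.5}[u]  # assumed M_1 - M_0
        m0 = rng.random()          # M_0
        m1 = m0 + shift            # M_1
        noise = rng.gauss(0.0, 1.0)

        def y(z, m):
            # Assumed outcome rule for Y_{zm}; illustrative only.
            return 2.0 * z + 1.5 * m + noise

        # Store (U, Y_{1 M_1}, Y_{1 M_0}, Y_{0 M_0}).
        units.append((u, y(1, m1), y(1, m0), y(0, m0)))
    return units

def mean(values):
    return sum(values) / len(values)

def decompose(units):
    """Stratum shares with in-sample PNIE, PNDE, and PCE = PNIE + PNDE."""
    out = {}
    for s in sorted({row[0] for row in units}):
        rows = [row for row in units if row[0] == s]
        pnie = mean([y11 - y10 for _, y11, y10, _ in rows])
        pnde = mean([y10 - y00 for _, _, y10, y00 in rows])
        out[s] = {"share": len(rows) / len(units),
                  "PNIE": pnie, "PNDE": pnde, "PCE": pnie + pnde}
    return out

if __name__ == "__main__":
    units = simulate_units(2000, seed=1)
    res = decompose(units)
    itt = mean([y11 - y00 for _, y11, _, y00 in units])
    weighted = sum(v["share"] * v["PCE"] for v in res.values())
    print(itt, weighted)  # the two numbers agree up to floating-point error
```

Because the stratum shares are computed in-sample, the agreement is an algebraic identity rather than a large-sample approximation.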
Assumption 1
(Consistency) For any $z$, $d$, and $m$, we have $D_z = D$ if $Z = z$, $M_{zd} = M$ if $Z = z$ and $D = d$, and $Y_{zdm} = Y$ if $Z = z$, $D = d$, and $M = m$.
Assumption 2
(Ignorability of the treatment assignment) , , , , and , where “” stands for independence.
Assumption 1 is commonly invoked to exclude unit-level interference and enables us to connect the observed variables with their potential values. Assumption 2 is the no unmeasured confounding condition for treatment assignment that is often required to identify the ITT estimand in the absence of randomization. It is considered plausible when sufficient baseline covariates are collected such that no hidden confounders would give rise to systematic differences between the post-randomization variables in the treated and the control groups. A stronger statement of Assumption 2, , is often satisfied in randomized experiments.
We additionally require the monotonicity of treatment on the post-treatment event to identify the distribution of the principal stratum membership $U$. We consider two types of monotonicity: standard monotonicity (Assumption 3a) and strong monotonicity (Assumption 3b). The standard version requires that treatment has a non-negative impact on the post-treatment event, whereas the stronger version further assumes $D_0 = 0$. Strong monotonicity is satisfied under one-sided noncompliance where all control units had no access to treatment (Frölich and Melly, 2013).
Assumption 3
(Monotonicity) (a) Under standard monotonicity, $D_1 \geq D_0$ for all units; (b) under strong monotonicity, $D_0 = 0$ for all units.
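Assumption 3a rules out defiers ($U = 01$), and enables the identification of the principal scores for the remaining three principal strata ($U \in \{10, 00, 11\}$) (Ding and Lu, 2017). Defining $p_{zd}(X) = f(D = d \mid Z = z, X)$ and $p_{zd} = E[p_{zd}(X)]$ for $z, d \in \{0, 1\}$, because the observed data with $(Z = 0, D = 1)$ includes only always-takers, we have $e_{11}(X) = p_{01}(X)$. Similarly, $e_{00}(X) = p_{10}(X)$ and $e_{10}(X) = p_{11}(X) - p_{01}(X)$ since $e_{10}(X) + e_{00}(X) + e_{11}(X) = 1$, and the strata proportions are $e_{11} = p_{01}$, $e_{00} = p_{10}$, and $e_{10} = p_{11} - p_{01}$. Assumption 3b rules out both always-takers and defiers ($U = 11$ and 01). Under strong monotonicity, the principal scores are given by $e_{10}(X) = p_{11}(X)$ and $e_{00}(X) = p_{10}(X)$, and the strata proportions are $e_{10} = p_{11}$ and $e_{00} = p_{10}$. To unify the presentation under standard and strong monotonicity, we re-express $e_{d_1d_0}(X)$ and $e_{d_1d_0}$ as:
$$e_{d_1d_0}(X) = p_{z^*d^*}(X) - k\, p_{01}(X), \qquad e_{d_1d_0} = p_{z^*d^*} - k\, p_{01}, \qquad (3)$$
for $d_1d_0 \in \{10, 00, 11, 01\}$, where $k = |d_1 - d_0|$, and $z^*d^* = 11$, 10, 01, and 01 if $d_1d_0 = 10$, 00, 11, and 01, respectively. Note that $p_{01}(X) = p_{01} = 0$ under strong monotonicity. Because $e_{d_1d_0}(X)$ is equivalent to $p_{z^*d^*}(X) - k\, p_{01}(X)$, we will use them interchangeably.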
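The mapping in (3) from the observable conditional probabilities of the post-treatment event to the principal scores can be written as a small lookup. The sketch below is illustrative only: the function name `principal_scores` and its arguments are hypothetical, and the inputs are assumed to be probabilities of $D = 1$ given $Z = 1$ and $Z = 0$ at a fixed covariate value.

```python
def principal_scores(p11, p01, strong=False):
    """Principal scores e_{d1d0} = p_{z*d*} - k * p_{01} at one covariate value.

    p11: P(D = 1 | Z = 1, X = x); p01: P(D = 1 | Z = 0, X = x).
    Hypothetical helper for illustration; not part of any library.
    """
    if strong:
        p01 = 0.0  # strong monotonicity: D_0 = 0 for all units, so p_{01} = 0
    if p11 < p01:
        raise ValueError("p11 < p01 is incompatible with monotonicity")
    # Observable probabilities p_{zd}, keyed by the string "zd".
    p = {"11": p11, "01": p01, "10": 1.0 - p11}
    # Stratum d1d0 -> reference cell z*d*.
    ref = {"10": "11", "00": "10", "11": "01", "01": "01"}
    scores = {}
    for stratum, zd in ref.items():
        k = abs(int(stratum[0]) - int(stratum[1]))  # k = |d1 - d0|
        scores[stratum] = p[zd] - k * p["01"]
    return scores

if __name__ == "__main__":
    print(principal_scores(0.7, 0.2))               # standard monotonicity
    print(principal_scores(0.7, 0.0, strong=True))  # strong monotonicity
```

The defier entry is returned for completeness and is zero by construction, which mirrors the role of monotonicity in ruling that stratum out.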
We next introduce two additional ignorability assumptions for mediation analysis within principal strata.
Assumption 4
(Generalized principal ignorability) , , , .
Principal ignorability has been previously introduced to identify the PCEs (Jo and Stuart, 2009; Ding and Lu, 2017; Forastiere et al., 2018). Assumption 4 generalizes the usual assumption to accommodate the mediator as an additional intermediate outcome. This assumption requires sufficient pre-treatment covariates to remove the confounding between the post-treatment event and the mediator and that between the post-treatment event and the outcome; in other words, no systematic differences exist in the distribution of the potential mediator and outcome across principal strata, given covariates. Next, we require ignorability of the mediator (Yamamoto, 2013; Park and Kürüm, 2020):
Assumption 5
(Ignorability of the mediator) } , , , , .
Assumption 5 assumes that the potential mediator is independent of the potential outcome, given the observed covariates $X$, within each assignment group and principal stratum. This assumption rules out unmeasured baseline confounders in the mediator-outcome relationship and requires that, apart from $D$, there are no other treatment-induced confounders affecting the mediator-outcome relationship. Assumption 5, coupled with Assumptions 2 and 4, generalizes the standard sequential ignorability assumption for causal mediation analysis (Imai et al., 2010) to address a post-randomization event $D$. In addition, when Assumptions 2–4 hold, Assumption 5 admits an equivalent form that does not condition on treatment assignment and principal stratum (see Lemma S6 in the Supplementary Material). Lastly, we state the following positivity assumption.
Assumption 6
(Positivity) Assume that , , , , and for any , , , and . Additionally assume with and under standard monotonicity.
Let $\mathcal{U}_a = \{10, 00, 11\}$ be the three strata under standard monotonicity and let $\mathcal{U}_b = \{10, 00\}$ be the two strata under strong monotonicity. Theorem 1 below shows that $\theta_{d_1d_0}^{(zz')}$ is nonparametrically identified under the aforementioned assumptions.
Theorem 1
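By Theorem 1, we have $\mathrm{PNIE}_{d_1d_0} = \theta_{d_1d_0}^{(11)} - \theta_{d_1d_0}^{(10)}$ and $\mathrm{PNDE}_{d_1d_0} = \theta_{d_1d_0}^{(10)} - \theta_{d_1d_0}^{(00)}$, and the decomposition of the ITT effect can also be identified by
$$\text{ITT-NIE} = \sum_{d_1d_0 \in \mathcal{U}} e_{d_1d_0} \left(\theta_{d_1d_0}^{(11)} - \theta_{d_1d_0}^{(10)}\right), \qquad \text{ITT-NDE} = \sum_{d_1d_0 \in \mathcal{U}} e_{d_1d_0} \left(\theta_{d_1d_0}^{(10)} - \theta_{d_1d_0}^{(00)}\right), \qquad (4)$$
where $\mathcal{U} = \{10, 00, 11\}$ under standard monotonicity and $\mathcal{U} = \{10, 00\}$ under strong monotonicity.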
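To make the identification argument concrete, the sketch below simulates a hypothetical discrete data-generating process that satisfies randomization, standard monotonicity, and the two ignorability conditions by construction, and then compares the true stratum-specific mean of $Y_{1M_0}$ with a plug-in computed from observed data only. The plug-in weights a g-formula-type inner sum by the principal score; this expression is our own reading of what the assumptions imply and is not quoted from Theorem 1. All functional forms and names (`mu`, `r1`, `simulate`, `plug_in`) are assumptions for illustration.

```python
import random
from collections import defaultdict

STRATA = {"10": (1, 0), "00": (0, 0), "11": (1, 1)}  # U -> (D_1, D_0); no defiers

def mu(z, d, m, x):
    """Assumed outcome mean E[Y_{zdm} | X = x]."""
    return 1.0 + z + 0.5 * d + 2.0 * m + 0.5 * x + 0.5 * z * m

def r1(z, d, x):
    """Assumed P(M_{zd} = 1 | X = x); does not depend on the stratum given x."""
    return 0.2 + 0.3 * z + 0.2 * d + 0.1 * x

def simulate(n, seed=0):
    rng = random.Random(seed)
    data, cross_world = [], defaultdict(list)
    for _ in range(n):
        x = int(rng.random() < 0.5)
        p_comp, p_never = (0.3, 0.3) if x else (0.5, 0.3)
        draw = rng.random()
        u = "10" if draw < p_comp else "00" if draw < p_comp + p_never else "11"
        d_pot = {1: STRATA[u][0], 0: STRATA[u][1]}
        m_pot = {zz: int(rng.random() < r1(zz, d_pot[zz], x)) for zz in (0, 1)}
        eps = rng.gauss(0.0, 1.0)
        # Cross-world outcome Y_{1 M_0}; available only inside a simulation.
        cross_world[u].append(mu(1, d_pot[1], m_pot[0], x) + eps)
        z = int(rng.random() < 0.5)  # randomized assignment
        d, m = d_pot[z], m_pot[z]
        data.append((x, z, d, m, mu(z, d, m, x) + eps))
    truth = {s: sum(v) / len(v) for s, v in cross_world.items()}
    return data, truth

def plug_in(data, u, z, zp):
    """Plug-in for theta^{(z z')} in stratum u using observed (X, Z, D, M, Y) only."""
    k = abs(STRATA[u][0] - STRATA[u][1])
    z_star, d_star = {"10": (1, 1), "00": (1, 0), "11": (0, 1)}[u]
    d_of = {1: STRATA[u][0], 0: STRATA[u][1]}
    num = den = 0.0
    for x in (0, 1):
        rows_x = [row for row in data if row[0] == x]
        fx = len(rows_x) / len(data)

        def p(zz, dd):
            arm = [row for row in rows_x if row[1] == zz]
            return sum(1 for row in arm if row[2] == dd) / len(arm)

        e_x = p(z_star, d_star) - k * p(0, 1)  # principal score at x, as in (3)
        out_cell = [row for row in rows_x if row[1] == z and row[2] == d_of[z]]
        med_cell = [row for row in rows_x if row[1] == zp and row[2] == d_of[zp]]
        inner = 0.0
        for m in (0, 1):
            ys = [row[4] for row in out_cell if row[3] == m]
            share_m = sum(1 for row in med_cell if row[3] == m) / len(med_cell)
            inner += (sum(ys) / len(ys)) * share_m
        num += fx * e_x * inner
        den += fx * e_x
    return num / den

if __name__ == "__main__":
    data, truth = simulate(40000, seed=3)
    print(plug_in(data, "10", 1, 0), truth["10"])
```

Under this particular process the complier value of $E[Y_{1M_0} \mid U = 10]$ works out analytically to 3.28125, so both printed numbers should be close to that value up to Monte Carlo error.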
Remark 1
Rudolph et al. (2024) considered instrumental variables to identify the interventional and natural mediation effects among compliers. In a comparable scenario, they showed that under (i) exclusion restriction, (ii) standard monotonicity, and (iii) sequential randomization (Assumption 2 plus ), is identified by . A similar identification formula follows for . Due to exclusion restriction, the mediation effects among other strata are automatically zero, and thus only the compliers stratum contributes information to the ITT natural mediation effect. Replacing exclusion restriction with principal ignorability, Theorem 1 allows additional strata to contribute information to the ITT natural mediation effect and enables point identification of each stratum-specific natural mediation effect.
3 Connections to mediation analysis with a treatment-induced confounder or two mediators
Although we consider $D$ as a primary source to sub-classify the study population, there exist two complementary perspectives for the role of $D$ in causal mediation analysis; that is, $D$ can be viewed as a binary post-treatment confounder or another mediator sitting on the causal pathway between the treatment and the outcome. We discuss the connections between the present work and existing mediation methods for addressing treatment-induced confounding (Robins and Richardson, 2010; Tchetgen Tchetgen and VanderWeele, 2014; VanderWeele et al., 2014; Miles et al., 2020; Díaz et al., 2021; Xia and Chan, 2021) or two causally-ordered mediators (Albert and Nelson, 2011; Daniel et al., 2015; Steen et al., 2017; Zhou, 2022). When $D$ is considered as another mediator, methods have been proposed for identification of the path-specific effects through the four causal pathways given by Figure 1(a)–(d) (e.g., Daniel et al., 2015; Zhou, 2022). If $D$ is treated as a post-treatment confounder, methods have been developed for identifying different versions of mediation effects through $M$, including the interventional mediation effects (VanderWeele et al., 2014; Díaz et al., 2021), the natural mediation effects (Robins and Richardson, 2010; Tchetgen Tchetgen and VanderWeele, 2014; Xia and Chan, 2021), and the path-specific effect on the causal pathway in Figure 1(c) (VanderWeele et al., 2014; Miles et al., 2017, 2020).
In contrast to methods that view $D$ as a post-treatment confounder, our work addresses a different scientific question. Both our approach and methods for two causally-ordered mediators aim to disentangle the roles of $D$ and $M$ in jointly explaining the causal mechanism, whereas mediation methods with a post-treatment confounder focus on the primary role of $M$ in explaining the causal mechanism. For example, Tchetgen Tchetgen and VanderWeele (2014) and Xia and Chan (2021) studied identification of the natural mediation effects defined in (2), which summarize the causal sequence through $M$ on the outcome marginalized over different levels of $D$. Similarly, the interventional mediation effects in VanderWeele et al. (2014) and Díaz et al. (2021) only considered the causal sequence through mediator $M$ to define the estimand of interest. Finally, Miles et al. (2017) and Miles et al. (2020) mainly considered the path-specific effects on the causal pathway in Figure 1(c), where other causal pathways passing through $D$ were assumed of less interest.
Comparing the present work to methods for identifying path-specific effects, a notable difference lies in the causal estimands of interest. We specifically focus on decomposing the causal effects within endogenous subgroups characterized by the joint potential values of $D$, whereas the path-specific effects are defined for the entire study population. A further difference lies in the identification assumptions. Identifying path-specific effects requires certain ignorability assumptions regarding the observed post-treatment variable $D$ directly. For example, Daniel et al. (2015) requires that for any and . On the other hand, our identification assumptions require the use of the potential values of $D$ to define the principal stratum ($U$) and then invoke ignorability assumptions across the principal stratum $U$.
Despite the aforementioned differences, there exist mathematical connections across the requisite identification conditions. We provide two further remarks about such connections, in particular to the work by Tchetgen Tchetgen and VanderWeele (2014) (on treatment-induced confounding) and Zhou (2022) (on two causally-ordered mediators). All proofs are provided in the Supplementary Material.
Remark 2
Tchetgen Tchetgen and VanderWeele (2014) used a nonparametric structural equation model with independent errors (NPSEM-IE) for the DAG in Figure 1, coupled with monotonicity, to identify the natural mediation effects (2) with a binary post-treatment confounder. Suppose that consistency (Assumption 1) and monotonicity (either Assumption 3a or 3b) hold. If, in addition, the NPSEM-IE for the DAG in Figure 1 (i.e., Assumption S1 in the Supplementary Material) holds, then Assumptions 2, 4, and 5 also hold, but not vice versa.
By Remark 2, under the consistency and monotonicity, the ignorability assumptions (Assumptions 2, 4, and 5) are directly implied from a NPSEM-IE corresponding to the DAG in Figure 1. Remark 2 further implies that the identification formulas for the natural mediation effects (4) are equivalent to the identification formulas in Tchetgen Tchetgen and VanderWeele (2014), except that the present work invokes technically weaker assumptions.
Remark 3
Zhou (2022) considered a set of generalized sequential ignorability assumptions to identify path-specific effects with multiple mediators, and they are comparable to the present work in the special case when $D$ is binary. Suppose that consistency (Assumption 1) holds; then, under monotonicity (either Assumption 3a or 3b), the set of generalized sequential ignorability assumptions in Zhou (2022) (i.e., Assumption S2 in the Supplementary Material) is equivalent to Assumptions 2, 4, and 5.
By Remark 3, the assumptions in the present work are stronger than those in Zhou (2022), since the latter does not require monotonicity. This is expected as stronger assumptions are necessary to identify our finer-grained estimands that provide insights into the pathways within each subpopulation. Finally, if monotonicity of the treatment on the first mediator $D$ is plausible, our assumptions are equivalent to the set of generalized sequential ignorability assumptions in Zhou (2022).
4 Estimation of natural mediation effects
4.1 Nuisance functions and parametric working models
We first define several nuisance functions of the observed-data distributions. Let $\pi_z(X) = f(Z = z \mid X)$ be the probability of treatment conditional on $X$, where $\pi_1(X)$ is the propensity score. Note that $\pi_z(X)$ degenerates to a constant value in randomized experiments. Let $r_{zd}(m, X) = f(M = m \mid Z = z, D = d, X)$ be the probability of the mediator conditional on $Z$, $D$, and $X$. Let $\mu_{zd}(m, X) = E[Y \mid Z = z, D = d, M = m, X]$ be the conditional expectation of $Y$ given $Z$, $D$, $M$, and $X$. Let $h_{\text{nuisance}} = \{\pi_z(x), p_{zd}(x), r_{zd}(m, x), \mu_{zd}(m, x)\}$ contain all nuisance functions, where $p_{zd}(x)$ is defined in Section 2. It should be noted that, within our definitions of the nuisance functions, the two variables $\{Z, D\}$, which directly relate to the principal strata, are presented as the subscript, while all other variables are presented as arguments.
One can specify parametric working models for $h_{\text{nuisance}}$. Specification of the parametric working models can be flexible. For example, logistic regressions can be used for $\pi_z(X)$ and $p_{zd}(X)$. When the mediator is continuous or binary, a linear regression or a logistic regression can be employed for $r_{zd}(m, X)$. Similarly, a generalized linear model can be used for $\mu_{zd}(m, X)$. Detailed model examples are provided in the Supplementary Material. Hereafter, we use $\mathcal{M}_\pi$ to denote the submodel of the nonparametric model $\mathcal{M}_{np}$ with a correctly specified working model for $\pi_z(x)$ and unspecified other components. Analogously, define $\mathcal{M}_e$, $\mathcal{M}_m$, and $\mathcal{M}_o$ as the submodels of $\mathcal{M}_{np}$ with a correctly specified $p_{zd}(x)$, $r_{zd}(m, x)$, and $\mu_{zd}(m, x)$, respectively. In addition, we use “$\cup$” and “$\cap$” to denote union and intersection of submodels such that $\mathcal{M}_\pi \cap \mathcal{M}_e$ denotes the correct specification of both $\pi_z(x)$ and $p_{zd}(x)$, and $\mathcal{M}_\pi \cup \mathcal{M}_e$ denotes the correct specification of either $\pi_z(x)$ or $p_{zd}(x)$.
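As suggested by Theorem 1, one also needs to estimate $e_{d_1d_0}$, or equivalently $p_{zd}$, in order to estimate $\theta_{d_1d_0}^{(zz')}$. There are multiple ways to estimate $p_{zd}$; for example, one can use the plug-in estimator $\mathbb{P}_n[\widehat{p}_{zd}(X)]$ or the inverse probability weighting estimator $\mathbb{P}_n[I(Z = z) I(D = d) / \widehat{\pi}_z(X)]$. In randomized experiments, because assignment does not depend on $X$, one can also estimate $p_{zd}$ by the empirical proportion $\mathbb{P}_n[I(Z = z, D = d)] / \mathbb{P}_n[I(Z = z)]$. In this article, we consider the doubly robust estimator developed in Jiang et al. (2022),
$$\widehat{p}_{zd}^{\,\mathrm{dr}} = \mathbb{P}_n\left[\frac{I(Z = z)\{I(D = d) - \widehat{p}_{zd}(X)\}}{\widehat{\pi}_z(X)} + \widehat{p}_{zd}(X)\right], \qquad (5)$$
which is consistent for $p_{zd}$ under $\mathcal{M}_\pi \cup \mathcal{M}_e$ and is locally efficient under $\mathcal{M}_\pi \cap \mathcal{M}_e$.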
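The double robustness of an augmented inverse probability weighting estimator of $p_{zd}$ can be illustrated with a short simulation. The sketch below assumes the standard augmented form (a weighted residual plus the model prediction) and a hypothetical data-generating process; the function names `dr_estimate`, `true_pi`, and `true_p` are illustrative, and the "estimated" nuisance functions are simply the truth or a deliberately wrong constant rather than fitted regressions.

```python
import random

def true_pi(z, x):
    """Assumed P(Z = z | X = x) for the illustration."""
    p1 = 0.3 + 0.4 * x
    return p1 if z == 1 else 1.0 - p1

def true_p(z, d, x):
    """Assumed P(D = d | Z = z, X = x) for the illustration."""
    p1 = 0.4 + 0.4 * x if z == 1 else 0.1 + 0.2 * x
    return p1 if d == 1 else 1.0 - p1

def simulate_zd(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()                       # X ~ Uniform(0, 1)
        z = int(rng.random() < true_pi(1, x))
        d = int(rng.random() < true_p(z, 1, x))
        data.append((x, z, d))
    return data

def dr_estimate(data, z, d, pi_hat, p_hat):
    """Augmented IPW estimate of p_{zd} = E[P(D = d | Z = z, X)]."""
    total = 0.0
    for x, zi, di in data:
        resid = (1.0 if di == d else 0.0) - p_hat(z, d, x)
        weight = (1.0 if zi == z else 0.0) / pi_hat(z, x)
        total += weight * resid + p_hat(z, d, x)
    return total / len(data)

if __name__ == "__main__":
    data = simulate_zd(50000, seed=7)
    wrong_pi = lambda z, x: 0.5       # misspecified propensity model
    wrong_p = lambda z, d, x: 0.5     # misspecified event model
    # True p_{11} is 0.4 + 0.4 * E[X] = 0.6 under this process.
    print(dr_estimate(data, 1, 1, true_pi, wrong_p))
    print(dr_estimate(data, 1, 1, wrong_pi, true_p))
```

Both printed estimates should be near 0.6, reflecting that only one of the two working models needs to be correct for consistency.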
4.2 Moment-type estimators
We provide four distinct identification expressions of $\theta_{d_1d_0}^{(zz')}$; each expression uses only part, but not all, of the nuisance functions $h_{\text{nuisance}}$ and the principal stratum proportion $e_{d_1d_0}$.
Theorem 2
For , or under standard or strong monotonicity, we have , where
with , , 11, 10, 01 if 10, 00, and 11, respectively.
The first expression is an average of outcome by the product of four different weights, where the first weight, , is the principal score weight for creating a pseudo-population within stratum (Jiang et al., 2022). The remaining three weights in —the inverse probability of treatment weight, the inverse probability of the post-treatment event weight, and the mediator density ratio weight—correct for selection bias associated with the treatment, post-treatment event, and the observed mediator value, within the pseudo population created by the principal score weight. The second expression is a product of two components, where the first component, , plays a similar role to the principal score weight to create a pseudo-population of stratum , and the second component, is a conditional version of given fixed values of baseline covariates and within stratum . Construction of the third expression bears some resemblance to the first expression, both of which use the principal score weight, except that the third expression uses a slightly different weighting scheme coupled with the conditional expectation of outcome instead of weighting directly on the observed outcome. The fourth expression shares a similar form to the second expression, but now involves the product between the principal score weight and .
According to Theorem 2, we can obtain the four moment-type estimators, , by replacing the unknown nuisance functions with their estimates from parametric working models and substituting the outer expectation operator by the empirical average operator . As an example, is given by
where . Here, the integral in becomes simple summations when the mediator is categorical and, if the mediator is continuous, numerical integration can be used for an approximate calculation. We summarize the asymptotic properties of the four moment-type estimators below.
Proposition 1
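Suppose that the regularity conditions outlined in the Supplementary Material hold. Then, the four moment-type estimators, in the order of the four expressions in Theorem 2, are consistent and asymptotically normal under $\mathcal{M}_\pi \cap \mathcal{M}_e \cap \mathcal{M}_m$, $\mathcal{M}_\pi \cap \mathcal{M}_m \cap \mathcal{M}_o$, $\mathcal{M}_\pi \cap \mathcal{M}_e \cap \mathcal{M}_o$, and $\mathcal{M}_e \cap \mathcal{M}_m \cap \mathcal{M}_o$, respectively.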
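For a continuous mediator, the integral over the mediator that enters the moment-type estimators of this section has to be approximated numerically. The sketch below uses a plain trapezoidal rule for the integral of an outcome regression times a mediator density at a fixed covariate value. The normal mediator density and the quadratic outcome mean are assumptions chosen so that the result can be compared with a closed form; the helper names are hypothetical.

```python
import math

def normal_pdf(m, mean, sd):
    """Density of a normal distribution, used as an assumed mediator model."""
    return math.exp(-0.5 * ((m - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def integrate_outcome_over_mediator(mu, dens, lo, hi, steps=2000):
    """Trapezoidal approximation of the integral of mu(m) * dens(m) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (mu(lo) * dens(lo) + mu(hi) * dens(hi))
    for i in range(1, steps):
        m = lo + i * h
        total += mu(m) * dens(m)
    return total * h

if __name__ == "__main__":
    x = 1.0
    med_mean, med_sd = 0.5 + 0.3 * x, 1.0             # assumed mediator model at X = x
    outcome = lambda m: 1.0 + 2.0 * m + 0.5 * m ** 2  # assumed outcome mean at X = x
    density = lambda m: normal_pdf(m, med_mean, med_sd)
    approx = integrate_outcome_over_mediator(
        outcome, density, med_mean - 8.0 * med_sd, med_mean + 8.0 * med_sd)
    exact = 1.0 + 2.0 * med_mean + 0.5 * (med_mean ** 2 + med_sd ** 2)
    print(approx, exact)
```

Truncating the range at eight standard deviations is a pragmatic choice for a light-tailed density; heavier-tailed mediator models would call for a wider range or a different quadrature rule.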
4.3 From efficient influence function to multiply robust estimator
Denote $\mathcal{M}_{np}$ as the nonparametric model over the observed data density function $f(O)$. The efficient influence function (EIF) of $\theta_{d_1d_0}^{(zz')}$ under $\mathcal{M}_{np}$ is derived in Theorem 3 based on semiparametric estimation theory (Bickel et al., 1993), which also implies the semiparametric efficiency bound, i.e., the lower bound of the asymptotic variance among all regular and asymptotically linear estimators of $\theta_{d_1d_0}^{(zz')}$ under the nonparametric model $\mathcal{M}_{np}$.
Theorem 3
The EIF of over is
where
, 11, 10, 01 if 10, 00, and 11, respectively. Therefore, the semiparametric efficiency bound for estimation of is .
Theorem 3 inspires a new estimator of by solving the following EIF-induced estimating equation
where and depend on nuisance functions and the denominator is a constant that does not affect the solution of the estimating equation. Therefore, the new estimator, which we hereafter refer to as the multiply robust estimator, can be constructed as
Theorem 4 summarizes the asymptotic properties of the multiply robust estimator.
Theorem 4
Suppose that the regularity conditions outlined in Supplementary Material hold. Under either , , , or , the multiply robust estimator is consistent and asymptotically normal such that converges to a zero-mean normal distribution with finite variance . Moreover, achieves the semiparametric efficiency bound under .
An attractive property of is that it offers four types of protection against misspecification of the parametric working models. Notice that the four moment-type estimators provided in Section 4.2 are only singly robust; for example, is only consistent under . By contrast, is quadruply robust such that it is consistent for even if one of the four working models, , , , and , is misspecified. In addition, is also locally efficient when all of the four working models are correctly specified. A proof of the quadruple robustness property is given in the Supplementary Material. As a caveat, quadruple robustness is more stringent than double robustness, as the former requires three out of four working models to be correct whereas the latter requires only one out of two working models to be correct. In practice, one can use the nonparametric bootstrap to construct the standard error and confidence interval of .
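The bootstrap step can be sketched as follows. This is a generic illustration under stated assumptions: `estimator` is a hypothetical placeholder for any function that maps a dataset (a list of rows) to a point estimate, the number of resamples is arbitrary, and the Wald-type interval mirrors the interval type reported in the simulation study rather than a prescribed choice.

```python
import random
import statistics

def bootstrap_wald_interval(data, estimator, n_boot=500, z=1.96, seed=0):
    """Resample rows with replacement, re-estimate, and form point +/- z * SE."""
    rng = random.Random(seed)
    n = len(data)
    point = estimator(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    se = statistics.stdev(replicates)  # bootstrap standard error
    return point, se, (point - z * se, point + z * se)

# Toy usage: the sample mean stands in for the multiply robust estimator.
toy_data = [0.1 * i for i in range(50)]
point, se, (lower, upper) = bootstrap_wald_interval(
    toy_data, lambda rows: sum(rows) / len(rows))
```

With parametric working models, each bootstrap replicate refits all nuisance models on the resampled data, so the resulting standard error reflects the uncertainty from nuisance estimation.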
Remark 4
The monotonicity assumption can place a restriction on the observed data density ; that is, the standard monotonicity indicates that and the strong monotonicity further constrains . Following previous efforts in obtaining efficient causal estimators under a principal stratification framework (Rudolph et al., 2024; Jiang et al., 2022), the EIF in Theorem 3 is derived under the nonparametric model , which does not leverage the monotonicity restriction on to potentially sharpen the efficiency bound. Therefore, is only locally efficient under , rather than under a more restrictive model space assuming monotonicity.
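To make the role of monotonicity concrete, the sketch below computes principal scores from the two conditional probabilities of the post-treatment event, using the standard mapping under monotonicity (e.g., Ding and Lu, 2017). The function name and the stratum labels `"11"`, `"10"`, and `"00"` (for the joint potential values of the event under treatment and control) are our own illustrative conventions.

```python
def principal_scores(p1, p0=0.0):
    """Map event probabilities to principal scores under monotonicity.

    p1 = P(D = 1 | Z = 1, X = x) and p0 = P(D = 1 | Z = 0, X = x).
    Standard monotonicity requires p1 >= p0; strong monotonicity sets p0 = 0.
    """
    if p1 < p0:
        raise ValueError("standard monotonicity requires p1 >= p0")
    return {"11": p0, "10": p1 - p0, "00": 1.0 - p1}

# Standard monotonicity: three strata with positive probability.
standard = principal_scores(p1=0.7, p0=0.2)

# Strong monotonicity: the "11" stratum is empty because p0 = 0.
strong = principal_scores(p1=0.55)
```

The inequality check in the function is exactly the restriction on the observed data density that the remark refers to; the nonparametric model ignores it.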
4.4 Nonparametric efficient estimation
We extend the proposed multiply robust estimator by estimating the nuisance functions, , via flexible nonparametric methods or modern data-adaptive machine learning methods. We denote the new estimator as with the superscript “np” to indicate using nonparametric algorithms. The cross-fitting procedure (Chernozhukov et al., 2018) is employed to circumvent the overfitting bias from nonparametric estimation of the nuisance functions. Specifically, we randomly partition the dataset into groups of approximately equal size such that the group sizes differ by at most 1. For each , let be the data in the -th group and be the data excluding the -th group. For , we calculate the nuisance function estimates on data , denoted by , based on machine learning or nonparametric methods trained on data . The nuisance function estimates evaluated over the entire dataset, , are therefore a combination of , , , . Finally, is given by the solution to
so that , where and are and evaluated based on .
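The cross-fitting procedure described above can be sketched as follows. This is a simplified illustration for a single nuisance function; `fit` and `predict` are hypothetical placeholders for any learner (the toy example uses a training-sample mean, whereas the applications in this article use the Super Learner), and the fold construction is one of several ways to obtain group sizes that differ by at most 1.

```python
import random

def make_folds(n, n_folds, seed=0):
    """Randomly split indices 0..n-1 into folds whose sizes differ by at most 1."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[v::n_folds] for v in range(n_folds)]

def cross_fit_predictions(x, y, fit, predict, n_folds=5, seed=0):
    """Out-of-fold predictions: each unit is predicted by a model that never saw it."""
    n = len(y)
    predictions = [None] * n
    for fold in make_folds(n, n_folds, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            predictions[i] = predict(model, x[i])
    return predictions

# Toy learner (hypothetical): predict the training-sample mean for every unit.
fit_mean = lambda xs, ys: sum(ys) / len(ys)
predict_mean = lambda model, x_new: model

x_toy = list(range(103))
y_toy = [float(v) for v in x_toy]
folds = make_folds(len(y_toy), 5)
preds = cross_fit_predictions(x_toy, y_toy, fit_mean, predict_mean, n_folds=5)
```

The combined out-of-fold predictions are then plugged into the EIF-induced estimating equation, one nuisance function at a time.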
Theorem 5
Theorem 5 indicates that is consistent, asymptotically normal, and achieves the semiparametric efficiency lower bound if all nuisance functions can be consistently estimated with a rate, which can be achieved by several machine learning algorithms (e.g., the boosting approach of Luo et al. (2016), the random forest of Wager and Walther (2015), and the neural networks of Chen and White (1999)). When nuisance functions are estimated via data-adaptive methods, we use the empirical variance of the estimated EIF to construct the variance estimator for ; that is
where is constructed analogously to in (5) but evaluated using .
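As a hedged illustration of this variance estimator, the sketch below turns a vector of estimated EIF values into a standard error and a Wald-type interval. It assumes the estimated EIF values are (approximately) mean zero at the solution of the estimating equation, so that their empirical second moment estimates the asymptotic variance; the function name and the toy inputs are ours.

```python
import math

def eif_wald_interval(point_estimate, eif_values, z=1.96):
    """Standard error and Wald interval from estimated EIF values."""
    n = len(eif_values)
    # Empirical second moment of the EIF estimates the asymptotic variance.
    asymptotic_variance = sum(v * v for v in eif_values) / n
    # Dividing by n gives the variance of the estimator itself.
    se = math.sqrt(asymptotic_variance / n)
    return se, (point_estimate - z * se, point_estimate + z * se)

# Toy usage with made-up EIF values.
se_toy, (ci_low, ci_high) = eif_wald_interval(0.2, [1.0, -1.0, 1.0, -1.0])
```

Unlike the bootstrap, this closed-form estimator requires no refitting of the machine learners, which is why it is attractive when the nuisance functions are estimated data-adaptively.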
4.5 Estimation of natural mediation effects
Once we obtain , estimators of and can be constructed based on (1). For example, we can construct and , if either the moment-type method (s=a, b, c or d), the multiply robust estimator (s=mr), or the nonparametric efficient estimator (s=np) is used for . Analogously, ITT-NIE and ITT-NDE can be estimated using (4) by replacing and with their corresponding estimators. Specifically, we can construct estimators of ITT-NIE and ITT-NDE as and if either the moment-type method (s=a, b, c or d) or the multiply robust estimator (s=mr) is used for estimating , where and and under standard and strong monotonicity assumptions, respectively. In particular, the multiply robust estimators have the following explicit expressions
(6) | ||||
(7) |
Similarly, the nonparametric estimators and can be obtained by replacing all in (6) and (7) with . In the Supplementary Material, we show that, for all , is consistent and semiparametrically efficient if conditions in Theorem 5 are satisfied and is still quadruply robust and locally efficient when all working models in are correctly specified. Details on inference are given in the Supplementary Material.
Although we primarily discuss mediation effects on a mean difference scale, all effects can be defined on other scales as needed. For example, with a binary outcome one can consider a risk ratio scale and use and to quantify the natural indirect and direct effects within principal stratum . Similarly, one can use and to measure the natural indirect and direct effects among the entire study population. Estimation of ratio mediation effects is straightforward based on , and is omitted for brevity.
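The following sketch shows how the natural indirect, natural direct, and total effects could be assembled from three estimated counterfactual means on either scale. The argument names are illustrative assumptions: `theta_11`, `theta_10`, and `theta_00` stand for the estimated mean outcome with the treatment set to the first index and the mediator set to its potential value under the second index, within a given stratum or in the whole population.

```python
def mediation_effects(theta_11, theta_10, theta_00, scale="difference"):
    """Natural indirect, natural direct, and total effects from three mean estimates."""
    if scale == "difference":
        return {"indirect": theta_11 - theta_10,
                "direct": theta_10 - theta_00,
                "total": theta_11 - theta_00}
    if scale == "ratio":
        return {"indirect": theta_11 / theta_10,
                "direct": theta_10 / theta_00,
                "total": theta_11 / theta_00}
    raise ValueError("scale must be 'difference' or 'ratio'")

# Toy usage with made-up risks for a binary outcome.
on_difference = mediation_effects(0.30, 0.28, 0.20)
on_ratio = mediation_effects(0.30, 0.28, 0.20, scale="ratio")
```

On the difference scale the total effect is the sum of the indirect and direct effects, whereas on the ratio scale it is their product.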
5 A simulation study
We investigate the finite-sample performance of the proposed methods via simulation studies. We consider the following data generation process modified from that in Kang and Schafer (2007), in which the positivity assumptions are practically violated under model misspecification. Specifically, we generate 1000 Monte Carlo samples with by the following process. We draw baseline covariates , and
In correctly specified parametric working models, we directly include the true baseline covariates into each working model, where specifications of the working models are given in Supplementary Material. Otherwise, we include a set of transformed covariates, , into misspecified working models, where , , , and . We evaluate each of the proposed moment-type estimators and multiply robust estimators under 6 different scenarios: (i) all components in are correctly specified; (ii) only is misspecified; (iii) only is misspecified; (iv) only is misspecified; (v) only is misspecified; (vi) all components in are misspecified.
For the nonparametric estimator, we consider a five-fold cross-fitting with the nuisance functions estimated by the Super Learner (Van der Laan et al., 2007) with a combination of random forest and generalized linear model libraries. Although the Super Learner is more flexible than parametric working models, its performance still depends on the quality of the input feature matrix. In each of Scenarios (i)–(vi), we use the true covariates as the feature matrix under the correctly specified nuisance scenario and the transformed covariates as the feature matrix under the misspecified nuisance scenario.

Figure 2 presents the boxplots of different estimators of over 1000 Monte Carlo simulations, with each panel corresponding to a specific simulation scenario. As expected, the moment-type estimators are centered around the true value if the required parametric working models are all correctly specified but may diverge from the true value otherwise. The multiply robust estimator exhibits minimal bias in Scenarios (i)–(v), confirming the quadruply robust property; however, it exhibits bias when all of the working models are misspecified, as demonstrated in Scenario (vi). The nonparametric efficient estimator performs fairly well with minimal bias in Scenarios (i)–(v), and its bias in Scenario (vi) is also smaller than that of the multiply robust estimator with parametric working models. For each scenario, we also investigate the 95% Wald-type confidence interval coverage rate in Table 1, where the variance is estimated by the bootstrap for the moment-type and multiply robust estimators and by the empirical variance of the EIF for the nonparametric method. We observe that both and present close to nominal coverage in Scenarios (i)–(v), but their coverage rates are attenuated in Scenario (vi) due to misspecification. Moreover, the moment estimator appears to have nominal coverage except for Scenario (iii), likely due to the over-estimation of the true sampling variance for this weighting estimator under misspecified weights. We also evaluate estimators of and and results are qualitatively similar. The detailed additional simulation results are provided in the Supplementary Material Figures S1–S2.
Scenario | a | b | c | d | mr | np |
---|---|---|---|---|---|---|
(i) All nuisance correctly specified | 0.951 | 0.910 | 0.924 | 0.930 | 0.929 | 0.969 |
(ii) misspecified | 0.955 | 0.874 | 0.941 | 0.921 | 0.932 | 0.953 |
(iii) misspecified | 0.871 | 0.926 | 0.738 | 0.717 | 0.918 | 0.951 |
(iv) misspecified | 0.938 | 0.907 | 0.921 | 0.932 | 0.943 | 0.953 |
(v) misspecified | 0.954 | 0.873 | 0.799 | 0.748 | 0.946 | 0.958 |
(vi) All nuisance misspecified | 0.953 | 0.657 | 0.919 | 0.870 | 0.732 | 0.812 |
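For readers who wish to reproduce this type of summary, the sketch below computes the empirical coverage of Wald-type intervals across Monte Carlo replicates, together with the Monte Carlo standard error of a coverage rate. The function names and toy inputs are illustrative assumptions and are not taken from the simulation code of this article.

```python
import math

def coverage_rate(estimates, std_errors, truth, z=1.96):
    """Fraction of replicates whose interval estimate +/- z * SE contains the truth."""
    hits = sum(1 for est, se in zip(estimates, std_errors)
               if abs(est - truth) <= z * se)
    return hits / len(estimates)

def monte_carlo_se(coverage, n_replicates):
    """Binomial standard error of an estimated coverage rate."""
    return math.sqrt(coverage * (1.0 - coverage) / n_replicates)

# Toy usage: four replicates with unit standard errors and a true value of 0.
toy_coverage = coverage_rate([0.0, 0.5, 3.0, -0.1], [1.0, 1.0, 1.0, 1.0], truth=0.0)

# Monte Carlo error of a nominal 95% coverage rate based on 1000 replicates.
nominal_mc_se = monte_carlo_se(0.95, 1000)
```

With 1000 replicates, the Monte Carlo standard error at nominal coverage is about 0.007, so empirical coverage rates within roughly 0.014 of 0.95 are compatible with nominal coverage.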
6 Two real-data applications
6.1 A job training program with noncompliance
JOBS II is a randomized field experiment among 1,801 unemployed workers that examined the effect of a job training workshop designed to promote mental health and high-quality reemployment (Price et al., 1992). Participants in the treatment group () were assigned to a job skills workshop, but 45% of the participants did not show up and thus effectively received the control condition. Let be the indicator of whether the individual attends the workshop, where among all participants in the control group because they had no access to the workshops. As strong monotonicity holds by design, we have two principal strata: compliers and never-takers. In the JOBS II study, it is of interest to assess the effect of attending the job skills workshop () on depression () among the compliers, which quantifies the efficacy of the training program on mental health (VanderWeele, 2011). Previous efforts (e.g., Park and Kürüm, 2020) have also investigated the role of sense of mastery () in mediating the effect of the job skills workshop () on depression () among the complier stratum, typically under the exclusion restriction assumption. Because JOBS II is not double-blinded, the exclusion restriction may not hold due to psychological effects (Park and Kürüm, 2020; Stuart and Jo, 2015). For example, Stuart and Jo (2015) pointed out that participants assigned to the workshop may feel more optimistic about their reemployment opportunity, suggesting direct pathways from the assignment to depression not via their actual attendance status.
We assess causal mediation under our proposed assumptions, which permit the exploration of causal mechanisms among never-takers. The mediator is sense of mastery at 6 weeks after randomization, with indicating a higher sense of mastery. The outcome is a continuous measure of depression at 6 months after randomization, which ranges from 1 to 4 with a higher value indicating worse depression. Baseline covariates () include age, gender, race, marital status, education, assertiveness, level of economic hardship, level of depression, and motivation. For the moment-type and multiply robust estimators, we used the parametric working models described in the Supplementary Material for the nuisance functions. Of note, the propensity score is known by randomization, and therefore the working logistic regression of is not subject to misspecification; we still include all baseline covariates in this working logistic regression to adjust for chance imbalance. For the nonparametric efficient estimator, we used Super Learner with the random forest and generalized linear model libraries for estimating the nuisance functions. We only present the results from the multiply robust estimator and nonparametric efficient estimator; complete results from other estimators are in Supplementary Material Tables S1–S2.
Population | Estimand | Method: mr | Method: np | Method: Rudolph et al.§ |
---|---|---|---|---|
Overall | ITT-NIE | 0.015 (0.032, 0.003) | 0.017 (0.031, 0.004) | 0.017 (0.031, 0.004) |
Overall | ITT-NDE | 0.072 (0.151, 0.001) | 0.067 (0.146, 0.013) | 0.067 (0.146, 0.013) |
Overall | ITT | 0.088 (0.167, 0.016) | 0.084 (0.162, 0.005) | 0.084 (0.162, 0.005) |
Compliers | PNIE | 0.026 (0.055, 0.006) | 0.029 (0.052, 0.006) | 0.030 (0.053, 0.006) |
Compliers | PNDE | 0.083 (0.170, 0.002) | 0.066 (0.156, 0.023) | 0.115 (0.251, 0.022) |
Compliers | PCE | 0.109 (0.191, 0.027) | 0.096 (0.182, 0.009) | 0.145 (0.280, 0.008) |
Never-takers | PNIE | 0.000 (0.006, 0.006) | 0.001 (0.004, 0.003) | – |
Never-takers | PNDE | 0.058 (0.160, 0.031) | 0.066 (0.163, 0.032) | – |
Never-takers | PCE | 0.058 (0.160, 0.030) | 0.067 (0.163, 0.032) | – |
§ ‘Rudolph et al.’ is the nonparametric efficient estimator in Rudolph et al. (2024); see Remark 1 for more details of this approach. Their identification formulas of the ITT natural mediation effects are identical to the present work; this explains the numerical equivalence between ‘np’ and ‘Rudolph et al.’ for the ITT analysis. All effects among never-takers given by Rudolph et al. (2024) are zero due to exclusion restriction.
Table 2 (upper panel) presents the estimated ITT effect and its indirect and direct effect decomposition. Both multiply robust and nonparametric efficient estimators present similar results, indicating that the job skills workshop corresponds to negative ITT and ITT-NIE estimates, confirming that sense of mastery is a mediator of the total effect on depression. However, the ITT estimands do not provide resolution to the potential heterogeneity of mediation effects between compliers and never-takers. The estimated proportions of compliers and never-takers are 55% and 45%, respectively (under the nonparametric efficient estimator). We present the stratum-specific mean and standard deviation of the baseline characteristics in Supplementary Material Table S3. Compared to the never-takers, the compliers are older with higher education; a larger fraction of compliers are female, white, married, and are more motivated to participate in the study, but less assertive. To offer a complete picture of the mediation mechanism for compliers and never-takers, Table 2 (middle and bottom panels) additionally presents the estimated PCEs, together with their mediation effect decomposition. For the compliers stratum, both multiply robust and nonparametric efficient estimators suggest that the JOBS II intervention exerts a statistically significant effect on reducing depression, and approximately one quarter of the can be explained by the improvement of sense of mastery. For the never-takers, we observe a smaller but still beneficial effect of the intervention on reducing depression; the 95% confidence interval for crosses zero. For both the multiply robust and nonparametric efficient estimators, the indirect effect among the never-takers is estimated to be almost zero (e.g., with 95% confidence interval ).
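The share of the complier effect attributed to sense of mastery can be checked with simple arithmetic. The sketch below uses the magnitudes of the complier PNIE and PCE point estimates as printed in Table 2; the function name is ours, and the ratio is unaffected by the common sign of the two effects.

```python
def proportion_mediated(indirect, total):
    """Ratio of the indirect effect to the total effect on the difference scale."""
    return indirect / total

# Complier stratum, magnitudes of the point estimates printed in Table 2.
pm_mr = proportion_mediated(0.026, 0.109)  # multiply robust estimator
pm_np = proportion_mediated(0.029, 0.096)  # nonparametric efficient estimator
```

These ratios are roughly 0.24 for the multiply robust estimator and 0.30 for the nonparametric efficient estimator.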
We also compare our results to estimates under exclusion restriction. Table 2 (right column) shows the estimates using the approach developed by Rudolph et al. (2024) (see Remark 1 for more details of this approach). Due to exclusion restriction, all mediation effects among never-takers are assumed zero. We observe that the point and interval estimates of under exclusion restriction are close to their counterparts under principal ignorability, which is anticipated as the PNIE estimate among never-takers is minimal under principal ignorability. However, the point estimates of and under exclusion restriction are larger than those under principal ignorability.
6.2 An epidemiological study with an intercurrent event
We re-analyze the World Health Organization’s Large Analysis and Review of European Housing and Health Status (WHO-LARES) study with 5,882 individuals for the effect of living in damp or moldy conditions ( if yes and 0 if no) on depression ( if yes and 0 if no), where perceived control on one’s home () is the mediator of interest (VanderWeele and Vansteelandt, 2010). However, some individuals developed dampness or mold related diseases ( if yes and 0 otherwise). Steen et al. (2017) viewed as another mediator prior to and assessed the path-specific effects through and/or . To provide a complementary perspective, we view as an intercurrent event and partition the population into four strata: the doomed stratum () including those who would always be diseased regardless of living conditions, the immune stratum () including those who would never be diseased, the harmed stratum () including those who would be diseased only if living in damp or moldy conditions, and the benefiters stratum () including those who would only be diseased if not living in damp or moldy conditions. Among them, the doomed and harmed strata are two subpopulations of typical interest because their physical health is more sensitive to living conditions. That is, studying their treatment effects can help understand the impact of living in damp or moldy conditions on the mental health among the more physically vulnerable subgroups. As a further step, addressing the principal natural mediation effects can uncover the extent to which this impact is attributed to the perceived control on one’s home.
We consider the standard monotonicity assumption to rule out benefiters, which is plausible because living in damp and moldy conditions would generally only make an individual more likely to develop dampness or mold related diseases. The exclusion restriction is unlikely to hold because living in damp or moldy conditions can still directly affect mental health even in the absence of dampness or mold related diseases. We adjust for the following confounders: gender, age, marital status, education, employment, smoking, home ownership, home size, crowding (number of residents per room), heating, and natural light. The proportions of doomed, harmed, and immune strata based on the nonparametric efficient estimator are 51%, 8%, and 41%, respectively. We summarize the stratum-specific mean and standard deviation of the baseline characteristics in Supplementary Material Table S4. The doomed stratum includes more females, followed by harmed stratum, whereas the immune stratum includes the fewest females. As compared to the doomed and harmed strata, members in the immune stratum are more likely to be married and employed; they are also more satisfied with the heating system and natural light condition in their dwellings.
Population | Estimand | Method: mr | Method: np§ |
---|---|---|---|
Overall | ITT-NIE | 1.021 (1.003, 1.043) | 1.031 (1.010, 1.053) |
Overall | ITT-NDE | 1.248 (1.114, 1.405) | 1.219 (1.078, 1.361) |
Overall | ITT | 1.274 (1.137, 1.438) | 1.257 (1.114, 1.400) |
Doomed | PNIE | 1.017 (0.999, 1.034) | 1.025 (1.001, 1.050) |
Doomed | PNDE | 1.223 (1.044, 1.403) | 1.181 (1.014, 1.348) |
Doomed | PCE | 1.244 (1.070, 1.433) | 1.212 (1.044, 1.379) |
Harmed | PNIE | 1.029 (1.001, 1.059) | 1.046 (1.013, 1.079) |
Harmed | PNDE | 2.296 (1.841, 2.982) | 2.142 (1.724, 2.560) |
Harmed | PCE | 2.363 (1.897, 3.045) | 2.241 (1.817, 2.666) |
Immune | PNIE | 1.025 (0.983, 1.074) | 1.035 (0.995, 1.075) |
Immune | PNDE | 1.102 (0.911, 1.328) | 1.111 (0.881, 1.340) |
Immune | PCE | 1.129 (0.945, 1.336) | 1.150 (0.916, 1.385) |
Based on the nonparametric efficient method, the contrasts (and 95% confidence intervals) between the PNDE in different principal strata pairs are , , and , respectively. The contrasts (and 95% confidence intervals) between the PNIE in different principal strata pairs are , , , respectively.
Table 3 presents the results based on the multiply robust and nonparametric efficient estimators. With a binary outcome, we define all causal estimands on the risk ratio scale. Results based on moment-type estimators are given in Supplementary Material Tables S5–S6, exhibiting similar patterns. Table 3 (upper panel) presents the ITT natural mediation effects, and suggests that living in damp or moldy conditions has a causal effect on elevating the risk of depression; the 95% confidence intervals for ITT, ITT-NIE and ITT-NDE estimands all exclude the null. Table 3 (lower panels) presents the principal natural mediation effects. For the harmed stratum, whose members are most sensitive to living conditions, we observe a large PNDE (risk ratio ), but PNDEs are smaller in the other strata (risk ratio ). We further obtain the difference in log PNDEs across the three subgroups using the nonparametric efficient estimator, and confirm that the PNDE within the harmed stratum is substantially different from that within the other two strata. For example, with 95% confidence interval (0.344, 0.967) and with 95% confidence interval (0.315, 0.874). On the other hand, the PNIEs are rather comparable in magnitude across strata. Based on the nonparametric efficient estimator, although only the 95% confidence intervals of the PNIE in the doomed and harmed strata exclude the null, the 95% confidence interval for each pairwise difference in log PNIE includes the null. Finally, the proportion mediated varies across principal strata. That is, the perceived control on one’s home, as a mediator, explains the largest fraction of PCE among the immune stratum (/), and explains the smallest fraction of PCE among the harmed stratum (/).
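As an illustration of how pairwise contrasts on the log risk ratio scale are formed, the sketch below computes the difference in log PNDE between the harmed stratum and each of the other two strata from the nonparametric efficient point estimates printed in Table 3. The function name is ours, and only point contrasts are computed; the confidence limits cannot be recovered from the table because they depend on the joint sampling distribution of the stratum-specific estimators.

```python
import math

def log_ratio_contrast(ratio_a, ratio_b):
    """Difference in log risk ratios between two strata: log(a) - log(b)."""
    return math.log(ratio_a) - math.log(ratio_b)

# PNDE point estimates from the np column of Table 3.
pnde_np = {"doomed": 1.181, "harmed": 2.142, "immune": 1.111}

harmed_vs_immune = log_ratio_contrast(pnde_np["harmed"], pnde_np["immune"])
harmed_vs_doomed = log_ratio_contrast(pnde_np["harmed"], pnde_np["doomed"])
```

These point contrasts are approximately 0.66 and 0.60, respectively, and both fall inside the confidence intervals quoted above. The same helper also verifies the multiplicative decomposition on the ratio scale; for instance, the np estimates of ITT-NIE and ITT-NDE multiply to approximately the ITT estimate (1.031 × 1.219 ≈ 1.257).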
As an additional exploratory comparison, we also carry out moderated mediation analysis with respect to baseline covariates. We evaluate the conditional natural indirect and direct effects on a risk ratio scale given each covariate, using the R package moderate.mediation (Qin and Wang, 2024). For each covariate considered, we partition the covariate vector into the moderator of interest and all remaining covariates as confounding adjustment variables. Next, we fit logistic models for the mediator and outcome to assess conditional mediation based on the identification formulas given in Qin and Wang (2024). The conditional natural (in)direct effects are summarized in Supplementary Material Figure S3 and Table S7. The results suggest that the mediation effect heterogeneity across different covariate levels is milder as compared to the results under the principal stratification mediation analysis. An important distinction of the moderated mediation analysis from our proposed methods is that the former fails to address as a potential post-treatment confounder, and may be biased even for quantifying the conditional mediation effect estimands.
7 A framework for sensitivity analysis
The principal ignorability (Assumption 4) and ignorability of the mediator (Assumption 5) are two crucial assumptions for identification of . These two assumptions, however, cannot be empirically verified. Sensitivity analysis is therefore a useful tool to assess causal effects under assumed violations of these assumptions. In the Supplementary Material, we develop a semiparametric sensitivity analysis framework to assess the impact of violation of Assumption 4 and Assumption 5 on inference about and mediation effects. The proposed sensitivity analysis strategy relies on the confounding function approach (Tchetgen Tchetgen and Shpitser, 2012; Ding and Lu, 2017). Once the confounding functions are developed, we further provide a multiply robust estimator for and natural mediation effects and prove its large-sample properties, assuming a known confounding function. In practice, the confounding function is unknown and users can specify a working sensitivity function with interpretable sensitivity parameters and then report the causal estimates under a range of values of sensitivity parameters, in order to identify tipping points that might reverse the causal conclusions. In the Supplementary Material, we also illustrate the proposed sensitivity analysis methods in the context of the JOBS II study.
8 Discussion
In this article, we consider a set of new identification assumptions for studying the natural mediation effects across several principal strata. This provides an important complementary perspective to existing methods that view as either a post-treatment confounder or another mediator, and enables the investigation of mediation mechanisms within subpopulations. We then derive the EIF for the principal natural mediation effects and further propose a quadruply robust estimator. Finally, a nonparametric extension has been developed to alleviate parametric model misspecification bias and to achieve efficient estimation.
While each principal stratum often represents a scientifically relevant subpopulation, the stratum membership is not always fully observed for each individual in a particular study, leading to potential barriers in optimizing future interventions to target subpopulations. Although there is no consensus on how to mitigate such barriers, we offer three considerations that may improve the policy relevance for addressing mediation across principal strata. First, as a routine practice, we recommend summarizing the baseline characteristics for each stratum to help distinguish partially observed subpopulations in measurable dimensions. Given that the baseline summary is widely available from published social science and biomedical studies, summary statistics such as those in Supplementary Material Tables S3 and S4 (applications in Sections 6.1 and 6.2) can facilitate a direct comparison to existing study populations and determine the relevance of the current results to alternative populations. Importantly, these summary statistics can be readily obtained once the principal score is estimated, and an example case study can also be found in Section 5.2 of Cheng et al. (2023a). Second, special study design features may enable an explicit characterization of the endogenous subgroups. As a concrete example, randomization plus strong monotonicity (design features of the JOBS II study in Section 6.1) ensure that individuals attending the workshop and those not attending the workshop in the treatment group are unbiased representations of the compliers and never-takers in the entire study, and evidence about their principal natural mediation effects serves to improve interventions that can at least target individuals in the treatment group. Third, for individuals with unobserved stratum membership, the estimated principal score model is useful for membership prediction. In the noncompliance scenario, Kennedy et al. (2020) discussed a two-stage treatment policy, where in the first stage one predicts the compliance status (for example, based on estimated principal scores), and in the second stage, one recommends the optimal intervention in each predicted stratum. Predicting principal stratum membership was also an intermediate step in Chen et al. (2024), who have quantified conditional average treatment effects among the partially observed always-survivors in the truncation-by-death setting. Although the optimal methods for predicting membership and the best practice for operationalizing a multi-stage treatment policy remain important topics for future research, we believe this perspective continues to endorse the value of the principal natural mediation effect estimates for informing improved interventions to target partially observed subpopulations.
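To illustrate the first consideration, the sketch below computes a stratum-specific mean and standard deviation of a baseline covariate by weighting each individual with his or her estimated principal score for that stratum. This principal score weighting is one simple option and is an assumption on our part; the summaries reported in the Supplementary Material may be based on a different estimator. The function name and toy inputs are hypothetical.

```python
import math

def stratum_summary(covariate, scores):
    """Principal-score-weighted mean and standard deviation of a baseline covariate.

    scores[i] is the estimated probability that unit i belongs to the stratum.
    """
    total_weight = sum(scores)
    mean = sum(w * v for w, v in zip(scores, covariate)) / total_weight
    variance = sum(w * (v - mean) ** 2
                   for w, v in zip(scores, covariate)) / total_weight
    return mean, math.sqrt(variance)

# Toy usage: the first two units belong to the stratum with certainty, the rest do not.
toy_mean, toy_sd = stratum_summary([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 0.0, 0.0])
```

The same weights can be reused for any baseline characteristic, so a complete summary table requires only one fitted principal score model.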
This article addresses a univariate mediator. When is multi-dimensional with several continuous components, it may be cumbersome to leverage the EIF in Theorem 3 to assess mediation, because one needs to estimate a multi-dimensional density and to further calculate a multi-dimensional integration . These challenges may be mitigated by reparametrizing the nuisance functions in EIF (Díaz et al., 2021; Zhou, 2022). For example, one can retain the current parameterization of but re-express to as
Here, depends on a set of nuisance functions , where the first three are identical to those from , and are two conditional probabilities, and is a nested expectation that can be estimated by regressing on within strata . Notice that this alternative set of nuisance functions only involves one-dimensional conditional expectations or probabilities regardless of the dimensionality of , and has potential to simplify the modeling process. The semiparametric efficient estimator based on the reparameterized EIF can now be defined as , whose asymptotic properties and finite-sample performance require future work.
To support the implementation of the proposed methodology, we have developed the psmediate R package along with a brief vignette, which can be accessed at https://github.com/chaochengstat/psmediate and https://rpubs.com/chaocheng/psmediate.
Acknowledgement
This work is partially supported by the Patient-Centered Outcomes Research Institute® (PCORI® Award ME-2023C1-31350). We thank the World Health Organization’s European Centre for Environment and Health, Bonn office, for providing the WHO-LARES data. We thank Johan Steen for connecting us with the Bonn office to apply for data access. The statements in this article are solely the responsibility of the authors and do not necessarily represent the views of PCORI® or the World Health Organization.
References
- Albert and Nelson (2011) Albert, J. M. and Nelson, S. (2011), “Generalized causal mediation analysis,” Biometrics, 67, 1028–1038.
- Angrist et al. (1996) Angrist, J., Imbens, G., and Rubin, D. (1996), “Identification of causal effects using instrumental variables,” Journal of the American Statistical Association, 91, 444–455.
- Bickel et al. (1993) Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993), Efficient and Adaptive Estimation for Semiparametric Models, Springer.
- Chen et al. (2024) Chen, X., Harhay, M. O., Tong, G., and Li, F. (2024), “A Bayesian machine learning approach for estimating heterogeneous survivor causal effects: applications to a critical care trial,” The Annals of Applied Statistics, 18, 350.
- Chen and White (1999) Chen, X. and White, H. (1999), “Improved rates and asymptotic normality for nonparametric neural network estimators,” IEEE Transactions on Information Theory, 45, 682–691.
- Cheng et al. (2023a) Cheng, C., Guo, Y., Liu, B., Wruck, L., and Li, F. (2023a), “Multiply robust estimation for causal survival analysis with treatment noncompliance,” arXiv preprint arXiv:2305.13443.
- Cheng et al. (2021) Cheng, C., Spiegelman, D., and Li, F. (2021), “Estimating the natural indirect effect and the mediation proportion via the product method,” BMC Medical Research Methodology, 21, 1–20.
- Cheng et al. (2023b) — (2023b), “Is the product method more efficient than the difference method for assessing mediation?” American Journal of Epidemiology, 192, 84–92.
- Chernozhukov et al. (2018) Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal, 21, C1–C68.
- Daniel et al. (2015) Daniel, R. M., De Stavola, B. L., Cousens, S., and Vansteelandt, S. (2015), “Causal mediation analysis with multiple mediators,” Biometrics, 71, 1–14.
- Díaz et al. (2021) Díaz, I., Hejazi, N. S., Rudolph, K. E., and van Der Laan, M. J. (2021), “Nonparametric efficient causal mediation with intermediate confounders,” Biometrika, 108, 627–641.
- Ding and Lu (2017) Ding, P. and Lu, J. (2017), “Principal stratification analysis using principal scores,” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 757–777.
- Forastiere et al. (2018) Forastiere, L., Mattei, A., and Ding, P. (2018), “Principal ignorability in mediation analysis: through and beyond sequential ignorability,” Biometrika, 105, 979–986.
- Frangakis and Rubin (2002) Frangakis, C. E. and Rubin, D. B. (2002), “Principal stratification in causal inference,” Biometrics, 58, 21–29.
- Frölich and Huber (2017) Frölich, M. and Huber, M. (2017), “Direct and indirect treatment effects–causal chains and mediation analysis with instrumental variables,” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 1645–1666.
- Frölich and Melly (2013) Frölich, M. and Melly, B. (2013), “Identification of treatment effects on the treated with one-sided non-compliance,” Econometric Reviews, 32, 384–414.
- Hines et al. (2022) Hines, O., Dukes, O., Diaz-Ordaz, K., and Vansteelandt, S. (2022), “Demystifying statistical learning based on efficient influence functions,” The American Statistician, 76, 292–304.
- Imai et al. (2010) Imai, K., Keele, L., and Yamamoto, T. (2010), “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects,” Statistical Science, 25, 51–71.
- Jiang et al. (2022) Jiang, Z., Yang, S., and Ding, P. (2022), “Multiply robust estimation of causal effects under principal ignorability,” Journal of the Royal Statistical Society Series B: Statistical Methodology, 84, 1423–1445.
- Jo and Stuart (2009) Jo, B. and Stuart, E. A. (2009), “On the use of propensity scores in principal causal effect estimation,” Statistics in Medicine, 28, 2857–2875.
- Kahan et al. (2023) Kahan, B. C., Cro, S., Li, F., and Harhay, M. O. (2023), “Eliminating ambiguous treatment effects using estimands,” American Journal of Epidemiology, 192, 987–994.
- Kang and Schafer (2007) Kang, J. D. and Schafer, J. L. (2007), “Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data,” Statistical Science, 22, 523–539.
- Kennedy et al. (2020) Kennedy, E. H., Balakrishnan, S., and G’Sell, M. (2020), “Sharp instruments for classifying compliers and generalizing causal effects,” The Annals of Statistics, 48, 2008–2030.
- Kim et al. (2017) Kim, C., Daniels, M. J., Marcus, B. H., and Roy, J. A. (2017), “A framework for Bayesian nonparametric inference for causal effects of mediation,” Biometrics, 73, 401–409.
- Luo et al. (2016) Luo, Y., Spindler, M., and Kück, J. (2016), “High-Dimensional Boosting: Rate of Convergence,” arXiv preprint arXiv:1602.08927.
- Merchant et al. (2021) Merchant, A. T., Liu, J., Reynolds, M. A., Beck, J. D., and Zhang, J. (2021), “Quantile regression to estimate the survivor average causal effect of periodontal treatment effects on birthweight and gestational age,” Journal of Periodontology, 92, 975–982.
- Michalowicz et al. (2006) Michalowicz, B. S., Hodges, J. S., DiAngelis, A. J., Lupo, V. R., Novak, M. J., Ferguson, J. E., Buchanan, W., Bofill, J., Papapanou, P. N., Mitchell, D. A., et al. (2006), “Treatment of periodontal disease and the risk of preterm birth,” New England Journal of Medicine, 355, 1885–1894.
- Miles et al. (2017) Miles, C. H., Shpitser, I., Kanki, P., Meloni, S., and Tchetgen Tchetgen, E. J. (2017), “Quantifying an adherence path-specific effect of antiretroviral therapy in the Nigeria PEPFAR program,” Journal of the American Statistical Association, 112, 1443–1452.
- Miles et al. (2020) — (2020), “On semiparametric estimation of a path-specific effect in the presence of mediator-outcome confounding,” Biometrika, 107, 159–172.
- Nguyen et al. (2021) Nguyen, T. Q., Schmid, I., and Stuart, E. A. (2021), “Clarifying causal mediation analysis for the applied researcher: Defining effects based on what we want to learn.” Psychological Methods, 26, 255.
- Page et al. (2015) Page, L. C., Feller, A., Grindal, T., Miratrix, L., and Somers, M.-A. (2015), “Principal stratification: A tool for understanding variation in program effects across endogenous subgroups,” American Journal of Evaluation, 36, 514–531.
- Park and Kürüm (2018) Park, S. and Kürüm, E. (2018), “Causal mediation analysis with multiple mediators in the presence of treatment noncompliance,” Statistics in Medicine, 37, 1810–1829.
- Park and Kürüm (2020) — (2020), “A two-stage joint modeling method for causal mediation analysis in the presence of treatment noncompliance,” Journal of Causal Inference, 8, 131–149.
- Park and Palardy (2020) Park, S. and Palardy, G. J. (2020), “Sensitivity evaluation of methods for estimating complier average causal mediation effects to assumptions,” Journal of Educational and Behavioral Statistics, 45, 475–506.
- Price et al. (1992) Price, R. H., Van Ryn, M., and Vinokur, A. D. (1992), “Impact of a preventive job search intervention on the likelihood of depression among the unemployed,” Journal of Health and Social Behavior, 158–167.
- Qin and Wang (2024) Qin, X. and Wang, L. (2024), “Causal moderated mediation analysis: Methods and software,” Behavior Research Methods, 56, 1314–1334.
- Robins and Richardson (2010) Robins, J. M. and Richardson, T. S. (2010), “Alternative graphical causal models and the identification of direct effects,” in Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures, 103–158.
- Rudolph et al. (2024) Rudolph, K. E., Williams, N., and Díaz, I. (2024), “Using instrumental variables to address unmeasured confounding in causal mediation analysis,” Biometrics, 80, ujad037.
- Steen et al. (2017) Steen, J., Loeys, T., Moerkerke, B., and Vansteelandt, S. (2017), “Flexible mediation analysis with multiple mediators,” American Journal of Epidemiology, 186, 184–193.
- Stuart and Jo (2015) Stuart, E. A. and Jo, B. (2015), “Assessing the sensitivity of methods for estimating principal causal effects,” Statistical Methods in Medical Research, 24, 657–674.
- Tchetgen Tchetgen and Shpitser (2012) Tchetgen Tchetgen, E. J. and Shpitser, I. (2012), “Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis,” Annals of Statistics, 40, 1816.
- Tchetgen Tchetgen and VanderWeele (2014) Tchetgen Tchetgen, E. J. and VanderWeele, T. J. (2014), “On identification of natural direct effects when a confounder of the mediator is directly affected by exposure,” Epidemiology (Cambridge, Mass.), 25, 282.
- Van der Laan et al. (2007) Van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007), “Super learner,” Statistical Applications in Genetics and Molecular Biology, 6.
- VanderWeele (2011) VanderWeele, T. J. (2011), “Principal stratification–uses and limitations,” The International Journal of Biostatistics, 7.
- VanderWeele and Vansteelandt (2009) VanderWeele, T. J. and Vansteelandt, S. (2009), “Conceptual issues concerning mediation, interventions and composition,” Statistics and its Interface, 2, 457–468.
- VanderWeele and Vansteelandt (2010) — (2010), “Odds ratios for mediation analysis for a dichotomous outcome,” American Journal of Epidemiology, 172, 1339–1348.
- VanderWeele et al. (2014) VanderWeele, T. J., Vansteelandt, S., and Robins, J. M. (2014), “Effect decomposition in the presence of an exposure-induced mediator-outcome confounder,” Epidemiology (Cambridge, Mass.), 25, 300.
- Wager and Walther (2015) Wager, S. and Walther, G. (2015), “Adaptive concentration of regression trees, with application to random forests,” arXiv preprint arXiv:1503.06388.
- Xia and Chan (2021) Xia, F. and Chan, K. C. G. (2021), “Identification, semiparametric efficiency, and quadruply robust estimation in mediation analysis with treatment-induced confounding,” Journal of the American Statistical Association, 1–10.
- Yamamoto (2013) Yamamoto, T. (2013), “Identification and estimation of causal mediation effects with treatment noncompliance,” Technical Report.
- Zhou (2022) Zhou, X. (2022), “Semiparametric estimation for causal mediation analysis with multiple causally ordered mediators,” Journal of the Royal Statistical Society Series B: Statistical Methodology, 84, 794–821.
Supplementary Material to “Identification and multiply robust estimation in mediation analysis across principal strata”
Section A provides an additional motivating example. Section B provides practical strategies on specification of the parametric working models. In Section C, we provide a semiparametric sensitivity analysis framework for the principal ignorability assumption and ignorability of the mediator assumption. In Section D, we provide the proofs for all theorems, propositions, and remarks in the main manuscript. In Section E, we present Supplementary Material Tables and Figures.
A An additional motivating example
Example 3
(Mediation analysis with death-truncated mediator and outcome) Consider a case where the mediator and outcome are truncated by a terminal event that occurs before the mediator is measured, but no other terminal event occurs between the mediator and outcome. One concrete example is the Obstetrics and Periodontal Therapy (OPT) trial (Michalowicz et al., 2006), where one may address the role of gestational age () in mediating the effect of a periodontal treatment during pregnancy () on birthweight (), but and are only measured if infants are born alive. Here, the survival status of the infants () serves as a terminal event, where and are not well defined for dead units () (Merchant et al., 2021). In this scenario, it is of interest to estimate the average treatment effect and mediation effect among the subset of infants who would always survive regardless of the treatment (i.e., the always-survivor stratum). Specifically, assessing the average treatment effect among always-survivors (also referred to as the survivor average causal effect) can help answer the central research question in the OPT trial of whether the periodontal therapy has an adverse or positive effect on newborn infants’ birthweight (Michalowicz et al., 2006), without the complications due to death as a terminal event. As a next step, by investigating the principal natural mediation effect within always-survivors, one can clarify the role of gestational age in explaining the survivor average causal effect.
B Specification of the parametric working models
We can specify working models for . Specification of can be flexible, and we provide one example below. This specification strategy is also used in our simulation and application studies.
For , one can consider as a logistic regression with coefficients such that , where is the logistic function. Specification of differs between the one-sided and two-sided noncompliance scenarios. Under two-sided noncompliance, one can consider as a logistic regression with coefficients , leading to . Under one-sided noncompliance, we already know by strong monotonicity and therefore we can fix and only specify a working model for ; for example, one can consider such that . If is binary, we can further consider as a logistic regression with coefficients such that . For a continuous , a feasible working model is , which implies that , where is the density function of . When is continuous or binary, one can specify or with coefficients , leading to or . Estimation of the parameters in the parametric working models, , can proceed by maximum likelihood. The estimators of the nuisance functions are therefore , which is evaluated at .
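As a rough illustration of fitting one such logistic working model by maximum likelihood, the sketch below runs Newton-Raphson on simulated data with an intercept and a single covariate. The names `expit`, `fit_logistic`, `x`, and `d`, and the simulated coefficients, are hypothetical and chosen only for this example; in practice one would typically use a standard regression routine with the full covariate vector.

```python
import math
import random

def expit(t):
    """Logistic function 1 / (1 + exp(-t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logistic(x, y, n_iter=25):
    """Maximum likelihood for P(y = 1 | x) = expit(b0 + b1 * x) via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = expit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (information matrix)^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Simulated data (assumed values): one baseline covariate and a binary response.
rng = random.Random(0)
n = 5000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
d = [1 if rng.random() < expit(-0.5 + 1.0 * xi) else 0 for xi in x]

b0_hat, b1_hat = fit_logistic(x, d)
# Plug-in estimate of the nuisance function, evaluated at the fitted coefficients.
fitted = [expit(b0_hat + b1_hat * xi) for xi in x]
```

The same routine could be reused for each binary working model; a continuous mediator or outcome would instead call for a linear (Gaussian) working model.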
C Sensitivity analysis
The principal ignorability (Assumption 4) and ignorability of the mediator (Assumption 5) are required for the identification of in Theorem 1. However, these two assumptions are not empirically verifiable based on the observed data and may not hold in randomized experiments. We propose sensitivity analysis strategies to assess the impact of violations of these two assumptions on inference about . When evaluating the sensitivity to violation of one specific assumption, we shall assume all other structural assumptions hold. To fix ideas, we assume that the mediator is a multi-valued variable with finite support , and the methodology can be generalized to a continuous mediator.
C.1 Sensitivity analysis for the principal ignorability assumption
We first focus on the scenario under standard monotonicity, and methods under strong monotonicity are discussed at the end of this section. To begin with, we notice that Theorem 1 holds under a weaker version of Assumption 4 which consists of two statements:
(i) and for any .
(ii) and for any .
Statement (i) requires that the expectation of is the same between the complier and always-taker strata and the expectation of is the same between the complier and never-taker strata, conditional on all observed covariates. Statement (ii) implicitly suggests that and . Therefore, Statement (ii) requires that the distribution of is the same between the complier and always-taker strata and the distribution of is the same between the complier and never-taker strata, conditional on all observed covariates. Our sensitivity analysis is based on the following confounding functions measuring departure from the weaker version of principal ignorability:
The first two confounding functions measure deviation from principal ignorability in the outcome variable, where measures the ratio of the mean of among compliers versus always-takers and measures the ratio of the mean of among compliers versus never-takers, conditional on . On the other hand, the last two confounding functions measure deviation from principal ignorability in the mediator variable, where measures the relative risk of compliers against always-takers on the treated potential mediator at level and measures the relative risk of compliers against never-takers on the control potential mediator at level , conditional on . Notice that and are only defined for , which will determine the values of and as shown in Section D.8, where we have provided the following explicit expressions in terms of :
Theorem 1 holds if all sensitivity functions in are equal to 1. The following proposition generalizes Theorem 1 to the scenario when at least one confounding function has a value different from 1.
Proposition S1
Suppose that Assumptions 1, 2, 3a, 5, and 6 hold with known values of the confounding functions (). Then, we can identify by
for any . Here is a sensitivity weight defined in Section D.8, which depends on the confounding functions and the observed-data nuisance functions and . As an example,
If is known, we can construct a new estimator of by carefully re-weighting each term in the original multiply robust estimator by the sensitivity weight . Specifically, the new estimator, , takes the following form:
(s8)
where and is evaluated at . The following proposition shows that is a doubly robust estimator under or .
Proposition S2
Suppose that Assumptions 1, 2, 3a, 5, and 6 hold. Then, the estimator is consistent and asymptotically normal for any under or .
In practice, the confounding functions in are unknown. To conduct the sensitivity analysis, one can specify a parametric form of indexed by a finite-dimensional parameter , say . Then, one can report and its confidence intervals over a sequence of values of , which summarizes how sensitive the inference is to an assumed departure from the principal ignorability assumption.
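The reporting step can be sketched as a simple loop over a grid of sensitivity parameter values. The example below is a toy stand-in, not the estimator in (s8): it assumes a single ratio-type confounding parameter `xi`, under which the mean in a target stratum equals the mean in a reference stratum divided by `xi`, and it uses a normal-approximation confidence interval. The function names and the data values are made up for illustration.

```python
import math
import statistics

def corrected_mean(y_ref, xi):
    """Toy bias-corrected mean, assuming E[Y | target] = E[Y | reference] / xi."""
    n = len(y_ref)
    est = statistics.fmean(y_ref) / xi
    se = statistics.stdev(y_ref) / math.sqrt(n) / xi
    return est, se

def sensitivity_curve(y_ref, xi_grid, z=1.96):
    """Point estimate and Wald-type interval at each assumed value of xi."""
    rows = []
    for xi in xi_grid:
        est, se = corrected_mean(y_ref, xi)
        rows.append((xi, est, est - z * se, est + z * se))
    return rows

y_ref = [2.1, 1.8, 2.5, 2.0, 1.6, 2.4, 2.2, 1.9]   # made-up illustrative outcomes
xi_grid = [0.5, 0.75, 1.0, 1.25, 1.5]              # xi = 1 means no departure

curve = sensitivity_curve(y_ref, xi_grid)
for xi, est, lo, hi in curve:
    print(f"xi={xi:.2f}  estimate={est:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Reading the resulting table (or plotting it) shows how far the sensitivity parameter must move from 1 before the qualitative conclusion changes.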
The above sensitivity analysis strategy can be easily extended to the scenario under strong monotonicity. Because there are no always-takers under strong monotonicity, we only need to quantify the departure from principal ignorability between the never-taker and complier strata; that is, only and are needed for sensitivity analysis. Similar to the construction of (s8), we can develop an estimator, , based on a set of confounding functions, , and this estimator is consistent for under or for any . Details of are given in Section D.9. Analogously, one can report over a set of choices of to quantify the values of under assumed departure from principal ignorability, where denotes user-specified parametric functions of .
C.2 Sensitivity analysis for the ignorability of the mediator assumption
We develop a sensitivity analysis framework to assess the extent to which the violation of Assumption 5 might affect the inference of ; identification of and , however, does not depend on Assumption 5. In Section D.10, we show that the expression of in Theorem 1 holds under a weaker version of Assumption 5 such that , for all , , under standard monotonicity, and under strong monotonicity. This weaker assumption only requires mean independence between the potential outcome and the mediator conditional on the treatment assignment, principal strata, and baseline covariates. Recognizing the sufficiency of this weaker assumption, we propose the following sensitivity function to assess violations of the weaker version of Assumption 5:
for , where by definition. If differs from 1, then the identification formula of in Theorem 1 no longer holds. The following proposition generalizes Theorem 1 to the scenario for a known .
Proposition S3
Suppose that Assumptions 1–4 and 6 hold. Based on the confounding function , we can identify as
for any under standard monotonicity and any under strong monotonicity, where
is the sensitivity weight which depends on the confounding function and the observed-data nuisance function .
If the sensitivity function is known, we show in Section D.10 that a consistent estimator of can be obtained by re-weighting each term in the multiply robust estimator by the sensitivity weight , and takes the following form:
where and is evaluated at . The following proposition shows that is a triply robust estimator under , or .
Proposition S4
Suppose that Assumptions 1–4 and 6 hold. Then, under either , , or , is consistent and asymptotically normal for any under standard monotonicity and under strong monotonicity.
To conduct the sensitivity analysis, one can specify a parametric form of indexed by a finite-dimensional parameter , . Then, one can report over a range of choices of , which captures the sensitivity of the conclusion under departure from Assumption 5.
C.3 Illustration of the sensitivity analysis framework based on the JOBS II study
This section revisits the JOBS II study in Section 6.1 to assess the robustness of our conclusions to violations of the proposed structural assumptions. The ignorability assumption of treatment assignment (Assumption 2) and strong monotonicity assumption (Assumption 3) are satisfied in the JOBS II study by design, but the principal ignorability (Assumption 4) and the ignorability of the mediator (Assumption 5) are generally not empirically verifiable without additional data. Hence, we apply the proposed sensitivity analysis framework to assess robustness of the estimated principal natural mediation effects to violations of Assumptions 4 and 5, separately. For illustration, we only assess the range of the estimated principal natural mediation effects within the complier stratum. While examining the violation of one assumption, we assume all other assumptions hold.
C.3.1 Sensitivity analysis for principal ignorability
As we discussed in Section C.1, under a one-sided noncompliance scenario (so strong monotonicity holds) with a binary mediator, the confounding functions can be used to measure the extent of deviation from the principal ignorability assumption. Specifically, measures the relative risk of compliers versus never-takers on the sense of mastery under the control condition and measures the ratio of the potential outcome mean (under the control condition) between compliers and never-takers. For simplicity (and this is often a practical strategy for sensitivity analysis without additional content knowledge), we assume the two confounding functions do not depend on the measured baseline covariates such that and . Our specified parametric confounding function is thus .
Figure S4 presents the bias-corrected estimate, , with fixed values of ranging within . The results suggest that is robust to violation of the principal ignorability on the mediator variable as has relatively small fluctuations with different values of . For example, only increases from −0.102 (95% CI: ) to −0.071 (95% CI: ) when varying from 0.5 to 1.5 with fixed at 1 (Figure S4, Panel B). In contrast, is more sensitive to violation of the principal ignorability on the outcome variable, because moves toward the null when decreases from and the sign of can even be reversed to positive when .
Next, we assess robustness of our conclusions on under departure from principal ignorability. In the one-sided noncompliance scenario, the validity of only depends on the principal ignorability assumption on the mediator variable (as we clarify in Section D.9, violation of principal ignorability on the outcome variable has no impact on ). Therefore, we provide the bias-corrected estimate, , for ranging from 0.5 to 1.5 in Figure S5. The results suggest that estimates of are robust against violations of principal ignorability, because remains negative across all values of considered. The estimated 95% confidence intervals, however, straddle zero when or .
C.3.2 Sensitivity analysis for ignorability of the mediator
We then investigate whether the conclusion about the principal natural mediation effects among the compliers will be subject to change if the ignorability of the mediator is violated (while assuming all remaining assumptions hold). As indicated in Section C.2, the confounding function can be used to quantify the degree of violation of the ignorability of the mediator assumption. For simplicity, we assume is constant across all levels of , , and and therefore focus on a one-dimensional sensitivity parameter for ; in other words, the parametric confounding function is simply taken as .
Figure S6 presents the bias-corrected estimates of , by , and the bias-corrected estimates of , by , with varying from 0.8 to 1.2. We observe that and move toward the null with a larger and smaller value of , respectively. Specifically, we observe that remains negative under all assumed values of , but the point estimate increases from −0.132 (95% CI: ) to −0.043 (95% CI: ) when moves from 0.8 to 1.2. On the other hand, decreases from −0.023 (95% CI: ) to −0.065 (95% CI: ) when increases from 0.8 to 1.2, suggesting that is relatively more sensitive to violation of Assumption 5.
D Proofs and technical details
D.1 The nonparametric identification result (Theorem 1)
Lemma S1
Let and be two random variables with densities and . Then, we have that .
Proof.
The proof is straightforward and omitted here.
Lemma S2
Let , , and be three random variables, then
Proof.
First suppose that holds, then we have that for any , , ,
which implies that . Using the same argument but switching the roles of and , we can show under . Next, suppose , which implies that
(s9)
for any and . Therefore, we can show that for any :
where the last equality follows from equation (s9). This concludes .
Lemma S3
The principal ignorability assumption (Assumption 4) indicates that for any , , , and , which further implies
Proof.
Observing , we can see that Assumption 4 is equivalent to , which implies
In addition, since is equivalent to or , one can verify that
hold. Therefore, follows from Lemma S2, with , , and , conditional on . Similarly, one can show by applying Lemma S2, with , , and , conditional on . Finally, follows by applying Lemma S2 again, with , , and , conditional on .
Lemma S4
Let and be two binary random variables satisfying and be any random variable, then we have
Proof.
This follows from Lemma S1 in Forastiere et al. (2018).
Lemma S5
Under monotonicity (either Assumption 3a or 3b), Assumption 2 implies that for any , , , , .
Proof.
Assumption 2 suggests that
for any , , , and . Therefore,
(s10)
follows from Lemma S2. Moreover, (s10) further implies
(s11)
by applying Lemma S4, with , , and , conditional on . Therefore, we have that
This equation then shows that for any , , , and .
Lemma S6
Under Assumptions 2–4, Assumption 5 is equivalent to , , , , .
Proof.
Observe that
Assumption 5
where the first to the second row follows from Lemma S5 (as a consequence of Assumptions 2–3), and the second to the third row follows from Assumption 4. This completes the proof.
Proof of Theorem 1. Define such that and if in is 1 and 0, respectively. Similarly, define , , and . By the definition of , we have that
(by Assumption 5)
(by composition of potential values)
where and are identified in equation (3) of the main manuscript under the monotonicity assumption (either Assumption 3a or 3b). This completes the proof.
D.2 Connections to existing literature (Remarks 2 and 3)
We compare the identification assumptions used in the current article to the identification assumptions in Zhou (2022) and Tchetgen Tchetgen and VanderWeele (2014). Specifically, Zhou (2022) considers the identification of path-specific effects in the presence of multiple causally-ordered mediators and Tchetgen Tchetgen and VanderWeele (2014) considers the identification of mediation effects in the presence of an exposure-induced confounder.
We focus on a comparable scenario with two intermediate variables, a binary variable and a binary/continuous variable , both of which sit on the causal pathway between the treatment assignment and an outcome , and we further assume the monotonicity assumption of on (either Assumption 3a or Assumption 3b) holds. These two intermediate variables, , have different names in these three papers; they are referred to as the post-treatment event and the mediator in the current paper, as the first mediator and the second mediator in Zhou (2022), and as the treatment-induced confounder and the mediator in Tchetgen Tchetgen and VanderWeele (2014). All three papers consider the consistency assumption (Assumption 1) and slightly different versions of the positivity assumption. To offer a common ground for the comparison of key identification assumptions, throughout the comparison, we always assume that consistency (Assumption 1) and positivity (Assumption 6) hold. Besides the consistency and positivity assumptions, Tchetgen Tchetgen and VanderWeele (2014) consider the monotonicity assumption (Assumption 3a) and assume that the following NPSEM-IE holds:
Assumption S1 (NPSEM-IE in Tchetgen Tchetgen and VanderWeele, 2014)
Suppose the following nonparametric structural equation models with independent errors hold:
(i) ,
(ii) ,
(iii) ,
(iv) ,
(v) ,
where are nonparametric functions and the errors are mutually independent.
Besides the consistency and positivity assumptions, Zhou (2022) considers the following generalized sequential ignorability assumptions:
Assumption S2 (Assumption 2 in Zhou, 2022)
Suppose the following set of ignorability assumptions holds:
(i) for any , , , , ,
(ii) for any , , , , ,
(iii) for any , , , , .
Besides the consistency and positivity assumptions, the current work considers Assumptions 2, 4, 5. To facilitate exposition, we restate these three assumptions:
Assumption S3 (Assumptions 2, 4 and 5 in current work)
Suppose the following ignorability assumptions hold:
(i) for any , , , , and ,
(ii) for any , , , and ,
(iii) for any , , , , .
The following two lemmas are useful to prove Remarks 2 and 3 in the paper.
Lemma S7
Assumption S1 implies Assumption S2, but the converse does not generally hold.
Proof.
First suppose that Assumption S1 holds. According to Assumption S1 and by the consistency (Assumption 1) and composition of potential values, we have
(s12)
(s13)
(s14)
for any , , , , and , which indicates that because are mutually independent. Therefore, Assumption S2(i) holds. Moreover, equations (s12), (s13), and (s14) suggest that
(s15)
for any , , , , . This implies
which, together with Assumption S2(i), implies that
Therefore, Assumption S2(ii) holds. Similarly, equations (s13) and (s14) suggest that for any , , , , , which indicates
This, coupled with (s15), suggests that
for any , which further implies
as a consequence of Assumption S2(i). Because if , we conclude that Assumption S2(iii) holds. This completes the proof that Assumption S2 holds whenever Assumption S1 is valid. However, Assumption S1 may not hold under Assumption S2; that is, Assumption S1 is stronger than Assumption S2. For example, Assumption S2 does not require for , but Assumption S1 implicitly requires this through the following set of nonparametric structural equations:
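Lemma S8
Under the monotonicity assumption (either Assumption 3a or 3b), Assumption S2 is equivalent to Assumption S3.
Proof.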
Assumption S2(i) is the same as Assumption S3(i) by direct comparison. Next, we show that Assumption S2(ii) is equivalent to Assumption S3(ii) under Assumption S2(i) (or equivalently Assumption S3(i)). Specifically, under Assumption S2(i), Assumption S2(ii) suggests that
That is, for any , , , , and , which further implies that
Applying Lemma S4 to the previous equation and noting that by monotonicity, we obtain ; i.e., Assumption S3(ii) holds under Assumption S2(i)–(ii). On the other hand, suppose that Assumption S3(ii) holds; then we obtain , by Lemma S4. This suggests that
for any , , , , and . Then applying Assumption S2(i), we have
thus Assumption S2(ii) also holds under Assumption S3(i)–(ii). Therefore, we have verified that Assumption S2(i)–(ii) is equivalent to Assumption S3(i)–(ii).
Next, we show that Assumption S2(iii) is equivalent to Assumption S3(iii) under Assumption S2(i)–(ii) (or equivalently, Assumption S3(i)–(ii)). When the monotonicity assumption (Assumption 3) holds, the following statements are equivalent under Assumption S2(i)–(ii) and Assumption S3(i)–(ii):
Assumption S3(iii)
(by Lemma S5)
(by Assumption S3(ii))
(by Assumption S3(i) or Assumption S2(i))
(by Assumption S2(ii))
Then we conclude the proof.
D.3 Moment type estimators (Theorem 2 and Proposition 1)
Proof of Theorem 2. One can easily verify that by direct comparison. Below we show :
Next, we show :
Finally, we show :
This concludes that .
Next, we proceed with the proof of Proposition 1.
Proof of Proposition 1. Here we only prove the consistency and asymptotic normality of under , and the proof can be easily extended to the other three moment-type estimators, , , and .
Let be all of the parameters in the parametric working models of , and let be the probability limit of . Let be the value of when it is evaluated at ; is taken as the probability limit of . Under , we have that , , , but we allow due to possible misspecification of . Let be the probability limit of . According to Jiang et al. (2022), under , a condition that is nested within .
Next, we prove the consistency and asymptotic normality of . Notice that can be viewed as the solution of the following estimating equation
where
and
In addition, assume that the following regularity conditions hold:
1. Assume that , where is the influence function of and is a remainder term that converges in probability to 0. Also, assume that converges to a positive definite matrix.
2. Let be a bounded convex neighborhood of . Assume that the class of functions is a Glivenko-Cantelli class in .
3. Assume that converges to a positive value. In addition, we assume that both and converge to a positive value.
To prove asymptotic normality, we use a Taylor expansion, along with the above conditions, to deduce that
which suggests that
where . Then, by applying the central limit theorem and noticing that under , we can show that converges to a zero-mean normal distribution with variance
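For orientation, the display below sketches the generic form that such an expansion takes for an estimator solving an estimating equation with estimated nuisance parameters. All symbols here ($\psi$, $\hat\theta$, $\theta_0$, $\hat\alpha$, $\alpha^*$, $\mathrm{IF}_\alpha$, $\mathbb{P}_n$) are generic notation assumed for illustration and need not match the notation of the proof; derivatives are evaluated at $(\theta_0, \alpha^*)$, and $\hat\alpha - \alpha^* = \mathbb{P}_n\,\mathrm{IF}_\alpha(O) + o_p(n^{-1/2})$ is assumed.

```latex
% Generic M-estimation sketch; all symbols are illustrative assumptions.
\begin{align*}
0 &= \mathbb{P}_n\,\psi(O;\hat\theta,\hat\alpha) \\
  &= \mathbb{P}_n\,\psi(O;\theta_0,\alpha^*)
     + \mathbb{E}\!\left[\frac{\partial \psi}{\partial \theta}\right](\hat\theta-\theta_0)
     + \mathbb{E}\!\left[\frac{\partial \psi}{\partial \alpha^\top}\right](\hat\alpha-\alpha^*)
     + o_p(n^{-1/2}), \\
\sqrt{n}\,(\hat\theta-\theta_0)
  &= -\left\{\mathbb{E}\!\left[\frac{\partial \psi}{\partial \theta}\right]\right\}^{-1}
     \sqrt{n}\,\mathbb{P}_n\!\left\{\psi(O;\theta_0,\alpha^*)
     + \mathbb{E}\!\left[\frac{\partial \psi}{\partial \alpha^\top}\right]\mathrm{IF}_\alpha(O)\right\}
     + o_p(1).
\end{align*}
```

Under this generic form, the term in braces has mean zero whenever $\mathbb{E}[\psi(O;\theta_0,\alpha^*)]=0$, so the central limit theorem applies and the asymptotic variance is the variance of the braced term scaled by the squared inverse derivative.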
D.4 The efficient influence function (Theorem 3)
We derive the efficient influence function (EIF) of under the nonparametric model over the observed data . Define , where and if , respectively. Then, , where . The following lemma gives the EIFs of and , separately.
Lemma S9
For any , under standard monotonicity, and under strong monotonicity, the EIF of over is , where
and , , and are defined in Theorem 1. The EIF of over is , where
Proof.
To simplify notation, we abbreviate , , , , and as , , , , and , respectively. We let be the joint density of the observed data , which is abbreviated as hereafter. Notice that can be factorized as
We consider a parametric submodel for , which depends on a one-dimensional parameter . We assume that contains the true model at ; i.e., . Let be the score function of this parametric submodel, which is defined as
where . We can decompose the score function as the sum of the following five parts:
where , and , , , and are similarly defined. According to the semiparametric efficiency theory (Bickel et al., 1993), the EIF of , denoted by , must satisfy the following equation:
where
is evaluated under the parametric submodel .
Below we derive by solving directly. Specifically, we can show
(s16)
(s17)
(s18)
(s19)
where
and
and
and
Therefore,
Now we conclude that the EIF of is .
Next, we derive the EIF of , denoted by . By the semiparametric efficiency theory (Bickel et al., 1993), must satisfy the following equation
where is evaluated under the parametric submodel. We can show
(s20)
(s21)
where
and
Hence, we have that
Therefore, the EIF of is .
Lemma S10
Assume that and are two causal estimands and their EIFs based on the nonparametric model in the observed data are and , respectively. Then, the EIF of is
Moreover, if , the EIF of is
Proof.
We shall follow the notations in the proof of Lemma S9. Define and as the nonparametric identification formulas of the causal estimands and , which are evaluated under the parametric submodel . By the semiparametric efficiency theory, we have that
Also, since , a valid nonparametric identification formula of under the parametric submodel is . Then, noting
we conclude that is the EIF of . Similarly, because , a valid nonparametric identification formula of under the parametric submodel is . Then, we have that
thus is the EIF of .
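As a numerical illustration of the kind of calculus used in Lemma S10, the following Python sketch takes two sample means as stand-in estimands. It assumes the standard delta-method forms: phi1 - phi2 for a difference, and (phi1 - (theta1 / theta2) * phi2) / theta2 for a ratio with theta2 nonzero. The names theta1, theta2, phi1, phi2, the helper functions, and the simulated data are illustrative assumptions and are not the paper's notation.

```python
import numpy as np


def eif_difference(phi1, phi2):
    """Influence function of theta1 - theta2, given those of theta1 and theta2."""
    return phi1 - phi2


def eif_ratio(phi1, phi2, theta1, theta2):
    """Influence function of theta1 / theta2 (assumed form; needs theta2 != 0)."""
    if theta2 == 0:
        raise ValueError("theta2 must be nonzero for the ratio rule")
    return (phi1 - (theta1 / theta2) * phi2) / theta2


# Toy data (hypothetical): two correlated variables whose means play the
# role of the two estimands.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=5000)
y = 0.5 * x + rng.normal(3.0, 1.0, size=5000)

theta1, theta2 = x.mean(), y.mean()
phi1, phi2 = x - theta1, y - theta2  # influence functions of the two means

phi_diff = eif_difference(phi1, phi2)
phi_ratio = eif_ratio(phi1, phi2, theta1, theta2)

# Classical delta-method variance of a ratio of means, for comparison.
r = theta1 / theta2
cov_xy = np.cov(x, y, ddof=0)[0, 1]
delta_var = (x.var() - 2 * r * cov_xy + r**2 * y.var()) / theta2**2
```

In this toy setting the empirical variance of `phi_ratio` coincides with the textbook delta-method variance of a ratio of means, which is one way to sanity-check a composed influence function before using it for inference.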
D.5 The multiply robust estimator (Theorem 4)
Proof of Theorem 4. Let be all of the parameters in the parametric working models of , and let be the probability limit of , where some components of may not equal their true values because of misspecification. Let be the value of when it is evaluated at , which is the probability limit of . Notice that , , , , under , , , and , respectively, but the equalities generally do not hold when the corresponding working model is misspecified. Let be the probability limit of . According to Jiang et al. (2022), under . Therefore, also holds under either , , , or . The previous discussion suggests that the probability limit of is
where . In what follows, we show that under Scenario I (), II (), III (), or IV (), which collectively verify the quadruple robustness of .
Scenario I ():
In Scenario I, , , , but generally . By the double robustness of , we also have . Observing this, we can rewrite , where
It is obvious that . Moreover,
which suggests that under .
Scenario II ():
In Scenario II, , , , but generally . Observing this, we can rewrite , where
One can verify ,
Therefore, we have obtained under .
Scenario III ():
In Scenario III, , , , but generally . Observing this, we can rewrite , where
Noting that ,
we have obtained under .
Scenario IV ():
In Scenario IV, , , , but generally . Therefore, we have , where
Noting that ,
we have that under .
Up until this point, we have confirmed that the probability limit of , i.e., , equals the true value under either , , , or . Next, we prove the asymptotic normality of . Notice that can be viewed as the solution of the following estimating equation
where and are and evaluated at . Assume that the following regularity conditions hold:
1. Assume that , where is the influence function of and is a remainder term that converges in probability to 0. Also, assume that converges to a positive definite matrix.
2. Let be a bounded convex neighborhood of . Assume that the class of functions is a Glivenko-Cantelli class in .
3. Assume that converges to a positive value. In addition, we assume that both and converge to a positive value.
To prove asymptotic normality, we use a Taylor series, along with the above conditions, to deduce that
which suggests that
where . Then, by applying the central limit theorem and noticing that under either , , , or , we can show that converges to a zero-mean normal distribution with variance
Finally, when all parametric working models are correctly specified, i.e., under the intersection model , then achieves the semiparametric efficiency bound. We can easily verify this by observing the following facts:
1. and under .
2. Following point 1, under .
3. , because the EIF is orthogonal to the likelihood score of the parametric working models when they are correctly specified.
Under the above three points, we have
This has completed the proof.
D.6 The nonparametric efficient estimator (Theorem 5)
Proof of Theorem 5. The proof of the multiple robustness of is the same as that of given in the proof of Theorem 4. Here, we only prove the asymptotic normality and local efficiency of .
To simplify notation but without loss of clarity, we abbreviate and its nonparametric estimator () as and , respectively. Also, we abbreviate the two unknown functions in the efficient influence function, and , as and , respectively. Based on the cross-splitting procedure, the nonparametric estimator can be decomposed into the ratio of two terms and , where
Here, is the size of the -th group , is the empirical mean operator on , and is evaluated under , which is the nonparametric estimator of the nuisance functions based on the leave-one-out sample . We further have the following decomposition of and :
(s22) | ||||
(s23) |
In what follows, we show that . Using a similar strategy, one can also deduce that .
We can show the second term in (s22) is by cross-splitting. Specifically, by Markov’s inequality, the independence induced by cross-splitting, and the fact that
we have that
for any . Therefore, because converges to in probability and therefore converges to 0. Since is a finite number and we partition the dataset as evenly as possible, we have that and thus .
Next, we show that . Specifically, we can show that
Define , , , , , , , , , , and as the abbreviations of the unknown nuisance functions , , , , , , , , , , and , respectively. Also, let , , , , , , , , , , and be their corresponding estimators evaluated under . Using these abbreviations, we can rewrite with
Therefore, we have that
where and the last equality of the previous equation follows from the law of iterated expectation. Using the Cauchy-Schwarz inequality, we then have
Noting that it is assumed for any , then . Now, we have confirmed that
thus
(s24) |
Using similar arguments, we can show and therefore
(s25) |
Notice that can be cast as the solution of the following equation
This, along with (s24) and (s25) suggests that
Moreover, since and , it follows that . Therefore, we further obtain
After simple algebra and observing , we have
which suggests that is asymptotically normal and its asymptotic variance achieves the efficiency lower bound discussed in Appendix D.5.
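The cross-splitting construction used in this proof can be sketched in Python as follows. The sketch assumes a generic ratio-type estimator: the sample is partitioned into folds as evenly as possible, nuisance functions are fitted on the out-of-fold data, and the fold-specific empirical means of a numerator and a denominator term are combined with weights proportional to fold size. The function `cross_fit_ratio`, the toy nuisance models, and the simulated data are hypothetical stand-ins; they are not the paper's nuisance functions or EIF components.

```python
import numpy as np


def cross_fit_ratio(data, fit_nuisance, numerator, denominator, n_folds=5, seed=0):
    """Cross-fitted ratio of two fold-weighted empirical means."""
    n = data.shape[0]
    rng = np.random.default_rng(seed)
    # np.array_split yields folds whose sizes differ by at most one.
    folds = np.array_split(rng.permutation(n), n_folds)
    num, den = 0.0, 0.0
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)  # out-of-fold indices
        nuisance = fit_nuisance(data[train])      # fitted without the fold
        weight = len(fold) / n
        num += weight * numerator(data[fold], nuisance).mean()
        den += weight * denominator(data[fold], nuisance).mean()
    return num / den


# Hypothetical toy problem: columns are (x, a, y); the target is E[Y(1)].
def fit_nuisance(train):
    x, a, y = train.T
    slope, intercept = np.polyfit(x[a == 1], y[a == 1], 1)  # outcome model
    return {"slope": slope, "intercept": intercept, "p": a.mean()}


def numerator(test, nu):
    x, a, y = test.T
    m = nu["intercept"] + nu["slope"] * x
    return m + a / nu["p"] * (y - m)  # augmented inverse-probability term


def denominator(test, nu):
    _, a, _ = test.T
    return a / nu["p"]  # has mean close to one


rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n).astype(float)
y = 1.0 + 2.0 * x + a + rng.normal(size=n)
data = np.column_stack([x, a, y])

estimate = cross_fit_ratio(data, fit_nuisance, numerator, denominator)
```

Because each fold's nuisance fit is independent of the observations it is evaluated on, the empirical-process term can be controlled without Donsker-type conditions, which is the role cross-splitting plays in the argument above.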
D.7 Estimation of natural mediation effects
This section elaborates on the multiply robust estimator and nonparametric estimator for the mediation effects, , , ITT-NIE and ITT-NDE. The following lemma provides the form of the EIF of the aforementioned mediation effects.
Lemma S11
For any or under standard or strong monotonicity, the EIFs of and are
and
respectively, where and are given in Theorem 3, , and 11, 10, 01 if 10, 00, and 11, respectively. In addition, the EIFs of ITT-NIE and ITT-NDE are
and
respectively, where or under the standard or strong monotonicity.
Proof.
Because is identified as the difference between and , and the EIFs of and are and as derived in Theorem 3, Lemma S10 implies that the EIF of is
The EIF of can be similarly obtained.
Also, ITT-NIE is identified as
where is defined in Section D.4 and its EIF has been derived in Lemma S9. Based on Lemma S10, one can easily show
We can calculate the EIF of ITT-NDE following the same strategy.
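On the difference scale, an intention-to-treat effect is a mixture of the stratum-specific effects weighted by the stratum proportions, by the law of total expectation. The short Python sketch below illustrates that relationship. The function name, the stratum labels, and the numbers are hypothetical and are not taken from the paper's data; the identity as written does not carry over to ratio-scale effects.

```python
def itt_from_principal_effects(principal_scores, principal_effects):
    """Mixture of stratum-specific effects weighted by stratum proportions.

    Both arguments are dicts keyed by stratum label (labels are hypothetical).
    """
    if set(principal_scores) != set(principal_effects):
        raise ValueError("strata of scores and effects must match")
    if abs(sum(principal_scores.values()) - 1.0) > 1e-8:
        raise ValueError("stratum proportions must sum to one")
    return sum(principal_scores[d] * principal_effects[d] for d in principal_scores)


# Hypothetical inputs: two strata under strong monotonicity.
scores = {"compliers": 0.6, "never_takers": 0.4}
effects = {"compliers": 0.05, "never_takers": 0.0}
itt_nie = itt_from_principal_effects(scores, effects)
```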
The following proposition shows that the multiply robust estimator for the mediation effects remains quadruply robust and locally efficient.
Proposition S5
Under either , , or , the multiply robust estimator is consistent and asymptotically normal for all . Moreover, is semiparametrically efficient under .
Proof.
The quadruple robustness and asymptotic normality of and follow directly from the properties of in Theorem 4. Next, we prove that is locally efficient under and similar results extend to . Based on the proof of Theorem 4 in Section D.5, we know that the influence function of under is
Then, it follows from that
where the second equality holds due to Lemma S11. This suggests that is semiparametrically efficient when all working models are correctly specified.
The multiply robust estimator of ITT-NIE is . Theorem 4 suggests that and under either , , , or . Also, Jiang et al. (2022) suggest that the doubly robust estimator for the marginal principal score is consistent for under . This further implies that under either , , , or . To prove asymptotic normality, notice that can be re-expressed as
Define , where is the estimator of the parameters in nuisance parametric working models. Then, under mild regularity conditions (similar to what we listed in the proof of Theorem 4), one can easily deduce that
where is the probability limit of and is the influence function of . This confirms that is asymptotically normal. Under , we can verify that is semiparametrically efficient by observing that and such that
where the second equality follows from Lemma S11. This suggests that is locally efficient. Using a similar strategy, one can prove that is quadruply robust, asymptotically normal, and locally efficient.
In parallel to results in Section D.6, the following proposition demonstrates the properties of the nonparametric estimator of the mediation effects.
Proposition S6
For any , is consistent if any three of the four nuisance functions in are consistently estimated in the -norm. Furthermore, if all elements in are consistent in the -norm and for any , then is asymptotically normal and semiparametrically efficient.
Proof.
Following the proof of Proposition S5, one can show that is consistent for for , if any three of the four functions in are consistently estimated. Here, we only prove the asymptotic normality and local efficiency of when for any .
We show in the proof of Theorem 5 that
when for any . Therefore,
This implies that and are asymptotically normal and semiparametrically efficient under the required convergence rate conditions for the nuisance function estimates. Also, we show in the proof of Theorem 5 that , which suggests that
and thus . Similarly, we can show . This implies that and are asymptotically normal and semiparametrically efficient under the required convergence rate conditions for the nonparametric nuisance function estimators.
Remark 5
(Variance estimation of the principal and ITT mediation effects) For the purpose of inference, nonparametric bootstrap can be used for the moment-type and multiply robust estimators. The asymptotic variance of the nonparametric efficient estimators can be obtained by using the empirical variance of the estimated EIF given in Lemma S11. For example, the asymptotic variance of can be estimated by
where is evaluated based on the nonparametric estimator of the nuisance functions, . The variance estimator of , , and can be similarly obtained.
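The two inference routes mentioned in Remark 5 can be sketched in Python as follows. The first function forms a Wald interval from the empirical second moment of estimated EIF values; the second computes a nonparametric bootstrap interval. The percentile form of the bootstrap interval, the function names, and the toy estimator (a sample mean, whose influence function is the centered observation) are assumptions made for illustration and are not specified in the text.

```python
import numpy as np


def eif_wald_ci(estimate, eif_values, z=1.96):
    """Wald interval using the empirical second moment of estimated EIF values."""
    eif_values = np.asarray(eif_values, dtype=float)
    n = eif_values.size
    # Asymptotic variance estimate is mean(EIF^2); divide by n for the estimator.
    se = np.sqrt(np.mean(eif_values**2) / n)
    return estimate - z * se, estimate + z * se, se


def bootstrap_percentile_ci(data, estimator, n_boot=500, alpha=0.05, seed=0):
    """Nonparametric bootstrap percentile interval (resampling rows with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    stats = np.array(
        [estimator(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    )
    return np.quantile(stats, alpha / 2), np.quantile(stats, 1 - alpha / 2)


# Toy example (hypothetical): the estimand is a mean.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 2.0, size=2000)
estimate = y.mean()
eif_values = y - estimate

lower, upper, se = eif_wald_ci(estimate, eif_values)
boot_lower, boot_upper = bootstrap_percentile_ci(y, np.mean)
```

For estimators that involve fitted nuisance models, the `estimator` passed to the bootstrap routine would need to refit those models on each resample, which is why the EIF-based variance is the cheaper option when it is available.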
D.8 Sensitivity analysis for the principal ignorability assumption under standard monotonicity
This section provides the supporting information for the sensitivity analysis for the principal ignorability assumption, under standard monotonicity. We first present the explicit forms of the sensitivity weight for all and :
Next, we prove Propositions S1 and S2, which include identification results and properties of the multiply robust estimator under violation of principal ignorability. We first provide two lemmas.
Lemma S12
Under Assumptions 1, 2, 3a, and 6 and the proposed confounding function , we can nonparametrically identify the distribution of for all and . Specifically, we have that and for any and
for any . This suggests that, for ,
Proof.
We first show for all and similar results extend to . Specifically, for any , we have that
(by Lemma S5 and Lemma S2) | |||
(because must be 1 given and ) | |||
(because the observed stratum with and only contains the always-takers) | |||
Next, we derive the expression of . Leveraging the same strategy, one can deduce the expressions of , , and . For , we have that
(followed by the law of total probability and or 11 in observed strata | |||
under standard monotonicity) | |||
(because must be 1 given Z=1 and either or 11) | |||
(by Lemma S5 and Lemma S2) | |||
which indicates for . Because , we obtain
Lemma S13
Under Assumptions 1, 2, 3a, and 6 and the proposed confounding function , we can nonparametrically identify the for all , , and . Specifically, we have that
for any , where and are given in for and for ,
Proof.
We first show and similar result extends to . Specifically, one can verify
(by Lemma S5 and Lemma S2) | |||
(by Assumption 5) | |||
(because must be 1 given and ) | |||
(because the observed stratum with and only contains the always-takers) | |||
Next, we derive the expression of . Notice that the expressions of , , and can be similarly obtained. Specifically, for any ,
(followed by the law of iterated expectation and or 11 in observed strata | |||
under standard monotonicity) | |||
(because must be 1 given Z=1 and either or 11) | |||
(followed by Assumption 5) | |||
(by Lemma S5 and Lemma S2) |
where
and
This suggests that
and thus
This completes the proof.
Next, we prove the nonparametric formulas in Proposition S1.
Proof of Proposition S1. We derive the nonparametric identification formula for and omit the proofs for all other since the steps are similar. By the definition of , we have that
(by Lemma S5 coupled with Lemma S2) | |||
This completes the proof.
Finally, we prove properties of the multiply robust estimator in Proposition S2.
Proof of Proposition S2. Following the notation in the proof of Theorem 4, we let be the probability limit of , where , , , , under , , , and , respectively. Under the condition of either or , we always have and . Also, because the sensitivity weight only depends on the confounding function and the observed-data nuisance functions , we can confirm that the probability limit of is equal to under or . In addition, we can show that the probability limit of , denoted by , is equal to the true value under or , because is a doubly robust estimator under . The previous discussion suggests that the probability limit of is
under or , where . In what follows, we show that under Scenario I () or Scenario II (), which concludes the double robustness of .
Scenario I ():
In Scenario I, but generally . Therefore, we can rewrite , where
One can show that by using the law of iterated expectation and
which suggests that under .
Scenario II ():
In Scenario II, but generally . Therefore, we have , where
Noting that by the law of iterated expectations and as shown in Proposition S1, we obtain that under .
Up until this point, we have confirmed that is consistent for under or . Then, under mild regularity conditions (similar to what we listed in the proof of Theorem 4), one can easily show that is also asymptotically normal such that converges to a zero-mean normal distribution with finite variance under either or . The proof is omitted for brevity.
D.9 Sensitivity analysis for the principal ignorability assumption under strong monotonicity
We can adapt to address strong monotonicity, following a similar procedure shown in Section D.8. Under strong monotonicity, the proposed multiply robust estimator takes the following explicit expression for any :
(s26) |
where and is the estimated sensitivity weight based on the parametric working models. Here, the sensitivity weight takes a slightly different form from that under standard monotonicity. We provide the explicit expressions of for and below.
It is worth noting that the sensitivity weights only depend on but not ; this suggests that the estimated principal natural indirect effect, , will not depend on the values of . However, the estimated principal natural direct effect, , is dependent on both and .
We can show that enjoys similar properties to :
Proposition S7
Suppose that Assumptions 1, 2, 3b, 5, and 6 hold. Under either or , the estimator is consistent and asymptotically normal for any .
D.10 Sensitivity analysis for the ignorability of the mediator assumption
This section presents proofs for Propositions S3 and S4, which refer to the identification results and properties of the multiply robust estimator under violation of Assumption 5.
Proof of Proposition S3. For any and with under standard monotonicity and under strong monotonicity, we can show that
Therefore,
where the last equality follows from due to Lemma S5. We can then identify using the following equations:
Therefore, we can identify as
This completes the proof.
Proof of Proposition S4. Following the notation in the proofs of Theorem 4 and Proposition S2, we let be the probability limit of . Because we assume the condition under either , , or , we always have , which suggests that the probability limit of always equals because is only a function of the sensitivity weight and the mediator model . Also, we can show that the probability limit of , denoted by , always equals the true value , because is doubly robust under . The previous discussion suggests that the probability limit of is
under , or , where
In what follows, we show that under Scenario I (), Scenario II (), or Scenario III (), which concludes the triple robustness of .
Scenario I ():
In Scenario I, and but generally . Therefore, we can rewrite , where
One can show that by using the law of iterated expectation and
where the last equality follows from Proposition S3. This suggests that under .
Scenario II ():
In Scenario II, and but generally . Observing this, we can rewrite , where
where . One can verify by using the law of iterated expectation and
Therefore, we have obtained under .
Scenario III ():
In Scenario III, and but generally . Therefore, we have , where
where . Noting that by the law of iterated expectations and as shown in Proposition S3, we obtain that under .
Now we have proved that is consistent for under , , or . We can also show that is asymptotically normal under certain regularity conditions similar to those provided in the proof of Theorem 4; the proofs are omitted for brevity.
E Figures and Tables
Method | ITT-NIE | ITT-NDE | ITT |
---|---|---|---|
a | 0.017 (0.036, 0.005) | 0.075 (0.156, 0.001) | 0.092 (0.173, 0.020) |
b | 0.014 (0.031, 0.003) | 0.073 (0.148, 0.001) | 0.087 (0.165, 0.017) |
c | 0.021 (0.045, 0.005) | 0.072 (0.148, 0.002) | 0.092 (0.174, 0.021) |
d | 0.014 (0.031, 0.003) | 0.072 (0.147, 0.000) | 0.086 (0.165, 0.016) |
mr | 0.015 (0.032, 0.003) | 0.072 (0.151, 0.001) | 0.088 (0.167, 0.016) |
np | 0.017 (0.031, 0.004) | 0.067 (0.146, 0.013) | 0.084 (0.162, 0.005) |
Method | Estimand | Compliers | Never takers |
---|---|---|---|
a | PNIE | 0.030 (0.062, 0.009) | 0.000 (0.007, 0.005) |
a | PNDE | 0.083 (0.176, 0.002) | 0.063 (0.174, 0.029) |
a | PCE | 0.113 (0.206, 0.027) | 0.063 (0.172, 0.027) |
b | PNIE | 0.024 (0.052, 0.006) | 0.000 (0.006, 0.005) |
b | PNDE | 0.082 (0.167, 0.003) | 0.062 (0.166, 0.028) |
b | PCE | 0.106 (0.186, 0.026) | 0.062 (0.165, 0.027) |
c | PNIE | 0.030 (0.071, 0.001) | 0.007 (0.032, 0.015) |
c | PNDE | 0.083 (0.171, 0.003) | 0.056 (0.160, 0.032) |
c | PCE | 0.113 (0.207, 0.029) | 0.063 (0.171, 0.029) |
d | PNIE | 0.024 (0.052, 0.006) | 0.000 (0.006, 0.005) |
d | PNDE | 0.081 (0.166, 0.003) | 0.060 (0.166, 0.029) |
d | PCE | 0.105 (0.187, 0.025) | 0.060 (0.167, 0.029) |
mr | PNIE | 0.026 (0.055, 0.006) | 0.000 (0.006, 0.006) |
mr | PNDE | 0.083 (0.170, 0.002) | 0.058 (0.160, 0.031) |
mr | PCE | 0.109 (0.191, 0.027) | 0.058 (0.160, 0.030) |
np | PNIE | 0.029 (0.052, 0.006) | 0.001 (0.004, 0.003) |
np | PNDE | 0.066 (0.156, 0.023) | 0.066 (0.163, 0.032) |
np | PCE | 0.096 (0.182, 0.009) | 0.067 (0.163, 0.032) |
Rudolph et al.§ | PNIE | 0.030 (0.053, 0.006) | – |
Rudolph et al.§ | PNDE | 0.115 (0.251, 0.022) | – |
Rudolph et al.§ | PCE | 0.145 (0.280, 0.008) | – |
§ ‘Rudolph et al.’ is the nonparametric efficient estimator given by Rudolph et al. (2024); see Remark 1 for more details of this approach.
Variable | Compliers | Never-takers | ASD§ |
---|---|---|---|
Proportion | 55% | 45% | |
Gender (male) | 0.53 (0.50) | 0.62 (0.48) | 0.19 |
Age (years) | 39.14 (10.37) | 35.51 (9.40) | 0.37 |
White race | 0.84 (0.37) | 0.77 (0.42) | 0.15 |
Depression | 1.89 (0.55) | 1.87 (0.55) | 0.03 |
Economic hardship | 3.04 (0.93) | 3.11 (0.94) | 0.07 |
Motivation | 5.28 (0.63) | 5.05 (0.64) | 0.35 |
Marriage (baseline: never married) | | | |
Married | 0.49 (0.50) | 0.43 (0.50) | 0.11 |
Separated | 0.03 (0.17) | 0.03 (0.18) | 0.03 |
Divorced | 0.18 (0.38) | 0.19 (0.4) | 0.04 |
Widowed | 0.02 (0.14) | 0.02 (0.15) | 0.02 |
Education (baseline: less than high school) | | | |
High school | 0.32 (0.47) | 0.33 (0.47) | 0.02 |
Post-secondary non-tertiary education | 0.32 (0.47) | 0.40 (0.49) | 0.17 |
Bachelor’s degree | 0.19 (0.39) | 0.10 (0.31) | 0.23 |
Higher than a Bachelor’s degree | 0.13 (0.33) | 0.09 (0.28) | 0.13 |
Assertiveness | 3.39 (0.87) | 3.56 (0.83) | 0.21 |
Note: The stratum-specific means and standard deviations of the baseline characteristics are calculated following the method in Cheng et al. (2023a), that is, as weighted means and standard deviations with weights based on the principal scores (estimated according to the nonparametric efficient estimator).
§ ASD is the absolute standardized difference across the two principal strata. Given a specific baseline covariate, its ASD is calculated as , where and are the estimated mean and standard deviation of this covariate in the stratum .
Variable | Doomed | Harmed | Immune | Max ASD§ |
---|---|---|---|---|
Proportion | 51% | 8% | 41% | |
Personal characteristics | | | | |
Female | 0.63 (0.47) | 0.56 (0.52) | 0.48 (0.48) | 0.31 |
Age (years) | 46.53 (17.89) | 46.80 (17.99) | 46.76 (17.3) | 0.02 |
Married | 0.59 (0.49) | 0.63 (0.48) | 0.64 (0.48) | 0.09 |
Post-secondary education | 0.27 (0.45) | 0.27 (0.45) | 0.29 (0.45) | 0.04 |
Employed | 0.58 (0.49) | 0.58 (0.49) | 0.62 (0.49) | 0.09 |
Non-smoking | 0.50 (0.50) | 0.49 (0.50) | 0.47 (0.5) | 0.06 |
Dwelling condition | | | | |
Owning the house | 0.74 (0.44) | 0.72 (0.45) | 0.76 (0.43) | 0.08 |
House size greater than | 0.86 (0.35) | 0.84 (0.37) | 0.87 (0.34) | 0.08 |
Crowding ( resident per room) | 0.64 (0.48) | 0.64 (0.48) | 0.64 (0.48) | 0.01 |
Self-evaluation on dwelling condition | | | | |
Satisfied with the heating system | 0.85 (0.35) | 0.85 (0.35) | 0.89 (0.32) | 0.10 |
Satisfied with natural light | 0.75 (0.43) | 0.73 (0.44) | 0.77 (0.42) | 0.10 |
Note: The stratum-specific means and standard deviations of the baseline characteristics are calculated following the method in Cheng et al. (2023a), that is, as weighted means and standard deviations with weights based on the principal scores (estimated according to the nonparametric efficient estimator).
§ Max ASD is the maximum pairwise absolute standardized difference across the three principal strata. Given a specific baseline covariate, its Max ASD is calculated as the maximum of for all , where and are the estimated mean and standard deviation of this covariate in the stratum .
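The balance summaries in the two baseline-characteristics tables can be sketched in Python as follows. The exact ASD formula did not survive in the footnotes, so the sketch assumes the common definition: the absolute mean difference divided by the square root of the average of the two stratum variances. The function names are hypothetical. The inputs in the example are the rounded entries printed in the tables, so results can differ from the reported values in the last digit.

```python
import math
from itertools import combinations


def asd(mean_a, sd_a, mean_b, sd_b):
    """Absolute standardized difference between two strata (assumed definition)."""
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def max_asd(stratum_stats):
    """Maximum pairwise ASD over a list of (mean, sd) tuples, one per stratum."""
    return max(asd(*a, *b) for a, b in combinations(stratum_stats, 2))


# Age (years) for compliers versus never-takers, from the rounded table entries.
age_asd = asd(39.14, 10.37, 35.51, 9.40)

# Female indicator for the doomed, harmed, and immune strata.
female_max_asd = max_asd([(0.63, 0.47), (0.56, 0.52), (0.48, 0.48)])
```

Under this assumed definition the Age row gives a value of about 0.367, in line with the 0.37 reported for compliers versus never-takers.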
Method | ITT-NIE | ITT-NDE | ITT |
---|---|---|---|
a | 1.024 (1.003, 1.045) | 1.242 (1.108, 1.385) | 1.271 (1.127, 1.418) |
b | 1.021 (1.002, 1.039) | 1.239 (1.105, 1.387) | 1.266 (1.121, 1.407) |
c | 1.029 (1.005, 1.055) | 1.232 (1.098, 1.386) | 1.268 (1.127, 1.414) |
d | 1.021 (1.002, 1.040) | 1.230 (1.095, 1.381) | 1.256 (1.110, 1.418) |
mr | 1.021 (1.003, 1.043) | 1.248 (1.114, 1.405) | 1.274 (1.137, 1.438) |
np | 1.031 (1.010, 1.053) | 1.219 (1.078, 1.361) | 1.257 (1.114, 1.400) |
Method | Estimand | Doomed | Harmed | Immune |
---|---|---|---|---|
a | PNIE | 1.019 (0.992, 1.041) | 1.028 (0.990, 1.061) | 1.030 (0.995, 1.073) |
a | PNDE | 1.252 (1.066, 1.421) | 2.112 (1.780, 2.478) | 1.087 (0.896, 1.306) |
a | PCE | 1.276 (1.087, 1.444) | 2.171 (1.862, 2.508) | 1.120 (0.919, 1.323) |
b | PNIE | 1.021 (0.999, 1.043) | 1.030 (0.998, 1.065) | 1.018 (0.987, 1.053) |
b | PNDE | 1.226 (1.043, 1.408) | 2.141 (1.659, 2.775) | 1.096 (0.908, 1.310) |
b | PCE | 1.252 (1.073, 1.438) | 2.205 (1.705, 2.833) | 1.115 (0.931, 1.311) |
c | PNIE | 1.040 (0.998, 1.081) | 1.026 (0.984, 1.061) | 1.010 (0.943, 1.065) |
c | PNDE | 1.226 (1.044, 1.407) | 2.113 (1.816, 2.458) | 1.099 (0.904, 1.322) |
c | PCE | 1.275 (1.086, 1.452) | 2.168 (1.844, 2.495) | 1.111 (0.913, 1.308) |
d | PNIE | 1.021 (0.999, 1.043) | 1.033 (0.998, 1.070) | 1.017 (0.987, 1.052) |
d | PNDE | 1.227 (1.040, 1.403) | 2.093 (1.795, 2.441) | 1.096 (0.903, 1.322) |
d | PCE | 1.253 (1.070, 1.433) | 2.162 (1.852, 2.490) | 1.115 (0.918, 1.319) |
mr | PNIE | 1.017 (0.999, 1.034) | 1.029 (1.001, 1.059) | 1.025 (0.983, 1.074) |
mr | PNDE | 1.223 (1.044, 1.403) | 2.296 (1.841, 2.982) | 1.102 (0.911, 1.328) |
mr | PCE | 1.244 (1.070, 1.433) | 2.363 (1.897, 3.045) | 1.129 (0.945, 1.336) |
np | PNIE | 1.025 (1.001, 1.050) | 1.046 (1.013, 1.079) | 1.035 (0.995, 1.075) |
np | PNDE | 1.181 (1.014, 1.348) | 2.142 (1.724, 2.560) | 1.111 (0.881, 1.340) |
np | PCE | 1.212 (1.044, 1.379) | 2.241 (1.817, 2.666) | 1.150 (0.916, 1.385) |
Subpopulation | Conditional NIE | Conditional NDE |
---|---|---|
Gender | | |
Male | 1.008 (0.983, 1.038) | 1.233 (1.011, 1.500) |
Female | 1.029 (1.005, 1.058) | 1.261 (1.099, 1.440) |
Current marital status | | |
Unmarried | 1.028 (0.997, 1.065) | 1.246 (1.043, 1.488) |
Married | 1.014 (0.992, 1.037) | 1.256 (1.081, 1.447) |
Education | | |
Secondary school or less | 1.017 (0.999, 1.037) | 1.305 (1.161, 1.476) |
Post-secondary education | 1.039 (0.979, 1.119) | 0.972 (0.678, 1.336) |
Employment status | | |
Unemployed | 1.022 (0.996, 1.054) | 1.305 (1.127, 1.507) |
Employed | 1.020 (0.995, 1.046) | 1.198 (1.001, 1.408) |
Smoking status | | |
Smoking | 1.040 (1.007, 1.078) | 1.204 (1.023, 1.412) |
Non-smoking | 1.003 (0.981, 1.025) | 1.300 (1.105, 1.508) |
Owning the house | | |
No | 1.025 (0.977, 1.079) | 1.110 (0.868, 1.425) |
Yes | 1.019 (1.001, 1.038) | 1.300 (1.145, 1.466) |
House size | | |
 | 1.053 (1.005, 1.128) | 1.256 (0.978, 1.594) |
 | 1.012 (0.993, 1.030) | 1.251 (1.103, 1.413) |
Crowding | | |
resident per room | 1.031 (1.001, 1.072) | 1.038 (0.869, 1.239) |
resident per room | 1.014 (0.989, 1.038) | 1.434 (1.235, 1.649) |
Heating system | | |
Unsatisfied | 1.023 (1.002, 1.047) | 1.244 (1.095, 1.420) |
Satisfied | 1.008 (0.978, 1.042) | 1.279 (1.000, 1.582) |
Natural light | | |
Unsatisfied | 1.032 (0.995, 1.076) | 1.248 (1.032, 1.501) |
Satisfied | 1.013 (0.994, 1.034) | 1.252 (1.087, 1.431) |