Identification and multiply robust estimation in causal mediation analysis across principal strata

Chao Cheng$^{1,2}$ and Fan Li$^{1,2}$
$^{1}$Department of Biostatistics, Yale School of Public Health
$^{2}$Center for Methods in Implementation and Prevention Science, Yale School of Public Health
Abstract

We consider assessing causal mediation in the presence of a post-treatment event (examples include noncompliance, a clinical event, or death). We identify natural mediation effects for the entire study population and for each principal stratum characterized by the joint potential values of the post-treatment event. We derive the efficient influence function for each mediation estimand, which motivates a set of multiply robust estimators for inference. The multiply robust estimators are consistent under four types of misspecifications and are efficient when all nuisance models are correctly specified. We also develop a nonparametric efficient estimator that leverages data-adaptive machine learners to achieve efficient inference and discuss sensitivity methods to address key identification assumptions. We illustrate our methods via simulations and two real data examples.


Keywords: Causal inference, efficient influence function, endogenous subgroups, moderated mediation analysis, natural indirect effect, principal ignorability.

1 Introduction

1.1 Background and motivation

Causal mediation analysis (Imai et al., 2010) is widely used to investigate the role of a mediator ($M$) in explaining the causal mechanism from a treatment ($Z$) to an outcome ($Y$). Under the potential outcomes framework, a primary step in causal mediation analysis is to decompose the total treatment effect into an indirect effect that works through $M$ and a direct effect that works around $M$. While alternative definitions exist, the natural indirect and direct effects are the most relevant for studying causal mechanisms (Nguyen et al., 2021). The natural indirect effect compares potential outcomes by switching $M$ from the value it would have taken under the control condition to that under the treated condition, while fixing $Z$ to the treated condition. The natural direct effect compares potential outcomes by switching $Z$ from the control to the treated condition, while fixing $M$ to the value it would have taken under the control condition. Parametric regressions (e.g., Cheng et al., 2021, 2023b), semiparametric methods (e.g., Tchetgen Tchetgen and Shpitser, 2012), and nonparametric methods (e.g., Kim et al., 2017) have been proposed for estimating natural mediation effects, typically by assuming that $M$ is the only variable sitting on the causal chain connecting treatment and outcome.

Increasingly for experimental and observational studies, a post-treatment event ($D$) may occur prior to the measurement of the mediator. This event may be a post-treatment action or decision regarding uptake (e.g., noncompliance or treatment discontinuation), a clinical event (e.g., worsening of disease, adverse medication effect), or a terminal event precluding the observation of any data afterward (e.g., death). In each context, the post-treatment event provides important information in defining partially observed subgroups of the study population. An emerging interest lies in learning the treatment effect within each subgroup, and possibly the effect heterogeneity across subgroups. Under the principal stratification framework, one can characterize each subgroup with the joint potential values of $D$ under alternative conditions, referred to as principal strata more generally (Frangakis and Rubin, 2002), or as endogenous subgroups in social sciences (Page et al., 2015). Beyond understanding the total effect within each subgroup, here we are further interested in evaluating the natural mediation effects via $M$ within each subgroup characterized by a post-treatment event $D$. Below, we give two examples that motivate such an objective (an additional example on mediation analysis with death-truncated mediator and outcome is provided in the Supplementary Material).

Example 1

(Mediation analysis with noncompliance) Noncompliance occurs if the actual treatment received ($D$) differs from the treatment assignment ($Z$). By Angrist et al. (1996), the study population is partitioned into four subgroups, including (i) always-takers who take the treatment regardless of assignment; (ii) never-takers who take the control regardless of assignment; (iii) compliers who comply with the assignment; (iv) defiers who take the opposite assignment. Typically, the compliers are of central interest because this is the only group for whom the average causal effect due to assignment reflects the average causal effect due to actual treatment received. As noncompliance is a post-randomization action or decision, a relevant research question in the presence of a mediator, measured prior to the outcome, is whether the treatment works through $M$ among the compliers (i.e., the complier natural mediation effect). A follow-up question is whether there is heterogeneity in the treatment effect mechanisms among the subgroups formed by noncompliance patterns.

Example 2

(Mediation analysis with an intercurrent event) In health research, disease progression, adverse reactions, or other early outcomes may occur due to treatment; these are collectively referred to as intercurrent events by the ICH E9 Estimands Framework (Kahan et al., 2023). Section 6.2 studies the role of perceived control ($M$) in mediating the effect of residence in a damp and moldy dwelling ($Z$) on depression ($Y$), where dampness/mold related disease ($D$) occurred among some study units. It is then of interest to study the indirect effect due to $M$ among those who would always develop dampness/mold related disease regardless of their living condition (i.e., the doomed stratum), those who would never develop dampness/mold related disease regardless of their living condition (i.e., the immune stratum), as well as those who would develop dampness/mold related disease only if living in such a condition but otherwise not (i.e., the harmed stratum). A follow-up question is whether there is variation in the natural indirect effects among these subgroups.

In both examples, the principal causal effect (PCE), defined as the total treatment effect within a principal stratum, can be decomposed into a principal natural indirect effect (PNIE) through the mediator and a principal natural direct effect (PNDE) around the mediator. Addressing the PNIEs and their variation allows us to unpack the overall natural indirect effect to understand for whom and under what circumstances $M$ plays a crucial role in explaining the underlying mechanism. Such analyses could also help determine whether the overall natural indirect effect is driven by one particular subgroup, in which case future interventions might be restructured to better serve an intended subpopulation. To this end, estimating principal natural mediation effects is informative in itself, but comparing them across strata may also provide additional insights. In fact, studying variations in principal natural mediation effects is related to moderated mediation analysis given measured baseline covariates (Qin and Wang, 2024). However, the difference is that we focus on endogenous subgroups defined by a post-treatment event rather than those defined by covariates. Consequently, the scientific question addressed by principal natural mediation effects generally cannot be answered by merely exploring the conditional mediation effects given observed covariates alone (an empirical comparison is provided in Section 6.2).

1.2 Prior work and our contribution

The post-treatment event poses unique challenges for identification of the principal natural mediation effects, because (i) $D$ is a treatment-induced variable confounding the mediator-outcome relationship and (ii) the principal stratum membership is only partially observed. To tackle these challenges, a prevalent approach, usually developed under the noncompliance setting (Example 1), is to view the treatment assignment as an instrumental variable for $D$ and use exclusion restriction to identify the mediation effects among the compliers. Exclusion restriction requires that all causal pathways from the treatment assignment to the mediator and outcome are only through $D$ (see Figure 1 for illustration), and therefore no treatment or mediation effects exist among principal strata where $Z$ does not affect $D$. For example, Yamamoto (2013), Park and Kürüm (2018, 2020) considered a combination of exclusion restriction, monotonicity of treatment assignment on the treatment receipt, and mediator ignorability to nonparametrically identify the complier natural mediation effects. Instead of assuming mediator ignorability, Frölich and Huber (2017) identified the complier natural mediation effect by drawing a second instrumental variable for the mediator. Under monotonicity and exclusion restriction, Rudolph et al. (2024) further proposed semiparametrically efficient estimators of (i) complier natural/interventional mediation effects with a single instrumental variable for the treatment and (ii) double complier interventional mediation effects with two instrumental variables for the treatment and mediator.

[Figure 1 image not recovered: a DAG over $Z$, $D$, $M$, $Y$, and ${\bm{X}}$, with panels (a)–(d) each showing one causal pathway.]
Figure 1: A directed acyclic diagram for mediation analysis across principal strata. Notation: $Z$ is the treatment, $D$ is the post-treatment event, $M$ is the mediator, $Y$ is the outcome, and ${\bm{X}}$ is a vector of pre-treatment covariates. There are four possible causal pathways from the treatment to outcome, indexed by (a)–(d). The exclusion restriction assumption excludes pathways (c) and (d).

A critical assumption in the prior work is the exclusion restriction, which may not always hold in open-label studies where the assignment can exert a direct psychological effect on the mediator and the outcome not through the treatment receipt. In Example 2, the PCE and PNIE among the doomed and immune strata are of interest and may be non-zero; in this instance, the exclusion restriction does not apply either. Relaxing exclusion restriction, Park and Palardy (2020) developed a maximum likelihood approach with full distributional assumptions to empirically identify principal natural mediation effects; however, the consistency of their estimators requires all parametric models to be correctly specified and bias may arise under misspecification.

Our primary interest is to identify the principal natural mediation effects in endogenous subgroups defined by a post-treatment event. We assume monotonicity and ignorability conditions for nonparametric identification, but require neither the exclusion restriction nor fully parametric modeling assumptions. In a similar context, Tchetgen Tchetgen and VanderWeele (2014) studied the marginal natural mediation effect of $M$ in the presence of a post-treatment confounder $D$ based on a nonparametric structural equation model with independent errors and monotonicity, but did not provide identification results for the finer causal mediation estimands within the principal stratum. As we explain in Section 3, our identification assumptions are weaker than those considered by Tchetgen Tchetgen and VanderWeele (2014) in a technical sense, but remain sufficient to unpack the stratum-specific mediation effects that contribute to the marginal natural mediation effect. Leveraging the semiparametric efficiency theory (Bickel et al., 1993, see Hines et al. (2022) for an overview), we characterize the efficient influence function for each estimand under the nonparametric model to motivate semiparametric estimators. Our estimators are consistent under four types of working model misspecification, and are quadruply robust. As a further improvement, a nonparametric extension is also provided to incorporate data-adaptive machine learners for efficient inference (Chernozhukov et al., 2018). Finally, we develop strategies for sensitivity analyses under violations of the key ignorability assumptions in cases when insufficient baseline covariates are collected, or when there is unmeasured treatment-induced confounding of the mediator-outcome relationship.

2 Notation, causal estimands and identification

Suppose that we observe $n$ independent copies of the quintuple $\bm{O}=\{{\bm{X}},Z,D,M,Y\}$, where $Z\in\{0,1\}$ represents treatment assignment with 1 indicating the treated condition and 0 indicating the control condition, $D\in\{0,1\}$ is the occurrence of the post-treatment event, $M$ is a mediator measured after $D$, $Y$ is the final outcome of interest, and ${\bm{X}}$ is a vector of pre-treatment covariates. A directed acyclic graph (DAG) summarizing the relationships among variables is in Figure 1, where $Z$ is allowed to affect $Y$ either directly or through the intermediate variables $D$ and $M$. For a generic variable $W$, we use $\mathbb{P}_{W}(w)$ to denote its distribution function, $f_{W}(w)$ to denote the probability mass/density function, and $\mathbb{E}_{W}[W]$ to denote its expectation. Whenever applicable, we abbreviate $\mathbb{P}_{W}(w)$, $\mathbb{E}_{W}[W]$, and $f_{W}(w)$ as $\mathbb{P}(w)$, $\mathbb{E}[W]$, and $f(w)$ without ambiguity. Moreover, we define $\mathbb{P}_{n}[W]=\frac{1}{n}\sum_{i=1}^{n}W_{i}$ as the empirical average operator, $\mathbb{I}(\cdot)$ as the indicator function, $|\cdot|$ as the absolute value, and $\|\cdot\|$ as the $L_{2}(\mathbb{P})$-norm such that $\|g\|^{2}=\int g^{2}d\mathbb{P}$.

We pursue the potential outcomes framework to define causal mediation estimands (Imai et al., 2010). Let $D_{z}$ be the potential value of the post-treatment event under treatment $z$, $M_{zd}$ the potential value of the mediator when the treatment is $z$ and $D$ is set to $d$, and $Y_{zdm}$ the potential outcome when the treatment is set to $z$, the post-treatment event is set to $d$, and the mediator is set to $m$. Furthermore, we write $M_{z}=M_{zD_{z}}$ such that the potential value of the mediator under treatment $z$ is identical to that under treatment $z$ when $D$ takes its natural value under treatment $z$. Similarly, we have $Y_{z}=Y_{zD_{z}M_{z}}$ and $Y_{zm}=Y_{zD_{z}m}$. The equalities $M_{z}=M_{zD_{z}}$, $Y_{z}=Y_{zD_{z}M_{z}}$, and $Y_{zm}=Y_{zD_{z}m}$ are collectively referred to as the composition of potential values (VanderWeele and Vansteelandt, 2009).

To proceed, we adopt the principal stratification framework (Frangakis and Rubin, 2002) and use the joint potential values of the post-treatment event $U=(D_{1},D_{0})$ to partition the study population into four subgroups, $\{(1,0),(1,1),(0,0),(0,1)\}$. To contextualize the development, we take noncompliance as a running example throughout, in which case these four strata are named compliers, always-takers, never-takers, and defiers. For notational convenience, we re-express the joint potential values $(D_{1},D_{0})$ as $D_{1}D_{0}$ so that $U\in\{10,11,00,01\}$. A central property is that $U$ is unaffected by the treatment and can be treated as a baseline covariate; therefore causal comparisons conditional on $U$ are well-defined subgroup causal effects. We define $e_{d_{1}d_{0}}({\bm{X}})=f_{U|{\bm{X}}}(d_{1}d_{0}|{\bm{X}})$ and $e_{d_{1}d_{0}}=f_{U}(d_{1}d_{0})$ as the proportion of principal stratum $U$ conditional on and marginalized over covariates ${\bm{X}}$, where $e_{d_{1}d_{0}}({\bm{X}})$ is referred to as the principal score (Ding and Lu, 2017). Since the stratum membership $U=D_{1}D_{0}$ is only partially observed, the principal score $e_{d_{1}d_{0}}({\bm{X}})$ and its marginal counterpart $e_{d_{1}d_{0}}$ cannot be estimated without further assumptions.

The PCE is defined as the effect of treatment assignment in each principal stratum (Jo and Stuart, 2009; Ding and Lu, 2017) and is written as:

$$\text{PCE}_{d_{1}d_{0}}=\mathbb{E}\left[Y_{1}-Y_{0}|U=d_{1}d_{0}\right]\quad(d_{1}d_{0}=10,11,00,01)$$

which equals $\mathbb{E}\left[Y_{1M_{1}}-Y_{0M_{0}}|U=d_{1}d_{0}\right]$ by composition of the potential outcome. To assess mediation, we decompose $\text{PCE}_{d_{1}d_{0}}$ into a principal natural indirect effect ($\text{PNIE}_{d_{1}d_{0}}$) and a principal natural direct effect ($\text{PNDE}_{d_{1}d_{0}}$):

$$\underbrace{\mathbb{E}\left[Y_{1M_{1}}-Y_{0M_{0}}|U=d_{1}d_{0}\right]}_{\text{PCE}_{d_{1}d_{0}}}=\underbrace{\mathbb{E}\left[Y_{1M_{1}}-Y_{1M_{0}}|U=d_{1}d_{0}\right]}_{\text{PNIE}_{d_{1}d_{0}}}+\underbrace{\mathbb{E}\left[Y_{1M_{0}}-Y_{0M_{0}}|U=d_{1}d_{0}\right]}_{\text{PNDE}_{d_{1}d_{0}}}. \tag{1}$$

Intuitively, $\text{PNDE}_{d_{1}d_{0}}$ captures the effect of treatment assignment on outcome among units in stratum $d_{1}d_{0}$ when the mediator is fixed to its natural value without treatment. The $\text{PNIE}_{d_{1}d_{0}}$, on the other hand, captures the mean difference of the potential outcomes among units in stratum $d_{1}d_{0}$, when the assignment $Z=1$, but the mediator changes from its natural value under treatment to its counterfactual value under control. Therefore, $\text{PNIE}_{d_{1}d_{0}}$ measures the extent to which the causal effect of treatment assignment is mediated through $M$ among the subpopulation in stratum $d_{1}d_{0}$. Similarly, the intention-to-treat effect (ITT), defined as $\mathbb{E}[Y_{1}-Y_{0}]$, can be decomposed in the usual fashion into an intention-to-treat natural indirect effect (ITT-NIE) and an intention-to-treat natural direct effect (ITT-NDE):

$$\underbrace{\mathbb{E}\left[Y_{1M_{1}}-Y_{0M_{0}}\right]}_{\text{ITT}}=\underbrace{\mathbb{E}\left[Y_{1M_{1}}-Y_{1M_{0}}\right]}_{\text{ITT-NIE}}+\underbrace{\mathbb{E}\left[Y_{1M_{0}}-Y_{0M_{0}}\right]}_{\text{ITT-NDE}}. \tag{2}$$

One can verify that ITT is a weighted average of the PCEs such that $\text{ITT}=\sum_{d_{1}d_{0}\in\mathcal{U}_{\rm{all}}}e_{d_{1}d_{0}}\times\text{PCE}_{d_{1}d_{0}}$, where $\mathcal{U}_{\rm{all}}=\{10,11,00,01\}$. Similarly, $\text{ITT-NIE}=\sum_{d_{1}d_{0}\in\mathcal{U}_{\rm{all}}}e_{d_{1}d_{0}}\times\text{PNIE}_{d_{1}d_{0}}$ and $\text{ITT-NDE}=\sum_{d_{1}d_{0}\in\mathcal{U}_{\rm{all}}}e_{d_{1}d_{0}}\times\text{PNDE}_{d_{1}d_{0}}$.
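The weighted-average representation can be checked in one line from the law of total expectation, using only the definitions of the strata proportions and the PCE given above:

$$\text{ITT}=\mathbb{E}[Y_{1}-Y_{0}]=\mathbb{E}\big\{\mathbb{E}[Y_{1}-Y_{0}|U]\big\}=\sum_{d_{1}d_{0}\in\mathcal{U}_{\rm{all}}}f_{U}(d_{1}d_{0})\,\mathbb{E}[Y_{1}-Y_{0}|U=d_{1}d_{0}]=\sum_{d_{1}d_{0}\in\mathcal{U}_{\rm{all}}}e_{d_{1}d_{0}}\times\text{PCE}_{d_{1}d_{0}}.$$

The same argument applied to $Y_{1M_{1}}-Y_{1M_{0}}$ and to $Y_{1M_{0}}-Y_{0M_{0}}$ gives the expressions for ITT-NIE and ITT-NDE.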

In what follows, we focus on identification of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\mathbb{E}\left[Y_{zM_{z^{\prime}}}|U=d_{1}d_{0}\right]$ for any $z$ and $z^{\prime}\in\{0,1\}$, based on which all PCEs and their effect decompositions in (1) can be obtained. Notice that ITT, ITT-NIE, and ITT-NDE can also be obtained, as they are weighted averages of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$. Identification of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ requires the following structural assumptions.

Assumption 1

(Consistency) For any $z$, $d$, and $m$, we have $D_{z}=D$ if $Z=z$, $M_{zd}=M$ if $Z=z$ and $D=d$, and $Y_{zdm}=Y$ if $Z=z$, $D=d$ and $M=m$.

Assumption 2

(Ignorability of the treatment assignment) $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|{\bm{X}}$ $\forall$ $z$, $z^{\prime}$, $z^{*}$, $d^{\prime}$, and $m^{*}$, where "$\perp\!\!\!\perp$" stands for independence.

Assumption 1 is commonly invoked to exclude unit-level interference and enables us to connect the observed variables with their potential values. Assumption 2 is the no unmeasured confounding condition for treatment assignment that is often required to identify the ITT estimand in the absence of randomization. It is considered plausible when sufficient baseline covariates ${\bm{X}}$ are collected such that no hidden confounders would give rise to systematic differences between the post-randomization variables in the treated and the control groups. A stronger statement of Assumption 2, $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}}\}\perp\!\!\!\perp Z$, is often satisfied in randomized experiments.

We additionally require the monotonicity of treatment on the post-treatment event to identify the distribution of the principal stratum membership $U$. We consider two types of monotonicity: standard monotonicity (Assumption 3a) and strong monotonicity (Assumption 3b). The standard version requires that treatment has a non-negative impact on the post-treatment event, whereas the stronger version further assumes $D_{0}=0$. Strong monotonicity is satisfied under one-sided noncompliance where all control units had no access to treatment (Frölich and Melly, 2013).

Assumption 3

(Monotonicity) (a) Under standard monotonicity, $D_{1}\geq D_{0}$ for all units; (b) under strong monotonicity, $D_{0}=0$ for all units.

Assumption 3a rules out defiers ($U=01$), and enables the identification of the principal scores $e_{d_{1}d_{0}}({\bm{X}})$ for the remaining three principal strata ($U\in\{10,11,00\}$) (Ding and Lu, 2017). Define $p_{zd}({\bm{X}})=f_{D|Z,{\bm{X}}}(d|z,{\bm{X}})$ and $p_{zd}=\mathbb{E}[p_{zd}({\bm{X}})]$ for $z,d\in\{0,1\}$. Because the observed data with $(Z=0,D=1)$ include only always-takers, we have $e_{11}({\bm{X}})=p_{01}({\bm{X}})$. Similarly, $e_{00}({\bm{X}})=p_{10}({\bm{X}})$, and $e_{10}({\bm{X}})=p_{11}({\bm{X}})-p_{01}({\bm{X}})$ since $\sum_{d_{1}d_{0}\in\mathcal{U}_{\text{all}}}e_{d_{1}d_{0}}({\bm{X}})=1$. The strata proportions are $e_{10}=p_{11}-p_{01}$, $e_{00}=p_{10}$, and $e_{11}=p_{01}$. Assumption 3b rules out both always-takers and defiers ($U=11$ and 01). Under strong monotonicity, the principal scores are given by $e_{10}({\bm{X}})=p_{11}({\bm{X}})$ and $e_{00}({\bm{X}})=p_{10}({\bm{X}})$, and the strata proportions are $e_{10}=p_{11}$ and $e_{00}=p_{10}$. To unify the presentation under standard and strong monotonicity, we re-express $e_{d_{1}d_{0}}({\bm{X}})$ and $e_{d_{1}d_{0}}$ as:

$$e_{d_{1}d_{0}}({\bm{X}})=p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}),\quad e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01} \tag{3}$$

for $d_{1}d_{0}\in\{10,00,11,01\}$, where $k=|d_{1}-d_{0}|$ and $z^{*}d^{*}=11$, 10, 01, and 01 if $d_{1}d_{0}=10$, 00, 11, and 01, respectively. Note that $p_{01}({\bm{X}})\equiv p_{01}\equiv 0$ under strong monotonicity. Because $\{e_{d_{1}d_{0}}({\bm{X}}),e_{d_{1}d_{0}}\}$ is equivalent to $\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}),p_{z^{*}d^{*}}-kp_{01}\}$, we will use them interchangeably.
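As an illustration of (3), the following minimal sketch maps the two event probabilities $p_{11}({\bm{X}})$ and $p_{01}({\bm{X}})$ to the three principal scores. The function name `principal_scores` and its arguments are hypothetical and introduced only for this example; the inputs are assumed to be known or already estimated, and the sketch is not the estimation procedure of the paper.

```python
import numpy as np

def principal_scores(p11, p01, strong=False):
    """Principal scores implied by equation (3) (illustrative helper).

    p11: P(D = 1 | Z = 1, X) evaluated at each unit's covariates.
    p01: P(D = 1 | Z = 0, X); treated as identically 0 when strong=True.
    Returns a dict keyed by stratum label d1d0.
    """
    p11 = np.asarray(p11, dtype=float)
    p01 = np.zeros_like(p11) if strong else np.asarray(p01, dtype=float)
    if np.any(p11 < p01):
        # Monotonicity implies p11(X) >= p01(X) at every covariate value.
        raise ValueError("inputs violate monotonicity: p11 < p01")
    return {
        "10": p11 - p01,   # compliers: p11 - k * p01 with k = 1
        "00": 1.0 - p11,   # never-takers: p10 = 1 - p11
        "11": p01,         # always-takers: p01
    }

# Hypothetical probabilities for two covariate profiles.
scores = principal_scores([0.7, 0.9], [0.2, 0.1])
total = scores["10"] + scores["00"] + scores["11"]
```

By construction the three scores sum to one at every covariate value, which mirrors the constraint used in the text to obtain $e_{10}({\bm{X}})$.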

We next introduce two additional ignorability assumptions for mediation analysis within principal strata.

Assumption 4

(Generalized principal ignorability) $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp U|{\bm{X}}$ $\forall$ $z$, $z^{\prime}$, $d$, $m^{\prime}$.

Principal ignorability has been previously introduced to identify the PCEs (Jo and Stuart, 2009; Ding and Lu, 2017; Forastiere et al., 2018). Assumption 4 generalizes the usual assumption to accommodate the mediator as an additional intermediate outcome. This assumption requires sufficient pre-treatment covariates to remove the confounding between $U$ and $M$ and that between $U$ and $Y$; in other words, no systematic differences exist in the distribution of the potential mediator and outcome across principal strata, given covariates. Next, we require ignorability of the mediator (Yamamoto, 2013; Park and Kürüm, 2020):

Assumption 5

(Ignorability of the mediator) $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|\{Z,U,{\bm{X}}\}$ $\forall$ $z$, $z^{\prime}$, $d$, $d^{\prime}$, $m^{\prime}$.

Assumption 5 assumes that the potential mediator is independent of the potential outcome, given the observed covariates ${\bm{X}}$, within each assignment group and principal stratum. This assumption rules out unmeasured baseline confounders in the mediator-outcome relationship and requires that, apart from $D$, there are no other treatment-induced confounders affecting the mediator-outcome relationship. Assumption 5, coupled with Assumptions 2 and 4, generalizes the standard sequential ignorability assumption for causal mediation analysis (Imai et al., 2010) to address a post-randomization event $D$. In addition, when Assumptions 2–4 hold, Assumption 5 is equivalent to $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}}$ without the need to condition on treatment assignment and principal stratum (see Lemma S6 in the Supplementary Material). Lastly, we state the following positivity assumption.

Assumption 6

(Positivity) Assume that $f_{Z|{\bm{X}}}(z|{\bm{x}})>0$, $f_{D|Z,{\bm{X}}}(d|1,{\bm{x}})>0$, $f_{D|Z,{\bm{X}}}(0|0,{\bm{x}})>0$, $f_{M|Z,D,{\bm{X}}}(m|1,d,{\bm{x}})>0$, and $f_{M|Z,D,{\bm{X}}}(m|0,0,{\bm{x}})>0$ for any $z$, $d$, $m$, and ${\bm{x}}$. Additionally assume $p_{11}-p_{01}>0$ with $f_{D|Z,{\bm{X}}}(1|0,{\bm{x}})>0$ and $f_{M|Z,D,{\bm{X}}}(m|0,1,{\bm{x}})>0$ under standard monotonicity.

Let $\mathcal{U}_{\text{a}}=\{10,00,11\}$ be the three strata under standard monotonicity and let $\mathcal{U}_{\text{b}}=\{10,00\}$ be the two strata under strong monotonicity. Theorem 1 below shows that $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ is nonparametrically identified under the aforementioned assumptions.

Theorem 1

(Nonparametric identification) Suppose that Assumptions 1–6 hold. For any $z,z^{\prime}\in\{0,1\}$, $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under standard monotonicity, and $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under strong monotonicity, $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ is identified as follows:

$$\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\int_{{\bm{x}}}\frac{e_{d_{1}d_{0}}({\bm{x}})}{e_{d_{1}d_{0}}}\int_{m}\mathbb{E}_{Y|Z,D,M,{\bm{X}}}\left[Y|z,d_{z},m,{\bm{x}}\right]d\mathbb{P}_{M|Z,D,{\bm{X}}}\left(m|z^{\prime},d_{z^{\prime}},{\bm{x}}\right)d\mathbb{P}_{{\bm{X}}}\left({\bm{x}}\right),$$

where $d_{z}=\mathbb{I}(z=1)d_{1}+\mathbb{I}(z=0)d_{0}$ and $d_{z^{\prime}}=\mathbb{I}(z^{\prime}=1)d_{1}+\mathbb{I}(z^{\prime}=0)d_{0}$. Here, $e_{d_{1}d_{0}}({\bm{x}})=p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})$ and $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ are identified in (3).
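To make the structure of the identification formula concrete, the sketch below evaluates it numerically in a simplified setting that is assumed purely for illustration: a discrete covariate, a binary mediator, standard monotonicity, and nuisance functions that are known rather than estimated. The function `theta` and the array layouts `px`, `p`, `fm`, and `mu` are hypothetical names, and the numbers are made up; this is a plug-in evaluation of the formula, not the multiply robust estimator developed in the paper.

```python
import numpy as np

def theta(z, zp, d1, d0, px, p, fm, mu):
    """Plug-in value of theta_{d1 d0}^{(z z')} for discrete X and binary M.

    px[x]          = P(X = x)
    p[z, x]        = P(D = 1 | Z = z, X = x)
    fm[z, d, x]    = P(M = 1 | Z = z, D = d, X = x)
    mu[z, d, m, x] = E[Y | Z = z, D = d, M = m, X = x]
    """
    # Principal scores under standard monotonicity (no defiers).
    scores = {(1, 0): p[1] - p[0], (0, 0): 1.0 - p[1], (1, 1): p[0]}
    e_x = scores[(d1, d0)]
    e = np.sum(px * e_x)            # marginal stratum proportion
    d_z = d1 if z == 1 else d0      # level of D entering the outcome mean
    d_zp = d1 if zp == 1 else d0    # level of D entering the mediator law
    pm1 = fm[zp, d_zp]              # P(M = 1 | z', d_{z'}, x)
    # Inner integral over m, written as a two-term sum for binary M.
    inner = mu[z, d_z, 1] * pm1 + mu[z, d_z, 0] * (1.0 - pm1)
    # Outer integral over x with weights e(x) / e.
    return float(np.sum(px * e_x / e * inner))

# Hypothetical nuisance values, chosen only for illustration.
px = np.array([0.5, 0.5])
p = np.array([[0.2, 0.2], [0.6, 0.8]])
fm = np.full((2, 2, 2), 0.3)
fm[1, 1, :] = 0.5
zg, dg, mg, xg = np.indices((2, 2, 2, 2))
mu = zg + 2.0 * mg + xg             # E[Y | z, d, m, x] = z + 2m + x

# Complier (stratum 10) natural indirect and direct effects.
pnie_10 = theta(1, 1, 1, 0, px, p, fm, mu) - theta(1, 0, 1, 0, px, p, fm, mu)
pnde_10 = theta(1, 0, 1, 0, px, p, fm, mu) - theta(0, 0, 1, 0, px, p, fm, mu)
```

In this toy configuration the outcome mean is linear in $m$ with slope 2, so the complier indirect effect reduces to twice the shift in the mediator probability between the $(z^{\prime},d_{z^{\prime}})=(1,1)$ and $(0,0)$ cells.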

By Theorem 1, we have $\text{PNDE}_{d_{1}d_{0}}=\theta^{(10)}_{d_{1}d_{0}}-\theta^{(00)}_{d_{1}d_{0}}$ and $\text{PNIE}_{d_{1}d_{0}}=\theta^{(11)}_{d_{1}d_{0}}-\theta^{(10)}_{d_{1}d_{0}}$, and the decomposition of the ITT effect can also be identified by

$$\text{ITT-NDE}=\sum_{d_{1}d_{0}\in\mathcal{U}}e_{d_{1}d_{0}}\times\left(\theta^{(10)}_{d_{1}d_{0}}-\theta^{(00)}_{d_{1}d_{0}}\right),\quad\text{ITT-NIE}=\sum_{d_{1}d_{0}\in\mathcal{U}}e_{d_{1}d_{0}}\times\left(\theta^{(11)}_{d_{1}d_{0}}-\theta^{(10)}_{d_{1}d_{0}}\right), \tag{4}$$

where $\mathcal{U}=\mathcal{U}_{\text{a}}$ under standard monotonicity and $\mathcal{U}=\mathcal{U}_{\text{b}}$ under strong monotonicity.
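The aggregation in (4) is a weighted sum over strata, which the following short sketch spells out. The helper `itt_decomposition` and the dictionary layout are hypothetical, and the stratum proportions and $\theta$ values are made-up numbers used only to show the bookkeeping.

```python
import math

def itt_decomposition(e, theta):
    """Aggregate stratum-specific quantities into (ITT-NDE, ITT-NIE) as in (4).

    e[u]              = proportion of stratum u
    theta[u][(z, z')] = theta_u^{(z z')} for that stratum
    """
    nde = sum(e[u] * (theta[u][(1, 0)] - theta[u][(0, 0)]) for u in e)
    nie = sum(e[u] * (theta[u][(1, 1)] - theta[u][(1, 0)]) for u in e)
    return nde, nie

# Hypothetical values under standard monotonicity (strata 10, 00, 11).
e = {"10": 0.5, "00": 0.3, "11": 0.2}
theta = {
    "10": {(1, 1): 2.6, (1, 0): 2.2, (0, 0): 1.2},
    "00": {(1, 1): 1.5, (1, 0): 1.4, (0, 0): 1.0},
    "11": {(1, 1): 3.0, (1, 0): 3.0, (0, 0): 2.5},
}
itt_nde, itt_nie = itt_decomposition(e, theta)
# The ITT effect is the weighted sum of theta^(11) - theta^(00) over strata.
itt = sum(e[u] * (theta[u][(1, 1)] - theta[u][(0, 0)]) for u in e)
```

Because each stratum-specific effect telescopes, `itt_nde + itt_nie` recovers `itt`, matching the decomposition in (2).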

Remark 1

Rudolph et al. (2024) considered instrumental variables to identify the interventional and natural mediation effects among compliers. In a comparable scenario, they showed that under (i) exclusion restriction, (ii) standard monotonicity, and (iii) sequential randomization (Assumption 2 plus $Y_{zm}\perp\!\!\!\perp M|Z,D,{\bm{X}}$), $\text{PNIE}_{10}$ is identified by $\text{ITT-NIE}/e_{10}$. A similar identification formula follows for $\text{PNDE}_{10}$. Due to exclusion restriction, the mediation effects among other strata are automatically zero, and thus only the compliers stratum contributes information to the ITT natural mediation effect. Replacing exclusion restriction with principal ignorability, Theorem 1 allows additional strata to contribute information to the ITT natural mediation effect and enables point identification of each stratum-specific natural mediation effect.
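The ratio form in Remark 1 follows directly from the weighted-average representation of ITT-NIE in Section 2. Under standard monotonicity $e_{01}=0$, and under exclusion restriction $\text{PNIE}_{11}=\text{PNIE}_{00}=0$, so that

$$\text{ITT-NIE}=e_{10}\,\text{PNIE}_{10}+e_{11}\,\text{PNIE}_{11}+e_{00}\,\text{PNIE}_{00}=e_{10}\,\text{PNIE}_{10}\quad\Longrightarrow\quad\text{PNIE}_{10}=\frac{\text{ITT-NIE}}{e_{10}},$$

provided $e_{10}>0$. Without exclusion restriction, the terms for the always-takers and never-takers need not vanish, which is why the ratio no longer isolates the complier effect.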

3 Connections to mediation analysis with a treatment-induced confounder or two mediators

Although we consider $D$ as a primary source to sub-classify the study population, there exist two complementary perspectives for the role of $D$ in causal mediation analysis; that is, $D$ can be viewed as a binary post-treatment confounder or another mediator sitting in the causal pathway between $Z$ and $M$. We discuss the connections between the present work and existing mediation methods for addressing treatment-induced confounding (Robins and Richardson, 2010; Tchetgen Tchetgen and VanderWeele, 2014; VanderWeele et al., 2014; Miles et al., 2020; Díaz et al., 2021; Xia and Chan, 2021) or two causally-ordered mediators (Albert and Nelson, 2011; Daniel et al., 2015; Steen et al., 2017; Zhou, 2022). When $D$ is considered as another mediator, methods have been proposed for identification of the path-specific effects through the four causal pathways given by Figure 1(a)–(d) (e.g., Daniel et al., 2015; Zhou, 2022). If $D$ is treated as a post-treatment confounder, methods have been developed for identifying different versions of mediation effects through $M$, including the interventional mediation effects (VanderWeele et al., 2014; Díaz et al., 2021), the natural mediation effects (Robins and Richardson, 2010; Tchetgen Tchetgen and VanderWeele, 2014; Xia and Chan, 2021), and the path-specific effect on the causal pathway $Z\rightarrow M\rightarrow Y$ in Figure 1(c) (VanderWeele et al., 2014; Miles et al., 2017, 2020).

In contrast to methods that view $D$ as a post-treatment confounder, our work addresses a different scientific question. Both our approach and methods for two causally-ordered mediators aim to disentangle the roles of $M$ and $D$ in jointly explaining the causal mechanism, whereas mediation methods with a post-treatment confounder focus on the primary role of $M$ in explaining the causal mechanism. For example, Tchetgen Tchetgen and VanderWeele (2014) and Xia and Chan (2021) studied identification of the natural mediation effects defined in (2), which summarize the causal sequence through $M$ on the outcome marginalized over different levels of $D$. Similarly, the interventional mediation effects in VanderWeele et al. (2014) and Díaz et al. (2021) only considered the causal sequence through mediator $M$ to define the estimand of interest. Finally, Miles et al. (2017) and Miles et al. (2020) mainly considered the path-specific effects on the causal pathway $Z\rightarrow M\rightarrow Y$ in Figure 1(c), where other causal pathways passing through $D$ were assumed of less interest.

Comparing the present work to methods for identifying path-specific effects, a notable difference lies in the causal estimands of interest. We specifically focus on decomposing the causal effects within endogenous subgroups characterized by the joint potential values of $D$, whereas the path-specific effects are defined for the entire study population. A further difference lies in the identification assumptions. Identifying path-specific effects requires certain ignorability assumptions regarding the observed post-treatment variable $D$ directly. For example, Daniel et al. (2015) require that $M_{zd}\perp\!\!\!\perp D|\{Z,{\bm{X}}\}$ for any $d$ and $z$. On the other hand, our identification assumptions require the use of the potential values of $D$ to define the principal stratum ($U$) and then invoke ignorability assumptions across the principal stratum $U=D_{1}D_{0}$.

Despite the aforementioned differences, there exist mathematical connections across the requisite identification conditions. We provide two further remarks about such connections, in particular to the work by Tchetgen Tchetgen and VanderWeele (2014) (on treatment-induced confounding) and Zhou (2022) (on two causally-ordered mediators). All proofs are provided in the Supplementary Material.

Remark 2

Tchetgen Tchetgen and VanderWeele (2014) used a nonparametric structural equations model with independent errors (NPSEM-IE) for the DAG in Figure 1, coupled with monotonicity, to identify the natural mediation effects (2) with a binary post-treatment confounder. Suppose that consistency (Assumption 1) and monotonicity (either Assumption 3a or 3b) hold. If, in addition, the NPSEM-IE for the DAG in Figure 1 (i.e., Assumption S1 in the Supplementary Material) holds, then Assumptions 2, 4, and 5 also hold, but not vice versa.

By Remark 2, under consistency and monotonicity, the ignorability assumptions (Assumptions 2, 4, and 5) are implied by an NPSEM-IE corresponding to the DAG in Figure 1. Remark 2 further implies that the identification formulas for the natural mediation effects (4) are equivalent to the identification formulas in Tchetgen Tchetgen and VanderWeele (2014), except that the present work invokes technically weaker assumptions.

Remark 3

Zhou (2022) considered a set of generalized sequential ignorability assumptions to identify path-specific effects with multiple mediators, and they are comparable to the present work in the special case when $D$ is binary. Suppose that consistency (Assumption 1) holds. Then, under monotonicity (either Assumption 3a or 3b), the set of generalized sequential ignorability assumptions in Zhou (2022) (i.e., Assumption S2 in the Supplementary Material) is equivalent to Assumptions 2, 4, and 5.

By Remark 3, the assumptions in the present work are stronger than those in Zhou (2022), since the latter does not require monotonicity. This is expected, as stronger assumptions are necessary to identify our finer-grained estimands that provide insights into the pathways within each subpopulation. Finally, if monotonicity of the effect of the treatment $Z$ on the first mediator $D$ is plausible, our assumptions are equivalent to the set of generalized sequential ignorability assumptions in Zhou (2022).

4 Estimation of natural mediation effects

4.1 Nuisance functions and parametric working models

We first define several nuisance functions of the observed-data distribution. Let $\pi_{z}(\bm{x})=\mathbb{P}_{Z|\bm{X}}(z|\bm{x})$ be the probability of treatment conditional on $\bm{X}$, where $\pi_{1}(\bm{x})$ is the propensity score. Note that $\pi_{z}(\bm{x})$ degenerates to a constant value $\pi_{z}$ in randomized experiments. Let $r_{zd}(m,\bm{x})=f_{M|Z,D,\bm{X}}(m|z,d,\bm{x})$ be the conditional density (or probability mass) of the mediator given $Z$, $D$, and $\bm{X}$. Let $\mu_{zd}(m,\bm{x})=\mathbb{E}_{Y|Z,D,M,\bm{X}}[Y|z,d,m,\bm{x}]$ be the conditional expectation of $Y$ given $Z$, $D$, $M$, and $\bm{X}$. Let $h_{nuisance}=\{\pi_{z}(\bm{x}),p_{zd}(\bm{x}),r_{zd}(m,\bm{x}),\mu_{zd}(m,\bm{x})\}$ contain all nuisance functions, where $p_{zd}(\bm{x})=f_{D|Z,\bm{X}}(d|z,\bm{x})$ is defined in Section 2. It should be noted that, within our definitions of the nuisance functions, the two variables $(Z,D)$, which directly relate to the principal strata, are presented as subscripts, while all other variables are presented as arguments.

One can specify parametric working models $h_{nuisance}^{\text{par}}=\{\pi_{z}^{\text{par}}(\bm{x}),p_{zd}^{\text{par}}(\bm{x}),r_{zd}^{\text{par}}(m,\bm{x}),\mu_{zd}^{\text{par}}(m,\bm{x})\}$ for $h_{nuisance}$. Specification of the parametric working models can be flexible. For example, logistic regressions can be used for $\pi_{z}^{\text{par}}(\bm{x})$ and $p_{zd}^{\text{par}}(\bm{x})$. When the mediator is continuous or binary, a linear regression or a logistic regression, respectively, can be employed for $r_{zd}^{\text{par}}(m,\bm{x})$. Similarly, a generalized linear model can be used for $\mu_{zd}^{\text{par}}(m,\bm{x})$. Detailed model examples are provided in the Supplementary Material. Hereafter, we use $\mathcal{M}_{\pi}$ to denote the submodel of the nonparametric model $\mathcal{M}_{np}$ with a correctly specified $\pi_{z}^{\text{par}}(\bm{x})$ for $\pi_{z}(\bm{x})$ and unspecified other components. Analogously, define $\mathcal{M}_{e}$, $\mathcal{M}_{m}$, and $\mathcal{M}_{o}$ as the submodels of $\mathcal{M}_{np}$ with a correctly specified $p_{zd}^{\text{par}}(\bm{x})$, $r_{zd}^{\text{par}}(m,\bm{x})$, and $\mu_{zd}^{\text{par}}(m,\bm{x})$, respectively. In addition, we use "$\cup$" and "$\cap$" to denote union and intersection of submodels such that $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}$ denotes the correct specification of both $\pi_{z}^{\text{par}}(\bm{x})$ and $p_{zd}^{\text{par}}(\bm{x})$, and $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ denotes the correct specification of either $\pi_{z}^{\text{par}}(\bm{x})$ or $p_{zd}^{\text{par}}(\bm{x})$.

As suggested by Theorem 1, one also needs to estimate $e_{d_{1}d_{0}}$, or equivalently $p_{zd}\equiv\mathbb{E}[p_{zd}(\bm{X})]$, in order to estimate $\theta_{d_{1}d_{0}}^{(zz')}$. There are multiple ways to estimate $p_{zd}$: one can simply use the plug-in estimator $\mathbb{P}_{n}[\widehat{p}_{zd}^{\text{par}}(\bm{X})]$ or the inverse probability weighting estimator $\mathbb{P}_{n}\left[\mathbb{I}(Z=z,D=d)/\widehat{\pi}_{z}^{\text{par}}(\bm{X})\right]$. In randomized experiments, because $\pi_{z}(\bm{X})=\pi_{z}$, one can also estimate $p_{zd}$ by $\mathbb{P}_{n}[\mathbb{I}(Z=z,D=d)]/\mathbb{P}_{n}[\mathbb{I}(Z=z)]$. In this article, we consider the doubly robust estimator developed in Jiang et al. (2022),

$$\widehat{p}_{zd}^{\text{dr}}=\mathbb{P}_{n}\left[\frac{\mathbb{I}(Z=z)\left\{\mathbb{I}(D=d)-\widehat{p}_{zd}^{\text{par}}(\bm{X})\right\}}{\widehat{\pi}_{z}^{\text{par}}(\bm{X})}+\widehat{p}_{zd}^{\text{par}}(\bm{X})\right], \tag{5}$$

which is consistent for $p_{zd}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ and is locally efficient under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}$.
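As a minimal sketch of the doubly robust estimator in (5), the following Python function averages the augmented inverse probability weighting terms over the sample. The function name `dr_proportion` and its argument names are hypothetical and not taken from the paper; the fitted values `pi_hat` and `p_hat` are assumed to come from working models estimated elsewhere.

```python
from typing import Sequence


def dr_proportion(
    z_obs: Sequence[int],
    d_obs: Sequence[int],
    pi_hat: Sequence[float],
    p_hat: Sequence[float],
    z: int,
    d: int,
) -> float:
    """Doubly robust estimate of p_zd = E[p_zd(X)], following Eq. (5).

    pi_hat[i] is the fitted P(Z = z | X_i) and p_hat[i] is the fitted
    P(D = d | Z = z, X_i); both are assumed inputs (hypothetical names).
    """
    n = len(z_obs)
    total = 0.0
    for zi, di, pi_i, p_i in zip(z_obs, d_obs, pi_hat, p_hat):
        in_arm = 1.0 if zi == z else 0.0          # I(Z = z)
        residual = (1.0 if di == d else 0.0) - p_i  # I(D = d) - p_zd(X)
        total += in_arm * residual / pi_i + p_i
    return total / n
```

With constant fitted values (as in a randomized experiment with no covariate adjustment), the estimate reduces to the observed proportion of $D=d$ in arm $Z=z$.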

4.2 Moment-type estimators

We provide four distinct identification expressions of $\theta_{d_{1}d_{0}}^{(zz')}$; each expression uses only part, but not all, of the nuisance functions $h_{nuisance}$ and the principal stratum proportion $p_{zd}$.

Theorem 2

For $z,z'\in\{0,1\}$ and $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ or $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under standard or strong monotonicity, respectively, we have $\theta_{d_{1}d_{0}}^{(zz')}=\theta^{(zz'),\text{a}}_{d_{1}d_{0}}=\theta^{(zz'),\text{b}}_{d_{1}d_{0}}=\theta^{(zz'),\text{c}}_{d_{1}d_{0}}=\theta^{(zz'),\text{d}}_{d_{1}d_{0}}$, where

$$
\begin{aligned}
\theta^{(zz'),\text{a}}_{d_{1}d_{0}} &=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\,\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}(\bm{X})\pi_{z}(\bm{X})}\,\frac{r_{z'd_{z'}}(M,\bm{X})}{r_{zd_{z}}(M,\bm{X})}\,Y\right],\\
\theta^{(zz'),\text{b}}_{d_{1}d_{0}} &=\mathbb{E}\left[\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}(\bm{X})}-k\frac{(1-Z)D}{\pi_{0}(\bm{X})}\right\}\frac{\eta_{zz'}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\right],\\
\theta^{(zz'),\text{c}}_{d_{1}d_{0}} &=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\,\frac{\mathbb{I}(D=d_{z'},Z=z')}{p_{z'd_{z'}}(\bm{X})\pi_{z'}(\bm{X})}\,\mu_{zd_{z}}(M,\bm{X})\right],\\
\theta^{(zz'),\text{d}}_{d_{1}d_{0}} &=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\,\eta_{zz'}(\bm{X})\right],
\end{aligned}
$$

with $\eta_{zz'}(\bm{X})=\int_{m}\mu_{zd_{z}}(m,\bm{X})\,r_{z'd_{z'}}(m,\bm{X})\,dm$, $k=|d_{1}-d_{0}|$, and $z^{*}d^{*}=11$, $10$, and $01$ if $d_{1}d_{0}=10$, $00$, and $11$, respectively.

The first expression is a weighted average of the outcome, where the weight is the product of four components. The first component, $\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}=e_{d_{1}d_{0}}(\bm{X})/e_{d_{1}d_{0}}$, is the principal score weight for creating a pseudo-population within stratum $U=d_{1}d_{0}$ (Jiang et al., 2022). The remaining three components in $\theta^{(zz'),\text{a}}_{d_{1}d_{0}}$ (the inverse probability of treatment weight, the inverse probability of the post-treatment event weight, and the mediator density ratio weight) correct for selection bias associated with the treatment, the post-treatment event, and the observed mediator value, within the pseudo-population created by the principal score weight. The second expression is a product of two components, where the first component, $\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}(\bm{X})}-k\frac{(1-Z)D}{\pi_{0}(\bm{X})}\right\}\Big/(p_{z^{*}d^{*}}-kp_{01})$, plays a role similar to that of the principal score weight in creating a pseudo-population of stratum $U=d_{1}d_{0}$, and the second component, $\eta_{zz'}(\bm{X})=\mathbb{E}[Y_{zM_{z'}}|U=d_{1}d_{0},\bm{X}]$, is a conditional version of $\theta_{d_{1}d_{0}}^{(zz')}$ given fixed values of the baseline covariates and within stratum $U=d_{1}d_{0}$. The third expression resembles the first in that both use the principal score weight, except that the third expression uses a slightly different weighting scheme coupled with the conditional expectation of the outcome, $\mu_{zd_{z}}(M,\bm{X})$, instead of weighting the observed outcome directly. The fourth expression has a form similar to the second expression, but now involves the product of the principal score weight and $\eta_{zz'}(\bm{X})$.

According to Theorem 2, we can obtain the four moment-type estimators, $\{\widehat{\theta}^{(zz'),\text{a}}_{d_{1}d_{0}},\widehat{\theta}^{(zz'),\text{b}}_{d_{1}d_{0}},\widehat{\theta}^{(zz'),\text{c}}_{d_{1}d_{0}},\widehat{\theta}^{(zz'),\text{d}}_{d_{1}d_{0}}\}$, by replacing the unknown nuisance functions with their estimates from parametric working models and substituting the outer expectation operator $\mathbb{E}$ by the empirical average operator $\mathbb{P}_{n}$. As an example, $\widehat{\theta}^{(zz'),\text{d}}_{d_{1}d_{0}}$ is given by

$$\mathbb{P}_{n}\left[\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}(\bm{X})-k\widehat{p}_{01}^{\text{par}}(\bm{X})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\,\widehat{\eta}_{zz'}^{\text{par}}(\bm{X})\right],$$

where $\widehat{\eta}_{zz'}^{\text{par}}(\bm{X})=\int_{m}\widehat{\mu}_{zd_{z}}^{\text{par}}(m,\bm{X})\,\widehat{r}_{z'd_{z'}}^{\text{par}}(m,\bm{X})\,dm$. Here, the integral in $\widehat{\eta}_{zz'}^{\text{par}}(\bm{X})$ reduces to a simple summation when the mediator is categorical; if the mediator is continuous, numerical integration can be used for an approximate calculation. We summarize the asymptotic properties of the four moment-type estimators below.

Proposition 1

Suppose that the regularity conditions outlined in the Supplementary Material hold. Then, $\widehat{\theta}^{(zz'),\text{a}}_{d_{1}d_{0}}$, $\widehat{\theta}^{(zz'),\text{b}}_{d_{1}d_{0}}$, $\widehat{\theta}^{(zz'),\text{c}}_{d_{1}d_{0}}$, and $\widehat{\theta}^{(zz'),\text{d}}_{d_{1}d_{0}}$ are consistent and asymptotically normal under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$, and $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, respectively.
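The quantity $\eta_{zz'}(\bm{X})$ used by the moment-type estimators in Section 4.2 is an integral of the outcome regression against the mediator density. The sketch below illustrates, for one fixed covariate value, the two cases mentioned in the text: an exact sum for a categorical mediator and a trapezoidal approximation for a continuous mediator. The function names and the callables `mu` and `r` are hypothetical placeholders for fitted working models evaluated at that covariate value, and the trapezoidal rule is only one of many possible quadrature choices.

```python
from typing import Callable, Iterable


def eta_categorical(
    mu: Callable[[float], float],
    r: Callable[[float], float],
    support: Iterable[float],
) -> float:
    """Sum of mu(m) * r(m) over the mediator support (categorical mediator)."""
    return sum(mu(m) * r(m) for m in support)


def eta_trapezoid(
    mu: Callable[[float], float],
    r: Callable[[float], float],
    lower: float,
    upper: float,
    steps: int = 2000,
) -> float:
    """Trapezoidal approximation of the integral of mu(m) * r(m) on [lower, upper].

    The integration limits are assumed to cover essentially all of the
    mass of the mediator density r (an assumption of this sketch).
    """
    h = (upper - lower) / steps
    total = 0.5 * (mu(lower) * r(lower) + mu(upper) * r(upper))
    for j in range(1, steps):
        m = lower + j * h
        total += mu(m) * r(m)
    return total * h
```

For a binary mediator, `eta_categorical(mu, r, (0, 1))` returns $\mu(0)r(0)+\mu(1)r(1)$ at the chosen covariate value.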

4.3 From efficient influence function to multiply robust estimator

Let $\mathcal{M}_{np}$ denote the nonparametric model over the observed data density function $f_{\bm{O}}$. The efficient influence function (EIF) of $\theta_{d_{1}d_{0}}^{(zz')}$ under $\mathcal{M}_{np}$ is derived in Theorem 3 based on semiparametric estimation theory (Bickel et al., 1993), which also implies the semiparametric efficiency bound, i.e., the lower bound of the asymptotic variance among all regular and asymptotically linear estimators of $\theta_{d_{1}d_{0}}^{(zz')}$ under the nonparametric model $\mathcal{M}_{np}$.

Theorem 3

The EIF of $\theta_{d_{1}d_{0}}^{(zz')}$ over $\mathcal{M}_{np}$ is

$$\mathcal{D}^{(zz')}_{d_{1}d_{0}}(\bm{O})=\frac{\psi_{d_{1}d_{0}}^{(zz')}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz')}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}},$$

where

$$
\begin{aligned}
\psi_{d_{1}d_{0}}^{(zz')}(\bm{O})={}&\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}(\bm{X})\right\}}{\pi_{z^{*}}(\bm{X})}-k\frac{(1-Z)\left\{D-p_{01}(\bm{X})\right\}}{\pi_{0}(\bm{X})}\right)\eta_{zz'}(\bm{X})\\
&+\left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}(\bm{X})\pi_{z}(\bm{X})}\,\frac{r_{z'd_{z'}}(M,\bm{X})}{r_{zd_{z}}(M,\bm{X})}\left\{Y-\mu_{zd_{z}}(M,\bm{X})\right\}\\
&+\left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}\frac{\mathbb{I}(D=d_{z'},Z=z')}{p_{z'd_{z'}}(\bm{X})\pi_{z'}(\bm{X})}\left\{\mu_{zd_{z}}(M,\bm{X})-\eta_{zz'}(\bm{X})\right\}\\
&+\left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}\eta_{zz'}(\bm{X}),\\
\delta_{d_{1}d_{0}}(\bm{O})={}&\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}(\bm{X})\right\}}{\pi_{z^{*}}(\bm{X})}-k\frac{(1-Z)\left\{D-p_{01}(\bm{X})\right\}}{\pi_{0}(\bm{X})}+p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X}),
\end{aligned}
$$

$k=|d_{1}-d_{0}|$, and $z^{*}d^{*}=11$, $10$, and $01$ if $d_{1}d_{0}=10$, $00$, and $11$, respectively. Therefore, the semiparametric efficiency bound for estimation of $\theta_{d_{1}d_{0}}^{(zz')}$ is $\mathbb{E}\left[\left\{\mathcal{D}^{(zz')}_{d_{1}d_{0}}(\bm{O})\right\}^{2}\right]$.

Theorem 3 motivates a new estimator of $\theta_{d_{1}d_{0}}^{(zz')}$, obtained by solving the following EIF-induced estimating equation

$$\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(zz')}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz')}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]=0,$$

where $\psi_{d_{1}d_{0}}^{(zz')}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ depend on the nuisance functions $h_{nuisance}$, and the denominator $p_{z^{*}d^{*}}-kp_{01}$ is a constant that does not affect the solution of the estimating equation. Therefore, the new estimator, which we hereafter refer to as the multiply robust estimator, can be constructed as

$$\widehat{\theta}^{(zz'),\text{mr}}_{d_{1}d_{0}}=\frac{\mathbb{P}_{n}\left[\widehat{\psi}_{d_{1}d_{0}}^{(zz'),\text{par}}(\bm{O})\right]}{\mathbb{P}_{n}\left[\widehat{\delta}_{d_{1}d_{0}}^{\text{par}}(\bm{O})\right]}.$$

Theorem 4 summarizes the asymptotic properties of the multiply robust estimator.

Theorem 4

Suppose that the regularity conditions outlined in the Supplementary Material hold. Under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, the multiply robust estimator $\widehat{\theta}^{(zz'),\text{mr}}_{d_{1}d_{0}}$ is consistent and asymptotically normal such that $\sqrt{n}\left(\widehat{\theta}^{(zz'),\text{mr}}_{d_{1}d_{0}}-\theta_{d_{1}d_{0}}^{(zz')}\right)$ converges to a zero-mean normal distribution with finite variance $V_{\text{mr}}$. Moreover, $V_{\text{mr}}$ achieves the semiparametric efficiency bound under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

An attractive property of $\widehat{\theta}^{(zz'),\text{mr}}_{d_{1}d_{0}}$ is that it offers four types of protection against misspecification of the parametric working models. Notice that the four moment-type estimators provided in Section 4.2 are only singly robust; for example, $\widehat{\theta}^{(zz'),\text{a}}_{d_{1}d_{0}}$ is only consistent under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$. By contrast, $\widehat{\theta}^{(zz'),\text{mr}}_{d_{1}d_{0}}$ is quadruply robust in that it is consistent for $\theta_{d_{1}d_{0}}^{(zz')}$ even if one of the four working models, $\mathcal{M}_{\pi}$, $\mathcal{M}_{e}$, $\mathcal{M}_{m}$, and $\mathcal{M}_{o}$, is misspecified. In addition, $\widehat{\theta}^{(zz'),\text{mr}}_{d_{1}d_{0}}$ is locally efficient when all four working models are correctly specified. A proof of the quadruple robustness property is given in the Supplementary Material. As a caveat, quadruple robustness is more stringent than the double robustness property, as the former requires three out of four working models to be correct whereas the latter only requires one out of two working models to be correct. In practice, one can use the nonparametric bootstrap to construct the standard error and confidence interval of $\widehat{\theta}^{(zz'),\text{mr}}_{d_{1}d_{0}}$.

Remark 4

The monotonicity assumption can place a restriction on the observed data density $f_{\bm{O}}$; that is, standard monotonicity implies that $f_{D|Z,\bm{X}}(1|1,\bm{x})\geq f_{D|Z,\bm{X}}(1|0,\bm{x})$, and strong monotonicity further constrains $f_{D|Z,\bm{X}}(1|0,\bm{x})\equiv 0$. Following previous efforts in obtaining efficient causal estimators under a principal stratification framework (Rudolph et al., 2024; Jiang et al., 2022), the EIF in Theorem 3 is derived under the nonparametric model $\mathcal{M}_{np}$, which does not leverage the monotonicity restriction on $f_{\bm{O}}$ to potentially sharpen the efficiency bound. Therefore, $\widehat{\theta}^{(zz'),\text{mr}}_{d_{1}d_{0}}$ is only locally efficient under $\mathcal{M}_{np}$, rather than under a more restrictive model space assuming monotonicity.

4.4 Nonparametric efficient estimation

We extend the proposed multiply robust estimator by estimating the nuisance functions, $h_{nuisance}$, via flexible nonparametric methods or modern data-adaptive machine learning methods. We denote the new estimator as $\widehat{\theta}^{(zz'),\text{np}}_{d_{1}d_{0}}$, with the superscript "np" indicating the use of nonparametric algorithms. The cross-fitting procedure (Chernozhukov et al., 2018) is employed to circumvent the bias due to overfitting when the nuisance functions are estimated nonparametrically. Specifically, we randomly partition the dataset into $V$ groups of approximately equal size such that the group sizes differ by at most 1. For each $v$, let $\mathcal{O}_{v}$ be the data in the $v$-th group and $\mathcal{O}_{-v}=\cup_{v'\in\{1,\dots,V\}\setminus\{v\}}\mathcal{O}_{v'}$ be the data excluding the $v$-th group. For $v=1,\dots,V$, we evaluate the nuisance function estimates on data $\mathcal{O}_{v}$, denoted by $\widehat{h}_{nuisance}^{\text{np},v}=\{\widehat{\pi}_{z}^{\text{np},v}(\bm{x}),\widehat{p}_{zd}^{\text{np},v}(\bm{x}),\widehat{r}_{zd}^{\text{np},v}(m,\bm{x}),\widehat{\mu}_{zd}^{\text{np},v}(m,\bm{x})\}$, using machine learning or nonparametric methods trained on data $\mathcal{O}_{-v}$. The nuisance function estimates evaluated over the entire dataset, $\widehat{h}_{nuisance}^{\text{np}}$, are therefore a combination of $\widehat{h}_{nuisance}^{\text{np},1}$, $\widehat{h}_{nuisance}^{\text{np},2}$, $\dots$, $\widehat{h}_{nuisance}^{\text{np},V}$. Finally, $\widehat{\theta}^{(zz'),\text{np}}_{d_{1}d_{0}}$ is given by the solution to

$$\mathbb{P}_{n}\left[\frac{\widehat{\psi}_{d_{1}d_{0}}^{(zz'),\text{np}}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz')}\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]=0$$

so that $\widehat{\theta}^{(zz'),\text{np}}_{d_{1}d_{0}}=\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(zz'),\text{np}}(\bm{O})]/\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})]$, where $\widehat{\psi}_{d_{1}d_{0}}^{(zz'),\text{np}}(\bm{O})$ and $\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})$ are $\psi_{d_{1}d_{0}}^{(zz')}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ evaluated based on $\widehat{h}_{nuisance}^{\text{np}}$.
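The cross-fitting step described above can be sketched as follows. This is an illustrative outline rather than the authors' implementation: `make_folds` and `cross_fit` are hypothetical names, and `fit` and `predict` are user-supplied callables standing in for any learner. Each observation receives a nuisance prediction from a model trained only on the other folds.

```python
import random
from typing import Any, Callable, List, Sequence


def make_folds(n: int, V: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition indices 0..n-1 into V folds whose sizes differ by at most 1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Taking every V-th shuffled index yields fold sizes of floor(n/V) or ceil(n/V).
    return [idx[v::V] for v in range(V)]


def cross_fit(
    data: Sequence[Any],
    V: int,
    fit: Callable[[List[Any]], Any],
    predict: Callable[[Any, Any], Any],
    seed: int = 0,
) -> List[Any]:
    """Out-of-fold predictions: train on O_{-v}, evaluate on O_v, for v = 1..V."""
    folds = make_folds(len(data), V, seed)
    out: List[Any] = [None] * len(data)
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        model = fit(train)                      # trained without fold v
        for i in held_out:
            out[i] = predict(model, data[i])    # evaluated on fold v only
    return out
```

In practice, `cross_fit` would be called once per nuisance function, and the resulting out-of-fold predictions would be plugged into $\psi_{d_{1}d_{0}}^{(zz')}$ and $\delta_{d_{1}d_{0}}$.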

Theorem 5

Under Assumptions 1–6, $\widehat{\theta}^{(zz'),\text{np}}_{d_{1}d_{0}}$ is consistent if any three of the four nuisance functions in $\widehat{h}_{nuisance}^{\text{np}}$ are consistently estimated in the $L_{2}(\mathbb{P})$-norm. Furthermore, if all elements in $\widehat{h}_{nuisance}^{\text{np}}$ are consistent in the $L_{2}(\mathbb{P})$-norm and $\|\widehat{l}^{\text{np}}-l\|\times\|\widehat{g}^{\text{np}}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}(\bm{x}),p_{zd}(\bm{x}),r_{zd}(m,\bm{x}),\mu_{zd}(m,\bm{x})\}$, then $\widehat{\theta}^{(zz'),\text{np}}_{d_{1}d_{0}}$ is asymptotically normal and its asymptotic variance achieves the efficiency lower bound.

Theorem 5 indicates that $\widehat{\theta}^{(zz'),\text{np}}_{d_{1}d_{0}}$ is consistent, asymptotically normal, and achieves the semiparametric efficiency lower bound if all nuisance functions can be consistently estimated at an $o_{p}(n^{-1/4})$ rate, which can be achieved by several machine learning algorithms (e.g., the boosting approach of Luo et al. (2016), the random forest of Wager and Walther (2015), and the neural networks of Chen and White (1999)). When nuisance functions are estimated via data-adaptive methods, we use the empirical variance of the estimated EIF to construct the variance estimator for $\widehat{\theta}^{(zz'),\text{np}}_{d_{1}d_{0}}$; that is,

$$\widehat{\text{Var}}\left(\widehat{\theta}^{(zz'),\text{np}}_{d_{1}d_{0}}\right)=\frac{1}{n}\mathbb{P}_{n}\left[\left\{\frac{\widehat{\psi}_{d_{1}d_{0}}^{(zz'),\text{np}}(\bm{O})-\widehat{\theta}_{d_{1}d_{0}}^{(zz'),\text{np}}\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})}{\widehat{p}_{z^{*}d^{*}}^{\text{np}}-k\widehat{p}_{01}^{\text{np}}}\right\}^{2}\right],$$

where $\widehat{p}_{zd}^{\text{np}}$ is constructed analogously to $\widehat{p}_{zd}^{\text{dr}}$ in (5) but evaluated using $\widehat{h}_{nuisance}^{\text{np}}$.
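Given per-observation evaluations of $\widehat{\psi}$ and $\widehat{\delta}$, the ratio-form point estimate and the EIF-based variance estimator above reduce to a few sample averages. The sketch below assumes those evaluations are already available as lists; the function names are hypothetical, `e_hat` stands for the estimated denominator $\widehat{p}_{z^{*}d^{*}}-k\widehat{p}_{01}$, and the Wald-type interval uses a normal critical value as an illustrative choice.

```python
import math
from typing import Sequence, Tuple


def ratio_estimate(psi: Sequence[float], delta: Sequence[float]) -> float:
    """theta_hat = P_n[psi_hat] / P_n[delta_hat]; the common 1/n factor cancels."""
    return sum(psi) / sum(delta)


def eif_variance(
    psi: Sequence[float],
    delta: Sequence[float],
    theta_hat: float,
    e_hat: float,
) -> float:
    """(1/n) * P_n[{(psi_hat - theta_hat * delta_hat) / e_hat}^2]."""
    n = len(psi)
    mean_sq = sum(((p - theta_hat * d) / e_hat) ** 2 for p, d in zip(psi, delta)) / n
    return mean_sq / n


def wald_interval(
    theta_hat: float, var_hat: float, z_crit: float = 1.959964
) -> Tuple[float, float]:
    """Wald-type interval theta_hat +/- z_crit * sqrt(var_hat) (95% by default)."""
    half = z_crit * math.sqrt(var_hat)
    return theta_hat - half, theta_hat + half
```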

4.5 Estimation of natural mediation effects

Once we obtain $\widehat{\theta}_{d_{1}d_{0}}^{(zz')}$, estimators of $\text{PNIE}_{d_{1}d_{0}}$ and $\text{PNDE}_{d_{1}d_{0}}$ can be constructed based on (1). For example, we can construct $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{s}}=\widehat{\theta}^{(11),\text{s}}_{d_{1}d_{0}}-\widehat{\theta}^{(10),\text{s}}_{d_{1}d_{0}}$ and $\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{s}}=\widehat{\theta}^{(10),\text{s}}_{d_{1}d_{0}}-\widehat{\theta}^{(00),\text{s}}_{d_{1}d_{0}}$, if the moment-type method (s = a, b, c, or d), the multiply robust estimator (s = mr), or the nonparametric efficient estimator (s = np) is used for $\theta^{(zz')}_{d_{1}d_{0}}$. Analogously, ITT-NIE and ITT-NDE can be estimated using (4) by replacing $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ and $\theta_{d_{1}d_{0}}^{(zz')}$ with their corresponding estimators. Specifically, we can construct estimators of ITT-NDE and ITT-NIE as $\widehat{\text{ITT-NDE}}^{\text{s}}=\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{dr}}\times\left(\widehat{\theta}^{(10),\text{s}}_{d_{1}d_{0}}-\widehat{\theta}^{(00),\text{s}}_{d_{1}d_{0}}\right)$ and $\widehat{\text{ITT-NIE}}^{\text{s}}=\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{dr}}\times\left(\widehat{\theta}^{(11),\text{s}}_{d_{1}d_{0}}-\widehat{\theta}^{(10),\text{s}}_{d_{1}d_{0}}\right)$ if either the moment-type method (s = a, b, c, or d) or the multiply robust estimator (s = mr) is used for estimating $\theta_{d_{1}d_{0}}^{(zz')}$, where $\widehat{e}_{d_{1}d_{0}}^{\text{dr}}=\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}$ and $\mathcal{U}=\mathcal{U}_{\text{a}}$ and $\mathcal{U}_{\text{b}}$ under the standard and strong monotonicity assumptions, respectively. In particular, the multiply robust estimators have the following explicit expressions

$$
\begin{aligned}
\widehat{\text{ITT-NDE}}^{\text{mr}} &=\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}-\widehat{\psi}_{d_{1}d_{0}}^{(00),\text{par}}\right\}\right], &\text{(6)}\\
\widehat{\text{ITT-NIE}}^{\text{mr}} &=\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{par}}-\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}\right\}\right]. &\text{(7)}
\end{aligned}
$$

Similarly, the nonparametric estimators $\widehat{\text{ITT-NDE}}^{\text{np}}$ and $\widehat{\text{ITT-NIE}}^{\text{np}}$ can be obtained by replacing all $\widehat{\psi}_{d_{1}d_{0}}^{(zz'),\text{par}}$ in (6) and (7) with $\widehat{\psi}_{d_{1}d_{0}}^{(zz'),\text{np}}$. In the Supplementary Material, we show that, for all $\tau\in\{\text{PNIE}_{d_{1}d_{0}},\text{PNDE}_{d_{1}d_{0}},\text{ITT-NIE},\text{ITT-NDE}\}$, $\widehat{\tau}^{\text{np}}$ is consistent and semiparametrically efficient if the conditions in Theorem 5 are satisfied, and $\widehat{\tau}^{\text{mr}}$ remains quadruply robust and is locally efficient when all working models in $h_{nuisance}$ are correctly specified. Details on inference are given in the Supplementary Material.

Although we primarily discuss mediation effects on a mean difference scale, all effects can be defined on other scales as needed. For example, with a binary outcome one can consider a risk ratio scale and use $\text{PNIE}_{d_{1}d_{0}}^{\text{RR}}=\theta_{d_{1}d_{0}}^{(11)}/\theta_{d_{1}d_{0}}^{(10)}$ and $\text{PNDE}_{d_{1}d_{0}}^{\text{RR}}=\theta_{d_{1}d_{0}}^{(10)}/\theta_{d_{1}d_{0}}^{(00)}$ to quantify the natural indirect and direct effects within principal stratum $U=d_{1}d_{0}$. Similarly, one can use $\text{ITT-NIE}^{\text{RR}}=\mathbb{E}[Y_{1M_{1}}]/\mathbb{E}[Y_{1M_{0}}]$ and $\text{ITT-NDE}^{\text{RR}}=\mathbb{E}[Y_{1M_{0}}]/\mathbb{E}[Y_{0M_{0}}]$ to measure the natural indirect and direct effects among the entire study population. Estimation of ratio mediation effects is straightforward based on $\widehat{\theta}_{d_{1}d_{0}}^{(zz')}$, and is omitted for brevity.

5 A simulation study

We investigate the finite-sample performance of the proposed methods via simulation studies. We consider the following data generation process modified from that in Kang and Schafer (2007), in which the positivity assumptions are practically violated under model misspecification. Specifically, we generate 1000 Monte Carlo samples with $n=1000$ by the following process. We draw baseline covariates $\bm{X}=[X_{1},X_{2},X_{3},X_{4}]\sim N(\bm{0}_{4\times 1},\bm{I}_{4\times 4})$, and

$$
\begin{aligned}
Z\mid\bm{X} &\sim\text{Bernoulli}\left(\text{expit}([-1,0.5,-0.25,-0.1]^{T}\bm{X})\right),\\
D\mid Z,\bm{X} &\sim\text{Bernoulli}\left(\text{expit}(-1+2Z+[1,-0.8,0.6,-1]^{T}\bm{X})\right),\\
M\mid D,Z,\bm{X} &\sim\text{Bernoulli}\left(\text{expit}(-1.8+2Z+1.5D+[1,-0.5,0.9,-1]^{T}\bm{X})\right),\\
Y\mid M,D,Z,\bm{X} &\sim N\left(210+1.5Z-D+M+[27.4,13.7,13.7,13.7]^{T}\bm{X},\,1\right).
\end{aligned}
$$
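For concreteness, one draw from the data generation process above can be sketched in Python using only the standard library. The function names `expit`, `dot`, and `simulate_one` are hypothetical helpers introduced for illustration; the coefficients are copied from the displayed equations, and the normal outcome uses unit variance (so variance and standard deviation coincide).

```python
import math
import random
from typing import List, Sequence, Tuple


def expit(t: float) -> float:
    """Inverse logit function."""
    return 1.0 / (1.0 + math.exp(-t))


def dot(a: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of a coefficient vector and a covariate vector."""
    return sum(ai * xi for ai, xi in zip(a, x))


def simulate_one(rng: random.Random) -> Tuple[List[float], int, int, int, float]:
    """Draw one observation (X, Z, D, M, Y) from the simulation design."""
    x = [rng.gauss(0.0, 1.0) for _ in range(4)]  # X ~ N(0, I_4)
    z = int(rng.random() < expit(dot([-1, 0.5, -0.25, -0.1], x)))
    d = int(rng.random() < expit(-1 + 2 * z + dot([1, -0.8, 0.6, -1], x)))
    m = int(rng.random() < expit(-1.8 + 2 * z + 1.5 * d + dot([1, -0.5, 0.9, -1], x)))
    mean_y = 210 + 1.5 * z - d + m + dot([27.4, 13.7, 13.7, 13.7], x)
    y = rng.gauss(mean_y, 1.0)
    return x, z, d, m, y


def simulate_dataset(n: int, seed: int = 0):
    """Draw n independent observations with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_one(rng) for _ in range(n)]
```

Calling `simulate_dataset(1000, seed=s)` for 1000 different seeds would mirror the number and size of the Monte Carlo samples described in the text.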

In correctly specified parametric working models, we directly include the true baseline covariates $\bm{X}$ in each working model, where specifications of the working models are given in the Supplementary Material. Otherwise, we include a set of transformed covariates, $\widetilde{\bm{X}}=[\widetilde{X}_{1},\widetilde{X}_{2},\widetilde{X}_{3},\widetilde{X}_{4}]$, in the misspecified working models, where $\widetilde{X}_{1}=\exp(0.5X_{1})$, $\widetilde{X}_{2}=\frac{X_{2}}{1+X_{1}}$, $\widetilde{X}_{3}=\left(\frac{X_{2}X_{3}}{25}+0.6\right)^{3}$, and $\widetilde{X}_{4}=\left(X_{2}+X_{4}+20\right)^{2}$. We evaluate each of the proposed moment-type estimators and multiply robust estimators under 6 different scenarios: (i) all components in $h_{nuisance}^{\text{par}}$ are correctly specified; (ii) only $\pi_{z}^{\text{par}}(\bm{x})$ is misspecified; (iii) only $p_{zd}^{\text{par}}(\bm{x})$ is misspecified; (iv) only $r_{zd}^{\text{par}}(m,\bm{x})$ is misspecified; (v) only $\mu_{zd}^{\text{par}}(m,\bm{x})$ is misspecified; (vi) all components in $h_{nuisance}^{\text{par}}$ are misspecified.

For the nonparametric estimator, we consider a five-fold cross-fitting with the nuisance functions estimated by the Super Learner (Van der Laan et al., 2007) with a combination of random forest and generalized linear models libraries. Although the Super Learner is more flexible than parametric working models, its performance still depends on the quality of the input feature matrix. In each of Scenarios (i)–(vi), we use the true covariates $\bm{X}$ as the feature matrix under the correctly specified nuisance scenario and the transformed covariates $\widetilde{\bm{X}}$ as the feature matrix under the misspecified nuisance scenario.
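The cross-fitting idea can be sketched as follows, under simplifying assumptions: each unit's nuisance prediction comes from a model trained only on the other folds. The toy least-squares learner `fit_line` stands in for the Super Learner purely for illustration, and all function names here are hypothetical.

```python
import random


def make_folds(n, k=5, seed=0):
    """Randomly partition indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::k] for j in range(k)]


def fit_line(xs, ys):
    """Toy learner: least-squares line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def cross_fit(xs, ys, fit=fit_line, k=5, seed=0):
    """Out-of-fold predictions: every unit is predicted by a model
    that never saw that unit during training."""
    n = len(xs)
    preds = [None] * n
    for fold in make_folds(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    return preds
```

In an actual analysis, `fit` would be replaced by a flexible learner for each nuisance function, and the out-of-fold predictions would be plugged into the estimating function.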

Figure 2: Simulation results for estimators of $\theta_{10}^{(10)}$ among 6 different scenarios with sample size $n=1{,}000$. Scenarios (i)–(vi) are described in Section 5. The horizontal line in each panel is the true value of $\theta_{10}^{(10)}$. The yellow highlighted boxplots indicate that the corresponding estimators are expected to be consistent by large-sample theory.

Figure 2 presents boxplots of the different estimators of $\theta_{10}^{(10)}=\mathbb{E}[Y_{1M_{0}}\mid U=10]$ over 1000 Monte Carlo simulations, with each panel corresponding to a specific simulation scenario. As expected, the moment-type estimators are centered around the true value if the required parametric working models are all correctly specified but may diverge from the true value otherwise. The multiply robust estimator exhibits minimal bias in Scenarios (i)–(v), confirming the quadruply robust property; however, it is biased when all of the working models are misspecified, as demonstrated in Scenario (vi). The nonparametric efficient estimator performs fairly well with minimal bias in Scenarios (i)–(v), and its bias in Scenario (vi) is also smaller than that of the multiply robust estimator with parametric working models. For each scenario, we also report the coverage rate of the 95% Wald-type confidence interval in Table 1, where the variance is estimated by the bootstrap for the moment-type and multiply robust estimators and by the empirical variance of the EIF for the nonparametric method. We observe that both $\widehat{\theta}_{10}^{(10),\text{mr}}$ and $\widehat{\theta}_{10}^{(10),\text{np}}$ attain close to nominal coverage in Scenarios (i)–(v), but their coverage rates are attenuated in Scenario (vi) due to misspecification. Moreover, the moment estimator $\widehat{\theta}_{10}^{(10),\text{a}}$ appears to have nominal coverage except for Scenario (iii), likely due to the over-estimation of the true sampling variance for this weighting estimator under misspecified weights. We also evaluate estimators of $\theta_{11}^{(10)}$ and $\theta_{00}^{(10)}$, and the results are qualitatively similar. The additional simulation results are provided in Supplementary Material Figures S1–S2.
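For concreteness, the coverage rate of a Wald-type interval across Monte Carlo replications can be computed as in the short sketch below. It assumes that each replication supplies a point estimate and a standard error (from the bootstrap or from the EIF); the function names and the input layout are hypothetical.

```python
def wald_ci(estimate, se, z=1.959964):
    """Two-sided Wald interval; z defaults to the 97.5% normal quantile."""
    return estimate - z * se, estimate + z * se


def coverage_rate(estimates, ses, truth, z=1.959964):
    """Fraction of replications whose Wald interval contains the truth."""
    hits = 0
    for est, se in zip(estimates, ses):
        lo, hi = wald_ci(est, se, z)
        hits += lo <= truth <= hi
    return hits / len(estimates)
```

With 1000 replications, a reported coverage of 0.951 would correspond to 951 intervals containing the true value.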

Table 1: Simulation results of the 95% confidence interval coverage rate among different estimators of $\theta_{10}^{(10)}$. Scenarios (i)–(vi) are described in Section 5. The numbers displayed in bold signify that the corresponding estimators are expected to be consistent by large-sample theory.

| Scenario | $\widehat{\theta}_{10}^{(10),\text{a}}$ | $\widehat{\theta}_{10}^{(10),\text{b}}$ | $\widehat{\theta}_{10}^{(10),\text{c}}$ | $\widehat{\theta}_{10}^{(10),\text{d}}$ | $\widehat{\theta}_{10}^{(10),\text{mr}}$ | $\widehat{\theta}_{10}^{(10),\text{np}}$ |
|---|---|---|---|---|---|---|
| (i) All nuisance correctly specified | 0.951 | 0.910 | 0.924 | 0.930 | 0.929 | 0.969 |
| (ii) misspecified $\pi_{z}^{\text{par}}(\bm{x})$ | 0.955 | 0.874 | 0.941 | 0.921 | 0.932 | 0.953 |
| (iii) misspecified $p_{zd}^{\text{par}}(\bm{x})$ | 0.871 | 0.926 | 0.738 | 0.717 | 0.918 | 0.951 |
| (iv) misspecified $r_{zd}^{\text{par}}(m,\bm{x})$ | 0.938 | 0.907 | 0.921 | 0.932 | 0.943 | 0.953 |
| (v) misspecified $\mu_{zd}^{\text{par}}(m,\bm{x})$ | 0.954 | 0.873 | 0.799 | 0.748 | 0.946 | 0.958 |
| (vi) All nuisance misspecified | 0.953 | 0.657 | 0.919 | 0.870 | 0.732 | 0.812 |

6 Two real-data applications

6.1 A job training program with noncompliance

JOBS II is a randomized field experiment among 1,801 unemployed workers that examined the effect of a job training workshop on mental health and high-quality reemployment (Price et al., 1992). Participants in the treatment group ($Z=1$) were assigned to a job skills workshop, but 45% of them did not show up and thus effectively received the control condition. Let $D$ be the indicator of whether the individual attends the workshop, where $D=0$ among all participants in the control group because they had no access to the workshops. As strong monotonicity holds by design, we have two principal strata: compliers and never-takers. In the JOBS II study, it is of interest to assess the effect of attending the job skills workshop ($Z$) on depression ($Y$) among the compliers, which quantifies the efficacy of the training program on mental health (VanderWeele, 2011). Previous efforts (e.g., Park and Kürüm, 2020) have also investigated the role of sense of mastery ($M$) in mediating the effect of the job skills workshop ($Z$) on depression ($Y$) within the complier stratum, typically under the exclusion restriction assumption. Because JOBS II is not double-blinded, the exclusion restriction may not hold due to psychological effects (Park and Kürüm, 2020; Stuart and Jo, 2015). For example, Stuart and Jo (2015) pointed out that participants assigned to the workshop may feel more optimistic about their reemployment opportunity, suggesting direct pathways from the assignment to depression that do not operate through actual attendance.

We assess causal mediation under our proposed assumptions, which permit the exploration of causal mechanism among never-takers. The mediator is sense of mastery at 6 weeks after randomization, with $M=1$ indicating a higher sense of mastery. The outcome is a continuous measure of depression at 6 months after randomization, which ranges from 1 to 4 with a higher value indicating worse depression. Baseline covariates ($\bm{X}$) include age, gender, race, marital status, education, assertiveness, level of economic hardship, level of depression, and motivation. For the moment-type and multiply robust estimators, we used the parametric working models described in the Supplementary Material for the nuisance functions. Of note, the propensity score is known by randomization, and therefore the working logistic regression of $\pi_{z}^{\text{par}}(\bm{X})$ is not subject to misspecification; we still include all baseline covariates into this working logistic regression to adjust for chance imbalance. For the nonparametric efficient estimator, we used Super Learner with the random forest and generalized linear model libraries for estimating the nuisance functions. We only present the results from the multiply robust estimator and nonparametric efficient estimator; complete results from other estimators are in Supplementary Material Tables S1–S2.

Table 2: Estimated causal effects and 95% confidence intervals in the JOBS II study.

| Population | Estimand | mr | np | Rudolph et al.§ |
|---|---|---|---|---|
| Overall | ITT-NIE | -0.015 (-0.032, -0.003) | -0.017 (-0.031, -0.004) | -0.017 (-0.031, -0.004) |
| Overall | ITT-NDE | -0.072 (-0.151, -0.001) | -0.067 (-0.146, 0.013) | -0.067 (-0.146, 0.013) |
| Overall | ITT | -0.088 (-0.167, -0.016) | -0.084 (-0.162, -0.005) | -0.084 (-0.162, -0.005) |
| Compliers | PNIE | -0.026 (-0.055, 0.006) | -0.029 (-0.052, -0.006) | -0.030 (-0.053, -0.006) |
| Compliers | PNDE | -0.083 (-0.170, 0.002) | -0.066 (-0.156, 0.023) | -0.115 (-0.251, 0.022) |
| Compliers | PCE | -0.109 (-0.191, -0.027) | -0.096 (-0.182, -0.009) | -0.145 (-0.280, -0.008) |
| Never-takers | PNIE | 0.000 (-0.006, 0.006) | -0.001 (-0.004, 0.003) | |
| Never-takers | PNDE | -0.058 (-0.160, 0.031) | -0.066 (-0.163, 0.032) | |
| Never-takers | PCE | -0.058 (-0.160, 0.030) | -0.067 (-0.163, 0.032) | |

§ ‘Rudolph et al.’ is the nonparametric efficient estimator in Rudolph et al. (2024); see Remark 1 for more details of this approach. Their identification formulas of the ITT natural mediation effects are identical to the present work; this explains the numerical equivalence between ‘np’ and ‘Rudolph et al.’ for the ITT analysis. All effects among never-takers given by Rudolph et al. (2024) are zero due to exclusion restriction.

Table 2 (upper panel) presents the estimated ITT effect and its indirect and direct effect decomposition. The multiply robust and nonparametric efficient estimators give similar results: both the ITT and ITT-NIE estimates are negative, which is consistent with sense of mastery mediating part of the total effect of the job skills workshop on depression. However, the ITT estimands do not resolve the potential heterogeneity of mediation effects between compliers and never-takers. The estimated proportions of compliers and never-takers are 55% and 45%, respectively (under the nonparametric efficient estimator). We present the stratum-specific mean and standard deviation of the baseline characteristics in Supplementary Material Table S3. Compared to the never-takers, the compliers are older with higher education; a larger fraction of compliers are female, white, and married, and they are more motivated to participate in the study but less assertive. To offer a complete picture of the mediation mechanism for compliers and never-takers, Table 2 (middle and bottom panels) additionally presents the estimated PCEs, together with their mediation effect decomposition. For the complier stratum, both the multiply robust and nonparametric efficient estimators suggest that the JOBS II intervention exerts a statistically significant effect on reducing depression, and approximately one quarter of $\text{PCE}_{10}$ can be explained by the improvement of sense of mastery. For the never-takers, we observe a smaller but still beneficial effect of the intervention on reducing depression, although the 95% confidence interval for $\text{PCE}_{00}$ crosses zero. For both the multiply robust and nonparametric efficient estimators, the indirect effect among the never-takers is estimated to be almost zero (e.g., $\widehat{\text{PNIE}}_{00}^{\text{np}}=-0.001$ with 95% confidence interval $[-0.004, 0.003]$).
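As a worked check of the statement about the share explained by sense of mastery among compliers, the following arithmetic uses only the complier rows of Table 2. It assumes that the share is read as the ratio of the PNIE to the PCE on the additive scale, which is our interpretation rather than a formula stated in the text.

\begin{align*}
\text{mr:}\quad & \frac{\widehat{\text{PNIE}}_{10}}{\widehat{\text{PCE}}_{10}} = \frac{-0.026}{-0.109} \approx 0.24, &
\widehat{\text{PNIE}}_{10}+\widehat{\text{PNDE}}_{10} &= -0.026 + (-0.083) = -0.109,\\
\text{np:}\quad & \frac{\widehat{\text{PNIE}}_{10}}{\widehat{\text{PCE}}_{10}} = \frac{-0.029}{-0.096} \approx 0.30, &
\widehat{\text{PNIE}}_{10}+\widehat{\text{PNDE}}_{10} &= -0.029 + (-0.066) = -0.095.
\end{align*}

The second column illustrates that the indirect and direct effects add up to the PCE, up to rounding of the reported values.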

We also compare our results to estimates under exclusion restriction. Table 2 (right column) shows the estimates using the approach developed by Rudolph et al. (2024) (see Remark 1 for more details of this approach). Due to exclusion restriction, all mediation effects among never-takers are assumed zero. We observe that the point and interval estimates of $\text{PNIE}_{10}$ under exclusion restriction are close to their counterparts under principal ignorability, which is anticipated as the PNIE estimate among never-takers is minimal under principal ignorability. However, the point estimates of $\text{PNDE}_{10}$ and $\text{PCE}_{10}$ under exclusion restriction are larger than those under principal ignorability.

6.2 An epidemiological study with an intercurrent event

We re-analyze the World Health Organization’s Large Analysis and Review of European Housing and Health Status (WHO-LARES) study with 5,882 individuals for the effect of living in damp or moldy conditions ($Z=1$ if yes and 0 if no) on depression ($Y=1$ if yes and 0 if no), where perceived control on one’s home ($M$) is the mediator of interest (VanderWeele and Vansteelandt, 2010). However, some individuals developed dampness or mold related diseases ($D=1$ if yes and 0 otherwise). Steen et al. (2017) viewed $D$ as another mediator prior to $M$ and assessed the path-specific effects through $D$ and/or $M$. To provide a complementary perspective, we view $D$ as an intercurrent event and partition the population into four strata: the doomed stratum ($U=11$) including those who would always be diseased regardless of living conditions, the immune stratum ($U=00$) including those who would never be diseased, the harmed stratum ($U=10$) including those who would be diseased only if living in damp or moldy conditions, and the benefiters stratum ($U=01$) including those who would be diseased only if not living in damp or moldy conditions. Among them, the doomed and harmed strata are two subpopulations of typical interest because their physical health is more sensitive to living conditions. That is, studying their treatment effects can help understand the impact of living in damp or moldy conditions on mental health among the more physically vulnerable subgroups. As a further step, addressing the principal natural mediation effects can uncover the extent to which this impact is attributable to the perceived control on one’s home.

We consider the standard monotonicity assumption to rule out benefiters, which is plausible because living in damp and moldy conditions would generally only make an individual more likely to develop dampness or mold related diseases. The exclusion restriction is unlikely to hold because living in damp or moldy conditions can still directly affect mental health even in the absence of dampness or mold related diseases. We adjust for the following confounders: gender, age, marital status, education, employment, smoking, home ownership, home size, crowding (number of residents per room), heating, and natural light. The proportions of the doomed, harmed, and immune strata based on the nonparametric efficient estimator are 51%, 8%, and 41%, respectively. We summarize the stratum-specific mean and standard deviation of the baseline characteristics in Supplementary Material Table S4. The doomed stratum includes the largest share of females, followed by the harmed stratum, whereas the immune stratum includes the smallest share. Compared to the doomed and harmed strata, members of the immune stratum are more likely to be married and employed; they are also more satisfied with the heating system and natural light condition in their dwellings.
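To illustrate how stratum proportions follow from the principal scores once benefiters are ruled out, the sketch below averages fitted probabilities $p_{01}(\bm{X})=P(D=1\mid Z=0,\bm{X})$ and $p_{11}(\bm{X})=P(D=1\mid Z=1,\bm{X})$. This is a simple plug-in calculation under standard monotonicity, shown for intuition only; it is not the nonparametric efficient estimator used for the reported proportions, and the function name and inputs are hypothetical.

```python
def strata_proportions(p01, p11):
    """Plug-in stratum proportions under standard monotonicity.

    p01[i]: fitted P(D=1 | Z=0, X_i); p11[i]: fitted P(D=1 | Z=1, X_i).
    Doomed units are diseased even when unexposed, immune units are never
    diseased, and harmed units account for the remaining difference.
    """
    n = len(p01)
    doomed = sum(p01) / n
    harmed = sum(b - a for a, b in zip(p01, p11)) / n
    immune = sum(1 - b for b in p11) / n
    return {"doomed": doomed, "harmed": harmed, "immune": immune}
```

By construction the three proportions sum to one, which mirrors the reported 51%, 8%, and 41%.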

Table 3: Estimated causal effects and 95% confidence intervals in the WHO-LARES study.

| Population | Estimand | mr | np§ |
|---|---|---|---|
| Overall | $\text{ITT-NIE}^{\text{RR}}$ | 1.021 (1.003, 1.043) | 1.031 (1.010, 1.053) |
| Overall | $\text{ITT-NDE}^{\text{RR}}$ | 1.248 (1.114, 1.405) | 1.219 (1.078, 1.361) |
| Overall | $\text{ITT}^{\text{RR}}$ | 1.274 (1.137, 1.438) | 1.257 (1.114, 1.400) |
| Doomed | $\text{PNIE}^{\text{RR}}_{11}$ | 1.017 (0.999, 1.034) | 1.025 (1.001, 1.050) |
| Doomed | $\text{PNDE}^{\text{RR}}_{11}$ | 1.223 (1.044, 1.403) | 1.181 (1.014, 1.348) |
| Doomed | $\text{PCE}^{\text{RR}}_{11}$ | 1.244 (1.070, 1.433) | 1.212 (1.044, 1.379) |
| Harmed | $\text{PNIE}^{\text{RR}}_{10}$ | 1.029 (1.001, 1.059) | 1.046 (1.013, 1.079) |
| Harmed | $\text{PNDE}^{\text{RR}}_{10}$ | 2.296 (1.841, 2.982) | 2.142 (1.724, 2.560) |
| Harmed | $\text{PCE}^{\text{RR}}_{10}$ | 2.363 (1.897, 3.045) | 2.241 (1.817, 2.666) |
| Immune | $\text{PNIE}^{\text{RR}}_{00}$ | 1.025 (0.983, 1.074) | 1.035 (0.995, 1.075) |
| Immune | $\text{PNDE}^{\text{RR}}_{00}$ | 1.102 (0.911, 1.328) | 1.111 (0.881, 1.340) |
| Immune | $\text{PCE}^{\text{RR}}_{00}$ | 1.129 (0.945, 1.336) | 1.150 (0.916, 1.385) |

§ Based on the nonparametric efficient method, the contrasts (and 95% confidence intervals) between the PNDE in different principal strata pairs are $\log(\widehat{\text{PNDE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{00}^{\text{RR}})=0.657$ $(0.344, 0.967)$, $\log(\widehat{\text{PNDE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{11}^{\text{RR}})=0.595$ $(0.315, 0.874)$, and $\log(\widehat{\text{PNDE}}_{11}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{00}^{\text{RR}})=0.061$ $(-0.189, 0.311)$, respectively. The contrasts (and 95% confidence intervals) between the PNIE in different principal strata pairs are $\log(\widehat{\text{PNIE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNIE}}_{00}^{\text{RR}})=0.011$ $(-0.039, 0.061)$, $\log(\widehat{\text{PNIE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNIE}}_{11}^{\text{RR}})=0.020$ $(-0.033, 0.072)$, and $\log(\widehat{\text{PNIE}}_{11}^{\text{RR}})-\log(\widehat{\text{PNIE}}_{00}^{\text{RR}})=-0.009$ $(-0.055, 0.036)$, respectively.

Table 3 presents the results based on the multiply robust and nonparametric efficient estimators. With a binary outcome, we define all causal estimands on the risk ratio scale. Results based on moment-type estimators are given in Supplementary Material Tables S5–S6 and exhibit similar patterns. Table 3 (upper panel) presents the ITT natural mediation effects and suggests that living in damp or moldy conditions has a causal effect on elevating the risk of depression; the 95% confidence intervals for the ITT, ITT-NIE and ITT-NDE estimands all exclude the null. Table 3 (lower panels) presents the principal natural mediation effects. For the harmed stratum, whose members are most sensitive to living conditions, we observe a large PNDE (risk ratio $>2$), but PNDEs are smaller in the other strata (risk ratio $<1.3$). We further obtain the difference in log PNDEs across the three subgroups using the nonparametric efficient estimator, and confirm that the PNDE within the harmed stratum is substantially different from that within the other two strata. For example, $\log(\widehat{\text{PNDE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{00}^{\text{RR}})=0.657$ with 95% confidence interval (0.344, 0.967) and $\log(\widehat{\text{PNDE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{11}^{\text{RR}})=0.595$ with 95% confidence interval (0.315, 0.874). On the other hand, the PNIEs are rather comparable in magnitude across strata. Based on the nonparametric efficient estimator, although only the 95% confidence intervals of the PNIE in the doomed and harmed strata exclude the null, the 95% confidence interval for each pairwise difference in log PNIE includes the null. Finally, the proportion mediated varies across principal strata. That is, the perceived control on one’s home, as a mediator, explains the largest fraction of $\text{PCE}^{\text{RR}}_{00}$ in the immune stratum ($\log(\text{PNIE}^{\text{RR}}_{00})/\log(\text{PCE}^{\text{RR}}_{00})\approx 25\%$), and explains the smallest fraction of $\text{PCE}^{\text{RR}}_{10}$ in the harmed stratum ($\log(\text{PNIE}^{\text{RR}}_{10})/\log(\text{PCE}^{\text{RR}}_{10})\approx 6\%$).
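The proportion mediated on the log risk ratio scale can be reproduced from the point estimates in the np column of Table 3, as in the short sketch below. The dictionary layout and function name are illustrative; only the numerical inputs come from the table.

```python
import math

# Point estimates from the nonparametric efficient (np) column of Table 3.
np_estimates = {
    "doomed": {"PNIE": 1.025, "PCE": 1.212},
    "harmed": {"PNIE": 1.046, "PCE": 2.241},
    "immune": {"PNIE": 1.035, "PCE": 1.150},
}


def proportion_mediated(pnie_rr, pce_rr):
    """Share of the log risk ratio PCE attributed to the indirect effect."""
    return math.log(pnie_rr) / math.log(pce_rr)


shares = {
    stratum: proportion_mediated(est["PNIE"], est["PCE"])
    for stratum, est in np_estimates.items()
}
```

Rounded to two decimals, `shares["immune"]` is 0.25 and `shares["harmed"]` is 0.06, matching the percentages quoted above; the doomed stratum falls in between.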

As an additional exploratory comparison, we also carry out moderated mediation analysis with respect to baseline covariates. We evaluate the conditional natural indirect and direct effects on a risk ratio scale given each covariate, using the R package moderate.mediation (Qin and Wang, 2024). For each covariate considered, we partition the covariate vector into the moderator of interest and all remaining covariates, which serve as confounding adjustment variables. Next, we fit logistic models for the mediator and outcome to assess conditional mediation based on the identification formulas given in Qin and Wang (2024). The conditional natural (in)direct effects are summarized in Supplementary Material Figure S3 and Table S7. The results suggest that the mediation effect heterogeneity across different covariate levels is milder than that observed under the principal stratification mediation analysis. An important distinction between the moderated mediation analysis and our proposed methods is that the former fails to address $D$ as a potential post-treatment confounder, and may be biased even for quantifying the conditional mediation effect estimands.

7 A framework for sensitivity analysis

The principal ignorability (Assumption 4) and ignorability of the mediator (Assumption 5) are two crucial assumptions for identification of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$. These two assumptions, however, cannot be empirically verified. Sensitivity analysis is therefore a useful tool to assess causal effects under assumed violations of these assumptions. In the Supplementary Material, we develop a semiparametric sensitivity analysis framework to assess the impact of violation of Assumption 4 and Assumption 5 on inference about $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ and mediation effects. The proposed sensitivity analysis strategy relies on the confounding function approach (Tchetgen Tchetgen and Shpitser, 2012; Ding and Lu, 2017). Once the confounding functions are developed, we further provide a multiply robust estimator for $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ and natural mediation effects and prove its large-sample properties, assuming a known confounding function. In practice, the confounding function is unknown and users can specify a working sensitivity function with interpretable sensitivity parameters and then report the causal estimates under a range of values of sensitivity parameters, in order to identify tipping points that might reverse the causal conclusions. In the Supplementary Material, we also illustrate the proposed sensitivity analysis methods in the context of the JOBS II study.

8 Discussion

In this article, we consider a set of new identification assumptions for studying the natural mediation effects across several principal strata. This provides an important complementary perspective to existing methods that view $D$ as either a post-treatment confounder or another mediator, and enables the investigation of mediation mechanisms within subpopulations. We then derive the EIF for the principal natural mediation effects and further propose a quadruply robust estimator. Finally, a nonparametric extension has been developed to alleviate parametric model misspecification bias and to achieve efficient estimation.

While each principal stratum often represents a scientifically relevant subpopulation, the stratum membership is not always fully observed for each individual in a particular study, leading to potential barriers in optimizing future interventions to target subpopulations. Although there has been no consensus on how to mitigate such barriers, we offer three considerations that may improve the policy relevance of addressing mediation across principal strata. First, as a routine practice, we recommend summarizing the baseline characteristics for each stratum to help distinguish partially observed subpopulations in measurable dimensions. Given that the baseline summary is widely available from published social science and biomedical studies, summary statistics such as those in Supplementary Material Tables S3 and S4 (applications in Sections 6.1 and 6.2) can facilitate a direct comparison to existing study populations and help determine the relevance of the current results to alternative populations. Importantly, these summary statistics can be readily obtained once the principal score is estimated, and an example case study can also be found in Section 5.2 of Cheng et al. (2023a). Second, special study design features may enable an explicit characterization of the endogenous subgroups. As a concrete example, randomization plus strong monotonicity (design features of the JOBS II study in Section 6.1) ensure that individuals attending the workshop and those not attending the workshop in the treatment group are unbiased representations of the compliers and never-takers in the entire study, and evidence about their principal natural mediation effects serves to improve interventions that can at least target individuals in the treatment group. Third, for individuals with unobserved stratum membership, the estimated principal score model is useful for membership prediction. In the noncompliance scenario, Kennedy et al. (2020) discussed a two-stage treatment policy, where in the first stage one predicts the compliance status (for example, based on estimated principal scores), and in the second stage, one recommends the optimal intervention in each predicted stratum. Predicting principal stratum membership was also an intermediate step in Chen et al. (2024), who quantified conditional average treatment effects among the partially observed always-survivors in the truncation-by-death setting. Although the optimal methods for predicting membership and the best practice for operationalizing a multi-stage treatment policy remain important topics for future research, we believe this perspective continues to endorse the value of the principal natural mediation effect estimates for informing improved interventions to target partially observed subpopulations.

This article addresses a univariate mediator. When $M$ is multi-dimensional with several continuous components, it may be cumbersome to leverage the EIF in Theorem 3 to assess mediation, because one needs to estimate a multi-dimensional density $r_{zd}(m,\bm{X})$ and to further calculate a multi-dimensional integral $\eta_{zz^{\prime}}(\bm{X})=\int_{m}\mu_{zd_{z}}(m,\bm{X})r_{z^{\prime}d_{z^{\prime}}}(m,\bm{X})dm$. These challenges may be mitigated by reparametrizing the nuisance functions in the EIF (Díaz et al., 2021; Zhou, 2022). For example, one can retain the current parameterization of $\delta_{d_{1}d_{0}}(\bm{O})$ but re-express $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ as $\psi_{d_{1}d_{0}}^{(zz^{\prime}),\dagger}(\bm{O})$, defined as

\begin{align*}
& \left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}(\bm{X})\right\}}{\pi_{z^{*}}(\bm{X})}-k\frac{(1-Z)\left\{D-p_{01}(\bm{X})\right\}}{\pi_{0}(\bm{X})}\right)\eta_{zz^{\prime}}^{\dagger}(\bm{X})\\
+{} & \left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{z^{\prime}d_{z^{\prime}}}(\bm{X})\pi_{z^{\prime}}(\bm{X})}\frac{g_{z^{\prime}d_{z^{\prime}}}(M,\bm{X})\kappa_{d_{z^{\prime}}}(M,\bm{X})}{g_{zd_{z}}(M,\bm{X})\kappa_{d_{z}}(M,\bm{X})}\left\{Y-\mu_{zd_{z}}(M,\bm{X})\right\}\\
+{} & \left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}(\bm{X})\pi_{z^{\prime}}(\bm{X})}\left\{\mu_{zd_{z}}(M,\bm{X})-\eta_{zz^{\prime}}^{\dagger}(\bm{X})\right\}\\
+{} & \left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}\eta_{zz^{\prime}}^{\dagger}(\bm{X}).
\end{align*}

Here, $\psi_{d_{1}d_{0}}^{(zz^{\prime}),\dagger}(\bm{O})$ depends on a set of nuisance functions $h_{\text{nuisance}}^{\dagger}=\{\pi_{z}(\bm{X}), p_{zd}(\bm{X}), \mu_{zd}(M,\bm{X}), \kappa_{d}(M,\bm{X}), g_{zd}(M,\bm{X}), \eta_{zz^{\prime}}^{\dagger}(\bm{X})\}$, where the first three are identical to those in $h_{\text{nuisance}}$, $\kappa_{d}(M,\bm{X})=f_{D|M,\bm{X}}(d|M,\bm{X})$ and $g_{zd}(M,\bm{X})=f_{Z|D,M,\bm{X}}(z|d,M,\bm{X})$ are two conditional probabilities, and $\eta_{zz^{\prime}}^{\dagger}(\bm{X})=\mathbb{E}[\mu_{zd_{z}}(M,\bm{X})|Z=z^{\prime},D=d_{z^{\prime}},\bm{X}]$ is a nested expectation that can be estimated by regressing $\widehat{\mu}_{zd_{z}}(M,\bm{X})$ on $\bm{X}$ within the stratum $\{Z=z^{\prime},D=d_{z^{\prime}}\}$. Notice that this alternative set of nuisance functions only involves one-dimensional conditional expectations or probabilities regardless of the dimensionality of $M$, and has the potential to simplify the modeling process. The semiparametric efficient estimator based on the reparameterized EIF can now be defined as $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\dagger}=\mathbb{P}_{n}\left[\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\dagger}(\bm{O})\right]\Big/\mathbb{P}_{n}\left[\widehat{\delta}_{d_{1}d_{0}}(\bm{O})\right]$, whose asymptotic properties and finite-sample performance require future work.
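The final step, a ratio of two sample means, can be sketched as below, assuming that per-unit values of the estimated numerator and denominator functions are already available. The standard error uses the generic delta-method influence values for a ratio of means; this is included only as a heuristic assumption, since the text notes that the asymptotic properties of the reparameterized estimator remain to be established. The function name and inputs are hypothetical.

```python
import math


def ratio_estimator(psi, delta):
    """Ratio-of-means estimate with a heuristic delta-method standard error.

    psi[i]:   estimated numerator function evaluated at unit i.
    delta[i]: estimated denominator function evaluated at unit i.
    """
    n = len(psi)
    mean_psi = sum(psi) / n
    mean_delta = sum(delta) / n
    theta = mean_psi / mean_delta
    # Generic influence values for a ratio of means (assumption, see lead-in).
    infl = [(p - theta * d) / mean_delta for p, d in zip(psi, delta)]
    variance = sum(v * v for v in infl) / n / n
    return theta, math.sqrt(variance)
```

For example, `ratio_estimator([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])` returns a point estimate of 2.0, because the denominator mean is one and the numerator mean is two.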

To support the implementation of the proposed methodology, we have developed the psmediate R package along with a brief vignette, which can be accessed at https://github.com/chaochengstat/psmediate and https://rpubs.com/chaocheng/psmediate.

Acknowledgement

This work is partially supported by the Patient-Centered Outcomes Research Institute® (PCORI® Award ME-2023C1-31350). We thank the World Health Organization’s European Centre for Environment and Health, Bonn office, for providing the WHO-LARES data. We thank Johan Steen for connecting us with the Bonn office to apply for data access. The statements in this article are solely the responsibility of the authors and do not necessarily represent the views of PCORI® or the World Health Organization.

References

  • Albert and Nelson (2011) Albert, J. M. and Nelson, S. (2011), “Generalized causal mediation analysis,” Biometrics, 67, 1028–1038.
  • Angrist et al. (1996) Angrist, J., Imbens, G., and Rubin, D. (1996), “Identification of causal effects using instrumental variables,” Journal of the American Statistical Association, 91, 444–455.
  • Bickel et al. (1993) Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993), Efficient and Adaptive Estimation for Semiparametric Models, Springer.
  • Chen et al. (2024) Chen, X., Harhay, M. O., Tong, G., and Li, F. (2024), “A Bayesian machine learning approach for estimating heterogeneous survivor causal effects: applications to a critical care trial,” The Annals of Applied Statistics, 18, 350.
  • Chen and White (1999) Chen, X. and White, H. (1999), “Improved rates and asymptotic normality for nonparametric neural network estimators,” IEEE Transactions on Information Theory, 45, 682–691.
  • Cheng et al. (2023a) Cheng, C., Guo, Y., Liu, B., Wruck, L., and Li, F. (2023a), “Multiply robust estimation for causal survival analysis with treatment noncompliance,” arXiv preprint arXiv:2305.13443.
  • Cheng et al. (2021) Cheng, C., Spiegelman, D., and Li, F. (2021), “Estimating the natural indirect effect and the mediation proportion via the product method,” BMC Medical Research Methodology, 21, 1–20.
  • Cheng et al. (2023b) Cheng, C., Spiegelman, D., and Li, F. (2023b), “Is the product method more efficient than the difference method for assessing mediation?” American Journal of Epidemiology, 192, 84–92.
  • Chernozhukov et al. (2018) Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal, 21.
  • Daniel et al. (2015) Daniel, R. M., De Stavola, B. L., Cousens, S., and Vansteelandt, S. (2015), “Causal mediation analysis with multiple mediators,” Biometrics, 71, 1–14.
  • Díaz et al. (2021) Díaz, I., Hejazi, N. S., Rudolph, K. E., and van Der Laan, M. J. (2021), “Nonparametric efficient causal mediation with intermediate confounders,” Biometrika, 108, 627–641.
  • Ding and Lu (2017) Ding, P. and Lu, J. (2017), “Principal stratification analysis using principal scores,” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 757–777.
  • Forastiere et al. (2018) Forastiere, L., Mattei, A., and Ding, P. (2018), “Principal ignorability in mediation analysis: through and beyond sequential ignorability,” Biometrika, 105, 979–986.
  • Frangakis and Rubin (2002) Frangakis, C. E. and Rubin, D. B. (2002), “Principal stratification in causal inference,” Biometrics, 58, 21–29.
  • Frölich and Huber (2017) Frölich, M. and Huber, M. (2017), “Direct and indirect treatment effects–causal chains and mediation analysis with instrumental variables,” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 1645–1666.
  • Frölich and Melly (2013) Frölich, M. and Melly, B. (2013), “Identification of treatment effects on the treated with one-sided non-compliance,” Econometric Reviews, 32, 384–414.
  • Hines et al. (2022) Hines, O., Dukes, O., Diaz-Ordaz, K., and Vansteelandt, S. (2022), “Demystifying statistical learning based on efficient influence functions,” The American Statistician, 76, 292–304.
  • Imai et al. (2010) Imai, K., Keele, L., and Yamamoto, T. (2010), “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects,” Statistical Science, 25, 51–71.
  • Jiang et al. (2022) Jiang, Z., Yang, S., and Ding, P. (2022), “Multiply robust estimation of causal effects under principal ignorability,” Journal of the Royal Statistical Society Series B: Statistical Methodology, 84, 1423–1445.
  • Jo and Stuart (2009) Jo, B. and Stuart, E. A. (2009), “On the use of propensity scores in principal causal effect estimation,” Statistics in Medicine, 28, 2857–2875.
  • Kahan et al. (2023) Kahan, B. C., Cro, S., Li, F., and Harhay, M. O. (2023), “Eliminating ambiguous treatment effects using estimands,” American Journal of Epidemiology, 192, 987–994.
  • Kang and Schafer (2007) Kang, J. D. and Schafer, J. L. (2007), “Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data,” Statistical Science, 22, 523–539.
  • Kennedy et al. (2020) Kennedy, E. H., Balakrishnan, S., and G’Sell, M. (2020), “Sharp instruments for classifying compliers and generalizing causal effects,” The Annals of Statistics, 48, 2008–2030.
  • Kim et al. (2017) Kim, C., Daniels, M. J., Marcus, B. H., and Roy, J. A. (2017), “A framework for Bayesian nonparametric inference for causal effects of mediation,” Biometrics, 73, 401–409.
  • Luo et al. (2016) Luo, Y., Spindler, M., and Kück, J. (2016), “High-Dimensional $L_{2}$ Boosting: Rate of Convergence,” arXiv preprint arXiv:1602.08927.
  • Merchant et al. (2021) Merchant, A. T., Liu, J., Reynolds, M. A., Beck, J. D., and Zhang, J. (2021), “Quantile regression to estimate the survivor average causal effect of periodontal treatment effects on birthweight and gestational age,” Journal of Periodontology, 92, 975–982.
  • Michalowicz et al. (2006) Michalowicz, B. S., Hodges, J. S., DiAngelis, A. J., Lupo, V. R., Novak, M. J., Ferguson, J. E., Buchanan, W., Bofill, J., Papapanou, P. N., Mitchell, D. A., et al. (2006), “Treatment of periodontal disease and the risk of preterm birth,” New England Journal of Medicine, 355, 1885–1894.
  • Miles et al. (2017) Miles, C. H., Shpitser, I., Kanki, P., Meloni, S., and Tchetgen Tchetgen, E. J. (2017), “Quantifying an adherence path-specific effect of antiretroviral therapy in the Nigeria PEPFAR program,” Journal of the American Statistical Association, 112, 1443–1452.
  • Miles et al. (2020) — (2020), “On semiparametric estimation of a path-specific effect in the presence of mediator-outcome confounding,” Biometrika, 107, 159–172.
  • Nguyen et al. (2021) Nguyen, T. Q., Schmid, I., and Stuart, E. A. (2021), “Clarifying causal mediation analysis for the applied researcher: Defining effects based on what we want to learn.” Psychological Methods, 26, 255.
  • Page et al. (2015) Page, L. C., Feller, A., Grindal, T., Miratrix, L., and Somers, M.-A. (2015), “Principal stratification: A tool for understanding variation in program effects across endogenous subgroups,” American Journal of Evaluation, 36, 514–531.
  • Park and Kürüm (2018) Park, S. and Kürüm, E. (2018), “Causal mediation analysis with multiple mediators in the presence of treatment noncompliance,” Statistics in Medicine, 37, 1810–1829.
  • Park and Kürüm (2020) — (2020), “A two-stage joint modeling method for causal mediation analysis in the presence of treatment noncompliance,” Journal of Causal Inference, 8, 131–149.
  • Park and Palardy (2020) Park, S. and Palardy, G. J. (2020), “Sensitivity evaluation of methods for estimating complier average causal mediation effects to assumptions,” Journal of Educational and Behavioral Statistics, 45, 475–506.
  • Price et al. (1992) Price, R. H., Van Ryn, M., and Vinokur, A. D. (1992), “Impact of a preventive job search intervention on the likelihood of depression among the unemployed,” Journal of Health and Social Behavior, 158–167.
  • Qin and Wang (2024) Qin, X. and Wang, L. (2024), “Causal moderated mediation analysis: Methods and software,” Behavior Research Methods, 56, 1314–1334.
  • Robins and Richardson (2010) Robins, J. M. and Richardson, T. S. (2010), “Alternative graphical causal models and the identification of direct effects,” in Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures, 103–158.
  • Rudolph et al. (2024) Rudolph, K. E., Williams, N., and Díaz, I. (2024), “Using instrumental variables to address unmeasured confounding in causal mediation analysis,” Biometrics, 80, ujad037.
  • Steen et al. (2017) Steen, J., Loeys, T., Moerkerke, B., and Vansteelandt, S. (2017), “Flexible mediation analysis with multiple mediators,” American Journal of Epidemiology, 186, 184–193.
  • Stuart and Jo (2015) Stuart, E. A. and Jo, B. (2015), “Assessing the sensitivity of methods for estimating principal causal effects,” Statistical Methods in Medical Research, 24, 657–674.
  • Tchetgen Tchetgen and Shpitser (2012) Tchetgen Tchetgen, E. J. and Shpitser, I. (2012), “Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis,” Annals of Statistics, 40, 1816.
  • Tchetgen Tchetgen and VanderWeele (2014) Tchetgen Tchetgen, E. J. and VanderWeele, T. J. (2014), “On identification of natural direct effects when a confounder of the mediator is directly affected by exposure,” Epidemiology (Cambridge, Mass.), 25, 282.
  • Van der Laan et al. (2007) Van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007), “Super learner,” Statistical Applications in Genetics and Molecular Biology, 6.
  • VanderWeele (2011) VanderWeele, T. J. (2011), “Principal stratification–uses and limitations,” The International Journal of Biostatistics, 7.
  • VanderWeele and Vansteelandt (2009) VanderWeele, T. J. and Vansteelandt, S. (2009), “Conceptual issues concerning mediation, interventions and composition,” Statistics and its Interface, 2, 457–468.
  • VanderWeele and Vansteelandt (2010) — (2010), “Odds ratios for mediation analysis for a dichotomous outcome,” American Journal of Epidemiology, 172, 1339–1348.
  • VanderWeele et al. (2014) VanderWeele, T. J., Vansteelandt, S., and Robins, J. M. (2014), “Effect decomposition in the presence of an exposure-induced mediator-outcome confounder,” Epidemiology (Cambridge, Mass.), 25, 300.
  • Wager and Walther (2015) Wager, S. and Walther, G. (2015), “Adaptive concentration of regression trees, with application to random forests,” arXiv preprint arXiv:1503.06388.
  • Xia and Chan (2021) Xia, F. and Chan, K. C. G. (2021), “Identification, semiparametric efficiency, and quadruply robust estimation in mediation analysis with treatment-induced confounding,” Journal of the American Statistical Association, 1–10.
  • Yamamoto (2013) Yamamoto, T. (2013), “Identification and estimation of causal mediation effects with treatment noncompliance,” Technical Report.
  • Zhou (2022) Zhou, X. (2022), “Semiparametric estimation for causal mediation analysis with multiple causally ordered mediators,” Journal of the Royal Statistical Society Series B: Statistical Methodology, 84, 794–821.

Supplementary Material to “Identification and multiply robust estimation in mediation analysis across principal strata”

Section A provides an additional motivating example. Section B provides practical strategies on specification of the parametric working models. In Section C, we provide a semiparametric sensitivity analysis framework for the principal ignorability assumption and ignorability of the mediator assumption. In Section D, we provide the proofs for all theorems, propositions, and remarks in the main manuscript. In Section E, we present Supplementary Material Tables and Figures.

A An additional motivating example

Example 3

(Mediation analysis with death-truncated mediator and outcome) Consider the case in which the mediator and outcome are truncated by a terminal event that occurs before the mediator is measured, and no other terminal event occurs between the mediator and the outcome. One concrete example is the Obstetrics and Periodontal Therapy (OPT) trial (Michalowicz et al., 2006), where one may examine the role of gestational age ($M$) in mediating the effect of a periodontal treatment during pregnancy ($Z$) on birthweight ($Y$), but $M$ and $Y$ are only measured if the infant is born alive. Here, the survival status of the infant ($D$) serves as a terminal event, and $M$ and $Y$ are not well defined for units that do not survive ($D=0$) (Merchant et al., 2021). In this scenario, it is of interest to estimate the average treatment effect and the mediation effect among the subset of infants who would survive regardless of the treatment (i.e., the always-survivor stratum). Specifically, the average treatment effect among always-survivors (also referred to as the survivor average causal effect) can help answer the central research question in the OPT trial of whether the periodontal therapy has an adverse or positive effect on a newborn infant’s birthweight (Michalowicz et al., 2006), without the complications due to death as a terminal event. As a next step, by investigating the principal natural mediation effect within always-survivors, one can clarify the role of gestational age in explaining the survivor average causal effect.

B Specification of the parametric working models

We can specify working models $h_{nuisance}^{\text{par}}=\{\pi_{z}^{\text{par}}(\bm{x}),p_{zd}^{\text{par}}(\bm{x}),r_{zd}^{\text{par}}(m,\bm{x}),\mu_{zd}^{\text{par}}(m,\bm{x})\}$ for $h_{nuisance}$. Specification of $h_{nuisance}^{\text{par}}$ can be flexible, and we provide one example below. This specification strategy is also used in our simulation and application studies.

For $\pi_{z}^{\text{par}}(\bm{x})$, one can consider $f_{Z|\bm{X}}(1|\bm{X})=\text{expit}\left(\bm{\alpha}^{T}[1,\bm{X}^{T}]\right)$ as a logistic regression with coefficients $\bm{\alpha}$ such that $\pi_{z}^{\text{par}}(\bm{x})=\left\{\text{expit}\left(\bm{\alpha}^{T}[1,\bm{x}^{T}]\right)\right\}^{z}\left\{1-\text{expit}\left(\bm{\alpha}^{T}[1,\bm{x}^{T}]\right)\right\}^{1-z}$, where $\text{expit}(x)=\frac{1}{1+\exp(-x)}$ is the logistic function. Specification of $p_{zd}^{\text{par}}(\bm{x})$ differs between the one-sided and two-sided noncompliance scenarios. Under two-sided noncompliance, one can consider $f_{D|Z,\bm{X}}(1|Z,\bm{X})=\text{expit}\left(\bm{\beta}^{T}[1,Z,\bm{X}^{T}]\right)$ as a logistic regression with coefficients $\bm{\beta}$, leading to $p_{zd}^{\text{par}}(\bm{x})=\left\{\text{expit}\left(\bm{\beta}^{T}[1,z,\bm{x}^{T}]\right)\right\}^{d}\left\{1-\text{expit}\left(\bm{\beta}^{T}[1,z,\bm{x}^{T}]\right)\right\}^{1-d}$. Under one-sided noncompliance, we already know that $p_{0d}(\bm{x})\equiv 1-d$ by strong monotonicity, and therefore we can fix $p_{0d}^{\text{par}}(\bm{x})=1-d$ and only specify a working model for $p_{1d}^{\text{par}}(\bm{x})$; for example, one can consider $f_{D|Z,\bm{X}}(1|1,\bm{X})=\text{expit}\left(\bm{\beta}^{T}[1,\bm{X}^{T}]\right)$ such that $p_{1d}^{\text{par}}(\bm{x})=\left\{\text{expit}\left(\bm{\beta}^{T}[1,\bm{x}^{T}]\right)\right\}^{d}\left\{1-\text{expit}\left(\bm{\beta}^{T}[1,\bm{x}^{T}]\right)\right\}^{1-d}$.
If $M$ is binary, we can further consider $f_{M|Z,D,\bm{X}}(1|Z,D,\bm{X})=\text{expit}\left(\bm{\gamma}^{T}[1,Z,D,\bm{X}^{T}]\right)$ as a logistic regression with coefficients $\bm{\gamma}$ such that $r_{zd}^{\text{par}}(m,\bm{x})=\left\{\text{expit}\left(\bm{\gamma}^{T}[1,z,d,\bm{x}^{T}]\right)\right\}^{m}\left\{1-\text{expit}\left(\bm{\gamma}^{T}[1,z,d,\bm{x}^{T}]\right)\right\}^{1-m}$. For a continuous $M$, a feasible working model is $M|Z,D,\bm{X}\sim N(\bm{\gamma}^{T}[1,Z,D,\bm{X}^{T}],\sigma_{\gamma}^{2})$, which implies that $r_{zd}^{\text{par}}(m,\bm{x})=\phi(m;\bm{\gamma}^{T}[1,z,d,\bm{x}^{T}],\sigma_{\gamma}^{2})$, where $\phi(\cdot;\mu,\sigma^{2})$ is the density function of $N(\mu,\sigma^{2})$. When $Y$ is binary or continuous, one can specify $\mathbb{E}[Y|Z,D,M,\bm{X}]=\text{expit}\left(\bm{\kappa}^{T}[1,Z,D,M,\bm{X}^{T}]\right)$ or $\bm{\kappa}^{T}[1,Z,D,M,\bm{X}^{T}]$, respectively, with coefficients $\bm{\kappa}$, leading to $\mu_{zd}^{\text{par}}(m,\bm{x};\bm{\kappa})=\text{expit}\left(\bm{\kappa}^{T}[1,z,d,m,\bm{x}^{T}]\right)$ or $\bm{\kappa}^{T}[1,z,d,m,\bm{x}^{T}]$. The parameters in the parametric working models can be estimated by maximum likelihood, which yields $\{\widehat{\bm{\alpha}},\widehat{\bm{\beta}},\widehat{\bm{\gamma}},\widehat{\sigma}_{\gamma}^{2},\widehat{\bm{\kappa}}\}$.
Estimators of the nuisance functions are therefore $\widehat{h}_{nuisance}^{\text{par}}=\left\{\widehat{\pi}_{z}^{\text{par}}(\bm{x}),\widehat{p}_{zd}^{\text{par}}(\bm{x}),\widehat{r}_{zd}^{\text{par}}(m,\bm{x}),\widehat{\mu}_{zd}^{\text{par}}(m,\bm{x})\right\}$, which is $h_{nuisance}^{\text{par}}$ evaluated at $\{\widehat{\bm{\alpha}},\widehat{\bm{\beta}},\widehat{\bm{\gamma}},\widehat{\sigma}_{\gamma}^{2},\widehat{\bm{\kappa}}\}$.
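
As an illustration only, the working models above can be evaluated in a few lines of Python once coefficient values are available. The sketch below assumes the coefficients have already been estimated (for example by maximum likelihood); the function names (`pi_par`, `p_par`, `r_par_binary`, `r_par_gaussian`, `mu_par`) are hypothetical and are not taken from the paper or any accompanying software.

```python
import numpy as np

def expit(u):
    """Logistic function 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

def bernoulli_pmf(prob_one, value):
    """P(V = value) for a binary V with P(V = 1) = prob_one."""
    return prob_one ** value * (1.0 - prob_one) ** (1 - value)

def pi_par(z, x, alpha):
    """Treatment model pi_z(x); alpha multiplies [1, x]."""
    return bernoulli_pmf(expit(alpha @ np.concatenate(([1.0], x))), z)

def p_par(z, d, x, beta):
    """Two-sided noncompliance model p_zd(x); beta multiplies [1, z, x]."""
    return bernoulli_pmf(expit(beta @ np.concatenate(([1.0, z], x))), d)

def r_par_binary(z, d, m, x, gamma):
    """Binary-mediator model r_zd(m, x); gamma multiplies [1, z, d, x]."""
    return bernoulli_pmf(expit(gamma @ np.concatenate(([1.0, z, d], x))), m)

def r_par_gaussian(z, d, m, x, gamma, sigma2):
    """Continuous-mediator model: normal density evaluated at m."""
    mean = gamma @ np.concatenate(([1.0, z, d], x))
    return np.exp(-(m - mean) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def mu_par(z, d, m, x, kappa, binary_outcome):
    """Outcome model mu_zd(m, x): logistic if Y is binary, linear otherwise."""
    lin = kappa @ np.concatenate(([1.0, z, d, m], x))
    return expit(lin) if binary_outcome else lin
```

Under one-sided noncompliance, one would instead fix the $z=0$ component at $1-d$ and fit the model for $D$ among treated units only, as described above.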

C Sensitivity analysis

The principal ignorability assumption (Assumption 4) and the ignorability of the mediator assumption (Assumption 5) are required for the identification of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ in Theorem 1. However, these two assumptions are not empirically verifiable based on the observed data and may not hold in randomized experiments. We propose sensitivity analysis strategies to assess the impact of violations of these two assumptions on inference about $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$. When evaluating the sensitivity to violation of one specific assumption, we assume that all other structural assumptions hold. To fix ideas, we take the mediator $M$ to be a multi-valued variable with finite support $M\in\{0,1,\dots,m_{\max}\}$; the methodology can be generalized to a continuous mediator.

C.1 Sensitivity analysis for the principal ignorability assumption

We first focus on the scenario under standard monotonicity, and methods under strong monotonicity are discussed at the end of this section. To begin with, we notice that Theorem 1 holds under a weaker version of Assumption 4 which consists of two statements:

  • (i) $\mathbb{E}_{Y_{1m}|U,\bm{X}}[Y_{1m}|10,\bm{X}]=\mathbb{E}_{Y_{1m}|U,\bm{X}}[Y_{1m}|11,\bm{X}]$ and $\mathbb{E}_{Y_{0m}|U,\bm{X}}[Y_{0m}|10,\bm{X}]=\mathbb{E}_{Y_{0m}|U,\bm{X}}[Y_{0m}|00,\bm{X}]$ for any $m\in\{0,\dots,m_{\max}\}$.

  • (ii) $f_{M_{1}|U,\bm{X}}(m|10,\bm{X})=f_{M_{1}|U,\bm{X}}(m|11,\bm{X})$ and $f_{M_{0}|U,\bm{X}}(m|10,\bm{X})=f_{M_{0}|U,\bm{X}}(m|00,\bm{X})$ for any $m\in\{1,\dots,m_{\max}\}$.

Statement (i) requires that the expectation of $Y_{1m}$ is the same in the complier and always-taker strata and that the expectation of $Y_{0m}$ is the same in the complier and never-taker strata, conditional on all observed covariates. Because the mediator probabilities sum to one over $m\in\{0,\dots,m_{\max}\}$, Statement (ii) also implies that $f_{M_{1}|U,\bm{X}}(0|10,\bm{X})=f_{M_{1}|U,\bm{X}}(0|11,\bm{X})$ and $f_{M_{0}|U,\bm{X}}(0|10,\bm{X})=f_{M_{0}|U,\bm{X}}(0|00,\bm{X})$. Therefore, Statement (ii) requires that the distribution of $M_{1}$ is the same in the complier and always-taker strata and that the distribution of $M_{0}$ is the same in the complier and never-taker strata, conditional on all observed covariates. Our sensitivity analysis is based on the following confounding functions, which measure departure from the weaker version of principal ignorability:

$$
\begin{aligned}
\xi_{Y}^{(1)}(m,\bm{x})&=\frac{\mathbb{E}_{Y_{1m}|U,\bm{X}}[Y_{1m}|10,\bm{x}]}{\mathbb{E}_{Y_{1m}|U,\bm{X}}[Y_{1m}|11,\bm{x}]},\quad \xi_{Y}^{(0)}(m,\bm{x})=\frac{\mathbb{E}_{Y_{0m}|U,\bm{X}}[Y_{0m}|10,\bm{x}]}{\mathbb{E}_{Y_{0m}|U,\bm{X}}[Y_{0m}|00,\bm{x}]}\quad\text{for } m=0,\dots,m_{\max},\\
\xi_{M}^{(1)}(m,\bm{x})&=\frac{f_{M_{1}|U,\bm{X}}(m|10,\bm{x})}{f_{M_{1}|U,\bm{X}}(m|11,\bm{x})},\quad \xi_{M}^{(0)}(m,\bm{x})=\frac{f_{M_{0}|U,\bm{X}}(m|10,\bm{x})}{f_{M_{0}|U,\bm{X}}(m|00,\bm{x})}\quad\text{for } m=1,\dots,m_{\max}.
\end{aligned}
$$

The first two confounding functions measure deviations from principal ignorability in the outcome variable, where $\xi_{Y}^{(1)}(m,\bm{x})$ is the ratio of the mean of $Y_{1m}$ among compliers versus always-takers and $\xi_{Y}^{(0)}(m,\bm{x})$ is the ratio of the mean of $Y_{0m}$ among compliers versus never-takers, conditional on $\bm{X}=\bm{x}$. The last two confounding functions measure deviations from principal ignorability in the mediator variable, where $\xi_{M}^{(1)}(m,\bm{x})$ is the relative risk of compliers against always-takers on the treated potential mediator at level $m$ and $\xi_{M}^{(0)}(m,\bm{x})$ is the relative risk of compliers against never-takers on the control potential mediator at level $m$, conditional on $\bm{X}=\bm{x}$. Notice that $\xi_{M}^{(1)}(m,\bm{x})$ and $\xi_{M}^{(0)}(m,\bm{x})$ are only specified for $m\geq 1$; these values determine $\xi_{M}^{(1)}(0,\bm{x}):=\frac{f_{M_{1}|U,\bm{X}}(0|10,\bm{x})}{f_{M_{1}|U,\bm{X}}(0|11,\bm{x})}$ and $\xi_{M}^{(0)}(0,\bm{x}):=\frac{f_{M_{0}|U,\bm{X}}(0|10,\bm{x})}{f_{M_{0}|U,\bm{X}}(0|00,\bm{x})}$, as shown in Section D.8, where we provide the following explicit expressions in terms of $\{\xi_{M}^{(1)}(m,\bm{x}),\xi_{M}^{(0)}(m,\bm{x})\text{ for }m=1,\dots,m_{\max}\}$:

$$
\begin{aligned}
\xi_{M}^{(1)}(0,\bm{x})&=\frac{1-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(1)}(j,\bm{x})p_{11}(\bm{x})}{\xi_{M}^{(1)}(j,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))+p_{01}(\bm{x})}r_{11}(j,\bm{x})}{1-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{11}(\bm{x})}{\xi_{M}^{(1)}(j,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))+p_{01}(\bm{x})}r_{11}(j,\bm{x})},\\
\xi_{M}^{(0)}(0,\bm{x})&=\frac{1-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,\bm{x})p_{00}(\bm{x})}{\xi_{M}^{(0)}(j,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))+p_{10}(\bm{x})}r_{00}(j,\bm{x})}{1-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{00}(\bm{x})}{\xi_{M}^{(0)}(j,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))+p_{10}(\bm{x})}r_{00}(j,\bm{x})}.
\end{aligned}
$$
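
To make the first expression concrete, the following is a minimal numerical sketch at a single covariate value $\bm{x}$. The function name `xi_m1_at_zero` and its argument layout are assumptions made for illustration; the inputs are the user-specified values $\xi_{M}^{(1)}(j,\bm{x})$ for $j=1,\dots,m_{\max}$ together with $p_{11}(\bm{x})$, $p_{01}(\bm{x})$, and $r_{11}(j,\bm{x})$.

```python
import numpy as np

def xi_m1_at_zero(xi, p11, p01, r11):
    """Evaluate xi_M^{(1)}(0, x) from the displayed expression.

    xi[j-1]  : assumed value of xi_M^{(1)}(j, x), j = 1..m_max
    r11[j-1] : r_11(j, x), j = 1..m_max
    p11, p01 : p_11(x) and p_01(x) at the same covariate value
    """
    xi = np.asarray(xi, dtype=float)
    r11 = np.asarray(r11, dtype=float)
    # Common denominator of each summand: xi * (p11 - p01) + p01.
    denom = xi * (p11 - p01) + p01
    numerator = 1.0 - np.sum(xi * p11 / denom * r11)
    denominator = 1.0 - np.sum(p11 / denom * r11)
    return numerator / denominator
```

When every $\xi_{M}^{(1)}(j,\bm{x})$ equals 1, the numerator and denominator coincide and the function returns 1, which matches the implication of Statement (ii). The expression for $\xi_{M}^{(0)}(0,\bm{x})$ can be coded in the same way with $p_{00}$, $p_{10}$, and $r_{00}$.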

Theorem 1 holds if all sensitivity functions in $\xi=\Big\{\left(\xi_{M}^{(1)}(m,\bm{x}),\xi_{M}^{(0)}(m,\bm{x})\right)\text{ for }m=1,\dots,m_{\max}\text{ and }\left(\xi_{Y}^{(1)}(m,\bm{x}),\xi_{Y}^{(0)}(m,\bm{x})\right)\text{ for }m=0,\dots,m_{\max}\Big\}$ are equal to 1. The following proposition generalizes Theorem 1 to the scenario in which at least one confounding function has a value different from 1.

Proposition S1

Suppose that Assumptions 1, 2, 3a, 5, and 6 hold and that the values of the confounding functions ($\xi$) are known. Then we can identify $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ by

$$
\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\int_{\bm{x}}\frac{e_{d_{1}d_{0}}(\bm{x})}{e_{d_{1}d_{0}}}\left\{\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,\bm{x})\mu_{zd_{z}}(m,\bm{x})r_{z^{\prime}d_{z^{\prime}}}(m,\bm{x})\right\}\text{d}\mathbb{P}_{\bm{X}}(\bm{x}),
$$

for any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$. Here $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,\bm{x})$ is a sensitivity weight defined in Section D.8, which depends on the confounding functions $\xi$ and the observed-data nuisance functions $p_{zd}(\bm{x})$ and $r_{zd}(m,\bm{x})$. As an example,

$$
w_{10}^{(10)}(m,\bm{x})=\begin{cases}
\displaystyle\frac{\xi_{M}^{(0)}(m,\bm{x})p_{00}(\bm{x})}{\xi_{M}^{(0)}(m,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))+p_{10}(\bm{x})}\,\frac{\xi_{M}^{(1)}(m,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))+p_{01}(\bm{x})}{p_{01}(\bm{x})/\xi_{Y}^{(1)}(m,\bm{x})+\xi_{M}^{(1)}(m,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))}, & \text{if } m\geq 1,\\[2ex]
\left\{\displaystyle\frac{1}{r_{00}(0,\bm{x})}-\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,\bm{x})p_{00}(\bm{x})r_{00}(j,\bm{x})/r_{00}(0,\bm{x})}{\xi_{M}^{(0)}(j,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))+p_{10}(\bm{x})}\right\}\displaystyle\frac{\xi_{M}^{(1)}(0,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))+p_{01}(\bm{x})}{p_{01}(\bm{x})/\xi_{Y}^{(1)}(0,\bm{x})+\xi_{M}^{(1)}(0,\bm{x})(p_{11}(\bm{x})-p_{01}(\bm{x}))}, & \text{if } m=0.
\end{cases}
$$
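
The example weight $w_{10}^{(10)}(m,\bm{x})$ can be evaluated numerically as in the sketch below, which works at a single covariate value. The function name `w10_10` and the array layout are hypothetical. The sketch assumes that $p_{10}(\bm{x})=1-p_{11}(\bm{x})$ and $p_{00}(\bm{x})=1-p_{01}(\bm{x})$, since $p_{zd}(\bm{x})$ is the conditional probability of $D=d$ given $Z=z$, and that the caller supplies $\xi_{M}^{(1)}(0,\bm{x})$ computed from its explicit expression.

```python
import numpy as np

def w10_10(m, xi_m0, xi_m1, xi_y1, p11, p01, r00):
    """Sensitivity weight w_10^{(10)}(m, x) at one covariate value.

    All arrays are indexed by mediator level 0..m_max:
      xi_m0[j] = xi_M^{(0)}(j, x)  (entry 0 is not used)
      xi_m1[j] = xi_M^{(1)}(j, x)  (entry 0 is the derived value)
      xi_y1[j] = xi_Y^{(1)}(j, x)
      r00[j]   = r_00(j, x)
    """
    xi_m0, xi_m1, xi_y1, r00 = (
        np.asarray(a, dtype=float) for a in (xi_m0, xi_m1, xi_y1, r00)
    )
    p10, p00 = 1.0 - p11, 1.0 - p01  # complements of p_11(x) and p_01(x)
    e10 = p11 - p01                  # the difference p_11(x) - p_01(x)
    # Second factor, shared by both branches of the case expression.
    second = (xi_m1[m] * e10 + p01) / (p01 / xi_y1[m] + xi_m1[m] * e10)
    if m >= 1:
        first = xi_m0[m] * p00 / (xi_m0[m] * e10 + p10)
    else:
        j = slice(1, None)
        summed = np.sum(xi_m0[j] * p00 * r00[j] / (xi_m0[j] * e10 + p10))
        first = (1.0 - summed) / r00[0]
    return first * second
```

With all confounding functions equal to 1, the weight reduces to 1 at every level $m$, so the formula in Proposition S1 collapses to the unweighted expression.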

If $\xi$ is known, we can construct a new estimator of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ by carefully re-weighting each term in the original multiply robust estimator $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ by the sensitivity weight $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,\bm{x})$. Specifically, the new estimator, $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$, takes the following form:

$$
\begin{aligned}
\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)={}&\mathbb{P}_{n}\Big\{\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{par}}(\bm{X})\right\}}{\widehat{\pi}_{z^{*}}^{\text{par}}(\bm{X})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{par}}(\bm{X})\right\}}{\widehat{\pi}_{0}^{\text{par}}(\bm{X})}\right)\frac{\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}(\bm{X})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\\
&+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}(\bm{X})-k\widehat{p}_{01}^{\text{par}}(\bm{X})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widehat{p}_{zd_{z}}^{\text{par}}(\bm{X})\widehat{\pi}_{z}^{\text{par}}(\bm{X})}\frac{\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(M,\bm{X})}{\widehat{r}_{zd_{z}}^{\text{par}}(M,\bm{X})}\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,\bm{X})\left\{Y-\widehat{\mu}_{zd_{z}}^{\text{par}}(M,\bm{X})\right\}\\
&+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}(\bm{X})-k\widehat{p}_{01}^{\text{par}}(\bm{X})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widehat{p}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(\bm{X})\widehat{\pi}_{z^{\prime}}^{\text{par}}(\bm{X})}\left\{\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,\bm{X})\widehat{\mu}_{zd_{z}}^{\text{par}}(M,\bm{X})-\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}(\bm{X})\right\}\\
&+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}(\bm{X})-k\widehat{p}_{01}^{\text{par}}(\bm{X})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}(\bm{X})\Big\},
\end{aligned}
\tag{s8}
$$

where $\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}(\bm{x})=\displaystyle\sum_{m=0}^{m_{\max}}\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(m,\bm{x})\widehat{\mu}_{zd_{z}}^{\text{par}}(m,\bm{x})\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(m,\bm{x})$ and $\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(m,\bm{x})$ is $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,\bm{x})$ evaluated at $\left\{\widehat{p}_{zd}^{\text{par}}(\bm{x}),\widehat{r}_{zd}^{\text{par}}(m,\bm{x})\right\}$. The following proposition shows that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ is a doubly robust estimator under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

Proposition S2

Suppose that Assumptions 1, 2, 3a, 5, and 6 hold. Then, the estimator $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ is consistent and asymptotically normal for any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

In practice, the confounding functions in $\xi$ are unknown. To conduct the sensitivity analysis, one can specify a parametric form of $\xi$ indexed by a finite-dimensional parameter $\bm{\lambda}$, say $\xi_{\bm{\lambda}}$. Then, one can report $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi_{\bm{\lambda}})$ and its confidence intervals over a sequence of values of $\bm{\lambda}$, which summarizes how sensitive the inference is to an assumed departure from the principal ignorability assumption.

The above sensitivity analysis strategy can be easily extended to the scenario under strong monotonicity. Because there are no always-takers under strong monotonicity, we only need to quantify the departure from principal ignorability between the never-taker and complier strata; that is, only $\xi_{M}^{(0)}(m,\bm{x})$ and $\xi_{Y}^{(0)}(m,\bm{x})$ are needed for the sensitivity analysis. Similar to the construction of (s8), we can develop an estimator, $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)$, based on a set of confounding functions, $\kappa=\{\xi_{M}^{(0)}(m,\bm{x})\text{ for }m=1,\dots,m_{\max},~\xi_{Y}^{(0)}(m,\bm{x})\text{ for }m=0,\dots,m_{\max}\}$, and this estimator is consistent for $\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ for any $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$. Details of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)$ are given in Section D.9. Analogously, one can report $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa_{\bm{\lambda}})$ over a set of choices of $\bm{\lambda}$ to quantify the values of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under an assumed departure from principal ignorability, where $\kappa_{\bm{\lambda}}$ is a user-specified parametric form of $\kappa$.

C.2 Sensitivity analysis for the ignorability of the mediator assumption

We develop a sensitivity analysis framework to assess the extent to which the violation of Assumption 5 might affect the inference of $\theta_{d_{1}d_{0}}^{(10)}$; identification of $\theta_{d_{1}d_{0}}^{(11)}$ and $\theta_{d_{1}d_{0}}^{(00)}$, however, does not depend on Assumption 5. In Section D.10, we show that the expression of $\theta_{d_{1}d_{0}}^{(10)}$ in Theorem 1 holds under a weaker version of Assumption 5 such that $\mathbb{E}_{Y_{zm}|Z,M,U,\bm{X}}[Y_{zm}|z,m,d_{1}d_{0},\bm{X}]=\mathbb{E}_{Y_{zm}|Z,M,U,\bm{X}}[Y_{zm}|z,0,d_{1}d_{0},\bm{X}]$, for all $m>0$, $z\in\{0,1\}$, $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under standard monotonicity, and $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under strong monotonicity. This weaker assumption only requires mean independence between the potential outcome and the mediator conditional on the treatment assignment, principal strata, and baseline covariates. Recognizing the sufficiency of this weaker assumption, we propose the following sensitivity function to assess violations of the weaker version of Assumption 5:

$$
t(z,m,d_{1}d_{0},\bm{x})=\frac{\mathbb{E}_{Y_{1m}|Z,M,U,\bm{X}}[Y_{1m}|z,m,d_{1}d_{0},\bm{x}]}{\mathbb{E}_{Y_{1m}|Z,M,U,\bm{X}}[Y_{1m}|z,0,d_{1}d_{0},\bm{x}]},
$$

for $m\in\{0,1,\dots,m_{\max}\}$, where $t(z,0,d_{1}d_{0},\bm{x})\equiv 1$ by definition. If $t(z,m,d_{1}d_{0},\bm{x})$ differs from 1, then the identification formula of $\theta^{(10)}_{d_{1}d_{0}}$ in Theorem 1 no longer holds. The following proposition generalizes Theorem 1 to the scenario with a known $t(z,m,d_{1}d_{0},\bm{x})$.

Proposition S3

Suppose that Assumptions 1–4 and 6 hold. Based on the confounding function $t(z,m,d_{1}d_{0},\bm{x})$, we can identify $\theta^{(10)}_{d_{1}d_{0}}$ as

$$
\theta^{(10)}_{d_{1}d_{0}}=\int_{\bm{x}}\frac{e_{d_{1}d_{0}}(\bm{x})}{e_{d_{1}d_{0}}}\left\{\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})\mu_{1d_{1}}(m,\bm{x})r_{0d_{0}}(m,\bm{x})\right\}\text{d}\mathbb{P}_{\bm{X}}(\bm{x}),
$$

for any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under standard monotonicity and any $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under strong monotonicity, where

$$
\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})=\left\{\sum_{j=0}^{m_{\max}}\frac{t(1,j,d_{1}d_{0},\bm{x})}{t(1,m,d_{1}d_{0},\bm{x})}r_{1d_{1}}(j,\bm{x})\right\}\Big/\left\{\sum_{j=0}^{m_{\max}}\frac{t(0,j,d_{1}d_{0},\bm{x})}{t(0,m,d_{1}d_{0},\bm{x})}r_{0d_{0}}(j,\bm{x})\right\}
$$

is the sensitivity weight, which depends on the confounding function $t(z,m,d_{1}d_{0},\bm{x})$ and the observed-data nuisance function $r_{zd}(m,\bm{x})$.
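
The sensitivity weight $\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})$ in Proposition S3 is a ratio of two weighted sums and is straightforward to evaluate at a fixed covariate value and stratum. The sketch below is illustrative only; the function name `rho_10` and the array layout are assumptions, with `t1[j]` and `t0[j]` holding user-specified values of $t(1,j,d_{1}d_{0},\bm{x})$ and $t(0,j,d_{1}d_{0},\bm{x})$.

```python
import numpy as np

def rho_10(m, t1, t0, r1, r0):
    """Sensitivity weight rho_{d1 d0}^{(10)}(m, x) at one covariate value.

    Arrays are indexed by mediator level j = 0..m_max:
      t1[j] = t(1, j, d1d0, x),  t0[j] = t(0, j, d1d0, x)
      r1[j] = r_{1 d1}(j, x),    r0[j] = r_{0 d0}(j, x)
    By definition t1[0] = t0[0] = 1.
    """
    t1, t0, r1, r0 = (np.asarray(a, dtype=float) for a in (t1, t0, r1, r0))
    numerator = np.sum(t1 / t1[m] * r1)    # sum over j under z = 1
    denominator = np.sum(t0 / t0[m] * r0)  # sum over j under z = 0
    return numerator / denominator
```

If $t\equiv 1$, both sums equal 1 because the mediator probabilities sum to one, so the weight is 1 and the identification formula of Theorem 1 is recovered.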

If the sensitivity function $t=t(z,m,d_{1}d_{0},\bm{x})$ is known, we show in Section D.10 that a consistent estimator of $\theta^{(10)}_{d_{1}d_{0}}$ can be obtained by re-weighting each term in the multiply robust estimator by the sensitivity weight $\rho_{d_{1}d_{0}}^{(10)}(m,\bm{X})$; the resulting estimator takes the following form:

$$
\begin{aligned}
\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)={}&\mathbb{P}_{n}\Big\{\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{par}}(\bm{X})\right\}}{\widehat{\pi}_{z^{*}}^{\text{par}}(\bm{X})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{par}}(\bm{X})\right\}}{\widehat{\pi}_{0}^{\text{par}}(\bm{X})}\right)\frac{\widehat{\eta}_{10}^{\rho,\text{par}}(\bm{X})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\\
&+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}(\bm{X})-k\widehat{p}_{01}^{\text{par}}(\bm{X})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{1},Z=1)}{\widehat{p}_{1d_{1}}^{\text{par}}(\bm{X})\widehat{\pi}_{1}^{\text{par}}(\bm{X})}\frac{\widehat{r}_{0d_{0}}^{\text{par}}(M,\bm{X})}{\widehat{r}_{1d_{1}}^{\text{par}}(M,\bm{X})}\widehat{\rho}_{d_{1}d_{0}}^{(10)}(M,\bm{X})\left\{Y-\widehat{\mu}_{1d_{1}}^{\text{par}}(M,\bm{X})\right\}\\
&+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}(\bm{X})-k\widehat{p}_{01}^{\text{par}}(\bm{X})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{0},Z=0)}{\widehat{p}_{0d_{0}}^{\text{par}}(\bm{X})\widehat{\pi}_{0}^{\text{par}}(\bm{X})}\left\{\widehat{\rho}_{d_{1}d_{0}}^{(10)}(M,\bm{X})\widehat{\mu}_{1d_{1}}^{\text{par}}(M,\bm{X})-\widehat{\eta}_{10}^{\rho,\text{par}}(\bm{X})\right\}\\
&+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}(\bm{X})-k\widehat{p}_{01}^{\text{par}}(\bm{X})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\widehat{\eta}_{10}^{\rho,\text{par}}(\bm{X})\Big\},
\end{aligned}
$$

where $\widehat{\eta}_{10}^{\rho,\text{par}}(\bm{X})=\displaystyle\sum_{m=0}^{m_{\max}}\widehat{\rho}_{d_{1}d_{0}}^{(10)}(m,\bm{X})\widehat{\mu}_{1d_{1}}^{\text{par}}(m,\bm{X})\widehat{r}_{0d_{0}}^{\text{par}}(m,\bm{X})$ and $\widehat{\rho}_{d_{1}d_{0}}^{(10)}(m,\bm{x})$ is $\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})$ evaluated at $\widehat{r}_{zd}^{\text{par}}(m,\bm{x})$. The following proposition shows that $\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)$ is a triply robust estimator under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

Proposition S4

Suppose that Assumptions 1–4 and 6 hold. Then, under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)$ is consistent and asymptotically normal for any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under standard monotonicity and any $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under strong monotonicity.

To conduct the sensitivity analysis, one can specify a parametric form of $t$ indexed by a finite-dimensional parameter $\bm{\zeta}$, $t_{\bm{\zeta}}=t(z,m,d_{1}d_{0},\bm{x};\bm{\zeta})$. Then, one can report $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(t_{\bm{\zeta}})$ over a range of choices of $\bm{\zeta}$, which captures the sensitivity of the conclusion to departures from Assumption 5.

C.3 Illustration of the sensitivity analysis framework based on the JOBS II study

This section revisits the JOBS II study in Section 6.1 to assess the robustness of our conclusions to violations of the proposed structural assumptions. The ignorability assumption of treatment assignment (Assumption 2) and the strong monotonicity assumption (Assumption 3) are satisfied in the JOBS II study by design, but principal ignorability (Assumption 4) and ignorability of the mediator (Assumption 5) are generally not empirically verifiable without additional data. Hence, we apply the proposed sensitivity analysis framework to assess the robustness of the estimated principal natural mediation effects to violations of Assumptions 4 and 5, separately. For illustration, we only assess the range of the estimated principal natural mediation effects within the compliers stratum. While examining the violation of one assumption, we assume all other assumptions hold.

C.3.1 Sensitivity analysis for principal ignorability

As discussed in Section C.1, under a one-sided noncompliance scenario (so strong monotonicity holds) with a binary mediator, the confounding functions $\kappa=\{\xi_{M}^{(0)}(1,\bm{x}),\xi_{Y}^{(0)}(1,\bm{x})\}$ can be used to measure the extent of deviation from the principal ignorability assumption. Specifically, $\xi_{M}^{(0)}(1,\bm{x})=\frac{f_{M_{0}|U,\bm{X}}(1|10,\bm{x})}{f_{M_{0}|U,\bm{X}}(1|00,\bm{x})}$ is the relative risk, comparing compliers with never-takers, for the sense of mastery under the control condition, and $\xi_{Y}^{(0)}(1,\bm{x})=\frac{E_{Y_{0m}|U,\bm{X}}[Y_{0m}|10,\bm{x}]}{E_{Y_{0m}|U,\bm{X}}[Y_{0m}|00,\bm{x}]}$ is the ratio of the potential outcome means (under the control condition) between compliers and never-takers. For simplicity (and this is often a practical strategy for sensitivity analysis without additional content knowledge), we assume the two confounding functions do not depend on the measured baseline covariates, such that $\xi_{M}^{(0)}(1,\bm{x})=\lambda_{M}$ and $\xi_{Y}^{(0)}(1,\bm{x})=\lambda_{Y}$. Our specified parametric confounding function is thus $\kappa_{\bm{\lambda}}=\{\lambda_{M},\lambda_{Y}\}$.

Figure S4 presents the bias-corrected $\text{PNDE}_{10}$ estimate, $\widehat{\text{PNDE}}_{10}=\widehat{\theta}_{10}^{(10),\text{mr}}(\kappa_{\bm{\lambda}})-\widehat{\theta}_{10}^{(00),\text{mr}}(\kappa_{\bm{\lambda}})$, with fixed values of $\{\lambda_{M},\lambda_{Y}\}$ ranging within $[0.5,1.5]\times[0.75,1.25]$. The results suggest that $\text{PNDE}_{10}$ is robust to violation of principal ignorability on the mediator variable, as $\widehat{\text{PNDE}}_{10}$ fluctuates relatively little across values of $\lambda_{M}$. For example, $\widehat{\text{PNDE}}_{10}$ only increases from -0.102 (95% CI: $[-0.193,-0.003]$) to -0.071 (95% CI: $[-0.148,0.015]$) when varying $\lambda_{M}$ from 0.5 to 1.5 with $\lambda_{Y}$ fixed at 1 (Figure S4, Panel B). In contrast, $\text{PNDE}_{10}$ is more sensitive to violation of principal ignorability on the outcome variable, because $\widehat{\text{PNDE}}_{10}$ moves toward the null as $\lambda_{Y}$ decreases from $1$, and the sign of $\widehat{\text{PNDE}}_{10}$ even reverses to positive when $\lambda_{Y}\leq 0.87$.

Next, we assess the robustness of our conclusions on $\text{PNIE}_{10}$ under departure from principal ignorability. In the one-sided noncompliance scenario, the validity of $\widehat{\text{PNIE}}_{10}$ only depends on the principal ignorability assumption on the mediator variable (as we clarify in Section D.9, violation of principal ignorability on the outcome variable has no impact on $\widehat{\text{PNIE}}_{10}$). Therefore, we provide the bias-corrected $\text{PNIE}_{10}$ estimate, $\widehat{\text{PNIE}}_{10}=\widehat{\theta}_{10}^{(11),\text{mr}}(\kappa_{\bm{\lambda}})-\widehat{\theta}_{10}^{(10),\text{mr}}(\kappa_{\bm{\lambda}})$, for $\lambda_{M}$ ranging from 0.5 to 1.5 in Figure S5. The results suggest that estimates of $\text{PNIE}_{10}$ are robust against violations of principal ignorability, because $\widehat{\text{PNIE}}_{10}$ remains negative for all values of $\lambda_{M}$ considered. The estimated 95% confidence intervals, however, straddle zero when $\lambda_{M}<0.75$ or $\lambda_{M}>1.35$.

C.3.2 Sensitivity analysis for ignorability of the mediator

We then investigate whether the conclusions about the principal natural mediation effects among the compliers would change if ignorability of the mediator is violated (while assuming all remaining assumptions hold). As indicated in Section C.2, the confounding function $t(z,1,d_{1}d_{0},\bm{x})=\frac{\mathbb{E}_{Y_{1m}|Z,M,U,\bm{X}}[Y_{1m}|z,1,d_{1}d_{0},\bm{x}]}{\mathbb{E}_{Y_{1m}|Z,M,U,\bm{X}}[Y_{1m}|z,0,d_{1}d_{0},\bm{x}]}$ can be used to quantify the degree of violation of the ignorability of the mediator assumption. For simplicity, we assume $t(z,1,d_{1}d_{0},\bm{x})$ is constant across all levels of $z$, $d_{1}d_{0}$, and $\bm{x}$, and therefore focus on a one-dimensional sensitivity parameter $\zeta$ for $t(z,1,d_{1}d_{0},\bm{x})$; in other words, the parametric confounding function is simply taken as $t_{\bm{\zeta}}:=t(z,1,d_{1}d_{0},\bm{x};\zeta)=\zeta$.
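To make the role of the one-dimensional parameter $\zeta$ concrete, the sketch below traces the weighted inner sum $\sum_{m}\rho(m)\mu_{1d_{1}}(m)r_{0d_{0}}(m)$ over a grid of $\zeta$ values at a single covariate value. It assumes a binary mediator with $t(z,0,\cdot)=1$ and $t(z,1,\cdot)=\zeta$ for both $z$; that normalization, the function name `weighted_inner_sum`, and the nuisance values are illustrative assumptions rather than quantities from the JOBS II analysis.

```python
import numpy as np

def weighted_inner_sum(zeta, mu1, r1, r0):
    """sum_m rho(m) * mu1(m) * r0(m) for a binary mediator at one covariate value."""
    # Assumed normalization: t(z, 0, .) = 1 and t(z, 1, .) = zeta for z = 0, 1.
    t = np.array([1.0, zeta])
    rho = (np.dot(t, r1) / t) / (np.dot(t, r0) / t)
    return float(np.sum(rho * mu1 * r0))

# Hypothetical nuisance values (outcome means and mediator probabilities).
mu1 = np.array([2.0, 1.5])
r1 = np.array([0.3, 0.7])
r0 = np.array([0.6, 0.4])

zeta_grid = np.linspace(0.8, 1.2, 5)
curve = [weighted_inner_sum(z, mu1, r1, r0) for z in zeta_grid]
```

At $\zeta=1$ the weight is one and the sum reduces to the unweighted $\sum_{m}\mu_{1d_{1}}(m)r_{0d_{0}}(m)$; values of $\zeta$ away from one show how far the plug-in quantity moves under the assumed departure.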

Figure S6 presents the bias-corrected estimates of $\text{PNDE}_{10}$, by $\widehat{\text{PNDE}}_{10}=\widehat{\theta}_{10}^{(10),\text{mr}}(t_{\bm{\zeta}})-\widehat{\theta}_{10}^{(00),\text{mr}}$, and the bias-corrected estimates of $\text{PNIE}_{10}$, by $\widehat{\text{PNIE}}_{10}=\widehat{\theta}_{10}^{(11),\text{mr}}-\widehat{\theta}_{10}^{(10),\text{mr}}(t_{\bm{\zeta}})$, with $\zeta$ varying from 0.8 to 1.2. We observe that $\widehat{\text{PNDE}}_{10}$ and $\widehat{\text{PNIE}}_{10}$ move towards the null with a larger and a smaller value of $\zeta$, respectively. Specifically, $\widehat{\text{PNDE}}_{10}$ remains negative under all assumed values of $\zeta$, but the point estimate increases from -0.132 (95% CI: $[-0.221,-0.047]$) to -0.043 (95% CI: $[-0.128,0.041]$) when $\zeta$ moves from 0.8 to 1.2. On the other hand, $\widehat{\text{PNIE}}_{10}$ decreases from 0.023 (95% CI: $[0.002,0.048]$) to -0.065 (95% CI: $[-0.105,-0.031]$) when $\zeta$ increases from 0.8 to 1.2, suggesting that $\widehat{\text{PNIE}}_{10}$ is relatively more sensitive to violation of Assumption 5.

D Proofs and technical details

D.1 The nonparametric identification result (Theorem 1)

Lemma S1

Let $X$ and $V$ be two random variables with densities $f_{X}(x)$ and $f_{V}(v)$. Then, we have that $\mathbb{E}[h(X)|V=v]=\int_{x}\frac{f_{V|X}(v|x)}{f_{V}(v)}h(x)\text{d}\mathbb{P}_{X}(x)$.

Proof.

The proof is straightforward and omitted here. $\square$
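Although the proof is omitted, the identity in Lemma S1 is easy to check numerically in the discrete case, where the integral becomes a sum. The sketch below uses a made-up joint probability table for $(X,V)$ and an arbitrary function $h$; none of the numbers come from the paper.

```python
import numpy as np

# joint[x, v] = P(X = x, V = v) with X in {0, 1, 2} and V in {0, 1} (made-up values).
joint = np.array([[0.10, 0.20],
                  [0.25, 0.05],
                  [0.15, 0.25]])
h = np.array([1.0, -2.0, 4.0])          # arbitrary function h(x)

f_x = joint.sum(axis=1)                 # marginal of X
f_v = joint.sum(axis=0)                 # marginal of V
f_v_given_x = joint / f_x[:, None]      # f_{V|X}(v | x)

v = 1
# Left-hand side: E[h(X) | V = v] computed directly from the joint table.
lhs = np.sum(h * joint[:, v]) / f_v[v]
# Right-hand side: sum_x {f_{V|X}(v|x) / f_V(v)} h(x) f_X(x).
rhs = np.sum(f_v_given_x[:, v] / f_v[v] * h * f_x)
```

The two quantities agree because $f_{V|X}(v|x)f_{X}(x)/f_{V}(v)=f_{X|V}(x|v)$ by Bayes' rule.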

Lemma S2

Let $X$, $V$, and $G$ be three random variables. Then

\[
X\perp\!\!\!\perp\{V,G\}\Longleftrightarrow X\perp\!\!\!\perp V|G\text{ and }X\perp\!\!\!\perp G|V.
\]
Proof.

First suppose that $X\perp\!\!\!\perp\{V,G\}$ holds. Then, for any $x$, $v$, $g$,

\[
f_{X,V|G}(x,v|g)=\frac{f_{X,V,G}(x,v,g)}{f_{G}(g)}=\frac{f_{X}(x)f_{V,G}(v,g)}{f_{G}(g)}=f_{X}(x)f_{V|G}(v|g)=f_{X|G}(x|g)f_{V|G}(v|g),
\]

which implies that $X\perp\!\!\!\perp V|G$. Using the same argument but switching the roles of $G$ and $V$, we can show $X\perp\!\!\!\perp G|V$ under $X\perp\!\!\!\perp\{V,G\}$. Next suppose $X\perp\!\!\!\perp V|G$ and $X\perp\!\!\!\perp G|V$, which imply that

\[
\begin{aligned}
f_{X}(x)&=\int_{g}f_{X|G}(x|g)\text{d}\mathbb{P}_{G}(g)=\int_{g}f_{X|G,V}(x|g,v)\text{d}\mathbb{P}_{G}(g)=\int_{g}f_{X|V}(x|v)\text{d}\mathbb{P}_{G}(g)\\
&=f_{X|V}(x|v),
\end{aligned}
\tag{s9}
\]

for any $x$ and $v$. Therefore, for any $x$, $v$, $g$:

\[
f_{X,V,G}(x,v,g)=f_{X|V,G}(x|v,g)f_{V,G}(v,g)=f_{X|V}(x|v)f_{V,G}(v,g)=f_{X}(x)f_{V,G}(v,g),
\]

where the last equality follows from equation (s9). This shows that $X\perp\!\!\!\perp\{V,G\}$. $\square$
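The forward direction of Lemma S2 can be illustrated with a small discrete example. The sketch below builds a joint table in which $X$ is independent of $(V,G)$ by construction and then checks both conditional independence statements; the probabilities and the helper name `cond_indep_given_last` are hypothetical.

```python
import numpy as np

f_x = np.array([0.2, 0.5, 0.3])                   # marginal of X (made-up)
f_vg = np.array([[0.1, 0.3],
                 [0.4, 0.2]])                     # f_vg[v, g], V and G dependent
# joint[x, v, g] = f_X(x) * f_{V,G}(v, g), so X is independent of (V, G).
joint = f_x[:, None, None] * f_vg[None, :, :]

def cond_indep_given_last(p):
    """Check (axis 0) independent of (axis 1) given (axis 2) for a 3-way table."""
    p_c = p.sum(axis=(0, 1))                      # marginal of the conditioning variable
    p_ac = p.sum(axis=1)                          # joint of axis 0 and axis 2
    p_bc = p.sum(axis=0)                          # joint of axis 1 and axis 2
    # A indep B | C  iff  p(a,b,c) p(c) = p(a,c) p(b,c) for all cells.
    return bool(np.allclose(p * p_c[None, None, :],
                            p_ac[:, None, :] * p_bc[None, :, :]))

x_indep_v_given_g = cond_indep_given_last(joint)
# Swap the roles of V and G by transposing the last two axes.
x_indep_g_given_v = cond_indep_given_last(joint.transpose(0, 2, 1))
```

Both checks return `True` for this table, matching the first half of the proof.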

Lemma S3

The principal ignorability assumption (Assumption 4) indicates that $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp\{D_{1},D_{0}\}|\bm{X}$ for any $z$, $z^{\prime}$, $d$, and $m^{\prime}$, which further implies

\[
M_{zd}\perp\!\!\!\perp D_{1-z}|D_{z},\bm{X}\text{ and }Y_{z^{\prime}dm^{\prime}}\perp\!\!\!\perp D_{1-z^{\prime}}|M_{zd},D_{z^{\prime}},\bm{X}.
\]
Proof.

Observing $U=\{D_{1},D_{0}\}$, we can see that Assumption 4 is equivalent to $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp\{D_{1},D_{0}\}|\bm{X}$, which implies

\[
M_{zd}\perp\!\!\!\perp\{D_{1},D_{0}\}|\bm{X}\text{ and }\{Y_{z^{\prime}dm^{\prime}},M_{zd}\}\perp\!\!\!\perp\{D_{1},D_{0}\}|\bm{X}.
\]

In addition, since $\{D_{1},D_{0}\}$ is equivalent to $\{D_{z},D_{1-z}\}$ or $\{D_{z^{\prime}},D_{1-z^{\prime}}\}$, one can verify that

\[
M_{zd}\perp\!\!\!\perp\{D_{z},D_{1-z}\}|\bm{X}\text{ and }\{Y_{z^{\prime}dm^{\prime}},M_{zd}\}\perp\!\!\!\perp\{D_{z^{\prime}},D_{1-z^{\prime}}\}|\bm{X}
\]

hold. Therefore, $M_{zd}\perp\!\!\!\perp D_{1-z}|D_{z},\bm{X}$ follows from Lemma S2, with $X=M_{zd}$, $V=D_{1-z}$, and $G=D_{z}$, conditional on $\bm{X}$. Similarly, one can show $\{Y_{z^{\prime}dm^{\prime}},M_{zd}\}\perp\!\!\!\perp D_{1-z^{\prime}}|D_{z^{\prime}},\bm{X}$ by applying Lemma S2, with $X=\{Y_{z^{\prime}dm^{\prime}},M_{zd}\}$, $V=D_{1-z^{\prime}}$, and $G=D_{z^{\prime}}$, conditional on $\bm{X}$. Finally, $Y_{z^{\prime}dm^{\prime}}\perp\!\!\!\perp D_{1-z^{\prime}}|M_{zd},D_{z^{\prime}},\bm{X}$ follows by applying Lemma S2 again, with $X=D_{1-z^{\prime}}$, $V=Y_{z^{\prime}dm^{\prime}}$, and $G=M_{zd}$, conditional on $\{D_{z^{\prime}},\bm{X}\}$. $\square$

Lemma S4

Let $V$ and $G$ be two binary random variables satisfying $V\geq G$ and $X$ be any random variable. Then we have

\[
X\perp\!\!\!\perp\{V,G\}\Longleftrightarrow X\perp\!\!\!\perp V\text{ and }X\perp\!\!\!\perp G.
\]
Proof.

This follows from Lemma S1 in Forastiere et al. (2018). $\square$
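One way to see why the ordering restriction $V\geq G$ matters in Lemma S4 is the classical exclusive-or example, sketched below: with $V$ and $G$ independent fair coins and $X=V\oplus G$, $X$ is independent of $V$ and of $G$ separately but not of the pair. This example is a standard textbook construction, not one taken from the paper, and it necessarily places mass on $(V,G)=(0,1)$, which monotonicity rules out.

```python
import itertools
import numpy as np

# joint[x, v, g] for V, G independent fair coins and X = V xor G.
joint = np.zeros((2, 2, 2))
for v, g in itertools.product([0, 1], [0, 1]):
    joint[v ^ g, v, g] = 0.25

p_x = joint.sum(axis=(1, 2))
p_v = joint.sum(axis=(0, 2))
p_g = joint.sum(axis=(0, 1))
p_xv = joint.sum(axis=2)        # joint of (X, V)
p_xg = joint.sum(axis=1)        # joint of (X, G)
p_vg = joint.sum(axis=0)        # joint of (V, G)

x_indep_v = bool(np.allclose(p_xv, np.outer(p_x, p_v)))
x_indep_g = bool(np.allclose(p_xg, np.outer(p_x, p_g)))
x_indep_vg = bool(np.allclose(joint, p_x[:, None, None] * p_vg[None, :, :]))
# The example violates V >= G because the cell (V = 0, G = 1) has positive mass.
violates_v_ge_g = bool(joint[:, 0, 1].sum() > 0)
```

Here the two marginal independence checks hold while the joint one fails, so the equivalence in Lemma S4 cannot be expected without a restriction such as $V\geq G$.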

Lemma S5

Under monotonicity (either Assumption 3a or 3b), Assumption 2 implies that $\{U,M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{*}m^{*}}\}\perp\!\!\!\perp Z|\bm{X}$ for any $z^{\prime}$, $z^{*}$, $d^{\prime}$, $d^{*}$, $m^{*}$.

Proof.

Assumption 2 suggests that

\[
\{D_{1},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|\bm{X}\text{ and }\{D_{0},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|\bm{X},
\]

for any $z^{\prime}$, $z^{*}$, $d^{\prime}$, and $m^{*}$. Therefore,

\[
D_{1}\perp\!\!\!\perp Z|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},\bm{X}\text{ and }D_{0}\perp\!\!\!\perp Z|\bm{X},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}
\tag{s10}
\]

follow from Lemma S2. Moreover, (s10) further implies

\[
\{D_{1},D_{0}\}\perp\!\!\!\perp Z|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},\bm{X}\Longleftrightarrow U\perp\!\!\!\perp Z|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},\bm{X},
\tag{s11}
\]

by applying Lemma S4, with $V=D_{1}$, $G=D_{0}$, and $X=Z$, conditional on $\{M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},\bm{X}\}$. Therefore, we have that

\[
\begin{aligned}
&f(Z,U,M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|\bm{X})\\
={}&f(Z,U|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},\bm{X})f(M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|\bm{X})\\
={}&f(Z|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},\bm{X})f(U|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},\bm{X})f(M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|\bm{X})\quad(\text{by (s11)})\\
={}&f(Z|\bm{X})f(U|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},\bm{X})f(M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|\bm{X})\quad(\text{by Assumption 2})\\
={}&f(Z|\bm{X})f(U,M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|\bm{X}).
\end{aligned}
\]

This equation then shows that $\{U,M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|\bm{X}$ for any $z^{\prime}$, $z^{*}$, $d^{\prime}$, and $m^{*}$. $\square$

Lemma S6

Under Assumptions 2–4, Assumption 5 is equivalent to $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|\bm{X}$ for all $z$, $z^{\prime}$, $d$, $d^{\prime}$, $m^{\prime}$.

Proof.

Observe that

\[
\begin{aligned}
\text{Assumption 5}&\Longleftrightarrow f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,\bm{X})=f(M_{zd}|Z,U,\bm{X})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,\bm{X})\\
&\Longleftrightarrow f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|U,\bm{X})=f(M_{zd}|U,\bm{X})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|U,\bm{X})\\
&\Longleftrightarrow f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|\bm{X})=f(M_{zd}|\bm{X})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|\bm{X})\\
&\Longleftrightarrow M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|\bm{X},
\end{aligned}
\]

where the step from the first to the second row follows from Lemma S5 (as a consequence of Assumptions 2–3), and the step from the second to the third row follows from Assumption 4. This completes the proof. $\square$

Proof of Theorem 1. Define $d_{z}=\mathbb{I}(z=1)d_{1}+\mathbb{I}(z=0)d_{0}$, so that $d_{z}$ equals $d_{1}$ or $d_{0}$ when $z$ in $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ is 1 or 0, respectively. Similarly, define $d_{z^{\prime}}=\mathbb{I}(z^{\prime}=1)d_{1}+\mathbb{I}(z^{\prime}=0)d_{0}$, $d_{1-z}=\mathbb{I}(z=0)d_{1}+\mathbb{I}(z=1)d_{0}$, and $d_{1-z^{\prime}}=\mathbb{I}(z^{\prime}=0)d_{1}+\mathbb{I}(z^{\prime}=1)d_{0}$. By the definition of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$, we have that

\[
\begin{aligned}
\theta_{d_{1}d_{0}}^{(zz^{\prime})}&=\mathbb{E}[Y_{zM_{z^{\prime}}}|U=d_{1}d_{0}]\\
&=\mathbb{E}\left[\mathbb{E}[Y_{zM_{z^{\prime}}}|U=d_{1}d_{0},\bm{X}]\Big|U=d_{1}d_{0}\right]\quad\text{(by law of iterated expectations)}\\
&=\mathbb{E}\left[\mathbb{E}[Y_{zM_{z^{\prime}}}|Z=z,U=d_{1}d_{0},\bm{X}]\Big|U=d_{1}d_{0}\right]\quad\text{(by Lemma S5)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zm}|Z=z,M_{z^{\prime}}=m,U=d_{1}d_{0},\bm{X}]\text{d}\mathbb{P}_{M_{z^{\prime}}|Z,U,\bm{X}}(m|z,d_{1}d_{0},\bm{X})\Big|U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zm}|Z=z,M_{z}=m,U=d_{1}d_{0},\bm{X}]\text{d}\mathbb{P}_{M_{z^{\prime}}|Z,U,\bm{X}}(m|z,d_{1}d_{0},\bm{X})\Big|U=d_{1}d_{0}\right]\\
&\qquad\text{(by Assumption 5)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zm}|M_{z}=m,U=d_{1}d_{0},\bm{X}]\text{d}\mathbb{P}_{M_{z^{\prime}}|U,\bm{X}}(m|d_{1}d_{0},\bm{X})\Big|U=d_{1}d_{0}\right]\quad\text{(by Lemma S5)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zD_{z}m}|M_{zD_{z}}=m,U=d_{1}d_{0},\bm{X}]\text{d}\mathbb{P}_{M_{z^{\prime}D_{z^{\prime}}}|U,\bm{X}}(m|d_{1}d_{0},\bm{X})\Big|U=d_{1}d_{0}\right]\\
&\qquad\text{(by composition of potential values)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zD_{z}m}|D_{z}=d_{z},D_{1-z}=d_{1-z},M_{zD_{z}}=m,\bm{X}]\text{d}\mathbb{P}_{M_{z^{\prime}D_{z^{\prime}}}|D_{z^{\prime}},D_{1-z^{\prime}},\bm{X}}(m|d_{z^{\prime}},d_{1-z^{\prime}},\bm{X})\Big|U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zD_{z}m}|D_{z}=d_{z},M_{zD_{z}}=m,\bm{X}]\text{d}\mathbb{P}_{M_{z^{\prime}D_{z^{\prime}}}|D_{z^{\prime}},\bm{X}}(m|d_{z^{\prime}},\bm{X})\Big|U=d_{1}d_{0}\right]\quad\text{(by Lemma S3)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zD_{z}m}|Z=z,D_{z}=d_{z},M_{zD_{z}}=m,\bm{X}]\text{d}\mathbb{P}_{M_{z^{\prime}D_{z^{\prime}}}|Z,D_{z^{\prime}},\bm{X}}(m|z^{\prime},d_{z^{\prime}},\bm{X})\Big|U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y|Z=z,D=d_{z},M=m,\bm{X}]\text{d}\mathbb{P}_{M|Z,D,\bm{X}}(m|z^{\prime},d_{z^{\prime}},\bm{X})\Big|U=d_{1}d_{0}\right]\quad\text{(by Assumption 1)}\\
&=\int_{\bm{x}}\frac{f_{U|\bm{X}}(d_{1}d_{0}|\bm{x})}{f_{U}(d_{1}d_{0})}\int_{m}\mathbb{E}_{Y|Z,D,M,\bm{X}}[Y|z,d_{z},m,\bm{x}]\text{d}\mathbb{P}_{M|Z,D,\bm{X}}(m|z^{\prime},d_{z^{\prime}},\bm{x})\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\quad\text{(by Lemma S1)}\\
&=\int_{\bm{x}}\frac{e_{d_{1}d_{0}}(\bm{x})}{e_{d_{1}d_{0}}}\int_{m}\mathbb{E}_{Y|Z,D,M,\bm{X}}[Y|z,d_{z},m,\bm{x}]\text{d}\mathbb{P}_{M|Z,D,\bm{X}}(m|z^{\prime},d_{z^{\prime}},\bm{x})\text{d}\mathbb{P}_{\bm{X}}(\bm{x}),
\end{aligned}
\]

where $e_{d_{1}d_{0}}(\bm{x})=p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})$ and $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ are identified in equation (3) of the main manuscript under the monotonicity assumption (either Assumption 3a or 3b). This completes the proof. $\square$
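For intuition, the final identification formula can be evaluated by direct plug-in when both $\bm{X}$ and $M$ are discrete, in which case the integrals become finite sums. The following sketch does this for made-up nuisance values; the function name `theta_plugin` and every number are hypothetical, and the principal score $e_{d_{1}d_{0}}(\bm{x})$ is simply supplied as an input rather than computed from $p_{zd}(\bm{x})$.

```python
import numpy as np

def theta_plugin(p_x, e_x, mu, r):
    """Plug-in value of sum_x {e(x)/e} * sum_m mu(m, x) r(m, x) * P(x).

    p_x[i]   : P(X = x_i)
    e_x[i]   : principal score e_{d1d0}(x_i)
    mu[i, m] : E[Y | Z = z, D = d_z, M = m, X = x_i]
    r[i, m]  : P(M = m | Z = z', D = d_{z'}, X = x_i)
    """
    p_x, e_x, mu, r = (np.asarray(a, dtype=float) for a in (p_x, e_x, mu, r))
    e = np.sum(e_x * p_x)              # marginal stratum probability e_{d1d0}
    inner = np.sum(mu * r, axis=1)     # inner sum over mediator levels
    return float(np.sum(e_x / e * inner * p_x))

# Two covariate levels and a binary mediator (all values made up).
theta = theta_plugin(
    p_x=[0.5, 0.5],
    e_x=[0.4, 0.8],
    mu=[[1.0, 2.0], [3.0, 4.0]],
    r=[[0.5, 0.5], [0.25, 0.75]],
)
```

The ratio $e_{d_{1}d_{0}}(\bm{x})/e_{d_{1}d_{0}}$ acts as a weight that shifts the covariate distribution from the whole population to the principal stratum of interest.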

D.2 Connections to existing literature (Remarks 2 and 3)

We compare the identification assumptions used in the current article to the identification assumptions in Zhou (2022) and Tchetgen Tchetgen and VanderWeele (2014). Specifically, Zhou (2022) considers the identification of path-specific effects in the presence of multiple causally ordered mediators, and Tchetgen Tchetgen and VanderWeele (2014) consider the identification of mediation effects in the presence of an exposure-induced confounder.

We focus on a comparable scenario with two intermediate variables, a binary variable $D$ and a binary/continuous variable $M$, both of which sit on the causal pathway between the treatment assignment $Z$ and an outcome $Y$, and we further assume that monotonicity of $Z$ on $D$ (either Assumption 3a or Assumption 3b) holds. These two intermediate variables, $(D,M)$, have different names in these three papers; they are referred to as the post-treatment event and the mediator in the current paper, as the first mediator and the second mediator in Zhou (2022), and as the treatment-induced confounder and the mediator in Tchetgen Tchetgen and VanderWeele (2014). All three papers consider the consistency assumption (Assumption 1) and slightly different versions of the positivity assumption. To offer a common ground for the comparison of key identification assumptions, we assume throughout that consistency (Assumption 1) and positivity (Assumption 6) hold. Besides the consistency and positivity assumptions, Tchetgen Tchetgen and VanderWeele (2014) assume monotonicity (Assumption 3a) and that the following NPSEM-IE holds:

Assumption S1 (NPSEM-IE in Tchetgen Tchetgen and VanderWeele, 2014)

Suppose the following nonparametric structural equation models with independent errors hold:

  • (i) $\bm{X}=g_{\bm{X}}(\epsilon_{\bm{X}})$,

  • (ii) $Z=g_{Z}(\bm{X},\epsilon_{Z})$,

  • (iii) $D=g_{D}(Z,\bm{X},\epsilon_{D})$,

  • (iv) $M=g_{M}(Z,D,\bm{X},\epsilon_{M})$,

  • (v) $Y=g_{Y}(Z,D,M,\bm{X},\epsilon_{Y})$,

where $\{g_{\bm{X}},g_{Z},g_{D},g_{M},g_{Y}\}$ are nonparametric functions and the errors $\{\epsilon_{\bm{X}},\epsilon_{Z},\epsilon_{D},\epsilon_{M},\epsilon_{Y}\}$ are mutually independent.
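To illustrate how an NPSEM-IE generates both observed and potential values, the sketch below simulates one possible data-generating process. The specific structural functions, coefficients, sample size, and seed are arbitrary choices made for illustration (including the choice $D_{0}=0$ to mimic one-sided noncompliance); Assumption S1 itself leaves the functions unspecified.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Mutually independent errors, one row per structural equation.
eps_x, eps_z, eps_d, eps_m, eps_y = rng.uniform(size=(5, n))

# Hypothetical structural functions g_X, g_Z, g_D, g_M, g_Y.
def g_x(e):
    return (e > 0.5).astype(int)

def g_z(x, e):
    return (e < 0.4 + 0.2 * x).astype(int)

def g_d(z, x, e):
    # D_0 = 0 for everyone: one-sided noncompliance (strong monotonicity).
    return (e < z * (0.5 + 0.3 * x)).astype(int)

def g_m(z, d, x, e):
    return (e < 0.2 + 0.2 * z + 0.3 * d + 0.1 * x).astype(int)

def g_y(z, d, m, x, e):
    return 1.0 + 0.5 * z + 0.8 * d + 1.2 * m + 0.3 * x + (e - 0.5)

# Observed data follow the equations recursively.
X = g_x(eps_x)
Z = g_z(X, eps_z)
D = g_d(Z, X, eps_d)
M = g_m(Z, D, X, eps_m)
Y = g_y(Z, D, M, X, eps_y)

# Potential values reuse the same errors with intervened arguments.
D1, D0 = g_d(1, X, eps_d), g_d(0, X, eps_d)
M_obs_from_potential = g_m(Z, np.where(Z == 1, D1, D0), X, eps_m)
```

Because every potential value shares the error term of its own equation, cross-world quantities such as $M_{zd}$ and $M_{z^{\prime}d^{\prime}}$ are defined jointly, which is the feature that makes the NPSEM-IE stronger than assumptions stated only on observable margins.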

Besides the consistency and positivity assumptions, Zhou (2022) considers the following generalized sequential ignorability assumptions:

Assumption S2 (Assumption 2 in Zhou, 2022)

Suppose the following set of ignorability assumptions hold:

  • (i) $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|\bm{X}$ for any $z$, $z^{\prime}$, $d^{\prime}$, $z^{*}$, $m^{*}$,

  • (ii) $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{z^{*}}|Z,\bm{X}$ for any $z$, $z^{\prime}$, $z^{*}$, $d$, $m^{\prime}$,

  • (iii) $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D,\bm{X}$ for any $z$, $z^{\prime}$, $d$, $d^{\prime}$, $m^{\prime}$.

Besides the consistency and positivity assumptions, the current work considers Assumptions 2, 4, 5. To facilitate exposition, we restate these three assumptions:

Assumption S3 (Assumptions 2, 4 and 5 in current work)

Suppose the following ignorability assumptions hold:

  • (i) $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|\bm{X}$ for any $z$, $z^{\prime}$, $z^{*}$, $d^{\prime}$, and $m^{*}$,

  • (ii) $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp U|\bm{X}$ for any $z$, $z^{\prime}$, $d$, and $m^{\prime}$,

  • (iii) $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,\bm{X}$ for any $z$, $z^{\prime}$, $d$, $d^{\prime}$, $m^{\prime}$.

The following two lemmas are useful to prove Remarks 2 and 3 in the paper.

Lemma S7

Suppose that Assumptions 1 and 3 hold. Then, if Assumption S1 holds, Assumption S2 also holds, but not vice versa.

Proof.

First suppose that Assumption S1 holds. By Assumption S1, consistency (Assumption 1), and composition of potential values, we have

\[
\begin{aligned}
Z&=g_{Z}(\bm{X},\epsilon_{Z}),&&\\
D_{z}&=g_{D}(z,\bm{X},\epsilon_{D}),&&\text{(s12)}\\
M_{z^{\prime}d^{\prime}}&=g_{M}(z^{\prime},d^{\prime},\bm{X},\epsilon_{M}),&&\text{(s13)}\\
Y_{z^{*}d^{\prime}m^{*}}&=g_{Y}(z^{*},d^{\prime},m^{*},\bm{X},\epsilon_{Y}),&&\text{(s14)}
\end{aligned}
\]

for any $z$, $z^{\prime}$, $z^{*}$, $d^{\prime}$, and $m^{*}$, which indicates that $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|\bm{X}$ because $\{\epsilon_{Z},\epsilon_{D},\epsilon_{M},\epsilon_{Y}\}$ are mutually independent. Therefore, Assumption S2(i) holds. Moreover, equations (s12), (s13), and (s14) suggest that

\[
\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{z^{*}}|\bm{X},
\tag{s15}
\]

for any $z$, $z^{\prime}$, $z^{*}$, $d$, $m^{\prime}$. This implies

\[
f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|\bm{X})=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|\bm{X})f(D_{z^{*}}|\bm{X}),
\]

which, together with Assumption S2(i), implies that

\[
f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|Z,\bm{X})=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|Z,\bm{X})f(D_{z^{*}}|Z,\bm{X}).
\]

Therefore, Assumption S2(ii) holds. Similarly, equations (s13) and (s14) suggest that $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|\bm{X}$ for any $z$, $z^{\prime}$, $d$, $d^{\prime}$, $m^{\prime}$, which indicates

\[
f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|\bm{X})=f(M_{zd}|\bm{X})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|\bm{X}).
\]

This, coupled with (s15), suggests that

\[
f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|D_{z^{*}},\bm{X})=f(M_{zd}|D_{z^{*}},\bm{X})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|D_{z^{*}},\bm{X}),
\]

for any $z^{*}$, which further implies

\[
f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z=z^{*},D_{z^{*}},\bm{X})=f(M_{zd}|Z=z^{*},D_{z^{*}},\bm{X})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z=z^{*},D_{z^{*}},\bm{X})
\]

as a consequence of Assumption S2(i). Because $D_{z^{*}}=D$ if $Z=z^{*}$, we conclude that Assumption S2(iii) holds. This completes the proof that Assumption S2 holds whenever Assumption S1 is valid. However, Assumption S1 may not hold under Assumption S2; that is, Assumption S1 is stronger than Assumption S2. For example, Assumption S2 does not require $\{M_{zd},M_{z^{\prime}d^{\prime}}\}\perp\!\!\!\perp Z|\bm{X}$ for $zd\neq z^{\prime}d^{\prime}$, but Assumption S1 implicitly requires this through the following set of nonparametric structural equations:

\[
\begin{aligned}
Z&=g_{Z}(\bm{X},\epsilon_{Z}),\\
M_{zd}&=g_{M}(z,d,\bm{X},\epsilon_{M}),\\
M_{z^{\prime}d^{\prime}}&=g_{M}(z^{\prime},d^{\prime},\bm{X},\epsilon_{M}).
\end{aligned}
\]

$\square$

Lemma S8

Suppose that Assumptions 1 and 3 hold. Then, Assumption S2 is equivalent to Assumption S3.

Proof.

Assumption S2(i) is same to Assumption S3(i) by direct comparison. Next, we show Assumption S2(ii) is equivalent to Assumption S3(ii) under Assumption S2(i) (or equivalently Assumption S3(i)). Specifically, under Assumption S2(i), Assumption S2(ii) suggests that

f(Mzd,Yzdm,Dz|𝑿)\displaystyle f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|{\bm{X}}) =f(Mzd,Yzdm,Dz|Z,𝑿)\displaystyle=f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|Z,{\bm{X}})
=f(Mzd,Yzdm|Z,𝑿)f(Dz|Z,𝑿)\displaystyle=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|Z,{\bm{X}})f(D_{z^{*}}|Z,{\bm{X}})
=f(Mzd,Yzdm|𝑿)f(Dz|𝑿).\displaystyle=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|{\bm{X}})f(D_{z^{*}}|{\bm{X}}).

That is, {Mzd,Yzdm}Dz|𝑿\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{z^{*}}|{\bm{X}} for any zz, zz^{\prime}, zz^{*}, dd, and mm^{\prime}, which further implies that

{Mzd,Yzdm}D1|𝑿 and {Mzd,Yzdm}D0|𝑿.\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{1}|{\bm{X}}\text{ and }\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{0}|{\bm{X}}.

Applying Lemma S4 to the previous equation and noting that D1D0D_{1}\geq D_{0} by monotonicity, we obtain {Mzd,Yzdm}U|𝑿\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp U|{\bm{X}}; i.e., Assumption S3(ii) holds under Assumption S2(i)–(ii). On the other hand, suppose that Assumption S3(ii) hold, then we can obtain that {Mzd,Yzdm}D1|𝑿 and {Mzd,Yzdm}D0|𝑿\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{1}|{\bm{X}}\text{ and }\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{0}|{\bm{X}}, by Lemma S4. This suggests that

{Mzd,Yzdm}Dz|𝑿,\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{z^{*}}|{\bm{X}},

for any zz, zz^{\prime}, zz^{*}, dd, and mm^{\prime}. Then applying Assumption S2(i), we have

f(Mzd,Yzdm,Dz|Z,𝑿)=\displaystyle f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|Z,{\bm{X}})= f(Mzd,Yzdm,Dz|𝑿)=f(Mzd,Yzdm|𝑿)f(Dz|𝑿),\displaystyle f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|{\bm{X}})=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|{\bm{X}})f(D_{z^{*}}|{\bm{X}}),
=\displaystyle= f(Mzd,Yzdm|Z,𝑿)f(Dz|Z,𝑿)\displaystyle f(M_{zd},Y_{z^{\prime}dm^{\prime}}|Z,{\bm{X}})f(D_{z^{*}}|Z,{\bm{X}})

thus Assumption S2(ii) also holds under Assumption S3(i)–(ii). Therefore, we have verified that S2(i)–(ii) are equivalent to Assumption S3(i)–(ii).

Next, we show that Assumption S2(iii) is equivalent to Assumption S3(iii) under Assumption S2(i)–(ii) (or equivalently, Assumption S3(i)–(ii)). When the monotonicity assumption (Assumption 3) holds, the following statements are equivalent under Assumption S2(i)–(ii) and Assumption S3(i)–(ii):

Assumption S3(iii)
\displaystyle\Longleftrightarrow f(Mzd,Yzdm|Z,U,𝑿)=f(Yzdm|Z,U,𝑿)f(Mzd|Z,U,𝑿)\displaystyle f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,{\bm{X}})f(M_{zd}|Z,U,{\bm{X}})
(by Lemma S5)
\displaystyle\Longleftrightarrow f(Mzd,Yzdm|U,𝑿)=f(Yzdm|U,𝑿)f(Mzd|U,𝑿)\displaystyle f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|U,{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|U,{\bm{X}})f(M_{zd}|U,{\bm{X}})
(by Assumption S3(ii))
\displaystyle\Longleftrightarrow f(Mzd,Yzdm|𝑿)=f(Yzdm|𝑿)f(Mzd|𝑿)\displaystyle f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}})f(M_{zd}|{\bm{X}})
(by Assumption S3(i) or Assumption S2(i))
\displaystyle\Longleftrightarrow f(Mzd,Yzdm|Z,𝑿)=f(Yzdm|Z,𝑿)f(Mzd|Z,𝑿)\displaystyle f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,{\bm{X}})f(M_{zd}|Z,{\bm{X}})
(by Assumption S2(ii))
\displaystyle\Longleftrightarrow f(Mzd,Yzdm|Z,Dz,𝑿)=f(Yzdm|Z,Dz,𝑿)f(Mzd|Z,Dz,𝑿)\displaystyle f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D_{z^{*}},{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D_{z^{*}},{\bm{X}})f(M_{zd}|Z,D_{z^{*}},{\bm{X}})
\displaystyle\Longleftrightarrow f(Mzd,Yzdm|Z=z,Dz,𝑿)=f(Yzdm|Z=z,Dz,𝑿)f(Mzd|Z=z,Dz,𝑿)\displaystyle f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z=z^{*},D_{z^{*}},{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z=z^{*},D_{z^{*}},{\bm{X}})f(M_{zd}|Z=z^{*},D_{z^{*}},{\bm{X}})
\displaystyle\Longleftrightarrow f(Mzd,Yzdm|Z,D,𝑿)=f(Yzdm|Z,D,𝑿)f(Mzd|Z,D,𝑿)\displaystyle f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D,{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D,{\bm{X}})f(M_{zd}|Z,D,{\bm{X}})
$\Longleftrightarrow$ Assumption S2(iii).

Then we conclude the proof. \square

Proof of Remark 2 and Remark 3. Remark 3 follows from Lemma S8. Also, note that Lemma S8 suggests that Assumptions 2, 4, and 5 are equivalent to Lemma S8 when consistency (Assumption 1) and monotonicity (Assumption 3) hold. Remark 2 then follows directly from Lemma S7. \square

D.3 Moment type estimators (Theorem 2 and Proposition 1)

Proof of Theorem 2. One can easily verify that θd1d0(zz)=θd1d0(zz),d\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}} by direct comparison. Below we show θd1d0(zz),c=θd1d0(zz),d\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{c}}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}:

\begin{align*}
\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{c}}=& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\mu_{zd_{z}}(M,{\bm{X}})\right]\\
=& \int_{{\bm{x}}}\int_{m}\left\{\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{1}{p_{z^{\prime}d_{z^{\prime}}}({\bm{x}})\pi_{z^{\prime}}({\bm{x}})}\mu_{zd_{z}}(m,{\bm{x}})\right\}f_{D|Z,{\bm{X}}}(d_{z^{\prime}}|z^{\prime},{\bm{x}})f_{Z|{\bm{X}}}(z^{\prime}|{\bm{x}})\\
&\quad\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
=& \int_{{\bm{x}}}\int_{m}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\mu_{zd_{z}}(m,{\bm{x}})\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
=& \int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}\mu_{zd_{z}}(m,{\bm{x}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})\text{d}m\,\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
=& \int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{zz^{\prime}}({\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}.
\end{align*}

Next, we show θd1d0(zz),b=θd1d0(zz),d\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{b}}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}:

θd1d0(zz),b=\displaystyle\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{b}}= 𝔼[{𝕀(Z=z,D=d)πz(𝑿)k(1Z)Dπ0(𝑿)}ηzz(𝑿)pzdkp01]\displaystyle\mathbb{E}\left[\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)D}{\pi_{0}({\bm{X}})}\right\}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]
=\displaystyle= 𝒙{1πz(𝒙)ηzz(𝒙)pzdkp01}fD|Z,𝑿(d|z,𝒙)fZ|𝑿(z|𝒙)d𝑿(𝒙)\displaystyle\int_{{\bm{x}}}\left\{\frac{1}{\pi_{z^{*}}({\bm{x}})}\frac{\eta_{zz^{\prime}}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\right\}f_{D|Z,{\bm{X}}}(d^{*}|z^{*},{\bm{x}})f_{Z|{\bm{X}}}(z^{*}|{\bm{x}})\text{d}\mathbb{P}_{\bm{X}}({\bm{x}})
𝒙{kπ0(𝒙)ηzz(𝒙)pzdkp01}fD|Z,𝑿(1|0,𝒙)fZ|𝑿(0|𝒙)d𝑿(𝒙)\displaystyle-\int_{{\bm{x}}}\left\{\frac{k}{\pi_{0}({\bm{x}})}\frac{\eta_{zz^{\prime}}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\right\}f_{D|Z,{\bm{X}}}(1|0,{\bm{x}})f_{Z|{\bm{X}}}(0|{\bm{x}})\text{d}\mathbb{P}_{\bm{X}}({\bm{x}})
=\displaystyle= 𝒙{fD|Z,𝑿(d|z,𝒙)kfD|Z,𝑿(1|0,𝒙)}ηzz(𝒙)pzdkp01d𝑿(𝒙)\displaystyle\int_{{\bm{x}}}\left\{f_{D|Z,{\bm{X}}}(d^{*}|z^{*},{\bm{x}})-kf_{D|Z,{\bm{X}}}(1|0,{\bm{x}})\right\}\frac{\eta_{zz^{\prime}}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\text{d}\mathbb{P}_{\bm{X}}({\bm{x}})
=\displaystyle= 𝒙pzd(𝒙)kp01(𝒙)pzdkp01ηzz(𝒙)d𝑿(𝒙)=θd1d0(zz),d.\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{zz^{\prime}}({\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}.

Finally, we show θd1d0(zz),a=θd1d0(zz),d\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{a}}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}:

\begin{align*}
\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{a}}=& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}Y\right]\\
=& \int_{{\bm{x}}}\int_{m}\int_{y}\left\{\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{1}{p_{zd_{z}}({\bm{x}})\pi_{z}({\bm{x}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}y\right\}f_{D|Z,{\bm{X}}}(d_{z}|z,{\bm{x}})f_{Z|{\bm{X}}}(z|{\bm{x}})\\
&\quad\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z,d_{z},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
=& \int_{{\bm{x}}}\int_{m}\int_{y}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}y\,\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})\\
&\quad\quad\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z,d_{z},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
=& \int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}\int_{y}y\,\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})\\
&\quad\quad\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z,d_{z},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
=& \int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}\mu_{zd_{z}}(m,{\bm{x}})\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z,d_{z},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
=& \int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})\mu_{zd_{z}}(m,{\bm{x}})\text{d}m\,\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
=& \int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{zz^{\prime}}({\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}.
\end{align*}

This concludes that θd1d0(zz)=θd1d0(zz),a=θd1d0(zz),b=θd1d0(zz),c=θd1d0(zz),d\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\theta^{(zz^{\prime}),\textrm{a}}_{d_{1}d_{0}}=\theta^{(zz^{\prime}),\textrm{b}}_{d_{1}d_{0}}=\theta^{(zz^{\prime}),\textrm{c}}_{d_{1}d_{0}}=\theta^{(zz^{\prime}),\textrm{d}}_{d_{1}d_{0}}. \square
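The four representations can also be checked numerically. The sketch below is only an illustration: it assumes a hypothetical toy law with binary $X$, $Z$, $D$, and $M$, made-up nuisance functions (`pi`, `p`, `r`, `mu`), the complier stratum ($d_{1}d_{0}=10$, so $k=1$ and $z^{*}d^{*}=11$), and $z=1$, $z^{\prime}=0$. Expectations are computed by exact enumeration, with $Y$ replaced by its conditional mean, which is valid because every expression is linear in $Y$.

```python
from itertools import product

# Hypothetical toy law on binary (X, Z, D, M); all numbers are illustrative.
fx = {0: 0.4, 1: 0.6}                      # P(X = x)

def pi(z, x):                              # pi_z(x) = P(Z = z | X = x)
    p1 = 0.3 + 0.4 * x
    return p1 if z == 1 else 1.0 - p1

def p(z, d, x):                            # p_zd(x) = P(D = d | Z = z, X = x)
    p1 = 0.2 + 0.1 * x + 0.5 * z - 0.2 * x * z
    return p1 if d == 1 else 1.0 - p1

def r(z, d, m, x):                         # r_zd(m, x) = P(M = m | Z = z, D = d, X = x)
    q = 0.1 + 0.2 * z + 0.3 * d + 0.2 * x
    return q if m == 1 else 1.0 - q

def mu(z, d, m, x):                        # mu_zd(m, x) = E[Y | Z = z, D = d, M = m, X = x]
    return 1.0 + 0.5 * z + 0.7 * d + 1.2 * m - 0.4 * x + 0.3 * z * m

# Complier stratum: k = 1, z*d* = 11; compare z = 1 with z' = 0, so d_z = 1, d_z' = 0.
zs, ds, k = 1, 1, 1
z, zp = 1, 0
dz, dzp = 1, 0

# Marginal stratum proportion p_{z*d*} - k p_{01}.
denom = sum(fx[x] * (p(zs, ds, x) - k * p(0, 1, x)) for x in fx)

def w(x):                                  # principal-score weight
    return (p(zs, ds, x) - k * p(0, 1, x)) / denom

def eta(x):                                # eta_{zz'}(x) = sum_m mu_{z d_z}(m, x) r_{z' d_z'}(m, x)
    return sum(mu(z, dz, m, x) * r(zp, dzp, m, x) for m in (0, 1))

def expect(g):
    """Exact expectation of g(x, z, d, m, y), with y set to its conditional mean."""
    total = 0.0
    for x, zz, d, m in product((0, 1), repeat=4):
        prob = fx[x] * pi(zz, x) * p(zz, d, x) * r(zz, d, m, x)
        total += prob * g(x, zz, d, m, mu(zz, d, m, x))
    return total

theta_d = sum(fx[x] * w(x) * eta(x) for x in fx)
theta_c = expect(lambda x, zz, d, m, y:
                 w(x) * (zz == zp and d == dzp) / (p(zp, dzp, x) * pi(zp, x))
                 * mu(z, dz, m, x))
theta_b = expect(lambda x, zz, d, m, y:
                 ((zz == zs and d == ds) / pi(zs, x) - k * (1 - zz) * d / pi(0, x))
                 * eta(x) / denom)
theta_a = expect(lambda x, zz, d, m, y:
                 w(x) * (zz == z and d == dz) / (p(z, dz, x) * pi(z, x))
                 * r(zp, dzp, m, x) / r(z, dz, m, x) * y)

print(theta_a, theta_b, theta_c, theta_d)
```

The four printed values agree up to floating-point rounding, mirroring the chain of equalities in the proof. The check says nothing about identification itself; it only confirms the algebra linking the moment representations under the assumed toy law.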

Next, we proceed with the proof of Proposition 1.

Proof of Proposition 1. Here we only prove the consistency and asymptotic normality of θ^d1d0(zz),d\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}} under emo\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}, and the proof can be easily extended to the other three moment-type estimators, θ^d1d0(zz),a\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{a}}, θ^d1d0(zz),b\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{b}}, and θ^d1d0(zz),c\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{c}}.

Let $\bm{\tau}$ be all of the parameters in the parametric working models of $h_{nuisance}^{\text{par}}$, and let $\bm{\tau}^{*}$ be the probability limit of $\widehat{\bm{\tau}}$. Let $\widetilde{h}_{nuisance}=\{\widetilde{\pi}_{z}({\bm{x}}),\widetilde{p}_{zd}({\bm{x}}),\widetilde{r}_{zd}(m,{\bm{x}}),\widetilde{\mu}_{zd}(m,{\bm{x}})\}$ be the value of $h_{nuisance}^{\text{par}}$ evaluated at $\bm{\tau}^{*}$; that is, $\widetilde{h}_{nuisance}$ is the probability limit of $\widehat{h}_{nuisance}^{\text{par}}$. Under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, we have $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$, $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$, and $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$, but we allow $\widetilde{\pi}_{z}({\bm{x}})\neq\pi_{z}({\bm{x}})$ due to possible misspecification of $\mathcal{M}_{\pi}$. Let $\widetilde{p}_{zd}=\mathbb{E}\left[\frac{\mathbb{I}(Z=z)(\mathbb{I}(D=d)-\widetilde{p}_{zd}({\bm{X}}))}{\widetilde{\pi}_{z}({\bm{X}})}+\widetilde{p}_{zd}({\bm{X}})\right]$ be the probability limit of the doubly robust estimator $p_{zd}^{\text{dr}}$. According to Jiang et al. (2022), $\widetilde{p}_{zd}=p_{zd}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$, a condition that is implied by $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

Next, we prove the consistency and asymptotic normality of θ^d1d0(zz),d\widehat{\theta}^{(zz^{\prime}),\text{d}}_{d_{1}d_{0}}. Notice that θ^d1d0(zz),d\widehat{\theta}^{(zz^{\prime}),\text{d}}_{d_{1}d_{0}} can be viewed as the solution of the following estimating equation

n[Sd(𝑶;θd1d0(zz),𝝉^)]=n[𝒮1(𝑶;𝝉^)θd1d0(zz)𝒮0(𝑶;𝝉^)]=0,\mathbb{P}_{n}\left[S_{\text{d}}\left({\bm{O}};\theta_{d_{1}d_{0}}^{(zz^{\prime})},\widehat{\bm{\tau}}\right)\right]=\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\widehat{\bm{\tau}})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\mathcal{S}_{0}(\bm{O};\widehat{\bm{\tau}})\right]=0,

where

𝒮1(𝑶;𝝉)=pzdpar(𝑿)kp01par(𝑿)pzddrkp01drmμzdzpar(m,𝑿)rzdzpar(m,𝑿)𝑑m\mathcal{S}_{1}(\bm{O};\bm{\tau})=\frac{p_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-kp_{01}^{\text{par}}({\bm{X}})}{p_{z^{*}d^{*}}^{\text{dr}}-kp_{01}^{\text{dr}}}\int_{m}\mu_{zd_{z}}^{\text{par}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(m,{\bm{X}})dm

and

𝒮0(𝑶;𝝉)=n[𝕀(Z=z){𝕀(D=d)pzdpar(𝑿)}πzpar(𝑿)+pzdpar(𝑿)].\mathcal{S}_{0}(\bm{O};\bm{\tau})=\mathbb{P}_{n}\left[\frac{\mathbb{I}(Z=z)\left\{\mathbb{I}(D=d)-p_{zd}^{\text{par}}({\bm{X}})\right\}}{{\pi}_{z}^{\text{par}}({\bm{X}})}+p_{zd}^{\text{par}}({\bm{X}})\right].

In addition, assume that the following regularity conditions hold:

  1. Assume that $\sqrt{n}(\widehat{\bm{\tau}}-\bm{\tau}^{*})=\sqrt{n}\mathbb{P}_{n}\left[\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right]+o_{p}(1)$, where $\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})$ is the influence function of $\widehat{\bm{\tau}}$ and $o_{p}(1)$ is a remainder term that converges in probability to 0. Also, assume that $\mathbb{P}_{n}\left[\left\{\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}^{\otimes 2}\right]$ converges to a positive definite matrix.

  2. Let $\bm{\Xi}$ be a bounded convex neighborhood of $\bm{\tau}^{*}$. Assume that the class of functions $\Big\{\mathcal{S}_{1}(\bm{O};\bm{\tau}),\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}),\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau})\right\}^{2},\mathcal{S}_{0}(\bm{O};\bm{\tau}),\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}),\left\{\mathcal{S}_{0}(\bm{O};\bm{\tau})\right\}^{2},\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}),\left\{\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau})\right\}^{\otimes 2}\Big\}$ is a Glivenko-Cantelli class in $\bm{\Xi}$.

  3. Assume that $\mathbb{P}_{n}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]$ converges to a positive value. In addition, assume that both $\mathbb{P}_{n}[\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})\right\}^{2}]$ and $\mathbb{P}_{n}[\left\{\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right\}^{2}]$ converge to positive values.

To prove asymptotic normality, we apply a first-order Taylor expansion of the estimating equation around $\bm{\tau}^{*}$ and use the above conditions to deduce that

0=n[Sd(𝑶;θ^d1d0(zz),d,𝝉^)]=\displaystyle 0=\mathbb{P}_{n}\left[S_{\text{d}}\left({\bm{O}};\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}},\widehat{\bm{\tau}}\right)\right]= n[𝒮1(𝑶;𝝉^)θ^d1d0(zz),d𝒮0(𝑶;𝝉^)]\displaystyle\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\widehat{\bm{\tau}})-\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\mathcal{S}_{0}(\bm{O};\widehat{\bm{\tau}})\right]
=\displaystyle= n[𝒮1(𝑶;𝝉)θd1d0(zz),d𝒮0(𝑶;𝝉)]\displaystyle\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]
n[𝒮0(𝑶;𝝉)](θ^d1d0(zz),dθd1d0(zz),d)\displaystyle-\mathbb{P}_{n}\left[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\right)
+n[𝝉𝒮1(𝑶;𝝉)θd1d0(zz),d𝝉𝒮0(𝑶;𝝉)](𝝉^𝝉)+op(n1/2),\displaystyle+\mathbb{P}_{n}\left[\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right](\widehat{\bm{\tau}}-\bm{\tau}^{*})+o_{p}(n^{-1/2}),

which suggests that

\begin{align*}
&\sqrt{n}\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\right)\\
=& \left\{\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]\right\}^{-1}\sqrt{n}\,\mathbb{P}_{n}\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})+R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}},\bm{\tau}^{*})\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}+o_{p}(1),
\end{align*}

where R(θd1d0(zz),d,𝝉)=𝔼[𝝉𝒮1(𝑶;𝝉)θd1d0(zz),d𝝉𝒮0(𝑶;𝝉)]R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}},\bm{\tau}^{*})=\mathbb{E}\left[\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]. Then, by applying the central limit theorem and noticing that θd1d0(zz),d=θd1d0(zz)\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}=\theta_{d_{1}d_{0}}^{(zz^{\prime})} under emo\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}, we can show that n(θ^d1d0(zz),dθd1d0(zz))\sqrt{n}\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\right) converges to a zero-mean normal distribution with variance

Vd={𝔼[𝒮0(𝑶;𝝉)]}2𝔼[{𝒮1(𝑶;𝝉)θd1d0(zz)𝒮0(𝑶;𝝉)+R(θd1d0(zz),𝝉)IF𝝉(𝑶;𝝉)}2].V_{\text{d}}=\left\{\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]\right\}^{-2}\mathbb{E}\left[\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})+R(\theta_{d_{1}d_{0}}^{(zz^{\prime})},\bm{\tau}^{*})\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}^{2}\right].
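The variance formula has the usual form for a ratio-type estimating equation. The sketch below illustrates only that structure on simulated data: the draws `s0` and `s1` are hypothetical stand-ins for $\mathcal{S}_{0}$ and $\mathcal{S}_{1}$, the true ratio is set to 2 by construction, and the nuisance-estimation correction $R(\cdot)\,\text{IF}_{\bm{\tau}}$ is assumed to be zero, which would not hold in general when $\bm{\tau}$ is estimated.

```python
import math
import random

random.seed(2024)
n = 20000

# Hypothetical per-observation terms: E[s0] = 1 and E[s1] = 2, so the true ratio is 2.
s0 = [0.5 + random.random() for _ in range(n)]
s1 = [2.0 * a + random.gauss(0.0, 1.0) for a in s0]

mean0 = sum(s0) / n
mean1 = sum(s1) / n

# Solve P_n[s1 - theta * s0] = 0 for theta.
theta_hat = mean1 / mean0

# Plug-in sandwich variance {E[s0]}^{-2} E[(s1 - theta * s0)^2], omitting the R * IF term.
resid = [b - theta_hat * a for a, b in zip(s0, s1)]
v_hat = sum(e * e for e in resid) / n / mean0 ** 2

se = math.sqrt(v_hat / n)
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(theta_hat, se, ci)
```

In an actual analysis the omitted term matters: ignoring $R(\cdot)\,\text{IF}_{\bm{\tau}}$ treats the fitted nuisance parameters as known, so this sketch shows the shape of $V_{\text{d}}$ rather than a valid variance estimator for the moment-type estimators.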

D.4 The efficient influence function (Theorem 3)

We derive the efficient influence function (EIF) of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under the nonparametric model over the observed data $\bm{O}=\{{\bm{X}},Z,D,M,Y\}$. Define $H_{d_{1}d_{0}}^{(zz^{\prime})}=\mathbb{E}\left[(p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}))\eta_{zz^{\prime}}({\bm{X}})\right]$, where $k=|d_{1}-d_{0}|$ and $z^{*}d^{*}=11,10,01$ if $d_{1}d_{0}=10,00,11$, respectively. Then, $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=H_{d_{1}d_{0}}^{(zz^{\prime})}/e_{d_{1}d_{0}}$, where $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}=\mathbb{E}[p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})]$. The following lemma gives the EIFs of $H_{d_{1}d_{0}}^{(zz^{\prime})}$ and $e_{d_{1}d_{0}}$ separately.

Lemma S9

For any z,z{0,1}z,z^{\prime}\in\{0,1\}, d1d0𝒰ad_{1}d_{0}\in\mathcal{U}_{\text{a}} under standard monotonicity, and d1d0𝒰bd_{1}d_{0}\in\mathcal{U}_{\text{b}} under strong monotonicity, the EIF of Hd1d0(zz)H_{d_{1}d_{0}}^{(zz^{\prime})} over np\mathcal{M}_{np} is 𝒟d1d0(zz),H(𝐎)=ψd1d0(zz)(𝐎)Hd1d0(zz)\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})=\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})-H_{d_{1}d_{0}}^{(zz^{\prime})}, where

ψd1d0(zz)(𝑶)=\displaystyle\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})= (𝕀(Z=z){𝕀(D=d)pzd(𝑿)}πz(𝑿)k(1Z){Dp01(𝑿)}π0(𝑿))ηzz(𝑿)\displaystyle\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\eta_{zz^{\prime}}({\bm{X}})
+{pzd(𝑿)kp01(𝑿)}𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)rzdz(M,𝑿)rzdz(M,𝑿){Yμzdz(M,𝑿)}\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}
+{pzd(𝑿)kp01(𝑿)}𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿){μzdz(M,𝑿)ηzz(𝑿)}\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}
+{pzd(𝑿)kp01(𝑿)}ηzz(𝑿),\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\eta_{zz^{\prime}}({\bm{X}}),

and dzd_{z}, dzd_{z^{\prime}}, and ηzz(𝐗)\eta_{zz^{\prime}}({\bm{X}}) are defined in Theorem 1. The EIF of ed1d0e_{d_{1}d_{0}} over np\mathcal{M}_{np} is 𝒟d1d0e(𝐎)=δd1d0(𝐎)ed1d0\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})=\delta_{d_{1}d_{0}}(\bm{O})-e_{d_{1}d_{0}}, where

δd1d0(𝑶)=𝕀(Z=z){𝕀(D=d)pzd(𝑿)}πz(𝑿)k(1Z){Dp01(𝑿)}π0(𝑿)+pzd(𝑿)kp01(𝑿).\delta_{d_{1}d_{0}}(\bm{O})=\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}+p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}).
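For orientation, the two EIFs in the lemma combine into an EIF for the ratio $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=H_{d_{1}d_{0}}^{(zz^{\prime})}/e_{d_{1}d_{0}}$ through the standard quotient rule for pathwise derivatives. The display below is a sketch of that step; the symbol $\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),\theta}$ is notation assumed here for illustration, and the second equality uses $H_{d_{1}d_{0}}^{(zz^{\prime})}=\theta_{d_{1}d_{0}}^{(zz^{\prime})}e_{d_{1}d_{0}}$.

```latex
\begin{align*}
\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),\theta}(\bm{O})
&= \frac{\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})
   -\theta_{d_{1}d_{0}}^{(zz^{\prime})}\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})}{e_{d_{1}d_{0}}} \\
&= \frac{\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})
   -\theta_{d_{1}d_{0}}^{(zz^{\prime})}\delta_{d_{1}d_{0}}(\bm{O})}{e_{d_{1}d_{0}}}.
\end{align*}
```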
Proof.

To simplify notation, we abbreviate fY|Z,D,M,𝑿(y|z,d,m,𝒙)f_{Y|Z,D,M,{\bm{X}}}(y|z,d,m,{\bm{x}}), fM|Z,D,𝑿(m|z,d,𝒙)f_{M|Z,D,{\bm{X}}}(m|z,d,{\bm{x}}), fD|Z,𝑿(d|z,𝒙)f_{D|Z,{\bm{X}}}(d|z,{\bm{x}}), fZ|𝑿(z|𝒙)f_{Z|{\bm{X}}}(z|{\bm{x}}), and f𝑿(𝒙)f_{{\bm{X}}}({\bm{x}}) as f(y|z,d,m,𝒙)f(y|z,d,m,{\bm{x}}), f(m|z,d,𝒙)f(m|z,d,{\bm{x}}), f(d|z,𝒙)f(d|z,{\bm{x}}), f(z|𝒙)f(z|{\bm{x}}), and f(𝒙)f({\bm{x}}), respectively. We let f𝑶(𝒐)f_{\bm{O}}(\bm{o}) be the joint density of the observed data 𝑶\bm{O}, which is abbreviated as f(𝒐)f(\bm{o}) hereafter. Notice that f(𝒐)f(\bm{o}) can be factorized as

f(𝒐)=f(y|z,d,m,𝒙)f(m|z,d,𝒙)f(d|z,𝒙)f(z|𝒙)f(𝒙).f(\bm{o})=f(y|z,d,m,{\bm{x}})f(m|z,d,{\bm{x}})f(d|z,{\bm{x}})f(z|{\bm{x}})f({\bm{x}}).

We consider a parametric submodel ft(𝒐)f_{t}(\bm{o}) for f(𝒐)f(\bm{o}), which depends on a one-dimensional parameter tt. We assume that ft(𝒐)f_{t}(\bm{o}) contains the true model f(𝒐)f(\bm{o}) at t=0t=0; i.e., ft=0(𝒐)=f(𝒐)f_{t=0}(\bm{o})=f(\bm{o}). Let St(y,m,d,z,𝒙)=St(𝒐)S_{t}(y,m,d,z,{\bm{x}})=S_{t}(\bm{o}) be the score function of this parametric submodel, which is defined as

St(𝒐)=tlogft(𝒐)S_{t}(\bm{o})=\triangledown_{t}\log f_{t}(\bm{o})

where tlogft()=logft()t\triangledown_{t}\log f_{t}(\cdot)=\frac{\partial\log f_{t}(\cdot)}{\partial t}. We can decompose the score function as a summation of the following 5 parts:

St(𝒐)=St(y|z,d,m,𝒙)+St(m|z,d,𝒙)+St(d|z,𝒙)+St(z|𝒙)+St(𝒙),\displaystyle S_{t}(\bm{o})=S_{t}(y|z,d,m,{\bm{x}})+S_{t}(m|z,d,{\bm{x}})+S_{t}(d|z,{\bm{x}})+S_{t}(z|{\bm{x}})+S_{t}({\bm{x}}),

where St(y|z,d,m,𝒙)=tlogft(y|z,d,m,𝒙)S_{t}(y|z,d,m,{\bm{x}})=\triangledown_{t}\log f_{t}(y|z,d,m,{\bm{x}}), and St(m|z,d,𝒙)S_{t}(m|z,d,{\bm{x}}), St(d|z,𝒙)S_{t}(d|z,{\bm{x}}), St(z|𝒙)S_{t}(z|{\bm{x}}), and St(𝒙)S_{t}({\bm{x}}) are similarly defined. According to the semiparametric efficiency theory (Bickel et al., 1993), the EIF of Hd1d0(zz)H_{d_{1}d_{0}}^{(zz^{\prime})}, denoted by 𝒟d1d0(zz),H(𝑶)\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O}), must satisfy the following equation:

𝔼[𝒟d1d0(zz),H(𝑶)St=0(𝑶)]=t=0Hd1d0(zz)(t),\mathbb{E}\left[\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})S_{t=0}(\bm{O})\right]=\triangledown_{t=0}H_{d_{1}d_{0}}^{(zz^{\prime})}(t),

where

\[
H_{d_{1}d_{0}}^{(zz^{\prime})}(t)=\iiint_{{\bm{x}},m,y}\left\{f_{t}(d^{*}|z^{*},{\bm{x}})-kf_{t}(D=1|Z=0,{\bm{x}})\right\}yf_{t}(y|z,d_{z},m,{\bm{x}})f_{t}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f_{t}({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}
\]

is Hd1d0(zz)H_{d_{1}d_{0}}^{(zz^{\prime})} evaluated under the parametric submodel ft(𝒐)f_{t}(\bm{o}).
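The defining identity $\mathbb{E}[\mathcal{D}(\bm{O})S_{t=0}(\bm{O})]=\triangledown_{t=0}H(t)$ can be seen at work in a much simpler setting. The sketch below is not the functional $H_{d_{1}d_{0}}^{(zz^{\prime})}$ of this section: it assumes a hypothetical three-point distribution for a single variable $Y$, takes the mean as the target functional (whose EIF is $y-\mathbb{E}[Y]$), and uses the submodel $f_{t}(y)=f(y)\{1+t\,s(y)\}$ with a mean-zero score $s$.

```python
# Hypothetical discrete law for Y; all values are illustrative.
support = [0.0, 1.0, 2.0]
f = [0.2, 0.5, 0.3]

# Build a score with mean zero under f, so f_t(y) = f(y) * (1 + t * s(y)) sums to one.
g_raw = [1.0, -2.0, 0.5]
mean_g = sum(pr * g for pr, g in zip(f, g_raw))
score = [g - mean_g for g in g_raw]

def H(t):
    """Mean of Y under the one-dimensional submodel f_t."""
    ft = [pr * (1.0 + t * s) for pr, s in zip(f, score)]
    return sum(y * pr for y, pr in zip(support, ft))

mu = H(0.0)
eif = [y - mu for y in support]            # EIF of the mean functional

# Left side: E[D(Y) S(Y)]; right side: central finite difference of H at t = 0.
lhs = sum(pr * d * s for pr, d, s in zip(f, eif, score))
eps = 1e-4
rhs = (H(eps) - H(-eps)) / (2.0 * eps)
print(lhs, rhs)
```

The two printed numbers agree up to rounding. The derivation that follows carries out the same matching for $H_{d_{1}d_{0}}^{(zz^{\prime})}(t)$, one factor of the likelihood at a time.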

Below we derive 𝒟d1d0(zz),H(𝑶)\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O}) by solving t=0Hd1d0(zz)(t)\triangledown_{t=0}H_{d_{1}d_{0}}^{(zz^{\prime})}(t) directly. Specifically, we can show

\begin{align*}
&\triangledown_{t=0}H_{d_{1}d_{0}}^{(zz^{\prime})}(t)\\
=& \int_{{\bm{x}}}\left\{\triangledown_{t=0}f_{t}(d^{*}|z^{*},{\bm{x}})-k\triangledown_{t=0}f_{t}(D=1|Z=0,{\bm{x}})\right\}\iint_{m,y}yf(y|z,d_{z},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}} \tag{s16}\\
&+\int_{{\bm{x}}}\left\{f(d^{*}|z^{*},{\bm{x}})-kf(D=1|Z=0,{\bm{x}})\right\}\iint_{m,y}y\triangledown_{t=0}f_{t}(y|z,d_{z},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}} \tag{s17}\\
&+\int_{{\bm{x}}}\left\{f(d^{*}|z^{*},{\bm{x}})-kf(D=1|Z=0,{\bm{x}})\right\}\iint_{m,y}yf(y|z,d_{z},m,{\bm{x}})\triangledown_{t=0}f_{t}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}} \tag{s18}\\
&+\int_{{\bm{x}}}\left\{f(d^{*}|z^{*},{\bm{x}})-kf(D=1|Z=0,{\bm{x}})\right\}\iint_{m,y}yf(y|z,d_{z},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\triangledown_{t=0}f_{t}({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}, \tag{s19}
\end{align*}

where

\begin{align*}
&\text{(s16)}\\
=& \iint_{{\bm{x}},m}\left\{\triangledown_{t=0}f_{t}(d^{*}|z^{*},{\bm{x}})-k\triangledown_{t=0}f_{t}(D=1|Z=0,{\bm{x}})\right\}\mathbb{E}[Y|z,d_{z},m,{\bm{x}}]f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\text{d}mf({\bm{x}})\text{d}{\bm{x}}\\
=& \int_{{\bm{x}}}\triangledown_{t=0}f_{t}(d^{*}|z^{*},{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})f({\bm{x}})\text{d}{\bm{x}}-k\int_{{\bm{x}}}\triangledown_{t=0}f_{t}(D=1|Z=0,{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})f({\bm{x}})\text{d}{\bm{x}}\\
=& \int_{{\bm{x}}}f(d^{*}|z^{*},{\bm{x}})\left\{\mathbb{E}_{Y,M|Z,D,{\bm{X}}}[S_{t=0}(Y,M,d^{*},z^{*},{\bm{x}})|z^{*},d^{*},{\bm{x}}]-\mathbb{E}_{Y,M,D|Z,{\bm{X}}}[S_{t=0}(Y,M,D,z^{*},{\bm{x}})|z^{*},{\bm{x}}]\right\}\eta_{zz^{\prime}}({\bm{x}})f({\bm{x}})\text{d}{\bm{x}}\\
&-k\int_{{\bm{x}}}f(D=1|Z=0,{\bm{x}})\left\{\mathbb{E}_{Y,M|Z,D,{\bm{X}}}[S_{t=0}(Y,M,1,0,{\bm{x}})|0,1,{\bm{x}}]-\mathbb{E}_{Y,M,D|Z,{\bm{X}}}[S_{t=0}(Y,M,D,0,{\bm{x}})|0,{\bm{x}}]\right\}\eta_{zz^{\prime}}({\bm{x}})f({\bm{x}})\text{d}{\bm{x}}\\
=& \iiint_{{\bm{x}},m,y}f(d^{*}|z^{*},{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,d^{*},z^{*},{\bm{x}})f(y|z^{*},d^{*},m,{\bm{x}})f(m|z^{*},d^{*},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}\\
&-\iiiint_{{\bm{x}},d,m,y}f(d^{*}|z^{*},{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,d,z^{*},{\bm{x}})f(y|z^{*},d,m,{\bm{x}})f(m|z^{*},d,{\bm{x}})f(d|z^{*},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}d\text{d}{\bm{x}}\\
&-k\iiint_{{\bm{x}},m,y}f(D=1|Z=0,{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,1,0,{\bm{x}})f(y|0,1,m,{\bm{x}})f(m|0,1,{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}\\
&+k\iiiint_{{\bm{x}},d,m,y}f(D=1|Z=0,{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,d,0,{\bm{x}})f(y|0,d,m,{\bm{x}})f(m|0,d,{\bm{x}})f(d|0,{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}d\text{d}{\bm{x}}\\
=& \mathbb{E}\left[\frac{\mathbb{I}(D=d^{*},Z=z^{*})}{p_{z^{*}d^{*}}({\bm{X}})\pi_{z^{*}}({\bm{X}})}p_{z^{*}d^{*}}({\bm{X}})\eta_{zz^{\prime}}({\bm{X}})S_{t=0}(\bm{O})\right]-\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}p_{z^{*}d^{*}}({\bm{X}})\eta_{zz^{\prime}}({\bm{X}})S_{t=0}(\bm{O})\right]\\
&-k\mathbb{E}\left[\frac{\mathbb{I}(D=1,Z=0)}{p_{01}({\bm{X}})\pi_{0}({\bm{X}})}p_{01}({\bm{X}})\eta_{zz^{\prime}}({\bm{X}})S_{t=0}(\bm{O})\right]+k\mathbb{E}\left[\frac{\mathbb{I}(Z=0)}{\pi_{0}({\bm{X}})}p_{01}({\bm{X}})\eta_{zz^{\prime}}({\bm{X}})S_{t=0}(\bm{O})\right]\\
=& \mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\eta_{zz^{\prime}}({\bm{X}})S_{t=0}({\bm{O}})\right]
\end{align*}

and

(s17)=\displaystyle\eqref{eq:eif2}= 𝒙{pzd(𝒙)kp01(𝒙)}m,yyt=0ft(y|z,dz,m,𝒙)f(m|z,dz,𝒙)f(𝒙)dydmd𝒙\displaystyle\int_{{\bm{x}}}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\iint_{m,y}y\triangledown_{t=0}f_{t}(y|z,d_{z},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}
=\displaystyle= 𝒙,m,y{pzd(𝒙)kp01(𝒙)}yf(y|z,dz,m,𝒙){St=0(y,m,dz,z,𝒙)𝔼Y|M,D,Z,𝑿[S(Y,m,dz,z,𝒙)|m,dz,z,𝒙]}\displaystyle\iiint_{{\bm{x}},m,y}\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}yf(y|z,d_{z},m,{\bm{x}})\left\{S_{t=0}(y,m,d_{z},z,{\bm{x}})-\mathbb{E}_{Y|M,D,Z,{\bm{X}}}[S(Y,m,d_{z},z,{\bm{x}})|m,d_{z},z,{\bm{x}}]\right\}
f(m|z,dz,𝒙)f(𝒙)dydmd𝒙\displaystyle\quad f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}
=\displaystyle= 𝒙,m,y{pzd(𝒙)kp01(𝒙)}yf(y|z,dz,m,𝒙)St=0(y,m,dz,z,𝒙)f(m|z,dz,𝒙)f(𝒙)dydmd𝒙\displaystyle\iiint_{{\bm{x}},m,y}\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}yf(y|z,d_{z},m,{\bm{x}})S_{t=0}(y,m,d_{z},z,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}
𝒙,m,y{pzd(𝒙)kp01(𝒙)}𝔼Y|Z,D,M,𝑿[Y|z,dz,m,𝒙]St=0(y,m,dz,z,𝒙)f(y|z,dz,m,𝒙)\displaystyle-\iiint_{{\bm{x}},m,y}\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},m,{\bm{x}}]S_{t=0}(y,m,d_{z},z,{\bm{x}})f(y|z,d_{z},m,{\bm{x}})
f(m|z,dz,𝒙)f(𝒙)dydmd𝒙\displaystyle\quad\quad f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}
=\displaystyle= 𝒙,m,y{pzd(𝒙)kp01(𝒙)}yf(m|z,dz,𝒙)f(m|z,dz,𝒙)St=0(y,m,dz,z,𝒙)f(y|z,dz,m,𝒙)f(m|z,dz,𝒙)f(𝒙)dydmd𝒙\displaystyle\iiint_{{\bm{x}},m,y}\!\!\!\!\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}y\frac{f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})}{f(m|z,d_{z},{\bm{x}})}S_{t=0}(y,m,d_{z},z,{\bm{x}})f(y|z,d_{z},m,{\bm{x}})f(m|z,d_{z},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}
𝒙,m,y{pzd(𝒙)kp01(𝒙)}μzdz(m,𝒙)f(m|z,dz,𝒙)f(m|z,dz,𝒙)St=0(y,m,dz,z,𝒙)f(y|z,dz,m,𝒙)\displaystyle-\iiint_{{\bm{x}},m,y}\!\!\!\!\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}\mu_{zd_{z}}(m,{\bm{x}})\frac{f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})}{f(m|z,d_{z},{\bm{x}})}S_{t=0}(y,m,d_{z},z,{\bm{x}})f(y|z,d_{z},m,{\bm{x}})
f(m|z,dz,𝒙)f(𝒙)dydmd𝒙\displaystyle\quad\quad f(m|z,d_{z},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}
=\displaystyle= 𝔼[{pzd(𝑿)kp01(𝑿)}𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)rzdz(M,𝑿)rzdz(M,𝑿){Yμzdz(M,𝑿)}St=0(𝑶)]\displaystyle\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}S_{t=0}({\bm{O}})\right]

and

(s18)=\displaystyle\eqref{eq:eif3}= 𝒙,m{pzd(𝒙)kp01(𝒙)}μzdz(m,𝒙)t=0ft(m|z,dz,𝒙)f(𝒙)dmd𝒙\displaystyle\iint_{{\bm{x}},m}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\mu_{zd_{z}}(m,{\bm{x}})\triangledown_{t=0}f_{t}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}m\text{d}{\bm{x}}
=\displaystyle= 𝒙,m{pzd(𝒙)kp01(𝒙)}μzdz(m,𝒙)f(m|z,dz,𝒙){𝔼Y|Z,D,M,𝑿[St=0(Y,m,dz,z,𝒙)|z,dz,m,𝒙]\displaystyle\iint_{{\bm{x}},m}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\mu_{zd_{z}}(m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\Big{\{}\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[S_{t=0}(Y,m,d_{z^{\prime}},z^{\prime},{\bm{x}})|z^{\prime},d_{z^{\prime}},m,{\bm{x}}]
𝔼Y,M|Z,D,𝑿[St=0(Y,M,dz,z,𝒙)|z,dz,𝒙]}f(𝒙)dmd𝒙\displaystyle\quad\quad-\mathbb{E}_{Y,M|Z,D,{\bm{X}}}[S_{t=0}(Y,M,d_{z^{\prime}},z^{\prime},{\bm{x}})|z^{\prime},d_{z^{\prime}},{\bm{x}}]\Big{\}}f({\bm{x}})\text{d}m\text{d}{\bm{x}}
=\displaystyle= 𝒙,m,y{pzd(𝒙)kp01(𝒙)}μzdz(m,𝒙)St=0(y,m,dz,z,𝒙)f(y|z,dz,m,𝒙)f(m|z,dz,𝒙)f(𝒙)dydmd𝒙\displaystyle\iiint_{{\bm{x}},m,y}\!\!\!\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\mu_{zd_{z}}(m,{\bm{x}})S_{t=0}(y,m,d_{z^{\prime}},z^{\prime},{\bm{x}})f(y|z^{\prime},d_{z^{\prime}},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}
𝒙,m,y{pzd(𝒙)kp01(𝒙)}ηzz(𝒙)St=0(y,m,dz,z,𝒙)f(y|z,dz,m,𝒙)f(m|z,dz,𝒙)f(𝒙)dydmd𝒙\displaystyle-\iiint_{{\bm{x}},m,y}\!\!\!\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,d_{z^{\prime}},z^{\prime},{\bm{x}})f(y|z^{\prime},d_{z^{\prime}},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}
=\displaystyle= 𝔼[{pzd(𝑿)kp01(𝑿)}𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿){μzdz(M,𝑿)ηzz(𝑿)}St=0(𝑶)]\displaystyle\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}S_{t=0}({\bm{O}})\right]

and

\begin{align*}
\text{(s19)} ={} & \int_{\bm{x}}\left\{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})\right\}\eta_{zz^{\prime}}(\bm{x})\triangledown_{t=0}f_{t}(\bm{x})\,\text{d}\bm{x} \\
={} & \int_{\bm{x}}\left\{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})\right\}\eta_{zz^{\prime}}(\bm{x})f(\bm{x})\left\{\mathbb{E}_{Y,M,D,Z|\bm{X}}[S_{t=0}(Y,M,D,Z,\bm{x})|\bm{x}]-\mathbb{E}[S_{t=0}(Y,M,D,Z,\bm{X})]\right\}\text{d}\bm{x} \\
={} & \mathbb{E}\left[\left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}\eta_{zz^{\prime}}(\bm{X})S_{t=0}(\bm{O})\right]-H_{d_{1}d_{0}}^{(zz^{\prime})}\mathbb{E}\left[S_{t=0}(\bm{O})\right] \\
={} & \mathbb{E}\left[\left(\left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}\eta_{zz^{\prime}}(\bm{X})-H_{d_{1}d_{0}}^{(zz^{\prime})}\right)S_{t=0}(\bm{O})\right].
\end{align*}

Therefore,

t=0Hd1d0(zz)(t)\displaystyle\triangledown_{t=0}H_{d_{1}d_{0}}^{(zz^{\prime})}(t) =(s16)+(s17)+(s18)+(s19)\displaystyle=\eqref{eq:eif1}+\eqref{eq:eif2}+\eqref{eq:eif3}+\eqref{eq:eif4}
=𝔼[{ψd1d0(zz)(𝑶)Hd1d0(zz)}St=0(𝑶)].\displaystyle=\mathbb{E}\left[\left\{\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})-H_{d_{1}d_{0}}^{(zz^{\prime})}\right\}S_{t=0}(\bm{O})\right].

Now we conclude that the EIF of $H_{d_{1}d_{0}}^{(zz^{\prime})}$ is $\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})=\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})-H_{d_{1}d_{0}}^{(zz^{\prime})}$.

Next, we derive the EIF of $e_{d_{1}d_{0}}$, denoted by $\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})$. By semiparametric efficiency theory (Bickel et al., 1993), $\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})$ must satisfy the following equation

𝔼[𝒟d1d0e(𝑶)St=0(𝑶)]=t=0ed1d0(t),\mathbb{E}\left[\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})S_{t=0}(\bm{O})\right]=\triangledown_{t=0}e_{d_{1}d_{0}}(t),

where $e_{d_{1}d_{0}}(t)=\int_{\bm{x}}\left\{f_{t}(d^{*}|z^{*},\bm{x})-kf_{t}(D=1|Z=0,\bm{x})\right\}f_{t}(\bm{x})\,\text{d}\bm{x}$ is $e_{d_{1}d_{0}}$ evaluated under the parametric submodel; recall that $p_{01}(\bm{x})=f(D=1|Z=0,\bm{x})$. We can show

\begin{align*}
\triangledown_{t=0}e_{d_{1}d_{0}}(t)={} & \int_{\bm{x}}\left\{\triangledown_{t=0}f_{t}(d^{*}|z^{*},\bm{x})-k\triangledown_{t=0}f_{t}(D=1|Z=0,\bm{x})\right\}f(\bm{x})\,\text{d}\bm{x} \tag{s20}\\
& +\int_{\bm{x}}\left\{f(d^{*}|z^{*},\bm{x})-kf(D=1|Z=0,\bm{x})\right\}\triangledown_{t=0}f_{t}(\bm{x})\,\text{d}\bm{x}, \tag{s21}
\end{align*}

where

\begin{align*}
\text{(s20)}={} & \int_{\bm{x}}\triangledown_{t=0}f_{t}(d^{*}|z^{*},\bm{x})f(\bm{x})\,\text{d}\bm{x}-k\int_{\bm{x}}\triangledown_{t=0}f_{t}(D=1|Z=0,\bm{x})f(\bm{x})\,\text{d}\bm{x} \\
={} & \int_{\bm{x}}f(d^{*}|z^{*},\bm{x})\left\{\mathbb{E}_{Y,M|Z,D,\bm{X}}[S_{t=0}(Y,M,d^{*},z^{*},\bm{x})|z^{*},d^{*},\bm{x}]-\mathbb{E}_{Y,M,D|Z,\bm{X}}[S_{t=0}(Y,M,D,z^{*},\bm{x})|z^{*},\bm{x}]\right\}f(\bm{x})\,\text{d}\bm{x} \\
& -k\int_{\bm{x}}f(D=1|Z=0,\bm{x})\left\{\mathbb{E}_{Y,M|Z,D,\bm{X}}[S_{t=0}(Y,M,1,0,\bm{x})|Z=0,D=1,\bm{x}]-\mathbb{E}_{Y,M,D|Z,\bm{X}}[S_{t=0}(Y,M,D,0,\bm{x})|Z=0,\bm{x}]\right\}f(\bm{x})\,\text{d}\bm{x} \\
={} & \iiint_{\bm{x},m,y}f(d^{*}|z^{*},\bm{x})S_{t=0}(y,m,d^{*},z^{*},\bm{x})f(y|z^{*},d^{*},m,\bm{x})f(m|z^{*},d^{*},\bm{x})f(\bm{x})\,\text{d}y\,\text{d}m\,\text{d}\bm{x} \\
& -\iiiint_{\bm{x},d,m,y}f(d^{*}|z^{*},\bm{x})S_{t=0}(y,m,d,z^{*},\bm{x})f(y|z^{*},d,m,\bm{x})f(m|z^{*},d,\bm{x})f(d|z^{*},\bm{x})f(\bm{x})\,\text{d}y\,\text{d}m\,\text{d}d\,\text{d}\bm{x} \\
& -k\iiint_{\bm{x},m,y}f(D=1|Z=0,\bm{x})S_{t=0}(y,m,1,0,\bm{x})f(y|0,1,m,\bm{x})f(m|0,1,\bm{x})f(\bm{x})\,\text{d}y\,\text{d}m\,\text{d}\bm{x} \\
& +k\iiiint_{\bm{x},d,m,y}f(D=1|Z=0,\bm{x})S_{t=0}(y,m,d,0,\bm{x})f(y|0,d,m,\bm{x})f(m|0,d,\bm{x})f(d|0,\bm{x})f(\bm{x})\,\text{d}y\,\text{d}m\,\text{d}d\,\text{d}\bm{x} \\
={} & \mathbb{E}\left[\frac{\mathbb{I}(D=d^{*},Z=z^{*})}{p_{z^{*}d^{*}}(\bm{X})\pi_{z^{*}}(\bm{X})}p_{z^{*}d^{*}}(\bm{X})S_{t=0}(\bm{O})\right]-\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}(\bm{X})}p_{z^{*}d^{*}}(\bm{X})S_{t=0}(\bm{O})\right] \\
& -k\mathbb{E}\left[\frac{\mathbb{I}(D=1,Z=0)}{p_{01}(\bm{X})\pi_{0}(\bm{X})}p_{01}(\bm{X})S_{t=0}(\bm{O})\right]+k\mathbb{E}\left[\frac{\mathbb{I}(Z=0)}{\pi_{0}(\bm{X})}p_{01}(\bm{X})S_{t=0}(\bm{O})\right] \\
={} & \mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}(\bm{X})\right\}}{\pi_{z^{*}}(\bm{X})}-k\frac{(1-Z)\left\{D-p_{01}(\bm{X})\right\}}{\pi_{0}(\bm{X})}\right)S_{t=0}(\bm{O})\right]
\end{align*}

and

\begin{align*}
\text{(s21)}={} & \int_{\bm{x}}\left\{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})\right\}\triangledown_{t=0}f_{t}(\bm{x})\,\text{d}\bm{x} \\
={} & \int_{\bm{x}}\left\{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})\right\}f(\bm{x})\left\{\mathbb{E}_{Y,M,D,Z|\bm{X}}[S_{t=0}(Y,M,D,Z,\bm{x})|\bm{x}]-\mathbb{E}[S_{t=0}(Y,M,D,Z,\bm{X})]\right\}\text{d}\bm{x} \\
={} & \mathbb{E}\left[\left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}S_{t=0}(\bm{O})\right]-e_{d_{1}d_{0}}\mathbb{E}\left[S_{t=0}(\bm{O})\right] \\
={} & \mathbb{E}\left[\left(\left\{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})\right\}-e_{d_{1}d_{0}}\right)S_{t=0}(\bm{O})\right].
\end{align*}

Hence, we have that

\[
\triangledown_{t=0}e_{d_{1}d_{0}}(t)=\text{(s20)}+\text{(s21)}=\mathbb{E}\left[\left\{\delta_{d_{1}d_{0}}(\bm{O})-e_{d_{1}d_{0}}\right\}S_{t=0}(\bm{O})\right].
\]

Therefore, the EIF of $e_{d_{1}d_{0}}$ is $\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})=\delta_{d_{1}d_{0}}(\bm{O})-e_{d_{1}d_{0}}$. $\square$
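As an informal numerical check (not part of the formal proof), the sketch below evaluates the mean of the uncentered influence function obtained by collecting the integrands of (s20) and (s21), namely the inverse-probability-weighted residual terms plus $p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})$, by exact enumeration. The binary covariate, the probability tables, and the choice $z^{*}=d^{*}=k=1$ are hypothetical values chosen only for illustration. The mean should equal $e_{d_{1}d_{0}}$ when either the treatment model or the model for $D$ is correct, and generally not when both are wrong.

```python
from itertools import product

# Hypothetical data-generating law with a single binary covariate X.
PX = {0: 0.6, 1: 0.4}                    # P(X = x)
PI1 = {0: 0.3, 1: 0.7}                   # true P(Z = 1 | X = x)
P_D1 = {(0, 0): 0.1, (0, 1): 0.2,        # true P(D = 1 | Z = z, X = x), keyed by (z, x)
        (1, 0): 0.6, (1, 1): 0.8}

def pi(z, x, table):
    """P(Z = z | X = x) under the given table of P(Z = 1 | X)."""
    return table[x] if z == 1 else 1.0 - table[x]

def p(z, d, x, table):
    """P(D = d | Z = z, X = x) under the given table of P(D = 1 | Z, X)."""
    return table[(z, x)] if d == 1 else 1.0 - table[(z, x)]

def true_e(z_star, d_star, k):
    """e = E[p_{z*d*}(X) - k * p_{01}(X)] under the true law."""
    return sum(PX[x] * (p(z_star, d_star, x, P_D1) - k * p(0, 1, x, P_D1))
               for x in (0, 1))

def mean_delta(z_star, d_star, k, pi_work, p_work):
    """Exact mean of the uncentered influence function with working models."""
    total = 0.0
    for x, z, d in product((0, 1), repeat=3):
        weight = PX[x] * pi(z, x, PI1) * p(z, d, x, P_D1)   # true P(X, Z, D)
        term1 = ((z == z_star) * ((d == d_star) - p(z_star, d_star, x, p_work))
                 / pi(z_star, x, pi_work))
        term2 = (1 - z) * (d - p(0, 1, x, p_work)) / pi(0, x, pi_work)
        plug_in = p(z_star, d_star, x, p_work) - k * p(0, 1, x, p_work)
        total += weight * (term1 - k * term2 + plug_in)
    return total

# Deliberately misspecified working models.
PI1_BAD = {0: 0.5, 1: 0.5}
P_D1_BAD = {(0, 0): 0.3, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.5}

target = true_e(1, 1, 1)
both_right = mean_delta(1, 1, 1, PI1, P_D1)
wrong_pi = mean_delta(1, 1, 1, PI1_BAD, P_D1)
wrong_p = mean_delta(1, 1, 1, PI1, P_D1_BAD)
both_wrong = mean_delta(1, 1, 1, PI1_BAD, P_D1_BAD)
```

Comparing `both_right`, `wrong_pi`, and `wrong_p` with `target` illustrates the double robustness of the denominator, while `both_wrong` shows that the property is lost once both working models fail.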

Lemma S10

Assume that $\alpha_{1}$ and $\alpha_{2}$ are two causal estimands whose EIFs under the nonparametric model $\mathcal{M}_{np}$ of the observed data $\bm{O}$ are $\mathcal{D}_{1}(\bm{O})$ and $\mathcal{D}_{2}(\bm{O})$, respectively. Then, the EIF of $\alpha_{3}=\alpha_{1}+\alpha_{2}$ is

\[
\mathcal{D}_{3}(\bm{O})=\mathcal{D}_{1}(\bm{O})+\mathcal{D}_{2}(\bm{O}).
\]

Moreover, if $\alpha_{2}\neq 0$, the EIF of $\alpha_{4}=\alpha_{1}/\alpha_{2}$ is

\[
\mathcal{D}_{4}(\bm{O})=\frac{1}{\alpha_{2}}\left\{\mathcal{D}_{1}(\bm{O})-\alpha_{4}\mathcal{D}_{2}(\bm{O})\right\}.
\]
Proof.

We follow the notation in the proof of Lemma S9. Define $\alpha_{1}(t)$ and $\alpha_{2}(t)$ as the nonparametric identification formulas of the causal estimands $\alpha_{1}$ and $\alpha_{2}$, evaluated under the parametric submodel $f_{t}(\bm{o})$. By semiparametric efficiency theory, we have that

\[
\mathbb{E}\left[\mathcal{D}_{1}(\bm{O})S_{t=0}(\bm{O})\right]=\triangledown_{t=0}\alpha_{1}(t)\text{ and }\mathbb{E}\left[\mathcal{D}_{2}(\bm{O})S_{t=0}(\bm{O})\right]=\triangledown_{t=0}\alpha_{2}(t).
\]

Also, since $\alpha_{3}=\alpha_{1}+\alpha_{2}$, a valid nonparametric identification formula of $\alpha_{3}$ under the parametric submodel is $\alpha_{3}(t)=\alpha_{1}(t)+\alpha_{2}(t)$. Then, noting

\begin{align*}
\triangledown_{t=0}\alpha_{3}(t) & =\triangledown_{t=0}\alpha_{1}(t)+\triangledown_{t=0}\alpha_{2}(t) \\
& =\mathbb{E}\left[\left\{\mathcal{D}_{1}(\bm{O})+\mathcal{D}_{2}(\bm{O})\right\}S_{t=0}(\bm{O})\right] \\
& =\mathbb{E}\left[\mathcal{D}_{3}(\bm{O})S_{t=0}(\bm{O})\right],
\end{align*}

we conclude that $\mathcal{D}_{3}(\bm{O})$ is the EIF of $\alpha_{3}$. Similarly, because $\alpha_{4}=\alpha_{1}/\alpha_{2}$, a valid nonparametric identification formula of $\alpha_{4}$ under the parametric submodel is $\alpha_{4}(t)=\alpha_{1}(t)/\alpha_{2}(t)$. Then, by the quotient rule, we have that

\begin{align*}
\triangledown_{t=0}\alpha_{4}(t) & =\triangledown_{t=0}\left\{\frac{\alpha_{1}(t)}{\alpha_{2}(t)}\right\}=\frac{\alpha_{2}\triangledown_{t=0}\alpha_{1}(t)-\alpha_{1}\triangledown_{t=0}\alpha_{2}(t)}{\alpha_{2}^{2}} \\
& =\frac{1}{\alpha_{2}}\mathbb{E}\left[\mathcal{D}_{1}(\bm{O})S_{t=0}(\bm{O})\right]-\frac{\alpha_{1}}{\alpha_{2}^{2}}\mathbb{E}\left[\mathcal{D}_{2}(\bm{O})S_{t=0}(\bm{O})\right] \\
& =\mathbb{E}\left[\frac{\mathcal{D}_{1}(\bm{O})-\alpha_{4}\mathcal{D}_{2}(\bm{O})}{\alpha_{2}}S_{t=0}(\bm{O})\right],
\end{align*}

thus $\mathcal{D}_{4}(\bm{O})$ is the EIF of $\alpha_{4}$. $\square$
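The ratio rule of Lemma S10 can be checked numerically on a toy example; the following is only an illustrative sketch under assumptions that are not part of the paper. We take a hypothetical four-point distribution, let $\alpha_{1}$ and $\alpha_{2}$ be the means of two arbitrary functions `A` and `B` (whose nonparametric EIFs are the centered functions), and use the submodel $p_{t}(o)=p(o)\{1+t\,h(o)\}$ with mean-zero $h$, whose score at $t=0$ is $h$. The pathwise derivative of $\alpha_{1}(t)/\alpha_{2}(t)$, approximated by a central finite difference, should match $\mathbb{E}[\mathcal{D}_{4}(\bm{O})S_{t=0}(\bm{O})]$.

```python
# Hypothetical discrete law on four support points.
P = [0.2, 0.3, 0.1, 0.4]          # probabilities p(o)
A = [1.0, 2.0, 0.5, 3.0]          # alpha_1 = E[A(O)]
B = [2.0, 1.0, 4.0, 1.5]          # alpha_2 = E[B(O)], nonzero here
H_RAW = [1.0, -0.5, 2.0, 0.3]     # arbitrary direction for the submodel

def mean(values, probs):
    """Expectation of a function on the finite support."""
    return sum(v * q for v, q in zip(values, probs))

# Center the direction so that it is a valid score (mean zero under P).
h_bar = mean(H_RAW, P)
H = [h - h_bar for h in H_RAW]

def p_t(t):
    """Parametric submodel p_t(o) = p(o) * (1 + t * h(o)); its score at t = 0 is h."""
    return [q * (1.0 + t * h) for q, h in zip(P, H)]

def alpha4(t):
    """Ratio estimand evaluated under the submodel."""
    probs = p_t(t)
    return mean(A, probs) / mean(B, probs)

alpha1, alpha2 = mean(A, P), mean(B, P)
alpha4_0 = alpha1 / alpha2

# EIF of the ratio according to Lemma S10, with D_1 = A - alpha_1 and D_2 = B - alpha_2.
D4 = [((a - alpha1) - alpha4_0 * (b - alpha2)) / alpha2 for a, b in zip(A, B)]

# E[D_4 * S] versus a central finite-difference derivative of alpha_4(t) at t = 0.
eif_inner_product = mean([d * h for d, h in zip(D4, H)], P)
eps = 1e-6
pathwise_derivative = (alpha4(eps) - alpha4(-eps)) / (2.0 * eps)
```

The two quantities `eif_inner_product` and `pathwise_derivative` agree up to finite-difference error, and `D4` has mean zero under `P`, as an influence function should.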

Proof of Theorem 3. Notice that $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\frac{\mathbb{E}\left[(p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X}))\eta_{zz^{\prime}}(\bm{X})\right]}{p_{z^{*}d^{*}}-kp_{01}}$ is a ratio parameter, and the EIFs of its numerator and denominator, $H_{d_{1}d_{0}}^{(zz^{\prime})}=\mathbb{E}\left[(p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X}))\eta_{zz^{\prime}}(\bm{X})\right]$ and $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$, have been derived in Lemma S9. Therefore, one can verify that the EIF shown in Theorem 3 holds by applying Lemma S10. $\square$

D.5 The multiply robust estimator (Theorem 4)

Proof of Theorem 4. Let $\bm{\tau}$ be all of the parameters in the parametric working models of $h_{nuisance}^{\text{par}}$, and let $\bm{\tau}^{*}$ be the probability limit of $\widehat{\bm{\tau}}$, where some components of $\bm{\tau}^{*}$ may not equal their true values due to misspecification. Let $\widetilde{h}_{nuisance}=\{\widetilde{\pi}_{z}(\bm{x}),\widetilde{p}_{zd}(\bm{x}),\widetilde{r}_{zd}(m,\bm{x}),\widetilde{\mu}_{zd}(m,\bm{x})\}$ be the value of $h_{nuisance}^{\text{par}}$ evaluated at $\bm{\tau}^{*}$, which is the probability limit of $\widehat{h}_{nuisance}^{\text{par}}$. Notice that $\widetilde{\pi}_{z}(\bm{x})=\pi_{z}(\bm{x})$, $\widetilde{p}_{zd}(\bm{x})=p_{zd}(\bm{x})$, $\widetilde{r}_{zd}(m,\bm{x})=r_{zd}(m,\bm{x})$, and $\widetilde{\mu}_{zd}(m,\bm{x})=\mu_{zd}(m,\bm{x})$ under $\mathcal{M}_{\pi}$, $\mathcal{M}_{e}$, $\mathcal{M}_{m}$, and $\mathcal{M}_{o}$, respectively, but the equalities generally do not hold when the corresponding working model is misspecified. Let $\widetilde{p}_{zd}=\mathbb{E}\left[\frac{\mathbb{I}(Z=z)(\mathbb{I}(D=d)-\widetilde{p}_{zd}(\bm{X}))}{\widetilde{\pi}_{z}(\bm{X})}+\widetilde{p}_{zd}(\bm{X})\right]$ be the probability limit of $p_{zd}^{\text{dr}}$. According to Jiang et al. (2022), $\widetilde{p}_{zd}=p_{zd}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$. Because each of $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$, and $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ is contained in $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$, $\widetilde{p}_{zd}=p_{zd}$ also holds under any of these four intersections. The previous discussion suggests that the probability limit of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$ is

θd1d0(zz),mr=\displaystyle\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}= 𝔼{(𝕀(Z=z){𝕀(D=d)p~zd(𝑿)}π~z(𝑿)k(1Z){Dp~01(𝑿)}π~0(𝑿))η~zz(𝑿)p~zdkp~01\displaystyle\mathbb{E}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widetilde{p}_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widetilde{p}_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\widetilde{\eta}_{zz^{\prime}}({\bm{X}})}{\widetilde{p}_{z^{*}d^{*}}-k\widetilde{p}_{01}}
+p~zd(𝑿)kp~01(𝑿)p~zdkp~01𝕀(D=dz,Z=z)p~zdz(𝑿)π~z(𝑿)r~zdz(M,𝑿)r~zdz(M,𝑿){Yμ~zdz(M,𝑿)}\displaystyle+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{\widetilde{p}_{z^{*}d^{*}}-k\widetilde{p}_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widetilde{p}_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{\widetilde{r}_{zd_{z}}(M,{\bm{X}})}\left\{Y-\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right\}
+p~zd(𝑿)kp~01(𝑿)p~zdkp~01𝕀(D=dz,Z=z)p~zdz(𝑿)π~z(𝑿){μ~zdz(M,𝑿)η~zz(𝑿)}\displaystyle+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{\widetilde{p}_{z^{*}d^{*}}-k\widetilde{p}_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widetilde{p}_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})-\widetilde{\eta}_{zz^{\prime}}({\bm{X}})\right\}
+p~zd(𝑿)kp~01(𝑿)p~zdkp~01η~zz(𝑿)},\displaystyle+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{\widetilde{p}_{z^{*}d^{*}}-k\widetilde{p}_{01}}\widetilde{\eta}_{zz^{\prime}}({\bm{X}})\Big{\}},

where $\widetilde{\eta}_{zz^{\prime}}(\bm{X})=\int_{m}\widetilde{\mu}_{zd_{z}}(m,\bm{X})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,\bm{X})\,\text{d}m$. In what follows, we show that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under Scenario I ($\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$), II ($\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$), III ($\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$), or IV ($\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$), which collectively verifies the quadruple robustness of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$.

Scenario I ($\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$):

In Scenario I, $\widetilde{\pi}_{z}(\bm{x})=\pi_{z}(\bm{x})$, $\widetilde{p}_{zd}(\bm{x})=p_{zd}(\bm{x})$, and $\widetilde{r}_{zd}(m,\bm{x})=r_{zd}(m,\bm{x})$, but generally $\widetilde{\mu}_{zd}(m,\bm{x})\neq\mu_{zd}(m,\bm{x})$. By the double robustness of $\widetilde{p}_{zd}$, we also have $\widetilde{p}_{zd}=p_{zd}$. Observing this, we can rewrite $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}$, where

Δ1\displaystyle\Delta_{1} =𝔼[(𝕀(Z=z){𝕀(D=d)pzd(𝑿)}πz(𝑿)k(1Z){Dp01(𝑿)}π0(𝑿))mμ~zdz(m,𝑿)rzdz(m,𝑿)dmpzdkp01],\displaystyle=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right],
Δ2\displaystyle\Delta_{2} =𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)rzdz(M,𝑿)rzdz(M,𝑿)Y],\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}Y\right],
Δ3\displaystyle\Delta_{3} =𝔼[pzd(𝑿)kp01(𝑿)pzdkp01{𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)rzdz(M,𝑿)rzdz(M,𝑿)}μ~zdz(M,𝑿)],\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}-\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\right\}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right],
Δ4\displaystyle\Delta_{4} =𝔼[pzd(𝑿)kp01(𝑿)pzdkp01{1𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)}mμ~zdz(m,𝑿)rzdz(m,𝑿)dm].\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{1-\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\right\}\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m\right].

It is obvious that $\Delta_{2}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{a}}$. Moreover,

Δ1=\displaystyle\Delta_{1}= 𝔼[𝕀(Z=z){𝕀(D=d)pzd(𝑿)}πz(𝑿)mμ~zdz(m,𝑿)rzdz(m,𝑿)dmpzdkp01]\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right]
k𝔼[(1Z){Dp01(𝑿)}π0(𝑿)mμ~zdz(m,𝑿)rzdz(m,𝑿)dmpzdkp01]\displaystyle-k\mathbb{E}\left[\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right]
=\displaystyle= 𝔼[𝕀(Z=z)πz(𝑿)mμ~zdz(m,𝑿)rzdz(m,𝑿)dmpzdkp01{𝔼D|Z,𝑿[𝕀(D=d)|z,𝑿]pzd(𝑿)}=0]\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[\mathbb{I}(D=d^{*})|z^{*},{\bm{X}}]-p_{z^{*}d^{*}}({\bm{X}})\right\}}_{=0}\right]
k𝔼[(1Z)π0(𝑿)mμ~zdz(m,𝑿)rzdz(m,𝑿)dmpzdkp01{𝔼D|Z,𝑿[D|0,𝑿]p01(𝑿)}=0]\displaystyle-k\mathbb{E}\left[\frac{(1-Z)}{\pi_{0}({\bm{X}})}\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[D|0,{\bm{X}}]-p_{01}({\bm{X}})\right\}}_{=0}\right]
=\displaystyle= 0k×0=0,\displaystyle 0-k\times 0=0,
Δ3=\displaystyle\Delta_{3}= 𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)μ~zdz(M,𝑿)]\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right]
𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)rzdz(M,𝑿)rzdz(M,𝑿)μ~zdz(M,𝑿)]\displaystyle-\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right]
=\displaystyle= 𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)𝔼M|Z,D,𝑿[μ~zdz(M,𝑿)|z,dz,𝑿]]\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\mathbb{E}_{M|Z,D,{\bm{X}}}[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]\right]
𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)𝔼M|Z,D,𝑿[rzdz(M,𝑿)rzdz(M,𝑿)μ~zdz(M,𝑿)|z,dz,𝑿]]\displaystyle-\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\mathbb{E}_{M|Z,D,{\bm{X}}}\left[\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\Big{|}z,d_{z},{\bm{X}}\right]\right]
=\displaystyle= 𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝔼M|Z,D,𝑿[μ~zdz(M,𝑿)|z,dz,𝑿]]\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]\right]
𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝔼M|Z,D,𝑿[rzdz(M,𝑿)rzdz(M,𝑿)μ~zdz(M,𝑿)|z,dz,𝑿]]\displaystyle-\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}\left[\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\Big{|}z,d_{z},{\bm{X}}\right]\right]
=\displaystyle= 𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝔼M|Z,D,𝑿[μ~zdz(M,𝑿)|z,dz,𝑿]]\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]\right]
𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝔼M|Z,D,𝑿[μ~zdz(M,𝑿)|z,dz,𝑿]]\displaystyle-\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}\left[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\Big{|}z^{\prime},d_{z^{\prime}},{\bm{X}}\right]\right]
=\displaystyle= 0,\displaystyle 0,
\begin{align*}
\Delta_{4}={} & \mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,\bm{X}}[\widetilde{\mu}_{zd_{z}}(M,\bm{X})|z^{\prime},d_{z^{\prime}},\bm{X}]\left\{1-\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}(\bm{X})\pi_{z^{\prime}}(\bm{X})}\right\}\right] \\
={} & \mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,\bm{X}}[\widetilde{\mu}_{zd_{z}}(M,\bm{X})|z^{\prime},d_{z^{\prime}},\bm{X}]\left\{1-1\right\}\right] \\
={} & 0,
\end{align*}

which suggests that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime}),\text{a}}_{d_{1}d_{0}}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$.
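The key step that makes $\Delta_{3}$ vanish in Scenario I is a change of measure: weighting by the mediator density ratio $r_{z^{\prime}d_{z^{\prime}}}/r_{zd_{z}}$ turns an expectation under $(z,d_{z})$ into one under $(z^{\prime},d_{z^{\prime}})$, for any integrand, including a misspecified $\widetilde{\mu}_{zd_{z}}$. The toy sketch below illustrates this identity for a single fixed covariate value; the three-point mediator support and all numerical values are hypothetical and serve only as an illustration.

```python
# Hypothetical mediator densities on a three-point support, for one fixed x.
R_Z = [0.5, 0.3, 0.2]          # stands in for r_{z d_z}(m, x)
R_ZP = [0.2, 0.3, 0.5]         # stands in for r_{z' d_{z'}}(m, x)
MU_WORK = [1.7, -0.4, 2.9]     # arbitrary, possibly misspecified mu-tilde(m, x)

def expect(values, density):
    """Expectation of a function of M under a discrete density."""
    return sum(v * f for v, f in zip(values, density))

# E[ r_{z'}(M) / r_z(M) * mu_tilde(M) | z, d_z, x ] with the correct ratio.
reweighted = expect(
    [rp / r * mu for rp, r, mu in zip(R_ZP, R_Z, MU_WORK)], R_Z)

# E[ mu_tilde(M) | z', d_{z'}, x ] computed directly.
direct = expect(MU_WORK, R_ZP)

# With a misspecified numerator density the identity generally fails,
# which is why Scenario I requires the mediator model to be correct.
R_ZP_BAD = [0.4, 0.4, 0.2]
reweighted_bad = expect(
    [rp / r * mu for rp, r, mu in zip(R_ZP_BAD, R_Z, MU_WORK)], R_Z)
```

Here `reweighted` equals `direct` regardless of the values in `MU_WORK`, whereas `reweighted_bad` differs, mirroring the role of $\mathcal{M}_{m}$ in Scenario I.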

Scenario II ($\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$):

In Scenario II, $\widetilde{\pi}_{z}(\bm{x})=\pi_{z}(\bm{x})$, $\widetilde{r}_{zd}(m,\bm{x})=r_{zd}(m,\bm{x})$, and $\widetilde{\mu}_{zd}(m,\bm{x})=\mu_{zd}(m,\bm{x})$, but generally $\widetilde{p}_{zd}(\bm{x})\neq p_{zd}(\bm{x})$. Observing this, we can rewrite $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}$, where

Δ1\displaystyle\Delta_{1} =𝔼[{𝕀(Z=z,D=d)πz(𝑿)k(1Z)Dπ0(𝑿)}ηzz(𝑿)pzdkp01],\displaystyle=\mathbb{E}\left[\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)D}{\pi_{0}({\bm{X}})}\right\}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],
Δ2\displaystyle\Delta_{2} =𝔼[p~zd(𝑿)kp~01(𝑿)pzdkp01𝕀(D=dz,Z=z)p~zdz(𝑿)πz(𝑿)rzdz(M,𝑿)rzdz(M,𝑿){Yμzdz(M,𝑿)}],\displaystyle=\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widetilde{p}_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right],
Δ3\displaystyle\Delta_{3} =𝔼[p~zd(𝑿)kp~01(𝑿)pzdkp01𝕀(D=dz,Z=z)p~zdz(𝑿)πz(𝑿){μzdz(M,𝑿)ηzz(𝑿)}],\displaystyle=\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widetilde{p}_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}\right],
Δ4\displaystyle\Delta_{4} =𝔼[({1𝕀(Z=z)πz(𝑿)}p~zd(𝑿)k{11Zπ0(𝑿)}p~01(𝑿))ηzz(𝑿)pzdkp01].\displaystyle=\mathbb{E}\left[\left(\left\{1-\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}\right\}\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\left\{1-\frac{1-Z}{\pi_{0}({\bm{X}})}\right\}\widetilde{p}_{01}({\bm{X}})\right)\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right].

One can verify $\Delta_{1}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{b}}$. Moreover,

Δ2=\displaystyle\Delta_{2}= 𝔼[p~zd(𝑿)kp~01(𝑿)pzdkp01𝕀(D=dz,Z=z)p~zdz(𝑿)πz(𝑿)rzdz(M,𝑿)rzdz(M,𝑿){Yμzdz(M,𝑿)}]\displaystyle\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widetilde{p}_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right]
=\displaystyle= 𝔼[p~zd(𝑿)kp~01(𝑿)pzdkp01𝕀(D=dz,Z=z)p~zdz(𝑿)πz(𝑿)rzdz(M,𝑿)rzdz(M,𝑿){𝔼Y|Z,D,M,𝑿[Y|z,dz,M,𝑿]μzdz(M,𝑿)}=0]\displaystyle\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widetilde{p}_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\underbrace{\left\{\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},M,{\bm{X}}]-\mu_{zd_{z}}(M,{\bm{X}})\right\}}_{=0}\right]
=\displaystyle= 0\displaystyle 0
Δ3=\displaystyle\Delta_{3}= 𝔼[p~zd(𝑿)kp~01(𝑿)pzdkp01𝕀(D=dz,Z=z)p~zdz(𝑿)πz(𝑿){𝔼M|Z,D,𝑿[μzdz(M,𝑿)|z,dz,𝑿]ηzz(𝑿)}]\displaystyle\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widetilde{p}_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mathbb{E}_{M|Z,D,{\bm{X}}}[\mu_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]-\eta_{zz^{\prime}}({\bm{X}})\right\}\right]
=\displaystyle= 𝔼[p~zd(𝑿)kp~01(𝑿)pzdkp01𝕀(D=dz,Z=z)p~zdz(𝑿)πz(𝑿){ηzz(𝑿)ηzz(𝑿)}]\displaystyle\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widetilde{p}_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\eta_{zz^{\prime}}({\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}\right]
=\displaystyle= 0\displaystyle 0
Δ4=\displaystyle\Delta_{4}= 𝔼[({1𝕀(Z=z)πz(𝑿)}p~zd(𝑿)k{11Zπ0(𝑿)}p~01(𝑿))ηzz(𝑿)pzdkp01]\displaystyle\mathbb{E}\left[\left(\left\{1-\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}\right\}\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\left\{1-\frac{1-Z}{\pi_{0}({\bm{X}})}\right\}\widetilde{p}_{01}({\bm{X}})\right)\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]
=\displaystyle= 𝔼[({1𝔼Z|𝑿[𝕀(Z=z)|𝑿]πz(𝑿)}p~zd(𝑿)k{1𝔼Z|𝑿[1Z|𝑿]π0(𝑿)}p~01(𝑿))ηzz(𝑿)pzdkp01]\displaystyle\mathbb{E}\left[\left(\left\{1-\frac{\mathbb{E}_{Z|{\bm{X}}}\left[\mathbb{I}(Z=z^{*})|{\bm{X}}\right]}{\pi_{z^{*}}({\bm{X}})}\right\}\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\left\{1-\frac{\mathbb{E}_{Z|{\bm{X}}}[1-Z|{\bm{X}}]}{\pi_{0}({\bm{X}})}\right\}\widetilde{p}_{01}({\bm{X}})\right)\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]
=\displaystyle= 0.\displaystyle 0.

Therefore, we have obtained $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime}),\text{b}}_{d_{1}d_{0}}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

Scenario III ($\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$):

In Scenario III, $\widetilde{\pi}_{z}(\bm{x})=\pi_{z}(\bm{x})$, $\widetilde{p}_{zd}(\bm{x})=p_{zd}(\bm{x})$, and $\widetilde{\mu}_{zd}(m,\bm{x})=\mu_{zd}(m,\bm{x})$, but generally $\widetilde{r}_{zd}(m,\bm{x})\neq r_{zd}(m,\bm{x})$. Observing this, we can rewrite $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}$, where

Δ1\displaystyle\Delta_{1} =𝔼[(𝕀(Z=z){𝕀(D=d)pzd(𝑿)}πz(𝑿)k(1Z){Dp01(𝑿)}π0(𝑿))mμzdz(m,𝑿)r~zdz(m,𝑿)dmpzdkp01],\displaystyle=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right],
Δ2\displaystyle\Delta_{2} =𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)r~zdz(M,𝑿)r~zdz(M,𝑿){Yμzdz(M,𝑿)}],\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{\widetilde{r}_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right],
Δ3\displaystyle\Delta_{3} =𝔼[pzd(𝑿)kp01(𝑿)pzdkp01𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)μzdz(M,𝑿)],\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\mu_{zd_{z}}(M,{\bm{X}})\right],
Δ4\displaystyle\Delta_{4} =𝔼[pzd(𝑿)kp01(𝑿)pzdkp01{1𝕀(D=dz,Z=z)pzdz(𝑿)πz(𝑿)}mμzdz(m,𝑿)r~zdz(m,𝑿)dm].\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{1-\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\right\}\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m\right].

Note that $\Delta_{3}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{c}}$. Moreover,

\begin{align*}
\Delta_{1}={}& \mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right]\\
&-k\mathbb{E}\left[\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right]\\
={}& \mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[\mathbb{I}(D=d^{*})|z^{*},{\bm{X}}]-p_{z^{*}d^{*}}({\bm{X}})\right\}}_{=0}\right]\\
&-k\mathbb{E}\left[\frac{(1-Z)}{\pi_{0}({\bm{X}})}\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[D|0,{\bm{X}}]-p_{01}({\bm{X}})\right\}}_{=0}\right]\\
={}& 0-k\times 0=0,\\
\Delta_{2}={}& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{\widetilde{r}_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right]\\
={}& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{\widetilde{r}_{zd_{z}}(M,{\bm{X}})}\underbrace{\left\{\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},M,{\bm{X}}]-\mu_{zd_{z}}(M,{\bm{X}})\right\}}_{=0}\right]\\
={}& 0,\\
\Delta_{4}={}& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\left\{1-\mathbb{E}_{Z,D|{\bm{X}}}\left[\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\,\Big|\,{\bm{X}}\right]\right\}\text{d}m\right]\\
={}& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\left\{1-1\right\}\text{d}m\right]\\
={}& 0,
\end{align*}

we have obtained $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime}),\text{c}}_{d_{1}d_{0}}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$.

Scenario IV ($\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$):

In Scenario IV, $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$, $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$, $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$, but generally $\widetilde{\pi}_{z}({\bm{x}})\neq\pi_{z}({\bm{x}})$. Therefore, we have $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}$, where

\begin{align*}
\Delta_{1} &=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],\\
\Delta_{2} &=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right],\\
\Delta_{3} &=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}\right],\\
\Delta_{4} &=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{zz^{\prime}}({\bm{X}})\right].
\end{align*}

Noting that $\Delta_{4}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}$,

\begin{align*}
\Delta_{1}={}& \mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]-k\mathbb{E}\left[\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]\\
={}& \mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\widetilde{\pi}_{z^{*}}({\bm{X}})}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[\mathbb{I}(D=d^{*})|z^{*},{\bm{X}}]-p_{z^{*}d^{*}}({\bm{X}})\right\}}_{=0}\right]\\
&-k\mathbb{E}\left[\frac{(1-Z)}{\widetilde{\pi}_{0}({\bm{X}})}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[D|0,{\bm{X}}]-p_{01}({\bm{X}})\right\}}_{=0}\right]\\
={}& 0-k\times 0=0,\\
\Delta_{2}={}& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right]\\
={}& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\underbrace{\left\{\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},M,{\bm{X}}]-\mu_{zd_{z}}(M,{\bm{X}})\right\}}_{=0}\right]\\
={}& 0,\\
\Delta_{3}={}& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{\mathbb{E}_{M|Z,D,{\bm{X}}}[\mu_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]-\eta_{zz^{\prime}}({\bm{X}})\right\}\right]\\
={}& \mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{\eta_{zz^{\prime}}({\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}\right]\\
={}& 0,
\end{align*}

we have that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime}),\text{d}}_{d_{1}d_{0}}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.
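As an informal numerical illustration of why $\Delta_{1}$ vanishes in Scenario IV even though the propensity score is misspecified, the following Python sketch simulates a toy data-generating process and evaluates the sample analogue of the first expectation in $\Delta_{1}$ with $z^{*}=d^{*}=1$. Everything in it is an assumption made for illustration and is not taken from the paper: the distributions, the logistic coefficients, the constant working propensity score of 0.3, the function `g` standing in for $\eta_{zz^{\prime}}({\bm{X}})/(p_{z^{*}d^{*}}-kp_{01})$, and the name `delta1_term`.

```python
import numpy as np


def delta1_term(n=200_000, seed=0):
    """Sample mean of I(Z=1) {D - p_1(X)} g(X) / pi_tilde(X) with a wrong pi_tilde.

    Toy setup (assumed, not from the paper): the model for D given (Z, X)
    is correct, while the working propensity score pi_tilde is a constant.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)

    # True treatment assignment mechanism P(Z = 1 | X).
    pi_true = 1.0 / (1.0 + np.exp(-x))
    z = rng.binomial(1, pi_true)

    # True post-treatment event probabilities P(D = 1 | Z = z, X).
    p1 = 1.0 / (1.0 + np.exp(-(0.5 + x)))
    p0 = 1.0 / (1.0 + np.exp(-(-0.5 + x)))
    d = rng.binomial(1, np.where(z == 1, p1, p0))

    # Deliberately misspecified working propensity score.
    pi_wrong = np.full(n, 0.3)

    # Arbitrary function of X multiplying the residual.
    g = 1.0 + x**2

    # The residual D - p_1(X) has conditional mean zero given (Z = 1, X),
    # so the average is close to zero regardless of pi_wrong.
    return float(np.mean(z * (d - p1) / pi_wrong * g))


if __name__ == "__main__":
    print(delta1_term())
```

The average is close to zero up to Monte Carlo error because the residual $D-p_{11}({\bm{X}})$ is centered conditional on $(Z=1,{\bm{X}})$, which is the same iterated-expectation step used in the derivation above.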

Up to this point, we have confirmed that the probability limit of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$, i.e., $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$, equals the true value $\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$. Next, we prove the asymptotic normality of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$. Notice that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ can be viewed as the solution of the following estimating equation

\begin{equation*}
\mathbb{P}_{n}\left[S_{\text{mr}}\left({\bm{O}};\theta_{d_{1}d_{0}}^{(zz^{\prime})},\widehat{\bm{\tau}}\right)\right]=\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\widehat{\bm{\tau}})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\mathcal{S}_{0}(\bm{O};\widehat{\bm{\tau}})\right]=0,
\end{equation*}

where $\mathcal{S}_{1}(\bm{O};{\bm{\tau}})$ and $\mathcal{S}_{0}(\bm{O};{\bm{\tau}})$ are $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ evaluated at $h_{nuisance}^{\text{par}}$. Assume that the following regularity conditions hold:

  1. Assume that $\sqrt{n}(\widehat{\bm{\tau}}-\bm{\tau}^{*})=\sqrt{n}\mathbb{P}_{n}\left[\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right]+o_{p}(1)$, where $\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})$ is the influence function of $\widehat{\bm{\tau}}$ and $o_{p}(1)$ is a remainder term that converges in probability to 0. Also, assume that $\mathbb{P}_{n}\left[\left\{\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}^{\otimes 2}\right]$ converges to a positive definite matrix.

  2. Let $\bm{\Xi}$ be a bounded convex neighborhood of $\bm{\tau}^{*}$. Assume that the class of functions $\Big\{\mathcal{S}_{1}(\bm{O};\bm{\tau}),\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}),\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau})\right\}^{2},\mathcal{S}_{0}(\bm{O};\bm{\tau}),\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}),\left\{\mathcal{S}_{0}(\bm{O};\bm{\tau})\right\}^{2},\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}),\left\{\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau})\right\}^{\otimes 2}\Big\}$ is a Glivenko-Cantelli class in $\bm{\Xi}$.

  3. Assume that $\mathbb{P}_{n}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]$ converges to a positive value. In addition, we assume that both $\mathbb{P}_{n}[\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})\right\}^{2}]$ and $\mathbb{P}_{n}[\left\{\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right\}^{2}]$ converge to a positive value.

To prove asymptotic normality, we use a Taylor series, along with the above conditions, to deduce that

\begin{align*}
0=\mathbb{P}_{n}\left[S_{\text{mr}}\left({\bm{O}};\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}},\widehat{\bm{\tau}}\right)\right]={}& \mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\widehat{\bm{\tau}})-\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\mathcal{S}_{0}(\bm{O};\widehat{\bm{\tau}})\right]\\
={}& \mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]\\
&-\mathbb{P}_{n}\left[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\right)\\
&+\mathbb{P}_{n}\left[\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right](\widehat{\bm{\tau}}-\bm{\tau}^{*})+o_{p}(n^{-1/2}),
\end{align*}

which suggests that

\begin{align*}
&\sqrt{n}\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\right)\\
={}& \left\{\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]\right\}^{-1}\sqrt{n}\,\mathbb{P}_{n}\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})+R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}},\bm{\tau}^{*})\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}+o_{p}(1),
\end{align*}

where $R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}},\bm{\tau}^{*})=\mathbb{E}\left[\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]$. Then, by applying the central limit theorem and noticing that $\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}=\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, we can show that $\sqrt{n}\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\right)$ converges in distribution to a zero-mean normal distribution with variance

\begin{equation*}
V_{\text{mr}}=\left\{\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]\right\}^{-2}\mathbb{E}\left[\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})+R(\theta_{d_{1}d_{0}}^{(zz^{\prime})},\bm{\tau}^{*})\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}^{2}\right].
\end{equation*}

Finally, when all parametric working models are correctly specified, i.e., under the intersection model $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $V_{\text{mr}}=\mathbb{E}\left[\left\{\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})\right\}^{2}\right]$ achieves the semiparametric efficiency bound. This can be verified from the following facts:

  1. $\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})=\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})=\delta_{d_{1}d_{0}}(\bm{O})$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

  2. Following point 1, $\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]=e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

  3. $R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}},\bm{\tau}^{*})=0$, because the EIF is orthogonal to the likelihood score of the parametric working models when they are correctly specified.

Under the above three points, we have

\begin{equation*}
V_{\text{mr}}=\mathbb{E}\left[\left\{\frac{\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right\}^{2}\right].
\end{equation*}

This completes the proof. $\square$
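The ratio form of the estimator and the variance expression above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes that per-observation values of $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ have already been computed and stored in the hypothetical arrays `psi` and `delta`, and it ignores the correction term $R(\cdot)\text{IF}_{\bm{\tau}}$, which is only justified when all working models are correctly specified. The function name `ratio_estimate` is likewise hypothetical.

```python
import numpy as np


def ratio_estimate(psi, delta):
    """Ratio estimator P_n[psi] / P_n[delta] with a plug-in variance.

    psi, delta: per-observation evaluations of the two pieces of the
    estimating function (assumed to be precomputed by the caller).
    Returns (theta_hat, var_hat, se), where var_hat estimates the
    asymptotic variance of sqrt(n) * (theta_hat - theta) under the
    assumption that the nuisance-estimation term R is zero.
    """
    psi = np.asarray(psi, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = psi.size

    e_hat = delta.mean()              # estimates p_{z*d*} - k p_{01}
    theta_hat = psi.mean() / e_hat    # solves P_n[psi - theta * delta] = 0

    # Empirical analogue of (psi - theta * delta) / (p_{z*d*} - k p_{01}).
    infl = (psi - theta_hat * delta) / e_hat
    var_hat = float(np.mean(infl**2))
    se = float(np.sqrt(var_hat / n))
    return float(theta_hat), var_hat, se


if __name__ == "__main__":
    print(ratio_estimate([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
```

When some working models are misspecified, the additional term involving $R$ and $\text{IF}_{\bm{\tau}}$ in $V_{\text{mr}}$ does not vanish in general, so this plug-in variance should not be relied on in that case.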

D.6 The nonparametric efficient estimator (Theorem 5)

Proof of Theorem 5. The proof of the multiple robustness of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}$ is the same as that of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$ given in the proof of Theorem 4. Here, we only prove the asymptotic normality and local efficiency of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}$.

To simplify notation without loss of clarity, we abbreviate $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ and its nonparametric estimator ($\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}$) as $\theta$ and $\widehat{\theta}$, respectively. Also, we abbreviate the two unknown functions in the efficient influence function, $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$, as $\psi(\bm{O})$ and $\delta(\bm{O})$, respectively. Based on the cross-splitting procedure, the nonparametric estimator $\widehat{\theta}$ can be written as the ratio of two terms, $\mathbb{P}_{n}\left[\widehat{\psi}(\bm{O})\right]$ and $\mathbb{P}_{n}\left[\widehat{\delta}(\bm{O})\right]$, where

\begin{equation*}
\mathbb{P}_{n}\left[\widehat{\psi}(\bm{O})\right]=\frac{1}{n}\sum_{v=1}^{V}n_{v}\mathbb{P}_{n_{v}}\left[\widehat{\psi}^{v}(\bm{O})\right]\text{ and }\mathbb{P}_{n}\left[\widehat{\delta}(\bm{O})\right]=\frac{1}{n}\sum_{v=1}^{V}n_{v}\mathbb{P}_{n_{v}}\left[\widehat{\delta}^{v}(\bm{O})\right].
\end{equation*}

Here, $n_{v}$ is the size of the $v$-th group $\mathcal{O}_{v}$, $\mathbb{P}_{n_{v}}[\cdot]$ is the empirical mean operator on $\mathcal{O}_{v}$, and $\{\widehat{\psi}^{v}(\bm{O}),\widehat{\delta}^{v}(\bm{O})\}$ is $\{\psi(\bm{O}),\delta(\bm{O})\}$ evaluated under $\widehat{h}_{nuisance}^{\text{np},v}$, which is the nonparametric estimator of the nuisance functions based on the leave-one-group-out sample $\mathcal{O}_{-v}$. We further have the following decomposition of $\mathbb{P}_{n_{v}}\left[\widehat{\psi}^{v}(\bm{O})\right]$ and $\mathbb{P}_{n_{v}}\left[\widehat{\delta}^{v}(\bm{O})\right]$:

\begin{align}
\mathbb{P}_{n_{v}}[\widehat{\psi}^{v}(\bm{O})] &=\mathbb{P}_{n_{v}}[\psi(\bm{O})]+(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]+\underbrace{\mathbb{E}[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]}_{=:R_{1}(\widehat{\psi}^{v},\psi)}, \tag{s22}\\
\mathbb{P}_{n_{v}}[\widehat{\delta}^{v}(\bm{O})] &=\mathbb{P}_{n_{v}}[\delta(\bm{O})]+(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\delta}^{v}(\bm{O})-\delta(\bm{O})]+\underbrace{\mathbb{E}[\widehat{\delta}^{v}(\bm{O})-\delta(\bm{O})]}_{=:R_{2}(\widehat{\delta}^{v},\delta)}. \tag{s23}
\end{align}

In what follows, we show that (s22) $=\mathbb{P}_{n_{v}}[\psi(\bm{O})]+o_{p}(n^{-1/2})$. Using a similar strategy, one can also deduce that (s23) $=\mathbb{P}_{n_{v}}[\delta(\bm{O})]+o_{p}(n^{-1/2})$.

We can show that the second term in (s22) is $o_{p}(n^{-1/2})$ by cross-splitting. Specifically, by Markov's inequality, the independence induced by cross-splitting, and the fact that

\begin{align*}
\text{Var}\left\{(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]\Big|\mathcal{O}_{-v}\right\} &=\text{Var}\left\{\mathbb{P}_{n_{v}}[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]\Big|\mathcal{O}_{-v}\right\}\\
&=\frac{1}{n_{v}}\text{Var}\left\{\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})\Big|\mathcal{O}_{-v}\right\}\\
&\leq\frac{1}{n_{v}}\|\widehat{\psi}^{v}-\psi\|^{2},
\end{align*}

we have that

\begin{align*}
\mathbb{P}\left\{\frac{\sqrt{n_{v}}\left|(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]\right|}{\|\widehat{\psi}^{v}-\psi\|}\geq\epsilon\right\} &=\mathbb{E}\left[\mathbb{P}\left\{\frac{\sqrt{n_{v}}\left|(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]\right|}{\|\widehat{\psi}^{v}-\psi\|}\geq\epsilon\Big|\mathcal{O}_{-v}\right\}\right]\\
&\leq\frac{1}{\epsilon^{2}}\mathbb{E}\left[\text{Var}\left\{\frac{\sqrt{n_{v}}(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]}{\|\widehat{\psi}^{v}-\psi\|}\Big|\mathcal{O}_{-v}\right\}\right]\\
&\leq\epsilon^{-2},
\end{align*}

for any $\epsilon>0$. Therefore, $(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]=O_{p}(n_{v}^{-1/2}\|\widehat{\psi}^{v}-\psi\|)=o_{p}(n_{v}^{-1/2})$ because $\widehat{\psi}^{v}(\bm{O})$ converges to $\psi(\bm{O})$ in probability and therefore $\|\widehat{\psi}^{v}-\psi\|$ converges to 0. Since $V$ is a finite number and we partition the dataset as evenly as possible, we have that $n/n_{v}=O(1)$ and thus $(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]=o_{p}(n^{-1/2})$.

Next, we show that $R_{1}(\widehat{\psi}^{v},\psi)=o_{p}(n^{-1/2})$. Specifically, we can show that
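The cross-splitting aggregation $\frac{1}{n}\sum_{v=1}^{V}n_{v}\mathbb{P}_{n_{v}}[\widehat{\psi}^{v}(\bm{O})]$ on which this argument relies can be sketched as below. The sketch is generic and makes several assumptions that are not in the paper: `fit` and `evaluate` are hypothetical user-supplied callables (the first estimates nuisance functions on the observations outside a group, the second evaluates the estimated function on the held-out group), the data are stored as a NumPy array indexed by observation, and groups are formed by a random, nearly even partition.

```python
import numpy as np


def cross_fit_mean(data, fit, evaluate, n_folds=5, seed=0):
    """Cross-fitted empirical mean (1/n) * sum_v n_v * P_{n_v}[f_hat^v].

    data     : array whose first axis indexes observations.
    fit      : callable(train_data) -> nuisance object (hypothetical).
    evaluate : callable(held_out_data, nuisance) -> per-observation values.
    """
    data = np.asarray(data)
    n = data.shape[0]
    rng = np.random.default_rng(seed)

    # Random partition into n_folds groups of (nearly) equal size.
    folds = np.array_split(rng.permutation(n), n_folds)

    total = 0.0
    for idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[idx] = False

        # Nuisance functions are estimated without the held-out group,
        # which gives the conditional independence used in the proof.
        nuisance = fit(data[train_mask])
        values = np.asarray(evaluate(data[idx], nuisance), dtype=float)

        # n_v * P_{n_v}[f_hat^v] is the sum of the held-out evaluations.
        total += values.sum()
    return total / n


if __name__ == "__main__":
    y = np.arange(10.0)
    # Toy check: the "nuisance" is a training mean that evaluate ignores.
    print(cross_fit_mean(y, fit=lambda tr: tr.mean(), evaluate=lambda te, m: te))
```

Applying the same routine to $\widehat{\psi}^{v}$ and to $\widehat{\delta}^{v}$ and taking the ratio of the two results gives an estimator of the form described in this proof, subject to the assumptions stated in the lead-in.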

R1(ψ^v,ψ)=𝔼[ψ^v(𝑶)]𝔼[ψ(𝑶)]\displaystyle R_{1}(\widehat{\psi}^{v},\psi)=\mathbb{E}[\widehat{\psi}^{v}(\bm{O})]-\mathbb{E}[\psi(\bm{O})]
=\displaystyle= 𝔼{(𝕀(Z=z){𝕀(D=d)p^zdnp,v(𝑿)}π^znp,v(𝑿)k(1Z){Dp^01np,v(𝑿)}π^0np,v(𝑿))η^zznp,v(𝑿)\displaystyle\mathbb{E}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{np},v}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{np},v}({\bm{X}})}\right)\widehat{\eta}_{zz^{\prime}}^{\text{np},v}({\bm{X}})
+{p^zdnp,v(𝑿)kp^01np,v(𝑿)}𝕀(D=dz,Z=z)p^zdznp,v(𝑿)π^znp,v(𝑿)r^zdznp,v(M,𝑿)r^zdznp,v(M,𝑿){Yμ^zdznp,v(M,𝑿)}\displaystyle+\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widehat{p}_{zd_{z}}^{\text{np},v}({\bm{X}})\widehat{\pi}_{z}^{\text{np},v}({\bm{X}})}\frac{\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(M,{\bm{X}})}{\widehat{r}_{zd_{z}}^{\text{np},v}(M,{\bm{X}})}\left\{Y-\widehat{\mu}_{zd_{z}}^{\text{np},v}(M,{\bm{X}})\right\}
+{p^zdnp,v(𝑿)kp^01np,v(𝑿)}𝕀(D=dz,Z=z)p^zdznp,v(𝑿)π^znp,v(𝑿){μ^zdznp,v(M,𝑿)η^zznp,v(𝑿)}\displaystyle+\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widehat{p}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}({\bm{X}})\widehat{\pi}_{z^{\prime}}^{\text{np},v}({\bm{X}})}\left\{\widehat{\mu}_{zd_{z}}^{\text{np},v}(M,{\bm{X}})-\widehat{\eta}_{zz^{\prime}}^{\text{np},v}({\bm{X}})\right\}
+{p^zdnp,v(𝑿)kp^01np,v(𝑿)}η^zznp,v(𝑿)}\displaystyle+\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\widehat{\eta}_{zz^{\prime}}^{\text{np},v}({\bm{X}})\Big{\}}
𝔼[{pzd(𝑿)kp01(𝑿)}mμzdz(m,𝑿)rzdz(m,𝑿)dm]\displaystyle-\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\int_{m}\mu_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m\right]
=\displaystyle= 𝔼[(πz(𝑿){pzd(𝑿)p^zdnp,v(𝑿)}π^znp,v(𝑿)kπ0(𝑿){p01(𝑿)p^01np,v(𝑿)}π^0np,v(𝑿))mμ^zdznp,v(m,𝑿)r^zdznp,v(m,𝑿)dm]\displaystyle\mathbb{E}\left[\left(\frac{\pi_{z^{*}}({\bm{X}})\left\{p_{z^{*}d^{*}}({\bm{X}})-\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{np},v}({\bm{X}})}-k\frac{\pi_{0}({\bm{X}})\left\{p_{01}({\bm{X}})-\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{np},v}({\bm{X}})}\right)\int_{m}\widehat{\mu}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(m,{\bm{X}})\text{d}m\right]
+𝔼[{p^zdnp,v(𝑿)kp^01np,v(𝑿)}pzdz(𝑿)πz(𝑿)p^zdznp,v(𝑿)π^znp,v(𝑿)mr^zdznp,v(m,𝑿)r^zdznp,v(m,𝑿){μzdz(m,𝑿)μ^zdznp,v(m,𝑿)}rzdz(m,𝑿)dm]\displaystyle+\mathbb{E}\left[\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\frac{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}{\widehat{p}_{zd_{z}}^{\text{np},v}({\bm{X}})\widehat{\pi}_{z}^{\text{np},v}({\bm{X}})}\int_{m}\frac{\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(m,{\bm{X}})}{\widehat{r}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})}\left\{\mu_{zd_{z}}(m,{\bm{X}})-\widehat{\mu}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})\right\}r_{zd_{z}}(m,{\bm{X}})\text{d}m\right]
+𝔼[{p^zdnp,v(𝑿)kp^01np,v(𝑿)}pzdz(𝑿)πz(𝑿)p^zdznp,v(𝑿)π^znp,v(𝑿)mμ^zdznp,v(m,𝑿){rzdz(m,𝑿)r^zdznp,v(m,𝑿)}dm]\displaystyle+\mathbb{E}\left[\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\frac{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}{\widehat{p}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}({\bm{X}})\widehat{\pi}_{z^{\prime}}^{\text{np},v}({\bm{X}})}\int_{m}\widehat{\mu}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})\left\{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})-\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(m,{\bm{X}})\right\}\text{d}m\right]
+𝔼[{p^zdnp,v(𝑿)kp^01np,v(𝑿)}mμ^zdznp,v(m,𝑿)r^zdznp,v(m,𝑿)dm]\displaystyle+\mathbb{E}\left[\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\int_{m}\widehat{\mu}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(m,{\bm{X}})\text{d}m\right]
𝔼[{pzd(𝑿)kp01(𝑿)}mμzdz(m,𝑿)rzdz(m,𝑿)dm]\displaystyle-\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\int_{m}\mu_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m\right]

Define $\pi_{*}$, $\pi_{0}$, $\pi_{z}$, $\pi_{z^{\prime}}$, $p_{*}$, $p_{0}$, $p_{z}$, $p_{z^{\prime}}$, $r_{z}$, $r_{z^{\prime}}$, and $\mu_{z}$ as the abbreviations of the unknown nuisance functions $\pi_{z^{*}}({\bm{x}})$, $\pi_{0}({\bm{x}})$, $\pi_{z}({\bm{x}})$, $\pi_{z^{\prime}}({\bm{x}})$, $p_{z^{*}d^{*}}({\bm{x}})$, $p_{01}({\bm{x}})$, $p_{zd_{z}}({\bm{x}})$, $p_{z^{\prime}d_{z^{\prime}}}({\bm{x}})$, $r_{zd_{z}}(m,{\bm{x}})$, $r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})$, and $\mu_{zd_{z}}(m,{\bm{x}})$, respectively. Also, let $\widehat{\pi}_{*}^{v}$, $\widehat{\pi}_{0}^{v}$, $\widehat{\pi}_{z}^{v}$, $\widehat{\pi}_{z^{\prime}}^{v}$, $\widehat{p}_{*}^{v}$, $\widehat{p}_{0}^{v}$, $\widehat{p}_{z}^{v}$, $\widehat{p}_{z^{\prime}}^{v}$, $\widehat{r}_{z}^{v}$, $\widehat{r}_{z^{\prime}}^{v}$, and $\widehat{\mu}_{z}^{v}$ be their corresponding estimators evaluated under $\widehat{h}_{nuisance}^{\text{np},v}$. Using these abbreviations, we can rewrite $R_{1}(\widehat{\psi}^{v},\psi)=\Delta_{1}+\Delta_{2}+\Delta_{3}+\Delta_{4}-\Delta_{5}$ with

Δ1\displaystyle\Delta_{1} =𝔼[(π(pp^v)π^vkπ0(p0p^0v)π^0v)mμ^zvr^zvdm]\displaystyle=\mathbb{E}\left[\left(\frac{\pi_{*}(p_{*}-\widehat{p}_{*}^{v})}{\widehat{\pi}_{*}^{v}}-k\frac{\pi_{0}(p_{0}-\widehat{p}_{0}^{v})}{\widehat{\pi}_{0}^{v}}\right)\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]
\displaystyle=\mathbb{E}\left[\left(\frac{(\pi_{*}-\widehat{\pi}_{*}^{v})(p_{*}-\widehat{p}_{*}^{v})}{\widehat{\pi}_{*}^{v}}-k\frac{(\pi_{0}-\widehat{\pi}_{0}^{v})(p_{0}-\widehat{p}_{0}^{v})}{\widehat{\pi}_{0}^{v}}\right)\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[\left\{(p_{*}-\widehat{p}_{*}^{v})-k(p_{0}-\widehat{p}_{0}^{v})\right\}\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right],
\displaystyle\Delta_{2}=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{\widehat{r}_{z^{\prime}}^{v}}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})r_{z}\text{d}m\right]
=𝔼[(p^vkp^0v)pzπzp^zvπ^zvm(rzr^zv)r^zv(μzμ^zv)r^zvdm]+𝔼[(p^vkp^0v)pzπzp^zvπ^zvm(μzμ^zv)r^zvdm]\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]
=𝔼[(p^vkp^0v)pzπzp^zvπ^zvm(rzr^zv)r^zv(μzμ^zv)r^zvdm]\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]
+𝔼[(p^vkp^0v)(pzp^zv)πzp^zvπ^zvm(μzμ^zv)r^zvdm]+𝔼[(p^vkp^0v)πzπ^zvm(μzμ^zv)r^zvdm]\displaystyle\quad+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}}{\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]
=𝔼[(p^vkp^0v)pzπzp^zvπ^zvm(rzr^zv)r^zv(μzμ^zv)r^zvdm]+𝔼[(p^vkp^0v)(pzp^zv)πzp^zvπ^zvm(μzμ^zv)r^zvdm]\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]
+𝔼[(p^vkp^0v)πzπ^zvπ^zvm(μzμ^zv)r^zvdm]+𝔼[(p^vkp^0v)m(μzμ^zv)r^zvdm]\displaystyle\quad+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}-\widehat{\pi}_{z}^{v}}{\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]
=𝔼[(p^vkp^0v)pzπzp^zvπ^zvm(rzr^zv)r^zv(μzμ^zv)r^zvdm]+𝔼[(p^vkp^0v)(pzp^zv)πzp^zvπ^zvm(μzμ^zv)r^zvdm]\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]
\displaystyle\quad+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}-\widehat{\pi}_{z}^{v}}{\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]
+𝔼[(pkp0)m(μzμ^zv)r^zvdm],\displaystyle\quad+\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right],
Δ3\displaystyle\Delta_{3} =𝔼[(p^vkp^0v)pzπzp^zvπ^zvmμ^zv(rzr^zv)dm]\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z^{\prime}}\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]
=𝔼[(p^vkp^0v)(pzp^zv)πzp^zvπ^zvmμ^zv(rzr^zv)dm]+𝔼[(p^vkp^0v)πzπ^zvmμ^zv(rzr^zv)dm]\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}}{\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]
=𝔼[(p^vkp^0v)(pzp^zv)πzp^zvπ^zvmμ^zv(rzr^zv)dm]\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]
+𝔼[(p^vkp^0v)πzπ^zvπ^zvmμ^zv(rzr^zv)dm]+𝔼[(p^vkp^0v)mμ^zv(rzr^zv)dm]\displaystyle\quad+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}-\widehat{\pi}_{z^{\prime}}^{v}}{\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]
=𝔼[(p^vkp^0v)(pzp^zv)πzp^zvπ^zvmμ^zv(rzr^zv)dm]+𝔼[(p^vkp^0v)πzπ^zvπ^zvmμ^zv(rzr^zv)dm]\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}-\widehat{\pi}_{z^{\prime}}^{v}}{\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]
+𝔼[{(p^vp)k(p^0vp0)}mμ^zv(rzr^zv)dm]+𝔼[(pkp0)mμ^zv(rzr^zv)dm],\displaystyle\quad+\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]+\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right],
Δ4\displaystyle\Delta_{4} =𝔼[(p^vkp^0v)mμ^zvr^zvdm]\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]
=𝔼[{(p^vp)k(p^0vp0)}mμ^zvr^zvdm]+𝔼[(pkp0)mμ^zvr^zvdm],\displaystyle=\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right],
Δ5\displaystyle\Delta_{5} =𝔼[(pkp0)mμzrzdm].\displaystyle=\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}\mu_{z}r_{z^{\prime}}\text{d}m\right].

Therefore, we have that

\begin{align*}
R_{1}(\widehat{\psi}^{v},\psi)={}&\Delta_{1}+\Delta_{2}+\Delta_{3}+\Delta_{4}-\Delta_{5}\\
={}&\mathbb{E}\left[\left(\frac{(\pi_{*}-\widehat{\pi}_{*}^{v})(p_{*}-\widehat{p}_{*}^{v})}{\widehat{\pi}_{*}^{v}}-k\frac{(\pi_{0}-\widehat{\pi}_{0}^{v})(p_{0}-\widehat{p}_{0}^{v})}{\widehat{\pi}_{0}^{v}}\right)\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]\\
&+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]\\
&+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]\\
&+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}-\widehat{\pi}_{z}^{v}}{\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]\\
&+\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]\\
&+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]\\
&+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}-\widehat{\pi}_{z^{\prime}}^{v}}{\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]\\
&+\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]\\
&+\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}(\widehat{\mu}_{z}^{v}-\mu_{z})(\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}})\text{d}m\right]\\
={}&\mathbb{E}\left[f_{m}^{-1}\left(\frac{(\pi_{*}-\widehat{\pi}_{*}^{v})(p_{*}-\widehat{p}_{*}^{v})}{\widehat{\pi}_{*}^{v}}-k\frac{(\pi_{0}-\widehat{\pi}_{0}^{v})(p_{0}-\widehat{p}_{0}^{v})}{\widehat{\pi}_{0}^{v}}\right)\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\right]\\
&+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\right]\\
&+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\right]\\
&+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}-\widehat{\pi}_{z}^{v}}{\widehat{\pi}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\right]\\
&+\mathbb{E}\left[f_{m}^{-1}\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\right]\\
&+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\right]\\
&+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}-\widehat{\pi}_{z^{\prime}}^{v}}{\widehat{\pi}_{z^{\prime}}^{v}}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\right]\\
&+\mathbb{E}\left[f_{m}^{-1}\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\right]\\
&+\mathbb{E}\left[f_{m}^{-1}(p_{*}-kp_{0})(\widehat{\mu}_{z}^{v}-\mu_{z})(\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}})\right],
\end{align*}

where $f_{m}=f_{M|Z,D,{\bm{X}}}(M|Z,D,{\bm{X}})$ and the last equality follows from the law of iterated expectations. Using the Cauchy-Schwarz inequality, we then have

\begin{align*}
|R_{1}(\widehat{\psi}^{v},\psi)|\leq{}&\|f_{m}^{-1}\widehat{\pi}_{*}^{v^{-1}}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\|\widehat{\pi}_{*}^{v}-\pi_{*}\|\|\widehat{p}_{*}^{v}-p_{*}\|+\|f_{m}^{-1}\widehat{\pi}_{0}^{v^{-1}}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\|\widehat{\pi}_{0}^{v}-\pi_{0}\|\|\widehat{p}_{0}^{v}-p_{0}\|\\
&+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}p_{z}\pi_{z}\widehat{r}_{z^{\prime}}^{v}\widehat{p}_{z}^{v^{-1}}\widehat{\pi}_{z}^{v^{-1}}\widehat{r}_{z}^{v^{-1}}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}p_{z}\pi_{z}\widehat{r}_{z^{\prime}}^{v}\widehat{p}_{z}^{v^{-1}}\widehat{\pi}_{z}^{v^{-1}}\widehat{r}_{z}^{v^{-1}}\|_{\infty}\right\}\|\widehat{\mu}_{z}^{v}-\mu_{z}\|\|\widehat{r}_{z}^{v}-r_{z}\|\\
&+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}\pi_{z}\widehat{r}_{z^{\prime}}^{v}\widehat{p}_{z}^{v^{-1}}\widehat{\pi}_{z}^{v^{-1}}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}\pi_{z}\widehat{r}_{z^{\prime}}^{v}\widehat{p}_{z}^{v^{-1}}\widehat{\pi}_{z}^{v^{-1}}\|_{\infty}\right\}\|\widehat{\mu}_{z}^{v}-\mu_{z}\|\|\widehat{p}_{z}^{v}-p_{z}\|\\
&+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}\widehat{\pi}_{z}^{v^{-1}}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}\widehat{\pi}_{z}^{v^{-1}}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\right\}\|\widehat{\mu}_{z}^{v}-\mu_{z}\|\|\widehat{\pi}_{z}^{v}-\pi_{z}\|\\
&+\|f_{m}^{-1}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\|\widehat{p}_{*}^{v}-p_{*}\|\|\widehat{\mu}_{z}^{v}-\mu_{z}\|+\|f_{m}^{-1}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\|\widehat{p}_{0}^{v}-p_{0}\|\|\widehat{\mu}_{z}^{v}-\mu_{z}\|\\
&+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}\pi_{z^{\prime}}\widehat{p}_{z^{\prime}}^{v^{-1}}\widehat{\pi}_{z^{\prime}}^{v^{-1}}\widehat{\mu}_{z}^{v}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}\pi_{z^{\prime}}\widehat{p}_{z^{\prime}}^{v^{-1}}\widehat{\pi}_{z^{\prime}}^{v^{-1}}\widehat{\mu}_{z}^{v}\|_{\infty}\right\}\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|\|\widehat{p}_{z^{\prime}}^{v}-p_{z^{\prime}}\|\\
&+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}\widehat{\pi}_{z^{\prime}}^{v^{-1}}\widehat{\mu}_{z}^{v}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}\widehat{\pi}_{z^{\prime}}^{v^{-1}}\widehat{\mu}_{z}^{v}\|_{\infty}\right\}\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|\|\widehat{\pi}_{z^{\prime}}^{v}-\pi_{z^{\prime}}\|\\
&+\|f_{m}^{-1}\widehat{\mu}_{z}^{v}\|_{\infty}\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|\|\widehat{p}_{*}^{v}-p_{*}\|+\|f_{m}^{-1}\widehat{\mu}_{z}^{v}\|_{\infty}\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|\|\widehat{p}_{0}^{v}-p_{0}\|\\
&+\left\{\|f_{m}^{-1}p_{*}\|_{\infty}+\|f_{m}^{-1}p_{0}\|_{\infty}\right\}\|\widehat{\mu}_{z}^{v}-\mu_{z}\|\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|.
\end{align*}

Since it is assumed that $\|\widehat{l}^{\text{np},v}-l\|\times\|\widehat{g}^{\text{np},v}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$, it follows that $R_{1}(\widehat{\psi}^{v},\psi)=o_{p}(n^{-1/2})$. We have therefore confirmed that

\[
\mathbb{P}_{n_{v}}[\widehat{\psi}^{v}(\bm{O})]=\mathbb{P}_{n_{v}}[\psi(\bm{O})]+o_{p}(n^{-1/2}),
\]

and thus

\begin{align}
\mathbb{P}_{n}\left[\widehat{\psi}(\bm{O})\right]&=\frac{1}{n}\sum_{v=1}^{V}n_{v}\mathbb{P}_{n_{v}}\left[\widehat{\psi}^{v}(\bm{O})\right]\nonumber\\
&=\sum_{v=1}^{V}\left\{\frac{n_{v}}{n}\mathbb{P}_{n_{v}}\left[\psi(\bm{O})\right]+o_{p}\left(\frac{n_{v}}{n^{3/2}}\right)\right\}\nonumber\\
&=\mathbb{P}_{n}\left[\psi(\bm{O})\right]+o_{p}(n^{-1/2}).\tag{s24}
\end{align}
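The first equality in (s24) is a bookkeeping identity: the full-sample average equals the fold-size-weighted average of the fold-specific averages. A minimal Python sketch of that identity follows; the function name `pooled_crossfit_mean` and the numbers are hypothetical and stand in for estimated $\widehat{\psi}^{v}(\bm{O})$ values computed on each validation fold.

```python
from statistics import fmean


def pooled_crossfit_mean(fold_values):
    """Return (1/n) * sum_v n_v * (mean of the values in fold v)."""
    n = sum(len(vals) for vals in fold_values)
    if n == 0:
        raise ValueError("need at least one observation")
    # Weight each non-empty fold mean by its share n_v / n of the sample.
    return sum(len(vals) / n * fmean(vals) for vals in fold_values if vals)


# Hypothetical estimated psi-values in V = 3 validation folds.
folds = [[0.2, 0.4], [0.1, 0.3, 0.5], [0.6]]
pooled = pooled_crossfit_mean(folds)
# Plain average over all observations, ignoring the fold structure.
overall = fmean([x for vals in folds for x in vals])
```

Here `pooled` and `overall` agree up to floating-point error, which is why the fold-wise expansions can be combined into a single full-sample expansion.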

Using similar arguments, we can show that $\mathbb{P}_{n_{v}}[\widehat{\delta}^{v}(\bm{O})]=\mathbb{P}_{n_{v}}[\delta(\bm{O})]+o_{p}(n^{-1/2})$ and therefore

\begin{equation}
\mathbb{P}_{n}\left[\widehat{\delta}(\bm{O})\right]=\mathbb{P}_{n}\left[\delta(\bm{O})\right]+o_{p}(n^{-1/2}).\tag{s25}
\end{equation}

Notice that $\widehat{\theta}$ can be cast as the solution to the estimating equation

\[
\mathbb{P}_{n}\left[\widehat{\psi}(\bm{O})-\widehat{\theta}\widehat{\delta}(\bm{O})\right]=0.
\]

This, along with (s24) and (s25), suggests that

\begin{align*}
&\mathbb{P}_{n}\left[\psi(\bm{O})-\widehat{\theta}\delta(\bm{O})\right]=o_{p}(n^{-1/2})\\
\Longleftrightarrow\quad&\mathbb{P}_{n}\left[\psi(\bm{O})-\theta\delta(\bm{O})\right]-\mathbb{P}_{n}\left[\delta(\bm{O})\right](\widehat{\theta}-\theta)=o_{p}(n^{-1/2}).
\end{align*}

Moreover, since $\mathbb{P}_{n}\left[\delta(\bm{O})\right]=\mathbb{E}\left[\delta(\bm{O})\right]+O_{p}(n^{-1/2})$ and $\widehat{\theta}=\theta+o_{p}(1)$, it follows that $\mathbb{P}_{n}\left[\delta(\bm{O})\right](\widehat{\theta}-\theta)=\mathbb{E}\left[\delta(\bm{O})\right](\widehat{\theta}-\theta)+o_{p}(n^{-1/2})$. Therefore, we further obtain

\[
\mathbb{P}_{n}\left[\psi(\bm{O})-\theta\delta(\bm{O})\right]-\mathbb{E}\left[\delta(\bm{O})\right](\widehat{\theta}-\theta)=o_{p}(n^{-1/2}).
\]

After simple algebra and observing that $\mathbb{E}\left[\delta(\bm{O})\right]=e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$, we have

\[
\sqrt{n}(\widehat{\theta}-\theta)=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi(\bm{O})-\theta\delta(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1),
\]

which suggests that $\widehat{\theta}$ is asymptotically normal and that its asymptotic variance achieves the efficiency lower bound discussed in Appendix D.5. $\square$
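The estimating equation $\mathbb{P}_{n}[\widehat{\psi}(\bm{O})-\widehat{\theta}\widehat{\delta}(\bm{O})]=0$ used in this proof has the closed-form solution $\widehat{\theta}=\mathbb{P}_{n}[\widehat{\psi}(\bm{O})]/\mathbb{P}_{n}[\widehat{\delta}(\bm{O})]$. The following is a minimal Python sketch of that ratio form, not the authors' implementation; the function name `ratio_estimate` and the input values are hypothetical placeholders for the estimated $\widehat{\psi}$ and $\widehat{\delta}$ contributions of each unit.

```python
from statistics import fmean


def ratio_estimate(psi, delta):
    """Solve P_n[psi - theta * delta] = 0 for theta."""
    if len(psi) != len(delta) or not psi:
        raise ValueError("psi and delta must be non-empty and of equal length")
    denom = fmean(delta)
    if denom == 0:
        raise ZeroDivisionError("the empirical mean of delta is zero")
    return fmean(psi) / denom


# Hypothetical per-unit values of psi-hat and delta-hat.
psi_hat = [1.2, 0.4, 0.9, 0.5]
delta_hat = [0.6, 0.3, 0.5, 0.6]
theta_hat = ratio_estimate(psi_hat, delta_hat)
# The empirical estimating equation evaluated at theta_hat is zero up to rounding.
residual = fmean([p - theta_hat * d for p, d in zip(psi_hat, delta_hat)])
```

With these inputs the numerator mean is 0.75 and the denominator mean is 0.5, so `theta_hat` is 1.5 and `residual` vanishes up to floating-point error.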

D.7 Estimation of natural mediation effects

This section elaborates on the multiply robust and nonparametric estimators of the mediation effects $\text{PNIE}_{d_{1}d_{0}}$, $\text{PNDE}_{d_{1}d_{0}}$, ITT-NIE, and ITT-NDE. The following lemma gives the EIFs of these mediation effects.

Lemma S11

For any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ or $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under standard or strong monotonicity, the EIFs of $\text{PNIE}_{d_{1}d_{0}}$ and $\text{PNDE}_{d_{1}d_{0}}$ are

\[
\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O})=\frac{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\text{PNIE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}
\]

and

\[
\mathcal{D}_{d_{1}d_{0}}^{\text{PNDE}}(\bm{O})=\frac{\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\psi_{d_{1}d_{0}}^{(00)}(\bm{O})-\text{PNDE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}},
\]

respectively, where $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ are given in Theorem 3, $k=|d_{1}-d_{0}|$, and $z^{*}d^{*}=11$, $10$, and $01$ if $d_{1}d_{0}=10$, $00$, and $11$, respectively. In addition, the EIFs of ITT-NIE and ITT-NDE are

\[
\mathcal{D}^{\text{ITT-NIE}}(\bm{O})=\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}-\text{ITT-NIE}
\]

and

\[
\mathcal{D}^{\text{ITT-NDE}}(\bm{O})=\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\psi_{d_{1}d_{0}}^{(00)}(\bm{O})\right\}-\text{ITT-NDE},
\]

respectively, where $\mathcal{U}=\mathcal{U}_{\text{a}}$ under standard monotonicity and $\mathcal{U}=\mathcal{U}_{\text{b}}$ under strong monotonicity.

Proof.

Because $\text{PNIE}_{d_{1}d_{0}}$ is identified as the difference between $\theta_{d_{1}d_{0}}^{(11)}$ and $\theta_{d_{1}d_{0}}^{(10)}$, and the EIFs of $\theta_{d_{1}d_{0}}^{(11)}$ and $\theta_{d_{1}d_{0}}^{(10)}$ are $\mathcal{D}_{d_{1}d_{0}}^{(11)}(\bm{O})$ and $\mathcal{D}_{d_{1}d_{0}}^{(10)}(\bm{O})$ as derived in Theorem 3, Lemma S10 implies that the EIF of $\text{PNIE}_{d_{1}d_{0}}$ is

\[
\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O})=\mathcal{D}_{d_{1}d_{0}}^{(11)}(\bm{O})-\mathcal{D}_{d_{1}d_{0}}^{(10)}(\bm{O})=\frac{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\text{PNIE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}.
\]

The EIF of $\text{PNDE}_{d_{1}d_{0}}$ can be obtained similarly.

Also, ITT-NIE is identified as

\[
\text{ITT-NIE}=\sum_{d_{1}d_{0}\in\mathcal{U}}e_{d_{1}d_{0}}\times\left(\theta_{d_{1}d_{0}}^{(11)}-\theta_{d_{1}d_{0}}^{(10)}\right)=\sum_{d_{1}d_{0}\in\mathcal{U}}\left(H_{d_{1}d_{0}}^{(11)}-H_{d_{1}d_{0}}^{(10)}\right),
\]

where $H_{d_{1}d_{0}}^{(zz^{\prime})}=\mathbb{E}\left[(p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}))\eta_{zz^{\prime}}({\bm{X}})\right]$ is defined in Section D.4 and its EIF, denoted by $\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})$ below, has been derived in Lemma S9. Based on Lemma S10, one can show that

\[
\mathcal{D}^{\text{ITT-NIE}}(\bm{O})=\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\mathcal{D}_{d_{1}d_{0}}^{(11),H}(\bm{O})-\mathcal{D}_{d_{1}d_{0}}^{(10),H}(\bm{O})\right\}=\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}-\text{ITT-NIE}.
\]

The EIF of ITT-NDE can be derived following the same strategy. $\square$

The following proposition demonstrates that the multiply robust estimator of the mediation effects remains quadruply robust and locally efficient.

Proposition S5

Under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, the multiply robust estimator $\widehat{\tau}^{\text{mr}}$ is consistent and asymptotically normal for all $\tau\in\{\text{PNIE}_{d_{1}d_{0}},\text{PNDE}_{d_{1}d_{0}},\text{ITT-NIE},\text{ITT-NDE}\}$. Moreover, $\widehat{\tau}^{\text{mr}}$ is semiparametrically efficient under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

Proof.

The quadruple robustness and asymptotic normality of $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}$ and $\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{mr}}$ follow directly from the properties of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$ in Theorem 4. Next, we prove that $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}$ is locally efficient under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$; similar results extend to $\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{mr}}$. Based on the proof of Theorem 4 in Section D.5, we know that under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$ admits the asymptotically linear expansion

\[
\sqrt{n}(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}-\theta_{d_{1}d_{0}}^{(zz^{\prime})})=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1).
\]

Then, it follows from $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}=\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}}$ that

\begin{align*}
\sqrt{n}(\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}-\text{PNIE}_{d_{1}d_{0}})&=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\text{PNIE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1)\\
&=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O})\right]+o_{p}(1),
\end{align*}

where the second equality holds due to Lemma S11. This suggests that $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}$ is semiparametrically efficient when all working models are correctly specified.

The multiply robust estimator of ITT-NIE is $\widehat{\text{ITT-NIE}}^{\text{mr}}=\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{dr}}\times(\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}})$. Theorem 4 suggests that $\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}\xrightarrow{p}\theta_{d_{1}d_{0}}^{(11)}$ and $\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}}\xrightarrow{p}\theta_{d_{1}d_{0}}^{(10)}$ under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$. Also, Jiang et al. (2022) suggests that the doubly robust estimator of the marginal principal score, $\widehat{e}_{d_{1}d_{0}}^{\text{dr}}=\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}$, is consistent for $e_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$. This further implies that $\widehat{\text{ITT-NIE}}^{\text{mr}}\xrightarrow{p}\sum_{d_{1}d_{0}\in\mathcal{U}}e_{d_{1}d_{0}}\times(\theta_{d_{1}d_{0}}^{(11)}-\theta_{d_{1}d_{0}}^{(10)})=\text{ITT-NIE}$ under any of the same four model combinations. To prove asymptotic normality, notice that $\widehat{\text{ITT-NIE}}^{\text{mr}}$ can be re-expressed as

\begin{align*}
\widehat{\text{ITT-NIE}}^{\text{mr}}&=\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{dr}}\times(\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}})\\
&=\sum_{d_{1}d_{0}\in\mathcal{U}}\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{par}}(\bm{O})]\times\left(\frac{\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{par}}(\bm{O})]}{\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{par}}(\bm{O})]}-\frac{\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}(\bm{O})]}{\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{par}}(\bm{O})]}\right)\\
&=\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{par}}(\bm{O})-\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}(\bm{O})\right\}\right].
\end{align*}
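The re-expression above relies on the stratum weights $\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{par}}(\bm{O})]$ cancelling against the denominators of the stratum-specific ratio estimators. The Python sketch below checks that cancellation numerically under stated assumptions: the dictionary layout, the function names, and all numbers are hypothetical, and the three keys simply mimic the strata $d_{1}d_{0}\in\{10,00,11\}$.

```python
from statistics import fmean


def itt_nie_via_strata(strata):
    """Sum over strata of e_hat * (theta11_hat - theta10_hat)."""
    total = 0.0
    for s in strata.values():
        e_hat = fmean(s["delta"])              # stratum weight P_n[delta_hat]
        theta11 = fmean(s["psi11"]) / e_hat    # ratio estimator, (z, z') = (1, 1)
        theta10 = fmean(s["psi10"]) / e_hat    # ratio estimator, (z, z') = (1, 0)
        total += e_hat * (theta11 - theta10)
    return total


def itt_nie_pooled(strata):
    """Sample mean of the per-unit sum over strata of psi11 - psi10."""
    n = len(next(iter(strata.values()))["delta"])
    per_unit = [
        sum(s["psi11"][i] - s["psi10"][i] for s in strata.values())
        for i in range(n)
    ]
    return fmean(per_unit)


# Hypothetical estimated contributions for n = 3 units in three strata.
strata = {
    "10": {"psi11": [0.5, 0.7, 0.3], "psi10": [0.2, 0.4, 0.3], "delta": [0.5, 0.6, 0.4]},
    "00": {"psi11": [0.1, 0.2, 0.3], "psi10": [0.1, 0.1, 0.2], "delta": [0.3, 0.2, 0.1]},
    "11": {"psi11": [0.4, 0.1, 0.4], "psi10": [0.3, 0.1, 0.2], "delta": [0.2, 0.2, 0.5]},
}
via_strata = itt_nie_via_strata(strata)
pooled = itt_nie_pooled(strata)
```

The two quantities agree up to floating-point error, so the ITT-NIE estimator can be analyzed as a single sample average rather than as a sum of products of ratios.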

Define $S_{\text{mr}}(\bm{O};\widehat{\bm{\tau}})=\sum_{d_{1}d_{0}\in\mathcal{U}}\{\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{par}}(\bm{O})-\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}(\bm{O})\}$, where $\widehat{\bm{\tau}}$ is the estimator of the parameters in the parametric nuisance working models. Then, under mild regularity conditions (similar to those listed in the proof of Theorem 4), one can deduce that

\begin{align*}
&\sqrt{n}\left(\widehat{\text{ITT-NIE}}^{\text{mr}}-\text{ITT-NIE}\right)\\
={}&\sqrt{n}\mathbb{P}_{n}\left\{S_{\text{mr}}(\bm{O};{\bm{\tau}}^{*})-\text{ITT-NIE}+\mathbb{E}\left[\frac{\partial}{\partial\bm{\tau}^{*}}S_{\text{mr}}(\bm{O};{\bm{\tau}}^{*})\right]\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}+o_{p}(1),
\end{align*}

where $\bm{\tau}^{*}$ is the probability limit of $\widehat{\bm{\tau}}$ and $\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})$ is the influence function of $\widehat{\bm{\tau}}$. This confirms that $\widehat{\text{ITT-NIE}}^{\text{mr}}$ is asymptotically normal. Under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, we can verify that $\widehat{\text{ITT-NIE}}^{\text{mr}}$ is semiparametrically efficient by observing that $S_{\text{mr}}(\bm{O};{\bm{\tau}}^{*})=\sum_{d_{1}d_{0}\in\mathcal{U}}\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\}$ and $\mathbb{E}\left[\frac{\partial}{\partial\bm{\tau}^{*}}S_{\text{mr}}(\bm{O};{\bm{\tau}}^{*})\right]=0$, such that

\begin{align*}
\sqrt{n}\left(\widehat{\text{ITT-NIE}}^{\text{mr}}-\text{ITT-NIE}\right)&=\sqrt{n}\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}-\text{ITT-NIE}\right]+o_{p}(1)\\
&=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}^{\text{ITT-NIE}}(\bm{O})\right]+o_{p}(1),
\end{align*}

where the second equality follows from Lemma S11. This suggests that $\widehat{\text{ITT-NIE}}^{\text{mr}}$ is locally efficient. Using a similar strategy, one can prove that $\widehat{\text{ITT-NDE}}^{\text{mr}}$ is quadruply robust, asymptotically normal, and locally efficient. $\square$

In parallel to results in Section D.6, the following proposition demonstrates the properties of the nonparametric estimator of the mediation effects.

Proposition S6

For any $\tau\in\{\text{PNIE}_{d_{1}d_{0}},\text{PNDE}_{d_{1}d_{0}},\text{ITT-NIE},\text{ITT-NDE}\}$, $\widehat{\tau}^{\text{np}}$ is consistent if any three of the four nuisance functions in $\widehat{h}_{nuisance}^{\text{np}}$ are consistently estimated in the $L_{2}(\mathbb{P})$-norm. Furthermore, if all elements in $\widehat{h}_{nuisance}^{\text{np}}$ are consistent in the $L_{2}(\mathbb{P})$-norm and $\|\widehat{l}^{\text{np}}-l\|\times\|\widehat{g}^{\text{np}}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$, then $\widehat{\tau}^{\text{np}}$ is asymptotically normal and semiparametrically efficient.

Proof.

Following the proof of Proposition S5, one can show that $\widehat{\tau}^{\text{np}}$ is consistent for $\tau$, for $\tau\in\{\text{PNIE}_{d_{1}d_{0}},\text{PNDE}_{d_{1}d_{0}},\text{ITT-NIE},\text{ITT-NDE}\}$, if any three of the four functions in $\widehat{h}_{nuisance}^{\text{np}}=\{\widehat{\pi}_{z}^{\text{np}}({\bm{x}}),\widehat{p}_{zd}^{\text{np}}({\bm{x}}),\widehat{r}_{zd}^{\text{np}}(m,{\bm{x}}),\widehat{\mu}_{zd}^{\text{np}}(m,{\bm{x}})\}$ are consistently estimated. Here, we only prove the asymptotic normality and local efficiency of $\widehat{\tau}^{\text{np}}$ when $\|\widehat{l}^{\text{np}}-l\|\times\|\widehat{g}^{\text{np}}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$.

We show in the proof of Theorem 5 that

\[
\sqrt{n}(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}-\theta_{d_{1}d_{0}}^{(zz^{\prime})})=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1),
\]

when $\|\widehat{l}^{\text{np}}-l\|\times\|\widehat{g}^{\text{np}}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$. Therefore,

\begin{align*}
\sqrt{n}(\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{np}}-\text{PNIE}_{d_{1}d_{0}})&=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\text{PNIE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1)\\
&=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O})\right]+o_{p}(1),\\
\sqrt{n}(\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{np}}-\text{PNDE}_{d_{1}d_{0}})&=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\psi_{d_{1}d_{0}}^{(00)}(\bm{O})-\text{PNDE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1)\\
&=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}_{d_{1}d_{0}}^{\text{PNDE}}(\bm{O})\right]+o_{p}(1).
\end{align*}

This implies that $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{np}}$ and $\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{np}}$ are asymptotically normal and semiparametrically efficient under the required convergence rate conditions for the nuisance function estimates. Also, we show in the proof of Theorem 5 that $\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}(\bm{O})]=\mathbb{P}_{n}[\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})]+o_{p}(n^{-1/2})$, which suggests that

\begin{align*}
\widehat{\text{ITT-NIE}}^{\text{np}}&=\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{np}}\times(\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{np}}-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{np}})\\
&=\sum_{d_{1}d_{0}\in\mathcal{U}}\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})]\times\left(\frac{\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{np}}(\bm{O})]}{\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})]}-\frac{\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{np}}(\bm{O})]}{\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})]}\right)\\
&=\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{np}}(\bm{O})-\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{np}}(\bm{O})\right\}\right]\\
&=\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}\right]+o_{p}(n^{-1/2}),
\end{align*}

and thus

\[
\sqrt{n}\left(\widehat{\text{ITT-NIE}}^{\text{np}}-\text{ITT-NIE}\right)=\sqrt{n}\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}-\text{ITT-NIE}\right]+o_{p}(1)=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}^{\text{ITT-NIE}}(\bm{O})\right]+o_{p}(1).
\]

Similarly, we can show that $\sqrt{n}\left(\widehat{\text{ITT-NDE}}^{\text{np}}-\text{ITT-NDE}\right)=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}^{\text{ITT-NDE}}(\bm{O})\right]+o_{p}(1)$. This implies that $\widehat{\text{ITT-NIE}}^{\text{np}}$ and $\widehat{\text{ITT-NDE}}^{\text{np}}$ are asymptotically normal and semiparametrically efficient under the required convergence rate conditions for the nonparametric nuisance function estimators. $\square$

Remark 5

(Variance estimation of the principal and ITT mediation effects) For the purpose of inference, nonparametric bootstrap can be used for the moment-type and multiply robust estimators. The asymptotic variance of the nonparametric efficient estimators can be obtained by using the empirical variance of the estimated EIF given in Lemma S11. For example, the asymptotic variance of PNIE^d1d0np\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{np}} can be estimated by

\[
\widehat{\text{Var}}\left(\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{np}}\right)=\frac{1}{n}\mathbb{P}_{n}\left[\left\{\widehat{\mathcal{D}}_{d_{1}d_{0}}^{\text{PNIE,np}}(\bm{O})\right\}^{2}\right],
\]

where 𝒟^d1d0PNIE,np(𝐎)\widehat{\mathcal{D}}_{d_{1}d_{0}}^{\text{PNIE,np}}(\bm{O}) is 𝒟d1d0PNIE(𝐎)\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O}) evaluated based on the nonparametric estimator of the nuisance functions, h^nuisancenp\widehat{h}_{nuisance}^{\text{np}}. The variance estimator of PNDE^d1d0np\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{np}}, ITT-NIE^np\widehat{\text{ITT-NIE}}^{\text{np}}, and ITT-NDE^np\widehat{\text{ITT-NDE}}^{\text{np}} can be similarly obtained.
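The EIF-based variance estimator in Remark 5 amounts to averaging the squared estimated EIF values and dividing by $n$. A minimal sketch follows, assuming the per-unit EIF values have already been computed; the helper names `eif_variance` and `wald_interval` are hypothetical, and the Wald interval relies on the asymptotic normality stated above rather than on anything specific to this paper's software.

```python
import numpy as np

def eif_variance(eif_values):
    """Variance estimate (1/n) * P_n[D_hat^2] from per-unit estimated EIF values."""
    d = np.asarray(eif_values, dtype=float)
    return float(np.mean(d ** 2) / d.size)

def wald_interval(estimate, eif_values, z=1.959964):
    """Approximate 95% Wald interval; z is the standard normal 0.975 quantile."""
    se = eif_variance(eif_values) ** 0.5
    return estimate - z * se, estimate + z * se

# Toy EIF values, for illustration only.
d_hat = np.array([0.5, -0.5, 1.0, -1.0])
var_hat = eif_variance(d_hat)          # mean of squares is 0.625, divided by n = 4
lo, hi = wald_interval(0.2, d_hat)
```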

D.8 Sensitivity analysis for the principal ignorability assumption under standard monotonicity

This section provides the supporting information for the sensitivity analysis for the principal ignorability assumption, under standard monotonicity. We first present the explicit forms of the sensitivity weight wd1d0(zz)(m,𝒙)w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}}) for all zz{11,10,00}zz^{\prime}\in\{11,10,00\} and d1d0𝒰2-sidedd_{1}d_{0}\in\mathcal{U}_{\text{2-sided}}:

\begin{align*}
w_{10}^{(11)}(m,{\bm{x}}) &=\begin{cases}\frac{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}\frac{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{11}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(1)}(j,{\bm{x}})p_{11}({\bm{x}})r_{11}(j,{\bm{x}})/r_{11}(0,{\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}\right\}\frac{\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(0,{\bm{x}})+\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m=0.\end{cases}\\
w_{10}^{(10)}(m,{\bm{x}}) &=\begin{cases}\frac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\frac{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\frac{\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(0,{\bm{x}})+\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m=0.\end{cases}\\
w_{10}^{(00)}(m,{\bm{x}}) &=\begin{cases}\frac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(m,{\bm{x}})+\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\frac{\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(0,{\bm{x}})+\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}\\
w_{00}^{(11)}(m,{\bm{x}}) &=1\text{ for any $m$}.\\
w_{00}^{(10)}(m,{\bm{x}}) &=\begin{cases}\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m\geq 1,\\ \frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m=0.\end{cases}\\
w_{00}^{(00)}(m,{\bm{x}}) &=\begin{cases}\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(m,{\bm{x}})\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\frac{\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(0,{\bm{x}})\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}\\
w_{11}^{(11)}(m,{\bm{x}}) &=\begin{cases}\frac{p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})+\xi_{Y}^{(1)}(m,{\bm{x}})\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{11}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{11}({\bm{x}})r_{11}(j,{\bm{x}})/r_{11}(0,{\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}\right\}\frac{\xi_{M}^{(1)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})+\xi_{Y}^{(1)}(0,{\bm{x}})\xi_{M}^{(1)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}\\
w_{11}^{(10)}(m,{\bm{x}}) &=\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})+\xi_{Y}^{(1)}(m,{\bm{x}})\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\text{ for any $m$.}\\
w_{11}^{(00)}(m,{\bm{x}}) &=1\text{ for any $m$.}
\end{align*}
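As a sanity check on these expressions, the weight $w_{10}^{(10)}(m,{\bm{x}})$ can be evaluated at a single covariate value and compared with its value when the confounding functions all equal one, in which case it should reduce to one for every $m$. The sketch below assumes $p_{zd}({\bm{x}})$ denotes the probability of $D=d$ given $Z=z$ and ${\bm{X}}={\bm{x}}$, so that $p_{00}=1-p_{01}$ and $p_{10}=1-p_{11}$; the function name `w_10_10` and all numeric inputs are hypothetical.

```python
import numpy as np

def w_10_10(xi_m1, xi_m0, xi_y1, p11, p01, r00):
    """Sensitivity weight w_10^{(10)}(m, x) for m = 0..m_max at one covariate value.

    xi_m1, xi_m0, xi_y1, r00 are arrays indexed by m. xi_m0[0] is not used,
    because the m = 0 mediator factor is pinned down by the sum-to-one constraint.
    xi_m1[0] must be supplied by the caller (it is a derived quantity).
    Assumes p00 = 1 - p01 and p10 = 1 - p11 (notation assumption).
    """
    xi_m1, xi_m0, xi_y1, r00 = (
        np.asarray(a, dtype=float) for a in (xi_m1, xi_m0, xi_y1, r00)
    )
    p00, p10 = 1.0 - p01, 1.0 - p11
    diff = p11 - p01
    # Mediator factor: f_{M_0|U,X}(m|10,x) / r_00(m,x), closed form for m >= 1.
    med = xi_m0 * p00 / (xi_m0 * diff + p10)
    # For m = 0, use the constraint that the stratum-specific pmf sums to one.
    med[0] = (1.0 - np.sum(med[1:] * r00[1:])) / r00[0]
    # Outcome factor: E[Y_{1m}|U=10,x] / mu_11(m,x).
    out = (xi_m1 * diff + p01) / (p01 / xi_y1 + xi_m1 * diff)
    return med * out

r00 = np.array([0.5, 0.3, 0.2])
ones = np.ones(3)
w_null = w_10_10(ones, ones, ones, p11=0.7, p01=0.2, r00=r00)
w_alt = w_10_10(ones, np.array([1.0, 1.5, 0.8]), ones, p11=0.7, p01=0.2, r00=r00)
```

With `xi_y1` equal to one the outcome factor is one, so `w_alt * r00` is exactly the implied pmf of $M_0$ among compliers and sums to one.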

Next, we prove Propositions S1 and S2, which include identification results and properties of the multiply robust estimator under violation of principal ignorability. We first provide two lemmas.

Lemma S12

Under Assumptions 1, 2, 3a, and 6 and the proposed confounding function ξ={(ξM(1)(m,𝐱),ξM(0)(m,𝐱)) for m1 and (ξY(1)(m,𝐱),ξY(0)(m,𝐱)) for m0}\xi=\left\{\left(\xi_{M}^{(1)}(m,{\bm{x}}),\xi_{M}^{(0)}(m,{\bm{x}})\right)\text{ for }m\geq 1\text{ and }\left(\xi_{Y}^{(1)}(m,{\bm{x}}),\xi_{Y}^{(0)}(m,{\bm{x}})\right)\text{ for }m\geq 0\right\}, we can nonparametrically identify the distribution of Mz|{U=d1d0,𝐗}M_{z}|\{U=d_{1}d_{0},{\bm{X}}\} for all z{1,0}z\in\{1,0\} and d1d0𝒰2sidedd_{1}d_{0}\in\mathcal{U}_{2-sided}. Specifically, we have that fM0|U,𝐗(m|11,𝐱)=r01(m,𝐱)f_{M_{0}|U,{\bm{X}}}(m|11,{\bm{x}})=r_{01}(m,{\bm{x}}) and fM1|U,𝐗(m|00,𝐱)=r10(m,𝐱)f_{M_{1}|U,{\bm{X}}}(m|00,{\bm{x}})=r_{10}(m,{\bm{x}}) for any m=0,,mmaxm=0,\dots,m_{\max} and

fM1|U,𝑿(m|10,𝒙)\displaystyle f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}}) =ξM(1)(m,𝒙)p11(𝒙)ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙)r11(m,𝒙),\displaystyle=\frac{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(m,{\bm{x}}),
fM0|U,𝑿(m|10,𝒙)\displaystyle f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{x}}) =ξM(0)(m,𝒙)p00(𝒙)ξM(0)(m,𝒙)(p11(𝒙)p01(𝒙))+p10(𝒙)r00(m,𝒙),\displaystyle=\frac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(m,{\bm{x}}),
fM1|U,𝑿(m|11,𝒙)\displaystyle f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}}) =p11(𝒙)ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙)r11(m,𝒙),\displaystyle=\frac{p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(m,{\bm{x}}),
fM0|U,𝑿(m|00,𝒙)\displaystyle f_{M_{0}|U,{\bm{X}}}(m|00,{\bm{x}}) =p00(𝒙)ξM(0)(m,𝒙)(p11(𝒙)p01(𝒙))+p10(𝒙)r00(m,𝒙),\displaystyle=\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(m,{\bm{x}}),

for any m=1,,mmaxm=1,\dots,m_{\max}. This suggests that, for m=0m=0,

fM1|U,𝑿(0|10,𝒙)\displaystyle f_{M_{1}|U,{\bm{X}}}(0|10,{\bm{x}}) =1j=1mmaxξM(1)(j,𝒙)p11(𝒙)ξM(1)(j,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙)r11(j,𝒙),\displaystyle=1-\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(1)}(j,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}}),
fM0|U,𝑿(0|10,𝒙)\displaystyle f_{M_{0}|U,{\bm{X}}}(0|10,{\bm{x}}) =1j=1mmaxξM(0)(j,𝒙)p00(𝒙)ξM(0)(j,𝒙)(p11(𝒙)p01(𝒙))+p10(𝒙)r00(j,𝒙),\displaystyle=1-\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}}),
fM1|U,𝑿(0|11,𝒙)\displaystyle f_{M_{1}|U,{\bm{X}}}(0|11,{\bm{x}}) =1j=1mmaxp11(𝒙)ξM(1)(j,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙)r11(j,𝒙),\displaystyle=1-\sum_{j=1}^{m_{\max}}\frac{p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}}),
fM0|U,𝑿(0|00,𝒙)\displaystyle f_{M_{0}|U,{\bm{X}}}(0|00,{\bm{x}}) =1j=1mmaxp00(𝒙)ξM(0)(j,𝒙)(p11(𝒙)p01(𝒙))+p10(𝒙)r00(j,𝒙).\displaystyle=1-\sum_{j=1}^{m_{\max}}\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}}).
Proof.

We first show fM0|U,𝑿(m|11,𝒙)=r01(m,𝒙)f_{M_{0}|U,{\bm{X}}}(m|11,{\bm{x}})=r_{01}(m,{\bm{x}}) for all m{0,,mmax}m\in\{0,\dots,m_{\max}\} and similar results extend to fM1|U,𝑿(m|00,𝒙)=r10(m,𝒙)f_{M_{1}|U,{\bm{X}}}(m|00,{\bm{x}})=r_{10}(m,{\bm{x}}). Specifically, for any m{0,,mmax}m\in\{0,\dots,m_{\max}\}, we have that

\begin{align*}
f_{M_{0}|U,{\bm{X}}}(m|11,{\bm{x}}) &=f_{M_{0}|Z,U,{\bm{X}}}(m|0,11,{\bm{x}})\\
&\quad\text{(by Lemma S5 and Lemma S2)}\\
&=f_{M_{0}|Z,D,U,{\bm{X}}}(m|0,1,11,{\bm{x}})\\
&\quad\text{(because $D$ must be 1 given $Z=0$ and $U=11$)}\\
&=f_{M_{0}|Z,D,{\bm{X}}}(m|0,1,{\bm{x}})\\
&\quad\text{(because the observed stratum with $Z=0$ and $D=1$ contains only the always-takers)}\\
&=r_{01}(m,{\bm{x}}).
\end{align*}

Next, we derive the expression of fM1|U,𝑿(m|10,𝒙)f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}}). Leveraging the same strategy, one can deduce the expressions of fM0|U,𝑿(m|10,𝒙)f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{x}}), fM1|U,𝑿(m|11,𝒙)f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}}), and fM0|U,𝑿(m|00,𝒙)f_{M_{0}|U,{\bm{X}}}(m|00,{\bm{x}}). For m1m\geq 1, we have that

\begin{align*}
r_{11}(m,{\bm{x}})={}& f_{M|Z,D,{\bm{X}}}(m|1,1,{\bm{x}})\\
={}& f_{M_{1}|Z,D,U,{\bm{X}}}(m|1,1,10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})+f_{M_{1}|Z,D,U,{\bm{X}}}(m|1,1,11,{\bm{x}})f_{U|Z,D,{\bm{X}}}(11|1,1,{\bm{x}})\\
&\text{(by the law of total probability, since $U=10$ or $11$ in the observed stratum $(Z=1,D=1)$ under standard monotonicity)}\\
={}& f_{M_{1}|Z,U,{\bm{X}}}(m|1,10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})+f_{M_{1}|Z,U,{\bm{X}}}(m|1,11,{\bm{x}})f_{U|Z,D,{\bm{X}}}(11|1,1,{\bm{x}})\\
&\text{(because $D$ must be 1 given $Z=1$ and either $U=10$ or $11$)}\\
={}& f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})+f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})f_{U|Z,D,{\bm{X}}}(11|1,1,{\bm{x}})\\
&\text{(by Lemma S5 and Lemma S2)}\\
={}& f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}+f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})\frac{p_{01}({\bm{x}})}{p_{11}({\bm{x}})}\\
={}& f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}+f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}\\
&\text{(by the definition of $\xi_{M}^{(1)}(m,{\bm{x}})$ as the ratio of $f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})$ to $f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})$)}\\
={}& f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})},
\end{align*}

which indicates $f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})=\frac{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}r_{11}(m,{\bm{x}})$ for $m\in\{1,\dots,m_{\max}\}$. Because $\sum_{j=0}^{m_{\max}}f_{M_{1}|U,{\bm{X}}}(j|10,{\bm{x}})=1$, we obtain

fM1|U,𝑿(0|10,𝒙)=1j=1mmaxξM(1)(j,𝒙)p11(𝒙)ξM(1)(j,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙)r11(j,𝒙).f_{M_{1}|U,{\bm{X}}}(0|10,{\bm{x}})=1-\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(1)}(j,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}}).

\square
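The formulas in Lemma S12 can be verified numerically: the implied complier and always-taker mediator distributions under treatment must mix back to the observed $r_{11}(m,{\bm{x}})$ with mixing proportions $(p_{11}-p_{01})/p_{11}$ and $p_{01}/p_{11}$. The following is a minimal sketch at a single covariate value; the function name `mediator_pmf_under_treatment` and the inputs are hypothetical.

```python
import numpy as np

def mediator_pmf_under_treatment(xi_m1, p11, p01, r11):
    """Return (f_{M_1|U=10}, f_{M_1|U=11}) as pmfs over m = 0..m_max at one x.

    xi_m1[m] is the user-specified confounding function for m >= 1;
    xi_m1[0] is ignored because the m = 0 entries follow from summing to one.
    """
    xi = np.asarray(xi_m1, dtype=float)
    r11 = np.asarray(r11, dtype=float)
    denom = xi * (p11 - p01) + p01
    f10 = xi * p11 / denom * r11      # compliers, closed form valid for m >= 1
    f11 = p11 / denom * r11           # always-takers, closed form valid for m >= 1
    f10[0] = 1.0 - f10[1:].sum()      # m = 0 by the sum-to-one constraint
    f11[0] = 1.0 - f11[1:].sum()
    return f10, f11

p11, p01 = 0.7, 0.2
r11 = np.array([0.5, 0.3, 0.2])
xi_m1 = np.array([1.0, 1.5, 0.8])
f10, f11 = mediator_pmf_under_treatment(xi_m1, p11, p01, r11)
mixture = (p11 - p01) / p11 * f10 + p01 / p11 * f11
```

Setting every entry of `xi_m1` to one recovers `f10 == f11 == r11`, which corresponds to no departure from principal ignorability for the mediator.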

Lemma S13

Under Assumptions 1, 2, 3a, and 6 and the proposed confounding function ξ={(ξM(1)(m,𝐱),ξM(0)(m,𝐱)) for m1 and (ξY(1)(m,𝐱),ξY(0)(m,𝐱)) for m0}\xi=\left\{\left(\xi_{M}^{(1)}(m,{\bm{x}}),\xi_{M}^{(0)}(m,{\bm{x}})\right)\text{ for }m\geq 1\text{ and }\left(\xi_{Y}^{(1)}(m,{\bm{x}}),\xi_{Y}^{(0)}(m,{\bm{x}})\right)\text{ for }m\geq 0\right\}, we can nonparametrically identify the 𝔼Yzm|U,𝐗(Yzm|d1d0,𝐱)\mathbb{E}_{Y_{zm}|U,{\bm{X}}}(Y_{zm}|d_{1}d_{0},{\bm{x}}) for all z{1,0}z\in\{1,0\}, m{0,,mmax}m\in\{0,\dots,m_{\max}\}, and d1d0𝒰2-sidedd_{1}d_{0}\in\mathcal{U}_{\text{2-sided}}. Specifically, we have that

𝔼Y1m|U,𝑿[Y1m|10,𝒙]\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}] =ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙)p01(𝒙)/ξY(1)(m,𝒙)+ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))μ11(m,𝒙),\displaystyle=\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{11}(m,{\bm{x}}),
𝔼Y0m|U,𝑿[Y0m|10,𝒙]\displaystyle\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|10,{\bm{x}}] =ξM(0)(m,𝒙)(p11(𝒙)p01(𝒙))+p10(𝒙)p10(𝒙)/ξY(0)(m,𝒙)+ξM(0)(m,𝒙)(p11(𝒙)p01(𝒙))μ00(m,𝒙),\displaystyle=\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(m,{\bm{x}})+\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{00}(m,{\bm{x}}),
𝔼Y1m|U,𝑿[Y1m|11,𝒙]\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|11,{\bm{x}}] =ξM(1)(m,𝑿)(p11(𝒙)p01(𝒙))+p01(𝒙)p01(𝒙)+ξY(1)(m,𝒙)ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))μ11(m,𝒙),\displaystyle=\frac{\xi_{M}^{(1)}(m,{\bm{X}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})+\xi_{Y}^{(1)}(m,{\bm{x}})\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{11}(m,{\bm{x}}),
𝔼Y0m|U,𝑿[Y0m|11,𝒙]\displaystyle\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|11,{\bm{x}}] =μ01(m,𝒙),\displaystyle=\mu_{01}(m,{\bm{x}}),
𝔼Y1m|U,𝑿[Y1m|00,𝒙]\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|00,{\bm{x}}] =μ10(m,𝒙),\displaystyle=\mu_{10}(m,{\bm{x}}),
𝔼Y0m|U,𝑿[Y0m|00,𝒙]\displaystyle\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|00,{\bm{x}}] =ξM(0)(m,𝒙)(p11(𝒙)p01(𝒙))+p10(𝒙)p10(𝒙)+ξY(0)(m,𝒙)ξM(0)(m,𝒙)(p11(𝒙)p01(𝒙))μ00(m,𝒙),\displaystyle=\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(m,{\bm{x}})\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{00}(m,{\bm{x}}),

for any $m=0,\dots,m_{\max}$, where $\xi_{M}^{(1)}(m,{\bm{x}})$ and $\xi_{M}^{(0)}(m,{\bm{x}})$ are given in $\xi$ for $m\geq 1$ and, for $m=0$,

ξM(1)(0,𝒙)\displaystyle\xi_{M}^{(1)}(0,{\bm{x}}) =:fM1|U,𝑿(0|10,𝒙)fM1|U,𝑿(0|11,𝒙)=1j=1mmaxξM(1)(j,𝒙)p11(𝒙)ξM(1)(j,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙)r11(j,𝒙)1j=1mmaxp11(𝒙)ξM(1)(j,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙)r11(j,𝒙),\displaystyle=:\frac{f_{M_{1}|U,{\bm{X}}}(0|10,{\bm{x}})}{f_{M_{1}|U,{\bm{X}}}(0|11,{\bm{x}})}=\frac{1-\sum_{j=1}^{m_{\max}}\displaystyle\frac{\xi_{M}^{(1)}(j,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}})}{1-\sum_{j=1}^{m_{\max}}\displaystyle\frac{p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}})},
ξM(0)(0,𝒙)\displaystyle\xi_{M}^{(0)}(0,{\bm{x}}) =:fM0|U,𝑿(0|10,𝒙)fM0|U,𝑿(0|00,𝒙)=1j=1mmaxξM(0)(j,𝒙)p00(𝒙)ξM(0)(j,𝒙)(p11(𝒙)p01(𝒙))+p10(𝒙)r00(j,𝒙)1j=1mmaxp00(𝒙)ξM(0)(j,𝒙)(p11(𝒙)p01(𝒙))+p10(𝒙)r00(j,𝒙).\displaystyle=:\frac{f_{M_{0}|U,{\bm{X}}}(0|10,{\bm{x}})}{f_{M_{0}|U,{\bm{X}}}(0|00,{\bm{x}})}=\frac{1-\sum_{j=1}^{m_{\max}}\displaystyle\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}})}{1-\sum_{j=1}^{m_{\max}}\displaystyle\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}})}.
Proof.

We first show $\mathbb{E}_{Y_{0m}|U,{\bm{X}}}(Y_{0m}|11,{\bm{x}})=\mu_{01}(m,{\bm{x}})$; an analogous argument gives $\mathbb{E}_{Y_{1m}|U,{\bm{X}}}(Y_{1m}|00,{\bm{x}})=\mu_{10}(m,{\bm{x}})$. Specifically, one can verify

\begin{align*}
\mathbb{E}_{Y_{0m}|U,{\bm{X}}}(Y_{0m}|11,{\bm{x}}) &=\mathbb{E}_{Y_{0m}|Z,U,{\bm{X}}}(Y_{0m}|0,11,{\bm{x}})\\
&\quad\text{(by Lemma S5 and Lemma S2)}\\
&=\mathbb{E}_{Y_{0m}|Z,M,U,{\bm{X}}}(Y_{0m}|0,m,11,{\bm{x}})\\
&\quad\text{(by Assumption 5)}\\
&=\mathbb{E}_{Y_{0m}|Z,D,M,U,{\bm{X}}}(Y_{0m}|0,1,m,11,{\bm{x}})\\
&\quad\text{(because $D$ must be 1 given $Z=0$ and $U=11$)}\\
&=\mathbb{E}_{Y_{0m}|Z,D,M,{\bm{X}}}(Y_{0m}|0,1,m,{\bm{x}})\\
&\quad\text{(because the observed stratum with $Z=0$ and $D=1$ contains only the always-takers)}\\
&=\mu_{01}(m,{\bm{x}}).
\end{align*}

Next, we derive the expression of 𝔼Y1m|U,𝑿(Y1m|10,𝒙)\mathbb{E}_{Y_{1m}|U,{\bm{X}}}(Y_{1m}|10,{\bm{x}}). Notice that the expressions of 𝔼Y0m|U,𝑿(Y0m|10,𝒙)\mathbb{E}_{Y_{0m}|U,{\bm{X}}}(Y_{0m}|10,{\bm{x}}), 𝔼Y1m|U,𝑿(Y1m|11,𝒙)\mathbb{E}_{Y_{1m}|U,{\bm{X}}}(Y_{1m}|11,{\bm{x}}), and 𝔼Y0m|U,𝑿(Y0m|00,𝒙)\mathbb{E}_{Y_{0m}|U,{\bm{X}}}(Y_{0m}|00,{\bm{x}}) can be similarly obtained. Specifically, for any m0m\geq 0,

\begin{align*}
\mu_{11}(m,{\bm{x}})={}& \mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|1,1,m,{\bm{x}}]=\mathbb{E}_{Y_{1m}|Z,D,M_{1},{\bm{X}}}[Y_{1m}|1,1,m,{\bm{x}}]\\
={}& \mathbb{E}_{Y_{1m}|Z,D,M_{1},U,{\bm{X}}}[Y_{1m}|1,1,m,10,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})\\
&+\mathbb{E}_{Y_{1m}|Z,D,M_{1},U,{\bm{X}}}[Y_{1m}|1,1,m,11,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}})\\
&\text{(by the law of total expectation, since $U=10$ or $11$ in the observed stratum $(Z=1,D=1)$ under standard monotonicity)}\\
={}& \mathbb{E}_{Y_{1m}|Z,M_{1},U,{\bm{X}}}[Y_{1m}|1,m,10,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})+\mathbb{E}_{Y_{1m}|Z,M_{1},U,{\bm{X}}}[Y_{1m}|1,m,11,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}})\\
&\text{(because $D$ must be 1 given $Z=1$ and either $U=10$ or $11$)}\\
={}& \mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|1,10,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})+\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|1,11,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}})\\
&\text{(by Assumption 5)}\\
={}& \mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})+\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|11,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}}),\\
&\text{(by Lemma S5 and Lemma S2)}
\end{align*}

where

fU|Z,D,M1,𝑿(10|1,1,m,𝒙)=\displaystyle f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})= fM1|Z,D,U,𝑿(m|1,1,10,𝒙)fU|Z,D,𝑿(10|1,1,𝒙)u{10,11}fM1|Z,D,U,𝑿(m|1,1,u,𝒙)fU|Z,D,𝑿(u|1,1,𝒙)\displaystyle\frac{f_{M_{1}|Z,D,U,{\bm{X}}}(m|1,1,10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})}{\displaystyle\sum_{u\in\{10,11\}}f_{M_{1}|Z,D,U,{\bm{X}}}(m|1,1,u,{\bm{x}})f_{U|Z,D,{\bm{X}}}(u|1,1,{\bm{x}})}
=\displaystyle= fM1|U,𝑿(m|10,𝒙)fU|Z,D,𝑿(10|1,1,𝒙)u{10,11}fM1|U,𝑿(m|u,𝒙)fU|Z,D,𝑿(u|1,1,𝒙)\displaystyle\frac{f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})}{\displaystyle\sum_{u\in\{10,11\}}f_{M_{1}|U,{\bm{X}}}(m|u,{\bm{x}})f_{U|Z,D,{\bm{X}}}(u|1,1,{\bm{x}})}
=\displaystyle= fM1|U,𝑿(m|10,𝒙)p11(𝒙)p01(𝒙)p11(𝒙)fM1|U,𝑿(m|10,𝒙)p11(𝒙)p01(𝒙)p11(𝒙)+fM1|U,𝑿(m|11,𝒙)p01(𝒙)p11(𝒙)\displaystyle\frac{f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\displaystyle\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}}{f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\displaystyle\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}+f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})\displaystyle\frac{p_{01}({\bm{x}})}{p_{11}({\bm{x}})}}
=\displaystyle= p11(𝒙)p01(𝒙)p11(𝒙)p11(𝒙)p01(𝒙)p11(𝒙)+p01(𝒙)ξM(1)(m,𝒙)p11(𝒙)=ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙).\displaystyle\frac{\displaystyle\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}}{\displaystyle\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}+\displaystyle\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}}=\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}.

and

fU|Z,D,M1,𝑿(11|1,1,m,𝒙)\displaystyle f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}}) =1fU|Z,D,M1,𝑿(10|1,1,m,𝒙)\displaystyle=1-f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})
=p01(𝒙)ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙).\displaystyle=\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}.

This suggests that

\begin{align*}
\mu_{11}(m,{\bm{x}})={}& \mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}+\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|11,{\bm{x}}]\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}\\
={}& \mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}+\frac{\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]}{\xi_{Y}^{(1)}(m,{\bm{x}})}\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}\\
={}& \mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]\times\frac{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}
\end{align*}

and thus

𝔼Y1m|U,𝑿[Y1m|10,𝒙]=ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))+p01(𝒙)p01(𝒙)/ξY(1)(m,𝒙)+ξM(1)(m,𝒙)(p11(𝒙)p01(𝒙))μ11(m,𝒙).\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]=\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{11}(m,{\bm{x}}).

This completes the proof. \square
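The two treated-arm formulas in Lemma S13 can be checked against the mixture identity used in the proof: weighting the complier and always-taker outcome means by the stratum probabilities given $(Z=1,D=1,M_{1}=m,{\bm{x}})$ must return $\mu_{11}(m,{\bm{x}})$. The sketch below does this at a single $(m,{\bm{x}})$; the function name `outcome_means_under_treatment` and the numeric inputs are hypothetical.

```python
def outcome_means_under_treatment(xi_m1, xi_y1, p11, p01, mu11):
    """Return (E[Y_1m | U=10, x], E[Y_1m | U=11, x]) at one (m, x)."""
    diff = p11 - p01
    num = xi_m1 * diff + p01
    e10 = num / (p01 / xi_y1 + xi_m1 * diff) * mu11   # compliers
    e11 = num / (p01 + xi_y1 * xi_m1 * diff) * mu11   # always-takers
    return e10, e11

xi_m1, xi_y1, p11, p01, mu11 = 1.5, 0.8, 0.7, 0.2, 2.0
e10, e11 = outcome_means_under_treatment(xi_m1, xi_y1, p11, p01, mu11)

# Stratum probabilities given (Z=1, D=1, M_1=m, x), as derived in the proof.
denom = xi_m1 * (p11 - p01) + p01
pi10 = xi_m1 * (p11 - p01) / denom
pi11 = p01 / denom
recovered_mu11 = pi10 * e10 + pi11 * e11
```

The ratio `e10 / e11` equals `xi_y1`, which matches the substitution made in the second line of the final display of the proof.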

Next, we prove the nonparametric formulas θd1d0(zz)\theta_{d_{1}d_{0}}^{(zz^{\prime})} in Proposition S1.

Proof of Proposition S1. We derive the nonparametric identification formula for $\theta_{10}^{(10)}$; the proofs for all other $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ follow the same steps and are omitted. By the definition of $\theta_{10}^{(10)}$, we have that

\begin{align*}
\theta_{10}^{(10)} &=\mathbb{E}[Y_{1M_{0}}|U=10]\\
&=\mathbb{E}\left[\mathbb{E}[Y_{1M_{0}}|U=10,{\bm{X}}]\Big{|}U=10\right]\quad\text{(by law of iterated expectations)}\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|M_{0},U,{\bm{X}}}[Y_{1m}|m,10,{\bm{X}}]f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{X}})\Big{|}U=10\right]\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|Z,M_{0},U,{\bm{X}}}[Y_{1m}|1,m,10,{\bm{X}}]f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{X}})\Big{|}U=10\right]\\
&\quad\text{(by Lemma S5 coupled with Lemma S2)}\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|1,10,{\bm{X}}]f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{X}})\Big{|}U=10\right]\quad\text{(by Assumption 5)}\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{X}}]f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{X}})\Big{|}U=10\right]\quad\text{(by Lemma S5)}\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}w_{10}^{(10)}(m,{\bm{X}})\mu_{11}(m,{\bm{X}})r_{00}(m,{\bm{X}})\Big{|}U=10\right]\quad\text{(by Lemmas S12 and S13)}\\
&=\int_{{\bm{x}}}\frac{f_{U|{\bm{X}}}(10|{\bm{x}})}{f_{U}(10)}\left\{\sum_{m=0}^{m_{\max}}w_{10}^{(10)}(m,{\bm{x}})\mu_{11}(m,{\bm{x}})r_{00}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\quad\text{(by Lemma S1)}\\
&=\int_{{\bm{x}}}\frac{e_{10}({\bm{x}})}{e_{10}}\left\{\sum_{m=0}^{m_{\max}}w_{10}^{(10)}(m,{\bm{x}})\mu_{11}(m,{\bm{x}})r_{00}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}}).
\end{align*}

This completes the proof. \square
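The final identification formula suggests a simple plug-in computation: replace the integral over ${\bm{x}}$ with an empirical average and $e_{10}$ with the sample mean of $e_{10}({\bm{x}})$. The following is a minimal sketch of that plug-in only, not of the multiply robust estimator; the function name `theta_10_10_plugin` and the toy arrays are hypothetical, and the nuisance values are assumed to be supplied by the user.

```python
import numpy as np

def theta_10_10_plugin(e10_x, w, mu11, r00):
    """Plug-in evaluation of the identification formula for theta_10^{(10)}.

    e10_x: shape (n,), values of e_10(x_i).
    w, mu11, r00: shape (n, m_max + 1), values of w_10^{(10)}(m, x_i),
    mu_11(m, x_i), and r_00(m, x_i) for m = 0..m_max.
    """
    e10_x = np.asarray(e10_x, dtype=float)
    inner = np.sum(np.asarray(w) * np.asarray(mu11) * np.asarray(r00), axis=1)
    # The empirical mean of e_10(x_i) stands in for the marginal e_10.
    return float(np.mean(e10_x * inner) / np.mean(e10_x))

# Toy inputs with n = 2 units and a binary mediator (illustration only).
e10_x = np.array([0.5, 0.25])
w = np.ones((2, 2))                       # weights equal one: no sensitivity adjustment
mu11 = np.array([[1.0, 2.0], [3.0, 4.0]])
r00 = np.array([[0.5, 0.5], [0.25, 0.75]])
theta = theta_10_10_plugin(e10_x, w, mu11, r00)
```

In this toy example the inner sums are 1.5 and 3.75, so the result is $(0.5\times 1.5+0.25\times 3.75)/(0.5+0.25)=2.25$.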

Finally, we prove properties of the multiply robust estimator θ^d1d0(zz)(ξ)\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime})}(\xi) in Proposition S2.

Proof of Proposition S2. Following the notation in the proof of Theorem 4, we let $\widetilde{h}_{nuisance}=\{\widetilde{\pi}_{z}({\bm{x}}),\widetilde{p}_{zd}({\bm{x}}),\widetilde{r}_{zd}(m,{\bm{x}}),\widetilde{\mu}_{zd}(m,{\bm{x}})\}$ be the probability limit of $\widehat{h}_{nuisance}^{\text{par}}$, where $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$, $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$, $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$, and $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ under $\mathcal{M}_{\pi}$, $\mathcal{M}_{e}$, $\mathcal{M}_{m}$, and $\mathcal{M}_{o}$, respectively. Under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, we always have $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$ and $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$. Also, because the sensitivity weight $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ depends only on the confounding function $\xi$ and the observed-data nuisance functions $\{p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}})\}$, the probability limit of $\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime}),\text{par}}(m,{\bm{x}})$ equals $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$. In addition, the probability limit of $\widehat{p}_{zd}^{\text{dr}}$, denoted by $\widetilde{p}_{zd}$, equals the true value $p_{zd}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, because $\widehat{p}_{zd}^{\text{dr}}$ is a doubly robust estimator under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$.
The previous discussion implies that the probability limit of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(\xi)$ is

θ(zz),mrd1d0(ξ)=\displaystyle\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)= 𝔼{(𝕀(Z=z){𝕀(D=d)pzd(𝑿)}π~z(𝑿)k(1Z){Dp01(𝑿)}π~0(𝑿))η~zzw(𝑿)pzdkp01\displaystyle\mathbb{E}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\widetilde{\eta}_{zz^{\prime}}^{w}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}
+pzd(𝑿)kp01(𝑿)pzdkp01𝕀(D=dz,Z=z)pzdz(𝑿)π~z(𝑿)rzdz(M,𝑿)rzdz(M,𝑿)wd1d0(zz)(M,𝑿){Yμ~zdz(M,𝑿)}\displaystyle+\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\left\{Y-\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right\}
+pzd(𝑿)kp01(𝑿)pzdkp01𝕀(D=dz,Z=z)pzdz(𝑿)π~z(𝑿){wd1d0(zz)(M,𝑿)μ~zdz(M,𝑿)η~zzw(𝑿)}\displaystyle+\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})-\widetilde{\eta}_{zz^{\prime}}^{w}({\bm{X}})\right\}
+pzd(𝑿)kp01(𝑿)pzdkp01η~zzw(𝑿)}\displaystyle+\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\widetilde{\eta}_{zz^{\prime}}^{w}({\bm{X}})\Big{\}}

under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, where $\widetilde{\eta}_{zz^{\prime}}^{w}({\bm{X}})=\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}})\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})$. In what follows, we show that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under Scenario I ($\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$) or Scenario II ($\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$), which establishes the double robustness of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(\xi)$.

Scenario I ($\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$):

In Scenario I, $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$ but generally $\widetilde{\mu}_{zd}(m,{\bm{x}})\neq\mu_{zd}(m,{\bm{x}})$. Therefore, we can rewrite $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\sum_{j=1}^{4}\Delta_{j}$, where

\begin{align*}
\Delta_{1}&=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\frac{\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}})\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],\\
\Delta_{2}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})Y\right],\\
\Delta_{3}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}-\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\right\}w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right],\\
\Delta_{4}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{1-\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\right\}\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}})\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\right].
\end{align*}

One can show that $\Delta_{1}=\Delta_{3}=\Delta_{4}=0$ by the law of iterated expectations, and

\begin{align*}
\Delta_{2}={}&\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{1}{p_{zd_{z}}({\bm{x}})\pi_{z}({\bm{x}})}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\int_{y}y\,\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})r_{zd_{z}}(m,{\bm{x}})\right\}\\
&\quad\times f_{D|Z,{\bm{X}}}(d_{z}|z,{\bm{x}})f_{Z|{\bm{X}}}(z|{\bm{x}})\,\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
={}&\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\int_{y}y\,\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})r_{zd_{z}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
={}&\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\mu_{zd_{z}}(m,{\bm{x}})r_{zd_{z}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
={}&\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\mu_{zd_{z}}(m,{\bm{x}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
={}&\int_{{\bm{x}}}\frac{e_{d_{1}d_{0}}({\bm{x}})}{e_{d_{1}d_{0}}}\sum_{m=0}^{m_{\max}}\left\{w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\mu_{zd_{z}}(m,{\bm{x}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
={}&\theta_{d_{1}d_{0}}^{(zz^{\prime})},
\end{align*}

which suggests that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$.
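
As a sketch of the iterated-expectation step used for $\Delta_{4}$ in Scenario I (the arguments for $\Delta_{1}$ and $\Delta_{3}$ are analogous), condition on ${\bm{X}}$ and use $\pi_{z^{\prime}}({\bm{x}})=f_{Z|{\bm{X}}}(z^{\prime}|{\bm{x}})$ and $p_{z^{\prime}d}({\bm{x}})=f_{D|Z,{\bm{X}}}(d|z^{\prime},{\bm{x}})$, which is the same cancellation used in the first two lines of the $\Delta_{2}$ derivation:

```latex
\begin{align*}
\mathbb{E}\left[\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\,\Big|\,{\bm{X}}\right]
=\frac{f_{D|Z,{\bm{X}}}(d_{z^{\prime}}|z^{\prime},{\bm{X}})\,f_{Z|{\bm{X}}}(z^{\prime}|{\bm{X}})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}=1 .
\end{align*}
```

Hence the term in braces in $\Delta_{4}$ has conditional mean zero given ${\bm{X}}$, and because the remaining factors of $\Delta_{4}$ are functions of ${\bm{X}}$ only, $\Delta_{4}=0$.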

Scenario II ($\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$):

In Scenario II, $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ but generally $\widetilde{\pi}_{z}({\bm{x}})\neq\pi_{z}({\bm{x}})$. Therefore, we have $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\sum_{j=1}^{4}\Delta_{j}$, where

\begin{align*}
\Delta_{1}&=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}})\mu_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],\\
\Delta_{2}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right],\\
\Delta_{3}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\mu_{zd_{z}}(M,{\bm{X}})-\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}})\mu_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\right\}\right],\\
\Delta_{4}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}})\mu_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\right].
\end{align*}

Noting that $\Delta_{1}=\Delta_{2}=\Delta_{3}=0$ by the law of iterated expectations and that $\Delta_{4}=\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ as shown in Proposition S1, we obtain $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

Up to this point, we have confirmed that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ is consistent for $\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$. Then, under mild regularity conditions (similar to those listed in the proof of Theorem 4), one can show that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ is also asymptotically normal, in the sense that $\sqrt{n}\left(\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\right)$ converges to a zero-mean normal distribution with finite variance under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$. The proof is omitted for brevity. $\square$
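
The double robustness of $\widehat{p}_{zd}^{\text{dr}}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$, used at the start of the proof, can be checked numerically. The sketch below is illustrative only: it assumes $\widehat{p}_{zd}^{\text{dr}}$ has the standard augmented inverse-probability-weighted form $\mathbb{P}_{n}[\mathbb{I}(Z=z)\{\mathbb{I}(D=d)-\widehat{p}_{zd}({\bm{X}})\}/\widehat{\pi}_{z}({\bm{X}})+\widehat{p}_{zd}({\bm{X}})]$, which is not restated in this section, and it evaluates the probability limit exactly for a hypothetical binary covariate. The function name `aipw_limit` and all numerical values are assumptions made for illustration.

```python
# Exact (population-level) check of double robustness for an AIPW-type
# functional targeting p_{zd} = E[ P(D=d | Z=z, X) ], with a binary X.
# All names and numbers below are hypothetical.

def aipw_limit(px, pi_true, p_true, pi_work, p_work):
    """Limit of P_n[ I(Z=z){I(D=d) - p_work(X)}/pi_work(X) + p_work(X) ].

    Averaging over D and then Z given X = x gives
    pi_true(x) * (p_true(x) - p_work(x)) / pi_work(x) + p_work(x).
    """
    return sum(
        px[x] * (pi_true[x] * (p_true[x] - p_work[x]) / pi_work[x] + p_work[x])
        for x in px
    )

px = {0: 0.4, 1: 0.6}        # P(X = x)
pi_true = {0: 0.3, 1: 0.7}   # true P(Z = z | X = x)
p_true = {0: 0.2, 1: 0.5}    # true P(D = d | Z = z, X = x)
pi_bad = {0: 0.5, 1: 0.5}    # misspecified working propensity score
p_bad = {0: 0.35, 1: 0.35}   # misspecified working model for D

target = sum(px[x] * p_true[x] for x in px)
limit_pi_ok = aipw_limit(px, pi_true, p_true, pi_true, p_bad)    # only pi correct
limit_p_ok = aipw_limit(px, pi_true, p_true, pi_bad, p_true)     # only p correct
limit_both_bad = aipw_limit(px, pi_true, p_true, pi_bad, p_bad)  # neither correct
```

In this toy setting the limit matches the target whenever at least one working model is correct, and differs from it when both are misspecified.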

D.9 Sensitivity analysis for the principal ignorability assumption under strong monotonicity

We can adapt $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(\xi)$ to address strong monotonicity, following a procedure similar to that in Section D.8. Under strong monotonicity, the proposed multiply robust estimator $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(\kappa)$ takes the following explicit expression for any $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$:

\begin{align*}
\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)={}&\mathbb{P}_{n}\Big\{\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{par}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{par}}({\bm{X}})}\right)\frac{\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\\
&+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widehat{p}_{zd_{z}}^{\text{par}}({\bm{X}})\widehat{\pi}_{z}^{\text{par}}({\bm{X}})}\frac{\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(M,{\bm{X}})}{\widehat{r}_{zd_{z}}^{\text{par}}(M,{\bm{X}})}\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\left\{Y-\widehat{\mu}_{zd_{z}}^{\text{par}}(M,{\bm{X}})\right\}\\
&+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widehat{p}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}({\bm{X}})\widehat{\pi}_{z^{\prime}}^{\text{par}}({\bm{X}})}\left\{\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\widehat{\mu}_{zd_{z}}^{\text{par}}(M,{\bm{X}})-\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})\right\}\\
&+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})\Big\},\tag{s26}
\end{align*}

where $\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{x}})=\sum_{m=0}^{m_{\max}}\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\widehat{\mu}_{zd_{z}}^{\text{par}}(m,{\bm{x}})\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(m,{\bm{x}})$ and $\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ is the estimated sensitivity weight based on the parametric working models. Here, the sensitivity weight $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ takes a slightly different form from that under standard monotonicity. We provide the explicit expressions of $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ for $zz^{\prime}\in\{11,10,00\}$ and $d_{1}d_{0}\in\{10,00\}$ below.

\begin{align*}
w_{10}^{(11)}(m,{\bm{x}})&=1\text{ for any $m$.}\\
w_{10}^{(10)}(m,{\bm{x}})&=\begin{cases}\dfrac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m\geq 1,\\[2ex] \dfrac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\dfrac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m=0.\end{cases}\\
w_{10}^{(00)}(m,{\bm{x}})&=\begin{cases}\dfrac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\dfrac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(m,{\bm{x}})+\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\[2ex] \left\{\dfrac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\dfrac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\dfrac{\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(0,{\bm{x}})+\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}\\
w_{00}^{(11)}(m,{\bm{x}})&=1\text{ for any $m$.}\\
w_{00}^{(10)}(m,{\bm{x}})&=\begin{cases}\dfrac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m\geq 1,\\[2ex] \dfrac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\dfrac{p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m=0.\end{cases}\\
w_{00}^{(00)}(m,{\bm{x}})&=\begin{cases}\dfrac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\dfrac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(m,{\bm{x}})\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\[2ex] \left\{\dfrac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\dfrac{p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\dfrac{\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(0,{\bm{x}})\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}
\end{align*}

It is worth noting that the sensitivity weights $\{w_{d_{1}d_{0}}^{(10)}(m,{\bm{x}}),w_{d_{1}d_{0}}^{(11)}(m,{\bm{x}})\}$ depend only on $\xi_{M}^{(0)}(m,{\bm{x}})$ and not on $\xi_{Y}^{(0)}(m,{\bm{x}})$; this implies that the estimated principal natural indirect effect, $\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}(\kappa)-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}}(\kappa)$, does not depend on the values of $\xi_{Y}^{(0)}(m,{\bm{x}})$. However, the estimated principal natural direct effect, $\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}}(\kappa)-\widehat{\theta}_{d_{1}d_{0}}^{(00),\text{mr}}(\kappa)$, depends on both $\xi_{Y}^{(0)}(m,{\bm{x}})$ and $\xi_{M}^{(0)}(m,{\bm{x}})$.
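
To make the weight formulas concrete, the following minimal sketch evaluates $w_{10}^{(10)}(m,{\bm{x}})$ and $w_{00}^{(10)}(m,{\bm{x}})$ at a single fixed ${\bm{x}}$, transcribing the expressions above. The function name `weights_10`, its argument layout, and the numerical inputs are hypothetical and chosen only for illustration; the inputs use $p_{01}=0$ and $p_{00}=1$ as one example configuration.

```python
# Sensitivity weights w_{d1d0}^{(10)}(m, x) at one fixed x, for m = 0..m_max.
# Function name, argument layout and example values are hypothetical.

def weights_10(xi_m, r00, p11, p10, p01, p00, stratum):
    """Return [w(0), ..., w(m_max)] for stratum "10" or "00".

    xi_m[m] : xi_M^{(0)}(m, x); the entry xi_m[0] is not used by the formula
    r00[m]  : r_{00}(m, x), a probability mass function over m
    """
    w = [0.0] * len(r00)
    for m in range(1, len(r00)):
        denom = xi_m[m] * (p11 - p01) + p10
        numer = xi_m[m] * p00 if stratum == "10" else p00
        w[m] = numer / denom
    # m = 0: 1/r00(0) minus the sum over j >= 1 of w(j) * r00(j) / r00(0)
    w[0] = (1.0 - sum(w[j] * r00[j] for j in range(1, len(r00)))) / r00[0]
    return w

r00 = [0.5, 0.3, 0.2]
xi = [1.0, 1.5, 0.8]
w10 = weights_10(xi, r00, p11=0.6, p10=0.4, p01=0.0, p00=1.0, stratum="10")
w00 = weights_10(xi, r00, p11=0.6, p10=0.4, p01=0.0, p00=1.0, stratum="00")
w_null = weights_10([1.0, 1.0, 1.0], r00, 0.6, 0.4, 0.0, 1.0, stratum="10")

norm10 = sum(w * r for w, r in zip(w10, r00))
norm00 = sum(w * r for w, r in zip(w00, r00))
```

By construction of the $m=0$ case, the weights average to one under $r_{00}(\cdot,{\bm{x}})$; with these example inputs and $\xi_{M}^{(0)}\equiv 1$, every weight equals one.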

We can show that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)$ enjoys properties similar to those of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$:

Proposition S7

Suppose that Assumptions 1, 2, 3b, 5, and 6 hold. Under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, the estimator $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)$ is consistent and asymptotically normal for any $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$.

The proof of Proposition S7 is similar to that of Proposition S2 and is omitted for brevity.

D.10 Sensitivity analysis for the ignorability of the mediator assumption

This section presents proofs for Propositions S3 and S4, which refer to the identification results and properties of the multiply robust estimator under violation of Assumption 5.

Proof of Proposition S3. For any $z\in\{0,1\}$ and $d_{1}d_{0}\in\mathcal{U}$, with $\mathcal{U}=\mathcal{U}_{\text{a}}$ under standard monotonicity and $\mathcal{U}=\mathcal{U}_{\text{b}}$ under strong monotonicity, we can show that

\begin{align*}
&\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|z,d_{1}d_{0},{\bm{x}}]\\
={}&\sum_{j=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|z,j,d_{1}d_{0},{\bm{x}}]f_{M|Z,U,{\bm{X}}}(j|z,d_{1}d_{0},{\bm{x}})\\
={}&\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|z,m,d_{1}d_{0},{\bm{x}}]\sum_{j=0}^{m_{\max}}\frac{\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|z,j,d_{1}d_{0},{\bm{x}}]}{\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|z,m,d_{1}d_{0},{\bm{x}}]}f_{M|Z,U,{\bm{X}}}(j|z,d_{1}d_{0},{\bm{x}})\\
={}&\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|z,m,d_{1}d_{0},{\bm{x}}]\sum_{j=0}^{m_{\max}}\frac{t(z,j,d_{1}d_{0},{\bm{x}})}{t(z,m,d_{1}d_{0},{\bm{x}})}f_{M|Z,U,{\bm{X}}}(j|z,d_{1}d_{0},{\bm{x}}).
\end{align*}

Therefore,

\begin{align*}
&\frac{\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|0,m,d_{1}d_{0},{\bm{x}}]}{\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|1,m,d_{1}d_{0},{\bm{x}}]}\\
={}&\frac{\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|0,d_{1}d_{0},{\bm{x}}]\left\{\displaystyle\sum_{j=0}^{m_{\max}}\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M|Z,U,{\bm{X}}}(j|0,d_{1}d_{0},{\bm{x}})\right\}^{-1}}{\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|1,d_{1}d_{0},{\bm{x}}]\left\{\displaystyle\sum_{j=0}^{m_{\max}}\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M|Z,U,{\bm{X}}}(j|1,d_{1}d_{0},{\bm{x}})\right\}^{-1}}\\
={}&\left\{\sum_{j=0}^{m_{\max}}\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M|Z,U,{\bm{X}}}(j|1,d_{1}d_{0},{\bm{x}})\right\}\Big/\left\{\sum_{j=0}^{m_{\max}}\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M|Z,U,{\bm{X}}}(j|0,d_{1}d_{0},{\bm{x}})\right\},
\end{align*}

where the last equality follows from $\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|0,d_{1}d_{0},{\bm{x}}]=\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|1,d_{1}d_{0},{\bm{x}}]$, which holds by Lemma S5. We can then identify $\mathbb{E}_{Y_{1m}|M_{0},U,{\bm{X}}}[Y_{1m}|m,d_{1}d_{0},{\bm{x}}]$ using the following equations:

\begin{align*}
&\mathbb{E}_{Y_{1m}|M_{0},U,{\bm{X}}}[Y_{1m}|m,d_{1}d_{0},{\bm{x}}]\\
={}&\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|0,m,d_{1}d_{0},{\bm{x}}]\\
={}&\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|1,m,d_{1}d_{0},{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\dfrac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M|Z,U,{\bm{X}}}(j|1,d_{1}d_{0},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\dfrac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M|Z,U,{\bm{X}}}(j|0,d_{1}d_{0},{\bm{x}})}\\
={}&\mathbb{E}_{Y_{1m}|M_{1},U,{\bm{X}}}[Y_{1m}|m,d_{1}d_{0},{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\dfrac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M_{1}|U,{\bm{X}}}(j|d_{1}d_{0},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\dfrac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M_{0}|U,{\bm{X}}}(j|d_{1}d_{0},{\bm{x}})}\\
={}&\mathbb{E}_{Y_{1m}|D_{1},D_{0},M_{1},{\bm{X}}}[Y_{1m}|d_{1},d_{0},m,{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\dfrac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M_{1}|D_{1},D_{0},{\bm{X}}}(j|d_{1},d_{0},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\dfrac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M_{0}|D_{1},D_{0},{\bm{X}}}(j|d_{1},d_{0},{\bm{x}})}\\
={}&\mathbb{E}_{Y_{1m}|D_{1},M_{1},{\bm{X}}}[Y_{1m}|d_{1},m,{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\dfrac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M_{1}|D_{1},{\bm{X}}}(j|d_{1},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\dfrac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M_{0}|D_{0},{\bm{X}}}(j|d_{0},{\bm{x}})}\quad\quad\text{(by Lemma S3)}\\
={}&\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|1,d_{1},m,{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\dfrac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M|Z,D,{\bm{X}}}(j|1,d_{1},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\dfrac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M|Z,D,{\bm{X}}}(j|0,d_{0},{\bm{x}})}\quad\quad\text{(by Assumption 2)}\\
={}&\mu_{1d_{1}}(m,{\bm{x}})\frac{\sum_{j=0}^{m_{\max}}\dfrac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}r_{1d_{1}}(j,{\bm{x}})}{\sum_{j=0}^{m_{\max}}\dfrac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}r_{0d_{0}}(j,{\bm{x}})}.
\end{align*}

Therefore, we can identify $\theta_{d_{1}d_{0}}^{(10)}$ as

\begin{align*}
\theta_{d_{1}d_{0}}^{(10)}&=\mathbb{E}[Y_{1M_{0}}|U=d_{1}d_{0}]=\mathbb{E}\left[\mathbb{E}[Y_{1M_{0}}|U=d_{1}d_{0},{\bm{X}}]\Big|U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|M_{0},U,{\bm{X}}}[Y_{1m}|m,d_{1}d_{0},{\bm{X}}]f_{M_{0}|U,{\bm{X}}}(m|d_{1}d_{0},{\bm{X}})\Big|U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\frac{\sum_{j=0}^{m_{\max}}\dfrac{t(1,j,d_{1}d_{0},{\bm{X}})}{t(1,m,d_{1}d_{0},{\bm{X}})}r_{1d_{1}}(j,{\bm{X}})}{\sum_{j=0}^{m_{\max}}\dfrac{t(0,j,d_{1}d_{0},{\bm{X}})}{t(0,m,d_{1}d_{0},{\bm{X}})}r_{0d_{0}}(j,{\bm{X}})}\mu_{1d_{1}}(m,{\bm{X}})f_{M_{0}|U,{\bm{X}}}(m|d_{1}d_{0},{\bm{X}})\Big|U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\mu_{1d_{1}}(m,{\bm{X}})f_{M_{0}|U,{\bm{X}}}(m|d_{1}d_{0},{\bm{X}})\Big|U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\mu_{1d_{1}}(m,{\bm{X}})f_{M_{0}|D_{0},{\bm{X}}}(m|d_{0},{\bm{X}})\Big|U=d_{1}d_{0}\right]\quad\text{(by Lemma S3)}\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\mu_{1d_{1}}(m,{\bm{X}})f_{M_{0}|Z,D_{0},{\bm{X}}}(m|0,d_{0},{\bm{X}})\Big|U=d_{1}d_{0}\right]\quad\text{(by Assumption 2)}\\
&=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\mu_{1d_{1}}(m,{\bm{X}})r_{0d_{0}}(m,{\bm{X}})\Big|U=d_{1}d_{0}\right]\\
&=\int_{{\bm{x}}}\frac{f_{U|{\bm{X}}}(d_{1}d_{0}|{\bm{x}})}{f_{U}(d_{1}d_{0})}\left\{\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\quad\text{(by Lemma S1)}\\
&=\int_{{\bm{x}}}\frac{e_{d_{1}d_{0}}({\bm{x}})}{e_{d_{1}d_{0}}}\left\{\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}}).
\end{align*}

This completes the proof. $\square$
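
The weight $\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})$, read off from the displays above as the ratio of the two $t$-weighted sums, and the resulting inner sum over $m$ can be sketched as follows for a single fixed ${\bm{x}}$ and stratum. The function names `rho_10` and `eta_10` and all numerical inputs are hypothetical and serve only as an illustration.

```python
# rho_{d1d0}^{(10)}(m, x) and the inner sum over m, at one fixed x and stratum.
# Function names and example values are hypothetical.

def rho_10(t1, t0, r1, r0):
    """Return [rho(0), ..., rho(m_max)].

    t1[j], t0[j] : t(1, j, d1d0, x) and t(0, j, d1d0, x)
    r1[j], r0[j] : r_{1 d1}(j, x) and r_{0 d0}(j, x)
    """
    rho = []
    for m in range(len(r1)):
        num = sum(t1[j] / t1[m] * r1[j] for j in range(len(r1)))
        den = sum(t0[j] / t0[m] * r0[j] for j in range(len(r0)))
        rho.append(num / den)
    return rho

def eta_10(rho, mu1, r0):
    """Sum over m of rho(m) * mu_{1 d1}(m, x) * r_{0 d0}(m, x)."""
    return sum(rh * mu * r for rh, mu, r in zip(rho, mu1, r0))

r1 = [0.2, 0.3, 0.5]
r0 = [0.5, 0.3, 0.2]
mu1 = [1.0, 2.0, 3.0]

rho_const = rho_10([2.0, 2.0, 2.0], [3.0, 3.0, 3.0], r1, r0)  # t constant in m
eta_const = eta_10(rho_const, mu1, r0)
rho_sens = rho_10([1.0, 1.2, 1.5], [1.0, 1.0, 1.0], r1, r0)   # t varies with m
eta_sens = eta_10(rho_sens, mu1, r0)
```

When $t$ does not vary with the mediator value, $\rho_{d_{1}d_{0}}^{(10)}$ equals one for every $m$ and the inner sum reduces to $\sum_{m}\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})$.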

Proof of Proposition S4. Following the notation in the proofs of Theorem 4 and Proposition S2, we let $\widetilde{h}_{nuisance}=\{\widetilde{\pi}_{z}({\bm{x}}),\widetilde{p}_{zd}({\bm{x}}),\widetilde{r}_{zd}(m,{\bm{x}}),\widetilde{\mu}_{zd}(m,{\bm{x}})\}$ be the probability limit of $\widehat{h}_{nuisance}^{\text{par}}$. Because we work under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, we always have $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$. This implies that the probability limit of $\widehat{\rho}_{d_{1}d_{0}}^{(zz^{\prime}),\text{par}}(m,{\bm{x}})$ always equals $\rho_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$, because $\widehat{\rho}_{d_{1}d_{0}}^{(zz^{\prime}),\text{par}}(m,{\bm{x}})$ is a function only of the sensitivity weight $t$ and the mediator model $\mathcal{M}_{m}$. Also, the probability limit of $\widehat{p}_{zd}^{\text{dr}}$, denoted by $\widetilde{p}_{zd}$, always equals the true value $p_{zd}$, because $\widehat{p}_{zd}^{\text{dr}}$ is doubly robust under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$. The previous discussion suggests that the probability limit of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(t)$ is

\begin{align*}
\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)={}&\mathbb{E}\Big\{\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widetilde{p}_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widetilde{p}_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\widetilde{\eta}_{10}^{\rho}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\\
&+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{1},Z=1)}{\widetilde{p}_{1d_{1}}({\bm{X}})\widetilde{\pi}_{1}({\bm{X}})}\frac{r_{0d_{0}}(M,{\bm{X}})}{r_{1d_{1}}(M,{\bm{X}})}\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\left\{Y-\widetilde{\mu}_{1d_{1}}(M,{\bm{X}})\right\}\\
&+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{0},Z=0)}{\widetilde{p}_{0d_{0}}({\bm{X}})\widetilde{\pi}_{0}({\bm{X}})}\left\{\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\widetilde{\mu}_{1d_{1}}(M,{\bm{X}})-\widetilde{\eta}_{10}^{\rho}({\bm{X}})\right\}\\
&+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\widetilde{\eta}_{10}^{\rho}({\bm{X}})\Big\}
\end{align*}

under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, where

\begin{align*}
\widetilde{\eta}_{10}^{\rho}({\bm{x}})=\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\widetilde{\mu}_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}}).
\end{align*}

In what follows, we show that $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\theta_{d_{1}d_{0}}^{(10)}$ under Scenario I ($\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$), Scenario II ($\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$), or Scenario III ($\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$), which establishes the triple robustness of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(t)$.

Scenario I (πem\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}):

In Scenario I, π~z(𝒙)=πz(𝒙)\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}}) and p~zd(𝒙)=pzd(𝒙)\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}}) but generally μ~zd(m,𝒙)μzd(m,𝒙)\widetilde{\mu}_{zd}(m,{\bm{x}})\neq\mu_{zd}(m,{\bm{x}}). Therefore, we can rewrite θ(10),mrd1d0(t)=j=14Δj\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\sum_{j=1}^{4}\Delta_{j}, where

\begin{align*}
\Delta_{1}&=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}(\bm{X})\right\}}{\pi_{z^{*}}(\bm{X})}-k\frac{(1-Z)\left\{D-p_{01}(\bm{X})\right\}}{\pi_{0}(\bm{X})}\right)\frac{\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,\bm{X})\widetilde{\mu}_{1d_{1}}(m,\bm{X})r_{0d_{0}}(m,\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\right],\\
\Delta_{2}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{1},Z=1)}{p_{1d_{1}}(\bm{X})\pi_{1}(\bm{X})}\frac{r_{0d_{0}}(M,\bm{X})}{r_{1d_{1}}(M,\bm{X})}\rho_{d_{1}d_{0}}^{(10)}(M,\bm{X})Y\right],\\
\Delta_{3}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\left\{\frac{\mathbb{I}(D=d_{0},Z=0)}{p_{0d_{0}}(\bm{X})\pi_{0}(\bm{X})}-\frac{\mathbb{I}(D=d_{1},Z=1)}{p_{1d_{1}}(\bm{X})\pi_{1}(\bm{X})}\frac{r_{0d_{0}}(M,\bm{X})}{r_{1d_{1}}(M,\bm{X})}\right\}\rho_{d_{1}d_{0}}^{(10)}(M,\bm{X})\widetilde{\mu}_{1d_{1}}(M,\bm{X})\right],\\
\Delta_{4}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\left\{1-\frac{\mathbb{I}(D=d_{0},Z=0)}{p_{0d_{0}}(\bm{X})\pi_{0}(\bm{X})}\right\}\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,\bm{X})\widetilde{\mu}_{1d_{1}}(m,\bm{X})r_{0d_{0}}(m,\bm{X})\right].
\end{align*}

One can show that $\Delta_{1}=\Delta_{3}=\Delta_{4}=0$ by the law of iterated expectations, and

\begin{align*}
\Delta_{2}={}&\int_{\bm{x}}\frac{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})}{p_{z^{*}d^{*}}-kp_{01}}\frac{1}{p_{1d_{1}}(\bm{x})\pi_{1}(\bm{x})}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{0d_{0}}(m,\bm{x})}{r_{1d_{1}}(m,\bm{x})}\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})\int_{y}y\,\text{d}\mathbb{P}_{Y|Z,D,M,\bm{X}}(y|1,d_{1},m,\bm{x})\,r_{1d_{1}}(m,\bm{x})\right\}\\
&\quad\times f_{D|Z,\bm{X}}(d_{1}|1,\bm{x})f_{Z|\bm{X}}(1|\bm{x})\,\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\\
={}&\int_{\bm{x}}\frac{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{0d_{0}}(m,\bm{x})}{r_{1d_{1}}(m,\bm{x})}\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})\int_{y}y\,\text{d}\mathbb{P}_{Y|Z,D,M,\bm{X}}(y|1,d_{1},m,\bm{x})\,r_{1d_{1}}(m,\bm{x})\right\}\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\\
={}&\int_{\bm{x}}\frac{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{0d_{0}}(m,\bm{x})}{r_{1d_{1}}(m,\bm{x})}\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})\mu_{1d_{1}}(m,\bm{x})r_{1d_{1}}(m,\bm{x})\right\}\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\\
={}&\int_{\bm{x}}\frac{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})\mu_{1d_{1}}(m,\bm{x})r_{0d_{0}}(m,\bm{x})\right\}\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\\
={}&\theta_{d_{1}d_{0}}^{(10)},
\end{align*}

where the last equality follows from Proposition S3. This shows that $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\theta^{(10)}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$.
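
The iterated-expectation step for the three vanishing terms can be spelled out as follows. This is a sketch that assumes, consistent with the display for $\Delta_{2}$, that $\pi_{z}(\bm{x})=f_{Z|\bm{X}}(z|\bm{x})$, $p_{zd}(\bm{x})=f_{D|Z,\bm{X}}(d|z,\bm{x})$, and $r_{zd}(m,\bm{x})=f_{M|Z,D,\bm{X}}(m|z,d,\bm{x})$; the shorthand $g(m,\bm{x})=\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})\widetilde{\mu}_{1d_{1}}(m,\bm{x})$ is introduced here only for illustration.

\begin{align*}
\mathbb{E}\left[\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}(\bm{X})\right\}\,\middle|\,\bm{X}\right]&=\pi_{z^{*}}(\bm{X})\left\{f_{D|Z,\bm{X}}(d^{*}|z^{*},\bm{X})-p_{z^{*}d^{*}}(\bm{X})\right\}=0,\\
\mathbb{E}\left[1-\frac{\mathbb{I}(D=d_{0},Z=0)}{p_{0d_{0}}(\bm{X})\pi_{0}(\bm{X})}\,\middle|\,\bm{X}\right]&=1-\frac{p_{0d_{0}}(\bm{X})\pi_{0}(\bm{X})}{p_{0d_{0}}(\bm{X})\pi_{0}(\bm{X})}=0,\\
\mathbb{E}\left[\frac{\mathbb{I}(D=d_{0},Z=0)}{p_{0d_{0}}(\bm{X})\pi_{0}(\bm{X})}g(M,\bm{X})\,\middle|\,\bm{X}\right]&=\sum_{m=0}^{m_{\max}}g(m,\bm{X})r_{0d_{0}}(m,\bm{X})=\mathbb{E}\left[\frac{\mathbb{I}(D=d_{1},Z=1)}{p_{1d_{1}}(\bm{X})\pi_{1}(\bm{X})}\frac{r_{0d_{0}}(M,\bm{X})}{r_{1d_{1}}(M,\bm{X})}g(M,\bm{X})\,\middle|\,\bm{X}\right].
\end{align*}

The first line, together with the same argument applied to $(1-Z)\{D-p_{01}(\bm{X})\}$, gives $\Delta_{1}=0$; the second gives $\Delta_{4}=0$; and the third gives $\Delta_{3}=0$. In each case the remaining factors depend on $\bm{X}$ only, so the misspecified $\widetilde{\mu}_{1d_{1}}$ never needs to equal the truth.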

Scenario II ($\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$):

In Scenario II, $\widetilde{\pi}_{z}(\bm{x})=\pi_{z}(\bm{x})$ and $\widetilde{\mu}_{zd}(m,\bm{x})=\mu_{zd}(m,\bm{x})$, but generally $\widetilde{p}_{zd}(\bm{x})\neq p_{zd}(\bm{x})$. Observing this, we can rewrite $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\sum_{j=1}^{4}\Delta_{j}$, where

\begin{align*}
\Delta_{1}&=\mathbb{E}\left[\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}(\bm{X})}-k\frac{(1-Z)D}{\pi_{0}(\bm{X})}\right\}\frac{\eta_{10}^{\rho}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\right],\\
\Delta_{2}&=\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}(\bm{X})-k\widetilde{p}_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{1},Z=1)}{\widetilde{p}_{1d_{1}}(\bm{X})\pi_{1}(\bm{X})}\frac{r_{0d_{0}}(M,\bm{X})}{r_{1d_{1}}(M,\bm{X})}\rho_{d_{1}d_{0}}^{(10)}(M,\bm{X})\left\{Y-\mu_{1d_{1}}(M,\bm{X})\right\}\right],\\
\Delta_{3}&=\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}(\bm{X})-k\widetilde{p}_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{0},Z=0)}{\widetilde{p}_{0d_{0}}(\bm{X})\pi_{0}(\bm{X})}\left\{\rho_{d_{1}d_{0}}^{(10)}(M,\bm{X})\mu_{1d_{1}}(M,\bm{X})-\eta_{10}^{\rho}(\bm{X})\right\}\right],\\
\Delta_{4}&=\mathbb{E}\left[\left(\left\{1-\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}(\bm{X})}\right\}\widetilde{p}_{z^{*}d^{*}}(\bm{X})-k\left\{1-\frac{1-Z}{\pi_{0}(\bm{X})}\right\}\widetilde{p}_{01}(\bm{X})\right)\frac{\eta_{10}^{\rho}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\right],
\end{align*}

where $\eta_{10}^{\rho}(\bm{x})=\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})\mu_{1d_{1}}(m,\bm{x})r_{0d_{0}}(m,\bm{x})$. One can verify $\Delta_{2}=\Delta_{3}=\Delta_{4}=0$ by the law of iterated expectations, and

\begin{align*}
\Delta_{1}={}&\int_{\bm{x}}\left\{\frac{1}{\pi_{z^{*}}(\bm{x})}\frac{\eta_{10}^{\rho}(\bm{x})}{p_{z^{*}d^{*}}-kp_{01}}\right\}f_{D|Z,\bm{X}}(d^{*}|z^{*},\bm{x})f_{Z|\bm{X}}(z^{*}|\bm{x})\,\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\\
&-\int_{\bm{x}}\left\{\frac{k}{\pi_{0}(\bm{x})}\frac{\eta_{10}^{\rho}(\bm{x})}{p_{z^{*}d^{*}}-kp_{01}}\right\}f_{D|Z,\bm{X}}(1|0,\bm{x})f_{Z|\bm{X}}(0|\bm{x})\,\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\\
={}&\int_{\bm{x}}\left\{f_{D|Z,\bm{X}}(d^{*}|z^{*},\bm{x})-kf_{D|Z,\bm{X}}(1|0,\bm{x})\right\}\frac{\eta_{10}^{\rho}(\bm{x})}{p_{z^{*}d^{*}}-kp_{01}}\,\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\\
={}&\int_{\bm{x}}\frac{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{10}^{\rho}(\bm{x})\,\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\\
={}&\int_{\bm{x}}\frac{p_{z^{*}d^{*}}(\bm{x})-kp_{01}(\bm{x})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})\mu_{1d_{1}}(m,\bm{x})r_{0d_{0}}(m,\bm{x})\right\}\text{d}\mathbb{P}_{\bm{X}}(\bm{x})\\
={}&\theta_{d_{1}d_{0}}^{(10)}.
\end{align*}

Therefore, we have obtained $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\theta^{(10)}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.
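
Why the other three terms vanish in this scenario can be sketched as follows, assuming $\mu_{zd}(m,\bm{x})=\mathbb{E}[Y|Z=z,D=d,M=m,\bm{X}=\bm{x}]$, $r_{zd}(m,\bm{x})=f_{M|Z,D,\bm{X}}(m|z,d,\bm{x})$, and $\pi_{z}(\bm{x})=f_{Z|\bm{X}}(z|\bm{x})$:

\begin{align*}
\mathbb{E}\left[Y-\mu_{1d_{1}}(M,\bm{X})\,\middle|\,Z=1,D=d_{1},M,\bm{X}\right]&=0&&(\Delta_{2}),\\
\mathbb{E}\left[\rho_{d_{1}d_{0}}^{(10)}(M,\bm{X})\mu_{1d_{1}}(M,\bm{X})\,\middle|\,Z=0,D=d_{0},\bm{X}\right]&=\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,\bm{X})\mu_{1d_{1}}(m,\bm{X})r_{0d_{0}}(m,\bm{X})=\eta_{10}^{\rho}(\bm{X})&&(\Delta_{3}),\\
\mathbb{E}\left[1-\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}(\bm{X})}\,\middle|\,\bm{X}\right]&=0&&(\Delta_{4}).
\end{align*}

None of these steps uses the principal score model: the misspecified $\widetilde{p}_{zd}(\bm{X})$ enters $\Delta_{2}$, $\Delta_{3}$, and $\Delta_{4}$ only through multipliers that are fixed given the conditioning variables.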

Scenario III ($\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$):

In Scenario III, $\widetilde{\mu}_{zd}(m,\bm{x})=\mu_{zd}(m,\bm{x})$ and $\widetilde{p}_{zd}(\bm{x})=p_{zd}(\bm{x})$, but generally $\widetilde{\pi}_{z}(\bm{x})\neq\pi_{z}(\bm{x})$. Therefore, we have $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\sum_{j=1}^{4}\Delta_{j}$, where

\begin{align*}
\Delta_{1}&=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}(\bm{X})\right\}}{\widetilde{\pi}_{z^{*}}(\bm{X})}-k\frac{(1-Z)\left\{D-p_{01}(\bm{X})\right\}}{\widetilde{\pi}_{0}(\bm{X})}\right)\frac{\eta_{10}^{\rho}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\right],\\
\Delta_{2}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{1},Z=1)}{p_{1d_{1}}(\bm{X})\widetilde{\pi}_{1}(\bm{X})}\frac{r_{0d_{0}}(M,\bm{X})}{r_{1d_{1}}(M,\bm{X})}\rho_{d_{1}d_{0}}^{(10)}(M,\bm{X})\left\{Y-\mu_{1d_{1}}(M,\bm{X})\right\}\right],\\
\Delta_{3}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{0},Z=0)}{p_{0d_{0}}(\bm{X})\widetilde{\pi}_{0}(\bm{X})}\left\{\rho_{d_{1}d_{0}}^{(10)}(M,\bm{X})\mu_{1d_{1}}(M,\bm{X})-\eta_{10}^{\rho}(\bm{X})\right\}\right],\\
\Delta_{4}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}(\bm{X})-kp_{01}(\bm{X})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{10}^{\rho}(\bm{X})\right],
\end{align*}

where $\eta_{10}^{\rho}(\bm{x})=\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,\bm{x})\mu_{1d_{1}}(m,\bm{x})r_{0d_{0}}(m,\bm{x})$. Noting that $\Delta_{1}=\Delta_{2}=\Delta_{3}=0$ by the law of iterated expectations and $\Delta_{4}=\theta_{d_{1}d_{0}}^{(10)}$ as shown in Proposition S3, we obtain $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\theta^{(10)}_{d_{1}d_{0}}$ under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

We have now proved that $\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)$ is consistent for $\theta^{(10)}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$. One can also show that $\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)$ is asymptotically normal under regularity conditions similar to those given in the proof of Theorem 4; the details are omitted for brevity. $\square$

E Figures and Tables

Figure S1: Simulation results for estimators of $\theta_{11}^{(10)}$ across 6 different scenarios with sample size $n=1{,}000$. Scenarios (i)–(vi) are described in Section 5. The horizontal line in each panel is the true value of $\theta_{11}^{(10)}$. The yellow highlighted boxplots indicate that the corresponding estimators are expected to be consistent by large-sample theory.
Figure S2: Simulation results for estimators of $\theta_{00}^{(10)}$ across 6 different scenarios with sample size $n=1{,}000$. Scenarios (i)–(vi) are described in Section 5. The horizontal line in each panel is the true value of $\theta_{00}^{(10)}$. The yellow highlighted boxplots indicate that the corresponding estimators are expected to be consistent by large-sample theory.
Figure S3: Causal moderated mediation analysis conditional on participants’ age, WHO-LARES study. The conditional natural (in)direct effects are defined on the risk ratio scale and are calculated using the R package moderate.mediation, based on the methodology in Qin and Wang (2024).
Figure S4: Sensitivity analysis of $\text{PNDE}_{10}$ to violation of the principal ignorability assumption in the JOBS II study. Panel A: Contour plot of $\widehat{\text{PNDE}}_{10}=\widehat{\theta}_{10}^{(10),\text{mr}}(\xi_{\bm{\lambda}})-\widehat{\theta}_{10}^{(00),\text{mr}}(\xi_{\bm{\lambda}})$ for fixed values of the sensitivity parameters $\{\lambda_{M},\lambda_{Y}\}\in[0.5,1.5]\times[0.75,1.25]$, where pixels marked ‘$\times$’ indicate that the corresponding 95% interval estimate covers 0. Panel B: $\widehat{\text{PNDE}}_{10}$ with its 95% interval estimate for $\lambda_{M}$ ranging from 0.5 to 1.5 while fixing $\lambda_{Y}=1$. Panel C: $\widehat{\text{PNDE}}_{10}$ with its 95% interval estimate for $\lambda_{Y}$ ranging from 0.75 to 1.25 while fixing $\lambda_{M}=1$.
Figure S5: Sensitivity analysis of $\text{PNIE}_{10}$ to violation of the principal ignorability assumption in the JOBS II study. We report $\widehat{\text{PNIE}}_{10}=\widehat{\theta}_{10}^{(11),\text{mr}}(\xi_{\bm{\lambda}})-\widehat{\theta}_{10}^{(10),\text{mr}}(\xi_{\bm{\lambda}})$ and its 95% bootstrap confidence interval with the sensitivity parameter $\lambda_{M}$ ranging from 0.5 to 1.5.
Figure S6: Sensitivity analysis of principal natural mediation effects among compliers to violation of the ignorability of mediator assumption in the JOBS II study. Panel A: $\widehat{\text{PNDE}}_{10}=\widehat{\theta}_{10}^{(10),\text{mr}}(t_{\bm{\zeta}})-\widehat{\theta}_{10}^{(00),\text{mr}}$ and its 95% interval estimate with $\zeta$ ranging from 0.8 to 1.2. Panel B: $\widehat{\text{PNIE}}_{10}=\widehat{\theta}_{10}^{(11),\text{mr}}-\widehat{\theta}_{10}^{(10),\text{mr}}(t_{\bm{\zeta}})$ and its 95% interval estimate with $\zeta$ ranging from 0.8 to 1.2.
Table S1: Analysis of the ITT natural mediation effects in the JOBS II study.
Method ITT-NIE ITT-NDE ITT
a -0.017 (-0.036, -0.005) -0.075 (-0.156, -0.001) -0.092 (-0.173, -0.020)
b -0.014 (-0.031, -0.003) -0.073 (-0.148, -0.001) -0.087 (-0.165, -0.017)
c -0.021 (-0.045, -0.005) -0.072 (-0.148, -0.002) -0.092 (-0.174, -0.021)
d -0.014 (-0.031, -0.003) -0.072 (-0.147, 0.000) -0.086 (-0.165, -0.016)
mr -0.015 (-0.032, -0.003) -0.072 (-0.151, -0.001) -0.088 (-0.167, -0.016)
np -0.017 (-0.031, -0.004) -0.067 (-0.146, 0.013) -0.084 (-0.162, -0.005)
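
As a quick arithmetic check on Table S1, the sketch below verifies that each reported ITT point estimate is close to the sum of the reported ITT-NIE and ITT-NDE point estimates. It assumes the additive decomposition ITT = ITT-NIE + ITT-NDE on the difference scale; the dictionary name and the tolerance of 0.0015 (chosen to absorb rounding of three numbers to three decimals) are illustrative choices, not part of the original analysis.

```python
# Point estimates copied from Table S1: (ITT-NIE, ITT-NDE, ITT).
table_s1 = {
    "a": (-0.017, -0.075, -0.092),
    "b": (-0.014, -0.073, -0.087),
    "c": (-0.021, -0.072, -0.092),
    "d": (-0.014, -0.072, -0.086),
    "mr": (-0.015, -0.072, -0.088),
    "np": (-0.017, -0.067, -0.084),
}

# Assumed tolerance: three values, each rounded to 3 decimals.
TOL = 0.0015

# Absolute gap between NIE + NDE and the reported ITT, per method.
gaps = {m: abs(nie + nde - itt) for m, (nie, nde, itt) in table_s1.items()}

for method, gap in gaps.items():
    print(f"{method}: |NIE + NDE - ITT| = {gap:.3f}")
```

Any gap at the third decimal is compatible with rounding of the displayed values.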
Table S2: Analysis of the principal natural mediation effects in the JOBS II study.
| Method | Estimand | Compliers ($\widehat{e}_{10}^{\text{np}}=0.55$) | Never-takers ($\widehat{e}_{00}^{\text{np}}=0.45$) |
|---|---|---|---|
| a | PNIE | -0.030 (-0.062, -0.009) | 0.000 (-0.007, 0.005) |
| a | PNDE | -0.083 (-0.176, 0.002) | -0.063 (-0.174, 0.029) |
| a | PCE | -0.113 (-0.206, -0.027) | -0.063 (-0.172, 0.027) |
| b | PNIE | -0.024 (-0.052, -0.006) | 0.000 (-0.006, 0.005) |
| b | PNDE | -0.082 (-0.167, 0.003) | -0.062 (-0.166, 0.028) |
| b | PCE | -0.106 (-0.186, -0.026) | -0.062 (-0.165, 0.027) |
| c | PNIE | -0.030 (-0.071, 0.001) | -0.007 (-0.032, 0.015) |
| c | PNDE | -0.083 (-0.171, 0.003) | -0.056 (-0.160, 0.032) |
| c | PCE | -0.113 (-0.207, -0.029) | -0.063 (-0.171, 0.029) |
| d | PNIE | -0.024 (-0.052, -0.006) | 0.000 (-0.006, 0.005) |
| d | PNDE | -0.081 (-0.166, 0.003) | -0.060 (-0.166, 0.029) |
| d | PCE | -0.105 (-0.187, -0.025) | -0.060 (-0.167, 0.029) |
| mr | PNIE | -0.026 (-0.055, 0.006) | 0.000 (-0.006, 0.006) |
| mr | PNDE | -0.083 (-0.170, 0.002) | -0.058 (-0.160, 0.031) |
| mr | PCE | -0.109 (-0.191, -0.027) | -0.058 (-0.160, 0.030) |
| np | PNIE | -0.029 (-0.052, -0.006) | -0.001 (-0.004, 0.003) |
| np | PNDE | -0.066 (-0.156, 0.023) | -0.066 (-0.163, 0.032) |
| np | PCE | -0.096 (-0.182, -0.009) | -0.067 (-0.163, 0.032) |
| Rudolph et al.§ | PNIE | -0.030 (-0.053, -0.006) | |
| Rudolph et al.§ | PNDE | -0.115 (-0.251, 0.022) | |
| Rudolph et al.§ | PCE | -0.145 (-0.280, -0.008) | |

§ ‘Rudolph et al.’ is the nonparametric efficient estimator given by Rudolph et al. (2024); see Remark 1 for more details of this approach.

Table S3: Mean and standard deviation of baseline characteristics among the compliers and never-takers, JOBS II study.
Variable Compliers Never-takers ASD§
Proportion 55% 45%
Gender (male) 0.53 (0.50) 0.62 (0.48) 0.19
Age (years) 39.14 (10.37) 35.51 (9.40) 0.37
White race 0.84 (0.37) 0.77 (0.42) 0.15
Depression 1.89 (0.55) 1.87 (0.55) 0.03
Economic hardship 3.04 (0.93) 3.11 (0.94) 0.07
Motivation 5.28 (0.63) 5.05 (0.64) 0.35
Marriage (baseline: never married)
Married 0.49 (0.50) 0.43 (0.50) 0.11
Separated 0.03 (0.17) 0.03 (0.18) 0.03
Divorced 0.18 (0.38) 0.19 (0.4) 0.04
Widowed 0.02 (0.14) 0.02 (0.15) 0.02
Education (baseline: less than high school)
High school 0.32 (0.47) 0.33 (0.47) 0.02
Post-secondary non-tertiary education 0.32 (0.47) 0.40 (0.49) 0.17
Bachelor’s degree 0.19 (0.39) 0.10 (0.31) 0.23
Higher than a Bachelor’s degree 0.13 (0.33) 0.09 (0.28) 0.13
Assertiveness 3.39 (0.87) 3.56 (0.83) 0.21
‡ The stratum-specific means and standard deviations of the baseline characteristics are calculated following the method in Cheng et al. (2023a), which computes weighted means and standard deviations with weights based on the principal scores (estimated according to the nonparametric efficient estimator).

§ ASD is the absolute standardized difference across the two principal strata. For a given baseline covariate, its ASD is calculated as $|\bar{x}_{10}-\bar{x}_{00}|/\sqrt{0.5(s_{10}^{2}+s_{00}^{2})}$, where $\bar{x}_{d_{1}d_{0}}$ and $s_{d_{1}d_{0}}$ are the estimated mean and standard deviation of this covariate in the stratum $U=d_{1}d_{0}$.
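
The ASD formula can be illustrated with a minimal sketch. The function name `asd` is hypothetical, and the inputs are the rounded Age values displayed in Table S3, so other rows recomputed this way may differ from the reported ASD in the second decimal.

```python
import math


def asd(mean_a, sd_a, mean_b, sd_b):
    """Absolute standardized difference between two strata.

    Implements |mean_a - mean_b| / sqrt(0.5 * (sd_a^2 + sd_b^2)).
    """
    pooled_sd = math.sqrt(0.5 * (sd_a**2 + sd_b**2))
    return abs(mean_a - mean_b) / pooled_sd


# Age (years) row of Table S3:
# compliers 39.14 (10.37), never-takers 35.51 (9.40).
age_asd = asd(39.14, 10.37, 35.51, 9.40)
print(round(age_asd, 2))  # 0.37, matching the reported ASD for Age
```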

Table S4: Mean and standard deviation of baseline characteristics across the doomed, harmed, and immune strata, WHO-LARES study.
Variable Doomed Harmed Immune Max ASD§
Proportion 51% 8% 41%
Personal characteristics
Female 0.63 (0.47) 0.56 (0.52) 0.48 (0.48) 0.31
Age (years) 46.53 (17.89) 46.80 (17.99) 46.76 (17.3) 0.02
Married 0.59 (0.49) 0.63 (0.48) 0.64 (0.48) 0.09
Post-secondary education 0.27 (0.45) 0.27 (0.45) 0.29 (0.45) 0.04
Employed 0.58 (0.49) 0.58 (0.49) 0.62 (0.49) 0.09
Non-smoking 0.50 (0.50) 0.49 (0.50) 0.47 (0.5) 0.06
Dwelling condition
Owning the house 0.74 (0.44) 0.72 (0.45) 0.76 (0.43) 0.08
House size greater than $50\,\text{m}^{2}$ 0.86 (0.35) 0.84 (0.37) 0.87 (0.34) 0.08
Crowding ($\geq 1$ resident per room) 0.64 (0.48) 0.64 (0.48) 0.64 (0.48) 0.01
Self-evaluation on dwelling condition
Satisfied with the heating system 0.85 (0.35) 0.85 (0.35) 0.89 (0.32) 0.10
Satisfied with natural light 0.75 (0.43) 0.73 (0.44) 0.77 (0.42) 0.10
‡ The stratum-specific means and standard deviations of the baseline characteristics are calculated following the method in Cheng et al. (2023a), which computes weighted means and standard deviations with weights based on the principal scores (estimated according to the nonparametric efficient estimator).

§ Max ASD is the maximum pairwise absolute standardized difference across the three principal strata. For a given baseline covariate, its Max ASD is calculated as the maximum of $|\bar{x}_{d_{1}d_{0}}-\bar{x}_{d_{1}^{\prime}d_{0}^{\prime}}|/\sqrt{0.5(s_{d_{1}d_{0}}^{2}+s_{d_{1}^{\prime}d_{0}^{\prime}}^{2})}$ over all $d_{1}d_{0}\neq d_{1}^{\prime}d_{0}^{\prime}\in\{11,10,00\}$, where $\bar{x}_{d_{1}d_{0}}$ and $s_{d_{1}d_{0}}$ are the estimated mean and standard deviation of this covariate in the stratum $U=d_{1}d_{0}$.
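
The pairwise maximum can be sketched as below. The function name `max_asd` and the dictionary layout are hypothetical, and the inputs are the rounded values shown in the Female row of Table S4. With these rounded inputs the result is about 0.32 rather than the reported 0.31, a gap that is plausibly due to rounding of the displayed means and standard deviations.

```python
import math
from itertools import combinations


def max_asd(strata):
    """Return (largest pairwise ASD, pair of strata attaining it).

    `strata` maps a stratum label to a (mean, sd) tuple.
    """
    def asd(a, b):
        (mean_a, sd_a), (mean_b, sd_b) = strata[a], strata[b]
        return abs(mean_a - mean_b) / math.sqrt(0.5 * (sd_a**2 + sd_b**2))

    # Enumerate all unordered pairs of strata and keep the largest ASD.
    best_pair = max(combinations(strata, 2), key=lambda pair: asd(*pair))
    return asd(*best_pair), best_pair


# Female row of Table S4 (rounded values as displayed).
female = {"doomed": (0.63, 0.47), "harmed": (0.56, 0.52), "immune": (0.48, 0.48)}
value, pair = max_asd(female)
print(pair, round(value, 3))
```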

Table S5: Analysis of the ITT natural mediation effects on a risk ratio scale, WHO-LARES study.
Method ITT-NIE$^{\text{RR}}$ ITT-NDE$^{\text{RR}}$ ITT$^{\text{RR}}$
a 1.024 (1.003, 1.045) 1.242 (1.108, 1.385) 1.271 (1.127, 1.418)
b 1.021 (1.002, 1.039) 1.239 (1.105, 1.387) 1.266 (1.121, 1.407)
c 1.029 (1.005, 1.055) 1.232 (1.098, 1.386) 1.268 (1.127, 1.414)
d 1.021 (1.002, 1.040) 1.230 (1.095, 1.381) 1.256 (1.110, 1.418)
mr 1.021 (1.003, 1.043) 1.248 (1.114, 1.405) 1.274 (1.137, 1.438)
np 1.031 (1.010, 1.053) 1.219 (1.078, 1.361) 1.257 (1.114, 1.400)
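
On the risk ratio scale the decomposition is multiplicative rather than additive. The sketch below checks that each reported ITT$^{\text{RR}}$ in Table S5 is close to the product of the reported ITT-NIE$^{\text{RR}}$ and ITT-NDE$^{\text{RR}}$. It assumes the identity ITT$^{\text{RR}}$ = ITT-NIE$^{\text{RR}}$ × ITT-NDE$^{\text{RR}}$; the variable names and the tolerance of 0.002 (to absorb rounding of the displayed values) are illustrative.

```python
# Point estimates copied from Table S5: (ITT-NIE^RR, ITT-NDE^RR, ITT^RR).
table_s5 = {
    "a": (1.024, 1.242, 1.271),
    "b": (1.021, 1.239, 1.266),
    "c": (1.029, 1.232, 1.268),
    "d": (1.021, 1.230, 1.256),
    "mr": (1.021, 1.248, 1.274),
    "np": (1.031, 1.219, 1.257),
}

# Assumed tolerance for rounding of the displayed three-decimal values.
RR_TOL = 0.002

# Absolute gap between NIE^RR * NDE^RR and the reported ITT^RR, per method.
rr_gaps = {m: abs(nie * nde - itt) for m, (nie, nde, itt) in table_s5.items()}

for method, gap in rr_gaps.items():
    print(f"{method}: |NIE x NDE - ITT| = {gap:.4f}")
```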
Table S6: Analysis of the principal natural mediation effects on a risk ratio scale, WHO-LARES study.
| Method | Estimand | Doomed ($\widehat{e}_{11}^{\text{np}}=0.51$) | Harmed ($\widehat{e}_{10}^{\text{np}}=0.08$) | Immune ($\widehat{e}_{00}^{\text{np}}=0.41$) |
|---|---|---|---|---|
| a | PNIE$^{\text{RR}}$ | 1.019 (0.992, 1.041) | 1.028 (0.990, 1.061) | 1.030 (0.995, 1.073) |
| a | PNDE$^{\text{RR}}$ | 1.252 (1.066, 1.421) | 2.112 (1.780, 2.478) | 1.087 (0.896, 1.306) |
| a | PCE$^{\text{RR}}$ | 1.276 (1.087, 1.444) | 2.171 (1.862, 2.508) | 1.120 (0.919, 1.323) |
| b | PNIE$^{\text{RR}}$ | 1.021 (0.999, 1.043) | 1.030 (0.998, 1.065) | 1.018 (0.987, 1.053) |
| b | PNDE$^{\text{RR}}$ | 1.226 (1.043, 1.408) | 2.141 (1.659, 2.775) | 1.096 (0.908, 1.310) |
| b | PCE$^{\text{RR}}$ | 1.252 (1.073, 1.438) | 2.205 (1.705, 2.833) | 1.115 (0.931, 1.311) |
| c | PNIE$^{\text{RR}}$ | 1.040 (0.998, 1.081) | 1.026 (0.984, 1.061) | 1.010 (0.943, 1.065) |
| c | PNDE$^{\text{RR}}$ | 1.226 (1.044, 1.407) | 2.113 (1.816, 2.458) | 1.099 (0.904, 1.322) |
| c | PCE$^{\text{RR}}$ | 1.275 (1.086, 1.452) | 2.168 (1.844, 2.495) | 1.111 (0.913, 1.308) |
| d | PNIE$^{\text{RR}}$ | 1.021 (0.999, 1.043) | 1.033 (0.998, 1.070) | 1.017 (0.987, 1.052) |
| d | PNDE$^{\text{RR}}$ | 1.227 (1.040, 1.403) | 2.093 (1.795, 2.441) | 1.096 (0.903, 1.322) |
| d | PCE$^{\text{RR}}$ | 1.253 (1.070, 1.433) | 2.162 (1.852, 2.490) | 1.115 (0.918, 1.319) |
| mr | PNIE$^{\text{RR}}$ | 1.017 (0.999, 1.034) | 1.029 (1.001, 1.059) | 1.025 (0.983, 1.074) |
| mr | PNDE$^{\text{RR}}$ | 1.223 (1.044, 1.403) | 2.296 (1.841, 2.982) | 1.102 (0.911, 1.328) |
| mr | PCE$^{\text{RR}}$ | 1.244 (1.070, 1.433) | 2.363 (1.897, 3.045) | 1.129 (0.945, 1.336) |
| np | PNIE$^{\text{RR}}$ | 1.025 (1.001, 1.050) | 1.046 (1.013, 1.079) | 1.035 (0.995, 1.075) |
| np | PNDE$^{\text{RR}}$ | 1.181 (1.014, 1.348) | 2.142 (1.724, 2.560) | 1.111 (0.881, 1.340) |
| np | PCE$^{\text{RR}}$ | 1.212 (1.044, 1.379) | 2.241 (1.817, 2.666) | 1.150 (0.916, 1.385) |
Table S7: Causal moderated mediation analysis for each discrete baseline characteristic based on the R package moderate.mediation (Qin and Wang, 2024), WHO-LARES study. The natural indirect effect and natural direct effect for the entire population, as output from the moderate.mediation package, are 1.021 (1.002, 1.043) and 1.251 (1.114, 1.394), respectively. All mediation effects are defined on the risk ratio scale.
Subpopulation Conditional NIE Conditional NDE
Gender
Male 1.008 (0.983, 1.038) 1.233 (1.011, 1.500)
Female 1.029 (1.005, 1.058) 1.261 (1.099, 1.440)
Current marital status
Unmarried 1.028 (0.997, 1.065) 1.246 (1.043, 1.488)
Married 1.014 (0.992, 1.037) 1.256 (1.081, 1.447)
Education
Secondary school or less 1.017 (0.999, 1.037) 1.305 (1.161, 1.476)
Post-secondary education 1.039 (0.979, 1.119) 0.972 (0.678, 1.336)
Employment status
Unemployed 1.022 (0.996, 1.054) 1.305 (1.127, 1.507)
Employed 1.020 (0.995, 1.046) 1.198 (1.001, 1.408)
Smoking status
Smoking 1.040 (1.007, 1.078) 1.204 (1.023, 1.412)
Non-smoking 1.003 (0.981, 1.025) 1.300 (1.105, 1.508)
Owning the house
No 1.025 (0.977, 1.079) 1.110 (0.868, 1.425)
Yes 1.019 (1.001, 1.038) 1.300 (1.145, 1.466)
House size
$\leq 50\,\text{m}^{2}$ 1.053 (1.005, 1.128) 1.256 (0.978, 1.594)
$>50\,\text{m}^{2}$ 1.012 (0.993, 1.030) 1.251 (1.103, 1.413)
Crowding
$<1$ resident per room 1.031 (1.001, 1.072) 1.038 (0.869, 1.239)
$\geq 1$ resident per room 1.014 (0.989, 1.038) 1.434 (1.235, 1.649)
Heating system
Unsatisfied 1.023 (1.002, 1.047) 1.244 (1.095, 1.420)
Satisfied 1.008 (0.978, 1.042) 1.279 (1.000, 1.582)
Natural light
Unsatisfied 1.032 (0.995, 1.076) 1.248 (1.032, 1.501)
Satisfied 1.013 (0.994, 1.034) 1.252 (1.087, 1.431)