Identification and multiply robust estimation in causal mediation analysis across principal strata

Chao Cheng¹,² and Fan Li¹,²
¹Department of Biostatistics, Yale School of Public Health
²Center for Methods in Implementation and Prevention Science, Yale School of Public Health

Abstract

We consider assessing causal mediation in the presence of a post-treatment event (examples include noncompliance, a clinical event, or death). We identify natural mediation effects for the entire study population and for each principal stratum characterized by the joint potential values of the post-treatment event. We derive the efficient influence function for each mediation estimand, which motivates a set of multiply robust estimators for inference. The multiply robust estimators are consistent under four types of misspecifications and are efficient when all nuisance models are correctly specified. We also develop a nonparametric efficient estimator that leverages data-adaptive machine learners to achieve efficient inference and discuss sensitivity methods to address key identification assumptions. We illustrate our methods via simulations and two real data examples.

Keywords: Causal inference, efficient influence function, endogenous subgroups, moderated mediation analysis, natural indirect effect, principal ignorability.

1 Introduction

1.1 Background and motivation

Causal mediation analysis (Imai et al., 2010) is widely used to investigate the role of a mediator ( $M$ ) in explaining the causal mechanism from a treatment ( $Z$ ) to an outcome ( $Y$ ). Under the potential outcomes framework, a primary step in causal mediation analysis is to decompose the total treatment effect into an indirect effect that works through $M$ and a direct effect that works around $M$ . While alternative definitions exist, the natural indirect and direct effects are the most relevant for studying causal mechanisms (Nguyen et al., 2021). The natural indirect effect compares potential outcomes by switching $M$ from the value it would have taken under the control condition to that under the treated condition, while fixing $Z$ to the treated condition. The natural direct effect compares potential outcomes by switching $Z$ from the control to the treated condition, while fixing $M$ to the value it would have taken under the control condition. Parametric regressions (e.g., Cheng et al., 2021, 2023b), semiparametric methods (e.g., Tchetgen Tchetgen and Shpitser, 2012), and nonparametric methods (e.g., Kim et al., 2017) have been proposed for estimating natural mediation effects, typically by assuming that $M$ is the only variable sitting on the causal chain connecting treatment and outcome.

In experimental and observational studies, it is increasingly common that a post-treatment event ( $D$ ) occurs prior to the measurement of the mediator. This event may be a post-treatment action or decision regarding uptake (e.g., noncompliance or treatment discontinuation), a clinical event (e.g., worsening of disease, adverse medication effect), or a terminal event precluding the observation of any data afterward (e.g., death). In each context, the post-treatment event provides important information in defining partially observed subgroups of the study population. An emerging interest lies in learning the treatment effect within each subgroup, and possibly the effect heterogeneity across subgroups. Under the principal stratification framework, one can characterize each subgroup with the joint potential values of $D$ under alternative conditions, referred to as principal strata more generally (Frangakis and Rubin, 2002), or as endogenous subgroups in social sciences (Page et al., 2015). Beyond understanding the total effect within each subgroup, here we are further interested in evaluating the natural mediation effects via $M$ within each subgroup characterized by a post-treatment event $D$ . Below, we give two examples that motivate such an objective (an additional example on mediation analysis with death-truncated mediator and outcome is provided in the Supplementary Material).

Example 1

(Mediation analysis with noncompliance) Noncompliance occurs if the actual treatment received ( $D$ ) differs from the treatment assignment ( $Z$ ). By Angrist et al. (1996), the study population is partitioned into four subgroups, including (i) always-takers who take the treatment regardless of assignment; (ii) never-takers who take the control regardless of assignment; (iii) compliers who comply with the assignment; (iv) defiers who take the opposite assignment. Typically, the compliers are of central interest because this is the only group for whom the average causal effect due to assignment reflects the average causal effect due to actual treatment received. As noncompliance is a post-randomization action or decision, a relevant research question in the presence of a mediator, measured prior to the outcome, is whether the treatment works through $M$ among the compliers (i.e., the complier natural mediation effect). A follow-up question is whether there is heterogeneity in the treatment effect mechanisms among the subgroups formed by noncompliance patterns.

Example 2

(Mediation analysis with an intercurrent event) In health research, disease progression, adverse reaction, or other early outcome may occur due to treatment, which are collectively referred to as an intercurrent event by the ICH E9 Estimands Framework (Kahan et al., 2023). Section 6.2 studies the role of perceived control ( $M$ ) in mediating the effect of residence in a damp and moldy dwelling ( $Z$ ) on depression ( $Y$ ), but dampness/mold related disease ( $D$ ) occurred among some study units. It is then of interest to study the indirect effect due to $M$ among those who would always develop dampness/mold related disease regardless of their living condition (i.e., the doomed stratum), those who never develop dampness/mold related disease regardless of their living condition (i.e., the immune stratum), as well as those who would develop dampness/mold related disease only if living in such condition but otherwise not (i.e., the harmed stratum). A follow-up question is whether there is variation in the natural indirect effects among these subgroups.

In both examples, the principal causal effect (PCE)—the total treatment effect within a principal stratum—can be decomposed into a principal natural indirect effect (PNIE) through the mediator and a principal natural direct effect (PNDE) around the mediator. Addressing the PNIEs and their variation allows us to unpack the overall natural indirect effect to understand for whom and under what circumstances $M$ plays a crucial role in explaining the underlying mechanism. Such analyses could also help determine whether the overall natural indirect effect is driven by one particular subgroup, in which case future interventions might be restructured to better serve an intended subpopulation. To this end, estimating principal natural mediation effects is informative in itself, but comparing them across strata may also provide additional insights. In fact, studying variations in principal natural mediation effects is related to moderated mediation analysis given measured baseline covariates (Qin and Wang, 2024). However, the difference is that we focus on endogenous subgroups defined by a post-treatment event rather than those defined by covariates. Consequently, the scientific question addressed by principal natural mediation effects generally cannot be answered by merely exploring the conditional mediation effects given observed covariates alone (an empirical comparison is provided in Section 6.2).

1.2 Prior work and our contribution

The post-treatment event poses unique challenges for the identification of the principal natural mediation effects, because (i) $D$ is a treatment-induced variable confounding the mediator-outcome relationship and (ii) the principal stratum membership is only partially observed. To tackle these challenges, a prevalent approach, usually developed under the noncompliance setting (Example 1), is to view the treatment assignment as an instrumental variable for $D$ and use the exclusion restriction to identify the mediation effects among the compliers. The exclusion restriction requires that all causal pathways from the treatment assignment to the mediator and outcome operate only through $D$ (see Figure 1 for illustration), and therefore no treatment or mediation effects exist among principal strata where $Z$ does not affect $D$ . For example, Yamamoto (2013) and Park and Kürüm (2018, 2020) considered a combination of exclusion restriction, monotonicity of treatment assignment on the treatment receipt, and mediator ignorability to nonparametrically identify the complier natural mediation effects. Instead of assuming mediator ignorability, Frölich and Huber (2017) identified the complier natural mediation effect by drawing on a second instrumental variable for the mediator. Under monotonicity and exclusion restriction, Rudolph et al. (2024) further proposed semiparametrically efficient estimators of (i) complier natural/interventional mediation effects with a single instrumental variable for the treatment and (ii) double complier interventional mediation effects with two instrumental variables for the treatment and mediator.

Figure 1: A directed acyclic diagram for mediation analysis across principal strata. Notation: $Z$ is the treatment, $D$ is the post-treatment event, $M$ is the mediator, $Y$ is the outcome, and ${\bm{X}}$ is a vector of pre-treatment covariates. There are four possible causal pathways from the treatment to outcome, indexed by (a)–(d). The exclusion restriction assumption excludes pathways (c) and (d).

A critical assumption in the prior work is the exclusion restriction, which may not always hold in open-label studies where the assignment can exert a direct psychological effect on the mediator and the outcome not through the treatment receipt. In Example 2, the PCE and PNIE among the doomed and immune strata are of interest and may be non-zero; in this instance, the exclusion restriction does not apply either. Relaxing exclusion restriction, Park and Palardy (2020) developed a maximum likelihood approach with full distributional assumptions to empirically identify principal natural mediation effects; however, the consistency of their estimators requires all parametric models to be correctly specified and bias may arise under misspecification.

Our primary interest is to identify the principal natural mediation effects in endogenous subgroups defined by a post-treatment event. We assume monotonicity and ignorability conditions for nonparametric identification, but require neither the exclusion restriction nor fully parametric modeling assumptions. In a similar context, Tchetgen Tchetgen and VanderWeele (2014) studied the marginal natural mediation effect of $M$ in the presence of a post-treatment confounder $D$ based on a nonparametric structural equation model with independent errors and monotonicity, but did not provide identification results for the finer causal mediation estimands within the principal stratum. As we explain in Section 3, our identification assumptions are weaker than those considered by Tchetgen Tchetgen and VanderWeele (2014) in a technical sense, but remain sufficient to unpack the stratum-specific mediation effects that contribute to the marginal natural mediation effect. Leveraging the semiparametric efficiency theory (Bickel et al., 1993, see Hines et al. (2022) for an overview), we characterize the efficient influence function for each estimand under the nonparametric model to motivate semiparametric estimators. Our estimators are consistent under four types of working model misspecification, and are quadruply robust. As a further improvement, a nonparametric extension is also provided to incorporate data-adaptive machine learners for efficient inference (Chernozhukov et al., 2018). Finally, we develop strategies for sensitivity analyses under violations of the key ignorability assumptions in cases when insufficient baseline covariates are collected, or when there is unmeasured treatment-induced confounding of the mediator-outcome relationship.

2 Notation, causal estimands and identification

Suppose that we observe $n$ independent copies of the quintuple $\bm{O}=\{{\bm{X}},Z,D,M,Y\}$ , where $Z\in\{0,1\}$ represents treatment assignment with 1 indicating the treated condition and 0 indicating the control condition, $D\in\{0,1\}$ is the occurrence of the post-treatment event, $M$ is a mediator measured after $D$ , $Y$ is the final outcome of interest, and ${\bm{X}}$ is a vector of pre-treatment covariates. A directed acyclic graph (DAG) summarizing the relationships among variables is in Figure 1, where $Z$ is allowed to affect $Y$ either directly or through the intermediate variables $D$ and $M$ . For a generic variable $W$ , we use $\mathbb{P}_{W}(w)$ to denote its distribution function, $f_{W}(w)$ to denote the probability mass/density function, $\mathbb{E}_{W}[W]$ to denote its expectation. Whenever applicable, we abbreviate $\mathbb{P}_{W}(w)$ , $\mathbb{E}_{W}[W]$ , and $f_{W}(w)$ as $\mathbb{P}(w)$ , $\mathbb{E}[W]$ , and $f(w)$ without ambiguity. Moreover, we define $\mathbb{P}_{n}[W]=\frac{1}{n}\sum_{i=1}^{n}W_{i}$ as the empirical average operator, $\mathbb{I}(\cdot)$ as the indicator function, $|\cdot|$ as the absolute value, $\|\cdot\|$ as the $L_{2}(\mathbb{P})$ -norm such that $\|g\|^{2}=\int g^{2}d\mathbb{P}$ .

We pursue the potential outcomes framework to define causal mediation estimands (Imai et al., 2010). Let $D_{z}$ be the potential value of the post-treatment event under treatment $z$ , $M_{zd}$ the potential value of the mediator when the treatment is $z$ and $D$ is set to $d$ , $Y_{zdm}$ the potential outcome when the treatment is set to $z$ , the post-treatment event is set to $d$ , and the mediator is set to $m$ . Furthermore, we write $M_{z}=M_{zD_{z}}$ such that the potential value of the mediator under treatment $z$ is identical to that under treatment $z$ when $D$ takes its natural value under treatment $z$ . Similarly, we have $Y_{z}=Y_{zD_{z}M_{z}}$ and $Y_{zm}=Y_{zD_{z}m}$ . The equalities $M_{z}=M_{zD_{z}}$ , $Y_{z}=Y_{zD_{z}M_{z}}$ , and $Y_{zm}=Y_{zD_{z}m}$ are collectively referred to as the composition of potential values (VanderWeele and Vansteelandt, 2009).
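To make the nested notation concrete, the sketch below encodes the composition of potential values for a single hypothetical unit. The structural functions `D`, `M`, and `Y` and their coefficients are invented for illustration and do not come from the paper; only the way they are composed mirrors the equalities above.

```python
# Hypothetical structural functions for one unit (illustrative values only).
def D(z):
    """D_z: this unit is a complier, so (D_1, D_0) = (1, 0)."""
    return 1 if z == 1 else 0

def M(z, d):
    """M_{zd}: mediator when treatment is z and the event is set to d."""
    return 2 * z + d

def Y(z, d, m):
    """Y_{zdm}: outcome when treatment is z, the event is d, and the mediator is m."""
    return 1.0 + z + 0.5 * d + 0.25 * m

def M_z(z):
    """Composition M_z = M_{z D_z}."""
    return M(z, D(z))

def Y_zm(z, m):
    """Composition Y_{zm} = Y_{z D_z m}."""
    return Y(z, D(z), m)

def Y_cross(z, z_prime):
    """Nested potential outcome Y_{z M_{z'}} = Y_{z D_z M_{z' D_{z'}}}."""
    return Y_zm(z, M_z(z_prime))

def Y_z(z):
    """Composition Y_z = Y_{z D_z M_z}."""
    return Y_cross(z, z)

# Unit-level contrasts built from the nested potential outcomes.
unit_total = Y_z(1) - Y_z(0)
unit_indirect = Y_cross(1, 1) - Y_cross(1, 0)
unit_direct = Y_cross(1, 0) - Y_cross(0, 0)
```

For this unit the total contrast splits exactly into the indirect and direct contrasts, which is the unit-level analogue of the decompositions defined later in this section.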

To proceed, we adopt the principal stratification framework (Frangakis and Rubin, 2002) and use the joint potential values of the post-treatment event $U=(D_{1},D_{0})$ to partition the study population into four subgroups, $\{(1,0),(1,1),(0,0),(0,1)\}$ . To contextualize the development, we take the noncompliance as a running example throughout, in which case these four strata are named as compliers, always-takers, never-takers, and defiers. For notational convenience, we re-express the joint potential values $(D_{1},D_{0})$ as $D_{1}D_{0}$ so that $U\in\{10,11,00,01\}$ . A central property is that $U$ is unaffected by the treatment and can be treated as a baseline covariate; therefore causal comparisons conditional on $U$ are well-defined subgroup causal effects. We define $e_{d_{1}d_{0}}({\bm{X}})=f_{U|{\bm{X}}}(d_{1}d_{0}|{\bm{X}})$ and $e_{d_{1}d_{0}}=f_{U}(d_{1}d_{0})$ as the proportion of principal stratum $U$ conditional on and marginalized over covariates ${\bm{X}}$ , where $e_{d_{1}d_{0}}({\bm{X}})$ is referred to as the principal score (Ding and Lu, 2017). Since the stratum membership $U=D_{1}D_{0}$ is only partially observed, the principal score $e_{d_{1}d_{0}}({\bm{X}})$ and its marginal counterpart $e_{d_{1}d_{0}}$ cannot be estimated without further assumptions.

The PCE is defined as the effect of treatment assignment in each principal stratum (Jo and Stuart, 2009; Ding and Lu, 2017) and is written as:

\text{PCE}_{d_{1}d_{0}}=\mathbb{E}\left[Y_{1}-Y_{0}|U=d_{1}d_{0}\right]\quad(d_{1}d_{0}=10,11,00,01)

which equals $\mathbb{E}\left[Y_{1M_{1}}-Y_{0M_{0}}|U=d_{1}d_{0}\right]$ by composition of the potential outcome. To assess mediation, we decompose $\text{PCE}_{d_{1}d_{0}}$ into a principal natural indirect effect ( $\text{PNIE}_{d_{1}d_{0}}$ ) and a principal natural direct effect ( $\text{PNDE}_{d_{1}d_{0}}$ ):

\displaystyle\underbrace{\mathbb{E}\left[Y_{1M_{1}}-Y_{0M_{0}}|U=d_{1}d_{0}\right]}_{\text{PCE}_{d_{1}d_{0}}}=\underbrace{\mathbb{E}\left[Y_{1M_{1}}-Y_{1M_{0}}|U=d_{1}d_{0}\right]}_{\text{PNIE}_{d_{1}d_{0}}}+\underbrace{\mathbb{E}\left[Y_{1M_{0}}-Y_{0M_{0}}|U=d_{1}d_{0}\right]}_{\text{PNDE}_{d_{1}d_{0}}}.

(1)

Intuitively, $\text{PNDE}_{d_{1}d_{0}}$ captures the effect of treatment assignment on outcome among units in stratum $d_{1}d_{0}$ when the mediator is fixed to its natural value without treatment. The $\text{PNIE}_{d_{1}d_{0}}$ , on the other hand, captures the mean difference of the potential outcomes among units in stratum $d_{1}d_{0}$ when the assignment is fixed at $Z=1$ but the mediator changes from its natural value under treatment to its counterfactual value under control. Therefore, $\text{PNIE}_{d_{1}d_{0}}$ measures the extent to which the causal effect of treatment assignment is mediated through $M$ among the subpopulation in stratum $d_{1}d_{0}$ . Similarly, the intention-to-treat effect (ITT), defined as $\mathbb{E}[Y_{1}-Y_{0}]$ , can be decomposed in the usual fashion into an intention-to-treat natural indirect effect (ITT-NIE) and an intention-to-treat natural direct effect (ITT-NDE):

\displaystyle\underbrace{\mathbb{E}\left[Y_{1M_{1}}-Y_{0M_{0}}\right]}_{\text{ITT}}=\underbrace{\mathbb{E}\left[Y_{1M_{1}}-Y_{1M_{0}}\right]}_{\text{ITT-NIE}}+\underbrace{\mathbb{E}\left[Y_{1M_{0}}-Y_{0M_{0}}\right]}_{\text{ITT-NDE}}.

(2)

One can verify that ITT is a weighted average of the PCEs such that $\text{ITT}=\displaystyle\sum_{d_{1}d_{0}\in\mathcal{U}_{\rm{all}}}e_{d_{1}d_{0}}\times\text{PCE}_{d_{1}d_{0}}$ , where $\mathcal{U}_{\rm{all}}=\{10,11,00,01\}$ . Similarly, $\text{ITT-NIE}=\displaystyle\sum_{d_{1}d_{0}\in\mathcal{U}_{\rm{all}}}e_{d_{1}d_{0}}\times\text{PNIE}_{d_{1}d_{0}}$ and $\text{ITT-NDE}=\displaystyle\sum_{d_{1}d_{0}\in\mathcal{U}_{\rm{all}}}e_{d_{1}d_{0}}\times\text{PNDE}_{d_{1}d_{0}}$ .
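The decompositions in (1) and (2) and the weighted-average relationship can be checked numerically on a toy population in which every potential value is known. The sketch below is illustrative only: the five units and their outcomes are made up, and such a table can never be observed in real data because the stratum and the cross-world outcomes are not jointly observable.

```python
from collections import defaultdict

# Hypothetical units: (U, Y_{1M_1}, Y_{1M_0}, Y_{0M_0}). Values are invented.
units = [
    ("10", 5.0, 3.0, 1.0),
    ("10", 6.0, 4.0, 2.0),
    ("11", 2.0, 2.0, 1.0),
    ("00", 1.0, 0.5, 0.5),
    ("00", 3.0, 2.5, 1.5),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def stratum_effects(units):
    """Return {U: (e_U, PCE_U, PNIE_U, PNDE_U)} for a fully known toy population."""
    by_stratum = defaultdict(list)
    for u, y11, y10, y00 in units:
        by_stratum[u].append((y11, y10, y00))
    out = {}
    for u, rows in by_stratum.items():
        e_u = len(rows) / len(units)                     # stratum proportion
        pce = mean(y11 - y00 for y11, _, y00 in rows)    # total effect in stratum
        pnie = mean(y11 - y10 for y11, y10, _ in rows)   # indirect effect in stratum
        pnde = mean(y10 - y00 for _, y10, y00 in rows)   # direct effect in stratum
        out[u] = (e_u, pce, pnie, pnde)
    return out

effects = stratum_effects(units)

# Population-level (ITT) quantities computed directly from all units.
itt = mean(y11 - y00 for _, y11, _, y00 in units)
itt_nie = mean(y11 - y10 for _, y11, y10, _ in units)
itt_nde = mean(y10 - y00 for _, _, y10, y00 in units)

# The same quantities recovered as proportion-weighted sums over strata.
itt_from_strata = sum(e * pce for e, pce, _, _ in effects.values())
itt_nie_from_strata = sum(e * pnie for e, _, pnie, _ in effects.values())
itt_nde_from_strata = sum(e * pnde for e, _, _, pnde in effects.values())
```

In this toy table the complier stratum carries most of the indirect effect, which is the kind of heterogeneity the stratum-specific estimands are designed to reveal.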

In what follows, we focus on identification of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\mathbb{E}\left[Y_{zM_{z^{\prime}}}|U=d_{1}d_{0}\right]$ for any $z$ and $z^{\prime}\in\{0,1\}$ , based on which all PCEs and their effect decompositions in (1) can be obtained. Notice that ITT, ITT-NIE, and ITT-NDE can be also obtained as they are weighted averages of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ . Identification of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ requires the following structural assumptions.

Assumption 1

(Consistency) For any $z$ , $d$ , and $m$ , we have $D_{z}=D$ if $Z=z$ , $M_{zd}=M$ if $Z=z$ and $D=d$ , and $Y_{zdm}=Y$ if $Z=z$ , $D=d$ and $M=m$ .

Assumption 2

(Ignorability of the treatment assignment) $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|{\bm{X}}$ $\forall$ $z$ , $z^{\prime}$ , $z^{*}$ , $d^{\prime}$ , and $m^{*}$ , where “ $\perp\!\!\!\perp$ ” stands for independence.

Assumption 1 is commonly invoked to exclude unit-level interference and enables us to connect the observed variables with their potential values. Assumption 2 is the no unmeasured confounding condition for treatment assignment that is often required to identify the ITT estimand in the absence of randomization. It is considered plausible when sufficient baseline covariates ${\bm{X}}$ are collected such that no hidden confounders would give rise to systematic differences between the post-randomization variables in the treated and the control groups. A stronger statement of Assumption 2, $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}}\}\perp\!\!\!\perp Z$ , is often satisfied in randomized experiments.

We additionally require monotonicity of the treatment on the post-treatment event to identify the distribution of the principal stratum membership $U$ . We consider two types of monotonicity: standard monotonicity (Assumption 3a) and strong monotonicity (Assumption 3b). The standard version requires that treatment has a non-negative impact on the post-treatment event, whereas the strong version assumes $D_{0}=0$ for all units, which implies the standard version. Strong monotonicity is satisfied under one-sided noncompliance where all control units had no access to treatment (Frölich and Melly, 2013).

Assumption 3

(Monotonicity) (a) Under standard monotonicity, $D_{1}\geq D_{0}$ for all units; (b) under strong monotonicity, $D_{0}=0$ for all units.

Assumption 3a rules out defiers ( $U=01$ ), and enables the identification of the principal scores $e_{d_{1}d_{0}}({\bm{X}})$ for the remaining three principal strata ( $U\in\{10,11,00\}$ ) (Ding and Lu, 2017). Defining $p_{zd}({\bm{X}})=f_{D|Z,{\bm{X}}}(d|z,{\bm{X}})$ and $p_{zd}=\mathbb{E}[p_{zd}({\bm{X}})]$ for $z,d\in\{0,1\}$ , because the observed data with $(Z=0,D=1)$ includes only always-takers, we have $e_{11}({\bm{X}})=p_{01}({\bm{X}})$ . Similarly, $e_{00}({\bm{X}})=p_{10}({\bm{X}})$ and $e_{10}({\bm{X}})=p_{11}({\bm{X}})-p_{01}({\bm{X}})$ since $\displaystyle\sum_{d_{1}d_{0}\in\mathcal{U}_{\text{all}}}\!\!\!e_{d_{1}d_{0}}({\bm{X}})=1$ , and the strata proportions are $e_{10}=p_{11}-p_{01}$ , $e_{00}=p_{10}$ , and $e_{11}=p_{01}$ . Assumption 3b rules out both always-takers and defiers ( $U=11$ and 01). Under strong monotonicity, the principal scores are given by $e_{10}({\bm{X}})=p_{11}({\bm{X}})$ and $e_{00}({\bm{X}})=p_{10}({\bm{X}})$ , and the strata proportions are $e_{10}=p_{11}$ and $e_{00}=p_{10}$ . To unify the presentation under standard and strong monotonicity, we re-express $e_{d_{1}d_{0}}({\bm{X}})$ and $e_{d_{1}d_{0}}$ as:

\displaystyle e_{d_{1}d_{0}}({\bm{X}})=p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\text{, \quad}e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}

(3)

for $d_{1}d_{0}\in\{10,00,11,01\}$ , where $k=|d_{1}-d_{0}|$ , $z^{*}d^{*}=11$ , 10, 01, and 01 if $d_{1}d_{0}=10$ , 00, 11, and 01, respectively. Note that $p_{01}({\bm{X}})\equiv p_{01}\equiv 0$ under strong monotonicity. Because $\{e_{d_{1}d_{0}}({\bm{X}}),e_{d_{1}d_{0}}\}$ is equivalent to $\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}),p_{z^{*}d^{*}}-kp_{01}\}$ , we will use them interchangeably.
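The unified expression (3) can be read as a small lookup rule. The following sketch, with a hypothetical function name and made-up input probabilities, maps the observed-data probabilities $p_{11}({\bm{x}})$ and $p_{01}({\bm{x}})$ at a fixed covariate value to the four principal scores. It assumes the inputs are already known or estimated.

```python
def principal_scores(p11, p01, monotonicity="standard"):
    """Principal scores e_{d1 d0}(x) = p_{z* d*}(x) - k * p_{01}(x), k = |d1 - d0|.

    p11 is P(D=1 | Z=1, x) and p01 is P(D=1 | Z=0, x) at one covariate value x.
    Under strong monotonicity p01 is set to zero, since D_0 = 0 for all units.
    """
    if monotonicity == "strong":
        p01 = 0.0
    elif monotonicity != "standard":
        raise ValueError("monotonicity must be 'standard' or 'strong'")
    p10 = 1.0 - p11  # P(D=0 | Z=1, x)
    scores = {
        "10": p11 - 1 * p01,  # compliers: z*d* = 11, k = 1
        "00": p10 - 0 * p01,  # never-takers: z*d* = 10, k = 0
        "11": p01 - 0 * p01,  # always-takers: z*d* = 01, k = 0
        "01": p01 - 1 * p01,  # defiers: z*d* = 01, k = 1, so the score is zero
    }
    if scores["10"] < 0:
        raise ValueError("p11 < p01 is incompatible with monotonicity")
    return scores

standard = principal_scores(0.7, 0.2)
strong = principal_scores(0.7, 0.2, monotonicity="strong")
```

Averaging these scores over the covariate distribution would give the marginal strata proportions $e_{d_{1}d_{0}}$ .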

We next introduce two additional ignorability assumptions for mediation analysis within principal strata.

Assumption 4

(Generalized principal ignorability) $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp U|{\bm{X}}$ $\forall$ $z$ , $z^{\prime}$ , $d$ , $m^{\prime}$ .

Principal ignorability has been previously introduced to identify the PCEs (Jo and Stuart, 2009; Ding and Lu, 2017; Forastiere et al., 2018). Assumption 4 generalizes the usual assumption to accommodate the mediator as an additional intermediate outcome. This assumption requires sufficient pre-treatment covariates to remove the confounding between $U$ and $M$ and that between $U$ and $Y$ ; in other words, no systematic differences exist in the distribution of the potential mediator and outcome across principal strata, given covariates. Next, we require ignorability of the mediator (Yamamoto, 2013; Park and Kürüm, 2020):

Assumption 5

(Ignorability of the mediator) $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|\{Z,U,{\bm{X}}\}$ $\forall$ $z$ , $z^{\prime}$ , $d$ , $d^{\prime}$ , $m^{\prime}$ .

Assumption 5 assumes that the potential mediator is independent of the potential outcome, given the observed covariates ${\bm{X}}$ , within each assignment group and principal stratum. This assumption rules out unmeasured baseline confounders in the mediator-outcome relationship and requires that, apart from $D$ , there are no other treatment-induced confounders affecting the mediator-outcome relationship. Assumption 5, coupled with Assumptions 2 and 4, generalizes the standard sequential ignorability assumption for causal mediation analysis (Imai et al., 2010) to address a post-randomization event $D$ . In addition, when Assumptions 2–4 hold, Assumption 5 is equivalent to $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}}$ without the need to condition on treatment assignment and principal stratum (see Lemma S6 in the Supplementary Material). Lastly, we state the following positivity assumption.

Assumption 6

(Positivity) Assume that $f_{Z|{\bm{X}}}(z|{\bm{x}})>0$ , $f_{D|Z,{\bm{X}}}(d|1,{\bm{x}})>0$ , $f_{D|Z,{\bm{X}}}(0|0,{\bm{x}})>0$ , $f_{M|Z,D,{\bm{X}}}(m|1,d,{\bm{x}})>0$ , and $f_{M|Z,D,{\bm{X}}}(m|0,0,{\bm{x}})>0$ for any $z$ , $d$ , $m$ , and $\bm{x}$ . Additionally assume $p_{11}-p_{01}>0$ with $f_{D|Z,{\bm{X}}}(1|0,{\bm{x}})>0$ and $f_{M|Z,D,{\bm{X}}}(m|0,1,{\bm{x}})>0$ under standard monotonicity.

Let $\mathcal{U}_{\text{a}}=\{10,00,11\}$ be the three strata under standard monotonicity and let $\mathcal{U}_{\text{b}}=\{10,00\}$ be the two strata under strong monotonicity. Theorem 1 below shows that $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ is nonparametrically identified under the aforementioned assumptions.

Theorem 1

(Nonparametric identification) Suppose that Assumptions 1–6 hold. For any $z,z^{\prime}\in\{0,1\}$ , $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under standard monotonicity, and $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under strong monotonicity, $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ is identified as follows:

\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\int_{{\bm{x}}}\frac{e_{d_{1}d_{0}}({\bm{x}})}{e_{d_{1}d_{0}}}\int_{m}\mathbb{E}_{Y|Z,D,M,{\bm{X}}}\left[Y|z,d_{z},m,{\bm{x}}\right]d\mathbb{P}_{M|Z,D,{\bm{X}}}\left(m|z^{\prime},d_{z^{\prime}},{\bm{x}}\right)d\mathbb{P}_{{\bm{X}}}\left({\bm{x}}\right),

where $d_{z}=\mathbb{I}(z=1)d_{1}+\mathbb{I}(z=0)d_{0}$ and $d_{z^{\prime}}=\mathbb{I}(z^{\prime}=1)d_{1}+\mathbb{I}(z^{\prime}=0)d_{0}$ . Here, $e_{d_{1}d_{0}}({\bm{x}})=p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})$ and $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ are identified in (3).
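When ${\bm{X}}$ and $M$ are discrete, the identification formula reduces to a finite weighted sum, which the sketch below evaluates. It is a plug-in illustration under the assumption that all observed-data quantities are known; the function name, the dictionary layout, and every numeric value are hypothetical and chosen only so that the result can be verified by hand.

```python
def theta(z, z_prime, stratum, px, p, r, mu):
    """Evaluate theta_{d1 d0}^{(z z')} for discrete X and M.

    px[x]           = P(X = x)
    p[(z, d)][x]    = P(D = d | Z = z, X = x)
    r[(z, d)][x][m] = P(M = m | Z = z, D = d, X = x)
    mu[(z, d)][x][m] = E[Y | Z = z, D = d, M = m, X = x]
    """
    d1, d0 = int(stratum[0]), int(stratum[1])
    k = abs(d1 - d0)
    # (z*, d*) from equation (3); defiers ("01") are ruled out by monotonicity.
    z_star_d_star = {"10": (1, 1), "00": (1, 0), "11": (0, 1)}[stratum]
    d_z = d1 if z == 1 else d0
    d_zp = d1 if z_prime == 1 else d0
    e_x = {x: p[z_star_d_star][x] - k * p[(0, 1)][x] for x in px}
    e = sum(e_x[x] * px[x] for x in px)
    total = 0.0
    for x in px:
        m_dist = r[(z_prime, d_zp)][x]
        inner = sum(mu[(z, d_z)][x][m] * m_dist[m] for m in m_dist)
        total += (e_x[x] / e) * inner * px[x]
    return total

# Hypothetical observed-data quantities with binary X and binary M.
xs, ms = (0, 1), (0, 1)
px = {0: 0.5, 1: 0.5}
p11 = {0: 0.6, 1: 0.8}
p01 = {0: 0.2, 1: 0.2}
p = {(1, 1): p11, (1, 0): {x: 1 - p11[x] for x in xs},
     (0, 1): p01, (0, 0): {x: 1 - p01[x] for x in xs}}

def prob_m1(z, d):
    return 0.3 + 0.4 * z + 0.1 * d  # P(M = 1 | z, d, x), constant in x here

r = {(z, d): {x: {1: prob_m1(z, d), 0: 1 - prob_m1(z, d)} for x in xs}
     for z in (0, 1) for d in (0, 1)}
mu = {(z, d): {x: {m: 1 + 2 * z + 0.5 * d + m + x for m in ms} for x in xs}
      for z in (0, 1) for d in (0, 1)}

theta_10 = {(z, zp): theta(z, zp, "10", px, p, r, mu)
            for (z, zp) in [(1, 1), (1, 0), (0, 0)]}
```

With these inputs the complier weights are 0.4 and 0.6 for the two covariate values, and the three complier quantities work out by hand to 4.9, 4.4, and 1.9. In practice each input would be replaced by an estimate, and integrals would replace sums for continuous variables.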

By Theorem 1, we have $\text{PNDE}_{d_{1}d_{0}}=\theta^{(10)}_{d_{1}d_{0}}-\theta^{(00)}_{d_{1}d_{0}}$ and $\text{PNIE}_{d_{1}d_{0}}=\theta^{(11)}_{d_{1}d_{0}}-\theta^{(10)}_{d_{1}d_{0}}$ , and the decomposition of the ITT effect can also be identified by

\text{ITT-NDE}=\sum_{d_{1}d_{0}\in\mathcal{U}}e_{d_{1}d_{0}}\times\left(\theta^{(10)}_{d_{1}d_{0}}-\theta^{(00)}_{d_{1}d_{0}}\right),\quad\text{ITT-NIE}=\sum_{d_{1}d_{0}\in\mathcal{U}}e_{d_{1}d_{0}}\times\left(\theta^{(11)}_{d_{1}d_{0}}-\theta^{(10)}_{d_{1}d_{0}}\right),

(4)

where $\mathcal{U}=\mathcal{U}_{\text{a}}$ under standard monotonicity and $\mathcal{U}=\mathcal{U}_{\text{b}}$ under strong monotonicity.
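As a small numerical illustration of (4), the sketch below turns stratum-specific values of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ and strata proportions into the principal and ITT mediation effects. The function name and all numbers are hypothetical placeholders, standing in for quantities that would be identified via Theorem 1.

```python
def decompose(theta, e):
    """Compute PNIE, PNDE per stratum and the ITT-NIE, ITT-NDE sums in (4).

    theta[u][(z, z_prime)] holds theta_u^{(z z')}; e[u] is the stratum proportion.
    """
    pnde = {u: theta[u][(1, 0)] - theta[u][(0, 0)] for u in e}
    pnie = {u: theta[u][(1, 1)] - theta[u][(1, 0)] for u in e}
    itt_nde = sum(e[u] * pnde[u] for u in e)
    itt_nie = sum(e[u] * pnie[u] for u in e)
    return pnie, pnde, itt_nie, itt_nde

# Hypothetical inputs under standard monotonicity (three strata).
e = {"10": 0.5, "00": 0.3, "11": 0.2}
theta_values = {
    "10": {(1, 1): 4.9, (1, 0): 4.4, (0, 0): 1.9},
    "00": {(1, 1): 3.0, (1, 0): 2.8, (0, 0): 1.0},
    "11": {(1, 1): 4.0, (1, 0): 4.0, (0, 0): 2.0},
}

pnie, pnde, itt_nie, itt_nde = decompose(theta_values, e)
itt = sum(e[u] * (theta_values[u][(1, 1)] - theta_values[u][(0, 0)]) for u in e)
```

Under strong monotonicity the same function applies with the always-taker entry dropped from both dictionaries.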

Remark 1

Rudolph et al. (2024) considered instrumental variables to identify the interventional and natural mediation effects among compliers. In a comparable scenario, they showed that under (i) exclusion restriction, (ii) standard monotonicity, and (iii) sequential randomization (Assumption 2 plus $Y_{zm}\perp\!\!\!\perp M|Z,D,{\bm{X}}$ ), $\text{PNIE}_{10}$ is identified by $\text{ITT-NIE}/e_{10}$ . A similar identification formula follows for $\text{PNDE}_{10}$ . Due to exclusion restriction, the mediation effects among other strata are automatically zero, and thus only the compliers stratum contributes information to the ITT natural mediation effect. Replacing exclusion restriction with principal ignorability, Theorem 1 allows additional strata to contribute information to the ITT natural mediation effect and enables point identification of each stratum-specific natural mediation effect.

3 Connections to mediation analysis with a treatment-induced confounder or two mediators

Although we consider $D$ as a primary source to sub-classify the study population, there exist two complementary perspectives for the role of $D$ in causal mediation analysis; that is, $D$ can be viewed as a binary post-treatment confounder or another mediator sitting in the causal pathway between $Z$ and $M$ . We discuss the connections between the present work and existing mediation methods for addressing treatment-induced confounding (Robins and Richardson, 2010; Tchetgen Tchetgen and VanderWeele, 2014; VanderWeele et al., 2014; Miles et al., 2020; Díaz et al., 2021; Xia and Chan, 2021) or two causally-ordered mediators (Albert and Nelson, 2011; Daniel et al., 2015; Steen et al., 2017; Zhou, 2022). When $D$ is considered as another mediator, methods have been proposed for identification of the path-specific effects through the four causal pathways given by Figure 1(a)–(d) (e.g., Daniel et al., 2015; Zhou, 2022). If $D$ is treated as a post-treatment confounder, methods have been developed for identifying different versions of mediation effects through $M$ , including the interventional mediation effects (VanderWeele et al., 2014; Díaz et al., 2021), the natural mediation effects (Robins and Richardson, 2010; Tchetgen Tchetgen and VanderWeele, 2014; Xia and Chan, 2021), and the path-specific effect on the causal pathway $Z\shortrightarrow M\shortrightarrow Y$ in Figure 1(c) (VanderWeele et al., 2014; Miles et al., 2017, 2020).

In contrast to methods that view $D$ as a post-treatment confounder, our work addresses a different scientific question. Both our approach and methods for two causally-ordered mediators aim to disentangle the roles of $M$ and $D$ in jointly explaining the causal mechanism, whereas mediation methods with a post-treatment confounder focus on the primary role of $M$ in explaining the causal mechanism. For example, Tchetgen Tchetgen and VanderWeele (2014) and Xia and Chan (2021) studied identification of the natural mediation effects defined in (2), which summarize the causal sequence through $M$ on the outcome marginalized over different levels of $D$ . Similarly, the interventional mediation effects in VanderWeele et al. (2014) and Díaz et al. (2021) only considered the causal sequence through mediator $M$ to define the estimand of interest. Finally, Miles et al. (2017) and Miles et al. (2020) mainly considered the path-specific effects on the causal pathway $Z\shortrightarrow M\shortrightarrow Y$ in Figure 1(c), where other causal pathways passing through $D$ were assumed of less interest.

Comparing the present work to methods for identifying path-specific effects, a notable difference lies in the causal estimands of interest. We specifically focus on decomposing the causal effects within endogenous subgroups characterized by the joint potential values of $D$ , whereas the path-specific effects are defined for the entire study population. A further difference lies in the identification assumptions. Identifying path-specific effects requires certain ignorability assumptions regarding the observed post-treatment variable $D$ directly. For example, Daniel et al. (2015) requires that $M_{zd}\perp\!\!\!\perp D|\{Z,{\bm{X}}\}$ for any $d$ and $z$ . On the other hand, our identification assumptions use the potential values of $D$ to define the principal stratum ( $U$ ) and then invoke ignorability assumptions across the principal stratum $U=D_{1}D_{0}$ .

Despite the aforementioned differences, there exist mathematical connections across the requisite identification conditions. We provide two further remarks about such connections, in particular to the work by Tchetgen Tchetgen and VanderWeele (2014) (on treatment-induced confounding) and Zhou (2022) (on two causally-ordered mediators). All proofs are provided in the Supplementary Material.

Remark 2

Tchetgen Tchetgen and VanderWeele (2014) used a nonparametric structural equation model with independent errors (NPSEM-IE) for the DAG in Figure 1, coupled with monotonicity, to identify the natural mediation effects (2) with a binary post-treatment confounder. Suppose that consistency (Assumption 1) and monotonicity (either Assumption 3a or 3b) hold. If, in addition, the NPSEM-IE for the DAG in Figure 1 (i.e., Assumption S1 in the Supplementary Material) holds, then Assumptions 2, 4, and 5 also hold, but the converse is not true.

By Remark 2, under consistency and monotonicity, the ignorability assumptions (Assumptions 2, 4, and 5) are directly implied by an NPSEM-IE corresponding to the DAG in Figure 1. Remark 2 further implies that the identification formulas for the natural mediation effects (4) are equivalent to the identification formulas in Tchetgen Tchetgen and VanderWeele (2014), except that the present work invokes technically weaker assumptions.

Remark 3

Zhou (2022) considered a set of generalized sequential ignorability assumptions to identify path-specific effects with multiple mediators, and they are comparable to the present work in the special case when $D$ is binary. Suppose that consistency (Assumption 1) holds. Then, under monotonicity (either Assumption 3a or 3b), the set of generalized sequential ignorability assumptions in Zhou (2022) (i.e., Assumption S2 in the Supplementary Material) is equivalent to Assumptions 2, 4, and 5.

By Remark 3, the assumptions in the present work are stronger than those in Zhou (2022), since the latter does not require monotonicity. This is expected as stronger assumptions are necessary to identify our finer-grained estimands that provide insights into the pathways within each subpopulation. Finally, if monotonicity of the effect of the treatment $Z$ on the first mediator $D$ is plausible, our assumptions are equivalent to the set of generalized sequential ignorability assumptions in Zhou (2022).

4 Estimation of natural mediation effects

4.1 Nuisance functions and parametric working models

We first define several nuisance functions of the observed-data distributions. Let $\pi_{z}({\bm{x}})=\mathbb{P}_{Z|{\bm{X}}}(z|{\bm{x}})$ be the probability of treatment conditional on ${\bm{X}}$ , where $\pi_{1}({\bm{x}})$ is the propensity score. Note that $\pi_{z}({\bm{x}})$ degenerates to a constant value $\pi_{z}$ in randomized experiments. Let $r_{zd}(m,{\bm{x}})=f_{M|Z,D,{\bm{X}}}(m|z,d,{\bm{x}})$ be the probability of the mediator conditional on $Z$ , $D$ , and ${\bm{X}}$ . Let $\mu_{zd}(m,{\bm{x}})=\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d,m,{\bm{x}}]$ be the conditional expectation of $Y$ given $Z$ , $D$ , $M$ , and ${\bm{X}}$ . Let $h_{nuisance}=\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$ contain all nuisance functions, where $p_{zd}({\bm{x}})=f_{D|Z,{\bm{X}}}(d|z,{\bm{x}})$ is defined in Section 2. It should be noted that, within our definitions of the nuisance functions, the two variables $(Z,D)$ –which directly relate to the principal strata–are presented as the subscript, while all other variables are presented as arguments.

One can specify parametric working models $h_{nuisance}^{\text{par}}=\{\pi_{z}^{\text{par}}({\bm{x}}),\allowbreak p_{zd}^{\text{par}}({\bm{x}}),\allowbreak r_{zd}^{\text{par}}(m,{\bm{x}}),\allowbreak\mu_{zd}^{\text{par}}(m,{\bm{x}})\}$ for $h_{nuisance}$ . Specification of the parametric working models can be flexible. For example, logistic regressions can be used for $\pi_{z}^{\text{par}}({\bm{x}})$ and $p_{zd}^{\text{par}}({\bm{x}})$ . When the mediator is continuous or binary, a linear regression or a logistic regression can be employed for $r_{zd}^{\text{par}}(m,{\bm{x}})$ . Similarly, a generalized linear model can be used for $\mu_{zd}^{\text{par}}(m,{\bm{x}})$ . Detailed model examples are provided in the Supplementary Material. Hereafter, we use $\mathcal{M}_{\pi}$ to denote the submodel of the nonparametric model $\mathcal{M}_{np}$ with a correctly specified $\pi_{z}^{\text{par}}({\bm{x}})$ for $\pi_{z}({\bm{x}})$ and unspecified other components. Analogously, define $\mathcal{M}_{e}$ , $\mathcal{M}_{m}$ , and $\mathcal{M}_{o}$ as the submodel of $\mathcal{M}_{np}$ with a correctly specified $p_{zd}^{\text{par}}({\bm{x}})$ , $r_{zd}^{\text{par}}(m,{\bm{x}})$ , and $\mu_{zd}^{\text{par}}(m,{\bm{x}})$ , respectively. In addition, we use “ $\cup$ ” and “ $\cap$ ” to denote union and intersection of submodels such that $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}$ denotes the correct specification of both $\pi_{z}^{\text{par}}({\bm{x}})$ and $p_{zd}^{\text{par}}({\bm{x}})$ , and $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ denotes the correct specification of either $\pi_{z}^{\text{par}}({\bm{x}})$ or $p_{zd}^{\text{par}}({\bm{x}})$ .

As suggested by Theorem 1, one also needs to estimate $e_{d_{1}d_{0}}$ , or equivalently $p_{zd}\equiv\mathbb{E}[p_{zd}({\bm{X}})]$ , in order to estimate $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ . There are multiple ways to estimate $p_{zd}$ : one can simply use the plug-in estimator $\mathbb{P}_{n}[\widehat{p}_{zd}^{\text{par}}({\bm{X}})]$ or the inverse probability weighting estimator $\mathbb{P}_{n}\left[{\mathbb{I}(Z=z,D=d)}/{\widehat{\pi}_{z}^{\text{par}}({\bm{X}})}\right]$ . In randomized experiments, because $\pi_{z}({\bm{X}})=\pi_{z}$ , one can also estimate $p_{zd}$ by ${\mathbb{P}_{n}[\mathbb{I}(Z=z,D=d)]}/{\mathbb{P}_{n}[\mathbb{I}(Z=z)]}$ . In this article, we consider the doubly robust estimator developed in Jiang et al. (2022),

\widehat{p}_{zd}^{\text{dr}}=\mathbb{P}_{n}\left[\frac{\mathbb{I}(Z=z)\left\{\mathbb{I}(D=d)-\widehat{p}_{zd}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{z}^{\text{par}}({\bm{X}})}+\widehat{p}_{zd}^{\text{par}}({\bm{X}})\right],

(5)

which is consistent for $p_{zd}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ and is locally efficient under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}$ .
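As a minimal sketch of the doubly robust estimator in (5), assuming the fitted values from the working models are already available as arrays, one could write the following. The function name `p_dr` and its argument names are illustrative and do not come from the paper.

```python
import numpy as np

def p_dr(Z, D, z, d, pi_hat_z, p_hat_zd):
    """Doubly robust estimate of p_zd = E[p_zd(X)], following display (5).

    Z, D      : observed binary treatment and post-treatment event (length n)
    pi_hat_z  : fitted P(Z = z | X_i) for each unit (hypothetical input)
    p_hat_zd  : fitted P(D = d | Z = z, X_i) for each unit (hypothetical input)
    """
    Z, D = np.asarray(Z), np.asarray(D)
    pi_hat_z = np.asarray(pi_hat_z, dtype=float)
    p_hat_zd = np.asarray(p_hat_zd, dtype=float)
    # Inverse-probability-weighted residual plus the plug-in term.
    residual = (Z == z) * ((D == d).astype(float) - p_hat_zd) / pi_hat_z
    return float(np.mean(residual + p_hat_zd))
```

The estimate stays close to the target when either the treatment model or the model for $D$ is replaced by an incorrect one, which is the $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ property stated above.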

4.2 Moment-type estimators

We provide four distinct identification expressions of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ ; each expression uses only part, but not all, of the nuisance functions $h_{nuisance}$ and the principal stratum proportion $p_{zd}$ .

Theorem 2

For $z,z^{\prime}\in\{0,1\}$ , $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ or $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under standard or strong monotonicity, we have $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\theta^{(zz^{\prime}),\textrm{a}}_{d_{1}d_{0}}=\theta^{(zz^{\prime}),\textrm{b}}_{d_{1}d_{0}}=\theta^{(zz^{\prime}),\textrm{c}}_{d_{1}d_{0}}=\theta^{(zz^{\prime}),\textrm{d}}_{d_{1}d_{0}}$ , where

	$\displaystyle\theta^{(zz^{\prime}),\textrm{a}}_{d_{1}d_{0}}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}Y\right],$
	$\displaystyle\theta^{(zz^{\prime}),\textrm{b}}_{d_{1}d_{0}}$	$\displaystyle=\mathbb{E}\left[\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)D}{\pi_{0}({\bm{X}})}\right\}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\theta^{(zz^{\prime}),\textrm{c}}_{d_{1}d_{0}}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\mu_{zd_{z}}(M,{\bm{X}})\right],$
	$\displaystyle\theta^{(zz^{\prime}),\textrm{d}}_{d_{1}d_{0}}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{zz^{\prime}}({\bm{X}})\right],$

with $\eta_{zz^{\prime}}({\bm{X}})=\int_{m}\mu_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})dm$ , $k=|d_{1}-d_{0}|$ , $z^{*}d^{*}=$ 11, 10, 01 if $d_{1}d_{0}=$ 10, 00, and 11, respectively.

The first expression is an average of outcome by the product of four different weights, where the first weight, $\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}={e_{d_{1}d_{0}}({\bm{X}})}/{e_{d_{1}d_{0}}}$ , is the principal score weight for creating a pseudo-population within stratum $U=d_{1}d_{0}$ (Jiang et al., 2022). The remaining three weights in $\theta^{(zz^{\prime}),\textrm{a}}_{d_{1}d_{0}}$ —the inverse probability of treatment weight, the inverse probability of the post-treatment event weight, and the mediator density ratio weight—correct for selection bias associated with the treatment, post-treatment event, and the observed mediator value, within the pseudo population created by the principal score weight. The second expression is a product of two components, where the first component, $\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)D}{\pi_{0}({\bm{X}})}\right\}\Big{/}(p_{z^{*}d^{*}}-kp_{01})$ , plays a similar role to the principal score weight to create a pseudo-population of stratum $U=d_{1}d_{0}$ , and the second component, $\eta_{zz^{\prime}}({\bm{X}})=\mathbb{E}[Y_{zM_{z^{\prime}}}|U=d_{1}d_{0},{\bm{X}}]$ is a conditional version of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ given fixed values of baseline covariates and within stratum $U=d_{1}d_{0}$ . Construction of the third expression bears some resemblance to the first expression, both of which use the principal score weight, except that the third expression uses a slightly different weighting scheme coupled with the conditional expectation of outcome $\mu_{zd_{z}}(M,{\bm{X}})$ instead of weighting directly on the observed outcome. The fourth expression shares a similar form to the second expression, but now involves the product between the principal score weight and $\eta_{zz^{\prime}}({\bm{X}})$ .
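The mapping from a stratum $d_{1}d_{0}$ to the pair $z^{*}d^{*}$ and the constant $k$ is easy to misread, so a small lookup can help. The sketch below encodes it and evaluates the numerator of the principal score weight, $p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})$ . The names `STAR` and `principal_score`, and the dictionary layout keyed by the string "zd", are assumptions made for illustration.

```python
# (z*d*, k) for each principal stratum d1d0, as stated after Theorem 2.
STAR = {"10": ("11", 1), "00": ("10", 0), "11": ("01", 0)}

def principal_score(stratum, p_x):
    """Return e_{d1d0}(x) = p_{z*d*}(x) - k * p_{01}(x).

    stratum : "10" (compliers), "00" (never-takers) or "11" (always-takers)
    p_x     : dict mapping "zd" to the fitted value of P(D = d | Z = z, x);
              this layout is a hypothetical convention, not the paper's
    """
    zd_star, k = STAR[stratum]
    return p_x[zd_star] - k * p_x["01"]
```

Under standard monotonicity the three scores sum to one at every covariate value, which gives a quick sanity check on fitted values.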

According to Theorem 2, we can obtain the four moment-type estimators, $\{\widehat{\theta}^{(zz^{\prime}),\rm{a}}_{d_{1}d_{0}},\allowbreak\widehat{\theta}^{(zz^{\prime}),\rm{b}}_{d_{1}d_{0}},\allowbreak\widehat{\theta}^{(zz^{\prime}),\rm{c}}_{d_{1}d_{0}},\allowbreak\widehat{\theta}^{(zz^{\prime}),\rm{d}}_{d_{1}d_{0}}\}$ , by replacing the unknown nuisance functions with their estimates from parametric working models and substituting the outer expectation operator $\mathbb{E}$ by the empirical average operator $\mathbb{P}_{n}$ . As an example, $\widehat{\theta}^{(zz^{\prime}),\rm{d}}_{d_{1}d_{0}}$ is given by

\mathbb{P}_{n}\left[\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\widehat{\eta}_{zz^{\prime}}^{\text{par}}({\bm{X}})\right],

where $\widehat{\eta}_{zz^{\prime}}^{\text{par}}({\bm{X}})=\int_{m}\widehat{\mu}_{zd_{z}}^{\text{par}}(m,{\bm{X}})\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(m,{\bm{X}})dm$ . Here, the integral in $\widehat{\eta}_{zz^{\prime}}^{\text{par}}({\bm{X}})$ becomes simple summations when the mediator is categorical and, if the mediator is continuous, numerical integration can be used for an approximate calculation. We summarize the asymptotic properties of the four moment-type estimators below.

Proposition 1

Suppose that the regularity conditions outlined in the Supplementary Material hold. Then, $\widehat{\theta}^{(zz^{\prime}),\rm{a}}_{d_{1}d_{0}}$ , $\widehat{\theta}^{(zz^{\prime}),\rm{b}}_{d_{1}d_{0}}$ , $\widehat{\theta}^{(zz^{\prime}),\rm{c}}_{d_{1}d_{0}}$ , and $\widehat{\theta}^{(zz^{\prime}),\rm{d}}_{d_{1}d_{0}}$ are consistent and asymptotically normal under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$ , and $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , respectively.
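To illustrate the moment-type estimator $\widehat{\theta}^{(zz^{\prime}),\rm{d}}_{d_{1}d_{0}}$ when the mediator is binary, the integral defining $\widehat{\eta}_{zz^{\prime}}^{\text{par}}({\bm{X}})$ reduces to a two-term sum. The following is a sketch under that assumption; the function names and the convention of passing fitted values as arrays are illustrative.

```python
import numpy as np

def eta_binary(mu_m1, mu_m0, r_m1):
    """eta_{zz'}(X) for a binary mediator: sum over m in {0, 1} of
    mu_{z d_z}(m, X) * r_{z' d_{z'}}(m, X).

    mu_m1, mu_m0 : fitted outcome means at M = 1 and M = 0 (hypothetical inputs)
    r_m1         : fitted P(M = 1 | Z = z', D = d_{z'}, X)
    """
    mu_m1, mu_m0, r_m1 = (np.asarray(a, dtype=float) for a in (mu_m1, mu_m0, r_m1))
    return mu_m1 * r_m1 + mu_m0 * (1.0 - r_m1)

def theta_d(e_x, e_marginal, eta):
    """Moment-type estimator 'd': sample mean of {e(X) / e} * eta(X), where
    e_x = p_{z*d*}(X) - k p_{01}(X) uses fitted values and
    e_marginal = p_{z*d*} - k p_{01} uses the doubly robust estimates."""
    e_x, eta = np.asarray(e_x, dtype=float), np.asarray(eta, dtype=float)
    return float(np.mean(e_x / e_marginal * eta))
```

For a continuous mediator, the sum in `eta_binary` would be replaced by numerical integration over $m$ , as noted in the text.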

4.3 From efficient influence function to multiply robust estimator

Let $\mathcal{M}_{np}$ denote the nonparametric model over the observed data density function $f_{\bm{O}}$ . The efficient influence function (EIF) of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under $\mathcal{M}_{np}$ is derived in Theorem 3 based on the semiparametric estimation theory (Bickel et al., 1993), which also implies the semiparametric efficiency bound, i.e., the lower bound of the asymptotic variance among all regular and asymptotically linear estimators of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under the nonparametric model $\mathcal{M}_{np}$ .

Theorem 3

The EIF of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ over $\mathcal{M}_{np}$ is

\mathcal{D}^{(zz^{\prime})}_{d_{1}d_{0}}(\bm{O})=\frac{\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}},

where

	$\displaystyle\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})=$	$\displaystyle\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\eta_{zz^{\prime}}({\bm{X}})$
		$\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}$
		$\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}$
		$\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\eta_{zz^{\prime}}({\bm{X}}),$
	$\displaystyle\delta_{d_{1}d_{0}}({\bm{O}})=$	$\displaystyle\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}+p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}),$

$k=|d_{1}-d_{0}|$ , $z^{*}d^{*}=$ 11, 10, 01 if $d_{1}d_{0}=$ 10, 00, and 11, respectively. Therefore, the semiparametric efficiency bound for estimation of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ is $\mathbb{E}\left[\left\{\mathcal{D}^{(zz^{\prime})}_{d_{1}d_{0}}(\bm{O})\right\}^{2}\right]$ .

Theorem 3 inspires a new estimator of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ by solving the following EIF-induced estimating equation

\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]=0,

where $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ depend on nuisance functions $h_{nuisance}$ and the denominator $p_{z^{*}d^{*}}-kp_{01}$ is a constant that does not affect the solution of the estimating equation. Therefore, the new estimator, which we hereafter refer to as the multiply robust estimator, can be constructed as

\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\frac{\mathbb{P}_{n}\left[\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{par}}(\bm{O})\right]}{\mathbb{P}_{n}\left[\widehat{\delta}_{d_{1}d_{0}}^{\text{par}}(\bm{O})\right]}.

Theorem 4 summarizes the asymptotic properties of the multiply robust estimator.

Theorem 4

Suppose that the regularity conditions outlined in Supplementary Material hold. Under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$ , or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , the multiply robust estimator $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ is consistent and asymptotically normal such that $\sqrt{n}\left(\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\right)$ converges to a zero-mean normal distribution with finite variance $V_{\text{mr}}$ . Moreover, $V_{\text{mr}}$ achieves the semiparametric efficiency bound under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

An attractive property of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ is that it offers four types of protection against misspecification of the parametric working models. Notice that the four moment-type estimators provided in Section 4.2 are only singly robust; for example, $\widehat{\theta}^{(zz^{\prime}),\text{a}}_{d_{1}d_{0}}$ is only consistent under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ . By contrast, $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ is quadruply robust such that it is consistent for $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ even if any one of the four working models, $\mathcal{M}_{\pi}$ , $\mathcal{M}_{e}$ , $\mathcal{M}_{m}$ , and $\mathcal{M}_{o}$ , is misspecified. In addition, $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ is also locally efficient when all four working models are correctly specified. A proof of the quadruple robustness property is given in the Supplementary Material. As a caveat, the quadruple robustness is more stringent than the double robustness property, as the former requires three out of four working models to be correct whereas the latter only requires one out of two working models to be correct. In practice, one can use the nonparametric bootstrap to construct the standard error and confidence interval of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ .
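The ratio form of the multiply robust estimator can be sketched directly from the expressions for $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ in Theorem 3. The code below assumes a binary $D$ and takes per-unit fitted nuisance values as inputs; the function name `mr_estimate`, the argument names, and the input layout are hypothetical choices made for illustration rather than the authors' implementation.

```python
import numpy as np

# (z*d*, k) for each principal stratum d1d0, as defined in Theorems 2 and 3.
STAR = {"10": ((1, 1), 1), "00": ((1, 0), 0), "11": ((0, 1), 0)}

def mr_estimate(stratum, z, zp, Z, D, Y, pi1, p_d1, r_ratio, mu, eta):
    """Ratio-form multiply robust estimate of theta^{(z z')} for one stratum.

    All nuisance inputs are per-unit fitted values (hypothetical argument names):
      pi1     : P(Z = 1 | X_i)
      p_d1    : {0: P(D = 1 | Z = 0, X_i), 1: P(D = 1 | Z = 1, X_i)}
      r_ratio : r_{z' d_{z'}}(M_i, X_i) / r_{z d_z}(M_i, X_i)
      mu      : mu_{z d_z}(M_i, X_i)
      eta     : eta_{z z'}(X_i)
    """
    Z, D, Y = np.asarray(Z), np.asarray(D), np.asarray(Y, dtype=float)
    pi1 = np.asarray(pi1, dtype=float)
    pi = {1: pi1, 0: 1.0 - pi1}

    def p(zz, dd):  # p_{zd}(X) = P(D = dd | Z = zz, X)
        p1 = np.asarray(p_d1[zz], dtype=float)
        return p1 if dd == 1 else 1.0 - p1

    (zs, ds), k = STAR[stratum]
    d = {1: int(stratum[0]), 0: int(stratum[1])}  # d_1 and d_0 for this stratum

    e_x = p(zs, ds) - k * p(0, 1)  # principal score e_{d1d0}(X)
    # Augmentation term shared by psi and delta.
    aug = ((Z == zs) * ((D == ds).astype(float) - p(zs, ds)) / pi[zs]
           - k * (1 - Z) * (D - p(0, 1)) / pi[0])
    # Weights attached to the outcome residual and to mu - eta.
    w_y = (Z == z) * (D == d[z]) / (p(z, d[z]) * pi[z]) * r_ratio
    w_m = (Z == zp) * (D == d[zp]) / (p(zp, d[zp]) * pi[zp])

    psi = aug * eta + e_x * w_y * (Y - mu) + e_x * w_m * (mu - eta) + e_x * eta
    delta = aug + e_x
    return float(np.mean(psi) / np.mean(delta))
```

For bootstrap inference, one would refit the working models and recompute this ratio on each resampled dataset; that refitting step is outside the scope of the sketch.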

Remark 4

The monotonicity assumption can place a restriction on the observed data density $f_{{\bm{O}}}$ ; that is, the standard monotonicity indicates that $f_{D|Z,{\bm{X}}}(1|1,{\bm{x}})\geq f_{D|Z,{\bm{X}}}(1|0,{\bm{x}})$ and the strong monotonicity further constrains $f_{D|Z,{\bm{X}}}(1|0,{\bm{x}})\equiv 0$ . Following previous efforts in obtaining efficient causal estimators under a principal stratification framework (Rudolph et al., 2024; Jiang et al., 2022), the EIF in Theorem 3 is derived under the nonparametric model $\mathcal{M}_{np}$ , which does not leverage the monotonicity restriction on $f_{{\bm{O}}}$ to potentially sharpen the efficiency bound. Therefore, $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ is only locally efficient under $\mathcal{M}_{np}$ , rather than under a more restrictive model space assuming monotonicity.

4.4 Nonparametric efficient estimation

We extend the proposed multiply robust estimator by estimating the nuisance functions, $h_{nuisance}$ , via flexible nonparametric methods or modern data-adaptive machine learning methods. We denote the new estimator as $\widehat{\theta}^{(zz^{\prime}),\text{np}}_{d_{1}d_{0}}$ with the superscript “np” to indicate using nonparametric algorithms. The cross-fitting procedure (Chernozhukov et al., 2018) is employed to circumvent the bias due to overfitting of nonparametric estimation on the nuisance functions. Specifically, we randomly partition the dataset into $V$ groups with approximately equal size such that the group size difference is at most 1. For each $v$ , let $\mathcal{O}_{v}$ be the data in the $v$ -th group and $\mathcal{O}_{-v}=\displaystyle\cup_{v^{\prime}\in\{1,\dots,V\}\setminus v}\mathcal{O}_{v^{\prime}}$ be the data excluding the $v$ -th group. For $v=1,\dots,V$ , we calculate the nuisance function estimates on data $\mathcal{O}_{v}$ , denoted by $\widehat{h}_{nuisance}^{\text{np},v}=\{\widehat{\pi}_{z}^{\text{np},v}({\bm{x}}),\widehat{p}_{zd}^{\text{np},v}({\bm{x}}),\widehat{r}_{zd}^{\text{np},v}(m,{\bm{x}}),\widehat{\mu}_{zd}^{\text{np},v}(m,{\bm{x}})\}$ , based on machine learning or nonparametric methods trained on data $\mathcal{O}_{-v}$ . The nuisance function estimates evaluated over the entire dataset, $\widehat{h}_{nuisance}^{\text{np}}$ , are therefore a combination of $\widehat{h}_{nuisance}^{\text{np},1}$ , $\widehat{h}_{nuisance}^{\text{np},2}$ , $\dots$ , $\widehat{h}_{nuisance}^{\text{np},V}$ . Finally, $\widehat{\theta}^{(zz^{\prime}),\text{np}}_{d_{1}d_{0}}$ is given by the solution to

\mathbb{P}_{n}\left[\frac{\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]=0

so that $\widehat{\theta}^{(zz^{\prime}),\text{np}}_{d_{1}d_{0}}=\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}(\bm{O})]/\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})]$ , where $\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}(\bm{O})$ and $\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})$ are $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ evaluated based on $\widehat{h}_{nuisance}^{\text{np}}$ .
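The cross-fitting step described above can be sketched as follows. The helper names `make_folds` and `cross_fit`, and the `fit` interface that returns a prediction function, are assumptions standing in for whichever machine learner is used; they are not part of the paper.

```python
import numpy as np

def make_folds(n, V, seed=0):
    """Randomly split indices 0..n-1 into V groups whose sizes differ by at most 1."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), V)

def cross_fit(fit, X, target, V=5, seed=0):
    """Out-of-fold predictions: the learner for fold v is trained on all other folds.

    fit(X_train, target_train) must return a function mapping X_eval to
    predictions (hypothetical interface for any learner).
    """
    X, target = np.asarray(X), np.asarray(target)
    n = len(target)
    out = np.empty(n, dtype=float)
    for idx in make_folds(n, V, seed):
        train = np.ones(n, dtype=bool)
        train[idx] = False                 # hold out fold v
        predict = fit(X[train], target[train])
        out[idx] = predict(X[idx])         # evaluate only on the held-out fold
    return out
```

Each nuisance function would be cross-fitted this way, and the resulting out-of-fold values would then be plugged into $\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}(\bm{O})$ and $\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})$ .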

Theorem 5

Under Assumptions 1–6, $\widehat{\theta}^{(zz^{\prime}),\text{np}}_{d_{1}d_{0}}$ is consistent if any three of the four nuisance functions in $\widehat{h}_{nuisance}^{\text{np}}$ are consistently estimated in the $L_{2}(\mathbb{P})$ -norm. Furthermore, if all elements in $\widehat{h}_{nuisance}^{\text{np}}$ are consistent in the $L_{2}(\mathbb{P})$ -norm and $\|\widehat{l}^{\text{np}}-l\|\times\|\widehat{g}^{\text{np}}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$ , then $\widehat{\theta}^{(zz^{\prime}),\text{np}}_{d_{1}d_{0}}$ is asymptotically normal and its asymptotic variance achieves the efficiency lower bound.

Theorem 5 indicates that $\widehat{\theta}^{(zz^{\prime}),\text{np}}_{d_{1}d_{0}}$ is consistent, asymptotically normal, and also achieves the semiparametric efficiency lower bound, if all nuisance functions can be consistently estimated at an $o_{p}(n^{-1/4})$ rate, which can be achieved by several machine learning algorithms (e.g., the boosting approach by Luo et al. (2016), the random forest by Wager and Walther (2015), and the neural networks by Chen and White (1999)). When nuisance functions are estimated via data-adaptive methods, we use the empirical variance of the estimated EIF to construct the variance estimator for $\widehat{\theta}^{(zz^{\prime}),\text{np}}_{d_{1}d_{0}}$ ; that is

\widehat{\text{Var}}(\widehat{\theta}^{(zz^{\prime}),\text{np}}_{d_{1}d_{0}})=\frac{1}{n}\mathbb{P}_{n}\left[\left\{\frac{\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}(\bm{O})-\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})}{\widehat{p}_{z^{*}d^{*}}^{\text{np}}-k\widehat{p}_{01}^{\text{np}}}\right\}^{2}\right],

where $\widehat{p}_{zd}^{\text{np}}$ is constructed analogous to $\widehat{p}_{zd}^{\text{dr}}$ in (5) but evaluated using $\widehat{h}_{nuisance}^{\text{np}}$ .
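The variance formula above translates into a few lines of code once the per-unit values of $\widehat{\psi}$ and $\widehat{\delta}$ are available. This sketch also forms a 95% Wald-type interval; the function names and the hard-coded normal quantile are illustrative choices.

```python
import numpy as np

def eif_variance(psi, delta, theta_hat, e_hat):
    """(1/n) times the sample mean of the squared estimated EIF values.

    psi, delta : per-unit values of psi-hat and delta-hat
    theta_hat  : point estimate of theta
    e_hat      : estimate of p_{z*d*} - k p_{01}
    """
    psi, delta = np.asarray(psi, dtype=float), np.asarray(delta, dtype=float)
    eif = (psi - theta_hat * delta) / e_hat
    return float(np.mean(eif ** 2) / len(eif))

def wald_ci(theta_hat, var_hat, z_crit=1.959964):
    """Wald-type interval; z_crit defaults to the 97.5% standard normal quantile."""
    half = z_crit * float(np.sqrt(var_hat))
    return theta_hat - half, theta_hat + half
```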

4.5 Estimation of natural mediation effects

Once we obtain $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime})}$ , estimators of $\text{PNIE}_{d_{1}d_{0}}$ and $\text{PNDE}_{d_{1}d_{0}}$ can be constructed based on (1). For example, we can construct $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{s}}=\widehat{\theta}^{(11),\text{s}}_{d_{1}d_{0}}-\widehat{\theta}^{(10),\text{s}}_{d_{1}d_{0}}$ and $\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{s}}=\widehat{\theta}^{(10),\text{s}}_{d_{1}d_{0}}-\widehat{\theta}^{(00),\text{s}}_{d_{1}d_{0}}$ , if either the moment-type method (s=a, b, c or d), multiply robust estimator (s=mr), or nonparametric efficient estimator (s=np) is used for $\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ . Analogously, ITT-NIE and ITT-NDE can be estimated using (4) by replacing $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ and $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ with their corresponding estimators. Specifically, we can construct estimators of ITT-NDE and ITT-NIE as $\widehat{\text{ITT-NDE}}^{\text{s}}=\displaystyle\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{dr}}\times\left(\widehat{\theta}^{(10),\text{s}}_{d_{1}d_{0}}-\widehat{\theta}^{(00),\text{s}}_{d_{1}d_{0}}\right)$ and $\widehat{\text{ITT-NIE}}^{\text{s}}=\displaystyle\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{dr}}\times\left(\widehat{\theta}^{(11),\text{s}}_{d_{1}d_{0}}-\widehat{\theta}^{(10),\text{s}}_{d_{1}d_{0}}\right)$ if either the moment-type method (s=a, b, c or d) or the multiply robust estimator (s=mr) is used for estimating $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ , where $\widehat{e}_{d_{1}d_{0}}^{\text{dr}}=\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}$ and $\mathcal{U}=\mathcal{U}_{\text{a}}$ and $\mathcal{U}_{\text{b}}$ under standard and strong monotonicity assumptions, respectively. In particular, the multiply robust estimators have the following explicit expressions

	$\displaystyle\widehat{\text{ITT-NDE}}^{\text{mr}}=$	$\displaystyle\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}-\widehat{\psi}_{d_{1}d_{0}}^{(00),\text{par}}\right\}\right],$		(6)
	$\displaystyle\widehat{\text{ITT-NIE}}^{\text{mr}}=$	$\displaystyle\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{par}}-\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}\right\}\right].$		(7)

Similarly, the nonparametric estimators $\widehat{\text{ITT-NDE}}^{\text{np}}$ and $\widehat{\text{ITT-NIE}}^{\text{np}}$ can be obtained by replacing all $\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{par}}$ in (6) and (7) with $\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}$ . In the Supplementary Material, we show that, for all $\tau\in\{\text{PNIE}_{d_{1}d_{0}},\text{PNDE}_{d_{1}d_{0}},\text{ITT-NIE},\text{ITT-NDE}\}$ , $\widehat{\tau}^{\text{np}}$ is consistent and semiparametrically efficient if conditions in Theorem 5 are satisfied and $\widehat{\tau}^{\text{mr}}$ is still quadruply robust and locally efficient when all working models in $h_{nuisance}$ are correctly specified. Details on inference are given in the Supplementary Material.

Although we primarily discuss mediation effects on a mean difference scale, all effects can be defined on other scales as needed. For example, with a binary outcome one can consider a risk ratio scale and use $\text{PNIE}_{d_{1}d_{0}}^{\text{RR}}=\theta_{d_{1}d_{0}}^{(11)}/\theta_{d_{1}d_{0}}^{(10)}$ and $\text{PNDE}_{d_{1}d_{0}}^{\text{RR}}=\theta_{d_{1}d_{0}}^{(10)}/\theta_{d_{1}d_{0}}^{(00)}$ to quantify the natural indirect and direct effects within principal stratum $U=d_{1}d_{0}$ . Similarly, one can use $\text{ITT-NIE}^{\text{RR}}=\mathbb{E}[Y_{1M_{1}}]/\mathbb{E}[Y_{1M_{0}}]$ and $\text{ITT-NDE}^{\text{RR}}=\mathbb{E}[Y_{1M_{0}}]/\mathbb{E}[Y_{0M_{0}}]$ to measure the natural indirect and direct effects among the entire study population. Estimation of ratio mediation effects is straightforward based on $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime})}$ , and is omitted for brevity.
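Given estimates of $\theta_{d_{1}d_{0}}^{(11)}$ , $\theta_{d_{1}d_{0}}^{(10)}$ , and $\theta_{d_{1}d_{0}}^{(00)}$ , the stratum-specific effects on both scales and the ITT effects follow by simple arithmetic. A sketch is given below; the dictionary layout keyed by $(z,z^{\prime})$ and the function names are assumptions made for illustration.

```python
def stratum_effects(theta):
    """theta maps (z, z') to an estimate of theta^{(zz')} within one stratum."""
    return {
        "PNIE": theta[(1, 1)] - theta[(1, 0)],
        "PNDE": theta[(1, 0)] - theta[(0, 0)],
        "PNIE_RR": theta[(1, 1)] / theta[(1, 0)],  # ratio scale, e.g. binary outcome
        "PNDE_RR": theta[(1, 0)] / theta[(0, 0)],
    }

def itt_effects(theta_by_stratum, e_by_stratum):
    """Weight stratum-specific differences by estimated stratum proportions."""
    nie = sum(e * (theta_by_stratum[u][(1, 1)] - theta_by_stratum[u][(1, 0)])
              for u, e in e_by_stratum.items())
    nde = sum(e * (theta_by_stratum[u][(1, 0)] - theta_by_stratum[u][(0, 0)])
              for u, e in e_by_stratum.items())
    return {"ITT-NIE": nie, "ITT-NDE": nde}
```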

5 A simulation study

We investigate the finite-sample performance of the proposed methods via simulation studies. We consider the following data generation process modified from that in Kang and Schafer (2007), in which the positivity assumptions are practically violated under model misspecification. Specifically, we generate 1000 Monte Carlo samples with $n=1000$ by the following process. We draw baseline covariates ${\bm{X}}=[X_{1},X_{2},X_{3},X_{4}]\sim N(\bm{0}_{4\times 1},\bm{I}_{4\times 4})$ , and

	$\displaystyle Z|{\bm{X}}\sim\text{Bernoulli}\left(\text{expit}([-1,0.5,-0.25,-0.1]^{T}{\bm{X}})\right),$
	$\displaystyle D|Z,{\bm{X}}\sim\text{Bernoulli}\left(\text{expit}(-1+2Z+[1,-0.8,0.6,-1]^{T}{\bm{X}})\right),$
	$\displaystyle M|D,Z,{\bm{X}}\sim\text{Bernoulli}\left(\text{expit}(-1.8+2Z+1.5D+[1,-0.5,0.9,-1]^{T}{\bm{X}})\right),$
	$\displaystyle Y|M,D,Z,{\bm{X}}\sim N\left(210+1.5Z-D+M+[27.4,13.7,13.7,13.7]^{T}{\bm{X}},1\right).$

In correctly specified parametric working models, we directly include the true baseline covariates ${\bm{X}}$ into each working model, where specifications of the working models are given in the Supplementary Material. Otherwise, we include a set of transformed covariates, $\widetilde{{\bm{X}}}=[\widetilde{X}_{1},\widetilde{X}_{2},\widetilde{X}_{3},\widetilde{X}_{4}]$ , into misspecified working models, where $\widetilde{X}_{1}=\exp(0.5X_{1})$ , $\widetilde{X}_{2}=\frac{X_{2}}{1+X_{1}}$ , $\widetilde{X}_{3}=\left(\frac{X_{2}X_{3}}{25}+0.6\right)^{3}$ , and $\widetilde{X}_{4}=\left(X_{2}+X_{4}+20\right)^{2}$ . We evaluate each of the proposed moment-type estimators and multiply robust estimators under 6 different scenarios: (i) all components in $h_{nuisance}^{\text{par}}$ are correctly specified; (ii) only $\pi_{z}^{\text{par}}({\bm{x}})$ is misspecified; (iii) only $p_{zd}^{\text{par}}({\bm{x}})$ is misspecified; (iv) only $r_{zd}^{\text{par}}(m,{\bm{x}})$ is misspecified; (v) only $\mu_{zd}^{\text{par}}(m,{\bm{x}})$ is misspecified; (vi) all components in $h_{nuisance}^{\text{par}}$ are misspecified.
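For readers who wish to reproduce the setup, the four conditional distributions displayed above can be drawn as in the sketch below. It generates observed data only (it does not construct the potential values $D_{1}$ and $D_{0}$ ), and the function name `simulate` and the seeding scheme are illustrative assumptions.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def simulate(n=1000, seed=0):
    """Draw one dataset (X, Z, D, M, Y) from the displayed data generation process."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 4))  # X ~ N(0, I_4)
    Z = rng.binomial(1, expit(X @ np.array([-1, 0.5, -0.25, -0.1])))
    D = rng.binomial(1, expit(-1 + 2 * Z + X @ np.array([1, -0.8, 0.6, -1])))
    M = rng.binomial(1, expit(-1.8 + 2 * Z + 1.5 * D
                              + X @ np.array([1, -0.5, 0.9, -1])))
    mean_y = 210 + 1.5 * Z - D + M + X @ np.array([27.4, 13.7, 13.7, 13.7])
    Y = rng.normal(mean_y, 1.0)      # unit-variance Gaussian outcome
    return X, Z, D, M, Y
```

Calling `simulate(n=1000, seed=s)` for 1000 distinct values of `s` would give Monte Carlo replicates of the size used in this section.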

For the nonparametric estimator, we consider a five-fold cross-fitting with the nuisance functions estimated by the Super Learner (Van der Laan et al., 2007) with a combination of random forest and generalized linear models libraries. Although the Super Learner is more flexible than parametric working models, its performance still depends on the quality of the input feature matrix. In each of Scenarios (i)–(vi), we use the true covariates ${\bm{X}}$ as the feature matrix under the correctly specified nuisance scenario and the transformed covariates $\widetilde{{\bm{X}}}$ as the feature matrix under the misspecified nuisance scenario.

Figure 2: Simulation results for estimators of $\theta_{10}^{(10)}$ among 6 different scenarios with sample size $n=1,000$ . Scenarios (i)–(vi) are described in Section 5. The horizontal line in each panel is the true value of $\theta_{10}^{(10)}$ . The yellow highlighted boxplots indicate that the corresponding estimators are expected to be consistent by large-sample theory.

Figure 2 presents the boxplots of different estimators of $\theta_{10}^{(10)}=\mathbb{E}[Y_{1M_{0}}|U=10]$ over 1000 Monte Carlo simulations, with each panel corresponding to a specific simulation scenario. As expected, the moment-type estimators are centered around the true value if the required parametric working models are all correctly specified but may diverge from the true value otherwise. The multiply robust estimator exhibits minimal bias in Scenarios (i)–(v), confirming the quadruply robust property; however, it exhibits bias when all of the working models are misspecified, as demonstrated in Scenario (vi). The nonparametric efficient estimator performs fairly well with minimal bias in Scenarios (i)–(v), and its bias in Scenario (vi) is also smaller than that of the multiply robust estimator with parametric working models. For each scenario, we also investigate the 95% Wald-type confidence interval coverage rate in Table 1, where the variance is estimated by bootstrap for the moment-type and multiply robust estimators and by the empirical variance of the EIF for the nonparametric method. We observe that both $\widehat{\theta}_{10}^{(10),\text{mr}}$ and $\widehat{\theta}_{10}^{(10),\text{np}}$ present close to nominal coverage in Scenarios (i)–(v), but their coverage rates are attenuated in Scenario (vi) due to misspecification. Moreover, the moment estimator $\widehat{\theta}_{10}^{(10),\text{a}}$ appears to have nominal coverage except for Scenario (iii), likely due to the over-estimation of the true sampling variance for this weighting estimator under misspecified weights. We also evaluate estimators of $\theta_{11}^{(10)}$ and $\theta_{00}^{(10)}$ and results are qualitatively similar. The detailed additional simulation results are provided in the Supplementary Material Figures S1–S2.

Table 1: Simulation results of the 95% confidence interval coverage rate among different estimators of $\theta_{10}^{(10)}$ . Scenarios (i)–(vi) are described in Section 5. The numbers displayed in bold signify that the corresponding estimators are expected to be consistent by large-sample theory.

Scenario	$\widehat{\theta}_{10}^{(10),\text{a}}$	$\widehat{\theta}_{10}^{(10),\text{b}}$	$\widehat{\theta}_{10}^{(10),\text{c}}$	$\widehat{\theta}_{10}^{(10),\text{d}}$	$\widehat{\theta}_{10}^{(10),\text{mr}}$	$\widehat{\theta}_{10}^{(10),\text{np}}$
(i) All nuisance correctly specified	0.951	0.910	0.924	0.930	0.929	0.969
(ii) misspecified $\pi_{z}^{\text{par}}({\bm{x}})$	0.955	0.874	0.941	0.921	0.932	0.953
(iii) misspecified $p_{zd}^{\text{par}}({\bm{x}})$	0.871	0.926	0.738	0.717	0.918	0.951
(iv) misspecified $r_{zd}^{\text{par}}(m,{\bm{x}})$	0.938	0.907	0.921	0.932	0.943	0.953
(v) misspecified $\mu_{zd}^{\text{par}}(m,{\bm{x}})$	0.954	0.873	0.799	0.748	0.946	0.958
(vi) All nuisance misspecified	0.953	0.657	0.919	0.870	0.732	0.812

6 Two real-data applications

6.1 A job training program with noncompliance

JOBS II is a randomized field experiment among 1,801 unemployed workers to examine the effect of a job training workshop designed to promote mental health and high-quality reemployment (Price et al., 1992). Participants in the treatment group ( $Z=1$ ) were assigned to a job skills workshop, but 45% of them did not show up and therefore effectively remained in the control condition. Let $D$ be the indicator of whether the individual attends the workshop, where $D=0$ among all participants in the control group because they had no access to the workshops. As strong monotonicity holds by design, we have two principal strata: compliers and never-takers. In the JOBS II study, it is of interest to assess the effect of attending the job skills workshop ( $Z$ ) on depression ( $Y$ ) among the compliers, which quantifies the efficacy of the training program for mental health (VanderWeele, 2011). Previous efforts (e.g., Park and Kürüm, 2020) have also investigated the role of sense of mastery ( $M$ ) in mediating the effect of the job skills workshop ( $Z$ ) on depression ( $Y$ ) in the complier stratum, typically under the exclusion restriction assumption. Because JOBS II is not double-blinded, the exclusion restriction may not hold due to psychological effects (Park and Kürüm, 2020; Stuart and Jo, 2015). For example, Stuart and Jo (2015) pointed out that participants assigned to the workshop may feel more optimistic about their reemployment opportunity, suggesting direct pathways from the assignment to depression that do not operate through their actual attendance status.

We assess causal mediation under our proposed assumptions, which permit the exploration of causal mechanisms among never-takers. The mediator is sense of mastery at 6 weeks after randomization, with $M=1$ indicating a higher sense of mastery. The outcome is a continuous measure of depression at 6 months after randomization, which ranges from 1 to 4, with a higher value indicating worse depression. Baseline covariates ( $\bm{X}$ ) include age, gender, race, marital status, education, assertiveness, level of economic hardship, level of depression, and motivation. For the moment-type and multiply robust estimators, we used the parametric working models described in the Supplementary Material for the nuisance functions. Of note, the propensity score is known by randomization, and therefore the working logistic regression of $\pi_{z}^{\text{par}}({\bm{X}})$ is not subject to misspecification; we still include all baseline covariates in this working logistic regression to adjust for chance imbalance. For the nonparametric efficient estimator, we used Super Learner with the random forest and generalized linear model libraries for estimating the nuisance functions. We only present the results from the multiply robust estimator and nonparametric efficient estimator; complete results from other estimators are in Supplementary Material Tables S1–S2.

Table 2: Estimated causal effects and 95% confidence intervals in the JOBS II study

Population	Estimand	mr	np	Rudolph et al.^§
Overall	ITT-NIE	$-$ 0.015 ( $-$ 0.032, $-$ 0.003)	$-$ 0.017 ( $-$ 0.031, $-$ 0.004)	$-$ 0.017 ( $-$ 0.031, $-$ 0.004)
	ITT-NDE	$-$ 0.072 ( $-$ 0.151, $-$ 0.001)	$-$ 0.067 ( $-$ 0.146, 0.013)	$-$ 0.067 ( $-$ 0.146, 0.013)
	ITT	$-$ 0.088 ( $-$ 0.167, $-$ 0.016)	$-$ 0.084 ( $-$ 0.162, $-$ 0.005)	$-$ 0.084 ( $-$ 0.162, $-$ 0.005)
Compliers	PNIE	$-$ 0.026 ( $-$ 0.055, 0.006)	$-$ 0.029 ( $-$ 0.052, $-$ 0.006)	$-$ 0.030 ( $-$ 0.053, $-$ 0.006)
	PNDE	$-$ 0.083 ( $-$ 0.170, 0.002)	$-$ 0.066 ( $-$ 0.156, 0.023)	$-$ 0.115 ( $-$ 0.251, 0.022)
	PCE	$-$ 0.109 ( $-$ 0.191, $-$ 0.027)	$-$ 0.096 ( $-$ 0.182, $-$ 0.009)	$-$ 0.145 ( $-$ 0.280, $-$ 0.008)
Never-takers	PNIE	0.000 ( $-$ 0.006, 0.006)	$-$ 0.001 ( $-$ 0.004, 0.003)	–
	PNDE	$-$ 0.058 ( $-$ 0.160, 0.031)	$-$ 0.066 ( $-$ 0.163, 0.032)	–
	PCE	$-$ 0.058 ( $-$ 0.160, 0.030)	$-$ 0.067 ( $-$ 0.163, 0.032)	–

$\mathsection$ ‘Rudolph et al.’ is the nonparametric efficient estimator in Rudolph et al. (2024); see Remark 1 for more details of this approach. Their identification formulas of the ITT natural mediation effects are identical to the present work; this explains the numerical equivalence between ‘np’ and ‘Rudolph et al.’ for the ITT analysis. All effects among never-takers given by Rudolph et al. (2024) are zero due to exclusion restriction.

Table 2 (upper panel) presents the estimated ITT effect and its decomposition into indirect and direct effects. The multiply robust and nonparametric efficient estimators give similar results: the ITT and ITT-NIE estimates are negative with 95% confidence intervals that exclude zero, suggesting that assignment to the job skills workshop reduces depression and that sense of mastery mediates part of this effect. However, the ITT estimands do not reveal potential heterogeneity of mediation effects between compliers and never-takers. The estimated proportions of compliers and never-takers are 55% and 45%, respectively (under the nonparametric efficient estimator). We present the stratum-specific mean and standard deviation of the baseline characteristics in Supplementary Material Table S3. Compared to the never-takers, the compliers are older with higher education; a larger fraction of compliers are female, white, and married, and they are more motivated to participate in the study but less assertive. To offer a complete picture of the mediation mechanism for compliers and never-takers, Table 2 (middle and bottom panels) additionally presents the estimated PCEs, together with their mediation effect decomposition. For the compliers stratum, both the multiply robust and nonparametric efficient estimators suggest that the JOBS II intervention exerts a statistically significant effect on reducing depression, and roughly one quarter to one third of the $\text{PCE}_{10}$ (24% and 30% of the respective point estimates) can be explained by the improvement in sense of mastery. For the never-takers, the point estimate is smaller but still in the beneficial direction, although the 95% confidence interval for $\text{PCE}_{00}$ includes zero. For both the multiply robust and nonparametric efficient estimators, the indirect effect among the never-takers is estimated to be almost zero (e.g., $\widehat{\text{PNIE}}_{00}^{\text{np}}=-0.001$ with 95% confidence interval $[-0.004,0.003]$ ).
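
The share of the complier effect attributed to the mediator can be reproduced directly from the point estimates in Table 2. The short sketch below is only a worked arithmetic check on the difference scale; the dictionary layout and the function name `proportion_mediated` are illustrative assumptions.

```python
# Complier point estimates copied from Table 2 (difference scale).
compliers = {
    "mr": {"PNIE": -0.026, "PNDE": -0.083, "PCE": -0.109},
    "np": {"PNIE": -0.029, "PNDE": -0.066, "PCE": -0.096},
}


def proportion_mediated(effects):
    """Ratio of the indirect effect to the total effect within a stratum."""
    return effects["PNIE"] / effects["PCE"]


shares = {method: proportion_mediated(e) for method, e in compliers.items()}
for method, share in shares.items():
    print(f"{method}: proportion mediated = {share:.2f}")
```

Because the tabulated values are rounded to three decimals, the indirect and direct estimates need not add up exactly to the reported PCE.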

We also compare our results to estimates under the exclusion restriction. Table 2 (right column) shows the estimates using the approach developed by Rudolph et al. (2024) (see Remark 1 for more details of this approach). Under the exclusion restriction, all mediation effects among never-takers are assumed to be zero. We observe that the point and interval estimates of $\text{PNIE}_{10}$ under the exclusion restriction are close to their counterparts under principal ignorability, which is anticipated because the PNIE estimate among never-takers is close to zero under principal ignorability. However, the point estimates of $\text{PNDE}_{10}$ and $\text{PCE}_{10}$ under the exclusion restriction are larger in magnitude than those under principal ignorability.

6.2 An epidemiological study with an intercurrent event

We re-analyze the World Health Organization’s Large Analysis and Review of European Housing and Health Status (WHO-LARES) study with 5,882 individuals for the effect of living in damp or moldy conditions ( $Z=1$ if yes and 0 if no) on depression ( $Y=1$ if yes and 0 if no), where perceived control on one’s home ( $M$ ) is the mediator of interest (VanderWeele and Vansteelandt, 2010). However, some individuals developed dampness or mold related diseases ( $D=1$ if yes and 0 otherwise). Steen et al. (2017) viewed $D$ as another mediator prior to $M$ and assessed the path-specific effects through $D$ and/or $M$ . To provide a complementary perspective, we view $D$ as an intercurrent event and partition the population into four strata: the doomed stratum ( $U=11$ ) including those who would always be diseased regardless of living conditions, the immune stratum ( $U=00$ ) including those who would never be diseased, the harmed stratum ( $U=10$ ) including those who would be diseased only if living in damp or moldy conditions, and the benefiters stratum ( $U=01$ ) including those who would be diseased only if not living in damp or moldy conditions. Among them, the doomed and harmed strata are two subpopulations of typical interest because their physical health is more sensitive to living conditions. That is, studying their treatment effects can help understand the impact of living in damp or moldy conditions on mental health among the more physically vulnerable subgroups. As a further step, addressing the principal natural mediation effects can uncover the extent to which this impact is attributed to the perceived control on one’s home.

We consider the standard monotonicity assumption to rule out benefiters, which is plausible because living in damp or moldy conditions would generally only make an individual more likely to develop dampness or mold related diseases. The exclusion restriction is unlikely to hold because living in damp or moldy conditions can still directly affect mental health even in the absence of dampness or mold related diseases. We adjust for the following confounders: gender, age, marital status, education, employment, smoking, home ownership, home size, crowding (number of residents per room), heating, and natural light. The proportions of the doomed, harmed, and immune strata based on the nonparametric efficient estimator are 51%, 8%, and 41%, respectively. We summarize the stratum-specific mean and standard deviation of the baseline characteristics in Supplementary Material Table S4. The doomed stratum includes the most females, followed by the harmed stratum, whereas the immune stratum includes the fewest. Compared to the doomed and harmed strata, members of the immune stratum are more likely to be married and employed; they are also more satisfied with the heating system and natural light condition in their dwellings.
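
To make the link between principal scores, stratum proportions, and stratum-specific baseline summaries concrete, the sketch below uses simple plug-in formulas under standard monotonicity. It assumes that $p_{z1}({\bm{X}})$ denotes $\Pr(D=1|Z=z,{\bm{X}})$; the function names and toy inputs are hypothetical, and the numbers reported above come from the nonparametric efficient estimator rather than from this plug-in calculation.

```python
import numpy as np


def principal_scores(p11, p01):
    """Principal scores when benefiters are ruled out by monotonicity.

    p11[i] = P(D=1 | Z=1, X_i) and p01[i] = P(D=1 | Z=0, X_i).
    """
    p11 = np.asarray(p11, dtype=float)
    p01 = np.asarray(p01, dtype=float)
    return {"doomed": p01, "harmed": p11 - p01, "immune": 1.0 - p11}


def stratum_summary(scores, covariate):
    """Plug-in stratum proportion and principal-score-weighted covariate mean."""
    covariate = np.asarray(covariate, dtype=float)
    summary = {}
    for name, e in scores.items():
        summary[name] = {
            "proportion": float(e.mean()),
            "covariate_mean": float(np.sum(e * covariate) / np.sum(e)),
        }
    return summary


# Toy inputs (hypothetical, not the WHO-LARES data).
p11 = np.array([0.7, 0.6, 0.5, 0.6])
p01 = np.array([0.6, 0.5, 0.4, 0.5])
age = np.array([30.0, 40.0, 50.0, 60.0])
summary = stratum_summary(principal_scores(p11, p01), age)
```

The weighted mean relies on the identity $\mathbb{E}[X|U=u]=\mathbb{E}[e_{u}({\bm{X}})X]/\mathbb{E}[e_{u}({\bm{X}})]$, where $e_{u}({\bm{X}})$ is the principal score of stratum $u$.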

Table 3: Estimated causal effects and 95% confidence intervals in the WHO-LARES study.

Population	Estimand	mr	np^§
Overall	ITT-NIE ${}^{\text{RR}}$	1.021 (1.003, 1.043)	1.031 (1.010, 1.053)
	ITT-NDE ${}^{\text{RR}}$	1.248 (1.114, 1.405)	1.219 (1.078, 1.361)
	ITT ${}^{\text{RR}}$	1.274 (1.137, 1.438)	1.257 (1.114, 1.400)
Doomed	PNIE ${}^{\text{RR}}_{11}$	1.017 (0.999, 1.034)	1.025 (1.001, 1.050)
	PNDE ${}^{\text{RR}}_{11}$	1.223 (1.044, 1.403)	1.181 (1.014, 1.348)
	PCE ${}^{\text{RR}}_{11}$	1.244 (1.070, 1.433)	1.212 (1.044, 1.379)
Harmed	PNIE ${}^{\text{RR}}_{10}$	1.029 (1.001, 1.059)	1.046 (1.013, 1.079)
	PNDE ${}^{\text{RR}}_{10}$	2.296 (1.841, 2.982)	2.142 (1.724, 2.560)
	PCE ${}^{\text{RR}}_{10}$	2.363 (1.897, 3.045)	2.241 (1.817, 2.666)
Immune	PNIE ${}^{\text{RR}}_{00}$	1.025 (0.983, 1.074)	1.035 (0.995, 1.075)
	PNDE ${}^{\text{RR}}_{00}$	1.102 (0.911, 1.328)	1.111 (0.881, 1.340)
	PCE ${}^{\text{RR}}_{00}$	1.129 (0.945, 1.336)	1.150 (0.916, 1.385)

$\mathsection$ Based on the nonparametric efficient method, the contrasts (and 95% confidence intervals) between the PNDE in different principal strata pairs are $\log(\widehat{\text{PNDE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{00}^{\text{RR}})=0.657$ $(0.344,0.967)$ , $\log(\widehat{\text{PNDE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{11}^{\text{RR}})=0.595$ $(0.315,0.874)$ , and $\log(\widehat{\text{PNDE}}_{11}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{00}^{\text{RR}})=0.061$ $(-0.189,0.311)$ , respectively. The contrasts (and 95% confidence intervals) between the PNIE in different principal strata pairs are $\log(\widehat{\text{PNIE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNIE}}_{00}^{\text{RR}})=0.011$ $(-0.039,0.061)$ , $\log(\widehat{\text{PNIE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNIE}}_{11}^{\text{RR}})=0.020$ $(-0.033,0.072)$ , $\log(\widehat{\text{PNIE}}_{11}^{\text{RR}})-\log(\widehat{\text{PNIE}}_{00}^{\text{RR}})=-0.009$ $(-0.055,0.036)$ , respectively.

Table 3 presents the results based on the multiply robust and nonparametric efficient estimators. With a binary outcome, we define all causal estimands on the risk ratio scale. Results based on moment-type estimators are given in Supplementary Material Tables S5–S6 and exhibit similar patterns. Table 3 (upper panel) presents the ITT natural mediation effects and suggests that living in damp or moldy conditions has a causal effect on elevating the risk of depression; the 95% confidence intervals for the ITT, ITT-NIE and ITT-NDE estimands all exclude the null. Table 3 (lower panels) presents the principal natural mediation effects. For the harmed stratum, whose members are most sensitive to living conditions, we observe a large PNDE (risk ratio $>2$ ), but PNDEs are smaller in the other strata (risk ratio $<1.3$ ). We further obtain the difference in log PNDEs across the three subgroups using the nonparametric efficient estimator, and confirm that the PNDE within the harmed stratum is substantially different from that within the other two strata. For example, $\log(\widehat{\text{PNDE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{00}^{\text{RR}})=0.657$ with 95% confidence interval (0.344, 0.967) and $\log(\widehat{\text{PNDE}}_{10}^{\text{RR}})-\log(\widehat{\text{PNDE}}_{11}^{\text{RR}})=0.595$ with 95% confidence interval (0.315, 0.874). On the other hand, the PNIEs are rather comparable in magnitude across strata. Based on the nonparametric efficient estimator, although only the 95% confidence intervals of the PNIE in the doomed and harmed strata exclude the null, the 95% confidence interval for each pairwise difference in log PNIE includes the null. Finally, the proportion mediated varies across principal strata. That is, the perceived control on one’s home, as a mediator, explains the largest fraction of PCE ${}^{\text{RR}}_{00}$ in the immune stratum ( $\log(\text{PNIE}^{\text{RR}}_{00})$ / $\log(\text{PCE}^{\text{RR}}_{00})\approx 25\%$ ), and explains the smallest fraction of PCE ${}^{\text{RR}}_{10}$ in the harmed stratum ( $\log(\text{PNIE}^{\text{RR}}_{10})$ / $\log(\text{PCE}^{\text{RR}}_{10})\approx 6\%$ ).
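
The log-scale proportions mediated and the PNDE contrast quoted above can be checked against the rounded point estimates in the ‘np’ column of Table 3. This is a worked arithmetic sketch only; the dictionary layout and function name are illustrative assumptions, and small discrepancies from the reported contrasts are expected because the table entries are rounded.

```python
import math

# Point estimates copied from the 'np' column of Table 3 (risk ratio scale).
np_estimates = {
    "doomed": {"PNIE": 1.025, "PNDE": 1.181, "PCE": 1.212},
    "harmed": {"PNIE": 1.046, "PNDE": 2.142, "PCE": 2.241},
    "immune": {"PNIE": 1.035, "PNDE": 1.111, "PCE": 1.150},
}


def log_scale_share(effects):
    """Proportion mediated on the log risk ratio scale."""
    return math.log(effects["PNIE"]) / math.log(effects["PCE"])


shares = {stratum: log_scale_share(e) for stratum, e in np_estimates.items()}

# Contrast of log PNDE between the harmed and immune strata.
pnde_contrast = (math.log(np_estimates["harmed"]["PNDE"])
                 - math.log(np_estimates["immune"]["PNDE"]))

for stratum, share in shares.items():
    print(f"{stratum}: log-scale proportion mediated = {share:.3f}")
print(f"harmed vs immune log PNDE contrast = {pnde_contrast:.3f}")
```

The tabulated estimates satisfy PNIE $\times$ PNDE $\approx$ PCE on the risk ratio scale, so the decomposition is additive after taking logarithms, which is why the ratio of logs is used as the proportion mediated.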

As an additional exploratory comparison, we also carry out moderated mediation analysis with respect to baseline covariates. We evaluate the conditional natural indirect and direct effects on a risk ratio scale given each covariate, using the R package moderate.mediation (Qin and Wang, 2024). For each covariate considered, we partition the covariate vector into the moderator of interest and all remaining covariates, which serve as confounding adjustment variables. Next, we fit logistic models for the mediator and outcome to assess conditional mediation based on the identification formulas given in Qin and Wang (2024). The conditional natural (in)direct effects are summarized in Supplementary Material Figure S3 and Table S7. The results suggest that the mediation effect heterogeneity across different covariate levels is milder than the heterogeneity across principal strata found in the principal stratification mediation analysis. An important distinction of the moderated mediation analysis from our proposed methods is that the former fails to address $D$ as a potential post-treatment confounder, and may be biased even for quantifying the conditional mediation effect estimands.

7 A framework for sensitivity analysis

The principal ignorability (Assumption 4) and ignorability of the mediator (Assumption 5) are two crucial assumptions for identification of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ . These two assumptions, however, cannot be empirically verified. Sensitivity analysis is therefore a useful tool to assess causal effects under assumed violations of these assumptions. In the Supplementary Material, we develop a semiparametric sensitivity analysis framework to assess the impact of violation of Assumption 4 and Assumption 5 on inference about $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ and mediation effects. The proposed sensitivity analysis strategy relies on the confounding function approach (Tchetgen Tchetgen and Shpitser, 2012; Ding and Lu, 2017). Once the confounding functions are developed, we further provide a multiply robust estimator for $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ and natural mediation effects and prove its large-sample properties, assuming a known confounding function. In practice, the confounding function is unknown and users can specify a working sensitivity function with interpretable sensitivity parameters and then report the causal estimates under a range of values of sensitivity parameters, in order to identify tipping points that might reverse the causal conclusions. In the Supplementary Material, we also illustrate the proposed sensitivity analysis methods in the context of the JOBS II study.

8 Discussion

In this article, we consider a set of new identification assumptions for studying the natural mediation effects across several principal strata. This provides an important complementary perspective to existing methods that view $D$ as either a post-treatment confounder or another mediator, and enables the investigation of mediation mechanisms within subpopulations. We then derive the EIF for the principal natural mediation effects and further propose a quadruply robust estimator. Finally, a nonparametric extension has been developed to alleviate parametric model misspecification bias and to achieve efficient estimation.

While each principal stratum often represents a scientifically relevant subpopulation, the stratum membership is not always fully observed for each individual in a particular study, leading to potential barriers in optimizing future interventions to target subpopulations. Although there is no consensus on how to mitigate such barriers, we offer three considerations that may improve the policy relevance of addressing mediation across principal strata. First, as a routine practice, we recommend summarizing the baseline characteristics for each stratum to help distinguish partially observed subpopulations in measurable dimensions. Given that baseline summaries are widely available from published social science and biomedical studies, summary statistics such as those in Supplementary Material Tables S3 and S4 (applications in Sections 6.1 and 6.2) can facilitate a direct comparison to existing study populations and help determine the relevance of the current results to alternative populations. Importantly, these summary statistics can be readily obtained once the principal score is estimated, and an example case study can also be found in Section 5.2 of Cheng et al. (2023a). Second, special study design features may enable an explicit characterization of the endogenous subgroups. As a concrete example, randomization plus strong monotonicity (design features of the JOBS II study in Section 6.1) ensure that individuals attending the workshop and those not attending the workshop in the treatment group are unbiased representations of the compliers and never-takers in the entire study, and evidence about their principal natural mediation effects serves to improve interventions that can at least target individuals in the treatment group. Third, for individuals with unobserved stratum membership, the estimated principal score model is useful for membership prediction. In the noncompliance scenario, Kennedy et al. (2020) discussed a two-stage treatment policy, where in the first stage one predicts the compliance status (for example, based on estimated principal scores), and in the second stage, one recommends the optimal intervention in each predicted stratum. Predicting principal stratum membership was also an intermediate step in Chen et al. (2024), who quantified conditional average treatment effects among the partially observed always-survivors in the truncation-by-death setting. Although the optimal methods for predicting membership and the best practice for operationalizing a multi-stage treatment policy remain important topics for future research, we believe this perspective continues to endorse the value of the principal natural mediation effect estimates for informing improved interventions to target partially observed subpopulations.

This article addresses a univariate mediator. When $M$ is multi-dimensional with several continuous components, it may be cumbersome to leverage the EIF in Theorem 3 to assess mediation, because one needs to estimate a multi-dimensional density $r_{zd}(m,{\bm{X}})$ and to further calculate a multi-dimensional integral $\eta_{zz^{\prime}}({\bm{X}})=\int_{m}\mu_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})dm$ . These challenges may be mitigated by reparametrizing the nuisance functions in the EIF (Díaz et al., 2021; Zhou, 2022). For example, one can retain the current parameterization of $\delta_{d_{1}d_{0}}({\bm{O}})$ but re-express $\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})$ as $\psi_{d_{1}d_{0}}^{(zz^{\prime}),\dagger}({\bm{O}})$ , given by

$$\begin{aligned}
&\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\eta_{zz^{\prime}}^{\dagger}({\bm{X}})\\
&+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\frac{g_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})\kappa_{d_{z^{\prime}}}(M,{\bm{X}})}{g_{zd_{z}}(M,{\bm{X}})\kappa_{d_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\\
&+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}^{\dagger}({\bm{X}})\right\}\\
&+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\eta_{zz^{\prime}}^{\dagger}({\bm{X}}).
\end{aligned}$$

Here, $\psi_{d_{1}d_{0}}^{(zz^{\prime}),\dagger}({\bm{O}})$ depends on a set of nuisance functions $h_{\text{nuisance}}^{\dagger}=\{\pi_{z}({\bm{X}}),p_{zd}({\bm{X}}),\mu_{zd}(M,{\bm{X}}),\kappa_{d}(M,{\bm{X}}),g_{zd}(M,{\bm{X}}),\eta_{zz^{\prime}}^{\dagger}({\bm{X}})\}$ , where the first three are identical to those in $h_{\text{nuisance}}$ , $\kappa_{d}(M,{\bm{X}})=f_{D|M,{\bm{X}}}(d|M,{\bm{X}})$ and $g_{zd}(M,{\bm{X}})=f_{Z|D,M,{\bm{X}}}(z|d,M,{\bm{X}})$ are two conditional probabilities, and $\eta_{zz^{\prime}}^{\dagger}({\bm{X}})=\mathbb{E}[\mu_{zd_{z}}(M,{\bm{X}})|Z=z^{\prime},D=d_{z^{\prime}},{\bm{X}}]$ is a nested expectation that can be estimated by regressing $\widehat{\mu}_{zd_{z}}(M,{\bm{X}})$ on ${\bm{X}}$ within the stratum $\{Z=z^{\prime},D=d_{z^{\prime}}\}$ . Notice that this alternative set of nuisance functions only involves one-dimensional conditional expectations or probabilities regardless of the dimensionality of $M$ , and has the potential to simplify the modeling process. The semiparametric efficient estimator based on the reparameterized EIF can now be defined as $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\dagger}=\mathbb{P}_{n}\left[\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\dagger}(\bm{O})\right]\Big/\mathbb{P}_{n}\left[\widehat{\delta}_{d_{1}d_{0}}(\bm{O})\right]$ , whose asymptotic properties and finite-sample performance require future work.
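
The weight that multiplies the outcome residual in the reparameterized expression can be traced back to Bayes' rule. The sketch below assumes, consistent with the definitions in this section, that $r_{zd}(m,{\bm{X}})$ is the conditional density of $M$ given $(Z=z,D=d,{\bm{X}})$, that $p_{zd}({\bm{X}})=f_{D|Z,{\bm{X}}}(d|z,{\bm{X}})$, and that $\pi_{z}({\bm{X}})=f_{Z|{\bm{X}}}(z|{\bm{X}})$; the marginal density $f_{M|{\bm{X}}}$ is introduced only for the derivation and cancels in the ratio.

```latex
\begin{align*}
r_{zd}(m,\bm{X})
  &= \frac{f_{Z,D|M,\bm{X}}(z,d|m,\bm{X})\, f_{M|\bm{X}}(m|\bm{X})}
          {f_{Z,D|\bm{X}}(z,d|\bm{X})}
   = \frac{g_{zd}(m,\bm{X})\,\kappa_{d}(m,\bm{X})\, f_{M|\bm{X}}(m|\bm{X})}
          {p_{zd}(\bm{X})\,\pi_{z}(\bm{X})}, \\[4pt]
\frac{r_{z'd_{z'}}(M,\bm{X})}{r_{zd_{z}}(M,\bm{X})}
  &= \frac{g_{z'd_{z'}}(M,\bm{X})\,\kappa_{d_{z'}}(M,\bm{X})}
          {g_{zd_{z}}(M,\bm{X})\,\kappa_{d_{z}}(M,\bm{X})}
     \times
     \frac{p_{zd_{z}}(\bm{X})\,\pi_{z}(\bm{X})}
          {p_{z'd_{z'}}(\bm{X})\,\pi_{z'}(\bm{X})}.
\end{align*}
```

Under this reading, a weight of the form $\mathbb{I}(D=d_{z},Z=z)/\{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})\}$ times the density ratio reduces to the weight in the second term of the reparameterized expression, because the factor $p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})$ cancels; no density of $M$ then needs to be modeled.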

To support the implementation of the proposed methodology, we have developed the psmediate R package along with a brief vignette, which can be accessed at https://github.com/chaochengstat/psmediate and https://rpubs.com/chaocheng/psmediate.

Acknowledgement

This work is partially supported by the Patient-Centered Outcomes Research Institute^® (PCORI^® Award ME-2023C1-31350). We thank the World Health Organization’s European Centre for Environment and Health, Bonn office, for providing the WHO-LARES data. We thank Johan Steen for connecting us with the Bonn office to apply for data access. The statements in this article are solely the responsibility of the authors and do not necessarily represent the views of PCORI^® or the World Health Organization.

References

Albert and Nelson (2011) Albert, J. M. and Nelson, S. (2011), “Generalized causal mediation analysis,” Biometrics, 67, 1028–1038.
Angrist et al. (1996) Angrist, J., Imbens, G., and Rubin, D. (1996), “Identification of causal effects using instrumental variables,” Journal of the American Statistical Association, 91, 444–455.
Bickel et al. (1993) Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993), Efficient and Adaptive Estimation for Semiparametric Models, Springer.
Chen et al. (2024) Chen, X., Harhay, M. O., Tong, G., and Li, F. (2024), “A Bayesian machine learning approach for estimating heterogeneous survivor causal effects: applications to a critical care trial,” The Annals of Applied Statistics, 18, 350.
Chen and White (1999) Chen, X. and White, H. (1999), “Improved rates and asymptotic normality for nonparametric neural network estimators,” IEEE Transactions on Information Theory, 45, 682–691.
Cheng et al. (2023a) Cheng, C., Guo, Y., Liu, B., Wruck, L., and Li, F. (2023a), “Multiply robust estimation for causal survival analysis with treatment noncompliance,” arXiv preprint arXiv:2305.13443.
Cheng et al. (2021) Cheng, C., Spiegelman, D., and Li, F. (2021), “Estimating the natural indirect effect and the mediation proportion via the product method,” BMC Medical Research Methodology, 21, 1–20.
Cheng et al. (2023b) — (2023b), “Is the product method more efficient than the difference method for assessing mediation?” American Journal of Epidemiology, 192, 84–92.
Chernozhukov et al. (2018) Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal, 21.
Daniel et al. (2015) Daniel, R. M., De Stavola, B. L., Cousens, S., and Vansteelandt, S. (2015), “Causal mediation analysis with multiple mediators,” Biometrics, 71, 1–14.
Díaz et al. (2021) Díaz, I., Hejazi, N. S., Rudolph, K. E., and van Der Laan, M. J. (2021), “Nonparametric efficient causal mediation with intermediate confounders,” Biometrika, 108, 627–641.
Ding and Lu (2017) Ding, P. and Lu, J. (2017), “Principal stratification analysis using principal scores,” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 757–777.
Forastiere et al. (2018) Forastiere, L., Mattei, A., and Ding, P. (2018), “Principal ignorability in mediation analysis: through and beyond sequential ignorability,” Biometrika, 105, 979–986.
Frangakis and Rubin (2002) Frangakis, C. E. and Rubin, D. B. (2002), “Principal stratification in causal inference,” Biometrics, 58, 21–29.
Frölich and Huber (2017) Frölich, M. and Huber, M. (2017), “Direct and indirect treatment effects–causal chains and mediation analysis with instrumental variables,” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 1645–1666.
Frölich and Melly (2013) Frölich, M. and Melly, B. (2013), “Identification of treatment effects on the treated with one-sided non-compliance,” Econometric Reviews, 32, 384–414.
Hines et al. (2022) Hines, O., Dukes, O., Diaz-Ordaz, K., and Vansteelandt, S. (2022), “Demystifying statistical learning based on efficient influence functions,” The American Statistician, 76, 292–304.
Imai et al. (2010) Imai, K., Keele, L., and Yamamoto, T. (2010), “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects,” Statistical Science, 25, 51–71.
Jiang et al. (2022) Jiang, Z., Yang, S., and Ding, P. (2022), “Multiply robust estimation of causal effects under principal ignorability,” Journal of the Royal Statistical Society Series B: Statistical Methodology, 84, 1423–1445.
Jo and Stuart (2009) Jo, B. and Stuart, E. A. (2009), “On the use of propensity scores in principal causal effect estimation,” Statistics in Medicine, 28, 2857–2875.
Kahan et al. (2023) Kahan, B. C., Cro, S., Li, F., and Harhay, M. O. (2023), “Eliminating ambiguous treatment effects using estimands,” American Journal of Epidemiology, 192, 987–994.
Kang and Schafer (2007) Kang, J. D. and Schafer, J. L. (2007), “Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data,” Statistical Science, 22, 523–539.
Kennedy et al. (2020) Kennedy, E. H., Balakrishnan, S., and G’Sell, M. (2020), “Sharp instruments for classifying compliers and generalizing causal effects,” The Annals of Statistics, 48, 2008–2030.
Kim et al. (2017) Kim, C., Daniels, M. J., Marcus, B. H., and Roy, J. A. (2017), “A framework for Bayesian nonparametric inference for causal effects of mediation,” Biometrics, 73, 401–409.
Luo et al. (2016) Luo, Y., Spindler, M., and Kück, J. (2016), “High-Dimensional $L_{2}$ Boosting: Rate of Convergence,” arXiv preprint arXiv:1602.08927.
Merchant et al. (2021) Merchant, A. T., Liu, J., Reynolds, M. A., Beck, J. D., and Zhang, J. (2021), “Quantile regression to estimate the survivor average causal effect of periodontal treatment effects on birthweight and gestational age,” Journal of Periodontology, 92, 975–982.
Michalowicz et al. (2006) Michalowicz, B. S., Hodges, J. S., DiAngelis, A. J., Lupo, V. R., Novak, M. J., Ferguson, J. E., Buchanan, W., Bofill, J., Papapanou, P. N., Mitchell, D. A., et al. (2006), “Treatment of periodontal disease and the risk of preterm birth,” New England Journal of Medicine, 355, 1885–1894.
Miles et al. (2017) Miles, C. H., Shpitser, I., Kanki, P., Meloni, S., and Tchetgen Tchetgen, E. J. (2017), “Quantifying an adherence path-specific effect of antiretroviral therapy in the Nigeria PEPFAR program,” Journal of the American Statistical Association, 112, 1443–1452.
Miles et al. (2020) — (2020), “On semiparametric estimation of a path-specific effect in the presence of mediator-outcome confounding,” Biometrika, 107, 159–172.
Nguyen et al. (2021) Nguyen, T. Q., Schmid, I., and Stuart, E. A. (2021), “Clarifying causal mediation analysis for the applied researcher: Defining effects based on what we want to learn.” Psychological Methods, 26, 255.
Page et al. (2015) Page, L. C., Feller, A., Grindal, T., Miratrix, L., and Somers, M.-A. (2015), “Principal stratification: A tool for understanding variation in program effects across endogenous subgroups,” American Journal of Evaluation, 36, 514–531.
Park and Kürüm (2018) Park, S. and Kürüm, E. (2018), “Causal mediation analysis with multiple mediators in the presence of treatment noncompliance,” Statistics in Medicine, 37, 1810–1829.
Park and Kürüm (2020) — (2020), “A two-stage joint modeling method for causal mediation analysis in the presence of treatment noncompliance,” Journal of Causal Inference, 8, 131–149.
Park and Palardy (2020) Park, S. and Palardy, G. J. (2020), “Sensitivity evaluation of methods for estimating complier average causal mediation effects to assumptions,” Journal of Educational and Behavioral Statistics, 45, 475–506.
Price et al. (1992) Price, R. H., Van Ryn, M., and Vinokur, A. D. (1992), “Impact of a preventive job search intervention on the likelihood of depression among the unemployed,” Journal of Health and Social Behavior, 158–167.
Qin and Wang (2024) Qin, X. and Wang, L. (2024), “Causal moderated mediation analysis: Methods and software,” Behavior Research Methods, 56, 1314–1334.
Robins and Richardson (2010) Robins, J. M. and Richardson, T. S. (2010), “Alternative graphical causal models and the identification of direct effects,” in Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures, 103–158.
Rudolph et al. (2024) Rudolph, K. E., Williams, N., and Díaz, I. (2024), “Using instrumental variables to address unmeasured confounding in causal mediation analysis,” Biometrics, 80, ujad037.
Steen et al. (2017) Steen, J., Loeys, T., Moerkerke, B., and Vansteelandt, S. (2017), “Flexible mediation analysis with multiple mediators,” American Journal of Epidemiology, 186, 184–193.
Stuart and Jo (2015) Stuart, E. A. and Jo, B. (2015), “Assessing the sensitivity of methods for estimating principal causal effects,” Statistical Methods in Medical Research, 24, 657–674.
Tchetgen Tchetgen and Shpitser (2012) Tchetgen Tchetgen, E. J. and Shpitser, I. (2012), “Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis,” Annals of Statistics, 40, 1816.
Tchetgen Tchetgen and VanderWeele (2014) Tchetgen Tchetgen, E. J. and VanderWeele, T. J. (2014), “On identification of natural direct effects when a confounder of the mediator is directly affected by exposure,” Epidemiology (Cambridge, Mass.), 25, 282.
Van der Laan et al. (2007) Van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007), “Super learner,” Statistical Applications in Genetics and Molecular Biology, 6.
VanderWeele (2011) VanderWeele, T. J. (2011), “Principal stratification–uses and limitations,” The International Journal of Biostatistics, 7.
VanderWeele and Vansteelandt (2009) VanderWeele, T. J. and Vansteelandt, S. (2009), “Conceptual issues concerning mediation, interventions and composition,” Statistics and its Interface, 2, 457–468.
VanderWeele and Vansteelandt (2010) — (2010), “Odds ratios for mediation analysis for a dichotomous outcome,” American Journal of Epidemiology, 172, 1339–1348.
VanderWeele et al. (2014) VanderWeele, T. J., Vansteelandt, S., and Robins, J. M. (2014), “Effect decomposition in the presence of an exposure-induced mediator-outcome confounder,” Epidemiology (Cambridge, Mass.), 25, 300.
Wager and Walther (2015) Wager, S. and Walther, G. (2015), “Adaptive concentration of regression trees, with application to random forests,” arXiv preprint arXiv:1503.06388.
Xia and Chan (2021) Xia, F. and Chan, K. C. G. (2021), “Identification, semiparametric efficiency, and quadruply robust estimation in mediation analysis with treatment-induced confounding,” Journal of the American Statistical Association, 1–10.
Yamamoto (2013) Yamamoto, T. (2013), “Identification and estimation of causal mediation effects with treatment noncompliance,” Technical Report.
Zhou (2022) Zhou, X. (2022), “Semiparametric estimation for causal mediation analysis with multiple causally ordered mediators,” Journal of the Royal Statistical Society Series B: Statistical Methodology, 84, 794–821.

Supplementary Material to “Identification and multiply robust estimation in mediation analysis across principal strata”

Section A provides an additional motivating example. Section B provides practical strategies on specification of the parametric working models. In Section C, we provide a semiparametric sensitivity analysis framework for the principal ignorability assumption and ignorability of the mediator assumption. In Section D, we provide the proofs for all theorems, propositions, and remarks in the main manuscript. In Section E, we present Supplementary Material Tables and Figures.

A An additional motivating example

Example 3

(Mediation analysis with death-truncated mediator and outcome) Consider a case in which the mediator and outcome are truncated by a terminal event that occurs before the mediator is measured, while no other terminal event occurs between the mediator and outcome. One concrete example is the Obstetrics and Periodontal Therapy (OPT) trial (Michalowicz et al., 2006), where one may address the role of gestational age ( $M$ ) in mediating the effect of a periodontal treatment during pregnancy ( $Z$ ) on birthweight ( $Y$ ), but $M$ and $Y$ are only measured if infants are born alive. Here, the survival status of the infants ( $D$ ) serves as a terminal event, where $M$ and $Y$ are not well defined for dead units ( $D=0$ ) (Merchant et al., 2021). In this scenario, it is of interest to estimate the average treatment effect and mediation effect among the subset of infants who would always survive regardless of the treatment (i.e., the always-survivor stratum). Specifically, assessing the average treatment effect among always-survivors (also referred to as the survivor average causal effect) can help answer the central research question in the OPT trial of whether the periodontal therapy has an adverse or positive effect on newborn infants' birthweight (Michalowicz et al., 2006), without the complications due to death as a terminal event. As a next step, by investigating the principal natural mediation effect within always-survivors, one can clarify the role of gestational age in explaining the survivor average causal effect.

B Specification of the parametric working models

We can specify working models $h_{nuisance}^{\text{par}}=\{\pi_{z}^{\text{par}}({\bm{x}}),p_{zd}^{\text{par}}({\bm{x}}),r_{zd}^{\text{par}}(m,{\bm{x}}),\mu_{zd}^{\text{par}}(m,{\bm{x}})\}$ for $h_{nuisance}$ . Specification of $h_{nuisance}^{\text{par}}$ can be flexible, and we provide one example below. This specification strategy is also used in our simulation and application studies.

For $\pi_{z}^{\text{par}}({\bm{x}})$ , one can consider $f_{Z|{\bm{X}}}(1|{\bm{X}})=\text{expit}\left(\bm{\alpha}^{T}[1,{\bm{X}}^{T}]\right)$ as a logistic regression with coefficients $\bm{\alpha}$ such that $\pi_{z}^{\text{par}}({\bm{x}})=\left\{\text{expit}\left(\bm{\alpha}^{T}[1,{\bm{x}}^{T}]\right)\right\}^{z}\left\{1-\text{expit}\left(\bm{\alpha}^{T}[1,{\bm{x}}^{T}]\right)\right\}^{1-z}$ , where $\text{expit}(x)=\frac{1}{1+\exp(-x)}$ is the logistic function. Specification of $p_{zd}^{\text{par}}({\bm{x}})$ differs between the one-sided and two-sided noncompliance scenarios. Under two-sided noncompliance, one can consider $f_{D|Z,{\bm{X}}}(1|Z,{\bm{X}})=\text{expit}\left(\bm{\beta}^{T}[1,Z,{\bm{X}}^{T}]\right)$ as a logistic regression with coefficients $\bm{\beta}$ , leading to $p_{zd}^{\text{par}}({\bm{x}})=\left\{\text{expit}\left(\bm{\beta}^{T}[1,z,{\bm{x}}^{T}]\right)\right\}^{d}\left\{1-\text{expit}\left(\bm{\beta}^{T}[1,z,{\bm{x}}^{T}]\right)\right\}^{1-d}$ . Under one-sided noncompliance, we already know $p_{0d}({\bm{x}})\equiv 1-d$ by the strong monotonicity and therefore we can fix $p_{0d}^{\text{par}}({\bm{x}})=1-d$ and only specify a working model for $p_{1d}^{\text{par}}({\bm{x}})$ ; for example, one can consider $f_{D|Z,{\bm{X}}}(1|1,{\bm{X}})=\text{expit}\left(\bm{\beta}^{T}[1,{\bm{X}}^{T}]\right)$ such that $p_{1d}^{\text{par}}({\bm{x}})=\left\{\text{expit}\left(\bm{\beta}^{T}[1,{\bm{x}}^{T}]\right)\right\}^{d}\left\{1-\text{expit}\left(\bm{\beta}^{T}[1,{\bm{x}}^{T}]\right)\right\}^{1-d}$ . If $M$ is binary, we can further consider $f_{M|Z,D,{\bm{X}}}(1|Z,D,{\bm{X}})=\text{expit}\left(\bm{\gamma}^{T}[1,Z,D,{\bm{X}}^{T}]\right)$ as a logistic regression with coefficients $\bm{\gamma}$ such that $r_{zd}^{\text{par}}(m,{\bm{x}})=\left\{\text{expit}\left(\bm{\gamma}^{T}[1,z,d,{\bm{x}}^{T}]\right)\right\}^{m}\left\{1-\text{expit}\left(\bm{\gamma}^{T}[1,z,d,{\bm{x}}^{T}]\right)\right\}^{1-m}$ . 
For a continuous $M$ , a feasible working model is $M|Z,D,{\bm{X}}\sim N(\bm{\gamma}^{T}[1,Z,D,{\bm{X}}^{T}],\sigma_{\gamma}^{2})$ , which implies that $r_{zd}^{\text{par}}(m,{\bm{x}})=\phi(m;\bm{\gamma}^{T}[1,z,d,{\bm{x}}^{T}],\sigma_{\gamma}^{2})$ , where $\phi(m;\mu,\sigma^{2})$ is the density function of $N(\mu,\sigma^{2})$ evaluated at $m$ . When $Y$ is binary or continuous, one can specify $\mathbb{E}[Y|Z,D,M,{\bm{X}}]=\text{expit}\left(\bm{\kappa}^{T}[1,Z,D,M,{\bm{X}}^{T}]\right)$ or $\bm{\kappa}^{T}[1,Z,D,M,{\bm{X}}^{T}]$ , respectively, with coefficients $\bm{\kappa}$ , leading to $\mu_{zd}^{\text{par}}(m,{\bm{x}};\bm{\kappa})=\text{expit}\left(\bm{\kappa}^{T}[1,z,d,m,{\bm{x}}^{T}]\right)$ or $\bm{\kappa}^{T}[1,z,d,m,{\bm{x}}^{T}]$ . Estimation of the parameters in the parametric working models, $\{\widehat{\bm{\alpha}},\widehat{\bm{\beta}},\widehat{\bm{\gamma}},\widehat{\sigma}_{\gamma}^{2},\widehat{\bm{\kappa}}\}$ , can proceed by maximum likelihood. The estimators of the nuisance functions are therefore $\widehat{h}_{nuisance}^{\text{par}}=\left\{\widehat{\pi}_{z}^{\text{par}}({\bm{x}}),\widehat{p}_{zd}^{\text{par}}({\bm{x}}),\widehat{r}_{zd}^{\text{par}}(m,{\bm{x}}),\widehat{\mu}_{zd}^{\text{par}}(m,{\bm{x}})\right\}$ , which is $h_{nuisance}^{\text{par}}$ evaluated at $\{\widehat{\bm{\alpha}},\widehat{\bm{\beta}},\widehat{\bm{\gamma}},\widehat{\sigma}_{\gamma}^{2},\widehat{\bm{\kappa}}\}$ .
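To make the working-model specification concrete, the following is a minimal sketch of how the fitted nuisance functions could be evaluated once coefficient estimates are available. It is an illustration under stated assumptions, not the implementation used in our simulations: the function names (`pi_par`, `p_par`, `r_par_binary`, `r_par_normal`) and the convention that each coefficient vector starts with the intercept are hypothetical choices made for this example, and the coefficients are assumed to have been obtained beforehand by maximum likelihood.

```python
import math


def expit(u):
    """Logistic function 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def linear_predictor(coef, features):
    """Compute coef^T [1, features]; coef[0] is the intercept."""
    return coef[0] + sum(c * f for c, f in zip(coef[1:], features))


def bernoulli_prob(coef, features, value):
    """P(V = value | features) for value in {0, 1} under a logistic working model."""
    p1 = expit(linear_predictor(coef, features))
    return p1 if value == 1 else 1.0 - p1


def pi_par(alpha, z, x):
    """Propensity score working model pi_z(x)."""
    return bernoulli_prob(alpha, list(x), z)


def p_par(beta, z, d, x, one_sided=False):
    """Working model p_zd(x) for the post-treatment event."""
    if one_sided:
        if z == 0:
            # Strong monotonicity: D = 0 whenever Z = 0, so p_0d(x) = 1 - d.
            return float(1 - d)
        # Only the treated arm is modeled; beta has no coefficient for z here.
        return bernoulli_prob(beta, list(x), d)
    return bernoulli_prob(beta, [z] + list(x), d)


def r_par_binary(gamma, z, d, m, x):
    """Working model r_zd(m, x) for a binary mediator."""
    return bernoulli_prob(gamma, [z, d] + list(x), m)


def r_par_normal(gamma, sigma2, z, d, m, x):
    """Working model r_zd(m, x) for a continuous mediator (normal density at m)."""
    mean = linear_predictor(gamma, [z, d] + list(x))
    return math.exp(-((m - mean) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
```

Each function mirrors one component of $h_{nuisance}^{\text{par}}$ ; the outcome model $\mu_{zd}^{\text{par}}(m,{\bm{x}})$ can be evaluated analogously with `expit(linear_predictor(...))` for a binary outcome or `linear_predictor(...)` for a continuous outcome.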

C Sensitivity analysis

The principal ignorability (Assumption 4) and ignorability of the mediator (Assumption 5) are required for the identification of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ in Theorem 1. However, these two assumptions are not empirically verifiable based on the observed data and may not hold in randomized experiments. We propose sensitivity analysis strategies to assess the impact of violation of these two assumptions on inference about $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ . When evaluating the sensitivity to violation of one specific assumption, we shall assume all other structural assumptions hold. To fix ideas, we consider that the mediator $M$ is a multi-valued variable with finite support $M\in\{0,1,\dots,m_{\max}\}$ , and the methodology can be generalized to a continuous mediator.

C.1 Sensitivity analysis for the principal ignorability assumption

We first focus on the scenario under standard monotonicity, and methods under strong monotonicity are discussed at the end of this section. To begin with, we notice that Theorem 1 holds under a weaker version of Assumption 4 which consists of two statements:

(i) $\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{X}}]=\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|11,{\bm{X}}]$ and $\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|10,{\bm{X}}]=\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|00,{\bm{X}}]$ for any $m\in\{0,\dots,m_{\max}\}$ .

(ii) $f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{X}})=f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{X}})$ and $f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{X}})=f_{M_{0}|U,{\bm{X}}}(m|00,{\bm{X}})$ for any $m\in\{1,\dots,m_{\max}\}$ .

Statement (i) requires that the expectation of $Y_{1m}$ is the same between the complier and always-taker strata and the expectation of $Y_{0m}$ is the same between the complier and never-taker strata, conditional on all observed covariates. Because the densities sum to one over $m$ , Statement (ii) also implies that $f_{M_{1}|U,{\bm{X}}}(0|10,{\bm{X}})=f_{M_{1}|U,{\bm{X}}}(0|11,{\bm{X}})$ and $f_{M_{0}|U,{\bm{X}}}(0|10,{\bm{X}})=f_{M_{0}|U,{\bm{X}}}(0|00,{\bm{X}})$ . Therefore, Statement (ii) requires that the distribution of $M_{1}$ is the same between the complier and always-taker strata and the distribution of $M_{0}$ is the same between the complier and never-taker strata, conditional on all observed covariates. Our sensitivity analysis is based on the following confounding functions measuring departure from the weaker version of principal ignorability:

	$\displaystyle\xi_{Y}^{(1)}(m,{\bm{x}})=\frac{\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]}{\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|11,{\bm{x}}]},\quad\xi_{Y}^{(0)}(m,{\bm{x}})=\frac{\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|10,{\bm{x}}]}{\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|00,{\bm{x}}]}\quad\text{for $m=0,\dots,m_{\max}$},$
	$\displaystyle\xi_{M}^{(1)}(m,{\bm{x}})=\frac{f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})}{f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})},\quad\xi_{M}^{(0)}(m,{\bm{x}})=\frac{f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{x}})}{f_{M_{0}|U,{\bm{X}}}(m|00,{\bm{x}})}\quad\text{for $m=1,\dots,m_{\max}$}.$

The first two confounding functions measure deviation from principal ignorability in the outcome variable, where $\xi_{Y}^{(1)}(m,{\bm{x}})$ measures the ratio of the mean of $Y_{1m}$ among compliers versus always-takers and $\xi_{Y}^{(0)}(m,{\bm{x}})$ measures the ratio of the mean of $Y_{0m}$ among compliers versus never-takers, conditional on ${\bm{X}}={\bm{x}}$ . On the other hand, the last two confounding functions measure deviation from principal ignorability in the mediator variable, where $\xi_{M}^{(1)}(m,{\bm{x}})$ measures the relative risk of compliers against always-takers on the treated potential mediator at level $m$ and $\xi_{M}^{(0)}(m,{\bm{x}})$ measures the relative risk of compliers against never-takers on the control potential mediator at level $m$ , conditional on ${\bm{X}}={\bm{x}}$ . Notice that $\xi_{M}^{(1)}(m,{\bm{x}})$ and $\xi_{M}^{(0)}(m,{\bm{x}})$ only need to be specified for $m\geq 1$ , because they determine the values of $\xi_{M}^{(1)}(0,{\bm{x}}):=\frac{f_{M_{1}|U,{\bm{X}}}(0|10,{\bm{x}})}{f_{M_{1}|U,{\bm{X}}}(0|11,{\bm{x}})}$ and $\xi_{M}^{(0)}(0,{\bm{x}}):=\frac{f_{M_{0}|U,{\bm{X}}}(0|10,{\bm{x}})}{f_{M_{0}|U,{\bm{X}}}(0|00,{\bm{x}})}$ , as shown in Section D.8, where we provide the following explicit expressions in terms of $\{\xi_{M}^{(1)}(m,{\bm{x}}),\xi_{M}^{(0)}(m,{\bm{x}})\text{ for }m=1,\dots,m_{\max}\}$ :

	$\displaystyle\xi_{M}^{(1)}(0,{\bm{x}})$	$\displaystyle=\frac{1-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(1)}(j,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}})}{1-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}})},$
	$\displaystyle\xi_{M}^{(0)}(0,{\bm{x}})$	$\displaystyle=\frac{1-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}})}{1-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}})}.$
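The two expressions above share the same structure and can be evaluated with a single routine. The sketch below is illustrative: the function name `implied_xi_at_zero` and its argument names are assumptions introduced for this example. It takes the user-specified values $\xi_{M}(j,{\bm{x}})$ for $j\geq 1$ , the observed mediator probabilities $r(j,{\bm{x}})$ for $j\geq 1$ , and the relevant observed-data probabilities at a fixed covariate value ${\bm{x}}$ .

```python
def implied_xi_at_zero(xi_pos, r_pos, p_mix, e_complier, e_other):
    """Value of xi_M(0, x) implied by xi_M(j, x), j = 1, ..., m_max.

    xi_pos:     [xi_M(1, x), ..., xi_M(m_max, x)]
    r_pos:      [r(1, x), ..., r(m_max, x)] in the relevant (Z, D) cell
    p_mix:      p_11(x) for xi_M^(1), or p_00(x) for xi_M^(0)
    e_complier: p_11(x) - p_01(x)
    e_other:    p_01(x) for xi_M^(1), or p_10(x) for xi_M^(0)
    """
    numerator = 1.0
    denominator = 1.0
    for xi_j, r_j in zip(xi_pos, r_pos):
        scale = p_mix * r_j / (xi_j * e_complier + e_other)
        numerator -= xi_j * scale
        denominator -= scale
    return numerator / denominator
```

For instance, $\xi_{M}^{(1)}(0,{\bm{x}})$ corresponds to the call `implied_xi_at_zero(xi1, r11, p11, p11 - p01, p01)` and $\xi_{M}^{(0)}(0,{\bm{x}})$ to `implied_xi_at_zero(xi0, r00, p00, p11 - p01, p10)`. When all entries of `xi_pos` equal 1, the returned value is 1, in agreement with principal ignorability.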

Theorem 1 holds if all sensitivity functions in $\xi=\Big{\{}\left(\xi_{M}^{(1)}(m,{\bm{x}}),\xi_{M}^{(0)}(m,{\bm{x}})\right)\text{ for }m=1,\dots,m_{\max}\text{ and }\left(\xi_{Y}^{(1)}(m,{\bm{x}}),\xi_{Y}^{(0)}(m,{\bm{x}})\right)\text{ for }m=0,\dots,m_{\max}\Big{\}}$ are equal to 1. The following proposition generalizes Theorem 1 to the scenario when at least one confounding function has a value different from 1.

Proposition S1

Suppose that Assumptions 1, 2, 3a, 5, and 6 hold and that the values of the confounding functions ( $\xi$ ) are known. Then we can identify $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ by

\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\int_{{\bm{x}}}\frac{e_{d_{1}d_{0}}({\bm{x}})}{e_{d_{1}d_{0}}}\left\{\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\mu_{zd_{z}}(m,{\bm{x}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}}),

for any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ . Here $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ is a sensitivity weight defined in Section D.8, which depends on the confounding functions $\xi$ and the observed-data nuisance functions $p_{zd}({\bm{x}})$ and $r_{zd}(m,{\bm{x}})$ . As an example,

\displaystyle w_{10}^{(10)}(m,{\bm{x}})=\begin{cases}\displaystyle\frac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\displaystyle\frac{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m\geq 1,\\ \left\{\displaystyle\frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\displaystyle\frac{\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(0,{\bm{x}})+\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m=0.\end{cases}
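As a rough illustration of how this example weight could be computed at a fixed covariate value ${\bm{x}}$ , consider the sketch below. The function name `w10_10` and the list-based inputs are assumptions made for this example. The sketch uses $p_{00}({\bm{x}})=1-p_{01}({\bm{x}})$ and $p_{10}({\bm{x}})=1-p_{11}({\bm{x}})$ , and it expects the caller to supply $\xi_{M}^{(1)}(0,{\bm{x}})$ , which is implied by the explicit expression given earlier in this section.

```python
def w10_10(m, xi_m0, xi_m1, xi_y1, r00, p11, p01):
    """Sensitivity weight w_10^(10)(m, x) at a fixed covariate value x.

    All lists are indexed by j = 0, ..., m_max:
      xi_m0[j] = xi_M^(0)(j, x)   (entry 0 is not used)
      xi_m1[j] = xi_M^(1)(j, x)   (entry 0 is the implied value at m = 0)
      xi_y1[j] = xi_Y^(1)(j, x)
      r00[j]   = r_00(j, x)
    """
    p00 = 1.0 - p01          # P(D = 0 | Z = 0, x)
    p10 = 1.0 - p11          # P(D = 0 | Z = 1, x)
    e10 = p11 - p01          # complier proportion under standard monotonicity

    def mediator_factor(j):
        return xi_m0[j] * p00 / (xi_m0[j] * e10 + p10)

    if m >= 1:
        first = mediator_factor(m)
    else:
        # Case m = 0: one minus the sum over j >= 1, rescaled by r_00(0, x).
        total = sum(mediator_factor(j) * r00[j] for j in range(1, len(r00)))
        first = (1.0 - total) / r00[0]

    second = (xi_m1[m] * e10 + p01) / (p01 / xi_y1[m] + xi_m1[m] * e10)
    return first * second
```

When every confounding function equals 1, the weight reduces to 1 for all $m$ , so the identification formula of Theorem 1 is recovered.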

If $\xi$ is known, we can construct a new estimator of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ by carefully re-weighting each term in the original multiply robust estimator $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ by the sensitivity weight $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ . Specifically, the new estimator, $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ , takes the following form:

$\displaystyle\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=$	$\displaystyle\mathbb{P}_{n}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{par}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{par}}({\bm{X}})}\right)\frac{\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}$
	$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widehat{p}_{zd_{z}}^{\text{par}}({\bm{X}})\widehat{\pi}_{z}^{\text{par}}({\bm{X}})}\frac{\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(M,{\bm{X}})}{\widehat{r}_{zd_{z}}^{\text{par}}(M,{\bm{X}})}\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\left\{Y-\widehat{\mu}_{zd_{z}}^{\text{par}}(M,{\bm{X}})\right\}$
	$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widehat{p}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}({\bm{X}})\widehat{\pi}_{z^{\prime}}^{\text{par}}({\bm{X}})}\left\{\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\widehat{\mu}_{zd_{z}}^{\text{par}}(M,{\bm{X}})-\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})\right\}$
	$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})\Big{\}},$	(s8)

where $\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{x}})=\displaystyle\sum_{m=0}^{m_{\max}}\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\widehat{\mu}_{zd_{z}}^{\text{par}}(m,{\bm{x}})\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(m,{\bm{x}})$ and $\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ is $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ evaluated at $\left\{\widehat{p}_{zd}^{\text{par}}({\bm{x}}),\widehat{r}_{zd}^{\text{par}}(m,{\bm{x}})\right\}$ . The following proposition shows that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ is a doubly robust estimator under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

Proposition S2

Suppose that Assumptions 1, 2, 3a, 5, and 6 hold. Then, the estimator $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ is consistent and asymptotically normal for any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

In practice, the confounding functions in $\xi$ are unknown. To conduct the sensitivity analysis, one can specify a parametric form of $\xi$ indexed by a finite-dimensional parameter $\bm{\lambda}$ , say $\xi_{\bm{\lambda}}$ . Then, one can report $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi_{\bm{\lambda}})$ and its confidence intervals over a sequence of values of $\bm{\lambda}$ , which summarizes how sensitive the inference is to assumed departures from the principal ignorability assumption.
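The reporting step can be organized as a simple loop over candidate values of $\bm{\lambda}$ . The sketch below is only meant to convey the workflow: `estimate_with_ci` is a hypothetical user-supplied function (not part of any existing package) that returns the point estimate and the confidence limits of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi_{\bm{\lambda}})$ for a given $\bm{\lambda}$ .

```python
def sensitivity_curve(estimate_with_ci, lambda_grid):
    """Evaluate a user-supplied estimator over a grid of sensitivity parameters.

    estimate_with_ci: hypothetical callable, lambda -> (estimate, lower, upper)
    lambda_grid:      iterable of candidate values of lambda (scalars or tuples)
    """
    rows = []
    for lam in lambda_grid:
        estimate, lower, upper = estimate_with_ci(lam)
        rows.append({
            "lambda": lam,
            "estimate": estimate,
            "ci": (lower, upper),
            # Flag whether the confidence interval excludes zero at this lambda.
            "excludes_zero": lower > 0 or upper < 0,
        })
    return rows
```

The resulting rows can be tabulated or plotted against $\bm{\lambda}$ to show where the sign of the estimate or the coverage of zero by the confidence interval changes.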

The above sensitivity analysis strategy can be easily extended to the scenario under strong monotonicity. Because there are no always-takers under strong monotonicity, we only need to quantify the departure from principal ignorability between the never-taker and complier strata, that is, only $\xi_{M}^{(0)}(m,{\bm{x}})$ and $\xi_{Y}^{(0)}(m,{\bm{x}})$ are needed for sensitivity analysis. Similar to the construction of (s8), we can develop an estimator, $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)$ , based on a set of confounding functions, $\kappa=\{\xi_{M}^{(0)}(m,{\bm{x}})\text{ for }m=1,\dots,m_{\max},~{}\xi_{Y}^{(0)}(m,{\bm{x}})\text{ for }m=0,\allowbreak\dots,\allowbreak m_{\max}\}$ , and this estimator is consistent for $\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ for any $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ . Details of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)$ are given in Section D.9. Analogously, one can report $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa_{\bm{\lambda}})$ over a set of choices of $\bm{\lambda}$ to quantify the values of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under assumed departure from principal ignorability, where $\kappa_{\bm{\lambda}}$ is a user-specified parametric form of $\kappa$ .

C.2 Sensitivity analysis for the ignorability of the mediator assumption

We develop a sensitivity analysis framework to assess the extent to which the violation of Assumption 5 might affect the inference of $\theta_{d_{1}d_{0}}^{(10)}$ ; identification of $\theta_{d_{1}d_{0}}^{(11)}$ and $\theta_{d_{1}d_{0}}^{(00)}$ , however, does not depend on Assumption 5. In Section D.10, we show that the expression of $\theta_{d_{1}d_{0}}^{(10)}$ in Theorem 1 holds under a weaker version of Assumption 5 such that $\mathbb{E}_{Y_{zm}|Z,M,U,{\bm{X}}}[Y_{zm}|z,m,d_{1}d_{0},{\bm{X}}]=\mathbb{E}_{Y_{zm}|Z,M,U,{\bm{X}}}[Y_{zm}|z,0,d_{1}d_{0},{\bm{X}}]$ , for all $m>0$ , $z\in\{0,1\}$ , $d_{1}d_{0}\in\mathcal{U}_{a}$ under standard monotonicity, and $d_{1}d_{0}\in\mathcal{U}_{b}$ under strong monotonicity. This weaker assumption only requires mean independence between the potential outcome and the mediator conditional on the treatment assignment, principal strata, and baseline covariates. Recognizing the sufficiency of this weaker assumption, we propose the following sensitivity function to assess violations of the weaker version of Assumption 5:

t(z,m,d_{1}d_{0},{\bm{x}})=\frac{\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|z,m,d_{1}d_{0},{\bm{x}}]}{\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|z,0,d_{1}d_{0},{\bm{x}}]},

for $m\in\{0,1,\dots,m_{\max}\}$ , where $t(z,0,d_{1}d_{0},{\bm{x}})\equiv 1$ by definition. If $t(z,m,d_{1}d_{0},{\bm{x}})$ differs from 1, then the identification formula of $\theta^{(10)}_{d_{1}d_{0}}$ in Theorem 1 no longer holds. The following proposition generalizes Theorem 1 to the scenario for a known $t(z,m,d_{1}d_{0},{\bm{x}})$ .

Proposition S3

Suppose that Assumptions 1–4 and 6 hold. Based on the confounding function $t(z,m,d_{1}d_{0},{\bm{x}})$ , we can identify $\theta^{(10)}_{d_{1}d_{0}}$ as

\theta^{(10)}_{d_{1}d_{0}}=\int_{{\bm{x}}}\frac{e_{d_{1}d_{0}}({\bm{x}})}{e_{d_{1}d_{0}}}\left\{\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}}),

for any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under standard monotonicity and any $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under strong monotonicity, where

\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})=\left\{\sum_{j=0}^{m_{\max}}\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}r_{1d_{1}}(j,{\bm{x}})\right\}\Big{/}\left\{\sum_{j=0}^{m_{\max}}\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}r_{0d_{0}}(j,{\bm{x}})\right\}

is the sensitivity weight which depends on the confounding function $t(z,m,d_{1}d_{0},{\bm{x}})$ and the observed-data nuisance function $r_{zd}(m,{\bm{x}})$ .
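The sensitivity weight $\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})$ is a ratio of two weighted sums and is straightforward to evaluate for a mediator with finite support. The following sketch assumes a fixed stratum $d_{1}d_{0}$ and covariate value ${\bm{x}}$ ; the function name `rho_10` and the list-based inputs are hypothetical conventions chosen for this illustration.

```python
def rho_10(m, t1, t0, r1, r0):
    """Sensitivity weight rho^(10)(m, x) for a fixed stratum d1d0 and covariate x.

    All lists are indexed by j = 0, ..., m_max:
      t1[j] = t(1, j, d1d0, x), with t1[0] = 1 by definition
      t0[j] = t(0, j, d1d0, x), with t0[0] = 1 by definition
      r1[j] = r_{1 d1}(j, x)
      r0[j] = r_{0 d0}(j, x)
    """
    numerator = sum(t1[j] / t1[m] * r1[j] for j in range(len(r1)))
    denominator = sum(t0[j] / t0[m] * r0[j] for j in range(len(r0)))
    return numerator / denominator
```

If $t(z,m,d_{1}d_{0},{\bm{x}})\equiv 1$ , both sums equal 1 because the mediator probabilities sum to one, so the weight is 1 and the formula reduces to that of Theorem 1.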

If the sensitivity function $t=t(z,m,d_{1}d_{0},{\bm{x}})$ is known, we show in Section D.10 that a consistent estimator of $\theta^{(10)}_{d_{1}d_{0}}$ can be obtained by re-weighting each term in the multiply robust estimator by the sensitivity weight $\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})$ , and takes the following form:

	$\displaystyle\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)=$	$\displaystyle\mathbb{P}_{n}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{par}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{par}}({\bm{X}})}\right)\frac{\widehat{\eta}_{10}^{\rho,\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}$
		$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{1},Z=1)}{\widehat{p}_{1d_{1}}^{\text{par}}({\bm{X}})\widehat{\pi}_{1}^{\text{par}}({\bm{X}})}\frac{\widehat{r}_{0d_{0}}^{\text{par}}(M,{\bm{X}})}{\widehat{r}_{1d_{1}}^{\text{par}}(M,{\bm{X}})}\widehat{\rho}_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\left\{Y-\widehat{\mu}_{1d_{1}}^{\text{par}}(M,{\bm{X}})\right\}$
		$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{0},Z=0)}{\widehat{p}_{0d_{0}}^{\text{par}}({\bm{X}})\widehat{\pi}_{0}^{\text{par}}({\bm{X}})}\left\{\widehat{\rho}_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\widehat{\mu}_{1d_{1}}^{\text{par}}(M,{\bm{X}})-\widehat{\eta}_{10}^{\rho,\text{par}}({\bm{X}})\right\}$
		$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\widehat{\eta}_{10}^{\rho,\text{par}}({\bm{X}})\Big{\}},$

where $\widehat{\eta}_{10}^{\rho,\text{par}}({\bm{x}})=\displaystyle\sum_{m=0}^{m_{\max}}\widehat{\rho}_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\widehat{\mu}_{1d_{1}}^{\text{par}}(m,{\bm{x}})\widehat{r}_{0d_{0}}^{\text{par}}(m,{\bm{x}})$ and $\widehat{\rho}_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})$ is ${\rho}_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})$ evaluated at $\widehat{r}_{zd}^{\text{par}}(m,{\bm{x}})$ . The following proposition shows that $\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)$ is a triply robust estimator under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , or $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

Proposition S4

Suppose that Assumptions 1–4 and 6 hold. Then, under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , or $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , $\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)$ is consistent and asymptotically normal for any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under standard monotonicity and $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under strong monotonicity.

To conduct the sensitivity analysis, one can specify a parametric form of $t$ indexed by a finite-dimensional parameter $\bm{\zeta}$ , $t_{\bm{\zeta}}=t(z,m,d_{1}d_{0},{\bm{x}};\bm{\zeta})$ . Then, one can report $\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t_{\bm{\zeta}})$ over a range of choices of $\bm{\zeta}$ , which captures the sensitivity of the conclusion to departures from Assumption 5.

C.3 Illustration of the sensitivity analysis framework based on the JOBS II study

This section revisits the JOBS II study in Section 6.1 to assess the robustness of our conclusions to violations of the structural assumptions. The ignorability of treatment assignment (Assumption 2) and strong monotonicity (Assumption 3) are satisfied in the JOBS II study by design, but the principal ignorability (Assumption 4) and the ignorability of the mediator (Assumption 5) are generally not empirically verifiable without additional data. Hence, we apply the proposed sensitivity analysis framework to assess the robustness of the estimated principal natural mediation effects to violations of Assumptions 4 and 5, separately. For illustration, we only assess the range of the estimated principal natural mediation effects among the complier stratum. While examining the violation of one assumption, we assume all other assumptions hold.

C.3.1 Sensitivity analysis for principal ignorability

As we discussed in Section C.1, under a one-sided noncompliance scenario (so strong monotonicity holds) with a binary mediator, the confounding functions $\kappa=\{\xi_{M}^{(0)}(1,{\bm{x}}),\xi_{Y}^{(0)}(1,{\bm{x}})\}$ can be used to measure the extent of deviation from the principal ignorability assumption. Specifically, $\xi_{M}^{(0)}(1,{\bm{x}})=\frac{f_{M_{0}|U,{\bm{X}}}(1|10,{\bm{x}})}{f_{M_{0}|U,{\bm{X}}}(1|00,{\bm{x}})}$ measures the relative risk of compliers versus never-takers on the sense of mastery under the control condition and $\xi_{Y}^{(0)}(1,{\bm{x}})=\frac{\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|10,{\bm{x}}]}{\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|00,{\bm{x}}]}$ measures the ratio of the potential outcome mean (under the control condition) between compliers and never-takers. For simplicity (and this is often a practical strategy for sensitivity analysis without additional content knowledge), we assume the two confounding functions do not depend on the measured baseline covariates such that $\xi_{M}^{(0)}(1,{\bm{x}})=\lambda_{M}$ and $\xi_{Y}^{(0)}(1,{\bm{x}})=\lambda_{Y}$ . Our specified parametric confounding function is thus $\kappa_{\bm{\lambda}}=\{\lambda_{M},\lambda_{Y}\}$ .

Figure S4 presents the bias-corrected $\text{PNDE}_{10}$ estimate, $\widehat{\text{PNDE}}_{10}=\widehat{\theta}_{10}^{(10),\text{mr}}(\kappa_{\bm{\lambda}})-\widehat{\theta}_{10}^{(00),\text{mr}}(\kappa_{\bm{\lambda}})$ , with fixed values of $\{\lambda_{M},\lambda_{Y}\}$ ranging within $[0.5,1.5]\times[0.75,1.25]$ . The results suggest that ${\text{PNDE}}_{10}$ is robust to violation of the principal ignorability on the mediator variable, as $\widehat{\text{PNDE}}_{10}$ fluctuates relatively little across values of $\lambda_{M}$ . For example, $\widehat{\text{PNDE}}_{10}$ only increases from $-0.102$ (95% CI: $[-0.193,-0.003]$ ) to $-0.071$ (95% CI: $[-0.148,0.015]$ ) when varying $\lambda_{M}$ from 0.5 to 1.5 with $\lambda_{Y}$ fixed at 1 (Figure S4, Panel B). In contrast, ${\text{PNDE}}_{10}$ is more sensitive to violation of the principal ignorability on the outcome variable, because $\widehat{\text{PNDE}}_{10}$ moves toward the null when $\lambda_{Y}$ decreases from $1$ and the sign of $\widehat{\text{PNDE}}_{10}$ can even be reversed to positive when $\lambda_{Y}\leq 0.87$ .

Next, we assess the robustness of our conclusions on $\text{PNIE}_{10}$ under departure from principal ignorability. In the one-sided noncompliance scenario, the validity of $\widehat{\text{PNIE}}_{10}$ only depends on the principal ignorability assumption on the mediator variable (as we clarify in Section D.9, violation of principal ignorability on the outcome variable has no impact on $\widehat{\text{PNIE}}_{10}$ ). Therefore, we provide the bias-corrected $\text{PNIE}_{10}$ estimate, $\widehat{\text{PNIE}}_{10}=\widehat{\theta}_{10}^{(11),\text{mr}}(\kappa_{\bm{\lambda}})-\widehat{\theta}_{10}^{(10),\text{mr}}(\kappa_{\bm{\lambda}})$ , for $\lambda_{M}$ ranging from 0.5 to 1.5 in Figure S5. The results suggest that estimates of $\text{PNIE}_{10}$ are robust against violations of principal ignorability, because $\widehat{\text{PNIE}}_{10}$ remains negative across all values of $\lambda_{M}$ considered. The estimated 95% confidence intervals, however, straddle zero when $\lambda_{M}<0.75$ or $\lambda_{M}>1.35$ .

C.3.2 Sensitivity analysis for ignorability of the mediator

We then investigate whether the conclusion about the principal natural mediation effects among the compliers will be subject to change if the ignorability of the mediator is violated (while assuming all remaining assumptions hold). As indicated in Section C.2, the confounding function $t(z,1,d_{1}d_{0},{\bm{x}})=\frac{\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|z,1,d_{1}d_{0},{\bm{x}}]}{\mathbb{E}_{Y_{1m}|Z,M,U,{\bm{X}}}[Y_{1m}|z,0,d_{1}d_{0},{\bm{x}}]}$ can be used to quantify the degree of violation of the ignorability of the mediator assumption. For simplicity, we assume $t(z,1,d_{1}d_{0},{\bm{x}})$ is constant across all levels of $z$ , $d_{1}d_{0}$ , and ${\bm{x}}$ and therefore focus on a one-dimensional sensitivity parameter $\zeta$ for $t(z,1,d_{1}d_{0},{\bm{x}})$ ; in other words, the parametric confounding function is simply taken as $t_{\bm{\zeta}}:=t(z,1,d_{1}d_{0},{\bm{x}};\zeta)=\zeta$ .

Figure S6 presents the bias-corrected estimates of $\text{PNDE}_{10}$ , by $\widehat{\text{PNDE}}_{10}=\widehat{\theta}_{10}^{(10),\text{mr}}(t_{\bm{\zeta}})-\widehat{\theta}_{10}^{(00),\text{mr}}$ , and the bias-corrected estimates of $\text{PNIE}_{10}$ , by $\widehat{\text{PNIE}}_{10}=\widehat{\theta}_{10}^{(11),\text{mr}}-\widehat{\theta}_{10}^{(10),\text{mr}}(t_{\bm{\zeta}})$ , with $\zeta$ varying from 0.8 to 1.2. We observe that $\widehat{\text{PNDE}}_{10}$ and $\widehat{\text{PNIE}}_{10}$ move towards the null with a larger and a smaller value of $\zeta$ , respectively. Specifically, $\widehat{\text{PNDE}}_{10}$ remains negative under all assumed values of $\zeta$ , but the point estimate increases from $-0.132$ (95% CI: $[-0.221,-0.047]$ ) to $-0.043$ (95% CI: $[-0.128,0.041]$ ) when $\zeta$ moves from 0.8 to 1.2. On the other hand, $\widehat{\text{PNIE}}_{10}$ decreases from 0.023 (95% CI: $[0.002,0.048]$ ) to $-0.065$ (95% CI: $[-0.105,-0.031]$ ) when $\zeta$ increases from 0.8 to 1.2, suggesting that $\widehat{\text{PNIE}}_{10}$ is relatively more sensitive to violation of Assumption 5.

D Proofs and technical details

D.1 The nonparametric identification result (Theorem 1)

Lemma S1

Let $X$ and $V$ be two random variables with densities $f_{X}(x)$ and $f_{V}(v)$ . Then, we have that $\mathbb{E}[h(X)|V=v]=\int_{x}\frac{f_{V|X}(v|x)}{f_{V}(v)}h(x)\text{d}\mathbb{P}_{X}(x)$ .

Proof.

The proof is straightforward and omitted here. $\square$
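As a quick numerical sanity check of Lemma S1, the sketch below evaluates both sides of the identity on a small discrete distribution. The joint table `joint` and the function `h` are arbitrary illustrative choices, not quantities from the paper.

```python
# Hypothetical joint pmf of (X, V) on {0, 1, 2} x {0, 1}; any strictly positive table works.
joint = {(0, 0): 0.10, (0, 1): 0.15, (1, 0): 0.20,
         (1, 1): 0.05, (2, 0): 0.25, (2, 1): 0.25}
xs, vs = (0, 1, 2), (0, 1)

def h(x):
    # Arbitrary test function of X.
    return x ** 2 + 1.0

# Marginal pmfs of X and V.
f_X = {x: sum(joint[x, v] for v in vs) for x in xs}
f_V = {v: sum(joint[x, v] for x in xs) for v in vs}

def lhs(v):
    # Direct conditional expectation E[h(X) | V = v].
    return sum(h(x) * joint[x, v] / f_V[v] for x in xs)

def rhs(v):
    # Lemma S1: weight h(x) by f_{V|X}(v|x) / f_V(v) and average over the marginal of X.
    return sum((joint[x, v] / f_X[x]) / f_V[v] * h(x) * f_X[x] for x in xs)
```

Calling `lhs(v)` and `rhs(v)` for each `v` in `vs` should return the same number up to floating-point error.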

Lemma S2

Let $X$ , $V$ , and $G$ be three random variables, then

X\perp\!\!\!\perp\{V,G\}\Longleftrightarrow X\perp\!\!\!\perp V|G\text{ and }X\perp\!\!\!\perp G|V.

Proof.

First suppose that $X\perp\!\!\!\perp\{V,G\}$ holds, then we have that for any $x$ , $v$ , $g$ ,

f_{X,V|G}(x,v|g)=\frac{f_{X,V,G}(x,v,g)}{f_{G}(g)}=\frac{f_{X}(x)f_{V,G}(v,g)}{f_{G}(g)}=f_{X}(x)f_{V|G}(v|g)=f_{X|G}(x|g)f_{V|G}(v|g),

which implies that $X\perp\!\!\!\perp V|G$ . Using the same argument but switching the role of $G$ and $V$ , we can show $X\perp\!\!\!\perp G|V$ under $X\perp\!\!\!\perp\{V,G\}$ . Next suppose $X\perp\!\!\!\perp V|G\text{ and }X\perp\!\!\!\perp G|V$ , which imply that

\begin{aligned}
f_{X}(x)&=\int_{g}f_{X|G}(x|g)\,\text{d}\mathbb{P}_{G}(g)=\int_{g}f_{X|G,V}(x|g,v)\,\text{d}\mathbb{P}_{G}(g)=\int_{g}f_{X|V}(x|v)\,\text{d}\mathbb{P}_{G}(g)\\
&=f_{X|V}(x|v),
\end{aligned}

(s9)

for any $x$ and $v$ . Therefore, we can show that for any $x,v,g$ :

f_{X,V,G}(x,v,g)=f_{X|V,G}(x|v,g)f_{V,G}(v,g)=f_{X|V}(x|v)f_{V,G}(v,g)=f_{X}(x)f_{V,G}(v,g),

where the last equality follows from equation (s9). This concludes $X\perp\!\!\!\perp\{V,G\}$ . $\square$
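The forward direction of Lemma S2 can be checked numerically on a discrete example. The sketch below builds a joint pmf in which $X$ is independent of the pair $\{V,G\}$ and confirms both conditional independence statements. The pmf values and the helper names `marg` and `cond_indep` are illustrative assumptions. Only the forward direction is checked; the converse argument integrates over $g$ for a fixed $v$, so it implicitly relies on the conditional densities being defined for every $(v,g)$ pair.

```python
# Hypothetical marginals: X independent of the pair (V, G) by construction.
f_X = {0: 0.3, 1: 0.7}
f_VG = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
joint = {(x, v, g): f_X[x] * f_VG[v, g] for x in f_X for (v, g) in f_VG}

def marg(keep):
    """Marginal pmf over the coordinates listed in `keep` (0 = X, 1 = V, 2 = G)."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

def cond_indep(a, b, c):
    """Check that coordinate a is independent of coordinate b given coordinate c,
    using the factorization p(a, b, c) p(c) = p(a, c) p(b, c)."""
    p_abc, p_ac, p_bc, p_c = marg((a, b, c)), marg((a, c)), marg((b, c)), marg((c,))
    return all(abs(p_abc[i, j, k] * p_c[(k,)] - p_ac[i, k] * p_bc[j, k]) < 1e-12
               for (i, j, k) in p_abc)

x_indep_v_given_g = cond_indep(0, 1, 2)  # X independent of V given G
x_indep_g_given_v = cond_indep(0, 2, 1)  # X independent of G given V
```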

Lemma S3

The principal ignorability assumption (Assumption 4) indicates that $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp\{D_{1},D_{0}\}|{\bm{X}}$ for any $z$ , $z^{\prime}$ , $d$ , and $m^{\prime}$ , which further implies

M_{zd}\perp\!\!\!\perp D_{1-z}|D_{z},{\bm{X}}\text{ and }Y_{z^{\prime}dm^{\prime}}\perp\!\!\!\perp D_{1-z^{\prime}}|M_{zd},D_{z^{\prime}},{\bm{X}}.

Proof.

Observing $U=\{D_{1},D_{0}\}$ , we can see that Assumption 4 is equivalent to $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp\{D_{1},D_{0}\}|{\bm{X}}$ , which implies

M_{zd}\perp\!\!\!\perp\{D_{1},D_{0}\}|{\bm{X}}\text{ and }\{Y_{z^{\prime}dm^{\prime}},M_{zd}\}\perp\!\!\!\perp\{D_{1},D_{0}\}|{\bm{X}}.

In addition, since $\{D_{1},D_{0}\}$ is equivalent to $\{D_{z},D_{1-z}\}$ or $\{D_{z^{\prime}},D_{1-z^{\prime}}\}$ , one can verify that

M_{zd}\perp\!\!\!\perp\{D_{z},D_{1-z}\}|{\bm{X}}\text{ and }\{Y_{z^{\prime}dm^{\prime}},M_{zd}\}\perp\!\!\!\perp\{D_{z^{\prime}},D_{1-z^{\prime}}\}|{\bm{X}}

hold. Therefore, $M_{zd}\perp\!\!\!\perp D_{1-z}|D_{z},{\bm{X}}$ follows from Lemma S2, with $X=M_{zd}$ , $V=D_{1-z}$ , and $G=D_{z}$ , conditional on ${\bm{X}}$ . Similarly, one can show $\{Y_{z^{\prime}dm^{\prime}},M_{zd}\}\perp\!\!\!\perp D_{1-z^{\prime}}|D_{z^{\prime}},{\bm{X}}$ by applying Lemma S2, with $X=\{Y_{z^{\prime}dm^{\prime}},M_{zd}\}$ , $V=D_{1-z^{\prime}}$ , and $G=D_{z^{\prime}}$ , conditional on ${\bm{X}}$ . Finally, $Y_{z^{\prime}dm^{\prime}}\perp\!\!\!\perp D_{1-z^{\prime}}|M_{zd},D_{z^{\prime}},{\bm{X}}$ follows by applying Lemma S2 again, with $X=D_{1-z^{\prime}}$ , $V=Y_{z^{\prime}dm^{\prime}}$ , and $G=M_{zd}$ , conditional on $\{D_{z^{\prime}},{\bm{X}}\}$ . $\square$

Lemma S4

Let $V$ and $G$ be two binary random variables satisfying $V\geq G$ and $X$ be any random variable, then we have

X\perp\!\!\!\perp\{V,G\}\Longleftrightarrow X\perp\!\!\!\perp V\text{ and }X\perp\!\!\!\perp G.

Proof.

This follows from Lemma S1 in Forastiere et al. (2018). $\square$
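To see why the ordering $V\geq G$ matters in Lemma S4, note that the cell $(V,G)=(0,1)$ is empty, so the two margins determine the whole joint law of $(V,G)$. The sketch below illustrates this on a toy pmf and contrasts it with an unordered counterexample ($X=V\oplus G$ with independent fair coins), where both marginal independences hold but joint independence fails. All numbers and helper names are illustrative assumptions.

```python
def joint_from_margins(p_x, p_v1, p_g1):
    """Joint pmf of (X, V, G) when V >= G and the margins of V and G do not depend on x.
    Under V >= G the cell (v, g) = (0, 1) is empty, so the margins fix every cell."""
    joint = {}
    for x, px in p_x.items():
        joint[x, 0, 0] = px * (1.0 - p_v1)
        joint[x, 1, 0] = px * (p_v1 - p_g1)
        joint[x, 1, 1] = px * p_g1
        joint[x, 0, 1] = 0.0
    return joint

def x_indep_of_coord(joint, i):
    """Marginal check: X independent of coordinate i (1 for V, 2 for G)."""
    p_x, p_c, p_xc = {}, {}, {}
    for key, p in joint.items():
        x, c = key[0], key[i]
        p_x[x] = p_x.get(x, 0.0) + p
        p_c[c] = p_c.get(c, 0.0) + p
        p_xc[x, c] = p_xc.get((x, c), 0.0) + p
    return all(abs(p_xc.get((x, c), 0.0) - p_x[x] * p_c[c]) < 1e-12
               for x in p_x for c in p_c)

def x_indep_of_pair(joint):
    """Joint check: X independent of the pair (V, G)."""
    p_x, p_vg = {}, {}
    for (x, v, g), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_vg[v, g] = p_vg.get((v, g), 0.0) + p
    return all(abs(joint.get((x, v, g), 0.0) - p_x[x] * p_vg[v, g]) < 1e-12
               for x in p_x for (v, g) in p_vg)

# Ordered case: margins free of x imply joint independence.
monotone = joint_from_margins({0: 0.4, 1: 0.6}, p_v1=0.7, p_g1=0.2)

# Unordered counterexample: V, G independent fair coins and X = V xor G.
xor = {(v ^ g, v, g): 0.25 for v in (0, 1) for g in (0, 1)}
```

Here `x_indep_of_pair(monotone)` holds, whereas `xor` passes both marginal checks but fails the joint check.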

Lemma S5

Under monotonicity (either Assumption 3a or 3b), Assumption 2 implies that $\{U,M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{*}m^{*}}\}\perp\!\!\!\perp Z|{\bm{X}},$ for any $z^{\prime}$ , $z^{*}$ , $d^{\prime}$ , $d^{*}$ , $m^{*}$ .

Proof.

Assumption 2 suggests that

\{D_{1},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|{\bm{X}}\text{ and }\{D_{0},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|{\bm{X}},

for any $z^{\prime}$ , $z^{*}$ , $d^{\prime}$ , and $m^{*}$ . Therefore,

D_{1}\perp\!\!\!\perp Z|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}}\text{ and }D_{0}\perp\!\!\!\perp Z|{\bm{X}},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}

(s10)

follow from Lemma S2. Moreover, (s10) further implies

\{D_{1},D_{0}\}\perp\!\!\!\perp Z|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}}\Longleftrightarrow U\perp\!\!\!\perp Z|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}},

(s11)

by applying Lemma S4, with $V=D_{1}$ , $G=D_{0}$ , and $X=Z$ , conditional on $\{M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}}\}$ . Therefore, we have that

\begin{aligned}
&f(Z,U,M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|{\bm{X}})\\
&=f(Z,U|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}})f(M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|{\bm{X}})\\
&=f(Z|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}})f(U|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}})f(M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|{\bm{X}})\quad(\text{by (s11)})\\
&=f(Z|{\bm{X}})f(U|M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}},{\bm{X}})f(M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|{\bm{X}})\quad(\text{by Assumption 2})\\
&=f(Z|{\bm{X}})f(U,M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}|{\bm{X}}).
\end{aligned}

This equation then shows that $\{U,M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|{\bm{X}}$ for any $z^{\prime}$ , $z^{*}$ , $d^{\prime}$ , and $m^{*}$ . $\square$

Lemma S6

Under Assumptions 2–4, Assumption 5 is equivalent to $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}}$ $\forall$ $z$ , $z^{\prime}$ , $d$ , $d^{\prime}$ , $m^{\prime}$ .

Proof.

Observe that

\begin{aligned}
\text{Assumption 5}&\Longleftrightarrow f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,{\bm{X}})=f(M_{zd}|Z,U,{\bm{X}})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,{\bm{X}})\\
&\Longleftrightarrow f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|U,{\bm{X}})=f(M_{zd}|U,{\bm{X}})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|U,{\bm{X}})\\
&\Longleftrightarrow f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}})=f(M_{zd}|{\bm{X}})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}})\\
&\Longleftrightarrow M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}},
\end{aligned}

where the first to the second row follows from Lemma S5 (as a consequence of Assumptions 2–3), and the second to the third row follows from Assumption 4. This completes the proof. $\square$

Proof of Theorem 1. Define $d_{z}=\mathbb{I}(z=1)d_{1}+\mathbb{I}(z=0)d_{0}$ such that $d_{z}=d_{1}$ and $d_{0}$ if $z$ in $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ is 1 and 0, respectively. Similarly, define $d_{z^{\prime}}=\mathbb{I}(z^{\prime}=1)d_{1}+\mathbb{I}(z^{\prime}=0)d_{0}$ , $d_{1-z}=\mathbb{I}(z=0)d_{1}+\mathbb{I}(z=1)d_{0}$ , and $d_{1-z^{\prime}}=\mathbb{I}(z^{\prime}=0)d_{1}+\mathbb{I}(z^{\prime}=1)d_{0}$ . By the definition of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ , we have that

\begin{aligned}
\theta_{d_{1}d_{0}}^{(zz^{\prime})}&=\mathbb{E}[Y_{zM_{z^{\prime}}}|U=d_{1}d_{0}]\\
&=\mathbb{E}\left[\mathbb{E}[Y_{zM_{z^{\prime}}}|U=d_{1}d_{0},{\bm{X}}]\Big{|}U=d_{1}d_{0}\right]\quad\text{(by law of iterated expectations)}\\
&=\mathbb{E}\left[\mathbb{E}[Y_{zM_{z^{\prime}}}|Z=z,U=d_{1}d_{0},{\bm{X}}]\Big{|}U=d_{1}d_{0}\right]\quad\text{(by Lemma S5)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zm}|Z=z,M_{z^{\prime}}=m,U=d_{1}d_{0},{\bm{X}}]\text{d}\mathbb{P}_{M_{z^{\prime}}|Z,U,{\bm{X}}}(m|z,d_{1}d_{0},{\bm{X}})\Big{|}U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zm}|Z=z,M_{z}=m,U=d_{1}d_{0},{\bm{X}}]\text{d}\mathbb{P}_{M_{z^{\prime}}|Z,U,{\bm{X}}}(m|z,d_{1}d_{0},{\bm{X}})\Big{|}U=d_{1}d_{0}\right]\quad\text{(by Assumption 5)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zm}|M_{z}=m,U=d_{1}d_{0},{\bm{X}}]\text{d}\mathbb{P}_{M_{z^{\prime}}|U,{\bm{X}}}(m|d_{1}d_{0},{\bm{X}})\Big{|}U=d_{1}d_{0}\right]\quad\text{(by Lemma S5)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zD_{z}m}|M_{zD_{z}}=m,U=d_{1}d_{0},{\bm{X}}]\text{d}\mathbb{P}_{M_{z^{\prime}D_{z^{\prime}}}|U,{\bm{X}}}(m|d_{1}d_{0},{\bm{X}})\Big{|}U=d_{1}d_{0}\right]\quad\text{(by composition of potential values)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zD_{z}m}|D_{z}=d_{z},D_{1-z}=d_{1-z},M_{zD_{z}}=m,{\bm{X}}]\text{d}\mathbb{P}_{M_{z^{\prime}D_{z^{\prime}}}|D_{z^{\prime}},D_{1-z^{\prime}},{\bm{X}}}(m|d_{z^{\prime}},d_{1-z^{\prime}},{\bm{X}})\Big{|}U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zD_{z}m}|D_{z}=d_{z},M_{zD_{z}}=m,{\bm{X}}]\text{d}\mathbb{P}_{M_{z^{\prime}D_{z^{\prime}}}|D_{z^{\prime}},{\bm{X}}}(m|d_{z^{\prime}},{\bm{X}})\Big{|}U=d_{1}d_{0}\right]\quad\text{(by Lemma S3)}\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y_{zD_{z}m}|Z=z,D_{z}=d_{z},M_{zD_{z}}=m,{\bm{X}}]\text{d}\mathbb{P}_{M_{z^{\prime}D_{z^{\prime}}}|Z,D_{z^{\prime}},{\bm{X}}}(m|z^{\prime},d_{z^{\prime}},{\bm{X}})\Big{|}U=d_{1}d_{0}\right]\\
&=\mathbb{E}\left[\int_{m}\mathbb{E}[Y|Z=z,D=d_{z},M=m,{\bm{X}}]\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z^{\prime},d_{z^{\prime}},{\bm{X}})\Big{|}U=d_{1}d_{0}\right]\quad\text{(by Assumption 1)}\\
&=\int_{{\bm{x}}}\frac{f_{U|{\bm{X}}}(d_{1}d_{0}|{\bm{x}})}{f_{U}(d_{1}d_{0})}\int_{m}\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},m,{\bm{x}}]\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\quad\text{(by Lemma S1)}\\
&=\int_{{\bm{x}}}\frac{e_{d_{1}d_{0}}({\bm{x}})}{e_{d_{1}d_{0}}}\int_{m}\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},m,{\bm{x}}]\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}}),
\end{aligned}

where $e_{d_{1}d_{0}}({\bm{x}})=p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})$ and $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ are identified in equation (3) of the main manuscript under the monotonicity assumption (either Assumption 3a or 3b). This completes the proof. $\square$
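The identification formula can be illustrated with a small simulation for the complier stratum ($d_{1}d_{0}=10$). The sketch below is a toy check under assumptions that are not from the paper: a binary covariate, binary mediator, hypothetical functional forms `p_u`, `r1`, and `mu`, and the standard monotonicity result $e_{10}({\bm{x}})=p_{11}({\bm{x}})-p_{01}({\bm{x}})$, which is assumed here to match equation (3) of the main manuscript. Potential values are generated from independent errors, so Assumptions 2, 4, and 5 hold by construction, and the average of $Y_{zM_{z^{\prime}}}$ among simulated compliers is compared with the closed-form right-hand side.

```python
import random

P_X1 = 0.4  # hypothetical P(X = 1)

def p_u(x):
    """Hypothetical strata probabilities given x: always-takers 11, compliers 10, never-takers 00."""
    return {"11": 0.2, "10": 0.4 + 0.2 * x, "00": 0.4 - 0.2 * x}

def p_d1(z, x):
    """P(D = 1 | Z = z, X = x) implied by monotonicity D_1 >= D_0."""
    return p_u(x)["11"] + (p_u(x)["10"] if z == 1 else 0.0)

def r1(z, d, x):
    """Hypothetical P(M = 1 | Z = z, D = d, X = x)."""
    return 0.3 + 0.2 * z + 0.1 * d + 0.1 * x

def mu(z, d, m, x):
    """Hypothetical E[Y | Z = z, D = d, M = m, X = x]."""
    return 1.0 + 0.5 * z + 0.3 * d + 0.8 * m + 0.2 * x

def theta_formula(z, zp):
    """Right-hand side of the identification formula for compliers (d_z = z, d_z' = z')."""
    px = {0: 1 - P_X1, 1: P_X1}
    e = {x: p_d1(1, x) - p_d1(0, x) for x in px}   # e_10(x) = p_11(x) - p_01(x) (assumed)
    e_marg = sum(px[x] * e[x] for x in px)
    total = 0.0
    for x in px:
        eta = sum(mu(z, z, m, x) * (r1(zp, zp, x) if m == 1 else 1 - r1(zp, zp, x))
                  for m in (0, 1))
        total += px[x] * e[x] / e_marg * eta
    return total

def theta_monte_carlo(z, zp, n=200_000, seed=1):
    """Average of Y_{z M_{z'}} among simulated compliers, built from potential values."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        x = int(rng.random() < P_X1)
        probs, draw = p_u(x), rng.random()
        if not (probs["11"] <= draw < probs["11"] + probs["10"]):
            continue                                  # keep compliers only
        m = int(rng.random() < r1(zp, zp, x))         # M_{z' D_{z'}} with D_{z'} = z'
        total += mu(z, z, m, x) + rng.gauss(0.0, 1.0) # Y_{z D_z m} with D_z = z
        count += 1
    return total / count
```

For example, `theta_formula(1, 0)` and `theta_monte_carlo(1, 0)` should agree up to Monte Carlo error. The treatment assignment is not simulated because the check works with potential values directly.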

D.2 Connections to existing literature (Remarks 2 and 3)

We compare the identification assumptions used in the current article to the identification assumptions in Zhou (2022) and Tchetgen Tchetgen and VanderWeele (2014). Specifically, Zhou (2022) considers the identification of path-specific effects in the presence of multiple causally-ordered mediators and Tchetgen Tchetgen and VanderWeele (2014) considers the identification of mediation effects in the presence of an exposure-induced confounder.

We focus on a comparable scenario with two intermediate variables, a binary variable $D$ and a binary/continuous variable $M$ , both of which sit in the causal pathway between the treatment assignment $Z$ and an outcome $Y$ , and we further assume that the monotonicity assumption of $Z$ on $D$ (either Assumption 3a or Assumption 3b) holds. These two intermediate variables, $(D,M)$ , have different names in these three papers; they are referred to as the post-treatment event and the mediator in the current paper, as the first mediator and the second mediator in Zhou (2022), and as the treatment-induced confounder and the mediator in Tchetgen Tchetgen and VanderWeele (2014). All three papers consider the consistency assumption (Assumption 1) and slightly different versions of the positivity assumption. To offer a common ground for the comparison of key identification assumptions, throughout the comparison, we always assume that consistency (Assumption 1) and positivity (Assumption 6) hold. Besides the consistency and positivity assumptions, Tchetgen Tchetgen and VanderWeele (2014) assume monotonicity (Assumption 3a) and the following NPSEM-IE:

Assumption S1 (NPSEM-IE in Tchetgen Tchetgen and VanderWeele, 2014)

Suppose the following nonparametric structural equation models with independent errors hold:

(i) ${\bm{X}}=g_{\bm{X}}(\epsilon_{\bm{X}})$,

(ii) $Z=g_{Z}({\bm{X}},\epsilon_{Z})$,

(iii) $D=g_{D}(Z,{\bm{X}},\epsilon_{D})$,

(iv) $M=g_{M}(Z,D,{\bm{X}},\epsilon_{M})$,

(v) $Y=g_{Y}(Z,D,M,{\bm{X}},\epsilon_{Y})$,

where $\{g_{\bm{X}},g_{Z},g_{D},g_{M},g_{Y}\}$ are nonparametric functions and the errors $\{\epsilon_{\bm{X}},\epsilon_{Z},\epsilon_{D},\epsilon_{M},\epsilon_{Y}\}$ are mutually independent.

Besides the consistency and positivity assumptions, Zhou (2022) considers the following generalized sequential ignorability assumptions:

Assumption S2 (Assumption 2 in Zhou, 2022)

Suppose the following set of ignorability assumptions hold:

(i) $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|{\bm{X}}$ for any $z$ , $z^{\prime}$ , $d^{\prime}$ , $z^{*}$ , $m^{*}$ ,

(ii) $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{z^{*}}|Z,{\bm{X}}$ for any $z$ , $z^{\prime}$ , $z^{*}$ , $d$ , $m^{\prime}$ ,

(iii) $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D,{\bm{X}}$ for any $z$ , $z^{\prime}$ , $d$ , $d^{\prime}$ , $m^{\prime}$ .

Besides the consistency and positivity assumptions, the current work considers Assumptions 2, 4, 5. To facilitate exposition, we restate these three assumptions:

Assumption S3 (Assumptions 2, 4 and 5 in current work)

Suppose the following ignorability assumptions hold:

(i) $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|{\bm{X}}$ for any $z$ , $z^{\prime}$ , $z^{*}$ , $d^{\prime}$ , and $m^{*}$ ,

(ii) $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp U|{\bm{X}}$ for any $z$ , $z^{\prime}$ , $d$ , and $m^{\prime}$ ,

(iii) $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,{\bm{X}}$ for any $z$ , $z^{\prime}$ , $d$ , $d^{\prime}$ , $m^{\prime}$ .

The following two lemmas are useful to prove Remarks 2 and 3 in the paper.

Lemma S7

Suppose that Assumptions 1 and 3 hold. Then, if Assumption S1 holds, Assumption S2 also holds, but not vice versa.

Proof.

First suppose that Assumption S1 holds. According to Assumption S1 and by the consistency (Assumption 1) and composition of potential values, we have

\begin{aligned}
Z&=g_{Z}({\bm{X}},\epsilon_{Z}),\\
D_{z}&=g_{D}(z,{\bm{X}},\epsilon_{D}),&&\text{(s12)}\\
M_{z^{\prime}d^{\prime}}&=g_{M}(z^{\prime},d^{\prime},{\bm{X}},\epsilon_{M}),&&\text{(s13)}\\
Y_{z^{*}d^{\prime}m^{*}}&=g_{Y}(z^{*},d^{\prime},m^{*},{\bm{X}},\epsilon_{Y}),&&\text{(s14)}
\end{aligned}

for any $z$ , $z^{\prime}$ , $z^{*}$ , $d^{\prime}$ , and $m^{*}$ , which indicates that $\{D_{z},M_{z^{\prime}d^{\prime}},Y_{z^{*}d^{\prime}m^{*}}\}\perp\!\!\!\perp Z|{\bm{X}}$ because $\{\epsilon_{Z},\epsilon_{D},\epsilon_{M},\epsilon_{Y}\}$ are mutually independent. Therefore, Assumption S2(i) holds. Moreover, equations (s12), (s13), and (s14) suggest that

\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{z^{*}}|{\bm{X}},

(s15)

for any $z$ , $z^{\prime}$ , $z^{*}$ , $d$ , $m^{\prime}$ . This implies

f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|{\bm{X}})=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|{\bm{X}})f(D_{z^{*}}|{\bm{X}}),

which, together with Assumption S2(i), implies that

f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|Z,{\bm{X}})=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|Z,{\bm{X}})f(D_{z^{*}}|Z,{\bm{X}}).

Therefore, Assumption S2(ii) holds. Similarly, equations (s13) and (s14) suggest that $M_{zd}\perp\!\!\!\perp Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}}$ for any $z$ , $z^{\prime}$ , $d$ , $d^{\prime}$ , $m^{\prime}$ , which indicates

f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}})=f(M_{zd}|{\bm{X}})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}}).

This, coupled with (s15), suggests that

f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|D_{z^{*}},{\bm{X}})=f(M_{zd}|D_{z^{*}},{\bm{X}})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|D_{z^{*}},{\bm{X}}),

for any $z^{*}$ , which further implies

f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z=z^{*},D_{z^{*}},{\bm{X}})=f(M_{zd}|Z=z^{*},D_{z^{*}},{\bm{X}})f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z=z^{*},D_{z^{*}},{\bm{X}})

as a consequence of Assumption S2(i). Because $D_{z^{*}}=D$ if $Z=z^{*}$ , we conclude that Assumption S2(iii) holds. This completes the proof that Assumption S2 holds whenever Assumption S1 is valid. However, Assumption S1 may not hold under Assumption S2; that is, Assumption S1 is stronger than Assumption S2. For example, Assumption S2 does not require $\{M_{zd},M_{z^{\prime}d^{\prime}}\}\perp\!\!\!\perp Z|{\bm{X}}$ for $zd\neq z^{\prime}d^{\prime}$ , but Assumption S1 implicitly requires this through the following set of nonparametric structural equations:

\begin{aligned}
Z&=g_{Z}({\bm{X}},\epsilon_{Z}),\\
M_{zd}&=g_{M}(z,d,{\bm{X}},\epsilon_{M}),\\
M_{z^{\prime}d^{\prime}}&=g_{M}(z^{\prime},d^{\prime},{\bm{X}},\epsilon_{M}).
\end{aligned}

$\square$

Lemma S8

Suppose that Assumptions 1 and 3 hold. Then, Assumption S2 is equivalent to Assumption S3.

Proof.

Assumption S2(i) is identical to Assumption S3(i) by direct comparison. Next, we show that Assumption S2(ii) is equivalent to Assumption S3(ii) under Assumption S2(i) (or equivalently Assumption S3(i)). Specifically, under Assumption S2(i), Assumption S2(ii) suggests that

\begin{aligned}
f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|{\bm{X}})&=f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|Z,{\bm{X}})\\
&=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|Z,{\bm{X}})f(D_{z^{*}}|Z,{\bm{X}})\\
&=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|{\bm{X}})f(D_{z^{*}}|{\bm{X}}).
\end{aligned}

That is, $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{z^{*}}|{\bm{X}}$ for any $z$ , $z^{\prime}$ , $z^{*}$ , $d$ , and $m^{\prime}$ , which further implies that

\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{1}|{\bm{X}}\text{ and }\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{0}|{\bm{X}}.

Applying Lemma S4 to the previous equation and noting that $D_{1}\geq D_{0}$ by monotonicity, we obtain $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp U|{\bm{X}}$ ; i.e., Assumption S3(ii) holds under Assumption S2(i)–(ii). On the other hand, suppose that Assumption S3(ii) holds; then we obtain $\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{1}|{\bm{X}}\text{ and }\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{0}|{\bm{X}}$ by Lemma S4. This suggests that

\{M_{zd},Y_{z^{\prime}dm^{\prime}}\}\perp\!\!\!\perp D_{z^{*}}|{\bm{X}},

for any $z$ , $z^{\prime}$ , $z^{*}$ , $d$ , and $m^{\prime}$ . Then applying Assumption S2(i), we have

\begin{aligned}
f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|Z,{\bm{X}})&=f(M_{zd},Y_{z^{\prime}dm^{\prime}},D_{z^{*}}|{\bm{X}})=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|{\bm{X}})f(D_{z^{*}}|{\bm{X}})\\
&=f(M_{zd},Y_{z^{\prime}dm^{\prime}}|Z,{\bm{X}})f(D_{z^{*}}|Z,{\bm{X}}),
\end{aligned}

thus Assumption S2(ii) also holds under Assumption S3(i)–(ii). Therefore, we have verified that S2(i)–(ii) are equivalent to Assumption S3(i)–(ii).

Next, we show that Assumption S2(iii) is equivalent to Assumption S3(iii) under Assumption S2(i)–(ii) (or equivalently, Assumption S3(i)–(ii)). When the monotonicity assumption (Assumption 3) holds, the following statements are equivalent under Assumption S2(i)–(ii) and Assumption S3(i)–(ii):

\begin{aligned}
&\text{Assumption S3(iii)}\\
\Longleftrightarrow\ &f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,U,{\bm{X}})f(M_{zd}|Z,U,{\bm{X}})\\
\Longleftrightarrow\ &f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|U,{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|U,{\bm{X}})f(M_{zd}|U,{\bm{X}})\quad\text{(by Lemma S5)}\\
\Longleftrightarrow\ &f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|{\bm{X}})f(M_{zd}|{\bm{X}})\quad\text{(by Assumption S3(ii))}\\
\Longleftrightarrow\ &f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,{\bm{X}})f(M_{zd}|Z,{\bm{X}})\quad\text{(by Assumption S3(i) or Assumption S2(i))}\\
\Longleftrightarrow\ &f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D_{z^{*}},{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D_{z^{*}},{\bm{X}})f(M_{zd}|Z,D_{z^{*}},{\bm{X}})\quad\text{(by Assumption S2(ii))}\\
\Longleftrightarrow\ &f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z=z^{*},D_{z^{*}},{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z=z^{*},D_{z^{*}},{\bm{X}})f(M_{zd}|Z=z^{*},D_{z^{*}},{\bm{X}})\\
\Longleftrightarrow\ &f(M_{zd},Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D,{\bm{X}})=f(Y_{z^{\prime}d^{\prime}m^{\prime}}|Z,D,{\bm{X}})f(M_{zd}|Z,D,{\bm{X}})\\
\Longleftrightarrow\ &\text{Assumption S2(iii)}.
\end{aligned}

Then we conclude the proof. $\square$

Proof of Remark 2 and Remark 3. Remark 3 follows from Lemma S8. Moreover, Lemma S8 implies that Assumptions 2, 4, and 5 are equivalent to Assumption S2 when consistency (Assumption 1) and monotonicity (Assumption 3) hold. Remark 2 then follows directly from Lemma S7. $\square$

D.3 Moment type estimators (Theorem 2 and Proposition 1)

Proof of Theorem 2. One can easily verify that $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}$ by direct comparison. Below we show $\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{c}}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}$ :

\begin{aligned}
\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{c}}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\mu_{zd_{z}}(M,{\bm{X}})\right]\\
&=\int_{{\bm{x}}}\int_{m}\left\{\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{1}{p_{z^{\prime}d_{z^{\prime}}}({\bm{x}})\pi_{z^{\prime}}({\bm{x}})}\mu_{zd_{z}}(m,{\bm{x}})\right\}f_{D|Z,{\bm{X}}}(d_{z^{\prime}}|z^{\prime},{\bm{x}})f_{Z|{\bm{X}}}(z^{\prime}|{\bm{x}})\\
&\qquad\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
&=\int_{{\bm{x}}}\int_{m}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\mu_{zd_{z}}(m,{\bm{x}})\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
&=\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}\mu_{zd_{z}}(m,{\bm{x}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})\text{d}m\,\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
&=\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{zz^{\prime}}({\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}.
\end{aligned}

Next, we show $\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{b}}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}$ :

\begin{aligned}
\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{b}}&=\mathbb{E}\left[\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)D}{\pi_{0}({\bm{X}})}\right\}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]\\
&=\int_{{\bm{x}}}\left\{\frac{1}{\pi_{z^{*}}({\bm{x}})}\frac{\eta_{zz^{\prime}}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\right\}f_{D|Z,{\bm{X}}}(d^{*}|z^{*},{\bm{x}})f_{Z|{\bm{X}}}(z^{*}|{\bm{x}})\text{d}\mathbb{P}_{\bm{X}}({\bm{x}})\\
&\quad-\int_{{\bm{x}}}\left\{\frac{k}{\pi_{0}({\bm{x}})}\frac{\eta_{zz^{\prime}}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\right\}f_{D|Z,{\bm{X}}}(1|0,{\bm{x}})f_{Z|{\bm{X}}}(0|{\bm{x}})\text{d}\mathbb{P}_{\bm{X}}({\bm{x}})\\
&=\int_{{\bm{x}}}\left\{f_{D|Z,{\bm{X}}}(d^{*}|z^{*},{\bm{x}})-kf_{D|Z,{\bm{X}}}(1|0,{\bm{x}})\right\}\frac{\eta_{zz^{\prime}}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\text{d}\mathbb{P}_{\bm{X}}({\bm{x}})\\
&=\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{zz^{\prime}}({\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}.
\end{aligned}

Finally, we show $\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{a}}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}$ :

\begin{aligned}
\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{a}}&=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}Y\right]\\
&=\int_{{\bm{x}}}\int_{m}\int_{y}\left\{\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{1}{p_{zd_{z}}({\bm{x}})\pi_{z}({\bm{x}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}y\right\}f_{D|Z,{\bm{X}}}(d_{z}|z,{\bm{x}})f_{Z|{\bm{X}}}(z|{\bm{x}})\\
&\qquad\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z,d_{z},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
&=\int_{{\bm{x}}}\int_{m}\int_{y}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}y\,\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})\\
&\qquad\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z,d_{z},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
&=\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}\int_{y}y\,\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})\\
&\qquad\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z,d_{z},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
&=\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}\mu_{zd_{z}}(m,{\bm{x}})\text{d}\mathbb{P}_{M|Z,D,{\bm{X}}}(m|z,d_{z},{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
&=\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})\mu_{zd_{z}}(m,{\bm{x}})\text{d}m\,\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\\
&=\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{zz^{\prime}}({\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}.
\end{aligned}

This concludes that $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\theta^{(zz^{\prime}),\textrm{a}}_{d_{1}d_{0}}=\theta^{(zz^{\prime}),\textrm{b}}_{d_{1}d_{0}}=\theta^{(zz^{\prime}),\textrm{c}}_{d_{1}d_{0}}=\theta^{(zz^{\prime}),\textrm{d}}_{d_{1}d_{0}}$ . $\square$
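The equalities above can be confirmed by exact enumeration on a toy discrete distribution. The sketch below is illustrative only: the nuisance functions `pi`, `p`, `r`, and `mu` are hypothetical, all variables are binary, and the complier stratum is used with $z^{*}d^{*}=11$ and $k=1$, which is an assumption about equation (3) of the main manuscript. Because each expression is linear in $Y$, the outcome is replaced by its conditional mean when taking expectations.

```python
from itertools import product

P_X = {0: 0.6, 1: 0.4}                       # hypothetical covariate distribution

def pi(z, x):                                # pi_z(x) = P(Z = z | x)
    p1 = 0.5 + 0.1 * x
    return p1 if z == 1 else 1 - p1

def p(z, d, x):                              # p_{zd}(x) = P(D = d | Z = z, x)
    p1 = 0.2 + 0.4 * z + 0.1 * x + 0.1 * z * x
    return p1 if d == 1 else 1 - p1

def r(z, d, m, x):                           # r_{zd}(m, x) = P(M = m | Z = z, D = d, x)
    p1 = 0.3 + 0.2 * z + 0.1 * d + 0.1 * x
    return p1 if m == 1 else 1 - p1

def mu(z, d, m, x):                          # mu_{zd}(m, x) = E[Y | z, d, m, x]
    return 1.0 + 0.5 * z + 0.3 * d + 0.8 * m + 0.2 * x

ZS, DS, K = 1, 1, 1                          # assumed (z*, d*, k) for compliers
D1, D0 = 1, 0                                # stratum d1 d0 = 10

def p_bar(z, d):                             # marginal p_{zd} = E[p_{zd}(X)]
    return sum(px * p(z, d, x) for x, px in P_X.items())

DENOM = p_bar(ZS, DS) - K * p_bar(0, 1)

def w(x):                                    # stratum weight e(x) / e
    return (p(ZS, DS, x) - K * p(0, 1, x)) / DENOM

def eta(z, zp, x):                           # eta_{zz'}(x) = sum_m mu_{z d_z} r_{z' d_z'}
    dz, dzp = (D1 if z else D0), (D1 if zp else D0)
    return sum(mu(z, dz, m, x) * r(zp, dzp, m, x) for m in (0, 1))

def expect(g):
    """Exact E[g(X, Z, D, M, Y)] for g linear in Y, by enumerating the observed-data law."""
    return sum(P_X[x] * pi(z, x) * p(z, d, x) * r(z, d, m, x) * g(x, z, d, m, mu(z, d, m, x))
               for x, z, d, m in product((0, 1), repeat=4))

def thetas(z, zp):
    """Return the four moment expressions (a, b, c, d) for theta^{(zz')}."""
    dz, dzp = (D1 if z else D0), (D1 if zp else D0)
    a = expect(lambda x, Z, D, M, y: w(x) * (D == dz and Z == z) / (p(z, dz, x) * pi(z, x))
               * r(zp, dzp, M, x) / r(z, dz, M, x) * y)
    b = expect(lambda x, Z, D, M, y: ((Z == ZS and D == DS) / pi(ZS, x)
               - K * (1 - Z) * D / pi(0, x)) * eta(z, zp, x) / DENOM)
    c = expect(lambda x, Z, D, M, y: w(x) * (D == dzp and Z == zp)
               / (p(zp, dzp, x) * pi(zp, x)) * mu(z, dz, M, x))
    d = sum(px * w(x) * eta(z, zp, x) for x, px in P_X.items())
    return a, b, c, d
```

For any $(z,z^{\prime})$, the four values returned by `thetas(z, zp)` should coincide up to floating-point error.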

Next, we proceed with the proof of Proposition 1.

Proof of Proposition 1. Here we only prove the consistency and asymptotic normality of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}$ under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , and the proof can be easily extended to the other three moment-type estimators, $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{a}}$ , $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{b}}$ , and $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{c}}$ .

Let $\bm{\tau}$ be all of the parameters in the parametric working models of $h_{nuisance}^{\text{par}}$ , and let $\bm{\tau}^{*}$ be the probability limit of $\widehat{\bm{\tau}}$ . Let $\widetilde{h}_{nuisance}=\{\widetilde{\pi}_{z}({\bm{x}}),\widetilde{p}_{zd}({\bm{x}}),\widetilde{r}_{zd}(m,{\bm{x}}),\widetilde{\mu}_{zd}(m,{\bm{x}})\}$ be the value of $h_{nuisance}^{\text{par}}$ when it is evaluated at $\bm{\tau}^{*}$ ; $\widetilde{h}_{nuisance}$ is taken as the probability limit of $\widehat{h}_{nuisance}^{\text{par}}$ . Under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , we have that $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$ , $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$ , $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ , but we allow $\widetilde{\pi}_{z}({\bm{x}})\neq\pi_{z}({\bm{x}})$ due to possible misspecification of $\mathcal{M}_{\pi}$ . Let $\widetilde{p}_{zd}=\mathbb{E}\left[\frac{\mathbb{I}(Z=z)(\mathbb{I}(D=d)-\widetilde{p}_{zd}({\bm{X}}))}{\widetilde{\pi}_{z}({\bm{X}})}+\widetilde{p}_{zd}({\bm{X}})\right]$ be the probability limit of $p_{zd}^{\text{dr}}$ . According to Jiang et al. (2022), $\widetilde{p}_{zd}=p_{zd}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ , a union model that contains $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

Next, we prove the consistency and asymptotic normality of $\widehat{\theta}^{(zz^{\prime}),\text{d}}_{d_{1}d_{0}}$ . Notice that $\widehat{\theta}^{(zz^{\prime}),\text{d}}_{d_{1}d_{0}}$ can be viewed as the solution of the following estimating equation

\mathbb{P}_{n}\left[S_{\text{d}}\left({\bm{O}};\theta_{d_{1}d_{0}}^{(zz^{\prime})},\widehat{\bm{\tau}}\right)\right]=\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\widehat{\bm{\tau}})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\mathcal{S}_{0}(\bm{O};\widehat{\bm{\tau}})\right]=0,

where

\mathcal{S}_{1}(\bm{O};\bm{\tau})=\frac{p_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-kp_{01}^{\text{par}}({\bm{X}})}{p_{z^{*}d^{*}}^{\text{dr}}-kp_{01}^{\text{dr}}}\int_{m}\mu_{zd_{z}}^{\text{par}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(m,{\bm{X}})dm

and

\mathcal{S}_{0}(\bm{O};\bm{\tau})=\mathbb{P}_{n}\left[\frac{\mathbb{I}(Z=z)\left\{\mathbb{I}(D=d)-p_{zd}^{\text{par}}({\bm{X}})\right\}}{{\pi}_{z}^{\text{par}}({\bm{X}})}+p_{zd}^{\text{par}}({\bm{X}})\right].

In addition, assume that the following regularity conditions hold:

1. Assume that $\sqrt{n}(\widehat{\bm{\tau}}-\bm{\tau}^{*})=\sqrt{n}\mathbb{P}_{n}\left[\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right]+o_{p}(1)$ , where $\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})$ is the influence function of $\widehat{\bm{\tau}}$ and $o_{p}(1)$ is a remainder term that converges in probability to 0. Also, assume that $\mathbb{P}_{n}\left[\left\{\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}^{\otimes 2}\right]$ converges to a positive definite matrix.

2. Let $\bm{\Xi}$ be a bounded convex neighborhood of $\bm{\tau}^{*}$ . Assume that the class of functions $\Big{\{}\mathcal{S}_{1}(\bm{O};\bm{\tau}),\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}),\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau})\right\}^{2},\mathcal{S}_{0}(\bm{O};\bm{\tau}),\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}),\left\{\mathcal{S}_{0}(\bm{O};\bm{\tau})\right\}^{2},\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}),\left\{\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau})\right\}^{\otimes 2}\Big{\}}$ is a Glivenko-Cantelli class in $\bm{\Xi}$ .

3. Assume that $\mathbb{P}_{n}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]$ converges to a positive value. In addition, we assume that both $\mathbb{P}_{n}[\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})\right\}^{2}]$ and $\mathbb{P}_{n}[\left\{\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right\}^{2}]$ converge to a positive value.

To prove asymptotic normality, we use a Taylor series, along with the above conditions, to deduce that

	$\displaystyle 0=\mathbb{P}_{n}\left[S_{\text{d}}\left({\bm{O}};\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}},\widehat{\bm{\tau}}\right)\right]=$	$\displaystyle\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\widehat{\bm{\tau}})-\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\mathcal{S}_{0}(\bm{O};\widehat{\bm{\tau}})\right]$
	$\displaystyle=$	$\displaystyle\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]$
		$\displaystyle-\mathbb{P}_{n}\left[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\right)$
		$\displaystyle+\mathbb{P}_{n}\left[\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right](\widehat{\bm{\tau}}-\bm{\tau}^{*})+o_{p}(n^{-1/2}),$

which suggests that

		$\displaystyle\sqrt{n}\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\right)$
	$\displaystyle=$	$\displaystyle\left\{\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]\right\}^{-1}\sqrt{n}\,\mathbb{P}_{n}\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})+R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}},\bm{\tau}^{*})\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}+o_{p}(1),$

where $R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}},\bm{\tau}^{*})=\mathbb{E}\left[\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]$ . Then, by applying the central limit theorem and noticing that $\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}=\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , we can show that $\sqrt{n}\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\right)$ converges to a zero-mean normal distribution with variance

V_{\text{d}}=\left\{\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]\right\}^{-2}\mathbb{E}\left[\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})+R(\theta_{d_{1}d_{0}}^{(zz^{\prime})},\bm{\tau}^{*})\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}^{2}\right].

D.4 The efficient influence function (Theorem 3)

We derive the efficient influence function (EIF) of $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under the nonparametric model over the observed data $\bm{O}=\{{\bm{X}},Z,D,M,Y\}$ . Define $H_{d_{1}d_{0}}^{(zz^{\prime})}=\mathbb{E}\left[(p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}))\eta_{zz^{\prime}}({\bm{X}})\right]$ , where $k=|d_{1}-d_{0}|$ and $z^{*}d^{*}=11,10,01$ if $d_{1}d_{0}=10,00,11$ , respectively. Then, $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=H_{d_{1}d_{0}}^{(zz^{\prime})}/e_{d_{1}d_{0}}$ , where $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}=\mathbb{E}[p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})]$ . The following lemma gives the EIFs of $H_{d_{1}d_{0}}^{(zz^{\prime})}$ and $e_{d_{1}d_{0}}$ separately.

Lemma S9

For any $z,z^{\prime}\in\{0,1\}$ , $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ under standard monotonicity, and $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under strong monotonicity, the EIF of $H_{d_{1}d_{0}}^{(zz^{\prime})}$ over $\mathcal{M}_{np}$ is $\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})=\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})-H_{d_{1}d_{0}}^{(zz^{\prime})}$ , where

	$\displaystyle\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})=$	$\displaystyle\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\eta_{zz^{\prime}}({\bm{X}})$
		$\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}$
		$\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}$
		$\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\eta_{zz^{\prime}}({\bm{X}}),$

and $d_{z}$ , $d_{z^{\prime}}$ , and $\eta_{zz^{\prime}}({\bm{X}})$ are defined in Theorem 1. The EIF of $e_{d_{1}d_{0}}$ over $\mathcal{M}_{np}$ is $\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})=\delta_{d_{1}d_{0}}(\bm{O})-e_{d_{1}d_{0}}$ , where

\delta_{d_{1}d_{0}}(\bm{O})=\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}+p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}).

Proof.

To simplify notation, we abbreviate $f_{Y|Z,D,M,{\bm{X}}}(y|z,d,m,{\bm{x}})$ , $f_{M|Z,D,{\bm{X}}}(m|z,d,{\bm{x}})$ , $f_{D|Z,{\bm{X}}}(d|z,{\bm{x}})$ , $f_{Z|{\bm{X}}}(z|{\bm{x}})$ , and $f_{{\bm{X}}}({\bm{x}})$ as $f(y|z,d,m,{\bm{x}})$ , $f(m|z,d,{\bm{x}})$ , $f(d|z,{\bm{x}})$ , $f(z|{\bm{x}})$ , and $f({\bm{x}})$ , respectively. We let $f_{\bm{O}}(\bm{o})$ be the joint density of the observed data $\bm{O}$ , which is abbreviated as $f(\bm{o})$ hereafter. Notice that $f(\bm{o})$ can be factorized as

f(\bm{o})=f(y|z,d,m,{\bm{x}})f(m|z,d,{\bm{x}})f(d|z,{\bm{x}})f(z|{\bm{x}})f({\bm{x}}).

We consider a parametric submodel $f_{t}(\bm{o})$ for $f(\bm{o})$ , which depends on a one-dimensional parameter $t$ . We assume that $f_{t}(\bm{o})$ contains the true model $f(\bm{o})$ at $t=0$ ; i.e., $f_{t=0}(\bm{o})=f(\bm{o})$ . Let $S_{t}(y,m,d,z,{\bm{x}})=S_{t}(\bm{o})$ be the score function of this parametric submodel, which is defined as

S_{t}(\bm{o})=\triangledown_{t}\log f_{t}(\bm{o})

where $\triangledown_{t}\log f_{t}(\cdot)=\frac{\partial\log f_{t}(\cdot)}{\partial t}$ . We can decompose the score function as a summation of the following 5 parts:

\displaystyle S_{t}(\bm{o})=S_{t}(y|z,d,m,{\bm{x}})+S_{t}(m|z,d,{\bm{x}})+S_{t}(d|z,{\bm{x}})+S_{t}(z|{\bm{x}})+S_{t}({\bm{x}}),

where $S_{t}(y|z,d,m,{\bm{x}})=\triangledown_{t}\log f_{t}(y|z,d,m,{\bm{x}})$ , and $S_{t}(m|z,d,{\bm{x}})$ , $S_{t}(d|z,{\bm{x}})$ , $S_{t}(z|{\bm{x}})$ , and $S_{t}({\bm{x}})$ are similarly defined. According to the semiparametric efficiency theory (Bickel et al., 1993), the EIF of $H_{d_{1}d_{0}}^{(zz^{\prime})}$ , denoted by $\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})$ , must satisfy the following equation:

\mathbb{E}\left[\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})S_{t=0}(\bm{O})\right]=\triangledown_{t=0}H_{d_{1}d_{0}}^{(zz^{\prime})}(t),

where

H_{d_{1}d_{0}}^{(zz^{\prime})}(t)\!=\!\iiint_{{\bm{x}},m,y}\!\!\!\!\!\left\{f_{t}(d^{*}|z^{*},{\bm{x}})\!-\!kf_{t}(D\!=\!1|Z\!=\!0,{\bm{x}})\right\}yf_{t}(y|z,d_{z},m,{\bm{x}})f_{t}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f_{t}({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}

is $H_{d_{1}d_{0}}^{(zz^{\prime})}$ evaluated under the parametric submodel $f_{t}(\bm{o})$ .
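
The pathwise-derivative condition above can be checked by hand in a toy problem. The sketch below is not part of the proof: it replaces $H_{d_{1}d_{0}}^{(zz^{\prime})}$ with the simplest functional, a mean $\alpha=\mathbb{E}[X]$ of a discrete variable, whose EIF is known to be $X-\alpha$ . The support, probabilities, and perturbation direction `h` are hypothetical choices made only so that the arithmetic is exact.

```python
from fractions import Fraction as F

# Hypothetical toy law of X on {0, 1, 2}; exact rational arithmetic.
support = [0, 1, 2]
prob = [F(1, 2), F(3, 10), F(1, 5)]

# Perturbation direction h with E[h(X)] = 0, so f_t(x) = f(x) * (1 + t * h(x))
# is a valid one-dimensional submodel whose score at t = 0 is h(x).
h = [F(-1), F(-1), F(4)]
assert sum(p * hx for p, hx in zip(prob, h)) == 0


def mean_under(t):
    """alpha(t) = E_t[X] under the submodel f_t."""
    return sum(x * p * (1 + t * hx) for x, p, hx in zip(support, prob, h))


alpha = mean_under(F(0))

# Left-hand side: E[D(X) * S_{t=0}(X)] with candidate EIF D(x) = x - alpha.
eif_times_score = sum((x - alpha) * hx * p for x, p, hx in zip(support, prob, h))

# Right-hand side: derivative of alpha(t) at t = 0. alpha(t) is linear in t,
# so a single difference quotient is exact.
t = F(1, 100)
pathwise_derivative = (mean_under(t) - alpha) / t

print(eif_times_score, pathwise_derivative)
```

Both sides agree, which is the same identity the proof verifies term by term for $H_{d_{1}d_{0}}^{(zz^{\prime})}$ , with the four conditional scores playing the role of `h`.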

Below we derive $\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})$ by solving $\triangledown_{t=0}H_{d_{1}d_{0}}^{(zz^{\prime})}(t)$ directly. Specifically, we can show

	$\displaystyle\triangledown_{t=0}H_{d_{1}d_{0}}^{(zz^{\prime})}(t)$
$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\!\!\left\{\triangledown_{t=0}f_{t}(d^{*}|z^{*},{\bm{x}})\!-\!k\triangledown_{t=0}f_{t}(D\!=\!1|Z\!=\!0,{\bm{x}})\right\}\iint_{m,y}yf(y|z,d_{z},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$	(s16)
	$\displaystyle+\int_{{\bm{x}}}\!\!\left\{f(d^{*}|z^{*},{\bm{x}})\!-\!kf(D\!=\!1|Z\!=\!0,{\bm{x}})\right\}\iint_{m,y}y\triangledown_{t=0}f_{t}(y|z,d_{z},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$	(s17)
	$\displaystyle+\int_{{\bm{x}}}\!\!\left\{f(d^{*}|z^{*},{\bm{x}})\!-\!kf(D\!=\!1|Z\!=\!0,{\bm{x}})\right\}\iint_{m,y}yf(y|z,d_{z},m,{\bm{x}})\triangledown_{t=0}f_{t}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$	(s18)
	$\displaystyle+\int_{{\bm{x}}}\!\!\left\{f(d^{*}|z^{*},{\bm{x}})\!-\!kf(D\!=\!1|Z\!=\!0,{\bm{x}})\right\}\iint_{m,y}yf(y|z,d_{z},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\triangledown_{t=0}f_{t}({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}},$	(s19)

where

		$\displaystyle\text{(s16)}$
	$\displaystyle=$	$\displaystyle\iint_{{\bm{x}},m}\left\{\triangledown_{t=0}f_{t}(d^{*}|z^{*},{\bm{x}})\!-\!k\triangledown_{t=0}f_{t}(D\!=\!1|Z\!=\!0,{\bm{x}})\right\}\mathbb{E}[Y|z,d_{z},m,{\bm{x}}]f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\text{d}mf({\bm{x}})\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\triangledown_{t=0}f_{t}(d^{*}|z^{*},{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})f({\bm{x}})\text{d}{\bm{x}}-k\int_{{\bm{x}}}\triangledown_{t=0}f_{t}(D\!=\!1|Z\!=\!0,{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})f({\bm{x}})\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}f(d^{*}|z^{*},{\bm{x}})\left\{\mathbb{E}_{Y,M|Z,D,{\bm{X}}}[S_{t=0}(Y,M,d^{*},z^{*},{\bm{x}})|z^{*},d^{*},{\bm{x}}]-\mathbb{E}_{Y,M,D|Z,{\bm{X}}}[S_{t=0}(Y,M,D,z^{*},{\bm{x}})|z^{*},{\bm{x}}]\right\}\eta_{zz^{\prime}}({\bm{x}})f({\bm{x}})\text{d}{\bm{x}}$
		$\displaystyle-k\int_{{\bm{x}}}f(D=1|Z=0,{\bm{x}})\left\{\mathbb{E}_{Y,M|Z,D,{\bm{X}}}[S_{t=0}(Y,M,1,0,{\bm{x}})|0,1,{\bm{x}}]-\mathbb{E}_{Y,M,D|Z,{\bm{X}}}[S_{t=0}(Y,M,D,0,{\bm{x}})|0,{\bm{x}}]\right\}\eta_{zz^{\prime}}({\bm{x}})f({\bm{x}})\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\iiint_{{\bm{x}},m,y}f(d^{*}|z^{*},{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,d^{*},z^{*},{\bm{x}})f(y|z^{*},d^{*},m,{\bm{x}})f(m|z^{*},d^{*},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
		$\displaystyle-\iiiint_{{\bm{x}},d,m,y}f(d^{*}|z^{*},{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,d,z^{*},{\bm{x}})f(y|z^{*},d,m,{\bm{x}})f(m|z^{*},d,{\bm{x}})f(d|z^{*},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}d\text{d}{\bm{x}}$
		$\displaystyle-k\iiint_{{\bm{x}},m,y}f(D=1|Z=0,{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,1,0,{\bm{x}})f(y|0,1,m,{\bm{x}})f(m|0,1,{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
		$\displaystyle+k\iiiint_{{\bm{x}},d,m,y}f(D=1|Z=0,{\bm{x}})\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,d,0,{\bm{x}})f(y|0,d,m,{\bm{x}})f(m|0,d,{\bm{x}})f(d|0,{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}d\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(D=d^{*},Z=z^{*})}{p_{z^{*}d^{*}}({\bm{X}})\pi_{z^{*}}({\bm{X}})}p_{z^{*}d^{*}}({\bm{X}})\eta_{zz^{\prime}}({\bm{X}})S_{t=0}(\bm{O})\right]-\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}p_{z^{*}d^{*}}({\bm{X}})\eta_{zz^{\prime}}({\bm{X}})S_{t=0}(\bm{O})\right]$
		$\displaystyle-k\mathbb{E}\left[\frac{\mathbb{I}(D=1,Z=0)}{p_{01}({\bm{X}})\pi_{0}({\bm{X}})}p_{01}({\bm{X}})\eta_{zz^{\prime}}({\bm{X}})S_{t=0}(\bm{O})\right]+k\mathbb{E}\left[\frac{\mathbb{I}(Z=0)}{\pi_{0}({\bm{X}})}p_{01}({\bm{X}})\eta_{zz^{\prime}}({\bm{X}})S_{t=0}(\bm{O})\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\eta_{zz^{\prime}}({\bm{X}})S_{t=0}({\bm{O}})\right]$

and

	$\displaystyle\text{(s17)}=$	$\displaystyle\int_{{\bm{x}}}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\iint_{m,y}y\triangledown_{t=0}f_{t}(y|z,d_{z},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\iiint_{{\bm{x}},m,y}\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}yf(y|z,d_{z},m,{\bm{x}})\left\{S_{t=0}(y,m,d_{z},z,{\bm{x}})-\mathbb{E}_{Y|M,D,Z,{\bm{X}}}[S_{t=0}(Y,m,d_{z},z,{\bm{x}})|m,d_{z},z,{\bm{x}}]\right\}$
		$\displaystyle\quad f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\iiint_{{\bm{x}},m,y}\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}yf(y|z,d_{z},m,{\bm{x}})S_{t=0}(y,m,d_{z},z,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
		$\displaystyle-\iiint_{{\bm{x}},m,y}\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},m,{\bm{x}}]S_{t=0}(y,m,d_{z},z,{\bm{x}})f(y|z,d_{z},m,{\bm{x}})$
		$\displaystyle\quad\quad f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\iiint_{{\bm{x}},m,y}\!\!\!\!\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}y\frac{f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})}{f(m|z,d_{z},{\bm{x}})}S_{t=0}(y,m,d_{z},z,{\bm{x}})f(y|z,d_{z},m,{\bm{x}})f(m|z,d_{z},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
		$\displaystyle-\iiint_{{\bm{x}},m,y}\!\!\!\!\left\{p_{z^{*}d^{*}}({\bm{x}})\!-\!kp_{01}({\bm{x}})\right\}\mu_{zd_{z}}(m,{\bm{x}})\frac{f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})}{f(m|z,d_{z},{\bm{x}})}S_{t=0}(y,m,d_{z},z,{\bm{x}})f(y|z,d_{z},m,{\bm{x}})$
		$\displaystyle\quad\quad f(m|z,d_{z},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}S_{t=0}({\bm{O}})\right]$

and

	$\displaystyle\text{(s18)}=$	$\displaystyle\iint_{{\bm{x}},m}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\mu_{zd_{z}}(m,{\bm{x}})\triangledown_{t=0}f_{t}(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}m\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\iint_{{\bm{x}},m}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\mu_{zd_{z}}(m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})\Big{\{}\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[S_{t=0}(Y,m,d_{z^{\prime}},z^{\prime},{\bm{x}})|z^{\prime},d_{z^{\prime}},m,{\bm{x}}]$
		$\displaystyle\quad\quad-\mathbb{E}_{Y,M|Z,D,{\bm{X}}}[S_{t=0}(Y,M,d_{z^{\prime}},z^{\prime},{\bm{x}})|z^{\prime},d_{z^{\prime}},{\bm{x}}]\Big{\}}f({\bm{x}})\text{d}m\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\iiint_{{\bm{x}},m,y}\!\!\!\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\mu_{zd_{z}}(m,{\bm{x}})S_{t=0}(y,m,d_{z^{\prime}},z^{\prime},{\bm{x}})f(y|z^{\prime},d_{z^{\prime}},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
		$\displaystyle-\iiint_{{\bm{x}},m,y}\!\!\!\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\eta_{zz^{\prime}}({\bm{x}})S_{t=0}(y,m,d_{z^{\prime}},z^{\prime},{\bm{x}})f(y|z^{\prime},d_{z^{\prime}},m,{\bm{x}})f(m|z^{\prime},d_{z^{\prime}},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}S_{t=0}({\bm{O}})\right]$

and

	$\displaystyle\text{(s19)}=$	$\displaystyle\int_{{\bm{x}}}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\eta_{zz^{\prime}}({\bm{x}})\triangledown_{t=0}f_{t}({\bm{x}})\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\eta_{zz^{\prime}}({\bm{x}})f({\bm{x}})\left\{\mathbb{E}_{Y,M,D,Z|{\bm{X}}}[S_{t=0}(Y,M,D,Z,{\bm{x}})|{\bm{x}}]-\mathbb{E}[S_{t=0}(Y,M,D,Z,{\bm{X}})]\right\}\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\eta_{zz^{\prime}}({\bm{X}})S_{t=0}({\bm{O}})\right]-H_{d_{1}d_{0}}^{(zz^{\prime})}\mathbb{E}\left[S_{t=0}({\bm{O}})\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left(\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\eta_{zz^{\prime}}({\bm{X}})-H_{d_{1}d_{0}}^{(zz^{\prime})}\right)S_{t=0}({\bm{O}})\right].$

Therefore,

	$\displaystyle\triangledown_{t=0}H_{d_{1}d_{0}}^{(zz^{\prime})}(t)$	$\displaystyle=\text{(s16)}+\text{(s17)}+\text{(s18)}+\text{(s19)}$
		$\displaystyle=\mathbb{E}\left[\left\{\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})-H_{d_{1}d_{0}}^{(zz^{\prime})}\right\}S_{t=0}(\bm{O})\right].$

Now we conclude that the EIF of $H_{d_{1}d_{0}}^{(zz^{\prime})}$ is $\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime}),H}(\bm{O})=\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})-H_{d_{1}d_{0}}^{(zz^{\prime})}$ .

Next, we derive the EIF of $e_{d_{1}d_{0}}$ , denoted by $\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})$ . By the semiparametric efficiency theory (Bickel et al., 1993), $\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})$ must satisfy the following equation

\mathbb{E}\left[\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})S_{t=0}(\bm{O})\right]=\triangledown_{t=0}e_{d_{1}d_{0}}(t),

where $e_{d_{1}d_{0}}(t)=\int_{{\bm{x}}}\left\{f_{t}(d^{*}|z^{*},{\bm{x}})-kf_{t}(D=1|Z=0,{\bm{x}})\right\}f_{t}({\bm{x}})\text{d}{\bm{x}}$ is $e_{d_{1}d_{0}}$ evaluated under the parametric submodel. We can show

	$\displaystyle\triangledown_{t=0}e_{d_{1}d_{0}}(t)=$	$\displaystyle\int_{{\bm{x}}}\left\{\triangledown_{t=0}f_{t}(d^{*}|z^{*},{\bm{x}})-k\triangledown_{t=0}f_{t}(D=1|Z=0,{\bm{x}})\right\}f({\bm{x}})\text{d}{\bm{x}}$		(s20)
		$\displaystyle+\int_{{\bm{x}}}\left\{f(d^{*}|z^{*},{\bm{x}})-kf(D=1|Z=0,{\bm{x}})\right\}\triangledown_{t=0}f_{t}({\bm{x}})\text{d}{\bm{x}},$		(s21)

where

	$\displaystyle\!\!\!\text{(s20)}=$	$\displaystyle\int_{{\bm{x}}}\triangledown_{t=0}f_{t}(d^{*}|z^{*},{\bm{x}})f({\bm{x}})\text{d}{\bm{x}}-k\int_{{\bm{x}}}\triangledown_{t=0}f_{t}(D\!=\!1|Z\!=\!0,{\bm{x}})f({\bm{x}})\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}f(d^{*}|z^{*},{\bm{x}})\left\{\mathbb{E}_{Y,M|Z,D,{\bm{X}}}[S_{t=0}(Y,M,d^{*},z^{*},{\bm{x}})|z^{*},d^{*},{\bm{x}}]-\mathbb{E}_{Y,M,D|Z,{\bm{X}}}[S_{t=0}(Y,M,D,z^{*},{\bm{x}})|z^{*},{\bm{x}}]\right\}f({\bm{x}})\text{d}{\bm{x}}$
		$\displaystyle-k\int_{{\bm{x}}}f(D=1|Z=0,{\bm{x}})\left\{\mathbb{E}_{Y,M|Z,D,{\bm{X}}}[S_{t=0}(Y,M,1,0,{\bm{x}})|0,1,{\bm{x}}]-\mathbb{E}_{Y,M,D|Z,{\bm{X}}}[S_{t=0}(Y,M,D,0,{\bm{x}})|0,{\bm{x}}]\right\}f({\bm{x}})\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\iiint_{{\bm{x}},m,y}f(d^{*}|z^{*},{\bm{x}})S_{t=0}(y,m,d^{*},z^{*},{\bm{x}})f(y|z^{*},d^{*},m,{\bm{x}})f(m|z^{*},d^{*},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
		$\displaystyle-\iiiint_{{\bm{x}},d,m,y}f(d^{*}|z^{*},{\bm{x}})S_{t=0}(y,m,d,z^{*},{\bm{x}})f(y|z^{*},d,m,{\bm{x}})f(m|z^{*},d,{\bm{x}})f(d|z^{*},{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}d\text{d}{\bm{x}}$
		$\displaystyle-k\iiint_{{\bm{x}},m,y}f(D=1|Z=0,{\bm{x}})S_{t=0}(y,m,1,0,{\bm{x}})f(y|0,1,m,{\bm{x}})f(m|0,1,{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}{\bm{x}}$
		$\displaystyle+k\iiiint_{{\bm{x}},d,m,y}f(D=1|Z=0,{\bm{x}})S_{t=0}(y,m,d,0,{\bm{x}})f(y|0,d,m,{\bm{x}})f(m|0,d,{\bm{x}})f(d|0,{\bm{x}})f({\bm{x}})\text{d}y\text{d}m\text{d}d\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(D=d^{*},Z=z^{*})}{p_{z^{*}d^{*}}({\bm{X}})\pi_{z^{*}}({\bm{X}})}p_{z^{*}d^{*}}({\bm{X}})S_{t=0}(\bm{O})\right]-\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}p_{z^{*}d^{*}}({\bm{X}})S_{t=0}(\bm{O})\right]$
		$\displaystyle-k\mathbb{E}\left[\frac{\mathbb{I}(D=1,Z=0)}{p_{01}({\bm{X}})\pi_{0}({\bm{X}})}p_{01}({\bm{X}})S_{t=0}(\bm{O})\right]+k\mathbb{E}\left[\frac{\mathbb{I}(Z=0)}{\pi_{0}({\bm{X}})}p_{01}({\bm{X}})S_{t=0}(\bm{O})\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)S_{t=0}({\bm{O}})\right]$

and

	$\displaystyle\text{(s21)}=$	$\displaystyle\int_{{\bm{x}}}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}\triangledown_{t=0}f_{t}({\bm{x}})\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\left\{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})\right\}f({\bm{x}})\left\{\mathbb{E}_{Y,M,D,Z|{\bm{X}}}[S_{t=0}(Y,M,D,Z,{\bm{x}})|{\bm{x}}]-\mathbb{E}[S_{t=0}(Y,M,D,Z,{\bm{X}})]\right\}\text{d}{\bm{x}}$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}S_{t=0}({\bm{O}})\right]-e_{d_{1}d_{0}}\mathbb{E}\left[S_{t=0}({\bm{O}})\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left(\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}-e_{d_{1}d_{0}}\right)S_{t=0}({\bm{O}})\right].$

Hence, we have that

\displaystyle\triangledown_{t=0}e_{d_{1}d_{0}}(t)=\text{(s20)}+\text{(s21)}=\mathbb{E}\left[\left\{\delta_{d_{1}d_{0}}({\bm{O}})-e_{d_{1}d_{0}}\right\}S_{t=0}(\bm{O})\right].

Therefore, the EIF of $e_{d_{1}d_{0}}$ is $\mathcal{D}_{d_{1}d_{0}}^{e}(\bm{O})=\delta_{d_{1}d_{0}}({\bm{O}})-e_{d_{1}d_{0}}$ . $\square$
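
As an informal numerical companion to the EIF of $e_{d_{1}d_{0}}$ , the following sketch simulates compliers ( $d_{1}d_{0}=10$ , so $z^{*}d^{*}=11$ and $k=1$ ) and averages $\delta_{10}(\bm{O})$ under correct and deliberately wrong nuisance functions. The data-generating process, sample size, and the misspecified constants are hypothetical choices for illustration only; they are not taken from the paper's simulations.

```python
import random

random.seed(2024)


def pi1(x):   # true P(Z=1 | X=x)
    return 0.3 + 0.4 * x


def p11(x):   # true P(D=1 | Z=1, X=x): always-takers plus compliers
    return 0.5 + 0.2 * x


def p01(x):   # true P(D=1 | Z=0, X=x): always-takers only
    return 0.2


def simulate(n):
    """Draw (X, Z, D) under monotonicity with a binary covariate X."""
    data = []
    for _ in range(n):
        x = 1 if random.random() < 0.5 else 0
        z = 1 if random.random() < pi1(x) else 0
        u = random.random()
        if u < p01(x):
            d = 1          # always-taker
        elif u < p11(x):
            d = z          # complier
        else:
            d = 0          # never-taker
        data.append((x, z, d))
    return data


def delta_mean(data, pi1_hat, p11_hat, p01_hat):
    """Sample mean of delta_10(O) evaluated at the supplied nuisance functions."""
    total = 0.0
    for x, z, d in data:
        aug1 = z * (d - p11_hat(x)) / pi1_hat(x)
        aug0 = (1 - z) * (d - p01_hat(x)) / (1.0 - pi1_hat(x))
        total += aug1 - aug0 + p11_hat(x) - p01_hat(x)
    return total / len(data)


data = simulate(40000)
truth = 0.4  # E[p11(X) - p01(X)] = 0.3 + 0.2 * 0.5

est_all_correct = delta_mean(data, pi1, p11, p01)
est_wrong_pi = delta_mean(data, lambda x: 0.5, p11, p01)
est_wrong_p = delta_mean(data, pi1, lambda x: 0.6, lambda x: 0.1)
# With both nuisances wrong the population limit is 0.44 in this design, not 0.4.
est_both_wrong = delta_mean(data, lambda x: 0.5, lambda x: 0.6, lambda x: 0.1)

print(est_all_correct, est_wrong_pi, est_wrong_p, est_both_wrong)
```

In this design the first three averages should sit close to 0.4, which mirrors the double robustness of $p_{zd}^{\text{dr}}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ , while the last one drifts because neither working model is correct.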

Lemma S10

Assume that $\alpha_{1}$ and $\alpha_{2}$ are two causal estimands and their EIFs based on the nonparametric model $\mathcal{M}_{np}$ in the observed data $\bm{O}$ are $\mathcal{D}_{1}(\bm{O})$ and $\mathcal{D}_{2}(\bm{O})$ , respectively. Then, the EIF of $\alpha_{3}=\alpha_{1}+\alpha_{2}$ is

\mathcal{D}_{3}(\bm{O})=\mathcal{D}_{1}(\bm{O})+\mathcal{D}_{2}(\bm{O}).

Moreover, if $\alpha_{2}\neq 0$ , the EIF of $\alpha_{4}=\alpha_{1}/\alpha_{2}$ is

\mathcal{D}_{4}(\bm{O})=\frac{1}{\alpha_{2}}\left\{\mathcal{D}_{1}(\bm{O})-\alpha_{4}\mathcal{D}_{2}(\bm{O})\right\}.

Proof.

We shall follow the notations in the proof of Lemma S9. Define $\alpha_{1}(t)$ and $\alpha_{2}(t)$ as the nonparametric identification formulas of the causal estimands $\alpha_{1}$ and $\alpha_{2}$ , which are evaluated under the parametric submodel $f_{t}(\bm{o})$ . By the semiparametric efficiency theory, we have that

\mathbb{E}\left[\mathcal{D}_{1}(\bm{O})S_{t=0}(\bm{O})\right]=\triangledown_{t=0}\alpha_{1}(t)\text{ and }\mathbb{E}\left[\mathcal{D}_{2}(\bm{O})S_{t=0}(\bm{O})\right]=\triangledown_{t=0}\alpha_{2}(t).

Also, since $\alpha_{3}=\alpha_{1}+\alpha_{2}$ , a valid nonparametric identification formula of $\alpha_{3}$ under the parametric submodel is $\alpha_{3}(t)=\alpha_{1}(t)+\alpha_{2}(t)$ . Then, noting

	$\displaystyle\triangledown_{t=0}\alpha_{3}(t)$	$\displaystyle=\triangledown_{t=0}\alpha_{1}(t)+\triangledown_{t=0}\alpha_{2}(t)$
		$\displaystyle=\mathbb{E}\left[\left\{\mathcal{D}_{1}(\bm{O})+\mathcal{D}_{2}(\bm{O})\right\}S_{t=0}(\bm{O})\right]$
		$\displaystyle=\mathbb{E}\left[\mathcal{D}_{3}(\bm{O})S_{t=0}(\bm{O})\right],$

we conclude that $\mathcal{D}_{3}(\bm{O})$ is the EIF of $\alpha_{3}$ . Similarly, because $\alpha_{4}=\alpha_{1}/\alpha_{2}$ , a valid nonparametric identification formula of $\alpha_{4}$ under the parametric submodel is $\alpha_{4}(t)=\alpha_{1}(t)/\alpha_{2}(t)$ . Then, we have that

	$\displaystyle\triangledown_{t=0}\alpha_{4}(t)$	$\displaystyle=\triangledown_{t=0}\left\{\frac{\alpha_{1}(t)}{\alpha_{2}(t)}\right\}=\frac{\alpha_{2}\triangledown_{t=0}\alpha_{1}(t)-\alpha_{1}\triangledown_{t=0}\alpha_{2}(t)}{\alpha_{2}^{2}}$
		$\displaystyle=\frac{1}{\alpha_{2}}\mathbb{E}\left[\mathcal{D}_{1}(\bm{O})S_{t=0}(\bm{O})\right]-\frac{\alpha_{1}}{\alpha_{2}^{2}}\mathbb{E}\left[\mathcal{D}_{2}(\bm{O})S_{t=0}(\bm{O})\right]$
		$\displaystyle=\mathbb{E}\left[\frac{\mathcal{D}_{1}(\bm{O})-\alpha_{4}\mathcal{D}_{2}(\bm{O})}{\alpha_{2}}S_{t=0}(\bm{O})\right],$

thus $\mathcal{D}_{4}(\bm{O})$ is the EIF of $\alpha_{4}$ . $\square$
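
The ratio rule in Lemma S10 can be illustrated with the simplest pair of estimands, two means $\alpha_{1}=\mathbb{E}[A]$ and $\alpha_{2}=\mathbb{E}[B]$ , whose EIFs are $A-\alpha_{1}$ and $B-\alpha_{2}$ . The sketch below is a hedged illustration with a hypothetical data-generating process: it forms $\mathcal{D}_{4}$ at the plug-in estimates, confirms that its sample mean vanishes, and uses its sample variance for a standard error.

```python
import math
import random

random.seed(7)

n = 20000
a, b = [], []
for _ in range(n):
    x = random.random()                            # shared covariate, Uniform(0, 1)
    a.append(2.0 + x + random.gauss(0.0, 1.0))       # E[A] = 2.5
    b.append(1.0 + 0.5 * x + random.gauss(0.0, 0.2))  # E[B] = 1.25

alpha1_hat = sum(a) / n
alpha2_hat = sum(b) / n
alpha4_hat = alpha1_hat / alpha2_hat                # plug-in ratio, truth is 2.0

# EIFs of the two means, evaluated at the plug-in estimates.
d1 = [ai - alpha1_hat for ai in a]
d2 = [bi - alpha2_hat for bi in b]

# Lemma S10: D4 = (D1 - alpha4 * D2) / alpha2.
d4 = [(u - alpha4_hat * v) / alpha2_hat for u, v in zip(d1, d2)]

mean_d4 = sum(d4) / n                               # zero up to rounding error
se = math.sqrt(sum(v * v for v in d4) / n) / math.sqrt(n)

print(alpha4_hat, mean_d4, se)
```

The same construction, with $\psi_{d_{1}d_{0}}^{(zz^{\prime})}$ and $\delta_{d_{1}d_{0}}$ in place of `a` and `b`, is what turns the two EIFs of Lemma S9 into the EIF of the ratio $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ .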

Proof of Theorem 3. Notice that $\theta_{d_{1}d_{0}}^{(zz^{\prime})}=\frac{\mathbb{E}\left[(p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}))\eta_{zz^{\prime}}({\bm{X}})\right]}{p_{z^{*}d^{*}}-kp_{01}}$ is a ratio parameter, and the EIFs of its numerator and denominator, $H_{d_{1}d_{0}}^{(zz^{\prime})}=\mathbb{E}\left[(p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}))\eta_{zz^{\prime}}({\bm{X}})\right]$ and $e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ , have been derived in Lemma S9. Therefore, one can verify that the EIF shown in Theorem 3 holds by applying the ratio rule of Lemma S10 with $\alpha_{1}=H_{d_{1}d_{0}}^{(zz^{\prime})}$ and $\alpha_{2}=e_{d_{1}d_{0}}$ . $\square$
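
To see the pieces of Lemma S9 and Lemma S10 working together, the following sketch evaluates $\psi_{10}^{(10)}(\bm{O})$ and $\delta_{10}(\bm{O})$ at the true nuisance functions in a small simulation and forms the ratio of their sample means. Several ingredients are assumptions made for illustration and are not taken from the paper: the data-generating process, a binary mediator (so the integral over $m$ becomes a sum), and the reading that $d_{z}=z$ for compliers (the formal definition of $d_{z}$ is in Theorem 1). The mediator and outcome depend on the stratum only through the observed $D$ , so that the identification formula $\theta=H/e$ applies by construction.

```python
import random

random.seed(11)


def pi1(x): return 0.4 + 0.2 * x        # P(Z=1 | X)
def p01(x): return 0.2                  # P(D=1 | Z=0, X)
def p11(x): return 0.5 + 0.2 * x        # P(D=1 | Z=1, X)


def r(m, z, d, x):
    """P(M=m | Z=z, D=d, X=x) for a binary mediator."""
    p = 0.2 + 0.3 * z + 0.2 * d + 0.1 * x
    return p if m == 1 else 1.0 - p


def mu(m, z, d, x):
    """E[Y | Z=z, D=d, M=m, X=x]."""
    return 1.0 + z + 0.5 * d + 2.0 * m + x


def eta(x):
    """eta_{10}(x) = sum_m mu_{11}(m, x) * r_{00}(m, x), assuming d_1=1, d_0=0."""
    return sum(mu(m, 1, 1, x) * r(m, 0, 0, x) for m in (0, 1))


def draw():
    x = int(random.random() < 0.5)
    z = int(random.random() < pi1(x))
    u = random.random()
    d = 1 if u < p01(x) else (z if u < p11(x) else 0)   # monotone strata
    m = int(random.random() < r(1, z, d, x))
    y = mu(m, z, d, x) + random.gauss(0.0, 1.0)
    return x, z, d, m, y


def psi_delta(x, z, d, m, y):
    """Return (psi, delta) for compliers with z=1, z'=0 at the true nuisances."""
    w = p11(x) - p01(x)
    aug = z * (d - p11(x)) / pi1(x) - (1 - z) * (d - p01(x)) / (1 - pi1(x))
    t2 = 0.0
    if z == 1 and d == 1:   # outcome-residual term, reweighted by r_00 / r_11
        t2 = w / (p11(x) * pi1(x)) * r(m, 0, 0, x) / r(m, 1, 1, x) * (y - mu(m, 1, 1, x))
    t3 = 0.0
    if z == 0 and d == 0:   # mediator term; P(D=0 | Z=0, X) = 1 - p01(X)
        t3 = w / ((1 - p01(x)) * (1 - pi1(x))) * (mu(m, 1, 1, x) - eta(x))
    return aug * eta(x) + t2 + t3 + w * eta(x), aug + w


n = 40000
pairs = [psi_delta(*draw()) for _ in range(n)]
theta_hat = sum(p for p, _ in pairs) / sum(dl for _, dl in pairs)

# Analytic target in this design: H = 0.5 * 0.3 * 2.9 + 0.5 * 0.5 * 4.1 = 1.46, e = 0.4.
theta_true = 1.46 / 0.4

print(theta_hat, theta_true)
```

Here the nuisance functions are treated as known; in practice they are replaced by fitted working models or machine learners, which is where the robustness and rate conditions discussed in this appendix become relevant.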

D.5 The multiply robust estimator (Theorem 4)

Proof of Theorem 4. Let $\bm{\tau}$ be all of the parameters in the parametric working models of $h_{nuisance}^{\text{par}}$ , and let $\bm{\tau}^{*}$ be the probability limit of $\widehat{\bm{\tau}}$ , where some components of $\bm{\tau}^{*}$ may differ from their true values due to model misspecification. Let $\widetilde{h}_{nuisance}=\{\widetilde{\pi}_{z}({\bm{x}}),\widetilde{p}_{zd}({\bm{x}}),\widetilde{r}_{zd}(m,{\bm{x}}),\widetilde{\mu}_{zd}(m,{\bm{x}})\}$ be the value of $h_{nuisance}^{\text{par}}$ when it is evaluated at $\bm{\tau}^{*}$ , which is the probability limit of $\widehat{h}_{nuisance}^{\text{par}}$ . Notice that $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$ , $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$ , $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$ , $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ , under $\mathcal{M}_{\pi}$ , $\mathcal{M}_{e}$ , $\mathcal{M}_{m}$ , and $\mathcal{M}_{o}$ , respectively, but the equalities generally do not hold when the corresponding working model is misspecified. Let $\widetilde{p}_{zd}=\mathbb{E}\left[\frac{\mathbb{I}(Z=z)(\mathbb{I}(D=d)-\widetilde{p}_{zd}({\bm{X}}))}{\widetilde{\pi}_{z}({\bm{X}})}+\widetilde{p}_{zd}({\bm{X}})\right]$ be the probability limit of $p_{zd}^{\text{dr}}$ . According to Jiang et al. (2022), $\widetilde{p}_{zd}=p_{zd}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ . Because each of the four scenarios below includes at least one of $\mathcal{M}_{\pi}$ and $\mathcal{M}_{e}$ , $\widetilde{p}_{zd}=p_{zd}$ also holds under any of $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$ , or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ . It follows that the probability limit of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$ is

	$\displaystyle\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=$	$\displaystyle\mathbb{E}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widetilde{p}_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widetilde{p}_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\widetilde{\eta}_{zz^{\prime}}({\bm{X}})}{\widetilde{p}_{z^{*}d^{*}}-k\widetilde{p}_{01}}$
		$\displaystyle+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{\widetilde{p}_{z^{*}d^{*}}-k\widetilde{p}_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widetilde{p}_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{\widetilde{r}_{zd_{z}}(M,{\bm{X}})}\left\{Y-\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right\}$
		$\displaystyle+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{\widetilde{p}_{z^{*}d^{*}}-k\widetilde{p}_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widetilde{p}_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})-\widetilde{\eta}_{zz^{\prime}}({\bm{X}})\right\}$
		$\displaystyle+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{\widetilde{p}_{z^{*}d^{*}}-k\widetilde{p}_{01}}\widetilde{\eta}_{zz^{\prime}}({\bm{X}})\Big{\}},$

where $\widetilde{\eta}_{zz^{\prime}}({\bm{X}})=\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m$ . In what follows, we show that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under Scenario I ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ ), II ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ), III ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$ ), or IV ( $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ), which collectively verify the quadruple robustness of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$ .

Scenario I ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ ):

In Scenario I, $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$ , $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$ , $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$ , but generally $\widetilde{\mu}_{zd}(m,{\bm{x}})\neq\mu_{zd}(m,{\bm{x}})$ . By the double robustness of $\widetilde{p}_{zd}$ , we also have $\widetilde{p}_{zd}=p_{zd}$ . Using these equalities, we can rewrite $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}$ , where

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}Y\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}-\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\right\}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{1-\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\right\}\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m\right].$

It is obvious that $\Delta_{2}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{a}}$ . Moreover,

	$\displaystyle\Delta_{1}=$	$\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right]$
		$\displaystyle-k\mathbb{E}\left[\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[\mathbb{I}(D=d^{*})|z^{*},{\bm{X}}]-p_{z^{*}d^{*}}({\bm{X}})\right\}}_{=0}\right]$
		$\displaystyle-k\mathbb{E}\left[\frac{(1-Z)}{\pi_{0}({\bm{X}})}\frac{\int_{m}\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[D|0,{\bm{X}}]-p_{01}({\bm{X}})\right\}}_{=0}\right]$
	$\displaystyle=$	$\displaystyle 0-k\times 0=0,$
	$\displaystyle\Delta_{3}=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right]$
		$\displaystyle-\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\mathbb{E}_{M|Z,D,{\bm{X}}}[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]\right]$
		$\displaystyle-\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\mathbb{E}_{M|Z,D,{\bm{X}}}\left[\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\Big{|}z,d_{z},{\bm{X}}\right]\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]\right]$
		$\displaystyle-\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}\left[\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\Big{|}z,d_{z},{\bm{X}}\right]\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]\right]$
		$\displaystyle-\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}\left[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\Big{|}z^{\prime},d_{z^{\prime}},{\bm{X}}\right]\right]$
	$\displaystyle=$	$\displaystyle 0,$
	$\displaystyle\Delta_{4}=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]\left\{1-\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\right\}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\mathbb{E}_{M|Z,D,{\bm{X}}}[\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]\left\{1-1\right\}\right]$
	$\displaystyle=$	$\displaystyle 0,$

which suggests that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime}),\text{a}}_{d_{1}d_{0}}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ .
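
The step for $\Delta_{3}$ above rests on a change-of-measure identity: for any function $g$ (including a misspecified $\widetilde{\mu}_{zd_{z}}$), weighting by $r_{z^{\prime}d_{z^{\prime}}}/r_{zd_{z}}$ under the mediator density $r_{zd_{z}}$ reproduces the expectation under $r_{z^{\prime}d_{z^{\prime}}}$. The following Python sketch checks this numerically in a toy setting. The Gaussian mediator densities, the function `g`, and all names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def normal_pdf(m, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at m."""
    return np.exp(-0.5 * ((m - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def density_ratio_check(n=200_000, mean_z=0.0, mean_zprime=0.5, seed=0):
    """Compare E[(r_z'/r_z) g(M)] under r_z with E[g(M)] under r_z'."""
    rng = np.random.default_rng(seed)

    def g(m):
        # Arbitrary stand-in for a (possibly misspecified) outcome model.
        return m ** 2 + m

    # Draw the mediator from r_z and reweight by the density ratio.
    m_z = rng.normal(mean_z, 1.0, n)
    weights = normal_pdf(m_z, mean_zprime) / normal_pdf(m_z, mean_z)
    weighted = np.mean(weights * g(m_z))

    # Draw the mediator directly from r_z'.
    m_zprime = rng.normal(mean_zprime, 1.0, n)
    direct = np.mean(g(m_zprime))
    return weighted, direct

weighted, direct = density_ratio_check()
print(weighted, direct)
```

In this toy setting both averages estimate $\mathbb{E}[M^{2}+M]$ under $N(0.5,1)$, which equals $1.75$, so the two printed values should agree up to Monte Carlo error.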

Scenario II ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ):

In Scenario II, $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$ , $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$ , $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ , but generally $\widetilde{p}_{zd}({\bm{x}})\neq p_{zd}({\bm{x}})$ . Observing this, we can rewrite $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}$ , where

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)D}{\pi_{0}({\bm{X}})}\right\}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widetilde{p}_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widetilde{p}_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[\left(\left\{1-\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}\right\}\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\left\{1-\frac{1-Z}{\pi_{0}({\bm{X}})}\right\}\widetilde{p}_{01}({\bm{X}})\right)\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right].$

One can verify $\Delta_{1}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{b}}$ ,

	$\displaystyle\Delta_{2}=$	$\displaystyle\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widetilde{p}_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widetilde{p}_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\underbrace{\left\{\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},M,{\bm{X}}]-\mu_{zd_{z}}(M,{\bm{X}})\right\}}_{=0}\right]$
	$\displaystyle=$	$\displaystyle 0,$
	$\displaystyle\Delta_{3}=$	$\displaystyle\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widetilde{p}_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mathbb{E}_{M|Z,D,{\bm{X}}}[\mu_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]-\eta_{zz^{\prime}}({\bm{X}})\right\}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widetilde{p}_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\eta_{zz^{\prime}}({\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}\right]$
	$\displaystyle=$	$\displaystyle 0,$
	$\displaystyle\Delta_{4}=$	$\displaystyle\mathbb{E}\left[\left(\left\{1-\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}\right\}\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\left\{1-\frac{1-Z}{\pi_{0}({\bm{X}})}\right\}\widetilde{p}_{01}({\bm{X}})\right)\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left(\left\{1-\frac{\mathbb{E}_{Z|{\bm{X}}}\left[\mathbb{I}(Z=z^{*})|{\bm{X}}\right]}{\pi_{z^{*}}({\bm{X}})}\right\}\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\left\{1-\frac{\mathbb{E}_{Z|{\bm{X}}}[1-Z|{\bm{X}}]}{\pi_{0}({\bm{X}})}\right\}\widetilde{p}_{01}({\bm{X}})\right)\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]$
	$\displaystyle=$	$\displaystyle 0.$

Therefore, we have obtained $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime}),\text{b}}_{d_{1}d_{0}}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .
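
The $\Delta_{4}$ step in Scenario II uses only the fact that $\mathbb{E}[\mathbb{I}(Z=z^{*})|{\bm{X}}]=\pi_{z^{*}}({\bm{X}})$, so the term vanishes for any working model $\widetilde{p}_{zd}({\bm{x}})$. A minimal Monte Carlo sketch of this cancellation follows; the logistic propensity score and the function `p_tilde` are hypothetical choices made only for illustration.

```python
import numpy as np

def ipw_residual_mean(n=200_000, seed=1):
    """Monte Carlo mean of {1 - I(Z=1)/pi_1(X)} * p_tilde(X)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)

    # True propensity score P(Z = 1 | X), assumed known (correctly specified).
    pi1 = 1.0 / (1.0 + np.exp(-0.8 * x))
    z = rng.binomial(1, pi1)

    # Arbitrary function of X standing in for a misspecified working model.
    p_tilde = 0.3 + 0.2 * np.tanh(x)

    return np.mean((1.0 - z / pi1) * p_tilde)

print(ipw_residual_mean())
```

The printed value should be close to zero regardless of how `p_tilde` is chosen, because the cancellation relies on the treatment probability alone.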

Scenario III ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$ ):

In Scenario III, $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$ , $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$ , $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ , but generally $\widetilde{r}_{zd}(m,{\bm{x}})\neq r_{zd}(m,{\bm{x}})$ . Observing this, we can rewrite $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}$ , where

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{\widetilde{r}_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\mu_{zd_{z}}(M,{\bm{X}})\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{1-\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\right\}\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m\right].$

Noting that $\Delta_{3}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{c}}$ ,

	$\displaystyle\Delta_{1}=$	$\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right]$
		$\displaystyle-k\mathbb{E}\left[\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[\mathbb{I}(D=d^{*})|z^{*},{\bm{X}}]-p_{z^{*}d^{*}}({\bm{X}})\right\}}_{=0}\right]$
		$\displaystyle-k\mathbb{E}\left[\frac{(1-Z)}{\pi_{0}({\bm{X}})}\frac{\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[D|0,{\bm{X}}]-p_{01}({\bm{X}})\right\}}_{=0}\right]$
	$\displaystyle=$	$\displaystyle 0-k\times 0=0,$
	$\displaystyle\Delta_{2}=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{\widetilde{r}_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{\widetilde{r}_{zd_{z}}(M,{\bm{X}})}\underbrace{\left\{\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},M,{\bm{X}}]-\mu_{zd_{z}}(M,{\bm{X}})\right\}}_{=0}\right]$
	$\displaystyle=$	$\displaystyle 0,$
	$\displaystyle\Delta_{4}=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\left\{1-\mathbb{E}_{Z,D|{\bm{X}}}\left[\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\Big{|}{\bm{X}}\right]\right\}\text{d}m\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\int_{m}{\mu}_{zd_{z}}(m,{\bm{X}})\widetilde{r}_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\left\{1-1\right\}\text{d}m\right]$
	$\displaystyle=$	$\displaystyle 0,$

we have obtained $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime}),\text{c}}_{d_{1}d_{0}}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$ .

Scenario IV ( $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ):

In Scenario IV, $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$ , $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$ , $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ , but generally $\widetilde{\pi}_{z}({\bm{x}})\neq\pi_{z}({\bm{x}})$ . Therefore, we have $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}$ , where

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{zz^{\prime}}({\bm{X}})\right].$

Noting that $\Delta_{4}=\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{d}}$ ,

	$\displaystyle\Delta_{1}=$	$\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]-k\mathbb{E}\left[\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(Z=z^{*})}{\widetilde{\pi}_{z^{*}}({\bm{X}})}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[\mathbb{I}(D=d^{*})|z^{*},{\bm{X}}]-p_{z^{*}d^{*}}({\bm{X}})\right\}}_{=0}\right]$
		$\displaystyle-k\mathbb{E}\left[\frac{(1-Z)}{\widetilde{\pi}_{0}({\bm{X}})}\frac{\eta_{zz^{\prime}}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\underbrace{\left\{\mathbb{E}_{D|Z,{\bm{X}}}[D|0,{\bm{X}}]-p_{01}({\bm{X}})\right\}}_{=0}\right]$
	$\displaystyle=$	$\displaystyle 0-k\times 0=0,$
	$\displaystyle\Delta_{2}=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\underbrace{\left\{\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|z,d_{z},M,{\bm{X}}]-\mu_{zd_{z}}(M,{\bm{X}})\right\}}_{=0}\right]$
	$\displaystyle=$	$\displaystyle 0,$
	$\displaystyle\Delta_{3}=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{\mathbb{E}_{M|Z,D,{\bm{X}}}[\mu_{zd_{z}}(M,{\bm{X}})|z^{\prime},d_{z^{\prime}},{\bm{X}}]-\eta_{zz^{\prime}}({\bm{X}})\right\}\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{\eta_{zz^{\prime}}({\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}\right]$
	$\displaystyle=$	$\displaystyle 0,$

we have that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime}),\text{d}}_{d_{1}d_{0}}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

Up to this point, we have confirmed that the probability limit of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$, i.e., $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$, equals the true value $\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under any of $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$, or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$. Next, we prove the asymptotic normality of $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$. Notice that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}$ can be viewed as the solution to the following estimating equation

\mathbb{P}_{n}\left[S_{\text{mr}}\left({\bm{O}};\theta_{d_{1}d_{0}}^{(zz^{\prime})},\widehat{\bm{\tau}}\right)\right]=\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\widehat{\bm{\tau}})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\mathcal{S}_{0}(\bm{O};\widehat{\bm{\tau}})\right]=0,

where $\mathcal{S}_{1}(\bm{O};{\bm{\tau}})$ and $\mathcal{S}_{0}(\bm{O};{\bm{\tau}})$ are $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ evaluated at $h_{nuisance}^{\text{par}}$ . Assume that the following regularity conditions hold:

1. Assume that $\sqrt{n}(\widehat{\bm{\tau}}-\bm{\tau}^{*})=\sqrt{n}\mathbb{P}_{n}\left[\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right]+o_{p}(1)$, where $\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})$ is the influence function of $\widehat{\bm{\tau}}$ and $o_{p}(1)$ is a remainder term that converges in probability to 0. Also, assume that $\mathbb{P}_{n}\left[\left\{\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}^{\otimes 2}\right]$ converges to a positive definite matrix.

2. Let $\bm{\Xi}$ be a bounded convex neighborhood of $\bm{\tau}^{*}$. Assume that the class of functions $\Big{\{}\mathcal{S}_{1}(\bm{O};\bm{\tau}),\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}),\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau})\right\}^{2},\mathcal{S}_{0}(\bm{O};\bm{\tau}),\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}),\left\{\mathcal{S}_{0}(\bm{O};\bm{\tau})\right\}^{2},\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}),\left\{\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau})\right\}^{\otimes 2}\Big{\}}$ is a Glivenko-Cantelli class in $\bm{\Xi}$.

3. Assume that $\mathbb{P}_{n}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]$ converges to a positive value. In addition, assume that both $\mathbb{P}_{n}[\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})\right\}^{2}]$ and $\mathbb{P}_{n}[\left\{\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right\}^{2}]$ converge to positive values.

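To prove asymptotic normality, we apply a first-order Taylor expansion of the estimating equation around $\left(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}},\bm{\tau}^{*}\right)$, along with the above conditions, to deduce that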

	$\displaystyle 0=\mathbb{P}_{n}\left[S_{\text{mr}}\left({\bm{O}};\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}},\widehat{\bm{\tau}}\right)\right]=$	$\displaystyle\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\widehat{\bm{\tau}})-\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\mathcal{S}_{0}(\bm{O};\widehat{\bm{\tau}})\right]$
	$\displaystyle=$	$\displaystyle\mathbb{P}_{n}\left[\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]$
		$\displaystyle-\mathbb{P}_{n}\left[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\right)$
		$\displaystyle+\mathbb{P}_{n}\left[\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right](\widehat{\bm{\tau}}-\bm{\tau}^{*})+o_{p}(n^{-1/2}),$

which suggests that

		$\displaystyle\sqrt{n}\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\right)$
	$\displaystyle=$	$\displaystyle\left\{\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]\right\}^{-1}\sqrt{n}\,\mathbb{P}_{n}\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})+R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}},\bm{\tau}^{*})\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}+o_{p}(1),$

where $R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}},\bm{\tau}^{*})=\mathbb{E}\left[\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}\frac{\partial}{\partial\bm{\tau}}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})\right]$ . Then, by applying the central limit theorem and noticing that $\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}=\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$ , or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , we can show that $\sqrt{n}\left(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\right)$ converges to a zero-mean normal distribution with variance

V_{\text{mr}}=\left\{\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]\right\}^{-2}\mathbb{E}\left[\left\{\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})+R(\theta_{d_{1}d_{0}}^{(zz^{\prime})},\bm{\tau}^{*})\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}^{2}\right].
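
As a rough illustration, $V_{\text{mr}}$ could be estimated by its empirical analogue once per-observation values of $\mathcal{S}_{1}$ and $\mathcal{S}_{0}$ are available. The Python sketch below assumes these values are supplied as arrays; the function name `ratio_estimate` and the optional `correction` argument (standing in for the per-observation term $R\,\text{IF}_{\bm{\tau}}$, set to zero if omitted) are hypothetical and not part of the paper.

```python
import numpy as np

def ratio_estimate(s1, s0, correction=None):
    """Ratio estimator P_n[S1] / P_n[S0] with a plug-in variance estimate.

    s1, s0     : per-observation values of S_1 and S_0 (assumed given).
    correction : optional per-observation values of R * IF_tau; treated
                 as zero when None (a simplifying assumption).
    """
    s1 = np.asarray(s1, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    theta = s1.mean() / s0.mean()

    # Empirical influence-function values of the ratio estimator.
    infl = s1 - theta * s0
    if correction is not None:
        infl = infl + np.asarray(correction, dtype=float)
    infl = infl / s0.mean()

    var = np.mean(infl ** 2)        # empirical analogue of V_mr
    se = np.sqrt(var / s1.size)     # standard error of theta
    return theta, var, se

theta, var, se = ratio_estimate([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
print(theta, var, se)
```

With $\mathcal{S}_{0}\equiv 1$, as in the usage line, the estimator reduces to a sample mean and the variance estimate to the sample variance with divisor $n$.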

Finally, when all parametric working models are correctly specified, i.e., under the intersection model $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, the variance $V_{\text{mr}}=\mathbb{E}\left[\left\{\mathcal{D}_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})\right\}^{2}\right]$ attains the semiparametric efficiency bound. This can be verified from the following facts:

1. $\mathcal{S}_{1}(\bm{O};\bm{\tau}^{*})=\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})=\delta_{d_{1}d_{0}}(\bm{O})$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

2. Following point 1, $\mathbb{E}[\mathcal{S}_{0}(\bm{O};\bm{\tau}^{*})]=e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$.

3. $R(\theta_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}},\bm{\tau}^{*})=0$, because the EIF is orthogonal to the likelihood score of the parametric working models when they are correctly specified.

Under the above three points, we have

V_{\text{mr}}=\mathbb{E}\left[\left\{\frac{\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right\}^{2}\right].

This completes the proof. $\square$

D.6 The nonparametric efficient estimator (Theorem 5)

Proof of Theorem 5. The proof of the multiple robustness of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}$ is the same as that of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$ given in the proof of Theorem 4. Here, we only prove the asymptotic normality and local efficiency of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}$.

To simplify notation without loss of clarity, we abbreviate $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ and its nonparametric estimator ($\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}$) as $\theta$ and $\widehat{\theta}$, respectively. Also, we abbreviate the two unknown functions in the efficient influence function, $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$, as $\psi(\bm{O})$ and $\delta(\bm{O})$, respectively. Based on the cross-splitting procedure, the nonparametric estimator $\widehat{\theta}$ can be written as the ratio of two terms, $\mathbb{P}_{n}\left[\widehat{\psi}(\bm{O})\right]$ and $\mathbb{P}_{n}\left[\widehat{\delta}(\bm{O})\right]$, where

\displaystyle\mathbb{P}_{n}\left[\widehat{\psi}(\bm{O})\right]=\frac{1}{n}\sum_{v=1}^{V}n_{v}\mathbb{P}_{n_{v}}\left[\widehat{\psi}^{v}(\bm{O})\right]\text{ and }\mathbb{P}_{n}\left[\widehat{\delta}(\bm{O})\right]=\frac{1}{n}\sum_{v=1}^{V}n_{v}\mathbb{P}_{n_{v}}\left[\widehat{\delta}^{v}(\bm{O})\right].

Here, $n_{v}$ is the size of the $v$ -th group $\mathcal{O}_{v}$ , $\mathbb{P}_{n_{v}}[\cdot]$ is the empirical mean operator on $\mathcal{O}_{v}$ , and $\{\widehat{\psi}^{v}(\bm{O}),\widehat{\delta}^{v}(\bm{O})\}$ is $\{\psi(\bm{O}),\delta(\bm{O})\}$ evaluated under $\widehat{h}_{nuisance}^{\text{np},v}$ , which is the nonparametric estimator of the nuisance functions based on the leave-one-out sample $\mathcal{O}_{-v}$ . We further have the following decomposition of $\mathbb{P}_{n_{v}}\left[\widehat{\psi}^{v}(\bm{O})\right]$ and $\mathbb{P}_{n_{v}}\left[\widehat{\delta}^{v}(\bm{O})\right]$ :

	$\displaystyle\mathbb{P}_{n_{v}}[\widehat{\psi}^{v}(\bm{O})]$	$\displaystyle=\mathbb{P}_{n_{v}}[\psi(\bm{O})]+(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]+\underbrace{\mathbb{E}[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]}_{=:R_{1}(\widehat{\psi}^{v},\psi)},$		(s22)
	$\displaystyle\mathbb{P}_{n_{v}}[\widehat{\delta}^{v}(\bm{O})]$	$\displaystyle=\mathbb{P}_{n_{v}}[\delta(\bm{O})]+(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\delta}^{v}(\bm{O})-\delta(\bm{O})]+\underbrace{\mathbb{E}[\widehat{\delta}^{v}(\bm{O})-\delta(\bm{O})]}_{=:R_{2}(\widehat{\delta}^{v},\delta)}.$		(s23)
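
The fold-wise aggregation $\frac{1}{n}\sum_{v=1}^{V}n_{v}\mathbb{P}_{n_{v}}[\widehat{\psi}^{v}(\bm{O})]$ described above can be sketched in Python as follows. This is only a toy illustration: the nuisance function is a hypothetical least-squares regression of `y` on `x` fitted on $\mathcal{O}_{-v}$, and the held-out prediction plays the role of $\widehat{\psi}^{v}(\bm{O})$; neither choice comes from the paper.

```python
import numpy as np

def cross_fit_average(x, y, n_folds=5, seed=0):
    """Cross-fitted average (1/n) * sum_v n_v * P_{n_v}[psi_hat^v]."""
    n = len(y)
    rng = np.random.default_rng(seed)
    # Partition the indices into V folds of (nearly) equal size.
    folds = np.array_split(rng.permutation(n), n_folds)

    psi_hat = np.empty(n)
    total = 0.0
    for idx in folds:
        # Fit the toy nuisance model on the complement of the fold.
        train = np.setdiff1d(np.arange(n), idx)
        design = np.column_stack([np.ones(train.size), x[train]])
        beta, *_ = np.linalg.lstsq(design, y[train], rcond=None)

        # Evaluate the toy psi_hat^v on the held-out fold only.
        psi_hat[idx] = beta[0] + beta[1] * x[idx]
        total += idx.size * psi_hat[idx].mean()

    return total / n, psi_hat

rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = 1.0 + 2.0 * x + rng.normal(size=1000)
estimate, psi_hat = cross_fit_average(x, y)
print(estimate, psi_hat.mean())
```

Because every observation belongs to exactly one fold, the weighted fold average coincides with the plain average of the held-out values, which is what the two printed numbers show.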

In what follows, we show that the right-hand side of (s22) equals $\mathbb{P}_{n_{v}}[\psi(\bm{O})]+o_{p}(n^{-1/2})$. Using a similar strategy, one can also deduce that the right-hand side of (s23) equals $\mathbb{P}_{n_{v}}[\delta(\bm{O})]+o_{p}(n^{-1/2})$.

We can show that the second term in (s22) is $o_{p}(n^{-1/2})$ by cross-splitting. Specifically, by Markov's inequality applied to the squared deviation, the independence induced by cross-splitting, and the fact that

	$\displaystyle\text{Var}\left\{(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]\Big{|}\mathcal{O}_{-v}\right\}$	$\displaystyle=\text{Var}\left\{\mathbb{P}_{n_{v}}[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]\Big{|}\mathcal{O}_{-v}\right\}$
		$\displaystyle=\frac{1}{n_{v}}\text{Var}\left\{\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})\Big{|}\mathcal{O}_{-v}\right\}$
		$\displaystyle\leq\frac{1}{n_{v}}\|\widehat{\psi}^{v}-\psi\|^{2},$

we have that

	$\displaystyle\mathbb{P}\left\{\frac{\sqrt{n_{v}}\left|(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]\right|}{\|\widehat{\psi}^{v}-\psi\|}\geq\epsilon\right\}$	$\displaystyle=\mathbb{E}\left[\mathbb{P}\left\{\frac{\sqrt{n_{v}}\left|(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]\right|}{\|\widehat{\psi}^{v}-\psi\|}\geq\epsilon\Big{|}\mathcal{O}_{-v}\right\}\right]$
		$\displaystyle\leq\frac{1}{\epsilon^{2}}\mathbb{E}\left[\text{Var}\left\{\frac{\sqrt{n_{v}}(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]}{\|\widehat{\psi}^{v}-\psi\|}\Big{|}\mathcal{O}_{-v}\right\}\right]$
		$\displaystyle\leq\epsilon^{-2},$

for any $\epsilon>0$. Therefore, $(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]=O_{p}(n_{v}^{-1/2}\|\widehat{\psi}^{v}-\psi\|)=o_{p}(n_{v}^{-1/2})$ because $\widehat{\psi}^{v}(\bm{O})$ converges to $\psi(\bm{O})$ in probability and therefore $\|\widehat{\psi}^{v}-\psi\|$ converges to 0. Since $V$ is a finite number and we partition the dataset as evenly as possible, $n_{v}/n$ converges to $1/V$, so that $n/n_{v}=O(1)$ and thus $(\mathbb{P}_{n_{v}}-\mathbb{E})[\widehat{\psi}^{v}(\bm{O})-\psi(\bm{O})]=o_{p}(n^{-1/2})$.

Next, we show that $R_{1}(\widehat{\psi}^{v},\psi)=o_{p}(n^{-1/2})$ . Specifically, we can show that

		$\displaystyle R_{1}(\widehat{\psi}^{v},\psi)=\mathbb{E}[\widehat{\psi}^{v}(\bm{O})]-\mathbb{E}[\psi(\bm{O})]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{np},v}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{np},v}({\bm{X}})}\right)\widehat{\eta}_{zz^{\prime}}^{\text{np},v}({\bm{X}})$
		$\displaystyle+\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widehat{p}_{zd_{z}}^{\text{np},v}({\bm{X}})\widehat{\pi}_{z}^{\text{np},v}({\bm{X}})}\frac{\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(M,{\bm{X}})}{\widehat{r}_{zd_{z}}^{\text{np},v}(M,{\bm{X}})}\left\{Y-\widehat{\mu}_{zd_{z}}^{\text{np},v}(M,{\bm{X}})\right\}$
		$\displaystyle+\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widehat{p}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}({\bm{X}})\widehat{\pi}_{z^{\prime}}^{\text{np},v}({\bm{X}})}\left\{\widehat{\mu}_{zd_{z}}^{\text{np},v}(M,{\bm{X}})-\widehat{\eta}_{zz^{\prime}}^{\text{np},v}({\bm{X}})\right\}$
		$\displaystyle+\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\widehat{\eta}_{zz^{\prime}}^{\text{np},v}({\bm{X}})\Big{\}}$
		$\displaystyle-\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\int_{m}\mu_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left(\frac{\pi_{z^{*}}({\bm{X}})\left\{p_{z^{*}d^{*}}({\bm{X}})-\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{np},v}({\bm{X}})}-k\frac{\pi_{0}({\bm{X}})\left\{p_{01}({\bm{X}})-\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{np},v}({\bm{X}})}\right)\int_{m}\widehat{\mu}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(m,{\bm{X}})\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\frac{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}{\widehat{p}_{zd_{z}}^{\text{np},v}({\bm{X}})\widehat{\pi}_{z}^{\text{np},v}({\bm{X}})}\int_{m}\frac{\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(m,{\bm{X}})}{\widehat{r}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})}\left\{\mu_{zd_{z}}(m,{\bm{X}})-\widehat{\mu}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})\right\}r_{zd_{z}}(m,{\bm{X}})\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\frac{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}{\widehat{p}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}({\bm{X}})\widehat{\pi}_{z^{\prime}}^{\text{np},v}({\bm{X}})}\int_{m}\widehat{\mu}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})\left\{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})-\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(m,{\bm{X}})\right\}\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[\left\{\widehat{p}_{z^{*}d^{*}}^{\text{np},v}({\bm{X}})-k\widehat{p}_{01}^{\text{np},v}({\bm{X}})\right\}\int_{m}\widehat{\mu}_{zd_{z}}^{\text{np},v}(m,{\bm{X}})\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{np},v}(m,{\bm{X}})\text{d}m\right]$
		$\displaystyle-\mathbb{E}\left[\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\int_{m}\mu_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\text{d}m\right].$

Define $\pi_{*}$ , $\pi_{0}$ , $\pi_{z}$ , $\pi_{z^{\prime}}$ , $p_{*}$ , $p_{0}$ , $p_{z}$ , $p_{z^{\prime}}$ , $r_{z}$ , $r_{z^{\prime}}$ , and $\mu_{z}$ as the abbreviations of the unknown nuisance functions $\pi_{z^{*}}({\bm{x}})$ , $\pi_{0}({\bm{x}})$ , $\pi_{z}({\bm{x}})$ , $\pi_{z^{\prime}}({\bm{x}})$ , $p_{z^{*}d^{*}}({\bm{x}})$ , $p_{01}({\bm{x}})$ , $p_{zd_{z}}({\bm{x}})$ , $p_{z^{\prime}d_{z^{\prime}}}({\bm{x}})$ , $r_{zd_{z}}(m,{\bm{x}})$ , $r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})$ , and $\mu_{zd_{z}}(m,{\bm{x}})$ , respectively. Also, let $\widehat{\pi}_{*}^{v}$ , $\widehat{\pi}_{0}^{v}$ , $\widehat{\pi}_{z}^{v}$ , $\widehat{\pi}_{z^{\prime}}^{v}$ , $\widehat{p}_{*}^{v}$ , $\widehat{p}_{0}^{v}$ , $\widehat{p}_{z}^{v}$ , $\widehat{p}_{z^{\prime}}^{v}$ , $\widehat{r}_{z}^{v}$ , $\widehat{r}_{z^{\prime}}^{v}$ , and $\widehat{\mu}_{z}^{v}$ be their corresponding estimators evaluated under $\widehat{h}_{nuisance}^{\text{np},v}$ . Using these abbreviations, we can rewrite $R_{1}(\widehat{\psi}^{v},\psi)=\Delta_{1}+\Delta_{2}+\Delta_{3}+\Delta_{4}-\Delta_{5}$ with

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left(\frac{\pi_{*}(p_{*}-\widehat{p}_{*}^{v})}{\widehat{\pi}_{*}^{v}}-k\frac{\pi_{0}(p_{0}-\widehat{p}_{0}^{v})}{\widehat{\pi}_{0}^{v}}\right)\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle=\mathbb{E}\left[\left(\frac{(\pi_{*}-\widehat{\pi}_{*}^{v})(p_{*}-\widehat{p}_{*}^{v})}{\widehat{\pi}_{*}^{v}}-k\frac{(\pi_{0}-\widehat{\pi}_{0}^{v})(p_{0}-\widehat{p}_{0}^{v})}{\widehat{\pi}_{0}^{v}}\right)\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[\left\{(p_{*}-\widehat{p}_{*}^{v})-k(p_{0}-\widehat{p}_{0}^{v})\right\}\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{\widehat{r}_{z^{\prime}}^{v}}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})r_{z}\text{d}m\right]$
		$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle\quad+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}}{\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle\quad+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}-\widehat{\pi}_{z}^{v}}{\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle\quad+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}-\widehat{\pi}_{z}^{v}}{\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle\quad+\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z^{\prime}}\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]$
		$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}}{\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]$
		$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]$
		$\displaystyle\quad+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}-\widehat{\pi}_{z^{\prime}}^{v}}{\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]$
		$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}-\widehat{\pi}_{z^{\prime}}^{v}}{\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]$
		$\displaystyle\quad+\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]+\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle=\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]+\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right],$
	$\displaystyle\Delta_{5}$	$\displaystyle=\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}\mu_{z}r_{z^{\prime}}\text{d}m\right].$
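
The terms weighted by $(p_{*}-kp_{0})$ in $\Delta_{2}$ to $\Delta_{5}$ combine through a pointwise algebraic fact: a plug-in product plus its two first-order corrections, minus the true product, leaves only a product of the two estimation errors (up to sign). The following Python sketch checks this numerically for scalars standing in for $\mu_{z}$, $r_{z^{\prime}}$ and their estimates; the function name `first_order_residual` and the simulated inputs are illustrative assumptions and are not part of the original derivation.

```python
import numpy as np

def first_order_residual(mu, r, mu_hat, r_hat):
    """Plug-in product plus two first-order corrections, minus the target mu * r."""
    return mu_hat * r_hat + (mu - mu_hat) * r_hat + mu_hat * (r - r_hat) - mu * r

rng = np.random.default_rng(0)
# Four independent arrays standing in for the truth and its estimates (hypothetical values).
mu, r, mu_hat, r_hat = rng.normal(size=(4, 1000))

residual = first_order_residual(mu, r, mu_hat, r_hat)
product_of_errors = (mu_hat - mu) * (r_hat - r)

# The residual matches the product of the two errors in magnitude, so it is second order.
max_gap = np.max(np.abs(np.abs(residual) - np.abs(product_of_errors)))
```

Because only the magnitude of this term enters the subsequent norm bound, the check compares absolute values.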

Therefore, we have that

	$\displaystyle R_{1}(\widehat{\psi}^{v},\psi)=$	$\displaystyle\Delta_{1}+\Delta_{2}+\Delta_{3}+\Delta_{4}-\Delta_{5}$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[\left(\frac{(\pi_{*}-\widehat{\pi}_{*}^{v})(p_{*}-\widehat{p}_{*}^{v})}{\widehat{\pi}_{*}^{v}}-k\frac{(\pi_{0}-\widehat{\pi}_{0}^{v})(p_{0}-\widehat{p}_{0}^{v})}{\widehat{\pi}_{0}^{v}}\right)\int_{m}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}-\widehat{\pi}_{z}^{v}}{\widehat{\pi}_{z}^{v}}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}-\widehat{\pi}_{z^{\prime}}^{v}}{\widehat{\pi}_{z^{\prime}}^{v}}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\int_{m}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\text{d}m\right]$
		$\displaystyle+\mathbb{E}\left[(p_{*}-kp_{0})\int_{m}(\mu_{z}-\widehat{\mu}_{z}^{v})(\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}})\text{d}m\right]$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left[f_{m}^{-1}\left(\frac{(\pi_{*}-\widehat{\pi}_{*}^{v})(p_{*}-\widehat{p}_{*}^{v})}{\widehat{\pi}_{*}^{v}}-k\frac{(\pi_{0}-\widehat{\pi}_{0}^{v})(p_{0}-\widehat{p}_{0}^{v})}{\widehat{\pi}_{0}^{v}}\right)\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\right]$
		$\displaystyle+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{p_{z}\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}\frac{(r_{z}-\widehat{r}_{z}^{v})}{\widehat{r}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\right]$
		$\displaystyle+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z}-\widehat{p}_{z}^{v})\pi_{z}}{\widehat{p}_{z}^{v}\widehat{\pi}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\right]$
		$\displaystyle+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z}-\widehat{\pi}_{z}^{v}}{\widehat{\pi}_{z}^{v}}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\right]$
		$\displaystyle+\mathbb{E}\left[f_{m}^{-1}\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}(\mu_{z}-\widehat{\mu}_{z}^{v})\widehat{r}_{z^{\prime}}^{v}\right]$
		$\displaystyle+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{(p_{z^{\prime}}-\widehat{p}_{z^{\prime}}^{v})\pi_{z^{\prime}}}{\widehat{p}_{z^{\prime}}^{v}\widehat{\pi}_{z^{\prime}}^{v}}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\right]$
		$\displaystyle+\mathbb{E}\left[f_{m}^{-1}(\widehat{p}_{*}^{v}-k\widehat{p}_{0}^{v})\frac{\pi_{z^{\prime}}-\widehat{\pi}_{z^{\prime}}^{v}}{\widehat{\pi}_{z^{\prime}}^{v}}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\right]$
		$\displaystyle+\mathbb{E}\left[f_{m}^{-1}\{(\widehat{p}_{*}^{v}-p_{*})-k(\widehat{p}_{0}^{v}-p_{0})\}\widehat{\mu}_{z}^{v}(r_{z^{\prime}}-\widehat{r}_{z^{\prime}}^{v})\right]$
		$\displaystyle+\mathbb{E}\left[f_{m}^{-1}(p_{*}-kp_{0})(\mu_{z}-\widehat{\mu}_{z}^{v})(\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}})\right],$

where $f_{m}=f_{M|Z,D,{\bm{X}}}(M|Z,D,{\bm{X}})$ and the last equality of the previous equation follows from the law of iterated expectations. Using the Cauchy-Schwarz inequality, we then have

	$\displaystyle\|R_{1}(\widehat{\psi}^{v},\psi)\|\leq$	$\displaystyle\|f_{m}^{-1}\widehat{\pi}_{*}^{v^{-1}}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\|\widehat{\pi}_{*}^{v}-\pi_{*}\|\|\widehat{p}_{*}^{v}-p_{*}\|+\|f_{m}^{-1}\widehat{\pi}_{0}^{v^{-1}}\widehat{\mu}_{z}^{v}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\|\widehat{\pi}_{0}^{v}-\pi_{0}\|\|\widehat{p}_{0}^{v}-p_{0}\|$
		$\displaystyle+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}p_{z}\pi_{z}\widehat{r}_{z^{\prime}}^{v}\widehat{p}_{z}^{v^{-1}}\widehat{\pi}_{z}^{v^{-1}}\widehat{r}_{z}^{v^{-1}}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}p_{z}\pi_{z}\widehat{r}_{z^{\prime}}^{v}\widehat{p}_{z}^{v^{-1}}\widehat{\pi}_{z}^{v^{-1}}\widehat{r}_{z}^{v^{-1}}\|_{\infty}\right\}\|\widehat{\mu}_{z}^{v}-\mu_{z}\|\|\widehat{r}_{z}^{v}-r_{z}\|$
		$\displaystyle+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}\pi_{z}\widehat{r}_{z^{\prime}}^{v}\widehat{p}_{z}^{v^{-1}}\widehat{\pi}_{z}^{v^{-1}}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}\pi_{z}\widehat{r}_{z^{\prime}}^{v}\widehat{p}_{z}^{v^{-1}}\widehat{\pi}_{z}^{v^{-1}}\|_{\infty}\right\}\|\widehat{\mu}_{z}^{v}-\mu_{z}\|\|\widehat{p}_{z}^{v}-p_{z}\|$
		$\displaystyle+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}\widehat{\pi}_{z}^{v^{-1}}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}\widehat{\pi}_{z}^{v^{-1}}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\right\}\|\widehat{\mu}_{z}^{v}-\mu_{z}\|\|\widehat{\pi}_{z}^{v}-\pi_{z}\|$
		$\displaystyle+\|f_{m}^{-1}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\|\widehat{p}_{*}^{v}-p_{*}\|\|\widehat{\mu}_{z}^{v}-\mu_{z}\|+\|f_{m}^{-1}\widehat{r}_{z^{\prime}}^{v}\|_{\infty}\|\widehat{p}_{0}^{v}-p_{0}\|\|\widehat{\mu}_{z}^{v}-\mu_{z}\|$
		$\displaystyle+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}\pi_{z^{\prime}}\widehat{p}_{z^{\prime}}^{v^{-1}}\widehat{\pi}_{z^{\prime}}^{v^{-1}}\widehat{\mu}_{z}^{v}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}\pi_{z^{\prime}}\widehat{p}_{z^{\prime}}^{v^{-1}}\widehat{\pi}_{z^{\prime}}^{v^{-1}}\widehat{\mu}_{z}^{v}\|_{\infty}\right\}\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|\|\widehat{p}_{z^{\prime}}^{v}-p_{z^{\prime}}\|$
		$\displaystyle+\left\{\|f_{m}^{-1}\widehat{p}_{*}^{v}\widehat{\pi}_{z^{\prime}}^{v^{-1}}\widehat{\mu}_{z}^{v}\|_{\infty}+\|f_{m}^{-1}\widehat{p}_{0}^{v}\widehat{\pi}_{z^{\prime}}^{v^{-1}}\widehat{\mu}_{z}^{v}\|_{\infty}\right\}\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|\|\widehat{\pi}_{z^{\prime}}^{v}-\pi_{z^{\prime}}\|$
		$\displaystyle+\|f_{m}^{-1}\widehat{\mu}_{z}^{v}\|_{\infty}\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|\|\widehat{p}_{*}^{v}-p_{*}\|+\|f_{m}^{-1}\widehat{\mu}_{z}^{v}\|_{\infty}\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|\|\widehat{p}_{0}^{v}-p_{0}\|$
		$\displaystyle+\left\{\|f_{m}^{-1}p_{*}\|_{\infty}+\|f_{m}^{-1}p_{0}\|_{\infty}\right\}\|\widehat{\mu}_{z}^{v}-\mu_{z}\|\|\widehat{r}_{z^{\prime}}^{v}-r_{z^{\prime}}\|.$
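
Each line of this bound applies the same step: an expectation of a bounded weight times two error terms is at most the sup-norm of the weight times the product of the two $L_{2}$ norms. A minimal Python sketch of that step under the empirical measure is given below; the arrays `c`, `a` and `b` are simulated stand-ins (assumptions for illustration only) for a weight such as $f_{m}^{-1}\widehat{r}_{z^{\prime}}^{v}$ and two nuisance errors.

```python
import numpy as np

def l2_norm(x):
    """Empirical L2 norm, i.e. the square root of the mean square."""
    return np.sqrt(np.mean(x ** 2))

rng = np.random.default_rng(1)
n = 5000
c = rng.uniform(0.5, 2.0, size=n)            # bounded weight (hypothetical)
a = rng.normal(scale=0.1, size=n)            # first estimation error (hypothetical)
b = rng.normal(scale=0.1, size=n) + 0.5 * a  # second error, correlated with the first

lhs = abs(np.mean(c * a * b))                      # |E[c * a * b]|
rhs = np.max(np.abs(c)) * l2_norm(a) * l2_norm(b)  # ||c||_inf * ||a|| * ||b||
```

When both errors shrink faster than $n^{-1/4}$, the right-hand side shrinks faster than $n^{-1/2}$, which is how the rate condition on products of errors enters.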

Because we assume $\|\widehat{l}^{\text{np},v}-l\|\times\|\widehat{g}^{\text{np},v}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$ , it follows that $R_{1}(\widehat{\psi}^{v},\psi)=o_{p}(n^{-1/2})$ . We have thus confirmed that

\mathbb{P}_{n_{v}}[\widehat{\psi}^{v}(\bm{O})]=\mathbb{P}_{n_{v}}[\psi(\bm{O})]+o_{p}(n^{-1/2}),

thus

$\displaystyle\mathbb{P}_{n}\left[\widehat{\psi}(\bm{O})\right]$	$\displaystyle=\frac{1}{n}\sum_{v=1}^{V}n_{v}\mathbb{P}_{n_{v}}\left[\widehat{\psi}^{v}(\bm{O})\right]$
	$\displaystyle=\sum_{v=1}^{V}\left\{\frac{n_{v}}{n}\mathbb{P}_{n_{v}}\left[\psi(\bm{O})\right]+o_{p}\left(\frac{n_{v}}{n^{3/2}}\right)\right\}$
	$\displaystyle=\mathbb{P}_{n}\left[\psi(\bm{O})\right]+o_{p}(n^{-1/2})$	(s24)
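
The first equality in (s24) only uses the fact that the full-sample average is the fold-size-weighted average of the fold-specific averages. A short Python sketch of this aggregation step follows; the array `values` is a simulated placeholder for $\widehat{\psi}^{v}(\bm{O})$ evaluated at each observation, and the random fold assignment is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, V = 1003, 5
values = rng.normal(size=n)      # placeholder for per-observation psi_hat values
fold = rng.permutation(n) % V    # random partition of the sample into V folds

n_v = np.array([np.sum(fold == v) for v in range(V)])                # fold sizes
fold_means = np.array([values[fold == v].mean() for v in range(V)])  # P_{n_v}[.]

weighted = np.sum(n_v * fold_means) / n   # (1/n) * sum_v n_v * P_{n_v}[.]
overall = values.mean()                   # P_n[.]
```

Since the fold sizes sum to $n$, the fold-wise remainders $o_{p}(n_{v}/n^{3/2})$ add up to $o_{p}(n^{-1/2})$.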

Using similar arguments, we can show $\mathbb{P}_{n_{v}}[\widehat{\delta}^{v}(\bm{O})]=\mathbb{P}_{n_{v}}[\delta(\bm{O})]+o_{p}(n^{-1/2})$ and therefore

\mathbb{P}_{n}\left[\widehat{\delta}(\bm{O})\right]=\mathbb{P}_{n}\left[\delta(\bm{O})\right]+o_{p}(n^{-1/2}).	(s25)

Notice that $\widehat{\theta}$ can be cast as the solution of the following equation

\mathbb{P}_{n}\left[\widehat{\psi}(\bm{O})-\widehat{\theta}\widehat{\delta}(\bm{O})\right]=0.

This, along with (s24) and (s25), implies that

		$\displaystyle\mathbb{P}_{n}\left[\psi(\bm{O})-\widehat{\theta}\delta(\bm{O})\right]=o_{p}(n^{-1/2})$
	$\displaystyle\Longleftrightarrow$	$\displaystyle\mathbb{P}_{n}\left[\psi(\bm{O})-\theta\delta(\bm{O})\right]-\mathbb{P}_{n}\left[\delta(\bm{O})\right](\widehat{\theta}-\theta)=o_{p}(n^{-1/2}).$

Moreover, since $\mathbb{P}_{n}\left[\delta(\bm{O})\right]=\mathbb{E}\left[\delta(\bm{O})\right]+O_{p}(n^{-1/2})$ and $\widehat{\theta}=\theta+o_{p}(1)$ , it follows that $\mathbb{P}_{n}\left[\delta(\bm{O})\right](\widehat{\theta}-\theta)=\mathbb{E}\left[\delta(\bm{O})\right](\widehat{\theta}-\theta)+o_{p}(n^{-1/2})$ . Therefore, we further obtain

\mathbb{P}_{n}\left[\psi(\bm{O})-\theta\delta(\bm{O})\right]-\mathbb{E}\left[\delta(\bm{O})\right](\widehat{\theta}-\theta)=o_{p}(n^{-1/2}).

After simple algebra and observing $\mathbb{E}\left[\delta(\bm{O})\right]=e_{d_{1}d_{0}}=p_{z^{*}d^{*}}-kp_{01}$ , we have

\sqrt{n}(\widehat{\theta}-\theta)=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi(\bm{O})-\theta\delta(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1)

which suggests that $\widehat{\theta}$ is asymptotically normal and its asymptotic variance achieves the efficiency lower bound discussed in Appendix D.5. $\square$
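
The argument above treats $\widehat{\theta}$ as a ratio-type estimator solving $\mathbb{P}_{n}[\widehat{\psi}(\bm{O})-\widehat{\theta}\widehat{\delta}(\bm{O})]=0$, with influence function $\{\psi(\bm{O})-\theta\delta(\bm{O})\}/\mathbb{E}[\delta(\bm{O})]$. A minimal Python sketch of the point estimate and a plug-in standard error is shown below. It assumes the per-observation values of $\widehat{\psi}$ and $\widehat{\delta}$ are already available as arrays; here they are simulated placeholders, and the function name `ratio_estimate` is hypothetical.

```python
import numpy as np

def ratio_estimate(psi, delta):
    """Solve P_n[psi - theta * delta] = 0 and return a plug-in standard error."""
    psi = np.asarray(psi, dtype=float)
    delta = np.asarray(delta, dtype=float)
    theta_hat = psi.mean() / delta.mean()
    # Estimated influence function: (psi - theta * delta) / E[delta].
    influence = (psi - theta_hat * delta) / delta.mean()
    se = influence.std(ddof=1) / np.sqrt(psi.size)
    return theta_hat, se, influence

rng = np.random.default_rng(3)
n = 4000
delta = rng.uniform(0.2, 0.8, size=n)              # placeholder for delta_hat(O_i)
psi = 1.5 * delta + rng.normal(scale=0.5, size=n)  # placeholder for psi_hat(O_i)
theta_hat, se, influence = ratio_estimate(psi, delta)
```

A Wald-type interval would then be `theta_hat ± 1.96 * se`, under the rate conditions stated above.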

D.7 Estimation of natural mediation effects

This section elaborates on the multiply robust estimator and the nonparametric estimator for the mediation effects $\text{PNIE}_{d_{1}d_{0}}$ , $\text{PNDE}_{d_{1}d_{0}}$ , ITT-NIE, and ITT-NDE. The following lemma provides the EIFs of these mediation effects.

Lemma S11

For any $d_{1}d_{0}\in\mathcal{U}_{\text{a}}$ or $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ under standard or strong monotonicity, the EIFs of $\text{PNIE}_{d_{1}d_{0}}$ and $\text{PNDE}_{d_{1}d_{0}}$ are

\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O})=\frac{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\text{PNIE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}

and

\mathcal{D}_{d_{1}d_{0}}^{\text{PNDE}}(\bm{O})=\frac{\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\psi_{d_{1}d_{0}}^{(00)}(\bm{O})-\text{PNDE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}

respectively, where $\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})$ and $\delta_{d_{1}d_{0}}(\bm{O})$ are given in Theorem 3, $k=|d_{1}-d_{0}|$ , and $z^{*}d^{*}=$ 11, 10, 01 if $d_{1}d_{0}=$ 10, 00, and 11, respectively. In addition, the EIFs of ITT-NIE and ITT-NDE are

\mathcal{D}^{\text{ITT-NIE}}(\bm{O})=\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}-\text{ITT-NIE}

and

\mathcal{D}^{\text{ITT-NDE}}(\bm{O})=\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\psi_{d_{1}d_{0}}^{(00)}(\bm{O})\right\}-\text{ITT-NDE}

respectively, where $\mathcal{U}=\mathcal{U}_{\text{a}}$ or $\mathcal{U}=\mathcal{U}_{\text{b}}$ under the standard or strong monotonicity.

Proof.

Because $\text{PNIE}_{d_{1}d_{0}}$ is identified as the difference between $\theta_{d_{1}d_{0}}^{(11)}$ and $\theta_{d_{1}d_{0}}^{(10)}$ , whose EIFs are $\mathcal{D}_{d_{1}d_{0}}^{(11)}(\bm{O})$ and $\mathcal{D}_{d_{1}d_{0}}^{(10)}(\bm{O})$ as derived in Theorem 3, Lemma S10 implies that the EIF of $\text{PNIE}_{d_{1}d_{0}}$ is

\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O})=\mathcal{D}_{d_{1}d_{0}}^{(11)}(\bm{O})-\mathcal{D}_{d_{1}d_{0}}^{(10)}(\bm{O})=\frac{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\text{PNIE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}.

The EIF of $\text{PNDE}_{d_{1}d_{0}}$ can be similarly obtained.

Also, ITT-NIE is identified as

\text{ITT-NIE}=\sum_{d_{1}d_{0}\in\mathcal{U}}e_{d_{1}d_{0}}\times\left(\theta_{d_{1}d_{0}}^{(11)}-\theta_{d_{1}d_{0}}^{(10)}\right)=\sum_{d_{1}d_{0}\in\mathcal{U}}\left(H_{d_{1}d_{0}}^{(11)}-H_{d_{1}d_{0}}^{(10)}\right),

where $H_{d_{1}d_{0}}^{(zz^{\prime})}=\mathbb{E}\left[(p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}))\eta_{zz^{\prime}}({\bm{X}})\right]$ is defined in Section D.4 and its EIF has been derived in Lemma S9. Based on Lemma S10, one can easily show

\mathcal{D}^{\text{ITT-NIE}}(\bm{O})=\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\mathcal{D}_{d_{1}d_{0}}^{(11),H}(\bm{O})-\mathcal{D}_{d_{1}d_{0}}^{(10),H}(\bm{O})\right\}=\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}-\text{ITT-NIE}.

We can calculate the EIF of ITT-NDE following the same strategy. $\square$

The following proposition demonstrates that the multiply robust estimators of the mediation effects remain quadruply robust and locally efficient.

Proposition S5

Under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , the multiply robust estimator $\widehat{\tau}^{\text{mr}}$ is consistent and asymptotically normal for all $\tau\in\{\text{PNIE}_{d_{1}d_{0}},\text{PNDE}_{d_{1}d_{0}},\text{ITT-NIE},\text{ITT-NDE}\}$ . Moreover, $\widehat{\tau}^{\text{mr}}$ is semiparametrically efficient under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

Proof.

The quadruple robustness and asymptotic normality of $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}$ and $\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{mr}}$ follow directly from the properties of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$ in Theorem 4. Next, we prove that $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}$ is locally efficient under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ; similar results extend to $\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{mr}}$ . Based on the proof of Theorem 4 in Section D.5, we know that, under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}$ admits the asymptotically linear expansion

\sqrt{n}(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}-\theta_{d_{1}d_{0}}^{(zz^{\prime})})=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1).

Then, it follows from $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}=\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}}$ that

	$\displaystyle\sqrt{n}(\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}-\text{PNIE}_{d_{1}d_{0}})$	$\displaystyle=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\text{PNIE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1)$
		$\displaystyle=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O})\right]+o_{p}(1),$

where the second equality holds due to Lemma S11. This suggests that $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{mr}}$ is semiparametrically efficient when all working models are correctly specified.

The multiply robust estimator of ITT-NIE is $\widehat{\text{ITT-NIE}}^{\text{mr}}=\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{dr}}\times(\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}})$ . Theorem 4 suggests that $\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}\xrightarrow{p}\theta_{d_{1}d_{0}}^{(11)}$ and $\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}}\xrightarrow{p}\theta_{d_{1}d_{0}}^{(10)}$ under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{o}$ , or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ . Also, Jiang et al. (2022) show that the doubly robust estimator for the marginal principal score $\widehat{e}_{d_{1}d_{0}}^{\text{dr}}=\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}$ is consistent for $e_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ . Because each of the four scenarios above includes either $\mathcal{M}_{\pi}$ or $\mathcal{M}_{e}$ , this further implies that $\widehat{\text{ITT-NIE}}^{\text{mr}}\xrightarrow{p}\sum_{d_{1}d_{0}\in\mathcal{U}}e_{d_{1}d_{0}}\times(\theta_{d_{1}d_{0}}^{(11)}-\theta_{d_{1}d_{0}}^{(10)})=\text{ITT-NIE}$ under any of these four scenarios. To prove asymptotic normality, notice that $\widehat{\text{ITT-NIE}}^{\text{mr}}$ can be re-expressed as

	$\displaystyle\widehat{\text{ITT-NIE}}^{\text{mr}}$	$\displaystyle=\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{dr}}\times(\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}})$
		$\displaystyle=\sum_{d_{1}d_{0}\in\mathcal{U}}\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{par}}(\bm{O})]\times\left(\frac{\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{par}}(\bm{O})]}{\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{par}}(\bm{O})]}-\frac{\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}(\bm{O})]}{\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{par}}(\bm{O})]}\right)$
		$\displaystyle=\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{par}}(\bm{O})-\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}(\bm{O})\right\}\right].$

Define $S_{\text{mr}}(\bm{O};\widehat{\bm{\tau}})=\sum_{d_{1}d_{0}\in\mathcal{U}}\{\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{par}}(\bm{O})-\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{par}}(\bm{O})\}$ , where $\widehat{\bm{\tau}}$ is the estimator of the parameters in the parametric nuisance working models. Then, under mild regularity conditions (similar to those listed in the proof of Theorem 4), one can deduce that

		$\displaystyle\sqrt{n}\left(\widehat{\text{ITT-NIE}}^{\text{mr}}-\text{ITT-NIE}\right)$
	$\displaystyle=$	$\displaystyle\sqrt{n}\mathbb{P}_{n}\left\{S_{\text{mr}}(\bm{O};{\bm{\tau}}^{*})-\text{ITT-NIE}+\mathbb{E}\left[\frac{\partial}{\partial\bm{\tau}^{*}}S_{\text{mr}}(\bm{O};{\bm{\tau}}^{*})\right]\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})\right\}+o_{p}(1),$

where $\bm{\tau}^{*}$ is the probability limit of $\widehat{\bm{\tau}}$ and $\text{IF}_{\bm{\tau}}(\bm{O};\bm{\tau}^{*})$ is the influence function of $\widehat{\bm{\tau}}$ . This confirms that $\widehat{\text{ITT-NIE}}^{\text{mr}}$ is asymptotically normal. Under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , we can verify that $\widehat{\text{ITT-NIE}}^{\text{mr}}$ is semiparametrically efficient by observing that $S_{\text{mr}}(\bm{O};{\bm{\tau}}^{*})=\sum_{d_{1}d_{0}\in\mathcal{U}}\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\}$ and $\mathbb{E}\left[\frac{\partial}{\partial\bm{\tau}^{*}}S_{\text{mr}}(\bm{O};{\bm{\tau}}^{*})\right]=0$ such that

	$\displaystyle\sqrt{n}\left(\widehat{\text{ITT-NIE}}^{\text{mr}}-\text{ITT-NIE}\right)$	$\displaystyle=\sqrt{n}\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}-\text{ITT-NIE}\right]+o_{p}(1)$
		$\displaystyle=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}^{\text{ITT-NIE}}(\bm{O})\right]+o_{p}(1),$

where the second equality follows from Lemma S11. This suggests that $\widehat{\text{ITT-NIE}}^{\text{mr}}$ is locally efficient. Using a similar strategy, one can prove that $\widehat{\text{ITT-NDE}}^{\text{mr}}$ is quadruply robust, asymptotically normal, and locally efficient. $\square$
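
The re-expression of $\widehat{\text{ITT-NIE}}^{\text{mr}}$ used in this proof relies on the stratum weight $\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}]$ cancelling the denominator of each ratio estimator. The Python sketch below checks this cancellation numerically. The stratum labels and the simulated arrays are hypothetical placeholders for the estimated $\widehat{\psi}_{d_{1}d_{0}}^{(11)}$, $\widehat{\psi}_{d_{1}d_{0}}^{(10)}$ and $\widehat{\delta}_{d_{1}d_{0}}$ values, not output from any fitted model.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
strata = ["10", "00", "11"]  # hypothetical principal-stratum labels

# Simulated per-observation placeholders for each stratum.
psi11 = {s: rng.normal(1.0, 1.0, size=n) for s in strata}
psi10 = {s: rng.normal(0.5, 1.0, size=n) for s in strata}
delta = {s: rng.uniform(0.2, 0.6, size=n) for s in strata}

# Weighted form: sum over strata of e_hat * (theta_hat_11 - theta_hat_10).
itt_weighted = sum(
    delta[s].mean()
    * (psi11[s].mean() / delta[s].mean() - psi10[s].mean() / delta[s].mean())
    for s in strata
)

# Collapsed form: sample mean of the summed psi differences.
itt_collapsed = np.mean(sum(psi11[s] - psi10[s] for s in strata))
```

The two forms agree up to floating-point error, which is why the ITT estimator can be analyzed as a single sample average.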

In parallel to results in Section D.6, the following proposition demonstrates the properties of the nonparametric estimator of the mediation effects.

Proposition S6

For any $\tau\in\{\text{PNIE}_{d_{1}d_{0}},\text{PNDE}_{d_{1}d_{0}},\text{ITT-NIE},\text{ITT-NDE}\}$ , $\widehat{\tau}^{\text{np}}$ is consistent if any three of the four nuisance functions in $\widehat{h}_{nuisance}^{\text{np}}$ are consistently estimated in the $L_{2}(\mathbb{P})$ -norm. Furthermore, if all elements in $\widehat{h}_{nuisance}^{\text{np}}$ are consistent in the $L_{2}(\mathbb{P})$ -norm and $\|\widehat{l}^{\text{np}}-l\|\times\|\widehat{g}^{\text{np}}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$ , then $\widehat{\tau}^{\text{np}}$ is asymptotically normal and semiparametrically efficient.

Proof.

Following the proof of Proposition S5, one can show that $\widehat{\tau}^{\text{np}}$ is consistent for $\tau$ , for $\tau\in\{\text{PNIE}_{d_{1}d_{0}},\text{PNDE}_{d_{1}d_{0}},\text{ITT-NIE},\text{ITT-NDE}\}$ , if any three of the four functions in $\widehat{h}_{nuisance}^{\text{np}}=\{\widehat{\pi}_{z}^{\text{np}}({\bm{x}}),\widehat{p}_{zd}^{\text{np}}({\bm{x}}),\widehat{r}_{zd}^{\text{np}}(m,{\bm{x}}),\widehat{\mu}_{zd}^{\text{np}}(m,{\bm{x}})\}$ are consistently estimated. Here, we only prove the asymptotic normality and local efficiency of $\widehat{\tau}^{\text{np}}$ when $\|\widehat{l}^{\text{np}}-l\|\times\|\widehat{g}^{\text{np}}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$ .

We show in the proof of Theorem 5 that

\sqrt{n}(\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}-\theta_{d_{1}d_{0}}^{(zz^{\prime})})=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1),

when $\|\widehat{l}^{\text{np}}-l\|\times\|\widehat{g}^{\text{np}}-g\|=o_{p}(n^{-1/2})$ for any $l\neq g\in\{\pi_{z}({\bm{x}}),p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}}),\mu_{zd}(m,{\bm{x}})\}$ . Therefore,

	$\displaystyle\sqrt{n}(\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{np}}-\text{PNIE}_{d_{1}d_{0}})$	$\displaystyle=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\text{PNIE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1)$
		$\displaystyle=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O})\right]+o_{p}(1),$
	$\displaystyle\sqrt{n}(\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{np}}-\text{PNDE}_{d_{1}d_{0}})$	$\displaystyle=\sqrt{n}\mathbb{P}_{n}\left[\frac{\psi_{d_{1}d_{0}}^{(10)}(\bm{O})-\psi_{d_{1}d_{0}}^{(00)}(\bm{O})-\text{PNDE}_{d_{1}d_{0}}\times\delta_{d_{1}d_{0}}(\bm{O})}{p_{z^{*}d^{*}}-kp_{01}}\right]+o_{p}(1)$
		$\displaystyle=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}_{d_{1}d_{0}}^{\text{PNDE}}(\bm{O})\right]+o_{p}(1).$

This implies that $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{np}}$ and $\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{np}}$ are asymptotically normal and semiparametrically efficient under the required convergence rate conditions for the nuisance function estimates. Also, we show in the proof of Theorem 5 that $\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(zz^{\prime}),\text{np}}(\bm{O})]=\mathbb{P}_{n}[\psi_{d_{1}d_{0}}^{(zz^{\prime})}(\bm{O})]+o_{p}(n^{-1/2})$ , which suggests that

	$\displaystyle\widehat{\text{ITT-NIE}}^{\text{np}}$	$\displaystyle=\sum_{d_{1}d_{0}\in\mathcal{U}}\widehat{e}_{d_{1}d_{0}}^{\text{np}}\times(\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{np}}-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{np}})$
		$\displaystyle=\sum_{d_{1}d_{0}\in\mathcal{U}}\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})]\times\left(\frac{\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{np}}(\bm{O})]}{\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})]}-\frac{\mathbb{P}_{n}[\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{np}}(\bm{O})]}{\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})]}\right)$
		$\displaystyle=\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\widehat{\psi}_{d_{1}d_{0}}^{(11),\text{np}}(\bm{O})-\widehat{\psi}_{d_{1}d_{0}}^{(10),\text{np}}(\bm{O})\right\}\right]$
		$\displaystyle=\mathbb{P}_{n}\left[\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}\right]+o_{p}(n^{-1/2}),$

and thus $\sqrt{n}\left(\widehat{\text{ITT-NIE}}^{\text{np}}-\text{ITT-NIE}\right)=\sqrt{n}\mathbb{P}_{n}\left[\displaystyle\sum_{d_{1}d_{0}\in\mathcal{U}}\left\{\psi_{d_{1}d_{0}}^{(11)}(\bm{O})-\psi_{d_{1}d_{0}}^{(10)}(\bm{O})\right\}-\text{ITT-NIE}\right]+o_{p}(1)=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}^{\text{ITT-NIE}}(\bm{O})\right]+o_{p}(1)$ . Similarly, we can show $\sqrt{n}\left(\widehat{\text{ITT-NDE}}^{\text{np}}-\text{ITT-NDE}\right)=\sqrt{n}\mathbb{P}_{n}\left[\mathcal{D}^{\text{ITT-NDE}}(\bm{O})\right]+o_{p}(1)$ . This implies that $\widehat{\text{ITT-NIE}}^{\text{np}}$ and $\widehat{\text{ITT-NDE}}^{\text{np}}$ are asymptotically normal and semiparametrically efficient under the required convergence rate conditions for the nonparametric nuisance function estimators. $\square$
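The cancellation of $\mathbb{P}_{n}[\widehat{\delta}_{d_{1}d_{0}}^{\text{np}}(\bm{O})]$ in the display for $\widehat{\text{ITT-NIE}}^{\text{np}}$ is purely algebraic and can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names, the three stratum labels, and the simulated per-observation values are all hypothetical placeholders standing in for the estimated $\widehat{\psi}$ and $\widehat{\delta}$ terms.

```python
import random


def mean(values):
    """Sample mean, playing the role of the empirical average P_n."""
    return sum(values) / len(values)


def itt_nie_two_ways(psi11, psi10, delta):
    """Return the ITT-NIE estimate computed by two algebraically equal routes.

    Each argument is a dict mapping a stratum label to a list of
    per-observation values (hypothetical inputs for illustration).
    """
    strata = list(delta)
    n = len(delta[strata[0]])
    # Route 1: stratum proportion times the ratio-form stratum effect.
    weighted = sum(
        mean(delta[u])
        * (mean(psi11[u]) / mean(delta[u]) - mean(psi10[u]) / mean(delta[u]))
        for u in strata
    )
    # Route 2: sample mean of the per-observation differences summed over strata.
    pooled = sum(
        sum(psi11[u][i] - psi10[u][i] for u in strata) for i in range(n)
    ) / n
    return weighted, pooled


def simulate_inputs(n=200, seed=0):
    """Generate arbitrary placeholder values; they carry no causal meaning."""
    rng = random.Random(seed)
    strata = ["10", "00", "11"]
    psi11 = {u: [rng.gauss(1.0, 1.0) for _ in range(n)] for u in strata}
    psi10 = {u: [rng.gauss(0.5, 1.0) for _ in range(n)] for u in strata}
    delta = {u: [rng.uniform(0.1, 0.6) for _ in range(n)] for u in strata}
    return psi11, psi10, delta
```

Calling `itt_nie_two_ways(*simulate_inputs())` should return two values that agree up to floating-point error, whatever inputs are supplied, provided each stratum's average of `delta` is nonzero.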

Remark 5

(Variance estimation of the principal and ITT mediation effects) For the purpose of inference, nonparametric bootstrap can be used for the moment-type and multiply robust estimators. The asymptotic variance of the nonparametric efficient estimators can be obtained by using the empirical variance of the estimated EIF given in Lemma S11. For example, the asymptotic variance of $\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{np}}$ can be estimated by

\widehat{\text{Var}}\left(\widehat{\text{PNIE}}_{d_{1}d_{0}}^{\text{np}}\right)=\frac{1}{n}\mathbb{P}_{n}\left[\{\widehat{\mathcal{D}}_{d_{1}d_{0}}^{\text{PNIE,np}}(\bm{O})\}^{2}\right],

where $\widehat{\mathcal{D}}_{d_{1}d_{0}}^{\text{PNIE,np}}(\bm{O})$ is $\mathcal{D}_{d_{1}d_{0}}^{\text{PNIE}}(\bm{O})$ evaluated based on the nonparametric estimator of the nuisance functions, $\widehat{h}_{nuisance}^{\text{np}}$ . The variance estimator of $\widehat{\text{PNDE}}_{d_{1}d_{0}}^{\text{np}}$ , $\widehat{\text{ITT-NIE}}^{\text{np}}$ , and $\widehat{\text{ITT-NDE}}^{\text{np}}$ can be similarly obtained.
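The variance formula in Remark 5 amounts to averaging the squared estimated EIF values and dividing by $n$. A minimal sketch follows, assuming the estimated EIF has already been evaluated at each observation and stored in a list; the function names are hypothetical, and the Wald-type interval is a standard construction added here for illustration rather than something prescribed in the text.

```python
import math


def eif_variance(eif_values):
    """Variance estimate (1/n) * P_n[D_hat(O)^2] from per-observation EIF values."""
    n = len(eif_values)
    second_moment = sum(d * d for d in eif_values) / n  # P_n[D_hat^2]
    return second_moment / n


def wald_interval(estimate, eif_values, z=1.959963984540054):
    """Wald-type interval; z defaults to the 97.5% standard normal quantile."""
    se = math.sqrt(eif_variance(eif_values))
    return estimate - z * se, estimate + z * se
```

For example, `wald_interval(point_estimate, eif_values)` would give an approximate 95% interval once `point_estimate` and `eif_values` are supplied by the user's own fitting code.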

D.8 Sensitivity analysis for the principal ignorability assumption under standard monotonicity

This section provides the supporting information for the sensitivity analysis for the principal ignorability assumption, under standard monotonicity. We first present the explicit forms of the sensitivity weight $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ for all $zz^{\prime}\in\{11,10,00\}$ and $d_{1}d_{0}\in\mathcal{U}_{\text{2-sided}}$ :

	$\displaystyle w_{10}^{(11)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}\frac{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{11}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(1)}(j,{\bm{x}})p_{11}({\bm{x}})r_{11}(j,{\bm{x}})/r_{11}(0,{\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}\right\}\frac{\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(0,{\bm{x}})+\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m=0.\end{cases}$
	$\displaystyle w_{10}^{(10)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\frac{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\frac{\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(0,{\bm{x}})+\xi_{M}^{(1)}(0,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))},&\text{if }m=0.\end{cases}$
	$\displaystyle w_{10}^{(00)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(m,{\bm{x}})+\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\frac{\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(0,{\bm{x}})+\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}$
	$\displaystyle w_{00}^{(11)}(m,{\bm{x}})$	$\displaystyle=1\text{ for any $m$}.$
	$\displaystyle w_{00}^{(10)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m\geq 1,\\ \frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m=0.\end{cases}$
	$\displaystyle w_{00}^{(00)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(m,{\bm{x}})\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\frac{\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(0,{\bm{x}})\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}$
	$\displaystyle w_{11}^{(11)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})+\xi_{Y}^{(1)}(m,{\bm{x}})\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{11}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{11}({\bm{x}})r_{11}(j,{\bm{x}})/r_{11}(0,{\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}\right\}\frac{\xi_{M}^{(1)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})+\xi_{Y}^{(1)}(0,{\bm{x}})\xi_{M}^{(1)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}$
	$\displaystyle w_{11}^{(10)}(m,{\bm{x}})$	$\displaystyle=\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})+\xi_{Y}^{(1)}(m,{\bm{x}})\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\text{ for any $m$.}$
	$\displaystyle w_{11}^{(00)}(m,{\bm{x}})$	$\displaystyle=1\text{ for any $m$.}$
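As a sanity check on the weight formulas, the $m\geq 1$ branches of $w_{10}^{(11)}$ and $w_{10}^{(10)}$ can be coded directly at a fixed covariate value. The sketch below is illustrative only: the function and argument names are hypothetical, the inputs are scalars evaluated at one $(m,{\bm{x}})$, and it uses $p_{00}({\bm{x}})=1-p_{01}({\bm{x}})$ and $p_{10}({\bm{x}})=1-p_{11}({\bm{x}})$. When every sensitivity parameter equals 1, both weights reduce to 1.

```python
def w10_11(xi_m1, xi_y1, p11, p01):
    """w_10^(11)(m, x) for m >= 1, with all inputs evaluated at a fixed (m, x)."""
    e10 = p11 - p01  # complier proportion given x under standard monotonicity
    mediator_factor = xi_m1 * p11 / (xi_m1 * e10 + p01)
    outcome_factor = (xi_m1 * e10 + p01) / (p01 / xi_y1 + xi_m1 * e10)
    return mediator_factor * outcome_factor


def w10_10(xi_m0, xi_m1, xi_y1, p11, p01):
    """w_10^(10)(m, x) for m >= 1, with all inputs evaluated at a fixed (m, x)."""
    p00, p10 = 1.0 - p01, 1.0 - p11  # complements within each treatment arm
    e10 = p11 - p01
    mediator_factor = xi_m0 * p00 / (xi_m0 * e10 + p10)
    outcome_factor = (xi_m1 * e10 + p01) / (p01 / xi_y1 + xi_m1 * e10)
    return mediator_factor * outcome_factor
```

The $m=0$ branches additionally require the full mediator distributions $r_{11}(j,{\bm{x}})$ and $r_{00}(j,{\bm{x}})$ and are omitted from this sketch.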

Next, we prove Propositions S1 and S2, which include identification results and properties of the multiply robust estimator under violation of principal ignorability. We first provide two lemmas.

Lemma S12

Under Assumptions 1, 2, 3a, and 6 and the proposed confounding function $\xi=\left\{\left(\xi_{M}^{(1)}(m,{\bm{x}}),\xi_{M}^{(0)}(m,{\bm{x}})\right)\text{ for }m\geq 1\text{ and }\left(\xi_{Y}^{(1)}(m,{\bm{x}}),\xi_{Y}^{(0)}(m,{\bm{x}})\right)\text{ for }m\geq 0\right\}$, we can nonparametrically identify the distribution of $M_{z}|\{U=d_{1}d_{0},{\bm{X}}\}$ for all $z\in\{1,0\}$ and $d_{1}d_{0}\in\mathcal{U}_{\text{2-sided}}$. Specifically, we have that $f_{M_{0}|U,{\bm{X}}}(m|11,{\bm{x}})=r_{01}(m,{\bm{x}})$ and $f_{M_{1}|U,{\bm{X}}}(m|00,{\bm{x}})=r_{10}(m,{\bm{x}})$ for any $m=0,\dots,m_{\max}$ and

	$\displaystyle f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})$	$\displaystyle=\frac{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(m,{\bm{x}}),$
	$\displaystyle f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{x}})$	$\displaystyle=\frac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(m,{\bm{x}}),$
	$\displaystyle f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})$	$\displaystyle=\frac{p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(m,{\bm{x}}),$
	$\displaystyle f_{M_{0}|U,{\bm{X}}}(m|00,{\bm{x}})$	$\displaystyle=\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(m,{\bm{x}}),$

for any $m=1,\dots,m_{\max}$ . This suggests that, for $m=0$ ,

	$\displaystyle f_{M_{1}|U,{\bm{X}}}(0|10,{\bm{x}})$	$\displaystyle=1-\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(1)}(j,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}}),$
	$\displaystyle f_{M_{0}|U,{\bm{X}}}(0|10,{\bm{x}})$	$\displaystyle=1-\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}}),$
	$\displaystyle f_{M_{1}|U,{\bm{X}}}(0|11,{\bm{x}})$	$\displaystyle=1-\sum_{j=1}^{m_{\max}}\frac{p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}}),$
	$\displaystyle f_{M_{0}|U,{\bm{X}}}(0|00,{\bm{x}})$	$\displaystyle=1-\sum_{j=1}^{m_{\max}}\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}}).$

Proof.

We first show $f_{M_{0}|U,{\bm{X}}}(m|11,{\bm{x}})=r_{01}(m,{\bm{x}})$ for all $m\in\{0,\dots,m_{\max}\}$ and similar results extend to $f_{M_{1}|U,{\bm{X}}}(m|00,{\bm{x}})=r_{10}(m,{\bm{x}})$ . Specifically, for any $m\in\{0,\dots,m_{\max}\}$ , we have that

	$\displaystyle f_{M_{0}|U,{\bm{X}}}(m|11,{\bm{x}})$	$\displaystyle=f_{M_{0}|Z,U,{\bm{X}}}(m|0,11,{\bm{x}})$
		(by Lemma S5 and Lemma S2)
		$\displaystyle=f_{M_{0}|Z,D,U,{\bm{X}}}(m|0,1,11,{\bm{x}})$
		(because $D$ must be 1 given $Z=0$ and $U=11$ )
		$\displaystyle=f_{M_{0}|Z,D,{\bm{X}}}(m|0,1,{\bm{x}})$
		(because the observed stratum with $Z=0$ and $D=1$ only contains the always-takers)
		$\displaystyle=r_{01}(m,{\bm{x}}).$

Next, we derive the expression of $f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})$ . Leveraging the same strategy, one can deduce the expressions of $f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{x}})$ , $f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})$ , and $f_{M_{0}|U,{\bm{X}}}(m|00,{\bm{x}})$ . For $m\geq 1$ , we have that

	$\displaystyle r_{11}(m,{\bm{x}})=$	$\displaystyle f_{M|Z,D,{\bm{X}}}(m|1,1,{\bm{x}})$
	$\displaystyle=$	$\displaystyle f_{M_{1}|Z,D,U,{\bm{X}}}(m|1,1,10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})+f_{M_{1}|Z,D,U,{\bm{X}}}(m|1,1,11,{\bm{x}})f_{U|Z,D,{\bm{X}}}(11|1,1,{\bm{x}})$
		(by the law of total probability, since $U=10$ or $11$ in the observed stratum $(Z=1,D=1)$ under standard monotonicity)
	$\displaystyle=$	$\displaystyle f_{M_{1}|Z,U,{\bm{X}}}(m|1,10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})+f_{M_{1}|Z,U,{\bm{X}}}(m|1,11,{\bm{x}})f_{U|Z,D,{\bm{X}}}(11|1,1,{\bm{x}})$
		(because $D$ must be 1 given $Z=1$ and either $U=10$ or $11$ )
	$\displaystyle=$	$\displaystyle f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})+f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})f_{U|Z,D,{\bm{X}}}(11|1,1,{\bm{x}})$
		(by Lemma S5 and Lemma S2)
	$\displaystyle=$	$\displaystyle f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}+f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})\frac{p_{01}({\bm{x}})}{p_{11}({\bm{x}})}$
	$\displaystyle=$	$\displaystyle f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}+f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}$
		(because $\xi_{M}^{(1)}(m,{\bm{x}})=f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})/f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})$ )
	$\displaystyle=$	$\displaystyle f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})},$

which indicates $f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})=\frac{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}r_{11}(m,{\bm{x}})$ for $m\in\{1,\dots,m_{\max}\}$. Because $\sum_{j=0}^{m_{\max}}f_{M_{1}|U,{\bm{X}}}(j|10,{\bm{x}})=1$, we obtain

f_{M_{1}|U,{\bm{X}}}(0|10,{\bm{x}})=1-\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(1)}(j,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}}).

$\square$
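The identification formula for $f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})$ in Lemma S12 can be verified on a toy example at a fixed ${\bm{x}}$. The sketch below is a hedged illustration with hypothetical names: it builds the observed mixture $r_{11}(m,{\bm{x}})$ from assumed complier and always-taker mediator distributions, and then recovers the complier distribution from $r_{11}$, the sensitivity parameters, and $(p_{11},p_{01})$ alone.

```python
def mix_r11(f10, f11, p11, p01):
    """Observed mediator distribution in (Z=1, D=1) as a two-stratum mixture."""
    share10 = (p11 - p01) / p11  # share of compliers in the observed stratum
    share11 = p01 / p11          # share of always-takers in the observed stratum
    return [share10 * a + share11 * b for a, b in zip(f10, f11)]


def recover_f10(r11, xi_m1, p11, p01):
    """Recover f_{M_1|U,X}(m | 10, x) for m = 0..m_max.

    r11 is a list indexed by m; xi_m1 maps each m >= 1 to xi_M^(1)(m, x).
    """
    e10 = p11 - p01
    f10 = [0.0] * len(r11)
    for m in range(1, len(r11)):
        xi = xi_m1[m]
        f10[m] = xi * p11 / (xi * e10 + p01) * r11[m]
    f10[0] = 1.0 - sum(f10[1:])  # probabilities over m = 0..m_max sum to one
    return f10
```

Here the sensitivity parameter is computed from the assumed truth as $f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})/f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})$; in practice it is unknown and is varied over a plausible range.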

Lemma S13

Under Assumptions 1, 2, 3a, and 6 and the proposed confounding function $\xi=\left\{\left(\xi_{M}^{(1)}(m,{\bm{x}}),\xi_{M}^{(0)}(m,{\bm{x}})\right)\text{ for }m\geq 1\text{ and }\left(\xi_{Y}^{(1)}(m,{\bm{x}}),\xi_{Y}^{(0)}(m,{\bm{x}})\right)\text{ for }m\geq 0\right\}$ , we can nonparametrically identify the $\mathbb{E}_{Y_{zm}|U,{\bm{X}}}(Y_{zm}|d_{1}d_{0},{\bm{x}})$ for all $z\in\{1,0\}$ , $m\in\{0,\dots,m_{\max}\}$ , and $d_{1}d_{0}\in\mathcal{U}_{\text{2-sided}}$ . Specifically, we have that

	$\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]$	$\displaystyle=\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{11}(m,{\bm{x}}),$
	$\displaystyle\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|10,{\bm{x}}]$	$\displaystyle=\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(m,{\bm{x}})+\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{00}(m,{\bm{x}}),$
	$\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|11,{\bm{x}}]$	$\displaystyle=\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})+\xi_{Y}^{(1)}(m,{\bm{x}})\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{11}(m,{\bm{x}}),$
	$\displaystyle\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|11,{\bm{x}}]$	$\displaystyle=\mu_{01}(m,{\bm{x}}),$
	$\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|00,{\bm{x}}]$	$\displaystyle=\mu_{10}(m,{\bm{x}}),$
	$\displaystyle\mathbb{E}_{Y_{0m}|U,{\bm{X}}}[Y_{0m}|00,{\bm{x}}]$	$\displaystyle=\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(m,{\bm{x}})\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{00}(m,{\bm{x}}),$

for any $m=0,\dots,m_{\max}$, where $\xi_{M}^{(1)}(m,{\bm{x}})$ and $\xi_{M}^{(0)}(m,{\bm{x}})$ are given in $\xi$ for $m\geq 1$ and, for $m=0$,

	$\displaystyle\xi_{M}^{(1)}(0,{\bm{x}})$	$\displaystyle=:\frac{f_{M_{1}|U,{\bm{X}}}(0|10,{\bm{x}})}{f_{M_{1}|U,{\bm{X}}}(0|11,{\bm{x}})}=\frac{1-\sum_{j=1}^{m_{\max}}\displaystyle\frac{\xi_{M}^{(1)}(j,{\bm{x}})p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}})}{1-\sum_{j=1}^{m_{\max}}\displaystyle\frac{p_{11}({\bm{x}})}{\xi_{M}^{(1)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{01}({\bm{x}})}r_{11}(j,{\bm{x}})},$
	$\displaystyle\xi_{M}^{(0)}(0,{\bm{x}})$	$\displaystyle=:\frac{f_{M_{0}|U,{\bm{X}}}(0|10,{\bm{x}})}{f_{M_{0}|U,{\bm{X}}}(0|00,{\bm{x}})}=\frac{1-\sum_{j=1}^{m_{\max}}\displaystyle\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}})}{1-\sum_{j=1}^{m_{\max}}\displaystyle\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}r_{00}(j,{\bm{x}})}.$

Proof.

We first show $\mathbb{E}_{Y_{0m}|U,{\bm{X}}}(Y_{0m}|11,{\bm{x}})=\mu_{01}(m,{\bm{x}})$ and similar result extends to $\mathbb{E}_{Y_{1m}|U,{\bm{X}}}(Y_{1m}|00,{\bm{x}})=\mu_{10}(m,{\bm{x}})$ . Specifically, one can verify

	$\displaystyle\mathbb{E}_{Y_{0m}|U,{\bm{X}}}(Y_{0m}|11,{\bm{x}})$	$\displaystyle=\mathbb{E}_{Y_{0m}|Z,U,{\bm{X}}}(Y_{0m}|0,11,{\bm{x}})$
		(by Lemma S5 and Lemma S2)
		$\displaystyle=\mathbb{E}_{Y_{0m}|Z,M,U,{\bm{X}}}(Y_{0m}|0,m,11,{\bm{x}})$
		(by Assumption 5)
		$\displaystyle=\mathbb{E}_{Y_{0m}|Z,D,M,U,{\bm{X}}}(Y_{0m}|0,1,m,11,{\bm{x}})$
		(because $D$ must be 1 given $Z=0$ and $U=11$ )
		$\displaystyle=\mathbb{E}_{Y_{0m}|Z,D,M,{\bm{X}}}(Y_{0m}|0,1,m,{\bm{x}})$
		(because the observed stratum with $Z=0$ and $D=1$ only contains the always-takers)
		$\displaystyle=\mu_{01}(m,{\bm{x}}).$

Next, we derive the expression of $\mathbb{E}_{Y_{1m}|U,{\bm{X}}}(Y_{1m}|10,{\bm{x}})$ . Notice that the expressions of $\mathbb{E}_{Y_{0m}|U,{\bm{X}}}(Y_{0m}|10,{\bm{x}})$ , $\mathbb{E}_{Y_{1m}|U,{\bm{X}}}(Y_{1m}|11,{\bm{x}})$ , and $\mathbb{E}_{Y_{0m}|U,{\bm{X}}}(Y_{0m}|00,{\bm{x}})$ can be similarly obtained. Specifically, for any $m\geq 0$ ,

		$\displaystyle\mu_{11}(m,{\bm{x}})$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y|Z,D,M,{\bm{X}}}[Y|1,1,m,{\bm{x}}]=\mathbb{E}_{Y_{1m}|Z,D,M_{1},{\bm{X}}}[Y_{1m}|1,1,m,{\bm{x}}]$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}|Z,D,M_{1},U,{\bm{X}}}[Y_{1m}|1,1,m,10,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})$
		$\displaystyle+\mathbb{E}_{Y_{1m}|Z,D,M_{1},U,{\bm{X}}}[Y_{1m}|1,1,m,11,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}})$
		(by the law of iterated expectations, since $U=10$ or $11$ in the observed stratum $(Z=1,D=1)$ under standard monotonicity)
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}|Z,M_{1},U,{\bm{X}}}[Y_{1m}|1,m,10,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})+\mathbb{E}_{Y_{1m}|Z,M_{1},U,{\bm{X}}}[Y_{1m}|1,m,11,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}})$
		(because $D$ must be 1 given $Z=1$ and either $U=10$ or $11$ )
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|1,10,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})+\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|1,11,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}})$
		(by Assumption 5)
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})+\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|11,{\bm{x}}]f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}}),$
		(by Lemma S5 and Lemma S2)

where

	$\displaystyle f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})=$	$\displaystyle\frac{f_{M_{1}|Z,D,U,{\bm{X}}}(m|1,1,10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})}{\displaystyle\sum_{u\in\{10,11\}}f_{M_{1}|Z,D,U,{\bm{X}}}(m|1,1,u,{\bm{x}})f_{U|Z,D,{\bm{X}}}(u|1,1,{\bm{x}})}$
	$\displaystyle=$	$\displaystyle\frac{f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})f_{U|Z,D,{\bm{X}}}(10|1,1,{\bm{x}})}{\displaystyle\sum_{u\in\{10,11\}}f_{M_{1}|U,{\bm{X}}}(m|u,{\bm{x}})f_{U|Z,D,{\bm{X}}}(u|1,1,{\bm{x}})}$
	$\displaystyle=$	$\displaystyle\frac{f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\displaystyle\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}}{f_{M_{1}|U,{\bm{X}}}(m|10,{\bm{x}})\displaystyle\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}+f_{M_{1}|U,{\bm{X}}}(m|11,{\bm{x}})\displaystyle\frac{p_{01}({\bm{x}})}{p_{11}({\bm{x}})}}$
	$\displaystyle=$	$\displaystyle\frac{\displaystyle\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}}{\displaystyle\frac{p_{11}({\bm{x}})-p_{01}({\bm{x}})}{p_{11}({\bm{x}})}+\displaystyle\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})p_{11}({\bm{x}})}}=\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})},$

and

	$\displaystyle f_{U|Z,D,M_{1},{\bm{X}}}(11|1,1,m,{\bm{x}})$	$\displaystyle=1-f_{U|Z,D,M_{1},{\bm{X}}}(10|1,1,m,{\bm{x}})$
		$\displaystyle=\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}.$

This suggests that

		$\displaystyle\mu_{11}(m,{\bm{x}})$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})\!-\!p_{01}({\bm{x}})\right)}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})\!-\!p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}\!+\!\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|11,{\bm{x}}]\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})\!-\!p_{01}({\bm{x}})\right)}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})\!-\!p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}\!+\!\frac{\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]}{\xi_{Y}^{(1)}(m,{\bm{x}})}\frac{p_{01}({\bm{x}})}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]\times\frac{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}$

and thus

\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{x}}]=\frac{\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{01}({\bm{x}})}{p_{01}({\bm{x}})/\xi_{Y}^{(1)}(m,{\bm{x}})+\xi_{M}^{(1)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)}\mu_{11}(m,{\bm{x}}).

This completes the proof. $\square$
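The outcome-mean formulas of Lemma S13 admit the same kind of numerical check at a fixed $(m,{\bm{x}})$. The sketch below uses hypothetical function names and assumed true stratum-specific means; it forms $\mu_{11}(m,{\bm{x}})$ as the mixture derived in the proof and then recovers the complier and always-taker means from $\mu_{11}$, $\xi_{M}^{(1)}$, $\xi_{Y}^{(1)}$, $p_{11}$, and $p_{01}$.

```python
def mix_mu11(mean10, mean11, xi_m1, p11, p01):
    """mu_11(m, x) as a mixture of stratum means, weighted by f_{U|Z,D,M_1,X}."""
    e10 = p11 - p01
    weight10 = xi_m1 * e10 / (xi_m1 * e10 + p01)  # f_U(10 | Z=1, D=1, M_1=m, x)
    return weight10 * mean10 + (1.0 - weight10) * mean11


def recover_mean10(mu11, xi_m1, xi_y1, p11, p01):
    """E[Y_{1m} | U=10, x] from observed-data quantities and sensitivity parameters."""
    e10 = p11 - p01
    return (xi_m1 * e10 + p01) / (p01 / xi_y1 + xi_m1 * e10) * mu11


def recover_mean11(mu11, xi_m1, xi_y1, p11, p01):
    """E[Y_{1m} | U=11, x] from observed-data quantities and sensitivity parameters."""
    e10 = p11 - p01
    return (xi_m1 * e10 + p01) / (p01 + xi_y1 * xi_m1 * e10) * mu11
```

In this check $\xi_{Y}^{(1)}$ is set to the ratio of the assumed complier mean to the assumed always-taker mean, which is the ratio implied by the substitution step in the proof.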

Next, we prove the nonparametric formulas $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ in Proposition S1.

Proof of Proposition S1. We will derive the nonparametric identification formula for $\theta_{10}^{(10)}$ and omit the similar proofs for all other $\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ since the steps are similar. By the definition of $\theta_{10}^{(10)}$ , we have that

	$\displaystyle\theta_{10}^{(10)}$	$\displaystyle=\mathbb{E}[Y_{1M_{0}}|U=10]$
		$\displaystyle=\mathbb{E}\left[\mathbb{E}[Y_{1M_{0}}|U=10,{\bm{X}}]\Big{|}U=10\right]\quad\text{(by law of iterated expectations)}$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|M_{0},U,{\bm{X}}}[Y_{1m}|m,10,{\bm{X}}]f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{X}})\Big{|}U=10\right]$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|Z,M_{0},U,{\bm{X}}}[Y_{1m}|1,m,10,{\bm{X}}]f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{X}})\Big{|}U=10\right]$
		(by Lemma S5 coupled with Lemma S2)
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|1,10,{\bm{X}}]f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{X}})\Big{|}U=10\right]\quad\quad\text{(by Assumption 5)}$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}|U,{\bm{X}}}[Y_{1m}|10,{\bm{X}}]f_{M_{0}|U,{\bm{X}}}(m|10,{\bm{X}})\Big{|}U=10\right]\quad\text{(by Lemma \ref{lemma:randomization2})}$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}w_{10}^{(10)}(m,{\bm{X}})\mu_{11}(m,{\bm{X}})r_{00}(m,{\bm{X}})\Big{|}U=10\right]\quad\quad\quad\text{(by Lemmas S12 and S13)}$
		$\displaystyle=\int_{{\bm{x}}}\frac{f_{U|{\bm{X}}}(10|{\bm{x}})}{f_{U}(10)}\left\{\sum_{m=0}^{m_{\max}}w_{10}^{(10)}(m,{\bm{x}})\mu_{11}(m,{\bm{x}})r_{00}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\quad\text{(by Lemma \ref{lemma:expectation})}$
		$\displaystyle=\int_{{\bm{x}}}\frac{e_{10}({\bm{x}})}{e_{10}}\left\{\sum_{m=0}^{m_{\max}}w_{10}^{(10)}(m,{\bm{x}})\mu_{11}(m,{\bm{x}})r_{00}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}}).$

This completes the proof. $\square$
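When the covariate distribution is discrete, the final expression for $\theta_{10}^{(10)}$ is a finite weighted sum, which makes its structure easy to see. The sketch below is a hypothetical plug-in illustration, not an estimator proposed in the text: each covariate cell is a dictionary holding its probability `px`, the conditional stratum proportion `e10x` (standing for $e_{10}({\bm{x}})$), and lists over $m=0,\dots,m_{\max}$ of the weight $w_{10}^{(10)}$, the outcome mean $\mu_{11}$, and the mediator probability $r_{00}$. All key names are assumptions made for this example.

```python
def theta_10_10(cells):
    """Plug-in value of theta_10^(10) over a discrete covariate distribution."""
    # Marginal stratum proportion e_10 = E[e_10(X)].
    e10 = sum(c["px"] * c["e10x"] for c in cells)
    total = 0.0
    for c in cells:
        # Inner sum over mediator values: w * mu_11 * r_00.
        inner = sum(w * mu * r for w, mu, r in zip(c["w"], c["mu11"], c["r00"]))
        # Reweight the covariate cell by e_10(x) / e_10.
        total += c["px"] * c["e10x"] / e10 * inner
    return total
```

The reweighting by `e10x / e10` reflects the change from the marginal covariate distribution to the covariate distribution within the stratum $U=10$.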

Finally, we prove properties of the multiply robust estimator $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime})}(\xi)$ in Proposition S2.

Proof of Proposition S2. Following the notation in the proof of Theorem 4, we let $\widetilde{h}_{nuisance}=\{\widetilde{\pi}_{z}({\bm{x}}),\widetilde{p}_{zd}({\bm{x}}),\widetilde{r}_{zd}(m,{\bm{x}}),\widetilde{\mu}_{zd}(m,{\bm{x}})\}$ be the probability limit of $\widehat{h}_{nuisance}^{\text{par}}$, where $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$, $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$, $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$, and $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ under $\mathcal{M}_{\pi}$, $\mathcal{M}_{e}$, $\mathcal{M}_{m}$, and $\mathcal{M}_{o}$, respectively. Under the condition of either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, we always have $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$ and $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$. Also, because the sensitivity weight $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ only depends on the confounding function $\xi$ and the observed-data nuisance functions $\{p_{zd}({\bm{x}}),r_{zd}(m,{\bm{x}})\}$, we can confirm that the probability limit of $\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime}),\text{par}}(m,{\bm{x}})$ is equal to $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$. In addition, we can show that the probability limit of $\widehat{p}_{zd}^{\text{dr}}$, denoted by $\widetilde{p}_{zd}$, is equal to the true value $p_{zd}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$, because $\widehat{p}_{zd}^{\text{dr}}$ is a doubly robust estimator under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$. The previous discussion suggests that the probability limit of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(\xi)$ is

	$\displaystyle\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=$	$\displaystyle\mathbb{E}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\widetilde{\eta}_{zz^{\prime}}^{w}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}$
		$\displaystyle+\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\left\{Y-\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right\}$
		$\displaystyle+\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})-\widetilde{\eta}_{zz^{\prime}}^{w}({\bm{X}})\right\}$
		$\displaystyle+\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\widetilde{\eta}_{zz^{\prime}}^{w}({\bm{X}})\Big{\}}$

under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , where $\widetilde{\eta}_{zz^{\prime}}^{w}({\bm{X}})=\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}})\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})$ . In what follows, we show that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ under Scenario I ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ ) or Scenario II ( $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ), which concludes the double robustness of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(\xi)$ .

Scenario I ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ ):

In Scenario I, $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$ but generally $\widetilde{\mu}_{zd}(m,{\bm{x}})\neq\mu_{zd}(m,{\bm{x}})$ . Therefore, we can rewrite $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\sum_{j=1}^{4}\Delta_{j}$ , where

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\frac{\displaystyle\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}})\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})Y\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}-\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\right\}w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\widetilde{\mu}_{zd_{z}}(M,{\bm{X}})\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{1-\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\right\}\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}})\widetilde{\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\right].$

One can show that $\Delta_{1}=\Delta_{3}=\Delta_{4}=0$ by using the law of iterated expectation and

	$\displaystyle\Delta_{2}=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{1}{p_{zd_{z}}({\bm{x}})\pi_{z}({\bm{x}})}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\int_{y}y\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})r_{zd_{z}}(m,{\bm{x}})\right\}$
		$\displaystyle\quad f_{D|Z,{\bm{X}}}(d_{z}|z,{\bm{x}})f_{Z|{\bm{X}}}(z|{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\int_{y}y\text{d}\mathbb{P}_{Y|Z,D,M,{\bm{X}}}(y|z,d_{z},m,{\bm{x}})r_{zd_{z}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})}{r_{zd_{z}}(m,{\bm{x}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\mu_{zd_{z}}(m,{\bm{x}})r_{zd_{z}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\mu_{zd_{z}}(m,{\bm{x}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\frac{e_{d_{1}d_{0}}({\bm{x}})}{e_{d_{1}d_{0}}}\sum_{m=0}^{m_{\max}}\left\{w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\mu_{zd_{z}}(m,{\bm{x}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\theta_{d_{1}d_{0}}^{(zz^{\prime})},$

which suggests that $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\sum_{j=1}^{4}\Delta_{j}=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ .

Scenario II ( $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ):

In Scenario II, $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ but generally $\widetilde{\pi}_{z}({\bm{x}})\neq\pi_{z}({\bm{x}})$ . Therefore, we have $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\sum_{j=1}^{4}\Delta_{j}$ , where

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\displaystyle\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}}){\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\widetilde{\pi}_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\widetilde{\pi}_{z^{\prime}}({\bm{X}})}\left\{w_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\mu_{zd_{z}}(M,{\bm{X}})-\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}}){\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{X}}){\mu}_{zd_{z}}(m,{\bm{X}})r_{z^{\prime}d_{z^{\prime}}}(m,{\bm{X}})\right].$

Noting that $\Delta_{1}=\Delta_{2}=\Delta_{3}=0$ by the law of iterated expectations and $\Delta_{4}=\theta_{d_{1}d_{0}}^{(zz^{\prime})}$ as shown in Proposition S1, we obtain $\theta^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

Up to this point, we have confirmed that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ is consistent for $\theta^{(zz^{\prime})}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ . Then, under mild regularity conditions (similar to those listed in the proof of Theorem 4), one can show that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ is also asymptotically normal, in the sense that $\sqrt{n}\left(\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)-\theta_{d_{1}d_{0}}^{(zz^{\prime})}\right)$ converges to a zero-mean normal distribution with finite variance under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ . The proof is omitted for brevity. $\square$

D.9 Sensitivity analysis for the principal ignorability assumption under strong monotonicity

We can adapt $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(\xi)$ to address strong monotonicity, following a similar procedure shown in Section D.8. Under strong monotonicity, the proposed multiply robust estimator $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(\kappa)$ takes the following explicit expression for any $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ :

$\displaystyle\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)=$	$\displaystyle\mathbb{P}_{n}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{par}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{par}}({\bm{X}})}\right)\frac{\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}$
	$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widehat{p}_{zd_{z}}^{\text{par}}({\bm{X}})\widehat{\pi}_{z}^{\text{par}}({\bm{X}})}\frac{\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(M,{\bm{X}})}{\widehat{r}_{zd_{z}}^{\text{par}}(M,{\bm{X}})}\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\left\{Y-\widehat{\mu}_{zd_{z}}^{\text{par}}(M,{\bm{X}})\right\}$
	$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widehat{p}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}({\bm{X}})\widehat{\pi}_{z^{\prime}}^{\text{par}}({\bm{X}})}\left\{\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\widehat{\mu}_{zd_{z}}^{\text{par}}(M,{\bm{X}})-\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})\right\}$
	$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})\Big{\}},$	(s26)

where $\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{x}})=\displaystyle\sum_{m=0}^{m_{\max}}\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})\widehat{\mu}_{zd_{z}}^{\text{par}}(m,{\bm{x}})\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(m,{\bm{x}})$ and $\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ is the estimated sensitivity weight based on the parametric working models. Here, the sensitivity weight $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ takes a slightly different form from the one used under standard monotonicity. We provide the explicit expressions of $w_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ for $zz^{\prime}\in\{11,10,00\}$ and $d_{1}d_{0}\in\{10,00\}$ below.

	$\displaystyle w_{10}^{(11)}(m,{\bm{x}})$	$\displaystyle=1\text{ for any $m$.}$
	$\displaystyle w_{10}^{(10)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m\geq 1,\\ \frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m=0.\end{cases}$
	$\displaystyle w_{10}^{(00)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{\xi_{M}^{(0)}(m,{\bm{x}})p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(m,{\bm{x}})+\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{\xi_{M}^{(0)}(j,{\bm{x}})p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\frac{\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})/\xi_{Y}^{(0)}(0,{\bm{x}})+\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}$
	$\displaystyle w_{00}^{(11)}(m,{\bm{x}})$	$\displaystyle=1\text{ for any $m$}.$
	$\displaystyle w_{00}^{(10)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m\geq 1,\\ \frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})},&\text{if }m=0.\end{cases}$
	$\displaystyle w_{00}^{(00)}(m,{\bm{x}})$	$\displaystyle=\begin{cases}\frac{p_{00}({\bm{x}})}{\xi_{M}^{(0)}(m,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\frac{\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(m,{\bm{x}})\xi_{M}^{(0)}(m,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m\geq 1,\\ \left\{\frac{1}{r_{00}(0,{\bm{x}})}-\displaystyle\sum_{j=1}^{m_{\max}}\frac{p_{00}({\bm{x}})r_{00}(j,{\bm{x}})/r_{00}(0,{\bm{x}})}{\xi_{M}^{(0)}(j,{\bm{x}})(p_{11}({\bm{x}})-p_{01}({\bm{x}}))+p_{10}({\bm{x}})}\right\}\frac{\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)+p_{10}({\bm{x}})}{p_{10}({\bm{x}})+\xi_{Y}^{(0)}(0,{\bm{x}})\xi_{M}^{(0)}(0,{\bm{x}})\left(p_{11}({\bm{x}})-p_{01}({\bm{x}})\right)},&\text{if }m=0.\end{cases}$

It is worth noting that the sensitivity weights $\{w_{d_{1}d_{0}}^{(10)}(m,{\bm{x}}),w_{d_{1}d_{0}}^{(11)}(m,{\bm{x}})\}$ only depend on $\xi_{M}^{(0)}(m,{\bm{x}})$ but not $\xi_{Y}^{(0)}(m,{\bm{x}})$ ; this suggests that the estimated principal natural indirect effect, $\widehat{\theta}_{d_{1}d_{0}}^{(11),\text{mr}}(\kappa)-\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}}(\kappa)$ , will not depend on the values of $\xi_{Y}^{(0)}(m,{\bm{x}})$ . However, the estimated principal natural direct effect, $\widehat{\theta}_{d_{1}d_{0}}^{(10),\text{mr}}(\kappa)-\widehat{\theta}_{d_{1}d_{0}}^{(00),\text{mr}}(\kappa)$ , is dependent on both $\xi_{Y}^{(0)}(m,{\bm{x}})$ and $\xi_{M}^{(0)}(m,{\bm{x}})$ .
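The structure of the weights $w_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})$ listed above can be illustrated numerically. The following is a minimal sketch at a single fixed ${\bm{x}}$, not part of the proposed software: the function name `weight_10` and all input values are hypothetical, and the values $p_{00}=1$, $p_{01}=0$ are chosen only for illustration. It shows that the $m=0$ entry is exactly the value that makes $\sum_{m}w_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})r_{00}(m,{\bm{x}})=1$, and that the function takes no $\xi_{Y}^{(0)}$ argument, in line with the remark that these weights depend on $\xi_{M}^{(0)}$ only.

```python
import numpy as np

def weight_10(xi_m, r00, p00, p01, p10, p11, stratum="10"):
    """Sensitivity weight w_{d1d0}^{(10)}(m, x) for m = 0..m_max at one fixed x.

    xi_m    : values xi_M^{(0)}(m, x), m = 0..m_max (the m = 0 entry is not used)
    r00     : mediator pmf r_00(m, x), m = 0..m_max
    stratum : "10" or "00", i.e. the value of d1d0
    """
    xi_m = np.asarray(xi_m, dtype=float)
    r00 = np.asarray(r00, dtype=float)
    denom = xi_m * (p11 - p01) + p10
    # numerator is xi_M * p00 for d1d0 = 10 and p00 for d1d0 = 00
    numer = xi_m * p00 if stratum == "10" else np.full_like(xi_m, p00)
    w = numer / denom
    # m = 0 case: 1/r00(0) - sum_{j>=1} w(j) r00(j) / r00(0)
    w[0] = (1.0 - np.sum(w[1:] * r00[1:])) / r00[0]
    return w

# Hypothetical inputs for illustration only.
r00 = np.array([0.5, 0.3, 0.2])
xi = np.array([1.0, 1.4, 0.7])
w_10 = weight_10(xi, r00, p00=1.0, p01=0.0, p10=0.4, p11=0.6, stratum="10")
w_00 = weight_10(xi, r00, p00=1.0, p01=0.0, p10=0.4, p11=0.6, stratum="00")
print(np.sum(w_10 * r00), np.sum(w_00 * r00))
```

With these illustrative inputs, setting $\xi_{M}^{(0)}\equiv 1$ returns weights equal to one for every $m$, which matches the form of $w_{d_{1}d_{0}}^{(11)}(m,{\bm{x}})$.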

We can show that $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)$ enjoys similar properties to $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)$ :

Proposition S7

Suppose that Assumptions 1, 2, 3b, 5, and 6 hold. Under either $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , the estimator $\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\kappa)$ is consistent and asymptotically normal for any $d_{1}d_{0}\in\mathcal{U}_{\text{b}}$ .

Proof of Proposition S7 is similar to the proof of Proposition S2 and has been omitted for brevity.

D.10 Sensitivity analysis for the ignorability of the mediator assumption

This section presents proofs for Propositions S3 and S4, which refer to the identification results and properties of the multiply robust estimator under violation of Assumption 5.

Proof of Proposition S3. For any $z\in\{0,1\}$ and $d_{1}d_{0}\in\mathcal{U}$ with $\mathcal{U}=\mathcal{U}_{\text{a}}$ under standard monotonicity and $\mathcal{U}=\mathcal{U}_{\text{b}}$ under strong monotonicity, we can show that

		$\displaystyle\mathbb{E}_{Y_{1m}\|Z,U,{\bm{X}}}[Y_{1m}\|z,d_{1}d_{0},{\bm{x}}]$
	$\displaystyle=$	$\displaystyle\sum_{j=0}^{m_{\max}}\mathbb{E}_{Y_{1m}\|Z,M,U,{\bm{X}}}[Y_{1m}\|z,j,d_{1}d_{0},{\bm{x}}]f_{M\|Z,U,{\bm{X}}}(j\|z,d_{1}d_{0},{\bm{x}})$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}\|Z,M,U,{\bm{X}}}[Y_{1m}\|z,m,d_{1}d_{0},{\bm{x}}]\sum_{j=0}^{m_{\max}}\frac{\mathbb{E}_{Y_{1m}\|Z,M,U,{\bm{X}}}[Y_{1m}\|z,j,d_{1}d_{0},{\bm{x}}]}{\mathbb{E}_{Y_{1m}\|Z,M,U,{\bm{X}}}[Y_{1m}\|z,m,d_{1}d_{0},{\bm{x}}]}f_{M\|Z,U,{\bm{X}}}(j\|z,d_{1}d_{0},{\bm{x}})$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}\|Z,M,U,{\bm{X}}}[Y_{1m}\|z,m,d_{1}d_{0},{\bm{x}}]\sum_{j=0}^{m_{\max}}\frac{t(z,j,d_{1}d_{0},{\bm{x}})}{t(z,m,d_{1}d_{0},{\bm{x}})}f_{M\|Z,U,{\bm{X}}}(j\|z,d_{1}d_{0},{\bm{x}})$

Therefore,

		$\displaystyle\frac{\mathbb{E}_{Y_{1m}\|Z,M,U,{\bm{X}}}[Y_{1m}\|0,m,d_{1}d_{0},{\bm{x}}]}{\mathbb{E}_{Y_{1m}\|Z,M,U,{\bm{X}}}[Y_{1m}\|1,m,d_{1}d_{0},{\bm{x}}]}$
	$\displaystyle=$	$\displaystyle\frac{\mathbb{E}_{Y_{1m}\|Z,U,{\bm{X}}}[Y_{1m}\|0,d_{1}d_{0},{\bm{x}}]\left\{\displaystyle\sum_{j=0}^{m_{\max}}\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M\|Z,U,{\bm{X}}}(j\|0,d_{1}d_{0},{\bm{x}})\right\}^{-1}}{\mathbb{E}_{Y_{1m}\|Z,U,{\bm{X}}}[Y_{1m}\|1,d_{1}d_{0},{\bm{x}}]\left\{\displaystyle\sum_{j=0}^{m_{\max}}\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M\|Z,U,{\bm{X}}}(j\|1,d_{1}d_{0},{\bm{x}})\right\}^{-1}}$
	$\displaystyle=$	$\displaystyle\left\{\displaystyle\sum_{j=0}^{m_{\max}}\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M\|Z,U,{\bm{X}}}(j\|1,d_{1}d_{0},{\bm{x}})\right\}\Big{/}\left\{\displaystyle\sum_{j=0}^{m_{\max}}\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M\|Z,U,{\bm{X}}}(j\|0,d_{1}d_{0},{\bm{x}})\right\},$

where the last equality follows from $\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|0,d_{1}d_{0},{\bm{x}}]=\mathbb{E}_{Y_{1m}|Z,U,{\bm{X}}}[Y_{1m}|1,d_{1}d_{0},{\bm{x}}]$ due to Lemma S5. We can then identify $\mathbb{E}_{Y_{1m}|M_{0},U,{\bm{X}}}[Y_{1m}|m,d_{1}d_{0},{\bm{x}}]$ using the following equations:

		$\displaystyle\mathbb{E}_{Y_{1m}\|M_{0},U,{\bm{X}}}[Y_{1m}\|m,d_{1}d_{0},{\bm{x}}]$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}\|Z,M,U,{\bm{X}}}[Y_{1m}\|0,m,d_{1}d_{0},{\bm{x}}]$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}\|Z,M,U,{\bm{X}}}[Y_{1m}\|1,m,d_{1}d_{0},{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M\|Z,U,{\bm{X}}}(j\|1,d_{1}d_{0},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M\|Z,U,{\bm{X}}}(j\|0,d_{1}d_{0},{\bm{x}})}$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}\|M_{1},U,{\bm{X}}}[Y_{1m}\|m,d_{1}d_{0},{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M_{1}\|U,{\bm{X}}}(j\|d_{1}d_{0},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M_{0}\|U,{\bm{X}}}(j\|d_{1}d_{0},{\bm{x}})}$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}\|D_{1},D_{0},M_{1},{\bm{X}}}[Y_{1m}\|d_{1},d_{0},m,{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M_{1}\|D_{1},D_{0},{\bm{X}}}(j\|d_{1},d_{0},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M_{0}\|D_{1},D_{0},{\bm{X}}}(j\|d_{1},d_{0},{\bm{x}})}$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y_{1m}\|D_{1},M_{1},{\bm{X}}}[Y_{1m}\|d_{1},m,{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M_{1}\|D_{1},{\bm{X}}}(j\|d_{1},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M_{0}\|D_{0},{\bm{X}}}(j\|d_{0},{\bm{x}})}\quad\quad\text{(by Lemma \ref{lemma:pi_v2})}$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{Y\|Z,D,M,{\bm{X}}}[Y\|1,d_{1},m,{\bm{x}}]\times\frac{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}f_{M\|Z,D,{\bm{X}}}(j\|1,d_{1},{\bm{x}})}{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}f_{M\|Z,D,{\bm{X}}}(j\|0,d_{0},{\bm{x}})}\quad\quad(\text{by Assumption 2})$
	$\displaystyle=$	$\displaystyle\mu_{1d_{1}}(m,{\bm{x}})\frac{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(1,j,d_{1}d_{0},{\bm{x}})}{t(1,m,d_{1}d_{0},{\bm{x}})}r_{1d_{1}}(j,{\bm{x}})}{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(0,j,d_{1}d_{0},{\bm{x}})}{t(0,m,d_{1}d_{0},{\bm{x}})}r_{0d_{0}}(j,{\bm{x}})}.$

Therefore, we can identify $\theta_{d_{1}d_{0}}^{(10)}$ as

	$\displaystyle\theta_{d_{1}d_{0}}^{(10)}$	$\displaystyle=\mathbb{E}[Y_{1M_{0}}\|U=d_{1}d_{0}]=\mathbb{E}\left[\mathbb{E}[Y_{1M_{0}}\|U=d_{1}d_{0},{\bm{X}}]\Big{\|}U=d_{1}d_{0}\right]$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\mathbb{E}_{Y_{1m}\|M_{0},U,{\bm{X}}}[Y_{1m}\|m,d_{1}d_{0},{\bm{X}}]f_{M_{0}\|U,{\bm{X}}}(m\|d_{1}d_{0},{\bm{X}})\Big{\|}U=d_{1}d_{0}\right]$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\frac{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(1,j,d_{1}d_{0},{\bm{X}})}{t(1,m,d_{1}d_{0},{\bm{X}})}r_{1d_{1}}(j,{\bm{X}})}{\sum_{j=0}^{m_{\max}}\displaystyle\frac{t(0,j,d_{1}d_{0},{\bm{X}})}{t(0,m,d_{1}d_{0},{\bm{X}})}r_{0d_{0}}(j,{\bm{X}})}\mu_{1d_{1}}(m,{\bm{X}})f_{M_{0}\|U,{\bm{X}}}(m\|d_{1}d_{0},{\bm{X}})\Big{\|}U=d_{1}d_{0}\right]$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\mu_{1d_{1}}(m,{\bm{X}})f_{M_{0}\|U,{\bm{X}}}(m\|d_{1}d_{0},{\bm{X}})\Big{\|}U=d_{1}d_{0}\right]$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\mu_{1d_{1}}(m,{\bm{X}})f_{M_{0}\|D_{0},{\bm{X}}}(m\|d_{0},{\bm{X}})\Big{\|}U=d_{1}d_{0}\right]\quad\text{(by Lemma \ref{lemma:pi_v2})}$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\mu_{1d_{1}}(m,{\bm{X}})f_{M_{0}\|Z,D_{0},{\bm{X}}}(m\|0,d_{0},{\bm{X}})\Big{\|}U=d_{1}d_{0}\right]\quad\text{(by Assumption 2)}$
		$\displaystyle=\mathbb{E}\left[\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\mu_{1d_{1}}(m,{\bm{X}})r_{0d_{0}}(m,{\bm{X}})\Big{\|}U=d_{1}d_{0}\right]$
		$\displaystyle=\int_{{\bm{x}}}\frac{f_{U\|{\bm{X}}}(d_{1}d_{0}\|{\bm{x}})}{f_{U}(d_{1}d_{0})}\left\{\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})\quad\text{(by Lemma \ref{lemma:expectation})}$
		$\displaystyle=\int_{{\bm{x}}}\frac{e_{d_{1}d_{0}}({\bm{x}})}{e_{d_{1}d_{0}}}\left\{\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}}).$

This completes the proof. $\square$
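The identification formula in Proposition S3 can be evaluated directly once the sensitivity function and the nuisance functions are available. The sketch below is a hedged illustration at one fixed ${\bm{x}}$ and one stratum $d_{1}d_{0}$; the function names `rho_10` and `inner_sum` and the numerical inputs are hypothetical, and the expression for $\rho_{d_{1}d_{0}}^{(10)}$ is read off from the step of the proof that introduces it. It assumes $t$ is nonzero for every mediator value.

```python
import numpy as np

def rho_10(t1, t0, r1, r0):
    """rho_{d1d0}^{(10)}(m, x) for m = 0..m_max at one fixed x.

    t1[j] = t(1, j, d1d0, x) and t0[j] = t(0, j, d1d0, x)  (assumed nonzero)
    r1[j] = r_{1d1}(j, x)    and r0[j] = r_{0d0}(j, x)     (mediator pmfs)
    """
    t1, t0, r1, r0 = (np.asarray(a, dtype=float) for a in (t1, t0, r1, r0))
    num = np.sum(t1 * r1) / t1  # sum_j {t(1,j)/t(1,m)} r_{1d1}(j), one entry per m
    den = np.sum(t0 * r0) / t0  # sum_j {t(0,j)/t(0,m)} r_{0d0}(j), one entry per m
    return num / den

def inner_sum(t1, t0, r1, r0, mu1):
    """sum_m rho^{(10)}(m, x) * mu_{1d1}(m, x) * r_{0d0}(m, x)."""
    mu1 = np.asarray(mu1, dtype=float)
    r0 = np.asarray(r0, dtype=float)
    return float(np.sum(rho_10(t1, t0, r1, r0) * mu1 * r0))

# Hypothetical inputs for illustration only.
r1 = [0.2, 0.5, 0.3]
r0 = [0.5, 0.3, 0.2]
mu1 = [1.0, 2.0, 3.0]
flat = np.ones(3)  # t constant in m
print(rho_10(flat, flat, r1, r0), inner_sum(flat, flat, r1, r0, mu1))
```

When $t$ does not vary with the mediator value, every entry of $\rho_{d_{1}d_{0}}^{(10)}$ equals one and the inner sum reduces to $\sum_{m}\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})$. The weight is also unchanged when $t(z,\cdot,d_{1}d_{0},{\bm{x}})$ is rescaled by a constant, since only ratios of $t$ enter.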

Proof of Proposition S4. Following the notation in the proofs of Theorem 4 and Proposition S2, we let $\widetilde{h}_{nuisance}=\{\widetilde{\pi}_{z}({\bm{x}}),\widetilde{p}_{zd}({\bm{x}}),\widetilde{r}_{zd}(m,{\bm{x}}),\widetilde{\mu}_{zd}(m,{\bm{x}})\}$ be the probability limit of $\widehat{h}_{nuisance}^{\text{par}}$ . Because we work under one of $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , or $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , the mediator model is correctly specified in every case, so we always have $\widetilde{r}_{zd}(m,{\bm{x}})=r_{zd}(m,{\bm{x}})$ . It follows that the probability limit of $\widehat{\rho}_{d_{1}d_{0}}^{(zz^{\prime}),\text{par}}(m,{\bm{x}})$ always equals $\rho_{d_{1}d_{0}}^{(zz^{\prime})}(m,{\bm{x}})$ , because $\widehat{\rho}_{d_{1}d_{0}}^{(zz^{\prime}),\text{par}}(m,{\bm{x}})$ is only a function of the sensitivity weight $t$ and the mediator model $\mathcal{M}_{m}$ . Also, the probability limit of $\widehat{p}_{zd}^{\text{dr}}$ , denoted by $\widetilde{p}_{zd}$ , always equals the true value $p_{zd}$ , because $\widehat{p}_{zd}^{\text{dr}}$ is doubly robust under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ . Together, these facts imply that the probability limit of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(t)$ is

	$\displaystyle\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=$	$\displaystyle\mathbb{E}\Big{\{}\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widetilde{p}_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widetilde{p}_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\widetilde{\eta}_{10}^{\rho}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}$
		$\displaystyle+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{1},Z=1)}{\widetilde{p}_{1d_{1}}({\bm{X}})\widetilde{\pi}_{1}({\bm{X}})}\frac{r_{0d_{0}}(M,{\bm{X}})}{r_{1d_{1}}(M,{\bm{X}})}\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\left\{Y-\widetilde{\mu}_{1d_{1}}(M,{\bm{X}})\right\}$
		$\displaystyle+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{0},Z=0)}{\widetilde{p}_{0d_{0}}({\bm{X}})\widetilde{\pi}_{0}({\bm{X}})}\left\{\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\widetilde{\mu}_{1d_{1}}(M,{\bm{X}})-\widetilde{\eta}_{10}^{\rho}({\bm{X}})\right\}$
		$\displaystyle+\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\widetilde{\eta}_{10}^{\rho}({\bm{X}})\Big{\}}$

under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , or $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , where

$\displaystyle\widetilde{\eta}_{10}^{\rho}({\bm{x}})=\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\widetilde{\mu}_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}}).$
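The double robustness of $\widehat{p}_{zd}^{\text{dr}}$ under $\mathcal{M}_{\pi}\cup\mathcal{M}_{e}$ , which the argument above relies on, can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the authors' implementation: the explicit form of $\widehat{p}_{zd}^{\text{dr}}$ is not given in this section, so the code assumes the standard augmented inverse probability weighting form $\mathbb{P}_{n}\{\mathbb{I}(Z=z)(\mathbb{I}(D=d)-\widetilde{p}_{zd}({\bm{X}}))/\widetilde{\pi}_{z}({\bm{X}})+\widetilde{p}_{zd}({\bm{X}})\}$ . The data-generating process, the function name `p_dr`, and the constant "misspecified" models are all hypothetical, and true or deliberately wrong nuisance functions are plugged in directly instead of being estimated.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def p_dr(Z, D, pi_tilde, p_tilde, z=1, d=1):
    """Assumed AIPW form of the doubly robust estimator of p_zd = E[p_zd(X)]."""
    ind_z = (Z == z).astype(float)
    ind_d = (D == d).astype(float)
    return float(np.mean(ind_z * (ind_d - p_tilde) / pi_tilde + p_tilde))

rng = np.random.default_rng(0)
n = 200_000
X = rng.normal(size=n)
pi1 = expit(X)                    # true P(Z = 1 | X)
p11 = expit(0.3 + 1.5 * X)        # true P(D = 1 | Z = 1, X)
p01 = expit(-1.0 + 0.5 * X)       # true P(D = 1 | Z = 0, X), unused by the target
Z = rng.binomial(1, pi1)
D = rng.binomial(1, np.where(Z == 1, p11, p01))
truth = float(np.mean(p11))       # sample-average version of p_11

wrong = np.full(n, 0.5)           # a deliberately misspecified constant model
est = {
    "both correct": p_dr(Z, D, pi1, p11),
    "pi misspecified": p_dr(Z, D, wrong, p11),
    "p misspecified": p_dr(Z, D, pi1, wrong),
    "both misspecified": p_dr(Z, D, wrong, wrong),
}
for name, value in est.items():
    print(f"{name:18s} error = {value - truth:+.4f}")
```

In this design the estimate stays close to the target whenever at least one of the two working models is correct, and it is visibly biased only when both are wrong.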

In what follows, we show that $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\theta_{d_{1}d_{0}}^{(10)}$ under Scenario I ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ ), Scenario II ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ), or Scenario III ( $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ), which concludes the triple robustness of $\widehat{\theta}_{d_{1}d_{0}}^{(zz^{\prime}),\text{mr}}(t)$ .

Scenario I ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ ):

In Scenario I, $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$ and $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$ but generally $\widetilde{\mu}_{zd}(m,{\bm{x}})\neq\mu_{zd}(m,{\bm{x}})$ . Therefore, we can rewrite $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\sum_{j=1}^{4}\Delta_{j}$ , where

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\frac{\displaystyle\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\widetilde{\mu}_{1d_{1}}(m,{\bm{X}})r_{0d_{0}}(m,{\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{1},Z=1)}{p_{1d_{1}}({\bm{X}})\pi_{1}({\bm{X}})}\frac{r_{0d_{0}}(M,{\bm{X}})}{r_{1d_{1}}(M,{\bm{X}})}\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})Y\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{\frac{\mathbb{I}(D=d_{0},Z=0)}{p_{0d_{0}}({\bm{X}})\pi_{0}({\bm{X}})}-\frac{\mathbb{I}(D=d_{1},Z=1)}{p_{1d_{1}}({\bm{X}})\pi_{1}({\bm{X}})}\frac{r_{0d_{0}}(M,{\bm{X}})}{r_{1d_{1}}(M,{\bm{X}})}\right\}\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\widetilde{\mu}_{1d_{1}}(M,{\bm{X}})\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\left\{1-\frac{\mathbb{I}(D=d_{0},Z=0)}{p_{0d_{0}}({\bm{X}})\pi_{0}({\bm{X}})}\right\}\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{X}})\widetilde{\mu}_{1d_{1}}(m,{\bm{X}})r_{0d_{0}}(m,{\bm{X}})\right].$

One can show that $\Delta_{1}=\Delta_{3}=\Delta_{4}=0$ by using the law of iterated expectation and

	$\displaystyle\Delta_{2}=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{1}{p_{1d_{1}}({\bm{x}})\pi_{1}({\bm{x}})}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{0d_{0}}(m,{\bm{x}})}{r_{1d_{1}}(m,{\bm{x}})}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\int_{y}y\text{d}\mathbb{P}_{Y\|Z,D,M,{\bm{X}}}(y\|1,d_{1},m,{\bm{x}})r_{1d_{1}}(m,{\bm{x}})\right\}$
		$\displaystyle\quad f_{D\|Z,{\bm{X}}}(d_{1}\|1,{\bm{x}})f_{Z\|{\bm{X}}}(1\|{\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{0d_{0}}(m,{\bm{x}})}{r_{1d_{1}}(m,{\bm{x}})}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\int_{y}y\text{d}\mathbb{P}_{Y\|Z,D,M,{\bm{X}}}(y\|1,d_{1},m,{\bm{x}})r_{1d_{1}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\frac{r_{0d_{0}}(m,{\bm{x}})}{r_{1d_{1}}(m,{\bm{x}})}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{1d_{1}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\theta_{d_{1}d_{0}}^{(10)},$

where the last equality follows from Proposition S3. This suggests that $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\theta^{(10)}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ .
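For completeness, the vanishing terms in Scenario I can be checked by conditioning on ${\bm{X}}$ . The following is a sketch, assuming (as in the display for $\Delta_{2}$ ) that $p_{zd}({\bm{x}})=f_{D|Z,{\bm{X}}}(d|z,{\bm{x}})$ , $\pi_{z}({\bm{x}})=f_{Z|{\bm{X}}}(z|{\bm{x}})$ , and $r_{zd}(m,{\bm{x}})=f_{M|Z,D,{\bm{X}}}(m|z,d,{\bm{x}})$ ; the shorthand $g(m,{\bm{x}})=\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\widetilde{\mu}_{1d_{1}}(m,{\bm{x}})$ is introduced here only for brevity. For $\Delta_{4}$ , we have $\mathbb{E}\left[\mathbb{I}(D=d_{0},Z=0)|{\bm{X}}\right]=p_{0d_{0}}({\bm{X}})\pi_{0}({\bm{X}})$ , so the term in braces has conditional mean zero given ${\bm{X}}$ while the remaining factors depend on ${\bm{X}}$ only. For $\Delta_{3}$ , the two indicator terms share the same conditional mean,

	$\displaystyle\mathbb{E}\left[\frac{\mathbb{I}(D=d_{0},Z=0)}{p_{0d_{0}}({\bm{X}})\pi_{0}({\bm{X}})}g(M,{\bm{X}})\Big|{\bm{X}}\right]=\sum_{m=0}^{m_{\max}}g(m,{\bm{X}})r_{0d_{0}}(m,{\bm{X}})=\mathbb{E}\left[\frac{\mathbb{I}(D=d_{1},Z=1)}{p_{1d_{1}}({\bm{X}})\pi_{1}({\bm{X}})}\frac{r_{0d_{0}}(M,{\bm{X}})}{r_{1d_{1}}(M,{\bm{X}})}g(M,{\bm{X}})\Big|{\bm{X}}\right],$

because the ratio $r_{0d_{0}}/r_{1d_{1}}$ converts the average over $r_{1d_{1}}(m,{\bm{X}})$ on the right into an average over $r_{0d_{0}}(m,{\bm{X}})$ . For $\Delta_{1}$ , the conditional means of $\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})$ given $(Z=z^{*},{\bm{X}})$ and of $D-p_{01}({\bm{X}})$ given $(Z=0,{\bm{X}})$ are both zero.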

Scenario II ( $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ):

In Scenario II, $\widetilde{\pi}_{z}({\bm{x}})=\pi_{z}({\bm{x}})$ and $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ but generally $\widetilde{p}_{zd}({\bm{x}})\neq p_{zd}({\bm{x}})$ . Observing this, we can rewrite $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\sum_{j=1}^{4}\Delta_{j}$ , where

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left\{\frac{\mathbb{I}(Z=z^{*},D=d^{*})}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)D}{\pi_{0}({\bm{X}})}\right\}\frac{\eta_{10}^{\rho}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{1},Z=1)}{\widetilde{p}_{1d_{1}}({\bm{X}})\pi_{1}({\bm{X}})}\frac{r_{0d_{0}}(M,{\bm{X}})}{r_{1d_{1}}(M,{\bm{X}})}\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\left\{Y-\mu_{1d_{1}}(M,{\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[\frac{\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\widetilde{p}_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{0},Z=0)}{\widetilde{p}_{0d_{0}}({\bm{X}})\pi_{0}({\bm{X}})}\left\{\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\mu_{1d_{1}}(M,{\bm{X}})-\eta_{10}^{\rho}({\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[\left(\left\{1-\frac{\mathbb{I}(Z=z^{*})}{\pi_{z^{*}}({\bm{X}})}\right\}\widetilde{p}_{z^{*}d^{*}}({\bm{X}})-k\left\{1-\frac{1-Z}{\pi_{0}({\bm{X}})}\right\}\widetilde{p}_{01}({\bm{X}})\right)\frac{\eta_{10}^{\rho}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],$

where $\eta_{10}^{\rho}({\bm{x}})=\displaystyle\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})$ . One can verify $\Delta_{2}=\Delta_{3}=\Delta_{4}=0$ by using the law of iterated expectations and

	$\displaystyle\Delta_{1}=$	$\displaystyle\int_{{\bm{x}}}\left\{\frac{1}{\pi_{z^{*}}({\bm{x}})}\frac{\eta_{10}^{\rho}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\right\}f_{D\|Z,{\bm{X}}}(d^{*}\|z^{*},{\bm{x}})f_{Z\|{\bm{X}}}(z^{*}\|{\bm{x}})\text{d}\mathbb{P}_{\bm{X}}({\bm{x}})$
		$\displaystyle-\int_{{\bm{x}}}\left\{\frac{k}{\pi_{0}({\bm{x}})}\frac{\eta_{10}^{\rho}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\right\}f_{D\|Z,{\bm{X}}}(1\|0,{\bm{x}})f_{Z\|{\bm{X}}}(0\|{\bm{x}})\text{d}\mathbb{P}_{\bm{X}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\left\{f_{D\|Z,{\bm{X}}}(d^{*}\|z^{*},{\bm{x}})-kf_{D\|Z,{\bm{X}}}(1\|0,{\bm{x}})\right\}\frac{\eta_{10}^{\rho}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\text{d}\mathbb{P}_{\bm{X}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{10}^{\rho}({\bm{x}})\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\int_{{\bm{x}}}\frac{p_{z^{*}d^{*}}({\bm{x}})-kp_{01}({\bm{x}})}{p_{z^{*}d^{*}}-kp_{01}}\sum_{m=0}^{m_{\max}}\left\{\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})\right\}\text{d}\mathbb{P}_{{\bm{X}}}({\bm{x}})$
	$\displaystyle=$	$\displaystyle\theta_{d_{1}d_{0}}^{(10)}.$

Therefore, we have obtained $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\theta^{(10)}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

Scenario III ( $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ ):

In Scenario III, $\widetilde{\mu}_{zd}(m,{\bm{x}})=\mu_{zd}(m,{\bm{x}})$ and $\widetilde{p}_{zd}({\bm{x}})=p_{zd}({\bm{x}})$ but generally $\widetilde{\pi}_{z}({\bm{x}})\neq\pi_{z}({\bm{x}})$ . Therefore, we have $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\sum_{j=1}^{4}\Delta_{j}$ , where

	$\displaystyle\Delta_{1}$	$\displaystyle=\mathbb{E}\left[\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\widetilde{\pi}_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\widetilde{\pi}_{0}({\bm{X}})}\right)\frac{\eta_{10}^{\rho}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\right],$
	$\displaystyle\Delta_{2}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{1},Z=1)}{p_{1d_{1}}({\bm{X}})\widetilde{\pi}_{1}({\bm{X}})}\frac{r_{0d_{0}}(M,{\bm{X}})}{r_{1d_{1}}(M,{\bm{X}})}\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\left\{Y-\mu_{1d_{1}}(M,{\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{3}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\frac{\mathbb{I}(D=d_{0},Z=0)}{p_{0d_{0}}({\bm{X}})\widetilde{\pi}_{0}({\bm{X}})}\left\{\rho_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\mu_{1d_{1}}(M,{\bm{X}})-\eta_{10}^{\rho}({\bm{X}})\right\}\right],$
	$\displaystyle\Delta_{4}$	$\displaystyle=\mathbb{E}\left[\frac{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})}{p_{z^{*}d^{*}}-kp_{01}}\eta_{10}^{\rho}({\bm{X}})\right],$

where $\eta_{10}^{\rho}({\bm{x}})=\displaystyle\sum_{m=0}^{m_{\max}}\rho_{d_{1}d_{0}}^{(10)}(m,{\bm{x}})\mu_{1d_{1}}(m,{\bm{x}})r_{0d_{0}}(m,{\bm{x}})$ . Noting that $\Delta_{1}=\Delta_{2}=\Delta_{3}=0$ by the law of iterated expectations and $\Delta_{4}=\theta_{d_{1}d_{0}}^{(10)}$ as shown in Proposition S3, we obtain $\theta^{(10),\text{mr}}_{d_{1}d_{0}}(t)=\theta^{(10)}_{d_{1}d_{0}}$ under $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ .

Now we have proved that $\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)$ is consistent for $\theta^{(10)}_{d_{1}d_{0}}$ under $\mathcal{M}_{\pi}\cap\mathcal{M}_{e}\cap\mathcal{M}_{m}$ , $\mathcal{M}_{\pi}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ , or $\mathcal{M}_{e}\cap\mathcal{M}_{m}\cap\mathcal{M}_{o}$ . We can also show that $\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)$ is asymptotically normal under regularity conditions similar to those provided in the proof of Theorem 4; the proof is omitted for brevity. $\square$

E Figures and Tables

Table S1: Analysis of the ITT natural mediation effects in the JOBS II study.

Method	ITT-NIE	ITT-NDE	ITT
a	$-0.017$ ($-0.036$, $-0.005$)	$-0.075$ ($-0.156$, $-0.001$)	$-0.092$ ($-0.173$, $-0.020$)
b	$-0.014$ ($-0.031$, $-0.003$)	$-0.073$ ($-0.148$, $-0.001$)	$-0.087$ ($-0.165$, $-0.017$)
c	$-0.021$ ($-0.045$, $-0.005$)	$-0.072$ ($-0.148$, $-0.002$)	$-0.092$ ($-0.174$, $-0.021$)
d	$-0.014$ ($-0.031$, $-0.003$)	$-0.072$ ($-0.147$, 0.000)	$-0.086$ ($-0.165$, $-0.016$)
mr	$-0.015$ ($-0.032$, $-0.003$)	$-0.072$ ($-0.151$, $-0.001$)	$-0.088$ ($-0.167$, $-0.016$)
np	$-0.017$ ($-0.031$, $-0.004$)	$-0.067$ ($-0.146$, 0.013)	$-0.084$ ($-0.162$, $-0.005$)

Table S2: Analysis of the principal natural mediation effects in the JOBS II study.

Method	Estimand	Compliers $(\widehat{e}_{10}^{\text{np}}=0.55)$	Never-takers $(\widehat{e}_{00}^{\text{np}}=0.45)$
a	PNIE	$-0.030$ ($-0.062$, $-0.009$)	0.000 ($-0.007$, 0.005)
	PNDE	$-0.083$ ($-0.176$, 0.002)	$-0.063$ ($-0.174$, 0.029)
	PCE	$-0.113$ ($-0.206$, $-0.027$)	$-0.063$ ($-0.172$, 0.027)
b	PNIE	$-0.024$ ($-0.052$, $-0.006$)	0.000 ($-0.006$, 0.005)
	PNDE	$-0.082$ ($-0.167$, 0.003)	$-0.062$ ($-0.166$, 0.028)
	PCE	$-0.106$ ($-0.186$, $-0.026$)	$-0.062$ ($-0.165$, 0.027)
c	PNIE	$-0.030$ ($-0.071$, 0.001)	$-0.007$ ($-0.032$, 0.015)
	PNDE	$-0.083$ ($-0.171$, 0.003)	$-0.056$ ($-0.160$, 0.032)
	PCE	$-0.113$ ($-0.207$, $-0.029$)	$-0.063$ ($-0.171$, 0.029)
d	PNIE	$-0.024$ ($-0.052$, $-0.006$)	0.000 ($-0.006$, 0.005)
	PNDE	$-0.081$ ($-0.166$, 0.003)	$-0.060$ ($-0.166$, 0.029)
	PCE	$-0.105$ ($-0.187$, $-0.025$)	$-0.060$ ($-0.167$, 0.029)
mr	PNIE	$-0.026$ ($-0.055$, 0.006)	0.000 ($-0.006$, 0.006)
	PNDE	$-0.083$ ($-0.170$, 0.002)	$-0.058$ ($-0.160$, 0.031)
	PCE	$-0.109$ ($-0.191$, $-0.027$)	$-0.058$ ($-0.160$, 0.030)
np	PNIE	$-0.029$ ($-0.052$, $-0.006$)	$-0.001$ ($-0.004$, 0.003)
	PNDE	$-0.066$ ($-0.156$, 0.023)	$-0.066$ ($-0.163$, 0.032)
	PCE	$-0.096$ ($-0.182$, $-0.009$)	$-0.067$ ($-0.163$, 0.032)
Rudolph et al.^§	PNIE	$-0.030$ ($-0.053$, $-0.006$)	–
	PNDE	$-0.115$ ($-0.251$, 0.022)	–
	PCE	$-0.145$ ($-0.280$, $-0.008$)	–

^§ ‘Rudolph et al.’ is the nonparametric efficient estimator given by Rudolph et al. (2024); see Remark 1 for more details of this approach.

Table S3: Mean and standard deviation of baseline characteristics among the compliers and never-takers, JOBS II study^‡.

Variable	Compliers	Never-takers	ASD^§
Proportion	55%	45%
Gender (male)	0.53 (0.50)	0.62 (0.48)	0.19
Age (years)	39.14 (10.37)	35.51 (9.40)	0.37
White race	0.84 (0.37)	0.77 (0.42)	0.15
Depression	1.89 (0.55)	1.87 (0.55)	0.03
Economic hardship	3.04 (0.93)	3.11 (0.94)	0.07
Motivation	5.28 (0.63)	5.05 (0.64)	0.35
Marriage (baseline: never married)
Married	0.49 (0.50)	0.43 (0.50)	0.11
Separated	0.03 (0.17)	0.03 (0.18)	0.03
Divorced	0.18 (0.38)	0.19 (0.40)	0.04
Widowed	0.02 (0.14)	0.02 (0.15)	0.02
Education (baseline: less than high school)
High school	0.32 (0.47)	0.33 (0.47)	0.02
Post-secondary non-tertiary education	0.32 (0.47)	0.40 (0.49)	0.17
Bachelor’s degree	0.19 (0.39)	0.10 (0.31)	0.23
Higher than a Bachelor’s degree	0.13 (0.33)	0.09 (0.28)	0.13
Assertiveness	3.39 (0.87)	3.56 (0.83)	0.21

^‡ The stratum-specific mean and standard deviation of each baseline characteristic are calculated following the method in Cheng et al. (2023a), as weighted averages based on the principal scores (estimated according to the nonparametric efficient estimator).

^§ ASD is the absolute standardized difference across the two principal strata. Given a specific baseline covariate, its ASD is calculated as ${|\bar{x}_{10}-\bar{x}_{00}|}/{\sqrt{0.5(s_{10}^{2}+s_{00}^{2})}}$ , where $\bar{x}_{d_{1}d_{0}}$ and $s_{d_{1}d_{0}}$ are the estimated mean and standard deviation of this covariate in the stratum $U=d_{1}d_{0}$ .
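
As a quick check of the ASD formula, the following minimal Python sketch recomputes the value for the Age row of Table S3 from the rounded means and standard deviations printed there. The function name `asd` is chosen here for illustration. Because the inputs are rounded to two decimals, other rows may differ from the tabulated ASD in the last digit.

```python
import math


def asd(mean_a, sd_a, mean_b, sd_b):
    """Absolute standardized difference between two strata."""
    pooled_sd = math.sqrt(0.5 * (sd_a ** 2 + sd_b ** 2))
    return abs(mean_a - mean_b) / pooled_sd


# Age (years) in Table S3: compliers 39.14 (10.37), never-takers 35.51 (9.40)
age_asd = asd(39.14, 10.37, 35.51, 9.40)
print(round(age_asd, 2))  # 0.37, matching the tabulated value
```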

Table S4: Mean and standard deviation of baseline characteristics across the doomed, harmed, and immune strata, WHO-LARES study^‡.

Variable	Doomed	Harmed	Immune	Max ASD^§
Proportion	51%	8%	41%
Personal characteristics
Female	0.63 (0.47)	0.56 (0.52)	0.48 (0.48)	0.31
Age (years)	46.53 (17.89)	46.80 (17.99)	46.76 (17.30)	0.02
Married	0.59 (0.49)	0.63 (0.48)	0.64 (0.48)	0.09
Post-secondary education	0.27 (0.45)	0.27 (0.45)	0.29 (0.45)	0.04
Employed	0.58 (0.49)	0.58 (0.49)	0.62 (0.49)	0.09
Non-smoking	0.50 (0.50)	0.49 (0.50)	0.47 (0.50)	0.06
Dwelling condition
Owning the house	0.74 (0.44)	0.72 (0.45)	0.76 (0.43)	0.08
House size greater than $50\text{m}^{2}$	0.86 (0.35)	0.84 (0.37)	0.87 (0.34)	0.08
Crowding ( $\geq 1$ resident per room)	0.64 (0.48)	0.64 (0.48)	0.64 (0.48)	0.01
Self-evaluation on dwelling condition
Satisfied with the heating system	0.85 (0.35)	0.85 (0.35)	0.89 (0.32)	0.10
Satisfied with natural light	0.75 (0.43)	0.73 (0.44)	0.77 (0.42)	0.10

^‡ The stratum-specific mean and standard deviation of each baseline characteristic are calculated following the method in Cheng et al. (2023a), as weighted averages based on the principal scores (estimated according to the nonparametric efficient estimator).

^§ Max ASD is the maximum pairwise absolute standardized difference across the three principal strata. Given a specific baseline covariate, its Max ASD is calculated as the maximum of ${|\bar{x}_{d_{1}d_{0}}-\bar{x}_{d_{1}^{\prime}d_{0}^{\prime}}|}/{\sqrt{0.5(s_{d_{1}d_{0}}^{2}+s_{d_{1}^{\prime}d_{0}^{\prime}}^{2})}}$ for all $d_{1}d_{0}\neq d_{1}^{\prime}d_{0}^{\prime}\in\{11,10,00\}$ , where $\bar{x}_{d_{1}d_{0}}$ and $s_{d_{1}d_{0}}$ are the estimated mean and standard deviation of this covariate in the stratum $U=d_{1}d_{0}$ .
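
The pairwise maximum can be sketched in a few lines of Python. The function name `max_asd` and the toy stratum summaries are hypothetical and serve only to illustrate the calculation; they are not values from the WHO-LARES study.

```python
import math
from itertools import combinations


def max_asd(stats):
    """Largest pairwise ASD; `stats` maps a stratum label to (mean, sd)."""
    def asd(a, b):
        (mean_a, sd_a), (mean_b, sd_b) = a, b
        return abs(mean_a - mean_b) / math.sqrt(0.5 * (sd_a ** 2 + sd_b ** 2))

    # Evaluate every unordered pair of distinct strata and keep the maximum.
    return max(asd(stats[u], stats[v]) for u, v in combinations(stats, 2))


# Hypothetical summaries for the strata 11, 10, and 00 (illustration only).
toy = {"11": (0.0, 1.0), "10": (1.0, 1.0), "00": (3.0, 1.0)}
result = max_asd(toy)
print(result)  # pairwise ASDs are 1, 3, and 2, so the maximum is 3.0
```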

Table S5: Analysis of the ITT natural mediation effects on a risk ratio scale, WHO-LARES study.

Method	ITT-NIE ${}^{\text{RR}}$	ITT-NDE ${}^{\text{RR}}$	ITT ${}^{\text{RR}}$
a	1.024 (1.003, 1.045)	1.242 (1.108, 1.385)	1.271 (1.127, 1.418)
b	1.021 (1.002, 1.039)	1.239 (1.105, 1.387)	1.266 (1.121, 1.407)
c	1.029 (1.005, 1.055)	1.232 (1.098, 1.386)	1.268 (1.127, 1.414)
d	1.021 (1.002, 1.040)	1.230 (1.095, 1.381)	1.256 (1.110, 1.418)
mr	1.021 (1.003, 1.043)	1.248 (1.114, 1.405)	1.274 (1.137, 1.438)
np	1.031 (1.010, 1.053)	1.219 (1.078, 1.361)	1.257 (1.114, 1.400)

Table S6: Analysis of the principal natural mediation effects on a risk ratio scale, WHO-LARES study.

Method	Estimand	Doomed $(\widehat{e}_{11}^{\text{np}}=0.51)$	Harmed $(\widehat{e}_{10}^{\text{np}}=0.08)$	Immune $(\widehat{e}_{00}^{\text{np}}=0.41)$
a	PNIE ${}^{\text{RR}}$	1.019 (0.992, 1.041)	1.028 (0.990, 1.061)	1.030 (0.995, 1.073)
	PNDE ${}^{\text{RR}}$	1.252 (1.066, 1.421)	2.112 (1.780, 2.478)	1.087 (0.896, 1.306)
	PCE ${}^{\text{RR}}$	1.276 (1.087, 1.444)	2.171 (1.862, 2.508)	1.120 (0.919, 1.323)
b	PNIE ${}^{\text{RR}}$	1.021 (0.999, 1.043)	1.030 (0.998, 1.065)	1.018 (0.987, 1.053)
	PNDE ${}^{\text{RR}}$	1.226 (1.043, 1.408)	2.141 (1.659, 2.775)	1.096 (0.908, 1.310)
	PCE ${}^{\text{RR}}$	1.252 (1.073, 1.438)	2.205 (1.705, 2.833)	1.115 (0.931, 1.311)
c	PNIE ${}^{\text{RR}}$	1.040 (0.998, 1.081)	1.026 (0.984, 1.061)	1.010 (0.943, 1.065)
	PNDE ${}^{\text{RR}}$	1.226 (1.044, 1.407)	2.113 (1.816, 2.458)	1.099 (0.904, 1.322)
	PCE ${}^{\text{RR}}$	1.275 (1.086, 1.452)	2.168 (1.844, 2.495)	1.111 (0.913, 1.308)
d	PNIE ${}^{\text{RR}}$	1.021 (0.999, 1.043)	1.033 (0.998, 1.070)	1.017 (0.987, 1.052)
	PNDE ${}^{\text{RR}}$	1.227 (1.040, 1.403)	2.093 (1.795, 2.441)	1.096 (0.903, 1.322)
	PCE ${}^{\text{RR}}$	1.253 (1.070, 1.433)	2.162 (1.852, 2.490)	1.115 (0.918, 1.319)
mr	PNIE ${}^{\text{RR}}$	1.017 (0.999, 1.034)	1.029 (1.001, 1.059)	1.025 (0.983, 1.074)
	PNDE ${}^{\text{RR}}$	1.223 (1.044, 1.403)	2.296 (1.841, 2.982)	1.102 (0.911, 1.328)
	PCE ${}^{\text{RR}}$	1.244 (1.070, 1.433)	2.363 (1.897, 3.045)	1.129 (0.945, 1.336)
np	PNIE ${}^{\text{RR}}$	1.025 (1.001, 1.050)	1.046 (1.013, 1.079)	1.035 (0.995, 1.075)
	PNDE ${}^{\text{RR}}$	1.181 (1.014, 1.348)	2.142 (1.724, 2.560)	1.111 (0.881, 1.340)
	PCE ${}^{\text{RR}}$	1.212 (1.044, 1.379)	2.241 (1.817, 2.666)	1.150 (0.916, 1.385)

Table S7: Causal moderated mediation analysis for each discrete baseline characteristic based on the R package moderate.mediation (Qin and Wang, 2024), WHO-LARES study. The natural indirect effect and natural direct effect for the entire population, as output from the moderate.mediation package, are 1.021 (1.002, 1.043) and 1.251 (1.114, 1.394), respectively. All mediation effects are defined on the risk ratio scale.

Subpopulation	Conditional NIE	Conditional NDE
Gender
Male	1.008 (0.983, 1.038)	1.233 (1.011, 1.500)
Female	1.029 (1.005, 1.058)	1.261 (1.099, 1.440)
Current marital status
Unmarried	1.028 (0.997, 1.065)	1.246 (1.043, 1.488)
Married	1.014 (0.992, 1.037)	1.256 (1.081, 1.447)
Education
Secondary school or less	1.017 (0.999, 1.037)	1.305 (1.161, 1.476)
Post-secondary education	1.039 (0.979, 1.119)	0.972 (0.678, 1.336)
Employment status
Unemployed	1.022 (0.996, 1.054)	1.305 (1.127, 1.507)
Employed	1.020 (0.995, 1.046)	1.198 (1.001, 1.408)
Smoking status
Smoking	1.040 (1.007, 1.078)	1.204 (1.023, 1.412)
Non-smoking	1.003 (0.981, 1.025)	1.300 (1.105, 1.508)
Owning the house
No	1.025 (0.977, 1.079)	1.110 (0.868, 1.425)
Yes	1.019 (1.001, 1.038)	1.300 (1.145, 1.466)
House size
$\leq 50\text{m}^{2}$	1.053 (1.005, 1.128)	1.256 (0.978, 1.594)
$>50\text{m}^{2}$	1.012 (0.993, 1.030)	1.251 (1.103, 1.413)
Crowding
$<1$ resident per room	1.031 (1.001, 1.072)	1.038 (0.869, 1.239)
$\geq 1$ resident per room	1.014 (0.989, 1.038)	1.434 (1.235, 1.649)
Heating system
Unsatisfied	1.023 (1.002, 1.047)	1.244 (1.095, 1.420)
Satisfied	1.008 (0.978, 1.042)	1.279 (1.000, 1.582)
Natural light
Unsatisfied	1.032 (0.995, 1.076)	1.248 (1.032, 1.501)
Satisfied	1.013 (0.994, 1.034)	1.252 (1.087, 1.431)

	$\displaystyle\psi_{d_{1}d_{0}}^{(zz^{\prime})}({\bm{O}})=$	$\displaystyle\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\eta_{zz^{\prime}}({\bm{X}})$
		$\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{zd_{z}}({\bm{X}})\pi_{z}({\bm{X}})}\frac{r_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})}{r_{zd_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}$
		$\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}({\bm{X}})\right\}$
		$\displaystyle+\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\eta_{zz^{\prime}}({\bm{X}}),$
	$\displaystyle\delta_{d_{1}d_{0}}({\bm{O}})=$	$\displaystyle\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}+p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}}),$

	$\displaystyle Z\mid{\bm{X}}\sim\text{Bernoulli}\left(\text{expit}([-1,0.5,-0.25,-0.1]^{T}{\bm{X}})\right),$
	$\displaystyle D\mid Z,{\bm{X}}\sim\text{Bernoulli}\left(\text{expit}(-1+2Z+[1,-0.8,0.6,-1]^{T}{\bm{X}})\right),$
	$\displaystyle M\mid D,Z,{\bm{X}}\sim\text{Bernoulli}\left(\text{expit}(-1.8+2Z+1.5D+[1,-0.5,0.9,-1]^{T}{\bm{X}})\right),$
	$\displaystyle Y\mid M,D,Z,{\bm{X}}\sim N\left(210+1.5Z-D+M+[27.4,13.7,13.7,13.7]^{T}{\bm{X}},1\right).$
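
The data-generating process above can be simulated as in the following NumPy sketch. Two points are assumptions rather than facts from the display: the covariate vector is drawn as four independent standard normal variables (its distribution is not given in the display itself), and the second argument of $N(\cdot,1)$ is treated as a unit variance, which coincides with a unit standard deviation. The names `simulate` and `expit` are chosen here for illustration.

```python
import numpy as np


def expit(u):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-u))


def simulate(n, seed=0):
    rng = np.random.default_rng(seed)
    # ASSUMPTION: X has four independent N(0, 1) components; the covariate
    # distribution is not stated in the display this sketch follows.
    X = rng.standard_normal((n, 4))
    # Treatment, post-treatment event, mediator, and outcome, in causal order.
    Z = rng.binomial(1, expit(X @ np.array([-1.0, 0.5, -0.25, -0.1])))
    D = rng.binomial(1, expit(-1 + 2 * Z + X @ np.array([1.0, -0.8, 0.6, -1.0])))
    M = rng.binomial(
        1, expit(-1.8 + 2 * Z + 1.5 * D + X @ np.array([1.0, -0.5, 0.9, -1.0]))
    )
    mean_y = 210 + 1.5 * Z - D + M + X @ np.array([27.4, 13.7, 13.7, 13.7])
    Y = rng.normal(mean_y, 1.0)
    return X, Z, D, M, Y


X, Z, D, M, Y = simulate(1000)
print(X.shape, Z.mean(), D.mean(), M.mean(), Y.mean())
```

Drawing the variables in the order $Z$, $D$, $M$, $Y$ mirrors the conditioning sets in the four displayed models, so each draw uses only quantities generated before it.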

		$\displaystyle\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-p_{z^{*}d^{*}}({\bm{X}})\right\}}{\pi_{z^{*}}({\bm{X}})}-k\frac{(1-Z)\left\{D-p_{01}({\bm{X}})\right\}}{\pi_{0}({\bm{X}})}\right)\eta_{zz^{\prime}}^{\dagger}({\bm{X}})$
	$\displaystyle+$	$\displaystyle\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z},Z=z)}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\frac{g_{z^{\prime}d_{z^{\prime}}}(M,{\bm{X}})\kappa_{d_{z^{\prime}}}(M,{\bm{X}})}{g_{zd_{z}}(M,{\bm{X}})\kappa_{d_{z}}(M,{\bm{X}})}\left\{Y-\mu_{zd_{z}}(M,{\bm{X}})\right\}$
	$\displaystyle+$	$\displaystyle\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{p_{z^{\prime}d_{z^{\prime}}}({\bm{X}})\pi_{z^{\prime}}({\bm{X}})}\left\{\mu_{zd_{z}}(M,{\bm{X}})-\eta_{zz^{\prime}}^{\dagger}({\bm{X}})\right\}$
	$\displaystyle+$	$\displaystyle\left\{p_{z^{*}d^{*}}({\bm{X}})-kp_{01}({\bm{X}})\right\}\eta_{zz^{\prime}}^{\dagger}({\bm{X}}).$

$\displaystyle\widehat{\theta}^{(zz^{\prime}),\text{mr}}_{d_{1}d_{0}}(\xi)=$	$\displaystyle\mathbb{P}_{n}\Big\{\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{par}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{par}}({\bm{X}})}\right)\frac{\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}$
	$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z},Z=z)}{\widehat{p}_{zd_{z}}^{\text{par}}({\bm{X}})\widehat{\pi}_{z}^{\text{par}}({\bm{X}})}\frac{\widehat{r}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}(M,{\bm{X}})}{\widehat{r}_{zd_{z}}^{\text{par}}(M,{\bm{X}})}\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\left\{Y-\widehat{\mu}_{zd_{z}}^{\text{par}}(M,{\bm{X}})\right\}$
	$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{z^{\prime}},Z=z^{\prime})}{\widehat{p}_{z^{\prime}d_{z^{\prime}}}^{\text{par}}({\bm{X}})\widehat{\pi}_{z^{\prime}}^{\text{par}}({\bm{X}})}\left\{\widehat{w}_{d_{1}d_{0}}^{(zz^{\prime})}(M,{\bm{X}})\widehat{\mu}_{zd_{z}}^{\text{par}}(M,{\bm{X}})-\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})\right\}$
	$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\widehat{\eta}_{zz^{\prime}}^{w,\text{par}}({\bm{X}})\Big\},$	(s8)

	$\displaystyle\widehat{\theta}^{(10),\text{mr}}_{d_{1}d_{0}}(t)=$	$\displaystyle\mathbb{P}_{n}\Big\{\left(\frac{\mathbb{I}(Z=z^{*})\left\{\mathbb{I}(D=d^{*})-\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{z^{*}}^{\text{par}}({\bm{X}})}-k\frac{(1-Z)\left\{D-\widehat{p}_{01}^{\text{par}}({\bm{X}})\right\}}{\widehat{\pi}_{0}^{\text{par}}({\bm{X}})}\right)\frac{\widehat{\eta}_{10}^{\rho,\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}$
		$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{1},Z=1)}{\widehat{p}_{1d_{1}}^{\text{par}}({\bm{X}})\widehat{\pi}_{1}^{\text{par}}({\bm{X}})}\frac{\widehat{r}_{0d_{0}}^{\text{par}}(M,{\bm{X}})}{\widehat{r}_{1d_{1}}^{\text{par}}(M,{\bm{X}})}\widehat{\rho}_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\left\{Y-\widehat{\mu}_{1d_{1}}^{\text{par}}(M,{\bm{X}})\right\}$
		$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\frac{\mathbb{I}(D=d_{0},Z=0)}{\widehat{p}_{0d_{0}}^{\text{par}}({\bm{X}})\widehat{\pi}_{0}^{\text{par}}({\bm{X}})}\left\{\widehat{\rho}_{d_{1}d_{0}}^{(10)}(M,{\bm{X}})\widehat{\mu}_{1d_{1}}^{\text{par}}(M,{\bm{X}})-\widehat{\eta}_{10}^{\rho,\text{par}}({\bm{X}})\right\}$
		$\displaystyle+\frac{\widehat{p}_{z^{*}d^{*}}^{\text{par}}({\bm{X}})-k\widehat{p}_{01}^{\text{par}}({\bm{X}})}{\widehat{p}_{z^{*}d^{*}}^{\text{dr}}-k\widehat{p}_{01}^{\text{dr}}}\widehat{\eta}_{10}^{\rho,\text{par}}({\bm{X}})\Big\},$