
Semiparametric Inference of the Complier Average Causal Effect with Nonignorable Missing Outcomes

Hua Chen (Institute of Applied Physics and Computational Mathematics, Beijing, 100088, China), Peng Ding (Department of Statistics, Harvard University, One Oxford Street, Cambridge, MA 02138, USA; corresponding author's email: pengding@fas.harvard.edu), Zhi Geng (School of Mathematical Sciences, Peking University, Beijing, 100871, China), and Xiaohua Zhou (Department of Biostatistics, University of Washington, and Biostatistics Unit, HSR&D Center of Excellence, VA Puget Sound Health Care System, Seattle, Washington 98101, USA)
Abstract

Noncompliance and missing data often occur in randomized trials, complicating the inference of causal effects. When both are present, previous papers proposed moment and maximum likelihood estimators for binary and normally distributed continuous outcomes under the latent ignorable missing data mechanism. However, the latent ignorable missing data mechanism may be violated in practice, because the missingness may depend directly on the missing outcome itself. Under noncompliance and an outcome-dependent nonignorable missing data mechanism, previous studies showed the identifiability of the complier average causal effect for discrete outcomes. In this paper, we study the semiparametric identifiability and estimation of the complier average causal effect in randomized clinical trials with both all-or-none noncompliance and outcome-dependent nonignorable missing continuous outcomes, and propose a two-step maximum likelihood estimator that eliminates the infinite-dimensional nuisance parameter. Our method does not require a parametric form for the missing data mechanism. We evaluate the finite-sample properties of our method via extensive simulation studies and sensitivity analyses, and apply it to a double-blinded psychiatric clinical trial.


Key Words: Causal inference; Instrumental variable; Missing not at random; Noncompliance; Outcome-dependent missing; Principal stratification.

1 Introduction

Randomization is an effective way to study the average causal effects (ACEs) of new drugs or training programs. However, randomized trials are often plagued with noncompliance and missing data, which complicate statistical inference and may bias estimation. The noncompliance problem happens when some subjects fail to comply with their assigned treatments, and the missing data problem happens when investigators fail to collect information for some subjects. Ignoring noncompliance and missing data may lead to biased estimators of the ACEs.

The noncompliance problem has attracted a lot of attention in the literature. Efron and Feldman (1991) studied the noncompliance problem before the principal stratification framework (Frangakis and Rubin, 2002) was proposed. In the presence of noncompliance, Balke and Pearl (1997) proposed large sample bounds of the ACEs for binary outcomes using the linear programming method. Angrist et al. (1996) discussed the identifiability of the causal effect using the instrumental variable method. Imbens and Rubin (1997) proposed a Bayesian method to estimate the complier average causal effect (CACE). When some outcomes are missing, the identifiability and estimation of the CACE are more complicated, and different types of missing data mechanisms have sizable impacts on the identifiability and estimation of the CACE. Frangakis and Rubin (1999) established the identifiability and proposed a moment estimator of the CACE under the latent ignorable (LI) missing data mechanism. Under the LI missing data mechanism, Zhou and Li (2006) and O'Malley and Normand (2005) proposed Expectation-Maximization (EM) algorithms (Dempster et al., 1977) to find the maximum likelihood estimators (MLEs) of the CACE for binary and normally distributed outcomes, respectively. Barnard et al. (2003) proposed a Bayesian approach to estimate the CACE with bivariate outcomes and covariate adjustment. Taylor and Zhou (2011) proposed a multiple imputation method to estimate the CACE for clustered encouragement design studies.

However, the LI assumption may be implausible in some clinical studies where the missing data mechanism may depend on the missing outcome. Chen et al. (2009) and Imai (2009) discussed the identifiability of the CACE for discrete outcomes under the outcome-dependent nonignorable (ODN) missing data mechanism. To the best of our knowledge, no published papers have studied the identifiability of the CACE for continuous outcomes under the ODN assumption. In this paper, we show that the CACE is semiparametrically identifiable under some regularity conditions, and propose estimation methods for the CACE with continuous outcomes under the ODN assumption. For our semiparametric method, we need only assume that the distribution of the outcomes belongs to the exponential family, without specifying a parametric form for the missing data mechanism.

This paper proceeds as follows. In Section 2, we discuss the notation and assumptions used in this paper and define the parameter of interest. In Section 3, we show the semiparametric identifiability and propose a two-step maximum likelihood estimator (TSMLE). In Section 4, we use several simulation studies to illustrate the finite sample properties of our proposed estimators and consider sensitivity analysis to assess the robustness of our estimation strategy. In Section 5, we analyze a double-blinded randomized clinical trial using the methods proposed in this paper. We conclude with a discussion and provide all proofs in the Appendices.

2 Notation and Assumptions

We consider a randomized trial with a continuous outcome. For the $i$-th subject, let $Z_i$ denote the randomized treatment assignment (1 for treatment and 0 for control). Let $D_i$ denote the treatment received (1 for treatment and 0 for control). When $Z_i \neq D_i$, there exists noncompliance. Let $Y_i$ denote the outcome variable. Let $R_i$ denote the missing data indicator of $Y_i$, i.e., $R_i=1$ if $Y_i$ is observed and $R_i=0$ if $Y_i$ is missing. First, we need to make the following fundamental assumption.

Assumption 1 (Stable unit treatment value assumption, SUTVA):

There is no interference between units, which means that the potential outcomes of one individual do not depend on the treatment status of other individuals (Rubin, 1980), and there is only one version of potential outcome of a certain treatment (Rubin, 1986).

Except in the dependent case for infectious diseases (Hudgens and Halloran, 2008), the SUTVA assumption is reasonable in many cases. Under the SUTVA assumption, we define $D_i(z)$, $Y_i(z)$ and $R_i(z)$ as the potential treatment received, the potential outcome measured, and the potential missing data indicator for subject $i$ if he/she were assigned to treatment $z$. These variables are potential outcomes because only one of the pairs $\{D_i(1),Y_i(1),R_i(1)\}$ and $\{D_i(0),Y_i(0),R_i(0)\}$ can be observed. Since $Z_i$ is the observed treatment assignment for subject $i$, $D_i=D_i(Z_i)$, $Y_i=Y_i(Z_i)$, and $R_i=R_i(Z_i)$ are the observed treatment received, the observed outcome, and the observed missing data indicator.

Under the principal stratification framework (Angrist et al., 1996; Frangakis and Rubin, 2002), we let $U_i$ be the compliance status of subject $i$, defined as follows:

$$U_i=\begin{cases}a,&\text{if }D_i(1)=1\text{ and }D_i(0)=1;\\ c,&\text{if }D_i(1)=1\text{ and }D_i(0)=0;\\ d,&\text{if }D_i(1)=0\text{ and }D_i(0)=1;\\ n,&\text{if }D_i(1)=0\text{ and }D_i(0)=0;\end{cases} \quad (5)$$

where $a$, $c$, $d$ and $n$ represent ``always-taker'', ``complier'', ``defier'' and ``never-taker'', respectively. Here $U_i$ is an unobserved variable, because we can observe only $D_i(1)$ or $D_i(0)$ for subject $i$, but not both. The CACE of $Z$ on $Y$ is the parameter of interest, defined as

$$CACE(Z\rightarrow Y)=E\{Y(1)-Y(0)\mid U=c\}.$$

The CACE is a subgroup causal effect for the compliers, whose compliance status is incompletely observed. Next, we give some sufficient conditions about the latent variables to make $CACE(Z\rightarrow Y)$ identifiable in the presence of noncompliance and nonignorable missing outcomes.

Assumption 2 (Randomization):

The treatment assignment $Z$ is completely randomized.

Randomization means that $Z$ is independent of $\{D(1),D(0),Y(1),Y(0),R(1),R(0)\}$, and we define $\xi=P\{Z=1\mid D(1),D(0),Y(1),Y(0),R(1),R(0)\}=P(Z=1)$. Under the randomization assumption, $CACE(Z\rightarrow Y)$ can be expressed as

$$CACE(Z\rightarrow Y)=E(Y\mid Z=1,U=c)-E(Y\mid Z=0,U=c).$$
Assumption 3 (Monotonicity):

$D_i(1)\geq D_i(0)$ for each subject $i$.

The monotonicity of $D_i(z)$ implies that there are no defiers. Define $\omega_u=P(U=u)$ for $u=a,c,d,n$; the monotonicity assumption implies $\omega_d=0$. Assumption 3 is plausible when the treatment assignment has a nonnegative effect on the treatment received for each subject, and it holds directly when the treatment is not available to subjects in the control arm, meaning $D_i(0)=0$ for all subjects. The monotonicity assumption implies a positive ACE of $Z$ on $D$. However, under general circumstances, Assumption 3 is not fully testable, since only one of $D_i(1)$ and $D_i(0)$ can be observed.

Assumption 4:

$ACE(Z\rightarrow D)\neq 0$.

By randomization, we have $ACE(Z\rightarrow D)=P(D=1\mid Z=1)-P(D=1\mid Z=0)\neq 0$ under Assumption 4, and therefore $Z$ is correlated with $D$. Without Assumption 4, we have $P(D=1\mid Z=1)=P(D=1\mid Z=0)$, which implies $\omega_c=0$ under Assumption 3. Since we are interested in the identifiability of $CACE(Z\rightarrow Y)$, Assumption 4 is necessary.

Assumption 5 (Compound exclusion restrictions):

For never-takers and always-takers, we assume $P\{Y(1),R(1)\mid U=n\}=P\{Y(0),R(0)\mid U=n\}$ and $P\{Y(1),R(1)\mid U=a\}=P\{Y(0),R(0)\mid U=a\}$.

The traditional exclusion restriction assumes $P\{Y(1)\mid U=u\}=P\{Y(0)\mid U=u\}$ for $u=a$ and $n$. Frangakis and Rubin (1999) extended it to the compound exclusion restrictions, imposing a similar assumption on the joint vector of the outcome and the missing data indicator. Assumption 5 is reasonable in a double-blinded clinical trial, because the patients do not know the treatment assigned to them, and thus $Z$ has no ``direct effect'' on the outcome or the missing data indicator. However, when the missing data indicator depends on the treatment assigned, the compound exclusion restrictions may be violated. When $Z$ is randomized, Assumption 5 is equivalent to $P(Y,R\mid Z=1,U=n)=P(Y,R\mid Z=0,U=n)$ and $P(Y,R\mid Z=1,U=a)=P(Y,R\mid Z=0,U=a)$.

Assumption 6 (Outcome-dependent nonignorable missing data):

For all $y$; $z=0,1$; $d=0,1$; and $u\in\{a,c,n\}$, we assume

$$P\{R(z)=1\mid Y(z)=y,D(z)=d,U=u\}=P\{R(z)=1\mid Y(z)=y\}, \quad (6)$$
$$P\{R(1)=1\mid Y(1)=y\}=P\{R(0)=1\mid Y(0)=y\}. \quad (7)$$

When $Z$ is randomized, equation (6) becomes $P(R=1\mid Y=y,D=d,U=u,Z=z)=P(R=1\mid Y=y,Z=z)$, and (7) becomes $P(R=1\mid Y=y,Z=1)=P(R=1\mid Y=y,Z=0)$. Define $\rho(y)=P(R=1\mid Y=y)$. Therefore Assumptions 2 and 6 imply that $\rho(y)=P(R=1\mid Y=y,D=d,U=u,Z=z)$. Hence $R$ depends on $Y$, but is independent of $(Z,D,U)$ given $Y$.
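To illustrate what the ODN restriction implies, the following sketch generates missingness from $\rho(y)$ alone and checks that, within a narrow window of $Y$, the observed response rate is essentially the same whether or not the treatment was received. The logistic form of $\rho(y)$ and all numerical values here are hypothetical choices for illustration; the method itself places no parametric form on $\rho$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
D = rng.binomial(1, 0.5, N)          # treatment received (hypothetical design)
Y = rng.normal(3 + 2 * D, 1.0)       # outcome depends strongly on D
rho = 1 / (1 + np.exp(-(Y - 3)))     # ODN: response probability depends on Y only
R = rng.binomial(1, rho)

# Conditional on Y (here, a narrow window around y = 3), the response
# rate no longer varies with D, even though marginally it does.
win = (Y > 2.8) & (Y < 3.2)
rate_d0 = R[win & (D == 0)].mean()
rate_d1 = R[win & (D == 1)].mean()
print(round(rate_d0, 2), round(rate_d1, 2))
```

Marginally, the $D=1$ arm has a much higher response rate (its outcomes are larger), but conditional on $Y$ the rates agree, which is exactly the statement $R \perp (Z,D,U) \mid Y$.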

In previous papers (Frangakis and Rubin, 1999; O'Malley and Normand, 2005; Zhou and Li, 2006), the LI assumption is used for modeling missing data; it states that the potential outcomes and the associated potential nonresponse indicators are independent within each principal stratum, that is, $P\{R(z)\mid Y(z),D(z),U\}=P\{R(z)\mid U\}$. Under the ODN missing data mechanism, the missing data indicator depends on the possibly missing outcome $Y$, which may be more reasonable than the LI assumption in some applications. For example, some patients may have higher probabilities of leaving the trial if their health outcomes are not good, and they may be more likely to stay in the trial otherwise. We illustrate the LI and ODN missing data mechanisms using the graphical models in Figure 1. Note that the arrows from $Z$ to $R$ are absent because of the compound exclusion restriction assumption.

[Figure 1 shows two directed graphs over $(U,Z,D,Y,R)$: in panel (a), under LI, $R$ depends on $U$ but not on $Y$; in panel (b), under ODN, $R$ depends on $Y$ only.]
(a) LI under Assumptions 2 and 5 (b) ODN under Assumptions 2 and 5
Figure 1: Graphical models for different missing data mechanisms

3 Semiparametric Identifiability and Estimation

In this section, we first discuss the difficulty of nonparametric identifiability when neither the outcome distribution nor the missing data mechanism takes a parametric form. If both the distribution of the outcome $Y$ and the missing data mechanism $\rho(y)$ are unspecified, the model is essentially not identifiable without further assumptions. We then propose a semiparametric method, specifying only the distribution of $Y$ without assuming any parametric form for the missing data mechanism. We show the identifiability and propose a TSMLE of $CACE(Z\rightarrow Y)$ under the assumption that the distribution of the outcome variable $Y$ belongs to the exponential family.

3.1 Semiparametric Identifiability

Under the SUTVA, randomization and monotonicity assumptions, we have $\xi=P(Z=1)$, $\omega_a=P(U=a)=P(D=1\mid Z=0)$, $\omega_n=P(U=n)=P(D=0\mid Z=1)$, and $\omega_c=1-\omega_a-\omega_n$. These parameters can be identified directly from the observed data. Next we focus on the identification of the parameters of $Y$.

Assumption 7:

The conditional density of the outcome variable $Y$ belongs to the following exponential family:

$$f_{zu}(y)=f(y\mid Z=z,U=u)=c(\theta_{zu})h(y)\exp\left\{\sum_{k=1}^{K}p_k(\theta_{zu})T_k(y)\right\}, \quad (8)$$

where $c(\cdot)$, $h(\cdot)$, $p_k(\cdot)$, and $T_k(\cdot)$ are known functions, and $\theta=\{\theta_{zu}: z=0,1;\ u=c,a,n\}$ are unknown parameters. We denote $f(y\mid Z=z,U=u)$ simply as $f_{zu}$ hereinafter.

The parametric assumption on the outcome is untestable in general, since the missing data mechanism depends arbitrarily on the outcome. But for binary outcomes, Small and Cheng (2009) proposed a goodness-of-fit test for the model under the ODN missing data mechanism. When the randomization assumption holds, the CACE is the difference between the expectations of the conditional densities of $Y$, that is,

$$CACE=E(Y\mid Z=1,U=c)-E(Y\mid Z=0,U=c)=\int yf_{1c}(y)\,dy-\int yf_{0c}(y)\,dy.$$

Hence if the parameters of $f_{zu}(y)$ are identified, the CACE is also identified. The exponential family defined by Assumption 7 includes many common distributions, such as normal distributions $N(\mu_{zu},\sigma^2)$, exponential distributions $Exp(\lambda_{zu})$ with mean $1/\lambda_{zu}$, Gamma distributions $Gamma(\alpha_{zu},\lambda)$ with shape parameter $\alpha_{zu}$ and rate parameter $\lambda$, and log-normal distributions $Lognormal(\mu_{zu},\sigma^2)$, for which the CACEs are $CACE_{nor}=\mu_{1c}-\mu_{0c}$, $CACE_{exp}=1/\lambda_{1c}-1/\lambda_{0c}$, $CACE_{gam}=\alpha_{1c}/\lambda-\alpha_{0c}/\lambda$, and $CACE_{log}=\exp\{\mu_{1c}+\sigma^2/2\}-\exp\{\mu_{0c}+\sigma^2/2\}$, respectively.
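As a quick numerical check of these formulas, here is a minimal sketch that evaluates them at the true complier parameters from Table 1 of the simulation study in Section 4:

```python
import math

# CACE for the four exponential-family models, at the complier
# parameters of Table 1 (mu_1c = 5, mu_0c = 4, etc.).
cace_nor = 5 - 4                      # mu_1c - mu_0c
cace_exp = 5 - 4                      # 1/lambda_1c - 1/lambda_0c, lambdas 1/5 and 1/4
cace_gam = 5 / 1 - 4 / 1              # alpha_1c/lambda - alpha_0c/lambda
cace_log = math.exp(0 + 1 / 2) - math.exp(-1 + 1 / 2)  # exp(mu_zc + sigma^2/2) terms
print(cace_nor, cace_exp, cace_gam, round(cace_log, 4))
```

The lognormal value, approximately 1.0422, agrees with the true value $CACE_{log}=1.0422$ reported in Table 2.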

Next, Theorem 1 shows the identifiability of $\theta$. The proof of Theorem 1 is provided in Appendix A. Assumption 5 implies $\theta_{1n}=\theta_{0n}$ and $\theta_{1a}=\theta_{0a}$, which we simplify as $\theta_n$ and $\theta_a$, respectively.

Theorem 1.

Under Assumptions 1 to 7, the vector $\eta=(p_1(\theta_n)-p_1(\theta_a),\ p_1(\theta_{1c})-p_1(\theta_a),\ p_1(\theta_{0c})-p_1(\theta_n),\ \ldots,\ p_K(\theta_n)-p_K(\theta_a),\ p_K(\theta_{1c})-p_K(\theta_a),\ p_K(\theta_{0c})-p_K(\theta_n),\ \log\{c(\theta_n)\}-\log\{c(\theta_a)\},\ \log\{c(\theta_{1c})\}-\log\{c(\theta_a)\},\ \log\{c(\theta_{0c})\}-\log\{c(\theta_n)\})$ is identifiable. If there exists a one-to-one mapping from the parameter set $\theta$ to the vector $\eta$, then $\theta$ is identifiable and so is the CACE.

The one-to-one mapping condition seems complicated, but it is reasonable and holds for many widely-used distributions, such as homoskedastic normal distributions, exponential distributions, etc. We will verify the one-to-one mapping condition for normal and exponential distributions in Appendix C and Appendix D. Other distributions such as heteroskedastic normal distributions, Gamma distributions and lognormal distributions can be verified similarly. However, counterexamples do exist, and we provide one in Appendix A.

3.2 TSMLE of CACE

Because we do not specify a parametric form for the missing data mechanism $\rho(y)$, the joint distribution of $(Z,U,D,Y,R)$ is not specified completely. Thus the MLEs of the parameters are hard to obtain, since the likelihood depends on the infinite-dimensional parameter $\rho(y)$, as shown in Appendix B. In this subsection, we propose a two-step likelihood method to estimate the parameters, which can be viewed as an example of the two-step maximum likelihood method studied by Murphy and Topel (2002).

In the first step, a consistent estimator for $\alpha=(\xi,\omega_a,\omega_n)$ can be obtained by maximum likelihood using the data $\{(Z_i,D_i): i=1,\ldots,N\}$. Let $N$ denote the sample size, $N_1=\#\{i: Z_i=1\}$, $N_0=\#\{i: Z_i=0\}$ and $n_{zd}=\#\{i: Z_i=z, D_i=d\}$ for $z=0,1$ and $d=0,1$. Then the log-likelihood function for $\alpha$ is

$$l_1(\alpha)=N_1\log\xi+N_0\log(1-\xi)+n_{11}\log(1-\omega_n)+n_{10}\log(\omega_n)+n_{01}\log(\omega_a)+n_{00}\log(1-\omega_a). \quad (9)$$

The MLE for $\alpha$ is $\hat{\alpha}=(\hat{\xi},\hat{\omega}_n,\hat{\omega}_a)=(N_1/N,\ n_{10}/N_1,\ n_{01}/N_0)$, which coincides with the moment estimator.
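The first step is just counting. A minimal sketch on hypothetical data, with $\xi=0.5$ and equal stratum probabilities chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4000
Z = rng.binomial(1, 0.5, N)                          # randomized assignment, xi = 0.5
U = rng.choice(["c", "n", "a"], size=N)              # strata with omega = 1/3 each
D = np.where(U == "a", 1, np.where(U == "n", 0, Z))  # monotonicity: no defiers

# first-step MLE, which equals the moment estimator
xi_hat = Z.mean()                                           # N1 / N
omega_n_hat = ((Z == 1) & (D == 0)).sum() / (Z == 1).sum()  # n10 / N1
omega_a_hat = ((Z == 0) & (D == 1)).sum() / (Z == 0).sum()  # n01 / N0
omega_c_hat = 1 - omega_n_hat - omega_a_hat
```

Note that only $(Z_i, D_i)$ enter this step; the outcomes and the missing data indicators play no role yet.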

In the second step, we propose a conditional likelihood method to estimate the parameter set $\theta$, based on the conditional probability of $(Z,D)$ given $Y$ and $R=1$. The proposed conditional likelihood function does not depend on the nuisance parameter $\rho(y)$, because the following equations (10) to (12) do not involve $\rho(y)$:

$$\begin{aligned}
&\log\left\{\frac{P(Z=1,D=0\mid Y=y,R=1)(1-\xi)}{P(Z=0,D=1\mid Y=y,R=1)\xi}\right\}\\
&\quad=\log\left\{\frac{P(U=n)f(y\mid Z=1,U=n)}{P(U=a)f(y\mid Z=0,U=a)}\right\}\\
&\quad=\sum_{k=1}^{K}\left\{p_k(\theta_n)-p_k(\theta_a)\right\}T_k(y)+\log\left\{\frac{\omega_n c(\theta_n)}{\omega_a c(\theta_a)}\right\},
\end{aligned} \quad (10)$$

$$\begin{aligned}
&\log\left\{\frac{P(Z=0,D=0\mid Y=y,R=1)\xi}{P(Z=1,D=0\mid Y=y,R=1)(1-\xi)}-1\right\}\\
&\quad=\log\left\{\frac{P(U=n)f(y\mid Z=1,U=n)+P(U=c)f(y\mid Z=0,U=c)}{P(U=n)f(y\mid Z=1,U=n)}-1\right\}\\
&\quad=\log\left\{\frac{P(U=c)f(y\mid Z=0,U=c)}{P(U=n)f(y\mid Z=1,U=n)}\right\}\\
&\quad=\sum_{k=1}^{K}\left\{p_k(\theta_{0c})-p_k(\theta_n)\right\}T_k(y)+\log\left\{\frac{\omega_c c(\theta_{0c})}{\omega_n c(\theta_n)}\right\},
\end{aligned} \quad (11)$$

$$\begin{aligned}
&\log\left\{\frac{P(Z=1,D=1\mid Y=y,R=1)(1-\xi)}{P(Z=0,D=1\mid Y=y,R=1)\xi}-1\right\}\\
&\quad=\log\left\{\frac{P(U=a)f(y\mid Z=1,U=a)+P(U=c)f(y\mid Z=1,U=c)}{P(U=a)f(y\mid Z=0,U=a)}-1\right\}\\
&\quad=\log\left\{\frac{P(U=c)f(y\mid Z=1,U=c)}{P(U=a)f(y\mid Z=0,U=a)}\right\}\\
&\quad=\sum_{k=1}^{K}\left\{p_k(\theta_{1c})-p_k(\theta_a)\right\}T_k(y)+\log\left\{\frac{\omega_c c(\theta_{1c})}{\omega_a c(\theta_a)}\right\}.
\end{aligned} \quad (12)$$

It is obvious that

$$\sum_{z=0}^{1}\sum_{d=0}^{1}P(Z=z,D=d\mid Y=y,R=1)=1. \quad (13)$$

The left-hand sides of equations (10) to (12) consist of $P(Z=z,D=d\mid Y=y,R=1)$ and $\xi$, with the latter identified from the first step. The right-hand sides of equations (10) to (12) consist of the parameters of interest. Therefore we can estimate $\theta$ through a likelihood method. Note that the right-hand sides do not depend on $\rho(y)$, so we do not need to specify the form of $\rho(y)$. Let $p_{zd}(\theta,\alpha;y)$ denote $P(Z=z,D=d\mid Y=y,R=1)$. Since $(Z,D)$ given $(Y=y,R=1)$ follows a multinomial distribution with four categories, the conditional log-likelihood function of $(Z,D)$ can be written as

$$l_2(\theta,\alpha)=\sum_{i=1}^{N}\sum_{z=0}^{1}\sum_{d=0}^{1}I(Z_i=z,D_i=d,R_i=1)\log p_{zd}(\theta,\alpha;y_i). \quad (14)$$

From the proof of Theorem 1, the parameter $\theta$ can be identified from the second likelihood function (14) after identifying $\alpha$ from the first likelihood function (9). Therefore, by maximizing $l_2(\theta,\hat{\alpha})$ over $\theta$, we obtain the maximizer $\hat{\theta}$. In practice, we can use the bootstrap method to approximate the sampling variance of the estimator of CACE.
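For homoskedastic normal outcomes, the whole two-step procedure can be sketched as below, on data simulated under the ``homo_normal'' design of Table 1. The optimizer, starting values and sample size are our choices for illustration, not part of the method; note that $\rho(y)$ cancels from $P(Z=z,D=d\mid Y=y,R=1)$, so the code never touches it.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
N = 4000

# --- data from the homo_normal design of Table 1 ---
Z = rng.binomial(1, 0.5, N)
U = rng.choice(["c", "n", "a"], size=N)
D = np.where(U == "a", 1, np.where(U == "n", 0, Z))
means = {("c", 1): 5.0, ("c", 0): 4.0, ("a", 1): 6.0,
         ("a", 0): 6.0, ("n", 1): 3.0, ("n", 0): 3.0}
Y = np.array([rng.normal(means[(u, z)], 1.0) for u, z in zip(U, Z)])
rho = np.where(Y <= 2, 0.85, np.where(Y >= 7, 0.80, 0.90))  # ODN mechanism
R = rng.binomial(1, rho)

# --- step 1: alpha-hat from (Z, D) alone ---
xi = Z.mean()
w_n = ((Z == 1) & (D == 0)).sum() / (Z == 1).sum()
w_a = ((Z == 0) & (D == 1)).sum() / (Z == 0).sum()
w_c = 1 - w_n - w_a

# --- step 2: maximize the conditional log-likelihood (14) over theta ---
y, z, d = Y[R == 1], Z[R == 1], D[R == 1]

def neg_l2(theta):
    mu_n, mu_a, mu_1c, mu_0c, log_s = theta
    s = np.exp(log_s)                      # common standard deviation
    f = lambda m: norm.pdf(y, m, s)
    # unnormalized P(Z=z, D=d | Y=y, R=1); rho(y) cancels in the ratio
    p10 = xi * w_n * f(mu_n)               # (Z=1, D=0): never-takers
    p11 = xi * (w_a * f(mu_a) + w_c * f(mu_1c))
    p01 = (1 - xi) * w_a * f(mu_a)         # (Z=0, D=1): always-takers
    p00 = (1 - xi) * (w_n * f(mu_n) + w_c * f(mu_0c))
    tot = p10 + p11 + p01 + p00
    p = np.where(z == 1, np.where(d == 1, p11, p10),
                 np.where(d == 1, p01, p00))
    return -np.sum(np.log(p / tot))

fit = minimize(neg_l2, x0=[2.5, 6.5, 5.5, 3.5, 0.0],
               method="Nelder-Mead", options={"maxiter": 5000})
cace_hat = fit.x[2] - fit.x[3]             # mu_1c - mu_0c; true value is 1.0
```

The bootstrap variance mentioned above would simply repeat both steps on resampled data.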

4 Simulation Studies and Sensitivity Analysis

We report simulation studies and sensitivity analyses to evaluate the finite-sample properties of the estimation methods proposed in this paper. In Tables 2-4, the columns labeled ``bias'', ``std. dev.'', ``95% CP'' and ``95% CI'' report the average bias, standard deviation, 95% coverage proportion and average 95% confidence interval, respectively.

First, we generate the outcomes under the ODN missing data mechanism from homoskedastic normal distributions (denoted ``homo_normal''), exponential distributions, Gamma distributions and log-normal distributions, respectively. We set the number of simulations to 10000, and choose sample sizes of 500, 1000, 2000 and 4000. We show the joint distributions of $(Z,U,D,Y,R)$ in Table 1, and report the results in Table 2. The estimates have small biases and standard deviations, which decrease as the sample sizes become larger, and all the confidence intervals of the CACE have empirical coverage proportions very close to their nominal values.

Table 1: True parameters for simulation
            homo_normal     Exponential   Gamma         Lognormal
Y(1)|U=c    N(5,1)          Exp(1/5)      Gamma(5,1)    Lognormal(0,1)
Y(0)|U=c    N(4,1)          Exp(1/4)      Gamma(4,1)    Lognormal(-1,1)
Y(z)|U=a    N(6,1)          Exp(1/6)      Gamma(6,1)    Lognormal(-1.5,1)
Y(z)|U=n    N(3,1)          Exp(1/3)      Gamma(3,1)    Lognormal(-0.5,1)
Z           Bernoulli(0.5)
U           $\omega_c=\omega_n=\omega_a=1/3$
R|Y=y       $\rho_y=I(y\leq 2)\times 0.85+I(y\geq 7)\times 0.8+I(2<y<7)\times 0.9$
Table 2: Results of simulation studies
true value            N     bias     std. dev.   95% CP   95% CI
$CACE_{nor}=1.0$      500   -0.0194  0.3395      0.9489   [0.3152, 1.6461]
                      1000  -0.0073  0.2343      0.9476   [0.5335, 1.4518]
                      2000  0.0022   0.1629      0.9500   [0.6828, 1.3215]
                      4000  -0.0019  0.1145      0.9504   [0.7736, 1.2225]
$CACE_{exp}=1.0$      500   0.0872   1.5910      0.9455   [-2.0312, 4.2056]
                      1000  0.0312   1.0309      0.9441   [-0.9893, 3.0517]
                      2000  0.0106   0.7091      0.9479   [-0.3793, 2.4004]
                      4000  0.0072   0.4891      0.9506   [0.0486, 1.9657]
$CACE_{gam}=1.0$      500   0.0830   1.6872      0.9915   [-2.2237, 4.3901]
                      1000  0.0284   0.5978      0.9625   [-0.1432, 2.2000]
                      2000  0.0108   0.3636      0.9493   [0.2981, 1.7235]
                      4000  0.0032   0.2530      0.9505   [0.5073, 1.4992]
$CACE_{log}=1.0422$   500   0.1156   0.7849      0.9617   [-0.3806, 2.6962]
                      1000  0.0599   0.4571      0.9494   [0.2061, 1.9981]
                      2000  0.0218   0.3093      0.9496   [0.4578, 1.6702]
                      4000  0.0106   0.2130      0.9469   [0.6353, 1.4702]

Second, we compare our method with the MLE proposed by O'Malley and Normand (2005) (``LI'' in Table 3) under five cases that violate either the homoskedastic normal outcome assumption or the ODN assumption. We repeat the simulation 10000 times with a sample size of 4000 in each case. The five cases, named ``Heter'', ``Unif'', ``T'', ``DY'' and ``DYU'', are shown in Table 3. The first case, ``Heter'', violates the homoskedastic normal outcome assumption. The next two cases, ``Unif'' and ``T'', violate the exponential family assumption. The last two cases, ``DY'' and ``DYU'', violate the ODN assumption. In the ``Heter'' case, we generate data from heteroskedastic normal outcomes; the data generating process is the same as ``homo_normal'' except that $Y(1)|U=c\sim N(5,0.25)$ and $Y(z)|U=a\sim N(6,0.30)$ for $z=0,1$. In the ``Unif'' case, the data are generated as in ``homo_normal'' except that the outcomes follow uniform distributions with $Y(1)|U=c\sim U[2,8]$, $Y(0)|U=c\sim U[1,7]$, $Y(z)|U=a\sim U[3,9]$ and $Y(z)|U=n\sim U[1,5]$, respectively. The data generating process in the ``T'' case is the same as ``homo_normal'' except that the outcomes follow $t$ distributions with the same means and 4 degrees of freedom. In the ``DY'' case, we generate data under a missing data mechanism depending on both $D$ and $Y$, choosing $P(R=1\mid D,Y)=0.8-I(Y>5)\times 0.5+I(D=1)\times 0.1-I(Y>5)I(D=0)\times 0.1$, with the other conditional distributions the same as ``homo_normal''. In the ``DYU'' case, we generate data with a missing data mechanism depending on $D$, $Y$ and $U$, choosing $P(R=1\mid D,Y,U)=(1+\exp\{5+0.1D-Y-0.1U\})^{-1}$, with the other conditional distributions the same as ``homo_normal'' and $U=1,2,3$ corresponding to $U=c,n,a$. From Table 3, we can see that the point estimator of our method is generally robust to four of the five violations of the assumptions. However, the results are worse in the ``Unif'' case, which has a large bias and a low 95% coverage proportion, and whose average 95% confidence interval does not cover the true value.

Table 3: Comparison of the methods assuming ODN and LI under five cases that violate the homoskedastic normal outcome assumption or the ODN assumption ($CACE_{true}=1.0$)
Method   Case   bias   std. dev.   95% CP   95% CI
ODN Heter -0.0268 0.0772 0.9363 [0.8220, 1.1244]
Unif -0.3815 0.1865 0.4676 [0.2530, 0.9841]
T -0.0350 0.1730 0.9427 [0.6960, 1.3740]
DY 0.0201 0.1555 0.9465 [0.7154, 1.3249]
DYU -0.0852 0.1691 0.9242 [0.5834, 1.2462]
LI Heter -0.0277 0.0677 0.9300 [0.8395, 1.1051]
Unif -0.8521 0.2474 0.0695 [-0.3369, 0.6327]
T 0.2244 0.1225 0.5577 [0.9843, 1.4646]
DY -0.0288 0.0894 0.9370 [0.7959, 1.1465]
DYU -0.1267 0.1321 0.8426 [0.6144, 1.1322]

Finally, we compare our method with the MLE proposed by O'Malley and Normand (2005) under the LI missing data mechanism (``LI'' in Table 4). We repeat the simulation 10000 times with a sample size of 4000 in each case. The data generating processes are the same as ``homo_normal'', but the missing data mechanisms are LI. Denote $\gamma_{du}=P(R=1\mid D=d,U=u)$, and choose $(\gamma_{1c},\gamma_{0c},\gamma_{0n},\gamma_{1a})=(0.8,0.75,0.7,0.9)$, $(0.9,0.7,0.8,0.7)$, $(0.7,0.6,0.6,0.8)$ and $(0.6,0.7,0.9,0.7)$ for ``LI1'' to ``LI4'', respectively, as shown in rows 1-4 and 5-8 of Table 4. Since the missing data mechanisms are LI, the ``LI'' method exhibits very small biases. Although the assumptions required by the ``ODN'' method do not hold, the biases are not very large except for the missing data mechanism LI4. The last case, LI4, has the largest variability among the $\gamma_{du}$'s and thus the largest bias for estimating the CACE, since its missing data mechanism has the ``strongest'' dependence on $D$ and $U$ but not $Y$. Next we generate data under the ODN assumption, and compare the methods under both the ODN and LI assumptions. Let $\rho(y;\delta)=I(y\leq 2)\times(0.9-\delta)+I(y\geq 7)\times(0.9-2\delta)+I(2<y<7)\times 0.9$, where $0<\delta<0.9$. As $\delta$ increases, the relationship between $Y$ and $R$ becomes stronger. The data are generated from the same joint distribution as ``homo_normal'' except for the different $\rho(y;\delta)$. The results are shown in Figure 2. The method under the ODN missing data mechanism has small bias and promising coverage properties irrespective of $\delta$, but the method under the LI missing data mechanism has larger bias and poorer coverage as $\delta$ grows.
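The sensitivity family $\rho(y;\delta)$ is a one-parameter deformation of the step mechanism in Table 1: $\delta=0.05$ recovers Table 1's response probabilities $(0.85, 0.80, 0.90)$, and $\delta=0$ makes the response probability constant in $y$. A minimal helper:

```python
import numpy as np

def rho(y, delta):
    """ODN response probability rho(y; delta); larger delta makes
    R depend more strongly on Y."""
    y = np.asarray(y, dtype=float)
    return np.where(y <= 2, 0.9 - delta,
                    np.where(y >= 7, 0.9 - 2 * delta, 0.9))

print(rho([1.0, 8.0, 5.0], 0.05))
```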

Table 4: Comparison of the methods assuming ODN and LI when LI holds ($CACE_{true}=1.0$)
Method   Case   bias   std. dev.   95% CP   95% CI
ODN LI1 -0.0291 0.1236 0.9438 [0.7287, 1.2132]
LI2 0.0976 0.1452 0.8961 [0.8130, 1.3823]
LI3 -0.0569 0.1371 0.9332 [0.6745, 1.2118]
LI4 -0.1961 0.1152 0.5966 [0.5781, 1.0297]
LI LI1 -0.0013 0.1123 0.9491 [0.7785, 1.2189]
LI2 -0.0010 0.1099 0.9504 [0.7836, 1.2145]
LI3 -0.0008 0.1262 0.9502 [0.7519, 1.2465]
LI4 -0.0015 0.1290 0.9505 [0.7456, 1.2515]
Figure 2: Comparison of the methods assuming ODN and LI when ODN holds ($CACE_{true}=1.0$)
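The effect of the outcome-dependent response probability $\rho(y;\delta)$ on naive complete-case estimates can be illustrated with a small simulation. The sketch below is only a simplified stand-in for the paper's data generating process (it uses a single hypothetical normal outcome rather than a mixture over compliance strata): it draws outcomes, deletes them according to $\rho(y;\delta)$, and shows that the complete-case mean drifts once $\delta>0$.

```python
import numpy as np

rng = np.random.default_rng(0)

def rho(y, delta):
    # Step-function response probability depending only on the outcome y:
    # 0.9 - delta for y <= 2, 0.9 - 2*delta for y >= 7, and 0.9 otherwise.
    return np.where(y <= 2, 0.9 - delta,
                    np.where(y >= 7, 0.9 - 2 * delta, 0.9))

def complete_case_mean(mu, sigma, delta, n=100_000):
    """Mean of the observed outcomes when R | Y = y is Bernoulli(rho(y; delta))."""
    y = rng.normal(mu, sigma, size=n)
    observed = rng.random(n) < rho(y, delta)  # R = 1 with probability rho(y)
    return y[observed].mean()

# delta = 0 gives MCAR, so the complete-case mean is unbiased; as delta grows,
# large outcomes are deleted more often and the complete-case mean is pulled down.
for delta in (0.0, 0.2, 0.4):
    print(delta, round(complete_case_mean(mu=4.5, sigma=2.0, delta=delta), 2))
```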

5 Application

We use the new methods proposed in this paper to re-analyze a psychiatric clinical trial: a double-blinded randomized study comparing the relative effects of clozapine and haloperidol in adults with refractory schizophrenia at fifteen Veterans Affairs medical centers. Clozapine has been found to be more efficacious than standard drugs in patients with refractory schizophrenia, yet it is associated with potentially fatal agranulocytosis. One objective of the trial is to evaluate the clinical effects of the two antipsychotic medications. The dataset has been analysed by Rosenheck et al. (1997), Levy et al. (2004), and O'Malley and Normand (2005). Some summary statistics of the data are given in Table 5; more details about the trial can be found in Rosenheck et al. (1997) and O'Malley and Normand (2005). In the treatment arm, 203 patients are randomized to clozapine; in the control arm, 218 patients are randomized to haloperidol. The outcome of interest is the positive and negative syndrome score (PANSS), with higher values indicating more severe symptoms. The baseline PANSS is nearly balanced between the two groups. The missing outcome proportions differ noticeably between the clozapine group (about $40/203\approx 0.20$) and the haloperidol group (about $59/218\approx 0.27$), so it is possible that the outcomes are not missing at random. The primary reasons for dropout in the clozapine group are side effects or non-drug-related reasons, whereas the reasons for discontinuing haloperidol are lack of efficacy or worsening of symptoms. Therefore, the missing data mechanism may depend directly on the missing outcome, and we think that the ODN assumption is more reasonable in this case.

Table 5: Summary statistics of the data from the psychiatric clinical trial
Received treatment
Assigned treatment Clozapine ($D=1$) Haloperidol ($D=0$) Total
Clozapine ($Z=1$)
    Sample size 122 81 203
    Missing sample size 0 40 40
    Mean of the baseline PANSS 90.83 91.20 90.98
Haloperidol ($Z=0$)
    Sample size 57 161 218
    Missing sample size 12 47 59
    Mean of the baseline PANSS 96.30 90.69 92.16
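Under randomization and monotonicity (no defiers), the compliance-strata proportions $\omega_{n}$, $\omega_{a}$ and $\omega_{c}$ can be estimated by simple moments from the counts in Table 5: never-takers appear as $D=0$ in the $Z=1$ arm, and always-takers as $D=1$ in the $Z=0$ arm. A minimal sketch:

```python
# Moment estimates of compliance-strata proportions from Table 5,
# assuming randomization and monotonicity (no defiers).
n_z1, n_z1_d0 = 203, 81   # assigned clozapine; of those, received haloperidol
n_z0, n_z0_d1 = 218, 57   # assigned haloperidol; of those, received clozapine

omega_n = n_z1_d0 / n_z1          # P(never-taker):  D = 0 despite Z = 1
omega_a = n_z0_d1 / n_z0          # P(always-taker): D = 1 despite Z = 0
omega_c = 1 - omega_n - omega_a   # P(complier), by monotonicity

print(round(omega_n, 3), round(omega_a, 3), round(omega_c, 3))
```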

The estimates of $CACE$ by different methods are shown in Table 6. In Table 6, the ``homo'' and ``hetero'' in parentheses after ``ODN'' correspond to the homoskedastic and heteroskedastic model assumptions, respectively, and ``LI'' corresponds to the MLE proposed by O'Malley and Normand (2005). The columns of Table 6 give the methods, point estimates, standard errors, and 95% and 90% confidence intervals, respectively. The bootstrap is used to compute standard errors and confidence intervals for all methods. From Table 6, we see that subjects in the clozapine group had symptom levels 5.00 points lower than those in the haloperidol group under the homoskedastic assumption, and 5.54 points lower under the heteroskedastic assumption. Both methods lead to the similar conclusion that clozapine is somewhat more effective than haloperidol for patients with refractory schizophrenia. Neither of the semiparametric estimates is significant at the 5% level, since both 95% confidence intervals include zero. Our results are similar to those of O'Malley and Normand (2005), who also obtained an insignificant result. However, the 90% confidence intervals differ: the result under the ODN mechanism with the heteroskedastic assumption is significant, but the results from the other two models are not.

Assuming different missing data mechanisms, such as LI and ODN, and different assumptions on the outcome variable, we obtain different point estimates and confidence intervals for $CACE$ from the psychiatric clinical trial data. When we have prior knowledge that the missing data mechanism depends only on the treatment received and the compliance status, the method under the LI missing data mechanism provides a more credible conclusion. However, when we have prior knowledge that the missing data mechanism depends directly on the outcome, we recommend our methods under the ODN missing data mechanism. The newly proposed methods can also serve as alternatives to the predominant methods assuming the LI missing data mechanism in sensitivity analysis.

Table 6: Estimates of $CACE$ by different methods
Method Estimate S.E. 95% CI 90% CI
ODN(homo) -5.00 3.05 [-10.98, 0.98] [-10.02, 0.02]
ODN(hetero) -5.54 3.05 [-11.52, 0.44] [-10.56, -0.52]
LI -4.36 4.35 [-12.89, 4.17] [-11.52, 2.80]
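The bootstrap used for the standard errors and confidence intervals in Table 6 can be sketched generically: resample subjects with replacement, re-estimate on each resample, and take the empirical standard deviation and percentiles. The function below is an illustrative implementation, not the code used in the paper, and the toy estimator (a sample mean) is a placeholder standing in for a CACE estimator.

```python
import numpy as np

def bootstrap_ci(data, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap SE and percentile CI for a generic estimator.

    `data` is an (n, k) array of subject-level records (e.g. Z, D, R, Y);
    `estimator` maps such an array to a scalar estimate.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    # Resample rows with replacement and re-apply the estimator.
    stats = np.array([estimator(data[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    se = stats.std(ddof=1)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return se, (lo, hi)

# Toy check with the sample mean as the estimator:
toy = np.random.default_rng(1).normal(0.0, 1.0, size=(500, 1))
se, (lo, hi) = bootstrap_ci(toy, lambda d: d[:, 0].mean())
```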

6 Discussion

Randomization is a powerful tool for measuring the relative causal effect of a treatment versus a control. Some subjects in randomized trials, however, may fail to comply with the assigned treatments or may drop out before the final outcomes are measured. Noncompliance and missing data make causal inference difficult, because the causal effects are not identifiable without additional assumptions. Under different assumptions about the missing data mechanism of the outcomes, the identifiability results and estimation methods may be fundamentally different. Most previous studies (Frangakis and Rubin, 1999; Barnard et al., 2003; O'Malley and Normand, 2005; Zhou and Li, 2006) rely on the LI assumption to identify the $CACE$, but the LI assumption may not be reasonable when the missing data mechanism depends on the outcome. Under the ODN missing data mechanism, Chen et al. (2009) and Imai (2009) showed identifiability and proposed the moment estimator and the MLE of the $CACE$ for discrete outcomes, but there were no results for continuous outcomes under both noncompliance and the ODN missing data mechanism. As a generalization of Chen et al. (2009) and Imai (2009), we study semiparametric identifiability and propose estimation methods for the $CACE$ with continuous outcomes under the ODN missing data mechanism.

The ODN assumption allows the missing data mechanism to depend on the outcome. However, the missing data processes in practical problems may be more complicated and may depend on other variables such as $Z$, $U$ and $D$. For example, a missing data mechanism depending on both the compliance status and the outcome may be reasonable in some real studies. Small and Cheng (2009) proposed a saturated model for $P(R=1\mid Z,U,Y)$, and the models under LI and ODN are special cases of their model. However, their model is generally not identifiable without restrictions on the parameters. It is worthwhile to study the identifiability of the $CACE$ under all possible restrictions on $P(R=1\mid Z,U,Y)$ and to perform sensitivity analysis for models lacking identifiability. We consider only cross-sectional data in this paper; generalizing our methods to longitudinal data is a topic for future research.

Acknowledgments

We would like to thank the Editor, the Associate Editor and three reviewers for their very valuable comments and suggestions. Chen's research was supported in part by NSFC 11101045, CAEP 2012A0201011 and CAEP 2013A0101004. Geng's research was supported by NSFC 11021463, 10931002 and 11171365. Zhou's research was supported in part by Department of Veterans Affairs HSR&D RCS Award 05-196; it does not necessarily represent the views of the VA HSR&D Service.

Appendices

Appendix A

Proof of Theorem 3.1. From equations (3.5) to (3.7), we can identify $(p_{1}(\theta_{n})-p_{1}(\theta_{a}), p_{1}(\theta_{1c})-p_{1}(\theta_{a}), p_{1}(\theta_{0c})-p_{1}(\theta_{n}), \ldots, p_{K}(\theta_{n})-p_{K}(\theta_{a}), p_{K}(\theta_{1c})-p_{K}(\theta_{a}), p_{K}(\theta_{0c})-p_{K}(\theta_{n}), \log\left\{c(\theta_{n})\right\}-\log\left\{c(\theta_{a})\right\}, \log\left\{c(\theta_{1c})\right\}-\log\left\{c(\theta_{a})\right\}, \log\left\{c(\theta_{0c})\right\}-\log\left\{c(\theta_{n})\right\})$ using generalized linear models. Therefore, $\theta$ is identifiable because of the one-to-one mapping.

Counterexample for identifiability. Consider the following exponential family:

\[
f_{zu}(y)=f(y\mid Z=z,U=u)=c(\theta_{zu})h(y)\exp\left\{\sum_{k=1}^{K}\theta_{zu,k}y^{k}\right\}.
\]

The number of unknown parameters contained in $\theta$ is $4K$, and the number of identifiable parameters contained in $\eta$ is $3(K+1)$. A necessary condition for the existence of a one-to-one mapping from $\theta$ to $\eta$ is $4K\leq 3(K+1)$, or, equivalently, $K\leq 3$. Therefore, when $K>3$, a one-to-one mapping from $\theta$ to $\eta$ does not exist.
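The counting argument can be checked directly:

```python
# Necessary condition for a one-to-one mapping from theta (4K parameters)
# to eta (3(K+1) identifiable parameters): 4K <= 3(K+1), i.e. K <= 3.
feasible = {K: 4 * K <= 3 * (K + 1) for K in range(1, 7)}
# K = 1, 2, 3 satisfy the condition; K >= 4 do not.
```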

Appendix B: Full likelihood for $(\alpha,\theta,\rho(y))$

Define $M_{zd}=\#\{i:Z_{i}=z,D_{i}=d,R_{i}=0\}$ for $z=0,1$ and $d=0,1$. Under the compound exclusion restriction, we have $f_{1n}(y)=f_{0n}(y)=f_{n}(y)$ and $f_{1a}(y)=f_{0a}(y)=f_{a}(y)$. The full likelihood for $(\alpha,\theta,\rho(y))$ is

\begin{align*}
L(\alpha,\theta,\rho(y)) \propto{}& \xi^{N_{1}}(1-\xi)^{N_{0}}(1-\omega_{n})^{n_{11}}\omega_{n}^{n_{10}}\omega_{a}^{n_{01}}(1-\omega_{a})^{n_{00}}\\
&\cdot\left[\frac{\omega_{c}}{\omega_{c}+\omega_{a}}\int\{1-\rho(y)\}f_{1c}(y)dy+\frac{\omega_{a}}{\omega_{c}+\omega_{a}}\int\{1-\rho(y)\}f_{a}(y)dy\right]^{M_{11}}\left[\int\{1-\rho(y)\}f_{n}(y)dy\right]^{M_{10}}\\
&\cdot\left[\int\{1-\rho(y)\}f_{a}(y)dy\right]^{M_{01}}\left[\frac{\omega_{c}}{\omega_{c}+\omega_{n}}\int\{1-\rho(y)\}f_{0c}(y)dy+\frac{\omega_{n}}{\omega_{c}+\omega_{n}}\int\{1-\rho(y)\}f_{n}(y)dy\right]^{M_{00}}\\
&\cdot\prod_{i:R_{i}=1}\rho(Y_{i})\cdot\prod_{i:(Z_{i},D_{i},R_{i})=(1,1,1)}\left\{\frac{\omega_{c}}{\omega_{c}+\omega_{a}}f_{1c}(Y_{i})+\frac{\omega_{a}}{\omega_{c}+\omega_{a}}f_{a}(Y_{i})\right\}\prod_{i:(Z_{i},D_{i},R_{i})=(1,0,1)}f_{n}(Y_{i})\\
&\cdot\prod_{i:(Z_{i},D_{i},R_{i})=(0,1,1)}f_{a}(Y_{i})\cdot\prod_{i:(Z_{i},D_{i},R_{i})=(0,0,1)}\left\{\frac{\omega_{c}}{\omega_{c}+\omega_{n}}f_{0c}(Y_{i})+\frac{\omega_{n}}{\omega_{c}+\omega_{n}}f_{n}(Y_{i})\right\}.
\end{align*}

Appendix C: Verification of homoskedastic normal distribution in Subsection 3.2

For homoskedastic normal outcomes, equations (3.5) to (3.7) can be re-written as:

\begin{align*}
a_{1}y+b_{1} &= \log\left\{\frac{P(Z=1,D=0\mid Y=y,R=1)(1-\xi)}{P(Z=0,D=1\mid Y=y,R=1)\xi}\right\}\\
&= \frac{\mu_{n}-\mu_{a}}{\sigma^{2}}y+\frac{\mu_{a}^{2}-\mu_{n}^{2}}{2\sigma^{2}}+\log\left\{\frac{\omega_{n}}{\omega_{a}}\right\},\\
a_{2}y+b_{2} &= \log\left\{\frac{P(Z=0,D=0\mid Y=y,R=1)\xi}{P(Z=1,D=0\mid Y=y,R=1)(1-\xi)}-1\right\}\\
&= \frac{\mu_{0c}-\mu_{n}}{\sigma^{2}}y+\frac{\mu_{n}^{2}-\mu_{0c}^{2}}{2\sigma^{2}}+\log\left\{\frac{\omega_{c}}{\omega_{n}}\right\},\\
a_{3}y+b_{3} &= \log\left\{\frac{P(Z=1,D=1\mid Y=y,R=1)(1-\xi)}{P(Z=0,D=1\mid Y=y,R=1)\xi}-1\right\}\\
&= \frac{\mu_{1c}-\mu_{a}}{\sigma^{2}}y+\frac{\mu_{a}^{2}-\mu_{1c}^{2}}{2\sigma^{2}}+\log\left\{\frac{\omega_{c}}{\omega_{a}}\right\}.
\end{align*}

Since $a_{i}$ and $b_{i}$ can be identified from generalized linear models, we can identify all the parameters from the above equations and obtain the following results:

\begin{align*}
\sigma^{2} &= 2\left[\frac{b_{1}-\log\left\{\frac{\omega_{n}}{\omega_{a}}\right\}}{a_{1}}-\frac{b_{2}-\log\left\{\frac{\omega_{c}}{\omega_{n}}\right\}}{a_{2}}\right]\Big/(a_{1}+a_{2})\\
&= 2\left[\frac{b_{1}-\log\left\{\frac{\omega_{n}}{\omega_{a}}\right\}}{a_{1}}-\frac{b_{3}-\log\left\{\frac{\omega_{c}}{\omega_{a}}\right\}}{a_{3}}\right]\Big/(a_{3}-a_{1}),\\
\mu_{1c} &= \frac{1}{2}a_{3}\sigma^{2}-\frac{b_{3}-\log\left\{\frac{\omega_{c}}{\omega_{a}}\right\}}{a_{3}},\\
\mu_{0c} &= \frac{1}{2}a_{2}\sigma^{2}-\frac{b_{2}-\log\left\{\frac{\omega_{c}}{\omega_{n}}\right\}}{a_{2}}.
\end{align*}

Therefore, we can identify $CACE=\mu_{1c}-\mu_{0c}$.
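The identification formulas above can be checked numerically: starting from hypothetical true parameters (values chosen for illustration, not from the paper's data), compute the regression coefficients $a_{i}$, $b_{i}$ implied by the homoskedastic normal model, then recover $\sigma^{2}$, $\mu_{1c}$ and $\mu_{0c}$.

```python
import math

# Hypothetical true parameters, chosen only to check the formulas.
mu_n, mu_a, mu_1c, mu_0c = 3.0, 5.0, 6.0, 4.0
sigma2 = 2.0
w_c, w_n, w_a = 0.5, 0.3, 0.2

# Regression coefficients implied by the homoskedastic normal model:
a1 = (mu_n - mu_a) / sigma2
b1 = (mu_a**2 - mu_n**2) / (2 * sigma2) + math.log(w_n / w_a)
a2 = (mu_0c - mu_n) / sigma2
b2 = (mu_n**2 - mu_0c**2) / (2 * sigma2) + math.log(w_c / w_n)
a3 = (mu_1c - mu_a) / sigma2
b3 = (mu_a**2 - mu_1c**2) / (2 * sigma2) + math.log(w_c / w_a)

# Recover the parameters via the identification formulas:
sigma2_hat = 2 * ((b1 - math.log(w_n / w_a)) / a1
                  - (b2 - math.log(w_c / w_n)) / a2) / (a1 + a2)
mu_1c_hat = 0.5 * a3 * sigma2_hat - (b3 - math.log(w_c / w_a)) / a3
mu_0c_hat = 0.5 * a2 * sigma2_hat - (b2 - math.log(w_c / w_n)) / a2
cace_hat = mu_1c_hat - mu_0c_hat  # recovers mu_1c - mu_0c
```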

Appendix D: Verification of the exponential distribution in Subsection 3.2

For the exponentially distributed outcomes, equations (3.5) to (3.7) can be re-written as:

\begin{align*}
a_{1}y+b_{1} &= \log\left\{\frac{P(Z=1,D=0\mid Y=y,R=1)(1-\xi)}{P(Z=0,D=1\mid Y=y,R=1)\xi}\right\}\\
&= (\lambda_{a}-\lambda_{n})y+\log\left\{\frac{\omega_{n}\lambda_{n}}{\omega_{a}\lambda_{a}}\right\},\\
a_{2}y+b_{2} &= \log\left\{\frac{P(Z=0,D=0\mid Y=y,R=1)\xi}{P(Z=1,D=0\mid Y=y,R=1)(1-\xi)}-1\right\}\\
&= (\lambda_{n}-\lambda_{0c})y+\log\left\{\frac{\omega_{c}\lambda_{0c}}{\omega_{n}\lambda_{n}}\right\},\\
a_{3}y+b_{3} &= \log\left\{\frac{P(Z=1,D=1\mid Y=y,R=1)(1-\xi)}{P(Z=0,D=1\mid Y=y,R=1)\xi}-1\right\}\\
&= (\lambda_{a}-\lambda_{1c})y+\log\left\{\frac{\omega_{c}\lambda_{1c}}{\omega_{a}\lambda_{a}}\right\}.
\end{align*}

Since $a_{i}$ and $b_{i}$ can be identified from generalized linear models, we can identify all the parameters from the above equations and obtain the following results:

\begin{align*}
\lambda_{0c} &= \frac{a_{1}\exp\{b_{1}-\log\{\omega_{n}/\omega_{a}\}\}}{1-\exp\{b_{1}-\log\{\omega_{n}/\omega_{a}\}\}}-a_{2},\\
\lambda_{1c} &= \frac{a_{1}}{1-\exp\{b_{1}-\log\{\omega_{n}/\omega_{a}\}\}}-a_{3},\\
\text{and}\quad CACE &= \frac{1}{\lambda_{1c}}-\frac{1}{\lambda_{0c}}.
\end{align*}
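As in the normal case, these formulas can be checked numerically with hypothetical true rates (values chosen for illustration, not from the paper): note that $\exp\{b_{1}-\log(\omega_{n}/\omega_{a})\}=\lambda_{n}/\lambda_{a}$, which drives both recoveries.

```python
import math

# Hypothetical true rates, chosen only to check the formulas.
lam_n, lam_a, lam_1c, lam_0c = 0.5, 1.0, 0.8, 1.2
w_c, w_n, w_a = 0.5, 0.3, 0.2

# Regression coefficients implied by the exponential model:
a1 = lam_a - lam_n
b1 = math.log(w_n * lam_n / (w_a * lam_a))
a2 = lam_n - lam_0c
a3 = lam_a - lam_1c

r = math.exp(b1 - math.log(w_n / w_a))  # equals lam_n / lam_a
lam_0c_hat = a1 * r / (1 - r) - a2
lam_1c_hat = a1 / (1 - r) - a3
cace_hat = 1 / lam_1c_hat - 1 / lam_0c_hat  # recovers 1/lam_1c - 1/lam_0c
```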

References

  • Angrist et al. (1996) J. D. Angrist, G. W. Imbens, and D. B. Rubin. Identification of causal effects using instrumental variables (with discussion). Journal of the American Statistical Association, 91:444–455, 1996.
  • Balke and Pearl (1997) A. Balke and J. Pearl. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92:1171–1176, 1997.
  • Barnard et al. (2003) J. Barnard, C. E. Frangakis, J. L. Hill, and D. B. Rubin. Principal stratification approach to broken randomized experiments: A case study of school choice vouchers in New York City (with discussion). Journal of the American Statistical Association, 98:299–314, 2003.
  • Chen et al. (2009) H. Chen, Z. Geng, and X. H. Zhou. Identifiability and estimation of causal effects in randomized trials with noncompliance and completely non-ignorable missing data (with discussion). Biometrics, 65:675–682, 2009.
  • Dempster et al. (1977) A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 39:1–38, 1977.
  • Efron and Feldman (1991) B. Efron and D. Feldman. Compliance as an explanatory variable in clinical trials (with discussion). Journal of the American Statistical Association, 86:9–17, 1991.
  • Frangakis and Rubin (1999) C. E. Frangakis and D. B. Rubin. Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika, 86:365–379, 1999.
  • Frangakis and Rubin (2002) C. E. Frangakis and D. B. Rubin. Principal stratification in causal inference. Biometrics, 58:21–29, 2002.
  • Hudgens and Halloran (2008) M. G. Hudgens and M. E. Halloran. Toward causal inference with interference. Journal of the American Statistical Association, 103:832–842, 2008.
  • Imai (2009) K. Imai. Statistical analysis of randomized experiments with non-ignorable missing binary outcomes: an application to a voting experiment. Journal of the Royal Statistical Society: Series C (Applied Statistics), 58:83–104, 2009.
  • Imbens and Rubin (1997) G. W. Imbens and D. B. Rubin. Bayesian inference for causal effects in randomized experiments with noncompliance. The Annals of Statistics, 25:305–327, 1997.
  • Levy et al. (2004) D. E. Levy, A. J. O'Malley, and S. L. T. Normand. Covariate adjustment in clinical trials with non-ignorable missing data and non-compliance. Statistics in Medicine, 23:2319–2339, 2004.
  • Murphy and Topel (2002) K.M. Murphy and R.H. Topel. Estimation and inference in two-step econometric models. Journal of Business and Economic Statistics, 20:88–97, 2002.
  • O'Malley and Normand (2005) A. J. O'Malley and S. L. T. Normand. Likelihood methods for treatment noncompliance and subsequent nonresponse in randomized trials. Biometrics, 61:325–334, 2005.
  • Rosenheck et al. (1997) R. Rosenheck, J. Cramer, W. Xu, J. Thomas, W. Henderson, L. Frisman, C. Fye, and D. Charney. A comparison of clozapine and haloperidol in hospitalized patients with refractory schizophrenia. New England Journal of Medicine, 337:809–815, 1997.
  • Rubin (1980) D. B. Rubin. Comment on ``Randomization analysis of experimental data: the Fisher randomization test'' by D. Basu. Journal of the American Statistical Association, 75:591–593, 1980.
  • Rubin (1986) D. B. Rubin. Comments on ``Statistics and causal inference'' by Paul Holland: Which ifs have causal answers. Journal of the American Statistical Association, 81:961–962, 1986.
  • Small and Cheng (2009) D. S. Small and J. Cheng. Discussion of ``Identifiability and estimation of causal effects in randomized trials with noncompliance and completely non-ignorable missing data'' by Hua Chen, Zhi Geng and Xiaohua Zhou. Biometrics, 65:682–685, 2009.
  • Taylor and Zhou (2011) L. Taylor and X. H. Zhou. Methods for clustered encouragement design studies with noncompliance and missing data. Biostatistics, 12:313–326, 2011.
  • Zhou and Li (2006) X. H. Zhou and S. M. Li. ITT analysis of randomized encouragement design studies with missing data. Statistics in Medicine, 25:2737–2761, 2006.