
Unit Selection with Nonbinary Treatment and Effect

Ang Li, Judea Pearl
Abstract

The unit selection problem aims to identify a set of individuals who are most likely to exhibit a desired mode of behavior, for example, selecting individuals who would respond one way if encouraged and a different way if not encouraged. Using a combination of experimental and observational data, Li and Pearl derived tight bounds on the “benefit function”, which is the payoff/cost associated with selecting an individual with given characteristics. This paper extends the benefit function to a general form in which neither the treatment nor the effect is restricted to being binary. We propose an algorithm to test the identifiability of the nonbinary benefit function and an algorithm to compute its bounds using experimental and observational data.

Introduction

Several areas of industry, marketing, and health science face the unit selection dilemma. For example, in customer relationship management (Berson, Smith, and Thearling 1999; Lejeune 2001; Hung, Yen, and Wang 2006; Tsai and Lu 2009), it is useful to determine the customers who are going to leave but might reconsider if encouraged to stay. Due to the high expense of such initiatives, management is forced to limit inducement to customers who are most likely to exhibit the behavior of interest. As another example, in online advertising, companies are interested in identifying users who would click on an advertisement if and only if it is highlighted (Yan et al. 2009; Bottou et al. 2013; Li et al. 2014; Sun et al. 2015). The challenge in identifying these users stems from the fact that the desired response pattern is not observed directly but rather is defined counterfactually in terms of what the individual would do under hypothetical unrealized conditions. For example, when we observe that a user has clicked on a highlighted advertisement, we do not know whether they would click on that same advertisement if it were not highlighted.

The binary benefit function for the unit selection problem was defined by Li and Pearl (Li and Pearl 2019) (we will call this Li-Pearl's model), and it properly captures the nature of the desired behavior. Using a combination of experimental and observational data, Li and Pearl derived tight bounds on the benefit function. The only assumption is that the treatment has no effect on the population-specific characteristics. Inspired by the idea of Mueller, Li, and Pearl (Mueller, Li, and Pearl 2021) and Dawid et al. (Dawid, Musio, and Murtas 2017) that bounds on probabilities of causation can be narrowed using covariate information, Li and Pearl (Li and Pearl 2022c) narrowed the bounds of the benefit function using covariate information and its causal structure. However, the aforementioned studies are restricted to binary treatment and effect. Recently, researchers have shown interest in developing bounds for probabilities of causation with nonbinary treatment and effect. Zhang, Tian, and Bareinboim (Zhang, Tian, and Bareinboim 2022), as well as Li and Pearl (Li and Pearl 2022a), proposed nonlinear-programming-based solutions that compute the bounds of nonbinary probabilities of causation numerically. Li and Pearl (Li and Pearl 2022b) provided theoretical bounds on nonbinary probabilities of causation. The benefit function is a linear combination of probabilities of causation; therefore, in this paper, we focus on discovering the bounds of any benefit function without restricting the treatment and effect to be binary.

Consider the following motivating scenario: a clinical study is conducted to test the effectiveness of a vaccine. The treatments include vaccinated and unvaccinated. The outcomes include uninfected, asymptomatic infected, and infected in a severe condition. The benefited individuals include the following: the individual who would be infected in a severe condition if unvaccinated and would be asymptomatic infected if vaccinated, the individual who would be infected in a severe condition if unvaccinated and would be uninfected if vaccinated, and the individual who would be asymptomatic infected if unvaccinated and would be uninfected if vaccinated. The harmed individuals include the following: the individual who would be asymptomatic infected if unvaccinated and would be infected in a severe condition if vaccinated, the individual who would be uninfected if unvaccinated and would be infected in a severe condition if vaccinated, and the individual who would be uninfected if unvaccinated and would be asymptomatic infected if vaccinated. All others are unaffected individuals. The researcher performing the clinical study has collected both experimental and observational data. The researcher then wants to know the expected difference between benefited and harmed individuals to emphasize the effectiveness of the vaccine.

We cannot apply Li-Pearl's model because we have two treatments and three outcomes. In this paper, we extend Li-Pearl's benefit function to a general form without restricting the treatment and effect to be binary. We provide an algorithm to test the identifiability of the nonbinary benefit function and an algorithm to compute its bounds using experimental and observational data.

Preliminaries

In this section, we review Li and Pearl’s binary benefit function of the unit selection problem (Li and Pearl 2019), and the theoretical bounds of the probabilities of causation recently proposed by Li and Pearl (Li and Pearl 2022b).

In this paper, we use the language of counterfactuals in structural model semantics, as given in (Galles and Pearl 1998; Halpern 2000). We use Y_x = y to denote the counterfactual sentence “Variable Y would have the value y, had X been x”. For simplicity, in the rest of the paper, we use y_x to denote the event Y_x = y, y_{x'} to denote the event Y_{x'} = y, y'_x to denote the event Y_x = y', and y'_{x'} to denote the event Y_{x'} = y'. We assume that experimental data are summarized in the form of causal effects such as P(y_x), and observational data are summarized in the form of the joint probability function such as P(x, y). If not specified, the variable X stands for the treatment and the variable Y stands for the effect.

Individual behavior was classified into four response types, labeled complier, always-taker, never-taker, and defier. Suppose the benefits of selecting one individual in each category are β, γ, θ, δ, respectively (i.e., the benefit vector is (β, γ, θ, δ)). Li and Pearl defined the objective function of the unit selection problem as the average benefit gained per individual. Suppose x and x' are binary treatments, y and y' are binary outcomes, and c denotes population-specific characteristics; the objective function (i.e., benefit function) is the following (if the goal is to evaluate the average benefit gained per individual for a specific population c, the argmax_c can be dropped):

argmax_c  β P(y_x, y'_{x'} | c) + γ P(y_x, y_{x'} | c) + θ P(y'_x, y'_{x'} | c) + δ P(y'_x, y_{x'} | c).
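As a toy numeric illustration (ours, not from Li and Pearl), take the benefit vector (β, γ, θ, δ) = (1, 0, 0, -1), so the benefit function reduces to the expected difference between compliers and defiers. A minimal sketch in Python, assuming the four response-type probabilities were known:

    # Hypothetical response-type probabilities for some population c.
    p_complier, p_always_taker, p_never_taker, p_defier = 0.30, 0.25, 0.35, 0.10
    beta, gamma, theta, delta = 1, 0, 0, -1  # benefit vector

    benefit = (beta * p_complier + gamma * p_always_taker
               + theta * p_never_taker + delta * p_defier)
    print(round(benefit, 2))  # 0.30 - 0.10 = 0.20 per selected individual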

Using a combination of experimental and observational data, Li and Pearl established the most general tight bounds on this benefit function (which we refer to as Li-Pearl’s Theorem in the rest of the paper). The only constraint is that the population-specific characteristics are not a descendant of the treatment.

Li and Pearl (Li and Pearl 2022b) provided eight theorems to compute bounds for any type of probability of causation with nonbinary treatment and effect. Suppose variable X has m values and Y has n values; then the following probabilities of causation are bounded. Moreover, if the probabilities of causation are conditioned on a population-specific variable c that is not affected by X, then all the theorems still hold (we provide the extended theorems from Li and Pearl in the appendix).

P({y_i}_{x_j}, y_i),  s.t. 1 ≤ i ≤ n, 1 ≤ j ≤ m;
P({y_i}_{x_j}, y_k),  s.t. 1 ≤ i, k ≤ n, 1 ≤ j ≤ m, i ≠ k;
P({y_i}_{x_j}, x_k),  s.t. 1 ≤ i ≤ n, 1 ≤ j, k ≤ m, j ≠ k;
P({y_i}_{x_j}, y_k, x_p),  s.t. 1 ≤ i, k ≤ n, 1 ≤ j, p ≤ m, j ≠ p;
P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}),  s.t. 1 ≤ i_1, ..., i_k ≤ n, 1 ≤ j_1, ..., j_k ≤ m, j_1 ≠ ... ≠ j_k;
P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p),  s.t. 1 ≤ i_1, ..., i_k ≤ n, 1 ≤ j_1, ..., j_k, p ≤ m, j_1 ≠ ... ≠ j_k ≠ p;
P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, y_q),  s.t. 1 ≤ i_1, ..., i_k, q ≤ n, 1 ≤ j_1, ..., j_k ≤ m, j_1 ≠ ... ≠ j_k;
P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p, y_q),  s.t. 1 ≤ i_1, ..., i_k, q ≤ n, 1 ≤ j_1, ..., j_k, p ≤ m, j_1 ≠ ... ≠ j_k ≠ p.

The benefit function is a linear combination of the probabilities of causation; therefore, we define the general benefit function for the unit selection problem based on Li and Pearl’s results.

Counterfactual Formulation of the Unit Selection Problem

Based on Li and Pearl (Li and Pearl 2019), the objective is to find a set of characteristics c that maximizes the benefit associated with the resulting mixture of different response types of individuals. Let X denote the treatment with m values and Y denote the effect with n values. We then have n^m different response types (i.e., a response type assigns one effect value to each of the treatments). Suppose the benefits of selecting an individual of each response type are (α_1, ..., α_{n^m}) (we call (α_1, ..., α_{n^m}) the benefit vector). Our objective, then, is to find the c that maximizes the following expression (if the goal is to evaluate the average benefit gained per individual for a specific population c, the argmax_c can be dropped):

argmax_c  α_1 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) +
α_2 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_2}_{x_m} | c) + ... +
α_n P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_n}_{x_m} | c) + ... +
α_{n^{m-1}+1} P({y_2}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) + ... +
α_{n^m} P({y_n}_{x_1}, {y_n}_{x_2}, ..., {y_n}_{x_m} | c).

Note that c can be interpreted as the population-specific variable; the only assumption is that the treatment X has no effect on this variable. Recall from Li and Pearl's paper (Li and Pearl 2019) that the benefit vector is provided by the decision-maker who uses the model.
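To make the encoding concrete, here is a minimal Python sketch (our own notation; response_types and benefit are hypothetical helpers, not from the paper) that enumerates the n^m response types in the order used above and evaluates the benefit function when the response-type fractions are known. In practice the fractions are not observed, which is exactly why the identification test and bounds below are needed.

    from itertools import product

    def response_types(m, n):
        """All n^m response types, ordered so the effect under x_m varies
        fastest, matching the subscript order of (alpha_1, ..., alpha_{n^m}).
        Type (i_1, ..., i_m) means: Y would be y_{i_t} had X been x_t."""
        return list(product(range(1, n + 1), repeat=m))

    def benefit(alpha, fractions):
        """Average benefit per individual: the dot product of the benefit
        vector with the (unobserved) response-type fractions."""
        assert len(alpha) == len(fractions)
        return sum(a * f for a, f in zip(alpha, fractions))

    types = response_types(2, 3)           # m = 2 treatments, n = 3 effects
    print(len(types), types[0], types[3])  # 9 (1, 1) (2, 1)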

In the next section, we provide an algorithm that checks whether a given benefit function with its benefit vector is identifiable from purely experimental data (i.e., whether we can find the exact value of the benefit function rather than bounds). If it is not identifiable, we then provide an algorithm that computes bounds on the benefit function with its benefit vector using experimental and observational data.

Main Results

Identifiability of Benefit Function

Recall that in the binary case, the conditions for identifiability are gain equality (i.e., β + δ = γ + θ) or monotonicity (i.e., P(y_{x'}, y'_x) = 0) (Li and Pearl 2019). The situation is more complicated in nonbinary cases; therefore, we provide an algorithm to test whether a given benefit function with its benefit vector is identifiable from purely experimental data.

Theorem 1.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n. Then the benefit function f(c) below is identifiable if Algorithm 1 returns (True, res), and res is the value of the benefit function.

f(c) = α_1 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) +
α_2 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_2}_{x_m} | c) + ... +
α_n P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_n}_{x_m} | c) + ... +
α_{n^{m-1}+1} P({y_2}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) + ... +
α_{n^m} P({y_n}_{x_1}, {y_n}_{x_2}, ..., {y_n}_{x_m} | c).
Algorithm 1 Check identifiability of the benefit function

Input: a, the benefit function, where a[i] is an (m+1)-tuple that stands for the ith term in the benefit function. If the ith term is α_i P({y_{i_1}}_{x_1}, {y_{i_2}}_{x_2}, ..., {y_{i_m}}_{x_m} | c), then a[i] = (α_i, i_1, i_2, ..., i_m).

d[1,...,m][1,...,n], the experimental data, where d[i][j] = P({y_j}_{x_i} | c).

e, the adjusted value of the benefit function.

The initial call of the algorithm is IBF(a[1,...,n^m], d[1,...,m][1,...,n], 0), where a[1,...,n^m] corresponds to the original benefit function.

All lists in this algorithm start with index 1.

Output: (identifiable, value), a tuple, where identifiable = True if the given benefit function is identifiable and value is the value of the benefit function.

Function IBF(a, d, e):

1:  ident = True;
2:  l = length(a);
3:  // Base case: if the entire benefit vector equals (0, ..., 0), then the input benefit function is identifiable, and its value equals the adjusted value.
4:  for i = 1 to l do
5:     if a[i][1] ≠ 0 then
6:        ident = False;
7:        break;
8:     end if
9:  end for
10:  if ident == True then
11:     Return (True, e);
12:  end if
13:  // Build an equivalent benefit function using the fact that if ∃ 2 ≤ r ≤ (m+1) s.t. a[j_1][r] = ... = a[j_{n^{m-1}}][r], and the remaining coordinates of these n^{m-1} terms cover all n^{m-1} combinations, then the sum of these n^{m-1} terms without coefficients equals P({y_{a[j_1][r]}}_{x_{r-1}} | c); we then recursively solve the equivalent benefit function.
14:  for every set of n^{m-1} terms in a, (a[j_1], ..., a[j_{n^{m-1}}]), s.t. ∃ 2 ≤ r ≤ (m+1) with a[j_1][r] = ... = a[j_{n^{m-1}}][r] do
15:     for k = 1 to n^{m-1} do
16:        na = a;
17:        nc = e + a[j_k][1] * d[r-1][a[j_1][r]];
18:        for t = 1 to n^{m-1} do
19:           if t ≠ k then
20:              na[j_t][1] = na[j_t][1] - na[j_k][1];
21:           end if
22:        end for
23:        Remove(na[j_k]);
24:        res = IBF(na, d, nc);
25:        if res[0] == True then
26:           Return res;
27:        end if
28:     end for
29:  end for
30:  Return (False, e);

The correctness of the algorithm follows from the fact that Σ_{n^{m-1} terms} P(..., {y_i}_{x_j}, ... | c) = P({y_i}_{x_j} | c) whenever the n^{m-1} terms share the entry {y_i}_{x_j} and their remaining entries cover all combinations. Therefore, if such n^{m-1} terms exist in the benefit function, we can obtain an equivalent benefit function by replacing one of the n^{m-1} terms with the experimental quantity P({y_i}_{x_j} | c). We exhaust all equivalent benefit functions to check whether every counterfactual term can be replaced by experimental data (i.e., whether the function is identifiable).
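The following is a minimal Python sketch of Algorithm 1 under our own data structures (a term is a [coefficient, index-tuple] pair; d[j][i] holds P({y_i}_{x_j} | c) with 1-based keys); it illustrates the recursion and is not the authors' implementation:

    from itertools import combinations

    def ibf(terms, d, e=0.0, *, n, m):
        """Sketch of Algorithm 1 (IBF). Returns (identifiable, value)."""
        # Base case: every remaining coefficient is zero, so the function
        # is identifiable and equals the adjusted value e.
        if all(abs(coef) < 1e-12 for coef, _ in terms):
            return True, e
        k_group = n ** (m - 1)
        # Try every group of n^(m-1) terms that agree at some position r and
        # whose remaining coordinates cover all n^(m-1) combinations; such a
        # group sums (without coefficients) to P(y | do(x_{r+1}), c).
        for group in combinations(range(len(terms)), k_group):
            for r in range(m):
                vals = {terms[j][1][r] for j in group}
                rest = {tuple(v for p, v in enumerate(terms[j][1]) if p != r)
                        for j in group}
                if len(vals) != 1 or len(rest) != k_group:
                    continue
                y_val = next(iter(vals))
                for k in group:
                    coef_k = terms[k][0]
                    nc = e + coef_k * d[r + 1][y_val]  # adjust by coef * P(y_x)
                    # Remove term k; subtract its coefficient from the rest of
                    # the group, yielding an equivalent benefit function.
                    na = [[c - coef_k if (j in group and j != k) else c, idx]
                          for j, (c, idx) in enumerate(terms) if j != k]
                    res = ibf(na, d, nc, n=n, m=m)
                    if res[0]:
                        return res
        return False, e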

For example, consider m = n = 2 and the benefit function:

7P({y_1}_{x_1}, {y_1}_{x_2} | c) + 2P({y_1}_{x_1}, {y_2}_{x_2} | c) + 4P({y_2}_{x_1}, {y_1}_{x_2} | c) - P({y_2}_{x_1}, {y_2}_{x_2} | c)
= 7P({y_1}_{x_1} | c) - 5P({y_1}_{x_1}, {y_2}_{x_2} | c) + 4P({y_2}_{x_1}, {y_1}_{x_2} | c) - P({y_2}_{x_1}, {y_2}_{x_2} | c)
= 7P({y_1}_{x_1} | c) - 5P({y_1}_{x_1}, {y_2}_{x_2} | c) + 4P({y_2}_{x_1} | c) - 5P({y_2}_{x_1}, {y_2}_{x_2} | c)
= 7P({y_1}_{x_1} | c) - 5P({y_2}_{x_2} | c) + 4P({y_2}_{x_1} | c).
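Running the sketch above on this example (with hypothetical experimental data d) reproduces the reduction:

    d = {1: {1: 0.6, 2: 0.4},   # P(y1|do(x1),c)=0.6, P(y2|do(x1),c)=0.4
         2: {1: 0.3, 2: 0.7}}   # P(y1|do(x2),c)=0.3, P(y2|do(x2),c)=0.7
    terms = [[7, (1, 1)], [2, (1, 2)], [4, (2, 1)], [-1, (2, 2)]]
    print(ibf(terms, d, n=2, m=2))
    # (True, ≈2.3): 7*0.6 - 5*0.7 + 4*0.4, matching the derivation above.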

Bounds of Benefit Function

If Algorithm 1 returns false, we need to compute bounds on the benefit function using experimental and observational data. We first obtain the bounds of the probabilities of causation P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c), ..., P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_n}_{x_m} | c), ..., P({y_2}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c), ..., P({y_n}_{x_1}, {y_n}_{x_2}, ..., {y_n}_{x_m} | c) by Li and Pearl's theorems (Li and Pearl 2022b). We then have the following theorem.

Theorem 2.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n. Then bounds on the benefit function f(c) below are obtained by Algorithm 2.

f(c) = α_1 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) +
α_2 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_2}_{x_m} | c) + ... +
α_n P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_n}_{x_m} | c) + ... +
α_{n^{m-1}+1} P({y_2}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) + ... +
α_{n^m} P({y_n}_{x_1}, {y_n}_{x_2}, ..., {y_n}_{x_m} | c).
Algorithm 2 Compute the bounds of the benefit function

Input: a, the benefit function, where a[i] is an (m+1)-tuple that stands for the ith term in the benefit function. If the ith term is α_i P({y_{i_1}}_{x_1}, {y_{i_2}}_{x_2}, ..., {y_{i_m}}_{x_m} | c), then a[i] = (α_i, i_1, i_2, ..., i_m).

lb, the lower bounds of all possible terms obtained from Li-Pearl's theorems, where lb[(i_1, i_2, ..., i_m)] is the lower bound of P({y_{i_1}}_{x_1}, {y_{i_2}}_{x_2}, ..., {y_{i_m}}_{x_m} | c).

ub, the upper bounds of all possible terms obtained from Li-Pearl's theorems, where ub[(i_1, i_2, ..., i_m)] is the upper bound of P({y_{i_1}}_{x_1}, {y_{i_2}}_{x_2}, ..., {y_{i_m}}_{x_m} | c).

d[1,...,m][1,...,n], the experimental data, where d[i][j] = P({y_j}_{x_i} | c).

e, the adjusted value of the benefit function.

The initial call of the algorithm is BBF(a[1,...,n^m], d, lb, ub, 0), where a[1,...,n^m] corresponds to the original benefit function.

All lists in this algorithm start with index 1.

Output: (lo, up), the lower and upper bounds of the benefit function.

Function BBF(a, d, lb, ub, e):

1:  l = length(a);
2:  // Base case: compute the bounds term by term.
3:  up = e, lo = e;
4:  for i = 1 to l do
5:     if a[i][1] < 0 then
6:        lo = lo + a[i][1] * ub[(a[i][2], ..., a[i][m+1])];
7:        up = up + a[i][1] * lb[(a[i][2], ..., a[i][m+1])];
8:     else
9:        lo = lo + a[i][1] * lb[(a[i][2], ..., a[i][m+1])];
10:        up = up + a[i][1] * ub[(a[i][2], ..., a[i][m+1])];
11:     end if
12:  end for
13:  // Build an equivalent benefit function using the fact that if ∃ 2 ≤ r ≤ (m+1) s.t. a[j_1][r] = ... = a[j_{n^{m-1}}][r], and the remaining coordinates of these n^{m-1} terms cover all n^{m-1} combinations, then the sum of these n^{m-1} terms without coefficients equals P({y_{a[j_1][r]}}_{x_{r-1}} | c); we then recursively solve the equivalent benefit function.
14:  for every set of n^{m-1} terms in a, (a[j_1], ..., a[j_{n^{m-1}}]), s.t. ∃ 2 ≤ r ≤ (m+1) with a[j_1][r] = ... = a[j_{n^{m-1}}][r] do
15:     for k = 1 to n^{m-1} do
16:        na = a;
17:        nc = e + a[j_k][1] * d[r-1][a[j_1][r]];
18:        for t = 1 to n^{m-1} do
19:           if t ≠ k then
20:              na[j_t][1] = na[j_t][1] - na[j_k][1];
21:           end if
22:        end for
23:        Remove(na[j_k]);
24:        res = BBF(na, d, lb, ub, nc);
25:        lo = max{lo, res[0]};
26:        up = min{up, res[1]};
27:     end for
28:  end for
29:  Return (lo, up);

Again, the correctness of the algorithm follows from the fact that Σ_{n^{m-1} terms} P(..., {y_i}_{x_j}, ... | c) = P({y_i}_{x_j} | c). We exhaust all equivalent benefit functions, take the maximum of all their lower bounds, and take the minimum of all their upper bounds.
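Mirroring the ibf sketch above (same hypothetical data structures, plus lb/ub dicts mapping an index tuple to the Li-Pearl lower/upper bound of the corresponding term), a minimal sketch of Algorithm 2 is:

    from itertools import combinations

    def bbf(terms, d, lb, ub, e=0.0, *, n, m):
        """Sketch of Algorithm 2 (BBF). Returns (lo, up)."""
        # Base case: bound the current equivalent benefit function term by
        # term, pairing negative coefficients with upper bounds and
        # positive coefficients with lower bounds (and vice versa).
        lo = up = e
        for coef, idx in terms:
            if coef < 0:
                lo, up = lo + coef * ub[idx], up + coef * lb[idx]
            else:
                lo, up = lo + coef * lb[idx], up + coef * ub[idx]
        k_group = n ** (m - 1)
        # Tighten over every equivalent benefit function, exactly as in ibf.
        for group in combinations(range(len(terms)), k_group):
            for r in range(m):
                vals = {terms[j][1][r] for j in group}
                rest = {tuple(v for p, v in enumerate(terms[j][1]) if p != r)
                        for j in group}
                if len(vals) != 1 or len(rest) != k_group:
                    continue
                y_val = next(iter(vals))
                for k in group:
                    coef_k = terms[k][0]
                    nc = e + coef_k * d[r + 1][y_val]
                    na = [[c - coef_k if (j in group and j != k) else c, idx]
                          for j, (c, idx) in enumerate(terms) if j != k]
                    r_lo, r_up = bbf(na, d, lb, ub, nc, n=n, m=m)
                    lo, up = max(lo, r_lo), min(up, r_up)
        return lo, up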

Example: Effectiveness of a Vaccine

Recall the motivating example at the beginning: a clinical study is conducted to test the effectiveness of a vaccine. The treatments include vaccinated and unvaccinated. The outcomes include uninfected, asymptomatic infected, and infected in a severe condition. The researcher of the clinical study has collected both experimental and observational data.

Task 1

The researcher wants to know the expected difference between benefited and harmed individuals to emphasize the effectiveness of the vaccine.

Let X denote vaccination, with x_1 being vaccinated and x_2 being unvaccinated, and let Y denote the outcome, where y_1 denotes uninfected, y_2 denotes asymptomatic infected, and y_3 denotes infected in a severe condition. The experimental and observational data of the clinical study are summarized in Tables 1 and 2.

                  Vaccinated    Unvaccinated
Uninfected        52 people     329 people
Asymptomatic      512 people    58 people
Severe Condition  36 people     213 people
Overall           600 people    600 people
Table 1: Experimental data of the clinical study. Here, 600 people were forced to take the vaccine and 600 people were forced to take no vaccine.
                  Vaccinated    Unvaccinated
Uninfected        14 people     121 people
Asymptomatic      933 people    65 people
Severe Condition  6 people      61 people
Overall           953 people    247 people
Table 2: Observational data of the clinical study. Here, 1200 people were free to choose whether to take the vaccine; 953 people chose to take the vaccine and 247 people chose to take no vaccine.

Based on the clinical study, the researcher claimed that the vaccine is effective in controlling the severe condition, as the number of severe-condition patients dropped from 213 to only 36.

Now consider the expected difference between benefited and harmed individuals. Recall that the benefited individuals include the individual who would be infected in a severe condition if unvaccinated and would be asymptomatic infected if vaccinated, the individual who would be infected in a severe condition if unvaccinated and would be uninfected if vaccinated, and the individual who would be asymptomatic infected if unvaccinated and would be uninfected if vaccinated. The harmed individuals include the individual who would be asymptomatic infected if unvaccinated and would be infected in a severe condition if vaccinated, the individual who would be uninfected if unvaccinated and would be infected in a severe condition if vaccinated, and the individual who would be uninfected if unvaccinated and would be asymptomatic infected if vaccinated. All others are unaffected individuals. To maximize the difference between benefited and harmed individuals, we assign 1 to benefited individuals, -1 to harmed individuals, and 0 to all others in the benefit vector. The objective function (i.e., benefit function) is then

f(c) = 0P({y_1}_{x_1}, {y_1}_{x_2} | c) + P({y_1}_{x_1}, {y_2}_{x_2} | c) + P({y_1}_{x_1}, {y_3}_{x_2} | c)
- P({y_2}_{x_1}, {y_1}_{x_2} | c) + 0P({y_2}_{x_1}, {y_2}_{x_2} | c) + P({y_2}_{x_1}, {y_3}_{x_2} | c)
- P({y_3}_{x_1}, {y_1}_{x_2} | c) - P({y_3}_{x_1}, {y_2}_{x_2} | c) + 0P({y_3}_{x_1}, {y_3}_{x_2} | c).

The experimental data in Table 1 provide the following estimates:

P({y_1}_{x_1} | c) = 52/600 = 0.087
P({y_2}_{x_1} | c) = 512/600 = 0.853
P({y_3}_{x_1} | c) = 36/600 = 0.060
P({y_1}_{x_2} | c) = 329/600 = 0.548
P({y_2}_{x_2} | c) = 58/600 = 0.097
P({y_3}_{x_2} | c) = 213/600 = 0.355

The observational data in Table 2 provide the following estimates:

P(x_1, y_1 | c) = 14/1200 = 0.012
P(x_1, y_2 | c) = 933/1200 = 0.778
P(x_1, y_3 | c) = 6/1200 = 0.005
P(x_2, y_1 | c) = 121/1200 = 0.101
P(x_2, y_2 | c) = 65/1200 = 0.054
P(x_2, y_3 | c) = 61/1200 = 0.051
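These point estimates are plain ratios of the table counts; a short sketch (reusing the d convention from the earlier ibf sketch, with obs our own name for the joint estimates):

    exp_counts = {1: {1: 52, 2: 512, 3: 36},    # vaccinated arm, 600 people
                  2: {1: 329, 2: 58, 3: 213}}   # unvaccinated arm, 600 people
    d = {j: {i: cnt / 600 for i, cnt in row.items()}
         for j, row in exp_counts.items()}      # d[j][i] = P(y_i | do(x_j), c)

    obs_counts = {(1, 1): 14, (1, 2): 933, (1, 3): 6,
                  (2, 1): 121, (2, 2): 65, (2, 3): 61}
    obs = {jy: cnt / 1200 for jy, cnt in obs_counts.items()}  # P(x_j, y_i | c)

    print(d[1][2], obs[(1, 2)])  # 0.8533... and 0.7775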

We plug the estimates and the benefit function into Theorem 1; Algorithm 1 returns false (i.e., the benefit function is not identifiable from experimental data). We then plug the estimates and the benefit function into Theorem 2 to obtain the bounds

-0.228 ≤ f(c) ≤ -0.107.

Thus, the expected difference between benefited and harmed individuals is at most -0.107 per individual. We can conclude that the vaccine is ineffective against the virus.

Task 2

The researcher of the clinical study claimed that the individual who would be infected in a severe condition if unvaccinated and would be uninfected if vaccinated, and the individual who would be uninfected if unvaccinated and would be infected in a severe condition if vaccinated, should be twice as important as other individuals. Based on the clinical study, the number of severe-condition patients dropped from 213 to only 36; therefore, the researcher argued, the vaccine should be effective against the virus.

Now consider the expected difference between benefited and harmed individuals. The benefit vector should be the same as before, except that we assign 2 to the individual who would be infected in a severe condition if unvaccinated and would be uninfected if vaccinated, and -2 to the individual who would be uninfected if unvaccinated and would be infected in a severe condition if vaccinated.

The objective function (i.e., benefit function) is then

f(c) = 0P({y_1}_{x_1}, {y_1}_{x_2} | c) + P({y_1}_{x_1}, {y_2}_{x_2} | c) + 2P({y_1}_{x_1}, {y_3}_{x_2} | c)
- P({y_2}_{x_1}, {y_1}_{x_2} | c) + 0P({y_2}_{x_1}, {y_2}_{x_2} | c) + P({y_2}_{x_1}, {y_3}_{x_2} | c)
- 2P({y_3}_{x_1}, {y_1}_{x_2} | c) - P({y_3}_{x_1}, {y_2}_{x_2} | c) + 0P({y_3}_{x_1}, {y_3}_{x_2} | c).

We plug the estimates and the benefit function into Theorem 1; Algorithm 1 returns true (i.e., identifiable from experimental data) with value -0.167. The benefit function can be simplified as follows:

f(c) = 2P({y_1}_{x_1} | c) + P({y_2}_{x_1} | c) - 2P({y_1}_{x_2} | c) - P({y_2}_{x_2} | c) = -0.167.
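As a sanity check, the identified value can be reproduced directly from the experimental estimates (using the d dictionary from the sketch above):

    value = 2 * d[1][1] + d[1][2] - 2 * d[2][1] - d[2][2]
    print(round(value, 3))  # (2*52 + 512 - 2*329 - 58)/600 = -100/600 ≈ -0.167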

Thus, the expected difference between benefited and harmed individuals is exactly -0.167 per individual. We can conclude that the vaccine is still ineffective against the virus.

Simulated Results

In this section, we show the quality of the bounds of the benefit function obtained by Theorem 2 using four common benefit vectors.

First, we set m = 2 (i.e., X has two values) and n = 3 (i.e., Y has three values). We set the benefit vector to one of the most common ones, (0,1,1,-1,0,1,-1,-1,0), which evaluates the expected difference between benefited and harmed individuals. We randomly generated 1000 populations, where each population consists of different fractions of the nine response types of individuals. For each population, we then generated sample distributions (observational data and experimental data) compatible with the fractions of response types (see the appendix for the generating algorithm; a sketch of one such process appears after the list below). The advantage of this generating process is that we have the real benefit value (because we know the fractions of the response types) for comparison. Each sample population represents a different instantiation of the population-specific characteristics C in the model. The generating algorithm ensures that the experimental data and observational data satisfy the general relation (i.e., P(x,y|c) ≤ P(y_x|c) ≤ P(x,y|c) + 1 - P(x|c)). For a sample population i, let [a_i, b_i] be the bounds of the benefit function from the proposed theorem. We summarized the following criteria for each population, as illustrated in Figure 1:

  • lower bound: a_i;

  • upper bound: b_i;

  • midpoint: (a_i + b_i)/2;

  • real benefit: the dot product of the benefit vector and the fractions of response types.
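The authors' generating algorithm is given in their appendix; as a hedged illustration, the following sketch (entirely our own) shows one process with the stated property: experimental probabilities aggregate the response-type fractions, and each response type chooses its treatment with its own random distribution, which automatically enforces P(x,y|c) ≤ P(y_x|c) ≤ P(x,y|c) + 1 - P(x|c).

    import random
    from itertools import product

    def generate_population(m=2, n=3, seed=None):
        """One possible generator of a synthetic population (ours, not the
        authors' appendix algorithm)."""
        rng = random.Random(seed)
        types = list(product(range(1, n + 1), repeat=m))  # n^m response types
        w = [rng.random() for _ in types]
        frac = [v / sum(w) for v in w]    # random point on the simplex
        d = {j: {i: 0.0 for i in range(1, n + 1)} for j in range(1, m + 1)}
        obs = {(j, i): 0.0 for j in range(1, m + 1) for i in range(1, n + 1)}
        for f, t in zip(frac, types):
            choice = [rng.random() for _ in range(m)]  # treatment preferences
            total = sum(choice)
            for j in range(1, m + 1):
                d[j][t[j - 1]] += f                    # P(y | do(x_j), c)
                obs[(j, t[j - 1])] += f * choice[j - 1] / total  # P(x_j, y | c)
        return frac, d, obs

    frac, d, obs = generate_population(seed=0)
    real = sum(a * f for a, f in zip((0, 1, 1, -1, 0, 1, -1, -1, 0), frac))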

Figure 1: Bounds of the benefit function for 100 sample populations out of 1000 with the benefit vector (0,1,1,-1,0,1,-1,-1,0).

From Figure 1, it is clear that the bounds obtained from Theorem 2 are a good estimation of the real benefit. The lower and upper bounds lie closely around the real benefit, and the midpoints are almost identical to the real benefit. Moreover, the average gap of the bounds, Σ(b_i - a_i)/1000, is 0.330, which is small compared to the largest possible gap of 6.

Second, we set the benefit vector to another common one, (-1,1,1,-1,-1,1,-1,-1,-1), which evaluates the expected difference between benefited and unbenefited (i.e., unaffected and harmed) individuals. We again randomly generated 1000 populations, where each population consists of different fractions of the nine response types. The data-generating process and all other factors remain the same. We summarized the same criteria for each population, as illustrated in Figure 2.

Figure 2: Bounds of the benefit function for 100 sample populations out of 1000 with the benefit vector (-1,1,1,-1,-1,1,-1,-1,-1).

From Figure 2, it is clear that the bounds obtained from Theorem 2 are a good estimation of the real benefit. The lower and upper bounds lie closely around the real benefit, and the midpoints are almost identical to the real benefit. Moreover, the average gap of the bounds, Σ(b_i - a_i)/1000, is 0.6520, which is small compared to the largest possible gap of 9.

Third, we set the benefit vector to another common one, (0,1,1,0,0,1,0,0,0), which evaluates the expected number of benefited individuals. We again randomly generated 1000 populations, where each population consists of different fractions of the nine response types. The data-generating process and all other factors remain the same. We summarized the same criteria for each population, as illustrated in Figure 3.

Figure 3: Bounds of the benefit function for 100 sample populations out of 1000 with the benefit vector (0,1,1,0,0,1,0,0,0).

From Figure 3, it is clear that the bounds obtained from Theorem 2 are a good estimation of the real benefit. The lower and upper bounds lie closely around the real benefit, and the midpoints are almost identical to the real benefit. Moreover, the average gap of the bounds, Σ(b_i - a_i)/1000, is 0.3284, which is small compared to the largest possible gap of 3.

Lastly, we set the benefit vector to the last common one, (0,0,0,-1,0,0,-1,-1,0), which evaluates the expected number of harmed individuals (the entries are -1 because we want to minimize the number of harmed individuals). We again randomly generated 1000 populations, where each population consists of different fractions of the nine response types. The data-generating process and all other factors remain the same. We summarized the same criteria for each population, as illustrated in Figure 4.

Figure 4: Bounds of the benefit function for 100 sample populations out of 1000 with the benefit vector (0,0,0,-1,0,0,-1,-1,0).

From Figure 4, it is clear that the bounds obtained from Theorem 2 are a good estimation of the real benefit. The lower and upper bounds lie closely around the real benefit, and the midpoints are almost identical to the real benefit. Moreover, the average gap of the bounds, Σ(b_i - a_i)/1000, is 0.3266, which is small compared to the largest possible gap of 3.

Discussion

We have shown, using examples and simulation studies, that the proposed theorems provide a good estimation of the nonbinary benefit function. One may be concerned about the computational complexity of Algorithms 1 and 2: both indeed run in exponential time. However, m and n (i.e., the numbers of values of X and Y) are usually small constants; therefore, the complexity is rarely a concern in practice.

Conclusion and Future Work

We demonstrated the formalization of the general benefit function with nonbinary treatment and effect. We provided an algorithm to compute bounds on the general benefit function and an algorithm to check whether the benefit function is identifiable from purely experimental data. Examples and simulation results were provided to support the proposed theorems.

Future studies could assess the statistical properties of the proposed bounds. How tight are the bounds? Are they sufficient for making decisions? Which data, experimental or observational, affect the bounds more? How does the number of values of the treatment and effect affect the quality of the bounds?

Another future direction could be to improve the bounds using covariate information, as Li and Pearl (Li and Pearl 2022c) did for the binary benefit function.

Acknowledgements

This research was supported in parts by grants from the National Science Foundation [#IIS-2106908], Office of Naval Research [#N00014-17-S-12091 and #N00014-21-1-2351], and Toyota Research Institute of North America [#PO-000897].

References

  • Berson, Smith, and Thearling (1999) Berson, A.; Smith, S.; and Thearling, K. 1999. Building data mining applications for CRM. McGraw-Hill Professional.
  • Bottou et al. (2013) Bottou, L.; Peters, J.; Quiñonero-Candela, J.; Charles, D. X.; Chickering, D. M.; Portugaly, E.; Ray, D.; Simard, P.; and Snelson, E. 2013. Counterfactual reasoning and learning systems: The example of computational advertising. The Journal of Machine Learning Research, 14(1): 3207–3260.
  • Dawid, Musio, and Murtas (2017) Dawid, P.; Musio, M.; and Murtas, R. 2017. The Probability of Causation. Law, Probability and Risk, (16): 163–179.
  • Galles and Pearl (1998) Galles, D.; and Pearl, J. 1998. An axiomatic characterization of causal counterfactuals. Foundations of Science, 3(1): 151–182.
  • Halpern (2000) Halpern, J. Y. 2000. Axiomatizing causal reasoning. Journal of Artificial Intelligence Research, 12: 317–337.
  • Hung, Yen, and Wang (2006) Hung, S.-Y.; Yen, D. C.; and Wang, H.-Y. 2006. Applying data mining to telecom churn management. Expert Systems with Applications, 31(3): 515–524.
  • Lejeune (2001) Lejeune, M. A. 2001. Measuring the impact of data mining on churn management. Internet Research, 11(5): 375–387.
  • Li and Pearl (2019) Li, A.; and Pearl, J. 2019. Unit Selection Based on Counterfactual Logic. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, 1793–1799. International Joint Conferences on Artificial Intelligence Organization.
  • Li and Pearl (2022a) Li, A.; and Pearl, J. 2022a. Bounds on causal effects and application to high dimensional data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 5773–5780.
  • Li and Pearl (2022b) Li, A.; and Pearl, J. 2022b. Probabilities of Causation with Non-binary Treatment and Effect. Technical Report R-516, Department of Computer Science, University of California, Los Angeles, CA.
  • Li and Pearl (2022c) Li, A.; and Pearl, J. 2022c. Unit selection with causal diagram. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 5765–5772.
  • Li et al. (2014) Li, L.; Chen, S.; Kleban, J.; and Gupta, A. 2014. Counterfactual estimation and optimization of click metrics for search engines. arXiv preprint arXiv:1403.1891.
  • Mueller, Li, and Pearl (2021) Mueller, S.; Li, A.; and Pearl, J. 2021. Causes of effects: Learning individual responses from population data. Technical Report R-505, <http://ftp.cs.ucla.edu/pub/stat_ser/r505.pdf>, Department of Computer Science, University of California, Los Angeles, CA. Forthcoming, Proceedings of IJCAI-2022.
  • Sun et al. (2015) Sun, W.; Wang, P.; Yin, D.; Yang, J.; and Chang, Y. 2015. Causal inference via sparse additive models with application to online advertising. In AAAI, 297–303.
  • Tsai and Lu (2009) Tsai, C.-F.; and Lu, Y.-H. 2009. Customer churn prediction by hybrid neural networks. Expert Systems with Applications, 36(10): 12547–12553.
  • Yan et al. (2009) Yan, J.; Liu, N.; Wang, G.; Zhang, W.; Jiang, Y.; and Chen, Z. 2009. How much can behavioral targeting help online advertising? In Proceedings of the 18th international conference on World Wide Web, 261–270. ACM.
  • Zhang, Tian, and Bareinboim (2022) Zhang, J.; Tian, J.; and Bareinboim, E. 2022. Partial counterfactual identification from observational and experimental data. In International Conference on Machine Learning, 26548–26558. PMLR.

Appendix A Appendix

Proof of Theorems

Theorem 1.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n. Then the benefit function f(c) below is identifiable if Algorithm 1 returns (True, res), and res is the value of the benefit function.

f(c) = α_1 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) +
α_2 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_2}_{x_m} | c) + ... +
α_n P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_n}_{x_m} | c) + ... +
α_{n^{m-1}+1} P({y_2}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) + ... +
α_{n^m} P({y_n}_{x_1}, {y_n}_{x_2}, ..., {y_n}_{x_m} | c).
Proof.

The proof is simple.
Lines 1 to 12 in Algorithm 1 simply check whether the given benefit function f (encoded as a in the algorithm) is identifiable.
On line 24 of the algorithm, we recursively call the algorithm on another benefit function f' (encoded as na in the algorithm).
Now let us consider how we obtain f'.
If there exist n^{m-1} terms in a and an index r s.t. a[j_1][r] = ... = a[j_{n^{m-1}}][r] (with the remaining coordinates covering all combinations), these n^{m-1} terms are of the form P(..., {y_{a[j_1][r]}}_{x_{r-1}}, ... | c), and the sum of these n^{m-1} terms without coefficients equals P({y_{a[j_1][r]}}_{x_{r-1}} | c).
We obtain f' by eliminating the kth of the n^{m-1} terms in f, expressing the kth term through the other n^{m-1} - 1 terms and their sum P({y_{a[j_1][r]}}_{x_{r-1}} | c).
Therefore, f = f' + nc, where nc = a[j_k][1] * P({y_{a[j_1][r]}}_{x_{r-1}} | c). Thus, f is identifiable if and only if f' is identifiable, and then f = f' + nc. ∎

Theorem 2.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n. Then bounds on the benefit function f(c) below are obtained by Algorithm 2.

f(c) = α_1 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) +
α_2 P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_2}_{x_m} | c) + ... +
α_n P({y_1}_{x_1}, {y_1}_{x_2}, ..., {y_n}_{x_m} | c) + ... +
α_{n^{m-1}+1} P({y_2}_{x_1}, {y_1}_{x_2}, ..., {y_1}_{x_m} | c) + ... +
α_{n^m} P({y_n}_{x_1}, {y_n}_{x_2}, ..., {y_n}_{x_m} | c).
Proof.

Similarly to Theorem 1,
lines 1 to 12 in Algorithm 2 simply compute bounds on the given benefit function f (encoded as a in the algorithm).
On line 24 of the algorithm, we recursively call the algorithm on another benefit function f' (encoded as na in the algorithm).
f' is obtained exactly as in the proof of Theorem 1:
if there exist n^{m-1} terms in a and an index r s.t. a[j_1][r] = ... = a[j_{n^{m-1}}][r] (with the remaining coordinates covering all combinations), these n^{m-1} terms are of the form P(..., {y_{a[j_1][r]}}_{x_{r-1}}, ... | c), and their sum without coefficients equals P({y_{a[j_1][r]}}_{x_{r-1}} | c).
We obtain f' by eliminating the kth of the n^{m-1} terms in f, expressing the kth term through the other n^{m-1} - 1 terms and their sum P({y_{a[j_1][r]}}_{x_{r-1}} | c).
Therefore, f = f' + nc, where nc = a[j_k][1] * P({y_{a[j_1][r]}}_{x_{r-1}} | c). Thus, the bounds of f' + nc are bounds of f. ∎

Li-Pearl’s Bounds of Probabilities of Causation

The inputs lb and ub in Algorithm 2 depend on the bounds of the probabilities of causation. The bounds recently proposed by Li and Pearl (Li and Pearl 2022b) are not conditioned on C. However, nothing changes if we condition on a variable C that is not affected by X. We list the conditional versions of the eight theorems proposed by Li and Pearl. The proofs of the eight theorems are exactly the same, except that every probability is conditioned on C.

Theorem 3.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n, and variable C is not affected by X. Then the probability of causation P({y_i}_{x_j}, y_i | c), where 1 ≤ i ≤ n, 1 ≤ j ≤ m, is bounded as follows:

max{ P(x_j, y_i | c),  P({y_i}_{x_j} | c) + P(y_i | c) - 1 } ≤ P({y_i}_{x_j}, y_i | c) ≤ min{ P({y_i}_{x_j} | c),  P(y_i | c) }.
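For instance, Theorem 3 translates directly into a two-line computation; a sketch with our own function name, evaluated on the vaccine example's estimates (P({y_2}_{x_1} | c) = 0.853, P(y_2 | c) = 0.778 + 0.054 = 0.832, P(x_1, y_2 | c) = 0.778):

    def theorem3_bounds(p_y_do_x, p_y, p_x_y):
        """Bounds on P({y_i}_{x_j}, y_i | c) from Theorem 3."""
        return (max(p_x_y, p_y_do_x + p_y - 1),  # lower bound
                min(p_y_do_x, p_y))              # upper bound

    print(theorem3_bounds(0.853, 0.832, 0.778))  # (0.778, 0.832)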
Theorem 4.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n, and variable C is not affected by X. Then the probability of causation P({y_i}_{x_j}, y_k | c), where 1 ≤ i, k ≤ n, 1 ≤ j ≤ m, i ≠ k, is bounded as follows:

max{ 0,
     P({y_i}_{x_j} | c) + P(y_k | c) - 1,
     Σ_{1 ≤ p ≤ m, p ≠ j} max{ 0,  P({y_i}_{x_j} | c) + P(x_p, y_k | c) - 1 + P(x_j | c) - P(x_j, y_i | c) } }
≤ P({y_i}_{x_j}, y_k | c)
≤ min{ P({y_i}_{x_j} | c) - P(x_j, y_i | c),  P(y_k | c) - P(y_k, x_j | c) }.
Theorem 5.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n, and variable C is not affected by X. Then the probability of causation P({y_i}_{x_j}, x_k | c), where 1 ≤ i ≤ n, 1 ≤ j, k ≤ m, j ≠ k, is bounded as follows:

max{ 0,  P({y_i}_{x_j} | c) - P(x_j, y_i | c) - 1 + P(x_j | c) + P(x_k | c) }
≤ P({y_i}_{x_j}, x_k | c)
≤ min{ P({y_i}_{x_j} | c) - P(x_j, y_i | c),  P(x_k | c) }.
Theorem 6.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n, and variable C is not affected by X. Then the probability of causation P({y_i}_{x_j}, y_k, x_p | c), where 1 ≤ i, k ≤ n, 1 ≤ j, p ≤ m, j ≠ p, is bounded as follows:

max{ 0,  P({y_i}_{x_j} | c) + P(x_p, y_k | c) - 1 + P(x_j | c) - P(x_j, y_i | c) }
≤ P({y_i}_{x_j}, y_k, x_p | c)
≤ min{ P({y_i}_{x_j} | c) - P(x_j, y_i | c),  P(x_p, y_k | c) }.
Theorem 7.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n, and variable C is not affected by X. Then the probability of causation P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}} | c), where 1 ≤ i_1, ..., i_k ≤ n, 1 ≤ j_1, ..., j_k ≤ m, j_1 ≠ ... ≠ j_k, is bounded as follows:

max{ 0,
     Σ_{1 ≤ t ≤ k} P({y_{i_t}}_{x_{j_t}} | c) - k + 1,
     max_{1 ≤ t ≤ k} ( LB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c)) + P({y_{i_t}}_{x_{j_t}} | c) - 1 ),
     Σ_{1 ≤ p ≤ m, s.t. ∃ r, 1 ≤ r ≤ k, p = j_r} LB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{r-1}}}_{x_{j_{r-1}}}, {y_{i_{r+1}}}_{x_{j_{r+1}}}, ..., {y_{i_k}}_{x_{j_k}}, x_{j_r}, y_{i_r} | c))
       + Σ_{1 ≤ p ≤ m, s.t. p ≠ j_1 ≠ ... ≠ j_k} LB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p | c)) }
≤ P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}} | c)
≤ min{ min_{1 ≤ t ≤ k} P({y_{i_t}}_{x_{j_t}} | c),
       min_{1 ≤ t ≤ k} UB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c)),
       Σ_{1 ≤ p ≤ m, s.t. ∃ r, 1 ≤ r ≤ k, p = j_r} UB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{r-1}}}_{x_{j_{r-1}}}, {y_{i_{r+1}}}_{x_{j_{r+1}}}, ..., {y_{i_k}}_{x_{j_k}}, x_{j_r}, y_{i_r} | c))
         + Σ_{1 ≤ p ≤ m, s.t. p ≠ j_1 ≠ ... ≠ j_k} UB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p | c)) }

where LB(f) denotes the lower bound of a function f and UB(f) denotes the upper bound of a function f. The bounds of P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{r-1}}}_{x_{j_{r-1}}}, {y_{i_{r+1}}}_{x_{j_{r+1}}}, ..., {y_{i_k}}_{x_{j_k}}, x_{j_r}, y_{i_r} | c) are given by Theorem 6 or 10, the bounds of P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p | c) are given by Theorem 5 or 8, and the bounds of P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c) are given by Theorem 7, or by experimental data if k = 2.

Theorem 8.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n, and variable C is not affected by X. Then the probability of causation P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p | c), where 1 ≤ i_1, ..., i_k ≤ n, 1 ≤ j_1, ..., j_k, p ≤ m, j_1 ≠ ... ≠ j_k ≠ p, is bounded as follows:

max{ 0,
     Σ_{1 ≤ t ≤ k} P({y_{i_t}}_{x_{j_t}} | c) + P(x_p | c) - k,
     max_{1 ≤ t ≤ k} ( LB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c)) + LB(P({y_{i_t}}_{x_{j_t}}, x_p | c)) - 1 ) }
≤ P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p | c)
≤ min{ min_{1 ≤ t ≤ k} P({y_{i_t}}_{x_{j_t}} | c),
       P(x_p | c),
       min_{1 ≤ t ≤ k} UB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c)),
       min_{1 ≤ t ≤ k} UB(P({y_{i_t}}_{x_{j_t}}, x_p | c)) }

where LB(f) denotes the lower bound of a function f and UB(f) denotes the upper bound of a function f. The bounds of P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c) are given by Theorem 7, or by experimental data if k = 2, and the bounds of P({y_{i_t}}_{x_{j_t}}, x_p | c) are given by Theorem 5.

Theorem 9.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n, and variable C is not affected by X. Then the probability of causation P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, y_q | c), where 1 ≤ i_1, ..., i_k, q ≤ n, 1 ≤ j_1, ..., j_k ≤ m, j_1 ≠ ... ≠ j_k, is bounded as follows:

max{ 0,
     Σ_{1 ≤ t ≤ k} P({y_{i_t}}_{x_{j_t}} | c) + P(y_q | c) - k,
     max_{1 ≤ t ≤ k} ( LB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c)) + LB(P({y_{i_t}}_{x_{j_t}}, y_q | c)) - 1 ),
     Σ_{1 ≤ p ≤ m, s.t. ∃ r, 1 ≤ r ≤ k, p = j_r, q = i_r} LB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{r-1}}}_{x_{j_{r-1}}}, {y_{i_{r+1}}}_{x_{j_{r+1}}}, ..., {y_{i_k}}_{x_{j_k}}, x_{j_r}, y_{i_r} | c))
       + Σ_{1 ≤ p ≤ m, s.t. p ≠ j_1 ≠ ... ≠ j_k} LB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p, y_q | c)) }
≤ P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, y_q | c)
≤ min{ min_{1 ≤ t ≤ k} P({y_{i_t}}_{x_{j_t}} | c),
       P(y_q | c),
       min_{1 ≤ t ≤ k} UB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c)),
       min_{1 ≤ t ≤ k} UB(P({y_{i_t}}_{x_{j_t}}, y_q | c)),
       Σ_{1 ≤ p ≤ m, s.t. ∃ r, 1 ≤ r ≤ k, p = j_r, q = i_r} UB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{r-1}}}_{x_{j_{r-1}}}, {y_{i_{r+1}}}_{x_{j_{r+1}}}, ..., {y_{i_k}}_{x_{j_k}}, x_{j_r}, y_{i_r} | c))
         + Σ_{1 ≤ p ≤ m, s.t. p ≠ j_1 ≠ ... ≠ j_k} UB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p, y_q | c)) }

where LB(f) denotes the lower bound of a function f and UB(f) denotes the upper bound of a function f. The bounds of P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{r-1}}}_{x_{j_{r-1}}}, {y_{i_{r+1}}}_{x_{j_{r+1}}}, ..., {y_{i_k}}_{x_{j_k}}, x_{j_r}, y_{i_r} | c) and P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p, y_q | c) are given by Theorem 6 or 10, the bounds of P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c) are given by Theorem 7, or by experimental data if k = 2, and the bounds of P({y_{i_t}}_{x_{j_t}}, y_q | c) are given by Theorem 3 or 4.

Theorem 10.

Suppose variable X has m values x_1, ..., x_m and Y has n values y_1, ..., y_n, and variable C is not affected by X. Then the probability of causation P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p, y_q | c), where 1 ≤ i_1, ..., i_k, q ≤ n, 1 ≤ j_1, ..., j_k, p ≤ m, j_1 ≠ ... ≠ j_k ≠ p, is bounded as follows:

max{ 0,
     Σ_{1 ≤ t ≤ k} P({y_{i_t}}_{x_{j_t}} | c) + P(x_p, y_q | c) - k,
     max_{1 ≤ t ≤ k} ( LB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c)) + LB(P({y_{i_t}}_{x_{j_t}}, x_p, y_q | c)) - 1 ) }
≤ P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_k}}_{x_{j_k}}, x_p, y_q | c)
≤ min{ min_{1 ≤ t ≤ k} P({y_{i_t}}_{x_{j_t}} | c),
       P(x_p, y_q | c),
       min_{1 ≤ t ≤ k} UB(P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c)),
       min_{1 ≤ t ≤ k} UB(P({y_{i_t}}_{x_{j_t}}, x_p, y_q | c)) }

where LB(f) denotes the lower bound of a function f and UB(f) denotes the upper bound of a function f. The bounds of P({y_{i_1}}_{x_{j_1}}, ..., {y_{i_{t-1}}}_{x_{j_{t-1}}}, {y_{i_{t+1}}}_{x_{j_{t+1}}}, ..., {y_{i_k}}_{x_{j_k}} | c) are given by Theorem 7, or by experimental data if k = 2, and the bounds of P({y_{i_t}}_{x_{j_t}}, x_p, y_q | c) are given by Theorem 6.

Calculation in the Example

Task 1

First, by Theorem 7, we have

\begin{align*}
0 &\leq P({y_1}_{x_1},{y_1}_{x_2}|c)\leq 0.087,\\
0 &\leq P({y_1}_{x_1},{y_2}_{x_2}|c)\leq 0.066,\\
0 &\leq P({y_1}_{x_1},{y_3}_{x_2}|c)\leq 0.063,\\
0.431 &\leq P({y_2}_{x_1},{y_1}_{x_2}|c)\leq 0.523,\\
0.026 &\leq P({y_2}_{x_1},{y_2}_{x_2}|c)\leq 0.097,\\
0.287 &\leq P({y_2}_{x_1},{y_3}_{x_2}|c)\leq 0.355,\\
0 &\leq P({y_3}_{x_1},{y_1}_{x_2}|c)\leq 0.060,\\
0 &\leq P({y_3}_{x_1},{y_2}_{x_2}|c)\leq 0.059,\\
0 &\leq P({y_3}_{x_1},{y_3}_{x_2}|c)\leq 0.056.
\end{align*}
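For use in the computations below, these nine bounds can be recorded directly in Python; the dictionary name and the $(i,j)$ keys, indexing $P({y_i}_{x_1},{y_j}_{x_2}|c)$, are ours:

    # Bounds on P(y_i_{x_1}, y_j_{x_2} | c) from Theorem 7, keyed by (i, j).
    theorem7_bounds = {
        (1, 1): (0.0, 0.087), (1, 2): (0.0, 0.066), (1, 3): (0.0, 0.063),
        (2, 1): (0.431, 0.523), (2, 2): (0.026, 0.097), (2, 3): (0.287, 0.355),
        (3, 1): (0.0, 0.060), (3, 2): (0.0, 0.059), (3, 3): (0.0, 0.056),
    }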

By Algorithm 2, the lower bound is obtained through the following steps:

\begin{align*}
f(c) ={}& 0\cdot P({y_1}_{x_1},{y_1}_{x_2}|c)+P({y_1}_{x_1},{y_2}_{x_2}|c)+P({y_1}_{x_1},{y_3}_{x_2}|c)\\
&-P({y_2}_{x_1},{y_1}_{x_2}|c)+0\cdot P({y_2}_{x_1},{y_2}_{x_2}|c)+P({y_2}_{x_1},{y_3}_{x_2}|c)\\
&-P({y_3}_{x_1},{y_1}_{x_2}|c)-P({y_3}_{x_1},{y_2}_{x_2}|c)+0\cdot P({y_3}_{x_1},{y_3}_{x_2}|c)\\
={}& P({y_1}_{x_1},{y_2}_{x_2}|c)-P({y_2}_{x_1},{y_1}_{x_2}|c)+0\cdot P({y_2}_{x_1},{y_3}_{x_2}|c)\\
&-P({y_3}_{x_1},{y_1}_{x_2}|c)-P({y_3}_{x_1},{y_2}_{x_2}|c)-P({y_3}_{x_1},{y_3}_{x_2}|c)+P({y_3}_{x_2}|c)\\
={}& P({y_1}_{x_1},{y_2}_{x_2}|c)-P({y_2}_{x_1},{y_1}_{x_2}|c)+0\cdot P({y_2}_{x_1},{y_3}_{x_2}|c)\\
&+0\cdot P({y_3}_{x_1},{y_2}_{x_2}|c)+0\cdot P({y_3}_{x_1},{y_3}_{x_2}|c)+P({y_3}_{x_2}|c)-P({y_3}_{x_1}|c)\\
\geq{}& LB(P({y_1}_{x_1},{y_2}_{x_2}|c))-UB(P({y_2}_{x_1},{y_1}_{x_2}|c))+213/600-36/600\\
={}& 0-0.523+0.295\\
={}& -0.228,
\end{align*}
where the second and third equalities substitute the consistency identities $\sum_i P({y_i}_{x_1},{y_j}_{x_2}|c)=P({y_j}_{x_2}|c)$ and $\sum_j P({y_i}_{x_1},{y_j}_{x_2}|c)=P({y_i}_{x_1}|c)$, and the inequality replaces each remaining counterfactual term by its lower or upper bound according to the sign of its coefficient.
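The final arithmetic can be checked with a line of Python, using the theorem7_bounds table above and the experimental quantities $P({y_3}_{x_2}|c)=213/600$ and $P({y_3}_{x_1}|c)=36/600$ quoted in the derivation:

    lb = theorem7_bounds[(1, 2)][0] - theorem7_bounds[(2, 1)][1] + 213/600 - 36/600
    print(round(lb, 3))  # -0.228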

The upper bound is obtained through the following steps:

\begin{align*}
f(c) ={}& 0\cdot P({y_1}_{x_1},{y_1}_{x_2}|c)+P({y_1}_{x_1},{y_2}_{x_2}|c)+P({y_1}_{x_1},{y_3}_{x_2}|c)\\
&-P({y_2}_{x_1},{y_1}_{x_2}|c)+0\cdot P({y_2}_{x_1},{y_2}_{x_2}|c)+P({y_2}_{x_1},{y_3}_{x_2}|c)\\
&-P({y_3}_{x_1},{y_1}_{x_2}|c)-P({y_3}_{x_1},{y_2}_{x_2}|c)+0\cdot P({y_3}_{x_1},{y_3}_{x_2}|c)\\
={}& P({y_1}_{x_1},{y_3}_{x_2}|c)-P({y_2}_{x_1},{y_1}_{x_2}|c)-P({y_2}_{x_1},{y_2}_{x_2}|c)+P({y_2}_{x_1},{y_3}_{x_2}|c)\\
&-P({y_3}_{x_1},{y_1}_{x_2}|c)-2P({y_3}_{x_1},{y_2}_{x_2}|c)+0\cdot P({y_3}_{x_1},{y_3}_{x_2}|c)+P({y_2}_{x_2}|c)\\
={}& -P({y_2}_{x_1},{y_1}_{x_2}|c)-P({y_2}_{x_1},{y_2}_{x_2}|c)+0\cdot P({y_2}_{x_1},{y_3}_{x_2}|c)\\
&-P({y_3}_{x_1},{y_1}_{x_2}|c)-2P({y_3}_{x_1},{y_2}_{x_2}|c)-P({y_3}_{x_1},{y_3}_{x_2}|c)\\
&+P({y_2}_{x_2}|c)+P({y_3}_{x_2}|c)\\
={}& 0\cdot P({y_2}_{x_1},{y_2}_{x_2}|c)+P({y_2}_{x_1},{y_3}_{x_2}|c)-P({y_3}_{x_1},{y_2}_{x_2}|c)+0\cdot P({y_3}_{x_1},{y_3}_{x_2}|c)\\
&+P({y_2}_{x_2}|c)+P({y_3}_{x_2}|c)-P({y_2}_{x_1}|c)-P({y_3}_{x_1}|c)\\
\leq{}& UB(P({y_2}_{x_1},{y_3}_{x_2}|c))-LB(P({y_3}_{x_1},{y_2}_{x_2}|c))\\
&+58/600+213/600-512/600-36/600\\
={}& 0.355-0-0.462\\
={}& -0.107,
\end{align*}
where the equalities apply the same consistency identities as in the lower-bound derivation.
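The upper-bound arithmetic can be verified in the same way, using the experimental quantities $P({y_2}_{x_2}|c)=58/600$, $P({y_3}_{x_2}|c)=213/600$, $P({y_2}_{x_1}|c)=512/600$, and $P({y_3}_{x_1}|c)=36/600$:

    ub = theorem7_bounds[(2, 3)][1] - theorem7_bounds[(3, 2)][0] \
         + 58/600 + 213/600 - 512/600 - 36/600
    print(round(ub, 3))  # -0.107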

Thus,

\[
-0.228\leq f(c)\leq -0.107.
\]

Task 2

By Algorithm 1, the result is obtained through the following steps:

\begin{align*}
f(c) ={}& 0\cdot P({y_1}_{x_1},{y_1}_{x_2}|c)+P({y_1}_{x_1},{y_2}_{x_2}|c)+2P({y_1}_{x_1},{y_3}_{x_2}|c)\\
&-P({y_2}_{x_1},{y_1}_{x_2}|c)+0\cdot P({y_2}_{x_1},{y_2}_{x_2}|c)+P({y_2}_{x_1},{y_3}_{x_2}|c)\\
&-2P({y_3}_{x_1},{y_1}_{x_2}|c)-P({y_3}_{x_1},{y_2}_{x_2}|c)+0\cdot P({y_3}_{x_1},{y_3}_{x_2}|c)\\
={}& 2P({y_1}_{x_1},{y_3}_{x_2}|c)-P({y_2}_{x_1},{y_1}_{x_2}|c)-P({y_2}_{x_1},{y_2}_{x_2}|c)+P({y_2}_{x_1},{y_3}_{x_2}|c)\\
&-2P({y_3}_{x_1},{y_1}_{x_2}|c)-2P({y_3}_{x_1},{y_2}_{x_2}|c)+0\cdot P({y_3}_{x_1},{y_3}_{x_2}|c)+P({y_2}_{x_2}|c)\\
={}& -P({y_2}_{x_1},{y_1}_{x_2}|c)-P({y_2}_{x_1},{y_2}_{x_2}|c)-P({y_2}_{x_1},{y_3}_{x_2}|c)\\
&-2P({y_3}_{x_1},{y_1}_{x_2}|c)-2P({y_3}_{x_1},{y_2}_{x_2}|c)-2P({y_3}_{x_1},{y_3}_{x_2}|c)\\
&+P({y_2}_{x_2}|c)+2P({y_3}_{x_2}|c)\\
={}& P({y_2}_{x_2}|c)+2P({y_3}_{x_2}|c)-P({y_2}_{x_1}|c)-2P({y_3}_{x_1}|c)\\
={}& 58/600+426/600-512/600-72/600\\
={}& -0.167,
\end{align*}
where the counterfactual terms are eliminated step by step using the same consistency identities as in Task 1; since every counterfactual term ends up with coefficient zero, $f(c)$ is identifiable.
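Because every counterfactual term is eliminated, the benefit function in Task 2 is determined by the experimental data alone, and the point value can be checked directly:

    # f(c) = P(y2_{x2}|c) + 2 P(y3_{x2}|c) - P(y2_{x1}|c) - 2 P(y3_{x1}|c)
    f_c = 58/600 + 2 * 213/600 - 512/600 - 2 * 36/600
    print(round(f_c, 3))  # -0.167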

Distribution Generating Algorithm

Here we present the sample-distribution generating algorithm used in the simulated studies. It generates both experimental and observational data compatible with randomly drawn fractions of response types of individuals, and the generated data satisfy the general relation between experimental and observational data, $P(x,y|c)\leq P(y_x|c)\leq P(x,y|c)+1-P(x|c)$. Note that all four simulated studies share the same distribution generating algorithm but use different benefit vectors.

Algorithm 3 Generate sample distributions for simulated studies

Input: $num$, the number of sample distributions needed.
Output: $num$ sample distributions (observational and experimental data).

1:  $count=0$;
2:  while $count<num$ do
3:     // $rand(0,1)$ is a function that generates a number uniformly at random between $0$ and $1$.
4:     $a=[\,]$;
5:     for $i=1$ to $8$ do
6:        $a.append(rand(0,1))$;
7:     end for
8:     $a.append(1.0)$;
9:     $a.sort()$;
10:     // Each $c_k$ corresponds to a sample distribution.
11:     $k=count$;
12:     // $f$ holds the fractions of response types of individuals: $f[0]=P({y_1}_{x_1},{y_1}_{x_2}|c_k),\ldots,f[8]=P({y_3}_{x_1},{y_3}_{x_2}|c_k)$.
13:     $f=[\,]$;
14:     $f[0]=a[0]$;
15:     for $i=1$ to $8$ do
16:        $f[i]=a[i]-a[i-1]$;
17:     end for
18:     // Generate experimental data.
19:     $P({y_1}_{x_1}|c_k)=f[0]+f[1]+f[2]$;
20:     $P({y_2}_{x_1}|c_k)=f[3]+f[4]+f[5]$;
21:     $P({y_3}_{x_1}|c_k)=f[6]+f[7]+f[8]$;
22:     $P({y_1}_{x_2}|c_k)=f[0]+f[3]+f[6]$;
23:     $P({y_2}_{x_2}|c_k)=f[1]+f[4]+f[7]$;
24:     $P({y_3}_{x_2}|c_k)=f[2]+f[5]+f[8]$;
25:     // Generate observational data.
26:     $P(x_1,y_1|c_k)=rand(0,P({y_1}_{x_1}|c_k))$;
27:     $P(x_1,y_2|c_k)=rand(0,P({y_2}_{x_1}|c_k))$;
28:     $P(x_1|c_k)=rand(P(x_1,y_1|c_k)+P(x_1,y_2|c_k),\min\{P(x_1,y_1|c_k)+1-P({y_1}_{x_1}|c_k),P(x_1,y_2|c_k)+1-P({y_2}_{x_1}|c_k)\})$;
29:     $P(x_1,y_3|c_k)=P(x_1|c_k)-P(x_1,y_1|c_k)-P(x_1,y_2|c_k)$;
30:     $P(x_2|c_k)=1-P(x_1|c_k)$;
31:     $P(x_2,y_1|c_k)=rand(0,\min\{P({y_1}_{x_2}|c_k),P(x_2|c_k)\})$;
32:     $P(x_2,y_2|c_k)=rand(0,\min\{P({y_2}_{x_2}|c_k),P(x_2|c_k)-P(x_2,y_1|c_k)\})$;
33:     $P(x_2,y_3|c_k)=P(x_2|c_k)-P(x_2,y_1|c_k)-P(x_2,y_2|c_k)$;
34:     // Validate the data: the experimental and observational data should satisfy $P(x,y|c_k)\leq P(y_x|c_k)\leq P(x,y|c_k)+1-P(x|c_k)$.
35:     $mark=True$;
36:     for $i=1$ to $3$ do
37:        for $j=1$ to $2$ do
38:           if $P({y_i}_{x_j}|c_k)<P(x_j,y_i|c_k)$ or $P({y_i}_{x_j}|c_k)>P(x_j,y_i|c_k)+1-P(x_j|c_k)$ then
39:              $mark=False$;
40:           end if
41:        end for
42:     end for
43:     if $mark==False$ then
44:        continue;
45:     end if
46:     $count=count+1$;
47:  end while
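For readers who prefer executable code, the following is a minimal Python rendering of Algorithm 3 (a sketch; the function name and the $(i,j)$ dictionary keys, indexing $y_i$ and $x_j$, are ours). The rejection test at the end implements the validation of lines 34--45:

    import random

    def generate_sample_distributions(num):
        # Generate `num` pairs of experimental and observational distributions
        # consistent with randomly drawn fractions of the nine response types.
        samples = []
        while len(samples) < num:
            # f[0..8]: fractions of response types; nonnegative, summing to 1.
            cuts = sorted([random.uniform(0, 1) for _ in range(8)] + [1.0])
            f = [cuts[0]] + [cuts[i] - cuts[i - 1] for i in range(1, 9)]

            # Experimental distributions P(y_i_{x_j} | c_k), keyed by (i, j).
            exp = {(1, 1): f[0] + f[1] + f[2], (2, 1): f[3] + f[4] + f[5],
                   (3, 1): f[6] + f[7] + f[8], (1, 2): f[0] + f[3] + f[6],
                   (2, 2): f[1] + f[4] + f[7], (3, 2): f[2] + f[5] + f[8]}

            # Observational distributions P(x_j, y_i | c_k), keyed by (i, j).
            obs = {(1, 1): random.uniform(0, exp[(1, 1)])}
            obs[(2, 1)] = random.uniform(0, exp[(2, 1)])
            p_x1 = random.uniform(obs[(1, 1)] + obs[(2, 1)],
                                  min(obs[(1, 1)] + 1 - exp[(1, 1)],
                                      obs[(2, 1)] + 1 - exp[(2, 1)]))
            obs[(3, 1)] = p_x1 - obs[(1, 1)] - obs[(2, 1)]
            p_x2 = 1 - p_x1
            obs[(1, 2)] = random.uniform(0, min(exp[(1, 2)], p_x2))
            obs[(2, 2)] = random.uniform(0, min(exp[(2, 2)], p_x2 - obs[(1, 2)]))
            obs[(3, 2)] = p_x2 - obs[(1, 2)] - obs[(2, 2)]

            # Validation: P(x_j, y_i | c) <= P(y_i_{x_j} | c)
            #                             <= P(x_j, y_i | c) + 1 - P(x_j | c).
            p_x = {1: p_x1, 2: p_x2}
            if all(obs[(i, j)] <= exp[(i, j)] <= obs[(i, j)] + 1 - p_x[j]
                   for i in (1, 2, 3) for j in (1, 2)):
                samples.append((exp, obs))
        return samples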