Towards Efficient Local Causal Structure Learning

Shuai Yang, Hao Wang, Kui Yu^∗, Fuyuan Cao, and Xindong Wu S. Yang, H. Wang and K. Yu are with the Key Laboratory of Knowledge Engineering With Big Data of Ministry of Education, Hefei University of Technology, China, also with the School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China. E-mail: yangs@mail.hfut.edu.cn; {jsjxwangh, yukui}@hfut.edu.cn.F. Cao is with the School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China. E-mail: cfy@sxu.edu.cn. X. Wu is with the Key Laboratory of Knowledge Engineering With Big Data of Ministry of Education, Hefei University of Technology, Hefei 230601, China, also with Mininglamp Academy of Sciences, Mininglamp Technology, Beijing, 100102, China. E-mail: wuxindong@mininglamp.com.Manuscript received month day, year; revised month day, year. (*Corresponding author: Kui Yu.)

Abstract

Local causal structure learning aims to discover and distinguish direct causes (parents) and direct effects (children) of a variable of interest from data. While emerging successes have been made, existing methods need to search a large space to distinguish direct causes from direct effects of a target variable T. To tackle this issue, we propose a novel Efficient Local Causal Structure learning algorithm, named ELCS. Specifically, we first propose the concept of N-structures, then design an efficient Markov Blanket (MB) discovery subroutine to integrate MB learning with N-structures to learn the MB of T and simultaneously distinguish direct causes from direct effects of T. With the proposed MB subroutine, ELCS starts from the target variable, sequentially finds MBs of variables connected to the target variable and simultaneously constructs local causal structures over MBs until the direct causes and direct effects of the target variable have been distinguished. Using eight Bayesian networks the extensive experiments have validated that ELCS achieves better accuracy and efficiency than the state-of-the-art algorithms.

Index Terms:

Bayesian network, Markov Blanket, Local causal structure learning.

1 Introduction

CAUSAL discovery has always been an important goal in many scientific fields, such as medicine, computer science and bioinformatics [1, 2, 3, 4]. There has been a great deal of recent interest in discovering causal relationships between variables, since it is not only helpful to reveal the underlying data generating mechanism, but also to improve classification and prediction performance in both static and non-static environments [5]. However, in many real-world scenarios, it is difficult to discover causal relationships between variables since true causality can only be identified using controlled experimentation [6].

Statistical approaches are useful in generating testable causal hypotheses which can accelerate the causal discovery process [7]. Learning a Bayesian network (BN) from observational data is the popular method for causal structure learning and causal inference. The structure of a BN takes the form of a directed acyclic graph (DAG) in which nodes of the DAG represent the variables and edges represent dependence between variables. A DAG implies causal concepts, since they code potential causal relationships between variables: the existence of a directed edge X $\rightarrow$ Y means that X is a direct cause of Y, and the absence of a directed edge X $\rightarrow$ Y means that X cannot be a direct cause of Y [8]. When a directed edge X $\rightarrow$ Y in a BN indicates that X is a direct cause of Y, in this case, the BN is known as a causal Bayesian network. Given a set of conditional dependencies from observational data and a corresponding DAG model, we can infer a causal Bayesian network using intervention calculus [9]. Then learning BN structures (i.e. DAGs) from observational data is the most important step for causal structure learning.

In recent years, many causal structure learning (i.e. DAG learning) methods have been designed [10], which can be roughly divided into global causal structure learning and local causal structure learning. The first type of methods aims to learn the casual structure of all variables, such as MMHC [11], NOTEARS [12] and DAG-GNN [13]. However, in many practical scenarios, it is not necessary to waste time to learn a global structure when we are only interested in the causal relationships around a given variable. To tackle this issue, the second type of methods is proposed, with the aim to discover and distinguish the direct causes (parents) and direct effects (children) of a target variable, such as PCD-by-PCD [14] and CMB [15].

PCD-by-PCD (PCD means Parents, Children and some Descendants) [14] and CMB (Causal Markov Blanket) [15] first learn the PCD or MB (Markov Blanket) of a target variable and construct a local structure among the target variable and the variables in the PCD or MB, then sequentially learn PCDs or MBs of the variables connected to the target variable and simultaneously construct local structures among variables in PCDs or MBs until the parents and children of the target variable have been distinguished.

While emerging successes have been made, existing local causal structure learning methods suffer from the following limitations. They need to search a large space to distinguish parents from children of a target variable. That is to say, existing local causal structure learning methods not only need to learn the PCD or MB of the target variable, but also may need to learn PCDs or MBs of the variables connected to the target variable. In the worst case (e.g. the target variable has all single ancestors) all existing methods may be required to learn PCDs or MBs of all variables in a dataset. This leads to that existing local causal structure learning methods are often computationally expensive or even infeasible especially with a large-sized BN. For instance, as shown in Fig. 1, there is an N-structure (see Definition 9) formed by four variables T, A, B, and C. Given the target variable T, in order to determine the causal relationship between T and B: PCD-by-PCD is required to learn the PCD of T and PCDs of A and B. For CMB, if only the MB of T is learnt, the edge direction between T and B cannot be determined since there is no V-structures around B. CMB needs to further learn the MBs of B and A to orient the edge between T and B. In a word, both PCD-by-PCD and CMB need to search a large space to determine the edge direction between T and B. A larger search space will result in performing more conditionally independence (CI) tests for discovering the causal relationships around a given variable. More CI tests not only increase computational time, but also lead to more unreliable tests. It will be beneficial to local causal structure learning if the edge direction between B and T can be determined in learning the PCD or MB of the target variable T without learning PCDs or MBs of the other variables.

Refer to caption — Figure 1: A sample Bayesian network. The MB of T includes E and J (parents), A, B, L and K (children), C and D (spouses).

Then a question naturally arises: can we reduce the search space in determining the edge directions between a given variable and its children to speed up the local causal structure learning? To address this problem, our main contributions of the paper are summarized as follows.

•

We propose the concept of N-structures, a special local structure for edge directions in local causal structure learning. Then we propose a new local causal structure learning, called ELCS. Through leveraging the N-structures, ELCS learns the MBs of the variables as few as possible to distinguish parents from children of a given variable as many as possible, which improves the efficiency of local causal structure learning and simultaneously reduces the impact of unreliable CI tests.
•

To integrate MB learning with N-structures to infer edge directions as many as possible during the MB learning procedure, we design an efficient MB discovery subroutine (EMB) and its efficient version EMB-II. EMB not only is able to learn the MB of a variable, but also has an ability to distinguish parents from children of the variable.
•

We have conducted extensive experiments on eight benchmark BNs, and have compared ELCS with five existing causal structure learning algorithms, including three state-of-the-art global structure learning and two local structure learning algorithms, to demonstrate the effectiveness and efficiency of the ELCS algorithm.

The remainder of this paper is organized as follows. Section 2 reviews the related work, and Section 3 gives the notations and definitions. Section 4 describes the proposed ELCS algorithm in detail. Section 5 reports and discusses the experimental results. Section 6 summarizes the paper.

2 Related Work

Our work focuses on local causal structure learning and is also related to MB learning and global causal structure learning. So this section briefly introduces the related work in the three areas.

MB learning. Learning Markov Blanket (MB) plays an essential part in the skeleton learning during BN structure learning. Existing MB learning methods can be categorized into two types: constraint-based methods and score-based methods. The former employs independence tests to find the MB of a given variable [16, 17], whereas the latter learns the MB using score-based BN structure learning algorithms [18, 19].

Constraint-based methods can be roughly grouped into simultaneous MB learning and divide-and-conquer MB learning. Given the target variable T, the simultaneous MB learning algorithm aims to learn parents, children, and spouses of T simultaneously, and does not distinguish spouses of T from its PC, such as GSMB [20], IAMB [21], Inter-IAMB [22] and Fast-IAMB [23]. To reduce the sample requirement of the simultaneous MB learning algorithm, the divide-and-conquer MB learning algorithm is proposed, with the aim to find PC and spouses of the target variable separately. The representative divide-and-conquer MB learning algorithms include CCMB [16], BAMB [17], MMMB [24], HITON-MB [25], PCMB [26] and STMB [27]. Recently, a comprehensive review of the state-of-the-art MB learning algorithms are discussed in [28].

However, existing MB learning methods only learn a local skeleton around a target variable and do not distinguish parents from children in the learnt MB of a target variable.

Global causal structure learning. A large amount of methods have been designed for global causal structure learning. Recent methods can be roughly categorized into two types: local-to-global structure learning methods and continuous optimization based learning methods. The local-to-global structure learning approach, such as MMHC [11], SSL $\pm$ C/G [18] and GSBN [29], first learns the MB or PC of each variable, then constructs a skeleton of a DAG using the learnt MBs or PCs, and finally orients edges of the learnt skeleton using score-based or constraint-based causal learning algorithms. Instead of learning the MB of each variable first, GGSL [30] starts with a randomly selected variable, and then uses a score-based MB learning algorithm to gradually expand the learnt structure through a series of local structure learning steps. Based on GGSL, a parallel BN structure learning algorithm (PSL) is designed to improve the efficiency [31].

Recently, several continuous optimization based learning approaches have been proposed for global causal structure learning [13, 12, 32, 33]. Zheng et al. consider a BN structure learning problem as a purely continuous optimization problem and propose the NOTEARS algorithm [12]. DAG-GNN uses a graph neural network based deep generative model to capture the complex data distribution to learn BN structures [13]. RL-BIC uses reinforcement learning to search for a directed acyclic graph (DAG) with the best score [32]. Zhang et al. propose a DAG variational autoencoder (D-VAE) for BN structure learning [33].

However, global causal structure learning methods are time consuming or even infeasible when the number of variables of a BN is large. In fact, in many practical settings, we are only interested in distinguishing parents from children of a variable of interest. In this case, it is unnecessary and wasteful to find an entire BN structure.

Local causal structure learning. Local causal structure learning aims to learn and distinguish the parents and children of a target variable. Although many algorithms have been designed for learning a whole structure, only several algorithms have been proposed for local causal structure learning. PCD-by-PCD first discovers the PCD of a target variable, then sequentially discovers PCDs of the variables connected to the target variable and simultaneously finds V-structures and orients the edges connected to the target variable until all the parents and children of the target variable are identified [14]. CMB first learns the MB of a target variable using HITON-MB and orients edges by tracking the conditional independence changes in the MB of the target variable, then sequentially learns MBs of the variables connected to the target variable and simultaneously construct local structures along the paths starting from the target variable until the parents and children of the target variable have been identified or they cannot be identified further by continuing the process [15].

As we discussed in Section 1, both PCD-by-PCD and CMB encounter the time inefficient problem since they need to learn a large number of PCDs or MBs of the variables for distinguishing parents from children of a target variable. To tackle this issue, in this paper, we aim to develop a new method through learning the MBs of variables as few as possible while orienting edges as many as possible.

3 Notations and Definitions

In this section, we will briefly introduce some basic definitions and notations frequently used in this paper (see Table I for a summary of the notations). Let U denote a set of random variables. $\mathbb{P}$ represents a joint probability distribution over U, and $\mathbb{G}$ is a DAG over U. In a DAG, X is a parent of Y and Y is a child of X if there exists a directed edge from X to Y. X is an ancestor of Y (i.e., non-descendant of Y) and Y is a descendant of X if there exists a directed path from X to Y.

TABLE I: Summary of Notations

Notation	Meanings
U	a set of random variables
W	a subset of U
$\mathbb{P}$	a joint probability distribution over U
$\mathbb{G}$	a direct acyclic graph over U
DAG	direct acyclic graph
X, Y, Z, T	a single variable in U
Z,S	a conditioning set within U
X $\!\perp\!\!\!\perp$ Y	X and Y are independent given Z
X $\not\!\perp\!\!\!\perp$ Y	X and Y are dependent given Z
MB ${}_{\emph{T}}$	Markov Blanket of T
PC ${}_{\emph{T}}$	a set of parents and children of T
P ${}_{\emph{T}}$	a set of parents of T
C ${}_{\emph{T}}$	a set of children of T
UN ${}_{\emph{T}}$	undistinguished variables in PC ${}_{\emph{T}}$
SP ${}_{\emph{T}}$	a set of spouses of T
SP ${}_{\emph{T}}$ {X}	a spouses of T with regard to T’s child X
Sep ${}_{\emph{T}}$ {X}	a set that d-separates X from T
Sep ${}_{\emph{T}}$	a set that contains the sets Sep ${}_{\emph{T}}$ { $\cdot$ } of all variables
CSP ${}_{\emph{T}}$	a set that contains the candidate spouse sets of all PC ${}_{\emph{T}}$ variables
Que	a circular queue(first in fist out)
$\|\cdot\|$	the size of a set

Definition 1 (Conditional Independence [34]).

Given a conditioning set Z, X is conditionally independent of Y if and only if $P(X|Y,\textbf{Z})=P(X|\textbf{Z})$ .

Definition 2 (Bayesian Network [34]).

The triplet $<$ U, $\mathbb{G}$ , $\mathbb{P}$ $>$ is called a Bayesian network (BN) if $<$ U, $\mathbb{G}$ , $\mathbb{P}$ $>$ satisfies the Markov condition: each variable is conditionally independent of variables in its non-descendant given its parents in $\mathbb{G}$ .

Definition 3 (Casual Bayesian Network [9]).

A BN is called a causal Bayesian network (CBN) if a directed edge in $\mathbb{G}$ has causal interpretation, that is, X $\rightarrow$ Y indicates that X is a direct cause of Y.

Definition 4 (Causal Structure Learning).

Global causal structure learning aims to learn a DAG over U from observational data, where edges represent potential causal relationships between variables, that is, X is a direct cause of Y if there exists a directed edge from X to Y [9]. Local causal structure learning aims to discover and distinguish direct causes and direct effects of a variable of interest [15].

Definition 5 (V-structure [34]).

If there is no an edge between X and Y, and Z has two incoming edges from X and Y, respectively, then X, Z and Y form a V-structure ( $X\rightarrow Z\leftarrow Y$ ).

In a BN, Z is a collider if there are two directed edges from X to Z and from Y to Z, respectively. V-structures play an important role in determining the edge directions between variables. For example, if there is a V-structure ( $X\rightarrow Z\leftarrow Y$ ) formed by X, Y and Z, we can identify X and Y as parents of Z using conditional independence (CI) tests.

Definition 6 (D-separation [34]).

Given a set S $\subseteq$ U $\setminus$ {X,Y}, a path $\pi$ between X and Y is blocked, if one of the following conditions is satisfied: 1) there is a non-collider variable within S on $\pi$ , or 2) there is a collier variable Z on $\pi$ , while Z and any its descendants are not in S. Otherwise, $\pi$ between X and Y is unblocked. X and Y are d-separation given S if and only if each path between X and Y is blocked by S.

In a DAG, given a conditioning set, we can determine whether two variables are conditionally independent using Definition 6.

Definition 7 (Faithfulness [35]).

Given a BN $<$ U, $\mathbb{G}$ , $\mathbb{P}$ $>$ , $\mathbb{G}$ is faithful to $\mathbb{P}$ if and only if all the conditional independencies appear in $\mathbb{P}$ are entailed by $\mathbb{G}$ . $\mathbb{P}$ is faithful if and only if there is a DAG $\mathbb{G}$ such that $\mathbb{G}$ is faithful to $\mathbb{P}$ .

Definition 7 indicates that in a faithful BN, if X and Y are d-separated given the conditioning set S in $\mathbb{G}$ , then they will be conditionally independent given S in $\mathbb{P}$ .

Definition 8 (Markov Blanket [34]).

In a faithful BN, the MB of a target variable T is denoted as MB_T, which is uniqueness and consists of parents, children and spouses (other parents of the target’s children) of T. All other variables in U $\setminus$ MB_T $\setminus$ {T} are conditionally independent of T given MB_T, $\forall$ X $\subseteq$ U $\setminus$ MB_T $\setminus$ {T}, X $\!\perp\!\!\!\perp$ T $|$ MB_T, where X $\!\perp\!\!\!\perp$ T $|$ MB_T denotes X and T are conditionally independent conditioning on MB_T.

Definition 9 (N-structure).

In a faithful BN, if there exists four variables T, A, B and C, and T is a parent of A and B, C is a parent of A, there is no an edge between C and T, A is an ancestor of B, the other parents of B are in PC set of T. Then, the local structure formed by the four variables is called an N-structure.

Fig. 2 gives examples of the N-structures. In Fig. 2 (a), there is an N-structure formed by T, A, B and C. In Fig. 2 (b), variables T, A, B and C construct an N-structure, and variables T, A, E and C construct an N-structure. Given the target variable T, we can leverage the N-structures to determine edge directions between T and its children (i.e. B and E) during learning the MB of T without learning MBs of the other variables.

Theorem 1 [35].

In a faithful BN, for any two variables X $\in$ U and Y $\in$ U, if there exists an edge between X and Y, then $\forall$ S $\subseteq$ U $\setminus$ {X,Y}, X $\not\!\perp\!\!\!\perp$ Y $|$ S holds.

Theorem 1 demonstrates that if X is a parent (or a child) of Y if X and Y are not conditionally independent conditioning any subsets excluding X and Y.

4 The Proposed Method

4.1 The ELCS Algorithm

We propose the Efficient Local Causal Structure learning algorithm (ELCS) to distinguish parents from children of a target variable, as shown in Algorithm 1. ELCS starts from the target variable, sequentially finds MBs of variables connected to the target variable and simultaneously constructs local causal structures over MBs until all the parents and children of the target variable have been distinguished or it is clear that they cannot be further distinguished by continuing the process. In the following, we first summarize the main idea of ELCS, then give the details of ELCS.

To improve the efficiency of local causal structure learning, in ELCS, we propose the following two acceleration strategies. First, ELCS finds the N-structures, and then leverages those found N-structures to infer edge directions between the target variable T and its children during learning the MB of T. Second, two rules in Lemma 1 are used to further infer edge directions between T and its PC during learning the MB of T.

As described in Algorithm 1, given the target variable T, ELCS first initializes the variable set W and the queue Que to empty sets (line 1 in Algorithm 1), where W is used to store variables that their MBs have been learnt, and Que is utilized to store variables that their MBs need to be learnt in next phase. Then, T enters Que (line 2 in Algorithm 1). Next, lines 4-13 in Algorithm 1 will be executed. At line 4 in Algorithm 1, the header element in Que is out of queue, that is, X = T. Since the MB of T has not been learnt, T is added to the set W (lines 5-6 in Algorithm 1). The EMB (Efficient Markov Blanket discovery) subroutine is executed to learn the MB of T (line 7 in Algorithm 1). Given a variable X, the EMB subroutine not only is able to find the PC (PC ${}_{\emph{X}}$ ) and MB (SP ${}_{\emph{X}}$ $\cup$ PC ${}_{\emph{X}}$ ), but also has an ability to distinguish parents from children of X. The details of EMB are described in Section 4.2. Let P ${}_{\emph{X}}$ represent a set of the identified parents of X. C ${}_{\emph{X}}$ denotes a set of the identified children of X. The set containing undistinguished PC variables of X is denoted as UN ${}_{\emph{X}}$ . After executing the line 7 in Algorithm 1, the sets of P ${}_{\emph{T}}$ , C ${}_{\emph{T}}$ , UN ${}_{\emph{T}}$ , SP ${}_{\emph{T}}$ , PC ${}_{\emph{T}}$ are obtained. If UN ${}_{\emph{T}}$ is empty, that is, parents and children of T are all distinguished, the learning process will be terminated. Otherwise, the undistinguished variables within UN ${}_{\emph{T}}$ will be put in Que (lines 9-11 in Algorithm 1). Then, Meek rules [36] are used to orient other edge directions between variables in W (line 13 in Algorithm 1). Lines 4-13 in Algorithm 1 will be repeated until all the parents and children of T have been distinguished or Que is empty or the size of W equals to that of the entire variable set.

10: Data D, target variable T , random variables set U 20: P

{}_{\emph{T}}

, C

{}_{\emph{T}}

, UN

{}_{\emph{T}}

31: W

\leftarrow

\emptyset

, Que =

\emptyset

42: Que

.push

(T) 53: Repeat 4: X = Que.

pop()

5: if X

\not\in

W then 6: W

\leftarrow

\cup

{X} 7: [P

{}_{\emph{X}}

, C

{}_{\emph{X}}

, UN

{}_{\emph{X}}

, SP

{}_{\emph{X}}

, PC

{}_{\emph{X}}

] = EMB(D, X) 68: Orienting edge directions between X and its PC nodes according to P

{}_{\emph{X}}

and C

{}_{\emph{X}}

9: for each Y

\in

{}_{\emph{X}}

do 10: Que

.push

(Y) 11: end for 12: end if 713: Using Meek rules to orient other edge directions between variables in W 814: Until (1) all parents and children of T can be determined, or 9(2) Que =

\emptyset

, or (3)W = U

Algorithm 1 ELCS

10: Data D, target variable T 20: P

{}_{\emph{T}}

, C

{}_{\emph{T}}

, UN

{}_{\emph{T}}

, SP

{}_{\emph{T}}

, PC

{}_{\emph{T}}

3/*Step 1: find the PC set of T*/ 41: [PC

{}_{\emph{T}}

, Sep

{}_{\emph{T}}

]

\leftarrow

RecogPC(T, D) 5/*Step 2: find spouses of T*/ 62: [SP

{}_{\emph{T}}

, CSP

{}_{\emph{T}}

]

\leftarrow

RecogSpouses(D, T, PC

{}_{\emph{T}}

, Sep

{}_{\emph{T}}

) 7/*Step 3: remove false positive PC from PC_T*/ 3: for each Y

\in

{}_{\emph{T}}

do 4: if T

\!\perp\!\!\!\perp

|

Z, Z

\subseteq

{}_{\emph{T}}

{Y}

\cup

{}_{\emph{T}}

\setminus

{Y} then 5: PC

{}_{\emph{T}}

\leftarrow

{}_{\emph{T}}

\setminus

{Y} 6: SP

{}_{\emph{T}}

{Y}

\leftarrow

\emptyset

7: end if 88: end for 9/*Step 4: find P_T, C_T*/ 9: [P

{}_{\emph{T}}

, C

{}_{\emph{T}}

, UN

{}_{\emph{T}}

]

\leftarrow

DistinguishPC(D, T, PC

{}_{\emph{T}}

, SP

{}_{\emph{T}}

, CSP

{}_{\emph{T}}

)

Algorithm 2 EMB

10: Data D, target variable T, PC

{}_{\emph{T}}

, Sep

{}_{\emph{T}}

20: SP

{}_{\emph{T}}

, CSP

{}_{\emph{T}}

1: SP

{}_{\emph{T}}

\leftarrow

\emptyset

2: for each X

\in

\setminus

{T}

\setminus

{}_{\emph{T}}

do 3: Temp

\leftarrow

\emptyset

4: for each Y

\in

{}_{\emph{T}}

do 5: if X

\not\!\perp\!\!\!\perp

|

\emptyset

then 6: Temp

\leftarrow

Temp

\cup

{Y} 7: end if 8: end for 9: if X

\not\!\perp\!\!\!\perp

|

Temp then 10: for each Y

\in

Temp do 11: if X

\not\!\perp\!\!\!\perp

|

{Y}

\cup

Sep

{}_{\emph{T}}

{X} then 12: CSP

{}_{\emph{T}}

{Y}

\leftarrow

CSP

{}_{\emph{T}}

{Y}

\cup

{X} 13: end if 14: end for 15: end if 16: end for 17: SP

{}_{\emph{T}}

\leftarrow

CSP

{}_{\emph{T}}

18: for each Y

\in

{}_{\emph{T}}

do 19: for each X

\in

{}_{\emph{T}}

{Y} do 20: if X

\!\perp\!\!\!\perp

|

Z, Z

\subseteq

{}_{\emph{T}}

{Y}

\cup

{T}

\cup

{}_{\emph{T}}

\setminus

{X,Y} then 21: SP

{}_{\emph{T}}

{Y}

\leftarrow

{}_{\emph{T}}

{Y}

\setminus

{X} 22: end if 23: end for 324: end for

Algorithm 3 RecogSpouses

10: Data D, target variable T , PC

{}_{\emph{T}}

, SP

{}_{\emph{T}}

, CSP

{}_{\emph{T}}

20: P

{}_{\emph{T}}

, C

{}_{\emph{T}}

, UN

{}_{\emph{T}}

1: C

{}_{\emph{T}}

\leftarrow

\emptyset

, P

{}_{\emph{T}}

\leftarrow

\emptyset

2: for each Y

\in

{}_{\emph{T}}

do 3: if SP

{}_{\emph{T}}

{Y} is nonempty then 4: C

{}_{\emph{T}}

\leftarrow

{}_{\emph{T}}

\cup

{Y} 5: end if 6: end for 7: UN

{}_{\emph{T}}

\leftarrow

{}_{\emph{T}}

\setminus

{}_{\emph{T}}

8: for each X

\in

{}_{\emph{T}}

do 9: if CSP

{}_{\emph{T}}

{X}

\cap

{}_{\emph{T}}

is nonempty then 10: C

{}_{\emph{T}}

\leftarrow

{}_{\emph{T}}

\cup

{X} 11: end if 12: end for 13: for each X

\in

{}_{\emph{T}}

\setminus

{}_{\emph{T}}

do 14: for each Y

\in

{}_{\emph{T}}

\setminus

{}_{\emph{T}}

do 15: if X

\!\perp\!\!\!\perp

|

\emptyset

and X

\not\!\perp\!\!\!\perp

|

T then 16: P

{}_{\emph{T}}

\leftarrow

{}_{\emph{T}}

\cup

{X}

\cup

{Y} 17: end if 18: end for 19: end for 20: UN

{}_{\emph{T}}

\leftarrow

{}_{\emph{T}}

\setminus

{}_{\emph{T}}

\setminus

{}_{\emph{T}}

21: for each X

\in

{}_{\emph{T}}

do 22: for each Y

\in

{}_{\emph{T}}

do 23: if X

\not\!\perp\!\!\!\perp

|

\emptyset

and X

\!\perp\!\!\!\perp

|

T then 24: C

{}_{\emph{T}}

\leftarrow

{}_{\emph{T}}

\cup

{X} 25: break 26: end if 27: end for 28: end for 29: UN

{}_{\emph{T}}

\leftarrow

{}_{\emph{T}}

\setminus

{}_{\emph{T}}

\setminus

{}_{\emph{T}}

Algorithm 4 DistinguishPC

4.2 EMB Subroutine

ELCS depends on the MB learning methods for local causal structure learning, but existing MB learning algorithms have the following shortcomings. First, existing MB learning methods cannot be directly combined with the N-structures to infer edge directions between the target variable and its children. Second, existing MB learning methods only focus on learning the MB of the target variable and are not able to distinguish parents from children. Third, existing MB learning methods may be computationally expensive. In order to help ELCS to leverage N-structures and the rules in Lemma 1 for efficiently learning local causal structures, we design an Effective MB discovery subroutine (EMB) to learn the MB of a target variable and distinguish parents from children of the target variable simultaneously.

As shown in Algorithm 2, EMB consists of four steps as follows. Given a target variable T, EMB first learns the PC of T (PC ${}_{\emph{T}}$ ) using an existing PC learning algorithm. Second, EMB obtains spouses of T using a RecogSpouses subroutine. Then, EMB removes false PC from PC ${}_{\emph{T}}$ . Finally, EMB orients edges between T and its PC as many as possible using a DistinguishPC subroutine. Specifically, to find the N-structures, EMB first determines which variable within U $\setminus$ {T} $\setminus$ PC ${}_{\emph{T}}$ is a candidate spouse of T, and obtains the candidate spouse set CSP ${}_{\emph{T}}$ {Y} of each variable Y within PC ${}_{\emph{T}}$ , and then obtains the spouses of T. Based on the learnt CSP ${}_{\emph{T}}$ {Y} and spouses, we can find the N-structures. Through leveraging the found N-structures, EMB can distinguish some children of T with regard to the found N-structures. In addition, two rules in Lemma 1 are used in the DistinguishPC subroutine to further distinguish parents from children of T. In the following, we will give the details of these four steps.

Lemma 1.

The PC (parents and children) set of a given variable T (T $\in$ U) is denoted as PC_T. Let X $\in$ PC_T, Y $\in$ PC_T. We can get the following two dependence relationships between X and Y.

(a) X $\!\perp\!\!\!\perp$ Y $|$ $\emptyset$ and X $\not\!\perp\!\!\!\perp$ Y $|$ T $\Rightarrow$ X and Y are both parents of T. This shows that there is a V-structure ( $X\rightarrow T\leftarrow Y$ ) formed by variables X, Y and T, and T is a collider.

(b) X is a direct cause of T, X $\not\!\perp\!\!\!\perp$ Y $|$ $\emptyset$ and X $\!\perp\!\!\!\perp$ Y $|$ T $\Rightarrow$ Y is a direct effects of T. This shows that there is only one path ( $X\rightarrow T\rightarrow Y$ ) from X to Y, and the path is blocked by T.

Step 1 (line 1 in Algorithm 2): EMB obtains PC ${}_{\emph{T}}$ and Sep ${}_{\emph{T}}$ of a target variable T by utilizing an existing PC learning algorithm, where Sep ${}_{\emph{T}}$ is a set that contains the sets Sep ${}_{\emph{T}}$ { $\cdot$ } of all variables. In this paper, we use HITON-PC [25] to find the PC of T (any other state-of-the-art PC learning algorithms can be used here to instantiate the RecogPC() function at Step 1 in Algorithm 2).

Step 2 (line 2 in Algorithm 2): At this step, EMB learns spouses of T. We design a RecogSpouses subroutine for learning spouses. The details of RecogSpouses are described in Algorithm 3. RecogSpouses first finds candidate spouses from all variables within U $\setminus$ {T} $\setminus$ PC ${}_{\emph{T}}$ that are conditionally independent of T. If X and T are conditionally independence, then we construct a set Temp that consists of variables which belong to PC ${}_{\emph{T}}$ and are dependent of X given an empty set (lines 3-8 in Algorithm 3). If X and T are conditionally independent conditioning on Temp, then X cannot be a spouse of T. Otherwise, X is regarded as a candidate spouse and lines 9-15 in Algorithm 3 will be executed. If X and T are dependent conditioning on Sep ${}_{\emph{T}}$ {X} $\cup$ Y (Y $\in$ Temp), then X will be added to CSP ${}_{\emph{T}}$ {Y} (lines 9-15 in Algorithm 3). Since some non-parent variables of Y will be added to CSP ${}_{\emph{T}}$ {Y}, non-parent variables of each Y $\in$ PC ${}_{\emph{T}}$ will be removed from CSP ${}_{\emph{T}}$ {Y} and the spouse set SP ${}_{\emph{T}}$ {Y} will be obtained after executing lines 17-24 in Algorithm 3.

Step 3 (lines 3-8 in Algorithm 2): At this step, EMB removes false positives from the candidate set of PC of T. For each variable Y within PC ${}_{\emph{T}}$ , if there exists a subset Z of the union SP ${}_{\emph{T}}$ {Y} $\cup$ PC ${}_{\emph{T}}$ such that Y and T are conditionally independent conditioning on Z, Y will be removed from PC ${}_{\emph{T}}$ , and SP ${}_{\emph{T}}$ {Y} will be set to an empty set.

Step 4 (line 9 in Algorithm 2): At this step, EMB distinguishes parents from children of T as many as possible. We propose a DistinguishPC subroutine to accomplish this goal. DistinguishPC first identifies some children of T with the help of spouses of T. Second, DistinguishPC uses the found N-structures to infer edge directions between T and its children. Finally, DistinguishPC distinguishes parents from children of T using Lemma 1.

The details of DistinguishPC are described in Algorithm 4. First, DistinguishPC uses the learnt spouses to identify some children of T (lines 2-6 in Algorithm 4). For example, in Fig. 1, C and D are spouses of T, SP ${}_{\emph{T}}$ {A} = {C}, SP ${}_{\emph{T}}$ {K} = {D}, and DistinguishPC identifies A and K as children of T. There exists an N-structure which is formed by C, A, B and T in Fig. 1, DistinguishPC can determine the edge direction between T and B with the help of SP ${}_{\emph{T}}$ and CSP ${}_{\emph{T}}$ . At Step 2, C will be added to CSP ${}_{\emph{T}}$ {A} and CSP ${}_{\emph{T}}$ {B}, and C will be removed from SP ${}_{\emph{T}}$ {B} because C is not a parent of B (lines 17-24 in Algorithm 3). Since C is a spouse of T and C is within the set CSP ${}_{\emph{T}}$ {B}, DistinguishPC identifies B as a child of T (lines 8-12 in Algorithm 4). Theorem 2 gives the theoretical analysis. In addition, in order to orient more edges between T and its PC, two rules in Lemma 1 are used. If X $\!\perp\!\!\!\perp$ Y $|$ $\emptyset$ and X $\not\!\perp\!\!\!\perp$ Y $|$ T, we can conclude that X and Y are both parents of T. Therefore, DistinguishPC identifies parents of T as many as possible using Lemma 1 (a) (lines 13-19 in Algorithm 4). If X is a parent of T, X $\not\!\perp\!\!\!\perp$ Y $|$ $\emptyset$ and X $\!\perp\!\!\!\perp$ Y $|$ T, then Y is a child of T. DistinguishPC uses the identified parents of T to determine edge directions between T and its children using Lemma 1 (b) (lines 21-28 in Algorithm 4).

We also propose a variant of EMB to further improve the efficiency of MB learning, which is referred to as EMB-II. Compared with EMB, in learning MB, instead of directly executing line 20 in Algorithm 3, EMB-II first ranks the variables within SP ${}_{\emph{T}}$ {Y} in descending order according to the dependency with variable Y, then executes line 20 in Algorithm 3.

We also propose a variant of ECLS to further improve the efficiency of local causal structure learning, which is referred to as ELCS-II. Compared with ECLS, ELCS-II uses EMB-II for MB learning instead of using EMB.

In the following, we will give the details of Theorem 2 and its proof.

Theorem 2.

In a faithful BN, given an N-structure consisting of four variables T, A, B, C. T is a parent of A and B, C is a parent of A, there is no an edge between C and T, A is an ancestor of B, and the parents of B are in PC set of T, then EMB identifies B as a child of T during learning the MB of T.

Proof.

Under the faithfulness assumption, the PC of a target variable T only contains parents and children of T. Since C is a parent of A, there is no an edge between C and T, then EMB identifies C as a spouse of T since there is a V-structure ( $\emph{C}\rightarrow\emph{A}\leftarrow\emph{T}$ ) around A. At step 2 of EMB, C will be added to CSP ${}_{\emph{T}}$ {B} (lines 2-16 in Algorithm 3) since C and T are conditionally dependent given the conditioning set {B} $\cup$ Sep ${}_{\emph{T}}$ {C}. If B is a parent of T, then C and T are conditionally independent given the conditioning set {B} $\cup$ Sep ${}_{\emph{T}}$ {C}, since all paths from T to C are blocked by conditioning set {B} $\cup$ Sep ${}_{\emph{T}}$ {C}. Therefore, B is a child of T. ∎

4.3 Tracing

We first trace the execution of EMB using the example in Fig. 3, then trace the execution of ELCS using the example in Fig. 4.

4.3.1 Tracing EMB

We utilize the example in Fig. 3 to trace the execution of EMB. Suppose that we have a dataset for variable set U = {A, B, C, D, E, F, X, H, I, J, K, L, T}. The independence relationships between variables can be represented by the Bayesian Network structure in Fig. 3. In the following, we regard T as the target variable, and give the execution process of EMB.

(1) step 1: referring to the simple network, i.e., the left network in Fig. 3. First, HITON-PC is used to find the PC of T. According to Theorem 1, A, B, L, K, E and J will be added to PC ${}_{\emph{T}}$ . Note that D is conditionally independent of T given an empty set, hence D will not be in any of the conditioning sets for higher order conditional independence tests. As a result, I will be added to PC ${}_{\emph{T}}$ since T and I are conditionally dependent given conditioning set {K}. Then, as shown in Fig. 3 (a), PC ${}_{\emph{T}}$ = {A, B, L, K, E, J, I}.

(2) step 2: as shown in Fig.3 (b), EMB discovers the spouses of T. X and each variable within Temp = {A, B, L, K, E, I} are conditionally dependent given an empty set, while X is conditionally independent of T given the conditioning set Temp, so that X cannot be a candidate spouse of T since each path from X to T is blocked by Temp. Similarly, both F and H are not candidate spouses. C and each variable within Temp = {A, B, L, K, E, I} are conditionally dependent given an empty set, and C is dependent of T given Temp, and we need to conduct further tests. Owning to C $\!\perp\!\!\!\perp$ T $|$ E, C $\not\!\perp\!\!\!\perp$ T $|$ {E, A}, C $\not\!\perp\!\!\!\perp$ T $|$ {E, B}, C $\!\perp\!\!\!\perp$ T $|$ {E, L}, C $\!\perp\!\!\!\perp$ T $|$ {E, K} and C $\!\perp\!\!\!\perp$ T $|$ {E, I}, hence C is added to CSP ${}_{\emph{T}}$ {A} and CSP ${}_{\emph{T}}$ {B}, CSP ${}_{\emph{T}}$ {A} = {C}, CSP ${}_{\emph{T}}$ {B} = {C}. Similarly, D is added to CSP ${}_{\emph{T}}$ {K} and CSP ${}_{\emph{T}}$ {I}, CSP ${}_{\emph{T}}$ {K} = {D}, CSP ${}_{\emph{T}}$ {I} = {D}. In the following, C will be removed from SP ${}_{\emph{T}}$ {B} since C $\!\perp\!\!\!\perp$ B $|$ {A, T} (lines 17-24 in Algorithm 3). After this step, SP ${}_{\emph{T}}$ {A} = {C}, SP ${}_{\emph{T}}$ {K} = {D}, SP ${}_{\emph{T}}$ {I} = {D}.

(3) step 3: as shown in Fig. 3 (c), after checking at line 4 in Algorithm 2, I will be removed from PC ${}_{\emph{T}}$ since I $\!\perp\!\!\!\perp$ T $|$ {K, D}. After this step, PC ${}_{\emph{T}}$ = {A, B, L, K, E, J}, SP ${}_{\emph{T}}$ = {C, D}, and MB ${}_{\emph{T}}$ ={A, B, L, K, E, J, C, D}.

(4) step 4: as shown in Fig. 3 (d), EMB orients the edge directions between T and its PC as many as possible. Since C is a spouse of T, and C has been added to SP ${}_{\emph{T}}$ {B} at step 2, based on Theorem 2, B is a child of T. In addition, according to Lemma 1, E and J are parents of T since E $\!\perp\!\!\!\perp$ J $|$ $\emptyset$ and E $\not\!\perp\!\!\!\perp$ J $|$ T. L is a child of T since E is a parent of T, E $\not\!\perp\!\!\!\perp$ L $|$ $\emptyset$ and E $\!\perp\!\!\!\perp$ L $|$ T.

4.3.2 Tracing ELCS

We use the example in Fig. 4 to trace the execution of ELCS.

Suppose that we have a dataset for variable set U = {A, B, C, D, E, F, X, H, I, J, K, L, T, Y}. The independence relationships between variables can be represented by the BN structure in Fig.4. In the following, we regard T as the target variable, and give the execution process of distinguishing parents from children of T using ELCS. We use the $G(X,Y)$ = -1 to represent that X is a parent of Y, $G(X,Y)$ = 1 represents that X is adjacent to Y, $G(X,Y)$ = 0 represents that there is no an edge between X and Y.

(1) step 1: referring to the simple network, i.e., the left network in Fig. 4. We first use EMB to distinguish parents from children of T. After learning the MB of T, as shown in Fig. 4 (a), the PC ${}_{\emph{T}}$ = {A, B, L, K, E, J, Y} and SP ${}_{\emph{T}}$ = {C, D} are obtained. Then, $G(T,A)$ = -1, $G(T,B)$ = -1, $G(T,L)$ = -1, $G(T,K)$ = -1, $G(E,T)$ = -1, $G(J,T)$ = -1 and $G(Y,T)$ = 1. The edge direction between Y and T is unsure.

(2) step 2: to resolve $G(Y,T)$ , as shown in Fig. 4 (b), we need to make a further search. In the following, the MB of Y is extracted using EMB, and $G(X,Y)$ = -1, $G(F,Y)$ = -1. After updating the current local structure using Meek rules, we can learn that Y is a parent of T, that is, $G(Y,T)$ = -1.

4.4 Theoretical Analysis

In the following, we first theoretically analyze the correctness of EMB, then theoretically analyze the correctness of ELCS.

Theorem 3 (Correctness of EMB).

Under the faithfulness assumption, and all CI tests are reliable, EMB finds all and only the MB of a given variable.

Proof.

At step 1, EMB finds all the true PC variables. According to Theorem 1, the variables which are dependent with the target variable T will be added to PC ${}_{\emph{T}}$ . PC ${}_{\emph{T}}$ contains the true parents and children of T, since the true PC variables are always dependent of T. In addition, PC ${}_{\emph{T}}$ also contains some descendants of T [27]. Then, based on the results of step 1, EMB finds all the true spouses of T at step 2. If Y is a collider, X is regarded as a candidate spouse if there exists a V-structure ( $\emph{X}\rightarrow\emph{Y}\leftarrow\emph{T}$ ) formed by X, Y and T, and X will be added to CSP ${}_{\emph{T}}$ {Y} (lines 1-16 in Algorithm 3). Owning to an exhaustive search, EMB will not miss any true spouses of T. In the following, the variable X $\in$ SP ${}_{\emph{T}}$ {Y} that is a non-parent of Y will be removed from SP ${}_{\emph{T}}$ {Y} (lines 17-24 in Algorithm 3). According to the Markov condition, the variable X $\in$ SP ${}_{\emph{T}}$ {Y} will be removed if it is not a parent of Y, since conditioning set SP ${}_{\emph{T}}$ {Y} $\cup$ T $\cup$ PC ${}_{\emph{T}}$ $\setminus$ {X,Y} contains all true parents of Y. Therefore, SP ${}_{\emph{T}}$ contains all the true spouses of T. The learnt PC ${}_{\emph{T}}$ may contain some false PC variables. T and each false PC variable are conditionally independent given the spouses of the false PC variable and PC ${}_{\emph{T}}$ . At the step 3, the false PC variables found at step 1 and false spouses found at step 2 will be removed (lines 3-8 in Algorithm 2). Then, EMB contains all and only the true PC variables PC ${}_{\emph{T}}$ and spouses SP ${}_{\emph{T}}$ after executing Algorithm 2, and PC ${}_{\emph{T}}$ and SP ${}_{\emph{T}}$ together form all and only true MB variables. At step 4, based on Theorem 2 and Lemma 1, parents and children of T are learnt by EMB are correct.

∎

Theorem 4 describes the correctness of the proposed ELCS algorithm. In the following, we will introduce Theorem 4 and its proof in detail.

Theorem 4 (Correctness of ELCS).

Under the faithfulness assumption, and all CI tests are reliable, ELCS distinguishes all parents from children of a given variable.

Proof.

Under the causal faithfulness assumption, given a target variable, EMB finds all and only the true MB variables and the true PC variables of the target variable. The learnt PC set contains all and only the parents and children of the target variable. Based on Definition 5, the children identified by the learnt spouses are correct. Base on Theorem 2, the children identified by the found N-structures are correct. All the parents and children identified by Lemma 1 are correct. ELCS updates the local causal structures until the parents and children of the target variable have been distinguished. Meek rules [36] are used to orient other undirected edges between the target variable and the variables that are adjacent to the target variable during learning the local causal structure, and all the edge directions determined by Meek rules are correct. Thus, all the parents and children of a given target variable distinguished by ELCS are correct. ∎

4.5 Computational Complexity

The computational complexity of ELCS algorithm depends on the steps of discovering MB. In the following, we will give the computational complexity of EMB and ELCS.

The computational complexity of EMB: The computational complexity of EMB depends on its four steps. Given a target variable T, at step 1, the computational cost is dominated by HITON-PC which takes at most $O(|\emph{{U}}|2^{|\emph{{PC}}_{\emph{T}}|})$ CI tests to find the PC. Step 2 takes $O(|\textbf{\emph{SP}}_{\emph{T}}|2^{|\textbf{\emph{PC}}_{\emph{T}}|\emph{+}|\textbf{\emph{SP}}_{\emph{T}}|})$ CI tests, step 3 takes $O(|\textbf{\emph{PC}}_{\emph{T}}|2^{|\textbf{\emph{PC}}_{\emph{T}}|\emph{+}|\textbf{\emph{SP}}_{\emph{T}}|})$ CI tests, step 4 takes $O(2{|\textbf{\emph{PC}}_{\emph{T}}|}^{2})$ CI tests. Thus, the complexity of EMB is $O(|\emph{{U}}|2^{|\emph{{PC}}_{\emph{T}}|}\emph{+}|\textbf{\emph{SP}}_{\emph{T}}|2^{|\textbf{\emph{PC}}_{\emph{T}}|\emph{+}|\textbf{\emph{SP}}_{\emph{T}}|}+|\textbf{\emph{PC}}_{\emph{T}}|2^{|\textbf{\emph{PC}}_{\emph{T}}|\emph{+}|\textbf{\emph{SP}}_{\emph{T}}|}+2{|\textbf{\emph{PC}}_{\emph{T}}|}^{2})$ = $O(|\emph{{U}}|2^{|\emph{{PC}}_{\emph{T}}|})$ . Let $|\textbf{\emph{PC}}|$ represent the largest size of the PC sets of all the variables, the complexity of EMB is $O(|\emph{{U}}|2^{|\emph{{PC}}|})$ .

We summarize the computational complexity of EMB and its rivals in Table II. From the table, IAMB is the fastest among all MB learning algorithms. MMMB and HINTON-MB are the slowest two algorithms, since they need to find the PC sets of all target variable’s parents and children. The computational complexity of EMB is lower than STMB. In general, most of the BNs have a large number of variables but a small-sized PC set of each variable, so that EMB is faster than STMB. The computational complexity of BAMB is the same with EMB.

The computational complexity of ELCS: In the best case, the complexity of ELCS is $O(|\emph{{U}}|2^{|\emph{{PC}}|})$ . But in the worst case (e.g. the target variable has all single ancestors), ELCS needs to learn a whole structure, hence the complexity of ELCS is $O(|\emph{{U}}|^{2}2^{|\emph{{PC}}|})$ .

Table III reports the computational complexity of ELCS and its rivals. Note that $m$ $\ll|\emph{{U}}|$ is the memory size of L-BFGS [12]. $n$ and $t$ are the number of samples in data and iterations, respectively. $h$ is the number of neurons in the hidden layer [13]. The computational complexity of global causal structure learning algorithms is higher than that of local causal structure learning algorithms, since they learn the whole structure of all variables. PCD-by-PCD uses MMPC for PC learning, and CMB uses HITON-MB for MB learning. In the best case, the complexity of PCD-by-PCD is consistent with that of MMMB, it is $O(|\emph{{U}}||\textbf{\emph{PC}}|2^{|\textbf{\emph{PC}}|})$ . The complexity of CMB is consistent with that of HITON-MB, it is $O(|\emph{{U}}||\textbf{\emph{PC}}|2^{|\textbf{\emph{PC}}|})$ . In the worst case, since MMPC can discover the separating sets while learning PC, the complexity of PCD-by-PCD is consistent with that of using MMPC to find PCs of all variables, it is $O(|\emph{{U}}|^{2}2^{|\textbf{\emph{PC}}|})$ . The complexity of CMB is consistent with that of using HITON-MB to find MBs of all variables, it is $O(|\emph{{U}}|^{2}|\textbf{\emph{PC}}|2^{|\textbf{\emph{PC}}|})$ . ELCS is the fastest algorithm in both best and worst cases, since the complexity of ELCS relies on that of EMB which only takes $O(|\emph{{U}}|2^{|\emph{{PC}}|})$ CI tests to find the MB.

TABLE II: Computational Complexity of Each Markov Blanket Learning Algorithm

Algorithms	IAMB	MMMB	HITON-MB	STMB	BAMB	EMB
Complexity	$O(\|\emph{{U}}\|^{2})$	$O(\|\emph{{U}}\|\|\textbf{\emph{PC}}\|2^{\|\textbf{\emph{PC}}\|})$	$O(\|\emph{{U}}\|\|\textbf{\emph{PC}}\|2^{\|\textbf{\emph{PC}}\|})$	$O(\|\emph{{U}}\|2^{\|\emph{{U}}\|})$	$O(\|\emph{{U}}\|2^{\|\textbf{\emph{PC}}\|})$	$O(\|\emph{{U}}\|2^{\|\emph{{PC}}\|})$

TABLE III: Computational Complexity of Local and Global Causal Structure Learning Algorithms

Algorithm	Worst Case	Best Case
PCD-by-PCD	$O(\|\emph{{U}}\|^{2}2^{\|\textbf{\emph{PC}}\|})$	$O(\|\emph{{U}}\|\|\textbf{\emph{PC}}\|2^{\|\textbf{\emph{PC}}\|})$
CMB	$O(\|\emph{{U}}\|^{2}\|\textbf{\emph{PC}}\|2^{\|\textbf{\emph{PC}}\|})$	$O(\|\emph{{U}}\|\|\textbf{\emph{PC}}\|2^{\|\textbf{\emph{PC}}\|})$
ELCS	$O(\|\emph{{U}}\|^{2}2^{\|\emph{{PC}}\|})$	$O(\|\emph{{U}}\|2^{\|\emph{{PC}}\|})$
MMHC	$O(\|\emph{{U}}\|^{2}2^{\|\emph{{U}}\|})$	$O(\|\emph{{U}}\|^{2}2^{\|\emph{{PC}}\|})$
NOTEARS	$O(m^{2}\|\emph{{U}}\|^{2}+m^{3}+mt\|\emph{{U}}\|^{2})$
DAG-GNN	$O(nht\|\emph{{U}}\|^{4}$ )

5 Experiments

In this section, we evaluate the performance of the proposed ELCS algorithm, and this section is organized as follows. Section 5.1 gives the experimental settings, Section 5.2 summarizes and discusses the experimental results, Section 5.3 analyses the reason why ELCS is efficient and effective.

5.1 Experimental Settings

5.1.1 Datasets

We use eight benchmark BNs to evaluate ELCS against its rivals. Each benchmark BN contains two groups of data, one group containing 10 data sets with 5000 data examples, and the other one including 10 data sets with 1000 data examples. The number of variables of these BNs ranges from 20 to 801. A brief description of the eight benchmark BNs is listed in Table IV.

5.1.2 Comparison Methods

We compare our approach ELCS with three state-of-the-art global causal structure learning algorithms, including MMHC [11], NOTEARS [12] and DAG-GNN [13], and two local causal structure learning algorithms, including PCD-by-PCD [14] and CMB [15]. In addition, we also compare ELCS with ELCS-II.

5.1.3 Implementation Details

PCD-by-PCD and CMB algorithms are implemented by ourselves in MATLAB (https://github.com/kuiy/CausalLearner). For MMHC, we use the implementation in the software tool of Causal Explorer [37]. For NOTEARS and DAG-GNN, we use the source codes provided by the authors. In the experiments, $G^{2}$ -test with the significance level of 0.01 is utilized to measure the conditional independence between variables. All experimental results are conducted on Windows 10 with Intel(R) i7-8700, 3.19 GHz CPU, and 16GB memory.

5.1.4 Evaluation Metrics

In the experiments, we evaluate the performance using the following metrics.

•

ArrP: The number of true directed edges in the output (i.e., the variables in the output belonging to the true parents and children of a target variable in a test DAG) divided by the number of edges in the output of an algorithm.
•

ArrR: The number of true directed edges in the output divided by the number of true directed edges (i.e., the number of parents and children of a target variable in a test DAG).
•

SHD: SHD is the number of total error edges, which contains undirected edges, reverse edges, missing edges and extra edges. The smaller value of SHD is better.
•

FDR: The number of false edges in the output divided by the number of edges in the output of an algorithm.
•

Efficiency: The number of CI tests and the running time (in seconds) are utilized to measure the efficiency.

In the following Tables, the results are reported in the format of $A\pm B$ , where $A$ denotes the average results, and $B$ represents the standard deviation. The best results in each setting have been marked in bold. ”-” means that the output of the corresponding BN cannot be generated in two days by the algorithm. Note that NOTEARS and DAG-GNN do not perform CI tests.

5.2 Experimental Results of ELCS and Its Rivals

TABLE IV: Summary of Benchmark BNs

Network

Num.

Vars

Num.

Edges

Max In/Out

Degree

Min/Max

|

PCset

|

Domain

Range

Alarm

4 / 5

1 / 6

2 - 4

Insurance

3 / 7

1 / 9

2 - 5

Child

2 / 7

1 / 8

2 - 6

Alarm10

370

570

4 / 7

1 / 9

2 - 4

Insurance10

270

556

5 / 8

1 / 11

2 - 5

Child10

200

257

2 / 7

1 / 8

2 - 6

Pigs

441

592

2 / 39

1 / 41

3 - 3

Gene

801

972

4 / 10

0 / 11

3 - 5

We compare ELCS with MMHC, NOTEARS, DAG-GNN, PCD-by-PCD, CMB and ELCS-II on the eight BNs as shown in Table IV. The average results of ArrP, ArrR, SHD, FDR, CI tests and running time of each algorithm are reported in Tables V-VI. Table V summarizes the experimental results on the eight BNs with 5,000 data examples, and Table VI reports the experimental results on the eight BNs with 1,000 data examples. From the experimental results, we have the following observations.

TABLE V: Comparison of ELCS with State-of-the-art Causal Structure Learning Algorithms on Eight Benchmark BNs (size=5,000)

Network	Algorithm	ArrP( $\uparrow$ )	ArrR( $\uparrow$ )	SHD( $\downarrow$ )	FDR( $\downarrow$ )	CI Tests( $\downarrow$ )	Time( $\downarrow$ )
Alarm	MMHC	0.19 $\pm$ 0.02	0.08 $\pm$ 0.02	4.58 $\pm$ 0.02	0.60 $\pm$ 0.01	13860 $\pm$ 4971	7.03 $\pm$ 2.74
	NOTEARS	0.61 $\pm$ 0.01	0.74 $\pm$ 0.04	2.84 $\pm$ 0.17	0.48 $\pm$ 0.02	-	541.95 $\pm$ 27.18
	DAG-GNN	0.66 $\pm$ 0.03	0.54 $\pm$ 0.04	2.02 $\pm$ 0.17	0.36 $\pm$ 0.05	-	1.1e3 $\pm$ 1.1e2
	PCD-by-PCD	0.77 $\pm$ 0.03	0.64 $\pm$ 0.05	0.94 $\pm$ 0.15	0.27 $\pm$ 0.04	2110 $\pm$ 92	0.81 $\pm$ 0.04
	CMB	0.77 $\pm$ 0.05	0.72 $\pm$ 0.06	0.76 $\pm$ 0.14	0.22 $\pm$ 0.04	2111 $\pm$ 207	0.69 $\pm$ 0.07
	ELCS	0.86 $\pm$ 0.01	0.81 $\pm$ 0.01	0.44 $\pm$ 0.06	0.07 $\pm$ 0.02	648 $\pm$ 55	0.20 $\pm$ 0.02
	ELCS-II	0.86 $\pm$ 0.01	0.81 $\pm$ 0.01	0.44 $\pm$ 0.06	0.07 $\pm$ 0.02	607 $\pm$ 52	0.19 $\pm$ 0.02
Insurance	MMHC	0.21 $\pm$ 0.02	0.03 $\pm$ 0.02	5.87 $\pm$ 0.17	0.67 $\pm$ 0.02	2603 $\pm$ 271	1.18 $\pm$ 0.12
	NOTEARS	0.46 $\pm$ 0.01	0.24 $\pm$ 0.01	4.64 $\pm$ 0.12	0.72 $\pm$ 0.02	-	420.89 $\pm$ 22.92
	DAG-GNN	0.54 $\pm$ 0.03	0.21 $\pm$ 0.01	4.35 $\pm$ 0.25	0.67 $\pm$ 0.05	-	518.84 $\pm$ 37.24
	PCD-by-PCD	0.68 $\pm$ 0.02	0.45 $\pm$ 0.02	2.07 $\pm$ 0.09	0.34 $\pm$ 0.03	3038 $\pm$ 300	1.48 $\pm$ 0.16
	CMB	0.70 $\pm$ 0.04	0.54 $\pm$ 0.04	2.31 $\pm$ 0.25	0.37 $\pm$ 0.05	11553 $\pm$ 4827	5.44 $\pm$ 2.29
	ELCS	0.85 $\pm$ 0.04	0.69 $\pm$ 0.04	1.61 $\pm$ 0.06	0.18 $\pm$ 0.05	1686 $\pm$ 276	0.75 $\pm$ 0.12
	ELCS-II	0.85 $\pm$ 0.04	0.69 $\pm$ 0.04	1.61 $\pm$ 0.06	0.18 $\pm$ 0.05	1637 $\pm$ 275	0.75 $\pm$ 0.12
Child	MMHC	0.22 $\pm$ 0.03	0.19 $\pm$ 0.07	3.63 $\pm$ 0.25	0.48 $\pm$ 0.03	8600 $\pm$ 632	5.32 $\pm$ 0.46
	NOTEARS	0.52 $\pm$ 0.02	0.39 $\pm$ 0.03	2.99 $\pm$ 0.17	0.70 $\pm$ 0.03	-	140.74 $\pm$ 36.59
	DAG-GNN	0.50 $\pm$ 0.04	0.29 $\pm$ 0.06	2.08 $\pm$ 0.10	0.44 $\pm$ 0.06	-	384.73 $\pm$ 30.76
	PCD-by-PCD	0.71 $\pm$ 0.02	0.59 $\pm$ 0.04	0.86 $\pm$ 0.09	0.26 $\pm$ 0.04	2432 $\pm$ 78	1.24 $\pm$ 0.04
	CMB	0.82 $\pm$ 0.05	0.75 $\pm$ 0.08	0.72 $\pm$ 0.18	0.25 $\pm$ 0.08	9424 $\pm$ 4106	4.58 $\pm$ 1.96
	ELCS	0.71 $\pm$ 0.12	0.61 $\pm$ 0.16	1.08 $\pm$ 0.36	0.09 $\pm$ 0.08	2093 $\pm$ 287	0.93 $\pm$ 0.10
	ELCS-II	0.71 $\pm$ 0.12	0.61 $\pm$ 0.16	1.08 $\pm$ 0.36	0.09 $\pm$ 0.08	2087 $\pm$ 287	0.93 $\pm$ 0.10
Alarm10	MMHC	0.19+0.00	0.02+0.00	5.63+0.05	0.63+0.00	9.7e7+8.9e6	4.6e4+4.8e3
	NOTEARS	0.73 $\pm$ 0.01	0.50 $\pm$ 0.01	2.27 $\pm$ 0.04	0.28 $\pm$ 0.01	-	1.6e4 $\pm$ 1.8e3
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	0.73 $\pm$ 0.01	0.54 $\pm$ 0.01	1.48 $\pm$ 0.03	0.21 $\pm$ 0.01	25795 $\pm$ 1770	8.18 $\pm$ 0.57
	CMB	0.72 $\pm$ 0.01	0.58 $\pm$ 0.01	1.57 $\pm$ 0.04	0.34 $\pm$ 0.01	14011 $\pm$ 565	3.69 $\pm$ 0.15
	ELCS	0.83 $\pm$ 0.01	0.68 $\pm$ 0.02	1.26 $\pm$ 0.07	0.14 $\pm$ 0.02	6893 $\pm$ 483	1.77 $\pm$ 0.12
	ELCS-II	0.83 $\pm$ 0.01	0.68 $\pm$ 0.02	1.26 $\pm$ 0.07	0.14 $\pm$ 0.02	6916 $\pm$ 480	1.77 $\pm$ 0.12
Insurance10	MMHC	0.22 $\pm$ 0.00	0.00 $\pm$ 0.00	6.72 $\pm$ 0.04	0.70 $\pm$ 0.00	1.9e5 $\pm$ 2.0e4	81.66 $\pm$ 10.39
	NOTEARS	0.30 $\pm$ 0.01	0.20 $\pm$ 0.00	8.67 $\pm$ 0.44	0.85 $\pm$ 0.01	-	1.7e4 $\pm$ 1.5e4
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	0.68 $\pm$ 0.01	0.46 $\pm$ 0.01	2.10 $\pm$ 0.05	0.41 $\pm$ 0.01	9581 $\pm$ 224	4.38 $\pm$ 0.13
	CMB	0.64 $\pm$ 0.01	0.49 $\pm$ 0.01	2.58 $\pm$ 0.06	0.48 $\pm$ 0.02	39932 $\pm$ 3898	16.04 $\pm$ 1.54
	ELCS	0.80 $\pm$ 0.02	0.67 $\pm$ 0.02	1.75 $\pm$ 0.11	0.23 $\pm$ 0.01	10809 $\pm$ 1528	3.92 $\pm$ 0.55
	ELCS-II	0.80 $\pm$ 0.02	0.67 $\pm$ 0.02	1.75 $\pm$ 0.11	0.23 $\pm$ 0.01	10605 $\pm$ 1499	3.91 $\pm$ 0.55
Child10	MMHC	0.15 $\pm$ 0.01	0.03 $\pm$ 0.01	5.29 $\pm$ 0.09	0.58 $\pm$ 0.01	7.9e5 $\pm$ 2.7e5	439.79 $\pm$ 169.28
	NOTEARS	0.61 $\pm$ 0.01	0.74 $\pm$ 0.04	2.84 $\pm$ 0.17	0.48 $\pm$ 0.02	-	541.95 $\pm$ 27.18
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	0.77 $\pm$ 0.01	0.68 $\pm$ 0.02	0.82 $\pm$ 0.04	0.22 $\pm$ 0.01	11341 $\pm$ 1470	4.44 $\pm$ 0.57
	CMB	0.75 $\pm$ 0.02	0.68 $\pm$ 0.02	1.03 $\pm$ 0.07	0.31 $\pm$ 0.02	22861 $\pm$ 2648	8.11 $\pm$ 0.95
	ELCS	0.83 $\pm$ 0.05	0.76 $\pm$ 0.07	0.73 $\pm$ 0.20	0.14 $\pm$ 0.03	13129 $\pm$ 2613	3.98 $\pm$ 0.79
	ELCS-II	0.83 $\pm$ 0.05	0.76 $\pm$ 0.07	0.73 $\pm$ 0.20	0.14 $\pm$ 0.03	13104 $\pm$ 2605	3.98 $\pm$ 0.79
Pigs	MMHC	0.26 $\pm$ 0.00	0.00 $\pm$ 0.00	6.85 $\pm$ 0.07	1.00 $\pm$ 0.00	4.3e5 $\pm$ 1.4e4	207.96 $\pm$ 6.88
	NOTEARS	0.43 $\pm$ 0.00	0.26 $\pm$ 0.00	2.77 $\pm$ 0.03	0.77 $\pm$ 0.00	-	3.1e4 $\pm$ 1.2e3
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	-	-	-	-	-	-
	CMB	-	-	-	-	-	-
	ELCS	0.91 $\pm$ 0.00	0.99 $\pm$ 0.00	0.42 $\pm$ 0.02	0.15 $\pm$ 0.01	13374 $\pm$ 8660	8.91 $\pm$ 6.84
	ELCS-II	0.91 $\pm$ 0.00	0.99 $\pm$ 0.00	0.42 $\pm$ 0.02	0.15 $\pm$ 0.01	11467 $\pm$ 5659	8.38 $\pm$ 5.12
Gene	MMHC	-	-	-	-	-	-
	NOTEARS	-	-	-	-	-	-
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	-	-	-	-	-	-
	CMB	-	-	-	-	-	-
	ELCS	0.76 $\pm$ 0.01	0.79 $\pm$ 0.01	0.79 $\pm$ 0.03	0.32 $\pm$ 0.01	36950 $\pm$ 7876	11.03 $\pm$ 2.35
	ELCS-II	0.76 $\pm$ 0.01	0.79 $\pm$ 0.01	0.79 $\pm$ 0.03	0.32 $\pm$ 0.01	36051 $\pm$ 7696	11.02 $\pm$ 2.08

TABLE VI: Comparison of ELCS with State-of-the-art Causal Structure Learning Algorithms on Eight Benchmark BNs (size=1,000)

Network	Algorithm	ArrP( $\uparrow$ )	ArrR( $\uparrow$ )	SHD( $\downarrow$ )	FDR( $\downarrow$ )	CI Tests( $\downarrow$ )	Time( $\downarrow$ )
Alarm	MMHC	0.22 $\pm$ 0.03	0.12 $\pm$ 0.04	4.32 $\pm$ 0.20	0.57 $\pm$ 0.03	8884 $\pm$ 4471	1.64 $\pm$ 0.81
	NOTEARS	0.59 $\pm$ 0.03	0.72 $\pm$ 0.06	3.14 $\pm$ 0.21	0.51 $\pm$ 0.04	-	232.28 $\pm$ 22.13
	DAG-GNN	0.48 $\pm$ 0.04	0.26 $\pm$ 0.06	2.01 $\pm$ 0.10	0.17 $\pm$ 0.03	-	241.19 $\pm$ 23.82
	PCD-by-PCD	0.66 $\pm$ 0.07	0.49 $\pm$ 0.05	1.33 $\pm$ 0.10	0.22 $\pm$ 0.04	1737 $\pm$ 265	0.39 $\pm$ 0.05
	CMB	0.67 $\pm$ 0.06	0.52 $\pm$ 0.06	1.32 $\pm$ 0.13	0.34 $\pm$ 0.07	3171 $\pm$ 410	0.50 $\pm$ 0.07
	ELCS	0.72 $\pm$ 0.07	0.61 $\pm$ 0.08	1.06 $\pm$ 0.14	0.11 $\pm$ 0.04	901 $\pm$ 172	0.13 $\pm$ 0.03
	ELCS-II	0.72 $\pm$ 0.07	0.61 $\pm$ 0.08	1.06 $\pm$ 0.14	0.11 $\pm$ 0.04	861 $\pm$ 162	0.13 $\pm$ 0.03
Insurance	MMHC	0.22 $\pm$ 0.02	0.04 $\pm$ 0.02	5.72 $\pm$ 0.18	0.65 $\pm$ 0.03	2110 $\pm$ 293	0.44 $\pm$ 0.05
	NOTEARS	0.43 $\pm$ 0.02	0.24 $\pm$ 0.01	4.90 $\pm$ 0.12	0.75 $\pm$ 0.03	-	220.97 $\pm$ 44.51
	DAG-GNN	0.49 $\pm$ 0.06	0.15 $\pm$ 0.05	3.78 $\pm$ 0.19	0.39 $\pm$ 0.13	-	151.75 $\pm$ 18.26
	PCD-by-PCD	0.68 $\pm$ 0.04	0.40 $\pm$ 0.04	2.43 $\pm$ 0.15	0.30 $\pm$ 0.06	1370 $\pm$ 104	0.33 $\pm$ 0.02
	CMB	0.69 $\pm$ 0.06	0.46 $\pm$ 0.05	2.55 $\pm$ 0.22	0.38 $\pm$ 0.08	4457 $\pm$ 1196	0.76 $\pm$ 0.20
	ELCS	0.69 $\pm$ 0.12	0.44 $\pm$ 0.15	2.49 $\pm$ 0.40	0.37 $\pm$ 0.19	1188 $\pm$ 566	0.17 $\pm$ 0.08
	ELCS-II	0.69 $\pm$ 0.12	0.44 $\pm$ 0.15	2.49 $\pm$ 0.40	0.37 $\pm$ 0.19	1106 $\pm$ 513	0.17 $\pm$ 0.08
Child	MMHC	0.24 $\pm$ 0.02	0.18 $\pm$ 0.04	3.41 $\pm$ 0.14	0.45 $\pm$ 0.03	4583 $\pm$ 898	0.90 $\pm$ 0.19
	NOTEARS	0.49 $\pm$ 0.02	0.37 $\pm$ 0.05	3.31 $\pm$ 0.23	0.72 $\pm$ 0.04	-	66.32 $\pm$ 25.78
	DAG-GNN	0.34 $\pm$ 0.05	0.15 $\pm$ 0.03	2.20 $\pm$ 0.05	0.29 $\pm$ 0.09	-	87.70 $\pm$ 8.96
	PCD-by-PCD	0.52 $\pm$ 0.05	0.34 $\pm$ 0.06	1.61 $\pm$ 0.12	0.33 $\pm$ 0.08	2085 $\pm$ 183	0.39 $\pm$ 0.02
	CMB	0.74 $\pm$ 0.09	0.59 $\pm$ 0.10	1.27 $\pm$ 0.29	0.33 $\pm$ 0.12	4991 $\pm$ 1145	0.65 $\pm$ 0.14
	ELCS	0.82 $\pm$ 0.05	0.69 $\pm$ 0.06	1.01 $\pm$ 0.18	0.21 $\pm$ 0.06	2882 $\pm$ 815	0.34 $\pm$ 0.10
	ELCS-II	0.82 $\pm$ 0.05	0.69 $\pm$ 0.06	1.01 $\pm$ 0.18	0.21 $\pm$ 0.06	2592 $\pm$ 723	0.33 $\pm$ 0.10
Alarm10	MMHC	0.19 $\pm$ 0.00	0.03 $\pm$ 0.01	5.69 $\pm$ 0.07	0.63 $\pm$ 0.00	3.9e6 $\pm$ 3.5e5	700.63 $\pm$ 55.38
	NOTEARS	0.39 $\pm$ 0.01	0.47 $\pm$ 0.01	9.27 $\pm$ 0.49	0.69 $\pm$ 0.02	-	1.6e4 $\pm$ 1.2e3
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	0.66 $\pm$ 0.01	0.44 $\pm$ 0.02	1.74 $\pm$ 0.06	0.20 $\pm$ 0.02	26572 $\pm$ 3414	4.87 $\pm$ 0.63
	CMB	0.68 $\pm$ 0.01	0.48 $\pm$ 0.02	1.90 $\pm$ 0.06	0.39 $\pm$ 0.02	10827 $\pm$ 643	1.51 $\pm$ 0.08
	ELCS	0.75 $\pm$ 0.02	0.53 $\pm$ 0.02	1.72 $\pm$ 0.06	0.20 $\pm$ 0.03	8800 $\pm$ 1218	1.18 $\pm$ 0.16
	ELCS-II	0.75 $\pm$ 0.02	0.53 $\pm$ 0.02	1.72 $\pm$ 0.06	0.20 $\pm$ 0.03	8745 $\pm$ 1209	1.18 $\pm$ 0.16
Insurance10	MMHC	0.24 $\pm$ 0.01	0.05 $\pm$ 0.01	6.57 $\pm$ 0.05	0.63 $\pm$ 0.01	9.6e5 $\pm$ 1.2e5	236.11 $\pm$ 25.93
	NOTEARS	0.20 $\pm$ 0.02	0.20 $\pm$ 0.00	14.11 $\pm$ 0.92	0.91 $\pm$ 0.01	-	9.0e3 $\pm$ 1.1e3
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	0.63 $\pm$ 0.01	0.37 $\pm$ 0.01	2.66 $\pm$ 0.05	0.46 $\pm$ 0.02	8461 $\pm$ 1809	1.60 $\pm$ 0.35
	CMB	0.62 $\pm$ 0.01	0.45 $\pm$ 0.01	2.95 $\pm$ 0.05	0.45 $\pm$ 0.02	20895 $\pm$ 2158	3.23 $\pm$ 0.33
	ELCS	0.50 $\pm$ 0.01	0.26 $\pm$ 0.00	3.18 $\pm$ 0.04	0.65 $\pm$ 0.00	4333 $\pm$ 1736	0.60 $\pm$ 0.25
	ELCS-II	0.50 $\pm$ 0.01	0.26 $\pm$ 0.00	3.18 $\pm$ 0.04	0.65 $\pm$ 0.00	3966 $\pm$ 1423	0.59 $\pm$ 0.25
Child10	MMHC	0.22 $\pm$ 0.01	0.19 $\pm$ 0.02	4.37 $\pm$ 0.11	0.48 $\pm$ 0.01	7.7e6 $\pm$ 2.0e6	1.5e3 $\pm$ 3.9e2
	NOTEARS	0.49 $\pm$ 0.01	0.34 $\pm$ 0.02	2.99 $\pm$ 0.10	0.65 $\pm$ 0.02	-	3.3e3 $\pm$ 1.9e2
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	0.55 $\pm$ 0.02	0.36 $\pm$ 0.03	1.69 $\pm$ 0.06	0.38 $\pm$ 0.03	15698 $\pm$ 3819	2.82 $\pm$ 0.71
	CMB	0.71 $\pm$ 0.04	0.59 $\pm$ 0.03	1.58 $\pm$ 0.12	0.35 $\pm$ 0.03	26986 $\pm$ 3942	3.71 $\pm$ 0.54
	ELCS	0.67 $\pm$ 0.03	0.55 $\pm$ 0.02	1.56 $\pm$ 0.07	0.36 $\pm$ 0.02	5074 $\pm$ 658	0.67 $\pm$ 0.09
	ELCS-II	0.67 $\pm$ 0.03	0.55 $\pm$ 0.02	1.56 $\pm$ 0.07	0.36 $\pm$ 0.02	4889 $\pm$ 631	0.66 $\pm$ 0.09
Pigs	MMHC	0.26 $\pm$ 0.00	0.00 $\pm$ 0.00	6.72 $\pm$ 0.02	1.00 $\pm$ 0.00	4.6e5 $\pm$ 9.9e3	90.35 $\pm$ 2.90
	NOTEARS	0.42 $\pm$ 0.00	0.22 $\pm$ 0.01	2.85 $\pm$ 0.03	0.80 $\pm$ 0.01	-	2.4e4 $\pm$ 1.7e3
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	-	-	-	-	-	-
	CMB	-	-	-	-	-	-
	ELCS	0.91 $\pm$ 0.01	0.99 $\pm$ 0.00	0.40 $\pm$ 0.03	0.15 $\pm$ 0.01	11793 $\pm$ 3279	0.84 $\pm$ 0.13
	ELCS-II	0.91 $\pm$ 0.01	0.99 $\pm$ 0.00	0.40 $\pm$ 0.03	0.15 $\pm$ 0.01	11685 $\pm$ 3272	0.84 $\pm$ 0.13
Gene	MMHC	-	-	-	-	-	-
	NOTEARS	-	-	-	-	-	-
	DAG-GNN	-	-	-	-	-	-
	PCD-by-PCD	-	-	-	-	-	-
	CMB	-	-	-	-	-	-
	ELCS	0.77 $\pm$ 0.00	0.78 $\pm$ 0.01	0.78 $\pm$ 0.02	0.31 $\pm$ 0.01	31753 $\pm$ 3432	4.37 $\pm$ 0.47
	ELCS-II	0.77 $\pm$ 0.00	0.78 $\pm$ 0.01	0.78 $\pm$ 0.02	0.31 $\pm$ 0.01	31430 $\pm$ 3406	4.36 $\pm$ 0.47

ELCS versus MMHC. Regardless of the number of samples (5000 or 1000), ELCS is significantly better than MMHC. On the ArrP and ArrR metrics, ELCS is superior to MMHC, which means that ELCS finds more true causal edges and less false casual edges. In addition, on the SHD metric, the value of SHD of ELCS is significantly lower than that of MMHC. On the FDR metric, ELCS performs better than MMHC. Furthermore, ELCS always uses less CI tests than MMHC. To learn the local causal structure of a target variable, MMHC needs to learn the whole DAG over all variables in a dataset, hence MMHC performs much more CI tests than ELCS. Thus, we can conclude that ELCS is more efficient and effective than MMHC.

ELCS versus NOTEARS and DAG-GNN. NOTEARS and DAG-GNN are global causal learning algorithms, they need to learn the global structure over all variables, and then obtain the parents and children of a given variable. ELCS achieves better performance than NOTEARS and DAG-GNN using both 5,000 data samples and 1,000 data samples, especially using 5,000 data samples. On the ArrP, ArrR, SHD and FDR metrics, ELCS is significantly better than NOTEARS and DAG-GNN. The values of ELCS on ArrP and ArrR metrics are higher than that of NOTEARS and DAG-GNN, and lower on the SHD and FDR metrics. Since NOTEARS and DAG-GNN adopt a continuous optimization strategy to obtain the DAG from observational data, and the experimental results are susceptible to the influence of parameters. Additionally, NOTEARS and DAG-GNN need to spend much time in learning the DAG, since they obtain the optimal solution by means of a large number of iterations. In a word, ELCS is superior to NOTEARS and DAG-GNN.

ELCS versus PCD-by-PCD and CMB. Both PCD-by-PCD and CMB are local causal learning algorithms. Using 5,000 data samples, ELCS performs better than PCD-by-PCD and CMB. Except on Child, ELCS achieves highest ArrP and ArrR values, and lowest SHD and FDR values on the other BNs. In addition, ELCS uses less CI tests than PCD-by-PCD and CMB on most of BNs. Using 1,000 data samples, ELCS is superior to PCD-by-PCD and CMB on Alarm, Child, Alarm10, Pigs and Gene. ELCS is better than CMB and worse than PCD-by-PCD on Insurance on the ArrP, ArrR, SHD and FDR metrics, while ELCS has advantages in terms of CI tests and running time. On Insurance10, ELCS is worse than PCD-by-PCD and CMB on the ArrP, ArrR, SHD and FDR metrics. The reason may be that EMB learns inaccurate MBs on small data samples. ELCS is better than PCD-by-PCD and little worse than CMB on Child10 on the ArrP, ArrR, SHD and FDR metrics. Generally, ELCS performs better than PCD-by-PCD and CMB.

ELCS versus ELCS-II. ELCS-II is superior to ELCS. ELCS-II further improves the efficiency of EMB while maintaining the same performance as measured by the ArrP, ArrR, SHD and FDR metrics, which indicates the efficiency of ELCS-II.

In summary, it can be seen from Tables V-VI, ELCS is significantly better than MMHC, NOTEARS and DAG-GNN. Additionally, ELCS outperforms PCD-by-PCD and CMB on the ArrP, ArrR, SHD, FDR metrics. Specifically, compared with PCD-by-PCD and CMB, ELCS not only achieves higher ArrP and ArrR values, but also achieves lower SHD and FDR values. Furthermore, ELCS is the fastest algorithm among all structure learning algorithms. ELCS is significantly better than MMHC and NOTEARS in terms of running time. MMHC, NOTEARS and DAG-GNN are global causal learning algorithms, they need to find the global structure of a BN. In particular, ELCS is 10 times faster than MMHC and 1000 times faster than NOTEARS and DAG-GNN on average. Additionally, ELCS is also superior to PCD-by-PCD and CMB in terms of running times. ELCS is 2 times faster than PCD-by-PCD and 3 times faster than CMB on average. Specifically, MMHC, NOTEARS, PCD-by-PCD and CMB fail to generate the output on several BNs, whereas ELCS can be successful applied in learning the local causal structure of each variable within two days. But beyond that, ELCS uses the smallest number of CI tests. Overall, ELCS is superior to its rivals in both efficiency and accuracy.

TABLE VII: Comparison of ELCS with “ECLS w/o N” on Eight Benchmark BNs (size=5,000)

Network	Algorithm	ArrP( $\uparrow$ )	ArrR( $\uparrow$ )	SHD( $\downarrow$ )	FDR( $\downarrow$ )	CI Tests( $\downarrow$ )	Time( $\downarrow$ )
Alarm	ELCS	0.86 $\pm$ 0.01	0.81 $\pm$ 0.01	0.44 $\pm$ 0.06	0.07 $\pm$ 0.02	648 $\pm$ 55	0.20 $\pm$ 0.02
Alarm	ELCS w/o N	0.87 $\pm$ 0.01	0.81 $\pm$ 0.01	0.40 $\pm$ 0.06	0.05 $\pm$ 0.02	701 $\pm$ 99	0.22 $\pm$ 0.03
Insurance	ELCS	0.85 $\pm$ 0.04	0.69 $\pm$ 0.04	1.61 $\pm$ 0.06	0.18 $\pm$ 0.05	1686 $\pm$ 276	0.75 $\pm$ 0.12
Insurance	ELCS w/o N	0.85 $\pm$ 0.04	0.68 $\pm$ 0.03	1.61 $\pm$ 0.14	0.18 $\pm$ 0.05	1924 $\pm$ 257	0.84 $\pm$ 0.11
Child	ELCS	0.71 $\pm$ 0.12	0.61 $\pm$ 0.16	1.08 $\pm$ 0.36	0.09 $\pm$ 0.08	2093 $\pm$ 287	0.93 $\pm$ 0.10
Child	ELCS w/o N	0.70 $\pm$ 0.13	0.59 $\pm$ 0.16	1.13 $\pm$ 0.36	0.09 $\pm$ 0.09	2143 $\pm$ 277	0.94 $\pm$ 0.09
Alarm10	ELCS	0.83 $\pm$ 0.01	0.68 $\pm$ 0.02	1.26 $\pm$ 0.07	0.14 $\pm$ 0.02	6893 $\pm$ 483	1.77 $\pm$ 0.12
Alarm10	ELCS w/o N	0.84 $\pm$ 0.01	0.68 $\pm$ 0.02	1.24 $\pm$ 0.06	0.14 $\pm$ 0.02	7579 $\pm$ 524	2.12 $\pm$ 0.14
Insurance10	ELCS	0.80 $\pm$ 0.02	0.67 $\pm$ 0.02	1.75 $\pm$ 0.11	0.23 $\pm$ 0.01	10809 $\pm$ 1528	3.92 $\pm$ 0.55
Insurance10	ELCS w/o N	0.86 $\pm$ 0.02	0.73 $\pm$ 0.02	1.42 $\pm$ 0.10	0.17 $\pm$ 0.01	11300 $\pm$ 1713	4.49 $\pm$ 0.55
Child10	ELCS	0.83 $\pm$ 0.05	0.76 $\pm$ 0.07	0.73 $\pm$ 0.20	0.14 $\pm$ 0.03	13129 $\pm$ 2613	3.98 $\pm$ 0.79
Child10	ELCS w/o N	0.83 $\pm$ 0.06	0.76 $\pm$ 0.07	0.74 $\pm$ 0.21	0.13 $\pm$ 0.03	13491 $\pm$ 2559	4.32 $\pm$ 0.77
Pigs	ELCS	0.91 $\pm$ 0.00	0.99 $\pm$ 0.00	0.42 $\pm$ 0.02	0.15 $\pm$ 0.01	13374 $\pm$ 8660	8.91 $\pm$ 6.84
Pigs	ELCS w/o N	0.91 $\pm$ 0.00	0.99 $\pm$ 0.00	0.40 $\pm$ 0.02	0.15 $\pm$ 0.01	14343 $\pm$ 8657	9.70 $\pm$ 6.75
Gene	ELCS	0.76 $\pm$ 0.01	0.79 $\pm$ 0.01	0.79 $\pm$ 0.03	0.32 $\pm$ 0.01	36950 $\pm$ 7876	11.03 $\pm$ 2.35
Gene	ELCS w/o N	0.76 $\pm$ 0.01	0.78 $\pm$ 0.01	0.79 $\pm$ 0.03	0.32 $\pm$ 0.01	38061 $\pm$ 8041	12.52 $\pm$ 2.62

5.3 Why ELCS is Efficient and Effective?

In this section, we analyse the reason why ELCS is efficient and effective from the following two aspects. First, we give a case study to evaluate the effectiveness of utilizing the N-structures to infer edge directions between a given variable and its children. Second, we evaluate the effectiveness of the proposed EMB subroutine, since the proposed ELCS algorithm relies on EMB.

5.3.1 Case Study

To illustrate the benefit of utilizing the N-structures, we do not use the N-structures to distinguish the children of a given variable in learning the MB of the variable, that is, in the DistinguishPC subroutine, we remove lines 8-12 in Algorithm 4, and we denote this version of ELCS as “ECLS w/o N”. Table VII summarizes the experimental results of ECLS and “ECLS w/o N” on the eight BNs with 5,000 data examples. From the experimental results, we note that ELCS achieves comparable performance against “ECLS w/o N” in terms of ArrP, ArrR, SHD and FDR on average but ELCS performs less CI tests. Specifically, on Alarm, for each variable, the number of CI tests is reduced by 53 on average. On Gene, for each variable, the number of CI tests is reduced by 1111 on average. We observe that there are significant differences between the number of CI tests of ELCS and “ECLS w/o N” on Insurance, Alarm10, Insurance10, Pigs and Gene, but there is a little difference between them on the other BNs. The reason may be that ELCS makes use of the N-structures to speed up children identification, and there are few N-structures existing in Alarm, Child and Child10, leading to that the number of CI tests of ELCS and “ECLS w/o N” on these three BNs are close to each other. The efficiency performance of ECLS will improve a little on the BNs with few N-structures, this is a limitation of ELCS. In summary, ELCS is more efficient and provides better local structure learning quality than “ECLS w/o N”, which indicates the effectiveness of leveraging the N-structures for local casual structure learning.

TABLE VIII: Comparison of EMB with State-of-the-art MB Learning Algorithms on Eight Benchmark BNs (size=5,000)

Network	Algorithm	Distance( $\downarrow$ )	F1( $\uparrow$ )	Precision( $\uparrow$ )	Recall( $\uparrow$ )	CI Tests( $\downarrow$ )	Time( $\downarrow$ )
Alarm	IAMB	0.15 $\pm$ 0.03	0.90 $\pm$ 0.02	0.94 $\pm$ 0.02	0.89 $\pm$ 0.01	142 $\pm$ 2	0.05 $\pm$ 0.00
	MMMB	0.10 $\pm$ 0.02	0.94 $\pm$ 0.02	0.92 $\pm$ 0.02	0.97 $\pm$ 0.01	604 $\pm$ 26	0.24 $\pm$ 0.01
	HITON-MB	0.06 $\pm$ 0.02	0.96 $\pm$ 0.01	0.97 $\pm$ 0.02	0.97 $\pm$ 0.01	394 $\pm$ 12	0.13 $\pm$ 0.00
	STMB	0.30 $\pm$ 0.02	0.78 $\pm$ 0.02	0.73 $\pm$ 0.02	0.96 $\pm$ 0.01	531 $\pm$ 15	0.19 $\pm$ 0.01
	BAMB	0.09 $\pm$ 0.03	0.94 $\pm$ 0.02	0.96 $\pm$ 0.03	0.95 $\pm$ 0.01	351 $\pm$ 11	0.14 $\pm$ 0.00
	EMB	0.06 $\pm$ 0.01	0.96 $\pm$ 0.01	0.99 $\pm$ 0.01	0.95 $\pm$ 0.01	318 $\pm$ 7	0.10 $\pm$ 0.00
	EMB-II	0.06 $\pm$ 0.01	0.96 $\pm$ 0.01	0.99 $\pm$ 0.01	0.95 $\pm$ 0.01	298 $\pm$ 5	0.09 $\pm$ 0.00
Insurance	IAMB	0.36 $\pm$ 0.02	0.76 $\pm$ 0.01	0.94 $\pm$ 0.02	0.67 $\pm$ 0.01	104 $\pm$ 2	0.04 $\pm$ 0.00
	MMMB	0.31 $\pm$ 0.02	0.79 $\pm$ 0.02	0.88 $\pm$ 0.03	0.76 $\pm$ 0.02	1186 $\pm$ 124	0.60 $\pm$ 0.07
	HITON-MB	0.33 $\pm$ 0.03	0.78 $\pm$ 0.02	0.88 $\pm$ 0.03	0.74 $\pm$ 0.02	679 $\pm$ 62	0.31 $\pm$ 0.04
	STMB	0.49 $\pm$ 0.03	0.65 $\pm$ 0.02	0.64 $\pm$ 0.04	0.77 $\pm$ 0.03	703 $\pm$ 47	0.33 $\pm$ 0.02
	BAMB	0.30 $\pm$ 0.02	0.80 $\pm$ 0.01	0.89 $\pm$ 0.03	0.77 $\pm$ 0.02	619 $\pm$ 39	0.33 $\pm$ 0.02
	EMB	0.31 $\pm$ 0.01	0.79 $\pm$ 0.01	0.92 $\pm$ 0.02	0.73 $\pm$ 0.01	370 $\pm$ 23	0.16 $\pm$ 0.01
	EMB-II	0.31 $\pm$ 0.01	0.79 $\pm$ 0.01	0.92 $\pm$ 0.02	0.73 $\pm$ 0.01	360 $\pm$ 20	0.15 $\pm$ 0.01
Child	IAMB	0.15 $\pm$ 0.02	0.90 $\pm$ 0.02	0.95 $\pm$ 0.03	0.88 $\pm$ 0.01	63 $\pm$ 1	0.03 $\pm$ 0.00
	MMMB	0.05 $\pm$ 0.02	0.97 $\pm$ 0.01	0.96 $\pm$ 0.02	0.99 $\pm$ 0.01	897 $\pm$ 25	0.47 $\pm$ 0.01
	HITON-MB	0.04 $\pm$ 0.03	0.98 $\pm$ 0.02	0.97 $\pm$ 0.03	0.99 $\pm$ 0.01	499 $\pm$ 16	0.24 $\pm$ 0.01
	STMB	0.17 $\pm$ 0.04	0.89 $\pm$ 0.03	0.84 $\pm$ 0.04	0.98 $\pm$ 0.02	374 $\pm$ 35	0.17 $\pm$ 0.02
	BAMB	0.09 $\pm$ 0.03	0.95 $\pm$ 0.02	0.93 $\pm$ 0.02	0.98 $\pm$ 0.02	376 $\pm$ 11	0.19 $\pm$ 0.01
	EMB	0.05 $\pm$ 0.02	0.97 $\pm$ 0.02	0.97 $\pm$ 0.02	0.98 $\pm$ 0.02	205 $\pm$ 8	0.09 $\pm$ 0.00
	EMB-II	0.05 $\pm$ 0.02	0.97 $\pm$ 0.02	0.97 $\pm$ 0.02	0.98 $\pm$ 0.02	204 $\pm$ 8	0.09 $\pm$ 0.00
Alarm10	IAMB	0.36 $\pm$ 0.01	0.75 $\pm$ 0.01	0.83 $\pm$ 0.01	0.74 $\pm$ 0.00	1637 $\pm$ 14	0.59 $\pm$ 0.01
	MMMB	0.26 $\pm$ 0.01	0.82 $\pm$ 0.01	0.88 $\pm$ 0.01	0.81 $\pm$ 0.00	1926 $\pm$ 45	0.60 $\pm$ 0.01
	HITON-MB	0.25 $\pm$ 0.01	0.84 $\pm$ 0.01	0.90 $\pm$ 0.01	0.82 $\pm$ 0.00	1714 $\pm$ 11	0.44 $\pm$ 0.00
	STMB	0.67 $\pm$ 0.01	0.48 $\pm$ 0.01	0.41 $\pm$ 0.01	0.83 $\pm$ 0.01	5049 $\pm$ 39	1.89 $\pm$ 0.02
	BAMB	0.30 $\pm$ 0.01	0.80 $\pm$ 0.00	0.83 $\pm$ 0.01	0.82 $\pm$ 0.00	1802 $\pm$ 12	0.57 $\pm$ 0.01
	EMB	0.25 $\pm$ 0.01	0.83 $\pm$ 0.01	0.91 $\pm$ 0.01	0.81 $\pm$ 0.00	1924 $\pm$ 7	0.50 $\pm$ 0.02
	EMB-II	0.25 $\pm$ 0.01	0.83 $\pm$ 0.01	0.91 $\pm$ 0.01	0.81 $\pm$ 0.00	1908 $\pm$ 7	0.49 $\pm$ 0.02
Insurance10	IAMB	0.42 $\pm$ 0.01	0.71 $\pm$ 0.01	0.89 $\pm$ 0.01	0.66 $\pm$ 0.00	1210 $\pm$ 8	0.50 $\pm$ 0.01
	MMMB	0.33 $\pm$ 0.01	0.78 $\pm$ 0.01	0.82 $\pm$ 0.01	0.80 $\pm$ 0.00	3274 $\pm$ 45	1.53 $\pm$ 0.03
	HITON-MB	0.32 $\pm$ 0.01	0.78 $\pm$ 0.01	0.84 $\pm$ 0.01	0.80 $\pm$ 0.00	2348 $\pm$ 18	0.93 $\pm$ 0.01
	STMB	0.77 $\pm$ 0.01	0.40 $\pm$ 0.01	0.30 $\pm$ 0.01	0.79 $\pm$ 0.00	6781 $\pm$ 118	3.36 $\pm$ 0.07
	BAMB	0.34 $\pm$ 0.01	0.77 $\pm$ 0.00	0.80 $\pm$ 0.01	0.80 $\pm$ 0.00	2541 $\pm$ 22	1.17 $\pm$ 0.01
	EMB	0.28 $\pm$ 0.01	0.81 $\pm$ 0.00	0.91 $\pm$ 0.01	0.78 $\pm$ 0.00	2189 $\pm$ 15	0.78 $\pm$ 0.01
	EMB-II	0.28 $\pm$ 0.01	0.81 $\pm$ 0.00	0.91 $\pm$ 0.01	0.78 $\pm$ 0.00	2122 $\pm$ 14	0.75 $\pm$ 0.01
Child10	IAMB	0.24 $\pm$ 0.01	0.84 $\pm$ 0.01	0.87 $\pm$ 0.01	0.88 $\pm$ 0.00	750 $\pm$ 10	0.31 $\pm$ 0.00
	MMMB	0.10 $\pm$ 0.01	0.94 $\pm$ 0.01	0.91 $\pm$ 0.01	0.99 $\pm$ 0.00	1622 $\pm$ 21	0.70 $\pm$ 0.01
	HITON-MB	0.08 $\pm$ 0.01	0.95 $\pm$ 0.01	0.92 $\pm$ 0.01	0.99 $\pm$ 0.00	1194 $\pm$ 11	0.43 $\pm$ 0.01
	STMB	0.56 $\pm$ 0.02	0.56 $\pm$ 0.02	0.45 $\pm$ 0.02	0.99 $\pm$ 0.00	2881 $\pm$ 24	1.26 $\pm$ 0.01
	BAMB	0.24 $\pm$ 0.01	0.84 $\pm$ 0.00	0.76 $\pm$ 0.01	0.99 $\pm$ 0.00	1111 $\pm$ 10	0.46 $\pm$ 0.01
	EMB	0.07 $\pm$ 0.01	0.95 $\pm$ 0.01	0.94 $\pm$ 0.01	0.99 $\pm$ 0.00	1062 $\pm$ 8	0.33 $\pm$ 0.00
	EMB-II	0.07 $\pm$ 0.01	0.95 $\pm$ 0.01	0.94 $\pm$ 0.01	0.99 $\pm$ 0.00	1061 $\pm$ 8	0.33 $\pm$ 0.00
Pigs	IAMB	0.42 $\pm$ 0.00	0.71 $\pm$ 0.00	0.62 $\pm$ 0.00	0.96 $\pm$ 0.00	2616 $\pm$ 7	1.28 $\pm$ 0.00
	MMMB	0.13 $\pm$ 0.00	0.92 $\pm$ 0.00	0.87 $\pm$ 0.00	1.00 $\pm$ 0.00	3.2e5 $\pm$ 2.2e4	2.2e2 $\pm$ 1.7e1
	HITON-MB	0.14 $\pm$ 0.00	0.92 $\pm$ 0.00	0.86 $\pm$ 0.00	1.00 $\pm$ 0.00	46956 $\pm$ 454	34.94 $\pm$ 0.33
	STMB	0.82 $\pm$ 0.01	0.26 $\pm$ 0.00	0.18 $\pm$ 0.01	1.00 $\pm$ 0.00	45770 $\pm$ 2746	29.20 $\pm$ 2.05
	BAMB	0.18 $\pm$ 0.01	0.89 $\pm$ 0.01	0.82 $\pm$ 0.01	1.00 $\pm$ 0.00	29097 $\pm$ 201	31.16 $\pm$ 0.13
	EMB	0.12 $\pm$ 0.00	0.93 $\pm$ 0.00	0.88 $\pm$ 0.00	1.00 $\pm$ 0.00	8784 $\pm$ 3405	5.73 $\pm$ 2.43
	EMB-II	0.12 $\pm$ 0.00	0.93 $\pm$ 0.00	0.88 $\pm$ 0.00	1.00 $\pm$ 0.00	7335 $\pm$ 209	4.94 $\pm$ 0.26
Gene	IAMB	0.32 $\pm$ 0.00	0.79 $\pm$ 0.00	0.76 $\pm$ 0.00	0.89 $\pm$ 0.00	3463 $\pm$ 10	1.36 $\pm$ 0.00
	MMMB	0.25 $\pm$ 0.00	0.83 $\pm$ 0.00	0.77 $\pm$ 0.00	0.94 $\pm$ 0.00	6035 $\pm$ 48	2.21 $\pm$ 0.02
	HITON-MB	0.25 $\pm$ 0.00	0.83 $\pm$ 0.00	0.77 $\pm$ 0.00	0.94 $\pm$ 0.00	4576 $\pm$ 22	1.44 $\pm$ 0.00
	STMB	0.88 $\pm$ 0.00	0.18 $\pm$ 0.00	0.13 $\pm$ 0.00	1.00 $\pm$ 0.00	17282 $\pm$ 268	7.70 $\pm$ 0.18
	BAMB	0.39 $\pm$ 0.00	0.73 $\pm$ 0.00	0.64 $\pm$ 0.00	0.94 $\pm$ 0.00	4474 $\pm$ 30	1.72 $\pm$ 0.02
	EMB	0.26 $\pm$ 0.00	0.82 $\pm$ 0.00	0.76 $\pm$ 0.00	0.94 $\pm$ 0.00	4486 $\pm$ 9	1.28 $\pm$ 0.00
	EMB-II	0.26 $\pm$ 0.00	0.82 $\pm$ 0.00	0.76 $\pm$ 0.00	0.94 $\pm$ 0.00	4412 $\pm$ 11	1.27 $\pm$ 0.00

TABLE IX: Comparison of EMB with State-of-the-art MB Learning Algorithms on Eight Benchmark BNs (size=1,000)

Network	Algorithm	Distance( $\downarrow$ )	F1( $\uparrow$ )	Precision( $\uparrow$ )	Recall( $\uparrow$ )	CI Tests( $\downarrow$ )	Time( $\downarrow$ )
Alarm	IAMB	0.27 $\pm$ 0.01	0.81 $\pm$ 0.01	0.93 $\pm$ 0.01	0.76 $\pm$ 0.01	120 $\pm$ 2	0.02 $\pm$ 0.00
	MMMB	0.20 $\pm$ 0.01	0.87 $\pm$ 0.01	0.91 $\pm$ 0.02	0.87 $\pm$ 0.01	437 $\pm$ 33	0.09 $\pm$ 0.00
	HITON-MB	0.16 $\pm$ 0.01	0.90 $\pm$ 0.01	0.95 $\pm$ 0.02	0.87 $\pm$ 0.01	315 $\pm$ 12	0.05 $\pm$ 0.00
	STMB	0.39 $\pm$ 0.03	0.72 $\pm$ 0.02	0.71 $\pm$ 0.02	0.85 $\pm$ 0.02	392 $\pm$ 12	0.06 $\pm$ 0.00
	BAMB	0.21 $\pm$ 0.01	0.86 $\pm$ 0.01	0.91 $\pm$ 0.02	0.86 $\pm$ 0.01	280 $\pm$ 16	0.04 $\pm$ 0.00
	EMB	0.18 $\pm$ 0.02	0.88 $\pm$ 0.01	0.96 $\pm$ 0.02	0.85 $\pm$ 0.02	297 $\pm$ 14	0.04 $\pm$ 0.00
	EMB-II	0.18 $\pm$ 0.02	0.88 $\pm$ 0.01	0.96 $\pm$ 0.02	0.85 $\pm$ 0.02	285 $\pm$ 13	0.04 $\pm$ 0.00
Insurance	IAMB	0.48 $\pm$ 0.02	0.66 $\pm$ 0.01	0.92 $\pm$ 0.03	0.56 $\pm$ 0.01	86 $\pm$ 2	0.01 $\pm$ 0.00
	MMMB	0.42 $\pm$ 0.03	0.71 $\pm$ 0.02	0.83 $\pm$ 0.03	0.66 $\pm$ 0.02	511 $\pm$ 47	0.12 $\pm$ 0.01
	HITON-MB	0.45 $\pm$ 0.03	0.69 $\pm$ 0.02	0.83 $\pm$ 0.04	0.65 $\pm$ 0.01	358 $\pm$ 34	0.06 $\pm$ 0.01
	STMB	0.59 $\pm$ 0.03	0.58 $\pm$ 0.03	0.58 $\pm$ 0.06	0.66 $\pm$ 0.04	1138 $\pm$ 1277	0.15 $\pm$ 0.16
	BAMB	0.45 $\pm$ 0.02	0.69 $\pm$ 0.02	0.76 $\pm$ 0.04	0.68 $\pm$ 0.01	404 $\pm$ 51	0.06 $\pm$ 0.01
	EMB	0.46 $\pm$ 0.03	0.68 $\pm$ 0.03	0.81 $\pm$ 0.08	0.64 $\pm$ 0.02	395 $\pm$ 82	0.06 $\pm$ 0.01
	EMB-II	0.46 $\pm$ 0.03	0.68 $\pm$ 0.03	0.81 $\pm$ 0.08	0.64 $\pm$ 0.02	371 $\pm$ 76	0.05 $\pm$ 0.01
Child	IAMB	0.27 $\pm$ 0.03	0.82 $\pm$ 0.02	0.94 $\pm$ 0.03	0.76 $\pm$ 0.02	54 $\pm$ 1	0.01 $\pm$ 0.00
	MMMB	0.22 $\pm$ 0.03	0.85 $\pm$ 0.02	0.89 $\pm$ 0.04	0.86 $\pm$ 0.02	823 $\pm$ 85	0.13 $\pm$ 0.01
	HITON-MB	0.20 $\pm$ 0.03	0.87 $\pm$ 0.02	0.90 $\pm$ 0.03	0.87 $\pm$ 0.02	469 $\pm$ 53	0.06 $\pm$ 0.01
	STMB	0.23 $\pm$ 0.07	0.85 $\pm$ 0.05	0.86 $\pm$ 0.05	0.87 $\pm$ 0.04	221 $\pm$ 7	0.04 $\pm$ 0.00
	BAMB	0.23 $\pm$ 0.04	0.85 $\pm$ 0.03	0.84 $\pm$ 0.04	0.91 $\pm$ 0.01	441 $\pm$ 58	0.05 $\pm$ 0.01
	EMB	0.17+0.03	0.89+0.03	0.94 $\pm$ 0.03	0.87 $\pm$ 0.02	320 $\pm$ 52	0.04 $\pm$ 0.01
	EMB-II	0.17+0.03	0.89+0.03	0.94 $\pm$ 0.03	0.87 $\pm$ 0.02	287 $\pm$ 39	0.04 $\pm$ 0.00
Alarm10	IAMB	0.52 $\pm$ 0.01	0.63 $\pm$ 0.01	0.78 $\pm$ 0.01	0.60 $\pm$ 0.01	1355 $\pm$ 11	0.18 $\pm$ 0.00
	MMMB	0.39 $\pm$ 0.01	0.73 $\pm$ 0.01	0.84 $\pm$ 0.01	0.70 $\pm$ 0.00	1579 $\pm$ 17	0.28 $\pm$ 0.00
	HITON-MB	0.37 $\pm$ 0.01	0.75 $\pm$ 0.01	0.86 $\pm$ 0.01	0.70 $\pm$ 0.01	1474 $\pm$ 13	0.20 $\pm$ 0.00
	STMB	0.76 $\pm$ 0.01	0.42 $\pm$ 0.01	0.37 $\pm$ 0.02	0.70 $\pm$ 0.01	3668 $\pm$ 29	0.76 $\pm$ 0.00
	BAMB	0.46 $\pm$ 0.00	0.68 $\pm$ 0.00	0.74 $\pm$ 0.01	0.70 $\pm$ 0.00	1551 $\pm$ 16	0.23 $\pm$ 0.00
	EMB	0.38 $\pm$ 0.01	0.74 $\pm$ 0.00	0.88 $\pm$ 0.01	0.69 $\pm$ 0.00	1765 $\pm$ 11	0.23 $\pm$ 0.00
	EMB-II	0.38 $\pm$ 0.01	0.74 $\pm$ 0.00	0.88 $\pm$ 0.01	0.69 $\pm$ 0.00	1756 $\pm$ 12	0.23 $\pm$ 0.00
Insurance10	IAMB	0.55 $\pm$ 0.01	0.61 $\pm$ 0.01	0.85 $\pm$ 0.01	0.53 $\pm$ 0.00	963 $\pm$ 5	0.13 $\pm$ 0.00
	MMMB	0.49 $\pm$ 0.01	0.66 $\pm$ 0.01	0.70 $\pm$ 0.01	0.70 $\pm$ 0.01	2180 $\pm$ 48	0.40 $\pm$ 0.01
	HITON-MB	0.45 $\pm$ 0.01	0.68 $\pm$ 0.01	0.74 $\pm$ 0.01	0.70 $\pm$ 0.01	1698 $\pm$ 28	0.25 $\pm$ 0.00
	STMB	0.75 $\pm$ 0.01	0.46 $\pm$ 0.01	0.33 $\pm$ 0.01	0.74 $\pm$ 0.01	1443 $\pm$ 14	0.21 $\pm$ 0.01
	BAMB	0.53 $\pm$ 0.01	0.62 $\pm$ 0.01	0.60 $\pm$ 0.01	0.74 $\pm$ 0.01	2243 $\pm$ 59	0.34 $\pm$ 0.01
	EMB	0.65 $\pm$ 0.06	0.53 $\pm$ 0.05	0.47 $\pm$ 0.07	0.71 $\pm$ 0.01	4333 $\pm$ 1736	0.57 $\pm$ 0.23
	EMB-II	0.65 $\pm$ 0.06	0.53 $\pm$ 0.05	0.47 $\pm$ 0.07	0.71 $\pm$ 0.01	3966 $\pm$ 1423	0.55 $\pm$ 0.23
Child10	IAMB	0.40 $\pm$ 0.01	0.72 $\pm$ 0.01	0.84 $\pm$ 0.01	0.71 $\pm$ 0.01	614 $\pm$ 8	0.08 $\pm$ 0.00
	MMMB	0.28 $\pm$ 0.02	0.81 $\pm$ 0.01	0.82 $\pm$ 0.02	0.86 $\pm$ 0.01	1757 $\pm$ 43	0.27 $\pm$ 0.00
	HITON-MB	0.25 $\pm$ 0.02	0.83 $\pm$ 0.01	0.84 $\pm$ 0.02	0.87 $\pm$ 0.01	1272 $\pm$ 24	0.17 $\pm$ 0.00
	STMB	0.66 $\pm$ 0.02	0.48 $\pm$ 0.02	0.39 $\pm$ 0.02	0.85 $\pm$ 0.01	2186 $\pm$ 41	0.39 $\pm$ 0.00
	BAMB	0.47 $\pm$ 0.01	0.67 $\pm$ 0.01	0.58 $\pm$ 0.01	0.90 $\pm$ 0.01	1460 $\pm$ 43	0.20 $\pm$ 0.01
	EMB	0.22 $\pm$ 0.02	0.85 $\pm$ 0.01	0.87 $\pm$ 0.02	0.88 $\pm$ 0.01	1225 $\pm$ 10	0.16 $\pm$ 0.00
	EMB-II	0.22 $\pm$ 0.02	0.85 $\pm$ 0.01	0.87 $\pm$ 0.02	0.88 $\pm$ 0.01	1182 $\pm$ 8	0.16 $\pm$ 0.00
Pigs	IAMB	0.34 $\pm$ 0.00	0.79 $\pm$ 0.00	0.82 $\pm$ 0.00	0.84 $\pm$ 0.00	1755 $\pm$ 1	0.22 $\pm$ 0.00
	MMMB	0.15 $\pm$ 0.01	0.91 $\pm$ 0.01	0.85 $\pm$ 0.01	1.00 $\pm$ 0.00	197884 $\pm$ 17879	8.44 $\pm$ 0.71
	HITON-MB	0.12 $\pm$ 0.01	0.92 $\pm$ 0.01	0.88 $\pm$ 0.01	1.00 $\pm$ 0.00	47028 $\pm$ 1667	4.78 $\pm$ 0.18
	STMB	0.85 $\pm$ 0.00	0.25 $\pm$ 0.00	0.15 $\pm$ 0.00	1.00 $\pm$ 0.00	25626 $\pm$ 3234	2.24 $\pm$ 0.09
	BAMB	0.31 $\pm$ 0.01	0.80 $\pm$ 0.01	0.69 $\pm$ 0.01	1.00 $\pm$ 0.00	40466 $\pm$ 4419	11.22 $\pm$ 1.55
	EMB	0.11 $\pm$ 0.01	0.93 $\pm$ 0.00	0.89 $\pm$ 0.01	1.00 $\pm$ 0.00	7541 $\pm$ 99	0.58 $\pm$ 0.01
	EMB-II	0.11 $\pm$ 0.01	0.93 $\pm$ 0.00	0.89 $\pm$ 0.01	1.00 $\pm$ 0.00	7466 $\pm$ 193	0.56 $\pm$ 0.01
Gene	IAMB	0.39 $\pm$ 0.00	0.73 $\pm$ 0.00	0.79 $\pm$ 0.00	0.79 $\pm$ 0.00	2887 $\pm$ 10	0.36 $\pm$ 0.00
	MMMB	0.28 $\pm$ 0.00	0.81 $\pm$ 0.00	0.75 $\pm$ 0.00	0.93 $\pm$ 0.00	4569 $\pm$ 36	0.75 $\pm$ 0.01
	HITON-MB	0.25 $\pm$ 0.01	0.83 $\pm$ 0.00	0.79 $\pm$ 0.00	0.93 $\pm$ 0.00	3918 $\pm$ 26	0.54 $\pm$ 0.01
	STMB	0.86 $\pm$ 0.00	0.21 $\pm$ 0.00	0.14 $\pm$ 0.00	0.99 $\pm$ 0.00	10672 $\pm$ 74	2.50 $\pm$ 0.01
	BAMB	0.46 $\pm$ 0.01	0.67 $\pm$ 0.00	0.57 $\pm$ 0.01	0.94 $\pm$ 0.00	3817 $\pm$ 52	0.61 $\pm$ 0.01
	EMB	0.25 $\pm$ 0.00	0.83 $\pm$ 0.00	0.78 $\pm$ 0.00	0.93 $\pm$ 0.00	4228 $\pm$ 18	0.57 $\pm$ 0.00
	EMB-II	0.25 $\pm$ 0.00	0.83 $\pm$ 0.00	0.78 $\pm$ 0.00	0.93 $\pm$ 0.00	4204 $\pm$ 17	0.57 $\pm$ 0.00

5.3.2 Experimental Results of EMB and Its Rivals

We evaluate the effectiveness of the proposed EMB by comparing it with five state-of-the-art MB learning algorithms, including BAMB [17], IAMB [21], MMMB [24], HITON-MB [25] and STMB [27].

For MB learning algorithms, we use precision, recall, F1, distance [26] [27], CI tests, and running time (in seconds) as the evaluation metrics.

•

Precision: The number of true positives in the output (i.e., the variables in the output belonging to the true MB of a target variable in a test DAG) divided by the number of variables in the output of an algorithm.
•

Recall: The number of true positives in the output divided by the number of true positives (the number of the true MB of a target variable in a test DAG).
•

F1 = 2 * Precision * Recall / (Precision + Recall): The F1 score is the harmonic average of the precision and recall, where F1 = 1 is the best case (perfect precision and recall) while F1 = 0 is the worst case.
•

Distance = $\sqrt{(1-Precision)^{2}+(1-Recall)^{2}}$ [26] [27], where distance = 0 is the best case (perfect precision and recall) while distance = $\sqrt{2}$ is the worst case.
•

Efficiency: The number of CI tests and the running time (in seconds) are used to measure the efficiency.

Tables VIII-IX report the experimental results of EMB and its rivals. From the experimental results, we have the following observations.

EMB versus IAMB, MMMB and HITON-MB. IAMB is much faster than EMB, while IAMB is significantly worse than EMB in terms of distance, F1, precision and recall on average. Compared with MMMB and HITON-MB, EMB is more efficient. EMB needs much less CI tests than MMMB and HITON-MB. In addition, using 5,000 data samples, EMB is 2 times faster than MMMB and 1.2 times faster than HITON-MB on average. Moreover, EMB is more accurate than MMMB. In particular, using 5,000 data samples, EMB achieves the lowest distance and the highest F1 values on Alarm, Insurance10, Child10 and Pigs. Using 1,000 data samples, EMB obtains the lowest distance and the highest F1 values on Child, Child10, Pigs and Gene. Overall, EMB is superior to IAMB, MMMB and HITON-MB.

EMB versus STMB and BAMB. From Tables VIII-IX, we note that STMB achieves higher recall values than EMB, but on the distance, F1 and precision metrics, STMB is significantly worse than EMB. Compared with BAMB, EMB achieves lower distance and higher F1 values. Additionally, the number of CI tests of EMB is less than STMB and BAMB. More specifically, using 5,000 data samples, EMB is 3.5 times faster than STMB and 1.5 times faster than BAMB on average. In a word, EMB performs better than BAMB and STMB in both efficiency and accuracy.

EMB versus EMB-II. EMB is inferior to EMB-II. Compared with EMB, EMB-II uses less CI tests for MB learning while achieving the same performance as measured by the distance, F1, precision and recall matrics, which indicates the efficiency of EMB-II.

EMB is able to effectively find the MB of a target variable, and simultaneously distinguish parents and children of the target variable. We note that EMB uses less CI tests for MB learning, which can reduce the impact of unreliable CI tests. In summary, EMB is helpful to learn the local causal structure.

To further demonstrate the effectiveness of EMB, we propose three variants of ELCS, which are referred to as ELCS-M, ECLS-S, ELCS-B, respectively. ELCS-M uses MMMB to replace EMB in ELCS. ECLS-S and ELCS-B use STMB and BAMB to replace EMB in ELCS, respectively. Table X reports the experimental results of ECLS and its three variants on the eight BNs with 5,000 data examples. From the table, we observe that ELCS outperforms these three rivals in terms of both CI tests and running time, which implies the efficiency of ELCS. We also note that ELCS achieves better ArrP, ArrR, SHD and FDR values than that of these three rivals, which shows the effectiveness of ELCS.

EMB has achieved encouraging performance, but it still suffers from the following two drawbacks. First, to improve the efficiency of EMB while maintaining competitive performance, EMB chooses to remove non-spouses within U $\setminus$ {T} $\setminus$ PC ${}_{\emph{T}}$ of the target variable T as early as possible at lines 2-16 in Algorithm 3. The size of the conditioning sets Temp (line 9 in Algorithm 3) and {Y} $\cup$ Sep ${}_{\emph{T}}$ {X} (line 11 in Algorithm 3) may be large, when the size of data samples is finite, the results of CI tests may be unreliable, leading to poor performance of EMB. Second, the performance of EMB is limited by HITON-PC that is used for PC learning. If HITON-PC has a lower quality of PC learning, inaccurate MBs will be learnt by EMB.

TABLE X: Comparison of ELCS with ELCS-M, ECLS-S and ELCS-B on Eight Benchmark BNs (size=5,000)

Network	Algorithm	ArrP( $\uparrow$ )	ArrR( $\uparrow$ )	SHD( $\downarrow$ )	FDR( $\downarrow$ )	CI Tests( $\downarrow$ )	Time( $\downarrow$ )
Alarm	ELCS-M	0.71 $\pm$ 0.02	0.59 $\pm$ 0.03	1.14 $\pm$ 0.06	0.26 $\pm$ 0.03	1627 $\pm$ 186	0.62 $\pm$ 0.08
	ELCS-S	0.79 $\pm$ 0.03	0.72 $\pm$ 0.05	0.88 $\pm$ 0.14	0.13 $\pm$ 0.04	1402 $\pm$ 78	0.57 $\pm$ 0.03
	ELCS-B	0.78 $\pm$ 0.03	0.66 $\pm$ 0.05	0.92 $\pm$ 0.09	0.26 $\pm$ 0.03	778 $\pm$ 46	0.30 $\pm$ 0.02
	ELCS	0.86 $\pm$ 0.01	0.81 $\pm$ 0.01	0.44 $\pm$ 0.06	0.07 $\pm$ 0.02	648 $\pm$ 55	0.20 $\pm$ 0.02
	ELCS-II	0.86 $\pm$ 0.01	0.81 $\pm$ 0.01	0.44 $\pm$ 0.06	0.07 $\pm$ 0.02	607 $\pm$ 52	0.19 $\pm$ 0.02
Insurance	ELCS-M	0.76 $\pm$ 0.03	0.59 $\pm$ 0.03	1.87 $\pm$ 0.15	0.31 $\pm$ 0.04	6976 $\pm$ 1183	3.28 $\pm$ 0.62
	ELCS-S	0.67 $\pm$ 0.05	0.50 $\pm$ 0.05	2.50 $\pm$ 0.25	0.37 $\pm$ 0.04	2653 $\pm$ 476	1.39 $\pm$ 0.25
	ELCS-B	0.67 $\pm$ 0.02	0.44 $\pm$ 0.02	2.26 $\pm$ 0.10	0.41 $\pm$ 0.01	3182 $\pm$ 447	1.78 $\pm$ 0.27
	ELCS	0.85 $\pm$ 0.04	0.69 $\pm$ 0.04	1.61 $\pm$ 0.06	0.18 $\pm$ 0.05	1686 $\pm$ 276	0.75 $\pm$ 0.12
	ELCS-II	0.85 $\pm$ 0.04	0.69 $\pm$ 0.04	1.61 $\pm$ 0.06	0.18 $\pm$ 0.05	1637 $\pm$ 275	0.75 $\pm$ 0.12
Child	ELCS-M	0.68 $\pm$ 0.11	0.56 $\pm$ 0.12	1.18 $\pm$ 0.31	0.06 $\pm$ 0.04	8897 $\pm$ 1247	4.43 $\pm$ 0.62
	ELCS-S	0.81 $\pm$ 0.07	0.72 $\pm$ 0.09	0.75 $\pm$ 0.18	0.18 $\pm$ 0.07	2451 $\pm$ 516	1.28 $\pm$ 0.28
	ELCS-B	0.75 $\pm$ 0.03	0.65 $\pm$ 0.05	0.93 $\pm$ 0.14	0.26 $\pm$ 0.05	2252 $\pm$ 195	1.16 $\pm$ 0.10
	ELCS	0.71 $\pm$ 0.12	0.61 $\pm$ 0.16	1.08 $\pm$ 0.36	0.09 $\pm$ 0.08	2093 $\pm$ 287	0.93 $\pm$ 0.10
	ELCS-II	0.71 $\pm$ 0.12	0.61 $\pm$ 0.16	1.08 $\pm$ 0.36	0.09 $\pm$ 0.08	2087 $\pm$ 287	0.93 $\pm$ 0.10
Alarm10	ELCS-M	0.77 $\pm$ 0.01	0.59 $\pm$ 0.02	1.60 $\pm$ 0.07	0.19 $\pm$ 0.02	10570 $\pm$ 1207	3.17 $\pm$ 0.39
	ELCS-S	0.69 $\pm$ 0.01	0.46 $\pm$ 0.01	1.65 $\pm$ 0.03	0.32 $\pm$ 0.01	8429 $\pm$ 237	3.66 $\pm$ 0.15
	ELCS-B	0.68 $\pm$ 0.01	0.44 $\pm$ 0.00	1.85 $\pm$ 0.02	0.46 $\pm$ 0.01	7223 $\pm$ 170	2.09 $\pm$ 0.03
	ELCS	0.83 $\pm$ 0.01	0.68 $\pm$ 0.02	1.26 $\pm$ 0.07	0.14 $\pm$ 0.02	6893 $\pm$ 483	1.77 $\pm$ 0.12
	ELCS-II	0.83 $\pm$ 0.01	0.68 $\pm$ 0.02	1.26 $\pm$ 0.07	0.14 $\pm$ 0.02	6916 $\pm$ 480	1.77 $\pm$ 0.12
Insurance10	ELCS-M	0.75 $\pm$ 0.01	0.61 $\pm$ 0.01	1.85 $\pm$ 0.04	0.33 $\pm$ 0.01	14461 $\pm$ 2927	6.35 $\pm$ 1.27
	ELCS-S	0.60 $\pm$ 0.01	0.38 $\pm$ 0.01	2.44 $\pm$ 0.02	0.51 $\pm$ 0.01	10338 $\pm$ 533	5.78 $\pm$ 0.55
	ELCS-B	0.64 $\pm$ 0.01	0.42 $\pm$ 0.01	2.31 $\pm$ 0.06	0.47 $\pm$ 0.01	7556 $\pm$ 224	3.55 $\pm$ 0.11
	ELCS	0.80 $\pm$ 0.02	0.67 $\pm$ 0.02	1.75 $\pm$ 0.11	0.23 $\pm$ 0.01	10809 $\pm$ 1528	3.92 $\pm$ 0.55
	ELCS-II	0.80 $\pm$ 0.02	0.67 $\pm$ 0.02	1.75 $\pm$ 0.11	0.23 $\pm$ 0.01	10605 $\pm$ 1499	3.91 $\pm$ 0.55
Child10	ELCS-M	0.80 $\pm$ 0.02	0.75 $\pm$ 0.03	0.75 $\pm$ 0.08	0.16 $\pm$ 0.02	13438 $\pm$ 1388	5.80 $\pm$ 0.58
	ELCS-S	0.66 $\pm$ 0.01	0.48 $\pm$ 0.02	1.17 $\pm$ 0.03	0.45 $\pm$ 0.02	15249 $\pm$ 2335	6.86 $\pm$ 0.70
	ELCS-B	0.70 $\pm$ 0.01	0.52 $\pm$ 0.02	1.05 $\pm$ 0.05	0.47 $\pm$ 0.02	13880 $\pm$ 1187	4.28 $\pm$ 0.05
	ELCS	0.83 $\pm$ 0.05	0.76 $\pm$ 0.07	0.73 $\pm$ 0.20	0.14 $\pm$ 0.03	13129 $\pm$ 2613	3.98 $\pm$ 0.79
	ELCS-II	0.83 $\pm$ 0.05	0.76 $\pm$ 0.07	0.73 $\pm$ 0.20	0.14 $\pm$ 0.03	13104 $\pm$ 2605	3.98 $\pm$ 0.79
Pigs	ELCS-M	-	-	-	-	-	-
	ELCS-S	-	-	-	-	-	-
	ELCS-B	-	-	-	-	-	-
	ELCS	0.91 $\pm$ 0.00	0.99 $\pm$ 0.00	0.42 $\pm$ 0.02	0.15 $\pm$ 0.01	13374 $\pm$ 8660	8.91 $\pm$ 6.84
	ELCS-II	0.91 $\pm$ 0.00	0.99 $\pm$ 0.00	0.42 $\pm$ 0.02	0.15 $\pm$ 0.01	11467 $\pm$ 5659	8.38 $\pm$ 5.12
Gene	ELCS-M	-	-	-	-	-	-
	ELCS-S	-	-	-	-	-	-
	ELCS-B	-	-	-	-	-	-
	ELCS	0.76 $\pm$ 0.01	0.79 $\pm$ 0.01	0.79 $\pm$ 0.03	0.32 $\pm$ 0.01	36950 $\pm$ 7876	11.03 $\pm$ 2.35
	ELCS-II	0.76 $\pm$ 0.01	0.79 $\pm$ 0.01	0.79 $\pm$ 0.03	0.32 $\pm$ 0.01	36051 $\pm$ 7696	11.02 $\pm$ 2.08

6 Conclusion

A new local causal structure learning algorithm (ELCS) has been proposed in this paper, which reduces the search space in distinguishing parents from children of a target variable of interest. Specifically, ELCS makes use of the N-structures to distinguish parents from children of the target variable during learning the MB of the target variable. Furthermore, to combine MB learning with the N-structures to infer edge directions between the target variable and its PC, we design an effective MB discovery subroutine (EMB). We theoretically analyze the correctness of ELCS. Extensive experimental results on benchmark BNs indicate that ELCS not only improves the efficiency for learning the local causal structure, but also achieves better performance in accuracy. In future, we plan to extend the ELCS algorithm for global causal structures learning and robust machine learning.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China (under Grant 2020AAA0106100), the National Natural Science Foundation of China (under Grant 61876206), and Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province (under Grant CICIP2020003).

References

[1] K. Yu, L. Liu, and J. Li, “A unified view of causal and non-causal feature selection,” ACM Transactions on Knowledge Discovery from Data (TKDD), in press, 2020.
[2] R. Cai, J. Qiao, K. Zhang, Z. Zhang, and Z. Hao, “Causal discovery with cascade nonlinear additive noise model,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI, Macao, China, 10-16 August, 2019, pp. 1609–1615.
[3] B. Huang, K. Zhang, M. Gong, and C. Glymour, “Causal discovery and forecasting in nonstationary environments with state-space models,” in Proceedings of the 36th International Conference on Machine Learning, ICML, 9-15 June, Long Beach, California, USA, 2019, pp. 2901–2910.
[4] A. Marx and J. Vreeken, “Identifiability of cause and effect using regularized regression,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, 4-8 August, 2019, pp. 852–861.
[5] K. Yu, L. Liu, J. Li, W. Ding, and T. D. Le, “Multi-source causal feature selection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 9, pp. 2240–2256, 2020.
[6] B. Gourévitch, R. L. Bouquin-Jeannès, and G. Faucon, “Linear and nonlinear causality between signals: methods, examples and neurophysiological applications,” Biol. Cybern., vol. 95, no. 4, pp. 349–369, 2006.
[7] J. Choi, R. S. Chapkin, and Y. Ni, “Bayesian causal structural learning with zero-inflated poisson bayesian networks,” in Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems, December 6-12, virtual, 2020.
[8] M. H. Maathuis, M. Kalisch, P. Bühlmann et al., “Estimating high-dimensional intervention effects from observational data,” The Annals of Statistics, vol. 37, no. 6A, pp. 3133–3164, 2009.
[9] J. Pearl, Causality. Cambridge university press, 2009.
[10] A. Marx and J. Vreeken, “Testing conditional independence on discrete data using stochastic complexity,” in The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS, 16-18 April, Naha, Okinawa, Japan, 2019, pp. 496–505.
[11] I. Tsamardinos, L. E. Brown, and C. F. Aliferis, “The max-min hill-climbing bayesian network structure learning algorithm,” Machine Learning, vol. 65, no. 1, pp. 31–78, 2006.
[12] X. Zheng, B. Aragam, P. Ravikumar, and E. P. Xing, “Dags with NO TEARS: continuous optimization for structure learning,” in Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December, Montréal, Canada., 2018, pp. 9492–9503.
[13] Y. Yu, J. Chen, T. Gao, and M. Yu, “DAG-GNN: DAG structure learning with graph neural networks,” in Proceedings of the 36th International Conference on Machine Learning, ICML, 9-15 June, Long Beach, California, USA, 2019, pp. 7154–7163.
[14] J. Yin, Y. Zhou, C. Wang, P. He, C. Zheng, and Z. Geng, “Partial orientation and local structural learning of causal networks for prediction,” in Causation and Prediction Challenge at WCCI, Hong Kong, June 1-6, 2008, pp. 93–105.
[15] T. Gao and Q. Ji, “Local causal discovery of direct causes and effects,” in Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, 7-12 December, Montreal, Quebec, Canada, 2015, pp. 2512–2520.
[16] X. Wu, B. Jiang, K. Yu, C. Miao, and H. Chen, “Accurate markov boundary discovery for causal feature selection,” IEEE Trans. Cybern., vol. 50, no. 12, pp. 4983–4996, 2020.
[17] Z. Ling, K. Yu, H. Wang, L. Liu, W. Ding, and X. Wu, “BAMB: A balanced markov blanket discovery approach to feature selection,” ACM Transactions on Intelligent Systems and Technology, vol. 10, no. 5, pp. 52:1–52:25, 2019.
[18] T. Niinimaki and P. Parviainen, “Local structure discovery in bayesian networks,” in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, 14-18 August, 2012, pp. 634–643.
[19] T. Gao and Q. Ji, “Efficient score-based markov blanket discovery,” Int. J. Approx. Reason., vol. 80, pp. 277–293, 2017.
[20] D. Margaritis and S. Thrun, “Bayesian network induction via local neighborhoods,” in Advances in Neural Information Processing Systems 12, [NIPS Conference, Denver, Colorado, USA, November 29 - December 4, 1999], S. A. Solla, T. K. Leen, and K. Müller, Eds., 1999, pp. 505–511.
[21] I. Tsamardinos and C. F. Aliferis, “Towards principled feature selection: Relevancy, filters and wrappers,” in Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, AISTATS 2003, Key West, Florida, USA, January 3-6, 2003, C. M. Bishop and B. J. Frey, Eds., 2003.
[22] I. Tsamardinos, C. F. Aliferis, A. R. Statnikov, and E. Statnikov, “Algorithms for large scale markov blanket discovery.” in FLAIRS conference, vol. 2, 2003, pp. 376–380.
[23] S. Yaramakala and D. Margaritis, “Speculative markov blanket discovery for optimal feature selection,” in Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 27-30 November 2005, Houston, Texas, USA, 2005, pp. 809–812.
[24] I. Tsamardinos, C. F. Aliferis, and A. R. Statnikov, “Time and sample efficient discovery of markov blankets and direct causal relations,” in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24-27 August, 2003, pp. 673–678.
[25] C. F. Aliferis, I. Tsamardinos, and A. R. Statnikov, “HITON: A novel markov blanket algorithm for optimal variable selection,” in AMIA, American Medical Informatics Association Annual Symposium, Washington, DC, USA, November 8-12, 2003.
[26] J. M. Peña, R. Nilsson, J. Björkegren, and J. Tegnér, “Towards scalable and data efficient learning of markov boundaries,” Int. J. Approx. Reason., vol. 45, no. 2, pp. 211–232, 2007.
[27] T. Gao and Q. Ji, “Efficient markov blanket discovery and its application,” IEEE Trans. Cybernetics, vol. 47, no. 5, pp. 1169–1179, 2017.
[28] K. Yu, X. Guo, L. Liu, J. Li, H. Wang, Z. Ling, and X. Wu, “Causality-based feature selection: Methods and evaluations,” ACM Computing Surveys (CSUR), vol. 53, no. 5, pp. 1–36, 2020.
[29] D. Margaritis and S. Thrun, “Bayesian network induction via local neighborhoods,” in Advances in Neural Information Processing Systems 12, [NIPS Conference, Denver, Colorado, USA, November 29-December 4], 1999, pp. 505–511.
[30] T. Gao, K. P. Fadnis, and M. Campbell, “Local-to-global bayesian network structure learning,” in Proceedings of the 34th International Conference on Machine Learning, ICML, Sydney, NSW, Australia, 6-11 August, 2017, pp. 1193–1202.
[31] T. Gao and D. Wei, “Parallel bayesian network structure learning,” in Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10-15 July, 2018, 2018, pp. 1671–1680.
[32] S. Zhu, I. Ng, and Z. Chen, “Causal discovery with reinforcement learning,” in 8th International Conference on Learning Representations, ICLR 2020, April 26-30 Addis Ababa, Ethiopia, 2020.
[33] M. Zhang, S. Jiang, Z. Cui, R. Garnett, and Y. Chen, “D-VAE: A variational autoencoder for directed acyclic graphs,” in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, 2019, pp. 1586–1598.
[34] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann series in representation and reasoning, 1988.
[35] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search. MIT Press, 2000.
[36] C. Meek, “Causal inference and causal explanation with background knowledge,” Proc. Conf. on Uncertainty in Artificial Intelligence (UAI-95), pp. 403–410, 1995.
[37] C. F. Aliferis, I. Tsamardinos, A. R. Statnikov, and L. E. Brown, “Causal explorer: A causal probabilistic network learning toolkit for biomedical discovery,” in Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Scienes, METMBS, June 23-26, Las Vegas, Nevada, USA, 2003, pp. 371–376.