

Generic Identifiability of Linear Structural Equation Models by Ancestor Decomposition

Mathias Drton Department of Statistics, University of Washington, Seattle, WA, U.S.A. md5@uw.edu  and  Luca Weihs Department of Statistics, University of Washington, Seattle, WA, U.S.A. lucaw@uw.edu
(Date: September 25, 2025)
Abstract.

Linear structural equation models, which relate random variables via linear interdependencies and Gaussian noise, are a popular tool for modeling multivariate joint distributions. These models correspond to mixed graphs that include both directed and bidirected edges, representing the linear relationships and the correlations between noise terms, respectively. A question of interest for these models is that of parameter identifiability: whether or not it is possible to recover edge coefficients from the joint covariance matrix of the random variables. For the problem of determining generic parameter identifiability, we present an algorithm that extends the half-trek criterion algorithm from prior work of Foygel, Draisma, and Drton (2012). The main idea underlying our new algorithm is the use of ancestral subsets of vertices in the graph when applying a decomposition idea of Tian (2005).

Key words and phrases:
Half-trek criterion, structural equation models, identifiability, generic identifiability

1. Introduction

It is often useful to model the joint distribution of a random vector X=(X_{1},\dots,X_{n})^{T} in terms of a collection of noisy linear interdependencies. In particular, we may postulate that each X_{w} is a linear function of X_{1},\dots,X_{w-1},X_{w+1},\dots,X_{n} and a stochastic noise term \epsilon_{w}. Models of this type are called linear structural equation models and can be compactly expressed in matrix form as

(1.1)   X=\lambda_{0}+\Lambda^{T}X+\epsilon

where \Lambda=(\lambda_{vw}) is an n\times n matrix, \lambda_{0}=(\lambda_{01},\dots,\lambda_{0n})^{T}\in\mathbb{R}^{n}, and \epsilon=(\epsilon_{1},\dots,\epsilon_{n})^{T} is a random vector of error terms. We will adopt the classical assumption that \epsilon has a non-degenerate multivariate normal distribution with mean 0 and covariance matrix \Omega=(\omega_{vw}). With this assumption it follows immediately that X has a multivariate normal distribution with mean (I-\Lambda)^{-T}\lambda_{0} and covariance matrix

(1.2)   \Sigma=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1}

where I is the n\times n identity matrix. We refer the reader to the book by Bollen (1989) for background on these types of models.
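
To make (1.2) concrete, the following R snippet computes \Sigma from a choice of (\Lambda,\Omega). It is a minimal sketch using a hypothetical 3-node graph with edges 1\to 2, 2\to 3, and 1\leftrightarrow 3; all numeric values are arbitrary.

n <- 3
Lambda <- matrix(0, n, n)
Lambda[1, 2] <- 0.7    # coefficient on the directed edge 1 -> 2
Lambda[2, 3] <- -0.4   # coefficient on the directed edge 2 -> 3

Omega <- diag(c(1, 1.5, 2))
Omega[1, 3] <- Omega[3, 1] <- 0.3   # noise covariance for the bidirected edge 1 <-> 3

Sigma <- solve(t(diag(n) - Lambda)) %*% Omega %*% solve(diag(n) - Lambda)
Sigma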

We obtain a collection of interesting models by imposing different patterns of zeros among the coefficients in \Lambda and \Omega. These models can then be naturally associated with mixed graphs containing both directed and bidirected edges. In particular, the graph will contain the directed edge v\to w when \lambda_{vw} is not required to be zero and, similarly, will include the bidirected edge v\leftrightarrow w when \omega_{vw} is potentially non-zero. Representations of this type are often called path diagrams and were first advocated in Wright (1921, 1934).

A natural question arising in the study of linear structural equation models is that of identifiability: whether or not it is possible to uniquely recover the two parameter matrices \Lambda and \Omega from the covariance matrix \Sigma they define via (1.2). The most stringent version, known as global identifiability, amounts to unique recovery of every pair (\Lambda,\Omega) from the covariance matrix \Sigma. This global property can be characterized efficiently (Drton et al., 2011). Often, however, a less stringent notion that we term generic identifiability is of interest. This property requires only that a generic (or randomly chosen) pair (\Lambda,\Omega) can be recovered from its covariance matrix. The computational complexity of deciding whether a given mixed graph G defines a generically identifiable linear structural equation model is unknown. There are, however, a number of graphical criteria that are sufficient for generic identifiability and can be checked in time polynomial in the number of considered variables (or vertices of the graph). To our knowledge, the most widely applicable such criterion is the Half-Trek Criterion (HTC) of Foygel, Draisma, and Drton (2012a), which built on earlier work of Brito and Pearl (2006). The HTC also comes with a necessary condition for generic identifiability, but in this paper our focus is on the sufficient condition. We remark that an extension of the HTC for identification of subsets of edge coefficients is given in Chen et al. (2014).

In Section 2, we give a brief review of background, such as the formal connection between structural equation models and mixed graphs, and of prior work. In Section 3, which contains our main results, we demonstrate a simple method by which to infer generic identifiability of certain entries of (\Lambda,\Omega) by examining subgraphs of a given mixed graph G that are induced by ancestral subsets of vertices. This extends the applicability of the HTC for acyclic mixed graphs when combined with the decomposition techniques of Tian (2005), and we leverage the extension in an efficient algorithmic form. In Section 4, we report on computational experiments demonstrating the applicability of our findings. A brief conclusion is given in Section 5.

2. Preliminaries

We assume that the reader is familiar with graphical representations of structural equation models and thus only provide a quick review of these topics. For a more in-depth treatment see, for instance, Pearl (2009) or Foygel et al. (2012a).

2.1. Mixed Graphs

For any n\geq 1, let [n]:=\{1,\dots,n\}. We define a mixed graph to be a triple G=(V,D,B) where V=[n] is a finite set of vertices and D,B\subset V\times V. The sets D and B correspond to the directed and the bidirected edges, respectively. When (v,w)\in D, we will write v\to w\in G, and if (v,w)\in B then we will write v\leftrightarrow w\in G. Since edges in B are bidirected, the set B is symmetric, that is, we have (v,w)\in B\iff(w,v)\in B. We require that both the directed part (V,D) and the bidirected part (V,B) contain no self-loops, so that (v,v)\not\in D\cup B for all v\in V. If the directed graph (V,D) does not contain any cycles, that is, there are no vertices v,w_{1},\dots,w_{m}\in V such that v\to w_{1},\ w_{1}\to w_{2},\ \dots,\ w_{m}\to v\in G, then we say that G is acyclic; note, in particular, that G being acyclic does not prevent the bidirected part (V,B) from containing (undirected) cycles.
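
As a computational aside, a mixed graph can be encoded by two adjacency matrices, and acyclicity of the directed part can be checked by repeatedly deleting source nodes. The R sketch below illustrates one possible encoding on a hypothetical example graph; it is not taken from the SEMID package.

n <- 4
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 2] <- D[2, 3] <- D[2, 4] <- 1   # directed edges 1 -> 2, 2 -> 3, 2 -> 4
B[3, 4] <- B[4, 3] <- 1              # bidirected edge 3 <-> 4 (B kept symmetric)

is_acyclic <- function(D) {
  remaining <- rep(TRUE, nrow(D))
  repeat {
    # source nodes: remaining vertices with no remaining parent
    src <- remaining & colSums(D[remaining, , drop = FALSE]) == 0
    if (!any(src)) break
    remaining[src] <- FALSE
  }
  !any(remaining)   # acyclic iff every vertex was eventually deleted
}
is_acyclic(D)   # TRUE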

A path from v to w is any sequence of edges from D or B beginning at v and ending at w; the edges need not obey direction, and loops are allowed. A directed path from v to w is then any path from v to w all of whose edges are directed and point in the same direction, away from v and towards w. Finally, a trek \pi from a source v to a target w is any path that has no colliding arrowheads, that is, \pi must be of the form

v^{L}_{l}\leftarrow v^{L}_{l-1}\leftarrow\dots\leftarrow v^{L}_{0}\leftrightarrow v^{R}_{0}\to v^{R}_{1}\to\dots\to v^{R}_{r-1}\to v^{R}_{r}

or

v^{L}_{l}\leftarrow v^{L}_{l-1}\leftarrow\dots\leftarrow v^{L}_{1}\leftarrow v^{T}\to v^{R}_{1}\to\dots\to v^{R}_{r-1}\to v^{R}_{r}

where v^{L}_{l}=v, v^{R}_{r}=w, and we call v^{T} the top node. If \pi is as in the first case then we let \text{Left}(\pi)=\{v^{L}_{0},\dots,v^{L}_{l}\} and \text{Right}(\pi)=\{v^{R}_{0},\dots,v^{R}_{r}\}; if \pi is as in the second case then we let \text{Left}(\pi)=\{v^{T},v^{L}_{1},\dots,v^{L}_{l}\} and \text{Right}(\pi)=\{v^{T},v^{R}_{1},\dots,v^{R}_{r}\}. Note that, in the second case, v^{T} is included in both \text{Left}(\pi) and \text{Right}(\pi). A trek \pi is called a half-trek if |\text{Left}(\pi)|=1, so that \pi is of the form

v^{L}_{0}\leftrightarrow v^{R}_{0}\to v^{R}_{1}\to\dots\to v^{R}_{r-1}\to v^{R}_{r}

or

v^{T}\to v^{R}_{1}\to\dots\to v^{R}_{r-1}\to v^{R}_{r}.

It will be useful to reference the local neighborhood structure of the graph. For this purpose, for all v\in V, we define the two sets

(2.1)   pa(v)=\{w\in V:w\to v\in G\},
(2.2)   sib(v)=\{w\in V:w\leftrightarrow v\in G\}.

The former comprises the parents of v, and the latter contains the siblings of v.
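
With the adjacency-matrix encoding used earlier, the sets in (2.1) and (2.2) are one-liners in R; the example edges below are hypothetical.

n <- 4
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 3] <- D[2, 3] <- 1              # 1 -> 3, 2 -> 3
B[3, 4] <- B[4, 3] <- 1              # 3 <-> 4

pa  <- function(v, D) which(D[, v] == 1)   # parents: all w with w -> v
sib <- function(v, B) which(B[, v] == 1)   # siblings: all w with w <-> v

pa(3, D)    # 1 2
sib(3, B)   # 4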

We associate a mixed graph G to a linear structural equation model as follows. Let \mathbb{R}^{D} be the set of real n\times n matrices \Lambda=(\lambda_{vw}) with support D, i.e., \lambda_{vw}\not=0\implies(v,w)\in D. Let PD_{n} be the cone of n\times n positive definite matrices \Omega=(\omega_{vw}). Define PD(B)\subset PD_{n} to be the subset of positive definite matrices with support B, i.e., for v\not=w, \omega_{vw}\not=0\implies v\leftrightarrow w\in G.

In this paper, we focus on acyclic graphs G. If G is acyclic then the matrix I-\Lambda is invertible for all \Lambda\in\mathbb{R}^{D}. In other words, the equation system from (1.1) can always be solved uniquely for X. We are led to the following definition.

Definition 2.1.

The linear structural equation model given by an acyclic mixed graph G=(V,D,B) with V=[n] is the collection of all n-dimensional normal distributions with covariance matrix

\Sigma=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1}

for a choice of \Lambda\in\mathbb{R}^{D} and \Omega\in PD(B).

2.2. Prior Work and the HTC

For a fixed acyclic mixed graph G, let \Theta:=\mathbb{R}^{D}\times PD(B) be the parameter space and \phi_{G}:\Theta\to PD_{n} be the map

(2.3)   \phi_{G}:(\Lambda,\Omega)\mapsto(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1}.

Then the question of identifiability is equivalent to asking whether the fiber

\mathcal{F}(\Lambda,\Omega):=\phi_{G}^{-1}(\{\phi_{G}(\Lambda,\Omega)\})

equals the singleton \{(\Lambda,\Omega)\}. We note that the above notions are well-defined also when G is not acyclic, but, in that case, \mathbb{R}^{D} should be restricted to contain only matrices \Lambda with I-\Lambda invertible.

When \mathcal{F}(\Lambda,\Omega)=\{(\Lambda,\Omega)\} for all (\Lambda,\Omega)\in\Theta, so that \phi_{G} is injective on \Theta, the graph G is said to be globally identifiable. Global identifiability is, however, often too strong a condition. So-called instrumental variable problems, for instance, give rise to graphs G that are not globally identifiable but for which the set of (\Lambda,\Omega) on which identifiability fails has measure zero; see the example in the introduction of Foygel et al. (2012a). Instead, we will be concerned with the question of generic identifiability.

Definition 2.2.

A mixed graph G is said to be generically identifiable if there exists a proper algebraic subset A\subset\Theta such that \mathcal{F}(\Lambda,\Omega)=\{(\Lambda,\Omega)\} for all (\Lambda,\Omega)\in\Theta\setminus A.

Here, as usual, an algebraic set is defined as the zero-set of a collection of polynomials. We again refer the reader to the introduction of Foygel et al. (2012a) for an in-depth exposition on why generic identifiability is an often appropriate weakening of global identifiability.

Now there will be cases in which we are interested in understanding the generic identifiability of certain coefficients of a mixed graph G rather than of all coefficients simultaneously. In these cases we say that the coefficient \lambda_{vu} (or \omega_{vu}), for u,v\in V, is generically identifiable in G if the projection of the fiber \mathcal{F}(\Lambda,\Omega) onto \lambda_{vu} (or \omega_{vu}) is a singleton for all (\Lambda,\Omega)\in\Theta\setminus A, where A\subset\Theta is a proper algebraic set.

Let \Lambda and \Omega be matrices of indeterminates as in Equation (1.2), with zero pattern corresponding to G. Then, by the Trek Rule of Wright (1921), see also Spirtes, Glymour, and Scheines (2000), the covariance \Sigma_{vw} can be represented as a sum of monomials corresponding to treks between v and w in G. To state the Trek Rule formally, let \mathcal{T}(v,w) be the set of all treks from v to w in G. Then for any \pi\in\mathcal{T}(v,w), if \pi contains no bidirected edge and has top node z, we define the trek monomial as

\pi(\Lambda,\Omega)=\omega_{zz}\prod_{x\to y\in\pi}\lambda_{xy},

and if \pi contains a bidirected edge connecting u,z\in V then we define the trek monomial as

\pi(\Lambda,\Omega)=\omega_{uz}\prod_{x\to y\in\pi}\lambda_{xy}.

We may then state the rule as follows.

Proposition 2.3 (Trek Rule).

For all v,w\in V, the covariance matrix \Sigma=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1} corresponding to a mixed graph G satisfies

\Sigma_{vw}=\sum_{\pi\in\mathcal{T}(v,w)}\pi(\Lambda,\Omega).
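
As a quick numerical sanity check of Proposition 2.3, consider a hypothetical 2-node graph with edges 1\to 2 and 1\leftrightarrow 2. The only treks from 1 to 2 are 1\to 2 (with top node 1) and 1\leftrightarrow 2, so the Trek Rule gives \Sigma_{12}=\omega_{11}\lambda_{12}+\omega_{12}; the R sketch below confirms that this matches equation (1.2). The numeric values are arbitrary.

lambda12 <- 0.8; omega11 <- 1.2; omega22 <- 0.9; omega12 <- 0.3

Lambda <- matrix(c(0, 0, lambda12, 0), 2, 2)                 # Lambda[1, 2] = lambda12
Omega  <- matrix(c(omega11, omega12, omega12, omega22), 2, 2)
Sigma  <- solve(t(diag(2) - Lambda)) %*% Omega %*% solve(diag(2) - Lambda)

Sigma[1, 2]                      # entry from equation (1.2)
omega11 * lambda12 + omega12     # trek-rule sum; the two numbers agree (1.26)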

Before giving the statement of the HTC, we must first define what is meant by a half-trek system. Let \Pi=\{\pi_{1},\dots,\pi_{m}\} be a collection of m treks with each \pi_{i} having source x_{i} and target y_{i}. Then \Pi is called a system of treks from X=\{x_{1},\dots,x_{m}\} to Y=\{y_{1},\dots,y_{m}\} if |X|=|Y|=m, so that all sources as well as all targets are pairwise distinct. If each \pi_{i} is a half-trek, then \Pi is a system of half-treks. Moreover, a collection \Pi=\{\pi_{1},\dots,\pi_{m}\} of treks is said to have no sided intersection if

\text{Left}(\pi_{i})\cap\text{Left}(\pi_{j})=\varnothing=\text{Right}(\pi_{i})\cap\text{Right}(\pi_{j}),\quad\forall i\not=j.

Let htr(v) be the collection of vertices w\in V\setminus(\{v\}\cup sib(v)) for which there is a half-trek from v to w; these w are called half-trek reachable from v.
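
A small R sketch of one way to compute htr(v) under the adjacency-matrix encoding used earlier (the example graph is hypothetical): a half-trek from v is either a directed path v\to\dots\to w or a path v\leftrightarrow u\to\dots\to w with u\in sib(v), so htr(v) collects the directed descendants of v and of sib(v), with \{v\}\cup sib(v) removed.

n <- 5
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 2] <- D[2, 3] <- D[4, 5] <- 1   # 1 -> 2 -> 3 and 4 -> 5
B[1, 4] <- B[4, 1] <- 1              # 1 <-> 4

reach <- function(D) {               # reach[i, j] is TRUE iff there is a directed path i -> ... -> j
  R <- diag(nrow(D)) | D
  for (k in seq_len(nrow(D))) R <- R | ((R %*% R) > 0)
  R
}

htr <- function(v, D, B) {
  R <- reach(D)
  sib_v    <- which(B[v, ] == 1)
  from_v   <- setdiff(which(R[v, ]), v)                      # proper descendants of v
  from_sib <- which(colSums(R[sib_v, , drop = FALSE]) > 0)   # descendants of siblings of v
  setdiff(union(from_v, from_sib), c(v, sib_v))
}
htr(1, D, B)   # 2 3 5

With this notation in place, we have the following definition and result of Foygel et al. (2012a).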

Definition 2.4.

A set of nodes Y\subset V satisfies the half-trek criterion with respect to a node v\in V if

  1. (i)

    |Y|=|pa(v)|,

  2. (ii)

    Y\cap(\{v\}\cup sib(v))=\varnothing, and

  3. (iii)

    there is a system of half-treks with no sided intersection from Y to pa(v).

Theorem 2.5 (HTC-identifiability).

Let (Y_{v}:v\in V) be a family of subsets of the vertex set V of a mixed graph G. If, for each node v, the set Y_{v} satisfies the half-trek criterion with respect to v, and there is a total ordering \prec on the vertex set V such that w\prec v whenever w\in Y_{v}\cap htr(v), then G is rationally identifiable.

The assertion that G is rationally identifiable means that the inverse map \phi_{G}^{-1} can be represented as a rational function on \Theta\setminus A, where A is some proper algebraic subset of \Theta. Clearly, rational identifiability is a stronger condition than generic identifiability. If a graph G satisfies the conditions of Theorem 2.5 we will say that G is HTC-identifiable (HTCI). In a similar vein, Theorem 2 of Foygel et al. (2012a) gives sufficient conditions for a graph G to be generically unidentifiable (with generically infinite fibers of \phi_{G}), and we will call such graphs HTC-unidentifiable (HTCU). Graphs that are neither HTCI nor HTCU are called HTC-inconclusive; these are the graphs on which progress remains to be made.

As noted in Section 8 of Foygel et al. (2012a), we may extend the power of the HTC by using the graph decomposition techniques of Tian (2005). Let C_{1},\dots,C_{k}\subset V be the unique partitioning of V where v,w\in C_{i} if and only if there exists a (possibly empty) path from v to w composed of only bidirected edges. In other words, C_{1},\dots,C_{k} are the connected components of (V,B), the bidirected part of G. For i=1,\dots,k, let

V_{i}=C_{i}\cup pa(C_{i}),\qquad D_{i}=\{v\to w\in G:v\in V_{i},\ w\in C_{i}\},
B_{i}=\{v\leftrightarrow w\in G:v,w\in C_{i}\},\qquad G_{i}=(V_{i},D_{i},B_{i}).

Then the mixed graphs G_{1},\dots,G_{k} are called the mixed components of G. From the work of Tian (2005), Foygel et al. (2012a) present the following theorem.
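
A sketch of how the mixed components might be computed in R, under the adjacency-matrix encoding used earlier; the example graph is hypothetical and this is not the implementation used for the experiments in Section 4.

n <- 5
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 2] <- D[2, 3] <- D[1, 4] <- D[4, 5] <- 1
B[2, 3] <- B[3, 2] <- 1; B[4, 5] <- B[5, 4] <- 1

bidirected_components <- function(B) {      # the C_i: connected components of (V, B)
  n <- nrow(B)
  R <- diag(n) | B
  for (k in seq_len(n)) R <- R | ((R %*% R) > 0)
  split(seq_len(n), apply(R, 1, function(row) min(which(row))))
}

mixed_component <- function(C, D, B) {      # G_i = (V_i, D_i, B_i) for a component C = C_i
  n <- ncol(D)
  pa_C <- which(rowSums(D[, C, drop = FALSE]) > 0)       # pa(C): parents of nodes in C
  Vi <- sort(union(C, pa_C))
  Di <- D; Di[, setdiff(seq_len(n), C)] <- 0             # keep only directed edges pointing into C
  Bi <- B; Bi[setdiff(seq_len(n), C), ] <- 0; Bi[, setdiff(seq_len(n), C)] <- 0
  list(V = Vi, D = Di[Vi, Vi, drop = FALSE], B = Bi[Vi, Vi, drop = FALSE])
}

comps <- bidirected_components(B)           # components {1}, {2, 3}, {4, 5}
lapply(comps, mixed_component, D = D, B = B)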

Theorem 2.6 (Tian Decomposition).

For an acyclic mixed graph G with mixed components G_{1},\dots,G_{k}, the following holds:

  1. (i)

    G is rationally (or generically) identifiable if and only if all components G_{1},\dots,G_{k} are rationally (or generically) identifiable;

  2. (ii)

    G is generically infinite-to-one if and only if there exists a component G_{j} that is generically infinite-to-one;

  3. (iii)

    if each G_{j} is generically h_{j}-to-one with h_{j}<\infty, then G is generically h-to-one with h=\prod_{j=1}^{k}h_{j}.

We remark that this decomposition also plays a role in non-linear models; see, for instance, the paper of Shpitser et al. (2014) and the references given therein.

3. Ancestral Decomposition

For a later strengthening of the HTC, we will show that the generic identification of certain subgraphs of an acyclic mixed graph G=(V,D,B) implies the generic identification of their associated edge coefficients in the larger graph G. This result is straightforward and is well known in other forms. Surprisingly, however, this simple idea can extend the applicability of the HTC when combined with the decomposition from Theorem 2.6. We first define what we mean by an ancestral subset and an induced subgraph.

Definition 3.1.

Let V^{\prime}\subset V be a subset of vertices. The ancestors of V^{\prime} form the set

An(V^{\prime})=\{v\in V:\text{there exists a directed path from $v$ to some $w\in V^{\prime}$}\},

where we consider the empty path to be directed so that V^{\prime}\subset An(V^{\prime}). If V^{\prime}=An(V^{\prime}), then we call V^{\prime} ancestral.

Definition 3.2.

Let V^{\prime}\subset V again be a subset of vertices. The subgraph of G induced by V^{\prime} is the mixed graph G_{V^{\prime}}=(V^{\prime},D^{\prime},B^{\prime}) with

D^{\prime}=\{v\to w\in G:v,w\in V^{\prime}\},
B^{\prime}=\{v\leftrightarrow w\in G:v,w\in V^{\prime}\}.
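
Both definitions are easy to compute under the adjacency-matrix encoding used earlier; the R sketch below (with a hypothetical example graph) computes An(V^{\prime}) via the transitive closure of the directed part and then forms the induced subgraph.

n <- 5
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 2] <- D[2, 3] <- D[4, 5] <- 1; B[2, 5] <- B[5, 2] <- 1

ancestors <- function(Vp, D) {     # An(V'): all v with a directed path (possibly empty) into V'
  n <- nrow(D)
  R <- diag(n) | D
  for (k in seq_len(n)) R <- R | ((R %*% R) > 0)
  which(rowSums(R[, Vp, drop = FALSE]) > 0)
}

induced_subgraph <- function(Vp, D, B)   # G_{V'}: keep only edges with both endpoints in V'
  list(V = Vp, D = D[Vp, Vp, drop = FALSE], B = B[Vp, Vp, drop = FALSE])

An3 <- ancestors(3, D)          # 1 2 3: the set {3} is not ancestral, but An({3}) is
induced_subgraph(An3, D, B)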

We now have the following simple fact.

Theorem 3.3.

Let G=(V,D,B) be a mixed graph, and let V^{\prime} be an ancestral subset of V. If the induced subgraph G_{V^{\prime}} is generically (or rationally) identifiable then so are all the corresponding edge coefficients in G.

Proof.

Let the covariance matrix \Sigma=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1} correspond to G, that is, \Lambda\in\mathbb{R}^{D} and \Omega\in PD(B). Let \Lambda^{\prime} and \Omega^{\prime} denote the V^{\prime}\times V^{\prime} submatrices of \Lambda and \Omega, respectively, and let \Sigma^{\prime}=(I_{|V^{\prime}|}-\Lambda^{\prime})^{-T}\Omega^{\prime}(I_{|V^{\prime}|}-\Lambda^{\prime})^{-1}, where I_{|V^{\prime}|} is the |V^{\prime}|\times|V^{\prime}| identity matrix. For ease of notation, write G^{\prime}=G_{V^{\prime}}.

Recall that for any v,w\in V, the set \mathcal{T}(v,w) comprises all treks between v and w in G. Similarly, write \mathcal{T}_{G^{\prime}}(v,w) for the set of treks between v and w in G^{\prime}. Since V^{\prime} is ancestral, it holds that \mathcal{T}(v,w)=\mathcal{T}_{G^{\prime}}(v,w) for all v,w\in V^{\prime}. Thus, by Proposition 2.3, we have that for any v,w\in V^{\prime}

\Sigma_{vw}=\sum_{\pi\in\mathcal{T}(v,w)}\pi(\Lambda,\Omega)=\sum_{\pi\in\mathcal{T}_{G^{\prime}}(v,w)}\pi(\Lambda^{\prime},\Omega^{\prime})=\Sigma_{vw}^{\prime}.

Now suppose that G^{\prime} is generically (or rationally) identifiable. Then \Lambda^{\prime},\Omega^{\prime} can be generically (or rationally) recovered from \Sigma^{\prime}. As we have just shown that \Sigma_{vw}=\Sigma_{vw}^{\prime} for all v,w\in V^{\prime}, it follows that the entries of \Lambda,\Omega corresponding to \Lambda^{\prime},\Omega^{\prime} can be recovered from \Sigma generically (or rationally). ∎
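
The key identity in this proof, namely that \Sigma restricted to an ancestral set V^{\prime} coincides with the covariance matrix \Sigma^{\prime} of the induced subgraph, is easy to check numerically. A hypothetical example in R (arbitrary numeric values, graph 1\to 2\to 3\to 4 with 1\leftrightarrow 2 and ancestral set V^{\prime}=\{1,2,3\}):

n <- 4
Lambda <- matrix(0, n, n); Omega <- diag(n)
Lambda[1, 2] <- 0.5; Lambda[2, 3] <- 0.7; Lambda[3, 4] <- -0.2   # 1 -> 2 -> 3 -> 4
Omega[1, 2] <- Omega[2, 1] <- 0.3                                # 1 <-> 2

Sigma <- solve(t(diag(n) - Lambda)) %*% Omega %*% solve(diag(n) - Lambda)

Vp <- 1:3                                   # ancestral, since An({1,2,3}) = {1,2,3}
Lp <- Lambda[Vp, Vp]; Op <- Omega[Vp, Vp]
Sigmap <- solve(t(diag(length(Vp)) - Lp)) %*% Op %*% solve(diag(length(Vp)) - Lp)

max(abs(Sigma[Vp, Vp] - Sigmap))            # 0 up to rounding error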

We may generalize the above theorem so that we do not have to consider the identifiability of all of G^{\prime} and can instead only look at certain edges in G^{\prime}.

Corollary 3.4.

Let G=(V,D,B) be a mixed graph, and let V^{\prime} be an ancestral subset of V. If an edge coefficient of G_{V^{\prime}} is generically (or rationally) identifiable then so is the corresponding coefficient in G.

Proof.

This follows exactly as in the proof of Theorem 3.3 by only considering a single generically (or rationally) identifiable coefficient of G^{\prime} at a time. ∎

We give an example of how Theorem 3.3 strengthens the HTC.

(a) An HTC-inconclusive graph G. (b) The induced subgraph G_{\{1,2,3,4,5\}}. (c) The mixed components of G_{\{1,2,3,4,5\}}.
Figure 1. A mixed graph G, a subgraph induced by an ancestral subset, and the mixed components of the induced graph.
Example 3.5.

It is straightforward to check that the graph G from Figure 1(a) is HTC-inconclusive using Algorithm 1 from Foygel et al. (2012a). We direct the reader who does not want to perform this computation by hand to the R package SEMID (Foygel and Drton, 2013; R Core Team, 2014). Moreover, G cannot be decomposed, as its bidirected part is connected.

Now the set V^{\prime}=\{1,2,3,4,5\} is ancestral in G, so we may apply Theorem 3.3 to the induced subgraph G^{\prime}=G_{\{1,2,3,4,5\}}. While G^{\prime} remains HTC-inconclusive, the Tian decomposition of Theorem 2.6 is applicable. After decomposing G^{\prime} into its mixed components, see Figure 1(c), we find that each component is HTC-identifiable and thus G^{\prime} itself is generically identifiable. To show generic identifiability of G, we are left to show that all the coefficients on the directed edges between pa(6)=\{1,2\} and 6 are generically identifiable. Since Y=\{3,4\} satisfies the HTC with respect to 6, it follows, by Lemma 3.6 below, that \Lambda_{pa(6),6} is generically identifiable. Hence, the entire matrix \Lambda is generically identifiable, and since (I-\Lambda)^{T}\Sigma(I-\Lambda)=\Omega, this implies generic identifiability of \Omega. We conclude that G is generically identifiable despite being HTC-inconclusive.

Lemma 3.6.

Let G=(V,D,B) be a mixed graph, and let v\in V. If Y\subset V satisfies the HTC with respect to v and for each y\in Y we have that \Lambda_{pa(y),y} is generically (or rationally) identifiable, then \Lambda_{pa(v),v} is generically (or rationally) identifiable.

Proof.

Suppose vertex v has m parents, and let pa(v)=\{p_{1},\dots,p_{m}\}. Since Y satisfies the HTC for v, we must have |Y|=|pa(v)|=m. Thus we may enumerate the set as Y=\{y_{1},\dots,y_{m}\}. Define a matrix A\in\mathbb{R}^{m\times m} with entries

A_{ij}:=\begin{cases}[(I-\Lambda)^{T}\Sigma]_{y_{i}p_{j}}&\text{if }y_{i}\in htr(v),\\ \Sigma_{y_{i}p_{j}}&\text{if }y_{i}\not\in htr(v),\end{cases}

and define a vector b\in\mathbb{R}^{m} with entries

b_{i}:=\begin{cases}[(I-\Lambda)^{T}\Sigma]_{y_{i}v}&\text{if }y_{i}\in htr(v),\\ \Sigma_{y_{i}v}&\text{if }y_{i}\not\in htr(v).\end{cases}

Both A and b are generically identifiable because we have assumed that \Lambda_{pa(y),y} is generically identifiable for every y\in Y. Now, from the proof of Theorem 1 in Foygel et al. (2012a), we have A\cdot\Lambda_{pa(v),v}=b, and from Lemma 2 of Foygel et al. (2012a) we deduce that A is generically invertible. It follows that \Lambda_{pa(v),v}=A^{-1}b generically, so that \Lambda_{pa(v),v} is generically identifiable. ∎
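
To illustrate the linear system A\cdot\Lambda_{pa(v),v}=b, consider the classical instrumental-variable graph 1\to 2\to 3 with 2\leftrightarrow 3 (a hypothetical numeric example). Taking v=3 and Y=\{1\}, the HTC holds for 3, and since 1 is not half-trek reachable from 3, both A and b reduce to plain covariances; the R sketch below recovers \lambda_{23}=\Sigma_{13}/\Sigma_{12}. When some y_{i}\in htr(v), the corresponding rows would instead use [(I-\Lambda)^{T}\Sigma] with the previously identified columns of \Lambda.

n <- 3
Lambda <- matrix(0, n, n); Lambda[1, 2] <- 0.6; Lambda[2, 3] <- -0.9
Omega <- diag(n); Omega[2, 3] <- Omega[3, 2] <- 0.4
Sigma <- solve(t(diag(n) - Lambda)) %*% Omega %*% solve(diag(n) - Lambda)

v <- 3; pa_v <- 2; Y <- 1
A <- Sigma[Y, pa_v, drop = FALSE]   # y_1 = 1 is not in htr(3), so A_11 = Sigma_{1,2}
b <- Sigma[Y, v]                    # and b_1 = Sigma_{1,3}
solve(A, b)                         # recovers Lambda[2, 3] = -0.9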

Algorithm 1 from Foygel et al. (2012a) (hereafter called the HTC-algorithm) determines whether or not a mixed graph G=(V,D,B) satisfies the conditions of Theorem 2.5 and thus checks if the graph is HTCI. The HTC-algorithm operates by iteratively looping through the nodes v\in V and attempting to find a half-trek system \Pi to pa(v) with \Pi having sources in a set of “allowed” nodes A\subset V. Here, a node w is allowed if w\not\in htr(v)\cup\{v\}\cup sib(v) or if w was previously shown to be generically identifiable in the sense that all coefficients on directed edges u\to w were shown to be generically identifiable. If such a half-trek system \Pi is found for node v, then Foygel et al. (2012a) show that this implies that v is generically identifiable, and thus v may be added to the set of allowed nodes for the remaining iterations. The algorithm terminates when all nodes have been shown to be generically identifiable or once it has iterated through all vertices and has been unable to show the generic identifiability of any new nodes. To find a half-trek system between a suitable subset of the set of allowed nodes A and pa(v), the HTC-algorithm solves a Max Flow problem on an auxiliary network G_{\text{flow}}(A,v), and this step takes \mathcal{O}(|V|^{3}) time when G is acyclic. If in G_{\text{flow}}(A,v) one can find a flow of size |pa(v)| then the half-trek system exists. See Section 6 of Foygel et al. (2012a) for more details about how G_{\text{flow}}(A,v) is defined. Finally, for an acyclic mixed graph G, the HTC-algorithm has a worst-case running time of \mathcal{O}(|V|^{5}).

Algorithm 1 presents a simple modification of the HTC-algorithm that leverages Corollary 3.4, extending the ability of the HTC to determine the generic identifiability of acyclic mixed graphs. We emphasize that this algorithm considers only certain ancestral subsets and, as such, we do not necessarily expect the algorithm to reach a conclusion in all cases in which Corollary 3.4 may be applicable.

Algorithm 1 A sufficient test for generic identifiability.
1:Input: G=(V,D,B), an acyclic mixed graph on n nodes v_{1},\dots,v_{n}
2:Initialize: SolvedNodes \leftarrow\{v:pa(v)=\varnothing\}.
3:repeat
4:  for v=v_{1},v_{2},\dots,v_{n} do
5:   if v\not\in SolvedNodes then
6:    \triangleright Check if we can generically identify \Lambda_{pa(v),v} using
7:    \triangleright the induced graph G_{An(\{v\}\cup(A\cap S))}.
8:    S\leftarrow An(v)\cup sib(An(v))
9:    A\leftarrow S\cap(\text{SolvedNodes}\cup(V\setminus htr(v)))\setminus(\{v\}\cup sib(v))
10:    G^{\prime}\leftarrow the mixed component of G_{An(\{v\}\cup(A\cap S))} containing v
11:    A\leftarrow(A\cap(\text{vertices of }G^{\prime}))\cup(\text{source nodes in }G^{\prime})
12:    if MaxFlow(G^{\prime}_{\mathrm{flow}}(v,A))=|pa(v)| then
13:     SolvedNodes \leftarrow SolvedNodes \cup\{v\}.
14:     Skip to the next iteration of the loop
15:    end if
16:    \triangleright Check if we can generically identify \Lambda_{pa(v),v} using
17:    \triangleright the induced graph G_{An(\{v\})}.
18:    G^{\prime}\leftarrow the mixed component of G_{An(v)} containing v
19:    A\leftarrow(A\cap(\text{vertices of }G^{\prime}))\cup(\text{source nodes in }G^{\prime})
20:    if MaxFlow(G^{\prime}_{\mathrm{flow}}(v,A))=|pa(v)| then
21:     SolvedNodes \leftarrow SolvedNodes \cup\{v\}.
22:    end if
23:   end if
24:  end for
25:until SolvedNodes =V or no change has occurred in the last iteration.
26:Output: “yes” if SolvedNodes =V, “no” otherwise.
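
The set computations in lines 8 and 9 of Algorithm 1, together with the ancestral vertex set underlying line 10, can be sketched in R as follows, reusing the adjacency-matrix encoding and the reachability helper from earlier; the mixed-component extraction and the max-flow test are omitted, and the example inputs are hypothetical.

reach <- function(D) {
  R <- diag(nrow(D)) | D
  for (k in seq_len(nrow(D))) R <- R | ((R %*% R) > 0)
  R
}

restricted_sets <- function(v, D, B, solved) {
  n <- nrow(D); R <- reach(D)
  sib_v <- which(B[v, ] == 1)
  An_v  <- which(R[, v] > 0)                                 # An(v), including v itself
  S     <- union(An_v, which(colSums(B[An_v, , drop = FALSE]) > 0))          # line 8
  htr_v <- setdiff(union(which(R[v, ]), which(colSums(R[sib_v, , drop = FALSE]) > 0)),
                   c(v, sib_v))
  A     <- setdiff(intersect(S, union(solved, setdiff(seq_len(n), htr_v))),
                   c(v, sib_v))                                              # line 9
  Vanc  <- which(rowSums(R[, union(v, intersect(A, S)), drop = FALSE]) > 0)  # An({v} union (A int S))
  list(S = S, allowed = A, ancestral_vertices = Vanc)
}

# Example use on a small hypothetical graph:
n <- 4; D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 3] <- D[2, 3] <- D[3, 4] <- 1; B[3, 4] <- B[4, 3] <- 1
restricted_sets(3, D, B, solved = c(1, 2))
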
Proposition 3.7.

Algorithm 1 returns “yes” only if the input acyclic mixed graph G is generically identifiable and will return “yes” whenever the HTC-algorithm does. Moreover, Algorithm 1 returns “yes” for the HTC-inconclusive graph in Figure 1(a) and has time complexity at most \mathcal{O}(|V|^{5}).

Proof.

The fact that Algorithm 1 only returns “yes” if G is generically identifiable follows from Theorem 7 in Foygel et al. (2012a, b) and our Corollary 3.4. That Algorithm 1 returns “yes” whenever the HTC-algorithm does can be argued as follows:

If, for a set of allowed nodes A and v\in V, there exists Y\subset A satisfying the HTC for v, then we must have that Y\subset S:=An(v)\cup sib(An(v)) and that Y satisfies the HTC for v in G_{An(\{v\}\cup(A\cap S))}. Lemma 4 of Foygel et al. (2012b) then yields that there exists a set Y^{\prime} of allowed nodes which satisfies the HTC for v in the mixed component of G_{An(\{v\}\cup(A\cap S))} containing v. Hence, if v is added to the set of solved nodes in the HTC-algorithm, it will also be added to the set of solved nodes in Algorithm 1. From this it follows that if the HTC-algorithm outputs “yes” then so will Algorithm 1.

It is straightforward to check that Algorithm 1 returns “yes” for the graph in Figure 1(a) and, thus, it remains only to argue that the time complexity is at most \mathcal{O}(|V|^{5}). Note that the max-flow computation for this problem has a running time of \mathcal{O}(|V|^{3}) since G is acyclic; see Foygel et al. (2012a) for details. It is easy to see that this running time dominates each iteration. Moreover, since at the end of each pass through the |V| nodes of the graph the algorithm must either terminate or add at least one vertex to the set of solved nodes, there are at most |V|^{2} iterations in total. We conclude that the maximum run time of the algorithm is \mathcal{O}(|V|^{2}\cdot|V|^{3})=\mathcal{O}(|V|^{5}). ∎

Remark 3.8.

One might expect that lines 18 to 22 in Algorithm 1 are superfluous. This is, however, false: we have found examples of graphs G on 10 nodes for which Algorithm 1 returns “yes” but the corresponding algorithm with lines 18 to 22 removed returns “no.” As these examples are fairly large we have chosen not to display them here.

4. Computational Experiments

We now run a simulation study to examine the effect of applying Algorithm 1 to HTC-inconclusive graphs. All code is written in R, and we use the SEMID package to determine HTC-identifiability and HTC-unidentifiability (R Core Team, 2014; Foygel and Drton, 2013).

Algorithm 2 A procedure for generating random acyclic mixed graphs.
1:Input: A positive integer n and 0<p,q<1
2:Initialize: A mixed graph G=(V,D,B) with V=\{1,\dots,n\}, D=B=\varnothing
3: Pick a random collection E of n-1 bidirected edges so that (V,E) is a tree.
4:B\leftarrow E
5:for 1\leq i<j\leq n do
6:  Add i\leftrightarrow j to B with probability p
7:end for
8:for 1\leq i<j\leq n do
9:  Add i\to j to D with probability q
10:end for
11:Output: G
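
A minimal R sketch implementing Algorithm 2. The random tree in line 3 is generated here by attaching each vertex i > 1 to a uniformly chosen earlier vertex; this is just one simple way to obtain a tree and is an assumption, not necessarily the sampler used for the experiments reported below.

random_acyclic_mixed_graph <- function(n, p, q) {
  D <- matrix(0, n, n); B <- matrix(0, n, n)
  for (i in 2:n) {                                # line 3: a random bidirected spanning tree
    j <- sample(seq_len(i - 1), 1)
    B[i, j] <- B[j, i] <- 1
  }
  for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    if (runif(1) < p) B[i, j] <- B[j, i] <- 1     # lines 5-7: extra bidirected edges
    if (runif(1) < q) D[i, j] <- 1                # lines 8-10: directed edges i -> j with i < j, hence acyclic
  }
  list(D = D, B = B)
}

set.seed(1)
G <- random_acyclic_mixed_graph(n = 6, p = 0.2, q = 0.4)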

For each combination of n\in\{6,8,10,12\}, p\in\{.1,.2,.3\}, and q\in\{.2,.3,.4,.5,.6\} we perform the following steps:

  1. (i)

    Use Algorithm 2 with probability parameters p and q to generate random acyclic mixed graphs G with connected bidirected part on n nodes, until we have found 1000 graphs which are HTC-inconclusive.

  2. (ii)

    For each of the 1000 HTC-inconclusive graphs G, use Algorithm 1 to test the generic identifiability of G.

  3. (iii)

    Record the proportion of the 1000 graphs that are shown to be generically identifiable by Algorithm 1. Call this proportion a_{n,p,q}.

To summarize our findings we compute, for each pair (n,q), the average b_{n,q}=\frac{1}{3}\sum_{p}a_{n,p,q}. We then plot the values of b_{n,q} in Figure 2. According to this figure, Algorithm 1 provides a modest increase in the number of graphs that are shown to be generically identifiable. This improvement is largest when q is large, that is, when the directed part of the mixed graph is dense.


Figure 2. The average proportion of HTC-inconclusive graphs found to be generically identifiable by Algorithm 1.

5. Conclusion

We have shown how the generic identifiability of a subgraph of a mixed graph G induced by an ancestral subset of vertices implies the generic identifiability of the corresponding edge coefficients in G (Theorem 3.3 and Corollary 3.4). We then provided, in Algorithm 1, one specific way to leverage this result by using the HTC of Foygel et al. (2012a) and the decomposition techniques of Tian (2005). Our new algorithm provides a modest strengthening of the HTC while not increasing the algorithmic complexity of the HTC-algorithm of Foygel et al. (2012a).

In saying that Algorithm 1 constitutes only one specific way to leverage Corollary 3.4, we mean that the algorithm considers only certain ancestral subsets. While we do not have any examples to report, it is possible that there are acyclic mixed graphs for which Algorithm 1 does not return “yes” but which could be proven generically identifiable by a combination of Corollary 3.4, the Tian decomposition, and the HTC-algorithm. This said, it is not clear to us that all ancestral subsets could be considered in an algorithm with polynomial run time. Clarifying this issue would be an interesting topic for future work.

Acknowledgments

We would like to thank Thomas Richardson whose questions following a seminar talk started this project. This work was partially supported by the U.S. National Science Foundation (DMS-1305154) and National Security Agency (H98230-14-1-0119).

References

  • Bollen (1989) K. A. Bollen. Structural equations with latent variables. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons, Inc., New York, 1989. A Wiley-Interscience Publication.
  • Brito and Pearl (2006) C. Brito and J. Pearl. Graphical condition for identification in recursive SEM. In Proceedings of the Twenty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), pages 47–54. AUAI Press, 2006.
  • Chen et al. (2014) B. Chen, J. Tian, and J. Pearl. Testable implications of linear structural equations models. In C. E. Brodley and P. Stone, editors, Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pages 2424–2430. AAAI Press, 2014.
  • Drton et al. (2011) M. Drton, R. Foygel, and S. Sullivant. Global identifiability of linear structural equation models. Ann. Statist., 39(2):865–886, 2011.
  • Foygel and Drton (2013) R. Foygel and M. Drton. SEMID: Identifiability of linear structural equation models, 2013. R package version 0.1.
  • Foygel et al. (2012a) R. Foygel, J. Draisma, and M. Drton. Half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist., 40(3):1682–1713, 2012a.
  • Foygel et al. (2012b) R. Foygel, J. Draisma, and M. Drton. Supplement to “Half-trek criterion for generic identifiability of linear structural equation models.” Ann. Statist., 40(3), 2012b.
  • Pearl (2009) J. Pearl. Causality. Cambridge University Press, Cambridge, second edition, 2009. Models, reasoning, and inference.
  • R Core Team (2014) R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2014.
  • Shpitser et al. (2014) I. Shpitser, R. J. Evans, T. S. Richardson, and J. M. Robins. Introduction to nested Markov models. Behaviormetrika, 41(1):3–39, 2014.
  • Spirtes et al. (2000) P. Spirtes, C. Glymour, and R. Scheines. Causation, prediction, and search. MIT Press, Cambridge, MA, 2nd edition, 2000.
  • Tian (2005) J. Tian. Identifying direct causal effects in linear models. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 346–352. AAAI Press/The MIT Press, 2005.
  • Wright (1921) S. Wright. Correlation and causation. J. Agricultural Research, 20:557–585, 1921.
  • Wright (1934) S. Wright. The method of path coefficients. Ann. Math. Statist., 5:161–215, 1934.