

Generic Identifiability of Linear Structural Equation Models by Ancestor Decomposition

Mathias Drton Department of Statistics, University of Washington, Seattle, WA, U.S.A. md5@uw.edu  and  Luca Weihs Department of Statistics, University of Washington, Seattle, WA, U.S.A. lucaw@uw.edu
(Date: September 25, 2025)
Abstract.

Linear structural equation models, which relate random variables via linear interdependencies and Gaussian noise, are a popular tool for modeling multivariate joint distributions. These models correspond to mixed graphs that include both directed and bidirected edges, representing the linear relationships and the correlations between noise terms, respectively. A question of interest for these models is that of parameter identifiability: whether or not it is possible to recover edge coefficients from the joint covariance matrix of the random variables. For the problem of determining generic parameter identifiability, we present an algorithm that extends the half-trek criterion algorithm from prior work of Foygel, Draisma, and Drton (2012). The main idea underlying our new algorithm is the use of ancestral subsets of vertices in the graph when applying a decomposition idea of Tian (2005).

Key words and phrases:
Half-trek criterion, structural equation models, identifiability, generic identifiability

1. Introduction

It is often useful to model the joint distribution of a random vector X=(X_{1},\dots,X_{n})^{T} in terms of a collection of noisy linear interdependencies. In particular, we may postulate that each X_{w} is a linear function of X_{1},\dots,X_{w-1},X_{w+1},\dots,X_{n} and a stochastic noise term \epsilon_{w}. Models of this type are called linear structural equation models and can be compactly expressed in matrix form as

(1.1)   X=\lambda_{0}+\Lambda^{T}X+\epsilon

where \Lambda=(\lambda_{vw}) is an n\times n matrix, \lambda_{0}=(\lambda_{01},\dots,\lambda_{0n})^{T}\in\mathbb{R}^{n}, and \epsilon=(\epsilon_{1},\dots,\epsilon_{n})^{T} is a random vector of error terms. We will adopt the classical assumption that \epsilon has a non-degenerate multivariate normal distribution with mean 0 and covariance matrix \Omega=(\omega_{vw}). With this assumption it follows immediately that X has a multivariate normal distribution with mean (I-\Lambda)^{-T}\lambda_{0} and covariance matrix

(1.2)   \Sigma=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1}

where I is the n\times n identity matrix. We refer the reader to the book by Bollen (1989) for background on these types of models.
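
To make (1.2) concrete, the following R snippet computes \Sigma from a choice of (\Lambda,\Omega). It is a minimal sketch using a hypothetical 3-node graph with edges 1\to 2, 2\to 3, and 1\leftrightarrow 3; all numeric values are arbitrary.

n <- 3
Lambda <- matrix(0, n, n)
Lambda[1, 2] <- 0.7    # coefficient on the directed edge 1 -> 2
Lambda[2, 3] <- -0.4   # coefficient on the directed edge 2 -> 3

Omega <- diag(c(1, 1.5, 2))
Omega[1, 3] <- Omega[3, 1] <- 0.3   # noise covariance for the bidirected edge 1 <-> 3

Sigma <- solve(t(diag(n) - Lambda)) %*% Omega %*% solve(diag(n) - Lambda)
Sigma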

We obtain a collection of interesting models by imposing different patterns of zeros among the coefficients in \Lambda and \Omega. These models can then be naturally associated with mixed graphs containing both directed and bidirected edges. In particular, the graph will contain the directed edge v\to w when \lambda_{vw} is not required to be zero and, similarly, will include the bidirected edge v\leftrightarrow w when \omega_{vw} is potentially non-zero. Representations of this type are often called path diagrams and were first advocated in Wright (1921, 1934).

A natural question arising in the study of linear structural equation models is that of identifiability: whether or not it is possible to uniquely recover the two parameter matrices \Lambda and \Omega from the covariance matrix \Sigma they define via (1.2). The most stringent version, known as global identifiability, amounts to unique recovery of every pair (\Lambda,\Omega) from the covariance matrix \Sigma. This global property can be characterized efficiently (Drton et al., 2011). Often, however, a less stringent notion that we term generic identifiability is of interest. This property requires only that a generic (or randomly chosen) pair (\Lambda,\Omega) can be recovered from its covariance matrix. The computational complexity of deciding whether a given mixed graph G defines a generically identifiable linear structural equation model is unknown. There are, however, a number of graphical criteria that are sufficient for generic identifiability and can be checked in time polynomial in the number of considered variables (or vertices of the graph). To our knowledge, the most widely applicable such criterion is the Half-Trek Criterion (HTC) of Foygel, Draisma, and Drton (2012a), which built on earlier work of Brito and Pearl (2006). The HTC also comes with a necessary condition for generic identifiability, but in this paper our focus is on the sufficient condition. We remark that an extension of the HTC for identification of subsets of edge coefficients is given in Chen et al. (2014).

In Section 2, we give a brief review of background, such as the formal connection between structural equation models and mixed graphs, and of prior work. In Section 3, which contains our main results, we demonstrate a simple method by which to infer generic identifiability of certain entries of (\Lambda,\Omega) by examining subgraphs of a given mixed graph G that are induced by ancestral subsets of vertices. This extends the applicability of the HTC for acyclic mixed graphs when combined with the decomposition techniques of Tian (2005), and we leverage the extension in an efficient algorithmic form. In Section 4, we report on computational experiments demonstrating the applicability of our findings. A brief conclusion is given in Section 5.

2. Preliminaries

We assume that the reader is familiar with graphical representations of structural equation models and thus only provide a quick review of these topics. For a more in-depth treatment see, for instance, Pearl (2009) or Foygel et al. (2012a).

2.1. Mixed Graphs

For any n\geq 1, let [n]:=\{1,\dots,n\}. We define a mixed graph to be a triple G=(V,D,B) where V=[n] is a finite set of vertices and D,B\subset V\times V. The sets D and B correspond to the directed and the bidirected edges, respectively. When (v,w)\in D, we will write v\to w\in G, and if (v,w)\in B then we will write v\leftrightarrow w\in G. Since edges in B are bidirected, the set B is symmetric, that is, we have (v,w)\in B\iff(w,v)\in B. We require that both the directed part (V,D) and the bidirected part (V,B) contain no self-loops, so that (v,v)\not\in D\cup B for all v\in V. If the directed graph (V,D) does not contain any cycles, that is, there are no vertices v,w_{1},\dots,w_{m}\in V such that v\to w_{1},\ w_{1}\to w_{2},\ \dots,\ w_{m}\to v\in G, then we say that G is acyclic; note, in particular, that G being acyclic does not prevent the bidirected part (V,B) from containing (undirected) cycles.
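
As a computational aside, a mixed graph can be encoded by two adjacency matrices, and acyclicity of the directed part can be checked by repeatedly deleting source nodes. The R sketch below illustrates one possible encoding on a hypothetical example graph; it is not taken from the SEMID package.

n <- 4
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 2] <- D[2, 3] <- D[2, 4] <- 1   # directed edges 1 -> 2, 2 -> 3, 2 -> 4
B[3, 4] <- B[4, 3] <- 1              # bidirected edge 3 <-> 4 (B kept symmetric)

is_acyclic <- function(D) {
  remaining <- rep(TRUE, nrow(D))
  repeat {
    # source nodes: remaining vertices with no remaining parent
    src <- remaining & colSums(D[remaining, , drop = FALSE]) == 0
    if (!any(src)) break
    remaining[src] <- FALSE
  }
  !any(remaining)   # acyclic iff every vertex was eventually deleted
}
is_acyclic(D)   # TRUE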

A path from v to w is any sequence of edges from D or B beginning at v and ending at w; the edges need not obey direction, and loops are allowed. A directed path from v to w is then any path from v to w all of whose edges are directed and point in the same direction, away from v and towards w. Finally, a trek \pi from a source v to a target w is any path that has no colliding arrowheads, that is, \pi must be of the form

v^{L}_{l}\leftarrow v^{L}_{l-1}\leftarrow\dots\leftarrow v^{L}_{0}\leftrightarrow v^{R}_{0}\to v^{R}_{1}\to\dots\to v^{R}_{r-1}\to v^{R}_{r}

or

v^{L}_{l}\leftarrow v^{L}_{l-1}\leftarrow\dots\leftarrow v^{L}_{1}\leftarrow v^{T}\to v^{R}_{1}\to\dots\to v^{R}_{r-1}\to v^{R}_{r}

where v^{L}_{l}=v, v^{R}_{r}=w, and we call v^{T} the top node. If \pi is as in the first case then we let \text{Left}(\pi)=\{v^{L}_{0},\dots,v^{L}_{l}\} and \text{Right}(\pi)=\{v^{R}_{0},\dots,v^{R}_{r}\}; if \pi is as in the second case then we let \text{Left}(\pi)=\{v^{T},v^{L}_{1},\dots,v^{L}_{l}\} and \text{Right}(\pi)=\{v^{T},v^{R}_{1},\dots,v^{R}_{r}\}. Note that, in the second case, v^{T} is included in both \text{Left}(\pi) and \text{Right}(\pi). A trek \pi is called a half-trek if |\text{Left}(\pi)|=1, so that \pi is of the form

v^{L}_{0}\leftrightarrow v^{R}_{0}\to v^{R}_{1}\to\dots\to v^{R}_{r-1}\to v^{R}_{r}

or

v^{T}\to v^{R}_{1}\to\dots\to v^{R}_{r-1}\to v^{R}_{r}.

It will be useful to reference the local neighborhood structure of the graph. For this purpose, for all v\in V, we define the two sets

(2.1)   pa(v)=\{w\in V:w\to v\in G\},
(2.2)   sib(v)=\{w\in V:w\leftrightarrow v\in G\}.

The former comprises the parents of v, and the latter contains the siblings of v.
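
With the adjacency-matrix encoding used earlier, the sets in (2.1) and (2.2) are one-liners in R; the example edges below are hypothetical.

n <- 4
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 3] <- D[2, 3] <- 1              # 1 -> 3, 2 -> 3
B[3, 4] <- B[4, 3] <- 1              # 3 <-> 4

pa  <- function(v, D) which(D[, v] == 1)   # parents: all w with w -> v
sib <- function(v, B) which(B[, v] == 1)   # siblings: all w with w <-> v

pa(3, D)    # 1 2
sib(3, B)   # 4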

We associate a mixed graph G to a linear structural equation model as follows. Let \mathbb{R}^{D} be the set of real n\times n matrices \Lambda=(\lambda_{vw}) with support D, i.e., \lambda_{vw}\not=0\implies(v,w)\in D. Let PD_{n} be the cone of n\times n positive definite matrices \Omega=(\omega_{vw}). Define PD(B)\subset PD_{n} to be the subset of positive definite matrices with support B, i.e., for v\not=w, \omega_{vw}\not=0\implies v\leftrightarrow w\in G.

In this paper, we focus on acyclic graphs G. If G is acyclic then the matrix I-\Lambda is invertible for all \Lambda\in\mathbb{R}^{D}. In other words, the equation system from (1.1) can always be solved uniquely for X. We are led to the following definition.

Definition 2.1.

The linear structural equation model given by an acyclic mixed graph G=(V,D,B) with V=[n] is the collection of all n-dimensional normal distributions with covariance matrix

\Sigma=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1}

for a choice of \Lambda\in\mathbb{R}^{D} and \Omega\in PD(B).

2.2. Prior Work and the HTC

For a fixed acyclic mixed graph G, let \Theta:=\mathbb{R}^{D}\times PD(B) be the parameter space and \phi_{G}:\Theta\to PD_{n} be the map

(2.3)   \phi_{G}:(\Lambda,\Omega)\mapsto(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1}.

Then the question of identifiability is equivalent to asking whether the fiber

\mathcal{F}(\Lambda,\Omega):=\phi_{G}^{-1}(\{\phi_{G}(\Lambda,\Omega)\})

equals the singleton \{(\Lambda,\Omega)\}. We note that the above notions are well-defined also when G is not acyclic, but, in that case, \mathbb{R}^{D} should be restricted to contain only matrices \Lambda with I-\Lambda invertible.

When \mathcal{F}(\Lambda,\Omega)=\{(\Lambda,\Omega)\} for all (\Lambda,\Omega)\in\Theta, so that \phi_{G} is injective on \Theta, the graph G is said to be globally identifiable. Global identifiability is, however, often too strong a condition. So-called instrumental variable problems, for instance, give rise to graphs G that are not globally identifiable but for which the set of (\Lambda,\Omega) on which identifiability fails has measure zero; see the example in the introduction of Foygel et al. (2012a). Instead, we will be concerned with the question of generic identifiability.

Definition 2.2.

A mixed graph G is said to be generically identifiable if there exists a proper algebraic subset A\subset\Theta such that \mathcal{F}(\Lambda,\Omega)=\{(\Lambda,\Omega)\} for all (\Lambda,\Omega)\in\Theta\setminus A.

Here, as usual, an algebraic set is defined as the zero-set of a collection of polynomials. We again refer the reader to the introduction of Foygel et al. (2012a) for an in-depth exposition on why generic identifiability is an often appropriate weakening of global identifiability.

Now there will be cases in which we are interested in understanding the generic identifiability of certain coefficients of a mixed graph G rather than of all coefficients simultaneously. In these cases we say that the coefficient \lambda_{vu} (or \omega_{vu}), for u,v\in V, is generically identifiable in G if the projection of the fiber \mathcal{F}(\Lambda,\Omega) onto \lambda_{vu} (or \omega_{vu}) is a singleton for all (\Lambda,\Omega)\in\Theta\setminus A, where A\subset\Theta is a proper algebraic set.

Let \Lambda and \Omega be matrices of indeterminates as in Equation (1.2), with zero pattern corresponding to G. Then, by the Trek Rule of Wright (1921), see also Spirtes, Glymour, and Scheines (2000), the covariance \Sigma_{vw} can be represented as a sum of monomials corresponding to treks between v and w in G. To state the Trek Rule formally, let \mathcal{T}(v,w) be the set of all treks from v to w in G. Then for any \pi\in\mathcal{T}(v,w), if \pi contains no bidirected edge and has top node z, we define the trek monomial as

\pi(\Lambda,\Omega)=\omega_{zz}\prod_{x\to y\in\pi}\lambda_{xy},

and if \pi contains a bidirected edge connecting u,z\in V then we define the trek monomial as

\pi(\Lambda,\Omega)=\omega_{uz}\prod_{x\to y\in\pi}\lambda_{xy}.

We may then state the rule as follows.

Proposition 2.3 (Trek Rule).

For all v,w\in V, the covariance matrix \Sigma=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1} corresponding to a mixed graph G satisfies

\Sigma_{vw}=\sum_{\pi\in\mathcal{T}(v,w)}\pi(\Lambda,\Omega).
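
As a quick numerical sanity check of Proposition 2.3, consider a hypothetical 2-node graph with edges 1\to 2 and 1\leftrightarrow 2. The only treks from 1 to 2 are 1\to 2 (with top node 1) and 1\leftrightarrow 2, so the Trek Rule gives \Sigma_{12}=\omega_{11}\lambda_{12}+\omega_{12}; the R sketch below confirms that this matches equation (1.2). The numeric values are arbitrary.

lambda12 <- 0.8; omega11 <- 1.2; omega22 <- 0.9; omega12 <- 0.3

Lambda <- matrix(c(0, 0, lambda12, 0), 2, 2)                 # Lambda[1, 2] = lambda12
Omega  <- matrix(c(omega11, omega12, omega12, omega22), 2, 2)
Sigma  <- solve(t(diag(2) - Lambda)) %*% Omega %*% solve(diag(2) - Lambda)

Sigma[1, 2]                      # entry from equation (1.2)
omega11 * lambda12 + omega12     # trek-rule sum; the two numbers agree (1.26)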

Before giving the statement of the HTC, we must first define what is meant by a half-trek system. Let \Pi=\{\pi_{1},\dots,\pi_{m}\} be a collection of m treks with each \pi_{i} having source x_{i} and target y_{i}. Then \Pi is called a system of treks from X=\{x_{1},\dots,x_{m}\} to Y=\{y_{1},\dots,y_{m}\} if |X|=|Y|=m, so that all sources as well as all targets are pairwise distinct. If each \pi_{i} is a half-trek, then \Pi is a system of half-treks. Moreover, a collection \Pi=\{\pi_{1},\dots,\pi_{m}\} of treks is said to have no sided intersection if

\text{Left}(\pi_{i})\cap\text{Left}(\pi_{j})=\varnothing=\text{Right}(\pi_{i})\cap\text{Right}(\pi_{j}),\quad\forall i\not=j.

Let htr(v) be the collection of vertices w\in V\setminus(\{v\}\cup sib(v)) for which there is a half-trek from v to w; these w are called half-trek reachable from v.
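
A small R sketch of one way to compute htr(v) under the adjacency-matrix encoding used earlier (the example graph is hypothetical): a half-trek from v is either a directed path v\to\dots\to w or a path v\leftrightarrow u\to\dots\to w with u\in sib(v), so htr(v) collects the directed descendants of v and of sib(v), with \{v\}\cup sib(v) removed.

n <- 5
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 2] <- D[2, 3] <- D[4, 5] <- 1   # 1 -> 2 -> 3 and 4 -> 5
B[1, 4] <- B[4, 1] <- 1              # 1 <-> 4

reach <- function(D) {               # reach[i, j] is TRUE iff there is a directed path i -> ... -> j
  R <- diag(nrow(D)) | D
  for (k in seq_len(nrow(D))) R <- R | ((R %*% R) > 0)
  R
}

htr <- function(v, D, B) {
  R <- reach(D)
  sib_v    <- which(B[v, ] == 1)
  from_v   <- setdiff(which(R[v, ]), v)                      # proper descendants of v
  from_sib <- which(colSums(R[sib_v, , drop = FALSE]) > 0)   # descendants of siblings of v
  setdiff(union(from_v, from_sib), c(v, sib_v))
}
htr(1, D, B)   # 2 3 5

With this notation in place, we have the following definition and result of Foygel et al. (2012a).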

Definition 2.4.

A set of nodes Y\subset V satisfies the half-trek criterion with respect to a node v\in V if

  1. (i)

    |Y|=|pa(v)|,

  2. (ii)

    Y\cap(\{v\}\cup sib(v))=\varnothing, and

  3. (iii)

    there is a system of half-treks with no sided intersection from Y to pa(v).

Theorem 2.5 (HTC-identifiability).

Let (Y_{v}:v\in V) be a family of subsets of the vertex set V of a mixed graph G. If, for each node v, the set Y_{v} satisfies the half-trek criterion with respect to v, and there is a total ordering \prec on the vertex set V such that w\prec v whenever w\in Y_{v}\cap htr(v), then G is rationally identifiable.

The assertion that G is rationally identifiable means that the inverse map \phi_{G}^{-1} can be represented as a rational function on \Theta\setminus A, where A is some proper algebraic subset of \Theta. Clearly, rational identifiability is a stronger condition than generic identifiability. If a graph G satisfies the conditions of Theorem 2.5 we will say that G is HTC-identifiable (HTCI). In a similar vein, Theorem 2 of Foygel et al. (2012a) gives sufficient conditions for a graph G to be generically unidentifiable (with generically infinite fibers of \phi_{G}), and we will call such graphs HTC-unidentifiable (HTCU). Graphs that are neither HTCI nor HTCU are called HTC-inconclusive; these are the graphs on which progress remains to be made.

As noted in Section 8 of Foygel et al. (2012a), we may extend the power of the HTC by using the graph decomposition techniques of Tian (2005). Let C_{1},\dots,C_{k}\subset V be the unique partitioning of V where v,w\in C_{i} if and only if there exists a (possibly empty) path from v to w composed of only bidirected edges. In other words, C_{1},\dots,C_{k} are the connected components of (V,B), the bidirected part of G. For i=1,\dots,k, let

V_{i}=C_{i}\cup pa(C_{i}),\qquad D_{i}=\{v\to w\in G:v\in V_{i},\ w\in C_{i}\},
B_{i}=\{v\leftrightarrow w\in G:v,w\in C_{i}\},\qquad G_{i}=(V_{i},D_{i},B_{i}).

Then the mixed graphs G_{1},\dots,G_{k} are called the mixed components of G. From the work of Tian (2005), Foygel et al. (2012a) present the following theorem.
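
A sketch of how the mixed components might be computed in R, under the adjacency-matrix encoding used earlier; the example graph is hypothetical and this is not the implementation used for the experiments in Section 4.

n <- 5
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 2] <- D[2, 3] <- D[1, 4] <- D[4, 5] <- 1
B[2, 3] <- B[3, 2] <- 1; B[4, 5] <- B[5, 4] <- 1

bidirected_components <- function(B) {      # the C_i: connected components of (V, B)
  n <- nrow(B)
  R <- diag(n) | B
  for (k in seq_len(n)) R <- R | ((R %*% R) > 0)
  split(seq_len(n), apply(R, 1, function(row) min(which(row))))
}

mixed_component <- function(C, D, B) {      # G_i = (V_i, D_i, B_i) for a component C = C_i
  n <- ncol(D)
  pa_C <- which(rowSums(D[, C, drop = FALSE]) > 0)       # pa(C): parents of nodes in C
  Vi <- sort(union(C, pa_C))
  Di <- D; Di[, setdiff(seq_len(n), C)] <- 0             # keep only directed edges pointing into C
  Bi <- B; Bi[setdiff(seq_len(n), C), ] <- 0; Bi[, setdiff(seq_len(n), C)] <- 0
  list(V = Vi, D = Di[Vi, Vi, drop = FALSE], B = Bi[Vi, Vi, drop = FALSE])
}

comps <- bidirected_components(B)           # components {1}, {2, 3}, {4, 5}
lapply(comps, mixed_component, D = D, B = B)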

Theorem 2.6 (Tian Decomposition).

For an acyclic mixed graph G with mixed components G_{1},\dots,G_{k}, the following holds:

  1. (i)

    G is rationally (or generically) identifiable if and only if all components G_{1},\dots,G_{k} are rationally (or generically) identifiable;

  2. (ii)

    G is generically infinite-to-one if and only if there exists a component G_{j} that is generically infinite-to-one;

  3. (iii)

    if each G_{j} is generically h_{j}-to-one with h_{j}<\infty, then G is generically h-to-one with h=\prod_{j=1}^{k}h_{j}.

We remark that this decomposition also plays a role in non-linear models; see, for instance, the paper of Shpitser et al. (2014) and the references given therein.

3. Ancestral Decomposition

For a later strengthening of the HTC, we will show that the generic identification of certain subgraphs of an acyclic mixed graph G=(V,D,B) implies the generic identification of their associated edge coefficients in the larger graph G. This result is straightforward and is well known in other forms. Surprisingly, however, this simple idea can extend the applicability of the HTC when combined with the decomposition from Theorem 2.6. We first define what we mean by an ancestral subset and an induced subgraph.

Definition 3.1.

Let V^{\prime}\subset V be a subset of vertices. The ancestors of V^{\prime} form the set

An(V^{\prime})=\{v\in V:\text{there exists a directed path from $v$ to some $w\in V^{\prime}$}\},

where we consider the empty path to be directed so that V^{\prime}\subset An(V^{\prime}). If V^{\prime}=An(V^{\prime}), then we call V^{\prime} ancestral.

Definition 3.2.

Let V^{\prime}\subset V again be a subset of vertices. The subgraph of G induced by V^{\prime} is the mixed graph G_{V^{\prime}}=(V^{\prime},D^{\prime},B^{\prime}) with

D^{\prime}=\{v\to w\in G:v,w\in V^{\prime}\},
B^{\prime}=\{v\leftrightarrow w\in G:v,w\in V^{\prime}\}.
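
Both definitions are easy to compute under the adjacency-matrix encoding used earlier; the R sketch below (with a hypothetical example graph) computes An(V^{\prime}) via the transitive closure of the directed part and then forms the induced subgraph.

n <- 5
D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 2] <- D[2, 3] <- D[4, 5] <- 1; B[2, 5] <- B[5, 2] <- 1

ancestors <- function(Vp, D) {     # An(V'): all v with a directed path (possibly empty) into V'
  n <- nrow(D)
  R <- diag(n) | D
  for (k in seq_len(n)) R <- R | ((R %*% R) > 0)
  which(rowSums(R[, Vp, drop = FALSE]) > 0)
}

induced_subgraph <- function(Vp, D, B)   # G_{V'}: keep only edges with both endpoints in V'
  list(V = Vp, D = D[Vp, Vp, drop = FALSE], B = B[Vp, Vp, drop = FALSE])

An3 <- ancestors(3, D)          # 1 2 3: the set {3} is not ancestral, but An({3}) is
induced_subgraph(An3, D, B)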

We now have the following simple fact.

Theorem 3.3.

Let G=(V,D,B) be a mixed graph, and let V^{\prime} be an ancestral subset of V. If the induced subgraph G_{V^{\prime}} is generically (or rationally) identifiable then so are all the corresponding edge coefficients in G.

Proof.

Let the covariance matrix \Sigma=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1} correspond to G, that is, \Lambda\in\mathbb{R}^{D} and \Omega\in PD(B). Let \Lambda^{\prime} and \Omega^{\prime} denote the V^{\prime}\times V^{\prime} submatrices of \Lambda and \Omega, respectively, and let \Sigma^{\prime}=(I_{|V^{\prime}|}-\Lambda^{\prime})^{-T}\Omega^{\prime}(I_{|V^{\prime}|}-\Lambda^{\prime})^{-1}, where I_{|V^{\prime}|} is the |V^{\prime}|\times|V^{\prime}| identity matrix. For ease of notation, write G^{\prime}=G_{V^{\prime}}.

Recall that for any v,w\in V, the set \mathcal{T}(v,w) comprises all treks between v and w in G. Similarly, write \mathcal{T}_{G^{\prime}}(v,w) for the set of treks between v and w in G^{\prime}. Since V^{\prime} is ancestral, it holds that \mathcal{T}(v,w)=\mathcal{T}_{G^{\prime}}(v,w) for all v,w\in V^{\prime}. Thus, by Proposition 2.3, we have that for any v,w\in V^{\prime}

\Sigma_{vw}=\sum_{\pi\in\mathcal{T}(v,w)}\pi(\Lambda,\Omega)=\sum_{\pi\in\mathcal{T}_{G^{\prime}}(v,w)}\pi(\Lambda^{\prime},\Omega^{\prime})=\Sigma_{vw}^{\prime}.

Now suppose that G^{\prime} is generically (or rationally) identifiable. Then \Lambda^{\prime},\Omega^{\prime} can be generically (or rationally) recovered from \Sigma^{\prime}. As we have just shown that \Sigma_{vw}=\Sigma_{vw}^{\prime} for all v,w\in V^{\prime}, it follows that the entries of \Lambda,\Omega corresponding to \Lambda^{\prime},\Omega^{\prime} can be recovered from \Sigma generically (or rationally). ∎
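
The key identity in this proof, namely that \Sigma restricted to an ancestral set V^{\prime} coincides with the covariance matrix \Sigma^{\prime} of the induced subgraph, is easy to check numerically. A hypothetical example in R (arbitrary numeric values, graph 1\to 2\to 3\to 4 with 1\leftrightarrow 2 and ancestral set V^{\prime}=\{1,2,3\}):

n <- 4
Lambda <- matrix(0, n, n); Omega <- diag(n)
Lambda[1, 2] <- 0.5; Lambda[2, 3] <- 0.7; Lambda[3, 4] <- -0.2   # 1 -> 2 -> 3 -> 4
Omega[1, 2] <- Omega[2, 1] <- 0.3                                # 1 <-> 2

Sigma <- solve(t(diag(n) - Lambda)) %*% Omega %*% solve(diag(n) - Lambda)

Vp <- 1:3                                   # ancestral, since An({1,2,3}) = {1,2,3}
Lp <- Lambda[Vp, Vp]; Op <- Omega[Vp, Vp]
Sigmap <- solve(t(diag(length(Vp)) - Lp)) %*% Op %*% solve(diag(length(Vp)) - Lp)

max(abs(Sigma[Vp, Vp] - Sigmap))            # 0 up to rounding error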

We may generalize the above theorem so that we do not have to consider the identifiability of all of G^{\prime} and can instead only look at certain edges in G^{\prime}.

Corollary 3.4.

Let G=(V,D,B) be a mixed graph, and let V^{\prime} be an ancestral subset of V. If an edge coefficient of G_{V^{\prime}} is generically (or rationally) identifiable then so is the corresponding coefficient in G.

Proof.

This follows exactly as in the proof of Theorem 3.3 by only considering a single generically (or rationally) identifiable coefficient of G^{\prime} at a time. ∎

We give an example of how Theorem 3.3 strengthens the HTC.

(a) An HTC-inconclusive graph G. (b) The induced subgraph G_{\{1,2,3,4,5\}}. (c) The mixed components of G_{\{1,2,3,4,5\}}.
Figure 1. A mixed graph G, a subgraph induced by an ancestral subset, and the mixed components of the induced graph.
Example 3.5.

It is straightforward to check that the graph G from Figure 1(a) is HTC-inconclusive using Algorithm 1 from Foygel et al. (2012a). We direct the reader who does not want to perform this computation by hand to the R package SEMID (Foygel and Drton, 2013; R Core Team, 2014). Moreover, G cannot be decomposed, as its bidirected part is connected.

Now the set V^{\prime}=\{1,2,3,4,5\} is ancestral in G, so we may apply Theorem 3.3 to the induced subgraph G^{\prime}=G_{\{1,2,3,4,5\}}. While G^{\prime} remains HTC-inconclusive, the Tian decomposition of Theorem 2.6 is applicable. After decomposing G^{\prime} into its mixed components, see Figure 1(c), we find that each component is HTC-identifiable and thus G^{\prime} itself is generically identifiable. To show generic identifiability of G, we are left to show that all the coefficients on the directed edges between pa(6)=\{1,2\} and 6 are generically identifiable. Since Y=\{3,4\} satisfies the HTC with respect to 6, it follows, by Lemma 3.6 below, that \Lambda_{pa(6),6} is generically identifiable. Hence, the entire matrix \Lambda is generically identifiable, and since (I-\Lambda)^{T}\Sigma(I-\Lambda)=\Omega, this implies generic identifiability of \Omega. We conclude that G is generically identifiable despite being HTC-inconclusive.

Lemma 3.6.

Let G=(V,D,B) be a mixed graph, and let v\in V. If Y\subset V satisfies the HTC with respect to v and for each y\in Y we have that \Lambda_{pa(y),y} is generically (or rationally) identifiable, then \Lambda_{pa(v),v} is generically (or rationally) identifiable.

Proof.

Suppose vertex v has m parents, and let pa(v)=\{p_{1},\dots,p_{m}\}. Since Y satisfies the HTC for v, we must have |Y|=|pa(v)|=m. Thus we may enumerate the set as Y=\{y_{1},\dots,y_{m}\}. Define a matrix A\in\mathbb{R}^{m\times m} with entries

A_{ij}:=\begin{cases}[(I-\Lambda)^{T}\Sigma]_{y_{i}p_{j}}&\text{if }y_{i}\in htr(v),\\ \Sigma_{y_{i}p_{j}}&\text{if }y_{i}\not\in htr(v),\end{cases}

and define a vector b\in\mathbb{R}^{m} with entries

b_{i}:=\begin{cases}[(I-\Lambda)^{T}\Sigma]_{y_{i}v}&\text{if }y_{i}\in htr(v),\\ \Sigma_{y_{i}v}&\text{if }y_{i}\not\in htr(v).\end{cases}

Both A and b are generically identifiable because we have assumed that \Lambda_{pa(y),y} is generically identifiable for every y\in Y. Now, from the proof of Theorem 1 in Foygel et al. (2012a), we have A\cdot\Lambda_{pa(v),v}=b, and from Lemma 2 of Foygel et al. (2012a) we deduce that A is generically invertible. It follows that \Lambda_{pa(v),v}=A^{-1}b generically, so that \Lambda_{pa(v),v} is generically identifiable. ∎
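
To illustrate the linear system A\cdot\Lambda_{pa(v),v}=b, consider the classical instrumental-variable graph 1\to 2\to 3 with 2\leftrightarrow 3 (a hypothetical numeric example). Taking v=3 and Y=\{1\}, the HTC holds for 3, and since 1 is not half-trek reachable from 3, both A and b reduce to plain covariances; the R sketch below recovers \lambda_{23}=\Sigma_{13}/\Sigma_{12}. When some y_{i}\in htr(v), the corresponding rows would instead use [(I-\Lambda)^{T}\Sigma] with the previously identified columns of \Lambda.

n <- 3
Lambda <- matrix(0, n, n); Lambda[1, 2] <- 0.6; Lambda[2, 3] <- -0.9
Omega <- diag(n); Omega[2, 3] <- Omega[3, 2] <- 0.4
Sigma <- solve(t(diag(n) - Lambda)) %*% Omega %*% solve(diag(n) - Lambda)

v <- 3; pa_v <- 2; Y <- 1
A <- Sigma[Y, pa_v, drop = FALSE]   # y_1 = 1 is not in htr(3), so A_11 = Sigma_{1,2}
b <- Sigma[Y, v]                    # and b_1 = Sigma_{1,3}
solve(A, b)                         # recovers Lambda[2, 3] = -0.9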

Algorithm 1 from Foygel et al. (2012a) (hereafter called the HTC-algorithm) determines whether or not a mixed graph G=(V,D,B) satisfies the conditions of Theorem 2.5 and thus checks if the graph is HTCI. The HTC-algorithm operates by iteratively looping through the nodes v\in V and attempting to find a half-trek system \Pi to pa(v) with \Pi having sources in a set of “allowed” nodes A\subset V. Here, a node w is allowed if w\not\in htr(v)\cup\{v\}\cup sib(v) or if w was previously shown to be generically identifiable in the sense that all coefficients on directed edges u\to w were shown to be generically identifiable. If such a half-trek system \Pi is found for node v, then Foygel et al. (2012a) show that this implies that v is generically identifiable, and thus v may be added to the set of allowed nodes for the remaining iterations. The algorithm terminates when all nodes have been shown to be generically identifiable or once it has iterated through all vertices and has been unable to show the generic identifiability of any new nodes. To find a half-trek system between a suitable subset of the set of allowed nodes A and pa(v), the HTC-algorithm solves a Max Flow problem on an auxiliary network G_{\text{flow}}(A,v), and this step takes \mathcal{O}(|V|^{3}) time when G is acyclic. If in G_{\text{flow}}(A,v) one can find a flow of size |pa(v)| then the half-trek system exists. See Section 6 of Foygel et al. (2012a) for more details about how G_{\text{flow}}(A,v) is defined. Finally, for an acyclic mixed graph G, the HTC-algorithm has a worst-case running time of \mathcal{O}(|V|^{5}).

Algorithm 1 presents a simple modification of the HTC-algorithm that leverages Corollary 3.4, extending the ability of the HTC to determine the generic identifiability of acyclic mixed graphs. We emphasize that this algorithm considers only certain ancestral subsets and, as such, we do not necessarily expect the algorithm to reach a conclusion in all cases in which Corollary 3.4 may be applicable.

Algorithm 1 A sufficient test for generic identifiability.
1:Input: G=(V,D,B), an acyclic mixed graph on n nodes v_{1},\dots,v_{n}
2:Initialize: SolvedNodes \leftarrow\{v:pa(v)=\varnothing\}.
3:repeat
4:  for v=v_{1},v_{2},\dots,v_{n} do
5:   if v\not\in SolvedNodes then
6:    \triangleright Check if we can generically identify \Lambda_{pa(v),v} using
7:    \triangleright the induced graph G_{An(\{v\}\cup(A\cap S))}.
8:    S\leftarrow An(v)\cup sib(An(v))
9:    A\leftarrow S\cap(\text{SolvedNodes}\cup(V\setminus htr(v)))\setminus(\{v\}\cup sib(v))
10:    G^{\prime}\leftarrow the mixed component of G_{An(\{v\}\cup(A\cap S))} containing v
11:    A\leftarrow(A\cap(\text{vertices of }G^{\prime}))\cup(\text{source nodes in }G^{\prime})
12:    if MaxFlow(G^{\prime}_{\mathrm{flow}}(v,A))=|pa(v)| then
13:     SolvedNodes \leftarrow SolvedNodes \cup\{v\}.
14:     Skip to the next iteration of the loop
15:    end if
16:    \triangleright Check if we can generically identify \Lambda_{pa(v),v} using
17:    \triangleright the induced graph G_{An(\{v\})}.
18:    G^{\prime}\leftarrow the mixed component of G_{An(v)} containing v
19:    A\leftarrow(A\cap(\text{vertices of }G^{\prime}))\cup(\text{source nodes in }G^{\prime})
20:    if MaxFlow(G^{\prime}_{\mathrm{flow}}(v,A))=|pa(v)| then
21:     SolvedNodes \leftarrow SolvedNodes \cup\{v\}.
22:    end if
23:   end if
24:  end for
25:until SolvedNodes =V or no change has occurred in the last iteration.
26:Output: “yes” if SolvedNodes =V, “no” otherwise.
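
The set computations in lines 8 and 9 of Algorithm 1, together with the ancestral vertex set underlying line 10, can be sketched in R as follows, reusing the adjacency-matrix encoding and the reachability helper from earlier; the mixed-component extraction and the max-flow test are omitted, and the example inputs are hypothetical.

reach <- function(D) {
  R <- diag(nrow(D)) | D
  for (k in seq_len(nrow(D))) R <- R | ((R %*% R) > 0)
  R
}

restricted_sets <- function(v, D, B, solved) {
  n <- nrow(D); R <- reach(D)
  sib_v <- which(B[v, ] == 1)
  An_v  <- which(R[, v] > 0)                                 # An(v), including v itself
  S     <- union(An_v, which(colSums(B[An_v, , drop = FALSE]) > 0))          # line 8
  htr_v <- setdiff(union(which(R[v, ]), which(colSums(R[sib_v, , drop = FALSE]) > 0)),
                   c(v, sib_v))
  A     <- setdiff(intersect(S, union(solved, setdiff(seq_len(n), htr_v))),
                   c(v, sib_v))                                              # line 9
  Vanc  <- which(rowSums(R[, union(v, intersect(A, S)), drop = FALSE]) > 0)  # An({v} union (A int S))
  list(S = S, allowed = A, ancestral_vertices = Vanc)
}

# Example use on a small hypothetical graph:
n <- 4; D <- matrix(0, n, n); B <- matrix(0, n, n)
D[1, 3] <- D[2, 3] <- D[3, 4] <- 1; B[3, 4] <- B[4, 3] <- 1
restricted_sets(3, D, B, solved = c(1, 2))
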
Proposition 3.7.

Algorithm 1 returns “yes” only if the input acyclic mixed graph G is generically identifiable and will return “yes” whenever the HTC-algorithm does. Moreover, Algorithm 1 returns “yes” for the HTC-inconclusive graph in Figure 1(a) and has time complexity at most \mathcal{O}(|V|^{5}).

Proof.

The fact that Algorithm 1 only returns “yes” if G is generically identifiable follows from Theorem 7 in Foygel et al. (2012a, b) and our Corollary 3.4. That Algorithm 1 returns “yes” whenever the HTC-algorithm does can be argued as follows:

If, for a set of allowed nodes A and v\in V, there exists Y\subset A satisfying the HTC for v, then we must have that Y\subset S:=An(v)\cup sib(An(v)) and that Y satisfies the HTC for v in G_{An(\{v\}\cup(A\cap S))}. Lemma 4 of Foygel et al. (2012b) then yields that there exists a set Y^{\prime} of allowed nodes which satisfies the HTC for v in the mixed component of G_{An(\{v\}\cup(A\cap S))} containing v. Hence, if v is added to the set of solved nodes in the HTC-algorithm, it will also be added to the set of solved nodes in Algorithm 1. From this it follows that if the HTC-algorithm outputs “yes” then so will Algorithm 1.

It is straightforward to check that Algorithm 1 returns “yes” for the graph in Figure 1(a) and, thus, it remains only to argue that the time complexity is at most \mathcal{O}(|V|^{5}). Note that the max-flow computation for this problem has a running time of \mathcal{O}(|V|^{3}) since G is acyclic; see Foygel et al. (2012a) for details. It is easy to see that this running time dominates each iteration. Moreover, since at the end of each pass through the |V| nodes of the graph the algorithm must either terminate or add at least one vertex to the set of solved nodes, there are at most |V|^{2} iterations in total. We conclude that the maximum run time of the algorithm is \mathcal{O}(|V|^{2}\cdot|V|^{3})=\mathcal{O}(|V|^{5}). ∎

Remark 3.8.

One might expect that lines 18 to 22 in Algorithm 1 are superfluous. This is, however, false: we have found examples of graphs G on 10 nodes for which Algorithm 1 returns “yes” but the corresponding algorithm with lines 18 to 22 removed returns “no.” As these examples are fairly large we have chosen not to display them here.

4. Computational Experiments

We now run a simulation study to examine the effect of applying Algorithm 1 to HTC-inconclusive graphs. All code is written in R, and we use the SEMID package to determine HTC-identifiability and HTC-unidentifiability (R Core Team, 2014; Foygel and Drton, 2013).

Algorithm 2 A procedure for generating random acyclic mixed graphs.
1:Input: A positive integer n and 0<p,q<1
2:Initialize: A mixed graph G=(V,D,B) with V=\{1,\dots,n\}, D=B=\varnothing
3: Pick a random collection E of n-1 bidirected edges so that (V,E) is a tree.
4:B\leftarrow E
5:for 1\leq i<j\leq n do
6:  Add i\leftrightarrow j to B with probability p
7:end for
8:for 1\leq i<j\leq n do
9:  Add i\to j to D with probability q
10:end for
11:Output: G
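
A minimal R sketch implementing Algorithm 2. The random tree in line 3 is generated here by attaching each vertex i > 1 to a uniformly chosen earlier vertex; this is just one simple way to obtain a tree and is an assumption, not necessarily the sampler used for the experiments reported below.

random_acyclic_mixed_graph <- function(n, p, q) {
  D <- matrix(0, n, n); B <- matrix(0, n, n)
  for (i in 2:n) {                                # line 3: a random bidirected spanning tree
    j <- sample(seq_len(i - 1), 1)
    B[i, j] <- B[j, i] <- 1
  }
  for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    if (runif(1) < p) B[i, j] <- B[j, i] <- 1     # lines 5-7: extra bidirected edges
    if (runif(1) < q) D[i, j] <- 1                # lines 8-10: directed edges i -> j with i < j, hence acyclic
  }
  list(D = D, B = B)
}

set.seed(1)
G <- random_acyclic_mixed_graph(n = 6, p = 0.2, q = 0.4)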

For each combination of n\in\{6,8,10,12\}, p\in\{.1,.2,.3\}, and q\in\{.2,.3,.4,.5,.6\} we perform the following steps:

  1. (i)

    Use Algorithm 2 with probability parameters p and q to generate random acyclic mixed graphs G with connected bidirected part on n nodes, until we have found 1000 graphs which are HTC-inconclusive.

  2. (ii)

    For each of the 1000 HTC-inconclusive graphs G, use Algorithm 1 to test the generic identifiability of G.

  3. (iii)

    Record the proportion of the 1000 graphs that are shown to be generically identifiable by Algorithm 1. Call this proportion a_{n,p,q}.

To summarize our findings we compute, for each pair (n,q), the average b_{n,q}=\frac{1}{3}\sum_{p}a_{n,p,q}. We then plot the values of b_{n,q} in Figure 2. According to this figure, Algorithm 1 provides a modest increase in the number of graphs that are shown to be generically identifiable. This improvement is largest when q is large, that is, when the directed part of the mixed graph is dense.


Figure 2. The average proportion of HTC-inconclusive graphs found to be generically identifiable by Algorithm 1.

5. Conclusion

We have shown how the generic identifiability of a subgraph of a mixed graph G induced by an ancestral subset of vertices implies the generic identifiability of the corresponding edge coefficients in G (Theorem 3.3 and Corollary 3.4). We then provided, in Algorithm 1, one specific way to leverage this result by using the HTC of Foygel et al. (2012a) and the decomposition techniques of Tian (2005). Our new algorithm provides a modest strengthening of the HTC while not increasing the algorithmic complexity of the HTC-algorithm of Foygel et al. (2012a).

In saying that Algorithm 1 constitutes only one specific way to leverage Corollary 3.4, we mean that the algorithm considers only certain ancestral subsets. While we do not have any examples to report, it is possible that there are acyclic mixed graphs for which Algorithm 1 does not return “yes” but which could be proven generically identifiable by a combination of Corollary 3.4, the Tian decomposition, and the HTC-algorithm. This said, it is not clear to us that all ancestral subsets could be considered in an algorithm with polynomial run time. Clarifying this issue would be an interesting topic for future work.

Acknowledgments

We would like to thank Thomas Richardson whose questions following a seminar talk started this project. This work was partially supported by the U.S. National Science Foundation (DMS-1305154) and National Security Agency (H98230-14-1-0119).

References

  • Bollen (1989) K. A. Bollen. Structural equations with latent variables. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons, Inc., New York, 1989. A Wiley-Interscience Publication.
  • Brito and Pearl (2006) C. Brito and J. Pearl. Graphical condition for identification in recursive SEM. In Proceedings of the Twenty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), pages 47–54. AUAI Press, 2006.
  • Chen et al. (2014) B. Chen, J. Tian, and J. Pearl. Testable implications of linear structural equations models. In C. E. Brodley and P. Stone, editors, Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pages 2424–2430. AAAI Press, 2014.
  • Drton et al. (2011) M. Drton, R. Foygel, and S. Sullivant. Global identifiability of linear structural equation models. Ann. Statist., 39(2):865–886, 2011.
  • Foygel and Drton (2013) R. Foygel and M. Drton. SEMID: Identifiability of linear structural equation models, 2013. R package version 0.1.
  • Foygel et al. (2012a) R. Foygel, J. Draisma, and M. Drton. Half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist., 40(3):1682–1713, 2012a.
  • Foygel et al. (2012b) R. Foygel, J. Draisma, and M. Drton. Supplement to “Half-trek criterion for generic identifiability of linear structural equation models.” Ann. Statist., 40(3), 2012b.
  • Pearl (2009) J. Pearl. Causality. Cambridge University Press, Cambridge, second edition, 2009. Models, reasoning, and inference.
  • R Core Team (2014) R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2014.
  • Shpitser et al. (2014) I. Shpitser, R. J. Evans, T. S. Richardson, and J. M. Robins. Introduction to nested Markov models. Behaviormetrika, 41(1):3–39, 2014.
  • Spirtes et al. (2000) P. Spirtes, C. Glymour, and R. Scheines. Causation, prediction, and search. MIT Press, Cambridge, MA, 2nd edition, 2000.
  • Tian (2005) J. Tian. Identifying direct causal effects in linear models. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 346–352. AAAI Press/The MIT Press, 2005.
  • Wright (1921) S. Wright. Correlation and causation. J. Agricultural Research, 20:557–585, 1921.
  • Wright (1934) S. Wright. The method of path coefficients. Ann. Math. Statist., 5:161–215, 1934.