
Topology Identification under Spatially Correlated Noise

Mishfad Shaikh Veedu    Murti V. Salapaka Department of Electrical and Computer Engineering, University of Minnesota, MN 55455, USA (e-mail: {veedu002, murtis}@umn.edu).
Abstract

This article addresses the problem of reconstructing the topology of a network of agents interacting via linear dynamics, while being excited by exogenous stochastic sources that are possibly correlated across the agents, from time-series measurements alone. It is shown that, under the assumption that the correlations are affine in nature, such a network of nodal interactions is equivalent to a network with added agents. The added agents are represented by nodes that are latent, where no corresponding time-series measurements are available; however, in the expanded network all the exogenous excitations are spatially (that is, across agents) uncorrelated. Generalizing affine correlations, it is shown that, under polynomial correlations, the latent nodes in the expanded networks can be excited by clusters of noise sources, where the clusters are uncorrelated with each other. The clusters can be replaced with a single noise source if the latent nodes are allowed to have non-linear interactions. Finally, using the sparse plus low-rank matrix decomposition of the imaginary part of the inverse power spectral density matrix (IPSDM) of the time-series data, the topology of the network is reconstructed. Under nonconservative assumptions, the correlation graph of the noise sources is retrieved.

keywords:
Linear dynamical systems, time-series analysis, probabilistic graphical models, network topology identification, power spectral density, sparse estimation, latent nodes, structure learning, learning and control, sensor placement.
thanks: This work is supported by NSF through the project titled "RAPID: COVID-19 Transmission Network Reconstruction from Time-Series Data" under Award Number 2030096.


1 Introduction

Networks and graphical models provide convenient tools for effective representations of complex high dimensional multi-agent systems. Such a representation is useful in applications including power grids [32, 47], meteorology [19], neuroscience [4], and finance [5]. Knowledge of the network interaction structure (also known as the network topology) helps in understanding, predicting, and, in many applications, controlling/identifying the system behavior [16, 24, 28, 35, 39, 45]. In applications such as power grids, finance, and meteorological systems, it is either difficult or impossible to intervene and affect the system. Hence, inferring the network properties by passive means, such as time-series measurements, is of great interest in these applications. An example is unveiling the correlation structure between the stocks in the stock market from daily share prices [5], which is useful in predicting market behavior.

Learning the conditional independence relations between variables from time-series measurements is an active research field in the machine learning (ML), probabilistic graphical model (PGM), and statistics communities [6, 8, 29, 30, 33, 38], where the system modules are considered as random variables. However, such studies fail to capture dynamic dependencies between the entities in a system, which are prevalent in most of the aforementioned applications. For dynamical systems, autoregressive (AR) models that are excited by exogenous Gaussian noise sources, independent across time and variables, are explored in [1, 2, 3, 40]. Here, the graph topology captures the sparsity pattern of the regressor coefficient matrices, which characterizes the conditional independence between the variables. It was shown that the sparsity pattern of the inverse power spectral density matrix (IPSDM), also known as the concentration matrix, identifies the conditional independence relations between the variables (see [3, 40]). As shown in [27], the conditional independence structure between the variables is equivalent to the moral-graph structure of the underlying directed graph. For multi-agent systems with linear dynamical interactions, excited by wide sense stationary (WSS) noise sources that are mutually uncorrelated across the agents, moral-graph reconstruction using Wiener projection has gained popularity in the last decade [21, 25, 27]. Here, the moral-graph of the underlying linear dynamic influence model (LDIM) is recovered from the magnitude response of the Wiener coefficients, obtained from time-series measurements. For a wide range of applications, the spurious connections in the moral-graph can be identified, returning the true topology, by observing the phase response of the Wiener coefficients [43]. Furthermore, for systems with strictly causal dynamical dependencies, Granger causality based algorithms can unveil the exact cause-effect nature of the interactions, thus recovering the exact parent-child relationships of the underlying graph [14, 15, 27].

In many networks, time-series measurements are available at only a subset of the nodes. The nodes where time-series measurements are not available form the latent/confounding/hidden nodes. Topology reconstruction is more challenging in the presence of latent nodes, as additional spurious connections due to confounding effects are formed when applying the aforementioned techniques. AR model identification of the independent Gaussian time-series discussed above, in the presence of latent nodes, was studied in [9, 10, 11, 13, 17, 22, 48, 49]. Here, the primary goal is to eliminate the spurious connections due to latent nodes and retrieve the original conditional independence structure from the observed time-series. In applications such as power grids, there is a need to retrieve the complete topology of the network, including that of the latent nodes. Such problems are studied for bidirected tree [42], poly-forest [36], and poly-tree [26, 37] networks excited by WSS noise sources that are uncorrelated across the agents. Recently, an approach to reconstruct the complete topology of a general linear dynamical network with WSS noise was provided in [46]. The work in [46] can be considered a generalization of AR model identification with latent nodes, in the asymptotic time-series regime. However, one major caveat of the aforementioned literature on graphical models is that the results fail if the exogenous noise sources are spatially correlated, i.e., if the noise is correlated across the agents/variables. Prior works have studied systems with spatially correlated noise sources [18, 28, 35]; however, these studies assume knowledge of the network topology. In a related work, [34] studied topology estimation under spatially correlated noise sources and used the estimated topology in local module identification.

This article studies the problem of topology identification for LDIMs that allow spatially correlated noise sources, similar to the problem in [34]. However, this article provides an alternate treatment, where the noise correlations are transformed into latent nodes. This transformation enables one to gain additional insights and to apply techniques from topology identification with latent nodes to solve the problem.

The first major result of this article is a transformation that converts an LDIM without latent nodes, but excited by spatially correlated exogenous noise sources, to LDIMs with latent nodes. Here, the latent nodes are characterized by the maximal cliques in the correlation graph, the undirected graph that represents the spatial correlation structure. It is shown that, under the affine correlation assumption, that is, when the correlated noise sources are related in an affine way (Assumption 1), there exist transformed LDIMs with latent nodes, where all the nodes are excited by spatially uncorrelated noise sources. A key feature of the transformation is that the correlations are completely captured using the latent nodes, while the original topology remains unaltered. The original moral-graph/topology is the same as the moral-graph/topology among the observed nodes in the altered graph. Thus, the transformed problem is shown to be equivalent to topology identification of networks with latent nodes. Consequently, any of the aforementioned techniques for networks with latent nodes can be applied on the transformed LDIM to reconstruct the original moral-graph/topology.

Next, relaxing the affine correlation assumption, polynomial correlation is considered (Assumption 2); here, the focus is on noise sources with distributions that are symmetric around the mean. It is shown that, in this scenario, the transformed dynamical model can be excited by clusters of noise sources, where the clusters are uncorrelated. However, the noise sources inside the clusters can be correlated. Using the sparse plus low-rank decomposition of the IPSDM from [46], the true topology of the network, along with the correlation structure, is reconstructed from the IPSDM of the original LDIM, without any additional information, if the network satisfies a necessary and sufficient condition. Notice that the results discussed here are also applicable to networks with static random variables and to AR models, when the exogenous noise sources are spatially correlated.

The article is organized as follows. Section 2 introduces the system model, including the LDIM, and the essential definitions. Section 3 discusses IPSDM based topology reconstruction. Section 4 describes the correlation graph to latent nodes transformation and some major results. Section 5 discusses the transformation of LDIMs with polynomial correlation to LDIMs with latent nodes. Section 6 explains the sparse plus low-rank decomposition technique and how the topology can be reconstructed without knowledge of the correlation graph. Simulation results are provided in Section 7. Finally, Section 8 concludes the article.

Notations: Bold capital letters, $\mathbf{A}$, denote matrices; $A_{ij}$ and $(\mathbf{A})_{ij}$ denote the $(i,j)$-th entry of $\mathbf{A}$. Bold small letters, $\mathbf{v}$, denote vectors; $v_i$ denotes the $i$-th entry of $\mathbf{v}$. $[n]:=\{1,\dots,n\}$. The subscript $o$ denotes the observed node index set $[n]$ and the subscript $h$ denotes the latent node index set $\{n+1,\dots,n+L\}$; for example, $\mathbf{x}_o:=\{x_1,\dots,x_n\}$ and $\mathbf{x}_h:=\{x_{n+1},\dots,x_{n+L}\}$. $\Phi_{\mathbf{x}}(z):=[(\Phi_{\mathbf{x}}(z))_{ij}]$, $z\in\mathbb{C}$, $|z|=1$, denotes the power spectral density matrix, where $(\Phi_{\mathbf{x}})_{ij}$ is the cross power spectral density between the time-series at nodes $i$ and $j$. $\mathcal{F}$ denotes the set of real rational single input single output transfer functions that are analytic on the unit circle. $\mathcal{A}$ denotes the set of rationally related zero mean jointly wide sense stationary (JWSS) scalar stochastic processes. $\mathcal{Z}(\cdot)$ denotes the bilateral $z$-transform. $\mathbb{S}^n$ denotes the space of all skew-symmetric $n\times n$ matrices. $\mathbb{N}$ denotes the set of natural numbers, $\{0,1,2,\dots\}$. For a set $\mathcal{S}$, $|\mathcal{S}|$ denotes the cardinality of the set. $\|\mathbf{A}\|_*$ denotes the nuclear norm and $\|\mathbf{A}\|_1$ denotes the sum of absolute values of the entries of $\mathbf{A}$. $\angle a$ denotes the phase of the complex number $a$.

2 System model

Consider a network of $n$ interconnected nodes, where node $i$ is equipped with time-series measurements $(\widetilde{x}_i(t))_{t\in\mathbb{Z}}$, $i\in[n]$. The network interaction is described by

$$\mathbf{x}(z) = \mathbf{H}(z)\mathbf{x}(z) + \mathbf{e}(z), \qquad (1)$$

where $\mathbf{x}(z) = [x_1(z), x_2(z), \dots, x_n(z)]$ and $x_i(z) = \mathcal{Z}[\widetilde{x}_i]$. $\mathbf{e}(z) = [e_1(z), e_2(z), \dots, e_n(z)]\in\mathcal{A}^n$ are the exogenous noise sources, whose PSDM $\Phi_e(z)$ is non-singular but possibly non-diagonal. $\mathbf{H}(z)$ is the weighted adjacency matrix with $H_{ii}=0$ for all $i\in[n]$. An LDIM is defined as the pair $(\mathbf{H},\mathbf{e})$, whose output process is given by (1). The LDIM is well-posed if every entry of $(\mathbf{I}_n-\mathbf{H}(z))^{-1}$ is analytic on the unit circle, $|z|=1$, and topologically detectable if $\Phi_{\mathbf{e}}$ is positive definite.

Every LDIM has two associated graphs, viz.: 1) a directed graph, the linear dynamic influence graph (LDIG), $\mathcal{G}(\mathcal{V},\mathcal{E})$, where $\mathcal{V}=[n]$ and $\mathcal{E}:=\{(i,j):H_{ji}\neq 0\}$, and 2) an undirected graph, the correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, where $\mathcal{E}_c:=\{(i,j):(\Phi_e)_{ij}\neq 0,\ i\neq j\}$. Notice that, in a directed graph, if $(i,j)\in\mathcal{E}$ then there is a directed arrow from $i$ to $j$ in the graphical representation of $\mathcal{E}$ (see Fig. 1(a) for example). Concisely, "LDIG $(\mathbf{H},\mathbf{e})$" is used to denote the LDIG $\mathcal{G}(\mathcal{V},\mathcal{E})$ corresponding to the LDIM $(\mathbf{H},\mathbf{e})$. For a directed graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, the parent, children, and spouse sets of node $i$ are $Pa(i):=\{j:(j,i)\in\mathcal{E}\}$, $Ch(i):=\{j:(i,j)\in\mathcal{E}\}$, and $Sp(i):=\{j:j\in Pa(Ch(i))\}$, respectively. The topology of a directed graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, denoted $\mathcal{T}(\mathcal{V},\mathcal{E})$, is the undirected graph obtained by removing the directions from every edge $(i,j)\in\mathcal{E}$. The moral-graph (also called the kin-graph) of $\mathcal{G}(\mathcal{V},\mathcal{E})$, denoted $kin(\mathcal{G})$, is the undirected graph $kin(\mathcal{G}):=\mathcal{G}(\mathcal{V},\mathcal{E}')$, where $\mathcal{E}':=\{(i,j):i\neq j,\ i\in(Sp(j)\cup Pa(j)\cup Ch(j))\}$. A clique is a sub-graph of a given undirected graph in which every pair of nodes is adjacent. A maximal clique is a clique that is not a subset of a larger clique. For a correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, $q$ denotes the number of maximal cliques with clique size $>1$.
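For concreteness, the moral-graph can be computed directly from the support of $\mathbf{H}$: with $\mathbf{S}$ the support of $\mathbf{H}$, nodes $i$ and $j$ are kins exactly when $(\mathbf{S}+\mathbf{S}^T+\mathbf{S}^T\mathbf{S})_{ij}\neq 0$. A minimal sketch (an illustrative helper, not part of the paper):

```python
import numpy as np

def moral_graph(H):
    """Adjacency of kin(G) from the support of H, where H[j, i] != 0
    encodes a directed edge i -> j (as in the LDIM x = Hx + e)."""
    S = (H != 0).astype(int)              # S[j, i] = 1 iff edge i -> j
    parents_children = S + S.T            # i -> j or j -> i
    spouses = S.T @ S                     # (i, j) share a common child k
    kin = ((parents_children + spouses) != 0).astype(int)
    np.fill_diagonal(kin, 0)              # no self-loops in the moral graph
    return kin

# Fig. 1(a)-style example: edges 3 -> 1 and 1 -> 2 (weights illustrative).
H = np.array([[0.0, 0.0, 0.8],
              [0.5, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
print(moral_graph(H))   # edges 1-2 and 1-3; nodes 2 and 3 are not kins
```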

The problem we address is described as follows.

Problem 1.

(P1) Consider a well-posed and topologically detectable LDIM, $(\mathbf{H},\mathbf{e})$, where $\Phi_{\mathbf{e}}$ is allowed to be non-diagonal and whose associated graphs $\mathcal{G}(\mathcal{V},\mathcal{E})$ and $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ are unknown. Given the power spectral density matrix $\Phi_{\mathbf{x}}$, where $\mathbf{x}$ is given by (1), reconstruct the topology of $\mathcal{G}$.

3 IPSDM based topology reconstruction

In this section, IPSDM based topology reconstruction is presented. In [27], the authors showed that, for any LDIM characterized by (1), the availability of the IPSDM, which can be written as

$$\Phi_{\mathbf{x}}^{-1} = (\mathbf{I}_n - \mathbf{H})^* \Phi_e^{-1} (\mathbf{I}_n - \mathbf{H}), \qquad (2)$$

is sufficient for reconstructing the moral-graph of $(\mathbf{H},\mathbf{e})$. That is, if $(\Phi_{\mathbf{x}}^{-1})_{ij}\neq 0$, then $i$ and $j$ are kins. However, an important assumption for the result to hold true is that $\Phi_e^{-1}$ is diagonal. If $\Phi_e^{-1}$ is non-diagonal, then the result does not hold in general. For $i\neq j$,

$$(\Phi_{\mathbf{x}}^{-1})_{ij} = (\Phi_e^{-1})_{ij} - (\mathbf{H}^*\Phi_e^{-1})_{ij} - (\Phi_e^{-1}\mathbf{H})_{ij} + (\mathbf{H}^*\Phi_e^{-1}\mathbf{H})_{ij} = (\Phi_e^{-1})_{ij} - \sum_{k=1}^{n}(\mathbf{H}^*)_{ik}(\Phi_e^{-1})_{kj} - \sum_{k=1}^{n}(\Phi_e^{-1})_{ik}\mathbf{H}_{kj} + \sum_{k=1}^{n}\sum_{l=1}^{n}(\mathbf{H}^*)_{ik}(\Phi_e^{-1})_{kl}\mathbf{H}_{lj},$$

and any of the four terms can cause $(\Phi_{\mathbf{x}}^{-1})_{ij}\neq 0$, depending on $\Phi_{\mathbf{e}}^{-1}$. Hence, this technique cannot be applied directly to solve Problem 1. For example, consider the network $\mathcal{G}$ given in Fig. 1(a) with $(\Phi_{\mathbf{e}}^{-1})_{12}\neq 0$. Then, $(\Phi_{\mathbf{x}}^{-1})_{23}\neq 0$ since $(\Phi_{\mathbf{e}}^{-1})_{21}\mathbf{H}_{13}\neq 0$, which implies that the estimated moral-graph contains the edge $(2,3)$, while $(2,3)\notin kin(\mathcal{G})$.
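A quick numeric check of this failure mode, using static, illustrative edge weights and an assumed noise covariance (none of these values are from the paper):

```python
import numpy as np

# Fig. 1(a): edges 3 -> 1 (h13) and 1 -> 2 (h21); weights are illustrative.
n = 3
H = np.zeros((n, n))
H[0, 2] = 0.8   # h13
H[1, 0] = 0.5   # h21

# e1 and e2 correlated, so Phi_e (and Phi_e^{-1}) has a nonzero (1,2) entry.
Phi_e = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

I = np.eye(n)
Phi_x_inv = (I - H).T @ np.linalg.inv(Phi_e) @ (I - H)   # eq. (2), real static case
print(np.round(Phi_x_inv, 3))
# The (2,3) entry is nonzero although 2 and 3 are not kins in kin(G),
# so the IPSDM support overestimates the moral-graph under correlated noise.
```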

In the next section, the correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, is scrutinized and its properties are evaluated as the first step in unveiling the true topology when the noise sources admit correlations.

4 Spatial Correlation to Latent Node Transformation

In this section, a transformation of the LDIM, $(\mathbf{H},\mathbf{e})$, to an LDIM with latent nodes is obtained by exploiting the structural properties of the noise correlation. The transformation converts an LDIM without latent nodes, driven by spatially correlated exogenous noise sources, to an LDIM with latent nodes that is excited by spatially uncorrelated exogenous noise sources. Assuming perfect knowledge of the noise correlation structure, the latent nodes and their children in the transformed LDIM are characterized. It is shown that, although the transformation is not unique, the topology of the transformation is unique under affine correlation (Assumption 1).

For $\widetilde{\mathbf{e}} = [\widetilde{\mathbf{e}}_o; \widetilde{\mathbf{e}}_h]$ and $\widetilde{\mathbf{H}} := \begin{bmatrix} \mathbf{H} & \mathbf{F} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}$, with $\mathbf{H}\in\mathcal{F}^{n\times n}$, $\mathbf{F}\in\mathcal{F}^{n\times L}$, $L\in\mathbb{N}$, the pair $(\widetilde{\mathbf{H}},\widetilde{\mathbf{e}})$ (or, with a slight abuse of notation, $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})$) denotes an LDIM with $L$ latent nodes, where $\mathbf{F}_{ik}$ denotes the directed edge weight from latent node $k$ to observed node $i$ and $\mathbf{H}_{ik}$ denotes the directed edge weight from observed node $k$ to observed node $i$. Notice that the latent nodes considered in this article are strict parents; they do not have incoming edges.

4.1 Relation between Spatial Correlation and Latent Nodes

Here, it is demonstrated that spatially correlated exogenous noise sources can be viewed as the children of a latent node that is a common parent of the correlated sources. The idea is explained in the motivating example below. Towards this, let us first define the following notion of equivalent networks.

Definition 4.1.

Let $(\mathbf{H}^{(1)},\mathbf{e}^{(1)})$ and $(\mathbf{H}^{(2)},\mathbf{e}^{(2)})$ be two LDIMs, and let $\mathbf{x}^{(1)} = (\mathbf{I}_n-\mathbf{H}^{(1)})^{-1}\mathbf{e}^{(1)}$ and $\mathbf{x}^{(2)} = (\mathbf{I}_n-\mathbf{H}^{(2)})^{-1}\mathbf{e}^{(2)}$, respectively. Then, the LDIMs $(\mathbf{H}^{(1)},\mathbf{e}^{(1)})$ and $(\mathbf{H}^{(2)},\mathbf{e}^{(2)})$ are said to be equivalent, denoted $(\mathbf{H}^{(1)},\mathbf{e}^{(1)})\equiv(\mathbf{H}^{(2)},\mathbf{e}^{(2)})$, if and only if $\Phi_{\mathbf{x}^{(1)}} = \Phi_{\mathbf{x}^{(2)}}$.

[Figure 1: Transformation of an LDIM with correlated noise sources to a network with uncorrelated noise sources in the presence of a latent node. (a) Network with correlated noise sources ($\Phi_{\mathbf{e}}$ non-diagonal). (b) Transformed model with a latent node.]

Consider the network of three nodes in Fig. 1(a). Suppose the noise sources $e_1$ and $e_2$ are correlated, that is, $\Phi_{e_1e_2}\neq 0$. Consider the network in Fig. 1(b), where $\widetilde{e}_1, \widetilde{e}_2, \widetilde{e}_4$, and $e_3$ are jointly uncorrelated and node $4$ is a latent node. Note that $e_3$ is the same in both LDIMs, while $\widetilde{e}_1, \widetilde{e}_2$ are different from $e_1, e_2$. From the LDIM of Fig. 1(a), it follows that

$$x_1 = e_1 + h_{13}x_3 \quad \text{and} \quad x_2 = e_2 + h_{21}x_1. \qquad (3)$$

From the LDIM of Fig. 1(b), it follows that

$$x_1 = \widetilde{e}_1 + h_{14}x_4 + h_{13}x_3, \quad x_2 = \widetilde{e}_2 + h_{24}x_4 + h_{21}x_1, \quad \text{and} \quad x_4 = \widetilde{e}_4. \qquad (4)$$

Comparing (3) and (4), it is evident that if $\widetilde{e}_1, \widetilde{e}_2$, and $\widetilde{e}_4$ are such that $e_1 = \widetilde{e}_1 + h_{14}\widetilde{e}_4$ and $e_2 = \widetilde{e}_2 + h_{24}\widetilde{e}_4$, then the time-series obtained from both LDIMs are identical. That is, the LDIMs are equivalent.
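A sampling-based sanity check of this construction, with illustrative static gains $h_{14}$ and $h_{24}$ (values assumed for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
h14, h24 = 0.7, -0.3

# Uncorrelated sources of the transformed LDIM in Fig. 1(b).
e1t, e2t, e4t = rng.standard_normal((3, T))

# Correlated sources of the original LDIM in Fig. 1(a).
e1 = e1t + h14 * e4t
e2 = e2t + h24 * e4t

# Cross-covariance matches the latent-node explanation: h14 * h24 * var(e4).
print(np.cov(e1, e2)[0, 1], h14 * h24)   # both close to -0.21
```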

For a given LDIM, $(\mathbf{H},\mathbf{e})$, with correlated exogenous noise sources, one can define a space of equivalent LDIMs with uncorrelated noise sources that provide the same time-series. The following definition captures this space of "LDIMs with latent nodes and uncorrelated noise sources" that are equivalent to the original LDIM with correlated noise sources.

Assumption 1.

In the LDIM $(\mathbf{H},\mathbf{e})$, defined by (1), the exogenous noise processes $e_i, e_j\in\mathcal{A}$ are correlated only via affine interactions, i.e., $\Phi_{e_ie_j}\neq 0$ if and only if there exists an affine transform, $f(x) = a + bx$, with $a\in\mathcal{A}$, $b\in\mathcal{F}$, $b\neq 0$, such that either $e_i = f(e_j)$ or $e_j = f(e_i)$.

Definition 4.2.

For any LDIM, $(\mathbf{H},\mathbf{e})$, with $\Phi_{\mathbf{e}}$ non-diagonal and satisfying Assumption 1,

$$\mathcal{L}(\mathbf{H},\mathbf{e}) := \left\{(\widetilde{\mathbf{H}},\widetilde{\mathbf{e}}) : \mathbf{e} = \widetilde{\mathbf{e}}_o + \mathbf{F}\widetilde{\mathbf{e}}_h,\ \widetilde{\mathbf{H}} = \begin{bmatrix} \mathbf{H} & \mathbf{F} \\ \mathbf{0} & \mathbf{0} \end{bmatrix},\ \widetilde{\mathbf{e}} = [\widetilde{\mathbf{e}}_o, \widetilde{\mathbf{e}}_h]^T,\ \widetilde{\mathbf{e}}\in\mathcal{A}^{n+L},\ \mathbf{F}\in\mathcal{F}^{n\times L},\ L\in\mathbb{N},\ \Phi_{\widetilde{\mathbf{e}}}\text{ diagonal}\right\} \qquad (5)$$

is the space of all $\mathcal{L}$-transformations for a given $(\mathbf{H},\mathbf{e})$.

Remark 4.3.

In Definition 4.2, the number of latent nodes, $L$, is not fixed a priori.

Based on the definition of $\mathcal{L}(\mathbf{H},\mathbf{e})$ in (5), the LDIM $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})$ can be rewritten as

$$\mathbf{x}(z) = \mathbf{H}(z)\mathbf{x}(z) + \mathbf{G}(z)\widetilde{\mathbf{e}}(z), \qquad (6)$$

where $\widetilde{\mathbf{e}}(z) = [\widetilde{e}_1(z), \dots, \widetilde{e}_n(z), \dots, \widetilde{e}_{n+L}(z)]$ are mutually uncorrelated and $\mathbf{G} = [\mathbf{I}_n, \mathbf{F}]$.

[Figure 2: Demonstration of $\mathcal{L}(\mathbf{H},\mathbf{e})$; noise sources are not shown. (a) Original LDIG, $\mathcal{G}(\mathcal{V},\mathcal{E})$. (b) Correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, of $\mathcal{G}(\mathcal{V},\mathcal{E})$. (c) Example transformed graph in $\mathcal{L}(\mathbf{H},\mathbf{e})$ with one latent node. (d) Example transformed graph in $\mathcal{L}(\mathbf{H},\mathbf{e})$ with three latent nodes. The exogenous noises are uncorrelated in (c) and (d).]

An LDIM, $(\widetilde{\mathbf{H}},\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$, is called a transformed LDIM of $(\mathbf{H},\mathbf{e})$ in this article. Intuitively, the $\mathcal{L}$-transformation completely assigns the spatial correlation component present in the original noise sources, $\mathbf{e}$, to latent nodes, without altering the LDIG, $\mathcal{G}(\mathcal{V},\mathcal{E})$, similar to the discussion of the aforementioned motivating example. For the LDIM in Fig. 2(a) with the correlation graph in Fig. 2(b), Fig. 2(c) and Fig. 2(d) show some examples of LDIGs that belong to $\mathcal{L}(\mathbf{H},\mathbf{e})$. Thus, the $\mathcal{L}$-transformation returns a larger network with latent nodes. Notice that the latent nodes present in an LDIG of $(\widetilde{\mathbf{H}},\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ are strict parents, whose interactions with the true nodes are characterized by $\mathbf{F}$.

The following lemma formalizes the relation between the correlation graph and the latent nodes in the transformed higher dimensional LDIMs. In particular, it shows that two noise sources $e_i$ and $e_j$ are correlated if and only if there is a latent node that is a common parent of both the nodes $i$ and $j$ in the transformed LDIM.

Lemma 2.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM defined by (1) that satisfies Assumption 1, and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of $\{e_i\}_{i=1}^n$. Then, for every distinct $i,j\in[n]$, $(i,j)\in\mathcal{E}_c$ if and only if every LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ contains a latent node $h$ such that $h\in Pa(i)\cap Pa(j)$.

Proof: Refer to Appendix A. ∎

Thus, $e_i$ and $e_j$ in the original LDIM are correlated if and only if for every $\mathcal{L}$-transformed LDIM there exists a $k$ such that $F_{ik}\neq 0$ and $F_{jk}\neq 0$.

The following lemma shows that the number of latent nodes, $L$, present in any $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ is at least $q$, the number of maximal cliques with clique size greater than one in $\mathcal{G}_c$.

Lemma 3.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources, $\mathbf{e}$. Then, the number, $L$, of latent nodes present in any $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ satisfies $L\geq q$, where $q$ is the number of maximal cliques with clique size $>1$ in $\mathcal{G}_c$.

Proof: Refer to Appendix B. ∎
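The count $q$ in Lemma 3 is easy to obtain once $\mathcal{G}_c$ is known; a sketch using networkx (the correlation graph below is illustrative, not from the paper):

```python
import networkx as nx

# Illustrative correlation graph: sources 1, 2, 3 pairwise correlated,
# source 4 uncorrelated with the rest.
Gc = nx.Graph()
Gc.add_nodes_from([1, 2, 3, 4])
Gc.add_edges_from([(1, 2), (2, 3), (1, 3)])

# q = number of maximal cliques with clique size > 1 (Lemma 3: L >= q).
cliques = [c for c in nx.find_cliques(Gc) if len(c) > 1]
print(len(cliques), cliques)   # 1 [[1, 2, 3]]
```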

Remark 4.4.

Let $\mathcal{G}_c(\mathcal{V}_c,\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources. Then, for any given maximal clique $\mathcal{G}^\ell(\mathcal{V}^\ell,\mathcal{E}^\ell)\subseteq\mathcal{G}_c(\mathcal{V}_c,\mathcal{E}_c)$ and for any $k_\ell\geq 1$, there exists a transformed LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ with $k_\ell$ latent nodes such that the set $\mathcal{V}^\ell$ is equal to the set of children of the latent nodes. For $|\mathcal{V}^\ell|=3$, Fig. 2 shows examples with $k_\ell = 1, 3$ (see the proof of Lemma 3 for details).

As shown in Lemma 3 and Fig. 2, the LDIMs in the characterizing space $\mathcal{L}(\mathbf{H},\mathbf{e})$ of the equivalent LDIMs can have an arbitrary number of latent nodes, which leads to multiple transformed representations with varying numbers of latent nodes. The following definition restricts the number of latent nodes present in the equivalent transformed LDIGs and provides a minimal transformed representation, that is, a representation with the minimal number of nodes, of the true LDIG.

Definition 4.5.

Define $\mathcal{L}_q(\mathbf{H},\mathbf{e})$ to be the set of all LDIMs in $\mathcal{L}(\mathbf{H},\mathbf{e})$ with the number of latent nodes equal to the number of maximal cliques with clique size $>1$ present in $\mathcal{G}_c$.

The following theorem shows that there exists a unique latent node in every transformed LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$ corresponding to each maximal clique.

Theorem 4.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM that satisfies Assumption 1 and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources, $\mathbf{e}$. Suppose $\mathcal{G}_c$ has $q$ maximal cliques $C_1,\dots,C_q$. Consider any LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$ with $q$ latent nodes, $h_1,\dots,h_q$. Then, for each maximal clique $C_i$ in $\mathcal{G}_c$, there exists a unique latent node $h\in\{h_1,\dots,h_q\}$ such that $Ch(h) = C_i$.

Proof: Refer to Appendix C. ∎

Remark 4.6.

Notice that the existence of the unique latent node holds only for the space $\mathcal{L}_q(\mathbf{H},\mathbf{e})$. Such a unique node might not exist for the transformed representations in $\mathcal{L}(\mathbf{H},\mathbf{e})$. In other words, Theorem 4 identifies the minimal set of latent nodes required to explain the data.

4.2 Uniqueness of the topology

Here, the LDIMs in $\mathcal{L}_q(\mathbf{H},\mathbf{e})$ from the previous section are studied more carefully. In topology reconstruction, only the support structure of the transfer functions matters. The following proposition shows that this support structure is unique.

Proposition 5.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM that satisfies Assumption 1. The topology of every LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$ is the same.

Proof: Refer to Appendix D. ∎

Based on the above results, the following transformation of Problem (P1) is formulated.

Problem 6.

(P2) Consider an LDIM $(\mathbf{H},\mathbf{e})$ defined by (1) that satisfies Assumption 1, with $\Phi_{\mathbf{e}}$ allowed to be non-diagonal. Let $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$ be an LDIM with LDIG given by $\mathcal{G}_t(\mathcal{V}_t,\mathcal{E}_t)$. Suppose that the time-series data among the observed nodes of $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})$ is given. Then, reconstruct the topology among the observed nodes of $\mathcal{G}_t$.

Remark 4.7.

The time-series data among the observed nodes of the transformed LDIM is the same as the time-series obtained from the original LDIM.

Remark 4.8.

The problems (P1) and (P2) are equivalent. That is, the topologies reconstructed in both problems are the same, because the topology among the observed nodes of any element of $\mathcal{L}(\mathbf{H},\mathbf{e})$ is the same as the topology of $(\mathbf{H},\mathbf{e})$ (see Definition 4.2 and the proof of Proposition 5).

Therefore, instead of reconstructing the topology of $(\mathbf{H},\mathbf{e})$ with $\Phi_e$ non-diagonal, it is sufficient to reconstruct the topology among the observed nodes for one of the LDIMs $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$, which has $\Phi_{\widetilde{\mathbf{e}}}$ diagonal.

5 Polynomial Correlation

The results in the previous section assumed that the correlations between the exogenous noise sources are affine in nature. In this section, a generalization of the affine correlation is addressed. It is shown that noise sources that are correlated via non-affine, but polynomial, interactions can be characterized using latent nodes with non-linear interaction dynamics. By lifting the processes to a higher dimension, the non-linearity is converted to linear interactions.

The following definitions are useful for the presentation in this section.

Definition 5.1.

[12] Let $x = (x_1,\dots,x_m)$. A monomial in $x_1,\dots,x_m$ is a product of the form $x^\alpha := x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_m^{\alpha_m}$, where $\alpha\in\mathbb{N}^m$. A polynomial $P$ in $x$ with coefficients in a field (or a commutative ring), $\mathbb{F}$, is a finite linear combination of monomials, i.e.,

$$P(x) := \sum_\alpha a_\alpha x^\alpha, \quad a_\alpha\in\mathbb{F}.$$

(i) The degree of a monomial $x^\alpha$ is $|\alpha| := \sum_{i=1}^m \alpha_i$.

(ii) The total degree of $P\neq 0$ is the maximum of $|\alpha|$ such that $a_\alpha\neq 0$.

Definition 5.2.

For any $\mathbf{v}\in\mathcal{A}^m$, define the list of monomials of total degree at most $p$,

$$\mathcal{M}(\mathbf{v},p) := [f_0(\mathbf{v}), f_1(\mathbf{v}), \dots, f_p(\mathbf{v})]^T,$$

where $f_k(\mathbf{v})$ lists all the degree-$k$ monomials, $f_k(\mathbf{v}) := \{\mathbf{v}_1^{\alpha_1}\mathbf{v}_2^{\alpha_2}\cdots\mathbf{v}_m^{\alpha_m} \mid \sum_{i=1}^m \alpha_i = k\}$. For example, when $m=2$, $f_0(\mathbf{v})=1$, $f_1(\mathbf{v})=\{\mathbf{v}_1,\mathbf{v}_2\}$, and $f_2(\mathbf{v})=\{\mathbf{v}_1^2,\mathbf{v}_1\mathbf{v}_2,\mathbf{v}_2^2\}$. In general, the total number of monomials of degree $k$ is $\binom{m+k-1}{k}$, where $\binom{n}{k}=\frac{n!}{k!(n-k)!}$. The total number of monomials up to total degree $p$ is $M=\sum_{k=0}^p \binom{m+k-1}{k}$. For $m=2$ and $p=3$, $\mathcal{M}(\mathbf{v},3) = [1, \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_1^2, \mathbf{v}_1\mathbf{v}_2, \mathbf{v}_2^2, \mathbf{v}_1^3, \mathbf{v}_1^2\mathbf{v}_2, \mathbf{v}_1\mathbf{v}_2^2, \mathbf{v}_2^3]^T$.
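A small enumeration helper (hypothetical, for illustration) that generates the exponent vectors of $\mathcal{M}(\mathbf{v},p)$ and verifies the count $M$:

```python
from itertools import combinations_with_replacement
from math import comb

def monomial_exponents(m, p):
    """Exponent vectors alpha with |alpha| <= p for M(v, p), grouped by degree."""
    exps = []
    for k in range(p + 1):
        # Choosing k variable indices with repetition <-> a degree-k monomial.
        for combo in combinations_with_replacement(range(m), k):
            alpha = [0] * m
            for i in combo:
                alpha[i] += 1
            exps.append(tuple(alpha))
    return exps

m, p = 2, 3
exps = monomial_exponents(m, p)
M = sum(comb(m + k - 1, k) for k in range(p + 1))
print(len(exps) == M, M)   # True 10, matching M(v, 3) in Definition 5.2
```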

5.1 Characterization of Polynomial Correlations

Suppose $e_1 = \sum_{|\alpha|\leq M} a_{\alpha,1}\mathbf{v}^\alpha$ and $e_2 = \sum_{|\alpha|\leq M} a_{\alpha,2}\mathbf{v}^\alpha$, where $a_{\alpha,1}, a_{\alpha,2}\in\mathcal{F}$ and $\mathbf{v}\in\mathcal{A}^m$, $m, M\in\mathbb{N}$, with $\Phi_{\mathbf{v}}$ diagonal. Let $\mathbf{y} = \mathcal{M}(\mathbf{v},p)$ be the vector obtained by concatenating $\mathbf{v}^\alpha$, $|\alpha|\leq p$. Writing $e_1 = \mathbf{A}_1\mathbf{y}$ and $e_2 = \mathbf{A}_2\mathbf{y}$, we have $\Phi_{e_1e_2} = \mathbf{A}_1\Phi_{\mathbf{y}}\mathbf{A}_2^*$, where $\mathbf{A}_i$ is the vector obtained by concatenating the $a_{\alpha,i}$.

Notice that $\mathbf{y}$ is a lifting of $\mathbf{v}$ into a higher dimensional space of polynomials. In the following, a discussion of the structure of $\Phi_{\mathbf{y}}$ is provided with the help of examples.

5.2 Example: Lifting of a zero mean Gaussian Process

To illustrate the lifting of noise processes to a higher dimension, consider an independent and identically distributed (IID) Gaussian process (GP). It is shown that, under lifting, the power spectral density matrix (PSDM) is block diagonal.

Consider an IID GP $\{\mathbf{v}(k), k\in\mathbb{Z} \mid \mathbf{v}(k)\sim N(0,\sigma^2\mathbf{I}_m)\}$. Then, $\mathbb{E}\{\mathbf{v}_i\mathbf{v}_j\} = \mathbb{E}\{\mathbf{v}_i\}\mathbb{E}\{\mathbf{v}_j\} = 0$ for $i\neq j$. It can be shown that [31]

$$\mathbb{E}\{\mathbf{v}_i^p\} = \begin{cases} 0 & \text{if } p \text{ is odd,} \\ \sigma^p (p-1)!! & \text{if } p \text{ is even,} \end{cases} \qquad (7)$$

where the double factorial is $(p-1)!! := (p-1)(p-3)\cdots 3\cdot 1$ for even $p$. Consider $m=2$ and let $\mathbf{y} = [y_1,\dots,y_{10}] := \mathcal{M}(\mathbf{v},3) = [1, \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_1^2, \mathbf{v}_1\mathbf{v}_2, \mathbf{v}_2^2, \mathbf{v}_1^3, \mathbf{v}_1^2\mathbf{v}_2, \mathbf{v}_1\mathbf{v}_2^2, \mathbf{v}_2^3]^T$. That is, $\mathbf{y}$ lists all the monomials of $\mathbf{v}_1, \mathbf{v}_2$ with degree $\leq 3$.

Notice that $\mathbb{E}\{\mathbf{v}_1^i\mathbf{v}_2^j\} = \mathbb{E}\{\mathbf{v}_1^i\}\mathbb{E}\{\mathbf{v}_2^j\} \neq 0$ if and only if both $i$ and $j$ are even. Then, $\mathbb{E}\{y_2y_5\} = \mathbb{E}\{\mathbf{v}_1^2\mathbf{v}_2\} = 0$. Straightforward computation shows that $\mathbb{E}\{y_2y_k\} \neq 0$ only for $k=2,7,9$. Similarly, $\mathbb{E}\{y_7y_k\} \neq 0$ if and only if $k=2,7,9$, and $\mathbb{E}\{y_9y_k\} \neq 0$ if and only if $k=2,7,9$. Notice that $k=2,7,9$ corresponds to the terms with an odd power of $\mathbf{v}_1$ and an even power of $\mathbf{v}_2$. Repeating the same for every $\mathbb{E}\{y_iy_k\}$, $1\leq i,k\leq 10$, one can show that, after an appropriate rearrangement of rows and columns, the covariance matrix and the PSDM of $\mathbf{y}$ form a block diagonal matrix, given by (8). Here, $\tilde{\mathbf{y}} = [y_1, y_4, y_6, y_2, y_7, y_9, y_5, y_3, y_8, y_{10}]$.

$$\Phi_{\tilde{\mathbf{y}}} = \begin{bmatrix}
\Phi_{11} & \Phi_{14} & \Phi_{16} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\Phi_{41} & \Phi_{44} & \Phi_{46} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\Phi_{61} & \Phi_{64} & \Phi_{66} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \Phi_{22} & \Phi_{27} & \Phi_{29} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \Phi_{72} & \Phi_{77} & \Phi_{79} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \Phi_{92} & \Phi_{97} & \Phi_{99} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \Phi_{55} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \Phi_{33} & \Phi_{38} & \Phi_{3,10} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \Phi_{83} & \Phi_{88} & \Phi_{8,10} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \Phi_{10,3} & \Phi_{10,8} & \Phi_{10,10}
\end{bmatrix} \qquad (8)$$

It is worth mentioning that, since the Gaussian process is zero mean IID, the auto-correlation function $R_{\mathbf{v}_i\mathbf{v}_j}(k) = R_{\mathbf{v}_i\mathbf{v}_j}(0)\delta(k) = \mathbb{E}\{\mathbf{v}_i(0)\mathbf{v}_j(0)\}\delta(k)$, where $\delta$ is the Kronecker delta, and thus $\Phi_{\tilde{\mathbf{y}}}(z) = R_{\tilde{\mathbf{y}}}(0)$ for all $|z|=1$, i.e., the PSDM is white (the same for all frequencies).
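An empirical check of this block structure by sampling (a sketch; the sample size, permutation, and threshold are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
v1, v2 = rng.standard_normal((2, N))     # sigma = 1

# y = M(v, 3) for m = 2, ordered as in Section 5.2.
y = np.stack([np.ones(N), v1, v2, v1**2, v1*v2, v2**2,
              v1**3, v1**2 * v2, v1 * v2**2, v2**3])

# Permute into parity classes: {y1,y4,y6}, {y2,y7,y9}, {y5}, {y3,y8,y10}.
perm = [0, 3, 5, 1, 6, 8, 4, 2, 7, 9]
R = y[perm] @ y[perm].T / N              # second-moment matrix of the lift
print((np.abs(R) > 0.1).astype(int))     # support matches the blocks in (8)
```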

[Figure 3: Transformation of an LDIM with correlated noise sources to a network with uncorrelated noise sources in the presence of a latent node. (a) Original LDIG, $\mathcal{G}(\mathcal{V},\mathcal{E})$. (b) A correlation graph of $\mathcal{G}(\mathcal{V},\mathcal{E})$. (c) An example transformed LDIG of the sub-figures above with $m=2$, $p=3$, and $n=3$, with $2^m=4$ clusters. Here, $e_1 = \widetilde{e}_1 + F_{12}y_2 + F_{17}y_7 + F_{19}y_9$, $e_2 = \widetilde{e}_2 + F_{22}y_2 + F_{27}y_7 + F_{29}y_9 + F_{21}y_1 + F_{24}y_4 + F_{26}y_6$, and $e_3 = \widetilde{e}_3 + F_{31}y_1 + F_{34}y_4 + F_{36}y_6$. (d) Transformed graph of the above LDIG when all three nodes are correlated; the latent node $4$ captures the interactions due to the nodes $y_1,\dots,y_{10}$.]

The following proposition shows this for general $m$ and $p$.

Proposition 7.

Consider the Gaussian IID process $\{\mathbf{v}(k), k\in\mathbb{Z} \mid \mathbf{v}(k)\sim N(0,\sigma^2\mathbf{I}_m)\}$. Let $\mathbf{y} = \mathcal{M}(\mathbf{v},p)$. Then, there exists $\tilde{\mathbf{y}}$, a permutation of $\mathbf{y}$, such that $\Phi_{\tilde{\mathbf{y}}}$ is block diagonal with $2^m$ non-zero blocks.

Proof: Refer to Appendix E. ∎

Based on this example, one can extend the result to symmetric zero mean WSS processes. Symmetric distributions are distributions with probability density functions that are symmetric around the mean, for example, the Gaussian distribution.

Definition 5.3.

A probability distribution is said to be symmetric around the mean if and only if its density function, $f$, satisfies $f(\mu - x) = f(\mu + x)$ for every $x\in\mathbb{R}$, where $\mu$ is the mean of the distribution.

Proposition 8.

Consider a zero mean WSS process, $\mathbf{v}\in\mathcal{A}^m$, with symmetric distribution and $\Phi_{\mathbf{v}}$ diagonal. Let $\mathbf{y} = \mathcal{M}(\mathbf{v},p)$. Then, there exists $\tilde{\mathbf{y}}$, a permutation of $\mathbf{y}$, such that $\Phi_{\tilde{\mathbf{y}}}$ is block diagonal with $2^m$ non-zero blocks.

Proof: The proof is similar to the IID GP case. For symmetric distributions, the odd moments are zero, since they are integrals of odd functions. ∎

As shown in the proof of Proposition 7, the monomial nodes corresponding to a given diagonal block (i.e., monomials with the same odd-even exponent pattern) can be grouped into a cluster. Fig. 3(c) shows such a clustering with $m=2$ and $p=3$. The red nodes are the lifted processes in the higher dimension (the monomial nodes $y_1,\dots,y_{10}$). The nodes inside a given cluster (shown by a blue ellipse) are correlated with each other, and nodes from different clusters are uncorrelated; this is a result of $\Phi_{\tilde{\mathbf{y}}}$ being block diagonal. Here, $e_1$ and $e_2$ are correlated via the cluster of $y_2$, $y_7$, and $y_9$, whereas $e_2$ and $e_3$ are correlated via the cluster of $y_1$, $y_4$, and $y_6$. Notice the following important distinction from the LDIMs in Fig. 1. Here, it is sufficient for the observed nodes $1, 2$ to be connected to one of the latent nodes in a given cluster. It is not necessary for the nodes to have a common parent, as in Fig. 1. The common parent property from Section 4 is replaced here with a common ancestral cluster.

5.3 Transformation of Polynomial Correlation to Latent Nodes

Based on the aforementioned discussion, a relaxation of Assumption 1 and an extension of the LDIM transformation results from Section 4 are presented here. It is shown that, in order to relax Assumption 1, non-linear interactions are required between the latent nodes and the observed nodes. The following assumption is a generalization of Assumption 1.

Assumption 2.

In the LDIM $(\mathbf{H},\mathbf{e})$, defined by (1), the noise processes $e_i, e_j\in\mathcal{A}$ are correlated if and only if there exist polynomials $P_1, P_2$ and $\widetilde{\mathbf{e}} = [\mathbf{v}^T, \widetilde{e}_i, \widetilde{e}_j]^T$, $\mathbf{v}\in\mathcal{A}^m$, $\widetilde{e}_i, \widetilde{e}_j\in\mathcal{A}$, with $\Phi_{\widetilde{\mathbf{e}}}$ diagonal, $m\in\mathbb{N}$, such that $e_i = \widetilde{e}_i + P_1(\mathbf{v})$ and $e_j = \widetilde{e}_j + P_2(\mathbf{v})$, where $P_k(\mathbf{v}) = \sum_{|\alpha|\leq M} a_{\alpha,k}\mathbf{v}^\alpha$, $a_{\alpha,k}\in\mathcal{F}$, for some $M\in\mathbb{N}$.

Remark 5.4.

To be precise, the direct extension of Assumption 1 is given by $y = P(x)$, where $x\in\mathcal{A}$ and $P(x) = \sum_\alpha a_\alpha x^\alpha$, $a_\alpha\in\mathcal{F}$, is a polynomial of degree at most $M$. Any $x\in\mathcal{A}$ can be written as $x = \widetilde{e}_x + bv$, where $\widetilde{e}_x, v\in\mathcal{A}$, $b\in\mathcal{F}$, and $\widetilde{e}_x$ is uncorrelated with $v$. Then, $y = P(\widetilde{e}_x + bv)$, which is a special case of Assumption 2.

Definition 5.5.

For any LDIM, $(\mathbf{H},\mathbf{e})$, with $\Phi_{\mathbf{e}}$ non-diagonal and satisfying Assumption 2, and $p>1$,

$$\mathcal{L}^{(p)}(\mathbf{H},\mathbf{e}) := \left\{(\widetilde{\mathbf{H}},\mathbf{y}) : \mathbf{e} = \widetilde{\mathbf{e}}_o + \mathbf{F}\mathbf{y},\ \widetilde{\mathbf{H}} = \begin{bmatrix} \mathbf{H} & \mathbf{F} \\ \mathbf{0} & \mathbf{0} \end{bmatrix},\ \mathbf{F}\in\mathcal{F}^{n\times M},\ \mathbf{y} = \mathcal{M}(\mathbf{v},p),\ \widetilde{\mathbf{e}}_o\in\mathcal{A}^n,\ \mathbf{v}\in\mathcal{A}^m,\ m\in\mathbb{N},\ \widetilde{\mathbf{e}} = [\widetilde{\mathbf{e}}_o, \mathbf{v}]^T,\ M = \sum_{k=0}^p \binom{m+k-1}{k},\ \Phi_{\widetilde{\mathbf{e}}}\text{ diagonal}\right\} \qquad (9)$$

is the space of all $\mathcal{L}^{(p)}$-transformations for a given $(\mathbf{H},\mathbf{e})$. The matrix $\mathbf{F}$ is obtained by concatenating columns $\mathbf{F}_\alpha\in\mathcal{F}^{n\times 1}$: first all the $\mathbf{F}_\alpha$ corresponding to degree-one monomials in lexicographic order, then degree $2$ in lexicographic order, and so on. That is, $\mathbf{F} := [\mathbf{F}_{b_1},\dots,\mathbf{F}_{b_L},\mathbf{F}_{b_{11}},\mathbf{F}_{b_{12}},\dots,\mathbf{F}_{b_{1L}},\mathbf{F}_{b_{21}},\mathbf{F}_{b_{22}},\dots]$, where $\mathbf{F}_{b_{i_1\dots i_k}}$ denotes the column of $\mathbf{F}$ corresponding to $\alpha = b_{i_1}+\dots+b_{i_k}$, with $b_i$ the $i$-th canonical basis vector. For $m=2$, $p=3$ from Section 5.2, $\mathbf{F} = [\mathbf{F}_{b_1},\mathbf{F}_{b_2},\mathbf{F}_{b_{11}},\mathbf{F}_{b_{12}},\mathbf{F}_{b_{22}},\mathbf{F}_{b_{111}},\mathbf{F}_{b_{112}},\mathbf{F}_{b_{122}},\mathbf{F}_{b_{222}}]$. Here, $\mathbf{F}_{b_2} = \mathbf{F}_{(0,1)}$, $\mathbf{F}_{b_{12}} = \mathbf{F}_{(1,1)}$, and $\mathbf{F}_{b_{222}} = \mathbf{F}_{(0,3)}$.

With this new "polynomial lifting" definition, one can transform a given LDIM with correlated noise sources to a transformed LDIM with latent nodes. As shown in the following lemma, the transformation of correlations to uncorrelated latent nodes from Section 4 is replaced here with uncorrelated latent clusters. For any cluster $c$, $Ch(c) := \bigcup_{i\in c} Ch(i)$; that is, $Ch(c)$ denotes the union of the children of the nodes present in cluster $c$.

Lemma 9.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM that satisfies Assumption 2, and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources, $\{e_i\}_{i=1}^n$.

Then, for every distinct $i,j\in[n]$, $(i,j)\in\mathcal{E}_c$ if and only if for every LDIG $(\widetilde{\mathbf{H}},\mathbf{y})\in\mathcal{L}^{(p)}(\mathbf{H},\mathbf{e})$, there exists a cluster $c$ such that $i,j\in Ch(c)$.

Proof: Refer to Appendix F. ∎

The following theorem shows that a subgraph, $\mathcal{G}^\ell(\mathcal{V}^\ell,\mathcal{E}^\ell)$, of the correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, forms a maximal clique in $\mathcal{G}_c$ if and only if, for any transformed LDIG in $\mathcal{L}^{(p)}(\mathbf{H},\mathbf{e})$, the set of nodes $\mathcal{V}^\ell$ is equal to the set of children of some latent clusters.

Theorem 10.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM defined by (1) that satisfies Assumption 2, and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources, $\mathbf{e}$. Suppose that $\mathcal{G}^\ell(\mathcal{V}^\ell,\mathcal{E}^\ell)\subseteq\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ is a maximal clique with $|\mathcal{V}^\ell|>1$. Then, for any LDIM $(\widetilde{\mathbf{H}},\mathbf{y})\in\mathcal{L}^{(p)}(\mathbf{H},\mathbf{e})$, there exist latent clusters $C_1^\ell,\dots,C_{k_\ell}^\ell$ in the LDIG of $(\widetilde{\mathbf{H}},\mathbf{y})$ such that

$$\mathcal{V}^\ell = \bigcup_{i=1}^{k_\ell} Ch(C_i^\ell) \quad \text{and} \quad \mathcal{E}^\ell = \bigcup_{i=1}^{k_\ell} \mathcal{E}_{\ell,i}, \qquad (10)$$

where $\mathcal{E}_{\ell,i} := \{(k,j) : k,j\in Ch(C_i^\ell)\}$. In particular, for any latent cluster $C$ in the LDIG of $(\widetilde{\mathbf{H}},\mathbf{y})$, $Ch(C)$ forms a clique in $\mathcal{G}_c$.

Proof: See Appendix G. ∎

Consider the LDIG shown in Fig. 3(a) with the correlation graph given by Fig. 3(b). As shown in Fig. 3(d), if all three nodes are correlated, one can transform this LDIM to an LDIM with latent node $4$. However, here node $4$ should capture the interactions from the latent nodes $y_1,\dots,y_{10}$. Therefore, during reconstruction, one should accommodate non-linear interactions between the latent node and the observed nodes.

In the next section, we describe a way to perform the reconstruction when $\mathcal{G}_c$ is unavailable.

6 Moral-Graph Reconstruction by Matrix Decomposition

In the previous sections, frameworks that convert a network with spatially correlated noise to a network with latent nodes were studied. In this section, a technique is provided to reconstruct the topology of the transformed LDIMs using the sparse plus low-rank matrix decomposition of the IPSDM obtained from the observed time-series. Note that this result does not require any extra information other than the IPSDM obtained from the true LDIM.

6.1 Topology Reconstruction under Affine Correlation

Here, reconstruction under affine correlation is discussed. Recall from Definition 4.2 that $\mathbf{e} = \widetilde{\mathbf{e}}_o + \mathbf{F}\widetilde{\mathbf{e}}_h$. Then, the PSDM of $\mathbf{e}$, $\Phi_e$, can be written as [27]: $\Phi_e = \Phi_{\widetilde{\mathbf{e}}_o} + \Phi_{\widetilde{\mathbf{e}}_o\widetilde{\mathbf{e}}_h}\mathbf{F}^* + \mathbf{F}\Phi_{\widetilde{\mathbf{e}}_h\widetilde{\mathbf{e}}_o} + \mathbf{F}\Phi_{\widetilde{\mathbf{e}}_h}\mathbf{F}^* = \Phi_{\widetilde{\mathbf{e}}_o} + \mathbf{F}\Phi_{\widetilde{\mathbf{e}}_h}\mathbf{F}^*$, where the second equality follows because $\widetilde{\mathbf{e}}_h$ and $\widetilde{\mathbf{e}}_o$ are uncorrelated with mean zero. Then, the IPSDM of the observed nodes in $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ is

$$\Phi_o^{-1} = (\mathbf{I}_n-\mathbf{H})^*(\Phi_{\widetilde{e}_o} + \mathbf{F}\Phi_{\widetilde{e}_h}\mathbf{F}^*)^{-1}(\mathbf{I}_n-\mathbf{H}) \stackrel{(a)}{=} \mathbf{A} - \mathbf{B}, \qquad (11)$$

where $\mathbf{A} = (\mathbf{I}_n-\mathbf{H})^*\Phi_{\widetilde{e}_o}^{-1}(\mathbf{I}_n-\mathbf{H})$ and $\mathbf{B} = (\mathbf{I}_n-\mathbf{H})^*\Phi_{\widetilde{e}_o}^{-1}\mathbf{F}(\Phi_{\widetilde{e}_h}^{-1} + \mathbf{F}^*\Phi_{\widetilde{e}_o}^{-1}\mathbf{F})^{-1}\mathbf{F}^*\Phi_{\widetilde{e}_o}^{-1}(\mathbf{I}_n-\mathbf{H})$. Equality (a) follows from the matrix inversion lemma [20]. Then, (11) can be rewritten as

$$\Phi_o^{-1} = \mathbf{S} + \mathbf{L}, \text{ where} \qquad (12)$$
$$\mathbf{S} = (\mathbf{I}_n-\mathbf{H})^*\Phi_{\widetilde{e}_o}^{-1}(\mathbf{I}_n-\mathbf{H}), \qquad (13)$$
$$\mathbf{L} = -\Psi^*\Lambda^{-1}\Psi, \qquad (14)$$

$\Psi = \mathbf{F}^*\Phi_{\widetilde{e}_o}^{-1}(\mathbf{I}_n-\mathbf{H})$, and $\Lambda = \mathbf{F}^*\Phi_{\widetilde{e}_o}^{-1}\mathbf{F} + \Phi_{\widetilde{e}_h}^{-1}$, which is similar to the model in [46]. If the moral-graph of the original LDIG is sparse and $M\ll n$, then $\mathbf{S}$ is sparse and $\mathbf{L}$ is low-rank. It was shown in [27] that the support of $\mathbf{S}$ retrieves the moral-graph of $\mathcal{G}$. Furthermore, as shown in [46], under some assumptions that are applicable to a large class of problems, the $(i,j)$-th entry of $\mathbf{S}$ is strictly real if and only if the edge $i-j$ is a strict spouse edge. Thus, it can be shown that, in a large class of problems, the support of $\Im\{\mathbf{S}\}$ retrieves the exact topology of $\mathcal{G}$. Following the approach in [46], we reconstruct the network topology from the sparse plus low-rank decomposition of $\Im\{\Phi_o^{-1}(z)\}$, which is a skew-symmetric matrix. For completeness, the essential theory and a relevant algorithm from [46] are provided below in Section 6.4. The idea here is to decompose a given skew-symmetric matrix, i.e., $\Im\{\Phi_o^{-1}(z)\}$, into the sparse and low-rank components ($\Im\{\mathbf{S}(z)\}$ and $\Im\{\mathbf{L}(z)\}$, respectively), and then to reconstruct the moral-graph/topology from $\Im\{\mathbf{S}(z)\}$, for some $|z|=1$.

The next subsection shows how the sparse plus low-rank decomposition is applicable in the polynomial correlation setting as well.

6.2 Topology reconstruction under polynomial correlation

Under polynomial correlation, recall from Definition 5.5 that $\mathbf{e}(z) = \widetilde{\mathbf{e}}_o(z) + \mathbf{F}\tilde{\mathbf{y}}$. Let $\mathbf{x}_h = \widetilde{\mathbf{e}}_h = \tilde{\mathbf{y}}$ and let $\mathbf{x} = [\mathbf{x}_o^T, \mathbf{x}_h^T]^T$. Then, one can write

$$\begin{bmatrix} \mathbf{x}_o(z) \\ \mathbf{x}_h(z) \end{bmatrix} = \begin{bmatrix} \mathbf{H}(z) & \mathbf{F}(z) \\ \mathbf{0}_{M\times n} & \mathbf{0}_{M\times M} \end{bmatrix} \begin{bmatrix} \mathbf{x}_o(z) \\ \mathbf{x}_h(z) \end{bmatrix} + \begin{bmatrix} \widetilde{\mathbf{e}}_o(z) \\ \widetilde{\mathbf{e}}_h(z) \end{bmatrix}, \qquad (15)$$

where $\mathbf{F}_{ij}$ captures the influence of $\tilde{\mathbf{y}}_j$ on $x_i$ and $\Phi_{\tilde{\mathbf{y}}}$ is block diagonal.

The topology of the sub-graph restricted to the observed nodes in the above LDIM and the true topology of the network are equivalent. Moreover, similar to (11) (see [46] for details), one can obtain (12) to (14) exactly.

Here $\mathbf{S}(z), \mathbf{L}(z)\in\mathbb{C}^{n\times n}$, $\Psi\in\mathbb{C}^{M\times n}$, and $\Lambda\in\mathbb{C}^{M\times M}$. If $M\ll n$, then $\mathbf{L}$ is low-rank. Let $\mathcal{J} := \{j\in[M] \mid \exists i\in[n] \text{ with } \mathbf{F}_{ij}\neq 0\}$ be the set of monomials that have a non-zero contribution to some $e_i$, $i\in[n]$, and let $L = |\mathcal{J}|$. In this scenario, $\mathbf{L}$ is low-rank if $L\ll n$. Section 7.1 demonstrates an example of applying the sparse plus low-rank decomposition to reconstruct the topology under polynomial correlation.

6.3 Low-rank plus Sparse Matrix Decomposition

Here, the following problem is considered: given a matrix 𝐂{\mathbf{C}} that is known to be sum of a sparse skew-symmetric matrix 𝐒{\mathbf{S}} and a low-rank skew-symmetric matrix 𝐋{\mathbf{L}}, retrieve the sparse and low-rank components. The following optimization program modified from [7] is used to obtain the sparse low-rank decomposition, where 0t10\leq t\leq 1 is a pre-selected penalty factor [46].

\begin{split}(\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t})=&\arg\min_{\widehat{{\mathbf{S}}},\widehat{{\mathbf{L}}}}\ t\|\widehat{{\mathbf{S}}}\|_{1}+(1-t)\|\widehat{{\mathbf{L}}}\|_{*}\\ &\text{subject to }\widehat{{\mathbf{S}}}+\widehat{{\mathbf{L}}}={\mathbf{C}},\\ &\hphantom{\text{subject to }}\widehat{{\mathbf{S}}}^{T}=-\widehat{{\mathbf{S}}},\ \widehat{{\mathbf{L}}}^{T}=-\widehat{{\mathbf{L}}},\end{split} \qquad (16)

where {\mathbf{C}}=\Im\{\Phi_{o}^{-1}(z)\}, for some z\in\mathbb{C},\ |z|=1.
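To make the program concrete, below is a minimal numerical sketch of (16) in Python with cvxpy, used purely for illustration (the experiments in Section 7 use Yalmip with an SDP solver instead); the function name sparse_low_rank_split and the choice of the SCS solver are our own. Since \Phi_{o}^{-1}(z) is Hermitian, {\mathbf{C}}=\Im\{\Phi_{o}^{-1}(z)\} is real and skew-symmetric, so real decision variables suffice.

import cvxpy as cp
import numpy as np

def sparse_low_rank_split(C, t):
    # Convex program (16): split the real skew-symmetric C = Im{Phi_o^{-1}(z)}
    # into a sparse skew-symmetric S and a low-rank skew-symmetric L.
    n = C.shape[0]
    S = cp.Variable((n, n))
    L = cp.Variable((n, n))
    objective = cp.Minimize(t * cp.sum(cp.abs(S)) + (1 - t) * cp.normNuc(L))
    constraints = [S + L == C, S == -S.T, L == -L.T]
    cp.Problem(objective, constraints).solve(solver=cp.SCS)
    return S.value, L.value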

In the next subsection, a sufficient condition and an algorithm for the exact recovery of the sparse and the low-rank components from {\mathbf{C}} using (16) are provided.

6.4 Sufficient Condition for Sparse Low-rank Matrix Decomposition

In this subsection, a sufficient condition (proved in [7, 46]) is provided under which a matrix can be uniquely decomposed as the sum of a sparse skew-symmetric component and a low-rank skew-symmetric component. Furthermore, an algorithm is provided that utilizes this sufficient condition to retrieve the sparse and the low-rank components.

The following definitions are used in the subsequent results.

deg_{max}({\mathbf{M}}):=\max\left(\max_{1\leq i\leq n}\sum_{j=1}^{n}\mathbbm{1}_{\{{\mathbf{M}}_{ij}\neq 0\}},\ \max_{1\leq j\leq n}\sum_{i=1}^{n}\mathbbm{1}_{\{{\mathbf{M}}_{ij}\neq 0\}}\right), \qquad (17)

inc({\mathbf{M}}):=\max_{k}\|{\mathbf{U}}{\mathbf{U}}^{T}e_{k}\|_{2}, \qquad (18)

where {\mathbf{U}}\Sigma{\mathbf{V}}^{T} is the compact singular value decomposition of {\mathbf{M}} and \|\cdot\|_{2} is the Euclidean norm of a vector.
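The quantities in (17) and (18) are straightforward to compute numerically; the following is a small sketch in numpy, where the tolerance tol for declaring entries and singular values non-zero is an assumption on our part.

import numpy as np

def deg_max(M, tol=1e-9):
    # (17): the largest number of non-zero entries in any row or column of M.
    nz = np.abs(M) > tol
    return max(nz.sum(axis=1).max(), nz.sum(axis=0).max())

def inc(M, tol=1e-9):
    # (18): max_k ||U U^T e_k||_2 with U from the compact SVD of M, i.e., the
    # largest column norm of the orthogonal projector onto the column space.
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    r = int((s > tol * s[0]).sum())          # numerical rank
    P = U[:, :r] @ U[:, :r].T
    return np.linalg.norm(P, axis=0).max()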

The following is a sufficient condition that guarantees the unique decomposition of {\mathbf{C}} (see [7, 46] for details).

Lemma 11.

Suppose that we are given a matrix {\mathbf{C}}, which is the sum of a sparse matrix \tilde{{\mathbf{S}}}\in{\mathbb{S}}^{n} and a low-rank matrix \tilde{{\mathbf{L}}}\in{\mathbb{S}}^{n}. If (\tilde{{\mathbf{S}}},\tilde{{\mathbf{L}}}) satisfies

deg_{max}(\tilde{{\mathbf{S}}})\,inc(\tilde{{\mathbf{L}}})<\frac{1}{12}, \qquad (19)

then there exists a penalty factor t\in[0,1] such that (16) returns (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t})=(\tilde{{\mathbf{S}}},\tilde{{\mathbf{L}}}).

Remark 6.1.

The results in [7, 46] are proved for the optimization problem with the objective function \gamma\|\widehat{{\mathbf{S}}}\|_{1}+\|\widehat{{\mathbf{L}}}\|_{*}, where \gamma\geq 0. However, the results hold for (16) as well, since the two problems are equivalent via the map t=\gamma/(1+\gamma).

The sufficient condition (19) roughly translates to \tilde{{\mathbf{S}}} being sparse and the number of maximal cliques, M, being small, with the clique sizes not too small (see [46] for more details).

The following metrics are used to measure the accuracy of the estimates (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t}) in the optimization (16).

tol_{t}:=\frac{\|\widehat{{\mathbf{S}}}_{t}-\tilde{{\mathbf{S}}}\|_{F}}{\|\tilde{{\mathbf{S}}}\|_{F}}+\frac{\|\widehat{{\mathbf{L}}}_{t}-\tilde{{\mathbf{L}}}\|_{F}}{\|\tilde{{\mathbf{L}}}\|_{F}}, \qquad diff_{t}:=\|\widehat{{\mathbf{S}}}_{t-\epsilon}-\widehat{{\mathbf{S}}}_{t}\|_{F}+\|\widehat{{\mathbf{L}}}_{t-\epsilon}-\widehat{{\mathbf{L}}}_{t}\|_{F}, \qquad (20)

where \|\cdot\|_{F} denotes the Frobenius norm and \epsilon>0 is a sufficiently small fixed constant. Note that tol_{t} requires the knowledge of the true matrices \tilde{{\mathbf{S}}} and \tilde{{\mathbf{L}}}, whereas diff_{t} does not.
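As a small illustration, the two metrics in (20) can be computed as follows (a sketch; the helper names are ours).

import numpy as np

def tol_metric(S_hat, L_hat, S_true, L_true):
    # tol_t in (20): relative Frobenius errors; requires the ground truth.
    return (np.linalg.norm(S_hat - S_true, 'fro') / np.linalg.norm(S_true, 'fro')
            + np.linalg.norm(L_hat - L_true, 'fro') / np.linalg.norm(L_true, 'fro'))

def diff_metric(S_prev, L_prev, S_hat, L_hat):
    # diff_t in (20): change between the solutions at consecutive grid points
    # t - eps and t; computable without the ground truth.
    return (np.linalg.norm(S_prev - S_hat, 'fro')
            + np.linalg.norm(L_prev - L_hat, 'fro'))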

The following lemma is proved in [46] and is applied in Algorithm 1 below to retrieve the sparse and the low-rank components.

Lemma 12.

Suppose we are given a matrix {\mathbf{C}}, which is obtained by summing \tilde{{\mathbf{S}}} and \tilde{{\mathbf{L}}}, where \tilde{{\mathbf{S}}} is a sparse skew-symmetric matrix and \tilde{{\mathbf{L}}} is a low-rank skew-symmetric matrix. If \tilde{{\mathbf{S}}} and \tilde{{\mathbf{L}}} satisfy \deg_{\max}(\tilde{{\mathbf{S}}})\,inc(\tilde{{\mathbf{L}}})<1/12, then there exist at least three regions of t where diff_{t}=0. In particular, there exists an interval [t_{1},t_{2}]\subset[0,1] with 0<t_{1}<t_{2}<1 such that the solution of (16) is (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t})=(\tilde{{\mathbf{S}}},\tilde{{\mathbf{L}}}) for every t\in[t_{1},t_{2}].

Following the procedure in [46], moral-graph/topology reconstruction from the imaginary part of the IPSDM is considered here. The following assumptions from [46] are required for the exact recovery of the topology; the details can be found in [46] and are omitted here. In the absence of Assumption 4, the reconstruction algorithm detects some false positive edges, but none of the true edges is missed.

Assumption 3.

For any i\in[n], k\in[n] (k'\in[L]), if {\mathbf{H}}_{ik}(z)\neq 0 ({\mathbf{F}}_{ik'}\neq 0), then \Im\{{\mathbf{H}}_{ik}(z)\}\neq 0 (\Im\{{\mathbf{F}}_{ik'}(z)\}\neq 0), for all z,~|z|=1.

Assumption 4.

For the LDIM in (1), and i,k,l\in{\mathcal{V}}, if {\mathbf{H}}_{ki}(z)\neq 0 and {\mathbf{H}}_{kl}(z)\neq 0, then \angle{\mathbf{H}}_{ki}(z)=\angle{\mathbf{H}}_{kl}(z).

The following lemma from [46] reconstructs the exact topology of the LDIM from {\mathbf{S}} in (13).

Lemma 13.

Consider a well-posed and topologically detectable LDIM ({\mathbf{H}},{\mathbf{e}}) described by (1) with the associated graph {\mathcal{G}}({\mathcal{V}},{\mathcal{E}}), satisfying Assumption 4. Let {\mathbf{S}}(z) be given by (13), and, for some z\in\mathbb{C} with |z|=1, let \widehat{{\mathcal{E}}}_{o}:=\{(i,j):\Im\{{\mathbf{S}}_{ij}(z)\}\neq 0,~i<j\} and \overline{{\mathcal{E}}}_{o}:=\{(i,j)\mid(i,j)\in{\mathcal{E}}\text{ or }(j,i)\in{\mathcal{E}},~i<j\}. Then, \widehat{{\mathcal{E}}}_{o}\subseteq\overline{{\mathcal{E}}}_{o}. Additionally, if the LDIM satisfies Assumption 3, then \widehat{{\mathcal{E}}}_{o}=\overline{{\mathcal{E}}}_{o} almost always.

The following is the main result of this section, which follows from Lemma 11 and Lemma 13.

Theorem 14.

Let ({\mathbf{H}},{\mathbf{e}}) be an LDIM with \Phi_{e} non-diagonal that satisfies Assumptions 3 and 4. Suppose that there exists an ({\mathbf{H}},{\mathbf{F}},\widetilde{{\mathbf{e}}})\in{\mathcal{L}}_{q}({\mathbf{H}},{\mathbf{e}}) such that, for {\mathbf{S}} and {\mathbf{L}} in (12), \deg_{\max}(\Im\{{\mathbf{S}}\})\,inc(\Im\{{\mathbf{L}}\})<1/12. Then, the true topology of ({\mathbf{H}},{\mathbf{e}}) can be reconstructed by solving the optimization problem (16) with {\mathbf{C}}=\Im\{\Phi^{-1}_{o}(z)\} for some z=e^{i\omega},~\omega\in(0,2\pi]. In particular, {\mathcal{T}}({\mathcal{V}},{\mathcal{E}})=supp(\widehat{{\mathbf{S}}}_{t}), for an appropriately selected t.

Proof: See Appendix H. ∎

Additionally, by applying the algorithms in [46], the correlation graph can also be reconstructed if the LDIM satisfies the following assumption, as demonstrated in the simulation results.

Assumption 5.

For every pair of distinct latent nodes k_{h} and k_{h}' in the transformed dynamic graph, the distance between k_{h} and k_{h}' is at least four hops.

Algorithm 1 Matrix decomposition

Input: \Phi_{oo}^{-1}(z), the IPSDM among {\mathcal{V}}_{o}; step size \varepsilon; z=e^{j\omega},~\omega\in(-\pi,\pi]
Output: Matrices \Im\{{\mathbf{S}}(z)\} and \Im\{{\mathbf{L}}(z)\}

1: Set {\mathbf{C}}=\Im\{\Phi_{oo}^{-1}(z)\}
2: Initialize (\widehat{{\mathbf{S}}}_{0},\widehat{{\mathbf{L}}}_{0})=({\mathbf{C}},\mathbf{0})
3: for all t\in\{\varepsilon,2\varepsilon,\dots,1\} do
4:     Solve the convex optimization (16) and calculate diff_{t} in (20)
5: end for
6: Identify the three regions where diff_{t} is zero and denote the middle region by [t_{1},t_{2}]. Pick a t_{0}\in[t_{1},t_{2}] and the corresponding pair (\widehat{{\mathbf{S}}}_{t_{0}},\widehat{{\mathbf{L}}}_{t_{0}}).
7: if deg_{max}(\widehat{{\mathbf{S}}}_{t_{0}})\,inc(\widehat{{\mathbf{L}}}_{t_{0}})<\frac{1}{12} then
8:     (\widehat{{\mathbf{S}}}(z),\widehat{{\mathbf{L}}}(z))=(\widehat{{\mathbf{S}}}_{t_{0}},\widehat{{\mathbf{L}}}_{t_{0}})
9:     Return (\widehat{{\mathbf{S}}}(z),\widehat{{\mathbf{L}}}(z))
10: end if
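For concreteness, the t-sweep and the middle-region selection of Algorithm 1 can be transcribed as the following Python sketch; it reuses the illustrative helpers sparse_low_rank_split, deg_max, and inc sketched above, and the tolerance zero_tol used to declare diff_t numerically zero is our own assumption.

import numpy as np

def algorithm1(C, eps=0.01, zero_tol=1e-6):
    # Sweep t over {eps, 2*eps, ..., 1}, record diff_t, pick a t0 from the
    # middle zero region (Lemma 12), and certify the pick via condition (19).
    ts = np.arange(eps, 1.0 + eps / 2, eps)
    S_prev, L_prev = C.copy(), np.zeros_like(C)    # (S_0, L_0) = (C, 0)
    diffs, sols = [], []
    for t in ts:
        S_t, L_t = sparse_low_rank_split(C, t)     # convex program (16)
        diffs.append(np.linalg.norm(S_t - S_prev, 'fro')
                     + np.linalg.norm(L_t - L_prev, 'fro'))
        sols.append((S_t, L_t))
        S_prev, L_prev = S_t, L_t
    # Contiguous runs of grid points where diff_t is numerically zero.
    flat = np.array(diffs) < zero_tol
    runs, start = [], None
    for i, f in enumerate(flat):
        if f and start is None:
            start = i
        if not f and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(flat) - 1))
    i0, i1 = runs[len(runs) // 2]                  # the middle zero region
    S0, L0 = sols[(i0 + i1) // 2]
    if deg_max(S0) * inc(L0) < 1.0 / 12:           # sufficient condition (19)
        return S0, L0
    return None                                     # certificate not met

The support of the returned sparse part then serves as the topology estimate, as in Theorem 14.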

7 Simulation results

In this section, we demonstrate topology reconstruction of an LDIM ({\mathbf{H}},{\mathbf{e}}) with \Phi_{e} non-diagonal, from \Phi_{\mathbf{x}}^{-1}, using the sparse plus low-rank decomposition technique discussed in Section 6, for an affinely correlated network. Figs. 4(a)-4(e) respectively depict {\mathcal{G}}({\mathcal{V}},{\mathcal{E}}), {\mathcal{T}}({\mathcal{V}},{\mathcal{E}}), {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}), {\mathcal{G}}({\mathcal{V}}_{t},{\mathcal{E}}_{t}\setminus{\mathcal{E}}), and {\mathcal{G}}_{t}({\mathcal{V}}_{t},{\mathcal{E}}_{t}), described in Section 4. Simulations are performed in Matlab; Yalmip [23] with the SDP solver of [44] is used to solve the convex program (16).

For the simulation, we assume access to the true PSDM, \Phi_{\mathbf{x}}, of the LDIM of Fig. 4(a). Here, \Phi_{\mathbf{e}} is non-diagonal, with the (unknown) correlation structure shown in Fig. 4(c).

For the reconstruction, the imaginary part of the IPSDM, {\mathbf{C}}=\Im\{\Phi^{-1}_{\mathbf{x}}(z)\}, is employed in the convex optimization (16) for z=e^{j2\pi/8}. Optimization (16) is solved for all values of t\in[\epsilon,1], on a grid of spacing \epsilon=0.01. Notice that for t=0, (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t})=({\mathbf{C}},0). Fig. 5 compares tol_{t} and diff_{t} as functions of t. The pair (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t}) for t=0.36 is picked, which belongs to the middle zero region of diff_{t}, as described in [46].

{\mathbb{I}}_{\{\widehat{{\mathbf{S}}}_{t}\neq{\mathbf{0}}\}} returned the exact topology of Fig. 4(b). From {\mathbb{I}}_{\{\widehat{{\mathbf{L}}}_{t}\neq{\mathbf{0}}\}}, by following Algorithms 2 and 3 in [46], {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}) is also reconstructed, and it matches Fig. 4(c) exactly.

Figure 4: Network structure. (a) The true LDIG, {\mathcal{G}}({\mathcal{V}},{\mathcal{E}}), of ({\mathbf{H}},{\mathbf{e}}). (b) The true topology, {\mathcal{T}}({\mathcal{V}},{\mathcal{E}}), of ({\mathbf{H}},{\mathbf{e}}). (c) Correlation graph, {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}), of ({\mathbf{H}},{\mathbf{e}}); M=3. (d) Transformed correlation structure with latent nodes. (e) Transformed graph, {\mathcal{G}}_{t}({\mathcal{V}}_{t},{\mathcal{E}}_{t}), of (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}})\in{\mathcal{L}}_{q}({\mathbf{H}},{\mathbf{e}}).

Figure 5: tol_{t} and diff_{t} plots for the network of Fig. 4(a) under affine correlation.

Figure 6: tol_{t} and diff_{t} plots for the network of Fig. 4(a) under polynomial correlation.

Figure 7: Topology estimation from finite data. (a) The true topology and the topology estimated directly from \widehat{\Phi}_{o}^{-1}(e^{j2\pi/5}) for the network shown in Fig. 4(a) under affine correlation with N=10^{6} data samples; the figures show the entries of 29\times 29 matrices. In the topology figures, black denotes zero (edge absent) and white denotes one (edge present); in the gray-scale images, gray denotes zero. (b) \widehat{{\mathbf{S}}}_{t} obtained by solving (16) for {\mathbf{C}}=\Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\}; the topology is estimated from \widehat{{\mathbf{S}}}_{t} for t=0.34. The same color conventions apply.

7.1 Polynomial correlation

Here, the topology reconstruction of the network is shown when the noise processes are polynomially correlated. For the simulation, consider the LDIG shown in Fig. 4(a) with the correlation graph of Fig. 4(c). The noise processes are IID GP, as in Section 5.2, with m=2 and p=3. The entries of {\mathbf{F}} are such that only the coefficients corresponding to y_{2},y_{5}, and y_{9} are non-zero; that is, F_{ij}=0 for every 1\leq i\leq 29, 1\leq j\leq 10,~j\notin\{2,5,9\}.

Figure 6 shows the diff_{t} and tol_{t} plots obtained by applying Algorithm 1 with \epsilon=0.01. As shown in the plots, tol_{t} is zero for t\in[.28,.38], which corresponds to the exact decomposition. Additionally, as asserted by Lemma 12, diff_{t} is zero in this interval. The support of \widehat{{\mathbf{S}}}_{t} for any t\in[.28,.38] reconstructs the exact topology, which validates Theorem 14.

7.2 Finite data simulation

In this section, to evaluate the effect of finite data size, simulations are run on a synthetic data set based on the network shown in Fig. 4(e). For the PSD estimation, the Welch method [41] is used. Notice that the accuracy and the sample complexity of the estimation can be improved by employing advanced IPSDM estimation techniques from the literature; see, for example, [2, 48, 49]. Fig. 7 shows the results estimated from a sample size of 10^{6}; Fig. 7(a) shows the true and the estimated IPSDM matrices. The fourth matrix of Fig. 7(a) shows the topology estimated directly from \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\}, without decomposition. The estimation is done by hard thresholding: the edge (i,j) is detected if [\Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\}]_{i,j} is greater than a threshold, and not detected otherwise. The detection threshold is selected to obtain the minimum number of errors. 14 out of 16 edges are detected, but 6 false positive edges are also detected, for a total error of 50% (8 errors out of 16 edges). This shows that estimating the topology directly from \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\} returns an undesirable number of errors.
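For reference, the finite-data pipeline up to the direct thresholding step can be sketched as follows, assuming Python with scipy in place of the Matlab implementation used here; the normalized frequency 0.2 corresponds to z=e^{j2\pi/5}, and the function names are our own.

import numpy as np
from scipy.signal import csd

def ipsdm_imag(X, freq=0.2, nperseg=1024):
    # Welch/CSD estimate of the n x n PSD matrix of the time series in X
    # (shape n x N) at the normalized frequency `freq`, followed by inversion.
    n = X.shape[0]
    f, _ = csd(X[0], X[0], fs=1.0, nperseg=nperseg)
    k = np.argmin(np.abs(f - freq))        # nearest frequency bin
    Phi = np.empty((n, n), dtype=complex)
    for i in range(n):
        for j in range(n):
            _, Pij = csd(X[i], X[j], fs=1.0, nperseg=nperseg)
            Phi[i, j] = Pij[k]
    Phi = (Phi + Phi.conj().T) / 2         # enforce Hermitian symmetry
    return np.imag(np.linalg.inv(Phi))

def threshold_topology(C, thr):
    # Direct hard thresholding: declare edge (i, j) iff |C_ij| > thr.
    A = (np.abs(C) > thr).astype(int)
    np.fill_diagonal(A, 0)
    return A

The decomposition-based alternative of the next paragraph instead passes the same matrix C through the convex program (16) before reading off the support.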

Towards the exact topology retrieval, the optimization (16) is performed on \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\} to obtain the sparse plus low-rank decomposition. Fig. 7(b) shows the sparse part retrieved from the decomposition of \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\} for various t from 0 to 0.5. \widehat{{\mathbf{S}}}_{t} at t=0.34 is selected for estimating the topology. Thus, as illustrated by this example, with the approach proposed in the article it is possible to choose a threshold that yields 100% detection without sacrificing the false alarm performance.

In order to demonstrate that the techniques proposed in this article do not degrade drastically with a smaller number of samples, a simulation is run on 6000 samples. Detection directly from \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\} returned 48 false edges and missed one edge, for a total of 49 errors. With the decomposition, however, detection from the support of \widehat{{\mathbf{S}}}_{t} at t=.34 recovered 14 out of 16 edges with no false alarms, that is, a detection rate of 87.5% with only two errors. This confirms that the decomposition-based reconstruction proposed in the article yields substantial advantages. The sample complexity analysis of the article's methods is left for future research.

8 Conclusion

In this article, the problem of reconstructing the topology of an LDIM with spatially correlated noise sources was studied. First, assuming affine correlation and knowledge of the correlation graph, the given LDIM was transformed into an LDIM with latent nodes, where the latent nodes were characterized using the correlation graph and all the nodes were excited by uncorrelated noise sources. For polynomial correlation, a generalization of affine correlation, the latent nodes in the transformed LDIMs were excited by clusters of noise sources, where the clusters were uncorrelated with each other. Finally, using a sparse plus low-rank matrix decomposition technique, the exact topology of the LDIM was reconstructed solely from the IPSDM of the true LDIM, when the network satisfied a sufficient condition required for the matrix decomposition. Simulation results that verify the theoretical findings were provided.

Appendix A Proof of Lemma 2

Let ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}})\in{\mathcal{L}}({\mathbf{H}},{\mathbf{e}}) and let i,j\in[n]. By the definition of {\mathcal{L}}({\mathbf{H}},{\mathbf{e}}), e_{i}={\widetilde{e}}_{i}+{\mathbf{F}}_{i}\widetilde{{\mathbf{e}}}_{h}, where {\mathbf{F}}_{i} denotes the i^{\text{th}} row of {\mathbf{F}}, for any i\in[n]. To prove the only if part, suppose that \Phi_{e_{i}e_{j}}\neq 0. By definition, e_{i}e_{j}=({\widetilde{e}}_{i}+{\mathbf{F}}_{i}\widetilde{{\mathbf{e}}}_{h})({\widetilde{e}}_{j}+{\mathbf{F}}_{j}\widetilde{{\mathbf{e}}}_{h}). Then, \Phi_{e_{i}e_{j}}={\mathbf{F}}_{i}\Phi_{\widetilde{{\mathbf{e}}}_{h}}{\mathbf{F}}_{j}^{*}=\sum_{k}F_{ik}F_{jk}^{*}\Phi_{{\widetilde{e}}_{h_{k}}}, since {\widetilde{e}}_{i} and {\widetilde{e}}_{j} are uncorrelated for i,j\in[n+L]. Thus, \sum_{k}F_{ik}F_{jk}^{*}\Phi_{{\widetilde{e}}_{h_{k}}}\neq 0, which implies that there exists a k such that F_{ik}\neq 0 and F_{jk}\neq 0. In other words, there exists a latent node h_{k} in the corresponding LDIG of ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}}) such that i,j\in Ch(h_{k}).

(\Leftarrow) Let i,j\in[n] be such that \Phi_{e_{i}e_{j}}=0. Then, from the proof of the only if part, 0={\mathbf{F}}_{i}\Phi_{\widetilde{{\mathbf{e}}}_{h}}{\mathbf{F}}_{j}^{*}=\sum_{k}F_{ik}F_{jk}^{*}\Phi_{{\widetilde{e}}_{h_{k}}}, which implies that F_{ik}=0 or F_{jk}=0, \forall k, except for a few pathological cases that occur on a set of Lebesgue measure zero; we ignore the pathological cases here. Hence, there does not exist any latent node h such that both i,j\in Ch(h). ∎

The following result shows that if a subgraph {\mathcal{G}}^{\ell}({\mathcal{V}}^{\ell},{\mathcal{E}}^{\ell}) of the correlation graph {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}) forms a maximal clique in {\mathcal{G}}_{c}, then for any transformed LDIG in {\mathcal{L}}({\mathbf{H}},{\mathbf{e}}), the set of nodes {\mathcal{V}}^{\ell} is equal to the union of the children sets of some latent nodes in the LDIG.

Appendix B Proof of Lemma 3

The following lemma is useful in proving Lemma 3.

Lemma 15.

Let ({\mathbf{H}},{\mathbf{e}}) be an LDIM defined by (1) which satisfies Assumption 1, and let {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}) be the correlation graph of the exogenous noise sources {\mathbf{e}}. Suppose that {\mathcal{G}}^{\ell}({\mathcal{V}}^{\ell},{\mathcal{E}}^{\ell})\subseteq{\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}) is a maximal clique with |{\mathcal{V}}^{\ell}|>1. Then, for every LDIG ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}})\in{\mathcal{L}}({\mathbf{H}},{\mathbf{e}}), there exist latent nodes h_{1}^{\ell},\dots,h^{\ell}_{k_{\ell}} such that

{\mathcal{V}}^{\ell}=\bigcup_{i=1}^{k_{\ell}}Ch(h^{\ell}_{i}) \text{ and } {\mathcal{E}}^{\ell}=\bigcup_{i=1}^{k_{\ell}}{\mathcal{E}}_{\ell,i}, \qquad (21)

where {\mathcal{E}}_{\ell,i}:=\{(k,j):k,j\in Ch(h_{i}^{\ell})\}.

In particular, for any latent node h in the LDIG ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}}), Ch(h) forms a clique (not necessarily maximal) contained in {\mathcal{G}}^{\ell}({\mathcal{V}}^{\ell},{\mathcal{E}}^{\ell}) within {\mathcal{G}}_{c}.

Proof: Let {\mathcal{V}}^{\ell}\subseteq[n], |{\mathcal{V}}^{\ell}|>1, be such that {\mathcal{V}}^{\ell} forms a clique in {\mathcal{G}}_{c}. Lemma 2 showed that, for any i,j\in{\mathcal{V}}^{\ell}, there exists a latent node h such that i,j\in Ch(h) in the LDIG, {\mathcal{G}}, of (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}}). Since this is true for any pair i,j\in{\mathcal{V}}^{\ell}, there exists a minimal set of latent nodes \mathcal{H}^{\ell}:=\{h^{\ell}_{i}\}_{i=1}^{k_{\ell}}, k_{\ell}\in\mathbb{N}\setminus\{0\}, such that for any i,j\in{\mathcal{V}}^{\ell}, we have i,j\in Ch(h_{p}^{\ell}) for some h_{p}^{\ell}\in\mathcal{H}^{\ell}, in {\mathcal{G}}. Hence, {\mathcal{V}}^{\ell}\subseteq\bigcup_{i=1}^{k_{\ell}}Ch(h^{\ell}_{i}). Similarly, i,j\in Ch(h_{p}^{\ell}) implies (i,j)\in{\mathcal{E}}_{\ell,p}. Therefore, {\mathcal{E}}^{\ell}\subseteq\bigcup_{i=1}^{k_{\ell}}{\mathcal{E}}_{\ell,i}.

Next, we prove that all the children of a given latent node belong to a single (maximal) clique, which proves {\mathcal{V}}^{\ell}\supseteq\bigcup_{i=1}^{k_{\ell}}Ch(h^{\ell}_{i}) and {\mathcal{E}}^{\ell}\supseteq\bigcup_{i=1}^{k_{\ell}}{\mathcal{E}}_{\ell,i}. Let h\in\mathcal{H}^{\ell} be a latent node in the LDIG of (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}}) and suppose i,j\in Ch(h). Then, from the definition of (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}}), there exists an index \ell' such that F_{i\ell'}\neq 0 and F_{j\ell'}\neq 0, and hence \Phi_{e_{i}e_{j}}={\mathbf{F}}_{i}\Phi_{\widetilde{{\mathbf{e}}}_{h}}{\mathbf{F}}_{j}^{*}\neq 0 a.e. Thus, (i,j)\in{\mathcal{E}}_{c}, excluding the pathological cases. Notice that this holds for any i,j\in Ch(h). Therefore, Ch(h)\subseteq{\mathcal{V}}^{\ell} forms a clique (not necessarily maximal) in {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}). Since this is true for every h\in\mathcal{H}^{\ell}, {\mathcal{V}}^{\ell}\supseteq\bigcup_{i=1}^{k_{\ell}}Ch(h^{\ell}_{i}). A similar argument shows that {\mathcal{E}}^{\ell}\supseteq\bigcup_{i=1}^{k_{\ell}}{\mathcal{E}}_{\ell,i}. ∎

The proof of Lemma 3 follows directly from (21) and the fact that k_{\ell}\geq 1. ∎

Appendix C Proof of Theorem 4

Lemma 16.

(Pigeonhole principle) If n pigeons are put into m pigeonholes with n>m, then at least one hole must contain more than one pigeon.

We use the pigeonhole principle and Lemma 15 to prove the theorem via a contrapositive argument. Recall that the number of latent nodes L is equal to the number of cliques M. Suppose there exists a clique {\mathcal{G}}^{\ell}({\mathcal{V}}^{\ell},{\mathcal{E}}^{\ell}) such that, for some i,j\in{\mathcal{V}}^{\ell}, there does not exist a latent node h in (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}}) with i,j\in Ch(h). By Lemma 15, there exist latent nodes h,h' such that i\in Ch(h) and j\in Ch(h'). By Lemma 15 again, all the children of h are included in a single clique; that is, Ch(h)\subseteq{\mathcal{V}}^{\ell} and Ch(h')\subseteq{\mathcal{V}}^{\ell}, since i,j\in{\mathcal{V}}^{\ell}. Then, excluding {\mathcal{V}}^{\ell}, h, and h', we are left with M-1 cliques and L-2=M-2 latent nodes. Applying the pigeonhole principle with M-1 pigeons (cliques) and L-2 holes (latent nodes), there would exist at least one latent node k with Ch(k) belonging to two different maximal cliques, which contradicts Lemma 15. ∎

Appendix D Proof of Proposition 5

Let T_{1},T_{2}\in\{0,1\}^{(n+L)\times(n+L)}, T_{1}\neq T_{2}, be the topologies of two distinct transformations (\widetilde{{\mathbf{H}}}^{1},\widetilde{{\mathbf{e}}}^{1}) and (\widetilde{{\mathbf{H}}}^{2},\widetilde{{\mathbf{e}}}^{2}), respectively. Without loss of generality, let (i,j) be such that [T_{1}]_{ij}\neq 0 and [T_{2}]_{ij}=0. By the definition of ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}}), if i\leq n and j\leq n, then [T_{1}]_{ij}=[T_{2}]_{ij}={\mathbb{I}}_{\{\{H_{ij}\neq 0\}\cup\{H_{ji}\neq 0\}\}}. If i\leq n and j>n, then i is an observed node and j is a latent node. From Lemma 2, F^{1}_{ij}=F^{2}_{ij}=0 if and only if j\notin Pa(i). Thus, [T_{1}]_{ij}=[T_{2}]_{ij}, which is a contradiction, since both cannot hold. A similar contradiction arises if i>n and j\leq n. If i,j>n, then [T_{1}]_{ij}=[T_{2}]_{ij}=0, since Pa(i)=Pa(j)=\emptyset. Thus, the assumption leads to a contradiction, which implies that T_{1}=T_{2}. ∎

Appendix E Proof of Proposition 7

Consider a pair of monomials y_{k},y_{l} with y_{k}={\mathbf{v}}(0)^{\alpha} and y_{l}={\mathbf{v}}(0)^{\beta}. For notational convenience, the index 0 is omitted. Then, {\mathbb{E}}\{y_{k}y_{l}\}={\mathbb{E}}\{{\mathbf{v}}^{\alpha+\beta}\}=\prod_{i=1}^{m}{\mathbb{E}}\{{\mathbf{v}}_{i}^{\alpha_{i}+\beta_{i}}\}. By (7), {\mathbb{E}}\{y_{k}y_{l}\}\neq 0 if and only if \alpha_{i}+\beta_{i} is even for all i\in[m]. Suppose \alpha_{i} is odd. Then, \beta_{i} must be odd. Similarly, \beta_{i} must be even if \alpha_{i} is even, for all i\in[m].

Define an element-wise boolean operator {\mathcal{B}}:\mathbb{N}^{m}\mapsto\{0,1\}^{m} such that, for {\mathbf{u}}={\mathcal{B}}(\alpha), u_{i}=0 if \alpha_{i} is odd and u_{i}=1 if \alpha_{i} is even. Then, for y_{k}={\mathbf{v}}^{\alpha} and y_{l}={\mathbf{v}}^{\beta}, {\mathbb{E}}\{y_{k}y_{l}\}\neq 0 if and only if {\mathcal{B}}(\alpha)={\mathcal{B}}(\beta). Group the monomials with the same odd-even pattern into one cluster. Since {\mathcal{B}}(\cdot) can take 2^{m} distinct values, there are 2^{m} distinct clusters that are uncorrelated with each other.

Reorder {\mathbf{y}} by grouping the monomials belonging to the same cluster together to obtain \tilde{{\mathbf{y}}}, similar to the (n,p)=(2,3) example in Section 5.2. Then, \Phi_{\tilde{{\mathbf{y}}}} is a block diagonal matrix with 2^{m} blocks for m-variable polynomials, where each diagonal block corresponds to one particular pattern in \{Odd,Even\}^{m}. ∎
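To illustrate the clustering used in this proof, the following sketch enumerates the monomials of total degree between 1 and p in m variables and groups them by the parity pattern of their exponent vectors, which is an equivalent labeling of {\mathcal{B}}(\alpha); the function name is ours.

from itertools import product

def parity_clusters(m, p):
    # Group the monomials v^alpha, 1 <= |alpha| <= p, by the odd/even pattern
    # of alpha; monomials in distinct clusters are mutually uncorrelated.
    clusters = {}
    for alpha in product(range(p + 1), repeat=m):
        if 0 < sum(alpha) <= p:
            pattern = tuple(a % 2 for a in alpha)   # parity of each exponent
            clusters.setdefault(pattern, []).append(alpha)
    return clusters

# For m = 2 and p = 3, the nine non-constant monomials fall into
# 2^2 = 4 parity clusters.
print(parity_clusters(2, 3))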

Appendix F Proof of Lemma 9

Let ({\mathbf{H}},{\mathbf{F}},\widetilde{{\mathbf{e}}})\in{\mathcal{L}}({\mathbf{H}},{\mathbf{e}}) and let i,j\in[n]. By the definition of {\mathcal{L}}({\mathbf{H}},{\mathbf{e}}), e_{i}={\widetilde{e}}_{i}+{\mathbf{F}}_{i}\widetilde{{\mathbf{e}}}_{h}, where {\mathbf{F}}_{i} denotes the i^{\text{th}} row of {\mathbf{F}}, for any i\in[n]. To prove the only if part, suppose that \Phi_{e_{i}e_{j}}\neq 0. By definition, e_{i}e_{j}=({\widetilde{e}}_{i}+{\mathbf{F}}_{i}{\mathbf{y}})({\widetilde{e}}_{j}+{\mathbf{F}}_{j}{\mathbf{y}}). Since \widetilde{{\mathbf{e}}}_{i},\widetilde{{\mathbf{e}}}_{j}, and \widetilde{{\mathbf{e}}}_{h} are uncorrelated, and \Phi_{{\mathbf{y}}} is block diagonal (Proposition 8),

\Phi_{e_{i}e_{j}}={\mathbf{F}}_{i}\Phi_{{\mathbf{y}}}{\mathbf{F}}_{j}^{*}=\sum_{c=1}^{2^{m}}\sum_{k_{1},k_{2}\in{\mathcal{C}}_{c}}F_{ik_{1}}F_{jk_{2}}^{*}\Phi_{{\mathbf{y}}_{k_{1}}{\mathbf{y}}_{k_{2}}}\neq 0.

Thus, there exists a c such that F_{ik_{1}}\neq 0 and F_{jk_{2}}\neq 0 for some k_{1},k_{2}\in{\mathcal{C}}_{c}. That is, there exists a cluster c such that i,j\in Ch(c).

(\Leftarrow) Let i,j\in[n] be such that \Phi_{e_{i}e_{j}}=0. Then, from the proof of the only if part, \sum_{c=1}^{2^{m}}\sum_{k_{1},k_{2}\in{\mathcal{C}}_{c}}F_{ik_{1}}F_{jk_{2}}^{*}\Phi_{{\mathbf{y}}_{k_{1}}{\mathbf{y}}_{k_{2}}}=0. That is, for every cluster c\in[2^{m}], either F_{ik}=0 for all k\in{\mathcal{C}}_{c} or F_{jk}=0 for all k\in{\mathcal{C}}_{c}, except for a few pathological cases that occur on a set of Lebesgue measure zero; we ignore the pathological cases here. Hence, there does not exist any cluster c such that both i,j\in Ch(c). ∎

Appendix G Proof of Theorem 10

The proof is similar to that of Lemma 15, with latent nodes replaced by latent clusters, as in the proof of Lemma 9. ∎

Appendix H Proof of Theorem 14

As shown in Lemma 11, if deg_{max}(\Im\{{\mathbf{S}}\})\,inc(\Im\{{\mathbf{L}}\})<1/12, then for an appropriately selected t, the convex program (16) retrieves \Im\{{\mathbf{S}}\} and \Im\{{\mathbf{L}}\} when {\mathbf{C}}=\Im\{{\mathbf{S}}\}+\Im\{{\mathbf{L}}\}. If any one of the LDIMs (\widetilde{{\mathbf{H}}},{\mathbf{e}})\in{\mathcal{L}}_{q}({\mathbf{H}},{\mathbf{e}}) satisfies this condition, then the imaginary part of \widehat{{\mathbf{S}}}_{t}=({\mathbf{I}}_{n}-{\mathbf{H}}^{*})\Phi_{{\widetilde{e}}_{o}}^{-1}({\mathbf{I}}_{n}-{\mathbf{H}}) returns the topology among the observed nodes of (\widetilde{{\mathbf{H}}},{\mathbf{e}}), by Lemma 13. The theorem then follows from Remark 4.8 and Lemma 15. ∎

Acknowledgements

The authors acknowledge the support of NSF through the project titled ”RAPID: COVID-19 Transmission Network Reconstruction from Time-Series Data” under Award Number 2030096.

References

  • [1] Daniele Alpago, Mattia Zorzi, and Augusto Ferrante. Identification of sparse reciprocal graphical models. IEEE Control Systems Letters, 2(4):659–664, 2018.
  • [2] Daniele Alpago, Mattia Zorzi, and Augusto Ferrante. A scalable strategy for the identification of latent-variable graphical models. IEEE Transactions on Automatic Control, 67(7):3349–3362, 2022.
  • [3] Enrico Avventi, Anders G. Lindquist, and Bo Wahlberg. Arma identification of graphical models. IEEE Transactions on Automatic Control, 58(5):1167–1178, 2013.
  • [4] James M. Bower and David Beeman. The book of GENESIS: exploring realistic neural models with the GEneral NEural SImulation System. Springer Science & Business Media, 2012.
  • [5] David Carfi and Giovanni Caristi. Financial dynamical systems. Differential Geometry–Dynamical Systems, 2008.
  • [6] Elena Ceci, Yanning Shen, Georgios B. Giannakis, and Sergio Barbarossa. Graph-based learning under perturbations via total least-squares. IEEE Transactions on Signal Processing, 68:2870–2882, 2020.
  • [7] Venkat Chandrasekaran, Sujay Sanghavi, Pablo A. Parrilo, and Alan S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.
  • [8] Venkat Chandrasekaran, Pablo A. Parrilo, and Alan S. Willsky. Latent variable graphical model selection via convex optimization. The Annals of Statistics, 40(4):1935–1967, 2012.
  • [9] Valentina Ciccone, Augusto Ferrante, and Mattia Zorzi. Robust identification of “sparse plus low-rank” graphical models: An optimization approach. In 2018 IEEE Conference on Decision and Control (CDC), 2241–2246, Dec 2018.
  • [10] Valentina Ciccone, Augusto Ferrante, and Mattia Zorzi. Factor models with real data: A robust estimation of the number of factors. IEEE Transactions on Automatic Control, 64(6):2412–2425, 2019.
  • [11] Valentina Ciccone, Augusto Ferrante, and Mattia Zorzi. Learning latent variable dynamic graphical models by confidence sets selection. IEEE Transactions on Automatic Control, 65(12):5130–5143, 2020.
  • [12] David A. Cox, John Little, and Donal O’Shea. Ideals, varieties, and algorithms-an introduction to computational algebraic geometry and commutative algebra. Springer, 2007.
  • [13] Francesca Crescente, Lucia Falconi, Federica Rozzi, Augusto Ferrante, and Mattia Zorzi. Learning ar factor models. In 2020 59th IEEE Conference on Decision and Control (CDC), 274–279, 2020.
  • [14] Mihaela Dimovska and Donatello Materassi. Granger-causality meets causal inference in graphical models: Learning networks via non-invasive observations. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), 5268–5273, Dec 2017.
  • [15] Mihaela Dimovska and Donatello Materassi. A control theoretic look at granger causality: extending topology reconstruction to networks with direct feedthroughs. IEEE Transactions on Automatic Control, Early Access:1–1, 2020.
  • [16] H. J.  Dreef, M. C. F. Donkers, and Paul M. J. Van den Hof. Identifiability of linear dynamic networks through switching modules. IFAC-PapersOnLine, 54(7):37–42, 2021.
  • [17] Lucia Falconi, Augusto Ferrante, and Mattia Zorzi. A robust approach to arma factor modeling. arXiv preprint arXiv:2107.03873, 2021.
  • [18] Stefanie J. M. Fonken, Karthik Raghavan Ramaswamy, and Paul M. J. Van den Hof. A scalable multi-step least squares method for network identification with unknown disturbance topology. Automatica, 141:110295, 2022.
  • [19] M. Ghil, M. R. Allen, M. D. Dettinger, K. Ide, D. Kondrashov, M. E. Mann, A. W. Robertson, A. Saunders, Y. Tian, F. Varadi, and P. Yiou. Advanced spectral methods for climatic time series. Reviews of Geophysics, 40(1):3–1–3–41, 2002.
  • [20] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, USA, 2nd edition, 2012.
  • [21] Giacomo Innocenti and Donatello Materassi. Modeling the topology of a dynamical network via wiener filtering approach. Automatica, 48(5):936–946, 2012.
  • [22] Raphaël Liégeois, Bamdev Mishra, Mattia Zorzi, and Rudolph Sepulchre. Sparse plus low-rank autoregressive identification in neuroimaging time series. In 2015 54th IEEE Conference on Decision and Control (CDC), 3965–3970, Dec 2015.
  • [23] Johan Lofberg. Yalmip : a toolbox for modeling and optimization in matlab. In 2004 IEEE International Conference on Robotics and Automation (IEEE Cat. No.04CH37508), 284–289, Sep. 2004.
  • [24] Eduardo Mapurunga and Alexandre Sanfelici Bazanella. Optimal allocation of excitation and measurement for identification of dynamic networks. arXiv preprint arXiv:2007.09263, 2020.
  • [25] Donatello Materassi and Giacomo Innocenti. Topological identification in networks of dynamical systems. IEEE Transactions on Automatic Control, 55(8):1860–1871, 2010.
  • [26] Donatello Materassi and Murti V. Salapaka. Network reconstruction of dynamical polytrees with unobserved nodes. In 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 4629–4634, 2012.
  • [27] Donatello Materassi and Murti V. Salapaka. On the problem of reconstructing an unknown topology via locality properties of the wiener filter. IEEE Transactions on Automatic Control, 57(7):1765–1777, July 2012.
  • [28] Donatello Materassi and Murti V. Salapaka. Signal selection for estimation and identification in networks of dynamic systems: A graphical model approach. IEEE Transactions on Automatic Control, 1–1, 2019.
  • [29] Rohan Money, Joshin Krishnan, and Baltasar Beferull-Lozano. Online non-linear topology identification from graph-connected time series. In 2021 IEEE Data Science and Learning Workshop (DSLW), 1–6, 2021.
  • [30] Rohan Money, Joshin Krishnan, and Baltasar Beferull-Lozano. Random feature approximation for online nonlinear graph topology identification. In 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 1–6, 2021.
  • [31] Athanasios Papoulis. Probability and Statistics. Prentice-Hall, Inc., USA, 1990.
  • [32] Sourav Patel, Sandeep Attree, Saurav Talukdar, Mangal Prakash, and Murti V Salapaka. Distributed apportioning in a power network for providing demand response services. In 2017 IEEE International Conference on Smart Grid Communications (SmartGridComm), 38–44. IEEE, 2017.
  • [33] Christopher J. Quinn, Negar Kiyavash, and Todd P. Coleman. Directed information graphs. IEEE Transactions on Information Theory, 61(12):6887–6909, 2015.
  • [34] Venkatakrishnan C. Rajagopal, Karthik R. Ramaswamy, and Paul M. J. Van Den Hof. Learning local modules in dynamic networks without prior topology information. In 2021 60th IEEE Conference on Decision and Control (CDC), 840–845, 2021.
  • [35] Karthik R. Ramaswamy and Paul M. J. Van den Hof. A local direct method for module identification in dynamic networks with correlated noise. IEEE Transactions on Automatic Control, 1–1, 2020.
  • [36] Firoozeh Sepehr and Donatello Materassi. Blind learning of tree network topologies in the presence of hidden nodes. IEEE Transactions on Automatic Control, 65(3):1014–1028, March 2020.
  • [37] Firoozeh Sepehr and Donatello Materassi. An algorithm to learn polytree networks with hidden nodes. In Advances in Neural Information Processing Systems 32, 15110–15119. Curran Associates, Inc., 2019.
  • [38] Yanning Shen, Xiao Fu, Georgios B. Giannakis, and Nicholas D. Sidiropoulos. Topology identification of directed graphs via joint diagonalization of correlation matrices. IEEE Transactions on Signal and Information Processing over Networks, 6:271–283, 2020.
  • [39] Shengling Shi, Xiaodong Cheng, and Paul M. J. Van den Hof. Single module identifiability in linear dynamic networks with partial excitation and measurement. IEEE Transactions on Automatic Control, 68(1):285–300, December 2021.
  • [40] Jitkomut Songsiri and Lieven Vandenberghe. Topology selection in graphical models of autoregressive processes. Journal of Machine Learning Research, 11(91):2671–2705, 2010.
  • [41] Petre Stoica and Randolph L. Moses. Spectral analysis of signals, volume 452. Pearson Prentice Hall Upper Saddle River, NJ, 2005.
  • [42] Saurav Talukdar, Deepjyothi Deka, Michael Chertkov, and Murti V. Salapaka. Topology learning of radial dynamical systems with latent nodes. In 2018 Annual American Control Conference (ACC), 1096–1101, June 2018.
  • [43] Saurav Talukdar, Deepjyoti Deka, Harish Doddi, Donatello Materassi, Michael Chertkov, and Murti V. Salapaka. Physics informed topology learning in networks of linear dynamical systems. Automatica, 112:108705, 2020.
  • [44] Reha H. Tütüncü, Kim-Chuan Toh, and Michael J. Todd. SDPT3—a Matlab software package for semidefinite-quadratic-linear programming, version 3.0. Web page http://www.math.nus.edu.sg/mattohkc/sdpt3.html, 2001.
  • [45] Paul M. J. Van den Hof, Arne Dankers, Peter S. C. Heuberger, and Xavier Bombois. Identification of dynamic models in complex networks with prediction error methods—basic methods for consistent module estimates. Automatica, 49(10):2994–3006, 2013.
  • [46] Mishfad S. Veedu, Harish Doddi, and Murti V. Salapaka. Topology learning of linear dynamical systems with latent nodes using matrix decomposition. IEEE Transactions on Automatic Control, 67(11):5746–5761, Nov. 2022.
  • [47] Allen J Wood, Bruce F Wollenberg, and Gerald B Sheblé. Power generation, operation, and control. John Wiley & Sons, 2013.
  • [48] Mattia  Zorzi and Rudolph Sepulchre. Ar identification of latent-variable graphical models. IEEE Transactions on Automatic Control, 61(9):2327–2340, Sep. 2016.
  • [49] Mattia Zorzi and Alessandro Chiuso. Sparse plus low rank network identification: A nonparametric approach. Automatica, 76:355–366, 2017.