
Topology Identification under Spatially Correlated Noise

Mishfad Shaikh Veedu    Murti V. Salapaka Department of Electrical and Computer Engineering, University of Minnesota, MN 55455, USA (e-mail: {veedu002, murtis}@umn.edu).
Abstract

This article addresses the problem of reconstructing the topology of a network of agents interacting via linear dynamics, while being excited by exogenous stochastic sources that are possibly correlated across the agents, from time-series measurements alone. It is shown that, under the assumption that the correlations are affine in nature, such a network of nodal interactions is equivalent to a network with added agents. The added agents are represented by nodes that are latent, where no corresponding time-series measurements are available; however, in the expanded network all the exogenous excitations are spatially (that is, across agents) uncorrelated. Generalizing affine correlations, it is shown that, under polynomial correlations, the latent nodes in the expanded networks can be excited by clusters of noise sources, where the clusters are uncorrelated with each other. The clusters can be replaced with a single noise source if the latent nodes are allowed to have non-linear interactions. Finally, using the sparse plus low-rank matrix decomposition of the imaginary part of the inverse power spectral density matrix (IPSDM) of the time-series data, the topology of the network is reconstructed. Under nonconservative assumptions, the correlation graph of the noise sources is retrieved.

keywords:
Linear dynamical systems, time-series analysis, probabilistic graphical models, network topology identification, power spectral density, sparse estimation, latent nodes, structure learning, learning and control, sensor placement.
thanks: This work is supported by NSF through the project titled "RAPID: COVID-19 Transmission Network Reconstruction from Time-Series Data" under Award Number 2030096.


1 Introduction

Networks and graphical models provide convenient tools for effective representations of complex high dimensional multi-agent systems. Such a representation is useful in applications including power grids [32, 47], meteorology [19], neuroscience [4], and finance [5]. Knowledge of the network interaction structure (also known as the network topology) helps in understanding, predicting, and, in many applications, controlling/identifying the system behavior [16, 24, 28, 35, 39, 45]. In applications such as power grids, finance, and meteorological systems, it is either difficult or impossible to intervene and affect the system. Hence, inferring the network properties by passive means, such as time-series measurements, is of great interest in these applications. An example is unveiling the correlation structure between the stocks in the stock market from daily share prices [5], which is useful in predicting market behavior.

Learning the conditional independence relations between variables from time-series measurements is an active research field in the machine learning (ML), probabilistic graphical model (PGM), and statistics communities [6, 8, 29, 30, 33, 38], where the system modules are considered as random variables. However, such studies fail to capture dynamic dependencies between the entities in a system, which are prevalent in most of the aforementioned applications. For dynamical systems, autoregressive (AR) models that are excited by exogenous Gaussian noise sources, independent across time and variables, are explored in [1, 2, 3, 40]. Here, the graph topology captures the sparsity pattern of the regressor coefficient matrices, which characterizes the conditional independence between the variables. It was shown that the sparsity pattern of the inverse power spectral density matrix (IPSDM), also known as the concentration matrix, identifies the conditional independence relations between the variables (see [3, 40]). As shown in [27], the conditional independence structure between the variables is equivalent to the moral-graph structure of the underlying directed graph. For multi-agent systems with linear dynamical interactions, excited by wide sense stationary (WSS) noise sources that are mutually uncorrelated across the agents, moral-graph reconstruction using Wiener projection has gained popularity in the last decade [21, 25, 27]. Here, the moral-graph of the underlying linear dynamic influence model (LDIM) is recovered from the magnitude response of the Wiener coefficients, obtained from time-series measurements. For a wide range of applications, the spurious connections in the moral-graph can be identified, returning the true topology, by observing the phase response of the Wiener coefficients [43]. Furthermore, for systems with strictly causal dynamical dependencies, Granger causality based algorithms can unveil the exact cause-effect nature of the interactions, thus recovering the exact parent-child relationships of the underlying graph [14, 15, 27].

In many networks, time-series measurements are available at only a subset of the nodes. The nodes where time-series measurements are not available form the latent/confounding/hidden nodes. Topology reconstruction is more challenging in the presence of latent nodes, as additional spurious connections due to confounding effects are formed when applying the aforementioned techniques. AR model identification of the independent Gaussian time-series discussed above, in the presence of latent nodes, was studied in [9, 10, 11, 13, 17, 22, 48, 49]. Here, the primary goal is to eliminate the spurious connections due to latent nodes and retrieve the original conditional independence structure from the observed time-series. In applications such as power grids, there is a need to retrieve the complete topology of the network, including that of the latent nodes. Such problems are studied for bidirected tree [42], poly-forest [36], and poly-tree [26, 37] networks excited by WSS noise sources that are uncorrelated across the agents. Recently, an approach to reconstruct the complete topology of a general linear dynamical network with WSS noise was provided in [46]. The work in [46] can be considered a generalization of AR model identification with latent nodes, in the asymptotic time-series regime. However, one major caveat of the aforementioned literature on graphical models is that the results fail if the exogenous noise sources are spatially correlated, i.e., if the noise is correlated across the agents/variables. Prior works have studied systems with spatially correlated noise sources [18, 28, 35]; however, these studies assume knowledge of the network topology. In a related work, [34] studied topology estimation under spatially correlated noise sources and used the estimated topology in local module identification.

This article studies the problem of topology identification for LDIMs that allow spatially correlated noise sources, similar to the problem in [34]. However, this article provides an alternate treatment, where the noise correlations are transformed into latent nodes. This transformation enables one to gain additional insights and to apply techniques from topology identification with latent nodes to solve the problem.

The first major result of this article is a transformation that converts an LDIM without latent nodes, but excited by spatially correlated exogenous noise sources, to LDIMs with latent nodes. Here, the latent nodes are characterized by the maximal cliques in the correlation graph, the undirected graph that represents the spatial correlation structure. It is shown that, under the affine correlation assumption, that is, when the correlated noise sources are related in an affine way (Assumption 1), there exist transformed LDIMs with latent nodes, where all the nodes are excited by spatially uncorrelated noise sources. A key feature of the transformation is that the correlations are completely captured using the latent nodes, while the original topology remains unaltered. The original moral-graph/topology is the same as the moral-graph/topology among the observed nodes in the altered graph. Thus, the transformed problem is shown to be equivalent to topology identification of networks with latent nodes. Consequently, any of the aforementioned techniques for networks with latent nodes can be applied on the transformed LDIM to reconstruct the original moral-graph/topology.

Next, relaxing the affine correlation assumption, polynomial correlation is considered (Assumption 2); here, the focus is on noise sources with distributions that are symmetric around the mean. It is shown that, in this scenario, the transformed dynamical model can be excited by clusters of noise sources, where the clusters are uncorrelated. However, the noise sources inside the clusters can be correlated. Using the sparse plus low-rank decomposition of the IPSDM from [46], the true topology of the network, along with the correlation structure, is reconstructed from the IPSDM of the original LDIM, without any additional information, if the network satisfies a necessary and sufficient condition. Notice that the results discussed here are also applicable to networks with static random variables and to AR models, when the exogenous noise sources are spatially correlated.

The article is organized as follows. Section 2 introduces the system model, including the LDIM, and the essential definitions. Section 3 discusses IPSDM based topology reconstruction. Section 4 describes the correlation graph to latent nodes transformation and some major results. Section 5 discusses the transformation of LDIMs with polynomial correlation to LDIMs with latent nodes. Section 6 explains the sparse plus low-rank decomposition technique and how the topology can be reconstructed without knowledge of the correlation graph. Simulation results are provided in Section 7. Finally, Section 8 concludes the article.

Notations: Bold capital letters, $\mathbf{A}$, denote matrices; $A_{ij}$ and $(\mathbf{A})_{ij}$ denote the $(i,j)$-th entry of $\mathbf{A}$. Bold small letters, $\mathbf{v}$, denote vectors; $v_i$ denotes the $i$-th entry of $\mathbf{v}$. $[n]:=\{1,\dots,n\}$. The subscript $o$ denotes the observed node index set $[n]$ and the subscript $h$ denotes the latent node index set $\{n+1,\dots,n+L\}$; for example, $\mathbf{x}_o:=\{x_1,\dots,x_n\}$ and $\mathbf{x}_h:=\{x_{n+1},\dots,x_{n+L}\}$. $\Phi_{\mathbf{x}}(z):=[(\Phi_{\mathbf{x}}(z))_{ij}]$, $z\in\mathbb{C}$, $|z|=1$, denotes the power spectral density matrix, where $(\Phi_{\mathbf{x}})_{ij}$ is the cross power spectral density between the time-series at nodes $i$ and $j$. $\mathcal{F}$ denotes the set of real rational single input single output transfer functions that are analytic on the unit circle. $\mathcal{A}$ denotes the set of rationally related zero mean jointly wide sense stationary (JWSS) scalar stochastic processes. $\mathcal{Z}(\cdot)$ denotes the bilateral $z$-transform. $\mathbb{S}^n$ denotes the space of all skew-symmetric $n\times n$ matrices. $\mathbb{N}$ denotes the set of natural numbers, $\{0,1,2,\dots\}$. For a set $\mathcal{S}$, $|\mathcal{S}|$ denotes the cardinality of the set. $\|\mathbf{A}\|_*$ denotes the nuclear norm and $\|\mathbf{A}\|_1$ denotes the sum of absolute values of the entries of $\mathbf{A}$. $\angle a$ denotes the phase of the complex number $a$.

2 System model

Consider a network of $n$ interconnected nodes, where node $i$ is equipped with time-series measurements $(\widetilde{x}_i(t))_{t\in\mathbb{Z}}$, $i\in[n]$. The network interaction is described by

$$\mathbf{x}(z) = \mathbf{H}(z)\mathbf{x}(z) + \mathbf{e}(z), \qquad (1)$$

where $\mathbf{x}(z) = [x_1(z), x_2(z), \dots, x_n(z)]$ and $x_i(z) = \mathcal{Z}[\widetilde{x}_i]$. $\mathbf{e}(z) = [e_1(z), e_2(z), \dots, e_n(z)]\in\mathcal{A}^n$ are the exogenous noise sources, whose PSDM $\Phi_e(z)$ is non-singular but possibly non-diagonal. $\mathbf{H}(z)$ is the weighted adjacency matrix with $H_{ii}=0$ for all $i\in[n]$. An LDIM is defined as the pair $(\mathbf{H},\mathbf{e})$, whose output process is given by (1). The LDIM is well-posed if every entry of $(\mathbf{I}_n-\mathbf{H}(z))^{-1}$ is analytic on the unit circle, $|z|=1$, and topologically detectable if $\Phi_{\mathbf{e}}$ is positive definite.

Every LDIM has two associated graphs, viz.: 1) a directed graph, the linear dynamic influence graph (LDIG), $\mathcal{G}(\mathcal{V},\mathcal{E})$, where $\mathcal{V}=[n]$ and $\mathcal{E}:=\{(i,j):H_{ji}\neq 0\}$, and 2) an undirected graph, the correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, where $\mathcal{E}_c:=\{(i,j):(\Phi_e)_{ij}\neq 0,\ i\neq j\}$. Notice that, in a directed graph, if $(i,j)\in\mathcal{E}$ then there is a directed arrow from $i$ to $j$ in the graphical representation of $\mathcal{E}$ (see Fig. 1(a) for example). Concisely, "LDIG $(\mathbf{H},\mathbf{e})$" is used to denote the LDIG $\mathcal{G}(\mathcal{V},\mathcal{E})$ corresponding to the LDIM $(\mathbf{H},\mathbf{e})$. For a directed graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, the parent, children, and spouse sets of node $i$ are $Pa(i):=\{j:(j,i)\in\mathcal{E}\}$, $Ch(i):=\{j:(i,j)\in\mathcal{E}\}$, and $Sp(i):=\{j:j\in Pa(Ch(i))\}$, respectively. The topology of a directed graph $\mathcal{G}(\mathcal{V},\mathcal{E})$, denoted $\mathcal{T}(\mathcal{V},\mathcal{E})$, is the undirected graph obtained by removing the directions from every edge $(i,j)\in\mathcal{E}$. The moral-graph (also called the kin-graph) of $\mathcal{G}(\mathcal{V},\mathcal{E})$, denoted $kin(\mathcal{G})$, is the undirected graph $kin(\mathcal{G}):=\mathcal{G}(\mathcal{V},\mathcal{E}')$, where $\mathcal{E}':=\{(i,j):i\neq j,\ i\in(Sp(j)\cup Pa(j)\cup Ch(j))\}$. A clique is a sub-graph of a given undirected graph in which every pair of nodes is adjacent. A maximal clique is a clique that is not a subset of a larger clique. For a correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, $q$ denotes the number of maximal cliques with clique size $>1$.
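For concreteness, the moral-graph can be computed directly from the support of $\mathbf{H}$: with $\mathbf{S}$ the support of $\mathbf{H}$, nodes $i$ and $j$ are kins exactly when $(\mathbf{S}+\mathbf{S}^T+\mathbf{S}^T\mathbf{S})_{ij}\neq 0$. A minimal sketch (an illustrative helper, not part of the paper):

```python
import numpy as np

def moral_graph(H):
    """Adjacency of kin(G) from the support of H, where H[j, i] != 0
    encodes a directed edge i -> j (as in the LDIM x = Hx + e)."""
    S = (H != 0).astype(int)              # S[j, i] = 1 iff edge i -> j
    parents_children = S + S.T            # i -> j or j -> i
    spouses = S.T @ S                     # (i, j) share a common child k
    kin = ((parents_children + spouses) != 0).astype(int)
    np.fill_diagonal(kin, 0)              # no self-loops in the moral graph
    return kin

# Fig. 1(a)-style example: edges 3 -> 1 and 1 -> 2 (weights illustrative).
H = np.array([[0.0, 0.0, 0.8],
              [0.5, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
print(moral_graph(H))   # edges 1-2 and 1-3; nodes 2 and 3 are not kins
```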

The problem we address is described as follows.

Problem 1.

(P1) Consider a well-posed and topologically detectable LDIM, $(\mathbf{H},\mathbf{e})$, where $\Phi_{\mathbf{e}}$ is allowed to be non-diagonal and whose associated graphs $\mathcal{G}(\mathcal{V},\mathcal{E})$ and $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ are unknown. Given the power spectral density matrix $\Phi_{\mathbf{x}}$, where $\mathbf{x}$ is given by (1), reconstruct the topology of $\mathcal{G}$.

3 IPSDM based topology reconstruction

In this section, IPSDM based topology reconstruction is presented. In [27], the authors showed that, for any LDIM characterized by (1), the availability of the IPSDM, which can be written as

$$\Phi_{\mathbf{x}}^{-1} = (\mathbf{I}_n - \mathbf{H})^* \Phi_e^{-1} (\mathbf{I}_n - \mathbf{H}), \qquad (2)$$

is sufficient for reconstructing the moral-graph of $(\mathbf{H},\mathbf{e})$. That is, if $(\Phi_{\mathbf{x}}^{-1})_{ij}\neq 0$, then $i$ and $j$ are kins. However, an important assumption for the result to hold true is that $\Phi_e^{-1}$ is diagonal. If $\Phi_e^{-1}$ is non-diagonal, then the result does not hold in general. For $i\neq j$,

$$(\Phi_{\mathbf{x}}^{-1})_{ij} = (\Phi_e^{-1})_{ij} - (\mathbf{H}^*\Phi_e^{-1})_{ij} - (\Phi_e^{-1}\mathbf{H})_{ij} + (\mathbf{H}^*\Phi_e^{-1}\mathbf{H})_{ij} = (\Phi_e^{-1})_{ij} - \sum_{k=1}^{n}(\mathbf{H}^*)_{ik}(\Phi_e^{-1})_{kj} - \sum_{k=1}^{n}(\Phi_e^{-1})_{ik}\mathbf{H}_{kj} + \sum_{k=1}^{n}\sum_{l=1}^{n}(\mathbf{H}^*)_{ik}(\Phi_e^{-1})_{kl}\mathbf{H}_{lj},$$

and any of the four terms can cause $(\Phi_{\mathbf{x}}^{-1})_{ij}\neq 0$, depending on $\Phi_{\mathbf{e}}^{-1}$. Hence, this technique cannot be applied directly to solve Problem 1. For example, consider the network $\mathcal{G}$ given in Fig. 1(a) with $(\Phi_{\mathbf{e}}^{-1})_{12}\neq 0$. Then, $(\Phi_{\mathbf{x}}^{-1})_{23}\neq 0$ since $(\Phi_{\mathbf{e}}^{-1})_{21}\mathbf{H}_{13}\neq 0$, which implies that the estimated moral-graph contains the edge $(2,3)$, while $(2,3)\notin kin(\mathcal{G})$.
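A quick numeric check of this failure mode, using static, illustrative edge weights and an assumed noise covariance (none of these values are from the paper):

```python
import numpy as np

# Fig. 1(a): edges 3 -> 1 (h13) and 1 -> 2 (h21); weights are illustrative.
n = 3
H = np.zeros((n, n))
H[0, 2] = 0.8   # h13
H[1, 0] = 0.5   # h21

# e1 and e2 correlated, so Phi_e (and Phi_e^{-1}) has a nonzero (1,2) entry.
Phi_e = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

I = np.eye(n)
Phi_x_inv = (I - H).T @ np.linalg.inv(Phi_e) @ (I - H)   # eq. (2), real static case
print(np.round(Phi_x_inv, 3))
# The (2,3) entry is nonzero although 2 and 3 are not kins in kin(G),
# so the IPSDM support overestimates the moral-graph under correlated noise.
```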

In the next section, the correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, is scrutinized and its properties are evaluated as the first step in unveiling the true topology when the noise sources admit correlations.

4 Spatial Correlation to Latent Node Transformation

In this section, a transformation of the LDIM, $(\mathbf{H},\mathbf{e})$, to an LDIM with latent nodes is obtained by exploiting the structural properties of the noise correlation. The transformation converts an LDIM without latent nodes, driven by spatially correlated exogenous noise sources, to an LDIM with latent nodes that is excited by spatially uncorrelated exogenous noise sources. Assuming perfect knowledge of the noise correlation structure, the latent nodes and their children in the transformed LDIM are characterized. It is shown that, although the transformation is not unique, the topology of the transformation is unique under affine correlation (Assumption 1).

For $\widetilde{\mathbf{e}} = [\widetilde{\mathbf{e}}_o; \widetilde{\mathbf{e}}_h]$ and $\widetilde{\mathbf{H}} := \begin{bmatrix} \mathbf{H} & \mathbf{F} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}$, with $\mathbf{H}\in\mathcal{F}^{n\times n}$, $\mathbf{F}\in\mathcal{F}^{n\times L}$, $L\in\mathbb{N}$, the pair $(\widetilde{\mathbf{H}},\widetilde{\mathbf{e}})$ (or, with a slight abuse of notation, $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})$) denotes an LDIM with $L$ latent nodes, where $\mathbf{F}_{ik}$ denotes the directed edge weight from latent node $k$ to observed node $i$ and $\mathbf{H}_{ik}$ denotes the directed edge weight from observed node $k$ to observed node $i$. Notice that the latent nodes considered in this article are strict parents; they do not have incoming edges.

4.1 Relation between Spatial Correlation and Latent Nodes

Here, it is demonstrated that spatially correlated exogenous noise sources can be viewed as the children of a latent node that is a common parent of the correlated sources. The idea is explained in the motivating example below. Towards this, let us first define the following notion of equivalent networks.

Definition 4.1.

Let $(\mathbf{H}^{(1)},\mathbf{e}^{(1)})$ and $(\mathbf{H}^{(2)},\mathbf{e}^{(2)})$ be two LDIMs, and let $\mathbf{x}^{(1)} = (\mathbf{I}_n-\mathbf{H}^{(1)})^{-1}\mathbf{e}^{(1)}$ and $\mathbf{x}^{(2)} = (\mathbf{I}_n-\mathbf{H}^{(2)})^{-1}\mathbf{e}^{(2)}$, respectively. Then, the LDIMs $(\mathbf{H}^{(1)},\mathbf{e}^{(1)})$ and $(\mathbf{H}^{(2)},\mathbf{e}^{(2)})$ are said to be equivalent, denoted $(\mathbf{H}^{(1)},\mathbf{e}^{(1)})\equiv(\mathbf{H}^{(2)},\mathbf{e}^{(2)})$, if and only if $\Phi_{\mathbf{x}^{(1)}} = \Phi_{\mathbf{x}^{(2)}}$.

[Figure 1: Transformation of an LDIM with correlated noise sources to a network with uncorrelated noise sources in the presence of a latent node. (a) Network with correlated noise sources ($\Phi_{\mathbf{e}}$ non-diagonal). (b) Transformed model with a latent node.]

Consider the network of three nodes in Fig. 1(a). Suppose the noise sources $e_1$ and $e_2$ are correlated, that is, $\Phi_{e_1e_2}\neq 0$. Consider the network in Fig. 1(b), where $\widetilde{e}_1, \widetilde{e}_2, \widetilde{e}_4$, and $e_3$ are jointly uncorrelated and node $4$ is a latent node. Note that $e_3$ is the same in both LDIMs, while $\widetilde{e}_1, \widetilde{e}_2$ are different from $e_1, e_2$. From the LDIM of Fig. 1(a), it follows that

$$x_1 = e_1 + h_{13}x_3 \quad \text{and} \quad x_2 = e_2 + h_{21}x_1. \qquad (3)$$

From the LDIM of Fig. 1(b), it follows that

$$x_1 = \widetilde{e}_1 + h_{14}x_4 + h_{13}x_3, \quad x_2 = \widetilde{e}_2 + h_{24}x_4 + h_{21}x_1, \quad \text{and} \quad x_4 = \widetilde{e}_4. \qquad (4)$$

Comparing (3) and (4), it is evident that if $\widetilde{e}_1, \widetilde{e}_2$, and $\widetilde{e}_4$ are such that $e_1 = \widetilde{e}_1 + h_{14}\widetilde{e}_4$ and $e_2 = \widetilde{e}_2 + h_{24}\widetilde{e}_4$, then the time-series obtained from both LDIMs are identical. That is, the LDIMs are equivalent.
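A sampling-based sanity check of this construction, with illustrative static gains $h_{14}$ and $h_{24}$ (values assumed for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
h14, h24 = 0.7, -0.3

# Uncorrelated sources of the transformed LDIM in Fig. 1(b).
e1t, e2t, e4t = rng.standard_normal((3, T))

# Correlated sources of the original LDIM in Fig. 1(a).
e1 = e1t + h14 * e4t
e2 = e2t + h24 * e4t

# Cross-covariance matches the latent-node explanation: h14 * h24 * var(e4).
print(np.cov(e1, e2)[0, 1], h14 * h24)   # both close to -0.21
```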

For a given LDIM, $(\mathbf{H},\mathbf{e})$, with correlated exogenous noise sources, one can define a space of equivalent LDIMs with uncorrelated noise sources that provide the same time-series. The following definition captures this space of "LDIMs with latent nodes and uncorrelated noise sources" that are equivalent to the original LDIM with correlated noise sources.

Assumption 1.

In the LDIM $(\mathbf{H},\mathbf{e})$, defined by (1), the exogenous noise processes $e_i, e_j\in\mathcal{A}$ are correlated only via affine interactions, i.e., $\Phi_{e_ie_j}\neq 0$ if and only if there exists an affine transform, $f(x) = a + bx$, with $a\in\mathcal{A}$, $b\in\mathcal{F}$, $b\neq 0$, such that either $e_i = f(e_j)$ or $e_j = f(e_i)$.

Definition 4.2.

For any LDIM, $(\mathbf{H},\mathbf{e})$, with $\Phi_{\mathbf{e}}$ non-diagonal and satisfying Assumption 1,

$$\mathcal{L}(\mathbf{H},\mathbf{e}) := \left\{(\widetilde{\mathbf{H}},\widetilde{\mathbf{e}}) : \mathbf{e} = \widetilde{\mathbf{e}}_o + \mathbf{F}\widetilde{\mathbf{e}}_h,\ \widetilde{\mathbf{H}} = \begin{bmatrix} \mathbf{H} & \mathbf{F} \\ \mathbf{0} & \mathbf{0} \end{bmatrix},\ \widetilde{\mathbf{e}} = [\widetilde{\mathbf{e}}_o, \widetilde{\mathbf{e}}_h]^T,\ \widetilde{\mathbf{e}}\in\mathcal{A}^{n+L},\ \mathbf{F}\in\mathcal{F}^{n\times L},\ L\in\mathbb{N},\ \Phi_{\widetilde{\mathbf{e}}}\text{ diagonal}\right\} \qquad (5)$$

is the space of all $\mathcal{L}$-transformations for a given $(\mathbf{H},\mathbf{e})$.

Remark 4.3.

In Definition 4.2, the number of latent nodes, $L$, is not fixed a priori.

Based on the definition of $\mathcal{L}(\mathbf{H},\mathbf{e})$ in (5), the LDIM $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})$ can be rewritten as

$$\mathbf{x}(z) = \mathbf{H}(z)\mathbf{x}(z) + \mathbf{G}(z)\widetilde{\mathbf{e}}(z), \qquad (6)$$

where $\widetilde{\mathbf{e}}(z) = [\widetilde{e}_1(z), \dots, \widetilde{e}_n(z), \dots, \widetilde{e}_{n+L}(z)]$ are mutually uncorrelated and $\mathbf{G} = [\mathbf{I}_n, \mathbf{F}]$.

[Figure 2: Demonstration of $\mathcal{L}(\mathbf{H},\mathbf{e})$; noise sources are not shown. (a) Original LDIG, $\mathcal{G}(\mathcal{V},\mathcal{E})$. (b) Correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, of $\mathcal{G}(\mathcal{V},\mathcal{E})$. (c) Example transformed graph in $\mathcal{L}(\mathbf{H},\mathbf{e})$ with one latent node. (d) Example transformed graph in $\mathcal{L}(\mathbf{H},\mathbf{e})$ with three latent nodes. The exogenous noises are uncorrelated in (c) and (d).]

An LDIM, $(\widetilde{\mathbf{H}},\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$, is called a transformed LDIM of $(\mathbf{H},\mathbf{e})$ in this article. Intuitively, the $\mathcal{L}$-transformation completely assigns the spatial correlation component present in the original noise sources, $\mathbf{e}$, to latent nodes, without altering the LDIG, $\mathcal{G}(\mathcal{V},\mathcal{E})$, similar to the discussion of the aforementioned motivating example. For the LDIM in Fig. 2(a) with the correlation graph in Fig. 2(b), Fig. 2(c) and Fig. 2(d) show some examples of LDIGs that belong to $\mathcal{L}(\mathbf{H},\mathbf{e})$. Thus, the $\mathcal{L}$-transformation returns a larger network with latent nodes. Notice that the latent nodes present in an LDIG of $(\widetilde{\mathbf{H}},\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ are strict parents, whose interactions with the true nodes are characterized by $\mathbf{F}$.

The following lemma formalizes the relation between the correlation graph and the latent nodes in the transformed higher dimensional LDIMs. In particular, it shows that two noise sources $e_i$ and $e_j$ are correlated if and only if there is a latent node that is a common parent of both the nodes $i$ and $j$ in the transformed LDIM.

Lemma 2.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM defined by (1) that satisfies Assumption 1, and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of $\{e_i\}_{i=1}^n$. Then, for every distinct $i,j\in[n]$, $(i,j)\in\mathcal{E}_c$ if and only if every LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ contains a latent node $h$ such that $h\in Pa(i)\cap Pa(j)$.

Proof: Refer to Appendix A. ∎

Thus, $e_i$ and $e_j$ in the original LDIM are correlated if and only if for every $\mathcal{L}$-transformed LDIM there exists a $k$ such that $F_{ik}\neq 0$ and $F_{jk}\neq 0$.

The following lemma shows that the number of latent nodes, $L$, present in any $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ is at least $q$, the number of maximal cliques with clique size greater than one in $\mathcal{G}_c$.

Lemma 3.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources, $\mathbf{e}$. Then, the number, $L$, of latent nodes present in any $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ satisfies $L\geq q$, where $q$ is the number of maximal cliques with clique size $>1$ in $\mathcal{G}_c$.

Proof: Refer to Appendix B. ∎
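The count $q$ in Lemma 3 is easy to obtain once $\mathcal{G}_c$ is known; a sketch using networkx (the correlation graph below is illustrative, not from the paper):

```python
import networkx as nx

# Illustrative correlation graph: sources 1, 2, 3 pairwise correlated,
# source 4 uncorrelated with the rest.
Gc = nx.Graph()
Gc.add_nodes_from([1, 2, 3, 4])
Gc.add_edges_from([(1, 2), (2, 3), (1, 3)])

# q = number of maximal cliques with clique size > 1 (Lemma 3: L >= q).
cliques = [c for c in nx.find_cliques(Gc) if len(c) > 1]
print(len(cliques), cliques)   # 1 [[1, 2, 3]]
```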

Remark 4.4.

Let $\mathcal{G}_c(\mathcal{V}_c,\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources. Then, for any given maximal clique $\mathcal{G}^\ell(\mathcal{V}^\ell,\mathcal{E}^\ell)\subseteq\mathcal{G}_c(\mathcal{V}_c,\mathcal{E}_c)$ and for any $k_\ell\geq 1$, there exists a transformed LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ with $k_\ell$ latent nodes such that the set $\mathcal{V}^\ell$ is equal to the set of children of the latent nodes. For $|\mathcal{V}^\ell|=3$, Fig. 2 shows examples with $k_\ell = 1, 3$ (see the proof of Lemma 3 for details).

As shown in Lemma 3 and Fig. 2, the LDIMs in the characterizing space $\mathcal{L}(\mathbf{H},\mathbf{e})$ of the equivalent LDIMs can have an arbitrary number of latent nodes, which leads to multiple transformed representations with varying numbers of latent nodes. The following definition restricts the number of latent nodes present in the equivalent transformed LDIGs and provides a minimal transformed representation, that is, a representation with the minimal number of nodes, of the true LDIG.

Definition 4.5.

Define $\mathcal{L}_q(\mathbf{H},\mathbf{e})$ to be the set of all LDIMs in $\mathcal{L}(\mathbf{H},\mathbf{e})$ with the number of latent nodes equal to the number of maximal cliques with clique size $>1$ present in $\mathcal{G}_c$.

The following theorem shows that there exists a unique latent node in every transformed LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$ corresponding to each maximal clique.

Theorem 4.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM that satisfies Assumption 1 and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources, $\mathbf{e}$. Suppose $\mathcal{G}_c$ has $q$ maximal cliques $C_1,\dots,C_q$. Consider any LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$ with $q$ latent nodes, $h_1,\dots,h_q$. Then, for each maximal clique $C_i$ in $\mathcal{G}_c$, there exists a unique latent node $h\in\{h_1,\dots,h_q\}$ such that $Ch(h) = C_i$.

Proof: Refer to Appendix C. ∎

Remark 4.6.

Notice that the existence of the unique latent node holds only for the space $\mathcal{L}_q(\mathbf{H},\mathbf{e})$. Such a unique node might not exist for the transformed representations in $\mathcal{L}(\mathbf{H},\mathbf{e})$. In other words, Theorem 4 identifies the minimal set of latent nodes required to explain the data.

4.2 Uniqueness of the topology

Here, the LDIMs in $\mathcal{L}_q(\mathbf{H},\mathbf{e})$ from the previous section are studied more carefully. In topology reconstruction, only the support structure of the transfer functions matters. The following proposition shows that this support structure is unique.

Proposition 5.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM that satisfies Assumption 1. The topology of every LDIG $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$ is the same.

Proof: Refer to Appendix D. ∎

Based on the above results, the following transformation of Problem (P1) is formulated.

Problem 6.

(P2) Consider an LDIM $(\mathbf{H},\mathbf{e})$ defined by (1) that satisfies Assumption 1, with $\Phi_{\mathbf{e}}$ allowed to be non-diagonal. Let $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$ be an LDIM with LDIG given by $\mathcal{G}_t(\mathcal{V}_t,\mathcal{E}_t)$. Suppose that the time-series data among the observed nodes of $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})$ is given. Then, reconstruct the topology among the observed nodes of $\mathcal{G}_t$.

Remark 4.7.

The time-series data among the observed nodes of the transformed LDIM is the same as the time-series obtained from the original LDIM.

Remark 4.8.

The problems (P1) and (P2) are equivalent. That is, the topologies reconstructed in both problems are the same, because the topology among the observed nodes of any element of $\mathcal{L}(\mathbf{H},\mathbf{e})$ is the same as the topology of $(\mathbf{H},\mathbf{e})$ (see Definition 4.2 and the proof of Proposition 5).

Therefore, instead of reconstructing the topology of $(\mathbf{H},\mathbf{e})$ with $\Phi_e$ non-diagonal, it is sufficient to reconstruct the topology among the observed nodes for one of the LDIMs $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}_q(\mathbf{H},\mathbf{e})$, which has $\Phi_{\widetilde{\mathbf{e}}}$ diagonal.

5 Polynomial Correlation

The results in the previous section assumed that the correlations between the exogenous noise sources are affine in nature. In this section, a generalization of the affine correlation is addressed. It is shown that noise sources that are correlated via non-affine, but polynomial, interactions can be characterized using latent nodes with non-linear interaction dynamics. By lifting the processes to a higher dimension, the non-linearity is converted to linear interactions.

The following definitions are useful for the presentation in this section.

Definition 5.1.

[12] Let $x = (x_1,\dots,x_m)$. A monomial in $x_1,\dots,x_m$ is a product of the form $x^\alpha := x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_m^{\alpha_m}$, where $\alpha\in\mathbb{N}^m$. A polynomial $P$ in $x$ with coefficients in a field (or a commutative ring), $\mathbb{F}$, is a finite linear combination of monomials, i.e.,

$$P(x) := \sum_\alpha a_\alpha x^\alpha, \quad a_\alpha\in\mathbb{F}.$$

(i) The degree of a monomial $x^\alpha$ is $|\alpha| := \sum_{i=1}^m \alpha_i$.

(ii) The total degree of $P\neq 0$ is the maximum of $|\alpha|$ such that $a_\alpha\neq 0$.

Definition 5.2.

For any $\mathbf{v}\in\mathcal{A}^m$, define the list of monomials of total degree at most $p$,

$$\mathcal{M}(\mathbf{v},p) := [f_0(\mathbf{v}), f_1(\mathbf{v}), \dots, f_p(\mathbf{v})]^T,$$

where $f_k(\mathbf{v})$ lists all the degree-$k$ monomials, $f_k(\mathbf{v}) := \{\mathbf{v}_1^{\alpha_1}\mathbf{v}_2^{\alpha_2}\cdots\mathbf{v}_m^{\alpha_m} \mid \sum_{i=1}^m \alpha_i = k\}$. For example, when $m=2$, $f_0(\mathbf{v})=1$, $f_1(\mathbf{v})=\{\mathbf{v}_1,\mathbf{v}_2\}$, and $f_2(\mathbf{v})=\{\mathbf{v}_1^2,\mathbf{v}_1\mathbf{v}_2,\mathbf{v}_2^2\}$. In general, the total number of monomials of degree $k$ is $\binom{m+k-1}{k}$, where $\binom{n}{k}=\frac{n!}{k!(n-k)!}$. The total number of monomials up to total degree $p$ is $M=\sum_{k=0}^p \binom{m+k-1}{k}$. For $m=2$ and $p=3$, $\mathcal{M}(\mathbf{v},3) = [1, \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_1^2, \mathbf{v}_1\mathbf{v}_2, \mathbf{v}_2^2, \mathbf{v}_1^3, \mathbf{v}_1^2\mathbf{v}_2, \mathbf{v}_1\mathbf{v}_2^2, \mathbf{v}_2^3]^T$.
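A small enumeration helper (hypothetical, for illustration) that generates the exponent vectors of $\mathcal{M}(\mathbf{v},p)$ and verifies the count $M$:

```python
from itertools import combinations_with_replacement
from math import comb

def monomial_exponents(m, p):
    """Exponent vectors alpha with |alpha| <= p for M(v, p), grouped by degree."""
    exps = []
    for k in range(p + 1):
        # Choosing k variable indices with repetition <-> a degree-k monomial.
        for combo in combinations_with_replacement(range(m), k):
            alpha = [0] * m
            for i in combo:
                alpha[i] += 1
            exps.append(tuple(alpha))
    return exps

m, p = 2, 3
exps = monomial_exponents(m, p)
M = sum(comb(m + k - 1, k) for k in range(p + 1))
print(len(exps) == M, M)   # True 10, matching M(v, 3) in Definition 5.2
```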

5.1 Characterization of Polynomial Correlations

Suppose $e_1 = \sum_{|\alpha|\leq M} a_{\alpha,1}\mathbf{v}^\alpha$ and $e_2 = \sum_{|\alpha|\leq M} a_{\alpha,2}\mathbf{v}^\alpha$, where $a_{\alpha,1}, a_{\alpha,2}\in\mathcal{F}$ and $\mathbf{v}\in\mathcal{A}^m$, $m, M\in\mathbb{N}$, with $\Phi_{\mathbf{v}}$ diagonal. Let $\mathbf{y} = \mathcal{M}(\mathbf{v},p)$ be the vector obtained by concatenating $\mathbf{v}^\alpha$, $|\alpha|\leq p$. Writing $e_1 = \mathbf{A}_1\mathbf{y}$ and $e_2 = \mathbf{A}_2\mathbf{y}$, we have $\Phi_{e_1e_2} = \mathbf{A}_1\Phi_{\mathbf{y}}\mathbf{A}_2^*$, where $\mathbf{A}_i$ is the vector obtained by concatenating the $a_{\alpha,i}$.

Notice that $\mathbf{y}$ is a lifting of $\mathbf{v}$ into a higher dimensional space of polynomials. In the following, a discussion of the structure of $\Phi_{\mathbf{y}}$ is provided with the help of examples.

5.2 Example: Lifting of a zero mean Gaussian Process

To illustrate the lifting of noise processes to a higher dimension, consider an independent and identically distributed (IID) Gaussian process (GP). It is shown that, under lifting, the power spectral density matrix (PSDM) is block diagonal.

Consider an IID GP $\{\mathbf{v}(k), k\in\mathbb{Z} \mid \mathbf{v}(k)\sim N(0,\sigma^2\mathbf{I}_m)\}$. Then, $\mathbb{E}\{\mathbf{v}_i\mathbf{v}_j\} = \mathbb{E}\{\mathbf{v}_i\}\mathbb{E}\{\mathbf{v}_j\} = 0$ for $i\neq j$. It can be shown that [31]

$$\mathbb{E}\{\mathbf{v}_i^p\} = \begin{cases} 0 & \text{if } p \text{ is odd,} \\ \sigma^p (p-1)!! & \text{if } p \text{ is even,} \end{cases} \qquad (7)$$

where the double factorial is $(p-1)!! := (p-1)(p-3)\cdots 3\cdot 1$ for even $p$. Consider $m=2$ and let $\mathbf{y} = [y_1,\dots,y_{10}] := \mathcal{M}(\mathbf{v},3) = [1, \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_1^2, \mathbf{v}_1\mathbf{v}_2, \mathbf{v}_2^2, \mathbf{v}_1^3, \mathbf{v}_1^2\mathbf{v}_2, \mathbf{v}_1\mathbf{v}_2^2, \mathbf{v}_2^3]^T$. That is, $\mathbf{y}$ lists all the monomials of $\mathbf{v}_1, \mathbf{v}_2$ with degree $\leq 3$.

Notice that $\mathbb{E}\{\mathbf{v}_1^i\mathbf{v}_2^j\} = \mathbb{E}\{\mathbf{v}_1^i\}\mathbb{E}\{\mathbf{v}_2^j\} \neq 0$ if and only if both $i$ and $j$ are even. Then, $\mathbb{E}\{y_2y_5\} = \mathbb{E}\{\mathbf{v}_1^2\mathbf{v}_2\} = 0$. Straightforward computation shows that $\mathbb{E}\{y_2y_k\} \neq 0$ only for $k=2,7,9$. Similarly, $\mathbb{E}\{y_7y_k\} \neq 0$ if and only if $k=2,7,9$, and $\mathbb{E}\{y_9y_k\} \neq 0$ if and only if $k=2,7,9$. Notice that $k=2,7,9$ corresponds to the terms with an odd power of $\mathbf{v}_1$ and an even power of $\mathbf{v}_2$. Repeating the same for every $\mathbb{E}\{y_iy_k\}$, $1\leq i,k\leq 10$, one can show that, after an appropriate rearrangement of rows and columns, the covariance matrix and the PSDM of $\mathbf{y}$ form a block diagonal matrix, given by (8). Here, $\tilde{\mathbf{y}} = [y_1, y_4, y_6, y_2, y_7, y_9, y_5, y_3, y_8, y_{10}]$.

$$\Phi_{\tilde{\mathbf{y}}} = \begin{bmatrix}
\Phi_{11} & \Phi_{14} & \Phi_{16} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\Phi_{41} & \Phi_{44} & \Phi_{46} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\Phi_{61} & \Phi_{64} & \Phi_{66} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \Phi_{22} & \Phi_{27} & \Phi_{29} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \Phi_{72} & \Phi_{77} & \Phi_{79} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \Phi_{92} & \Phi_{97} & \Phi_{99} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \Phi_{55} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \Phi_{33} & \Phi_{38} & \Phi_{3,10} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \Phi_{83} & \Phi_{88} & \Phi_{8,10} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \Phi_{10,3} & \Phi_{10,8} & \Phi_{10,10}
\end{bmatrix} \qquad (8)$$

It is worth mentioning that, since the Gaussian process is zero mean IID, the auto-correlation function $R_{\mathbf{v}_i\mathbf{v}_j}(k) = R_{\mathbf{v}_i\mathbf{v}_j}(0)\delta(k) = \mathbb{E}\{\mathbf{v}_i(0)\mathbf{v}_j(0)\}\delta(k)$, where $\delta$ is the Kronecker delta, and thus $\Phi_{\tilde{\mathbf{y}}}(z) = R_{\tilde{\mathbf{y}}}(0)$ for all $|z|=1$, i.e., the PSDM is white (the same for all frequencies).
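An empirical check of this block structure by sampling (a sketch; the sample size, permutation, and threshold are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
v1, v2 = rng.standard_normal((2, N))     # sigma = 1

# y = M(v, 3) for m = 2, ordered as in Section 5.2.
y = np.stack([np.ones(N), v1, v2, v1**2, v1*v2, v2**2,
              v1**3, v1**2 * v2, v1 * v2**2, v2**3])

# Permute into parity classes: {y1,y4,y6}, {y2,y7,y9}, {y5}, {y3,y8,y10}.
perm = [0, 3, 5, 1, 6, 8, 4, 2, 7, 9]
R = y[perm] @ y[perm].T / N              # second-moment matrix of the lift
print((np.abs(R) > 0.1).astype(int))     # support matches the blocks in (8)
```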

[Figure 3: Transformation of an LDIM with correlated noise sources to a network with uncorrelated noise sources in the presence of a latent node. (a) Original LDIG, $\mathcal{G}(\mathcal{V},\mathcal{E})$. (b) A correlation graph of $\mathcal{G}(\mathcal{V},\mathcal{E})$. (c) An example transformed LDIG of the sub-figures above with $m=2$, $p=3$, and $n=3$, with $2^m=4$ clusters. Here, $e_1 = \widetilde{e}_1 + F_{12}y_2 + F_{17}y_7 + F_{19}y_9$, $e_2 = \widetilde{e}_2 + F_{22}y_2 + F_{27}y_7 + F_{29}y_9 + F_{21}y_1 + F_{24}y_4 + F_{26}y_6$, and $e_3 = \widetilde{e}_3 + F_{31}y_1 + F_{34}y_4 + F_{36}y_6$. (d) Transformed graph of the above LDIG when all three nodes are correlated; the latent node $4$ captures the interactions due to the nodes $y_1,\dots,y_{10}$.]

The following proposition shows this for general $m$ and $p$.

Proposition 7.

Consider the Gaussian IID process $\{\mathbf{v}(k), k\in\mathbb{Z} \mid \mathbf{v}(k)\sim N(0,\sigma^2\mathbf{I}_m)\}$. Let $\mathbf{y} = \mathcal{M}(\mathbf{v},p)$. Then, there exists $\tilde{\mathbf{y}}$, a permutation of $\mathbf{y}$, such that $\Phi_{\tilde{\mathbf{y}}}$ is block diagonal with $2^m$ non-zero blocks.

Proof: Refer to Appendix E. ∎

Based on this example, one can extend the result to symmetric zero mean WSS processes. Symmetric distributions are distributions with probability density functions that are symmetric around the mean, for example, the Gaussian distribution.

Definition 5.3.

A probability distribution is said to be symmetric around the mean if and only if its density function, $f$, satisfies $f(\mu - x) = f(\mu + x)$ for every $x\in\mathbb{R}$, where $\mu$ is the mean of the distribution.

Proposition 8.

Consider a zero mean WSS process, $\mathbf{v}\in\mathcal{A}^m$, with symmetric distribution and $\Phi_{\mathbf{v}}$ diagonal. Let $\mathbf{y} = \mathcal{M}(\mathbf{v},p)$. Then, there exists $\tilde{\mathbf{y}}$, a permutation of $\mathbf{y}$, such that $\Phi_{\tilde{\mathbf{y}}}$ is block diagonal with $2^m$ non-zero blocks.

Proof: The proof is similar to the IID GP case. For symmetric distributions, the odd moments are zero, since they are integrals of odd functions. ∎

As shown in the proof of Proposition 7, the monomial nodes corresponding to a given diagonal block (i.e., monomials with the same odd-even exponent pattern) can be grouped into a cluster. Fig. 3(c) shows such a clustering with $m=2$ and $p=3$. The red nodes are the lifted processes in the higher dimension (the monomial nodes $y_1,\dots,y_{10}$). The nodes inside a given cluster (shown by a blue ellipse) are correlated with each other, and nodes from different clusters are uncorrelated; this is a result of $\Phi_{\tilde{\mathbf{y}}}$ being block diagonal. Here, $e_1$ and $e_2$ are correlated via the cluster of $y_2$, $y_7$, and $y_9$, whereas $e_2$ and $e_3$ are correlated via the cluster of $y_1$, $y_4$, and $y_6$. Notice the following important distinction from the LDIMs in Fig. 1. Here, it is sufficient for the observed nodes $1, 2$ to be connected to one of the latent nodes in a given cluster. It is not necessary for the nodes to have a common parent, as in Fig. 1. The common parent property from Section 4 is replaced here with a common ancestral cluster.

5.3 Transformation of Polynomial Correlation to Latent Nodes

Based on the aforementioned discussion, a relaxation of Assumption 1 and an extension of the LDIM transformation results from Section 4 are presented here. It is shown that, in order to relax Assumption 1, non-linear interactions are required between the latent nodes and the observed nodes. The following assumption is a generalization of Assumption 1.

Assumption 2.

In the LDIM $(\mathbf{H},\mathbf{e})$, defined by (1), the noise processes $e_i, e_j\in\mathcal{A}$ are correlated if and only if there exist polynomials $P_1, P_2$ and $\widetilde{\mathbf{e}} = [\mathbf{v}^T, \widetilde{e}_i, \widetilde{e}_j]^T$, $\mathbf{v}\in\mathcal{A}^m$, $\widetilde{e}_i, \widetilde{e}_j\in\mathcal{A}$, with $\Phi_{\widetilde{\mathbf{e}}}$ diagonal, $m\in\mathbb{N}$, such that $e_i = \widetilde{e}_i + P_1(\mathbf{v})$ and $e_j = \widetilde{e}_j + P_2(\mathbf{v})$, where $P_k(\mathbf{v}) = \sum_{|\alpha|\leq M} a_{\alpha,k}\mathbf{v}^\alpha$, $a_{\alpha,k}\in\mathcal{F}$, for some $M\in\mathbb{N}$.

Remark 5.4.

To be precise, the direct extension of Assumption 1 is given by $y = P(x)$, where $x\in\mathcal{A}$ and $P(x) = \sum_\alpha a_\alpha x^\alpha$, $a_\alpha\in\mathcal{F}$, is a polynomial of degree at most $M$. Any $x\in\mathcal{A}$ can be written as $x = \widetilde{e}_x + bv$, where $\widetilde{e}_x, v\in\mathcal{A}$, $b\in\mathcal{F}$, and $\widetilde{e}_x$ is uncorrelated with $v$. Then, $y = P(\widetilde{e}_x + bv)$, which is a special case of Assumption 2.

Definition 5.5.

For any LDIM, $(\mathbf{H},\mathbf{e})$, with $\Phi_{\mathbf{e}}$ non-diagonal and satisfying Assumption 2, and $p>1$,

$$\mathcal{L}^{(p)}(\mathbf{H},\mathbf{e}) := \left\{(\widetilde{\mathbf{H}},\mathbf{y}) : \mathbf{e} = \widetilde{\mathbf{e}}_o + \mathbf{F}\mathbf{y},\ \widetilde{\mathbf{H}} = \begin{bmatrix} \mathbf{H} & \mathbf{F} \\ \mathbf{0} & \mathbf{0} \end{bmatrix},\ \mathbf{F}\in\mathcal{F}^{n\times M},\ \mathbf{y} = \mathcal{M}(\mathbf{v},p),\ \widetilde{\mathbf{e}}_o\in\mathcal{A}^n,\ \mathbf{v}\in\mathcal{A}^m,\ m\in\mathbb{N},\ \widetilde{\mathbf{e}} = [\widetilde{\mathbf{e}}_o, \mathbf{v}]^T,\ M = \sum_{k=0}^p \binom{m+k-1}{k},\ \Phi_{\widetilde{\mathbf{e}}}\text{ diagonal}\right\} \qquad (9)$$

is the space of all $\mathcal{L}^{(p)}$-transformations for a given $(\mathbf{H},\mathbf{e})$. The matrix $\mathbf{F}$ is obtained by concatenating columns $\mathbf{F}_\alpha\in\mathcal{F}^{n\times 1}$: first all the $\mathbf{F}_\alpha$ corresponding to degree-one monomials in lexicographic order, then degree $2$ in lexicographic order, and so on. That is, $\mathbf{F} := [\mathbf{F}_{b_1},\dots,\mathbf{F}_{b_L},\mathbf{F}_{b_{11}},\mathbf{F}_{b_{12}},\dots,\mathbf{F}_{b_{1L}},\mathbf{F}_{b_{21}},\mathbf{F}_{b_{22}},\dots]$, where $\mathbf{F}_{b_{i_1\dots i_k}}$ denotes the column of $\mathbf{F}$ corresponding to $\alpha = b_{i_1}+\dots+b_{i_k}$, with $b_i$ the $i$-th canonical basis vector. For $m=2$, $p=3$ from Section 5.2, $\mathbf{F} = [\mathbf{F}_{b_1},\mathbf{F}_{b_2},\mathbf{F}_{b_{11}},\mathbf{F}_{b_{12}},\mathbf{F}_{b_{22}},\mathbf{F}_{b_{111}},\mathbf{F}_{b_{112}},\mathbf{F}_{b_{122}},\mathbf{F}_{b_{222}}]$. Here, $\mathbf{F}_{b_2} = \mathbf{F}_{(0,1)}$, $\mathbf{F}_{b_{12}} = \mathbf{F}_{(1,1)}$, and $\mathbf{F}_{b_{222}} = \mathbf{F}_{(0,3)}$.

With this new "polynomial lifting" definition, one can transform a given LDIM with correlated noise sources to a transformed LDIM with latent nodes. As shown in the following lemma, the transformation of correlations to uncorrelated latent nodes from Section 4 is replaced here with uncorrelated latent clusters. For any cluster $c$, $Ch(c) := \bigcup_{i\in c} Ch(i)$; that is, $Ch(c)$ denotes the union of the children of the nodes present in cluster $c$.

Lemma 9.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM that satisfies Assumption 2, and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources, $\{e_i\}_{i=1}^n$.

Then, for every distinct $i,j\in[n]$, $(i,j)\in\mathcal{E}_c$ if and only if for every LDIG $(\widetilde{\mathbf{H}},\mathbf{y})\in\mathcal{L}^{(p)}(\mathbf{H},\mathbf{e})$, there exists a cluster $c$ such that $i,j\in Ch(c)$.

Proof: Refer to Appendix F. ∎

The following theorem shows that a subgraph, $\mathcal{G}^\ell(\mathcal{V}^\ell,\mathcal{E}^\ell)$, of the correlation graph, $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$, forms a maximal clique in $\mathcal{G}_c$ if and only if, for any transformed LDIG in $\mathcal{L}^{(p)}(\mathbf{H},\mathbf{e})$, the set of nodes $\mathcal{V}^\ell$ is equal to the set of children of some latent clusters.

Theorem 10.

Let $(\mathbf{H},\mathbf{e})$ be an LDIM defined by (1) that satisfies Assumption 2, and let $\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ be the correlation graph of the exogenous noise sources, $\mathbf{e}$. Suppose that $\mathcal{G}^\ell(\mathcal{V}^\ell,\mathcal{E}^\ell)\subseteq\mathcal{G}_c(\mathcal{V},\mathcal{E}_c)$ is a maximal clique with $|\mathcal{V}^\ell|>1$. Then, for any LDIM $(\widetilde{\mathbf{H}},\mathbf{y})\in\mathcal{L}^{(p)}(\mathbf{H},\mathbf{e})$, there exist latent clusters $C_1^\ell,\dots,C_{k_\ell}^\ell$ in the LDIG of $(\widetilde{\mathbf{H}},\mathbf{y})$ such that

$$\mathcal{V}^\ell = \bigcup_{i=1}^{k_\ell} Ch(C_i^\ell) \quad \text{and} \quad \mathcal{E}^\ell = \bigcup_{i=1}^{k_\ell} \mathcal{E}_{\ell,i}, \qquad (10)$$

where $\mathcal{E}_{\ell,i} := \{(k,j) : k,j\in Ch(C_i^\ell)\}$. In particular, for any latent cluster $C$ in the LDIG of $(\widetilde{\mathbf{H}},\mathbf{y})$, $Ch(C)$ forms a clique in $\mathcal{G}_c$.

Proof: See Appendix G. ∎

Consider the LDIG shown in Fig. 3(a) with the correlation graph given by Fig. 3(b). As shown in Fig. 3(d), if all three nodes are correlated, one can transform this LDIM to an LDIM with latent node $4$. However, here node $4$ should capture the interactions from the latent nodes $y_1,\dots,y_{10}$. Therefore, during reconstruction, one should accommodate non-linear interactions between the latent node and the observed nodes.

In the next section, we describe a way to perform the reconstruction when $\mathcal{G}_c$ is unavailable.

6 Moral-Graph Reconstruction by Matrix Decomposition

In the previous sections, frameworks that convert a network with spatially correlated noise to a network with latent nodes were studied. In this section, a technique is provided to reconstruct the topology of the transformed LDIMs using the sparse plus low-rank matrix decomposition of the IPSDM obtained from the observed time-series. Note that this result does not require any extra information other than the IPSDM obtained from the true LDIM.

6.1 Topology Reconstruction under Affine Correlation

Here, reconstruction under affine correlation is discussed. Recall from Definition 4.2 that $\mathbf{e} = \widetilde{\mathbf{e}}_o + \mathbf{F}\widetilde{\mathbf{e}}_h$. Then, the PSDM of $\mathbf{e}$, $\Phi_e$, can be written as [27]: $\Phi_e = \Phi_{\widetilde{\mathbf{e}}_o} + \Phi_{\widetilde{\mathbf{e}}_o\widetilde{\mathbf{e}}_h}\mathbf{F}^* + \mathbf{F}\Phi_{\widetilde{\mathbf{e}}_h\widetilde{\mathbf{e}}_o} + \mathbf{F}\Phi_{\widetilde{\mathbf{e}}_h}\mathbf{F}^* = \Phi_{\widetilde{\mathbf{e}}_o} + \mathbf{F}\Phi_{\widetilde{\mathbf{e}}_h}\mathbf{F}^*$, where the second equality follows because $\widetilde{\mathbf{e}}_h$ and $\widetilde{\mathbf{e}}_o$ are uncorrelated with mean zero. Then, the IPSDM of the observed nodes in $([\mathbf{H},\mathbf{F}],\widetilde{\mathbf{e}})\in\mathcal{L}(\mathbf{H},\mathbf{e})$ is

$$\Phi_o^{-1} = (\mathbf{I}_n-\mathbf{H})^*(\Phi_{\widetilde{e}_o} + \mathbf{F}\Phi_{\widetilde{e}_h}\mathbf{F}^*)^{-1}(\mathbf{I}_n-\mathbf{H}) \stackrel{(a)}{=} \mathbf{A} - \mathbf{B}, \qquad (11)$$

where $\mathbf{A} = (\mathbf{I}_n-\mathbf{H})^*\Phi_{\widetilde{e}_o}^{-1}(\mathbf{I}_n-\mathbf{H})$ and $\mathbf{B} = (\mathbf{I}_n-\mathbf{H})^*\Phi_{\widetilde{e}_o}^{-1}\mathbf{F}(\Phi_{\widetilde{e}_h}^{-1} + \mathbf{F}^*\Phi_{\widetilde{e}_o}^{-1}\mathbf{F})^{-1}\mathbf{F}^*\Phi_{\widetilde{e}_o}^{-1}(\mathbf{I}_n-\mathbf{H})$. Equality (a) follows from the matrix inversion lemma [20]. Then, (11) can be rewritten as

$$\Phi_o^{-1} = \mathbf{S} + \mathbf{L}, \text{ where} \qquad (12)$$
$$\mathbf{S} = (\mathbf{I}_n-\mathbf{H})^*\Phi_{\widetilde{e}_o}^{-1}(\mathbf{I}_n-\mathbf{H}), \qquad (13)$$
$$\mathbf{L} = -\Psi^*\Lambda^{-1}\Psi, \qquad (14)$$

$\Psi = \mathbf{F}^*\Phi_{\widetilde{e}_o}^{-1}(\mathbf{I}_n-\mathbf{H})$, and $\Lambda = \mathbf{F}^*\Phi_{\widetilde{e}_o}^{-1}\mathbf{F} + \Phi_{\widetilde{e}_h}^{-1}$, which is similar to the model in [46]. If the moral-graph of the original LDIG is sparse and $M\ll n$, then $\mathbf{S}$ is sparse and $\mathbf{L}$ is low-rank. It was shown in [27] that the support of $\mathbf{S}$ retrieves the moral-graph of $\mathcal{G}$. Furthermore, as shown in [46], under some assumptions that are applicable to a large class of problems, the $(i,j)$-th entry of $\mathbf{S}$ is strictly real if and only if the edge $i-j$ is a strict spouse edge. Thus, it can be shown that, in a large class of problems, the support of $\Im\{\mathbf{S}\}$ retrieves the exact topology of $\mathcal{G}$. Following the approach in [46], we reconstruct the network topology from the sparse plus low-rank decomposition of $\Im\{\Phi_o^{-1}(z)\}$, which is a skew-symmetric matrix. For completeness, the essential theory and a relevant algorithm from [46] are provided below in Section 6.4. The idea here is to decompose a given skew-symmetric matrix, i.e., $\Im\{\Phi_o^{-1}(z)\}$, into the sparse and low-rank components ($\Im\{\mathbf{S}(z)\}$ and $\Im\{\mathbf{L}(z)\}$, respectively), and then to reconstruct the moral-graph/topology from $\Im\{\mathbf{S}(z)\}$, for some $|z|=1$.

The next subsection shows how the sparse plus low-rank decomposition is applicable in the polynomial correlation setting as well.

6.2 Topology reconstruction under polynomial correlation

Under polynomial correlation, recall from Definition 5.5 that $\mathbf{e}(z) = \widetilde{\mathbf{e}}_o(z) + \mathbf{F}\tilde{\mathbf{y}}$. Let $\mathbf{x}_h = \widetilde{\mathbf{e}}_h = \tilde{\mathbf{y}}$ and let $\mathbf{x} = [\mathbf{x}_o^T, \mathbf{x}_h^T]^T$. Then, one can write

$$\begin{bmatrix} \mathbf{x}_o(z) \\ \mathbf{x}_h(z) \end{bmatrix} = \begin{bmatrix} \mathbf{H}(z) & \mathbf{F}(z) \\ \mathbf{0}_{M\times n} & \mathbf{0}_{M\times M} \end{bmatrix} \begin{bmatrix} \mathbf{x}_o(z) \\ \mathbf{x}_h(z) \end{bmatrix} + \begin{bmatrix} \widetilde{\mathbf{e}}_o(z) \\ \widetilde{\mathbf{e}}_h(z) \end{bmatrix}, \qquad (15)$$

where $\mathbf{F}_{ij}$ captures the influence of $\tilde{\mathbf{y}}_j$ on $x_i$ and $\Phi_{\tilde{\mathbf{y}}}$ is block diagonal.

The topology of the sub-graph restricted to the observed nodes in the above LDIM and the true topology of the network are equivalent. Moreover, similar to (11) (see [46] for details), one can obtain (12) to (14) exactly.

Here $\mathbf{S}(z), \mathbf{L}(z)\in\mathbb{C}^{n\times n}$, $\Psi\in\mathbb{C}^{M\times n}$, and $\Lambda\in\mathbb{C}^{M\times M}$. If $M\ll n$, then $\mathbf{L}$ is low-rank. Let $\mathcal{J} := \{j\in[M] \mid \exists i\in[n] \text{ with } \mathbf{F}_{ij}\neq 0\}$ be the set of monomials that have a non-zero contribution to some $e_i$, $i\in[n]$, and let $L = |\mathcal{J}|$. In this scenario, $\mathbf{L}$ is low-rank if $L\ll n$. Section 7.1 demonstrates an example of applying the sparse plus low-rank decomposition to reconstruct the topology under polynomial correlation.

6.3 Low-rank plus Sparse Matrix Decomposition

Here, the following problem is considered: given a matrix 𝐂{\mathbf{C}} that is known to be sum of a sparse skew-symmetric matrix 𝐒{\mathbf{S}} and a low-rank skew-symmetric matrix 𝐋{\mathbf{L}}, retrieve the sparse and low-rank components. The following optimization program modified from [7] is used to obtain the sparse low-rank decomposition, where 0t10\leq t\leq 1 is a pre-selected penalty factor [46].

\begin{split}(\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t})=&\arg\min_{\widehat{{\mathbf{S}}},\widehat{{\mathbf{L}}}}\ t\|\widehat{{\mathbf{S}}}\|_{1}+(1-t)\|\widehat{{\mathbf{L}}}\|_{*}\\ &\text{subject to }\widehat{{\mathbf{S}}}+\widehat{{\mathbf{L}}}={\mathbf{C}},\\ &\hphantom{\text{subject to }}\widehat{{\mathbf{S}}}^{T}=-\widehat{{\mathbf{S}}},\ \widehat{{\mathbf{L}}}^{T}=-\widehat{{\mathbf{L}}},\end{split} \qquad (16)

where {\mathbf{C}}=\Im\{\Phi_{o}^{-1}(z)\}, for some z\in\mathbb{C},\ |z|=1.
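To make the program concrete, below is a minimal numerical sketch of (16) in Python with cvxpy, used purely for illustration (the experiments in Section 7 use Yalmip with an SDP solver instead); the function name sparse_low_rank_split and the choice of the SCS solver are our own. Since \Phi_{o}^{-1}(z) is Hermitian, {\mathbf{C}}=\Im\{\Phi_{o}^{-1}(z)\} is real and skew-symmetric, so real decision variables suffice.

import cvxpy as cp
import numpy as np

def sparse_low_rank_split(C, t):
    # Convex program (16): split the real skew-symmetric C = Im{Phi_o^{-1}(z)}
    # into a sparse skew-symmetric S and a low-rank skew-symmetric L.
    n = C.shape[0]
    S = cp.Variable((n, n))
    L = cp.Variable((n, n))
    objective = cp.Minimize(t * cp.sum(cp.abs(S)) + (1 - t) * cp.normNuc(L))
    constraints = [S + L == C, S == -S.T, L == -L.T]
    cp.Problem(objective, constraints).solve(solver=cp.SCS)
    return S.value, L.value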

In the next subsection, a sufficient condition and an algorithm for the exact recovery of the sparse and the low-rank components from {\mathbf{C}} using (16) are provided.

6.4 Sufficient Condition for Sparse Low-rank Matrix Decomposition

In this subsection, a sufficient condition (proved in [7, 46]) is provided under which a matrix can be uniquely decomposed as the sum of a sparse skew-symmetric component and a low-rank skew-symmetric component. Furthermore, an algorithm is provided that utilizes this sufficient condition to retrieve the sparse and the low-rank components.

The following definitions are used in the subsequent results.

deg_{max}({\mathbf{M}}):=\max\left(\max_{1\leq i\leq n}\sum_{j=1}^{n}\mathbbm{1}_{\{{\mathbf{M}}_{ij}\neq 0\}},\ \max_{1\leq j\leq n}\sum_{i=1}^{n}\mathbbm{1}_{\{{\mathbf{M}}_{ij}\neq 0\}}\right), \qquad (17)

inc({\mathbf{M}}):=\max_{k}\|{\mathbf{U}}{\mathbf{U}}^{T}e_{k}\|_{2}, \qquad (18)

where {\mathbf{U}}\Sigma{\mathbf{V}}^{T} is the compact singular value decomposition of {\mathbf{M}} and \|\cdot\|_{2} is the Euclidean norm of a vector.
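The quantities in (17) and (18) are straightforward to compute numerically; the following is a small sketch in numpy, where the tolerance tol for declaring entries and singular values non-zero is an assumption on our part.

import numpy as np

def deg_max(M, tol=1e-9):
    # (17): the largest number of non-zero entries in any row or column of M.
    nz = np.abs(M) > tol
    return max(nz.sum(axis=1).max(), nz.sum(axis=0).max())

def inc(M, tol=1e-9):
    # (18): max_k ||U U^T e_k||_2 with U from the compact SVD of M, i.e., the
    # largest column norm of the orthogonal projector onto the column space.
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    r = int((s > tol * s[0]).sum())          # numerical rank
    P = U[:, :r] @ U[:, :r].T
    return np.linalg.norm(P, axis=0).max()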

The following is a sufficient condition that guarantees the unique decomposition of {\mathbf{C}} (see [7, 46] for details).

Lemma 11.

Suppose that we are given a matrix {\mathbf{C}}, which is the sum of a sparse matrix \tilde{{\mathbf{S}}}\in{\mathbb{S}}^{n} and a low-rank matrix \tilde{{\mathbf{L}}}\in{\mathbb{S}}^{n}. If (\tilde{{\mathbf{S}}},\tilde{{\mathbf{L}}}) satisfies

deg_{max}(\tilde{{\mathbf{S}}})\,inc(\tilde{{\mathbf{L}}})<\frac{1}{12}, \qquad (19)

then there exists a penalty factor t\in[0,1] such that (16) returns (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t})=(\tilde{{\mathbf{S}}},\tilde{{\mathbf{L}}}).

Remark 6.1.

The results in [7, 46] are proved for the optimization problem with the objective function \gamma\|\widehat{{\mathbf{S}}}\|_{1}+\|\widehat{{\mathbf{L}}}\|_{*}, where \gamma\geq 0. However, the results hold for (16) as well, since the two problems are equivalent via the map t=\gamma/(1+\gamma).

The sufficient condition (19) roughly translates to \tilde{{\mathbf{S}}} being sparse and the number of maximal cliques, M, being small, with the clique sizes not too small (see [46] for more details).

The following metrics are used to measure the accuracy of the estimates (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t}) in the optimization (16).

tol_{t}:=\frac{\|\widehat{{\mathbf{S}}}_{t}-\tilde{{\mathbf{S}}}\|_{F}}{\|\tilde{{\mathbf{S}}}\|_{F}}+\frac{\|\widehat{{\mathbf{L}}}_{t}-\tilde{{\mathbf{L}}}\|_{F}}{\|\tilde{{\mathbf{L}}}\|_{F}}, \qquad diff_{t}:=\|\widehat{{\mathbf{S}}}_{t-\epsilon}-\widehat{{\mathbf{S}}}_{t}\|_{F}+\|\widehat{{\mathbf{L}}}_{t-\epsilon}-\widehat{{\mathbf{L}}}_{t}\|_{F}, \qquad (20)

where \|\cdot\|_{F} denotes the Frobenius norm and \epsilon>0 is a sufficiently small fixed constant. Note that tol_{t} requires the knowledge of the true matrices \tilde{{\mathbf{S}}} and \tilde{{\mathbf{L}}}, whereas diff_{t} does not.
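As a small illustration, the two metrics in (20) can be computed as follows (a sketch; the helper names are ours).

import numpy as np

def tol_metric(S_hat, L_hat, S_true, L_true):
    # tol_t in (20): relative Frobenius errors; requires the ground truth.
    return (np.linalg.norm(S_hat - S_true, 'fro') / np.linalg.norm(S_true, 'fro')
            + np.linalg.norm(L_hat - L_true, 'fro') / np.linalg.norm(L_true, 'fro'))

def diff_metric(S_prev, L_prev, S_hat, L_hat):
    # diff_t in (20): change between the solutions at consecutive grid points
    # t - eps and t; computable without the ground truth.
    return (np.linalg.norm(S_prev - S_hat, 'fro')
            + np.linalg.norm(L_prev - L_hat, 'fro'))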

The following lemma is proved in [46] and is applied in Algorithm 1 below to retrieve the sparse and the low-rank components.

Lemma 12.

Suppose we are given a matrix {\mathbf{C}}, which is obtained by summing \tilde{{\mathbf{S}}} and \tilde{{\mathbf{L}}}, where \tilde{{\mathbf{S}}} is a sparse skew-symmetric matrix and \tilde{{\mathbf{L}}} is a low-rank skew-symmetric matrix. If \tilde{{\mathbf{S}}} and \tilde{{\mathbf{L}}} satisfy \deg_{\max}(\tilde{{\mathbf{S}}})\,inc(\tilde{{\mathbf{L}}})<1/12, then there exist at least three regions of t where diff_{t}=0. In particular, there exists an interval [t_{1},t_{2}]\subset[0,1] with 0<t_{1}<t_{2}<1 such that the solution of (16) is (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t})=(\tilde{{\mathbf{S}}},\tilde{{\mathbf{L}}}) for every t\in[t_{1},t_{2}].

Following the procedure in [46], moral-graph/topology reconstruction from the imaginary part of the IPSDM is considered here. The following assumptions from [46] are required for the exact recovery of the topology; the details can be found in [46] and are omitted here. In the absence of Assumption 4, the reconstruction algorithm detects some false positive edges, but none of the true edges is missed.

Assumption 3.

For any i\in[n], k\in[n] (k'\in[L]), if {\mathbf{H}}_{ik}(z)\neq 0 ({\mathbf{F}}_{ik'}\neq 0), then \Im\{{\mathbf{H}}_{ik}(z)\}\neq 0 (\Im\{{\mathbf{F}}_{ik'}(z)\}\neq 0), for all z,~|z|=1.

Assumption 4.

For the LDIM in (1), and i,k,l\in{\mathcal{V}}, if {\mathbf{H}}_{ki}(z)\neq 0 and {\mathbf{H}}_{kl}(z)\neq 0, then \angle{\mathbf{H}}_{ki}(z)=\angle{\mathbf{H}}_{kl}(z).

The following lemma from [46] reconstructs the exact topology of the LDIM from {\mathbf{S}} in (13).

Lemma 13.

Consider a well-posed and topologically detectable LDIM ({\mathbf{H}},{\mathbf{e}}) described by (1) with the associated graph {\mathcal{G}}({\mathcal{V}},{\mathcal{E}}), satisfying Assumption 4. Let {\mathbf{S}}(z) be given by (13), and, for some z\in\mathbb{C} with |z|=1, let \widehat{{\mathcal{E}}}_{o}:=\{(i,j):\Im\{{\mathbf{S}}_{ij}(z)\}\neq 0,~i<j\} and \overline{{\mathcal{E}}}_{o}:=\{(i,j)\mid(i,j)\in{\mathcal{E}}\text{ or }(j,i)\in{\mathcal{E}},~i<j\}. Then, \widehat{{\mathcal{E}}}_{o}\subseteq\overline{{\mathcal{E}}}_{o}. Additionally, if the LDIM satisfies Assumption 3, then \widehat{{\mathcal{E}}}_{o}=\overline{{\mathcal{E}}}_{o} almost always.

The following is the main result of this section, which follows from Lemma 11 and Lemma 13.

Theorem 14.

Let ({\mathbf{H}},{\mathbf{e}}) be an LDIM with \Phi_{e} non-diagonal that satisfies Assumptions 3 and 4. Suppose that there exists an ({\mathbf{H}},{\mathbf{F}},\widetilde{{\mathbf{e}}})\in{\mathcal{L}}_{q}({\mathbf{H}},{\mathbf{e}}) such that, for {\mathbf{S}} and {\mathbf{L}} in (12), \deg_{\max}(\Im\{{\mathbf{S}}\})\,inc(\Im\{{\mathbf{L}}\})<1/12. Then, the true topology of ({\mathbf{H}},{\mathbf{e}}) can be reconstructed by solving the optimization problem (16) with {\mathbf{C}}=\Im\{\Phi^{-1}_{o}(z)\} for some z=e^{i\omega},~\omega\in(0,2\pi]. In particular, {\mathcal{T}}({\mathcal{V}},{\mathcal{E}})=supp(\widehat{{\mathbf{S}}}_{t}), for an appropriately selected t.

Proof: See Appendix H. ∎

Additionally, by applying the algorithms in [46], the correlation graph can also be reconstructed if the LDIM satisfies the following assumption, as demonstrated in the simulation results.

Assumption 5.

For every pair of distinct latent nodes k_{h} and k_{h}' in the transformed dynamic graph, the distance between k_{h} and k_{h}' is at least four hops.

Algorithm 1 Matrix decomposition

Input: \Phi_{oo}^{-1}(z), the IPSDM among {\mathcal{V}}_{o}; step size \varepsilon; z=e^{j\omega},~\omega\in(-\pi,\pi]
Output: Matrices \Im\{{\mathbf{S}}(z)\} and \Im\{{\mathbf{L}}(z)\}

1: Set {\mathbf{C}}=\Im\{\Phi_{oo}^{-1}(z)\}
2: Initialize (\widehat{{\mathbf{S}}}_{0},\widehat{{\mathbf{L}}}_{0})=({\mathbf{C}},\mathbf{0})
3: for all t\in\{\varepsilon,2\varepsilon,\dots,1\} do
4:     Solve the convex optimization (16) and calculate diff_{t} in (20)
5: end for
6: Identify the three regions where diff_{t} is zero and denote the middle region by [t_{1},t_{2}]. Pick a t_{0}\in[t_{1},t_{2}] and the corresponding pair (\widehat{{\mathbf{S}}}_{t_{0}},\widehat{{\mathbf{L}}}_{t_{0}}).
7: if deg_{max}(\widehat{{\mathbf{S}}}_{t_{0}})\,inc(\widehat{{\mathbf{L}}}_{t_{0}})<\frac{1}{12} then
8:     (\widehat{{\mathbf{S}}}(z),\widehat{{\mathbf{L}}}(z))=(\widehat{{\mathbf{S}}}_{t_{0}},\widehat{{\mathbf{L}}}_{t_{0}})
9:     Return (\widehat{{\mathbf{S}}}(z),\widehat{{\mathbf{L}}}(z))
10: end if
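For concreteness, the t-sweep and the middle-region selection of Algorithm 1 can be transcribed as the following Python sketch; it reuses the illustrative helpers sparse_low_rank_split, deg_max, and inc sketched above, and the tolerance zero_tol used to declare diff_t numerically zero is our own assumption.

import numpy as np

def algorithm1(C, eps=0.01, zero_tol=1e-6):
    # Sweep t over {eps, 2*eps, ..., 1}, record diff_t, pick a t0 from the
    # middle zero region (Lemma 12), and certify the pick via condition (19).
    ts = np.arange(eps, 1.0 + eps / 2, eps)
    S_prev, L_prev = C.copy(), np.zeros_like(C)    # (S_0, L_0) = (C, 0)
    diffs, sols = [], []
    for t in ts:
        S_t, L_t = sparse_low_rank_split(C, t)     # convex program (16)
        diffs.append(np.linalg.norm(S_t - S_prev, 'fro')
                     + np.linalg.norm(L_t - L_prev, 'fro'))
        sols.append((S_t, L_t))
        S_prev, L_prev = S_t, L_t
    # Contiguous runs of grid points where diff_t is numerically zero.
    flat = np.array(diffs) < zero_tol
    runs, start = [], None
    for i, f in enumerate(flat):
        if f and start is None:
            start = i
        if not f and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(flat) - 1))
    i0, i1 = runs[len(runs) // 2]                  # the middle zero region
    S0, L0 = sols[(i0 + i1) // 2]
    if deg_max(S0) * inc(L0) < 1.0 / 12:           # sufficient condition (19)
        return S0, L0
    return None                                     # certificate not met

The support of the returned sparse part then serves as the topology estimate, as in Theorem 14.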

7 Simulation results

In this section, we demonstrate topology reconstruction of an LDIM ({\mathbf{H}},{\mathbf{e}}) with \Phi_{e} non-diagonal, from \Phi_{\mathbf{x}}^{-1}, using the sparse plus low-rank decomposition technique discussed in Section 6, for an affinely correlated network. Figs. 4(a)-4(e) respectively depict {\mathcal{G}}({\mathcal{V}},{\mathcal{E}}), {\mathcal{T}}({\mathcal{V}},{\mathcal{E}}), {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}), {\mathcal{G}}({\mathcal{V}}_{t},{\mathcal{E}}_{t}\setminus{\mathcal{E}}), and {\mathcal{G}}_{t}({\mathcal{V}}_{t},{\mathcal{E}}_{t}), described in Section 4. Simulations are performed in Matlab; Yalmip [23] with the SDP solver of [44] is used to solve the convex program (16).

For the simulation, we assume access to the true PSDM, \Phi_{\mathbf{x}}, of the LDIM of Fig. 4(a). Here, \Phi_{\mathbf{e}} is non-diagonal, with the (unknown) correlation structure shown in Fig. 4(c).

For the reconstruction, the imaginary part of the IPSDM, {\mathbf{C}}=\Im\{\Phi^{-1}_{\mathbf{x}}(z)\}, is employed in the convex optimization (16) for z=e^{j2\pi/8}. Optimization (16) is solved for all values of t\in[\epsilon,1], on a grid of spacing \epsilon=0.01. Notice that for t=0, (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t})=({\mathbf{C}},0). Fig. 5 compares tol_{t} and diff_{t} as functions of t. The pair (\widehat{{\mathbf{S}}}_{t},\widehat{{\mathbf{L}}}_{t}) for t=0.36 is picked, which belongs to the middle zero region of diff_{t}, as described in [46].

{\mathbb{I}}_{\{\widehat{{\mathbf{S}}}_{t}\neq{\mathbf{0}}\}} returned the exact topology of Fig. 4(b). From {\mathbb{I}}_{\{\widehat{{\mathbf{L}}}_{t}\neq{\mathbf{0}}\}}, by following Algorithms 2 and 3 in [46], {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}) is also reconstructed, and it matches Fig. 4(c) exactly.

Figure 4: Network structure. (a) The true LDIG, {\mathcal{G}}({\mathcal{V}},{\mathcal{E}}), of ({\mathbf{H}},{\mathbf{e}}). (b) The true topology, {\mathcal{T}}({\mathcal{V}},{\mathcal{E}}), of ({\mathbf{H}},{\mathbf{e}}). (c) Correlation graph, {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}), of ({\mathbf{H}},{\mathbf{e}}); M=3. (d) Transformed correlation structure with latent nodes. (e) Transformed graph, {\mathcal{G}}_{t}({\mathcal{V}}_{t},{\mathcal{E}}_{t}), of (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}})\in{\mathcal{L}}_{q}({\mathbf{H}},{\mathbf{e}}).

Figure 5: tol_{t} and diff_{t} plots for the network of Fig. 4(a) under affine correlation.

Figure 6: tol_{t} and diff_{t} plots for the network of Fig. 4(a) under polynomial correlation.

Figure 7: Topology estimation from finite data. (a) The true topology and the topology estimated directly from \widehat{\Phi}_{o}^{-1}(e^{j2\pi/5}) for the network shown in Fig. 4(a) under affine correlation with N=10^{6} data samples; the figures show the entries of 29\times 29 matrices. In the topology figures, black denotes zero (edge absent) and white denotes one (edge present); in the gray-scale images, gray denotes zero. (b) \widehat{{\mathbf{S}}}_{t} obtained by solving (16) for {\mathbf{C}}=\Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\}; the topology is estimated from \widehat{{\mathbf{S}}}_{t} for t=0.34. The same color conventions apply.

7.1 Polynomial correlation

Here, the topology reconstruction of the network is shown when the noise processes are polynomially correlated. For the simulation, consider the LDIG shown in Fig. 4(a) with the correlation graph of Fig. 4(c). The noise processes are IID GP, as in Section 5.2, with m=2 and p=3. The entries of {\mathbf{F}} are such that only the coefficients corresponding to y_{2},y_{5}, and y_{9} are non-zero; that is, F_{ij}=0 for every 1\leq i\leq 29, 1\leq j\leq 10,~j\notin\{2,5,9\}.

Figure 6 shows the diff_{t} and tol_{t} plots obtained by applying Algorithm 1 with \epsilon=0.01. As shown in the plots, tol_{t} is zero for t\in[.28,.38], which corresponds to the exact decomposition. Additionally, as asserted by Lemma 12, diff_{t} is zero in this interval. The support of \widehat{{\mathbf{S}}}_{t} for any t\in[.28,.38] reconstructs the exact topology, which validates Theorem 14.

7.2 Finite data simulation

In this section, to evaluate the effect of finite data size, simulations are run on a synthetic data set based on the network shown in Fig. 4(e). For the PSD estimation, the Welch method [41] is used. Notice that the accuracy and the sample complexity of the estimation can be improved by employing advanced IPSDM estimation techniques from the literature; see, for example, [2, 48, 49]. Fig. 7 shows the results estimated from a sample size of 10^{6}; Fig. 7(a) shows the true and the estimated IPSDM matrices. The fourth matrix of Fig. 7(a) shows the topology estimated directly from \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\}, without decomposition. The estimation is done by hard thresholding: the edge (i,j) is detected if [\Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\}]_{i,j} is greater than a threshold, and not detected otherwise. The detection threshold is selected to obtain the minimum number of errors. 14 out of 16 edges are detected, but 6 false positive edges are also detected, for a total error of 50% (8 errors out of 16 edges). This shows that estimating the topology directly from \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\} returns an undesirable number of errors.
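For reference, the finite-data pipeline up to the direct thresholding step can be sketched as follows, assuming Python with scipy in place of the Matlab implementation used here; the normalized frequency 0.2 corresponds to z=e^{j2\pi/5}, and the function names are our own.

import numpy as np
from scipy.signal import csd

def ipsdm_imag(X, freq=0.2, nperseg=1024):
    # Welch/CSD estimate of the n x n PSD matrix of the time series in X
    # (shape n x N) at the normalized frequency `freq`, followed by inversion.
    n = X.shape[0]
    f, _ = csd(X[0], X[0], fs=1.0, nperseg=nperseg)
    k = np.argmin(np.abs(f - freq))        # nearest frequency bin
    Phi = np.empty((n, n), dtype=complex)
    for i in range(n):
        for j in range(n):
            _, Pij = csd(X[i], X[j], fs=1.0, nperseg=nperseg)
            Phi[i, j] = Pij[k]
    Phi = (Phi + Phi.conj().T) / 2         # enforce Hermitian symmetry
    return np.imag(np.linalg.inv(Phi))

def threshold_topology(C, thr):
    # Direct hard thresholding: declare edge (i, j) iff |C_ij| > thr.
    A = (np.abs(C) > thr).astype(int)
    np.fill_diagonal(A, 0)
    return A

The decomposition-based alternative of the next paragraph instead passes the same matrix C through the convex program (16) before reading off the support.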

Towards the exact topology retrieval, the optimization (16) is performed on \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\} to obtain the sparse plus low-rank decomposition. Fig. 7(b) shows the sparse part retrieved from the decomposition of \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\} for various t from 0 to 0.5. \widehat{{\mathbf{S}}}_{t} at t=0.34 is selected for estimating the topology. Thus, as illustrated by this example, with the approach proposed in the article it is possible to choose a threshold that yields 100% detection without sacrificing the false alarm performance.

In order to demonstrate that the techniques proposed in this article do not degrade drastically with a smaller number of samples, a simulation is run on 6000 samples. Detection directly from \Im\{\widehat{\Phi}_{o}^{-1}(e^{j2\pi/5})\} returned 48 false edges and missed one edge, for a total of 49 errors. With the decomposition, however, detection from the support of \widehat{{\mathbf{S}}}_{t} at t=.34 recovered 14 out of 16 edges with no false alarms, that is, a detection rate of 87.5% with only two errors. This confirms that the decomposition-based reconstruction proposed in the article yields substantial advantages. The sample complexity analysis of the article's methods is left for future research.

8 Conclusion

In this article, the problem of reconstructing the topology of an LDIM with spatially correlated noise sources was studied. First, assuming affine correlation and knowledge of the correlation graph, the given LDIM was transformed into an LDIM with latent nodes, where the latent nodes were characterized using the correlation graph and all the nodes were excited by uncorrelated noise sources. For polynomial correlation, a generalization of affine correlation, the latent nodes in the transformed LDIMs were excited by clusters of noise sources, where the clusters were uncorrelated with each other. Finally, using a sparse plus low-rank matrix decomposition technique, the exact topology of the LDIM was reconstructed solely from the IPSDM of the true LDIM, when the network satisfied a sufficient condition required for the matrix decomposition. Simulation results that verify the theoretical findings were provided.

Appendix A Proof of Lemma 2

Let ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}})\in{\mathcal{L}}({\mathbf{H}},{\mathbf{e}}) and let i,j\in[n]. By the definition of {\mathcal{L}}({\mathbf{H}},{\mathbf{e}}), e_{i}={\widetilde{e}}_{i}+{\mathbf{F}}_{i}\widetilde{{\mathbf{e}}}_{h}, where {\mathbf{F}}_{i} denotes the i^{\text{th}} row of {\mathbf{F}}, for any i\in[n]. To prove the only if part, suppose that \Phi_{e_{i}e_{j}}\neq 0. By definition, e_{i}e_{j}=({\widetilde{e}}_{i}+{\mathbf{F}}_{i}\widetilde{{\mathbf{e}}}_{h})({\widetilde{e}}_{j}+{\mathbf{F}}_{j}\widetilde{{\mathbf{e}}}_{h}). Then, \Phi_{e_{i}e_{j}}={\mathbf{F}}_{i}\Phi_{\widetilde{{\mathbf{e}}}_{h}}{\mathbf{F}}_{j}^{*}=\sum_{k}F_{ik}F_{jk}^{*}\Phi_{{\widetilde{e}}_{h_{k}}}, since {\widetilde{e}}_{i} and {\widetilde{e}}_{j} are uncorrelated for i,j\in[n+L]. Thus, \sum_{k}F_{ik}F_{jk}^{*}\Phi_{{\widetilde{e}}_{h_{k}}}\neq 0, which implies that there exists a k such that F_{ik}\neq 0 and F_{jk}\neq 0. In other words, there exists a latent node h_{k} in the corresponding LDIG of ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}}) such that i,j\in Ch(h_{k}).

(\Leftarrow) Let i,j\in[n] be such that \Phi_{e_{i}e_{j}}=0. Then, from the proof of the only if part, 0={\mathbf{F}}_{i}\Phi_{\widetilde{{\mathbf{e}}}_{h}}{\mathbf{F}}_{j}^{*}=\sum_{k}F_{ik}F_{jk}^{*}\Phi_{{\widetilde{e}}_{h_{k}}}, which implies that F_{ik}=0 or F_{jk}=0, \forall k, except for a few pathological cases that occur on a set of Lebesgue measure zero; we ignore the pathological cases here. Hence, there does not exist any latent node h such that both i,j\in Ch(h). ∎

The following result shows that if a subgraph {\mathcal{G}}^{\ell}({\mathcal{V}}^{\ell},{\mathcal{E}}^{\ell}) of the correlation graph {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}) forms a maximal clique in {\mathcal{G}}_{c}, then for any transformed LDIG in {\mathcal{L}}({\mathbf{H}},{\mathbf{e}}), the set of nodes {\mathcal{V}}^{\ell} is equal to the union of the children sets of some latent nodes in the LDIG.

Appendix B Proof of Lemma 3

The following lemma is useful in proving Lemma 3.

Lemma 15.

Let ({\mathbf{H}},{\mathbf{e}}) be an LDIM defined by (1) which satisfies Assumption 1, and let {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}) be the correlation graph of the exogenous noise sources {\mathbf{e}}. Suppose that {\mathcal{G}}^{\ell}({\mathcal{V}}^{\ell},{\mathcal{E}}^{\ell})\subseteq{\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}) is a maximal clique with |{\mathcal{V}}^{\ell}|>1. Then, for every LDIG ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}})\in{\mathcal{L}}({\mathbf{H}},{\mathbf{e}}), there exist latent nodes h_{1}^{\ell},\dots,h^{\ell}_{k_{\ell}} such that

{\mathcal{V}}^{\ell}=\bigcup_{i=1}^{k_{\ell}}Ch(h^{\ell}_{i}) \text{ and } {\mathcal{E}}^{\ell}=\bigcup_{i=1}^{k_{\ell}}{\mathcal{E}}_{\ell,i}, \qquad (21)

where {\mathcal{E}}_{\ell,i}:=\{(k,j):k,j\in Ch(h_{i}^{\ell})\}.

In particular, for any latent node h in the LDIG ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}}), Ch(h) forms a clique (not necessarily maximal) contained in {\mathcal{G}}^{\ell}({\mathcal{V}}^{\ell},{\mathcal{E}}^{\ell}) within {\mathcal{G}}_{c}.

Proof: Let {\mathcal{V}}^{\ell}\subseteq[n], |{\mathcal{V}}^{\ell}|>1, be such that {\mathcal{V}}^{\ell} forms a clique in {\mathcal{G}}_{c}. Lemma 2 showed that, for any i,j\in{\mathcal{V}}^{\ell}, there exists a latent node h such that i,j\in Ch(h) in the LDIG, {\mathcal{G}}, of (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}}). Since this is true for any pair i,j\in{\mathcal{V}}^{\ell}, there exists a minimal set of latent nodes \mathcal{H}^{\ell}:=\{h^{\ell}_{i}\}_{i=1}^{k_{\ell}}, k_{\ell}\in\mathbb{N}\setminus\{0\}, such that for any i,j\in{\mathcal{V}}^{\ell}, we have i,j\in Ch(h_{p}^{\ell}) for some h_{p}^{\ell}\in\mathcal{H}^{\ell}, in {\mathcal{G}}. Hence, {\mathcal{V}}^{\ell}\subseteq\bigcup_{i=1}^{k_{\ell}}Ch(h^{\ell}_{i}). Similarly, i,j\in Ch(h_{p}^{\ell}) implies (i,j)\in{\mathcal{E}}_{\ell,p}. Therefore, {\mathcal{E}}^{\ell}\subseteq\bigcup_{i=1}^{k_{\ell}}{\mathcal{E}}_{\ell,i}.

Next, we prove that all the children of a given latent node belong to a single (maximal) clique, which proves {\mathcal{V}}^{\ell}\supseteq\bigcup_{i=1}^{k_{\ell}}Ch(h^{\ell}_{i}) and {\mathcal{E}}^{\ell}\supseteq\bigcup_{i=1}^{k_{\ell}}{\mathcal{E}}_{\ell,i}. Let h\in\mathcal{H}^{\ell} be a latent node in the LDIG of (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}}) and suppose i,j\in Ch(h). Then, from the definition of (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}}), there exists an index \ell' such that F_{i\ell'}\neq 0 and F_{j\ell'}\neq 0, and hence \Phi_{e_{i}e_{j}}={\mathbf{F}}_{i}\Phi_{\widetilde{{\mathbf{e}}}_{h}}{\mathbf{F}}_{j}^{*}\neq 0 a.e. Thus, (i,j)\in{\mathcal{E}}_{c}, excluding the pathological cases. Notice that this holds for any i,j\in Ch(h). Therefore, Ch(h)\subseteq{\mathcal{V}}^{\ell} forms a clique (not necessarily maximal) in {\mathcal{G}}_{c}({\mathcal{V}},{\mathcal{E}}_{c}). Since this is true for every h\in\mathcal{H}^{\ell}, {\mathcal{V}}^{\ell}\supseteq\bigcup_{i=1}^{k_{\ell}}Ch(h^{\ell}_{i}). A similar argument shows that {\mathcal{E}}^{\ell}\supseteq\bigcup_{i=1}^{k_{\ell}}{\mathcal{E}}_{\ell,i}. ∎

The proof of Lemma 3 follows directly from (21) and the fact that k_{\ell}\geq 1. ∎

Appendix C Proof of Theorem 4

Lemma 16.

(Pigeonhole principle) If n pigeons are put into m pigeonholes with n>m, then at least one hole must contain more than one pigeon.

We use the pigeonhole principle and Lemma 15 to prove the theorem via a contrapositive argument. Recall that the number of latent nodes L is equal to the number of cliques M. Suppose there exists a clique {\mathcal{G}}^{\ell}({\mathcal{V}}^{\ell},{\mathcal{E}}^{\ell}) such that, for some i,j\in{\mathcal{V}}^{\ell}, there does not exist a latent node h in (\widetilde{{\mathbf{H}}},\widetilde{{\mathbf{e}}}) with i,j\in Ch(h). By Lemma 15, there exist latent nodes h,h' such that i\in Ch(h) and j\in Ch(h'). By Lemma 15 again, all the children of h are included in a single clique; that is, Ch(h)\subseteq{\mathcal{V}}^{\ell} and Ch(h')\subseteq{\mathcal{V}}^{\ell}, since i,j\in{\mathcal{V}}^{\ell}. Then, excluding {\mathcal{V}}^{\ell}, h, and h', we are left with M-1 cliques and L-2=M-2 latent nodes. Applying the pigeonhole principle with M-1 pigeons (cliques) and L-2 holes (latent nodes), there would exist at least one latent node k with Ch(k) belonging to two different maximal cliques, which contradicts Lemma 15. ∎

Appendix D Proof of Proposition 5

Let T_{1},T_{2}\in\{0,1\}^{(n+L)\times(n+L)}, T_{1}\neq T_{2}, be the topologies of two distinct transformations (\widetilde{{\mathbf{H}}}^{1},\widetilde{{\mathbf{e}}}^{1}) and (\widetilde{{\mathbf{H}}}^{2},\widetilde{{\mathbf{e}}}^{2}), respectively. Without loss of generality, let (i,j) be such that [T_{1}]_{ij}\neq 0 and [T_{2}]_{ij}=0. By the definition of ([{\mathbf{H}},{\mathbf{F}}],\widetilde{{\mathbf{e}}}), if i\leq n and j\leq n, then [T_{1}]_{ij}=[T_{2}]_{ij}={\mathbb{I}}_{\{\{H_{ij}\neq 0\}\cup\{H_{ji}\neq 0\}\}}. If i\leq n and j>n, then i is an observed node and j is a latent node. From Lemma 2, F^{1}_{ij}=F^{2}_{ij}=0 if and only if j\notin Pa(i). Thus, [T_{1}]_{ij}=[T_{2}]_{ij}, which is a contradiction, since both cannot hold. A similar contradiction arises if i>n and j\leq n. If i,j>n, then [T_{1}]_{ij}=[T_{2}]_{ij}=0, since Pa(i)=Pa(j)=\emptyset. Thus, the assumption leads to a contradiction, which implies that T_{1}=T_{2}. ∎

Appendix E Proof of Proposition 7

Consider a pair of monomials y_{k},y_{l} with y_{k}={\mathbf{v}}(0)^{\alpha} and y_{l}={\mathbf{v}}(0)^{\beta}. For notational convenience, the index 0 is omitted. Then, {\mathbb{E}}\{y_{k}y_{l}\}={\mathbb{E}}\{{\mathbf{v}}^{\alpha+\beta}\}=\prod_{i=1}^{m}{\mathbb{E}}\{{\mathbf{v}}_{i}^{\alpha_{i}+\beta_{i}}\}. By (7), {\mathbb{E}}\{y_{k}y_{l}\}\neq 0 if and only if \alpha_{i}+\beta_{i} is even for all i\in[m]. Suppose \alpha_{i} is odd. Then, \beta_{i} must be odd. Similarly, \beta_{i} must be even if \alpha_{i} is even, for all i\in[m].

Define an element-wise boolean operator {\mathcal{B}}:\mathbb{N}^{m}\mapsto\{0,1\}^{m} such that, for {\mathbf{u}}={\mathcal{B}}(\alpha), u_{i}=0 if \alpha_{i} is odd and u_{i}=1 if \alpha_{i} is even. Then, for y_{k}={\mathbf{v}}^{\alpha} and y_{l}={\mathbf{v}}^{\beta}, {\mathbb{E}}\{y_{k}y_{l}\}\neq 0 if and only if {\mathcal{B}}(\alpha)={\mathcal{B}}(\beta). Group the monomials with the same odd-even pattern into one cluster. Since {\mathcal{B}}(\cdot) can take 2^{m} distinct values, there are 2^{m} distinct clusters that are uncorrelated with each other.

Reorder {\mathbf{y}} by grouping the monomials belonging to the same cluster together to obtain \tilde{{\mathbf{y}}}, similar to the (n,p)=(2,3) example in Section 5.2. Then, \Phi_{\tilde{{\mathbf{y}}}} is a block diagonal matrix with 2^{m} blocks for m-variable polynomials, where each diagonal block corresponds to one particular pattern in \{Odd,Even\}^{m}. ∎
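To illustrate the clustering used in this proof, the following sketch enumerates the monomials of total degree between 1 and p in m variables and groups them by the parity pattern of their exponent vectors, which is an equivalent labeling of {\mathcal{B}}(\alpha); the function name is ours.

from itertools import product

def parity_clusters(m, p):
    # Group the monomials v^alpha, 1 <= |alpha| <= p, by the odd/even pattern
    # of alpha; monomials in distinct clusters are mutually uncorrelated.
    clusters = {}
    for alpha in product(range(p + 1), repeat=m):
        if 0 < sum(alpha) <= p:
            pattern = tuple(a % 2 for a in alpha)   # parity of each exponent
            clusters.setdefault(pattern, []).append(alpha)
    return clusters

# For m = 2 and p = 3, the nine non-constant monomials fall into
# 2^2 = 4 parity clusters.
print(parity_clusters(2, 3))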

Appendix F Proof of Lemma 9

Let ({\mathbf{H}},{\mathbf{F}},\widetilde{{\mathbf{e}}})\in{\mathcal{L}}({\mathbf{H}},{\mathbf{e}}) and let i,j\in[n]. By the definition of {\mathcal{L}}({\mathbf{H}},{\mathbf{e}}), e_{i}={\widetilde{e}}_{i}+{\mathbf{F}}_{i}\widetilde{{\mathbf{e}}}_{h}, where {\mathbf{F}}_{i} denotes the i^{\text{th}} row of {\mathbf{F}}, for any i\in[n]. To prove the only if part, suppose that \Phi_{e_{i}e_{j}}\neq 0. By definition, e_{i}e_{j}=({\widetilde{e}}_{i}+{\mathbf{F}}_{i}{\mathbf{y}})({\widetilde{e}}_{j}+{\mathbf{F}}_{j}{\mathbf{y}}). Since \widetilde{{\mathbf{e}}}_{i},\widetilde{{\mathbf{e}}}_{j}, and \widetilde{{\mathbf{e}}}_{h} are uncorrelated, and \Phi_{{\mathbf{y}}} is block diagonal (Proposition 8),

\Phi_{e_{i}e_{j}}={\mathbf{F}}_{i}\Phi_{{\mathbf{y}}}{\mathbf{F}}_{j}^{*}=\sum_{c=1}^{2^{m}}\sum_{k_{1},k_{2}\in{\mathcal{C}}_{c}}F_{ik_{1}}F_{jk_{2}}^{*}\Phi_{{\mathbf{y}}_{k_{1}}{\mathbf{y}}_{k_{2}}}\neq 0.

Thus, there exists a c such that F_{ik_{1}}\neq 0 and F_{jk_{2}}\neq 0 for some k_{1},k_{2}\in{\mathcal{C}}_{c}. That is, there exists a cluster c such that i,j\in Ch(c).

(\Leftarrow) Let i,j\in[n] be such that \Phi_{e_{i}e_{j}}=0. Then, from the proof of the only if part, \sum_{c=1}^{2^{m}}\sum_{k_{1},k_{2}\in{\mathcal{C}}_{c}}F_{ik_{1}}F_{jk_{2}}^{*}\Phi_{{\mathbf{y}}_{k_{1}}{\mathbf{y}}_{k_{2}}}=0. That is, for every cluster c\in[2^{m}], either F_{ik}=0 for all k\in{\mathcal{C}}_{c} or F_{jk}=0 for all k\in{\mathcal{C}}_{c}, except for a few pathological cases that occur on a set of Lebesgue measure zero; we ignore the pathological cases here. Hence, there does not exist any cluster c such that both i,j\in Ch(c). ∎

Appendix G Proof of Theorem 10

The proof is similar to that of Lemma 15, with latent nodes replaced by latent clusters, as in the proof of Lemma 9. ∎

Appendix H Proof of Theorem 14

As shown in Lemma 11, if deg_{max}(\Im\{{\mathbf{S}}\})\,inc(\Im\{{\mathbf{L}}\})<1/12, then for an appropriately selected t, the convex program (16) retrieves \Im\{{\mathbf{S}}\} and \Im\{{\mathbf{L}}\} when {\mathbf{C}}=\Im\{{\mathbf{S}}\}+\Im\{{\mathbf{L}}\}. If any one of the LDIMs (\widetilde{{\mathbf{H}}},{\mathbf{e}})\in{\mathcal{L}}_{q}({\mathbf{H}},{\mathbf{e}}) satisfies this condition, then the imaginary part of \widehat{{\mathbf{S}}}_{t}=({\mathbf{I}}_{n}-{\mathbf{H}}^{*})\Phi_{{\widetilde{e}}_{o}}^{-1}({\mathbf{I}}_{n}-{\mathbf{H}}) returns the topology among the observed nodes of (\widetilde{{\mathbf{H}}},{\mathbf{e}}), by Lemma 13. The theorem then follows from Remark 4.8 and Lemma 15. ∎

Acknowledgements

The authors acknowledge the support of NSF through the project titled ”RAPID: COVID-19 Transmission Network Reconstruction from Time-Series Data” under Award Number 2030096.

References

  • [1] Daniele Alpago, Mattia Zorzi, and Augusto Ferrante. Identification of sparse reciprocal graphical models. IEEE Control Systems Letters, 2(4):659–664, 2018.
  • [2] Daniele Alpago, Mattia Zorzi, and Augusto Ferrante. A scalable strategy for the identification of latent-variable graphical models. IEEE Transactions on Automatic Control, 67(7):3349–3362, 2022.
  • [3] Enrico Avventi, Anders G. Lindquist, and Bo Wahlberg. Arma identification of graphical models. IEEE Transactions on Automatic Control, 58(5):1167–1178, 2013.
  • [4] James M. Bower and David Beeman. The book of GENESIS: exploring realistic neural models with the GEneral NEural SImulation System. Springer Science & Business Media, 2012.
  • [5] David Carfi and Giovanni Caristi. Financial dynamical systems. Differential Geometry–Dynamical Systems, 2008.
  • [6] Elena Ceci, Yanning Shen, Georgios B. Giannakis, and Sergio Barbarossa. Graph-based learning under perturbations via total least-squares. IEEE Transactions on Signal Processing, 68:2870–2882, 2020.
  • [7] Venkat Chandrasekaran, Sujay Sanghavi, Pablo A. Parrilo, and Alan S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.
  • [8] Venkat Chandrasekaran, Pablo A. Parrilo, and Alan S. Willsky. Latent variable graphical model selection via convex optimization. The Annals of Statistics, 40(4):1935–1967, 2012.
  • [9] Valentina Ciccone, Augusto Ferrante, and Mattia Zorzi. Robust identification of “sparse plus low-rank” graphical models: An optimization approach. In 2018 IEEE Conference on Decision and Control (CDC), 2241–2246, Dec 2018.
  • [10] Valentina Ciccone, Augusto Ferrante, and Mattia Zorzi. Factor models with real data: A robust estimation of the number of factors. IEEE Transactions on Automatic Control, 64(6):2412–2425, 2019.
  • [11] Valentina Ciccone, Augusto Ferrante, and Mattia Zorzi. Learning latent variable dynamic graphical models by confidence sets selection. IEEE Transactions on Automatic Control, 65(12):5130–5143, 2020.
  • [12] David A. Cox, John Little, and Donal O’Shea. Ideals, varieties, and algorithms-an introduction to computational algebraic geometry and commutative algebra. Springer, 2007.
  • [13] Francesca Crescente, Lucia Falconi, Federica Rozzi, Augusto Ferrante, and Mattia Zorzi. Learning ar factor models. In 2020 59th IEEE Conference on Decision and Control (CDC), 274–279, 2020.
  • [14] Mihaela Dimovska and Donatello Materassi. Granger-causality meets causal inference in graphical models: Learning networks via non-invasive observations. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), 5268–5273, Dec 2017.
  • [15] Mihaela Dimovska and Donatello Materassi. A control theoretic look at granger causality: extending topology reconstruction to networks with direct feedthroughs. IEEE Transactions on Automatic Control, Early Access:1–1, 2020.
  • [16] H. J.  Dreef, M. C. F. Donkers, and Paul M. J. Van den Hof. Identifiability of linear dynamic networks through switching modules. IFAC-PapersOnLine, 54(7):37–42, 2021.
  • [17] Lucia Falconi, Augusto Ferrante, and Mattia Zorzi. A robust approach to arma factor modeling. arXiv preprint arXiv:2107.03873, 2021.
  • [18] Stefanie J. M. Fonken, Karthik Raghavan Ramaswamy, and Paul M. J. Van den Hof. A scalable multi-step least squares method for network identification with unknown disturbance topology. Automatica, 141:110295, 2022.
  • [19] M. Ghil, M. R. Allen, M. D. Dettinger, K. Ide, D. Kondrashov, M. E. Mann, A. W. Robertson, A. Saunders, Y. Tian, F. Varadi, and P. Yiou. Advanced spectral methods for climatic time series. Reviews of Geophysics, 40(1):3–1–3–41, 2002.
  • [20] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, USA, 2nd edition, 2012.
  • [21] Giacomo Innocenti and Donatello Materassi. Modeling the topology of a dynamical network via wiener filtering approach. Automatica, 48(5):936–946, 2012.
  • [22] Raphaël Liégeois, Bamdev Mishra, Mattia Zorzi, and Rudolph Sepulchre. Sparse plus low-rank autoregressive identification in neuroimaging time series. In 2015 54th IEEE Conference on Decision and Control (CDC), 3965–3970, Dec 2015.
  • [23] Johan Lofberg. Yalmip : a toolbox for modeling and optimization in matlab. In 2004 IEEE International Conference on Robotics and Automation (IEEE Cat. No.04CH37508), 284–289, Sep. 2004.
  • [24] Eduardo Mapurunga and Alexandre Sanfelici Bazanella. Optimal allocation of excitation and measurement for identification of dynamic networks. arXiv preprint arXiv:2007.09263, 2020.
  • [25] Donatello Materassi and Giacomo Innocenti. Topological identification in networks of dynamical systems. IEEE Transactions on Automatic Control, 55(8):1860–1871, 2010.
  • [26] Donatello Materassi and Murti V. Salapaka. Network reconstruction of dynamical polytrees with unobserved nodes. In 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 4629–4634, 2012.
  • [27] Donatello Materassi and Murti V. Salapaka. On the problem of reconstructing an unknown topology via locality properties of the wiener filter. IEEE Transactions on Automatic Control, 57(7):1765–1777, July 2012.
  • [28] Donatello Materassi and Murti V. Salapaka. Signal selection for estimation and identification in networks of dynamic systems: A graphical model approach. IEEE Transactions on Automatic Control, 1–1, 2019.
  • [29] Rohan Money, Joshin Krishnan, and Baltasar Beferull-Lozano. Online non-linear topology identification from graph-connected time series. In 2021 IEEE Data Science and Learning Workshop (DSLW), 1–6, 2021.
  • [30] Rohan Money, Joshin Krishnan, and Baltasar Beferull-Lozano. Random feature approximation for online nonlinear graph topology identification. In 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 1–6, 2021.
  • [31] Athanasios Papoulis. Probability and Statistics. Prentice-Hall, Inc., USA, 1990.
  • [32] Sourav Patel, Sandeep Attree, Saurav Talukdar, Mangal Prakash, and Murti V Salapaka. Distributed apportioning in a power network for providing demand response services. In 2017 IEEE International Conference on Smart Grid Communications (SmartGridComm), 38–44. IEEE, 2017.
  • [33] Christopher J. Quinn, Negar Kiyavash, and Todd P. Coleman. Directed information graphs. IEEE Transactions on Information Theory, 61(12):6887–6909, 2015.
  • [34] Venkatakrishnan C. Rajagopal, Karthik R. Ramaswamy, and Paul M. J. Van Den Hof. Learning local modules in dynamic networks without prior topology information. In 2021 60th IEEE Conference on Decision and Control (CDC), 840–845, 2021.
  • [35] Karthik R. Ramaswamy and Paul M. J. Van den Hof. A local direct method for module identification in dynamic networks with correlated noise. IEEE Transactions on Automatic Control, 1–1, 2020.
  • [36] Firoozeh Sepehr and Donatello Materassi. Blind learning of tree network topologies in the presence of hidden nodes. IEEE Transactions on Automatic Control, 65(3):1014–1028, March 2020.
  • [37] Firoozeh Sepehr and Donatello Materassi. An algorithm to learn polytree networks with hidden nodes. In Advances in Neural Information Processing Systems 32, 15110–15119. Curran Associates, Inc., 2019.
  • [38] Yanning Shen, Xiao Fu, Georgios B. Giannakis, and Nicholas D. Sidiropoulos. Topology identification of directed graphs via joint diagonalization of correlation matrices. IEEE Transactions on Signal and Information Processing over Networks, 6:271–283, 2020.
  • [39] Shengling Shi, Xiaodong Cheng, and Paul M. J. Van den Hof. Single module identifiability in linear dynamic networks with partial excitation and measurement. IEEE Transactions on Automatic Control, 68(1):285–300, December 2021.
  • [40] Jitkomut Songsiri and Lieven Vandenberghe. Topology selection in graphical models of autoregressive processes. Journal of Machine Learning Research, 11(91):2671–2705, 2010.
  • [41] Petre Stoica and Randolph L. Moses. Spectral analysis of signals, volume 452. Pearson Prentice Hall Upper Saddle River, NJ, 2005.
  • [42] Saurav Talukdar, Deepjyothi Deka, Michael Chertkov, and Murti V. Salapaka. Topology learning of radial dynamical systems with latent nodes. In 2018 Annual American Control Conference (ACC), 1096–1101, June 2018.
  • [43] Saurav Talukdar, Deepjyoti Deka, Harish Doddi, Donatello Materassi, Michael Chertkov, and Murti V. Salapaka. Physics informed topology learning in networks of linear dynamical systems. Automatica, 112:108705, 2020.
  • [44] Reha H. Tütüncü, Kim-Chuan Toh, and Michael J. Todd. SDPT3—a Matlab software package for semidefinite-quadratic-linear programming, version 3.0. Web page http://www.math.nus.edu.sg/mattohkc/sdpt3.html, 2001.
  • [45] Paul M. J. Van den Hof, Arne Dankers, Peter S. C. Heuberger, and Xavier Bombois. Identification of dynamic models in complex networks with prediction error methods—basic methods for consistent module estimates. Automatica, 49(10):2994–3006, 2013.
  • [46] Mishfad S. Veedu, Harish Doddi, and Murti V. Salapaka. Topology learning of linear dynamical systems with latent nodes using matrix decomposition. IEEE Transactions on Automatic Control, 67(11):5746–5761, Nov. 2022.
  • [47] Allen J Wood, Bruce F Wollenberg, and Gerald B Sheblé. Power generation, operation, and control. John Wiley & Sons, 2013.
  • [48] Mattia  Zorzi and Rudolph Sepulchre. Ar identification of latent-variable graphical models. IEEE Transactions on Automatic Control, 61(9):2327–2340, Sep. 2016.
  • [49] Mattia Zorzi and Alessandro Chiuso. Sparse plus low rank network identification: A nonparametric approach. Automatica, 76:355–366, 2017.