This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Stochastic Dynamics of Noisy Average Consensus: Analysis and Optimization

Tadashi Wadayama and Ayano Nakai-Kasai
Part of this research was presented at IEEE International Symposium on Information Theory 2022 (ISIT2022) [1]. 1Nagoya Institute of Technology, Gokiso, Nagoya, Aichi 466-8555, Japan,
wadayama@nitech.ac.jp, nakai.ayano@nitech.ac.jp
Abstract

A continuous-time average consensus system is a linear dynamical system defined over a graph, where each node has its own state value that evolves according to a simultaneous linear differential equation. A node is allowed to interact with neighboring nodes. Average consensus is a phenomenon that the all the state values converge to the average of the initial state values. In this paper, we assume that a node can communicate with neighboring nodes through an additive white Gaussian noise channel. We first formulate the noisy average consensus system by using a stochastic differential equation (SDE), which allows us to use the Euler-Maruyama method, a numerical technique for solving SDEs. By studying the stochastic behavior of the residual error of the Euler-Maruyama method, we arrive at the covariance evolution equation. The analysis of the residual error leads to a compact formula for mean squared error (MSE), which shows that the sum of the inverse eigenvalues of the Laplacian matrix is the most dominant factor influencing the MSE. Furthermore, we propose optimization problems aimed at minimizing the MSE at a given target time, and introduce a deep unfolding-based optimization method to solve these problems. The quality of the solution is validated by numerical experiments.

I Introduction

Continuous-time average consensus system is a linear dynamical system defined over a graph [3]. Each node has its own state value, and it evolves according to a simultaneous linear differential equation where a node is only allowed to interact with neighboring nodes. The ordinary differential equation (ODE) at the node i(1in)i(1\leq i\leq n) governing the evolution of the state value xi(t)x_{i}(t) of the node ii is given by

dxi(t)dt=j𝒩iμij(xi(t)xj(t)).\displaystyle\frac{dx_{i}(t)}{dt}=-\sum_{j\in{\cal N}_{i}}\mu_{ij}(x_{i}(t)-x_{j}(t)). (1)

The set 𝒩i{\cal N}_{i} denote the neighboring nodes of node ii, while the positive scalar μij\mu_{ij} denotes the edge weight associated with the edge (i,j)(i,j). The same ODE applies to all other nodes as well. These dynamics gradually decrease the differences between the state values of neighboring nodes, leading to a phenomenon called average consensus that the all the state values converge to the average of the initial state values [2].

The average consensus system has been studied in numerous fields such as multi-agent control [4], distributed algorithm [5], formation control [6]. An excellent survey on average consensus systems can be found in [3].

In this paper, we will examine average consensus systems within the context of communications across noisy channels, such as wireless networks. Specifically, we consider the scenario in which nodes engage in local wireless communication, such as drones flying in the air or sensors dispersed across a designated area. It is assumed that each node can only communicate with neighboring nodes via an additive white Gaussian noise (AWGN) channel. The objective of the communication is to aggregate the information held by all nodes through the application of average consensus systems. As previously stated, the consensus value is the average of the initial state values.

In this setting, we must account for the impact of Gaussian noise on the differential equations. The differential equation for a noisy average consensus system takes the form:

dxi(t)dt=j𝒩iμij(xi(t)xj(t))+αWi(t),\displaystyle\frac{dx_{i}(t)}{dt}=-\sum_{j\in{\cal N}_{i}}\mu_{ij}(x_{i}(t)-x_{j}(t))+\alpha W_{i}(t), (2)

where Wi(t)W_{i}(t) represents an additive white Gaussian process, and α\alpha is a positive constant. The noise Wi(t)W_{i}(t) can be considered as the sum of the noises occurring on the edges adjacent to the node ii. In a noiseless average consensus system, it is well-established that the second smallest eigenvalue of the Laplacian matrix of the graph determines the convergence speed to the average [5]. The convergence behavior of a noisy system may be quite different from that of the noiseless system due to the presence of edge noise. However, the stochastic dynamics of such a system has not yet been studied. Studies on discrete-time consensus protocols subject to additive noise can be found in [11][12], but to the best of our knowledge, there are no prior studies on continuous-time noisy consensus systems.

The main goal of this paper is to study the stochastic dynamics of continuous-time noisy average consensus system. The theoretical understanding of the stochastic behavior of such systems will be valuable for various areas such as multi-agent control and the design of consensus-based distributed algorithms for noisy environments.

The primary contributions of this paper are as follows. We first formulate the noisy average consensus systems using stochastic differential equations (SDE) [7][8]. This SDE formulation facilitates mathematically rigorous treatment of noisy average consensus. We use the Euler-Maruyama method [7] for solving the SDE, which is a numerical method for solving SDEs. We derive a closed-form mean squared error (MSE) formula by analyzing the stochastic behavior of the residual errors in the Euler-Maruyama method. We show that the MSE is dominated by the sum of the inverse eigenvalues of the Laplacian matrix. However, minimizing the MSE at a specific target time is a non-trivial task because the objective function involves the sum of the inverse eigenvalues. To solve this optimization problem, we will propose a deep unfolding-based optimization method.

The outline of the paper is as follows. In Section 2, we introduce the mathematical notation used throughout the paper, and then provide the definition and fundamental properties of average consensus systems. In Section 3, we define a noisy average consensus system as a SDE. In Section 4, we present an analysis of the stochastic behavior of the consensus error and derive a concise MSE formula. In Section 5, we propose a deep unfolding-based optimization method for minimizing the MSE at a specified target time. Finally, in Section 6, we conclude the discussion.

II Preliminaries

II-A Notation

The following notation will be used throughout this paper. The symbols \mathbb{R} and +\mathbb{R}_{+} represent the set of real numbers and the set of positive real numbers, respectively. The one dimensional Gaussian distribution with mean μ\mu and variance σ2\sigma^{2} is denoted by 𝒩(μ,σ2){\cal N}(\mu,\sigma^{2}). The multivariate Gaussian distribution with mean vector 𝝁\bm{\mu} and covariance matrix 𝚺\bm{\Sigma} is represented by 𝒩(𝝁,𝚺){\cal N}(\bm{\mu},\bm{\Sigma}). The expectation operator is denoted by 𝖤[]{\sf E}[\cdot]. The notation diag(𝒙)\mbox{diag}(\bm{x}) is the diagonal matrix whose diagonal elements are given by 𝒙n\bm{x}\in\mathbb{R}^{n}. The matrix exponential exp(𝑿)(𝑿n×n)\exp(\bm{X})(\bm{X}\in\mathbb{R}^{n\times n}) is defined by

exp(𝑿)k=01k!𝑿k.\displaystyle\exp(\bm{X})\equiv\sum_{k=0}^{\infty}\frac{1}{k!}\bm{X}^{k}. (3)

The Frobenius norm of 𝑿n×n\bm{X}\in\mathbb{R}^{n\times n} is denoted by 𝑿F\|\bm{X}\|_{F}. The notation [n][n] denotes the set of consecutive integers from 11 to nn.

II-B Average Consensus

Let G(V,E)G\equiv(V,E) be a connected undirected graph where V=[n]V=[n]. Suppose that a node iVi\in V can be regarded as an agent communicating over the graph GG. Namely, a node ii and a node jj can communicate with each other if (i,j)E(i,j)\in E. We will not distinguish (i,j)(i,j) and (j,i)(j,i) because the graph GG is undirected.

Each node ii has a state value xi(t)x_{i}(t)\in\mathbb{R} where tt\in\mathbb{R} represents continuous-time variable. The neighborhood of a node iVi\in V is represented by

𝒩i{jV:(j,i)E,ij}.\displaystyle{\cal N}_{i}\equiv\{j\in V:(j,i)\in E,i\neq j\}. (4)

Note that the node ii is excluded from 𝒩i{\cal N}_{i}. For any time tt, a node iVi\in V can access the self-state xi(t)x_{i}(t) and the state values of its neighborhood, i.e., xj(t),j𝒩ix_{j}(t),j\in{\cal N}_{i} but cannot access to the other state values.

In this section, we briefly review the basic properties of the average consensus processes [3]. We now assume that the set of state values 𝒙(t)(x1(t),x2(t),,xn(t))T\bm{x}(t)\equiv(x_{1}(t),x_{2}(t),\ldots,x_{n}(t))^{T} are evolved according to the simultaneous differential equations

dxi(t)dt=j𝒩iμij(xi(t)xj(t)),i[n],\displaystyle\frac{dx_{i}(t)}{dt}=-\sum_{j\in{\cal N}_{i}}\mu_{ij}(x_{i}(t)-x_{j}(t)),\quad i\in[n], (5)

where the initial condition is

𝒙(0)=𝒄(c1,c2,,cn)Tn.\displaystyle\bm{x}(0)=\bm{c}\equiv(c_{1},c_{2},\ldots,c_{n})^{T}\in\mathbb{R}^{n}. (6)

The edge weight μij\mu_{ij} follows the symmetric condition

μij=μji,i[n],j[n].\displaystyle\mu_{ij}=\mu_{ji},\quad i\in[n],j\in[n]. (7)

Let 𝚫(Δ1,Δ2,,Δn)T+n\bm{\Delta}\equiv(\Delta_{1},\Delta_{2},\ldots,\Delta_{n})^{T}\in\mathbb{R}_{+}^{n} be a degree sequence where Δi\Delta_{i} is defined by

Δij𝒩iμij,i[n].\displaystyle\Delta_{i}\equiv\sum_{j\in{\cal N}_{i}}\mu_{ij},\quad i\in[n]. (8)

The continuous-time dynamical system (5) is called an average consensus system because a state value converges to the average of the initial state values at the limit of tt\rightarrow\infty, i.e,

limt𝒙(t)=1n(i=1nci)𝟏=γ𝟏,\displaystyle\lim_{t\rightarrow\infty}\bm{x}(t)=\frac{1}{n}\left(\sum_{i=1}^{n}c_{i}\right)\bm{1}{=\gamma\bm{1}}, (9)

where the vector 𝟏\bm{1} represents (1,1,,1)T(1,1,\ldots,1)^{T} and γ\gamma is defined by

γ1ni=1nci.\displaystyle\gamma\equiv\frac{1}{n}\sum_{i=1}^{n}c_{i}. (10)

We define the Laplacian matrix 𝑳{Lij}n×n\bm{L}\equiv\{L_{ij}\}\in\mathbb{R}^{n\times n} of this consensus system as follows:

Lij\displaystyle L_{ij} =Δi,i=j,i[n],\displaystyle=\Delta_{i},\quad i=j,\ i\in[n], (11)
Lij\displaystyle L_{ij} =μij,ij and (i,j)E,\displaystyle=-\mu_{ij},\quad i\neq j\mbox{ and }(i,j)\in E, (12)
Lij\displaystyle L_{ij} =0,ij and (i,j)E.\displaystyle=0,\quad i\neq j\mbox{ and }(i,j)\notin E. (13)

From this definition, a Laplacian matrix satisfies

𝑳𝟏\displaystyle\bm{L}\bm{1} =𝟎,\displaystyle=\bm{0}, (14)
diag(𝑳)\displaystyle\mbox{diag}(\bm{L}) =𝚫,\displaystyle=\bm{\Delta}, (15)
𝑳\displaystyle\bm{L} =𝑳T.\displaystyle=\bm{L}^{T}. (16)

Note that the eigenvalues of the Laplacian matrix 𝑳\bm{L} are nonnegative real because 𝑳\bm{L} is a positive semi-definite symmetric matrix. Let λ1=0<λ2λn\lambda_{1}=0<\lambda_{2}\leq\ldots\leq\lambda_{n} be the eigenvalues of 𝑳\bm{L} and 𝝃1,𝝃2,,𝝃n\bm{\xi}_{1},\bm{\xi}_{2},\ldots,\bm{\xi}_{n} be the corresponding orthonormal eigenvectors. The first eigenvector 𝝃1(1/n)𝟏\bm{\xi}_{1}\equiv(1/\sqrt{n})\bm{1} is corresponding to the eigenvalue λ1=0\lambda_{1}=0, which results in 𝑳𝝃1=0\bm{L}\bm{\xi}_{1}=0.

By using the notion of the Laplacian matrix, the dynamical system (5) can be compactly rewritten as

d𝒙(t)dt=𝑳𝒙(t),\displaystyle\frac{d\bm{x}(t)}{dt}=-\bm{L}\bm{x}(t), (17)

where the initial condition is 𝒙(0)=𝒄\bm{x}(0)=\bm{c}. The dynamical behaviors of the average consensus system (17) are thus characterized by the Laplacian matrix 𝑳\bm{L}. Since the ODE (17) is a linear ODE, it can be easily solved. The solution of the ODE (17) is given by

𝒙(t)=exp(𝑳t)𝒙(0),t0.\displaystyle\bm{x}(t)=\exp(-\bm{L}t)\bm{x}(0),\quad t\geq 0. (18)

Let 𝑼(𝝃1,𝝃2,,𝝃n)n×n\bm{U}\equiv(\bm{\xi}_{1},\bm{\xi}_{2},\ldots,\bm{\xi}_{n})\in\mathbb{R}^{n\times n} where 𝑼\bm{U} is an orthogonal matrix. The Laplacian matrix 𝑳\bm{L} can be diagonalized by using 𝑼\bm{U}, i.e.,

𝑳=𝑼diag(λ1,,λn)𝑼T.\displaystyle\bm{L}=\bm{U}\mbox{diag}(\lambda_{1},\ldots,\lambda_{n})\bm{U}^{T}. (19)

On the basis of the diagonalization, we have the spectral expansion of the matrix exponential:

exp(𝑳t)\displaystyle\exp(-\bm{L}t) =exp(𝑼diag(λ1,,λn)𝑼Tt)\displaystyle=\exp(-\bm{U}\mbox{diag}(\lambda_{1},\ldots,\lambda_{n})\bm{U}^{T}t)
=𝑼exp(diag(λ1,,λn)t)𝑼T\displaystyle=\bm{U}\exp(-\mbox{diag}(\lambda_{1},\ldots,\lambda_{n})t)\bm{U}^{T}
=i=1nexp(λit)𝝃i𝝃iT.\displaystyle=\sum_{i=1}^{n}\exp(-\lambda_{i}t)\bm{\xi}_{i}\bm{\xi}_{i}^{T}. (20)

Substituting this to 𝒙(t)=exp(𝑳t)𝒙(0)\bm{x}(t)=\exp(-\bm{L}t)\bm{x}(0), we immediately have

𝒙(t)=1n𝟏(𝟏T)𝒄+i=2nexp(λit)𝝃i𝝃iT𝒄.\displaystyle\bm{x}(t)=\frac{1}{n}\bm{1}(\bm{1}^{T})\bm{c}+\sum_{i=2}^{n}\exp(-\lambda_{i}t)\bm{\xi}_{i}\bm{\xi}_{i}^{T}\bm{c}. (21)

The second term of the right-hand side converges to zero since λk>0\lambda_{k}>0 for k=2,3,,nk=2,3,\ldots,n. This explains why average consensus happens, i.e., the convergence to the average of the initial state values (9). The second smallest eigenvalue λ2\lambda_{2}, called algebraic connectivity [10], determines the convergence speed because exp(λ2t)𝝃2𝝃2𝖳\exp(-\lambda_{2}t)\bm{\xi}_{2}\bm{\xi}_{2}^{\sf T} shows the slowest convergence in the second term.

III Noisy average consensus system

III-A SDE formulation

The dynamical model (2) containing a white Gaussian noise process is mathematically challenging to handle. We will use a common approach of approximating the white Gaussian process by using the standard Wiener process. Instead of model (2), we will focus on the following stochastic differential equation (SDE)  [8]

d𝒙(t)=𝑳𝒙(t)dt+αd𝒃(t)d\bm{x}(t)=-\bm{L}\bm{x}(t)dt+\alpha d\bm{b}(t) (22)

to study the noisy average consensus system. The parameter α\alpha is a positive real number, and it represents the intensity of the noises. The stochastic term 𝒃(t)\bm{b}(t) represents the nn-dimensional standard Wiener process. The elements of 𝒃(t)=(b1(t),b2(t),,bn(t))T\bm{b}(t)=(b_{1}(t),b_{2}(t),\ldots,b_{n}(t))^{T} are independent one dimensional-standard Wiener processes. For the Wiener process b(t)b(t), we have b(0)=0b(0)=0, E[b(t)]=0E[b(t)]=0, and it satisfies

b(t)b(s)𝒩(0,ts), 0st.\displaystyle b(t)-b(s)\sim{\cal N}(0,t-s),\ 0\leq s\leq t. (23)

III-B Approaches for studying stochastic dynamics

Our primary objective in the following analysis is to investigate the stochastic dynamics of the noisy average consensus system, focusing on deriving the mean and covariance of the solution 𝒙(t)\bm{x}(t) for the SDE (22).

There are two approaches to analyze the system. The first approach relies on the established theory of Ito calculus [8], which is used to handle stochastic integrals directly (see Fig. 1). Ito calculus can be applied to derive the first and second moments of the solution of (22).

Alternatively, the second approach employs the Euler-Maruyama (EM) method [7] and utilizes the weak convergence property [7] of the EM method. We will adopt the latter approach in our analysis, as it does not require knowledge of advanced stochastic calculus if we accept the weak convergence property. Additionally, this approach can be naturally extended to the analysis on the discrete-time noisy average consensus system. Furthermore, the EM method plays a key role in the optimization method to be presented in Section V. Our analysis motivates the use EM method for optimizing the covariance.

Refer to caption
Figure 1: Two approaches for deriving the mean and covariance of 𝒙(t)\bm{x}(t). This paper follows the lower path using the EM method.

III-C Euler-Maruyama method

We use the Euler-Maruyama method corresponding to this SDE so as to study the stochastic behavior of the solution of the SDE (22) defined above. The EM method is well-known numerical method for solving SDEs [7].

Assume that we need numerical solutions of a SDE in the time interval 0tT0\leq t\leq T. We divide this interval into NN bins and let tkkη,k=0,1,,Nt_{k}\equiv k\eta,\ k=0,1,\ldots,N where the interval η\eta is given by ηT/N.\eta\equiv{T}/{N}. Let us define a discretized sample 𝒙(k)\bm{x}^{(k)} be 𝒙(k)𝒙(tk).\bm{x}^{(k)}\equiv\bm{x}(t_{k}). It should be noted that, the choice of the width η\eta is crucial in order to ensure the stability and the accuracy of the EM method. A small width leads to a more accurate solution, but requires more computational time. A large width may be computationally efficient but may lead to instability in the solution.

The recursive equation of the EM method corresponding to SDE (22) is given by

𝒙(k+1)=𝒙(k)η𝑳𝒙(k)+α𝒘(k),k=0,1,2,,N,\displaystyle\bm{x}^{(k+1)}=\bm{x}^{(k)}-\eta\bm{L}\bm{x}^{(k)}+\alpha\bm{w}^{(k)},\ k=0,1,2,\ldots,N, (24)

where each element of 𝒘(k)(w1(k),w2(k),,wn(k))T\bm{w}^{(k)}\equiv(w_{1}^{(k)},w_{2}^{(k)},\ldots,w_{n}^{(k)})^{T} follows wi(k)𝒩(0,η).w_{i}^{(k)}\sim{\cal N}(0,\eta). In the following discussion, we will use the equivalent expression [7]:

𝒙(k+1)=𝒙(k)η𝑳𝒙(k)+αη𝒛(k),k=0,1,2,,N,\displaystyle\bm{x}^{(k+1)}=\bm{x}^{(k)}-\eta\bm{L}\bm{x}^{(k)}+\alpha\sqrt{\eta}\bm{z}^{(k)},\ k=0,1,2,\ldots,N, (25)

where 𝒛(k)\bm{z}^{(k)} is a random vector following the multivariate Gaussian distribution 𝒩(𝟎,𝑰){\cal N}(\bm{0},\bm{I}). The initial vector 𝒙(0)\bm{x}^{(0)} is set to be 𝒄\bm{c}. This recursive equation will be referred to as the Euler-Maruyama recursive equation.

Figure 2 presents a solution evaluated with the EM method. The cycle graph with 10 nodes with the degree sequence 𝒅=(2,2,,2)\bm{d}=(2,2,\ldots,2) is assumed. The initial value is randomly initialized as 𝒙(0)𝒩(0,𝑰)\bm{x}(0)\sim{\cal N}(0,\bm{I}). We can confirm that the state values are certainly converging to the average value γ\gamma in the case of noiseless case (left). On the other hand, the state vector fluctuates around the average in the noisy case (right).

Refer to caption
Figure 2: Trajectories of 𝒙(tk)=(x1(tk),,xn(tk))\bm{x}(t_{k})=(x_{1}(t_{k}),\ldots,x_{n}(t_{k})) estimated by using the EM method. Cycle graph with 10 nodes were used. The range [0,10.0][0,10.0] are discretized with N=100N=100 points. The consensus average value is γ=0.3267\gamma=-0.3267. Left panel: noiseless case (α=0.0)(\alpha=0.0), Right panel: noisy case (α=0.1)(\alpha=0.1).

IV Analysis for Noisy average consensus

IV-A Recursive equation for residual error

In the following, we will analyze the stochastic behavior of the residual error. This will be the basis for the MSE formula to be presented.

Recall that the initial state vector is 𝒄=(c1,c2,,cn)T\bm{c}=(c_{1},c_{2},\ldots,c_{n})^{T} and that the average of the initial values is denoted by γ\gamma. Since the set of eigenvectors {𝝃1,,𝝃n}\{\bm{\xi}_{1},\ldots,\bm{\xi}_{n}\} of 𝑳\bm{L} is an orthonormal base, we can expand the initial state vector 𝒄\bm{c} as

𝒄=ζ1𝝃1+ζ2𝝃2++ζn𝝃n,\displaystyle\bm{c}=\zeta_{1}\bm{\xi}_{1}+\zeta_{2}\bm{\xi}_{2}+\cdots+\zeta_{n}\bm{\xi}_{n}, (26)

where the coefficient is obtained by ζi=𝒄T𝝃i(i[n])\zeta_{i}=\bm{c}^{T}\bm{\xi}_{i}(i\in[n]). Note that ζ1𝝃1=γ𝟏\zeta_{1}\bm{\xi}_{1}=\gamma\bm{1} holds.

At the initial index k=0k=0, the Euler-Maruyama recursive equation becomes

𝒙(1)=𝒙(0)η𝑳𝒙(0)+αη𝒛(0).\displaystyle\bm{x}^{(1)}=\bm{x}^{(0)}-\eta\bm{L}\bm{x}^{(0)}+\alpha\sqrt{\eta}\bm{z}^{(0)}. (27)

Substituting (26) into the above equation, we have

𝒙(1)\displaystyle\bm{x}^{(1)} =𝒙(0)η𝑳(ζ1𝝃1+ζ2𝝃2++ζn𝝃n)+αη𝒛(0)\displaystyle=\bm{x}^{(0)}-\eta\bm{L}(\zeta_{1}\bm{\xi}_{1}+\zeta_{2}\bm{\xi}_{2}+\cdots+\zeta_{n}\bm{\xi}_{n})+\alpha\sqrt{\eta}\bm{z}^{(0)}
=𝒙(0)ηζ1𝑳𝝃1ηL(𝒙(0)ζ1𝝃1)+αη𝒛(0)\displaystyle=\bm{x}^{(0)}-\eta\zeta_{1}\bm{L}\bm{\xi}_{1}-\eta L(\bm{x}^{(0)}-\zeta_{1}\bm{\xi}_{1})+\alpha\sqrt{\eta}\bm{z}^{(0)}
=𝒙(0)η𝑳(𝒙(0)γ𝟏)+αη𝒛(0),\displaystyle={\bm{x}^{(0)}-\eta\bm{L}(\bm{x}^{(0)}-\gamma\bm{1})+\alpha\sqrt{\eta}\bm{z}^{(0)}}, (28)

where the equations L𝝃1=𝟎L\bm{\xi}_{1}=\bm{0} and ζ1𝝃1=γ𝟏\zeta_{1}\bm{\xi}_{1}=\gamma\bm{1} are used in the last equality. Subtracting γ𝟏\gamma\bm{1} from the both sides, we get

𝒙(1)γ𝟏=(𝑰η𝑳)(𝒙(0)γ𝟏)+αη𝒛(0).\displaystyle\bm{x}^{(1)}-\gamma\bm{1}=(\bm{I}-\eta\bm{L})(\bm{x}^{(0)}-\gamma\bm{1})+\alpha\sqrt{\eta}\bm{z}^{(0)}. (29)

For the index k1k\geq 1, the Euler-Maruyama recursive equation can be written as

𝒙(k+1)=(𝑰η𝑳)𝒙(k)+αη𝒛(k).\displaystyle\bm{x}^{(k+1)}=(\bm{I}-\eta\bm{L})\bm{x}^{(k)}+\alpha\sqrt{\eta}\bm{z}^{(k)}. (30)

Subtracting γ𝟏\gamma\bm{1} from the both sides, we have

𝒙(k+1)γ𝟏=(𝑰η𝑳)𝒙(k)γ𝟏+αη𝒛(k).\displaystyle\bm{x}^{(k+1)}-\gamma\bm{1}=(\bm{I}-\eta\bm{L})\bm{x}^{(k)}-\gamma\bm{1}+\alpha\sqrt{\eta}\bm{z}^{(k)}. (31)

By using the relation (𝑰η𝑳)γ𝟏=γ𝟏,(\bm{I}-\eta\bm{L})\gamma\bm{1}=\gamma\bm{1}, we can rewrite the above equation as

𝒙(k+1)γ𝟏\displaystyle\bm{x}^{(k+1)}-\gamma\bm{1} =(𝑰η𝑳)𝒙(k)(𝑰η𝑳)γ𝟏+αη𝒛(k)\displaystyle=(\bm{I}-\eta\bm{L})\bm{x}^{(k)}-(\bm{I}-\eta\bm{L})\gamma\bm{1}+\alpha\sqrt{\eta}\bm{z}^{(k)}
=(𝑰η𝑳)(𝒙(k)γ𝟏)+αη𝒛(k).\displaystyle=(\bm{I}-\eta\bm{L})(\bm{x}^{(k)}-\gamma\bm{1})+\alpha\sqrt{\eta}\bm{z}^{(k)}. (32)

It can be confirmed the above recursion (32) is consistent with the initial equation (29). We here summarize the above argument as the following lemma.

Lemma 1

Let 𝐞(k)𝐱(k)γ𝟏\bm{e}^{(k)}\equiv\bm{x}^{(k)}-\gamma\bm{1} be the residual error at index kk. The evolution of the residual error of the EM method is described by

𝒆(k+1)=(𝑰η𝑳)𝒆(k)+αη𝒛(k)\displaystyle\bm{e}^{(k+1)}=(\bm{I}-\eta\bm{L})\bm{e}^{(k)}+\alpha\sqrt{\eta}\bm{z}^{(k)} (33)

for k0k\geq 0.

The residual error 𝒆(k)\bm{e}^{(k)} denotes the error between the average vector γ𝟏\gamma\bm{1} and the state vector 𝒙(k)\bm{x}^{(k)} at time index kk. By analyzing the statistical behavior of 𝒆(k)\bm{e}^{(k)}, we can gain insight into the stochastic properties of the dynamics of the noisy consensus system.

IV-B Asymptotic mean of residual error

Let a vector 𝒙𝒩(𝝁,𝚺)\bm{x}\sim{\cal N}(\bm{\mu},\bm{\Sigma}). Recall that the vector obtained by a linear map 𝒚=𝑨𝒙\bm{y}=\bm{A}\bm{x} also follows the Gaussian distribution, i.e.,

𝒚𝒩(𝑨𝝁,𝑨𝚺𝑨T),\displaystyle\bm{y}\sim{\cal N}(\bm{A}\bm{\mu},\bm{A}\bm{\Sigma}\bm{A}^{T}), (34)

where 𝑨n×n\bm{A}\in\mathbb{R}^{n\times n}. If two Gaussian vectors 𝒂𝒩(𝝁a,𝚺a)\bm{a}\sim{\cal N}(\bm{\mu}_{a},\bm{\Sigma}_{a}) and 𝒃𝒩(𝝁b,𝚺b)\bm{b}\sim{\cal N}(\bm{\mu}_{b},\bm{\Sigma}_{b}) are independent, the sum 𝒛=𝒂+𝒃\bm{z}=\bm{a}+\bm{b} becomes also Gaussian, i.e,

𝒛𝒩(𝝁a+𝝁b,𝚺a+𝚺b).\displaystyle\bm{z}\sim{\cal N}(\bm{\mu}_{a}+\bm{\mu}_{b},\bm{\Sigma}_{a}+\bm{\Sigma}_{b}). (35)

In the recursive equation (33), it is evident that 𝒆(1)\bm{e}^{(1)} follows a multivariate Gaussian distribution because

𝒆(1)=(𝑰η𝑳)(𝒄γ𝟏)+αη𝒛(0)\displaystyle\bm{e}^{(1)}=(\bm{I}-\eta\bm{L})(\bm{c}-\gamma\bm{1})+\alpha\sqrt{\eta}\bm{z}^{(0)} (36)

is the sum of a constant vector and a Gaussian random vector. From the above properties of Gaussian random vectors, the residual error vector 𝒆(k)\bm{e}^{(k)} follows the multivariate Gaussian distribution 𝒩(𝝁(k),𝚺(k)){\cal N}(\bm{\mu}^{(k)},\bm{\Sigma}^{(k)}) where the mean vector 𝝁(k)\bm{\mu}^{(k)} and the covariance matrix 𝚺(k)\bm{\Sigma}^{(k)} are recursively determined by

𝝁(k+1)\displaystyle\bm{\mu}^{(k+1)} =(𝑰η𝑳)𝝁(k),\displaystyle=(\bm{I}-\eta\bm{L})\bm{\mu}^{(k)}, (37)
𝚺(k+1)\displaystyle\bm{\Sigma}^{(k+1)} =(𝑰η𝑳)𝚺(k)(𝑰η𝑳)T+α2η𝑰\displaystyle=(\bm{I}-\eta\bm{L})\bm{\Sigma}^{(k)}(\bm{I}-\eta\bm{L})^{T}+\alpha^{2}\eta\bm{I} (38)

for k0k\geq 0 where the initial values are formally given by

𝝁(0)\displaystyle\bm{\mu}^{(0)} =𝒄γ𝟏,\displaystyle=\bm{c}-\gamma\bm{1}, (39)
𝚺(0)\displaystyle\bm{\Sigma}^{(0)} =𝑶.\displaystyle=\bm{O}. (40)

Solving the recursive equation, we can get the asymptotic mean formula as follows.

Lemma 2

Suppose that T>0T>0 is given. The asymptotic mean at NN\rightarrow\infty is given by

limN𝝁(N)=exp(𝑳T)(𝒄γ𝟏).\displaystyle\lim_{N\rightarrow\infty}\bm{\mu}^{(N)}=\exp(-\bm{L}T)(\bm{c}-\gamma\bm{1}). (41)

(Proof) The mean recursion is given as 𝝁(k)=(𝑰η𝑳)k(𝒄γ𝟏)\bm{\mu}^{(k)}=(\bm{I}-\eta\bm{L})^{k}(\bm{c}-\gamma\bm{1}) for k1k\geq 1. Recall that the eigenvalue decomposition of 𝑳\bm{L} is given by 𝑳=𝑼diag(λ1,,λn)𝑼T\bm{L}=\bm{U}\mbox{diag}(\lambda_{1},\ldots,\lambda_{n})\bm{U}^{T}. From

𝑰η𝑳=𝑼(𝑰ηdiag(λ1,,λn))𝑼T,\displaystyle\bm{I}-\eta\bm{L}=\bm{U}(\bm{I}-\eta\mbox{diag}(\lambda_{1},\ldots,\lambda_{n}))\bm{U}^{T}, (42)

we have

(𝑰η𝑳)k=𝑼diag((1ηλ1)k,,(1ηλn)k)𝑼T.\displaystyle(\bm{I}-\eta\bm{L})^{k}=\bm{U}\mbox{diag}((1-\eta\lambda_{1})^{k},\ldots,(1-\eta\lambda_{n})^{k})\bm{U}^{T}. (43)

This implies, from the definition of exponential function,

limN(𝑰TN𝑳)N=exp(𝑳T),\displaystyle\lim_{N\rightarrow\infty}\left(\bm{I}-\frac{T}{N}\bm{L}\right)^{N}=\exp(-\bm{L}T), (44)

where η=T/N\eta=T/N.

It is easy to confirm that the claim of this lemma is consistent with the continuous solution of noiseless case (18). Namely, at the limit of α0\alpha\rightarrow 0, the state evolution of the noisy system converges to that of the noiseless system.

IV-C Asymptotic covariance of residual error

We here discuss the asymptotic behavior of the covariance matrix 𝚺(N)\bm{\Sigma}^{(N)} at the limit of NN\rightarrow\infty.

Lemma 3

Suppose that T>0T>0 is given. The asymptotic covariance matrix at NN\rightarrow\infty is given by

limN𝚺(N)=𝑼diag(α2T,θ2,θ3,,θn)𝑼T,\displaystyle\lim_{N\rightarrow\infty}\bm{\Sigma}^{(N)}=\bm{U}\mbox{diag}\left(\alpha^{2}T,\theta_{2},\theta_{3},\ldots,\theta_{n}\right)\bm{U}^{T}, (45)

where θi\theta_{i} is defined by

θiα22λi(1e2λiT).\displaystyle\theta_{i}\equiv\frac{\alpha^{2}}{2\lambda_{i}}\left(1-e^{-2\lambda_{i}T}\right). (46)

(Proof) Recall that

𝑰η𝑳=𝑼diag(1,1ηλ2,1ηλn)𝑼T.\displaystyle\bm{I}-\eta\bm{L}=\bm{U}\mbox{diag}(1,1-\eta\lambda_{2}\ldots,1-\eta\lambda_{n})\bm{U}^{T}. (47)

Let 𝚺(k)=𝑼diag(s1(k),,sn(k))𝑼T\bm{\Sigma}^{(k)}=\bm{U}\mbox{diag}(s_{1}^{(k)},\ldots,s_{n}^{(k)})\bm{U}^{T}. A spectral representation of the covariance evolution (38) is thus given by

diag(s1(k+1),,sn(k+1))\displaystyle\mbox{diag}(s_{1}^{(k+1)},\ldots,s_{n}^{(k+1)})
=diag(s1(k),s2(k)(1ηλ2)2,sn(k)(1ηλn)2)+α2η𝑰,\displaystyle=\mbox{diag}(s_{1}^{(k)},s_{2}^{(k)}(1-\eta\lambda_{2})^{2}\ldots,s_{n}^{(k)}(1-\eta\lambda_{n})^{2})+\alpha^{2}\eta\bm{I}, (48)

where si(0)=0s_{i}^{(0)}=0. The first component follows a recursion s1(k+1)=s1(k)+α2ηs_{1}^{(k+1)}=s_{1}^{(k)}+\alpha^{2}\eta and thus we have s1(N)=α2ηN=α2T.s_{1}^{(N)}=\alpha^{2}\eta N=\alpha^{2}T. Another component follows

si(k+1)=si(k)(1ηλi)2+α2η.\displaystyle s_{i}^{(k+1)}=s_{i}^{(k)}(1-\eta\lambda_{i})^{2}+\alpha^{2}\eta. (49)

Let us consider the characteristic equation of (49) which is given by

s=s(1ηλi)2+α2η.\displaystyle s=s(1-\eta\lambda_{i})^{2}+\alpha^{2}\eta. (50)

The solution of the equation is given by

s=α2η1(1ηλi)2.\displaystyle s=\frac{\alpha^{2}\eta}{1-(1-\eta\lambda_{i})^{2}}. (51)

The above recursive equation (49) thus can be transformed as

si(k+1)s=(si(k)s)(1ηλi)2.\displaystyle s_{i}^{(k+1)}-s=(s_{i}^{(k)}-s)(1-\eta\lambda_{i})^{2}. (52)

From the above equation, si(N)s_{i}^{(N)} can be solved as

si(N)=s+(si(0)s)(1ηλi)2N.\displaystyle s_{i}^{(N)}=s+(s_{i}^{(0)}-s)(1-\eta\lambda_{i})^{2N}. (53)

Taking the limit NN\rightarrow\infty, we have

limNsi(N)=α22λi(1e2λiT).\displaystyle\lim_{N\rightarrow\infty}s_{i}^{(N)}=\frac{\alpha^{2}}{2\lambda_{i}}\left(1-e^{-2\lambda_{i}T}\right). (54)

We thus have the claim of this lemma.

IV-D Weak convergence of Euler-Maruyama method

As previously noted, the asymptotic mean (41) is consistent with the continuous solution. The weak convergence property of the EM method [7] allows us to obtain the moments of the error at time tt.

We will briefly explain the weak convergence property. Suppose a SDE with the form:

d𝒙(t)=ϕ(𝒙(t))dt+ψ(𝒙(t))d𝒃(t).\displaystyle d\bm{x}(t)=\phi(\bm{x}(t))dt+\psi(\bm{x}(t))d\bm{b}(t). (55)

If ϕ\phi and ψ\psi are bounded and Lipschitz continuous, then the finite order moment estimated by the EM method converges to the exact moment of the solution 𝒙(t)\bm{x}(t) at the limit NN\rightarrow\infty [7]. This property is called the weak convergence property. In our case, the SDE (22) has bounded and Lipschitz continuous coefficient functions, i.e, ϕ(𝒙)=𝑳𝒙\phi(\bm{x})=-\bm{L}\bm{x} and ψ(𝒙)=α\psi(\bm{x})=\alpha. Hence, we can employ the weak convergence property in our analysis.

Suppose 𝒙(t)\bm{x}(t) is a solution of SDE (22) with the initial condition 𝒙(0)=𝒄\bm{x}(0)=\bm{c}. Let 𝝁(t)\bm{\mu}(t) be the mean vector of the residual error 𝒆(t)=𝒙(t)γ𝟏\bm{e}(t)=\bm{x}(t)-\gamma\bm{1} and 𝚺(t)\bm{\Sigma}(t) is the covariance matrix of the residual error 𝒆(t)\bm{e}(t).

Theorem 1

For a positive real number t>0t>0, the mean and the covariance matrix of the residual error 𝐞(t)\bm{e}(t) are given by

𝝁(t)\displaystyle\bm{\mu}(t) =exp(𝑳t)(𝒄γ𝟏)\displaystyle=\exp(-\bm{L}t)(\bm{c}-\gamma\bm{1}) (56)
𝚺(t)\displaystyle\bm{\Sigma}(t) =𝑼diag(α2t,θ2,θ3,,θn)𝑼T.\displaystyle=\bm{U}\mbox{diag}\left(\alpha^{2}t,\theta_{2},\theta_{3},\ldots,\theta_{n}\right)\bm{U}^{T}. (57)

(Proof) Due to the weak convergence property of the EM method, the first and second moments of the error are converged to the asymptotic mean and covariance of the EM method [7], i.e.,

𝝁(T)\displaystyle\bm{\mu}(T) =limN𝝁(N)\displaystyle=\lim_{N\rightarrow\infty}\bm{\mu}^{(N)} (58)
𝚺(T)\displaystyle\bm{\Sigma}(T) =limN𝚺(N),\displaystyle=\lim_{N\rightarrow\infty}\bm{\Sigma}^{(N)}, (59)

where NN and TT are related by T=ηNT=\eta N. Applying Lemmas 2 and 3 and replacing the variable TT by tt provide the claim of the theorem.

IV-E Mean squared error

In the following, we assume that the initial state vector 𝒄\bm{c} follows Gaussian distribution 𝒩(𝟎,𝑰){\cal N}(\bm{0},\bm{I}).

In this setting, 𝝁(t)\bm{\mu}(t) also follows multivariate Gaussian distribution with the mean vector 𝟎\bm{0} and the covariance matrix 𝑸(t)𝑸(t)T\bm{Q}(t)\bm{Q}(t)^{T} where

𝑸(t)exp(𝑳t)(𝑰1n𝟏(𝟏T))\displaystyle\bm{Q}(t)\equiv\exp(-\bm{L}t)\left(\bm{I}-\frac{1}{n}\bm{1}(\bm{1}^{T})\right) (60)

because 𝝁(t)\bm{\mu}(t) can be rewritten as

𝝁(t)\displaystyle\bm{\mu}(t) =exp(𝑳t)(𝒄γ𝟏)=exp(𝑳t)(𝑰1n𝟏(𝟏T))𝒄.\displaystyle=\exp(-\bm{L}t)(\bm{c}-\gamma\bm{1})=\exp(-\bm{L}t)\left(\bm{I}-\frac{1}{n}\bm{1}(\bm{1}^{T})\right)\bm{c}. (61)

By using the result of Theorem 1, we immediately have the following corollary indicating the MSE formula.

Corollary 1

The mean squared error (MSE)

𝖬𝖲𝖤(t)𝖤[𝒙(t)γ𝟏22]\displaystyle{\sf MSE}(t)\equiv{\sf E}[\|\bm{x}(t)-\gamma\bm{1}\|_{2}^{2}] (62)

is given by

𝖬𝖲𝖤(t)\displaystyle{\sf MSE}(t) =α2t+α22i=2n1e2λitλi+tr(𝑸(t)𝑸(t)T).\displaystyle=\alpha^{2}t+\frac{\alpha^{2}}{2}\sum_{i=2}^{n}\frac{1-e^{-2\lambda_{i}t}}{\lambda_{i}}+\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T}).

(Proof) We can rewrite 𝒙(t)\bm{x}(t) as:

𝒙(t)=γ𝟏+𝑸(t)𝒄+𝒘,\displaystyle\bm{x}(t)=\gamma\bm{1}+\bm{Q}(t)\bm{c}+\bm{w}, (63)

where 𝒘𝒩(𝟎,𝚺(t))\bm{w}\sim{\cal N}(\bm{0},\bm{\Sigma}(t)), and 𝒘\bm{w} and 𝒄\bm{c} are independent. We thus have

𝖬𝖲𝖤(t)\displaystyle{\sf MSE}(t) =tr(𝚺(t))+tr(𝑸(t)𝑸(t)T)\displaystyle=\mbox{tr}(\bm{\Sigma}(t))+\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T})
=α2t+α22i=2n1e2λitλi+tr(𝑸(t)𝑸(t)T)\displaystyle=\alpha^{2}t+\frac{\alpha^{2}}{2}\sum_{i=2}^{n}\frac{1-e^{-2\lambda_{i}t}}{\lambda_{i}}+\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T}) (64)

due to Theorem 1.

Since the value of the term tr(𝑸(t)𝑸(t)T)\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T}) is exponentially decreasing with tt, tr(𝚺(t))\mbox{tr}(\bm{\Sigma}(t)) is dominant in 𝖬𝖲𝖤(t){\sf MSE}(t) for sufficiently large tt. For sufficiently large tt, the MSE is well approximated by the asymptotic MSE (AMSE) as

𝖬𝖲𝖤(t)𝖠𝖬𝖲𝖤(t)α2t+α22i=2n1λi\displaystyle{\sf MSE}(t)\simeq{\sf AMSE}(t)\equiv\alpha^{2}t+\frac{\alpha^{2}}{2}\sum_{i=2}^{n}\frac{1}{\lambda_{i}} (65)

because tr(𝑸(t)𝑸(t)T)\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T}) is negligible, and 1e2λit1-e^{-2\lambda_{i}t} can be well approximated to 11. We can observe that the sum of inverse eigenvalue i=2n(1/λi)\sum_{i=2}^{n}({1}/{\lambda_{i}}) of the Laplacian matrix determines the intercept of the 𝖠𝖬𝖲𝖤(t){\sf AMSE}(t). In other words, the graph topology influences the stochastic error behavior through the sum of inverse eigenvalues of the Laplacian matrix.

Figure 3 presents a comparison of 𝖬𝖲𝖤(t){\sf MSE}(t) evaluated by the EM method (25) and the formula in (64). In this experiment, the cycle graph with 10 nodes is used. The values of 𝖠𝖬𝖲𝖤(t){\sf AMSE}(t) are also included in Fig. 3. We can see that the theoretical values of 𝖬𝖲𝖤(t){\sf MSE}(t) and estimated values by the EM method are quite close.

Refer to caption
Figure 3: Comparison of MSE: The label Euler-Maruyama represents 𝖬𝖲𝖤(t){\sf MSE}(t) estimated by using samples generated by the EM method. Theoretical 𝖬𝖲𝖤(t){\sf MSE}(t) represents the values evaluated by (64).Theoretical 𝖠𝖬𝖲𝖤(t){\sf AMSE}(t) represents the values of 𝖠𝖬𝖲𝖤(t){\sf AMSE}(t). Cycle graph with 10 nodes with 𝒅=(2,2,,2)\bm{d}=(2,2,\ldots,2) are used. The parameter setting is as follows: N=250N=250, T=10T=10, α=0.2\alpha=0.2. 50005000 samples are generated by the EM method for estimating 𝖬𝖲𝖤(t){\sf MSE}(t).

V Minimization of mean squared error

V-A Optimization Problems A and B

In the previous section, we demonstrated that the MSE can be expressed in closed-form. It is natural to optimize the edge weights {μij}\{\mu_{ij}\} in order to decrease the value of the MSE. The optimization of the edge weights is equivalent to the optimization of the Laplacian matrix 𝑳\bm{L}. There exist several related works that aim to achieve a similar goal for noise-free systems. For example, Xiao and Boyd [5] proposed a method to minimize the second eigenvalue to achieve the fastest convergence to the average. They formulated the optimization problem as a convex optimization problem, which can be solved efficiently. Kishida et al. [13] presented a deep unfolding-based method for optimizing time-dependent edge weights, yet these methods are not applicable to systems with noise. Optimizing the MSE may be a non-trivial task as it involves the sum of the inverse eigenvalues of the Laplacian matrix.

In this subsection, we will present two optimization problems of edge weights.

V-A1 Optimization problem A

Assume that a degree sequence 𝒅+\bm{d}\in\mathbb{R}^{+} is given in advance. The optimization problem A is the minimization problem of 𝖬𝖲𝖤(t){\sf MSE}(t^{*}) under the given degree sequence where tt^{*} is the predetermined target time given in advance. The precise formulation of the problem is given as follows:

minimize 𝖬𝖲𝖤(t)\displaystyle\mbox{minimize }{\sf MSE}(t^{*})
subject to:
𝑳\displaystyle\bm{L} ={Lij}n×n\displaystyle=\{L_{ij}\}\in\mathbb{R}^{n\times n} (66)
𝑳\displaystyle\bm{L} =𝑳T\displaystyle=\bm{L}^{T} (67)
𝑳\displaystyle\bm{L} 𝟏=𝟎\displaystyle\bm{1}=\bm{0} (68)
diag(𝑳)𝒅2<θ\displaystyle\|\mbox{diag}(\bm{L})-\bm{d}\|_{2}<\theta (69)
Lij\displaystyle L_{ij} =0,(i,j)E.\displaystyle=0,\ (i,j)\notin E. (70)

The constraint (67) is imposed for the symmetry of the edge weight μij=μji\mu_{ij}=\mu_{ji} for (i,j)E(i,j)\in E. The row sum constraint (68) is needed for satisfying (8). The constraint (69) means that 𝑳\bm{L} should be close enough to the given degree sequence. The positive constant θ\theta can be seen as a tolerance parameter.

One way to interpret the optimization problem A is to consider the graph GG representing the wireless connection between terminals i[n]i\in[n]. The degree sequence 𝒅=(d1,d2,,dn)\bm{d}=(d_{1},d_{2},\ldots,d_{n}) can be seen as an allocated receive total wireless power, i.e., the terminal ii can receive the neighbouring signals up to the total power did_{i}. If an average consensus protocol is used in such a wireless network for specific applications, it is desirable to optimize the 𝖬𝖲𝖤(t){\sf MSE}(t^{*}) while satisfying the power constraint.

V-A2 Optimization problem B

Assume that a real constant D+D\in\mathbb{R}^{+} is given in advance. The optimization problem B is the minimization problem of 𝖬𝖲𝖤(t){\sf MSE}(t^{*}) under the situation that the diagonal sum of the Laplacian matrix 𝑳\bm{L} is equal to DD. The formulation is given as follows:

minimize 𝖬𝖲𝖤(t)\displaystyle\mbox{minimize }{\sf MSE}(t^{*})
subject to:
𝑳\displaystyle\bm{L} ={Lij}n×n\displaystyle=\{L_{ij}\}\in\mathbb{R}^{n\times n} (71)
𝑳\displaystyle\bm{L} =𝑳T\displaystyle=\bm{L}^{T} (72)
𝑳\displaystyle\bm{L} 𝟏=𝟎\displaystyle\bm{1}=\bm{0} (73)
|i=1nLiiD|<θ\displaystyle\left|\sum_{i=1}^{n}L_{ii}-D\right|<\theta (74)
Lij\displaystyle L_{ij} =0,(i,j)E.\displaystyle=0,\ (i,j)\notin E. (75)

Following the interpretation above, the power allocation is also optimized in this problem.

V-B Minimization based on deep-unfolded EM method

Advances in deep neural networks have had a strong impact on the design of algorithms for communications and signal processing [14, 15, 16]. Deep unfolding can be seen as a very effective way to improve the convergence of iterative algorithms. Gregor and LeCun introduced the Learned ISTA (LISTA) [21]. Borgerding et al. also proposed variants of AMP and VAMP with trainable capability [19][20]. Trainable ISTA(TISTA) [23] is another trainable sparse signal recovery algorithm with fast convergence. TISTA requires only a small number of trainable parameters, which provides a fast and stable training process. Another advantage of deep unfolding is that it has a relatively high interpretability of learning results.

The concept behind deep unfolding is rather simple. We can embed trainable parameters into the original iterative algorithm and then unfold the signal-flow graph of the original algorithm. The standard supervised training techniques used in deep learning, such as Stochastic Gradient Descent (SGD) and back propagation, can then be applied to the unfolded signal-flow graph to optimize the trainable parameters.

The combination of deep unfolding and the differential equation solvers [24] is a current area of active research in scientific machine learning. It should be noted, however, that the technique is not limited to applications within scientific machine learning. In this subsection, we introduce an optimization algorithm that is based on the deep-unfolded EM method. The central idea is to use a loss function that approximates 𝖬𝖲𝖤(t){\sf MSE}(t^{*}). By using a stochastic gradient descent approach with this loss function, we can obtain a near-optimal solution for both optimization problems A and B. The proposed method can be easily implemented using any modern neural network framework that includes an automatic differentiation mechanism. The following subsections will provide a more detailed explanation of the proposed method.

V-B1 Mini-batch for optimization

In an optimization process described below, a number of mini-batches are randomly generated. A mini-batch consists of

{(𝒄1,γ1),(𝒄2,γ2),(𝒄K,γK)}.\displaystyle{\cal M}\equiv\{(\bm{c}_{1},\gamma_{1}),(\bm{c}_{2},\gamma_{2}),\ldots(\bm{c}_{K},\gamma_{K})\}. (76)

The size parameter KK is called the mini-batch size. The initial value vector 𝒄i\bm{c}_{i} follows Gaussian distribution, i.e., 𝒄i𝒩(𝟎,𝑰)(i[n])\bm{c}_{i}\sim{\cal N}(\bm{0},\bm{I})(i\in[n]). The corresponding average value are obtained by γi(1/n)𝒄iT𝑰\gamma_{i}\equiv(1/n)\bm{c}_{i}^{T}\bm{I}.

V-B2 Loss function for Optimization problem A

The loss function corresponding to a mini-batch {\cal M} is given by

E(𝑳)1Ki=1K𝝌(𝒄i)γi𝟏22+PA(𝑳),\displaystyle E_{\cal M}(\bm{L})\equiv\frac{1}{K}\sum_{i=1}^{K}\|\bm{\chi}(\bm{c}_{i})-\gamma_{i}\bm{1}\|^{2}_{2}+P_{A}(\bm{L}), (77)

where 𝝌(𝒄i)𝒙(N)\bm{\chi}(\bm{c}_{i})\equiv\bm{x}^{(N)} is the random variable given by the Euler-Maruyama recursion:

𝒙(k+1)=𝒙(k)η𝑳𝒙(k)+αη𝒛(k),k=0,1,2,,N,\displaystyle\bm{x}^{(k+1)}=\bm{x}^{(k)}-\eta\bm{L}\bm{x}^{(k)}+\alpha\sqrt{\eta}\bm{z}^{(k)},\ k=0,1,2,\ldots,N, (78)

with 𝒙(0)=𝒄i\bm{x}^{(0)}=\bm{c}_{i}. The first term of the loss function can be regarded as an approximation of 𝖬𝖲𝖤(t){\sf MSE}(t^{*}):

1Ki=1K𝝌(𝒄i)γi𝟏22𝖬𝖲𝖤(t)\displaystyle\frac{1}{K}\sum_{i=1}^{K}\|\bm{\chi}(\bm{c}_{i})-\gamma_{i}\bm{1}\|^{2}_{2}\simeq{\sf MSE}(t^{*}) (79)

for sufficiently large KK and T=tT=t^{*}.

The function PA(𝑳)P_{A}(\bm{L}) is a penalty function corresponding to the constraints (67)–(70) defined by

PA(𝑳)\displaystyle P_{A}(\bm{L}) ρ1𝑳𝑳TF2+ρ2𝑳𝟏22+ρ3diag(𝑳)𝒅22\displaystyle\equiv\rho_{1}\|\bm{L}-\bm{L}^{T}\|_{F}^{2}+\rho_{2}\|\bm{L}\bm{1}\|_{2}^{2}+\rho_{3}\|\mbox{diag}(\bm{L})-\bm{d}\|_{2}^{2}
+ρ4𝑳\displaystyle+\rho_{4}\|\bm{L} 𝑴F2,\displaystyle\odot\bm{M}\|_{F}^{2}, (80)

where 𝑴={Mij}\bm{M}=\{M_{ij}\} is the mask matrix defined by

Mij{1,(i,j)E0,otherwise.\displaystyle M_{ij}\equiv\left\{\begin{array}[]{cc}1,&(i,j)\notin E\\ 0,&\mbox{otherwise}.\end{array}\right. (83)

The operator \odot represents the Hadamard matrix product. The positive constants ρi(i[4])\rho_{i}(i\in[4]) controls relative strength of each penalty term. The first term of the penalty function corresponds to the symmetric constraint (67). The term 𝑳𝟏22\|\bm{L}\bm{1}\|_{2}^{2} is the penalty term for the row sum constraint (68). The third term diag(𝑳)𝒅22\|\mbox{diag}(\bm{L})-\bm{d}\|_{2}^{2} is included for the degree constraint. The last term 𝑳𝑴F2\|\bm{L}\odot\bm{M}\|_{F}^{2} enforces LijL_{ij} to be very small if (i,j)E(i,j)\notin E.

Due to these penalty terms in PA(𝑳)P_{A}(\bm{L}), the violations on the constraints (67)–(70) are suppressed in an optimization process.

V-B3 Loss function for Optimization problem B

For Optimization problem B, we use almost the same same loss function:

E(𝑳)1Ki=1K𝝌(𝒄i)γi𝟏22+PB(𝑳).\displaystyle E_{\cal M}(\bm{L})\equiv\frac{1}{K}\sum_{i=1}^{K}\|\bm{\chi}(\bm{c}_{i})-\gamma_{i}\bm{1}\|^{2}_{2}+P_{B}(\bm{L}). (84)

In this case, we use the penalty function matched to the feasible conditions of Optimization problem B:

PB(𝑳)\displaystyle P_{B}(\bm{L}) ρ1𝑳𝑳TF2+ρ2𝑳𝟏22+ρ3(i=1nLiiD)2\displaystyle\equiv\rho_{1}\|\bm{L}-\bm{L}^{T}\|_{F}^{2}+\rho_{2}\|\bm{L}\bm{1}\|_{2}^{2}+\rho_{3}\left(\sum_{i=1}^{n}L_{ii}-D\right)^{2}
+ρ4𝑳\displaystyle+\rho_{4}\|\bm{L} 𝑴F2.\displaystyle\odot\bm{M}\|_{F}^{2}. (85)

The third term of PB(𝑳)P_{B}(\bm{L}) corresponds to the diagonal sum condition (74).

V-B4 Optimization process

The optimization process is summarized in Algorithm 1. This optimization algorithm is mainly based on the Deep-unfolded Euler-Maruyama (DU-EM) method for approximating 𝖬𝖲𝖤(t){\sf MSE}(t^{*}). The initial value of the matrix 𝑳\bm{L} is assumed to be the n×nn\times n zero matrix 𝑶n×n\bm{O}^{n\times n}. The main loop can be regarded as a stochastic gradient descent method minimizing the loss values. The update of 𝑳\bm{L} (line 5) can be done by any optimizer such as the Adam optimizer. The gradient of the loss function (line 4) can be easily evaluated by using an automatic differentiation mechanism included in recent neural network frameworks such as TensorFlow, PyTorch, Jax, and Flux.jl with Julia. The block diagram of the Algorithm 1 is shown in Fig. 4.

Algorithm 1 Optimization process using DU-EM method
0:  graph GG, tolerance θ\theta, degree sequence 𝒅\bm{d} or degree sum DD
0:  Laplacian matrix 𝑳out\bm{L}_{out}
1:  Let 𝑳𝑶n×n\bm{L}\equiv\bm{O}^{n\times n}
2:  for i=1i=1 to II do
3:     Generate a mini-batch {\cal M} randomly.
4:     Compute the gradient of the loss function
𝒈E(𝑳)\displaystyle\bm{g}\equiv\nabla E_{\cal M}(\bm{L})
5:     The matrix 𝑳\bm{L} is updated by using 𝒈\bm{g}.
6:  end for
7:  𝑳outroundθ,(𝑳)\bm{L}_{out}\equiv\mbox{round}_{\theta,*}(\bm{L})
Refer to caption
Figure 4: Block diagram of optimization process in Algorithm 1. The core of the algorithm is the DU-EM method for approximating 𝖬𝖲𝖤(t){\sf MSE}(t^{*}). Several standard deep learning techniques such as back propagation and stochastic gradient descent can be applied to update the trainable matrix 𝑳\bm{L}.

The stochastic optimization process outlined in Algorithm 1 is unable to guarantee that the obtained solution will be strictly feasible. To ensure feasibility, it is necessary to search for a feasible solution that is near the result obtained by optimization. This is accomplished by using the round function roundθ,()\mbox{round}_{\theta,*}(\cdot) at line 7 of Algorithm 1.

The specific details for the round function used for optimization problem A are outlined in Algorithm 2. The first step in the algorithm, 𝑳(𝑳in+𝑳inT)/2\bm{L}\equiv(\bm{L}_{in}+\bm{L}_{in}^{T})/2, ensures that the resulting matrix is symmetric. The nested loop from line 2 to line 7 is used to enforce the degree constraint and the constraint Lij=0(i,j)EL_{ij}=0\ (i,j)\notin E. The single loop from line 9 to line 11 is implemented to satisfy the constraint 𝑳𝟏=𝟎\bm{L}\bm{1}=\bm{0}. The output of the round function roundθ,𝒅()\mbox{round}_{\theta,\bm{d}}(\cdot) guarantees that the constraints (67)-(70) of optimization problem A are strictly satisfied. A similar round function can be constructed for optimization problem B, which is presented in Algorithm 3.

Algorithm 2 Round function roundθ,𝒅()\mbox{round}_{\theta,\bm{d}}(\cdot) for Opt. prob. A
0:  Matrix 𝑳in\bm{L}_{in}, degree sequence 𝒅\bm{d}, threshold value θ\theta
0:  Laplacian matrix 𝑳out\bm{L}_{out} satisfying (67)–(70)
1:  Let 𝑳(𝑳in+𝑳inT)/2\bm{L}\equiv(\bm{L}_{in}+\bm{L}_{in}^{T})/2
2:  for i=1i=1 to nn do
3:     LiidiL_{ii}\equiv d_{i}
4:     for j=1j=1 to nn do
5:        If (i,j)E(i,j)\notin E, then let Lij0L_{ij}\equiv 0
6:     end for
7:  end for
8:  ϵ=(ϵ1,,ϵn)T𝑳𝟏\bm{\epsilon}=(\epsilon_{1},\ldots,\epsilon_{n})^{T}\equiv\bm{L}\bm{1}
9:  for i=1i=1 to nn do
10:     Let LiiLiiϵiL_{ii}\equiv L_{ii}-\epsilon_{i}
11:  end for
12:  if diag(𝑳)𝒅2θ\|\mbox{diag}(\bm{L})-\bm{d}\|_{2}\geq\theta then
13:     Quit with declaration “optimization failed”
14:  end if
15:  Output 𝑳out𝑳\bm{L}_{out}\equiv\bm{L}

Algorithm 3 Round function roundθ,D()\mbox{round}_{\theta,D}(\cdot) for Opt. prob. B
0:  Matrix 𝑳in\bm{L}_{in}, degree sum DD, threshold value θ\theta
0:  Laplacian matrix 𝑳out\bm{L}_{out}
1:  Let 𝑳(𝑳in+𝑳inT)/2\bm{L}\equiv(\bm{L}_{in}+\bm{L}_{in}^{T})/2
2:  for i=1i=1 to nn do
3:     for j=1j=1 to nn do
4:        If (i,j)E(i,j)\notin E, then let Lij0L_{ij}\equiv 0
5:     end for
6:  end for
7:  ϵ=(ϵ1,,ϵn)T𝑳𝟏\bm{\epsilon}=(\epsilon_{1},\ldots,\epsilon_{n})^{T}\equiv\bm{L}\bm{1}
8:  for i=1i=1 to nn do
9:     Let LiiLiiϵiL_{ii}\equiv L_{ii}-\epsilon_{i}
10:  end for
11:  if |i=1nLiiD|θ\left|\sum_{i=1}^{n}L_{ii}-D\right|\geq\theta then
12:     Quit with declaration “optimization failed”
13:  end if
14:  Output 𝑳out𝑳\bm{L}_{out}\equiv\bm{L}

VI Numerical results

VI-A Choice of Number of bins for EM-method

In the previous sections, we proposed a DU-based optimization method. This section presents results of numerical experiments. For these experiments, we used the automatic differentiation mechanism provided by Flux.jl [25] on Julia Language [26].

Before discussing the optimization of MSE, we first examine the choice number of bins, NN. Small NN is beneficial for computational efficiency but it may lead to inaccurate estimation of MSE. In this subsection, we will compare the Monte carlo estimates of MSE estimated by the EM-method.

The Karate graph is a well-known graph of a small social network. It represents the relationships between 34 members of a karate club at a university. The graph consists of 34 nodes, which represent the members of the club, and 78 edges, which represent the relationships between the members.

Figure 5 compares three cases, i.e., N=100,250,1000N=100,250,1000. No visible difference can be observed in the range from T=0T=0 to T=5T=5. In the following experiments, we will use N=250N=250 for EM-method.

Refer to caption
Figure 5: MSE values estimated by Monte Carlo method based on the EM method. Karate graph (n=34n=34) and its unweighted Laplacian is assumed.

VI-B Petersen graph (Optimization problem A)

Petersen graph is a 3-regular graph with n=10n=10 nodes (Fig.6(a)). In this subsection, we will examine the behavior of our optimization algorithm of 𝖬𝖲𝖤(t){\sf MSE}(t) for Petersen graph.

Refer to caption
Figure 6: Small graphs: (a) Cycle graph, (b) Petersen graph, (c) House graph.

An adjacency matrix 𝑨{Aij}n×n\bm{A}\equiv\{A_{ij}\}\in\mathbb{R}^{n\times n} of a graph G(V,E)G\equiv(V,E) is defined by

Aij{1,(i,j)E0,otherwise.\displaystyle A_{ij}\equiv\left\{\begin{array}[]{cc}1,&(i,j)\in E\\ 0,&\mbox{otherwise}.\end{array}\right. (88)

An unweighted Laplacian matrix 𝑳\bm{L} is defined by

𝑳𝑫𝑨,\displaystyle\bm{L}\equiv\bm{D}-\bm{A}, (89)

The degree matrix 𝑫={Dij}\bm{D}=\{D_{ij}\} is a diagonal matrix where DiiD_{ii} is the degree of the node ii. Namely, an unweighted Laplacian corresponds the case where μij=μji=1\mu_{ij}=\mu_{ji}=1 for any (i,j)E(i,j)\in E.

In the following discussion, let 𝑳P\bm{L}_{P} be the unweighted Laplacian matrix of Petersen graph. We assume Optimization problem A with the degree sequence 𝒅diag(𝑳P)=(3,3,,3)\bm{d}\equiv\mbox{diag}(\bm{L}_{P})=(3,3,\ldots,3).

The parameter setting is as follows. The mini-batch size is set to K=25K=25. The noise intensity is α=0.3\alpha=0.3. The penalty coefficients are ρ1=ρ2=ρ3=ρ4=10\rho_{1}=\rho_{2}=\rho_{3}=\rho_{4}=10. For time discretization, we use T=4,N=250T=4,N=250. The number of iterations for an optimization process is set to 3000. The tolerance parameter is set to θ=0.1\theta=0.1. In the optimization process, we used the Adam optimizer with a learning rate of 0.01.

The loss values of an optimization process of Algorithm 1 are presented in Fig.7. In the initial stages of the optimization process, the loss value is relatively high since the initial 𝑳\bm{L} is set to the zero matrix, which means that the system cannot achieve average consensus. The loss value decreases monotonically until around iteration 700, after which it fluctuates within a range of 700k3000700\leq k\leq 3000. The graph shows that the matrix 𝑳\bm{L} in Algorithm 1 is being updated appropriately and that the loss value, which approximates 𝖬𝖲𝖤(t){\sf MSE}(t), is decreasing.

Refer to caption
Figure 7: Loss values in an optimization process: Optimization prob. A for Petersen graph

Let us denote the Laplacian matrix obtained by the optimization process as 𝑳\bm{L}^{*}. Table I summarizes several important quantities regarding 𝑳\bm{L}^{*}. The top 4 rows of Table I indicate that 𝑳\bm{L}^{*} is certainly a feasible solution satisfying (67)–(70) because we set θ=0.1\theta=0.1. This numerical results confirms that the round function roundθ,𝒅()\mbox{round}_{\theta,\bm{d}}(\cdot) works appropriately. The last row of Table I shows that 𝑳\bm{L}^{*} is very close to the unweighted Laplacian matrix 𝑳P\bm{L}_{P}. Since Petersen graph is regular and has high symmetry, it is conjectured that 𝑳P\bm{L}_{P} is the optimal solution for Optimization problem A. Thus, the closeness between 𝑳P\bm{L}_{P} and 𝑳\bm{L}^{*} can be seen as a convincing result.

TABLE I: Several quantities on optimization result 𝑳\bm{L}^{*}
𝑳𝑳TF\|\bm{L}^{*}-\bm{L}^{*T}\|_{F} 0
𝑳𝟏2\|\bm{L}^{*}\bm{1}\|_{2} 0
𝑳MF\|\bm{L}^{*}\odot M\|_{F} 0
diag(𝑳)𝒅2\|\mbox{diag}(\bm{L}^{*})-\bm{d}\|_{2} 5.74×1035.74\times 10^{-3}
𝑳P𝑳F\|\bm{L}_{P}-\bm{L}^{*}\|_{F} 0.188

The MSE values of the optimization result 𝑳\bm{L}^{*} and the unweighted Laplacian matrix 𝑳P\bm{L}_{P} are presented in Fig.8. These values are evaluated by the MSE formula (64). No visible difference can be seen between two curves. This means that Algorithm 1 successfully found a good solution for Optimization problem A in this case.

Refer to caption
Figure 8: Petersen graph: MSE values of the optimization result 𝑳\bm{L}^{*} and the unweighted Laplacian matrix 𝑳P\bm{L}_{P}.

VI-C Karate graph (Optimization problem A)

We here consider Optimization problem A on the Karate graph. Let 𝑳K\bm{L}_{K} be the unweighted Laplacian matrix of the Karate graph. The target degree sequence is set to 𝒅diag(𝑳)=(16,9,10,,12,17).\bm{d}\equiv\mbox{diag}(\bm{L})=(16,9,10,\ldots,12,17). The parameter setting for an optimization process is given as follows. The mini-batch size is set to K=50K=50. The noise intensity is set to α=0.3\alpha=0.3. The penalty coefficients are ρ1=ρ2=ρ3=ρ4=10\rho_{1}=\rho_{2}=\rho_{3}=\rho_{4}=10. We use T=2,N=250T=2,N=250 for DU-EM method. The number of iterations for an optimization process is set to 5000. The tolerance is set to θ=0.1\theta=0.1. In the optimization process, we used the Adam optimizer with learning rate 0.01.

Assume that 𝑳\bm{L}^{*} is the Laplacian matrix obtained by an optimization process. The matrix 𝑳\bm{L}^{*} is a feasible solution satisfying all the constraints (67)–(70). For example, we have diag(𝑳)𝒅2=0.0894<0.1\|\mbox{diag}(\bm{L}^{*})-\bm{d}\|_{2}=0.0894<0.1. Figure 9 presents the absolute values of non-diagonal elements in 𝑳K\bm{L}_{K} and 𝑳\bm{L}^{*}. According to its definition, the absolute value of a non-diagonal element of 𝑳K\bm{L}_{K} take the value one (left panel). On the other hand, we can observe that non-diagonal elements of 𝑳\bm{L}^{*} takes the absolute values in the range 0 to 1.51.5.

Refer to caption
Figure 9: Absolute values of non-diagonal elements in 𝑳K\bm{L}_{K} and 𝑳\bm{L}^{*}: (left panel) Laplacian matrix 𝑳K{\bm{L}}_{K} of the Karate graph, (right panel) The Laplacian matrix 𝑳\bm{L}^{*} obtained by an optimization process.

We present the MSE values of the optimization result 𝑳\bm{L}^{*} and the unweighted Laplacian matrix 𝑳K\bm{L}_{K} in Fig.10. These values are evaluated by the MSE formula (64). It can be seen that the optimized Laplacian 𝑳\bm{L}^{*} provides smaller MSE values. In this case, appropriate assignment of weights μij\mu_{ij} improves the noise immunity of the system. The inverse eigenvalue sums of the Laplacian matrices 𝑳K\bm{L}_{K} and 𝑳\bm{L}^{*} are 13.83 and 13.41, respectively. In this case, the optimization process of Algorithm 1 can successfully provide a feasible Laplacian matrix with smaller inverse eigenvalue sum. As shown in (65), the inverse eigenvalue sum determines the behavior of 𝖬𝖲𝖤(t){\sf MSE}(t).

Refer to caption
Figure 10: Karate graph: MSE values of 𝑳\bm{L}^{*} (𝒅=diag(𝑳K)\bm{d}=\mbox{diag}(\bm{L}_{K})) and the unweighted Laplacian matrix 𝑳K\bm{L}_{K}.

VI-D House graph (Optimization problem B)

The house graph (Fig.6(c)) is a small irregular graph with 5 nodes defined by the adjacency matrix:

𝑨=(0110010010100110110100110).\displaystyle\bm{A}=\begin{pmatrix}0&1&1&0&0\\ 1&0&0&1&0\\ 1&0&0&1&1\\ 0&1&1&0&1\\ 0&0&1&1&0\\ \end{pmatrix}. (90)

We thus have the unweighted Laplacian 𝑳H\bm{L}_{H} of the house graph as

𝑳H=(2110012010103110113100112),\displaystyle\bm{L}_{H}=\begin{pmatrix}2&-1&-1&0&0\\ -1&2&0&-1&0\\ -1&0&3&-1&-1\\ 0&-1&-1&3&-1\\ 0&0&-1&-1&2\\ \end{pmatrix}, (91)

where the diagonal sum of 𝑳H\bm{L}_{H} is 12.

We made two optimizations for D=12D=12 and D=20D=20. The parameter setting is almost the same as the one used in the previous subsection. Only the difference is to use ρ3=0.1\rho_{3}=0.1 as the diagonal sum penalty constant. As results of the optimization processes, we have two Laplacian matrices 𝑳12\bm{L}^{*}_{12} (D=12)(D=12) and 𝑳24\bm{L}^{*}_{24} (D=24)(D=24) as follows:

𝑳12=(2.291.051.23001.052.2901.2401.2302.700.441.0301.240.442.711.03001.031.032.06)\displaystyle\bm{L}^{*}_{12}=\begin{pmatrix}2.29&-1.05&-1.23&0&0\\ -1.05&2.29&0&-1.24&0\\ -1.23&0&2.70&-0.44&-1.03\\ 0&-1.24&-0.44&2.71&-1.03\\ 0&0&-1.03&-1.03&2.06\\ \end{pmatrix} (92)
𝑳24=(4.802.702.09002.704.7902.0802.0904.810.372.3502.080.374.852.40002.352.404.74)\displaystyle\bm{L}^{*}_{24}=\begin{pmatrix}4.80&-2.70&-2.09&0&0\\ -2.70&4.79&0&-2.08&0\\ -2.09&0&4.81&-0.37&-2.35\\ 0&-2.08&-0.37&4.85&-2.40\\ 0&0&-2.35&-2.40&4.74\\ \end{pmatrix} (93)

The diagonal sums of 𝑳12\bm{L}^{*}_{12} and 𝑳24\bm{L}^{*}_{24} are 12.04 and 23.99, respectively. Compared with 𝑳12\bm{L}^{*}_{12} with 𝑳H\bm{L}_{H}, the diagonal elements of 𝑳12\bm{L}^{*}_{12} are more flat:

diag(𝑳12)\displaystyle\mbox{diag}(\bm{L}^{*}_{12}) =(2.29,2.29,2.70,2.71,2.06)T,\displaystyle=(2.29,2.29,2.70,2.71,2.06)^{T}, (94)
diag(𝑳H)\displaystyle\mbox{diag}(\bm{L}_{H}) =(2,2,3,3,2)T.\displaystyle=(2,2,3,3,2)^{T}. (95)

The MSE values of the optimization result 𝑳12,𝑳24\bm{L}^{*}_{12},\bm{L}^{*}_{24} and the unweighted Laplacian matrix 𝑳H\bm{L}_{H} are shown in Fig.11. We can observe that 𝑳12\bm{L}_{12}^{*} achieves slightly smaller MSE values compared with the unweighted Laplacian matrix 𝑳H\bm{L}_{H}. The Laplacian matrix 𝑳24\bm{L}^{*}_{24} provides much smaller MSE values than those of 𝑳H\bm{L}_{H}. The sums of inverse eigenvalues are 1.64,1.59,0.821.64,1.59,0.82 for 𝑳H\bm{L}_{H}, 𝑳12\bm{L}_{12}^{*}, and 𝑳24\bm{L}_{24}^{*}, respectively.

Refer to caption
Figure 11: House graph: MSE values of 𝑳12,𝑳20\bm{L}^{*}_{12},\bm{L}^{*}_{20} and the unweighted Laplacian matrix 𝑳H\bm{L}_{H}.

VI-E Barabási-Albert (BA) random graphs (Optimization problem B)

As an example of random scale-free networks, we here handle Barabási-Albert random graph which use a preferential attachment mechanism. The number of edges between a new node to existing nodes is assumed to be 5.

In this experiment, we generated an instance of Barabási-Albert random graph with 50 nodes. The unweighted Laplacian of the instance is denoted by 𝑳B\bm{L}_{B}. The sum of the diagonal elements of 𝑳B\bm{L}_{B} is 450450. The parameter setting for optimization is the same as the one used in the previous subsection except for D=450D=450. The output of the optimization algorithm is referred to as 𝑳\bm{L}^{*}.

Figure 12 presents the MSE values of the original unweighted Laplacian 𝑳B\bm{L}_{B} and the optimization output 𝑳\bm{L}^{*}. We can observe that the optimized MSE values are substantially smaller than those of the unweighted Laplacian 𝑳B\bm{L}_{B}. The sums of inverse eigenvalues for 𝑳\bm{L}^{*} and 𝑳B\bm{L}_{B} are 6.446.44 and 7.167.16, respectively.

Figure 13 illustrates the values of diagonal elements of 𝑳\bm{L}^{*} and 𝑳B\bm{L}_{B}. It can be observed that the values distribution of 𝑳\bm{L}^{*} is almost flat although the values of 𝑳B\bm{L}_{B} varies from 5 to 21. This observation is consistent with the tendency observed in the previous subsection regarding the house graph.

Refer to caption
Figure 12: Barabási-Albert random graph: MSE values of 𝑳\bm{L}^{*} and the unweighted Laplacian matrix 𝑳B\bm{L}_{B}.
Refer to caption
Figure 13: Comparison of diagonal elements of 𝑳\bm{L}^{*} and 𝑳B\bm{L}_{B}.

VII Conclusion

In this paper, we have formulated a noisy average consensus system through a SDE. This formulation allows for an analytical study of the stochastic dynamics of the system. We derived a formula for the evolution of covariance for the EM method. Through the weak convergence property, we have established Theorem 1 and derived a MSE formula that provides the MSE at time tt. Analysis of the MSE formula reveals that the sum of inverse eigenvalues for the Laplacian matrix is the most significant factor impacting the MSE dynamics. To optimize the edge weights, a deep unfolding-based technique is presented. The quality of the solution has been validated by numerical experiments.

It is important to note that the theoretical understanding gained in this study will also provide valuable perspective on consensus-based distributed algorithms in noisy environments. In addition, the methodology for optimization proposed in this paper is versatile and can be adapted for various algorithms operating on graphs. The exploration of potential applications will be an open area for further studies.

Acknowledgement

This study was supported by JSPS Grant-in-Aid for Scientific Research (A) Grant Number 22H00514. The authors thank Prof. Masaki Ogura for letting us know the related work [11] on discrete-time average consensus systems.

References

  • [1] T. Wadayama and A. Nakai-Kasai, “Continuous-time noisy average consensus system as Gaussian multiple access channel,” IEEE International Symposium on Information Theory, (ISIT) 2022.
  • [2] R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents with switching topology and time-delays,” IEEE Trans. Automat. Contr., vol. 49, no. 9, pp. 1520–1533, Sept. 2004.
  • [3] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007.
  • [4] W. Reb and R. W. Beard, “Consensus seeking in multi-agent systems under dynamically changing interaction topologies,” IEEE Transactions on Automatic Control, vol. 50, no. 5, pp. 655–661, 2005.
  • [5] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” Systems and Control Letters, vol. 53, pp. 65–78, 2004.
  • [6] W. Ren, “Consensus strategies for cooperative control of vehicle formations,” IET Control Theory and Applications, vol.2, pp. 505–512, 2007.
  • [7] P. E. Kloeden and E. Platen, “Numerical solution of stochastic differential equations,” Springer-Verlag, 1991.
  • [8] B. Oksendal, “Stochastic differential equations: an introduction with applications,” Springer, 2010.
  • [9] C. Godsil and G. F. Royle, “Algebraic graph theory,” Springer, 2001.
  • [10] F. Chung, “Spectral graph theory,” American Mathematical Society, 1997.
  • [11] A. Jadbabaie and A. Olshevsky, “On performance of consensus protocols subject to noise: Role of hitting times and network structure,” IEEE 55th Conference on Decision and Control (CDC), pp. 179-184, 2016.
  • [12] R. Rajagopal and M. J. Wainwright, “Network-based consensus averaging with general noisy channels,” IEEE Trans. Signal Process., vol. 59, no. 1, pp. 373–385, Jan. 2011.
  • [13] M. Kishida, M. Ogura, Y. Yoshida, and T. Wadayama, “Deep learning-based average consensus,” IEEE Access, vol. 8, pp. 142404 - 142412, 2020.
  • [14] B. Aazhang, B. P. Paris and G. C. Orsak, “Neural networks for multiuser detection in code-division multiple-access communications,” IEEE Trans. Comm., vol. 40, no. 7, pp. 1212-1222, Jul. 1992.
  • [15] E. Nachmani, Y. Beéry and D. Burshtein, “Learning to decode linear codes using deep learning,” 2016 54th Annual Allerton Conf. Comm., Control, and Computing, 2016, pp. 341-346.
  • [16] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cog. Comm. Net., vol. 3, no. 4, pp. 563-575, Dec. 2017.
  • [17] Y. A. LeCun, L. Bottou, G. B. Orr, and K. R. Müller, “Efficient backprop,” in Neural networks: Tricks of the trade, G. B. Orr and K. R. Müller, Eds. Springer-Verlag, London, UK, 1998, pp. 9-50.
  • [18] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533-536, Oct. 1986.
  • [19] M. Borgerding and P. Schniter, “Onsager-corrected deep learning for sparse linear inverse problems,” 2016 IEEE Global Conf. Signal and Inf. Process. (GlobalSIP), Washington, DC, Dec. 2016, pp. 227-231.
  • [20] M. Borgerding, P. Schniter, and S. Rangan, “AMP-inspired deep networks for sparse linear inverse problems, ” IEEE Trans, Sig. Process. vol 65, no. 16, pp. 4293-4308 Aug. 2017.
  • [21] K. Gregor, and Y. LeCun, “Learning fast approximations of sparse coding,” Proc. 27th Int. Conf. Machine Learning, pp. 399-406, 2010.
  • [22] A. Balatsoukas-Stimming and C. Studer, “Deep Unfolding for Communications Systems: A Survey and Some New Directions,” 2019 IEEE International Workshop on Signal Processing Systems (SiPS), pp. 266-271, 2019.
  • [23] D. Ito, S. Takabe, and T. Wadayama, ”Trainable ISTA for sparse signal recovery,” IEEE Transactions on Signal Processing, vol. 67, no. 12, pp. 3113-3125, 2019.
  • [24] C. Rackauckas, Y. Ma, J. Martensen, C. Warner, K. Zubov, R. Supekar, D. Skinner, A. Ramadhan, and A. Edelman, “Universal differential equations for scientific machine learning,” arXiv:2001.04385, 2020.
  • [25] M. Innes, “Flux: Elegant machine learning with Julia,” Journal of Open Source Software, 2018.
  • [26] J. Bezanson, S. Karpinski, B. Viral, and A. Edelman, “Julia: A fast dynamic language for technical computing,” arXiv preprint arXiv:1209.5145, 2012.
  • [27] R. Albert, A.-L. Barabasi, “Statistical mechanics of complex networks,” American Physical Society, Rev. Mod. Phys., vol.74 pp. 47–97, 2002.