Stochastic Dynamics of Noisy Average Consensus: Analysis and Optimization

Tadashi Wadayama and Ayano Nakai-Kasai
Part of this research was presented at the IEEE International Symposium on Information Theory 2022 (ISIT2022) [1]. Nagoya Institute of Technology, Gokiso, Nagoya, Aichi 466-8555, Japan,
wadayama@nitech.ac.jp, nakai.ayano@nitech.ac.jp

Abstract

A continuous-time average consensus system is a linear dynamical system defined over a graph, where each node has its own state value that evolves according to a system of simultaneous linear differential equations. A node is allowed to interact only with its neighboring nodes. Average consensus is the phenomenon in which all the state values converge to the average of the initial state values. In this paper, we assume that a node can communicate with neighboring nodes through an additive white Gaussian noise channel. We first formulate the noisy average consensus system by using a stochastic differential equation (SDE), which allows us to use the Euler-Maruyama method, a numerical technique for solving SDEs. By studying the stochastic behavior of the residual error of the Euler-Maruyama method, we arrive at the covariance evolution equation. The analysis of the residual error leads to a compact formula for the mean squared error (MSE), which shows that the sum of the inverse eigenvalues of the Laplacian matrix is the dominant factor influencing the MSE. Furthermore, we propose optimization problems aimed at minimizing the MSE at a given target time, and introduce a deep unfolding-based optimization method to solve these problems. The quality of the solution is validated by numerical experiments.

I Introduction

A continuous-time average consensus system is a linear dynamical system defined over a graph [3]. Each node has its own state value, which evolves according to a system of simultaneous linear differential equations in which a node interacts only with its neighboring nodes. The ordinary differential equation (ODE) governing the evolution of the state value $x_{i}(t)$ of node $i$ ($1\leq i\leq n$) is given by

\displaystyle\frac{dx_{i}(t)}{dt}=-\sum_{j\in{\cal N}_{i}}\mu_{ij}(x_{i}(t)-x_{j}(t)).

(1)

The set ${\cal N}_{i}$ denotes the neighboring nodes of node $i$, and the positive scalar $\mu_{ij}$ denotes the weight associated with the edge $(i,j)$. The same form of ODE applies to every node. These dynamics gradually decrease the differences between the state values of neighboring nodes, leading to a phenomenon called average consensus, in which all the state values converge to the average of the initial state values [2].

Average consensus systems have been studied in numerous fields such as multi-agent control [4], distributed algorithms [5], and formation control [6]. An excellent survey on average consensus systems can be found in [3].

In this paper, we examine average consensus systems in the context of communication over noisy channels, such as wireless networks. Specifically, we consider scenarios in which nodes, such as drones in flight or sensors dispersed across a designated area, engage in local wireless communication. Each node is assumed to communicate only with its neighboring nodes via an additive white Gaussian noise (AWGN) channel. The objective of the communication is to aggregate the information held by all nodes by means of an average consensus system, whose consensus value is, as stated above, the average of the initial state values.

In this setting, we must account for the impact of Gaussian noise on the differential equations. The differential equation for a noisy average consensus system takes the form:

\displaystyle\frac{dx_{i}(t)}{dt}=-\sum_{j\in{\cal N}_{i}}\mu_{ij}(x_{i}(t)-x_{j}(t))+\alpha W_{i}(t),

(2)

where $W_{i}(t)$ represents an additive white Gaussian process, and $\alpha$ is a positive constant. The noise $W_{i}(t)$ can be regarded as the sum of the noises occurring on the edges adjacent to node $i$. For a noiseless average consensus system, it is well established that the second smallest eigenvalue of the Laplacian matrix of the graph determines the speed of convergence to the average [5]. Because of the edge noise, the convergence behavior of a noisy system may be quite different from that of the noiseless system. Discrete-time consensus protocols subject to additive noise have been studied in [11][12]; however, to the best of our knowledge, there are no prior studies on the stochastic dynamics of continuous-time noisy consensus systems.

The main goal of this paper is to study the stochastic dynamics of continuous-time noisy average consensus systems. A theoretical understanding of the stochastic behavior of such systems will be valuable in areas such as multi-agent control and the design of consensus-based distributed algorithms for noisy environments.

The primary contributions of this paper are as follows. We first formulate the noisy average consensus system using stochastic differential equations (SDEs) [7][8]. This SDE formulation enables a mathematically rigorous treatment of noisy average consensus. We solve the SDE with the Euler-Maruyama method [7], a numerical method for SDEs, and derive a closed-form mean squared error (MSE) formula by analyzing the stochastic behavior of the residual errors of the Euler-Maruyama method. We show that the MSE is dominated by the sum of the inverse eigenvalues of the Laplacian matrix. Because the objective function involves this sum of inverse eigenvalues, minimizing the MSE at a specific target time is a non-trivial task. To solve this optimization problem, we propose a deep unfolding-based optimization method.

The outline of the paper is as follows. In Section II, we introduce the mathematical notation used throughout the paper and then provide the definition and fundamental properties of average consensus systems. In Section III, we define a noisy average consensus system as an SDE. In Section IV, we analyze the stochastic behavior of the consensus error and derive a concise MSE formula. In Section V, we propose a deep unfolding-based optimization method for minimizing the MSE at a specified target time. Finally, Section VI concludes the paper.

II Preliminaries

II-A Notation

The following notation will be used throughout this paper. The symbols $\mathbb{R}$ and $\mathbb{R}_{+}$ represent the set of real numbers and the set of positive real numbers, respectively. The one-dimensional Gaussian distribution with mean $\mu$ and variance $\sigma^{2}$ is denoted by ${\cal N}(\mu,\sigma^{2})$. The multivariate Gaussian distribution with mean vector $\bm{\mu}$ and covariance matrix $\bm{\Sigma}$ is represented by ${\cal N}(\bm{\mu},\bm{\Sigma})$. The expectation operator is denoted by ${\sf E}[\cdot]$. The notation $\mbox{diag}(\bm{x})$ denotes the diagonal matrix whose diagonal elements are given by $\bm{x}\in\mathbb{R}^{n}$. The matrix exponential $\exp(\bm{X})$ of $\bm{X}\in\mathbb{R}^{n\times n}$ is defined by

\displaystyle\exp(\bm{X})\equiv\sum_{k=0}^{\infty}\frac{1}{k!}\bm{X}^{k}.

(3)

The Frobenius norm of $\bm{X}\in\mathbb{R}^{n\times n}$ is denoted by $\|\bm{X}\|_{F}$ . The notation $[n]$ denotes the set of consecutive integers from $1$ to $n$ .

II-B Average Consensus

Let $G\equiv(V,E)$ be a connected undirected graph where $V=[n]$ . Suppose that a node $i\in V$ can be regarded as an agent communicating over the graph $G$ . Namely, a node $i$ and a node $j$ can communicate with each other if $(i,j)\in E$ . We will not distinguish $(i,j)$ and $(j,i)$ because the graph $G$ is undirected.

Each node $i$ has a state value $x_{i}(t)\in\mathbb{R}$, where $t\in\mathbb{R}$ represents the continuous-time variable. The neighborhood of a node $i\in V$ is defined by

\displaystyle{\cal N}_{i}\equiv\{j\in V:(j,i)\in E,i\neq j\}.

(4)

Note that node $i$ is excluded from ${\cal N}_{i}$. At any time $t$, a node $i\in V$ can access its own state $x_{i}(t)$ and the state values of its neighborhood, i.e., $x_{j}(t)$, $j\in{\cal N}_{i}$, but cannot access the other state values.

In this section, we briefly review the basic properties of average consensus processes [3]. We assume that the vector of state values $\bm{x}(t)\equiv(x_{1}(t),x_{2}(t),\ldots,x_{n}(t))^{T}$ evolves according to the simultaneous differential equations

\displaystyle\frac{dx_{i}(t)}{dt}=-\sum_{j\in{\cal N}_{i}}\mu_{ij}(x_{i}(t)-x_{j}(t)),\quad i\in[n],

(5)

where the initial condition is

\displaystyle\bm{x}(0)=\bm{c}\equiv(c_{1},c_{2},\ldots,c_{n})^{T}\in\mathbb{R}^{n}.

(6)

The edge weights $\mu_{ij}$ satisfy the symmetry condition

\displaystyle\mu_{ij}=\mu_{ji},\quad i\in[n],j\in[n].

(7)

Let $\bm{\Delta}\equiv(\Delta_{1},\Delta_{2},\ldots,\Delta_{n})^{T}\in\mathbb{R}_{+}^{n}$ be a degree sequence where $\Delta_{i}$ is defined by

\displaystyle\Delta_{i}\equiv\sum_{j\in{\cal N}_{i}}\mu_{ij},\quad i\in[n].

(8)

The continuous-time dynamical system (5) is called an average consensus system because every state value converges to the average of the initial state values in the limit $t\rightarrow\infty$, i.e.,

\displaystyle\lim_{t\rightarrow\infty}\bm{x}(t)=\frac{1}{n}\left(\sum_{i=1}^{n}c_{i}\right)\bm{1}=\gamma\bm{1},

(9)

where the vector $\bm{1}$ represents $(1,1,\ldots,1)^{T}$ and $\gamma$ is defined by

\displaystyle\gamma\equiv\frac{1}{n}\sum_{i=1}^{n}c_{i}.

(10)

We define the Laplacian matrix $\bm{L}\equiv\{L_{ij}\}\in\mathbb{R}^{n\times n}$ of this consensus system as follows:

$\displaystyle L_{ij}$	$\displaystyle=\Delta_{i},\quad i=j,\ i\in[n],$	(11)
$\displaystyle L_{ij}$	$\displaystyle=-\mu_{ij},\quad i\neq j\mbox{ and }(i,j)\in E,$	(12)
$\displaystyle L_{ij}$	$\displaystyle=0,\quad i\neq j\mbox{ and }(i,j)\notin E.$	(13)

From this definition, a Laplacian matrix satisfies

$\displaystyle\bm{L}\bm{1}$	$\displaystyle=\bm{0},$	(14)
$\displaystyle\mbox{diag}(\bm{L})$	$\displaystyle=\bm{\Delta},$	(15)
$\displaystyle\bm{L}$	$\displaystyle=\bm{L}^{T}.$	(16)

Note that $\bm{L}$ is symmetric and positive semidefinite, since $\bm{x}^{T}\bm{L}\bm{x}=\sum_{(i,j)\in E}\mu_{ij}(x_{i}-x_{j})^{2}\geq 0$ for any $\bm{x}\in\mathbb{R}^{n}$; hence its eigenvalues are real and nonnegative. Because $G$ is connected, the eigenvalue $0$ is simple. Let $\lambda_{1}=0<\lambda_{2}\leq\ldots\leq\lambda_{n}$ be the eigenvalues of $\bm{L}$ and $\bm{\xi}_{1},\bm{\xi}_{2},\ldots,\bm{\xi}_{n}$ be the corresponding orthonormal eigenvectors. The first eigenvector $\bm{\xi}_{1}\equiv(1/\sqrt{n})\bm{1}$ corresponds to the eigenvalue $\lambda_{1}=0$, i.e., $\bm{L}\bm{\xi}_{1}=\bm{0}$.
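As a minimal sketch of the definitions (11)-(16), the following NumPy snippet builds $\bm{L}$ from a weighted edge list and checks its basic properties. The helper name `laplacian` and the example graph (a 10-node cycle with unit weights $\mu_{ij}=1$) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def laplacian(n, weighted_edges):
    """Build L from undirected edges (i, j, mu_ij), 0-indexed, following (11)-(13)."""
    L = np.zeros((n, n))
    for i, j, mu in weighted_edges:
        L[i, j] -= mu  # off-diagonal entry (12); mu_ij = mu_ji by (7)
        L[j, i] -= mu
        L[i, i] += mu  # diagonal accumulates the weighted degree Delta_i of (8), (11)
        L[j, j] += mu
    return L


# Hypothetical example: cycle graph with n = 10 nodes and unit edge weights
n = 10
edges = [(i, (i + 1) % n, 1.0) for i in range(n)]
L = laplacian(n, edges)

lam, U = np.linalg.eigh(L)  # ascending eigenvalues; orthonormal eigenvectors in columns
print(np.allclose(L @ np.ones(n), 0.0))  # (14): L 1 = 0
print(np.allclose(L, L.T))               # (16): L = L^T
print(np.diag(L))                        # (15): degree sequence (2, ..., 2)
print(lam[:3])                           # smallest eigenvalues; lambda_1 is 0 up to rounding
```

For the unit-weight cycle, the eigenvalues are $2-2\cos(2\pi k/n)$, so $\lambda_{2}=2-2\cos(2\pi/10)\approx 0.382$.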

Using the Laplacian matrix, the dynamical system (5) can be compactly rewritten as

\displaystyle\frac{d\bm{x}(t)}{dt}=-\bm{L}\bm{x}(t),

(17)

where the initial condition is $\bm{x}(0)=\bm{c}$ . The dynamical behaviors of the average consensus system (17) are thus characterized by the Laplacian matrix $\bm{L}$ . Since the ODE (17) is a linear ODE, it can be easily solved. The solution of the ODE (17) is given by

\displaystyle\bm{x}(t)=\exp(-\bm{L}t)\bm{x}(0),\quad t\geq 0.

(18)

Let $\bm{U}\equiv(\bm{\xi}_{1},\bm{\xi}_{2},\ldots,\bm{\xi}_{n})\in\mathbb{R}^{n\times n}$ where $\bm{U}$ is an orthogonal matrix. The Laplacian matrix $\bm{L}$ can be diagonalized by using $\bm{U}$ , i.e.,

\displaystyle\bm{L}=\bm{U}\mbox{diag}(\lambda_{1},\ldots,\lambda_{n})\bm{U}^{T}.

(19)

On the basis of the diagonalization, we have the spectral expansion of the matrix exponential:

$\displaystyle\exp(-\bm{L}t)$	$\displaystyle=\exp(-\bm{U}\mbox{diag}(\lambda_{1},\ldots,\lambda_{n})\bm{U}^{T}t)$
	$\displaystyle=\bm{U}\exp(-\mbox{diag}(\lambda_{1},\ldots,\lambda_{n})t)\bm{U}^{T}$
	$\displaystyle=\sum_{i=1}^{n}\exp(-\lambda_{i}t)\bm{\xi}_{i}\bm{\xi}_{i}^{T}.$	(20)

Substituting this into $\bm{x}(t)=\exp(-\bm{L}t)\bm{x}(0)$ and noting that $\exp(-\lambda_{1}t)=1$ and $\bm{\xi}_{1}\bm{\xi}_{1}^{T}=(1/n)\bm{1}\bm{1}^{T}$, we immediately have

\displaystyle\bm{x}(t)=\frac{1}{n}\bm{1}(\bm{1}^{T})\bm{c}+\sum_{i=2}^{n}\exp(-\lambda_{i}t)\bm{\xi}_{i}\bm{\xi}_{i}^{T}\bm{c}.

(21)

The second term on the right-hand side converges to zero as $t\rightarrow\infty$ since $\lambda_{k}>0$ for $k=2,3,\ldots,n$. This explains why average consensus occurs, i.e., the convergence to the average of the initial state values (9). The second smallest eigenvalue $\lambda_{2}$, called the algebraic connectivity [10], determines the convergence speed because the term $\exp(-\lambda_{2}t)\bm{\xi}_{2}\bm{\xi}_{2}^{T}\bm{c}$ decays most slowly among the terms of the sum.
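The spectral solution (21) can be checked numerically. The sketch below (assuming, for illustration only, a unit-weight 10-node cycle and a random initial vector) evaluates $\bm{x}(t)$ through the eigendecomposition (19)-(20) and compares the distance to $\gamma\bm{1}$ with the bound $e^{-\lambda_{2}t}\|\bm{c}-\gamma\bm{1}\|_{2}$, which follows from (21) because every term with $i\geq 2$ decays at least as fast as $e^{-\lambda_{2}t}$.

```python
import numpy as np

n = 10
I = np.eye(n)
L = 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)  # unit-weight cycle Laplacian
lam, U = np.linalg.eigh(L)

rng = np.random.default_rng(0)
c = rng.standard_normal(n)  # illustrative initial condition x(0) = c
gamma = c.mean()            # (10)


def x_of_t(t):
    """x(t) = sum_i exp(-lambda_i t) xi_i xi_i^T c, cf. (20)-(21)."""
    return U @ (np.exp(-lam * t) * (U.T @ c))


for t in [0.0, 1.0, 5.0, 20.0]:
    err = np.linalg.norm(x_of_t(t) - gamma)
    bound = np.exp(-lam[1] * t) * np.linalg.norm(c - gamma)
    print(f"t = {t:4.1f}: ||x(t) - gamma 1|| = {err:.3e} <= {bound:.3e}")
```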

III Noisy average consensus system

III-A SDE formulation

The dynamical model (2), which contains a white Gaussian noise process, is mathematically challenging to handle directly. We adopt the common approach of modeling the white Gaussian process as the formal time derivative of the standard Wiener process. Instead of model (2), we will focus on the following stochastic differential equation (SDE) [8]

d\bm{x}(t)=-\bm{L}\bm{x}(t)dt+\alpha d\bm{b}(t)

(22)

to study the noisy average consensus system. The parameter $\alpha$ is a positive real number representing the noise intensity. The stochastic term $\bm{b}(t)$ represents the $n$-dimensional standard Wiener process; the elements of $\bm{b}(t)=(b_{1}(t),b_{2}(t),\ldots,b_{n}(t))^{T}$ are independent one-dimensional standard Wiener processes. A one-dimensional standard Wiener process $b(t)$ satisfies $b(0)=0$ and ${\sf E}[b(t)]=0$, has independent increments, and satisfies

\displaystyle b(t)-b(s)\sim{\cal N}(0,t-s),\ 0\leq s\leq t.

(23)

III-B Approaches for studying stochastic dynamics

Our primary objective in the following analysis is to investigate the stochastic dynamics of the noisy average consensus system, focusing on deriving the mean and covariance of the solution $\bm{x}(t)$ for the SDE (22).

There are two approaches to analyze the system. The first approach relies on the established theory of Ito calculus [8], which is used to handle stochastic integrals directly (see Fig. 1). Ito calculus can be applied to derive the first and second moments of the solution of (22).

Alternatively, the second approach employs the Euler-Maruyama (EM) method [7] and utilizes its weak convergence property [7]. We adopt the latter approach in our analysis because, once the weak convergence property is accepted, it does not require knowledge of advanced stochastic calculus. Additionally, this approach extends naturally to the analysis of discrete-time noisy average consensus systems. Furthermore, the EM method plays a key role in the optimization method presented in Section V; our analysis thus also motivates the use of the EM method for optimizing the covariance.

Figure 1: Two approaches for deriving the mean and covariance of $\bm{x}(t)$. This paper follows the lower path using the EM method.

III-C Euler-Maruyama method

To study the stochastic behavior of the solution of the SDE (22), we use the corresponding Euler-Maruyama method, a well-known numerical method for solving SDEs [7].

Assume that we need a numerical solution of an SDE on the time interval $0\leq t\leq T$. We divide this interval into $N$ bins and let $t_{k}\equiv k\eta,\ k=0,1,\ldots,N$, where the step width is $\eta\equiv{T}/{N}$. We denote by $\bm{x}^{(k)}$ the discretized sample approximating $\bm{x}(t_{k})$. The choice of the width $\eta$ is crucial for the stability and accuracy of the EM method: a small width yields a more accurate solution but requires more computation, whereas a large width is computationally efficient but may make the solution unstable.

The recursive equation of the EM method corresponding to SDE (22) is given by

\displaystyle\bm{x}^{(k+1)}=\bm{x}^{(k)}-\eta\bm{L}\bm{x}^{(k)}+\alpha\bm{w}^{(k)},\ k=0,1,\ldots,N-1,

(24)

where the elements $w_{i}^{(k)}$ of $\bm{w}^{(k)}\equiv(w_{1}^{(k)},w_{2}^{(k)},\ldots,w_{n}^{(k)})^{T}$ are independent random variables following $w_{i}^{(k)}\sim{\cal N}(0,\eta)$. In the following discussion, we will use the equivalent expression [7]:

\displaystyle\bm{x}^{(k+1)}=\bm{x}^{(k)}-\eta\bm{L}\bm{x}^{(k)}+\alpha\sqrt{\eta}\bm{z}^{(k)},\ k=0,1,\ldots,N-1,

(25)

where $\bm{z}^{(k)}$ is a random vector following the multivariate Gaussian distribution ${\cal N}(\bm{0},\bm{I})$ . The initial vector $\bm{x}^{(0)}$ is set to be $\bm{c}$ . This recursive equation will be referred to as the Euler-Maruyama recursive equation.
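A direct transcription of the Euler-Maruyama recursive equation (25) is sketched below. The function name `euler_maruyama`, the noise intensity $\alpha=0.5$, and the choice of $T$ and $N$ are illustrative assumptions. Because the deterministic part of (25) multiplies the $i$-th eigencomponent by $1-\eta\lambda_{i}$ at each step, the noiseless recursion is stable for $0<\eta<2/\lambda_{n}$; the width used here ($\eta=0.01$, with $\lambda_{n}=4$ for the unit-weight cycle) is well inside that range.

```python
import numpy as np


def euler_maruyama(L, c, alpha, T, N, rng):
    """Iterate (25): x^(k+1) = x^(k) - eta L x^(k) + alpha sqrt(eta) z^(k), k = 0, ..., N-1."""
    eta = T / N
    x = np.array(c, dtype=float)
    path = [x.copy()]
    for _ in range(N):
        z = rng.standard_normal(x.shape)  # z^(k) ~ N(0, I), drawn fresh at every step
        x = x - eta * (L @ x) + alpha * np.sqrt(eta) * z
        path.append(x.copy())
    return np.array(path)  # shape (N + 1, n): rows x^(0), ..., x^(N)


n = 10
I = np.eye(n)
L = 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)  # unit-weight cycle
rng = np.random.default_rng(0)
c = rng.standard_normal(n)  # x(0) ~ N(0, I)

noiseless = euler_maruyama(L, c, alpha=0.0, T=50.0, N=5000, rng=rng)
noisy = euler_maruyama(L, c, alpha=0.5, T=50.0, N=5000, rng=rng)
print(np.abs(noiseless[-1] - c.mean()).max())  # close to 0: consensus on gamma
print(np.abs(noisy[-1] - c.mean()).max())      # nonzero: the state fluctuates around gamma
```

In the noiseless run, the node sum is conserved at every step because $\bm{1}^{T}\bm{L}=\bm{0}^{T}$; the noise breaks this conservation, which is why the average itself drifts in the noisy case.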

Figure 2 presents a solution computed with the EM method. A cycle graph with 10 nodes and degree sequence $\bm{d}=(2,2,\ldots,2)$ is assumed. The initial value is drawn randomly as $\bm{x}(0)\sim{\cal N}(\bm{0},\bm{I})$. In the noiseless case (left), the state values converge to the average value $\gamma$. In the noisy case (right), on the other hand, the state vector fluctuates around the average.

IV Analysis for Noisy average consensus

IV-A Recursive equation for residual error

In the following, we will analyze the stochastic behavior of the residual error. This will be the basis for the MSE formula to be presented.

Recall that the initial state vector is $\bm{c}=(c_{1},c_{2},\ldots,c_{n})^{T}$ and that the average of the initial values is denoted by $\gamma$. Since the eigenvectors $\{\bm{\xi}_{1},\ldots,\bm{\xi}_{n}\}$ of $\bm{L}$ form an orthonormal basis, we can expand the initial state vector $\bm{c}$ as

\displaystyle\bm{c}=\zeta_{1}\bm{\xi}_{1}+\zeta_{2}\bm{\xi}_{2}+\cdots+\zeta_{n}\bm{\xi}_{n},

(26)

where the coefficients are given by $\zeta_{i}=\bm{c}^{T}\bm{\xi}_{i}$ ($i\in[n]$). Note that $\zeta_{1}=\sqrt{n}\gamma$, and hence $\zeta_{1}\bm{\xi}_{1}=\gamma\bm{1}$ holds.

At the initial index $k=0$ , the Euler-Maruyama recursive equation becomes

\displaystyle\bm{x}^{(1)}=\bm{x}^{(0)}-\eta\bm{L}\bm{x}^{(0)}+\alpha\sqrt{\eta}\bm{z}^{(0)}.

(27)

Substituting (26) into the above equation, we have

$\displaystyle\bm{x}^{(1)}$	$\displaystyle=\bm{x}^{(0)}-\eta\bm{L}(\zeta_{1}\bm{\xi}_{1}+\zeta_{2}\bm{\xi}_{2}+\cdots+\zeta_{n}\bm{\xi}_{n})+\alpha\sqrt{\eta}\bm{z}^{(0)}$
	$\displaystyle=\bm{x}^{(0)}-\eta\zeta_{1}\bm{L}\bm{\xi}_{1}-\eta\bm{L}(\bm{x}^{(0)}-\zeta_{1}\bm{\xi}_{1})+\alpha\sqrt{\eta}\bm{z}^{(0)}$
	$\displaystyle=\bm{x}^{(0)}-\eta\bm{L}(\bm{x}^{(0)}-\gamma\bm{1})+\alpha\sqrt{\eta}\bm{z}^{(0)},$	(28)

where the equations $\bm{L}\bm{\xi}_{1}=\bm{0}$ and $\zeta_{1}\bm{\xi}_{1}=\gamma\bm{1}$ are used in the last equality. Subtracting $\gamma\bm{1}$ from both sides, we get

\displaystyle\bm{x}^{(1)}-\gamma\bm{1}=(\bm{I}-\eta\bm{L})(\bm{x}^{(0)}-\gamma\bm{1})+\alpha\sqrt{\eta}\bm{z}^{(0)}.

(29)

For the index $k\geq 1$ , the Euler-Maruyama recursive equation can be written as

\displaystyle\bm{x}^{(k+1)}=(\bm{I}-\eta\bm{L})\bm{x}^{(k)}+\alpha\sqrt{\eta}\bm{z}^{(k)}.

(30)

Subtracting $\gamma\bm{1}$ from both sides, we have

\displaystyle\bm{x}^{(k+1)}-\gamma\bm{1}=(\bm{I}-\eta\bm{L})\bm{x}^{(k)}-\gamma\bm{1}+\alpha\sqrt{\eta}\bm{z}^{(k)}.

(31)

Using the relation $(\bm{I}-\eta\bm{L})\gamma\bm{1}=\gamma\bm{1}$, which follows from $\bm{L}\bm{1}=\bm{0}$, we can rewrite the above equation as

	$\displaystyle\bm{x}^{(k+1)}-\gamma\bm{1}$	$\displaystyle=(\bm{I}-\eta\bm{L})\bm{x}^{(k)}-(\bm{I}-\eta\bm{L})\gamma\bm{1}+\alpha\sqrt{\eta}\bm{z}^{(k)}$
		$\displaystyle=(\bm{I}-\eta\bm{L})(\bm{x}^{(k)}-\gamma\bm{1})+\alpha\sqrt{\eta}\bm{z}^{(k)}.$		(32)

Setting $k=0$ in (32) recovers the initial equation (29), so the recursion (32) holds for all $k\geq 0$. We summarize the above argument in the following lemma.

Lemma 1

Let $\bm{e}^{(k)}\equiv\bm{x}^{(k)}-\gamma\bm{1}$ be the residual error at index $k$ . The evolution of the residual error of the EM method is described by

\displaystyle\bm{e}^{(k+1)}=(\bm{I}-\eta\bm{L})\bm{e}^{(k)}+\alpha\sqrt{\eta}\bm{z}^{(k)}

(33)

for $k\geq 0$ .
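Lemma 1 can be sanity-checked numerically: driving the EM recursion (25) and the residual recursion (33) with the same noise samples $\bm{z}^{(k)}$ should give $\bm{x}^{(k)}-\gamma\bm{1}=\bm{e}^{(k)}$ at every step, up to floating-point rounding. The graph (a unit-weight 10-node cycle) and the parameter values are illustrative assumptions.

```python
import numpy as np

n, alpha, T, N = 10, 0.5, 5.0, 500
eta = T / N
I = np.eye(n)
L = 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)  # unit-weight cycle
A = I - eta * L                                              # I - eta L in (33)
rng = np.random.default_rng(1)
c = rng.standard_normal(n)
gamma = c.mean()
Z = rng.standard_normal((N, n))  # shared noise samples z^(0), ..., z^(N-1)

x = c.copy()   # x^(0) = c
e = c - gamma  # e^(0) = x^(0) - gamma 1
max_gap = 0.0
for k in range(N):
    x = x - eta * (L @ x) + alpha * np.sqrt(eta) * Z[k]  # EM recursion (25)
    e = A @ e + alpha * np.sqrt(eta) * Z[k]              # residual recursion (33)
    max_gap = max(max_gap, np.abs((x - gamma) - e).max())
print(max_gap)  # zero up to floating-point rounding
```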

The residual error $\bm{e}^{(k)}$ denotes the error between the average vector $\gamma\bm{1}$ and the state vector $\bm{x}^{(k)}$ at time index $k$ . By analyzing the statistical behavior of $\bm{e}^{(k)}$ , we can gain insight into the stochastic properties of the dynamics of the noisy consensus system.

IV-B Asymptotic mean of residual error

Let $\bm{x}\sim{\cal N}(\bm{\mu},\bm{\Sigma})$ be a Gaussian random vector. Recall that the vector $\bm{y}=\bm{A}\bm{x}$ obtained by a linear map also follows a Gaussian distribution, i.e.,

\displaystyle\bm{y}\sim{\cal N}(\bm{A}\bm{\mu},\bm{A}\bm{\Sigma}\bm{A}^{T}),

(34)

where $\bm{A}\in\mathbb{R}^{n\times n}$. If two Gaussian vectors $\bm{a}\sim{\cal N}(\bm{\mu}_{a},\bm{\Sigma}_{a})$ and $\bm{b}\sim{\cal N}(\bm{\mu}_{b},\bm{\Sigma}_{b})$ are independent, their sum $\bm{z}=\bm{a}+\bm{b}$ is also Gaussian, i.e.,

\displaystyle\bm{z}\sim{\cal N}(\bm{\mu}_{a}+\bm{\mu}_{b},\bm{\Sigma}_{a}+\bm{\Sigma}_{b}).

(35)

In the recursive equation (33), it is evident that $\bm{e}^{(1)}$ follows a multivariate Gaussian distribution because

\displaystyle\bm{e}^{(1)}=(\bm{I}-\eta\bm{L})(\bm{c}-\gamma\bm{1})+\alpha\sqrt{\eta}\bm{z}^{(0)}

(36)

is the sum of a constant vector and a Gaussian random vector. Since $\bm{z}^{(k)}$ is independent of $\bm{e}^{(k)}$, induction on $k$ using the above properties of Gaussian random vectors shows that the residual error vector $\bm{e}^{(k)}$ follows the multivariate Gaussian distribution ${\cal N}(\bm{\mu}^{(k)},\bm{\Sigma}^{(k)})$, where the mean vector $\bm{\mu}^{(k)}$ and the covariance matrix $\bm{\Sigma}^{(k)}$ are recursively determined by

	$\displaystyle\bm{\mu}^{(k+1)}$	$\displaystyle=(\bm{I}-\eta\bm{L})\bm{\mu}^{(k)},$		(37)
	$\displaystyle\bm{\Sigma}^{(k+1)}$	$\displaystyle=(\bm{I}-\eta\bm{L})\bm{\Sigma}^{(k)}(\bm{I}-\eta\bm{L})^{T}+\alpha^{2}\eta\bm{I}$		(38)

for $k\geq 0$ where the initial values are formally given by

	$\displaystyle\bm{\mu}^{(0)}$	$\displaystyle=\bm{c}-\gamma\bm{1},$		(39)
	$\displaystyle\bm{\Sigma}^{(0)}$	$\displaystyle=\bm{O}.$		(40)
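The recursions (37)-(40) translate directly into code. The sketch below (the function name, graph, and parameters are illustrative assumptions) propagates $\bm{\mu}^{(k)}$ and $\bm{\Sigma}^{(k)}$. Since $(\bm{I}-\eta\bm{L})\bm{\xi}_{1}=\bm{\xi}_{1}$, the variance along $\bm{\xi}_{1}$ grows by exactly $\alpha^{2}\eta$ per step, which gives a simple check.

```python
import numpy as np


def moment_recursion(L, c, alpha, T, N):
    """Propagate (37)-(38) from the initial values (39)-(40); return (mu^(N), Sigma^(N))."""
    n = L.shape[0]
    eta = T / N
    A = np.eye(n) - eta * L
    mu = c - c.mean()         # (39): mu^(0) = c - gamma 1
    Sigma = np.zeros((n, n))  # (40): Sigma^(0) = O
    for _ in range(N):
        mu = A @ mu                                           # (37)
        Sigma = A @ Sigma @ A.T + alpha**2 * eta * np.eye(n)  # (38)
    return mu, Sigma


n, alpha, T, N = 10, 0.5, 3.0, 300
I = np.eye(n)
L = 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)  # unit-weight cycle
c = np.random.default_rng(0).standard_normal(n)
mu, Sigma = moment_recursion(L, c, alpha, T, N)

xi1 = np.ones(n) / np.sqrt(n)
print(xi1 @ Sigma @ xi1, alpha**2 * T)  # variance along xi_1 equals alpha^2 T
print(np.trace(Sigma))                  # total noise-induced variance at time T
```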

Solving the mean recursion (37) yields the following asymptotic mean formula.

Lemma 2

Suppose that $T>0$ is given. The asymptotic mean at $N\rightarrow\infty$ is given by

\displaystyle\lim_{N\rightarrow\infty}\bm{\mu}^{(N)}=\exp(-\bm{L}T)(\bm{c}-\gamma\bm{1}).

(41)

(Proof) Iterating (37) gives $\bm{\mu}^{(k)}=(\bm{I}-\eta\bm{L})^{k}(\bm{c}-\gamma\bm{1})$ for $k\geq 0$. Recall that the eigenvalue decomposition of $\bm{L}$ is given by $\bm{L}=\bm{U}\mbox{diag}(\lambda_{1},\ldots,\lambda_{n})\bm{U}^{T}$. From

\displaystyle\bm{I}-\eta\bm{L}=\bm{U}(\bm{I}-\eta\mbox{diag}(\lambda_{1},\ldots,\lambda_{n}))\bm{U}^{T},

(42)

we have

\displaystyle(\bm{I}-\eta\bm{L})^{k}=\bm{U}\mbox{diag}((1-\eta\lambda_{1})^{k},\ldots,(1-\eta\lambda_{n})^{k})\bm{U}^{T}.

(43)

Since $(1-\lambda_{i}T/N)^{N}\rightarrow e^{-\lambda_{i}T}$ as $N\rightarrow\infty$ for each $i$, combining (43) with the spectral expansion (20) gives

\displaystyle\lim_{N\rightarrow\infty}\left(\bm{I}-\frac{T}{N}\bm{L}\right)^{N}=\exp(-\bm{L}T),

(44)

where $\eta=T/N$. Multiplying both sides by $\bm{c}-\gamma\bm{1}$ proves the lemma.

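The claim of this lemma is consistent with the continuous-time solution (18) of the noiseless case. Since $\exp(-\bm{L}T)\bm{1}=\bm{1}$, the limit (41) equals $\exp(-\bm{L}T)\bm{c}-\gamma\bm{1}$, which is exactly the residual error of the noiseless system at time $T$; that is, the noise does not affect the mean of the residual error. In particular, in the limit $\alpha\rightarrow 0$, the state evolution of the noisy system converges to that of the noiseless system.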
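The limit (44) behind Lemma 2 can be observed numerically. The sketch below compares $(\bm{I}-(T/N)\bm{L})^{N}$ with $\exp(-\bm{L}T)$, the latter computed from the spectral expansion (20) so that only NumPy is needed. The graph and $T$ are illustrative assumptions; the Frobenius-norm error should shrink roughly in proportion to $1/N$.

```python
import numpy as np

n, T = 10, 2.0
I = np.eye(n)
L = 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)  # unit-weight cycle
lam, U = np.linalg.eigh(L)
exp_LT = U @ np.diag(np.exp(-lam * T)) @ U.T  # exp(-L T) via the spectral expansion (20)

errors = []
for N in [10, 100, 1000, 10000]:
    approx = np.linalg.matrix_power(I - (T / N) * L, N)  # (I - eta L)^N with eta = T / N
    errors.append(np.linalg.norm(approx - exp_LT, "fro"))
    print(f"N = {N:5d}: ||(I - T/N L)^N - exp(-LT)||_F = {errors[-1]:.2e}")
```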

IV-C Asymptotic covariance of residual error

We here discuss the asymptotic behavior of the covariance matrix $\bm{\Sigma}^{(N)}$ at the limit of $N\rightarrow\infty$ .

Lemma 3

Suppose that $T>0$ is given. The asymptotic covariance matrix at $N\rightarrow\infty$ is given by

\displaystyle\lim_{N\rightarrow\infty}\bm{\Sigma}^{(N)}=\bm{U}\mbox{diag}\left(\alpha^{2}T,\theta_{2},\theta_{3},\ldots,\theta_{n}\right)\bm{U}^{T},

(45)

where $\theta_{i}$ is defined by

\displaystyle\theta_{i}\equiv\frac{\alpha^{2}}{2\lambda_{i}}\left(1-e^{-2\lambda_{i}T}\right).

(46)

(Proof) Recall that

\displaystyle\bm{I}-\eta\bm{L}=\bm{U}\mbox{diag}(1,1-\eta\lambda_{2},\ldots,1-\eta\lambda_{n})\bm{U}^{T}.

(47)

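Since $\bm{\Sigma}^{(0)}=\bm{O}$ and the update (38) maps a matrix of the form $\bm{U}\mbox{diag}(\cdot)\bm{U}^{T}$ to a matrix of the same form, we can write $\bm{\Sigma}^{(k)}=\bm{U}\mbox{diag}(s_{1}^{(k)},\ldots,s_{n}^{(k)})\bm{U}^{T}$ for all $k$. The covariance evolution (38) thus has the spectral representation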

	$\displaystyle\mbox{diag}(s_{1}^{(k+1)},\ldots,s_{n}^{(k+1)})$
	$\displaystyle=\mbox{diag}(s_{1}^{(k)},s_{2}^{(k)}(1-\eta\lambda_{2})^{2},\ldots,s_{n}^{(k)}(1-\eta\lambda_{n})^{2})+\alpha^{2}\eta\bm{I},$		(48)

where $s_{i}^{(0)}=0$. The first component follows the recursion $s_{1}^{(k+1)}=s_{1}^{(k)}+\alpha^{2}\eta$, and thus $s_{1}^{(N)}=\alpha^{2}\eta N=\alpha^{2}T$. For $i\geq 2$, the $i$-th component follows

\displaystyle s_{i}^{(k+1)}=s_{i}^{(k)}(1-\eta\lambda_{i})^{2}+\alpha^{2}\eta.

(49)

Consider the fixed-point equation of (49), which is given by

\displaystyle s=s(1-\eta\lambda_{i})^{2}+\alpha^{2}\eta.

(50)

The solution of the equation is given by

\displaystyle s=\frac{\alpha^{2}\eta}{1-(1-\eta\lambda_{i})^{2}}.

(51)

The above recursive equation (49) thus can be transformed as

\displaystyle s_{i}^{(k+1)}-s=(s_{i}^{(k)}-s)(1-\eta\lambda_{i})^{2}.

(52)

From the above equation, $s_{i}^{(N)}$ can be solved as

\displaystyle s_{i}^{(N)}=s+(s_{i}^{(0)}-s)(1-\eta\lambda_{i})^{2N}.

(53)

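Since $s_{i}^{(0)}=0$, this gives $s_{i}^{(N)}=s\left(1-(1-\eta\lambda_{i})^{2N}\right)$. With $\eta=T/N$, the fixed point satisfies $s=\alpha^{2}/(2\lambda_{i}-\eta\lambda_{i}^{2})\rightarrow\alpha^{2}/(2\lambda_{i})$ and $(1-\lambda_{i}T/N)^{2N}\rightarrow e^{-2\lambda_{i}T}$ as $N\rightarrow\infty$. Taking the limit $N\rightarrow\infty$, we thus have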

\displaystyle\lim_{N\rightarrow\infty}s_{i}^{(N)}=\frac{\alpha^{2}}{2\lambda_{i}}\left(1-e^{-2\lambda_{i}T}\right).

(54)

We thus have the claim of this lemma.
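The closed form of Lemma 3 can be compared against the scalar recursion (49) and the exact finite-$N$ solution (53). The sketch below uses illustrative parameters. The agreement with $\theta_{i}$ improves as $\eta=T/N$ decreases, because the fixed point $s=\alpha^{2}/(2\lambda_{i}-\eta\lambda_{i}^{2})$ differs from $\alpha^{2}/(2\lambda_{i})$ by a term of order $\eta$.

```python
import numpy as np

n, alpha, T, N = 10, 0.5, 3.0, 20000
eta = T / N
I = np.eye(n)
L = 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)  # unit-weight cycle
lam, U = np.linalg.eigh(L)
lam2 = lam[1:]  # lambda_2, ..., lambda_n (lam[0] is 0 up to rounding)

# Scalar recursion (49) for i >= 2, started from s_i^(0) = 0
s_rec = np.zeros(n - 1)
for _ in range(N):
    s_rec = s_rec * (1 - eta * lam2) ** 2 + alpha**2 * eta

s_fix = alpha**2 * eta / (1 - (1 - eta * lam2) ** 2)         # fixed point (51)
s_exact = s_fix * (1 - (1 - eta * lam2) ** (2 * N))          # (53) with s_i^(0) = 0
theta = alpha**2 / (2 * lam2) * (1 - np.exp(-2 * lam2 * T))  # (46)

Sigma_lim = U @ np.diag(np.concatenate(([alpha**2 * T], theta))) @ U.T  # (45)
print(np.max(np.abs(s_rec - s_exact)))  # recursion vs. finite-N solution: rounding only
print(np.max(np.abs(s_rec - theta)))    # finite N vs. the N -> infinity limit: O(eta)
```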

IV-D Weak convergence of Euler-Maruyama method

As previously noted, the asymptotic mean (41) is consistent with the continuous solution. The weak convergence property of the EM method [7] allows us to obtain the moments of the error at time $t$ .

We briefly explain the weak convergence property. Consider an SDE of the form

\displaystyle d\bm{x}(t)=\phi(\bm{x}(t))dt+\psi(\bm{x}(t))d\bm{b}(t).

(55)

Under suitable regularity conditions on $\phi$ and $\psi$ (for example, sufficient smoothness together with global Lipschitz continuity and linear growth), the moments of the EM approximation converge to the exact moments of the solution $\bm{x}(t)$ in the limit $N\rightarrow\infty$ [7]. This property is called the weak convergence property. In our case, the coefficient functions of the SDE (22) are $\phi(\bm{x})=-\bm{L}\bm{x}$, which is linear, and $\psi(\bm{x})=\alpha$, which is constant, so these conditions are satisfied. Hence, we can employ the weak convergence property in our analysis.

Suppose $\bm{x}(t)$ is the solution of SDE (22) with the initial condition $\bm{x}(0)=\bm{c}$. Let $\bm{\mu}(t)$ and $\bm{\Sigma}(t)$ be the mean vector and the covariance matrix, respectively, of the residual error $\bm{e}(t)\equiv\bm{x}(t)-\gamma\bm{1}$.

Theorem 1

For a positive real number $t>0$ , the mean and the covariance matrix of the residual error $\bm{e}(t)$ are given by

	$\displaystyle\bm{\mu}(t)$	$\displaystyle=\exp(-\bm{L}t)(\bm{c}-\gamma\bm{1})$		(56)
	$\displaystyle\bm{\Sigma}(t)$	$\displaystyle=\bm{U}\mbox{diag}\left(\alpha^{2}t,\theta_{2},\theta_{3},\ldots,\theta_{n}\right)\bm{U}^{T}.$		(57)

(Proof) Due to the weak convergence property of the EM method [7], the first and second moments of the EM residual error $\bm{e}^{(N)}$ converge to those of $\bm{e}(T)$, i.e.,

	$\displaystyle\bm{\mu}(T)$	$\displaystyle=\lim_{N\rightarrow\infty}\bm{\mu}^{(N)}$		(58)
	$\displaystyle\bm{\Sigma}(T)$	$\displaystyle=\lim_{N\rightarrow\infty}\bm{\Sigma}^{(N)},$		(59)

where $N$ and $T$ are related by $T=\eta N$ . Applying Lemmas 2 and 3 and replacing the variable $T$ by $t$ provide the claim of the theorem.

IV-E Mean squared error

In the following, we assume that the initial state vector $\bm{c}$ follows the Gaussian distribution ${\cal N}(\bm{0},\bm{I})$ and is independent of the noise process.

In this setting, $\bm{\mu}(t)$, now a function of the random vector $\bm{c}$, also follows the multivariate Gaussian distribution with mean vector $\bm{0}$ and covariance matrix $\bm{Q}(t)\bm{Q}(t)^{T}$, where

\displaystyle\bm{Q}(t)\equiv\exp(-\bm{L}t)\left(\bm{I}-\frac{1}{n}\bm{1}(\bm{1}^{T})\right)

(60)

because $\bm{\mu}(t)$ can be rewritten as

\displaystyle\bm{\mu}(t)=\exp(-\bm{L}t)(\bm{c}-\gamma\bm{1})=\exp(-\bm{L}t)\left(\bm{I}-\frac{1}{n}\bm{1}(\bm{1}^{T})\right)\bm{c}.

(61)

Using Theorem 1, we immediately obtain the following corollary, which gives the MSE formula.

Corollary 1

The mean squared error (MSE)

\displaystyle{\sf MSE}(t)\equiv{\sf E}[\|\bm{x}(t)-\gamma\bm{1}\|_{2}^{2}]

(62)

is given by

\displaystyle{\sf MSE}(t)=\alpha^{2}t+\frac{\alpha^{2}}{2}\sum_{i=2}^{n}\frac{1-e^{-2\lambda_{i}t}}{\lambda_{i}}+\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T}).

(Proof) We can rewrite $\bm{x}(t)$ as:

\displaystyle\bm{x}(t)=\gamma\bm{1}+\bm{Q}(t)\bm{c}+\bm{w},

(63)

where $\bm{w}\sim{\cal N}(\bm{0},\bm{\Sigma}(t))$ , and $\bm{w}$ and $\bm{c}$ are independent. We thus have

	$\displaystyle{\sf MSE}(t)$	$\displaystyle=\mbox{tr}(\bm{\Sigma}(t))+\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T})$
		$\displaystyle=\alpha^{2}t+\frac{\alpha^{2}}{2}\sum_{i=2}^{n}\frac{1-e^{-2\lambda_{i}t}}{\lambda_{i}}+\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T})$		(64)

due to Theorem 1.

Since $\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T})=\sum_{i=2}^{n}e^{-2\lambda_{i}t}$ decays exponentially in $t$, $\mbox{tr}(\bm{\Sigma}(t))$ dominates ${\sf MSE}(t)$ for sufficiently large $t$. In this regime, the MSE is well approximated by the asymptotic MSE (AMSE) as

\displaystyle{\sf MSE}(t)\simeq{\sf AMSE}(t)\equiv\alpha^{2}t+\frac{\alpha^{2}}{2}\sum_{i=2}^{n}\frac{1}{\lambda_{i}}

(65)

because $\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T})$ is negligible and $1-e^{-2\lambda_{i}t}$ is well approximated by $1$. The slope $\alpha^{2}$ of ${\sf AMSE}(t)$, which stems from the random drift of the average along $\bm{\xi}_{1}$, does not depend on the graph, whereas the sum of inverse eigenvalues $\sum_{i=2}^{n}({1}/{\lambda_{i}})$ of the Laplacian matrix determines its intercept. In other words, the graph topology influences the stochastic error behavior through the sum of inverse eigenvalues of the Laplacian matrix.
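Corollary 1 and the AMSE (65) are cheap to evaluate once the Laplacian spectrum is known. The sketch below uses the identity $\mbox{tr}(\bm{Q}(t)\bm{Q}(t)^{T})=\sum_{i=2}^{n}e^{-2\lambda_{i}t}$, which follows from $\bm{I}-\frac{1}{n}\bm{1}\bm{1}^{T}=\sum_{i=2}^{n}\bm{\xi}_{i}\bm{\xi}_{i}^{T}$ and the spectral expansion (20). The function names, the cycle graph, and $\alpha$ are illustrative assumptions.

```python
import numpy as np


def mse(lam, alpha, t):
    """MSE(t) of Corollary 1; lam holds the Laplacian eigenvalues in ascending order."""
    lam2 = lam[1:]  # lambda_2, ..., lambda_n
    noise = alpha**2 * t + 0.5 * alpha**2 * np.sum((1 - np.exp(-2 * lam2 * t)) / lam2)
    transient = np.sum(np.exp(-2 * lam2 * t))  # tr(Q(t) Q(t)^T)
    return noise + transient


def amse(lam, alpha, t):
    """Asymptotic MSE (65): slope alpha^2, intercept (alpha^2 / 2) sum_{i>=2} 1 / lambda_i."""
    return alpha**2 * t + 0.5 * alpha**2 * np.sum(1.0 / lam[1:])


n, alpha = 10, 0.5
I = np.eye(n)
L = 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)  # unit-weight cycle
lam = np.linalg.eigvalsh(L)

for t in [0.0, 1.0, 5.0, 20.0]:
    print(f"t = {t:4.1f}: MSE = {mse(lam, alpha, t):.4f}, AMSE = {amse(lam, alpha, t):.4f}")
```

At $t=0$ the formula reduces to ${\sf E}[\|\bm{c}-\gamma\bm{1}\|_{2}^{2}]=n-1$, which is a convenient consistency check.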

Figure 3 compares ${\sf MSE}(t)$ estimated by the EM method (25) with the formula (64) for a cycle graph with 10 nodes; the values of ${\sf AMSE}(t)$ are also included in Fig. 3. The theoretical values of ${\sf MSE}(t)$ and the values estimated by the EM method are in close agreement.
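A small Monte Carlo experiment in the spirit of Fig. 3 can be sketched as follows: many independent EM trajectories (25) with $\bm{c}\sim{\cal N}(\bm{0},\bm{I})$ are simulated in parallel (one row per trial), and the empirical mean of $\|\bm{x}^{(N)}-\gamma\bm{1}\|_{2}^{2}$ is compared with (64) at $t=T$. The parameters and the trial count are illustrative assumptions, not those used for Fig. 3; the two numbers should agree up to Monte Carlo and discretization error.

```python
import numpy as np

n, alpha, T, N, trials = 10, 0.5, 5.0, 1000, 4000
eta = T / N
I = np.eye(n)
L = 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)  # unit-weight cycle
lam2 = np.linalg.eigvalsh(L)[1:]                             # lambda_2, ..., lambda_n
rng = np.random.default_rng(2)

C = rng.standard_normal((trials, n))   # c ~ N(0, I), one row per trial
gamma = C.mean(axis=1, keepdims=True)  # per-trial average of the initial values
X = C.copy()
for _ in range(N):
    # Row form of (25); L is symmetric, so x^T (I - eta L) = x^T - eta x^T L
    X = X - eta * (X @ L) + alpha * np.sqrt(eta) * rng.standard_normal((trials, n))
mse_em = np.mean(np.sum((X - gamma) ** 2, axis=1))

mse_th = (alpha**2 * T
          + 0.5 * alpha**2 * np.sum((1 - np.exp(-2 * lam2 * T)) / lam2)
          + np.sum(np.exp(-2 * lam2 * T)))  # formula (64) at t = T
print(f"EM estimate: {mse_em:.3f}, formula (64): {mse_th:.3f}")
```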

V Minimization of mean squared error

V-A Optimization Problems A and B

In the previous section, we demonstrated that the MSE can be expressed in closed form. It is natural to optimize the edge weights $\{\mu_{ij}\}$ in order to decrease the MSE. The optimization of the edge weights is equivalent to the optimization of the Laplacian matrix $\bm{L}$. Several related works aim at a similar goal for noise-free systems. For example, Xiao and Boyd [5] proposed a method for optimizing the edge weights to achieve the fastest convergence to the average; they formulated the problem as a convex optimization problem, which can be solved efficiently. Kishida et al. [13] presented a deep unfolding-based method for optimizing time-dependent edge weights. However, these methods are designed for noise-free systems and are not applicable to systems with noise. Optimizing the MSE is non-trivial because its closed form involves the sum of the inverse eigenvalues of the Laplacian matrix.

In this subsection, we present two edge-weight optimization problems.

V-A1 Optimization problem A

Assume that a degree sequence $\bm{d}\in\mathbb{R}_{+}^{n}$ is given in advance. Optimization problem A is the minimization of ${\sf MSE}(t^{*})$ under the given degree sequence, where $t^{*}$ is a predetermined target time. The precise formulation of the problem is as follows:

$\displaystyle\mbox{minimize }{\sf MSE}(t^{*})$
subject to:
$\displaystyle\bm{L}$	$\displaystyle=\{L_{ij}\}\in\mathbb{R}^{n\times n}$	(66)
$\displaystyle\bm{L}$	$\displaystyle=\bm{L}^{T}$	(67)
$\displaystyle\bm{L}$	$\displaystyle\bm{1}=\bm{0}$	(68)
	$\displaystyle\|\mbox{diag}(\bm{L})-\bm{d}\|_{2}<\theta$	(69)
$\displaystyle L_{ij}$	$\displaystyle=0,\ (i,j)\notin E.$	(70)

The constraint (67) imposes the symmetry of the edge weights, $\mu_{ij}=\mu_{ji}$ for $(i,j)\in E$. The row-sum constraint (68) ensures that each diagonal element equals the weighted degree (8), as required by (11). The constraint (69) requires the diagonal of $\bm{L}$ to be close to the given degree sequence $\bm{d}$, where the positive constant $\theta$ serves as a tolerance parameter. The constraint (70) restricts the nonzero off-diagonal entries to the edges of $G$.
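One convenient way to handle the structural constraints (66)-(68) and (70), sketched below under our own assumptions, is to parameterize $\bm{L}$ directly by the edge weights as $\bm{L}=\sum_{(i,j)\in E}\mu_{ij}(\bm{\delta}_{i}-\bm{\delta}_{j})(\bm{\delta}_{i}-\bm{\delta}_{j})^{T}$, where $\bm{\delta}_{i}$ denotes the $i$-th standard basis vector. Symmetry, the row-sum condition, and the sparsity pattern then hold by construction, and only the degree constraint (69) remains to be checked. The helper names are hypothetical, and the text does not state how the authors enforce these constraints.

```python
import numpy as np


def incidence(n, edges):
    """Incidence matrix B (n x |E|): the column for edge (i, j) is delta_i - delta_j."""
    B = np.zeros((n, len(edges)))
    for k, (i, j) in enumerate(edges):
        B[i, k], B[j, k] = 1.0, -1.0
    return B


def laplacian_from_weights(B, w):
    """L = B diag(w) B^T: symmetric (67), L 1 = 0 (68), zero outside E (70) by construction."""
    return B @ np.diag(w) @ B.T


def degree_violation(L, d):
    """Left-hand side of (69), ||diag(L) - d||_2, to be kept below the tolerance theta."""
    return np.linalg.norm(np.diag(L) - d)


n = 10
edges = [(i, (i + 1) % n) for i in range(n)]  # illustrative cycle graph
B = incidence(n, edges)
d = 2.0 * np.ones(n)                          # illustrative target degree sequence
w = np.random.default_rng(0).uniform(0.5, 1.5, len(edges))
L = laplacian_from_weights(B, w)
print(degree_violation(L, d))  # compare with the tolerance theta
print(degree_violation(laplacian_from_weights(B, np.ones(len(edges))), d))  # unit weights: 0.0
```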

One way to interpret optimization problem A is to regard the graph $G$ as representing the wireless connections between terminals $i\in[n]$. The degree sequence $\bm{d}=(d_{1},d_{2},\ldots,d_{n})$ can then be seen as an allocation of total received wireless power, i.e., terminal $i$ can receive the neighboring signals up to the total power $d_{i}$. If an average consensus protocol is used in such a wireless network for a specific application, it is desirable to minimize ${\sf MSE}(t^{*})$ while satisfying the power constraint.

V-A2 Optimization problem B

Assume that a real constant $D\in\mathbb{R}^{+}$ is given in advance. Optimization problem B minimizes ${\sf MSE}(t^{*})$ under the condition that the diagonal sum of the Laplacian matrix $\bm{L}$ equals $D$ up to the tolerance $\theta$. The formulation is as follows:

$\displaystyle\mbox{minimize }{\sf MSE}(t^{*})$
subject to:
$\displaystyle\bm{L}=\{L_{ij}\}\in\mathbb{R}^{n\times n},$	(71)
$\displaystyle\bm{L}=\bm{L}^{T},$	(72)
$\displaystyle\bm{L}\bm{1}=\bm{0},$	(73)
$\displaystyle\left|\sum_{i=1}^{n}L_{ii}-D\right|<\theta,$	(74)
$\displaystyle L_{ij}=0,\ (i,j)\notin E.$	(75)

In terms of the wireless interpretation above, only the total received power $D$ is fixed in this problem, so the allocation of power among the terminals is optimized as well.

V-B Minimization based on deep-unfolded EM method

Advances in deep neural networks have had a strong impact on the design of algorithms for communications and signal processing [14, 15, 16]. Deep unfolding is an effective way to improve the convergence of iterative algorithms. Gregor and LeCun introduced the learned ISTA (LISTA) [21]. Borgerding et al. proposed trainable variants of AMP and VAMP [19, 20]. Trainable ISTA (TISTA) [23] is another trainable sparse signal recovery algorithm with fast convergence; it requires only a small number of trainable parameters, which leads to a fast and stable training process. Another advantage of deep unfolding is that the learned parameters are relatively easy to interpret.

The concept behind deep unfolding is simple: trainable parameters are embedded into an iterative algorithm, and the signal-flow graph of the algorithm is unfolded along its iterations. Standard training techniques from deep learning, such as stochastic gradient descent (SGD) and backpropagation, can then be applied to the unfolded signal-flow graph to optimize the trainable parameters.

Combining deep unfolding with differential equation solvers [24] is an active area of research in scientific machine learning, although the technique is not limited to applications within that field. In this subsection, we introduce an optimization algorithm based on the deep-unfolded EM method. The central idea is to use a loss function that approximates ${\sf MSE}(t^{*})$. Applying stochastic gradient descent to this loss function yields good approximate solutions for both Optimization problems A and B. The proposed method can be implemented easily in any modern neural network framework that includes an automatic differentiation mechanism. The following subsections describe the method in more detail.

V-B1 Mini-batch for optimization

In the optimization process described below, mini-batches are generated randomly. A mini-batch consists of $K$ pairs of an initial value vector and its average:

\displaystyle{\cal M}\equiv\{(\bm{c}_{1},\gamma_{1}),(\bm{c}_{2},\gamma_{2}),\ldots,(\bm{c}_{K},\gamma_{K})\}.

(76)

The parameter $K$ is called the mini-batch size. The initial value vectors follow a Gaussian distribution, i.e., $\bm{c}_{i}\sim{\cal N}(\bm{0},\bm{I})\ (i\in[K])$, and the corresponding average values are given by $\gamma_{i}\equiv(1/n)\bm{c}_{i}^{T}\bm{1}$.
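A minimal NumPy sketch of drawing one mini-batch (76) is shown below; it stores the vectors $\bm{c}_{i}$ as the columns of a matrix. The function name and the column layout are assumptions made for illustration.

```python
import numpy as np


def make_minibatch(n, K, rng):
    """Draw a mini-batch (76) with c_i ~ N(0, I_n), i = 1, ..., K.

    Column i of C is c_i; gamma[i] = (1/n) c_i^T 1 is its average.
    """
    C = rng.standard_normal((n, K))
    gamma = C.mean(axis=0)
    return C, gamma


# Example: a mini-batch of size K = 25 for a graph with n = 10 nodes.
C, gamma = make_minibatch(n=10, K=25, rng=np.random.default_rng(0))
```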

V-B2 Loss function for Optimization problem A

The loss function corresponding to a mini-batch ${\cal M}$ is given by

\displaystyle E_{\cal M}(\bm{L})\equiv\frac{1}{K}\sum_{i=1}^{K}\|\bm{\chi}(\bm{c}_{i})-\gamma_{i}\bm{1}\|^{2}_{2}+P_{A}(\bm{L}),

(77)

where $\bm{\chi}(\bm{c}_{i})\equiv\bm{x}^{(N)}$ is the random variable given by the Euler-Maruyama recursion:

\displaystyle\bm{x}^{(k+1)}=\bm{x}^{(k)}-\eta\bm{L}\bm{x}^{(k)}+\alpha\sqrt{\eta}\bm{z}^{(k)},\ k=0,1,\ldots,N-1,

(78)

with $\bm{x}^{(0)}=\bm{c}_{i}$ . The first term of the loss function can be regarded as an approximation of ${\sf MSE}(t^{*})$ :

\displaystyle\frac{1}{K}\sum_{i=1}^{K}\|\bm{\chi}(\bm{c}_{i})-\gamma_{i}\bm{1}\|^{2}_{2}\simeq{\sf MSE}(t^{*})

(79)

for sufficiently large $K$, with $T=t^{*}$ and a sufficiently small step size $\eta$.
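The following NumPy sketch runs the recursion (78) for a whole mini-batch at once and evaluates the first term of the loss (77), i.e., the Monte Carlo estimate in (79). It assumes the step size $\eta=T/N$ and i.i.d. standard Gaussian vectors $\bm{z}^{(k)}$; the function names are hypothetical.

```python
import numpy as np


def em_final_states(L, C, alpha, T, N, rng):
    """Run the Euler-Maruyama recursion (78) for every column of C.

    Assumes eta = T / N and i.i.d. standard Gaussian z^(k);
    returns x^(N) for each initial vector (one per column).
    """
    eta = T / N
    X = C.copy()
    for _ in range(N):                        # k = 0, 1, ..., N-1
        Z = rng.standard_normal(X.shape)
        X = X - eta * (L @ X) + alpha * np.sqrt(eta) * Z
    return X


def empirical_mse(L, C, gamma, alpha, T, N, rng):
    """First term of (77): mean of ||x^(N) - gamma_i 1||^2 over the batch."""
    X = em_final_states(L, C, alpha, T, N, rng)
    R = X - gamma[np.newaxis, :]              # subtract gamma_i from column i
    return float(np.mean(np.sum(R ** 2, axis=0)))
```

With $\alpha=0$ and $\bm{L}=\bm{O}$ the states never move, so the estimate reduces to the empirical spread of the initial vectors around their averages; with noise and $\bm{L}=\bm{O}$ it approaches $(n-1)+n\alpha^{2}T$.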

The function $P_{A}(\bm{L})$ is a penalty function corresponding to constraints (67)–(70), defined by

\displaystyle P_{A}(\bm{L})\equiv\rho_{1}\|\bm{L}-\bm{L}^{T}\|_{F}^{2}+\rho_{2}\|\bm{L}\bm{1}\|_{2}^{2}+\rho_{3}\|\mbox{diag}(\bm{L})-\bm{d}\|_{2}^{2}+\rho_{4}\|\bm{L}\odot\bm{M}\|_{F}^{2},

(80)

where $\bm{M}=\{M_{ij}\}$ is the mask matrix defined by

\displaystyle M_{ij}\equiv\left\{\begin{array}[]{cc}1,&(i,j)\notin E\\ 0,&\mbox{otherwise}.\end{array}\right.

(83)

The operator $\odot$ denotes the Hadamard (element-wise) matrix product. The positive constants $\rho_{i}\ (i\in[4])$ control the relative strength of the penalty terms. The first term of the penalty function corresponds to the symmetry constraint (67). The term $\|\bm{L}\bm{1}\|_{2}^{2}$ penalizes violations of the row-sum constraint (68). The third term, $\|\mbox{diag}(\bm{L})-\bm{d}\|_{2}^{2}$, accounts for the degree constraint (69). The last term, $\|\bm{L}\odot\bm{M}\|_{F}^{2}$, drives $L_{ij}$ toward zero for $(i,j)\notin E$, as required by (70).

Owing to these penalty terms in $P_{A}(\bm{L})$, violations of constraints (67)–(70) are suppressed during the optimization process.
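A minimal NumPy sketch of the mask (83) and the penalty (80) follows. The mask is built from a 0-based edge list and, as an assumption, excludes the diagonal entries, which are controlled by the row-sum and degree terms; the names `mask_matrix` and `penalty_A` and the default coefficients $\rho_{i}=10$ (the values used in Section VI) are illustrative.

```python
import numpy as np


def mask_matrix(n, edges):
    """Mask M of (83): M_ij = 1 for off-diagonal pairs (i, j) outside E.

    Assumption: the diagonal is excluded (M_ii = 0), since it is
    controlled by the row-sum and degree terms of the penalty.
    """
    M = np.ones((n, n)) - np.eye(n)
    for i, j in edges:
        M[i, j] = M[j, i] = 0.0
    return M


def penalty_A(L, d, M, rho=(10.0, 10.0, 10.0, 10.0)):
    """Penalty P_A(L) of (80) with coefficients rho = (rho1, ..., rho4)."""
    r1, r2, r3, r4 = rho
    ones = np.ones(L.shape[0])
    return (r1 * np.sum((L - L.T) ** 2)              # symmetry (67)
            + r2 * np.sum((L @ ones) ** 2)           # zero row sums (68)
            + r3 * np.sum((np.diag(L) - d) ** 2)     # degree sequence (69)
            + r4 * np.sum((L * M) ** 2))             # sparsity pattern (70)
```

For a feasible unweighted Laplacian with matching $\bm{d}$, every term vanishes; a single stray entry on a non-edge is charged by the symmetry, row-sum, and mask terms at once.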

V-B3 Loss function for Optimization problem B

For Optimization problem B, we use almost the same loss function:

\displaystyle E_{\cal M}(\bm{L})\equiv\frac{1}{K}\sum_{i=1}^{K}\|\bm{\chi}(\bm{c}_{i})-\gamma_{i}\bm{1}\|^{2}_{2}+P_{B}(\bm{L}).

(84)

In this case, we use a penalty function matched to the constraints of Optimization problem B:

\displaystyle P_{B}(\bm{L})\equiv\rho_{1}\|\bm{L}-\bm{L}^{T}\|_{F}^{2}+\rho_{2}\|\bm{L}\bm{1}\|_{2}^{2}+\rho_{3}\left(\sum_{i=1}^{n}L_{ii}-D\right)^{2}+\rho_{4}\|\bm{L}\odot\bm{M}\|_{F}^{2}.

(85)

The third term of $P_{B}(\bm{L})$ corresponds to the diagonal sum condition (74).
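The penalty (85) differs from (80) only in its third term, which penalizes the deviation of the trace of $\bm{L}$ from $D$. A compact NumPy sketch follows; the function name is hypothetical, `M` is assumed to be the off-diagonal non-edge mask of (83), and the default $\rho_{3}=0.1$ mirrors the value used for the house graph in Section VI.

```python
import numpy as np


def penalty_B(L, D, M, rho=(10.0, 10.0, 0.1, 10.0)):
    """Penalty P_B(L) of (85); only the third term differs from P_A."""
    r1, r2, r3, r4 = rho
    ones = np.ones(L.shape[0])
    return (r1 * np.sum((L - L.T) ** 2)           # symmetry (72)
            + r2 * np.sum((L @ ones) ** 2)        # zero row sums (73)
            + r3 * (np.trace(L) - D) ** 2         # diagonal sum (74)
            + r4 * np.sum((L * M) ** 2))          # sparsity pattern (75)
```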

V-B4 Optimization process

The optimization process is summarized in Algorithm 1. It relies on the deep-unfolded Euler-Maruyama (DU-EM) method to approximate ${\sf MSE}(t^{*})$. The matrix $\bm{L}$ is initialized to the $n\times n$ zero matrix $\bm{O}^{n\times n}$. The main loop can be regarded as a stochastic gradient descent method that minimizes the loss. The update of $\bm{L}$ (line 5) can be performed by any optimizer, such as the Adam optimizer. The gradient of the loss function (line 4) can be evaluated easily with the automatic differentiation mechanisms of modern neural network frameworks such as TensorFlow, PyTorch, JAX, and Flux.jl (Julia). A block diagram of Algorithm 1 is shown in Fig. 4.

Algorithm 1 Optimization process using DU-EM method
Input: graph $G$, tolerance $\theta$, degree sequence $\bm{d}$ or degree sum $D$
Output: Laplacian matrix $\bm{L}_{out}$
1: Let $\bm{L}\equiv\bm{O}^{n\times n}$
2: for $i=1$ to $I$ do
3:   Generate a mini-batch ${\cal M}$ randomly.
4:   Compute the gradient of the loss function $\bm{g}\equiv\nabla E_{\cal M}(\bm{L})$
5:   Update the matrix $\bm{L}$ by using $\bm{g}$
6: end for
7: $\bm{L}_{out}\equiv\mbox{round}_{\theta,*}(\bm{L})$
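To make lines 1–6 of Algorithm 1 concrete without relying on a particular framework, the following NumPy sketch computes the gradient of the loss (77) for Optimization problem A by backpropagating through the unrolled EM recursion (78) by hand, which is the computation an automatic differentiation framework would perform. It is a sketch under several assumptions: $\eta=T/N$, plain gradient descent replaces the Adam optimizer, the mask `M` excludes the diagonal, the rounding at line 7 is omitted, and all names and hyperparameters (`lr`, `iters`) are hypothetical.

```python
import numpy as np


def loss_and_grad_A(L, C, gamma, d, M, alpha, T, N, rho, rng):
    """Mini-batch loss (77) for Problem A and its gradient w.r.t. L.

    The data term is differentiated by reverse mode (backpropagation)
    through the unrolled EM recursion (78); eta = T / N is assumed.
    """
    n, K = C.shape
    eta = T / N
    A = np.eye(n) - eta * L                      # one EM step: x <- A x + noise
    xs = [C]
    for _ in range(N):                           # forward pass, storing states
        Z = rng.standard_normal((n, K))
        xs.append(A @ xs[-1] + alpha * np.sqrt(eta) * Z)
    R = xs[-1] - gamma[np.newaxis, :]
    data = np.sum(R ** 2) / K
    G = 2.0 * R / K                              # gradient w.r.t. x^(N)
    grad = np.zeros_like(L, dtype=float)
    for k in range(N - 1, -1, -1):               # backward pass
        grad -= eta * G @ xs[k].T                # L enters as -eta * L @ x^(k)
        G = A.T @ G                              # gradient w.r.t. x^(k)
    r1, r2, r3, r4 = rho
    row = L.sum(axis=1)                          # L 1
    dd = np.diag(L) - d
    pen = (r1 * np.sum((L - L.T) ** 2) + r2 * np.sum(row ** 2)
           + r3 * np.sum(dd ** 2) + r4 * np.sum((L * M) ** 2))
    grad += (4 * r1 * (L - L.T) + 2 * r2 * np.outer(row, np.ones(n))
             + 2 * r3 * np.diag(dd) + 2 * r4 * L * M)
    return data + pen, grad


def optimize_A(d, M, alpha, T, N, K=25, iters=300, lr=0.005,
               rho=(10.0, 10.0, 10.0, 10.0), seed=0):
    """Lines 1-6 of Algorithm 1, with plain gradient descent instead of Adam."""
    rng = np.random.default_rng(seed)
    n = len(d)
    L = np.zeros((n, n))                         # line 1
    for _ in range(iters):                       # line 2
        C = rng.standard_normal((n, K))          # line 3: mini-batch
        gamma = C.mean(axis=0)
        _, g = loss_and_grad_A(L, C, gamma, d, M, alpha, T, N, rho, rng)  # line 4
        L = L - lr * g                           # line 5
    return L
```

Because each step $\bm{x}^{(k)}\mapsto\bm{x}^{(k+1)}$ is linear, the backward pass only needs the stored states and repeated multiplication by $(\bm{I}-\eta\bm{L})^{T}$; the penalty gradients are the closed-form derivatives of the four terms in (80).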

The stochastic optimization process in Algorithm 1 does not guarantee that the obtained solution is strictly feasible, because the constraints are enforced only through penalty terms. To ensure feasibility, we search for a feasible solution near the result of the optimization. This is accomplished by the round function $\mbox{round}_{\theta,*}(\cdot)$ at line 7 of Algorithm 1.

The details of the round function for Optimization problem A are given in Algorithm 2. The first step, $\bm{L}\equiv(\bm{L}_{in}+\bm{L}_{in}^{T})/2$, makes the matrix symmetric. The nested loop from line 2 to line 7 enforces the degree constraint and the constraint $L_{ij}=0$ for $(i,j)\notin E$. The single loop from line 9 to line 11 enforces the constraint $\bm{L}\bm{1}=\bm{0}$. Unless the round function $\mbox{round}_{\theta,\bm{d}}(\cdot)$ declares failure, its output strictly satisfies constraints (67)–(70) of Optimization problem A. A similar round function for Optimization problem B is presented in Algorithm 3.

Algorithm 2 Round function $\mbox{round}_{\theta,\bm{d}}(\cdot)$ for Opt. prob. A
Input: matrix $\bm{L}_{in}$, degree sequence $\bm{d}$, threshold value $\theta$
Output: Laplacian matrix $\bm{L}_{out}$ satisfying (67)–(70)
1: Let $\bm{L}\equiv(\bm{L}_{in}+\bm{L}_{in}^{T})/2$
2: for $i=1$ to $n$ do
3:   Let $L_{ii}\equiv d_{i}$
4:   for $j=1$ to $n$ do
5:     If $(i,j)\notin E$, then let $L_{ij}\equiv 0$
6:   end for
7: end for
8: $\bm{\epsilon}=(\epsilon_{1},\ldots,\epsilon_{n})^{T}\equiv\bm{L}\bm{1}$
9: for $i=1$ to $n$ do
10:   Let $L_{ii}\equiv L_{ii}-\epsilon_{i}$
11: end for
12: if $\|\mbox{diag}(\bm{L})-\bm{d}\|_{2}\geq\theta$ then
13:   Quit with declaration “optimization failed”
14: end if
15: Output $\bm{L}_{out}\equiv\bm{L}$

Algorithm 3 Round function $\mbox{round}_{\theta,D}(\cdot)$ for Opt. prob. B
Input: matrix $\bm{L}_{in}$, degree sum $D$, threshold value $\theta$
Output: Laplacian matrix $\bm{L}_{out}$
1: Let $\bm{L}\equiv(\bm{L}_{in}+\bm{L}_{in}^{T})/2$
2: for $i=1$ to $n$ do
3:   for $j=1$ to $n$ do
4:     If $(i,j)\notin E$, then let $L_{ij}\equiv 0$
5:   end for
6: end for
7: $\bm{\epsilon}=(\epsilon_{1},\ldots,\epsilon_{n})^{T}\equiv\bm{L}\bm{1}$
8: for $i=1$ to $n$ do
9:   Let $L_{ii}\equiv L_{ii}-\epsilon_{i}$
10: end for
11: if $\left|\sum_{i=1}^{n}L_{ii}-D\right|\geq\theta$ then
12:   Quit with declaration “optimization failed”
13: end if
14: Output $\bm{L}_{out}\equiv\bm{L}$
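The two round functions share their symmetrization, masking, and row-sum correction steps, so the following NumPy sketch implements Algorithms 2 and 3 together. Returning `None` stands in for “optimization failed”; the helper names and the 0-based edge list are assumptions. After the row-sum correction, each diagonal entry equals $-\sum_{j\neq i}L_{ij}$, and the final check measures how far these values are from $\bm{d}$ (or how far their sum is from $D$).

```python
import numpy as np


def _symmetrize_and_mask(L_in, edges):
    """Symmetrize L_in and zero every off-diagonal entry outside E."""
    L_in = np.asarray(L_in, dtype=float)
    L = (L_in + L_in.T) / 2.0
    keep = np.eye(L.shape[0], dtype=bool)        # diagonal is fixed later
    for i, j in edges:
        keep[i, j] = keep[j, i] = True
    return np.where(keep, L, 0.0)


def _zero_row_sums(L):
    """Subtract each row sum epsilon_i = (L 1)_i from the diagonal entry L_ii."""
    return L - np.diag(L.sum(axis=1))


def round_A(L_in, d, edges, theta):
    """Sketch of Algorithm 2; None plays the role of "optimization failed"."""
    L = _symmetrize_and_mask(L_in, edges)
    np.fill_diagonal(L, d)                       # line 3: L_ii = d_i
    L = _zero_row_sums(L)                        # lines 8-11
    if np.linalg.norm(np.diag(L) - d) >= theta:  # lines 12-14
        return None
    return L


def round_B(L_in, D, edges, theta):
    """Sketch of Algorithm 3; None plays the role of "optimization failed"."""
    L = _zero_row_sums(_symmetrize_and_mask(L_in, edges))
    if abs(np.trace(L) - D) >= theta:            # lines 11-13
        return None
    return L
```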

VI Numerical results

VI-A Choice of the number of bins for the EM method

In the previous sections, we proposed a DU-based optimization method. This section presents the results of numerical experiments, for which we used the automatic differentiation mechanism provided by Flux.jl [25] in the Julia language [26].

Before discussing the optimization of the MSE, we first examine the choice of the number of bins $N$. A small $N$ is beneficial for computational efficiency, but it may lead to inaccurate estimates of the MSE. In this subsection, we compare Monte Carlo estimates of the MSE obtained by the EM method for several values of $N$.

The Karate graph (Zachary's karate club network) is a well-known small social network. It represents the relationships between 34 members of a karate club at a university: its 34 nodes represent the members of the club, and its 78 edges represent the relationships between the members.

Figure 5 compares three cases, $N=100,250,1000$. No visible difference is observed in the range from $T=0$ to $T=5$. In the following experiments, we therefore use $N=250$ for the EM method.

VI-B Petersen graph (Optimization problem A)

The Petersen graph is a 3-regular graph with $n=10$ nodes (Fig. 6(a)). In this subsection, we examine the behavior of our optimization algorithm for ${\sf MSE}(t)$ on the Petersen graph.

An adjacency matrix $\bm{A}\equiv\{A_{ij}\}\in\mathbb{R}^{n\times n}$ of a graph $G\equiv(V,E)$ is defined by

\displaystyle A_{ij}\equiv\left\{\begin{array}[]{cc}1,&(i,j)\in E\\ 0,&\mbox{otherwise}.\end{array}\right.

(88)

An unweighted Laplacian matrix $\bm{L}$ is defined by

\displaystyle\bm{L}\equiv\bm{D}-\bm{A},

(89)

The degree matrix $\bm{D}=\{D_{ij}\}$ is a diagonal matrix whose entry $D_{ii}$ is the degree of node $i$. Namely, an unweighted Laplacian corresponds to the case where $\mu_{ij}=\mu_{ji}=1$ for any $(i,j)\in E$.
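The definitions (88) and (89) translate directly into code. The NumPy sketch below builds $\bm{L}=\bm{D}-\bm{A}$ from a 0-based edge list and applies it to the Petersen graph, constructed here (as an assumption about node labeling) from an outer 5-cycle, an inner pentagram, and five spokes; the function name is hypothetical.

```python
import numpy as np


def unweighted_laplacian(n, edges):
    """L = D - A for an undirected graph, as in (88)-(89) (0-based nodes)."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


# Petersen graph: outer 5-cycle, inner pentagram, and five spokes.
PETERSEN_EDGES = ([(i, (i + 1) % 5) for i in range(5)]
                  + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
                  + [(i, 5 + i) for i in range(5)])

L_P = unweighted_laplacian(10, PETERSEN_EDGES)
```

Every diagonal entry of `L_P` equals 3, matching $\bm{d}=(3,3,\ldots,3)$, and the nonzero eigenvalues of the Petersen Laplacian are 2 (multiplicity 5) and 5 (multiplicity 4).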

In the following discussion, let $\bm{L}_{P}$ be the unweighted Laplacian matrix of the Petersen graph. We consider Optimization problem A with the degree sequence $\bm{d}\equiv\mbox{diag}(\bm{L}_{P})=(3,3,\ldots,3)$.

The parameter setting is as follows. The mini-batch size is set to $K=25$ . The noise intensity is $\alpha=0.3$ . The penalty coefficients are $\rho_{1}=\rho_{2}=\rho_{3}=\rho_{4}=10$ . For time discretization, we use $T=4,N=250$ . The number of iterations for an optimization process is set to 3000. The tolerance parameter is set to $\theta=0.1$ . In the optimization process, we used the Adam optimizer with a learning rate of 0.01.

The loss values during an optimization process of Algorithm 1 are presented in Fig. 7. In the initial stage, the loss value is relatively high because the initial $\bm{L}$ is the zero matrix, for which the nodes do not interact and the system cannot achieve average consensus. The loss value decreases monotonically until around iteration 700 and then fluctuates for $700\leq k\leq 3000$. This indicates that the matrix $\bm{L}$ in Algorithm 1 is updated appropriately and that the loss value, which approximates ${\sf MSE}(t^{*})$, decreases.

Let $\bm{L}^{*}$ denote the Laplacian matrix obtained by the optimization process. Table I summarizes several important quantities regarding $\bm{L}^{*}$. The first four rows of Table I confirm that $\bm{L}^{*}$ is a feasible solution satisfying (67)–(70): the first three quantities vanish, and $\|\mbox{diag}(\bm{L}^{*})-\bm{d}\|_{2}=5.74\times 10^{-3}$ is below the tolerance $\theta=0.1$. This result confirms that the round function $\mbox{round}_{\theta,\bm{d}}(\cdot)$ works as intended. The last row of Table I shows that $\bm{L}^{*}$ is very close to the unweighted Laplacian matrix $\bm{L}_{P}$. Since the Petersen graph is regular and highly symmetric, we conjecture that $\bm{L}_{P}$ is the optimal solution of Optimization problem A; the closeness between $\bm{L}_{P}$ and $\bm{L}^{*}$ is consistent with this conjecture.

TABLE I: Several quantities for the optimization result $\bm{L}^{*}$

Quantity	Value
$\|\bm{L}^{*}-(\bm{L}^{*})^{T}\|_{F}$	0
$\|\bm{L}^{*}\bm{1}\|_{2}$	0
$\|\bm{L}^{*}\odot\bm{M}\|_{F}$	0
$\|\mbox{diag}(\bm{L}^{*})-\bm{d}\|_{2}$	$5.74\times 10^{-3}$
$\|\bm{L}_{P}-\bm{L}^{*}\|_{F}$	0.188

The MSE values of the optimization result $\bm{L}^{*}$ and the unweighted Laplacian matrix $\bm{L}_{P}$ are presented in Fig. 8. These values are evaluated by the MSE formula (64). No visible difference can be seen between the two curves, which indicates that Algorithm 1 found a good solution for Optimization problem A in this case.

VI-C Karate graph (Optimization problem A)

We now consider Optimization problem A on the Karate graph. Let $\bm{L}_{K}$ be the unweighted Laplacian matrix of the Karate graph. The target degree sequence is set to $\bm{d}\equiv\mbox{diag}(\bm{L}_{K})=(16,9,10,\ldots,12,17)$. The parameter setting for an optimization process is as follows. The mini-batch size is set to $K=50$. The noise intensity is set to $\alpha=0.3$. The penalty coefficients are $\rho_{1}=\rho_{2}=\rho_{3}=\rho_{4}=10$. We use $T=2,N=250$ for the DU-EM method. The number of iterations for an optimization process is set to 5000. The tolerance is set to $\theta=0.1$. In the optimization process, we used the Adam optimizer with a learning rate of 0.01.

Let $\bm{L}^{*}$ be the Laplacian matrix obtained by an optimization process. The matrix $\bm{L}^{*}$ is a feasible solution satisfying all the constraints (67)–(70); for example, $\|\mbox{diag}(\bm{L}^{*})-\bm{d}\|_{2}=0.0894<0.1$. Figure 9 presents the absolute values of the off-diagonal elements of $\bm{L}_{K}$ and $\bm{L}^{*}$. By definition, every nonzero off-diagonal element of $\bm{L}_{K}$ has absolute value one (left panel). In contrast, the off-diagonal elements of $\bm{L}^{*}$ take absolute values ranging from $0$ to $1.5$.

We present the MSE values of the optimization result $\bm{L}^{*}$ and the unweighted Laplacian matrix $\bm{L}_{K}$ in Fig. 10. These values are evaluated by the MSE formula (64). The optimized Laplacian $\bm{L}^{*}$ provides smaller MSE values; in this case, an appropriate assignment of the weights $\mu_{ij}$ improves the noise immunity of the system. The inverse eigenvalue sums of the Laplacian matrices $\bm{L}_{K}$ and $\bm{L}^{*}$ are 13.83 and 13.41, respectively. Thus, Algorithm 1 successfully provides a feasible Laplacian matrix with a smaller inverse eigenvalue sum, which, as shown in (65), determines the behavior of ${\sf MSE}(t)$.

VI-D House graph (Optimization problem B)

The house graph (Fig.6(c)) is a small irregular graph with 5 nodes defined by the adjacency matrix:

\displaystyle\bm{A}=\begin{pmatrix}0&1&1&0&0\\ 1&0&0&1&0\\ 1&0&0&1&1\\ 0&1&1&0&1\\ 0&0&1&1&0\\ \end{pmatrix}.

(90)

We thus have the unweighted Laplacian $\bm{L}_{H}$ of the house graph as

\displaystyle\bm{L}_{H}=\begin{pmatrix}2&-1&-1&0&0\\ -1&2&0&-1&0\\ -1&0&3&-1&-1\\ 0&-1&-1&3&-1\\ 0&0&-1&-1&2\\ \end{pmatrix},

(91)

where the diagonal sum of $\bm{L}_{H}$ is 12.
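The inverse eigenvalue sum that governs ${\sf MSE}(t)$ can be computed from the nonzero eigenvalues of $\bm{L}$. This NumPy sketch (function name hypothetical; a connected graph is assumed, so exactly one eigenvalue is zero) evaluates it for $\bm{L}_{H}$ in (91). The result, $90/55\approx 1.64$, agrees with the value reported for $\bm{L}_{H}$ in this subsection.

```python
import numpy as np


def inverse_eigenvalue_sum(L, tol=1e-9):
    """Sum of 1/lambda over the nonzero eigenvalues of a symmetric Laplacian.

    Assumes a connected graph, so exactly one eigenvalue is (numerically) zero.
    """
    lam = np.linalg.eigvalsh(L)
    return float(np.sum(1.0 / lam[lam > tol]))


L_H = np.array([[ 2, -1, -1,  0,  0],
                [-1,  2,  0, -1,  0],
                [-1,  0,  3, -1, -1],
                [ 0, -1, -1,  3, -1],
                [ 0,  0, -1, -1,  2]], dtype=float)

print(round(inverse_eigenvalue_sum(L_H), 2))   # 1.64
```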

We performed two optimizations, for $D=12$ and $D=24$. The parameter setting is almost the same as the one used in the previous subsection; the only difference is that the diagonal sum penalty constant is set to $\rho_{3}=0.1$. As a result of the optimization processes, we obtained two Laplacian matrices, $\bm{L}^{*}_{12}$ $(D=12)$ and $\bm{L}^{*}_{24}$ $(D=24)$:

\displaystyle\bm{L}^{*}_{12}=\begin{pmatrix}2.29&-1.05&-1.23&0&0\\ -1.05&2.29&0&-1.24&0\\ -1.23&0&2.70&-0.44&-1.03\\ 0&-1.24&-0.44&2.71&-1.03\\ 0&0&-1.03&-1.03&2.06\\ \end{pmatrix}

(92)

\displaystyle\bm{L}^{*}_{24}=\begin{pmatrix}4.80&-2.70&-2.09&0&0\\ -2.70&4.79&0&-2.08&0\\ -2.09&0&4.81&-0.37&-2.35\\ 0&-2.08&-0.37&4.85&-2.40\\ 0&0&-2.35&-2.40&4.74\\ \end{pmatrix}

(93)

The diagonal sums of $\bm{L}^{*}_{12}$ and $\bm{L}^{*}_{24}$ are 12.04 and 23.99, respectively. Compared with those of $\bm{L}_{H}$, the diagonal elements of $\bm{L}^{*}_{12}$ are flatter:

$\displaystyle\mbox{diag}(\bm{L}^{*}_{12})=(2.29,2.29,2.70,2.71,2.06)^{T},$	(94)
$\displaystyle\mbox{diag}(\bm{L}_{H})=(2,2,3,3,2)^{T}.$	(95)

The MSE values of the optimization results $\bm{L}^{*}_{12},\bm{L}^{*}_{24}$ and the unweighted Laplacian matrix $\bm{L}_{H}$ are shown in Fig. 11. The matrix $\bm{L}_{12}^{*}$ achieves slightly smaller MSE values than the unweighted Laplacian matrix $\bm{L}_{H}$, and $\bm{L}^{*}_{24}$ provides much smaller MSE values than $\bm{L}_{H}$. The sums of inverse eigenvalues are $1.64,1.59,0.82$ for $\bm{L}_{H}$, $\bm{L}_{12}^{*}$, and $\bm{L}_{24}^{*}$, respectively.

VI-E Barabási-Albert (BA) random graphs (Optimization problem B)

As an example of random scale-free networks, we consider the Barabási-Albert random graph [27], which is generated by a preferential attachment mechanism. Each new node is connected to 5 existing nodes.

In this experiment, we generated an instance of the Barabási-Albert random graph with 50 nodes. The unweighted Laplacian of the instance is denoted by $\bm{L}_{B}$, and the sum of its diagonal elements is $450$. The parameter setting for optimization is the same as the one used in the previous subsection except for $D=450$. The output of the optimization algorithm is denoted by $\bm{L}^{*}$.
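The preferential attachment mechanism can be sketched in a few lines of NumPy. This is a hypothetical helper, not the generator used for the experiments: it assumes an initial star on $m+1$ nodes and draws the $m$ targets of each new node without replacement with probabilities proportional to current degrees, so library generators such as NetworkX's `barabasi_albert_graph` may differ in initialization and sampling details. With $n=50$ and $m=5$ its degree sum is $2(m+m(n-m-1))=450$, in line with the diagonal sum quoted for $\bm{L}_{B}$.

```python
import numpy as np


def barabasi_albert_edges(n, m, rng):
    """Preferential-attachment sketch (hypothetical helper).

    Starts from a star on m + 1 nodes; every later node v is joined to m
    distinct earlier nodes drawn with probabilities proportional to their
    current degrees (sampling without replacement).
    """
    edges = [(0, j) for j in range(1, m + 1)]   # initial star, center 0
    deg = np.zeros(n)
    deg[0] = m
    deg[1:m + 1] = 1
    for v in range(m + 1, n):
        p = deg[:v] / deg[:v].sum()
        targets = rng.choice(v, size=m, replace=False, p=p)
        for u in targets:
            edges.append((int(u), v))
            deg[u] += 1
        deg[v] = m
    return edges


edges = barabasi_albert_edges(50, 5, np.random.default_rng(0))
print(2 * len(edges))   # degree sum = trace of the unweighted Laplacian: 450
```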

Figure 12 presents the MSE values of the original unweighted Laplacian $\bm{L}_{B}$ and the optimization output $\bm{L}^{*}$ . We can observe that the optimized MSE values are substantially smaller than those of the unweighted Laplacian $\bm{L}_{B}$ . The sums of inverse eigenvalues for $\bm{L}^{*}$ and $\bm{L}_{B}$ are $6.44$ and $7.16$ , respectively.

Figure 13 shows the diagonal elements of $\bm{L}^{*}$ and $\bm{L}_{B}$. The diagonal elements of $\bm{L}^{*}$ are almost flat, whereas those of $\bm{L}_{B}$ range from 5 to 21. This observation is consistent with the tendency observed for the house graph in the previous subsection.

VII Conclusion

In this paper, we have formulated a noisy average consensus system as an SDE. This formulation enables an analytical study of the stochastic dynamics of the system. We derived a formula for the evolution of the covariance for the EM method. Through the weak convergence property, we established Theorem 1 and derived an MSE formula that provides the MSE at time $t$. Analysis of the MSE formula reveals that the sum of the inverse eigenvalues of the Laplacian matrix is the most significant factor affecting the MSE dynamics. To optimize the edge weights, we presented a deep unfolding-based technique, and the quality of its solutions has been validated by numerical experiments.

The theoretical understanding gained in this study is also expected to provide valuable perspectives on consensus-based distributed algorithms in noisy environments. In addition, the optimization methodology proposed in this paper is versatile and can be adapted to various algorithms operating on graphs. Exploring such applications remains an open direction for future work.

Acknowledgement

This study was supported by JSPS Grant-in-Aid for Scientific Research (A) Grant Number 22H00514. The authors thank Prof. Masaki Ogura for pointing out the related work [11] on discrete-time average consensus systems.

References

[1] T. Wadayama and A. Nakai-Kasai, “Continuous-time noisy average consensus system as Gaussian multiple access channel,” IEEE International Symposium on Information Theory (ISIT), 2022.
[2] R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents with switching topology and time-delays,” IEEE Trans. Automat. Contr., vol. 49, no. 9, pp. 1520–1533, Sept. 2004.
[3] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007.
[4] W. Ren and R. W. Beard, “Consensus seeking in multi-agent systems under dynamically changing interaction topologies,” IEEE Transactions on Automatic Control, vol. 50, no. 5, pp. 655–661, 2005.
[5] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” Systems and Control Letters, vol. 53, pp. 65–78, 2004.
[6] W. Ren, “Consensus strategies for cooperative control of vehicle formations,” IET Control Theory and Applications, vol. 2, pp. 505–512, 2007.
[7] P. E. Kloeden and E. Platen, “Numerical solution of stochastic differential equations,” Springer-Verlag, 1991.
[8] B. Oksendal, “Stochastic differential equations: an introduction with applications,” Springer, 2010.
[9] C. Godsil and G. F. Royle, “Algebraic graph theory,” Springer, 2001.
[10] F. Chung, “Spectral graph theory,” American Mathematical Society, 1997.
[11] A. Jadbabaie and A. Olshevsky, “On performance of consensus protocols subject to noise: Role of hitting times and network structure,” IEEE 55th Conference on Decision and Control (CDC), pp. 179–184, 2016.
[12] R. Rajagopal and M. J. Wainwright, “Network-based consensus averaging with general noisy channels,” IEEE Trans. Signal Process., vol. 59, no. 1, pp. 373–385, Jan. 2011.
[13] M. Kishida, M. Ogura, Y. Yoshida, and T. Wadayama, “Deep learning-based average consensus,” IEEE Access, vol. 8, pp. 142404–142412, 2020.
[14] B. Aazhang, B. P. Paris, and G. C. Orsak, “Neural networks for multiuser detection in code-division multiple-access communications,” IEEE Trans. Comm., vol. 40, no. 7, pp. 1212–1222, Jul. 1992.
[15] E. Nachmani, Y. Be’ery, and D. Burshtein, “Learning to decode linear codes using deep learning,” 2016 54th Annual Allerton Conf. Comm., Control, and Computing, 2016, pp. 341–346.
[16] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cog. Comm. Net., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[17] Y. A. LeCun, L. Bottou, G. B. Orr, and K. R. Müller, “Efficient backprop,” in Neural networks: Tricks of the trade, G. B. Orr and K. R. Müller, Eds. Springer-Verlag, London, UK, 1998, pp. 9–50.
[18] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, Oct. 1986.
[19] M. Borgerding and P. Schniter, “Onsager-corrected deep learning for sparse linear inverse problems,” 2016 IEEE Global Conf. Signal and Inf. Process. (GlobalSIP), Washington, DC, Dec. 2016, pp. 227–231.
[20] M. Borgerding, P. Schniter, and S. Rangan, “AMP-inspired deep networks for sparse linear inverse problems,” IEEE Trans. Signal Process., vol. 65, no. 16, pp. 4293–4308, Aug. 2017.
[21] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” Proc. 27th Int. Conf. Machine Learning, pp. 399–406, 2010.
[22] A. Balatsoukas-Stimming and C. Studer, “Deep unfolding for communications systems: A survey and some new directions,” 2019 IEEE International Workshop on Signal Processing Systems (SiPS), pp. 266–271, 2019.
[23] D. Ito, S. Takabe, and T. Wadayama, “Trainable ISTA for sparse signal recovery,” IEEE Transactions on Signal Processing, vol. 67, no. 12, pp. 3113–3125, 2019.
[24] C. Rackauckas, Y. Ma, J. Martensen, C. Warner, K. Zubov, R. Supekar, D. Skinner, A. Ramadhan, and A. Edelman, “Universal differential equations for scientific machine learning,” arXiv:2001.04385, 2020.
[25] M. Innes, “Flux: Elegant machine learning with Julia,” Journal of Open Source Software, 2018.
[26] J. Bezanson, S. Karpinski, V. B. Shah, and A. Edelman, “Julia: A fast dynamic language for technical computing,” arXiv preprint arXiv:1209.5145, 2012.
[27] R. Albert and A.-L. Barabási, “Statistical mechanics of complex networks,” Rev. Mod. Phys., vol. 74, pp. 47–97, 2002.
