
DOT and DOP: Linearly Convergent Algorithms for Finding Fixed Points of Multi-Agent Operators

Xiuxian Li, Member, IEEE, Min Meng, and Lihua Xie, Fellow, IEEE. This research was supported by Ministry of Education, Singapore, under grant AcRF TIER 1-2019-T1-001-088 (RG72/19), the National Natural Science Foundation of China under Grant 62003243, Shanghai Municipal Commission of Science and Technology under Grant 19511132101, and Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0100. (Corresponding author: Lihua Xie.) X. Li and M. Meng are with the Department of Control Science and Engineering, College of Electronics and Information Engineering, Institute for Advanced Study, and Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai, China (e-mail: xli@tongji.edu.cn, mengmin@tongji.edu.cn). L. Xie is with the School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore (e-mail: elhxie@ntu.edu.sg).
Abstract

This paper investigates the distributed fixed point finding problem for a global operator over a directed and unbalanced multi-agent network, where the global operator is quasi-nonexpansive and only partially accessible to each individual agent. Two cases are addressed, that is, the global operator is sum separable and block separable. In the first case, the global operator is the sum of local operators, which are assumed to be Lipschitz, and each local operator is privately known to one individual agent. To deal with this scenario, a distributed (or decentralized) algorithm, called the Distributed quasi-averaged Operator Tracking algorithm (DOT), is proposed and rigorously analyzed, and it is shown that the algorithm converges to a fixed point of the global operator at a linear rate under a linear regularity condition, which is strictly weaker than the strong convexity assumption on cost functions in the existing convex optimization literature. In the second scenario, the global operator is composed of a group of local block operators which are Lipschitz and can each be accessed only by one individual agent. In this setup, a distributed algorithm, called the Distributed quasi-averaged Operator Playing algorithm (DOP), is developed and shown to be linearly convergent to a fixed point of the global operator under the linear regularity condition. The studied problems provide a unified framework for many interesting problems. As examples, the proposed DOT and DOP are exploited to deal with distributed optimization and multi-player games under partial-decision information. Finally, numerical examples are presented to corroborate the theoretical results.

Index Terms:
Distributed algorithms, multi-agent networks, linear convergence, fixed point, bounded linear regularity, distributed optimization, game, real Hilbert spaces.

I Introduction

Fixed point theory in real Hilbert spaces is known as a powerful tool in a variety of domains such as optimization, engineering, economics, game theory, and nonlinear numerical analysis [1, 2]. Generally speaking, the main goal is to devise algorithms for computing a fixed point of an operator.

Up to now, plenty of research has addressed centralized algorithms for finding a fixed point of nonexpansive or quasi-nonexpansive operators in the literature [3, 4, 5, 6], where a central/global coordinator or computing unit is able to access all information of the studied problem. It is known that the typical Picard iteration in general does not converge for nonexpansive operators (e.g., a simple example is the operator -Id with nonzero initial points, where Id is the identity operator), although it usually performs well for contractive operators. For nonexpansive operators, one prominent algorithm is the so-called Krasnosel'skiĭ-Mann (KM) iteration [7, 8], which is shown to converge weakly to a fixed point of a nonexpansive operator in real Hilbert spaces under mild conditions [9].

In recent decades, distributed (or decentralized) algorithms have been an active topic in a wide range of domains, including fixed point theory, computer science, game theory, and control theory, mostly inspired by the fact that distributed algorithms, in contrast with centralized ones, possess a host of fascinating advantages, such as low cost, robustness to failures or antagonistic attacks, privacy preservation, and low computational complexity. Distributed algorithms do not assume global/central coordinators or computing units; instead, a finite group of agents (e.g., computing units, robots, and so on), who may be spatially separated, aim to solve a global problem in a collaborative manner through local information exchanges. Therein, local information exchanges are often depicted by a simple graph, connoting that every agent can interact with only a subset of agents, instead of all agents. Along this line, distributed algorithms have thus far been investigated extensively under both fixed and time-varying communication graphs in distributed optimization [10, 11, 12], game theory [13], and multi-agent systems/networks [14, 15], to name just a few.

In more recent years, distributed algorithms have received growing attention in the fixed point finding problem [16, 17, 18, 19, 20, 21]. For instance, a synchronous distributed algorithm was proposed in [16] for computing a common fixed point of a collection of paracontraction operators, and for the same problem, an asynchronous distributed algorithm was developed in [17]. Meanwhile, different from paracontraction operators, another type of operators (i.e., strongly quasi-nonexpansive operators) was addressed for the common fixed point seeking problem in [18] by designing a distributed algorithm in the presence of time-varying delays under the assumption that the communication graph is repeatedly jointly strongly connected. It should be noted that many interesting problems boil down to the common fixed point finding problem, such as convex feasibility problems [22, 23] and the problem of solving linear algebraic equations in a distributed fashion [24, 25, 26], and so forth. For example, the linear algebraic equation solving problem can be formulated in a way that does not require knowledge of the distribution of random communication graphs, accommodating asynchronous updates and/or unreliable interconnection protocols. Notice that all the aforementioned works are in the Euclidean space. Regarding the Hilbert space, the authors in [19] investigated distributed optimization under random and directed interconnection graphs, where a distributed algorithm was proposed and shown to be convergent in both the almost sure and mean square senses, along with the introduction of a novel convex minimization problem over the fixed-value point set of a nonexpansive random operator. In addition, the authors in [20] took into account the common fixed point finding problem for a finite collection of nonexpansive operators, where two distributed algorithms were proposed with a full coordinate updating and a random block-coordinate updating, respectively; compared with [16, 17, 18], the contributions of [20] lie in the study of real Hilbert spaces, the consideration of operator errors, and the establishment of a sublinear convergence speed. Furthermore, a more general scenario, where no common fixed points are assumed for all local operators, was investigated in [21], where two distributed algorithms were devised to resolve the problem. It is noteworthy that, to the best of our knowledge, [21] is the first to investigate the fixed point finding problem of a global operator in real Hilbert spaces, where the global operator is an average of local operators over a multi-agent network. Nevertheless, the convergence rate is not analyzed for the algorithms proposed in [21].

Motivated by the above facts, the purpose of this paper is to further investigate the fixed point finding problem of a quasi-nonexpansive global operator over a time-invariant, directed, and unbalanced communication graph. Two scenarios are taken into account, that is, the global operator is sum separable and block separable. In the first case, the global operator is composed of a sum of local operators, and in the second case, the global operator comprises a family of local block operators. That is, in both cases the global operator is separable and consists of local operators, which are assumed to be Lipschitz and are only privately accessible to each individual agent, thereby requiring all agents to tackle the global problem in a collaborative manner. The contributions of this paper are threefold.

  1.

    For the first case, a distributed algorithm, called distributed quasi-averaged operator tracking algorithm (DOT), is developed and shown to be convergent to a fixed point of the global operator at a linear rate under a linear regularity condition. Compared with the closely related work [21], where no convergence speed is provided, a different algorithm is developed here and shown to be convergent at a linear rate. It should be noted that the problem here is more general than the common fixed point seeking problem [17, 16, 18, 20], where all local operators are assumed to have at least one common fixed point, while this assumption is dropped here. As a special case, linear convergence can also be ensured for the common fixed point seeking problem. In contrast, [20] only provides a sublinear rate for nonexpansive operators.

  2.

    For the second case, a distributed algorithm, called distributed quasi-averaged operator playing algorithm (DOP), is proposed, which is shown to be linearly convergent to a fixed point of the global operator under the linear regularity condition. To the best of our knowledge, this is the first work to study the block separable case in a decentralized manner.

  3.

    The studied setups in this paper provide a unified framework for a host of interesting problems. For example, the proposed DOT and DOP algorithms can be leveraged to resolve distributed optimization and multi-player games under partial-decision information.

A preliminary version of this paper was presented at a conference [27]. The present paper extends the results of [27] in several ways. [27] only considers the sum separable case without providing detailed proofs of the main result. In comparison, a full proof of the main result (i.e., Theorem 1) for the sum separable case is provided here. Besides, one more scenario is investigated in this paper, i.e., the block separable case (see Theorem 2). Also, one more application is presented here, i.e., multi-player games under partial-decision information, along with one more numerical example.

The structure of this paper is as follows. Some basic knowledge and the problem formulation are introduced in Section II, and the first case with the global operator being sum separable is addressed in Section III, followed by the second case with the global operator being block-coordinate separable in Section IV. Several applications are provided in Section V. Some numerical examples are presented in Section VI, and the conclusion is drawn in Section VII.

II Preliminaries and Problem Formulation

II-A Notations

Let \mathcal{H} be a real Hilbert space with inner product \langle\cdot,\cdot\rangle and associated norm \|\cdot\|. Define [N]:=\{1,2,\ldots,N\} for any integer N>0, and denote by col(z_{1},\ldots,z_{k}) the column vector or matrix obtained by stacking up z_{i},i\in[k]. Given an integer n>0, denote by \mathbb{R}, \mathbb{R}^{n}, \mathbb{R}^{n\times n}, and \mathbb{N} the sets of real numbers, n-dimensional real vectors, n\times n real matrices, and nonnegative integers, respectively. Let P_{X}(\cdot) represent the projection operator onto a closed and convex set X\subseteq\mathcal{H}, i.e., P_{X}(z):=\mathop{\arg\min}_{x\in X}\|z-x\| for z\in\mathcal{H}. Moreover, denote by I, Id, and \otimes the identity matrix of appropriate dimension, the identity operator, and the Kronecker product, respectively. Let {\bf 1}_{n} be an n-dimensional vector with all entries 1 for an integer n>0, and the subscript is omitted when the dimension is clear from the context. d_{X}(z):=\inf_{x\in X}\|z-x\| denotes the distance from z\in\mathcal{H} to the set X. For an operator T:\mathcal{H}\to\mathcal{H}, define Fix(T):=\{x\in\mathcal{H}\,|\,T(x)=x\} to be the set of fixed points of T and T_{\beta}:=Id+\beta(T-Id), called a \beta-relaxation of T with a relaxation parameter \beta\geq 0. Denote by M_{\infty} the infinite power of a square matrix M, i.e., M_{\infty}=\lim_{k\to\infty}M^{k}, if it exists, and let \rho(M) and det(M) be the spectral radius and determinant of M, respectively.

II-B Operator Theory

Consider an operator T:S\to\mathcal{H} for a nonempty set S\subseteq\mathcal{H}. T is called L-Lipschitz (continuous) for a constant L>0 if

\|T(x)-T(y)\|\leq L\|x-y\|,~~~\forall x,y\in S. (1)

Further, T is called nonexpansive (resp. contractive) if L=1 (resp. L<1), quasi-nonexpansive (QNE) if (1) holds with L=1 for all x\in S and y\in Fix(T), and \rho-strongly quasi-nonexpansive (\rho-SQNE) for \rho>0 if it holds that for all x\in S and y\in Fix(T),

\|T(x)-y\|^{2}\leq\|x-y\|^{2}-\rho\|x-T(x)\|^{2}. (2)

T is called \eta-averaged (resp. \eta-quasi-averaged) for \eta\in(0,1) if it can be written as

T=(1-\eta)Id+\eta R, (3)

where R is some nonexpansive (resp. quasi-nonexpansive) operator.

The aforementioned concepts can be found in Section 4 of [1], while quasi-averaged operators are defined here as an analogue of averaged operators. It is well known that when T is QNE, the fixed point set Fix(T) is closed and convex (cf., Corollary 4.24 in [1]).

The operator T is said to be boundedly linearly regular if for any bounded set \mathbb{K}\subseteq S, there exists a constant \omega>0 such that

d_{Fix(T)}(x)\leq\omega\|x-Tx\|,~~~\forall x\in\mathbb{K}

and T is called linearly regular if \omega is independent of \mathbb{K} [28].

It is easy to observe that (bounded) linear regularity means that the distance between x and Tx is lower bounded by a scaled distance from the vector x to the set Fix(T), and bounded linear regularity is weaker than linear regularity. For instance, the projection operator P_{C} is linearly regular with constant 1, where C\subseteq\mathcal{H} is nonempty, closed and convex, and when \mathcal{H}=\mathbb{R}, the thresholder operator

T(x)=\left\{\begin{array}{ll}0,&\text{if }|x|\leq 1\\ x-1,&\text{if }x>1\\ x+1,&\text{if }x<-1\end{array}\right. (7)

is boundedly linearly regular with \omega=\max\{|x|,1\}, but not linearly regular [29]. Moreover, it has been shown in [30] that linear regularity is necessary and sufficient for global Q-linear convergence to a fixed point for an SQNE operator. More details on (bounded) linear regularity can be found in [29, 30, 31].
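To make the example above concrete, the following short computation (our own verification, not taken from [29]) checks both claims for the thresholder operator in (7).

```latex
% Fixed points: T(x)=x forces x=0, so Fix(T)=\{0\} and d_{Fix(T)}(x)=|x|.
% Residual: \|x-Tx\|=|x| if |x|\le 1 and \|x-Tx\|=1 if |x|>1, i.e., \|x-Tx\|=\min\{|x|,1\}.
\[
  \frac{d_{Fix(T)}(x)}{\|x-Tx\|}
  = \frac{|x|}{\min\{|x|,1\}}
  = \begin{cases}
      1,   & 0<|x|\le 1,\\
      |x|, & |x|>1.
    \end{cases}
\]
% Hence d_{Fix(T)}(x)\le \max\{|x|,1\}\,\|x-Tx\| on any bounded set \mathbb{K},
% but no single \omega>0 works on all of \mathbb{R}, since the ratio grows without bound as |x|\to\infty.
```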

II-C Problem Formulation

The aim of this paper is to compute a fixed point of a global operator F:\mathcal{H}\to\mathcal{H}, i.e.,

\text{find}~x\in\mathcal{H},~~\text{s.t.}~~x\in Fix(F). (8)

It is worth mentioning that problem (8) in real Hilbert spaces (but not in \mathbb{R}^{n}) can find applications, for example, in digital signal processing and L^{2}([0,\pi]) (i.e., the space of all square integrable functions f:[0,\pi]\to\mathbb{R}) [32], and so on.

In this paper, no global/central coordinator, master, or computing unit is assumed to exist for problem (8); instead, the global operator F is separable and consists of local operators, which are privately accessible to individual agents in a network. Briefly speaking, only partial information of F can be privately known by each agent in the network, which is interesting and realistic in large-scale problems, as extensively studied in distributed optimization and distributed machine learning, and so on. Specifically, we focus on two scenarios: 1) F is sum separable and 2) F is block separable, as elaborated below.

Case 1: Sum Separable. F is sum separable over a network of N agents, that is,

\text{{\bf(Problem I)}~~~~~find}~x\in Fix(F),~~F=\frac{1}{N}\sum_{i=1}^{N}F_{i}, (9)

where each F_{i}:\mathcal{H}\to\mathcal{H} is a local operator, only privately accessible to agent i for i\in[N]. Note that the formulation (9) is also investigated in [21], where, however, no convergence rates are provided, while a linear convergence speed is established in this paper. Moreover, the formulation (9) is more general than the common fixed point finding problem [16, 17, 18, 19, 20, 21] and the linear algebraic equation solving problem [24, 25, 26], as discussed in the introduction.

Case 2: Block (or Block-Coordinate) Separable. In this case, \mathcal{H}=\mathcal{H}_{1}\oplus\cdots\oplus\mathcal{H}_{N} is the direct Hilbert sum, where every \mathcal{H}_{i},i\in[N] is a real Hilbert space, with the same inner product \langle\cdot,\cdot\rangle and associated norm \|\cdot\| as \mathcal{H}, i.e., \mathcal{H} and \mathcal{H}_{1} coincide when N=1. Let x=(x_{1},\ldots,x_{N}) denote a generic vector in \mathcal{H} with x_{i}\in\mathcal{H}_{i},i\in[N]. Then the global operator F can be written in a block-coordinate form F=(\texttt{F}_{1},\ldots,\texttt{F}_{N}), where \texttt{F}_{i}:\mathcal{H}\to\mathcal{H}_{i} for i\in[N], i.e., F(x)=(\texttt{F}_{1}(x),\ldots,\texttt{F}_{N}(x)) for x\in\mathcal{H}. In this setup, F is block (or block-coordinate) separable over a network of N agents, i.e.,

\text{{\bf(Problem II)}~~~~~find}~x\in Fix(F),~F=(\texttt{F}_{1},\ldots,\texttt{F}_{N}), (10)

where each \texttt{F}_{i} is a local operator, only privately accessible to agent i for all i\in[N]. Also, each agent i only knows its own vector x_{i} with no knowledge of x_{j} for all j\neq i. To the best of our knowledge, this scenario is novel and also practical. For example, in multi-player games, it is difficult or impossible for a global/central coordinator or master to know all the coordinates of x due to privacy.

With the above discussion, the objective of this paper is to develop distributed (or decentralized) algorithms to solve problems I and II in real Hilbert spaces.

Remark 1.

To illustrate applications of the above studied problems, a simple example in function approximation is provided here, which is useful, e.g., in reinforcement learning (cf., Chapter 9 in [33]). Consider a (reward) function \texttt{r}:\mathbb{R}^{n}\to\mathbb{R}, which may be unknown in reality, and let us approximate it by \sum_{j=1}^{\infty}w_{j}\exp(-\frac{\|x-c_{j}\|^{2}}{2\sigma_{j}^{2}}) based on radial basis functions (RBFs), where c_{j} and \sigma_{j} are some prespecified parameters (e.g., the feature's center state and the feature's width in reinforcement learning, respectively), w=(w_{1},w_{2},\ldots)\in\ell^{2} is the variable to be optimized, and \ell^{2} is the space of square-summable sequences, which is an infinite-dimensional Hilbert space. Then the goal is to minimize the approximation error f(w):=\frac{1}{N}\sum_{i=1}^{N}f_{i}(w) with f_{i}(w):=\frac{1}{N_{i}}\sum_{l=1}^{N_{i}}|\texttt{r}(s_{i,l})-\sum_{j=1}^{\infty}w_{j}\exp(-\frac{\|s_{i,l}-c_{j}\|^{2}}{2\sigma_{j}^{2}})|^{2}, where \{s_{i,l}\}_{l=1}^{N_{i}} is a set of sample data privately known by agent i. It is easy to verify that f_{i} is differentiable and convex with L_{i}-Lipschitz gradient for some constant L_{i}>0, thus implying that the operator F_{i}:w\mapsto w-\xi\nabla f_{i}(w) is nonexpansive for a constant \xi\in(0,2/L) with L:=\max_{i\in[N]}L_{i} (Lemma 4 in [3]). Then the problem is equivalent to the fixed point finding problem (9) with \mathcal{H}=\ell^{2}. More applications will be given in Section V.
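To make this construction concrete, the following Python sketch (our own illustration, not the authors' code; the truncation to a finite number of RBF features, the sample arrays, and the names rbf_features and local_operator are hypothetical) builds the local gradient-step operator F_i: w ↦ w − ξ∇f_i(w) from agent i's samples.

```python
import numpy as np

def rbf_features(s, centers, widths):
    """Feature vector with entries exp(-||s - c_j||^2 / (2 sigma_j^2))."""
    return np.exp(-np.sum((centers - s) ** 2, axis=1) / (2.0 * widths ** 2))

def local_operator(samples, rewards, centers, widths, xi):
    """Return F_i(w) = w - xi * grad f_i(w) for the least-squares RBF fit of agent i."""
    Phi = np.stack([rbf_features(s, centers, widths) for s in samples])  # (N_i, J)

    def F_i(w):
        residual = Phi @ w - rewards                  # prediction errors on agent i's samples
        grad = 2.0 * Phi.T @ residual / len(rewards)  # gradient of f_i at w
        return w - xi * grad                          # gradient step (nonexpansive for xi < 2/L)

    return F_i

# Toy usage with J = 4 truncated features and 10 local samples (all data made up).
rng = np.random.default_rng(0)
centers, widths = rng.normal(size=(4, 3)), np.ones(4)
samples, rewards = rng.normal(size=(10, 3)), rng.normal(size=10)
F_i = local_operator(samples, rewards, centers, widths, xi=0.05)
print(F_i(np.zeros(4)))
```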

II-D Graph Theory

The communication pattern among all agents is captured by a simple directed graph, denoted by \mathcal{G}=(\mathcal{V},\mathcal{E}), where \mathcal{V}=[N] and \mathcal{E}\subseteq\mathcal{V}\times\mathcal{V} are the node (or agent) and edge sets, respectively. An edge (i,j)\in\mathcal{E} means that agent i can send information to agent j, but not vice versa, and agent i (resp. j) is called an in-neighbor or simply neighbor (resp. out-neighbor) of agent j (resp. i). A graph is called undirected if and only if (i,j)\in\mathcal{E} amounts to (j,i)\in\mathcal{E}, and directed otherwise. A directed path is defined to be a sequence of adjacent edges (i_{1},i_{2}),(i_{2},i_{3}),\ldots,(i_{l-1},i_{l}), and a graph is said to be strongly connected if any two nodes can be connected by a directed path from one to the other.

II-E Assumptions

With the above preparations, we are now ready to impose some standard assumptions.

Assumption 1 (Strong Connectivity).

The communication graph \mathcal{G} is strongly connected. Moreover,

  1.

    two matrices A=(a_{ij})\in\mathbb{R}^{N\times N} (row-stochastic) and B=(b_{ij})\in\mathbb{R}^{N\times N} (column-stochastic) are arbitrarily assigned to \mathcal{G} with a_{ii}>0, b_{ii}>0 for all i\in[N];

  2.

    denote the left stochastic eigenvector of A (resp. right stochastic eigenvector of B) associated with the eigenvalue 1 by \pi=col(\pi_{1},\ldots,\pi_{N}) (resp. \nu=col(\nu_{1},\ldots,\nu_{N})) such that A_{\infty}={\bf 1}_{N}\pi^{\top} (resp. B_{\infty}=\nu{\bf 1}_{N}^{\top}) and \pi_{i}>0, \nu_{i}>0 for all i\in[N].

It is worth mentioning that the directed graph is not required to be balanced in this paper. Note that, similar to [34], A and B are consistent with \mathcal{G}, that is, a_{ij}>0 and b_{ij}>0 if and only if (j,i)\in\mathcal{E} for i\neq j. Notice that A and B do not need to be doubly stochastic. Additionally, the property in Assumption 1.2 can be ensured by the strong connectivity of \mathcal{G} [34].

Assumption 2.

F_{i} is Lipschitz with constant L_{i} for all i\in[N], i.e., \|F_{i}(x)-F_{i}(y)\|\leq L_{i}\|x-y\|,~\forall x,y\in\mathcal{H}.

Assumption 3.
  1.

    F is quasi-nonexpansive with Fix(F)\neq\emptyset.

  2.

    F is linearly regular, i.e., there exists a constant \kappa>0 such that

    d_{Fix(F)}(x)\leq\kappa\|F(x)-x\|,~~~\forall x\in\mathcal{H}. (11)
Remark 2.

Note that Assumption 3 is made only for the global operator F, and is not required for the local operators F_{i}. In addition, the linear regularity is strictly weaker than the strong convexity of cost functions (Section III in [30]), where the linear regularity of functions refers to operators involving the functions' gradients, as shown in Section V-A.

III The DOT Algorithm for Problem I

This section aims to develop a distributed algorithm for tackling problem (9) which can converge at a linear rate. Without loss of generality, the vectors in \mathcal{H} are viewed as column vectors in this section.

For problem (9), if F can be known by a global/central computing unit (or coordinator), then a famous centralized algorithm, called the KM iteration [35], can be exploited, i.e.,

x_{k+1}=x_{k}+\alpha_{k}(F(x_{k})-x_{k}), (12)

where \{\alpha_{k}\}_{k\in\mathbb{N}} is a sequence of relaxation parameters with \alpha_{k}\in[0,1]. Note that the KM iteration usually applies to nonexpansive operators, but it still works for the quasi-nonexpansive operators considered here under the linear regularity condition in Assumption 3. However, the centralized iteration (12) is not realistic here since no global/central computing unit (or coordinator) exists in our setting, which hence motivates us to devise distributed (or decentralized) algorithms based only on local information exchanges among all agents.

Motivated by the classical KM iteration and the tracking techniques such as those in [36, 34, 37], a distributed quasi-averaged operator tracking algorithm (DOT) is proposed as

x_{i,k+1} =\sum_{j=1}^{N}a_{ij}x_{j,k}+\alpha\Big{(}\frac{y_{i,k}}{w_{i,k}}-\sum_{j=1}^{N}a_{ij}x_{j,k}\Big{)}, (13a)
y_{i,k+1} =\sum_{j=1}^{N}b_{ij}y_{j,k}+F_{i}(x_{i,k+1})-F_{i}(x_{i,k}), (13b)
w_{i,k+1} =\sum_{j=1}^{N}b_{ij}w_{j,k}, (13c)

where x_{i,k} is agent i's estimate of a fixed point of the global operator F at time k\geq 0 for all i\in[N], and \alpha\in(0,1) is the stepsize to be determined. Set the initial conditions as: arbitrary x_{i,0}\in\mathcal{H}, y_{i,0}=F_{i}(x_{i,0}), and w_{i,0}=1 for all i\in[N]. It is noteworthy that only neighboring agents are involved in (13) for each agent, since a_{ij}=0 and b_{ij}=0 when agent j is not a neighbor of agent i.

Roughly speaking, y_{i,k} is employed to track the weighted sum \nu_{i}\sum_{j=1}^{N}F_{j}(x_{j,k}) of the local operators, and meanwhile w_{i,k} is a scalar used to track N\nu_{i} in order to counteract the imbalance of the matrix B in (13b).

For (13c), it is easy to verify that each w_{i,k} converges exponentially to N\nu_{i}. Moreover, invoking the method in [38], the final value N\nu_{i} can be evaluated by each agent i in finite time in a distributed manner. Because of this, without loss of generality, algorithm (13) can be rewritten by replacing w_{i,k} with N\nu_{i}, as in Algorithm 1.

To facilitate the following analysis, Algorithm 1 can be written in a compact form

x_{k+1} =\mathbf{A}x_{k}+\alpha\Big{[}\frac{1}{N}(D_{\nu}^{-1}\otimes Id)y_{k}-\mathbf{A}x_{k}\Big{]}, (14)
y_{k+1} =\mathbf{B}y_{k}+\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k}), (15)

where x_{k},y_{k} are the concatenated vectors of x_{i,k},y_{i,k}, respectively, \mathbf{A}:=A\otimes Id, \mathbf{B}:=B\otimes Id, D_{\nu}:=diag\{\nu_{1},\ldots,\nu_{N}\}, and \mathbf{F}(z):=col(F_{1}(z_{1}),\ldots,F_{N}(z_{N})) for a vector z=col(z_{1},\ldots,z_{N})\in\mathcal{H}^{N}:=\mathcal{H}\times\cdots\times\mathcal{H} (the N-fold Cartesian product of \mathcal{H}).

Algorithm 1 Distributed Quasi-Averaged Operator Tracking (DOT)
1:  Initialization: Stepsize \alpha in (30), communication matrices A and B, and local initial conditions x_{i,0}\in\mathcal{H} and y_{i,0}=F_{i}(x_{i,0}) for all i\in[N].
2:  Iterations: Step k\geq 0: update for each i\in[N]:
x_{i,k+1} =\sum_{j=1}^{N}a_{ij}x_{j,k}+\alpha\Big{(}\frac{y_{i,k}}{N\nu_{i}}-\sum_{j=1}^{N}a_{ij}x_{j,k}\Big{)}, (16a)
y_{i,k+1} =\sum_{j=1}^{N}b_{ij}y_{j,k}+F_{i}(x_{i,k+1})-F_{i}(x_{i,k}). (16b)
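For illustration only (a minimal sketch, not the authors' implementation), the following Python code runs the DOT updates (16a)-(16b) on made-up data; the weights \nu_{i} are computed directly from B here instead of by the finite-time distributed procedure of [38], and the graph and local operators are arbitrary placeholders.

```python
import numpy as np

def run_dot(A, B, local_ops, x0, alpha, num_iters):
    """DOT (Algorithm 1): consensus on x with a tracking variable y for the global operator."""
    N = A.shape[0]
    # nu: right eigenvector of the column-stochastic B for eigenvalue 1, scaled so sum(nu) = 1.
    eigvals, eigvecs = np.linalg.eig(B)
    nu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    nu = nu / nu.sum()

    x = np.array(x0, dtype=float)                            # row i is agent i's estimate x_{i,k}
    y = np.array([local_ops[i](x[i]) for i in range(N)])     # y_{i,0} = F_i(x_{i,0})
    for _ in range(num_iters):
        mix = A @ x                                          # sum_j a_ij x_{j,k}
        x_new = mix + alpha * (y / (N * nu[:, None]) - mix)  # update (16a)
        F_new = np.array([local_ops[i](x_new[i]) for i in range(N)])
        F_old = np.array([local_ops[i](x[i]) for i in range(N)])
        y = B @ y + F_new - F_old                            # update (16b)
        x = x_new
    return x

# Toy test: F_i(x) = x - 0.2*(x - t_i) is a gradient step on ||x - t_i||^2, so Fix(F) = {mean of t_i}.
N, d = 5, 3
rng = np.random.default_rng(1)
targets = rng.normal(size=(N, d))
ops = [lambda z, t=t: z - 0.2 * (z - t) for t in targets]
W = np.ones((N, N)) / N   # complete graph with uniform weights (row- and column-stochastic)
print(run_dot(W, W, ops, x0=np.zeros((N, d)), alpha=0.3, num_iters=300)[0])  # ~ targets.mean(axis=0)
```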

To proceed, it is helpful to introduce two new weighted norms for the Cartesian product \mathcal{H}^{N}, i.e.,

\|z\|_{\pi}:=\sqrt{\sum_{i=1}^{N}\pi_{i}\|z_{i}\|^{2}},~~~\|z\|_{\nu}:=\sqrt{\sum_{i=1}^{N}\frac{\|z_{i}\|^{2}}{\nu_{i}}} (17)

for any vector z=col(z_{1},\ldots,z_{N})\in\mathcal{H}^{N}. Let \|\cdot\| be the natural norm in \mathcal{H}^{N}, i.e., \|z\|:=\sqrt{\sum_{i=1}^{N}\|z_{i}\|^{2}}. Additionally, it is also necessary to introduce two weighted norms in \mathbb{R}^{N} [34], i.e., for any x=col(x_{1},\ldots,x_{N})\in\mathbb{R}^{N},

\|x\|_{\pi}:=\sqrt{\sum_{i=1}^{N}\pi_{i}x_{i}^{2}},~~~\|x\|_{\nu}:=\sqrt{\sum_{i=1}^{N}\frac{x_{i}^{2}}{\nu_{i}}}. (18)

Please note that the notations \|\cdot\|_{\pi} and \|\cdot\|_{\nu} in (17) and (18) should be easily distinguished by the context. Accordingly, let us denote by \|M\|_{\pi} and \|M\|_{\nu} (resp. \|M\otimes T\|_{\pi} and \|M\otimes T\|_{\nu}) the norms for a matrix M\in\mathbb{R}^{N\times N} (resp. a matrix M\in\mathbb{R}^{N\times N} and an operator T in \mathcal{H}) induced by \|\cdot\|_{\pi} and \|\cdot\|_{\nu} in (18) (resp. (17)), respectively.

It is easy to see that the natural norm \|\cdot\| is equivalent to \|\cdot\|_{\pi}, \|\cdot\|_{\nu} in (17), (18), and thus to the induced matrix norms, that is, there are positive constants c_{i},i\in[4] such that

c_{1}\|\cdot\| \leq\|\cdot\|_{\pi}\leq c_{2}\|\cdot\|, (19)
c_{3}\|\cdot\|_{\nu} \leq\|\cdot\|_{\pi}\leq c_{4}\|\cdot\|_{\nu}. (20)

Then the following results can be obtained.

Lemma 1 ([34]).

For all x\in\mathbb{R}^{N}, there hold

\|Ax-A_{\infty}x\|_{\pi}\leq\rho_{1}\|x-A_{\infty}x\|_{\pi}, (21)
\|Bx-B_{\infty}x\|_{\nu}\leq\rho_{2}\|x-B_{\infty}x\|_{\nu}, (22)
\|A\|_{\pi}=\|A_{\infty}\|_{\pi}=\|I_{N}-A_{\infty}\|_{\pi}=1, (23)
\|B\|_{\nu}=\|B_{\infty}\|_{\nu}=\|I_{N}-B_{\infty}\|_{\nu}=1, (24)

where \rho_{1}:=\|A-A_{\infty}\|_{\pi}<1 and \rho_{2}:=\|B-B_{\infty}\|_{\nu}<1.

Lemma 2.

For all z\in\mathcal{H}^{N}, the following statements hold

\|\mathbf{A}z-\mathbf{A}_{\infty}z\|_{\pi} \leq\rho_{1}\|z-\mathbf{A}_{\infty}z\|_{\pi}, (25)
\|\mathbf{B}z-\mathbf{B}_{\infty}z\|_{\nu} \leq\rho_{2}\|z-\mathbf{B}_{\infty}z\|_{\nu}, (26)
\|I_{N}\otimes Id-\mathbf{A}_{\infty}\|_{\pi} =\|I_{N}-A_{\infty}\|_{\pi}=1, (27)

where \mathbf{A}_{\infty}:=A_{\infty}\otimes Id and \mathbf{B}_{\infty}:=B_{\infty}\otimes Id.

Proof.

The proof can be found in Appendix A. ∎

Lemma 3 ([39]).

For an irreducible nonnegative matrix M\in\mathbb{R}^{n\times n}, it is primitive if it has at least one non-zero diagonal entry.

Lemma 4 ([39]).

For an irreducible nonnegative matrix M\in\mathbb{R}^{n\times n}, there hold (i) \rho(M)>0 is an eigenvalue of M, (ii) Mx=\rho(M)x for some positive vector x, and (iii) \rho(M) is an algebraically simple eigenvalue.

Lemma 5 ([40]).

For X,Y\in\mathbb{R}^{n\times n}, let \lambda be a simple eigenvalue of X. Denote by u and v respectively the left and right eigenvectors of X corresponding to \lambda. Then, it holds that

  1.

    for each \epsilon>0, there exists a \delta>0 such that, \forall t\in\mathbb{C} with |t|<\delta, there is a unique eigenvalue \lambda(t) of X+tY such that |\lambda(t)-\lambda-t\frac{u^{\top}Yv}{u^{\top}v}|\leq|t|\epsilon,

  2.

    \lambda(t) is continuous at t=0, and \lim_{t\to 0}\lambda(t)=\lambda,

  3.

    \lambda(t) is differentiable at t=0, and \frac{d\lambda(t)}{dt}\big{|}_{t=0}=\frac{u^{\top}Yv}{u^{\top}v}.

Lemma 6.

It holds that \bar{y}_{k}=\sum_{i=1}^{N}F_{i}(x_{i,k}), where \bar{y}_{k}:=\sum_{i=1}^{N}y_{i,k}.

Proof.

Left multiplying (15) by {\bf 1}^{\top} yields that \bar{y}_{k+1}=\bar{y}_{k}+\sum_{i=1}^{N}F_{i}(x_{i,k+1})-\sum_{i=1}^{N}F_{i}(x_{i,k}), which further implies that \bar{y}_{k}-\sum_{i=1}^{N}F_{i}(x_{i,k})=\bar{y}_{0}-\sum_{i=1}^{N}F_{i}(x_{i,0}). Note that y_{i,0}=F_{i}(x_{i,0}). The conclusion directly follows. ∎

To move forward, an important result for the convergence analysis is first given below.

Lemma 7.

Under Assumption 3, if \alpha\in(0,1-\delta], where \delta\in(0,1) is any pre-specified parameter, then there holds

d_{Fix(F)}(F_{\alpha}(x))\leq\rho_{3}d_{Fix(F)}(x),~~~\forall x\in\mathcal{H} (28)

where F_{\alpha}:=Id+\alpha(F-Id) is the \alpha-quasi-averaged operator of F, and

\rho_{3}:=1-\frac{\delta\alpha}{4\kappa^{2}}\in[0,1). (29)
Proof.

It is easy to see that F_{\alpha}-Id=\alpha(F-Id), which together with (11) yields that d_{Fix(F)}(x)\leq\kappa\|F(x)-x\|=\frac{\kappa}{\alpha}\|F_{\alpha}(x)-x\| for all x\in\mathcal{H}. Therefore, F_{\alpha} is linearly regular with constant \frac{\kappa}{\alpha}. Simultaneously, it is known that each \alpha-quasi-averaged operator is \frac{1-\alpha}{\alpha}-SQNE [30], and thus F_{\alpha} is \frac{1-\alpha}{\alpha}-SQNE. With the above two properties of F_{\alpha} as well as Fix(F_{\alpha})=Fix(F), invoking Theorem 1 in [30] leads to d_{Fix(F)}(F_{\alpha}(x))\leq\phi d_{Fix(F)}(x), where \phi:=\sqrt{1-\frac{\alpha(1-\alpha)}{\kappa^{2}}}\in[0,1). Meanwhile, since \sqrt{1-t}\leq 1-t/2 for t\in[0,1], it is easy to verify that

\phi\leq 1-\frac{\alpha(1-\alpha)}{2\kappa^{2}}\leq 1-\frac{\delta\alpha}{4\kappa^{2}},

where \alpha\leq 1-\delta (so that 1-\alpha\geq\delta) is used in the last inequality. This ends the proof. ∎

We are now ready to give the main result of this section.

Theorem 1.

Under Assumptions 1-3, all x_{i,k}'s generated by Algorithm 1 converge to a common point in Fix(F) at a linear rate, if there holds

0<\alpha<\min\{1-\delta,\alpha_{c}\}, (30)

where \alpha_{c} is the smallest positive real root of the equation det(I-M(\alpha))=0, and

M(\alpha):=\left(\begin{array}{ccc}(1-\alpha)\rho_{1}&\alpha c_{2}\theta_{1}&0\\ \theta_{2}(\alpha\theta_{3}+\theta_{4})&\rho_{2}+\alpha\theta_{1}\theta_{2}&2\alpha c_{1}\theta_{2}\\ \frac{\alpha\bar{L}}{c_{1}}&\frac{\alpha\sqrt{N}\theta_{1}}{c_{1}}&1-\frac{\delta\alpha}{4\kappa^{2}}\end{array}\right) (34)

with \bar{L}:=\max_{i\in[N]}\{L_{i}\}, \theta_{1}:=\frac{c_{4}\|D_{\nu}^{-1}\|}{N}, \theta_{2}:=\frac{c_{2}\bar{L}(\sqrt{N}+1)}{c_{1}c_{3}}, \theta_{3}:=\rho_{1}+\bar{L}, and \theta_{4}:=\|A-I\|.

Proof.

The proof can be found in Appendix B. ∎
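Since \alpha_{c} in (30) is only defined implicitly through det(I-M(\alpha))=0, it can be located numerically. The Python sketch below (an illustration with made-up values of \rho_{1}, \rho_{2}, \kappa, \delta, c_{1}, c_{2}, \theta_{1}-\theta_{4}, \bar{L}, and N, not constants from the paper) scans for the first sign change of det(I-M(\alpha)) and refines it by linear interpolation.

```python
import numpy as np

def M(alpha, rho1, rho2, kappa, delta, c1, c2, t1, t2, t3, t4, Lbar, N):
    """The matrix M(alpha) in (34)."""
    return np.array([
        [(1 - alpha) * rho1,         alpha * c2 * t1,              0.0],
        [t2 * (alpha * t3 + t4),     rho2 + alpha * t1 * t2,       2 * alpha * c1 * t2],
        [alpha * Lbar / c1,          alpha * np.sqrt(N) * t1 / c1, 1 - delta * alpha / (4 * kappa ** 2)],
    ])

def smallest_positive_root(params, alpha_max=1.0, grid=20000):
    """Smallest alpha > 0 with det(I - M(alpha)) = 0, via a sign change on a fine grid."""
    alphas = np.linspace(1e-6, alpha_max, grid)
    dets = np.array([np.linalg.det(np.eye(3) - M(a, **params)) for a in alphas])
    idx = np.where(np.sign(dets[:-1]) != np.sign(dets[1:]))[0]
    if len(idx) == 0:
        return None   # no root in (0, alpha_max]
    i = idx[0]
    a0, a1, d0, d1 = alphas[i], alphas[i + 1], dets[i], dets[i + 1]
    return a0 - d0 * (a1 - a0) / (d1 - d0)   # linear interpolation between bracketing points

# Made-up constants, purely for illustration.
params = dict(rho1=0.6, rho2=0.7, kappa=2.0, delta=0.1, c1=0.5, c2=1.5,
              t1=0.8, t2=1.2, t3=2.0, t4=1.0, Lbar=1.4, N=10)
alpha_c = smallest_positive_root(params)
print("alpha_c ~", alpha_c)   # the stepsize bound in (30) is then min{1 - delta, alpha_c}
```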

Remark 3.

It should be noticed that the problem considered in this paper is more general than the common fixed point finding problem in [17, 16, 18, 20], where all local operators are assumed to have at least one common fixed point, while this assumption is dropped in this paper. It is worthwhile to notice that the linear algebraic equation solving problem in [24, 25, 26] can be cast as a special case of the common fixed point seeking problem. Note that no convergence speeds are provided in [17, 16, 18, 19], although random interconnection graphs are considered in [19]. In addition, the same problem as here is also studied in [21] for nonexpansive operators, where the convergence rate is not analyzed, while a linear convergence rate is established here and more general operators are considered, i.e., quasi-nonexpansive operators. It should also be noted that a main difference between DOT here and D-KM in [21] is that DOT exploits a tracking technique for F with a constant stepsize, similar to the tracking idea for a global gradient in distributed optimization [36, 34, 37], while D-KM does not use this idea and applies a diminishing stepsize.

As a special case, when all local operators have at least one common fixed point, problem (9) will reduce to the common fixed point seeking problem due to Fix(F)=\cap_{i=1}^{N}Fix(F_{i}) in this case (e.g., Proposition 4.47 in [1]). Therefore, we have the following result.

Corollary 1.

Under the same conditions as in Theorem 1, if all F_{i}'s have at least one common fixed point, then all x_{i,k}'s generated by Algorithm 1 converge to a common point in \cap_{i=1}^{N}Fix(F_{i}) at a linear rate.

Note that the convergence speed is also analyzed for the common fixed point seeking problem in [20] (i.e., the DO algorithm), where the main difference between DOT and DO is that here an estimate is introduced for each agent to track the global operator F, while each agent does not perform this tracking in the DO algorithm. However, the rate is sublinear and all operators are assumed to be nonexpansive in [20], while a linear rate is provided here in Corollary 1 and less conservative operators, i.e., quasi-nonexpansive operators, are considered. Note that time-varying communication graphs were considered with non-identical stepsizes for the DO algorithm in [20], while this paper is concerned with static communication graphs with an identical stepsize for all agents. Along this line, it is interesting to further address the case with non-identical stepsizes for different agents and time-varying communication graphs in the future.

IV The DOP Algorithm for Problem II

This section is concerned with solving problem (10). Without loss of generality, the vectors in \mathcal{H} are viewed as row vectors in this section for convenience of analysis.

For problem (10), each agent i\in[N] can only privately access \texttt{F}_{i} together with its own vector x_{i} of the whole vector x=(x_{1},\ldots,x_{N})\in\mathcal{H} over a network of N agents, where x_{i} is privately known by agent i itself, as commonly encountered in multi-player games, and so on. To handle this problem, each agent i\in[N] maintains a vector x_{k}^{i}=(x_{1,k}^{i},\ldots,x_{N,k}^{i})\in\mathcal{H} at time step k\geq 0 as an estimate of a fixed point of F, where x_{j,k}^{i} is agent i's estimate of x_{j,k} (i.e., the vector of agent j at time k), with x_{i,k}^{i}=x_{i,k}. That is, each agent i updates its own vector x_{i,k} at time slot k without access to the vectors of all other agents j\neq i, and thus each agent i needs to estimate all other agents' vectors x_{j,k}, collected in x_{k}^{i}, at each time k\geq 0 over the communication graph \mathcal{G} satisfying Assumption 1.

Now, a distributed algorithm is proposed as in Algorithm 2, where A=(a_{ij})\in\mathbb{R}^{N\times N} is the communication matrix introduced after Assumption 1, which is only row-stochastic, and x_{-i,k}^{j}:=(x_{1,k}^{j},\ldots,x_{i-1,k}^{j},x_{i+1,k}^{j},\ldots,x_{N,k}^{j}) for all i,j\in[N], i.e., x_{-i,k}^{j} is agent j's estimate of all agents' vectors except that of the i-th agent. We recall that \pi is the left stochastic eigenvector of A associated with the eigenvalue 1, as introduced in Assumption 1.2. It should be noted that the i-th entry \pi_{i} of \pi can be evaluated by agent i in finite time in a distributed fashion using the approach in [38]. Thus, Algorithm 2 is distributed.

Algorithm 2 Distributed Quasi-Averaged Operator Playing (DOP)
1:  Initialization: Stepsize \alpha in (39), communication matrix A, and local initial conditions x_{0}^{i}\in\mathcal{H} for all i\in[N].
2:  Iterations: Step k\geq 0: update for each i\in[N]:
x_{i,k+1}=\sum_{j=1}^{N}a_{ij}x_{i,k}^{j}+\frac{\alpha}{\pi_{i}}\big{(}\texttt{F}_{i}(x_{k}^{i})-\sum_{j=1}^{N}a_{ij}x_{i,k}^{j}\big{)}, (35a)
x_{-i,k+1}^{i}=\sum_{j=1}^{N}a_{ij}x_{-i,k}^{j}. (35b)
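As with DOT, the following Python sketch (a minimal illustration, not the authors' code; the \pi_{i} are computed directly from A instead of by the finite-time procedure of [38], and the block operators and data are made up) runs the DOP updates (35a)-(35b), with each agent storing a full estimate x^i of the stacked vector.

```python
import numpy as np

def run_dop(A, block_ops, block_dims, x0, alpha, num_iters):
    """DOP (Algorithm 2): each agent i keeps an estimate x^i of the whole stacked vector.
       Own block:    x_{i,k+1}   = sum_j a_ij x_{i,k}^j + (alpha/pi_i)*(F_i(x_k^i) - sum_j a_ij x_{i,k}^j)
       Other blocks: x_{l,k+1}^i = sum_j a_ij x_{l,k}^j for l != i."""
    N = A.shape[0]
    # pi: left eigenvector of the row-stochastic A for eigenvalue 1, scaled so sum(pi) = 1.
    eigvals, eigvecs = np.linalg.eig(A.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    pi = pi / pi.sum()

    offsets = np.cumsum([0] + list(block_dims))
    blocks = [slice(offsets[l], offsets[l + 1]) for l in range(N)]  # index range of block l

    X = np.array(x0, dtype=float)           # shape (N, n): row i is agent i's estimate x^i
    for _ in range(num_iters):
        mix = A @ X                          # consensus on all blocks: sum_j a_ij x^j
        X_new = mix.copy()
        for i in range(N):
            Fi = block_ops[i](X[i])          # F_i evaluated at agent i's own estimate x^i
            X_new[i, blocks[i]] = mix[i, blocks[i]] + (alpha / pi[i]) * (Fi - mix[i, blocks[i]])
        X = X_new
    return X

# Toy example: F_i(x) = x_i - 0.2*(x_i - t_i), a decoupled quadratic game (made-up data).
N, d = 4, 2
rng = np.random.default_rng(2)
targets = rng.normal(size=(N, d))
ops = [lambda x, i=i, t=t: x[2 * i:2 * i + 2] - 0.2 * (x[2 * i:2 * i + 2] - t)
       for i, t in enumerate(targets)]
A = np.ones((N, N)) / N
X = run_dop(A, ops, [d] * N, x0=np.zeros((N, N * d)), alpha=0.3, num_iters=300)
print(X[0])   # agent 1's estimate of the whole stacked vector, ~ targets.ravel()
```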

To ease the upcoming analysis, let us define x_{k}:=col(x_{k}^{1},\ldots,x_{k}^{N})\in\mathcal{H}^{N}, \hat{x}_{i,k}:=\sum_{j=1}^{N}a_{ij}x_{i,k}^{j}, and \bar{F}:=diag\{(\texttt{F}_{1}(x_{k}^{1})-\hat{x}_{1,k})/\pi_{1},\ldots,(\texttt{F}_{N}(x_{k}^{N})-\hat{x}_{N,k})/\pi_{N}\}. Then algorithm (35) can be written in a compact form

x_{k+1}=Ax_{k}+\alpha\bar{F}. (36)

Multiplying \pi^{\top} on both sides of (36) yields that

\tilde{x}_{k+1}=\tilde{x}_{k}+\alpha\tilde{F}, (37)

where \tilde{x}_{k}=(\tilde{x}_{1,k},\ldots,\tilde{x}_{N,k}):=\sum_{i=1}^{N}\pi_{i}x_{k}^{i} and \tilde{F}:=(\texttt{F}_{1}(x_{k}^{1})-\hat{x}_{1,k},\ldots,\texttt{F}_{N}(x_{k}^{N})-\hat{x}_{N,k}).

To move forward, it is useful to recall the weighted norm \|z\|_{\pi}:=\sqrt{\sum_{i=1}^{N}\pi_{i}\|z_{i}\|^{2}} for a vector z=col(z_{1},\ldots,z_{N})\in\mathcal{H}^{N}, as defined in (17). And let \|\cdot\| be a norm in \mathcal{H}^{N} defined by \|z\|:=\sqrt{\sum_{i=1}^{N}\|z_{i}\|^{2}}. Remember that the vectors in \mathcal{H} are seen as row vectors in this section. Then, similar to (25) in Lemma 2, it is easy to obtain the following result.

Lemma 8.

For all z\in\mathcal{H}^{N}, it holds that

\|Az-A_{\infty}z\|_{\pi} \leq\rho_{1}\|z-A_{\infty}z\|_{\pi}, (38)

where A_{\infty}={\bf 1}_{N}\pi^{\top} as defined in the paragraph after Assumption 1 and \rho_{1}:=\|A-A_{\infty}\|_{\pi}<1.

To ensure the linear convergence, Assumptions 2 and 3 are still imposed in this section, but F_{i} in Assumption 2 is replaced with \texttt{F}_{i}, i.e., \|\texttt{F}_{i}(x)-\texttt{F}_{i}(y)\|\leq L_{i}\|x-y\|,~\forall x,y\in\mathcal{H} for i\in[N].

With the above preparations, we are now ready to give the main result of this section.

Theorem 2.

Under Assumptions 1-3 with F_{i} being replaced with \texttt{F}_{i} in Assumption 2, all x_{k}^{i}'s generated by DOP converge to a common point in Fix(F) at a linear rate, if

0<\alpha<\min\{1-\delta,\alpha_{L}\}, (39)

where \delta\in(0,1) is any pre-specified parameter, \alpha_{L} is the smallest positive real root of the equation det(I-\Theta(\alpha))=0, and

\Theta(\alpha):=\left(\begin{array}{cc}\rho_{1}+\alpha\theta_{5}&2\alpha c_{2}\sqrt{2\varpi}\\ \frac{\alpha(\bar{L}+1)}{c_{1}}&1-\frac{\delta\alpha}{4\kappa^{2}}\end{array}\right) (42)

with \theta_{5}:=2c_{2}\sqrt{\varpi(\bar{L}^{2}+1)}/c_{1}, \bar{L}:=\max_{i\in[N]}\{L_{i}\}, \varpi:=N-1+\frac{(1-\underline{\pi})^{2}}{\underline{\pi}^{2}}, and \underline{\pi}:=\min_{i\in[N]}\{\pi_{i}\}>0.

Proof.

The proof can be found in Appendix C. ∎

Remark 4.

It is worth pointing out that the work [21] only considers the sum separable case and does not present the convergence speed. In contrast, this paper addresses both the sum separable case (see Theorem 1) and the block separable case (see Theorem 2), and to the best of our knowledge, this paper is the first to address the block separable case of the fixed point finding problem in a decentralized fashion. Note that the block separable case in Theorem 2 has many applications, as will be discussed in Section V.

Remark 5.

Moreover, it is worth noting that Problem II can be cast as the common fixed point finding problem for a family of operators T_{i}:=(Id_{1},\ldots,Id_{i-1},\texttt{F}_{i},Id_{i+1},\ldots,Id_{N}):\mathcal{H}\to\mathcal{H}, where Id_{i}:x\mapsto x_{i} for x=(x_{1},\ldots,x_{N})\in\mathcal{H} and i\in[N]. However, there exist two issues: 1) the linear regularity condition may not hold for \sum_{i=1}^{N}T_{i}; and 2) although the DO algorithm proposed in [20] is applicable for finding common fixed points, only a sublinear convergence speed is established.

Remark 6.

It is noteworthy that in Problem II each agent i only knows its own vector x_{i,k} at each time k\geq 0, but has no access to the vectors x_{j,k} of all other agents j\neq i. In this regard, agent i needs to estimate all other x_{j,k}'s in order to compute the value of its operator \texttt{F}_{i}. If each agent had full access to all other agents' vectors, then a simpler algorithm could be devised to tackle this setup, i.e.,

x_{i,k+1}=x_{i,k}+\alpha(\texttt{F}_{i}(x_{k})-x_{i,k}), (43)

where x_{i,k} is the same as in (35) and x_{k}:=(x_{1,k},\ldots,x_{N,k}). In this setup, there is no need for each agent to estimate the entire vector x_{k}. As for (43), linear convergence to a fixed point of the global operator F can be proved similarly to Theorem 2.

V Applications of DOT and DOP

The considered problems provide a unified framework for a multitude of interesting problems. To show this, this section provides two examples, i.e., distributed optimization and multi-player games under partial-decision information.

V-A Distributed Optimization

Consider a global optimization problem

\min_{x\in\mathcal{H}}~~~f(x) (44)

where f:\mathcal{H}\to\mathbb{R} is a differentiable and convex function, whose gradient is Lipschitz with constant L. It is easy to verify that this problem is equivalent to finding fixed points of an operator F:x\mapsto x-\xi\nabla f(x) for any given \xi>0, which is shown to be (L\xi)/2-averaged when \xi\in(0,2/L) (cf., Lemma 4 in [3]) and thus nonexpansive. For large-scale problems, the function f is usually expensive or impossible to be known by a global/central coordinator or computing unit; instead, it is more practical to consider the case where f is separable. Along this line, two cases are discussed below.

Case 1. f is sum separable, i.e., f(x)=\frac{1}{N}\sum_{i=1}^{N}f_{i}(x), where f_{i}:\mathcal{H}\to\mathbb{R} is a local function, which is differentiable and convex with \ell_{i}-Lipschitz gradient, only known to agent i. This problem is often called distributed/decentralized optimization, which has been extensively studied in the literature. In this case, the problem can be equivalently cast as Problem I (i.e., (9)) with F_{i}:x\mapsto x-\xi\nabla f_{i}(x) for any given bounded \xi>0, which is Lipschitz. Therefore, Assumption 2 holds true. In this setup, the DOT proposed in this paper can be leveraged to solve problem (44) in the sum separable case. As such, under Assumptions 1-3, the linear convergence to a solution of (44) can be guaranteed by Theorem 1.

Case 2. f is block separable, that is, \nabla f(x)=col(\nabla_{x_{1}}f(x),\ldots,\nabla_{x_{N}}f(x)) with x=col(x_{1},\ldots,x_{N}), where x_{i} is the vector of agent i\in[N] and each agent i is only capable of computing the partial gradient \nabla_{x_{i}}f(x) with respect to its own vector x_{i}. This scenario is realistic in some cases, partially because it is computationally expensive to compute the whole gradient \nabla_{x}f(x) by a global/central coordinator or computing unit, and partially because only part of the data, x_{i}, may be privately acquired by spatially distributed agents. In this setup, the problem can be recast as problem (10) with \texttt{F}_{i}:x\mapsto x_{i}-\xi\nabla_{x_{i}}f(x) for any given bounded \xi>0, which is Lipschitz if \nabla_{x_{i}}f(x) is so. For this problem, under Assumptions 1-3, the linear convergence to a solution of (44) can be ensured by Theorem 2; a sketch of such block operators is given below.
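As a minimal sketch of Case 2 (our own illustration, assuming a simple quadratic f with made-up data; run_dop refers to the hypothetical helper sketched after Algorithm 2), the block operators F_i: x ↦ x_i − ξ∇_{x_i}f(x) can be built as follows.

```python
import numpy as np

# Quadratic example f(x) = 0.5 * ||H x - p||^2 with a made-up matrix H and vector p.
N, d = 4, 2                 # N agents, each owning a block x_i of dimension d
n = N * d
rng = np.random.default_rng(3)
H = rng.normal(size=(n, n))
p = rng.normal(size=n)
xi = 1.0 / np.linalg.norm(H.T @ H, 2)   # xi < 2/L with L = ||H^T H||

def make_block_op(i):
    """F_i(x) = x_i - xi * grad_{x_i} f(x), the i-th block of the gradient-step operator."""
    rows = slice(i * d, (i + 1) * d)
    def F_i(x):
        grad = H.T @ (H @ x - p)        # full gradient of f at the agent's local estimate x
        return x[rows] - xi * grad[rows]
    return F_i

block_ops = [make_block_op(i) for i in range(N)]
# These operators can then be passed to the DOP sketch, e.g.
# X = run_dop(A, block_ops, [d] * N, x0=np.zeros((N, n)), alpha=0.05, num_iters=2000)
```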

Remark 7.

Note that in Case 1, the linear convergence rate is ensured under the linear regularity in Assumption 3, which is strictly weaker than the strong convexity of f_{i}'s or f [30], which is widely postulated in distributed optimization [41, 42, 43, 44, 45, 40], to name just a few. Also, notice that the linear regularity is only assumed for F, and is not necessary for any local operator F_{i}. For Case 1, a condition similar to linear regularity, i.e., metric subregularity, is employed in [46] for deriving a linear convergence, which is however in Euclidean spaces under balanced undirected communication graphs, while the result here works in a more general setting, i.e., in real Hilbert spaces under unbalanced directed graphs. It should also be noted that the aforesaid problem is just an application of the general problem (9) addressed here. In addition, to the best of our knowledge, this paper is the first to investigate Case 2 in distributed optimization.

V-B Game Under Partial-Decision Information

Consider a noncooperative N-player game with unconstrained action sets, where each player can be viewed as an agent and a Nash equilibrium is assumed to exist for the game. In this problem, each player i\in[N] possesses its own cost (or payoff) function J_{i}(x_{i},x_{-i}), which is differentiable, where x_{i} is the decision/action vector of player i and x_{-i} denotes the decision vectors of all other players, i.e., x_{-i}:=col(x_{1},\ldots,x_{i-1},x_{i+1},\ldots,x_{N}). Note that player i cannot access other players' decision vectors, i.e., the game considered here is under partial-decision information, which is more practical than the case where each player has full access to all other players' decisions, as assumed in most existing works. For this problem, at time step k\geq 0, each player i\in[N] chooses its own decision vector x_{i,k}\in\mathbb{R}^{n_{i}}, and a cost J_{i}(x_{i,k},x_{-i,k}) is incurred for player i after all players make their decisions. Then the objective is for each player to minimize its own cost function, that is, all players desire to achieve a Nash equilibrium (NE) x^{*}=col(x_{1}^{*},\ldots,x_{N}^{*})\in\mathbb{R}^{n} with n:=\sum_{i=1}^{N}n_{i}, which is defined as: for all i\in[N],

J_{i}(x_{i}^{*},x_{-i}^{*})\leq J_{i}(x_{i},x_{-i}^{*}),~~~\forall x_{i}\in\mathbb{R}^{n_{i}}. (45)

To proceed, let \nabla_{i}J_{i}(x_{i},x_{-i}) denote \nabla_{x_{i}}J_{i}(x_{i},x_{-i}) for simplicity. It is then easy to see that an NE x^{*} satisfies \nabla_{i}J_{i}(x_{i}^{*},x_{-i}^{*})=0 for all i\in[N]. Consequently, the Nash equilibrium seeking problem can be equivalently recast as finding fixed points of an operator F, defined by

F :=Id-rU, (46)
U :=col(\nabla_{1}J_{1},\ldots,\nabla_{N}J_{N}), (47)

where r>0 is any constant. By defining \texttt{F}_{i}:=Id_{i}-r\nabla_{i}J_{i} for i\in[N] with Id_{i}:x\mapsto x_{i} for x=col(x_{1},\ldots,x_{N})\in\mathbb{R}^{n}, one can obtain that F is block separable, i.e., F=col(\texttt{F}_{1},\ldots,\texttt{F}_{N}), which is consistent with problem (10). As a result, the linear convergence to an NE of the game can be assured by Theorem 2 under Assumptions 1-3 with F being quasi-nonexpansive.
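For instance, for the quadratic cost structure used later in Example 2 (J_i(x_i,x_{-i}) = h_i(Ex_i) + l_i(x_{-i})^T x_i), the block operators F_i := Id_i − r∇_iJ_i can be assembled as in the following sketch (our own illustration with made-up parameters; it plugs into the hypothetical run_dop helper from Section IV).

```python
import numpy as np

N, d, r = 5, 2, 0.1
rng = np.random.default_rng(4)
E = np.array([[1.0, 1.0]])                      # 1 x d, not of full column rank
r_coef = rng.uniform(0.5, 1.5, size=N)          # r_i > 0 in h_i(z) = r_i z^2 + s_i z
s_coef = rng.normal(size=N)
C = rng.normal(scale=0.05, size=(N, N, d, d))   # coupling blocks c_ij (made up)

def make_game_op(i):
    """F_i(x) = x_i - r * grad_i J_i(x_i, x_{-i}) for the quadratic game above."""
    def F_i(x):
        x_blocks = x.reshape(N, d)
        grad_i = (E.T @ (2.0 * r_coef[i] * (E @ x_blocks[i]) + s_coef[i])).ravel()  # from h_i(E x_i)
        grad_i = grad_i + sum(C[i, j] @ x_blocks[j] for j in range(N) if j != i)    # from l_i(x_{-i})^T x_i
        return x_blocks[i] - r * grad_i
    return F_i

game_ops = [make_game_op(i) for i in range(N)]
# e.g., X = run_dop(A, game_ops, [d] * N, x0=np.zeros((N, N * d)), alpha=0.01, num_iters=5000)
```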

Note that Assumptions 1-3 are relatively mild, and some of them are less conservative than those employed in the literature, as remarked below.

1) Assumption 2 is in fact equivalent to \nabla_{i}J_{i} being Lipschitz for all i\in[N], i.e., \|\nabla_{i}J_{i}(x)-\nabla_{i}J_{i}(y)\|\leq q_{i}\|x-y\| for some q_{i}>0 and for all x,y\in\mathbb{R}^{n}, which has been frequently employed in the literature, see e.g., [47, 48, 49, 50]. Then it can be readily obtained that U is q-Lipschitz, where q:=\sqrt{\sum_{i=1}^{N}q_{i}^{2}}.

2) The linear regularity is strictly weaker than the strong monotonicity of U, which has been widely imposed for deriving the linear convergence [47, 48, 49, 50], i.e., (U(x)-U(z))^{\top}(x-z)\geq\mu\|x-z\|^{2} for some \mu>0 and for all x,z\in\mathbb{R}^{n}. To see this, it is obvious that strong monotonicity is strictly stronger than quasi-strong monotonicity, i.e., (U(x)-U(y))^{\top}(x-y)\geq\mu\|x-y\|^{2} for all x\in\mathbb{R}^{n} and y\in Fix(F). Meanwhile, quasi-strong monotonicity implies the linear regularity of F, since it holds that \|F(x)-x\|=r\|U(x)\|=r\|U(x)-U(P_{Fix(F)}(x))\|\geq r\mu\|x-P_{Fix(F)}(x)\|=r\mu d_{Fix(F)}(x) for all x\in\mathbb{R}^{n}, i.e., d_{Fix(F)}(x)\leq\|F(x)-x\|/(r\mu), where U(P_{Fix(F)}(x))=0 and the quasi-strong monotonicity have been utilized. It should also be noted that the game can have a closed convex set of NEs (not necessarily unique) under linear regularity.

3) The quasi-nonexpansiveness of F is a weak assumption. For example, if the aforementioned quasi-strong monotonicity holds, then it holds that for all x\in\mathbb{R}^{n} and y\in Fix(F),

\|F(x)-y\|^{2} =\|x-y-r(U(x)-U(y))\|^{2}
=\|x-y\|^{2}-2r(x-y)^{\top}(U(x)-U(y))+r^{2}\|U(x)-U(y)\|^{2}
\leq(1-2\mu r+q^{2}r^{2})\|x-y\|^{2}, (48)

where the quasi-strong monotonicity and the q-Lipschitz continuity of U are used in the inequality. In view of (48), it is easy to see that F is even contractive, which is stronger than quasi-nonexpansive, if r\in(0,\frac{2\mu}{q^{2}}).

4) Assumption 1 requires strong connectivity for directed graphs, which are not necessarily balanced. In contrast, balanced undirected/directed graphs are exploited in [47, 48, 49, 50]. We note that time-varying graphs are considered in [49], but the graphs still need to be balanced; along this line, it is of interest to extend the results of this paper to time-varying communication graphs.

VI Numerical Examples

This section provides two numerical examples to corroborate the proposed algorithms.

Example 1.

Consider a distributed optimization problem as discussed in Case 1 of Section V-A, where f_{i}(x)=\hbar_{i}(Ex)+b_{i}^{\top}x, and \hbar_{i}(z) is a strongly convex function with Lipschitz continuous gradient. It is easy to see that this problem is equivalent to finding a fixed point of the operator F:=Id-\xi\nabla f for \xi\in(0,2/L) (see Section V-A), which is in the form (9) with F_{i}:=Id-\xi\nabla f_{i} for i\in[N].

It should be noted that f_{i} is strongly convex when E has full column rank, and f_{i} is convex but not strongly convex if E does not have full column rank (Section III in [30]), which is frequently encountered in practical applications, such as the L1-loss linear support vector machine (SVM) in machine learning [51]. Denote by X^{*} the nonempty set of optimizers of this problem. Although f_{i} is not strongly convex when E does not have full column rank, it has been shown in Theorem 18 of [51] that this problem admits a global error bound, i.e., d_{X^{*}}(x)\leq\tau\|\nabla f(x)\|,\forall x\in\mathbb{R}^{n} for some constant \tau\geq 0, which further leads to d_{Fix(F)}(x)\leq\frac{\tau}{\xi}\|x-F(x)\| for all x\in\mathbb{R}^{n}, i.e., the linear regularity condition is satisfied.

Figure 1: Evolutions of distance to the optimizer set by DOT in this paper.
Figure 2: Evolutions of distance to the optimizer set by D-KM in [21].

In the simulation, let N=100, n=5, \hbar_{i}(Ex)=|Ex-p_{i}|^{2}, E=(1,1,1,1,1)\in\mathbb{R}^{1\times 5}, p_{i}=i, and b_{i}=col(i,i,i,i,i)/5 for all i\in[N]. Then it is easy to verify that the gradient Lipschitz constant of f is L=10, and thus \xi\in(0,0.2). Setting \alpha=0.05 and \xi=0.1, running the DOT algorithm (13) gives rise to the simulation results in Fig. 1, indicating that all x_{i,k}'s converge linearly to the optimal set X^{*}:=\{z=col(z_{1},z_{2})\in\mathbb{R}^{2}:z_{1}+2z_{2}=3(N+1)/8\approx 37.875\}. In comparison with the D-KM iteration proposed in [21] under the same communication graph as DOT (see Fig. 2), which is equivalent to the classical distributed gradient descent (DGD) algorithm for this problem, it can be observed that the DOT algorithm here has a faster convergence speed. Overall, the simulation supports the theoretical result.
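For reference, the following sketch (our own reproduction of the problem setup only, not the authors' code, graph, or figures; it reuses the hypothetical run_dot helper from Section III with an arbitrary directed ring graph) constructs the local operators of Example 1.

```python
import numpy as np

N, n, xi = 100, 5, 0.1
E = np.ones((1, n))                         # E = (1,1,1,1,1), not of full column rank
p = np.arange(1, N + 1, dtype=float)        # p_i = i
b = np.array([np.full(n, i / 5.0) for i in range(1, N + 1)])  # b_i = col(i,...,i)/5

def make_F(i):
    """F_i(x) = x - xi * grad f_i(x) with f_i(x) = |E x - p_i|^2 + b_i^T x."""
    def F_i(x):
        grad = (2.0 * E.T @ (E @ x - p[i])).ravel() + b[i]
        return x - xi * grad
    return F_i

local_ops = [make_F(i) for i in range(N)]
# A simple strongly connected choice: a directed ring with self-loops; the weights are assigned
# arbitrarily (any A row-stochastic and B column-stochastic satisfying Assumption 1 would do).
A = 0.5 * np.eye(N) + 0.5 * np.roll(np.eye(N), 1, axis=1)   # row-stochastic
B = 0.5 * np.eye(N) + 0.5 * np.roll(np.eye(N), 1, axis=0)   # column-stochastic
# x_final = run_dot(A, B, local_ops, x0=np.zeros((N, n)), alpha=0.05, num_iters=5000)
```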

Figure 3: Evolutions of $x_{j,k}^{i}-x_{j,k}^{1}$ for $i,j\in[N]$ by DOP.
Figure 4: Evolutions of $\nabla_{i}J_{i}(x_{i,k}^{i},x_{-i,k}^{i})$ for $i\in[N]$ by DOP.
Example 2.

Consider the class of games discussed in Section V-B with $N=50$ players. To be specific, each player $i$ has its decision vector in $\mathbb{R}^{2}$ and its cost function is $J_{i}(x_{i},x_{-i})=h_{i}(Ex_{i})+l_{i}^{\top}x_{i}$, where $h_{i}(z)=r_{i}z^{2}+s_{i}z$ is strongly convex in $z\in\mathbb{R}$ with $r_{i}>0$ and $s_{i}\in\mathbb{R}$, and $l_{i}(x_{-i})=\sum_{j\neq i}c_{ij}x_{j}$ with $c_{ij}\in\mathbb{R}^{2\times 2}$. Note that $J_{i}$ is not strongly convex in $x_{i}$ if $E$ is not of full column rank, as discussed in Example 1. In this example, let $E=(1,1)$, which does not have full column rank. Thus, $J_{i}$ is not strongly convex in $x_{i}$; however, $\texttt{F}_{i}:=Id_{i}-r\nabla_{i}J_{i}$ is linearly regular, as illustrated similarly in Example 1. It is then easy to verify that the global operator $F=(\texttt{F}_{1},\ldots,\texttt{F}_{N})$ is linearly regular. Moreover, the Nash equilibrium may not be unique since $E$ is not of full column rank. Set $\alpha=0.01$ and $r=0.1$. By randomly choosing $r_{i}$, $s_{i}$, and $c_{ij}$ for $i,j\in[N]$ with a randomly generated strongly connected communication graph, running the developed DOP with each component of the initial conditions drawn randomly from $[0,1]$ gives the simulation results in Figs. 3 and 4. In Fig. 3, the distances from $x_{k}^{i}=col(x_{1,k}^{i},\ldots,x_{N,k}^{i})$ to the equilibrium set are plotted for all players, showing that all players' estimates $x_{k}^{i}$ converge to the equilibrium set at a linear rate. On the other hand, the gradient $\nabla_{i}J_{i}(x_{i,k}^{i},x_{-i,k}^{i})$ of each agent $i$ is given in Fig. 4, indicating that all agents' gradients converge linearly to zero. Hence, the simulation results support the theoretical result in Theorem 2.
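As a small illustration of how each player forms its local block operator in this example, the sketch below computes $\nabla_{i}J_{i}(x_{i},x_{-i})=E^{\top}(2r_{i}Ex_{i}+s_{i})+\sum_{j\neq i}c_{ij}x_{j}$ and the resulting $\texttt{F}_{i}:=Id_{i}-r\nabla_{i}J_{i}$ evaluated on one player's local estimate of the full decision profile; the parameter ranges and the scaling of the $c_{ij}$ blocks are our own choices, so this is a hedged sketch rather than the exact simulation setup.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
N, d, r = 50, 2, 0.1
E = np.array([[1.0, 1.0]])                      # rank-deficient E, as in the example
ri = rng.uniform(0.5, 1.5, N)                   # r_i > 0
si = rng.standard_normal(N)
C = 0.01 * rng.standard_normal((N, N, d, d))    # c_{ij} blocks (hypothetical scaling)

def grad_i(i, x):
    # partial gradient grad_i J_i = E^T (2 r_i E x_i + s_i) + sum_{j != i} c_{ij} x_j
    g = (E.T @ (2 * ri[i] * (E @ x[i]) + si[i])).ravel()
    return g + sum(C[i, j] @ x[j] for j in range(N) if j != i)

def F_block(i, x):
    # local block operator F_i := Id_i - r * grad_i J_i, applied to an estimate x
    return x[i] - r * grad_i(i, x)

x_est = rng.random((N, d))                      # one player's estimate of the profile
print("F_1 applied to the estimate:", F_block(0, x_est))
\end{verbatim}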

VII Conclusion

This paper has investigated the fixed point seeking problem for a quasi-nonexpansive global operator over a fixed, unbalanced, and directed communication graph, for which two scenarios have been considered under the linear regularity condition, i.e., the global operator is sum separable and block separable, respectively. For the first case, the global operator is a sum of local operators, which are assumed to be Lipschitz. To solve this case, a distributed algorithm, DOT, has been proposed and shown to converge to a fixed point of the global operator at a linear rate. For the second case, a distributed algorithm, DOP, has been developed and shown to be linearly convergent to a fixed point of the global operator. Meanwhile, two applications have been presented in detail, i.e., distributed optimization and multi-player games under partial-decision information. In the future, it would be interesting to study asynchronous algorithms, non-identical stepsizes for agents, and time-varying communication graphs.

Acknowledgment

The authors are grateful to the Editor, the Associate Editor and the anonymous reviewers for their insightful suggestions.

Appendix

VII-A Proof of Lemma 2

To prove (25), it can be obtained that

\[
\begin{aligned}
\|\mathbf{A}z-\mathbf{A}_{\infty}z\|_{\pi} &= \|(\mathbf{A}-\mathbf{A}_{\infty})(z-\mathbf{A}_{\infty}z)\|_{\pi}\\
&\leq \|\mathbf{A}-\mathbf{A}_{\infty}\|_{\pi}\|z-\mathbf{A}_{\infty}z\|_{\pi},
\end{aligned}
\qquad (49)
\]

where the equality has used the fact that $\mathbf{A}\mathbf{A}_{\infty}=\mathbf{A}_{\infty}\mathbf{A}_{\infty}=\mathbf{A}_{\infty}$.

Consider the term $\|\mathbf{A}-\mathbf{A}_{\infty}\|_{\pi}$ in (49). To this end, by definition (17), one has that for any $x\in\mathbb{R}^{N}$ and $y\in\mathcal{H}$,

\[
\|x\otimes y\|_{\pi}=\sqrt{\sum_{i=1}^{N}\pi_{i}x_{i}^{2}\|y\|^{2}}=\|y\|\sqrt{\sum_{i=1}^{N}\pi_{i}x_{i}^{2}}=\|x\|_{\pi}\|y\|,
\qquad (50)
\]

which, together with the norm’s definition, leads to

\[
\begin{aligned}
\|\mathbf{A}-\mathbf{A}_{\infty}\|_{\pi} &= \sup_{\|x\otimes y\|_{\pi}\neq 0}\frac{\|(\mathbf{A}-\mathbf{A}_{\infty})(x\otimes y)\|_{\pi}}{\|x\otimes y\|_{\pi}}\\
&= \sup_{\|x\otimes y\|_{\pi}\neq 0}\frac{\|[(A-A_{\infty})x]\otimes y\|_{\pi}}{\|x\otimes y\|_{\pi}}\\
&= \sup_{\|x\|_{\pi}\|y\|\neq 0}\frac{\|(A-A_{\infty})x\|_{\pi}\|y\|}{\|x\|_{\pi}\|y\|}\\
&= \sup_{\|x\|_{\pi}\neq 0}\frac{\|(A-A_{\infty})x\|_{\pi}}{\|x\|_{\pi}}\\
&= \|A-A_{\infty}\|_{\pi}.
\end{aligned}
\qquad (51)
\]

Note that $\|A-A_{\infty}\|_{\pi}=\rho_{1}$ by Lemma 1. Consequently, putting together (49)-(51) gives rise to (25). By noting that $\mathbf{B}\mathbf{B}_{\infty}=\mathbf{B}_{\infty}\mathbf{B}_{\infty}=\mathbf{B}_{\infty}$, similar arguments can be applied to obtain (26) and (27). This ends the proof.
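The identity (51) can also be checked numerically. The sketch below (our own illustration, not part of the proof) builds a random row-stochastic $A$, its limit $A_{\infty}={\bf 1}_{N}\pi^{\top}$, and verifies that the induced $\pi$-norm of the Kronecker lift $(A-A_{\infty})\otimes I_{n}$ coincides with $\|A-A_{\infty}\|_{\pi}$, where $\|\cdot\|_{\pi}$ is the weighted norm appearing in (50).

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
N, n = 6, 3

# random row-stochastic A with positive entries and its limit A_inf = 1 pi^T
A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
w, V = np.linalg.eig(A.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()
A_inf = np.outer(np.ones(N), pi)

D = np.diag(np.sqrt(pi))                 # ||z||_pi = ||D z||_2, so the induced
def pi_norm(M, blocks=1):                # pi-norm is ||(D (x) I) M (D (x) I)^{-1}||_2
    Dk = np.kron(D, np.eye(blocks))
    return np.linalg.norm(Dk @ M @ np.linalg.inv(Dk), 2)

lhs = pi_norm(A - A_inf)                                 # ||A - A_inf||_pi
rhs = pi_norm(np.kron(A - A_inf, np.eye(n)), blocks=n)   # Kronecker-lifted version
print(lhs, rhs)                                          # the two values coincide
\end{verbatim}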

VII-B Proof of Theorem 1

Let us first establish upper bounds on $\|x_{k+1}-\mathbf{A}_{\infty}x_{k+1}\|_{\pi}$, $\|x_{k+1}-x_{k}\|$, $\|y_{k+1}-\mathbf{B}_{\infty}y_{k+1}\|_{\nu}$, and $\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes x_{k+1}^{*}\|$, where $\bar{x}_{k}:=\sum_{i=1}^{N}\pi_{i}x_{i,k}$ and $x_{k}^{*}:=P_{Fix(F)}(\bar{x}_{k})$ for all $k\geq 0$.

For $\|x_{k+1}-\mathbf{A}_{\infty}x_{k+1}\|_{\pi}$, by noting $\mathbf{A}_{\infty}\mathbf{A}=\mathbf{A}_{\infty}$, invoking (14) yields that

\[
\begin{aligned}
&\|x_{k+1}-\mathbf{A}_{\infty}x_{k+1}\|_{\pi}\\
&=\Big\|(1-\alpha)\mathbf{A}x_{k}+\frac{\alpha}{N}(D_{\nu}^{-1}\otimes Id)y_{k}-(1-\alpha)\mathbf{A}_{\infty}\mathbf{A}x_{k}-\frac{\alpha}{N}\mathbf{A}_{\infty}(D_{\nu}^{-1}\otimes Id)y_{k}\Big\|_{\pi}\\
&\leq(1-\alpha)\|\mathbf{A}x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}+\frac{\alpha}{N}\|(I_{N}\otimes Id-\mathbf{A}_{\infty})(D_{\nu}^{-1}\otimes Id)y_{k}\|_{\pi}.
\end{aligned}
\qquad (52)
\]

Consider the last term in (52). One can obtain that

\[
\begin{aligned}
&\|(I_{N}\otimes Id-\mathbf{A}_{\infty})(D_{\nu}^{-1}\otimes Id)y_{k}\|_{\pi}\\
&=\|(I_{N}\otimes Id-\mathbf{A}_{\infty})[(D_{\nu}^{-1}\otimes Id)y_{k}-{\bf 1}_{N}\otimes\bar{y}_{k}]\|_{\pi}\\
&=\|(I_{N}\otimes Id-\mathbf{A}_{\infty})(D_{\nu}^{-1}\otimes Id)[y_{k}-\nu\otimes\bar{y}_{k}]\|_{\pi}\\
&\leq\|I_{N}\otimes Id-\mathbf{A}_{\infty}\|_{\pi}\|D_{\nu}^{-1}\otimes Id\|_{\pi}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\pi}\\
&\leq c_{4}\|D_{\nu}^{-1}\|_{\pi}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu},
\end{aligned}
\qquad (53)
\]

where $\bar{y}_{k}$ is defined in Lemma 6, the first inequality has applied the fact that $\nu\otimes\bar{y}_{k}=\mathbf{B}_{\infty}y_{k}$, and the last inequality has employed (27), $\|D_{\nu}^{-1}\otimes Id\|_{\pi}=\|D_{\nu}^{-1}\|_{\pi}$ (using the same argument as that in Lemma 2), and (20).

In view of (19) and (25), inserting (53) in (52) results in

\[
\|x_{k+1}-\mathbf{A}_{\infty}x_{k+1}\|_{\pi}\leq(1-\alpha)\rho_{1}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}+\frac{\alpha c_{2}c_{4}\|D_{\nu}^{-1}\|}{N}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}.
\qquad (54)
\]

As for $\|x_{k+1}-x_{k}\|$, it can be obtained from (14) that

\[
\begin{aligned}
\|x_{k+1}-x_{k}\|&=\Big\|\mathbf{A}x_{k}-x_{k}+\alpha\Big(\frac{D_{\nu}^{-1}\otimes Id}{N}y_{k}-\mathbf{A}x_{k}\Big)\Big\|\\
&\leq\|A-I\|\|x_{k}-\mathbf{A}_{\infty}x_{k}\|+\frac{\alpha\|D_{\nu}^{-1}\|}{N}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|\\
&\quad+\alpha\Big\|\frac{D_{\nu}^{-1}\otimes Id}{N}\mathbf{B}_{\infty}y_{k}-\mathbf{A}x_{k}\Big\|,
\end{aligned}
\qquad (55)
\]

where the inequality has leveraged the triangle inequality and the facts that $(\mathbf{A}-I_{N}\otimes Id)(x_{k}-\mathbf{A}_{\infty}x_{k})=\mathbf{A}x_{k}-x_{k}$ and $\|\mathbf{A}-I_{N}\otimes Id\|=\|A-I\|$ (using the same argument as that in Lemma 2).

Consider the last term in (55). By using $\mathbf{B}_{\infty}=\nu{\bf 1}_{N}^{\top}\otimes Id$, one has that

\[
\begin{aligned}
\Big\|\frac{D_{\nu}^{-1}\otimes Id}{N}\mathbf{B}_{\infty}y_{k}-\mathbf{A}x_{k}\Big\|&=\Big\|{\bf 1}_{N}\otimes\frac{\bar{y}_{k}}{N}-\mathbf{A}x_{k}\Big\|\\
&\leq\Big\|{\bf 1}_{N}\otimes\Big(\frac{\bar{y}_{k}}{N}-\frac{\sum_{i=1}^{N}F_{i}(\bar{x}_{k})}{N}\Big)\Big\|+\Big\|{\bf 1}_{N}\otimes\Big(\frac{\sum_{i=1}^{N}F_{i}(\bar{x}_{k})}{N}-x_{k}^{*}\Big)\Big\|\\
&\quad+\|{\bf 1}_{N}\otimes x_{k}^{*}-\mathbf{A}_{\infty}x_{k}\|+\|\mathbf{A}_{\infty}x_{k}-\mathbf{A}x_{k}\|.
\end{aligned}
\qquad (56)
\]

For the first term in the last inequality in (56), invoking Lemma 6, one can obtain that

\[
\begin{aligned}
\Big\|{\bf 1}_{N}\otimes\Big(\frac{\bar{y}_{k}}{N}-\frac{\sum_{i=1}^{N}F_{i}(\bar{x}_{k})}{N}\Big)\Big\|^{2}&=\Big\|{\bf 1}_{N}\otimes\frac{\sum_{i=1}^{N}(F_{i}(x_{i,k})-F_{i}(\bar{x}_{k}))}{N}\Big\|^{2}\\
&=\frac{1}{N}\Big\|\sum_{i=1}^{N}(F_{i}(x_{i,k})-F_{i}(\bar{x}_{k}))\Big\|^{2}\\
&\leq\sum_{i=1}^{N}\|F_{i}(x_{i,k})-F_{i}(\bar{x}_{k})\|^{2}\\
&\leq\sum_{i=1}^{N}L_{i}^{2}\|x_{i,k}-\bar{x}_{k}\|^{2}\\
&\leq\bar{L}^{2}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|^{2},
\end{aligned}
\qquad (57)
\]

where the first and second inequalities have employed $\|\sum_{i=1}^{N}z_{i}\|^{2}\leq N\sum_{i=1}^{N}\|z_{i}\|^{2}$ for any vectors $z_{i}$'s and Assumption 2, respectively. Similarly, it can be obtained that

\[
\begin{aligned}
\Big\|{\bf 1}_{N}\otimes\Big(\frac{\sum_{i=1}^{N}F_{i}(\bar{x}_{k})}{N}-x_{k}^{*}\Big)\Big\|^{2}&=N\|F(\bar{x}_{k})-x_{k}^{*}\|^{2}\\
&\leq N\|\bar{x}_{k}-x_{k}^{*}\|^{2}\\
&=\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|^{2},
\end{aligned}
\qquad (58)
\]

where $x_{k}^{*}\in Fix(F)$ and the quasi-nonexpansiveness of $F$ have been used in the inequality.

As a result, substituting (57) and (58) into (56) leads to

\[
\begin{aligned}
\Big\|\frac{D_{\nu}^{-1}\otimes Id}{N}\mathbf{B}_{\infty}y_{k}-\mathbf{A}x_{k}\Big\|&\leq\bar{L}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|+2\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|+\|\mathbf{A}x_{k}-\mathbf{A}_{\infty}x_{k}\|\\
&\leq\frac{\rho_{1}+\bar{L}}{c_{1}}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}+2\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|,
\end{aligned}
\qquad (59)
\]

where (19) and (21) have been utilized in the last inequality. Putting (59) in (55) leads to

\[
\begin{aligned}
\|x_{k+1}-x_{k}\|&\leq\frac{\alpha(\rho_{1}+\bar{L})+\|A-I\|}{c_{1}}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}+\frac{\alpha c_{4}\|D_{\nu}^{-1}\|}{Nc_{1}}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}\\
&\quad+2\alpha\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|.
\end{aligned}
\qquad (60)
\]

Regarding $\|y_{k+1}-\mathbf{B}_{\infty}y_{k+1}\|_{\nu}$, invoking (15) yields that

\[
\begin{aligned}
\|y_{k+1}-\mathbf{B}_{\infty}y_{k+1}\|_{\nu}&=\|\mathbf{B}y_{k}-\mathbf{B}_{\infty}\mathbf{B}y_{k}+\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k})-\mathbf{B}_{\infty}(\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k}))\|_{\nu}\\
&\leq\|\mathbf{B}y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}+\|\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k})\|_{\nu}+\|\mathbf{B}_{\infty}(\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k}))\|_{\nu}\\
&\leq\rho_{2}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}+\frac{c_{2}(1+\sqrt{N})}{c_{3}}\|\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k})\|,
\end{aligned}
\qquad (61)
\]

where $\mathbf{B}_{\infty}\mathbf{B}=\mathbf{B}_{\infty}$ has been used in the first inequality, and (19), (20), (26) and $\|\mathbf{B}_{\infty}\|=\|B_{\infty}\|\leq\sqrt{\|B_{\infty}\|_{1}\|B_{\infty}\|_{\infty}}\leq\sqrt{N}$ have been exploited in the last inequality.

On the other hand, it can be obtained that

\[
\begin{aligned}
\|\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k})\|^{2}&=\sum_{i=1}^{N}\|F_{i}(x_{i,k+1})-F_{i}(x_{i,k})\|^{2}\\
&\leq\sum_{i=1}^{N}L_{i}^{2}\|x_{i,k+1}-x_{i,k}\|^{2}\\
&\leq\bar{L}^{2}\|x_{k+1}-x_{k}\|^{2},
\end{aligned}
\qquad (62)
\]

where Assumption 2 has been applied to obtain the first inequality in (62).

Now substituting (60) and (62) into (61) leads to

\[
\begin{aligned}
\|y_{k+1}-\mathbf{B}_{\infty}y_{k+1}\|_{\nu}&\leq\rho_{2}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}+\frac{c_{2}\bar{L}(\sqrt{N}+1)}{c_{3}}\|x_{k+1}-x_{k}\|\\
&\leq\Big(\rho_{2}+\frac{\alpha c_{2}c_{4}\bar{L}(\sqrt{N}+1)\|D_{\nu}^{-1}\|}{Nc_{1}c_{3}}\Big)\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}\\
&\quad+\frac{c_{2}\bar{L}(\sqrt{N}+1)}{c_{1}c_{3}}(\alpha\theta_{3}+\|A-I\|)\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}\\
&\quad+\frac{2\alpha c_{2}\bar{L}(\sqrt{N}+1)}{c_{3}}\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|,
\end{aligned}
\qquad (63)
\]

where $\theta_{3}=\rho_{1}+\bar{L}$.

With regard to $\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes P_{Fix(F)}(\bar{x}_{k+1})\|$, by defining $\bar{x}_{k}^{*}:=P_{Fix(F)}(F_{\alpha}(\bar{x}_{k}))$ and noting that $\mathbf{A}_{\infty}x_{k+1}={\bf 1}_{N}\otimes\bar{x}_{k+1}$, $x_{k+1}^{*}=P_{Fix(F)}(\bar{x}_{k+1})$ and $\mathbf{A}_{\infty}\mathbf{A}=\mathbf{A}_{\infty}$, invoking (14) gives rise to

\[
\begin{aligned}
&\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes x_{k+1}^{*}\|\\
&\leq\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\|\\
&=\Big\|\mathbf{A}_{\infty}\mathbf{A}x_{k}+\alpha\big[\frac{\mathbf{A}_{\infty}}{N}(D_{\nu}^{-1}\otimes Id)y_{k}-\mathbf{A}_{\infty}\mathbf{A}x_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\Big\|\\
&=\Big\|{\bf 1}_{N}\otimes\bar{x}_{k}+\alpha\big[\frac{\mathbf{A}_{\infty}}{N}({\bf 1}_{N}\otimes\bar{y}_{k})-{\bf 1}_{N}\otimes\bar{x}_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}+\frac{\alpha\mathbf{A}_{\infty}}{N}(D_{\nu}^{-1}\otimes Id)(y_{k}-\mathbf{B}_{\infty}y_{k})\Big\|\\
&\leq\Big\|{\bf 1}_{N}\otimes\bar{x}_{k}+\alpha\big[\frac{\mathbf{A}_{\infty}}{N}({\bf 1}_{N}\otimes\bar{y}_{k})-{\bf 1}_{N}\otimes\bar{x}_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\Big\|+\frac{\alpha c_{4}\|D_{\nu}^{-1}\|}{\sqrt{N}c_{1}}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu},
\end{aligned}
\qquad (64)
\]

where the last inequality has utilized (19), (20), and the fact that $\|\mathbf{A}_{\infty}\|=\|A_{\infty}\|\leq\sqrt{\|A_{\infty}\|_{1}\|A_{\infty}\|_{\infty}}\leq\sqrt{N}$.

On the other hand, by $\mathbf{A}_{\infty}={\bf 1}_{N}\otimes\pi^{\top}$ and Lemma 6, one can obtain that

\[
\begin{aligned}
&\Big\|{\bf 1}_{N}\otimes\bar{x}_{k}+\alpha\big[\frac{\mathbf{A}_{\infty}}{N}({\bf 1}_{N}\otimes\bar{y}_{k})-{\bf 1}_{N}\otimes\bar{x}_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\Big\|\\
&=\Big\|{\bf 1}_{N}\otimes\bar{x}_{k}+\alpha\big[{\bf 1}_{N}\otimes\frac{\sum_{i=1}^{N}F_{i}(x_{i,k})}{N}-{\bf 1}_{N}\otimes\bar{x}_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\Big\|\\
&=\Big\|{\bf 1}_{N}\otimes F_{\alpha}(\bar{x}_{k})-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}+\alpha{\bf 1}_{N}\otimes\frac{\sum_{i=1}^{N}(F_{i}(x_{i,k})-F_{i}(\bar{x}_{k}))}{N}\Big\|\\
&\leq\|{\bf 1}_{N}\otimes[F_{\alpha}(\bar{x}_{k})-\bar{x}_{k}^{*}]\|+\alpha\Big\|{\bf 1}_{N}\otimes\frac{\sum_{i=1}^{N}(F_{i}(x_{i,k})-F_{i}(\bar{x}_{k}))}{N}\Big\|,
\end{aligned}
\qquad (65)
\]

where $F_{\alpha}$ is defined in (11).

For the term $\|{\bf 1}_{N}\otimes[F_{\alpha}(\bar{x}_{k})-\bar{x}_{k}^{*}]\|$ in (65), invoking $Fix(F_{\alpha})=Fix(F)$ and Lemma 7 implies that

\[
\begin{aligned}
\|{\bf 1}_{N}\otimes[F_{\alpha}(\bar{x}_{k})-\bar{x}_{k}^{*}]\|^{2}&=N\|F_{\alpha}(\bar{x}_{k})-\bar{x}_{k}^{*}\|^{2}\\
&\leq N\rho_{3}^{2}\|\bar{x}_{k}-P_{Fix(F)}(\bar{x}_{k})\|^{2}\\
&=N\rho_{3}^{2}\|\bar{x}_{k}-x_{k}^{*}\|^{2}\\
&=\rho_{3}^{2}\|{\bf 1}_{N}\otimes(\bar{x}_{k}-x_{k}^{*})\|^{2}.
\end{aligned}
\qquad (66)
\]

Now, putting together (57) and (64)-(66) results in

\[
\begin{aligned}
\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes x_{k+1}^{*}\|&\leq\rho_{3}\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|+\frac{\alpha\bar{L}}{c_{1}}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}\\
&\quad+\frac{\alpha c_{4}\|D_{\nu}^{-1}\|}{\sqrt{N}c_{1}}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}.
\end{aligned}
\qquad (67)
\]

In summary, let us define $z_{k}:=col(\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi},\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu},\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|)$. Combining (54), (63), and (67) with $\alpha\in(0,1)$ yields that

\[
z_{k+1}\leq M(\alpha)z_{k},
\qquad (68)
\]

where $M(\alpha)$ is defined in (34).

It is easy to see that $z_{k}$ converges to the origin at an exponential rate if $\rho(M(\alpha))<1$. To ensure $\rho(M(\alpha))<1$, observe that when $\alpha=0$, $1$ is a simple eigenvalue of $M(0)$ with corresponding left and right eigenvectors both being $col(0,0,1)$. Then invoking Lemma 5 gives rise to

\[
\frac{d\lambda(\alpha)}{d\alpha}\Big|_{\alpha=0}=-\frac{1}{4\kappa^{2}}<0,
\]

indicating that the simple eigenvalue $1$ of $M(0)$ decreases as $\alpha$ increases from $0$. Thus, by the continuity of $\rho(M(\alpha))$ with respect to $\alpha$, there must exist a constant $\alpha_{c}>0$ such that $\rho(M(\alpha))<1$ for all $\alpha\in(0,\alpha_{c})$. To find $\alpha_{c}$, note that the graph associated with $M(\alpha)$, consisting of $3$ agents, is strongly connected, which, in conjunction with Theorem C.3 in [14], implies that $M(\alpha)$ is irreducible. Furthermore, in view of Lemma 3, $M(\alpha)$ is primitive, which together with Lemma 4 implies that $\rho(M(\alpha))$ is a simple eigenvalue of $M(\alpha)$ and all other eigenvalues have absolute values less than $\rho(M(\alpha))$. Moreover, as $\alpha$ increases from $0$, $\rho(M(\alpha))$ can return to $1$ only at some critical point, and thereby the value of $\alpha_{c}$ can be calculated by letting $det(I-M(\alpha))=0$. This completes the proof.
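To illustrate how such a critical stepsize can be located in practice, the sketch below bisects on the crossing $\rho(M(\alpha))=1$ (equivalently, $det(I-M(\alpha))=0$ at that point) for a hypothetical nonnegative matrix family; the concrete entries of $M(\alpha)$ below are stand-ins of our own and are not the matrix defined in (34).

\begin{verbatim}
import numpy as np

# Hypothetical stand-in for M(alpha): rho(M(0)) = 1 with eigenvector e_3, the (3,3)
# entry decreases in alpha, and the off-diagonal couplings grow with alpha.
def M(alpha):
    return np.array([
        [0.6 * (1 - alpha), 0.2 * alpha,       0.0              ],
        [0.8 * alpha,       0.7 + 0.3 * alpha, 0.5 * alpha      ],
        [0.4 * alpha,       0.1 * alpha,       1.0 - 0.5 * alpha],
    ])

def rho(A):
    return np.max(np.abs(np.linalg.eigvals(A)))

lo, hi = 0.0, 1.0
assert rho(M(hi)) > 1.0            # bracket: spectral radius exceeds 1 at alpha = 1
for _ in range(60):                # bisection on the crossing rho(M(alpha)) = 1
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if rho(M(mid)) < 1.0 else (lo, mid)

alpha_c = lo
print("alpha_c ~", alpha_c, ", rho(M(alpha_c)) ~", rho(M(alpha_c)))
\end{verbatim}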

VII-C Proof of Theorem 2

Let us bound $\|x_{k+1}-A_{\infty}x_{k+1}\|_{\pi}$ and $d_{Fix(F)}(\tilde{x}_{k+1})$ in the following.

First, to bound $\|x_{k+1}-A_{\infty}x_{k+1}\|_{\pi}$, in view of (36) and $A_{\infty}A=A_{\infty}$, one has that

\[
\begin{aligned}
\|x_{k+1}-A_{\infty}x_{k+1}\|_{\pi}&=\|Ax_{k}+\alpha\bar{F}-A_{\infty}Ax_{k}-\alpha A_{\infty}\bar{F}\|_{\pi}\\
&\leq\|Ax_{k}-A_{\infty}x_{k}\|_{\pi}+\alpha\|\bar{F}-{\bf 1}_{N}\tilde{F}\|_{\pi}\\
&\leq\rho_{1}\|x_{k}-A_{\infty}x_{k}\|_{\pi}+\alpha c_{2}\|\bar{F}-{\bf 1}_{N}\tilde{F}\|,
\end{aligned}
\qquad (69)
\]

where (19) and (38) have been utilized in the last inequality.

For the last term in (69), it is easy to verify that

\[
\bar{F}-{\bf 1}_{N}\tilde{F}=\left(\begin{array}{cccc}
(\frac{1}{\pi_{1}}-1)e_{1,k}&-e_{2,k}&\cdots&-e_{N,k}\\
-e_{1,k}&(\frac{1}{\pi_{2}}-1)e_{2,k}&\cdots&-e_{N,k}\\
\vdots&\vdots&\ddots&\vdots\\
-e_{1,k}&-e_{2,k}&\cdots&(\frac{1}{\pi_{N}}-1)e_{N,k}
\end{array}\right)
\qquad (74)
\]

with $e_{i,k}:=\texttt{F}_{i}(x_{k}^{i})-\hat{x}_{i,k}$, and thus it can be obtained that

\[
\begin{aligned}
\|\bar{F}-{\bf 1}_{N}\tilde{F}\|^{2}&=\sum_{i=1}^{N}\big[(\tfrac{1}{\pi_{i}}-1)^{2}\|e_{i,k}\|^{2}+\sum_{j\neq i}\|e_{j,k}\|^{2}\big]\\
&\leq\varpi\sum_{i=1}^{N}\|e_{i,k}\|^{2}\\
&=\varpi\sum_{i=1}^{N}\|\texttt{F}_{i}(x_{k}^{i})-\texttt{F}_{i}(\tilde{x}_{k})+\texttt{F}_{i}(\tilde{x}_{k})-x_{i}^{*}+x_{i}^{*}-\tilde{x}_{i,k}+\tilde{x}_{i,k}-\hat{x}_{i,k}\|^{2}\\
&\leq 4\varpi\sum_{i=1}^{N}\big(\|\texttt{F}_{i}(x_{k}^{i})-\texttt{F}_{i}(\tilde{x}_{k})\|^{2}+\|\texttt{F}_{i}(\tilde{x}_{k})-x_{i}^{*}\|^{2}+\|\tilde{x}_{i,k}-x_{i}^{*}\|^{2}+\|\hat{x}_{i,k}-\tilde{x}_{i,k}\|^{2}\big),
\end{aligned}
\qquad (75)
\]

where $\varpi:=N-1+(1-\underline{\pi})^{2}/\underline{\pi}^{2}$, $x^{*}=(x_{1}^{*},\ldots,x_{N}^{*})$ denotes any fixed point of $F$, $\pi_{i}\geq\underline{\pi}$ has been used in the first inequality, and $\|\sum_{i=1}^{n}z_{i}\|^{2}\leq n\sum_{i=1}^{n}\|z_{i}\|^{2}$ for any vectors $z_{i}$'s in the last inequality. Note that $\tilde{x}_{i,k}$ is the $i$-th block-coordinate of $\tilde{x}_{k}$ defined in (37).

To proceed, one can obtain that

\[
\begin{aligned}
\sum_{i=1}^{N}\|\hat{x}_{i,k}-\tilde{x}_{i,k}\|^{2}&=\sum_{i=1}^{N}\Big\|\sum_{j=1}^{N}a_{ij}(x_{i,k}^{j}-\tilde{x}_{i,k})\Big\|^{2}\\
&\leq\sum_{i=1}^{N}\sum_{j=1}^{N}a_{ij}\|x_{i,k}^{j}-\tilde{x}_{i,k}\|^{2}\\
&\leq\sum_{i=1}^{N}\sum_{j=1}^{N}\|x_{i,k}^{j}-\tilde{x}_{i,k}\|^{2}\\
&=\|x_{k}-{\bf 1}_{N}\tilde{x}_{k}\|^{2},
\end{aligned}
\qquad (76)
\]

where $\sum_{j=1}^{N}a_{ij}=1$ has been exploited for the first equality, the convexity of $\|\cdot\|^{2}$ for the first inequality, and $a_{ij}\leq 1$ for the second inequality.

Now, invoking Assumptions 2 and 3.1, (75) and (76) yields

\[
\|\bar{F}-{\bf 1}_{N}\tilde{F}\|^{2}\leq 4\varpi(\bar{L}^{2}+1)\|x_{k}-{\bf 1}_{N}\tilde{x}_{k}\|^{2}+8\varpi\|\tilde{x}_{k}-x^{*}\|^{2},
\qquad (77)
\]

which, together with ${\bf 1}_{N}\tilde{x}_{k}=A_{\infty}x_{k}$, implies

\[
\|\bar{F}-{\bf 1}_{N}\tilde{F}\|\leq 2\sqrt{\varpi(\bar{L}^{2}+1)}\|x_{k}-A_{\infty}x_{k}\|+2\sqrt{2\varpi}\|\tilde{x}_{k}-x^{*}\|.
\qquad (78)
\]

At this step, by choosing $x^{*}=P_{Fix(F)}(\tilde{x}_{k})$, substituting (78) into (69) leads to

\[
\|x_{k+1}-A_{\infty}x_{k+1}\|_{\pi}\leq(\rho_{1}+\alpha\theta_{3})\|x_{k}-A_{\infty}x_{k}\|_{\pi}+2\alpha c_{2}\sqrt{2\varpi}\,d_{Fix(F)}(\tilde{x}_{k}),
\qquad (79)
\]

where $\theta_{3}:=2c_{2}\sqrt{N(\bar{L}^{2}+1)}/c_{1}$.

Second, to bound $d_{Fix(F)}(\tilde{x}_{k+1})$, one can first observe that

\[
\begin{aligned}
\tilde{F}&=(e_{1,k},\ldots,e_{N,k})\\
&=(\texttt{F}_{1}(\tilde{x}_{k})-\tilde{x}_{1,k},\ldots,\texttt{F}_{N}(\tilde{x}_{k})-\tilde{x}_{N,k})+h_{1,k}+h_{2,k},
\end{aligned}
\qquad (80)
\]

where $e_{i,k}:=\texttt{F}_{i}(x_{k}^{i})-\hat{x}_{i,k}$ and

\[
\begin{aligned}
h_{1,k}&:=(\texttt{F}_{1}(x_{k}^{1})-\texttt{F}_{1}(\tilde{x}_{k}),\ldots,\texttt{F}_{N}(x_{k}^{N})-\texttt{F}_{N}(\tilde{x}_{k})),\\
h_{2,k}&:=(\tilde{x}_{1,k}-\hat{x}_{1,k},\ldots,\tilde{x}_{N,k}-\hat{x}_{N,k}).
\end{aligned}
\]

Meanwhile, invoking Assumption 2 yields that

\[
\begin{aligned}
\|h_{1,k}\|^{2}&=\sum_{i=1}^{N}\|\texttt{F}_{i}(x_{k}^{i})-\texttt{F}_{i}(\tilde{x}_{k})\|^{2}\leq\sum_{i=1}^{N}L_{i}^{2}\|x_{k}^{i}-\tilde{x}_{k}\|^{2}\\
&\leq\bar{L}^{2}\|x_{k}-{\bf 1}_{N}\tilde{x}_{k}\|^{2},
\end{aligned}
\qquad (81)
\]

and by (76) and $\pi_{i}\in(0,1)$, one has that

\[
\|h_{2,k}\|^{2}=\sum_{i=1}^{N}\|\hat{x}_{i,k}-\tilde{x}_{i,k}\|^{2}\leq\|x_{k}-{\bf 1}_{N}\tilde{x}_{k}\|^{2}.
\qquad (82)
\]

Now, in view of (37), (80), (81), (82) and ${\bf 1}_{N}\tilde{x}_{k}=A_{\infty}x_{k}$, one has that, for $y^{*}=P_{Fix(F)}(F_{\alpha}(\tilde{x}_{k}))\in Fix(F)$,

\[
\begin{aligned}
\|\tilde{x}_{k+1}-y^{*}\|&=\|\tilde{x}_{k}+\alpha\tilde{F}-y^{*}\|\\
&=\|F_{\alpha}(\tilde{x}_{k})-y^{*}+\alpha(h_{1,k}+h_{2,k})\|\\
&\leq\|F_{\alpha}(\tilde{x}_{k})-y^{*}\|+\alpha(\|h_{1,k}\|+\|h_{2,k}\|)\\
&\leq d_{Fix(F)}(F_{\alpha}(\tilde{x}_{k}))+\frac{\alpha(\bar{L}+1)}{c_{1}}\|x_{k}-A_{\infty}x_{k}\|_{\pi},
\end{aligned}
\qquad (83)
\]

where (19) has been employed in the last inequality; recall that $F_{\alpha}:=Id+\alpha(F-Id)$.

To analyze the term $d_{Fix(F)}(F_{\alpha}(\tilde{x}_{k}))$ in (83), invoking Lemma 7 and (83) yields that

\[
\|\tilde{x}_{k+1}-y^{*}\|\leq\rho_{3}d_{Fix(F)}(\tilde{x}_{k})+\frac{\alpha(\bar{L}+1)}{c_{1}}\|x_{k}-A_{\infty}x_{k}\|_{\pi}.
\qquad (84)
\]

Combining (84) with $d_{Fix(F)}(\tilde{x}_{k+1})\leq\|\tilde{x}_{k+1}-y^{*}\|$ yields that

\[
d_{Fix(F)}(\tilde{x}_{k+1})\leq\rho_{3}d_{Fix(F)}(\tilde{x}_{k})+\frac{\alpha(\bar{L}+1)}{c_{1}}\|x_{k}-A_{\infty}x_{k}\|_{\pi}.
\qquad (85)
\]

Finally, by setting $z_{k}:=col(\|x_{k}-A_{\infty}x_{k}\|_{\pi},d_{Fix(F)}(\tilde{x}_{k}))$ for $k\geq 0$, invoking (79) and (85) results in

\[
z_{k+1}\leq\Theta(\alpha)z_{k},
\qquad (86)
\]

where $\Theta(\alpha)$ is defined in (42). Note that $A_{\infty}x_{k}={\bf 1}_{N}\tilde{x}_{k}$. At this step, Theorem 2 can be proved by following an analysis similar to that after (68) in the proof of Theorem 1.

References

  • [1] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed.   Springer, New York, 2017.
  • [2] A. Cegielski, Iterative Methods for Fixed Point Problems in Hilbert Spaces.   Springer, Heidelberg, 2012, vol. 2057.
  • [3] J. Liang, J. Fadili, and G. Peyré, “Convergence rates with inexact non-expansive operators,” Mathematical Programming, vol. 159, no. 1-2, pp. 403–434, 2016.
  • [4] J. M. Borwein, G. Li, and M. K. Tam, “Convergence rate analysis for averaged fixed point iterations in common fixed point problems,” SIAM Journal on Optimization, vol. 27, no. 1, pp. 1–33, 2017.
  • [5] M. Bravo, R. Cominetti, and M. Pavez-Signé, “Rates of convergence for inexact Krasnosel’skiĭ-Mann iterations in Banach spaces,” Mathematical Programming, no. 1-2, pp. 241–262, 2019.
  • [6] A. Themelis and P. Patrinos, “SuperMann: A superlinearly convergent algorithm for finding fixed points of nonexpansive operators,” IEEE Transactions on Automatic Control, vol. 64, no. 12, pp. 4875–4890, 2019.
  • [7] W. R. Mann, “Mean value methods in iteration,” Proceedings of the American Mathematical Society, vol. 4, no. 3, pp. 506–510, 1953.
  • [8] M. A. Krasnosel’skiĭ, “Two comments on the method of successive approximations,” Uspekhi Matematicheskikh Nauk, vol. 10, pp. 123–127, 1955.
  • [9] S. Reich, “Weak convergence theorems for nonexpansive mappings in Banach spaces,” Journal of Mathematical Analysis and Applications, vol. 67, no. 2, pp. 274–276, 1979.
  • [10] A. Nedić and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
  • [11] S. Liu, Z. Qiu, and L. Xie, “Convergence rate analysis of distributed optimization with projected subgradient algorithm,” Automatica, vol. 83, pp. 162–169, 2017.
  • [12] X. Li, L. Xie, and Y. Hong, “Distributed continuous-time nonsmooth convex optimization with coupled inequality constraints,” IEEE Transactions on Control of Network Systems, vol. 7, no. 1, pp. 74–84, 2020.
  • [13] V. G. L. Mejia, F. L. Lewis, Y. Wan, E. N. Sanchez, and L. Fan, “Solutions for multi-agent pursuit-evasion games on communication graphs: Finite-time capture and asymptotic behaviors,” IEEE Transactions on Automatic Control, vol. 65, no. 5, pp. 1911–1923, 2020.
  • [14] W. Ren and R. W. Beard, Distributed Consensus in Multi-Vehicle Cooperative Control.   London, U.K.: Springer-Verlag, 2008.
  • [15] X. Li, M. Z. Q. Chen, and H. Su, “Quantized consensus of multi-agent networks with sampled data and Markovian interaction links,” IEEE Transactions on Cybernetics, vol. 49, no. 5, pp. 1816–1825, 2019.
  • [16] D. Fullmer and A. S. Morse, “A distributed algorithm for computing a common fixed point of a finite family of paracontractions,” IEEE Transactions on Automatic Control, vol. 63, no. 9, pp. 2833–2843, 2018.
  • [17] D. Fullmer, J. Liu, and A. S. Morse, “An asynchronous distributed algorithm for computing a common fixed point of a family of paracontractions,” in Proceedings of 55th Conference on Decision and Control, Las Vegas, USA, 2016, pp. 2620–2625.
  • [18] J. Liu, D. Fullmer, A. Nedić, T. Başar, and A. S. Morse, “A distributed algorithm for computing a common fixed point of a family of strongly quasi-nonexpansive maps,” in Proceedings of American Control Conference, Seattle, USA, 2017, pp. 686–690.
  • [19] S. S. Alaviani and N. Elia, “Distributed multi-agent convex optimization over random digraphs,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 986–998, 2020.
  • [20] X. Li and G. Feng, “Distributed algorithms for computing a common fixed point of a group of nonexpansive operators,” IEEE Transactions on Automatic Control, vol. 66, no. 5, pp. 2130–2145, 2021.
  • [21] X. Li and L. Xie, “Distributed algorithms for computing a fixed point of multi-agent nonexpansive operators,” Automatica, vol. 122, p. 109286, 2020.
  • [22] I. Necoara, P. Richtárik, and A. Patrascu, “Randomized projection methods for convex feasibility problems: Conditioning and convergence rates,” SIAM Journal on Optimization, vol. 29, no. 4, pp. 2814–2852, 2019.
  • [23] A. Y. Kruger, D. R. Luke, and N. H. Thao, “Set regularities and feasibility problems,” Mathematical Programming, vol. 168, no. 1-2, pp. 279–311, 2018.
  • [24] S. Mou, J. Liu, and A. S. Morse, “A distributed algorithm for solving a linear algebraic equation,” IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2863–2878, 2015.
  • [25] P. Wang, W. Ren, and Z. Duan, “Distributed algorithm to solve a system of linear equations with unique or multiple solutions from arbitrary initializations,” IEEE Transactions on Control of Network Systems, vol. 6, no. 1, pp. 82–93, 2019.
  • [26] S. S. Alaviani and N. Elia, “A distributed algorithm for solving linear algebraic equations over random networks,” in Proceedings of 57th Conference on Decision and Control, Miami Beach, FL, USA, 2018, pp. 83–88.
  • [27] X. Li, M. Meng, and L. Xie, “A linearly convergent algorithm for multi-agent quasi-nonexpansive operators in real Hilbert spaces,” in Proceedings of 59th IEEE Conference on Decision and Control, Jeju Island, Korea, 2020, pp. 4903–4908.
  • [28] H. H. Bauschke and J. M. Borwein, “On projection algorithms for solving convex feasibility problems,” SIAM Review, vol. 38, no. 3, pp. 367–426, 1996.
  • [29] H. H. Bauschke, D. Noll, and H. M. Phan, “Linear and strong convergence of algorithms involving averaged nonexpansive operators,” Journal of Mathematical Analysis and Applications, vol. 421, no. 1, pp. 1–20, 2015.
  • [30] G. Banjac and P. J. Goulart, “Tight global linear convergence rate bounds for operator splitting methods,” IEEE Transactions on Automatic Control, vol. 63, no. 12, pp. 4126–4139, 2018.
  • [31] A. Cegielski, S. Reich, and R. Zalas, “Regular sequences of quasi-nonexpansive operators and their applications,” SIAM Journal on Optimization, vol. 28, no. 2, pp. 1508–1532, 2018.
  • [32] L. Debnath and P. Mikusinski, Introduction to Hilbert Spaces with Applications.   Academic Press, 2005.
  • [33] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed.   MIT press, 2018.
  • [34] R. Xin, A. K. Sahu, U. A. Khan, and S. Kar, “Distributed stochastic optimization with gradient tracking over strongly-connected networks,” in Proceedings of 57th Conference on Decision and Control, Nice, France, 2019.
  • [35] S. Matsushita, “On the convergence rate of the Krasnosel’skiĭ-Mann iteration,” Bulletin of the Australian Mathematical Society, vol. 96, no. 1, pp. 162–170, 2017.
  • [36] J. Zhang and K. You, “Decentralized stochastic gradient tracking for empirical risk minimization,” arXiv preprint arXiv:1909.02712, 2019.
  • [37] X. Li, X. Yi, and L. Xie, “Distributed online optimization for multi-agent networks with coupled inequality constraints,” IEEE Transactions on Automatic Control, in press, doi: 10.1109/TAC.2020.3021011, 2020.
  • [38] T. Charalambous, M. G. Rabbat, M. Johansson, and C. N. Hadjicostis, “Distributed finite-time computation of digraph parameters: Left-eigenvector, out-degree and spectrum,” IEEE Transactions on Control of Network Systems, vol. 3, no. 2, pp. 137–148, 2015.
  • [39] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed.   New York, NY: Cambridge University Press, 2012.
  • [40] X. Li, L. Xie, and Y. Hong, “Distributed aggregative optimization over multi-agent networks,” arXiv preprint arXiv:2005.13436, 2020.
  • [41] D. Jakovetić, J. M. F. Moura, and J. Xavier, “Linear convergence rate of a class of distributed augmented Lagrangian algorithms,” IEEE Transactions on Automatic Control, vol. 60, no. 4, pp. 922–936, 2014.
  • [42] A. Nedić, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM Journal on Optimization, vol. 27, no. 4, pp. 2597–2633, 2017.
  • [43] J. Xu, S. Zhu, Y. C. Soh, and L. Xie, “Convergence of asynchronous distributed gradient methods over stochastic networks,” IEEE Transactions on Automatic Control, vol. 63, no. 2, pp. 434–448, 2018.
  • [44] R. Xin and U. A. Khan, “Distributed heavy-ball: A generalization and acceleration of first-order methods with gradient tracking,” IEEE Transactions on Automatic Control, vol. 65, no. 6, pp. 2627–2633, 2020.
  • [45] S. Pu, W. Shi, J. Xu, and A. Nedić, “Push-pull gradient methods for distributed optimization in networks,” IEEE Transactions on Automatic Control, in press, doi: 10.1109/TAC.2020.2972824, 2020.
  • [46] S. Liang, L. Y. Wang, and G. Yin, “Exponential convergence of distributed primal-dual convex optimization algorithm without strong convexity,” Automatica, vol. 105, pp. 298–306, 2019.
  • [47] T. Tatarenko and A. Nedić, “Geometric convergence of distributed gradient play in games with unconstrained action sets,” arXiv preprint arXiv:1907.07144, 2019.
  • [48] M. Bianchi, G. Belgioioso, and S. Grammatico, “A distributed proximal-point algorithm for Nash equilibrium seeking under partial-decision information with geometric convergence,” arXiv preprint arXiv:1910.11613, 2019.
  • [49] M. Bianchi and S. Grammatico, “Fully distributed Nash equilibrium seeking over time-varying communication networks with linear convergence rate,” arXiv preprint arXiv:2003.10871, 2020.
  • [50] M. Meng and X. Li, “On the linear convergence of distributed Nash equilibrium seeking for multi-cluster games under partial-decision information,” arXiv preprint arXiv:2005.06923, 2020.
  • [51] P.-W. Wang and C.-J. Lin, “Iteration complexity of feasible descent methods for convex optimization,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1523–1548, 2014.