
DOT and DOP: Linearly Convergent Algorithms for Finding Fixed Points of Multi-Agent Operators

Xiuxian Li, Member, IEEE, Min Meng, and Lihua Xie, Fellow, IEEE. This research was supported by Ministry of Education, Singapore, under grant AcRF TIER 1-2019-T1-001-088 (RG72/19), the National Natural Science Foundation of China under Grant 62003243, Shanghai Municipal Commission of Science and Technology under Grant 19511132101, and Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0100. (Corresponding author: Lihua Xie.) X. Li and M. Meng are with the Department of Control Science and Engineering, College of Electronics and Information Engineering, Institute for Advanced Study, and Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai, China (e-mail: xli@tongji.edu.cn, mengmin@tongji.edu.cn). L. Xie is with the School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore (e-mail: elhxie@ntu.edu.sg).
Abstract

This paper investigates the distributed fixed point finding problem for a global operator over a directed and unbalanced multi-agent network, where the global operator is quasi-nonexpansive and only partially accessible to each individual agent. Two cases are addressed, that is, the global operator is sum separable and block separable. In the first case, the global operator is the sum of local operators, which are assumed to be Lipschitz, and each local operator is privately known to one individual agent. To deal with this scenario, a distributed (or decentralized) algorithm, called the Distributed quasi-averaged Operator Tracking algorithm (DOT), is proposed and rigorously analyzed, and it is shown that the algorithm converges to a fixed point of the global operator at a linear rate under a linear regularity condition, which is strictly weaker than the strong convexity assumption on cost functions in the existing convex optimization literature. In the second scenario, the global operator is composed of a group of local block operators which are Lipschitz and can each be accessed only by one individual agent. In this setup, a distributed algorithm, called the Distributed quasi-averaged Operator Playing algorithm (DOP), is developed and shown to be linearly convergent to a fixed point of the global operator under the linear regularity condition. The studied problems provide a unified framework for many interesting problems. As examples, the proposed DOT and DOP are exploited to deal with distributed optimization and multi-player games under partial-decision information. Finally, numerical examples are presented to corroborate the theoretical results.

Index Terms:
Distributed algorithms, multi-agent networks, linear convergence, fixed point, bounded linear regularity, distributed optimization, game, real Hilbert spaces.

I Introduction

Fixed point theory in real Hilbert spaces is known as a powerful tool in a variety of domains such as optimization, engineering, economics, game theory, and nonlinear numerical analysis [1, 2]. Generally speaking, the main goal is to devise algorithms for computing a fixed point of an operator.

Up to now, plenty of research has addressed centralized algorithms for finding a fixed point of nonexpansive or quasi-nonexpansive operators in the literature [3, 4, 5, 6], where a central/global coordinator or computing unit is able to access all information of the studied problem. It is known that the typical Picard iteration in general does not converge for nonexpansive operators (e.g., a simple example is the operator -Id with nonzero initial points, where Id is the identity operator), although it usually performs well for contractive operators. For nonexpansive operators, one prominent algorithm is the so-called Krasnosel'skiĭ-Mann (KM) iteration [7, 8], which is shown to converge weakly to a fixed point of a nonexpansive operator in real Hilbert spaces under mild conditions [9].

In recent decades, distributed (or decentralized) algorithms have been an active topic in a wide range of domains, including fixed point theory, computer science, game theory, and control theory, mostly inspired by the fact that distributed algorithms, in contrast with centralized ones, possess a host of fascinating advantages, such as low cost, robustness to failures or antagonistic attacks, privacy preservation, and low computational complexity. Distributed algorithms do not assume global/central coordinators or computing units; instead, a finite group of agents (e.g., computing units, robots, and so on), who may be spatially separated, aim to solve a global problem in a collaborative manner through local information exchanges. Therein, local information exchanges are often depicted by a simple graph, connoting that every agent can interact with only a subset of agents, instead of all agents. Along this line, distributed algorithms have thus far been investigated extensively under both fixed and time-varying communication graphs in distributed optimization [10, 11, 12], game theory [13], and multi-agent systems/networks [14, 15], to name just a few.

In more recent years, distributed algorithms have received growing attention in the fixed point finding problem [16, 17, 18, 19, 20, 21]. For instance, a synchronous distributed algorithm was proposed in [16] for computing a common fixed point of a collection of paracontraction operators, and for the same problem, an asynchronous distributed algorithm was developed in [17]. Meanwhile, different from paracontraction operators, another type of operators (i.e., strongly quasi-nonexpansive operators) was addressed for the common fixed point seeking problem in [18] by designing a distributed algorithm in the presence of time-varying delays under the assumption that the communication graph is repeatedly jointly strongly connected. It should be noted that many interesting problems boil down to the common fixed point finding problem, such as convex feasibility problems [22, 23] and the problem of solving linear algebraic equations in a distributed fashion [24, 25, 26], and so forth. For example, the linear algebraic equation solving problem can be formulated in a way that does not require knowledge of the distribution of random communication graphs, accommodating asynchronous updates and/or unreliable interconnection protocols. Notice that all the aforementioned works are in the Euclidean space. Regarding the Hilbert space, the authors in [19] investigated distributed optimization under random and directed interconnection graphs, where a distributed algorithm was proposed and shown to be convergent in both the almost sure and mean square senses, along with the introduction of a novel convex minimization problem over the fixed-value point set of a nonexpansive random operator. In addition, the authors in [20] took into account the common fixed point finding problem for a finite collection of nonexpansive operators, where two distributed algorithms were proposed with a full coordinate updating and a random block-coordinate updating, respectively; compared with [16, 17, 18], the contributions of [20] lie in the study of real Hilbert spaces, the consideration of operator errors, and the establishment of a sublinear convergence speed. Furthermore, a more general scenario, where no common fixed points are assumed for all local operators, was investigated in [21], where two distributed algorithms were devised to resolve the problem. It is noteworthy that, to the best of our knowledge, [21] is the first to investigate the fixed point finding problem of a global operator in real Hilbert spaces, where the global operator is an average of local operators over a multi-agent network. Nevertheless, the convergence rate is not analyzed for the algorithms proposed in [21].

Motivated by the above facts, the purpose of this paper is to further investigate the fixed point finding problem of a quasi-nonexpansive global operator over a time-invariant, directed, and unbalanced communication graph. Two scenarios are taken into account, that is, the global operator is sum separable and block separable. In the first case, the global operator is composed of a sum of local operators, and in the second case, the global operator comprises a family of local block operators. That is, in both cases the global operator is separable and consists of local operators, which are assumed to be Lipschitz and are only privately accessible to each individual agent, thereby requiring all agents to tackle the global problem in a collaborative manner. The contributions of this paper are threefold.

  1.

    For the first case, a distributed algorithm, called distributed quasi-averaged operator tracking algorithm (DOT), is developed and shown to be convergent to a fixed point of the global operator at a linear rate under a linear regularity condition. Compared with the closely related work [21], where no convergence speed is provided, a different algorithm is developed here and shown to be convergent at a linear rate. It should be noted that the problem here is more general than the common fixed point seeking problem [17, 16, 18, 20], where all local operators are assumed to have at least one common fixed point, while this assumption is dropped here. As a special case, linear convergence can also be ensured for the common fixed point seeking problem. In contrast, [20] only provides a sublinear rate for nonexpansive operators.

  2.

    For the second case, a distributed algorithm, called distributed quasi-averaged operator playing algorithm (DOP), is proposed, which is shown to be linearly convergent to a fixed point of the global operator under the linear regularity condition. To the best of our knowledge, this is the first work to study the block separable case in a decentralized manner.

  3.

    The studied setups in this paper provide a unified framework for a host of interesting problems. For example, the proposed DOT and DOP algorithms can be leveraged to resolve distributed optimization and multi-player games under partial-decision information.

A preliminary version of this paper was presented at a conference [27]. The present paper extends the results of [27] in several ways. [27] only considers the sum separable case without providing detailed proofs of the main result. In comparison, a full proof of the main result (i.e., Theorem 1) for the sum separable case is provided here. Besides, one more scenario is investigated in this paper, i.e., the block separable case (see Theorem 2). Also, one more application is presented here, i.e., multi-player games under partial-decision information, along with one more numerical example.

The structure of this paper is as follows. Some basic knowledge and the problem formulation are introduced in Section II, and the first case with the global operator being sum separable is addressed in Section III, followed by the second case with the global operator being block-coordinate separable in Section IV. Several applications are provided in Section V. Some numerical examples are presented in Section VI, and the conclusion is drawn in Section VII.

II Preliminaries and Problem Formulation

II-A Notations

Let \mathcal{H} be a real Hilbert space with inner product \langle\cdot,\cdot\rangle and associated norm \|\cdot\|. Define [N]:=\{1,2,\ldots,N\} for any integer N>0, and denote by col(z_{1},\ldots,z_{k}) the column vector or matrix obtained by stacking up z_{i},i\in[k]. Given an integer n>0, denote by \mathbb{R}, \mathbb{R}^{n}, \mathbb{R}^{n\times n}, and \mathbb{N} the sets of real numbers, n-dimensional real vectors, n\times n real matrices, and nonnegative integers, respectively. Let P_{X}(\cdot) represent the projection operator onto a closed and convex set X\subseteq\mathcal{H}, i.e., P_{X}(z):=\mathop{\arg\min}_{x\in X}\|z-x\| for z\in\mathcal{H}. Moreover, denote by I, Id, and \otimes the identity matrix of appropriate dimension, the identity operator, and the Kronecker product, respectively. Let {\bf 1}_{n} be an n-dimensional vector with all entries 1 for an integer n>0, and the subscript is omitted when the dimension is clear from the context. d_{X}(z):=\inf_{x\in X}\|z-x\| denotes the distance from z\in\mathcal{H} to the set X. For an operator T:\mathcal{H}\to\mathcal{H}, define Fix(T):=\{x\in\mathcal{H}\,|\,T(x)=x\} to be the set of fixed points of T and T_{\beta}:=Id+\beta(T-Id), called a \beta-relaxation of T with a relaxation parameter \beta\geq 0. Denote by M_{\infty} the infinite power of a square matrix M, i.e., M_{\infty}=\lim_{k\to\infty}M^{k}, if it exists, and let \rho(M) and det(M) be the spectral radius and determinant of M, respectively.

II-B Operator Theory

Consider an operator T:S\to\mathcal{H} for a nonempty set S\subseteq\mathcal{H}. T is called L-Lipschitz (continuous) for a constant L>0 if

\|T(x)-T(y)\|\leq L\|x-y\|,~~~\forall x,y\in S. (1)

Further, T is called nonexpansive (resp. contractive) if L=1 (resp. L<1), quasi-nonexpansive (QNE) if (1) holds with L=1 for all x\in S and y\in Fix(T), and \rho-strongly quasi-nonexpansive (\rho-SQNE) for \rho>0 if it holds that for all x\in S and y\in Fix(T),

\|T(x)-y\|^{2}\leq\|x-y\|^{2}-\rho\|x-T(x)\|^{2}. (2)

T is called \eta-averaged (resp. \eta-quasi-averaged) for \eta\in(0,1) if it can be written as

T=(1-\eta)Id+\eta R, (3)

where R is some nonexpansive (resp. quasi-nonexpansive) operator.

The aforementioned concepts can be found in Section 4 of [1], while quasi-averaged operators are defined here as an analogue of averaged operators. It is well known that when T is QNE, the fixed point set Fix(T) is closed and convex (cf., Corollary 4.24 in [1]).

The operator T is said to be boundedly linearly regular if for any bounded set \mathbb{K}\subseteq S, there exists a constant \omega>0 such that

d_{Fix(T)}(x)\leq\omega\|x-Tx\|,~~~\forall x\in\mathbb{K}

and T is called linearly regular if \omega is independent of \mathbb{K} [28].

It is easy to observe that (bounded) linear regularity means that the distance between x and Tx is lower bounded by a scaled distance from the vector x to the set Fix(T), and bounded linear regularity is weaker than linear regularity. For instance, the projection operator P_{C} is linearly regular with constant 1, where C\subseteq\mathcal{H} is nonempty, closed and convex, and when \mathcal{H}=\mathbb{R}, the thresholder operator

T(x)=\left\{\begin{array}{ll}0,&\text{if }|x|\leq 1\\ x-1,&\text{if }x>1\\ x+1,&\text{if }x<-1\end{array}\right. (7)

is boundedly linearly regular with \omega=\max\{|x|,1\}, but not linearly regular [29]. Moreover, it has been shown in [30] that linear regularity is necessary and sufficient for global Q-linear convergence to a fixed point for an SQNE operator. More details on (bounded) linear regularity can be found in [29, 30, 31].
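To make the example above concrete, the following short computation (our own verification, not taken from [29]) checks both claims for the thresholder operator in (7).

```latex
% Fixed points: T(x)=x forces x=0, so Fix(T)=\{0\} and d_{Fix(T)}(x)=|x|.
% Residual: \|x-Tx\|=|x| if |x|\le 1 and \|x-Tx\|=1 if |x|>1, i.e., \|x-Tx\|=\min\{|x|,1\}.
\[
  \frac{d_{Fix(T)}(x)}{\|x-Tx\|}
  = \frac{|x|}{\min\{|x|,1\}}
  = \begin{cases}
      1,   & 0<|x|\le 1,\\
      |x|, & |x|>1.
    \end{cases}
\]
% Hence d_{Fix(T)}(x)\le \max\{|x|,1\}\,\|x-Tx\| on any bounded set \mathbb{K},
% but no single \omega>0 works on all of \mathbb{R}, since the ratio grows without bound as |x|\to\infty.
```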

II-C Problem Formulation

The aim of this paper is to compute a fixed point of a global operator F:\mathcal{H}\to\mathcal{H}, i.e.,

\text{find}~x\in\mathcal{H},~~\text{s.t.}~~x\in Fix(F). (8)

It is worth mentioning that problem (8) in real Hilbert spaces (but not in \mathbb{R}^{n}) can find applications, for example, in digital signal processing and L^{2}([0,\pi]) (i.e., the space of all square integrable functions f:[0,\pi]\to\mathbb{R}) [32], and so on.

In this paper, no global/central coordinator, master, or computing unit is assumed to exist for problem (8); instead, the global operator F is separable and consists of local operators, which are privately accessible to individual agents in a network. Briefly speaking, only partial information of F can be privately known by each agent in the network, which is interesting and realistic in large-scale problems, as extensively studied in distributed optimization and distributed machine learning, and so on. Specifically, we focus on two scenarios: 1) F is sum separable and 2) F is block separable, as elaborated below.

Case 1: Sum Separable. F is sum separable over a network of N agents, that is,

\text{{\bf(Problem I)}~~~~~find}~x\in Fix(F),~~F=\frac{1}{N}\sum_{i=1}^{N}F_{i}, (9)

where each F_{i}:\mathcal{H}\to\mathcal{H} is a local operator, only privately accessible to agent i for i\in[N]. Note that the formulation (9) is also investigated in [21], where, however, no convergence rates are provided, while a linear convergence speed is established in this paper. Moreover, the formulation (9) is more general than the common fixed point finding problem [16, 17, 18, 19, 20, 21] and the linear algebraic equation solving problem [24, 25, 26], as discussed in the introduction.

Case 2: Block (or Block-Coordinate) Separable. In this case, \mathcal{H}=\mathcal{H}_{1}\oplus\cdots\oplus\mathcal{H}_{N} is the direct Hilbert sum, where every \mathcal{H}_{i},i\in[N] is a real Hilbert space, with the same inner product \langle\cdot,\cdot\rangle and associated norm \|\cdot\| as \mathcal{H}, i.e., \mathcal{H} and \mathcal{H}_{1} coincide when N=1. Let x=(x_{1},\ldots,x_{N}) denote a generic vector in \mathcal{H} with x_{i}\in\mathcal{H}_{i},i\in[N]. Then the global operator F can be written in a block-coordinate form F=(\texttt{F}_{1},\ldots,\texttt{F}_{N}), where \texttt{F}_{i}:\mathcal{H}\to\mathcal{H}_{i} for i\in[N], i.e., F(x)=(\texttt{F}_{1}(x),\ldots,\texttt{F}_{N}(x)) for x\in\mathcal{H}. In this setup, F is block (or block-coordinate) separable over a network of N agents, i.e.,

\text{{\bf(Problem II)}~~~~~find}~x\in Fix(F),~F=(\texttt{F}_{1},\ldots,\texttt{F}_{N}), (10)

where each \texttt{F}_{i} is a local operator, only privately accessible to agent i for all i\in[N]. Also, each agent i only knows its own vector x_{i} with no knowledge of x_{j} for all j\neq i. To the best of our knowledge, this scenario is novel and also practical. For example, in multi-player games, it is difficult or impossible for a global/central coordinator or master to know all the coordinates of x due to privacy.

With the above discussion, the objective of this paper is to develop distributed (or decentralized) algorithms to solve problems I and II in real Hilbert spaces.

Remark 1.

To illustrate applications of the above studied problems, a simple example in function approximation is provided here, which is useful, e.g., in reinforcement learning (cf., Chapter 9 in [33]). Consider a (reward) function \texttt{r}:\mathbb{R}^{n}\to\mathbb{R}, which may be unknown in reality, and let us approximate it by \sum_{j=1}^{\infty}w_{j}\exp(-\frac{\|x-c_{j}\|^{2}}{2\sigma_{j}^{2}}) based on radial basis functions (RBFs), where c_{j} and \sigma_{j} are some prespecified parameters (e.g., the feature's center state and the feature's width in reinforcement learning, respectively), w=(w_{1},w_{2},\ldots)\in\ell^{2} is the variable to be optimized, and \ell^{2} is the space of square-summable sequences, which is an infinite-dimensional Hilbert space. Then the goal is to minimize the approximation error f(w):=\frac{1}{N}\sum_{i=1}^{N}f_{i}(w) with f_{i}(w):=\frac{1}{N_{i}}\sum_{l=1}^{N_{i}}|\texttt{r}(s_{i,l})-\sum_{j=1}^{\infty}w_{j}\exp(-\frac{\|s_{i,l}-c_{j}\|^{2}}{2\sigma_{j}^{2}})|^{2}, where \{s_{i,l}\}_{l=1}^{N_{i}} is a set of sample data privately known by agent i. It is easy to verify that f_{i} is differentiable and convex with L_{i}-Lipschitz gradient for some constant L_{i}>0, thus implying that the operator F_{i}:w\mapsto w-\xi\nabla f_{i}(w) is nonexpansive for a constant \xi\in(0,2/L) with L:=\max_{i\in[N]}L_{i} (Lemma 4 in [3]). Then the problem is equivalent to the fixed point finding problem (9) with \mathcal{H}=\ell^{2}. More applications will be given in Section V.
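To make this construction concrete, the following Python sketch (our own illustration, not the authors' code; the truncation to a finite number of RBF features, the sample arrays, and the names rbf_features and local_operator are hypothetical) builds the local gradient-step operator F_i: w ↦ w − ξ∇f_i(w) from agent i's samples.

```python
import numpy as np

def rbf_features(s, centers, widths):
    """Feature vector with entries exp(-||s - c_j||^2 / (2 sigma_j^2))."""
    return np.exp(-np.sum((centers - s) ** 2, axis=1) / (2.0 * widths ** 2))

def local_operator(samples, rewards, centers, widths, xi):
    """Return F_i(w) = w - xi * grad f_i(w) for the least-squares RBF fit of agent i."""
    Phi = np.stack([rbf_features(s, centers, widths) for s in samples])  # (N_i, J)

    def F_i(w):
        residual = Phi @ w - rewards                  # prediction errors on agent i's samples
        grad = 2.0 * Phi.T @ residual / len(rewards)  # gradient of f_i at w
        return w - xi * grad                          # gradient step (nonexpansive for xi < 2/L)

    return F_i

# Toy usage with J = 4 truncated features and 10 local samples (all data made up).
rng = np.random.default_rng(0)
centers, widths = rng.normal(size=(4, 3)), np.ones(4)
samples, rewards = rng.normal(size=(10, 3)), rng.normal(size=10)
F_i = local_operator(samples, rewards, centers, widths, xi=0.05)
print(F_i(np.zeros(4)))
```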

II-D Graph Theory

The communication pattern among all agents is captured by a simple directed graph, denoted by \mathcal{G}=(\mathcal{V},\mathcal{E}), where \mathcal{V}=[N] and \mathcal{E}\subseteq\mathcal{V}\times\mathcal{V} are the node (or agent) and edge sets, respectively. An edge (i,j)\in\mathcal{E} means that agent i can send information to agent j, but not vice versa, and agent i (resp. j) is called an in-neighbor or simply neighbor (resp. out-neighbor) of agent j (resp. i). A graph is called undirected if and only if (i,j)\in\mathcal{E} amounts to (j,i)\in\mathcal{E}, and directed otherwise. A directed path is defined to be a sequence of adjacent edges (i_{1},i_{2}),(i_{2},i_{3}),\ldots,(i_{l-1},i_{l}), and a graph is said to be strongly connected if any two nodes can be connected by a directed path from one to the other.

II-E Assumptions

With the above preparations, we are now ready to impose some standard assumptions.

Assumption 1 (Strong Connectivity).

The communication graph \mathcal{G} is strongly connected. Moreover,

  1.

    two matrices A=(a_{ij})\in\mathbb{R}^{N\times N} (row-stochastic) and B=(b_{ij})\in\mathbb{R}^{N\times N} (column-stochastic) are arbitrarily assigned to \mathcal{G} with a_{ii}>0, b_{ii}>0 for all i\in[N];

  2.

    denote the left stochastic eigenvector of A (resp. right stochastic eigenvector of B) associated with the eigenvalue 1 by \pi=col(\pi_{1},\ldots,\pi_{N}) (resp. \nu=col(\nu_{1},\ldots,\nu_{N})) such that A_{\infty}={\bf 1}_{N}\pi^{\top} (resp. B_{\infty}=\nu{\bf 1}_{N}^{\top}) and \pi_{i}>0, \nu_{i}>0 for all i\in[N].

It is worth mentioning that the directed graph is not required to be balanced in this paper. Note that, similar to [34], A and B are consistent with \mathcal{G}, that is, a_{ij}>0 and b_{ij}>0 if and only if (j,i)\in\mathcal{E} for i\neq j. Notice that A and B do not need to be doubly stochastic. Additionally, the property in Assumption 1.2 can be ensured by the strong connectivity of \mathcal{G} [34].

Assumption 2.

F_{i} is Lipschitz with constant L_{i} for all i\in[N], i.e., \|F_{i}(x)-F_{i}(y)\|\leq L_{i}\|x-y\|,~\forall x,y\in\mathcal{H}.

Assumption 3.
  1.

    F is quasi-nonexpansive with Fix(F)\neq\emptyset.

  2.

    F is linearly regular, i.e., there exists a constant \kappa>0 such that

    d_{Fix(F)}(x)\leq\kappa\|F(x)-x\|,~~~\forall x\in\mathcal{H}. (11)
Remark 2.

Note that Assumption 3 is made only for the global operator F, and is not required for the local operators F_{i}. In addition, the linear regularity is strictly weaker than the strong convexity of cost functions (Section III in [30]), where the linear regularity of functions refers to operators involving the functions' gradients, as shown in Section V-A.

III The DOT Algorithm for Problem I

This section aims to develop a distributed algorithm for tackling problem (9) which can converge at a linear rate. Without loss of generality, the vectors in \mathcal{H} are viewed as column vectors in this section.

For problem (9), if F can be known by a global/central computing unit (or coordinator), then a famous centralized algorithm, called the KM iteration [35], can be exploited, i.e.,

x_{k+1}=x_{k}+\alpha_{k}(F(x_{k})-x_{k}), (12)

where \{\alpha_{k}\}_{k\in\mathbb{N}} is a sequence of relaxation parameters with \alpha_{k}\in[0,1]. Note that the KM iteration usually applies to nonexpansive operators, but it still works for the quasi-nonexpansive operators considered here under the linear regularity condition in Assumption 3. However, the centralized iteration (12) is not realistic here since no global/central computing unit (or coordinator) exists in our setting, which hence motivates us to devise distributed (or decentralized) algorithms based only on local information exchanges among all agents.

Motivated by the classical KM iteration and the tracking techniques such as those in [36, 34, 37], a distributed quasi-averaged operator tracking algorithm (DOT) is proposed as

x_{i,k+1} =\sum_{j=1}^{N}a_{ij}x_{j,k}+\alpha\Big{(}\frac{y_{i,k}}{w_{i,k}}-\sum_{j=1}^{N}a_{ij}x_{j,k}\Big{)}, (13a)
y_{i,k+1} =\sum_{j=1}^{N}b_{ij}y_{j,k}+F_{i}(x_{i,k+1})-F_{i}(x_{i,k}), (13b)
w_{i,k+1} =\sum_{j=1}^{N}b_{ij}w_{j,k}, (13c)

where x_{i,k} is agent i's estimate of a fixed point of the global operator F at time k\geq 0 for all i\in[N], and \alpha\in(0,1) is the stepsize to be determined. Set the initial conditions as: arbitrary x_{i,0}\in\mathcal{H}, y_{i,0}=F_{i}(x_{i,0}), and w_{i,0}=1 for all i\in[N]. It is noteworthy that only neighboring agents are involved in (13) for each agent, since a_{ij}=0 and b_{ij}=0 when agent j is not a neighbor of agent i.

Roughly speaking, y_{i,k} is employed to track the weighted sum \nu_{i}\sum_{j=1}^{N}F_{j}(x_{j,k}) of the local operators, and meanwhile w_{i,k} is a scalar used to track N\nu_{i} in order to counteract the imbalance of the matrix B in (13b).

For (13c), it is easy to verify that each w_{i,k} converges exponentially to N\nu_{i}. Moreover, invoking the method in [38], the final value N\nu_{i} can be evaluated by each agent i in finite time in a distributed manner. Because of this, without loss of generality, algorithm (13) can be rewritten by replacing w_{i,k} with N\nu_{i}, as in Algorithm 1.

To facilitate the following analysis, Algorithm 1 can be written in a compact form

x_{k+1} =\mathbf{A}x_{k}+\alpha\Big{[}\frac{1}{N}(D_{\nu}^{-1}\otimes Id)y_{k}-\mathbf{A}x_{k}\Big{]}, (14)
y_{k+1} =\mathbf{B}y_{k}+\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k}), (15)

where x_{k},y_{k} are the concatenated vectors of x_{i,k},y_{i,k}, respectively, \mathbf{A}:=A\otimes Id, \mathbf{B}:=B\otimes Id, D_{\nu}:=diag\{\nu_{1},\ldots,\nu_{N}\}, and \mathbf{F}(z):=col(F_{1}(z_{1}),\ldots,F_{N}(z_{N})) for a vector z=col(z_{1},\ldots,z_{N})\in\mathcal{H}^{N}:=\mathcal{H}\times\cdots\times\mathcal{H} (the N-fold Cartesian product of \mathcal{H}).

Algorithm 1 Distributed Quasi-Averaged Operator Tracking (DOT)
1:  Initialization: Stepsize \alpha in (30), communication matrices A and B, and local initial conditions x_{i,0}\in\mathcal{H} and y_{i,0}=F_{i}(x_{i,0}) for all i\in[N].
2:  Iterations: Step k\geq 0: update for each i\in[N]:
x_{i,k+1} =\sum_{j=1}^{N}a_{ij}x_{j,k}+\alpha\Big{(}\frac{y_{i,k}}{N\nu_{i}}-\sum_{j=1}^{N}a_{ij}x_{j,k}\Big{)}, (16a)
y_{i,k+1} =\sum_{j=1}^{N}b_{ij}y_{j,k}+F_{i}(x_{i,k+1})-F_{i}(x_{i,k}). (16b)
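For illustration only (a minimal sketch, not the authors' implementation), the following Python code runs the DOT updates (16a)-(16b) on made-up data; the weights \nu_{i} are computed directly from B here instead of by the finite-time distributed procedure of [38], and the graph and local operators are arbitrary placeholders.

```python
import numpy as np

def run_dot(A, B, local_ops, x0, alpha, num_iters):
    """DOT (Algorithm 1): consensus on x with a tracking variable y for the global operator."""
    N = A.shape[0]
    # nu: right eigenvector of the column-stochastic B for eigenvalue 1, scaled so sum(nu) = 1.
    eigvals, eigvecs = np.linalg.eig(B)
    nu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    nu = nu / nu.sum()

    x = np.array(x0, dtype=float)                            # row i is agent i's estimate x_{i,k}
    y = np.array([local_ops[i](x[i]) for i in range(N)])     # y_{i,0} = F_i(x_{i,0})
    for _ in range(num_iters):
        mix = A @ x                                          # sum_j a_ij x_{j,k}
        x_new = mix + alpha * (y / (N * nu[:, None]) - mix)  # update (16a)
        F_new = np.array([local_ops[i](x_new[i]) for i in range(N)])
        F_old = np.array([local_ops[i](x[i]) for i in range(N)])
        y = B @ y + F_new - F_old                            # update (16b)
        x = x_new
    return x

# Toy test: F_i(x) = x - 0.2*(x - t_i) is a gradient step on ||x - t_i||^2, so Fix(F) = {mean of t_i}.
N, d = 5, 3
rng = np.random.default_rng(1)
targets = rng.normal(size=(N, d))
ops = [lambda z, t=t: z - 0.2 * (z - t) for t in targets]
W = np.ones((N, N)) / N   # complete graph with uniform weights (row- and column-stochastic)
print(run_dot(W, W, ops, x0=np.zeros((N, d)), alpha=0.3, num_iters=300)[0])  # ~ targets.mean(axis=0)
```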

To proceed, it is helpful to introduce two new weighted norms for the Cartesian product \mathcal{H}^{N}, i.e.,

\|z\|_{\pi}:=\sqrt{\sum_{i=1}^{N}\pi_{i}\|z_{i}\|^{2}},~~~\|z\|_{\nu}:=\sqrt{\sum_{i=1}^{N}\frac{\|z_{i}\|^{2}}{\nu_{i}}} (17)

for any vector z=col(z_{1},\ldots,z_{N})\in\mathcal{H}^{N}. Let \|\cdot\| be the natural norm in \mathcal{H}^{N}, i.e., \|z\|:=\sqrt{\sum_{i=1}^{N}\|z_{i}\|^{2}}. Additionally, it is also necessary to introduce two weighted norms in \mathbb{R}^{N} [34], i.e., for any x=col(x_{1},\ldots,x_{N})\in\mathbb{R}^{N},

\|x\|_{\pi}:=\sqrt{\sum_{i=1}^{N}\pi_{i}x_{i}^{2}},~~~\|x\|_{\nu}:=\sqrt{\sum_{i=1}^{N}\frac{x_{i}^{2}}{\nu_{i}}}. (18)

Please note that the notations \|\cdot\|_{\pi} and \|\cdot\|_{\nu} in (17) and (18) should be easily distinguished by the context. Accordingly, let us denote by \|M\|_{\pi} and \|M\|_{\nu} (resp. \|M\otimes T\|_{\pi} and \|M\otimes T\|_{\nu}) the norms for a matrix M\in\mathbb{R}^{N\times N} (resp. a matrix M\in\mathbb{R}^{N\times N} and an operator T in \mathcal{H}) induced by \|\cdot\|_{\pi} and \|\cdot\|_{\nu} in (18) (resp. (17)), respectively.

It is easy to see that the natural norm \|\cdot\| is equivalent to \|\cdot\|_{\pi}, \|\cdot\|_{\nu} in (17), (18), and thus to the induced matrix norms, that is, there are positive constants c_{i},i\in[4] such that

c_{1}\|\cdot\| \leq\|\cdot\|_{\pi}\leq c_{2}\|\cdot\|, (19)
c_{3}\|\cdot\|_{\nu} \leq\|\cdot\|_{\pi}\leq c_{4}\|\cdot\|_{\nu}. (20)

Then the following results can be obtained.

Lemma 1 ([34]).

For all x\in\mathbb{R}^{N}, there hold

\|Ax-A_{\infty}x\|_{\pi}\leq\rho_{1}\|x-A_{\infty}x\|_{\pi}, (21)
\|Bx-B_{\infty}x\|_{\nu}\leq\rho_{2}\|x-B_{\infty}x\|_{\nu}, (22)
\|A\|_{\pi}=\|A_{\infty}\|_{\pi}=\|I_{N}-A_{\infty}\|_{\pi}=1, (23)
\|B\|_{\nu}=\|B_{\infty}\|_{\nu}=\|I_{N}-B_{\infty}\|_{\nu}=1, (24)

where \rho_{1}:=\|A-A_{\infty}\|_{\pi}<1 and \rho_{2}:=\|B-B_{\infty}\|_{\nu}<1.

Lemma 2.

For all z\in\mathcal{H}^{N}, the following statements hold

\|\mathbf{A}z-\mathbf{A}_{\infty}z\|_{\pi} \leq\rho_{1}\|z-\mathbf{A}_{\infty}z\|_{\pi}, (25)
\|\mathbf{B}z-\mathbf{B}_{\infty}z\|_{\nu} \leq\rho_{2}\|z-\mathbf{B}_{\infty}z\|_{\nu}, (26)
\|I_{N}\otimes Id-\mathbf{A}_{\infty}\|_{\pi} =\|I_{N}-A_{\infty}\|_{\pi}=1, (27)

where \mathbf{A}_{\infty}:=A_{\infty}\otimes Id and \mathbf{B}_{\infty}:=B_{\infty}\otimes Id.

Proof.

The proof can be found in Appendix A. ∎

Lemma 3 ([39]).

For an irreducible nonnegative matrix M\in\mathbb{R}^{n\times n}, it is primitive if it has at least one non-zero diagonal entry.

Lemma 4 ([39]).

For an irreducible nonnegative matrix M\in\mathbb{R}^{n\times n}, there hold (i) \rho(M)>0 is an eigenvalue of M, (ii) Mx=\rho(M)x for some positive vector x, and (iii) \rho(M) is an algebraically simple eigenvalue.

Lemma 5 ([40]).

For X,Y\in\mathbb{R}^{n\times n}, let \lambda be a simple eigenvalue of X. Denote by u and v respectively the left and right eigenvectors of X corresponding to \lambda. Then, it holds that

  1.

    for each \epsilon>0, there exists a \delta>0 such that, \forall t\in\mathbb{C} with |t|<\delta, there is a unique eigenvalue \lambda(t) of X+tY such that |\lambda(t)-\lambda-t\frac{u^{\top}Yv}{u^{\top}v}|\leq|t|\epsilon,

  2.

    \lambda(t) is continuous at t=0, and \lim_{t\to 0}\lambda(t)=\lambda,

  3.

    \lambda(t) is differentiable at t=0, and \frac{d\lambda(t)}{dt}\big{|}_{t=0}=\frac{u^{\top}Yv}{u^{\top}v}.

Lemma 6.

It holds that \bar{y}_{k}=\sum_{i=1}^{N}F_{i}(x_{i,k}), where \bar{y}_{k}:=\sum_{i=1}^{N}y_{i,k}.

Proof.

Left multiplying (15) by {\bf 1}^{\top} yields that \bar{y}_{k+1}=\bar{y}_{k}+\sum_{i=1}^{N}F_{i}(x_{i,k+1})-\sum_{i=1}^{N}F_{i}(x_{i,k}), which further implies that \bar{y}_{k}-\sum_{i=1}^{N}F_{i}(x_{i,k})=\bar{y}_{0}-\sum_{i=1}^{N}F_{i}(x_{i,0}). Note that y_{i,0}=F_{i}(x_{i,0}). The conclusion directly follows. ∎

To move forward, an important result for the convergence analysis is first given below.

Lemma 7.

Under Assumption 3, if \alpha\in(0,1-\delta], where \delta\in(0,1) is any pre-specified parameter, then there holds

d_{Fix(F)}(F_{\alpha}(x))\leq\rho_{3}d_{Fix(F)}(x),~~~\forall x\in\mathcal{H} (28)

where F_{\alpha}:=Id+\alpha(F-Id) is the \alpha-quasi-averaged operator of F, and

\rho_{3}:=1-\frac{\delta\alpha}{4\kappa^{2}}\in[0,1). (29)
Proof.

It is easy to see that F_{\alpha}-Id=\alpha(F-Id), which together with (11) yields that d_{Fix(F)}(x)\leq\kappa\|F(x)-x\|=\frac{\kappa}{\alpha}\|F_{\alpha}(x)-x\| for all x\in\mathcal{H}. Therefore, F_{\alpha} is linearly regular with constant \frac{\kappa}{\alpha}. Simultaneously, it is known that each \alpha-quasi-averaged operator is \frac{1-\alpha}{\alpha}-SQNE [30], and thus F_{\alpha} is \frac{1-\alpha}{\alpha}-SQNE. With the above two properties of F_{\alpha} as well as Fix(F_{\alpha})=Fix(F), invoking Theorem 1 in [30] leads to d_{Fix(F)}(F_{\alpha}(x))\leq\phi d_{Fix(F)}(x), where \phi:=\sqrt{1-\frac{\alpha(1-\alpha)}{\kappa^{2}}}\in[0,1). Meanwhile, since \sqrt{1-t}\leq 1-t/2 for t\in[0,1], it is easy to verify that

\phi\leq 1-\frac{\alpha(1-\alpha)}{2\kappa^{2}}\leq 1-\frac{\delta\alpha}{4\kappa^{2}},

where \alpha\leq 1-\delta (so that 1-\alpha\geq\delta) is used in the last inequality. This ends the proof. ∎

We are now ready to give the main result of this section.

Theorem 1.

Under Assumptions 1-3, all x_{i,k}'s generated by Algorithm 1 converge to a common point in Fix(F) at a linear rate, if there holds

0<\alpha<\min\{1-\delta,\alpha_{c}\}, (30)

where \alpha_{c} is the smallest positive real root of the equation det(I-M(\alpha))=0, and

M(\alpha):=\left(\begin{array}{ccc}(1-\alpha)\rho_{1}&\alpha c_{2}\theta_{1}&0\\ \theta_{2}(\alpha\theta_{3}+\theta_{4})&\rho_{2}+\alpha\theta_{1}\theta_{2}&2\alpha c_{1}\theta_{2}\\ \frac{\alpha\bar{L}}{c_{1}}&\frac{\alpha\sqrt{N}\theta_{1}}{c_{1}}&1-\frac{\delta\alpha}{4\kappa^{2}}\end{array}\right) (34)

with \bar{L}:=\max_{i\in[N]}\{L_{i}\}, \theta_{1}:=\frac{c_{4}\|D_{\nu}^{-1}\|}{N}, \theta_{2}:=\frac{c_{2}\bar{L}(\sqrt{N}+1)}{c_{1}c_{3}}, \theta_{3}:=\rho_{1}+\bar{L}, and \theta_{4}:=\|A-I\|.

Proof.

The proof can be found in Appendix B. ∎
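Since \alpha_{c} in (30) is only defined implicitly through det(I-M(\alpha))=0, it can be located numerically. The Python sketch below (an illustration with made-up values of \rho_{1}, \rho_{2}, \kappa, \delta, c_{1}, c_{2}, \theta_{1}-\theta_{4}, \bar{L}, and N, not constants from the paper) scans for the first sign change of det(I-M(\alpha)) and refines it by linear interpolation.

```python
import numpy as np

def M(alpha, rho1, rho2, kappa, delta, c1, c2, t1, t2, t3, t4, Lbar, N):
    """The matrix M(alpha) in (34)."""
    return np.array([
        [(1 - alpha) * rho1,         alpha * c2 * t1,              0.0],
        [t2 * (alpha * t3 + t4),     rho2 + alpha * t1 * t2,       2 * alpha * c1 * t2],
        [alpha * Lbar / c1,          alpha * np.sqrt(N) * t1 / c1, 1 - delta * alpha / (4 * kappa ** 2)],
    ])

def smallest_positive_root(params, alpha_max=1.0, grid=20000):
    """Smallest alpha > 0 with det(I - M(alpha)) = 0, via a sign change on a fine grid."""
    alphas = np.linspace(1e-6, alpha_max, grid)
    dets = np.array([np.linalg.det(np.eye(3) - M(a, **params)) for a in alphas])
    idx = np.where(np.sign(dets[:-1]) != np.sign(dets[1:]))[0]
    if len(idx) == 0:
        return None   # no root in (0, alpha_max]
    i = idx[0]
    a0, a1, d0, d1 = alphas[i], alphas[i + 1], dets[i], dets[i + 1]
    return a0 - d0 * (a1 - a0) / (d1 - d0)   # linear interpolation between bracketing points

# Made-up constants, purely for illustration.
params = dict(rho1=0.6, rho2=0.7, kappa=2.0, delta=0.1, c1=0.5, c2=1.5,
              t1=0.8, t2=1.2, t3=2.0, t4=1.0, Lbar=1.4, N=10)
alpha_c = smallest_positive_root(params)
print("alpha_c ~", alpha_c)   # the stepsize bound in (30) is then min{1 - delta, alpha_c}
```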

Remark 3.

It should be noticed that the problem considered in this paper is more general than the common fixed point finding problem in [17, 16, 18, 20], where all local operators are assumed to have at least one common fixed point, while this assumption is dropped in this paper. It is worthwhile to notice that the linear algebraic equation solving problem in [24, 25, 26] can be cast as a special case of the common fixed point seeking problem. Note that no convergence speeds are provided in [17, 16, 18, 19], although random interconnection graphs are considered in [19]. In addition, the same problem as here is also studied in [21] for nonexpansive operators, where the convergence rate is not analyzed, while a linear convergence rate is established here and more general operators are considered, i.e., quasi-nonexpansive operators. It should also be noted that a main difference between DOT here and D-KM in [21] is that DOT exploits a tracking technique for F with a constant stepsize, similar to the tracking idea for a global gradient in distributed optimization [36, 34, 37], while D-KM does not use this idea and applies a diminishing stepsize.

As a special case, when all local operators have at least one common fixed point, problem (9) will reduce to the common fixed point seeking problem due to Fix(F)=\cap_{i=1}^{N}Fix(F_{i}) in this case (e.g., Proposition 4.47 in [1]). Therefore, we have the following result.

Corollary 1.

Under the same conditions as in Theorem 1, if all F_{i}'s have at least one common fixed point, then all x_{i,k}'s generated by Algorithm 1 converge to a common point in \cap_{i=1}^{N}Fix(F_{i}) at a linear rate.

Note that the convergence speed is also analyzed for the common fixed point seeking problem in [20] (i.e., the DO algorithm), where the main difference between DOT and DO is that here an estimate is introduced for each agent to track the global operator F, while each agent does not perform this tracking in the DO algorithm. However, the rate is sublinear and all operators are assumed to be nonexpansive in [20], while a linear rate is provided here in Corollary 1 and less conservative operators, i.e., quasi-nonexpansive operators, are considered. Note that time-varying communication graphs were considered with non-identical stepsizes for the DO algorithm in [20], while this paper is concerned with static communication graphs with an identical stepsize for all agents. Along this line, it is interesting to further address the case with non-identical stepsizes for different agents and time-varying communication graphs in the future.

IV The DOP Algorithm for Problem II

This section is concerned with solving problem (10). Without loss of generality, the vectors in \mathcal{H} are viewed as row vectors in this section for convenience of analysis.

For problem (10), each agent i\in[N] can only privately access \texttt{F}_{i} together with its own vector x_{i} of the whole vector x=(x_{1},\ldots,x_{N})\in\mathcal{H} over a network of N agents, where x_{i} is privately known by agent i itself, as commonly encountered in multi-player games, and so on. To handle this problem, each agent i\in[N] maintains a vector x_{k}^{i}=(x_{1,k}^{i},\ldots,x_{N,k}^{i})\in\mathcal{H} at time step k\geq 0 as an estimate of a fixed point of F, where x_{j,k}^{i} is agent i's estimate of x_{j,k} (i.e., the vector of agent j at time k), with x_{i,k}^{i}=x_{i,k}. That is, each agent i updates its own vector x_{i,k} at time slot k without access to the vectors of all other agents j\neq i, and thus each agent i needs to estimate all other agents' vectors x_{j,k}, collected in x_{k}^{i}, at each time k\geq 0 over the communication graph \mathcal{G} satisfying Assumption 1.

Now, a distributed algorithm is proposed as in Algorithm 2, where A=(a_{ij})\in\mathbb{R}^{N\times N} is the communication matrix introduced after Assumption 1, which is only row-stochastic, and x_{-i,k}^{j}:=(x_{1,k}^{j},\ldots,x_{i-1,k}^{j},x_{i+1,k}^{j},\ldots,x_{N,k}^{j}) for all i,j\in[N], i.e., x_{-i,k}^{j} is agent j's estimate of all agents' vectors except that of the i-th agent. We recall that \pi is the left stochastic eigenvector of A associated with the eigenvalue 1, as introduced in Assumption 1.2. It should be noted that the i-th entry \pi_{i} of \pi can be evaluated by agent i in finite time in a distributed fashion using the approach in [38]. Thus, Algorithm 2 is distributed.

Algorithm 2 Distributed Quasi-Averaged Operator Playing (DOP)
1:  Initialization: Stepsize \alpha in (39), communication matrix A, and local initial conditions x_{0}^{i}\in\mathcal{H} for all i\in[N].
2:  Iterations: Step k\geq 0: update for each i\in[N]:
x_{i,k+1}=\sum_{j=1}^{N}a_{ij}x_{i,k}^{j}+\frac{\alpha}{\pi_{i}}\big{(}\texttt{F}_{i}(x_{k}^{i})-\sum_{j=1}^{N}a_{ij}x_{i,k}^{j}\big{)}, (35a)
x_{-i,k+1}^{i}=\sum_{j=1}^{N}a_{ij}x_{-i,k}^{j}. (35b)
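As with DOT, the following Python sketch (a minimal illustration, not the authors' code; the \pi_{i} are computed directly from A instead of by the finite-time procedure of [38], and the block operators and data are made up) runs the DOP updates (35a)-(35b), with each agent storing a full estimate x^i of the stacked vector.

```python
import numpy as np

def run_dop(A, block_ops, block_dims, x0, alpha, num_iters):
    """DOP (Algorithm 2): each agent i keeps an estimate x^i of the whole stacked vector.
       Own block:    x_{i,k+1}   = sum_j a_ij x_{i,k}^j + (alpha/pi_i)*(F_i(x_k^i) - sum_j a_ij x_{i,k}^j)
       Other blocks: x_{l,k+1}^i = sum_j a_ij x_{l,k}^j for l != i."""
    N = A.shape[0]
    # pi: left eigenvector of the row-stochastic A for eigenvalue 1, scaled so sum(pi) = 1.
    eigvals, eigvecs = np.linalg.eig(A.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    pi = pi / pi.sum()

    offsets = np.cumsum([0] + list(block_dims))
    blocks = [slice(offsets[l], offsets[l + 1]) for l in range(N)]  # index range of block l

    X = np.array(x0, dtype=float)           # shape (N, n): row i is agent i's estimate x^i
    for _ in range(num_iters):
        mix = A @ X                          # consensus on all blocks: sum_j a_ij x^j
        X_new = mix.copy()
        for i in range(N):
            Fi = block_ops[i](X[i])          # F_i evaluated at agent i's own estimate x^i
            X_new[i, blocks[i]] = mix[i, blocks[i]] + (alpha / pi[i]) * (Fi - mix[i, blocks[i]])
        X = X_new
    return X

# Toy example: F_i(x) = x_i - 0.2*(x_i - t_i), a decoupled quadratic game (made-up data).
N, d = 4, 2
rng = np.random.default_rng(2)
targets = rng.normal(size=(N, d))
ops = [lambda x, i=i, t=t: x[2 * i:2 * i + 2] - 0.2 * (x[2 * i:2 * i + 2] - t)
       for i, t in enumerate(targets)]
A = np.ones((N, N)) / N
X = run_dop(A, ops, [d] * N, x0=np.zeros((N, N * d)), alpha=0.3, num_iters=300)
print(X[0])   # agent 1's estimate of the whole stacked vector, ~ targets.ravel()
```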

To ease the upcoming analysis, let us define x_{k}:=col(x_{k}^{1},\ldots,x_{k}^{N})\in\mathcal{H}^{N}, \hat{x}_{i,k}:=\sum_{j=1}^{N}a_{ij}x_{i,k}^{j}, and \bar{F}:=diag\{(\texttt{F}_{1}(x_{k}^{1})-\hat{x}_{1,k})/\pi_{1},\ldots,(\texttt{F}_{N}(x_{k}^{N})-\hat{x}_{N,k})/\pi_{N}\}. Then algorithm (35) can be written in a compact form

x_{k+1}=Ax_{k}+\alpha\bar{F}. (36)

Multiplying \pi^{\top} on both sides of (36) yields that

\tilde{x}_{k+1}=\tilde{x}_{k}+\alpha\tilde{F}, (37)

where \tilde{x}_{k}=(\tilde{x}_{1,k},\ldots,\tilde{x}_{N,k}):=\sum_{i=1}^{N}\pi_{i}x_{k}^{i} and \tilde{F}:=(\texttt{F}_{1}(x_{k}^{1})-\hat{x}_{1,k},\ldots,\texttt{F}_{N}(x_{k}^{N})-\hat{x}_{N,k}).

To move forward, it is useful to recall the weighted norm \|z\|_{\pi}:=\sqrt{\sum_{i=1}^{N}\pi_{i}\|z_{i}\|^{2}} for a vector z=col(z_{1},\ldots,z_{N})\in\mathcal{H}^{N}, as defined in (17). And let \|\cdot\| be a norm in \mathcal{H}^{N} defined by \|z\|:=\sqrt{\sum_{i=1}^{N}\|z_{i}\|^{2}}. Remember that the vectors in \mathcal{H} are seen as row vectors in this section. Then, similar to (25) in Lemma 2, it is easy to obtain the following result.

Lemma 8.

For all z\in\mathcal{H}^{N}, it holds that

\|Az-A_{\infty}z\|_{\pi} \leq\rho_{1}\|z-A_{\infty}z\|_{\pi}, (38)

where A_{\infty}={\bf 1}_{N}\pi^{\top} as defined in the paragraph after Assumption 1 and \rho_{1}:=\|A-A_{\infty}\|_{\pi}<1.

To ensure the linear convergence, Assumptions 2 and 3 are still imposed in this section, but F_{i} in Assumption 2 is replaced with \texttt{F}_{i}, i.e., \|\texttt{F}_{i}(x)-\texttt{F}_{i}(y)\|\leq L_{i}\|x-y\|,~\forall x,y\in\mathcal{H} for i\in[N].

With the above preparations, we are now ready to give the main result of this section.

Theorem 2.

Under Assumptions 1-3 with F_{i} being replaced with \texttt{F}_{i} in Assumption 2, all x_{k}^{i}'s generated by DOP converge to a common point in Fix(F) at a linear rate, if

0<\alpha<\min\{1-\delta,\alpha_{L}\}, (39)

where \delta\in(0,1) is any pre-specified parameter, \alpha_{L} is the smallest positive real root of the equation det(I-\Theta(\alpha))=0, and

\Theta(\alpha):=\left(\begin{array}{cc}\rho_{1}+\alpha\theta_{5}&2\alpha c_{2}\sqrt{2\varpi}\\ \frac{\alpha(\bar{L}+1)}{c_{1}}&1-\frac{\delta\alpha}{4\kappa^{2}}\end{array}\right) (42)

with \theta_{5}:=2c_{2}\sqrt{\varpi(\bar{L}^{2}+1)}/c_{1}, \bar{L}:=\max_{i\in[N]}\{L_{i}\}, \varpi:=N-1+\frac{(1-\underline{\pi})^{2}}{\underline{\pi}^{2}}, and \underline{\pi}:=\min_{i\in[N]}\{\pi_{i}\}>0.

Proof.

The proof can be found in Appendix C. ∎

Remark 4.

It is worth pointing out that the work [21] only considers the sum separable case and does not present the convergence speed. In contrast, this paper addresses both the sum separable case (see Theorem 1) and the block separable case (see Theorem 2), and to the best of our knowledge, this paper is the first to address the block separable case of the fixed point finding problem in a decentralized fashion. Note that the block separable case in Theorem 2 has many applications, as will be discussed in Section V.

Remark 5.

Moreover, it is worth noting that Problem II can be cast as the common fixed point finding problem for a family of operators T_{i}:=(Id_{1},\ldots,Id_{i-1},\texttt{F}_{i},Id_{i+1},\ldots,Id_{N}):\mathcal{H}\to\mathcal{H}, where Id_{i}:x\mapsto x_{i} for x=(x_{1},\ldots,x_{N})\in\mathcal{H} and i\in[N]. However, there exist two issues: 1) the linear regularity condition may not hold for \sum_{i=1}^{N}T_{i}; and 2) although the DO algorithm proposed in [20] is applicable for finding common fixed points, only a sublinear convergence speed is established.

Remark 6.

It is noteworthy that in Problem II each agent i only knows its own vector x_{i,k} at each time k\geq 0, but has no access to the vectors x_{j,k} of all other agents j\neq i. In this regard, agent i needs to estimate all other x_{j,k}'s in order to compute the value of its operator \texttt{F}_{i}. If each agent had full access to all other agents' vectors, then a simpler algorithm could be devised to tackle this setup, i.e.,

x_{i,k+1}=x_{i,k}+\alpha(\texttt{F}_{i}(x_{k})-x_{i,k}), (43)

where x_{i,k} is the same as in (35) and x_{k}:=(x_{1,k},\ldots,x_{N,k}). In this setup, there is no need for each agent to estimate the entire vector x_{k}. As for (43), linear convergence to a fixed point of the global operator F can be proved similarly to Theorem 2.

V Applications of DOT and DOP

The considered problems provide a unified framework for a multitude of interesting problems. To show this, this section provides two examples, i.e., distributed optimization and multi-player games under partial-decision information.

V-A Distributed Optimization

Consider a global optimization problem

\min_{x\in\mathcal{H}}~~~f(x) (44)

where f:\mathcal{H}\to\mathbb{R} is a differentiable and convex function, whose gradient is Lipschitz with constant L. It is easy to verify that this problem is equivalent to finding fixed points of an operator F:x\mapsto x-\xi\nabla f(x) for any given \xi>0, which is shown to be (L\xi)/2-averaged when \xi\in(0,2/L) (cf., Lemma 4 in [3]) and thus nonexpansive. For large-scale problems, the function f is usually expensive or impossible to be known by a global/central coordinator or computing unit; instead, it is more practical to consider the case where f is separable. Along this line, two cases are discussed below.

Case 1. f is sum separable, i.e., f(x)=\frac{1}{N}\sum_{i=1}^{N}f_{i}(x), where f_{i}:\mathcal{H}\to\mathbb{R} is a local function, which is differentiable and convex with \ell_{i}-Lipschitz gradient, only known to agent i. This problem is often called distributed/decentralized optimization, which has been extensively studied in the literature. In this case, the problem can be equivalently cast as Problem I (i.e., (9)) with F_{i}:x\mapsto x-\xi\nabla f_{i}(x) for any given bounded \xi>0, which is Lipschitz. Therefore, Assumption 2 holds true. In this setup, the DOT proposed in this paper can be leveraged to solve problem (44) in the sum separable case. As such, under Assumptions 1-3, the linear convergence to a solution of (44) can be guaranteed by Theorem 1.

Case 2. f is block separable, that is, \nabla f(x)=col(\nabla_{x_{1}}f(x),\ldots,\nabla_{x_{N}}f(x)) with x=col(x_{1},\ldots,x_{N}), where x_{i} is the vector of agent i\in[N] and each agent i is only capable of computing the partial gradient \nabla_{x_{i}}f(x) with respect to its own vector x_{i}. This scenario is realistic in some cases, partially because it is computationally expensive to compute the whole gradient \nabla_{x}f(x) by a global/central coordinator or computing unit, and partially because only part of the data, x_{i}, may be privately acquired by spatially distributed agents. In this setup, the problem can be recast as problem (10) with \texttt{F}_{i}:x\mapsto x_{i}-\xi\nabla_{x_{i}}f(x) for any given bounded \xi>0, which is Lipschitz if \nabla_{x_{i}}f(x) is so. For this problem, under Assumptions 1-3, the linear convergence to a solution of (44) can be ensured by Theorem 2; a sketch of such block operators is given below.
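As a minimal sketch of Case 2 (our own illustration, assuming a simple quadratic f with made-up data; run_dop refers to the hypothetical helper sketched after Algorithm 2), the block operators F_i: x ↦ x_i − ξ∇_{x_i}f(x) can be built as follows.

```python
import numpy as np

# Quadratic example f(x) = 0.5 * ||H x - p||^2 with a made-up matrix H and vector p.
N, d = 4, 2                 # N agents, each owning a block x_i of dimension d
n = N * d
rng = np.random.default_rng(3)
H = rng.normal(size=(n, n))
p = rng.normal(size=n)
xi = 1.0 / np.linalg.norm(H.T @ H, 2)   # xi < 2/L with L = ||H^T H||

def make_block_op(i):
    """F_i(x) = x_i - xi * grad_{x_i} f(x), the i-th block of the gradient-step operator."""
    rows = slice(i * d, (i + 1) * d)
    def F_i(x):
        grad = H.T @ (H @ x - p)        # full gradient of f at the agent's local estimate x
        return x[rows] - xi * grad[rows]
    return F_i

block_ops = [make_block_op(i) for i in range(N)]
# These operators can then be passed to the DOP sketch, e.g.
# X = run_dop(A, block_ops, [d] * N, x0=np.zeros((N, n)), alpha=0.05, num_iters=2000)
```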

Remark 7.

Note that in Case 1, the linear convergence rate is ensured under the linear regularity in Assumption 3, which is strictly weaker than the strong convexity of f_{i}'s or f [30], which is widely postulated in distributed optimization [41, 42, 43, 44, 45, 40], to name just a few. Also, notice that the linear regularity is only assumed for F, and is not necessary for any local operator F_{i}. For Case 1, a condition similar to linear regularity, i.e., metric subregularity, is employed in [46] for deriving a linear convergence, which is however in Euclidean spaces under balanced undirected communication graphs, while the result here works in a more general setting, i.e., in real Hilbert spaces under unbalanced directed graphs. It should also be noted that the aforesaid problem is just an application of the general problem (9) addressed here. In addition, to the best of our knowledge, this paper is the first to investigate Case 2 in distributed optimization.

V-B Game Under Partial-Decision Information

Consider a noncooperative N-player game with unconstrained action sets, where each player can be viewed as an agent and a Nash equilibrium is assumed to exist for the game. In this problem, each player i\in[N] possesses its own cost (or payoff) function J_{i}(x_{i},x_{-i}), which is differentiable, where x_{i} is the decision/action vector of player i and x_{-i} denotes the decision vectors of all other players, i.e., x_{-i}:=col(x_{1},\ldots,x_{i-1},x_{i+1},\ldots,x_{N}). Note that player i cannot access other players' decision vectors, i.e., the game considered here is under partial-decision information, which is more practical than the case where each player has full access to all other players' decisions, as assumed in most existing works. For this problem, at time step k\geq 0, each player i\in[N] chooses its own decision vector x_{i,k}\in\mathbb{R}^{n_{i}}, and a cost J_{i}(x_{i,k},x_{-i,k}) is incurred for player i after all players make their decisions. Then the objective is for each player to minimize its own cost function, that is, all players desire to achieve a Nash equilibrium (NE) x^{*}=col(x_{1}^{*},\ldots,x_{N}^{*})\in\mathbb{R}^{n} with n:=\sum_{i=1}^{N}n_{i}, which is defined as: for all i\in[N],

J_{i}(x_{i}^{*},x_{-i}^{*})\leq J_{i}(x_{i},x_{-i}^{*}),~~~\forall x_{i}\in\mathbb{R}^{n_{i}}. (45)

To proceed, let \nabla_{i}J_{i}(x_{i},x_{-i}) denote \nabla_{x_{i}}J_{i}(x_{i},x_{-i}) for simplicity. It is then easy to see that an NE x^{*} satisfies \nabla_{i}J_{i}(x_{i}^{*},x_{-i}^{*})=0 for all i\in[N]. Consequently, the Nash equilibrium seeking problem can be equivalently recast as finding fixed points of an operator F, defined by

F :=Id-rU, (46)
U :=col(\nabla_{1}J_{1},\ldots,\nabla_{N}J_{N}), (47)

where r>0 is any constant. By defining \texttt{F}_{i}:=Id_{i}-r\nabla_{i}J_{i} for i\in[N] with Id_{i}:x\mapsto x_{i} for x=col(x_{1},\ldots,x_{N})\in\mathbb{R}^{n}, one can obtain that F is block separable, i.e., F=col(\texttt{F}_{1},\ldots,\texttt{F}_{N}), which is consistent with problem (10). As a result, the linear convergence to an NE of the game can be assured by Theorem 2 under Assumptions 1-3 with F being quasi-nonexpansive.
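For instance, for the quadratic cost structure used later in Example 2 (J_i(x_i,x_{-i}) = h_i(Ex_i) + l_i(x_{-i})^T x_i), the block operators F_i := Id_i − r∇_iJ_i can be assembled as in the following sketch (our own illustration with made-up parameters; it plugs into the hypothetical run_dop helper from Section IV).

```python
import numpy as np

N, d, r = 5, 2, 0.1
rng = np.random.default_rng(4)
E = np.array([[1.0, 1.0]])                      # 1 x d, not of full column rank
r_coef = rng.uniform(0.5, 1.5, size=N)          # r_i > 0 in h_i(z) = r_i z^2 + s_i z
s_coef = rng.normal(size=N)
C = rng.normal(scale=0.05, size=(N, N, d, d))   # coupling blocks c_ij (made up)

def make_game_op(i):
    """F_i(x) = x_i - r * grad_i J_i(x_i, x_{-i}) for the quadratic game above."""
    def F_i(x):
        x_blocks = x.reshape(N, d)
        grad_i = (E.T @ (2.0 * r_coef[i] * (E @ x_blocks[i]) + s_coef[i])).ravel()  # from h_i(E x_i)
        grad_i = grad_i + sum(C[i, j] @ x_blocks[j] for j in range(N) if j != i)    # from l_i(x_{-i})^T x_i
        return x_blocks[i] - r * grad_i
    return F_i

game_ops = [make_game_op(i) for i in range(N)]
# e.g., X = run_dop(A, game_ops, [d] * N, x0=np.zeros((N, N * d)), alpha=0.01, num_iters=5000)
```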

Note that Assumptions 1-3 are relatively mild, and some of them are less conservative than those employed in the literature, as remarked below.

1) Assumption 2 is in fact equivalent to \nabla_{i}J_{i} being Lipschitz for all i\in[N], i.e., \|\nabla_{i}J_{i}(x)-\nabla_{i}J_{i}(y)\|\leq q_{i}\|x-y\| for some q_{i}>0 and for all x,y\in\mathbb{R}^{n}, which has been frequently employed in the literature, see e.g., [47, 48, 49, 50]. Then it can be readily obtained that U is q-Lipschitz, where q:=\sqrt{\sum_{i=1}^{N}q_{i}^{2}}.

2) The linear regularity is strictly weaker than the strong monotonicity of U, which has been widely imposed for deriving the linear convergence [47, 48, 49, 50], i.e., (U(x)-U(z))^{\top}(x-z)\geq\mu\|x-z\|^{2} for some \mu>0 and for all x,z\in\mathbb{R}^{n}. To see this, it is obvious that strong monotonicity is strictly stronger than quasi-strong monotonicity, i.e., (U(x)-U(y))^{\top}(x-y)\geq\mu\|x-y\|^{2} for all x\in\mathbb{R}^{n} and y\in Fix(F). Meanwhile, quasi-strong monotonicity implies the linear regularity of F, since it holds that \|F(x)-x\|=r\|U(x)\|=r\|U(x)-U(P_{Fix(F)}(x))\|\geq r\mu\|x-P_{Fix(F)}(x)\|=r\mu d_{Fix(F)}(x) for all x\in\mathbb{R}^{n}, i.e., d_{Fix(F)}(x)\leq\|F(x)-x\|/(r\mu), where U(P_{Fix(F)}(x))=0 and the quasi-strong monotonicity have been utilized. It should also be noted that the game can have a closed convex set of NEs (not necessarily unique) under linear regularity.

3) The quasi-nonexpansiveness of F is a weak assumption. For example, if the aforementioned quasi-strong monotonicity holds, then it holds that for all x\in\mathbb{R}^{n} and y\in Fix(F),

\|F(x)-y\|^{2} =\|x-y-r(U(x)-U(y))\|^{2}
=\|x-y\|^{2}-2r(x-y)^{\top}(U(x)-U(y))+r^{2}\|U(x)-U(y)\|^{2}
\leq(1-2\mu r+q^{2}r^{2})\|x-y\|^{2}, (48)

where the quasi-strong monotonicity and the q-Lipschitz continuity of U are used in the inequality. In view of (48), it is easy to see that F is even contractive, which is stronger than quasi-nonexpansive, if r\in(0,\frac{2\mu}{q^{2}}).

4) Assumption 1 requires strong connectivity for directed graphs, which are not necessarily balanced. In contrast, balanced undirected/directed graphs are exploited in [47, 48, 49, 50]. We note that time-varying graphs are considered in [49], but the graphs still need to be balanced; along this line, it is of interest to extend the results of this paper to time-varying communication graphs.

VI Numerical Examples

This section provides two numerical examples to corroborate the proposed algorithms.

Example 1.

Consider a distributed optimization problem as discussed in Case 1 of Section V-A, where f_{i}(x)=\hbar_{i}(Ex)+b_{i}^{\top}x, and \hbar_{i}(z) is a strongly convex function with Lipschitz continuous gradient. It is easy to see that this problem is equivalent to finding a fixed point of the operator F:=Id-\xi\nabla f for \xi\in(0,2/L) (see Section V-A), which is in the form (9) with F_{i}:=Id-\xi\nabla f_{i} for i\in[N].

It should be noted that f_{i} is strongly convex when E has full column rank, and f_{i} is convex but not strongly convex if E does not have full column rank (Section III in [30]), which is frequently encountered in practical applications, such as the L1-loss linear support vector machine (SVM) in machine learning [51]. Denote by X^{*} the nonempty set of optimizers of this problem. Although f_{i} is not strongly convex when E does not have full column rank, it has been shown in Theorem 18 of [51] that this problem admits a global error bound, i.e., d_{X^{*}}(x)\leq\tau\|\nabla f(x)\|,\forall x\in\mathbb{R}^{n} for some constant \tau\geq 0, which further leads to d_{Fix(F)}(x)\leq\frac{\tau}{\xi}\|x-F(x)\| for all x\in\mathbb{R}^{n}, i.e., the linear regularity condition is satisfied.

Figure 1: Evolutions of distance to the optimizer set by DOT in this paper.
Figure 2: Evolutions of distance to the optimizer set by D-KM in [21].

In the simulation, let N=100, n=5, \hbar_{i}(Ex)=|Ex-p_{i}|^{2}, E=(1,1,1,1,1)\in\mathbb{R}^{1\times 5}, p_{i}=i, and b_{i}=col(i,i,i,i,i)/5 for all i\in[N]. Then it is easy to verify that the gradient Lipschitz constant of f is L=10, and thus \xi\in(0,0.2). Setting \alpha=0.05 and \xi=0.1, running the DOT algorithm (13) gives rise to the simulation results in Fig. 1, indicating that all x_{i,k}'s converge linearly to the optimal set X^{*}:=\{z=col(z_{1},z_{2})\in\mathbb{R}^{2}:z_{1}+2z_{2}=3(N+1)/8\approx 37.875\}. In comparison with the D-KM iteration proposed in [21] under the same communication graph as DOT (see Fig. 2), which is equivalent to the classical distributed gradient descent (DGD) algorithm for this problem, it can be observed that the DOT algorithm here has a faster convergence speed. Overall, the simulation supports the theoretical result.
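For reference, the following sketch (our own reproduction of the problem setup only, not the authors' code, graph, or figures; it reuses the hypothetical run_dot helper from Section III with an arbitrary directed ring graph) constructs the local operators of Example 1.

```python
import numpy as np

N, n, xi = 100, 5, 0.1
E = np.ones((1, n))                         # E = (1,1,1,1,1), not of full column rank
p = np.arange(1, N + 1, dtype=float)        # p_i = i
b = np.array([np.full(n, i / 5.0) for i in range(1, N + 1)])  # b_i = col(i,...,i)/5

def make_F(i):
    """F_i(x) = x - xi * grad f_i(x) with f_i(x) = |E x - p_i|^2 + b_i^T x."""
    def F_i(x):
        grad = (2.0 * E.T @ (E @ x - p[i])).ravel() + b[i]
        return x - xi * grad
    return F_i

local_ops = [make_F(i) for i in range(N)]
# A simple strongly connected choice: a directed ring with self-loops; the weights are assigned
# arbitrarily (any A row-stochastic and B column-stochastic satisfying Assumption 1 would do).
A = 0.5 * np.eye(N) + 0.5 * np.roll(np.eye(N), 1, axis=1)   # row-stochastic
B = 0.5 * np.eye(N) + 0.5 * np.roll(np.eye(N), 1, axis=0)   # column-stochastic
# x_final = run_dot(A, B, local_ops, x0=np.zeros((N, n)), alpha=0.05, num_iters=5000)
```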

Figure 3: Evolutions of $x_{j,k}^{i}-x_{j,k}^{1}$ for $i,j\in[N]$ by DOP.
Figure 4: Evolutions of $\nabla_{i}J_{i}(x_{i,k}^{i},x_{-i,k}^{i})$ for $i\in[N]$ by DOP.
Example 2.

Consider the class of games discussed in Section V-B with $N=50$ players. To be specific, each player $i$ has its decision vector in $\mathbb{R}^{2}$ and its cost function is $J_{i}(x_{i},x_{-i})=h_{i}(Ex_{i})+l_{i}^{\top}x_{i}$, where $h_{i}(z)=r_{i}z^{2}+s_{i}z$ is strongly convex in $z\in\mathbb{R}$ with $r_{i}>0$ and $s_{i}\in\mathbb{R}$, and $l_{i}(x_{-i})=\sum_{j\neq i}c_{ij}x_{j}$ with $c_{ij}\in\mathbb{R}^{2\times 2}$. Note that $J_{i}$ is not strongly convex in $x_{i}$ if $E$ is not of full column rank, as discussed in Example 1. In this example, let $E=(1,1)$, which does not have full column rank. Thus, $J_{i}$ is not strongly convex in $x_{i}$; however, $\texttt{F}_{i}:=Id_{i}-r\nabla_{i}J_{i}$ is linearly regular, as illustrated similarly in Example 1. It is then easy to verify that the global operator $F=(\texttt{F}_{1},\ldots,\texttt{F}_{N})$ is linearly regular. Moreover, the Nash equilibrium may not be unique since $E$ is not of full column rank. Set $\alpha=0.01$ and $r=0.1$. By randomly choosing $r_{i}$, $s_{i}$, and $c_{ij}$ for $i,j\in[N]$ with a randomly generated strongly connected communication graph, running the developed DOP with each component of the initial conditions drawn randomly from $[0,1]$ gives the simulation results in Figs. 3 and 4. In Fig. 3, the distances from $x_{k}^{i}=col(x_{1,k}^{i},\ldots,x_{N,k}^{i})$ to the equilibrium set are plotted for all players, showing that all players' estimates $x_{k}^{i}$ converge to the equilibrium set at a linear rate. On the other hand, the gradient $\nabla_{i}J_{i}(x_{i,k}^{i},x_{-i,k}^{i})$ of each agent $i$ is given in Fig. 4, indicating that all agents' gradients converge linearly to zero. Hence, the simulation results support the theoretical result in Theorem 2.
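As a small illustration of how each player forms its local block operator in this example, the sketch below computes $\nabla_{i}J_{i}(x_{i},x_{-i})=E^{\top}(2r_{i}Ex_{i}+s_{i})+\sum_{j\neq i}c_{ij}x_{j}$ and the resulting $\texttt{F}_{i}:=Id_{i}-r\nabla_{i}J_{i}$ evaluated on one player's local estimate of the full decision profile; the parameter ranges and the scaling of the $c_{ij}$ blocks are our own choices, so this is a hedged sketch rather than the exact simulation setup.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
N, d, r = 50, 2, 0.1
E = np.array([[1.0, 1.0]])                      # rank-deficient E, as in the example
ri = rng.uniform(0.5, 1.5, N)                   # r_i > 0
si = rng.standard_normal(N)
C = 0.01 * rng.standard_normal((N, N, d, d))    # c_{ij} blocks (hypothetical scaling)

def grad_i(i, x):
    # partial gradient grad_i J_i = E^T (2 r_i E x_i + s_i) + sum_{j != i} c_{ij} x_j
    g = (E.T @ (2 * ri[i] * (E @ x[i]) + si[i])).ravel()
    return g + sum(C[i, j] @ x[j] for j in range(N) if j != i)

def F_block(i, x):
    # local block operator F_i := Id_i - r * grad_i J_i, applied to an estimate x
    return x[i] - r * grad_i(i, x)

x_est = rng.random((N, d))                      # one player's estimate of the profile
print("F_1 applied to the estimate:", F_block(0, x_est))
\end{verbatim}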

VII Conclusion

This paper has investigated the fixed point seeking problem for a quasi-nonexpansive global operator over a fixed, unbalanced, and directed communication graph, for which two scenarios have been considered under the linear regularity condition, i.e., the global operator is sum separable and block separable, respectively. For the first case, the global operator is a sum of local operators, which are assumed to be Lipschitz. To solve this case, a distributed algorithm, DOT, has been proposed and shown to converge to a fixed point of the global operator at a linear rate. For the second case, a distributed algorithm, DOP, has been developed and shown to be linearly convergent to a fixed point of the global operator. Meanwhile, two applications have been presented in detail, i.e., distributed optimization and multi-player games under partial-decision information. In the future, it would be interesting to study asynchronous algorithms, non-identical stepsizes for agents, and time-varying communication graphs.

Acknowledgment

The authors are grateful to the Editor, the Associate Editor and the anonymous reviewers for their insightful suggestions.

Appendix

VII-A Proof of Lemma 2

To prove (25), it can be obtained that

\[
\begin{aligned}
\|\mathbf{A}z-\mathbf{A}_{\infty}z\|_{\pi} &= \|(\mathbf{A}-\mathbf{A}_{\infty})(z-\mathbf{A}_{\infty}z)\|_{\pi}\\
&\leq \|\mathbf{A}-\mathbf{A}_{\infty}\|_{\pi}\|z-\mathbf{A}_{\infty}z\|_{\pi},
\end{aligned}
\qquad (49)
\]

where the equality has used the fact that $\mathbf{A}\mathbf{A}_{\infty}=\mathbf{A}_{\infty}\mathbf{A}_{\infty}=\mathbf{A}_{\infty}$.

Consider the term $\|\mathbf{A}-\mathbf{A}_{\infty}\|_{\pi}$ in (49). To this end, by definition (17), one has that for any $x\in\mathbb{R}^{N}$ and $y\in\mathcal{H}$,

\[
\|x\otimes y\|_{\pi}=\sqrt{\sum_{i=1}^{N}\pi_{i}x_{i}^{2}\|y\|^{2}}=\|y\|\sqrt{\sum_{i=1}^{N}\pi_{i}x_{i}^{2}}=\|x\|_{\pi}\|y\|,
\qquad (50)
\]

which, together with the norm’s definition, leads to

\[
\begin{aligned}
\|\mathbf{A}-\mathbf{A}_{\infty}\|_{\pi} &= \sup_{\|x\otimes y\|_{\pi}\neq 0}\frac{\|(\mathbf{A}-\mathbf{A}_{\infty})(x\otimes y)\|_{\pi}}{\|x\otimes y\|_{\pi}}\\
&= \sup_{\|x\otimes y\|_{\pi}\neq 0}\frac{\|[(A-A_{\infty})x]\otimes y\|_{\pi}}{\|x\otimes y\|_{\pi}}\\
&= \sup_{\|x\|_{\pi}\|y\|\neq 0}\frac{\|(A-A_{\infty})x\|_{\pi}\|y\|}{\|x\|_{\pi}\|y\|}\\
&= \sup_{\|x\|_{\pi}\neq 0}\frac{\|(A-A_{\infty})x\|_{\pi}}{\|x\|_{\pi}}\\
&= \|A-A_{\infty}\|_{\pi}.
\end{aligned}
\qquad (51)
\]

Note that $\|A-A_{\infty}\|_{\pi}=\rho_{1}$ by Lemma 1. Consequently, putting together (49)-(51) gives rise to (25). By noting that $\mathbf{B}\mathbf{B}_{\infty}=\mathbf{B}_{\infty}\mathbf{B}_{\infty}=\mathbf{B}_{\infty}$, similar arguments can be applied to obtain (26) and (27). This ends the proof.
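The identity (51) can also be checked numerically. The sketch below (our own illustration, not part of the proof) builds a random row-stochastic $A$, its limit $A_{\infty}={\bf 1}_{N}\pi^{\top}$, and verifies that the induced $\pi$-norm of the Kronecker lift $(A-A_{\infty})\otimes I_{n}$ coincides with $\|A-A_{\infty}\|_{\pi}$, where $\|\cdot\|_{\pi}$ is the weighted norm appearing in (50).

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
N, n = 6, 3

# random row-stochastic A with positive entries and its limit A_inf = 1 pi^T
A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
w, V = np.linalg.eig(A.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()
A_inf = np.outer(np.ones(N), pi)

D = np.diag(np.sqrt(pi))                 # ||z||_pi = ||D z||_2, so the induced
def pi_norm(M, blocks=1):                # pi-norm is ||(D (x) I) M (D (x) I)^{-1}||_2
    Dk = np.kron(D, np.eye(blocks))
    return np.linalg.norm(Dk @ M @ np.linalg.inv(Dk), 2)

lhs = pi_norm(A - A_inf)                                 # ||A - A_inf||_pi
rhs = pi_norm(np.kron(A - A_inf, np.eye(n)), blocks=n)   # Kronecker-lifted version
print(lhs, rhs)                                          # the two values coincide
\end{verbatim}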

VII-B Proof of Theorem 1

Let us first establish upper bounds on $\|x_{k+1}-\mathbf{A}_{\infty}x_{k+1}\|_{\pi}$, $\|x_{k+1}-x_{k}\|$, $\|y_{k+1}-\mathbf{B}_{\infty}y_{k+1}\|_{\nu}$, and $\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes x_{k+1}^{*}\|$, where $\bar{x}_{k}:=\sum_{i=1}^{N}\pi_{i}x_{i,k}$ and $x_{k}^{*}:=P_{Fix(F)}(\bar{x}_{k})$ for all $k\geq 0$.

For $\|x_{k+1}-\mathbf{A}_{\infty}x_{k+1}\|_{\pi}$, by noting $\mathbf{A}_{\infty}\mathbf{A}=\mathbf{A}_{\infty}$, invoking (14) yields that

\[
\begin{aligned}
&\|x_{k+1}-\mathbf{A}_{\infty}x_{k+1}\|_{\pi}\\
&=\Big\|(1-\alpha)\mathbf{A}x_{k}+\frac{\alpha}{N}(D_{\nu}^{-1}\otimes Id)y_{k}-(1-\alpha)\mathbf{A}_{\infty}\mathbf{A}x_{k}-\frac{\alpha}{N}\mathbf{A}_{\infty}(D_{\nu}^{-1}\otimes Id)y_{k}\Big\|_{\pi}\\
&\leq(1-\alpha)\|\mathbf{A}x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}+\frac{\alpha}{N}\|(I_{N}\otimes Id-\mathbf{A}_{\infty})(D_{\nu}^{-1}\otimes Id)y_{k}\|_{\pi}.
\end{aligned}
\qquad (52)
\]

Consider the last term in (52). One can obtain that

\[
\begin{aligned}
&\|(I_{N}\otimes Id-\mathbf{A}_{\infty})(D_{\nu}^{-1}\otimes Id)y_{k}\|_{\pi}\\
&=\|(I_{N}\otimes Id-\mathbf{A}_{\infty})[(D_{\nu}^{-1}\otimes Id)y_{k}-{\bf 1}_{N}\otimes\bar{y}_{k}]\|_{\pi}\\
&=\|(I_{N}\otimes Id-\mathbf{A}_{\infty})(D_{\nu}^{-1}\otimes Id)[y_{k}-\nu\otimes\bar{y}_{k}]\|_{\pi}\\
&\leq\|I_{N}\otimes Id-\mathbf{A}_{\infty}\|_{\pi}\|D_{\nu}^{-1}\otimes Id\|_{\pi}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\pi}\\
&\leq c_{4}\|D_{\nu}^{-1}\|_{\pi}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu},
\end{aligned}
\qquad (53)
\]

where $\bar{y}_{k}$ is defined in Lemma 6, the first inequality has applied the fact that $\nu\otimes\bar{y}_{k}=\mathbf{B}_{\infty}y_{k}$, and the last inequality has employed (27), $\|D_{\nu}^{-1}\otimes Id\|_{\pi}=\|D_{\nu}^{-1}\|_{\pi}$ (using the same argument as that in Lemma 2), and (20).

In view of (19) and (25), inserting (53) in (52) results in

\[
\|x_{k+1}-\mathbf{A}_{\infty}x_{k+1}\|_{\pi}\leq(1-\alpha)\rho_{1}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}+\frac{\alpha c_{2}c_{4}\|D_{\nu}^{-1}\|}{N}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}.
\qquad (54)
\]

As for $\|x_{k+1}-x_{k}\|$, it can be obtained from (14) that

\[
\begin{aligned}
\|x_{k+1}-x_{k}\|&=\Big\|\mathbf{A}x_{k}-x_{k}+\alpha\Big(\frac{D_{\nu}^{-1}\otimes Id}{N}y_{k}-\mathbf{A}x_{k}\Big)\Big\|\\
&\leq\|A-I\|\|x_{k}-\mathbf{A}_{\infty}x_{k}\|+\frac{\alpha\|D_{\nu}^{-1}\|}{N}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|\\
&\quad+\alpha\Big\|\frac{D_{\nu}^{-1}\otimes Id}{N}\mathbf{B}_{\infty}y_{k}-\mathbf{A}x_{k}\Big\|,
\end{aligned}
\qquad (55)
\]

where the inequality has leveraged the triangle inequality and the facts that $(\mathbf{A}-I_{N}\otimes Id)(x_{k}-\mathbf{A}_{\infty}x_{k})=\mathbf{A}x_{k}-x_{k}$ and $\|\mathbf{A}-I_{N}\otimes Id\|=\|A-I\|$ (using the same argument as that in Lemma 2).

Consider the last term in (55). By using $\mathbf{B}_{\infty}=\nu{\bf 1}_{N}^{\top}\otimes Id$, one has that

\[
\begin{aligned}
\Big\|\frac{D_{\nu}^{-1}\otimes Id}{N}\mathbf{B}_{\infty}y_{k}-\mathbf{A}x_{k}\Big\|&=\Big\|{\bf 1}_{N}\otimes\frac{\bar{y}_{k}}{N}-\mathbf{A}x_{k}\Big\|\\
&\leq\Big\|{\bf 1}_{N}\otimes\Big(\frac{\bar{y}_{k}}{N}-\frac{\sum_{i=1}^{N}F_{i}(\bar{x}_{k})}{N}\Big)\Big\|+\Big\|{\bf 1}_{N}\otimes\Big(\frac{\sum_{i=1}^{N}F_{i}(\bar{x}_{k})}{N}-x_{k}^{*}\Big)\Big\|\\
&\quad+\|{\bf 1}_{N}\otimes x_{k}^{*}-\mathbf{A}_{\infty}x_{k}\|+\|\mathbf{A}_{\infty}x_{k}-\mathbf{A}x_{k}\|.
\end{aligned}
\qquad (56)
\]

For the first term in the last inequality in (56), invoking Lemma 6, one can obtain that

\[
\begin{aligned}
\Big\|{\bf 1}_{N}\otimes\Big(\frac{\bar{y}_{k}}{N}-\frac{\sum_{i=1}^{N}F_{i}(\bar{x}_{k})}{N}\Big)\Big\|^{2}&=\Big\|{\bf 1}_{N}\otimes\frac{\sum_{i=1}^{N}(F_{i}(x_{i,k})-F_{i}(\bar{x}_{k}))}{N}\Big\|^{2}\\
&=\frac{1}{N}\Big\|\sum_{i=1}^{N}(F_{i}(x_{i,k})-F_{i}(\bar{x}_{k}))\Big\|^{2}\\
&\leq\sum_{i=1}^{N}\|F_{i}(x_{i,k})-F_{i}(\bar{x}_{k})\|^{2}\\
&\leq\sum_{i=1}^{N}L_{i}^{2}\|x_{i,k}-\bar{x}_{k}\|^{2}\\
&\leq\bar{L}^{2}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|^{2},
\end{aligned}
\qquad (57)
\]

where the first and second inequalities have employed $\|\sum_{i=1}^{N}z_{i}\|^{2}\leq N\sum_{i=1}^{N}\|z_{i}\|^{2}$ for any vectors $z_{i}$'s and Assumption 2, respectively. Similarly, it can be obtained that

\[
\begin{aligned}
\Big\|{\bf 1}_{N}\otimes\Big(\frac{\sum_{i=1}^{N}F_{i}(\bar{x}_{k})}{N}-x_{k}^{*}\Big)\Big\|^{2}&=N\|F(\bar{x}_{k})-x_{k}^{*}\|^{2}\\
&\leq N\|\bar{x}_{k}-x_{k}^{*}\|^{2}\\
&=\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|^{2},
\end{aligned}
\qquad (58)
\]

where $x_{k}^{*}\in Fix(F)$ and the quasi-nonexpansiveness of $F$ have been used in the inequality.

As a result, substituting (57) and (58) into (56) leads to

\[
\begin{aligned}
\Big\|\frac{D_{\nu}^{-1}\otimes Id}{N}\mathbf{B}_{\infty}y_{k}-\mathbf{A}x_{k}\Big\|&\leq\bar{L}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|+2\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|+\|\mathbf{A}x_{k}-\mathbf{A}_{\infty}x_{k}\|\\
&\leq\frac{\rho_{1}+\bar{L}}{c_{1}}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}+2\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|,
\end{aligned}
\qquad (59)
\]

where (19) and (21) have been utilized in the last inequality. Putting (59) in (55) leads to

\[
\begin{aligned}
\|x_{k+1}-x_{k}\|&\leq\frac{\alpha(\rho_{1}+\bar{L})+\|A-I\|}{c_{1}}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}+\frac{\alpha c_{4}\|D_{\nu}^{-1}\|}{Nc_{1}}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}\\
&\quad+2\alpha\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|.
\end{aligned}
\qquad (60)
\]

Regarding $\|y_{k+1}-\mathbf{B}_{\infty}y_{k+1}\|_{\nu}$, invoking (15) yields that

\[
\begin{aligned}
\|y_{k+1}-\mathbf{B}_{\infty}y_{k+1}\|_{\nu}&=\|\mathbf{B}y_{k}-\mathbf{B}_{\infty}\mathbf{B}y_{k}+\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k})-\mathbf{B}_{\infty}(\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k}))\|_{\nu}\\
&\leq\|\mathbf{B}y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}+\|\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k})\|_{\nu}+\|\mathbf{B}_{\infty}(\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k}))\|_{\nu}\\
&\leq\rho_{2}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}+\frac{c_{2}(1+\sqrt{N})}{c_{3}}\|\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k})\|,
\end{aligned}
\qquad (61)
\]

where $\mathbf{B}_{\infty}\mathbf{B}=\mathbf{B}_{\infty}$ has been used in the first inequality, and (19), (20), (26) and $\|\mathbf{B}_{\infty}\|=\|B_{\infty}\|\leq\sqrt{\|B_{\infty}\|_{1}\|B_{\infty}\|_{\infty}}\leq\sqrt{N}$ have been exploited in the last inequality.

On the other hand, it can be obtained that

\[
\begin{aligned}
\|\mathbf{F}(x_{k+1})-\mathbf{F}(x_{k})\|^{2}&=\sum_{i=1}^{N}\|F_{i}(x_{i,k+1})-F_{i}(x_{i,k})\|^{2}\\
&\leq\sum_{i=1}^{N}L_{i}^{2}\|x_{i,k+1}-x_{i,k}\|^{2}\\
&\leq\bar{L}^{2}\|x_{k+1}-x_{k}\|^{2},
\end{aligned}
\qquad (62)
\]

where Assumption 2 has been applied to obtain the first inequality in (62).

Now substituting (60) and (62) into (61) leads to

\[
\begin{aligned}
\|y_{k+1}-\mathbf{B}_{\infty}y_{k+1}\|_{\nu}&\leq\rho_{2}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}+\frac{c_{2}\bar{L}(\sqrt{N}+1)}{c_{3}}\|x_{k+1}-x_{k}\|\\
&\leq\Big(\rho_{2}+\frac{\alpha c_{2}c_{4}\bar{L}(\sqrt{N}+1)\|D_{\nu}^{-1}\|}{Nc_{1}c_{3}}\Big)\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}\\
&\quad+\frac{c_{2}\bar{L}(\sqrt{N}+1)}{c_{1}c_{3}}(\alpha\theta_{3}+\|A-I\|)\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}\\
&\quad+\frac{2\alpha c_{2}\bar{L}(\sqrt{N}+1)}{c_{3}}\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|,
\end{aligned}
\qquad (63)
\]

where $\theta_{3}=\rho_{1}+\bar{L}$.

With regard to $\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes P_{Fix(F)}(\bar{x}_{k+1})\|$, by defining $\bar{x}_{k}^{*}:=P_{Fix(F)}(F_{\alpha}(\bar{x}_{k}))$ and noting that $\mathbf{A}_{\infty}x_{k+1}={\bf 1}_{N}\otimes\bar{x}_{k+1}$, $x_{k+1}^{*}=P_{Fix(F)}(\bar{x}_{k+1})$ and $\mathbf{A}_{\infty}\mathbf{A}=\mathbf{A}_{\infty}$, invoking (14) gives rise to

\[
\begin{aligned}
&\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes x_{k+1}^{*}\|\\
&\leq\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\|\\
&=\Big\|\mathbf{A}_{\infty}\mathbf{A}x_{k}+\alpha\big[\frac{\mathbf{A}_{\infty}}{N}(D_{\nu}^{-1}\otimes Id)y_{k}-\mathbf{A}_{\infty}\mathbf{A}x_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\Big\|\\
&=\Big\|{\bf 1}_{N}\otimes\bar{x}_{k}+\alpha\big[\frac{\mathbf{A}_{\infty}}{N}({\bf 1}_{N}\otimes\bar{y}_{k})-{\bf 1}_{N}\otimes\bar{x}_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}+\frac{\alpha\mathbf{A}_{\infty}}{N}(D_{\nu}^{-1}\otimes Id)(y_{k}-\mathbf{B}_{\infty}y_{k})\Big\|\\
&\leq\Big\|{\bf 1}_{N}\otimes\bar{x}_{k}+\alpha\big[\frac{\mathbf{A}_{\infty}}{N}({\bf 1}_{N}\otimes\bar{y}_{k})-{\bf 1}_{N}\otimes\bar{x}_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\Big\|+\frac{\alpha c_{4}\|D_{\nu}^{-1}\|}{\sqrt{N}c_{1}}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu},
\end{aligned}
\qquad (64)
\]

where the last inequality has utilized (19), (20), and the fact that $\|\mathbf{A}_{\infty}\|=\|A_{\infty}\|\leq\sqrt{\|A_{\infty}\|_{1}\|A_{\infty}\|_{\infty}}\leq\sqrt{N}$.

On the other hand, by $\mathbf{A}_{\infty}={\bf 1}_{N}\otimes\pi^{\top}$ and Lemma 6, one can obtain that

\[
\begin{aligned}
&\Big\|{\bf 1}_{N}\otimes\bar{x}_{k}+\alpha\big[\frac{\mathbf{A}_{\infty}}{N}({\bf 1}_{N}\otimes\bar{y}_{k})-{\bf 1}_{N}\otimes\bar{x}_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\Big\|\\
&=\Big\|{\bf 1}_{N}\otimes\bar{x}_{k}+\alpha\big[{\bf 1}_{N}\otimes\frac{\sum_{i=1}^{N}F_{i}(x_{i,k})}{N}-{\bf 1}_{N}\otimes\bar{x}_{k}\big]-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}\Big\|\\
&=\Big\|{\bf 1}_{N}\otimes F_{\alpha}(\bar{x}_{k})-{\bf 1}_{N}\otimes\bar{x}_{k}^{*}+\alpha{\bf 1}_{N}\otimes\frac{\sum_{i=1}^{N}(F_{i}(x_{i,k})-F_{i}(\bar{x}_{k}))}{N}\Big\|\\
&\leq\|{\bf 1}_{N}\otimes[F_{\alpha}(\bar{x}_{k})-\bar{x}_{k}^{*}]\|+\alpha\Big\|{\bf 1}_{N}\otimes\frac{\sum_{i=1}^{N}(F_{i}(x_{i,k})-F_{i}(\bar{x}_{k}))}{N}\Big\|,
\end{aligned}
\qquad (65)
\]

where $F_{\alpha}$ is defined in (11).

For the term $\|{\bf 1}_{N}\otimes[F_{\alpha}(\bar{x}_{k})-\bar{x}_{k}^{*}]\|$ in (65), invoking $Fix(F_{\alpha})=Fix(F)$ and Lemma 7 implies that

\[
\begin{aligned}
\|{\bf 1}_{N}\otimes[F_{\alpha}(\bar{x}_{k})-\bar{x}_{k}^{*}]\|^{2}&=N\|F_{\alpha}(\bar{x}_{k})-\bar{x}_{k}^{*}\|^{2}\\
&\leq N\rho_{3}^{2}\|\bar{x}_{k}-P_{Fix(F)}(\bar{x}_{k})\|^{2}\\
&=N\rho_{3}^{2}\|\bar{x}_{k}-x_{k}^{*}\|^{2}\\
&=\rho_{3}^{2}\|{\bf 1}_{N}\otimes(\bar{x}_{k}-x_{k}^{*})\|^{2}.
\end{aligned}
\qquad (66)
\]

Now, putting together (57) and (64)-(66) results in

\[
\begin{aligned}
\|\mathbf{A}_{\infty}x_{k+1}-{\bf 1}_{N}\otimes x_{k+1}^{*}\|&\leq\rho_{3}\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|+\frac{\alpha\bar{L}}{c_{1}}\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi}\\
&\quad+\frac{\alpha c_{4}\|D_{\nu}^{-1}\|}{\sqrt{N}c_{1}}\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu}.
\end{aligned}
\qquad (67)
\]

In summary, let us define $z_{k}:=col(\|x_{k}-\mathbf{A}_{\infty}x_{k}\|_{\pi},\|y_{k}-\mathbf{B}_{\infty}y_{k}\|_{\nu},\|\mathbf{A}_{\infty}x_{k}-{\bf 1}_{N}\otimes x_{k}^{*}\|)$. Combining (54), (63), and (67) with $\alpha\in(0,1)$ yields that

\[
z_{k+1}\leq M(\alpha)z_{k},
\qquad (68)
\]

where $M(\alpha)$ is defined in (34).

It is easy to see that $z_{k}$ converges to the origin at an exponential rate if $\rho(M(\alpha))<1$. To ensure $\rho(M(\alpha))<1$, observe that when $\alpha=0$, $1$ is a simple eigenvalue of $M(0)$ with corresponding left and right eigenvectors both being $col(0,0,1)$. Then invoking Lemma 5 gives rise to

\[
\frac{d\lambda(\alpha)}{d\alpha}\Big|_{\alpha=0}=-\frac{1}{4\kappa^{2}}<0,
\]

indicating that the simple eigenvalue $1$ of $M(0)$ decreases as $\alpha$ increases from $0$. Thus, by the continuity of $\rho(M(\alpha))$ with respect to $\alpha$, there must exist a constant $\alpha_{c}>0$ such that $\rho(M(\alpha))<1$ for all $\alpha\in(0,\alpha_{c})$. To find $\alpha_{c}$, note that the graph associated with $M(\alpha)$, consisting of $3$ agents, is strongly connected, which, in conjunction with Theorem C.3 in [14], implies that $M(\alpha)$ is irreducible. Furthermore, in view of Lemma 3, $M(\alpha)$ is primitive, which together with Lemma 4 implies that $\rho(M(\alpha))$ is a simple eigenvalue of $M(\alpha)$ and all other eigenvalues have absolute values less than $\rho(M(\alpha))$. Moreover, as $\alpha$ increases from $0$, $\rho(M(\alpha))$ can return to $1$ only at some critical point, and thereby the value of $\alpha_{c}$ can be calculated by letting $det(I-M(\alpha))=0$. This completes the proof.
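To illustrate how such a critical stepsize can be located in practice, the sketch below bisects on the crossing $\rho(M(\alpha))=1$ (equivalently, $det(I-M(\alpha))=0$ at that point) for a hypothetical nonnegative matrix family; the concrete entries of $M(\alpha)$ below are stand-ins of our own and are not the matrix defined in (34).

\begin{verbatim}
import numpy as np

# Hypothetical stand-in for M(alpha): rho(M(0)) = 1 with eigenvector e_3, the (3,3)
# entry decreases in alpha, and the off-diagonal couplings grow with alpha.
def M(alpha):
    return np.array([
        [0.6 * (1 - alpha), 0.2 * alpha,       0.0              ],
        [0.8 * alpha,       0.7 + 0.3 * alpha, 0.5 * alpha      ],
        [0.4 * alpha,       0.1 * alpha,       1.0 - 0.5 * alpha],
    ])

def rho(A):
    return np.max(np.abs(np.linalg.eigvals(A)))

lo, hi = 0.0, 1.0
assert rho(M(hi)) > 1.0            # bracket: spectral radius exceeds 1 at alpha = 1
for _ in range(60):                # bisection on the crossing rho(M(alpha)) = 1
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if rho(M(mid)) < 1.0 else (lo, mid)

alpha_c = lo
print("alpha_c ~", alpha_c, ", rho(M(alpha_c)) ~", rho(M(alpha_c)))
\end{verbatim}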

VII-C Proof of Theorem 2

Let us bound $\|x_{k+1}-A_{\infty}x_{k+1}\|_{\pi}$ and $d_{Fix(F)}(\tilde{x}_{k+1})$ in the following.

First, to bound $\|x_{k+1}-A_{\infty}x_{k+1}\|_{\pi}$, in view of (36) and $A_{\infty}A=A_{\infty}$, one has that

\[
\begin{aligned}
\|x_{k+1}-A_{\infty}x_{k+1}\|_{\pi}&=\|Ax_{k}+\alpha\bar{F}-A_{\infty}Ax_{k}-\alpha A_{\infty}\bar{F}\|_{\pi}\\
&\leq\|Ax_{k}-A_{\infty}x_{k}\|_{\pi}+\alpha\|\bar{F}-{\bf 1}_{N}\tilde{F}\|_{\pi}\\
&\leq\rho_{1}\|x_{k}-A_{\infty}x_{k}\|_{\pi}+\alpha c_{2}\|\bar{F}-{\bf 1}_{N}\tilde{F}\|,
\end{aligned}
\qquad (69)
\]

where (19) and (38) have been utilized in the last inequality.

For the last term in (69), it is easy to verify that

\[
\bar{F}-{\bf 1}_{N}\tilde{F}=\left(\begin{array}{cccc}
(\frac{1}{\pi_{1}}-1)e_{1,k}&-e_{2,k}&\cdots&-e_{N,k}\\
-e_{1,k}&(\frac{1}{\pi_{2}}-1)e_{2,k}&\cdots&-e_{N,k}\\
\vdots&\vdots&\ddots&\vdots\\
-e_{1,k}&-e_{2,k}&\cdots&(\frac{1}{\pi_{N}}-1)e_{N,k}
\end{array}\right)
\qquad (74)
\]

with $e_{i,k}:=\texttt{F}_{i}(x_{k}^{i})-\hat{x}_{i,k}$, and thus it can be obtained that

\[
\begin{aligned}
\|\bar{F}-{\bf 1}_{N}\tilde{F}\|^{2}&=\sum_{i=1}^{N}\big[(\tfrac{1}{\pi_{i}}-1)^{2}\|e_{i,k}\|^{2}+\sum_{j\neq i}\|e_{j,k}\|^{2}\big]\\
&\leq\varpi\sum_{i=1}^{N}\|e_{i,k}\|^{2}\\
&=\varpi\sum_{i=1}^{N}\|\texttt{F}_{i}(x_{k}^{i})-\texttt{F}_{i}(\tilde{x}_{k})+\texttt{F}_{i}(\tilde{x}_{k})-x_{i}^{*}+x_{i}^{*}-\tilde{x}_{i,k}+\tilde{x}_{i,k}-\hat{x}_{i,k}\|^{2}\\
&\leq 4\varpi\sum_{i=1}^{N}\big(\|\texttt{F}_{i}(x_{k}^{i})-\texttt{F}_{i}(\tilde{x}_{k})\|^{2}+\|\texttt{F}_{i}(\tilde{x}_{k})-x_{i}^{*}\|^{2}+\|\tilde{x}_{i,k}-x_{i}^{*}\|^{2}+\|\hat{x}_{i,k}-\tilde{x}_{i,k}\|^{2}\big),
\end{aligned}
\qquad (75)
\]

where $\varpi:=N-1+(1-\underline{\pi})^{2}/\underline{\pi}^{2}$, $x^{*}=(x_{1}^{*},\ldots,x_{N}^{*})$ denotes any fixed point of $F$, $\pi_{i}\geq\underline{\pi}$ has been used in the first inequality, and $\|\sum_{i=1}^{n}z_{i}\|^{2}\leq n\sum_{i=1}^{n}\|z_{i}\|^{2}$ for any vectors $z_{i}$'s in the last inequality. Note that $\tilde{x}_{i,k}$ is the $i$-th block-coordinate of $\tilde{x}_{k}$ defined in (37).

To proceed, one can obtain that

\[
\begin{aligned}
\sum_{i=1}^{N}\|\hat{x}_{i,k}-\tilde{x}_{i,k}\|^{2}&=\sum_{i=1}^{N}\Big\|\sum_{j=1}^{N}a_{ij}(x_{i,k}^{j}-\tilde{x}_{i,k})\Big\|^{2}\\
&\leq\sum_{i=1}^{N}\sum_{j=1}^{N}a_{ij}\|x_{i,k}^{j}-\tilde{x}_{i,k}\|^{2}\\
&\leq\sum_{i=1}^{N}\sum_{j=1}^{N}\|x_{i,k}^{j}-\tilde{x}_{i,k}\|^{2}\\
&=\|x_{k}-{\bf 1}_{N}\tilde{x}_{k}\|^{2},
\end{aligned}
\qquad (76)
\]

where $\sum_{j=1}^{N}a_{ij}=1$ has been exploited for the first equality, the convexity of $\|\cdot\|^{2}$ for the first inequality, and $a_{ij}\leq 1$ for the second inequality.

Now, invoking Assumptions 2 and 3.1, (75) and (76) yields

\[
\|\bar{F}-{\bf 1}_{N}\tilde{F}\|^{2}\leq 4\varpi(\bar{L}^{2}+1)\|x_{k}-{\bf 1}_{N}\tilde{x}_{k}\|^{2}+8\varpi\|\tilde{x}_{k}-x^{*}\|^{2},
\qquad (77)
\]

which, together with ${\bf 1}_{N}\tilde{x}_{k}=A_{\infty}x_{k}$, implies

\[
\|\bar{F}-{\bf 1}_{N}\tilde{F}\|\leq 2\sqrt{\varpi(\bar{L}^{2}+1)}\|x_{k}-A_{\infty}x_{k}\|+2\sqrt{2\varpi}\|\tilde{x}_{k}-x^{*}\|.
\qquad (78)
\]

At this step, by choosing $x^{*}=P_{Fix(F)}(\tilde{x}_{k})$, substituting (78) into (69) leads to

\[
\|x_{k+1}-A_{\infty}x_{k+1}\|_{\pi}\leq(\rho_{1}+\alpha\theta_{3})\|x_{k}-A_{\infty}x_{k}\|_{\pi}+2\alpha c_{2}\sqrt{2\varpi}\,d_{Fix(F)}(\tilde{x}_{k}),
\qquad (79)
\]

where $\theta_{3}:=2c_{2}\sqrt{N(\bar{L}^{2}+1)}/c_{1}$.

Second, to bound $d_{Fix(F)}(\tilde{x}_{k+1})$, one can first observe that

\[
\begin{aligned}
\tilde{F}&=(e_{1,k},\ldots,e_{N,k})\\
&=(\texttt{F}_{1}(\tilde{x}_{k})-\tilde{x}_{1,k},\ldots,\texttt{F}_{N}(\tilde{x}_{k})-\tilde{x}_{N,k})+h_{1,k}+h_{2,k},
\end{aligned}
\qquad (80)
\]

where $e_{i,k}:=\texttt{F}_{i}(x_{k}^{i})-\hat{x}_{i,k}$ and

\[
\begin{aligned}
h_{1,k}&:=(\texttt{F}_{1}(x_{k}^{1})-\texttt{F}_{1}(\tilde{x}_{k}),\ldots,\texttt{F}_{N}(x_{k}^{N})-\texttt{F}_{N}(\tilde{x}_{k})),\\
h_{2,k}&:=(\tilde{x}_{1,k}-\hat{x}_{1,k},\ldots,\tilde{x}_{N,k}-\hat{x}_{N,k}).
\end{aligned}
\]

Meanwhile, invoking Assumption 2 yields that

\[
\begin{aligned}
\|h_{1,k}\|^{2}&=\sum_{i=1}^{N}\|\texttt{F}_{i}(x_{k}^{i})-\texttt{F}_{i}(\tilde{x}_{k})\|^{2}\leq\sum_{i=1}^{N}L_{i}^{2}\|x_{k}^{i}-\tilde{x}_{k}\|^{2}\\
&\leq\bar{L}^{2}\|x_{k}-{\bf 1}_{N}\tilde{x}_{k}\|^{2},
\end{aligned}
\qquad (81)
\]

and by (76) and $\pi_{i}\in(0,1)$, one has that

\[
\|h_{2,k}\|^{2}=\sum_{i=1}^{N}\|\hat{x}_{i,k}-\tilde{x}_{i,k}\|^{2}\leq\|x_{k}-{\bf 1}_{N}\tilde{x}_{k}\|^{2}.
\qquad (82)
\]

Now, in view of (37), (80), (81), (82) and ${\bf 1}_{N}\tilde{x}_{k}=A_{\infty}x_{k}$, one has that, for $y^{*}=P_{Fix(F)}(F_{\alpha}(\tilde{x}_{k}))\in Fix(F)$,

\[
\begin{aligned}
\|\tilde{x}_{k+1}-y^{*}\|&=\|\tilde{x}_{k}+\alpha\tilde{F}-y^{*}\|\\
&=\|F_{\alpha}(\tilde{x}_{k})-y^{*}+\alpha(h_{1,k}+h_{2,k})\|\\
&\leq\|F_{\alpha}(\tilde{x}_{k})-y^{*}\|+\alpha(\|h_{1,k}\|+\|h_{2,k}\|)\\
&\leq d_{Fix(F)}(F_{\alpha}(\tilde{x}_{k}))+\frac{\alpha(\bar{L}+1)}{c_{1}}\|x_{k}-A_{\infty}x_{k}\|_{\pi},
\end{aligned}
\qquad (83)
\]

where (19) has been employed in the last inequality; recall that $F_{\alpha}:=Id+\alpha(F-Id)$.

To analyze the term $d_{Fix(F)}(F_{\alpha}(\tilde{x}_{k}))$ in (83), invoking Lemma 7 and (83) yields that

\[
\|\tilde{x}_{k+1}-y^{*}\|\leq\rho_{3}d_{Fix(F)}(\tilde{x}_{k})+\frac{\alpha(\bar{L}+1)}{c_{1}}\|x_{k}-A_{\infty}x_{k}\|_{\pi}.
\qquad (84)
\]

Combining (84) with $d_{Fix(F)}(\tilde{x}_{k+1})\leq\|\tilde{x}_{k+1}-y^{*}\|$ yields that

\[
d_{Fix(F)}(\tilde{x}_{k+1})\leq\rho_{3}d_{Fix(F)}(\tilde{x}_{k})+\frac{\alpha(\bar{L}+1)}{c_{1}}\|x_{k}-A_{\infty}x_{k}\|_{\pi}.
\qquad (85)
\]

Finally, by setting $z_{k}:=col(\|x_{k}-A_{\infty}x_{k}\|_{\pi},d_{Fix(F)}(\tilde{x}_{k}))$ for $k\geq 0$, invoking (79) and (85) results in

\[
z_{k+1}\leq\Theta(\alpha)z_{k},
\qquad (86)
\]

where $\Theta(\alpha)$ is defined in (42). Note that $A_{\infty}x_{k}={\bf 1}_{N}\tilde{x}_{k}$. At this step, Theorem 2 can be proved by following an analysis similar to that after (68) in the proof of Theorem 1.

References

  • [1] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed.   Springer, New York, 2017.
  • [2] A. Cegielski, Iterative Methods for Fixed Point Problems in Hilbert Spaces.   Springer, Heidelberg, 2012, vol. 2057.
  • [3] J. Liang, J. Fadili, and G. Peyré, “Convergence rates with inexact non-expansive operators,” Mathematical Programming, vol. 159, no. 1-2, pp. 403–434, 2016.
  • [4] J. M. Borwein, G. Li, and M. K. Tam, “Convergence rate analysis for averaged fixed point iterations in common fixed point problems,” SIAM Journal on Optimization, vol. 27, no. 1, pp. 1–33, 2017.
  • [5] M. Bravo, R. Cominetti, and M. Pavez-Signé, “Rates of convergence for inexact Krasnosel’skiĭ-Mann iterations in Banach spaces,” Mathematical Programming, no. 1-2, pp. 241–262, 2019.
  • [6] A. Themelis and P. Patrinos, “SuperMann: A superlinearly convergent algorithm for finding fixed points of nonexpansive operators,” IEEE Transactions on Automatic Control, vol. 64, no. 12, pp. 4875–4890, 2019.
  • [7] W. R. Mann, “Mean value methods in iteration,” Proceedings of the American Mathematical Society, vol. 4, no. 3, pp. 506–510, 1953.
  • [8] M. A. Krasnosel’skiĭ, “Two comments on the method of successive approximations,” Uspekhi Matematicheskikh Nauk, vol. 10, pp. 123–127, 1955.
  • [9] S. Reich, “Weak convergence theorems for nonexpansive mappings in Banach spaces,” Journal of Mathematical Analysis and Applications, vol. 67, no. 2, pp. 274–276, 1979.
  • [10] A. Nedić and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
  • [11] S. Liu, Z. Qiu, and L. Xie, “Convergence rate analysis of distributed optimization with projected subgradient algorithm,” Automatica, vol. 83, pp. 162–169, 2017.
  • [12] X. Li, L. Xie, and Y. Hong, “Distributed continuous-time nonsmooth convex optimization with coupled inequality constraints,” IEEE Transactions on Control of Network Systems, vol. 7, no. 1, pp. 74–84, 2020.
  • [13] V. G. L. Mejia, F. L. Lewis, Y. Wan, E. N. Sanchez, and L. Fan, “Solutions for multi-agent pursuit-evasion games on communication graphs: Finite-time capture and asymptotic behaviors,” IEEE Transactions on Automatic Control, vol. 65, no. 5, pp. 1911–1923, 2020.
  • [14] W. Ren and R. W. Beard, Distributed Consensus in Multi-Vehicle Cooperative Control.   London, U.K.: Springer-Verlag, 2008.
  • [15] X. Li, M. Z. Q. Chen, and H. Su, “Quantized consensus of multi-agent networks with sampled data and Markovian interaction links,” IEEE Transactions on Cybernetics, vol. 49, no. 5, pp. 1816–1825, 2019.
  • [16] D. Fullmer and A. S. Morse, “A distributed algorithm for computing a common fixed point of a finite family of paracontractions,” IEEE Transactions on Automatic Control, vol. 63, no. 9, pp. 2833–2843, 2018.
  • [17] D. Fullmer, J. Liu, and A. S. Morse, “An asynchronous distributed algorithm for computing a common fixed point of a family of paracontractions,” in Proceedings of 55th Conference on Decision and Control, Las Vegas, USA, 2016, pp. 2620–2625.
  • [18] J. Liu, D. Fullmer, A. Nedić, T. Başar, and A. S. Morse, “A distributed algorithm for computing a common fixed point of a family of strongly quasi-nonexpansive maps,” in Proceedings of American Control Conference, Seattle, USA, 2017, pp. 686–690.
  • [19] S. S. Alaviani and N. Elia, “Distributed multi-agent convex optimization over random digraphs,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 986–998, 2020.
  • [20] X. Li and G. Feng, “Distributed algorithms for computing a common fixed point of a group of nonexpansive operators,” IEEE Transactions on Automatic Control, vol. 66, no. 5, pp. 2130–2145, 2021.
  • [21] X. Li and L. Xie, “Distributed algorithms for computing a fixed point of multi-agent nonexpansive operators,” Automatica, vol. 122, p. 109286, 2020.
  • [22] I. Necoara, P. Richtárik, and A. Patrascu, “Randomized projection methods for convex feasibility problems: Conditioning and convergence rates,” SIAM Journal on Optimization, vol. 29, no. 4, pp. 2814–2852, 2019.
  • [23] A. Y. Kruger, D. R. Luke, and N. H. Thao, “Set regularities and feasibility problems,” Mathematical Programming, vol. 168, no. 1-2, pp. 279–311, 2018.
  • [24] S. Mou, J. Liu, and A. S. Morse, “A distributed algorithm for solving a linear algebraic equation,” IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2863–2878, 2015.
  • [25] P. Wang, W. Ren, and Z. Duan, “Distributed algorithm to solve a system of linear equations with unique or multiple solutions from arbitrary initializations,” IEEE Transactions on Control of Network Systems, vol. 6, no. 1, pp. 82–93, 2019.
  • [26] S. S. Alaviani and N. Elia, “A distributed algorithm for solving linear algebraic equations over random networks,” in Proceedings of 57th Conference on Decision and Control, Miami Beach, FL, USA, 2018, pp. 83–88.
  • [27] X. Li, M. Meng, and L. Xie, “A linearly convergent algorithm for multi-agent quasi-nonexpansive operators in real Hilbert spaces,” in Proceedings of 59th IEEE Conference on Decision and Control, Jeju Island, Korea, 2020, pp. 4903–4908.
  • [28] H. H. Bauschke and J. M. Borwein, “On projection algorithms for solving convex feasibility problems,” SIAM Review, vol. 38, no. 3, pp. 367–426, 1996.
  • [29] H. H. Bauschke, D. Noll, and H. M. Phan, “Linear and strong convergence of algorithms involving averaged nonexpansive operators,” Journal of Mathematical Analysis and Applications, vol. 421, no. 1, pp. 1–20, 2015.
  • [30] G. Banjac and P. J. Goulart, “Tight global linear convergence rate bounds for operator splitting methods,” IEEE Transactions on Automatic Control, vol. 63, no. 12, pp. 4126–4139, 2018.
  • [31] A. Cegielski, S. Reich, and R. Zalas, “Regular sequences of quasi-nonexpansive operators and their applications,” SIAM Journal on Optimization, vol. 28, no. 2, pp. 1508–1532, 2018.
  • [32] L. Debnath and P. Mikusinski, Introduction to Hilbert Spaces with Applications.   Academic Press, 2005.
  • [33] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed.   MIT press, 2018.
  • [34] R. Xin, A. K. Sahu, U. A. Khan, and S. Kar, “Distributed stochastic optimization with gradient tracking over strongly-connected networks,” in Proceedings of 57th Conference on Decision and Control, Nice, France, 2019.
  • [35] S. Matsushita, “On the convergence rate of the Krasnosel’skiĭ-Mann iteration,” Bulletin of the Australian Mathematical Society, vol. 96, no. 1, pp. 162–170, 2017.
  • [36] J. Zhang and K. You, “Decentralized stochastic gradient tracking for empirical risk minimization,” arXiv preprint arXiv:1909.02712, 2019.
  • [37] X. Li, X. Yi, and L. Xie, “Distributed online optimization for multi-agent networks with coupled inequality constraints,” IEEE Transactions on Automatic Control, in press, doi: 10.1109/TAC.2020.3021011, 2020.
  • [38] T. Charalambous, M. G. Rabbat, M. Johansson, and C. N. Hadjicostis, “Distributed finite-time computation of digraph parameters: Left-eigenvector, out-degree and spectrum,” IEEE Transactions on Control of Network Systems, vol. 3, no. 2, pp. 137–148, 2015.
  • [39] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed.   New York, NY: Cambridge University Press, 2012.
  • [40] X. Li, L. Xie, and Y. Hong, “Distributed aggregative optimization over multi-agent networks,” arXiv preprint arXiv:2005.13436, 2020.
  • [41] D. Jakovetić, J. M. F. Moura, and J. Xavier, “Linear convergence rate of a class of distributed augmented Lagrangian algorithms,” IEEE Transactions on Automatic Control, vol. 60, no. 4, pp. 922–936, 2014.
  • [42] A. Nedić, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM Journal on Optimization, vol. 27, no. 4, pp. 2597–2633, 2017.
  • [43] J. Xu, S. Zhu, Y. C. Soh, and L. Xie, “Convergence of asynchronous distributed gradient methods over stochastic networks,” IEEE Transactions on Automatic Control, vol. 63, no. 2, pp. 434–448, 2018.
  • [44] R. Xin and U. A. Khan, “Distributed heavy-ball: A generalization and acceleration of first-order methods with gradient tracking,” IEEE Transactions on Automatic Control, vol. 65, no. 6, pp. 2627–2633, 2020.
  • [45] S. Pu, W. Shi, J. Xu, and A. Nedić, “Push-pull gradient methods for distributed optimization in networks,” IEEE Transactions on Automatic Control, in press, doi: 10.1109/TAC.2020.2972824, 2020.
  • [46] S. Liang, L. Y. Wang, and G. Yin, “Exponential convergence of distributed primal-dual convex optimization algorithm without strong convexity,” Automatica, vol. 105, pp. 298–306, 2019.
  • [47] T. Tatarenko and A. Nedić, “Geometric convergence of distributed gradient play in games with unconstrained action sets,” arXiv preprint arXiv:1907.07144, 2019.
  • [48] M. Bianchi, G. Belgioioso, and S. Grammatico, “A distributed proximal-point algorithm for Nash equilibrium seeking under partial-decision information with geometric convergence,” arXiv preprint arXiv:1910.11613, 2019.
  • [49] M. Bianchi and S. Grammatico, “Fully distributed Nash equilibrium seeking over time-varying communication networks with linear convergence rate,” arXiv preprint arXiv:2003.10871, 2020.
  • [50] M. Meng and X. Li, “On the linear convergence of distributed Nash equilibrium seeking for multi-cluster games under partial-decision information,” arXiv preprint arXiv:2005.06923, 2020.
  • [51] P.-W. Wang and C.-J. Lin, “Iteration complexity of feasible descent methods for convex optimization,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1523–1548, 2014.