
Distributed Convex Optimization with State-Dependent (Social) Interactions over Random Networks

Seyyed Shaho Alaviani (salavian@umn.edu)
Department of Mechanical Engineering, University of Minnesota, Minneapolis, Minnesota, USA

Atul Kelkar (akelkar1@binghamton.edu)
Dean, Thomas J. Watson College of Engineering and Applied Science, Binghamton University, Binghamton, New York, USA
Abstract

This paper addresses distributed multi-agent convex optimization in which the communication network among the agents is represented by a random sequence of possibly state-dependent weighted graphs. This is the first work to consider both random arbitrary communication networks and state-dependent interactions among agents. The state-dependent weighted random operator of the graph is shown to be quasi-nonexpansive; this property removes the need to impose an a priori distribution assumption on the random communication topologies. The framework therefore covers a more general class of random networks, with or without asynchronous protocols. A more general mathematical optimization problem than those addressed in the literature is formulated, namely minimization of a convex function over the fixed-value point set of a quasi-nonexpansive random operator. A discrete-time algorithm is provided that converges both almost surely and in mean square to the global solution of the optimization problem. As a special case, it reduces to a totally asynchronous algorithm for the distributed optimization problem. The algorithm converges even if the weighted matrix of the graph is periodic and irreducible under a synchronous protocol. Finally, a case study on a network of robots in an automated warehouse is given where there is distribution dependency among the random communication graphs.

keywords:
46Exx, 49Mxx, 65Kxx

1 Introduction

Distributed multi-agent optimization has been an attractive topic due to its applications in several areas such as power systems, smart buildings, and machine learning, to name a few; consequently, distributed optimization problems have received much attention (see the surveys [1]-[4]). Switched dynamical systems are divided into two categories: arbitrary (or state-independent) and state-dependent (see [5] and the references therein for details and several examples). Many of the references cited in the surveys [1]-[4] have investigated distributed optimization over arbitrary networks.

On the other hand, state-dependent networks arise in practical systems such as flocking of birds [6], opinion dynamics [7]-[15], mobile robotic networks [16], wireless networks [17], and predator-prey interaction [18]. For example, an agent in a social network weighs the opinions of others based on how close its opinion is to theirs (see Section I of the preliminary version [31] for more details).

In state-dependent networks, the coupling between algorithm analysis and information exchange among agents poses a significant challenge because the states of the agents at each time determine the weights in the communication network. Hence, designing distributed algorithms for consensus and optimization over state-dependent networks remains challenging.

The consensus problem for opinion dynamics has been investigated in [7]-[15]. Existence of consensus in a multi-robot network has been shown in [19]. Distributed consensus [21]-[26] and distributed optimization [20], [27]-[29] over state-dependent networks with time-invariant or time-varying (see footnote 1) arbitrary graphs have been considered. Hence, the gap in the literature is distributed multi-agent optimization with both state-dependent interactions and random arbitrary (see footnote 1) networks.

Footnote 1: The underlying communication graph is a priori known at each time t in a time-varying arbitrary network, whereas it is a priori unknown in a random arbitrary network.

This paper addresses distributed multi-agent convex optimization over networks that are both state-dependent and random arbitrary, a combination that has not been addressed in the literature. Assuming that the weighted matrix of the graph is doubly stochastic with respect to the state variables for each communication network, and that the union of the communication networks is strongly connected, makes the result applicable to a periodic and irreducible weighted matrix of the graph under a synchronous protocol (see footnote 2). We show that the state-dependent weighted random operator of the graph is quasi-nonexpansive (see footnote 3); therefore, imposing an a priori distribution on the random communication topologies is not required. Thus, the framework covers a more general class of switched networks: random arbitrary networks with or without asynchronous protocols. As an extension of the distributed optimization problem, we formulate a more general mathematical optimization problem than that defined in [30], namely minimization of a convex function over the fixed-value point set of a quasi-nonexpansive random operator. Consequently, its reduction to distributed optimization covers both state-independent and state-dependent networks over random arbitrary communication graphs with or without asynchronous protocols (see footnote 3). We prove that the discrete-time algorithm proposed in [30] can be used with quasi-nonexpansive random operators (which include nonexpansive random operators as a special case). The algorithm converges both almost surely and in mean square to the global optimal solution of the optimization problem under suitable assumptions. For the distributed optimization problem, the algorithm reduces to a totally asynchronous algorithm (see footnote 2). It should be noted that the distributed algorithm (see footnote 4) is totally asynchronous but not asynchronous due to the synchronized diminishing step size. The algorithm converges even if the weighted matrix of the graph is periodic and irreducible under a synchronous protocol. We provide a numerical example with distribution dependency among the random arbitrary switching graphs and apply the distributed algorithm to validate the results, a setting no existing reference can handle (see Example 1).

Footnote 2: In a synchronous protocol, all nodes activate at the same time and perform communication updates. In an asynchronous protocol, each node has its own concept of time defined by a local timer, which is triggered randomly either by the local timer or by a message from neighboring nodes. Algorithms guaranteed to work with no a priori bound on the time between updates are called totally asynchronous, and those that need knowledge of such an a priori bound, known as the B-connectivity assumption, are called partially asynchronous (see [32] and [33, Ch. 6-7]).

Footnote 3: It has been shown in [30] that the state-independent weighted random operator of the graph is nonexpansive.

Footnote 4: We clarify that the distributed algorithm in this paper is the randomized version of the algorithm presented in [29]. In [29], convergence under deterministic arbitrary switching (see footnote 1) is provided, while here we prove its stochastic convergence (both almost sure and mean square) under random arbitrary switching. Furthermore, the quasi-nonexpansivity of the state-dependent weighted operator of the graph (defined in [29]) was not shown in [29], whereas we show it here.
This version provides proofs, mean square convergence of the proposed algorithm, a numerical example, and a larger range of a parameter (namely \beta) in the algorithm, none of which were presented in the preliminary version [31].

This paper is organized as follows. In Section 2, preliminaries on convex analysis and stochastic convergence are given. In Section 3, formulations of the distributed optimization problem and the mathematical optimization problem are provided. The algorithm and its convergence analysis are presented in Section 4. Finally, a numerical example is given in Section 5 to show the advantages of the results, followed by conclusions and future work in Section 6.

Notations: \Re denotes the set of all real numbers. For any vector z\in\Re^{n}, \|z\|_{2}=\sqrt{z^{T}z}, and for any matrix Z\in\Re^{n\times n}, \|Z\|_{2}=\sqrt{\lambda_{\max}(Z^{T}Z)}=\sigma_{\max}(Z), where Z^{T} represents the transpose of the matrix Z, \lambda_{\max} the maximum eigenvalue, and \sigma_{\max} the largest singular value. With eigenvalues sorted in increasing order of real parts, \lambda_{2}(Z) represents the second eigenvalue of a matrix Z. Re(r) represents the real part of the complex number r. For any matrix Z=[z_{ij}]\in\Re^{n\times n}, \|Z\|_{1}=\max_{1\leq j\leq n}\{\sum_{i=1}^{n}|z_{ij}|\} and \|Z\|_{\infty}=\max_{1\leq i\leq n}\{\sum_{j=1}^{n}|z_{ij}|\}. I_{n} represents the identity matrix of size n\times n for some n\in\mathbb{N}, where \mathbb{N} denotes the set of all natural numbers. \nabla f(x) denotes the gradient of the function f(x). \otimes denotes the Kronecker product, and \times the Cartesian product. E[x] denotes the expectation of the random variable x.

2 Preliminaries

A vector v\in\Re^{n} is said to be a stochastic vector when its components v_{i}, i=1,2,\ldots,n, are non-negative and sum to 1; a square n\times n matrix V is said to be a stochastic matrix when each row of V is a stochastic vector. A square n\times n matrix V is said to be doubly stochastic when both V and V^{T} are stochastic matrices.
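As a quick numerical illustration of these definitions, the following is a minimal sketch assuming NumPy (the matrix V and the tolerance are our own choices, not from the paper):

```python
import numpy as np

def is_stochastic(V, tol=1e-12):
    """Nonnegative entries and every row summing to 1."""
    return bool(np.all(V >= -tol) and np.allclose(V.sum(axis=1), 1.0, atol=tol))

def is_doubly_stochastic(V, tol=1e-12):
    """Both V and V^T are stochastic, i.e., rows and columns sum to 1."""
    return is_stochastic(V, tol) and is_stochastic(V.T, tol)

V = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.25, 0.25],
              [0.0, 0.25, 0.75]])
print(is_stochastic(V), is_doubly_stochastic(V))  # True True (V is symmetric)
```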

Let \mathcal{H} be a real Hilbert space with norm \|\cdot\| and inner product \langle\cdot,\cdot\rangle. An operator A:\mathcal{H}\longrightarrow\mathcal{H} is said to be monotone if \langle x-y,Ax-Ay\rangle\geq 0 for all x,y\in\mathcal{H}, and \rho-strongly monotone if \langle x-y,Ax-Ay\rangle\geq\rho\|x-y\|^{2} for all x,y\in\mathcal{H}. A differentiable function f:\mathcal{H}\longrightarrow\Re is \rho-strongly convex if \langle x-y,\nabla f(x)-\nabla f(y)\rangle\geq\rho\|x-y\|^{2} for all x,y\in\mathcal{H}; hence, a function is \rho-strongly convex if its gradient is \rho-strongly monotone. A convex differentiable function f:\mathcal{H}\longrightarrow\Re is \mathcal{L}-strongly smooth if

\langle x-y,\nabla f(x)-\nabla f(y)\rangle\leq\mathcal{L}\|x-y\|^{2},\quad\forall x,y\in\mathcal{H}.

A mapping B:\mathcal{H}\longrightarrow\mathcal{H} is said to be K-Lipschitz continuous if there exists K>0 such that \|Bx-By\|\leq K\|x-y\| for all x,y\in\mathcal{H}. Let S be a nonempty subset of a Hilbert space \mathcal{H} and Q:S\longrightarrow\mathcal{H}. The point x is called a fixed point of Q if x=Q(x), and Fix(Q) denotes the set of all fixed points of Q.

Let \omega^{*} and \omega denote elements of the sets \Omega^{*} and \Omega, respectively, where \Omega=\Omega^{*}\times\Omega^{*}\times\ldots. Let (\Omega^{*},\sigma) be a measurable space (\sigma a sigma-algebra) and C a nonempty subset of a Hilbert space \mathcal{H}. A mapping x:\Omega^{*}\longrightarrow\mathcal{H} is measurable if x^{-1}(U)\in\sigma for each open subset U of \mathcal{H}. The mapping T:\Omega^{*}\times C\longrightarrow\mathcal{H} is a random map if for each fixed z\in C the mapping T(\cdot,z):\Omega^{*}\longrightarrow\mathcal{H} is measurable, and it is continuous if for each \omega^{*}\in\Omega^{*} the mapping T(\omega^{*},\cdot):C\longrightarrow\mathcal{H} is continuous.

Definition 1.

A measurable mapping x:\Omega^{*}\longrightarrow C, C\subseteq\mathcal{H}, is a random fixed point of the random map T:\Omega^{*}\times C\longrightarrow\mathcal{H} if T(\omega^{*},x(\omega^{*}))=x(\omega^{*}) for each \omega^{*}\in\Omega^{*}.

Definition 2.

[30] If there exists a point \hat{x}\in\mathcal{H} such that \hat{x}=T(\omega^{*},\hat{x}) for all \omega^{*}\in\Omega^{*}, it is called a fixed-value point, and FVP(T) represents the set of all fixed-value points of T.

Definition 3.

Let C be a nonempty subset of a Hilbert space \mathcal{H} and T:\Omega^{*}\times C\longrightarrow C be a random map. The map T is said to be

1) a nonexpansive random operator if for each \omega^{*}\in\Omega^{*} and arbitrary x,y\in C we have

\|T(\omega^{*},x)-T(\omega^{*},y)\|\leq\|x-y\|, (1)

2) a quasi-nonexpansive random operator if for any x\in C we have

\|T(\omega^{*},x)-\xi(\omega^{*})\|\leq\|x-\xi(\omega^{*})\|

where \xi:\Omega^{*}\longrightarrow C is a random fixed point of T (see Definition 1).

Note that if \|T(\omega^{*},x)-T(\omega^{*},y)\|\leq\gamma\|x-y\|, 0\leq\gamma<1, holds in place of (1), the operator is called a (Banach) contraction.

Remark 1.

If a nonexpansive random operator has a random fixed point, then it is a quasi-nonexpansive random operator. From Definitions 2 and 3, if a quasi-nonexpansive random operator has a fixed-value point, say x^{*}, then for any x\in C we have

\|T(\omega^{*},x)-x^{*}\|\leq\|x-x^{*}\|. (2)
Proposition 1.

[34, Th. 1] If C is a closed convex subset of a Hilbert space \mathcal{H} and T:C\longrightarrow C is quasi-nonexpansive, then Fix(T) is a nonempty closed convex set.

Definition 4.

A sequence of random variables x_{t} is said to converge

1) pointwise (surely) to x if for every \omega\in\Omega,

\lim_{t\longrightarrow\infty}\|x_{t}(\omega)-x(\omega)\|=0,

2) almost surely to x if there exists a subset \mathcal{A}\subseteq\Omega with Pr(\mathcal{A})=0 such that for every \omega\notin\mathcal{A},

\lim_{t\longrightarrow\infty}\|x_{t}(\omega)-x(\omega)\|=0,

3) in mean square to x if

E[\|x_{t}-x\|^{2}]\longrightarrow 0\ \text{as}\ t\longrightarrow\infty.

Lemma 1. [35, Ch. 5] Let W\in\Re^{m\times m}. Then \|W\|_{2}\leq\sqrt{\|W\|_{1}\|W\|_{\infty}}.

Lemma 2. [36] Let \{a_{t}\}_{t=0}^{\infty} be a sequence of nonnegative real numbers satisfying a_{t+1}\leq(1-b_{t})a_{t}+b_{t}h_{t}, t\geq 0, where b_{t}\in[0,1], \sum_{t=0}^{\infty}b_{t}=\infty, and \limsup_{t\longrightarrow\infty}h_{t}\leq 0. Then \lim_{t\longrightarrow\infty}a_{t}=0.
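A toy run of Lemma 2 (our own choices of b_{t} and h_{t}, assuming NumPy) makes the conclusion concrete:

```python
# With b_t = 1/(t+1) (so sum b_t = infinity) and h_t = 1/sqrt(t+1) (h_t -> 0,
# hence limsup h_t <= 0), the worst case a_{t+1} = (1 - b_t) a_t + b_t h_t
# telescopes to T a_T = sum_{t<T} h_t, so a_T ~ 2/sqrt(T) -> 0 as predicted.
import numpy as np

a, T = 10.0, 200000
for t in range(T):
    b = 1.0 / (t + 1)          # b_t in [0,1], non-summable
    h = 1.0 / np.sqrt(t + 1)   # h_t -> 0
    a = (1 - b) * a + b * h
print(a, 2 / np.sqrt(T))       # both approximately 4.5e-3
```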

Lemma 3. Let the sequence \{x_{t}\}_{t=0}^{\infty} in a real Hilbert space \mathcal{H} be bounded for each realization \omega\in\Omega and converge almost surely to x^{*}. Then the sequence converges in mean square to x^{*}.

Proof: See the proof of Theorem 2 in [30].

The Cucker-Smale weight [6], which depends on the distance between two agents i and j, is of the form

\mathcal{W}_{ij}(x_{i},x_{j})=\frac{Q}{(\sigma^{2}+\|x_{i}-x_{j}\|^{2}_{2})^{\beta}} (3)

where Q,\sigma>0 and \beta\geq 0.
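For concreteness, (3) translates directly into code; a minimal sketch assuming NumPy, with Q, sigma, and beta instantiated to the values later used in (14):

```python
import numpy as np

def cucker_smale_weight(x_i, x_j, Q=0.25, sigma=1.0, beta=1.0):
    """W_ij(x_i, x_j) = Q / (sigma^2 + ||x_i - x_j||_2^2)^beta, per (3)."""
    d2 = np.sum((np.asarray(x_i) - np.asarray(x_j)) ** 2)
    return Q / (sigma ** 2 + d2) ** beta

print(cucker_smale_weight([0.0, 0.0], [1.0, 1.0]))  # 0.25 / (1 + 2) ~= 0.0833
```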

3 Problem Formulation

In social networks, an agent weighs the opinions of others based on how close its opinion (or state) is to theirs, which motivates the consideration of state-dependent networks. A vehicular platoon, a practical example motivating this work, can be modeled as both a position-dependent (state-dependent, taking position as the state) and a random arbitrary network. Therefore, two networks are combined: 1) a network induced by the states' weights, and 2) the underlying random arbitrary network (see Section III of the preliminary version [31] for details). The combined state-dependent and random arbitrary network is formulated as follows.

A network of m\in\mathbb{N} nodes labeled by the set \mathcal{V}=\{1,2,\ldots,m\} is considered. The topology of the interconnections among nodes is not fixed but defined by a set of graphs \mathcal{G}(\omega^{*})=(\mathcal{V},\mathcal{E}(\omega^{*})), where \mathcal{E}(\omega^{*})\subseteq\mathcal{V}\times\mathcal{V} is the ordered edge set and \omega^{*}\in\Omega^{*}, with \Omega^{*} the set of all possible communication graphs, i.e., \Omega^{*}=\{\mathcal{G}_{1},\mathcal{G}_{2},\ldots,\mathcal{G}_{\bar{N}}\}. We assume that (\Omega^{*},\sigma) is a measurable space where \sigma is the \sigma-algebra on \Omega^{*}. We write \mathcal{N}_{i}^{in}(\omega^{*})/\mathcal{N}_{i}^{out}(\omega^{*}) for the labels of agent i's in/out neighbors in the graph \mathcal{G}(\omega^{*}), so that there is an arc in \mathcal{G}(\omega^{*}) from vertex j/i to vertex i/j only if agent i receives/sends information from/to agent j. We write \mathcal{N}_{i}(\omega^{*}) when \mathcal{N}_{i}^{in}(\omega^{*})=\mathcal{N}_{i}^{out}(\omega^{*}). It is assumed that there is no communication delay or noise in the network.

It should be noted that in our formulation, the in and out neighbors of each agent i\in\mathcal{V} in each graph \mathcal{G}_{i}\in\Omega^{*}, i=1,\ldots,\bar{N}, are fixed, while the weights of the links are possibly state-dependent. For instance, an agent pays attention arbitrarily at each time to its friends while it weighs the difference between its opinion and others' in its decision (see [29, Sec. III] for more details).

We associate with each node i\in\mathcal{V} a convex cost function f_{i}:\Re^{n}\longrightarrow\Re which is observed only by node i. The objective of each agent is to find a solution of the following optimization problem:

\underset{s}{\min}\sum_{i=1}^{m}f_{i}(s)

where s\in\Re^{n}. Since each node i knows only its own f_{i}, the nodes cannot individually calculate the optimal solution and, therefore, must collaborate to do so.

The above problem can be formulated based on local variables of the agents as

\underset{x}{\min}\quad f(x):=\sum_{i=1}^{m}f_{i}(x_{i}) (4)
subject to\quad x_{1}=\ldots=x_{m}

where x=[x_{1}^{T},x_{2}^{T},\ldots,x_{m}^{T}]^{T}, x_{i}\in\Re^{n}, i\in\mathcal{V}, and the constraint set is reached through state-dependent interactions and random (arbitrary) communication graphs. The set

\mathcal{C}:=\{x\in\Re^{mn}\,|\,x_{i}=x_{j},1\leq i,j\leq m,x_{i}\in\Re^{n}\} (5)

is known as the consensus subspace, which is a convex set. Note that the Hilbert space considered in this paper for the distributed optimization problem is \mathcal{H}=(\Re^{mn},\|\cdot\|_{2}).

We write W(\omega^{*},x):=\mathcal{W}(\omega^{*},x)\otimes I_{n}, with \mathcal{W}(\omega^{*},x)=[\mathcal{W}_{ij}(\omega^{*},x_{i},x_{j})], for the state-dependent weighted matrix of the fixed graph \omega^{*}\in\Omega^{*} in a switching network having all possible communication topologies in the set \Omega^{*}. For instance, if nodes are not activated at some time \tilde{t} for communication updates in an asynchronous protocol, and/or there are no edges in the graph occurring at time \tilde{t}, then \mathcal{W}(\omega^{*}_{\tilde{t}},x_{\tilde{t}})=I_{m}.

Now we impose Assumptions 1 and 2 below on \mathcal{W}(\omega^{*},x).

Assumption 1. For each fixed \omega^{*}\in\Omega^{*}, the weights \mathcal{W}_{ij}(\omega^{*},x_{i},x_{j}):\Omega^{*}\times\Re^{n}\times\Re^{n}\longrightarrow[0,1] are continuous, and the state-dependent weighted matrix of the graph is doubly stochastic for all \omega^{*}\in\Omega^{*}, i.e.,

i) \sum_{j\in\mathcal{N}_{i}^{in}(\omega^{*})\cup\{i\}}\mathcal{W}_{ij}(\omega^{*},x_{i},x_{j})=1,\ i=1,2,\ldots,m,

ii) \sum_{j\in\mathcal{N}_{i}^{out}(\omega^{*})\cup\{i\}}\mathcal{W}_{ij}(\omega^{*},x_{i},x_{j})=1,\ i=1,2,\ldots,m.

Assumption 1 allows us to decouple the information exchange from the analysis of our proposed algorithm and, at the same time, to consider random graphs. The state-dependent weight \mathcal{W}_{ij}(\omega^{*},x_{i},x_{j}) between any two agents i and j in Assumption 1 is general and may be a function of distance or other forms of interaction. Note that any network with undirected links and continuous weights \mathcal{W}_{ij}(\omega^{*},x_{i},x_{j}) satisfies Assumption 1 since the weighted matrix of the graph is symmetric (and thus doubly stochastic).
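To illustrate the last point, a symmetric state-dependent matrix satisfying Assumption 1 can be assembled by placing the link weights off the diagonal and letting each self-weight absorb the remainder of its row; a sketch assuming NumPy (the function and its arguments are our own naming), valid whenever each row's off-diagonal sum is at most 1:

```python
import numpy as np

def weighted_matrix(edges, x, weight, m):
    """edges: undirected pairs (i, j) of a fixed graph omega*; x: agent states;
    weight: symmetric, continuous map (x_i, x_j) -> [0, 1]."""
    W = np.zeros((m, m))
    for i, j in edges:
        w = weight(x[i], x[j])
        W[i, j] = W[j, i] = w                 # symmetry of undirected links
    np.fill_diagonal(W, 1.0 - W.sum(axis=1))  # rows, hence columns, sum to 1
    return W
```

With the cucker_smale_weight sketched in Section 2 (Q = 0.25) on a line graph, each row's off-diagonal sum is at most 0.5, so the diagonal stays nonnegative and the matrix is doubly stochastic.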

Assumption 2. The union of the graphs in \Omega^{*} is strongly connected for all x\in\Re^{mn}, i.e.,

Re\Big[\lambda_{2}\Big(\sum_{\omega^{*}\in\Omega^{*}}(I_{m}-\mathcal{W}(\omega^{*},x))\Big)\Big]>0,\quad\forall x\in\Re^{mn}. (6)

Assumption 2 guarantees that the information sent from each node is eventually received by every other node. The set \mathcal{C} defined in (5) (which is the constraint set of (4)) can be obtained, under Assumptions 1 and 2, from the set

\{x\,|\,W(\omega^{*},x)x=x,\ \forall\omega^{*}\in\Omega^{*}\} (7)

(see [29, Appendices A and B], setting G=\Omega^{*}, for the proof). This allows us to reformulate (4) as

\underset{x}{\min}\quad f(x):=\sum_{i=1}^{m}f_{i}(x_{i}) (8)
subject to\quad W(\omega^{*},x)x=x,\ \forall\omega^{*}\in\Omega^{*}.

Thus, a solution of (4) can be attained by solving (8) under Assumptions 1 and 2.

The random operator T(\omega^{*},x):=W(\omega^{*},x)x is called the state-dependent weighted random operator of the graph (see [30, Def. 8], [29, Def. 4]). From Definition 2 and (7), we have FVP(T)=\mathcal{C} under Assumptions 1 and 2.

Now we show that the random operator T(\omega^{*},x):=W(\omega^{*},x)x with Assumption 1 is quasi-nonexpansive in the Hilbert space \mathcal{H}=(\Re^{mn},\|\cdot\|_{2}). Let z\in FVP(T)=\mathcal{C}. Since z\in\mathcal{C} and W(\omega^{*},x) is a stochastic matrix (see Assumption 1) for all \omega^{*}\in\Omega^{*}, x\in\mathcal{H}, we have W(\omega^{*},x)z=z. Therefore, we obtain

\|T(\omega^{*},x)-z\|_{2}=\|W(\omega^{*},x)x-W(\omega^{*},x)z\|_{2}\leq\|W(\omega^{*},x)\|_{2}\|x-z\|_{2}.

Since W(\omega^{*},x) is doubly stochastic by Assumption 1, we have \|W(\omega^{*},x)\|_{1}=\|W(\omega^{*},x)\|_{\infty}=1 and hence, from Lemma 1, \|W(\omega^{*},x)\|_{2}\leq 1 for all \omega^{*}\in\Omega^{*}. Thus

\|T(\omega^{*},x)-z\|_{2}\leq\|W(\omega^{*},x)\|_{2}\|x-z\|_{2}\leq\|x-z\|_{2}, (9)

which implies that the random operator T(\omega^{*},x) is quasi-nonexpansive (see Remark 1).

Problem (8) is a special case of the general class of problems presented in Problem 1 below, with T(\omega^{*},x):=W(\omega^{*},x)x. It is to be noted that Problem 3 in [30] is defined for a nonexpansive random operator, while Problem 1 below is defined for a quasi-nonexpansive random operator, which contains the nonexpansive case (see Remark 1).

Problem 1: Let \mathcal{H} be a real Hilbert space. Assume that the problem is feasible, namely FVP(T)\neq\emptyset. Given a convex function f:\mathcal{H}\longrightarrow\Re and a quasi-nonexpansive random mapping T:\Omega^{*}\times\mathcal{H}\longrightarrow\mathcal{H}, the problem is to find x^{*}\in\underset{x}{\operatorname{argmin}}\,f(x) such that x^{*} is a fixed-value point of T(\omega^{*},x), i.e., we have the minimization problem

\underset{x}{\min}\quad f(x) (10)
subject to\quad x\in FVP(T)

where FVP(T) is the set of fixed-value points of the random operator T(\omega^{*},x) (see Definition 2).

Remark 2.

A fixed-value point of a quasi-nonexpansive random mapping is a common fixed point of the family of quasi-nonexpansive non-random mappings T(\omega^{*},\cdot), \omega^{*}\in\Omega^{*}. From Proposition 1, the fixed point set of the quasi-nonexpansive non-random mapping T(\omega^{*},\cdot) for each \omega^{*} is a convex set. It is well known that the intersection of convex sets (finite, countable, or uncountable) is convex. Thus, FVP(T) is a convex set, and Problem 1 is a convex optimization problem.

4 Algorithm and Its Convergence

Here, we show that the algorithm proposed in [30] (which works for nonexpansive random operators) is applicable to solving Problem 1 with quasi-nonexpansive random operators. Thus, we propose the following algorithm for solving Problem 1:

x_{t+1}=\alpha_{t}(x_{t}-\beta\nabla f(x_{t}))+(1-\alpha_{t})\hat{T}(\omega_{t}^{*},x_{t}), (11)

where \hat{T}(\omega_{t}^{*},x_{t}):=(1-\eta)x_{t}+\eta T(\omega_{t}^{*},x_{t}), \eta\in(0,1), \alpha_{t}\in[0,1], and \omega^{*}_{t} is a realization from the set \Omega^{*} at time t. The challenge in extending the result of [30] is to use the weaker property (2), which holds for all x\in\mathcal{H} and x^{*}\in FVP(T), instead of the stronger property (1), which holds for all x,y\in\mathcal{H}.
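A compact sketch of iteration (11), assuming NumPy; grad_f, T_op, and sample_graph are placeholders we introduce for \nabla f, the random operator T(\omega^{*},\cdot), and the draw of \omega^{*}_{t}:

```python
import numpy as np

def run_algorithm_11(x0, grad_f, T_op, sample_graph, beta, eta=0.8, iters=5000):
    """Iterate x_{t+1} = a_t (x_t - beta grad_f(x_t)) + (1 - a_t) T_hat(w_t, x_t)."""
    x = np.asarray(x0, dtype=float)
    for t in range(iters):
        alpha = 1.0 / (1 + t)                         # satisfies (a) and (b)
        omega = sample_graph()                        # realization omega*_t
        t_hat = (1 - eta) * x + eta * T_op(omega, x)  # averaged operator T_hat
        x = alpha * (x - beta * grad_f(x)) + (1 - alpha) * t_hat
    return x
```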

Let (\Omega^{*},\sigma) be a measurable space where \Omega^{*} and \sigma are defined in Section 3. Consider a probability measure \mu defined on the space (\Omega,\mathcal{F}) where

\Omega=\Omega^{*}\times\Omega^{*}\times\Omega^{*}\times\ldots

and \mathcal{F} is a sigma-algebra on \Omega such that (\Omega,\mathcal{F},\mu) forms a probability space. We denote a realization in this probability space by \omega\in\Omega. We make the following assumptions.

Assumption 3. f(x) is continuously differentiable and \rho-strongly convex, and \nabla f(x) is K-Lipschitz continuous.

Assumption 4. There exists a nonempty subset \tilde{K}\subseteq\Omega^{*} such that FVP(T)=\{\tilde{z}\,|\,\tilde{z}\in\mathcal{H},\ \tilde{z}=T(\bar{\omega},\tilde{z}),\ \forall\bar{\omega}\in\tilde{K}\}, and each element of \tilde{K} occurs infinitely often almost surely.

Assumption 4 is weaker than existing assumptions for random networks, as explained in detail in Remark 3 below.

Remark 3.

[30] If the sequence \{\omega^{*}(t)\}_{t=0}^{\infty} is mutually independent with \sum_{t=0}^{\infty}Pr_{t}(\bar{\omega})=\infty, where Pr_{t}(\bar{\omega}) is the probability of (a particular element) \bar{\omega} occurring at time t, then Assumption 4 is satisfied. Moreover, any ergodic stationary sequence \{\omega^{*}(t)\}_{t=0}^{\infty} with Pr(\bar{\omega})>0 satisfies Assumption 4. Consequently, any time-invariant Markov chain with its unique stationary distribution as the initial distribution satisfies Assumption 4.

4.1 Almost Sure Convergence

Before stating our theorems, we need to extend Lemma 5 in [30] (which is for nonexpansive random operators) to quasi-nonexpansive random operators. Hence, we have the following lemma.

Lemma 4. Let \mathcal{H} be a real Hilbert space and \hat{T}(\omega^{*},x):=(1-\eta)x+\eta T(\omega^{*},x), \omega^{*}\in\Omega^{*}, x\in\mathcal{H}, with T a quasi-nonexpansive random operator, FVP(T)\neq\emptyset, and \eta\in(0,1]. Then

(i) FVP(T)=FVP(\hat{T}).

(ii) \langle x-\hat{T}(\omega^{*},x),x-z\rangle\geq\frac{\eta}{2}\|x-T(\omega^{*},x)\|^{2},\ \forall z\in FVP(T),\ \forall\omega^{*}\in\Omega^{*}.

(iii) \hat{T}(\omega^{*},x) is quasi-nonexpansive.

Proof. See Appendix A.

We present the main theorem in this paper as follows.

Theorem 1.

Consider Problem 1 with Assumptions 3 and 4. Let \beta\in(0,\frac{2}{K}) and \alpha_{t}\in[0,1], t\in\mathbb{N}\cup\{0\}, be such that

(a) \lim_{t\longrightarrow\infty}\alpha_{t}=0,

(b) \sum_{t=0}^{\infty}\alpha_{t}=\infty.

Then starting from any initial point, the sequence generated by (11) globally converges almost surely to the unique solution of the problem.

Note that the range of \beta in Theorem 1 of [31] (i.e., the preliminary version of this paper) is \beta\in(0,\frac{2\rho}{K^{2}}), which is enlarged to \beta\in(0,\frac{2}{K}) in Theorem 1 above. This is due to the fact that, by the definitions of strong convexity and strong smoothness of a differentiable convex function f (see also parts (5)-(6) in [43, p. 38]), we always have \rho\leq K; hence \frac{2\rho}{K^{2}}\leq\frac{2}{K}. An advantage of this enlargement is more freedom in selecting the parameter \beta. An example of \alpha_{t} satisfying (a) and (b) in Theorem 1 is \alpha_{t}:=\frac{1}{(1+t)^{\zeta}} with \zeta\in(0,1].

Remark 4.

As seen from the proof of Theorem 1, an advantage of the technique proposed in [30] (and thus used here) is that we are able to analyze stochastic processes in a fully deterministic way (see Remark 12 in [30] for details).

Proof of Theorem 1. We prove Theorem 1 in three steps.

Step 1: \{x_{t}\}_{t=0}^{\infty}, \forall\omega\in\Omega, is bounded (see Lemma 5 in Appendix B).

Step 2: \{x_{t}\}_{t=0}^{\infty} converges almost surely to a random variable supported by the feasible set (see Lemma 6 in Appendix C).

Step 3: \{x_{t}\}_{t=0}^{\infty} converges almost surely to the optimal solution (see Lemma 7 in Appendix D).

4.2 Mean Square Convergence

Since almost sure convergence does not in general imply mean square convergence, nor vice versa, we show the mean square convergence of the random sequence generated by Algorithm (11) in the following theorem.

Theorem 2.

Consider Problem 1 with Assumptions 3 and 4. Suppose that \beta\in(0,\frac{2}{K}) and \alpha_{t}\in[0,1], t\in\mathbb{N}\cup\{0\}, satisfy (a) and (b) in Theorem 1. Then starting from any initial point, the sequence generated by (11) globally converges in mean square to the unique solution of the problem.

Proof. Theorem 2 follows from Step 1, Theorem 1, and Lemma 3.

4.3 Distributed Optimization

The distributed optimization problem with state-dependent interactions over random arbitrary networks is a special case of Problem 1 (see Section 3). Hence, Algorithm (11) can be applied directly to solve (8) in a distributed manner, provided that each f_{i}(x_{i}) is \rho-strongly convex and each \nabla f_{i}(x_{i}) is K-Lipschitz. Thus, we have the following corollary of Theorems 1 and 2.

Corollary 1. Consider the optimization problem (8) with Assumptions 1, 2, and 4. Assume that each f_{i}(x_{i}) is \rho-strongly convex and each \nabla f_{i}(x_{i}) is K-Lipschitz for i=1,\ldots,m. Suppose that \beta\in(0,\frac{2}{K}), \eta\in(0,1), and \alpha_{t}\in[0,1], t\in\mathbb{N}\cup\{0\}, satisfy (a) and (b) in Theorem 1. Then starting from any initial point, the sequence generated by the following distributed algorithm based on local information for each agent i

x_{i,t+1}=\alpha_{t}(x_{i,t}-\beta\nabla f_{i}(x_{i,t}))+(1-\alpha_{t})\Big((1-\eta)x_{i,t}+\eta\sum_{j\in\mathcal{N}_{i}^{in}(\omega^{*}_{t})\cup\{i\}}\mathcal{W}_{ij}(\omega^{*}_{t},x_{i},x_{j})x_{j,t}\Big), (12)

globally converges both almost surely and in mean square to the unique solution of the problem.
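For concreteness, one step of (12) for all agents might look as follows (a sketch assuming NumPy; distributed_step, weight, and in_neighbors are our own placeholders, with weight as in the Assumption 1 sketch):

```python
import numpy as np

def distributed_step(x, grads, in_neighbors, weight, alpha, beta, eta):
    """x: (m, n) stacked states; grads[i]: gradient of f_i at x[i];
    in_neighbors[i]: the labels N_i^in(omega*_t) of the realized graph."""
    x = np.asarray(x, dtype=float)
    x_new = np.empty_like(x)
    for i in range(x.shape[0]):
        mix, w_self = 0.0, 1.0
        for j in in_neighbors[i]:
            w = weight(x[i], x[j])
            mix = mix + w * x[j]
            w_self -= w                 # self-weight closes the row sum to 1
        mix = mix + w_self * x[i]       # the j = i term of the sum in (12)
        x_new[i] = (alpha * (x[i] - beta * grads[i])
                    + (1 - alpha) * ((1 - eta) * x[i] + eta * mix))
    return x_new
```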

Algorithm (12) is a totally asynchronous algorithm (see footnote 2) requiring neither an a priori distribution nor B-connectivity (see footnote 2) of the switched graphs; the B-connectivity assumption implies Assumption 4. The algorithm is not asynchronous due to the synchronized diminishing step size \alpha_{t}. The algorithm still works when the state-dependent/state-independent weighted matrix of the graph is periodic and irreducible under a synchronous protocol. Detailed properties of Algorithm (12) for time-varying (see footnote 1) networks have been studied in [29] and can be induced for random networks (see also footnote 4).

Remark 5.

The convergence rate of a totally asynchronous algorithm cannot in general be established. Determining the rate of convergence of (12) under suitable assumptions is left for future work. An asynchronous and totally asynchronous algorithm for distributed optimization over random networks with state-independent interactions has recently been proposed in [37]. As special cases of distributed optimization over state-independent networks, asynchronous and totally asynchronous algorithms have been given for average consensus and for solving linear algebraic equations in [38] and [39], respectively (see [37, Sec. I] for details).

5 Numerical Example

We give a practical example of distributed optimization with state-dependent interactions of Cucker-Smale form [6] over random (arbitrary) communication links where there are distribution dependencies among the randomly switched graphs. The following example has been solved over time-varying (see footnote 1) networks in [29]; we solve it here over random arbitrary networks with distribution dependency among the switched communication graphs to show the capability of Algorithm (12).

Example 1. (Distributed Optimization over Random Arbitrary Networks for an Automated Warehouse): Consider m robots on the shop floor of a warehouse. Assigning tasks to robotic agents in an automated warehouse is modeled as optimization problems [40] that are solved by a centralized processor, an approach that is neither scalable nor able to handle autonomous entities [40]. Moreover, due to the large number of robots and the computational restrictions of a centralized processor, the robots must handle tasks in a collaborative manner [40]. If we assume that the communications among robots are carried out via a wireless network, then the signal power at a receiver is inversely proportional to some power of the distance between transmitter and receiver [41]. Therefore, if we take the position as the state of each robot, the weights of the links between robots are state-dependent.

Figure 1: Variables x_{i}^{1}, i=1,\ldots,20, of the robotic agents with weights of the form (14). The variables reach consensus as the robots communicate, for one realization of the random network with distribution dependency.

Assume that m=20 robots bring loads from different initial places to one place for delivery. The desired place to put the loads is determined by minimizing a pre-defined cost, the sum of squared distances to the initial places of the robots:

\underset{s}{\min}\sum_{i=1}^{20}\|s-d_{i}\|^{2}_{2} (13)

where s\in\Re^{2} is the decision variable and d_{i} is the position of the initial place of load i on the two-dimensional shop floor. The above problem is reformulated as the following problem based on the local variables of the agents:

\underset{x}{\min}\quad f(x):=\sum_{i=1}^{20}0.5\|x_{i}-d_{i}\|^{2}_{2}
subject to\quad x_{1}=x_{2}=\ldots=x_{20}

where x_{i}=[x_{i}^{1},x_{i}^{2}]^{T}, and the constraint set is reached via a distance-dependent network with random communication graphs.

Figure 2: Variables x_{i}^{2}, i=1,\ldots,20, of the robotic agents with weights of the form (14). The variables reach consensus as the robots communicate, for one realization of the random network with distribution dependency.

The topology of the underlying undirected graph is assumed to be a line graph, i.e., 1\longleftrightarrow 2\longleftrightarrow\ldots\longleftrightarrow 20, for minimal connectivity among robots. Based on the weighting property of the wireless communication network mentioned earlier, the weight of the link between robots i and j is modeled to be of Cucker-Smale form (see Section 2):

\mathcal{W}_{ij}(x_{i},x_{j})=\frac{0.25}{1+\|x_{i}-x_{j}\|^{2}_{2}}. (14)

One can see that the weight of each link at each time t is determined only by the states of the agents; hence, no local property is assumed or determined a priori for all t in Algorithm (12) (see [29] for details). It is easy to check that f_{i}(x_{i}):=0.5\|x_{i}-d_{i}\|^{2}_{2}, i=1,2,\ldots,20, are 1-strongly convex and that \nabla f_{i}(x_{i}) are 1-Lipschitz continuous.

Figure 3: Two-dimensional (2D) plot of the variables x^{1} and x^{2} in Figures 1 and 2, where the initial positions of the agents are shown with 'o' and the final position with 'x'.

Figure 4: The error in Example 1 with weights of the form (14) for one realization of the random network with distribution dependency.

We consider that each link has an independent and identically distributed (i.i.d.) Bernoulli distribution with Pr(failure)=0.5 in every \tilde{N}-interval, and at iteration k\tilde{N}, k=1,\ldots, the link that has worked the minimum number of times in the previous \tilde{N}-interval occurs. If some links have the same number of occurrences in the previous \tilde{N}-interval, one of them is chosen randomly. Here, we have the graphs \omega^{*}\in\Omega^{*}=\{\mathcal{G}_{1},\ldots,\mathcal{G}_{19}\}; thus the sequence \{\omega^{*}_{t}\}_{t=0}^{\infty} is not independent. It has been shown in [30] that each graph \mathcal{G}_{i}, i=1,\ldots,19, occurs infinitely often almost surely. Moreover, the union of the graphs is strongly connected for all x\in\Re^{40}. Therefore, Assumption 4 is fulfilled, and the conditions of Theorems 1 and 2 are satisfied.
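Our reading of this switching scheme can be sketched as follows (assuming NumPy; the per-step set of active links standing in for \omega^{*}_{t}, the interval length, and all names are our own choices, not fixed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def active_links(t, counts, n_links=19, N_tilde=20):
    """Links active at iteration t; counts tracks occurrences in the interval."""
    if t > 0 and t % N_tilde == 0:
        least = np.flatnonzero(counts == counts.min())
        forced = rng.choice(least)      # least-occurring link must occur now
        counts[:] = 0                   # a fresh N_tilde-interval begins
        counts[forced] += 1
        return np.array([forced])
    on = np.flatnonzero(rng.random(n_links) >= 0.5)  # Pr(failure) = 0.5 per link
    counts[on] += 1
    return on

counts = np.zeros(19, dtype=int)
seq = [active_links(t, counts) for t in range(100)]  # a realization of {omega*_t}
```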

We use \eta=0.8, \alpha_{t}=\frac{1}{1+t}, t\geq 0, and \beta=\frac{1}{K}=1 for the simulation. The initial position of agent i is chosen as x_{i,0}=[10\cos(\frac{(i-1)2\pi}{22}),10\sin(\frac{(i-1)2\pi}{22})]^{T}. The optimal solution of (13), computed in a centralized way as the mean of the d_{i}, i=1,\ldots,20, is s^{*}=[-0.9002,0.4111]^{T}. The results given by Algorithm (12) are shown in Figures 1-4. The error e_{t}:=\|x_{t}-s^{*}\otimes\textbf{1}_{20}\|_{2}, where x_{t}=[x_{1,t}^{T},\ldots,x_{20,t}^{T}]^{T}, is given in Figure 4, and the two-dimensional (2D) plot is shown in Figure 3. Figures 1-4 show that the positions of the robotic agents approach the solution of the optimization problem (13) for one realization of the random network with distribution dependency. Note that no existing result can solve this problem since the weights of the links are both position-dependent and randomly arbitrarily activated.
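The centralized reference value is immediate to reproduce: setting the gradient of (13) to zero, \sum_{i}2(s-d_{i})=0, gives s^{*} as the mean of the d_{i}, and e_{t} is a stacked Euclidean distance. A sketch assuming NumPy, with stand-in d_{i} since the paper's exact values are not listed:

```python
import numpy as np

d = np.random.default_rng(1).normal(size=(20, 2))  # stand-in initial places d_i
s_star = d.mean(axis=0)       # minimizer of sum_i ||s - d_i||^2 is the mean
x_t = np.tile(s_star, 20)     # a stacked iterate x_t = [x_1^T, ..., x_20^T]^T
e_t = np.linalg.norm(x_t - np.tile(s_star, 20))    # e_t = ||x_t - s* kron 1_20||
print(s_star, e_t)            # e_t = 0 at exact consensus on s*
```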

We also simulate the above example with weights different from the Cucker-Smale form, i.e.,

\mathcal{W}_{ij}(x_{i},x_{j})=\frac{0.25}{1+\log^{2}(1+\|x_{i}-x_{j}\|_{2})}, (15)

and the results are shown in Figures 5-6. The figures show that the variables of the agents reach consensus on the optimal solution of the problem.

6 Conclusions and Future Work

Distributed optimization with both state-dependent interactions and random (arbitrary) networks has been considered. It is shown that the state-dependent weighted random operator of the graph is quasi-nonexpansive; thus, it is not required to impose an a priori distribution of the random communication topologies on the switching graphs. A more general optimization problem than those addressed in the literature is formulated. A gradient-based discrete-time algorithm with diminishing step size is provided that converges both almost surely and in mean square to the global solution of the optimization problem under suitable assumptions. Moreover, it reduces to a totally asynchronous algorithm for the distributed optimization problem. Relaxing the strong convexity assumption on the cost functions and/or the double stochasticity assumption on the communication graphs are open problems for future research.

Figure 5: Variables x_{i}^{1}, i=1,\ldots,20, of the robotic agents with weights of the form (15). The variables reach consensus as the robots communicate, for one realization of the random network with distribution dependency.

Figure 6: Variables x_{i}^{2}, i=1,\ldots,20, of the robotic agents with weights of the form (15). The variables reach consensus as the robots communicate, for one realization of the random network with distribution dependency.

Appendix A

Proof of Lemma 4.

(i) The proof is the same as the proof of part (i) of Lemma 5 in [30].

(ii) We have from the quasi-nonexpansivity of T(\omega^{*},x), for arbitrary x\in\mathcal{H}, that

\|T(\omega^{*},x)-z\|^{2}\leq\|x-z\|^{2},\ \forall z\in FVP(T),\ \forall\omega^{*}\in\Omega^{*}. (16)

In a Hilbert space \mathcal{H}, we have

\|u+v\|^{2}=\|u\|^{2}+\|v\|^{2}+2\langle u,v\rangle,\ \forall u,v\in\mathcal{H}. (17)

From (17), we obtain for all z\in FVP(T) and all \omega^{*}\in\Omega^{*} that

\|T(\omega^{*},x)-z\|^{2}=\|T(\omega^{*},x)-x+x-z\|^{2}=\|T(\omega^{*},x)-x\|^{2}+\|x-z\|^{2}+2\langle T(\omega^{*},x)-x,x-z\rangle. (18)

Substituting (18) into (16) yields

2\langle x-T(\omega^{*},x),x-z\rangle\geq\|T(\omega^{*},x)-x\|^{2}. (19)

From the definition of \hat{T}(\omega_{t}^{*},x_{t}) (see (11)), substituting x-T(\omega^{*},x)=\frac{x-\hat{T}(\omega^{*},x)}{\eta} into the left-hand side of inequality (19) implies (ii). Thus the proof of part (ii) of Lemma 4 is complete.

(iii) We have from the quasi-nonexpansivity of T(\omega^{*},x), for z\in FVP(T) and arbitrary x\in\mathcal{H}, that

\|\hat{T}(\omega^{*},x)-z\|\leq(1-\eta)\|x-z\|+\eta\|T(\omega^{*},x)-z\|\leq(1-\eta)\|x-z\|+\eta\|x-z\|=\|x-z\|,\ \forall\omega^{*}\in\Omega^{*}.

Therefore, \hat{T}(\omega^{*},x) is a quasi-nonexpansive random operator, and the proof of part (iii) of Lemma 4 is complete.

Appendix B

Lemma 5. The sequence \{x_{t}\}_{t=0}^{\infty}, \forall\omega\in\Omega, generated by (11) is bounded under Assumption 3.

Proof. Since the cost function is smooth and strongly convex and the constraint set is nonempty and closed, the problem has a unique solution. Let x^{*} be the unique solution of the problem. We can write x^{*}=\alpha_{t}x^{*}+(1-\alpha_{t})x^{*}, \forall t\in\mathbb{N}\cup\{0\}. Therefore, we have

\|x_{t+1}-x^{*}\|=\|\alpha_{t}(x_{t}-\beta\nabla f(x_{t}))+(1-\alpha_{t})\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\|
=\|\alpha_{t}(x_{t}-\beta\nabla f(x_{t})-x^{*})+(1-\alpha_{t})(\hat{T}(\omega_{t}^{*},x_{t})-x^{*})\|
\leq\alpha_{t}\|x_{t}-\beta\nabla f(x_{t})-x^{*}\|+(1-\alpha_{t})\|\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\|.

Since x^{*} is the solution, x^{*}\in FVP(T)=FVP(\hat{T}) (see part (i) of Lemma 4). Since \hat{T}(\omega^{*},x) is a quasi-nonexpansive random operator (see part (iii) of Lemma 4), the above can be written as

\|x_{t+1}-x^{*}\|\leq\alpha_{t}\|x_{t}-\beta\nabla f(x_{t})-x^{*}\|+(1-\alpha_{t})\|x_{t}-x^{*}\|. (20)

Since \nabla f_{i}(x_{i}) is K-Lipschitz, f_{i}(x_{i}) is K-strongly smooth (see [42, Lem. 3.4]). When f_{i}(x_{i}) is \rho-strongly convex and K-strongly smooth, the operator H(x):=x-\beta\nabla f(x) with \beta\in(0,\frac{2}{K}) is a contraction (see [43, p. 15] for details). Indeed, there exists 0<\gamma\leq 1 such that

\|x-y-\beta(\nabla f(x)-\nabla f(y))\|\leq(1-\gamma)\|x-y\|,\ \forall x,y\in\mathcal{H}. (21)

We have that

\|x_{t}-\beta\nabla f(x_{t})-x^{*}\|=\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))-\beta\nabla f(x^{*})\|\leq\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))\|+\beta\|\nabla f(x^{*})\|. (22)

Therefore, (21) and (22) imply

\|x_{t}-\beta\nabla f(x_{t})-x^{*}\|\leq(1-\gamma)\|x_{t}-x^{*}\|+\beta\|\nabla f(x^{*})\|. (23)

Substituting (23) into (20) yields

\|x_{t+1}-x^{*}\|\leq(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|+\alpha_{t}\beta\|\nabla f(x^{*})\|=(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|+\gamma\alpha_{t}\frac{\beta\|\nabla f(x^{*})\|}{\gamma},

which by induction implies

\|x_{t+1}-x^{*}\|\leq\max\Big\{\|x_{0}-x^{*}\|,\frac{\beta\|\nabla f(x^{*})\|}{\gamma}\Big\},

so \|x_{t}-x^{*}\|, t\in\mathbb{N}\cup\{0\}, is bounded for all \omega\in\Omega. Therefore, \{x_{t}\}_{t=0}^{\infty} is bounded for all \omega\in\Omega.

Appendix C

Lemma 6. The sequence \{x_{t}\}_{t=0}^{\infty} generated by (11) converges almost surely to a random variable supported by the feasible set.

Proof. From (11) and x_{t}=\alpha_{t}x_{t}+(1-\alpha_{t})x_{t}, we have

x_{t+1}-x_{t}+\alpha_{t}\beta\nabla f(x_{t})=(1-\alpha_{t})(\hat{T}(\omega_{t}^{*},x_{t})-x_{t}), (24)

and thus

\langle x_{t+1}-x_{t}+\alpha_{t}\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle=-(1-\alpha_{t})\langle x_{t}-\hat{T}(\omega_{t}^{*},x_{t}),x_{t}-x^{*}\rangle. (25)

Since x^{*}\in FVP(T), we have from part (ii) of Lemma 4 that

\langle x_{t}-\hat{T}(\omega_{t}^{*},x_{t}),x_{t}-x^{*}\rangle\geq\frac{\eta}{2}\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}. (26)

We get from (25) and (26) that

\langle x_{t+1}-x_{t}+\alpha_{t}\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle\leq-\frac{\eta}{2}(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2} (27)

or

-\langle x_{t}-x_{t+1},x_{t}-x^{*}\rangle\leq-\alpha_{t}\langle\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle-\frac{\eta}{2}(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}. (28)

In a Hilbert space \mathcal{H}, we have for any u,v\in\mathcal{H} that

\langle u,v\rangle=-\frac{1}{2}\|u-v\|^{2}+\frac{1}{2}\|u\|^{2}+\frac{1}{2}\|v\|^{2}. (29)

We obtain from (29) that

\langle x_{t}-x_{t+1},x_{t}-x^{*}\rangle=-C_{t+1}+C_{t}+\frac{1}{2}\|x_{t}-x_{t+1}\|^{2} (30)

where C_{t}:=\frac{1}{2}\|x_{t}-x^{*}\|^{2}. We get from (28) and (30) that

C_{t+1}-C_{t}-\frac{1}{2}\|x_{t}-x_{t+1}\|^{2}\leq-\alpha_{t}\langle\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle-\frac{\eta}{2}(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}. (31)

From (24) and (17) we obtain

\|x_{t+1}-x_{t}\|^{2}=\|-\alpha_{t}\beta\nabla f(x_{t})+(1-\alpha_{t})(\hat{T}(\omega^{*}_{t},x_{t})-x_{t})\|^{2}
=\alpha_{t}^{2}\|\beta\nabla f(x_{t})\|^{2}+(1-\alpha_{t})^{2}\|\hat{T}(\omega^{*}_{t},x_{t})-x_{t}\|^{2}-2\alpha_{t}(1-\alpha_{t})\langle\beta\nabla f(x_{t}),\hat{T}(\omega_{t}^{*},x_{t})-x_{t}\rangle. (32)

We know that \|\hat{T}(\omega^{*}_{t},x_{t})-x_{t}\|=\eta\|x_{t}-T(\omega^{*}_{t},x_{t})\|. Since \alpha_{t}\in[0,1], we also have (1-\alpha_{t})^{2}\leq(1-\alpha_{t}). Using these facts and multiplying both sides of (32) by \frac{1}{2} yield

\frac{1}{2}\|x_{t+1}-x_{t}\|^{2}=\frac{1}{2}\alpha_{t}^{2}\|\beta\nabla f(x_{t})\|^{2}+\frac{1}{2}(1-\alpha_{t})^{2}\eta^{2}\|T(\omega^{*}_{t},x_{t})-x_{t}\|^{2}-\alpha_{t}(1-\alpha_{t})\langle\beta\nabla f(x_{t}),\hat{T}(\omega_{t}^{*},x_{t})-x_{t}\rangle
\leq\frac{1}{2}\alpha_{t}^{2}\|\beta\nabla f(x_{t})\|^{2}+\frac{1}{2}(1-\alpha_{t})\eta^{2}\|T(\omega^{*}_{t},x_{t})-x_{t}\|^{2}-\alpha_{t}(1-\alpha_{t})\langle\beta\nabla f(x_{t}),\hat{T}(\omega_{t}^{*},x_{t})-x_{t}\rangle. (33)

We obtain from (31) and (33) that

C_{t+1}-C_{t}\leq\frac{1}{2}\|x_{t+1}-x_{t}\|^{2}-\alpha_{t}\langle\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle-\frac{\eta}{2}(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}
\leq-\Big(\frac{1}{2}-\frac{\eta}{2}\Big)\eta(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}+\alpha_{t}\Big(\frac{1}{2}\alpha_{t}\|\beta\nabla f(x_{t})\|^{2}-\langle\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle-(1-\alpha_{t})\langle\beta\nabla f(x_{t}),\hat{T}(\omega_{t}^{*},x_{t})-x_{t}\rangle\Big). (34)

We claim that there exists t_{0}\in\mathbb{N} such that the sequence \{C_{t}\} is non-increasing for t\geq t_{0}. We use proof by contradiction and assume that this is not true. Then there exists a subsequence \{C_{t_{j}}\} such that C_{t_{j}+1}-C_{t_{j}}>0, which together with (34) implies

0<C_{t_{j}+1}-C_{t_{j}}\leq-\Big(\frac{1}{2}-\frac{\eta}{2}\Big)\eta(1-\alpha_{t_{j}})\|x_{t_{j}}-T(\omega_{t_{j}}^{*},x_{t_{j}})\|^{2}+\alpha_{t_{j}}\Big(\frac{1}{2}\alpha_{t_{j}}\beta^{2}\|\nabla f(x_{t_{j}})\|^{2}-\langle\beta\nabla f(x_{t_{j}}),x_{t_{j}}-x^{*}\rangle-(1-\alpha_{t_{j}})\langle\beta\nabla f(x_{t_{j}}),\hat{T}(\omega_{t_{j}}^{*},x_{t_{j}})-x_{t_{j}}\rangle\Big). (35)

Since \{x_{t}\} is bounded, \nabla f(x) is continuous, and \eta\in(0,1), we get from (35) and condition (a) of Theorem 1 that

0<\liminf_{j\longrightarrow\infty}\Big[-\Big(\frac{1}{2}-\frac{\eta}{2}\Big)\eta(1-\alpha_{t_{j}})\|x_{t_{j}}-T(\omega_{t_{j}}^{*},x_{t_{j}})\|^{2}+\alpha_{t_{j}}\Big(\frac{1}{2}\alpha_{t_{j}}\|\beta\nabla f(x_{t_{j}})\|^{2}-\langle\beta\nabla f(x_{t_{j}}),x_{t_{j}}-x^{*}\rangle-(1-\alpha_{t_{j}})\langle\beta\nabla f(x_{t_{j}}),\hat{T}(\omega_{t_{j}}^{*},x_{t_{j}})-x_{t_{j}}\rangle\Big)\Big]\leq 0, (36)

which is a contradiction. Hence, there exists t_{0}\in\mathbb{N} such that the sequence \{C_{t}\} is non-increasing for t\geq t_{0}. Since \{C_{t}\} is bounded below, it converges for all \omega\in\Omega.

Now we take the limit of both sides of (34) and use the convergence of \{C_{t}\}, the continuity of \nabla f(x), Step 1, \eta\in(0,1), and condition (a) of Theorem 1 to obtain

\lim_{t\longrightarrow\infty}\|x_{t}-T(\omega_{t}^{*},x_{t})\|=0\quad\textit{pointwise (surely)},

which implies that \{x_{t}\}_{t=0}^{\infty} converges for each \omega\in\Omega since FVP(T)\neq\emptyset. Moreover, this together with Assumption 4 implies that \{x_{t}\} converges almost surely to a random variable supported by FVP(T).

Appendix D

Lemma 7. The sequence \{x_{t}\}_{t=0}^{\infty} generated by (11) converges almost surely to the optimal solution.

Proof. Since x^{*}\in FVP(T) is the optimal solution, we have

\langle\bar{x}-x^{*},\nabla f(x^{*})\rangle\geq 0,\ \forall\bar{x}\in FVP(T). (37)

From (17), we have

\|x_{t+1}-x^{*}\|^{2}=\|x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})-\alpha_{t}\beta\nabla f(x^{*})\|^{2}=\|x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})\|^{2}+\alpha_{t}^{2}\|\beta\nabla f(x^{*})\|^{2}-2\alpha_{t}\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})\rangle. (38)

We have x^{*}=\alpha_{t}x^{*}+(1-\alpha_{t})x^{*}, \forall t\in\mathbb{N}\cup\{0\}; from this fact and (11), we get

\|x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})\|^{2}=\|\alpha_{t}[x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))]+(1-\alpha_{t})[\hat{T}(\omega_{t}^{*},x_{t})-x^{*}]\|^{2}. (39)

Furthermore, we have

\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})\rangle=\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle+\alpha_{t}\|\beta\nabla f(x^{*})\|^{2}. (40)

Substituting (39) and (40) into (38) implies

\|x_{t+1}-x^{*}\|^{2}=\alpha_{t}^{2}\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))\|^{2}+(1-\alpha_{t})^{2}\|\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\|^{2}+2\alpha_{t}(1-\alpha_{t})\langle x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*})),\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\rangle-2\alpha_{t}\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle-\alpha_{t}^{2}\|\beta\nabla f(x^{*})\|^{2}.

From (21), the quasi-nonexpansivity of \hat{T}(\omega^{*},x), and the Cauchy-Schwarz inequality, we have

\langle x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*})),\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\rangle\leq(1-\gamma)\|x_{t}-x^{*}\|^{2}. (41)

We get from (21) that

\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))\|^{2}\leq(1-\gamma)^{2}\|x_{t}-x^{*}\|^{2}. (42)

We obtain from (41), (42), and quasi-nonexpansivity property of T^(ω,x)\hat{T}(\omega^{*},x) that

\|x_{t+1}-x^{*}\|^{2}=\alpha_{t}^{2}\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))\|^{2}+(1-\alpha_{t})^{2}\|\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\|^{2}
\quad+2\alpha_{t}(1-\alpha_{t})\langle x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*})),\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\rangle
\quad-2\alpha_{t}\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle-\alpha_{t}^{2}\|\beta\nabla f(x^{*})\|^{2}
\leq(1-2\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}+\alpha_{t}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big)
=(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}-\gamma\alpha_{t}\|x_{t}-x^{*}\|^{2}
\quad+\alpha_{t}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big).
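
The inequality step above combines (41), (42), and quasi-nonexpansivity through the grouping

\alpha_{t}^{2}(1-\gamma)^{2}+2\alpha_{t}(1-\alpha_{t})(1-\gamma)+(1-\alpha_{t})^{2}=[\alpha_{t}(1-\gamma)+(1-\alpha_{t})]^{2}=(1-\gamma\alpha_{t})^{2}=1-2\gamma\alpha_{t}+\gamma^{2}\alpha_{t}^{2},

together with dropping the nonpositive term $-\alpha_{t}^{2}\|\beta\nabla f(x^{*})\|^{2}$.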

We obtain from $\gamma\alpha_{t}\|x_{t}-x^{*}\|^{2}\geq 0$ that

(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}-\gamma\alpha_{t}\|x_{t}-x^{*}\|^{2}+\alpha_{t}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big)
\leq(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}+\alpha_{t}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big),

and, finally,

\|x_{t+1}-x^{*}\|^{2}\leq(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}+\gamma\alpha_{t}\Big(\frac{\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle}{\gamma}\Big). \quad (43)
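
Inequality (43) has exactly the shape handled by Lemma 2, namely $a_{t+1}\leq(1-b_{t})a_{t}+b_{t}h_{t}$ with $b_{t}\in(0,1)$, $\sum_{t}b_{t}=\infty$, and $\limsup_{t}h_{t}\leq 0$ forcing $a_{t}\rightarrow 0$. As a purely illustrative sanity check of that mechanism (the sequences below are hypothetical choices of ours, not the paper's), one can simulate the worst case in which the recursion holds with equality:

import math

# Hypothetical sequences illustrating the Lemma 2 mechanism:
# a_{t+1} = (1 - b_t) a_t + b_t h_t with b_t = 1/(t+1) and h_t = 1/sqrt(t).
# Here b_t lies in (0,1), sum_t b_t diverges, and limsup_t h_t <= 0,
# so a_t is driven to 0 even though individual steps may raise it.
a = 1.0
for t in range(1, 200001):
    b = 1.0 / (t + 1)
    h = 1.0 / math.sqrt(t)
    a = (1.0 - b) * a + b * h
print(f"a after 200000 steps: {a:.3e}")  # decays toward 0, roughly like 2/sqrt(t)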

From Step 1, Step 2, (37), and the condition in Theorem 1 (a), we get

\lim_{t\rightarrow\infty}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big)\leq 0 \quad \textit{almost surely}. \quad (44)

Setting $a_{t}$, $b_{t}$, and $h_{t}$ in Lemma 2 as

a_{t}=\|x_{t}-x^{*}\|^{2},\quad b_{t}=\gamma\alpha_{t},\quad h_{t}=\frac{\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle}{\gamma},

we get from (43), (44), and the condition in Theorem 1 (b) that

\lim_{t\rightarrow\infty}\|x_{t}-x^{*}\|^{2}=0 \quad \textit{almost surely}.

Therefore, $\{x_{t}\}_{t=0}^{\infty}$ converges almost surely to $x^{*}$.
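
To make the result concrete, the following minimal numerical sketch (our construction for illustration only; the quadratic costs, the bounded-confidence link rule, the Metropolis weights, and all constants are assumptions introduced here, not the paper's implementation) runs an update of the form $x_{t+1}=\alpha_{t}(x_{t}-\beta\nabla f(x_{t}))+(1-\alpha_{t})W_{t}(x_{t})\,x_{t}$, where $W_{t}(x_{t})$ is a doubly stochastic weight matrix over a random, state-dependent graph, on a toy problem whose solution is consensus on the average of the agents' data:

import numpy as np

rng = np.random.default_rng(0)

m = 6
c = rng.normal(size=m)      # local data; the constrained optimum is consensus on c.mean()
x = rng.normal(size=m)      # initial agent states
beta = 0.5                  # gradient scaling (illustrative value)
R = 10.0                    # confidence radius: a link requires |x_i - x_j| <= R

def state_dependent_weights(x):
    """Metropolis weights over a random graph whose edges are kept only when
    the incident states are within R of each other (a state-dependent rule)."""
    A = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < 0.5 and abs(x[i] - x[j]) <= R:
                A[i, j] = A[j, i] = 1.0
    deg = A.sum(axis=1)
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if A[i, j] == 1.0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W                # symmetric and doubly stochastic by construction

for t in range(1, 20001):
    alpha = 1.0 / (t + 1)   # diminishing steps: sum alpha_t = inf, alpha_t -> 0
    grad = x - c            # gradient of the separable costs f_i(x_i) = (x_i - c_i)^2 / 2
    x = alpha * (x - beta * grad) + (1 - alpha) * state_dependent_weights(x) @ x

print("final states :", np.round(x, 4))
print("average of c :", round(float(c.mean()), 4))  # the states should settle near this value

The Metropolis rule keeps each drawn $W_{t}$ symmetric and doubly stochastic regardless of which graph is realized, which is one simple way to obtain an averaging operator that fixes the consensus set, in the spirit of the quasi-nonexpansive random operator used above.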
