
Distributed Convex Optimization with State-Dependent (Social) Interactions over Random Networks

Seyyed Shaho Alaviani (salavian@umn.edu)
Department of Mechanical Engineering, University of Minnesota, Minneapolis, Minnesota, USA

Atul Kelkar (akelkar1@binghamton.edu)
Dean, Thomas J. Watson College of Engineering and Applied Science, Binghamton University, Binghamton, New York, USA
Abstract

This paper addresses distributed multi-agent convex optimization in which the communication network among the agents is represented by a random sequence of possibly state-dependent weighted graphs. This is the first work to consider both random arbitrary communication networks and state-dependent interactions among agents. The state-dependent weighted random operator of the graph is shown to be quasi-nonexpansive; this property removes the need to impose an a priori distribution assumption on the random communication topologies. The framework therefore covers a more general class of random networks, with or without asynchronous protocols. A more general mathematical optimization problem than those addressed in the literature is formulated, namely minimization of a convex function over the fixed-value point set of a quasi-nonexpansive random operator. A discrete-time algorithm is provided that converges both almost surely and in mean square to the global solution of the optimization problem. As a special case, it reduces to a totally asynchronous algorithm for the distributed optimization problem. The algorithm converges even if the weighted matrix of the graph is periodic and irreducible under a synchronous protocol. Finally, a case study on a network of robots in an automated warehouse is given where there is distribution dependency among the random communication graphs.

keywords:
46Exx, 49Mxx, 65Kxx

1 Introduction

Distributed multi-agent optimization has been an attractive topic due to its applications in several areas such as power systems, smart buildings, and machine learning, to name a few; consequently, distributed optimization problems have received much attention (see the surveys [1]-[4]). Switched dynamical systems are divided into two categories: arbitrary (or state-independent) and state-dependent (see [5] and the references therein for details and several examples). Many of the references cited in the surveys [1]-[4] have investigated distributed optimization over arbitrary networks.

On the other hand, state-dependent networks arise in practical systems such as flocking of birds [6], opinion dynamics [7]-[15], mobile robotic networks [16], wireless networks [17], and predator-prey interaction [18]. For example, an agent in a social network weighs the opinions of others based on how close its opinion is to theirs (see Section I of the preliminary version [31] for more details).

In state-dependent networks, the coupling between algorithm analysis and information exchange among agents poses a significant challenge because the states of the agents at each time determine the weights in the communication network. Hence, designing distributed algorithms for consensus and optimization over state-dependent networks remains challenging.

The consensus problem for opinion dynamics has been investigated in [7]-[15]. Existence of consensus in a multi-robot network has been shown in [19]. Distributed consensus [21]-[26] and distributed optimization [20], [27]-[29] over state-dependent networks with time-invariant or time-varying (see footnote 1) arbitrary graphs have been considered. Hence, the gap in the literature is distributed multi-agent optimization with both state-dependent interactions and random arbitrary (see footnote 1) networks.

Footnote 1: The underlying communication graph is a priori known at each time t in a time-varying arbitrary network, whereas it is a priori unknown in a random arbitrary network.

This paper addresses distributed multi-agent convex optimization over networks that are both state-dependent and random arbitrary, a combination that has not been addressed in the literature. Assuming that the weighted matrix of the graph is doubly stochastic with respect to the state variables for each communication network, and that the union of the communication networks is strongly connected, makes the result applicable to a periodic and irreducible weighted matrix of the graph under a synchronous protocol (see footnote 2). We show that the state-dependent weighted random operator of the graph is quasi-nonexpansive (see footnote 3); therefore, imposing an a priori distribution on the random communication topologies is not required. Thus, the framework covers a more general class of switched networks: random arbitrary networks with or without asynchronous protocols. As an extension of the distributed optimization problem, we formulate a more general mathematical optimization problem than that defined in [30], namely minimization of a convex function over the fixed-value point set of a quasi-nonexpansive random operator. Consequently, its reduction to distributed optimization covers both state-independent and state-dependent networks over random arbitrary communication graphs with or without asynchronous protocols (see footnote 3). We prove that the discrete-time algorithm proposed in [30] can be used with quasi-nonexpansive random operators (which include nonexpansive random operators as a special case). The algorithm converges both almost surely and in mean square to the global optimal solution of the optimization problem under suitable assumptions. For the distributed optimization problem, the algorithm reduces to a totally asynchronous algorithm (see footnote 2). It should be noted that the distributed algorithm (see footnote 4) is totally asynchronous but not asynchronous due to the synchronized diminishing step size. The algorithm converges even if the weighted matrix of the graph is periodic and irreducible under a synchronous protocol. We provide a numerical example with distribution dependency among the random arbitrary switching graphs and apply the distributed algorithm to validate the results, a setting no existing reference can handle (see Example 1).

Footnote 2: In a synchronous protocol, all nodes activate at the same time and perform communication updates. In an asynchronous protocol, each node has its own concept of time defined by a local timer, which is triggered randomly either by the local timer or by a message from neighboring nodes. Algorithms guaranteed to work with no a priori bound on the time between updates are called totally asynchronous, and those that need knowledge of such an a priori bound, known as the B-connectivity assumption, are called partially asynchronous (see [32] and [33, Ch. 6-7]).

Footnote 3: It has been shown in [30] that the state-independent weighted random operator of the graph is nonexpansive.

Footnote 4: We clarify that the distributed algorithm in this paper is the randomized version of the algorithm presented in [29]. In [29], convergence under deterministic arbitrary switching (see footnote 1) is provided, while here we prove its stochastic convergence (both almost sure and mean square) under random arbitrary switching. Furthermore, the quasi-nonexpansivity of the state-dependent weighted operator of the graph (defined in [29]) was not shown in [29], whereas we show it here.
This version provides proofs, mean square convergence of the proposed algorithm, a numerical example, and a larger range of a parameter (namely \beta) in the algorithm, none of which were presented in the preliminary version [31].

This paper is organized as follows. In Section 2, preliminaries on convex analysis and stochastic convergence are given. In Section 3, formulations of the distributed optimization problem and the mathematical optimization problem are provided. The algorithm and its convergence analysis are presented in Section 4. Finally, a numerical example is given in Section 5 to show the advantages of the results, followed by conclusions and future work in Section 6.

Notations: \Re denotes the set of all real numbers. For any vector z\in\Re^{n}, \|z\|_{2}=\sqrt{z^{T}z}, and for any matrix Z\in\Re^{n\times n}, \|Z\|_{2}=\sqrt{\lambda_{\max}(Z^{T}Z)}=\sigma_{\max}(Z), where Z^{T} represents the transpose of the matrix Z, \lambda_{\max} the maximum eigenvalue, and \sigma_{\max} the largest singular value. With eigenvalues sorted in increasing order of real parts, \lambda_{2}(Z) represents the second eigenvalue of a matrix Z. Re(r) represents the real part of the complex number r. For any matrix Z=[z_{ij}]\in\Re^{n\times n}, \|Z\|_{1}=\max_{1\leq j\leq n}\{\sum_{i=1}^{n}|z_{ij}|\} and \|Z\|_{\infty}=\max_{1\leq i\leq n}\{\sum_{j=1}^{n}|z_{ij}|\}. I_{n} represents the identity matrix of size n\times n for some n\in\mathbb{N}, where \mathbb{N} denotes the set of all natural numbers. \nabla f(x) denotes the gradient of the function f(x). \otimes denotes the Kronecker product, and \times the Cartesian product. E[x] denotes the expectation of the random variable x.

2 Preliminaries

A vector v\in\Re^{n} is said to be a stochastic vector when its components v_{i}, i=1,2,\ldots,n, are non-negative and sum to 1; a square n\times n matrix V is said to be a stochastic matrix when each row of V is a stochastic vector. A square n\times n matrix V is said to be doubly stochastic when both V and V^{T} are stochastic matrices.
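As a quick numerical illustration of these definitions, the following is a minimal sketch assuming NumPy (the matrix V and the tolerance are our own choices, not from the paper):

```python
import numpy as np

def is_stochastic(V, tol=1e-12):
    """Nonnegative entries and every row summing to 1."""
    return bool(np.all(V >= -tol) and np.allclose(V.sum(axis=1), 1.0, atol=tol))

def is_doubly_stochastic(V, tol=1e-12):
    """Both V and V^T are stochastic, i.e., rows and columns sum to 1."""
    return is_stochastic(V, tol) and is_stochastic(V.T, tol)

V = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.25, 0.25],
              [0.0, 0.25, 0.75]])
print(is_stochastic(V), is_doubly_stochastic(V))  # True True (V is symmetric)
```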

Let \mathcal{H} be a real Hilbert space with norm \|\cdot\| and inner product \langle\cdot,\cdot\rangle. An operator A:\mathcal{H}\longrightarrow\mathcal{H} is said to be monotone if \langle x-y,Ax-Ay\rangle\geq 0 for all x,y\in\mathcal{H}, and \rho-strongly monotone if \langle x-y,Ax-Ay\rangle\geq\rho\|x-y\|^{2} for all x,y\in\mathcal{H}. A differentiable function f:\mathcal{H}\longrightarrow\Re is \rho-strongly convex if \langle x-y,\nabla f(x)-\nabla f(y)\rangle\geq\rho\|x-y\|^{2} for all x,y\in\mathcal{H}; hence, a function is \rho-strongly convex if its gradient is \rho-strongly monotone. A convex differentiable function f:\mathcal{H}\longrightarrow\Re is \mathcal{L}-strongly smooth if

\langle x-y,\nabla f(x)-\nabla f(y)\rangle\leq\mathcal{L}\|x-y\|^{2},\quad\forall x,y\in\mathcal{H}.

A mapping B:\mathcal{H}\longrightarrow\mathcal{H} is said to be K-Lipschitz continuous if there exists K>0 such that \|Bx-By\|\leq K\|x-y\| for all x,y\in\mathcal{H}. Let S be a nonempty subset of a Hilbert space \mathcal{H} and Q:S\longrightarrow\mathcal{H}. The point x is called a fixed point of Q if x=Q(x), and Fix(Q) denotes the set of all fixed points of Q.

Let \omega^{*} and \omega denote elements of the sets \Omega^{*} and \Omega, respectively, where \Omega=\Omega^{*}\times\Omega^{*}\times\ldots. Let (\Omega^{*},\sigma) be a measurable space (\sigma a sigma-algebra) and C a nonempty subset of a Hilbert space \mathcal{H}. A mapping x:\Omega^{*}\longrightarrow\mathcal{H} is measurable if x^{-1}(U)\in\sigma for each open subset U of \mathcal{H}. The mapping T:\Omega^{*}\times C\longrightarrow\mathcal{H} is a random map if for each fixed z\in C the mapping T(\cdot,z):\Omega^{*}\longrightarrow\mathcal{H} is measurable, and it is continuous if for each \omega^{*}\in\Omega^{*} the mapping T(\omega^{*},\cdot):C\longrightarrow\mathcal{H} is continuous.

Definition 1.

A measurable mapping x:\Omega^{*}\longrightarrow C, C\subseteq\mathcal{H}, is a random fixed point of the random map T:\Omega^{*}\times C\longrightarrow\mathcal{H} if T(\omega^{*},x(\omega^{*}))=x(\omega^{*}) for each \omega^{*}\in\Omega^{*}.

Definition 2.

[30] If there exists a point \hat{x}\in\mathcal{H} such that \hat{x}=T(\omega^{*},\hat{x}) for all \omega^{*}\in\Omega^{*}, it is called a fixed-value point, and FVP(T) represents the set of all fixed-value points of T.

Definition 3.

Let C be a nonempty subset of a Hilbert space \mathcal{H} and T:\Omega^{*}\times C\longrightarrow C be a random map. The map T is said to be

1) a nonexpansive random operator if for each \omega^{*}\in\Omega^{*} and arbitrary x,y\in C we have

\|T(\omega^{*},x)-T(\omega^{*},y)\|\leq\|x-y\|, (1)

2) a quasi-nonexpansive random operator if for any x\in C we have

\|T(\omega^{*},x)-\xi(\omega^{*})\|\leq\|x-\xi(\omega^{*})\|

where \xi:\Omega^{*}\longrightarrow C is a random fixed point of T (see Definition 1).

Note that if \|T(\omega^{*},x)-T(\omega^{*},y)\|\leq\gamma\|x-y\|, 0\leq\gamma<1, holds in place of (1), the operator is called a (Banach) contraction.

Remark 1.

If a nonexpansive random operator has a random fixed point, then it is a quasi-nonexpansive random operator. From Definitions 2 and 3, if a quasi-nonexpansive random operator has a fixed-value point, say x^{*}, then for any x\in C we have

\|T(\omega^{*},x)-x^{*}\|\leq\|x-x^{*}\|. (2)
Proposition 1.

[34, Th. 1] If C is a closed convex subset of a Hilbert space \mathcal{H} and T:C\longrightarrow C is quasi-nonexpansive, then Fix(T) is a nonempty closed convex set.

Definition 4.

A sequence of random variables x_{t} is said to converge

1) pointwise (surely) to x if for every \omega\in\Omega,

\lim_{t\longrightarrow\infty}\|x_{t}(\omega)-x(\omega)\|=0,

2) almost surely to x if there exists a subset \mathcal{A}\subseteq\Omega with Pr(\mathcal{A})=0 such that for every \omega\notin\mathcal{A},

\lim_{t\longrightarrow\infty}\|x_{t}(\omega)-x(\omega)\|=0,

3) in mean square to x if

E[\|x_{t}-x\|^{2}]\longrightarrow 0\ \text{as}\ t\longrightarrow\infty.

Lemma 1. [35, Ch. 5] Let W\in\Re^{m\times m}. Then \|W\|_{2}\leq\sqrt{\|W\|_{1}\|W\|_{\infty}}.

Lemma 2. [36] Let \{a_{t}\}_{t=0}^{\infty} be a sequence of nonnegative real numbers satisfying a_{t+1}\leq(1-b_{t})a_{t}+b_{t}h_{t}, t\geq 0, where b_{t}\in[0,1], \sum_{t=0}^{\infty}b_{t}=\infty, and \limsup_{t\longrightarrow\infty}h_{t}\leq 0. Then \lim_{t\longrightarrow\infty}a_{t}=0.
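A toy run of Lemma 2 (our own choices of b_{t} and h_{t}, assuming NumPy) makes the conclusion concrete:

```python
# With b_t = 1/(t+1) (so sum b_t = infinity) and h_t = 1/sqrt(t+1) (h_t -> 0,
# hence limsup h_t <= 0), the worst case a_{t+1} = (1 - b_t) a_t + b_t h_t
# telescopes to T a_T = sum_{t<T} h_t, so a_T ~ 2/sqrt(T) -> 0 as predicted.
import numpy as np

a, T = 10.0, 200000
for t in range(T):
    b = 1.0 / (t + 1)          # b_t in [0,1], non-summable
    h = 1.0 / np.sqrt(t + 1)   # h_t -> 0
    a = (1 - b) * a + b * h
print(a, 2 / np.sqrt(T))       # both approximately 4.5e-3
```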

Lemma 3. Let the sequence \{x_{t}\}_{t=0}^{\infty} in a real Hilbert space \mathcal{H} be bounded for each realization \omega\in\Omega and converge almost surely to x^{*}. Then the sequence converges in mean square to x^{*}.

Proof: See the proof of Theorem 2 in [30].

The Cucker-Smale weight [6], which depends on the distance between two agents i and j, is of the form

\mathcal{W}_{ij}(x_{i},x_{j})=\frac{Q}{(\sigma^{2}+\|x_{i}-x_{j}\|^{2}_{2})^{\beta}} (3)

where Q,\sigma>0 and \beta\geq 0.
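For concreteness, (3) translates directly into code; a minimal sketch assuming NumPy, with Q, sigma, and beta instantiated to the values later used in (14):

```python
import numpy as np

def cucker_smale_weight(x_i, x_j, Q=0.25, sigma=1.0, beta=1.0):
    """W_ij(x_i, x_j) = Q / (sigma^2 + ||x_i - x_j||_2^2)^beta, per (3)."""
    d2 = np.sum((np.asarray(x_i) - np.asarray(x_j)) ** 2)
    return Q / (sigma ** 2 + d2) ** beta

print(cucker_smale_weight([0.0, 0.0], [1.0, 1.0]))  # 0.25 / (1 + 2) ~= 0.0833
```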

3 Problem Formulation

In social networks, an agent weighs the opinions of others based on how close its opinion (or state) is to theirs, which motivates the consideration of state-dependent networks. A vehicular platoon, a practical example motivating this work, can be modeled as both a position-dependent (state-dependent, taking position as the state) and a random arbitrary network. Therefore, two networks are combined: 1) a network induced by the states' weights, and 2) the underlying random arbitrary network (see Section III of the preliminary version [31] for details). The combined state-dependent and random arbitrary network is formulated as follows.

A network of m\in\mathbb{N} nodes labeled by the set \mathcal{V}=\{1,2,\ldots,m\} is considered. The topology of the interconnections among nodes is not fixed but defined by a set of graphs \mathcal{G}(\omega^{*})=(\mathcal{V},\mathcal{E}(\omega^{*})), where \mathcal{E}(\omega^{*})\subseteq\mathcal{V}\times\mathcal{V} is the ordered edge set and \omega^{*}\in\Omega^{*}, with \Omega^{*} the set of all possible communication graphs, i.e., \Omega^{*}=\{\mathcal{G}_{1},\mathcal{G}_{2},\ldots,\mathcal{G}_{\bar{N}}\}. We assume that (\Omega^{*},\sigma) is a measurable space where \sigma is the \sigma-algebra on \Omega^{*}. We write \mathcal{N}_{i}^{in}(\omega^{*})/\mathcal{N}_{i}^{out}(\omega^{*}) for the labels of agent i's in/out neighbors in the graph \mathcal{G}(\omega^{*}), so that there is an arc in \mathcal{G}(\omega^{*}) from vertex j/i to vertex i/j only if agent i receives/sends information from/to agent j. We write \mathcal{N}_{i}(\omega^{*}) when \mathcal{N}_{i}^{in}(\omega^{*})=\mathcal{N}_{i}^{out}(\omega^{*}). It is assumed that there is no communication delay or noise in the network.

It should be noted that in our formulation, the in and out neighbors of each agent i\in\mathcal{V} in each graph \mathcal{G}_{i}\in\Omega^{*}, i=1,\ldots,\bar{N}, are fixed, while the weights of the links are possibly state-dependent. For instance, an agent pays attention arbitrarily at each time to its friends while it weighs the difference between its opinion and others' in its decision (see [29, Sec. III] for more details).

We associate with each node i\in\mathcal{V} a convex cost function f_{i}:\Re^{n}\longrightarrow\Re which is observed only by node i. The objective of each agent is to find a solution of the following optimization problem:

\underset{s}{\min}\sum_{i=1}^{m}f_{i}(s)

where s\in\Re^{n}. Since each node i knows only its own f_{i}, the nodes cannot individually calculate the optimal solution and, therefore, must collaborate to do so.

The above problem can be formulated based on local variables of the agents as

\underset{x}{\min}\quad f(x):=\sum_{i=1}^{m}f_{i}(x_{i}) (4)
subject to\quad x_{1}=\ldots=x_{m}

where x=[x_{1}^{T},x_{2}^{T},\ldots,x_{m}^{T}]^{T}, x_{i}\in\Re^{n}, i\in\mathcal{V}, and the constraint set is reached through state-dependent interactions and random (arbitrary) communication graphs. The set

\mathcal{C}:=\{x\in\Re^{mn}\,|\,x_{i}=x_{j},1\leq i,j\leq m,x_{i}\in\Re^{n}\} (5)

is known as the consensus subspace, which is a convex set. Note that the Hilbert space considered in this paper for the distributed optimization problem is \mathcal{H}=(\Re^{mn},\|\cdot\|_{2}).

We write W(\omega^{*},x):=\mathcal{W}(\omega^{*},x)\otimes I_{n}, with \mathcal{W}(\omega^{*},x)=[\mathcal{W}_{ij}(\omega^{*},x_{i},x_{j})], for the state-dependent weighted matrix of the fixed graph \omega^{*}\in\Omega^{*} in a switching network having all possible communication topologies in the set \Omega^{*}. For instance, if nodes are not activated at some time \tilde{t} for communication updates in an asynchronous protocol, and/or there are no edges in the graph occurring at time \tilde{t}, then \mathcal{W}(\omega^{*}_{\tilde{t}},x_{\tilde{t}})=I_{m}.

Now we impose Assumptions 1 and 2 below on \mathcal{W}(\omega^{*},x).

Assumption 1. For each fixed \omega^{*}\in\Omega^{*}, the weights \mathcal{W}_{ij}(\omega^{*},x_{i},x_{j}):\Omega^{*}\times\Re^{n}\times\Re^{n}\longrightarrow[0,1] are continuous, and the state-dependent weighted matrix of the graph is doubly stochastic for all \omega^{*}\in\Omega^{*}, i.e.,

i) \sum_{j\in\mathcal{N}_{i}^{in}(\omega^{*})\cup\{i\}}\mathcal{W}_{ij}(\omega^{*},x_{i},x_{j})=1,\ i=1,2,\ldots,m,

ii) \sum_{j\in\mathcal{N}_{i}^{out}(\omega^{*})\cup\{i\}}\mathcal{W}_{ij}(\omega^{*},x_{i},x_{j})=1,\ i=1,2,\ldots,m.

Assumption 1 allows us to decouple the information exchange from the analysis of our proposed algorithm and, at the same time, to consider random graphs. The state-dependent weight \mathcal{W}_{ij}(\omega^{*},x_{i},x_{j}) between any two agents i and j in Assumption 1 is general and may be a function of distance or other forms of interaction. Note that any network with undirected links and continuous weights \mathcal{W}_{ij}(\omega^{*},x_{i},x_{j}) satisfies Assumption 1 since the weighted matrix of the graph is symmetric (and thus doubly stochastic).
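To illustrate the last point, a symmetric state-dependent matrix satisfying Assumption 1 can be assembled by placing the link weights off the diagonal and letting each self-weight absorb the remainder of its row; a sketch assuming NumPy (the function and its arguments are our own naming), valid whenever each row's off-diagonal sum is at most 1:

```python
import numpy as np

def weighted_matrix(edges, x, weight, m):
    """edges: undirected pairs (i, j) of a fixed graph omega*; x: agent states;
    weight: symmetric, continuous map (x_i, x_j) -> [0, 1]."""
    W = np.zeros((m, m))
    for i, j in edges:
        w = weight(x[i], x[j])
        W[i, j] = W[j, i] = w                 # symmetry of undirected links
    np.fill_diagonal(W, 1.0 - W.sum(axis=1))  # rows, hence columns, sum to 1
    return W
```

With the cucker_smale_weight sketched in Section 2 (Q = 0.25) on a line graph, each row's off-diagonal sum is at most 0.5, so the diagonal stays nonnegative and the matrix is doubly stochastic.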

Assumption 2. The union of the graphs in \Omega^{*} is strongly connected for all x\in\Re^{mn}, i.e.,

Re\Big[\lambda_{2}\Big(\sum_{\omega^{*}\in\Omega^{*}}(I_{m}-\mathcal{W}(\omega^{*},x))\Big)\Big]>0,\quad\forall x\in\Re^{mn}. (6)

Assumption 2 guarantees that the information sent from each node is eventually received by every other node. The set \mathcal{C} defined in (5) (which is the constraint set of (4)) can be obtained, under Assumptions 1 and 2, from the set

\{x\,|\,W(\omega^{*},x)x=x,\ \forall\omega^{*}\in\Omega^{*}\} (7)

(see [29, Appendices A and B], setting G=\Omega^{*}, for the proof). This allows us to reformulate (4) as

\underset{x}{\min}\quad f(x):=\sum_{i=1}^{m}f_{i}(x_{i}) (8)
subject to\quad W(\omega^{*},x)x=x,\ \forall\omega^{*}\in\Omega^{*}.

Thus, a solution of (4) can be attained by solving (8) under Assumptions 1 and 2.

The random operator T(\omega^{*},x):=W(\omega^{*},x)x is called the state-dependent weighted random operator of the graph (see [30, Def. 8], [29, Def. 4]). From Definition 2 and (7), we have FVP(T)=\mathcal{C} under Assumptions 1 and 2.

Now we show that the random operator T(\omega^{*},x):=W(\omega^{*},x)x with Assumption 1 is quasi-nonexpansive in the Hilbert space \mathcal{H}=(\Re^{mn},\|\cdot\|_{2}). Let z\in FVP(T)=\mathcal{C}. Since z\in\mathcal{C} and W(\omega^{*},x) is a stochastic matrix (see Assumption 1) for all \omega^{*}\in\Omega^{*}, x\in\mathcal{H}, we have W(\omega^{*},x)z=z. Therefore, we obtain

\|T(\omega^{*},x)-z\|_{2}=\|W(\omega^{*},x)x-W(\omega^{*},x)z\|_{2}\leq\|W(\omega^{*},x)\|_{2}\|x-z\|_{2}.

Since W(\omega^{*},x) is doubly stochastic by Assumption 1, we have \|W(\omega^{*},x)\|_{1}=\|W(\omega^{*},x)\|_{\infty}=1 and hence, from Lemma 1, \|W(\omega^{*},x)\|_{2}\leq 1 for all \omega^{*}\in\Omega^{*}. Thus

\|T(\omega^{*},x)-z\|_{2}\leq\|W(\omega^{*},x)\|_{2}\|x-z\|_{2}\leq\|x-z\|_{2}, (9)

which implies that the random operator T(\omega^{*},x) is quasi-nonexpansive (see Remark 1).

Problem (8) is a special case of the general class of problems presented in Problem 1 below, with T(\omega^{*},x):=W(\omega^{*},x)x. It is to be noted that Problem 3 in [30] is defined for a nonexpansive random operator, while Problem 1 below is defined for a quasi-nonexpansive random operator, which contains the nonexpansive case (see Remark 1).

Problem 1: Let \mathcal{H} be a real Hilbert space. Assume that the problem is feasible, namely FVP(T)\neq\emptyset. Given a convex function f:\mathcal{H}\longrightarrow\Re and a quasi-nonexpansive random mapping T:\Omega^{*}\times\mathcal{H}\longrightarrow\mathcal{H}, the problem is to find x^{*}\in\underset{x}{\operatorname{argmin}}\,f(x) such that x^{*} is a fixed-value point of T(\omega^{*},x), i.e., we have the minimization problem

\underset{x}{\min}\quad f(x) (10)
subject to\quad x\in FVP(T)

where FVP(T) is the set of fixed-value points of the random operator T(\omega^{*},x) (see Definition 2).

Remark 2.

A fixed-value point of a quasi-nonexpansive random mapping is a common fixed point of the family of quasi-nonexpansive non-random mappings T(\omega^{*},\cdot), \omega^{*}\in\Omega^{*}. From Proposition 1, the fixed point set of the quasi-nonexpansive non-random mapping T(\omega^{*},\cdot) for each \omega^{*} is a convex set. It is well known that the intersection of convex sets (finite, countable, or uncountable) is convex. Thus, FVP(T) is a convex set, and Problem 1 is a convex optimization problem.

4 Algorithm and Its Convergence

Here, we show that the algorithm proposed in [30] (which works for nonexpansive random operators) is applicable to solving Problem 1 with quasi-nonexpansive random operators. Thus, we propose the following algorithm for solving Problem 1:

x_{t+1}=\alpha_{t}(x_{t}-\beta\nabla f(x_{t}))+(1-\alpha_{t})\hat{T}(\omega_{t}^{*},x_{t}), (11)

where \hat{T}(\omega_{t}^{*},x_{t}):=(1-\eta)x_{t}+\eta T(\omega_{t}^{*},x_{t}), \eta\in(0,1), \alpha_{t}\in[0,1], and \omega^{*}_{t} is a realization from the set \Omega^{*} at time t. The challenge in extending the result of [30] is to use the weaker property (2), which holds for all x\in\mathcal{H} and x^{*}\in FVP(T), instead of the stronger property (1), which holds for all x,y\in\mathcal{H}.
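A compact sketch of iteration (11), assuming NumPy; grad_f, T_op, and sample_graph are placeholders we introduce for \nabla f, the random operator T(\omega^{*},\cdot), and the draw of \omega^{*}_{t}:

```python
import numpy as np

def run_algorithm_11(x0, grad_f, T_op, sample_graph, beta, eta=0.8, iters=5000):
    """Iterate x_{t+1} = a_t (x_t - beta grad_f(x_t)) + (1 - a_t) T_hat(w_t, x_t)."""
    x = np.asarray(x0, dtype=float)
    for t in range(iters):
        alpha = 1.0 / (1 + t)                         # satisfies (a) and (b)
        omega = sample_graph()                        # realization omega*_t
        t_hat = (1 - eta) * x + eta * T_op(omega, x)  # averaged operator T_hat
        x = alpha * (x - beta * grad_f(x)) + (1 - alpha) * t_hat
    return x
```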

Let (\Omega^{*},\sigma) be a measurable space where \Omega^{*} and \sigma are defined in Section 3. Consider a probability measure \mu defined on the space (\Omega,\mathcal{F}) where

\Omega=\Omega^{*}\times\Omega^{*}\times\Omega^{*}\times\ldots

and \mathcal{F} is a sigma-algebra on \Omega such that (\Omega,\mathcal{F},\mu) forms a probability space. We denote a realization in this probability space by \omega\in\Omega. We make the following assumptions.

Assumption 3. f(x) is continuously differentiable and \rho-strongly convex, and \nabla f(x) is K-Lipschitz continuous.

Assumption 4. There exists a nonempty subset \tilde{K}\subseteq\Omega^{*} such that FVP(T)=\{\tilde{z}\,|\,\tilde{z}\in\mathcal{H},\ \tilde{z}=T(\bar{\omega},\tilde{z}),\ \forall\bar{\omega}\in\tilde{K}\}, and each element of \tilde{K} occurs infinitely often almost surely.

Assumption 4 is weaker than existing assumptions for random networks, as explained in detail in Remark 3 below.

Remark 3.

[30] If the sequence \{\omega^{*}(t)\}_{t=0}^{\infty} is mutually independent with \sum_{t=0}^{\infty}Pr_{t}(\bar{\omega})=\infty, where Pr_{t}(\bar{\omega}) is the probability of (a particular element) \bar{\omega} occurring at time t, then Assumption 4 is satisfied. Moreover, any ergodic stationary sequence \{\omega^{*}(t)\}_{t=0}^{\infty} with Pr(\bar{\omega})>0 satisfies Assumption 4. Consequently, any time-invariant Markov chain with its unique stationary distribution as the initial distribution satisfies Assumption 4.

4.1 Almost Sure Convergence

Before stating our theorems, we need to extend Lemma 5 in [30] (which is for nonexpansive random operators) to quasi-nonexpansive random operators. Hence, we have the following lemma.

Lemma 4. Let \mathcal{H} be a real Hilbert space and \hat{T}(\omega^{*},x):=(1-\eta)x+\eta T(\omega^{*},x), \omega^{*}\in\Omega^{*}, x\in\mathcal{H}, with T a quasi-nonexpansive random operator, FVP(T)\neq\emptyset, and \eta\in(0,1]. Then

(i) FVP(T)=FVP(\hat{T}).

(ii) \langle x-\hat{T}(\omega^{*},x),x-z\rangle\geq\frac{\eta}{2}\|x-T(\omega^{*},x)\|^{2},\ \forall z\in FVP(T),\ \forall\omega^{*}\in\Omega^{*}.

(iii) \hat{T}(\omega^{*},x) is quasi-nonexpansive.

Proof. See Appendix A.

We present the main theorem in this paper as follows.

Theorem 1.

Consider Problem 1 with Assumptions 3 and 4. Let \beta\in(0,\frac{2}{K}) and \alpha_{t}\in[0,1], t\in\mathbb{N}\cup\{0\}, be such that

(a) \lim_{t\longrightarrow\infty}\alpha_{t}=0,

(b) \sum_{t=0}^{\infty}\alpha_{t}=\infty.

Then starting from any initial point, the sequence generated by (11) globally converges almost surely to the unique solution of the problem.

Note that the range of \beta in Theorem 1 of [31] (i.e., the preliminary version of this paper) is \beta\in(0,\frac{2\rho}{K^{2}}), which is enlarged to \beta\in(0,\frac{2}{K}) in Theorem 1 above. This is due to the fact that, by the definitions of strong convexity and strong smoothness of a differentiable convex function f (see also parts (5)-(6) in [43, p. 38]), we always have \rho\leq K; hence \frac{2\rho}{K^{2}}\leq\frac{2}{K}. An advantage of this enlargement is more freedom in selecting the parameter \beta. An example of \alpha_{t} satisfying (a) and (b) in Theorem 1 is \alpha_{t}:=\frac{1}{(1+t)^{\zeta}} with \zeta\in(0,1].

Remark 4.

As seen from the proof of Theorem 1, an advantage of the technique proposed in [30] (and thus used here) is that we are able to analyze stochastic processes in a fully deterministic way (see Remark 12 in [30] for details).

Proof of Theorem 1. We prove Theorem 1 in three steps.

Step 1: \{x_{t}\}_{t=0}^{\infty}, \forall\omega\in\Omega, is bounded (see Lemma 5 in Appendix B).

Step 2: \{x_{t}\}_{t=0}^{\infty} converges almost surely to a random variable supported by the feasible set (see Lemma 6 in Appendix C).

Step 3: \{x_{t}\}_{t=0}^{\infty} converges almost surely to the optimal solution (see Lemma 7 in Appendix D).

4.2 Mean Square Convergence

Since almost sure convergence does not in general imply mean square convergence, nor vice versa, we show the mean square convergence of the random sequence generated by Algorithm (11) in the following theorem.

Theorem 2.

Consider Problem 1 with Assumptions 3 and 4. Suppose that \beta\in(0,\frac{2}{K}) and \alpha_{t}\in[0,1], t\in\mathbb{N}\cup\{0\}, satisfy (a) and (b) in Theorem 1. Then starting from any initial point, the sequence generated by (11) globally converges in mean square to the unique solution of the problem.

Proof. Theorem 2 follows from Step 1, Theorem 1, and Lemma 3.

4.3 Distributed Optimization

The distributed optimization problem with state-dependent interactions over random arbitrary networks is a special case of Problem 1 (see Section 3). Hence, Algorithm (11) can be applied directly to solve (8) in a distributed manner, provided that each f_{i}(x_{i}) is \rho-strongly convex and each \nabla f_{i}(x_{i}) is K-Lipschitz. Thus, we have the following corollary of Theorems 1 and 2.

Corollary 1. Consider the optimization problem (8) with Assumptions 1, 2, and 4. Assume that each f_{i}(x_{i}) is \rho-strongly convex and each \nabla f_{i}(x_{i}) is K-Lipschitz for i=1,\ldots,m. Suppose that \beta\in(0,\frac{2}{K}), \eta\in(0,1), and \alpha_{t}\in[0,1], t\in\mathbb{N}\cup\{0\}, satisfy (a) and (b) in Theorem 1. Then starting from any initial point, the sequence generated by the following distributed algorithm based on local information for each agent i

x_{i,t+1}=\alpha_{t}(x_{i,t}-\beta\nabla f_{i}(x_{i,t}))+(1-\alpha_{t})\Big((1-\eta)x_{i,t}+\eta\sum_{j\in\mathcal{N}_{i}^{in}(\omega^{*}_{t})\cup\{i\}}\mathcal{W}_{ij}(\omega^{*}_{t},x_{i},x_{j})x_{j,t}\Big), (12)

globally converges both almost surely and in mean square to the unique solution of the problem.
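For concreteness, one step of (12) for all agents might look as follows (a sketch assuming NumPy; distributed_step, weight, and in_neighbors are our own placeholders, with weight as in the Assumption 1 sketch):

```python
import numpy as np

def distributed_step(x, grads, in_neighbors, weight, alpha, beta, eta):
    """x: (m, n) stacked states; grads[i]: gradient of f_i at x[i];
    in_neighbors[i]: the labels N_i^in(omega*_t) of the realized graph."""
    x = np.asarray(x, dtype=float)
    x_new = np.empty_like(x)
    for i in range(x.shape[0]):
        mix, w_self = 0.0, 1.0
        for j in in_neighbors[i]:
            w = weight(x[i], x[j])
            mix = mix + w * x[j]
            w_self -= w                 # self-weight closes the row sum to 1
        mix = mix + w_self * x[i]       # the j = i term of the sum in (12)
        x_new[i] = (alpha * (x[i] - beta * grads[i])
                    + (1 - alpha) * ((1 - eta) * x[i] + eta * mix))
    return x_new
```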

Algorithm (12) is a totally asynchronous algorithm (see footnote 2) requiring neither an a priori distribution nor B-connectivity (see footnote 2) of the switched graphs; the B-connectivity assumption implies Assumption 4. The algorithm is not asynchronous due to the synchronized diminishing step size \alpha_{t}. The algorithm still works when the state-dependent/state-independent weighted matrix of the graph is periodic and irreducible under a synchronous protocol. Detailed properties of Algorithm (12) for time-varying (see footnote 1) networks have been studied in [29] and can be induced for random networks (see also footnote 4).

Remark 5.

The convergence rate of a totally asynchronous algorithm cannot in general be established. Determining the rate of convergence of (12) under suitable assumptions is left for future work. An asynchronous and totally asynchronous algorithm for distributed optimization over random networks with state-independent interactions has recently been proposed in [37]. As special cases of distributed optimization over state-independent networks, asynchronous and totally asynchronous algorithms have been given for average consensus and for solving linear algebraic equations in [38] and [39], respectively (see [37, Sec. I] for details).

5 Numerical Example

We give a practical example of distributed optimization with state-dependent interactions of Cucker-Smale form [6] over random (arbitrary) communication links where there are distribution dependencies among the randomly switched graphs. The following example has been solved over time-varying (see footnote 1) networks in [29]; we solve it here over random arbitrary networks with distribution dependency among the switched communication graphs to show the capability of Algorithm (12).

Example 1. (Distributed Optimization over Random Arbitrary Networks for an Automated Warehouse): Consider m robots on the shop floor of a warehouse. Assigning tasks to robotic agents in an automated warehouse is modeled as optimization problems [40] that are solved by a centralized processor, an approach that is neither scalable nor able to handle autonomous entities [40]. Moreover, due to the large number of robots and the computational restrictions of a centralized processor, the robots must handle tasks in a collaborative manner [40]. If we assume that the communications among robots are carried out via a wireless network, then the signal power at a receiver is inversely proportional to some power of the distance between transmitter and receiver [41]. Therefore, if we take the position as the state of each robot, the weights of the links between robots are state-dependent.

Figure 1: Variables x_{i}^{1}, i=1,\ldots,20, of the robotic agents with weights of the form (14). The variables reach consensus as the robots communicate, for one realization of the random network with distribution dependency.

Assume that m=20 robots bring loads from different initial places to one place for delivery. The desired place to put the loads is determined by minimizing a pre-defined cost, the sum of squared distances to the initial places of the robots:

\underset{s}{\min}\sum_{i=1}^{20}\|s-d_{i}\|^{2}_{2} (13)

where s\in\Re^{2} is the decision variable and d_{i} is the position of the initial place of load i on the two-dimensional shop floor. The above problem is reformulated as the following problem based on the local variables of the agents:

\underset{x}{\min}\quad f(x):=\sum_{i=1}^{20}0.5\|x_{i}-d_{i}\|^{2}_{2}
subject to\quad x_{1}=x_{2}=\ldots=x_{20}

where x_{i}=[x_{i}^{1},x_{i}^{2}]^{T}, and the constraint set is reached via a distance-dependent network with random communication graphs.

Figure 2: Variables x_{i}^{2}, i=1,\ldots,20, of the robotic agents with weights of the form (14). The variables reach consensus as the robots communicate, for one realization of the random network with distribution dependency.

The topology of the underlying undirected graph is assumed to be a line graph, i.e., 1\longleftrightarrow 2\longleftrightarrow\ldots\longleftrightarrow 20, for minimal connectivity among robots. Based on the weighting property of the wireless communication network mentioned earlier, the weight of the link between robots i and j is modeled to be of Cucker-Smale form (see Section 2):

\mathcal{W}_{ij}(x_{i},x_{j})=\frac{0.25}{1+\|x_{i}-x_{j}\|^{2}_{2}}. (14)

One can see that the weight of each link at each time t is determined only by the states of the agents; hence, no local property is assumed or determined a priori for all t in Algorithm (12) (see [29] for details). It is easy to check that f_{i}(x_{i}):=0.5\|x_{i}-d_{i}\|^{2}_{2}, i=1,2,\ldots,20, are 1-strongly convex and that \nabla f_{i}(x_{i}) are 1-Lipschitz continuous.

Figure 3: Two-dimensional (2D) plot of the variables x^{1} and x^{2} in Figures 1 and 2, where the initial positions of the agents are shown with 'o' and the final position with 'x'.

Figure 4: The error in Example 1 with weights of the form (14) for one realization of the random network with distribution dependency.

We consider that each link has an independent and identically distributed (i.i.d.) Bernoulli distribution with Pr(failure)=0.5 in every \tilde{N}-interval, and at iteration k\tilde{N}, k=1,\ldots, the link that has worked the minimum number of times in the previous \tilde{N}-interval occurs. If some links have the same number of occurrences in the previous \tilde{N}-interval, one of them is chosen randomly. Here, we have the graphs \omega^{*}\in\Omega^{*}=\{\mathcal{G}_{1},\ldots,\mathcal{G}_{19}\}; thus the sequence \{\omega^{*}_{t}\}_{t=0}^{\infty} is not independent. It has been shown in [30] that each graph \mathcal{G}_{i}, i=1,\ldots,19, occurs infinitely often almost surely. Moreover, the union of the graphs is strongly connected for all x\in\Re^{40}. Therefore, Assumption 4 is fulfilled, and the conditions of Theorems 1 and 2 are satisfied.
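Our reading of this switching scheme can be sketched as follows (assuming NumPy; the per-step set of active links standing in for \omega^{*}_{t}, the interval length, and all names are our own choices, not fixed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def active_links(t, counts, n_links=19, N_tilde=20):
    """Links active at iteration t; counts tracks occurrences in the interval."""
    if t > 0 and t % N_tilde == 0:
        least = np.flatnonzero(counts == counts.min())
        forced = rng.choice(least)      # least-occurring link must occur now
        counts[:] = 0                   # a fresh N_tilde-interval begins
        counts[forced] += 1
        return np.array([forced])
    on = np.flatnonzero(rng.random(n_links) >= 0.5)  # Pr(failure) = 0.5 per link
    counts[on] += 1
    return on

counts = np.zeros(19, dtype=int)
seq = [active_links(t, counts) for t in range(100)]  # a realization of {omega*_t}
```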

We use \eta=0.8, \alpha_{t}=\frac{1}{1+t}, t\geq 0, and \beta=\frac{1}{K}=1 for the simulation. The initial position of agent i is chosen as x_{i,0}=[10\cos(\frac{(i-1)2\pi}{22}),10\sin(\frac{(i-1)2\pi}{22})]^{T}. The optimal solution of (13), computed in a centralized way as the mean of the d_{i}, i=1,\ldots,20, is s^{*}=[-0.9002,0.4111]^{T}. The results given by Algorithm (12) are shown in Figures 1-4. The error e_{t}:=\|x_{t}-s^{*}\otimes\textbf{1}_{20}\|_{2}, where x_{t}=[x_{1,t}^{T},\ldots,x_{20,t}^{T}]^{T}, is given in Figure 4, and the two-dimensional (2D) plot is shown in Figure 3. Figures 1-4 show that the positions of the robotic agents approach the solution of the optimization problem (13) for one realization of the random network with distribution dependency. Note that no existing result can solve this problem since the weights of the links are both position-dependent and randomly arbitrarily activated.
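The centralized reference value is immediate to reproduce: setting the gradient of (13) to zero, \sum_{i}2(s-d_{i})=0, gives s^{*} as the mean of the d_{i}, and e_{t} is a stacked Euclidean distance. A sketch assuming NumPy, with stand-in d_{i} since the paper's exact values are not listed:

```python
import numpy as np

d = np.random.default_rng(1).normal(size=(20, 2))  # stand-in initial places d_i
s_star = d.mean(axis=0)       # minimizer of sum_i ||s - d_i||^2 is the mean
x_t = np.tile(s_star, 20)     # a stacked iterate x_t = [x_1^T, ..., x_20^T]^T
e_t = np.linalg.norm(x_t - np.tile(s_star, 20))    # e_t = ||x_t - s* kron 1_20||
print(s_star, e_t)            # e_t = 0 at exact consensus on s*
```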

We also simulate the above example with weights different from the Cucker-Smale form, i.e.,

\mathcal{W}_{ij}(x_{i},x_{j})=\frac{0.25}{1+\log^{2}(1+\|x_{i}-x_{j}\|_{2})}, (15)

and the results are shown in Figures 5-6. The figures show that the variables of the agents reach consensus on the optimal solution of the problem.

6 Conclusions and Future Work

Distributed optimization with both state-dependent interactions and random (arbitrary) networks has been considered. It is shown that the state-dependent weighted random operator of the graph is quasi-nonexpansive; thus, it is not required to impose an a priori distribution of the random communication topologies on the switching graphs. A more general optimization problem than those addressed in the literature is formulated. A gradient-based discrete-time algorithm with diminishing step size is provided that converges both almost surely and in mean square to the global solution of the optimization problem under suitable assumptions. Moreover, it reduces to a totally asynchronous algorithm for the distributed optimization problem. Relaxing the strong convexity assumption on the cost functions and/or the double stochasticity assumption on the communication graphs are open problems for future research.

Figure 5: Variables x_{i}^{1}, i=1,\ldots,20, of the robotic agents with weights of the form (15). The variables reach consensus as the robots communicate, for one realization of the random network with distribution dependency.

Figure 6: Variables x_{i}^{2}, i=1,\ldots,20, of the robotic agents with weights of the form (15). The variables reach consensus as the robots communicate, for one realization of the random network with distribution dependency.

Appendix A

Proof of Lemma 4.

(i) The proof is the same as the proof of part (i) of Lemma 5 in [30].

(ii) We have from the quasi-nonexpansivity of T(\omega^{*},x), for arbitrary x\in\mathcal{H}, that

\|T(\omega^{*},x)-z\|^{2}\leq\|x-z\|^{2},\ \forall z\in FVP(T),\ \forall\omega^{*}\in\Omega^{*}. (16)

In a Hilbert space \mathcal{H}, we have

\|u+v\|^{2}=\|u\|^{2}+\|v\|^{2}+2\langle u,v\rangle,\ \forall u,v\in\mathcal{H}. (17)

From (17), we obtain for all z\in FVP(T) and all \omega^{*}\in\Omega^{*} that

\|T(\omega^{*},x)-z\|^{2}=\|T(\omega^{*},x)-x+x-z\|^{2}=\|T(\omega^{*},x)-x\|^{2}+\|x-z\|^{2}+2\langle T(\omega^{*},x)-x,x-z\rangle. (18)

Substituting (18) into (16) yields

2\langle x-T(\omega^{*},x),x-z\rangle\geq\|T(\omega^{*},x)-x\|^{2}. (19)

From the definition of \hat{T}(\omega_{t}^{*},x_{t}) (see (11)), substituting x-T(\omega^{*},x)=\frac{x-\hat{T}(\omega^{*},x)}{\eta} into the left-hand side of inequality (19) implies (ii). Thus the proof of part (ii) of Lemma 4 is complete.

(iii) We have from the quasi-nonexpansivity of T(\omega^{*},x), for z\in FVP(T) and arbitrary x\in\mathcal{H}, that

\|\hat{T}(\omega^{*},x)-z\|\leq(1-\eta)\|x-z\|+\eta\|T(\omega^{*},x)-z\|\leq(1-\eta)\|x-z\|+\eta\|x-z\|=\|x-z\|,\ \forall\omega^{*}\in\Omega^{*}.

Therefore, \hat{T}(\omega^{*},x) is a quasi-nonexpansive random operator, and the proof of part (iii) of Lemma 4 is complete.

Appendix B

Lemma 5. The sequence \{x_{t}\}_{t=0}^{\infty}, \forall\omega\in\Omega, generated by (11) is bounded under Assumption 3.

Proof. Since the cost function is smooth and strongly convex and the constraint set is nonempty and closed, the problem has a unique solution. Let x^{*} be the unique solution of the problem. We can write x^{*}=\alpha_{t}x^{*}+(1-\alpha_{t})x^{*}, \forall t\in\mathbb{N}\cup\{0\}. Therefore, we have

\|x_{t+1}-x^{*}\|=\|\alpha_{t}(x_{t}-\beta\nabla f(x_{t}))+(1-\alpha_{t})\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\|
=\|\alpha_{t}(x_{t}-\beta\nabla f(x_{t})-x^{*})+(1-\alpha_{t})(\hat{T}(\omega_{t}^{*},x_{t})-x^{*})\|
\leq\alpha_{t}\|x_{t}-\beta\nabla f(x_{t})-x^{*}\|+(1-\alpha_{t})\|\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\|.

Since x^{*} is the solution, x^{*}\in FVP(T)=FVP(\hat{T}) (see part (i) of Lemma 4). Since \hat{T}(\omega^{*},x) is a quasi-nonexpansive random operator (see part (iii) of Lemma 4), the above can be written as

\|x_{t+1}-x^{*}\|\leq\alpha_{t}\|x_{t}-\beta\nabla f(x_{t})-x^{*}\|+(1-\alpha_{t})\|x_{t}-x^{*}\|. (20)

Since \nabla f_{i}(x_{i}) is K-Lipschitz, f_{i}(x_{i}) is K-strongly smooth (see [42, Lem. 3.4]). When f_{i}(x_{i}) is \rho-strongly convex and K-strongly smooth, the operator H(x):=x-\beta\nabla f(x) with \beta\in(0,\frac{2}{K}) is a contraction (see [43, p. 15] for details). Indeed, there exists 0<\gamma\leq 1 such that

\|x-y-\beta(\nabla f(x)-\nabla f(y))\|\leq(1-\gamma)\|x-y\|,\ \forall x,y\in\mathcal{H}. (21)

We have that

\|x_{t}-\beta\nabla f(x_{t})-x^{*}\|=\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))-\beta\nabla f(x^{*})\|\leq\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))\|+\beta\|\nabla f(x^{*})\|. (22)

Therefore, (21) and (22) imply

\|x_{t}-\beta\nabla f(x_{t})-x^{*}\|\leq(1-\gamma)\|x_{t}-x^{*}\|+\beta\|\nabla f(x^{*})\|. (23)

Substituting (23) into (20) yields

\|x_{t+1}-x^{*}\|\leq(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|+\alpha_{t}\beta\|\nabla f(x^{*})\|=(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|+\gamma\alpha_{t}\frac{\beta\|\nabla f(x^{*})\|}{\gamma},

which by induction implies

\|x_{t+1}-x^{*}\|\leq\max\Big\{\|x_{0}-x^{*}\|,\frac{\beta\|\nabla f(x^{*})\|}{\gamma}\Big\},

so \|x_{t}-x^{*}\|, t\in\mathbb{N}\cup\{0\}, is bounded for all \omega\in\Omega. Therefore, \{x_{t}\}_{t=0}^{\infty} is bounded for all \omega\in\Omega.

Appendix C

Lemma 6. The sequence \{x_{t}\}_{t=0}^{\infty} generated by (11) converges almost surely to a random variable supported by the feasible set.

Proof. From (11) and x_{t}=\alpha_{t}x_{t}+(1-\alpha_{t})x_{t}, we have

x_{t+1}-x_{t}+\alpha_{t}\beta\nabla f(x_{t})=(1-\alpha_{t})(\hat{T}(\omega_{t}^{*},x_{t})-x_{t}), (24)

and thus

\langle x_{t+1}-x_{t}+\alpha_{t}\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle=-(1-\alpha_{t})\langle x_{t}-\hat{T}(\omega_{t}^{*},x_{t}),x_{t}-x^{*}\rangle. (25)

Since x^{*}\in FVP(T), we have from part (ii) of Lemma 4 that

\langle x_{t}-\hat{T}(\omega_{t}^{*},x_{t}),x_{t}-x^{*}\rangle\geq\frac{\eta}{2}\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}. (26)

We get from (25) and (26) that

\langle x_{t+1}-x_{t}+\alpha_{t}\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle\leq-\frac{\eta}{2}(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2} (27)

or

-\langle x_{t}-x_{t+1},x_{t}-x^{*}\rangle\leq-\alpha_{t}\langle\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle-\frac{\eta}{2}(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}. (28)

In a Hilbert space \mathcal{H}, we have for any u,v\in\mathcal{H} that

\langle u,v\rangle=-\frac{1}{2}\|u-v\|^{2}+\frac{1}{2}\|u\|^{2}+\frac{1}{2}\|v\|^{2}. (29)

We obtain from (29) that

\langle x_{t}-x_{t+1},x_{t}-x^{*}\rangle=-C_{t+1}+C_{t}+\frac{1}{2}\|x_{t}-x_{t+1}\|^{2} (30)

where C_{t}:=\frac{1}{2}\|x_{t}-x^{*}\|^{2}. We get from (28) and (30) that

C_{t+1}-C_{t}-\frac{1}{2}\|x_{t}-x_{t+1}\|^{2}\leq-\alpha_{t}\langle\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle-\frac{\eta}{2}(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}. (31)

From (24) and (17) we obtain

\|x_{t+1}-x_{t}\|^{2}=\|-\alpha_{t}\beta\nabla f(x_{t})+(1-\alpha_{t})(\hat{T}(\omega^{*}_{t},x_{t})-x_{t})\|^{2}
=\alpha_{t}^{2}\|\beta\nabla f(x_{t})\|^{2}+(1-\alpha_{t})^{2}\|\hat{T}(\omega^{*}_{t},x_{t})-x_{t}\|^{2}-2\alpha_{t}(1-\alpha_{t})\langle\beta\nabla f(x_{t}),\hat{T}(\omega_{t}^{*},x_{t})-x_{t}\rangle. (32)

We know that \|\hat{T}(\omega^{*}_{t},x_{t})-x_{t}\|=\eta\|x_{t}-T(\omega^{*}_{t},x_{t})\|. Since \alpha_{t}\in[0,1], we also have (1-\alpha_{t})^{2}\leq(1-\alpha_{t}). Using these facts and multiplying both sides of (32) by \frac{1}{2} yield

\frac{1}{2}\|x_{t+1}-x_{t}\|^{2}=\frac{1}{2}\alpha_{t}^{2}\|\beta\nabla f(x_{t})\|^{2}+\frac{1}{2}(1-\alpha_{t})^{2}\eta^{2}\|T(\omega^{*}_{t},x_{t})-x_{t}\|^{2}-\alpha_{t}(1-\alpha_{t})\langle\beta\nabla f(x_{t}),\hat{T}(\omega_{t}^{*},x_{t})-x_{t}\rangle
\leq\frac{1}{2}\alpha_{t}^{2}\|\beta\nabla f(x_{t})\|^{2}+\frac{1}{2}(1-\alpha_{t})\eta^{2}\|T(\omega^{*}_{t},x_{t})-x_{t}\|^{2}-\alpha_{t}(1-\alpha_{t})\langle\beta\nabla f(x_{t}),\hat{T}(\omega_{t}^{*},x_{t})-x_{t}\rangle. (33)

We obtain from (31) and (33) that

C_{t+1}-C_{t}\leq\frac{1}{2}\|x_{t+1}-x_{t}\|^{2}-\alpha_{t}\langle\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle-\frac{\eta}{2}(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}
\leq-\Big(\frac{1}{2}-\frac{\eta}{2}\Big)\eta(1-\alpha_{t})\|x_{t}-T(\omega_{t}^{*},x_{t})\|^{2}+\alpha_{t}\Big(\frac{1}{2}\alpha_{t}\|\beta\nabla f(x_{t})\|^{2}-\langle\beta\nabla f(x_{t}),x_{t}-x^{*}\rangle-(1-\alpha_{t})\langle\beta\nabla f(x_{t}),\hat{T}(\omega_{t}^{*},x_{t})-x_{t}\rangle\Big). (34)

We claim that there exists t_{0}\in\mathbb{N} such that the sequence \{C_{t}\} is non-increasing for t\geq t_{0}. We use proof by contradiction and assume that this is not true. Then there exists a subsequence \{C_{t_{j}}\} such that C_{t_{j}+1}-C_{t_{j}}>0, which together with (34) implies

0<C_{t_{j}+1}-C_{t_{j}}\leq-\Big(\frac{1}{2}-\frac{\eta}{2}\Big)\eta(1-\alpha_{t_{j}})\|x_{t_{j}}-T(\omega_{t_{j}}^{*},x_{t_{j}})\|^{2}+\alpha_{t_{j}}\Big(\frac{1}{2}\alpha_{t_{j}}\beta^{2}\|\nabla f(x_{t_{j}})\|^{2}-\langle\beta\nabla f(x_{t_{j}}),x_{t_{j}}-x^{*}\rangle-(1-\alpha_{t_{j}})\langle\beta\nabla f(x_{t_{j}}),\hat{T}(\omega_{t_{j}}^{*},x_{t_{j}})-x_{t_{j}}\rangle\Big). (35)

Since \{x_{t}\} is bounded, \nabla f(x) is continuous, and \eta\in(0,1), we get from (35) and condition (a) of Theorem 1 that

0<\liminf_{j\longrightarrow\infty}\Big[-\Big(\frac{1}{2}-\frac{\eta}{2}\Big)\eta(1-\alpha_{t_{j}})\|x_{t_{j}}-T(\omega_{t_{j}}^{*},x_{t_{j}})\|^{2}+\alpha_{t_{j}}\Big(\frac{1}{2}\alpha_{t_{j}}\|\beta\nabla f(x_{t_{j}})\|^{2}-\langle\beta\nabla f(x_{t_{j}}),x_{t_{j}}-x^{*}\rangle-(1-\alpha_{t_{j}})\langle\beta\nabla f(x_{t_{j}}),\hat{T}(\omega_{t_{j}}^{*},x_{t_{j}})-x_{t_{j}}\rangle\Big)\Big]\leq 0, (36)

which is a contradiction. Hence, there exists t_{0}\in\mathbb{N} such that the sequence \{C_{t}\} is non-increasing for t\geq t_{0}. Since \{C_{t}\} is bounded below, it converges for all \omega\in\Omega.

Now we take the limit of both sides of (34) and use the convergence of \{C_{t}\}, the continuity of \nabla f(x), Step 1, \eta\in(0,1), and condition (a) of Theorem 1 to obtain

\lim_{t\longrightarrow\infty}\|x_{t}-T(\omega_{t}^{*},x_{t})\|=0\quad\textit{pointwise (surely)},

which implies that \{x_{t}\}_{t=0}^{\infty} converges for each \omega\in\Omega since FVP(T)\neq\emptyset. Moreover, this together with Assumption 4 implies that \{x_{t}\} converges almost surely to a random variable supported by FVP(T).

Appendix D

Lemma 7. The sequence \{x_{t}\}_{t=0}^{\infty} generated by (11) converges almost surely to the optimal solution.

Proof. Since x^{*}\in FVP(T) is the optimal solution, we have

\langle\bar{x}-x^{*},\nabla f(x^{*})\rangle\geq 0,\ \forall\bar{x}\in FVP(T). (37)

From (17), we have

\|x_{t+1}-x^{*}\|^{2}=\|x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})-\alpha_{t}\beta\nabla f(x^{*})\|^{2}=\|x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})\|^{2}+\alpha_{t}^{2}\|\beta\nabla f(x^{*})\|^{2}-2\alpha_{t}\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})\rangle. (38)

We have x^{*}=\alpha_{t}x^{*}+(1-\alpha_{t})x^{*}, \forall t\in\mathbb{N}\cup\{0\}; from this fact and (11), we get

\|x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})\|^{2}=\|\alpha_{t}[x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))]+(1-\alpha_{t})[\hat{T}(\omega_{t}^{*},x_{t})-x^{*}]\|^{2}. (39)

Furthermore, we have

\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}+\alpha_{t}\beta\nabla f(x^{*})\rangle=\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle+\alpha_{t}\|\beta\nabla f(x^{*})\|^{2}. (40)

Substituting (39) and (40) into (38) implies

\|x_{t+1}-x^{*}\|^{2}=\alpha_{t}^{2}\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))\|^{2}+(1-\alpha_{t})^{2}\|\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\|^{2}+2\alpha_{t}(1-\alpha_{t})\langle x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*})),\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\rangle-2\alpha_{t}\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle-\alpha_{t}^{2}\|\beta\nabla f(x^{*})\|^{2}.

From (21), the quasi-nonexpansivity of \hat{T}(\omega^{*},x), and the Cauchy-Schwarz inequality, we have

\langle x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*})),\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\rangle\leq(1-\gamma)\|x_{t}-x^{*}\|^{2}. (41)

We get from (21) that

\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))\|^{2}\leq(1-\gamma)^{2}\|x_{t}-x^{*}\|^{2}. (42)

We obtain from (41), (42), and quasi-nonexpansivity property of T^(ω,x)\hat{T}(\omega^{*},x) that

\|x_{t+1}-x^{*}\|^{2}=\alpha_{t}^{2}\|x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*}))\|^{2}+(1-\alpha_{t})^{2}\|\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\|^{2}
\quad+2\alpha_{t}(1-\alpha_{t})\langle x_{t}-x^{*}-\beta(\nabla f(x_{t})-\nabla f(x^{*})),\hat{T}(\omega_{t}^{*},x_{t})-x^{*}\rangle
\quad-2\alpha_{t}\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle-\alpha_{t}^{2}\|\beta\nabla f(x^{*})\|^{2}
\leq(1-2\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}+\alpha_{t}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big)
=(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}-\gamma\alpha_{t}\|x_{t}-x^{*}\|^{2}
\quad+\alpha_{t}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big).
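
The inequality step above combines (41), (42), and quasi-nonexpansivity through the grouping

\alpha_{t}^{2}(1-\gamma)^{2}+2\alpha_{t}(1-\alpha_{t})(1-\gamma)+(1-\alpha_{t})^{2}=[\alpha_{t}(1-\gamma)+(1-\alpha_{t})]^{2}=(1-\gamma\alpha_{t})^{2}=1-2\gamma\alpha_{t}+\gamma^{2}\alpha_{t}^{2},

together with dropping the nonpositive term $-\alpha_{t}^{2}\|\beta\nabla f(x^{*})\|^{2}$.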

We obtain from $\gamma\alpha_{t}\|x_{t}-x^{*}\|^{2}\geq 0$ that

(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}-\gamma\alpha_{t}\|x_{t}-x^{*}\|^{2}+\alpha_{t}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big)
\leq(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}+\alpha_{t}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big),

and, finally,

\|x_{t+1}-x^{*}\|^{2}\leq(1-\gamma\alpha_{t})\|x_{t}-x^{*}\|^{2}+\gamma\alpha_{t}\Big(\frac{\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle}{\gamma}\Big). \quad (43)
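
Inequality (43) has exactly the shape handled by Lemma 2, namely $a_{t+1}\leq(1-b_{t})a_{t}+b_{t}h_{t}$ with $b_{t}\in(0,1)$, $\sum_{t}b_{t}=\infty$, and $\limsup_{t}h_{t}\leq 0$ forcing $a_{t}\rightarrow 0$. As a purely illustrative sanity check of that mechanism (the sequences below are hypothetical choices of ours, not the paper's), one can simulate the worst case in which the recursion holds with equality:

import math

# Hypothetical sequences illustrating the Lemma 2 mechanism:
# a_{t+1} = (1 - b_t) a_t + b_t h_t with b_t = 1/(t+1) and h_t = 1/sqrt(t).
# Here b_t lies in (0,1), sum_t b_t diverges, and limsup_t h_t <= 0,
# so a_t is driven to 0 even though individual steps may raise it.
a = 1.0
for t in range(1, 200001):
    b = 1.0 / (t + 1)
    h = 1.0 / math.sqrt(t)
    a = (1.0 - b) * a + b * h
print(f"a after 200000 steps: {a:.3e}")  # decays toward 0, roughly like 2/sqrt(t)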

From Step 1, Step 2, (37), and the condition in Theorem 1 (a), we get

\lim_{t\rightarrow\infty}\big(\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle\big)\leq 0 \quad \textit{almost surely}. \quad (44)

Setting $a_{t}$, $b_{t}$, and $h_{t}$ in Lemma 2 as

a_{t}=\|x_{t}-x^{*}\|^{2},\quad b_{t}=\gamma\alpha_{t},\quad h_{t}=\frac{\gamma^{2}\alpha_{t}\|x_{t}-x^{*}\|^{2}-2\langle\beta\nabla f(x^{*}),x_{t+1}-x^{*}\rangle}{\gamma},

we get from (43), (44), and the condition in Theorem 1 (b) that

\lim_{t\rightarrow\infty}\|x_{t}-x^{*}\|^{2}=0 \quad \textit{almost surely}.

Therefore, $\{x_{t}\}_{t=0}^{\infty}$ converges almost surely to $x^{*}$.
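
To make the result concrete, the following minimal numerical sketch (our construction for illustration only; the quadratic costs, the bounded-confidence link rule, the Metropolis weights, and all constants are assumptions introduced here, not the paper's implementation) runs an update of the form $x_{t+1}=\alpha_{t}(x_{t}-\beta\nabla f(x_{t}))+(1-\alpha_{t})W_{t}(x_{t})\,x_{t}$, where $W_{t}(x_{t})$ is a doubly stochastic weight matrix over a random, state-dependent graph, on a toy problem whose solution is consensus on the average of the agents' data:

import numpy as np

rng = np.random.default_rng(0)

m = 6
c = rng.normal(size=m)      # local data; the constrained optimum is consensus on c.mean()
x = rng.normal(size=m)      # initial agent states
beta = 0.5                  # gradient scaling (illustrative value)
R = 10.0                    # confidence radius: a link requires |x_i - x_j| <= R

def state_dependent_weights(x):
    """Metropolis weights over a random graph whose edges are kept only when
    the incident states are within R of each other (a state-dependent rule)."""
    A = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < 0.5 and abs(x[i] - x[j]) <= R:
                A[i, j] = A[j, i] = 1.0
    deg = A.sum(axis=1)
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if A[i, j] == 1.0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W                # symmetric and doubly stochastic by construction

for t in range(1, 20001):
    alpha = 1.0 / (t + 1)   # diminishing steps: sum alpha_t = inf, alpha_t -> 0
    grad = x - c            # gradient of the separable costs f_i(x_i) = (x_i - c_i)^2 / 2
    x = alpha * (x - beta * grad) + (1 - alpha) * state_dependent_weights(x) @ x

print("final states :", np.round(x, 4))
print("average of c :", round(float(c.mean()), 4))  # the states should settle near this value

The Metropolis rule keeps each drawn $W_{t}$ symmetric and doubly stochastic regardless of which graph is realized, which is one simple way to obtain an averaging operator that fixes the consensus set, in the spirit of the quasi-nonexpansive random operator used above.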
