
An Asynchronous Approximate Distributed
Alternating Direction Method of Multipliers in Digraphs

Wei Jiang, Andreas Grammenos, Evangelia Kalyvianaki, and Themistoklis Charalambous

W. Jiang and T. Charalambous are with the Department of Electrical Engineering and Automation, School of Electrical Engineering, Aalto University, Espoo, Finland. Emails: {name.surname}@aalto.fi. A. Grammenos is with the Department of Computer Science and Technology, University of Cambridge, Cambridge, and the Alan Turing Institute, London, UK. Email: ag926@cl.cam.ac.uk. E. Kalyvianaki is with the Department of Computer Science and Technology, University of Cambridge, Cambridge, UK. Email: ek264@cl.cam.ac.uk.
Abstract

In this work, we consider the asynchronous distributed optimization problem in which each node has its own convex cost function and can communicate directly only with its neighbors, as determined by a directed communication topology (directed graph or digraph). First, we reformulate the optimization problem so that the Alternating Direction Method of Multipliers (ADMM) can be utilized. Then, we propose an algorithm, herein called Asynchronous Approximate Distributed Alternating Direction Method of Multipliers (AsyAD-ADMM), using finite-time asynchronous approximate ratio consensus, to solve the multi-node convex optimization problem, in which every node performs iterative computations and exchanges information with its neighbors asynchronously. More specifically, at every iteration of AsyAD-ADMM, each node solves a local convex optimization problem for one of the primal variables and utilizes a finite-time asynchronous approximate consensus protocol to obtain a value of the other primal variable that is close to the optimal one, since the cost function for the second primal variable is not decomposable. If the individual cost functions are convex but not necessarily differentiable, the proposed algorithm converges at a rate of $O(1/k)$, where $k$ is the iteration counter. The efficacy of AsyAD-ADMM is exemplified via a proof-of-concept distributed least-squares optimization problem, for which different performance-influencing factors are investigated.

Index Terms:
Distributed optimization, asynchronous ADMM, directed graphs, ratio consensus, finite-time consensus.

I Introduction

In this paper, we are interested in developing a modified ADMM algorithm for solving large-scale optimization problems in digraphs. In that direction, there are two main communication topologies considered: (i) the master-workers communication topology [1, 2, 3] and (ii) the multi-node communication topology [4, 5, 6, 7]. When the ADMM has a master-worker communication topology, the worker nodes optimize their local objectives and communicate their local variables to the master node, which updates the global optimization variable and sends it back to the workers. When the ADMM has no master node, the optimization problem is solved over a network of nodes. Herein, we focus on the ADMM realized on multi-node communication topologies.

There are several real-life applications in which synchronous operation is infeasible or costly and complicated. As a result, asynchronous information exchange is necessitated. For example, resource allocation in data centers gives rise to large-scale problems and networks, which naturally call for asynchronous solutions. Recently, different asynchronous distributed ADMM versions have been proposed under different assumptions and communication topologies. For example, the version in [1, 2] is based on the master-worker architecture under a star topology. Even though the worker updates are parallel and distributed, the master update is centralized, as it collects information from the workers and broadcasts the updated information back to them for their following iteration. The authors in [4, 5] proposed another version with almost sure convergence, using information from a randomly selected subset of nodes for the ADMM update, under the assumption that every node is selected with positive probability. Then, the authors in [3] combined the above two architectures and designed a master-local processor-link processor architecture with random node selection. It is worth noting that all the above asynchronous versions are available only for undirected communication graphs, i.e., they assume that every communication link is bidirectional. Very recently, distributed ADMM versions for digraphs were proposed in [6, 7]. However, they are synchronous, which is the motivation for this work.

In this paper, we propose AsyAD-ADMM, an asynchronous distributed ADMM algorithm for digraphs. First, the optimization problem is restructured such that, at every optimization step, each node can individually solve a local convex optimization problem for one of the primal variables, while for the other primal variable communication among nodes is required, since its cost function is not decomposable. Then, we find the solution for the primal variable whose cost function is non-decomposable by means of a finite-time asynchronous approximate consensus algorithm. More specifically, we adopt the algorithm introduced in [8, Algorithm 2], which relies on asynchronous $\min$-consensus and $\max$-consensus protocols to compute the approximate average consensus value, considering different communication delays, over a finite number of steps. Unlike [8], here the consensus algorithm must be repeated at every optimization step, which is itself asynchronous. Towards this end, we propose a mechanism in which every next optimization step is initiated after the previous one has finished. The main contributions of the paper are as follows.

• We propose an asynchronous distributed ADMM algorithm which can be applied to digraphs.

• The convergence analysis shows that AsyAD-ADMM converges at a rate of $O(1/k)$, where $k$ is the iteration counter. Factors influencing the algorithm's performance are also investigated.

II Preliminaries

II-A Notation and graph theory

The set of real (non-negative integer, positive integer) numbers is denoted by $\mathds{R}$ ($\mathds{Z}$, $\mathds{N}$) and $\mathds{R}^{n}_{+}$ denotes the non-negative orthant of the $n$-dimensional real space $\mathds{R}^{n}$. $A^{T}$ denotes the transpose of matrix $A$. For $A\in\mathds{R}^{n\times n}$, $a_{ij}$ denotes the entry in row $i$ and column $j$. By $\mathds{1}$ we denote the all-ones vector and by $I$ we denote the identity matrix (of appropriate dimensions). $\|\cdot\|$ denotes the 2-norm.

In multi-node systems with fixed communication links (edges), the exchange of information between nodes can be conveniently captured by a graph $\mathcal{G}(\mathcal{V},\mathcal{E})$ of order $n$ ($n\geq 2$), where $\mathcal{V}=\{v_{1},v_{2},\ldots,v_{n}\}$ is the set of nodes and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of edges. A directed edge from node $v_{i}$ to node $v_{j}$ is denoted by $\varepsilon_{ji}=(v_{j},v_{i})\in\mathcal{E}$ and represents a communication link that allows node $v_{j}$ to receive information from node $v_{i}$. A graph is said to be undirected if and only if $\varepsilon_{ji}\in\mathcal{E}$ implies $\varepsilon_{ij}\in\mathcal{E}$. A digraph is called strongly connected if there exists a path from each vertex $v_{i}$ of the graph to each vertex $v_{j}$ ($v_{j}\neq v_{i}$). In other words, for any $v_{j},v_{i}\in\mathcal{V}$, $v_{j}\neq v_{i}$, one can find a sequence of nodes $v_{i}=v_{l_{1}},v_{l_{2}},v_{l_{3}},\ldots,v_{l_{m}}=v_{j}$ such that link $(v_{l_{s+1}},v_{l_{s}})\in\mathcal{E}$ for all $s=1,2,\ldots,m-1$. The diameter $D$ of a graph is the longest shortest path between any two nodes in the network.

All nodes that can transmit information to node $v_{j}$ directly are said to be in-neighbors of node $v_{j}$ and belong to the set $\mathcal{N}^{-}_{j}=\{v_{i}\in\mathcal{V}\,|\,\varepsilon_{ji}\in\mathcal{E}\}$. The cardinality of $\mathcal{N}^{-}_{j}$ is called the in-degree of $v_{j}$ and is denoted by $\mathcal{D}^{-}_{j}=|\mathcal{N}^{-}_{j}|$. The nodes that receive information from node $v_{j}$ belong to the set of out-neighbors of node $v_{j}$, denoted by $\mathcal{N}^{+}_{j}=\{v_{l}\in\mathcal{V}\,|\,\varepsilon_{lj}\in\mathcal{E}\}$. The cardinality of $\mathcal{N}^{+}_{j}$ is called the out-degree of $v_{j}$ and is denoted by $\mathcal{D}^{+}_{j}=|\mathcal{N}^{+}_{j}|$.

In the type of algorithms we consider, we associate a positive weight $p_{ji}$ with each edge $\varepsilon_{ji}\in\mathcal{E}\cup\{(v_{j},v_{j})\,|\,v_{j}\in\mathcal{V}\}$. The nonnegative matrix $P=[p_{ji}]\in\mathds{R}_{+}^{n\times n}$ is a weighted adjacency matrix that has zero entries at locations that do not correspond to directed edges (or self-edges) in the graph.

II-B Standard ADMM Algorithm

The standard ADMM algorithm solves the problem:

$$\min\; f(x)+g(z)\quad\text{s.t.}\quad Ax+Bz=c \tag{1}$$

for variables $x\in\mathds{R}^{p}$, $z\in\mathds{R}^{m}$ with matrices $A\in\mathds{R}^{q\times p}$, $B\in\mathds{R}^{q\times m}$ and vector $c\in\mathds{R}^{q}$ ($p,m,q\in\mathds{N}$). The augmented Lagrangian is

$$L_{\rho}(x,z,\lambda)=f(x)+g(z)+\lambda^{T}(Ax+Bz-c)+\frac{\rho}{2}\|Ax+Bz-c\|^{2}, \tag{2}$$

where $\lambda$ is the Lagrange multiplier and $\rho>0$ is a penalty parameter. In ADMM, the primal variables $x,z$ and the Lagrange multiplier $\lambda$ are updated as follows: starting from some initial vector $\begin{bmatrix}x^{0}&z^{0}&\lambda^{0}\end{bmatrix}^{T}$, at each optimization iteration $k$,

$$x^{k+1}=\operatorname*{argmin}_{x}L_{\rho}(x,z^{k},\lambda^{k}), \tag{3}$$
$$z^{k+1}=\operatorname*{argmin}_{z}L_{\rho}(x^{k+1},z,\lambda^{k}), \tag{4}$$
$$\lambda^{k+1}=\lambda^{k}+\rho(Ax^{k+1}+Bz^{k+1}-c). \tag{5}$$

The step-size in the Lagrange multiplier update is the same as the augmented Lagrangian function parameter $\rho$.
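To make the update order concrete, the following is a minimal sketch of iterations (3)-(5) on a toy instance of problem (1) with scalar quadratics; the choice $f(x)=(x-a)^{2}/2$, $g(z)=(z-b)^{2}/2$, $A=1$, $B=-1$, $c=0$ and all variable names are illustrative assumptions, not taken from the paper.

```python
# Toy instance of (1): f(x) = (x - a)^2/2, g(z) = (z - b)^2/2,
# A = 1, B = -1, c = 0, so the constraint is x = z and the
# optimum is x* = z* = (a + b)/2. All values are illustrative.
a, b, rho = 4.0, 10.0, 1.0
x = z = lam = 0.0

for k in range(100):
    # (3): closed form of argmin_x L_rho(x, z, lam)
    x = (a - lam + rho * z) / (1.0 + rho)
    # (4): closed form of argmin_z L_rho(x, z, lam)
    z = (b + lam + rho * x) / (1.0 + rho)
    # (5): dual ascent step with step size rho
    lam += rho * (x - z)

print(x, z)  # both approach (a + b)/2 = 7.0
```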

III Problem Formulation

In this work, we consider a strongly connected digraph $\mathcal{G}(\mathcal{V},\mathcal{E})$ in which each node $v_{i}\in\mathcal{V}$ is endowed with a scalar cost function $f_{i}:\mathds{R}^{p}\mapsto\mathds{R}$, assumed to be known to that node only. We assume that each node $v_{i}$ has knowledge of the number of its out-going links, $\mathcal{D}^{+}_{i}$, and has access to local information only, via communication with its in-neighboring nodes $\mathcal{N}^{-}_{i}$. The only global information available to all the nodes in the network is given in Assumption 1.

Assumption 1

The diameter of the network $D$, or an upper bound on it, is known to all nodes.

While Assumption 1 is limiting, there exist distributed methods for extracting such information; see, e.g., [9, 10].

The objective is to design a discrete-time coordination algorithm that allows every node $v_{i}$ in a digraph to solve, in a distributed and asynchronous fashion, the following global optimization problem:

$$\min_{x\in\mathds{R}^{p}}\sum_{i=1}^{n}f_{i}(x), \tag{6}$$

where $x\in\mathds{R}^{p}$ is a global optimization variable (or a common decision variable). In order to solve problem (6) while enjoying the structure of the ADMM scheme, a separate decision variable $x_{i}$ for node $v_{i}$ is introduced and the constraint $\|x_{i}-x_{j}\|\leq\epsilon$, $\epsilon>0$, $\forall v_{i},v_{j}\in\mathcal{V}$, is imposed to allow the asynchronous distributed ADMM framework. Thus, problem (6) is reformulated as

$$\min_{x_{i},\,i=1,\ldots,n}\;\sum_{i=1}^{n}f_{i}(x_{i}), \tag{7}$$
$$\text{s.t.}\quad\|x_{i}-x_{j}\|\leq\epsilon,\;\forall v_{i},v_{j}\in\mathcal{V}, \tag{8}$$

where $\epsilon$ is a predefined error tolerance. Define a closed nonempty convex set $\mathcal{C}$ as

$$\mathcal{C}=\left\{\begin{bmatrix}x_{1}^{T}&x_{2}^{T}&\ldots&x_{n}^{T}\end{bmatrix}^{T}\in\mathds{R}^{np}\,:\,\|x_{i}-x_{j}\|\leq\epsilon\right\}. \tag{9}$$

By denoting $X\coloneqq\begin{bmatrix}x_{1}^{T}&x_{2}^{T}&\ldots&x_{n}^{T}\end{bmatrix}^{T}$ and introducing a variable $z\in\mathds{R}^{np}$ as a copy of vector $X$, constraint (8) of problem (7) becomes

$$\text{s.t.}\quad X=z,\;z\in\mathcal{C}. \tag{10}$$

Then, define $g(z)$ as the indicator function of set $\mathcal{C}$:

$$g(z)=\begin{cases}0,&\text{if }z\in\mathcal{C},\\ \infty,&\text{otherwise}.\end{cases} \tag{11}$$

Finally, problem (7) with constraint (10) is transformed to

$$\min_{z,\,x_{i},\,i=1,\ldots,n}\;\left\{\sum_{i=1}^{n}f_{i}(x_{i})+g(z)\right\}\quad\text{s.t.}\quad X-z=0. \tag{12}$$

For notational convenience, denote $F(X)\coloneqq\sum_{i=1}^{n}f_{i}(x_{i})$. Thus, the Lagrangian function is

$$L(X,z,\lambda)=F(X)+g(z)+\lambda^{T}(X-z), \tag{13}$$

where $\lambda\in\mathds{R}^{np}$ is the Lagrange multiplier. The following standard assumptions are required for problem (12).

Assumption 2

Each cost function $f_{i}:\mathds{R}^{p}\rightarrow\mathds{R}\cup\{+\infty\}$ is closed, proper and convex.

Assumption 3

The Lagrangian $L(X,z,\lambda)$ has a saddle point, i.e., there exists a solution $(X^{*},z^{*},\lambda^{*})$ for which

$$L(X^{*},z^{*},\lambda)\leq L(X^{*},z^{*},\lambda^{*})\leq L(X,z,\lambda^{*}) \tag{14}$$

holds for all $X\in\mathds{R}^{np}$, $z\in\mathds{R}^{np}$ and $\lambda\in\mathds{R}^{np}$.

Assumption 2 allows $f_{i}$ to be non-differentiable [11]. By Assumptions 2-3 and based on the definition of $g(z)$ in (11), $L(X,z,\lambda^{*})$ is convex in $(X,z)$ and $(X^{*},z^{*})$ is a solution to problem (12) [11, 12].

At iteration $k$, the corresponding augmented Lagrangian of optimization problem (12) is written as

$$\begin{aligned} L_{\rho}(X^{k},z^{k},\lambda^{k})&=\sum_{i=1}^{n}f_{i}(x_{i}^{k})+g(z^{k})+{\lambda^{k}}^{T}(X^{k}-z^{k})+\frac{\rho}{2}\|X^{k}-z^{k}\|^{2}\\ &=\sum_{i=1}^{n}\left(f_{i}(x_{i}^{k})+{\lambda_{i}^{k}}^{T}(x_{i}^{k}-z_{i}^{k})+\frac{\rho}{2}\|x_{i}^{k}-z_{i}^{k}\|^{2}\right)+g(z^{k}), \end{aligned} \tag{15}$$

where $z_{i}\in\mathds{R}^{p}$ is the $i$-th element of vector $z$. By ignoring terms which are independent of the minimization variables (i.e., $x_{i},z$), for each node $v_{i}$, the standard ADMM updates (3)-(5) take the following form:

$$x_{i}^{k+1}=\operatorname*{argmin}_{x_{i}}\;f_{i}(x_{i})+{\lambda_{i}^{k}}^{T}x_{i}+\frac{\rho}{2}\|x_{i}-z_{i}^{k}\|^{2}, \tag{16}$$
$$z^{k+1}=\operatorname*{argmin}_{z}\;g(z)+{\lambda^{k}}^{T}(X^{k+1}-z)+\frac{\rho}{2}\|X^{k+1}-z\|^{2}=\operatorname*{argmin}_{z}\;g(z)+\frac{\rho}{2}\left\|X^{k+1}-z+\frac{1}{\rho}\lambda^{k}\right\|^{2}, \tag{17}$$
$$\lambda_{i}^{k+1}=\lambda_{i}^{k}+\rho(x_{i}^{k+1}-z_{i}^{k+1}), \tag{18}$$

where the last expression in (17) comes from the identity $2a^{T}b+\|b\|^{2}=\|a+b\|^{2}-\|a\|^{2}$ with $a=\lambda^{k}/\rho$ and $b=X^{k+1}-z$.

Update (16) for $x_{i}^{k+1}$ can be solved by a classical method, e.g., the proximity operator [11, Section 4]. Update (18) for the dual variable $\lambda_{i}^{k+1}$ can be implemented trivially by node $v_{i}$. Note that both updates can be done independently and in parallel by node $v_{i}$.
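For instance, completing the square in (16) shows that the $x$-update is exactly a proximal step, $x_{i}^{k+1}=\operatorname{prox}_{(1/\rho)f_{i}}(z_{i}^{k}-\lambda_{i}^{k}/\rho)$. A minimal sketch follows, using $f_{i}(x)=\|x\|_{1}$ as an illustrative non-differentiable cost consistent with Assumption 2 (not the cost used in this paper):

```python
import numpy as np

def soft_threshold(v, t):
    # Proximity operator of t * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def x_update(z_i, lam_i, rho):
    # Update (16) as a proximal step: completing the square gives
    # x_i^{k+1} = prox_{(1/rho) f_i}(z_i^k - lam_i^k / rho).
    return soft_threshold(z_i - lam_i / rho, 1.0 / rho)
```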

Since $g$ in (11) is the indicator function of the closed nonempty convex set $\mathcal{C}$, update (17) for $z^{k+1}$ becomes

$$z^{k+1}=\Pi_{\mathcal{C}}(X^{k+1}+\lambda^{k}/\rho), \tag{19}$$

where $\Pi_{\mathcal{C}}$ denotes the projection (in the Euclidean norm) onto $\mathcal{C}$. Intuitively, from (17) and the definition of $g(z)$ in (11), one can see that the elements of $z$ (i.e., $z_{1},z_{2},\ldots,z_{n}$) should enter $\mathcal{C}$ in finite time; otherwise, $g(z)=\infty$ and update (17) can never be completed. Then, from the definition of $\mathcal{C}$ in (9), $z$ entering $\mathcal{C}$ means $\|z_{i}-z_{j}\|\leq\epsilon$, $\forall v_{i},v_{j}\in\mathcal{V}$.

It is worth noting that if $z_{i}-z_{j}=0$, $\forall v_{i},v_{j}\in\mathcal{V}$, then $z_{1}=z_{2}=\ldots=z_{n}$, which is the mathematical form of consensus. In other words, each node $v_{i}\in\mathcal{V}$ can have $z_{i}$ reach $\frac{1}{n}\sum_{i=1}^{n}z_{i}(0)$, with $z_{i}(0)=x_{i}^{k+1}+\lambda_{i}^{k}/\rho$, in a finite number of steps, i.e., finite-time average consensus. Similarly, requiring $\|z_{i}-z_{j}\|\leq\epsilon$, $\forall v_{i},v_{j}\in\mathcal{V}$, means asking nodes $v_{i},v_{j}\in\mathcal{V}$ to bring $z_{i},z_{j}$ within $\epsilon$ of the average consensus value, i.e., to have $z_{i},z_{j}$ enter a ball centered at $\frac{1}{n}\sum_{i=1}^{n}(x_{i}^{k+1}+\lambda_{i}^{k}/\rho)$ with radius $\epsilon$. Here we call this finite-time "approximate" ratio consensus.
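To make the target of the $z$ update concrete, here is a minimal centralized reference sketch: with $\epsilon=0$, the Euclidean projection onto the consensus set assigns every node the mean of the values $z_{i}(0)=x_{i}^{k+1}+\lambda_{i}^{k}/\rho$, which the distributed algorithm of Sec. V reproduces to within $\epsilon$. The shapes and values below are illustrative assumptions.

```python
import numpy as np

# Centralized reference for the z-update target: with epsilon = 0, projecting
# onto the consensus subspace {z : z_1 = ... = z_n} in the Euclidean norm
# assigns each z_i the mean of the inputs z_i(0) = x_i^{k+1} + lam_i^k / rho.
n, p, rho = 5, 3, 1.0                   # illustrative sizes
x_next = np.random.randn(n, p)          # stand-ins for x_i^{k+1}
lam = np.random.randn(n, p)             # stand-ins for lam_i^k
target = np.mean(x_next + lam / rho, axis=0)  # common value all z_i approach
```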

IV Asynchronous operation and consensus algorithms

Before we proceed to the description of our main algorithm, we describe the form of asynchrony that the nodes experience and some consensus algorithms that are key ingredients for the operation of our proposed algorithms.

IV-A Asynchrony description

Let $t(0)\in\mathds{R}_{+}$ be the time at which the iterations for the optimization start. We assume that there is a set of times $\mathcal{T}=\{t(1),t(2),t(3),\ldots\}$ at which one or more nodes transmit some value to their neighbors. In a synchronous setting, each node $v_{i}$ updates and sends its information to its neighbors at discrete times in $\mathcal{T}$ and no processing or communication delays are considered. In an asynchronous setting, a message that is received at time $t(k_{1})$ and processed at time $t(k_{2})$, $k_{2}>k_{1}$, experiences a processing delay of $t(k_{2})-t(k_{1})$ (or a time-index delay $k_{2}-k_{1}$). In Fig. 1, we show through a simple example how the time steps evolve for each node in the network; by $t_{i}(k)$ we denote the time step at which iteration $k$ takes place for node $v_{i}$.

Figure 1: A simple example of a network consisting of 3 nodes. In the timeline of each node, blue ticks indicate an iteration of node $v_{i}$ and the arrows indicate transmissions. The time between transmissions is the processing delay, while the time from the beginning of a transmission to its end (arrow) is the transmission delay.

We index nodes' information states and any other information at time $t(k)$ by time-index $k$. Hence, we use $x_{i}^{k}\equiv x_{i}[k]\in\mathds{R}$ to denote the information state of node $v_{i}$ at time $t(k)$. Note that ${x_{i}^{k}}^{T}$ denotes (is equivalent to) $x_{i}[k]^{T}$.

Assumption 4

There exists an upper bound $B\in\mathds{N}$ on the number of time-index steps needed for a node to process the information received from other nodes.

Assumption 4 basically states that, since the number of nodes is finite and they update their states regularly, there exists a finite number of steps $B$ within which all nodes have updated their values (and hence broadcast them to neighboring nodes). However, it is not possible for nodes to count the number of steps elapsed in the network. For this reason, we make an additional assumption.

Assumption 5

There exists an upper bound $T\in\mathds{R}_{\geq 0}$ on the actual time in seconds needed for a node to process the information received from other nodes.

In other words, the upper bound of $B$ steps in Assumption 4 is translated to an upper bound of $T$ seconds, something that the nodes can count individually.

IV-A1 Asynchronous ratio consensus

Here, we regard the asynchronous information exchange among nodes as delayed information. Thus, we adopt a protocol developed in [13] where each node updates its information state $x_{j}^{k+1}$ by combining the delayed information received from its neighbors, $x_{i}^{s}$ ($s\in\mathds{Z}$, $s\leq k$, $v_{i}\in\mathcal{N}^{-}_{j}$), using constant positive weights $p_{ji}$. The integer $\tau_{ji}^{k}\geq 0$ represents the delay of a message sent from node $v_{i}$ to node $v_{j}$ at time instant $k$. We require that $0\leq\tau_{ji}^{k}\leq\bar{\tau}_{ji}\leq\bar{\tau}$ for all $k\geq 0$, for some finite $\bar{\tau}=\max\{\bar{\tau}_{ji}\}$, $\bar{\tau}\in\mathds{N}$, and $1+\bar{\tau}\leq B$, where $B$ is defined in Assumption 4. We make the reasonable assumption that $\tau_{jj}^{k}=0$, $\forall v_{j}\in\mathcal{V}$, at all time instances $k$ (i.e., the own value of a node is always available without delay). Each node updates its information state according to:

$$y_{j}^{k+1}=p_{jj}y_{j}^{k}+\sum_{v_{i}\in\mathcal{N}^{-}_{j}}\sum_{r=0}^{\bar{\tau}}p_{ji}y_{i}^{k-r}I_{k-r,ji}(r), \tag{20}$$

for $k\geq 0$, where $y_{j}^{0}\in\mathds{R}$ is the initial state of node $v_{j}$; the weights $p_{ji}$, $\forall\varepsilon_{ji}\in\mathcal{E}$, form $P=[p_{ji}]$, which adheres to the graph structure and is primitive and column stochastic; and

$$I_{k,ji}(\tau)=\begin{cases}1,&\text{if }\tau_{ji}^{k}=\tau,\\ 0,&\text{otherwise.}\end{cases} \tag{21}$$
Lemma 1

[13, Lemma 2] Consider a strongly connected digraph $\mathcal{G}(\mathcal{V},\mathcal{E})$. Let $y_{j}^{k}$ and $w_{j}^{k}$ (for all $v_{j}\in\mathcal{V}$ and $k=0,1,2,\ldots$) be the result of iteration (20) and

$$w_{j}^{k+1}=p_{jj}w_{j}^{k}+\sum_{v_{i}\in\mathcal{N}^{-}_{j}}\sum_{r=0}^{\bar{\tau}}p_{ji}w_{i}^{k-r}I_{k-r,ji}(r), \tag{22}$$

where $p_{lj}=\frac{1}{1+\mathcal{D}_{j}^{+}}$ for $v_{l}\in\mathcal{N}_{j}^{+}\cup\{v_{j}\}$ (zeros otherwise), with $y^{0}=[{y_{1}^{0}}^{T},{y_{2}^{0}}^{T},\ldots,{y_{|\mathcal{V}|}^{0}}^{T}]^{T}$ and $w^{0}=\mathds{1}$; $I_{k,ji}$ is the indicator function that captures the bounded delay $\tau_{ji}^{k}$ on link $(v_{j},v_{i})$ at iteration $k$ (as defined in (21), $\tau_{ji}^{k}\leq\bar{\tau}$). Then, the solution $z_{j}^{k}={y_{j}^{k}}/{w_{j}^{k}}$ to the average consensus problem is asymptotically obtained as $\lim_{k\rightarrow\infty}z_{j}^{k}=\frac{\sum_{v_{\ell}\in\mathcal{V}}y_{\ell}^{0}}{|\mathcal{V}|}$, $\forall v_{j}\in\mathcal{V}$.
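As an illustration, the following is a minimal simulation sketch of the delayed iterations (20) and (22) on a hypothetical 4-node strongly connected digraph with i.i.d. integer delays in $\{0,\ldots,\bar{\tau}\}$; the graph, initial values and all names are assumptions made for this example.

```python
import random
from collections import defaultdict

# Hypothetical 4-node strongly connected digraph: out_neighbors[j] lists the
# nodes that receive from node j. Delays are i.i.d. in {0, ..., tau_bar}.
out_neighbors = {0: [1], 1: [2], 2: [3, 0], 3: [0]}
n, tau_bar, steps = 4, 2, 300
y = [3.0, -1.0, 5.0, 1.0]   # y_j^0: the values to be averaged
w = [1.0] * n               # w_j^0 = 1
inbox = defaultdict(list)   # inbox[(k, l)]: messages arriving at node l at step k

for k in range(steps):
    for j in range(n):
        p = 1.0 / (1 + len(out_neighbors[j]))   # p_lj, column-stochastic weights
        for l in out_neighbors[j]:
            delay = random.randint(0, tau_bar)  # bounded delay tau <= tau_bar
            inbox[(k + delay, l)].append((p * y[j], p * w[j]))
        y[j], w[j] = p * y[j], p * w[j]         # own (undelayed) share p_jj
    for j in range(n):
        for dy, dw in inbox.pop((k, j), []):    # delayed terms of (20) and (22)
            y[j] += dy
            w[j] += dw

print([y[j] / w[j] for j in range(n)])  # ratios approach (3 - 1 + 5 + 1)/4 = 2.0
```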

IV-A2 Asynchronous max\max-consensus

When the updates are asynchronous, for any node $v_{j}\in\mathcal{V}$, the update rule is as follows [14]:

$$y_{j}[t_{j}(k+1)]=\max_{v_{i}\in\mathcal{N}_{j}^{-}[t_{j}(k+1)]\cup\{v_{j}\}}\{y_{i}[t_{j}(k)+\theta_{ij}(k)]\},$$

where $y_{i}[t_{j}(k)+\theta_{ij}(k)]$ are the states of the in-neighbors $\mathcal{N}_{j}^{-}[t_{j}(k+1)]$ available at the time of the update. The variable $\theta_{ij}(k)\in\mathds{R}$, evaluated with respect to the update time $t_{j}(k)$ (defined in Sec. IV-A), is used here to express asynchronous state updates occurring at the neighbors of node $v_{j}$ between two consecutive updates of the state of node $v_{j}$. It has been shown in [14, Lemma 5.1] that this algorithm converges to the maximum value among all nodes in a finite number of steps $s$, $s\leq BD$, where $D$ and $B$ are defined in Assumptions 1 and 4, respectively.
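A one-line sketch of this update for node $v_{j}$ follows; the function name is an illustrative assumption, and in an actual asynchronous run the neighbor states may be outdated copies.

```python
def max_consensus_update(y, in_neighbors, j):
    # Node v_j replaces its state by the max over itself and the latest
    # (possibly stale) states received from its in-neighbors; under
    # Assumptions 1 and 4, all nodes reach max_i y[i] within B*D steps.
    return max(y[i] for i in in_neighbors[j] + [j])
```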

V Main results

Our proposed algorithm works on two levels: i) at the optimization level, in which the distributed ADMM procedure is described for updating $x_{i}^{k+1}$, $z_{i}^{k+1}$ and $\lambda_{i}^{k+1}$ (Algorithm 1), and ii) at the distributed coordination level, for computing $z_{i}^{k+1}$ in an asynchronous manner (Algorithm 2). Algorithm 2 is executed within Algorithm 1.

V-A Algorithm 1

Unlike the asynchronous operation depicted in Fig. 1, in a synchronous algorithm all nodes need to agree on the update time $t(k)$, which usually requires synchronization among all nodes or the existence of a global clock. The question of whether the distributed ADMM update is synchronous or asynchronous arises not only when exchanging information for computing $z^{k+1}$ using (17), but also when the nodes start the next optimization step (i.e., when optimization step $k$ ends and step $k+1$ begins). For now, we assume that the transition between optimization steps is somehow synchronized; we discuss ways to achieve this later.

Our AsyAD-ADMM algorithm for solving optimization problem (12) is summarized in Algorithm 1. In Algorithm 1, at every optimization step, each node $v_{i}$ does the following:

• It computes $x_{i}^{k+1}$ using Eq. (16) locally.

• It computes $z_{i}^{k+1}$ via Algorithm 2 in an asynchronous fashion and in a finite number of steps, since Algorithm 2 deploys a termination mechanism.

• It computes $\lambda_{i}^{k+1}$ using Eq. (18) and the values of $x_{i}^{k+1}$ and $z_{i}^{k+1}$.

For the optimization, we assume that all nodes are aware of the network diameter $D$ (or an upper bound on $D$), an upper bound on time-steps $1+\bar{\tau}\leq B$, the augmented Lagrangian function parameter $\rho$, and the maximum number of ADMM optimization steps $k_{\max}$ (note that the ADMM stopping criterion is the primal and dual feasibility condition in [11]).

Algorithm 1 AsyAD-ADMM: Asynchronous Approximate Distributed ADMM
1: Input: A strongly connected digraph $\mathcal{G}(\mathcal{V},\mathcal{E})$, in which each node $v_{i}\in\mathcal{V}$ knows its out-degree $\mathcal{D}_{i}^{+}$; $\rho>0$; network diameter $D$ (or an upper bound); upper bound on the number of delay time-steps $\bar{\tau}$; error tolerance $\epsilon$; $k_{\max}$ (ADMM maximum number of iterations)
2: Initialization: At $k=0$, each node $v_{i}\in\mathcal{V}$ sets $x_{i}^{0}$, $z_{i}^{0}$, and $\lambda_{i}^{0}$ randomly.
3: Node $v_{i}\in\mathcal{V}$ does the following:
4: while $k\leq k_{\max}$ do
5:   Compute $x_{i}^{k+1}$ using Eq. (16)
6:   Compute $z_{i}^{k+1}$ via Algorithm 2 by setting $y_{i}^{0}=x_{i}^{k+1}+\lambda_{i}^{k}/\rho$ and using $D$, $\bar{\tau}$, and $\epsilon$.
7:   Compute $\lambda_{i}^{k+1}$ using Eq. (18)
8:   if ADMM stopping criterion is satisfied then
9:     Stop AsyAD-ADMM
10:  end if
11:  $k\leftarrow k+1$
12: end while

V-B Algorithm 2

Under Assumption 1, Cady et al. [15] proposed an algorithm which is based on the synchronous ratio-consensus protocol [16] and takes advantage of synchronous $\min$- and $\max$-consensus iterations to allow the nodes to determine the time step $k_{0}$ at which their ratios (i.e., $\{z_{i}^{k_{0}}\,|\,v_{i}\in\mathcal{V}\}$ in ADMM) are within $\epsilon$ of each other. However, the algorithm in [16] cannot deal with asynchronous information exchange scenarios. To circumvent this problem, and inspired by Cady et al. [15], the authors in [17] adopted the robustified ratio consensus proposed in [13] to devise a termination mechanism for average consensus with delays. Similarly, [8] proposed the corresponding asynchronous termination algorithm in the context of a quadratic distributed optimization problem, in which the algorithm is executed only once. We adopt the asynchronous termination algorithm proposed in [8] to compute $z_{i}^{k+1}$, and we extend it to accommodate the repetition of this algorithm. The algorithm, under Assumption 4, is summarized in Algorithm 2. Specifically, Algorithm 2 makes use of the following ideas:

• Each node $v_{j}$ runs the asynchronous ratio consensus of Sec. IV-A1; in our case, with initial conditions $y_{j}^{0}=x_{j}^{k+1}+\lambda_{j}^{k}/\rho$.

• At the same time, each node maintains two auxiliary states, $m_{j}^{k}$ and $M_{j}^{k}$, which are updated using the asynchronous $\min$- and $\max$-consensus of Sec. IV-A2, respectively. Both converge in $(1+\bar{\tau})D$ steps [14].

• Every $(1+\bar{\tau})D$ steps, each node checks whether $\|M_{j}^{k}-m_{j}^{k}\|<\epsilon$. If this is the case, then the ratios of all nodes are close to the asymptotic value and it stops iterating. Otherwise, $m_{j}^{k}$ and $M_{j}^{k}$ are re-initialized to $z_{j}^{k}$.

Algorithm 2 Distributed Finite-Time Termination for Asynchronous Ratio Consensus
1: Input: A strongly connected digraph $\mathcal{G}(\mathcal{V},\mathcal{E})$, in which each node $v_{j}\in\mathcal{V}$ knows its out-degree $\mathcal{D}_{j}^{+}$; $y_{j}^{0}$; $w_{j}^{0}=1$; $D$, $\bar{\tau}$, $\epsilon$.
2: set $M_{j}^{0}=+\infty$, $m_{j}^{0}=-\infty$, ${\rm flag}_{j}^{0}=0$, $z_{j}=\frac{y_{j}^{0}}{w_{j}^{0}}$
3: set $p_{lj}=\frac{1}{1+\mathcal{D}_{j}^{+}}$, $\forall v_{l}\in\mathcal{N}_{j}^{+}\cup\{v_{j}\}$ (zero otherwise)
4: for $k\geq 0$ do
5:   while ${\rm flag}_{j}^{k}=0$ do
6:     if $k \bmod (1+\bar{\tau})D=0$ and $k\neq 0$ then
7:       if $\|M_{j}^{k}-m_{j}^{k}\|<\epsilon$ then
8:         set ${\rm flag}_{j}^{k}=1$
9:       end if
10:      set $M_{j}^{k}=m_{j}^{k}=z_{j}^{k}=\frac{y_{j}^{k}}{w_{j}^{k}}$
11:    end if
12:    broadcast to all $v_{l}\in\mathcal{N}_{j}^{+}$: $p_{lj}y_{j}^{k}$, $p_{lj}w_{j}^{k}$, $M_{j}^{k}$, $m_{j}^{k}$
13:    receive from all $v_{i}\in\mathcal{N}_{j}^{-}[k]$: $p_{ji}y_{i}^{k}$, $p_{ji}w_{i}^{k}$, $M_{i}^{k}$, $m_{i}^{k}$
14:    compute
15:    $y_{j}^{k}\leftarrow p_{jj}y_{j}^{k}+\sum_{v_{i}\in\mathcal{N}^{-}_{j}}\sum_{r=0}^{\bar{\tau}}p_{ji}y_{i}^{k-r}I_{k-r,ji}(r)$
16:    $w_{j}^{k}\leftarrow p_{jj}w_{j}^{k}+\sum_{v_{i}\in\mathcal{N}^{-}_{j}}\sum_{r=0}^{\bar{\tau}}p_{ji}w_{i}^{k-r}I_{k-r,ji}(r)$
17:    $M_{j}^{k}\leftarrow\max_{v_{i}\in\mathcal{N}_{j}^{-}\cup\{v_{j}\}}\{M_{i}[t_{j}(k)+\theta_{ij}(k)]\}$
18:    $m_{j}^{k}\leftarrow\min_{v_{i}\in\mathcal{N}_{j}^{-}\cup\{v_{j}\}}\{m_{i}[t_{j}(k)+\theta_{ij}(k)]\}$
19:  end while
20: end for

As stated in Assumption 5, the upper bound of $B$ time-steps is guaranteed to be executed within $T$ seconds (actual time). Hence, the nodes have a looser synchronization of every $DT$ seconds. The $\max$- and $\min$-consensus values are checked every $DT$ seconds and, if they have converged at one of these checks, the next optimization step is initiated by all nodes at the next multiple of $DT$ seconds after the termination of the algorithm. This approach requires some loose form of synchronization, but on a much coarser time scale.
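A compact sketch of the termination logic (steps 6-11 of Algorithm 2) for node $v_{j}$ follows; the function name and its state-passing style are illustrative assumptions.

```python
def check_and_reset(k, M_j, m_j, y_j, w_j, D, tau_bar, eps):
    # Every (1 + tau_bar) * D steps: if the max- and min-consensus states
    # agree to within eps, all ratios z_i = y_i / w_i are within eps of each
    # other and node v_j may stop; either way, both states are re-initialized
    # to the current ratio z_j^k = y_j / w_j (step 10 of Algorithm 2).
    if k % ((1 + tau_bar) * D) == 0 and k != 0:
        done = abs(M_j - m_j) < eps
        M_j = m_j = y_j / w_j
        return done, M_j, m_j
    return False, M_j, m_j
```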

V-C Convergence analysis

The following Theorem 1 states that AsyAD-ADMM has an $O(\frac{1}{k})$ convergence rate.

Theorem 1 (Convergence)

Let $\{X^{k},z^{k},\lambda^{k}\}$ be the iterates of the AsyAD-ADMM algorithm for problem (12), where $X^{k}=[{x_{1}^{k}}^{T},{x_{2}^{k}}^{T},\ldots,{x_{n}^{k}}^{T}]^{T}$ and $\lambda^{k}=[{\lambda_{1}^{k}}^{T},{\lambda_{2}^{k}}^{T},\ldots,{\lambda_{n}^{k}}^{T}]^{T}$. Let $\bar{X}^{k}=\frac{1}{k}\sum_{s=0}^{k-1}X^{s+1}$ and $\bar{z}^{k}=\frac{1}{k}\sum_{s=0}^{k-1}z^{s+1}$ be the ergodic averages of $X^{k}$ and $z^{k}$, respectively. Considering a strongly connected communication graph, under Assumptions 1-5, the following relationship holds for any iteration $k$:

$$0\leq L(\bar{X}^{k},\bar{z}^{k},\lambda^{*})-L(X^{*},z^{*},\lambda^{*})\leq\frac{1}{k}\left(\frac{1}{2\rho}\|\lambda^{*}-\lambda^{0}\|^{2}+\frac{\rho}{2}\|X^{*}-z^{0}\|^{2}\right)+\mathcal{O}(\sqrt{n}\epsilon), \tag{23}$$

where $\epsilon$ is the tolerance of the $z$ update (17), whose value is independent of the problem under consideration.

Proof:

See Appendix A. ∎

Remark 1

Note that the convergence proof shows the convergence rate with respect to the optimization steps. Since the distributed algorithm is approximate, an additional term $\mathcal{O}(\sqrt{n}\epsilon)$ is introduced, which is a function of the error tolerance and the size of the network, and which influences the solution precision.

Remark 2

The convergence proof here is fundamentally different from the ones in [12] and [6], as the investigated problems are different. Specifically, the D-ADMM proposed in [12] can only be applied to undirected graphs, as the constraint $AX=0$ is needed to minimize the objective function (7), where the matrix $A$ is related to the communication graph structure, which must be undirected. The authors in [6] proposed a D-ADMM for digraphs with the same constraint $\|x_{i}-x_{j}\|\leq\epsilon$, $\epsilon>0$; however, it can only be applied to the synchronous case.

VI Numerical Example

The following distributed least-squares problem is considered:

$$\operatorname*{argmin}_{x\in\mathds{R}^{p}}f(x)=\frac{1}{2}\sum_{i=1}^{n}\|A_{i}x-b_{i}\|^{2}, \tag{24}$$

where $A_{i}\in\mathds{R}^{q\times p}$ is known only to node $v_{i}$, $b_{i}\in\mathds{R}^{q}$ is the measured data, and $x\in\mathds{R}^{p}$ is the common decision variable that needs to be optimized. For the automatic generation of a large number of different matrices $A_{i}$, we choose $q=p$ so that $A_{i}$ is square. All elements of $A_{i}$ and $b_{i}$ are drawn as independent and identically distributed (i.i.d.) samples of the standard normal distribution $\mathcal{N}(0,1)$.
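For this cost, update (16) has a closed form: setting the gradient of $\frac{1}{2}\|A_{i}x-b_{i}\|^{2}+{\lambda_{i}^{k}}^{T}x+\frac{\rho}{2}\|x-z_{i}^{k}\|^{2}$ to zero gives $(A_{i}^{T}A_{i}+\rho I)x_{i}^{k+1}=A_{i}^{T}b_{i}-\lambda_{i}^{k}+\rho z_{i}^{k}$. A sketch follows; the data generation mirrors the experiment's distribution, and all names are illustrative.

```python
import numpy as np

def x_update_ls(A_i, b_i, z_i, lam_i, rho):
    # Closed-form solution of update (16) for f_i(x) = ||A_i x - b_i||^2 / 2:
    # (A_i^T A_i + rho I) x = A_i^T b_i - lam_i + rho z_i.
    p = A_i.shape[1]
    rhs = A_i.T @ b_i - lam_i + rho * z_i
    return np.linalg.solve(A_i.T @ A_i + rho * np.eye(p), rhs)

# Illustrative local data with q = p = 3, entries i.i.d. N(0, 1):
rng = np.random.default_rng(0)
A_i, b_i = rng.standard_normal((3, 3)), rng.standard_normal(3)
x_next = x_update_ls(A_i, b_i, np.zeros(3), np.zeros(3), rho=1.0)
```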

We choose $p=3$ and $n=600$, i.e., 600 nodes forming a strongly connected digraph. We implement the D-ADMM-FTERC algorithm of [7] as a benchmark to evaluate the factors (i.e., $\epsilon$, $\bar{\tau}$) influencing AsyAD-ADMM performance, as D-ADMM-FTERC computes the optimal solution to distributed optimization problems over digraphs, i.e., all figures related to D-ADMM-FTERC are optimal. However, D-ADMM-FTERC is synchronous, which is the reason we develop AsyAD-ADMM in this paper.

Fig. 2 demonstrates three points: (i) the AsyAD-ADMM solutions are very close to the optimal one from D-ADMM-FTERC. (ii) From Fig. 2 (a) to (b), the smaller the value of $\epsilon$, the closer the AsyAD-ADMM solutions are to the D-ADMM-FTERC one. This is reasonable, as a smaller value of $\epsilon$ means a more accurate "approximate" ratio consensus for the $z$ update (17) using Algorithm 2.

Figure 2: Comparison of solutions to problem (24) between AsyAD-ADMM in this paper and the synchronous D-ADMM-FTERC in [7] without stopping condition (steps 8-10 in Algorithm 1, $k_{\max}=200$): subplots (a) and (b) correspond to different values of $\epsilon$ and $\bar{\tau}$ influencing the performance of AsyAD-ADMM.

This can also be validated in Fig. 3, where the red and black lines are the $z$ updates using AsyAD-ADMM and D-ADMM-FTERC, respectively. (iii) The value of $\bar{\tau}$ (defined in Sec. IV-A1) does not influence the precision of AsyAD-ADMM, which also makes sense, as $\bar{\tau}$ determines the degree of asynchrony, not the asynchronous ADMM precision. Nodes use this information to decide when to update, as shown in step 6 of Algorithm 2. The value of $\bar{\tau}$ does, however, influence the number of iteration steps of Algorithm 2, as described in Table I, which shows that the larger the value of $\bar{\tau}$, the more iteration steps Algorithm 2 needs in each AsyAD-ADMM optimization step. For $\epsilon=0.01$, the steps all equal 1000 because we set the iteration upper bound for Algorithm 2 to 1000; this means the number of iteration steps is no less than 1000 when $\epsilon\leq 0.01$ and $\bar{\tau}\geq 3$.

Figure 3: $z^{k}$ ($z^{k}_{ij}$, $i=1,\ldots,n$, $j=1,\ldots,p$) update in (17) with $\bar{\tau}=3$ (red and black lines are from AsyAD-ADMM and D-ADMM-FTERC, respectively) under the stopping condition (steps 8-10 in Algorithm 1, absolute tolerance: $10^{-4}$, relative tolerance: $10^{-2}$).

Note that in Fig. 3 we run AsyAD-ADMM with the stopping condition, for the convenience of figure presentation. The reason Fig. 3 (a) and (b) differ substantially is the constraint $\|z_{i}-z_{j}\|\leq\epsilon$: a smaller $\epsilon$ leads to a more accurate $z$ update (17) using Algorithm 2, because a smaller $\epsilon$ requires more Algorithm 2 iteration steps (9 versus 1000 in Table I for $\epsilon=0.1$ and $\epsilon=0.01$ in this example). Also from Fig. 3, one can see that the number of ADMM iterations is the same for different values of $\epsilon$ for AsyAD-ADMM with the stopping condition. Furthermore, Table II reports the running times on an Intel Core i5 processor at 2.6 GHz with Matlab R2020b for different values of $\epsilon$ and $\bar{\tau}$ with $k_{\max}=200$. One can see that for $\epsilon=0.01$, as $\bar{\tau}$ increases from 3 to 5 to 10, even though the Algorithm 2 iteration steps stay at 1000, the running time keeps increasing.

TABLE I: Algorithm 2 iteration steps in each Algorithm 1 iteration.
                  $\bar{\tau}=3$   $\bar{\tau}=5$   $\bar{\tau}=10$
$\epsilon=0.1$    9                13               23
$\epsilon=0.01$   1000             1000             1000

TABLE II: AsyAD-ADMM Algorithm 1 running time.
                  $\bar{\tau}=3$   $\bar{\tau}=5$   $\bar{\tau}=10$
$\epsilon=0.1$    7.9677 s         12.0282 s        33.6401 s
$\epsilon=0.01$   190.1420 s       246.9963 s       470.6576 s

This is because the running time of Algorithm 2 is the time gap between two consecutive iterations multiplied by the number of iteration steps, and, by step 6 of Algorithm 2, the time gap grows with $(1+\bar{\tau})D$. Fig. 4 demonstrates two points: (i) it validates Theorem 1, i.e., that AsyAD-ADMM has an $\mathcal{O}(\frac{1}{k})$ convergence rate; (ii) in accordance with Fig. 2 and Fig. 3, a smaller value of $\epsilon$ yields better solution precision.

Moreover, from Fig. 2 and Table II, considering both the AsyAD-ADMM solution precision and the running time, it is better to choose a value of $\epsilon$ that is neither too large nor too small; for this example, $\epsilon\in[0.01,0.1)$ is a good range.

Figure 4: Comparison of the absolute errors for different values of $\epsilon$ with $\bar{\tau}=3$ and the stopping condition (steps 8-10 in Algorithm 1, absolute tolerance: $10^{-4}$, relative tolerance: $10^{-2}$).

VII Conclusions and Future Directions

An asynchronous approximate distributed alternating direction method of multipliers algorithm has been proposed to solve distributed optimization problems over digraphs with asynchronous information exchange among nodes, under the assumptions of bounded time-index steps for the asynchronous nodes and of the digraph diameter being known to all nodes. By means of the proposed finite-time asynchronous approximate ratio consensus algorithm for one of the ADMM primal variable updates, the solution obtained is approximate, yet close to the optimal one. How to choose the algorithm parameters to obtain good performance has also been discussed.

Future work will focus on designing a new asynchronous ADMM version that does not require knowledge of the communication graph diameter to compute the optimal solution of distributed optimization problems.

Appendix A Proof of Theorem 1

The analysis is inspired by [12] and [6]. The first inequality in (23) follows directly from the second inequality in the saddle-point property (14) of the Lagrangian.

We now prove the second inequality in (23). For each node $v_{i}$, since $x_{i}^{k+1}$ minimizes $L_{\rho}(x,z^{k},\lambda^{k})$ in (16), by the optimality condition we have

$$(x-x_{i}^{k+1})^{T}[h_{i}(x_{i}^{k+1})+\lambda_{i}^{k}+\rho(x_{i}^{k+1}-z_{i}^{k})]\geq 0, \tag{25}$$

where $h_{i}(x_{i}^{k+1})$ is the sub-gradient of $f_{i}$ at $x_{i}^{k+1}$. By substituting $x_{i}^{k+1}=(\lambda_{i}^{k+1}-\lambda_{i}^{k})/\rho+z_{i}^{k+1}$ from (18) into the above inequality, we have

$$(x-x_{i}^{k+1})^{T}[h_{i}(x_{i}^{k+1})+\lambda_{i}^{k+1}+\rho(z_{i}^{k+1}-z_{i}^{k})]\geq 0. \tag{26}$$

The above $n$ inequalities can be written in compact form as

$$(X-X^{k+1})^{T}[\bar{h}(X^{k+1})+\lambda^{k+1}+\rho(z^{k+1}-z^{k})]\geq 0, \tag{27}$$

where $\bar{h}(X^{k+1})=[h_{1}^{T}(x_{1}^{k+1}),\ldots,h_{n}^{T}(x_{n}^{k+1})]^{T}$. Denote

$$\tilde{z}_{i}^{k+1}\coloneqq\frac{1}{n}\sum_{l=1}^{n}\left(x_{l}^{k+1}+\frac{\lambda_{l}^{k}}{\rho}\right), \tag{28}$$

and $\tilde{z}^{k+1}=[{\tilde{z}_{1}^{k+1}}^{T},\ldots,{\tilde{z}_{n}^{k+1}}^{T}]^{T}$. Based on the "approximate" consensus Algorithm 2, we denote the actual $z$ update (17) at step $k+1$ as $z^{k+1}\coloneqq\tilde{z}^{k+1}+e^{k+1}$, $e^{k+1}=[{e_{1}^{k+1}}^{T},\ldots,{e_{n}^{k+1}}^{T}]^{T}$, with $\|e_{i}^{k+1}\|\leq\epsilon$, $i=1,\ldots,n$, $k=0,1,\ldots$, i.e., $\|e^{k+1}\|\leq\sqrt{n}\epsilon$. Since $\tilde{z}^{k+1}$ minimizes $L_{\rho}(X^{k+1},z,\lambda^{k})$ in (17), similarly, we have

$$(z-\tilde{z}^{k+1})^{T}[\bar{g}(\tilde{z}^{k+1})-\lambda^{k}-\rho(X^{k+1}-\tilde{z}^{k+1})]=(z-\tilde{z}^{k+1})^{T}(\bar{g}(\tilde{z}^{k+1})-\lambda^{k+1}-\rho e^{k+1})\geq 0 \tag{29}$$

for all $z\in\mathcal{C}$, where $\bar{g}(\tilde{z}^{k+1})$ is the sub-gradient of $g$ at $\tilde{z}^{k+1}$. As both $F$ and $g$ are convex, by utilizing the sub-gradient inequality, from (27) and (29) we get

$$\begin{aligned} F(X^{k+1})&-F(X)+g(\tilde{z}^{k+1})-g(z)\\ \leq&-(X-X^{k+1})^{T}\bar{h}(X^{k+1})-(z-\tilde{z}^{k+1})^{T}\bar{g}(\tilde{z}^{k+1})\\ \leq&(X-X^{k+1})^{T}[\lambda^{k+1}+\rho(z^{k+1}-z^{k})]+(z-\tilde{z}^{k+1})^{T}(-\lambda^{k+1}-\rho e^{k+1})\\ =&{\lambda^{k+1}}^{T}(X-X^{k+1}-z+z^{k+1}-e^{k+1})+\rho(X-X^{k+1})^{T}(z^{k+1}-z^{k})-\rho(z-\tilde{z}^{k+1})^{T}e^{k+1}\\ =&{\lambda^{k+1}}^{T}(X-X^{k+1}-z+z^{k+1})+\rho(X-X^{k+1})^{T}(z^{k+1}-z^{k})-[\lambda^{k+1}+\rho(z-\tilde{z}^{k+1})]^{T}e^{k+1}. \end{aligned} \tag{30}$$

From (9), (11) and (28), we have $g(\tilde{z}^{k+1})=g(z^{k+1})$, as $\tilde{z}^{k+1},z^{k+1}\in\mathcal{C}$. Due to the feasibility of the optimal solution $(X^{*},z^{*})$, we obtain $X^{*}-z^{*}=0$. Setting $X=X^{*}$, $z=z^{*}$, (30) becomes

$$F(X^{k+1})-F(X^{*})+g(z^{k+1})-g(z^{*})\leq{\lambda^{k+1}}^{T}(z^{k+1}-X^{k+1})+\rho(X^{*}-X^{k+1})^{T}(z^{k+1}-z^{k})-\eta^{k+1}, \tag{31}$$

where $\eta^{k+1}\coloneqq[\lambda^{k+1}+\rho(z^{*}-\tilde{z}^{k+1})]^{T}e^{k+1}$. From Appendix A of [11], we know that both $\lambda^{k+1}$ and $z^{*}-\tilde{z}^{k+1}$ are bounded, and that $z^{*}-\tilde{z}^{k+1}$ goes to zero as $k\rightarrow\infty$. Therefore, for $k=1,2,\ldots$, there exist constants $M_{\lambda}$ and $M_{z}$ such that

$$\|\lambda^{k}\|\leq M_{\lambda},\quad\|z^{*}-\tilde{z}^{k}\|\leq M_{z}^{k},\quad M_{z}^{k}\leq M_{z},\quad\lim_{k\to\infty}M_{z}^{k}=0.$$

Denote $\bar{F}^{k+1}\coloneqq F(X^{k+1})-F(X^{*})+g(z^{k+1})-g(z^{*})+{\lambda^{*}}^{T}(X^{k+1}-z^{k+1})$ and $\phi^{k+1}\coloneqq\rho(X^{*}-X^{k+1})^{T}(z^{k+1}-z^{k})$. Adding ${\lambda^{*}}^{T}(X^{k+1}-z^{k+1})$ to both sides of (31) yields

$$\begin{aligned}\bar{F}^{k+1}\leq&(\lambda^{*}-\lambda^{k+1})^{T}(X^{k+1}-z^{k+1})+\phi^{k+1}-\eta^{k+1}\\ =&\frac{1}{\rho}(\lambda^{*}-\lambda^{k+1})^{T}(\lambda^{k+1}-\lambda^{k})+\phi^{k+1}-\eta^{k+1},\end{aligned} \tag{32}$$

where the last equality follows from (18). Recall the well-known identity

$$(a_{1}-a_{2})^{T}(a_{3}-a_{4})=\frac{1}{2}(\|a_{1}-a_{4}\|^{2}-\|a_{1}-a_{3}\|^{2})+\frac{1}{2}(\|a_{2}-a_{3}\|^{2}-\|a_{2}-a_{4}\|^{2}),\quad\forall a_{1},a_{2},a_{3},a_{4}\in\mathds{R}^{p}. \tag{33}$$
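Expanding the squared norms verifies (33): the right-hand side equals
$$a_{1}^{T}(a_{3}-a_{4})+\tfrac{1}{2}(\|a_{4}\|^{2}-\|a_{3}\|^{2})-a_{2}^{T}(a_{3}-a_{4})+\tfrac{1}{2}(\|a_{3}\|^{2}-\|a_{4}\|^{2})=(a_{1}-a_{2})^{T}(a_{3}-a_{4}).$$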

Then, using identity (33), (32) becomes

$$\begin{aligned}\bar{F}^{k+1}\leq&\frac{1}{2\rho}(\|\lambda^{*}-\lambda^{k}\|^{2}-\|\lambda^{*}-\lambda^{k+1}\|^{2}-\|\lambda^{k+1}-\lambda^{k}\|^{2})\\&+\frac{\rho}{2}(\|X^{*}-z^{k}\|^{2}-\|X^{*}-z^{k+1}\|^{2}+\|X^{k+1}-z^{k+1}\|^{2}-\|X^{k+1}-z^{k}\|^{2})-\eta^{k+1}\\ \leq&\frac{1}{2\rho}(\|\lambda^{*}-\lambda^{k}\|^{2}-\|\lambda^{*}-\lambda^{k+1}\|^{2})+\frac{\rho}{2}(\|X^{*}-z^{k}\|^{2}-\|X^{*}-z^{k+1}\|^{2})-\eta^{k+1},\end{aligned} \tag{34}$$

where the last inequality comes from using (18) and dropping the negative term $-\frac{\rho}{2}\|X^{k+1}-z^{k}\|^{2}$. Now, replacing the iteration index $k$ by $s$, (34) reads

$$\bar{F}^{s+1}\leq\frac{1}{2\rho}(\|\lambda^{*}-\lambda^{s}\|^{2}-\|\lambda^{*}-\lambda^{s+1}\|^{2})+\frac{\rho}{2}(\|X^{*}-z^{s}\|^{2}-\|X^{*}-z^{s+1}\|^{2})-\eta^{s+1}, \tag{35}$$

which holds for all $s$. Denote the constant $\theta\coloneqq\|\lambda^{*}-\lambda^{0}\|^{2}/(2\rho)+\rho\|X^{*}-z^{0}\|^{2}/2$. Summing (35) over $s=0,1,\ldots,k-1$ and telescoping, we have

$$\sum_{s=0}^{k-1}F(X^{s+1})-kF(X^{*})+\sum_{s=0}^{k-1}g(z^{s+1})-kg(z^{*})+{\lambda^{*}}^{T}\sum_{s=0}^{k-1}(X^{s+1}-z^{s+1})\leq\theta-\frac{1}{2\rho}\|\lambda^{*}-\lambda^{k}\|^{2}-\frac{\rho}{2}\|X^{*}-z^{k}\|^{2}-\sum_{s=0}^{k-1}\eta^{s+1}. \tag{36}$$

Due to the convexity of both $F$ and $g$, we get $kF(\bar{X}^{k})\leq\sum_{s=0}^{k-1}F(X^{s+1})$ and $kg(\bar{z}^{k})\leq\sum_{s=0}^{k-1}g(z^{s+1})$. Thus, by the definition of $\bar{X}^{k},\bar{z}^{k}$ and dropping the negative terms,

$$kF(\bar{X}^{k})-kF(X^{*})+kg(\bar{z}^{k})-kg(z^{*})+{\lambda^{*}}^{T}(k\bar{X}^{k}-k\bar{z}^{k})\leq\theta-\sum_{s=0}^{k-1}\eta^{s+1}\leq\theta+k\underbrace{(M_{\lambda}+\rho M_{z})\sqrt{n}\epsilon}_{\coloneqq\mathcal{O}(\sqrt{n}\epsilon)}. \tag{37}$$

Using $X^{*}-z^{*}=0$ and dividing (37) by $k$, together with the definition of the Lagrangian function [cf. (13)], proves (23).

References

  • [1] R. Zhang and J. Kwok, “Asynchronous distributed ADMM for consensus optimization,” in International Conference on Machine Learning.   PMLR, 2014, pp. 1701–1709.
  • [2] T.-H. Chang, M. Hong, W.-C. Liao, and X. Wang, “Asynchronous distributed ADMM for large-scale optimization-Part II: Algorithm and convergence analysis,” IEEE Transactions on Signal Processing, vol. 64, no. 12, pp. 3118–3130, 2016.
  • [3] S. E. Li, Z. Wang, Y. Zheng, Q. Sun, J. Gao, F. Ma, and K. Li, “Synchronous and asynchronous parallel computation for large-scale optimal control of connected vehicles,” Transportation Research Part C: Emerging Technologies, vol. 121, p. 102842, 2020.
  • [4] E. Wei and A. Ozdaglar, “On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers,” in 2013 IEEE Global Conference on Signal and Information Processing, 2013, pp. 551–554.
  • [5] N. Bastianello, R. Carli, L. Schenato, and M. Todescato, “Asynchronous distributed optimization over lossy networks via relaxed ADMM: Stability and linear convergence,” IEEE Transactions on Automatic Control, 2020.
  • [6] V. Khatana and M. V. Salapaka, “D-DistADMM: A $O(1/k)$ distributed ADMM for distributed optimization in directed graph topologies,” in Proc. 59th IEEE Conf. Dec. Control, Dec. 2020, pp. 2992–2997.
  • [7] W. Jiang and T. Charalambous, “Distributed alternating direction method of multipliers using finite-time exact ratio consensus in digraphs,” in Proc. 19th European Control Conf., Jun. 2021, accepted.
  • [8] A. Grammenos, T. Charalambous, and E. Kalyvianaki, “CPU scheduling in data centers using asynchronous finite-time distributed coordination mechanisms,” arXiv preprint arXiv:2101.06139, 2021.
  • [9] I. Shames, T. Charalambous, C. Hadjicostis, and M. Johansson, “Distributed network size estimation and average degree estimation and control in networks isomorphic to directed graphs,” in 50th Ann. Allerton Conf. Commun., Control, and Computing, Oct. 2012, pp. 1885–1892.
  • [10] T. Charalambous, M. G. Rabbat, M. Johansson, and C. N. Hadjicostis, “Distributed finite-time computation of digraph parameters: Left-eigenvector, out-degree and spectrum,” IEEE Transactions on Control of Network Systems, vol. 3, no. 2, pp. 137–148, 2016.
  • [11] S. Boyd, N. Parikh, and E. Chu, Distributed optimization and statistical learning via the alternating direction method of multipliers.   Now Publishers Inc, 2011.
  • [12] E. Wei and A. Ozdaglar, “Distributed alternating direction method of multipliers,” in Proc. 51st IEEE Conf. Dec. Control, 2012, pp. 5445–5450.
  • [13] C. N. Hadjicostis and T. Charalambous, “Average consensus in the presence of delays in directed graph topologies,” IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 763–768, March 2014.
  • [14] S. Giannini, D. Di Paola, A. Petitti, and A. Rizzo, “On the convergence of the max-consensus protocol with asynchronous updates,” in IEEE Conference on Decision and Control (CDC), 2013, pp. 2605–2610.
  • [15] S. T. Cady, A. D. Domínguez-García, and C. N. Hadjicostis, “Finite-time approximate consensus and its application to distributed frequency regulation in islanded AC microgrids,” in Proc. of Hawaii International Conference on System Sciences, 2015, pp. 2664–2670.
  • [16] A. D. Domínguez-García and C. N. Hadjicostis, “Coordination and control of distributed energy resources for provision of ancillary services,” in IEEE Int. Conf. Smart Grid Commun., Oct. 2010, pp. 537–542.
  • [17] M. Prakash, S. Talukdar, S. Attree, V. Yadav, and M. V. Salapaka, “Distributed stopping criterion for consensus in the presence of delays,” IEEE Transactions on Control of Network Systems, vol. 7, no. 1, pp. 85–95, 2020.