
An Asynchronous Approximate Distributed
Alternating Direction Method of Multipliers in Digraphs

Wei Jiang, Andreas Grammenos, Evangelia Kalyvianaki, and Themistoklis Charalambous

W. Jiang and T. Charalambous are with the Department of Electrical Engineering and Automation, School of Electrical Engineering, Aalto University, Espoo, Finland. Emails: {name.surname}@aalto.fi. A. Grammenos is with the Department of Computer Science and Technology, University of Cambridge, Cambridge, and the Alan Turing Institute, London, UK. Email: ag926@cl.cam.ac.uk. E. Kalyvianaki is with the Department of Computer Science and Technology, University of Cambridge, Cambridge, UK. Email: ek264@cl.cam.ac.uk.
Abstract

In this work, we consider the asynchronous distributed optimization problem in which each node has its own convex cost function and can communicate directly only with its neighbors, as determined by a directed communication topology (directed graph or digraph). First, we reformulate the optimization problem so that the Alternating Direction Method of Multipliers (ADMM) can be utilized. Then, we propose an algorithm, herein called Asynchronous Approximate Distributed Alternating Direction Method of Multipliers (AsyAD-ADMM), using finite-time asynchronous approximate ratio consensus, to solve the multi-node convex optimization problem, in which every node performs iterative computations and exchanges information with its neighbors asynchronously. More specifically, at every iteration of AsyAD-ADMM, each node solves a local convex optimization problem for one of the primal variables and utilizes a finite-time asynchronous approximate consensus protocol to obtain a value of the other primal variable that is close to the optimal one, since the cost function for the second primal variable is not decomposable. If the individual cost functions are convex but not necessarily differentiable, the proposed algorithm converges at a rate of $O(1/k)$, where $k$ is the iteration counter. The efficacy of AsyAD-ADMM is exemplified via a proof-of-concept distributed least-squares optimization problem, for which different performance-influencing factors are investigated.

Index Terms:
Distributed optimization, asynchronous ADMM, directed graphs, ratio consensus, finite-time consensus.

I Introduction

In this paper, we are interested in developing a modified ADMM algorithm for solving large-scale optimization problems in digraphs. In that direction, there are two main communication topologies considered: (i) the master-workers communication topology [1, 2, 3] and (ii) the multi-node communication topology [4, 5, 6, 7]. When the ADMM has a master-worker communication topology, the worker nodes optimize their local objectives and communicate their local variables to the master node, which updates the global optimization variable and sends it back to the workers. When the ADMM has no master node, the optimization problem is solved over a network of nodes. Herein, we focus on the ADMM realized on multi-node communication topologies.

There are several real-life applications in which synchronous operation is infeasible or costly and complicated. As a result, asynchronous information exchange is necessitated. For example, resource allocation in data centers gives rise to large-scale problems and networks, which naturally call for asynchronous solutions. Recently, different asynchronous distributed ADMM versions have been proposed under different assumptions and communication topologies. For example, the version in [1, 2] is based on the master-worker architecture under a star topology. Even though the worker updates are parallel and distributed, the master update is centralized, as it collects information from the workers and broadcasts the updated information back to them for their following iteration. The authors in [4, 5] proposed another version with almost sure convergence, using information from a randomly selected subset of nodes for the ADMM update, under the assumption that every node is selected with positive probability. Then, the authors in [3] combined the above two architectures and designed a master-local processor-link processor architecture with random node selection. It is worth noting that all the above asynchronous versions are available only for undirected communication graphs, i.e., they assume that every communication link is bidirectional. Very recently, distributed ADMM versions for digraphs were proposed in [6, 7]. However, they are synchronous, which is the motivation for this work.

In this paper, we propose AsyAD-ADMM, an asynchronous distributed ADMM algorithm for digraphs. First, the optimization problem is restructured such that, at every optimization step, each node can individually solve a local convex optimization problem for one of the primal variables, while for the other primal variable communication among nodes is required, since its cost function is not decomposable. Then, we find the solution for the primal variable whose cost function is non-decomposable by means of a finite-time asynchronous approximate consensus algorithm. More specifically, we adopt the algorithm introduced in [8, Algorithm 2], which relies on asynchronous $\min$-consensus and $\max$-consensus protocols to compute the approximate average consensus value, considering different communication delays, over a finite number of steps. Unlike [8], here the consensus algorithm must be repeated at every optimization step, which is itself asynchronous. Towards this end, we propose a mechanism in which every next optimization step is initiated after the previous one has finished. The main contributions of the paper are as follows.

• We propose an asynchronous distributed ADMM algorithm which can be applied to digraphs.

• The convergence analysis shows that AsyAD-ADMM converges at a rate of $O(1/k)$, where $k$ is the iteration counter. Factors influencing the algorithm's performance are also investigated.

II Preliminaries

II-A Notation and graph theory

The set of real (non-negative integer, positive integer) numbers is denoted by $\mathds{R}$ ($\mathds{Z}$, $\mathds{N}$) and $\mathds{R}^{n}_{+}$ denotes the non-negative orthant of the $n$-dimensional real space $\mathds{R}^{n}$. $A^{T}$ denotes the transpose of matrix $A$. For $A\in\mathds{R}^{n\times n}$, $a_{ij}$ denotes the entry in row $i$ and column $j$. By $\mathds{1}$ we denote the all-ones vector and by $I$ we denote the identity matrix (of appropriate dimensions). $\|\cdot\|$ denotes the 2-norm.

In multi-node systems with fixed communication links (edges), the exchange of information between nodes can be conveniently captured by a graph $\mathcal{G}(\mathcal{V},\mathcal{E})$ of order $n$ ($n\geq 2$), where $\mathcal{V}=\{v_{1},v_{2},\ldots,v_{n}\}$ is the set of nodes and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of edges. A directed edge from node $v_{i}$ to node $v_{j}$ is denoted by $\varepsilon_{ji}=(v_{j},v_{i})\in\mathcal{E}$ and represents a communication link that allows node $v_{j}$ to receive information from node $v_{i}$. A graph is said to be undirected if and only if $\varepsilon_{ji}\in\mathcal{E}$ implies $\varepsilon_{ij}\in\mathcal{E}$. A digraph is called strongly connected if there exists a path from each vertex $v_{i}$ of the graph to each vertex $v_{j}$ ($v_{j}\neq v_{i}$). In other words, for any $v_{j},v_{i}\in\mathcal{V}$, $v_{j}\neq v_{i}$, one can find a sequence of nodes $v_{i}=v_{l_{1}},v_{l_{2}},v_{l_{3}},\ldots,v_{l_{m}}=v_{j}$ such that link $(v_{l_{s+1}},v_{l_{s}})\in\mathcal{E}$ for all $s=1,2,\ldots,m-1$. The diameter $D$ of a graph is the longest shortest path between any two nodes in the network.

All nodes that can transmit information to node $v_{j}$ directly are said to be in-neighbors of node $v_{j}$ and belong to the set $\mathcal{N}^{-}_{j}=\{v_{i}\in\mathcal{V}\,|\,\varepsilon_{ji}\in\mathcal{E}\}$. The cardinality of $\mathcal{N}^{-}_{j}$ is called the in-degree of $v_{j}$ and is denoted by $\mathcal{D}^{-}_{j}=|\mathcal{N}^{-}_{j}|$. The nodes that receive information from node $v_{j}$ belong to the set of out-neighbors of node $v_{j}$, denoted by $\mathcal{N}^{+}_{j}=\{v_{l}\in\mathcal{V}\,|\,\varepsilon_{lj}\in\mathcal{E}\}$. The cardinality of $\mathcal{N}^{+}_{j}$ is called the out-degree of $v_{j}$ and is denoted by $\mathcal{D}^{+}_{j}=|\mathcal{N}^{+}_{j}|$.

In the type of algorithms we consider, we associate a positive weight $p_{ji}$ with each edge $\varepsilon_{ji}\in\mathcal{E}\cup\{(v_{j},v_{j})\,|\,v_{j}\in\mathcal{V}\}$. The nonnegative matrix $P=[p_{ji}]\in\mathds{R}_{+}^{n\times n}$ is a weighted adjacency matrix that has zero entries at locations that do not correspond to directed edges (or self-edges) in the graph.

II-B Standard ADMM Algorithm

The standard ADMM algorithm solves the problem:

$$\min\; f(x)+g(z)\quad\text{s.t.}\quad Ax+Bz=c \tag{1}$$

for variables $x\in\mathds{R}^{p}$, $z\in\mathds{R}^{m}$ with matrices $A\in\mathds{R}^{q\times p}$, $B\in\mathds{R}^{q\times m}$ and vector $c\in\mathds{R}^{q}$ ($p,m,q\in\mathds{N}$). The augmented Lagrangian is

$$L_{\rho}(x,z,\lambda)=f(x)+g(z)+\lambda^{T}(Ax+Bz-c)+\frac{\rho}{2}\|Ax+Bz-c\|^{2}, \tag{2}$$

where $\lambda$ is the Lagrange multiplier and $\rho>0$ is a penalty parameter. In ADMM, the primal variables $x,z$ and the Lagrange multiplier $\lambda$ are updated as follows: starting from some initial vector $\begin{bmatrix}x^{0}&z^{0}&\lambda^{0}\end{bmatrix}^{T}$, at each optimization iteration $k$,

$$x^{k+1}=\operatorname*{argmin}_{x}L_{\rho}(x,z^{k},\lambda^{k}), \tag{3}$$
$$z^{k+1}=\operatorname*{argmin}_{z}L_{\rho}(x^{k+1},z,\lambda^{k}), \tag{4}$$
$$\lambda^{k+1}=\lambda^{k}+\rho(Ax^{k+1}+Bz^{k+1}-c). \tag{5}$$

The step-size in the Lagrange multiplier update is the same as the augmented Lagrangian function parameter $\rho$.
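To make the update order concrete, the following is a minimal sketch of iterations (3)-(5) on a toy instance of problem (1) with scalar quadratics; the choice $f(x)=(x-a)^{2}/2$, $g(z)=(z-b)^{2}/2$, $A=1$, $B=-1$, $c=0$ and all variable names are illustrative assumptions, not taken from the paper.

```python
# Toy instance of (1): f(x) = (x - a)^2/2, g(z) = (z - b)^2/2,
# A = 1, B = -1, c = 0, so the constraint is x = z and the
# optimum is x* = z* = (a + b)/2. All values are illustrative.
a, b, rho = 4.0, 10.0, 1.0
x = z = lam = 0.0

for k in range(100):
    # (3): closed form of argmin_x L_rho(x, z, lam)
    x = (a - lam + rho * z) / (1.0 + rho)
    # (4): closed form of argmin_z L_rho(x, z, lam)
    z = (b + lam + rho * x) / (1.0 + rho)
    # (5): dual ascent step with step size rho
    lam += rho * (x - z)

print(x, z)  # both approach (a + b)/2 = 7.0
```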

III Problem Formulation

In this work, we consider a strongly connected digraph $\mathcal{G}(\mathcal{V},\mathcal{E})$ in which each node $v_{i}\in\mathcal{V}$ is endowed with a scalar cost function $f_{i}:\mathds{R}^{p}\mapsto\mathds{R}$, assumed to be known to that node only. We assume that each node $v_{i}$ has knowledge of the number of its out-going links, $\mathcal{D}^{+}_{i}$, and has access to local information only, via communication with its in-neighboring nodes $\mathcal{N}^{-}_{i}$. The only global information available to all the nodes in the network is given in Assumption 1.

Assumption 1

The diameter of the network $D$, or an upper bound on it, is known to all nodes.

While Assumption 1 is limiting, there exist distributed methods for extracting such information; see, e.g., [9, 10].

The objective is to design a discrete-time coordination algorithm that allows every node $v_{i}$ in a digraph to solve, in a distributed and asynchronous fashion, the following global optimization problem:

$$\min_{x\in\mathds{R}^{p}}\sum_{i=1}^{n}f_{i}(x), \tag{6}$$

where $x\in\mathds{R}^{p}$ is a global optimization variable (or a common decision variable). In order to solve problem (6) while enjoying the structure of the ADMM scheme, a separate decision variable $x_{i}$ for node $v_{i}$ is introduced and the constraint $\|x_{i}-x_{j}\|\leq\epsilon$, $\epsilon>0$, $\forall v_{i},v_{j}\in\mathcal{V}$, is imposed to allow the asynchronous distributed ADMM framework. Thus, problem (6) is reformulated as

$$\min_{x_{i},\,i=1,\ldots,n}\;\sum_{i=1}^{n}f_{i}(x_{i}), \tag{7}$$
$$\text{s.t.}\quad\|x_{i}-x_{j}\|\leq\epsilon,\;\forall v_{i},v_{j}\in\mathcal{V}, \tag{8}$$

where $\epsilon$ is a predefined error tolerance. Define a closed nonempty convex set $\mathcal{C}$ as

$$\mathcal{C}=\left\{\begin{bmatrix}x_{1}^{T}&x_{2}^{T}&\ldots&x_{n}^{T}\end{bmatrix}^{T}\in\mathds{R}^{np}\,:\,\|x_{i}-x_{j}\|\leq\epsilon\right\}. \tag{9}$$

By denoting $X\coloneqq\begin{bmatrix}x_{1}^{T}&x_{2}^{T}&\ldots&x_{n}^{T}\end{bmatrix}^{T}$ and introducing a variable $z\in\mathds{R}^{np}$ as a copy of vector $X$, constraint (8) of problem (7) becomes

$$\text{s.t.}\quad X=z,\;z\in\mathcal{C}. \tag{10}$$

Then, define $g(z)$ as the indicator function of set $\mathcal{C}$:

$$g(z)=\begin{cases}0,&\text{if }z\in\mathcal{C},\\ \infty,&\text{otherwise}.\end{cases} \tag{11}$$

Finally, problem (7) with constraint (10) is transformed to

$$\min_{z,\,x_{i},\,i=1,\ldots,n}\;\left\{\sum_{i=1}^{n}f_{i}(x_{i})+g(z)\right\}\quad\text{s.t.}\quad X-z=0. \tag{12}$$

For notational convenience, denote $F(X)\coloneqq\sum_{i=1}^{n}f_{i}(x_{i})$. Thus, the Lagrangian function is

$$L(X,z,\lambda)=F(X)+g(z)+\lambda^{T}(X-z), \tag{13}$$

where $\lambda\in\mathds{R}^{np}$ is the Lagrange multiplier. The following standard assumptions are required for problem (12).

Assumption 2

Each cost function $f_{i}:\mathds{R}^{p}\rightarrow\mathds{R}\cup\{+\infty\}$ is closed, proper and convex.

Assumption 3

The Lagrangian $L(X,z,\lambda)$ has a saddle point, i.e., there exists a solution $(X^{*},z^{*},\lambda^{*})$ for which

$$L(X^{*},z^{*},\lambda)\leq L(X^{*},z^{*},\lambda^{*})\leq L(X,z,\lambda^{*}) \tag{14}$$

holds for all $X\in\mathds{R}^{np}$, $z\in\mathds{R}^{np}$ and $\lambda\in\mathds{R}^{np}$.

Assumption 2 allows $f_{i}$ to be non-differentiable [11]. By Assumptions 2-3 and based on the definition of $g(z)$ in (11), $L(X,z,\lambda^{*})$ is convex in $(X,z)$ and $(X^{*},z^{*})$ is a solution to problem (12) [11, 12].

At iteration $k$, the corresponding augmented Lagrangian of optimization problem (12) is written as

$$\begin{aligned} L_{\rho}(X^{k},z^{k},\lambda^{k})&=\sum_{i=1}^{n}f_{i}(x_{i}^{k})+g(z^{k})+{\lambda^{k}}^{T}(X^{k}-z^{k})+\frac{\rho}{2}\|X^{k}-z^{k}\|^{2}\\ &=\sum_{i=1}^{n}\left(f_{i}(x_{i}^{k})+{\lambda_{i}^{k}}^{T}(x_{i}^{k}-z_{i}^{k})+\frac{\rho}{2}\|x_{i}^{k}-z_{i}^{k}\|^{2}\right)+g(z^{k}), \end{aligned} \tag{15}$$

where $z_{i}\in\mathds{R}^{p}$ is the $i$-th element of vector $z$. By ignoring terms which are independent of the minimization variables (i.e., $x_{i},z$), for each node $v_{i}$, the standard ADMM updates (3)-(5) take the following form:

$$x_{i}^{k+1}=\operatorname*{argmin}_{x_{i}}\;f_{i}(x_{i})+{\lambda_{i}^{k}}^{T}x_{i}+\frac{\rho}{2}\|x_{i}-z_{i}^{k}\|^{2}, \tag{16}$$
$$z^{k+1}=\operatorname*{argmin}_{z}\;g(z)+{\lambda^{k}}^{T}(X^{k+1}-z)+\frac{\rho}{2}\|X^{k+1}-z\|^{2}=\operatorname*{argmin}_{z}\;g(z)+\frac{\rho}{2}\left\|X^{k+1}-z+\frac{1}{\rho}\lambda^{k}\right\|^{2}, \tag{17}$$
$$\lambda_{i}^{k+1}=\lambda_{i}^{k}+\rho(x_{i}^{k+1}-z_{i}^{k+1}), \tag{18}$$

where the last expression in (17) comes from the identity $2a^{T}b+\|b\|^{2}=\|a+b\|^{2}-\|a\|^{2}$ with $a=\lambda^{k}/\rho$ and $b=X^{k+1}-z$.

Update (16) for $x_{i}^{k+1}$ can be solved by a classical method, e.g., the proximity operator [11, Section 4]. Update (18) for the dual variable $\lambda_{i}^{k+1}$ can be implemented trivially by node $v_{i}$. Note that both updates can be done independently and in parallel by node $v_{i}$.
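For instance, completing the square in (16) shows that the $x$-update is exactly a proximal step, $x_{i}^{k+1}=\operatorname{prox}_{(1/\rho)f_{i}}(z_{i}^{k}-\lambda_{i}^{k}/\rho)$. A minimal sketch follows, using $f_{i}(x)=\|x\|_{1}$ as an illustrative non-differentiable cost consistent with Assumption 2 (not the cost used in this paper):

```python
import numpy as np

def soft_threshold(v, t):
    # Proximity operator of t * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def x_update(z_i, lam_i, rho):
    # Update (16) as a proximal step: completing the square gives
    # x_i^{k+1} = prox_{(1/rho) f_i}(z_i^k - lam_i^k / rho).
    return soft_threshold(z_i - lam_i / rho, 1.0 / rho)
```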

Since $g$ in (11) is the indicator function of the closed nonempty convex set $\mathcal{C}$, update (17) for $z^{k+1}$ becomes

$$z^{k+1}=\Pi_{\mathcal{C}}(X^{k+1}+\lambda^{k}/\rho), \tag{19}$$

where $\Pi_{\mathcal{C}}$ denotes the projection (in the Euclidean norm) onto $\mathcal{C}$. Intuitively, from (17) and the definition of $g(z)$ in (11), one can see that the elements of $z$ (i.e., $z_{1},z_{2},\ldots,z_{n}$) should enter $\mathcal{C}$ in finite time; otherwise, $g(z)=\infty$ and update (17) can never be completed. Then, from the definition of $\mathcal{C}$ in (9), $z$ entering $\mathcal{C}$ means $\|z_{i}-z_{j}\|\leq\epsilon$, $\forall v_{i},v_{j}\in\mathcal{V}$.

It is worth noting that if $z_{i}-z_{j}=0$, $\forall v_{i},v_{j}\in\mathcal{V}$, then $z_{1}=z_{2}=\ldots=z_{n}$, which is the mathematical form of consensus. In other words, each node $v_{i}\in\mathcal{V}$ can have $z_{i}$ reach $\frac{1}{n}\sum_{i=1}^{n}z_{i}(0)$, with $z_{i}(0)=x_{i}^{k+1}+\lambda_{i}^{k}/\rho$, in a finite number of steps, i.e., finite-time average consensus. Similarly, requiring $\|z_{i}-z_{j}\|\leq\epsilon$, $\forall v_{i},v_{j}\in\mathcal{V}$, means asking nodes $v_{i},v_{j}\in\mathcal{V}$ to bring $z_{i},z_{j}$ within $\epsilon$ of the average consensus value, i.e., to have $z_{i},z_{j}$ enter a ball centered at $\frac{1}{n}\sum_{i=1}^{n}(x_{i}^{k+1}+\lambda_{i}^{k}/\rho)$ with radius $\epsilon$. Here we call this finite-time "approximate" ratio consensus.
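To make the target of the $z$ update concrete, here is a minimal centralized reference sketch: with $\epsilon=0$, the Euclidean projection onto the consensus set assigns every node the mean of the values $z_{i}(0)=x_{i}^{k+1}+\lambda_{i}^{k}/\rho$, which the distributed algorithm of Sec. V reproduces to within $\epsilon$. The shapes and values below are illustrative assumptions.

```python
import numpy as np

# Centralized reference for the z-update target: with epsilon = 0, projecting
# onto the consensus subspace {z : z_1 = ... = z_n} in the Euclidean norm
# assigns each z_i the mean of the inputs z_i(0) = x_i^{k+1} + lam_i^k / rho.
n, p, rho = 5, 3, 1.0                   # illustrative sizes
x_next = np.random.randn(n, p)          # stand-ins for x_i^{k+1}
lam = np.random.randn(n, p)             # stand-ins for lam_i^k
target = np.mean(x_next + lam / rho, axis=0)  # common value all z_i approach
```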

IV Asynchronous operation and consensus algorithms

Before we proceed to the description of our main algorithm, we describe the form of asynchrony that the nodes experience and some consensus algorithms that are key ingredients for the operation of our proposed algorithms.

IV-A Asynchrony description

Let $t(0)\in\mathds{R}_{+}$ be the time at which the iterations for the optimization start. We assume that there is a set of times $\mathcal{T}=\{t(1),t(2),t(3),\ldots\}$ at which one or more nodes transmit some value to their neighbors. In a synchronous setting, each node $v_{i}$ updates and sends its information to its neighbors at discrete times in $\mathcal{T}$ and no processing or communication delays are considered. In an asynchronous setting, a message that is received at time $t(k_{1})$ and processed at time $t(k_{2})$, $k_{2}>k_{1}$, experiences a processing delay of $t(k_{2})-t(k_{1})$ (or a time-index delay $k_{2}-k_{1}$). In Fig. 1, we show through a simple example how the time steps evolve for each node in the network; by $t_{i}(k)$ we denote the time step at which iteration $k$ takes place for node $v_{i}$.

Figure 1: A simple example of a network consisting of 3 nodes. In the timeline of each node, blue ticks indicate an iteration of node $v_{i}$ and the arrows indicate transmissions. The time between transmissions is the processing delay, while the time from the beginning of a transmission to its end (arrow) is the transmission delay.

We index nodes' information states and any other information at time $t(k)$ by time-index $k$. Hence, we use $x_{i}^{k}\equiv x_{i}[k]\in\mathds{R}$ to denote the information state of node $v_{i}$ at time $t(k)$. Note that ${x_{i}^{k}}^{T}$ denotes (is equivalent to) $x_{i}[k]^{T}$.

Assumption 4

There exists an upper bound $B\in\mathds{N}$ on the number of time-index steps needed for a node to process the information received from other nodes.

Assumption 4 basically states that, since the number of nodes is finite and they update their states regularly, there exists a finite number of steps $B$ within which all nodes have updated their values (and hence broadcast them to neighboring nodes). However, it is not possible for nodes to count the number of steps elapsed in the network. For this reason, we make an additional assumption.

Assumption 5

There exists an upper bound $T\in\mathds{R}_{\geq 0}$ on the actual time in seconds needed for a node to process the information received from other nodes.

In other words, the upper bound of $B$ steps in Assumption 4 is translated to an upper bound of $T$ seconds, something that the nodes can count individually.

IV-A1 Asynchronous ratio consensus

Here, we regard the asynchronous information exchange among nodes as delayed information. Thus, we adopt a protocol developed in [13] where each node updates its information state $x_{j}^{k+1}$ by combining the delayed information received from its neighbors, $x_{i}^{s}$ ($s\in\mathds{Z}$, $s\leq k$, $v_{i}\in\mathcal{N}^{-}_{j}$), using constant positive weights $p_{ji}$. The integer $\tau_{ji}^{k}\geq 0$ represents the delay of a message sent from node $v_{i}$ to node $v_{j}$ at time instant $k$. We require that $0\leq\tau_{ji}^{k}\leq\bar{\tau}_{ji}\leq\bar{\tau}$ for all $k\geq 0$, for some finite $\bar{\tau}=\max\{\bar{\tau}_{ji}\}$, $\bar{\tau}\in\mathds{N}$, and $1+\bar{\tau}\leq B$, where $B$ is defined in Assumption 4. We make the reasonable assumption that $\tau_{jj}^{k}=0$, $\forall v_{j}\in\mathcal{V}$, at all time instances $k$ (i.e., the own value of a node is always available without delay). Each node updates its information state according to:

$$y_{j}^{k+1}=p_{jj}y_{j}^{k}+\sum_{v_{i}\in\mathcal{N}^{-}_{j}}\sum_{r=0}^{\bar{\tau}}p_{ji}y_{i}^{k-r}I_{k-r,ji}(r), \tag{20}$$

for $k\geq 0$, where $y_{j}^{0}\in\mathds{R}$ is the initial state of node $v_{j}$; the weights $p_{ji}$, $\forall\varepsilon_{ji}\in\mathcal{E}$, form $P=[p_{ji}]$, which adheres to the graph structure and is primitive and column stochastic; and

$$I_{k,ji}(\tau)=\begin{cases}1,&\text{if }\tau_{ji}^{k}=\tau,\\ 0,&\text{otherwise.}\end{cases} \tag{21}$$
Lemma 1

[13, Lemma 2] Consider a strongly connected digraph $\mathcal{G}(\mathcal{V},\mathcal{E})$. Let $y_{j}^{k}$ and $w_{j}^{k}$ (for all $v_{j}\in\mathcal{V}$ and $k=0,1,2,\ldots$) be the result of iteration (20) and

$$w_{j}^{k+1}=p_{jj}w_{j}^{k}+\sum_{v_{i}\in\mathcal{N}^{-}_{j}}\sum_{r=0}^{\bar{\tau}}p_{ji}w_{i}^{k-r}I_{k-r,ji}(r), \tag{22}$$

where $p_{lj}=\frac{1}{1+\mathcal{D}_{j}^{+}}$ for $v_{l}\in\mathcal{N}_{j}^{+}\cup\{v_{j}\}$ (zeros otherwise), with $y^{0}=[{y_{1}^{0}}^{T},{y_{2}^{0}}^{T},\ldots,{y_{|\mathcal{V}|}^{0}}^{T}]^{T}$ and $w^{0}=\mathds{1}$; $I_{k,ji}$ is the indicator function that captures the bounded delay $\tau_{ji}^{k}$ on link $(v_{j},v_{i})$ at iteration $k$ (as defined in (21), $\tau_{ji}^{k}\leq\bar{\tau}$). Then, the solution $z_{j}^{k}={y_{j}^{k}}/{w_{j}^{k}}$ to the average consensus problem is asymptotically obtained as $\lim_{k\rightarrow\infty}z_{j}^{k}=\frac{\sum_{v_{\ell}\in\mathcal{V}}y_{\ell}^{0}}{|\mathcal{V}|}$, $\forall v_{j}\in\mathcal{V}$.
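As an illustration, the following is a minimal simulation sketch of the delayed iterations (20) and (22) on a hypothetical 4-node strongly connected digraph with i.i.d. integer delays in $\{0,\ldots,\bar{\tau}\}$; the graph, initial values and all names are assumptions made for this example.

```python
import random
from collections import defaultdict

# Hypothetical 4-node strongly connected digraph: out_neighbors[j] lists the
# nodes that receive from node j. Delays are i.i.d. in {0, ..., tau_bar}.
out_neighbors = {0: [1], 1: [2], 2: [3, 0], 3: [0]}
n, tau_bar, steps = 4, 2, 300
y = [3.0, -1.0, 5.0, 1.0]   # y_j^0: the values to be averaged
w = [1.0] * n               # w_j^0 = 1
inbox = defaultdict(list)   # inbox[(k, l)]: messages arriving at node l at step k

for k in range(steps):
    for j in range(n):
        p = 1.0 / (1 + len(out_neighbors[j]))   # p_lj, column-stochastic weights
        for l in out_neighbors[j]:
            delay = random.randint(0, tau_bar)  # bounded delay tau <= tau_bar
            inbox[(k + delay, l)].append((p * y[j], p * w[j]))
        y[j], w[j] = p * y[j], p * w[j]         # own (undelayed) share p_jj
    for j in range(n):
        for dy, dw in inbox.pop((k, j), []):    # delayed terms of (20) and (22)
            y[j] += dy
            w[j] += dw

print([y[j] / w[j] for j in range(n)])  # ratios approach (3 - 1 + 5 + 1)/4 = 2.0
```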

IV-A2 Asynchronous max\max-consensus

When the updates are asynchronous, for any node $v_{j}\in\mathcal{V}$, the update rule is as follows [14]:

$$y_{j}[t_{j}(k+1)]=\max_{v_{i}\in\mathcal{N}_{j}^{-}[t_{j}(k+1)]\cup\{v_{j}\}}\{y_{i}[t_{j}(k)+\theta_{ij}(k)]\},$$

where $y_{i}[t_{j}(k)+\theta_{ij}(k)]$ are the states of the in-neighbors $\mathcal{N}_{j}^{-}[t_{j}(k+1)]$ available at the time of the update. The variable $\theta_{ij}(k)\in\mathds{R}$, evaluated with respect to the update time $t_{j}(k)$ (defined in Sec. IV-A), is used here to express asynchronous state updates occurring at the neighbors of node $v_{j}$ between two consecutive updates of the state of node $v_{j}$. It has been shown in [14, Lemma 5.1] that this algorithm converges to the maximum value among all nodes in a finite number of steps $s$, $s\leq BD$, where $D$ and $B$ are defined in Assumptions 1 and 4, respectively.
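A one-line sketch of this update for node $v_{j}$ follows; the function name is an illustrative assumption, and in an actual asynchronous run the neighbor states may be outdated copies.

```python
def max_consensus_update(y, in_neighbors, j):
    # Node v_j replaces its state by the max over itself and the latest
    # (possibly stale) states received from its in-neighbors; under
    # Assumptions 1 and 4, all nodes reach max_i y[i] within B*D steps.
    return max(y[i] for i in in_neighbors[j] + [j])
```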

V Main results

Our proposed algorithm works on two levels: i) at the optimization level, in which the distributed ADMM procedure is described for updating $x_{i}^{k+1}$, $z_{i}^{k+1}$ and $\lambda_{i}^{k+1}$ (Algorithm 1), and ii) at the distributed coordination level, for computing $z_{i}^{k+1}$ in an asynchronous manner (Algorithm 2). Algorithm 2 is executed within Algorithm 1.

V-A Algorithm 1

Unlike the asynchronous operation depicted in Fig. 1, in a synchronous algorithm all nodes need to agree on the update time $t(k)$, which usually requires synchronization among all nodes or the existence of a global clock. The question of whether the distributed ADMM update is synchronous or asynchronous arises not only when exchanging information for computing $z^{k+1}$ using (17), but also when the nodes start the next optimization step (i.e., when optimization step $k$ ends and step $k+1$ begins). For now, we assume that the transition between optimization steps is somehow synchronized; we discuss ways to achieve this later.

Our AsyAD-ADMM algorithm for solving optimization problem (12) is summarized in Algorithm 1. In Algorithm 1, at every optimization step, each node $v_{i}$ does the following:

• It computes $x_{i}^{k+1}$ using Eq. (16) locally.

• It computes $z_{i}^{k+1}$ via Algorithm 2 in an asynchronous fashion and in a finite number of steps, since Algorithm 2 deploys a termination mechanism.

• It computes $\lambda_{i}^{k+1}$ using Eq. (18) and the values of $x_{i}^{k+1}$ and $z_{i}^{k+1}$.

For the optimization, we assume that all nodes are aware of the network diameter $D$ (or an upper bound on $D$), an upper bound on time-steps $1+\bar{\tau}\leq B$, the augmented Lagrangian function parameter $\rho$, and the maximum number of ADMM optimization steps $k_{\max}$ (note that the ADMM stopping criterion is the primal and dual feasibility condition in [11]).

Algorithm 1 AsyAD-ADMM: Asynchronous Approximate Distributed ADMM
1: Input: A strongly connected digraph $\mathcal{G}(\mathcal{V},\mathcal{E})$, in which each node $v_{i}\in\mathcal{V}$ knows its out-degree $\mathcal{D}_{i}^{+}$; $\rho>0$; network diameter $D$ (or an upper bound); upper bound on the number of delay time-steps $\bar{\tau}$; error tolerance $\epsilon$; $k_{\max}$ (ADMM maximum number of iterations)
2: Initialization: At $k=0$, each node $v_{i}\in\mathcal{V}$ sets $x_{i}^{0}$, $z_{i}^{0}$, and $\lambda_{i}^{0}$ randomly.
3: Node $v_{i}\in\mathcal{V}$ does the following:
4: while $k\leq k_{\max}$ do
5:   Compute $x_{i}^{k+1}$ using Eq. (16)
6:   Compute $z_{i}^{k+1}$ via Algorithm 2 by setting $y_{i}^{0}=x_{i}^{k+1}+\lambda_{i}^{k}/\rho$ and using $D$, $\bar{\tau}$, and $\epsilon$.
7:   Compute $\lambda_{i}^{k+1}$ using Eq. (18)
8:   if ADMM stopping criterion is satisfied then
9:     Stop AsyAD-ADMM
10:  end if
11:  $k\leftarrow k+1$
12: end while

V-B Algorithm 2

Under Assumption 1, Cady et al. [15] proposed an algorithm which is based on the synchronous ratio-consensus protocol [16] and takes advantage of synchronous $\min$- and $\max$-consensus iterations to allow the nodes to determine the time step $k_{0}$ at which their ratios (i.e., $\{z_{i}^{k_{0}}\,|\,v_{i}\in\mathcal{V}\}$ in ADMM) are within $\epsilon$ of each other. However, the algorithm in [16] cannot deal with asynchronous information exchange scenarios. To circumvent this problem, and inspired by Cady et al. [15], the authors in [17] adopted the robustified ratio consensus proposed in [13] to devise a termination mechanism for average consensus with delays. Similarly, [8] proposed the corresponding asynchronous termination algorithm in the context of a quadratic distributed optimization problem, in which the algorithm is executed only once. We adopt the asynchronous termination algorithm proposed in [8] to compute $z_{i}^{k+1}$, and we extend it to accommodate the repetition of this algorithm. The algorithm, under Assumption 4, is summarized in Algorithm 2. Specifically, Algorithm 2 makes use of the following ideas:

• Each node $v_{j}$ runs the asynchronous ratio consensus of Sec. IV-A1; in our case, with initial conditions $y_{j}^{0}=x_{j}^{k+1}+\lambda_{j}^{k}/\rho$.

• At the same time, each node maintains two auxiliary states, $m_{j}^{k}$ and $M_{j}^{k}$, which are updated using the asynchronous $\min$- and $\max$-consensus of Sec. IV-A2, respectively. Both converge in $(1+\bar{\tau})D$ steps [14].

• Every $(1+\bar{\tau})D$ steps, each node checks whether $\|M_{j}^{k}-m_{j}^{k}\|<\epsilon$. If this is the case, then the ratios of all nodes are close to the asymptotic value and it stops iterating. Otherwise, $m_{j}^{k}$ and $M_{j}^{k}$ are re-initialized to $z_{j}^{k}$.

Algorithm 2 Distributed Finite-Time Termination for Asynchronous Ratio Consensus
1: Input: A strongly connected digraph $\mathcal{G}(\mathcal{V},\mathcal{E})$, in which each node $v_{j}\in\mathcal{V}$ knows its out-degree $\mathcal{D}_{j}^{+}$; $y_{j}^{0}$; $w_{j}^{0}=1$; $D$, $\bar{\tau}$, $\epsilon$.
2: set $M_{j}^{0}=+\infty$, $m_{j}^{0}=-\infty$, ${\rm flag}_{j}^{0}=0$, $z_{j}=\frac{y_{j}^{0}}{w_{j}^{0}}$
3: set $p_{lj}=\frac{1}{1+\mathcal{D}_{j}^{+}}$, $\forall v_{l}\in\mathcal{N}_{j}^{+}\cup\{v_{j}\}$ (zero otherwise)
4: for $k\geq 0$ do
5:   while ${\rm flag}_{j}^{k}=0$ do
6:     if $k \bmod (1+\bar{\tau})D=0$ and $k\neq 0$ then
7:       if $\|M_{j}^{k}-m_{j}^{k}\|<\epsilon$ then
8:         set ${\rm flag}_{j}^{k}=1$
9:       end if
10:      set $M_{j}^{k}=m_{j}^{k}=z_{j}^{k}=\frac{y_{j}^{k}}{w_{j}^{k}}$
11:    end if
12:    broadcast to all $v_{l}\in\mathcal{N}_{j}^{+}$: $p_{lj}y_{j}^{k}$, $p_{lj}w_{j}^{k}$, $M_{j}^{k}$, $m_{j}^{k}$
13:    receive from all $v_{i}\in\mathcal{N}_{j}^{-}[k]$: $p_{ji}y_{i}^{k}$, $p_{ji}w_{i}^{k}$, $M_{i}^{k}$, $m_{i}^{k}$
14:    compute
15:    $y_{j}^{k}\leftarrow p_{jj}y_{j}^{k}+\sum_{v_{i}\in\mathcal{N}^{-}_{j}}\sum_{r=0}^{\bar{\tau}}p_{ji}y_{i}^{k-r}I_{k-r,ji}(r)$
16:    $w_{j}^{k}\leftarrow p_{jj}w_{j}^{k}+\sum_{v_{i}\in\mathcal{N}^{-}_{j}}\sum_{r=0}^{\bar{\tau}}p_{ji}w_{i}^{k-r}I_{k-r,ji}(r)$
17:    $M_{j}^{k}\leftarrow\max_{v_{i}\in\mathcal{N}_{j}^{-}\cup\{v_{j}\}}\{M_{i}[t_{j}(k)+\theta_{ij}(k)]\}$
18:    $m_{j}^{k}\leftarrow\min_{v_{i}\in\mathcal{N}_{j}^{-}\cup\{v_{j}\}}\{m_{i}[t_{j}(k)+\theta_{ij}(k)]\}$
19:  end while
20: end for

As stated in Assumption 5, the upper bound of $B$ time-steps is guaranteed to be executed within $T$ seconds (actual time). Hence, the nodes have a looser synchronization of every $DT$ seconds. The $\max$- and $\min$-consensus values are checked every $DT$ seconds and, if they have converged at one of these checks, the next optimization step is initiated by all nodes at the next multiple of $DT$ seconds after the termination of the algorithm. This approach requires some loose form of synchronization, but on a much coarser time scale.
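A compact sketch of the termination logic (steps 6-11 of Algorithm 2) for node $v_{j}$ follows; the function name and its state-passing style are illustrative assumptions.

```python
def check_and_reset(k, M_j, m_j, y_j, w_j, D, tau_bar, eps):
    # Every (1 + tau_bar) * D steps: if the max- and min-consensus states
    # agree to within eps, all ratios z_i = y_i / w_i are within eps of each
    # other and node v_j may stop; either way, both states are re-initialized
    # to the current ratio z_j^k = y_j / w_j (step 10 of Algorithm 2).
    if k % ((1 + tau_bar) * D) == 0 and k != 0:
        done = abs(M_j - m_j) < eps
        M_j = m_j = y_j / w_j
        return done, M_j, m_j
    return False, M_j, m_j
```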

V-C Convergence analysis

The following Theorem 1 states that AsyAD-ADMM has an $O(\frac{1}{k})$ convergence rate.

Theorem 1 (Convergence)

Let $\{X^{k},z^{k},\lambda^{k}\}$ be the iterates of the AsyAD-ADMM algorithm for problem (12), where $X^{k}=[{x_{1}^{k}}^{T},{x_{2}^{k}}^{T},\ldots,{x_{n}^{k}}^{T}]^{T}$ and $\lambda^{k}=[{\lambda_{1}^{k}}^{T},{\lambda_{2}^{k}}^{T},\ldots,{\lambda_{n}^{k}}^{T}]^{T}$. Let $\bar{X}^{k}=\frac{1}{k}\sum_{s=0}^{k-1}X^{s+1}$ and $\bar{z}^{k}=\frac{1}{k}\sum_{s=0}^{k-1}z^{s+1}$ be the ergodic averages of $X^{k}$ and $z^{k}$, respectively. Considering a strongly connected communication graph, under Assumptions 1-5, the following relationship holds for any iteration $k$:

$$0\leq L(\bar{X}^{k},\bar{z}^{k},\lambda^{*})-L(X^{*},z^{*},\lambda^{*})\leq\frac{1}{k}\left(\frac{1}{2\rho}\|\lambda^{*}-\lambda^{0}\|^{2}+\frac{\rho}{2}\|X^{*}-z^{0}\|^{2}\right)+\mathcal{O}(\sqrt{n}\epsilon), \tag{23}$$

where $\epsilon$ is the tolerance of the $z$ update (17), whose value is independent of the problem under consideration.

Proof:

See Appendix A. ∎

Remark 1

Note that the convergence proof shows the convergence rate with respect to the optimization steps. Since the distributed algorithm is approximate, an additional term $\mathcal{O}(\sqrt{n}\epsilon)$ is introduced, which is a function of the error tolerance and the size of the network, and which influences the solution precision.

Remark 2

The convergence proof here is fundamentally different from the ones in [12] and [6], as the investigated problems are different. Specifically, the D-ADMM proposed in [12] can only be applied to undirected graphs, as the constraint $AX=0$ is needed to minimize the objective function (7), where the matrix $A$ is related to the communication graph structure, which must be undirected. The authors in [6] proposed a D-ADMM for digraphs with the same constraint $\|x_{i}-x_{j}\|\leq\epsilon$, $\epsilon>0$; however, it can only be applied to the synchronous case.

VI Numerical Example

The following distributed least-squares problem is considered:

$$\operatorname*{argmin}_{x\in\mathds{R}^{p}}f(x)=\frac{1}{2}\sum_{i=1}^{n}\|A_{i}x-b_{i}\|^{2}, \tag{24}$$

where $A_{i}\in\mathds{R}^{q\times p}$ is known only to node $v_{i}$, $b_{i}\in\mathds{R}^{q}$ is the measured data, and $x\in\mathds{R}^{p}$ is the common decision variable that needs to be optimized. For the automatic generation of a large number of different matrices $A_{i}$, we choose $q=p$ so that $A_{i}$ is square. All elements of $A_{i}$ and $b_{i}$ are drawn as independent and identically distributed (i.i.d.) samples of the standard normal distribution $\mathcal{N}(0,1)$.
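For this cost, update (16) has a closed form: setting the gradient of $\frac{1}{2}\|A_{i}x-b_{i}\|^{2}+{\lambda_{i}^{k}}^{T}x+\frac{\rho}{2}\|x-z_{i}^{k}\|^{2}$ to zero gives $(A_{i}^{T}A_{i}+\rho I)x_{i}^{k+1}=A_{i}^{T}b_{i}-\lambda_{i}^{k}+\rho z_{i}^{k}$. A sketch follows; the data generation mirrors the experiment's distribution, and all names are illustrative.

```python
import numpy as np

def x_update_ls(A_i, b_i, z_i, lam_i, rho):
    # Closed-form solution of update (16) for f_i(x) = ||A_i x - b_i||^2 / 2:
    # (A_i^T A_i + rho I) x = A_i^T b_i - lam_i + rho z_i.
    p = A_i.shape[1]
    rhs = A_i.T @ b_i - lam_i + rho * z_i
    return np.linalg.solve(A_i.T @ A_i + rho * np.eye(p), rhs)

# Illustrative local data with q = p = 3, entries i.i.d. N(0, 1):
rng = np.random.default_rng(0)
A_i, b_i = rng.standard_normal((3, 3)), rng.standard_normal(3)
x_next = x_update_ls(A_i, b_i, np.zeros(3), np.zeros(3), rho=1.0)
```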

We choose $p=3$ and $n=600$, i.e., 600 nodes forming a strongly connected digraph. We implement the D-ADMM-FTERC algorithm of [7] as a benchmark to evaluate the factors (i.e., $\epsilon$, $\bar{\tau}$) influencing AsyAD-ADMM performance, as D-ADMM-FTERC computes the optimal solution to distributed optimization problems over digraphs, i.e., all figures related to D-ADMM-FTERC are optimal. However, D-ADMM-FTERC is synchronous, which is the reason we develop AsyAD-ADMM in this paper.

Fig. 2 demonstrates three points: (i) the AsyAD-ADMM solutions are very close to the optimal one from D-ADMM-FTERC. (ii) From Fig. 2 (a) to (b), the smaller the value of $\epsilon$, the closer the AsyAD-ADMM solutions are to the D-ADMM-FTERC one. This is reasonable, as a smaller value of $\epsilon$ means a more accurate "approximate" ratio consensus for the $z$ update (17) using Algorithm 2.

Figure 2: Comparison of solutions to problem (24) between AsyAD-ADMM in this paper and the synchronous D-ADMM-FTERC in [7] without stopping condition (steps 8-10 in Algorithm 1, $k_{\max}=200$): subplots (a) and (b) correspond to different values of $\epsilon$ and $\bar{\tau}$ influencing the performance of AsyAD-ADMM.

This can also be validated in Fig. 3, where the red and black lines are the $z$ updates using AsyAD-ADMM and D-ADMM-FTERC, respectively. (iii) The value of $\bar{\tau}$ (defined in Sec. IV-A1) does not influence the precision of AsyAD-ADMM, which also makes sense, as $\bar{\tau}$ determines the degree of asynchrony, not the asynchronous ADMM precision. Nodes use this information to decide when to update, as shown in step 6 of Algorithm 2. The value of $\bar{\tau}$ does, however, influence the number of iteration steps of Algorithm 2, as described in Table I, which shows that the larger the value of $\bar{\tau}$, the more iteration steps Algorithm 2 needs in each AsyAD-ADMM optimization step. For $\epsilon=0.01$, the steps all equal 1000 because we set the iteration upper bound for Algorithm 2 to 1000; this means the number of iteration steps is no less than 1000 when $\epsilon\leq 0.01$ and $\bar{\tau}\geq 3$.

Figure 3: $z^{k}$ ($z^{k}_{ij}$, $i=1,\ldots,n$, $j=1,\ldots,p$) update in (17) with $\bar{\tau}=3$ (red and black lines are from AsyAD-ADMM and D-ADMM-FTERC, respectively) under the stopping condition (steps 8-10 in Algorithm 1, absolute tolerance: $10^{-4}$, relative tolerance: $10^{-2}$).

Note that in Fig. 3 we run AsyAD-ADMM with the stopping condition, for the convenience of figure presentation. The reason Fig. 3 (a) and (b) differ substantially is the constraint $\|z_{i}-z_{j}\|\leq\epsilon$: a smaller $\epsilon$ leads to a more accurate $z$ update (17) using Algorithm 2, because a smaller $\epsilon$ requires more Algorithm 2 iteration steps (9 versus 1000 in Table I for $\epsilon=0.1$ and $\epsilon=0.01$ in this example). Also from Fig. 3, one can see that the number of ADMM iterations is the same for different values of $\epsilon$ for AsyAD-ADMM with the stopping condition. Furthermore, Table II reports the running times on an Intel Core i5 processor at 2.6 GHz with Matlab R2020b for different values of $\epsilon$ and $\bar{\tau}$ with $k_{\max}=200$. One can see that for $\epsilon=0.01$, as $\bar{\tau}$ increases from 3 to 5 to 10, even though the Algorithm 2 iteration steps stay at 1000, the running time keeps increasing.

TABLE I: Algorithm 2 iteration steps in each Algorithm 1 iteration.
                  $\bar{\tau}=3$   $\bar{\tau}=5$   $\bar{\tau}=10$
$\epsilon=0.1$    9                13               23
$\epsilon=0.01$   1000             1000             1000

TABLE II: AsyAD-ADMM Algorithm 1 running time.
                  $\bar{\tau}=3$   $\bar{\tau}=5$   $\bar{\tau}=10$
$\epsilon=0.1$    7.9677 s         12.0282 s        33.6401 s
$\epsilon=0.01$   190.1420 s       246.9963 s       470.6576 s

This is because the running time of Algorithm 2 is the time gap between two consecutive iterations multiplied by the number of iteration steps, and, by step 6 of Algorithm 2, the time gap grows with $(1+\bar{\tau})D$. Fig. 4 demonstrates two points: (i) it validates Theorem 1, i.e., that AsyAD-ADMM has an $\mathcal{O}(\frac{1}{k})$ convergence rate; (ii) in accordance with Fig. 2 and Fig. 3, a smaller value of $\epsilon$ yields better solution precision.

Moreover, from Fig. 2 and Table II, considering both the AsyAD-ADMM solution precision and the running time, it is better to choose a value of $\epsilon$ that is neither too large nor too small; for this example, $\epsilon\in[0.01,0.1)$ is a good range.

Figure 4: Comparison of the absolute errors for different values of $\epsilon$ with $\bar{\tau}=3$ and the stopping condition (steps 8-10 in Algorithm 1, absolute tolerance: $10^{-4}$, relative tolerance: $10^{-2}$).

VII Conclusions and Future Directions

An asynchronous approximate distributed alternating direction method of multipliers algorithm has been proposed to solve distributed optimization problems over digraphs with asynchronous information exchange among nodes, under the assumptions of bounded time-index steps for the asynchronous nodes and of the digraph diameter being known to all nodes. By means of the proposed finite-time asynchronous approximate ratio consensus algorithm for one of the ADMM primal variable updates, the solution obtained is approximate, yet close to the optimal one. How to choose the algorithm parameters to obtain good performance has also been discussed.

Future work will focus on designing a new asynchronous ADMM version that does not require knowledge of the communication graph diameter to compute the optimal solution of distributed optimization problems.

Appendix A Proof of Theorem 1

The analysis is inspired by [12] and [6]. The first inequality in (23) follows directly from the second inequality in the saddle-point property (14) of the Lagrangian.

We now prove the second inequality in (23). For each node $v_{i}$, since $x_{i}^{k+1}$ minimizes $L_{\rho}(x,z^{k},\lambda^{k})$ in (16), by the optimality condition we have

$$(x-x_{i}^{k+1})^{T}[h_{i}(x_{i}^{k+1})+\lambda_{i}^{k}+\rho(x_{i}^{k+1}-z_{i}^{k})]\geq 0, \tag{25}$$

where $h_{i}(x_{i}^{k+1})$ is the sub-gradient of $f_{i}$ at $x_{i}^{k+1}$. By substituting $x_{i}^{k+1}=(\lambda_{i}^{k+1}-\lambda_{i}^{k})/\rho+z_{i}^{k+1}$ from (18) into the above inequality, we have

$$(x-x_{i}^{k+1})^{T}[h_{i}(x_{i}^{k+1})+\lambda_{i}^{k+1}+\rho(z_{i}^{k+1}-z_{i}^{k})]\geq 0. \tag{26}$$

The above $n$ inequalities can be written in compact form as

$$(X-X^{k+1})^{T}[\bar{h}(X^{k+1})+\lambda^{k+1}+\rho(z^{k+1}-z^{k})]\geq 0, \tag{27}$$

where $\bar{h}(X^{k+1})=[h_{1}^{T}(x_{1}^{k+1}),\ldots,h_{n}^{T}(x_{n}^{k+1})]^{T}$. Denote

$$\tilde{z}_{i}^{k+1}\coloneqq\frac{1}{n}\sum_{l=1}^{n}\left(x_{l}^{k+1}+\frac{\lambda_{l}^{k}}{\rho}\right), \tag{28}$$

and $\tilde{z}^{k+1}=[{\tilde{z}_{1}^{k+1}}^{T},\ldots,{\tilde{z}_{n}^{k+1}}^{T}]^{T}$. Based on the "approximate" consensus Algorithm 2, we denote the actual $z$ update (17) at step $k+1$ as $z^{k+1}\coloneqq\tilde{z}^{k+1}+e^{k+1}$, $e^{k+1}=[{e_{1}^{k+1}}^{T},\ldots,{e_{n}^{k+1}}^{T}]^{T}$, with $\|e_{i}^{k+1}\|\leq\epsilon$, $i=1,\ldots,n$, $k=0,1,\ldots$, i.e., $\|e^{k+1}\|\leq\sqrt{n}\epsilon$. Since $\tilde{z}^{k+1}$ minimizes $L_{\rho}(X^{k+1},z,\lambda^{k})$ in (17), similarly, we have

$$(z-\tilde{z}^{k+1})^{T}[\bar{g}(\tilde{z}^{k+1})-\lambda^{k}-\rho(X^{k+1}-\tilde{z}^{k+1})]=(z-\tilde{z}^{k+1})^{T}(\bar{g}(\tilde{z}^{k+1})-\lambda^{k+1}-\rho e^{k+1})\geq 0 \tag{29}$$

for all $z\in\mathcal{C}$, where $\bar{g}(\tilde{z}^{k+1})$ is the sub-gradient of $g$ at $\tilde{z}^{k+1}$. As both $F$ and $g$ are convex, by utilizing the sub-gradient inequality, from (27) and (29) we get

$$\begin{aligned} F(X^{k+1})&-F(X)+g(\tilde{z}^{k+1})-g(z)\\ \leq&-(X-X^{k+1})^{T}\bar{h}(X^{k+1})-(z-\tilde{z}^{k+1})^{T}\bar{g}(\tilde{z}^{k+1})\\ \leq&(X-X^{k+1})^{T}[\lambda^{k+1}+\rho(z^{k+1}-z^{k})]+(z-\tilde{z}^{k+1})^{T}(-\lambda^{k+1}-\rho e^{k+1})\\ =&{\lambda^{k+1}}^{T}(X-X^{k+1}-z+z^{k+1}-e^{k+1})+\rho(X-X^{k+1})^{T}(z^{k+1}-z^{k})-\rho(z-\tilde{z}^{k+1})^{T}e^{k+1}\\ =&{\lambda^{k+1}}^{T}(X-X^{k+1}-z+z^{k+1})+\rho(X-X^{k+1})^{T}(z^{k+1}-z^{k})-[\lambda^{k+1}+\rho(z-\tilde{z}^{k+1})]^{T}e^{k+1}. \end{aligned} \tag{30}$$

From (9), (11) and (28), we have $g(\tilde{z}^{k+1})=g(z^{k+1})$, as $\tilde{z}^{k+1},z^{k+1}\in\mathcal{C}$. Due to the feasibility of the optimal solution $(X^{*},z^{*})$, we obtain $X^{*}-z^{*}=0$. Setting $X=X^{*}$, $z=z^{*}$, (30) becomes

$$F(X^{k+1})-F(X^{*})+g(z^{k+1})-g(z^{*})\leq{\lambda^{k+1}}^{T}(z^{k+1}-X^{k+1})+\rho(X^{*}-X^{k+1})^{T}(z^{k+1}-z^{k})-\eta^{k+1}, \tag{31}$$

where $\eta^{k+1}\coloneqq[\lambda^{k+1}+\rho(z^{*}-\tilde{z}^{k+1})]^{T}e^{k+1}$. From Appendix A of [11], we know that both $\lambda^{k+1}$ and $z^{*}-\tilde{z}^{k+1}$ are bounded, and that $z^{*}-\tilde{z}^{k+1}$ goes to zero as $k\rightarrow\infty$. Therefore, for $k=1,2,\ldots$, there exist constants $M_{\lambda}$ and $M_{z}$ such that

$$\|\lambda^{k}\|\leq M_{\lambda},\quad\|z^{*}-\tilde{z}^{k}\|\leq M_{z}^{k},\quad M_{z}^{k}\leq M_{z},\quad\lim_{k\to\infty}M_{z}^{k}=0.$$

Denote $\bar{F}^{k+1}\coloneqq F(X^{k+1})-F(X^{*})+g(z^{k+1})-g(z^{*})+{\lambda^{*}}^{T}(X^{k+1}-z^{k+1})$ and $\phi^{k+1}\coloneqq\rho(X^{*}-X^{k+1})^{T}(z^{k+1}-z^{k})$. Adding ${\lambda^{*}}^{T}(X^{k+1}-z^{k+1})$ to both sides of (31) yields

$$\begin{aligned}\bar{F}^{k+1}\leq&(\lambda^{*}-\lambda^{k+1})^{T}(X^{k+1}-z^{k+1})+\phi^{k+1}-\eta^{k+1}\\ =&\frac{1}{\rho}(\lambda^{*}-\lambda^{k+1})^{T}(\lambda^{k+1}-\lambda^{k})+\phi^{k+1}-\eta^{k+1},\end{aligned} \tag{32}$$

where the last equality follows from (18). Recall the well-known identity

$$(a_{1}-a_{2})^{T}(a_{3}-a_{4})=\frac{1}{2}(\|a_{1}-a_{4}\|^{2}-\|a_{1}-a_{3}\|^{2})+\frac{1}{2}(\|a_{2}-a_{3}\|^{2}-\|a_{2}-a_{4}\|^{2}),\quad\forall a_{1},a_{2},a_{3},a_{4}\in\mathds{R}^{p}. \tag{33}$$
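Expanding the squared norms verifies (33): the right-hand side equals
$$a_{1}^{T}(a_{3}-a_{4})+\tfrac{1}{2}(\|a_{4}\|^{2}-\|a_{3}\|^{2})-a_{2}^{T}(a_{3}-a_{4})+\tfrac{1}{2}(\|a_{3}\|^{2}-\|a_{4}\|^{2})=(a_{1}-a_{2})^{T}(a_{3}-a_{4}).$$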

Then, using identity (33), (32) becomes

$$\begin{aligned}\bar{F}^{k+1}\leq&\frac{1}{2\rho}(\|\lambda^{*}-\lambda^{k}\|^{2}-\|\lambda^{*}-\lambda^{k+1}\|^{2}-\|\lambda^{k+1}-\lambda^{k}\|^{2})\\&+\frac{\rho}{2}(\|X^{*}-z^{k}\|^{2}-\|X^{*}-z^{k+1}\|^{2}+\|X^{k+1}-z^{k+1}\|^{2}-\|X^{k+1}-z^{k}\|^{2})-\eta^{k+1}\\ \leq&\frac{1}{2\rho}(\|\lambda^{*}-\lambda^{k}\|^{2}-\|\lambda^{*}-\lambda^{k+1}\|^{2})+\frac{\rho}{2}(\|X^{*}-z^{k}\|^{2}-\|X^{*}-z^{k+1}\|^{2})-\eta^{k+1},\end{aligned} \tag{34}$$

where the last inequality comes from using (18) and dropping the negative term $-\frac{\rho}{2}\|X^{k+1}-z^{k}\|^{2}$. Now, replacing the iteration index $k$ by $s$, (34) reads

$$\bar{F}^{s+1}\leq\frac{1}{2\rho}(\|\lambda^{*}-\lambda^{s}\|^{2}-\|\lambda^{*}-\lambda^{s+1}\|^{2})+\frac{\rho}{2}(\|X^{*}-z^{s}\|^{2}-\|X^{*}-z^{s+1}\|^{2})-\eta^{s+1}, \tag{35}$$

which holds for all $s$. Denote the constant $\theta\coloneqq\|\lambda^{*}-\lambda^{0}\|^{2}/(2\rho)+\rho\|X^{*}-z^{0}\|^{2}/2$. Summing (35) over $s=0,1,\ldots,k-1$ and telescoping, we have

$$\sum_{s=0}^{k-1}F(X^{s+1})-kF(X^{*})+\sum_{s=0}^{k-1}g(z^{s+1})-kg(z^{*})+{\lambda^{*}}^{T}\sum_{s=0}^{k-1}(X^{s+1}-z^{s+1})\leq\theta-\frac{1}{2\rho}\|\lambda^{*}-\lambda^{k}\|^{2}-\frac{\rho}{2}\|X^{*}-z^{k}\|^{2}-\sum_{s=0}^{k-1}\eta^{s+1}. \tag{36}$$

Due to the convexity of both $F$ and $g$, we get $kF(\bar{X}^{k})\leq\sum_{s=0}^{k-1}F(X^{s+1})$ and $kg(\bar{z}^{k})\leq\sum_{s=0}^{k-1}g(z^{s+1})$. Thus, by the definition of $\bar{X}^{k},\bar{z}^{k}$ and dropping the negative terms,

$$kF(\bar{X}^{k})-kF(X^{*})+kg(\bar{z}^{k})-kg(z^{*})+{\lambda^{*}}^{T}(k\bar{X}^{k}-k\bar{z}^{k})\leq\theta-\sum_{s=0}^{k-1}\eta^{s+1}\leq\theta+k\underbrace{(M_{\lambda}+\rho M_{z})\sqrt{n}\epsilon}_{\coloneqq\mathcal{O}(\sqrt{n}\epsilon)}. \tag{37}$$

Using $X^{*}-z^{*}=0$ and dividing (37) by $k$, together with the definition of the Lagrangian function [cf. (13)], proves (23).

References

  • [1] R. Zhang and J. Kwok, “Asynchronous distributed ADMM for consensus optimization,” in International Conference on Machine Learning.   PMLR, 2014, pp. 1701–1709.
  • [2] T.-H. Chang, M. Hong, W.-C. Liao, and X. Wang, “Asynchronous distributed ADMM for large-scale optimization-Part II: Algorithm and convergence analysis,” IEEE Transactions on Signal Processing, vol. 64, no. 12, pp. 3118–3130, 2016.
  • [3] S. E. Li, Z. Wang, Y. Zheng, Q. Sun, J. Gao, F. Ma, and K. Li, “Synchronous and asynchronous parallel computation for large-scale optimal control of connected vehicles,” Transportation Research Part C: Emerging Technologies, vol. 121, p. 102842, 2020.
  • [4] E. Wei and A. Ozdaglar, “On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers,” in 2013 IEEE Global Conference on Signal and Information Processing, 2013, pp. 551–554.
  • [5] N. Bastianello, R. Carli, L. Schenato, and M. Todescato, “Asynchronous distributed optimization over lossy networks via relaxed ADMM: Stability and linear convergence,” IEEE Transactions on Automatic Control, 2020.
  • [6] V. Khatana and M. V. Salapaka, “D-DistADMM: A $O(1/k)$ distributed ADMM for distributed optimization in directed graph topologies,” in Proc. 59th IEEE Conf. Dec. Control, Dec. 2020, pp. 2992–2997.
  • [7] W. Jiang and T. Charalambous, “Distributed alternating direction method of multipliers using finite-time exact ratio consensus in digraphs,” in Proc. 19th European Control Conf., Jun. 2021, accepted.
  • [8] A. Grammenos, T. Charalambous, and E. Kalyvianaki, “CPU scheduling in data centers using asynchronous finite-time distributed coordination mechanisms,” arXiv preprint arXiv:2101.06139, 2021.
  • [9] I. Shames, T. Charalambous, C. Hadjicostis, and M. Johansson, “Distributed network size estimation and average degree estimation and control in networks isomorphic to directed graphs,” in 50th Ann. Allerton Conf. Commun., Control, and Computing, Oct. 2012, pp. 1885–1892.
  • [10] T. Charalambous, M. G. Rabbat, M. Johansson, and C. N. Hadjicostis, “Distributed finite-time computation of digraph parameters: Left-eigenvector, out-degree and spectrum,” IEEE Transactions on Control of Network Systems, vol. 3, no. 2, pp. 137–148, 2016.
  • [11] S. Boyd, N. Parikh, and E. Chu, Distributed optimization and statistical learning via the alternating direction method of multipliers.   Now Publishers Inc, 2011.
  • [12] E. Wei and A. Ozdaglar, “Distributed alternating direction method of multipliers,” in Proc. 51st IEEE Conf. Dec. Control, 2012, pp. 5445–5450.
  • [13] C. N. Hadjicostis and T. Charalambous, “Average consensus in the presence of delays in directed graph topologies,” IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 763–768, March 2014.
  • [14] S. Giannini, D. Di Paola, A. Petitti, and A. Rizzo, “On the convergence of the max-consensus protocol with asynchronous updates,” in IEEE Conference on Decision and Control (CDC), 2013, pp. 2605–2610.
  • [15] S. T. Cady, A. D. Domínguez-García, and C. N. Hadjicostis, “Finite-time approximate consensus and its application to distributed frequency regulation in islanded AC microgrids,” in Proc. of Hawaii International Conference on System Sciences, 2015, pp. 2664–2670.
  • [16] A. D. Domínguez-García and C. N. Hadjicostis, “Coordination and control of distributed energy resources for provision of ancillary services,” in IEEE Int. Conf. Smart Grid Commun., Oct. 2010, pp. 537–542.
  • [17] M. Prakash, S. Talukdar, S. Attree, V. Yadav, and M. V. Salapaka, “Distributed stopping criterion for consensus in the presence of delays,” IEEE Transactions on Control of Network Systems, vol. 7, no. 1, pp. 85–95, 2020.