
Distributed Mirror Descent Algorithm with Bregman Damping for Nonsmooth Constrained Optimization

Guanpu Chen, Weijian Li, Gehui Xu, and Yiguang Hong

This work was supported in part by the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0100, and in part by the National Natural Science Foundation of China under Grant 61733018. (Corresponding author: Yiguang Hong.)

G. Chen and G. Xu are with the Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Beijing, China, and also with the School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China (chengp@amss.ac.cn, xghapple@amss.ac.cn).

W. Li is with the Department of Automation, University of Science and Technology of China, Hefei, Anhui, China (ustcwjli@mail.ustc.edu.cn).

Y. Hong is with the Department of Control Science and Engineering & Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai, China, and also with the Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Beijing, China (yghong@iss.ac.cn).
Abstract

To solve distributed optimization problems with various constraints and nonsmooth cost functions efficiently, we propose a distributed mirror descent algorithm with embedded Bregman damping, as a generalization of conventional distributed projection-based algorithms. Our continuous-time algorithm inherits the ability of mirror descent approaches to rapidly compute explicit solutions to problems with specific constraint structures. Moreover, we rigorously prove the convergence of the algorithm, along with the boundedness of its trajectories and the accuracy of the obtained solution.

I INTRODUCTION

Distributed optimization has been an active research topic in recent years owing to its broad applications in various fields such as sensor networks and smart grids [1, 2, 3, 4, 5, 6, 7]. In multi-agent frameworks, the global cost function consists of agents' local ones, and each agent shares limited information with its neighbors through a network to achieve an optimal solution. Meanwhile, distributed continuous-time algorithms have been well developed thanks to tools from system dynamics and control theory [8, 9, 10, 11, 12].

Up to now, various approaches have been employed in the distributed design for constrained optimization. Implementing projection operations on local constraints is one of the most popular methods, as in the projected proportional-integral protocol [8] and the projected dynamics with constraints based on KKT conditions [9]. In addition, other approaches such as distributed primal-dual dynamics and penalty-based algorithms [10, 11, 12] also perform well, provided that the constraints admit specific expressions. However, the time complexity of finding optimal solutions under complex or high-dimensional constraints forces researchers to seek efficient approaches for special constraint structures, such as the unit simplex and the Euclidean sphere.

In fact, the mirror descent (MD) method serves as a powerful tool for solving constrained optimization. First introduced in [13], MD is regarded as a generalization of (sub)gradient methods. By mapping the variables into a conjugate space, MD employs the Bregman divergence and performs well in handling local constraints with specific structures [14, 15, 16]. This results in a faster convergence rate than that of projected (sub)gradient descent algorithms, especially for large-scale optimization problems. As such, MD has played a crucial role in various distributed algorithm designs, as in [17, 18, 19, 20].
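As an illustration of how the Bregman divergence replaces the Euclidean regularization, the following is a minimal sketch (not from the paper) of an entropic mirror descent iteration on the unit simplex, often called exponentiated gradient; the step size and the toy objective are illustrative assumptions.

```python
import numpy as np

def md_step(x, grad, eta):
    # Entropic mirror descent on the unit simplex: multiplicative
    # update in the dual space followed by re-normalization.
    w = x * np.exp(-eta * grad)
    return w / w.sum()

# Toy objective f(x) = 0.5*||x - c||^2 with c already on the simplex,
# so the constrained minimizer is c itself.
c = np.array([0.2, 0.5, 0.3])
x = np.ones(3) / 3
for _ in range(200):
    x = md_step(x, x - c, eta=0.5)   # gradient of f at x is x - c
```

Every iterate stays on the simplex by construction, with no projection step needed; this is exactly the efficiency that motivates MD for such constraint structures.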

In recent years, continuous-time MD-based algorithms have also attracted much attention. For example, [14] proposed an accelerated continuous-time MD algorithm; afterward, [21] studied continuous-time stochastic MD for strongly convex functions, while [22] proposed a discounted continuous-time MD dynamics to approximate the exact solution. In the distributed setting, although [19] presented a distributed MD dynamics with integral feedback, the result merely achieved optimal consensus, and some of the variables may become unbounded. Given the booming development of and extensive demand for distributed design, distributed continuous-time MD-based methods still call for further exploration.

Therefore, we study continuous-time MD-based algorithms to solve distributed nonsmooth optimization with local and coupled constraints. The main contributions of this note can be summarized as follows. We propose a distributed continuous-time mirror descent algorithm by introducing the Bregman damping, which can be regarded as a generalization of classic distributed projection-based dynamics [8, 23], recovered by taking the Bregman damping in a quadratic form. Moreover, our algorithm inherits the ability of MD-based approaches to rapidly compute explicit solutions to problems with concrete constraint structures such as the unit simplex or the Euclidean sphere. With the designed Bregman damping, our MD-based algorithm keeps all the variables' trajectories bounded, which could not be ensured in [14, 19], and avoids the inaccuracy of the convergent point that occurred in [22].

The remaining part is organized as follows. Section II gives related preliminary knowledge. Next, Section III formulates the distributed optimization and provides our algorithm, while Section IV presents the main results. Then, Section V provides illustrative numerical examples. Finally, Section VI gives the conclusion.

II Preliminaries

In this section, we give necessary notations and related preliminary knowledge.

II-A Notations

Denote by $\mathbb{R}^{n}$ (or $\mathbb{R}^{m\times n}$) the set of $n$-dimensional real column vectors (or $m$-by-$n$ real matrices), and by $I_{n}$ the $n\times n$ identity matrix. Let $1_{n}$ (or $0_{n}$) be the $n$-dimensional column vector with all entries equal to $1$ (or $0$). Denote by $A\otimes B$ the Kronecker product of matrices $A$ and $B$. Take $\mathrm{col}\{x_{1},\dots,x_{n}\}=\mathrm{col}\{x_{i}\}_{i=1}^{n}=(x_{1}^{\rm T},\dots,x_{n}^{\rm T})^{\rm T}$, $\|\cdot\|$ as the Euclidean norm, and $\mathrm{rint}(C)$ as the relative interior of the set $C$ [24].

An undirected graph is defined by $\mathcal{G}(\mathcal{V},\mathcal{E})$, where $\mathcal{V}=\{1,\ldots,n\}$ is the set of nodes and $\mathcal{E}\subset\mathcal{V}\times\mathcal{V}$ is the set of edges. Let $\mathcal{A}=[a_{ij}]\in\mathbb{R}^{n\times n}$ be the adjacency matrix of $\mathcal{G}$, with $a_{ij}=a_{ji}>0$ if $\{j,i\}\in\mathcal{E}$ and $a_{ij}=0$ otherwise. The Laplacian matrix is $L_{n}=\mathcal{D}-\mathcal{A}$, where $\mathcal{D}=\mathrm{diag}\{\mathcal{D}_{ii}\}\in\mathbb{R}^{n\times n}$ with $\mathcal{D}_{ii}=\sum_{j=1}^{n}a_{ij}$. If the graph $\mathcal{G}$ is connected, then $\ker(L_{n})=\{k1_{n}:k\in\mathbb{R}\}$.
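As a quick illustration of these graph objects (not part of the paper), the following builds the Laplacian of a small connected graph and checks that $1_{n}$ spans its kernel; the particular 4-node path graph and unit weights are assumptions for the example.

```python
import numpy as np

# Symmetric adjacency matrix of a connected 4-node path graph.
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(A.sum(axis=1))   # degree matrix with D_ii = sum_j a_ij
L = D - A                    # graph Laplacian L = D - A

# For a connected graph, L*1 = 0 and rank(L) = n - 1,
# so ker(L) = {k*1 : k real}.
print(L @ np.ones(4))             # zero vector
print(np.linalg.matrix_rank(L))   # 3
```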

II-B Convex analysis

For a closed convex set $\Omega\subseteq\mathbb{R}^{n}$, the projection map $P_{\Omega}:\mathbb{R}^{n}\rightarrow\Omega$ is defined as $P_{\Omega}(x)=\mathrm{argmin}_{y\in\Omega}\|x-y\|$. In particular, denote $[x]^{+}\triangleq P_{\mathbb{R}^{n}_{+}}(x)$ for convenience. For $x\in\Omega$, the normal cone to $\Omega$ at $x$ is

$\mathcal{N}_{\Omega}(x)=\big\{v\in\mathbb{R}^{n}:v^{\rm T}(y-x)\leq 0,\ \forall y\in\Omega\big\}.$

A continuous function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is $\omega$-strongly convex on $\Omega$ if

$(x-y)^{\rm T}(g_{x}-g_{y})\geq\omega\|x-y\|^{2},\quad\forall x,y\in\Omega,$

where $\omega>0$, $g_{x}\in\partial f(x)$, and $g_{y}\in\partial f(y)$.

The Bregman divergence based on a differentiable generating function $f:\Omega\rightarrow\mathbb{R}$ is defined as

$D_{f}(x,y)=f(x)-f(y)-\nabla f(y)^{\rm T}(x-y),\quad\forall x,y\in\Omega.$

The convex conjugate function of $f$ is defined as

$f^{*}(z)=\sup_{x\in\Omega}\big\{x^{\rm T}z-f(x)\big\}.$

The following lemma gives a classical conclusion about convex conjugate functions; more details can be found in [13, 15].

Lemma 1

Suppose that a function $f$ is differentiable and strongly convex on a closed convex set $\Omega$. Then $f^{*}$ is convex and differentiable, and $f^{*}(z)=-\min_{x\in\Omega}\big\{-x^{\rm T}z+f(x)\big\}$. Moreover, $\nabla f^{*}(z)=\Pi_{\Omega}^{f}(z)$, where

$\Pi_{\Omega}^{f}(z)\triangleq\mathrm{argmin}_{x\in\Omega}\big\{-x^{\rm T}z+f(x)\big\}.$   (1)
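Lemma 1 can be checked numerically. In the sketch below (an illustration, not from the paper), we take the quadratic generator $f(x)=\frac{1}{2}\|x\|^{2}$ on an assumed box $\Omega=[-1,1]^{3}$, for which $\Pi_{\Omega}^{f}$ is coordinate-wise clipping, and compare a finite-difference gradient of $f^{*}$ against $\Pi_{\Omega}^{f}(z)$.

```python
import numpy as np

lo, hi = -1.0, 1.0   # Omega = [-1, 1]^3 (illustrative box)

def Pi(z):
    # argmin_{x in Omega} {-x^T z + 0.5||x||^2} = clipping onto the box
    return np.clip(z, lo, hi)

def f_star(z):
    # f*(z) = sup_{x in Omega} {x^T z - 0.5||x||^2}, attained at Pi(z)
    x = Pi(z)
    return x @ z - 0.5 * x @ x

z = np.array([0.3, -2.0, 1.7])
eps = 1e-6
# Central finite differences of f* coordinate by coordinate.
grad = np.array([(f_star(z + eps * e) - f_star(z - eps * e)) / (2 * eps)
                 for e in np.eye(3)])
# grad matches Pi(z), i.e. the mirror map of Lemma 1.
```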

II-C Differential inclusion

A differential inclusion is given by

$\dot{x}(t)\in\mathcal{F}(x(t)),\quad x(0)=x_{0},\quad t\geq 0,$   (2)

where $\mathcal{F}:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{n}$ is a set-valued map. $\mathcal{F}$ is upper semi-continuous at $x$ if, for every $\epsilon>0$, there exists $\delta>0$ such that

$\mathcal{F}(y)\subset\mathcal{F}(x)+B(0;\epsilon),\quad\forall y\in B(x;\delta),$

and it is upper semi-continuous if this holds at every $x\in\mathbb{R}^{n}$. A Caratheodory solution to (2) defined on $[0,\tau)\subset[0,+\infty)$ is an absolutely continuous function $x:[0,\tau)\rightarrow\mathbb{R}^{n}$ satisfying (2) for almost all $t\in[0,\tau)$ with respect to the Lebesgue measure [25]. The solution $x(t)$ is right maximal if it has no extension in time. A set $\mathcal{M}$ is said to be weakly (strongly) invariant with respect to (2) if $\mathcal{M}$ contains a (all) maximal solution(s) to (2) for any $x_{0}\in\mathcal{M}$. If $0_{n}\in\mathcal{F}(x_{e})$, then $x_{e}$ is an equilibrium point of (2). The existence of a solution to (2) is guaranteed by the following lemma [25].

Lemma 2

If $\mathcal{F}$ is locally bounded and upper semi-continuous, and takes nonempty, compact, and convex values, then there exists a Caratheodory solution to (2) for any initial value.

Let $V:\mathbb{R}^{n}\rightarrow\mathbb{R}$ be a locally Lipschitz continuous function, and let $\partial V(x)$ be the Clarke generalized gradient of $V$ at $x$. The set-valued Lie derivative of $V$ is defined by $\mathcal{L}_{\mathcal{F}}V(x)\triangleq\{a\in\mathbb{R}:a=p^{\rm T}v,\ p\in\partial V(x),\ v\in\mathcal{F}(x)\}$. Let $\max\mathcal{L}_{\mathcal{F}}V(x)$ be the largest element of $\mathcal{L}_{\mathcal{F}}V(x)$. Referring to [25], we have the following invariance principle for (2).

Lemma 3

Suppose that $\mathcal{F}$ is upper semi-continuous and locally bounded, and that $\mathcal{F}(x)$ takes nonempty, compact, and convex values. Let $V:\mathbb{R}^{n}\rightarrow\mathbb{R}$ be a locally Lipschitz and regular function, $\mathcal{S}\subset\mathbb{R}^{n}$ be compact and strongly invariant for (2), and $\psi(t)$ be a solution to (2). Take

$\mathcal{R}=\{x\in\mathbb{R}^{n}:0\in\mathcal{L}_{\mathcal{F}}V(x)\},$

and let $\mathcal{M}$ be the largest weakly invariant subset of $\bar{\mathcal{R}}\cap\mathcal{S}$, where $\bar{\mathcal{R}}$ is the closure of $\mathcal{R}$. If $\max\mathcal{L}_{\mathcal{F}}V(x)\leq 0$ for all $x\in\mathcal{S}$, then $\lim_{t\to\infty}{\rm dist}(\psi(t),\mathcal{M})=0$.

III Formulation and algorithm

In this paper, we consider a nonsmooth optimization problem with both local and coupled constraints. There are $N$ agents indexed by $\mathcal{V}=\{1,\dots,N\}$ in a network $\mathcal{G}(\mathcal{V},\mathcal{E})$. For agent $i$, the decision variable is $x_{i}$, the local feasible set is $\Omega_{i}\subseteq\mathbb{R}^{n}$, and the local cost function is $f_{i}:\mathbb{R}^{n}\rightarrow\mathbb{R}$. Define $\bm{\Omega}=\prod_{i=1}^{N}\Omega_{i}$ and $\bm{x}=\mathrm{col}\{x_{i}\}_{i=1}^{N}$. All agents cooperate to solve the following distributed optimization problem:

$\min_{\bm{x}\in\bm{\Omega}}\ \sum_{i=1}^{N}f_{i}(x_{i})$
$\text{s.t.}\ \sum_{i=1}^{N}g_{i}(x_{i})\leq 0_{p},\quad\sum_{i=1}^{N}(A_{i}x_{i}-b_{i})=0_{q},$   (3)

where $g_{i}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{p}$, $A_{i}\in\mathbb{R}^{q\times n}$, and $b_{i}\in\mathbb{R}^{q}$ for $i\in\mathcal{V}$. Except for the local constraint $\bm{\Omega}$, the constraints in (3) are said to be coupled, since the solutions rely on global information. In the multi-agent network, agent $i$ only has its local decision variable $x_{i}\in\Omega_{i}$ and the local information $f_{i}$, $g_{i}$, $A_{i}$, and $b_{i}$. Thus, agents need to communicate with their neighbors through the network $\mathcal{G}$.

Actually, MD replaces the Euclidean regularization in (sub)gradient descent algorithms with a Bregman divergence. In return, different generating functions of the Bregman divergence may efficiently yield explicit solutions on different special feasible sets. For example, if $\phi(x)=\frac{1}{2}\|x\|^{2}$ and $\Omega$ is convex and closed, then

$\Pi_{\Omega}^{\phi}(z)=\mathrm{argmin}_{x\in\Omega}\frac{1}{2}\|x-z\|^{2}=P_{\Omega}(z),$   (4)

which recovers the classical Euclidean regularization with projection operations. Furthermore, if $\Omega=\{x\in\mathbb{R}^{n}_{+}:\sum_{k=1}^{n}x^{k}=1\}$ and $\phi(x)=\sum_{k=1}^{n}x^{k}\log(x^{k})$ with the convention $0\log 0=0$, then

$\Pi_{\Omega}^{\phi}(z)=\mathrm{col}\Big\{\frac{\exp(z^{k})}{\sum_{j=1}^{n}\exp(z^{j})}\Big\}_{k=1}^{n},$   (5)

which is the well-known map induced by the KL divergence on the unit simplex.
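The closed form (5) can be verified numerically: for the entropic generator, the minimizer of $-x^{\rm T}z+\phi(x)$ over the simplex satisfies the stationarity condition that $z-\nabla\phi(x)=z-(1+\log x)$ is a constant vector (the multiplier of the simplex constraint). A small sketch, with an arbitrary test vector $z$:

```python
import numpy as np

def softmax(z):
    w = np.exp(z - z.max())   # shift by max(z) for numerical stability
    return w / w.sum()

z = np.array([1.0, -0.5, 2.0])
x = softmax(z)                 # candidate for Pi_Omega^phi(z) in (5)

# Feasibility: x lies on the unit simplex with positive entries.
# Stationarity: z - log(x) has identical entries, so
# z - grad(phi)(x) = z - (1 + log x) is a multiple of the ones vector.
r = z - np.log(x)
```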

Assign a generating function $\phi_{i}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ of the Bregman divergence to each agent $i\in\mathcal{V}$. Then we consider the following assumptions for (3).

Assumption 1
  1. (i)

    For $i\in\mathcal{V}$, $\Omega_{i}$ is closed and convex, $f_{i}$ and $g_{i}$ are convex on $\Omega_{i}$, and moreover, $\phi_{i}$ is differentiable and strongly convex on $\Omega_{i}$.

  2. (ii)

    There exists at least one $\bm{x}\in\mathrm{rint}(\bm{\Omega})$ such that $\sum_{i=1}^{N}g_{i}(x_{i})<0_{p}$ and $\sum_{i=1}^{N}(A_{i}x_{i}-b_{i})=0_{q}$.

  3. (iii)

    The undirected graph $\mathcal{G}$ is connected.

Remark 1

Clearly, (3) can be regarded as a generalization of both distributed optimal consensus problems [26, 12, 19] and distributed resource allocation problems [27, 23]. Moreover, $g_{i}$ in the coupled constraints need not be affine, which is more general than the constraints in previous works [10, 28]. Also, the problem setting requires neither strong nor strict convexity of the cost functions $f_{i}$ or the constraint functions $g_{i}$ [27, 10], and the qualification on the generating functions $\phi_{i}$ has also been widely used [19, 14, 22].

To design a distributed algorithm, we introduce auxiliary variables $\omega_{i}\in\mathbb{R}^{p}$, $\nu_{i}\in\mathbb{R}^{q}$, $\lambda_{i}\in\mathbb{R}^{p}$, $\mu_{i}\in\mathbb{R}^{q}$, $y_{i}\in\mathbb{R}^{n}$, and $\gamma_{i}\in\mathbb{R}^{p}$ for each agent $i\in\mathcal{V}$. Moreover, we employ the gradients $\nabla\phi_{i}$ of the generating functions as the Bregman damping in the algorithm, which ensures the boundedness of the trajectories [14, 19] and avoids convergence inaccuracy [22]. Recall that $a_{ij}$ is the $(i,j)$-th entry of the adjacency matrix $\mathcal{A}$ and $\Pi_{\Omega_{i}}^{\phi_{i}}(\cdot)$ is defined in (1). Then we propose a distributed mirror descent algorithm with Bregman damping (MDBD) for (3).

Algorithm 1 MDBD for $i\in\mathcal{V}$

Initialization:
$x_{i}(0)\in\Omega_{i}$, $y_{i}(0)=0_{n}$, $\lambda_{i}(0)=0_{p}$, $\gamma_{i}(0)=0_{p}$,
$\omega_{i}(0)=0_{p}$, $\mu_{i}(0)=0_{q}$, $\nu_{i}(0)=0_{q}$;
take a proper generating function $\phi_{i}(\cdot)$ according to $\Omega_{i}$.
Flows renewal:

$\dot{y}_{i}\in-\partial f_{i}(x_{i})-\partial g_{i}(x_{i})^{\rm T}\lambda_{i}-A_{i}^{\rm T}\mu_{i}+\nabla\phi_{i}(x_{i})-y_{i},$
$\dot{\gamma}_{i}=g_{i}(x_{i})-\sum_{j=1}^{N}a_{ij}(\omega_{i}-\omega_{j})+\lambda_{i}-\gamma_{i},$
$\dot{\mu}_{i}=A_{i}x_{i}-b_{i}-\sum_{j=1}^{N}a_{ij}(\nu_{i}-\nu_{j}),$
$\dot{\omega}_{i}=\sum_{j=1}^{N}a_{ij}(\lambda_{i}-\lambda_{j}),$
$\dot{\nu}_{i}=\sum_{j=1}^{N}a_{ij}(\mu_{i}-\mu_{j}),$
$x_{i}=\Pi_{\Omega_{i}}^{\phi_{i}}(y_{i}),$
$\lambda_{i}=[\gamma_{i}]^{+}.$

In Algorithm 1, for each agent $i\in\mathcal{V}$, information such as $\partial f_{i}$, $\partial g_{i}$, $A_{i}$, and $b_{i}$ serves as private knowledge, while the values $\omega_{i}$, $\nu_{i}$, $\lambda_{i}$, and $\mu_{i}$ are exchanged with neighbors through the network $\mathcal{G}$. Moreover, the generating function $\phi_{i}$, and hence the Bregman damping $\nabla\phi_{i}$, can be chosen privately and individually by each agent; they need not be identical. It follows from Lemma 2 that the existence of a Caratheodory solution to Algorithm 1 is guaranteed.
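To see the role of the Bregman damping concretely, the following sketch forward-Euler discretizes a single-agent special case of the flows above without coupled constraints, $\dot{y}=-\nabla f(x)+\nabla\phi(x)-y$ with $x=\Pi_{\Omega}^{\phi}(y)$, on the unit simplex with the entropic generator, so that $\Pi_{\Omega}^{\phi}$ is the map in (5). The cost $f(x)=\frac{1}{2}\|x-c\|^{2}$, the step size, and the horizon are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def softmax(z):
    w = np.exp(z - z.max())
    return w / w.sum()

c = np.array([0.2, 0.5, 0.3])   # minimizer of f over the simplex (c is feasible)
y = np.zeros(3)
h = 0.05                        # forward-Euler step size
for _ in range(4000):
    x = softmax(y)                          # x = Pi_Omega^phi(y)
    grad_f = x - c                          # gradient of f(x) = 0.5*||x - c||^2
    grad_phi = 1.0 + np.log(x)              # gradient of the entropic generator
    y = y + h * (-grad_f + grad_phi - y)    # damped mirror descent flow
```

The damping term $\nabla\phi(x)-y$ keeps $y$ bounded along the flow, while $x(t)$ approaches the constrained minimizer $c$.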

For simplicity, define

$\bm{\lambda}=\mathrm{col}\{\lambda_{i}\}_{i=1}^{N}\in\mathbb{R}^{Np},\quad\bm{\mu}=\mathrm{col}\{\mu_{i}\}_{i=1}^{N}\in\mathbb{R}^{Nq},$
$\bm{\omega}=\mathrm{col}\{\omega_{i}\}_{i=1}^{N}\in\mathbb{R}^{Np},\quad\bm{\nu}=\mathrm{col}\{\nu_{i}\}_{i=1}^{N}\in\mathbb{R}^{Nq},$
$\bm{y}=\mathrm{col}\{y_{i}\}_{i=1}^{N}\in\mathbb{R}^{Nn},\quad\bm{\gamma}=\mathrm{col}\{\gamma_{i}\}_{i=1}^{N}\in\mathbb{R}^{Np}.$

Let $\bm{\Theta}=\bm{\Omega}\times\mathbb{R}^{Np}_{+}\times\mathbb{R}^{N(p+2q)}$, and moreover,

$\bm{z}=\mathrm{col}\{\bm{x},\bm{\lambda},\bm{\mu},\bm{\omega},\bm{\nu}\},\quad\bm{s}=\mathrm{col}\{\bm{y},\bm{\gamma},\bm{\mu},\bm{\omega},\bm{\nu}\}.$

Take the Lagrangian function $\mathcal{L}:\bm{\Theta}\rightarrow\mathbb{R}$ as

$\mathcal{L}(\bm{z})=\sum_{i=1}^{N}f_{i}(x_{i})+\sum_{i=1}^{N}\lambda_{i}^{\rm T}\big(g_{i}(x_{i})-\sum_{j=1}^{N}a_{ij}(\omega_{i}-\omega_{j})\big)+\sum_{i=1}^{N}\mu_{i}^{\rm T}\big(A_{i}x_{i}-b_{i}-\sum_{j=1}^{N}a_{ij}(\nu_{i}-\nu_{j})\big).$   (6)

For such a distributed convex optimization problem (3) with zero duality gap, $\bm{x}^{\star}\in\bm{\Omega}$ is an optimal solution to (3) if and only if there exist auxiliary variables $(\bm{\lambda}^{\star},\bm{\mu}^{\star},\bm{\omega}^{\star},\bm{\nu}^{\star})\in\mathbb{R}^{Np}_{+}\times\mathbb{R}^{N(p+2q)}$ such that $\bm{z}^{\star}=\mathrm{col}\{\bm{x}^{\star},\bm{\lambda}^{\star},\bm{\mu}^{\star},\bm{\omega}^{\star},\bm{\nu}^{\star}\}$ is a saddle point of $\mathcal{L}$ [23], that is, for arbitrary $\bm{z}\in\bm{\Theta}$,

$\mathcal{L}(\bm{x}^{\star},\bm{\lambda},\bm{\mu},\bm{\omega}^{\star},\bm{\nu}^{\star})\leq\mathcal{L}(\bm{x}^{\star},\bm{\lambda}^{\star},\bm{\mu}^{\star},\bm{\omega}^{\star},\bm{\nu}^{\star})\leq\mathcal{L}(\bm{x},\bm{\lambda}^{\star},\bm{\mu}^{\star},\bm{\omega},\bm{\nu}).$

Define

$F(\bm{z})=\begin{bmatrix}\mathrm{col}\big\{\partial f_{i}(x_{i})+\partial g_{i}(x_{i})^{\rm T}\lambda_{i}+A_{i}^{\rm T}\mu_{i}\big\}_{i=1}^{N}\\ \mathrm{col}\big\{-g_{i}(x_{i})\big\}_{i=1}^{N}+\bm{L}_{p}\bm{\omega}\\ \mathrm{col}\big\{-A_{i}x_{i}+b_{i}\big\}_{i=1}^{N}+\bm{L}_{q}\bm{\nu}\\ -\bm{L}_{p}\bm{\lambda}\\ -\bm{L}_{q}\bm{\mu}\end{bmatrix},$   (7)

where $\bm{L}_{p}=L_{N}\otimes I_{p}$ and $\bm{L}_{q}=L_{N}\otimes I_{q}$. In fact, $\bm{z}^{\star}$ is a saddle point of $\mathcal{L}$ if and only if $-F(\bm{z}^{\star})\in\mathcal{N}_{\bm{\Theta}}(\bm{z}^{\star})$, as obtained in [27, 23, 10].

Hence, Algorithm 1 can be written in the following compact form:

$\dot{\bm{s}}\in-F(\bm{z})+\nabla\Phi(\bm{z})-\bm{s},\quad\bm{z}=\Pi_{\bm{\Theta}}^{\Phi}(\bm{s}),$   (8)

where

$\nabla\Phi(\bm{z})\triangleq\mathrm{col}\big\{\mathrm{col}\{\nabla\phi_{i}(x_{i})\}_{i=1}^{N},\mathrm{col}\{\lambda_{i}\}_{i=1}^{N},\mathrm{col}\{\mu_{i}\}_{i=1}^{N},\mathrm{col}\{\omega_{i}\}_{i=1}^{N},\mathrm{col}\{\nu_{i}\}_{i=1}^{N}\big\},$   (9a)
$\Pi_{\bm{\Theta}}^{\Phi}(\bm{s})\triangleq\mathrm{col}\big\{\mathrm{col}\{\Pi_{\Omega_{i}}^{\phi_{i}}(y_{i})\}_{i=1}^{N},\mathrm{col}\{[\gamma_{i}]^{+}\}_{i=1}^{N},\mathrm{col}\{\mu_{i}\}_{i=1}^{N},\mathrm{col}\{\omega_{i}\}_{i=1}^{N},\mathrm{col}\{\nu_{i}\}_{i=1}^{N}\big\}.$   (9b)
Remark 2

In fact, if $\phi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, then $\Pi_{\mathbb{R}^{p}}^{\phi}(z)=P_{\mathbb{R}^{p}}(z)=z$ and $\Pi_{\mathbb{R}^{p}_{+}}^{\phi}(z)=P_{\mathbb{R}^{p}_{+}}(z)=[z]^{+}$, and therefore, (8) can be rewritten as

$\dot{\bm{s}}\in-F(\bm{z})+\bm{z}-\bm{s},\quad\bm{z}=P_{\bm{\Theta}}(\bm{s}),$   (10)

which is a widely investigated dynamics, covering the proportional-integral protocol in [8] and the projected output feedback in [23]. Thus, MDBD generalizes the conventional distributed projection-based design for constrained optimization. Notably, $\bm{z}$ in (10) is replaced with the Bregman damping $\nabla\Phi(\bm{z})$ in (8).
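For the quadratic generator, the map $\Pi_{\Omega}^{\phi}$ in (4) is the Euclidean projection. As an illustration (not from the paper), the sketch below computes the projection onto the unit simplex by the standard sorting-based routine and checks the projection optimality condition $(z-P_{\Omega}(z))^{\rm T}(x-P_{\Omega}(z))\leq 0$ for feasible $x$; the test vectors are arbitrary assumptions.

```python
import numpy as np

def proj_simplex(z):
    # Euclidean projection onto {x >= 0, sum(x) = 1} via sorting.
    u = np.sort(z)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(z) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(z - theta, 0.0)

rng = np.random.default_rng(0)
z = rng.normal(size=5)
p = proj_simplex(z)
# Variational inequality: (z - p)^T (x - p) <= 0 for all feasible x.
for _ in range(100):
    r = rng.random(5)
    x = r / r.sum()                  # a random point on the simplex
    assert (z - p) @ (x - p) <= 1e-10
```

Note that, unlike the entropic map (5), this quadratic-generator map has no elementwise closed form and needs a sort, which is one reason MD with a well-matched generator can be cheaper on structured sets.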

IV Main results

In this section, we investigate the convergence of MDBD. Although the Bregman damping improves the behavior of MDBD, it also brings challenges to the convergence analysis. The following lemma shows the relationship between MDBD and the saddle points of the Lagrangian function $\mathcal{L}$.

Lemma 4

Under Assumption 1, $\bm{z}^{\star}$ is a saddle point of the Lagrangian function $\mathcal{L}$ in (6) if and only if there exists $\bm{s}^{\star}\in-F(\bm{z}^{\star})+\nabla\Phi(\bm{z}^{\star})$ such that $\bm{z}^{\star}=\Pi_{\bm{\Theta}}^{\Phi}(\bm{s}^{\star})$.

Proof. For $\tilde{\bm{z}}=\Pi_{\bm{\Theta}}^{\Phi}(\bm{s}^{\star})$, the first-order optimality condition is

$-F(\bm{z}^{\star})+\nabla\Phi(\bm{z}^{\star})-\nabla\Phi(\tilde{\bm{z}})\in\mathcal{N}_{\bm{\Theta}}(\tilde{\bm{z}}).$   (11)

We first show sufficiency. Given $\bm{z}^{\star}$, suppose that there exists $\bm{s}^{\star}\in-F(\bm{z}^{\star})+\nabla\Phi(\bm{z}^{\star})$ such that $\bm{z}^{\star}=\Pi_{\bm{\Theta}}^{\Phi}(\bm{s}^{\star})$. Then (11) holds with $\tilde{\bm{z}}=\bm{z}^{\star}$, so $-F(\bm{z}^{\star})\in\mathcal{N}_{\bm{\Theta}}(\bm{z}^{\star})$, which means that $\bm{z}^{\star}$ is a saddle point of $\mathcal{L}$.

Next, we show necessity. Suppose $-F(\bm{z}^{\star})\in\mathcal{N}_{\bm{\Theta}}(\bm{z}^{\star})$ and take $\bm{s}^{\star}\in-F(\bm{z}^{\star})+\nabla\Phi(\bm{z}^{\star})$. Then (11) holds with $\tilde{\bm{z}}=\bm{z}^{\star}$, which implies that $\bm{z}^{\star}$ solves the minimization defining $\Pi_{\bm{\Theta}}^{\Phi}(\bm{s}^{\star})$. Furthermore, since $\phi_{i}(\cdot)$ and $\frac{1}{2}\|\cdot\|^{2}$ are strongly convex, this minimizer is unique. Therefore, $\bm{z}^{\star}=\Pi_{\bm{\Theta}}^{\Phi}(\bm{s}^{\star})$. $\square$

The following theorem shows the correctness and the convergence of Algorithm 1.

Theorem 1

Under Assumption 1, the following statements hold.

  1. (i)

    The trajectory $(\bm{s}(t),\bm{z}(t))$ of (8) is bounded;

  2. (ii)

    $\bm{x}(t)$ converges to an optimal solution to problem (3).

Proof. (i) First, we show that the output $\bm{z}(t)$ is bounded. By Lemma 4, take $\bm{z}^{\star}$ as a saddle point of $\mathcal{L}$; thus, there exists $\bm{s}^{\star}\in-F(\bm{z}^{\star})+\nabla\Phi(\bm{z}^{\star})$ such that $\bm{z}^{\star}=\Pi_{\bm{\Theta}}^{\Phi}(\bm{s}^{\star})$. Take $\phi^{*}_{i}$ as the convex conjugate of $\phi_{i}$, and construct a Lyapunov candidate function as

$V_{1}=\sum_{i=1}^{N}D_{\phi_{i}^{*}}(y_{i},y_{i}^{\star})+\frac{1}{2}\|\bm{\gamma}-\bm{\gamma}^{\star}\|^{2}+\frac{1}{2}\|\bm{\mu}-\bm{\mu}^{\star}\|^{2}+\frac{1}{2}\|\bm{\omega}-\bm{\omega}^{\star}\|^{2}+\frac{1}{2}\|\bm{\nu}-\bm{\nu}^{\star}\|^{2}.$   (12)

Since $x_{i}=\Pi_{\Omega_{i}}^{\phi_{i}}(y_{i})$, it follows from Lemma 1 that

$\phi^{*}_{i}(y_{i})=x_{i}^{\rm T}y_{i}-\phi_{i}(x_{i}),$   (13a)
$\phi^{*}_{i}(y_{i}^{\star})=x_{i}^{\star\rm T}y_{i}^{\star}-\phi_{i}(x_{i}^{\star}).$   (13b)

Thus, by substituting (13), the Bregman divergence becomes

$D_{\phi_{i}^{*}}(y_{i},y_{i}^{\star})=\phi^{*}_{i}(y_{i})-\phi^{*}_{i}(y_{i}^{\star})-\nabla\phi^{*}_{i}(y_{i}^{\star})^{\rm T}(y_{i}-y_{i}^{\star})=\phi_{i}(x_{i}^{\star})-\phi_{i}(x_{i})-(x_{i}^{\star}-x_{i})^{\rm T}y_{i}.$

Since $\phi_{i}(\cdot)$ is strongly convex for $i\in\mathcal{V}$, there exists a positive constant $\sigma$ such that

$\sum_{i=1}^{N}D_{\phi_{i}^{*}}(y_{i},y_{i}^{\star})\geq\frac{\sigma}{2}\|\bm{x}-\bm{x}^{\star}\|^{2}+\sum_{i=1}^{N}(x_{i}^{\star}-x_{i})^{\rm T}(\nabla\phi_{i}(x_{i})-y_{i}).$

In fact, $\nabla\phi^{*}_{i}(y_{i})=\mathrm{argmin}_{x\in\Omega_{i}}\{-x^{\rm T}y_{i}+\phi_{i}(x)\}$, which leads to

$0\leq\big(\nabla\phi_{i}(\nabla\phi^{*}_{i}(y_{i}))-y_{i}\big)^{\rm T}\big(\nabla\phi^{*}_{i}(y_{i}^{\star})-\nabla\phi_{i}^{*}(y_{i})\big)=(\nabla\phi_{i}(x_{i})-y_{i})^{\rm T}(x_{i}^{\star}-x_{i}).$

Thus, $\sum_{i=1}^{N}D_{\phi_{i}^{*}}(y_{i},y_{i}^{\star})\geq\frac{\sigma}{2}\|\bm{x}-\bm{x}^{\star}\|^{2}$. In addition,

$\|\bm{\lambda}-\bm{\lambda}^{\star}\|^{2}=\|[\bm{\gamma}]^{+}-[\bm{\gamma}^{\star}]^{+}\|^{2}\leq\|\bm{\gamma}-\bm{\gamma}^{\star}\|^{2}.$

Therefore,

$V_{1}(\bm{s}(t))\geq\frac{\kappa}{2}\Big(\|\bm{x}-\bm{x}^{\star}\|^{2}+\|\bm{\lambda}-\bm{\lambda}^{\star}\|^{2}+\|\bm{\mu}-\bm{\mu}^{\star}\|^{2}+\|\bm{\omega}-\bm{\omega}^{\star}\|^{2}+\|\bm{\nu}-\bm{\nu}^{\star}\|^{2}\Big),$   (14)

where $\kappa=\min\{\sigma,1\}$. This means $V_{1}(\bm{s}(t))\geq\frac{\kappa}{2}\|\bm{z}-\bm{z}^{\star}\|^{2}$, that is, $V_{1}$ is radially unbounded in $\bm{z}$. Clearly, the set-valued Lie derivative of $V_{1}$ along (8) satisfies

$\mathcal{L}_{\mathcal{F}}V_{1}=\Big\{\beta\in\mathbb{R}:\beta=\sum_{i=1}^{N}\big(\nabla\phi_{i}^{*}(y_{i})-\nabla\phi_{i}^{*}(y_{i}^{\star})\big)^{\rm T}\dot{y}_{i}+(\bm{\gamma}-\bm{\gamma}^{\star})^{\rm T}\dot{\bm{\gamma}}+(\bm{\mu}-\bm{\mu}^{\star})^{\rm T}\dot{\bm{\mu}}+(\bm{\omega}-\bm{\omega}^{\star})^{\rm T}\dot{\bm{\omega}}+(\bm{\nu}-\bm{\nu}^{\star})^{\rm T}\dot{\bm{\nu}}\Big\}=\Big\{\beta\in\mathbb{R}:\beta=(\bm{z}-\bm{z}^{\star})^{\rm T}\big(-\bm{\eta}+\nabla\Phi(\bm{z})-\bm{s}\big),\ \bm{\eta}\in F(\bm{z})\Big\}.$

Combining the convexity of $f_{i}$ and $g_{i}$ with the saddle-point property of $\bm{z}^{\star}$,

$-(\bm{z}-\bm{z}^{\star})^{\rm T}\bm{\eta}\leq\mathcal{L}(\bm{x}^{\star},\bm{\lambda},\bm{\mu},\bm{\omega}^{\star},\bm{\nu}^{\star})-\mathcal{L}(\bm{x},\bm{\lambda}^{\star},\bm{\mu}^{\star},\bm{\omega},\bm{\nu})\leq 0.$   (15)

Thus,

$\beta\leq(\bm{z}-\bm{z}^{\star})^{\rm T}(\nabla\Phi(\bm{z})-\bm{s})=\sum_{i=1}^{N}(x_{i}-x_{i}^{\star})^{\rm T}(\nabla\phi_{i}(x_{i})-y_{i})+(\bm{\lambda}-\bm{\lambda}^{\star})^{\rm T}(\bm{\lambda}-\bm{\gamma}).$   (16)

On the one hand, for $i\in\mathcal{V}$, consider the differentiable function

$J(\alpha)=\phi_{i}\big(\alpha x_{i}^{\star}+(1-\alpha)x_{i}\big)-\big(\alpha x_{i}^{\star}+(1-\alpha)x_{i}\big)^{\rm T}y_{i},$

with $\alpha\in[0,1]$. Correspondingly,

$J^{\prime}(\alpha)=(x_{i}^{\star}-x_{i})^{\rm T}\Big(\nabla\phi_{i}\big(\alpha x_{i}^{\star}+(1-\alpha)x_{i}\big)-y_{i}\Big).$

Recalling $x_{i}=\Pi_{\Omega_{i}}^{\phi_{i}}(y_{i})=\mathrm{argmin}_{x\in\Omega_{i}}\{-x^{\rm T}y_{i}+\phi_{i}(x)\}$, we have $J(0)\leq J(\alpha)$ for all $\alpha\in[0,1]$ by the convexity of $\Omega_{i}$. This yields $J^{\prime}(\alpha)\big|_{0^{+}}\geq 0$, that is,

$J^{\prime}(\alpha)\big|_{0^{+}}=(x_{i}^{\star}-x_{i})^{\rm T}(\nabla\phi_{i}(x_{i})-y_{i})\geq 0.$

On the other hand,

$(\bm{\lambda}-\bm{\lambda}^{\star})^{\rm T}(\bm{\lambda}-\bm{\gamma})=([\bm{\gamma}]^{+}-\bm{\lambda}^{\star})^{\rm T}([\bm{\gamma}]^{+}-\bm{\gamma})\leq 0.$

Therefore, $\beta\leq 0$, which implies that the output $\bm{z}(t)$ is bounded.
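The two projection properties invoked above, the nonexpansiveness of $[\cdot]^{+}$ and the obtuse-angle inequality $([\bm{\gamma}]^{+}-\bm{\lambda}^{\star})^{\rm T}([\bm{\gamma}]^{+}-\bm{\gamma})\leq 0$ for $\bm{\lambda}^{\star}\geq 0$, can be spot-checked numerically; the random test data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    g = rng.normal(size=4)             # gamma
    g_star = rng.normal(size=4)        # gamma_star
    lam = np.maximum(g, 0.0)           # lambda = [gamma]^+
    lam_star = np.maximum(g_star, 0.0) # lambda_star = [gamma_star]^+
    # Nonexpansiveness: ||[g]^+ - [g*]^+|| <= ||g - g*||
    assert np.linalg.norm(lam - lam_star) <= np.linalg.norm(g - g_star) + 1e-12
    # Obtuse-angle property of the projection onto the nonnegative orthant
    assert (lam - lam_star) @ (lam - g) <= 1e-12
```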

Second, we show that $\bm{s}(t)$ is bounded. It follows from (12) and the argument above that $\bm{\gamma}(t)$ is bounded, so we only need to consider $\bm{y}$. Take another Lyapunov candidate function

$V_{2}=\frac{1}{2}\|\bm{y}\|^{2},$   (17)

which is radially unbounded in $\bm{y}$. Along the trajectories of Algorithm 1, the derivative of $V_{2}$ satisfies

$\mathcal{L}_{\mathcal{F}}V_{2}=\Big\{\zeta\in\mathbb{R}:\zeta\in\sum_{i=1}^{N}\Big(y_{i}^{\rm T}\big(-\partial f_{i}(x_{i})-\partial g_{i}(x_{i})^{\rm T}\lambda_{i}-A_{i}^{\rm T}\mu_{i}+\nabla\phi_{i}(x_{i})\big)-\|y_{i}\|^{2}\Big)\Big\}.$

Since $\bm{x}$, $\bm{\lambda}$, and $\bm{\mu}$ have been shown to be bounded, it is clear that $\zeta\leq-\|\bm{y}\|^{2}+m\|\bm{y}\|=-2V_{2}+m\sqrt{2V_{2}}$ for some positive constant $m$. On this basis, it can be easily verified that $V_{2}$ is bounded, and so is $\bm{y}$. Altogether, $\bm{s}(t)$ is bounded.

(ii) Set $\mathcal{R}=\{(\bm{z},\bm{s}):0\in\mathcal{L}_{\mathcal{F}}V_{1}\}$. Clearly, by (15), $\mathcal{R}\subseteq\{(\bm{z},\bm{s}):\mathcal{L}(\bm{x}^{\star},\bm{\lambda},\bm{\mu},\bm{\omega}^{\star},\bm{\nu}^{\star})=\mathcal{L}(\bm{x},\bm{\lambda}^{\star},\bm{\mu}^{\star},\bm{\omega},\bm{\nu})\}$. Let $\mathcal{M}$ be the largest weakly invariant subset of $\mathcal{R}$. By Lemma 3, $(\bm{z}(t),\bm{s}(t))\rightarrow\mathcal{M}$ as $t\rightarrow\infty$. Take any $(\tilde{\bm{z}},\tilde{\bm{s}})\in\mathcal{M}$ and let $\hat{\bm{s}}\in-F(\tilde{\bm{z}})+\nabla\Phi(\tilde{\bm{z}})$; clearly $(\tilde{\bm{z}},\hat{\bm{s}})\in\mathcal{M}$ as well. Similar to (12), we take another Lyapunov function $\tilde{V}_{1}$ by replacing $(\bm{z}^{\star},\bm{s}^{\star})$ with $(\tilde{\bm{z}},\hat{\bm{s}})$. Based on similar arguments, $\tilde{\bm{z}}$ is Lyapunov stable, and so is $\hat{\bm{s}}$. By Proposition 4.7 in [29], there exists $(\bm{z}^{\#},\bm{s}^{\#})\in\mathcal{M}$ such that $(\bm{z}(t),\bm{s}(t))\rightarrow(\bm{z}^{\#},\bm{s}^{\#})$ as $t\rightarrow\infty$, which yields that $\bm{x}(t)$ in Algorithm 1 converges to an optimal solution to problem (3). $\square$

Remark 3

It is worth mentioning that the Bregman damping $\nabla\Phi(\bm{z})$ in (8) is fundamental in preventing the trajectory of the variable $\bm{s}$ from going to infinity [14, 19], or from converging to an inexact optimal point [22]. Note that the term $\bm{z}$ in the first ODE of (10) actually derives not from the variable $\bm{z}$ itself, but from the gradient of the quadratic function $\|\bm{z}\|^{2}/2$. This is exactly the crucial point in designing the distributed MD-based dynamics (8). Correspondingly, the properties in conjugate spaces, referring to (13), play an important role in the analysis.
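Since (8) and (10) are not reproduced here, the role of the damping can be illustrated on a hypothetical single-agent mirror flow over the unit simplex (all data below are illustrative, not the paper's dynamics): adding $\nabla\phi(x)-y$ to the $y$-dynamics shifts $y$ only along directions that the entropic mirror map ignores, so $y$ stays bounded while the limit of $x$ is unchanged.

```python
import numpy as np

# Toy single-agent sketch (hypothetical data): minimize f(x) = ||x - d||^2
# over the unit simplex via the damped mirror flow
#     y_dot = -grad f(x) + grad phi(x) - y,   x = softmax(y),
# where phi is the negative entropy and "grad phi(x) - y" plays the role of
# the Bregman damping discussed above.
d = np.array([0.5, 0.2, 0.1, 0.0])
n = d.size
x_star = d + (1.0 - d.sum()) / n        # interior minimizer over the simplex

def softmax(y):
    e = np.exp(y - y.max())
    return e / e.sum()

y = np.zeros(n)
dt = 0.01
for _ in range(20000):                  # integrate up to t = 200
    x = softmax(y)
    grad_f = 2.0 * (x - d)
    grad_phi = 1.0 + np.log(x)          # gradient of sum_k x_k log(x_k)
    y = y + dt * (-grad_f + grad_phi - y)

x = softmax(y)
assert np.max(np.abs(x - x_star)) < 1e-2   # x reaches the minimizer
assert np.max(np.abs(y)) < 10.0            # damping keeps y bounded
```

Because $\nabla\phi(x)=\mathbf{1}+\log\operatorname{softmax}(y)=\mathbf{1}+y-\operatorname{lse}(y)\mathbf{1}$, the damping reduces to a multiple of $\mathbf{1}$ plus $-y+y$, so the softmax trajectory of $x$ coincides with undamped mirror descent while $y$ converges to a finite point.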

For convenience, we define

\hat{\bm{x}}\triangleq\frac{1}{t}\int_{0}^{t}\bm{x}(\tau)d\tau,\quad\hat{\bm{\lambda}}\triangleq\frac{1}{t}\int_{0}^{t}\bm{\lambda}(\tau)d\tau,\quad\hat{\bm{\mu}}\triangleq\frac{1}{t}\int_{0}^{t}\bm{\mu}(\tau)d\tau,
\hat{\bm{\omega}}\triangleq\frac{1}{t}\int_{0}^{t}\bm{\omega}(\tau)d\tau,\quad\hat{\bm{\nu}}\triangleq\frac{1}{t}\int_{0}^{t}\bm{\nu}(\tau)d\tau.

Then we describe the convergence rate of Algorithm 1.

Theorem 2

Under Assumption 1, (8) converges at a rate of $\mathcal{O}(1/t)$, i.e.,

0\leq\mathcal{L}(\hat{\bm{x}},\bm{\lambda}^{\star},\bm{\mu}^{\star},\hat{\bm{\omega}},\hat{\bm{\nu}})-\mathcal{L}(\bm{x}^{\star},\hat{\bm{\lambda}},\hat{\bm{\mu}},\bm{\omega}^{\star},\bm{\nu}^{\star})\leq\frac{1}{t}V_{1}(\bm{s}(0)).

Proof. It follows from (IV)-(16) that

\frac{d}{dt}V_{1}\leq\mathcal{L}(\bm{x}^{\star},\bm{\lambda},\bm{\mu},\bm{\omega}^{\star},\bm{\nu}^{\star})-\mathcal{L}(\bm{x},\bm{\lambda}^{\star},\bm{\mu}^{\star},\bm{\omega},\bm{\nu})\leq 0.

By integrating both sides over the time interval [0,t][0,t],

-V_{1}(\bm{s}(0))\leq V_{1}(\bm{s}(t))-V_{1}(\bm{s}(0))\leq\int_{0}^{t}\Big(\mathcal{L}(\bm{x}^{\star},\bm{\lambda}(\tau),\bm{\mu}(\tau),\bm{\omega}^{\star},\bm{\nu}^{\star})-\mathcal{L}(\bm{x}(\tau),\bm{\lambda}^{\star},\bm{\mu}^{\star},\bm{\omega}(\tau),\bm{\nu}(\tau))\Big)d\tau\leq 0.\qquad(18)

Applying Jensen's inequality to the convex-concave Lagrangian function $\mathcal{L}$ yields

\mathcal{L}(\bm{x}^{\star},\hat{\bm{\lambda}},\hat{\bm{\mu}},\bm{\omega}^{\star},\bm{\nu}^{\star})\geq\frac{1}{t}\int_{0}^{t}\mathcal{L}(\bm{x}^{\star},\bm{\lambda}(\tau),\bm{\mu}(\tau),\bm{\omega}^{\star},\bm{\nu}^{\star})d\tau,
\mathcal{L}(\hat{\bm{x}},\bm{\lambda}^{\star},\bm{\mu}^{\star},\hat{\bm{\omega}},\hat{\bm{\nu}})\leq\frac{1}{t}\int_{0}^{t}\mathcal{L}(\bm{x}(\tau),\bm{\lambda}^{\star},\bm{\mu}^{\star},\bm{\omega}(\tau),\bm{\nu}(\tau))d\tau.

By substituting the above inequalities into (IV), the conclusion follows. \square
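The $\mathcal{O}(1/t)$ estimate can be observed numerically on a toy saddle-point flow (not Algorithm 1 itself, and with a plain Euclidean Lyapunov function standing in for $V_{1}$): for $\min_{x}\frac{1}{2}x^{2}$ subject to $x=1$, the primal-dual gradient flow satisfies the same averaged duality-gap bound.

```python
import numpy as np

# Primal-dual gradient flow for min (1/2) x^2  s.t.  x = 1, whose Lagrangian
# L(x, lam) = x^2/2 + lam*(x - 1) has the saddle point (x*, lam*) = (1, -1).
dt, T = 1e-3, 50.0
steps = int(T / dt)
x, lam = 5.0, 0.0
V1_0 = 0.5 * ((x - 1.0)**2 + (lam + 1.0)**2)   # Euclidean Lyapunov value at t = 0

xs = np.empty(steps)
lams = np.empty(steps)
for k in range(steps):
    # forward Euler; the tuple assignment uses the old (x, lam) on the right
    x, lam = x - dt * (x + lam), lam + dt * (x - 1.0)
    xs[k], lams[k] = x, lam

t_grid = dt * np.arange(1, steps + 1)
x_hat = np.cumsum(xs) * dt / t_grid            # running averages (1/t) * integral
lam_hat = np.cumsum(lams) * dt / t_grid

def L(x, lam):
    return 0.5 * x**2 + lam * (x - 1.0)

gap = L(x_hat, -1.0) - L(1.0, lam_hat)         # averaged duality gap
assert np.all(gap >= -1e-9)                    # the gap is nonnegative
assert gap[-1] <= V1_0 / T + 1e-3              # O(1/t): gap(t) <= V1(0)/t
```

Here the running averages are discretizations of the integrals defining $\hat{\bm{x}},\hat{\bm{\lambda}},\dots$ above, and the final assertion mirrors the bound $0\leq\text{gap}(t)\leq V_{1}(\bm{s}(0))/t$.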

V Numerical examples

In this section, we examine the correctness and effectiveness of Algorithm 1 on classical simplex-constrained problems (see, e.g., [22, 18]), where the local constraint set is an $n$-simplex, i.e.,

\Omega_{i}=\Big\{x_{i}\in\mathbb{R}^{n}_{+}:\sum_{k=1}^{n}x_{i,k}=1\Big\},\quad\forall i\in\mathcal{V}.

First, we consider the following nonsmooth optimization problem with $N=10$ and $n=4$:

\min_{\bm{x}\in\bm{\Omega}}\;\sum_{i=1}^{N}\|W_{i}x_{i}-d_{i}\|^{2}+c_{i}\|x_{i}\|_{1}\qquad(19)
\text{s.t.}\quad\sum_{i=1}^{N}g_{i}(x_{i})\leq 0,\qquad\sum_{i=1}^{N}A_{i}x_{i}-\sum_{i=1}^{N}b_{i}=0_{2},

where $W_{i}$ is a positive semi-definite matrix, $d_{i}\in\mathbb{R}^{4}$, and $c_{i}>0$. The coupled inequality constraint is

g_{i}(x_{i})=\|x_{i}\|^{2}+c_{i}\|x_{i}\|_{1}-\frac{25}{2n+i^{2}},

while $A_{i}\in\mathbb{R}^{2\times 4}$ and $b_{i}\in\mathbb{R}^{2}$ are randomly generated data ensuring Slater's constraint qualification. Here, $W_{i}$, $d_{i}$, $g_{i}$, $A_{i}$, and $b_{i}$ are private to agent $i$, and all agents communicate through an undirected cycle network $\mathcal{G}$:

1\rightleftarrows 2\rightleftarrows\cdots\rightleftarrows 10\rightleftarrows 1.
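As a quick sanity check on the constraint structure (the actual $W_{i}$, $d_{i}$, $c_{i}$, $A_{i}$, $b_{i}$ are random and private to the agents, so the value of $c_{i}$ below is illustrative), the uniform simplex point is strictly feasible for the coupled inequality:

```python
import numpy as np

N, n = 10, 4
c = 0.1 * np.ones(N)   # illustrative c_i > 0; the paper's values are not given

def g(i, x_i, c_i):
    # Coupled inequality g_i(x_i) = ||x_i||^2 + c_i ||x_i||_1 - 25/(2n + i^2)
    return float(x_i @ x_i) + c_i * np.abs(x_i).sum() - 25.0 / (2 * n + i**2)

# Uniform simplex point x_i = (1/n) * ones for every agent i = 1, ..., N
x0 = np.full(n, 1.0 / n)
total = sum(g(i, x0, c[i - 1]) for i in range(1, N + 1))
assert total < 0   # sum_i g_i(x_i) < 0: a strictly feasible (Slater-type) point
```

Note that individual $g_{i}(x_{0})$ may be positive for large $i$; only the coupled sum needs to be strictly negative.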

To implement the MD method, we employ the negative entropy function $\phi_{i}(x_{i})=\sum_{k=1}^{n}x_{i,k}\log(x_{i,k})$ as the generating function on $\Omega_{i}$ in Algorithm 1. Fig. 1 shows the trajectories of one coordinate of each $x_{i}$ and $y_{i}$, respectively. Clearly, the trajectories of both $x_{i}$ and $y_{i}$ in MDBD are bounded, whereas the boundedness of $y_{i}$ may not be guaranteed in [14, 19].

Figure 1: Trajectories of all agents' variables: (a) trajectories of $x_{i}$; (b) trajectories of $y_{i}$.

Next, we show the effectiveness of MDBD by comparison. As investigated in [16, 14], when the generating function is $\phi_{i}(x_{i})=\sum_{k=1}^{n}x_{i,k}\log(x_{i,k})$ on the unit simplex, $\Pi_{\Omega_{i}}^{\phi_{i}}(y_{i})$ can be expressed explicitly as (5). In this case, the MD-based method outperforms projection-based algorithms, since it can be regarded as projection-free and avoids the cost of the projection operation, especially for high-dimensional variables.
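Although (5) is not reproduced in this section, for the negative-entropy generating function the explicit Bregman projection onto the unit simplex is, under the standard convention, the softmax map; a hedged sketch:

```python
import numpy as np

def entropic_projection(y):
    # Closed-form Bregman projection onto the unit simplex induced by the
    # negative entropy phi(x) = sum_k x_k log(x_k): the softmax map
    # x_k = exp(y_k) / sum_j exp(y_j), shifted by max(y) for numerical stability.
    e = np.exp(y - np.max(y))
    return e / e.sum()

x = entropic_projection(np.array([0.5, -1.0, 2.0, 0.0]))
assert abs(x.sum() - 1.0) < 1e-12 and np.all(x > 0)   # lands inside the simplex
assert np.argmax(x) == 2                              # order-preserving
```

This map costs $\mathcal{O}(n)$ per evaluation, which is what makes the MD-based dynamics effectively projection-free.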

To this end, we vary the dimension of the decision variable $x_{i}$ and compare MDBD with two distributed continuous-time projection-based algorithms, the proportional-integral protocol (PIP-Yang) in [8] and the projected output feedback (POF-Zeng) in [23], using the cost functions and coupled constraints given in (19).

TABLE I: Real running time (sec) in different dimensions
            $n=4$    $n=64$   $n=256$   $n=1024$   $n=4096$   $n=10^{5}$   $n=10^{6}$
MDBD        0.47     2.42     6.76      12.98      27.99      146.62       466.60
PIP-Yang    2.51     19.63    48.51     195.67     892.74     $>3000$      $>5000$
POF-Zeng    3.92     21.78    39.73     207.03     1136.85    $>3000$      $>5000$

In Fig. 2, the $x$-axis shows the real running time on the GPU, while the $y$-axis shows the optimal error $\|\bm{x}-\bm{x}^{\star}\|$. As the dimension increases, the running time of the two projection-based dynamics becomes markedly longer than that of MDBD, because obtaining (5) is much faster than computing a projection onto high-dimensional constraint sets by solving a general quadratic program.

Figure 2: Optimal errors in different dimensions: (a) $n=4$; (b) $n=16$; (c) $n=64$; (d) $n=256$.

Furthermore, Table I lists the real running time of the three algorithms for different dimensions of the decision variables. As the dimension increases, computing projections in large-scale settings becomes increasingly expensive. Remarkably, MDBD still maintains good performance, owing to the advantage of MD.
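The running-time gap is consistent with the per-step cost of the two maps; a hedged micro-benchmark sketch (the sort-based Euclidean projection below stands in for the quadratic program that projection-based dynamics must solve at every step, and absolute timings are machine-dependent, so only correctness is asserted):

```python
import numpy as np

def euclidean_simplex_projection(y):
    # Sort-based Euclidean projection onto the unit simplex, O(n log n),
    # a stand-in for the per-step projection of projection-based dynamics.
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, y.size + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(y - theta, 0.0)

def entropic_projection(y):
    # O(n) closed-form entropic map used by the MD-based dynamics
    e = np.exp(y - np.max(y))
    return e / e.sum()

rng = np.random.default_rng(0)
y = rng.normal(size=4096)
for p in (euclidean_simplex_projection(y), entropic_projection(y)):
    assert abs(p.sum() - 1.0) < 1e-6 and np.all(p >= 0)   # both land in the simplex
```

Even this fast sorting-based projection scales worse than the $\mathcal{O}(n)$ entropic map, and generic constraint sets without such a formula require an iterative QP solve, which matches the trend in Table I.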

VI CONCLUSIONS

We investigated distributed nonsmooth optimization with both local set constraints and coupled constraints. Based on the mirror descent method, we proposed a continuous-time algorithm that introduces Bregman damping to guarantee the boundedness of the trajectories and the accuracy of the solution. Furthermore, we employed nonsmooth analysis, conjugate functions, and Lyapunov stability theory to prove convergence. Finally, we carried out comparative experiments to illustrate the effectiveness of our algorithm.

References

  • [1] T. Yang, X. Yi, J. Wu, Y. Yuan, D. Wu, Z. Meng, Y. Hong, H. Wang, Z. Lin, and K. H. Johansson, “A survey of distributed optimization,” Annual Reviews in Control, vol. 47, pp. 278–305, 2019.
  • [2] M. Zhu and S. Martínez, “On distributed convex optimization under inequality and equality constraints,” IEEE Transactions on Automatic Control, vol. 57, no. 1, pp. 151–164, 2011.
  • [3] D. Yuan, D. W. Ho, and S. Xu, “Regularized primal–dual subgradient method for distributed constrained optimization,” IEEE Transactions on Cybernetics, vol. 46, no. 9, pp. 2109–2118, 2015.
  • [4] A. Cherukuri and J. Cortes, “Initialization-free distributed coordination for economic dispatch under varying loads and generator commitment,” Automatica, vol. 74, pp. 183–193, 2016.
  • [5] J.-M. Xu and Y. C. Soh, “A distributed simultaneous perturbation approach for large-scale dynamic optimization problems,” Automatica, vol. 72, pp. 194–204, 2016.
  • [6] K. You, R. Tempo, and P. Xie, “Distributed algorithms for robust convex optimization via the scenario approach,” IEEE Transactions on Automatic Control, vol. 64, no. 3, pp. 880–895, 2018.
  • [7] K. Lu, G. Jing, and L. Wang, “Online distributed optimization with strongly pseudoconvex-sum cost functions,” IEEE Transactions on Automatic Control, vol. 65, no. 1, pp. 426–433, 2019.
  • [8] S. Yang, Q. Liu, and J. Wang, “A multi-agent system with a proportional-integral protocol for distributed constrained optimization,” IEEE Transactions on Automatic Control, vol. 62, no. 7, pp. 3461–3467, 2016.
  • [9] Y. Zhu, W. Yu, G. Wen, G. Chen, and W. Ren, “Continuous-time distributed subgradient algorithm for convex optimization with general constraints,” IEEE Transactions on Automatic Control, vol. 64, no. 4, pp. 1694–1701, 2018.
  • [10] S. Liang, X. Zeng, and Y. Hong, “Distributed nonsmooth optimization with coupled inequality constraints via modified Lagrangian function,” IEEE Transactions on Automatic Control, vol. 63, no. 6, pp. 1753–1759, 2017.
  • [11] X. Li, L. Xie, and Y. Hong, “Distributed continuous-time nonsmooth convex optimization with coupled inequality constraints,” IEEE Transactions on Control of Network Systems, vol. 7, no. 1, pp. 74–84, 2019.
  • [12] W. Li, X. Zeng, S. Liang, and Y. Hong, “Exponentially convergent algorithm design for constrained distributed optimization via non-smooth approach,” IEEE Transactions on Automatic Control, 2021, doi: 10.1109/TAC.2021.3075666.
  • [13] A. S. Nemirovskij and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization.   Wiley-Interscience, 1983.
  • [14] W. Krichene, A. Bayen, and P. Bartlett, “Accelerated mirror descent in continuous and discrete time,” in Advances in Neural Information Processing Systems, vol. 28, 2015, pp. 2845–2853.
  • [15] J. Diakonikolas and L. Orecchia, “The approximate duality gap technique: A unified theory of first-order methods,” SIAM Journal on Optimization, vol. 29, no. 1, pp. 660–689, 2019.
  • [16] A. Ben-Tal, T. Margalit, and A. Nemirovski, “The ordered subsets mirror descent optimization method with applications to tomography,” SIAM Journal on Optimization, vol. 12, no. 1, pp. 79–108, 2001.
  • [17] S. Shahrampour and A. Jadbabaie, “Distributed online optimization in dynamic environments using mirror descent,” IEEE Transactions on Automatic Control, vol. 63, no. 3, pp. 714–725, 2017.
  • [18] D. Yuan, Y. Hong, D. W. Ho, and G. Jiang, “Optimal distributed stochastic mirror descent for strongly convex optimization,” Automatica, vol. 90, pp. 196–203, 2018.
  • [19] Y. Sun and S. Shahrampour, “Distributed mirror descent with integral feedback: Asymptotic convergence analysis of continuous-time dynamics,” IEEE Control Systems Letters, vol. 5, no. 5, pp. 1507–1512, 2020.
  • [20] Y. Wang, Z. Tu, and H. Qin, “Distributed stochastic mirror descent algorithm for resource allocation problem,” Control Theory and Technology, vol. 18, no. 4, pp. 339–347, 2020.
  • [21] P. Xu, T. Wang, and Q. Gu, “Continuous and discrete-time accelerated stochastic mirror descent for strongly convex functions,” in International Conference on Machine Learning, 2018, pp. 5492–5501.
  • [22] B. Gao and L. Pavel, “Continuous-time discounted mirror-descent dynamics in monotone concave games,” IEEE Transactions on Automatic Control, 2020, doi: 10.1109/TAC.2020.3045094.
  • [23] X. Zeng, P. Yi, Y. Hong, and L. Xie, “Distributed continuous-time algorithms for nonsmooth extended monotropic optimization problems,” SIAM Journal on Control and Optimization, vol. 56, no. 6, pp. 3973–3993, 2018.
  • [24] S. Boyd and L. Vandenberghe, Convex Optimization.   Cambridge University Press, 2004.
  • [25] J. Cortés, “Discontinuous dynamical systems,” IEEE Control Systems Magazine, vol. 28, no. 3, pp. 36–73, 2008.
  • [26] G. Shi and K. H. Johansson, “Randomized optimal consensus of multi-agent systems,” Automatica, vol. 48, no. 12, pp. 3018–3030, 2012.
  • [27] P. Yi, Y. Hong, and F. Liu, “Initialization-free distributed algorithms for optimal resource allocation with feasibility constraints and application to economic dispatch of power systems,” Automatica, vol. 74, pp. 259–269, 2016.
  • [28] G. Chen, Y. Ming, Y. Hong, and P. Yi, “Distributed algorithm for $\varepsilon$-generalized Nash equilibria with uncertain coupled constraints,” Automatica, vol. 123, p. 109313, 2021.
  • [29] W. M. Haddad and V. Chellaboina, Nonlinear Dynamical Systems and Control: A Lyapunov-based Approach.   Princeton University Press, 2011.