
Privacy-Preserving Distributed Online Mirror Descent for Nonconvex Optimization

Yingjie Zhou (Zhouyingjie7@163.com), Tao Li (tli@math.ecnu.edu.cn)
School of Mathematical Sciences, East China Normal University, Shanghai, 200241, China
Key Laboratory of Mathematics and Engineering Applications, Ministry of Education, School of Mathematical Sciences, East China Normal University, Shanghai, 200241, China
Abstract

We investigate the distributed online nonconvex optimization problem with differential privacy over time-varying networks. Each node minimizes the sum of several nonconvex functions while preserving its differential privacy. We propose a privacy-preserving distributed online mirror descent algorithm for nonconvex optimization, which uses mirror descent to update the decision variables and the Laplace differential privacy mechanism to protect privacy. Unlike existing works, the proposed algorithm allows the cost functions to be nonconvex, which broadens its applicability. We prove that if the communication network is B-strongly connected and the constraint set is compact, then by choosing the step size properly, the algorithm guarantees ϵ-differential privacy at each time. Furthermore, we prove that if the local cost functions are β-smooth, then the regret over the time horizon T grows sublinearly while preserving differential privacy, with upper bound O(√T). Finally, the effectiveness of the algorithm is demonstrated through numerical simulations.

keywords:
Nonconvex problems, differential privacy, distributed online optimization, regret analysis.
journal: Systems & Control Letters

1 Introduction

In distributed optimization, each node knows only its local cost function. Nodes exchange information with their neighbors through a communication network and collaborate to optimize the global cost function, defined as the sum of the local cost functions of all nodes, so that the iterates converge towards the global optimal solution. In many practical scenarios, each node’s local cost function varies over time, which leads to distributed online optimization problems. Such problems have wide applications in signal processing ([1]), economic dispatch ([2]), sensor networks ([3]), and so on. The regret is commonly used to evaluate the performance of a distributed online optimization algorithm: if the regret grows sublinearly, then the algorithm is effective. Existing works show that if the cost function is convex, then the best regret bound achieved by an algorithm is O(√T), and if the cost function is strongly convex, then the best regret bound is O(ln(T)), where T denotes the time horizon ([4, 5]). In recent years, distributed online optimization problems have been extensively studied ([6, 7, 8, 9]). The algorithms above are designed based on the Euclidean distance. Such algorithms often face challenges in computing projections for complex cost functions and constraint sets, e.g., problems with simplex constraints. Beck et al. ([10]) proposed the mirror descent algorithm based on the Bregman distance instead of the Euclidean distance, which effectively improves computational efficiency. The mirror descent algorithm is highly effective for large-scale optimization problems ([11]), and numerous studies have extended it to distributed settings ([12, 13]).

In distributed algorithms, each node possesses a data set containing its private information. By collecting the information exchanged between nodes, attackers could potentially deduce the nodes’ data sets, leading to privacy leakage. To protect private information, an effective approach is the differential privacy mechanism proposed by Dwork et al. ([14]). Its fundamental principle is to add a certain level of noises to the communication, thereby ensuring that attackers can obtain only limited private information from their observations. Recently, extensive works have addressed distributed online convex optimization with differential privacy ([15, 16, 17, 18, 19, 20, 21]). Yuan et al. ([15]) proposed a privacy-preserving distributed online optimization algorithm based on mirror descent. For a fixed graph, they proved that if the cost function is strongly convex, then the regret bound of the algorithm is the same as that without differential privacy, namely O(ln(T)). Zhao et al. ([16]) proposed a differentially private distributed online optimization algorithm based on one-point residual feedback. They proved that, even if the gradient information of the cost function is unknown, the algorithm still achieves differential privacy and its regret grows sublinearly. For constrained optimization problems, Lü et al. ([17]) introduced an efficient privacy-preserving distributed online dual averaging algorithm using a gradient rescaling strategy. Li et al. ([18]) proposed a framework for differentially private distributed optimization. For time-varying communication graphs, they obtained the regret bounds O(√T) and O(ln(T)) for convex and strongly convex cost functions, respectively. Zhu et al. ([19]) proposed a differentially private distributed stochastic subgradient online convex optimization algorithm based on weight balancing over time-varying directed networks. They demonstrated that the algorithm ensures ϵ-differential privacy and provided its regret bound.

The aforementioned privacy-preserving distributed online optimization algorithms all address convex cost functions. However, many practical problems involve nonconvex cost functions. For example, in machine learning, the cost function may be nonconvex due to sparsity and low-rank constraints ([22]). In wireless communication, energy efficiency problems may also involve nonconvex cost functions due to constraints on node transmission power ([23]). Since the cost function is nonconvex, finding a global minimizer is challenging, so the regret for convex optimization cannot be used to measure the performance of nonconvex online algorithms. For nonconvex optimization, the goal is usually to find a point that satisfies the first-order optimality condition ([24, 25, 26, 27]). These works used a regret based on the first-order optimality condition to evaluate algorithm performance and, under appropriate assumptions, proved that the regret grows sublinearly.

Privacy-preserving distributed online nonconvex optimization has wide applications in practical engineering, such as distributed target tracking. Up to now, studies on privacy-preserving distributed nonconvex optimization have been restricted to offline settings ([28, 29]). In [28], a new algorithm for decentralized nonconvex optimization was proposed that enables both rigorous differential privacy and convergence. Compared with the existing works, the main challenges for privacy-preserving distributed online nonconvex optimization are as follows. The first challenge is ensuring the convergence of the algorithm under the differential privacy mechanism. Existing works on distributed online nonconvex optimization do not incorporate privacy protection, and adding noises to the decision variables for differential privacy may cause distributed online optimization algorithms to diverge. The second challenge is improving the computational efficiency of online algorithms. In online optimization, computational resources are limited and immediate decision-making is required. Compared with [28], which addressed unconstrained optimization problems, we focus on optimization problems with set constraints, making the solution process more difficult and the computational cost higher. The algorithm in [28] is based on gradient descent, and for constrained problems, directly incorporating the Euclidean projection into it may lead to high computational costs. Motivated by the fact that selecting Bregman projections according to the constraint conditions can improve computational efficiency, we combine the differential privacy mechanism with the distributed mirror descent method to design a privacy-preserving distributed online mirror descent algorithm for nonconvex optimization (DPDO-NC). The proposed algorithm addresses distributed online nonconvex optimization problems while protecting privacy.
To evaluate the algorithm’s performance, we use the regret based on the first-order optimality condition and give a thorough analysis of the algorithm’s regret. Table 1 compares our algorithm with existing privacy-preserving distributed online optimization algorithms. The main contributions of this paper are summarized as follows.

Table 1: Comparison of privacy-preserving distributed online algorithms.
Reference Cost functions Constraint Communication graph
[15] strongly convex convex set fixed undirected
[16], [20] convex/strongly convex convex set fixed directed
[17] convex convex set fixed directed
[18] convex/strongly convex convex set time-varying undirected
[19] convex/strongly convex no time-varying directed
[21] convex equality fixed undirected
our work nonconvex convex set time-varying directed

In our algorithm, each node adds noises to its decision variable and then broadcasts the perturbed variable to neighboring nodes via the communication network to achieve consensus. Finally, the mirror descent method is used to update the local decision variable. We prove that the proposed algorithm maintains ϵ-differential privacy at each iteration. Compared with [27], our algorithm adds Laplace noises during communication with neighbors to protect private information, and the communication topology in our algorithm is a time-varying graph rather than a fixed graph. Compared with [14, 15, 16, 17, 18], we do not require the cost functions to be convex. While [14, 15, 16] used fixed graphs, we consider time-varying graphs, which have broader application scenarios.

To overcome the challenges posed by constraints and online nonconvex cost functions, the proposed algorithm updates decisions by solving subproblems that involve the Bregman divergence. If the Bregman divergence satisfies appropriate conditions, then the algorithm can effectively track a stationary point. By using the mirror descent algorithm, we can choose different Bregman divergences depending on the problem, which reduces the computational cost. By appropriately selecting the step size, we prove that the regret upper bound of the algorithm is O(√T), i.e., the algorithm converges to a stationary point sublinearly. This regret order is the same as that of distributed online nonconvex optimization algorithms without differential privacy.

The numerical simulations demonstrate the effectiveness of the proposed algorithm. By using the distributed localization problem, we demonstrate that the algorithm’s regret is sublinear. Additionally, the numerical simulations reveal the tradeoff between the level of privacy protection and the algorithm’s performance.

The main structure and content of this paper are arranged as follows. Section 2 introduces the preliminaries, including the graph theory and the problem description. Section 3 presents the differentially private distributed online nonconvex optimization algorithm. Section 4 includes the differential privacy analysis and the regret analysis. Section 5 gives the numerical simulations. The final section concludes the paper.

Notations: let ℝ and ℤ be the sets of real numbers and integers, respectively. For a vector or matrix X, ‖X‖ and ‖X‖_1 denote its 2-norm and 1-norm, respectively. Let X^T be the transpose of X, and [X]_i the i-th row of the matrix X. For a given function f, ∇f denotes its gradient. Let 1 be a column vector of appropriate dimension with all elements equal to 1, and I an identity matrix of appropriate dimension. For a random variable x, ℙ[x] and 𝔼[x] denote its probability distribution and expectation, respectively. For a random variable ξ, ξ ∼ Lap(σ) indicates that ξ follows a Laplace distribution with scale parameter σ, whose probability density function is f(x|σ) = (1/(2σ)) e^{−|x|/σ}.
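Since Lap(σ) is specified here only through its density, a quick numerical sanity check can confirm the parameterization (a sketch assuming NumPy, whose `scale` argument coincides with σ in this convention; the chosen σ and sample size are illustrative):

```python
import numpy as np

def laplace_pdf(x, sigma):
    # Density f(x | sigma) = (1 / (2 sigma)) * exp(-|x| / sigma),
    # the parameterization used in the notation section.
    return np.exp(-np.abs(x) / sigma) / (2.0 * sigma)

rng = np.random.default_rng(0)
sigma = 1.5
# NumPy's `scale` argument coincides with sigma in this parameterization.
samples = rng.laplace(loc=0.0, scale=sigma, size=200_000)

# Sanity checks: Lap(sigma) has mean 0 and variance 2 * sigma**2.
print(abs(samples.mean()) < 0.05, abs(samples.var() - 2 * sigma**2) < 0.2)
```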

2 Preliminaries and Problem Description

This section provides the problem description and preliminaries for the subsequent analysis.

2.1 Graph theory

The communication network is modeled by a directed graph (digraph) G(t) = {V, E(t), A(t)}, t = 1, 2, …, where V = {1, 2, …, N} is the set of nodes and E(t) is the set of edges. Elements of E(t) are ordered pairs (j, i) with j, i ∈ V, and (j, i) ∈ E(t) indicates that node j can directly send information to node i. Let N_i^{in}(t) = {j | (j, i) ∈ E(t)} ∪ {i} and N_i^{out}(t) = {j | (i, j) ∈ E(t)} ∪ {i} denote the in-neighbors and out-neighbors of node i at time t, respectively. The matrix A(t) = {a_{ij}(t)}_{N×N} is called the weight matrix of the graph G(t). If (j, i) ∈ E(t), then a < a_{ij}(t) < 1 for some 0 < a < 1, and a_{ij}(t) = 0 otherwise. If A(t) is symmetric, then G(t) is an undirected graph. If A(t) satisfies 1^T A(t) = 1^T and A(t)1 = 1, then A(t) is called a doubly stochastic matrix, and G(t) is called a balanced graph. For a fixed graph G = {V, E, A}, if for any two nodes i, j ∈ V there exist k_1, k_2, …, k_m ∈ V such that (i, k_1), (k_1, k_2), …, (k_m, j) ∈ E, then the graph G is strongly connected. For the digraph G(t), a B-edge set is defined as E_B(t) = ∪_{k=(t−1)B+1}^{tB} E(k) for some integer B ≥ 1. The digraph G(t) is B-strongly connected if the digraph with node set V and edge set E_B(t) is strongly connected for all t ≥ 1 ([30]).
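The B-strong connectivity condition can be checked mechanically by testing whether each union of B consecutive edge sets yields a strongly connected digraph. A minimal sketch in plain Python (the 3-node example graphs are made up for illustration):

```python
def strongly_connected(n, edges):
    # The digraph is strongly connected iff every node reaches every other node.
    adj = {v: set() for v in range(n)}
    for j, i in edges:          # pair (j, i): node j can send to node i
        adj[j].add(i)
    def reach(s):
        seen, stack = {s}, [s]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen
    return all(reach(v) == set(range(n)) for v in range(n))

def b_strongly_connected(n, edge_seq, B):
    # G(t) is B-strongly connected if each B-edge set
    # E_B(t) = E((t-1)B+1) ∪ ... ∪ E(tB) gives a strongly connected digraph.
    blocks = [edge_seq[k:k + B] for k in range(0, len(edge_seq), B)]
    return all(strongly_connected(n, set().union(*blk))
               for blk in blocks if len(blk) == B)

# A 3-node ring split across two time steps: neither E(1) nor E(2) is strongly
# connected on its own, but their union is, so the sequence is 2-strongly connected.
E1, E2 = {(0, 1), (1, 2)}, {(2, 0)}
print(strongly_connected(3, E1), b_strongly_connected(3, [E1, E2], B=2))
```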

2.2 Problem description

In this paper, we consider the time-varying system with N nodes, where each node knows only its cost function f_t^i and communicates with neighboring nodes through a directed time-varying network G(t). The nodes cooperate to solve the following optimization problem

\min_{x\in\Omega}\sum_{i=1}^{N}f_{t}^{i}(x),\quad t=1,\dots,T, \qquad (1)

where f_t^i : ℝ^d → ℝ is the cost function of node i at time t, and T is the time horizon. The following basic assumptions are imposed on problem (1).

Assumption 1

For all t = 1, …, T, G(t) is B-strongly connected, and the weight matrix A(t) is doubly stochastic.

Assumption 2

The set Ω ⊆ ℝ^d is a bounded closed convex set.

Assumption 3

For all t = 1, …, T and i = 1, …, N, f_t^i(x) is differentiable and β-smooth with respect to x, i.e., there exists β > 0 such that

\left\|\nabla f_{t}^{i}(x)-\nabla f_{t}^{i}(y)\right\|\leq\beta\left\|x-y\right\|,\quad\forall\,x,y\in\mathbb{R}^{d}.

By Assumptions 2-3, there exist η > 0 and θ > 0 such that ‖x_1 − x_2‖ ≤ η for all x_1, x_2 ∈ Ω, and ‖∇f_t^i(x)‖ ≤ θ for all i = 1, …, N, t = 1, …, T, and x ∈ Ω. Assumptions 2-3 are common in research on distributed optimization ([18], [20]). Moreover, this paper imposes no convexity requirement on the cost functions f_t^i. The following practical example satisfies Assumptions 1-3.

Example 1

Distributed localization problem ([25], [31]). Consider N sensors collaborating to locate a moving target. Let x_t^0 ∈ Ω ⊆ ℝ^d be the true position of the target at time t, and s_i the position of sensor i. Each sensor can only obtain the distance measurement between the target’s position and its own position, given by

d_{t}^{i}=\|s_{i}-x_{t}^{0}\|+\vartheta_{t}^{i},\quad i=1,\dots,N,

where ϑ_t^i is the measurement error at time t. To estimate the target’s position, the sensors collaborate to solve the following optimization problem

\min_{x\in\Omega}\sum_{i=1}^{N}\frac{1}{2}\left|\|s_{i}-x\|-d_{t}^{i}\right|^{2},

where Ω is a bounded closed convex set representing the range of the target’s position. Observe that f_t^i(x) = \frac{1}{2}\left|\|s_{i}-x\|-d_{t}^{i}\right|^{2} is nonconvex with respect to x.

In distributed online convex optimization, each node uses local information and information received from neighboring nodes to make a local prediction x_{t+1}^i of the optimal solution x^* at time t+1, where x_{t+1}^i is the decision variable of node i at time t+1. Node i can only observe f_t^i(x_t^i) at time t, so to accomplish the global optimization task, the nodes must communicate with each other through the network G(t).

Generally speaking, finding the global optimal point x^* is impossible for nonconvex optimization. We use the following definition of regret.

Definition 1

([32]) Let {x_t^i, t = 1, …, T, i = 1, …, N} be the sequence of decisions generated by a given distributed online optimization algorithm. The regret of node i is defined by

\mathbb{E}[R_{T}^{i}]\triangleq\max_{x\in\Omega}\left(\mathbb{E}\left[\sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x_{t}^{i}),x_{t}^{i}-x\rangle\right]\right). \qquad (2)

Different from convex optimization, which aims at finding a global minimum, the regret (2) aims at finding a stationary point of a nonconvex cost function. Our goal is to design an efficient online distributed optimization algorithm such that the regret (2) grows sublinearly with respect to the time horizon T, i.e., \lim_{T\to\infty}\mathbb{E}[R_{T}^{i}]/T = 0, while also ensuring the privacy of each node’s information.

Remark 1

Lu et al. ([27]) provided the following individual regret for distributed online nonconvex optimization

R_{T}^{i}\triangleq\max_{x\in\Omega}\left(\sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x_{t}^{i}),x_{t}^{i}-x\rangle\right), \qquad (3)

where i = 1, …, N. If x^* is a stationary point of \sum_{t=1}^{T}\sum_{j=1}^{N}f_{t}^{j}(x), then \sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x^{*}),x^{*}-x\rangle\leq 0 for all x ∈ Ω.

Definition 1 is similar to the above; both describe the degree of violation of the first-order optimality condition. The regret (3) is for the deterministic case, whereas our algorithm adds noises, so the regret (2) is taken in expectation.

The regret (2) is also applicable to distributed online convex optimization ([18, 19]), for which the regret is defined as

\sum_{t=1}^{T}\sum_{j=1}^{N}\mathbb{E}[f_{t}^{j}(x_{t}^{i})]-\sum_{t=1}^{T}\sum_{j=1}^{N}f_{t}^{j}(x^{*}), \qquad (4)

where x^{*}=\arg\min_{x\in\Omega}\sum_{t=1}^{T}\sum_{j=1}^{N}f_{t}^{j}(x). By the properties of convex functions, we can obtain

\sum_{t=1}^{T}\sum_{j=1}^{N}\mathbb{E}[f_{t}^{j}(x_{t}^{i})]-\sum_{t=1}^{T}\sum_{j=1}^{N}f_{t}^{j}(x^{*})\leq\mathbb{E}\left[\sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x_{t}^{i}),x_{t}^{i}-x^{*}\rangle\right]\leq\mathbb{E}[R_{T}^{i}].

From the above, we see that for convex functions, the regret (4) grows sublinearly whenever (2) does.
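The inequality above rests on the first-order characterization of convexity, f(x) − f(x^*) ≤ ⟨∇f(x), x − x^*⟩, applied term by term. A quick randomized check of this linearization bound on the convex function f(x) = ‖x‖² (a sketch assuming NumPy; the test function and sample points are illustrative):

```python
import numpy as np

# For a convex f, f(x) - f(xs) <= <grad f(x), x - xs>: this is the step
# that upper-bounds regret (4) by the linearized regret (2).
rng = np.random.default_rng(0)
f = lambda x: x @ x          # f(x) = ||x||^2, convex
grad = lambda x: 2 * x       # its gradient

for _ in range(100):
    x, xs = rng.standard_normal(3), rng.standard_normal(3)
    assert f(x) - f(xs) <= grad(x) @ (x - xs) + 1e-9
print("linearization bound holds on all samples")
```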

3 Algorithm Design

In this paper, we propose a differentially private distributed online optimization algorithm for solving the nonconvex optimization problem (1) without revealing the private information of individual nodes, based on the distributed mirror descent algorithm and the differential privacy mechanism. The mirror descent algorithm is built on the Bregman divergence.

Definition 2

([12]) Let φ : ℝ^d → ℝ be an ω-strongly convex function, i.e., there is ω > 0 such that

\varphi(y)-\varphi(x)\geq\nabla\varphi(x)^{T}(y-x)+\frac{\omega}{2}\|x-y\|^{2},\quad\forall\,x,y\in\mathbb{R}^{d}.

The Bregman divergence generated by φ\varphi is defined as

D_{\varphi}(x,y)=\varphi(x)-\varphi(y)-\langle\nabla\varphi(y),x-y\rangle. \qquad (5)

The Bregman divergence measures the distortion or loss incurred by approximating y with x. From Definition 2, the Bregman divergence has the following properties: (i) D_φ(x, y) ≥ 0; (ii) D_φ(x, y) is strongly convex with respect to x. We make the following assumptions on the Bregman divergence.
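Definition 2 and property (i) can be checked numerically for two standard generators (a sketch assuming NumPy; the choice of generators and test points is illustrative): φ(x) = ½‖x‖² recovers the squared Euclidean distance, and the negative entropy recovers the KL divergence on the simplex.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    # D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, as in (5).
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# Generator 1: phi(x) = 0.5 ||x||^2  ->  D_phi(x, y) = 0.5 ||x - y||^2.
sq = lambda x: 0.5 * (x @ x)
sq_grad = lambda x: x

# Generator 2: negative entropy phi(x) = sum x_i log x_i  ->  KL divergence
# (restricted to the positive orthant / probability simplex).
ent = lambda x: np.sum(x * np.log(x))
ent_grad = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
print(np.isclose(bregman(sq, sq_grad, x, y), 0.5 * np.sum((x - y) ** 2)))
print(np.isclose(bregman(ent, ent_grad, x, y), np.sum(x * np.log(x / y))))
print(bregman(ent, ent_grad, x, y) >= 0.0)  # property (i)
```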

Assumption 4

(i) For any x ∈ Ω, D_φ(x, y) is convex with respect to y, i.e., for any r_i ≥ 0, i = 1, …, N, satisfying \sum_{i=1}^{N}r_{i}=1,

D_{\varphi}\left(x,\sum_{i=1}^{N}r_{i}y_{i}\right)\leq\sum_{i=1}^{N}r_{i}D_{\varphi}(x,y_{i}),\quad\forall\,y_{i}\in\mathbb{R}^{d}.

(ii) For any x ∈ Ω, the Bregman divergence D_φ(x, y) is M-smooth with respect to y, i.e., there is M > 0 such that

D_{\varphi}(x,y_{1})-D_{\varphi}(x,y_{2})\leq\langle\nabla_{y}D_{\varphi}(x,y_{2}),y_{1}-y_{2}\rangle+\frac{M}{2}\|y_{1}-y_{2}\|^{2},\quad\forall\,y_{1},y_{2}\in\mathbb{R}^{d}.

Assumption 4 is a common assumption in the distributed mirror descent algorithm and is also mentioned in [12] and [15].

Suppose an attacker can eavesdrop on the communication channel and intercept messages exchanged between nodes, which may lead to privacy leakage. To protect privacy, we add noises to the communication. At time t, each node i perturbs its decision variable with noise before communication, and then communicates the perturbed decision variable with its neighbors for consensus. The specific iteration is given by

z_{t}^{i}=\sum_{j=1}^{N}a_{ij}(t)\left(x_{t}^{j}+\xi_{t}^{j}\right), \qquad (6)

where ξ_t^j ∼ Lap(σ_t). Next, each node i updates its local decision variable via mirror descent based on the consensus variable z_t^i. The specific iteration is given by

x_{t+1}^{i}=\arg\min_{x\in\Omega}\left\{D_{\varphi}(x,z_{t}^{i})+\langle\alpha_{t}\nabla f_{t}^{i}(x_{t}^{i}),x\rangle\right\}, \qquad (7)

where α_t is the step size, and D_φ(x, z_t^i) is given by (5). In (7), D_φ(x, z_t^i) + ⟨α_t ∇f_t^i(x_t^i), x⟩ is strongly convex with respect to x, which ensures that x_{t+1}^i is unique. The pseudocode of the algorithm is given as follows.

1: Input: step size α_t = 1/(N√t), privacy level ϵ, noise magnitude σ_t = 2√d α_t θ/(ωϵ), Bregman divergence D_φ(x, y), initial values {x_1^i}_{i=1}^N ⊆ Ω, weight matrix A(t).
2: for t = 1, 2, …, T do
3:     for i = 1, 2, …, N do
4:         Perturb x_t^i by adding noise ξ_t^i ∼ Lap(σ_t), and broadcast q_t^i = x_t^i + ξ_t^i to neighboring nodes.
5:         z_t^i = \sum_{j=1}^{N} a_{ij}(t) q_t^j.
6:         Update x_{t+1}^i = \arg\min_{x\in\Omega}\left\{D_{\varphi}(x,z_{t}^{i})+\langle\alpha_{t}\nabla f_{t}^{i}(x_{t}^{i}),x\rangle\right\}.
7:     end for
8: end for
9: Output: {x_T^i}_{i=1}^N.
Algorithm 1 Differentially private distributed online mirror descent algorithm for nonconvex optimization (DPDO-NC).
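A minimal sketch of one round of Algorithm 1 (steps 4-6), assuming NumPy and specializing to the Euclidean Bregman function φ(x) = ½‖x‖², for which step 6 reduces to a projected gradient step from z_t^i; the unit-ball constraint set, the uniform weight matrix, and the toy gradients are illustrative assumptions, not part of the paper:

```python
import numpy as np

def dpdo_nc_round(X, grads, A, alpha, sigma, project, rng):
    # One round of Algorithm 1 with D(x, y) = 0.5 ||x - y||^2, for which
    # step 6 reduces to x_{t+1}^i = Proj_Omega(z_t^i - alpha * grad f_t^i(x_t^i)).
    N, d = X.shape
    Q = X + rng.laplace(scale=sigma, size=(N, d))   # step 4: perturb and broadcast
    Z = A @ Q                                       # step 5: weighted consensus
    return np.array([project(Z[i] - alpha * grads[i]) for i in range(N)])

# Toy run: Omega = unit ball; "gradients" are placeholders for grad f_t^i(x_t^i).
rng = np.random.default_rng(1)
proj_ball = lambda v: v / max(1.0, np.linalg.norm(v))
A = np.full((3, 3), 1 / 3)                          # doubly stochastic weights
X = rng.standard_normal((3, 2))
grads = X.copy()
X_next = dpdo_nc_round(X, grads, A, alpha=0.1, sigma=0.05,
                       project=proj_ball, rng=rng)
print(X_next.shape)
```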
Remark 2

Selecting Bregman functions according to the constraint conditions can improve computational efficiency. For example, consider the optimization problem with a probability simplex constraint, i.e., \min_{x\in\Omega}f(x) with Ω = {x = [x_1, …, x_d]^T ∈ ℝ^d | x_i ≥ 0, \sum_{i=1}^{d}x_{i}=1}. If the squared Euclidean distance is chosen, then each projection step, which projects the iterate back onto the simplex, is computationally complex. However, if the Bregman function φ(x) = \sum_{i=1}^{d}x_{i}\log x_{i} is chosen, then one can deduce that [x_{t+1}]_{j}=\frac{\exp([y_{t+1}]_{j}-1)}{\sum_{i=1}^{d}\exp([y_{t+1}]_{i}-1)} with [y_{t+1}]_{j}=[y_{t}]_{j}-\alpha\frac{\partial f(x_{t})}{\partial[x_{t}]_{j}}. The iterates automatically remain within the set Ω without additional projection computations, and the computational cost is significantly reduced.
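The closed-form simplex update in Remark 2 is the classical entropic mirror descent (exponentiated gradient) step, x_{t+1,j} ∝ x_{t,j} exp(−α g_j). A sketch assuming NumPy, on an illustrative linear cost:

```python
import numpy as np

def entropic_md_step(x, g, alpha):
    # Mirror descent on the simplex with phi(x) = sum x_i log x_i:
    # x_{t+1,j} ∝ x_{t,j} * exp(-alpha * g_j). The iterate stays in Omega
    # automatically; no Euclidean projection is needed.
    w = x * np.exp(-alpha * g)
    return w / w.sum()

# Minimize f(x) = <c, x> over the simplex; the minimizer concentrates
# mass on the smallest cost coefficient.
c = np.array([3.0, 1.0, 2.0])
x = np.full(3, 1 / 3)
for _ in range(500):
    x = entropic_md_step(x, c, alpha=0.1)
print(np.argmax(x))  # index of the smallest entry of c
```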

4 Algorithm Analyses

In this section, we provide the analyses of the differential privacy and the regret of the proposed algorithm. Denote

X_{t}=((x_{t}^{1})^{T},\dots,(x_{t}^{N})^{T})^{T},\quad X_{t}^{\prime}=(({x_{t}^{1}}^{\prime})^{T},\dots,({x_{t}^{N}}^{\prime})^{T})^{T},
Z_{t}=((z_{t}^{1})^{T},\dots,(z_{t}^{N})^{T})^{T},\quad Z_{t}^{\prime}=(({z_{t}^{1}}^{\prime})^{T},\dots,({z_{t}^{N}}^{\prime})^{T})^{T},
Q_{t}=((q_{t}^{1})^{T},\dots,(q_{t}^{N})^{T})^{T},\quad Q_{t}^{\prime}=(({q_{t}^{1}}^{\prime})^{T},\dots,({q_{t}^{N}}^{\prime})^{T})^{T},
\Xi_{t}=((\xi_{t}^{1})^{T},\dots,(\xi_{t}^{N})^{T})^{T},\quad\overline{x}_{t}=\frac{1}{N}(\mathbf{1}_{N}^{T}\otimes I_{d})X_{t}.

4.1 Differential privacy analysis

We first introduce the basic concepts related to differential privacy.

Definition 3

([33]) Given two data sets ℱ = (f_1, f_2, …, f_N) and ℱ′ = (f_1′, f_2′, …, f_N′), if there exists an i ∈ {1, …, N} such that f_i ≠ f_i′ and f_j = f_j′ for all j ≠ i, then the data sets ℱ and ℱ′ are adjacent, denoted adj(ℱ, ℱ′).

For the distributed online optimization problem P, let the execution of the algorithm be ϱ = {X_1, Q_1, Z_1}, …, {X_T, Q_T, Z_T}. The observation sequence of this execution is Q_1, …, Q_T (the information exchanged during the algorithm’s execution). Denote the mapping from execution to observation by A(ϱ) ≜ Q_1, …, Q_T. Let Obs be the set of observation sequences of the algorithm for any distributed online optimization problem P. For any initial state X_1, any fixed observation O ∈ Obs, and any two adjacent data sets ℱ and ℱ′, we denote the algorithm’s executions for the two distributed online optimization problems P and P′ by A^{-1}(ℱ, O, X_1) = {X_1, Q_1, Z_1}, …, {X_T, Q_T, Z_T} and A^{-1}(ℱ′, O, X_1) = {X_1′, Q_1′, Z_1′}, …, {X_T′, Q_T′, Z_T′}, respectively. Let A^{-1}_{X_t}(ℱ, O, X_1) be the set of elements X_t in the execution A^{-1}(ℱ, O, X_1). Using this notation, we give the definition of differential privacy.

Definition 4

([34]) For any two adjacent data sets ℱ and ℱ′, any initial state X_1, and any set of observation sequences 𝒪 ⊆ Obs, if the algorithm satisfies

\mathbb{P}[A^{-1}(\mathcal{F},\mathcal{O},X_{1})]\leq\exp(\epsilon)\,\mathbb{P}[A^{-1}(\mathcal{F}^{\prime},\mathcal{O},X_{1})],

where ϵ > 0 is the privacy parameter, then the algorithm satisfies ϵ-differential privacy.

If two data sets are adjacent, then the probabilities that the algorithm produces the same output on the two data sets are very close. Differential privacy mitigates the difference between the outputs on two adjacent data sets by adding noises to the algorithm. To determine the amount of noise to add, the sensitivity plays an important role in algorithm design. We now give the definition of the algorithm's sensitivity.

Definition 5

([34], [35]) At each time t, for any initial state X_1 and any two adjacent data sets ℱ and ℱ′, the sensitivity of the algorithm is defined as

\Delta(t)\triangleq\sup_{O\in Obs}\left\{\sup_{X\in A_{X_{t}}^{-1}(\mathcal{F},O,X_{1}),\,X^{\prime}\in A_{X_{t}}^{-1}(\mathcal{F}^{\prime},O,X_{1})}\|X-X^{\prime}\|_{1}\right\}.

In differential privacy, the sensitivity is a crucial quantity that determines the amount of noise to be added in each iteration. The sensitivity of an algorithm describes the extent to which a change in a single data point of adjacent data sets affects the algorithm's output. Therefore, we determine the noise magnitude by bounding the sensitivity to ensure ϵ-differential privacy. The following lemma bounds the sensitivity of Algorithm 1.

Lemma 1

Suppose Assumption 2 and Assumption 4 hold. Then the sensitivity of Algorithm 1 satisfies

\Delta(t)\leq\frac{2\sqrt{d}\,\alpha_{t}\theta}{\omega}. \qquad (8)
Proof 1

Based on (7) and the first-order optimality condition, we have

\left\langle\alpha_{t}\nabla f_{t}^{i}(x_{t}^{i})+\nabla_{x}D_{\varphi}(x_{t+1}^{i},z_{t}^{i}),\,x-x_{t+1}^{i}\right\rangle\geq 0,\quad\forall\,x\in\Omega. \qquad (9)

By (5), we have

\nabla_{x}D_{\varphi}(x_{t+1}^{i},z_{t}^{i})=\nabla\varphi(x_{t+1}^{i})-\nabla\varphi(z_{t}^{i}). \qquad (10)

Letting x = z_t^i in (9) and using (10) together with the ω-strong convexity of φ, we obtain

\left\langle\alpha_{t}\nabla f_{t}^{i}(x_{t}^{i}),\,z_{t}^{i}-x_{t+1}^{i}\right\rangle\geq\left\langle\nabla\varphi(z_{t}^{i})-\nabla\varphi(x_{t+1}^{i}),\,z_{t}^{i}-x_{t+1}^{i}\right\rangle\geq\omega\|z_{t}^{i}-x_{t+1}^{i}\|^{2}.

By Assumptions 2-3 and the Cauchy-Schwarz inequality, we have

\omega\|z_{t}^{i}-x_{t+1}^{i}\|^{2}\leq\alpha_{t}\|\nabla f_{t}^{i}(x_{t}^{i})\|\,\|z_{t}^{i}-x_{t+1}^{i}\|\leq\alpha_{t}\theta\|z_{t}^{i}-x_{t+1}^{i}\|.

Therefore, we have

\|x_{t+1}^{i}-z_{t}^{i}\|\leq\frac{\alpha_{t}\theta}{\omega}. \qquad (11)

Similarly, we have

\|{x_{t+1}^{i}}^{\prime}-{z_{t}^{i}}^{\prime}\|\leq\frac{\alpha_{t}\theta}{\omega}. \qquad (12)

At each time t, by Definition 5, the attacker observes the same information under the two adjacent data sets, i.e., q_t^i = (q_t^i)′. From (6), we know that z_t^i = (z_t^i)′ for all t = 1, …, T and i = 1, …, N. By (11), (12), and the triangle inequality, we have

A1(,O,X1)A1(,O,X1)1\displaystyle\|A^{-1}(\mathcal{F},O,X_{1})-A^{-1}(\mathcal{F}^{\prime},O,X_{1})\|_{1} =xt+1ixt+1i1\displaystyle=\|x_{t+1}^{i}-{x_{t+1}^{i}}^{\prime}\|_{1}
dxt+1ixt+1i\displaystyle\leq\sqrt{d}\|x_{t+1}^{i}-{x_{t+1}^{i}}^{\prime}\|
xt+1izti+xt+1izti\displaystyle\leq\|x_{t+1}^{i}-z_{t}^{i}\|+\|{x_{t+1}^{i}}^{\prime}-{z_{t}^{i}}^{\prime}\|
2dαtθω.\displaystyle\leq\frac{2\sqrt{d}\alpha_{t}\theta}{\omega}. (13)

From the arbitrariness of observation OO and adjacent data sets \mathcal{F} and \mathcal{F}^{\prime}, as well as (13), we obtain (8). \qed

The data set =t=1Tt\mathcal{F}=\bigcup_{t=1}^{T}\mathcal{F}_{t} contains all the information that needs to be protected, where t={ft1,,ftN}\mathcal{F}_{t}=\{f_{t}^{1},\dots,f_{t}^{N}\} represents the information that the algorithm needs to protect at time tt. The adjacent data set of \mathcal{F} is denoted as =t=1Tt\mathcal{F}^{\prime}=\bigcup_{t=1}^{T}\mathcal{F}_{t}^{\prime}, where t={ft1,,ftN}\mathcal{F}_{t}^{\prime}=\{{f_{t}^{1}}^{\prime},\dots,{f_{t}^{N}}^{\prime}\}. Next, we present the ϵ\epsilon-differential privacy theorem.

Theorem 1

Suppose Assumption 2 and Assumption 4 hold. If σt=(t)ϵ\sigma_{t}=\frac{\bigtriangleup(t)}{\epsilon}, t=1,,T\forall\,t=1,\dots,T and ϵ>0\epsilon>0, then Algorithm 1 guarantees ϵ\epsilon-differential privacy at each time tt. Furthermore, over the time horizon TT, Algorithm 1 guarantees ϵ^\hat{\epsilon}-differential privacy, where ϵ^=t=1T(t)σt\hat{\epsilon}=\sum_{t=1}^{T}\frac{\bigtriangleup(t)}{\sigma_{t}}.

Proof 2

By Definition 5, we have

XtXt1(t).\|X_{t}-{X_{t}}^{\prime}\|_{1}\leq\bigtriangleup(t).

Thus, we have

i=1Nj=1d|[xti]j[xti]j|=XtXt1(t),\sum_{i=1}^{N}\sum_{j=1}^{d}\left|[x_{t}^{i}]_{j}-[{x_{t}^{i}}^{\prime}]_{j}\right|=\|X_{t}-{X_{t}}^{\prime}\|_{1}\leq\bigtriangleup(t),

where [xti]j[x_{t}^{i}]_{j} denotes the jj-th component of the vector xtix_{t}^{i}. By the properties of the Laplace distribution and qti=qtiq_{t}^{i}={q_{t}^{i}}^{\prime}, it follows that

i=1Nj=1d[[qti]j[xti]j][[qti]j[xti]j]\displaystyle\prod_{i=1}^{N}\prod_{j=1}^{d}\frac{\mathbb{P}\left[[q_{t}^{i}]_{j}-[x_{t}^{i}]_{j}\right]}{\mathbb{P}\left[[{q_{t}^{i}}^{\prime}]_{j}-[{x_{t}^{i}}^{\prime}]_{j}\right]} =i=1Nj=1dexp(|[qti]j[xti]j|σt)exp(|[qti]j[xti]j|σt)\displaystyle=\prod_{i=1}^{N}\prod_{j=1}^{d}\frac{\exp\left(-\frac{\left|[q_{t}^{i}]_{j}-[x_{t}^{i}]_{j}\right|}{\sigma_{t}}\right)}{\exp\left(-\frac{\left|[{q_{t}^{i}}^{\prime}]_{j}-[{x_{t}^{i}}^{\prime}]_{j}\right|}{\sigma_{t}}\right)}
i=1Nj=1dexp(|[qti]j[xti]j[qti]j+[xti]j|σt)\displaystyle\leq\prod_{i=1}^{N}\prod_{j=1}^{d}\exp\left(\frac{\left|[{q_{t}^{i}}^{\prime}]_{j}-[{x_{t}^{i}}^{\prime}]_{j}-[q_{t}^{i}]_{j}+[x_{t}^{i}]_{j}\right|}{\sigma_{t}}\right)
=exp(i=1Nj=1d|[xti]j[xti]j|σt)\displaystyle=\exp\left(\sum_{i=1}^{N}\sum_{j=1}^{d}\frac{\left|[x_{t}^{i}]_{j}-[{x_{t}^{i}}^{\prime}]_{j}\right|}{\sigma_{t}}\right)
exp((t)σt).\displaystyle\leq\exp\left(\frac{\triangle(t)}{\sigma_{t}}\right). (14)

By (2), we obtain

[A1(t,𝒪,X1)]exp(ϵ)[A1(t,𝒪,X1)].\displaystyle\mathbb{P}[A^{-1}(\mathcal{F}_{t},\mathcal{O},X_{1})]\leq\exp(\epsilon)\mathbb{P}[A^{-1}(\mathcal{F}_{t}^{\prime},\mathcal{O},X_{1})].

Thus, Algorithm 1 guarantees ϵ\epsilon-differential privacy at each time tt. Letting ϵ^=t=1T(t)σt\hat{\epsilon}=\sum_{t=1}^{T}\frac{\bigtriangleup(t)}{\sigma_{t}}, by (2), we obtain

[A1(,𝒪,X1)]\displaystyle\mathbb{P}[A^{-1}(\mathcal{F},\mathcal{O},X_{1})] =t=1T[A1(t,𝒪,X1)]\displaystyle=\prod_{t=1}^{T}\mathbb{P}[A^{-1}(\mathcal{F}_{t},\mathcal{O},X_{1})]
t=1Texp((t)σt)[A1(t,𝒪,X1)]\displaystyle\leq\prod_{t=1}^{T}\exp\left(\frac{\triangle(t)}{\sigma_{t}}\right)\mathbb{P}[A^{-1}(\mathcal{F}_{t}^{\prime},\mathcal{O},X_{1})]
=exp(ϵ^)[A1(,𝒪,X1)].\displaystyle=\exp(\hat{\epsilon})\mathbb{P}[A^{-1}(\mathcal{F}^{\prime},\mathcal{O},X_{1})].
\qed
Remark 3

According to Theorem 1, Algorithm 1 satisfies ϵ\epsilon-differential privacy in each iteration. Since the upper bound on the sensitivity contains the step size, choosing a decaying step size makes the sensitivity decrease over iterations. The level of differential privacy depends on the step size αt\alpha_{t}, the bound on the gradient θ\theta, the dimension of the decision variables dd, the strong convexity parameter ω\omega, and the noise magnitude σt\sigma_{t}. The smaller ϵ\epsilon is, the higher the level of differential privacy.
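To make the noise calibration concrete, the following sketch (in Python/NumPy; the parameter values and variable names are our own illustration, not taken from the paper) draws Laplace noise with scale σt=(t)/ϵ\sigma_{t}=\bigtriangleup(t)/\epsilon, where (t)\bigtriangleup(t) is the sensitivity bound of Lemma 1:

```python
import numpy as np

def laplace_noise(sensitivity, epsilon, size, rng):
    # Scale sigma = Delta(t)/epsilon yields epsilon-differential privacy
    # for an output whose l1-sensitivity is at most `sensitivity` (Theorem 1).
    sigma = sensitivity / epsilon
    return rng.laplace(loc=0.0, scale=sigma, size=size)

# Illustrative parameters (not from the paper): dimension d, gradient
# bound theta, strong convexity parameter omega, privacy budget epsilon.
d, theta, omega, epsilon = 2, 1.0, 1.0, 1.0
rng = np.random.default_rng(0)
scales = []
for t in range(1, 6):
    alpha_t = 1.0 / np.sqrt(t)                          # decaying step size
    delta_t = 2 * np.sqrt(d) * alpha_t * theta / omega  # Lemma 1 bound on Delta(t)
    xi_t = laplace_noise(delta_t, epsilon, size=d, rng=rng)
    scales.append(delta_t / epsilon)
```

With a decaying step size the sensitivity bound, and hence the noise scale, shrinks over the iterations, matching the observation in Remark 3.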

4.2 Regret analysis

This subsection analyzes the regret of Algorithm 1. By choosing the parameters appropriately, we establish an upper bound O(T)O(\sqrt{T}) on the regret of Algorithm 1. We first present some lemmas needed for the regret analysis.

For any ts1t\geq s\geq 1, the state transition matrix is defined as

Φ(t,s)={A(t1)A(s+1)A(s), if t>s,IN, if t=s.\Phi(t,s)=\left\{\begin{array}[]{l}A(t-1)\cdots A(s+1)A(s),\text{ if }\,\,t>s,\\ I_{N},\text{ if }\,\,t=s.\end{array}\right.

According to Property 1 in [30], we have the following lemma.

Lemma 2

Suppose Assumption 1 holds. For any i,jVi,j\in V, tst\geq s, we have

|[Φ(t,s)]ij1N|Cλts,\qquad\left|[\Phi(t,s)]_{ij}-\frac{1}{N}\right|\leq C\lambda^{t-s}, (15)

where C=2(1+a(N1)B)/(1a(N1)B)C=2\left(1+a^{-(N-1)B}\right)/\left(1-a^{(N-1)B}\right) and λ=(1a(N1)B)1/(N1)B\lambda=\left(1-a^{(N-1)B}\right)^{1/(N-1)B}.
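Lemma 2 says that products of the doubly stochastic weight matrices approach the averaging matrix 1N𝟏𝟏T\frac{1}{N}\boldsymbol{1}\boldsymbol{1}^{T} geometrically. The sketch below (Python/NumPy; the fixed ring-graph weight matrix is our own illustrative choice, whereas the lemma covers general BB-strongly connected time-varying sequences) checks this decay numerically:

```python
import numpy as np

N = 4
# A doubly stochastic weight matrix (lazy weights on a ring graph);
# any sequence satisfying Assumption 1 would exhibit the same decay.
A = np.array([[0.5 , 0.25, 0.  , 0.25],
              [0.25, 0.5 , 0.25, 0.  ],
              [0.  , 0.25, 0.5 , 0.25],
              [0.25, 0.  , 0.25, 0.5 ]])

Phi = np.eye(N)                      # Phi(t, s) with t = s
gaps = []
for _ in range(30):                  # Phi(t, s) = A(t-1) ... A(s)
    Phi = A @ Phi
    gaps.append(np.abs(Phi - 1.0 / N).max())

# max_{ij} |[Phi(t,s)]_{ij} - 1/N| should decay geometrically in t - s
ratios = [g2 / g1 for g1, g2 in zip(gaps, gaps[1:])]
```

The successive ratios stay bounded below one, which is the geometric decay rate λ\lambda of (15) for this particular matrix.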

In order to prove the main results, some necessary lemmas are provided.

Lemma 3

Suppose Assumptions 1-3 hold. For the sequence of decisions {xti,t=1,,T,i=1,,N}\{x_{t}^{i},\,t=1,\dots,T,\,i=1,\dots,N\} generated by Algorithm 1, for any i,j{1,,N}i,j\in\{1,\dots,N\}, we have

𝔼[xtixtj]\displaystyle\quad\mathbb{E}[\|x_{t}^{i}-x_{t}^{j}\|] (16)
2NdCλt1𝔼[X1]+22NdCk=1t1λtkσk+2dNθCωk=1t1λtk1αk.\displaystyle\leq\!2\sqrt{Nd}C\lambda^{t-1}\mathbb{E}\left[\|X_{1}\|\right]+2\sqrt{2}NdC\sum_{k=1}^{t-1}\lambda^{t-k}\sigma_{k}+\frac{2\sqrt{d}N\theta C}{\omega}\sum_{k=1}^{t-1}\lambda^{t-k-1}\alpha_{k}.
Proof 3

Firstly, denote pti=xt+1iztip_{t}^{i}=x_{t+1}^{i}-z_{t}^{i} and Pt=((pt1)T,,(ptN)T)TP_{t}=((p_{t}^{1})^{T},\dots,(p_{t}^{N})^{T})^{T}. By (11), we have

ptiαtθω.\|p_{t}^{i}\|\leq\frac{\alpha_{t}\theta}{\omega}. (17)

By (6), we obtain

Xt\displaystyle X_{t} =Zt1+Pt1\displaystyle=Z_{t-1}+P_{t-1}
=(A(t1)Id)(Xt1+Ξt1)+Pt1\displaystyle=\left(A(t-1)\otimes I_{d}\right)\left(X_{t-1}+\Xi_{t-1}\right)+P_{t-1}
=(Φ(t,1)Id)X1+k=1t1(Φ(t,k)Id)Ξk+k=1t1(Φ(t,k+1)Id)Pk.\displaystyle=\left(\Phi(t,1)\otimes I_{d}\right)X_{1}+\sum_{k=1}^{t-1}\left(\Phi(t,k)\otimes I_{d}\right)\Xi_{k}+\sum_{k=1}^{t-1}\left(\Phi(t,k+1)\otimes I_{d}\right)P_{k}. (18)

By (18), we obtain

xti=([Φ(t,1)]iId)X1+k=1t1([Φ(t,k)]iId)Ξk+k=1t1([Φ(t,k+1)]iId)Pk.\displaystyle x_{t}^{i}=\left([\Phi(t,1)]_{i}\otimes I_{d}\right)X_{1}\!+\!\sum_{k=1}^{t-1}\left([\Phi(t,k)]_{i}\otimes I_{d}\right)\Xi_{k}\!+\!\sum_{k=1}^{t-1}\left([\Phi(t,k+1)]_{i}\otimes I_{d}\right)P_{k}. (19)

By Assumption 1 and (18), we have

x¯t\displaystyle\overline{x}_{t} =1N(1NTId)Xt\displaystyle=\frac{1}{N}(\textbf{1}_{N}^{T}\!\otimes I_{d})X_{t}
=1N(1NTId)(Φ(t,1)Id)X1+1Nk=1t1(1NTId)(Φ(t,k)Id)Ξk\displaystyle=\frac{1}{N}(\textbf{1}_{N}^{T}\otimes I_{d})\left(\Phi(t,1)\otimes I_{d}\right)X_{1}+\frac{1}{N}\sum_{k=1}^{t-1}(\textbf{1}_{N}^{T}\otimes I_{d})\left(\Phi(t,k)\otimes I_{d}\right)\Xi_{k}
+1Nk=1t1(1NTId)(Φ(t,k+1)Id)Pk\displaystyle\quad+\frac{1}{N}\sum_{k=1}^{t-1}(\textbf{1}_{N}^{T}\otimes I_{d})\left(\Phi(t,k+1)\otimes I_{d}\right)P_{k}
=x¯1+1Nk=1t1(1NTId)Ξk+1Nk=1t1(1NTId)Pk.\displaystyle=\overline{x}_{1}+\frac{1}{N}\sum_{k=1}^{t-1}\left(\textbf{1}_{N}^{T}\otimes I_{d}\right)\Xi_{k}+\frac{1}{N}\sum_{k=1}^{t-1}\left(\textbf{1}_{N}^{T}\otimes I_{d}\right)P_{k}. (20)

Combining Lemma 2, (17), (18), and (19), we obtain

𝔼[xtix¯t]\displaystyle\mathbb{E}[\|x_{t}^{i}-\overline{x}_{t}\|] =𝔼[(([Φ(t,1)]i1N1NT)Id)X1]\displaystyle=\mathbb{E}\left[\left\|\left(\left([\Phi(t,1)]_{i}-\frac{1}{N}\textbf{1}_{N}^{T}\right)\otimes I_{d}\right)X_{1}\right\|\right]
+𝔼[k=1t1(([Φ(t,k)]i1N1NT)Id)Ξk]\displaystyle\quad+\mathbb{E}\left[\left\|\sum_{k=1}^{t-1}\left(\left([\Phi(t,k)]_{i}-\frac{1}{N}\textbf{1}_{N}^{T}\right)\otimes I_{d}\right)\Xi_{k}\right\|\right]
+𝔼[k=1t1(([Φ(t,k+1)]i1N1NT)Id)Pk]\displaystyle\quad+\mathbb{E}\left[\left\|\sum_{k=1}^{t-1}\left(\left([\Phi(t,k+1)]_{i}-\frac{1}{N}\textbf{1}_{N}^{T}\right)\otimes I_{d}\right)P_{k}\right\|\right]
(([Φ(t,1)]i1N1NT)Id)𝔼[X1]\displaystyle\leq\left\|\left(\left([\Phi(t,1)]_{i}-\frac{1}{N}\textbf{1}_{N}^{T}\right)\otimes I_{d}\right)\right\|\mathbb{E}\left[\|X_{1}\|\right]
+k=1t1(([Φ(t,k)]i1N1NT)Id)𝔼[Ξk]\displaystyle\quad+\sum_{k=1}^{t-1}\left\|\left(\left([\Phi(t,k)]_{i}-\frac{1}{N}\textbf{1}_{N}^{T}\right)\otimes I_{d}\right)\right\|\mathbb{E}\left[\|\Xi_{k}\|\right]
+k=1t1(([Φ(t,k+1)]i1N1NT)Id)𝔼[Pk]\displaystyle\quad+\sum_{k=1}^{t-1}\left\|\left(\left([\Phi(t,k+1)]_{i}-\frac{1}{N}\textbf{1}_{N}^{T}\right)\otimes I_{d}\right)\right\|\mathbb{E}\left[\|P_{k}\|\right]
NdCλt1𝔼[X1]+2NdCk=1t1λtkσk+dNθCωk=1t1λtk1αk,\displaystyle\leq\!\!\sqrt{Nd}C\lambda^{t-1}\mathbb{E}\left[\|X_{1}\!\|\right]\!\!+\!\sqrt{2}NdC\!\!\sum_{k=1}^{t-1}\!\lambda^{t-k}\sigma_{k}\!+\!\frac{\sqrt{d}N\theta C}{\omega}\!\!\sum_{k=1}^{t-1}\!\lambda^{t-k-1}\alpha_{k}, (21)

where the last inequality holds due to Lemma 2, (17), and 𝔼[Ξk]2Ndσk\mathbb{E}\left[\|\Xi_{k}\|\right]\leq\sqrt{2Nd}\sigma_{k}. Therefore, for any i,j{1,,N}i,j\in\{1,\dots,N\}, by (21) and the triangle inequality, we have (16). \qed

Lemma 4

Suppose Assumptions 1-4 hold. For the sequence of decisions {xti,t=1,,T,i=1,,N}\{x_{t}^{i},\,t=1,\dots,T,\,i=1,\dots,N\} generated by Algorithm 1, for any i{1,,N}i\in\{1,\dots,N\}, xΩx\in\Omega, we have

t=1Ti=1N1αt𝔼[Dφ(x,zti)Dφ(x,xt+1i)]NMη22αT+t=1Ti=1NM2αt𝔼[ξti2].\sum_{t=1}^{T}\sum_{i=1}^{N}\frac{1}{\alpha_{t}}\mathbb{E}\left[D_{\varphi}(x,z_{t}^{i})-D_{\varphi}(x,x_{t+1}^{i})\right]\leq\frac{NM\eta^{2}}{2\alpha_{T}}+\sum_{t=1}^{T}\sum_{i=1}^{N}\frac{M}{2\alpha_{t}}\mathbb{E}\left[\|\xi_{t}^{i}\|^{2}\right]. (22)
Proof 4

By Assumption 1, Assumption 4 and (6), for any xΩx\in\Omega, we have

i=1NDφ(x,zti)\displaystyle\sum_{i=1}^{N}D_{\varphi}(x,z_{t}^{i}) =i=1NDφ(x,j=1Naij(t)(xtj+ξtj))\displaystyle=\sum_{i=1}^{N}D_{\varphi}\left(x,\sum_{j=1}^{N}a_{ij}(t)(x_{t}^{j}+\xi_{t}^{j})\right)
i=1Nj=1Naij(t)Dφ(x,xtj+ξtj)\displaystyle\leq\sum_{i=1}^{N}\sum_{j=1}^{N}a_{ij}(t)D_{\varphi}\left(x,x_{t}^{j}+\xi_{t}^{j}\right)
=j=1NDφ(x,xtj+ξtj).\displaystyle=\sum_{j=1}^{N}D_{\varphi}\left(x,x_{t}^{j}+\xi_{t}^{j}\right).

From (6) and (7), it is known that xtix_{t}^{i} and ξti\xi_{t}^{i} are independent. Combining the properties of the Laplace distribution and Assumption 4, we obtain

𝔼[Dφ(x,xti+ξti)Dφ(x,xti)]\displaystyle\mathbb{E}\left[D_{\varphi}\left(x,x_{t}^{i}+\xi_{t}^{i}\right)-D_{\varphi}\left(x,x_{t}^{i}\right)\right] 𝔼[yDφ(x,xti),ξti]+M2𝔼[ξti2]\displaystyle\leq\mathbb{E}\left[\langle\nabla_{y}D_{\varphi}\left(x,x_{t}^{i}\right),\xi_{t}^{i}\rangle\right]+\frac{M}{2}\mathbb{E}\left[\|\xi_{t}^{i}\|^{2}\right]
=M2𝔼[ξti2].\displaystyle=\frac{M}{2}\mathbb{E}\left[\|\xi_{t}^{i}\|^{2}\right]. (23)

By Assumptions 2 - 4, we obtain

t=1T1αt(Dφ(x,xti)Dφ(x,xt+1i))\displaystyle\quad\sum_{t=1}^{T}\frac{1}{\alpha_{t}}\left(D_{\varphi}(x,x_{t}^{i})-D_{\varphi}(x,x_{t+1}^{i})\right)
=1α1Dφ(x,x1i)1αTDφ(x,xT+1i)+t=2T(1αt1αt1)Dφ(x,xti)\displaystyle=\frac{1}{\alpha_{1}}D_{\varphi}(x,x_{1}^{i})-\frac{1}{\alpha_{T}}D_{\varphi}(x,x_{T+1}^{i})+\sum_{t=2}^{T}\left(\frac{1}{\alpha_{t}}-\frac{1}{\alpha_{t-1}}\right)D_{\varphi}(x,x_{t}^{i})
M2α1xx1i2+M2t=2T(1αt1αt1)xxti2\displaystyle\leq\frac{M}{2\alpha_{1}}\|x-x_{1}^{i}\|^{2}+\frac{M}{2}\sum_{t=2}^{T}\left(\frac{1}{\alpha_{t}}-\frac{1}{\alpha_{t-1}}\right)\|x-x_{t}^{i}\|^{2}
Mη22α1+Mη22t=2T(1αt1αt1)\displaystyle\leq\frac{M\eta^{2}}{2\alpha_{1}}+\frac{M\eta^{2}}{2}\sum_{t=2}^{T}\left(\frac{1}{\alpha_{t}}-\frac{1}{\alpha_{t-1}}\right)
=Mη22αT,\displaystyle=\frac{M\eta^{2}}{2\alpha_{T}}, (24)

where the first inequality holds due to Dφ(x,xT+1i)0D_{\varphi}(x,x_{T+1}^{i})\geq 0 and

Dφ(x,x1i)=Dφ(x,x1i)Dφ(x,x)M2xx1i2,D_{\varphi}(x,x_{1}^{i})=D_{\varphi}(x,x_{1}^{i})-D_{\varphi}(x,x)\leq\frac{M}{2}\|x-x_{1}^{i}\|^{2},

and the second inequality holds due to x,yΩ\forall x,y\in\!\Omega, xyη\|x-y\|\!\leq\!\eta. By (23) and (24), we have (22). \qed

Lemma 5

([36]) Consider the following problem

minxΩ{Dφ(x,y)+s,x},\min_{x\in\Omega}\{D_{\varphi}(x,y)+\langle s,x\rangle\},

where sds\in\mathbb{R}^{d} is a given vector, and Dφ(x,y)D_{\varphi}(x,y) is given by (5). xx^{*} is the optimal solution to the above problem if and only if

s,xzDφ(z,y)Dφ(z,x)Dφ(x,y),zΩ.\langle s,x^{*}-z\rangle\leq D_{\varphi}(z,y)-D_{\varphi}(z,x^{*})-D_{\varphi}(x^{*},y),\,\forall\,z\in\Omega.
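For intuition on why the Bregman setup avoids costly projections (as noted in the introduction for simplex constraints), the following sketch (Python/NumPy; an illustration of ours, not the paper's algorithm) solves the subproblem of Lemma 5 in closed form when φ\varphi is the negative entropy on the probability simplex:

```python
import numpy as np

def entropy_md_step(x, grad, alpha):
    # One mirror descent step on the probability simplex with
    # phi(x) = sum_j x_j log x_j. The minimizer of
    # alpha*<grad, x> + D_phi(x, y) over the simplex is this closed-form
    # multiplicative update -- no Euclidean projection is required.
    w = x * np.exp(-alpha * grad)
    return w / w.sum()

# Hypothetical data: one gradient step from the uniform point.
x = np.full(3, 1.0 / 3.0)
grad = np.array([1.0, 0.0, -1.0])
x_next = entropy_md_step(x, grad, alpha=0.5)
```

The update shifts mass toward coordinates with smaller gradient components while staying exactly on the simplex, illustrating the computational advantage of a well-chosen distance-generating function.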
Lemma 6

Suppose Assumptions 1-4 hold. For the sequence of decisions {xti,t=1,,T,i=1,,N}\{x_{t}^{i},\,t=1,\dots,T,\,i=1,\dots,N\} generated by Algorithm 1, for any xΩx\in\Omega, we have

𝔼[t=1Tj=1Nftj(xti),xtix]\displaystyle\quad\mathbb{E}\left[\sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x_{t}^{i}),x_{t}^{i}-x\rangle\right]
t=1Tj=1Nβη𝔼[xtixtj]+t=1Tj=1Nθ𝔼[xtiztj]+NMη22αT\displaystyle\leq\sum_{t=1}^{T}\sum_{j=1}^{N}\beta\eta\mathbb{E}\left[\|x_{t}^{i}-x_{t}^{j}\|\right]+\sum_{t=1}^{T}\sum_{j=1}^{N}\theta\mathbb{E}\left[\|x_{t}^{i}-z_{t}^{j}\|\right]+\frac{NM\eta^{2}}{2\alpha_{T}}
+t=1Tj=1N𝔼[ftj(xtj),ztjxt+1j]+t=1Tj=1NM2αt𝔼[ξtj2].\displaystyle\quad+\sum_{t=1}^{T}\sum_{j=1}^{N}\mathbb{E}\left[\langle\nabla f_{t}^{j}(x_{t}^{j}),z_{t}^{j}-x_{t+1}^{j}\rangle\right]+\sum_{t=1}^{T}\sum_{j=1}^{N}\frac{M}{2\alpha_{t}}\mathbb{E}\left[\|\xi_{t}^{j}\|^{2}\right]. (25)
Proof 5

Denote s=αtftj(xtj)s=\alpha_{t}\nabla f_{t}^{j}(x_{t}^{j}), y=ztjy=z_{t}^{j}, x=xt+1jx^{*}=x_{t+1}^{j}, and z=xΩz=x\in\Omega. By Lemma 5, we have

Dφ(x,xt+1j)+Dφ(xt+1j,ztj)Dφ(x,ztj)αtftj(xtj),xxt+1j.D_{\varphi}(x,x_{t+1}^{j})+D_{\varphi}(x_{t+1}^{j},z_{t}^{j})-D_{\varphi}(x,z_{t}^{j})\leq\langle\alpha_{t}\nabla f_{t}^{j}(x_{t}^{j}),x-x_{t+1}^{j}\rangle.

From the non-negativity of Bregman divergence and the above inequality, we have

Dφ(x,xt+1j)Dφ(x,ztj)\displaystyle D_{\varphi}(x,x_{t+1}^{j})-D_{\varphi}(x,z_{t}^{j}) αtftj(xtj),xxt+1j\displaystyle\leq\langle\alpha_{t}\nabla f_{t}^{j}(x_{t}^{j}),x-x_{t+1}^{j}\rangle
=αtftj(xtj),xztj+αtftj(xtj),ztjxt+1j.\displaystyle=\langle\alpha_{t}\nabla f_{t}^{j}(x_{t}^{j}),x-z_{t}^{j}\rangle+\langle\alpha_{t}\nabla f_{t}^{j}(x_{t}^{j}),z_{t}^{j}-x_{t+1}^{j}\rangle.

Rearranging and summing the above inequality yields

t=1Tj=1N𝔼[ftj(xtj),ztjx]\displaystyle\quad\sum_{t=1}^{T}\sum_{j=1}^{N}\mathbb{E}\left[\langle\nabla f_{t}^{j}(x_{t}^{j}),z_{t}^{j}-x\rangle\right]
t=1Tj=1N𝔼[ftj(xtj),ztjxt+1j]+t=1Tj=1N1αt𝔼[Dφ(x,ztj)Dφ(x,xt+1j)].\displaystyle\leq\sum_{t=1}^{T}\sum_{j=1}^{N}\mathbb{E}\left[\langle\nabla f_{t}^{j}(x_{t}^{j}),z_{t}^{j}-x_{t+1}^{j}\rangle\right]+\sum_{t=1}^{T}\sum_{j=1}^{N}\frac{1}{\alpha_{t}}\mathbb{E}\left[D_{\varphi}(x,z_{t}^{j})-D_{\varphi}(x,x_{t+1}^{j})\right]. (26)

By Assumptions 1- 3, for any i,j{1,,N}i,j\in\{1,\dots,N\}, xΩx\in\Omega, we have

𝔼[t=1Tj=1Nftj(xti),xtix]\displaystyle\quad\mathbb{E}\left[\sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x_{t}^{i}),x_{t}^{i}-x\rangle\right]
=t=1Tj=1N𝔼[ftj(xti)ftj(xtj),xtix+ftj(xtj),ztjx+ftj(xtj),xtiztj]\displaystyle=\!\sum_{t=1}^{T}\!\sum_{j=1}^{N}\!\mathbb{E}\!\left[\!\langle\nabla f_{t}^{j}(x_{t}^{i})\!-\!\!\nabla f_{t}^{j}(x_{t}^{j}),x_{t}^{i}\!-\!x\rangle\!+\!\langle\nabla f_{t}^{j}(x_{t}^{j}),z_{t}^{j}\!-\!x\rangle\!+\!\langle\nabla f_{t}^{j}(x_{t}^{j}),x_{t}^{i}\!-\!z_{t}^{j}\rangle\!\right]
t=1Tj=1N𝔼[βηxtixtj+θxtiztj+ftj(xtj),ztjx],\displaystyle\leq\sum_{t=1}^{T}\sum_{j=1}^{N}\mathbb{E}\left[\beta\eta\|x_{t}^{i}-x_{t}^{j}\|+\theta\|x_{t}^{i}-z_{t}^{j}\|+\langle\nabla f_{t}^{j}(x_{t}^{j}),z_{t}^{j}-x\rangle\right], (27)

where the inequality holds due to ftj(xti)ftj(xtj)βxtixtj\|\nabla f_{t}^{j}(x_{t}^{i})-\nabla f_{t}^{j}(x_{t}^{j})\|\leq\beta\|x_{t}^{i}-x_{t}^{j}\|, xtixη\|x_{t}^{i}-x\|\leq\eta, and ftj(xtj)θ\|\nabla f_{t}^{j}(x_{t}^{j})\|\leq\theta. Combining (22), (26), and (27), the lemma is thus proved. \qed

Next, we present the theorem for the regret analysis of Algorithm 1.

Theorem 2

Suppose Assumptions 1-4 hold. If αt=1Nt\alpha_{t}=\frac{1}{N\sqrt{t}}, σt=2dαtθωϵ\sigma_{t}=\frac{2\sqrt{d}\alpha_{t}\theta}{\omega\epsilon}, t=1,,Tt=1,\dots,T, for any i{1,,N}i\in\{1,\dots,N\}, then the regret of Algorithm 1 satisfies

𝔼[RTi]U1+U2T,\displaystyle\mathbb{E}[R_{T}^{i}]\leq U_{1}+U_{2}\sqrt{T}, (28)

where U1=2N32dC(βη+2θ)𝔼[X1]1λU_{1}=\frac{2N^{\frac{3}{2}}\sqrt{d}C(\beta\eta+2\theta)\mathbb{E}\left[\|X_{1}\|\right]}{1-\lambda} and U2=82Ndθ2ωϵ+2Nθ2ω+Mη22+8d2Mθ2ω2ϵ2+82d32NCθ(βη+2θ)ωϵ(1λ)+4dNCθ(βη+2θ)ω(1λ)U_{2}=\frac{8\sqrt{2N}d\theta^{2}}{\omega\epsilon}+\frac{2\sqrt{N}\theta^{2}}{\omega}+\frac{M\eta^{2}}{2}+\frac{8d^{2}M\theta^{2}}{\omega^{2}\epsilon^{2}}+\frac{8\sqrt{2}d^{\frac{3}{2}}NC\theta(\beta\eta+2\theta)}{\omega\epsilon(1-\lambda)}+\frac{4\sqrt{d}NC\theta(\beta\eta+2\theta)}{\omega(1-\lambda)}, which implies limT𝔼[RTi]/T=0\lim_{T\rightarrow\infty}\mathbb{E}[R_{T}^{i}]/T=0.

Proof 6

To prove maxxΩ(𝔼[t=1Tj=1Nftj(xti),xtix])\max_{x\in{\Omega}}\left(\mathbb{E}\left[\sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x_{t}^{i}),x_{t}^{i}-x\rangle\right]\right) is sublinear, it suffices to show 𝔼[t=1Tj=1Nftj(xti),xtix]\mathbb{E}\left[\sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x_{t}^{i}),x_{t}^{i}-x\rangle\right] is sublinear for any xΩx\in\Omega. By (6), ftj(xtj)θ\|\nabla f_{t}^{j}(x_{t}^{j})\|\leq\theta, and 𝔼[ξti]=0\mathbb{E}[\xi_{t}^{i}]=0, we have

𝔼[ftj(xtj),ztjxt+1j]\displaystyle\mathbb{E}\left[\langle\nabla f_{t}^{j}(x_{t}^{j}),z_{t}^{j}-x_{t+1}^{j}\rangle\right] =𝔼[ftj(xtj),i=1Naji(t)(xti+ξti)xt+1j]\displaystyle=\mathbb{E}\left[\langle\nabla f_{t}^{j}(x_{t}^{j}),\sum_{i=1}^{N}a_{ji}(t)(x_{t}^{i}+\xi_{t}^{i})-x_{t+1}^{j}\rangle\right]
=i=1Naji(t)(𝔼[ftj(xtj),xtixt+1j]+𝔼[ftj(xtj),ξti])\displaystyle=\!\!\sum_{i=1}^{N}\!a_{ji}(t)\!\!\left(\mathbb{E}[\langle\nabla f_{t}^{j}(x_{t}^{j}),x_{t}^{i}\!-\!x_{t+1}^{j}\rangle]\!+\!\mathbb{E}[\!\langle\nabla f_{t}^{j}(x_{t}^{j}),\xi_{t}^{i}\rangle]\right)
i=1Naji(t)θ𝔼[xtixt+1j].\displaystyle\leq\sum_{i=1}^{N}a_{ji}(t)\theta\mathbb{E}[\|x_{t}^{i}-x_{t+1}^{j}\|]. (29)

By (6), the triangle inequality, and 𝔼[ξth]2dσt\mathbb{E}[\|\xi_{t}^{h}\|]\leq\sqrt{2d}\sigma_{t}, we have

𝔼[xtiztj]\displaystyle\mathbb{E}[\|x_{t}^{i}-z_{t}^{j}\|] h=1Najh(t)𝔼[xtixth]+𝔼[h=1Najh(t)ξth]\displaystyle\leq\sum_{h=1}^{N}a_{jh}(t)\mathbb{E}[\|x_{t}^{i}-x_{t}^{h}\|]+\mathbb{E}\left[\sum_{h=1}^{N}a_{jh}(t)\|\xi_{t}^{h}\|\right]
h=1Najh(t)𝔼[xtixth]+2dσt.\displaystyle\leq\sum_{h=1}^{N}a_{jh}(t)\mathbb{E}[\|x_{t}^{i}-x_{t}^{h}\|]+\sqrt{2d}\sigma_{t}. (30)

By (20), (17), and the properties of the Laplace distribution, we obtain

𝔼[x¯t+1x¯t]\displaystyle\mathbb{E}\left[\|\overline{x}_{t+1}-\overline{x}_{t}\|\right] =𝔼[1N(1NTId)Ξt+1N(1NTId)Pt]\displaystyle=\mathbb{E}\left[\left\|\frac{1}{N}\left(\textbf{1}_{N}^{T}\otimes I_{d}\right)\Xi_{t}+\frac{1}{N}\left(\textbf{1}_{N}^{T}\otimes I_{d}\right)P_{t}\right\|\right]
2dNσt+Nθαtω.\displaystyle\leq\sqrt{2dN}\sigma_{t}+\frac{\sqrt{N}\theta\alpha_{t}}{\omega}. (31)

By the triangle inequality, (21), and (31), we obtain

𝔼[xt+1jxti]\displaystyle\mathbb{E}\left[\|x_{t+1}^{j}-x_{t}^{i}\|\right] 𝔼[xt+1jx¯t+1]+𝔼[xtix¯t]+𝔼[x¯t+1x¯t]\displaystyle\leq\mathbb{E}\left[\|x_{t+1}^{j}-\overline{x}_{t+1}\|\right]+\mathbb{E}\left[\|x_{t}^{i}-\overline{x}_{t}\|\right]+\mathbb{E}\left[\|\overline{x}_{t+1}-\overline{x}_{t}\|\right]
2NdCλt𝔼[X1]+22NdCk=1t1λtkσk+2dNσt\displaystyle\leq 2\sqrt{Nd}C\lambda^{t}\mathbb{E}\left[\|X_{1}\|\right]+2\sqrt{2}NdC\sum_{k=1}^{t-1}\lambda^{t-k}\sigma_{k}+\sqrt{2dN}\sigma_{t}
+2dNθCωk=1t1λtk1αk+Nθαtω.\displaystyle\quad+\frac{2\sqrt{d}N\theta C}{\omega}\sum_{k=1}^{t-1}\lambda^{t-k-1}\alpha_{k}+\frac{\sqrt{N}\theta\alpha_{t}}{\omega}. (32)

Substituting (29), (30), and (32) into (25), and combining with 𝟏TA(t)=𝟏T\boldsymbol{1}^{T}A(t)=\boldsymbol{1}^{T}, AT(t)𝟏=𝟏A^{T}(t)\boldsymbol{1}=\boldsymbol{1} and (16), we obtain

𝔼[t=1Tj=1Nftj(xti),xtix]\displaystyle\quad\mathbb{E}\left[\sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x_{t}^{i}),x_{t}^{i}-x\rangle\right]
t=1Tj=1Nβη𝔼[xtixtj]+t=1Tj=1Nh=1Najh(t)θ𝔼[xtixth]+t=1TNθ2dσt\displaystyle\leq\sum_{t=1}^{T}\sum_{j=1}^{N}\beta\eta\mathbb{E}[\|x_{t}^{i}-x_{t}^{j}\|]+\sum_{t=1}^{T}\sum_{j=1}^{N}\sum_{h=1}^{N}a_{jh}(t)\theta\mathbb{E}[\|x_{t}^{i}-x_{t}^{h}\|]+\sum_{t=1}^{T}N\theta\sqrt{2d}\sigma_{t}
+t=1Tj=1Ni=1Naji(t)θ𝔼[xtixt+1j]+NMη22αT+t=1Tj=1NM2αt𝔼[ξtj2]\displaystyle\quad+\!\!\sum_{t=1}^{T}\sum_{j=1}^{N}\sum_{i=1}^{N}\!a_{ji}(t)\theta\mathbb{E}[\|x_{t}^{i}-x_{t+1}^{j}\|]\!+\!\frac{NM\eta^{2}}{2\alpha_{T}}\!+\!\sum_{t=1}^{T}\sum_{j=1}^{N}\frac{M}{2\alpha_{t}}\mathbb{E}\left[\|\xi_{t}^{j}\|^{2}\right]
2CNd(βη+θ)t=1T(Nλt1𝔼[X1]+2dNk=1t1λtkσk+Nθωk=1t1λtk1αk)\displaystyle\leq 2CN\sqrt{d}(\beta\eta\!+\!\theta)\!\!\sum_{t=1}^{T}\!\!\left(\!\!\!\sqrt{N}\lambda^{t-1}\mathbb{E}\!\left[\|X_{1}\|\right]\!+\!\sqrt{2d}N\sum_{k=1}^{t-1}\!\lambda^{t-k}\sigma_{k}\!+\!\frac{N\theta}{\omega}\!\sum_{k=1}^{t-1}\!\lambda^{t-k-1}\alpha_{k}\!\!\right)
+2Cθt=1T(N32dλt𝔼[X1]+2N2dk=1t1λtkσk+dN2θωk=1t1λtk1αk)\displaystyle\quad+2C\theta\sum_{t=1}^{T}\!\left(\!N^{\frac{3}{2}}\sqrt{d}\!\lambda^{t}\mathbb{E}\left[\|X_{1}\|\right]\!+\!\sqrt{2}N^{2}d\!\sum_{k=1}^{t-1}\!\!\lambda^{t-k}\sigma_{k}+\frac{\sqrt{d}N^{2}\theta}{\omega}\sum_{k=1}^{t-1}\lambda^{t-k-1}\alpha_{k}\right)
+t=1T(22dN32θσt+N32θ2αtω+NMdσt2αt)+NMη22αT.\displaystyle\quad+\sum_{t=1}^{T}\left(2\sqrt{2d}N^{\frac{3}{2}}\theta\sigma_{t}+\frac{N^{\frac{3}{2}}\theta^{2}\alpha_{t}}{\omega}+\frac{NMd\sigma_{t}^{2}}{\alpha_{t}}\right)+\frac{NM\eta^{2}}{2\alpha_{T}}.

Substituting αt=1Nt\alpha_{t}=\frac{1}{N\sqrt{t}} and σt=2dαtθωϵ\sigma_{t}=\frac{2\sqrt{d}\alpha_{t}\theta}{\omega\epsilon} into the above inequality, and combining with t=1Tλt11λ\sum_{t=1}^{T}\lambda^{t}\leq\frac{1}{1-\lambda}, t=1Tαt2TN\sum_{t=1}^{T}\alpha_{t}\leq\frac{2\sqrt{T}}{N}, and

t=1Tk=1t1λtkαkt=1Tk=1tλtkαk=k=1Tαkt=kTλtk2TN(1λ),\sum_{t=1}^{T}\sum_{k=1}^{t-1}\lambda^{t-k}\alpha_{k}\leq\sum_{t=1}^{T}\sum_{k=1}^{t}\lambda^{t-k}\alpha_{k}=\sum_{k=1}^{T}\alpha_{k}\sum_{t=k}^{T}\lambda^{t-k}\leq\frac{2\sqrt{T}}{N(1-\lambda)},

we obtain

𝔼[t=1Tj=1Nftj(xti),xtix]\displaystyle\quad\mathbb{E}\left[\sum_{t=1}^{T}\sum_{j=1}^{N}\langle\nabla f_{t}^{j}(x_{t}^{i}),x_{t}^{i}-x\rangle\right]
2N32dC(βη+2θ)𝔼[X1]t=1Tλt1+22dN32θt=1Tσt+N32θ2ωt=1Tαt\displaystyle\leq 2N^{\frac{3}{2}}\sqrt{d}C(\beta\eta+2\theta)\mathbb{E}\left[\|X_{1}\|\right]\sum_{t=1}^{T}\lambda^{t-1}+2\sqrt{2d}N^{\frac{3}{2}}\theta\sum_{t=1}^{T}\sigma_{t}+\frac{N^{\frac{3}{2}}\theta^{2}}{\omega}\sum_{t=1}^{T}\alpha_{t}
+NMη22αT+NMdt=1Tσt2αt+22N2dC(βη+2θ)t=1Tk=1t1λtkσk\displaystyle\quad+\frac{NM\eta^{2}}{2\alpha_{T}}+NMd\sum_{t=1}^{T}\frac{\sigma_{t}^{2}}{\alpha_{t}}+2\sqrt{2}N^{2}dC(\beta\eta+2\theta)\!\sum_{t=1}^{T}\!\sum_{k=1}^{t-1}\!\!\lambda^{t-k}\sigma_{k}
+2dN2θC(βη+2θ)ωt=1Tk=1t1λtk1αk\displaystyle\quad+\!\frac{2\sqrt{d}N^{2}\theta C(\beta\eta+2\theta)}{\omega}\!\!\sum_{t=1}^{T}\!\sum_{k=1}^{t-1}\lambda^{t-k-1}\alpha_{k}
2N32dC(βη+2θ)𝔼[X1]1λ+82Ndθ2ωϵT+2Nθ2ωT+Mη22T\displaystyle\leq\frac{2N^{\frac{3}{2}}\sqrt{d}C(\beta\eta+2\theta)\mathbb{E}\left[\|X_{1}\|\right]}{1-\lambda}+\frac{8\sqrt{2N}d\theta^{2}}{\omega\epsilon}\sqrt{T}+\frac{2\sqrt{N}\theta^{2}}{\omega}\sqrt{T}+\frac{M\eta^{2}}{2}\sqrt{T}
+8d2Mθ2ω2ϵ2T+82d32NCθ(βη+2θ)ωϵ(1λ)T+4dNCθ(βη+2θ)ω(1λ)T.\displaystyle\quad+\frac{8d^{2}M\theta^{2}}{\omega^{2}\epsilon^{2}}\sqrt{T}+\frac{8\sqrt{2}d^{\frac{3}{2}}NC\theta(\beta\eta+2\theta)}{\omega\epsilon(1-\lambda)}\sqrt{T}+\frac{4\sqrt{d}NC\theta(\beta\eta+2\theta)}{\omega(1-\lambda)}\sqrt{T}. (33)

By the arbitrariness of xx and combining with (33), we obtain (28). \qed
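The step-size summation bounds used in the proof above, t=1Tαt2T/N\sum_{t=1}^{T}\alpha_{t}\leq 2\sqrt{T}/N and the geometric double-sum bound, can be checked numerically; a small sketch with illustrative values of NN, TT, λ\lambda (our own choices):

```python
import numpy as np

# Illustrative values; any N >= 1, T >= 1, 0 < lam < 1 should work.
N, T, lam = 6, 500, 0.8
alpha = 1.0 / (N * np.sqrt(np.arange(1, T + 1)))     # alpha_t = 1/(N sqrt(t))

sum_alpha = alpha.sum()                               # claimed <= 2 sqrt(T) / N
double_sum = sum(lam ** (t - k) * alpha[k - 1]        # sum_t sum_{k<t} lam^{t-k} alpha_k
                 for t in range(1, T + 1)
                 for k in range(1, t))
bound = 2 * np.sqrt(T) / (N * (1 - lam))              # claimed upper bound
```

Exchanging the order of summation, as in the proof, shows why the bound holds: each αk\alpha_{k} picks up at most a full geometric series 1/(1λ)1/(1-\lambda).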

Remark 4

From Theorem 2, we know that the regret of Algorithm 1 grows sublinearly. This is consistent with the best results in existing works on distributed online nonconvex optimization. The differential privacy mechanism does not affect the order of the regret bound, but only its coefficient. Furthermore, the regret depends on the properties of the cost functions, the constraint set, the problem dimension, the number of nodes, the connectivity of the communication graph, and the privacy parameters. It is worth noting that, by Theorem 2, the larger the privacy parameter ϵ\epsilon is, the tighter the regret bound, whereas by the differential privacy analysis of Theorem 1, the smaller the privacy parameter, the higher the privacy level. Therefore, in practical applications, an appropriate privacy parameter must be chosen to balance algorithm performance against the privacy level.

5 Numerical Simulations

In this section, we demonstrate the theoretical results of the proposed algorithm (DPDO-NC) on the distributed localization problem (Example 1). We consider a network of 6 sensors whose time-varying communication topology switches among three graphs, denoted by G1,G2,G3G_{1},G_{2},G_{3}. The weight matrices of these three graphs are denoted as A1,A2,A3A_{1},A_{2},A_{3}, where

A1=[120000121212000001212000001212000001212000001212],A2=[015151515151501515151515150151515151515015151515151501515151515150],A3=[130130130013013013130130130013013013130130130013013013].A_{1}\!\!=\!\!\!\left[\begin{array}[]{cccccc}\frac{1}{2}&0&0&0&0&\frac{1}{2}\\ \frac{1}{2}&\frac{1}{2}&0&0&0&0\\ 0&\frac{1}{2}&\frac{1}{2}&0&0&0\\ 0&0&\frac{1}{2}&\frac{1}{2}&0&0\\ 0&0&0&\frac{1}{2}&\frac{1}{2}&0\\ 0&0&0&0&\frac{1}{2}&\frac{1}{2}\\ \end{array}\right]\!\!\!,\,\!A_{2}\!\!=\!\!\!\left[\begin{array}[]{cccccc}0&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}\\ \frac{1}{5}&0&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}\\ \frac{1}{5}&\frac{1}{5}&0&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}\\ \frac{1}{5}&\frac{1}{5}&\frac{1}{5}&0&\frac{1}{5}&\frac{1}{5}\\ \frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&0&\frac{1}{5}\\ \frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&0\\ \end{array}\right]\!\!\!,\,\!A_{3}\!\!=\!\!\!\left[\begin{array}[]{cccccc}\frac{1}{3}&0&\frac{1}{3}&0&\frac{1}{3}&0\\ 0&\frac{1}{3}&0&\frac{1}{3}&0&\frac{1}{3}\\ \frac{1}{3}&0&\frac{1}{3}&0&\frac{1}{3}&0\\ 0&\frac{1}{3}&0&\frac{1}{3}&0&\frac{1}{3}\\ \frac{1}{3}&0&\frac{1}{3}&0&\frac{1}{3}&0\\ 0&\frac{1}{3}&0&\frac{1}{3}&0&\frac{1}{3}\\ \end{array}\right]\!\!\!.

The evolution process of the target location is defined as

xt+10=xt0+[(1)qtsin(t/50)10tqtcos(t/70)40t],x_{t+1}^{0}=x_{t}^{0}+\begin{bmatrix}\frac{(-1)^{q_{t}}\sin(t/50)}{10t}\\ \frac{-q_{t}\cos(t/70)}{40t}\end{bmatrix},

where qtBernoulli(0.5)q_{t}\sim Bernoulli(0.5), and initial state x10=[0.8,0.95]Tx_{1}^{0}=[0.8,0.95]^{T}. The distance measurements are given by

dti=sixt0+ϑti,i=1,,N,d_{t}^{i}=\|s_{i}-x_{t}^{0}\|+\vartheta_{t}^{i},\,\,\,i=1,\dots,N,

where ϑti\vartheta_{t}^{i} is the measurement error of the model, which follows a uniform distribution over [0,0.001][0,0.001]. The sensors collaborate to solve the following problem

minxΩi=1612|sixdti|2,\min_{x\in\Omega}\sum_{i=1}^{6}\frac{1}{2}\left|\|s_{i}-x\|-d_{t}^{i}\right|^{2},

where Ω={x2|x13}\Omega=\{x\in\mathbb{R}^{2}|\|x\|_{1}\leq 3\}. The initial positions of the sensors are si=[0.8,0.95]T,i=1,,6s_{i}=[0.8,0.95]^{T},i=1,\dots,6. The initial estimates of the sensors are x1i=[0,0]T,i=1,,6x_{1}^{i}=[0,0]^{T},i=1,\dots,6. The time horizon is fixed at T=500T=500. The distance-generating function of the algorithm is φ(x)=12x2\varphi(x)=\frac{1}{2}\|x\|^{2}, and the step size is αt=16t\alpha_{t}=\frac{1}{6\sqrt{t}}.
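As a rough illustration of this setup, the sketch below (Python/NumPy) implements one possible reading of the DPDO-NC iteration with φ=12x2\varphi=\frac{1}{2}\|x\|^{2}, under several simplifying assumptions of ours: a fixed complete-graph weight matrix instead of the switching A1,A2,A3A_{1},A_{2},A_{3}, randomly placed sensors, θ=ω=1\theta=\omega=1 in the noise scale, and a radial rescaling into the 1\ell_{1} ball in place of the exact Euclidean projection:

```python
import numpy as np

def l1_project(x, radius=3.0):
    # Simplified feasibility step: radially rescale into the l1 ball.
    # (The exact Euclidean projection needs a soft-thresholding routine;
    # rescaling keeps the sketch short and the iterates feasible.)
    n = np.abs(x).sum()
    return x if n <= radius else x * (radius / n)

def dpdo_nc_step(X, A, sensors, dists, alpha, sigma, rng):
    # One round of the (sketched) update with phi = 0.5*||x||^2: perturb
    # with Laplace noise, average with the weight matrix A, then take a
    # projected gradient step on the local cost 0.5*(||s_i - x|| - d_i)^2.
    noisy = X + rng.laplace(0.0, sigma, size=X.shape)  # Laplace mechanism
    Z = A @ noisy                                      # consensus step
    Xn = np.empty_like(X)
    for i, (s, d_meas) in enumerate(zip(sensors, dists)):
        r = np.linalg.norm(s - Z[i]) + 1e-12
        grad = (r - d_meas) * (Z[i] - s) / r           # nonconvex local cost
        Xn[i] = l1_project(Z[i] - alpha * grad)        # mirror descent step
    return Xn

rng = np.random.default_rng(0)
N, d = 6, 2
sensors = rng.uniform(-1, 1, size=(N, d))   # hypothetical sensor positions
target = np.array([0.8, 0.95])
dists = np.linalg.norm(sensors - target, axis=1)
A = np.full((N, N), 1.0 / N)                # fixed complete-graph weights
X = np.zeros((N, d))
for t in range(1, 201):
    alpha = 1.0 / (N * np.sqrt(t))
    sigma = 2 * np.sqrt(d) * alpha / 5.0    # theta = omega = 1, epsilon = 5
    X = dpdo_nc_step(X, A, sensors, dists, alpha, sigma, rng)
```

Despite the injected Laplace noise, the averaged estimate drifts toward the target, in line with the sublinear regret of Theorem 2.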

Figure 1 shows the evolution of maxi𝔼[RTi]/T\max_{i}\mathbb{E}[R_{T}^{i}]/T with different privacy levels, where the privacy parameters are set to ϵ=5\epsilon=5, 11, and 0.50.5, and the case without differential privacy is also included for comparison. From Figure 1, it can be observed that Algorithm 1 converges under all privacy levels, and the lower the privacy level, i.e., the larger ϵ\epsilon, the better the convergence performance of the algorithm. The parameter ϵ\epsilon reveals the trade-off between privacy protection and algorithm performance, which is consistent with the theoretical results presented in this paper.

Refer to caption
Figure 1: Evolution of maxi𝔼[RTi]/T\max_{i}\mathbb{E}[R_{T}^{i}]/T with different privacy levels.

When the privacy parameter is set to ϵ=5\epsilon=5, we compare the convergence of each node. Figure 2 depicts the variation in the regret of each node. As shown in Figure 2, all nodes converge, and the convergence speeds are almost identical. This indicates that through interaction with neighboring nodes, the regret of each node achieves sublinear convergence.

Refer to caption
Figure 2: Comparison of each node’s regret when ϵ=5\epsilon=5.

We also present the convergence of the algorithm with different Bregman divergences. We demonstrate the algorithm's performance using the squared Euclidean distance and the Mahalanobis distance, with φ=x2\varphi=\|x\|^{2} and φ=xTQx\varphi=x^{T}Qx, where QQ is a positive definite matrix. Both distance-generating functions satisfy Assumption 4. Figure 3 shows the evolution of maxi𝔼[RTi]/T\max_{i}\mathbb{E}[R_{T}^{i}]/T under different Bregman divergences with ϵ=5\epsilon=5. As seen in Figure 3, for any Bregman divergence satisfying Assumption 4, the proposed algorithm converges to a stationary point and the regret is sublinear.

Refer to caption
Figure 3: Evolution of maxi𝔼[RTi]/T\max_{i}\mathbb{E}[R_{T}^{i}]/T with different Bregman divergences.

6 Conclusion

In this paper, we study the privacy-preserving distributed online optimization for nonconvex problems over time-varying graphs. Based on the Laplace differential privacy mechanism and the distributed mirror descent algorithm, we propose a privacy-preserving distributed online mirror descent algorithm for nonconvex optimization (DPDO-NC), which guarantees ϵ\epsilon-differential privacy at each iteration. In addition, we establish the upper bound of the regret for the proposed algorithm and prove that the individual regret based on the first-order optimality condition grows sublinearly, with an upper bound O(T)O(\sqrt{T}). Finally, we demonstrate the effectiveness of the algorithm through numerical simulations and analyze the relationship between the algorithm regret and the privacy level. Future research could explore more complex communication networks and scenarios with intricate constraints, as well as focus on improving the convergence rate of algorithms for solving distributed online nonconvex optimization problems.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62261136550.

References

  • [1] G. Tychogiorgos, A. Gkelias, K. K. Leung, A non-convex distributed optimization framework and its application to wireless ad-hoc networks, IEEE Transactions on Wireless Communications 12 (9) (2013) 4286–4296.
  • [2] W. Chen, T. Li, Distributed economic dispatch for energy internet based on multiagent consensus control, IEEE Transactions on Automatic Control 66 (1) (2020) 137–152.
  • [3] W. Ma, J. Wang, V. Gupta, C. Chen, Distributed energy management for networked microgrids using online admm with regret, IEEE Transactions on Smart Grid 9 (2) (2016) 847–856.
  • [4] P. Nazari, D. A. Tarzanagh, G. Michailidis, Dadam: A consensus-based distributed adaptive gradient method for online optimization, IEEE Transactions on Signal Processing 70 (2022) 6065–6079.
  • [5] E. Hazan, A. Agarwal, S. Kale, Logarithmic regret algorithms for online convex optimization, Machine Learning 69 (2) (2007) 169–192.
  • [6] M. Akbari, B. Gharesifard, T. Linder, Individual regret bounds for the distributed online alternating direction method of multipliers, IEEE Transactions on Automatic Control 64 (4) (2018) 1746–1752.
  • [7] R. Dixit, A. S. Bedi, K. Rajawat, Online learning over dynamic graphs via distributed proximal gradient algorithm, IEEE Transactions on Automatic Control 66 (11) (2020) 5065–5079.
  • [8] J. Li, C. Gu, Z. Wu, Online distributed stochastic learning algorithm for convex optimization in time-varying directed networks, Neurocomputing 416 (2020) 85–94.
  • [9] K. Lu, G. Jing, L. Wang, Online distributed optimization with strongly pseudoconvex-sum cost functions, IEEE Transactions on Automatic Control 65 (1) (2019) 426–433.
  • [10] A. Beck, M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters 31 (3) (2003) 167–175.
  • [11] A. Ben-Tal, T. Margalit, A. Nemirovski, The ordered subsets mirror descent optimization method with applications to tomography, SIAM Journal on Optimization 12 (1) (2001) 79–108.
  • [12] S. Shahrampour, A. Jadbabaie, Distributed online optimization in dynamic environments using mirror descent, IEEE Transactions on Automatic Control 63 (3) (2017) 714–725.
  • [13] D. Yuan, Y. Hong, D. W. Ho, S. Xu, Distributed mirror descent for online composite optimization, IEEE Transactions on Automatic Control 66 (2) (2020) 714–729.
  • [14] C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating noise to sensitivity in private data analysis, in: Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, Springer, 2006, pp. 265–284.
  • [15] M. Yuan, J. Lei, Y. Hong, Differentially private distributed online mirror descent algorithm, Neurocomputing 551 (2023) 126531.
  • [16] Z. Zhao, Z. Yang, M. Wei, Q. Ji, Privacy preserving distributed online projected residual feedback optimization over unbalanced directed graphs, Journal of the Franklin Institute 360 (18) (2023) 14823–14840.
  • [17] Q. Lü, K. Zhang, S. Deng, Y. Li, H. Li, S. Gao, Y. Chen, Privacy-preserving decentralized dual averaging for online optimization over directed networks, IEEE Transactions on Industrial Cyber-Physical Systems 1 (2023) 79–91.
  • [18] C. Li, P. Zhou, L. Xiong, Q. Wang, T. Wang, Differentially private distributed online learning, IEEE Transactions on Knowledge and Data Engineering 30 (8) (2018) 1440–1453.
  • [19] J. Zhu, C. Xu, J. Guan, D. O. Wu, Differentially private distributed online algorithms over time-varying directed networks, IEEE Transactions on Signal and Information Processing over Networks 4 (1) (2018) 4–17.
  • [20] Y. Xiong, J. Xu, K. You, J. Liu, L. Wu, Privacy-preserving distributed online optimization over unbalanced digraphs via subgradient rescaling, IEEE Transactions on Control of Network Systems 7 (3) (2020) 1366–1378.
  • [21] H. Wang, K. Liu, D. Han, S. Chai, Y. Xia, Privacy-preserving distributed online stochastic optimization with time-varying distributions, IEEE Transactions on Control of Network Systems 10 (2) (2022) 1069–1082.
  • [22] J. Zhang, S. Ge, T. Chang, Z. Luo, Decentralized non-convex learning with linearly coupled constraints: Algorithm designs and application to vertical learning problem, IEEE Transactions on Signal Processing 70 (2022) 3312–3327.
  • [23] S. Hashempour, A. A. Suratgar, A. Afshar, Distributed nonconvex optimization for energy efficiency in mobile ad hoc networks, IEEE Systems Journal 15 (4) (2021) 5683–5693.
  • [24] E. Hazan, K. Singh, C. Zhang, Efficient regret minimization in non-convex games, in: International Conference on Machine Learning, PMLR, 2017, pp. 1433–1441.
  • [25] A. Lesage-Landry, J. A. Taylor, I. Shames, Second-order online nonconvex optimization, IEEE Transactions on Automatic Control 66 (10) (2020) 4866–4872.
  • [26] J. Li, C. Li, J. Fan, T. Huang, Online distributed stochastic gradient algorithm for non-convex optimization with compressed communication, IEEE Transactions on Automatic Control 69 (2) (2024) 936–951.
  • [27] K. Lu, L. Wang, Online distributed optimization with nonconvex objective functions: Sublinearity of first-order optimality condition-based regret, IEEE Transactions on Automatic Control 67 (6) (2021) 3029–3035.
  • [28] Y. Wang, T. Başar, Decentralized nonconvex optimization with guaranteed privacy and accuracy, Automatica 150 (2023) 110858.
  • [29] M. Khajenejad, S. Martínez, Guaranteed privacy of distributed nonconvex optimization via mixed-monotone functional perturbations, IEEE Control Systems Letters 7 (2022) 1081–1086.
  • [30] A. Nedic, A. Ozdaglar, Distributed subgradient methods for multi-agent optimization, IEEE Transactions on Automatic Control 54 (1) (2009) 48–61.
  • [31] M. Cao, B. D. Anderson, A. S. Morse, Sensor network localization with imprecise distances, Systems & Control Letters 55 (11) (2006) 887–893.
  • [32] Y. Hua, S. Liu, Y. Hong, K. H. Johansson, G. Wang, Distributed online bandit nonconvex optimization with one-point residual feedback via dynamic regret, arXiv preprint arXiv:2409.15680.
  • [33] C. Dwork, Differential privacy, in: International Colloquium on Automata, Languages, and Programming, Springer, 2006, pp. 1–12.
  • [34] Z. Huang, S. Mitra, N. Vaidya, Differentially private distributed optimization, in: Proceedings of the 16th International Conference on Distributed Computing and Networking, 2015, pp. 1–10.
  • [35] Y. Wang, A. Nedić, Tailoring gradient methods for differentially private distributed optimization, IEEE Transactions on Automatic Control 69 (2) (2023) 872–887.
  • [36] A. Beck, M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters 31 (3) (2003) 167–175.