
Distributed gradient-based optimization in the presence of dependent aperiodic communication

Adrian Redder (aredder@mail.upb.de), Department of Computer Science, Paderborn University. Arunselvan Ramaswamy (arunselvan.ramaswamy@kau.se), Department of Computer Science, Karlstad University. Holger Karl (holger.karl@hpi.de), Hasso-Plattner-Institute, University of Potsdam.
Abstract

Iterative distributed optimization algorithms involve multiple agents that communicate with each other, over time, in order to minimize/maximize a global objective. In the presence of unreliable communication networks, the Age-of-Information (AoI), which measures the freshness of data received, may be large and hence hinder algorithmic convergence. In this paper, we study the convergence of general distributed gradient-based optimization algorithms in the presence of communication that neither happens periodically nor at stochastically independent points in time. We show that convergence is guaranteed provided the random variables associated with the AoI processes are stochastically dominated by a random variable with finite first moment. This improves on previous requirements of boundedness of more than the first moment. We then introduce stochastically strongly connected (SSC) networks, a new stochastic form of strong connectedness for time-varying networks. We show: if for any $p\geq 0$ the processes that describe the success of communication between agents in an SSC network are $\alpha$-mixing with $n^{p-1}\alpha(n)$ summable, then the associated AoI processes are stochastically dominated by a random variable with finite $p$-th moment. In combination with our first contribution, this implies that distributed stochastic gradient descent converges in the presence of AoI, if $\alpha(n)$ is summable.

Keywords: Distributed Optimization, Stochastic Gradient Descent, Time-Varying Networks, Age of Information, $\alpha$-Mixing, Stochastic Dominance

1 Introduction

Distributed optimization of stochastically approximated loss functions lies at the heart of many system-level problems that arise in multi-agent learning [26], resource allocation for data centers [12], or decentralized control of power systems [18]. In these scenarios, distributed implementations have many advantages such as balanced workload or the avoidance of a single point of failure. However, this usually comes with high communication costs for coordination [29], entailing that information can only be exchanged rarely, causing local versions of global information to be significantly outdated. Hence, it is of high interest to characterize conditions such that a distributed optimization algorithm can converge when only significantly outdated information with sporadic updates is available.

We therefore consider distributed stochastic optimization problems (SOPs) where the choice of local optimization variables has to be coordinated over an uncertain time-varying communication network. A typical distributed SOP can take the following form:

x^{*}=(x_{1}^{*},\ldots,x_{D}^{*})=\underset{x\in\mathbb{R}^{d}}{\text{argmin }}\mathbb{E}_{\xi}\left[f(x,\xi)\right] (1.1)

The objective is to minimize a real-valued function $f:\mathbb{R}^{d}\times\mathcal{S}\to\mathbb{R}$, which is a function of an optimization variable $x\in\mathbb{R}^{d}$ and a random variable $\xi$ representing noise or uncertainty taken from a set $\mathcal{S}$. The optimization variable $x$ is composed of local components $x_{i}\in\mathbb{R}^{d_{i}}$ that are associated with local agents of a distributed system. Hence, no global control of $x$ is possible. Moreover, the distribution of $\xi$ is typically unknown in practical scenarios and the agents can only observe samples $\xi^{n}$ of the uncertainty $\xi$ at discrete time steps $n\in\mathbb{N}_{0}$. Thus, the problem is that each agent $i$ has to coordinate the local choice for its variable $x_{i}$ with all other agents by exchanging information over a network and iteratively refine this choice based on the observed samples of the uncertainty $\xi^{n}$.

To solve this problem, we propose the following solution. Suppose every agent $i$ runs a local distributed stochastic gradient descent (SGD) algorithm that generates a sequence $\{x_{i}^{n}\}_{n=0}^{\infty}$ to solve problem (1.1). Ideally, every agent $i$ would like to have direct access to every new element of the sequences $\{x_{j}^{n}\}_{n=0}^{\infty}$ from every other agent $j\not=i$ during the run-time of its own local algorithm. However, due to the distributed nature of the considered setting, the agents have to communicate the updates of their local optimization variables to other agents via a communication network. Because of the uncertainty of communication networks, each agent $i$ can therefore only use delayed versions $x_{j}^{n-\tau_{ij}(n)}$ for all $j\not=i$ to update its own local variable $x_{i}^{n}$. Here, $x_{j}^{n-\tau_{ij}(n)}$ denotes the newest update of $x_{j}$ available at agent $i$ at time $n$ and $\tau_{ij}(n)$ is its corresponding age. We refer to the $\tau_{ij}(n)$'s as the Age of Information (AoI) variables. The resulting distributed algorithm is therefore in essence a "straightforward" implementation of SGD, where the true values of local variables are replaced by their aged counterparts. Due to the size of the information generated in large distributed systems, and the uncertainty and high cost of communication over networks, the AoI variables cannot be expected to be bounded and should therefore be modelled as an unbounded sequence of random variables. The problem is therefore to formulate mild network and communication assumptions that are representative and easily verifiable, such that this SGD algorithm that uses highly aged information will still converge.

A major challenge for this problem is the multitude of potential factors that affect the AoI random variables. Information exchange between some pairs of agents might experience unbounded delays; mobility of agents or network scheduling algorithms can induce a varying set of network topologies. This can create dependencies among successive network transmissions, preventing agents from exchanging data for extended periods of time. In general, transmissions that happen close in some domain (e.g. time, frequency, or space in wireless communication) are expected to be highly correlated. It is therefore important to formulate a communication network model and associated assumptions that can represent these cases while being mathematically tractable for analysis. Notably, the assumption of guaranteed periodic or stochastically independent communication is practically unrealistic.

1.1 Network models in the literature

One of the most common models in the distributed optimization literature is a time-varying network model that is represented by a time-varying graph (Definition 5). For this graph, the most common assumption is that there is a constant $M$ such that the union graph associated with all time intervals $[n,n+M]$ is strongly connected [32, 1, 34, 15]. A network with this property is typically called uniformly strongly connected [21], $M$-strongly connected [22, 28] or jointly strongly connected [33]. This model implies guaranteed periodic communication. Another common model is to assume a time-varying network graph whose expected union graph is strongly connected, where the events that describe the success of communication across network edges are independent across time [3, 16, 14, 25].

In refs. [16, 22, 32, 28, 34, 14, 3, 15] the objective is that agents come to a consensus on one global optimization variable to minimize the sum of real-valued functions, each of which is associated with one of the local agents. Although such consensus-type problems might appear quite different from (1.1), it turns out that an algorithm for (1.1) can also find a solution for consensus problems after a minor reformulation at the cost of additional communication, which we discuss in [27]. In contrast to the consensus-type problems, refs. [33, 1, 25] and this work consider distributed optimization problems where each agent has to select a local optimization variable, such that the combination of all local variables solves a global optimization problem.

Observe that the literature exclusively considers network models that either guarantee periodic communication or require communication based on independent events. We believe that these are restrictive assumptions that do not represent real-world communication networks well. To close this gap, we present a less restrictive network model and verifiable network conditions that guarantee that an SGD algorithm finds a solution to problems of the form (1.1). We also show that the aforementioned typical network assumptions from the literature are stronger versions of our new set of network assumptions (Assumptions 5 and 6). Our assumptions only require a stochastic form of strong connectivity and a dependency decay (mixing) property. To the best of our knowledge, ours is the first work that guarantees asymptotic convergence of a distributed optimization scheme under such mildly restrictive conditions, connecting an abstract optimization theory with a wide range of verifiable network conditions. However, it must be noted that other papers (such as those discussed above) provide rate-of-convergence results, while we merely give an almost sure convergence analysis.

1.2 Summary of contribution

Our work contributes to the literature on network conditions that guarantee asymptotic convergence to the set of stationary points of a distributed stochastic optimization problem with a potentially non-convex objective function. Our work builds on our previous work on SGD for time-varying networks [25]. However, whereas in [25] the focus was on the optimization iteration, with a strong and restrictive i.i.d. network assumption, this work focuses on guaranteeing convergence under significantly weaker network conditions. Most importantly, our network conditions cover time-varying network topologies, unbounded communication delays, non-independent aperiodic communication, asynchronous local updates and event-driven communication.

As the first step, we describe a distributed stochastic gradient descent algorithm (Algorithm 1) that, as a consequence of network communication, uses aged variables instead of the true local variables. The AoI variables $\tau_{ij}(n)$ therefore induce gradient errors when comparing Algorithm 1 with and without AoI. As our first major contribution, we show in Lemma 2 that the aforementioned gradient errors vanish asymptotically under an asymptotic growth condition for the AoI variables. Specifically, we require that all $\tau_{ij}(n)$ for all $n\in\mathbb{N}_{0}$ are stochastically dominated by a non-negative integer-valued random variable with at least a finite first moment. This provides a significant weakening of traditional assumptions from the stochastic approximation literature in the present setting, since traditionally a dominating random variable with at least a bounded moment greater than one was required. With Lemma 2 we then show the convergence of Algorithm 1 in Theorem 1.

Our second contribution is a universally applicable time-varying network model and associated assumptions to verify, in general, the existence of dominating random variables with an arbitrary required moment condition. Our time-varying network model is formulated using events $A^{n}_{ij}$, each of which represents successful information exchange from some agent $i$ to another agent $j$ during some time slot $n$. We then introduce the notion of an $(\varepsilon,\kappa)$-stochastically strongly connected (SSC) network with $\varepsilon\in(0,1)$ and $\kappa\in\mathbb{N}_{0}$. This notion requires that there is a set of network edges that form a strongly connected graph for which $\mathbb{P}\left(\bigcup_{k=n}^{n+\kappa}A_{ij}^{k}\right)\geq\varepsilon$ for all $n\in\mathbb{N}_{0}$. In other words, for those edges communication is successful at least once over every interval of length $\kappa$ with probability at least $\varepsilon$. We then present a general recipe to validate stochastic dominance properties with required moment conditions. Afterwards, Theorem 2 presents our main result: Fix any $p\geq 0$ and consider an $(\varepsilon,\kappa)$-SSC network. If there exists some $\eta\in\mathbb{N}_{0}$, such that the processes $\mathbbm{1}_{\bigcup_{k=n}^{n+\eta}A^{k}_{ij}}$ are $\alpha$-mixing with $\sum_{n=0}^{\infty}n^{p-1}\alpha(n)<\infty$, then all AoI variables $\tau_{ij}(n)$ are stochastically dominated by a non-negative integer-valued random variable $\overline{\tau}$ with $\mathbb{E}\left[\overline{\tau}^{p}\right]<\infty$. This result, together with Theorem 1, implies our final convergence result for Algorithm 1 under the minimal requirement of an SSC network with summable $\alpha$-mixing coefficients.

The rest of the paper is structured as follows: In Section 2 we state notation and preliminaries from probability and graph theory. In Section 3 we discuss the problem formulation and our distributed SGD algorithm. Afterwards we prove the almost sure convergence of Algorithm 1 in Section 4 under asymptotic growth conditions for the AoI variables. We then introduce our time-varying network model and associated assumptions in Section 5. Section 6 then presents our construction to validate stochastic dominance properties and our main results. Finally, we discuss the verifiability of our network assumptions and future work in Section 7.

2 Notation, definitions and preliminaries

This section presents notation and preliminaries from probability and graph theory. Throughout our work, discrete points in time are indicated by superscript letters $n$. We refer to a time slot $n$ as the time interval from time step $n-1$ to $n$. We use $n\in\mathbb{N}_{0}$ to denote $n\in\mathbb{N}\cup\{0\}$.

We make frequent use of the big $\mathcal{O}$ notation: Consider two real-valued sequences $x^{n}$, $y^{n}$. Then $x^{n}\in\mathcal{O}(y^{n})$, if $\limsup\limits_{n\rightarrow\infty}\frac{x^{n}}{y^{n}}<\infty$.

From probability theory we need the concepts of stochastic dominance, expectation of non-negative integer-valued random variables, measure of dependency and $\alpha$-mixing:

Definition 1.

A non-negative integer-valued random variable $\tau$ is said to be stochastically dominated by a random variable $\overline{\tau}$ if $\mathbb{P}\left(\tau>m\right)\leq\mathbb{P}\left(\overline{\tau}>m\right)$ for all $m\geq 0$.

Proposition 1 ([9]).

Suppose $\tau$ is a non-negative integer-valued random variable, then

\mathbb{E}\left[\tau^{p}\right]=\sum_{m=0}^{\infty}((m+1)^{p}-m^{p})\mathbb{P}\left(\tau>m\right),\qquad p>0. (2.1)
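For $p=1$, (2.1) reduces to the familiar tail-sum identity $\mathbb{E}\left[\tau\right]=\sum_{m=0}^{\infty}\mathbb{P}\left(\tau>m\right)$. As a quick worked example, if $\mathbb{P}\left(\tau>m\right)=q^{m+1}$ for some $q\in(0,1)$ (a geometrically distributed $\tau$), then $\mathbb{E}\left[\tau\right]=\sum_{m=0}^{\infty}q^{m+1}=\frac{q}{1-q}$.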

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, and let $\mathcal{A}$ and $\mathcal{B}$ be two sub-$\sigma$-algebras of $\mathcal{F}$. The measure of dependency $\alpha$ between $\mathcal{A}$ and $\mathcal{B}$ is defined as

\alpha(\mathcal{A},\mathcal{B})\coloneqq\sup_{A\in\mathcal{A},B\in\mathcal{B}}\lvert\mathbb{P}\left(A\cap B\right)-\mathbb{P}\left(A\right)\mathbb{P}\left(B\right)\rvert. (2.2)

Consider a stochastic process $X=\{X_{n}\}_{n\in\mathbb{N}_{0}}$. For $0\leq l\leq m\leq\infty$, define the sub-$\sigma$-algebra generated from $X_{l}$ up to $X_{m}$ by

\mathcal{F}_{l}^{m}\coloneqq\sigma\left(X_{n}\mid l\leq n\leq m\right). (2.3)

Informally, the σ\sigma-algebra generated by a stochastic process from a time interval describes the information that can be extracted from the associated process realizations, see [11] for details.

Definition 2.

The $\alpha$-mixing coefficients of the process $X$ are

\alpha(n)\coloneqq\sup_{l\geq 0}\alpha(\mathcal{F}_{0}^{l},\mathcal{F}_{l+n}^{\infty}) (2.4)

for every $n\in\mathbb{N}_{0}$. The process $X$ is called strongly mixing (or $\alpha$-mixing), if $\alpha(n)\rightarrow 0$ as $n\rightarrow\infty$.

Mixing is a notion of asymptotic independence. We refer to [8] for a survey about different mixing notions. We now introduce a subclass of strongly mixing processes with different rates of convergence:

Definition 3.

The process $X$ is called $\boldsymbol{p}$-strongly mixing for some $p\geq 0$, if

\sum_{n=0}^{\infty}n^{p-1}\alpha(n)<\infty. (2.5)
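For example, every process whose mixing coefficients vanish after finitely many steps (such as an i.i.d. or $m$-dependent process) is trivially $p$-strongly mixing for every $p\geq 0$. Likewise, if the coefficients decay geometrically, i.e. $\alpha(n)\leq C\rho^{n}$ for some $C>0$ and $\rho\in(0,1)$, then $\sum_{n=0}^{\infty}n^{p-1}\alpha(n)<\infty$ for every $p\geq 0$, so such a process is $p$-strongly mixing of every order.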

We will use this new mixing property to describe dependency decay of different orders.

For details on graph theory we refer the reader to [30]. We require the following concepts:

Definition 4.

A directed graph is called strongly connected, if every pair of nodes is connected by a directed path.

Definition 5.

A time-varying network is defined as a sequence

\{(\mathcal{V},\mathcal{E}^{n})\}_{n\in\mathbb{N}_{0}}, (2.6)

where each element $(\mathcal{V},\mathcal{E}^{n})$ is a directed graph.

We will use the following new connectivity notion for time-varying networks:

Definition 6.

A time-varying network $\{(\mathcal{V},\mathcal{E}^{n})\}_{n\in\mathbb{N}_{0}}$ is called $(\varepsilon,\kappa)$-stochastically strongly connected (SSC) with $\varepsilon\in(0,1)$ and $\kappa\in\mathbb{N}_{0}$, if there exists a strongly connected graph $(\mathcal{V},\mathcal{E})$, such that for all $n\in\mathbb{N}_{0}$ and for all $(i,j)\in\mathcal{E}$

\mathbb{P}\left((i,j)\in\bigcup_{k=n}^{n+\kappa}\mathcal{E}^{k}\right)\geq\varepsilon. (2.7)
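Strong connectivity of the underlying edge set $\mathcal{E}$ is a purely graph-theoretic prerequisite of Definition 6 and can be checked mechanically. The following minimal Python sketch (the function name and the edge-list representation are our own illustrative choices, not part of the paper) verifies it by breadth-first search in the graph and in its reverse:

from collections import defaultdict, deque

def is_strongly_connected(nodes, edges):
    # Returns True if every node can reach and be reached from every other node.
    def reachable(adjacency, start):
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen

    forward, backward = defaultdict(list), defaultdict(list)
    for i, j in edges:
        forward[i].append(j)
        backward[j].append(i)

    start = next(iter(nodes))
    # Strong connectivity: all nodes reachable from `start` in both orientations.
    return reachable(forward, start) == set(nodes) == reachable(backward, start)

# Example: a directed 3-cycle is strongly connected, a directed path is not.
print(is_strongly_connected({1, 2, 3}, [(1, 2), (2, 3), (3, 1)]))  # True
print(is_strongly_connected({1, 2, 3}, [(1, 2), (2, 3)]))          # False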

3 Problem description

We consider a $D$-agent distributed optimization problem, where each agent $i\in\mathcal{V}\coloneqq\{1,\ldots,D\}$ has to choose values for a local variable $x_{i}\in\mathbb{R}^{d_{i}}$ to minimize a global objective function $F$. The global optimization variable $x=(x_{1},\ldots,x_{D})\in\mathbb{R}^{d}$ is the concatenation of the local optimization variables $x_{i}$ associated with the local agents. The objective function is assumed to be stochastic and given by

F(x)\coloneqq\mathbb{E}_{\xi}\left[f(x,\xi)\right], (3.1)

with $f:\mathbb{R}^{d}\times\mathcal{S}\to\mathbb{R}$ a random real-valued function, where the randomness is modeled by an $\mathcal{S}$-valued random variable $\xi$ that represents noise or uncertainty.

As discussed in the introduction, if a central agent had direct control of the optimization vector $x$, it would be straightforward to find a local minimum of (3.1) using stochastic gradient descent (SGD) under suitable assumptions [7, Ch. 10]. However, as the components $x_{i}$ of $x$ are associated with distributed agents, we consider that the agents need to coordinate their choice of the local optimization variables by exchanging information via a communication network.

We assume a synchronized communication setting according to a global clock $n\in\mathbb{N}_{0}$. Each agent updates its local variable at every time step $n$ based on a local gradient descent iteration. The iterations will be defined in Section 3.1. For each agent $i\in\mathcal{V}$, the local iterations generate a sequence $\{x_{i}^{n}\}^{\infty}_{n=0}$ starting from an initial candidate value $x^{0}_{i}$ for the optimal value $x^{*}_{i}$. To execute the local gradient iteration, agent $i$ requires a locally available estimate of the current optimization variable $x^{n}_{j}$ of agent $j$ for all $j\not=i$. We consider that this information has to be communicated using a communication network. Specifically, every agent will use the newest available local optimization variable from every other agent to update its own local variable. Due to the potential uncertainty of the network, only aged/delayed versions of the local variables of the other agents are available at agent $i$ at any time step. Therefore, agent $i$ only has access to the delayed version

\hat{X}_{i}^{n}\coloneqq\left(x_{1}^{n-\tau_{i1}(n)},\ldots,x_{i}^{n},\ldots,x_{D}^{n-\tau_{iD}(n)}\right) (3.2)

of $x^{n}$ at every time step $n$. Here, $x_{j}^{n-\tau_{ij}(n)}$ denotes the newest update of $x_{j}$ available at agent $i$ at time $n$ and the $\tau_{ij}(n)$'s are the AoI random variables. Further, we refer to $\hat{X}_{i}^{n}$ as the local belief vector of agent $i$ at time $n$. As the next step, we describe the gradient-based iteration that uses $\hat{X}_{i}^{n}$ instead of $x^{n}$ to solve problem (3.1).

3.1 Algorithm

We consider that the agents iteratively refine their local variables using the partial derivatives $\nabla_{x_{j}}f(\cdot,\xi)$. We assume that the agents do not know the distribution of $\xi$, but during any time slot $n$ an agent can observe an i.i.d. realisation $\xi^{n}$ of $\xi$. For simplicity, we assume that all agents are affected by the same realisation of the random variable $\xi$. In other words, when agent $i$ and agent $j$ calculate their partial derivatives during some time slot $n$, they use the same realisation $\xi^{n}$ of $\xi$, i.e. $\nabla_{x_{i}}f(\cdot,\xi^{n})$ and $\nabla_{x_{j}}f(\cdot,\xi^{n})$. The extension to agent-specific realisations of $\xi$ is merely a technical reformulation that was already described in [24].

To evaluate the partial derivative $\nabla_{x_{i}}f(\cdot,\xi^{n})$, agent $i$ uses the most recent available version of the optimization variable $x^{n}_{j}$ of agent $j$ for all $j\not=i$, i.e. it calculates $\nabla_{x_{i}}f(\hat{X}^{n}_{i},\xi^{n})$ instead of $\nabla_{x_{i}}f(x^{n},\xi^{n})$. The following SGD iteration is used by each agent to update its local variable:

x^{n+1}_{i}=x^{n}_{i}-a(n)\left(\nabla_{x_{i}}f(\hat{X}^{n}_{i},\xi^{n})+\lambda_{i}^{n}\right), (3.3)

where $\{a(n)\}_{n\in\mathbb{N}_{0}}$ is a given step-size sequence and $\lambda_{i}^{n}$ is a local stochastic additive error that may arise during the calculation of $\nabla_{x_{i}}f$. Algorithm 1 summarizes the protocol that runs on every agent locally. For now, we assume that the agents use some communication protocol to exchange their local belief vectors $\hat{X}^{n}_{i}$ over a network. The protocol and the network properties therefore induce the distribution of the AoI variables $\tau_{ij}(n)$. In the next section, we will prove the convergence of Algorithm 1 under an abstract growth condition for the AoI variables. Section 5 then formulates a communication network model and associated assumptions to satisfy this growth condition.

Remark 1.

In our previous work [25], we also included asynchronous gradient updates in (3.3). The agents are then not required to update their local variables at every time step $n\geq 0$. This may be included here using the associated assumptions from [25]. Our previous work considers (3.3) for a restrictive network model with independent communication (see Section 5.4 for further details). This work resolves this issue, but we use synchronous gradient updates to avoid notational overload.

1 Initialize local optimization variable estimate $x^{0}_{i}$ ;
2 Initialize local belief vector $\hat{X}^{0}_{i}$ ;
3 for all time steps $n$ do
4       Obtain sample $\xi^{n}$ ;
5       $x^{n+1}_{i}\leftarrow x^{n}_{i}-a(n)\nabla_{x_{i}}f(\hat{X}^{n}_{i},\xi^{n})$ ;
6       Update the $i$-th component of $\hat{X}^{n}_{i}$ to the new $x^{n+1}_{i}$ with appended timestamp $n+1$ ;
7       Run local communication protocol to exchange $\hat{X}^{n}_{i}$ ;
8 end for
Algorithm 1 Local algorithm at agent $i\in\mathcal{V}$
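To make the interplay between the local iterations and the aged belief vectors concrete, the following self-contained Python sketch simulates Algorithm 1 for a toy coupled quadratic objective on a small fully connected network with independent Bernoulli link events. The objective, the link success probability and all variable names are illustrative choices of ours and are not prescribed by the paper:

import numpy as np

rng = np.random.default_rng(0)
D, T = 3, 5000                        # three agents, T time steps
a = lambda n: 1.0 / (n + 1)           # step sizes with a(n) in O(1/n) (the p = 1 case)

# Toy coupled objective: f(x, xi) = 0.5 * (x_1 + ... + x_D - xi)^2 with scalar xi,
# so F(x) = E[f(x, xi)] is minimised whenever x_1 + ... + x_D = E[xi] = 0.
grad_i = lambda X_hat, xi: np.sum(X_hat) - xi   # partial derivative w.r.t. any x_i

x = np.zeros((D, D))                  # row i holds agent i's belief vector X_hat_i^n
stamp = np.zeros((D, D), dtype=int)   # timestamps attached to the belief entries

for n in range(T):
    xi_n = rng.standard_normal()                  # common sample xi^n of the uncertainty
    for i in range(D):                            # local update (3.3) without additive error
        x[i, i] -= a(n) * grad_i(x[i], xi_n)
        stamp[i, i] = n + 1
    for i in range(D):                            # unreliable exchange of belief vectors
        for j in range(D):
            if i != j and rng.random() < 0.3:     # edge event A_ij^n (independent links here)
                fresher = stamp[i] > stamp[j]     # receiver keeps only fresher entries
                x[j, fresher] = x[i, fresher]
                stamp[j, fresher] = stamp[i, fresher]

print(round(float(np.diag(x).sum()), 3))          # x_1^T + ... + x_D^T, approximately 0

Replacing the Bernoulli links by correlated on/off channels (for example the Markov-modulated links sketched in Section 5.3) leaves the algorithm itself unchanged; only the distribution of the AoI variables $\tau_{ij}(n)$ changes.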

4 Asymptotic convergence of Algorithm 1

In this section, we will show the asymptotic convergence of Algorithm 1. Specifically, we show that the iterations in (3.3) converge to a neighbourhood of a local stationary point of (3.1). The main part of the proof is to show that the gradient errors

e^{n}_{i}\coloneqq\nabla_{x}F(x_{1}^{n-\tau_{i1}(n)},\ldots,x_{D}^{n-\tau_{iD}(n)},\xi^{n})-\nabla_{x}F(x_{1}^{n},\ldots,x_{D}^{n},\xi^{n}) (4.1)

due to AoI vanish asymptotically. This error captures the difference between the gradient descent step some agent $i$ would take given its local belief vector and the one it would take given the true global state.

To show that the gradient errors vanish, we require that the AoI variables $\tau_{ij}(n)$ satisfy an asymptotic growth condition. Observe that the gradient errors depend on the AoI variables and the step-size sequence $a(n)$, since $a(n)$ determines how much successive steps of iteration (3.3) differ. If the step-size sequence decreases quickly enough relative to some maximal potential growth of the AoI variables, we expect $e^{n}_{i}$ to decay to zero. This is because even significantly outdated information stays relevant, if the steps taken during that time were comparatively small. The convergence of Algorithm 1 will then follow from the convergence of (3.3) when one considers no AoI, i.e. the case $\tau_{ij}(n)=0$.

The following assumption formalizes the required trade-off between the choice of the step-size sequence and the required network quality.

Assumption 1.
  1. 1.

    There exists $p\in[1,2)$ and a non-negative integer-valued random variable $\overline{\tau}$, such that $\overline{\tau}$ stochastically dominates (Definition 1) all $\tau_{ij}(n)$ for all $i,j\in\mathcal{V}$ and all $n\in\mathbb{N}_{0}$ with

    \mathbb{E}\left[\overline{\tau}^{p}\right]<\infty.
  2. 2.

    The step-size sequence $\{a(n)\}_{n\in\mathbb{N}_{0}}$ satisfies:

    1. (i)

      $\sum\limits_{n=0}^{\infty}a(n)=\infty$, $\sum\limits_{n=0}^{\infty}a(n)^{2}<\infty$.

    2. (ii)

      $a(n)\in\mathcal{O}(n^{-\frac{1}{p}})$ with $p$ as in 1.

Assumption 1.1 requires that the network quality is good enough that the tail distributions of the AoI variables $\tau_{ij}(n)$ decay rapidly enough for at least a dominating random variable with finite mean to exist. This assumption contributes a significant weakening of the traditional assumptions required for convergence in the present setting. The traditional assumptions formulated in [6] required at least a dominating random variable with finite $p$-th moment for $p>1$. In this work, we show for the first time that actually $p=1$ is sufficient to achieve asymptotic convergence. We show that under Assumption 1.1 the growth of each $\tau_{ij}(n)$ cannot exceed any fraction of $n$ after some potentially large time step. We formulate this in Lemma 1.

Assumption 1.2(i) is standard in the stochastic approximation literature. Assumption 1.2(ii) requires that we choose the step size depending on the network quality. For example, if only the worst network quality can be verified, i.e. that there is only a dominating variable with finite mean, then we have to choose $a(n)\in\mathcal{O}(\frac{1}{n})$. In addition to the aforementioned weakening of assumptions, we also do not require that the step-size sequence is eventually monotonically decreasing and we only require $a(n)\in\mathcal{O}(n^{-\frac{1}{p}})$ instead of $a(n)\in o(n^{-\frac{1}{p}})$. Both conditions were traditionally assumed.
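As a concrete admissible choice, $a(n)=(n+1)^{-\frac{1}{p}}$ with $p\in[1,2)$ satisfies all of Assumption 1.2: $\sum_{n}(n+1)^{-\frac{1}{p}}=\infty$ since $\frac{1}{p}\leq 1$, $\sum_{n}(n+1)^{-\frac{2}{p}}<\infty$ since $\frac{2}{p}>1$, and trivially $a(n)\in\mathcal{O}(n^{-\frac{1}{p}})$. For $p=1$ this is the classical choice $a(n)=\frac{1}{n+1}$.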

We will now present additional assumptions associated with the objective function $f$ in (3.1) and the iterations in (3.3). After that, we show the convergence of Algorithm 1. In Section 5 we will then present verifiable network conditions that ensure that Assumption 1.1 holds. We will also see that it is easy to formulate very restrictive network conditions, such that the growth of the AoI variables behaves very well. For example, one can show that all moments of the AoI variables are bounded under the standard assumptions in the distributed optimization literature (see Section 5.4). That is, Assumption 1.1 would be satisfied for all $p\geq 1$.

In addition to Assumption 1, we require the following assumptions.

Assumption 2.
  1. 1.

    $\nabla_{x}f$ is continuous and locally Lipschitz continuous in the $x$-coordinate, where the associated constant may depend on $\xi$.

  2. 2.

    $\mathbb{E}\left[\nabla_{x}f\right]=\nabla_{x}\mathbb{E}\left[f\right]$.

  3. 3.

    $\xi$ is an $\mathcal{S}$-valued random variable, where $\mathcal{S}$ is a one-point compactificable space.

Assumption 3.

For all $i\in\mathcal{V}$, we have $\sup\limits_{n\in\mathbb{N}_{0}}\lVert x_{i}^{n}\rVert<\infty$ a.s.

Assumption 4.

Almost surely, $\limsup\limits_{n\to\infty}\lVert\lambda^{n}\rVert\leq\lambda$ for some fixed $\lambda>0$.

We refer to [25] for a detailed discussion of the verifiability of Assumptions 2, 3 and 4.

Recall the gradient errors due to the AoI variables in (4.1). Next, we will show that these gradient errors vanish asymptotically. We start with an asymptotic growth property for the AoI variables under Assumption 1.1.

Lemma 1.

Under Assumption 1.1, we have for every $\varepsilon\in(0,1)$ and for all $i,j\in\mathcal{V}$ that

\sum_{n=0}^{\infty}\mathbb{P}\left(\tau_{ij}(n)>\varepsilon n^{\frac{1}{p}}\right)<\infty. (4.2)
Proof.

Fix $\varepsilon\in(0,1)$. By Assumption 1 there is a non-negative integer-valued random variable $\overline{\tau}$, such that

\mathbb{P}\left(\tau_{ij}(n)>\varepsilon n^{\frac{1}{p}}\right)\leq\mathbb{P}\left(\overline{\tau}>\varepsilon n^{\frac{1}{p}}\right) (4.3)

for all $n\in\mathbb{N}_{0}$ and $\mathbb{E}\left[\overline{\tau}^{p}\right]<\infty$. Hence, we have

\sum_{n=0}^{\infty}\mathbb{P}\left(\tau_{ij}(n)>\varepsilon n^{\frac{1}{p}}\right)\leq\sum_{n=0}^{\infty}\mathbb{P}\left(\overline{\tau}>\varepsilon n^{\frac{1}{p}}\right) (4.4)
=\sum_{m=0}^{\infty}\sum_{n\in\mathcal{N}(m)}\mathbb{P}\left(\overline{\tau}>\varepsilon n^{\frac{1}{p}}\right) (4.5)
\leq\sum_{m=0}^{\infty}\sum_{n\in\mathcal{N}(m)}\mathbb{P}\left(\overline{\tau}>m\right)=\sum_{m=0}^{\infty}\lvert\mathcal{N}(m)\rvert\mathbb{P}\left(\overline{\tau}>m\right), (4.6)

where the sets $\mathcal{N}(m)$ are defined as

\mathcal{N}(m)\coloneqq\{n\in\mathbb{N}_{0}:m\leq\varepsilon n^{\frac{1}{p}}<m+1\}=\{n\in\mathbb{N}_{0}:m^{p}/\varepsilon^{p}\leq n<(m+1)^{p}/\varepsilon^{p}\} (4.7)

for every $m\in\mathbb{N}_{0}$. We use these sets to group all $n$ for which $\varepsilon n^{\frac{1}{p}}$ lies in the interval $[m,m+1)$. The second inequality then follows from the monotonicity of the cumulative distribution function (CDF) by the definition of the sets $\mathcal{N}(m)$. Since $\lvert\mathcal{N}(m)\rvert\leq\frac{1}{\varepsilon^{p}}\left((m+1)^{p}-m^{p}\right)$, we have therefore shown that

\sum_{n=0}^{\infty}\mathbb{P}\left(\tau_{ij}(n)>\varepsilon n^{\frac{1}{p}}\right)\leq\frac{1}{\varepsilon^{p}}\sum_{n=0}^{\infty}\left((n+1)^{p}-n^{p}\right)\mathbb{P}\left(\overline{\tau}>n\right)=\frac{1}{\varepsilon^{p}}\mathbb{E}\left[\overline{\tau}^{p}\right]<\infty. (4.8)

The last equality follows from Proposition 1, since $\overline{\tau}$ is a non-negative integer-valued random variable. ∎

We are now ready to prove that the gradient errors due to AoI vanish asymptotically.

Lemma 2.

Under Assumptions 2, 3 and 1, we have that $\lim\limits_{n\to\infty}\lVert e^{n}_{i}\rVert=0$.

Proof.

By Assumption 3, we have that $x^{n}\in\overline{B}_{R}(0)$ for some sample path dependent radius $0<R<\infty$. Then, [25, Lemma 1] shows that $\nabla_{x}F$ is locally Lipschitz continuous with a constant independent of $\xi$. Hence, $\nabla_{x}F$ is globally Lipschitz continuous with a constant $L$ when restricted to $\overline{B}_{R}(0)$. Using the triangle inequality, the established Lipschitz continuity of $\nabla_{x}F$ and Assumption 3, we have that

\lVert e^{n}_{i}\rVert\leq L\sum_{j\in\mathcal{V}}\sum\limits_{m=n-\tau_{ij}(n)}^{n-1}\lVert x_{j}^{m+1}-x_{j}^{m}\rVert\leq C\sum_{j\in\mathcal{V}}\sum\limits_{m=n-\tau_{ij}(n)}^{n-1}a(m), (4.9)

for a sample path dependent constant $C>0$. We will now show that

\lim\limits_{n\to\infty}\left(\sum\limits_{m=n-\tau_{ij}(n)}^{n-1}a(m)\right)=0, (4.10)

which will imply that $\lim\limits_{n\to\infty}\lVert e^{n}_{i}\rVert=0$.

By Assumption 1, $a(n)\in\mathcal{O}(n^{-\frac{1}{p}})$. Hence, there are constants $c>0$ and $N\in\mathbb{N}$, such that

a(n)\leq cn^{-\frac{1}{p}}\text{ for all }n\geq N. (4.11)

Also by Assumption 1, there is some $\overline{\tau}$ that stochastically dominates all $\tau_{ij}(n)$, with $\mathbb{E}\left[\overline{\tau}^{p}\right]<\infty$. Now fix $\varepsilon\in(0,1)$. By Lemma 1 we have that

\sum_{n=0}^{\infty}\mathbb{P}\left(\tau_{ij}(n)>\varepsilon n^{\frac{1}{p}}\right)<\infty. (4.12)

It now follows from the Borel-Cantelli Lemma that $\mathbb{P}\left(\tau_{ij}(n)>\varepsilon n^{\frac{1}{p}}\text{ i.o. }\right)=0$. Hence, there is a sample path dependent $N(\varepsilon)\in\mathbb{N}$, such that

\tau_{ij}(n)\leq\varepsilon n^{\frac{1}{p}}\qquad\forall\,n\geq N(\varepsilon). (4.13)

Equations 4.11 and 4.13 therefore yield that

\sum\limits_{m=n-\tau_{ij}(n)}^{n-1}a(m)\leq c\sum\limits_{m=n-\varepsilon n^{\frac{1}{p}}}^{n-1}m^{-\frac{1}{p}} (4.14)

for all $n$ with $n\geq N(\varepsilon)$ and $n-\varepsilon n^{\frac{1}{p}}\geq N$. Finally, using the monotonicity of $n^{-\frac{1}{p}}$, we have

\sum\limits_{m=n-\varepsilon n^{\frac{1}{p}}}^{n-1}m^{-\frac{1}{p}}\leq\varepsilon n^{\frac{1}{p}}(n-\varepsilon n^{\frac{1}{p}})^{-\frac{1}{p}}=\varepsilon(1-\varepsilon n^{\frac{1}{p}-1})^{-\frac{1}{p}}\xrightarrow[n\to\infty]{}\begin{cases}\varepsilon&\quad p\in(1,2),\\ \frac{\varepsilon}{1-\varepsilon}&\quad p=1.\end{cases} (4.15)

Hence,

\limsup\limits_{n\rightarrow\infty}\left(\sum\limits_{m=n-\tau_{ij}(n)}^{n-1}a(m)\right)\leq\frac{c\varepsilon}{1-\varepsilon} (4.16)

and (4.10) follows, since the choice of $\varepsilon$ is arbitrary. ∎

In [25, Theorem 1] we proved the convergence of Algorithm 1 for $\tau_{ij}(n)=0$ for all $n\in\mathbb{N}_{0}$. The following theorem is now an immediate consequence of this result and Lemma 2.

Theorem 1.

Under Assumptions 2, 3, 1 and 4, we have that Algorithm 1 converges almost surely to a $\lambda$-neighbourhood of the set of stationary points of $F$, where $\lambda$ is the almost sure bound of the additive errors according to Assumption 4.

5 A new set of network conditions for distributed optimization

In the previous section, we presented a convergence proof for Algorithm 1 under the network assumption Assumption 1.1. This assumption directly requires that some $p$-th moment of all AoI variables is bounded. However, the distribution of the AoI variables will typically be the consequence of direct agent-to-agent communication. We are therefore interested in more concrete conditions on the network and the agent communication that imply the required AoI moment conditions. To achieve this, this section introduces a network model and associated assumptions to verify Assumption 1.1.

5.1 Network model

Recall that Algorithm 1 requires that the agents exchange their local variables $x_{i}^{n}$ over a network. The network and an associated communication protocol should allow the local variables $x_{i}^{n}$ to frequently spread across the network and reach every agent. We will now introduce a network model where the agents try to exchange their local belief vectors $\hat{X}_{i}^{n}$. The agents therefore try to share their latest available version of every other agent's local variable with other agents. This might potentially flood the network with data; however, there are well-known protocols to reduce the number of possibly redundant transmissions [17].

We assume a time-varying network (Definition 5)

\{(\mathcal{V},\mathcal{E}^{n})\}_{n\in\mathbb{N}_{0}}, (5.1)

which is a sequence of directed graphs. Each agent is in one-to-one correspondence with one node in the graph. For every time step $n\in\mathbb{N}_{0}$, an edge $(i,j)\in\mathcal{E}^{n}$ represents the event that agent $i$ successfully exchanges its local belief vector $\hat{X}_{i}^{n}$ during time slot $n$ with agent $j$. We denote this event by $A^{n}_{ij}$. Therefore, the sequence of directed graphs and the sequences of events $\{A^{n}_{ij}\}_{n\in\mathbb{N}_{0}}$ are in one-to-one correspondence: an edge $(i,j)\in\mathcal{E}^{n}$ exists if and only if the event $A^{n}_{ij}$ occurs. An edge therefore does not represent the possibility of communication, but the actual event of communication.

One may add additional complexity to the model, e.g. using a graph that represents the possibility of communication. Additionally, the model may be extended to scenarios where multiple successive events $A_{ij}^{n}$ need to occur to guarantee the exchange of a single realization of a belief vector $\hat{X}^{n}_{i}$. This might be necessary if the dimension of $\hat{X}^{n}_{i}$ is very large and/or the network bandwidth is small.

Note that although we defined the events $A^{n}_{ij}$ for all $(i,j)\in\mathcal{V}\times\mathcal{V}$, some of those events might never occur over the whole time horizon. In particular, we do not require that all agents communicate directly. However, at least some of the events $A^{n}_{ij}$ should occur "frequently" enough such that the time-varying network satisfies certain connectivity properties. This will be formulated in Section 5.2 with Assumption 5.

The formulation of the time-varying communication network using the edge events $A^{n}_{ij}$ has several advantages. The model allows for an underlying time-varying graph that may be the consequence of a network scheduling algorithm or the physical dynamics of the agents themselves. Each event $A_{ij}^{n}$ can be represented as a multistage process, for example: (i) the availability of a channel, (ii) the use of an access protocol given the availability of a channel, (iii) the success of the transmission given the successful channel access. In general, the event-based formulation appears to be very convenient for analysis.

In the next two subsections, we will formulate our assumptions for the time-varying network $\{(\mathcal{V},\mathcal{E}^{n})\}_{n\in\mathbb{N}_{0}}$ using the events $A_{ij}^{n}$.

5.2 Stochastic strong connectedness

The following assumption formalizes our required network connectivity property.

Assumption 5 (Network connectivity assumption).

We assume that the time-varying network is $(\varepsilon,\kappa)$-stochastically strongly connected (SSC) (Definition 6) for some $\varepsilon\in(0,1)$ and some $\kappa\in\mathbb{N}_{0}$.

Using the events $A^{n}_{ij}$, an $(\varepsilon,\kappa)$-SSC network requires that there exists a strongly connected graph $(\mathcal{V},\mathcal{E})$, such that for all $n\in\mathbb{N}_{0}$ and for all $(i,j)\in\mathcal{E}$, we have

\mathbb{P}\left(\bigcup_{k=n}^{n+\kappa}A^{k}_{ij}\right)\geq\varepsilon. (5.2)

An $(\varepsilon,\kappa)$-SSC network therefore requires that there are some pairs of agents $(i,j)\in\mathcal{E}$ that can communicate directly at least once in every time interval of the form $[n,n+\kappa]$ with probability at least $\varepsilon$. Notice that SSC does not require direct communication between every pair of agents. The only agents that need to communicate are those given in the set $\mathcal{E}$. An SSC network reflects our intuition of a non-degenerate communication network: some agents can "frequently" exchange information with positive probability, and information can spread across the network since $\mathcal{E}$ is strongly connected.

Note that a network that is SSC does not guarantee periodic transmissions. We will see shortly that SSC is significantly weaker than plain guaranteed periodic communication. With stochastic strong connectivity alone we cannot draw any conclusions about the dependency of events in the network. On the other hand, assuming guaranteed periodic communication does imply a strong form of dependency decay, as shown in Section 5.4. Recall that our objective is to verify Assumption 1.1. However, SSC alone is not sufficient to even guarantee the existence of a dominating random variable as required in Assumption 1.1. The next subsection therefore formulates dependency decay conditions using strong mixing (Definition 2).

5.3 Network dependency decay

Recall that our time-varying network is given by a sequence of directed graphs $\{(\mathcal{V},\mathcal{E}^{n})\}_{n\in\mathbb{N}_{0}}$. The sequence is in one-to-one correspondence with the events $A^{n}_{ij}$ that represent the presence of an edge at time $n$. We will now formulate a dependency decay assumption based on the notion of strongly mixing processes. We can then show that the AoI variables $\tau_{ij}(n)$ associated with an $(\varepsilon,\kappa)$-SSC network satisfy specific moment conditions depending on the assumed rate at which dependency decays in the network.

Assumption 6 (Dependency decay assumption).

We assume that the time-varying network is such that there is some $\eta\geq 0$ such that each process $\mathbbm{1}_{\bigcup_{k=n}^{n+\eta}A^{k}_{ij}}$ is $p$-strongly mixing (Definition 3) for some $p\in[1,2)$.

With this assumption, we do not require that the dependency of subsequent events $A_{ij}^{n}$ decays at any specific rate. However, there should be an interval size $\eta$, such that the dependency of subsequent union events $\bigcup_{k=n}^{n+\eta}A^{k}_{ij}$ decays sufficiently fast. Notice that Assumption 6 is a dependency decay assumption for the network processes $\mathbbm{1}_{\bigcup_{k=n}^{n+\eta}A^{k}_{ij}}$ associated with all network edges $(i,j)\in\mathcal{V}\times\mathcal{V}$. However, we actually only require the assumption for those edges $(i,j)\in\mathcal{E}$ in an edge set $\mathcal{E}$ according to Assumption 5. Additionally, notice that we do not require any form of independence or dependency decay between transmissions over different edges. The reason for this is Lemma 5. The lemma will show that the existence of a dominating random variable for the AoI variables is in a natural way a transitive property of the network.

In this work, we do not give a general recipe to verify Assumption 6. However, we will see in the next subsection that the standard assumptions in the distributed optimization literature all imply Assumption 6. Another set of examples where Assumption 6 is directly satisfied are scenarios where the network events $A^{n}_{ij}$ are driven by a geometrically ergodic Markov process [10, 8]. Of course, it can be comparatively difficult to verify this in practice. However, traditionally and also more recently, it has been quite common to model network fading channels by finite Markov chains [31, 4, 23, 19, 5]. We further discuss the verifiability of Assumption 6 in Section 7.
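To illustrate the Markov-modulated case, the following Python sketch simulates a single edge $(i,j)$ whose events $A^{n}_{ij}$ are driven by a two-state (Gilbert-Elliott-type) channel: an ergodic two-state Markov chain modulates the per-slot success probability, so the resulting indicator process inherits the fast dependency decay of the chain. All numerical values are illustrative and not taken from the paper:

import numpy as np

rng = np.random.default_rng(1)
T = 10_000

# Two-state Markov channel: state 0 = "good", state 1 = "bad" (illustrative values).
P = np.array([[0.95, 0.05],
              [0.20, 0.80]])          # state transition probabilities
p_success = np.array([0.9, 0.1])      # P(A_ij^n occurs | channel state)

state = 0
A_ij = np.zeros(T, dtype=bool)        # indicator process of the edge events A_ij^n
for n in range(T):
    A_ij[n] = rng.random() < p_success[state]
    state = rng.choice(2, p=P[state])

# Empirical AoI for direct communication over this edge:
# tau(n) = time elapsed since the last successful transmission up to time n.
tau = np.zeros(T, dtype=int)
last = -1
for n in range(T):
    if A_ij[n]:
        last = n
    tau[n] = n - last if last >= 0 else n

print("mean AoI:", tau.mean(), " max AoI:", tau.max())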

5.4 Comparison of Assumptions 5 and 6 to assumptions in the literature

In this subsection we show that the typical network assumptions in the literature imply Assumptions 5 and 6.

First, consider the network models in [25, 3, 16, 14]. It is easy to check that these network models imply the following properties:

  1. 1.

    There is a strongly connected graph $(\mathcal{V},\mathcal{E})$ and some $\varepsilon>0$, such that $\mathbb{P}\left(A_{ij}^{n}\right)>\varepsilon$ for all $n\in\mathbb{N}_{0}$ and for all $(i,j)\in\mathcal{E}$.

  2. 2.

    The events $A_{ij}^{n}$ are independent for different time steps or different edges.

Independence is particularly unrealistic for wireless communication systems, since transmissions that occur close in time, space, frequency or code can be highly correlated. Notably, these assumptions do not exhibit any trade-off between the choice of the step-size sequence $a(n)$ and some network-related property. Hence, there is no trade-off between the growth of the AoI variables and the choice of the step-size sequence. In fact, it is easy to show that under these assumptions all moments of all $\tau_{ij}(n)$ are bounded, see Example 1 in Section 6.1.

We can now show that the above properties imply Assumptions 5 and 6. Assumption 5 is directly satisfied for $\kappa=1$. Define the $\sigma$-algebras

\mathcal{F}_{l}^{m}\coloneqq\sigma\left(A^{n}_{ij}\mid l\leq n\leq m;i,j\in\mathcal{V}\right). (5.3)

Then Assumption 6 holds trivially, since the independence of the events $A_{ij}^{n}$ implies that

\lvert\mathbb{P}\left(A\cap B\right)-\mathbb{P}\left(A\right)\mathbb{P}\left(B\right)\rvert=0 (5.4)

for $A\in\mathcal{F}_{0}^{l}$ and $B\in\mathcal{F}_{l+n}^{\infty}$ for all $l,n\in\mathbb{N}$. Hence, the mixing coefficients $\alpha_{ij}(n)$ for each process $A_{ij}^{n}$ satisfy $\alpha_{ij}(n)=0$ for every $n\geq 0$.

Second, consider the time-varying networks in [33, 22, 32, 28, 1, 34, 15]. The authors assume that their network is $M$-strongly connected. Hence, they assume guaranteed periodic communication. Assumption 5 is therefore directly satisfied by choosing $\kappa=M$. Then $\mathbb{P}\left(\bigcup_{k=n}^{n+\kappa}A^{k}_{ij}\right)=1$ for all $n\in\mathbb{N}_{0}$. Assumption 6 is also directly satisfied by choosing $\eta=M$. To see this, fix any $n,m\geq 0$ with $m\not=n$. Then

\mathbb{P}\left(\left(\bigcup_{k=n}^{n+\eta}A^{k}_{ij}\right)\cap\left(\bigcup_{k=m}^{m+\eta}A^{k}_{ij}\right)\right)=1, (5.5)

since the intersection of almost sure events is an almost sure event. Therefore,

\mathbb{P}\left(\left(\bigcup_{k=n}^{n+\eta}A^{k}_{ij}\right)\cap\left(\bigcup_{k=m}^{m+\eta}A^{k}_{ij}\right)\right)-\mathbb{P}\left(\bigcup_{k=n}^{n+\eta}A^{k}_{ij}\right)\mathbb{P}\left(\bigcup_{k=m}^{m+\eta}A^{k}_{ij}\right)=0 (5.6)

and Assumption 6 follows.

We have therefore shown that the network models in the literature satisfy Assumptions 5 and 6. Moreover, Assumptions 5 and 6 are significantly weaker, since they do not require independent communication or guaranteed periodic communication, but merely asymptotic independence.

6 Stochastic dominance properties of AoI for Time-Varying Networks

In this section, we show that Assumptions 5 and 6 imply Assumption 1.1. Recall that the AoI variables $\tau_{ij}(n)$, as defined in Section 3, are now a consequence of the network model formulated in Section 5.1. Each agent tries to send its local belief vector $\hat{X}_{i}^{n}$ (Equation 3.2) to some other agents. A successful transmission to some other agent $j$ is represented by an edge $(i,j)\in\mathcal{E}^{n}$ of the time-varying network $\{(\mathcal{V},\mathcal{E}^{n})\}_{n\in\mathbb{N}_{0}}$, or equivalently by the event $A_{ij}^{n}$.

Recall that Assumption 1.1 requires finite moment properties of a random variable that stochastically dominates (Definition 1) all $\tau_{ij}(n)$. The following definition will be useful to formulate our main result and the subsequent proof.

Definition 7.

We say an AoI variable $\tau_{ij}(n)$ is stochastically dominated with finite $\boldsymbol{p}$-th moment for some $p\geq 0$, if there exists a non-negative integer-valued random variable $\overline{\tau}$ that stochastically dominates $\tau_{ij}(n)$ for all $n\in\mathbb{N}_{0}$ with $\mathbb{E}\left[\overline{\tau}^{p}\right]<\infty$.

The following theorem formulates the main result of this section.

Theorem 2.

Let $\{(\mathcal{V},\mathcal{E}^{n})\}_{n\in\mathbb{N}_{0}}$ be a time-varying network that is $(\varepsilon,\kappa)$-SSC (Definition 6) with associated strongly connected graph $(\mathcal{V},\mathcal{E})$. If for each $(i,j)\in\mathcal{E}$ there is some $\eta\in\mathbb{N}_{0}$, such that the process $\mathbbm{1}_{\bigcup_{k=n}^{n+\eta}A^{k}_{ij}}$ is $p$-strongly mixing (Definition 3) for some $p\geq 0$, then all AoI variables $\tau_{ij}(n)$ are stochastically dominated by a single random variable with finite $p$-th moment.

Stochastic dominance with finite $0$-th moment corresponds to the mere existence of a dominating random variable without any moment condition. Theorem 2 shows a more general result than is required for the convergence of Algorithm 1, since it is shown for all $p\in[0,\infty)$. The following corollary is now immediate and requires Theorem 2 only for $p\in[1,2)$.

Corollary 1.

Under Assumptions 2, 3, 4, 5 and 6, we have that Algorithm 1 converges almost surely to a $\lambda$-neighbourhood of the set of stationary points of $F$, where $\lambda$ is the almost sure bound of the additive errors according to Assumption 4.

Proof.

Under Assumptions 5 and 6, it follows from Theorem 2 that Assumption 1.1 holds for some $p\in[1,2)$. We can then choose a step-size sequence $a(n)$ that is not summable, but square summable with $a(n)\in\mathcal{O}(n^{-\frac{1}{p}})$, and therefore also satisfies Assumption 1.2. The requirements of Theorem 1 are therefore satisfied and the statement of the corollary follows. ∎

The rest of this section is devoted to the proof of Theorem 2. We begin by describing a general construction/recipe to establish the stochastic dominance properties for AoI variables of time-varying networks. In addition, we illustrate the recipe for the scenario where the edge events $A^{n}_{ij}$ are independent. Afterwards, we give the proof of Theorem 2. Before proceeding, we show a preliminary property of the AoI variables for an $(\varepsilon,\kappa)$-SSC network.

Lemma 3.

Let $\{(\mathcal{V},\mathcal{E}^{n})\}_{n\in\mathbb{N}_{0}}$ be a time-varying network that is $(\varepsilon,\kappa)$-SSC with associated strongly connected graph $(\mathcal{V},\mathcal{E})$. Then for all $(i,j)\in\mathcal{E}$ we have

\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq 1-\varepsilon,\qquad\forall\,m,n\geq\kappa. (6.1)
Proof.

First, we have $\mathbb{P}\left(\tau_{ij}(n)>m\right)=0$ for $m\geq n$, since $\tau_{ij}(n)\leq n$. We therefore concentrate on $m<n$. Fix $(i,j)\in\mathcal{E}$, i.e. $i$ and $j$ are agents that can communicate directly. Observe that successful direct communication from $i$ to $j$ during the time interval $[n-m+1,n]$ implies that the AoI at time $n$ is at most $m$. In other words, we have the inclusion

\{\bigcup_{l=n-m+1}^{n}A^{l}_{ij}\}\subset\{\tau_{ij}(n)\leq m\}. (6.2)

Since the network is $(\varepsilon,\kappa)$-SSC, we have that

\mathbb{P}\left(\tau_{ij}(n)\leq m\right)\geq\mathbb{P}\left(\bigcup_{l=n-m+1}^{n}A^{l}_{ij}\right)\geq\varepsilon,\qquad\forall\,n>m\geq\kappa. (6.3)

Taking the complementary event in the previous expression therefore concludes the proof of the lemma. ∎

6.1 A construction to establish stochastic dominance properties

We now describe a general construction to establish the stochastic dominance property with some finite $p$-th moment (Definition 7) for an AoI variable $\tau_{ij}(n)$. The idea is to find a uniform upper bound $u:\mathbb{N}_{0}\to\mathbb{R}_{\geq 0}$, such that

\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq u(m)

for all $m\geq N$ independent of $n\in\mathbb{N}_{0}$, for some $N\in\mathbb{N}_{0}$, and with $\lim\limits_{m\to\infty}u(m)=0$. We can now use this bound to define the CDF of a new random variable. Since $\lim\limits_{m\to\infty}u(m)=0$, there is some $M\in\mathbb{N}_{0}$, such that $u(m)\leq 1$ for all $m\geq M\geq N$. Now define a non-negative integer-valued random variable $\overline{\tau}_{ij}$ by describing its CDF (more precisely its complementary CDF) as follows:

\mathbb{P}\left(\overline{\tau}_{ij}>m\right)=1,\qquad 0\leq m<M, (6.4)
\mathbb{P}\left(\overline{\tau}_{ij}>m\right)=u(m),\qquad m\geq M. (6.5)

By definition, $\overline{\tau}_{ij}$ stochastically dominates all $\tau_{ij}(n)$ for all $n\in\mathbb{N}_{0}$. Moreover, if

\sum_{m=0}^{\infty}((m+1)^{p}-m^{p})u(m)<\infty

for some $p>0$, then it will follow from Proposition 1 that $\tau_{ij}(n)$ is stochastically dominated with finite $p$-th moment.

As the next step, we describe how we can find a function $u(m)$ for the above construction. Consider an $(\varepsilon,\kappa)$-SSC network. Let $(\mathcal{V},\mathcal{E})$ be the strongly connected graph associated with the $(\varepsilon,\kappa)$-SSC network and fix an edge $(i,j)\in\mathcal{E}$. Let $\Delta(m)$ be an increasing sequence in $\mathbb{N}$ with $\lim\limits_{m\to\infty}\Delta(m)=\infty$. Now for each $n,m\in\mathbb{N}_{0}$ use this sequence to define time indices

n_{1}\coloneqq n-m+\Delta(m),\qquad n_{k}\coloneqq n_{k-1}+2\Delta(m) (6.6)

as long as $n_{k}\leq n$. Let $L(m)$ be the number of constructed time indices and observe that

\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq\mathbb{P}\left(\bigcap_{k=1}^{L(m)}\{\tau_{ij}(n_{k})>\Delta(m)\}\right). (6.7)

This follows since $\tau_{ij}(n)>m$ implies $\tau_{ij}(n_{k})>\Delta(m)$ for all $k\in\{1,\ldots,L(m)\}$ by the very construction of the time indices $n_{k}$. In general, we can now derive $u(m)$ as an upper bound to the right-hand side of (6.7), which we illustrate immediately for the case of independent network communication, i.e. where the events $A_{ij}^{n}$ are independent. The case of dependent network communication will be treated in Lemma 4 in the next section.

Example 1 (Independent network communication).

Let $(\mathcal{V},\mathcal{E})$ be the strongly connected graph associated with an $(\varepsilon,\kappa)$-SSC network and consider an edge $(i,j)\in\mathcal{E}$. Using the exemplary network independence and Lemma 3, we have from (6.7) that

\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq\prod_{k=1}^{L(m)}\mathbb{P}\left(\tau_{ij}(n_{k})>\Delta(m)\right)\leq(1-\varepsilon)^{L(m)}, (6.8)

for all $m$ large enough such that $\Delta(m)\geq\kappa$. Now define

u(m)\coloneqq(1-\varepsilon)^{L(m)},\quad\Delta(m)\approx\sqrt{m}

and hence $L(m)\approx\sqrt{m}/2$. The construction described above then yields a dominating random variable $\overline{\tau}$ for all $\tau_{ij}(n)$, $n\in\mathbb{N}_{0}$. It is now easy to verify that

\mathbb{E}\left[\overline{\tau}^{p}\right]\leq\sum_{m=0}^{\infty}((m+1)^{p}-m^{p})u(m)\approx\sum_{m=0}^{\infty}((m+1)^{p}-m^{p})(1-\varepsilon)^{\sqrt{m}/2}<\infty

for all $p\geq 0$, since the series is a version of a weighted geometric series. We have therefore established that with independent communication, each AoI variable $\tau_{ij}(n)$ with $(i,j)\in\mathcal{E}$ is stochastically dominated with finite $p$-th moment for every $p\geq 0$. This underlines how strong the assumption of independent communication is.
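The finiteness of the moment bound in Example 1 is also easy to check numerically. The following Python sketch reproduces $L(m)$ for the choice $\Delta(m)=\lceil\sqrt{m}\rceil$ and evaluates truncations of the series $\sum_{m}((m+1)^{p}-m^{p})q^{L(m)}$ with $q=1-\varepsilon$; the values of $\varepsilon$ and $p$ are illustrative:

import math

eps, p = 0.3, 2.0                   # illustrative SSC probability and moment order
q = 1.0 - eps                       # bound on P(tau_ij(n_k) > Delta(m)) from Lemma 3

def L(m):
    # Number of time indices n_k from (6.6) that fit into [n - m, n] for Delta(m) = ceil(sqrt(m)).
    delta = math.ceil(math.sqrt(m))
    if delta == 0 or m < delta:
        return 0
    return (m - delta) // (2 * delta) + 1

def truncated_moment(M):
    # Partial sum of sum_m ((m+1)^p - m^p) * q^L(m), cf. the construction in Section 6.1.
    return sum(((m + 1) ** p - m ** p) * q ** L(m) for m in range(M))

for M in (10**3, 10**4, 10**5):
    print(M, round(truncated_moment(M), 2))   # the partial sums approach a finite limit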

6.2 Proof of Theorem 2

In the previous example, we used the independence of the edge events $A_{ij}^{n}$ to establish a uniform upper bound for $\mathbb{P}\left(\tau_{ij}(n)>m\right)$ with geometric decay. Recall that $\Delta(m)$ was used in (6.6) to define the time indices $n_{k}$, such that $n_{k}-\Delta(m)-n_{k-1}=\Delta(m)$. Now consider the case where the edge events are not independent but merely mixing. We will see that we can then find a new upper bound to (6.7), such that

\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq\mathbb{P}\left(\bigcap_{k=1}^{L(m)}\{\tau_{ij}(n_{k})>\Delta(m)\}\right)\leq(1-\varepsilon)^{L(m)}+\mathrm{error}(\Delta(m)), (6.9)

with an error term $\mathrm{error}(\Delta(m))$ due to the non-independence.

Now, if the mixing coefficients associated with the processes $\mathbbm{1}_{\bigcup_{k=n}^{n+\eta}A^{k}_{ij}}$ decay rapidly enough, we expect that $\mathrm{error}(\Delta(m))$ decays sufficiently fast, such that the new upper bound still satisfies some summability properties and hence allows us to establish stochastic dominance properties. The following lemma makes this intuition precise. It establishes the stochastic dominance property of order $p\geq 0$ for those network edges $(i,j)$ that ensure that the network is $(\varepsilon,\kappa)$-SSC.

Lemma 4.

Let \{(\mathcal{V},\mathcal{E}^{n})\}_{n\in\mathbb{N}_{0}} be a time-varying network that is (\varepsilon,\kappa)-SSC (Definition 6) with associated strongly connected graph (\mathcal{V},\mathcal{E}). If for an edge (i,j)\in\mathcal{E} the process \mathbbm{1}_{\bigcup_{k=n}^{n+\eta}A^{k}_{ij}} is p-strongly mixing (Definition 3) for some p\geq 0 and some \eta\in\mathbb{N}_{0}, then \tau_{ij}(n) is stochastically dominated with finite p-th moment (Definition 7).

Proof.

Fix an edge (i,j)\in\mathcal{E}. The idea of the proof is to establish an upper bound on the complementary CDF of \tau_{ij}(n) that is uniform in n, such that the construction from Section 6.1 yields the required dominating random variable.

Step 1 (Reduction to \eta=0): The p-strongly mixing property of the network guarantees mixing of the process \mathbbm{1}_{\bigcup_{k=n}^{n+\eta}A^{k}_{ij}} for some \eta\in\mathbb{N}_{0}. W.l.o.g. we can assume that \eta=0. This is justified as follows. Denote by \tau^{\eta}_{ij}(k) a new random variable that captures the time since the last interval of the form [m\eta,(m+1)\eta] with at least one successful transmission from i to j. The case \eta=0 then yields the conclusion of the lemma for \tau^{\eta}_{ij}(k), i.e. there will be a random variable \overline{\tau}^{\eta}_{ij} that stochastically dominates all \tau^{\eta}_{ij}(k) with \mathbb{E}\left[(\overline{\tau}^{\eta}_{ij})^{p}\right]<\infty. For any k\geq 0 and n\in\{k\eta,\ldots,(k+1)\eta\}, we have \tau_{ij}(n)\leq\eta(\tau^{\eta}_{ij}(k)+1). Therefore,

\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq\mathbb{P}\left(\eta(\tau^{\eta}_{ij}(k)+1)>m\right)\leq\mathbb{P}\left(\eta(\overline{\tau}^{\eta}_{ij}+1)>m\right) (6.10)

and \mathbb{E}\left[\eta^{p}(\overline{\tau}^{\eta}_{ij}+1)^{p}\right]<\infty by Minkowski's inequality. Hence, \eta(\overline{\tau}^{\eta}_{ij}+1) is the required dominating random variable for \tau_{ij}(n) and we may therefore assume \eta=0.

Step 2 (Initial CDF bound): Fix m0m\in\mathbb{N}_{0} and recall the definition of Δ(m)\Delta(m) and the associated sequence nkn_{k} for each n0n\in\mathbb{N}_{0} from Section 6.1. We have

(τij(n)>m)(k=1L(m){τij(nk)>Δ(m)}).\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq\mathbb{P}\left(\bigcap_{k=1}^{L(m)}\{\tau_{ij}(n_{k})>\Delta(m)\}\right). (6.7 recalled)

With a slight abuse of notation, we will from now on use \tau_{ij}(n) to denote the age of information associated with direct information exchange from i to j. By definition, this quantity stochastically dominates the actual AoI. Without this step we would technically require a stronger mixing condition, namely one for the events generated by all A_{kl}^{n} with (k,l)\in\mathcal{E}, and not only for the events generated by A_{ij}^{n} for the fixed pair (i,j). Note that Lemma 3 also holds directly for this case, since we used the direct information exchange to prove it anyway.

We will now establish an upper bound to (6.7 recalled) using that 𝟙Aijn\mathbbm{1}_{A^{n}_{ij}} is pp-strongly mixing. For this, define the following sub-σ\sigma-algebras generated by the events AijnA^{n}_{ij}:

lsσ(Aijnlns),l0,s0{}.\mathcal{F}_{l}^{s}\coloneqq\sigma\left(A^{n}_{ij}\mid l\leq n\leq s\right),\quad l\in\mathbb{N}_{0},s\in\mathbb{N}_{0}\cup\{\infty\}. (6.11)

The relevant generated events are whether the AoI variable at some time step s\in\mathbb{N}_{0} exceeds a threshold l\in\mathbb{N}_{0}, i.e. the events \{\tau_{ij}(s)>l\}. The event \{\tau_{ij}(s)>l\} occurs precisely when no successful direct transmission from i to j took place at the times s-l+1,\ldots,s-1,s, so it is generated by the events A^{k}_{ij} with k\in\{s-l+1,\ldots,s-1,s\}. We therefore have that

{τij(s)>l}sl+1s.\{\tau_{ij}(s)>l\}\in\mathcal{F}^{s}_{s-l+1}. (6.12)

For this, we required the reduction to the age of information associated with direct information exchange. It then follows from the definition of the time indices n_{k} that

{τij(nL(m))>Δ(m)}nL(m)Δ(m)+1nL(m)nL(m)Δ(m)\{\tau_{ij}(n_{L(m)})>\Delta(m)\}\in\mathcal{F}^{n_{L(m)}}_{n_{L(m)}-\Delta(m)+1}\subset\mathcal{F}^{\infty}_{n_{L(m)}-\Delta(m)} (6.13)

and

{τij(nk)>Δ(m)}nkΔ(m)+1nk0nL(m)1\{\tau_{ij}(n_{k})>\Delta(m)\}\in\mathcal{F}^{n_{k}}_{n_{k}-\Delta(m)+1}\subset\mathcal{F}^{n_{L(m)-1}}_{0} (6.14)

for every k{1,,L(m)1}k\in\{1,\ldots,L(m)-1\}. Hence,

k=1L(m)1{τij(nk)>Δ(m)}0nL(m)1.\bigcap_{k=1}^{L(m)-1}\{\tau_{ij}(n_{k})>\Delta(m)\}\in\mathcal{F}^{n_{L(m)-1}}_{0}. (6.15)

By construction of the indices n_{k}, we have n_{L(m)}-\Delta(m)-n_{L(m)-1}=\Delta(m). The strong mixing property of the process \mathbbm{1}_{A^{n}_{ij}} therefore implies that

(k=1L(m){τij(nk)>Δ(m)})({τij(nL(m))>Δ(m)})(k=1L(m)1{τij(nk)>Δ(m)})+α(Δ(m)),\begin{split}\mathbb{P}\left(\bigcap_{k=1}^{L(m)}\{\tau_{ij}(n_{k})>\Delta(m)\}\right)&\leq\mathbb{P}\left(\{\tau_{ij}(n_{L(m)})>\Delta(m)\}\right)\mathbb{P}\left(\bigcap_{k=1}^{L(m)-1}\{\tau_{ij}(n_{k})>\Delta(m)\}\right)\\ &\qquad+\alpha(\Delta(m)),\end{split} (6.16)

where α(n)\alpha(n) are the mixing coefficients associated with the process 𝟙Aijn\mathbbm{1}_{A^{n}_{ij}}. It now follows from Lemma 3 that (τij(nk)>Δ(m))<ε\mathbb{P}\left(\tau_{ij}(n_{k})>\Delta(m)\right)<\varepsilon for Δ(m)κ\Delta(m)\geq\kappa, since the network is (ε,κ)(\varepsilon,\kappa)-SSC. Hence,

(k=1L(m){τij(nk)>Δ(m)})ε(k=1L(m)1{τij(nk)>Δ(m)})+α(Δ(m)).\mathbb{P}\left(\bigcap_{k=1}^{L(m)}\{\tau_{ij}(n_{k})>\Delta(m)\}\right)\leq\varepsilon\mathbb{P}\left(\bigcap_{k=1}^{L(m)-1}\{\tau_{ij}(n_{k})>\Delta(m)\}\right)+\alpha(\Delta(m)). (6.17)

Applying (6.16) and (6.17) successively yields:

\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq\prod_{k=1}^{L(m)}\mathbb{P}\left(\{\tau_{ij}(n_{k})>\Delta(m)\}\right)+\sum_{k=1}^{L(m)-1}\varepsilon^{k-1}\alpha(\Delta(m)) (6.18)

\leq\varepsilon^{L(m)}+\frac{1}{1-\varepsilon}\alpha(\Delta(m)) (6.19)

for \Delta(m)\geq\kappa.
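In more detail (a short sketch; the abbreviation P_{k} is used only here), write P_{k}\coloneqq\mathbb{P}\left(\bigcap_{l=1}^{k}\{\tau_{ij}(n_{l})>\Delta(m)\}\right). The argument behind (6.16)-(6.17) applies at every level k, since the gap between n_{k}-\Delta(m) and n_{k-1} equals \Delta(m) by construction, so that

P_{k}\leq\varepsilon P_{k-1}+\alpha(\Delta(m)),\qquad P_{L(m)}\leq\varepsilon^{L(m)-1}P_{1}+\alpha(\Delta(m))\sum_{k=0}^{L(m)-2}\varepsilon^{k},

and P_{1}<\varepsilon by Lemma 3 together with \sum_{k=0}^{\infty}\varepsilon^{k}=\frac{1}{1-\varepsilon} yields (6.19).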

For p=0, we can now apply the construction presented in Section 6.1 with the bound (6.19) to obtain a dominating random variable; here we may choose \Delta(m) as in Example 1. For p>0 it is crucial to choose \Delta(m) such that both terms in (6.19) decay rapidly enough to obtain the required stochastic dominance property with finite p-th moment. However, it turns out that the bound (6.19) is only sufficient to achieve this for all q<p, since the first term decays merely geometrically in L(m). The next step therefore uses (6.19) to obtain a better upper bound for (6.18).

Step 3: To improve the CDF bound for p>0, we use the assumed summability of m^{p-1}\alpha(m). It then follows that for p>1 we have

α(m)𝒪(m(p1))\alpha(m)\in\mathcal{O}(m^{-(p-1)}) (6.20)

and for 0<p10<p\leq 1 we have

α(m)𝒪(mp),\alpha(m)\in\mathcal{O}(m^{-p}), (6.21)

since for this case m^{p-1}\alpha(m) is guaranteed to be non-increasing, as p-1\leq 0. Both cases show that there are a constant c>0 and some \tilde{\mu}>0, such that

\alpha(\Delta(m))\leq c(\Delta(m))^{-\tilde{\mu}} (6.22)

for sufficiently large mm. With (6.19) it then follows that

(τij(n)>m)εL(m)+c(Δ(m))μ~.\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq\varepsilon^{L(m)}+c(\Delta(m))^{-\tilde{\mu}}. (6.23)

for sufficiently large m. Since the first term decays faster than any polynomial in m (for a suitable choice of \Delta(m)) while the second decays only polynomially, we can find some \mu>0, such that

(τij(n)>m)<mμ\mathbb{P}\left(\tau_{ij}(n)>m\right)<m^{-\mu} (6.24)

for mm sufficiently large. For this, one may again choose Δ(m)m\Delta(m)\approx\sqrt{m}.
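To sketch this for the concrete choice \Delta(m)=\lceil\sqrt{m}\rceil (made only for this sketch), note that \lceil\sqrt{m}\rceil\leq 2\sqrt{m} for m\geq 1, so that L(m)=\lfloor\nicefrac{m}{2\Delta(m)}\rfloor\geq\nicefrac{\sqrt{m}}{4}-1 (cf. (6.27)) and therefore

\varepsilon^{L(m)}\leq\varepsilon^{-1}e^{-\frac{\sqrt{m}}{4}\ln(1/\varepsilon)},\qquad c(\Delta(m))^{-\tilde{\mu}}\leq c\,m^{-\nicefrac{\tilde{\mu}}{2}}.

The first term is eventually smaller than m^{-\nicefrac{\tilde{\mu}}{2}}, so (6.24) holds with any \mu<\nicefrac{\tilde{\mu}}{2} for m sufficiently large.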

Step 4 (Verifying the stochastic dominance property with finite p-th moment):

We now insert the CDF bound (6.24) from Step 3, applied with m replaced by \Delta(m), into (6.18) and obtain

(τij(n)>m)Δ(m)μL(m)+11εα(Δ(m))\begin{split}\mathbb{P}\left(\tau_{ij}(n)>m\right)\leq\Delta(m)^{-\mu L(m)}+\frac{1}{1-\varepsilon}\alpha(\Delta(m))\end{split} (6.25)

for mm sufficiently large. Now choose δ(0,1)\delta\in(0,1), such that

μ(14δ1)p+1\mu(\frac{1}{4\delta}-1)\geq p+1 (6.26)

and then choose Δ(m)=δm\Delta(m)=\lceil\delta m\rceil. We choose this to guarantee the required summability property of the first term in (6.25), since

L(m)=m2Δ(m)14δ1L(m)=\lfloor\frac{m}{2\Delta(m)}\rfloor\geq\frac{1}{4\delta}-1 (6.27)

for m12δm\geq\frac{1}{2\delta}. Hence, we have

(τij(n)>m)\displaystyle\mathbb{P}\left(\tau_{ij}(n)>m\right) (δm)(p+1)+11εα(δm)\displaystyle\leq(\delta m)^{-(p+1)}+\frac{1}{1-\varepsilon}\alpha(\lceil\delta m\rceil) (6.28)

for m12δm\geq\frac{1}{2\delta}.
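For completeness, (6.27) and the first term in (6.28) follow from the elementary fact that \lceil x\rceil\leq 2x whenever x\geq\tfrac{1}{2}: for m\geq\tfrac{1}{2\delta} we have \delta m\geq\tfrac{1}{2}, hence

L(m)=\left\lfloor\frac{m}{2\lceil\delta m\rceil}\right\rfloor\geq\left\lfloor\frac{1}{4\delta}\right\rfloor\geq\frac{1}{4\delta}-1,\qquad\Delta(m)^{-\mu L(m)}\leq\lceil\delta m\rceil^{-(p+1)}\leq(\delta m)^{-(p+1)},

where the middle inequality uses (6.26) together with \Delta(m)=\lceil\delta m\rceil\geq 1.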

Now define

u(m)(δm)(p+1)+11εα(δm)u(m)\coloneqq(\delta m)^{-(p+1)}+\frac{1}{1-\varepsilon}\alpha(\lceil\delta m\rceil)

and apply the construction presented in Section 6.1. This yields a non-negative integer-valued random variable \overline{\tau}_{ij} that stochastically dominates \tau_{ij}(n) for all n\in\mathbb{N}_{0}. Moreover, we have \mathbb{E}\left[\overline{\tau}_{ij}^{p}\right]<\infty, if

m=0((m+1)pmp)u(m)<.\sum_{m=0}^{\infty}((m+1)^{p}-m^{p})u(m)<\infty. (6.29)

The first part of the series is finite, since

m=1((m+1)pmp)(δm)(p+1)2pδp+1m=1m2<,\sum_{m=1}^{\infty}((m+1)^{p}-m^{p})(\delta m)^{-(p+1)}\leq\frac{2^{p}}{\delta^{p+1}}\sum_{m=1}^{\infty}m^{-2}<\infty, (6.30)

where we used that ((m+1)pmp)2pmp1((m+1)^{p}-m^{p})\leq 2^{p}m^{p-1} for mm\in\mathbb{N}.
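This elementary inequality can be seen as follows (a short sketch). For m\geq 1,

(m+1)^{p}-m^{p}=m^{p}\left(\left(1+\tfrac{1}{m}\right)^{p}-1\right)\leq m^{p}\cdot\frac{2^{p}}{m}=2^{p}m^{p-1},

since for p\geq 1 the map x\mapsto\frac{(1+x)^{p}-1}{x} is non-decreasing on (0,1] by convexity of (1+x)^{p}, so that (1+\tfrac{1}{m})^{p}-1\leq\frac{2^{p}-1}{m}\leq\frac{2^{p}}{m}, while for 0\leq p<1 we have (1+\tfrac{1}{m})^{p}-1\leq\frac{p}{m}\leq\frac{2^{p}}{m}.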

For the second part of the series, note that α(n)\alpha(n) is by construction a monotonically decreasing function from 0\mathbb{N}_{0} to [0,14][0,\frac{1}{4}] [8]. Now extend α(n)\alpha(n) by linear interpolation to a monotonically decreasing function from 0\mathbb{R}_{\geq 0} to [0,14][0,\frac{1}{4}]. Then for all m0m\in\mathbb{N}_{0}, we have α(δm)α(δm)\alpha(\lceil\delta m\rceil)\leq\alpha(\delta m) by monotonicity. Hence the second part is finite, since

m=1((m+1)pmp)α(δm)2pm=1mp1α(δm)22p1δpm=1mp1α(m)<\begin{split}\sum_{m=1}^{\infty}((m+1)^{p}-m^{p})\alpha(\delta m)&\leq 2^{p}\sum_{m=1}^{\infty}m^{p-1}\alpha(\delta m)\\ &\leq\frac{2^{2p-1}}{\delta^{p}}\sum_{m=1}^{\infty}m^{p-1}\alpha(m)<\infty\end{split} (6.31)

The second inequality can be shown using a construction similar to that in Lemma 1. Finally, the finiteness of the last sum follows from the assumed p-strongly mixing property. ∎

We have thus established the stochastic dominance property of order p\geq 0 for those network edges that ensure that the network is SSC, under the p-strongly mixing condition. As the next step, we prove an elementary lemma associated with the AoI variables of a time-varying network. The lemma shows that the existence of stochastically dominating random variables for the AoI variables of a time-varying network is a transitive property.

Lemma 5.

For nodes i,j,k\in\mathcal{V} of a time-varying network, suppose that \tau_{ij}(n) and \tau_{jk}(n) are stochastically dominated by \overline{\tau}_{ij} and \overline{\tau}_{jk}, respectively, for all n\in\mathbb{N}_{0}. Then

  (a) There is a random variable \overline{\tau}_{ik} that stochastically dominates \tau_{ik}(n) for all n\in\mathbb{N}_{0}.

  (b) If moreover \mathbb{E}\left[\overline{\tau}_{ij}^{p}\right]+\mathbb{E}\left[\overline{\tau}_{jk}^{p}\right]<\infty for some p>0, then also \mathbb{E}\left[\overline{\tau}_{ik}^{p}\right]<\infty.

Proof.

Fix i,j,k𝒱i,j,k\in\mathcal{V} and some m2m\geq 2. Now observe the following inclusion associated with events of the three AoI variables τij(n),τjk(n)\tau_{ij}(n),\tau_{jk}(n) and τik(n)\tau_{ik}(n):

{τij(nm2)m2}{τjk(n)m2}{τik(n)m},\{\tau_{ij}(n-\frac{m}{2})\leq\frac{m}{2}\}\cap\{\tau_{jk}(n)\leq\frac{m}{2}\}\subset\{\tau_{ik}(n)\leq m\}, (6.32)

The inclusion states that the two events

  1. the AoI is at most \frac{m}{2} for information received at node j from node i at time n-\frac{m}{2}, and

  2. the AoI is at most \frac{m}{2} for information received at node k from node j at time n

together imply the event that the AoI is at most m for information received at node k from node i at time n. By taking complements in the inclusion (6.32), we have that

(τik(n)>m)\displaystyle\mathbb{P}\left(\tau_{ik}(n)>m\right) ({τij(nm2)>m2}{τjk(n)>m2})\displaystyle\leq\mathbb{P}\left(\{\tau_{ij}(n-\frac{m}{2})>\frac{m}{2}\}\cup\{\tau_{jk}(n)>\frac{m}{2}\}\right) (6.33)
(τij(nm2)>m2)+(τjk(n)>m2)\displaystyle\leq\mathbb{P}\left(\tau_{ij}(n-\frac{m}{2})>\frac{m}{2}\right)+\mathbb{P}\left(\tau_{jk}(n)>\frac{m}{2}\right) (6.34)
<(τ¯ij>m2)+(τ¯jk>m2).\displaystyle<\mathbb{P}\left(\overline{\tau}_{ij}>\frac{m}{2}\right)+\mathbb{P}\left(\overline{\tau}_{jk}>\frac{m}{2}\right). (6.35)

In the last step, we used the assumption that there are random variables τ¯ij\overline{\tau}_{ij} and τ¯jk\overline{\tau}_{jk} that stochastically dominate τij(n)\tau_{ij}(n) and τjk(n)\tau_{jk}(n), respectively, for all nn.

Now \mathbb{P}\left(\overline{\tau}_{ij}>\frac{m}{2}\right) and \mathbb{P}\left(\overline{\tau}_{jk}>\frac{m}{2}\right) both tend to zero as m\to\infty, so there is some M\in\mathbb{N} such that

(τ¯ij>m2)+(τ¯jk>m2)<1\mathbb{P}\left(\overline{\tau}_{ij}>\frac{m}{2}\right)+\mathbb{P}\left(\overline{\tau}_{jk}>\frac{m}{2}\right)<1 (6.36)

for all m\geq M. Define a non-negative integer-valued random variable \overline{\tau}_{ik} by specifying its complementary CDF:

(τ¯ik>m)\displaystyle\mathbb{P}\left(\overline{\tau}_{ik}>m\right) 1, for all 0m<M,\displaystyle\coloneqq 1,\quad\text{ for all }0\leq m<M, (6.37)
(τ¯ik>m)\displaystyle\mathbb{P}\left(\overline{\tau}_{ik}>m\right) (τ¯ij>m2)+(τ¯jk>m2), otherwise.\displaystyle\coloneqq\mathbb{P}\left(\overline{\tau}_{ij}>\frac{m}{2}\right)+\mathbb{P}\left(\overline{\tau}_{jk}>\frac{m}{2}\right),\quad\text{ otherwise}. (6.38)

The right-hand side of (6.38) is at most 1 and non-increasing for m\geq M, so this indeed defines a complementary CDF, and by (6.35) (and trivially for m<M) \overline{\tau}_{ik} stochastically dominates \tau_{ik}(n) for all n\in\mathbb{N}_{0}. This proves part (a) of the lemma.

Now suppose \mathbb{E}\left[\overline{\tau}_{ij}^{p}\right]+\mathbb{E}\left[\overline{\tau}_{jk}^{p}\right]<\infty for some p>0. We can now bound the p-th moment of \overline{\tau}_{ik} using its complementary CDF from above:

\mathbb{E}\left[\overline{\tau}_{ik}^{p}\right]=\sum_{m=0}^{\infty}((m+1)^{p}-m^{p})\mathbb{P}\left(\overline{\tau}_{ik}>m\right) (6.39)

\leq M^{p}+\sum_{m=0}^{\infty}((m+1)^{p}-m^{p})\mathbb{P}\left(\overline{\tau}_{ij}>\frac{m}{2}\right)+\sum_{m=0}^{\infty}((m+1)^{p}-m^{p})\mathbb{P}\left(\overline{\tau}_{jk}>\frac{m}{2}\right) (6.40)

=M^{p}+2^{p}\left(\mathbb{E}\left[\overline{\tau}_{ij}^{p}\right]+\mathbb{E}\left[\overline{\tau}_{jk}^{p}\right]\right)<\infty. (6.41)

Here, the first M terms of the series in (6.39), for which \mathbb{P}\left(\overline{\tau}_{ik}>m\right)=1, telescope to M^{p}, while the remaining terms are bounded using (6.38). The final equality follows from Proposition 1, since 2\overline{\tau}_{ij} and 2\overline{\tau}_{jk} are non-negative integer-valued random variables with \mathbb{P}\left(2\overline{\tau}_{ij}>m\right)=\mathbb{P}\left(\overline{\tau}_{ij}>\frac{m}{2}\right). This proves part (b) of the lemma. ∎

Lemma 5 allows us to extend the stochastic dominance properties from Lemma 4 for node pairs (i,j)\in\mathcal{E} to arbitrary node pairs (i,j)\in\mathcal{V}^{2}. We are now ready to prove Theorem 2.

Proof of Theorem 2.

First, fix an arbitrary pair of nodes (i,j)\in\mathcal{V}^{2}. Since the network is SSC, the associated graph (\mathcal{V},\mathcal{E}) is strongly connected, so there is a sequence of edges \{(i_{k},i_{k+1})\}_{k=1}^{K-1}\subset\mathcal{E} for some K\geq 1, with i_{1}=i and i_{K}=j. By Lemma 4, for each such edge there is a non-negative integer-valued random variable \overline{\tau}_{i_{k}i_{k+1}} that stochastically dominates \tau_{i_{k}i_{k+1}}(n) for all n\in\mathbb{N}_{0}, with \mathbb{E}\left[\overline{\tau}_{i_{k}i_{k+1}}^{p}\right]<\infty. It now follows by induction, using the transitivity property of the AoI variables from Lemma 5, that there is a non-negative integer-valued random variable \overline{\tau}_{ij} that stochastically dominates \tau_{ij}(n) for all n\in\mathbb{N}_{0}, with \mathbb{E}\left[\overline{\tau}_{ij}^{p}\right]<\infty.

It is now left to verify that there is a single dominating random variable for all pairs (i,j)\in\mathcal{V}^{2}. This essentially follows since we consider finitely many agents. For every m\geq 0, define

h(m)(i,j)𝒱2(τ¯ij>m).h(m)\coloneqq\sum_{(i,j)\in\mathcal{V}^{2}}\mathbb{P}\left(\overline{\tau}_{ij}>m\right). (6.42)

Since |\mathcal{V}^{2}|<\infty and each \mathbb{P}\left(\overline{\tau}_{ij}>m\right) tends to zero as m\to\infty, there is a smallest M\geq 0 such that h(m)\leq 1 for all m\geq M. Define a non-negative integer-valued random variable \overline{\tau} by specifying its complementary CDF as follows:

\mathbb{P}\left(\overline{\tau}>m\right)=1,\qquad 0\leq m<M, (6.43)

\mathbb{P}\left(\overline{\tau}>m\right)=h(m),\qquad m\geq M. (6.44)

By construction τ¯\overline{\tau} stochastically dominates all τij(n)\tau_{ij}(n) for all (i,j)𝒱2(i,j)\in\mathcal{V}^{2} and for all n0n\in\mathbb{N}_{0}. Finally, we have

𝔼[τ¯p]m=0((m+1)pmp)h(m)=(i,j)𝒱2𝔼[τ¯ijp]<,\mathbb{E}\left[\overline{\tau}^{p}\right]\leq\sum_{m=0}^{\infty}\left((m+1)^{p}-m^{p}\right)h(m)=\sum_{(i,j)\in\mathcal{V}^{2}}\mathbb{E}\left[\overline{\tau}_{ij}^{p}\right]<\infty, (6.45)

where the inequality holds since, by the minimality of M and the monotonicity of h, we have h(m)\geq 1=\mathbb{P}\left(\overline{\tau}>m\right) for m<M, and where the equality follows by exchanging the finite sum over (i,j)\in\mathcal{V}^{2} with the series (all terms are non-negative) and applying Proposition 1 to each \overline{\tau}_{ij}. ∎

7 Conclusions and future work

In this work, we presented an asymptotic convergence analysis of distributed stochastic gradient descent that uses aged information. The required network assumptions have been weakened to the mere existence of a non-negative integer-valued random variable with finite first moment that stochastically dominates all age of information random variables. This assumption can be satisfied with the new network Assumptions 5 and 6. These assumptions are significantly weaker than the common network assumptions in the literature. We hope that our assumptions will encourage future work in distributed optimization under less restrictive network assumptions. Notably, instead of periodic or independent communication, we merely require asymptotically independent communication, formulated using \alpha-mixing with the minimal requirement that \sum_{n=0}^{\infty}\alpha(n)<\infty.

It would be interesting to see whether summability properties of \alpha-mixing coefficients indeed hold for representative physical wireless communication systems. This might be possible when the underlying physical system has a mixing property in an ergodic sense. For example, hyperbolic systems are common models to describe electromagnetic wave propagation, and it was shown in [2] that hyperbolic systems admit a strong mixing property in an ergodic sense.

To apply Assumption 6 in practice, it would be most desirable if the \alpha-mixing coefficients (or an upper bound) for the network processes \mathbbm{1}_{\bigcup_{k=n}^{n+\eta}A^{k}_{ij}} could be estimated from data. Unfortunately, there are only a handful of methods that estimate or approximate mixing coefficients from data. One method, based on histogram approximations, was presented in [20]; however, it suffers from high complexity. Very recently, a new method was presented in [13]. Most notably, that work presents a hypothesis test to decide whether the sum of the \alpha-mixing coefficients is below a given upper bound. It is therefore now possible to verify with high confidence, using data, whether Assumption 6 holds for p=1.
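As a purely illustrative sketch (and not the histogram-based estimator of [20] or the hypothesis test of [13]), one can at least compute a crude lower-bound proxy for \alpha(n) from recorded link-success data by restricting the supremum in the definition of \alpha(n) to two single-time events. The function name, parameters, and the Markovian on/off link below are hypothetical choices for illustration only, and stationarity of the link process is assumed.

import numpy as np

def alpha_lower_proxy(x, max_lag):
    # Crude proxy: |P(x_k=1, x_{k+n}=1) - P(x_k=1) P(x_{k+n}=1)| estimated
    # under a stationarity assumption. This only LOWER-bounds alpha(n),
    # since the supremum in the definition runs over far richer events.
    x = np.asarray(x, dtype=float)
    p = x.mean()  # empirical success probability
    return np.array([abs(np.mean(x[:-n] * x[n:]) - p * p)
                     for n in range(1, max_lag + 1)])

# Hypothetical example: a two-state Markov (on/off) link, which is
# geometrically alpha-mixing, so the proxy should decay quickly in the lag.
rng = np.random.default_rng(0)
T, q_stay = 100_000, 0.9
x = np.empty(T, dtype=int)
x[0] = 1
for t in range(1, T):
    x[t] = x[t - 1] if rng.random() < q_stay else 1 - x[t - 1]
print(alpha_lower_proxy(x, max_lag=10))

A decaying proxy is of course only a necessary indication; verifying Assumption 6 rigorously requires estimators or tests with guarantees, such as those discussed in [13].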

{funding}

Adrian Redder was supported by the German Research Foundation (DFG) - 315248657 and SFB 901.

References

  • Aybat, N. S. and Hamedani, E. Y. (2019). A distributed ADMM-like method for resource sharing over time-varying networks. SIAM Journal on Optimization 29, 3036–3068.
  • Babillot, M. (2002). On the mixing property for hyperbolic systems. Israel Journal of Mathematics 129, 61–76.
  • Bastianello, N., Carli, R., Schenato, L. and Todescato, M. (2020). Asynchronous distributed optimization over lossy networks via relaxed ADMM: Stability and linear convergence. IEEE Transactions on Automatic Control 66, 2620–2635.
  • Bianchi, G. (2000). Performance analysis of the IEEE 802.11 distributed coordination function. IEEE Journal on Selected Areas in Communications 18, 535–547.
  • Boban, M., Gong, X. and Xu, W. (2016). Modeling the evolution of line-of-sight blockage for V2V channels. In 2016 IEEE 84th Vehicular Technology Conference (VTC-Fall), 1–7. IEEE.
  • Borkar, V. S. (1998). Asynchronous stochastic approximations. SIAM Journal on Control and Optimization 36, 840–851.
  • Borkar, V. S. (2009). Stochastic Approximation: A Dynamical Systems Viewpoint. Springer.
  • Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probability Surveys 2, 107–144.
  • Chakraborti, S., Jardim, F. and Epprecht, E. (2018). Higher-order moments using the survival function: The alternative expectation formula. The American Statistician.
  • Davydov, Y. A. (1974). Mixing conditions for Markov chains. Theory of Probability & Its Applications 18, 312–328.
  • Durrett, R. (2019). Probability: Theory and Examples. Cambridge University Press.
  • Haghshenas, K., Pahlevan, A., Zapater, M., Mohammadi, S. and Atienza, D. (2019). Magnetic: Multi-agent machine learning-based approach for energy efficient dynamic consolidation in data centers. IEEE Transactions on Services Computing.
  • Khaleghi, A. and Lugosi, G. (2021). Inferring the mixing properties of an ergodic process. arXiv preprint arXiv:2106.07054.
  • Koloskova, A., Loizou, N., Boreiri, S., Jaggi, M. and Stich, S. (2020). A unified theory of decentralized SGD with changing topology and local updates. In International Conference on Machine Learning, 5381–5393. PMLR.
  • Kovalev, D., Shulgin, E., Richtárik, P., Rogozin, A. and Gasnikov, A. (2021). ADOM: Accelerated decentralized optimization method for time-varying networks. In International Conference on Machine Learning. PMLR.
  • Lei, J., Chen, H.-F. and Fang, H.-T. (2018). Asymptotic Properties of Primal-Dual Algorithm for Distributed Stochastic Optimization over Random Networks with Imperfect Communications. SIAM Journal on Control and Optimization 56, 2159–2188.
  • Lim, H. and Kim, C. (2001). Flooding in wireless ad hoc networks. Computer Communications 24, 353–363.
  • Lin, W. and Bitar, E. (2017). Decentralized stochastic control of distributed energy resources. IEEE Transactions on Power Systems 33, 888–900.
  • Lin, S., Kong, L., He, L., Guan, K., Ai, B., Zhong, Z. and Briso-Rodríguez, C. (2015). Finite-state Markov modeling for high-speed railway fading channels. IEEE Antennas and Wireless Propagation Letters 14, 954–957.
  • McDonald, D. J., Shalizi, C. R. and Schervish, M. (2015). Estimating beta-mixing coefficients via histograms. Electronic Journal of Statistics 9, 2855–2883.
  • Nedić, A. and Olshevsky, A. (2014). Distributed optimization over time-varying directed graphs. IEEE Transactions on Automatic Control 60, 601–615.
  • Nedic, A., Olshevsky, A. and Shi, W. (2017). Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM Journal on Optimization 27, 2597–2633.
  • Pimentel, C., Falk, T. H. and Lisbôa, L. (2004). Finite-state Markov modeling of correlated Rician-fading channels. IEEE Transactions on Vehicular Technology 53, 1491–1501.
  • Ramaswamy, A., Redder, A. and Quevedo, D. E. (2019). Optimization over time-varying networks with unbounded delays. arXiv preprint arXiv:1912.07055.
  • Ramaswamy, A., Redder, A. and Quevedo, D. E. (2021). Distributed optimization over time-varying networks with stochastic information delays. IEEE Transactions on Automatic Control.
  • Redder, A., Ramaswamy, A. and Karl, H. (2022a). Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms. arXiv preprint arXiv:2201.00570.
  • Redder, A., Ramaswamy, A. and Karl, H. (2022b). Multi-agent gradient-based resource allocation for networked systems (To appear).
  • Scutari, G. and Sun, Y. (2019). Distributed nonconvex constrained optimization over time-varying digraphs. Math. Program. 176, 497–544.
  • Tang, H., Gan, S., Zhang, C., Zhang, T. and Liu, J. (2018). Communication Compression for Decentralized Training. Advances in Neural Information Processing Systems 31, 7652–7662.
  • Trudeau, R. J. (1993). Introduction to Graph Theory. Courier Corporation.
  • Wang, H. S. and Moayeri, N. (1995). Finite-state Markov channel - a useful model for radio communication channels. IEEE Transactions on Vehicular Technology 44, 163–171.
  • Wang, Y., Zhao, W., Hong, Y. and Zamani, M. (2019). Distributed Subgradient-Free Stochastic Optimization Algorithm for Nonsmooth Convex Functions over Time-Varying Networks. SIAM Journal on Control and Optimization 57, 2821–2842.
  • Xu, Y., Han, T., Cai, K., Lin, Z., Yan, G. and Fu, M. (2017). A distributed algorithm for resource allocation over dynamic digraphs. IEEE Transactions on Signal Processing 65, 2600–2612.
  • Yu, Z., Ho, D. W. C. and Yuan, D. (2020). Distributed Stochastic Optimization over Time-Varying Noisy Network. arXiv preprint arXiv:2005.03982.