
PAUSE: Low-Latency and Privacy-Aware Active User Selection for Federated Learning

Ori Peleg, Natalie Lang, Stefano Rini, Nir Shlezinger, and Kobi Cohen Parts of this work were accepted for presentation in the 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) as the paper [1]. O. Peleg, N. Lang, N. Shlezinger, and K. Cohen are with School of ECE, Ben-Gurion University of the Negev, Beer-Sheva, Israel (email: {oripele, langn}@post.bgu.ac.il; {nirshl; yakovsec}@bgu.ac.il). S. Rini is with the Department of ECE, National Yang-Ming Chiao-Tung University (NYCU), Hsinchu, Taiwan (email: stefano.rini@nycu.edu.tw). This research was supported by the Israeli Ministry of Science and Technology.
Abstract
Federated learning (FL) enables multiple edge devices to collaboratively train a machine learning model without the need to share potentially private data. FL proceeds through iterative exchanges of model updates, which pose two key challenges: (i) the accumulation of privacy leakage over time and (ii) communication latency. These two limitations are typically addressed separately: (i) via perturbed updates to enhance privacy and (ii) via user selection to mitigate latency, both at the expense of accuracy. In this work, we propose a method that jointly tackles the accumulation of privacy leakage and communication latency via active user selection, aiming to improve the trade-off among privacy, latency, and model performance. To achieve this, we construct a reward function that accounts for these three objectives. Building on this reward, we propose a multi-armed bandit (MAB)-based algorithm, termed Privacy-aware Active User SElection (PAUSE), which dynamically selects a subset of users each round while ensuring bounded overall privacy leakage. We establish a theoretical analysis, systematically showing that the reward growth rate of PAUSE follows the best-known rate in the MAB literature. To address the complexity overhead of active user selection, we propose a simulated annealing-based relaxation of PAUSE and analyze its ability to approximate the reward-maximizing policy at reduced complexity. We numerically validate the privacy leakage, improved latency, and accuracy gains of our methods for federated training in various scenarios.

Index Terms:
Federated Learning; Communication latency; Privacy; Multi-Armed Bandit; Simulated Annealing.

I Introduction

The effectiveness of deep learning models heavily depends on the availability of large amounts of data. In real-world scenarios, data is often gathered by edge devices such as mobile phones, medical devices, sensors, and vehicles. Because these data often contain sensitive information, there is a pressing need to utilize them for training deep neural networks (DNNs) without compromising user privacy. A popular framework for training DNNs without requiring data centralization is that of federated learning (FL) [2]. In FL, each participating device locally trains its model in parallel, and a central server periodically aggregates these local models into a global one [3].

The distributed operation of FL, and particularly the fact that learning is carried out using multiple remote users in parallel, induces several challenges that are not present in traditional centralized learning [4, 5]. A key challenge stems from the fact that FL involves repeated exchanges of highly parameterized models between the orchestrating server and numerous users. This often entails significant communication latency, which, in turn, impacts convergence, complexity, and scalability [6]. Communication latency can be tackled by model compression [7, 8, 9, 10], and via over-the-air aggregation in settings where the users share a common wireless channel [11, 12, 13]. A complementary approach for balancing communication latency, which is key for scaling FL over massive networks, is user selection [14, 15, 16]. User selection limits the number of users participating in each round, traditionally employing pre-defined policies [17, 18, 19, 20], with more recent schemes exploring active user selection based on the multi-armed bandit (MAB) framework [21, 22, 23, 24]. The latter adapts the selection policy based on, e.g., learning progress and communication delay.

Another prominent challenge of FL is associated with one of its core motivators: privacy preservation. While FL does not involve data sharing, it does not necessarily preserve data privacy, as model inversion attacks were shown to unveil private information and even reconstruct the data from model updates [25, 26, 27, 28]. The common framework for analyzing privacy leakage in FL is based on local differential privacy (LDP) [29]. LDP mechanisms limit privacy leakage in a given FL round, typically by employing privacy preserving noise (PPN) [30, 31, 32], which can also be unified with model compression [33, 34]. However, this causes the amount of leaked privacy to grow with the number of learning rounds [35], degrading performance by restricting the number of learning rounds and necessitating dominant PPN. Existing approaches to avoid accumulation of privacy leakage treat it as a task separate from tackling latency and scalability, often by focusing on a fixed pre-defined number of rounds [36], or by relying on an additional trusted coordinator unit [37, 38, 39], thus deviating from how FL typically operates. This motivates unifying privacy enhancement and user selection as a means to jointly tackle privacy accumulation and latency in FL.

In this work we propose a novel framework for private and scalable multi-round FL with low latency via active user selection. Our proposed method, coined PAUSE, is based on a generic per-round privacy budget, designed to prevent leakage from surpassing a pre-defined limit for any number of FL rounds. Under this operation, users induce more PPN each time they participate. The budget is accounted for in formulating a dedicated reward function for active user selection that balances privacy, communication, and generalization. Based on this reward, we propose a MAB-based policy that prioritizes users with less PPN, balanced with grouping users of similar expected communication latency and exploring new users to enhance generalization. We provide an analysis of PAUSE, rigorously proving that its regret growth rate obeys the desirable growth rate in MAB theory [40, 41, 42].

The direct application of PAUSE involves a brute-force search of a combinatorial nature, whose complexity grows dramatically with the number of users. To circumvent this excessive complexity and enhance scalability, we propose a reduced-complexity implementation of PAUSE based on simulated annealing (SA) [43], coined SA-PAUSE. We analyze the computational complexity of SA-PAUSE, quantifying its reduction compared to direct PAUSE, and rigorously characterize conditions for it to achieve the same performance as the costly brute-force search. We evaluate PAUSE in various learning scenarios with different DNNs, datasets, privacy budgets, and data distributions. Our experimental studies systematically show that by fusing privacy enhancement and user selection, PAUSE enables accurate and rapid learning, approaching the performance of FL without such constraints and notably outperforming alternative approaches that do not account for leakage accumulation. We also show that SA-PAUSE approaches the performance of direct PAUSE in privacy leakage, model accuracy, and latency alike, while supporting scalable implementation over large FL networks.

The rest of this paper is organized as follows. We review some necessary preliminaries and formulate the problem in Section II. PAUSE is introduced and analyzed in Section III, while its reduced complexity, SA-PAUSE, is detailed in Section IV. Numerical simulations are reported in Section V, and Section VI provides concluding remarks.

Notation: Throughout this paper, we use boldface lower-case letters for vectors, e.g., $\bm{x}$. The stochastic expectation, probability operator, indicator function, and $\ell_2$ norm are denoted by $\mathbb{E}[\cdot]$, $\mathbb{P}(\cdot)$, $\mathbf{1}(\cdot)$, and $\|\cdot\|$, respectively. For a set $\mathcal{X}$, we write $|\mathcal{X}|$ for its cardinality.

II System Model and Preliminaries

This section reviews the necessary background for deriving PAUSE. We start by recalling the FL setup and the basics of LDP in Subsections II-A and II-B, respectively. Then, we formulate the active user selection problem in Subsection II-C.

II-A Preliminaries: Federated Learning

II-A1 Objective

The FL setup involves the collaborative training of a machine learning model $\bm{\theta}\in\mathbb{R}^{d}$, carried out by $K$ remote users and orchestrated by a server. Let the set of users be indexed by $\mathbb{K}=\{1,\ldots,K\}$, and let $\mathcal{D}_{k}$ denote the private dataset of user $k\in\mathbb{K}$, which cannot be shared with the server. Define $F_{k}(\bm{\theta})$ as the empirical risk of a model $\bm{\theta}$ evaluated on $\mathcal{D}_{k}$. The goal is to determine the $d\times 1$ optimal parameter vector $\bm{\theta}^{\rm opt}$ that minimizes the overall loss across all users, that is

\bm{\theta}^{\rm opt}=\operatorname*{arg\,min}_{\bm{\theta}}\left\{F(\bm{\theta})\triangleq\sum_{k=1}^{K}\frac{|\mathcal{D}_{k}|}{|\mathcal{D}|}F_{k}(\bm{\theta})\right\}. (1)

II-A2 Learning Procedure

FL operates over multiple iterations divided into rounds [4]. At FL round $t$, the server selects a set of participating users $\mathcal{S}_{t}\subseteq\mathbb{K}$, and sends the current model $\bm{\theta}_{t}$ to them. Each participating user of index $k\in\mathcal{S}_{t}$ then trains $\bm{\theta}_{t}$ on its local data $\mathcal{D}_{k}$ using, e.g., multiple iterations of mini-batch stochastic gradient descent (SGD) [44], yielding the updated model $\bm{\theta}^{k}_{t+1}$.

The model update obtained by the $k$th user, denoted $\bm{h}^{k}_{t+1}=\bm{\theta}^{k}_{t+1}-\bm{\theta}_{t}$, is shared with the server, which aggregates the local updates into a global model update. The aggregation rule commonly employed by the central server in FL is that of federated averaging (FedAvg) [2], in which the global model is obtained as

\bm{\theta}_{t+1}=\bm{\theta}_{t}+\sum_{k\in\mathcal{S}_{t}}\alpha_{t}^{k}\bm{h}^{k}_{t+1}=\sum_{k\in\mathcal{S}_{t}}\alpha_{t}^{k}\bm{\theta}^{k}_{t+1}, (2)

where $\alpha_{t}^{k}=\frac{|\mathcal{D}_{k}|}{|\cup_{j\in\mathcal{S}_{t}}\mathcal{D}_{j}|}$. The updated global model is again distributed to the users and the learning procedure continues.
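To make the aggregation rule (2) concrete, the following minimal Python sketch implements a single FedAvg step over illustrative list-valued models; the function name `fedavg_step` and the plain-list representation are our own illustrative choices, not from the paper.

```python
# Minimal sketch of a FedAvg aggregation step, cf. (2).
def fedavg_step(theta, local_updates, dataset_sizes):
    """Combine local updates h_k, weighted by alpha_k = |D_k| / sum_j |D_j|.

    theta         -- current global model theta_t (list of floats)
    local_updates -- dict: user k -> update h_k (list of floats)
    dataset_sizes -- dict: user k -> |D_k|, for the participating users
    """
    total = sum(dataset_sizes[k] for k in local_updates)
    next_theta = list(theta)
    for k, h in local_updates.items():
        alpha = dataset_sizes[k] / total  # alpha_t^k in (2)
        for i, h_i in enumerate(h):
            next_theta[i] += alpha * h_i
    return next_theta
```

Note that the weights of non-participating users are simply absent from the sums, matching the restriction of (2) to the selected set.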

II-A3 Communication Model

Communication between the users and the server is associated with some varying latency [4]. We model this delay via the random variable $\tau_{t,k}$, representing the total latency at the $t$th round between the server and the $k$th user. Accordingly, the communication latency of the whole round, denoted $\tau_{t}^{\rm total}$, is determined by the user with the highest latency:

\tau_{t}^{\rm total}=\max_{k\in\mathcal{S}_{t}}\tau_{t,k}. (3)

The communication latency $\tau_{t,k}$ varies over time (due to fading [6]) and between users (due to system heterogeneity [45]). As the latter is device-specific, we model $\tau_{t,k}$ as being drawn in an i.i.d. manner from a device-specific distribution [21], denoted $\tau_{k}$. We further assume the users differ in their expected latencies $\mathbb{E}[\tau_{k}]$, and denote the minimal difference between these terms by $\delta\triangleq\min_{i\neq j\in\mathbb{K}}|\mathbb{E}[\tau_{i}]-\mathbb{E}[\tau_{j}]|$. We also assume a minimal latency corresponding to, e.g., the minimal propagation delay; mathematically, there exists some $\tau_{\min}>0$ such that $\tau_{t,k}\geq\tau_{\min}$ with probability one.
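The latency model above can be illustrated with a short simulation sketch; the uniform per-device distributions and the constant `TAU_MIN` below are hypothetical choices used only to exercise (3), since the paper leaves the distributions generic.

```python
import random

TAU_MIN = 0.1  # assumed minimal latency tau_min > 0

def draw_latency(rng, mean_k):
    # Illustrative device-specific law: uniform around the mean, clipped
    # from below so that tau_{t,k} >= tau_min holds with probability one.
    return max(TAU_MIN, rng.uniform(0.5 * mean_k, 1.5 * mean_k))

def round_latency(rng, selected, means):
    """tau_t^total = max_{k in S_t} tau_{t,k}, cf. (3)."""
    return max(draw_latency(rng, means[k]) for k in selected)
```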

II-B Preliminaries: Local Differential Privacy

One of the main motivations for FL is the need to preserve the privacy of the users' data. Nonetheless, concealing the dataset of the $k$th user, $\mathcal{D}_{k}$, in favor of sharing model updates trained on $\mathcal{D}_{k}$, was shown to be potentially leaky [25, 26, 27, 28]. Therefore, to satisfy the privacy requirements of FL, dedicated privacy mechanisms are necessary.

In FL, privacy is commonly quantified in terms of LDP [46, 47], as this metric does not assume that the users trust the server.

Definition 1 ($\epsilon$-LDP [48]).

A randomized mechanism $\mathcal{M}$ satisfies $\epsilon$-LDP if for any pair of input values $v,v^{\prime}$ in the domain of $\mathcal{M}$ and for any possible output $y$, it holds that

\mathbb{P}[\mathcal{M}(v)=y]\leq e^{\epsilon}\mathbb{P}[\mathcal{M}(v^{\prime})=y]. (4)

In Definition 1, a smaller $\epsilon$ means stronger privacy protection. A common mechanism to achieve $\epsilon$-LDP is the Laplace mechanism (LM). Let $\operatorname{Laplace}(\mu,b)$ denote the Laplace distribution with location $\mu$ and scale $b$. The LM is defined as follows:

Theorem 1 (LM [49]).

Given any function $f:D\to\mathbb{R}^{d}$, where $D$ is a domain of datasets, the LM, defined as

\mathcal{M}^{\rm Laplace}\left(f(x),\epsilon\right)=f(x)+{\left[z_{1},\dots,z_{d}\right]}^{T}, (5)

is $\epsilon$-LDP. In (5), $z_{i}\overset{\rm i.i.d.}{\sim}\operatorname{Laplace}\left(0,\Delta f/\epsilon\right)$, i.e., they obey an i.i.d. zero-mean Laplace distribution with scale $\Delta f/\epsilon$, where $\Delta f\triangleq\max_{x,y\in D}\|f(x)-f(y)\|_{1}$.
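A minimal sketch of the Laplace mechanism in (5), assuming the caller supplies $f(x)$ and its $\ell_1$ sensitivity $\Delta f$; the inverse-CDF sampler is a standard construction and the function names are ours.

```python
import math
import random

def laplace_noise(rng, scale):
    # Inverse-CDF sampling of Laplace(0, scale): u ~ Uniform(-1/2, 1/2),
    # z = -scale * sgn(u) * ln(1 - 2|u|).
    u = rng.random() - 0.5
    return -scale * (1.0 if u >= 0 else -1.0) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(rng, fx, sensitivity, eps):
    """Perturb the vector f(x) with i.i.d. Laplace(0, sensitivity/eps) noise."""
    scale = sensitivity / eps
    return [v + laplace_noise(rng, scale) for v in fx]
```

A smaller $\epsilon$ yields a larger scale and hence more dominant noise, which is exactly the effect the per-round budget policy of Section III must account for.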

LDP mechanisms, such as the LM, guarantee $\epsilon$-LDP for a given query of $\mathcal{M}$ in (4). In FL, this corresponds to a single model update. As FL involves multiple rounds, one has to account for the accumulated leakage, given by the composition theorem:

Theorem 2 (Composition [48]).

Let $\mathcal{M}_{i}$ be an $\epsilon_{i}$-LDP mechanism on input $v$, and let $\mathcal{M}(v)$ be the sequential composition of $\mathcal{M}_{1}(v),\ldots,\mathcal{M}_{m}(v)$. Then $\mathcal{M}(v)$ satisfies $\sum_{i=1}^{m}\epsilon_{i}$-LDP.

Theorem 2 indicates that the privacy leakage of each user in FL is accumulated as the training proceeds.

II-C Problem Formulation

Our goal is to design a privacy leakage policy alongside privacy-aware user selection. Formally, we aim to set, for every round $t\in\mathbb{N}$, an algorithm that selects $m=|\mathcal{S}_{t}|$ users, while setting the privacy leakage budget $\{\epsilon_{k,t}\}_{k\in\mathcal{S}_{t}}$. These policies should account for the following considerations:

  1. C1

Optimize the accuracy of the trained model $\bm{\theta}$ in (1).

  2. C2

    Minimize the overall latency due to (3).

  3. C3

Maintain $\bar{\epsilon}$-LDP, i.e., the overall leakage by each user should not exceed $\bar{\epsilon}$, where $\bar{\epsilon}$ is a pre-defined constant.

  4. C4

    Operate with limited complexity to support real-time implementation in large-scale networks.

The considerations above are addressed in the subsequent sections. We first focus solely on considerations C1-C3, based on which we present PAUSE in Section III. Subsequently, Section IV adapts PAUSE to accommodate consideration C4, yielding SA-PAUSE, thereby jointly tackling C1-C4.

III Privacy-Aware Active User Selection

This section introduces PAUSE. We first formulate its time-varying privacy budget policy and associated reward in Subsection III-A. The resulting user selection algorithm is detailed in Subsection III-B, with its regret growth analyzed in Subsection  III-C. We conclude with a discussion in Subsection III-D.

III-A Reward and Privacy Policy

The formulation of PAUSE relies on two main components: (i) a prefixed round-varying privacy budget; and (ii) a reward holistically accounting for privacy, latency, and generalization. The privacy policy is designed to ensure that C3 is preserved regardless of the number of iterations in which each user participates. Accordingly, we define a sequence $\{\epsilon_{i}\}$ with $\epsilon_{i}>0$, satisfying:

\sum_{i=1}^{\infty}\epsilon_{i}=\bar{\epsilon}, (6)

for finite $\bar{\epsilon}$. Using the sequence $\{\epsilon_{i}\}$, the privacy budget of any user at the $i$th time it participates in training the model is set to $\epsilon_{i}$, achieved using, e.g., the LM. This guarantees that C3 holds. One candidate setting, which is also used in our experiments, is $\epsilon_{i}=\bar{\epsilon}(e^{\eta}-1)e^{-\eta i}$ with $\eta>0$, for which (6) holds since the resulting series is geometric with limit $\bar{\epsilon}$.
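The geometric schedule above can be checked numerically: with $\epsilon_i = \bar{\epsilon}(e^{\eta}-1)e^{-\eta i}$, the partial sum after $n$ participations equals $\bar{\epsilon}(1-e^{-\eta n})$, so the accumulated leakage under composition (Theorem 2) never exceeds $\bar{\epsilon}$. A small sketch (function names are ours):

```python
import math

def eps_schedule(eps_bar, eta, n_terms):
    # eps_i = eps_bar * (e^eta - 1) * e^(-eta * i), i = 1, ..., n_terms
    return [eps_bar * (math.exp(eta) - 1.0) * math.exp(-eta * i)
            for i in range(1, n_terms + 1)]

def cumulative_leakage(eps_bar, eta, n_terms):
    # Accumulated leakage after n_terms participations (Theorem 2);
    # the geometric partial sum equals eps_bar * (1 - e^(-eta * n_terms)).
    return sum(eps_schedule(eps_bar, eta, n_terms))
```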

The reward guides the active user selection procedure and utilizes two terms. The first is the privacy reward, which accounts for the fact that our privacy policy has users introduce more dominant PPN each time they participate. The privacy reward assigned to the $k$th user at round $t$ is

p_{k}(t)\triangleq 1-\frac{\sum_{i=1}^{T_{k}(t)}\epsilon_{i}}{\bar{\epsilon}}, (7)

where $T_{k}(t)$ is the number of rounds in which the $k$th user has been selected up to and including the $t$th round, i.e., $T_{k}(t)\triangleq\sum_{i=1}^{t}\mathbf{1}(k\in\mathcal{S}_{i})$. The privacy reward (7) assigns higher values to users who have participated in fewer rounds.

The second term is the generalization reward, designed to meet C1. It assigns higher values to users whose data have been underutilized relative to the share of their data in the whole available data, $\frac{|\mathcal{D}_{k}|}{|\mathcal{D}|}$. We adopt the generalization reward proposed in [23], which was shown to account for both i.i.d. balanced data and non-i.i.d. imbalanced data, and rewards the $k$th user in an $m$-sized group at round $t$ via the function

g_{k}(t)\triangleq\bigg{|}\frac{m}{|\mathcal{D}|/|\mathcal{D}_{k}|}-\frac{T_{k}(t)}{t}\bigg{|}^{\beta}\cdot\operatorname{sign}\biggl{(}\frac{m}{|\mathcal{D}|/|\mathcal{D}_{k}|}-\frac{T_{k}(t)}{t}\biggr{)}. (8)

In (8), $\beta>1$ is a hyper-parameter that adjusts the fuzziness of the function, i.e., a higher $\beta$ yields a lower absolute value when the other parameters are fixed. Fig. 1 depicts $g_{k}(\cdot)$ as a function of $T_{k}(t)/t$, and illustrates the effect of different $\beta$ values as a means of balancing the reward assigned to users that have participated frequently (high $T_{k}(t)/t$).

Figure 1: Generalization reward (8) for different values of $\beta$, with $\frac{|\mathcal{D}|}{|\mathcal{D}_{k}|}=K$.

Our proposed reward encompasses the above terms, grading the selection of a group of users $\mathcal{S}$ of size $m$ at round $t$ as

r(\mathcal{S},t)\triangleq\frac{\tau_{\min}}{\max_{k\in\mathcal{S}}\tau_{k,t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{S}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{S}}p_{k}(t-1)
=\min_{k\in\mathcal{S}}\frac{\tau_{\min}}{\tau_{k,t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{S}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{S}}p_{k}(t-1). (9)

The reward in (9) is composed of three additive terms, which correspond to C2, C1, and C3, respectively, with $\alpha$ and $\gamma$ being hyper-parameters balancing these considerations. At this point we can make two remarks regarding the reward (9):

  1. 1.

Both $g_{k}(\cdot)$ and $p_{k}(\cdot)$ penalize repeated selection of the same users. However, each rewards differently, based on generalization and privacy considerations, respectively: the former accounts for the relative dataset sizes of the users, while the latter does not. In the case of homogeneous data, where $|\mathcal{D}_{k}|=\frac{|\mathcal{D}|}{K}$ for all $k\in\mathbb{K}$, both $g_{k}(\cdot)$ and $p_{k}(\cdot)$ play a similar role. However, they differ significantly in the non-i.i.d. case.

  2. 2.

    The value of the first term is determined solely by the slowest user. This non-linearity, combined with the two other terms, directs the algorithm we derive from this reward to select a group of users with similar latency in a given round.
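Putting the pieces together, the following is a minimal sketch of the reward terms (7), (8) and their combination in (9); all inputs are illustrative stand-ins for the server-side statistics, and the function names are ours.

```python
def privacy_reward(spent_eps, eps_bar):
    # p_k(t) = 1 - (accumulated leakage of user k) / eps_bar, cf. (7).
    return 1.0 - spent_eps / eps_bar

def generalization_reward(m, D_k, D_total, T_k, t, beta):
    # g_k(t) in (8): signed, fuzzified gap between the target share
    # m * |D_k| / |D| and the empirical participation share T_k(t) / t.
    gap = m * D_k / D_total - T_k / t
    sign = 1.0 if gap >= 0 else -1.0
    return (abs(gap) ** beta) * sign

def reward(S, tau, g, p, tau_min, alpha, gamma):
    """r(S, t) of (9); tau, g, p map each user to its latency / g_k / p_k."""
    m = len(S)
    latency_term = min(tau_min / tau[k] for k in S)  # slowest user dominates
    return (latency_term
            + alpha / m * sum(g[k] for k in S)
            + gamma / m * sum(p[k] for k in S))
```

The `min` in the latency term reproduces the non-linearity noted in the second remark: a single slow user caps the first component for the whole group.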

III-B PAUSE Algorithm

Here we present PAUSE, a combinatorial MAB-based algorithm [42] built on the reward (9). To derive PAUSE, we seek a policy $\Pi\triangleq(\mathcal{S}_{1},\mathcal{S}_{2},\ldots)$ such that $\mathbb{E}[\sum_{t=1}^{n}r(\mathcal{S}_{t},t)]$ is maximized for every $n$. To this end, as is customary in MAB settings, we aim to minimize the regret, defined as the loss of the algorithm compared to a Genie policy that has prior knowledge of the expectations of the random variables, i.e., of $\mu_{k}\triangleq\mathbb{E}[\frac{\tau_{\min}}{\tau_{k}}]$.

We define the Genie’s algorithm as selecting

\mathcal{G}_{t}\triangleq\operatorname*{arg\,max}_{\mathcal{S}\subseteq\mathbb{K};|\mathcal{S}|=m}\{C^{\mathcal{G}}(\mathcal{S},t)\}, (10)

where

C^{\mathcal{G}}(\mathcal{S},t)\triangleq\min_{k\in\mathcal{S}}\mu_{k}+\frac{\alpha}{m}\sum_{k\in\mathcal{S}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{S}}p_{k}(t-1).

The Genie policy (10) attempts to maximize the expectation of the reward (9) in each round, by exchanging the order of the expectation and the $\min_{k\in\mathcal{S}}$ operator. As the reward $C^{\mathcal{G}}$ is history-dependent, so is the Genie's policy.

We use the Genie policy to derive PAUSE, denoted $\mathcal{P}\triangleq(\mathcal{P}_{1},\mathcal{P}_{2},\ldots)$, as an upper confidence bound (UCB)-type algorithm [41]. Accordingly, PAUSE estimates the unknown expectations $\{\mu_{k}\}$ via their empirical means, computed from the latencies measured in previous rounds as

\overline{\mu_{k}}(n)\triangleq\frac{1}{T_{k}(n)}\sum_{t=1}^{n}\frac{\tau_{\min}}{\tau_{k,t}}\cdot\mathbf{1}(k\in\mathcal{P}_{t}). (11)

Note that (11) can be efficiently updated in a recursive manner, as

\overline{\mu_{k}}(t)=\frac{T_{k}(t-1)}{T_{k}(t)}\overline{\mu_{k}}(t-1)+\frac{\mathbf{1}(k\in\mathcal{P}_{t})}{T_{k}(t)}\frac{\tau_{\min}}{\tau_{k,t}}. (12)
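The equivalence of the recursion (12) and the batch average (11) is easy to sanity-check; the sketch below runs the recursion over a stream of illustrative ratio samples $\tau_{\min}/\tau_{k,t}$ for a user that participates in every round.

```python
def recursive_mean(samples):
    # mu(t) = (T-1)/T * mu(t-1) + x_t / T, matching (12) for a user that
    # participates in every round; reproduces the batch mean (11) exactly.
    mu, T = 0.0, 0
    for x in samples:
        T += 1
        mu = (T - 1) / T * mu + x / T
    return mu
```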

PAUSE uses (11) to compute a UCB term for each user at the end of the $t$th round [41], via

{\rm ucb}(k,t)\triangleq\overline{\mu_{k}}(t)+\sqrt{\frac{(m+1)\log(t)}{T_{k}(t)}}. (13)

The UCB term in (13) is designed to tackle C2. Its formulation encapsulates the inherent exploration-exploitation trade-off in MAB problems: the first term, $\overline{\mu_{k}}(t)$, boosts exploitation of the users that are fastest in expectation, while the second term encourages exploration of other users. The resulting user selection rule at round $t$ is

\mathcal{P}_{t}=\operatorname*{arg\,max}_{\mathcal{S}\subseteq\mathbb{K};|\mathcal{S}|=m}\biggl{\{}\min_{k\in\mathcal{S}}{\rm ucb}(k,t-1)+\frac{\alpha}{m}\sum_{k\in\mathcal{S}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{S}}p_{k}(t-1)\biggr{\}}. (14)

The overall active user selection procedure is summarized as Algorithm 1. The chosen users send their noisy local model updates to the server, which updates the global model via (2) and sends it back to all the users in $\mathbb{K}$. At the end of every round, the users' reward terms are updated for the next round, where $p_{k}(t)$ and $\overline{\mu_{k}}(t)$ change only for the participating users $k\in\mathcal{P}_{t}$. Note that, by the formulation of Algorithm 1, when $m$ is an integer divisor of $K$, the server chooses every user exactly once during the first $\frac{K}{m}$ rounds due to the initial conditions.

Input: Set of users $\mathbb{K}$; number of active users $m$
Init: Set $T_{k}(0),\overline{\mu_{k}}(0),p_{k}(0)\leftarrow 0$; ${\rm ucb}(k,0)\leftarrow\infty$; initial model parameters $\bm{\theta}_{0}$
for $t=1,2,\ldots$ do
  Select $\mathcal{P}_{t}$ via (14);
  Share $\bm{\theta}_{t-1}$ with users in $\mathcal{P}_{t}$;
  Aggregate global model $\bm{\theta}_{t}$ via (2);
  for $k\in\mathbb{K}$ do
    Update $T_{k}(t)\leftarrow T_{k}(t-1)+\mathbf{1}(k\in\mathcal{P}_{t})$;
    Update empirical estimate $\overline{\mu_{k}}(t)$ via (12);
    Update ${\rm ucb}(k,t)$ via (13);
return $\bm{\theta}_{t}$
Algorithm 1 PAUSE
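A compact sketch of one selection step of Algorithm 1, combining the UCB index (13) with the brute-force search (14) over all $\binom{K}{m}$ subsets; the dictionaries holding per-user statistics are illustrative stand-ins for the state maintained by the server.

```python
import itertools
import math

def ucb(mu_bar, T_k, t, m):
    # ucb(k, t) of (13); never-selected users get an infinite index,
    # mirroring the initialization ucb(k, 0) <- infinity in Algorithm 1.
    if T_k == 0:
        return math.inf
    return mu_bar + math.sqrt((m + 1) * math.log(t) / T_k)

def select_users(users, m, t, mu_bar, T, g, p, alpha, gamma):
    # Brute-force maximization of (14) over all m-sized subsets.
    best_set, best_val = None, -math.inf
    for S in itertools.combinations(users, m):
        val = (min(ucb(mu_bar[k], T[k], t, m) for k in S)
               + alpha / m * sum(g[k] for k in S)
               + gamma / m * sum(p[k] for k in S))
        if val > best_val:
            best_set, best_val = set(S), val
    return best_set
```

The exhaustive loop makes the combinatorial cost explicit: it visits every one of the $\binom{K}{m}$ candidate sets, which is exactly the bottleneck that motivates the SA-based relaxation of Section IV.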

III-C Regret Analysis

To evaluate PAUSE, we next analyze its regret, which for a policy $\Pi$ is defined as the expected reward gap between the given policy and the Genie's policy:

R^{\Pi}(n)\triangleq\mathbb{E}\Bigl{[}\sum_{t=1}^{n}r(\mathcal{G}_{t},t)-r(\mathcal{S}_{t},t)\Bigr{]}. (15)

We define the maximal reward gap over any policy as $\Delta_{\max}\triangleq\max_{t\in\mathbb{N},\Pi}r(\mathcal{G}_{t},t)-r(\mathcal{S}_{t},t)$. This quantity is bounded as stated in the following lemma:

Lemma 1.

User selection via (14) with the reward (9) satisfies

\Delta_{\max}\leq 2\alpha+\gamma+\max_{k\in\mathbb{K}}\mu_{k}-\min_{k\in\mathbb{K}}\mu_{k}. (16)
Proof.

Inequality (16) follows from (9), as $g_{k}(t)\in[-1,1]$ and $p_{k}(t)\in[0,1]$ for every $k\in\mathbb{K}$ and $t\in\mathbb{N}$. ∎

We bound the regret of PAUSE in the following theorem:

Theorem 3.

The regret of PAUSE satisfies

R^{\mathcal{P}}(n)\leq K(\Delta_{\max}+\delta)\left(\frac{4(m+1)\log(n)}{\delta^{2}}+1+\frac{2\pi^{2}}{3}\right). (17)
Proof.

The proof is given in Appendix A. ∎

Theorem 3 bounds the regret accumulated by round $n$. In the asymptotic regime, it implies that PAUSE achieves a regret whose growth order does not exceed $\mathcal{O}(\log(n))$, which is the best-known regret rate in MAB [41, 42].

III-D Discussion

PAUSE is particularly designed to facilitate privacy- and communication-constrained FL. It leverages MAB-based active user selection to dynamically cope with privacy leakage accumulation, without restricting the overall number of FL rounds as in [36, 20, 24]. PAUSE is theoretically shown to achieve the best-known regret growth, and it demonstrates promising results in our experiments, as detailed in Section V.

The formulation of PAUSE in Algorithm 1 focuses on the server operation, requiring the users only to send their updates with the proper PPN. As such, it can be naturally combined with existing methods for alleviating latency and enhancing privacy via update encoding [4]. Moreover, Algorithm 1 complies with any imposed privacy policy, while adhering to considerations C1-C3. This inherent adaptability makes it an agile solution across diverse policy frameworks.

A core challenge in applying PAUSE stems from the fact that (14) involves a brute-force search over $\binom{K}{m}$ options. Such computation is expected to become infeasible in large networks, i.e., as $K$ grows, making it incompatible with consideration C4. This complexity can be alleviated by approximating the brute-force search with low-complexity policies based on (14), as we do in the sequel.

IV SA-PAUSE

In this section we alleviate the computational burden associated with the brute-force search operation of PAUSE. The resulting algorithm, termed SA-PAUSE, is based on SA principles, as detailed in Subsection IV-A. We analyze SA-PAUSE in Subsection IV-B, rigorously identifying conditions under which it coincides with PAUSE and characterizing its time complexity.

IV-A Simulated Annealing Algorithm

To improve the computational efficiency of the search procedure in (14), we construct a graph structure whose set of vertices $\mathbb{V}$ comprises all possible subsets of $m$ users in $\mathbb{K}$. For each vertex (i.e., set of users) $\mathcal{V}\in\mathbb{V}$, we denote its neighboring set by $\mathcal{N}_{\mathcal{V}}$. Two vertices $\mathcal{V},\mathcal{U}\in\mathbb{V}$ are designated as neighbors when they satisfy the following requirements:

  1. R1

The intersection of the vertices contains exactly $m-1$ elements, i.e., the sets of users $\mathcal{V}$ and $\mathcal{U}$ differ in a single user, so that $|\mathcal{V}\cap\mathcal{U}|=m-1$.

  2. R2

One of the users which appears in only a single set minimizes one of the terms of the selection rule (14) in its designated group, i.e., one of the sets is an active neighbor of the other. Mathematically, we say that $\mathcal{U}$ is an active neighbor of $\mathcal{V}$ (and $\mathcal{V}$ is a passive neighbor of $\mathcal{U}$) if the distinct user in $\mathcal{V}$, i.e., $k=\mathcal{V}\setminus\mathcal{U}$, satisfies

k\in\Big{\{}\operatorname*{arg\,min}_{k^{\prime}\in\mathcal{V}}{\rm ucb}(k^{\prime},t-1),\operatorname*{arg\,min}_{k^{\prime}\in\mathcal{V}}p_{k^{\prime}}(t-1),\operatorname*{arg\,min}_{k^{\prime}\in\mathcal{V}}g_{k^{\prime}}(t-1)\Big{\}}.

The above graph construction is inherently undirected due to the symmetric nature of the neighbor relationships.

To formalize our optimization objective, we define the energy of each vertex as the quantity we seek to maximize in PAUSE's search (14). Specifically, for any vertex $\mathcal{V}$, define

E(\mathcal{V})\triangleq\min_{k\in\mathcal{V}}{\rm ucb}(k,t-1)+\frac{\alpha}{m}\sum_{k\in\mathcal{V}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{V}}p_{k}(t-1). (18)

To identify a vertex exhibiting maximal energy, we introduce an optimized SA-based algorithm [43], which iteratively inspects vertices (i.e., candidate user sets) in the graph. The resulting procedure, detailed in Algorithm 2, is comprised of two stages taking place at FL round $t$: initialization and iterative search.

Initialization: Following established SA methodology, we maintain an auxiliary temperature sequence, whose $j$th entry is defined as $\tau_{j}=\frac{C}{\log(j+1)}$, where the parameter $C>0$ exceeds the maximal energy differential between any pair of vertices in the graph. Thus, one must first set the value of $C$.

Accordingly, the initialization phase at round $t$ involves sorting all $K$ users according to their respective ${\rm ucb}(k,t-1)$, $p_{k}(t-1)$, and $g_{k}(t-1)$ values into three distinct lists. These three sorted lists are first used to determine an appropriate value for $C$. For each list $i\in\{1,2,3\}$, we denote by $\mathcal{A}_{m}^{i}$ and $\mathcal{B}_{m}^{i}$ the sets containing the $m$ users with minimal and maximal values, respectively. The parameter $C$ is then set as follows, where $\omega$ represents a small positive constant:

C=\min_{k\in\mathcal{B}^{1}_{m}}{\rm ucb}(k,t-1)-\min_{k\in\mathcal{A}^{1}_{m}}{\rm ucb}(k,t-1)+\frac{\alpha}{m}\Big{[}\sum_{k\in\mathcal{B}^{2}_{m}}g_{k}(t-1)-\sum_{k\in\mathcal{A}^{2}_{m}}g_{k}(t-1)\Big{]}+\frac{\gamma}{m}\Big{[}\sum_{k\in\mathcal{B}^{3}_{m}}p_{k}(t-1)-\sum_{k\in\mathcal{A}^{3}_{m}}p_{k}(t-1)\Big{]}+\omega. (19)
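The computation of $C$ in (19) uses only the three sorted lists; below is a sketch under the assumption that the per-user statistics are given as plain value lists (the helper name `energy_gap_bound` is ours).

```python
def energy_gap_bound(ucb_vals, g_vals, p_vals, m, alpha, gamma, omega=1e-6):
    # C of (19): spans of the m largest vs. m smallest entries of the three
    # sorted statistics, plus a small positive constant omega.
    u, g, p = sorted(ucb_vals), sorted(g_vals), sorted(p_vals)
    ucb_span = min(u[-m:]) - min(u[:m])          # UCB term of (19)
    g_span = sum(g[-m:]) - sum(g[:m])            # generalization term
    p_span = sum(p[-m:]) - sum(p[:m])            # privacy term
    return ucb_span + alpha / m * g_span + gamma / m * p_span + omega
```

Since each term is a difference between the best and worst possible contribution of an $m$-sized group, the result upper-bounds the energy gap (18) between any two vertices, as required by the cooling schedule.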

Iterative Search: The algorithm's iterative phase updates an inspected vertex, moving at iteration $j$ from the previously inspected $\mathcal{V}_{j}$ to an updated $\mathcal{V}_{j+1}$. This necessitates the identification of $\mathcal{N}_{\mathcal{V}_{j}}$. We decompose this task into the discovery of active and passive neighbors, as specified in R2, utilizing the previously constructed sorted lists:

  1. N1

    Active Neighbor Identification - To determine the active neighbors in iteration jj, we examine each sorted list (ucb(k,t1){\rm ucb}(k,t-1), pk(t1)p_{k}(t-1), and gk(t1)g_{k}(t-1)) to identify the user with the minimal value within 𝒱j\mathcal{V}_{j}. An active neighbor is generated by substituting any of these minimal-value users with a user not present in 𝒱j\mathcal{V}_{j}. This procedure yields at most 3(Km)3(K-m) active neighbors of 𝒱j\mathcal{V}_{j}.

  2. N2

    Passive Neighbor Identification - For passive neighbors, we establish that a vertex 𝒰\mathcal{U} qualifies as a passive neighbor of 𝒱j\mathcal{V}_{j} if it can be constructed through one of two mechanisms, illustrated using the ucb(k,t1){\rm ucb}(k,t-1) sorted list. Let aa denote the user with minimal ucb(k,t1){\rm ucb}(k,t-1) in 𝒱j\mathcal{V}_{j} and bb represent the user with the second-minimal value. 𝒰\mathcal{U} is a passive neighbor of 𝒱j\mathcal{V}_{j} if it is obtained by either:

    1. (a)

      Replace any user in 𝒱j\mathcal{V}_{j} except aa with a user whose ucb(k,t1){\rm ucb}(k,t-1) value is lower than aa’s (positioned before aa in the sorted list).

    2. (b)

      Replace aa with a user whose ucb(k,t1){\rm ucb}(k,t-1) value is lower than bb’s (positioned before bb in the sorted list).

Once the neighbor set 𝒩𝒱j\mathcal{N}_{\mathcal{V}_{j}} is formed, the algorithm inspects a random neighbor 𝒰\mathcal{U}. This vertex is inspected in the following iteration if it improves upon 𝒱j\mathcal{V}_{j} in terms of the energy (18) (in which case it is also saved as the best set explored so far); otherwise, it is accepted with probability exp(E(𝒱j)E(𝒰)τj)\exp{\big{(}-\frac{E(\mathcal{V}_{j})-E(\mathcal{U})}{\tau_{j}}\big{)}}. The resulting procedure is summarized as Algorithm 2.

Input : Set of users 𝕂\mathbb{K}; Number of active users mm
Init : Randomly sample a vertex 𝒱1\mathcal{V}_{1} and set 𝒫t=𝒱1\mathcal{P}_{t}=\mathcal{V}_{1};
Sort the users along ucb(k,t1){\rm ucb}(k,t-1), pk(t1)p_{k}(t-1), and gk(t1)g_{k}(t-1), in three different lists.
1 Compute CC via (19);
2 for j=1,2j=1,2\ldots do
3  Find N𝒱jN_{\mathcal{V}_{j}} as described in N1 and N2;
4  Sample randomly 𝒰𝒩𝒱j\mathcal{U}\in\mathcal{N}_{\mathcal{V}_{j}};
5 if E(𝒰)E(𝒱j)E(\mathcal{U})\geq E(\mathcal{V}_{j}) then
6      Update inspected vertex 𝒱j+1𝒰\mathcal{V}_{j+1}\leftarrow\mathcal{U};
7      Update best vertex 𝒫t𝒰\mathcal{P}_{t}\leftarrow\mathcal{U} if E(𝒰)>E(𝒫t)E(\mathcal{U})>E(\mathcal{P}_{t});
8    
9 else
10     Sample pp uniformly over [0,1][0,1];
11      Set τj=Clog(1+j)\tau_{j}=\frac{C}{\log(1+j)};
12    if pexp(E(𝒱j)E(𝒰)τj)p\leq\exp{\big{(}-\frac{E(\mathcal{V}_{j})-E(\mathcal{U})}{\tau_{j}}\big{)}} then
13         Update inspected vertex 𝒱j+1𝒰\mathcal{V}_{j+1}\leftarrow\mathcal{U};
14    else
15        Re-inspect vertex 𝒱j+1𝒱j\mathcal{V}_{j+1}\leftarrow\mathcal{V}_{j};
16    
17 
return 𝒫t\mathcal{P}_{t}
Algorithm 2 Tailored SA for PAUSE at round tt
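A minimal Python sketch of the accept/reject loop in Algorithm 2 follows; the energy and neighbor-generation routines are abstracted as callables, and all names here are illustrative assumptions rather than the authors' code. A worse neighbor is accepted with a probability that decays as the temperature τj=C/log(1+j)\tau_j=C/\log(1+j) cools:

```python
import math
import random

def sa_search(initial, neighbors_fn, energy_fn, C, num_iters=2000, rng=random):
    """Simulated-annealing sketch: always accept improving neighbors;
    accept a worse neighbor with probability exp(-(E(V_j)-E(U))/tau_j),
    where tau_j = C / log(1 + j) is the logarithmic cooling schedule."""
    current, best = initial, initial
    for j in range(1, num_iters + 1):
        candidate = rng.choice(neighbors_fn(current))   # sample U in N(V_j)
        diff = energy_fn(candidate) - energy_fn(current)
        if diff >= 0:                                   # improving move
            current = candidate
            if energy_fn(current) > energy_fn(best):
                best = current                          # track best vertex
        else:
            tau = C / math.log(1 + j)                   # cooling temperature
            if rng.random() <= math.exp(diff / tau):    # diff < 0 here
                current = candidate                     # accept worse move
    return best
```

With `neighbors_fn` producing candidate user sets per N1-N2 and `energy_fn` evaluating (18), the returned vertex approximates the energy maximizer.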

The proposed SA-PAUSE implements the FL procedure with active user selection, using Algorithm 2 to approximate PAUSE’s search (14). SA-PAUSE thus realizes Algorithm 1 while replacing its Step 1 with Algorithm 2.

IV-B Theoretical Analysis

Optimality: The SA search of SA-PAUSE, detailed in Algorithm 2, replaces the search over all possible user selections with exploration over a graph. To establish its validity, we first prove that it indeed finds the reward-maximizing set of users, as PAUSE does. Since, in general, there may be more than one set of users that maximizes the reward (or equivalently, the energy (18)), we use 𝒥\mathcal{J} to denote the set of vertices exhibiting maximal energy in the graph. The ability of Algorithm 2 to recover the same user set as the brute-force search in (14) (or one that is equivalent in terms of reward) is stated in the following theorem:

Theorem 4.

For Algorithm 2, it holds that:

limj(𝒱j𝒥)=1.\lim_{j\rightarrow\infty}\mathbb{P}(\mathcal{V}_{j}\in\mathcal{J})=1. (20)
Proof.

The proof is given in Appendix -B. ∎

Theorem 4 shows that Algorithm 2 is guaranteed to recover the reward-maximizing user set in the limit of infinitely many iterations. While the SA algorithm operates over a finite number of iterations, and Theorem 4 applies as jj\rightarrow\infty, the carefully designed cooling temperature sequence and algorithmic structure ensure robust practical performance of SA algorithms [50, 51]. This efficacy is empirically validated in Section V.

Time-Complexity: Having shown that Algorithm 2 can approach the user set recovered via PAUSE, we next show that it fulfills its core motivation, i.e., carrying out this computation with reduced complexity, thereby supporting scalability. While the number of selected users mm is inherently smaller than the overall number of users KK, and often mKm\ll K, our analysis accommodates computationally intensive settings where mm is allowed to grow with KK, up to the order of m=Θ(K)m=\Theta(K).

On each FL round tt, the initialization phase requires 𝒪(KlogK)\mathcal{O}(K\log K) operations due to the list sorting procedures. During each iteration jj, locating the indices of 𝒱j\mathcal{V}_{j}’s users in the sorted lists can be accomplished in 𝒪(KlogK)\mathcal{O}(K\log K) operations through pointer manipulation. The identification of 𝒩𝒱j\mathcal{N}_{\mathcal{V}_{j}} exhibits complexity 𝒪(|𝒩𝒱j|)\mathcal{O}(|\mathcal{N}_{\mathcal{V}_{j}}|), as each neighbor can be found in constant time. While the number of active neighbors is bounded by 3(Km)3(K-m), the number of passive neighbors varies across vertices and iterations. Since each passive neighbor 𝒰\mathcal{U} of 𝒱j\mathcal{V}_{j} corresponds to 𝒱j\mathcal{V}_{j} being an active neighbor of 𝒰\mathcal{U}, and considering the bounded number of active neighbors per vertex, a balanced graph typically exhibits approximately 3(Km)3(K-m) passive neighbors per vertex. Specifically, in the average case where each vertex in 𝕍\mathbb{V} has 𝒪(KlogK)\mathcal{O}(K\log K) passive neighbors, the complexity order of Algorithm 2 is 𝒪(KlogK)\mathcal{O}(K\log K).

For comparative purposes, consider a simplified SA variant (termed Vanilla-SA) in which the neighboring criterion is reduced to only the first condition in R1 (i.e., nodes are neighbors if they share exactly m1m-1 users). This algorithm closely resembles Algorithm 2, but eliminates list sorting and determines N𝒱jN_{\mathcal{V}_{j}} by exhaustively replacing each user in 𝒱j\mathcal{V}_{j} with each user in 𝕂𝒱j\mathbb{K}\setminus\mathcal{V}_{j}. In this case, setting CC to be an upper bound on Δmax\Delta_{max} (16), e.g., C2α+γ+1C\triangleq 2\alpha+\gamma+1, satisfies the conditions of Theorem 4 as well, ensuring asymptotic convergence. However, this approach results in |N𝒱j|=m(Km)|N_{\mathcal{V}_{j}}|=m(K-m), producing a densely connected graph that impedes search efficiency and invariably yields 𝒪(K2)\mathcal{O}(K^{2}) complexity. Table I presents a comprehensive comparison of time complexities across different scenarios.
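As an illustration of why Vanilla-SA yields a dense graph, its neighborhood can be enumerated directly; this sketch (including the frozenset representation of user sets) is an assumption for exposition:

```python
from itertools import product

def vanilla_neighbors(V, K):
    """All user sets sharing exactly m-1 users with V: swap each user in V
    with each user outside V, giving |N_V| = m * (K - m) neighbors."""
    outside = set(range(K)) - set(V)
    return [frozenset((V - {u}) | {w}) for u, w in product(V, outside)]
```

For K=300 and m=15, this already gives 15·285 = 4275 neighbors per vertex, compared with at most 3(K−m) = 855 active neighbors under the tailored criterion.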

Algorithm | Best | Average | Worst
Brute force search (14) | O(eK)O(e^{K}) | O(eK)O(e^{K}) | O(eK)O(e^{K})
Vanilla-SA | O(K2)O(K^{2}) | O(K2)O(K^{2}) | O(K2)O(K^{2})
Algorithm 2 | O(KlogK)O(K\log K) | O(KlogK)O(K\log K) | O(K2)O(K^{2})
TABLE I: Time complexity comparison of different algorithms

Summary: Combining the optimality analysis in Theorem 4 with the complexity characterization in Table I indicates that integrating Algorithm 2 into SA-PAUSE to approximate PAUSE’s search (14) enables the application of PAUSE to large-scale networks, meeting C4. The theoretical convergence guarantees, coupled with its practical efficiency, make it a robust solution for approximating PAUSE while still adhering to considerations C1-C3. The empirical validation of these theoretical results is presented comprehensively in the following section.

V Numerical Study

V-A Experimental Setup

Here, we numerically evaluate PAUSE in FL. The source code used in our experimental study, including all the hyper-parameters, is available online at https://github.com/oritalp/PAUSE/tree/production. We consider the training of a DNN for image classification based on MNIST and CIFAR-10. The trained model comprises a convolutional neural network (CNN) with three hidden layers. These layers are followed by a fully-connected (FC) network with two hidden layers for CIFAR-10, and by a three-layer FC network with 32 neurons at its widest layer for MNIST.

We examine our approach in both small and large network settings with varying privacy budgets. In the former, the data is divided among K=30K=30 users, with m=5m=5 of them chosen at each round, while the latter corresponds to K=300K=300 and m=15m=15 users. The communication latency τk\tau_{k} obeys a normal distribution for every k𝕂k\in\mathbb{K}. The users are equally divided into two groups: fast users, with lower expected communication latency, and slower users. For each configuration, we test our approach under both i.i.d. and non-i.i.d. data distributions. In the imbalanced case, the data quantities are sampled from a Dirichlet distribution with parameter 𝜶\bm{\alpha}, where each user exhibits a dominant label comprising approximately a quarter of its data.
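A sketch of how such an imbalanced split of data quantities can be drawn follows; the helper name, seed handling, and rounding logic are assumptions for illustration, not the paper's experiment code:

```python
import numpy as np

def dirichlet_quantities(num_users, total_samples, alpha=3.0, seed=0):
    """Sample per-user data quantities from a symmetric Dirichlet
    distribution; smaller alpha yields a more imbalanced split."""
    rng = np.random.default_rng(seed)
    proportions = rng.dirichlet(alpha * np.ones(num_users))
    quantities = (proportions * total_samples).astype(int)
    quantities[0] += total_samples - quantities.sum()  # absorb rounding drift
    return quantities
```

The label-skew step (assigning each user a dominant label covering about a quarter of its data) would be layered on top of these quantities.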

As PAUSE becomes computationally infeasible in the large network case, it is only tested on small networks, while SA-PAUSE is tested in both scenarios. These algorithms are compared with the following benchmarks:

  • Random, uniformly sampling m=5m=5 users without replacement [44], solely in the i.i.d. balanced case.

  • FedAvg with privacy and FedAvg w.o. privacy, choosing all KK users, with and without privacy, respectively.

  • Fastest in expectation, using the same pre-known mm fastest users in expectation at each round.

  • The clustered sampling selection algorithm proposed in [20].

V-B Small Network with i.i.d. Data

Our first study trains the aforementioned CNN with an overall privacy budget of ϵ¯=40\bar{\epsilon}=40 for image classification using the CIFAR-10 dataset. The resulting FL accuracy versus communication latency is illustrated in Fig. 2. The error curves were smoothed with an averaging window of size 1010 to attenuate fluctuations. As expected, due to privacy leakage accumulation, the more rounds a user participates in, the noisier its updates become. This is evident in Fig. 2, where choosing all users quickly results in ineffective updates. PAUSE consistently achieves both accurate learning and rapid convergence. Further inspection of this figure indicates that SA-PAUSE successfully approximates PAUSE’s brute-force search as well.

Refer to caption
Figure 2: Validation accuracy vs. latency, CIFAR-10, i.i.d. data, small network

PAUSE’s ability to mitigate privacy accumulation is showcased in Fig. 3, where we report the overall leakage as it evolves over epochs. Fig. 3 reveals that the privacy violation at each given epoch under PAUSE is lower than under the random and clustered sampling methods, adding to its improved accuracy and latency noted in Fig. 2. Note that the maximum privacy violations of the FedAvg with privacy and fastest in expectation methods coincide, since in every epoch both are increased by ϵi\epsilon_{i}.

Refer to caption
Figure 3: Privacy leakage vs. global epochs, CIFAR-10. i.i.d. data, small network

V-C Small Network with non-i.i.d. Data

Subsequently, we train the same DNN on CIFAR-10 in the non-i.i.d. case described previously, with an overall privacy budget of ϵ¯=100\bar{\epsilon}=100. As opposed to the balanced-data test, this setting necessitates balancing between users with varying quantities of data, which may contribute differently to the learning process. The data quantities were sampled from a Dirichlet distribution with parameter 𝜶=𝟑{\bm{\alpha}}={\bm{3}}. Analyzing the validation accuracy versus communication latency in Fig. 4 indicates the superiority of our algorithms in this case as well, in terms of both accuracy and latency. Fig. 5 depicts the maximum privacy violation of the system, this time versus the communication latency, and supports this claim by demonstrating the ability of both PAUSE and its approximation to better preserve privacy while performing more server-client iterations in any given time.

Refer to caption
Figure 4: Validation accuracy vs. latency, CIFAR-10, non-i.i.d data, small network
Refer to caption
Figure 5: Privacy leakage vs. latency, CIFAR-10, non-i.i.d data, small network

V-D Large Networks

We proceed to consider the large network settings. Here, we train two models: one for MNIST with i.i.d. data distribution, and one for CIFAR-10 with non-i.i.d. data. For these scenarios, we implemented two modifications. First, to accelerate the convergence of the SA procedure in Algorithm 2 within a reasonable number of iterations, we modulate the temperature coefficient CC as in [52, 53]. This is accomplished by dividing the temperature coefficient by a constant κ=30\kappa=30, i.e., the temperature at the jjth iteration becomes τj=Cκlog(1+j)\tau_{j}=\frac{C}{\kappa\log(1+j)} [52, 53]. Second, to enhance exploitation [54, 55], we amplified the empirical mean μk¯(t)\overline{\mu_{k}}(t) in (13) by another constant, ζ=3\zeta=3.

The overall privacy budgets for the MNIST and CIFAR-10 evaluations are set to ϵ¯=20\bar{\epsilon}=20 and ϵ¯=10\bar{\epsilon}=10, respectively. In the former, the data quantities were sampled from a Dirichlet distribution with parameter 𝜶=𝟐{\bm{\alpha}}={\bm{2}}. Both cases exhibited consistent trends with the small networks tests, systematically demonstrating SA-PAUSE’s robustness across diverse privacy budgets, datasets, and network scales.

As before, the validation accuracy versus communication latency graphs are presented in Fig. 6 and Fig. 8, while the maximum overall privacy leakage versus time graphs are depicted in Fig. 7 and Fig. 9. These results systematically demonstrate the ability of our proposed SA-PAUSE to facilitate rapid learning over large networks with balanced and limited privacy leakage.

Refer to caption
Figure 6: Validation accuracy vs. latency, MNIST, i.i.d data, large network
Refer to caption
Figure 7: Privacy leakage vs. latency, MNIST. i.i.d data, large network
Refer to caption
Figure 8: Validation accuracy vs. latency, CIFAR-10, non-i.i.d data, large network
Refer to caption
Figure 9: Privacy leakage vs. latency, CIFAR-10, non-i.i.d data, large network

VI Conclusion

We proposed PAUSE, an active and dynamic user selection algorithm operating under fixed privacy constraints. The algorithm balances three FL aspects: accuracy of the trained model, communication latency, and system privacy. We showed that, under common assumptions, PAUSE’s regret achieves a logarithmic order with time. To address complexity and scalability, we developed SA-PAUSE, which integrates an SA algorithm with theoretical guarantees to approximate PAUSE’s brute-force search in feasible running time. We numerically demonstrated SA-PAUSE’s ability to approximate PAUSE’s search and its superiority over alternative approaches in diverse experimental scenarios.

-A Proof of Theorem 3

In the following, define hk(t)(m+1)log(t)Tk(t)h_{k}(t)\triangleq\sqrt{\frac{(m+1)\log(t)}{T_{k}(t)}}. The regret can be bounded following the definition of Δmax\Delta_{max} as

R𝒫(n)\displaystyle R^{\mathcal{P}}(n) =𝔼[t=1nr(𝒢t,t)r(𝒫t,t)]\displaystyle=\mathbb{E}\left[\sum_{t=1}^{n}r(\mathcal{G}_{t},t)-r(\mathcal{P}_{t},t)\right]
Δmax𝔼[t=1n𝟏(r(𝒢t,t)r(𝒫t,t))].\displaystyle\leq\Delta_{\max}\mathbb{E}\left[\sum_{t=1}^{n}\mathbf{1}\big{(}r(\mathcal{G}_{t},t)\neq r(\mathcal{P}_{t},t)\big{)}\right]. (-A.1)

We introduce another indicator function for every i𝕂i\in\mathbb{K} along with its cumulative sum, denoted:

Ii(t)\displaystyle I_{i}(t) {1,{i=argmink𝒞tTk(t1)r(𝒫t,t)r(𝒢t,t)0,elseNi(n)t=1nIi(t).\displaystyle\triangleq\left.\begin{cases}1,&\begin{cases}&i=\underset{k\in\mathcal{C}_{t}}{\operatorname*{arg\,min}}~T_{k}(t-1)\\ &r(\mathcal{P}_{t},t)\neq r(\mathcal{G}_{t},t)\end{cases}\\ 0,&\text{else}\end{cases}\right.\,N_{i}(n)\triangleq\sum_{t=1}^{n}I_{i}(t).

Let 𝒞t𝒫t𝒢t\mathcal{C}_{t}\triangleq\mathcal{P}_{t}\cup\mathcal{G}_{t}. In every round tt where r(𝒫t,t)r(𝒢t,t)r(\mathcal{P}_{t},t)\neq r(\mathcal{G}_{t},t), the counter Nk(t)N_{k}(t) is incremented for only a single user in 𝒞t\mathcal{C}_{t}, while for the remaining users Nk(t1)=Nk(t)N_{k}(t-1)=N_{k}(t). Thus, it holds that t=1n𝟏(r(𝒢t,t)r(𝒫t,t))=k=1KNk(n)\sum_{t=1}^{n}\mathbf{1}\big{(}r(\mathcal{G}_{t},t)\neq r(\mathcal{P}_{t},t)\big{)}=\sum_{k=1}^{K}N_{k}(n). Substituting this into (-A.1), we obtain that

R𝒫(n)Δmaxk=1K𝔼[Nk(n)].R^{\mathcal{P}}(n)\leq\Delta_{\max}\sum_{k=1}^{K}\mathbb{E}[N_{k}(n)]. (-A.2)

In the remainder, we focus on bounding 𝔼[Nk(n)]\mathbb{E}[N_{k}(n)] for every k𝕂k\in\mathbb{K}, and then substitute the derived upper bound into (-A.2). To that aim, let k𝕂k\in\mathbb{K} and fix some ll\in\mathbb{N}, whose value is determined later. We note that:

𝔼[Nk(n)]=𝔼[t=1n𝟏(Ik(t)=1)]\displaystyle\mathbb{E}[N_{k}(n)]=\mathbb{E}[\sum_{t=1}^{n}\mathbf{1}(I_{k}(t)=1)]
=𝔼[t=1n𝟏(Ik(t)=1,Nk(t)l)+𝟏(Ik(t)=1,Nk(t)>l)]\displaystyle=\mathbb{E}[\sum_{t=1}^{n}\mathbf{1}(I_{k}(t)=1,N_{k}(t)\leq l)+\mathbf{1}(I_{k}(t)=1,N_{k}(t)>l)]
(a)l+𝔼[t=1n𝟏(Ik(t)=1,Nk(t)>l)],\displaystyle\qquad\qquad\stackrel{{\scriptstyle(a)}}{{\leq}}l+\mathbb{E}[\sum_{t=1}^{n}\mathbf{1}(I_{k}(t)=1,N_{k}(t)>l)], (-A.3)

where (a)(a) arises from separately considering the cases Nk(t)lN_{k}(t)\leq l and its complement.

PAUSE’s policy (14) implies that in every iteration:

mink𝒫tucb(k,t1)+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)\displaystyle\!\!\min_{k\in\mathcal{P}_{t}}{\rm ucb}(k,t-1)\!+\!\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}\!g_{k}(t\!-\!1)\!+\!\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}\!p_{k}(t\!-\!1)\geq
mink𝒢tucb(k,t1)+αmk𝒢tgk(t1)+γmk𝒢tpk(t1).\displaystyle\!\!\min_{k\in\mathcal{G}_{t}}{\rm ucb}(k,t\!-\!1)\!+\!\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}\!g_{k}(t\!-\!1)\!+\!\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}\!p_{k}(t\!-\!1). (-A.4)

Since this holds with probability one, we can incorporate it into the aforementioned inequality (-A.3):

𝔼[Nk(n)]l+𝔼[t=1n𝟏{Ik(t)=1,Nk(t)>l,mink𝒫tucb(k,t1)+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)mink𝒢tucb(k,t1)+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)}].\begin{split}&\mathbb{E}[N_{k}(n)]\leq l+\mathbb{E}\Biggl{[}\sum_{t=1}^{n}\mathbf{1}\Biggl{\{}I_{k}(t)=1,N_{k}(t)>l,\\ &\min_{k\in\mathcal{P}_{t}}{\rm ucb}(k,t-1)+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\geq\\ &\min_{k\in\mathcal{G}_{t}}{\rm ucb}(k,t-1)+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)\Biggr{\}}\Biggr{]}.\end{split}

We now denote the users chosen in the ttth iteration by the Genie and by the PAUSE algorithm as 𝒢t=u~t,1,,u~t,m\mathcal{G}_{t}=\tilde{u}_{t,1},...,\tilde{u}_{t,m} and 𝒫t=ut,1,,ut,m\mathcal{P}_{t}=u_{t,1},...,u_{t,m}, respectively. For every tt, the indicator function in the sum equals 1 only if the kkth user is the one chosen the fewest times at the beginning of the ttth iteration, i.e., Tk(t1)Tj(t1)T_{k}(t-1)\leq T_{j}(t-1) for every j𝒞tj\in\mathcal{C}_{t}. The intersection of Ik(t)=1I_{k}(t)=1 with Nk(t)>lN_{k}(t)>l implies Tk(t1)lT_{k}(t-1)\geq l. Therefore, this intersection of events implies that for every j𝒞tj\in\mathcal{C}_{t}, lTj(t1)t1l\leq T_{j}(t-1)\leq t-1. Using this result, we can further bound every event in the indicator functions in the upper bound of 𝔼[Nk(n)]\mathbb{E}[N_{k}(n)]:

𝔼[Nk(n)]l+𝔼[t=1n𝟏{Ik(t)=1,Nk(t)>l,minlTut,1,,Tut,mt1(mink𝒫tucb(k,t1)+αmk𝒫tgk(t1)+γmk𝒫tpk(t1))minlTu~t,1,,Tu~t,mt1(mink𝒢tucb(k,t1)+αmk𝒢tgk(t1)+γmk𝒢tpk(t1))}].\begin{split}\mathbb{E}[N_{k}(n)]\leq&l+\mathbb{E}\Biggl{[}\sum_{t=1}^{n}\mathbf{1}\biggl{\{}I_{k}(t)=1,N_{k}(t)>l,\\ &\min_{l\leq T_{u_{t,1}},...,T_{u_{t,m}}\leq t-1}\biggr{(}\min_{k\in\mathcal{P}_{t}}{\rm ucb}(k,t-1)+\\ &\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\biggl{)}\geq\\ &\min_{l\leq T_{\tilde{u}_{t,1}},...,T_{\tilde{u}_{t,m}}\leq t-1}\biggl{(}\min_{k\in\mathcal{G}_{t}}{\rm ucb}(k,t-1)+\\ &\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)\biggr{)}\biggr{\}}\biggr{]}.\end{split}

Using the fact that for any finite collection of qq events {Ai}i=1q\{A_{i}\}_{i=1}^{q} it holds that 𝟏(i=1qAi)i=1q𝟏(Ai)\mathbf{1}(\cup_{i=1}^{q}A_{i})\leq\sum_{i=1}^{q}\mathbf{1}(A_{i}), and that the expectation of an indicator function is the probability of the corresponding event, we have that

𝔼[Nk(n)]l+t=1nlTu~t,1,,Tu~t,m,Tut,1,,Tut,mt1[mink𝒫tucb(k,t1)+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)mink𝒢tucb(k,t1)+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)].\begin{split}&\mathbb{E}[N_{k}(n)]\leq l+\sum_{t=1}^{n}\sum_{l\leq T_{\tilde{u}_{t,1}},...,T_{\tilde{u}_{t,m}},T_{u_{t,1}},...,T_{u_{t,m}}\leq t-1}\\ &\mathbb{P}\biggl{[}\min_{k\in\mathcal{P}_{t}}{\rm ucb}(k,t\!-\!1)\!+\!\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}\!g_{k}(t\!-\!1)\!+\!\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}\!p_{k}(t\!-\!1)\geq\\ &\min_{k\in\mathcal{G}_{t}}{\rm ucb}(k,t\!-\!1)\!+\!\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}\!g_{k}(t\!-\!1)\!+\!\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}\!p_{k}(t\!-\!1)\biggr{]}.\end{split} (-A.5)

In the following steps we focus on bounding the terms in the double sum. To that aim, we define the following:

at=argmink𝒫tucb(k,t1),bt=argmink𝒢tucb(k,t1).a_{t}\!=\!\operatorname*{arg\,min}_{k\in\mathcal{P}_{t}}{\rm ucb}(k,t\!-\!1),~b_{t}\!=\!\operatorname*{arg\,min}_{k\in\mathcal{G}_{t}}{\rm ucb}(k,t\!-\!1). (-A.6)

Using these notations and writing hathat(t)h_{a_{t}}\triangleq h_{a_{t}}(t), we state the following lemma:

Lemma -A.1.

The event (-A.4) implies that at least one of the following three events occurs:

  1. 1.

    x¯bt+hbtμbt\bar{x}_{b_{t}}+h_{b_{t}}\leq\mu_{b_{t}};

  2. 2.

    x¯atμat+hat\bar{x}_{a_{t}}\geq\mu_{a_{t}}+h_{a_{t}};

  3. 3.

    μbt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)<μat+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)\mu_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<\mu_{a_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1).

Proof.

Proof by contradiction: we assume that none of the three events occurs and examine the following chain:

x¯bt+hbt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)>(1)μbt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)(3)μat+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)>(2)x¯at+hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1).\begin{split}&\bar{x}_{b_{t}}+h_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)\stackrel{{\scriptstyle(1)}}{{>}}\\ &\mu_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)\stackrel{{\scriptstyle(3)}}{{\geq}}\\ &\mu_{a_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\stackrel{{\scriptstyle(2)}}{{>}}\\ &\bar{x}_{a_{t}}+h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1).\end{split}

By the definitions of ata_{t} and btb_{t} (-A.6), the inequality above can also be written as:

mink𝒫tucb(k,t1)+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)<\displaystyle\!\!\min_{k\in\mathcal{P}_{t}}{\rm ucb}(k,t\!-\!1)\!+\!\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}\!g_{k}(t\!-\!1)\!+\!\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}\!p_{k}(t\!-\!1)<
mink𝒢tucb(k,t1)+αmk𝒢tgk(t1)+γmk𝒢tpk(t1),\displaystyle\!\!\min_{k\in\mathcal{G}_{t}}{\rm ucb}(k,t\!-\!1)\!+\!\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}\!g_{k}(t\!-\!1)\!+\!\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}\!p_{k}(t\!-\!1), (-A.7)

contradicting our initially assumed event (-A.4). ∎

Applying the union bound and the relationship between the events shown in Lemma -A.1 implies:

[mink𝒫tucb(k,t1)+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)\displaystyle\mathbb{P}\biggl{[}\min_{k\in\mathcal{P}_{t}}{\rm ucb}(k,t-1)+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)
mink𝒢tucb(k,t1)+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)]\displaystyle\geq\min_{k\in\mathcal{G}_{t}}{\rm ucb}(k,t-1)+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)\biggr{]}
[x¯bt+hbtμbt](1)+[x¯atμat+hat](2)+\displaystyle\leq\overbrace{\mathbb{P}[\bar{x}_{b_{t}}+h_{b_{t}}\leq\mu_{b_{t}}]}^{\triangleq(1)}+\overbrace{\mathbb{P}[\bar{x}_{a_{t}}\geq\mu_{a_{t}}+h_{a_{t}}]}^{\triangleq(2)}+
[μbt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)<\displaystyle\mathbb{P}[\mu_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<
μat+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)](3).\displaystyle\underbrace{\mu_{a_{t}}\!+\!2h_{a_{t}}\!+\!\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)\!+\!\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)]}_{\triangleq(3)}. (-A.8)

We obtained three probability terms, (1)(1), (2)(2), and (3)(3). We first bound the former two using Hoeffding’s inequality [56]; term (3)(3) is bounded afterwards in a different manner. We demonstrate how the first term is bounded; the second is handled similarly by replacing btb_{t} with ata_{t}:

[x¯bt+hbtμbt]=[x¯btμbthbt]\displaystyle\mathbb{P}[\bar{x}_{b_{t}}+h_{b_{t}}\leq\mu_{b_{t}}]=\mathbb{P}[\bar{x}_{b_{t}}-\mu_{b_{t}}\leq-h_{b_{t}}]
=[j=1Tbt(t1)τmin(τbt)jμbtTbt(t1)hbtTbt(t1)]\displaystyle=\mathbb{P}\left[\sum_{j=1}^{T_{b_{t}}(t-1)}\frac{\tau_{min}}{(\tau_{b_{t}})_{j}}-\mu_{b_{t}}T_{b_{t}}(t-1)\leq-h_{b_{t}}T_{b_{t}}(t-1)\right]
e2Tbt2(t1)(m+1)log(t)Tbt2(t1)=t2(m+1),\displaystyle\leq e^{-\frac{2T_{b_{t}}^{2}(t-1)(m+1)\log(t)}{T_{b_{t}}^{2}(t-1)}}=t^{-2(m+1)}, (-A.9)

where (τbt)j(\tau_{b_{t}})_{j} is the latency of user btb_{t} at the jjth round in which it participated. This results in the following inequalities:

[x¯bt+hbtμbt]=(1)t2(m+1),[x¯atμat+hat]=(2)t2(m+1).\overbrace{\mathbb{P}[\bar{x}_{b_{t}}\!+\!h_{b_{t}}\leq\mu_{b_{t}}]}^{=(1)}\leq t^{-2(m\!+\!1)},~\overbrace{\mathbb{P}[\bar{x}_{a_{t}}\geq\mu_{a_{t}}\!+\!h_{a_{t}}]}^{=(2)}\leq t^{-2(m\!+\!1)}.
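For reference, the step above instantiates Hoeffding's inequality for the Tbt(t1)T_{b_{t}}(t-1) bounded reward samples τmin/(τbt)j[0,1]\tau_{min}/(\tau_{b_{t}})_{j}\in[0,1]; writing T=Tbt(t1)T=T_{b_{t}}(t-1) and using the definition of hk(t)h_{k}(t), the bound reads:

```latex
\Pr\left[\frac{1}{T}\sum_{j=1}^{T}\frac{\tau_{min}}{(\tau_{b_t})_j}-\mu_{b_t}\le -h_{b_t}\right]
\le e^{-2Th_{b_t}^2}
= e^{-2(m+1)\log(t)} = t^{-2(m+1)},
\qquad h_{b_t}^2=\frac{(m+1)\log(t)}{T}.
```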

To bound (3)(3), we introduce two additional definitions:

At=argmink𝒫tμk,Bt=argmink𝒢tμk.\begin{split}&A_{t}=\operatorname*{arg\,min}_{k\in\mathcal{P}_{t}}\mu_{k},\quad B_{t}=\operatorname*{arg\,min}_{k\in\mathcal{G}_{t}}\mu_{k}.\end{split} (-A.10)

Using the law of total probability, we divide (3)(3) into two parts:

[μbt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)<μat+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)](3)=[(μbt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)<μat+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1))(bt=Bt)]+[(μbt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)<μat+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1))(btBt)].\begin{split}&\mathbb{P}\biggr{[}\mu_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<\\ &\underbrace{\mu_{a_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\biggr{]}}_{\triangleq(3)}\\ &=\mathbb{P}\biggl{[}\bigl{(}\mu_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<\\ &\mu_{a_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\bigr{)}\\ &\cap(b_{t}=B_{t})\biggr{]}\\ &+\mathbb{P}\biggl{[}\bigl{(}\mu_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<\\ &\mu_{a_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\bigr{)}\\ &\cap(b_{t}\neq B_{t})\biggr{]}.\end{split}

We denote the former term as (3a)(3a) and the latter as (3b)(3b):

(3a)[(μbt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)<\displaystyle(3a)\triangleq\mathbb{P}\biggl{[}\bigl{(}\mu_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<
μat+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1))\displaystyle\mu_{a_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\bigr{)}
(bt=Bt)],\displaystyle\cap(b_{t}=B_{t})\biggr{]}, (-A.11a)
(3b)[(μbt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)<\displaystyle(3b)\triangleq\mathbb{P}\biggl{[}\bigl{(}\mu_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<
μat+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1))\displaystyle\mu_{a_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\bigr{)}
(btBt)].\displaystyle\cap(b_{t}\neq B_{t})\biggr{]}. (-A.11b)

In the following, we show that for a range of values of ll, which so far was arbitrary, (3a)(3a) equals 0. Recalling the definitions of ata_{t} (-A.6) and AtA_{t} (-A.10), we know that μAtμat\mu_{A_{t}}\leq\mu_{a_{t}}. Upper bounding by omitting the intersection in step (a)(a), and plugging this relation into the probability in step (b)(b), yields:

(3a)(a)[μBt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)<μat+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)](b)[μBt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)<μAt+2hat+αmk𝒫tgk(t1)+γmk𝒫tpk(t1)]=[μBt+αmk𝒢tgk(t1)+γmk𝒢tpk(t1)=C𝒢(𝒢t,t)(μAt+αmk𝒫tgk(t1)+γmk𝒫tpk(t1))=C𝒢(𝒫t,t)<2hat]=[C𝒢(𝒢t,t)C𝒢(𝒫t,t)<2(m+1)log(t)Tat(t1)],\begin{split}&(3a)\stackrel{{\scriptstyle(a)}}{{\leq}}\\ &\mathbb{P}\biggl{[}\mu_{B_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<\\ &\mu_{a_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\biggr{]}\\ &\stackrel{{\scriptstyle(b)}}{{\leq}}\mathbb{P}\biggl{[}\mu_{B_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<\\ &\mu_{A_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\biggr{]}\\ &=\mathbb{P}\biggl{[}\overbrace{\mu_{B_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)}^{=C^{\mathcal{G}}(\mathcal{G}_{t},t)}-\\ &\overbrace{\bigl{(}\mu_{A_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\bigr{)}}^{=C^{\mathcal{G}}(\mathcal{P}_{t},t)}<2h_{a_{t}}\biggr{]}\\ &=\mathbb{P}\Bigl{[}C^{\mathcal{G}}(\mathcal{G}_{t},t)-C^{\mathcal{G}}(\mathcal{P}_{t},t)<2\sqrt{\frac{(m+1)\log(t)}{T_{a_{t}}(t-1)}}\Bigr{]},\end{split}

where the last two equalities follow from reorganizing the event and recalling the definitions of C𝒢(𝒮,t)C^{\mathcal{G}}(\mathcal{S},t) and hk(t)h_{k}(t), respectively. We now show that this event occurs with probability 0, so the latest bound implies that (3a)(3a) equals 0 as well. We examine the mentioned event while recalling that Tat(t)lT_{a_{t}}(t)\geq l by the relevant indices in the summation in (-A.5):

C𝒢(𝒢t,t)C𝒢(𝒫t,t)\displaystyle C^{\mathcal{G}}(\mathcal{G}_{t},t)-C^{\mathcal{G}}(\mathcal{P}_{t},t) <2(m+1)log(t)Tat(t1)\displaystyle<2\sqrt{\frac{(m+1)\log(t)}{T_{a_{t}}(t-1)}}
2(m+1)log(n)l.\displaystyle\leq 2\sqrt{\frac{(m+1)\log(n)}{l}}. (-A.12)

Next, we consider an enhanced version of the Genie that is rewarded by an additive term of δ\delta in every round in which 𝒢t𝒫t\mathcal{G}_{t}\neq\mathcal{P}_{t}. Since we consider solely rounds in which this occurs, the LHS is larger than δ\delta. Thus, to preclude this event, it suffices to set ll such that 2(m+1)log(n)lδ2\sqrt{\frac{(m+1)\log(n)}{l}}\leq\delta. Recalling δ>0\delta>0, we reorganize this condition into:

l4(m+1)log(n)δ2.l\geq\Bigg{\lceil}\frac{4(m+1)\log(n)}{\delta^{2}}\Bigg{\rceil}. (-A.13)

Moreover, this enhanced version adds another term of δt=1n𝔼[𝟏(r(𝒢t,t)r(𝒫t,t))]=δk=1K𝔼[Nk(n)]\delta\sum_{t=1}^{n}\mathbb{E}\big{[}\mathbf{1}\big{(}r(\mathcal{G}_{t},t)\neq r(\mathcal{P}_{t},t)\big{)}\big{]}=\delta\sum_{k=1}^{K}\mathbb{E}[N_{k}(n)] to the regret, as noted later in the closure of the proof.

Recall that we initially aimed to upper bound the probability of the event (-A.7) by splitting it into three events using the union bound (-A.8). We then bounded (1)(1) and (2)(2), and divided (3)(3) into two parts, (3a)(3a) and (3b)(3b). By setting an appropriate value of ll (-A.13), we showed that (3a)(3a) equals 0. The last step is to upper bound (3b)(3b), which is done similarly.

We start by recalling the definition of 3(b)3(b) (-A.11) and then bound it by a containing event:

\begin{split}
(3b)&\triangleq\mathbb{P}\biggl[\Bigl(\mu_{b_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)<\\
&\qquad\mu_{a_{t}}+2h_{a_{t}}+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\Bigr)\\
&\qquad\cap(b_{t}\neq B_{t})\biggr]\\
&\leq\mathbb{P}[b_{t}\neq B_{t}]=\mathbb{P}[\overline{\mu_{b_{t}}}(t)+h_{b_{t}}\leq\overline{\mu_{B_{t}}}(t)+h_{B_{t}}].
\end{split} \quad\text{(-A.14)}

The last equality follows from the definitions of $b_{t}$ (-A.6) and $B_{t}$ (-A.10), together with definition (13). We now prove a lemma regarding this event, whose probability upper bounds $3(b)$:

Lemma -A.2.

The following event implies that at least one of the next three events occurs:

$\overline{\mu_{b_{t}}}(t)+h_{b_{t}}\leq\overline{\mu_{B_{t}}}(t)+h_{B_{t}}$ \quad\text{(-A.15)}
  1. $\overline{\mu_{b_{t}}}(t)+h_{b_{t}}\leq\mu_{b_{t}}$

  2. $\overline{\mu_{B_{t}}}(t)\geq\mu_{B_{t}}+h_{B_{t}}$

  3. $\mu_{b_{t}}<\mu_{B_{t}}+2h_{B_{t}}$

Proof.

We prove by contradiction: assuming none of the three events holds yields $\overline{\mu_{b_{t}}}(t)+h_{b_{t}}\stackrel{(1)}{>}\mu_{b_{t}}\stackrel{(3)}{\geq}\mu_{B_{t}}+2h_{B_{t}}\stackrel{(2)}{>}\overline{\mu_{B_{t}}}(t)+h_{B_{t}}$, contradicting (-A.15), thus proving the lemma. ∎

Combining the lemma, the union bound, and the upper bound we found in (-A.14) yields:

\begin{split}
3(b)\leq\;&\mathbb{P}[\overline{\mu_{b_{t}}}(t)-\mu_{b_{t}}\leq-h_{b_{t}}]+\mathbb{P}[\overline{\mu_{B_{t}}}(t)-\mu_{B_{t}}\geq h_{B_{t}}]\\
&+\mathbb{P}[\mu_{b_{t}}<\mu_{B_{t}}+2h_{B_{t}}].
\end{split}

We already showed in (-A.9) that the first term is bounded by t2(m+1)t^{-2(m+1)}. Repeating the same steps for BtB_{t} instead of btb_{t} we can show that this value also bounds the second term. Furthermore, we now show that the event in the third term occurs with probability 0 when setting an appropriate value of ll. Observing the mentioned event:

$\mu_{b_{t}}-\mu_{B_{t}}<2\sqrt{\frac{(m+1)\log(t)}{T_{B_{t}}(t-1)}}.$ \quad\text{(-A.16)}

Similar to (-A.13), and recalling $b_{t}\neq B_{t}$, by demanding $l\geq\left\lceil\frac{4(m+1)\log(n)}{\delta^{2}}\right\rceil$ we ensure this event occurs with probability 0. As this is the same range as in (-A.13), we set $l$ to the smallest integer in this range, i.e., $l=\left\lceil\frac{4(m+1)\log(n)}{\delta^{2}}\right\rceil$.

Finally, as we showed, $(3)\leq 2t^{-2(m+1)}$. Plugging the bounds on $(1)$, $(2)$, and $(3)$ into (-A.8), we obtain:

\begin{split}
&\mathbb{P}\biggl[\min_{k\in\mathcal{P}_{t}}{\rm ucb}(k,t-1)+\frac{\alpha}{m}\sum_{k\in\mathcal{P}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{P}_{t}}p_{k}(t-1)\\
&\geq\min_{k\in\mathcal{G}_{t}}{\rm ucb}(k,t-1)+\frac{\alpha}{m}\sum_{k\in\mathcal{G}_{t}}g_{k}(t-1)+\frac{\gamma}{m}\sum_{k\in\mathcal{G}_{t}}p_{k}(t-1)\biggr]\\
&\leq\overbrace{t^{-2(m+1)}}^{\geq(1)}+\overbrace{t^{-2(m+1)}}^{\geq(2)}+\overbrace{2t^{-2(m+1)}}^{\geq(3)}=4t^{-2(m+1)}.
\end{split}

Substituting this bound along with the chosen value of ll into the result we obtained at the beginning of the proof (-A.5) we obtain:

\begin{split}
\mathbb{E}[N_{k}(n)]\leq&\left\lceil\frac{4(m+1)\log(n)}{\delta^{2}}\right\rceil+\sum_{t=1}^{n}\;\sum_{l\leq T_{\tilde{u}_{t,1}},\ldots,T_{\tilde{u}_{t,m}},T_{u_{t,1}},\ldots,T_{u_{t,m}}\leq t-1}4t^{-2(m+1)}\\
\leq&\frac{4(m+1)\log(n)}{\delta^{2}}+1+\sum_{t=1}^{n}4t^{-2(m+1)}\cdot t^{2m}\\
\leq&\frac{4(m+1)\log(n)}{\delta^{2}}+1+4\overbrace{\sum_{t=1}^{\infty}t^{-2}}^{=\pi^{2}/6}.
\end{split}

To conclude the theorem's statement, we substitute this result back into (-A.2), recalling the added regret from the Genie enhancement, obtaining

\begin{split}
R^{\mathcal{P}}(n)&\leq(\Delta_{\max}+\delta)\sum_{k=1}^{K}\mathbb{E}[N_{k}(n)]\\
&\leq K(\Delta_{\max}+\delta)\biggl(\frac{4(m+1)\log(n)}{\delta^{2}}+1+\frac{4\pi^{2}}{3}\biggr),
\end{split}

concluding the proof of the theorem.

-B Proof of Theorem 4

To prove the theorem, we introduce essential terminology and definitions. We define reachability as follows: given two nodes $\mathcal{V}_{1}$ and $\mathcal{V}_{2}$ and an energy level $E$, node $\mathcal{V}_{1}$ is reachable from $\mathcal{V}_{2}$ at level $E$ if there exists a path connecting them that traverses only nodes with energy greater than or equal to $E$. Building upon this definition, a graph exhibits weak reversibility if, for any energy level $E$ and nodes $\mathcal{U}_{1}$ and $\mathcal{U}_{2}$, $\mathcal{U}_{1}$ is reachable from $\mathcal{U}_{2}$ at level $E$ if and only if $\mathcal{U}_{2}$ is reachable from $\mathcal{U}_{1}$ at level $E$.
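To make the definition concrete, the following sketch checks reachability at a given energy level via breadth-first search restricted to nodes of sufficient energy. The adjacency list and per-node energies below are hypothetical, not part of the paper's formulation; note that on an undirected graph the relation is automatically symmetric, which is exactly weak reversibility:

```python
from collections import deque

def reachable(adj, energy, src, dst, E):
    """BFS from src to dst traversing only nodes with energy >= E."""
    if energy[src] < E or energy[dst] < E:
        return False
    seen, queue = {src}, deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            return True
        for u in adj[v]:
            if u not in seen and energy[u] >= E:
                seen.add(u)
                queue.append(u)
    return False

# Hypothetical undirected graph: 0 -- 1 -- 2, with a low-energy middle node.
adj = {0: [1], 1: [0, 2], 2: [1]}
energy = {0: 3.0, 1: 1.0, 2: 3.0}
ok_low = reachable(adj, energy, 0, 2, E=1.0)   # path through node 1 allowed
ok_high = reachable(adj, energy, 0, 2, E=2.0)  # node 1 is blocked
```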

Following [43], to prove that Theorem 4 holds, one must show that the following requirements are satisfied:

  R1. The graph satisfies weak reversibility [43].

  R2. The temperature sequence is of the form $\tau_{j}=\frac{C}{\log(j+1)}$, where $C$ is greater than the maximal energy difference between any two nodes.

  R3. The Markov chain introduced in Algorithm 2 is irreducible.

We prove that the three conditions are satisfied, thereby concluding the theorem. Requirements R1 and R2 follow from the formulation of SA-PAUSE. Specifically, weak reversibility (R1) stems directly from the definition and the undirected graph property, while the temperature sequence condition (R2) is satisfied since we set $C$ as specified in (19).
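A minimal sketch of the logarithmic cooling schedule required by R2; the node energies below are illustrative, and $C$ is set just above the maximal energy gap (in SA-PAUSE, $C$ is fixed as in (19)):

```python
import math

def temperature(j: int, C: float) -> float:
    """Hajek-style logarithmic cooling: tau_j = C / log(j + 1)."""
    return C / math.log(j + 1)

# Illustrative node energies; R2 requires C to exceed the maximal gap.
energies = [0.2, 1.5, 0.9, 2.4]
C = (max(energies) - min(energies)) + 0.1  # any C above the maximal gap

# Temperatures for rounds j = 1..5: strictly decreasing toward zero.
taus = [temperature(j, C) for j in range(1, 6)]
```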

To prove that R3 holds, we need, by definition, to show that there is a positive-probability path between any two nodes $\mathcal{V},\mathcal{U}\in\mathbb{V}$. Since the graph is undirected, it is sufficient to show a path from $\mathcal{V}$ to $\mathcal{U}$. In Algorithm 3, we present an implicit algorithm yielding a sequence of nodes $\mathcal{V}_{0},\mathcal{V}_{1},\ldots,\mathcal{U}$. Within this sequence, consecutive nodes are neighbors, i.e., the algorithm yields a positive-probability path from $\mathcal{V}_{0}$ to $\mathcal{U}$.

Input: set of users $\mathbb{K}$; an arbitrary node $\mathcal{V}_{0}$; target node $\mathcal{U}$
Init: $j=0$
while $\mathcal{U}\neq\mathcal{V}_{j}$ do
    if $\min_{k\in\mathcal{V}_{j}}\{{\rm ucb}(k)\}\leq\max_{k\in\mathcal{U}\setminus\mathcal{V}_{j}}\{{\rm ucb}(k)\}$ then
        $\mathcal{V}_{j+1}\triangleq\bigl(\mathcal{V}_{j}\setminus\operatorname*{arg\,min}_{k\in\mathcal{V}_{j}}\{{\rm ucb}(k)\}\bigr)\cup\operatorname*{arg\,max}_{k\in\mathcal{U}\setminus\mathcal{V}_{j}}\{{\rm ucb}(k)\}$
    else
        sample a random user $p$ from $\mathcal{V}_{j}\setminus\mathcal{U}$;
        $\mathcal{V}_{j+1}\triangleq(\mathcal{V}_{j}\setminus\{p\})\cup\operatorname*{arg\,max}_{k\in\mathcal{U}\setminus\mathcal{V}_{j}}\{{\rm ucb}(k)\}$
    $j=j+1$

Algorithm 3 Constructing a Path from $\mathcal{V}_{0}$ to $\mathcal{U}$
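The path construction above can be sketched in executable form as follows; the user identifiers and ${\rm ucb}$ values are hypothetical, and argmin/argmax ties are broken arbitrarily:

```python
import random

def build_path(V0, U, ucb):
    """Algorithm 3 sketch: morph the node (user subset) V0 into U one
    single-user swap at a time, so consecutive sets are graph neighbors."""
    V, path = set(V0), [set(V0)]
    while V != set(U):
        # Incoming user: argmax of ucb over U \ V_j (nonempty while V != U).
        incoming = max(set(U) - V, key=lambda k: ucb[k])
        if min(ucb[k] for k in V) <= ucb[incoming]:
            # First phase: swap out the current ucb minimizer of V_j.
            outgoing = min(V, key=lambda k: ucb[k])
        else:
            # Second phase: swap out a random user not belonging to U.
            outgoing = random.choice(sorted(V - set(U)))
        V = (V - {outgoing}) | {incoming}
        path.append(set(V))
    return path

# Hypothetical instance with m = 2 selected users out of K = 4.
ucb = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
path = build_path({1, 2}, {3, 4}, ucb)
```

Each step replaces exactly one user, so consecutive sets in `path` are neighbors in the graph over size-$m$ subsets.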

This algorithm possesses a crucial characteristic: the conditional statement evaluates to true until it transitions to false, and from that moment on it remains false until termination. Thus, the algorithm can be partitioned into two phases: the iterations before the statement becomes false, and the rest. We denote by $j^{0}$ the iteration at which the condition becomes false.

First, observe that when $j<j^{0}$, $\mathcal{V}_{j+1}$ is an active neighbor of $\mathcal{V}_{j}$, whereas during all subsequent iterations the former is a passive neighbor of the latter. This establishes that each transition occurs with positive probability in the first place.

Next, we prove the algorithm's correctness and termination. Let $b$ be the minimum ${\rm ucb}$ value in $\mathcal{V}_{j^{0}}$. For every $k\in\mathcal{U}$, if ${\rm ucb}(k)>b$, then $k$ is added to $\mathcal{V}_{j}$ in some iteration $j<j^{0}$. This is guaranteed because, had such incorporation not occurred by the $j^{0}$th iteration, the conditional statement would remain satisfied, contradicting the definition of $b$. The remaining users, i.e., every $k\in\mathcal{U}$ such that ${\rm ucb}(k)\leq b$, are added during the second phase.

Notice that the algorithm avoids cyclic additions and removals: during the second phase, users from $\mathcal{U}$ already present in $\mathcal{V}_{j}$ for $j\geq j^{0}$ are preserved when constructing $\mathcal{V}_{j+1}$; instead, a user not belonging to $\mathcal{U}$ is removed. We have thus established that the algorithm terminates and that every user $k\in\mathcal{U}$ is eventually incorporated into the evolving set without subsequent removal. This completes the verification of the algorithm's correctness, and the proof as a whole.

References

  • [1] O. Peleg, N. Lang, S. Rini, N. Shlezinger, and K. Cohen, “PAUSE: Privacy-aware active user selection for federated learning,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025.
  • [2] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
  • [3] P. Kairouz et al., “Advances and open problems in federated learning,” Foundations and trends® in machine learning, vol. 14, no. 1–2, pp. 1–210, 2021.
  • [4] T. Gafni, N. Shlezinger, K. Cohen, Y. C. Eldar, and H. V. Poor, “Federated learning: A signal processing perspective,” IEEE Signal Process. Mag., vol. 39, no. 3, pp. 14–41, 2022.
  • [5] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 50–60, 2020.
  • [6] M. Chen, N. Shlezinger, H. V. Poor, Y. C. Eldar, and S. Cui, “Communication-efficient federated learning,” Proceedings of the National Academy of Sciences, vol. 118, no. 17, 2021.
  • [7] D. Alistarh, T. Hoefler, M. Johansson, N. Konstantinov, S. Khirirat, and C. Renggli, “The convergence of sparsified gradient methods,” Advances in Neural Information Processing Systems, vol. 31, 2018.
  • [8] P. Han, S. Wang, and K. K. Leung, “Adaptive gradient sparsification for efficient federated learning: An online learning approach,” in IEEE International Conference on Distributed Computing Systems (ICDCS), 2020, pp. 300–310.
  • [9] A. Reisizadeh, A. Mokhtari, H. Hassani, A. Jadbabaie, and R. Pedarsani, “Fedpaq: A communication-efficient federated learning method with periodic averaging and quantization,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 2021–2031.
  • [10] N. Shlezinger, M. Chen, Y. C. Eldar, H. V. Poor, and S. Cui, “UVeQFed: Universal vector quantization for federated learning,” IEEE Trans. Signal Process., vol. 69, pp. 500–514, 2020.
  • [11] M. M. Amiri and D. Gündüz, “Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air,” IEEE Trans. Signal Process., vol. 68, pp. 2155–2169, 2020.
  • [12] T. Sery and K. Cohen, “On analog gradient descent learning over multiple access fading channels,” IEEE Trans. Signal Process., vol. 68, pp. 2897–2911, 2020.
  • [13] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2022–2035, 2020.
  • [14] S. Mayhoub and T. M. Shami, “A review of client selection methods in federated learning,” Archives of Computational Methods in Engineering, vol. 31, no. 2, pp. 1129–1152, 2024.
  • [15] J. Li, T. Chen, and S. Teng, “A comprehensive survey on client selection strategies in federated learning,” Computer Networks, p. 110663, 2024.
  • [16] L. Fu, H. Zhang, G. Gao, M. Zhang, and X. Liu, “Client selection in federated learning: Principles, challenges, and opportunities,” IEEE Internet Things J., vol. 10, no. 24, pp. 21 811–21 819, 2023.
  • [17] J. Xu and H. Wang, “Client selection and bandwidth allocation in wireless federated learning networks: A long-term perspective,” IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 1188–1200, 2020.
  • [18] S. AbdulRahman, H. Tout, A. Mourad, and C. Talhi, “FedMCCS: Multicriteria client selection model for optimal iot federated learning,” IEEE Internet Things J., vol. 8, no. 6, pp. 4723–4735, 2020.
  • [19] E. Rizk, S. Vlaski, and A. H. Sayed, “Federated learning under importance sampling,” IEEE Trans. Signal Process., vol. 70, pp. 5381–5396, 2022.
  • [20] Y. Fraboni, R. Vidal, L. Kameni, and M. Lorenzi, “Clustered sampling: Low-variance and improved representativity for clients selection in federated learning,” in International Conference on Machine Learning. PMLR, 2021, pp. 3407–3416.
  • [21] W. Xia, T. Q. Quek, K. Guo, W. Wen, H. H. Yang, and H. Zhu, “Multi-armed bandit-based client scheduling for federated learning,” IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7108–7123, 2020.
  • [22] B. Xu, W. Xia, J. Zhang, T. Q. Quek, and H. Zhu, “Online client scheduling for fast federated learning,” IEEE Wireless Commun. Lett., vol. 10, no. 7, pp. 1434–1438, 2021.
  • [23] D. Ben-Ami, K. Cohen, and Q. Zhao, “Client selection for generalization in accelerated federated learning: A multi-armed bandit approach,” IEEE Access, 2025.
  • [24] Y. Chen, W. Xu, X. Wu, M. Zhang, and B. Luo, “Personalized local differentially private federated learning with adaptive client sampling,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 6600–6604.
  • [25] L. Zhu and S. Han, “Deep leakage from gradients,” in Federated learning. Springer, 2020, pp. 17–31.
  • [26] B. Zhao, K. R. Mopuri, and H. Bilen, “iDLG: Improved deep leakage from gradients,” arXiv preprint arXiv:2001.02610, 2020.
  • [27] Y. Huang, S. Gupta, Z. Song, K. Li, and S. Arora, “Evaluating gradient inversion attacks and defenses in federated learning,” Advances in Neural Information Processing Systems, vol. 34, 2021.
  • [28] H. Yin, A. Mallya, A. Vahdat, J. M. Alvarez, J. Kautz, and P. Molchanov, “See through gradients: Image batch recovery via gradinversion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16 337–16 346.
  • [29] M. Kim, O. Günlü, and R. F. Schaefer, “Federated learning with local differential privacy: Trade-offs between privacy, utility, and communication,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 2650–2654.
  • [30] K. Wei et al., “Federated learning with differential privacy: Algorithms and performance analysis,” IEEE Trans. Inf. Forensics Security, vol. 15, pp. 3454–3469, 2020.
  • [31] L. Lyu, “DP-SIGNSGD: When efficiency meets privacy and robustness,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 3070–3074.
  • [32] A. Lowy and M. Razaviyayn, “Private federated learning without a trusted server: Optimal algorithms for convex losses,” in International Conference on Learning Representations, 2023.
  • [33] N. Lang, E. Sofer, T. Shaked, and N. Shlezinger, “Joint privacy enhancement and quantization in federated learning,” IEEE Trans. Signal Process., vol. 71, pp. 295–310, 2023.
  • [34] N. Lang, N. Shlezinger, R. G. D’Oliveira, and S. E. Rouayheb, “Compressed private aggregation for scalable and robust federated learning over massive networks,” arXiv preprint arXiv:2308.00540, 2023.
  • [35] C. Dwork, G. N. Rothblum, and S. Vadhan, “Boosting and differential privacy,” in IEEE Annual Symposium on Foundations of Computer Science, 2010, pp. 51–60.
  • [36] J. Zhang, D. Fay, and M. Johansson, “Dynamic privacy allocation for locally differentially private federated learning with composite objectives,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 9461–9465.
  • [37] L. Sun, J. Qian, X. Chen, and P. S. Yu, “LDP-FL: Practical private aggregation in federated learning with local differential privacy,” in International Joint Conference on Artificial Intelligence, 2021.
  • [38] A. Cheu, A. Smith, J. Ullman, D. Zeber, and M. Zhilyaev, “Distributed differential privacy via shuffling,” in Advances in Cryptology–EUROCRYPT 2019: 38th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Darmstadt, Germany, May 19–23, 2019, Proceedings, Part I 38. Springer, 2019, pp. 375–403.
  • [39] B. Balle, J. Bell, A. Gascón, and K. Nissim, “The privacy blanket of the shuffle model,” in Advances in Cryptology–CRYPTO 2019: 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18–22, 2019, Proceedings, Part II 39. Springer, 2019, pp. 638–667.
  • [40] Q. Zhao, Multi-armed bandits: Theory and applications to online learning in networks. Springer Nature, 2022.
  • [41] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Machine learning, vol. 47, pp. 235–256, 2002.
  • [42] W. Chen, Y. Wang, and Y. Yuan, “Combinatorial multi-armed bandit: General framework and applications,” in International conference on machine learning. PMLR, 2013, pp. 151–159.
  • [43] B. Hajek, “Cooling schedules for optimal annealing,” Mathematics of operations research, vol. 13, no. 2, pp. 311–329, 1988.
  • [44] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of FedAvg on non-iid data,” in International Conference on Learning Representations, 2019.
  • [45] N. Lang, A. Cohen, and N. Shlezinger, “Stragglers-aware low-latency synchronous federated learning via layer-wise model updates,” IEEE Trans. on Commun., 2024, early access.
  • [46] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, “What can we learn privately?” SIAM Journal on Computing, vol. 40, no. 3, pp. 793–826, 2011.
  • [47] Y. Wang, Y. Tong, and D. Shi, “Federated latent dirichlet allocation: A local differential privacy based framework,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 6283–6290.
  • [48] T. Wang, X. Zhang, J. Feng, and X. Yang, “A comprehensive survey on local differential privacy toward data statistics and analysis,” Sensors, vol. 20, no. 24, p. 7030, 2020.
  • [49] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” Journal of Privacy and Confidentiality, vol. 7, no. 3, pp. 17–51, 2016.
  • [50] D. Henderson, S. H. Jacobson, and A. W. Johnson, “The theory and practice of simulated annealing,” Handbook of metaheuristics, pp. 287–319, 2003.
  • [51] S. Ledesma, G. Aviña, and R. Sanchez, “Practical considerations for simulated annealing implementation,” Simulated annealing, vol. 20, pp. 401–420, 2008.
  • [52] W. Ben-Ameur, “Computing the initial temperature of simulated annealing,” Computational optimization and applications, vol. 29, pp. 369–385, 2004.
  • [53] I. Bezáková, D. Štefankovič, V. V. Vazirani, and E. Vigoda, “Accelerating simulated annealing for the permanent and combinatorial counting problems,” SIAM Journal on Computing, vol. 37, no. 5, pp. 1429–1454, 2008.
  • [54] H. Wu, X. Guo, and X. Liu, “Adaptive exploration-exploitation tradeoff for opportunistic bandits,” in International Conference on Machine Learning. PMLR, 2018, pp. 5306–5314.
  • [55] M. M. Drugan, A. Nowé, and B. Manderick, “Pareto upper confidence bounds algorithms: an empirical study,” in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2014.
  • [56] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” The collected works of Wassily Hoeffding, pp. 409–426, 1994.