
Pure-DP Aggregation in the Shuffle Model:
Error-Optimal and Communication-Efficient

Badih Ghazi Google Research, Mountain View. Email: badihghazi@gmail.com.    Ravi Kumar Google Research, Mountain View. Email: ravi.k53@gmail.com.    Pasin Manurangsi Google Research, Thailand. Email: pasin@google.com.
Abstract

We obtain a new protocol for binary counting in the $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$ model with error $O(1/\epsilon)$ and expected communication $\widetilde{O}\left(\frac{\log n}{\epsilon}\right)$ messages per user. Previous protocols incur either an error of $O(1/\epsilon^{1.5})$ with $O_{\epsilon}(\log n)$ messages per user (Ghazi et al., ITC 2020) or an error of $O(1/\epsilon)$ with $O_{\epsilon}(n^{2.5})$ messages per user (Cheu and Yan, TPDP 2022). Using the new protocol, we obtain improved $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$ protocols for real summation and histograms.

1 Introduction

Differential privacy (DP) [DMNS06] is a widely accepted notion used for bounding and quantifying an algorithm's leakage of personal information. Its most basic form, known as pure-DP, is governed by a single parameter $\epsilon>0$, which bounds the leakage of the algorithm. Specifically, a randomized algorithm $A(\cdot)$ is said to be $\epsilon$-DP if for any subset $S$ of output values, and for any two datasets $D$ and $D^{\prime}$ differing on a single user's data, it holds that $\Pr[A(D)\in S]\leq e^{\epsilon}\cdot\Pr[A(D^{\prime})\in S]$. In settings where pure-DP is not (known to be) possible, a common relaxation is the so-called approximate $(\epsilon,\delta)$-DP [DKM+06], which has an additional parameter $\delta\in[0,1]$. In this case, the condition becomes $\Pr[A(D)\in S]\leq e^{\epsilon}\cdot\Pr[A(D^{\prime})\in S]+\delta$.

Depending on the trust assumptions, three models of DP are commonly studied. The first is the central model, where a trusted curator is assumed to hold the raw data and is required to release a private output. (This goes back to the first work [DMNS06] on DP.) The second is the local model [EGS03, DMNS06, KLN+08], where each user's message is required to be private. The third is the shuffle model [BEM+17, CSU+19, EFM+19], where the users' messages are routed through a trusted shuffler, which is assumed to be non-colluding with the curator, and which is expected to randomly permute the messages incoming from the different users ($\mathrm{DP}_{\mathrm{shuffle}}$). More formally, a protocol $P=(R,S,A)$ in the shuffle model consists of three procedures: (i) a local randomizer $R(\cdot)$ that takes as input the data of a single user and outputs one or more messages, (ii) a shuffler $S(\cdot)$ that randomly permutes the messages from all the local randomizers, and (iii) an analyst $A(\cdot)$ that consumes the permuted output of the shuffler; the output of the protocol $P$ is the output of the analyst $A(\cdot)$. Privacy in the shuffle model is defined as follows:

Definition 1 ([CSU+19, EFM+19]).

A protocol $P=(R,S,A)$ is said to be $(\epsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}$ if for any input dataset $D=(x_{1},\dots,x_{n})$, where $n$ is the number of users, it holds that $S(R(x_{1}),\dots,R(x_{n}))$ is $(\epsilon,\delta)$-DP. In the particular case where $\delta=0$, the protocol $P$ is said to be $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$.
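To make the interaction between the three procedures concrete, here is a minimal Python sketch of a shuffle-model protocol pipeline; this is our illustration only, and the randomizer and analyzer passed in the example are placeholders with no privacy guarantee.

```python
import random
from typing import Callable, List

def run_shuffle_protocol(
    local_randomizer: Callable[[int], List[int]],  # R: one user's input -> messages
    analyzer: Callable[[List[int]], float],        # A: shuffled messages -> output
    inputs: List[int],
) -> float:
    """Minimal sketch of a shuffle-model protocol P = (R, S, A)."""
    messages: List[int] = []
    for x in inputs:
        messages.extend(local_randomizer(x))  # each user runs R locally
    random.shuffle(messages)                   # the trusted shuffler S
    return analyzer(messages)                  # the analyst A sees only the shuffled multiset

# Example (placeholders, no privacy): each user forwards their bit, the analyst sums.
print(run_shuffle_protocol(lambda x: [x], sum, [0, 1, 1, 0, 1]))
```

The privacy requirement of Definition 1 is imposed on the shuffled multiset of messages, i.e., on everything the analyst (and any observer of the shuffler's output) gets to see.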

For several analytics tasks, low-error algorithms are known in the central model, whereas they are known to be impossible in the local model. For such tasks, low-error algorithms are commonly sought in the shuffle model, since it is preferable to trust a shuffler rather than a central curator.

1.1 Our Contributions

In the binary summation (aka counting) problem, each user $i$ receives an input $x_{i}\in\{0,1\}$ and the goal is to estimate $\sum_{i\in[n]}x_{i}$. For this well-studied task, the discrete Laplace mechanism is known to achieve the optimal (expected absolute) error of $O(1/\epsilon)$ for $\epsilon$-DP summation in the central model [GRS09, GV16]. Note that this error is independent of the number $n$ of users, and is an absolute constant in the common parameter regime where $\epsilon=O(1)$. In contrast, the error of any aggregation protocol in the local model is known to be at least on the order of $\sqrt{n}$ [BNO08, CSS12]. Many works have studied aggregation in the $\mathrm{DP}_{\mathrm{shuffle}}$ setting, including [BBGN19, BBGN20, GMPV20, GGK+20, GKMP20, GKM+21b, GKM21a, BC20]. For pure-DP aggregation, it is known that any single-message protocol (where each user sends a single message to the shuffler) must incur error $\Omega_{\epsilon}(\sqrt{n})$ [BC20]. For multi-message protocols, where each user can send multiple messages to the shuffler, the best known protocols incur either an error of $O(1/\epsilon^{1.5})$ with $O(\log n)$ messages per user [GGK+20] or an error of $O(1/\epsilon)$ with $O(n^{2.5})$ messages per user [CY22]. No protocol simultaneously achieved error $O(1/\epsilon)$ and communication $O(\log n)$.

We obtain an $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$ algorithm for binary summation where each user, in expectation, sends $O\left(\frac{\log n}{\epsilon}\right)$ one-bit messages; this answers the main open question for this basic aggregation task.

Theorem 2.

For every positive real number $\epsilon\leq O(1)$, there is a (non-interactive) $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$ protocol for binary summation with RMSE $O(1/\epsilon)$, where each user sends $O\left(\frac{\log n}{\epsilon}\right)$ messages in expectation and each message consists of a single bit.

In fact, similar to the protocol of [CY22], our protocol can achieve an error that is arbitrarily close to that of the discrete Laplace mechanism, which is known to be optimal in the central model. We defer the formal statement to Theorem 6.

Before we continue, we note that while the expected number of messages in Theorem 2 is small (and with an exponential tail), the worst-case number of messages is unbounded. This should be contrasted with the $\Omega_{\epsilon}(\sqrt{\log n})$ lower bound in [GGK+20], which applies only to the worst-case number of bits sent by a user. We discuss this further in Section 5.

Protocols for Real Summation and Histogram.

Using known techniques (e.g., [CSU+19, GGK+20]), we immediately get the following consequences for real summation and histogram.

In the real summation problem, each $x_{i}$ is a real value in $[0,1]$; the goal is again to estimate the sum $\sum_{i\in[n]}x_{i}$. The protocol in [GGK+20] achieves an expected RMSE of $\widetilde{O}(1/\epsilon^{1.5})$; there, each user sends $O_{\epsilon}(\log^{3}n)$ messages, each of length $O(\log\log n)$ bits. By instead running our binary summation protocol bit-by-bit with an appropriate privacy budget split (see the sketch following Corollary 3), we get an algorithm with an improved, and asymptotically optimal, error of $O(1/\epsilon)$, while keeping the expected communication similar to theirs.

Corollary 3.

For every positive real number $\epsilon\leq O(1)$, there is a (non-interactive) $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$ protocol for real summation with RMSE $O(1/\epsilon)$, where each user sends $O\left(\frac{\log^{3}n}{\epsilon}\right)$ messages in expectation and each message consists of $O(\log\log n)$ bits.
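The following Python sketch illustrates the standard bit-by-bit reduction behind such a statement; it is ours, the function `count_protocol` is an assumed black box for an $\epsilon$-DP binary counting protocol, and the uniform privacy budget split shown is only illustrative (not the exact split used in the proof).

```python
import math
from typing import Callable, List

def real_summation_bit_by_bit(
    xs: List[float],                                       # inputs in [0, 1]
    eps: float,
    count_protocol: Callable[[List[int], float], float],   # eps-DP binary counting (black box)
) -> float:
    """Bit-by-bit reduction from real summation to binary counting (sketch).

    Each x_i is rounded to t = ceil(log2 n) bits; bit position j is aggregated by a
    separate run of the counting protocol, and the results are recombined as a
    weighted sum.  Rounding alone contributes additive error at most n * 2^{-t} <= 1.
    """
    n = len(xs)
    t = max(1, math.ceil(math.log2(max(n, 2))))
    ks = [min(int(math.floor(x * (1 << t))), (1 << t) - 1) for x in xs]  # t-bit roundings
    estimate = 0.0
    for j in range(1, t + 1):                       # j-th most significant bit
        bits = [(k >> (t - j)) & 1 for k in ks]
        estimate += 2.0 ** (-j) * count_protocol(bits, eps / t)
    return estimate
```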

A widely used primitive related, though not identical, to aggregation is histogram computation. In the histogram problem, each $x_{i}$ is a number in $[B]$; the goal is to estimate the histogram of the dataset, where the histogram $\mathbf{h}\in\mathbb{Z}_{\geq 0}^{B}$ is defined by $h_{b}=|\{i\in[n]\mid x_{i}=b\}|$. The error of an estimated histogram $\tilde{\mathbf{h}}$ is usually measured in the $\ell_{\infty}$ sense, i.e., $\|\tilde{\mathbf{h}}-\mathbf{h}\|_{\infty}=\max_{b\in[B]}|h_{b}-\tilde{h}_{b}|$.

For this task, which has been studied in several papers including [GGK+21, BC20], the best known pure-$\mathrm{DP}_{\mathrm{shuffle}}$ protocol achieves $\ell_{\infty}$-error $O\left(\frac{\log B\log n}{\epsilon^{1.5}}\right)$ and communication $O\left(\frac{B\log n\log B}{\epsilon}\right)$ bits. By running our $(\epsilon/2)$-$\mathrm{DP}_{\mathrm{shuffle}}$ binary summation protocol separately for each bucket [GGK+20, Appendix A] (a sketch of this reduction follows Corollary 4), we immediately arrive at the following:

Corollary 4.

For every positive real number $\epsilon\leq O(1)$, there is a (non-interactive) $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$ protocol that computes histograms on domains of size $B$ with an expected $\ell_{\infty}$-error of at most $O\left(\frac{\log B\log n}{\epsilon}\right)$, where each user sends $O\left(\frac{B\log n}{\epsilon}\right)$ messages in expectation and each message consists of $O(\log B)$ bits.
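The bucket-by-bucket reduction can be sketched as follows in Python; again `count_protocol` is an assumed black box for an $\epsilon$-DP binary counting protocol, and the sketch is ours.

```python
from typing import Callable, List

def histogram_via_counting(
    xs: List[int],                                         # inputs in {1, ..., B}
    B: int,
    eps: float,
    count_protocol: Callable[[List[int], float], float],   # eps-DP binary counting (black box)
) -> List[float]:
    """Bucket-by-bucket reduction from histograms to binary counting (sketch).

    Bucket b is estimated by running the counting protocol on the indicator bits
    1[x_i = b] with budget eps/2; changing one user's value affects at most two
    buckets, so the composed protocol is eps-DP.
    """
    return [count_protocol([int(x == b) for x in xs], eps / 2) for b in range(1, B + 1)]
```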

1.2 Technical Overview

We will now briefly discuss the proof of Theorem 2. Surprisingly, we show that a simple modification of the algorithm from [GKMP20] satisfies pure-DP! To understand the modification and its necessity, it is first important to understand the algorithm in [GKMP20]. In their protocol, the messages are either $+1$ or $-1$, and the analyzer's output is simply the sum of all messages. Each user sends three types of messages:

  • Input-Dependent Messages: If the input $x_{i}$ is 1, the user sends a $+1$ message. Otherwise, the user does not send anything.

  • Flooding Messages: These are messages that do not affect the final estimation error. In particular, a random variable $z^{\pm 1}_{i}$ is drawn from an appropriate distribution and the user sends $z^{\pm 1}_{i}$ additional copies of $-1$ and $z^{\pm 1}_{i}$ additional copies of $+1$. These messages cancel out when the analyzer computes its output.

  • Noise Messages: These are the messages that affect the error in the end. Specifically, $z^{+1}_{i},z^{-1}_{i}$ are drawn i.i.d. from an appropriate distribution, and $z^{-1}_{i}$ additional copies of $-1$ and $z^{+1}_{i}$ additional copies of $+1$ are then sent.

We note here that the view of the analyzer is simply the number of $+1$ messages and the number of $-1$ messages, which we will denote by $V_{+1}$ and $V_{-1}$ respectively.
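The following Python sketch (ours) shows one user's randomizer with these three message types; the distributions are placeholders for illustration, since [GKMP20] uses specific (divided) infinitely divisible distributions.

```python
import numpy as np

def gkmp_style_randomizer(x_i: int, lam_flood: float, noise_dist) -> list:
    """One user's randomizer with the three message types described above (sketch).

    noise_dist is a callable returning a nonnegative integer,
    e.g., noise_dist = lambda: np.random.geometric(0.5) - 1.
    """
    msgs = []
    # Input-dependent message: a single +1 if x_i = 1, nothing otherwise.
    if x_i == 1:
        msgs.append(+1)
    # Flooding messages: equal numbers of +1 and -1; they cancel in the analyzer's sum.
    z_pm = np.random.poisson(lam_flood)
    msgs += [+1] * z_pm + [-1] * z_pm
    # Noise messages: independent counts of +1 and -1; these shape the final error.
    z_plus, z_minus = noise_dist(), noise_dist()
    msgs += [+1] * z_plus + [-1] * z_minus
    return msgs
```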

While [GKMP20] shows that this protocol is $(\epsilon,\delta)$-DP, it is easy to show that it is not $\epsilon$-DP for any finite $\epsilon$. Indeed, consider two neighboring datasets where $X$ consists of all zeros and $X^{\prime}$ consists of a single one and $n-1$ zeros. There is a non-zero probability that $V_{+1}(X)=0$, while $V_{+1}(X^{\prime})$ is always non-zero (because of the input-dependent message from the user holding the single one).

To fix this, we randomize this "input-dependent" part. With probability $q$, the user sends nothing. With the remaining probability $1-q$, (instead of sending a single $+1$ for $x_{i}=1$ as in [GKMP20],) the user sends $s+1$ copies of $+1$ and $s$ copies of $-1$; similarly, the user sends $s$ copies of $+1$ and $s$ copies of $-1$ in the case $x_{i}=0$. By setting $q$ to be sufficiently small (e.g., $q=O(1/(\epsilon n))$), it can be shown that the error remains roughly the same as before. Furthermore, when $s$ is sufficiently large (i.e., $O_{\epsilon}(\log n)$), we manage to show that this algorithm satisfies $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$. While the exact reason for this pure-DP guarantee is rather technical, the general idea is similar to [GGK+20]: by making the probabilities at the "border" of the support equal in the two cases, we avoid the issue presented above. Furthermore, by making $s$ sufficiently large, the input-dependent part lies "sufficiently inside" the support, so it usually does not completely dominate the contribution from the outer part.

Finally, note that $V_{+1},V_{-1}$ involve sums of many i.i.d. random variables, namely $\sum_{i\in[n]}z^{\pm 1}_{i}$, $\sum_{i\in[n]}z^{+1}_{i}$, and $\sum_{i\in[n]}z^{-1}_{i}$. As observed in [GKMP20], it is convenient to use infinitely divisible distributions so that these sums have distributions that are independent of $n$, allowing for simpler calculations. We inherit this feature from their analysis.

2 Preliminaries

For a discrete distribution $\mathcal{D}$, let $f_{\mathcal{D}}$ denote its probability mass function (PMF). The max-divergence between distributions $\mathcal{D}_{1},\mathcal{D}_{2}$ is defined as $d_{\infty}(\mathcal{D}_{1}\|\mathcal{D}_{2}):=\max_{x\in\operatorname{supp}(\mathcal{D}_{1})}\ln(f_{\mathcal{D}_{1}}(x)/f_{\mathcal{D}_{2}}(x))$.

For two distributions $\mathcal{D}_{1},\mathcal{D}_{2}$ over $\mathbb{Z}^{d}$, we write $\mathcal{D}_{1}*\mathcal{D}_{2}$ to denote their convolution, i.e., the distribution of $z_{1}+z_{2}$ where $z_{1}\sim\mathcal{D}_{1},z_{2}\sim\mathcal{D}_{2}$ are independent. Moreover, let $(\mathcal{D})^{*n}$ denote the $n$-fold convolution of $\mathcal{D}$, i.e., the distribution of $z_{1}+\cdots+z_{n}$ where $z_{1},\dots,z_{n}\sim\mathcal{D}$ are independent. We write $\mathcal{D}_{1}\otimes\mathcal{D}_{2}$ to denote the product distribution of $\mathcal{D}_{1},\mathcal{D}_{2}$. Furthermore, we may write a value to denote the distribution all of whose probability mass is at that value (e.g., $0$ stands for the probability distribution that is always equal to zero).

A distribution $\mathcal{D}$ is infinitely divisible iff, for every positive integer $n$, there exists a distribution $\mathcal{D}_{/n}$ such that $(\mathcal{D}_{/n})^{*n}=\mathcal{D}$. Two distributions we will use here (both supported on $\mathbb{Z}_{\geq 0}$) are:

  • Poisson Distribution $\operatorname{Poi}(\lambda)$: This is the distribution whose PMF is $f_{\operatorname{Poi}(\lambda)}(k)=\lambda^{k}e^{-\lambda}/k!$. It satisfies $\operatorname{Poi}(\lambda)_{/n}=\operatorname{Poi}(\lambda/n)$.

  • Negative Binomial Distribution $\operatorname{NB}(r,p)$: This is the distribution whose PMF is $f_{\operatorname{NB}(r,p)}(k)=\binom{k+r-1}{k}p^{r}(1-p)^{k}$. It satisfies $\operatorname{NB}(r,p)_{/n}=\operatorname{NB}(r/n,p)$.

    • Geometric Distribution $\operatorname{Geo}(p)$: A special case of the $\operatorname{NB}$ distribution is the geometric distribution $\operatorname{Geo}(p)=\operatorname{NB}(1,p)$, i.e., one with $f_{\operatorname{Geo}(p)}(k)=p(1-p)^{k}$.

Finally, we recall that the discrete Laplace distribution $\operatorname{DLap}(a)$ is the distribution supported on $\mathbb{Z}$ with PMF $f_{\operatorname{DLap}(a)}(x)\propto\exp(-a|x|)$. It is well known that $\operatorname{DLap}(a)$ is the distribution of $z_{1}-z_{2}$ where $z_{1},z_{2}\sim\operatorname{Geo}(1-\exp(-a))$ are independent. Furthermore, the variance of the discrete Laplace distribution is $\operatorname{Var}(\operatorname{DLap}(a))=\frac{2e^{-a}}{(1-e^{-a})^{2}}$.
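This representation of $\operatorname{DLap}(a)$ as a difference of two geometrics can be checked numerically; the small NumPy snippet below (ours) samples it this way and compares the empirical variance against the closed form above. Note that NumPy's geometric distribution counts trials (support $\{1,2,\dots\}$), so we shift it by one.

```python
import numpy as np

def sample_dlap(a: float, size: int, seed: int = 0) -> np.ndarray:
    """Sample DLap(a) as the difference of two independent Geo(1 - e^{-a}) variables."""
    rng = np.random.default_rng(seed)
    p = 1.0 - np.exp(-a)
    z1 = rng.geometric(p, size) - 1   # shift to the {0, 1, ...}-supported Geo(p) used here
    z2 = rng.geometric(p, size) - 1
    return z1 - z2

a = 0.5
samples = sample_dlap(a, 200_000)
print(samples.var())                           # empirical variance
print(2 * np.exp(-a) / (1 - np.exp(-a)) ** 2)  # Var(DLap(a)) = 2e^{-a} / (1 - e^{-a})^2
```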

We will also use the following well-known lemma. It can be viewed as a special case of the post-processing property of DP, where the post-processing function adds a random variable drawn from $\mathcal{D}_{3}$. Another way to see that it holds is to simply observe that, for any $y\in\operatorname{supp}(\mathcal{D}_{1}*\mathcal{D}_{3})$, we have $f_{\mathcal{D}_{1}*\mathcal{D}_{3}}(y)=\sum_{z\in\operatorname{supp}(\mathcal{D}_{3})}f_{\mathcal{D}_{3}}(z)\cdot f_{\mathcal{D}_{1}}(y-z)\leq\sum_{z\in\operatorname{supp}(\mathcal{D}_{3})}f_{\mathcal{D}_{3}}(z)\cdot\left(e^{d_{\infty}(\mathcal{D}_{1}\|\mathcal{D}_{2})}\cdot f_{\mathcal{D}_{2}}(y-z)\right)=e^{d_{\infty}(\mathcal{D}_{1}\|\mathcal{D}_{2})}\cdot f_{\mathcal{D}_{2}*\mathcal{D}_{3}}(y)$.

Lemma 5.

For any distributions $\mathcal{D}_{1},\mathcal{D}_{2},\mathcal{D}_{3}$ over $\mathbb{Z}^{d}$, we have $d_{\infty}(\mathcal{D}_{1}*\mathcal{D}_{3}\|\mathcal{D}_{2}*\mathcal{D}_{3})\leq d_{\infty}(\mathcal{D}_{1}\|\mathcal{D}_{2})$.

3 Counting Protocol

In this section, we will describe a pure-$\mathrm{DP}_{\mathrm{shuffle}}$ algorithm for counting, which is our main result.

Theorem 6.

For any positive real numbers $\epsilon\leq O(1)$ and $\rho\in(0,1/2]$, there is a (non-interactive) $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$ protocol for binary summation that has MSE at most $(1+\rho)\cdot\operatorname{Var}(\operatorname{DLap}(\epsilon))$, where each user sends $O\left(\frac{\log(n/\rho)}{\epsilon\rho}\right)$ messages in expectation and each message consists of a single bit.

By setting $\rho$ arbitrarily close to zero, we can make the mean squared error (MSE) arbitrarily close to that of the discrete Laplace mechanism, which is known to be (asymptotically) optimal in the central model [GRS09, GV16]. We can obtain this guarantee for other types of error, e.g., the $\ell_{1}$-error (aka expected absolute error), as well, but for ease of presentation we focus only on the MSE.

Note that Theorem 6 implies Theorem 2 by simply setting $\rho$ to be a positive constant (say, $0.5$).

3.1 Algorithm

In this section we present and analyze our main algorithm for counting (aka binary summation). To begin, we will set our parameters as follows.

Condition 7.

Let $\lambda,\epsilon^{\prime},\epsilon,q\in\mathbb{R}_{>0}$ and $s\in\mathbb{Z}_{>0}$. Suppose that the following conditions hold:

  • $\epsilon^{\prime}<\epsilon$,

  • $s\geq 2\ln\left(\frac{1}{(e^{\epsilon}-1)q}\right)/(\epsilon-\epsilon^{\prime})$,

  • $\lambda\geq\frac{e^{\epsilon-\epsilon^{\prime}}}{1-e^{(\epsilon^{\prime}-\epsilon)/2}}\cdot s$.

We now define the following distributions:

  • $\mathcal{D}^{\mathrm{noise}}=\operatorname{Geo}(1-e^{-\epsilon^{\prime}})$.

  • $\mathcal{D}^{\mathrm{flood}}=\operatorname{Poi}(\lambda)$.

  • For $x\in\{0,1\}$, the distribution $\mathcal{D}^{\mathrm{input},x}$, supported on $\mathbb{Z}_{\geq 0}^{2}$, is defined by
$$\mathcal{D}^{\mathrm{input},x}((s+x,s))=1-q,\qquad\mathcal{D}^{\mathrm{input},x}((0,0))=q.$$

Algorithm 1 contains the formal description of the randomizer and Algorithm 2 contains the description of the analyzer. As mentioned earlier, our algorithm is the same as that of [GKMP20], except in the first step (Line 2). In [GKMP20], this step always sends a single $+1$ if $x_{i}=1$ and nothing otherwise. Instead, we randomize this step by sending nothing with a certain probability; with the remaining probability, instead of sending a single $+1$ for $x_{i}=1$, we send $s+1$ copies of $+1$ and $s$ copies of $-1$ (similarly, we send $s$ copies of $+1$ and $s$ copies of $-1$ in the case $x_{i}=0$).

Algorithm 1 Counting Randomizer
1:  procedure CorrNoiseRandomizer$_{n}$($x_{i}$)
2:     Sample $(y^{+1}_{i},y^{-1}_{i})\sim\mathcal{D}^{\mathrm{input},x_{i}}$
3:     Sample $z^{+1}_{i},z^{-1}_{i}\sim\mathcal{D}^{\mathrm{noise}}_{/n}$
4:     Sample $z^{\pm 1}_{i}\sim\mathcal{D}^{\mathrm{flood}}_{/n}$
5:     Send $y^{+1}_{i}+z^{+1}_{i}+z^{\pm 1}_{i}$ copies of $+1$, and $y^{-1}_{i}+z^{-1}_{i}+z^{\pm 1}_{i}$ copies of $-1$
Algorithm 2 Counting Analyzer
1:  procedure CorrNoiseAnalyzer
2:     $R\leftarrow$ multiset of messages received
3:     return $\sum_{y\in R}y$
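For concreteness, the two algorithms can be transcribed into Python as follows; this sketch (ours) uses NumPy's negative binomial, whose $(r,p)$ parameterization matches the $\operatorname{NB}(r,p)$ PMF given in the preliminaries, together with the divisibility facts $\operatorname{Geo}(p)_{/n}=\operatorname{NB}(1/n,p)$ and $\operatorname{Poi}(\lambda)_{/n}=\operatorname{Poi}(\lambda/n)$.

```python
import numpy as np

def corr_noise_randomizer(x_i: int, n: int, s: int, q: float,
                          eps_prime: float, lam: float,
                          rng: np.random.Generator) -> list:
    """Python sketch of Algorithm 1 (the messages one user sends)."""
    # Input-dependent part: (y+, y-) ~ D^{input, x_i}.
    if rng.random() < q:
        y_plus, y_minus = 0, 0
    else:
        y_plus, y_minus = s + x_i, s
    # Noise part: z+, z- ~ D^{noise}_{/n} = NB(1/n, 1 - e^{-eps'}).
    p_noise = 1.0 - np.exp(-eps_prime)
    z_plus = int(rng.negative_binomial(1.0 / n, p_noise))
    z_minus = int(rng.negative_binomial(1.0 / n, p_noise))
    # Flooding part: z^{+-} ~ D^{flood}_{/n} = Poi(lam / n).
    z_pm = int(rng.poisson(lam / n))
    return [+1] * (y_plus + z_plus + z_pm) + [-1] * (y_minus + z_minus + z_pm)

def corr_noise_analyzer(messages: list) -> int:
    """Python sketch of Algorithm 2: the estimate is the sum of the shuffled messages."""
    return sum(messages)
```

Concatenating the randomizer outputs over all users, shuffling them, and applying the analyzer yields the estimator analyzed in the next section.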

4 Analysis of the Protocol

In this section we analyze the privacy, utility, and communication guarantees of our counting protocol. Throughout the remainder of this section, we assume that the distributions and parameters are set as in Condition 7; for brevity, we will not restate this assumption in our privacy analysis.

4.1 Privacy Analysis

Lemma 8 (Main Privacy Guarantee).

CorrNoiseRandomizer satisfies $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$.

To prove the above, we need the following technical lemmas regarding $\mathcal{D}^{\mathrm{noise}}$ and $\mathcal{D}^{\mathrm{flood}}$.

Lemma 9.

For every $i\in\mathbb{Z}$, $f_{\mathcal{D}^{\mathrm{noise}}}(i-1)\leq e^{\epsilon^{\prime}}f_{\mathcal{D}^{\mathrm{noise}}}(i)$.

Proof.

This immediately follows from the PMF definition of $\mathcal{D}^{\mathrm{noise}}=\operatorname{Geo}(1-e^{-\epsilon^{\prime}})$. ∎

Lemma 10.

For every $i\in\mathbb{Z}$, $(e^{\epsilon}-1)q\cdot f_{\mathcal{D}^{\mathrm{flood}}}(i+s)+e^{\epsilon-\epsilon^{\prime}}f_{\mathcal{D}^{\mathrm{flood}}}(i-1)\geq f_{\mathcal{D}^{\mathrm{flood}}}(i)$.

Proof.

If $e^{\epsilon-\epsilon^{\prime}}f_{\mathcal{D}^{\mathrm{flood}}}(i-1)\geq f_{\mathcal{D}^{\mathrm{flood}}}(i)$, then the statement is clearly true. Otherwise, we have $f_{\mathcal{D}^{\mathrm{flood}}}(i)>0$ (i.e., $i\geq 0$) and $e^{\epsilon^{\prime}-\epsilon}>\frac{f_{\mathcal{D}^{\mathrm{flood}}}(i-1)}{f_{\mathcal{D}^{\mathrm{flood}}}(i)}=\frac{i}{\lambda}$, which implies

$$0\leq i\leq e^{\epsilon^{\prime}-\epsilon}\lambda. \qquad (1)$$

We can then bound $\frac{f_{\mathcal{D}^{\mathrm{flood}}}(i+s)}{f_{\mathcal{D}^{\mathrm{flood}}}(i)}$ as

$$\frac{f_{\mathcal{D}^{\mathrm{flood}}}(i+s)}{f_{\mathcal{D}^{\mathrm{flood}}}(i)}=\frac{\lambda^{s}}{(i+1)\cdots(i+s)}\geq\frac{\lambda^{s}}{(i+s)^{s}}\overset{(1)}{\geq}\left(\frac{\lambda}{e^{\epsilon^{\prime}-\epsilon}\lambda+s}\right)^{s}\geq\left(\frac{\lambda}{e^{(\epsilon^{\prime}-\epsilon)/2}\lambda}\right)^{s}\geq\frac{1}{(e^{\epsilon}-1)q},$$

where the last two inequalities follow from our assumptions on $\lambda$ and $s$ respectively (Condition 7). Thus, in this case, we have $(e^{\epsilon}-1)q\cdot f_{\mathcal{D}^{\mathrm{flood}}}(i+s)\geq f_{\mathcal{D}^{\mathrm{flood}}}(i)$, which implies the desired inequality. ∎
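As a quick numerical sanity check of Lemmas 9 and 10, the following snippet (ours, using SciPy) verifies both inequalities over a range of $i$ for one arbitrary choice of parameters satisfying Condition 7; the specific values of $\epsilon,\epsilon^{\prime},q$ below are illustrative only.

```python
import math
from scipy.stats import geom, poisson

# Illustrative parameters satisfying Condition 7 (our choice, for this check only).
eps, eps_p, q = 1.0, 0.5, 0.05
s = math.ceil(2 * math.log(1 / ((math.exp(eps) - 1) * q)) / (eps - eps_p))
lam = math.exp(eps - eps_p) / (1 - math.exp((eps_p - eps) / 2)) * s

# PMFs of D^noise = Geo(1 - e^{-eps'}) (support {0,1,...}) and D^flood = Poi(lam).
f_noise = lambda i: geom.pmf(i + 1, 1 - math.exp(-eps_p))
f_flood = lambda i: poisson.pmf(i, lam)

for i in range(300):
    # Lemma 9: f_noise(i-1) <= e^{eps'} * f_noise(i).
    assert f_noise(i - 1) <= math.exp(eps_p) * f_noise(i) * (1 + 1e-9)
    # Lemma 10: (e^eps - 1)*q*f_flood(i+s) + e^{eps-eps'}*f_flood(i-1) >= f_flood(i).
    lhs = (math.exp(eps) - 1) * q * f_flood(i + s) + math.exp(eps - eps_p) * f_flood(i - 1)
    assert lhs >= f_flood(i) * (1 - 1e-9)
```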

We are now ready to prove the privacy guarantee (Lemma 8).

Proof of Lemma 8.

Fix any input dataset $X$, and let $V(X)=(V_{+1},V_{-1})$ denote the distribution of the view of the shuffler, where $V_{+1}$ and $V_{-1}$ denote the number of $+1$ messages and the number of $-1$ messages, respectively.

Consider two neighboring datasets $X=(x_{1},\dots,x_{n})$ and $X^{\prime}=(x^{\prime}_{1},\dots,x^{\prime}_{n})$. Assume w.l.o.g. that they differ in the first coordinate, with $x_{1}=1$, $x^{\prime}_{1}=0$, and $x^{\prime}_{2}=x_{2},\dots,x^{\prime}_{n}=x_{n}$. To prove that CorrNoiseRandomizer satisfies $\epsilon$-$\mathrm{DP}_{\mathrm{shuffle}}$, we need to prove that $d_{\infty}(V(X)\|V(X^{\prime}))\leq\epsilon$ and $d_{\infty}(V(X^{\prime})\|V(X))\leq\epsilon$.

Let $\mathcal{F}$ denote the distribution on $\mathbb{Z}^{2}$ of $(w,w)$ where $w\sim\mathcal{D}^{\mathrm{flood}}$. Observe that

$$V(X)=\mathcal{D}^{\mathrm{input},1}*\mathcal{D}^{\mathrm{input},x_{2}}*\cdots*\mathcal{D}^{\mathrm{input},x_{n}}*\mathcal{F}*(\mathcal{D}^{\mathrm{noise}}\otimes 0)*(0\otimes\mathcal{D}^{\mathrm{noise}}),$$

and

$$V(X^{\prime})=\mathcal{D}^{\mathrm{input},0}*\mathcal{D}^{\mathrm{input},x_{2}}*\cdots*\mathcal{D}^{\mathrm{input},x_{n}}*\mathcal{F}*(\mathcal{D}^{\mathrm{noise}}\otimes 0)*(0\otimes\mathcal{D}^{\mathrm{noise}}).$$

Bounding $d_{\infty}(V(X)\|V(X^{\prime}))$.

From Lemma 5, we have

$$d_{\infty}(V(X)\|V(X^{\prime}))\leq d_{\infty}\big(\mathcal{D}^{\mathrm{input},1}*(\mathcal{D}^{\mathrm{noise}}\otimes 0)\,\big\|\,\mathcal{D}^{\mathrm{input},0}*(\mathcal{D}^{\mathrm{noise}}\otimes 0)\big).$$

For any $i,j\in\mathbb{Z}$, we have

$$\begin{aligned}
f_{\mathcal{D}^{\mathrm{input},1}*(\mathcal{D}^{\mathrm{noise}}\otimes 0)}(i,j)&=q\cdot f_{\mathcal{D}^{\mathrm{noise}}}(i)\,\mathbf{1}[j=0]+(1-q)\cdot f_{\mathcal{D}^{\mathrm{noise}}}(i-s-1)\,\mathbf{1}[j=s]\\
&\leq q\cdot f_{\mathcal{D}^{\mathrm{noise}}}(i)\,\mathbf{1}[j=0]+(1-q)\cdot e^{\epsilon^{\prime}}f_{\mathcal{D}^{\mathrm{noise}}}(i-s)\,\mathbf{1}[j=s] &&\text{(Lemma 9)}\\
&\leq e^{\epsilon}\left(q\cdot f_{\mathcal{D}^{\mathrm{noise}}}(i)\,\mathbf{1}[j=0]+(1-q)\cdot f_{\mathcal{D}^{\mathrm{noise}}}(i-s)\,\mathbf{1}[j=s]\right) &&\text{(Condition 7)}\\
&=e^{\epsilon}\cdot f_{\mathcal{D}^{\mathrm{input},0}*(\mathcal{D}^{\mathrm{noise}}\otimes 0)}(i,j).
\end{aligned}$$

Combining the above inequalities, we have $d_{\infty}(V(X)\|V(X^{\prime}))\leq\epsilon$ as desired.

Bounding $d_{\infty}(V(X^{\prime})\|V(X))$.

Again, from Lemma 5, we have

$$d_{\infty}(V(X^{\prime})\|V(X))\leq d_{\infty}\big(\mathcal{D}^{\mathrm{input},0}*\mathcal{F}*(0\otimes\mathcal{D}^{\mathrm{noise}})\,\big\|\,\mathcal{D}^{\mathrm{input},1}*\mathcal{F}*(0\otimes\mathcal{D}^{\mathrm{noise}})\big).$$

For any $i,j\in\mathbb{Z}$, we have

$$\begin{aligned}
f_{\mathcal{D}^{\mathrm{input},0}*\mathcal{F}*(0\otimes\mathcal{D}^{\mathrm{noise}})}(i,j)&=f_{\mathcal{D}^{\mathrm{input},0}*\mathcal{F}}((i,i))\cdot f_{\mathcal{D}^{\mathrm{noise}}}(j-i)\\
&=\left(q\cdot f_{\mathcal{D}^{\mathrm{flood}}}(i)+(1-q)\cdot f_{\mathcal{D}^{\mathrm{flood}}}(i-s)\right)\cdot f_{\mathcal{D}^{\mathrm{noise}}}(j-i)\\
&\leq e^{\epsilon}\left(q\cdot f_{\mathcal{D}^{\mathrm{flood}}}(i)+(1-q)\cdot e^{-\epsilon^{\prime}}f_{\mathcal{D}^{\mathrm{flood}}}(i-s-1)\right)\cdot f_{\mathcal{D}^{\mathrm{noise}}}(j-i) &&\text{(Lemma 10)}\\
&\leq e^{\epsilon}\left(q\cdot f_{\mathcal{D}^{\mathrm{flood}}}(i)\cdot f_{\mathcal{D}^{\mathrm{noise}}}(j-i)+(1-q)\cdot f_{\mathcal{D}^{\mathrm{flood}}}(i-s-1)\cdot f_{\mathcal{D}^{\mathrm{noise}}}(j-i+1)\right) &&\text{(Lemma 9)}\\
&=e^{\epsilon}\cdot f_{\mathcal{D}^{\mathrm{input},1}*\mathcal{F}*(0\otimes\mathcal{D}^{\mathrm{noise}})}(i,j).
\end{aligned}$$

Combining the above two inequalities, we have $d_{\infty}(V(X^{\prime})\|V(X))\leq\epsilon$, concluding our proof. ∎

4.2 Utility Analysis

We next analyze the MSE of the output estimate.

Lemma 11.

The MSE of the estimator is at most $\operatorname{Var}(\operatorname{DLap}(\epsilon^{\prime}))+qn+q^{2}n(n-1)$.

Proof.

Notice that the output estimate is equal to $\sum_{i\in[n]}(y^{+1}_{i}-y^{-1}_{i}+z^{+1}_{i}-z^{-1}_{i})=\sum_{i\in[n]}(y^{+1}_{i}-y^{-1}_{i})+Z$, where $Z\sim\operatorname{DLap}(\epsilon^{\prime})$. As a result, the MSE of the output estimate is equal to

$$\mathbb{E}\left[\left(\sum_{i\in[n]}(y^{+1}_{i}-y^{-1}_{i}-x_{i})+Z\right)^{2}\right]=\mathbb{E}\left[\left(\sum_{i\in[n]}(y^{+1}_{i}-y^{-1}_{i}-x_{i})\right)^{2}\right]+\operatorname{Var}(\operatorname{DLap}(\epsilon^{\prime})).$$

Next, notice that, if $x_{i}=0$, then $y^{+1}_{i}-y^{-1}_{i}-x_{i}=0$ always. Otherwise, if $x_{i}=1$, then $y^{+1}_{i}-y^{-1}_{i}-x_{i}=0$ with probability $1-q$ and $y^{+1}_{i}-y^{-1}_{i}-x_{i}=-1$ with probability $q$. As a result, we have

$$\mathbb{E}\left[\left(\sum_{i\in[n]}(y^{+1}_{i}-y^{-1}_{i}-x_{i})\right)^{2}\right]\leq qn+q^{2}n(n-1).\qquad\qed$$

4.3 Communication Analysis

The expected number of messages sent by each user can be easily computed as follows.

Lemma 12.

The expected number of messages sent by each user is at most $2s+1+\frac{2\lambda}{n}+O\left(\frac{1}{\epsilon^{\prime}n}\right)$.

Proof.

The expected number of messages sent per user is

$$\mathbb{E}[y^{+1}_{i}+y^{-1}_{i}]+\mathbb{E}[z^{+1}_{i}+z^{-1}_{i}]+2\,\mathbb{E}[z^{\pm 1}_{i}]\leq(2s+1)+\frac{2\,\mathbb{E}[\mathcal{D}^{\mathrm{noise}}]}{n}+\frac{2\,\mathbb{E}[\mathcal{D}^{\mathrm{flood}}]}{n}=2s+1+O\left(\frac{1}{\epsilon^{\prime}n}\right)+\frac{2\lambda}{n}.\qquad\qed$$

4.4 Putting Things Together: Proof of Theorem 6

Finally, we are ready to prove Theorem 6 by plugging in appropriate parameters and invoking the previous lemmas.

Proof of Theorem 6.

We start by picking $\epsilon^{\prime}=\epsilon-0.01\rho\cdot\min\{\epsilon,1\}$. For this choice of $\epsilon^{\prime}$, we have

$$\frac{\operatorname{Var}(\operatorname{DLap}(\epsilon^{\prime}))}{\operatorname{Var}(\operatorname{DLap}(\epsilon))}=\frac{\frac{2e^{-\epsilon^{\prime}}}{(1-e^{-\epsilon^{\prime}})^{2}}}{\frac{2e^{-\epsilon}}{(1-e^{-\epsilon})^{2}}}\leq 1+\frac{(e^{\epsilon-\epsilon^{\prime}}-1)(1+e^{-\epsilon^{\prime}})}{1-e^{-\epsilon^{\prime}}}\leq 1+\frac{3(\epsilon-\epsilon^{\prime})\cdot 2}{\epsilon^{\prime}}\leq 1+0.5\rho.$$

Then, picking

$$q=\frac{0.1\rho\cdot\operatorname{Var}(\operatorname{DLap}(\epsilon))}{n}=O\left(\frac{\rho}{\epsilon^{2}n}\right),$$
$$s\geq 2\ln\left(\frac{1}{(e^{\epsilon}-1)q}\right)/(\epsilon-\epsilon^{\prime})=O\left(\frac{\log(n/\rho)}{\epsilon\rho}\right),$$
$$\lambda\geq\frac{e^{\epsilon-\epsilon^{\prime}}}{1-e^{(\epsilon^{\prime}-\epsilon)/2}}\cdot s=O\left(\frac{\log(n/\rho)}{\epsilon^{2}\rho}\right),$$

and applying Lemma 8, Lemma 11, and Lemma 12 immediately implies Theorem 6. (Note that we may assume that $\epsilon\geq 1/n$; otherwise, we can simply output zero. Under this assumption, we have $\lambda/n\leq O\left(\frac{\log(n/\rho)}{\epsilon\rho}\right)$, as required for the communication complexity claim.) ∎
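For concreteness, the parameter choices in this proof can be instantiated numerically as in the following helper (ours), which takes the smallest $s$ and $\lambda$ allowed by Condition 7; the example values of $\epsilon$, $\rho$, and $n$ are illustrative.

```python
import math

def theorem6_parameters(eps: float, rho: float, n: int):
    """Instantiate eps', q, s, lam as in the proof of Theorem 6."""
    eps_p = eps - 0.01 * rho * min(eps, 1.0)
    var_dlap = 2 * math.exp(-eps) / (1 - math.exp(-eps)) ** 2   # Var(DLap(eps))
    q = 0.1 * rho * var_dlap / n
    s = math.ceil(2 * math.log(1 / ((math.exp(eps) - 1) * q)) / (eps - eps_p))
    lam = math.exp(eps - eps_p) / (1 - math.exp((eps_p - eps) / 2)) * s
    return eps_p, q, s, lam

# Example: eps = 1, rho = 0.5, n = 10^6 users.
eps_p, q, s, lam = theorem6_parameters(1.0, 0.5, 10**6)
print(eps_p, q, s, lam)
# Expected messages per user is roughly 2*s + 1 + 2*lam/n + O(1/(eps_p*n))  (Lemma 12).
```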

5 Conclusions and Open Questions

In this work, we have provided pure-$\mathrm{DP}_{\mathrm{shuffle}}$ algorithms that achieve nearly optimal errors for binary summation, real summation, and histograms, while significantly improving on the communication complexity compared to the state of the art. Despite this, a number of interesting open questions remain, some of which we highlight below.

  • Protocol with a bounded number of messages. As mentioned briefly in Section 1.1, our protocol can result in an arbitrarily large number of messages per user, although the expected number is quite small. (In fact, the distribution of the number of messages enjoys a strong exponential tail bound.) Is it possible to design a pure-$\mathrm{DP}_{\mathrm{shuffle}}$ protocol for binary summation where the maximum number of messages is $O\left(\frac{\log n}{\epsilon}\right)$?

    For this question, we note that a rather natural approach is to modify our protocol to make its number of messages bounded. Namely, we replace $\mathcal{D}^{\mathrm{noise}}_{/n}$ and $\mathcal{D}^{\mathrm{flood}}_{/n}$ by truncated versions of their respective distributions. It turns out that the latter is relatively simple (e.g., even replacing it with a Bernoulli distribution works) because we only require the mild condition in Lemma 10 to hold. On the other hand, for the former we use Lemma 9, which only holds for unbounded distributions. We would like to stress that we do not know whether replacing $\mathcal{D}^{\mathrm{noise}}_{/n}$ with a truncated version of the negative binomial distribution, together with a "symmetrized" input-dependent part (meaning that with probability $q$ we output $s$ copies of both $+1$ and $-1$ messages, in both the $x_{i}=0$ and $x_{i}=1$ cases; without this change, the supports in the two cases are not the same, and pure-DP is obviously violated), violates pure-DP; however, we do not know how to prove that it satisfies pure-DP either, as the probability mass functions of the resulting convolutions become somewhat unwieldy.

  • Lower bounds on the expected number of messages. Recall that the communication lower bound from [GGK+20] only applies to the maximum number of messages sent. Is it possible to prove a communication lower bound on the expected number of messages (even if the maximum number of messages is unbounded)? We note that the techniques from [GGK+20] do not apply.

  • Histogram protocols for large $B$. Our protocol has communication complexity that grows linearly with $B$, which becomes impractical when $B$ is large. Can we get a histogram protocol whose communication is $O_{\epsilon}\left((\log n)^{O(1)}\right)$ for $B=O(n)$ (while achieving nearly optimal error)? For approximate-$\mathrm{DP}_{\mathrm{shuffle}}$, a histogram protocol with an expected communication of $1+O_{\epsilon}\left(\frac{B(\log(n/\delta))^{O(1)}}{n}\right)$ messages is known [GKMP20]. It would be interesting to understand whether such a protocol can be extended to the pure-$\mathrm{DP}_{\mathrm{shuffle}}$ setting.

More generally, despite the (by-now) vast literature on the shuffle model, most works have focused on approximate-$\mathrm{DP}_{\mathrm{shuffle}}$. It would be interesting to expand the existing study to pure-$\mathrm{DP}_{\mathrm{shuffle}}$ as well.

References

  • [BBGN19] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. The privacy blanket of the shuffle model. In CRYPTO, pages 638–667, 2019.
  • [BBGN20] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. Private summation in the multi-message shuffle model. In CCS, pages 657–676, 2020.
  • [BC20] Victor Balcer and Albert Cheu. Separating local & shuffled differential privacy via histograms. In ITC, pages 1:1–1:14, 2020.
  • [BEM+17] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnés, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In SOSP, pages 441–459, 2017.
  • [BNO08] Amos Beimel, Kobbi Nissim, and Eran Omri. Distributed private data analysis: Simultaneously solving how and what. In CRYPTO, pages 451–468, 2008.
  • [CSS12] T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Optimal lower bound for differentially private multi-party aggregation. In ESA, pages 277–288, 2012.
  • [CSU+19] Albert Cheu, Adam D. Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via shuffling. In EUROCRYPT, pages 375–403, 2019.
  • [CY22] Albert Cheu and Chao Yan. Pure differential privacy from secure intermediaries. In TPDP, 2022.
  • [DKM+06] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486–503, 2006.
  • [DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006.
  • [EFM+19] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In SODA, pages 2468–2479, 2019.
  • [EGS03] Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, pages 211–222, 2003.
  • [GGK+20] Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Pure differentially private summation from anonymous messages. In ITC, pages 15:1–15:23, 2020.
  • [GGK+21] Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. On the power of multiple anonymous messages: Frequency estimation and selection in the shuffle model of differential privacy. In EUROCRYPT, pages 463–488, 2021.
  • [GKM21a] Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. User-level differentially private learning via correlated sampling. In NeurIPS, pages 20172–20184, 2021.
  • [GKM+21b] Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Amer Sinha. Differentially private aggregation in the shuffle model: Almost central accuracy in almost a single message. In ICML, pages 3692–3701, 2021.
  • [GKMP20] Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Rasmus Pagh. Private counting from anonymous messages: Near-optimal accuracy with vanishing communication overhead. In ICML, pages 3505–3514, 2020.
  • [GMPV20] Badih Ghazi, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Private aggregation from fewer anonymous messages. In EUROCRYPT, pages 798–827, 2020.
  • [GRS09] Arpita Ghosh, Tim Roughgarden, and Mukund Sundararajan. Universally utility-maximizing privacy mechanisms. In STOC, pages 351–360, 2009.
  • [GV16] Quan Geng and Pramod Viswanath. The optimal noise-adding mechanism in differential privacy. IEEE Trans. Inf. Theory, 62(2):925–951, 2016.
  • [KLN+08] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? In FOCS, pages 531–540, 2008.