
Optimized Tradeoffs for Private Prediction with Majority Ensembling

Shuli Jiang shulij@andrew.cmu.edu
Robotics Institute, Carnegie Mellon University
Qiuyi (Richard) Zhang qiuyiz@google.com
Google DeepMind
Gauri Joshi gaurij@andrew.cmu.edu
Electrical and Computer Engineering, Carnegie Mellon University
Abstract

We study a classical problem in private prediction: computing an (mϵ, δ)-differentially private majority of K (ϵ, Δ)-differentially private algorithms, for 1 ≤ m ≤ K and 1 > δ ≥ Δ ≥ 0. Standard methods such as subsampling or randomized response are widely used, but do they provide optimal privacy-utility tradeoffs? To answer this, we introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm. It is parameterized by a data-dependent noise function γ, and enables efficient utility optimization over the class of all private majority algorithms, encompassing those standard methods. We show that maximizing the utility of an (mϵ, δ)-private majority algorithm is a tractable optimization problem for any m ≤ K, via a novel structural result that reduces the infinitely many privacy constraints to a polynomial-sized set. In some settings, we show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines at fixed utility. Lastly, we demonstrate the strong empirical effectiveness of our first-of-its-kind privacy-constrained utility optimization for ensembling labels for private prediction from private teachers in image classification. Notably, our DaRRM framework with an optimized γ exhibits substantial utility gains when compared against several baselines.

1 Introduction

Differential privacy (DP) is a widely applied framework for formally reasoning about privacy leakage when releasing statistics on a sensitive database Erlingsson et al. (2014); Cormode et al. (2018). Differential privacy protects data privacy by obfuscating algorithmic output, ensuring that query responses look similar on adjacent datasets while preserving utility as much as possible Dwork et al. (2006).

Privacy in practice often requires aggregating or composing multiple private procedures that are distributed for data or training efficiency. For example, it is common to aggregate multiple private algorithmic or model outputs in methods such as boosting or calibration (Sagi & Rokach, 2018). In federated learning, model training is distributed across multiple edge devices. Those devices need to send local information, such as labels or gradients Konečnỳ et al. (2016), to an aggregating server, which is often honest but curious about the local training data. Hence, the output from each model at an edge device needs to be privatized locally before being sent to the server. When translating from a local privacy guarantee to a centralized one, one needs to reason about the composition of the local privacy leakage Naseri et al. (2020). Therefore, we formally ask the following:

Problem 1.1 (Private Majority Ensembling (Illustrated in Figure 1)).

Consider K ≥ 1 (ϵ, Δ)-differentially private mechanisms M_1, …, M_K for K odd. Given a dataset 𝒟, each mechanism outputs a binary answer, that is, M_i : 𝒟 → {0, 1} for all i ∈ [K]. Given a privacy allowance 1 ≤ m ≤ K, m ∈ ℝ, and a failure probability δ ≥ Δ ≥ 0 with δ, Δ ∈ [0, 1), how can one maximize the utility of an (mϵ, δ)-differentially private mechanism 𝒜 that computes the majority function g(S_1, S_2, …, S_K), where S_i ∼ M_i(𝒟)?

Figure 1: An illustration of the problem setting. The inputs are the dataset 𝒟 and K (ϵ, Δ)-differentially private mechanisms M_1, …, M_K. One draws samples S_i ∼ M_i(𝒟) and computes an aggregated output g(S_1, …, S_K) based on all observed samples. Our goal is to design a randomized algorithm 𝒜 that approximately computes g and is (mϵ, δ)-differentially private for 1 ≤ m ≤ K and δ ≥ Δ ≥ 0. We focus on g being the majority function.
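To make the setting concrete, the following sketch simulates the pipeline of Figure 1 with hypothetical single-bit mechanisms (our own illustration, not from the paper): each M_i is a toy (ϵ, 0)-DP randomized-response mechanism on one bit, and g is the majority over the K observed samples.

```python
import math
import random

def rr_mechanism(bit, eps):
    """A toy (eps, 0)-DP mechanism: randomized response on one bit,
    reporting it faithfully with probability e^eps / (1 + e^eps)."""
    p_true = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < p_true else 1 - bit

def majority(samples):
    """The aggregation function g: majority of binary samples (K odd)."""
    return 1 if sum(samples) > len(samples) / 2 else 0

K, eps = 11, 0.5                      # K odd, per Problem 1.1
true_bit = 1                          # stands in for a statistic of the dataset D
samples = [rr_mechanism(true_bit, eps) for _ in range(K)]
print(majority(samples))
```

The question studied in the paper is how to release `majority(samples)` itself with an (mϵ, δ) guarantee, rather than releasing the individual samples.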

The majority function g is often used in private prediction, where one studies the privacy cost of releasing one prediction Dwork & Feldman (2018) and exploits the fact that releasing only the aggregated output on sharded models is significantly more private than releasing each prediction. For example, this occurs in semi-supervised knowledge transfer with private aggregated teacher ensembles (PATE) Papernot et al. (2017; 2018), in ensemble learning algorithms Jia & Qiu (2020); Xiang et al. (2018), in machine unlearning Bourtoule et al. (2021), in private distributed learning algorithms such as Stochastic Sign-SGD Xiang & Su (2023), and in ensemble feature selection Liu et al. (2018). Private prediction has also been shown to be a competitive technique in data-adaptive settings, where the underlying dataset changes slowly over time, allowing quick adjustment to online dataset updates Zhu et al. (2023). Furthermore, to address the large privacy loss of private prediction in the many-query regime, there have been recent works on everlasting private prediction that extend privacy guarantees to repeated, possibly infinite, queries without suffering a linear increase in privacy loss Naor et al. (2023); Stemmer (2024).

These works, however, often rely on the standard sensitivity analysis of g to provide a private output and thus generally provide limited utility guarantees. This is because the maximum sensitivity of g can be too pessimistic in practice, as observed in the problem of private hyperparameter optimization (Liu & Talwar, 2019). On the other hand, for private model ensembling, a naive way to bound privacy loss without restrictive assumptions is to apply simple composition (Theorem 2.2) or general composition (Theorem 2.3, a tighter version of advanced composition) to reason about the final privacy loss after aggregation. A black-box application of the simple composition theorem to compute g would incur a Kϵ privacy cost in the pure differential privacy setting, that is, δ = 0; if one is willing to tolerate some failure probability δ, general composition would yield an O(√K ϵ) privacy cost Kairouz et al. (2015). Thus, a natural baseline algorithm 𝒜 that is (mϵ, mΔ)-differentially private applies privacy amplification by subsampling: it randomly chooses m of the K mechanisms and returns the majority of the subsampled mechanisms. This technique is reminiscent of the subsampling procedure used for the maximization function g (Liu & Talwar, 2019) and of general techniques for privacy amplification in the federated setting via shuffling (Erlingsson et al., 2019).

However, standard composition analysis and privacy amplification techniques can be suboptimal for computing a private majority, in terms of both utility and privacy. Observe that if there is a clear majority among the outputs of M_1(𝒟), …, M_K(𝒟), one can add less noise. This is because each mechanism M_i is already (ϵ, Δ)-differentially private, and hence, by definition, is unlikely to change its output on a neighboring dataset. This implies the majority outcome is unlikely to change based on single isolated changes in 𝒟. Furthermore, composition theorems make two pessimistic assumptions: 1) the worst-case function g and dataset 𝒟 are considered, and 2) all intermediate mechanism outputs M_1(𝒟), …, M_K(𝒟) are released, rather than just the final aggregate. Based on these observations, is it possible to improve the utility of computing a private majority under a fixed privacy loss?

1.1 Our Contributions

We give a (perhaps surprising) affirmative answer to the above question using our novel Data-dependent Randomized Response Majority (DaRRM) framework, which captures all private majority algorithms. Within this framework, we introduce a tractable noise optimization procedure that maximizes the privacy-utility tradeoff. Furthermore, we can provably achieve a constant-factor improvement in utility over simple subsampling by applying data-dependent noise injection when the M_i's are i.i.d. and δ = 0. To our knowledge, this is the first work of its kind that gives a tractable utility optimization over the possibly infinite set of privacy constraints.

Data-dependent Randomized Response Majority (DaRRM). We generalize the classical Randomized Response (RR) mechanism and the commonly used subsampling baseline for solving Problem 1.1, proposing a general randomized response framework DaRRM (see Algorithm 1) with a customizable noise function γ. We show that, by choosing different γ functions, DaRRM in fact captures all algorithms computing the majority whose outputs are at least as good as a random guess (see Lemma 3.3).

Designing γ with Provable Privacy Amplification. The choice of the γ function in DaRRM allows us to explicitly optimize noise while trading off privacy and utility. Using structural observations, we show privacy amplification by a factor of 2, under mild conditions, over applying simple composition in the pure differential privacy setting when the mechanisms M_i are i.i.d. (see Theorem 4.1).

Finding the Best γ through Dimension-Reduced Optimization. We further exploit the generality of DaRRM through a novel optimization-based approach that uses constrained optimization to find a data-dependent γ maximizing some measure of utility. One challenge is that there are infinitely many privacy constraints that DaRRM with the optimized γ must satisfy to meet the given privacy budget. We show that these infinite-dimensional privacy constraints can be reformulated as a finite, polynomial-sized constraint set, allowing us to efficiently constrain the optimization problem and find the best γ, even under approximate differential privacy (see Lemma 5.1). Empirically, we show that with a small m and ϵ, the optimized γ (see γ_opt in Figure 2) achieves the best utility among all γ functions, even compared to the subsampling and data-independent baselines. To our knowledge, this is the first utility maximization algorithm that optimizes over all private algorithms via constrained optimization with dimension reduction.

Experiments. In downstream tasks, such as semi-supervised knowledge transfer for private image classification, we compare DaRRM with an optimized γ, which computes the private label majority from private teachers, against PATE Papernot et al. (2018), which computes the private label majority from non-private teachers. Fixing the privacy loss of both algorithms' outputs to be the same, we find that when the number of teachers K is small, DaRRM indeed has higher utility than PATE, achieving 10%-15% and 30% higher accuracy on MNIST and Fashion-MNIST, respectively.

2 Background

2.1 Related Work

Private Composition. Blackbox privacy composition analysis often leads to pessimistic utility guarantees. In the blackbox composition setting, one can do no better than the O(Kϵ) privacy analysis for pure differential privacy Dwork et al. (2014). For approximate differential privacy, previous work has found optimal constants for advanced composition by reducing to the binary case of hypothesis testing with randomized response; optimal tradeoffs between ϵ and δ for black-box composition are given in Kairouz et al. (2015), yielding a modest improvement of about 20%.

Thus, for specific applications, previous work has turned to white-box composition analysis for improved utility. This includes, for example, the moments accountant for private SGD Abadi et al. (2016) and the application of contractive maps in stochastic convex optimization Feldman et al. (2018). For the specific case of model ensembles, Papernot et al. (2018) show a data-dependent privacy bound that vanishes as the probability of disagreement goes to 0. Their method provides no utility analysis, but they empirically observed less privacy loss when there is greater ensemble agreement.

When g is the maximization function, some previous work shows that an approximately maximum value can be output with high probability while incurring only O(ϵ) privacy loss, independent of K. Liu & Talwar (2019) proposed a random stopping mechanism for m = 1 that draws samples uniformly at random from M_i(𝒟) at each iteration. In any given iteration, the sampling halts with probability γ, and the final output is computed from the samples collected up to that time. This leads to a final privacy cost of only 3ϵ for the maximization function g, which can be improved to 2ϵ (Papernot & Steinke, 2022). In addition to the aforementioned works, composing top-k and exponential mechanisms also enjoys slightly improved composition analysis via a bounded-range analysis Durfee & Rogers (2019); Dong et al. (2020).

Bypassing the Global Sensitivity. To ensure differential privacy, it is usually assumed that the query function g has bounded global sensitivity, that is, the output of g does not change much on any adjacent input datasets differing in one entry. The noise added to the output is then proportional to the global sensitivity of g. If the sensitivity is large, the output utility will be poor due to the large amount of noise added. However, the worst-case global sensitivity can be rare in practice, and this observation has inspired a line of work on designing private algorithms with data-dependent sensitivity bounds to reduce the amount of noise added.

Instead of using the maximum global sensitivity of g on any dataset, the classical Propose-Test-Release framework of Dwork & Lei (2009) uses a local sensitivity value for robust queries, which is tested privately; if the sensitivity value is too large, the mechanism halts before the query release. The halting mechanism incurs some failure probability but handles the worst-case sensitivity situations, while allowing for lower noise injection in the average case.

One popular way to estimate average-case sensitivity is to use the Subsample-and-Aggregate framework by introducing the notion of perturbation stability, also known as the local sensitivity of a function g on a dataset 𝒟 Thakurta & Smith (2013); Dwork et al. (2014), which is the minimum number of entries of 𝒟 that must be changed to change g(𝒟). One related concept is smooth sensitivity, a measure of the variability of g in the neighborhood of each dataset instance. To apply the framework under smooth sensitivity, one needs to privately estimate a function's local sensitivity L_s and inject noise of order O(L_s/ϵ), where L_s can often be as small as O(e^{−n}), with n = |𝒟| the total dataset size Nissim et al. (2007). Generally, the private computation of the smooth sensitivity of a blackbox function is nontrivial, but is aided by the Subsample-and-Aggregate approach for certain functions.

These techniques hinge on the observation that a function with higher stability on 𝒟 requires less noise to ensure worst-case privacy. Such techniques have also been applied to answer multiple online functions/queries in model-agnostic learning Bassily et al. (2018). However, we highlight two key differences in our setting, which makes a weaker stability assumption. First, to estimate the perturbation stability of g on 𝒟, one needs to downsample or split 𝒟 into multiple blocks 𝒟̂_1, …, 𝒟̂_B Thakurta & Smith (2013); Dwork et al. (2014); Bassily et al. (2018) and estimate the perturbation stability from the mode of g(𝒟̂_1), …, g(𝒟̂_B). This essentially reduces, with high probability, the amount of change in the output of g due to a single entry of 𝒟, and replaces the hard-to-estimate perturbation stability of g with the easy-to-compute perturbation stability of the mode. Such a notion of stability has also been successfully applied, along with the sparse vector technique, to model-agnostic private learning to handle an exponential number of queries to a model Bassily et al. (2018). Note that in these cases, since a private stochastic test is applied, one cannot achieve pure differential privacy Dwork et al. (2014). In practice, e.g., in federated learning, one does not have direct access to 𝒟, and thus it is impractical to draw samples from or to split 𝒟. Second, to ensure good utility, one relies on a key assumption, namely the subsampling stability of g, which requires g(𝒟̂) = g(𝒟) with high probability over the draw of subsamples 𝒟̂.

Although our intuition in designing DaRRM also relies on the stability of the mode function g, previous uses of stability to improve privacy-utility tradeoffs, e.g., propose-test-release Vadhan (2017); Dwork et al. (2014), require testing such stability and, based on the test outcome, adding a larger (constant) noise γ. This can still lead to adding redundant noise in our case.

Optimal Randomized Response. Holohan et al. (2017) and Kairouz et al. (2015) show that the classical Randomized Response (RR) mechanism with a constant probability of faithfully revealing the true answer is optimal in certain private estimation problems. Our proposed DaRRM framework and our problem setting generalize the ones considered in both works, not only subsuming RR but also enabling a data-dependent probability, or noise addition.

While RR with a constant probability can be shown to be optimal in problems such as private count queries or private estimation of trait possession in a population, it is not optimal in other problems, such as private majority ensembling, since, unlike in the former problems, changing one response of the underlying mechanisms does not necessarily change the output of the majority. To explicitly compute the minimum amount of noise required, one needs the output distributions of the underlying mechanisms, but these are unknown. To resolve this, our proposed DaRRM framework calibrates the amount of noise to the set of observed outcomes 𝒮 from the underlying private mechanisms, which is a random variable of the dataset and hence serves as a proxy. This enables DaRRM to add noise based on whether the majority output is likely to change; the amount of noise is automatically reduced when the majority output is unlikely to change.

Second, Holohan et al. (2017) and Kairouz et al. (2015) both consider a special case of our setting where all K private mechanisms are i.i.d., while our approach covers the more general setting where each private mechanism can have a different output distribution.

Learning A Good Noise Distribution. There have been limited works attempting to derive or learn a good noise distribution that improves utility. For deep neural network inference, Mireshghallah et al. (2020) attempt to learn the best noise distribution to maximize utility subject to an entropy Lagrangian, but no formal privacy guarantees were derived. For queries with bounded sensitivity, Geng & Viswanath (2015) demonstrate that the optimal noise distribution is in fact a staircase distribution that approaches the Laplace distribution as ϵ → 0.

Private Prediction. Instead of releasing a privately trained model as in private learning, private prediction hides the models and only releases private outputs. Private prediction has been shown to be a practical alternative to private learning, as it is much easier to perform on a wide range of tasks Dwork & Feldman (2018); Naor et al. (2023); van der Maaten & Hannun (2020). While a privately trained model can make infinitely many predictions at inference time without incurring additional privacy loss, since differential privacy is closed under post-processing, it has recently been shown that it is also possible to make infinitely many private predictions Naor et al. (2023) with finite privacy loss for specific problems.

2.2 Preliminaries

We first introduce the definitions of differential privacy, simple composition, and general composition. General composition Kairouz et al. (2015) gives a near-optimal, closed-form bound on the privacy loss under adaptive composition, improving upon advanced composition Dwork et al. (2014).

Definition 2.1 (Differential Privacy (DP) Dwork et al. (2014)).

A randomized mechanism ℳ : 𝒟 → ℛ with domain 𝒟 and range ℛ satisfies (ϵ, δ)-differential privacy for ϵ, δ ≥ 0 if for any two adjacent datasets 𝒟, 𝒟′ and for any subset of outputs S ⊆ ℛ it holds that Pr[ℳ(𝒟) ∈ S] ≤ e^ϵ Pr[ℳ(𝒟′) ∈ S] + δ. The case δ = 0 is often called pure differential privacy, while δ > 0 is often called approximate differential privacy.
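As a concrete check of the definition, the sketch below (our own illustration, with a hypothetical single-bit mechanism) verifies that randomized response that reports a bit truthfully with probability e^ϵ/(1+e^ϵ) satisfies pure ϵ-DP, by bounding the output-probability ratio on adjacent inputs:

```python
import math

def rr_output_dist(bit, eps):
    """Output distribution of single-bit randomized response."""
    p_true = math.exp(eps) / (1 + math.exp(eps))
    return {bit: p_true, 1 - bit: 1 - p_true}

eps = 0.8
d0, d1 = rr_output_dist(0, eps), rr_output_dist(1, eps)  # adjacent inputs
# Pure DP (delta = 0): every output's probability ratio is at most e^eps
worst = max(d0[o] / d1[o] for o in (0, 1))
assert worst <= math.exp(eps) + 1e-12
print(worst, math.exp(eps))
```

Here the worst-case ratio is attained exactly, i.e., this choice of reporting probability is the least-noisy randomized response that still meets the ϵ budget.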

Theorem 2.2 (Simple Composition Dwork et al. (2014)).

For any ϵ > 0 and δ ∈ [0, 1], the class of (ϵ, δ)-differentially private mechanisms satisfies (kϵ, kδ)-differential privacy under k-fold adaptive composition.

Theorem 2.3 (General Composition (Theorem 3.4 of Kairouz et al. (2015))).

For any ϵ > 0, δ ∈ [0, 1], and δ′ ∈ (0, 1], the class of (ϵ, δ)-differentially private mechanisms satisfies (ϵ′, 1 − (1 − δ)^k (1 − δ′))-differential privacy under k-fold adaptive composition for

\epsilon' = \min\Big\{k\epsilon,\ \frac{(e^{\epsilon}-1)\epsilon k}{e^{\epsilon}+1}+\epsilon\sqrt{2k\log\Big(e+\frac{\sqrt{k\epsilon^{2}}}{\delta'}\Big)},\ \frac{(e^{\epsilon}-1)\epsilon k}{e^{\epsilon}+1}+\epsilon\sqrt{2k\log(1/\delta')}\Big\}
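For intuition on the scaling, the bound of Theorem 2.3 can be evaluated directly; the sketch below is our own, and the parameter values are illustrative only:

```python
import math

def general_composition_eps(eps, k, delta_prime):
    """epsilon' from Theorem 2.3: the minimum of the three terms above."""
    base = (math.exp(eps) - 1) * eps * k / (math.exp(eps) + 1)
    t2 = base + eps * math.sqrt(
        2 * k * math.log(math.e + math.sqrt(k * eps ** 2) / delta_prime))
    t3 = base + eps * math.sqrt(2 * k * math.log(1 / delta_prime))
    return min(k * eps, t2, t3)

# For small eps and moderate k, the bound grows roughly like sqrt(k) * eps,
# much better than the k * eps of simple composition
print(general_composition_eps(0.1, 100, 1e-5))
```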

We then formalize the error and utility metrics in our problem as follows:

Definition 2.4 (Error Metric and Utility Metric).

For the problem setting in Problem 1.1, let the observed (random) outcome set be 𝒮 = {S_1, …, S_K}, where S_i ∼ M_i(𝒟). For a fixed 𝒟, we define the error of an algorithm 𝒜, denoted ℰ(𝒜), in computing the majority function g as the Total Variation (TV) distance between g(𝒮) and 𝒜(𝒟). Specifically,

{\mathcal{E}}({\mathcal{A}}) = \mathcal{D}_{TV}(g({\mathcal{S}})\;\|\;{\mathcal{A}}({\mathcal{D}})) = |\Pr[{\mathcal{A}}({\mathcal{D}})=1]-\Pr[g({\mathcal{S}})=1]|

and the utility is defined as 1 − ℰ(𝒜).

Notation. Throughout the paper, we use the notations defined in Problem 1.1 and Definition 2.4. Furthermore, let 𝒟 and 𝒟′ denote a pair of adjacent datasets differing in one entry. Also, let p_i = Pr[M_i(𝒟) = 1] and p_i′ = Pr[M_i(𝒟′) = 1] for all i ∈ [K]. We omit the subscript i when all p_i's or p_i′'s are equal. 𝕀{·} denotes the indicator function and [K] = {1, 2, …, K}. For the purpose of analysis, let ℒ(𝒟) = Σ_{i=1}^K M_i(𝒟) ∈ {0, 1, …, K}, i.e., the (random) sum of all observed outcomes on dataset 𝒟; the argument 𝒟 is omitted when the context is clear. Unless specified, we use the noise function γ : {0, 1, …, K} → [0, 1] as input to our algorithms to calibrate the probabilistic noise injection, and the privacy allowance satisfies m ∈ ℝ.

3 Private Majority Algorithms

Since the output is binary, the very first approach to consider for solving private majority ensembling (Problem 1.1) is the classical Randomized Response (RR) mechanism Dwork et al. (2014), where one flips a biased coin with a constant probability p_const ∈ [0, 1]. If the coin lands on heads, with probability p_const, output the true majority based on the K samples; if not, simply output a random answer. However, to make the output (mϵ, δ)-differentially private, the success probability p_const can be at most O(m/K) (or O(m/√K)) when δ = 0 (or δ > 0) (see Appendix A.1), which is too small for any reasonable utility.
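In code, this constant-probability baseline is simply the following (our own sketch; the function and variable names are illustrative):

```python
import random

def constant_rr_majority(samples, p_const):
    """Classical RR baseline: with constant probability p_const release the
    true majority of the K observed samples, else a uniformly random bit.
    For (m*eps, 0)-DP, p_const can be at most O(m/K), so the output is
    close to a coin flip for large K."""
    if random.random() < p_const:
        return 1 if sum(samples) > len(samples) / 2 else 0
    return random.randint(0, 1)
```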

The key observation for improved utility is that the probability of success should not be a constant, but should depend on the unpublished set of observed outcomes 𝒮 from the mechanisms. If we see many 1's or 0's in 𝒮, then there should be a clear majority even on adjacent datasets. On the other hand, if we see about half 1's and half 0's, the majority is highly volatile to data changes, which implies we need more noise to ensure privacy. In summary, if we can calibrate the success probability based on 𝒮 to smoothly increase when there is a clear majority, we can improve the utility without affecting privacy.

Subsampling. One natural baseline outputs the majority of m out of K randomly subsampled mechanisms (without replacement), given a privacy allowance m ∈ [K]. Supposing δ ≥ mΔ, the privacy loss of the aggregated output can be reasoned through simple or general composition. Interestingly, we show in Lemma 3.1 that outputting the majority of m out of K subsampled mechanisms corresponds to RR with a non-constant probability p_γ = γ_Sub(ℒ(𝒟)), set by a polynomial function γ_Sub : {0, …, K} → [0, 1] of the sum of observed outcomes ℒ(𝒟) (see a full proof in Appendix A.2). Intuitively, subsampling may be seen as implicitly adding noise by outputting based on only a randomly chosen subset of the mechanisms; this implicit noise is therefore inherently data-dependent through ℒ(𝒟).

Lemma 3.1.

Consider Problem 1.1 with privacy allowance m ∈ [K]. Consider the data-dependent algorithm that computes ℒ(𝒟) and then applies RR with probability p_γ. If p_γ = γ_Sub(l), where l ∈ {0, 1, …, K} is the value of ℒ(𝒟), i.e., the (random) sum of observed outcomes on dataset 𝒟, and γ_Sub : {0, 1, …, K} → [0, 1] is

\gamma_{Sub}(l)=\gamma_{Sub}(K-l)=\begin{cases}1-2\sum_{j=\frac{m+1}{2}}^{m}\frac{\binom{l}{j}\binom{K-l}{m-j}}{\binom{K}{m}}&\text{if $m$ is odd}\\ 1-2\sum_{j=\frac{m}{2}+1}^{m}\frac{\binom{l}{j}\binom{K-l}{m-j}}{\binom{K}{m}}-\frac{\binom{l}{m/2}\binom{K-l}{m/2}}{\binom{K}{m}}&\text{if $m$ is even}\end{cases}

then the majority of m out of K subsampled mechanisms (without replacement) and the output of our data-dependent RR algorithm have the same distribution.
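Lemma 3.1 can be checked numerically. The sketch below (our own illustration) implements γ_Sub, reading the stated formula for l ≤ K/2 and extending it by the symmetry γ_Sub(l) = γ_Sub(K − l), and verifies that RR with probability γ_Sub(l) matches the exact distribution of the majority over m hypergeometric draws (ties for even m broken uniformly):

```python
from math import comb

def gamma_sub(l, K, m):
    """gamma_Sub from Lemma 3.1, read for l <= K/2 and extended by the
    symmetry gamma_Sub(l) = gamma_Sub(K - l)."""
    l = min(l, K - l)
    g = 1 - 2 * sum(comb(l, j) * comb(K - l, m - j)
                    for j in range(m // 2 + 1, m + 1)) / comb(K, m)
    if m % 2 == 0:  # even m: subtract the probability mass of a tie
        g -= comb(l, m // 2) * comb(K - l, m // 2) / comb(K, m)
    return g

def subsample_maj_prob(l, K, m):
    """Pr[majority of m draws without replacement = 1], given l ones among
    K outcomes; for even m, a tie is broken uniformly at random."""
    p = sum(comb(l, j) * comb(K - l, m - j)
            for j in range(m // 2 + 1, m + 1)) / comb(K, m)
    if m % 2 == 0:
        p += 0.5 * comb(l, m // 2) * comb(K - l, m // 2) / comb(K, m)
    return p

def rr_output_prob(l, K, m):
    """Pr[output = 1] of RR that releases the majority w.p. gamma_Sub(l)."""
    g = gamma_sub(l, K, m)
    maj = 1 if l >= K / 2 else 0  # l = K/2 cannot occur for odd K
    return g * maj + (1 - g) / 2

K = 7
for m in (1, 2, 3):
    for l in range(K + 1):
        assert abs(rr_output_prob(l, K, m) - subsample_maj_prob(l, K, m)) < 1e-12
```

Note that `range(m // 2 + 1, m + 1)` starts at (m+1)/2 for odd m and at m/2 + 1 for even m, so one sum covers both cases of the displayed formula.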

Subsampling is special in that when m = 1 it achieves the optimal error, which we show in Lemma 3.2 below (see a full proof in Appendix A.3). Note that when m = 1, subsampling outputs a majority of 1 with probability exactly (1/K) Σ_{i=1}^K p_i. This lower bound applies only to the case m = 1, since when m > 1, the probability of subsampling outputting a majority of 1 is not necessarily (1/K) Σ_{i=1}^K p_i.

Lemma 3.2 (Lower Bound on Error when m = 1).

Let 𝒜 be an (ϵ, δ)-differentially private algorithm, where ϵ ∈ (0, 1/2) and δ ∈ [0, 1/2), that computes the majority of K (ϵ, δ)-differentially private mechanisms M_1, …, M_K, where M_i : 𝒟 → {0, 1} on dataset 𝒟 and Pr[M_i(𝒟) = 1] = p_i for all i ∈ [K]. Then, the error ℰ(𝒜) ≥ |Pr[g(𝒮) = 1] − (1/K) Σ_{i=1}^K p_i|, where g(𝒮) is the true majority output as defined in Problem 1.1.

Algorithm 1 DaRRM(·): Data-dependent Randomized Response Majority
1:  Input: K (ϵ, Δ)-DP mechanisms {M_i}_{i=1}^K, noise function γ : {0,1}^{K+1} → [0, 1] (in our specific setting γ : {0, 1, …, K} → [0, 1]), dataset 𝒟, privacy allowance 1 ≤ m ≤ K, failure probability δ ≥ Δ ≥ 0
2:  Output: (mϵ, δ)-DP majority vote of {M_i}_{i=1}^K
3:  𝒮 = {S_1, …, S_K}, where S_i ∼ M_i(𝒟)
4:  ℒ = Σ_{i=1}^K S_i
5:  Set probability p_γ ← γ(𝒮) (in our setting p_γ ← γ(ℒ))
6:  Flip the p_γ-biased coin
7:  if Heads (with probability p_γ) then
8:     Output 𝕀{ℒ/K ≥ 1/2}
9:  else
10:     Output 0/1 with equal probability
11:  end if
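Algorithm 1 transcribes directly into Python. The sketch below is minimal: the mechanisms and the noise function γ are caller-supplied, and no optimization of γ is performed here.

```python
import random

def darrm(mechanisms, gamma, dataset):
    """Algorithm 1 (DaRRM): sample each private mechanism once, then apply
    randomized response with data-dependent head probability gamma(L)."""
    K = len(mechanisms)
    samples = [M(dataset) for M in mechanisms]  # line 3: S_i ~ M_i(D)
    L = sum(samples)                            # line 4
    p_gamma = gamma(L)                          # line 5 (gamma of the sum)
    if random.random() < p_gamma:               # lines 6-8: heads
        return 1 if L / K >= 0.5 else 0         # true majority indicator
    return random.randint(0, 1)                 # lines 9-10: random bit

# With gamma ≡ 1, DaRRM reduces to the plain (non-private) majority vote;
# per Lemma 3.1, gamma = gamma_Sub recovers the subsampling baseline.
```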

Data-dependent Randomized Response (DaRRM). Does subsampling give optimal utility when m > 1? Inspired by the connection between RR and subsampling, we propose Data-dependent Randomized Response Majority (DaRRM) in Algorithm 1 to study optimizing privacy-utility tradeoffs in private majority ensembling. In particular, DaRRM has a non-constant success probability p_γ that is set by a parameterized noise function γ, which in turn depends on the set of observed outcomes 𝒮 = {S_1, …, S_K}. In fact, we can show that DaRRM is general: any reasonable algorithm 𝒜, namely one whose output is at least as good as a random guess, can be captured by the DaRRM framework (Lemma 3.3; see a full proof in Appendix A.4). We denote DaRRM instantiated with a specific noise function γ by DaRRM_γ.

Lemma 3.3 (Generality of DaRRM).

Let 𝒜 be any randomized algorithm computing the majority function g on 𝒮 such that for all 𝒮, Pr[𝒜(𝒮) = g(𝒮)] ≥ 1/2 (i.e., 𝒜 is at least as good as a random guess). Then, there exists a general function γ : {0,1}^{K+1} → [0, 1] such that if one sets p_γ by γ(𝒮) in DaRRM, the output distribution of DaRRM_γ is the same as the output distribution of 𝒜.

Designing the γ\gamma Function. With the DaRRM framework, we ask: how does one design a γ\gamma function that maximizes the utility? First, we introduce two characteristics of γ\gamma that do not affect the utility, while simplifying the analysis and the empirical optimization:

  (a)

    A function of the sum of observed samples: Since the observed samples set 𝒮{\mathcal{S}} is a permutation-invariant set, a sufficient statistic that captures the full state of 𝒮{\mathcal{S}} is =i=1KSi{\mathcal{L}}=\sum_{i=1}^{K}S_{i}, the sum of observed outcomes. This allows us to reduce γ(𝒮)=γ()\gamma({\mathcal{S}})=\gamma({\mathcal{L}}). Hence, in the rest of the paper, we focus on γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1].

  (b)

    Symmetric around K2\mathit{\frac{K}{2}}: If γ\gamma is asymmetric, we can symmetrize it by reflecting one region about K2\frac{K}{2} and achieve better or equal expected utility, where the utility is summed over symmetric distributions of pip_{i}.

Note that γSub\gamma_{Sub} satisfies both characteristics. Now, recall (𝒟){\mathcal{L}}({\mathcal{D}}) and (𝒟){\mathcal{L}}({\mathcal{D}}^{\prime}) are the sum of observed outcomes on adjacent datasets 𝒟{\mathcal{D}} and 𝒟{\mathcal{D}}^{\prime}. Also, recall pi=Pr[Mi(𝒟)=1]p_{i}=\Pr[M_{i}({\mathcal{D}})=1] and pi=Pr[Mi(𝒟)=1]p_{i}^{\prime}=\Pr[M_{i}({\mathcal{D}}^{\prime})=1] are the output probabilities of the mechanism MiM_{i} on 𝒟,𝒟{\mathcal{D}},{\mathcal{D}}^{\prime}. To design a good noise function γ\gamma in DaRRM, we start by deriving conditions for a γ\gamma function such that DaRRMγ\textsf{DaRRM}_{\gamma} is (mϵ,δ)(m{\epsilon},\delta)-differentially private in Lemma 3.4 (see a full proof in Appendix A.5).

Lemma 3.4 (γ\gamma privacy condition).

Consider using DaRRM (Algorithm 1) to solve Problem 1.1, and let αl=Pr[(𝒟)=l]\alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l] and αl=Pr[(𝒟)=l]\alpha_{l}^{\prime}=\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l], where 𝒟{\mathcal{D}} and 𝒟{\mathcal{D}}^{\prime} are adjacent datasets and l{0,,K}l\in\{0,\dots,K\}. For a noise function γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1] such that γ(l)=γ(Kl),l\gamma(l)=\gamma(K-l),\forall l, DaRRMγ\textsf{DaRRM}_{\gamma} is (mϵ,δ)(m\epsilon,\delta)-differentially private if and only if for all αl,αl\alpha_{l},\alpha_{l}^{\prime}, the following holds,

f(p1,,pK,p1,,pK;γ)emϵ1+2δ\displaystyle f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma)\leq e^{m{\epsilon}}-1+2\delta (1)

where ff is called the privacy cost objective and

f(p1,,pK,p1,,pK;γ):=l=0K12(emϵαlαl)γ(l)+l=K+12K(αlemϵαl)γ(l)\displaystyle f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma):=\sum_{l=0}^{\frac{K-1}{2}}(e^{m{\epsilon}}\alpha_{l}^{\prime}-\alpha_{l})\cdot\gamma(l)+\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\cdot\gamma(l)
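For intuition, in the i.i.d. case (pi=pp_{i}=p and pi=pp_{i}^{\prime}=p^{\prime} for all ii), (𝒟){\mathcal{L}}({\mathcal{D}}) follows a Binomial(K,p)(K,p) distribution, so the privacy cost objective ff can be evaluated numerically. A hedged sketch (the helper names are ours; KK is assumed odd):

```python
from math import comb, exp

def binom_pmf(K, p, l):
    """Binomial(K, p) pmf at l, i.e. alpha_l in the i.i.d. case."""
    return comb(K, l) * p**l * (1 - p)**(K - l)

def privacy_cost(K, m, eps, p, p_prime, gamma):
    """Evaluate the privacy cost objective f of Lemma 3.4 for K i.i.d.
    mechanisms with output probabilities p (on D) and p_prime (on D')."""
    a  = [binom_pmf(K, p, l) for l in range(K + 1)]        # alpha_l
    ap = [binom_pmf(K, p_prime, l) for l in range(K + 1)]  # alpha'_l
    lo = sum((exp(m * eps) * ap[l] - a[l]) * gamma(l)
             for l in range((K - 1) // 2 + 1))
    hi = sum((a[l] - exp(m * eps) * ap[l]) * gamma(l)
             for l in range((K + 1) // 2, K + 1))
    return lo + hi

# DaRRM_gamma is (m*eps, delta)-DP iff privacy_cost(...) <= e^{m*eps} - 1 + 2*delta
# for all feasible (p, p').
```

Note that with γ0\gamma\equiv 0 (always a random guess) the cost is zero, and on identical distributions p=pp=p^{\prime} the two halves of the sum cancel by symmetry, matching the intuition that the constraint only binds when the adjacent distributions differ.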

4 Provable Privacy Amplification

We theoretically demonstrate that privacy is provably amplified under improved design of γ\gamma in our DaRRM framework. Specifically, we show when the mechanisms are i.i.d. and δ=0\delta=0, we gain privacy amplification by a factor of 2 compared to the naïve subsampling baseline by carefully designing γ\gamma.

Theorem 4.1 (Provable Privacy Amplification by 2).

Consider using DaRRM (Algorithm 1) to solve Problem 1.1, with i.i.d. mechanisms {Mi}i=1K\{M_{i}\}_{i=1}^{K}, i.e., pi=pp_{i}=p, pi=pp_{i}^{\prime}=p^{\prime}, i[K]\forall i\in[K], the privacy allowance m[K]m\in[K] and δ=Δ=0\delta=\Delta=0. Let the noise function γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1] be that:
if mK+12m\geq\frac{K+1}{2}, γ(l)=1\gamma(l)=1 and if mK12m\leq\frac{K-1}{2},

γ(l)={12h(l)lK122h(l)1lK+12\displaystyle\gamma(l)=\begin{cases}1-2h(l)&\forall l\leq\frac{K-1}{2}\\ 2h(l)-1&\forall l\geq\frac{K+1}{2}\end{cases}

where h(l)=i=m2m1(li)(Kl2m1i)(K2m1)h(l)=\sum_{i=m}^{2m-1}\frac{{l\choose i}{K-l\choose 2m-1-i}}{{K\choose 2m-1}}, then DaRRMγ\textsf{DaRRM}_{\gamma} is mϵm{\epsilon}-differentially private.
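The noise function of Theorem 4.1 can be computed directly from the sum h(l)h(l); a short sketch (function names are ours):

```python
from math import comb

def gamma_amplified(K, m):
    """Noise function gamma from Theorem 4.1 (i.i.d. mechanisms, delta = 0).
    For m <= (K-1)/2, it corresponds to subsampling 2m-1 of the K outcomes."""
    def h(l):
        # Pr[majority of 2m-1 subsampled outcomes is 1 | L = l];
        # math.comb(n, k) returns 0 when k > n, handling boundary terms.
        return sum(comb(l, i) * comb(K - l, 2 * m - 1 - i)
                   for i in range(m, 2 * m)) / comb(K, 2 * m - 1)
    def gamma(l):
        if m >= (K + 1) / 2:
            return 1.0                     # no noise needed at all
        return 1 - 2 * h(l) if l <= (K - 1) / 2 else 2 * h(l) - 1
    return gamma
```

One can check numerically that this γ\gamma satisfies the symmetry condition γ(l)=γ(Kl)\gamma(l)=\gamma(K-l) of Lemma 3.4.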

Interpretation. First, when mK12m\leq\frac{K-1}{2} is small, the γ(l)\gamma(l) in Theorem 4.1 corresponds to outputting the majority based on subsampling 2m12m-1 outcomes, by Lemma 3.1. In contrast, the subsampling baseline, whose privacy loss is reasoned through simple composition, would have indicated that one can only output the majority based on mm outcomes, therefore implying a 2x privacy gain. When mK+12m\geq\frac{K+1}{2}, the above theorem indicates that we can set a constant γ=1\gamma=1, which implies we optimally output the true majority with no noise while still, surprisingly, ensuring mϵm{\epsilon} privacy.

Intuition. This 2x privacy gain is intuitively possible because the majority depends on only half of the mechanisms’ outputs, so the privacy leakage is also halved. To see this, we start by analyzing the privacy cost objective in Eq. 31, where, with a careful analysis of its gradient, we show that the maximum indeed occurs at (p,p)=(0,0)(p^{*},p^{\prime*})=(0,0) when γ\gamma satisfies certain conditions. Now, as (p,p)0(p^{*},p^{\prime*})\to 0, note that the probability ratio of outputting 11 with 2m12m-1 outcomes is approximately emϵe^{m\epsilon}, where the dependence on mm follows because the probability of outputting 11 is dominated by the probability that exactly mm mechanisms output 1. To make this rigorous, we derive sufficient conditions on γ\gamma such that max(p,p)f(p,p;γ)=f(0,0;γ)emϵ1\max_{(p,p^{\prime})}f(p,p^{\prime};\gamma)=f(0,0;\gamma)\leq e^{m{\epsilon}}-1, which by Lemma 3.4 ensures that DaRRM is mϵm{\epsilon}-differentially private. A more detailed overview and the full proof can be found in Appendix B.

5 Optimizing the Noise Function γ\gamma in DaRRM

Theoretically designing γ\gamma and extending the privacy amplification results to the δ>0\delta>0 case is difficult, and it is likely that our crafted γ\gamma is far from optimal. On the other hand, one can directly optimize for the γ\gamma^{*} that maximizes the utility, but this involves solving a “Semi-infinite Programming” problem, due to the infinitely many privacy constraints, i.e., the constraints in the optimization problem necessary to ensure that DaRRM with the optimized γ\gamma satisfies a given privacy loss. Solving a “Semi-infinite Programming” problem is intractable in general, but we show that in our specific setting it is in fact tractable, and we propose a novel learning approach based on DaRRM that optimizes the noise function to maximize the utility. To the best of our knowledge, such optimization, presented as follows, is the first of its kind:

minγ[0,1]K+1𝔼p1,p2,,pK𝒯[(DaRRMγ)]\displaystyle\min_{\gamma\in[0,1]^{K+1}}\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}[{\mathcal{E}}(\textsf{DaRRM}_{\gamma})] (2)
s.t.max{(pi,pi)i}i=1Kf(p1,,pK,p1,,pK;γ)emϵ1+2δ\displaystyle\text{s.t.}\max_{\{(p_{i},p_{i}^{\prime})\in{\mathcal{F}}_{i}\}_{i=1}^{K}}f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma)\leq e^{m{\epsilon}}-1+2\delta (3)
γ(l)=γ(Kl),l{0,1,,K}\displaystyle\quad\gamma(l)=\gamma(K-l),\forall l\in\{0,1,\dots,K\}

where ff is the privacy cost objective as defined in Lemma 3.4, and i{\mathcal{F}}_{i} is the feasible region in which (pi,pi)(p_{i},p_{i}^{\prime}) lies due to each mechanism MiM_{i} being (ϵ,Δ)({\epsilon},\Delta)-differentially private. Observe that since γ\gamma is symmetric around K2\frac{K}{2}, we only need to optimize K+12\frac{K+1}{2} variables instead of K+1K+1 variables. 𝒯{\mathcal{T}} is the distribution from which p1,,pKp_{1},\dots,p_{K} are drawn. We stress that no prior knowledge about the dataset or the amount of consensus among the private mechanisms is required to use our optimization framework: when there is no prior knowledge about p1,,pKp_{1},\dots,p_{K}, 𝒯{\mathcal{T}} is set to be the uniform distribution for maximizing the expected utility. Note the above optimization problem also offers the flexibility of incorporating prior knowledge about the mechanisms by choosing a prior distribution 𝒯{\mathcal{T}} to further improve the utility.

Optimizing Over All Algorithms. We stress that by solving the above optimization problem, we are indeed optimizing over all algorithms for maximal utility, since we show in Lemma 3.3 that DaRRM captures all reasonable algorithms for computing a private majority.

Linear Optimization Objective. Perhaps surprisingly, it turns out that optimizing for γ\gamma^{*} is a Linear Programming (LP) problem! Indeed, after expanding the optimization objective in Eq. 2 using the utility definition (see Definition 2.4), optimizing the above objective is essentially the same as optimizing:

minγ[0,1]K+112l=K+12K𝔼p1,p2,,pK𝒯[(αlαKl)]γ(l)\displaystyle\min_{\gamma\in[0,1]^{K+1}}-\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[(\alpha_{l}-\alpha_{K-l})\right]\cdot\gamma(l)

where αl=Pr[(𝒟)=l],l{0,1,,K}\alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l],\forall l\in\{0,1,\dots,K\} and observe (𝒟)PoissonBinomial(p1,,pK){\mathcal{L}}({\mathcal{D}})\sim\text{PoissonBinomial}(p_{1},\dots,p_{K}). The above objective is linear in γ\gamma. See a full derivation in Appendix C.1.
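The expectation 𝔼[αlαKl]\mathbb{E}[\alpha_{l}-\alpha_{K-l}] under a uniform 𝒯{\mathcal{T}} can be estimated by Monte Carlo, evaluating the Poisson Binomial pmf by dynamic programming. An illustrative sketch (function names and the sample count are ours, not the paper's actual approximation from Appendix C.2):

```python
import random

def poisson_binomial_pmf(ps):
    """pmf of L = sum of independent Bernoulli(p_i), by the recursion
    new_pmf[l] = (1 - p) * pmf[l] + p * pmf[l - 1]."""
    pmf = [1.0]
    for p in ps:
        pmf = [(1 - p) * a + p * b
               for a, b in zip(pmf + [0.0], [0.0] + pmf)]
    return pmf

def objective_coeffs(K, n_samples=1000, rng=random.Random(0)):
    """Monte-Carlo estimate of E_{p ~ Uniform}[alpha_l - alpha_{K-l}] for
    l = (K+1)/2, ..., K; the LP objective is -1/2 * sum_l coeff_l * gamma(l),
    which is linear in the decision variables gamma(l)."""
    coeffs = [0.0] * (K + 1)
    for _ in range(n_samples):
        pmf = poisson_binomial_pmf([rng.random() for _ in range(K)])
        for l in range((K + 1) // 2, K + 1):
            coeffs[l] += (pmf[l] - pmf[K - l]) / n_samples
    return coeffs[(K + 1) // 2:]
```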

Although taking the expectation over p1,,pKp_{1},\dots,p_{K} involves integrating over KK variables, which can be computationally expensive, we discuss how to formulate a computationally efficient approximation of the objective in Appendix C.2, which we later use in the experiments. Note that the objective is only for maximizing the utility, and hence approximating the objective does not affect the privacy guarantee.

Reducing Infinitely Many Constraints to A Polynomial Set. The constraints in the optimization problem (Eq. 3) are what ensure that the output of DaRRMγ\textsf{DaRRM}_{\gamma} is (mϵ,δ)(m{\epsilon},\delta)-differentially private. We thus call them the privacy constraints. Note that the privacy constraints are linear in γ\gamma.

Though it appears we need to satisfy infinitely many such privacy constraints, since the pip_{i}’s and pip_{i}^{\prime}’s are continuous, we show through a structural understanding of DaRRM that the number of privacy constraints can be reduced from infinitely many to exponentially many, and further to a polynomial set. First, we observe the privacy cost objective ff is linear in each pair (pi,pi)(p_{i},p_{i}^{\prime}), fixing all (pj,pj)(p_{j},p_{j}^{\prime}), ji\forall j\neq i, and hence finding the worst case probabilities (pi,pi)=argmax(pi,pi)f(p1,,pK,p1,,pK;γ)(p_{i}^{*},p_{i}^{\prime*})=\operatorname*{arg\,max}_{(p_{i},p_{i}^{\prime})}f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma) given any γ\gamma is a linear programming (LP) problem. Furthermore, since pip_{i} and pip_{i}^{\prime} are the probabilities of outputting 1 from the ii-th (ϵ,Δ)({\epsilon},\Delta)-differentially private mechanism MiM_{i} on adjacent datasets, by definition they are close and lie in a feasible region i{\mathcal{F}}_{i}, which we show has 8 corners if δ>0\delta>0 (and only 4 corners if δ=0\delta=0). This implies each (pi,pi)(p_{i}^{*},p_{i}^{\prime*}) occurs only at one of the corners of i{\mathcal{F}}_{i}, and hence the number of constraints reduces to 8K8^{K} (and 4K4^{K} if δ=0\delta=0). Second, observe that αl\alpha_{l} and αl\alpha_{l}^{\prime} in the privacy cost objective ff are the pmfs of two Poisson Binomial distributions at l{0,,K}l\in\{0,\dots,K\}. Notice that the Poisson Binomial is invariant under permutation of its parameters, i.e., PoissonBinomial(p1,,pK)\text{PoissonBinomial}(p_{1},\dots,p_{K}) has the same distribution as PoissonBinomial(π(p1,,pK))\text{PoissonBinomial}(\pi(p_{1},\dots,p_{K})) for any permutation π\pi. Based on this observation, we show the number of constraints can be further reduced to O(K7)O(K^{7}) if δ>0\delta>0 (and O(K3)O(K^{3}) if δ=0\delta=0).
We formalize the two-step reduction of the number of privacy constraints in Lemma 5.1 as follows; see a full proof in Appendix C.3. (Practical limitation: although the number of constraints is polynomial in KK and optimizing γ\gamma in DaRRM is an LP, O(K7)O(K^{7}) can still make the number of constraints intractably large when KK is large. In practice, we observe that with the Gurobi optimizer, one can optimize γ\gamma for K41K\leq 41 on a laptop if δ>0\delta>0. But if δ=0\delta=0, since the number of privacy constraints is O(K3)O(K^{3}), one can optimize for KK over 100.)

Lemma 5.1.

Consider using DaRRM (Algorithm 1) to solve Problem 1.1 and let ff be the privacy cost objective as defined in Lemma 3.4. Given an arbitrary noise function γ\gamma, let the worst case probabilities be (p1,,pK,p1,,pK)=argmax{(pi,pi)}i=1Kf(p1,,pK,p1,,pK;γ)(p_{1}^{*},\dots,p_{K}^{*},p_{1}^{\prime*},\dots,p_{K}^{\prime*})=\operatorname*{arg\,max}_{\{(p_{i},p_{i}^{\prime})\}_{i=1}^{K}}f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma).

(p1,,pK,p1,,pK)=argmax{(pi,pi)}i=1Kf(p1,,pK,p1,,pK;γ)\displaystyle(p_{1}^{*},\dots,p_{K}^{*},p_{1}^{\prime*},\dots,p_{K}^{\prime*})=\operatorname*{arg\,max}_{\{(p_{i},p_{i}^{\prime})\}_{i=1}^{K}}f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma)

Then, each pair (pi,pi),i[K](p_{i}^{*},p_{i}^{\prime*}),\forall i\in[K] satisfies

(pi,pi){\displaystyle(p_{i}^{*},p_{i}^{\prime*})\in\{ (0,0),(1,1),(0,Δ),(Δ,0),(1Δ,1),\displaystyle(0,0),(1,1),(0,\Delta),(\Delta,0),(1-\Delta,1),
(1,1Δ),(eϵ+Δeϵ+1,1Δeϵ+1),(1Δeϵ+1,eϵ+Δeϵ+1)}\displaystyle(1,1-\Delta),(\frac{e^{{\epsilon}}+\Delta}{e^{{\epsilon}}+1},\frac{1-\Delta}{e^{{\epsilon}}+1}),(\frac{1-\Delta}{e^{{\epsilon}}+1},\frac{e^{{\epsilon}}+\Delta}{e^{{\epsilon}}+1})\}

Furthermore, when δ>0\delta>0, there exists a finite vector set 𝒫{\mathcal{P}} of size O(K7)O(K^{7}) such that if β=max{(pi,pi)}i=1K𝒫f(p1,,pK,p1,,pK;γ)\beta=\max_{\{(p_{i},p_{i}^{\prime})\}_{i=1}^{K}\in{\mathcal{P}}}f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma), then f(p1,,pK,p1,,pK;γ)βf(p_{1}^{*},\dots,p_{K}^{*},p_{1}^{\prime*},\dots,p_{K}^{\prime*};\gamma)\leq\beta. When δ=0\delta=0, the size of 𝒫{\mathcal{P}} can be reduced to O(K3)O(K^{3}).
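The reduction can be made concrete: enumerate the corner values of each i{\mathcal{F}}_{i} from Lemma 5.1, then take multisets of KK corners for the permutation-reduced constraint set. A sketch (function names ours; the multiset count C(K+c1,c1)C(K+c-1,c-1) for cc corners gives the O(K7)O(K^{7}) and O(K3)O(K^{3}) sizes):

```python
from itertools import combinations_with_replacement
from math import exp

def corner_set(eps, Delta):
    """The candidate (p_i, p_i') corners from Lemma 5.1: 8 distinct points
    when Delta > 0, collapsing to 4 when Delta = 0."""
    e = exp(eps)
    corners = {(0.0, 0.0), (1.0, 1.0),
               (0.0, Delta), (Delta, 0.0),
               (1 - Delta, 1.0), (1.0, 1 - Delta),
               ((e + Delta) / (e + 1), (1 - Delta) / (e + 1)),
               ((1 - Delta) / (e + 1), (e + Delta) / (e + 1))}
    return sorted(corners)

def reduced_constraints(K, eps, Delta):
    """Permutation invariance of the Poisson Binomial means each privacy
    constraint is identified by a multiset of K corners."""
    return list(combinations_with_replacement(corner_set(eps, Delta), K))
```

For example, with 4 corners and K=3K=3 there are C(4+31,3)=20C(4+3-1,3)=20 constraints, versus 43=644^{3}=64 ordered corner assignments before the permutation reduction.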

6 Experiments

We empirically solve the above optimization problem (Eq. 2) using the Gurobi solver (https://www.gurobi.com/); all code for the experiments can be found at https://anonymous.4open.science/r/OptimizedPrivateMajority-CF50. We first present the shape of the optimized γ\gamma function, which we call γopt\gamma_{opt}, and its utility in Section 6.1. Then, we demonstrate the compelling effectiveness of DaRRM with an optimized γ\gamma function, i.e., DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}}, in ensembling labels for private prediction from private teachers through the application of semi-supervised knowledge transfer for private image classification in Section 6.2.

6.1 Optimized γ\gamma in Simulations

Figure 2: Plots of the shape and (DaRRMγ){\mathcal{E}}(\textsf{DaRRM}_{\gamma}) of different γ\gamma functions: the optimized γopt\gamma_{opt}, and the baselines γSub\gamma_{Sub} (corresponding to subsampling) and γconst\gamma_{const} (corresponding to RR). Here, K=11,m{1,3,5,7}K=11,m\in\{1,3,5,7\}, ϵ=0.1{\epsilon}=0.1, Δ=105\Delta=10^{-5} and δ=1(1Δ)mmΔ\delta=1-(1-\Delta)^{m}\approx m\Delta.

We compare the shape and the error (DaRRMγ){\mathcal{E}}(\textsf{DaRRM}_{\gamma}) of different γ\gamma functions: an optimized γopt\gamma_{opt} and the subsampling γSub\gamma_{Sub} as in Lemma 3.1. (Note the subsampling mechanism from Section 4, which enjoys a privacy amplification by a factor of 2, only applies to pure differential privacy settings, i.e., when Δ=δ=0\Delta=\delta=0. However, we focus on the more general approximate differential privacy settings (with Δ>0\Delta>0) in the experiments, and hence, the subsampling baseline we consider throughout this section is the basic version without privacy amplification. To see how the subsampling mechanism from Section 4 with privacy amplification compares against the other algorithms, please refer to Appendix D.1.2.) We also compare against pconstp_{const} in the classical baseline RR (see Section A.1) and (RR){\mathcal{E}}(\textsf{RR}). Here, pconstp_{const} can be viewed as a constant noise function γconst(l)=pconst,l{0,1,,K}\gamma_{const}(l)=p_{const},\forall l\in\{0,1,\dots,K\}; and (RR){\mathcal{E}}(\textsf{RR}) is the same as (DaRRMγconst){\mathcal{E}}(\textsf{DaRRM}_{\gamma_{const}}).

We present the results with K=11,ϵ=0.1,Δ=105K=11,{\epsilon}=0.1,\Delta=10^{-5} and m{1,3,5,7}m\in\{1,3,5,7\}. We assume there is no prior knowledge about the mechanisms {Mi}i=1K\{M_{i}\}_{i=1}^{K}, and hence set 𝒯{\mathcal{T}}, the prior distribution from which the pip_{i}’s are drawn in the optimization objective (Eq. 2) when searching for γopt\gamma_{opt}, to be the uniform distribution. To ensure a fair comparison against the subsampling baseline, we set δ\delta to be the one given by mm-fold general composition (see Theorem 2.3), which in this case is δ=1(1Δ)mmΔ\delta=1-(1-\Delta)^{m}\approx m\Delta. We plot each γ\gamma function over the support {0,1,,K}\{0,1,\dots,K\} and the corresponding error of each algorithm in Figure 2.

Discussion. In summary, at m=1m=1, the optimized noise function γopt\gamma_{opt} overlaps with γSub\gamma_{Sub}, which corresponds to the subsampling baseline. This agrees with our lower bound on the error in Lemma 3.2, which implies that at m=1m=1, subsampling indeed gives the optimal error. When m>1m>1, the optimized noise function γopt\gamma_{opt} has a higher probability of outputting the true majority over the support than the γ\gamma functions corresponding to the baselines. This implies DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}} has the lowest error (and hence, the highest utility), which is verified in the bottom set of plots. More results comparing DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}} optimized under the uniform 𝒯{\mathcal{T}} against the baselines by general composition (Theorem 2.3) and in pure differential privacy settings (i.e., Δ=δ=0\Delta=\delta=0) for large KK and mm can be found in Appendix D.1.1 and D.1.2. Furthermore, we include results optimizing γ\gamma using a non-uniform 𝒯{\mathcal{T}} prior in Appendix D.1.3.

6.2 Private Semi-Supervised Knowledge Transfer

Dataset: MNIST
# Queries | GNMax (Baseline) | DaRRMγSub\textsf{DaRRM}_{\gamma_{Sub}} (Baseline) | DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}} (Ours)
Q=20Q=20 | 0.63 (0.09) | 0.76 (0.09) | 0.79 (0.09)
Q=50Q=50 | 0.66 (0.06) | 0.75 (0.06) | 0.79 (0.05)
Q=100Q=100 | 0.64 (0.04) | 0.76 (0.04) | 0.80 (0.04)

Dataset: Fashion-MNIST
# Queries | GNMax (Baseline) | DaRRMγSub\textsf{DaRRM}_{\gamma_{Sub}} (Baseline) | DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}} (Ours)
Q=20Q=20 | 0.65 (0.11) | 0.90 (0.07) | 0.96 (0.03)
Q=50Q=50 | 0.59 (0.06) | 0.94 (0.03) | 0.96 (0.02)
Q=100Q=100 | 0.64 (0.04) | 0.93 (0.02) | 0.96 (0.02)

Table 1: Accuracy of the predicted labels of QQ query samples on datasets MNIST (on the left) and Fashion-MNIST (on the right). We report the mean and one std. in parentheses over 10 random draws of the query samples from the test dataset. Note each prediction on a query sample is (ϵquery,δquery)({\epsilon}_{query},\delta_{query})-differentially private. With the same per-query privacy loss (and hence the same total privacy loss over QQ samples), DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}} achieves the highest accuracy compared to the other two baselines.

Semi-supervised Knowledge Transfer. We apply our DaRRM framework to the task of semi-supervised knowledge transfer for private image classification. We follow a similar setup as in PATE Papernot et al. (2017; 2018), where one trains KK teachers, each on a subset of a sensitive dataset, and at inference time, queries the teachers for the majority of their votes, i.e., the predicted labels, of a test sample. Each time the teachers are queried, there is a privacy loss, and we focus on this private prediction subroutine in this section. To limit the total privacy loss over all queries, the student model is also trained on a public dataset without labels. The student model queries the labels of a small portion of the samples in this dataset from the teachers and is then trained using semi-supervised learning algorithms on both the labeled and unlabeled samples from the public dataset.

Baselines. We want the privacy loss per query of a test sample to the teachers to be (ϵquery,δquery)({\epsilon}_{query},\delta_{query}). This can be achieved in two ways: 1) Train KK non-private teachers, add Gaussian noise to the number of predicted labels from the teachers in each output class, and output the majority of the noisy votes. This is exactly the GNMax algorithm from PATE Papernot et al. (2018). 2) Train KK (ϵ,Δ)({\epsilon},\Delta)-differentially private teachers and output the majority of the teachers’ votes by adding a smaller amount of noise. This can be computed using DaRRM with an appropriate noise function γ\gamma. We compare the performance of GNMax and DaRRM with two γ\gamma functions: γopt\gamma_{opt} (i.e., the optimized γ\gamma) and γSub\gamma_{Sub} (i.e., the subsampling baseline). The overall privacy loss over QQ queries to the teachers can be computed by general composition (Theorem 2.3).
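As an illustration of baseline 1), a minimal GNMax-style aggregation can be sketched as follows (the function name is ours, and σ\sigma is treated as a given parameter here; its calibration is discussed in the experiment setup):

```python
import random

def gnmax(votes, num_classes, sigma, rng=random.Random(0)):
    """GNMax-style aggregation: add Gaussian noise N(0, sigma^2) to each
    class's vote count and release the noisy argmax. `votes` is the list
    of teacher-predicted labels for one query."""
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1                                  # vote histogram
    noisy = [c + rng.gauss(0.0, sigma) for c in counts]  # perturb counts
    return max(range(num_classes), key=lambda k: noisy[k])
```

With sigma = 0 this reduces to a plain (non-private) majority vote; larger sigma trades accuracy for a smaller per-query privacy loss.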

Experiment Setup. We use samples from two randomly chosen classes — class 5 and 8 — from the MNIST and Fashion-MNIST datasets to form our training and testing datasets. Our MNIST has a total of 11272 training samples and 1866 testing samples; our Fashion-MNIST has 10000 training samples and 2000 testing samples. We train K=11K=11 teachers on equally divided subsets of the training datasets. Each teacher is a CNN model. The non-private and private teachers are trained using SGD and DP-SGD Abadi et al. (2016), respectively, for 5 epochs. DaRRM Setup: The Gaussian noise in DP-SGD has zero mean and std. σdpsgd=12\sigma_{dpsgd}=12; the gradient norm clipping threshold is C=1C=1. This results in each private teacher, trained on MNIST and Fashion-MNIST, being (ϵ,Δ)=(0.0892,104)({\epsilon},\Delta)=(0.0892,10^{-4})- and (0.0852,104)(0.0852,10^{-4})-differentially private, respectively, after 5 epochs. We set the privacy allowance m=3m=3, and the privacy loss per query is then computed using general composition under mm-fold composition (which, in the high privacy regime, gives the same privacy loss as simple composition), resulting in (ϵquery,δquery)=(0.2676,0.0003)({\epsilon}_{query},\delta_{query})=(0.2676,0.0003) on MNIST and (0.2556,0.0003)(0.2556,0.0003) on Fashion-MNIST. (We present results with privacy allowance m=3m=3 because we think this is the most interesting case. m=1m=1 is less interesting, since one cannot improve on the subsampling baseline. mm close to K25\frac{K}{2}\approx 5 is also less interesting, as this case seems too easy for our proposed method (the optimized γ\gamma function is very close to 1, meaning very little noise needs to be added in this case). Hence, we pick m=3m=3, a case where improvement is possible and which is also potentially challenging for our optimization framework. This is also realistic, as most applications would only want to tolerate a constant privacy overhead. See more results with different privacy allowances mm in this setting in Appendix D.2.2.) GNMax Setup: We compute the std. σ\sigma of the Gaussian noise used by GNMax to achieve a per-query privacy loss of (mϵ,mΔ)(m{\epsilon},m\Delta), as in the DaRRM setup. We optimize σ\sigma according to the Renyi differential privacy loss bound of Gaussian noise. Although Papernot et al. (2018) gives a potentially tighter data-dependent privacy loss bound for majority ensembling of non-private teachers, we found that when KK and the number of output classes are small, as in our case, even if all teachers agree on a single output class, the condition of the data-dependent bound is not satisfied. Hence, we only use the privacy loss bound of Gaussian noise here to set σ\sigma in GNMax. See Appendix D.2.1 for more details, including the σ\sigma values and other parameters. Finally, the per-query privacy loss and the total privacy loss over QQ queries, computed by general composition, are reported in Table 2.

The testing dataset is treated as the public dataset on which one trains a student model. Papernot et al. (2018) empirically shows that querying Q=1%NQ=1\%N samples from a public dataset of size NN suffices to train a student model with good performance. Therefore, we pick Q{20,50,100}Q\in\{20,50,100\}. We repeat the selection of QQ samples 10 times and report the mean test accuracy with one std. in parentheses in Table 1. The QQ queries serve as the labeled samples in training the student model; the higher the accuracy of the labels from the queries, the better the final performance of the student model. We skip the actual training of the student model using semi-supervised learning algorithms here.

Dataset # Queries Privacy loss per query (ϵquery,δquery)({\epsilon}_{query},\delta_{query}) Total privacy loss over QQ queries (ϵtotal,δtotal)({\epsilon}_{total},\delta_{total})
MNIST Q=20Q=20 (0.2676,0.0003)(0.2676,0.0003) (5.352,0.006)(5.352,0.006)
Q=50Q=50 (9.901,0.015)(9.901,0.015)
Q=100Q=100 (15.044,0.030)(15.044,0.030)
Fashion MNIST Q=20Q=20 (0.2556,0.0003)(0.2556,0.0003) (5.112,0.006)(5.112,0.006)
Q=50Q=50 (9.382,0.015)(9.382,0.015)
Q=100Q=100 (14.219,0.030)(14.219,0.030)
Table 2: The privacy loss per query to the teachers and the total privacy loss over QQ queries. Note the total privacy loss is computed by general composition (see Theorem 2.3), where we set δ=0.0001\delta^{\prime}=0.0001.

Discussion. Table 1 shows DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}} achieves the highest accuracy (i.e., utility) compared to the two baselines on both datasets. First, compared to DaRRMγSub\textsf{DaRRM}_{\gamma_{Sub}}, we verify that subsampling does not achieve a tight privacy-utility tradeoff, and that we can optimize the noise function γ\gamma in DaRRM to maximize the utility given a target privacy loss. Second, compared to GNMax, the result shows there are regimes where ensembling private teachers gives higher utility than directly ensembling non-private teachers, assuming the outputs in both settings have the same privacy loss. Intuitively, this is because ensembling private teachers adds fine-grained noise both during training of the teachers and during aggregation of the teachers’ votes, while ensembling non-private teachers adds a coarser amount of noise only to the teachers’ outputs. This further motivates private prediction from private teachers and the practical usage of DaRRM, in addition to the need for aggregating private teachers in federated learning settings with an honest-but-curious server.

7 Conclusion

In computing a private majority from KK private mechanisms, we propose the DaRRM framework, which is provably general, with a customizable γ\gamma function. We show a privacy amplification by a factor of 2 in the setting of i.i.d. mechanisms with pure differential privacy. For the general setting, we propose a tractable optimization algorithm that maximizes utility while ensuring privacy guarantees. Furthermore, we demonstrate the empirical effectiveness of DaRRM with an optimized γ\gamma. We hope that this work inspires more research at the intersection of privacy frameworks and optimization.

References

  • Abadi et al. (2016) Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pp.  308–318, 2016.
  • Bassily et al. (2018) Raef Bassily, Om Thakkar, and Abhradeep Thakurta. Model-agnostic private learning via stability. arXiv preprint arXiv:1803.05101, 2018.
  • Bourtoule et al. (2021) Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), pp.  141–159. IEEE, 2021.
  • Cormode et al. (2018) Graham Cormode, Somesh Jha, Tejas Kulkarni, Ninghui Li, Divesh Srivastava, and Tianhao Wang. Privacy at scale: Local differential privacy in practice. In Proceedings of the 2018 International Conference on Management of Data, pp.  1655–1658, 2018.
  • Dong et al. (2020) Jinshuo Dong, David Durfee, and Ryan Rogers. Optimal differential privacy composition for exponential mechanisms. In International Conference on Machine Learning, pp.  2597–2606. PMLR, 2020.
  • Durfee & Rogers (2019) David Durfee and Ryan M Rogers. Practical differentially private top-k selection with pay-what-you-get composition. Advances in Neural Information Processing Systems, 32, 2019.
  • Dwork & Feldman (2018) Cynthia Dwork and Vitaly Feldman. Privacy-preserving prediction. In Conference On Learning Theory, pp.  1693–1702. PMLR, 2018.
  • Dwork & Lei (2009) Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pp.  371–380, 2009.
  • Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pp.  265–284. Springer, 2006.
  • Dwork et al. (2014) Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014.
  • Erlingsson et al. (2014) Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14, pp.  1054–1067, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450329576. doi: 10.1145/2660267.2660348. URL https://doi.org/10.1145/2660267.2660348.
  • Erlingsson et al. (2019) Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.  2468–2479. SIAM, 2019.
  • Feldman et al. (2018) Vitaly Feldman, Ilya Mironov, Kunal Talwar, and Abhradeep Thakurta. Privacy amplification by iteration. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp.  521–532. IEEE, 2018.
  • Geng & Viswanath (2015) Quan Geng and Pramod Viswanath. The optimal noise-adding mechanism in differential privacy. IEEE Transactions on Information Theory, 62(2):925–951, 2015.
  • Holohan et al. (2017) Naoise Holohan, Douglas J. Leith, and Oliver Mason. Optimal differentially private mechanisms for randomised response. IEEE Transactions on Information Forensics and Security, 12(11):2726–2735, November 2017. ISSN 1556-6021. doi: 10.1109/tifs.2017.2718487. URL http://dx.doi.org/10.1109/TIFS.2017.2718487.
  • Jia & Qiu (2020) Junjie Jia and Wanyong Qiu. Research on an ensemble classification algorithm based on differential privacy. IEEE Access, 8:93499–93513, 2020. doi: 10.1109/ACCESS.2020.2995058.
  • Kairouz et al. (2015) Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In International conference on machine learning, pp.  1376–1385. PMLR, 2015.
  • Konečnỳ et al. (2016) Jakub Konečnỳ, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.
  • Liu & Talwar (2019) Jingcheng Liu and Kunal Talwar. Private selection from private candidates. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pp.  298–309, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450367059. doi: 10.1145/3313276.3316377. URL https://doi.org/10.1145/3313276.3316377.
  • Liu et al. (2018) Zhongfeng Liu, Yun Li, and Wei Ji. Differential private ensemble feature selection. In 2018 International Joint Conference on Neural Networks (IJCNN), pp.  1–6, 2018. doi: 10.1109/IJCNN.2018.8489308.
  • Mireshghallah et al. (2020) Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ramrakhyani, Ali Jalali, Dean Tullsen, and Hadi Esmaeilzadeh. Shredder: Learning noise distributions to protect inference privacy. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp.  3–18, 2020.
  • Naor et al. (2023) Moni Naor, Kobbi Nissim, Uri Stemmer, and Chao Yan. Private everlasting prediction. arXiv preprint arXiv:2305.09579, 2023.
  • Naseri et al. (2020) Mohammad Naseri, Jamie Hayes, and Emiliano De Cristofaro. Local and central differential privacy for robustness and privacy in federated learning. arXiv preprint arXiv:2009.03561, 2020.
  • Nissim et al. (2007) Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pp.  75–84, 2007.
  • Papernot & Steinke (2022) Nicolas Papernot and Thomas Steinke. Hyperparameter tuning with renyi differential privacy. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=-70L8lpp9DF.
  • Papernot et al. (2017) Nicolas Papernot, Martin Abadi, Ulfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-supervised knowledge transfer for deep learning from private training data. In Proceedings of the International Conference on Learning Representations, 2017. URL https://arxiv.org/abs/1610.05755.
  • Papernot et al. (2018) Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Úlfar Erlingsson. Scalable private learning with pate. arXiv preprint arXiv:1802.08908, 2018.
  • Sagi & Rokach (2018) Omer Sagi and Lior Rokach. Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4):e1249, 2018.
  • Stemmer (2024) Uri Stemmer. Private truly-everlasting robust-prediction. arXiv preprint arXiv:2401.04311, 2024.
  • Thakurta & Smith (2013) Abhradeep Guha Thakurta and Adam Smith. Differentially private feature selection via stability arguments, and the robustness of the lasso. In Conference on Learning Theory, pp.  819–850. PMLR, 2013.
  • Vadhan (2017) Salil Vadhan. The Complexity of Differential Privacy, pp.  347–450. Springer International Publishing, Cham, 2017. doi: 10.1007/978-3-319-57048-8_7. URL https://doi.org/10.1007/978-3-319-57048-8_7.
  • van der Maaten & Hannun (2020) Laurens van der Maaten and Awni Hannun. The trade-offs of private prediction, 2020.
  • Xiang & Su (2023) Ming Xiang and Lili Su. $\beta$-stochastic sign SGD: A byzantine resilient and differentially private gradient compressor for federated learning, 2023. URL https://openreview.net/forum?id=oVPqFCI1g7q.
  • Xiang et al. (2018) Tao Xiang, Yang Li, Xiaoguo Li, Shigang Zhong, and Shui Yu. Collaborative ensemble learning under differential privacy. Web Intelligence, 16:73–87, 03 2018. doi: 10.3233/WEB-180374.
  • Zhang et al. (2019) Ying-Ying Zhang, Teng-Zhong Rong, and Man-Man Li. Expectation identity for the binomial distribution and its application in the calculations of high-order binomial moments. Communications in Statistics - Theory and Methods, 48(22):5467–5476, 2019. doi: 10.1080/03610926.2018.1435818. URL https://doi.org/10.1080/03610926.2018.1435818.
  • Zhu et al. (2023) Yuqing Zhu, Xuandong Zhao, Chuan Guo, and Yu-Xiang Wang. "Private prediction strikes back!" Private kernelized nearest neighbors with individual Renyi filter. arXiv preprint arXiv:2306.07381, 2023.

Appendix A Details of Section 3

A.1 Randomized Response with Constant Probability pconstp_{const}

Algorithm 2 Randomized Response Majority (RR)
1:  Input: $K$ $({\epsilon},\Delta)$-DP mechanisms $\{M_{i}\}_{i=1}^{K}$, dataset ${\mathcal{D}}$, privacy allowance $1\leq m\leq K$, failure probability $\delta\geq\Delta\geq 0$
2:  Output: (mϵ,δ)(m{\epsilon},\delta)-DP majority vote of {Mi}i=1K\{M_{i}\}_{i=1}^{K}
3:  Compute a constant probability pconst[0,1]p_{const}\in[0,1]
4:  Flip the $p_{const}$-biased coin
5:  if Head (with probability pconstp_{const}then
6:     ${\mathcal{S}}=\{S_{1},\dots,S_{K}\}$, where $S_{i}\sim M_{i}({\mathcal{D}})$
7:     =i=1KSi{\mathcal{L}}=\sum_{i=1}^{K}S_{i}
8:     Output 𝕀{1K12}{\mathbb{I}}\{\frac{1}{K}{\mathcal{L}}\geq\frac{1}{2}\}
9:  else
10:     Output 0/10/1 with equal probability
11:  end if

Lemma A.1 gives the magnitude of $p_{const}$ in RR (Algorithm 2) for solving Problem 1.1, i.e., such that the output is $(m{\epsilon},\delta)$-DP.

Lemma A.1.

Consider using RR (Algorithm 2) to solve Problem 1.1. Let the majority of KK (ϵ,Δ)({\epsilon},\Delta)-differentially private mechanisms be (τϵ,λ)(\tau{\epsilon},\lambda)-differentially private, where τ[1,K]\tau\in[1,K] and λ[0,1)\lambda\in[0,1) are computed by simple composition (Theorem 2.2) or general composition (Theorem 2.3). If

pconstemϵ1+2δ2(eτϵemϵ+(1+emϵ)λ)eτϵ+1+emϵ1\displaystyle p_{const}\leq\frac{e^{m{\epsilon}}-1+2\delta}{\frac{2(e^{\tau{\epsilon}}-e^{m{\epsilon}}+(1+e^{m{\epsilon}})\lambda)}{e^{\tau{\epsilon}}+1}+e^{m{\epsilon}}-1} (4)

then RR is (mϵ,δ)(m{\epsilon},\delta)-differentially private.

Proof of Lemma A.1.

Let $x\in\{0,1\}$ denote the output of RR. Let $q_{x}=\Pr[{\mathbb{I}}\{\frac{1}{K}{\mathcal{L}}({\mathcal{D}})\geq\frac{1}{2}\}=x]$ and $q_{x}^{\prime}=\Pr[{\mathbb{I}}\{\frac{1}{K}{\mathcal{L}}({\mathcal{D}}^{\prime})\geq\frac{1}{2}\}=x]$ denote the probabilities that the (non-private) majority vote outputs $x$ on adjacent datasets ${\mathcal{D}},{\mathcal{D}}^{\prime}$, where ${\mathcal{L}}({\mathcal{D}})=\sum_{i=1}^{K}M_{i}({\mathcal{D}})$ and ${\mathcal{L}}({\mathcal{D}}^{\prime})=\sum_{i=1}^{K}M_{i}({\mathcal{D}}^{\prime})$. Recall each mechanism $M_{i}$ is $({\epsilon},\Delta)$-differentially private, and the majority of the outputs of $\{M_{i}\}_{i=1}^{K}$ is $(\tau{\epsilon},\lambda)$-differentially private. When $\Delta=0$, simple composition gives $\tau=K$ and $\lambda=0$; when $\Delta>0$, general composition gives $\tau\approx\sqrt{K}$ and $\lambda\approx K\Delta$. By the definition of differential privacy (Definition 2.1), all four of the following constraints on $q_{x},q_{x}^{\prime}$ apply:

qxeτϵqx+λ,and1qxeτϵ(1qx)+λ\displaystyle q_{x}\leq e^{\tau{\epsilon}}q_{x}^{\prime}+\lambda,\quad\text{and}\quad 1-q_{x}^{\prime}\leq e^{\tau{\epsilon}}(1-q_{x})+\lambda
qxeτϵqx+λ,and1qxeτϵ(1qx)+λ\displaystyle q_{x}^{\prime}\leq e^{\tau{\epsilon}}q_{x}+\lambda,\quad\text{and}\quad 1-q_{x}\leq e^{\tau{\epsilon}}(1-q_{x}^{\prime})+\lambda

To ensure RR is (mϵ,δ)(m{\epsilon},\delta)-differentially private, pconstp_{const} needs to be such that for all possible qx,qx[0,1]q_{x},q_{x}^{\prime}\in[0,1],

Pr[RR(𝒟)=x]emϵPr[RR(𝒟)=x]+δ\displaystyle\Pr[\textsf{RR}({\mathcal{D}})=x]\leq e^{m{\epsilon}}\Pr[\textsf{RR}({\mathcal{D}}^{\prime})=x]+\delta (5)
pconstqx+12(1pconst)emϵ(pconstqx+12(1pconst))+δ\displaystyle p_{const}\cdot q_{x}+\frac{1}{2}(1-p_{const})\leq e^{m{\epsilon}}(p_{const}\cdot q_{x}^{\prime}+\frac{1}{2}(1-p_{const}))+\delta (6)
(qxemϵqx+12emϵ12)pconst12emϵ12+δ\displaystyle(q_{x}-e^{m{\epsilon}}q_{x}^{\prime}+\frac{1}{2}e^{m{\epsilon}}-\frac{1}{2})\cdot p_{const}\leq\frac{1}{2}e^{m{\epsilon}}-\frac{1}{2}+\delta (7)

Let h(qx,qx):=qxemϵqx+12emϵ12h(q_{x},q_{x}^{\prime}):=q_{x}-e^{m{\epsilon}}q_{x}^{\prime}+\frac{1}{2}e^{m{\epsilon}}-\frac{1}{2}. The above inequality of pconstp_{const} (Eq. 7) needs to hold for worst case output probabilities qx,qxq_{x}^{*},q_{x}^{\prime*} that cause the maximum privacy loss. That is, pconstp_{const} needs to satisfy

\displaystyle p_{const}\cdot\max_{q_{x},q_{x}^{\prime}}h(q_{x},q_{x}^{\prime})\leq\frac{1}{2}e^{m{\epsilon}}-\frac{1}{2}+\delta (8)

To find the worst case output probabilities, we solve the following Linear Programming (LP) problem:

Objective: maxqx,qxh(qx,qx):=qxemϵqx+12emϵ12\displaystyle\max_{q_{x},q_{x}^{\prime}}\quad h(q_{x},q_{x}^{\prime}):=q_{x}-e^{m{\epsilon}}q_{x}^{\prime}+\frac{1}{2}e^{m{\epsilon}}-\frac{1}{2} (9)
Subject to: 0qx1,0qx1\displaystyle 0\leq q_{x}\leq 1,0\leq q_{x}^{\prime}\leq 1 (10)
qxeτϵqx+λ,1qxeτϵ(1qx)+λ\displaystyle q_{x}\leq e^{\tau{\epsilon}}q_{x}^{\prime}+\lambda,1-q_{x}^{\prime}\leq e^{\tau{\epsilon}}(1-q_{x})+\lambda (11)
qxeτϵqx+λ,1qxeτϵ(1qx)+λ\displaystyle q_{x}^{\prime}\leq e^{\tau{\epsilon}}q_{x}+\lambda,1-q_{x}\leq e^{\tau{\epsilon}}(1-q_{x}^{\prime})+\lambda (12)
Refer to caption
Figure 3: A visualization of the above LP problem.

An optimum of an LP is attained at a vertex (corner) of the feasible region, which is bounded by the optimization constraints. We plot the feasible region ${\mathcal{F}}$ and the objective of the above LP problem in Figure 3. Here, $(q_{x}^{*},q_{x}^{\prime*})=\operatorname*{arg\,max}_{q_{x},q_{x}^{\prime}}h(q_{x},q_{x}^{\prime})\in\{(0,0),(1,1),(0,\lambda),(\lambda,0),(1-\lambda,1),(1,1-\lambda),(\frac{1-\lambda}{e^{\tau{\epsilon}}+1},\frac{e^{\tau{\epsilon}}+\lambda}{e^{\tau{\epsilon}}+1}),(\frac{e^{\tau{\epsilon}}+\lambda}{e^{\tau{\epsilon}}+1},\frac{1-\lambda}{e^{\tau{\epsilon}}+1})\}$. Since $h$ is increasing in $q_{x}$ and decreasing in $q_{x}^{\prime}$, the optimum of the LP problem, i.e., the worst-case probabilities $q_{x}^{*},q_{x}^{\prime*}$, is

qx=eτϵ+λeτϵ+1,qx=1λeτϵ+1\displaystyle q_{x}^{*}=\frac{e^{\tau{\epsilon}}+\lambda}{e^{\tau{\epsilon}}+1},\quad q_{x}^{\prime*}=\frac{1-\lambda}{e^{\tau{\epsilon}}+1} (13)

By Eq. 8,

pconst(eτϵ+λeτϵ+1emϵ1λeτϵ+1+12emϵ12)\displaystyle p_{const}\cdot\Big{(}\frac{e^{\tau{\epsilon}}+\lambda}{e^{\tau{\epsilon}}+1}-e^{m{\epsilon}}\frac{1-\lambda}{e^{\tau{\epsilon}}+1}+\frac{1}{2}e^{m{\epsilon}}-\frac{1}{2}\Big{)} 12(emϵ1)+δ\displaystyle\leq\frac{1}{2}(e^{m{\epsilon}}-1)+\delta (14)
pconst(eτϵemϵ+(1+emϵ)λeτϵ+1+12(emϵ1))\displaystyle p_{const}\cdot\Big{(}\frac{e^{\tau{\epsilon}}-e^{m{\epsilon}}+(1+e^{m{\epsilon}})\lambda}{e^{\tau{\epsilon}}+1}+\frac{1}{2}(e^{m{\epsilon}}-1)\Big{)} 12(emϵ1)+δ\displaystyle\leq\frac{1}{2}(e^{m{\epsilon}}-1)+\delta (15)
pconst\displaystyle p_{const} emϵ1+2δ2(eτϵemϵ+(1+emϵ)λ)eτϵ+1+emϵ1\displaystyle\leq\frac{e^{m{\epsilon}}-1+2\delta}{\frac{2(e^{\tau{\epsilon}}-e^{m{\epsilon}}+(1+e^{m{\epsilon}})\lambda)}{e^{\tau{\epsilon}}+1}+e^{m{\epsilon}}-1} (16)

For small $m{\epsilon}$ and $\tau{\epsilon}$ (in particular, $\tau{\epsilon}<2$), using the approximation $e^{y}\approx 1+y$,

pconstmϵ+2δ2(τϵmϵ+(2+mϵ)λ)τϵ+2+mϵmϵ+2δτϵ+(2+mϵ)λ\displaystyle p_{const}\approx\frac{m{\epsilon}+2\delta}{\frac{2(\tau{\epsilon}-m{\epsilon}+(2+m{\epsilon})\lambda)}{\tau{\epsilon}+2}+m{\epsilon}}\approx\frac{m{\epsilon}+2\delta}{\tau{\epsilon}+(2+m{\epsilon})\lambda} (17)

In the pure differential privacy setting, δ=0,λ=0,τ=K\delta=0,\lambda=0,\tau=K, and so pconstmKp_{const}\approx\frac{m}{K}; and in the approximate differential privacy setting, λ0,δ0,τK\lambda\approx 0,\delta\approx 0,\tau\approx\sqrt{K}, and so pconstmKp_{const}\approx\frac{m}{\sqrt{K}}. ∎
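The closed-form bound in Eq. 16 and its small-$\epsilon$ behavior are easy to sanity-check numerically. The following sketch (the function name `p_const_bound` is ours) assumes nothing beyond the closed form above:

```python
import math

def p_const_bound(eps, m, tau, lam, delta):
    """Largest p_const allowed by Eq. 16, so that RR is (m*eps, delta)-DP."""
    num = math.exp(m * eps) - 1 + 2 * delta
    den = (2 * (math.exp(tau * eps) - math.exp(m * eps)
                + (1 + math.exp(m * eps)) * lam)
           / (math.exp(tau * eps) + 1)
           + math.exp(m * eps) - 1)
    return num / den

# Pure DP (delta = lam = 0, tau = K): for small eps the bound behaves like m / K.
K, m, eps = 101, 11, 0.01
print(p_const_bound(eps, m, tau=K, lam=0.0, delta=0.0))  # on the order of m/K ~ 0.109
```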

A.2 Proof of Lemma 3.1

Algorithm 3 Subsampling Majority (SubMaj)
1:  Input: $K$ $({\epsilon},\Delta)$-DP mechanisms $\{M_{i}\}_{i=1}^{K}$, dataset ${\mathcal{D}}$, privacy allowance $1\leq m\leq K$, failure probability $\delta\geq\Delta\geq 0$
2:  Output: (mϵ,δ)(m{\epsilon},\delta)-DP majority vote of {Mi}i=1K\{M_{i}\}_{i=1}^{K}
3:  ${\mathcal{S}}=\{S_{1},\dots,S_{K}\}$, where $S_{i}\sim M_{i}({\mathcal{D}})$
4:  𝒥m{\mathcal{J}}_{m}\leftarrow mm indices chosen uniformly at random from [K][K] without replacement
5:  $\widehat{{\mathcal{L}}}=\sum_{j\in{\mathcal{J}}_{m}}S_{j}$
6:  Output ${\mathbb{I}}\{\frac{1}{m}\widehat{{\mathcal{L}}}\geq\frac{1}{2}\}$; if $m$ is even and $\widehat{{\mathcal{L}}}=\frac{m}{2}$, break the tie by outputting $0/1$ with equal probability
Lemma A.2 (Restatement of Lemma 3.1).

Consider Problem 1.1, with the privacy allowance m[K]m\in[K]. Consider the data-dependent algorithm that computes (𝒟){\mathcal{L}}({\mathcal{D}}) and then applies RR with probability pγp_{\gamma}. If pγ=γSub(l)p_{\gamma}=\gamma_{Sub}(l), where l{0,1,,K}l\in\{0,1,\dots,K\} is the value of (𝒟){\mathcal{L}}({\mathcal{D}}), i.e., the (random) sum of observed outcomes on dataset 𝒟{\mathcal{D}}, and γSub:{0,1,,K}[0,1]\gamma_{\text{Sub}}:\{0,1,\dots,K\}\rightarrow[0,1] is

γSub(l)=γSub(Kl)\displaystyle\gamma_{Sub}(l)=\gamma_{Sub}(K-l)
={12j=m+12m(lj)(Klmj)(Km)if m is odd12j=m2+1m(lj)(Klmj)(Km)(lm2)(Klm2)(Km)if m is even\displaystyle=\begin{cases}1-2\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}&\text{if $m$ is odd}\\ 1-2\sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}-\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}}&\text{if $m$ is even}\end{cases}

then the majority of mm out of KK subsampled mechanisms without replacement and the output of our data-dependent RR algorithm have the same distribution.

Proof of Lemma 3.1.

Let ${\mathcal{L}}=\sum_{i=1}^{K}S_{i}$ be the sum of observed outcomes from the $K$ mechanisms. Following Algorithm 3, ${\mathcal{J}}_{m}$ denotes the $m$ indices chosen uniformly at random from $[K]$ without replacement. Conditioned on ${\mathcal{L}}$, the subsampled sum $\sum_{j\in{\mathcal{J}}_{m}}S_{j}$ follows a hypergeometric distribution. The output probability of SubMaj is

Pr[SubMaj(𝒟)=1]\displaystyle\Pr[\textsf{SubMaj}({\mathcal{D}})=1] =l=0KPr[SubMaj(𝒟)=1=l]Pr[=l]\displaystyle=\sum_{l=0}^{K}\Pr[\textsf{SubMaj}({\mathcal{D}})=1\mid{\mathcal{L}}=l]\cdot\Pr[{\mathcal{L}}=l] (18)
=l=0KPr[j𝒥mSjm2=l]Pr[=l]\displaystyle=\sum_{l=0}^{K}\Pr[\sum_{j\in{\mathcal{J}}_{m}}S_{j}\geq\frac{m}{2}\mid{\mathcal{L}}=l]\cdot\Pr[{\mathcal{L}}=l] (19)
={l=0K(j=m+12m(lj)(Klmj)(Km))Pr[=l]if m is oddl=0K(j=m2+1m(lj)(Klmj)(Km)+12(lm2)(Klm2)(Km))Pr[=l]if m is even\displaystyle=\begin{cases}\sum_{l=0}^{K}(\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}})\cdot\Pr[{\mathcal{L}}=l]&\text{if $m$ is odd}\\ \sum_{l=0}^{K}(\sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}+\frac{1}{2}\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}})\cdot\Pr[{\mathcal{L}}=l]&\text{if $m$ is even}\end{cases} (20)

Consider an arbitrary noise function $\gamma_{Sub}:\{0,1,\dots,K\}\rightarrow[0,1]$. Let $\textsf{RR-d}({\mathcal{D}})$ denote the output of the data-dependent RR-d on dataset ${\mathcal{D}}$, where RR-d has the non-constant probability set by $\gamma_{Sub}$. The output probability of RR-d is,

Pr[RR-d(𝒟)=1]\displaystyle\Pr[\textsf{RR-d}({\mathcal{D}})=1] =l=0KPr[RR-d(𝒟)=1=l]Pr[=l]\displaystyle=\sum_{l=0}^{K}\Pr[\textsf{RR-d}({\mathcal{D}})=1\mid{\mathcal{L}}=l]\cdot\Pr[{\mathcal{L}}=l] (21)
=l=0K(γSub(l)𝕀{lK+12}+12(1γSub(l)))Pr[=l]\displaystyle=\sum_{l=0}^{K}(\gamma_{Sub}(l)\cdot{\mathbb{I}}\{l\geq\frac{K+1}{2}\}+\frac{1}{2}(1-\gamma_{Sub}(l)))\cdot\Pr[{\mathcal{L}}=l] (22)

We want $\Pr[\textsf{RR-d}({\mathcal{D}})=1]=\Pr[\textsf{SubMaj}({\mathcal{D}})=1]$.

If mm is odd, for any lK12l\leq\frac{K-1}{2}, this is

12(1γSub(l))=j=m+12m(lj)(Klmj)(Km)\displaystyle\frac{1}{2}(1-\gamma_{Sub}(l))=\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}
γSub(l)=12j=m+12m(lj)(Klmj)(Km)\displaystyle\Rightarrow\gamma_{Sub}(l)=1-2\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}} (23)

and for any lK+12l\geq\frac{K+1}{2}, this is

12+12γSub(l)=j=m+12m(lj)(Klmj)(Km)\displaystyle\frac{1}{2}+\frac{1}{2}\gamma_{Sub}(l)=\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}
γSub(l)=2j=m+12m(lj)(Klmj)(Km)1\displaystyle\Rightarrow\gamma_{Sub}(l)=2\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}-1 (24)

Similarly, if mm is even, for any lK12l\leq\frac{K-1}{2}, this is

12(1γSub(l))=j=m2+1m(lj)(Klmj)(Km)+12(lm2)(Klm2)(Km)\displaystyle\frac{1}{2}(1-\gamma_{Sub}(l))=\sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}+\frac{1}{2}\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}}
γSub(l)=12j=m2+1m(lj)(Klmj)(Km)(lm2)(Klm2)(Km)\displaystyle\Rightarrow\gamma_{Sub}(l)=1-2\sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}-\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}} (25)

and for any lK+12l\geq\frac{K+1}{2}, this is

12+12γSub(l)=j=m2+1m(lj)(Klmj)(Km)+12(lm2)(Klm2)(Km)\displaystyle\frac{1}{2}+\frac{1}{2}\gamma_{Sub}(l)=\sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}+\frac{1}{2}\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}}
γSub(l)=2j=m2+1m(lj)(Klmj)(Km)+(lm2)(Klm2)(Km)1\displaystyle\Rightarrow\gamma_{Sub}(l)=2\sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}+\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}}-1 (26)

Next, we show the above $\gamma_{Sub}$ is indeed symmetric around $\frac{K}{2}$. For any $l\leq\frac{K-1}{2}$, we have $K-l\geq\frac{K+1}{2}$. If $m$ is odd,

\gamma_{Sub}(K-l) = 2\sum_{j=\frac{m+1}{2}}^{m}\frac{{K-l\choose j}{l\choose m-j}}{{K\choose m}}-1 = 2\Big{(}1-\sum_{j=0}^{\frac{m-1}{2}}\frac{{K-l\choose j}{l\choose m-j}}{{K\choose m}}\Big{)}-1
= 1-2\sum_{j=0}^{\frac{m-1}{2}}\frac{{K-l\choose j}{l\choose m-j}}{{K\choose m}} = 1-2\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}
=\gamma_{Sub}(l) (27)

where the second equality uses $\sum_{j=0}^{m}{K-l\choose j}{l\choose m-j}={K\choose m}$ and the second-to-last equality reindexes $j\mapsto m-j$.

Similarly, if mm is even,

\gamma_{Sub}(K-l) = 2\sum_{j=\frac{m}{2}+1}^{m}\frac{{K-l\choose j}{l\choose m-j}}{{K\choose m}}+\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}}-1 = 2\Big{(}1-\sum_{j=0}^{\frac{m}{2}-1}\frac{{K-l\choose j}{l\choose m-j}}{{K\choose m}}-\frac{1}{2}\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}}\Big{)}-1
= 1-2\sum_{j=0}^{\frac{m}{2}-1}\frac{{K-l\choose j}{l\choose m-j}}{{K\choose m}}-\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}} = 1-2\sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}-\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}}
=\gamma_{Sub}(l) (28)

Now, combining Eq. 23, Eq. 24 and Eq. 27, if mm is odd, setting γSub\gamma_{Sub} as

γSub(l)=γSub(Kl)=12j=m+12m(lj)(Klmj)(Km)\displaystyle\gamma_{Sub}(l)=\gamma_{Sub}(K-l)=1-2\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}} (29)

makes RR-d have the same output distribution as SubMaj.

Similarly, combining Eq. 25, Eq. 26 and Eq. 28, if mm is even, setting γSub\gamma_{Sub} as

γSub(l)=γSub(Kl)=12j=m2+1m(lj)(Klmj)(Km)(lm2)(Klm2)(Km)\displaystyle\gamma_{Sub}(l)=\gamma_{Sub}(K-l)=1-2\sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}-\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}} (30)

makes RR-d have the same output distribution as SubMaj. ∎
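The closed form of $\gamma_{Sub}$ in Lemma 3.1 can be checked numerically. The sketch below (helper names are ours) implements Eqs. 23–26 with exact binomial coefficients and verifies both the symmetry $\gamma_{Sub}(l)=\gamma_{Sub}(K-l)$ and that the values lie in $[0,1]$:

```python
from math import comb

def hyper(l, j, K, m):
    """Hypergeometric pmf: P(j ones among m draws w/o replacement | l ones in K)."""
    return comb(l, j) * comb(K - l, m - j) / comb(K, m)

def gamma_sub(l, K, m):
    """gamma_Sub from Lemma 3.1: Eqs. 23/25 for small l, Eqs. 24/26 for large l."""
    start = (m + 1) // 2 if m % 2 == 1 else m // 2 + 1
    mid = 0.0 if m % 2 == 1 else hyper(l, m // 2, K, m)
    tail = sum(hyper(l, j, K, m) for j in range(start, m + 1))
    if 2 * l <= K - 1:
        return 1 - 2 * tail - mid   # Eqs. 23 / 25
    return 2 * tail + mid - 1       # Eqs. 24 / 26

K = 11
for m in (3, 4):  # one odd and one even privacy allowance
    vals = [gamma_sub(l, K, m) for l in range(K + 1)]
    assert all(abs(vals[l] - vals[K - l]) < 1e-12 for l in range(K + 1))  # symmetry
    assert all(-1e-12 <= v <= 1 + 1e-12 for v in vals)                    # range [0, 1]
```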

A.3 Proof of Lemma 3.2

Lemma A.3 (Restatement of Lemma 3.2).

Let ${\mathcal{A}}$ be an $({\epsilon},\delta)$-differentially private algorithm, where ${\epsilon}\in(0,\frac{1}{2})$ and $\delta\in[0,\frac{1}{2})$, that computes the majority of $K$ $({\epsilon},\delta)$-differentially private mechanisms $M_{1},\dots,M_{K}$, where $M_{i}:{\mathcal{D}}\rightarrow\{0,1\}$ on dataset ${\mathcal{D}}$ and $\Pr[M_{i}({\mathcal{D}})=1]=p_{i},\forall i\in[K]$. Then, the error ${\mathcal{E}}({\mathcal{A}})\geq|\Pr[g({\mathcal{S}})=1]-\frac{1}{K}\sum_{i=1}^{K}p_{i}|$, where $\Pr[g({\mathcal{S}})=1]$ is the probability of the true majority output being 1, as defined in Definition 1.1.

Proof.

Consider the setting where the $M_{i}$'s are i.i.d., i.e., $\Pr[M_{i}({\mathcal{D}})=1]=p,\forall i\in[K]$ for some $p\in[0,1]$ on any dataset ${\mathcal{D}}$. Then, it suffices to show ${\mathcal{E}}({\mathcal{A}})\geq|\Pr[g({\mathcal{S}})=1]-p|$, because a lower bound in this special case implies a lower bound for the more general case, where the $p_{i}$'s can differ.

Construct a dataset ${\mathcal{D}}_{0}$ and $K$ mechanisms $\{M_{i}\}_{i=1}^{K}$ such that $\Pr[M_{i}({\mathcal{D}}_{0})=1]=\Pr[M_{i}({\mathcal{D}}_{0})=0]=\frac{1}{2}$ and, without loss of generality, $\Pr[{\mathcal{A}}({\mathcal{D}}_{0})=1]\leq\frac{1}{2}$.

Next, we construct a sequence of datasets ${\mathcal{D}}_{1},{\mathcal{D}}_{2},\dots,{\mathcal{D}}_{L}$, such that ${\mathcal{D}}_{j}$ and ${\mathcal{D}}_{j+1}$ are neighboring datasets that differ in one entry, for all $j\in[L-1]$, and $\Pr[M_{i}({\mathcal{D}}_{j})=1]=\frac{1}{2}e^{j{\epsilon}}+\sum_{l=0}^{j-1}e^{l{\epsilon}}\delta$, $\forall i\in[K]$, $\forall j\in[L]$. Choose $L\in{\mathbb{N}}$ such that $\frac{1}{2}e^{L{\epsilon}}+\sum_{l=0}^{L-1}e^{{\epsilon}l}\delta=p$, for some $1\geq p>\frac{1}{2}$.

Now, by definition of differential privacy,

Pr[𝒜(𝒟1)=1]eϵPr[𝒜(𝒟0)=1]+δ\displaystyle\Pr[{\mathcal{A}}({\mathcal{D}}_{1})=1]\leq e^{{\epsilon}}\Pr[{\mathcal{A}}({\mathcal{D}}_{0})=1]+\delta
Pr[𝒜(𝒟2)=1]eϵPr[𝒜(𝒟1)=1]+δe2ϵPr[𝒜(𝒟0)=1]+eϵδ+δ\displaystyle\Pr[{\mathcal{A}}({\mathcal{D}}_{2})=1]\leq e^{{\epsilon}}\Pr[{\mathcal{A}}({\mathcal{D}}_{1})=1]+\delta\leq e^{2{\epsilon}}\Pr[{\mathcal{A}}({\mathcal{D}}_{0})=1]+e^{{\epsilon}}\delta+\delta
\displaystyle\dots
Pr[𝒜(𝒟L)=1]eLϵPr[𝒜(𝒟0)=1]+l=0L1eϵlδeLϵ12+l=0L1eϵlδ=p\displaystyle\Pr[{\mathcal{A}}({\mathcal{D}}_{L})=1]\leq e^{L{\epsilon}}\Pr[{\mathcal{A}}({\mathcal{D}}_{0})=1]+\sum_{l=0}^{L-1}e^{{\epsilon}l}\delta\leq e^{L{\epsilon}}\frac{1}{2}+\sum_{l=0}^{L-1}e^{{\epsilon}l}\delta=p

Since the probability of the true majority being 1 on dataset ${\mathcal{D}}_{L}$ satisfies $\Pr[g({\mathcal{S}})=1]\geq p>\frac{1}{2}$, we have

(𝒜)=|Pr[g(𝒮)=1]Pr[𝒜(𝒟L)=1]|Pr[g(𝒮)=1]p\displaystyle{\mathcal{E}}({\mathcal{A}})=|\Pr[g({\mathcal{S}})=1]-\Pr[{\mathcal{A}}({\mathcal{D}}_{L})=1]|\geq\Pr[g({\mathcal{S}})=1]-p
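The closed form for $\Pr[M_{i}({\mathcal{D}}_{j})=1]$ used in the construction is exactly the $j$-fold unrolling of the DP recursion $p_{j+1}\leq e^{\epsilon}p_{j}+\delta$, taken with equality. A minimal numerical check of this identity (the function name is ours):

```python
import math

def amplified(j, eps, delta):
    """Closed form Pr[M_i(D_j) = 1] = (1/2) e^{j*eps} + sum_{l=0}^{j-1} e^{l*eps} delta."""
    return 0.5 * math.exp(j * eps) + sum(math.exp(l * eps) * delta for l in range(j))

eps, delta = 0.1, 0.01
p = 0.5  # Pr[M_i(D_0) = 1]
for j in range(1, 6):
    p = math.exp(eps) * p + delta          # one step of the DP recursion
    assert abs(p - amplified(j, eps, delta)) < 1e-12
```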

A.4 Proof of Lemma 3.3

Lemma A.4 (Restatement of Lemma 3.3).

Let ${\mathcal{A}}$ be any randomized algorithm to compute the majority function $g$ on ${\mathcal{S}}$ such that for all ${\mathcal{S}}$, $\Pr[{\mathcal{A}}({\mathcal{S}})=g({\mathcal{S}})]\geq 1/2$ (i.e., ${\mathcal{A}}$ is at least as good as a random guess). Then, there exists a general function $\gamma:\{0,1\}^{K+1}\rightarrow[0,1]$ such that if one sets $p_{\gamma}$ by $\gamma({\mathcal{S}})$ in DaRRM, the output distribution of $\textsf{DaRRM}_{\gamma}$ is the same as the output distribution of ${\mathcal{A}}$.

Proof of Lemma 3.3.

For some 𝒟{\mathcal{D}} and conditioned on 𝒮{\mathcal{S}}, we see that by definition Pr[DaRRMγ(𝒮)=g(𝒮)]=γ(𝒮)+(1/2)(1γ(𝒮))\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{S}})=g({\mathcal{S}})]=\gamma({\mathcal{S}})+(1/2)(1-\gamma({\mathcal{S}})). We want to set γ\gamma such that Pr[DaRRMγ(𝒮)=g(𝒮)]=Pr[𝒜(𝒮)=g(𝒮)]\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{S}})=g({\mathcal{S}})]=\Pr[{\mathcal{A}}({\mathcal{S}})=g({\mathcal{S}})]. Therefore, we set γ(𝒮)=2Pr[𝒜(𝒮)=g(𝒮)]1\gamma({\mathcal{S}})=2\Pr[{\mathcal{A}}({\mathcal{S}})=g({\mathcal{S}})]-1.

Lastly, we need to justify that $\gamma({\mathcal{S}})\in[0,1]$. Since $\Pr[{\mathcal{A}}({\mathcal{S}})=g({\mathcal{S}})]\leq 1$, we have $\gamma({\mathcal{S}})\leq 2\cdot 1-1=1$, and non-negativity follows from the assumption $\Pr[{\mathcal{A}}({\mathcal{S}})=g({\mathcal{S}})]\geq 1/2$. ∎
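The mapping $\gamma({\mathcal{S}})=2\Pr[{\mathcal{A}}({\mathcal{S}})=g({\mathcal{S}})]-1$ and the accuracy it induces in DaRRM are simple enough to verify mechanically. A minimal sketch (function names are ours):

```python
def gamma_from_accuracy(q):
    """Lemma 3.3: map an accuracy q = Pr[A(S) = g(S)] >= 1/2 to the noise gamma(S)."""
    assert q >= 0.5, "A must be at least as good as a random guess"
    return 2.0 * q - 1.0

def darrm_accuracy(gamma):
    """Pr[DaRRM_gamma(S) = g(S)] = gamma + (1/2)(1 - gamma)."""
    return gamma + 0.5 * (1.0 - gamma)

for q in (0.5, 0.7, 0.93, 1.0):
    g = gamma_from_accuracy(q)
    assert 0.0 <= g <= 1.0                       # gamma is a valid probability
    assert abs(darrm_accuracy(g) - q) < 1e-12    # DaRRM_gamma matches A exactly
```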

A.5 Proof of Lemma 3.4

Lemma A.5 (Restatement of Lemma 3.4).

Consider using DaRRM (Algorithm 1) to solve Problem 1.1, and let $\alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l]$ and $\alpha_{l}^{\prime}=\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]$, where ${\mathcal{D}}$ and ${\mathcal{D}}^{\prime}$ are adjacent datasets and $l\in\{0,\dots,K\}$. For a noise function $\gamma:\{0,1,\dots,K\}\rightarrow[0,1]$ such that $\gamma(l)=\gamma(K-l),\forall l$, $\textsf{DaRRM}_{\gamma}$ is $(m\epsilon,\delta)$-differentially private if and only if for all $\alpha_{l},\alpha_{l}^{\prime}$, the following holds,

f(p1,,pK,p1,,pK;γ)emϵ1+2δ\displaystyle f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma)\leq e^{m{\epsilon}}-1+2\delta (31)

where ff is called the privacy cost objective and

f(p1,,pK,p1,,pK;γ):=l=0K12(emϵαlαl)γ(l)+l=K+12K(αlemϵαl)γ(l)\displaystyle f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma):=\sum_{l=0}^{\frac{K-1}{2}}(e^{m{\epsilon}}\alpha_{l}^{\prime}-\alpha_{l})\cdot\gamma(l)+\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\cdot\gamma(l)
Proof of Lemma 3.4.

By the definition of differential privacy (Definition 2.1),

DaRRMγ\textsf{DaRRM}_{\gamma} is (mϵ,δ)(m{\epsilon},\delta)-differentially private
Pr[DaRRMγ(𝒟)=1]emϵPr[DaRRMγ(𝒟)=1]+δ,\displaystyle\iff\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]\leq e^{m{\epsilon}}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1]+\delta,
and Pr[DaRRMγ(𝒟)=0]emϵPr[DaRRMγ(𝒟)=0]+δ, adjacent datasets 𝒟,𝒟\displaystyle\text{and }\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=0]\leq e^{m{\epsilon}}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=0]+\delta,\quad\forall\text{ adjacent datasets ${\mathcal{D}},{\mathcal{D}}^{\prime}$} (32)

Let random variables ${\mathcal{L}}({\mathcal{D}})=\sum_{i=1}^{K}S_{i}({\mathcal{D}})$ and ${\mathcal{L}}({\mathcal{D}}^{\prime})=\sum_{i=1}^{K}S_{i}({\mathcal{D}}^{\prime})$ be the sums of observed outcomes on adjacent datasets ${\mathcal{D}}$ and ${\mathcal{D}}^{\prime}$, based on which one sets $p_{\gamma}$ in DaRRM. Let $\alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l]$ and $\alpha_{l}^{\prime}=\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]$, $\forall l\in\{0,1,\dots,K\}$.

Consider the output being 1.

Pr[DaRRMγ(𝒟)=1]emϵPr[DaRRMγ(𝒟)=1]+δ\displaystyle\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]\leq e^{m{\epsilon}}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1]+\delta (33)
l=0KPr[DaRRMγ(𝒟)=1(𝒟)=l]Pr[(𝒟)=l]\displaystyle\iff\sum_{l=0}^{K}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1\mid{\mathcal{L}}({\mathcal{D}})=l]\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l] (34)
emϵ(l=0KPr[DaRRMγ(𝒟)=1(𝒟)=l]Pr[(𝒟)=l])+δ\displaystyle\leq e^{m{\epsilon}}\Big{(}\sum_{l=0}^{K}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1\mid{\mathcal{L}}({\mathcal{D}}^{\prime})={l}]\cdot\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]\Big{)}+\delta
l=0K(γ(l)𝕀{lK2}+12(1γ(l)))Pr[(𝒟)=l]\displaystyle\iff\sum_{l=0}^{K}\Big{(}\gamma(l)\cdot{\mathbb{I}}\{l\geq\frac{K}{2}\}+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l] (35)
\leq e^{m{\epsilon}}\Big{(}\sum_{l=0}^{K}\Big{(}\gamma(l)\cdot{\mathbb{I}}\{l\geq\frac{K}{2}\}+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]\Big{)}+\delta
l=K+12K(γ(l)+12(1γ(l)))Pr[(𝒟)=l]+l=0K1212(1γ(l))Pr[(𝒟)=l]\displaystyle\iff\sum_{l=\frac{K+1}{2}}^{K}\Big{(}\gamma(l)+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l]+\sum_{l=0}^{\frac{K-1}{2}}\frac{1}{2}(1-\gamma(l))\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l] (36)
\leq e^{m{\epsilon}}\Big{(}\sum_{l=\frac{K+1}{2}}^{K}\Big{(}\gamma(l)+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]\Big{)}+e^{m{\epsilon}}\Big{(}\sum_{l=0}^{\frac{K-1}{2}}\frac{1}{2}(1-\gamma(l))\cdot\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]\Big{)}+\delta
l=K+12K12γ(l)αll=0K1212γ(l)αl+12\displaystyle\iff\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}\gamma(l)\alpha_{l}-\sum_{l=0}^{\frac{K-1}{2}}\frac{1}{2}\gamma(l)\alpha_{l}+\frac{1}{2} (37)
emϵl=K+12K12γ(l)αlemϵl=0K1212γ(l)αl+12emϵ+δ\displaystyle\leq e^{m{\epsilon}}\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}\gamma(l)\alpha_{l}^{\prime}-e^{m{\epsilon}}\sum_{l=0}^{\frac{K-1}{2}}\frac{1}{2}\gamma(l)\alpha_{l}^{\prime}+\frac{1}{2}e^{m{\epsilon}}+\delta
l=K+12K(αlemϵαl)γ(l)l=0K12(αlemϵαl)γ(l)emϵ1+2δ\displaystyle\iff\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)-\sum_{l=0}^{\frac{K-1}{2}}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)\leq e^{m{\epsilon}}-1+2\delta (38)

Similarly, consider the output being 0.

Pr[DaRRMγ(𝒟)=0]emϵPr[DaRRMγ(𝒟)=0]+δ\displaystyle\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=0]\leq e^{m{\epsilon}}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=0]+\delta (39)
l=0KPr[DaRRMγ(𝒟)=0(𝒟)=l]Pr[(𝒟)=l]\displaystyle\iff\sum_{l=0}^{K}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=0\mid{\mathcal{L}}({\mathcal{D}})=l]\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l] (40)
emϵ(l=0KPr[DaRRMγ(𝒟)=0(𝒟)=l]Pr[(𝒟)=l])+δ\displaystyle\leq e^{m{\epsilon}}\Big{(}\sum_{l=0}^{K}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=0\mid{\mathcal{L}}({\mathcal{D}}^{\prime})=l]\cdot\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]\Big{)}+\delta
l=0K(γ(l)𝕀{l<K2}+12(1γ(l)))Pr[(𝒟)=l]\displaystyle\iff\sum_{l=0}^{K}\Big{(}\gamma(l)\cdot{\mathbb{I}}\{l<\frac{K}{2}\}+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l] (41)
\leq e^{m{\epsilon}}\Big{(}\sum_{l=0}^{K}\Big{(}\gamma(l)\cdot{\mathbb{I}}\{l<\frac{K}{2}\}+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]\Big{)}+\delta
l=0K12(γ(l)+12(1γ(l)))Pr[(𝒟)=l]+l=K+12K12(1γ(l))Pr[(𝒟)=l]\displaystyle\iff\sum_{l=0}^{\frac{K-1}{2}}\Big{(}\gamma(l)+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l]+\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}(1-\gamma(l))\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l] (42)
emϵ(l=0K12(γ(l)+12(1γ(l)))Pr[(𝒟)=l]+l=K+12K12(1γ(l))Pr[(𝒟)=l])+δ\displaystyle\leq e^{m{\epsilon}}\Big{(}\sum_{l=0}^{\frac{K-1}{2}}\Big{(}\gamma(l)+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]+\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}(1-\gamma(l))\cdot\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l]\Big{)}+\delta
l=0K1212γ(l)αll=K+12K12γ(l)αl+12\displaystyle\iff\sum_{l=0}^{\frac{K-1}{2}}\frac{1}{2}\gamma(l)\alpha_{l}-\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}\gamma(l)\alpha_{l}+\frac{1}{2} (43)
emϵl=0K1212γ(l)αlemϵl=K+12K12γ(l)αl+12emϵ+δ\displaystyle\leq e^{m{\epsilon}}\sum_{l=0}^{\frac{K-1}{2}}\frac{1}{2}\gamma(l)\alpha_{l}^{\prime}-e^{m{\epsilon}}\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}\gamma(l)\alpha_{l}^{\prime}+\frac{1}{2}e^{m{\epsilon}}+\delta
l=0K12(αlemϵαl)γ(l)l=K+12K(αlemϵαl)γ(l)emϵ1+2δ\displaystyle\iff\sum_{l=0}^{\frac{K-1}{2}}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)-\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)\leq e^{m{\epsilon}}-1+2\delta (44)

Therefore, plugging Eq. 38 and Eq. 44 into Eq. 32,

DaRRMγ\textsf{DaRRM}_{\gamma} is (mϵ,δ)(m{\epsilon},\delta)-differentially private
l=K+12K(αlemϵαl)γ(l)l=0K12(αlemϵαl)γ(l)emϵ1+2δ\displaystyle\iff\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)-\sum_{l=0}^{\frac{K-1}{2}}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)\leq e^{m{\epsilon}}-1+2\delta (45)
andl=0K12(αlemϵαl)γ(l)l=K+12K(αlemϵαl)γ(l)emϵ1+2δ\displaystyle\text{and}\sum_{l=0}^{\frac{K-1}{2}}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)-\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)\leq e^{m{\epsilon}}-1+2\delta (46)

where αl=Pr[(𝒟)=l]\alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l] and αl=Pr[(𝒟)=l]\alpha_{l}^{\prime}=\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l], l{0,1,,K}\forall l\in\{0,1,\dots,K\} and 𝒟,𝒟{\mathcal{D}},{\mathcal{D}}^{\prime} are any adjacent datasets.
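To make the privacy cost objective $f$ concrete, here is a small numerical sketch (helper names are ours). With $\gamma\equiv 1$, DaRRM reduces to the plain majority, which is $K{\epsilon}$-DP by simple composition, so $f$ must respect the bound of Eq. 31 with $m=K$ for any ${\epsilon}$-DP-feasible adjacent pair of output probabilities:

```python
import math

def poisson_binomial_pmf(ps):
    """pmf of L = sum_i Bernoulli(p_i), computed by sequential convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for l, mass in enumerate(pmf):
            nxt[l] += mass * (1.0 - p)
            nxt[l + 1] += mass * p
        pmf = nxt
    return pmf

def privacy_cost(ps, ps_prime, gamma, m, eps):
    """Privacy cost objective f of Lemma 3.4 (K odd, gamma symmetric)."""
    K = len(ps)
    alpha = poisson_binomial_pmf(ps)
    alpha_p = poisson_binomial_pmf(ps_prime)
    e = math.exp(m * eps)
    low = sum((e * alpha_p[l] - alpha[l]) * gamma[l] for l in range((K - 1) // 2 + 1))
    high = sum((alpha[l] - e * alpha_p[l]) * gamma[l] for l in range((K + 1) // 2, K + 1))
    return low + high

K, eps = 5, 0.1
ps, ps_prime = [0.5] * K, [0.54] * K   # a feasible eps-DP adjacent pair
gamma = [1.0] * (K + 1)                # no noise: DaRRM is the plain majority
bound = math.exp(K * eps) - 1.0        # Eq. 31 with m = K, delta = 0
assert privacy_cost(ps, ps_prime, gamma, K, eps) <= bound
assert privacy_cost(ps_prime, ps, gamma, K, eps) <= bound
```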

Next, we show if γ\gamma is symmetric around K2\frac{K}{2}, i.e., γ(l)=γ(Kl)\gamma(l)=\gamma(K-l), satisfying either one of Eq.  45 or Eq.  46 implies satisfying the other one. Following Eq. 45,

l=K+12K(αlemϵαl)γ(l)l=0K12(αlemϵαl)γ(l)emϵ1+2δ\displaystyle\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)-\sum_{l=0}^{\frac{K-1}{2}}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)\leq e^{m{\epsilon}}-1+2\delta (47)
\displaystyle\iff\sum_{l=0}^{\frac{K-1}{2}}(\alpha_{K-l}-e^{m{\epsilon}}\alpha_{K-l}^{\prime})\cdot\gamma(K-l)-\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{K-l}-e^{m{\epsilon}}\alpha_{K-l}^{\prime})\cdot\gamma(K-l)\leq e^{m{\epsilon}}-1+2\delta (48)
\displaystyle\iff\sum_{l=0}^{\frac{K-1}{2}}(\alpha_{K-l}-e^{m{\epsilon}}\alpha_{K-l}^{\prime})\cdot\gamma(l)-\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{K-l}-e^{m{\epsilon}}\alpha_{K-l}^{\prime})\cdot\gamma(l)\leq e^{m{\epsilon}}-1+2\delta (49)
where the first step substitutes l\mapsto K-l in both sums and the last step uses \gamma(l)=\gamma(K-l)

For the purpose of analysis, we rewrite Eq. 46 as

\displaystyle\sum_{l=0}^{\frac{K-1}{2}}(\widetilde{\alpha}_{l}-e^{m{\epsilon}}\widetilde{\alpha}_{l}^{\prime})\cdot\gamma(l)-\sum_{l=\frac{K+1}{2}}^{K}(\widetilde{\alpha}_{l}-e^{m{\epsilon}}\widetilde{\alpha}_{l}^{\prime})\cdot\gamma(l)\leq e^{m{\epsilon}}-1+2\delta (50)

and proceed by showing Eq. 49 \iff Eq. 50.

Recall pi=Pr[Mi(𝒟)=1]p_{i}=\Pr[M_{i}({\mathcal{D}})=1] and pi=Pr[Mi(𝒟)=1]p_{i}^{\prime}=\Pr[M_{i}({\mathcal{D}}^{\prime})=1]. Observe (𝒟)PoissonBinomial({pi}i=1K){\mathcal{L}}({\mathcal{D}})\sim\text{PoissonBinomial}(\{p_{i}\}_{i=1}^{K}) and (𝒟)PoissonBinomial({pi}i=1K){\mathcal{L}}({\mathcal{D}}^{\prime})\sim\text{PoissonBinomial}(\{p_{i}^{\prime}\}_{i=1}^{K}). Let Fl={𝒜:|𝒜|=l,𝒜[K]}F_{l}=\{{\mathcal{A}}:|{\mathcal{A}}|=l,{\mathcal{A}}\subseteq[K]\}, for any l{0,,K}l\in\{0,\dots,K\}, denote the set of all subsets of ll integers that can be selected from [K][K]. Let 𝒜c=[K]𝒜{\mathcal{A}}^{c}=[K]\setminus{\mathcal{A}} be 𝒜{\mathcal{A}}’s complement set. Notice FKl={𝒜c:𝒜Fl}F_{K-l}=\{{\mathcal{A}}^{c}:{\mathcal{A}}\in F_{l}\}.

Since \alpha_{l} is the pmf of the Poisson Binomial distribution at l, it follows that

αl=Pr[(𝒟)=l]=𝒜FlΠi𝒜piΠj𝒜c(1pj)\displaystyle\alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l]=\sum_{{\mathcal{A}}\in F_{l}}\Pi_{i\in{\mathcal{A}}}p_{i}\Pi_{j\in{\mathcal{A}}^{c}}(1-p_{j}) (51)

Consider \beta_{i}=1-p_{i},\forall i\in[K] and a new random variable {\mathcal{L}}^{\beta}\sim\text{PoissonBinomial}(\{\beta_{i}\}_{i=1}^{K}), and let \widetilde{\alpha}_{l}=\Pr[{\mathcal{L}}^{\beta}=l]. Observe that

\displaystyle\widetilde{\alpha}_{l}=\Pr[{\mathcal{L}}^{\beta}=l] =\sum_{{\mathcal{A}}\in F_{l}}\Pi_{i\in{\mathcal{A}}}\beta_{i}\Pi_{j\in{\mathcal{A}}^{c}}(1-\beta_{j})=\sum_{{\mathcal{A}}\in F_{l}}\Pi_{i\in{\mathcal{A}}}(1-p_{i})\Pi_{j\in{\mathcal{A}}^{c}}p_{j}
\displaystyle=\sum_{{\mathcal{A}}^{c}\in F_{K-l}}\Pi_{i\in{\mathcal{A}}}(1-p_{i})\Pi_{j\in{\mathcal{A}}^{c}}p_{j}=\sum_{{\mathcal{A}}\in F_{K-l}}\Pi_{i\in{\mathcal{A}}}p_{i}\Pi_{j\in{\mathcal{A}}^{c}}(1-p_{j})
\displaystyle=\alpha_{K-l} (52)

Similarly, consider \beta_{i}^{\prime}=1-p_{i}^{\prime},\forall i\in[K] and a new random variable {\mathcal{L}}^{\prime\beta}\sim\text{PoissonBinomial}(\{\beta_{i}^{\prime}\}_{i=1}^{K}), and let \widetilde{\alpha}_{l}^{\prime}=\Pr[{\mathcal{L}}^{\prime\beta}=l]. Then, \widetilde{\alpha}_{l}^{\prime}=\alpha^{\prime}_{K-l}.

Since Eq. 49 holds for all feasible \alpha_{K-l}, \alpha_{K-l}^{\prime}, relabeling \alpha_{K-l} as \widetilde{\alpha}_{l} and \alpha_{K-l}^{\prime} as \widetilde{\alpha}_{l}^{\prime} shows that Eq. 50 holds for all \widetilde{\alpha}_{l},\widetilde{\alpha}_{l}^{\prime} in the K-simplex.
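The reflection identity \widetilde{\alpha}_{l}=\alpha_{K-l} can also be verified numerically. The sketch below is our own illustration (the helper `poisson_binomial_pmf` is not from the paper): it builds the Poisson Binomial pmf by convolving one Bernoulli factor at a time, then checks that flipping every p_i to 1-p_i reverses the pmf.

```python
import random

def poisson_binomial_pmf(ps):
    # pmf[l] = Pr[sum of independent Bernoulli(p_i) equals l], by sequential convolution
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for l, mass in enumerate(pmf):
            nxt[l] += mass * (1 - p)      # this mechanism outputs 0
            nxt[l + 1] += mass * p        # this mechanism outputs 1
        pmf = nxt
    return pmf

random.seed(0)
K = 7
ps = [random.random() for _ in range(K)]
alpha = poisson_binomial_pmf(ps)                       # alpha[l] = Pr[L(D) = l]
alpha_tilde = poisson_binomial_pmf([1 - p for p in ps])

# Flipping every p_i reverses the pmf: alpha_tilde[l] == alpha[K - l]
assert all(abs(alpha_tilde[l] - alpha[K - l]) < 1e-12 for l in range(K + 1))
```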

The above implies Eq. 45 \iff Eq. 46. Therefore,

DaRRMγ\textsf{DaRRM}_{\gamma} is (mϵ,δ)(m{\epsilon},\delta)-differentially private
l=K+12K(αlemϵαl)γ(l)l=0K12(αlemϵαl)γ(l):=f(p1,,pK,p1,,pK;γ)emϵ1+2δ\displaystyle\iff\underbrace{\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)-\sum_{l=0}^{\frac{K-1}{2}}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)}_{:=f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma)}\leq e^{m{\epsilon}}-1+2\delta (53)
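To sanity-check Eq. 53 numerically, one can also compute \Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1] directly from the algorithm (output the majority with probability \gamma({\mathcal{L}}), otherwise a fair coin) and verify the algebraic identity f=2(\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]-e^{m{\epsilon}}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1])+e^{m{\epsilon}}-1, which is how Eq. 53 follows from the output probabilities. The sketch below is our own illustration; all function names are ours, and the chosen p_i, p'_i, \gamma are arbitrary.

```python
import math, random

def pb_pmf(ps):
    # Poisson Binomial pmf via sequential convolution
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for l, mass in enumerate(pmf):
            nxt[l] += mass * (1 - p)
            nxt[l + 1] += mass * p
        pmf = nxt
    return pmf

def privacy_cost(ps, ps_prime, gamma, m_eps):
    # f(p_1, ..., p_K, p'_1, ..., p'_K; gamma) as in Eq. 53
    K = len(ps)
    a, ap = pb_pmf(ps), pb_pmf(ps_prime)
    high = sum((a[l] - math.exp(m_eps) * ap[l]) * gamma[l] for l in range((K + 1) // 2, K + 1))
    low = sum((a[l] - math.exp(m_eps) * ap[l]) * gamma[l] for l in range((K + 1) // 2))
    return high - low

def pr_output_one(ps, gamma):
    # Pr[DaRRM_gamma(D) = 1]: output the majority w.p. gamma(L), else flip a fair coin
    K = len(ps)
    a = pb_pmf(ps)
    return sum((gamma[l] * (l > K / 2) + 0.5 * (1 - gamma[l])) * a[l] for l in range(K + 1))

random.seed(1)
K, m_eps = 5, 0.2
ps = [random.uniform(0.3, 0.7) for _ in range(K)]
ps_prime = [min(1.0, p * math.exp(0.1)) for p in ps]   # any neighboring probabilities work here
gamma = [abs(2 * l - K) / K for l in range(K + 1)]     # an arbitrary symmetric noise function

f_val = privacy_cost(ps, ps_prime, gamma, m_eps)
identity = 2 * (pr_output_one(ps, gamma)
                - math.exp(m_eps) * pr_output_one(ps_prime, gamma)) + math.exp(m_eps) - 1
assert abs(f_val - identity) < 1e-12
```

With \gamma\equiv 0 (a pure fair coin), f vanishes for any inputs, consistent with a coin flip being perfectly private.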

Appendix B Details of Section 4: Provable Privacy Amplification

In this section, we consider Problem 1.1 in the pure differential privacy and i.i.d. mechanisms setting. That is, \delta=\Delta=0 and p=p_{i}=\Pr[M_{i}({\mathcal{D}})=1],p^{\prime}=p_{i}^{\prime}=\Pr[M_{i}({\mathcal{D}}^{\prime})=1],\forall i\in[K]. Our goal is to search for a good noise function \gamma such that: 1) \textsf{DaRRM}_{\gamma} is m{\epsilon}-DP, and 2) \textsf{DaRRM}_{\gamma} achieves higher utility than that of the baselines (see Section 3) under a fixed privacy loss. Our main finding of such a \gamma function is presented in Theorem 4.1, which states that, given a privacy allowance m\in[K], one can indeed output the majority of 2m-1 subsampled mechanisms, instead of just m as indicated by simple composition. Later, we formally verify in Lemma B.11, Section B.3 that taking the majority of more mechanisms strictly increases the utility.

To start, by Lemma 3.4, for any noise function γ\gamma, γ\gamma satisfying goal 1) is equivalent to satisfying

\displaystyle f(p,p^{\prime};\gamma)\leq e^{m{\epsilon}}-1 (54)

where f(p,p^{\prime};\gamma)=\sum_{l=0}^{\frac{K-1}{2}}(e^{m{\epsilon}}\alpha_{l}^{\prime}-\alpha_{l})\cdot\gamma(l)+\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\cdot\gamma(l) refers to the privacy cost objective (see Lemma 3.4) in the i.i.d. mechanisms setting, and recall \alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l] and \alpha_{l}^{\prime}=\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l], \forall l\in\{0,1,\dots,K\}. Notice in this setting, {\mathcal{L}}({\mathcal{D}})\sim\text{Binomial}(K,p), and {\mathcal{L}}({\mathcal{D}}^{\prime})\sim\text{Binomial}(K,p^{\prime}).

Monotonicity Assumption. For analysis, we restrict our search for a γ\gamma function with good utility to the class with a mild monotonicity assumption: γ(l)γ(l+1),lK12\gamma(l)\geq\gamma(l+1),\forall l\leq\frac{K-1}{2} and γ(l)γ(l+1),lK+12\gamma(l)\leq\gamma(l+1),\forall l\geq\frac{K+1}{2}. This matches our intuition that as (𝒟)=i=1KSi{\mathcal{L}}({\mathcal{D}})=\sum_{i=1}^{K}S_{i}, i.e., the number of mechanisms outputting 1, approaches 0 or KK, there is a clearer majority and so not much noise is needed to ensure privacy, which implies a larger value of γ\gamma.

Figure 4: The feasible region {\mathcal{F}} is plotted as the blue area. The four boundaries are implied by p,pp,p^{\prime} satisfying ϵ{\epsilon}-differential privacy.

Roadmap of Proof of Theorem 4.1. Since \gamma needs to enable Eq. 54 to be satisfied for all p,p^{\prime}\in[0,1], we begin by showing characteristics of the worst case probabilities, i.e., (p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})}f(p,p^{\prime};\gamma), given any \gamma:\{0,1,\dots,K\}\rightarrow[0,1] that is symmetric around \frac{K}{2} and that satisfies the above monotonicity assumption, in Lemma B.1, Section B.1. We call (p^{*},p^{\prime*}) the worst case probabilities, since they incur the largest privacy loss. Later in Section B.2, we present the main proof of Theorem 4.1, where we focus on searching for a good \gamma that enables f(p^{*},p^{\prime*};\gamma)\leq e^{m{\epsilon}}-1, based on the characteristics of (p^{*},p^{\prime*}) in Lemma B.1, to ensure \textsf{DaRRM}_{\gamma} is m{\epsilon}-differentially private.

B.1 Characterizing the Worst Case Probabilities

First, note (p,p)(p,p^{\prime}) are close to each other and lie in a feasible region {\mathcal{F}}, due to each mechanism MiM_{i} being ϵ{\epsilon}-differentially private; and so does (p,p)(p^{*},p^{\prime*}). The feasible region, as illustrated in Figure 4, is bounded by (a) peϵpp^{\prime}\leq e^{{\epsilon}}p (b) peϵpp\leq e^{{\epsilon}}p^{\prime} (c) 1peϵ(1p)1-p^{\prime}\leq e^{{\epsilon}}(1-p), and (d) 1peϵ(1p)1-p\leq e^{{\epsilon}}(1-p^{\prime}) , where the four boundaries are derived from the definition of differential privacy. Therefore, we only need to search for (p,p)=argmax(p,p)f(p,p;γ)(p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma).
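The four boundary constraints translate directly into a membership test for {\mathcal{F}}. The sketch below is our own illustration (the function name is ours): it encodes boundaries (a)-(d) and checks one feasible and one infeasible pair.

```python
import math

def in_feasible_region(p, p_prime, eps):
    # Boundaries (a)-(d): each mechanism M_i is eps-DP, so (p, p') cannot be too far apart
    e = math.exp(eps)
    return (p_prime <= e * p and p <= e * p_prime
            and 1 - p_prime <= e * (1 - p) and 1 - p <= e * (1 - p_prime))

assert in_feasible_region(0.5, 0.5, 0.1)        # identical output probabilities: always feasible
assert in_feasible_region(0.52, 0.50, 0.1)      # slightly apart: still within the e^eps ratios
assert not in_feasible_region(0.9, 0.1, 0.1)    # violates p <= e^eps * p'
```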

Next, we show that given γ\gamma satisfying certain conditions, (p,p)(p^{*},p^{\prime*}) can only be on two of the four boundaries of {\mathcal{F}} in Lemma B.1 — that is, either p=eϵpp^{*}=e^{{\epsilon}}p^{\prime}, i.e., on the blue line in Figure 4, or 1p=eϵ(1p)1-p^{\prime*}=e^{{\epsilon}}(1-p^{*}), i.e., on the orange line in Figure 4.

Lemma B.1 (Characteristics of worst case probabilities).

For any noise function γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1] that is 1) symmetric around K2\frac{K}{2}, 2) satisfies the monotonicity assumption, and 3) γ(K12)>0\gamma(\frac{K-1}{2})>0 and γ(K+12)>0\gamma(\frac{K+1}{2})>0, the worst case probabilities given γ\gamma, (p,p)=argmax(p,p)f(p,p;γ)(p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma), must satisfy one of the following two equalities:

p=eϵp,\displaystyle p^{*}=e^{{\epsilon}}p^{\prime*},\quad p[0,1eϵ+1],p[0,11+eϵ]\displaystyle\forall p^{*}\in[0,\frac{1}{e^{-{\epsilon}}+1}],p^{\prime*}\in[0,\frac{1}{1+e^{{\epsilon}}}]
or 1p=eϵ(1p),\displaystyle\text{or }\quad 1-p^{\prime*}=e^{{\epsilon}}(1-p^{*}),\quad p[11+eϵ,1],p[11+eϵ,1]\displaystyle\forall p^{*}\in[\frac{1}{1+e^{-{\epsilon}}},1],p^{\prime*}\in[\frac{1}{1+e^{{\epsilon}}},1]

To show Lemma B.1, we first show in Lemma B.2 that the search of (p,p)(p^{*},p^{\prime*}) can be refined to one of the four boundaries of {\mathcal{F}}, via a careful gradient analysis of f(p,p;γ)f(p,p^{\prime};\gamma) in {\mathcal{F}}, and then show in Lemma B.3 that the search of (p,p)(p^{*},p^{\prime*}) can be further refined to two of the four boundaries, due to symmetry of p,pp,p^{\prime}. Lemma B.1 directly follows from the two.

Lemma B.2.

For any noise function γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1] that is 1) symmetric around K2\frac{K}{2}, 2) satisfies the monotonicity assumption, and 3) γ(K12)>0\gamma(\frac{K-1}{2})>0 and γ(K+12)>0\gamma(\frac{K+1}{2})>0, the worst case probabilities given γ\gamma, (p,p)=argmax(p,p)f(p,p;γ)(p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma), must satisfy one of the following four equalities:

p=eϵp,\displaystyle p^{\prime*}=e^{{\epsilon}}p^{*}, p[0,11+eϵ],p[0,11+eϵ]\displaystyle\forall p^{*}\in[0,\frac{1}{1+e^{{\epsilon}}}],p^{\prime*}\in[0,\frac{1}{1+e^{-{\epsilon}}}]
p=eϵp,\displaystyle p^{*}=e^{{\epsilon}}p^{\prime*}, p[0,1eϵ+1],p[0,11+eϵ]\displaystyle\forall p^{*}\in[0,\frac{1}{e^{-{\epsilon}}+1}],p^{\prime*}\in[0,\frac{1}{1+e^{{\epsilon}}}]
1p=eϵ(1p),\displaystyle 1-p^{*}=e^{{\epsilon}}(1-p^{\prime*}), p[11+eϵ,1],p[11+eϵ,1]\displaystyle\forall p^{*}\in[\frac{1}{1+e^{{\epsilon}}},1],p^{\prime*}\in[\frac{1}{1+e^{-{\epsilon}}},1]
1p=eϵ(1p),\displaystyle 1-p^{\prime*}=e^{{\epsilon}}(1-p^{*}), p[11+eϵ,1],p[11+eϵ,1]\displaystyle\forall p^{*}\in[\frac{1}{1+e^{-{\epsilon}}},1],p^{\prime*}\in[\frac{1}{1+e^{{\epsilon}}},1]
Proof of Lemma B.2.

Recall the privacy cost objective (as defined in Lemma 3.4) is now

f(p,p;γ)=l=0K12(emϵαlαl)γ(l)+l=K+12K(αlemϵαl)γ(l)\displaystyle f(p,p^{\prime};\gamma)=\sum_{l=0}^{\frac{K-1}{2}}(e^{m{\epsilon}}\alpha_{l}^{\prime}-\alpha_{l})\cdot\gamma(l)+\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\cdot\gamma(l)

where \alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l] and \alpha_{l}^{\prime}=\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l], \forall l\in\{0,1,\dots,K\}. Since {\mathcal{L}}({\mathcal{D}})\sim\text{Binomial}(K,p) and {\mathcal{L}}({\mathcal{D}}^{\prime})\sim\text{Binomial}(K,p^{\prime}) in the i.i.d. mechanisms setting, and using the pmf of the Binomial distribution, f can be written as

\displaystyle f(p,p^{\prime};\gamma)=\sum_{l=0}^{\frac{K-1}{2}}(e^{m{\epsilon}}{K\choose l}p^{\prime l}(1-p^{\prime})^{K-l}-{K\choose l}p^{l}(1-p)^{K-l})\cdot\gamma(l)+\sum_{l=\frac{K+1}{2}}^{K}({K\choose l}p^{l}(1-p)^{K-l}-e^{m{\epsilon}}{K\choose l}p^{\prime l}(1-p^{\prime})^{K-l})\cdot\gamma(l)

The gradients w.r.t. pp and pp^{\prime} are

pf(p,p;γ)\displaystyle\nabla_{p}f(p,p^{\prime};\gamma) =l=0K12(Kl)γ(l)(lpl1(1p)Klpl(Kl)(1p)Kl1):=A\displaystyle=\underbrace{\sum_{l=0}^{\frac{K-1}{2}}-{K\choose l}\gamma(l)\cdot(lp^{l-1}(1-p)^{K-l}-p^{l}(K-l)(1-p)^{K-l-1})}_{:=A} (55)
+l=K+12K(Kl)γ(l)(lpl1(1p)Klpl(Kl)(1p)Kl1):=B\displaystyle+\underbrace{\sum_{l=\frac{K+1}{2}}^{K}{K\choose l}\gamma(l)\cdot(lp^{l-1}(1-p)^{K-l}-p^{l}(K-l)(1-p)^{K-l-1})}_{:=B}

and

pf(p,p;γ)\displaystyle\nabla_{p^{\prime}}f(p,p^{\prime};\gamma) =l=0K12emϵ(Kl)γ(l)(lpl1(1p)Klpl(Kl)(1p)Kl1)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}e^{m{\epsilon}}{K\choose l}\gamma(l)\cdot(lp^{\prime l-1}(1-p^{\prime})^{K-l}-p^{\prime l}(K-l)(1-p^{\prime})^{K-l-1}) (56)
+l=K+12Kemϵ(Kl)γ(l)(lpl1(1p)Klpl(Kl)(1p)Kl1)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}-e^{m{\epsilon}}{K\choose l}\gamma(l)\cdot(lp^{\prime l-1}(1-p^{\prime})^{K-l}-p^{\prime l}(K-l)(1-p^{\prime})^{K-l-1})

We show in the following that for all p,p^{\prime}\in(0,1), \nabla_{p}f(p,p^{\prime};\gamma)>0 and \nabla_{p^{\prime}}f(p,p^{\prime};\gamma)<0. This implies there is no local maximum in the interior of {\mathcal{F}}, and so (p^{*},p^{\prime*})=\operatorname*{arg\,max}_{p,p^{\prime}}f(p,p^{\prime};\gamma) must lie on one of the four boundaries of {\mathcal{F}}. Also, if p=0, then feasibility forces p^{\prime}=0, and (0,0) is a corner point at the intersection of two boundaries; similarly, if p=1, then p^{\prime}=1, and (1,1) is also a corner point. We conclude that for all p\in[0,1], (p^{*},p^{\prime*})=\operatorname*{arg\,max}_{p,p^{\prime}}f(p,p^{\prime};\gamma) must lie on one of the four boundaries of {\mathcal{F}}.

To show pf(p,p;γ)>0\nabla_{p}f(p,p^{\prime};\gamma)>0 for p(0,1)p\in(0,1), we write pf(p,p;γ)=A+B\nabla_{p}f(p,p^{\prime};\gamma)=A+B as in Eq. 55, and show that A>0A>0 and B>0B>0.

To show A>0A>0, first note

A:=l=0K12γ(l)(Kl)(pl(Kl)(1p)Kl1lpl1(1p)Kl)>0\displaystyle A:=\sum_{l=0}^{\frac{K-1}{2}}\gamma(l){K\choose l}\cdot(p^{l}(K-l)(1-p)^{K-l-1}-lp^{l-1}(1-p)^{K-l})>0 (57)
\displaystyle\iff\sum_{l=0}^{\frac{K-1}{2}}\gamma(l){K\choose l}\cdot p^{l}(K-l)(1-p)^{K-l-1}>\sum_{l=0}^{\frac{K-1}{2}}\gamma(l){K\choose l}\cdot lp^{l-1}(1-p)^{K-l} (58)
l=0K12γ(l)(K1l)KKlpl(Kl)(1p)Kl1>l=1K12γ(l)(K1l1)Kllpl1(1p)Kl\displaystyle\iff\sum_{l=0}^{\frac{K-1}{2}}\gamma(l){K-1\choose l}\frac{K}{K-l}\cdot p^{l}(K-l)(1-p)^{K-l-1}>\sum_{l=1}^{\frac{K-1}{2}}\gamma(l){K-1\choose l-1}\frac{K}{l}\cdot lp^{l-1}(1-p)^{K-l} (59)
Kl=0K12γ(l)(K1l)pl(1p)Kl1>Kl=1K12γ(l)(K1l1)pl1(1p)Kl\displaystyle\iff K\sum_{l=0}^{\frac{K-1}{2}}\gamma(l){K-1\choose l}p^{l}(1-p)^{K-l-1}>K\sum_{l=1}^{\frac{K-1}{2}}\gamma(l){K-1\choose l-1}p^{l-1}(1-p)^{K-l} (60)
l=0K12γ(l)(K1l)pl(1p)Kl1>l=0K121γ(l+1)(K1l)pl(1p)Kl1\displaystyle\iff\sum_{l=0}^{\frac{K-1}{2}}\gamma(l){K-1\choose l}p^{l}(1-p)^{K-l-1}>\sum_{l=0}^{\frac{K-1}{2}-1}\gamma(l+1){K-1\choose l}p^{l}(1-p)^{K-l-1} (61)

Since \forall l\leq\frac{K-1}{2}, \gamma(l)\geq\gamma(l+1) and p\in(0,1), we have, for l\in\{0,\dots,\frac{K-1}{2}-1\},

γ(l)(K1l)pl(1p)Kl1γ(l+1)(K1l)pl(1p)Kl1\displaystyle\gamma(l){K-1\choose l}p^{l}(1-p)^{K-l-1}\geq\gamma(l+1){K-1\choose l}p^{l}(1-p)^{K-l-1} (62)

Furthermore, since γ(K12)>0\gamma(\frac{K-1}{2})>0 and p(0,1)p\in(0,1),

γ(K12)(K1K12)pK12(1p)K12>0\displaystyle\gamma(\frac{K-1}{2}){K-1\choose\frac{K-1}{2}}p^{\frac{K-1}{2}}(1-p)^{\frac{K-1}{2}}>0 (63)

Eq. 62 and Eq. 63 combined implies

γ(K12)(K1K12)pK12(1p)K12+l=0K121γ(l)(K1l)pl(1p)Kl1>l=0K121γ(l+1)(K1l)pl(1p)Kl1\displaystyle\gamma(\frac{K-1}{2}){K-1\choose\frac{K-1}{2}}p^{\frac{K-1}{2}}(1-p)^{\frac{K-1}{2}}+\sum_{l=0}^{\frac{K-1}{2}-1}\gamma(l){K-1\choose l}p^{l}(1-p)^{K-l-1}>\sum_{l=0}^{\frac{K-1}{2}-1}\gamma(l+1){K-1\choose l}p^{l}(1-p)^{K-l-1} (64)

and hence, Eq. 61 holds. This further implies A>0A>0.

Next, to show B>0B>0, note that

B:=l=K+12K(Kl)γ(l)(lpl1(1p)Klpl(Kl)(1p)Kl1)>0\displaystyle B:=\sum_{l=\frac{K+1}{2}}^{K}{K\choose l}\gamma(l)\cdot(lp^{l-1}(1-p)^{K-l}-p^{l}(K-l)(1-p)^{K-l-1})>0 (65)
\displaystyle\iff\sum_{l=\frac{K+1}{2}}^{K}{K\choose l}\gamma(l)\cdot lp^{l-1}(1-p)^{K-l}>\sum_{l=\frac{K+1}{2}}^{K}{K\choose l}\gamma(l)\cdot p^{l}(K-l)(1-p)^{K-l-1} (66)
l=K+12Kγ(l)(K1l1)Kllpl1(1p)Kl\displaystyle\iff\sum_{l=\frac{K+1}{2}}^{K}\gamma(l){K-1\choose l-1}\frac{K}{l}\cdot lp^{l-1}(1-p)^{K-l} (67)
>l=K+12K1γ(l)(K1l)KKlpl(Kl)(1p)Kl1\displaystyle>\sum_{l=\frac{K+1}{2}}^{K-1}\gamma(l){K-1\choose l}\frac{K}{K-l}\cdot p^{l}(K-l)(1-p)^{K-l-1}
Kl=K+12Kγ(l)(K1l1)pl1(1p)Kl\displaystyle\iff K\sum_{l=\frac{K+1}{2}}^{K}\gamma(l){K-1\choose l-1}\cdot p^{l-1}(1-p)^{K-l} (68)
>Kl=K+12K1γ(l)(K1l)pl(1p)Kl1\displaystyle>K\sum_{l=\frac{K+1}{2}}^{K-1}\gamma(l){K-1\choose l}\cdot p^{l}(1-p)^{K-l-1}
l=K+12Kγ(l)(K1l1)pl1(1p)Kl>l=K+12+1Kγ(l1)(K1l1)pl1(1p)Kl\displaystyle\iff\sum_{l=\frac{K+1}{2}}^{K}\gamma(l){K-1\choose l-1}\cdot p^{l-1}(1-p)^{K-l}>\sum_{l=\frac{K+1}{2}+1}^{K}\gamma(l-1){K-1\choose l-1}\cdot p^{l-1}(1-p)^{K-l} (69)

Since \gamma(l)\geq\gamma(l-1), \forall l\geq\frac{K+1}{2}+1, and p\in(0,1), we have, for l\in\{\frac{K+1}{2}+1,\dots,K\},

γ(l)(K1l1)pl1(1p)Klγ(l1)(K1l1)pl1(1p)Kl\displaystyle\gamma(l){K-1\choose l-1}p^{l-1}(1-p)^{K-l}\geq\gamma(l-1){K-1\choose l-1}p^{l-1}(1-p)^{K-l} (70)

Furthermore, since γ(K+12)>0\gamma(\frac{K+1}{2})>0 and p(0,1)p\in(0,1),

γ(K+12)(K1K12)pK12(1p)K12>0\displaystyle\gamma(\frac{K+1}{2}){K-1\choose\frac{K-1}{2}}p^{\frac{K-1}{2}}(1-p)^{\frac{K-1}{2}}>0 (71)

Eq. 70 and Eq. 71 combined implies

γ(K+12)(K1K12)pK12(1p)K12+l=K+12+1Kγ(l)(K1l1)pl1(1p)Kl>l=K+12+1Kγ(l1)(K1l1)pl1(1p)Kl\displaystyle\gamma(\frac{K+1}{2}){K-1\choose\frac{K-1}{2}}p^{\frac{K-1}{2}}(1-p)^{\frac{K-1}{2}}+\sum_{l=\frac{K+1}{2}+1}^{K}\gamma(l){K-1\choose l-1}\cdot p^{l-1}(1-p)^{K-l}>\sum_{l=\frac{K+1}{2}+1}^{K}\gamma(l-1){K-1\choose l-1}\cdot p^{l-1}(1-p)^{K-l} (72)

and hence Eq. 69 holds. This further implies B>0B>0.

Following Eq. 55, for p\in(0,1) and \gamma satisfying the three assumptions,

pf(p,p;γ)=A+B>0\displaystyle\nabla_{p}f(p,p^{\prime};\gamma)=A+B>0 (73)

Following similar techniques, one can show for p(0,1)p\in(0,1) and γ\gamma satisfying the three conditions,

pf(p,p;γ)<0\displaystyle\nabla_{p^{\prime}}f(p,p^{\prime};\gamma)<0 (74)

This implies there is no interior stationary point, and in particular no local maximum, inside the feasible region {\mathcal{F}}. Also recall (p,p^{\prime})\in\{(0,0),(1,1)\} are two special cases where (p,p^{\prime}) is at the intersection of two boundaries. Hence, we conclude the worst case probabilities (p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma) lie on one of the four boundaries of {\mathcal{F}}; that is, (p^{*},p^{\prime*}) satisfies one of the following:

p=eϵp,\displaystyle p^{\prime*}=e^{{\epsilon}}p^{*}, p[0,11+eϵ],p[0,11+eϵ]\displaystyle\forall p\in[0,\frac{1}{1+e^{{\epsilon}}}],p^{\prime}\in[0,\frac{1}{1+e^{-{\epsilon}}}]
p=eϵp,\displaystyle p^{*}=e^{{\epsilon}}p^{\prime*}, p[0,1eϵ+1],p[0,11+eϵ]\displaystyle\forall p\in[0,\frac{1}{e^{-{\epsilon}}+1}],p^{\prime}\in[0,\frac{1}{1+e^{{\epsilon}}}]
1p=eϵ(1p),\displaystyle 1-p^{*}=e^{{\epsilon}}(1-p^{\prime*}), p[11+eϵ,1],p[11+eϵ,1]\displaystyle\forall p\in[\frac{1}{1+e^{{\epsilon}}},1],p^{\prime}\in[\frac{1}{1+e^{-{\epsilon}}},1]
1p=eϵ(1p),\displaystyle 1-p^{\prime*}=e^{{\epsilon}}(1-p^{*}), p[11+eϵ,1],p[11+eϵ,1]\displaystyle\forall p\in[\frac{1}{1+e^{-{\epsilon}}},1],p^{\prime}\in[\frac{1}{1+e^{{\epsilon}}},1]
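The gradient signs established in the proof can be spot-checked with finite differences. The sketch below is our own illustration: it works in the i.i.d. setting, where \alpha_{l} is the Binomial(K,p) pmf, and uses the example noise function \gamma(l)=((2l-K)/K)^{2}, which is symmetric around K/2, monotone in the required sense, and positive at (K\pm 1)/2.

```python
import math

K = 5
M_EPS = 0.2   # m * epsilon, arbitrary for this check

# Example noise function: symmetric around K/2, monotone, positive at (K +- 1)/2
gamma = [((2 * l - K) / K) ** 2 for l in range(K + 1)]

def binom_pmf(p):
    return [math.comb(K, l) * p**l * (1 - p)**(K - l) for l in range(K + 1)]

def f(p, p_prime):
    # Privacy cost objective in the i.i.d. setting (Binomial pmfs)
    a, ap = binom_pmf(p), binom_pmf(p_prime)
    high = sum((a[l] - math.exp(M_EPS) * ap[l]) * gamma[l] for l in range((K + 1) // 2, K + 1))
    low = sum((a[l] - math.exp(M_EPS) * ap[l]) * gamma[l] for l in range((K + 1) // 2))
    return high - low

# f should increase in p and decrease in p' at interior points
step = 1e-6
for p, pp in [(0.3, 0.3), (0.5, 0.45), (0.7, 0.72)]:
    assert (f(p + step, pp) - f(p - step, pp)) / (2 * step) > 0
    assert (f(p, pp + step) - f(p, pp - step)) / (2 * step) < 0
```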

Lemma B.3.

For any noise function \gamma:\{0,1,\dots,K\}\rightarrow[0,1] that is 1) symmetric around \frac{K}{2} and 2) satisfies the monotonicity assumption, the privacy cost objective f(p,p^{\prime};\gamma) is maximized when p\geq p^{\prime}.

Proof of Lemma B.3.

Following Eq. 33 and Eq. 38 in the proof of Lemma 3.4, and that δ=0\delta=0,

Pr[DaRRMγ(𝒟)=1]emϵPr[DaRRMγ(𝒟)=1]\displaystyle\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]\leq e^{m{\epsilon}}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1] (75)
l=K+12K(αlemϵαl)γ(l)l=0K12(αlemϵαl)γ(l)=f(p,p;γ)emϵ1\displaystyle\iff\underbrace{\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)-\sum_{l=0}^{\frac{K-1}{2}}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\gamma(l)}_{=f(p,p^{\prime};\gamma)}\leq e^{m{\epsilon}}-1 (76)

where αl=Pr[(𝒟)=l]\alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l] and αl=Pr[(𝒟)=l]\alpha_{l}^{\prime}=\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l], l{0,1,,K}\forall l\in\{0,1,\dots,K\}. This implies

\displaystyle f(p,p^{\prime};\gamma)=2\Big{(}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]-e^{m{\epsilon}}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1]\Big{)}+e^{m{\epsilon}}-1 (77)

Hence, f(p,p^{\prime};\gamma) is increasing in \Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1] and decreasing in \Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1], and so it is maximized when \Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]\geq\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1].

\displaystyle\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1] =\sum_{l=0}^{K}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1\mid{\mathcal{L}}({\mathcal{D}})=l]\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l] (78)
=l=0K(γ(l)𝕀{lK2}+12(1γ(l)))Pr[(𝒟)=l]\displaystyle=\sum_{l=0}^{K}\Big{(}\gamma(l)\cdot{\mathbb{I}}\{l\geq\frac{K}{2}\}+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\Pr[{\mathcal{L}}({\mathcal{D}})=l] (79)
=l=0K1212(1γ(l))αl+l=K+12K(γ(l)+12(1γ(l)))αl\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}\frac{1}{2}(1-\gamma(l))\cdot\alpha_{l}+\sum_{l=\frac{K+1}{2}}^{K}\Big{(}\gamma(l)+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\alpha_{l} (80)
\displaystyle=\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}\gamma(l){K\choose l}p^{l}(1-p)^{K-l}-\frac{1}{2}\sum_{l=0}^{\frac{K-1}{2}}\gamma(l){K\choose l}p^{l}(1-p)^{K-l}+\frac{1}{2} (81)

where the last line follows from the observation that in the i.i.d. mechanisms setting, {\mathcal{L}}({\mathcal{D}})\sim\text{Binomial}(K,p) and \alpha_{l} is hence the pmf of the Binomial distribution at l.

Similarly,

\displaystyle\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1]=\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}\gamma(l){K\choose l}p^{\prime l}(1-p^{\prime})^{K-l}-\frac{1}{2}\sum_{l=0}^{\frac{K-1}{2}}\gamma(l){K\choose l}p^{\prime l}(1-p^{\prime})^{K-l}+\frac{1}{2} (82)

Now define the objective

\displaystyle h(\beta)=\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}\gamma(l){K\choose l}\beta^{l}(1-\beta)^{K-l}-\frac{1}{2}\sum_{l=0}^{\frac{K-1}{2}}\gamma(l){K\choose l}\beta^{l}(1-\beta)^{K-l}+\frac{1}{2} (83)

for β[0,1]\beta\in[0,1] and it follows that Pr[DaRRMγ(𝒟)=1]=h(p)\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]=h(p) and Pr[DaRRMγ(𝒟)=1]=h(p)\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1]=h(p^{\prime}). We now analyze the monotonicity of h(β)h(\beta) in β\beta.

For ease of presentation, define g(l):=\begin{cases}-\frac{1}{2}\gamma(l)&\forall l\leq\frac{K}{2}\\ \frac{1}{2}\gamma(l)&\forall l\geq\frac{K}{2}\end{cases}. Since \gamma(l)\geq\gamma(l+1),\forall l\leq\frac{K}{2} and \gamma(l+1)\geq\gamma(l),\forall l\geq\frac{K}{2}, we have g(l+1)\geq g(l),\forall l\in\{0,\dots,K-1\}. Replacing \gamma(l) with g(l) in Eq. 83,

\displaystyle h(\beta)=\sum_{l=0}^{K}g(l){K\choose l}\beta^{l}(1-\beta)^{K-l}+\frac{1}{2} (84)
\displaystyle\nabla_{\beta}h(\beta) =\sum_{l=0}^{K}g(l){K\choose l}\Big{(}l\beta^{l-1}(1-\beta)^{K-l}-(K-l)\beta^{l}(1-\beta)^{K-l-1}\Big{)} (85)
\displaystyle=\sum_{l=1}^{K}g(l){K-1\choose l-1}\frac{K}{l}l\beta^{l-1}(1-\beta)^{K-l}-\sum_{l=0}^{K-1}g(l){K-1\choose l}\frac{K}{K-l}(K-l)\beta^{l}(1-\beta)^{K-l-1} (86)
\displaystyle=K\sum_{l=1}^{K}g(l){K-1\choose l-1}\beta^{l-1}(1-\beta)^{K-l}-K\sum_{l=0}^{K-1}g(l){K-1\choose l}\beta^{l}(1-\beta)^{K-l-1} (87)
\displaystyle=K\sum_{l=0}^{K-1}g(l+1){K-1\choose l}\beta^{l}(1-\beta)^{K-l-1}-K\sum_{l=0}^{K-1}g(l){K-1\choose l}\beta^{l}(1-\beta)^{K-l-1} (88)
\displaystyle=K\sum_{l=0}^{K-1}\Big{(}g(l+1)-g(l)\Big{)}{K-1\choose l}\beta^{l}(1-\beta)^{K-l-1} (89)

Since g(l+1)g(l)g(l+1)\geq g(l) and (K1l)βl(1β)Kl10{K-1\choose l}\beta^{l}(1-\beta)^{K-l-1}\geq 0, βh(β)0\nabla_{\beta}h(\beta)\geq 0. This implies h(β)h(\beta) is monotonically non-decreasing in β\beta and hence,

Pr[DaRRMγ(𝒟)=1]Pr[DaRRMγ(𝒟)=1]pp\displaystyle\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]\geq\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}}^{\prime})=1]\iff p\geq p^{\prime} (90)

Therefore, f(p,p^{\prime};\gamma) is maximized when p\geq p^{\prime}. ∎
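The monotonicity of h(\beta) can be checked numerically for a concrete \gamma satisfying the assumptions. The sketch below is our own illustration: it evaluates Eq. 83 on a grid with the example noise function \gamma(l)=((2l-K)/K)^{2} and also confirms h(1/2)=1/2, as expected from the symmetry of \gamma.

```python
import math

K = 5
gamma = [((2 * l - K) / K) ** 2 for l in range(K + 1)]  # symmetric, monotone example

def h(beta):
    # Eq. 83: Pr[DaRRM_gamma outputs 1] when every mechanism outputs 1 w.p. beta
    high = sum(gamma[l] * math.comb(K, l) * beta**l * (1 - beta)**(K - l)
               for l in range((K + 1) // 2, K + 1))
    low = sum(gamma[l] * math.comb(K, l) * beta**l * (1 - beta)**(K - l)
              for l in range((K + 1) // 2))
    return 0.5 * high - 0.5 * low + 0.5

vals = [h(i / 100) for i in range(101)]
assert all(b >= a - 1e-12 for a, b in zip(vals, vals[1:]))  # non-decreasing in beta
assert abs(h(0.5) - 0.5) < 1e-12                            # a fair coin at beta = 1/2
```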

B.2 Proof of Privacy Amplification (Theorem 4.1)

Theorem B.4 (Restatement of Theorem 4.1).

Consider using DaRRM (Algorithm 1) to solve Problem 1.1, with i.i.d. mechanisms {Mi}i=1K\{M_{i}\}_{i=1}^{K}, i.e., pi=pp_{i}=p, pi=pp_{i}^{\prime}=p^{\prime}, i[K]\forall i\in[K], the privacy allowance m[K]m\in[K] and δ=Δ=0\delta=\Delta=0. Let the noise function γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1] be that:
if mK+12m\geq\frac{K+1}{2},

\displaystyle\gamma(l)=1,\quad\forall l\in\{0,1,\dots,K\}

and if mK12m\leq\frac{K-1}{2},

γ(l)={12h(l)lK122h(l)1lK+12\displaystyle\gamma(l)=\begin{cases}1-2h(l)&\forall l\leq\frac{K-1}{2}\\ 2h(l)-1&\forall l\geq\frac{K+1}{2}\end{cases}

where h(l)=i=m2m1(li)(Kl2m1i)(K2m1)h(l)=\sum_{i=m}^{2m-1}\frac{{l\choose i}{K-l\choose 2m-1-i}}{{K\choose 2m-1}}, then DaRRMγ\textsf{DaRRM}_{\gamma} is mϵm{\epsilon}-differentially private.
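The noise function of Theorem B.4 is straightforward to implement. The sketch below is our own illustration (the name `gamma_dsub` is ours): it evaluates h(l) as a hypergeometric tail and checks that the resulting \gamma is a valid noise function, i.e., it lies in [0,1], is symmetric around K/2, and is monotone non-increasing up to (K-1)/2. Note that Python's `math.comb(n, k)` returns 0 when k > n, which handles the out-of-range binomial coefficients in h(l).

```python
import math

def gamma_dsub(K, m):
    # gamma from Theorem 4.1; h(l) = Pr[Hypergeometric(K, l, 2m-1) >= m]
    def h(l):
        return sum(math.comb(l, i) * math.comb(K - l, 2 * m - 1 - i)
                   for i in range(m, 2 * m)) / math.comb(K, 2 * m - 1)
    if m >= (K + 1) / 2:
        return [1.0] * (K + 1)
    return [1 - 2 * h(l) if l <= (K - 1) / 2 else 2 * h(l) - 1 for l in range(K + 1)]

K, m = 11, 3
g = gamma_dsub(K, m)
assert all(0.0 <= v <= 1.0 for v in g)                          # a valid noise function
assert all(abs(g[l] - g[K - l]) < 1e-12 for l in range(K + 1))  # symmetric around K/2
assert all(g[l] >= g[l + 1] - 1e-12 for l in range((K - 1) // 2))  # monotone on the left
assert gamma_dsub(5, 3) == [1.0] * 6                            # large allowance: gamma = 1
```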

Roadmap. Theorem 4.1 consists of two parts: \gamma under a large privacy allowance m\geq\frac{K+1}{2} and \gamma under a small privacy allowance m\leq\frac{K-1}{2}. We first show in Lemma B.5, Section B.2.1 that if m\geq\frac{K+1}{2}, setting \gamma=1 suffices to ensure \textsf{DaRRM}_{\gamma} is m{\epsilon}-differentially private, and hence one can always output the true majority of K mechanisms. In contrast, simple composition indicates only when m=K can one output the true majority of K mechanisms. Next, we show in Lemma B.10, Section B.2.2 that if m\leq\frac{K-1}{2}, one can set \gamma to be \gamma_{DSub}, which corresponds to outputting the majority of 2m-1 subsampled mechanisms (and hence the name “Double Subsampling”, or DSub). In contrast, simple composition indicates one can only output the majority of m subsampled mechanisms to ensure the output is m{\epsilon}-differentially private. Theorem 4.1 follows directly from combining Lemma B.5 and Lemma B.10.

B.2.1 Privacy Amplification Under A Large Privacy Allowance mK+12m\geq\frac{K+1}{2}

The proof of Lemma B.5 is straightforward. We show that given the constant γmax(l)=1\gamma_{max}(l)=1, if mK+12m\geq\frac{K+1}{2}, the worst case probabilities are (p,p)=argmax(p,p)f(p,p;γmax)=(0,0)(p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma_{max})=(0,0) and notice that f(0,0;γmax)=emϵ1f(0,0;\gamma_{max})=e^{m{\epsilon}}-1, which satisfies the condition in Lemma 3.4. Hence, DaRRMγmax\textsf{DaRRM}_{\gamma_{max}} is mϵm{\epsilon}-differentially private.

Lemma B.5 (Privacy amplification, mK+12m\geq\frac{K+1}{2}).

Consider using DaRRM (Algorithm 1) to solve Problem 1.1, with i.i.d. mechanisms {Mi}i=1K\{M_{i}\}_{i=1}^{K}, i.e., pi=pp_{i}=p, pi=pp_{i}^{\prime}=p^{\prime}, i[K]\forall i\in[K], the privacy allowance mK+12,mm\geq\frac{K+1}{2},m\in{\mathbb{Z}} and δ=Δ=0\delta=\Delta=0. Let the noise function be the constant γmax(l)=1,l{0,1,,K}\gamma_{max}(l)=1,\forall l\in\{0,1,\dots,K\}. Then, DaRRMγmax\textsf{DaRRM}_{\gamma_{max}} is mϵm{\epsilon}-differentially private.

Proof of Lemma B.5.

First, notice γmax(l)=1,l{0,1,,K}\gamma_{max}(l)=1,\forall l\in\{0,1,\dots,K\} is: 1) symmetric around K2\frac{K}{2}, 2) satisfies the monotonicity assumption, and 3) γmax(K12)>0\gamma_{max}(\frac{K-1}{2})>0 and γmax(K+12)>0\gamma_{max}(\frac{K+1}{2})>0. Therefore, by Lemma B.1, the worst case probabilities given γmax\gamma_{max}, i.e., (p,p)=argmax(p,p)f(p,p;γmax)(p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma_{max}), are on one of the two boundaries of {\mathcal{F}}, satisfying

p=eϵp,\displaystyle p^{*}=e^{{\epsilon}}p^{\prime*},\quad p[0,1eϵ+1],p[0,11+eϵ]\displaystyle\forall p^{*}\in[0,\frac{1}{e^{-{\epsilon}}+1}],p^{\prime*}\in[0,\frac{1}{1+e^{{\epsilon}}}]
or 1p=eϵ(1p),\displaystyle\text{or }\quad 1-p^{\prime*}=e^{{\epsilon}}(1-p^{*}),\quad p[11+eϵ,1],p[11+eϵ,1]\displaystyle\forall p^{*}\in[\frac{1}{1+e^{-{\epsilon}}},1],p^{\prime*}\in[\frac{1}{1+e^{{\epsilon}}},1]

We now find the local maximums on the two possible boundaries, i.e.,

(plocal,plocal)=argmax(p,p):p=eϵp,p[0,1eϵ+1]f(p,p;γmax)\displaystyle(p_{local}^{*},p_{local}^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime}):p=e^{{\epsilon}}p^{\prime},p\in[0,\frac{1}{e^{-{\epsilon}}+1}]}f(p,p^{\prime};\gamma_{max})

and

(plocal,plocal)=argmax(p,p):1p=eϵ(1p),p[11+eϵ,1]f(p,p;γmax)\displaystyle(p_{local}^{*},p_{local}^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime}):1-p^{\prime}=e^{{\epsilon}}(1-p),p\in[\frac{1}{1+e^{-{\epsilon}}},1]}f(p,p^{\prime};\gamma_{max})

separately.

Part I: Local worst case probabilities on the boundary p=eϵpp=e^{{\epsilon}}p^{\prime}.

Plugging p=eϵpp=e^{{\epsilon}}p^{\prime} into the privacy cost objective f(p,p;γmax)f(p,p^{\prime};\gamma_{max}), one gets

f(p;γmax)\displaystyle f(p^{\prime};\gamma_{max}) =l=0K12(emϵ(Kl)pl(1p)Kl(Kl)(eϵp)l(1eϵp)Kl)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}(e^{m{\epsilon}}{K\choose l}p^{\prime l}(1-p^{\prime})^{K-l}-{K\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l}) (91)
+l=K+12K((Kl)(eϵp)l(1eϵp)Klemϵ(Kl)pl(1p)Kl)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}({K\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l}-e^{m{\epsilon}}{K\choose l}p^{\prime l}(1-p^{\prime})^{K-l})

The gradient w.r.t. pp^{\prime} is

pf(p;γmax)\displaystyle\nabla_{p^{\prime}}f(p^{\prime};\gamma_{max}) =l=0K12(emϵ(Kl)(lpl1(1p)Klpl(Kl)(1p)Kl1)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}\Big{(}e^{m{\epsilon}}{K\choose l}(lp^{\prime l-1}(1-p^{\prime})^{K-l}-p^{\prime l}(K-l)(1-p^{\prime})^{K-l-1}) (92)
eϵ(Kl)(l(eϵp)l1(1eϵp)Kleϵlpl(Kl)(1eϵp)Kl1))\displaystyle-e^{{\epsilon}}{K\choose l}(l(e^{{\epsilon}}p^{\prime})^{l-1}(1-e^{{\epsilon}}p^{\prime})^{K-l}-e^{{\epsilon}l}p^{\prime l}(K-l)(1-e^{{\epsilon}}p^{\prime})^{K-l-1})\Big{)}
+l=K+12K(eϵ(Kl)(l(eϵp)l1(1eϵp)Kleϵlpl(Kl)(1eϵp)Kl1)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}\Big{(}e^{{\epsilon}}{K\choose l}(l(e^{{\epsilon}}p^{\prime})^{l-1}(1-e^{{\epsilon}}p^{\prime})^{K-l}-e^{{\epsilon}l}p^{\prime l}(K-l)(1-e^{{\epsilon}}p^{\prime})^{K-l-1})
emϵ(Kl)(lpl1(1p)Klpl(Kl)(1p)Kl1))\displaystyle-e^{m{\epsilon}}{K\choose l}(lp^{\prime l-1}(1-p^{\prime})^{K-l}-p^{\prime l}(K-l)(1-p^{\prime})^{K-l-1})\Big{)}
=Kl=0K12emϵ(K1l)pl(1p)Kl1+Kl=K+12K1emϵ(K1l)pl(1p)Kl1\displaystyle=-K\sum_{l=0}^{\frac{K-1}{2}}e^{m{\epsilon}}{K-1\choose l}p^{\prime l}(1-p^{\prime})^{K-l-1}+K\sum_{l=\frac{K+1}{2}}^{K-1}e^{m{\epsilon}}{K-1\choose l}p^{\prime l}(1-p^{\prime})^{K-l-1} (93)
\displaystyle+K\sum_{l=0}^{\frac{K-1}{2}}e^{{\epsilon}}{K-1\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l-1}-K\sum_{l=\frac{K+1}{2}}^{K-1}e^{{\epsilon}}{K-1\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l-1}
+Kl=0K121emϵ(K1l)pl(1p)Kl1Kl=K12K1emϵ(K1l)pl(1p)Kl1\displaystyle+K\sum_{l=0}^{\frac{K-1}{2}-1}e^{m{\epsilon}}{K-1\choose l}p^{\prime l}(1-p^{\prime})^{K-l-1}-K\sum_{l=\frac{K-1}{2}}^{K-1}e^{m{\epsilon}}{K-1\choose l}p^{\prime l}(1-p^{\prime})^{K-l-1}
Kl=0K121eϵ(K1l)(eϵp)l(1eϵp)Kl1+Kl=K12K1eϵ(K1l)(eϵp)l(1eϵp)Kl1\displaystyle-K\sum_{l=0}^{\frac{K-1}{2}-1}e^{{\epsilon}}{K-1\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l-1}+K\sum_{l=\frac{K-1}{2}}^{K-1}e^{{\epsilon}}{K-1\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l-1}
=2Kemϵ(K1K12)pK12(1p)K12:=A+2Keϵ(K1K12)(eϵp)K12(1eϵp)K12:=B\displaystyle=-2K\underbrace{e^{m{\epsilon}}{K-1\choose\frac{K-1}{2}}p^{\prime\frac{K-1}{2}}(1-p^{\prime})^{\frac{K-1}{2}}}_{:=A}+2K\underbrace{e^{{\epsilon}}{K-1\choose\frac{K-1}{2}}(e^{{\epsilon}}p^{\prime})^{\frac{K-1}{2}}(1-e^{{\epsilon}}p^{\prime})^{\frac{K-1}{2}}}_{:=B} (94)

Notice that

AB=emϵ(K1K12)pK12(1p)K12eϵ(K1K12)(eϵp)K12(1eϵp)K12=emϵeK+12ϵ(1p1eϵp)K12\displaystyle\frac{A}{B}=\frac{e^{m{\epsilon}}{K-1\choose\frac{K-1}{2}}p^{\prime\frac{K-1}{2}}(1-p^{\prime})^{\frac{K-1}{2}}}{e^{{\epsilon}}{K-1\choose\frac{K-1}{2}}(e^{{\epsilon}}p^{\prime})^{\frac{K-1}{2}}(1-e^{{\epsilon}}p^{\prime})^{\frac{K-1}{2}}}=\frac{e^{m{\epsilon}}}{e^{\frac{K+1}{2}{\epsilon}}}\cdot(\frac{1-p^{\prime}}{1-e^{{\epsilon}}p^{\prime}})^{\frac{K-1}{2}} (95)

Since 1p1eϵp1\frac{1-p^{\prime}}{1-e^{{\epsilon}}p^{\prime}}\geq 1 and mK+12m\geq\frac{K+1}{2}, AB1\frac{A}{B}\geq 1. This implies pf(p;γmax)0\nabla_{p^{\prime}}f(p^{\prime};\gamma_{max})\leq 0. Hence, f(p;γmax)f(p^{\prime};\gamma_{max}) is monotonically non-increasing on the boundary, for p[0,11+eϵ]p^{\prime}\in[0,\frac{1}{1+e^{{\epsilon}}}].

Therefore, argmaxp:p[0,11+eϵ]f(p;γmax)=0\operatorname*{arg\,max}_{p^{\prime}:p^{\prime}\in[0,\frac{1}{1+e^{{\epsilon}}}]}f(p^{\prime};\gamma_{max})=0. Since p=eϵpp=e^{{\epsilon}}p^{\prime}, p=0p^{\prime}=0 implies p=0p=0.

Hence,

(plocal,plocal)=argmax(p,p):p=eϵp,p[0,1eϵ+1]f(p,p;γmax)=(0,0)\displaystyle(p_{local}^{*},p_{local}^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime}):p=e^{{\epsilon}}p^{\prime},p\in[0,\frac{1}{e^{-{\epsilon}}+1}]}f(p,p^{\prime};\gamma_{max})=(0,0)

and

max(p,p):p=eϵp,p[0,1eϵ+1]f(p,p;γmax)=f(0,0;γmax)=emϵ1\displaystyle\max_{(p,p^{\prime}):p=e^{{\epsilon}}p^{\prime},p\in[0,\frac{1}{e^{-{\epsilon}}+1}]}f(p,p^{\prime};\gamma_{max})=f(0,0;\gamma_{max})=e^{m{\epsilon}}-1
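The Part I argument admits a quick numerical sanity check. The following Python sketch (with γ_max(l) = 1 for all l, as in Eq. 91, and illustrative values of K and ϵ chosen here for demonstration) verifies that f(p′; γ_max) is monotonically non-increasing on this boundary for every m ≥ (K+1)/2, so its maximum f(0, 0; γ_max) equals e^{mϵ} − 1:

```python
from math import comb, exp

def f_boundary1(pp, K, m, eps):
    """Privacy cost objective of Eq. 91 on the boundary p = e^eps * p',
    with gamma_max(l) = 1 for all l (the m >= (K+1)/2 regime)."""
    b = lambda l, r: comb(K, l) * r**l * (1 - r)**(K - l)  # Binomial(K, r) pmf at l
    low = sum(exp(m * eps) * b(l, pp) - b(l, exp(eps) * pp) for l in range((K - 1)//2 + 1))
    high = sum(b(l, exp(eps) * pp) - exp(m * eps) * b(l, pp) for l in range((K + 1)//2, K + 1))
    return low + high

K, eps = 5, 0.4                                  # illustrative odd K and epsilon
for m in range((K + 1)//2, K + 1):               # the m >= (K+1)/2 regime
    grid = [i / 1000 / (1 + exp(eps)) for i in range(1001)]   # p' in [0, 1/(1+e^eps)]
    vals = [f_boundary1(pp, K, m, eps) for pp in grid]
    # f is monotonically non-increasing on this boundary ...
    assert all(v1 >= v2 - 1e-12 for v1, v2 in zip(vals, vals[1:]))
    # ... so the maximum is attained at p' = 0, where f = e^{m * eps} - 1
    assert abs(vals[0] - (exp(m * eps) - 1)) < 1e-12
```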

Part II: Local worst case probabilities on the boundary 1p=eϵ(1p)1-p^{\prime}=e^{{\epsilon}}(1-p).

For simplicity, let q=1pq=1-p and q=1pq^{\prime}=1-p^{\prime}. Note on this boundary p[11+eϵ,1]p\in[\frac{1}{1+e^{-{\epsilon}}},1] and p[11+eϵ,1]p^{\prime}\in[\frac{1}{1+e^{{\epsilon}}},1], and hence, q[0,11+eϵ]q\in[0,\frac{1}{1+e^{{\epsilon}}}] and q[0,11+eϵ]q^{\prime}\in[0,\frac{1}{1+e^{-{\epsilon}}}].

Plugging qq and qq^{\prime} into the privacy cost objective f(p,p;γmax)f(p,p^{\prime};\gamma_{max}), one gets a new objective in q,qq,q^{\prime} as

f(q,q;γmax)\displaystyle f(q,q^{\prime};\gamma_{max}) =l=0K12(emϵ(Kl)(1q)lqKl(Kl)(1q)lqKl)γmax(l)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}\Big{(}e^{m{\epsilon}}{K\choose l}(1-q^{\prime})^{l}q^{\prime K-l}-{K\choose l}(1-q)^{l}q^{K-l}\Big{)}\cdot\gamma_{max}(l) (96)
+l=K+12K((Kl)(1q)lqKlemϵ(Kl)(1q)lqKl)γmax(l)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}\Big{(}{K\choose l}(1-q)^{l}q^{K-l}-e^{m{\epsilon}}{K\choose l}(1-q^{\prime})^{l}q^{\prime K-l}\Big{)}\cdot\gamma_{max}(l)
=l=0K12(emϵ(Kl)(1q)lqKl(Kl)(1q)lqKl)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}\Big{(}e^{m{\epsilon}}{K\choose l}(1-q^{\prime})^{l}q^{\prime K-l}-{K\choose l}(1-q)^{l}q^{K-l}\Big{)} (97)
+l=K+12K((Kl)(1q)lqKlemϵ(Kl)(1q)lqKl)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}\Big{(}{K\choose l}(1-q)^{l}q^{K-l}-e^{m{\epsilon}}{K\choose l}(1-q^{\prime})^{l}q^{\prime K-l}\Big{)}

On this boundary, 1p=eϵ(1p)1-p^{\prime}=e^{{\epsilon}}(1-p); written in terms of qq and qq^{\prime}, this constraint becomes q=eϵqq^{\prime}=e^{{\epsilon}}q. Plugging q=eϵqq^{\prime}=e^{{\epsilon}}q into f(q,q;γmax)f(q,q^{\prime};\gamma_{max}), one gets

f(q;γmax)\displaystyle f(q;\gamma_{max}) =l=0K12(emϵ(Kl)(1eϵq)l(eϵq)Kl(Kl)(1q)lqKl)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}\Big{(}e^{m{\epsilon}}{K\choose l}(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l}-{K\choose l}(1-q)^{l}q^{K-l}\Big{)} (98)
+l=K+12K((Kl)(1q)lqKlemϵ(Kl)(1eϵq)l(eϵq)Kl)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}\Big{(}{K\choose l}(1-q)^{l}q^{K-l}-e^{m{\epsilon}}{K\choose l}(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l}\Big{)}

The gradient w.r.t. qq is

qf(q)\displaystyle\nabla_{q}f(q) =l=0K12(emϵ(Kl)((eϵ)l(1eϵq)l1(eϵq)Kl+eϵ(Kl)(1eϵq)l(eϵq)Kl1)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}\Big{(}e^{m{\epsilon}}{K\choose l}\Big{(}(-e^{{\epsilon}})l(1-e^{{\epsilon}}q)^{l-1}(e^{{\epsilon}}q)^{K-l}+e^{{\epsilon}}(K-l)(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l-1}\Big{)} (99)
(Kl)(l(1q)l1qKl+(Kl)(1q)lqKl1))\displaystyle-{K\choose l}\Big{(}-l(1-q)^{l-1}q^{K-l}+(K-l)(1-q)^{l}q^{K-l-1}\Big{)}\Big{)}
+l=K+12K((Kl)(l(1q)l1qKl+(Kl)(1q)lqKl1)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}\Big{(}{K\choose l}\Big{(}-l(1-q)^{l-1}q^{K-l}+(K-l)(1-q)^{l}q^{K-l-1}\Big{)}
emϵ(Kl)((eϵ)l(1eϵq)l1(eϵq)Kl+eϵ(Kl)(1eϵq)l(eϵq)Kl1))\displaystyle-e^{m{\epsilon}}{K\choose l}\Big{(}(-e^{{\epsilon}})l(1-e^{{\epsilon}}q)^{l-1}(e^{{\epsilon}}q)^{K-l}+e^{{\epsilon}}(K-l)(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l-1}\Big{)}\Big{)}
=l=1K12e(m+1)ϵ(K1l1)Kll(1eϵq)l1(eϵq)Kl+l=0K12e(m+1)ϵ(K1l)KKl(Kl)(1eϵq)l(eϵq)Kl1\displaystyle=-\sum_{l=1}^{\frac{K-1}{2}}e^{(m+1){\epsilon}}{K-1\choose l-1}\frac{K}{l}l(1-e^{{\epsilon}}q)^{l-1}(e^{{\epsilon}}q)^{K-l}+\sum_{l=0}^{\frac{K-1}{2}}e^{(m+1){\epsilon}}{K-1\choose l}\frac{K}{K-l}(K-l)(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l-1} (100)
+l=1K12(K1l1)Kll(1q)l1qKll=0K12(K1l)KKl(Kl)(1q)lqKl1\displaystyle+\sum_{l=1}^{\frac{K-1}{2}}{K-1\choose l-1}\frac{K}{l}l(1-q)^{l-1}q^{K-l}-\sum_{l=0}^{\frac{K-1}{2}}{K-1\choose l}\frac{K}{K-l}(K-l)(1-q)^{l}q^{K-l-1}
l=K+12K(K1l1)Kll(1q)l1qKl+l=K+12K1(K1l)KKl(Kl)(1q)lqKl1\displaystyle-\sum_{l=\frac{K+1}{2}}^{K}{K-1\choose l-1}\frac{K}{l}l(1-q)^{l-1}q^{K-l}+\sum_{l=\frac{K+1}{2}}^{K-1}{K-1\choose l}\frac{K}{K-l}(K-l)(1-q)^{l}q^{K-l-1}
+l=K+12Ke(m+1)ϵ(K1l1)Kll(1eϵq)l1(eϵq)Kll=K+12K1e(m+1)ϵ(K1l)KKl(Kl)(1eϵq)l(eϵq)Kl1\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}e^{(m+1){\epsilon}}{K-1\choose l-1}\frac{K}{l}l(1-e^{{\epsilon}}q)^{l-1}(e^{{\epsilon}}q)^{K-l}-\sum_{l=\frac{K+1}{2}}^{K-1}e^{(m+1){\epsilon}}{K-1\choose l}\frac{K}{K-l}(K-l)(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l-1}
=Kl=1K12e(m+1)ϵ(K1l1)(1eϵq)l1(eϵq)Kl+Kl=0K12e(m+1)ϵ(K1l)(1eϵq)l(eϵq)Kl1\displaystyle=-K\sum_{l=1}^{\frac{K-1}{2}}e^{(m+1){\epsilon}}{K-1\choose l-1}(1-e^{{\epsilon}}q)^{l-1}(e^{{\epsilon}}q)^{K-l}+K\sum_{l=0}^{\frac{K-1}{2}}e^{(m+1){\epsilon}}{K-1\choose l}(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l-1} (101)
+Kl=1K12(K1l1)(1q)l1qKlKl=0K12(K1l)(1q)lqKl1\displaystyle+K\sum_{l=1}^{\frac{K-1}{2}}{K-1\choose l-1}(1-q)^{l-1}q^{K-l}-K\sum_{l=0}^{\frac{K-1}{2}}{K-1\choose l}(1-q)^{l}q^{K-l-1}
Kl=K+12K(K1l1)(1q)l1qKl+Kl=K+12K1(K1l)(1q)lqKl1\displaystyle-K\sum_{l=\frac{K+1}{2}}^{K}{K-1\choose l-1}(1-q)^{l-1}q^{K-l}+K\sum_{l=\frac{K+1}{2}}^{K-1}{K-1\choose l}(1-q)^{l}q^{K-l-1}
+Kl=K+12Ke(m+1)ϵ(K1l1)(1eϵq)l1(eϵq)KlKl=K+12K1e(m+1)ϵ(K1l)(1eϵq)l(eϵq)Kl1\displaystyle+K\sum_{l=\frac{K+1}{2}}^{K}e^{(m+1){\epsilon}}{K-1\choose l-1}(1-e^{{\epsilon}}q)^{l-1}(e^{{\epsilon}}q)^{K-l}-K\sum_{l=\frac{K+1}{2}}^{K-1}e^{(m+1){\epsilon}}{K-1\choose l}(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l-1}
=2Ke(m+1)ϵ(K1K12)(1eϵq)K12(eϵq)K122K(K1K12)(1q)K12qK12\displaystyle=2Ke^{(m+1){\epsilon}}{K-1\choose\frac{K-1}{2}}(1-e^{{\epsilon}}q)^{\frac{K-1}{2}}(e^{{\epsilon}}q)^{\frac{K-1}{2}}-2K{K-1\choose\frac{K-1}{2}}(1-q)^{\frac{K-1}{2}}q^{\frac{K-1}{2}} (102)

Recall q[0,11+eϵ]q\in[0,\frac{1}{1+e^{{\epsilon}}}], and so qeϵq1qq\leq e^{{\epsilon}}q\leq 1-q, which implies (1eϵq)(eϵq)(1q)q(1-e^{{\epsilon}}q)(e^{{\epsilon}}q)\geq(1-q)q, since x(1x)x(1-x) increases as xx moves toward 12\frac{1}{2}. Furthermore, since e(m+1)ϵ1e^{(m+1){\epsilon}}\geq 1, we have qf(q)0\nabla_{q}f(q)\geq 0. This implies f(q)f(q) is monotonically non-decreasing in qq, and so the local maximum on this boundary is

(qlocal,qlocal)=argmax(q,q):q=eϵq,q[0,11+eϵ]f(q,q;γmax)=(11+eϵ,11+eϵ)\displaystyle(q_{local}^{*},q_{local}^{\prime*})=\operatorname*{arg\,max}_{(q,q^{\prime}):q^{\prime}=e^{{\epsilon}}q,q\in[0,\frac{1}{1+e^{{\epsilon}}}]}f(q,q^{\prime};\gamma_{max})=(\frac{1}{1+e^{{\epsilon}}},\frac{1}{1+e^{-{\epsilon}}}) (103)

That is,

(plocal,plocal)=argmax(p,p):1p=eϵ(1p),p[11+eϵ,1]f(p,p;γmax)=(1qlocal,1qlocal)=(11+eϵ,11+eϵ)\displaystyle(p_{local}^{*},p_{local}^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime}):1-p^{\prime}=e^{{\epsilon}}(1-p),p\in[\frac{1}{1+e^{-{\epsilon}}},1]}f(p,p^{\prime};\gamma_{max})=(1-q_{local}^{*},1-q_{local}^{\prime*})=(\frac{1}{1+e^{-{\epsilon}}},\frac{1}{1+e^{{\epsilon}}}) (104)
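Part II can likewise be checked numerically. The sketch below (Python; K, m, ϵ with m ≥ (K+1)/2 are illustrative choices) verifies both the product inequality used above and the resulting monotonic non-decrease of f(q; γ_max) from Eq. 98 on this boundary:

```python
from math import comb, exp

def f_boundary2(q, K, m, eps):
    """Privacy cost objective of Eq. 98 on the boundary 1 - p' = e^eps * (1 - p),
    written in q = 1 - p, with gamma_max(l) = 1 for all l."""
    t = lambda l, r: comb(K, l) * (1 - r)**l * r**(K - l)
    low = sum(exp(m * eps) * t(l, exp(eps) * q) - t(l, q) for l in range((K - 1)//2 + 1))
    high = sum(t(l, q) - exp(m * eps) * t(l, exp(eps) * q) for l in range((K + 1)//2, K + 1))
    return low + high

K, m, eps = 5, 3, 0.4                        # illustrative values with m >= (K+1)/2
qmax = 1 / (1 + exp(eps))
grid = [i / 1000 * qmax for i in range(1001)]
# Key inequality behind Eq. 102: (1 - e^eps q)(e^eps q) >= (1 - q) q on [0, qmax]
assert all((1 - exp(eps) * q) * (exp(eps) * q) >= (1 - q) * q - 1e-15 for q in grid)
# Hence f is monotonically non-decreasing on this boundary
vals = [f_boundary2(q, K, m, eps) for q in grid]
assert all(v2 >= v1 - 1e-12 for v1, v2 in zip(vals, vals[1:]))
```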

Part III: The global worst case probabilities.

Notice that (11+eϵ,11+eϵ)(\frac{1}{1+e^{-{\epsilon}}},\frac{1}{1+e^{{\epsilon}}}), the maximum on the second boundary 1p=eϵ(1p),p[11+eϵ,1]1-p^{\prime}=e^{{\epsilon}}(1-p),\forall p\in[\frac{1}{1+e^{-{\epsilon}}},1], is exactly the minimum on the first boundary p=eϵp,p[0,11+eϵ]p=e^{{\epsilon}}p^{\prime},\forall p\in[0,\frac{1}{1+e^{-{\epsilon}}}].

Therefore, the global maximum given γmax\gamma_{max} is

(p,p)=argmax(p,p)f(p,p;γmax)=argmax(p,p):p=eϵp,p[0,11+eϵ]f(p,p;γmax)=(0,0)\displaystyle(p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma_{max})=\operatorname*{arg\,max}_{(p,p^{\prime}):p=e^{{\epsilon}}p^{\prime},p\in[0,\frac{1}{1+e^{-{\epsilon}}}]}f(p,p^{\prime};\gamma_{max})=(0,0) (105)

and recall that f(0,0;γmax)=emϵ1f(0,0;\gamma_{max})=e^{m{\epsilon}}-1.

Hence, if mK+12m\geq\frac{K+1}{2}, by Lemma 3.4 DaRRMγmax\textsf{DaRRM}_{\gamma_{max}} is mϵm{\epsilon}-differentially private.

B.2.2 Privacy Amplification Under A Small Privacy Allowance mK12m\leq\frac{K-1}{2}

The proof of Lemma B.10 is slightly more involved. First, recall by Lemma 3.1, γSub\gamma_{Sub}, the noise function that makes the output of DaRRMγSub\textsf{DaRRM}_{\gamma_{Sub}} and the subsampling baseline the same, is

\displaystyle\gamma_{Sub}(l)
={12j=m+12m(lj)(Klmj)(Km)if m is odd12j=m2+1m(lj)(Klmj)(Km)(lm2)(Klm2)(Km)if m is even\displaystyle=\begin{cases}1-2\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}&\text{if $m$ is odd}\\ 1-2\sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}-\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}}&\text{if $m$ is even}\end{cases}

for l{0,1,,K}l\in\{0,1,\dots,K\}, where the privacy allowance mm\in{\mathbb{Z}}.

If we define h(l):={j=m+12m(lj)(Klmj)(Km)if m is oddj=m2+1m(lj)(Klmj)(Km)(lm2)(Klm2)(Km)if m is evenh(l):=\begin{cases}\sum_{j=\frac{m+1}{2}}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}&\text{if $m$ is odd}\\ \sum_{j=\frac{m}{2}+1}^{m}\frac{{l\choose j}{K-l\choose m-j}}{{K\choose m}}-\frac{{l\choose\frac{m}{2}}{K-l\choose\frac{m}{2}}}{{K\choose m}}&\text{if $m$ is even}\end{cases}, then γSub(l)\gamma_{Sub}(l) can be written as γSub(l)={12h(l)if lK122h(l)1if lK+12\gamma_{Sub}(l)=\begin{cases}1-2h(l)&\text{if $l\leq\frac{K-1}{2}$}\\ 2h(l)-1&\text{if $l\geq\frac{K+1}{2}$}\end{cases}.

This can be generalized to a broader class of γ\gamma functions, which we call the “symmetric form family”, as follows:

Definition B.6.

γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1] is a member of the “symmetric form family” if γ\gamma follows

γ(l)={12h(l)if lK122h(l)1if lK+12\displaystyle\gamma(l)=\begin{cases}1-2h(l)&\text{if $l\leq\frac{K-1}{2}$}\\ 2h(l)-1&\text{if $l\geq\frac{K+1}{2}$}\end{cases} (106)

where h:{0,1,,K}[0,1]h:\{0,1,\dots,K\}\rightarrow[0,1] and

\displaystyle h(l)+h(K-l)=1,\ \forall l\in\{0,1,\dots,K\},\quad h(l+1)\geq h(l),\ \forall l\in\{0,1,\dots,K-1\},\quad\text{and}\quad\gamma(\frac{K-1}{2})>0,\ \gamma(\frac{K+1}{2})>0

It is easy to verify that any γ\gamma function belonging to the “symmetric form family” satisfies: 1) symmetry around K2\frac{K}{2} and 2) the monotonicity assumption. Hence, Lemma B.1 can be invoked to find the worst case probabilities given such a γ\gamma, i.e., (p,p)=argmax(p,p)f(p,p;γ)(p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma), which in turn gives the guarantee that DaRRMγ\textsf{DaRRM}_{\gamma} is mϵm{\epsilon}-differentially private.
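As a concrete check of Definition B.6, the following Python sketch builds h from the Hypergeometric pmf sums in the odd-m case of γ_Sub above (K = 9 and m = 3 are illustrative choices) and verifies the required conditions numerically:

```python
from math import comb

K, m = 9, 3                                  # illustrative K and an odd privacy allowance m

def h(l):
    # Odd-m case: h(l) is a sum of Hypergeometric pmfs, as in the definition of gamma_Sub
    return sum(comb(l, j) * comb(K - l, m - j) for j in range((m + 1)//2, m + 1)) / comb(K, m)

def gamma_sub(l):
    # The "symmetric form family" shape of Definition B.6
    return 1 - 2 * h(l) if l <= (K - 1)//2 else 2 * h(l) - 1

# Conditions of Definition B.6:
assert all(abs(h(l) + h(K - l) - 1) < 1e-12 for l in range(K + 1))   # h(l) + h(K-l) = 1
assert all(h(l + 1) >= h(l) - 1e-12 for l in range(K))               # monotonicity of h
assert gamma_sub((K - 1)//2) > 0 and gamma_sub((K + 1)//2) > 0
# Consequently gamma_sub is symmetric around K/2:
assert all(abs(gamma_sub(l) - gamma_sub(K - l)) < 1e-12 for l in range(K + 1))
```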

Roadmap. In this section, we restrict our search for a good γ\gamma that maximizes the utility of DaRRMγ\textsf{DaRRM}_{\gamma} to the “symmetric form family”. To show the main privacy amplification result under a small mm in Lemma B.10, Section B.2.4, we need a few building blocks, shown in Section B.2.3. We first show in Lemma B.7, Section B.2.3, two clean sufficient conditions under which a “symmetric form family” γ\gamma makes DaRRMγ\textsf{DaRRM}_{\gamma} mϵm{\epsilon}-differentially private; the conditions are stated in terms of expectations of increments of the hh function applied to Binomial random variables. Binomial random variables appear in the lemma because the sum of the observed outcomes on a dataset 𝒟{\mathcal{D}}, (𝒟){\mathcal{L}}({\mathcal{D}}), follows a Binomial distribution in the i.i.d. mechanisms setting. Next, we show in Lemma B.9 a recurrence relation that connects expectations over Binomial random variables to expectations over Hypergeometric random variables. This is needed because, for γ\gamma functions that make DaRRMγ\textsf{DaRRM}_{\gamma} output the same as the majority of subsampled mechanisms, the hh function is a sum of pmfs of a Hypergeometric random variable.

Finally, the proof of the main result under a small mm (Lemma B.10) is presented in Section B.2.4, based on Lemma B.7 and Lemma B.9. We show in Lemma B.10 that γDSub\gamma_{DSub}, i.e., the γ\gamma function under which DaRRMγDSub\textsf{DaRRM}_{\gamma_{DSub}} outputs the same as the majority of 2m12m-1 subsampled mechanisms, belongs to the “symmetric form family” and satisfies the sufficient conditions stated in Lemma B.7, implying that DaRRMγDSub\textsf{DaRRM}_{\gamma_{DSub}} is mϵm{\epsilon}-differentially private.

B.2.3 Building Blocks

Lemma B.7 (Privacy conditions of the “symmetric form family” functions).

Let random variables XBinomial(K1,p)X\sim\text{Binomial}(K-1,p^{\prime}), YBinomial(K1,eϵp)Y\sim\text{Binomial}(K-1,e^{{\epsilon}}p^{\prime}), X^Binomial(K1,1eϵ(1p))\hat{X}\sim\text{Binomial}(K-1,1-e^{{\epsilon}}(1-p)) and Y^Binomial(K1,p)\hat{Y}\sim\text{Binomial}(K-1,p). For a function γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1] that belongs to the “symmetric form family” (Definition B.6), if γ\gamma also satisfies both conditions as follows:

emϵ𝔼X[h(X+1)h(X)]eϵ𝔼Y[h(Y+1)h(Y)],p[0,11+eϵ]\displaystyle e^{m{\epsilon}}\mathbb{E}_{X}[h(X+1)-h(X)]\geq e^{{\epsilon}}\mathbb{E}_{Y}[h(Y+1)-h(Y)],\quad\forall p^{\prime}\in[0,\frac{1}{1+e^{{\epsilon}}}] (107)
e(m+1)ϵ𝔼X^[h(X^+1)h(X^)]𝔼Y^[h(Y^+1)h(Y^)],p[11+eϵ,1]\displaystyle e^{(m+1){\epsilon}}\mathbb{E}_{\hat{X}}[h(\hat{X}+1)-h(\hat{X})]\geq\mathbb{E}_{\hat{Y}}[h(\hat{Y}+1)-h(\hat{Y})],\quad\forall p\in[\frac{1}{1+e^{-{\epsilon}}},1] (108)

then Algorithm DaRRMγ\textsf{DaRRM}_{\gamma} is mϵm{\epsilon}-differentially private.

Proof of Lemma B.7.

Since h(l+1)h(l)h(l+1)\geq h(l) on l{0,,K}l\in\{0,\dots,K\}, γ(l)γ(l+1),lK2\gamma(l)\geq\gamma(l+1),\forall l\leq\frac{K}{2} and γ(l+1)γ(l),lK2\gamma(l+1)\geq\gamma(l),\forall l\geq\frac{K}{2}. Furthermore, since h(l)+h(Kl)=1h(l)+h(K-l)=1, γ(K12)=12h(K12)=12(1h(K+12))=2h(K+12)1\gamma(\frac{K-1}{2})=1-2h(\frac{K-1}{2})=1-2(1-h(\frac{K+1}{2}))=2h(\frac{K+1}{2})-1. Hence, any γ\gamma that belongs to the “symmetric form family” satisfies: 1) symmetric around K2\frac{K}{2}, 2) the monotonicity assumption, and 3) γ(K12)=γ(K+12)>0\gamma(\frac{K-1}{2})=\gamma(\frac{K+1}{2})>0.

Therefore, by Lemma B.1, the worst case probabilities (p,p)=argmax(p,p)f(p,p;γ)(p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma) are on one of the two boundaries of {\mathcal{F}}, satisfying

p=eϵp,\displaystyle p^{*}=e^{{\epsilon}}p^{\prime*},\quad p[0,1eϵ+1],p[0,11+eϵ]\displaystyle\forall p^{*}\in[0,\frac{1}{e^{-{\epsilon}}+1}],p^{\prime*}\in[0,\frac{1}{1+e^{{\epsilon}}}] (109)
or 1p=eϵ(1p),\displaystyle\text{or }\quad 1-p^{\prime*}=e^{{\epsilon}}(1-p^{*}),\quad p[11+eϵ,1],p[11+eϵ,1]\displaystyle\forall p^{*}\in[\frac{1}{1+e^{-{\epsilon}}},1],p^{\prime*}\in[\frac{1}{1+e^{{\epsilon}}},1] (110)

We now derive, from the two boundaries in Eq. 109 and Eq. 110 separately, sufficient conditions under which any γ\gamma from the “symmetric form family” makes DaRRMγ\textsf{DaRRM}_{\gamma} mϵm{\epsilon}-differentially private.

Part I: Deriving a sufficient condition from Eq. 109 for “symmetric form family” γ\gamma.

Consider the boundary of {\mathcal{F}}, p=eϵpp=e^{{\epsilon}}p^{\prime}, p[0,11+eϵ],p[0,11+eϵ]\forall p\in[0,\frac{1}{1+e^{-{\epsilon}}}],p^{\prime}\in[0,\frac{1}{1+e^{{\epsilon}}}].

Given any γ\gamma, plugging p=eϵpp=e^{{\epsilon}}p^{\prime} into the privacy cost objective f(p,p;γ)f(p,p^{\prime};\gamma), one gets

f(p;γ)\displaystyle f(p^{\prime};\gamma) =l=0K12(emϵ(Kl)pl(1p)Kl(Kl)(eϵp)l(1eϵp)Kl)γ(l)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}(e^{m{\epsilon}}{K\choose l}p^{\prime l}(1-p^{\prime})^{K-l}-{K\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l})\cdot\gamma(l) (111)
+l=K+12K((Kl)(eϵp)l(1eϵp)Klemϵ(Kl)pl(1p)Kl)γ(l)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}({K\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l}-e^{m{\epsilon}}{K\choose l}p^{\prime l}(1-p^{\prime})^{K-l})\cdot\gamma(l)

The gradient w.r.t. pp^{\prime} is

\displaystyle\frac{\nabla_{p^{\prime}}f(p^{\prime};\gamma)}{K}=e^{m{\epsilon}}\sum_{l=0}^{\frac{K-1}{2}-1}{K-1\choose l}p^{\prime l}(1-p^{\prime})^{K-l-1}\Big{(}\gamma(l+1)-\gamma(l)\Big{)}-2e^{m{\epsilon}}{K-1\choose\frac{K-1}{2}}p^{\prime\frac{K-1}{2}}(1-p^{\prime})^{\frac{K-1}{2}}\gamma(\frac{K-1}{2}) (112)
+emϵl=K+12K1(K1l)pl(1p)Kl1(γ(l)γ(l+1))\displaystyle+e^{m{\epsilon}}\sum_{l=\frac{K+1}{2}}^{K-1}{K-1\choose l}p^{\prime l}(1-p^{\prime})^{K-l-1}\Big{(}\gamma(l)-\gamma(l+1)\Big{)}
+eϵl=0K121(K1l)(eϵp)l(1eϵp)Kl1(γ(l)γ(l+1))+2eϵ(K1K12)(eϵp)K12(1eϵp)K12γ(K12)\displaystyle+e^{{\epsilon}}\sum_{l=0}^{\frac{K-1}{2}-1}{K-1\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l-1}\Big{(}\gamma(l)-\gamma(l+1)\Big{)}+2e^{{\epsilon}}{K-1\choose\frac{K-1}{2}}(e^{{\epsilon}}p^{\prime})^{\frac{K-1}{2}}(1-e^{{\epsilon}}p^{\prime})^{\frac{K-1}{2}}\gamma(\frac{K-1}{2})
+eϵl=K+12K1(K1l)(eϵp)l(1eϵp)Kl1(γ(l+1)γ(l))\displaystyle+e^{{\epsilon}}\sum_{l=\frac{K+1}{2}}^{K-1}{K-1\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l-1}\Big{(}\gamma(l+1)-\gamma(l)\Big{)}

Consider l{0,1,,K}l\in\{0,1,\dots,K\} in the above Eq. 112. For any function γ\gamma that belongs to the “symmetric form family”,

  1.

    If lK2l\leq\frac{K}{2}, γ(l)γ(l+1)=(12h(l))(12h(l+1))=2h(l+1)2h(l)\gamma(l)-\gamma(l+1)=(1-2h(l))-(1-2h(l+1))=2h(l+1)-2h(l)

  2.

    If lK2l\geq\frac{K}{2}, γ(l+1)γ(l)=(2h(l+1)1)(2h(l)1)=2h(l+1)2h(l)\gamma(l+1)-\gamma(l)=(2h(l+1)-1)-(2h(l)-1)=2h(l+1)-2h(l)

  3.

    Since γ(K12)=γ(K+12)\gamma(\frac{K-1}{2})=\gamma(\frac{K+1}{2}),

    2γ(K12)\displaystyle 2\gamma(\frac{K-1}{2}) =(γ(K12)+γ(K+12))\displaystyle=\Big{(}\gamma(\frac{K-1}{2})+\gamma(\frac{K+1}{2})\Big{)} (113)
    =(12h(K12)+2h(K+12)1)\displaystyle=\Big{(}1-2h(\frac{K-1}{2})+2h(\frac{K+1}{2})-1\Big{)} (114)
    =2h(K+12)2h(K12)\displaystyle=2h(\frac{K+1}{2})-2h(\frac{K-1}{2}) (115)

Hence, following Eq. 112, the gradient, pf(p;γ)\nabla_{p^{\prime}}f(p^{\prime};\gamma), given a “symmetric form family” γ\gamma can be written as

\displaystyle\frac{\nabla_{p^{\prime}}f(p^{\prime};\gamma)}{K}=-e^{m{\epsilon}}\sum_{l=0}^{K-1}{K-1\choose l}p^{\prime l}(1-p^{\prime})^{K-l-1}\Big{(}2h(l+1)-2h(l)\Big{)} (116)
+eϵl=0K1(K1l)(eϵp)l(1eϵp)Kl1(2h(l+1)2h(l))\displaystyle+e^{{\epsilon}}\sum_{l=0}^{K-1}{K-1\choose l}(e^{{\epsilon}}p^{\prime})^{l}(1-e^{{\epsilon}}p^{\prime})^{K-l-1}\Big{(}2h(l+1)-2h(l)\Big{)}
=2emϵ𝔼X[h(X+1)h(X)]+2eϵ𝔼Y[h(Y+1)h(Y)]\displaystyle=-2e^{m{\epsilon}}\mathbb{E}_{X}[h(X+1)-h(X)]+2e^{{\epsilon}}\mathbb{E}_{Y}[h(Y+1)-h(Y)] (117)

where XBinomial(K1,p)X\sim\text{Binomial}(K-1,p^{\prime}) and YBinomial(K1,eϵp)Y\sim\text{Binomial}(K-1,e^{{\epsilon}}p^{\prime}). The above implies

pf(p;γ)0eϵ𝔼Y[h(Y+1)h(Y)]emϵ𝔼X[h(X+1)h(X)]\displaystyle\nabla_{p^{\prime}}f(p^{\prime};\gamma)\leq 0\iff e^{{\epsilon}}\mathbb{E}_{Y}[h(Y+1)-h(Y)]\leq e^{m{\epsilon}}\mathbb{E}_{X}[h(X+1)-h(X)] (118)

If pf(p;γ)0\nabla_{p^{\prime}}f(p^{\prime};\gamma)\leq 0, then the local worst case probabilities on the boundary p=eϵp,p[0,11+eϵ]p=e^{{\epsilon}}p^{\prime},\forall p\in[0,\frac{1}{1+e^{-{\epsilon}}}], given such a γ\gamma, are (plocal,plocal)=argmax(p,p):p=eϵp,p[0,11+eϵ]f(p,p;γ)=(0,0)(p_{local}^{*},p_{local}^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime}):p=e^{{\epsilon}}p^{\prime},p\in[0,\frac{1}{1+e^{-{\epsilon}}}]}f(p,p^{\prime};\gamma)=(0,0). Furthermore, recall that the privacy cost objective given any γ\gamma is

f(p,p;γ)\displaystyle f(p,p^{\prime};\gamma)
=l=0K12(emϵαlαl)γ(l)+l=K+12K(αlemϵαl)γ(l)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}(e^{m{\epsilon}}\alpha_{l}^{\prime}-\alpha_{l})\cdot\gamma(l)+\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\cdot\gamma(l)
=l=0K12(emϵ(Kl)pl(1p)Kl(Kl)pl(1p)Kl)γ(l)+l=K+12K((Kl)pl(1p)Klemϵ(Kl)pl(1p)Kl)γ(l)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}\Big{(}e^{m{\epsilon}}{K\choose l}p^{\prime l}(1-p^{\prime})^{K-l}-{K\choose l}p^{l}(1-p)^{K-l}\Big{)}\cdot\gamma(l)+\sum_{l=\frac{K+1}{2}}^{K}\Big{(}{K\choose l}p^{l}(1-p)^{K-l}-e^{m{\epsilon}}{K\choose l}p^{\prime l}(1-p^{\prime})^{K-l}\Big{)}\cdot\gamma(l)

and so for any γ\gamma,

f(0,0;γ)=(emϵ1)γ(0)emϵ1\displaystyle f(0,0;\gamma)=(e^{m{\epsilon}}-1)\cdot\gamma(0)\leq e^{m{\epsilon}}-1 (119)

Also, notice the local minimum on this boundary is

\displaystyle(p_{min},p_{min}^{\prime})=\operatorname*{arg\,min}_{(p,p^{\prime}):p=e^{{\epsilon}}p^{\prime},p\in[0,\frac{1}{1+e^{-{\epsilon}}}]}f(p,p^{\prime};\gamma)=(\frac{1}{1+e^{-{\epsilon}}},\frac{1}{1+e^{{\epsilon}}}) (120)
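The rewriting of the gradient in Eq. 116 and Eq. 117 can be spot-checked numerically. The Python sketch below uses the illustrative smoothstep choice h(l) = 3(l/K)² − 2(l/K)³, which satisfies h(l) + h(K−l) = 1 and monotonicity, and compares a central finite difference of f(p′; γ) from Eq. 111 against the Binomial-expectation form (K, m, ϵ, and the evaluation point are arbitrary demonstration values):

```python
from math import comb, exp

K, m, eps = 5, 2, 0.3                        # illustrative values
s = lambda x: 3 * x**2 - 2 * x**3            # smoothstep: s(x) + s(1-x) = 1, monotone on [0, 1]
h = [s(l / K) for l in range(K + 1)]         # a hypothetical "symmetric form family" h
gam = [1 - 2 * h[l] if l <= (K - 1)//2 else 2 * h[l] - 1 for l in range(K + 1)]

def B(n, l, r):                              # Binomial(n, r) pmf at l
    return comb(n, l) * r**l * (1 - r)**(n - l)

def f(pp):                                   # privacy cost objective of Eq. 111
    low = sum((exp(m * eps) * B(K, l, pp) - B(K, l, exp(eps) * pp)) * gam[l]
              for l in range((K - 1)//2 + 1))
    high = sum((B(K, l, exp(eps) * pp) - exp(m * eps) * B(K, l, pp)) * gam[l]
               for l in range((K + 1)//2, K + 1))
    return low + high

def grad_formula(pp):                        # Eq. 117: gradient via expectations of h-increments
    dh = lambda r: sum(B(K - 1, l, r) * (h[l + 1] - h[l]) for l in range(K))
    return K * (-2 * exp(m * eps) * dh(pp) + 2 * exp(eps) * dh(exp(eps) * pp))

pp, d = 0.12, 1e-6
fd = (f(pp + d) - f(pp - d)) / (2 * d)       # central finite difference of f at pp
assert abs(fd - grad_formula(pp)) < 1e-6
```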

Part II: Deriving a sufficient condition from Eq. 110 for “symmetric form family” γ\gamma.

Consider the boundary of {\mathcal{F}}, 1p=eϵ(1p)1-p^{\prime}=e^{{\epsilon}}(1-p), p[11+eϵ,1],p[11+eϵ,1]\forall p\in[\frac{1}{1+e^{-{\epsilon}}},1],p^{\prime}\in[\frac{1}{1+e^{{\epsilon}}},1]. For simplicity, let q=1p[0,11+eϵ]q=1-p\in[0,\frac{1}{1+e^{{\epsilon}}}] and q=1p[0,11+eϵ]q^{\prime}=1-p^{\prime}\in[0,\frac{1}{1+e^{-{\epsilon}}}]. Plugging q=eϵqq^{\prime}=e^{{\epsilon}}q into the privacy cost objective, one gets, given any γ\gamma,

f(q;γ)\displaystyle f(q;\gamma) =l=0K12(emϵ(Kl)(1eϵq)l(eϵq)Kl(Kl)(1q)lqKl)γ(l)\displaystyle=\sum_{l=0}^{\frac{K-1}{2}}\Big{(}e^{m{\epsilon}}{K\choose l}(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l}-{K\choose l}(1-q)^{l}q^{K-l}\Big{)}\cdot\gamma(l) (121)
+l=K+12K((Kl)(1q)lqKlemϵ(Kl)(1eϵq)l(eϵq)Kl)γ(l)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K}\Big{(}{K\choose l}(1-q)^{l}q^{K-l}-e^{m{\epsilon}}{K\choose l}(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l}\Big{)}\cdot\gamma(l)

The gradient w.r.t. qq is

qf(q;γ)K\displaystyle\frac{\nabla_{q}f(q;\gamma)}{K} =l=0K121e(m+1)ϵ(K1l)(1eϵq)l(eϵq)Kl1(γ(l)γ(l+1))\displaystyle=\sum_{l=0}^{\frac{K-1}{2}-1}e^{(m+1){\epsilon}}{K-1\choose l}(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l-1}\cdot\Big{(}\gamma(l)-\gamma(l+1)\Big{)} (122)
+l=K+12K1(K1l)(1eϵq)l(eϵq)Kl1(γ(l+1)γ(l))+2e(m+1)ϵ(K1K12)(1eϵq)K12(eϵq)K12γ(K12)\displaystyle+\sum_{l=\frac{K+1}{2}}^{K-1}{K-1\choose l}(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l-1}\cdot\Big{(}\gamma(l+1)-\gamma(l)\Big{)}+2e^{(m+1){\epsilon}}{K-1\choose\frac{K-1}{2}}(1-e^{{\epsilon}}q)^{\frac{K-1}{2}}(e^{{\epsilon}}q)^{\frac{K-1}{2}}\cdot\gamma(\frac{K-1}{2})
+l=0K121(K1l)(1q)lqKl1(γ(l+1)γ(l))\displaystyle+\sum_{l=0}^{\frac{K-1}{2}-1}{K-1\choose l}(1-q)^{l}q^{K-l-1}\cdot\Big{(}\gamma(l+1)-\gamma(l)\Big{)}
\displaystyle+\sum_{l=\frac{K+1}{2}}^{K-1}{K-1\choose l}(1-q)^{l}q^{K-l-1}\cdot\Big{(}\gamma(l)-\gamma(l+1)\Big{)}-2{K-1\choose\frac{K-1}{2}}(1-q)^{\frac{K-1}{2}}q^{\frac{K-1}{2}}\cdot\gamma(\frac{K-1}{2})

For any function γ\gamma that belongs to the “symmetric form family”, the gradient qf(q;γ)\nabla_{q}f(q;\gamma) can be written as

qf(q;γ)K\displaystyle\frac{\nabla_{q}f(q;\gamma)}{K} =e(m+1)ϵl=0K1(K1l)(1eϵq)l(eϵq)Kl1(2h(l+1)2h(l))\displaystyle=e^{(m+1){\epsilon}}\sum_{l=0}^{K-1}{K-1\choose l}(1-e^{{\epsilon}}q)^{l}(e^{{\epsilon}}q)^{K-l-1}\cdot\Big{(}2h(l+1)-2h(l)\Big{)} (123)
\displaystyle-\sum_{l=0}^{K-1}{K-1\choose l}(1-q)^{l}q^{K-l-1}\cdot\Big{(}2h(l+1)-2h(l)\Big{)}
=2e(m+1)ϵ𝔼X^[h(X^+1)h(X^)]2𝔼Y^[h(Y^+1)h(Y^)]\displaystyle=2e^{(m+1){\epsilon}}\mathbb{E}_{\hat{X}}[h(\hat{X}+1)-h(\hat{X})]-2\mathbb{E}_{\hat{Y}}[h(\hat{Y}+1)-h(\hat{Y})] (124)

where X^Binomial(K1,1eϵ(1p))\hat{X}\sim\text{Binomial}(K-1,1-e^{{\epsilon}}(1-p)) and Y^Binomial(K1,p)\hat{Y}\sim\text{Binomial}(K-1,p). The above implies

qf(q;γ)0e(m+1)ϵ𝔼X^[h(X^+1)h(X^)]𝔼Y^[h(Y^+1)h(Y^)]\displaystyle\nabla_{q}f(q;\gamma)\geq 0\iff e^{(m+1){\epsilon}}\mathbb{E}_{\hat{X}}[h(\hat{X}+1)-h(\hat{X})]\geq\mathbb{E}_{\hat{Y}}[h(\hat{Y}+1)-h(\hat{Y})] (125)

If qf(q;γ)0\nabla_{q}f(q;\gamma)\geq 0, then since q[0,11+eϵ]q\in[0,\frac{1}{1+e^{{\epsilon}}}], we know that the local maximum given any γ\gamma is (qlocal,qlocal)=argmax(q,q):q=eϵq,q[0,11+eϵ]f(q,q;γ)=(11+eϵ,11+eϵ)(q_{local}^{*},q_{local}^{\prime*})=\operatorname*{arg\,max}_{(q,q^{\prime}):q^{\prime}=e^{{\epsilon}}q,q\in[0,\frac{1}{1+e^{{\epsilon}}}]}f(q,q^{\prime};\gamma)=(\frac{1}{1+e^{{\epsilon}}},\frac{1}{1+e^{-{\epsilon}}}). That is,

(plocal,plocal)=argmax(p,p):1p=eϵ(1p),p[11+eϵ,1]f(p,p;γ)=(1qlocal,1qlocal)=(11+eϵ,11+eϵ)\displaystyle(p_{local}^{*},p_{local}^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime}):1-p^{\prime}=e^{{\epsilon}}(1-p),p\in[\frac{1}{1+e^{-{\epsilon}}},1]}f(p,p^{\prime};\gamma)=(1-q_{local}^{*},1-q_{local}^{\prime*})=(\frac{1}{1+e^{-{\epsilon}}},\frac{1}{1+e^{{\epsilon}}})

Notice by Eq. 120, the above (11+eϵ,11+eϵ)(\frac{1}{1+e^{-{\epsilon}}},\frac{1}{1+e^{{\epsilon}}}) is the local minimum on the first boundary p=eϵpp=e^{{\epsilon}}p^{\prime}, p[0,11+eϵ]\forall p\in[0,\frac{1}{1+e^{-{\epsilon}}}].

Therefore, given an arbitrary γ\gamma function, if it satisfies both of the following:

  1.

    On the boundary p=eϵp,p[0,11+eϵ]p=e^{{\epsilon}}p^{\prime},\forall p\in[0,\frac{1}{1+e^{-{\epsilon}}}], pf(p;γ)0\nabla_{p^{\prime}}f(p^{\prime};\gamma)\leq 0

  2.

    On the boundary 1p=eϵ(1p),p[11+eϵ,1]1-p^{\prime}=e^{{\epsilon}}(1-p),\forall p\in[\frac{1}{1+e^{-{\epsilon}}},1], qf(q;γ)0\nabla_{q}f(q;\gamma)\geq 0, where q=1pq=1-p

then the global worst case probabilities given this γ\gamma are (p,p)=argmax(p,p)f(p,p;γ)=(0,0)(p^{*},p^{\prime*})=\operatorname*{arg\,max}_{(p,p^{\prime})\in{\mathcal{F}}}f(p,p^{\prime};\gamma)=(0,0). Furthermore, since by Eq. 119, f(0,0;γ)emϵ1f(0,0;\gamma)\leq e^{m{\epsilon}}-1 for any γ\gamma, DaRRMγ\textsf{DaRRM}_{\gamma} is mϵm{\epsilon}-differentially private by Lemma 3.4.

Now, if γ\gamma belongs to the “symmetric form family”, by Eq. 118 and Eq. 125, the sufficient conditions for γ\gamma that enables DaRRMγ\textsf{DaRRM}_{\gamma} to be mϵm{\epsilon}-differentially private are hence

eϵ𝔼Y[h(Y+1)h(Y)]emϵ𝔼X[h(X+1)h(X)],p[0,11+eϵ]\displaystyle e^{{\epsilon}}\mathbb{E}_{Y}[h(Y+1)-h(Y)]\leq e^{m{\epsilon}}\mathbb{E}_{X}[h(X+1)-h(X)],\quad\forall p^{\prime}\in[0,\frac{1}{1+e^{{\epsilon}}}]
ande(m+1)ϵ𝔼X^[h(X^+1)h(X^)]𝔼Y^[h(Y^+1)h(Y^)],p[11+eϵ,1]\displaystyle\text{and}\quad e^{(m+1){\epsilon}}\mathbb{E}_{\hat{X}}[h(\hat{X}+1)-h(\hat{X})]\geq\mathbb{E}_{\hat{Y}}[h(\hat{Y}+1)-h(\hat{Y})],\quad\forall p\in[\frac{1}{1+e^{-{\epsilon}}},1]

where XBinomial(K1,p)X\sim\text{Binomial}(K-1,p^{\prime}), YBinomial(K1,eϵp)Y\sim\text{Binomial}(K-1,e^{{\epsilon}}p^{\prime}), X^Binomial(K1,1eϵ(1p))\hat{X}\sim\text{Binomial}(K-1,1-e^{{\epsilon}}(1-p)) and Y^Binomial(K1,p)\hat{Y}\sim\text{Binomial}(K-1,p).

Lemma B.8 (Binomial Expectation Recurrence Relationship (Theorem 2.1 of Zhang et al. (2019))).

Let X(K1)Binomial(K1,p)X_{(K-1)}\sim\text{Binomial}(K-1,p) and X(K)Binomial(K,p)X_{(K)}\sim\text{Binomial}(K,p). Let g(x)g(x) be a function with <𝔼[g(X(K1))]<-\infty<\mathbb{E}[g(X_{(K-1)})]<\infty and <g(1)<-\infty<g(-1)<\infty, then

Kp𝔼X(K1)[g(X(K1))]=𝔼X(K)[X(K)g(X(K)1)]\displaystyle Kp\mathbb{E}_{X_{(K-1)}}[g(X_{(K-1)})]=\mathbb{E}_{X_{(K)}}[X_{(K)}g(X_{(K)}-1)] (126)
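Lemma B.8 is a finite identity, so it can be checked exactly by enumeration over the Binomial supports. A small Python sketch (K, p, and the test function g are illustrative choices; note g(−1) = 1 is finite, as the lemma requires):

```python
from math import comb

def binom_pmf(n, k, p):
    # Binomial(n, p) pmf at k
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Check the recurrence: K * p * E[g(X_{(K-1)})] = E[X_{(K)} * g(X_{(K)} - 1)]
K, p = 7, 0.35                               # illustrative values
g = lambda x: (x + 2)**2                     # arbitrary test function with g(-1) finite
lhs = K * p * sum(binom_pmf(K - 1, k, p) * g(k) for k in range(K))
rhs = sum(binom_pmf(K, k, p) * k * g(k - 1) for k in range(K + 1))
assert abs(lhs - rhs) < 1e-12
```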
Lemma B.9.

Given i,m,Ki,m,K\in{\mathbb{Z}}, K1K\geq 1, 0imK0\leq i\leq m\leq K, let X(K)Binomial(K,p)X_{(K)}\sim\text{Binomial}(K,p) for some p[0,1]p\in[0,1]. Then,

1(Km)𝔼X(K)[(Xi)(KXmi)]=(mi)pi(1p)mi\displaystyle\frac{1}{{K\choose m}}\mathbb{E}_{X_{(K)}}\left[{X\choose i}{K-X\choose m-i}\right]={m\choose i}p^{i}(1-p)^{m-i} (127)
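Lemma B.9 can likewise be verified by exact enumeration over the support of X_{(K)}, for every valid i at once. A Python sketch with illustrative K, m, p (note `math.comb` returns 0 when the lower index exceeds the upper, matching the convention used in the pmf sums):

```python
from math import comb

# Exact check of Eq. 127 for all 0 <= i <= m
K, m, p = 8, 5, 0.3                          # illustrative values
pmf = lambda x: comb(K, x) * p**x * (1 - p)**(K - x)   # Binomial(K, p) pmf
for i in range(m + 1):
    lhs = sum(pmf(x) * comb(x, i) * comb(K - x, m - i) for x in range(K + 1)) / comb(K, m)
    rhs = comb(m, i) * p**i * (1 - p)**(m - i)
    assert abs(lhs - rhs) < 1e-12
```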
Proof of Lemma B.9.

We show the above statement in Eq. 127 by induction on KK and mm.

Base Case: K=1K=1.

  1.

    If m=0m=0, then i=0i=0. 1(10)𝔼X(1)[(X0)(1X0)]=𝔼X(1)[1]=1\frac{1}{{1\choose 0}}\mathbb{E}_{X_{(1)}}[{X\choose 0}{1-X\choose 0}]=\mathbb{E}_{X_{(1)}}[1]=1, and (00)p0(1p)0=1{0\choose 0}p^{0}(1-p)^{0}=1.

  2.

    If m=1m=1,

    (a)

      i=0i=0, 1(11)𝔼X(1)[(X0)(1X1)]=𝔼X(1)[1X]=1p\frac{1}{{1\choose 1}}\mathbb{E}_{X_{(1)}}[{X\choose 0}{1-X\choose 1}]=\mathbb{E}_{X_{(1)}}[1-X]=1-p, and (10)p0(1p)1=1p{1\choose 0}p^{0}(1-p)^{1}=1-p

    (b)

      i=1i=1, 1(11)𝔼X(1)[(X1)(1X0)]=𝔼X(1)[X]=p\frac{1}{{1\choose 1}}\mathbb{E}_{X_{(1)}}[{X\choose 1}{1-X\choose 0}]=\mathbb{E}_{X_{(1)}}[X]=p, and (11)p1(1p)0=p{1\choose 1}p^{1}(1-p)^{0}=p.

Hence, Eq. 127 holds for the base case.

Induction Hypothesis: Suppose the statement holds for some K1K\geq 1 and all 0imK0\leq i\leq m\leq K. For the inductive step, first consider 1imK+11\leq i\leq m\leq K+1:

1(K+1m)𝔼X(K+1)[(Xi)(K+1Xmi)]\displaystyle\frac{1}{{K+1\choose m}}\mathbb{E}_{X_{(K+1)}}\left[{X\choose i}{K+1-X\choose m-i}\right] (128)
=1(K+1m)𝔼X(K+1)[X!i!(Xi)!(K+1X)!(mi)!(K+1X(mi))!]\displaystyle=\frac{1}{{K+1\choose m}}\mathbb{E}_{X_{(K+1)}}[\frac{X!}{i!(X-i)!}\frac{(K+1-X)!}{(m-i)!(K+1-X-(m-i))!}] (129)
=1(K+1m)i!(mi)!𝔼X(K+1)[X(X1)!((X1)(i1))!(K(X1))!(K(X1)((m1)(i1)))!]\displaystyle=\frac{1}{{K+1\choose m}i!(m-i)!}\mathbb{E}_{X_{(K+1)}}[X\frac{(X-1)!}{((X-1)-(i-1))!}\frac{(K-(X-1))!}{(K-(X-1)-((m-1)-(i-1)))!}] (130)
\displaystyle=\frac{(K+1)p}{{K+1\choose m}i!(m-i)!}\mathbb{E}_{X_{(K)}}[\frac{X!}{(X-(i-1))!}{\frac{(K-X)!}{(K-X-((m-1)-(i-1)))!}}] (131)
(By Lemma B.8)
\displaystyle=\frac{(K+1)p\,(i-1)!(m-i)!}{{K+1\choose m}i!(m-i)!}\mathbb{E}_{X_{(K)}}[{X\choose i-1}{K-X\choose(m-1)-(i-1)}] (132)
=(i1)!(K+1m)i!(K+1)p(Km1)(m1i1)pi1(1p)mi\displaystyle=\frac{(i-1)!}{{K+1\choose m}i!}(K+1)p{K\choose m-1}{m-1\choose i-1}p^{i-1}(1-p)^{m-i} (133)
(By Induction Hypothesis)
=m!(K+1m)!(K+1)!iK!(m1)!(Km+1)!(m1)!(i1)!(mi)!(K+1)pi(1p)mi\displaystyle=\frac{m!(K+1-m)!}{(K+1)!i}\frac{K!}{(m-1)!(K-m+1)!}\frac{(m-1)!}{(i-1)!(m-i)!}(K+1)p^{i}(1-p)^{m-i} (134)
=m!i!(mi)!pi(1p)mi=(mi)pi(1p)mi\displaystyle=\frac{m!}{i!(m-i)!}p^{i}(1-p)^{m-i}={m\choose i}p^{i}(1-p)^{m-i} (135)

Now we consider the edge cases when 0=im0=i\leq m.

If i=0i=0 and m=0m=0,

1(K+10)𝔼X(K+1)[(X0)(K+1X0)]=1𝔼X(K+1)[1]=1=(00)p0(1p)0\displaystyle\frac{1}{{K+1\choose 0}}\mathbb{E}_{X_{(K+1)}}[{X\choose 0}{K+1-X\choose 0}]=1\cdot\mathbb{E}_{X_{(K+1)}}[1]=1={0\choose 0}p^{0}(1-p)^{0} (136)

If i=0i=0 and m>0m>0,

1(K+1m)𝔼X(K+1)[(K+1Xm)]\displaystyle\frac{1}{{K+1\choose m}}\mathbb{E}_{X_{(K+1)}}[{K+1-X\choose m}] (137)
=1(K+1m)x=0K+1(K+1xm)(K+1x)px(1p)K+1x\displaystyle=\frac{1}{{K+1\choose m}}\sum_{x=0}^{K+1}{K+1-x\choose m}{K+1\choose x}p^{x}(1-p)^{K+1-x} (138)
=1(K+1m)x=0K+1(K+1xm)((Kx)+(Kx1)𝕀{x1})px(1p)K+1x\displaystyle=\frac{1}{{K+1\choose m}}\sum_{x=0}^{K+1}{K+1-x\choose m}\Big{(}{K\choose x}+{K\choose x-1}{\mathbb{I}}\{x\geq 1\}\Big{)}p^{x}(1-p)^{K+1-x} (139)
=1(K+1m)x=0K(K+1xm)(Kx)px(1p)K+1x+1(K+1m)x=1K+1(K+1xm)(Kx1)px(1p)K+1x\displaystyle=\frac{1}{{K+1\choose m}}\sum_{x=0}^{K}{K+1-x\choose m}{K\choose x}p^{x}(1-p)^{K+1-x}+\frac{1}{{K+1\choose m}}\sum_{x=1}^{K+1}{K+1-x\choose m}{K\choose x-1}p^{x}(1-p)^{K+1-x} (140)
(Since when x=K+1x=K+1 and m>0m>0, (K+1xm)=0{K+1-x\choose m}=0)
=1(K+1m)(x=0K(Kxm)(Kx)px(1p)K+1x+x=0K(Kxm1)(Kx)px(1p)K+1x)\displaystyle=\frac{1}{{K+1\choose m}}\Big{(}\sum_{x=0}^{K}{K-x\choose m}{K\choose x}p^{x}(1-p)^{K+1-x}+\sum_{x=0}^{K}{K-x\choose m-1}{K\choose x}p^{x}(1-p)^{K+1-x}\Big{)} (141)
+1(K+1m)x=0K(Kxm)(Kx)px+1(1p)Kx\displaystyle+\frac{1}{{K+1\choose m}}\sum_{x=0}^{K}{K-x\choose m}{K\choose x}p^{x+1}(1-p)^{K-x}
(Since (K+1xm)=(Kxm)+(Kxm1){K+1-x\choose m}={K-x\choose m}+{K-x\choose m-1})
=1(K+1m)((1p)𝔼X(K)[(KXm)]+(1p)𝔼X(K)[(KXm1)])+1(K+1m)p𝔼X(K)[(KXm)]\displaystyle=\frac{1}{{K+1\choose m}}\Big{(}(1-p)\mathbb{E}_{X_{(K)}}[{K-X\choose m}]+(1-p)\mathbb{E}_{X_{(K)}}[{K-X\choose m-1}]\Big{)}+\frac{1}{{K+1\choose m}}p\mathbb{E}_{X_{(K)}}[{K-X\choose m}] (142)
=1(K+1m)(𝔼X(K)[(KXm)]+(1p)𝔼X(K)[(KXm1)])\displaystyle=\frac{1}{{K+1\choose m}}\Big{(}\mathbb{E}_{X_{(K)}}[{K-X\choose m}]+(1-p)\mathbb{E}_{X_{(K)}}[{K-X\choose m-1}]\Big{)} (143)
=1(K+1m)((Km)(1p)m+(1p)(Km1)(1p)m1)\displaystyle=\frac{1}{{K+1\choose m}}\Big{(}{K\choose m}(1-p)^{m}+(1-p){K\choose m-1}(1-p)^{m-1}\Big{)} (144)
(By Induction Hypothesis) (145)
=1(K+1m)(K+1m)(1p)m\displaystyle=\frac{1}{{K+1\choose m}}{K+1\choose m}(1-p)^{m} (146)
=(1p)m\displaystyle=(1-p)^{m} (147)

Hence, Eq. 127 holds for all K1K\geq 1 and 0imK0\leq i\leq m\leq K.
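As a numerical sanity check (not part of the proof), the identity in Eq. 127 can be verified by exact summation over the Binomial pmf. The sketch below uses our own helper names `lhs` and `rhs`:

```python
from math import comb

def lhs(K, m, i, p):
    # (1 / C(K, m)) * E_{X ~ Binomial(K, p)}[ C(X, i) * C(K - X, m - i) ]
    total = 0.0
    for x in range(K + 1):
        pmf = comb(K, x) * p**x * (1 - p)**(K - x)
        total += comb(x, i) * comb(K - x, m - i) * pmf   # comb(x, i) = 0 when i > x
    return total / comb(K, m)

def rhs(m, i, p):
    # C(m, i) * p^i * (1 - p)^(m - i)
    return comb(m, i) * p**i * (1 - p)**(m - i)

# Check Eq. 127 for a range of (K, m, i) and several values of p.
for K in (1, 5, 9):
    for m in range(K + 1):
        for i in range(m + 1):
            for p in (0.0, 0.3, 0.7, 1.0):
                assert abs(lhs(K, m, i, p) - rhs(m, i, p)) < 1e-12
```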

B.2.4 Main Result: Privacy Amplification Under a Small mm

Lemma B.10 (Privacy amplification, mK12m\leq\frac{K-1}{2}).

Consider using DaRRM (Algorithm 1) to solve Problem 1.1, with i.i.d. mechanisms {Mi}i=1K\{M_{i}\}_{i=1}^{K}, pi=pp_{i}=p, pi=pp_{i}^{\prime}=p^{\prime}, i[K]\forall i\in[K], the privacy allowance 1mK12,m1\leq m\leq\frac{K-1}{2},m\in{\mathbb{Z}} and δ=Δ=0\delta=\Delta=0. Let the noise function be that

γDSub(l)={12h(l)l{0,1,,K12}2h(l)1l{K+12,,K}\displaystyle\gamma_{DSub}(l)=\begin{cases}1-2h(l)&\forall l\in\{0,1,\dots,\frac{K-1}{2}\}\\ 2h(l)-1&\forall l\in\{\frac{K+1}{2},\dots,K\}\end{cases} (148)

where h:{0,1,,K}[0,1]h:\{0,1,\dots,K\}\rightarrow[0,1] and h(l)=i=m2m1(li)(Kl2m1i)(K2m1)h(l)=\sum_{i=m}^{2m-1}\frac{{l\choose i}{K-l\choose 2m-1-i}}{{K\choose 2m-1}}, l{0,1,,K}\forall l\in\{0,1,\dots,K\}, then Algorithm DaRRMγDSub\textsf{DaRRM}_{\gamma_{DSub}} is mϵm{\epsilon}-differentially private.

Proof of Lemma B.10.

First, note γDSub\gamma_{DSub} belongs to the “symmetric form family”. We show γDSub\gamma_{DSub} satisfies the two sufficient conditions in Lemma B.7 and hence by Lemma B.7, DaRRMγDSub\textsf{DaRRM}_{\gamma_{DSub}} is mϵm{\epsilon}-differentially private. Specifically, we consider h(l)=i=m2m1(li)(Kl2m1i)(K2m1)h(l)=\sum_{i=m}^{2m-1}\frac{{l\choose i}{K-l\choose 2m-1-i}}{{K\choose 2m-1}}, l{0,1,,K}\forall l\in\{0,1,\dots,K\} and 1mK1\leq m\leq K.

To show the first condition is satisfied, let X(K1)Binomial(K1,p)X_{(K-1)}\sim\text{Binomial}(K-1,p) and Y(K1)Binomial(K1,eϵp)Y_{(K-1)}\sim\text{Binomial}(K-1,e^{{\epsilon}}p), and consider p[0,11+eϵ]p\in[0,\frac{1}{1+e^{{\epsilon}}}].

𝔼X(K1)[h(X+1)]\displaystyle\mathbb{E}_{X_{(K-1)}}[h(X+1)] =1(K2m1)i=m2m1𝔼X(K1)[(X+1i)(KX12m1i)]\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\mathbb{E}_{X_{(K-1)}}[{X+1\choose i}{K-X-1\choose 2m-1-i}] (149)
=1(K2m1)i=m2m1𝔼X(K1)[(Xi)(KX12m1i)+(Xi1)(KX12m1i)]\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\mathbb{E}_{X_{(K-1)}}[{X\choose i}{K-X-1\choose 2m-1-i}+{X\choose i-1}{K-X-1\choose 2m-1-i}] (150)
(Since (X+1i)=(Xi)+(Xi1)𝕀{i1}{X+1\choose i}={X\choose i}+{X\choose i-1}{\mathbb{I}}\{i\geq 1\})
=1(K2m1)i=m2m1(𝔼X(K1)[(Xi)(K1X2m1i)]+𝔼X(K1)[(Xi1)(K1X(2m2)(i1))])\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\Big{(}\mathbb{E}_{X_{(K-1)}}[{X\choose i}{K-1-X\choose 2m-1-i}]+\mathbb{E}_{X_{(K-1)}}[{X\choose i-1}{K-1-X\choose(2m-2)-(i-1)}]\Big{)} (151)
=1(K2m1)i=m2m1((K12m1)(2m1i)pi(1p)2m1i+(K12m2)(2m2i1)pi1(1p)2m1i)\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\Big{(}{K-1\choose 2m-1}{2m-1\choose i}p^{i}(1-p)^{2m-1-i}+{K-1\choose 2m-2}{2m-2\choose i-1}p^{i-1}(1-p)^{2m-1-i}\Big{)} (152)
(By Lemma B.9)
𝔼X(K1)[h(X)]\displaystyle\mathbb{E}_{X_{(K-1)}}[h(X)] =1(K2m1)i=m2m1𝔼X(K1)[(Xi)(KX2m1i)]\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\mathbb{E}_{X_{(K-1)}}[{X\choose i}{K-X\choose 2m-1-i}] (153)
(Since (KX2m1i)=(K1X2m1i)+(K1X2m2i){K-X\choose 2m-1-i}={K-1-X\choose 2m-1-i}+{K-1-X\choose 2m-2-i})
=1(K2m1)i=m2m1(𝔼X(K1)[(Xi)(K1X2m1i)]+𝔼X(K1)[(Xi)(K1X2m2i)]𝕀{i2m2})\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\Big{(}\mathbb{E}_{X_{(K-1)}}[{X\choose i}{K-1-X\choose 2m-1-i}]+\mathbb{E}_{X_{(K-1)}}[{X\choose i}{K-1-X\choose 2m-2-i}]{\mathbb{I}}\{i\leq 2m-2\}\Big{)} (154)
=1(K2m1)i=m2m1((K12m1)(2m1i)pi(1p)2m1i+(K12m2)(2m2i)pi(1p)2m2i𝕀{i2m2})\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\Big{(}{K-1\choose 2m-1}{2m-1\choose i}p^{i}(1-p)^{2m-1-i}+{K-1\choose 2m-2}{2m-2\choose i}p^{i}(1-p)^{2m-2-i}{\mathbb{I}}\{i\leq 2m-2\}\Big{)} (155)
(By Lemma B.9)

Hence, following Eq. 155 and Eq. 152,

𝔼X(K1)[h(X+1)h(X)]\displaystyle\mathbb{E}_{X_{(K-1)}}[h(X+1)-h(X)] (156)
=1(K2m1)(i=m2m1(K12m2)(2m2i1)pi1(1p)2m1ii=m2m2(K12m2)(2m2i)pi(1p)2m2i)\displaystyle=\frac{1}{{K\choose 2m-1}}\Big{(}\sum_{i=m}^{2m-1}{K-1\choose 2m-2}{2m-2\choose i-1}p^{i-1}(1-p)^{2m-1-i}-\sum_{i=m}^{2m-2}{K-1\choose 2m-2}{2m-2\choose i}p^{i}(1-p)^{2m-2-i}\Big{)} (157)
=1(K2m1)(i=m12m2(K12m2)(2m2i)pi(1p)2m2ii=m2m2(K12m2)(2m2i)pi(1p)2m2i)\displaystyle=\frac{1}{{K\choose 2m-1}}\Big{(}\sum_{i=m-1}^{2m-2}{K-1\choose 2m-2}{2m-2\choose i}p^{i}(1-p)^{2m-2-i}-\sum_{i=m}^{2m-2}{K-1\choose 2m-2}{2m-2\choose i}p^{i}(1-p)^{2m-2-i}\Big{)} (158)
=2m1K(2m2m1)pm1(1p)m1\displaystyle=\frac{2m-1}{K}{2m-2\choose m-1}p^{m-1}(1-p)^{m-1} (159)

Similarly,

𝔼Y(K1)[h(Y+1)h(Y)]=2m1K(2m2m1)(eϵp)m1(1eϵp)m1\displaystyle\mathbb{E}_{Y_{(K-1)}}[h(Y+1)-h(Y)]=\frac{2m-1}{K}{2m-2\choose m-1}(e^{{\epsilon}}p)^{m-1}(1-e^{{\epsilon}}p)^{m-1} (160)

Since p[0,11+eϵ]p\in[0,\frac{1}{1+e^{{\epsilon}}}], we have p(1p)eϵeϵp(1eϵp)p(1-p)\geq e^{-{\epsilon}}e^{{\epsilon}}p(1-e^{{\epsilon}}p). Hence,

e(m1)ϵ𝔼X(K1)[h(X+1)h(X)]\displaystyle e^{(m-1){\epsilon}}\mathbb{E}_{X_{(K-1)}}[h(X+1)-h(X)] =2m1K(2m2m1)e(m1)ϵpm1(1p)m1\displaystyle=\frac{2m-1}{K}{2m-2\choose m-1}e^{(m-1){\epsilon}}p^{m-1}(1-p)^{m-1} (161)
2m1K(2m2m1)e(m1)ϵ(eϵeϵp(1eϵp))m1\displaystyle\geq\frac{2m-1}{K}{2m-2\choose m-1}e^{(m-1){\epsilon}}(e^{-{\epsilon}}e^{{\epsilon}}p(1-e^{{\epsilon}}p))^{m-1} (162)
=2m1K(2m2m1)(eϵp)m1(1eϵp)m1\displaystyle=\frac{2m-1}{K}{2m-2\choose m-1}(e^{{\epsilon}}p)^{m-1}(1-e^{{\epsilon}}p)^{m-1} (163)
=𝔼Y(K1)[h(Y+1)h(Y)]\displaystyle=\mathbb{E}_{Y_{(K-1)}}[h(Y+1)-h(Y)] (164)

implying

emϵ𝔼X(K1)[h(X+1)h(X)]eϵ𝔼Y(K1)[h(Y+1)h(Y)]\displaystyle e^{m{\epsilon}}\mathbb{E}_{X_{(K-1)}}[h(X+1)-h(X)]\geq e^{{\epsilon}}\mathbb{E}_{Y_{(K-1)}}[h(Y+1)-h(Y)] (165)

and the first condition is satisfied.
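The closed form in Eq. 159 can also be checked numerically by exact summation over the Binomial(K1,p)(K-1,p) pmf. Below is a small verification sketch (helper names `h`, `lhs`, `rhs` are ours):

```python
from math import comb

def h(l, K, m):
    # h(l) = sum_{i=m}^{2m-1} C(l, i) C(K-l, 2m-1-i) / C(K, 2m-1)
    return sum(comb(l, i) * comb(K - l, 2 * m - 1 - i)
               for i in range(m, 2 * m)) / comb(K, 2 * m - 1)

def lhs(K, m, p):
    # E_{X ~ Binomial(K-1, p)}[ h(X+1) - h(X) ], by exact summation over the pmf
    return sum((h(x + 1, K, m) - h(x, K, m))
               * comb(K - 1, x) * p**x * (1 - p)**(K - 1 - x)
               for x in range(K))

def rhs(K, m, p):
    # ((2m-1)/K) * C(2m-2, m-1) * p^(m-1) * (1-p)^(m-1), as in Eq. 159
    return (2 * m - 1) / K * comb(2 * m - 2, m - 1) * p**(m - 1) * (1 - p)**(m - 1)

for K, m in ((5, 1), (7, 2), (11, 3)):
    for p in (0.1, 0.25, 0.4):
        assert abs(lhs(K, m, p) - rhs(K, m, p)) < 1e-12
```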

To show the second condition is satisfied, let X^(K1)Binom(K1,1eϵ(1p))\hat{X}_{(K-1)}\sim\text{Binom}(K-1,1-e^{{\epsilon}}(1-p)) and Y^(K1)Binom(K1,p)\hat{Y}_{(K-1)}\sim\text{Binom}(K-1,p), and consider p[11+eϵ,1]p\in[\frac{1}{1+e^{-{\epsilon}}},1].

𝔼X^(K1)[h(X^+1)]\displaystyle\mathbb{E}_{\hat{X}_{(K-1)}}[h(\hat{X}+1)] =1(K2m1)i=m2m1(𝔼X^(K1)[(X^i)(K1X^2m1i)]+𝔼X^(K1)[(X^i1)(K1X^(2m2)(i1))])\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\Big{(}\mathbb{E}_{\hat{X}_{(K-1)}}[{\hat{X}\choose i}{K-1-\hat{X}\choose 2m-1-i}]+\mathbb{E}_{\hat{X}_{(K-1)}}[{\hat{X}\choose i-1}{K-1-\hat{X}\choose(2m-2)-(i-1)}]\Big{)} (166)
=1(K2m1)i=m2m1((K12m1)(2m1i)(1eϵ(1p))i(eϵ(1p))2m1i\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\Big{(}{K-1\choose 2m-1}{2m-1\choose i}(1-e^{{\epsilon}}(1-p))^{i}(e^{{\epsilon}}(1-p))^{2m-1-i} (167)
+(K12m2)(2m2i1)(1eϵ(1p))i1(eϵ(1p))2m1i)\displaystyle+{K-1\choose 2m-2}{2m-2\choose i-1}(1-e^{{\epsilon}}(1-p))^{i-1}(e^{{\epsilon}}(1-p))^{2m-1-i}\Big{)}
By Lemma B.9

and

𝔼X^(K1)[h(X^)]\displaystyle\mathbb{E}_{\hat{X}_{(K-1)}}[h(\hat{X})] =1(K2m1)i=m2m1(𝔼X^(K1)[(X^i)(K1X^2m1i)]+𝔼X^(K1)[(X^i)(K1X^2m2i)]𝕀{i2m2})\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\Big{(}\mathbb{E}_{\hat{X}_{(K-1)}}[{\hat{X}\choose i}{K-1-\hat{X}\choose 2m-1-i}]+\mathbb{E}_{\hat{X}_{(K-1)}}[{\hat{X}\choose i}{K-1-\hat{X}\choose 2m-2-i}]{\mathbb{I}}\{i\leq 2m-2\}\Big{)} (168)
=1(K2m1)i=m2m1((K12m1)(2m1i)(1eϵ(1p))i(eϵ(1p))2m1i\displaystyle=\frac{1}{{K\choose 2m-1}}\sum_{i=m}^{2m-1}\Big{(}{K-1\choose 2m-1}{2m-1\choose i}(1-e^{{\epsilon}}(1-p))^{i}(e^{{\epsilon}}(1-p))^{2m-1-i} (169)
+(K12m2)(2m2i)(1eϵ(1p))i(eϵ(1p))2m2i𝕀{i2m2})\displaystyle+{K-1\choose 2m-2}{2m-2\choose i}(1-e^{{\epsilon}}(1-p))^{i}(e^{{\epsilon}}(1-p))^{2m-2-i}{\mathbb{I}}\{i\leq 2m-2\}\Big{)}
By Lemma B.9

Hence, following Eq. 167 and Eq. 169,

𝔼X^(K1)[h(X^+1)h(X^)]\displaystyle\mathbb{E}_{\hat{X}_{(K-1)}}[h(\hat{X}+1)-h(\hat{X})] (170)
=1(K2m1)(i=m2m1(K12m2)(2m2i1)(1eϵ(1p))i1(eϵ(1p))2m1i\displaystyle=\frac{1}{{K\choose 2m-1}}\Big{(}\sum_{i=m}^{2m-1}{K-1\choose 2m-2}{2m-2\choose i-1}(1-e^{{\epsilon}}(1-p))^{i-1}(e^{{\epsilon}}(1-p))^{2m-1-i} (171)
i=m2m2(K12m2)(2m2i)(1eϵ(1p))i(eϵ(1p))2m2i)\displaystyle-\sum_{i=m}^{2m-2}{K-1\choose 2m-2}{2m-2\choose i}(1-e^{{\epsilon}}(1-p))^{i}(e^{{\epsilon}}(1-p))^{2m-2-i}\Big{)}
=1(K2m1)(i=m12m2(K12m2)(2m2i)(1eϵ(1p))i(eϵ(1p))2m2i\displaystyle=\frac{1}{{K\choose 2m-1}}\Big{(}\sum_{i=m-1}^{2m-2}{K-1\choose 2m-2}{2m-2\choose i}(1-e^{{\epsilon}}(1-p))^{i}(e^{{\epsilon}}(1-p))^{2m-2-i} (172)
i=m2m2(K12m2)(2m2i)(1eϵ(1p))i(eϵ(1p))2m2i)\displaystyle-\sum_{i=m}^{2m-2}{K-1\choose 2m-2}{2m-2\choose i}(1-e^{{\epsilon}}(1-p))^{i}(e^{{\epsilon}}(1-p))^{2m-2-i}\Big{)}
=2m1K(2m2m1)(1eϵ(1p))m1(eϵ(1p))m1\displaystyle=\frac{2m-1}{K}{2m-2\choose m-1}(1-e^{{\epsilon}}(1-p))^{m-1}(e^{{\epsilon}}(1-p))^{m-1} (173)

Similarly,

𝔼Y^(K1)[h(Y^+1)h(Y^)]=2m1K(2m2m1)pm1(1p)m1\displaystyle\mathbb{E}_{\hat{Y}_{(K-1)}}[h(\hat{Y}+1)-h(\hat{Y})]=\frac{2m-1}{K}{2m-2\choose m-1}p^{m-1}(1-p)^{m-1} (174)

Hence,

e(m+1)ϵ𝔼X^(K1)[h(X^+1)h(X^)]\displaystyle e^{(m+1){\epsilon}}\mathbb{E}_{\hat{X}_{(K-1)}}[h(\hat{X}+1)-h(\hat{X})] =e(m+1)ϵ2m1K(2m2m1)(1eϵ(1p))m1(eϵ(1p))m1\displaystyle=e^{(m+1){\epsilon}}\frac{2m-1}{K}{2m-2\choose m-1}(1-e^{{\epsilon}}(1-p))^{m-1}(e^{{\epsilon}}(1-p))^{m-1} (175)
2m1K(2m2m1)(1eϵ(1p))m1e(m1)ϵ(1p)m1\displaystyle\geq\frac{2m-1}{K}{2m-2\choose m-1}(1-e^{{\epsilon}}(1-p))^{m-1}e^{(m-1){\epsilon}}(1-p)^{m-1} (176)
=2m1K(2m2m1)(eϵe2ϵ(1p))m1(1p)m1\displaystyle=\frac{2m-1}{K}{2m-2\choose m-1}(e^{{\epsilon}}-e^{2{\epsilon}}(1-p))^{m-1}(1-p)^{m-1} (177)

Note that

eϵe2ϵ(1p)=eϵe2ϵ+e2ϵpp\displaystyle e^{{\epsilon}}-e^{2{\epsilon}}(1-p)=e^{{\epsilon}}-e^{2{\epsilon}}+e^{2{\epsilon}}p\geq p (178)
(eϵ+1)(eϵ1)peϵ(eϵ1)\displaystyle\iff(e^{{\epsilon}}+1)(e^{{\epsilon}}-1)p\geq e^{{\epsilon}}(e^{{\epsilon}}-1) (179)
peϵeϵ+1=11+eϵ\displaystyle\iff p\geq\frac{e^{{\epsilon}}}{e^{{\epsilon}}+1}=\frac{1}{1+e^{-{\epsilon}}} (180)

and this indeed holds, since we consider p[11+eϵ,1]p\in[\frac{1}{1+e^{-{\epsilon}}},1].

Therefore, following Eq. 177,

e(m+1)ϵ𝔼X^(K1)[h(X^+1)h(X^)]\displaystyle e^{(m+1){\epsilon}}\mathbb{E}_{\hat{X}_{(K-1)}}[h(\hat{X}+1)-h(\hat{X})] 2m1K(2m2m1)pm1(1p)m1\displaystyle\geq\frac{2m-1}{K}{2m-2\choose m-1}p^{m-1}(1-p)^{m-1} (181)
=𝔼Y^(K1)[h(Y^+1)h(Y^)]\displaystyle=\mathbb{E}_{\hat{Y}_{(K-1)}}[h(\hat{Y}+1)-h(\hat{Y})] (182)

implying the second condition is satisfied.

Therefore, by Lemma B.7, DaRRMγDSub\textsf{DaRRM}_{\gamma_{DSub}} is mϵm{\epsilon}-differentially private.
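As a concrete sketch, the noise function γDSub\gamma_{DSub} from Lemma B.10 can be computed directly from its definition in Eq. 148. The snippet below (with our own helper names, and assuming odd KK as in the lemma) also checks that γDSub\gamma_{DSub} is a valid noise function, i.e., takes values in [0,1][0,1] and is symmetric around K/2K/2:

```python
from math import comb

def h(l, K, m):
    # h(l) = sum_{i=m}^{2m-1} C(l, i) C(K-l, 2m-1-i) / C(K, 2m-1)
    return sum(comb(l, i) * comb(K - l, 2 * m - 1 - i)
               for i in range(m, 2 * m)) / comb(K, 2 * m - 1)

def gamma_dsub(l, K, m):
    # gamma_DSub(l) = 1 - 2h(l) for l <= (K-1)/2, and 2h(l) - 1 for l >= (K+1)/2
    return 1 - 2 * h(l, K, m) if 2 * l <= K - 1 else 2 * h(l, K, m) - 1

K, m = 11, 3   # requires m <= (K-1)/2
gammas = [gamma_dsub(l, K, m) for l in range(K + 1)]
assert all(0.0 <= g <= 1.0 for g in gammas)                               # valid probabilities
assert all(abs(gammas[l] - gammas[K - l]) < 1e-12 for l in range(K + 1))  # symmetric around K/2
```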

B.3 Comparing the Utility of Subsampling Approaches

Intuitively, if we subsample 2m12m-1 mechanisms, the utility is higher than that of the naïve subsampling approach which outputs the majority based on only mm mechanisms. To complete the story, we formally compare the utility of outputting the majority of 2m12m-1 subsampled mechanisms (Theorem 4.1) and outputting the majority of mm subsampled mechanisms (simple composition, Theorem 2.2) in the i.i.d. mechanisms and pure differential privacy setting, fixing the output privacy loss to be mϵm{\epsilon}.

Lemma B.11.

Consider Problem 1.1 with i.i.d. mechanisms {Mi}i=1K\{M_{i}\}_{i=1}^{K}, i.e., p=pi=Pr[Mi(𝒟)=1],p=pi=Pr[Mi(𝒟)=1],i[K]p=p_{i}=\Pr[M_{i}({\mathcal{D}})=1],p^{\prime}=p_{i}^{\prime}=\Pr[M_{i}({\mathcal{D}}^{\prime})=1],\forall i\in[K]. Let γ1:{0,1,,K}[0,1],γ2:{0,1,,K}[0,1]\gamma_{1}:\{0,1,\dots,K\}\rightarrow[0,1],\gamma_{2}:\{0,1,\dots,K\}\rightarrow[0,1] be two functions that are both symmetric around K2\frac{K}{2}. If 1γ1(l)γ2(l)0,l{0,,K}1\geq\gamma_{1}(l)\geq\gamma_{2}(l)\geq 0,\forall l\in\{0,\dots,K\}, then (DaRRMγ1)(DaRRMγ2){\mathcal{E}}(\textsf{DaRRM}_{\gamma_{1}})\leq{\mathcal{E}}(\textsf{DaRRM}_{\gamma_{2}}).

Proof.

Recall 𝒮={S1,,SK}{\mathcal{S}}=\{S_{1},\dots,S_{K}\}, where SiMi(𝒟)S_{i}\sim M_{i}({\mathcal{D}}), is the set of observed outcomes from the mechanisms {Mi}i=1K\{M_{i}\}_{i=1}^{K}. By Definition 2.4, for any γ\gamma that is symmetric around K2\frac{K}{2}, the error of DaRRMγ\textsf{DaRRM}_{\gamma} is

(DaRRMγ)\displaystyle{\mathcal{E}}(\textsf{DaRRM}_{\gamma}) =|Pr[DaRRMγ(𝒟)=1]Pr[g(𝒮)=1]|\displaystyle=\Big{|}\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]-\Pr[g({\mathcal{S}})=1]\Big{|} (183)
=|l=K+12K(γ(l)+12(1γ(l)))αl+l=0K1212(1γ(l))αll=K+12Kαl|\displaystyle=\Big{|}\sum_{l=\frac{K+1}{2}}^{K}\Big{(}\gamma(l)+\frac{1}{2}(1-\gamma(l))\Big{)}\cdot\alpha_{l}+\sum_{l=0}^{\frac{K-1}{2}}\frac{1}{2}(1-\gamma(l))\cdot\alpha_{l}-\sum_{l=\frac{K+1}{2}}^{K}\alpha_{l}\Big{|} (184)
=|l=K+12K(12γ(l)12)αl+l=0K12(1212γ(l))αl|\displaystyle=\Big{|}\sum_{l=\frac{K+1}{2}}^{K}\Big{(}\frac{1}{2}\gamma(l)-\frac{1}{2}\Big{)}\cdot\alpha_{l}+\sum_{l=0}^{\frac{K-1}{2}}\Big{(}\frac{1}{2}-\frac{1}{2}\gamma(l)\Big{)}\cdot\alpha_{l}\Big{|} (185)
=|12l=K+12K(1γ(l))(αlαKl)|\displaystyle=\Big{|}\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}(1-\gamma(l))\cdot(\alpha_{l}-\alpha_{K-l})\Big{|} (186)

where αl=(Kl)pl(1p)Kl\alpha_{l}={K\choose l}p^{l}(1-p)^{K-l}, l{0,1,,K}\forall l\in\{0,1,\dots,K\} and recall p=Pr[Mi(𝒟)=1]p=\Pr[M_{i}({\mathcal{D}})=1], i[K]\forall i\in[K].

For any lK+12l\geq\frac{K+1}{2},

  1.

    If p=0p=0 or p=1p=1, αl=αKl\alpha_{l}=\alpha_{K-l}.

  2.

    Otherwise, for p(0,1)p\in(0,1),

    (a)

      If p12p\geq\frac{1}{2},

      αlαKl=pl(1p)KlpKl(1p)l=p2lK(1p)K2l=(p1p1)2lK01,αlαKl\displaystyle\frac{\alpha_{l}}{\alpha_{K-l}}=\frac{p^{l}(1-p)^{K-l}}{p^{K-l}(1-p)^{l}}=p^{2l-K}(1-p)^{K-2l}=(\underbrace{\frac{p}{1-p}}_{\geq 1})^{\underbrace{2l-K}_{\geq 0}}\geq 1,\quad\Rightarrow\alpha_{l}\geq\alpha_{K-l} (187)
    (b)

      If p<12p<\frac{1}{2},

      αlαKl=(p1p1)2lK01,αlαKl\displaystyle\frac{\alpha_{l}}{\alpha_{K-l}}=(\underbrace{\frac{p}{1-p}}_{\leq 1})^{\underbrace{2l-K}_{\geq 0}}\leq 1,\quad\Rightarrow\alpha_{l}\leq\alpha_{K-l} (188)

Hence, if p12p\geq\frac{1}{2}, then αlαKl,lK+12\alpha_{l}\geq\alpha_{K-l},\forall l\geq\frac{K+1}{2}. Since γ1(l)γ2(l),l{0,,K}\gamma_{1}(l)\geq\gamma_{2}(l),\forall l\in\{0,\dots,K\}, 1γ1(l)1γ2(l)1-\gamma_{1}(l)\leq 1-\gamma_{2}(l), and so

(DaRRMγ1)=l=K+12K12(1γ1(l))(αlαKl)l=K+12K12(1γ2(l))(αlαKl)=(DaRRMγ2)\displaystyle{\mathcal{E}}(\textsf{DaRRM}_{\gamma_{1}})=\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}(1-\gamma_{1}(l))\cdot(\alpha_{l}-\alpha_{K-l})\leq\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}(1-\gamma_{2}(l))\cdot(\alpha_{l}-\alpha_{K-l})={\mathcal{E}}(\textsf{DaRRM}_{\gamma_{2}}) (189)

Similarly, if p<12p<\frac{1}{2}, then αlαKl,lK+12\alpha_{l}\leq\alpha_{K-l},\forall l\geq\frac{K+1}{2} and

(DaRRMγ1)=l=K+12K12(1γ1(l))(αKlαl)l=K+12K12(1γ2(l))(αKlαl)=(DaRRMγ2)\displaystyle{\mathcal{E}}(\textsf{DaRRM}_{\gamma_{1}})=\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}(1-\gamma_{1}(l))\cdot(\alpha_{K-l}-\alpha_{l})\leq\sum_{l=\frac{K+1}{2}}^{K}\frac{1}{2}(1-\gamma_{2}(l))\cdot(\alpha_{K-l}-\alpha_{l})={\mathcal{E}}(\textsf{DaRRM}_{\gamma_{2}}) (190)

Therefore,

(DaRRMγ1)(DaRRMγ2)\displaystyle{\mathcal{E}}(\textsf{DaRRM}_{\gamma_{1}})\leq{\mathcal{E}}(\textsf{DaRRM}_{\gamma_{2}}) (191)

Since γDSub(l)γSub(l)\gamma_{DSub}(l)\geq\gamma_{Sub}(l), l{0,1,,K}\forall l\in\{0,1,\dots,K\}, by Lemma B.11, (DaRRMγDSub)(DaRRMγSub){\mathcal{E}}(\textsf{DaRRM}_{\gamma_{DSub}})\leq{\mathcal{E}}(\textsf{DaRRM}_{\gamma_{Sub}}); that is, outputting the majority of 2m12m-1 subsampled mechanisms yields higher utility than outputting the majority of mm subsampled mechanisms.
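Lemma B.11 can be illustrated numerically using the error expression in Eq. 186 for i.i.d. mechanisms. The two constant noise functions below are hypothetical examples satisfying γ1(l)γ2(l)\gamma_{1}(l)\geq\gamma_{2}(l) pointwise; both are trivially symmetric around K/2K/2:

```python
from math import comb

def error(gamma, K, p):
    # Eq. 186: E(DaRRM_gamma) = | (1/2) sum_{l=(K+1)/2}^{K} (1 - gamma[l]) (alpha_l - alpha_{K-l}) |
    alpha = [comb(K, l) * p**l * (1 - p)**(K - l) for l in range(K + 1)]
    return abs(0.5 * sum((1 - gamma[l]) * (alpha[l] - alpha[K - l])
                         for l in range((K + 1) // 2, K + 1)))

K = 11
gamma1 = [0.9] * (K + 1)   # larger gamma: less noise injected
gamma2 = [0.4] * (K + 1)   # smaller gamma: more noise, gamma1 >= gamma2 pointwise
for p in (0.1, 0.3, 0.6, 0.9):
    assert error(gamma1, K, p) <= error(gamma2, K, p) + 1e-15
```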

Appendix C Details of Section 5: Optimizing the Noise Function γ\gamma in DaRRM

C.1 Deriving the Optimization Objective

For any γ\gamma function that is symmetric around K2\frac{K}{2}, we can write the optimization objective as

𝔼p1,p2,,pK𝒯[(DaRRMγ)]\displaystyle\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}[{\mathcal{E}}(\textsf{DaRRM}_{\gamma})] (192)
=𝔼p1,p2,,pK𝒯[|Pr[DaRRMγ(𝒟)=1]Pr[g(𝒮)=1]|]\displaystyle=\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}[|\Pr[\textsf{DaRRM}_{\gamma}({\mathcal{D}})=1]-\Pr[g({\mathcal{S}})=1]|] (193)
=𝔼p1,p2,,pK𝒯[|l=K+12K(αl(γ(l)+12(1γ(l)))αl)+l=0K12αl12(1γ(l))|]\displaystyle=\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[\Big{|}\sum_{l=\frac{K+1}{2}}^{K}\Big{(}\alpha_{l}\cdot(\gamma(l)+\frac{1}{2}(1-\gamma(l)))-\alpha_{l}\Big{)}+\sum_{l=0}^{\frac{K-1}{2}}\alpha_{l}\cdot\frac{1}{2}(1-\gamma(l))\Big{|}\right] (194)
=𝔼p1,p2,,pK𝒯[|l=0K12αl(12γ(l)12)+l=K+12Kαl(1212γ(l))|]\displaystyle=\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[\Big{|}\sum_{l=0}^{\frac{K-1}{2}}\alpha_{l}(\frac{1}{2}\gamma(l)-\frac{1}{2})+\sum_{l=\frac{K+1}{2}}^{K}\alpha_{l}(\frac{1}{2}-\frac{1}{2}\gamma(l))\Big{|}\right] (195)
The above follows by conditioning on =l{0,1,,K}{\mathcal{L}}=l\in\{0,1,\dots,K\}, i.e. the sum of observed outcomes in 𝒮{\mathcal{S}}
=𝔼p1,p2,,pK𝒯[|12l=K+12K(αlαKl)(1γ(l))|]\displaystyle=\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[\Big{|}\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}\left(\alpha_{l}-\alpha_{K-l}\right)(1-\gamma(l))\Big{|}\right] (196)
The above follows by symmetry of γ\gamma

Furthermore, notice the objective is symmetric around 0, and can be written as

𝔼p1,p2,,pK𝒯[12l=K+12K(αlαKl)(1γ(l))]\displaystyle\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}\left(\alpha_{l}-\alpha_{K-l}\right)(1-\gamma(l))\right] (197)
=12𝔼p1,p2,,pK𝒯[l=K+12K((αlαKl)(αlαKl)γ(l))]\displaystyle=\frac{1}{2}\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[\sum_{l=\frac{K+1}{2}}^{K}\Big{(}(\alpha_{l}-\alpha_{K-l})-(\alpha_{l}-\alpha_{K-l})\gamma(l)\Big{)}\right] (198)
=12𝔼p1,p2,,pK𝒯[l=K+12K(αlαKl)]:=A12𝔼p1,p2,,pK𝒯[l=K+12K(αlαKl)γ(l)]:=B\displaystyle=\underbrace{\frac{1}{2}\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-\alpha_{K-l})\right]}_{:=A}\underbrace{-\frac{1}{2}\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-\alpha_{K-l})\gamma(l)\right]}_{:=B} (199)

Since expression AA in Eq. 199 does not involve γ\gamma, we only need to optimize expression BB in Eq. 199. That is,

12𝔼p1,p2,,pK𝒯[l=K+12K(αlαKl)γ(l)]\displaystyle-\frac{1}{2}\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-\alpha_{K-l})\gamma(l)\right] (200)
=12l=K+12K𝔼p1,p2,,pK𝒯[(αlαKl)]γ(l)\displaystyle=-\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}\mathbb{E}_{p_{1},p_{2},\dots,p_{K}\sim{\mathcal{T}}}\left[(\alpha_{l}-\alpha_{K-l})\right]\cdot\gamma(l) (201)

Eq. 201 is the optimization objective we use in the experiments. We see the optimization objective is linear in γ\gamma.

Note in the general setting, (𝒟)PoissonBinomial(p1,p2,,pK){\mathcal{L}}({\mathcal{D}})\sim\text{PoissonBinomial}(p_{1},p_{2},\dots,p_{K}), where recall (𝒟){\mathcal{L}}({\mathcal{D}}) is the sum of observed outcomes on dataset 𝒟{\mathcal{D}}, and hence, αl=Pr[(𝒟)=l]\alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l] is the pmf of the Poisson Binomial distribution at l{0,1,,K}l\in\{0,1,\dots,K\}.
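For a single draw of (p1,,pK)(p_{1},\dots,p_{K}), the coefficient of γ(l)\gamma(l) in Eq. 201 can be computed from the Poisson Binomial pmf, obtained by iterative convolution. A minimal sketch (helper names are ours):

```python
def poisson_binomial_pmf(probs):
    # pmf of L = sum_i Bernoulli(p_i), computed by iterative convolution
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for l, q in enumerate(pmf):
            new[l] += q * (1 - p)     # this mechanism outputs 0
            new[l + 1] += q * p       # this mechanism outputs 1
        pmf = new
    return pmf

def objective_coefficients(probs):
    # Coefficient of gamma(l) in Eq. 201 for one draw of (p_1, ..., p_K):
    # -(1/2) * (alpha_l - alpha_{K-l}), for l in {(K+1)/2, ..., K}
    K = len(probs)
    alpha = poisson_binomial_pmf(probs)
    return {l: -0.5 * (alpha[l] - alpha[K - l]) for l in range((K + 1) // 2, K + 1)}

# With p_i = 1/2 the Poisson Binomial is symmetric, so every coefficient vanishes.
assert all(abs(c) < 1e-12 for c in objective_coefficients([0.5] * 5).values())
```

The objective for this draw is then the linear form l(K+1)/2coeffs[l]γ(l)\sum_{l\geq(K+1)/2}\text{coeffs}[l]\cdot\gamma(l).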

C.2 Practical Approximation of the Objective

Since the optimization objective in Eq. 200 requires taking an expectation over p1,,pKp_{1},\dots,p_{K}, which involves integrating over KK variables and can be slow in practice, we propose the following approximations to compute the objective efficiently. The first approach (Section C.2.1) is simple: sample the pip_{i}'s from [0,1][0,1] and take the empirical average of the objective value over all sampled sets of p1,,pKp_{1},\dots,p_{K} as an approximation of the expectation. However, we found this approach to be numerically unstable. We therefore propose a second approach (Section C.2.2), which approximates the integration over the pip_{i}'s using the rectangular rule instead of directly approximating the objective value. We use the second approach in our experiments and empirically demonstrate its effectiveness. Note that approximating the optimization objective does not affect the privacy guarantee.

C.2.1 Approximation via Direct Sampling of pip_{i}’s

One straightforward way of efficiently computing an approximation to the optimization objective is as follows:

Algorithm 4 Straightforward Approximation of the Optimization Objective
1:  Input: # mechanisms KK\in{\mathbb{N}}, # iterations TT\in{\mathbb{N}}, noise function γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1]
2:  for t=1,2,,Tt=1,2,\dots,T do
3:     Sample p^1,p^2,,p^K𝒯\hat{p}_{1},\hat{p}_{2},\dots,\hat{p}_{K}\sim{\mathcal{T}}
4:     ^PoissonBinomial(p^1,,p^K)\widehat{{\mathcal{L}}}\leftarrow\text{PoissonBinomial}(\hat{p}_{1},\dots,\hat{p}_{K})
5:     α^lPr[^=l],l{0,,K}\hat{\alpha}_{l}\leftarrow\Pr[\widehat{{\mathcal{L}}}=l],\forall l\in\{0,\dots,K\}
6:     gt12l=K+12K(α^lα^Kl)γ(l)g_{t}\leftarrow-\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}(\hat{\alpha}_{l}-\hat{\alpha}_{K-l})\cdot\gamma(l)
7:  end for
8:  Return 1Tt=1Tgt\frac{1}{T}\sum_{t=1}^{T}g_{t}
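A direct Python translation of Algorithm 4 might look as follows; here the distribution 𝒯{\mathcal{T}} is taken to be uniform on [0,1][0,1] purely as a placeholder assumption, and the function names are ours:

```python
import random

def poisson_binomial_pmf(probs):
    # pmf of L = sum_i Bernoulli(p_i), computed by iterative convolution
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for l, q in enumerate(pmf):
            new[l] += q * (1 - p)
            new[l + 1] += q * p
        pmf = new
    return pmf

def approx_objective_direct(K, T, gamma, seed=0):
    # Algorithm 4: average the value of Eq. 201 over T Monte Carlo draws
    rng = random.Random(seed)
    total = 0.0
    for _ in range(T):
        probs = [rng.random() for _ in range(K)]   # stand-in for sampling from T
        alpha = poisson_binomial_pmf(probs)
        total += -0.5 * sum((alpha[l] - alpha[K - l]) * gamma[l]
                            for l in range((K + 1) // 2, K + 1))
    return total / T

K = 5
val = approx_objective_direct(K, T=500, gamma=[0.5] * (K + 1))
assert -0.5 <= val <= 0.5   # each per-draw term is bounded by 1/2 in magnitude
```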

However, we found this approximation to be numerically unstable in the experiments, even for T=10000T=10000, and so we adopt the second approximation as follows.

C.2.2 Approximating the Integration Over pip_{i}’s

Consider the following surrogate objective:

12l=K+12K0.510.510.51(αlαKl)𝑑p1𝑑p2𝑑pKγ(l)\displaystyle-\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}\int_{0.5}^{1}\int_{0.5}^{1}\dots\int_{0.5}^{1}(\alpha_{l}-\alpha_{K-l})dp_{1}dp_{2}\dots dp_{K}\cdot\gamma(l) (202)

where we approximate the integration instead of directly approximating the objective value. The approximation of the integration is based on the rectangular rule and that the Poisson Binomial distribution is invariant to the order of its probability parameters.

First, we discretize the integration over the pip_{i}'s: pick τ=50\tau=50 points representing probabilities in [0.5,1)[0.5,1), with equal spacing θ=0.5τ\theta=\frac{0.5}{\tau}. Denote this set of points as 𝒲{\mathcal{W}}. We pick only τ=50\tau=50 samples to ensure the spacing θ\theta between samples is not too small; otherwise, this can cause numerical instability. For each l{K+12,K+12+1,,K}l\in\{\frac{K+1}{2},\frac{K+1}{2}+1,\dots,K\}, we compute an approximate coefficient for γ(l)\gamma(l) as follows:

0.510.510.51(αlαKl)𝑑p1𝑑p2𝑑pKθKp1𝒲p2𝒲pK𝒲(αlαKl)\displaystyle\int_{0.5}^{1}\int_{0.5}^{1}\dots\int_{0.5}^{1}(\alpha_{l}-\alpha_{K-l})dp_{1}dp_{2}\dots dp_{K}\approx\theta^{K}\sum_{p_{1}\in{\mathcal{W}}}\sum_{p_{2}\in{\mathcal{W}}}\dots\sum_{p_{K}\in{\mathcal{W}}}(\alpha_{l}-\alpha_{K-l}) (203)

which approximates integration over a KK-dimensional grid 𝒲K{\mathcal{W}}^{K}.

The idea is then to sample points from this KK-dimensional grid 𝒲K{\mathcal{W}}^{K} and compute an empirical mean of the integrand over the sampled probabilities p1,,pKp_{1},\dots,p_{K} as the approximation of the integration in the objective.

Let (s1,s2,,sK)(s_{1},s_{2},\dots,s_{K}) be probability values randomly sampled from 𝒲K{\mathcal{W}}^{K}, based on which we compute (αlαKl)(\alpha_{l}-\alpha_{K-l}) for all ll with (p1,,pK)=(s1,,sK)(p_{1},\dots,p_{K})=(s_{1},\dots,s_{K}). To apply the rectangular rule, since the grid of probabilities is KK-dimensional, the weight of (αlαKl)(\alpha_{l}-\alpha_{K-l}) in the approximate integration is θK\theta^{K}. Furthermore, observe that αl\alpha_{l} is the pmf at ll of a Poisson Binomial distribution in our case, and PoissonBinomial(p1,,pK)dist.PoissonBinomial(π(p1,,pK))\text{PoissonBinomial}(p_{1},\dots,p_{K})\stackrel{{\scriptstyle dist.}}{{\sim}}\text{PoissonBinomial}(\pi(p_{1},\dots,p_{K})), where π\pi denotes a permutation of p1,,pKp_{1},\dots,p_{K} and dist.\stackrel{{\scriptstyle dist.}}{{\sim}} denotes “the same distribution”. Hence, with a single probability sample (s1,,sK)(s_{1},\dots,s_{K}), we can indeed compute αlαKl\alpha_{l}-\alpha_{K-l} for each ll at K!K! points from the grid 𝒲K{\mathcal{W}}^{K}, since they all have the same value. Therefore, we set the weight of αlαKl\alpha_{l}-\alpha_{K-l} in the approximate integration to w=θKK!w=\theta^{K}\cdot K!. Furthermore, since the order of (p1,,pK)(p_{1},\dots,p_{K}) does not affect the objective value, there is a total of (τ\tau choose KK with replacement) =(τ+K1K):=P={\tau+K-1\choose K}:=P different points in the grid 𝒲K{\mathcal{W}}^{K}.

In summary, the integration based approximation of the objective proceeds as follows:

Algorithm 5 Integration Based Approximation of the Optimization Objective
1:  Input: # mechanisms KK\in{\mathbb{N}}, # iterations T=10000T=10000\in{\mathbb{N}}, noise function γ:{0,1,,K}[0,1]\gamma:\{0,1,\dots,K\}\rightarrow[0,1], τ=50\tau=50: # samples between [0.5,1)[0.5,1) to form the set 𝒲{\mathcal{W}}
2:  θ0.5/τ\theta\leftarrow 0.5/\tau distance between samples
3:  wθKK!w\leftarrow\theta^{K}\cdot K!
4:  P(τ+K1K)P\leftarrow{\tau+K-1\choose K}
5:  for t=1,2,,Tt=1,2,\dots,T do
6:     Sample probabilities (s1,s2,,sK)𝒲K(s_{1},s_{2},\dots,s_{K})\sim{\mathcal{W}}^{K}
7:     ^PoissonBinomial(s1,s2,,sK)\widehat{{\mathcal{L}}}\sim\text{PoissonBinomial}(s_{1},s_{2},\dots,s_{K})
8:     α^lPr[^=l],l{0,1,,K}\hat{\alpha}_{l}\leftarrow\Pr[\widehat{{\mathcal{L}}}=l],\forall l\in\{0,1,\dots,K\}
9:     gt12l=K+12Kw(α^lα^Kl)γ(l)g_{t}\leftarrow-\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}w\cdot(\hat{\alpha}_{l}-\hat{\alpha}_{K-l})\cdot\gamma(l)
10:  end for
11:  Return PTt=1Tgt\frac{P}{T}\sum_{t=1}^{T}g_{t}
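A Python sketch of Algorithm 5 follows, with the final scaling taken as (P/T)t=1Tgt(P/T)\sum_{t=1}^{T}g_{t} and our own helper names; it is a sketch under these assumptions, not the exact experimental implementation:

```python
import random
from math import comb, factorial

def poisson_binomial_pmf(probs):
    # pmf of L = sum_i Bernoulli(p_i), computed by iterative convolution
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for l, q in enumerate(pmf):
            new[l] += q * (1 - p)
            new[l + 1] += q * p
        pmf = new
    return pmf

def approx_objective_integration(K, gamma, T=10000, tau=50, seed=0):
    rng = random.Random(seed)
    theta = 0.5 / tau
    W = [0.5 + j * theta for j in range(tau)]      # grid points covering [0.5, 1)
    w = theta**K * factorial(K)                    # rectangular-rule weight per sample
    P = comb(tau + K - 1, K)                       # # distinct unordered grid points
    total = 0.0
    for _ in range(T):
        probs = [rng.choice(W) for _ in range(K)]  # sample (s_1, ..., s_K) from W^K
        alpha = poisson_binomial_pmf(probs)
        total += -0.5 * sum(w * (alpha[l] - alpha[K - l]) * gamma[l]
                            for l in range((K + 1) // 2, K + 1))
    return (P / T) * total
```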

C.3 Reducing # Constraints from \infty to a Polynomial Set

Lemma C.1 (Restatement of Lemma 5.1).

Consider using DaRRM (Algorithm 1) to solve Problem 1.1 and let ff be the privacy cost objective as defined in Lemma 3.4. Given an arbitrary noise function γ\gamma, let the worst case probabilities be

(p1,,pK,p1,,pK)=argmax{(pi,pi)}i=1Kf(p1,,pK,p1,,pK;γ)(p_{1}^{*},\dots,p_{K}^{*},p_{1}^{\prime*},\dots,p_{K}^{\prime*})=\operatorname*{arg\,max}_{\{(p_{i},p_{i}^{\prime})\}_{i=1}^{K}}f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma)

Then, each pair (pi,pi),i[K](p_{i}^{*},p_{i}^{\prime*}),\forall i\in[K] satisfies

(pi,pi){\displaystyle(p_{i}^{*},p_{i}^{\prime*})\in\{ (0,0),(1,1),(0,Δ),(Δ,0),(1Δ,1),\displaystyle(0,0),(1,1),(0,\Delta),(\Delta,0),(1-\Delta,1),
(1,1Δ),(eϵ+Δeϵ+1,1Δeϵ+1),(1Δeϵ+1,eϵ+Δeϵ+1)}\displaystyle(1,1-\Delta),(\frac{e^{{\epsilon}}+\Delta}{e^{{\epsilon}}+1},\frac{1-\Delta}{e^{{\epsilon}}+1}),(\frac{1-\Delta}{e^{{\epsilon}}+1},\frac{e^{{\epsilon}}+\Delta}{e^{{\epsilon}}+1})\}

Furthermore, when δ>0\delta>0, there exists a finite vector set 𝒫{\mathcal{P}} of size O(K7)O(K^{7}) such that if β=max{(pi,pi)}i=1K𝒫f(p1,,pK,p1,,pK;γ)\beta=\max_{\{(p_{i},p_{i}^{\prime})\}_{i=1}^{K}\in{\mathcal{P}}}f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma), then f(p1,,pK,p1,,pK;γ)βf(p_{1}^{*},\dots,p_{K}^{*},p_{1}^{\prime*},\dots,p_{K}^{\prime*};\gamma)\leq\beta. When δ=0\delta=0, the size of 𝒫{\mathcal{P}} can be reduced to O(K3)O(K^{3}).

Refer to caption
Figure 5: An illustration of the feasible region i{\mathcal{F}}_{i}.
Proof.

Part I: Reducing # privacy constraints from \infty to exponentially many.

Consider (pi,pi)(p_{i},p_{i}^{\prime}) for an arbitrary i[K]i\in[K] and fixing (pj,pj),ji(p_{j},p_{j}^{\prime}),\forall j\neq i. Given any noise function γ\gamma, recall the privacy cost objective f(p1,,pK,p1,,pK;γ)f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma) (see Lemma 3.4), is

f(p1,,pK,p1,,pK;γ)=l=0K12(emϵαlαl)γ(l)+l=K+12K(αlemϵαl)γ(l)\displaystyle f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma)=\sum_{l=0}^{\frac{K-1}{2}}(e^{m{\epsilon}}\alpha_{l}^{\prime}-\alpha_{l})\cdot\gamma(l)+\sum_{l=\frac{K+1}{2}}^{K}(\alpha_{l}-e^{m{\epsilon}}\alpha_{l}^{\prime})\cdot\gamma(l)

and the privacy constraints are of the form

f(p1,,pK,p1,,pK;γ)emϵ1+2δ\displaystyle f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma)\leq e^{m{\epsilon}}-1+2\delta

where recall that αl=Pr[(𝒟)=l]\alpha_{l}=\Pr[{\mathcal{L}}({\mathcal{D}})=l] is a function of {pi}i=1K\{p_{i}\}_{i=1}^{K} and αl=Pr[(𝒟)=l]\alpha_{l}^{\prime}=\Pr[{\mathcal{L}}({\mathcal{D}}^{\prime})=l] is a function of {pi}i=1K\{p_{i}^{\prime}\}_{i=1}^{K}, l{0,1,,K}\forall l\in\{0,1,\dots,K\}, and (𝒟){\mathcal{L}}({\mathcal{D}}), (𝒟){\mathcal{L}}({\mathcal{D}}^{\prime}) are the sums of observed outcomes on neighboring datasets 𝒟{\mathcal{D}} and 𝒟{\mathcal{D}}^{\prime}. By Lemma 3.4, γ\gamma needs to make the above privacy constraint hold for all possible {(pi,pi)}i=1K\{(p_{i},p_{i}^{\prime})\}_{i=1}^{K} to make DaRRMγ\textsf{DaRRM}_{\gamma} (mϵ,δ)(m{\epsilon},\delta)-differentially private. This is equivalent to saying that γ\gamma needs to ensure max{(pi,pi)}i=1Kf(p1,,pK,p1,,pK;γ)emϵ1+2δ\max_{\{(p_{i},p_{i}^{\prime})\}_{i=1}^{K}}f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma)\leq e^{m{\epsilon}}-1+2\delta.

Notice that the sum of observed outcomes follows a Poisson Binomial distribution, i.e., (𝒟)PoissonBinomial(p1,,pK){\mathcal{L}}({\mathcal{D}})\sim\text{PoissonBinomial}(p_{1},\dots,p_{K}) and (𝒟)PoissonBinomial(p1,,pK){\mathcal{L}}({\mathcal{D}}^{\prime})\sim\text{PoissonBinomial}(p_{1}^{\prime},\dots,p_{K}^{\prime}). Hence, by the pmf of the Poisson Binomial distribution666See, e.g. https://en.wikipedia.org/wiki/Poisson_binomial_distribution, for the pmf of Poisson Binomial distribution. , the privacy cost objective ff is linear in each pip_{i} and pip_{i}^{\prime}, fixing all (pj,pj)(p_{j},p_{j}^{\prime}), ji\forall j\neq i. Since each mechanism MiM_{i} is (ϵ,Δ)({\epsilon},\Delta)-differentially private, by definition, (pi,pi)(p_{i},p_{i}^{\prime}) satisfies all of the following:

pieϵpi+Δ,pieϵpi+Δ\displaystyle p_{i}\leq e^{{\epsilon}}p_{i}^{\prime}+\Delta,\quad p_{i}^{\prime}\leq e^{{\epsilon}}p_{i}+\Delta
1pieϵ(1pi)+Δ,1pieϵ(1pi)+Δ\displaystyle 1-p_{i}\leq e^{{\epsilon}}(1-p_{i}^{\prime})+\Delta,\quad 1-p_{i}^{\prime}\leq e^{{\epsilon}}(1-p_{i})+\Delta

That is, (pi,pi)(p_{i},p_{i}^{\prime}) lies in a feasible region i{\mathcal{F}}_{i} (see Figure 5). Note the constraints on (pi,pi)(p_{i},p_{i}^{\prime}), that is, the boundaries of i{\mathcal{F}}_{i}, are linear in pip_{i} and pip_{i}^{\prime}. And so the optimization problem (pi,pi)=argmax(pi,pi)f(p1,,pK,p1,,pK;γ)(p_{i}^{*},p_{i}^{\prime*})=\operatorname*{arg\,max}_{(p_{i},p_{i}^{\prime})}f(p_{1},\dots,p_{K},p_{1}^{\prime},\dots,p_{K}^{\prime};\gamma), which finds the worst case probabilities in (pi,pi)(p_{i},p_{i}^{\prime}), is a Linear Programming (LP) problem in (pi,pi)(p_{i},p_{i}^{\prime}) for i[K]i\in[K]. This implies (pi,pi)(p_{i}^{*},p_{i}^{\prime*}) has to be on one of the eight corners of i{\mathcal{F}}_{i} — that is (pi,pi){(0,0),(1,1)(p_{i}^{*},p_{i}^{\prime*})\in\{(0,0),(1,1), (0,Δ),(Δ,0),(1Δ,1),(1,1Δ),(eϵ+Δeϵ+1,1Δeϵ+1),(1Δeϵ+1,eϵ+Δeϵ+1)}:=𝒞(0,\Delta),(\Delta,0),(1-\Delta,1),(1,1-\Delta),(\frac{e^{{\epsilon}}+\Delta}{e^{{\epsilon}}+1},\frac{1-\Delta}{e^{{\epsilon}}+1}),(\frac{1-\Delta}{e^{{\epsilon}}+1},\frac{e^{{\epsilon}}+\Delta}{e^{{\epsilon}}+1})\}:={\mathcal{C}}. Since all (pi,pi)(p_{i},p_{i}^{\prime}) and (pj,pj)(p_{j},p_{j}^{\prime}), for iji\neq j, are independent, we can search for the worst case probabilities by searching for (pi,pi)𝒞(p_{i}^{*},p_{i}^{\prime*})\in{\mathcal{C}}, instead of searching for (pi,pi)i,i[K](p_{i},p_{i}^{\prime})\in{\mathcal{F}}_{i},\forall i\in[K]. Therefore, the infinitely many privacy constraints are now reduced to only 8K8^{K} to optimize for the best γ\gamma function that maximizes the utility of DaRRMγ\textsf{DaRRM}_{\gamma}, while ensuring the output is mϵm{\epsilon}-differentially private.

Part II: Reducing # privacy constraints from exponentially many to a polynomial set.

To further reduce the number of privacy constraints in the optimization, observe that the Poisson Binomial distribution is invariant under permutation of its parameters. That is, $\text{PoissonBinomial}(p_1,\dots,p_K) \stackrel{dist.}{\sim} \text{PoissonBinomial}(\pi(p_1,\dots,p_K))$ for any permutation $\pi$, where $\stackrel{dist.}{\sim}$ means “follows the same distribution”. Similarly, $\text{PoissonBinomial}(p_1',\dots,p_K') \stackrel{dist.}{\sim} \text{PoissonBinomial}(\pi(p_1',\dots,p_K'))$.
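This permutation invariance is easy to check numerically. The following Python sketch (the helper name is ours, not from the paper) computes the Poisson Binomial pmf by convolving the $K$ Bernoulli distributions one at a time, and confirms that shuffling the parameters leaves the pmf unchanged.

```python
def poisson_binomial_pmf(probs):
    """PMF of a sum of independent Bernoulli(p_i)'s, by iterative convolution."""
    pmf = [1.0]  # distribution of the empty sum: Pr[S = 0] = 1
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)  # this mechanism outputs 0
            new[k + 1] += mass * p      # this mechanism outputs 1
        pmf = new
    return pmf

# PoissonBinomial(p_1, ..., p_K) and PoissonBinomial(pi(p_1, ..., p_K)) agree
params = [0.2, 0.5, 0.9, 0.7]
permuted = [0.7, 0.2, 0.9, 0.5]
a, b = poisson_binomial_pmf(params), poisson_binomial_pmf(permuted)
assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
```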

The above observation implies that if we have one privacy constraint $f(p_1=v_1,\dots,p_K=v_K,p_1'=v_1',\dots,p_K'=v_K';\gamma) \leq e^{m\epsilon} - 1 + 2\delta$ for some $\{(v_i, v_i')\}_{i=1}^K \in \mathcal{C}^K$, then any privacy constraint $f(p_1=s_1,\dots,p_K=s_K,p_1'=s_1',\dots,p_K'=s_K';\gamma) \leq e^{m\epsilon} - 1 + 2\delta$, where $(s_1,\dots,s_K) = \pi_1(v_1,\dots,v_K)$ and $(s_1',\dots,s_K') = \pi_2(v_1',\dots,v_K')$ for permutations $\pi_1$ and $\pi_2$, is redundant.

Therefore, there is a vector set $\mathcal{P}$, where each probability vector $(p_1,\dots,p_K,p_1',\dots,p_K')$ in $\mathcal{P}$ is constructed by setting $(p_1,p_1'),(p_2,p_2'),\dots,(p_K,p_K') = (v_1,v_2,\dots,v_K)$ with $v_i \in \mathcal{C}, \forall i \in [K]$, such that the vectors constructed by $(p_1,p_1'),(p_2,p_2'),\dots,(p_K,p_K') = \pi(v_1,v_2,\dots,v_K)$ for any non-identity permutation $\pi$ are not in $\mathcal{P}$. Note $|\mathcal{P}| =$ (8 choose $K$ with replacement) $= \binom{K+8-1}{K} = O(K^7)$. If we can restrict our search for the worst-case probabilities to this set $\mathcal{P}$ — that is, solve for $\beta := \max_{\{(p_i,p_i')\}_{i=1}^K \in \mathcal{P}} f(p_1,\dots,p_K,p_1',\dots,p_K';\gamma)$ — then $f(p_1^*,\dots,p_K^*,p_1'^*,\dots,p_K'^*;\gamma) \leq \beta$. This implies we only need $O(K^7)$ privacy constraints to optimize for the best noise function $\gamma$ in DaRRM, while making sure $\textsf{DaRRM}_\gamma$ is $m\epsilon$-differentially private.
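Concretely, $\mathcal{P}$ is the set of size-$K$ multisets of corners, which can be enumerated with `itertools.combinations_with_replacement`. A minimal sketch (the small $K$ and integer corner labels are illustrative stand-ins):

```python
import math
from itertools import combinations_with_replacement

K = 5
corners = range(8)  # stand-ins for the 8 corner pairs in the set C
# one representative per permutation-equivalence class of ((p_1,p_1'),...,(p_K,p_K'))
P = list(combinations_with_replacement(corners, K))
assert len(P) == math.comb(K + 8 - 1, K)  # "8 choose K with replacement" = O(K^7)

# pure-DP case (Delta = 0): only 4 corners, so |P| = (K+4-1 choose K) = O(K^3)
P_pure = list(combinations_with_replacement(range(4), K))
assert len(P_pure) == math.comb(K + 4 - 1, K)
```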

Note if Δ=0\Delta=0, i.e., the mechanism MiM_{i}’s are pure differentially private, the feasible region i{\mathcal{F}}_{i} in which (pi,pi)(p_{i},p_{i}^{\prime}) lies has only 4 corners instead of 8. This implies (pi,pi)𝒞={(0,0),(1,1),(eϵeϵ+1,1eϵ+1),(1eϵ+1,eϵeϵ+1)}(p_{i}^{*},p_{i}^{\prime*})\in{\mathcal{C}}=\{(0,0),(1,1),(\frac{e^{{\epsilon}}}{e^{{\epsilon}}+1},\frac{1}{e^{{\epsilon}}+1}),(\frac{1}{e^{{\epsilon}}+1},\frac{e^{{\epsilon}}}{e^{{\epsilon}}+1})\}. Hence, in this case, |𝒫|=|{\mathcal{P}}|= (4 choose KK with replacement) = (K+41K)=O(K3){K+4-1\choose K}=O(K^{3}), which implies we only need O(K3)O(K^{3}) privacy constraints to optimize for the best noise function γ\gamma in DaRRM.

Appendix D Full Experiment Results

D.1 Optimized γ\gamma in Simulations

D.1.1 Comparison Using General Composition

General composition (Theorem 2.3) yields a smaller total privacy loss than simple composition (Theorem 2.2) when the number of composed mechanisms is large or when the failure probability $\delta$ is large. To enable a meaningful comparison against general composition, we consider a larger $K$ and a larger failure probability $\delta$.

Consider K=35,ϵ=0.1,Δ=105K=35,{\epsilon}=0.1,\Delta=10^{-5}. By general composition, if one outputs the majority of MM subsampled mechanisms for some M<KM<K, the majority output is (ϵopt,δopt)({\epsilon}_{opt},\delta_{opt})-differentially private, where

$${\epsilon}_{opt} = \min\Big\{ M\epsilon,\ \frac{(e^{\epsilon}-1)\epsilon M}{e^{\epsilon}+1} + \epsilon\sqrt{2M\log\Big(e + \frac{\sqrt{M\epsilon^2}}{\delta'}\Big)},\ \frac{(e^{\epsilon}-1)\epsilon M}{e^{\epsilon}+1} + \epsilon\sqrt{2M\log\frac{1}{\delta'}} \Big\}, \quad \delta_{opt} = 1 - (1-\Delta)^M (1-\delta')$$

for some δ0\delta^{\prime}\geq 0. We set this as the privacy guarantee of all majority ensembling algorithms. That is, if we want the majority output to be (mϵ,δ)(m{\epsilon},\delta)-differentially private, we set

$$m = \frac{\epsilon_{opt}}{\epsilon} = \min\Big\{ M,\ \frac{(e^{\epsilon}-1)M}{e^{\epsilon}+1} + \sqrt{2M\log\Big(e + \frac{\sqrt{M\epsilon^2}}{\delta'}\Big)},\ \frac{(e^{\epsilon}-1)M}{e^{\epsilon}+1} + \sqrt{2M\log\frac{1}{\delta'}} \Big\}$$

and $\delta = 1 - (1-\Delta)^M (1-\delta')$ accordingly. The parameters $\tau$ and $\lambda$ used to compute $p_{const}$ in RR (see Section A.1) are set to be

$$\tau = \min\Big\{ K,\ \frac{(e^{\epsilon}-1)K}{e^{\epsilon}+1} + \sqrt{2K\log\Big(e + \frac{\sqrt{K\epsilon^2}}{\delta'}\Big)},\ \frac{(e^{\epsilon}-1)K}{e^{\epsilon}+1} + \sqrt{2K\log\frac{1}{\delta'}} \Big\}$$

and $\lambda = 1 - (1-\Delta)^K (1-\delta')$.

In the experiments, we consider $M \in \{10, 13, 15, 20\}$ and $\delta' = 0.1$; $\gamma_{opt}$ is computed using a uniform prior $\mathcal{T}$.
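As a sanity check, the parameter values can be reproduced directly from the composition formulas above (using $\Delta$, not $\delta$, in the failure-probability term). The following Python sketch recomputes $m$, $\delta$, $\tau$, and $\lambda$, matching Table 3 up to rounding.

```python
import math

def general_comp_m(M, eps, d_prime):
    """epsilon_opt / eps from the general composition bound."""
    base = (math.exp(eps) - 1.0) * M / (math.exp(eps) + 1.0)
    return min(
        M,
        base + math.sqrt(2 * M * math.log(math.e + math.sqrt(M * eps**2) / d_prime)),
        base + math.sqrt(2 * M * math.log(1.0 / d_prime)),
    )

K, eps, Delta, d_prime = 35, 0.1, 1e-5, 0.1
ms = {M: general_comp_m(M, eps, d_prime) for M in (10, 13, 15, 20)}
deltas = {M: 1.0 - (1.0 - Delta) ** M * (1.0 - d_prime) for M in (10, 13, 15, 20)}
tau = general_comp_m(K, eps, d_prime)              # ~14.0328, as in Table 3
lam = 1.0 - (1.0 - Delta) ** K * (1.0 - d_prime)   # ~0.1003
```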

All values of the parameters of the private ensembling algorithms we use in the experiment are listed in the table:

# Subsampled mechanisms MM 10 13 15 20
Privacy allowance mm 6.4521 7.5742 8.2708 9.8823
Parameter of constant γ\gamma τ\tau 14.0328 14.0328 14.0328 14.0328
Parameter of constant γ\gamma λ\lambda 0.1003 0.1003 0.1003 0.1003
Overall privacy loss mϵm{\epsilon} 0.6452 0.7574 0.8271 0.9882
Overall failure probability δ\delta 0.1001 0.1001 0.1001 0.1002
Table 3: All parameter values. Note that all the private ensembling algorithms we compare in the experiment are required to be $(m\epsilon, \delta)$-differentially private. Here, $K=35$, $\epsilon=0.1$, $\Delta=10^{-5}$ and $\delta'=0.1$.
Figure 6: Plots of the shape and $\mathcal{E}(\textsf{DaRRM}_\gamma)$ of different $\gamma$ functions: the optimized $\gamma_{opt}$, and the baselines $\gamma_{Sub}$ (corresponding to subsampling) and $\gamma_{const}$ (corresponding to RR). Here, $K=35$, $M \in \{10,13,15,20\}$, $\Delta=10^{-5}$, $\epsilon=0.1$, $\delta'=0.1$.

D.1.2 Comparison in Pure Differential Privacy Settings

Consider the pure differential privacy setting, where Δ=δ=0\Delta=\delta=0. Note in this setting, it is known that simple composition is tight.

To compute an optimized γopt\gamma_{opt} in DaRRM, since we have shown the number of constraints is O(K3)O(K^{3}) if Δ=δ=0\Delta=\delta=0 (see Lemma 5.1), we can set KK to be larger. Here, we present results for K{11,101}K\in\{11,101\} and ϵ=0.1{\epsilon}=0.1.

Again, we compare the shape of different γ\gamma and the corresponding (DaRRMγ){\mathcal{E}}(\textsf{DaRRM}_{\gamma}) under those γ\gamma functions, fixing the total privacy loss to be mϵm{\epsilon}. γopt\gamma_{opt} is computed using a uniform prior 𝒯{\mathcal{T}}.

Since the subsampling mechanism from Section 4 with privacy amplification applies to this setting, we compare four different γ\gamma noise functions here:

1. $\gamma_{opt}$ (Ours): the optimized $\gamma$ function from our optimization framework.

2. $\gamma_{Sub}$ (Baseline): the $\gamma$ function corresponding to outputting the majority of $m$ out of $K$ subsampled mechanisms.

3. $\gamma_{DSub}$ (Baseline): the $\gamma$ function corresponding to outputting the majority of $2m-1$ subsampled mechanisms from Theorem 4.1, a.k.a. Double Subsampling (DSub).

4. $\gamma_{const}$ (Baseline): the constant $\gamma$ function corresponding to the classical Randomized Response (RR) algorithm.

Setting 1. K=11K=11, m{1,3,5,7,9,11}m\in\{1,3,5,7,9,11\}.

Figure 7: Plots of the shape and $\mathcal{E}(\textsf{DaRRM}_\gamma)$ of different $\gamma$ functions: the optimized $\gamma_{opt}$, the baselines $\gamma_{Sub}$ and $\gamma_{DSub}$ (Theorem 4.1), and the constant $\gamma_{const}$ (corresponding to RR). Here, $K=11$, $m \in \{1,3,5,7,9,11\}$, $\epsilon=0.1$ and $\delta=\Delta=0$. Note when $m \in \{7,9\}$, the cyan line ($\gamma_{DSub}$) and the red line ($\gamma_{opt}$) overlap. When $m=11$, all lines overlap. Observe that when $m \geq \frac{K+1}{2}$, that is, $m \in \{7,9,11\}$ in this case, the plots suggest both $\gamma_{opt}$ and $\gamma_{DSub}$ achieve the minimum error of 0. This is consistent with our theory.

Setting 2. K=101,m{10,20,30,40,60,80}K=101,m\in\{10,20,30,40,60,80\}.

Figure 8: Plots of the shape and $\mathcal{E}(\textsf{DaRRM}_\gamma)$ of different $\gamma$ functions: the optimized $\gamma_{opt}$, the baselines $\gamma_{Sub}$ and $\gamma_{DSub}$ (Theorem 4.1), and the constant $\gamma_{const}$ (corresponding to RR). Here, $K=101$, $m \in \{10,20,30,40,60,80\}$, $\epsilon=0.1$ and $\delta=\Delta=0$.

D.1.3 Comparison Using Different Prior Distributions

When optimizing γ\gamma that maximizes the utility in DaRRM, recall that the objective takes an expectation over pip_{i}’s for pi𝒯p_{i}\sim{\mathcal{T}}, where 𝒯{\mathcal{T}} is some distribution and pi=Pr[Mi(𝒟)=1]p_{i}=\Pr[M_{i}({\mathcal{D}})=1]. The previous experiments assume we do not have access to any prior knowledge about pip_{i}’s and hence 𝒯{\mathcal{T}} is the uniform distribution, i.e., Uniform([0,1])\text{Uniform}([0,1]). However, when one has knowledge about the mechanisms, one can set a proper prior 𝒯{\mathcal{T}} to further maximize the utility of DaRRM.

In this section, let $\mathcal{T}_U$ denote $\text{Uniform}([0,1])$, and we present results under a different prior distribution, which we call $\mathcal{T}_P$, as follows. Suppose our prior belief is that each mechanism $M_i$ has a clear tendency towards voting 0 or 1, i.e., $p_i$ is far from 0.5. Let $\mathcal{T}_P$ be $\text{Uniform}([0,0.3] \cup [0.7,1])$.
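Sampling from $\mathcal{T}_P$ is straightforward since the two pieces of its support have equal length; a minimal Python sketch (the function name is ours, for illustration only):

```python
import random

def sample_T_P(rng):
    """Draw p_i ~ Uniform([0, 0.3] U [0.7, 1]); both pieces have length 0.3,
    so each is chosen with probability 1/2."""
    if rng.random() < 0.5:
        return rng.uniform(0.0, 0.3)
    return rng.uniform(0.7, 1.0)

rng = random.Random(0)
draws = [sample_T_P(rng) for _ in range(10000)]
assert all(d <= 0.3 or d >= 0.7 for d in draws)  # support check
```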

To optimize γ\gamma under 𝒯P{\mathcal{T}}_{P}, we change the approximate optimization objective in Eq. 202, which optimizes γ\gamma under 𝒯U{\mathcal{T}}_{U}, to be the following,

12l=K+12K0.710.710.71(αlαKl)𝑑p1𝑑p2𝑑pKγ(l)\displaystyle-\frac{1}{2}\sum_{l=\frac{K+1}{2}}^{K}\int_{0.7}^{1}\int_{0.7}^{1}\dots\int_{0.7}^{1}(\alpha_{l}-\alpha_{K-l})dp_{1}dp_{2}\dots dp_{K}\cdot\gamma(l) (204)

Setting. K=11,m{3,5}K=11,m\in\{3,5\}, ϵ=0.1{\epsilon}=0.1, δ=Δ=0\delta=\Delta=0.

We compare the shape and (DaRRMγ){\mathcal{E}}(\textsf{DaRRM}_{\gamma}) of different γ\gamma functions:

1. $\gamma_{opt-U}$: the $\gamma$ function optimized under $p_i \sim \mathcal{T}_U$.

2. $\gamma_{opt-P}$: the $\gamma$ function optimized under $p_i \sim \mathcal{T}_P$.

3. $\gamma_{Sub}$: the $\gamma$ function corresponding to the subsampling baseline.

4. $\gamma_{const}$: the $\gamma$ function corresponding to the RR baseline.

Note when we compute the error, we take the expectation w.r.t. the actual $p_i$ distributions, regardless of the prior used to optimize $\gamma$. In the experiments, we consider three different actual $p_i$ distributions:

1. “Actual: Uniform([0,1])”: $p_i \sim \mathcal{T}_U, \forall i \in [K]$.

2. “Actual: $p_i = 0.5$”: $p_i = 0.5, \forall i \in [K]$. This setting implies the mechanisms do not have a clear majority.

3. “Actual: Uniform([0,0.1])”: $p_i \sim \text{Uniform}([0,0.1]), \forall i \in [K]$. This setting implies the mechanisms have a clear majority (i.e., 0).

Since our prior $\mathcal{T}_P$ is closer to $\text{Uniform}([0,0.1])$ (i.e., there is a clear majority), we would expect $\mathcal{E}(\textsf{DaRRM}_{\gamma_{opt-P}})$ to be the lowest when $p_i \sim \text{Uniform}([0,0.1])$, but higher than $\mathcal{E}(\textsf{DaRRM}_{\gamma_{opt-U}})$ when $p_i \sim \text{Uniform}([0,1])$ or $p_i = 0.5$. The results are presented in Figure 9.

Figure 9: Comparison of the shape and $\mathcal{E}(\textsf{DaRRM}_\gamma)$ of different $\gamma$ functions: 1) $\gamma$ optimized under prior $\mathcal{T}_U$, 2) $\gamma$ optimized under prior $\mathcal{T}_P$, 3) $\gamma_{Sub}$ (corresponding to the subsampling baseline) and 4) $\gamma_{const}$ (corresponding to the RR baseline). Here, $K=11$, $m \in \{3,5\}$, $\epsilon=0.1$. Observe that if the prior $\mathcal{T}_P$ used in optimizing $\gamma$ is closer to the actual distribution of the $p_i$'s, there is an additional utility gain (i.e., decreased error); otherwise, we suffer a slight utility loss (i.e., increased error), compared to optimizing $\gamma$ under the $\mathcal{T}_U$ prior. Furthermore, regardless of the choice of the prior distribution $\mathcal{T}$ used to optimize $\gamma$, $\textsf{DaRRM}_\gamma$ with an optimized $\gamma$ achieves a lower error than the baselines.

D.2 Private Semi-Supervised Knowledge Transfer

D.2.1 More Details about the Baseline GNMax Papernot et al. (2018)

The GNMax aggregation mechanism for majority ensembling of non-private teachers proceeds as follows (Section 4.1 of Papernot et al. (2018)): on input xx,

Mσ(x)=argmaxi{ni(x)+𝒩(0,σ2)}\displaystyle M_{\sigma}(x)=\operatorname*{arg\,max}_{i}\{n_{i}(x)+{\mathcal{N}}(0,\sigma^{2})\} (205)

where $n_i(x)$ is the number of teachers who vote for class $i$.
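For concreteness, GNMax on a single query can be sketched in a few lines of Python. The helper below is our own illustration of Eq. 205, not the authors' code; the binary-classification setup mirrors the $C=2$ experiments.

```python
import random

def gnmax(votes, sigma, rng, num_classes=2):
    """GNMax: add N(0, sigma^2) noise to each class's vote count, output the argmax."""
    counts = [0.0] * num_classes
    for v in votes:
        counts[v] += 1.0                      # n_i(x): number of votes for class i
    noisy = [c + rng.gauss(0.0, sigma) for c in counts]
    return max(range(num_classes), key=lambda i: noisy[i])

rng = random.Random(0)
teacher_votes = [0] * 9 + [1] * 2             # K = 11 teachers, clear majority for 0
label = gnmax(teacher_votes, sigma=5.0, rng=rng)
```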

How to set σ\sigma in GNMax?

Section 4.1 of Papernot et al. (2018) states that the GNMax mechanism is $(\lambda, \lambda/\sigma^2)$-Rényi differentially private (RDP) for all $\lambda \geq 1$. RDP bounds can be converted to DP bounds as follows:

Theorem D.1 (RDP to DP (Theorem 5 of Papernot et al. (2018))).

If a mechanism MM guarantees (λ,ϵ)(\lambda,{\epsilon})-RDP, then MM guarantees (ϵ+log1/δλ1,δ)({\epsilon}+\frac{\log 1/\delta}{\lambda-1},\delta)-differential privacy for δ(0,1)\delta\in(0,1).

Therefore, GNMax with parameter σ2\sigma^{2} guarantees (λσ2+log1/δλ1,δ)(\frac{\lambda}{\sigma^{2}}+\frac{\log 1/\delta}{\lambda-1},\delta)-differential privacy, λ1\forall\lambda\geq 1. Given m,ϵ,Δm,{\epsilon},\Delta, we want to choose λ\lambda and σ2\sigma^{2} here so that the output of GNMax is (mϵ,mΔ)(m{\epsilon},m\Delta)-differentially private. Here, δ=mΔ\delta=m\Delta.

We first obtain a valid range of $\lambda$. We need $\frac{\lambda}{\sigma^2} + \frac{\log 1/\delta}{\lambda-1} = m\epsilon$; since $\frac{\lambda}{\sigma^2} > 0$, this requires $\frac{\log 1/\delta}{\lambda-1} < m\epsilon$, i.e., $\lambda > \frac{\log 1/\delta}{m\epsilon} + 1 := \lambda_{min}$, and then $\sigma^2 = \frac{\lambda}{m\epsilon - \frac{\log 1/\delta}{\lambda-1}}$. Since a smaller $\sigma^2$ gives higher utility, we perform a grid search over $\lambda \in [\lambda_{min}, 500]$, with discretized $\lambda$ values spaced 0.5 apart, to find the minimum $\sigma_{min}^2$. For the $(m\epsilon, m\Delta)$ values used in the experiments, we observe that $\sigma^2$ decreases first and then increases as $\lambda$ increases, as shown in Figure 10. The $\lambda$ and $\sigma_{min}$ values in the RDP bound of Gaussian noise used to compute the privacy loss of GNMax's output in the experiments are presented in Table 4.
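The grid search described above is a few lines of Python. This is our own sketch of the procedure, not the exact experimental code, so the returned values need not match Table 4 exactly.

```python
import math

def min_sigma2(m_eps, delta, lam_max=500.0, step=0.5):
    """Grid-search lambda >= lambda_min to minimize sigma^2 subject to (m_eps, delta)-DP."""
    lam_min = math.log(1.0 / delta) / m_eps + 1.0
    best_s2, best_lam = float("inf"), None
    lam = lam_min
    while lam <= lam_max:
        denom = m_eps - math.log(1.0 / delta) / (lam - 1.0)
        if denom > 0:  # sigma^2 = lambda / (m*eps - log(1/delta)/(lambda - 1))
            s2 = lam / denom
            if s2 < best_s2:
                best_s2, best_lam = s2, lam
        lam += step
    return best_s2, best_lam

sigma2, lam = min_sigma2(0.2676, 0.0003)  # per-query budget for MNIST in our setup
# by construction, lambda/sigma^2 + log(1/delta)/(lambda - 1) = m*eps at the optimum
assert abs(lam / sigma2 + math.log(1 / 0.0003) / (lam - 1) - 0.2676) < 1e-9
```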

Figure 10: Plots of $\lambda$ vs. $\sigma^2$ in the Gaussian RDP privacy bound. The goal is to choose a $\lambda$ value that minimizes $\sigma^2$. The value of $\sigma^2$ decreases at first and then increases as $\lambda$ increases.
Dataset Privacy loss per query $(m\epsilon, m\Delta)$ $\lambda$ $\sigma_{min}$
MNIST (0.2676, 0.0003) 34.31 21.46
Fashion-MNIST (0.2556, 0.0003) 35.74 22.46
Table 4: Parameters of the RDP bound of Gaussian noise to compute the privacy loss of GNMax’s output.

A Note on the Data-dependent Privacy Loss Bound

Papernot et al. (2018) gives a potentially tighter data-dependent bound on the privacy loss of using GNMax to output the majority of non-private teachers' votes. We give clean pseudocode for computing the data-dependent privacy loss bound in Algorithm 6, based on the lemmas and theorems in Papernot et al. (2018). Given privacy parameters $\sigma, \lambda$ and the teacher votes per class $\{n_i\}_{i=1}^C$ for $C$ classes, the data-dependent bound can be empirically evaluated and compared against the Gaussian privacy loss bound; the smaller of the two is the final privacy loss. We empirically find that the condition for the data-dependent bound (line 8 in Algorithm 6) is not satisfied when $K$ and the number of classes $C$ are small, e.g., $K=11, C=2$ as in our case, even if all teachers agree on the same output. And so in the experiments, we can only apply the Gaussian privacy loss bound (line 14).

Algorithm 6 Compute Tighter Privacy Loss
1:  Input: Std. of Gaussian noise σ\sigma, Privacy parameter λ\lambda, # teachers KK, # classes CC, # votes per class {ni}i=1C\{n_{i}\}_{i=1}^{C}
2:  $\mathcal{B} \leftarrow \{\}$ (bound candidates)
3:  for i=1,2,,Ki=1,2,\dots,K do
4:     $q^{(i)} \leftarrow \frac{1}{2}\sum_{j \neq i^*} \text{erfc}\big(\frac{n_{i^*} - n_j}{2\sigma}\big)$
5:     μ2(i)σlog1/q(i)\mu_{2}^{(i)}\leftarrow\sigma\cdot\sqrt{\log 1/q^{(i)}}, μ1(i)μ2(i)+1\mu_{1}^{(i)}\leftarrow\mu_{2}^{(i)}+1
6:     ϵ1(i)μ1(i)σ2{\epsilon}_{1}^{(i)}\leftarrow\frac{\mu_{1}^{(i)}}{\sigma^{2}}, ϵ2(i)μ2(i)σ2{\epsilon}_{2}^{(i)}\leftarrow\frac{\mu_{2}^{(i)}}{\sigma^{2}}
7:     $q_{ub}^{(i)} \leftarrow \exp\big((\mu_2^{(i)} - 1)\,{\epsilon}_2^{(i)}\big) \big/ \big(\frac{\mu_1^{(i)}}{\mu_1^{(i)}-1} \cdot \frac{\mu_2^{(i)}}{\mu_2^{(i)}-1}\big)^{\mu_2^{(i)}}$
8:     if $q^{(i)} < 1$ and $\mu_1^{(i)} \geq \lambda$ and $\mu_2^{(i)} > 1$ and $q^{(i)} \leq q_{ub}^{(i)}$ then
9:        A(i)(1q(i))/(1q(i)exp(ϵ2(i))μ2(i)1μ2(i))A^{(i)}\leftarrow(1-q^{(i)})/(1-q^{(i)}\cdot\exp({\epsilon}_{2}^{(i)})^{\frac{\mu_{2}^{(i)}-1}{\mu_{2}^{(i)}}})
10:        B(i)exp(ϵ1(i))/(q(i))1μ1(i)1B^{(i)}\leftarrow\exp({\epsilon}_{1}^{(i)})/(q^{(i)})^{\frac{1}{\mu_{1}^{(i)}-1}}
11:        $\text{DataDependentBound} \leftarrow \frac{1}{\lambda-1} \log\Big((1 - q^{(i)}) \cdot (A^{(i)})^{\lambda-1} + q^{(i)} \cdot (B^{(i)})^{\lambda-1}\Big)$
12:        DataDependentBound{\mathcal{B}}\leftarrow{\mathcal{B}}\cup\text{DataDependentBound}
13:     else
14:        GaussianBoundλσ2\text{GaussianBound}\leftarrow\frac{\lambda}{\sigma^{2}}
15:        GaussianBound{\mathcal{B}}\leftarrow{\mathcal{B}}\cup\text{GaussianBound}
16:     end if
17:  end for
18:  Return min\min{\mathcal{B}}
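As a quick check of the claim above, the precondition on line 8 can be evaluated directly. The Python sketch below is our own translation of lines 4–8 of Algorithm 6 (not the authors' code); with $K=11$ unanimous teachers, $C=2$ classes, and the $(\lambda, \sigma)$ values from Table 4, the data-dependent bound indeed does not apply.

```python
import math

def data_dependent_applies(votes, sigma, lam):
    """Evaluate the line-8 precondition of Algorithm 6 for per-class vote counts."""
    i_star = max(range(len(votes)), key=lambda i: votes[i])
    q = 0.5 * sum(math.erfc((votes[i_star] - votes[i]) / (2.0 * sigma))
                  for i in range(len(votes)) if i != i_star)
    if not 0.0 < q < 1.0:
        return False
    mu2 = sigma * math.sqrt(math.log(1.0 / q))
    mu1 = mu2 + 1.0
    eps2 = mu2 / sigma**2
    q_ub = math.exp((mu2 - 1.0) * eps2) / ((mu1 / (mu1 - 1.0)) * (mu2 / (mu2 - 1.0))) ** mu2
    return mu1 >= lam and mu2 > 1.0 and q <= q_ub

# K = 11, C = 2, all teachers agree: the precondition fails, so only the
# Gaussian bound lambda / sigma^2 applies, as noted in the text
assert not data_dependent_applies([11, 0], sigma=21.46, lam=34.31)
```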

D.2.2 Additional Results for Private Semi-Supervised Knowledge Transfer

m=1m=1.

Dataset # Queries Privacy loss per query (ϵquery,δquery)({\epsilon}_{query},\delta_{query}) Total privacy loss over QQ queries (ϵtotal,δtotal)({\epsilon}_{total},\delta_{total})
MNIST Q=20Q=20 (0.0892,0.0001)(0.0892,0.0001) (1.704,0.002)({1.704},0.002)
Q=50Q=50 (2.837,0.005)({2.837},0.005)
Q=100Q=100 (4.202,0.010)({4.202},0.010)
Fashion MNIST Q=20Q=20 (0.0852,0.0001)(0.0852,0.0001) (1.620,0.002)({1.620},0.002)
Q=50Q=50 (2.695,0.005)({2.695},0.005)
Q=100Q=100 (3.988,0.010)({3.988},0.010)
Table 5: The privacy loss per query to the teachers and the total privacy loss over QQ queries. Note the total privacy loss is computed by general composition, where we set δ=0.0001\delta^{\prime}=0.0001.

Dataset: MNIST
# Queries GNMax (Baseline) $\textsf{DaRRM}_{\gamma_{Sub}}$ (Baseline) $\textsf{DaRRM}_{\gamma_{opt}}$ (Ours)
Q=20 0.54 (0.11) 0.68 (0.07) 0.74 (0.08)
Q=50 0.51 (0.07) 0.67 (0.05) 0.66 (0.05)
Q=100 0.57 (0.03) 0.71 (0.03) 0.69 (0.04)

Dataset: Fashion-MNIST
# Queries GNMax (Baseline) $\textsf{DaRRM}_{\gamma_{Sub}}$ (Baseline) $\textsf{DaRRM}_{\gamma_{opt}}$ (Ours)
Q=20 0.56 (0.10) 0.92 (0.05) 0.89 (0.06)
Q=50 0.52 (0.05) 0.89 (0.04) 0.92 (0.03)
Q=100 0.56 (0.04) 0.89 (0.04) 0.91 (0.04)

Table 6: Accuracy of the predicted labels of QQ query samples on datasets MNIST (on the left) and Fashion-MNIST (on the right). We report the mean and one std. in parentheses over 10 random draws of the query samples from the test dataset. Note each prediction on the query sample is (ϵtotal,δtotal)({\epsilon}_{total},\delta_{total})-differentially private. Note in this case where m=1m=1, by Lemma 3.2, subsampling achieves the optimal error/utility. Hence, there is not much difference in terms of accuracy between DaRRMγSub\textsf{DaRRM}_{\gamma_{Sub}} and DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}} as expected.

m=5m=5.

Dataset # Queries Privacy loss per query (ϵquery,δquery)({\epsilon}_{query},\delta_{query}) Total privacy loss over QQ queries (ϵtotal,δtotal)({\epsilon}_{total},\delta_{total})
MNIST Q=20Q=20 (0.4460,0.0005)(0.4460,0.0005) (8.920,0.010)({8.920},0.010)
Q=50Q=50 (18.428,0.025)({18.428},0.025)
Q=100Q=100 (28.926,0.049)({28.926},{0.049})
Fashion MNIST Q=20Q=20 (0.4260,0.0005)(0.4260,0.0005) (8.520,0.010)({8.520},0.010)
Q=50Q=50 (17.398,0.025)({17.398},0.025)
Q=100Q=100 (27.223,0.049)({27.223},{0.049})
Table 7: The privacy loss per query to the teachers and the total privacy loss over QQ queries. Note the total privacy loss is computed by general composition, where we set δ=0.0001\delta^{\prime}=0.0001.

Dataset: MNIST
# Queries GNMax (Baseline) $\textsf{DaRRM}_{\gamma_{Sub}}$ (Baseline) $\textsf{DaRRM}_{\gamma_{opt}}$ (Ours)
Q=20 0.73 (0.11) 0.76 (0.09) 0.84 (0.07)
Q=50 0.75 (0.07) 0.82 (0.04) 0.83 (0.04)
Q=100 0.72 (0.04) 0.79 (0.05) 0.83 (0.03)

Dataset: Fashion-MNIST
# Queries GNMax (Baseline) $\textsf{DaRRM}_{\gamma_{Sub}}$ (Baseline) $\textsf{DaRRM}_{\gamma_{opt}}$ (Ours)
Q=20 0.72 (0.10) 0.96 (0.04) 0.97 (0.04)
Q=50 0.72 (0.08) 0.96 (0.02) 0.97 (0.02)
Q=100 0.72 (0.06) 0.97 (0.01) 0.97 (0.01)

Table 8: Accuracy of the predicted labels of QQ query samples on datasets MNIST (on the left) and Fashion-MNIST (on the right). We report the mean and one std. in parentheses over 10 random draws of the query samples from the test dataset. Note each prediction on the query sample is (ϵtotal,δtotal)({\epsilon}_{total},\delta_{total})-differentially private. With the same per query privacy loss (and hence the same total privacy loss over QQ samples), DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}} achieves the highest accuracy compared to the other two baselines.

m=7m=7.

Dataset # Queries Privacy loss per query (ϵquery,δquery)({\epsilon}_{query},\delta_{query}) Total privacy loss over QQ queries (ϵtotal,δtotal)({\epsilon}_{total},\delta_{total})
MNIST Q=20Q=20 (0.6244,0.0007)(0.6244,0.0007) (12.488,0.014)({12.488},0.014)
Q=50Q=50 (28.392,0.035)({28.392},0.035)
Q=100Q=100 (45.683,0.068)({45.683},{0.068})
Fashion MNIST Q=20Q=20 (0.5964,0.0007)(0.5964,0.0007) (11.928,0.014)({11.928},0.014)
Q=50Q=50 (26.738,0.035)({26.738},0.035)
Q=100Q=100 (42.873,0.068)({42.873},{0.068})
Table 9: The privacy loss per query to the teachers and the total privacy loss over QQ queries. Note the total privacy loss is computed by general composition, where we set δ=0.0001\delta^{\prime}=0.0001.

Dataset: MNIST
# Queries GNMax (Baseline) $\textsf{DaRRM}_{\gamma_{Sub}}$ (Baseline) $\textsf{DaRRM}_{\gamma_{opt}}$ (Ours)
Q=20 0.79 (0.07) 0.80 (0.09) 0.85 (0.08)
Q=50 0.80 (0.05) 0.82 (0.05) 0.85 (0.04)
Q=100 0.80 (0.04) 0.80 (0.04) 0.83 (0.03)

Dataset: Fashion-MNIST
# Queries GNMax (Baseline) $\textsf{DaRRM}_{\gamma_{Sub}}$ (Baseline) $\textsf{DaRRM}_{\gamma_{opt}}$ (Ours)
Q=20 0.79 (0.07) 0.95 (0.04) 0.96 (0.04)
Q=50 0.79 (0.05) 0.96 (0.03) 0.97 (0.03)
Q=100 0.79 (0.03) 0.96 (0.02) 0.96 (0.02)

Table 10: Accuracy of the predicted labels of QQ query samples on datasets MNIST (on the left) and Fashion-MNIST (on the right). We report the mean and one std. in parentheses over 10 random draws of the query samples from the test dataset. Note each prediction on the query sample is (ϵtotal,δtotal)({\epsilon}_{total},\delta_{total})-differentially private. With the same per query privacy loss (and hence the same total privacy loss over QQ samples), DaRRMγopt\textsf{DaRRM}_{\gamma_{opt}} achieves the highest accuracy compared to the other two baselines.