
Adaptive Privacy Composition for Accuracy-first Mechanisms

Ryan Rogers (Data and AI Foundations, LinkedIn), Gennady Samorodnitsky (Cornell University), Zhiwei Steven Wu (Carnegie Mellon University), Aaditya Ramdas (Carnegie Mellon University)
Abstract

In many practical applications of differential privacy, practitioners seek to provide the best privacy guarantees subject to a target level of accuracy. A recent line of work [12, 21] has developed such accuracy-first mechanisms by leveraging the idea of noise reduction, which adds correlated noise to the sufficient statistic in a private computation and produces a sequence of increasingly accurate answers. A major advantage of noise reduction mechanisms is that the analyst only pays the privacy cost of the least noisy, or most accurate, answer released. Despite this appealing property in isolation, there has not been a systematic study of how to use them in conjunction with other differentially private mechanisms. A fundamental challenge is that the privacy guarantee for noise reduction mechanisms is (necessarily) formulated as ex-post privacy, which bounds the privacy loss as a function of the released outcome. Furthermore, there has yet to be any study of how ex-post private mechanisms compose, which would allow us to track the accumulated privacy loss over several mechanisms. We develop privacy filters [17, 7, 22] that allow an analyst to adaptively switch between differentially private and ex-post private mechanisms subject to an overall differential privacy guarantee.

1 Introduction

Although differential privacy has been recognized by the research community as the de facto standard for ensuring the privacy of a sensitive dataset while still allowing useful insights, it has yet to become widely applied in practice despite its promise of formal privacy guarantees. There are notable applications of differential privacy, including the U.S. Census [1], yet few would argue that it has become standard practice.

One common objection to differential privacy is that it injects noise and can cause spurious results for data analyses. A recent line of work in differential privacy has focused on developing accuracy-first mechanisms that aim to ensure a target accuracy guarantee while achieving the best privacy guarantee [12, 21]. In particular, these accuracy-first mechanisms do not ensure a predetermined level of privacy, but instead provide ex-post privacy, which allows the resulting privacy loss to depend on the outcome of the mechanism. This is in contrast to the prevalent paradigm of differential privacy that fixes the scale of privacy noise in advance and hopes the result is accurate. With accuracy-first mechanisms, practitioners instead specify the levels of accuracy that would ensure useful data analyses and then aim to achieve such utility with the strongest privacy guarantee.

However, one limitation of this line of work is that it is not clear how ex-post privacy mechanisms compose: if we combine multiple ex-post privacy mechanisms, what is the overall privacy guarantee? Composition is one of the key properties of differential privacy (when used in a privacy-first manner), so it is important to develop a composition theory for ex-post privacy mechanisms. Moreover, how do we analyze the privacy guarantee when we compose ex-post privacy mechanisms with differentially private mechanisms?

Our work seeks to answer these questions by connecting with another line of work in differential privacy on fully adaptive privacy composition. Traditional differential privacy composition results require the analyst to fix the privacy loss parameter, which is inversely proportional to the scale of noise, for each analysis in advance, prior to any interaction. Knowing that there will be noise, the data scientist may want to select different levels of noise for different analyses, subject to some overall privacy budget. Privacy filters and odometers, introduced in [17], provide a way to bound the overall privacy guarantee despite adaptively selected privacy loss parameters. There have since been other works that improve on the privacy loss bounds in this adaptive setting, to the point of matching (including constants) what one achieves in the nonadaptive setting [7, 22].

A natural next step would then be to allow an analyst some overall privacy loss budget to interact with the dataset, where the analyst can determine the accuracy metric they want to set with each new query. As a motivating example, consider an accuracy metric of $\alpha$% relative error for different counts with some overall privacy loss parameters $(\epsilon,\delta)$, so that the entire interaction will be $(\epsilon,\delta)$-differentially private. The first true count might be very large, so the amount of noise that needs to be added to ensure the target $\alpha$% relative error can be huge; hence very little of the privacy budget should be used for that query, allowing potentially more results to be returned than an approach that sets an a priori noise level.

A baseline approach to adding relative noise would be to add a large amount of noise and then check whether the noisy count is within some tolerance based on the scale of the noise added; if the noisy count is deemed good enough we stop, and otherwise we scale the privacy loss up by some factor and repeat. We refer to this approach as the doubling approach (see Section 7.1 for more details), which was also used in [12]. A sketch of this baseline is given below. The primary issue with this approach is that the accumulated privacy loss must combine the privacy loss from each noise addition, even though we are only interested in the outcome where we stopped. Noise reduction mechanisms from [12, 21] show how it is possible to only pay for the privacy of the last noise addition. However, it is not clear how the privacy loss accumulates over several noise reduction mechanisms, since each one ensures ex-post privacy, not differential privacy.
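To make this baseline concrete, the following is a minimal sketch (our illustration, with an assumed tolerance rule; the version we analyze appears in Section 7.1):

```python
import numpy as np

def doubling_approach(true_count, alpha, eps0=0.01, factor=2.0,
                      max_rounds=20, rng=np.random.default_rng(0)):
    """Illustrative 'doubling' baseline for a sensitivity-1 count: retry
    with a larger privacy parameter (less noise) until the answer looks
    accurate enough; standard composition charges every attempt."""
    eps, total_eps = eps0, 0.0
    noisy = None
    for _ in range(max_rounds):
        noisy = true_count + rng.laplace(scale=1.0 / eps)  # fresh draw
        total_eps += eps  # pay for this release under basic composition
        if 1.0 / eps <= alpha * abs(noisy):  # assumed tolerance rule
            break
        eps *= factor  # scale the privacy loss up and repeat
    return noisy, total_eps
```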

We make the following contributions in this work:

  • We present a general (basic) composition result for ex-post privacy mechanisms that can be used to create a privacy filter when an analyst can select arbitrary ex-post privacy mechanisms and (concentrated) differentially private mechanisms.

  • We develop a unified privacy filter that combines noise reduction mechanisms — specifically the Brownian Mechanism [21] — with traditional (concentrated) differentially private mechanisms.

  • We apply our results to the task of releasing counts from a dataset subject to a relative error bound comparing the unified privacy filter and the baseline doubling approach, which uses the privacy filters from [22].

Our main technical contribution is in the unified privacy filter for noise reduction mechanisms and differentially private mechanisms. Prior work [21] showed that the privacy loss of the Brownian noise reduction can be written in terms of a scaled Brownian motion at the time where the noise reduction was stopped. We present a new analysis for the ex-post privacy guarantee of the Brownian mechanism that considers a reverse time martingale, based on a scaled standard Brownian motion. Composition bounds for differential privacy consider a forward time martingale and apply a concentration bound, such as Azuma’s inequality [6], so we show how we can construct a forward time martingale from the stopped Brownian motions, despite the stopping times being adaptive, not predictable, at each time. See Figure 1 for a sketch of how we combine reverse time martingales into an overall forward time filtration.

Figure 1: Our privacy filter tracks an accumulated privacy loss over many different mechanisms (forward filtration, X-axis), stopping when it exceeds a predefined $\epsilon$ (dotted line). Each mechanism satisfies approximate zCDP (blue dot) or is a noise reduction mechanism (black dots). The latter itself involves several rounds of interaction in a reverse filtration (red arrow) until a stopping criterion based on utility is met (red box). Later queries/mechanisms can depend on past ones.

As a running example, we consider the problem of releasing as many counts as possible subject to the constraint that each noisy count should have no more than a target relative error and that some overall privacy loss budget is preserved. A recent application of differential privacy for releasing Wikipedia usage data (https://www.tmlt.io/resources/publishing-wikipedia-usage-data-with-strong-privacy-guarantees) considered relative accuracy as a top utility metric, demonstrating the importance of returning private counts subject to relative accuracy constraints. There have been other works that consider adding relative noise subject to differential privacy. In particular, iReduct [23] was developed to release a batch of queries subject to a relative error bound. The primary difference between that work and our setting is that we do not want to fix the number of queries in advance, and we want to allow the queries to be selected adaptively. Xiao et al. [23] consider a batch of $m$ queries, initially add a lot of noise to each count, and iteratively check whether the noisy counts are good enough. The counts that should have smaller noise are then identified, and a Laplace-based noise reduction algorithm is used to decrease the noise on the identified set. They continue in this way until either all counts are good enough or a target privacy loss is exhausted. Two scenarios might arise: (1) all counts satisfy the relative error condition and we should have tried more counts because some privacy loss budget remains, or (2) the procedure stopped before some results reached the target relative error, so we should have selected fewer queries. In either case, the number of queries becomes a parameter that the data analyst would need to select in advance, which is difficult to do a priori. In our setting, no such parameter arises. Furthermore, they add up the privacy loss parameters for each count to check whether the total is below a target privacy loss bound at each step (based on the $\ell_1$-general sensitivity); however, we show that adding up the privacy loss parameters can be significantly improved on. Other works have modified the definition of differential privacy to accommodate relative error, e.g. [20].

2 Preliminaries

We start with some basic definitions from differential privacy, beginning with the standard definition from Dwork et al. [5, 4]. We first need to define what we mean by neighboring datasets, which can mean adding or removing one record from a dataset or changing an entry in a dataset. We will leave the neighboring relation arbitrary and write neighboring $x,x'\in\mathcal{X}$ as $x\sim x'$.

Definition 2.1.

An algorithm $A:\mathcal{X}\rightarrow\mathcal{Y}$ is $(\epsilon,\delta)$-differentially private if, for any measurable set $E\subseteq\mathcal{Y}$ and any neighboring inputs $x\sim x'$,

\Pr[A(x)\in E]\leq e^{\epsilon}\Pr[A(x')\in E]+\delta. \qquad (1)

If $\delta=0$, we say $A$ is $\epsilon$-DP or simply pure DP.

The classical pure DP mechanism is the Laplace mechanism, which adds noise drawn from a Laplace distribution to a statistic.

Definition 2.2 (Laplace Mechanism).

Given a statistic $f:\mathcal{X}\to\mathbb{R}^{d}$, the Laplace Mechanism $M:\mathcal{X}\to\mathbb{R}^{d}$ with privacy parameter $\varepsilon$ returns $M(x)=f(x)+\texttt{Lap}(1/\varepsilon)$.
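A minimal sketch of this mechanism (our illustration, assuming independent noise in each coordinate):

```python
import numpy as np

def laplace_mechanism(f_x, eps, rng=np.random.default_rng()):
    """Definition 2.2: release f(x) + Lap(1/eps), independently per
    coordinate; by Lemma 2.1 below this is (Delta_1(f) * eps)-DP."""
    f_x = np.atleast_1d(np.asarray(f_x, dtype=float))
    return f_x + rng.laplace(scale=1.0 / eps, size=f_x.shape)
```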

To guarantee that enough noise is added to a statistic to ensure privacy, it is important to know the statistic's sensitivity, which we define as follows, where the max is over all neighboring datasets $x\sim x'$:

\Delta_{p}(f):=\max_{x,x':x\sim x'}\left\{\|f(x)-f(x')\|_{p}\right\}.
Lemma 2.1 (Dwork et al. [5]).

Given $f:\mathcal{X}\to\mathbb{R}^{d}$, the Laplace mechanism with privacy parameter $\varepsilon>0$ is $\Delta_{1}(f)\cdot\varepsilon$-DP.

A central object in our analysis is the privacy loss of an algorithm, a random variable that depends on the outcome of the algorithm under a particular dataset.

Definition 2.3 (Privacy Loss).

Let $A:\mathcal{X}\rightarrow\mathcal{Y}$ be an algorithm and fix neighbors $x\sim x'$ in $\mathcal{X}$. Let $p^{x}$ and $p^{x'}$ be the respective densities of $A(x)$ and $A(x')$ on the space $\mathcal{Y}$ with respect to some reference measure. Then, the privacy loss between $A(x)$ and $A(x')$ evaluated at a point $y\in\mathcal{Y}$ is:

\mathcal{L}_{A}(y;x,x'):=\log\left(\frac{p^{x}(y)}{p^{x'}(y)}\right).

Further, we refer to the privacy loss random variable as $\mathcal{L}_{A}(x,x'):=\mathcal{L}_{A}(A(x);x,x')$. When the algorithm $A$ and the neighboring datasets are clear from context, we drop them from the privacy loss, i.e. $\mathcal{L}(y)=\mathcal{L}_{A}(y;x,x')$ and $\mathcal{L}=\mathcal{L}_{A}(x,x')$.
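As a worked example of this definition, consider the univariate Laplace mechanism of Definition 2.2 with statistic $f$; by the triangle inequality, its privacy loss is bounded at every outcome $y$, recovering Lemma 2.1:

\mathcal{L}_{M}(y;x,x')=\log\left(\frac{\tfrac{\varepsilon}{2}e^{-\varepsilon|y-f(x)|}}{\tfrac{\varepsilon}{2}e^{-\varepsilon|y-f(x')|}}\right)=\varepsilon\left(|y-f(x')|-|y-f(x)|\right)\leq\varepsilon\,|f(x)-f(x')|\leq\Delta_{1}(f)\cdot\varepsilon.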

2.1 Zero-concentrated DP

Many existing privacy composition analyses leverage a variant of DP called (approximate) zero-concentrated DP (zCDP), introduced by Bun and Steinke [2]. Recall that the Rényi divergence of order $\lambda\geq 1$ between two distributions $P$ and $Q$ on the same domain is written as follows, where $p(\cdot)$ and $q(\cdot)$ are the respective probability mass/density functions:

D_{\lambda}(P\,\|\,Q):=\frac{1}{\lambda-1}\log\left(\mathbb{E}_{y\sim P}\left[\left(\frac{p(y)}{q(y)}\right)^{\lambda-1}\right]\right).

Since we study fully adaptive privacy composition (where privacy parameters can be chosen adaptively), we will use the following conditional extension of approximate zCDP, in which the zCDP parameters of a mechanism $A$ can depend on prior outcomes.

Definition 2.4 (Approximate zCDP [22]).

Suppose $A:\mathcal{X}\times\mathcal{Z}\to\mathcal{Y}$ has outputs in a measurable space $(\mathcal{Y},\mathcal{G})$, and suppose $\delta,\rho:\mathcal{Z}\rightarrow\mathbb{R}_{\geq 0}$. We say the algorithm $A$ satisfies conditional $\delta(z)$-approximate $\rho(z)$-zCDP if, for all $z\in\mathcal{Z}$ and any neighboring datasets $x,x'$, there exist probability transition kernels $P',P'',Q',Q'':\mathcal{Z}\times\mathcal{G}\rightarrow[0,1]$ such that the conditional outputs are distributed according to the following mixture distributions:

\begin{align*}
A(x;z) &\sim (1-\delta(z))P'(\cdot\mid z)+\delta(z)P''(\cdot\mid z) \\
A(x';z) &\sim (1-\delta(z))Q'(\cdot\mid z)+\delta(z)Q''(\cdot\mid z),
\end{align*}

where for all $\lambda\geq 1$ and all $z\in\mathcal{Z}$, $D_{\lambda}(P'(\cdot\mid z)\,\|\,Q'(\cdot\mid z))\leq\rho(z)\lambda$ and $D_{\lambda}(Q'(\cdot\mid z)\,\|\,P'(\cdot\mid z))\leq\rho(z)\lambda$.

The classical mechanism for zCDP is the Gaussian mechanism, similar to how the Laplace mechanism is the typical mechanism for pure DP.

Definition 2.5 (Gaussian Mechanism).

Given a statistic $f:\mathcal{X}\to\mathbb{R}^{d}$, the Gaussian Mechanism $M:\mathcal{X}\to\mathbb{R}^{d}$ with privacy parameter $\rho>0$ returns $M(x)=\texttt{N}\left(f(x),1/\rho\cdot I_{d}\right)$.
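A minimal sketch of this mechanism (our illustration; the parameter $\rho$ sets the per-coordinate noise variance to $1/\rho$):

```python
import numpy as np

def gaussian_mechanism(f_x, rho, rng=np.random.default_rng()):
    """Definition 2.5: release N(f(x), (1/rho) * I_d); by Lemma 2.2
    below this is (Delta_2(f)^2 * rho / 2)-zCDP."""
    f_x = np.atleast_1d(np.asarray(f_x, dtype=float))
    return f_x + rng.normal(scale=np.sqrt(1.0 / rho), size=f_x.shape)
```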

We then have the following result, which ensures privacy that scales with the $\ell_{2}$-sensitivity of the statistic $f$.

Lemma 2.2 (Bun and Steinke [2]).

Given $f:\mathcal{X}\to\mathbb{R}^{d}$, the Gaussian mechanism with privacy parameter $\rho$ is $\Delta_{2}(f)^{2}\cdot\rho/2$-zCDP.

The following results establish the relationship between zCDP and DP and the composition of zCDP.

Lemma 2.3 (Bun and Steinke [2]).

If $M:\mathcal{X}\to\mathcal{Y}$ is $(\varepsilon,\delta)$-DP, then $M$ is also $\delta$-approximate $\varepsilon^{2}/2$-zCDP. If $M$ is $\delta$-approximate $\rho$-zCDP, then $M$ is also $(\rho+2\sqrt{\rho\log(1/\delta'')},\delta+\delta'')$-DP for any $\delta''>0$.
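The zCDP-to-DP conversion above is a simple closed form; a direct transcription (ours):

```python
import math

def zcdp_to_dp(rho, delta, delta2):
    """Lemma 2.3: delta-approximate rho-zCDP implies
    (rho + 2*sqrt(rho*log(1/delta2)), delta + delta2)-DP for delta2 > 0."""
    eps = rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta2))
    return eps, delta + delta2
```

For instance, `zcdp_to_dp(0.5, 0.0, 1e-6)` returns $\varepsilon\approx 5.76$.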

Lemma 2.4 (Bun and Steinke [2]).

If $M_{1}:\mathcal{X}\to\mathcal{Y}$ is $\delta_{1}$-approximate $\rho_{1}$-zCDP and $M_{2}:\mathcal{X}\times\mathcal{Y}\to\mathcal{Y}'$ is $\delta_{2}$-approximate $\rho_{2}$-zCDP in its first coordinate, then $M:\mathcal{X}\to\mathcal{Y}'$, where $M(x)=(M_{1}(x),M_{2}(x,M_{1}(x)))$, is $(\delta_{1}+\delta_{2})$-approximate $(\rho_{1}+\rho_{2})$-zCDP.

2.2 Concurrent composition

We will use concurrent composition for differential privacy, introduced by [18] and further developed in [13, 19], in our analysis, which we define next. We first define an interactive system.

Definition 2.6.

An interactive system is a randomized algorithm $S:(\mathcal{Q}\times\mathcal{Y})^{*}\times\mathcal{Q}\to\mathcal{Y}$ whose input is an interaction history $(q_{1},y_{1}),(q_{2},y_{2}),\cdots,(q_{t},y_{t})\in(\mathcal{Q}\times\mathcal{Y})^{t}$ together with a query $q_{t+1}\in\mathcal{Q}$. The output of $S$ is denoted $y_{t+1}\sim S((q_{i},y_{i})_{i\in[t]},q_{t+1})$.

Note that an interactive system may also have an input dataset $x$ on which the queries are evaluated, which then induces an interactive system. In particular, we will consider two neighboring datasets $x^{(0)}$ and $x^{(1)}$, which induce interactive systems $S^{(b)}_{i}$ corresponding to input data $x^{(b)}$ for $b\in\{0,1\}$ and $i\in[k]$. We will use the concurrent composition definition from [13].

Definition 2.7 (Concurrent Composition).

Suppose $S_{1},\cdots,S_{k}$ are $k$ interactive systems. The concurrent composition of them is an interactive system $\mathrm{COMP}(S_{1},\cdots,S_{k})$ with query domain $[k]\times\mathcal{Q}$ and response domain $\mathcal{Y}$. An adversary is a (possibly randomized) query algorithm $\mathcal{A}:([k]\times\mathcal{Q}\times\mathcal{Y})^{*}\to[k]\times\mathcal{Q}$. The interaction between $\mathcal{A}$ and $\mathrm{COMP}(S_{1},\cdots,S_{k})$ is a stochastic process that runs as follows: $\mathcal{A}$ first computes a pair $(i_{1},q_{1})\in[k]\times\mathcal{Q}$, sends the query $q_{1}$ to $S_{i_{1}}$, and gets the response $y_{1}$. In the $t$-th step, $\mathcal{A}$ computes the next pair $(i_{t},q_{t})$ based on the history, sends the $t$-th query $q_{t}$ to $S_{i_{t}}$, and receives $y_{t}$. There is no communication or interaction between the interactive systems; each system $S_{i}$ sees only its own interaction with $\mathcal{A}$. Let $\mathrm{IT}(\mathcal{A}:S_{1},\cdots,S_{k})$ denote the random variable recording the transcript of the interaction.

We will be interested in how much the distribution of the transcript of the interaction changes for interactive systems with neighboring inputs. We say $\mathrm{COMP}(S_{1},\cdots,S_{k})$ is $(\varepsilon,\delta)$-DP if, for all neighboring datasets $x^{(0)},x^{(1)}$, any outcome set of transcripts $E$, and $b\in\{0,1\}$,

\Pr[\mathrm{IT}(\mathcal{A}:S_{1}^{(b)},\cdots,S_{k}^{(b)})\in E]\leq e^{\varepsilon}\Pr[\mathrm{IT}(\mathcal{A}:S_{1}^{(1-b)},\cdots,S_{k}^{(1-b)})\in E]+\delta.

We then have the following result from [13].

Theorem 1.

Let $A_{1},\cdots,A_{k}$ be $k$ interactive mechanisms that run on the same dataset. Suppose that each mechanism $A_{i}$ satisfies $(\varepsilon_{i},\delta_{i})$-DP. Then $\mathrm{COMP}(\mathcal{A}:A_{1},\cdots,A_{k})$ is $(\varepsilon,\delta)$-DP, where $\varepsilon,\delta$ are given by the optimal (sequential) composition theorem [10, 15].

This result is very powerful because we can consider the privacy of each interactive system separately and still be able to provide a differential privacy guarantee for the more complex setting that allows an analyst to interweave queries to different interactive systems.

3 Privacy Filters

In order to reason about the overall privacy loss of an interaction with a sensitive dataset, we will use the framework of privacy filters, introduced in [17]. Privacy filters allow an analyst to adaptively select privacy parameters as a function of previous outcomes until some stopping time that determines whether a target privacy budget has been exhausted. To denote that privacy parameters may depend on previous outcomes, we will write $\rho_{n}(x)$ for the privacy parameter selected at round $n$, which may depend on the previous outcomes $A_{1:n-1}(x)$, and similarly for $\delta_{n}(x)$. We now state the definition of privacy filters in the context of approximate zCDP mechanisms.

Definition 3.1 (Privacy Filter).

Let $(A_{n}:\mathcal{X}\to\mathcal{Y})_{n\geq 1}$ be an adaptive sequence of algorithms such that, for all $n\geq 1$, $A_{n}(\cdot;y_{1:n-1})$ is $\delta_{n}(y_{1:n-1})$-approximate $\rho_{n}(y_{1:n-1})$-zCDP for all $y_{1:n-1}\in\mathcal{Y}^{n-1}$. Let $\epsilon>0$ and $\delta\geq 0$ be fixed privacy parameters. Then, a function $N:\mathcal{Y}^{\infty}\rightarrow\mathbb{N}$ is an $(\epsilon,\delta)$-privacy filter if

  1. for all $(y_{1},y_{2},\cdots)\in\mathcal{Y}^{\infty}$, $N(y_{1},y_{2},\cdots)$ is a stopping time with respect to the natural filtration generated by $(A_{n}(x))_{n\geq 1}$, and

  2. the algorithm $A_{1:N(\cdot)}(\cdot)$ is $(\epsilon,\delta)$-differentially private, where $N(x)=N(A_{1}(x),A_{2}(x),\cdots)$.

Whitehouse et al. [22] showed that we can use composition bounds from traditional differential privacy (which require setting privacy parameters prior to any interaction) in the more adaptive setting, where privacy parameters can be set before each query.

Theorem 2.

Suppose $(A_{n}:\mathcal{X}\to\mathcal{Y})_{n\geq 1}$ is a sequence of algorithms such that, for any $n\geq 1$, $A_{n}(\cdot;y_{1:n-1})$ is $\delta_{n}(y_{1:n-1})$-approximate $\rho_{n}(y_{1:n-1})$-zCDP for all $y_{1:n-1}$. Let $\epsilon>0$, $\delta\geq 0$, and $\delta''>0$ be fixed privacy parameters. Consider the function $N:\mathbb{R}_{\geq 0}^{\infty}\rightarrow\mathbb{N}$ given by

N(y_{1},y_{2},\cdots):=\inf\left\{n:\epsilon<2\sqrt{\log\left(\frac{1}{\delta''}\right)\sum_{m\leq n+1}\rho_{m}(y_{1:m-1})}+\sum_{m\leq n+1}\rho_{m}(y_{1:m-1})\ \text{ or }\ \delta<\sum_{m\leq n+1}\delta_{m}(y_{1:m-1})\right\}.

Then, the algorithm $A_{1:N(\cdot)}(\cdot):\mathcal{X}\rightarrow\mathcal{Y}^{*}$ is $(\epsilon,\delta+\delta'')$-DP, where $N(x):=N((A_{n}(x))_{n\geq 1})$. In other words, $N$ is an $(\epsilon,\delta+\delta'')$-privacy filter.
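The stopping rule in Theorem 2 is straightforward to enforce online; the following sketch (our illustration) checks, before each new mechanism with adaptively chosen parameters, whether running it would exceed the budget:

```python
import math

class ZCDPFilter:
    """Online check of the stopping rule in Theorem 2: admit adaptively
    chosen (rho_n, delta_n) pairs until the (eps, delta + delta2) budget
    would be exceeded."""
    def __init__(self, eps, delta, delta2):
        self.eps, self.delta, self.delta2 = eps, delta, delta2
        self.rho_sum, self.delta_sum = 0.0, 0.0

    def try_spend(self, rho_n, delta_n):
        """Return True iff the next mechanism may run within budget."""
        rho, dlt = self.rho_sum + rho_n, self.delta_sum + delta_n
        eps_needed = 2.0 * math.sqrt(rho * math.log(1.0 / self.delta2)) + rho
        if eps_needed > self.eps or dlt > self.delta:
            return False  # halt: running it would exhaust the filter
        self.rho_sum, self.delta_sum = rho, dlt
        return True
```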

4 Ex-post Private Mechanisms

Although privacy filters give a data analyst a lot of flexibility in how they interact with a sensitive dataset while still guaranteeing a fixed privacy budget, some algorithms ensure a bound on privacy that is adapted to the dataset. Ex-post private mechanisms provide a probabilistic bound on the privacy loss that can depend on the algorithm's outcome, so some outcomes might reveal more information about an individual than others [12, 21]. Note that ex-post private mechanisms do not have any fixed a priori bound on the privacy loss, so by default they cannot be composed in the same way as differentially private mechanisms.

Definition 4.1.

Let $A:\mathcal{X}\times\mathcal{Y}\rightarrow\mathcal{Z}$ be an algorithm and $\mathcal{E}:\mathcal{Z}\times\mathcal{Y}\rightarrow\mathbb{R}_{\geq 0}$ a function. We say $A(\cdot;y)$ is $(\mathcal{E}(\cdot;y),\delta(y))$-ex-post private if, for any neighboring inputs $x\sim x'$ and all $y\in\mathcal{Y}$, we have $\Pr\left[\mathcal{L}(A(x;y))>\mathcal{E}(A(x;y);y)\right]\leq\delta(y)$.

We next define a single noise reduction mechanism, which interactively applies sub-mechanisms and stops at some time $k$, which can be random. Each iterate uses a privacy parameter from an increasing sequence of privacy parameters $(\varepsilon^{(k)}:k\geq 1)$, and the overall privacy will only depend on the last privacy parameter $\varepsilon^{(k)}$, despite releasing noisy values with parameters $\varepsilon^{(i)}$ for $i\leq k$. Noise reduction algorithms allow us to form ex-post private mechanisms because the privacy loss will only depend on the final outcome. We will write $M:\mathcal{X}\to\mathcal{Y}^{*}$ for any algorithm mapping databases to sequences of outputs in $\mathcal{Y}$, with intermediate mechanisms written as $M^{(k)}:\mathcal{X}\to\mathcal{Y}$ for the $k$-th element of the sequence and $M^{(1:k)}:\mathcal{X}\to\mathcal{Y}^{k}$ for the first $k$ elements. Let $\mu$ be a probability measure on $\mathcal{Y}$, for each $k\geq 1$ let $\mu_{k}$ be a probability measure on $\mathcal{Y}^{k}$, and let $\mu^{*}$ be a probability measure on $\mathcal{Y}^{*}$. We assume that, for every $k$ and every $x$, the law of $M^{(k)}(x)$ on $\mathcal{Y}$ is equivalent to $\mu$, the law of $M(x)$ on $\mathcal{Y}^{*}$ is equivalent to $\mu^{*}$, and the law of $M^{(1:k)}(x)$ on $\mathcal{Y}^{k}$ is equivalent to $\mu_{k}$. Furthermore, we will write $\mathcal{L}$ for the privacy loss of $M$, $\mathcal{L}^{(k)}$ for the privacy loss of $M^{(k)}$, and $\mathcal{L}^{(1:k)}$ for the privacy loss of the sequence of mechanisms $M^{(1:k)}$. We then define noise reduction mechanisms more formally.

Definition 4.2 (Noise Reduction Mechanism).

Let $M:\mathcal{X}\to\mathcal{Y}^{\infty}$ be a mechanism mapping datasets to sequences of outcomes and let $x,x'$ be any neighboring datasets. We say $M$ is a noise reduction mechanism if, for any $k\geq 1$,

\mathcal{L}^{(1:k)}=\mathcal{L}^{(k)}.

We will assume there is a fixed grid of time values $t^{(1)}>t^{(2)}>\cdots>t^{(k)}>0$. We will typically think of the time values as noise scales, inversely proportional to the squared privacy parameters, i.e. $t^{(i)}=\left(1/\varepsilon^{(i)}\right)^{2}$ where $\varepsilon^{(1)}<\varepsilon^{(2)}<\cdots$. An analyst will not have a particular stopping time set in advance and will instead want to stop interacting with the dataset as a function of the noisy answers that have been released. It might also be the case that the analyst wants to stop based on the outcome and the private dataset, but for now we consider stopping times that can depend only on the noisy outcomes and possibly some public information, not the underlying dataset.

Definition 4.3 (Stopping Function).

Let $A:\mathcal{X}\to\mathcal{Y}^{\infty}$ be a noise reduction mechanism. For $x\in\mathcal{X}$, let $(\mathcal{F}^{(k)}(x))_{k\in\mathbb{N}}$ be the filtration given by $\mathcal{F}^{(k)}(x):=\sigma(A^{(i)}(x):i\leq k)$. A function $T:\mathcal{Y}^{\infty}\to\mathbb{N}$ is called a stopping function if, for any $x\in\mathcal{X}$, $T(x):=T(A(x))$ is a stopping time with respect to $(\mathcal{F}^{(k)}(x))_{k\geq 1}$. Note that this property does not depend on the choice of measures $\mu$, $\mu^{*}$, and $\mu_{k}$.

We now recall the noise reduction mechanism with Brownian noise [21].

Definition 4.4 (Brownian Noise Reduction).

Let $f:\mathcal{X}\to\mathbb{R}^{d}$ be a function and $(t^{(k)})_{k\geq 1}$ a sequence of time values. Let $(B^{(t)})_{t\geq 0}$ be a standard $d$-dimensional Brownian motion and $T:(\mathbb{R}^{d})^{*}\to\mathbb{N}$ a stopping function. The Brownian mechanism associated with $f$, time values $(t^{(k)})_{k\geq 1}$, and stopping function $T$ is the algorithm $\texttt{BM}:\mathcal{X}\to((0,t^{(1)}]\times\mathbb{R}^{d})^{*}$ given by

\texttt{BM}(x):=\left(t^{(k)},f(x)+B^{(t^{(k)})}\right)_{k\leq T(x)}.
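To sample BM in practice, the Brownian motion only needs to be realized on the decreasing grid $t^{(1)}>t^{(2)}>\cdots$: draw $B^{(t^{(1)})}$ first, then interpolate backward with Brownian bridges. A univariate sketch (ours; the stopping function is supplied by the caller):

```python
import numpy as np

def brownian_noise_reduction(f_x, times, stop_fn, rng=np.random.default_rng()):
    """Release f(x) + B(t) along strictly decreasing times t^(1) > t^(2) > ...,
    stopping the first time stop_fn accepts; sampled via Brownian bridges."""
    outputs, b, t_prev = [], 0.0, None
    for t in times:
        if t_prev is None:
            b = rng.normal(0.0, np.sqrt(t))  # B(t^(1)) ~ N(0, t^(1))
        else:
            # Brownian bridge: B(t) | B(t_prev) ~ N((t/t_prev) * B(t_prev),
            #                                       t * (t_prev - t) / t_prev)
            mean = b * t / t_prev
            var = t * (t_prev - t) / t_prev
            b = rng.normal(mean, np.sqrt(var))
        outputs.append((t, f_x + b))
        if stop_fn(outputs):  # stopping function sees only the noisy prefix
            break
        t_prev = t
    return outputs
```

For instance, a relative-error stopping function might accept once $\sqrt{t}\leq\alpha\,|f(x)+B^{(t)}|$, i.e. once the current noise scale is small relative to the current noisy answer.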

We then have the following result.

Lemma 4.1 (Whitehouse et al. [21]).

The Brownian noise reduction mechanism BM is a noise reduction algorithm for any constant stopping function $T(x)=k$. Furthermore, for any stopping function $T:\mathcal{Y}^{*}\to\mathbb{N}$, the noise reduction property still holds, i.e.

\mathcal{L}^{(1:T(x))}=\mathcal{L}^{(T(x))}.

Another noise reduction mechanism uses Laplace noise [11, 12], which we consider in the appendix.

5 General Composition of Ex-post Private Mechanisms

We start with a simple result stating that ex-post private mechanisms compose by simply adding up the ex-post privacy functions. We will write $\delta_{n}(x):=\delta_{n}(A_{1}(x),\cdots,A_{n-1}(x))$ and $\rho_{n}(x):=\rho_{n}(A_{1}(x),\cdots,A_{n-1}(x))$. Further, we will denote $\mathcal{E}_{n}(\cdot\,;x)$ as the privacy loss bound for algorithm $A_{n}$ conditioned on the outcomes of $A_{1:n-1}(x)$.

Lemma 5.1.

Fix a sequence $\delta_{1},\cdots,\delta_{m}\geq 0$. Let there be a probability measure $\mu_{n}$ on $\mathcal{Y}_{n}$ for each $n$, and equip $\mathcal{Y}_{1}\times\cdots\times\mathcal{Y}_{m}$ with the product measure. Consider mechanisms $A_{n}:\mathcal{X}\times\prod_{i=1}^{n-1}\mathcal{Y}_{i}\to\mathcal{Y}_{n}$ for $n\in[m]$, where each $A_{n}(\cdot;y_{1:n-1})$ is $(\mathcal{E}_{n}(\cdot;y_{1:n-1}),\delta_{n})$-ex-post private for all prior outcomes $y_{1:n-1}$. Then the overall mechanism $A_{1:m}(\cdot)$ is $(\sum_{i=1}^{m}\mathcal{E}_{i}(A_{i}(\cdot);\cdot),\sum_{i=1}^{m}\delta_{i})$-ex-post private with respect to the product measure.

Proof.

We consider neighboring inputs $x,x'$ and write the privacy loss random variable $\mathcal{L}_{A}(A(x))$ for $A$ in terms of the privacy losses $\mathcal{L}_{i}(A_{i}(x))$ of each $A_{i}$ for $i\in[m]$:

\Pr\left[\mathcal{L}_{A}(A(x))<\sum_{i=1}^{m}\mathcal{E}_{i}(A_{i}(x);x)\right]=\Pr\left[\sum_{i=1}^{m}\mathcal{L}_{i}(A_{i}(x))<\sum_{i=1}^{m}\mathcal{E}_{i}(A_{i}(x);x)\right]

Because each mechanism $A_{i}$ is $(\mathcal{E}_{i}(\cdot;x),\delta_{i})$-ex-post private, we have $\Pr\left[\mathcal{L}_{i}(A_{i}(x))\geq\mathcal{E}_{i}(A_{i}(x);x)\right]\leq\delta_{i}$ and hence

\begin{align*}
\Pr\left[\sum_{i=1}^{m}\mathcal{L}_{i}(A_{i}(x))\leq\sum_{i=1}^{m}\mathcal{E}_{i}(A_{i}(x);x)\right] &\geq \Pr\left[\bigcap_{i\in[m]}\left\{\mathcal{L}_{i}(A_{i}(x))\leq\mathcal{E}_{i}(A_{i}(x);x)\right\}\right] \\
&= 1-\Pr\left[\bigcup_{i\in[m]}\left\{\mathcal{L}_{i}(A_{i}(x))>\mathcal{E}_{i}(A_{i}(x);x)\right\}\right] \\
&\geq 1-\sum_{i\in[m]}\Pr\left[\mathcal{L}_{i}(A_{i}(x))>\mathcal{E}_{i}(A_{i}(x);x)\right] \\
&\geq 1-\sum_{i\in[m]}\delta_{i}.
\end{align*}

Hence, we have $\Pr\left[\mathcal{L}_{A}(A(x))\geq\sum_{i=1}^{m}\mathcal{E}_{i}(A_{i}(x);x)\right]\leq\sum_{i\in[m]}\delta_{i}$, as desired. ∎

For general ex-post private mechanisms, this basic composition cannot be improved: we can simply pick $\mathcal{E}_{i}$ to equal the privacy loss $\mathcal{L}_{i}$ at each round, with independently selected mechanisms $A_{i}$ at each round $i\in[m]$. We now show how to obtain a privacy filter from a sequence of ex-post private mechanisms, as long as the ex-post private mechanism selected at each round cannot exceed the remaining privacy budget. We will write $\delta_{n}(x)$ to denote the parameter $\delta_{n}$ selected as a function of the prior outcomes $A_{1}(x),\cdots,A_{n-1}(x)$.

Lemma 5.2.

Let $\varepsilon>0$ and $\delta\geq 0$ be fixed privacy parameters. Let $(A_{n}:\mathcal{X}\to\mathcal{Y})_{n\geq 1}$ be a sequence of mechanisms that are $(\mathcal{E}_{n}(\cdot;x),\delta_{n}(x))$-ex-post private conditioned on prior outcomes $y_{1:n-1}=A_{1:n-1}(x)$, where for all $y_{i}$ we have

\sum_{i=1}^{n-1}\mathcal{E}_{i}(y_{i};x)\leq\varepsilon.

Consider the function N:𝒴N:\mathcal{Y}^{\infty}\to\mathbb{N} where

N((y_{n})_{n\geq 1})=\inf\left\{n:\varepsilon=\sum_{i\in[n]}\mathcal{E}_{i}(y_{i};x)\ \text{ or }\ \delta<\sum_{i\in[n+1]}\delta_{i}(y_{1:i-1})\right\}.

Then the algorithm $A_{1:N(\cdot)}(\cdot)$ is $(\varepsilon,\delta)$-DP, where

N(x)=N((A_{n}(x))_{n\geq 1}).
Proof.

We follow a similar analysis to [22], where a filter was created for probabilistic DP mechanisms, i.e. mechanisms whose privacy loss can be bounded with high probability. We will write the corresponding privacy loss random variable of $A_{n}$ as $\mathcal{L}_{n}$, and for the full sequence of algorithms $A_{1:n}=(A_{1},\cdots,A_{n})$, the privacy loss is denoted $\mathcal{L}_{1:n}$.

Define the events

D:=\left\{\exists n\leq N(x):\mathcal{L}_{1:n}>\varepsilon\right\},\ \text{ and }\ E:=\left\{\exists n\leq N(x):\mathcal{L}_{n}>\mathcal{E}_{n}(A_{n}(x);x)\right\}.

Using the law of total probability, we have that

\Pr[D]=\Pr[D\cap E^{c}]+\Pr[D\cap E]\leq\Pr[D\mid E^{c}]+\Pr[E]=\Pr[E],

where the last equality follows from how we defined the stopping function $N(x)$: on the event $E^{c}$, the accumulated privacy loss satisfies $\mathcal{L}_{1:n}\leq\sum_{i\leq n}\mathcal{E}_{i}(A_{i}(x);x)\leq\varepsilon$ for all $n\leq N(x)$, so $\Pr[D\mid E^{c}]=0$. Now, we show that $\Pr[E]\leq\delta$. Define the modified privacy loss random variables $(\widetilde{\mathcal{L}}_{n})_{n\in\mathbb{N}}$ by

\widetilde{\mathcal{L}}_{n}:=\begin{cases}\mathcal{L}_{n}& n\leq N(x)\\ 0&\text{otherwise}.\end{cases}

Likewise, define the modified privacy parameter random variables $\widetilde{\mathcal{E}}_{n}(\cdot;x)$ and $\widetilde{\delta}_{n}(x)$ in an identical manner. Then, we can bound $\Pr[E]$ in the following manner:

\begin{align*}
\Pr[\exists n\leq N(x):\mathcal{L}_{n}>\mathcal{E}_{n}(A_{n}(x);x)] &= \Pr\left[\exists n\in\mathbb{N}:\widetilde{\mathcal{L}}_{n}>\widetilde{\mathcal{E}}_{n}(A_{n}(x);x)\right] \\
&\leq \sum_{n=1}^{\infty}\Pr\left[\widetilde{\mathcal{L}}_{n}>\widetilde{\mathcal{E}}_{n}(A_{n}(x);x)\right] \\
&= \sum_{n=1}^{\infty}\mathbb{E}\left[\Pr\left[\widetilde{\mathcal{L}}_{n}>\widetilde{\mathcal{E}}_{n}(A_{n}(x);x)\mid\mathcal{F}_{n-1}\right]\right] \\
&\leq \sum_{n=1}^{\infty}\mathbb{E}\left[\widetilde{\delta}_{n}(x)\right]=\mathbb{E}\left[\sum_{n=1}^{\infty}\widetilde{\delta}_{n}(x)\right]=\mathbb{E}\left[\sum_{n\leq N(x)}\delta_{n}(x)\right]\leq\delta. \qquad ∎
\end{align*}
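The resulting filter is simple to operate; a sketch (ours): the analyst may only select a next mechanism whose ex-post privacy function cannot push the accumulated bound past $\varepsilon$, and the interaction halts once the $\varepsilon$- or $\delta$-budget is exhausted.

```python
class ExPostFilter:
    """Illustrative sketch of the ex-post privacy filter of Lemma 5.2."""
    def __init__(self, eps, delta):
        self.eps, self.delta = eps, delta
        self.eps_spent, self.delta_spent = 0.0, 0.0

    def can_run(self, eps_max_n, delta_n):
        """eps_max_n: largest value E_n can take over possible outcomes."""
        return (self.eps_spent + eps_max_n <= self.eps
                and self.delta_spent + delta_n <= self.delta)

    def record(self, eps_realized_n, delta_n):
        """Charge the realized ex-post bound E_n(y_n) after the release."""
        self.eps_spent += eps_realized_n
        self.delta_spent += delta_n
        return self.eps_spent < self.eps  # False once budget is exhausted
```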

Remark 1.

With ex-post privacy, we are not trying to ensure differential privacy of each intermediate outcome. Recall that DP is closed under post-processing, so any post-processing of a DP outcome is also DP with the same parameters. Our privacy analysis depends on receiving actual outcomes from an ex-post private mechanism, rather than post-processed values of them. However, the full transcript of ex-post private mechanisms will ensure DP, due to the set privacy budget.

We now consider combining zCDP mechanisms with mechanisms that satisfy ex-post privacy. We consider a sequence of mechanisms $(A_{n}:\mathcal{X}\to\mathcal{Y})_{n\geq 1}$ where each mechanism may depend on the previous outcomes. At each round, an analyst will use either an ex-post private mechanism or an approximate zCDP mechanism; in either case, the privacy parameters may depend on the previous results as well.

Definition 5.1 (Approximate zCDP and Ex-post Private Sequence).

Consider a sequence of mechanisms $(A_{n})_{n\geq 1}$, where $A_{n}:\mathcal{X}\to\mathcal{Y}$. The sequence $(A_{n})_{n\geq 1}$ is called a sequence of approximate zCDP and ex-post private mechanisms if, for each round $n$, the analyst either selects $A_{n}(\cdot)$ to be $\delta_{n}(\cdot)$-approximate $\rho_{n}(\cdot)$-zCDP given previous outcomes $A_{i}(\cdot)$ for $i<n$, or selects $A_{n}(\cdot)$ to be $(\mathcal{E}_{n}(A_{n}(\cdot);\cdot),\delta'_{n}(\cdot))$-ex-post private conditioned on $A_{i}(\cdot)$ for $i<n$. In rounds where zCDP is selected, we will simply write $\mathcal{E}_{i}(A_{i}(\cdot);\cdot)\equiv 0$, while in rounds where an ex-post private mechanism is selected, we will set $\rho_{i}(\cdot)=0$.

We now state a composition result that allows an analyst to combine ex-post private and zCDP mechanisms adaptively, while still ensuring a target level of privacy. Because we have two different interactive systems that are differentially private, one that uses only zCDP mechanisms and the other that uses only ex-post private mechanisms, we can use concurrent composition to allow the interleaved interaction of the sequence described in Definition 5.1.

Theorem 3.

Let $\varepsilon,\varepsilon',\delta,\delta',\delta''>0$. Let $(A_{n})_{n\geq 1}$ be a sequence of approximate zCDP and ex-post private mechanisms. As in Lemma 5.2, we require that the ex-post private mechanism selected at any round $n$ has an ex-post privacy function $\mathcal{E}_{n}$ that does not exceed the remaining budget from $\varepsilon'$. Consider the function $N:\mathcal{Y}^{\infty}\to\mathbb{N}$ defined as follows for any sequence of outcomes $(y_{n})_{n\geq 1}$:

N((y_{n})_{n\geq 1})=\min\left\{N_{\mathrm{zCDP}}((y_{n})_{n\geq 1}),N_{\mathrm{post}}((y_{n})_{n\geq 1})\right\},

where $N_{\mathrm{zCDP}}((y_{n})_{n\geq 1})$ is the stopping rule given in Theorem 2 with privacy parameters $\varepsilon,\delta,\delta''$ and $N_{\mathrm{post}}((y_{n})_{n\geq 1})$ is the stopping rule given in Lemma 5.2 with privacy parameters $\varepsilon',\delta'$. Then, the algorithm $A_{1:N(\cdot)}(\cdot)$ is $(\varepsilon+\varepsilon',\delta+\delta'+\delta'')$-DP, where

N(x)=N\left((A_{n}(x))_{n\geq 1}\right).
Proof.

We separate out the approximate zCDP mechanisms $(A_{n}^{\mathrm{zCDP}})_{n\geq 1}$ from the ex-post private mechanisms $(A_{n}^{\mathrm{post}})_{n\geq 1}$ to form two separate interactive systems. In this case, the parameters that are selected can only depend on the outcomes from the respective interactive system, e.g. $\rho_{n}(x)$ can only depend on prior outcomes of the mechanisms $A_{i}^{\mathrm{zCDP}}(x)$ for $i<n$. From Theorem 2, we know that $A_{1:N_{\mathrm{zCDP}}(\cdot)}^{\mathrm{zCDP}}(\cdot)$ is $(\varepsilon,\delta+\delta'')$-DP. We denote $\mathcal{L}_{n}^{\mathrm{post}}$ the privacy loss random variable for the ex-post private mechanism at round $n$, and write the stopping time for the ex-post private mechanisms as $N_{\mathrm{post}}(x)$. From Lemma 5.2, we know that $A_{1:N_{\mathrm{post}}(\cdot)}^{\mathrm{post}}(\cdot)$ is $(\varepsilon',\delta')$-DP.

From Theorem 1, we know that the concurrent composition, which allows both $A_{1:N_{\mathrm{post}}(\cdot)}^{\mathrm{post}}(\cdot)$ and $A_{1:N_{\mathrm{zCDP}}(\cdot)}^{\mathrm{zCDP}}(\cdot)$ to be interleaved arbitrarily, will still be $(\varepsilon+\varepsilon',\delta+\delta'+\delta'')$-DP. ∎

Although we are able to leverage advanced composition bounds from traditional differential privacy for the mechanisms that are approximate zCDP, we are simply adding up the ex-post privacy guarantees, which seems wasteful. Next, we consider how we can improve on this composition bound for certain ex-post private mechanisms.

6 Brownian Noise Reduction Mechanisms

We now consider composing a specific type of ex-post private mechanism, the Brownian noise reduction mechanism. From Theorem 3.4 in [21], we can decompose the privacy loss as an uncentered Brownian motion, even when the stopping time is adaptively selected.

Theorem 4 (Whitehouse et al. [21]).

Let BM be the Brownian noise reduction mechanism associated with time values $(t^{(k)})_{k\geq 1}$ and a function $f$. All reference measures generated by the mechanism are those generated by the Brownian motion without shift (starting at $f(x)=0$). For neighbors $x\sim x'$ and stopping function $T$, the privacy loss between $\texttt{BM}^{(1:T(x))}(x)$ and $\texttt{BM}^{(1:T(x'))}(x')$ is given by

\mathcal{L}^{(1:T(x))}_{\texttt{BM}}(x,x')=\frac{\|f(x')-f(x)\|_{2}^{2}}{2t^{(T(x))}}+\frac{\|f(x')-f(x)\|_{2}}{t^{(T(x))}}\left\langle\frac{f(x')-f(x)}{\|f(x')-f(x)\|_{2}},B^{(t^{(T(x))})}\right\rangle.

Suppose $f$ has $\ell_{2}$-sensitivity at most $\Delta_{2}(f)$. Then, letting $a_{+}:=\max(0,a)$, we have the following, where $(W^{(t)})_{t\geq 0}$ is a standard, univariate Brownian motion:

\mathcal{L}^{(1:T(x))}_{\texttt{BM}}(x,x')\leq\frac{\Delta_{2}(f)^{2}}{2t^{(T(x))}}+\frac{\Delta_{2}(f)}{t^{(T(x))}}\left(W^{(t^{(T(x))})}\right)_{+}.

This decomposition of the privacy loss will be very useful in analyzing the overall privacy loss of a combination of Brownian noise reduction mechanisms.

6.1 Backward Brownian Motion Martingale

We now present a key result for our analysis of composing Brownian noise reduction mechanisms. Although the ex-post privacy proof of the Brownian mechanism in [21] applied Ville's inequality [for a proof, cf. 8, Lemma 1] to the (unscaled) standard Brownian motion $(B^{(t)})_{t>0}$, it turns out that the scaled standard Brownian motion $(B^{(t)}/t)$ forms a backward martingale [cf. 16, Exercise 2.16], and this fact is crucial to our analysis.

Lemma 6.1 (Backward martingale).

Let $(B^{(t)})$ be a standard Brownian motion. Define the reverse filtration $\mathcal{G}^{(t)}=\sigma(B^{(u)};u\geq t)$, meaning that $\mathcal{G}^{(s)}\supset\mathcal{G}^{(t)}$ if $s<t$. For every real $\lambda$, the process

M^{(t)}:=\exp(\lambda B^{(t)}/t-\lambda^{2}/(2t));\quad t>0

is a nonnegative (reverse) martingale with respect to the filtration $\mathcal{G}=(\mathcal{G}^{(t)})$. Further, at any $t>0$, $\mathbb{E}[M^{(t)}]=1$, $M^{(\infty)}=1$ almost surely, and $\mathbb{E}[M^{(\tau)}]\leq 1$ for any stopping time $\tau$ with respect to $\mathcal{G}$ (equality holds under some restrictions). In short, $M^{(\tau)}$ is an "e-value" for any stopping time $\tau$; an e-value is a nonnegative random variable with expectation at most one.
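As a quick sanity check of this property (ours, not part of the analysis), note that $B^{(t)}\sim\texttt{N}(0,t)$ at any fixed $t$, so $\mathbb{E}[M^{(t)}]=1$ can be verified by Monte Carlo:

```python
import numpy as np

# At fixed t, B(t) ~ N(0, t), so E[exp(lam*B(t)/t - lam^2/(2t))]
#   = exp(lam^2/(2t)) * exp(-lam^2/(2t)) = 1, matching Lemma 6.1.
rng = np.random.default_rng(0)
lam, n_samples = 1.5, 1_000_000
for t in [0.5, 1.0, 4.0]:
    b_t = rng.normal(0.0, np.sqrt(t), size=n_samples)  # B(t) ~ N(0, t)
    m_t = np.exp(lam * b_t / t - lam**2 / (2 * t))
    print(f"t={t}: empirical E[M^(t)] = {m_t.mean():.4f}")  # ~1.0
```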

Let $B_{1}=(B_{1}^{(t)})_{t\geq 0},B_{2}=(B_{2}^{(t)})_{t\geq 0},\dots$ be independent, standard Brownian motions, with corresponding backward martingales $M_{1}=(M_{1}^{(t)})_{t\geq 0},M_{2}=(M_{2}^{(t)})_{t\geq 0},\dots$ and (internal to each Brownian motion) filtrations $\mathcal{G}_{1}=(\mathcal{G}_{1}^{(t)})_{t\geq 0},\mathcal{G}_{2}=(\mathcal{G}_{2}^{(t)})_{t\geq 0},\dots$ as defined in the previous lemma. Select time values $(t_{1}^{(k)})_{k\geq 1}$. Let a Brownian noise reduction mechanism $\texttt{BM}_{1}$ be run using $B_{1}$ and stopped at $\tau_{1}$. Then $\mathbb{E}[M_{1}^{(\tau_{1})}]\leq 1$ as per the previous lemma. Based on the outputs of $\texttt{BM}_{1}$, we choose time values $(t_{2}^{(k)})_{k\geq 1}\in\sigma(B_{1}^{(t)};t\geq\tau_{1}):=\mathcal{F}_{1}$. Now run the second Brownian noise reduction $\texttt{BM}_{2}$ using $B_{2}$, stopping at time $\tau_{2}$. Since $B_{1},B_{2}$ are independent, we still have $\mathbb{E}[M^{(\tau_{2})}_{2}\mid\mathcal{F}_{1}]\leq 1$. Let $\mathcal{F}_{2}:=\sigma((B^{(t)}_{1})_{t\geq\tau_{1}},(B^{(t)}_{2})_{t\geq\tau_{2}})$ be the updated filtration, based on which we choose time values $(t_{3}^{(k)})_{k\geq 1}$. Because $B_{3}$ is independent of the earlier two, at the next step we still have $\mathbb{E}[M^{(\tau_{3})}_{3}\mid\mathcal{F}_{2}]\leq 1$. Proceeding in this fashion, it is clear that the product of the stopped e-values $E_{m}$, where

E_{m}:=\prod_{s=1}^{m}M^{(\tau_{s})}_{s}=\exp\left(\lambda\sum_{s=1}^{m}\frac{B^{(\tau_{s})}_{s}}{\tau_{s}}-\frac{\lambda^{2}}{2}\sum_{s=1}^{m}\frac{1}{\tau_{s}}\right) \qquad (2)

is itself a (forward) nonnegative supermartingale with respect to the filtration $\mathcal{F}=(\mathcal{F}_{n})_{n\geq 1}$, with initial value $E_{0}:=1$. Applying the Gaussian mixture method [cf. 9, Proposition 5], we get that for any $\gamma,\delta>0$ and with $\psi(t;\gamma,\delta):=\sqrt{(t+\gamma)\log\left(\frac{t+\gamma}{\delta^{2}\gamma}\right)}$,

\Pr\left[\sup_{m\geq 1}\left\{\left|\sum_{s=1}^{m}\frac{B^{(\tau_{s})}_{s}}{\tau_{s}}\right|\geq\sum_{s\in[m]}\frac{1}{2\tau_{s}}+\psi\left(\sum_{s\in[m]}1/\tau_{s};\gamma,\delta\right)\right\}\right]\leq\delta.

This then provides an alternate way to prove Theorem 3.6 in Whitehouse et al. [21].
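The boundary $\psi$ used below is a simple closed form; a direct transcription (ours):

```python
import math

def psi(t, gamma, delta):
    """Gaussian-mixture boundary psi(t; gamma, delta)
       = sqrt((t + gamma) * log((t + gamma) / (delta^2 * gamma)))."""
    return math.sqrt((t + gamma) * math.log((t + gamma) / (delta**2 * gamma)))
```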

Theorem 5.

Let $(T_{i})_{i\geq 1}$ be a sequence of stopping functions, as in Definition 4.3, and let $(t_{i}^{(j)}:j\in[k_{i}])_{i\geq 1}$ be a sequence of time values. Let $\texttt{BM}_{i}$ denote a Brownian noise reduction with statistic $f_{i}$, which can be adaptively selected based on the outcomes of previous Brownian noise reductions, where each $f_{i}$ has $\ell_{2}$-sensitivity 1. We then have, for any $\gamma,\delta>0$,

\sup_{x\sim x'}\Pr\left[\sum_{i=1}^{\infty}\mathcal{L}_{\texttt{BM}_{i}}^{(T_{i}(x))}\geq\psi\left(\sum_{i=1}^{\infty}1/t_{i}^{(T_{i}(x))};\gamma,\delta\right)\right]\leq\delta.

In other words, $\left(\texttt{BM}_{i}^{(1:T_{i}(\cdot))}\right)_{i\geq 1}$ is $\left(\psi\left(\sum_{i=1}^{\infty}1/t_{i}^{(T_{i}(\cdot))};\gamma,\delta\right),\delta\right)$-ex-post private.

Remark 2.

We note here that the stopping time of each Brownian noise reduction is with respect to $\mathcal{H}=(\mathcal{H}^{(t)})$, where $\mathcal{H}^{(t)}=\sigma(f(x)+B^{(u)};u>t)$. From the point of view of the analyst, $f(x)$ is random (being fixed, but unknown, is the same as $f(x)$ being a realization from some random mechanism). In fact, $\mathcal{H}^{(t)}$ extends $\mathcal{G}^{(t)}$; at $t=0$ it would reveal $f(x)$, but the analyst only has access to $t>0$, for which she pays.

Remark 3.

For the multivariate case we have $B^{(u)}=\left(B^{(u)}[i]:i\in[d]\right)$, where each coordinate is an independent Brownian motion, and we will write $\mathcal{G}[i]$ for the natural reverse filtration corresponding to the Brownian motion of coordinate $i$. We then define $\mathcal{G}^{(t)}=\sigma\left(B^{(u)}[1],\cdots,B^{(u)}[d]:u\geq t\right)$. From Lemma 6.1, we have the following for all $\lambda\in\mathbb{R}$, $i\in[d]$, and $0<s<t$:

\mathbb{E}[\exp(\lambda B^{(t)}[i]/t-\lambda^{2}/(2t))\mid\mathcal{G}^{(s)}[i]]\leq 1.

We then consider the full $d$-dimensional Brownian motion, so that with a unit vector $v=(v[1],\cdots,v[d])\in\mathbb{R}^{d}$ we have

\begin{align*}
\mathbb{E}\left[\exp\left(\lambda\sum_{i=1}^{d}v[i]\cdot B^{(t)}[i]/t-\frac{\lambda^{2}}{2t}\right)\;\middle|\;\mathcal{G}^{(s)}\right] &= \mathbb{E}\left[\prod_{i=1}^{d}\exp\left(\lambda v[i]\cdot B^{(t)}[i]/t-\frac{\lambda^{2}v[i]^{2}}{2t}\right)\;\middle|\;\mathcal{G}^{(s)}\right] \\
&= \prod_{i=1}^{d}\mathbb{E}\left[\exp\left(\lambda v[i]\cdot B^{(t)}[i]/t-\frac{\lambda^{2}v[i]^{2}}{2t}\right)\;\middle|\;\mathcal{G}^{(s)}[i]\right] \\
&\leq 1,
\end{align*}

using $\sum_{i=1}^{d}v[i]^{2}=1$ and the independence of the coordinates.

6.2 Privacy Filters with Brownian Noise Reduction Mechanisms

Given Lemma 6.1 and the decomposition of the privacy loss for the Brownian mechanism in Theorem 4, we can obtain tighter composition bounds for multiple Brownian noise reduction mechanisms than by resorting to the general ex-post privacy composition in Lemma 5.2. It will be important to only use time values with the Brownian noise reduction mechanisms that cannot exceed the remaining privacy loss budget, similar to how in Lemma 5.2 we did not want to select an ex-post private mechanism whose privacy loss could exceed the remaining privacy budget. We then impose the following condition on the time values $(t_{n}^{(j)})_{j=1}^{k_{n}}$ used for each Brownian noise reduction, given prior outcomes $y_{1},\cdots,y_{n-1}$ from the earlier Brownian noise reductions with time values $(t_{i}^{(j)})_{j=1}^{k_{i}}$ and stopping functions $T_{i}$ for $i<n$, and overall budget $\rho>0$:

\frac{1}{2t_{n}^{(k_{n})}}\leq\rho-\sum_{i<n}\frac{1}{2t_{i}^{(T_{i}(y_{1:i}))}}. \qquad (3)
Lemma 6.2.

Let $\rho>0$ and consider a sequence $(\texttt{BM}_{n})_{n\geq 1}$, each with statistic $f_{n}:\mathcal{X}\to\mathbb{R}^{d_{n}}$ of $\ell_{2}$-sensitivity $1$, stopping function $T_{n}$, and time values $(t_{n}^{(j)})_{j=1}^{k_{n}}$, which can be adaptively selected and satisfy (3). Consider the function $N:\mathcal{Y}^{\infty}\to\mathbb{N}\cup\{\infty\}$, where $\mathcal{Y}$ contains all possible outcome streams from $(\texttt{BM}_{n})_{n\geq 1}$, defined as follows for any sequence of outcomes $(y_{n})_{n\geq 1}$:

N((y_{n})_{n\geq 1})=\inf\left\{n:\rho=\sum_{i\in[n]}\frac{1}{2t_{i}^{(T_{i}(y_{1:i}))}}\right\}.

Then, for $\delta>0$, $\texttt{BM}_{1:N(\cdot)}(\cdot)$ is $(\rho+2\sqrt{\rho\log(1/\delta)},\delta)$-DP, where $N(x)=N((\texttt{BM}_{n}(x))_{n\geq 1})$.

Proof.

We use the decomposition of the privacy loss $\mathcal{L}_{n}^{(1:k)}$ at round $n\geq 1$ for the Brownian noise reduction in Theorem 4, stopped at time value $t_{n}^{(k)}$, to get the following with the natural filtration $(\mathcal{F}_{n}(x))_{n\geq 1}$ generated by $(A_{n}(x))_{n\geq 1}$. Recall that, for the Brownian motion $(B_{n}^{(t)})_{t>0}$ used in the $n$-th Brownian noise reduction,

\mathcal{L}_{n}^{(1:k)}=\mathcal{L}_{n}^{(k)}=\frac{1}{2t_{n}^{(k)}}+\frac{B_{n}^{(t_{n}^{(k)})}}{t_{n}^{(k)}}.

From Lemma 6.1 we have, for all $\lambda\in\mathbb{R}$ and $n\geq 1$ with time value $t_{n}^{(k)}$,

\mathbb{E}\left[\exp\left(\lambda X^{(k)}_{n}-\frac{\lambda^{2}}{2t_{n}^{(k)}}\right)\mid\mathcal{F}_{n-1}(x)\right]\leq 1,\qquad\text{ where }\qquad X_{n}^{(k)}=\mathcal{L}_{n}^{(k)}-\frac{1}{2t_{n}^{(k)}}.

We then form the following process,

M^{(k_{1},\cdots,k_{n})}_{n}=\exp\left(\lambda\sum_{i=1}^{n}X^{(k_{i})}_{i}-\sum_{i=1}^{n}\frac{\lambda^{2}}{2t_{i}^{(k_{i})}}\right).

Hence, with fixed time values $(t_{n}^{(k_{n})})_{n\geq 1}$, we have for all $\lambda$

\mathbb{E}\left[M^{(k_{1},\cdots,k_{n})}_{n}\mid\mathcal{F}_{n-1}(x)\right]\leq 1.

We then replace $(k_{i})_{i\leq n}$ with adaptive stopping functions $\left(T_{i}\right)_{i\leq n}$, rather than fixing them in advance, in which case we rename $M^{(k_{1},\cdots,k_{n})}_{n}$ as $M_{n}^{\text{BM}}$. We know from Lemma 6.1 that $M^{\text{BM}}_{n}$ is still an e-value for any $n$. We then apply the optional stopping theorem to conclude that, with the stopping time $N(x)$, $\mathbb{E}\left[M_{N(x)}^{\text{BM}}\right]\leq 1$. By the definition of our stopping time, $\sum_{i=1}^{N(x)}\frac{1}{2t_{i}^{(T_{i}(x))}}=\rho$, so we have for all $\lambda$

\mathbb{E}\left[\exp\left((\lambda-1)\sum_{i=1}^{N(x)}\mathcal{L}_{i}^{(T_{i}(x))}\right)\right]\leq e^{\lambda(\lambda-1)\rho}.

We then set $\lambda=\frac{2\rho+2\sqrt{\rho\log(1/\delta)}}{2\rho}$ to get

\Pr\left[\sum_{i=1}^{N(x)}\mathcal{L}_{i}^{(T_{i}(x))}\geq\rho+2\sqrt{\rho\log(1/\delta)}\right]\leq\delta.

We then have a high probability bound on the overall privacy loss, which then implies differential privacy. ∎

One approach to defining a privacy filter for both approximate zCDP and Brownian noise reduction mechanisms would be to use concurrent composition, as we did in Theorem 3. However, this would require us to set separate privacy budgets for the approximate zCDP mechanisms and the Brownian noise reductions, which is an extra (nuisance) parameter to set.

We now show how we can combine Brownian noise reduction and approximate zCDP mechanisms with a single privacy budget. We will need a condition on the time values selected at each round for the Brownian noise reduction mechanisms, similar to (3). Note that at each round either an approximate zCDP or a Brownian noise reduction mechanism will be selected. Given prior outcomes $y_{1},\cdots,y_{n-1}$ and previously selected zCDP parameters $\rho_{1},\rho_{2}(y_{1}),\cdots,\rho_{n}(y_{1:n-1})$ (noting that at a round $n$ where BM is selected we have $\rho_{n}(y_{1:n-1})=0$, while if a zCDP mechanism is selected we simply set $k_{n}=1$ and $\frac{1}{t_{n}^{(1)}}=0$), we have the following condition on $\rho_{n}(y_{1:n-1})$ and the time values $(t_{n}^{(j)})_{j=1}^{k_{n}}$ if we select a BM at round $n$, with overall budget $\rho>0$:

0\leq\frac{1}{2t_{n}^{(k_{n})}}+\rho_{n}(y_{1:n-1})\leq\rho-\sum_{i<n}\left(\rho_{i}(y_{1:i-1})+\frac{1}{2t_{i}^{(T_{i}(y_{1:i}))}}\right). \qquad (4)
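Before stating the guarantee, here is a sketch (our illustration) of a filter enforcing condition (4): zCDP rounds are charged $\rho_{n}$ and Brownian noise reduction rounds are charged $1/(2t)$ at their stopped time value, all against the single budget $\rho$:

```python
import math

class UnifiedFilter:
    """Single-budget filter mixing zCDP and Brownian noise reduction
    mechanisms (illustrative sketch of condition (4) / Theorem 6)."""
    def __init__(self, rho, delta, delta2):
        self.rho, self.delta, self.delta2 = rho, delta, delta2
        self.rho_spent, self.delta_spent = 0.0, 0.0

    def remaining(self):
        return self.rho - self.rho_spent

    def spend_zcdp(self, rho_n, delta_n):
        """Charge an approximate zCDP round; condition (4) with 1/t = 0."""
        assert rho_n <= self.remaining()
        assert self.delta_spent + delta_n <= self.delta
        self.rho_spent += rho_n
        self.delta_spent += delta_n

    def spend_brownian(self, t_stopped):
        """Charge a Brownian noise reduction stopped at time value t; the
        chosen grid must satisfy 1/(2 t_n^(k_n)) <= remaining()."""
        cost = 1.0 / (2.0 * t_stopped)
        assert cost <= self.remaining()
        self.rho_spent += cost

    def dp_guarantee(self):
        """Overall (eps, delta) implied by the theorem for this delta2."""
        eps = self.rho + 2.0 * math.sqrt(
            2.0 * self.rho * math.log(1.0 / self.delta2))
        return eps, self.delta + self.delta2
```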
Theorem 6.

Let $\rho>0$ and $\delta\geq 0$. Let $(A_{n})_{n\geq 1}$ be a sequence of approximate zCDP and ex-post private mechanisms, where each ex-post private mechanism at round $n$ is a Brownian mechanism with an adaptively chosen stopping function $T_{n}$, a statistic $f_{n}$ with $\ell_{2}$-sensitivity equal to 1, and time values $(t_{n}^{(j)})_{j=1}^{k_{n}}$ that satisfy the condition in (4). Consider the function $N:\mathcal{Y}^{\infty}\to\mathbb{N}$ defined as follows for any sequence of outcomes $(y_{n})_{n\geq 1}$:

N((yn)n1)\displaystyle N((y_{n})_{n\geq 1}) =inf{n:δ<i[n+1]δi(y1:i1) or ρ=i[n](ρi(y1:i1)+12ti(Ti(y1:i)))}.\displaystyle=\inf\left\{n:\delta<\sum_{i\in[n+1]}\delta_{i}(y_{1:i-1})\text{ or }\rho=\sum_{i\in[n]}\left(\rho_{i}(y_{1:i-1})+\frac{1}{2t_{i}^{(T_{i}(y_{1:i}))}}\right)\right\}.

Then, for $\delta''>0$, the algorithm $A_{1:N(\cdot)}(\cdot)$ is $(\rho+2\sqrt{2\rho\log(1/\delta'')},\delta+\delta'')$-DP, where the stopping function is $N(x)=N((A_{n}(x))_{n\geq 1})$.

Proof.

We will show that $A_{1:N(\cdot)}(\cdot)$ is $\delta$-approximate $\rho$-zCDP and then use Lemma 2.3 to obtain a DP guarantee when stopping at $N(x)$ as defined in the theorem statement.

We follow a similar analysis to the proof of Theorem 1 in [22]. Let $P_{1:n}$ and $Q_{1:n}$ denote the joint distributions of $(A_{1},\dots,A_{n})$ with inputs $x$ and $x'$, respectively. We overload notation and write $P_{1:n}(y_{1},\dots,y_{n})$ and $Q_{1:n}(y_{1},\dots,y_{n})$ for the likelihood of $y_{1},\dots,y_{n}$ under input $x$ and $x'$, respectively. We similarly write $P_{n}(y_{n}\mid y_{1:n-1})$ and $Q_{n}(y_{n}\mid y_{1:n-1})$ for the corresponding conditional densities.

For any n,n\in\mathbb{N}, we have

P1:n(y1,,yn)\displaystyle P_{1:n}(y_{1},\cdots,y_{n}) =m=1nPm(ymy1:m1),\displaystyle=\prod_{m=1}^{n}P_{m}(y_{m}\mid y_{1:m-1}),
Q1:n(y1,,yn)\displaystyle Q_{1:n}(y_{1},\cdots,y_{n}) =m=1nQm(ymy1:m1).\displaystyle=\prod_{m=1}^{n}Q_{m}(y_{m}\mid y_{1:m-1}).

We then show that the two likelihoods can be decomposed as weighted mixtures of $P^{\prime}$ and $P^{\prime\prime}$, as well as $Q^{\prime}$ and $Q^{\prime\prime}$, respectively, such that the mixture weights on $P^{\prime}$ and $Q^{\prime}$ are at least $(1-\delta)$, and for all $\lambda\geq 1$,

max{Dλ(PQ),Dλ(QP)}ρλ.\displaystyle\max\Big{\{}D_{\lambda}\left(P^{\prime}\|Q^{\prime}\right),D_{\lambda}\left(Q^{\prime}\|P^{\prime}\right)\Big{\}}\leq\rho\lambda. (5)

By our assumption of approximate zCDP at each step nn, we can write the conditional likelihoods of PnP_{n} and QnQ_{n} as the following convex combinations:

Pn(yny1:n1)\displaystyle P_{n}(y_{n}\mid y_{1:n-1}) =(1δn(y1:n1))Pn(yny1:n1)+δn(y1:n1)Pn′′(yny1:n1),\displaystyle=(1-\delta_{n}(y_{1:n-1}))P^{\prime}_{n}(y_{n}\mid y_{1:n-1})+\delta_{n}(y_{1:n-1})P^{\prime\prime}_{n}(y_{n}\mid y_{1:n-1}),
Qn(yny1:n1)\displaystyle Q_{n}(y_{n}\mid y_{1:n-1}) =(1δn(y1:n1))Qn(yny1:n1)+δn(y1:n1)Qn′′(yny1:n1),\displaystyle=(1-\delta_{n}(y_{1:n-1}))Q^{\prime}_{n}(y_{n}\mid y_{1:n-1})+\delta_{n}(y_{1:n-1})Q^{\prime\prime}_{n}(y_{n}\mid y_{1:n-1}),

such that for all λ1\lambda\geq 1 and all prior outcomes y1:n1y_{1:n-1}, we have both

Dλ(Pn(y1:n1)Qn(y1:n1))ρn(y1:n1)λ,\displaystyle D_{\lambda}\left(P_{n}^{\prime}(\cdot\mid y_{1:n-1})~\|~Q_{n}^{\prime}(\cdot\mid y_{1:n-1})\right)\leq\rho_{n}(y_{1:n-1})\lambda, (6)
Dλ(Qn(y1:n1)Pn(y1:n1))ρn(y1:n1)λ.\displaystyle D_{\lambda}\left(Q_{n}^{\prime}(\cdot\mid y_{1:n-1})~\|~P_{n}^{\prime}(\cdot\mid y_{1:n-1})\right)\leq\rho_{n}(y_{1:n-1})\lambda. (7)

Note that at each round we either select an approximate zCDP mechanism or a Brownian noise reduction; in the latter case $\delta_{n}(x)\equiv 0$ and $\rho_{n}(x)\equiv 0$, which means $P_{n}^{\prime}\equiv P_{n}$ and $Q_{n}^{\prime}\equiv Q_{n}$ at those rounds $n$. We write $P_{1:n}$ for the distribution of any prefix of outcomes from $A_{1}(x),\cdots,A_{n}(x)$ and similarly $Q_{1:n}$ for the prefix of outcomes from $A_{1}(x^{\prime}),\cdots,A_{n}(x^{\prime})$. We can then write these likelihoods as convex combinations of likelihoods, using the fact that $\sum_{n=1}^{\infty}\delta_{n}(y_{1:n-1})\leq\delta$ for all $y_{1},y_{2},\cdots$:

P1:n(y1,,yn)\displaystyle P_{1:n}(y_{1},\cdots,y_{n}) =(1δ)=1nP(y|y1:1)P1:n(y1,,yn)+δP1:n′′(y1,,yn)\displaystyle=(1-\delta)\underbrace{\prod_{\ell=1}^{n}P_{\ell}^{\prime}(y_{\ell}|y_{1:\ell-1})}_{P^{\prime}_{1:n}(y_{1},\cdots,y_{n})}+\delta P_{1:n}^{\prime\prime}(y_{1},\cdots,y_{n}) (8)
Q1:n(y1,,yn)\displaystyle Q_{1:n}(y_{1},\cdots,y_{n}) =(1δ)=1nQ(y|y1:1)Q1:n(y1,,yn)+δQ1:n′′(y1,,yn)\displaystyle=(1-\delta)\underbrace{\prod_{\ell=1}^{n}Q_{\ell}^{\prime}(y_{\ell}|y_{1:\ell-1})}_{Q^{\prime}_{1:n}(y_{1},\cdots,y_{n})}+\delta Q_{1:n}^{\prime\prime}(y_{1},\cdots,y_{n}) (9)

For any fixed λ1\lambda\geq 1 and k1k\geq 1, consider the following filtration =(n)n1\mathcal{F}^{\prime}=(\mathcal{F}^{\prime}_{n})_{n\geq 1} where n=σ(Y1,,Yn)\mathcal{F}^{\prime}_{n}=\sigma(Y_{1}^{\prime},\cdots,Y_{n}^{\prime}), with Y1P1Y_{1}^{\prime}\sim P_{1}^{\prime}, Y2Y1P2(Y1),,YkY1:k1Pk(Y1:k1)Y_{2}^{\prime}\mid Y_{1}^{\prime}\sim P_{2}^{\prime}(\cdot\mid Y_{1}^{\prime}),\cdots,Y_{k}^{\prime}\mid Y_{1:k-1}^{\prime}\sim P_{k}^{\prime}(\cdot\mid Y_{1:k-1}^{\prime}).

We first consider the Brownian noise reduction mechanisms with time values $(t_{n}^{(k)}:k\geq 1)_{n\geq 1}$ stopped at fixed rounds $(k_{n})_{n\geq 1}$, keeping in mind that not every round need select a Brownian noise reduction mechanism with Brownian motion $(B^{(t)}_{n})_{t>0}$. We write the privacy loss for the Brownian noise reduction stopped at round $k_{n}$ as $\mathcal{L}_{n}^{(1:k_{n})}$, where from Theorem 4 we have

\mathcal{L}_{n}^{(1:k_{n})}=\mathcal{L}_{n}^{(k_{n})}=\frac{1}{2t_{n}^{(k_{n})}}+\frac{B_{n}^{(t_{n}^{(k_{n})})}}{t_{n}^{(k_{n})}}.

We add the noise reduction outcomes to the filtration, so that $\mathcal{F}^{\prime\prime}_{n}=\sigma(Y^{\prime}_{1},\cdots,Y^{\prime}_{n-1},(B_{n}^{(t)})_{t\geq t_{n}^{(k_{n})}})$. From Lemma 6.1, we know for all $\lambda\in\mathbb{R}$

𝔼[exp(λ(n(1:kn)12tn(kn))λ22tn(kn))n1′′]=𝔼[exp(λn(1:kn)λ(λ+1)2tn(kn))n1′′]1.\mathbb{E}\left[\exp\left(\lambda\left(\mathcal{L}_{n}^{(1:k_{n})}-\frac{1}{2t_{n}^{(k_{n})}}\right)-\frac{\lambda^{2}}{2t_{n}^{(k_{n})}}\right)\mid\mathcal{F}^{\prime\prime}_{n-1}\right]=\mathbb{E}\left[\exp\left(\lambda\mathcal{L}_{n}^{(1:k_{n})}-\frac{\lambda(\lambda+1)}{2t_{n}^{(k_{n})}}\right)\mid\mathcal{F}^{\prime\prime}_{n-1}\right]\leq 1.

We then replace the fixed rounds $(k_{i})_{i\leq n}$ with adaptive stopping functions $(T_{i})_{i\leq n}$ and corresponding stopping times $(T_{i}(x))_{i\leq n}$, rather than fixing them in advance; in this case we rename $\mathcal{L}^{(1:k_{n})}_{n}$ as $\mathcal{L}_{n}^{\texttt{BM}}$, and the same inequality still holds with the filtration $\mathcal{F}^{\prime\prime\prime}_{n}=\sigma(Y^{\prime}_{1},\cdots,Y^{\prime}_{n-1},(B_{n}^{(t)})_{t\geq t_{n}^{(T_{n}(x))}})$. Note that we set $Y_{n}^{\prime}=(f(x)+B_{n}^{(t)})_{t\geq t_{n}^{(T_{n}(x))}}$ so that $\mathcal{F}^{\prime\prime\prime}_{n}=\mathcal{F}^{\prime}_{n}$.

At rounds $n$ where a Brownian noise reduction mechanism is not selected, we simply have $1/t_{n}^{(k_{n})}=0$ and $\mathcal{L}_{n}^{\texttt{BM}}=0$. We also consider the modified privacy losses for the approximate zCDP mechanisms $\mathcal{L}^{\prime}_{n}$, where

n=n(y1:n)=log(Pn(yn|y<n)Qn(yn|y<n)), where y1:nP1:n\mathcal{L}^{\prime}_{n}=\mathcal{L}^{\prime}_{n}(y_{1:n})=\log\left(\frac{P^{\prime}_{n}(y_{n}|y_{<n})}{Q^{\prime}_{n}(y_{n}|y_{<n})}\right),\text{ where }y_{1:n}\sim P^{\prime}_{1:n}

Because the $P_{n}^{\prime}$ and $Q_{n}^{\prime}$ satisfy the Rényi divergence bounds in (6) and (7), we then have for any $\lambda\geq 1$

\mathbb{E}\left[\exp\left((\lambda-1)\mathcal{L}_{n}^{\prime}-\lambda(\lambda-1)\rho_{n}(Y_{1:n-1}^{\prime})\right)\mid\mathcal{F}^{\prime}_{n-1}\right]\leq 1.

Because at each round $n$ the selected mechanism is either approximate zCDP or a Brownian noise reduction with a stopping function, we can write the privacy loss at each round $i$ as the sum $\mathcal{L}^{\prime}_{i}+\mathcal{L}_{i}^{\texttt{BM}}$, where at least one of the two terms is zero. For all $\lambda\geq 1$ we then define

X_{n}^{(\lambda)}:=\sum_{i\leq n}\left\{\mathcal{L}^{\prime}_{i}+\mathcal{L}_{i}^{\texttt{BM}}-\lambda\left(\rho_{i}(Y_{1:i-1}^{\prime})+\frac{1}{2t_{i}^{(T_{i}(x))}}\right)\right\}, (10)
Mn(λ)\displaystyle M_{n}^{(\lambda)} :=exp((λ1)Xn(λ)).\displaystyle:=\exp\left((\lambda-1)X_{n}^{(\lambda)}\right). (11)

We know that $(M_{n}^{(\lambda)})_{n\geq 1}$ is a supermartingale with respect to $(\mathcal{F}_{n}^{\prime})_{n\geq 1}$. From the optional stopping theorem, we have

𝔼[MN(x)(λ)]1.\mathbb{E}[M_{N(x)}^{(\lambda)}]\leq 1.

This will ensure (5) holds. Although for rounds nn where we select a Brownian noise reduction we have Pn=PnP^{\prime}_{n}=P_{n} and Qn=QnQ_{n}^{\prime}=Q_{n}, we still need to show that for rounds nn where approximate zCDP mechanisms were selected the original distributions PnP_{n} and QnQ_{n} can be written as weighted mixtures including PnP^{\prime}_{n} and QnQ^{\prime}_{n}, respectively. This follows from the same analysis as in [22], so that for all outcomes y1,y2,y_{1},y_{2},\cdots where n=1δn(y1:n1)δ\sum_{n=1}^{\infty}\delta_{n}(y_{1:n-1})\leq\delta we have

P1:n(y1,y2,,yn)(1δ)m=1nPm(ymy1:m1),\displaystyle P_{1:n}(y_{1},y_{2},\cdots,y_{n})\geq(1-\delta)\prod_{m=1}^{n}P^{\prime}_{m}\left(y_{m}\mid y_{1:m-1}\right),

and similarly for $Q_{1:n}$. As argued in [22], it suffices to show that the two likelihoods of the stopped processes $P_{1:N}$ and $Q_{1:N}$ can be decomposed as weighted mixtures of $P^{\prime}_{1:N}$ and $P^{\prime\prime}_{1:N}$, as well as $Q^{\prime}_{1:N}$ and $Q^{\prime\prime}_{1:N}$, respectively, such that the weights on $P^{\prime}_{1:N}$ and $Q^{\prime}_{1:N}$ are at least $1-\delta$. Note that from our stopping rule, we have for all $\lambda>0$

\mathbb{E}[M_{N(x)}^{(\lambda)}]\leq 1\implies\mathbb{E}\left[e^{\lambda\sum_{n=1}^{N(x)}(\mathcal{L}_{n}^{\prime}+\mathcal{L}_{n}^{\texttt{BM}})}\right]\leq e^{\lambda(\lambda+1)\rho}.

We then still need to convert to DP, which we do with the stopping rule N(x)N(x) and the conversion lemma between approximate zCDP and DP in Lemma 2.3. ∎

7 Application: Bounding Relative Error

Our motivating application is returning as many counts as possible subject to each count being within $\alpha$ relative error, i.e. if $y$ is the true count and $\hat{y}$ is the noisy count, we require $\left||\hat{y}/y|-1\right|<\alpha$. Practitioners can often tolerate some relative error in their statistics and would prefer not to be shown counts that fall outside a certain accuracy. Typically, DP requires adding noise with a predetermined standard deviation to each count, but it would be advantageous to add larger noise to larger counts so that more counts can be returned subject to an overall privacy budget.

7.1 Doubling Method

A simple baseline approach is the “doubling method” presented in [12]. This approach runs a sequence of differentially private mechanisms, checking after each one whether the outcome satisfies some condition; if it does, the analyst stops, and otherwise continues with a larger privacy loss parameter. The downside of this approach is that the analyst must pay for the accrued privacy loss of all the rejected values. However, handling composition in this case turns out to be straightforward given Theorem 2, due to [22]. We then compare the “doubling method” against using the Brownian noise reduction and applying Theorem 6.

We now present the doubling method formally. We take privacy loss parameters $\varepsilon^{(1)}<\varepsilon^{(2)}<\cdots$, where $\varepsilon^{(i+1)}=\sqrt{2}\varepsilon^{(i)}$. Similar to the argument in Claim B.1 in [12], we use the $\sqrt{2}$ factor because the privacy loss depends on the sum of squares of the privacy loss parameters, i.e. $\sum_{i=1}^{m}(\varepsilon^{(i)})^{2}$ up to some iterate $m$, in Theorem 2, as $\rho_{i}=(\varepsilon^{(i)})^{2}/2$ is the zCDP parameter. This means that if $\varepsilon^{\star}$ is the privacy loss parameter at which the algorithm would have halted, then we might overshoot it by a factor of $\sqrt{2}$. Further, the overall sum of squared privacy loss parameters will be no more than $4(\varepsilon^{\star})^{2}$. Hence, we refer to the doubling method as doubling the square of the privacy loss parameter.
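As an illustration, the following is a minimal sketch of the doubling method, assuming a Gaussian mechanism with standard deviation $1/\varepsilon$ for a sensitivity-1 count (so that each release is $(\varepsilon^{(i)})^{2}/2$-zCDP) and a caller-supplied predicate `accept` that may inspect only the noisy value and the current parameter; the names are illustrative.

```python
import numpy as np

def doubling_method(y_true, eps_sq_start, rho_budget, accept, rng):
    """Release y_true with increasing accuracy, doubling the squared privacy
    parameter after each rejection; the analyst pays for every rejected
    release."""
    eps_sq, rho_spent = eps_sq_start, 0.0
    while rho_spent + eps_sq / 2.0 <= rho_budget:
        eps = np.sqrt(eps_sq)
        rho_spent += eps_sq / 2.0              # accrue rho_i = (eps_i)^2 / 2
        y_hat = y_true + rng.normal(scale=1.0 / eps)
        if accept(y_hat, eps):
            return y_hat, rho_spent
        eps_sq *= 2.0                          # eps grows by a factor of sqrt(2)
    return None, rho_spent                     # budget exhausted: discard
```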

7.2 Experiments

Figure 2: Data from a Zipf Distribution with parameter a=0.75a=0.75 and max value of 300.

We perform experiments to return as many results as possible subject to a relative error tolerance $\alpha$ and a privacy budget $\varepsilon,\delta>0$. We generate synthetic data from a Zipf distribution, i.e. from a density $f(k)\propto k^{-a}$ for $a>0$ and $k\in\mathbb{N}$. We set the max value of the distribution to be $300$ and $a=0.75$. We assume that a user can modify each count by at most 1, so that the $\ell_{0}$-sensitivity is 300 and the $\ell_{\infty}$-sensitivity is 1. See Figure 2 for the data distribution we used.
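The synthetic data can be generated as in the following sketch (the exact sampler is an implementation assumption); note that numpy's built-in `np.random.zipf` requires $a>1$, so for $a=0.75$ we normalize the truncated density directly.

```python
import numpy as np

rng = np.random.default_rng(0)
a, k_max, n = 0.75, 300, 32_000
support = np.arange(1, k_max + 1)
weights = support.astype(float) ** (-a)    # f(k) proportional to k^{-a}
probs = weights / weights.sum()
samples = rng.choice(support, size=n, p=probs)
counts = np.bincount(samples, minlength=k_max + 1)[1:]   # count per element
```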

In our experiments, we first find the top count and then add the least amount of noise to it that still achieves the target relative error, which we set to $\alpha=10\%$. To find the top count, we apply the Exponential Mechanism [14] by adding Gumbel noise with scale $1/\varepsilon_{\text{EM}}$ to each sensitivity-1 count (all 300 elements' counts, even if they are zero in the data) and taking the element with the top noisy count. From [3], we know that the Exponential Mechanism with parameter $\varepsilon_{\text{EM}}$ is $\varepsilon_{\text{EM}}^{2}/8$-zCDP, which we use in our composition bounds. For the Exponential Mechanism, we select a fixed parameter $\varepsilon_{\text{EM}}=0.1$.
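In code, this selection step is the Gumbel-max implementation of the Exponential Mechanism (a sketch, not our exact implementation):

```python
import numpy as np

def exponential_mechanism(counts, eps_em, rng):
    """Add Gumbel noise with scale 1/eps_em to each sensitivity-1 count
    (including zero counts) and return the argmax; by [3] this selection
    is (eps_em**2 / 8)-zCDP."""
    noisy = counts + rng.gumbel(loc=0.0, scale=1.0 / eps_em, size=len(counts))
    return int(np.argmax(noisy))
```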

After we have selected a top element via the Exponential Mechanism, we then need to add some noise to it in order to return its count. Whether we use the doubling method and apply Theorem 2 for composition or the Brownian noise reduction mechanism and apply Theorem 6 for composition, we need a stopping condition. Note that we cannot use the true count to determine when to stop, but we can use the noisy count and the privacy loss parameter that was used. Hence we use the following condition based on the noisy count y^\hat{y} and the corresponding privacy loss parameter ε(i)\varepsilon^{(i)} at iterate ii:

1α<|(y^+1/ε(i))/(y^1/ε(i))|1+α and |y^|>1/ε(i).1-\alpha<\left|(\hat{y}+1/\varepsilon^{(i)})/(\hat{y}-1/\varepsilon^{(i)})\right|\leq 1+\alpha\quad\text{ and }\quad|\hat{y}|>1/\varepsilon^{(i)}.
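In code, the acceptance check is a direct transcription of the display above (the default $\alpha$ matches our synthetic experiments):

```python
def accept(y_hat, eps, alpha=0.10):
    """Stopping condition using only the noisy count y_hat and the privacy
    loss parameter eps of the current iterate."""
    if abs(y_hat) <= 1.0 / eps:
        return False
    ratio = abs((y_hat + 1.0 / eps) / (y_hat - 1.0 / eps))
    return 1.0 - alpha < ratio <= 1.0 + alpha
```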

Note that for the Brownian noise reduction mechanism at round $n$, we use time values $t^{(i)}_{n}=1/(\varepsilon_{n}^{(i)})^{2}$. We set an overall privacy budget of $(\varepsilon,\delta^{\prime\prime})=(10,10^{-6})$. To determine when to stop, we simply track the sum of squared privacy parameters and stop once it exceeds roughly 2.705, which corresponds to the overall privacy budget of $\varepsilon=10$ with $\delta^{\prime\prime}=10^{-6}$. If the noisy value from the largest privacy loss parameter does not satisfy the condition above, we discard the result.
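The 2.705 figure can be reproduced by solving the standard zCDP-to-DP conversion $\varepsilon=\rho+2\sqrt{\rho\log(1/\delta^{\prime\prime})}$ for $\rho$ as a quadratic in $\sqrt{\rho}$, and using that the allowed sum of squared parameters is $2\rho$ since $\rho_{i}=(\varepsilon^{(i)})^{2}/2$; a quick check under that assumption:

```python
import math

eps_total, delta = 10.0, 1e-6
c = math.sqrt(math.log(1.0 / delta))            # sqrt(log(1/delta''))
sqrt_rho = -c + math.sqrt(c * c + eps_total)    # positive root of the quadratic
rho = sqrt_rho ** 2
print(2.0 * rho)                                # ~2.706: squared-parameter budget
```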

We pick the smallest squared privacy parameter to be $(\varepsilon_{n}^{(1)})^{2}=0.0001$ for each $n$ in both the noise reduction and the doubling method; the largest value changes as we update the remaining squared privacy loss budget. We then set 1000 equally spaced values for the square of the privacy loss parameter, between 0.0001 and this largest value, for the noise reduction to select among. We then vary the sample size of the data in $\{8000,16000,32000,64000,128000\}$ and record how many results are returned and, of those returned, how many satisfy the actual relative error bound, which we refer to as precision. Note that if 0 results are returned, then we consider the precision to be 1. Our results are given in Figure 3, where we give the empirical average and standard deviation over 1000 trials for each sample size.

Figure 3: Precision and number of results returned on Zipfian data for various sample sizes.

We also evaluated our approach on real data consisting of comments from Reddit authors, from https://github.com/heyyjudes/differentially-private-set-union/tree/ea7b39285dace35cc9e9029692802759f3e1c8e8/data. To find the most frequent words from distinct authors, we take the set of all distinct words contributed by each author, which can be arbitrarily large, and form the resulting histogram, which has $\ell_{\infty}$-sensitivity 1 yet unbounded $\ell_{2}$-sensitivity. To get a domain of words to select from, we take the top-1000 words from this histogram. We note that this step should also be done with DP, but it will not impact the relative performance between the Brownian noise reduction and the Doubling Gaussian Method.

We then follow the same approach as on the synthetic data, using the Exponential Mechanism with εEM=0.01\varepsilon_{\text{EM}}=0.01, minimum privacy parameter εn(1)=0.0001\varepsilon_{n}^{(1)}=0.0001, relative error α=0.01\alpha=0.01, and overall (ε=1,δ=106)(\varepsilon=1,\delta=10^{-6})-DP guarantee. In 1000 trials, the Brownian noise reduction precision (proportion of results that had noisy counts within 1% of the true count) was on average 97% (with minimum 92%) while the Doubling Gaussian Method precision was on average 98% (with minimum 93.5%). Furthermore, the number of results returned by the Brownian noise reduction in 1000 trials was on average 152 (with minimum 151), while the number of results returned by the Doubling Gaussian method was on average 109 (with minimum 108).

8 Conclusion

We have presented a way to combine approximate zCDP mechanisms and ex-post private mechanisms while achieving an overall differential privacy guarantee, allowing for more general and flexible types of interactions between an analyst and a sensitive dataset. Furthermore, we showed how this type of composition can be used to provide overall privacy guarantees subject to outcomes satisfying strict accuracy requirements, such as relative error. We hope that this will help extend the practicality of private data analysis by allowing the release of counts with relative error bounds subject to an overall privacy budget.

There are several open questions in this line of work. In particular, we leave open the problem of showing ex-post privacy composition that improves over basic composition for certain mechanisms. Although we only studied the Brownian noise reduction, we conjecture that Laplace noise reduction mechanisms [11] compose in a similar way, achieving privacy loss that depends on a sum of squared realized privacy loss parameters. Furthermore, we leave open the problem of improving the computational running time of the noise reduction mechanisms. In particular, if we are only interested in the last iterate of the Brownian noise reduction with a particular stopping function, can we sample from a distribution in one shot to arrive at that value, rather than iteratively checking many intermediate noisy values, which seems wasteful? Lastly, we hope that this work will lead to future work in designing new ex-post private mechanisms based on some of the primitives of differential privacy, specifically the Exponential Mechanism.

9 Acknowledgements

We would like to thank Adrian Rivera Cardoso and Saikrishna Badrinarayanan for helpful comments. This work was done while G.S. was a visiting scholar at LinkedIn. ZSW was supported in part by the NSF awards 2120667 and 2232693 and a Cisco Research Grant. AR was supported by NSF DMS-2310718.

References

  • Abowd et al. [2022] J. M. Abowd, R. Ashmead, R. Cumings-Menon, S. Garfinkel, M. Heineck, C. Heiss, R. Johns, D. Kifer, P. Leclerc, A. Machanavajjhala, B. Moran, W. Sexton, M. Spence, and P. Zhuravlev. The 2020 census disclosure avoidance system topdown algorithm, 2022. arXiv: 2204.08986.
  • Bun and Steinke [2016] M. Bun and T. Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography, pages 635–658. Springer Berlin Heidelberg, 2016. ISBN 978-3-662-53641-4. URL https://link.springer.com/chapter/10.1007/978-3-662-53641-4_24.
  • Cesar and Rogers [2021] M. Cesar and R. Rogers. Bounding, concentrating, and truncating: Unifying privacy loss composition for data analytics. In Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proceedings of Machine Learning Research, pages 421–457. PMLR, 16–19 Mar 2021. URL https://proceedings.mlr.press/v132/cesar21a.html.
  • Dwork et al. [2006a] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology - EUROCRYPT 2006, pages 486–503. Springer Berlin Heidelberg, 2006a. ISBN 978-3-540-34547-3. URL https://www.iacr.org/archive/eurocrypt2006/40040493/40040493.pdf.
  • Dwork et al. [2006b] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Conference on Theory of Cryptography, TCC’06, page 265–284. Springer-Verlag, 2006b. ISBN 3540327312. doi: 10.1007/11681878˙14. URL https://doi.org/10.1007/11681878_14.
  • Dwork et al. [2010] C. Dwork, G. N. Rothblum, and S. P. Vadhan. Boosting and differential privacy. In 51st Annual Symposium on Foundations of Computer Science, pages 51–60, 2010.
  • Feldman and Zrnic [2021] V. Feldman and T. Zrnic. Individual privacy accounting via a rényi filter. In Advances in Neural Information Processing Systems, 2021. URL https://openreview.net/forum?id=PBctz6_47ug.
  • Howard et al. [2020] S. R. Howard, A. Ramdas, J. McAuliffe, and J. Sekhon. Time-uniform Chernoff bounds via nonnegative supermartingales. Probability Surveys, 17:257 – 317, 2020.
  • Howard et al. [2021] S. R. Howard, A. Ramdas, J. McAuliffe, and J. Sekhon. Time-uniform, nonparametric, nonasymptotic confidence sequences. The Annals of Statistics, 49(2):1055 – 1080, 2021.
  • Kairouz et al. [2017] P. Kairouz, S. Oh, and P. Viswanath. The composition theorem for differential privacy. IEEE Transactions on Information Theory, 63(6):4037–4049, 2017. doi: 10.1109/TIT.2017.2685505. URL https://doi.org/10.1109/TIT.2017.2685505.
  • Koufogiannis et al. [2017] F. Koufogiannis, S. Han, and G. J. Pappas. Gradual release of sensitive data under differential privacy. Journal of Privacy and Confidentiality, 7(2), 2017.
  • Ligett et al. [2017] K. Ligett, S. Neel, A. Roth, B. Waggoner, and S. Z. Wu. Accuracy first: Selecting a differential privacy level for accuracy constrained erm. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper/2017/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf.
  • Lyu [2022] X. Lyu. Composition theorems for interactive differential privacy. In Advances in Neural Information Processing Systems, volume 35, pages 9700–9712. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/3f52b555967a95ee850fcecbd29ee52d-Paper-Conference.pdf.
  • McSherry and Talwar [2007] F. McSherry and K. Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 94–103, 2007. doi: 10.1109/FOCS.2007.66. URL http://dx.doi.org/10.1109/FOCS.2007.66.
  • Murtagh and Vadhan [2016] J. Murtagh and S. Vadhan. The complexity of computing the optimal composition of differential privacy. In Proceedings, Part I, of the 13th International Conference on Theory of Cryptography - Volume 9562, TCC 2016-A, pages 157–175. Springer-Verlag, 2016. ISBN 978-3-662-49095-2. doi: 10.1007/978-3-662-49096-9˙7. URL https://doi.org/10.1007/978-3-662-49096-9_7.
  • Revuz and Yor [2013] D. Revuz and M. Yor. Continuous martingales and Brownian motion, volume 293. Springer Science & Business Media, 2013.
  • Rogers et al. [2016] R. M. Rogers, A. Roth, J. Ullman, and S. P. Vadhan. Privacy odometers and filters: Pay-as-you-go composition. In Advances in Neural Information Processing Systems 29, pages 1921–1929, 2016. URL http://papers.nips.cc/paper/6170-privacy-odometers-and-filters-pay-as-you-go-composition.
  • Vadhan and Wang [2021] S. Vadhan and T. Wang. Concurrent composition of differential privacy. In Theory of Cryptography, pages 582–604. Springer International Publishing, 2021. ISBN 978-3-030-90453-1.
  • Vadhan and Zhang [2023] S. Vadhan and W. Zhang. Concurrent composition theorems for differential privacy. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC ’23. ACM, June 2023. doi: 10.1145/3564246.3585241. URL http://dx.doi.org/10.1145/3564246.3585241.
  • Wang et al. [2022] H. Wang, X. Peng, Y. Xiao, Z. Xu, and X. Chen. Differentially private data aggregating with relative error constraint, 2022. URL https://doi.org/10.1007/s40747-021-00550-3.
  • Whitehouse et al. [2022] J. Whitehouse, A. Ramdas, S. Wu, and R. Rogers. Brownian noise reduction: Maximizing privacy subject to accuracy constraints. In Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=J-IZQLQZdYu.
  • Whitehouse et al. [2023] J. Whitehouse, A. Ramdas, R. Rogers, and Z. S. Wu. Fully adaptive composition in differential privacy. In International Conference on Machine Learning. PMLR, 2023.
  • Xiao et al. [2011] X. Xiao, G. Bender, M. Hay, and J. Gehrke. Ireduct: Differential privacy with reduced relative errors. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, page 229–240. Association for Computing Machinery, 2011. ISBN 9781450306614. doi: 10.1145/1989323.1989348. URL https://doi.org/10.1145/1989323.1989348.

Appendix A Noise Reduction Mechanisms

We provide a more general approach to handle both the Laplace and Brownian noise reduction mechanisms from [11, 12, 21]. Consider a discrete set of times {a(j)}\{a^{(j)}\} with 0<a(j)b0<a^{(j)}\leq b. We will allow the stopping time and the number of steps to be adaptive. We will handle the univariate case and note that the analysis extends to the multivariate case. We will write time functions (ϕ(k),k1)(\phi^{(k)},k\geq 1) to satisfy the following

ϕ(1)b>0\displaystyle\phi^{(1)}\equiv b>0
ϕ(k):{a(j)}×{a(j)} such that ϕ(k)(t,z)<t for all (t,z){a(j)}× and k2\displaystyle\phi^{(k)}:\{a^{(j)}\}\times\mathbb{R}\to\{a^{(j)}\}\text{ such that $\phi^{(k)}(t,z)<t$ for all $(t,z)\in\{a^{(j)}\}\times\mathbb{R}$ and $k\geq 2$}

Consider the noise reduction mechanisms where we have M(1,p)=t(1)=bM^{(1,p)}=t^{(1)}=b, M(1,q)(x)=f(x)+Z(t(1))M^{(1,q)}(x)=f(x)+Z(t^{(1)}) where Z(t)Z(t) is either a standard Brownian motion or a Laplace process [11], and for k>1k>1 we have

M(k,p)(x)\displaystyle M^{(k,p)}(x) =ϕ(k)(M(k1,p)(x),M(k1,q)(x))\displaystyle=\phi^{(k)}(M^{(k-1,p)}(x),M^{(k-1,q)}(x))
M(k,q)(x)\displaystyle M^{(k,q)}(x) =f(x)+Z(M(k,p)(x))\displaystyle=f(x)+Z(M^{(k,p)}(x))

We will then write

M(1:k)(x)=((M(k,p)(x),M(k,q)(x)),,(M(1,p)(x),M(1,q)(x)))M^{(1:k)}(x)=\left((M^{(k,p)}(x),M^{(k,q)}(x)),\cdots,(M^{(1,p)}(x),M^{(1,q)}(x))\right)

We then consider the extended mechanism $M^{*}(x)=(T(x),M^{(1:T(x))}(x))$, where $T(x)$ is a stopping function. Hence, $M^{*}(x)$ takes values in $\mathbb{Z}\times((0,b]\times\mathbb{R})^{\infty,*}$, where $((0,b]\times\mathbb{R})^{\infty,*}$ is the collection of all finite sequences of the form $((t^{(i)},z^{(i)}):i\in[k])$ for $k=1,2,\cdots$ with $0<t^{(k)}<\cdots<t^{(1)}=b$ and $z^{(i)}\in\mathbb{R}$ for $i\in[k]$.

Let $m^{*}$ be the probability measure on this space generated by $M^{*}(x^{*})$, where $x^{*}$ is a point with $f(x^{*})=0$. Note that $x^{*}$ may be an artificial point added to the domain so that $f$ takes the value zero there. Let $m(x)$ be the probability measure on this space generated by $M^{*}(x)$. We then compute densities with respect to $m^{*}$.

Let $A$ be a measurable set in $\mathbb{Z}\times((0,b]\times\mathbb{R})^{\infty,*}$. Let $A^{(j)}\subseteq A$ be the subset on which the last (smallest) $p$-coordinate of every point equals $a^{(j)}$. Then

Pr[M(x)A]=jPr[M(x)A(j)].\Pr[M^{*}(x)\in A]=\sum_{j}\Pr[M^{*}(x)\in A^{(j)}].

By construction, M(x)M^{*}(x) is a function of (Z(t)+f(x),0tb)(Z(t)+f(x),0\leq t\leq b). Therefore, for each jj, we have

Pr[M(x)A(j)]=Pr[(Z(t)+f(x),0tb)(M)1(A(j))]\Pr[M^{*}(x)\in A^{(j)}]=\Pr[\left(Z(t)+f(x),0\leq t\leq b\right)\in(M^{*})^{-1}(A^{(j)})]

where the set of paths is of the following form for a measurable subset $D^{(j)}\subseteq C([a^{(j)},b])$ of the continuous functions on $[a^{(j)},b]$:

(M)1(A(j))={(y(t):0tb):(y(t):a(j)tb)D(j)}.(M^{*})^{-1}(A^{(j)})=\left\{(y(t):0\leq t\leq b):(y(t):a^{(j)}\leq t\leq b)\in D^{(j)}\right\}.

The standard Brownian motion and the Laplace process both have independence of increments, so we have the following with ptp_{t} as the density for Z(t)Z(t)

Pr\displaystyle\Pr [(Z(t)+f(x):0tb){(y(t):0tb):(y(t):a(j)tb)D(j)}]\displaystyle[\left(Z(t)+f(x):0\leq t\leq b\right)\in\{(y(t):0\leq t\leq b):(y(t):a^{(j)}\leq t\leq b)\in D^{(j)}\}]
=Pr[(Z(t)+f(x):a(j)tb)D(j)]\displaystyle\qquad=\Pr[(Z(t)+f(x):a^{(j)}\leq t\leq b)\in D^{(j)}]
=𝔼[𝟙{(Z(t):a(j)tb)D(j)}pa(j)(Z(a(j))f(x))pa(j)(Z(a(j)))]\displaystyle\qquad=\mathbb{E}\left[\mathbbm{1}\left\{(Z(t):a^{(j)}\leq t\leq b)\in D^{(j)}\right\}\frac{p_{a^{(j)}}(Z(a^{(j)})-f(x))}{p_{a^{(j)}}(Z(a^{(j)}))}\right]
=\mathbb{E}\left[\mathbbm{1}\left\{M^{*}(x^{*})\in A^{(j)}\right\}\frac{p_{M^{(T(x^{*}),p)}(x^{*})}\left(Z(M^{(T(x^{*}),p)}(x^{*}))-f(x)\right)}{p_{M^{(T(x^{*}),p)}(x^{*})}\left(Z(M^{(T(x^{*}),p)}(x^{*}))\right)}\right]

where the second equality follows from the fact that the law of a shifted process with independent increments on $(a^{(j)},b)$ is equivalent to the law of the non-shifted process, with density given by the ratio of the two densities evaluated at its leftmost point. We then conclude that

Pr[M(x)A]=𝔼[𝟙{M(x)A}pM(T(x),p)(x)(Z(M(T(x),p)(x))f(x))pM(T(x),p)(x)(Z(M(T(x),p)(x)))]\Pr[M^{*}(x)\in A]=\mathbb{E}\left[\mathbbm{1}\left\{M^{*}(x^{*})\in A\right\}\frac{p_{M^{(T(x^{*}),p)}(x^{*})}(Z(M^{(T(x^{*}),p)}(x^{*}))-f(x))}{p_{M^{(T(x^{*}),p)}(x^{*})}(Z(M^{(T(x^{*}),p)}(x^{*})))}\right]

Therefore m(x)m(x) is absolutely continuous with respect to mm^{*}. To write the density (the Radon-Nikodym derivative), recall that the density is evaluated at some point 𝐬×({a(j)}×),\mathbf{s}\in\mathbb{Z}\times(\{a^{(j)}\}\times\mathbb{R})^{\infty,*}. Let t(𝐬)t(\mathbf{s}) be the smallest pp-value in 𝐬\mathbf{s} and z(𝐬)z(\mathbf{s}) be the corresponding space value. Then we have

px(1:T(x))(𝐬)=dm(x)dm(𝐬)=pt(𝐬)(z(𝐬)f(x))pt(𝐬)(z(𝐬))p_{x}^{(1:T(x))}(\mathbf{s})=\frac{dm(x)}{dm^{*}}(\mathbf{s})=\frac{p_{t(\mathbf{s})}(z(\mathbf{s})-f(x))}{p_{t(\mathbf{s})}(z(\mathbf{s}))}

Similarly, we consider Mlast(x)=(T(x),M(T(x))(x))M^{*}_{\text{last}}(x)=(T(x),M^{(T(x))}(x)) so that the state space is ×{a(j)}×\mathbb{Z}\times\{a^{(j)}\}\times\mathbb{R}. We will write m^\hat{m}^{*} to be the probability measure on this space generated by Mlast(x)M^{*}_{\text{last}}(x^{*}) and m^(x)\hat{m}(x) to be the probability measure on this space generated by Mlast(x)M^{*}_{\text{last}}(x). We then compute densities with respect to m^\hat{m}^{*}.

Let $A$ now be a measurable subset of $\mathbb{Z}\times\{a^{(j)}\}\times\mathbb{R}$ and let $A^{(j)}\subseteq A$ be the subset whose $p$-coordinate equals $a^{(j)}$. We then follow a similar argument to the one above to show that

Pr[Mlast(x)A]=𝔼[𝟙{Mlast(x)A}pM(T(x),p)(x)(Z(M(T(x),p)(x))f(x))pM(T(x),p)(x)(Z(M(T(x),p)(x)))].\Pr\left[M^{*}_{\text{last}}(x)\in A\right]=\mathbb{E}\left[\mathbbm{1}\left\{M^{*}_{\text{last}}(x^{*})\in A\right\}\frac{p_{M^{(T(x^{*}),p)}(x^{*})}(Z(M^{(T(x^{*}),p)}(x^{*}))-f(x))}{p_{M^{(T(x^{*}),p)}(x^{*})}(Z(M^{(T(x^{*}),p)}(x^{*})))}\right].

Hence, for (k,t,z)×{a(j)}×(k,t,z)\in\mathbb{Z}\times\{a^{(j)}\}\times\mathbb{R} we have

px(T(x))(k,t,z)=dm(x)dm(k,t,z)=pt(zf(x))pt(z)p_{x}^{(T(x))}(k,t,z)=\frac{dm(x)}{dm^{*}}(k,t,z)=\frac{p_{t}(z-f(x))}{p_{t}(z)}

We then consider the privacy loss for both MM^{*}, denoted as (1:T(x))\mathcal{L}^{(1:T(x))}, and MlastM^{*}_{\text{last}}, denoted as (T(x))\mathcal{L}^{(T(x))}, under neighboring data xx and xx^{\prime}. We have

(1:T(x))=log(pM(T(x),p)(x)(Z(M(T(x),p)(x))f(x))pM(T(x),p)(x)(Z(M(T(x),p)(x))f(x)))=(T(x))\mathcal{L}^{(1:T(x))}=\log\left(\frac{p_{M^{(T(x),p)}(x)}(Z(M^{(T(x),p)}(x))-f(x))}{p_{M^{(T(x),p)}(x)}(Z(M^{(T(x),p)}(x))-f(x^{\prime}))}\right)=\mathcal{L}^{(T(x))} (12)

We then instantiate the Brownian noise reduction, in which case

pt(zf(x))/pt(z)=exp(12t(zf(x))2+12tz2)=exp(f(x)tzf(x)22t)p_{t}(z-f(x))/p_{t}(z)=\exp\left(-\frac{1}{2t}(z-f(x))^{2}+\frac{1}{2t}z^{2}\right)=\exp\left(\frac{f(x)}{t}z-\frac{f(x)^{2}}{2t}\right)
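As a quick numerical sanity check of this identity, with $p_{t}$ the $N(0,t)$ density (the values below are arbitrary):

```python
import numpy as np
from scipy.stats import norm

t, z, fx = 0.7, 1.3, 0.4
lhs = norm.pdf(z - fx, scale=np.sqrt(t)) / norm.pdf(z, scale=np.sqrt(t))
rhs = np.exp(fx * z / t - fx ** 2 / (2.0 * t))
assert np.isclose(lhs, rhs)
```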

Without loss of generality, consider the function g(y)=f(y)f(x)g(y)=f(y)-f(x) so that we apply

log(pt(zg(x))pt(zg(x)))=g(x)|g(x)|tz|g(x)|+g(x)22t\log\left(\frac{p_{t}(z-g(x))}{p_{t}(z-g(x^{\prime}))}\right)=-\frac{g(x^{\prime})}{|g(x^{\prime})|t}z|g(x^{\prime})|+\frac{g(x^{\prime})^{2}}{2t}

We then use the Brownian motion (B(t))t0(B(t))_{t\geq 0} to get the privacy loss BM(1:T(x))\mathcal{L}_{\texttt{BM}}^{(1:T(x))}

\mathcal{L}_{\texttt{BM}}^{(1:T(x))}=\frac{(f(x)-f(x^{\prime}))^{2}}{2t}-\frac{f(x)-f(x^{\prime})}{|f(x)-f(x^{\prime})|t}|f(x)-f(x^{\prime})|\cdot B\left(M^{(T(x),p)}(x)\right)
=\frac{(f(x)-f(x^{\prime}))^{2}}{2t}-\frac{1}{t}|f(x)-f(x^{\prime})|\cdot W\left(M^{(T(x),p)}(x)\right)

where $t=M^{(T(x),p)}(x)$ and $W(s)=z\cdot B(s)$ is a standard Brownian motion for a fixed $z$ with $|z|=1$.

There is another noise reduction mechanism based on Laplace noise, originally from [11] and shown to be ex-post private in [12]. We first show that it is indeed a noise reduction mechanism with a stopping function and consider the resulting privacy loss random variable. We focus on the univariate case, yet the analysis extends to the multivariate case.

We construct a Markov process $(X(t))_{t\geq 0}$ with $X(0)=0$ such that, for each $t>0$, $X(t)$ has the Laplace distribution with parameter $t$, which has density $p(z)=\tfrac{1}{2t}\exp\left(-|z|/t\right)$ for $z\in\mathbb{R}$. The process we construct has independent increments, i.e. for any $0\leq t^{(0)}<t^{(1)}<\cdots<t^{(k)}$ the following differences are independent:

X(t(0)),X(t(1))X(t(0)),,X(t(k))X(t(k1)).X(t^{(0)}),X(t^{(1)})-X(t^{(0)}),\cdots,X(t^{(k)})-X(t^{(k-1)}).

Hence, the process is Markovian. The idea behind the construction is that a Laplace random variable with parameter $t>0$ is a symmetric, infinitely divisible random variable without a Gaussian component whose Lévy measure has density

g(z;t)=e|z|/t|z|,z0.g(z;t)=\frac{e^{-|z|/t}}{|z|},\ z\neq 0.

Let UU be an infinitely divisible random measure on [0,)[0,\infty) with Lebesgue control measure and local Lévy density

ψ(z;t)=t2e|z|/t,z0,t>0.\psi(z;t)=t^{-2}e^{-|z|/t},\ z\neq 0,t>0. (13)

We define X(t)=U([0,t])X(t)=U([0,t]), where t0t\geq 0. Then X(0)=0X(0)=0 and for t>0t>0, X(t)X(t) is a symmetric, infinitely divisible random variable without a Gaussian component, whose Lévy measure has the density equal to

0tψ(z;u)𝑑u=0tu2e|z|/u𝑑u=1/te|z|s𝑑s=g(z;t).\displaystyle\int_{0}^{t}\psi(z;u)du=\int_{0}^{t}u^{-2}e^{-|z|/u}du=\int_{1/t}^{\infty}e^{-|z|s}ds=g(z;t).

That is, $X(t)$ is a Laplace random variable with parameter $t$, and the resulting process has independent increments by construction. Note that the process $(X(t))_{t\geq 0}$ has infinitely many jumps near 0. On an interval $[t^{(1)},t^{(2)}]$ with $0<t^{(1)}<t^{(2)}$, it is flat with probability $(t^{(1)}/t^{(2)})^{2}$.
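The Lévy-density computation above can also be checked numerically (the integrand vanishes as $u\to 0$, so quadrature is well behaved; the values below are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

z, t = 0.9, 2.5
val, _ = quad(lambda u: u ** -2 * np.exp(-abs(z) / u), 0.0, t)
assert np.isclose(val, np.exp(-abs(z) / t) / abs(z))   # equals g(z; t)
```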

We can then describe the Laplace noise reduction algorithm in a way similar to the Brownian Noise Reduction.

Definition A.1 (Laplace Noise Reduction).

Let f:𝒳df:\mathcal{X}\to\mathbb{R}^{d} be a function and (t(k))k1(t^{(k)})_{k\geq 1} a sequence of time values. Let (X(t))t0(X(t))_{t\geq 0} be the Markov process described above independently in each coordinate and T:(d)T:(\mathbb{R}^{d})^{*}\to\mathbb{N} be a stopping function. The Laplace noise reduction associated with ff, time values (t(k))k1(t^{(k)})_{k\geq 1}, and stopping function TT is the algorithm LNR:𝒳((0,t(1)]×d)\texttt{LNR}:\mathcal{X}\to((0,t^{(1)}]\times\mathbb{R}^{d})^{*} given by

LNR(x)=LNR(1:T(x))(x)=(t(k),f(x)+X(t(k)))kT(x).\texttt{LNR}(x)=\texttt{LNR}^{(1:T(x))}(x)=\left(t^{(k)},f(x)+X(t^{(k)})\right)_{k\leq T(x)}.

We then aim to prove the following

Lemma A.1.

The privacy loss LNR\mathcal{L}_{\texttt{LNR}} for the Laplace noise reduction associated with ff, time values (t(k))k1(t^{(k)})_{k\geq 1}, and stopping function TT can be written as

LNR=LNR(1:T(x))=LNR(T(x)).\mathcal{L}_{\texttt{LNR}}=\mathcal{L}_{\texttt{LNR}}^{(1:T(x))}=\mathcal{L}_{\texttt{LNR}}^{(T(x))}.

Furthermore we have

LNR(1:T(x))=1M(T(x),p)(x)(|X(M(T(x),p)(x))||X(M(T(x),p)(x))Δ1(f)|)\mathcal{L}_{\texttt{LNR}}^{(1:T(x))}=-\frac{1}{M^{(T(x),p)}(x)}\left(|X(M^{(T(x),p)}(x))|-|X(M^{(T(x),p)}(x))-\Delta_{1}(f)|\right)
Proof.

Because the Laplace process has independent increments, we get the same expression for the privacy loss as in (12) where we substitute the Laplace process for Z(t)Z(t), which has the following density

pt(zf(x))/pt(z)=exp(1t(|zf(x)||z|)).p_{t}(z-f(x))/p_{t}(z)=\exp\left(-\frac{1}{t}\left(|z-f(x)|-|z|\right)\right).

We will again use g(y)=f(y)f(x)g(y)=f(y)-f(x) for neighboring datasets x,xx,x^{\prime} with f(x)>f(x)f(x^{\prime})>f(x) without loss of generality to get

log(pt(zg(x))/pt(zg(x)))=1t(|z||zg(x)|)\log\left(p_{t}(z-g(x))/p_{t}(z-g(x^{\prime}))\right)=-\frac{1}{t}\left(|z|-|z-g(x^{\prime})|\right)

Therefore, we have from (12)

LNR(1:T(x))=LNR(T(x))=1M(T(x),p)(x)(|X(M(T(x),p)(x))||X(M(T(x),p)(x))|f(x)f(x)||).\mathcal{L}_{\texttt{LNR}}^{(1:T(x))}=\mathcal{L}_{\texttt{LNR}}^{(T(x))}=-\frac{1}{M^{(T(x),p)}(x)}\left(|X(M^{(T(x),p)}(x))|-|X(M^{(T(x),p)}(x))-|f(x)-f(x^{\prime})||\right).

We now want to show a result similar to the one we had for the Brownian noise reduction mechanism in Lemma 6.1, but for the Laplace process. Instead of allowing general time step functions $\phi^{(j)}$, we simply consider time values $0<t^{(k)}<\cdots<t^{(1)}=b$, so that

LNR(1:T(x))=1t(T(x))(|X(t(T(x)))||X(t(T(x)))|f(x)f(x)||)\mathcal{L}_{\texttt{LNR}}^{(1:T(x))}=-\frac{1}{t^{(T(x))}}\left(|X(t^{(T(x))})|-|X(t^{(T(x))})-|f(x)-f(x^{\prime})||\right)
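For concreteness, this realized privacy loss is a one-line computation (a sketch, where `x_t` stands for the realized $X(t^{(T(x))})$ and `delta1` for $|f(x)-f(x^{\prime})|$):

```python
def laplace_nr_privacy_loss(x_t, t, delta1):
    """Realized privacy loss of the Laplace noise reduction stopped at time t."""
    return -(abs(x_t) - abs(x_t - delta1)) / t
```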

This form of the privacy loss for the Laplace noise reduction may be helpful in determining whether we can obtain a backward martingale similar to the one in the Brownian noise reduction case in Lemma 6.1. We leave open the problem of obtaining a composition bound for Laplace noise reduction mechanisms that improves over simply adding up the ex-post privacy bounds, as in Theorem 5.1.