
Local Differential Privacy Is Equivalent to Contraction of $\mathsf{E}_{\gamma}$-Divergence

Shahab Asoodeh†, Maryam Aliakbarpour∗, and Flavio P. Calmon†
†Harvard University, ∗University of Massachusetts Amherst
Abstract

We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $\mathsf{E}_{\gamma}$-divergence. We then use this equivalent formulation to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergences. When combined with standard estimation-theoretic tools (such as Le Cam's and Fano's converse methods), this result allows us to study the trade-off between privacy and utility in several hypothesis testing and minimax and Bayesian estimation problems.

I Introduction

A major challenge in modern machine learning applications is balancing statistical efficiency with the privacy of individuals from whom data is obtained. In such applications, privacy is often quantified in terms of Differential Privacy (DP) [1]. DP has several variants, including approximate DP [2], Rényi DP [3], and others [4, 5, 6, 7]. Arguably, the most stringent flavor of DP is local differential privacy (LDP) [8, 9]. Intuitively, a randomized mechanism (or a Markov kernel) is said to be locally differentially private if its output does not vary significantly with arbitrary perturbations of the input.

More precisely, a mechanism is said to be $\varepsilon$-LDP (or pure LDP) if the privacy loss random variable, defined as the log-likelihood ratio of the output for any two different inputs, is smaller than $\varepsilon$ with probability one. One can also consider an approximate variant of this constraint: $\mathsf{K}$ is said to be $(\varepsilon,\delta)$-LDP if the privacy loss random variable does not exceed $\varepsilon$ with probability at least $1-\delta$ (see Definition 1 for the formal definition).

The study of statistical efficiency under LDP constraints has gained considerable traction, e.g., [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 8]. Almost all of these works consider $\varepsilon$-LDP and provide meaningful bounds only for sufficiently small values of $\varepsilon$ (i.e., the high-privacy regime). For instance, Duchi et al. [10] studied minimax estimation problems under $\varepsilon$-LDP constraints and showed that for $\varepsilon\leq 1$, the price of privacy is to reduce the effective sample size from $n$ to $\varepsilon^{2}n$. A slightly improved version of this result appeared in [19, 13]. More recently, Duchi and Rogers [20] developed a framework based on the strong data processing inequality (SDPI) [21] and derived lower bounds for minimax estimation risk under $\varepsilon$-LDP that hold for any $\varepsilon\geq 0$.

In this work, we develop an SDPI-based framework for studying hypothesis testing and estimation problems under $(\varepsilon,\delta)$-LDP, extending the results of [20] to approximate LDP. In particular, we derive bounds for both the minimax and Bayesian estimation risks that hold for any $\varepsilon\geq 0$ and $\delta\geq 0$. Interestingly, when setting $\delta=0$, our bounds can be slightly stronger than [10].

Our main mathematical tool is an equivalent expression for LDP in terms of the $\mathsf{E}_{\gamma}$-divergence. Given $\gamma\geq 1$, the $\mathsf{E}_{\gamma}$-divergence between two distributions $P$ and $Q$ is defined as

$\mathsf{E}_{\gamma}(P\|Q)\coloneqq\frac{1}{2}\int|\mathrm{d}P-\gamma\,\mathrm{d}Q|-\frac{1}{2}(\gamma-1).$ (1)
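
On a finite alphabet the integral in (1) is a sum, and for $\gamma\geq 1$ it coincides with $\sum_{x}\max\{P(x)-\gamma Q(x),0\}$. The following minimal Python sketch (our own illustration; the helper name `E_gamma` is not from the paper) checks this equivalence and that $\mathsf{E}_{1}$ recovers the total variation distance.

```python
import numpy as np

def E_gamma(P, Q, gamma):
    """E_gamma(P || Q) from (1) on a finite alphabet."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    return 0.5 * np.abs(P - gamma * Q).sum() - 0.5 * (gamma - 1.0)

P, Q = np.array([0.3, 0.7]), np.array([0.6, 0.4])
gamma = 1.2

# Positive-part form agrees with (1), and E_1 is the total variation distance.
assert np.isclose(E_gamma(P, Q, gamma), np.maximum(P - gamma * Q, 0.0).sum())
assert np.isclose(E_gamma(P, Q, 1.0), 0.5 * np.abs(P - Q).sum())
print(E_gamma(P, Q, gamma))  # 0.22
```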

We show that a mechanism $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP if and only if

$\mathsf{E}_{\gamma}(P\mathsf{K}\|Q\mathsf{K})\leq\delta\,\mathsf{E}_{\gamma}(P\|Q)$

for $\gamma=e^{\varepsilon}$ and any pair of distributions $(P,Q)$, where $P\mathsf{K}$ denotes the output distribution of $\mathsf{K}$ when the input distribution is $P$. Thus, the approximate LDP guarantee of a mechanism can be fully characterized by its contraction under the $\mathsf{E}_{\gamma}$-divergence. When combined with standard statistical techniques, including Le Cam's and Fano's methods [22, 23], $\mathsf{E}_{\gamma}$-contraction leads to general lower bounds for the minimax and Bayesian risks under $(\varepsilon,\delta)$-LDP for any $\varepsilon\geq 0$ and $\delta\in[0,1]$. In particular, we show that the price of privacy in this case is to reduce the sample size from $n$ to $n[1-e^{-\varepsilon}(1-\delta)]$.

There exist several results connecting pure LDP to the contraction properties of the KL divergence $D_{\mathsf{KL}}$ and the total variation distance $\mathsf{TV}$. For instance, for any $\varepsilon$-LDP mechanism $\mathsf{K}$, it is shown in [10, Theorem 1] that $D_{\mathsf{KL}}(P\mathsf{K}\|Q\mathsf{K})\leq 2(e^{\varepsilon}-1)^{2}\mathsf{TV}^{2}(P,Q)$ and in [13, Theorem 6] that $\mathsf{TV}(P\mathsf{K},Q\mathsf{K})\leq\frac{e^{\varepsilon}-1}{e^{\varepsilon}+1}\mathsf{TV}(P,Q)$ for any pair $(P,Q)$. Inspired by these results, we further show that if $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP then $D_{f}(P\mathsf{K}\|Q\mathsf{K})\leq[1-e^{-\varepsilon}(1-\delta)]D_{f}(P\|Q)$ for any $f$-divergence $D_{f}$ and any pair $(P,Q)$.

Notation. For a random variable $X$, we write $P_{X}$ and $\mathcal{X}$ for its distribution (i.e., $X\sim P_{X}$) and its alphabet, respectively. For any set $A$, we denote by $\mathcal{P}(A)$ the set of all probability distributions on $A$. Given two sets $\mathcal{X}$ and $\mathcal{Z}$, a Markov kernel (i.e., channel) $\mathsf{K}$ is a mapping from $\mathcal{X}$ to $\mathcal{P}(\mathcal{Z})$ given by $x\mapsto\mathsf{K}(\cdot|x)$. Given $P\in\mathcal{P}(\mathcal{X})$ and a Markov kernel $\mathsf{K}:\mathcal{X}\to\mathcal{P}(\mathcal{Z})$, we let $P\mathsf{K}$ denote the output distribution of $\mathsf{K}$ when the input distribution is $P$, i.e., $P\mathsf{K}(\cdot)=\int\mathsf{K}(\cdot|x)P(\mathrm{d}x)$. Also, we use $\mathsf{BSC}(\omega)$ to denote the binary symmetric channel with crossover probability $\omega$. For sequences $\{a_{n}\}$ and $\{b_{n}\}$, we use $a_{n}\gtrsim b_{n}$ to indicate $a_{n}\geq Cb_{n}$ for some universal constant $C$.

II Preliminaries

II-A $f$-Divergences

Given a convex function $f:(0,\infty)\to\mathbb{R}$ such that $f(1)=0$, the $f$-divergence between two probability measures $P\ll Q$ is defined as [24, 25]

$D_{f}(P\|Q)\coloneqq\mathbb{E}_{Q}\Big[f\Big(\frac{\mathrm{d}P}{\mathrm{d}Q}\Big)\Big].$ (2)

Due to the convexity of $f$, we have $D_{f}(P\|Q)\geq f(1)=0$. If, furthermore, $f$ is strictly convex at $1$, then equality holds if and only if $P=Q$. Popular examples of $f$-divergences include $f(t)=t\log t$, corresponding to the KL divergence; $f(t)=|t-1|$, corresponding to the total variation distance; and $f(t)=t^{2}-1$, corresponding to the $\chi^{2}$-divergence. In this paper, we are mostly concerned with an important sub-family of $f$-divergences associated with $f_{\gamma}(t)=\max\{t-\gamma,0\}$ for a parameter $\gamma\geq 1$. The corresponding $f$-divergence, denoted by $\mathsf{E}_{\gamma}(P\|Q)$, is called the $\mathsf{E}_{\gamma}$-divergence (or sometimes the hockey-stick divergence [26]) and is explicitly defined in (1). It appeared in [27] for proving channel coding converse results and was also used in [28, 29, 30, 7] for characterizing privacy guarantees of iterative algorithms in terms of other variants of DP.

II-B Contraction Coefficient

All $f$-divergences satisfy the data processing inequality, i.e., $D_{f}(P\mathsf{K}\|Q\mathsf{K})\leq D_{f}(P\|Q)$ for any pair of probability distributions $(P,Q)$ and Markov kernel $\mathsf{K}$ [24]. However, in many cases this inequality is strict. The contraction coefficient $\eta_{f}(\mathsf{K})$ of a Markov kernel $\mathsf{K}$ under the $f$-divergence $D_{f}$ is the smallest number $\eta\leq 1$ such that $D_{f}(P\mathsf{K}\|Q\mathsf{K})\leq\eta D_{f}(P\|Q)$ for any pair of probability distributions $(P,Q)$. Formally, $\eta_{f}(\mathsf{K})$ is defined as

$\eta_{f}(\mathsf{K})\coloneqq\sup_{\substack{P,Q\in\mathcal{P}(\mathcal{X}):\\ D_{f}(P\|Q)\neq 0}}\frac{D_{f}(P\mathsf{K}\|Q\mathsf{K})}{D_{f}(P\|Q)}.$ (3)

Contraction coefficients have been studied for several $f$-divergences: $\eta_{\mathsf{TV}}$ for the total variation distance was studied in [31, 32, 33], $\eta_{\mathsf{KL}}$ for the KL divergence in [34, 35, 36, 37, 38, 39], and $\eta_{\chi^{2}}$ for the $\chi^{2}$-divergence in [33, 39, 40]. In particular, Dobrushin [31] showed that $\eta_{\mathsf{TV}}$ has a remarkably simple two-point characterization: $\eta_{\mathsf{TV}}(\mathsf{K})=\sup_{x_{1},x_{2}\in\mathcal{X}}\mathsf{TV}(\mathsf{K}(\cdot|x_{1}),\mathsf{K}(\cdot|x_{2}))$.

Similarly, one can plug the $\mathsf{E}_{\gamma}$-divergence into (3) and define the contraction coefficient $\eta_{\gamma}(\mathsf{K})$ of a Markov kernel $\mathsf{K}$ under the $\mathsf{E}_{\gamma}$-divergence. This contraction coefficient was recently studied in [30] for deriving approximate DP guarantees for online algorithms. In particular, it was shown [30, Theorem 3] that $\eta_{\gamma}$ enjoys a simple two-point characterization, i.e., $\eta_{\gamma}(\mathsf{K})=\sup_{x_{1},x_{2}\in\mathcal{X}}\mathsf{E}_{\gamma}(\mathsf{K}(\cdot|x_{1})\|\mathsf{K}(\cdot|x_{2}))$. Since $\mathsf{E}_{1}(P\|Q)=\mathsf{TV}(P,Q)$, this is a natural extension of Dobrushin's result.
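
The two-point characterization makes $\eta_{\gamma}$ straightforward to evaluate for finite kernels; the hypothetical snippet below (the function names are ours) scans all pairs of rows of a kernel, and $\gamma=1$ recovers Dobrushin's coefficient.

```python
import numpy as np

def E_gamma(P, Q, gamma):
    # E_gamma on a finite alphabet: sum of positive parts of P - gamma * Q.
    return np.maximum(np.asarray(P) - gamma * np.asarray(Q), 0.0).sum()

def eta_gamma(K, gamma):
    # Two-point characterization [30, Theorem 3]: largest E_gamma between rows.
    rows = range(K.shape[0])
    return max(E_gamma(K[i], K[j], gamma) for i in rows for j in rows)

K = np.array([[0.8, 0.2],
              [0.3, 0.7]])          # rows of K are K(.|x)
print(eta_gamma(K, 1.0))            # gamma = 1: Dobrushin's eta_TV = 0.5
print(eta_gamma(K, np.exp(0.5)))    # gamma = e^eps with eps = 0.5
```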

II-C Local Differential Privacy

Suppose $\mathsf{K}$ is a randomized mechanism mapping each $x\in\mathcal{X}$ to a distribution $\mathsf{K}(\cdot|x)\in\mathcal{P}(\mathcal{Z})$. One can view $\mathsf{K}$ as a Markov kernel (i.e., channel) $\mathsf{K}:\mathcal{X}\to\mathcal{P}(\mathcal{Z})$.

Definition 1 ([8, 9]).

A mechanism $\mathsf{K}:\mathcal{X}\to\mathcal{P}(\mathcal{Z})$ is $(\varepsilon,\delta)$-LDP for $\varepsilon\geq 0$ and $\delta\in[0,1]$ if

$\sup_{x,x'\in\mathcal{X}}\ \sup_{A\subset\mathcal{Z}}\ \left[\mathsf{K}(A|x)-e^{\varepsilon}\mathsf{K}(A|x')\right]\leq\delta.$ (4)

π–ͺ{\mathsf{K}} is said to be Ξ΅\varepsilon-LDP if it is (Ξ΅,0)(\varepsilon,0)-LDP. Let 𝒬Ρ,Ξ΄{\mathcal{Q}}_{\varepsilon,\delta} be the collection of all Markov kernels π–ͺ{\mathsf{K}} with the above property. When Ξ΄=0\delta=0, we use 𝒬Ρ{\mathcal{Q}}_{\varepsilon} to denote 𝒬Ρ,0{\mathcal{Q}}_{\varepsilon,0}.

Interactivity in Privacy-Preserving Mechanisms: Suppose there are $n$ users, each in possession of a datapoint $X_{i}$, $i\in[n]\coloneqq\{1,\dots,n\}$. The users wish to apply a mechanism $\mathsf{K}_{i}$ that generates a privatized version of $X_{i}$, denoted by $Z_{i}$. We say that the collection of mechanisms $\{\mathsf{K}_{i}\}$ is non-interactive if $\mathsf{K}_{i}$ is entirely determined by $X_{i}$ and independent of $(X_{j},Z_{j})$ for $j\neq i$. When all users apply the same mechanism $\mathsf{K}$, we can view $Z^{n}\coloneqq(Z_{1},\dots,Z_{n})$ as independent applications of $\mathsf{K}$ to each $X_{i}$; we denote this overall mechanism by $\mathsf{K}^{\otimes n}$. If interactions between users are permitted, then $\mathsf{K}_{i}$ need not depend only on $X_{i}$; in this case, we denote the overall mechanism $\{\mathsf{K}_{i}\}_{i=1}^{n}$ by $\mathsf{K}^{n}$. In particular, the sequentially interactive setting [10] refers to the case where the input of $\mathsf{K}_{i}$ depends on both $X_{i}$ and the outputs $Z^{i-1}$ of the $(i-1)$ previous mechanisms.

III LDP as the Contraction of $\mathsf{E}_{\gamma}$-Divergence

We show next that the $(\varepsilon,\delta)$-LDP constraint, with $\delta$ not necessarily equal to zero, is equivalent to the contraction of the $\mathsf{E}_{\gamma}$-divergence.

Theorem 1.

A mechanism $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP if and only if $\eta_{e^{\varepsilon}}(\mathsf{K})\leq\delta$, or equivalently

π–ͺβˆˆπ’¬Ξ΅,Ξ΄βŸΊπ–€eΡ​(P​π–ͺβˆ₯Q​π–ͺ)≀δ​𝖀eΡ​(Pβˆ₯Q),βˆ€P,Q.{\mathsf{K}}\in{\mathcal{Q}}_{\varepsilon,\delta}\leavevmode\nobreak\ \Longleftrightarrow\leavevmode\nobreak\ {\mathsf{E}}_{e^{\varepsilon}}(P{\mathsf{K}}\|Q{\mathsf{K}})\leq\delta{\mathsf{E}}_{e^{\varepsilon}}(P\|Q),\quad\forall P,Q.

We note that Duchi et al. [10] showed that if $\mathsf{K}$ is $\varepsilon$-LDP then $D_{\mathsf{KL}}(P\mathsf{K}\|Q\mathsf{K})\leq 2(e^{\varepsilon}-1)^{2}\mathsf{TV}^{2}(P,Q)$. They then informally concluded from this result that $\varepsilon$-LDP acts as a contraction on the space of probability measures. Theorem 1 makes this observation precise.

According to Theorem 1, a mechanism $\mathsf{K}$ is $\varepsilon$-LDP if and only if $\mathsf{E}_{e^{\varepsilon}}(P\mathsf{K}\|Q\mathsf{K})=0$ for any distributions $P$ and $Q$. An example of such a Markov kernel is given next.

Example 1. (Randomized response mechanism) Let $\mathcal{X}=\mathcal{Z}=\{0,1\}$ and consider the mechanism given by the binary symmetric channel $\mathsf{BSC}(\omega_{\varepsilon})$ with $\omega_{\varepsilon}\coloneqq\frac{1}{1+e^{\varepsilon}}$. This is often called the randomized response mechanism [41] and denoted by $\mathsf{K}^{\varepsilon}_{\mathsf{RR}}$. This simple mechanism is well known to be $\varepsilon$-LDP, which can now be verified via Theorem 1. Let $P=\mathsf{Bernoulli}(p)$ and $Q=\mathsf{Bernoulli}(q)$ with $p,q\in[0,1]$. Then $P\mathsf{K}^{\varepsilon}_{\mathsf{RR}}=\mathsf{Bernoulli}(p*\omega_{\varepsilon})$ and $Q\mathsf{K}^{\varepsilon}_{\mathsf{RR}}=\mathsf{Bernoulli}(q*\omega_{\varepsilon})$, where $a*b\coloneqq a(1-b)+b(1-a)$. It is straightforward to verify that $\frac{1}{2}\big[|p*\omega_{\varepsilon}-e^{\varepsilon}(q*\omega_{\varepsilon})|+|1-p*\omega_{\varepsilon}-e^{\varepsilon}(1-q*\omega_{\varepsilon})|\big]=\frac{1}{2}(e^{\varepsilon}-1)$ for any $p,q$, which, by (1), implies $\mathsf{E}_{e^{\varepsilon}}(P\mathsf{K}^{\varepsilon}_{\mathsf{RR}}\|Q\mathsf{K}^{\varepsilon}_{\mathsf{RR}})=0$. When $|\mathcal{X}|=k\geq 2$, a simple generalization of this mechanism, called $k$-ary randomized response, has been reported in the literature (see, e.g., [19, 13]); it is defined by $\mathcal{Z}=\mathcal{X}$, $\mathsf{K}^{\varepsilon}_{\mathsf{kRR}}(x|x)=\frac{e^{\varepsilon}}{k-1+e^{\varepsilon}}$, and $\mathsf{K}^{\varepsilon}_{\mathsf{kRR}}(z|x)=\frac{1}{k-1+e^{\varepsilon}}$ for $z\neq x$. Again, it can be verified that for this mechanism we have $\mathsf{E}_{e^{\varepsilon}}(P\mathsf{K}^{\varepsilon}_{\mathsf{kRR}}\|Q\mathsf{K}^{\varepsilon}_{\mathsf{kRR}})=0$ for all distributions $P$ and $Q$ on $\mathcal{X}$.
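
As a numerical sanity check of Example 1 (our own sketch, not from the paper), the snippet below samples many pairs $(p,q)$ and verifies that the hockey-stick divergence at $\gamma=e^{\varepsilon}$ between the output distributions of $\mathsf{K}^{\varepsilon}_{\mathsf{RR}}$ vanishes, as Theorem 1 predicts for an $\varepsilon$-LDP mechanism.

```python
import numpy as np

def E_gamma(P, Q, gamma):
    return np.maximum(P - gamma * Q, 0.0).sum()

eps = 1.0
w = 1.0 / (1.0 + np.exp(eps))             # crossover probability omega_eps
K = np.array([[1 - w, w],
              [w, 1 - w]])                # randomized response kernel

rng = np.random.default_rng(0)
for _ in range(1000):
    p, q = rng.uniform(size=2)
    PK = np.array([1 - p, p]) @ K         # output law for Bernoulli(p) input
    QK = np.array([1 - q, q]) @ K
    assert E_gamma(PK, QK, np.exp(eps)) < 1e-12
print("E_{e^eps}(PK || QK) = 0 for all sampled (p, q)")
```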

The $\mathsf{E}_{\gamma}$-divergence underlies all other $f$-divergences, in the sense that any $f$-divergence can be represented in terms of $\mathsf{E}_{\gamma}$-divergences [42, Corollary 3.7]. Thus, an LDP constraint implies that a Markov kernel contracts all $f$-divergences, in the same spirit as the $\mathsf{E}_{\gamma}$-contraction in Theorem 1.

Lemma 1.

Let π–ͺβˆˆπ’¬Ξ΅,Ξ΄{\mathsf{K}}\in{\mathcal{Q}}_{\varepsilon,\delta} and φ​(Ξ΅,Ξ΄)≔1βˆ’(1βˆ’Ξ΄)​eβˆ’Ξ΅\varphi(\varepsilon,\delta)\coloneqq 1-(1-\delta)e^{-\varepsilon}. Then, Ξ·f​(π–ͺ)≀φ​(Ξ΅,Ξ΄)\eta_{f}({\mathsf{K}})\leq\varphi(\varepsilon,\delta) or, equivalently,

$D_{f}(P\mathsf{K}\|Q\mathsf{K})\leq\varphi(\varepsilon,\delta)\,D_{f}(P\|Q)\qquad\forall P,Q\in\mathcal{P}(\mathcal{X}).$

Notice that this lemma holds for any $f$-divergence and the entire family of $(\varepsilon,\delta)$-LDP mechanisms. However, it can be improved if one considers particular mechanisms or a specific $f$-divergence. For instance, it is known that $\eta_{\mathsf{KL}}(\mathsf{BSC}(\omega))=(1-2\omega)^{2}$ [21]. Thus, we have $\eta_{\mathsf{KL}}(\mathsf{K}^{\varepsilon}_{\mathsf{RR}})=\big(\frac{e^{\varepsilon}-1}{e^{\varepsilon}+1}\big)^{2}$ for the randomized response mechanism $\mathsf{K}^{\varepsilon}_{\mathsf{RR}}$ (cf. Example 1), while Lemma 1 only implies $\eta_{\mathsf{KL}}(\mathsf{K}^{\varepsilon}_{\mathsf{RR}})\leq 1-e^{-\varepsilon}$. Unfortunately, $\eta_{\mathsf{KL}}$ is difficult to compute in closed form for general Markov kernels, in which case Lemma 1 provides a useful alternative.
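
The gap can also be seen numerically. The sketch below (ours; it only probes random input pairs, so it produces a lower estimate of $\eta_{\mathsf{KL}}$) compares the KL contraction ratio of $\mathsf{K}^{\varepsilon}_{\mathsf{RR}}$ against both the exact coefficient and the Lemma 1 bound.

```python
import numpy as np

def kl(P, Q):
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

eps = 1.0
w = 1.0 / (1.0 + np.exp(eps))
K = np.array([[1 - w, w], [w, 1 - w]])

eta_exact = ((np.exp(eps) - 1) / (np.exp(eps) + 1)) ** 2  # (1 - 2w)^2
lemma1 = 1 - np.exp(-eps)                                 # phi(eps, 0)

rng = np.random.default_rng(1)
ratios = []
for _ in range(2000):
    p, q = rng.uniform(0.01, 0.99, size=2)
    P, Q = np.array([1 - p, p]), np.array([1 - q, q])
    d = kl(P, Q)
    if d > 1e-9:
        ratios.append(kl(P @ K, Q @ K) / d)
print(max(ratios), "<=", eta_exact, "<=", lemma1)
```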

Next, we extend Lemma 1 to non-interactive mechanisms. Fix an $(\varepsilon,\delta)$-LDP mechanism $\mathsf{K}$ and consider the corresponding non-interactive mechanism $\mathsf{K}^{\otimes n}$. To obtain upper bounds on $\eta_{f}(\mathsf{K}^{\otimes n})$ directly through Lemma 1, we would first need to derive the privacy parameters of $\mathsf{K}^{\otimes n}$ in terms of $\varepsilon$ and $\delta$ (e.g., by applying composition theorems). Instead, we can use the tensorization properties of contraction coefficients (see, e.g., [39, 38]) to relate $\eta_{f}(\mathsf{K}^{\otimes n})$ to $\eta_{f}(\mathsf{K})$ and then apply Lemma 1, as described next.

Lemma 2.

Let π–ͺβˆˆπ’¬Ξ΅,Ξ΄{\mathsf{K}}\in{\mathcal{Q}}_{\varepsilon,\delta} and Ο†n​(Ξ΅,Ξ΄)≔1βˆ’eβˆ’n​Ρ​(1βˆ’Ξ΄)n\varphi_{n}(\varepsilon,\delta)\coloneqq 1-e^{-n\varepsilon}(1-\delta)^{n}. Then Ξ·f​(π–ͺβŠ—n)≀φn​(Ξ΅,Ξ΄)\eta_{f}({\mathsf{K}}^{\otimes n})\leq\varphi_{n}(\varepsilon,\delta) for nβ‰₯1.n\geq 1.

Each of the following sections provides a different application of the contraction characterization of LDP.

IV Private Minimax Risk

Let $X^{n}=(X_{1},\dots,X_{n})$ be $n$ independent and identically distributed (i.i.d.) samples drawn from a distribution $P$ in a family $\mathcal{P}\subseteq\mathcal{P}(\mathcal{X})$, and let $\theta:\mathcal{P}\to\mathcal{T}$ be a parameter of the distribution that we wish to estimate. Each user holds a sample $X_{i}$ and applies a privacy-preserving mechanism $\mathsf{K}_{i}$ to obtain $Z_{i}$; in general, the $\mathsf{K}_{i}$ may be sequentially interactive. Given the sequence $\{Z_{i}\}_{i=1}^{n}$, the goal is to estimate $\theta(P)$ through an estimator $\Psi:\mathcal{Z}^{n}\to\mathcal{T}$. The quality of such an estimator is assessed by a semi-metric $\ell:\mathcal{T}\times\mathcal{T}\to\mathbb{R}_{+}$, which is used to define the minimax risk:

$\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)\coloneqq\inf_{\mathsf{K}^{n}\subset\mathcal{Q}_{\varepsilon,\delta}}\inf_{\Psi}\sup_{P\in\mathcal{P}}\mathbb{E}\big[\ell(\Psi(Z^{n}),\theta(P))\big].$ (5)

The quantity $\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)$ characterizes the optimal rate of private statistical estimation over the family $\mathcal{P}$, using the best possible estimator and privacy-preserving mechanisms in $\mathcal{Q}_{\varepsilon,\delta}$. In the absence of privacy constraints (i.e., $Z^{n}=X^{n}$), we denote the minimax risk by $\mathcal{R}_{n}(\mathcal{P},\ell)$.

The first step in deriving information-theoretic lower bounds for the minimax risk is to reduce the above estimation problem to a testing problem [23, 43, 22]. To do so, we need to construct an index set $\mathcal{V}$ with $|\mathcal{V}|<\infty$ and a family of distributions $\{P_{v},v\in\mathcal{V}\}\subseteq\mathcal{P}$ such that $\ell(\theta(P_{v}),\theta(P_{v'}))\geq 2\tau$ for all $v\neq v'$ in $\mathcal{V}$ and some $\tau>0$. The canonical testing problem is then defined as follows: nature chooses a random variable $V$ uniformly at random from $\mathcal{V}$, and, conditioned on $V=v$, the samples $X^{n}$ are drawn i.i.d. from $P_{v}$, denoted $X^{n}\sim P^{\otimes n}_{v}$. Each $X_{i}$ is then fed to a mechanism $\mathsf{K}_{i}$ to generate $Z_{i}$. It is well known [22, 43, 23] that $\mathcal{R}_{n}(\mathcal{P},\ell)\geq\tau\,\mathsf{P}_{\mathsf{e}}(V|X^{n})$, where $\mathsf{P}_{\mathsf{e}}(V|X^{n})$ denotes the probability of error in guessing $V$ given $X^{n}$. Replacing $X^{n}$ by its $(\varepsilon,\delta)$-privatized samples $Z^{n}$ in this result, one obtains a lower bound on $\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)$ in terms of $\mathsf{P}_{\mathsf{e}}(V|Z^{n})$. Hence, the remaining challenge is to lower-bound $\mathsf{P}_{\mathsf{e}}(V|Z^{n})$ over the choice of mechanisms $\{\mathsf{K}_{i}\}$. There are numerous techniques for this objective, depending on $\mathcal{V}$. We focus on two such approaches, namely Le Cam's and Fano's methods, which bound $\mathsf{P}_{\mathsf{e}}(V|Z^{n})$ in terms of the total variation distance and mutual information, respectively, and hence allow us to invoke Lemmas 1 and 2.

IV-A Locally Private Le Cam’s Method

Le Cam's method is applicable when $\mathcal{V}$ is a binary set containing, say, $P_{0}$ and $P_{1}$. In its simplest form, it relies on the inequality (see [22, Lemma 1] or [23, Theorem 2.2]) $\mathsf{P}_{\mathsf{e}}(V|X^{n})\geq\frac{1}{2}\big[1-\mathsf{TV}(P^{\otimes n}_{0},P^{\otimes n}_{1})\big]$. Thus, it yields the following lower bound for the non-private minimax risk:

$\mathcal{R}_{n}(\mathcal{P},\ell) \geq \frac{\tau}{2}\left[1-\mathsf{TV}(P^{\otimes n}_{0},P^{\otimes n}_{1})\right]$ (6)
$\geq \frac{\tau}{2}\left[1-\frac{1}{\sqrt{2}}\sqrt{n\,D_{\mathsf{KL}}(P_{0}\|P_{1})}\right],$ (7)

for any $P_{0}\neq P_{1}$ in $\mathcal{P}$, where the second inequality follows from Pinsker's inequality and the chain rule of the KL divergence. In the presence of privacy, the estimator $\Psi$ depends on $Z^{n}$ instead of $X^{n}$, which is generated by a sequentially interactive mechanism $\mathsf{K}^{n}$. To write the private counterpart of (6), we need to replace $P^{\otimes n}_{0}$ and $P^{\otimes n}_{1}$ with $P^{\otimes n}_{0}\mathsf{K}^{n}$ and $P^{\otimes n}_{1}\mathsf{K}^{n}$, the corresponding marginals of $Z^{n}$, respectively. A lower bound for $\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)$ is therefore obtained by deriving an upper bound on $\mathsf{TV}(P_{0}^{\otimes n}\mathsf{K}^{n},P_{1}^{\otimes n}\mathsf{K}^{n})$ for all $\mathsf{K}^{n}\subset\mathcal{Q}_{\varepsilon,\delta}$.

Lemma 3.

Let $P_{0},P_{1}\in\mathcal{P}$ satisfy $\ell(\theta(P_{0}),\theta(P_{1}))\geq 2\tau$. Then we have

$\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)\geq\frac{\tau}{2}\left[1-\frac{1}{\sqrt{2}}\sqrt{n\,\varphi(\varepsilon,\delta)\,D_{\mathsf{KL}}(P_{0}\|P_{1})}\right].$
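
As an illustration of how Lemma 3 is typically instantiated (a hypothetical example, not from the paper), the sketch below takes $P_{0}=\mathsf{Bernoulli}(1/2-\tau)$, $P_{1}=\mathsf{Bernoulli}(1/2+\tau)$, $\theta(P)=\mathbb{E}_{P}[X]$, and $\ell(t,t')=|t-t'|$, and optimizes the resulting lower bound over the separation $\tau$.

```python
import numpy as np

def kl_bern(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def le_cam_private(tau, n, eps, delta):
    # Lemma 3: l(theta(P0), theta(P1)) = 2*tau for the Bernoulli pair above.
    phi = 1 - (1 - delta) * np.exp(-eps)
    d = kl_bern(0.5 - tau, 0.5 + tau)
    return 0.5 * tau * (1 - np.sqrt(0.5 * n * phi * d))

n, eps, delta = 100, 1.0, 0.0
taus = np.linspace(1e-3, 0.25, 500)
print(max(le_cam_private(t, n, eps, delta) for t in taus))
```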

By comparing with the original non-private Le Cam method (7), we observe that the effect of $(\varepsilon,\delta)$-LDP is to reduce the effective sample size from $n$ to $(1-e^{-\varepsilon}(1-\delta))n$. Setting $\delta=0$, this result strengthens Duchi et al. [10, Corollary 2], where the effective sample size was shown to be $4\varepsilon^{2}n$ for sufficiently small $\varepsilon$.

Example 2. (One-dimensional mean estimation) For some $k>1$, we assume $\mathcal{P}$ is given by

$\mathcal{P}=\mathcal{P}_{k}\coloneqq\{P\in\mathcal{P}(\mathcal{X}):\ |\mathbb{E}_{P}[X]|\leq 1,\ \mathbb{E}_{P}[|X|^{k}]\leq 1\}.$

The goal is to estimate $\theta(P)=\mathbb{E}_{P}[X]$ under $\ell=\ell_{2}^{2}$, the squared $\ell_{2}$ metric. This problem was first studied in [10, Proposition 1], where it was shown that $\mathcal{R}_{n}(\mathcal{P}_{k},\ell_{2}^{2},\varepsilon,0)\geq(n\varepsilon^{2})^{-(k-1)/k}$ only for $\varepsilon\leq 1$. Applying our framework to this example, we obtain a similar lower bound that holds for all $\varepsilon\geq 0$ and $\delta\in[0,1]$.

Corollary 1.

For all $k>1$, $\varepsilon\geq 0$, and $\delta\in(0,1)$, we have

$\mathcal{R}_{n}(\mathcal{P}_{k},\ell_{2}^{2},\varepsilon,\delta)\gtrsim\min\Big\{1,\left[n\varphi^{2}(\varepsilon,\delta)\right]^{-\frac{k-1}{k}}\Big\}.$ (8)

It is worth instantiating this corollary for some special values of $k$. Consider first the usual finite-variance setting, i.e., $k=2$. In the non-private case, it is known that the sample mean has a mean-squared error that scales as $1/n$. According to Corollary 1, this rate worsens to $1/(\varphi(\varepsilon,\delta)\sqrt{n})$ in the presence of the $(\varepsilon,\delta)$-LDP requirement. As $k\to\infty$, the moment condition $\mathbb{E}_{P}[|X|^{k}]\leq 1$ implies the boundedness of $X$; in this case, Corollary 1 implies the more standard lower bound $(\varphi^{2}(\varepsilon,\delta)n)^{-1}$.

IV-B Locally Private Fano’s Method

Le Cam's method involves a pair of distributions $(P_{0},P_{1})$ in $\mathcal{P}$. However, it is possible to derive a stronger bound by considering a larger subset of $\mathcal{P}$ and applying Fano's inequality (see, e.g., [22]). We follow this path to obtain a better minimax lower bound for the non-interactive setting.

Consider the index set $\mathcal{V}=\{1,\dots,|\mathcal{V}|\}$. The non-private Fano method relies on Fano's inequality to lower-bound $\mathsf{P}_{\mathsf{e}}(V|X^{n})$ in terms of mutual information:

$\mathcal{R}_{n}(\mathcal{P},\ell)\geq\tau\left[1-\frac{I(X^{n};V)+\log 2}{\log|\mathcal{V}|}\right].$ (9)

To incorporate privacy into this result, we need to derive an upper bound on $I(Z^{n};V)$ over all choices of mechanisms $\{\mathsf{K}_{i}\}$. Focusing on non-interactive mechanisms, the following lemma exploits Lemma 2 to obtain such an upper bound.

Lemma 4.

Given $X^{n}$ and $V$ as described above, let $Z^{n}$ be constructed by applying $\mathsf{K}^{\otimes n}$ to $X^{n}$. If $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP, then we have

$I(Z^{n};V) \leq \varphi_{n}(\varepsilon,\delta)\,I(X^{n};V) \leq \frac{n\,\varphi_{n}(\varepsilon,\delta)}{|\mathcal{V}|^{2}}\sum_{v,v'\in\mathcal{V}}D_{\mathsf{KL}}(P_{v}\|P_{v'}).$

This lemma can be compared with [10, Corollary 1], where it was shown that

$I(Z^{n};V)\leq 2(e^{\varepsilon}-1)\frac{n}{|\mathcal{V}|^{2}}\sum_{v,v'\in\mathcal{V}}D_{\mathsf{KL}}(P_{v}\|P_{v'}).$ (10)

This bound is looser than Lemma 4 for any $n\geq 1$ and $\varepsilon\geq 0.4$, and it only holds for $\delta=0$.

Example 3. (High-dimensional mean estimation in an $\ell_{2}$-ball) For a parameter $r<\infty$, define

$\mathcal{P}_{r}\coloneqq\{P\in\mathcal{P}(\mathsf{B}^{d}_{2}(r))\},$ (11)

where $\mathsf{B}^{d}_{2}(r)\coloneqq\{x\in\mathbb{R}^{d}:\|x\|_{2}\leq r\}$ is the $\ell_{2}$-ball of radius $r$ in $\mathbb{R}^{d}$. The goal is to estimate the mean $\theta(P)=\mathbb{E}[X]$ given the private views $Z^{n}$. This example was first studied in [10, Proposition 3], which states that $\mathcal{R}_{n}(\mathcal{P}_{r},\ell_{2}^{2},\varepsilon,0)\gtrsim r^{2}\min\left\{\frac{1}{\varepsilon\sqrt{n}},\frac{d}{n\varepsilon^{2}}\right\}$ for $\varepsilon\in(0,1)$. In the following, we use Lemma 4 to derive a similar lower bound for any $\varepsilon\geq 0$ and $\delta\in(0,1)$, albeit one slightly weaker than [10, Proposition 3].

Corollary 2.

For the non-interactive setting, we have

$\mathcal{R}_{n}(\mathcal{P}_{r},\ell_{2}^{2},\varepsilon,\delta)\gtrsim r^{2}\min\left\{\frac{1}{n\varphi_{n}(\varepsilon,\delta)},\frac{d}{n^{2}\varphi^{2}_{n}(\varepsilon,\delta)}\right\}.$ (12)

V Private Bayesian Risk

In the minimax setting, the worst-case parameter is considered, which usually leads to overly pessimistic bounds. In practice, the parameter that incurs the worst-case risk may appear with very small probability. To capture this prior knowledge, it is reasonable to assume that the true parameter is sampled from an underlying prior distribution. In this case, we are interested in the Bayes risk of the problem.

Let $\mathcal{P}=\{P_{X|\Theta}(\cdot|\theta):\theta\in\mathcal{T}\}$ be a collection of parametric probability distributions on $\mathcal{X}$, where the parameter space $\mathcal{T}$ is endowed with a prior $P_{\Theta}$, i.e., $\Theta\sim P_{\Theta}$. Given an i.i.d. sequence $X^{n}$ drawn from $P_{X|\Theta}$, the goal is to estimate $\Theta$ from a privatized sequence $Z^{n}$ via an estimator $\Psi:\mathcal{Z}^{n}\to\mathcal{T}$. Here, we focus on the non-interactive setting. Define the private Bayes risk as

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)\coloneqq\inf_{\mathsf{K}\in\mathcal{Q}_{\varepsilon,\delta}}\inf_{\Psi}\mathbb{E}\big[\ell(\Theta,\Psi(Z^{n}))\big],$ (13)

where the expectation is taken with respect to the randomness of both $\Theta$ and $Z^{n}$. It is evident that $R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)$ must depend on the prior $P_{\Theta}$. This dependence can be quantified by

$\mathcal{L}(\zeta)\coloneqq\sup_{t\in\mathcal{T}}\Pr(\ell(\Theta,t)\leq\zeta),$ (14)

for $\zeta<\sup_{\theta,\theta'\in\mathcal{T}}\ell(\theta,\theta')$. Xu and Raginsky [44] showed that the non-private Bayes risk (i.e., $Z^{n}=X^{n}$), denoted by $R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell)$, is lower bounded as

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell)\geq\sup_{\zeta>0}\zeta\left[1-\frac{I(\Theta;X^{n})+\log 2}{\log(1/\mathcal{L}(\zeta))}\right].$ (15)

Replacing $I(\Theta;X^{n})$ with $I(\Theta;Z^{n})$ in this result and applying Lemma 2 (as in Lemma 4), we can directly convert (15) into a lower bound for $R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)$.

Corollary 3.

In the non-interactive setting, we have

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)\geq\sup_{\zeta>0}\zeta\left[1-\frac{\varphi_{n}(\varepsilon,\delta)I(\Theta;X^{n})+\log 2}{\log(1/\mathcal{L}(\zeta))}\right].$

In the following theorem, we provide a lower bound for $R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)$ that directly involves the $\mathsf{E}_{\gamma}$-divergence and thus leads to a tighter bound than Corollary 3. For any pair of random variables $(A,B)\sim P_{AB}$ with marginals $P_{A}$ and $P_{B}$ and a constant $\gamma\geq 0$, we define their $\mathsf{E}_{\gamma}$-information as

$I_{\gamma}(A;B)\coloneqq\mathsf{E}_{\gamma}(P_{AB}\|P_{A}P_{B}).$
Theorem 2.

Let π–ͺ{\mathsf{K}} be an (Ξ΅,Ξ΄)(\varepsilon,\delta)-LDP mechanism. Then, for n=1n=1 we have

$R_{1}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)\geq\sup_{\zeta>0}\zeta\left[1-\delta I_{e^{\varepsilon}}(\Theta;X)-e^{\varepsilon}\mathcal{L}(\zeta)\right],$

and for $n>1$ in the non-interactive setting we have

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)\geq\sup_{\zeta>0}\zeta\left[1-\varphi_{n}(\varepsilon,\delta)I_{e^{\varepsilon}}(\Theta;X^{n})-e^{\varepsilon}\mathcal{L}(\zeta)\right].$

We compare Theorem 2 with Corollary 3 in the next example.

Example 4. Suppose $\Theta$ is uniformly distributed on $[0,1]$, $P_{X|\Theta=\theta}=\mathsf{Bernoulli}(\theta)$, and $\ell(\theta,\theta')=|\theta-\theta'|$. Since $\Theta$ is uniform, $\mathcal{L}(\zeta)\leq\min\{2\zeta,1\}$ (cf. Remark 1). For $\gamma=e^{\varepsilon}$, we can write

$I_{\gamma}(\Theta;X^{n})=\int_{0}^{1}\mathsf{E}_{\gamma}(P_{X^{n}|\theta}\|P_{X^{n}})\,\mathrm{d}\theta.$ (16)

A straightforward calculation shows that $P_{X^{n}|\theta}(x^{n})=\theta^{s(x^{n})}(1-\theta)^{n-s(x^{n})}$ for any $\theta\in[0,1]$, and $P_{X^{n}}(x^{n})=\frac{s(x^{n})!\,(n-s(x^{n}))!}{(n+1)!}$, where $s(x^{n})$ is the number of $1$'s in $x^{n}$. Given these marginal and conditional distributions, one obtains after algebraic manipulations

$I_{\gamma}(\Theta;X^{n})=\frac{1}{n+1}\sum_{s=0}^{n}\int_{0}^{1}\left[\theta^{s}(1-\theta)^{n-s}\frac{(n+1)!}{s!\,(n-s)!}-\gamma\right]_{+}\mathrm{d}\theta.$

Plugging this into Theorem 2, we arrive at a maximization problem that can be solved numerically. Similarly, we compute $I(\Theta;X^{n})=\int_{0}^{1}D_{\mathsf{KL}}(P_{X^{n}|\theta}\|P_{X^{n}})\,\mathrm{d}\theta$, plug it into Corollary 3, and numerically solve the resulting optimization problem. In Fig. 1, we compare these two lower bounds for $\delta=10^{-4}$ and $n=20$, indicating the advantage of Theorem 2 for small $\varepsilon$.
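
A minimal numerical sketch of this computation (our own code; the grid sizes are arbitrary) evaluates $I_{\gamma}(\Theta;X^{n})$ by quadrature and then the Theorem 2 bound, using $\mathcal{L}(\zeta)\leq\min\{2\zeta,1\}$:

```python
import numpy as np
from math import comb

def I_gamma(n, gamma, grid=4001):
    # I_gamma(Theta; X^n) for Theta ~ Unif[0,1], X_i | theta ~ Bernoulli(theta)
    theta = np.linspace(0.0, 1.0, grid)
    total = 0.0
    for s in range(n + 1):
        dens = (n + 1) * comb(n, s) * theta**s * (1 - theta)**(n - s)
        y = np.maximum(dens - gamma, 0.0)
        total += np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(theta))  # trapezoid rule
    return total / (n + 1)

def thm2_bound(n, eps, delta):
    gamma = np.exp(eps)
    phi_n = 1 - np.exp(-n * eps) * (1 - delta) ** n
    Ig = I_gamma(n, gamma)
    zetas = np.linspace(1e-4, 0.5, 2000)
    vals = zetas * (1 - phi_n * Ig - gamma * np.minimum(2 * zetas, 1.0))
    return max(vals.max(), 0.0)

print(thm2_bound(n=20, eps=0.1, delta=1e-4))
```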

Figure 1: Comparison of the lower bounds obtained from Theorem 2 and the private version of [44, Theorem 1] described in Corollary 3 for Example 4, assuming $\delta=10^{-4}$ and $n=20$.
Remark 1.

The proof of Theorem 2 leads to the following lower bound for the non-private Bayes risk:

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell)\geq\sup_{\zeta>0,\ \gamma\geq 0}\zeta\left[1-I_{\gamma}(\Theta;X^{n})-\gamma\mathcal{L}(\zeta)-(1-\gamma)_{+}\right].$ (17)

For a comparison with (15), consider the following example. Suppose $\Theta$ is a uniform random variable on $[0,1]$ and $P_{X|\Theta=\theta}=\mathsf{Bernoulli}(\theta)$. We are interested in the Bayes risk with respect to the $\ell_{1}$-loss function $\ell(\theta,\theta')=|\theta-\theta'|$. It can be shown that $I(\Theta;X)=0.19$ nats, while

$I_{\gamma}(\Theta;X)=\begin{cases}0.25\gamma^{2}&\text{if }\gamma\in[0,1],\\ 0.25(\gamma-2)^{2}&\text{if }\gamma\in[1,2],\\ 0&\text{otherwise}.\end{cases}$ (18)

Moreover, $\mathcal{L}(\zeta)=\sup_{t\in[0,1]}\Pr(|\Theta-t|\leq\zeta)\leq\min\{2\zeta,1\}$. It can be verified that (15) gives $R_{1}^{\mathsf{Bayes}}(P_{\Theta},\ell_{1})\geq 0.03$, whereas our bound (17) yields $R_{1}^{\mathsf{Bayes}}(P_{\Theta},\ell_{1})\geq 0.08$.
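
Both maximizations are easy to reproduce on a grid. In the sketch below (ours; the optima depend mildly on the grid, and the values we obtain reproduce the ordering reported above), we evaluate (15) and (17) with $\mathcal{L}(\zeta)=\min\{2\zeta,1\}$.

```python
import numpy as np

I_mi = np.log(2) - 0.5              # I(Theta; X) = ln 2 - 1/2, about 0.19 nats

def I_gamma(g):                     # the piecewise form (18)
    if g <= 1.0:
        return 0.25 * g * g
    if g <= 2.0:
        return 0.25 * (g - 2.0) ** 2
    return 0.0

zetas = np.linspace(1e-4, 0.499, 800)
gammas = np.linspace(0.0, 3.0, 601)

xr = max(z * (1 - (I_mi + np.log(2)) / np.log(1.0 / (2 * z))) for z in zetas)
eg = max(z * (1 - I_gamma(g) - 2.0 * g * z - max(1.0 - g, 0.0))
         for z in zetas for g in gammas)
print(f"bound (15): {xr:.3f}   bound (17): {eg:.3f}")
```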

VI Private Hypothesis Testing

We now turn our attention to the well-known problem of binary hypothesis testing under local differential privacy constraints. Suppose $n$ i.i.d. samples $X^{n}$ drawn from a distribution $Q\in\mathcal{P}(\mathcal{X})$ are observed, and let each $X_{i}$ be mapped to $Z_{i}$ via a mechanism $\mathsf{K}_{i}\in\mathcal{Q}_{\varepsilon,\delta}$ (i.e., sequential interaction is permitted). The goal is to distinguish the null hypothesis $H_{0}:Q=P_{0}$ from the alternative $H_{1}:Q=P_{1}$ given $Z^{n}$. Let $T$ be a binary statistic generated from a randomized decision rule $P_{T|Z^{n}}:\mathcal{Z}^{n}\to\mathcal{P}(\{0,1\})$, where $1$ indicates that $H_{0}$ is rejected. The type I and type II error probabilities corresponding to this statistic are given by $\Pr(T=1|H_{0})$ and $\Pr(T=0|H_{1})$, respectively. To capture the optimal trade-off between the two, it is customary to define $\beta^{\varepsilon,\delta}_{n}(\alpha)\coloneqq\inf\Pr(T=0|H_{1})$, where the infimum is taken over all kernels $P_{T|Z^{n}}$ such that $\Pr(T=1|H_{0})\leq\alpha$ and all non-interactive mechanisms $\mathsf{K}^{\otimes n}$ with $\mathsf{K}\in\mathcal{Q}_{\varepsilon,\delta}$. In the following corollary, we apply Lemma 1 to obtain an asymptotic lower bound on $\beta_{n}^{\varepsilon,\delta}(\alpha)$.

Corollary 4.

We have, for any $\varepsilon\geq 0$ and $\delta\in[0,1]$,

$\lim_{n\to\infty}\frac{1}{n}\log\beta_{n}^{\varepsilon,\delta}(\alpha)\geq-\varphi(\varepsilon,\delta)\,D_{\mathsf{KL}}(P_{0}\|P_{1}).$ (19)

A similar result was proved by Kairouz et al. [13, Sec. 3], holding only for sufficiently "small" (albeit unspecified) $\varepsilon$ and for $\delta=0$. Compared with the Chernoff-Stein lemma [45, Theorem 11.8.3], which establishes $D_{\mathsf{KL}}(P_{0}\|P_{1})$ as the asymptotic exponential decay rate of the type II error probability in the non-private setting, the above corollary once again exhibits the reduction of the effective sample size from $n$ to $\varphi(\varepsilon,\delta)n$ in the presence of the $(\varepsilon,\delta)$-LDP requirement.

VII Mutual Information of LDP Mechanisms

Viewing mutual information as a utility measure, we may consider maximizing mutual information under local differential privacy as yet another privacy-utility trade-off. To formalize this, let $X\sim P_{X}$. The goal is to characterize the supremum of $I(X;Z)$ over $\mathsf{K}\in\mathcal{Q}_{\varepsilon,\delta}$, i.e., the maximum information shared between $X$ and its $(\varepsilon,\delta)$-LDP representation $Z$. Such mutual information bounds under local DP have appeared in the literature: e.g., McGregor et al. [46] provided a result that roughly states $I(X;Z)\leq 3\varepsilon$ for $\mathsf{K}\in\mathcal{Q}_{\varepsilon}$, and Kairouz et al. [13, Corollary 15] showed that, for sufficiently small $\varepsilon$,

$\sup_{\mathsf{K}\in\mathcal{Q}_{\varepsilon}}I(X;Z)\leq\frac{1}{2}P_{X}(A)(1-P_{X}(A))\varepsilon^{2},$ (20)

where $A\subset\mathcal{X}$ satisfies $A\in\operatorname{arg\,min}_{B\subset\mathcal{X}}\left|P_{X}(B)-\frac{1}{2}\right|$. Next, we provide an upper bound on the mutual information under LDP that holds for all $\varepsilon\geq 0$ and $\delta\in[0,1]$.

Corollary 5.

We have, for any $\varepsilon\geq 0$ and $\delta\in[0,1]$,

$\sup_{\mathsf{K}\in\mathcal{Q}_{\varepsilon,\delta}}I(X;Z)\leq\varphi(\varepsilon,\delta)\,H(X).$ (21)
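
For instance, for the $k$-ary randomized response of Example 1 with a uniform input, (21) can be checked directly; the snippet below (an illustration with our own helper) computes $I(X;Z)$ in nats and compares it with $\varphi(\varepsilon,0)H(X)$.

```python
import numpy as np

def mutual_info(px, K):
    # I(X;Z) in nats for X ~ px and Z | X ~ K (rows of K are K(.|x))
    pz = px @ K
    return float(np.sum(px[:, None] * K * np.log(K / pz)))

k, eps = 4, 1.0
K = np.full((k, k), 1.0 / (k - 1 + np.exp(eps)))
np.fill_diagonal(K, np.exp(eps) / (k - 1 + np.exp(eps)))
px = np.full(k, 1.0 / k)

print(mutual_info(px, K), "<=", (1 - np.exp(-eps)) * np.log(k))  # phi * H(X)
```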

References

  • [1] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Proc. Theory of Cryptography (TCC), Berlin, Heidelberg, 2006, pp. 265–284.
  • [2] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, "Our data, ourselves: Privacy via distributed noise generation," in EUROCRYPT, S. Vaudenay, Ed., 2006, pp. 486–503.
  • [3] I. Mironov, "Rényi differential privacy," in Proc. Computer Security Foundations (CSF), 2017, pp. 263–275.
  • [4] M. Bun and T. Steinke, "Concentrated differential privacy: Simplifications, extensions, and lower bounds," in Theory of Cryptography, 2016, pp. 635–658.
  • [5] C. Dwork and G. N. Rothblum, "Concentrated differential privacy," arXiv abs/1603.01887, 2016. [Online]. Available: http://arxiv.org/abs/1603.01887
  • [6] J. Dong, A. Roth, and W. J. Su, "Gaussian differential privacy," arXiv 1905.02383, 2019.
  • [7] S. Asoodeh, J. Liao, F. P. Calmon, O. Kosut, and L. Sankar, "Three variants of differential privacy: Lossless conversion and applications," to appear in IEEE Journal on Selected Areas in Information Theory (JSAIT), 2021.
  • [8] A. Evfimievski, J. Gehrke, and R. Srikant, "Limiting privacy breaches in privacy preserving data mining," in Proc. ACM Symp. Principles of Database Systems (PODS). ACM, 2003, pp. 211–222.
  • [9] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, "What can we learn privately?" SIAM J. Comput., vol. 40, no. 3, pp. 793–826, Jun. 2011.
  • [10] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, "Local privacy, data processing inequalities, and statistical minimax rates," in Proc. Symp. Foundations of Computer Science, 2013, pp. 429–438. [Online]. Available: https://arxiv.org/abs/1302.3203
  • [11] M. Gaboardi, R. Rogers, and O. Sheffet, "Locally private mean estimation: z-test and tight confidence intervals," in Proc. Machine Learning Research, 2019, pp. 2545–2554.
  • [12] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, and R. Rogers, "Protection against reconstruction and its applications in private federated learning," arXiv 1812.00984, 2018.
  • [13] P. Kairouz, S. Oh, and P. Viswanath, "Extremal mechanisms for local differential privacy," Journal of Machine Learning Research, vol. 17, no. 17, pp. 1–51, 2016.
  • [14] L. P. Barnes, W. N. Chen, and A. Özgür, "Fisher information under local differential privacy," IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 3, pp. 645–659, 2020.
  • [15] J. Acharya, C. L. Canonne, and H. Tyagi, "Inference under information constraints I: Lower bounds from chi-square contraction," IEEE Transactions on Information Theory, vol. 66, no. 12, pp. 7835–7855, 2020.
  • [16] M. Ye and A. Barg, "Optimal schemes for discrete distribution estimation under locally differential privacy," IEEE Trans. Inf. Theory, vol. 64, no. 8, pp. 5662–5676, 2018.
  • [17] D. Wang and J. Xu, "On sparse linear regression in the local differential privacy model," IEEE Trans. Inf. Theory, pp. 1–1, 2020.
  • [18] A. Rohde and L. Steinberger, "Geometrizing rates of convergence under local differential privacy constraints," Ann. Statist., vol. 48, no. 5, pp. 2646–2670, 2020.
  • [19] P. Kairouz, K. Bonawitz, and D. Ramage, "Discrete distribution estimation under local privacy," in Proc. Int. Conf. Machine Learning, vol. 48, 2016, pp. 2436–2444.
  • [20] J. Duchi and R. Rogers, "Lower bounds for locally private estimation via communication complexity," in Proc. Conference on Learning Theory, 2019, pp. 1161–1191.
  • [21] R. Ahlswede and P. Gács, "Spreading of sets in product spaces and hypercontraction of the Markov operator," Ann. Probab., vol. 4, no. 6, pp. 925–939, 1976.
  • [22] B. Yu, "Assouad, Fano, and Le Cam." Springer New York, 1997, pp. 423–435.
  • [23] A. B. Tsybakov, Introduction to Nonparametric Estimation, 1st ed. Springer Publishing Company, Incorporated, 2008.
  • [24] I. Csiszár, "Information-type measures of difference of probability distributions and indirect observations," Studia Sci. Math. Hungar., vol. 2, pp. 299–318, 1967.
  • [25] S. M. Ali and S. D. Silvey, "A general class of coefficients of divergence of one distribution from another," Journal of the Royal Statistical Society, vol. 28, pp. 131–142, 1966.
  • [26] N. Sharma and N. A. Warsi, "Fundamental bound on the reliability of quantum information transmission," CoRR, abs/1302.5281, 2013. [Online]. Available: http://arxiv.org/abs/1302.5281
  • [27] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
  • [28] B. Balle, G. Barthe, and M. Gaboardi, "Privacy amplification by subsampling: Tight analyses via couplings and divergences," in NeurIPS, 2018, pp. 6280–6290.
  • [29] B. Balle, G. Barthe, M. Gaboardi, and J. Geumlek, "Privacy amplification by mixing and diffusion mechanisms," in NeurIPS, 2019, pp. 13277–13287.
  • [30] S. Asoodeh, M. Diaz, and F. P. Calmon, "Privacy analysis of online learning algorithms via contraction coefficients," arXiv 2012.11035, 2020.
  • [31] R. L. Dobrushin, "Central limit theorem for nonstationary Markov chains. I," Theory Probab. Appl., vol. 1, no. 1, pp. 65–80, 1956.
  • [32] P. Del Moral, M. Ledoux, and L. Miclo, "On contraction properties of Markov kernels," Probab. Theory Relat. Fields, vol. 126, pp. 395–420, 2003.
  • [33] J. E. Cohen, Y. Iwasa, G. Rautu, M. Beth Ruskai, E. Seneta, and G. Zbaganu, "Relative entropy under mappings by stochastic matrices," Linear Algebra and its Applications, vol. 179, pp. 211–235, 1993.
  • [34] V. Anantharam, A. Gohari, S. Kamath, and C. Nair, "On hypercontractivity and a data processing inequality," in Proc. IEEE Int. Symp. Inf. Theory, 2014, pp. 3022–3026.
  • [35] Y. Polyanskiy and Y. Wu, "Strong data-processing inequalities for channels and Bayesian networks," in Convexity and Concentration, E. Carlen, M. Madiman, and E. M. Werner, Eds. New York, NY: Springer New York, 2017, pp. 211–249.
  • [36] Y. Polyanskiy and Y. Wu, "Dissipation of information in channels with input constraints," IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 35–55, Jan. 2016.
  • [37] F. P. Calmon, Y. Polyanskiy, and Y. Wu, "Strong data processing inequalities for input constrained additive noise channels," IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1879–1892, 2018.
  • [38] A. Makur and L. Zheng, "Comparison of contraction coefficients for $f$-divergences," Probl. Inf. Trans., vol. 56, pp. 103–156, 2020.
  • [39] M. Raginsky, "Strong data processing inequalities and $\phi$-Sobolev inequalities for discrete channels," IEEE Trans. Inf. Theory, vol. 62, no. 6, pp. 3355–3389, Jun. 2016.
  • [40] H. S. Witsenhausen, "On sequences of pairs of dependent random variables," SIAM Journal on Applied Mathematics, vol. 28, no. 1, pp. 100–113, 1975.
  • [41] S. L. Warner, "Randomized response: A survey technique for eliminating evasive answer bias," Journal of the American Statistical Association, vol. 60, no. 309, pp. 63–69, 1965.
  • [42] J. Cohen, J. Kemperman, and G. Zbăganu, Comparisons of Stochastic Matrices, with Applications in Information Theory, Economics, and Population Sciences. Birkhäuser, 1998.
  • [43] Y. Yang and A. Barron, "Information-theoretic determination of minimax rates of convergence," Ann. Statist., vol. 27, no. 5, pp. 1564–1599, 1999.
  • [44] A. Xu and M. Raginsky, "Converses for distributed estimation via strong data processing inequalities," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2015, pp. 2376–2380.
  • [45] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
  • [46] A. McGregor, I. Mironov, T. Pitassi, O. Reingold, K. Talwar, and S. Vadhan, "The limits of two-party differential privacy," in Proc. 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS '10), 2010, pp. 81–90.
  • [47] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.

We begin with some alternative expressions for the $\mathsf{E}_{\gamma}$-divergence that are useful in the subsequent proofs. It is straightforward to show that, for any $\gamma\geq 0$, we have

$\mathsf{E}_{\gamma}(P\|Q) = \frac{1}{2}\int|\mathrm{d}P-\gamma\,\mathrm{d}Q|-\frac{1}{2}|1-\gamma|$ (22)
$= \sup_{A\subset\mathcal{X}}\left[P(A)-\gamma Q(A)\right]-(1-\gamma)_{+}$ (23)
$= P\Big(\log\frac{\mathrm{d}P}{\mathrm{d}Q}>\log\gamma\Big)-\gamma\,Q\Big(\log\frac{\mathrm{d}P}{\mathrm{d}Q}>\log\gamma\Big)-(1-\gamma)_{+}.$ (24)

The proof of Theorem 1 relies on the following theorem, recently proved by the authors in [30, Theorem 3].

Theorem 3.

For any $\gamma\geq 1$ and any Markov kernel $\mathsf{K}$ with input alphabet $\mathcal{X}$, we have

$\eta_{\gamma}(\mathsf{K})=\sup_{x,x'\in\mathcal{X}}\mathsf{E}_{\gamma}(\mathsf{K}(\cdot|x)\|\mathsf{K}(\cdot|x')).$ (25)

Notice that for $\gamma=1$, this theorem reduces to the well-known result of Dobrushin [31], which states that

$\eta_{\mathsf{TV}}(\mathsf{K})=\sup_{x,x'\in\mathcal{X}}\mathsf{TV}(\mathsf{K}(\cdot|x),\mathsf{K}(\cdot|x')).$ (26)
Proof of TheoremΒ 1.

It follows from Theorem 3 that

$\eta_{e^{\varepsilon}}(\mathsf{K})\leq\delta\ \Longleftrightarrow\ \sup_{x,x'\in\mathcal{X}}\mathsf{E}_{e^{\varepsilon}}(\mathsf{K}(\cdot|x)\|\mathsf{K}(\cdot|x'))\leq\delta,$ (27)

which, according to (23), implies

$\eta_{e^{\varepsilon}}(\mathsf{K})\leq\delta\ \Longleftrightarrow\ \sup_{x,x'\in\mathcal{X}}\sup_{A\subset\mathcal{Z}}\left[\mathsf{K}(A|x)-e^{\varepsilon}\mathsf{K}(A|x')\right]\leq\delta.$

Hence, in light of Definition 1, $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP if and only if $\eta_{e^{\varepsilon}}(\mathsf{K})\leq\delta$.

∎

Proof of LemmaΒ 1.

We first show the following upper and lower bounds on the $\mathsf{E}_{\gamma}$-divergence in terms of the total variation distance.

Claim. For any distributions $P$ and $Q$ on $\mathcal{X}$ and any $\gamma\geq 1$, we have

$1-\gamma\left(1-\mathsf{TV}(P,Q)\right)\leq\mathsf{E}_{\gamma}(P\|Q)\leq\mathsf{TV}(P,Q).$ (28)
Proof of Claim.

The upper bound is immediate from the definition of the $\mathsf{E}_{\gamma}$-divergence (and holds for any $\gamma\geq 0$). For the lower bound, note that

$\gamma\,\mathsf{TV}(P,Q) = \max_{A\subset\mathcal{X}}\left[\gamma P(A)-\gamma Q(A)\right]$
$= \max_{A\subset\mathcal{X}}\left[P(A)-\gamma Q(A)+(\gamma-1)P(A)\right]$
$\leq \max_{A\subset\mathcal{X}}\left[P(A)-\gamma Q(A)\right]+(\gamma-1)$
$= \mathsf{E}_{\gamma}(P\|Q)+\gamma-1,$

where the last equality follows from (23). This immediately yields the lower bound in (28). ∎

According to this claim, we can write, for $\gamma\geq 1$,

$\mathsf{TV}(P,Q)\leq 1-\frac{1-\mathsf{E}_{\gamma}(P\|Q)}{\gamma}.$

Replacing $P$ and $Q$ with $\mathsf{K}(\cdot|x)$ and $\mathsf{K}(\cdot|x')$, respectively, for some $x$ and $x'$ in $\mathcal{X}$, we obtain

$\mathsf{TV}(\mathsf{K}(\cdot|x),\mathsf{K}(\cdot|x'))\leq 1-\frac{1-\mathsf{E}_{\gamma}(\mathsf{K}(\cdot|x)\|\mathsf{K}(\cdot|x'))}{\gamma}.$

Taking the supremum over $x$ and $x'$ on both sides and invoking Theorem 3 and (26), we conclude that

$\eta_{\mathsf{TV}}(\mathsf{K})\leq 1-\frac{1-\eta_{\gamma}(\mathsf{K})}{\gamma}.$ (29)

It is known [33, 39] that for any Markov kernel $\mathsf{K}$ and any convex function $f$ we have

$\eta_{f}(\mathsf{K})\leq\eta_{\mathsf{TV}}(\mathsf{K}),$ (30)

from which the desired result follows immediately: taking $\gamma=e^{\varepsilon}$ and $\eta_{\gamma}(\mathsf{K})\leq\delta$ in (29) gives $\eta_{f}(\mathsf{K})\leq\eta_{\mathsf{TV}}(\mathsf{K})\leq 1-(1-\delta)e^{-\varepsilon}=\varphi(\varepsilon,\delta)$. ∎

Proof of LemmaΒ 2.

Given $n$ mechanisms $\mathsf{K}_{1},\mathsf{K}_{2},\dots,\mathsf{K}_{n}$, we consider the non-interactive mechanism $P_{Z^{n}|X^{n}}$ given by

$P_{Z^{n}|X^{n}}(z^{n}|x^{n})=\prod_{i=1}^{n}\mathsf{K}_{i}(z_{i}|x_{i}).$

If π–ͺiβˆˆπ’¬Ξ΅,Ξ΄{\mathsf{K}}_{i}\in{\mathcal{Q}}_{\varepsilon,\delta} for i∈[n]i\in[n], then we have Ξ·eΡ​(π–ͺi)≀δ\eta_{e^{\varepsilon}}({\mathsf{K}}_{i})\leq\delta. According to (29), it thus leads to η𝖳𝖡​(π–ͺi)≀φ​(Ξ΅,Ξ΄)\eta_{\mathsf{TV}}({\mathsf{K}}_{i})\leq\varphi(\varepsilon,\delta). Invoking [35, Corollary 9] (see also [44, Lemma 3] and [38, Eq. (62)]), we obtain

$\eta_{\mathsf{TV}}(P_{Z^{n}|X^{n}}) \leq \max_{i\in[n]}\left[1-(1-\eta_{\mathsf{TV}}(\mathsf{K}_{i}))^{n}\right] \leq 1-(1-\varphi(\varepsilon,\delta))^{n} = \varphi_{n}(\varepsilon,\delta),$

and the claim follows since $\eta_{f}(P_{Z^{n}|X^{n}})\leq\eta_{\mathsf{TV}}(P_{Z^{n}|X^{n}})$ by (30).

∎

Proof of LemmaΒ 3.

Recall that $X^{n}$ is an i.i.d. sample from a distribution $P$ and each $Z_{i}$, $i\in[n]$, is obtained by applying $\mathsf{K}_{i}$ to $X_{i}$. Note that, by assumption, $\mathsf{K}_{i}$ specifies the conditional distribution $P_{Z_{i}|X_{i},Z^{i-1}}$. Let $M^{n}_{0}$ and $M^{n}_{1}$ denote the distributions of $Z^{n}$ when $P=P_{0}$ and $P=P_{1}$, respectively. Then, for $P=P_{0}$ and any $z^{n}\in\mathcal{Z}^{n}$,

$M^{n}_{0}(z^{n}) = \prod_{i=1}^{n}P_{Z_{i}|Z^{i-1}}(z_{i}|z^{i-1})$ (31)
$= \prod_{i=1}^{n}\left(P_{X_{i}|Z^{i-1}=z^{i-1}}\mathsf{K}_{i}\right)(z_{i})$ (32)
$= \prod_{i=1}^{n}\left(P_{0}\mathsf{K}_{i}\right)(z_{i}).$ (33)

With this in mind, we can write

$\mathsf{TV}^{2}(M^{n}_{0},M^{n}_{1}) \overset{(a)}{\leq} \frac{1}{2}D_{\mathsf{KL}}(M^{n}_{0}\|M^{n}_{1})$ (34)
$\overset{(b)}{=} \frac{1}{2}\sum_{i=1}^{n}D_{\mathsf{KL}}(P_{0}\mathsf{K}_{i}\|P_{1}\mathsf{K}_{i})$ (35)
$\overset{(c)}{\leq} \frac{1}{2}\sum_{i=1}^{n}\varphi(\varepsilon,\delta)D_{\mathsf{KL}}(P_{0}\|P_{1}),$ (36)

where $(a)$ follows from Pinsker's inequality, $(b)$ is due to the chain rule of the KL divergence, and $(c)$ is an application of Lemma 1. Plugging (36) into (6), we obtain the desired result. ∎
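Step $(c)$ can be probed numerically: by (30), $\eta_{\mathsf{KL}}({\mathsf{K}}_{i})\leq\eta_{\mathsf{TV}}({\mathsf{K}}_{i})$, so the per-coordinate contraction $D_{\mathsf{KL}}(P_{0}{\mathsf{K}}_{i}\|P_{1}{\mathsf{K}}_{i})\leq\eta_{\mathsf{TV}}({\mathsf{K}}_{i})\,D_{\mathsf{KL}}(P_{0}\|P_{1})$ is checkable directly, as in the following sketch (helper names and the randomized-response example are ours).

```python
# Sketch: D_KL(P0 K || P1 K) <= eta_TV(K) * D_KL(P0 || P1), the SDPI behind step (c).
import numpy as np
from itertools import combinations

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def eta_tv(K):
    return max(0.5 * np.abs(p - q).sum() for p, q in combinations(K, 2))

rng = np.random.default_rng(1)
k, eps = 4, 0.5
P0, P1 = rng.dirichlet(np.ones(k)), rng.dirichlet(np.ones(k))
K = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))   # k-ary randomized response
np.fill_diagonal(K, np.exp(eps) / (np.exp(eps) + k - 1))

lhs, rhs = kl(P0 @ K, P1 @ K), eta_tv(K) * kl(P0, P1)
assert lhs <= rhs + 1e-12
print(f"{lhs:.5f} <= {rhs:.5f}")
```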

Proof of Corollary 1.

Fix $\omega\in(0,1]$ and consider two distributions $P_{0}$ and $P_{1}$ on $\{-\omega^{-\frac{1}{k}},0,\omega^{-\frac{1}{k}}\}$ defined as

$$P_{0}(-\omega^{-\frac{1}{k}})=\omega,\qquad P_{0}(0)=1-\omega,$$

and

$$P_{1}(\omega^{-\frac{1}{k}})=\omega,\qquad P_{1}(0)=1-\omega.$$

It can be verified that both $P_{0}$ and $P_{1}$ belong to ${\mathcal{P}}_{k}$. Note that $\ell_{2}^{2}(\theta(P_{0}),\theta(P_{1}))=2\omega^{\frac{2(k-1)}{k}}$. Let $M_{0}^{n}=P^{\otimes n}_{0}{\mathsf{K}}^{n}$ and $M_{1}^{n}=P^{\otimes n}_{1}{\mathsf{K}}^{n}$ be the corresponding output distributions of the mechanism ${\mathsf{K}}^{n}={\mathsf{K}}_{1}\dots{\mathsf{K}}_{n}$, the composition of the mechanisms ${\mathsf{K}}_{i}$. Le Cam's bound for the $\ell_{2}^{2}$-metric yields

$$\begin{aligned}\mathcal{R}_{n}({\mathcal{P}}_{k},\ell_{2}^{2},\varepsilon,\delta) &\geq \omega^{\frac{2(k-1)}{k}}\left(1-{\mathsf{TV}}(M^{n}_{0},M^{n}_{1})\right)\\ &\geq \omega^{\frac{2(k-1)}{k}}\left(1-H(M^{n}_{0},M^{n}_{1})\right), \qquad (37)\end{aligned}$$

where the last inequality follows from the fact that ${\mathsf{TV}}(P,Q)\leq H(P,Q)$, with $H(P,Q)$ denoting the Hellinger distance. Notice that $M_{0}^{n}=\prod_{i=1}^{n}(P_{0}{\mathsf{K}}_{i})$ and $M_{1}^{n}=\prod_{i=1}^{n}(P_{1}{\mathsf{K}}_{i})$, where each ${\mathsf{K}}_{i}$, $i\in[n]$, is $(\varepsilon,\delta)$-LDP. It is well known that

$$H^{2}\left(\prod_{i=1}^{n}P_{i},\prod_{i=1}^{n}Q_{i}\right)=2-2\prod_{i=1}^{n}\left(1-\frac{1}{2}H^{2}(P_{i},Q_{i})\right).$$

Thus, applying this identity together with the contraction of the squared Hellinger distance from Lemma 1,

$$\begin{aligned}H^{2}(M_{0}^{n},M_{1}^{n}) &= 2-2\prod_{i=1}^{n}\left(1-\frac{1}{2}H^{2}(P_{0}{\mathsf{K}}_{i},P_{1}{\mathsf{K}}_{i})\right)\\ &\leq 2-2\prod_{i=1}^{n}\left(1-\frac{\varphi(\varepsilon,\delta)}{2}H^{2}(P_{0},P_{1})\right)\\ &= 2-2\left(1-\frac{\varphi(\varepsilon,\delta)}{2}H^{2}(P_{0},P_{1})\right)^{n}\\ &= 2-2\left(1-\omega\varphi(\varepsilon,\delta)\right)^{n}, \qquad (38)\end{aligned}$$

where the last equality uses $H^{2}(P_{0},P_{1})=2\omega$. Hence, we obtain

$${\mathsf{TV}}(M_{0}^{n},M_{1}^{n})\leq\sqrt{2-2\left(1-\omega\varphi(\varepsilon,\delta)\right)^{n}}. \qquad (39)$$

Plugging (38) into (37), we obtain

$$\mathcal{R}_{n}({\mathcal{P}}_{k},\ell_{2}^{2},\varepsilon,\delta)\geq\omega^{\frac{2(k-1)}{k}}\left[1-\sqrt{2}\sqrt{1-(1-\omega\varphi(\varepsilon,\delta))^{n}}\right]. \qquad (40)$$

Now, choose $\omega=\min\left\{1,\frac{1}{\varphi(\varepsilon,\delta)}\left[1-\left(\frac{7}{8}\right)^{\frac{1}{\sqrt{n}}}\right]\right\}$. Notice that we assume $\delta>0$, and hence $\varphi(\varepsilon,\delta)>0$ regardless of $\varepsilon$. Plugging this choice of $\omega$ into the above bound, we obtain

$$\begin{aligned}\mathcal{R}_{n}({\mathcal{P}}_{k},\ell_{2}^{2},\varepsilon,\delta) &\gtrsim (\varphi(\varepsilon,\delta))^{-\frac{2(k-1)}{k}}\left[1-\left(\frac{7}{8}\right)^{\frac{1}{\sqrt{n}}}\right]^{\frac{2(k-1)}{k}}\\ &\gtrsim (\varphi(\varepsilon,\delta))^{-\frac{2(k-1)}{k}}\,n^{-\frac{k-1}{k}}. \qquad (41)\end{aligned}$$

∎
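The only non-elementary ingredient in this proof is the tensorization identity for the squared Hellinger distance displayed before (38); a short numerical confirmation (helper names ours) follows.

```python
# Sketch: H^2(prod P_i, prod Q_i) = 2 - 2 * prod_i (1 - H^2(P_i, Q_i)/2).
import numpy as np
from functools import reduce

def h2(p, q):
    # Squared Hellinger distance between discrete distributions p and q.
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(2)
pairs = [(rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))) for _ in range(4)]

P = reduce(np.kron, [p for p, _ in pairs])   # product distribution via Kronecker products
Q = reduce(np.kron, [q for _, q in pairs])

lhs = h2(P, Q)
rhs = 2 - 2 * np.prod([1 - h2(p, q) / 2 for p, q in pairs])
assert abs(lhs - rhs) < 1e-9
print(f"{lhs:.6f} == {rhs:.6f}")
```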

Proof of Lemma 4.

Note that we have the Markov chain $V - X^{n} - Z^{n}$. It has been shown in [47, Problem 15.12] (see also [34, 47]) that for any channel $P_{B|A}$ connecting a random variable $A$ to $B$, we have

Ξ·π–ͺ𝖫​(PB|A)=supPA​U:U⊸--A⊸--BI​(U;B)I​(U;A).\eta_{\mathsf{KL}}(P_{B|A})=\sup_{\begin{subarray}{c}P_{AU}:\\ U\mathrel{\multimap}\joinrel\mathrel{-}\mspace{-9.0mu}\joinrel\mathrel{-}A\mathrel{\multimap}\joinrel\mathrel{-}\mspace{-9.0mu}\joinrel\mathrel{-}B\end{subarray}}\frac{I(U;B)}{I(U;A)}. (42)

Replacing $A$ and $B$ with $X^{n}$ and $Z^{n}$, respectively, in the above equation, we obtain

$$\begin{aligned}I(Z^{n};V) &\leq \eta_{\mathsf{KL}}({\mathsf{K}}^{\otimes n})\,I(X^{n};V)\\ &\leq \eta_{\mathsf{KL}}({\mathsf{K}}^{\otimes n})\,\frac{n}{|{\mathcal{V}}|}\sum_{v\in{\mathcal{V}}}D_{\mathsf{KL}}(P_{v}\|\bar{P}),\end{aligned}$$

where π–ͺβŠ—n=PZn|Xn{\mathsf{K}}^{\otimes n}=P_{Z^{n}|X^{n}} and PΒ―=1|𝒱|β€‹βˆ‘vβˆˆπ’±Pv\bar{P}=\frac{1}{|{\mathcal{V}}|}\sum_{v\in{\mathcal{V}}}P_{v}. The desired result then follows from LemmaΒ 2 and the convexity of KL-divergence. ∎

Proof of Corollary 2.

The proof strategy is as follows: we first construct a set of probability distributions $\{P_{v}\}$ for $v$ taking values in a finite set ${\mathcal{V}}$, and then apply Fano's inequality (9), where $V$ is a uniform random variable on ${\mathcal{V}}$. Duchi et al. [10, Lemma 6] showed that, for any integer $k\in[d]$, there exists a subset ${\mathcal{V}}_{k}$ of the $k$-dimensional hypercube $\{-1,+1\}^{k}$ satisfying $\|v-v^{\prime}\|_{1}\geq\frac{k}{2}$ for all $v,v^{\prime}\in{\mathcal{V}}_{k}$ with $v\neq v^{\prime}$, while $|{\mathcal{V}}_{k}|$ is at least $\lceil e^{\frac{k}{16}}\rceil$. If $k<d$, one can extend ${\mathcal{V}}_{k}\subset\mathbb{R}^{k}$ to a subset of $\mathbb{R}^{d}$ by considering ${\mathcal{V}}={\mathcal{V}}_{k}\times\{0\}^{d-k}$. Fix $\omega\in(0,1)$ and define a distribution $P_{v}\in{\mathcal{P}}(\mathsf{B}^{d}_{2}(r))$ for $v\in{\mathcal{V}}$ as follows: choose an index $j\in[k]$ uniformly and set $P_{v}(r\mathsf{e}_{j})=\frac{1+\omega v_{j}}{2}$ and $P_{v}(-r\mathsf{e}_{j})=\frac{1-\omega v_{j}}{2}$, where $\mathsf{e}_{j}$ is the $j$-th standard basis vector in $\mathbb{R}^{d}$. Given $v\in{\mathcal{V}}_{k}$, let $X\sim P_{v}$ be a random variable taking values in $\{\pm r\mathsf{e}_{j}\}_{j=1}^{k}$ and let $X^{n}$ be an i.i.d. sample of $X$. Furthermore, as before, let $Z^{n}$ be a privatized sample of $X^{n}$ obtained by ${\mathsf{K}}^{\otimes n}$, with ${\mathsf{K}}$ being an $(\varepsilon,\delta)$-LDP mechanism. To apply Fano's inequality, we first need to bound $I(Z^{n};V)$. According to Lemma 4, we have

$$\begin{aligned}I(Z^{n};V) &\leq \varphi_{n}(\varepsilon,\delta)\,I(X^{n};V)\\ &\leq \varphi_{n}(\varepsilon,\delta)\,\frac{n}{|{\mathcal{V}}_{k}|^{2}}\sum_{v,v^{\prime}}D_{\mathsf{KL}}(P_{v}\|P_{v^{\prime}}). \qquad (43)\end{aligned}$$

Hence, bounding $I(Z^{n};V)$ reduces to bounding $I(X^{n};V)$. To this end, first notice that $I(X^{n};V)\leq nI(X;V)$. Let $K$ be a uniform random variable on $[k]$, independent of $V$, that chooses the coordinate of $V$. Note that $K$ can be determined by $X$, and hence

$$\begin{aligned}I(X;V) &= I(X,K;V)\\ &= I(X;V|K)\\ &\leq \log 2-h_{\mathsf{b}}\left(\frac{1-\omega}{2}\right)\\ &\leq \omega\log 2,\end{aligned}$$

where the last inequality follows from the fact that $h_{\mathsf{b}}(a)\geq 2a\log 2$ for $a\in[0,\frac{1}{2}]$, due to the concavity of entropy. Consequently, we can write

$$I(Z^{n};V)\leq n\omega\varphi_{n}(\varepsilon,\delta)\log 2. \qquad (44)$$

Applying Fano’s inequality, we obtain

$$\begin{aligned}\mathcal{R}_{n}({\mathcal{P}},\ell,\varepsilon,\delta) &\geq \frac{r^{2}\omega^{2}}{k}\left[1-\frac{(1+n\omega\varphi_{n}(\varepsilon,\delta))\log 2}{\log|{\mathcal{V}}|}\right] \qquad (45)\\ &\geq \frac{r^{2}\omega^{2}}{k}\left[1-\frac{16(1+n\omega\varphi_{n}(\varepsilon,\delta))\log 2}{k}\right]. \qquad (46)\end{aligned}$$

Setting $\omega=\min\{1,\frac{k}{50n\varphi_{n}(\varepsilon,\delta)}\}$ and assuming $k\geq 16$, we can write

$$\mathcal{R}_{n}({\mathcal{P}},\ell,\varepsilon,\delta)\gtrsim r^{2}\max_{k\in[d]}\min\left\{\frac{1}{k},\frac{k}{n^{2}\varphi^{2}_{n}(\varepsilon,\delta)}\right\}. \qquad (47)$$

By choosing $k=\min\{n\varphi_{n}(\varepsilon,\delta),d\}$, we obtain

$$\mathcal{R}_{n}({\mathcal{P}},\ell,\varepsilon,\delta)\gtrsim r^{2}\min\left\{\frac{1}{n\varphi_{n}(\varepsilon,\delta)},\frac{d}{n^{2}\varphi^{2}_{n}(\varepsilon,\delta)}\right\}. \qquad (48)$$

∎
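The one analytic inequality in this proof, $h_{\mathsf{b}}(a)\geq 2a\log 2$ on $[0,\frac{1}{2}]$, together with the resulting step $\log 2-h_{\mathsf{b}}(\frac{1-\omega}{2})\leq\omega\log 2$, is easy to confirm numerically; a short sketch (names ours) follows.

```python
# Sketch: verify h_b(a) >= 2a*log(2) on [0, 1/2] and the resulting bound on I(X;V|K).
import numpy as np

def hb(a):
    a = np.clip(a, 1e-12, 1 - 1e-12)
    return -a * np.log(a) - (1 - a) * np.log(1 - a)

a = np.linspace(0.0, 0.5, 10001)
assert np.all(hb(a) >= 2 * a * np.log(2) - 1e-9)

w = np.linspace(0.0, 1.0, 10001)
assert np.all(np.log(2) - hb((1 - w) / 2) <= w * np.log(2) + 1e-9)
print("both inequalities hold on the grid")
```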

Proof of Theorem 2.

Let $\hat{\Theta}=\Psi(Z^{n})$ be an estimate of $\Theta$ for some $\Psi$, and let $p_{\zeta}\coloneqq P_{\Theta\hat{\Theta}}(\ell(\Theta,\hat{\Theta})\leq\zeta)$ and $q_{\zeta}\coloneqq(P_{\Theta}P_{\hat{\Theta}})(\ell(\Theta,\hat{\Theta})\leq\zeta)$; i.e., $p_{\zeta}$ and $q_{\zeta}$ are the probabilities of the event $\{\ell(\Theta,\hat{\Theta})\leq\zeta\}$ under the joint and product distributions, respectively. By definition, we have for any $\gamma\geq 1$

$$\begin{aligned}I_{\gamma}(\Theta;\hat{\Theta}) &= {\mathsf{E}}_{\gamma}(P_{\Theta\hat{\Theta}}\|P_{\Theta}P_{\hat{\Theta}})\\ &= \sup_{A\subset{\mathcal{T}}\times{\mathcal{T}}}\left[P_{\Theta\hat{\Theta}}(A)-\gamma(P_{\Theta}P_{\hat{\Theta}})(A)\right]\\ &\geq p_{\zeta}-\gamma q_{\zeta}\\ &\geq p_{\zeta}-\gamma{\mathcal{L}}(\zeta),\end{aligned}$$

where the last inequality follows from the fact that $q_{\zeta}\leq{\mathcal{L}}(\zeta)$, which can be shown as follows:

$$\begin{aligned}q_{\zeta} &= \int_{{\mathcal{T}}}\int_{{\mathcal{T}}}1_{\{\ell(\theta,\hat{\theta})\leq\zeta\}}\,P_{\Theta}(\mathrm{d}\theta)P_{\hat{\Theta}}(\mathrm{d}\hat{\theta})\\ &\leq \sup_{t\in{\mathcal{T}}}\int_{{\mathcal{T}}}1_{\{\ell(\theta,t)\leq\zeta\}}\,P_{\Theta}(\mathrm{d}\theta)\\ &= {\mathcal{L}}(\zeta).\end{aligned}$$

Recalling that $\Pr(\ell(\Theta,\hat{\Theta})>\zeta)=1-p_{\zeta}$, the above thus implies

$$\Pr(\ell(\Theta,\hat{\Theta})>\zeta)\geq 1-I_{\gamma}(\Theta;\hat{\Theta})-\gamma{\mathcal{L}}(\zeta). \qquad (49)$$

Since, by Markov's inequality, ${\mathbb{E}}[\ell(\Theta,\hat{\Theta})]\geq\zeta\Pr(\ell(\Theta,\hat{\Theta})\geq\zeta)$, setting $\gamma=e^{\varepsilon}$ we can write

Rnπ–‘π–Ίπ—’π–Ύπ—Œβ€‹(PΘ,β„“,Ξ΅,Ξ΄)\displaystyle R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta) β‰₯΢​[1βˆ’IeΡ​(Θ;Θ^)βˆ’eΡ​ℒ​(ΞΆ)]\displaystyle\geq\zeta\left[1-I_{e^{\varepsilon}}(\Theta;\hat{\Theta})-e^{\varepsilon}{\mathcal{L}}(\zeta)\right]
β‰₯΢​[1βˆ’IeΡ​(Θ;Zn)βˆ’eΡ​ℒ​(ΞΆ)],\displaystyle\geq\zeta\left[1-I_{e^{\varepsilon}}(\Theta;Z^{n})-e^{\varepsilon}{\mathcal{L}}(\zeta)\right],

where the second inequality comes from the data processing inequality for $I_{\gamma}$. To further lower bound the right-hand side, we write

$$\begin{aligned}I_{e^{\varepsilon}}(\Theta;Z^{n}) &= \int_{{\mathcal{T}}}{\mathsf{E}}_{e^{\varepsilon}}(P_{Z^{n}|\Theta=\theta}\|P_{Z^{n}})\,P_{\Theta}(\mathrm{d}\theta)\\ &\leq \eta_{e^{\varepsilon}}(P_{Z^{n}|X^{n}})\int_{{\mathcal{T}}}{\mathsf{E}}_{e^{\varepsilon}}(P_{X^{n}|\Theta=\theta}\|P_{X^{n}})\,P_{\Theta}(\mathrm{d}\theta)\\ &= \eta_{e^{\varepsilon}}(P_{Z^{n}|X^{n}})\,I_{e^{\varepsilon}}(\Theta;X^{n}),\end{aligned}$$

where the inequality follows from the definition of the contraction coefficient. When $n=1$, we have $\eta_{e^{\varepsilon}}({\mathsf{K}})\leq\delta$, as ${\mathsf{K}}=P_{Z|X}$ is assumed to be $(\varepsilon,\delta)$-LDP. For $n>1$, we invoke Lemma 2 to obtain $\eta_{e^{\varepsilon}}(P_{Z^{n}|X^{n}})\leq\varphi_{n}(\varepsilon,\delta)$. ∎
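The first chain of inequalities in this proof is finite-dimensional and can be sanity-checked on a toy example: compute $I_{\gamma}(\Theta;\hat{\Theta})$ exactly for a small joint distribution and compare it with $p_{\zeta}-\gamma q_{\zeta}$ and with (49), as in the sketch below (the joint distribution, loss, and names are ours).

```python
# Sketch: verify (49) on a toy joint distribution of (Theta, Theta_hat)
# with loss l(t, t') = |t - t'| and zeta = 0.
import numpy as np

rng = np.random.default_rng(3)
J = rng.random((3, 3)); J /= J.sum()       # joint distribution P_{Theta, Theta_hat}
pt, pth = J.sum(axis=1), J.sum(axis=0)     # marginals
gamma, zeta = np.exp(0.5), 0.0

# I_gamma = E_gamma(joint || product): sum of positive parts of J - gamma * product.
I_gamma = np.maximum(J - gamma * np.outer(pt, pth), 0.0).sum()

event = np.abs(np.subtract.outer(np.arange(3), np.arange(3))) <= zeta
p_z = J[event].sum()                       # P(l <= zeta) under the joint
q_z = np.outer(pt, pth)[event].sum()       # P(l <= zeta) under the product
L_z = pt.max()                             # small-ball function L(zeta) at zeta = 0

assert I_gamma >= p_z - gamma * q_z >= p_z - gamma * L_z
assert 1 - p_z >= 1 - I_gamma - gamma * L_z          # this is exactly (49)
print(f"I_gamma = {I_gamma:.4f}, p - gamma*q = {p_z - gamma * q_z:.4f}")
```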

Proof of Corollary 4.

Let $\beta_{n}(\alpha)\coloneqq\beta^{\infty,1}_{n}(\alpha)$ be the non-private trade-off between the type I and type II error probabilities (i.e., $Z^{n}=X^{n}$). According to the Chernoff-Stein lemma (see, e.g., [45, Theorem 11.8.3]), we have

$$\lim_{n\to\infty}\frac{1}{n}\log\beta_{n}(\alpha)=-D_{\mathsf{KL}}(P_{0}\|P_{1}). \qquad (50)$$

Assume now that $Z^{n}$ is the output of ${\mathsf{K}}^{\otimes n}$ for an $(\varepsilon,\delta)$-LDP mechanism ${\mathsf{K}}$. According to (50), we obtain

$$\lim_{n\to\infty}\frac{1}{n}\log\beta^{\varepsilon,\delta}_{n}(\alpha)=-\sup_{{\mathsf{K}}\in{\mathcal{Q}}_{\varepsilon,\delta}}D_{\mathsf{KL}}(P_{0}{\mathsf{K}}\|P_{1}{\mathsf{K}}). \qquad (51)$$

Applying Lemma 1, we obtain the desired result. ∎

Proof of Corollary 5.

Consider the (trivial) Markov chain $X - X - Z$. According to (42), we can write $I(X;Z)\leq\eta_{\mathsf{KL}}({\mathsf{K}})H(X)$. The desired result then immediately follows from Lemma 1. ∎
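This bound, too, admits a quick numerical check: since $\eta_{\mathsf{KL}}({\mathsf{K}})\leq\eta_{\mathsf{TV}}({\mathsf{K}})$ by (30), it suffices to verify the weaker inequality $I(X;Z)\leq\eta_{\mathsf{TV}}({\mathsf{K}})H(X)$, as in the sketch below (the source, mechanism, and names are ours).

```python
# Sketch: I(X;Z) <= eta_TV(K) * H(X), a weaker form of Corollary 5 via (30).
import numpy as np
from itertools import combinations

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def eta_tv(K):
    return max(0.5 * np.abs(p - q).sum() for p, q in combinations(K, 2))

rng = np.random.default_rng(4)
k, eps = 5, 1.0
px = rng.dirichlet(np.ones(k))             # source distribution of X
K = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))   # k-ary randomized response
np.fill_diagonal(K, np.exp(eps) / (np.exp(eps) + k - 1))

pz = px @ K
mi = entropy(px) + entropy(pz) - entropy((px[:, None] * K).ravel())
assert mi <= eta_tv(K) * entropy(px) + 1e-12
print(f"I(X;Z) = {mi:.4f} <= {eta_tv(K) * entropy(px):.4f}")
```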