
Local Differential Privacy Is Equivalent to Contraction of $\mathsf{E}_{\gamma}$-Divergence

Shahab Asoodeh†, Maryam Aliakbarpour∗, and Flavio P. Calmon†
†Harvard University, ∗University of Massachusetts Amherst
Abstract

We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $\mathsf{E}_{\gamma}$-divergence. We then use this equivalent formulation to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergences. When combined with standard estimation-theoretic tools (such as Le Cam's and Fano's converse methods), this result allows us to study the trade-off between privacy and utility in several hypothesis testing and minimax and Bayesian estimation problems.

I Introduction

A major challenge in modern machine learning applications is balancing statistical efficiency with the privacy of individuals from whom data is obtained. In such applications, privacy is often quantified in terms of Differential Privacy (DP) [1]. DP has several variants, including approximate DP [2], Rényi DP [3], and others [4, 5, 6, 7]. Arguably, the most stringent flavor of DP is local differential privacy (LDP) [8, 9]. Intuitively, a randomized mechanism (or a Markov kernel) is said to be locally differentially private if its output does not vary significantly with arbitrary perturbations of the input.

More precisely, a mechanism is said to be $\varepsilon$-LDP (or pure LDP) if the privacy loss random variable, defined as the log-likelihood ratio of the output for any two different inputs, is smaller than $\varepsilon$ with probability one. One can also consider an approximate variant of this constraint: $\mathsf{K}$ is said to be $(\varepsilon,\delta)$-LDP if the privacy loss random variable does not exceed $\varepsilon$ with probability at least $1-\delta$ (see Definition 1 for the formal definition).

The study of statistical efficiency under LDP constraints has gained considerable traction, e.g., [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 8]. Almost all of these works consider $\varepsilon$-LDP and provide meaningful bounds only for sufficiently small values of $\varepsilon$ (i.e., the high-privacy regime). For instance, Duchi et al. [10] studied minimax estimation problems under $\varepsilon$-LDP constraints and showed that for $\varepsilon\leq 1$, the price of privacy is to reduce the effective sample size from $n$ to $\varepsilon^{2}n$. A slightly improved version of this result appeared in [19, 13]. More recently, Duchi and Rogers [20] developed a framework based on the strong data processing inequality (SDPI) [21] and derived lower bounds for minimax estimation risk under $\varepsilon$-LDP that hold for any $\varepsilon\geq 0$.

In this work, we develop an SDPI-based framework for studying hypothesis testing and estimation problems under $(\varepsilon,\delta)$-LDP, extending the results of [20] to approximate LDP. In particular, we derive bounds for both the minimax and Bayesian estimation risks that hold for any $\varepsilon\geq 0$ and $\delta\geq 0$. Interestingly, when setting $\delta=0$, our bounds can be slightly stronger than [10].

Our main mathematical tool is an equivalent expression for LDP in terms of the $\mathsf{E}_{\gamma}$-divergence. Given $\gamma\geq 1$, the $\mathsf{E}_{\gamma}$-divergence between two distributions $P$ and $Q$ is defined as

$\mathsf{E}_{\gamma}(P\|Q)\coloneqq\frac{1}{2}\int|\mathrm{d}P-\gamma\,\mathrm{d}Q|-\frac{1}{2}(\gamma-1).$ (1)
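
On a finite alphabet the integral in (1) is a sum, and for $\gamma\geq 1$ it coincides with $\sum_{x}\max\{P(x)-\gamma Q(x),0\}$. The following minimal Python sketch (our own illustration; the helper name `E_gamma` is not from the paper) checks this equivalence and that $\mathsf{E}_{1}$ recovers the total variation distance.

```python
import numpy as np

def E_gamma(P, Q, gamma):
    """E_gamma(P || Q) from (1) on a finite alphabet."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    return 0.5 * np.abs(P - gamma * Q).sum() - 0.5 * (gamma - 1.0)

P, Q = np.array([0.3, 0.7]), np.array([0.6, 0.4])
gamma = 1.2

# Positive-part form agrees with (1), and E_1 is the total variation distance.
assert np.isclose(E_gamma(P, Q, gamma), np.maximum(P - gamma * Q, 0.0).sum())
assert np.isclose(E_gamma(P, Q, 1.0), 0.5 * np.abs(P - Q).sum())
print(E_gamma(P, Q, gamma))  # 0.22
```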

We show that a mechanism $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP if and only if

$\mathsf{E}_{\gamma}(P\mathsf{K}\|Q\mathsf{K})\leq\delta\,\mathsf{E}_{\gamma}(P\|Q)$

for $\gamma=e^{\varepsilon}$ and any pair of distributions $(P,Q)$, where $P\mathsf{K}$ denotes the output distribution of $\mathsf{K}$ when the input distribution is $P$. Thus, the approximate LDP guarantee of a mechanism can be fully characterized by its contraction under the $\mathsf{E}_{\gamma}$-divergence. When combined with standard statistical techniques, including Le Cam's and Fano's methods [22, 23], $\mathsf{E}_{\gamma}$-contraction leads to general lower bounds for the minimax and Bayesian risks under $(\varepsilon,\delta)$-LDP for any $\varepsilon\geq 0$ and $\delta\in[0,1]$. In particular, we show that the price of privacy in this case is to reduce the sample size from $n$ to $n[1-e^{-\varepsilon}(1-\delta)]$.

There exist several results connecting pure LDP to the contraction properties of the KL divergence $D_{\mathsf{KL}}$ and the total variation distance $\mathsf{TV}$. For instance, for any $\varepsilon$-LDP mechanism $\mathsf{K}$, it is shown in [10, Theorem 1] that $D_{\mathsf{KL}}(P\mathsf{K}\|Q\mathsf{K})\leq 2(e^{\varepsilon}-1)^{2}\mathsf{TV}^{2}(P,Q)$ and in [13, Theorem 6] that $\mathsf{TV}(P\mathsf{K},Q\mathsf{K})\leq\frac{e^{\varepsilon}-1}{e^{\varepsilon}+1}\mathsf{TV}(P,Q)$ for any pair $(P,Q)$. Inspired by these results, we further show that if $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP then $D_{f}(P\mathsf{K}\|Q\mathsf{K})\leq[1-e^{-\varepsilon}(1-\delta)]D_{f}(P\|Q)$ for any $f$-divergence $D_{f}$ and any pair $(P,Q)$.

Notation. For a random variable $X$, we write $P_{X}$ and $\mathcal{X}$ for its distribution (i.e., $X\sim P_{X}$) and its alphabet, respectively. For any set $A$, we denote by $\mathcal{P}(A)$ the set of all probability distributions on $A$. Given two sets $\mathcal{X}$ and $\mathcal{Z}$, a Markov kernel (i.e., channel) $\mathsf{K}$ is a mapping from $\mathcal{X}$ to $\mathcal{P}(\mathcal{Z})$ given by $x\mapsto\mathsf{K}(\cdot|x)$. Given $P\in\mathcal{P}(\mathcal{X})$ and a Markov kernel $\mathsf{K}:\mathcal{X}\to\mathcal{P}(\mathcal{Z})$, we let $P\mathsf{K}$ denote the output distribution of $\mathsf{K}$ when the input distribution is $P$, i.e., $P\mathsf{K}(\cdot)=\int\mathsf{K}(\cdot|x)P(\mathrm{d}x)$. Also, we use $\mathsf{BSC}(\omega)$ to denote the binary symmetric channel with crossover probability $\omega$. For sequences $\{a_{n}\}$ and $\{b_{n}\}$, we use $a_{n}\gtrsim b_{n}$ to indicate $a_{n}\geq Cb_{n}$ for some universal constant $C$.

II Preliminaries

II-A $f$-Divergences

Given a convex function $f:(0,\infty)\to\mathbb{R}$ such that $f(1)=0$, the $f$-divergence between two probability measures $P\ll Q$ is defined as [24, 25]

$D_{f}(P\|Q)\coloneqq\mathbb{E}_{Q}\Big[f\Big(\frac{\mathrm{d}P}{\mathrm{d}Q}\Big)\Big].$ (2)

Due to the convexity of $f$, we have $D_{f}(P\|Q)\geq f(1)=0$. If, furthermore, $f$ is strictly convex at $1$, then equality holds if and only if $P=Q$. Popular examples of $f$-divergences include $f(t)=t\log t$, corresponding to the KL divergence; $f(t)=|t-1|$, corresponding to the total variation distance; and $f(t)=t^{2}-1$, corresponding to the $\chi^{2}$-divergence. In this paper, we are mostly concerned with an important sub-family of $f$-divergences associated with $f_{\gamma}(t)=\max\{t-\gamma,0\}$ for a parameter $\gamma\geq 1$. The corresponding $f$-divergence, denoted by $\mathsf{E}_{\gamma}(P\|Q)$, is called the $\mathsf{E}_{\gamma}$-divergence (or sometimes the hockey-stick divergence [26]) and is explicitly defined in (1). It appeared in [27] for proving channel coding converse results and was also used in [28, 29, 30, 7] for characterizing privacy guarantees of iterative algorithms in terms of other variants of DP.

II-B Contraction Coefficient

All $f$-divergences satisfy the data processing inequality, i.e., $D_{f}(P\mathsf{K}\|Q\mathsf{K})\leq D_{f}(P\|Q)$ for any pair of probability distributions $(P,Q)$ and Markov kernel $\mathsf{K}$ [24]. However, in many cases this inequality is strict. The contraction coefficient $\eta_{f}(\mathsf{K})$ of a Markov kernel $\mathsf{K}$ under the $f$-divergence $D_{f}$ is the smallest number $\eta\leq 1$ such that $D_{f}(P\mathsf{K}\|Q\mathsf{K})\leq\eta D_{f}(P\|Q)$ for any pair of probability distributions $(P,Q)$. Formally, $\eta_{f}(\mathsf{K})$ is defined as

$\eta_{f}(\mathsf{K})\coloneqq\sup_{\substack{P,Q\in\mathcal{P}(\mathcal{X}):\\ D_{f}(P\|Q)\neq 0}}\frac{D_{f}(P\mathsf{K}\|Q\mathsf{K})}{D_{f}(P\|Q)}.$ (3)

Contraction coefficients have been studied for several $f$-divergences: $\eta_{\mathsf{TV}}$ for the total variation distance was studied in [31, 32, 33], $\eta_{\mathsf{KL}}$ for the KL divergence in [34, 35, 36, 37, 38, 39], and $\eta_{\chi^{2}}$ for the $\chi^{2}$-divergence in [33, 39, 40]. In particular, Dobrushin [31] showed that $\eta_{\mathsf{TV}}$ has a remarkably simple two-point characterization: $\eta_{\mathsf{TV}}(\mathsf{K})=\sup_{x_{1},x_{2}\in\mathcal{X}}\mathsf{TV}(\mathsf{K}(\cdot|x_{1}),\mathsf{K}(\cdot|x_{2}))$.

Similarly, one can plug the $\mathsf{E}_{\gamma}$-divergence into (3) and define the contraction coefficient $\eta_{\gamma}(\mathsf{K})$ of a Markov kernel $\mathsf{K}$ under the $\mathsf{E}_{\gamma}$-divergence. This contraction coefficient was recently studied in [30] for deriving approximate DP guarantees for online algorithms. In particular, it was shown [30, Theorem 3] that $\eta_{\gamma}$ enjoys a simple two-point characterization, i.e., $\eta_{\gamma}(\mathsf{K})=\sup_{x_{1},x_{2}\in\mathcal{X}}\mathsf{E}_{\gamma}(\mathsf{K}(\cdot|x_{1})\|\mathsf{K}(\cdot|x_{2}))$. Since $\mathsf{E}_{1}(P\|Q)=\mathsf{TV}(P,Q)$, this is a natural extension of Dobrushin's result.
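
The two-point characterization makes $\eta_{\gamma}$ straightforward to evaluate for finite kernels; the hypothetical snippet below (the function names are ours) scans all pairs of rows of a kernel, and $\gamma=1$ recovers Dobrushin's coefficient.

```python
import numpy as np

def E_gamma(P, Q, gamma):
    # E_gamma on a finite alphabet: sum of positive parts of P - gamma * Q.
    return np.maximum(np.asarray(P) - gamma * np.asarray(Q), 0.0).sum()

def eta_gamma(K, gamma):
    # Two-point characterization [30, Theorem 3]: largest E_gamma between rows.
    rows = range(K.shape[0])
    return max(E_gamma(K[i], K[j], gamma) for i in rows for j in rows)

K = np.array([[0.8, 0.2],
              [0.3, 0.7]])          # rows of K are K(.|x)
print(eta_gamma(K, 1.0))            # gamma = 1: Dobrushin's eta_TV = 0.5
print(eta_gamma(K, np.exp(0.5)))    # gamma = e^eps with eps = 0.5
```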

II-C Local Differential Privacy

Suppose $\mathsf{K}$ is a randomized mechanism mapping each $x\in\mathcal{X}$ to a distribution $\mathsf{K}(\cdot|x)\in\mathcal{P}(\mathcal{Z})$. One can view $\mathsf{K}$ as a Markov kernel (i.e., channel) $\mathsf{K}:\mathcal{X}\to\mathcal{P}(\mathcal{Z})$.

Definition 1 ([8, 9]).

A mechanism $\mathsf{K}:\mathcal{X}\to\mathcal{P}(\mathcal{Z})$ is $(\varepsilon,\delta)$-LDP for $\varepsilon\geq 0$ and $\delta\in[0,1]$ if

$\sup_{x,x'\in\mathcal{X}}\ \sup_{A\subset\mathcal{Z}}\ \left[\mathsf{K}(A|x)-e^{\varepsilon}\mathsf{K}(A|x')\right]\leq\delta.$ (4)

π–ͺ{\mathsf{K}} is said to be Ξ΅\varepsilon-LDP if it is (Ξ΅,0)(\varepsilon,0)-LDP. Let 𝒬Ρ,Ξ΄{\mathcal{Q}}_{\varepsilon,\delta} be the collection of all Markov kernels π–ͺ{\mathsf{K}} with the above property. When Ξ΄=0\delta=0, we use 𝒬Ρ{\mathcal{Q}}_{\varepsilon} to denote 𝒬Ρ,0{\mathcal{Q}}_{\varepsilon,0}.

Interactivity in Privacy-Preserving Mechanisms: Suppose there are $n$ users, each in possession of a datapoint $X_{i}$, $i\in[n]\coloneqq\{1,\dots,n\}$. The users wish to apply a mechanism $\mathsf{K}_{i}$ that generates a privatized version of $X_{i}$, denoted by $Z_{i}$. We say that the collection of mechanisms $\{\mathsf{K}_{i}\}$ is non-interactive if $\mathsf{K}_{i}$ is entirely determined by $X_{i}$ and independent of $(X_{j},Z_{j})$ for $j\neq i$. When all users apply the same mechanism $\mathsf{K}$, we can view $Z^{n}\coloneqq(Z_{1},\dots,Z_{n})$ as independent applications of $\mathsf{K}$ to each $X_{i}$; we denote this overall mechanism by $\mathsf{K}^{\otimes n}$. If interactions between users are permitted, then $\mathsf{K}_{i}$ need not depend only on $X_{i}$; in this case, we denote the overall mechanism $\{\mathsf{K}_{i}\}_{i=1}^{n}$ by $\mathsf{K}^{n}$. In particular, the sequentially interactive setting [10] refers to the case where the input of $\mathsf{K}_{i}$ depends on both $X_{i}$ and the outputs $Z^{i-1}$ of the $(i-1)$ previous mechanisms.

III LDP as the Contraction of $\mathsf{E}_{\gamma}$-Divergence

We show next that the $(\varepsilon,\delta)$-LDP constraint, with $\delta$ not necessarily equal to zero, is equivalent to the contraction of the $\mathsf{E}_{\gamma}$-divergence.

Theorem 1.

A mechanism $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP if and only if $\eta_{e^{\varepsilon}}(\mathsf{K})\leq\delta$, or equivalently

π–ͺβˆˆπ’¬Ξ΅,Ξ΄βŸΊπ–€eΡ​(P​π–ͺβˆ₯Q​π–ͺ)≀δ​𝖀eΡ​(Pβˆ₯Q),βˆ€P,Q.{\mathsf{K}}\in{\mathcal{Q}}_{\varepsilon,\delta}\leavevmode\nobreak\ \Longleftrightarrow\leavevmode\nobreak\ {\mathsf{E}}_{e^{\varepsilon}}(P{\mathsf{K}}\|Q{\mathsf{K}})\leq\delta{\mathsf{E}}_{e^{\varepsilon}}(P\|Q),\quad\forall P,Q.

We note that Duchi et al. [10] showed that if $\mathsf{K}$ is $\varepsilon$-LDP then $D_{\mathsf{KL}}(P\mathsf{K}\|Q\mathsf{K})\leq 2(e^{\varepsilon}-1)^{2}\mathsf{TV}^{2}(P,Q)$. They then informally concluded from this result that $\varepsilon$-LDP acts as a contraction on the space of probability measures. Theorem 1 makes this observation precise.

According to Theorem 1, a mechanism $\mathsf{K}$ is $\varepsilon$-LDP if and only if $\mathsf{E}_{e^{\varepsilon}}(P\mathsf{K}\|Q\mathsf{K})=0$ for any distributions $P$ and $Q$. An example of such a Markov kernel is given next.

Example 1. (Randomized response mechanism) Let $\mathcal{X}=\mathcal{Z}=\{0,1\}$ and consider the mechanism given by the binary symmetric channel $\mathsf{BSC}(\omega_{\varepsilon})$ with $\omega_{\varepsilon}\coloneqq\frac{1}{1+e^{\varepsilon}}$. This is often called the randomized response mechanism [41] and denoted by $\mathsf{K}^{\varepsilon}_{\mathsf{RR}}$. This simple mechanism is well known to be $\varepsilon$-LDP, which can now be verified via Theorem 1. Let $P=\mathsf{Bernoulli}(p)$ and $Q=\mathsf{Bernoulli}(q)$ with $p,q\in[0,1]$. Then $P\mathsf{K}^{\varepsilon}_{\mathsf{RR}}=\mathsf{Bernoulli}(p*\omega_{\varepsilon})$ and $Q\mathsf{K}^{\varepsilon}_{\mathsf{RR}}=\mathsf{Bernoulli}(q*\omega_{\varepsilon})$, where $a*b\coloneqq a(1-b)+b(1-a)$. It is straightforward to verify that $\frac{1}{2}\big[|p*\omega_{\varepsilon}-e^{\varepsilon}(q*\omega_{\varepsilon})|+|1-p*\omega_{\varepsilon}-e^{\varepsilon}(1-q*\omega_{\varepsilon})|\big]=\frac{1}{2}(e^{\varepsilon}-1)$ for any $p,q$, which, by (1), implies $\mathsf{E}_{e^{\varepsilon}}(P\mathsf{K}^{\varepsilon}_{\mathsf{RR}}\|Q\mathsf{K}^{\varepsilon}_{\mathsf{RR}})=0$. When $|\mathcal{X}|=k\geq 2$, a simple generalization of this mechanism, called $k$-ary randomized response, has been reported in the literature (see, e.g., [19, 13]); it is defined by $\mathcal{Z}=\mathcal{X}$, $\mathsf{K}^{\varepsilon}_{\mathsf{kRR}}(x|x)=\frac{e^{\varepsilon}}{k-1+e^{\varepsilon}}$, and $\mathsf{K}^{\varepsilon}_{\mathsf{kRR}}(z|x)=\frac{1}{k-1+e^{\varepsilon}}$ for $z\neq x$. Again, it can be verified that for this mechanism we have $\mathsf{E}_{e^{\varepsilon}}(P\mathsf{K}^{\varepsilon}_{\mathsf{kRR}}\|Q\mathsf{K}^{\varepsilon}_{\mathsf{kRR}})=0$ for all distributions $P$ and $Q$ on $\mathcal{X}$.
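
As a numerical sanity check of Example 1 (our own sketch, not from the paper), the snippet below samples many pairs $(p,q)$ and verifies that the hockey-stick divergence at $\gamma=e^{\varepsilon}$ between the output distributions of $\mathsf{K}^{\varepsilon}_{\mathsf{RR}}$ vanishes, as Theorem 1 predicts for an $\varepsilon$-LDP mechanism.

```python
import numpy as np

def E_gamma(P, Q, gamma):
    return np.maximum(P - gamma * Q, 0.0).sum()

eps = 1.0
w = 1.0 / (1.0 + np.exp(eps))             # crossover probability omega_eps
K = np.array([[1 - w, w],
              [w, 1 - w]])                # randomized response kernel

rng = np.random.default_rng(0)
for _ in range(1000):
    p, q = rng.uniform(size=2)
    PK = np.array([1 - p, p]) @ K         # output law for Bernoulli(p) input
    QK = np.array([1 - q, q]) @ K
    assert E_gamma(PK, QK, np.exp(eps)) < 1e-12
print("E_{e^eps}(PK || QK) = 0 for all sampled (p, q)")
```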

The $\mathsf{E}_{\gamma}$-divergence underlies all other $f$-divergences, in the sense that any $f$-divergence can be represented in terms of $\mathsf{E}_{\gamma}$-divergences [42, Corollary 3.7]. Thus, an LDP constraint implies that a Markov kernel contracts all $f$-divergences, in the same spirit as the $\mathsf{E}_{\gamma}$-contraction in Theorem 1.

Lemma 1.

Let π–ͺβˆˆπ’¬Ξ΅,Ξ΄{\mathsf{K}}\in{\mathcal{Q}}_{\varepsilon,\delta} and φ​(Ξ΅,Ξ΄)≔1βˆ’(1βˆ’Ξ΄)​eβˆ’Ξ΅\varphi(\varepsilon,\delta)\coloneqq 1-(1-\delta)e^{-\varepsilon}. Then, Ξ·f​(π–ͺ)≀φ​(Ξ΅,Ξ΄)\eta_{f}({\mathsf{K}})\leq\varphi(\varepsilon,\delta) or, equivalently,

$D_{f}(P\mathsf{K}\|Q\mathsf{K})\leq\varphi(\varepsilon,\delta)\,D_{f}(P\|Q)\qquad\forall P,Q\in\mathcal{P}(\mathcal{X}).$

Notice that this lemma holds for any $f$-divergence and the entire family of $(\varepsilon,\delta)$-LDP mechanisms. However, it can be improved if one considers particular mechanisms or a specific $f$-divergence. For instance, it is known that $\eta_{\mathsf{KL}}(\mathsf{BSC}(\omega))=(1-2\omega)^{2}$ [21]. Thus, we have $\eta_{\mathsf{KL}}(\mathsf{K}^{\varepsilon}_{\mathsf{RR}})=\big(\frac{e^{\varepsilon}-1}{e^{\varepsilon}+1}\big)^{2}$ for the randomized response mechanism $\mathsf{K}^{\varepsilon}_{\mathsf{RR}}$ (cf. Example 1), while Lemma 1 only implies $\eta_{\mathsf{KL}}(\mathsf{K}^{\varepsilon}_{\mathsf{RR}})\leq 1-e^{-\varepsilon}$. Unfortunately, $\eta_{\mathsf{KL}}$ is difficult to compute in closed form for general Markov kernels, in which case Lemma 1 provides a useful alternative.
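
The gap can also be seen numerically. The sketch below (ours; it only probes random input pairs, so it produces a lower estimate of $\eta_{\mathsf{KL}}$) compares the KL contraction ratio of $\mathsf{K}^{\varepsilon}_{\mathsf{RR}}$ against both the exact coefficient and the Lemma 1 bound.

```python
import numpy as np

def kl(P, Q):
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

eps = 1.0
w = 1.0 / (1.0 + np.exp(eps))
K = np.array([[1 - w, w], [w, 1 - w]])

eta_exact = ((np.exp(eps) - 1) / (np.exp(eps) + 1)) ** 2  # (1 - 2w)^2
lemma1 = 1 - np.exp(-eps)                                 # phi(eps, 0)

rng = np.random.default_rng(1)
ratios = []
for _ in range(2000):
    p, q = rng.uniform(0.01, 0.99, size=2)
    P, Q = np.array([1 - p, p]), np.array([1 - q, q])
    d = kl(P, Q)
    if d > 1e-9:
        ratios.append(kl(P @ K, Q @ K) / d)
print(max(ratios), "<=", eta_exact, "<=", lemma1)
```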

Next, we extend Lemma 1 to non-interactive mechanisms. Fix an $(\varepsilon,\delta)$-LDP mechanism $\mathsf{K}$ and consider the corresponding non-interactive mechanism $\mathsf{K}^{\otimes n}$. To obtain upper bounds on $\eta_{f}(\mathsf{K}^{\otimes n})$ directly through Lemma 1, we would first need to derive the privacy parameters of $\mathsf{K}^{\otimes n}$ in terms of $\varepsilon$ and $\delta$ (e.g., by applying composition theorems). Instead, we can use the tensorization properties of contraction coefficients (see, e.g., [39, 38]) to relate $\eta_{f}(\mathsf{K}^{\otimes n})$ to $\eta_{f}(\mathsf{K})$ and then apply Lemma 1, as described next.

Lemma 2.

Let π–ͺβˆˆπ’¬Ξ΅,Ξ΄{\mathsf{K}}\in{\mathcal{Q}}_{\varepsilon,\delta} and Ο†n​(Ξ΅,Ξ΄)≔1βˆ’eβˆ’n​Ρ​(1βˆ’Ξ΄)n\varphi_{n}(\varepsilon,\delta)\coloneqq 1-e^{-n\varepsilon}(1-\delta)^{n}. Then Ξ·f​(π–ͺβŠ—n)≀φn​(Ξ΅,Ξ΄)\eta_{f}({\mathsf{K}}^{\otimes n})\leq\varphi_{n}(\varepsilon,\delta) for nβ‰₯1.n\geq 1.

Each of the following sections provides a different application of the contraction characterization of LDP.

IV Private Minimax Risk

Let $X^{n}=(X_{1},\dots,X_{n})$ be $n$ independent and identically distributed (i.i.d.) samples drawn from a distribution $P$ in a family $\mathcal{P}\subseteq\mathcal{P}(\mathcal{X})$, and let $\theta:\mathcal{P}\to\mathcal{T}$ be a parameter of the distribution that we wish to estimate. Each user holds a sample $X_{i}$ and applies a privacy-preserving mechanism $\mathsf{K}_{i}$ to obtain $Z_{i}$; in general, the $\mathsf{K}_{i}$ may be sequentially interactive. Given the sequence $\{Z_{i}\}_{i=1}^{n}$, the goal is to estimate $\theta(P)$ through an estimator $\Psi:\mathcal{Z}^{n}\to\mathcal{T}$. The quality of such an estimator is assessed by a semi-metric $\ell:\mathcal{T}\times\mathcal{T}\to\mathbb{R}_{+}$, which is used to define the minimax risk:

$\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)\coloneqq\inf_{\mathsf{K}^{n}\subset\mathcal{Q}_{\varepsilon,\delta}}\inf_{\Psi}\sup_{P\in\mathcal{P}}\mathbb{E}\big[\ell(\Psi(Z^{n}),\theta(P))\big].$ (5)

The quantity $\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)$ characterizes the optimal rate of private statistical estimation over the family $\mathcal{P}$, using the best possible estimator and privacy-preserving mechanisms in $\mathcal{Q}_{\varepsilon,\delta}$. In the absence of privacy constraints (i.e., $Z^{n}=X^{n}$), we denote the minimax risk by $\mathcal{R}_{n}(\mathcal{P},\ell)$.

The first step in deriving information-theoretic lower bounds for the minimax risk is to reduce the above estimation problem to a testing problem [23, 43, 22]. To do so, we need to construct an index set $\mathcal{V}$ with $|\mathcal{V}|<\infty$ and a family of distributions $\{P_{v},v\in\mathcal{V}\}\subseteq\mathcal{P}$ such that $\ell(\theta(P_{v}),\theta(P_{v'}))\geq 2\tau$ for all $v\neq v'$ in $\mathcal{V}$ and some $\tau>0$. The canonical testing problem is then defined as follows: nature chooses a random variable $V$ uniformly at random from $\mathcal{V}$, and, conditioned on $V=v$, the samples $X^{n}$ are drawn i.i.d. from $P_{v}$, denoted $X^{n}\sim P^{\otimes n}_{v}$. Each $X_{i}$ is then fed to a mechanism $\mathsf{K}_{i}$ to generate $Z_{i}$. It is well known [22, 43, 23] that $\mathcal{R}_{n}(\mathcal{P},\ell)\geq\tau\,\mathsf{P}_{\mathsf{e}}(V|X^{n})$, where $\mathsf{P}_{\mathsf{e}}(V|X^{n})$ denotes the probability of error in guessing $V$ given $X^{n}$. Replacing $X^{n}$ by its $(\varepsilon,\delta)$-privatized samples $Z^{n}$ in this result, one obtains a lower bound on $\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)$ in terms of $\mathsf{P}_{\mathsf{e}}(V|Z^{n})$. Hence, the remaining challenge is to lower-bound $\mathsf{P}_{\mathsf{e}}(V|Z^{n})$ over the choice of mechanisms $\{\mathsf{K}_{i}\}$. There are numerous techniques for this objective, depending on $\mathcal{V}$. We focus on two such approaches, namely Le Cam's and Fano's methods, which bound $\mathsf{P}_{\mathsf{e}}(V|Z^{n})$ in terms of the total variation distance and mutual information, respectively, and hence allow us to invoke Lemmas 1 and 2.

IV-A Locally Private Le Cam’s Method

Le Cam's method is applicable when $\mathcal{V}$ is a binary set containing, say, $P_{0}$ and $P_{1}$. In its simplest form, it relies on the inequality (see [22, Lemma 1] or [23, Theorem 2.2]) $\mathsf{P}_{\mathsf{e}}(V|X^{n})\geq\frac{1}{2}\big[1-\mathsf{TV}(P^{\otimes n}_{0},P^{\otimes n}_{1})\big]$. Thus, it yields the following lower bound for the non-private minimax risk:

$\mathcal{R}_{n}(\mathcal{P},\ell) \geq \frac{\tau}{2}\left[1-\mathsf{TV}(P^{\otimes n}_{0},P^{\otimes n}_{1})\right]$ (6)
$\geq \frac{\tau}{2}\left[1-\frac{1}{\sqrt{2}}\sqrt{n\,D_{\mathsf{KL}}(P_{0}\|P_{1})}\right],$ (7)

for any $P_{0}\neq P_{1}$ in $\mathcal{P}$, where the second inequality follows from Pinsker's inequality and the chain rule of the KL divergence. In the presence of privacy, the estimator $\Psi$ depends on $Z^{n}$ instead of $X^{n}$, which is generated by a sequentially interactive mechanism $\mathsf{K}^{n}$. To write the private counterpart of (6), we need to replace $P^{\otimes n}_{0}$ and $P^{\otimes n}_{1}$ with $P^{\otimes n}_{0}\mathsf{K}^{n}$ and $P^{\otimes n}_{1}\mathsf{K}^{n}$, the corresponding marginals of $Z^{n}$, respectively. A lower bound for $\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)$ is therefore obtained by deriving an upper bound on $\mathsf{TV}(P_{0}^{\otimes n}\mathsf{K}^{n},P_{1}^{\otimes n}\mathsf{K}^{n})$ for all $\mathsf{K}^{n}\subset\mathcal{Q}_{\varepsilon,\delta}$.

Lemma 3.

Let $P_{0},P_{1}\in\mathcal{P}$ satisfy $\ell(\theta(P_{0}),\theta(P_{1}))\geq 2\tau$. Then we have

$\mathcal{R}_{n}(\mathcal{P},\ell,\varepsilon,\delta)\geq\frac{\tau}{2}\left[1-\frac{1}{\sqrt{2}}\sqrt{n\,\varphi(\varepsilon,\delta)\,D_{\mathsf{KL}}(P_{0}\|P_{1})}\right].$
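
As an illustration of how Lemma 3 is typically instantiated (a hypothetical example, not from the paper), the sketch below takes $P_{0}=\mathsf{Bernoulli}(1/2-\tau)$, $P_{1}=\mathsf{Bernoulli}(1/2+\tau)$, $\theta(P)=\mathbb{E}_{P}[X]$, and $\ell(t,t')=|t-t'|$, and optimizes the resulting lower bound over the separation $\tau$.

```python
import numpy as np

def kl_bern(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def le_cam_private(tau, n, eps, delta):
    # Lemma 3: l(theta(P0), theta(P1)) = 2*tau for the Bernoulli pair above.
    phi = 1 - (1 - delta) * np.exp(-eps)
    d = kl_bern(0.5 - tau, 0.5 + tau)
    return 0.5 * tau * (1 - np.sqrt(0.5 * n * phi * d))

n, eps, delta = 100, 1.0, 0.0
taus = np.linspace(1e-3, 0.25, 500)
print(max(le_cam_private(t, n, eps, delta) for t in taus))
```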

By comparing with the original non-private Le Cam method (7), we observe that the effect of $(\varepsilon,\delta)$-LDP is to reduce the effective sample size from $n$ to $(1-e^{-\varepsilon}(1-\delta))n$. Setting $\delta=0$, this result strengthens Duchi et al. [10, Corollary 2], where the effective sample size was shown to be $4\varepsilon^{2}n$ for sufficiently small $\varepsilon$.

Example 2. (One-dimensional mean estimation) For some $k>1$, we assume $\mathcal{P}$ is given by

$\mathcal{P}=\mathcal{P}_{k}\coloneqq\{P\in\mathcal{P}(\mathcal{X}):\ |\mathbb{E}_{P}[X]|\leq 1,\ \mathbb{E}_{P}[|X|^{k}]\leq 1\}.$

The goal is to estimate $\theta(P)=\mathbb{E}_{P}[X]$ under $\ell=\ell_{2}^{2}$, the squared $\ell_{2}$ metric. This problem was first studied in [10, Proposition 1], where it was shown that $\mathcal{R}_{n}(\mathcal{P}_{k},\ell_{2}^{2},\varepsilon,0)\geq(n\varepsilon^{2})^{-(k-1)/k}$ only for $\varepsilon\leq 1$. Applying our framework to this example, we obtain a similar lower bound that holds for all $\varepsilon\geq 0$ and $\delta\in[0,1]$.

Corollary 1.

For all $k>1$, $\varepsilon\geq 0$, and $\delta\in(0,1)$, we have

$\mathcal{R}_{n}(\mathcal{P}_{k},\ell_{2}^{2},\varepsilon,\delta)\gtrsim\min\Big\{1,\left[n\varphi^{2}(\varepsilon,\delta)\right]^{-\frac{k-1}{k}}\Big\}.$ (8)

It is worth instantiating this corollary for some special values of $k$. Consider first the usual finite-variance setting, i.e., $k=2$. In the non-private case, it is known that the sample mean has a mean-squared error that scales as $1/n$. According to Corollary 1, this rate worsens to $1/(\varphi(\varepsilon,\delta)\sqrt{n})$ in the presence of the $(\varepsilon,\delta)$-LDP requirement. As $k\to\infty$, the moment condition $\mathbb{E}_{P}[|X|^{k}]\leq 1$ implies the boundedness of $X$; in this case, Corollary 1 implies the more standard lower bound $(\varphi^{2}(\varepsilon,\delta)n)^{-1}$.

IV-B Locally Private Fano’s Method

Le Cam's method involves a pair of distributions $(P_{0},P_{1})$ in $\mathcal{P}$. However, it is possible to derive a stronger bound by considering a larger subset of $\mathcal{P}$ and applying Fano's inequality (see, e.g., [22]). We follow this path to obtain a better minimax lower bound for the non-interactive setting.

Consider the index set $\mathcal{V}=\{1,\dots,|\mathcal{V}|\}$. The non-private Fano method relies on Fano's inequality to lower-bound $\mathsf{P}_{\mathsf{e}}(V|X^{n})$ in terms of mutual information:

$\mathcal{R}_{n}(\mathcal{P},\ell)\geq\tau\left[1-\frac{I(X^{n};V)+\log 2}{\log|\mathcal{V}|}\right].$ (9)

To incorporate privacy into this result, we need to derive an upper bound on $I(Z^{n};V)$ over all choices of mechanisms $\{\mathsf{K}_{i}\}$. Focusing on non-interactive mechanisms, the following lemma exploits Lemma 2 to obtain such an upper bound.

Lemma 4.

Given $X^{n}$ and $V$ as described above, let $Z^{n}$ be constructed by applying $\mathsf{K}^{\otimes n}$ to $X^{n}$. If $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP, then we have

$I(Z^{n};V) \leq \varphi_{n}(\varepsilon,\delta)\,I(X^{n};V) \leq \frac{n\,\varphi_{n}(\varepsilon,\delta)}{|\mathcal{V}|^{2}}\sum_{v,v'\in\mathcal{V}}D_{\mathsf{KL}}(P_{v}\|P_{v'}).$

This lemma can be compared with [10, Corollary 1], where it was shown that

$I(Z^{n};V)\leq 2(e^{\varepsilon}-1)\frac{n}{|\mathcal{V}|^{2}}\sum_{v,v'\in\mathcal{V}}D_{\mathsf{KL}}(P_{v}\|P_{v'}).$ (10)

This bound is looser than Lemma 4 for any $n\geq 1$ and $\varepsilon\geq 0.4$, and it only holds for $\delta=0$.

Example 3. (High-dimensional mean estimation in an $\ell_{2}$-ball) For a parameter $r<\infty$, define

$\mathcal{P}_{r}\coloneqq\{P\in\mathcal{P}(\mathsf{B}^{d}_{2}(r))\},$ (11)

where $\mathsf{B}^{d}_{2}(r)\coloneqq\{x\in\mathbb{R}^{d}:\|x\|_{2}\leq r\}$ is the $\ell_{2}$-ball of radius $r$ in $\mathbb{R}^{d}$. The goal is to estimate the mean $\theta(P)=\mathbb{E}[X]$ given the private views $Z^{n}$. This example was first studied in [10, Proposition 3], which states that $\mathcal{R}_{n}(\mathcal{P}_{r},\ell_{2}^{2},\varepsilon,0)\gtrsim r^{2}\min\left\{\frac{1}{\varepsilon\sqrt{n}},\frac{d}{n\varepsilon^{2}}\right\}$ for $\varepsilon\in(0,1)$. In the following, we use Lemma 4 to derive a similar lower bound for any $\varepsilon\geq 0$ and $\delta\in(0,1)$, albeit one slightly weaker than [10, Proposition 3].

Corollary 2.

For the non-interactive setting, we have

$\mathcal{R}_{n}(\mathcal{P}_{r},\ell_{2}^{2},\varepsilon,\delta)\gtrsim r^{2}\min\left\{\frac{1}{n\varphi_{n}(\varepsilon,\delta)},\frac{d}{n^{2}\varphi^{2}_{n}(\varepsilon,\delta)}\right\}.$ (12)

V Private Bayesian Risk

In the minimax setting, the worst-case parameter is considered, which usually leads to overly pessimistic bounds. In practice, the parameter that incurs the worst-case risk may appear with very small probability. To capture this prior knowledge, it is reasonable to assume that the true parameter is sampled from an underlying prior distribution. In this case, we are interested in the Bayes risk of the problem.

Let $\mathcal{P}=\{P_{X|\Theta}(\cdot|\theta):\theta\in\mathcal{T}\}$ be a collection of parametric probability distributions on $\mathcal{X}$, where the parameter space $\mathcal{T}$ is endowed with a prior $P_{\Theta}$, i.e., $\Theta\sim P_{\Theta}$. Given an i.i.d. sequence $X^{n}$ drawn from $P_{X|\Theta}$, the goal is to estimate $\Theta$ from a privatized sequence $Z^{n}$ via an estimator $\Psi:\mathcal{Z}^{n}\to\mathcal{T}$. Here, we focus on the non-interactive setting. Define the private Bayes risk as

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)\coloneqq\inf_{\mathsf{K}\in\mathcal{Q}_{\varepsilon,\delta}}\inf_{\Psi}\mathbb{E}\big[\ell(\Theta,\Psi(Z^{n}))\big],$ (13)

where the expectation is taken with respect to the randomness of both $\Theta$ and $Z^{n}$. It is evident that $R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)$ must depend on the prior $P_{\Theta}$. This dependence can be quantified by

$\mathcal{L}(\zeta)\coloneqq\sup_{t\in\mathcal{T}}\Pr(\ell(\Theta,t)\leq\zeta),$ (14)

for $\zeta<\sup_{\theta,\theta'\in\mathcal{T}}\ell(\theta,\theta')$. Xu and Raginsky [44] showed that the non-private Bayes risk (i.e., $Z^{n}=X^{n}$), denoted by $R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell)$, is lower bounded as

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell)\geq\sup_{\zeta>0}\zeta\left[1-\frac{I(\Theta;X^{n})+\log 2}{\log(1/\mathcal{L}(\zeta))}\right].$ (15)

Replacing $I(\Theta;X^{n})$ with $I(\Theta;Z^{n})$ in this result and applying Lemma 2 (as in Lemma 4), we can directly convert (15) into a lower bound for $R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)$.

Corollary 3.

In the non-interactive setting, we have

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)\geq\sup_{\zeta>0}\zeta\left[1-\frac{\varphi_{n}(\varepsilon,\delta)I(\Theta;X^{n})+\log 2}{\log(1/\mathcal{L}(\zeta))}\right].$

In the following theorem, we provide a lower bound for $R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)$ that directly involves the $\mathsf{E}_{\gamma}$-divergence and thus leads to a tighter bound than Corollary 3. For any pair of random variables $(A,B)\sim P_{AB}$ with marginals $P_{A}$ and $P_{B}$ and a constant $\gamma\geq 0$, we define their $\mathsf{E}_{\gamma}$-information as

$I_{\gamma}(A;B)\coloneqq\mathsf{E}_{\gamma}(P_{AB}\|P_{A}P_{B}).$
Theorem 2.

Let π–ͺ{\mathsf{K}} be an (Ξ΅,Ξ΄)(\varepsilon,\delta)-LDP mechanism. Then, for n=1n=1 we have

$R_{1}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)\geq\sup_{\zeta>0}\zeta\left[1-\delta I_{e^{\varepsilon}}(\Theta;X)-e^{\varepsilon}\mathcal{L}(\zeta)\right],$

and for $n>1$ in the non-interactive setting we have

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta)\geq\sup_{\zeta>0}\zeta\left[1-\varphi_{n}(\varepsilon,\delta)I_{e^{\varepsilon}}(\Theta;X^{n})-e^{\varepsilon}\mathcal{L}(\zeta)\right].$

We compare Theorem 2 with Corollary 3 in the next example.

Example 4. Suppose $\Theta$ is uniformly distributed on $[0,1]$, $P_{X|\Theta=\theta}=\mathsf{Bernoulli}(\theta)$, and $\ell(\theta,\theta')=|\theta-\theta'|$. Since $\Theta$ is uniform, $\mathcal{L}(\zeta)\leq\min\{2\zeta,1\}$ (cf. Remark 1). For $\gamma=e^{\varepsilon}$, we can write

$I_{\gamma}(\Theta;X^{n})=\int_{0}^{1}\mathsf{E}_{\gamma}(P_{X^{n}|\theta}\|P_{X^{n}})\,\mathrm{d}\theta.$ (16)

A straightforward calculation shows that $P_{X^{n}|\theta}(x^{n})=\theta^{s(x^{n})}(1-\theta)^{n-s(x^{n})}$ for any $\theta\in[0,1]$, and $P_{X^{n}}(x^{n})=\frac{s(x^{n})!\,(n-s(x^{n}))!}{(n+1)!}$, where $s(x^{n})$ is the number of $1$'s in $x^{n}$. Given these marginal and conditional distributions, one obtains after algebraic manipulations

$I_{\gamma}(\Theta;X^{n})=\frac{1}{n+1}\sum_{s=0}^{n}\int_{0}^{1}\left[\theta^{s}(1-\theta)^{n-s}\frac{(n+1)!}{s!\,(n-s)!}-\gamma\right]_{+}\mathrm{d}\theta.$

Plugging this into Theorem 2, we arrive at a maximization problem that can be solved numerically. Similarly, we compute $I(\Theta;X^{n})=\int_{0}^{1}D_{\mathsf{KL}}(P_{X^{n}|\theta}\|P_{X^{n}})\,\mathrm{d}\theta$, plug it into Corollary 3, and numerically solve the resulting optimization problem. In Fig. 1, we compare these two lower bounds for $\delta=10^{-4}$ and $n=20$, indicating the advantage of Theorem 2 for small $\varepsilon$.
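
A minimal numerical sketch of this computation (our own code; the grid sizes are arbitrary) evaluates $I_{\gamma}(\Theta;X^{n})$ by quadrature and then the Theorem 2 bound, using $\mathcal{L}(\zeta)\leq\min\{2\zeta,1\}$:

```python
import numpy as np
from math import comb

def I_gamma(n, gamma, grid=4001):
    # I_gamma(Theta; X^n) for Theta ~ Unif[0,1], X_i | theta ~ Bernoulli(theta)
    theta = np.linspace(0.0, 1.0, grid)
    total = 0.0
    for s in range(n + 1):
        dens = (n + 1) * comb(n, s) * theta**s * (1 - theta)**(n - s)
        y = np.maximum(dens - gamma, 0.0)
        total += np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(theta))  # trapezoid rule
    return total / (n + 1)

def thm2_bound(n, eps, delta):
    gamma = np.exp(eps)
    phi_n = 1 - np.exp(-n * eps) * (1 - delta) ** n
    Ig = I_gamma(n, gamma)
    zetas = np.linspace(1e-4, 0.5, 2000)
    vals = zetas * (1 - phi_n * Ig - gamma * np.minimum(2 * zetas, 1.0))
    return max(vals.max(), 0.0)

print(thm2_bound(n=20, eps=0.1, delta=1e-4))
```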

Figure 1: Comparison of the lower bounds obtained from Theorem 2 and the private version of [44, Theorem 1] described in Corollary 3 for Example 4, assuming $\delta=10^{-4}$ and $n=20$.
Remark 1.

The proof of Theorem 2 leads to the following lower bound for the non-private Bayes risk:

$R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell)\geq\sup_{\zeta>0,\ \gamma\geq 0}\zeta\left[1-I_{\gamma}(\Theta;X^{n})-\gamma\mathcal{L}(\zeta)-(1-\gamma)_{+}\right].$ (17)

For a comparison with (15), consider the following example. Suppose $\Theta$ is a uniform random variable on $[0,1]$ and $P_{X|\Theta=\theta}=\mathsf{Bernoulli}(\theta)$. We are interested in the Bayes risk with respect to the $\ell_{1}$-loss function $\ell(\theta,\theta')=|\theta-\theta'|$. It can be shown that $I(\Theta;X)=0.19$ nats, while

$I_{\gamma}(\Theta;X)=\begin{cases}0.25\gamma^{2}&\text{if }\gamma\in[0,1],\\ 0.25(\gamma-2)^{2}&\text{if }\gamma\in[1,2],\\ 0&\text{otherwise}.\end{cases}$ (18)

Moreover, $\mathcal{L}(\zeta)=\sup_{t\in[0,1]}\Pr(|\Theta-t|\leq\zeta)\leq\min\{2\zeta,1\}$. It can be verified that (15) gives $R_{1}^{\mathsf{Bayes}}(P_{\Theta},\ell_{1})\geq 0.03$, whereas our bound (17) yields $R_{1}^{\mathsf{Bayes}}(P_{\Theta},\ell_{1})\geq 0.08$.
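
Both maximizations are easy to reproduce on a grid. In the sketch below (ours; the optima depend mildly on the grid, and the values we obtain reproduce the ordering reported above), we evaluate (15) and (17) with $\mathcal{L}(\zeta)=\min\{2\zeta,1\}$.

```python
import numpy as np

I_mi = np.log(2) - 0.5              # I(Theta; X) = ln 2 - 1/2, about 0.19 nats

def I_gamma(g):                     # the piecewise form (18)
    if g <= 1.0:
        return 0.25 * g * g
    if g <= 2.0:
        return 0.25 * (g - 2.0) ** 2
    return 0.0

zetas = np.linspace(1e-4, 0.499, 800)
gammas = np.linspace(0.0, 3.0, 601)

xr = max(z * (1 - (I_mi + np.log(2)) / np.log(1.0 / (2 * z))) for z in zetas)
eg = max(z * (1 - I_gamma(g) - 2.0 * g * z - max(1.0 - g, 0.0))
         for z in zetas for g in gammas)
print(f"bound (15): {xr:.3f}   bound (17): {eg:.3f}")
```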

VI Private Hypothesis Testing

We now turn our attention to the well-known problem of binary hypothesis testing under local differential privacy constraints. Suppose $n$ i.i.d. samples $X^{n}$ drawn from a distribution $Q\in\mathcal{P}(\mathcal{X})$ are observed, and let each $X_{i}$ be mapped to $Z_{i}$ via a mechanism $\mathsf{K}_{i}\in\mathcal{Q}_{\varepsilon,\delta}$ (i.e., sequential interaction is permitted). The goal is to distinguish the null hypothesis $H_{0}:Q=P_{0}$ from the alternative $H_{1}:Q=P_{1}$ given $Z^{n}$. Let $T$ be a binary statistic generated from a randomized decision rule $P_{T|Z^{n}}:\mathcal{Z}^{n}\to\mathcal{P}(\{0,1\})$, where $1$ indicates that $H_{0}$ is rejected. The type I and type II error probabilities corresponding to this statistic are given by $\Pr(T=1|H_{0})$ and $\Pr(T=0|H_{1})$, respectively. To capture the optimal trade-off between the two, it is customary to define $\beta^{\varepsilon,\delta}_{n}(\alpha)\coloneqq\inf\Pr(T=0|H_{1})$, where the infimum is taken over all kernels $P_{T|Z^{n}}$ such that $\Pr(T=1|H_{0})\leq\alpha$ and all non-interactive mechanisms $\mathsf{K}^{\otimes n}$ with $\mathsf{K}\in\mathcal{Q}_{\varepsilon,\delta}$. In the following corollary, we apply Lemma 1 to obtain an asymptotic lower bound on $\beta_{n}^{\varepsilon,\delta}(\alpha)$.

Corollary 4.

We have, for any $\varepsilon\geq 0$ and $\delta\in[0,1]$,

$\lim_{n\to\infty}\frac{1}{n}\log\beta_{n}^{\varepsilon,\delta}(\alpha)\geq-\varphi(\varepsilon,\delta)\,D_{\mathsf{KL}}(P_{0}\|P_{1}).$ (19)

A similar result was proved by Kairouz et al. [13, Sec. 3], holding only for sufficiently "small" (albeit unspecified) $\varepsilon$ and for $\delta=0$. Compared with the Chernoff-Stein lemma [45, Theorem 11.8.3], which establishes $D_{\mathsf{KL}}(P_{0}\|P_{1})$ as the asymptotic exponential decay rate of the type II error probability in the non-private setting, the above corollary once again exhibits the reduction of the effective sample size from $n$ to $\varphi(\varepsilon,\delta)n$ in the presence of the $(\varepsilon,\delta)$-LDP requirement.

VII Mutual Information of LDP Mechanisms

Viewing mutual information as a utility measure, we may consider maximizing mutual information under local differential privacy as yet another privacy-utility trade-off. To formalize this, let $X\sim P_{X}$. The goal is to characterize the supremum of $I(X;Z)$ over $\mathsf{K}\in\mathcal{Q}_{\varepsilon,\delta}$, i.e., the maximum information shared between $X$ and its $(\varepsilon,\delta)$-LDP representation $Z$. Such mutual information bounds under local DP have appeared in the literature: e.g., McGregor et al. [46] provided a result that roughly states $I(X;Z)\leq 3\varepsilon$ for $\mathsf{K}\in\mathcal{Q}_{\varepsilon}$, and Kairouz et al. [13, Corollary 15] showed that, for sufficiently small $\varepsilon$,

$\sup_{\mathsf{K}\in\mathcal{Q}_{\varepsilon}}I(X;Z)\leq\frac{1}{2}P_{X}(A)(1-P_{X}(A))\varepsilon^{2},$ (20)

where $A\subset\mathcal{X}$ satisfies $A\in\operatorname{arg\,min}_{B\subset\mathcal{X}}\left|P_{X}(B)-\frac{1}{2}\right|$. Next, we provide an upper bound on the mutual information under LDP that holds for all $\varepsilon\geq 0$ and $\delta\in[0,1]$.

Corollary 5.

We have, for any $\varepsilon\geq 0$ and $\delta\in[0,1]$,

$\sup_{\mathsf{K}\in\mathcal{Q}_{\varepsilon,\delta}}I(X;Z)\leq\varphi(\varepsilon,\delta)\,H(X).$ (21)
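
For instance, for the $k$-ary randomized response of Example 1 with a uniform input, (21) can be checked directly; the snippet below (an illustration with our own helper) computes $I(X;Z)$ in nats and compares it with $\varphi(\varepsilon,0)H(X)$.

```python
import numpy as np

def mutual_info(px, K):
    # I(X;Z) in nats for X ~ px and Z | X ~ K (rows of K are K(.|x))
    pz = px @ K
    return float(np.sum(px[:, None] * K * np.log(K / pz)))

k, eps = 4, 1.0
K = np.full((k, k), 1.0 / (k - 1 + np.exp(eps)))
np.fill_diagonal(K, np.exp(eps) / (k - 1 + np.exp(eps)))
px = np.full(k, 1.0 / k)

print(mutual_info(px, K), "<=", (1 - np.exp(-eps)) * np.log(k))  # phi * H(X)
```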

References

  • [1] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Proc. Theory of Cryptography (TCC), Berlin, Heidelberg, 2006, pp. 265–284.
  • [2] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, "Our data, ourselves: Privacy via distributed noise generation," in EUROCRYPT, S. Vaudenay, Ed., 2006, pp. 486–503.
  • [3] I. Mironov, "Rényi differential privacy," in Proc. Computer Security Foundations (CSF), 2017, pp. 263–275.
  • [4] M. Bun and T. Steinke, "Concentrated differential privacy: Simplifications, extensions, and lower bounds," in Theory of Cryptography, 2016, pp. 635–658.
  • [5] C. Dwork and G. N. Rothblum, "Concentrated differential privacy," arXiv abs/1603.01887, 2016. [Online]. Available: http://arxiv.org/abs/1603.01887
  • [6] J. Dong, A. Roth, and W. J. Su, "Gaussian differential privacy," arXiv 1905.02383, 2019.
  • [7] S. Asoodeh, J. Liao, F. P. Calmon, O. Kosut, and L. Sankar, "Three variants of differential privacy: Lossless conversion and applications," to appear in IEEE Journal on Selected Areas in Information Theory (JSAIT), 2021.
  • [8] A. Evfimievski, J. Gehrke, and R. Srikant, "Limiting privacy breaches in privacy preserving data mining," in Proc. ACM Symp. Principles of Database Systems (PODS). ACM, 2003, pp. 211–222.
  • [9] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, "What can we learn privately?" SIAM J. Comput., vol. 40, no. 3, pp. 793–826, Jun. 2011.
  • [10] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, "Local privacy, data processing inequalities, and statistical minimax rates," in Proc. Symp. Foundations of Computer Science, 2013, pp. 429–438. [Online]. Available: https://arxiv.org/abs/1302.3203
  • [11] M. Gaboardi, R. Rogers, and O. Sheffet, "Locally private mean estimation: z-test and tight confidence intervals," in Proc. Machine Learning Research, 2019, pp. 2545–2554.
  • [12] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, and R. Rogers, "Protection against reconstruction and its applications in private federated learning," arXiv 1812.00984, 2018.
  • [13] P. Kairouz, S. Oh, and P. Viswanath, "Extremal mechanisms for local differential privacy," Journal of Machine Learning Research, vol. 17, no. 17, pp. 1–51, 2016.
  • [14] L. P. Barnes, W. N. Chen, and A. Özgür, "Fisher information under local differential privacy," IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 3, pp. 645–659, 2020.
  • [15] J. Acharya, C. L. Canonne, and H. Tyagi, "Inference under information constraints I: Lower bounds from chi-square contraction," IEEE Transactions on Information Theory, vol. 66, no. 12, pp. 7835–7855, 2020.
  • [16] M. Ye and A. Barg, "Optimal schemes for discrete distribution estimation under locally differential privacy," IEEE Trans. Inf. Theory, vol. 64, no. 8, pp. 5662–5676, 2018.
  • [17] D. Wang and J. Xu, "On sparse linear regression in the local differential privacy model," IEEE Trans. Inf. Theory, pp. 1–1, 2020.
  • [18] A. Rohde and L. Steinberger, "Geometrizing rates of convergence under local differential privacy constraints," Ann. Statist., vol. 48, no. 5, pp. 2646–2670, 2020.
  • [19] P. Kairouz, K. Bonawitz, and D. Ramage, "Discrete distribution estimation under local privacy," in Proc. Int. Conf. Machine Learning, vol. 48, 2016, pp. 2436–2444.
  • [20] J. Duchi and R. Rogers, "Lower bounds for locally private estimation via communication complexity," in Proc. Conference on Learning Theory, 2019, pp. 1161–1191.
  • [21] R. Ahlswede and P. Gács, "Spreading of sets in product spaces and hypercontraction of the Markov operator," Ann. Probab., vol. 4, no. 6, pp. 925–939, 1976.
  • [22] B. Yu, "Assouad, Fano, and Le Cam." Springer New York, 1997, pp. 423–435.
  • [23] A. B. Tsybakov, Introduction to Nonparametric Estimation, 1st ed. Springer Publishing Company, Incorporated, 2008.
  • [24] I. Csiszár, "Information-type measures of difference of probability distributions and indirect observations," Studia Sci. Math. Hungar., vol. 2, pp. 299–318, 1967.
  • [25] S. M. Ali and S. D. Silvey, "A general class of coefficients of divergence of one distribution from another," Journal of the Royal Statistical Society, vol. 28, pp. 131–142, 1966.
  • [26] N. Sharma and N. A. Warsi, "Fundamental bound on the reliability of quantum information transmission," CoRR, abs/1302.5281, 2013. [Online]. Available: http://arxiv.org/abs/1302.5281
  • [27] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
  • [28] B. Balle, G. Barthe, and M. Gaboardi, "Privacy amplification by subsampling: Tight analyses via couplings and divergences," in NeurIPS, 2018, pp. 6280–6290.
  • [29] B. Balle, G. Barthe, M. Gaboardi, and J. Geumlek, "Privacy amplification by mixing and diffusion mechanisms," in NeurIPS, 2019, pp. 13277–13287.
  • [30] S. Asoodeh, M. Diaz, and F. P. Calmon, "Privacy analysis of online learning algorithms via contraction coefficients," arXiv 2012.11035, 2020.
  • [31] R. L. Dobrushin, "Central limit theorem for nonstationary Markov chains. I," Theory Probab. Appl., vol. 1, no. 1, pp. 65–80, 1956.
  • [32] P. Del Moral, M. Ledoux, and L. Miclo, "On contraction properties of Markov kernels," Probab. Theory Relat. Fields, vol. 126, pp. 395–420, 2003.
  • [33] J. E. Cohen, Y. Iwasa, G. Rautu, M. Beth Ruskai, E. Seneta, and G. Zbaganu, "Relative entropy under mappings by stochastic matrices," Linear Algebra and its Applications, vol. 179, pp. 211–235, 1993.
  • [34] V. Anantharam, A. Gohari, S. Kamath, and C. Nair, "On hypercontractivity and a data processing inequality," in Proc. IEEE Int. Symp. Inf. Theory, 2014, pp. 3022–3026.
  • [35] Y. Polyanskiy and Y. Wu, "Strong data-processing inequalities for channels and Bayesian networks," in Convexity and Concentration, E. Carlen, M. Madiman, and E. M. Werner, Eds. New York, NY: Springer New York, 2017, pp. 211–249.
  • [36] Y. Polyanskiy and Y. Wu, "Dissipation of information in channels with input constraints," IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 35–55, Jan. 2016.
  • [37] F. P. Calmon, Y. Polyanskiy, and Y. Wu, "Strong data processing inequalities for input constrained additive noise channels," IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1879–1892, 2018.
  • [38] A. Makur and L. Zheng, "Comparison of contraction coefficients for $f$-divergences," Probl. Inf. Trans., vol. 56, pp. 103–156, 2020.
  • [39] M. Raginsky, "Strong data processing inequalities and $\phi$-Sobolev inequalities for discrete channels," IEEE Trans. Inf. Theory, vol. 62, no. 6, pp. 3355–3389, Jun. 2016.
  • [40] H. S. Witsenhausen, "On sequences of pairs of dependent random variables," SIAM Journal on Applied Mathematics, vol. 28, no. 1, pp. 100–113, 1975.
  • [41] S. L. Warner, "Randomized response: A survey technique for eliminating evasive answer bias," Journal of the American Statistical Association, vol. 60, no. 309, pp. 63–69, 1965.
  • [42] J. Cohen, J. Kemperman, and G. Zbăganu, Comparisons of Stochastic Matrices, with Applications in Information Theory, Economics, and Population Sciences. Birkhäuser, 1998.
  • [43] Y. Yang and A. Barron, "Information-theoretic determination of minimax rates of convergence," Ann. Statist., vol. 27, no. 5, pp. 1564–1599, 1999.
  • [44] A. Xu and M. Raginsky, "Converses for distributed estimation via strong data processing inequalities," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2015, pp. 2376–2380.
  • [45] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
  • [46] A. McGregor, I. Mironov, T. Pitassi, O. Reingold, K. Talwar, and S. Vadhan, "The limits of two-party differential privacy," in Proc. 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS '10), 2010, pp. 81–90.
  • [47] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.

We begin with some alternative expressions for the $\mathsf{E}_{\gamma}$-divergence that are useful in the subsequent proofs. It is straightforward to show that, for any $\gamma\geq 0$, we have

$\mathsf{E}_{\gamma}(P\|Q) = \frac{1}{2}\int|\mathrm{d}P-\gamma\,\mathrm{d}Q|-\frac{1}{2}|1-\gamma|$ (22)
$= \sup_{A\subset\mathcal{X}}\left[P(A)-\gamma Q(A)\right]-(1-\gamma)_{+}$ (23)
$= P\Big(\log\frac{\mathrm{d}P}{\mathrm{d}Q}>\log\gamma\Big)-\gamma\,Q\Big(\log\frac{\mathrm{d}P}{\mathrm{d}Q}>\log\gamma\Big)-(1-\gamma)_{+}.$ (24)

The proof of Theorem 1 relies on the following theorem, recently proved by the authors in [30, Theorem 3].

Theorem 3.

For any $\gamma\geq 1$ and any Markov kernel $\mathsf{K}$ with input alphabet $\mathcal{X}$, we have

$\eta_{\gamma}(\mathsf{K})=\sup_{x,x'\in\mathcal{X}}\mathsf{E}_{\gamma}(\mathsf{K}(\cdot|x)\|\mathsf{K}(\cdot|x')).$ (25)

Notice that for $\gamma=1$, this theorem reduces to the well-known result of Dobrushin [31], which states that

$\eta_{\mathsf{TV}}(\mathsf{K})=\sup_{x,x'\in\mathcal{X}}\mathsf{TV}(\mathsf{K}(\cdot|x),\mathsf{K}(\cdot|x')).$ (26)
Proof of TheoremΒ 1.

It follows from Theorem 3 that

$\eta_{e^{\varepsilon}}(\mathsf{K})\leq\delta\ \Longleftrightarrow\ \sup_{x,x'\in\mathcal{X}}\mathsf{E}_{e^{\varepsilon}}(\mathsf{K}(\cdot|x)\|\mathsf{K}(\cdot|x'))\leq\delta,$ (27)

which, according to (23), implies

$\eta_{e^{\varepsilon}}(\mathsf{K})\leq\delta\ \Longleftrightarrow\ \sup_{x,x'\in\mathcal{X}}\sup_{A\subset\mathcal{Z}}\left[\mathsf{K}(A|x)-e^{\varepsilon}\mathsf{K}(A|x')\right]\leq\delta.$

Hence, in light of Definition 1, $\mathsf{K}$ is $(\varepsilon,\delta)$-LDP if and only if $\eta_{e^{\varepsilon}}(\mathsf{K})\leq\delta$.

∎

Proof of LemmaΒ 1.

We first show the following upper and lower bounds on the $\mathsf{E}_{\gamma}$-divergence in terms of the total variation distance.

Claim. For any distributions $P$ and $Q$ on $\mathcal{X}$ and any $\gamma\geq 1$, we have

$1-\gamma\left(1-\mathsf{TV}(P,Q)\right)\leq\mathsf{E}_{\gamma}(P\|Q)\leq\mathsf{TV}(P,Q).$ (28)
Proof of Claim.

The upper bound is immediate from the definition of the $\mathsf{E}_{\gamma}$-divergence (and holds for any $\gamma\geq 0$). For the lower bound, note that

$\gamma\,\mathsf{TV}(P,Q) = \max_{A\subset\mathcal{X}}\left[\gamma P(A)-\gamma Q(A)\right]$
$= \max_{A\subset\mathcal{X}}\left[P(A)-\gamma Q(A)+(\gamma-1)P(A)\right]$
$\leq \max_{A\subset\mathcal{X}}\left[P(A)-\gamma Q(A)\right]+(\gamma-1)$
$= \mathsf{E}_{\gamma}(P\|Q)+\gamma-1,$

where the last equality follows from (23). This immediately yields the lower bound in (28). ∎

According to this claim, we can write, for $\gamma\geq 1$,

$\mathsf{TV}(P,Q)\leq 1-\frac{1-\mathsf{E}_{\gamma}(P\|Q)}{\gamma}.$

Replacing $P$ and $Q$ with $\mathsf{K}(\cdot|x)$ and $\mathsf{K}(\cdot|x')$, respectively, for some $x$ and $x'$ in $\mathcal{X}$, we obtain

$\mathsf{TV}(\mathsf{K}(\cdot|x),\mathsf{K}(\cdot|x'))\leq 1-\frac{1-\mathsf{E}_{\gamma}(\mathsf{K}(\cdot|x)\|\mathsf{K}(\cdot|x'))}{\gamma}.$

Taking the supremum over $x$ and $x'$ on both sides and invoking Theorem 3 and (26), we conclude that

$\eta_{\mathsf{TV}}(\mathsf{K})\leq 1-\frac{1-\eta_{\gamma}(\mathsf{K})}{\gamma}.$ (29)

It is known [33, 39] that for any Markov kernel $\mathsf{K}$ and any convex function $f$ we have

$\eta_{f}(\mathsf{K})\leq\eta_{\mathsf{TV}}(\mathsf{K}),$ (30)

from which the desired result follows immediately: taking $\gamma=e^{\varepsilon}$ and $\eta_{\gamma}(\mathsf{K})\leq\delta$ in (29) gives $\eta_{f}(\mathsf{K})\leq\eta_{\mathsf{TV}}(\mathsf{K})\leq 1-(1-\delta)e^{-\varepsilon}=\varphi(\varepsilon,\delta)$. ∎

Proof of LemmaΒ 2.

Given $n$ mechanisms $\mathsf{K}_{1},\mathsf{K}_{2},\dots,\mathsf{K}_{n}$, we consider the non-interactive mechanism $P_{Z^{n}|X^{n}}$ given by

$P_{Z^{n}|X^{n}}(z^{n}|x^{n})=\prod_{i=1}^{n}\mathsf{K}_{i}(z_{i}|x_{i}).$

If π–ͺiβˆˆπ’¬Ξ΅,Ξ΄{\mathsf{K}}_{i}\in{\mathcal{Q}}_{\varepsilon,\delta} for i∈[n]i\in[n], then we have Ξ·eΡ​(π–ͺi)≀δ\eta_{e^{\varepsilon}}({\mathsf{K}}_{i})\leq\delta. According to (29), it thus leads to η𝖳𝖡​(π–ͺi)≀φ​(Ξ΅,Ξ΄)\eta_{\mathsf{TV}}({\mathsf{K}}_{i})\leq\varphi(\varepsilon,\delta). Invoking [35, Corollary 9] (see also [44, Lemma 3] and [38, Eq. (62)]), we obtain

$\eta_{\mathsf{TV}}(P_{Z^{n}|X^{n}}) \leq \max_{i\in[n]}\left[1-(1-\eta_{\mathsf{TV}}(\mathsf{K}_{i}))^{n}\right] \leq 1-(1-\varphi(\varepsilon,\delta))^{n} = \varphi_{n}(\varepsilon,\delta),$

and the claim follows since $\eta_{f}(P_{Z^{n}|X^{n}})\leq\eta_{\mathsf{TV}}(P_{Z^{n}|X^{n}})$ by (30).

∎

Proof of LemmaΒ 3.

Recall that $X^{n}$ is an i.i.d. sample from a distribution $P$ and each $Z_{i}$, $i\in[n]$, is obtained by applying $\mathsf{K}_{i}$ to $X_{i}$. Note that, by assumption, $\mathsf{K}_{i}$ specifies the conditional distribution $P_{Z_{i}|X_{i},Z^{i-1}}$. Let $M^{n}_{0}$ and $M^{n}_{1}$ denote the distributions of $Z^{n}$ when $P=P_{0}$ and $P=P_{1}$, respectively. Then, for $P=P_{0}$ and any $z^{n}\in\mathcal{Z}^{n}$,

$M^{n}_{0}(z^{n}) = \prod_{i=1}^{n}P_{Z_{i}|Z^{i-1}}(z_{i}|z^{i-1})$ (31)
$= \prod_{i=1}^{n}\left(P_{X_{i}|Z^{i-1}=z^{i-1}}\mathsf{K}_{i}\right)(z_{i})$ (32)
$= \prod_{i=1}^{n}\left(P_{0}\mathsf{K}_{i}\right)(z_{i}).$ (33)

With this in mind, we can write

$\mathsf{TV}^{2}(M^{n}_{0},M^{n}_{1}) \overset{(a)}{\leq} \frac{1}{2}D_{\mathsf{KL}}(M^{n}_{0}\|M^{n}_{1})$ (34)
$\overset{(b)}{=} \frac{1}{2}\sum_{i=1}^{n}D_{\mathsf{KL}}(P_{0}\mathsf{K}_{i}\|P_{1}\mathsf{K}_{i})$ (35)
$\overset{(c)}{\leq} \frac{1}{2}\sum_{i=1}^{n}\varphi(\varepsilon,\delta)D_{\mathsf{KL}}(P_{0}\|P_{1}),$ (36)

where $(a)$ follows from Pinsker's inequality, $(b)$ is due to the chain rule of the KL divergence, and $(c)$ is an application of Lemma 1. Plugging (36) into (6), we obtain the desired result. ∎
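Step $(c)$ can be probed numerically: by (30), $\eta_{\mathsf{KL}}({\mathsf{K}}_{i})\leq\eta_{\mathsf{TV}}({\mathsf{K}}_{i})$, so the per-coordinate contraction $D_{\mathsf{KL}}(P_{0}{\mathsf{K}}_{i}\|P_{1}{\mathsf{K}}_{i})\leq\eta_{\mathsf{TV}}({\mathsf{K}}_{i})\,D_{\mathsf{KL}}(P_{0}\|P_{1})$ is checkable directly, as in the following sketch (helper names and the randomized-response example are ours).

```python
# Sketch: D_KL(P0 K || P1 K) <= eta_TV(K) * D_KL(P0 || P1), the SDPI behind step (c).
import numpy as np
from itertools import combinations

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def eta_tv(K):
    return max(0.5 * np.abs(p - q).sum() for p, q in combinations(K, 2))

rng = np.random.default_rng(1)
k, eps = 4, 0.5
P0, P1 = rng.dirichlet(np.ones(k)), rng.dirichlet(np.ones(k))
K = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))   # k-ary randomized response
np.fill_diagonal(K, np.exp(eps) / (np.exp(eps) + k - 1))

lhs, rhs = kl(P0 @ K, P1 @ K), eta_tv(K) * kl(P0, P1)
assert lhs <= rhs + 1e-12
print(f"{lhs:.5f} <= {rhs:.5f}")
```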

Proof of Corollary 1.

Fix $\omega\in(0,1]$ and consider two distributions $P_{0}$ and $P_{1}$ on $\{-\omega^{-\frac{1}{k}},0,\omega^{-\frac{1}{k}}\}$ defined as

$$P_{0}(-\omega^{-\frac{1}{k}})=\omega,\qquad P_{0}(0)=1-\omega,$$

and

$$P_{1}(\omega^{-\frac{1}{k}})=\omega,\qquad P_{1}(0)=1-\omega.$$

It can be verified that both $P_{0}$ and $P_{1}$ belong to ${\mathcal{P}}_{k}$. Note that $\ell_{2}^{2}(\theta(P_{0}),\theta(P_{1}))=2\omega^{\frac{2(k-1)}{k}}$. Let $M_{0}^{n}=P^{\otimes n}_{0}{\mathsf{K}}^{n}$ and $M_{1}^{n}=P^{\otimes n}_{1}{\mathsf{K}}^{n}$ be the corresponding output distributions of the mechanism ${\mathsf{K}}^{n}={\mathsf{K}}_{1}\dots{\mathsf{K}}_{n}$, the composition of the mechanisms ${\mathsf{K}}_{i}$. Le Cam's bound for the $\ell_{2}^{2}$-metric yields

$$\begin{aligned}\mathcal{R}_{n}({\mathcal{P}}_{k},\ell_{2}^{2},\varepsilon,\delta) &\geq \omega^{\frac{2(k-1)}{k}}\left(1-{\mathsf{TV}}(M^{n}_{0},M^{n}_{1})\right)\\ &\geq \omega^{\frac{2(k-1)}{k}}\left(1-H(M^{n}_{0},M^{n}_{1})\right), \qquad (37)\end{aligned}$$

where the last inequality follows from the fact that ${\mathsf{TV}}(P,Q)\leq H(P,Q)$, with $H(P,Q)$ denoting the Hellinger distance. Notice that $M_{0}^{n}=\prod_{i=1}^{n}(P_{0}{\mathsf{K}}_{i})$ and $M_{1}^{n}=\prod_{i=1}^{n}(P_{1}{\mathsf{K}}_{i})$, where each ${\mathsf{K}}_{i}$, $i\in[n]$, is $(\varepsilon,\delta)$-LDP. It is well known that

$$H^{2}\left(\prod_{i=1}^{n}P_{i},\prod_{i=1}^{n}Q_{i}\right)=2-2\prod_{i=1}^{n}\left(1-\frac{1}{2}H^{2}(P_{i},Q_{i})\right).$$

Thus, applying this identity together with the contraction of the squared Hellinger distance from Lemma 1,

$$\begin{aligned}H^{2}(M_{0}^{n},M_{1}^{n}) &= 2-2\prod_{i=1}^{n}\left(1-\frac{1}{2}H^{2}(P_{0}{\mathsf{K}}_{i},P_{1}{\mathsf{K}}_{i})\right)\\ &\leq 2-2\prod_{i=1}^{n}\left(1-\frac{\varphi(\varepsilon,\delta)}{2}H^{2}(P_{0},P_{1})\right)\\ &= 2-2\left(1-\frac{\varphi(\varepsilon,\delta)}{2}H^{2}(P_{0},P_{1})\right)^{n}\\ &= 2-2\left(1-\omega\varphi(\varepsilon,\delta)\right)^{n}, \qquad (38)\end{aligned}$$

where the last equality uses $H^{2}(P_{0},P_{1})=2\omega$. Hence, we obtain

$${\mathsf{TV}}(M_{0}^{n},M_{1}^{n})\leq\sqrt{2-2\left(1-\omega\varphi(\varepsilon,\delta)\right)^{n}}. \qquad (39)$$

Plugging (38) into (37), we obtain

$$\mathcal{R}_{n}({\mathcal{P}}_{k},\ell_{2}^{2},\varepsilon,\delta)\geq\omega^{\frac{2(k-1)}{k}}\left[1-\sqrt{2}\sqrt{1-(1-\omega\varphi(\varepsilon,\delta))^{n}}\right]. \qquad (40)$$

Now, choose $\omega=\min\left\{1,\frac{1}{\varphi(\varepsilon,\delta)}\left[1-\left(\frac{7}{8}\right)^{\frac{1}{\sqrt{n}}}\right]\right\}$. Notice that we assume $\delta>0$, and hence $\varphi(\varepsilon,\delta)>0$ regardless of $\varepsilon$. Plugging this choice of $\omega$ into the above bound, we obtain

$$\begin{aligned}\mathcal{R}_{n}({\mathcal{P}}_{k},\ell_{2}^{2},\varepsilon,\delta) &\gtrsim (\varphi(\varepsilon,\delta))^{-\frac{2(k-1)}{k}}\left[1-\left(\frac{7}{8}\right)^{\frac{1}{\sqrt{n}}}\right]^{\frac{2(k-1)}{k}}\\ &\gtrsim (\varphi(\varepsilon,\delta))^{-\frac{2(k-1)}{k}}\,n^{-\frac{k-1}{k}}. \qquad (41)\end{aligned}$$

∎
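The only non-elementary ingredient in this proof is the tensorization identity for the squared Hellinger distance displayed before (38); a short numerical confirmation (helper names ours) follows.

```python
# Sketch: H^2(prod P_i, prod Q_i) = 2 - 2 * prod_i (1 - H^2(P_i, Q_i)/2).
import numpy as np
from functools import reduce

def h2(p, q):
    # Squared Hellinger distance between discrete distributions p and q.
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(2)
pairs = [(rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))) for _ in range(4)]

P = reduce(np.kron, [p for p, _ in pairs])   # product distribution via Kronecker products
Q = reduce(np.kron, [q for _, q in pairs])

lhs = h2(P, Q)
rhs = 2 - 2 * np.prod([1 - h2(p, q) / 2 for p, q in pairs])
assert abs(lhs - rhs) < 1e-9
print(f"{lhs:.6f} == {rhs:.6f}")
```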

Proof of Lemma 4.

Note that we have the Markov chain $V - X^{n} - Z^{n}$. It has been shown in [47, Problem 15.12] (see also [34, 47]) that for any channel $P_{B|A}$ connecting a random variable $A$ to $B$, we have

Ξ·π–ͺ𝖫​(PB|A)=supPA​U:U⊸--A⊸--BI​(U;B)I​(U;A).\eta_{\mathsf{KL}}(P_{B|A})=\sup_{\begin{subarray}{c}P_{AU}:\\ U\mathrel{\multimap}\joinrel\mathrel{-}\mspace{-9.0mu}\joinrel\mathrel{-}A\mathrel{\multimap}\joinrel\mathrel{-}\mspace{-9.0mu}\joinrel\mathrel{-}B\end{subarray}}\frac{I(U;B)}{I(U;A)}. (42)

Replacing $A$ and $B$ with $X^{n}$ and $Z^{n}$, respectively, in the above equation, we obtain

$$\begin{aligned}I(Z^{n};V) &\leq \eta_{\mathsf{KL}}({\mathsf{K}}^{\otimes n})\,I(X^{n};V)\\ &\leq \eta_{\mathsf{KL}}({\mathsf{K}}^{\otimes n})\,\frac{n}{|{\mathcal{V}}|}\sum_{v\in{\mathcal{V}}}D_{\mathsf{KL}}(P_{v}\|\bar{P}),\end{aligned}$$

where π–ͺβŠ—n=PZn|Xn{\mathsf{K}}^{\otimes n}=P_{Z^{n}|X^{n}} and PΒ―=1|𝒱|β€‹βˆ‘vβˆˆπ’±Pv\bar{P}=\frac{1}{|{\mathcal{V}}|}\sum_{v\in{\mathcal{V}}}P_{v}. The desired result then follows from LemmaΒ 2 and the convexity of KL-divergence. ∎

Proof of Corollary 2.

The proof strategy is as follows: we first construct a set of probability distributions $\{P_{v}\}$ for $v$ taking values in a finite set ${\mathcal{V}}$, and then apply Fano's inequality (9), where $V$ is a uniform random variable on ${\mathcal{V}}$. Duchi et al. [10, Lemma 6] showed that, for any integer $k\in[d]$, there exists a subset ${\mathcal{V}}_{k}$ of the $k$-dimensional hypercube $\{-1,+1\}^{k}$ satisfying $\|v-v^{\prime}\|_{1}\geq\frac{k}{2}$ for all $v,v^{\prime}\in{\mathcal{V}}_{k}$ with $v\neq v^{\prime}$, while $|{\mathcal{V}}_{k}|$ is at least $\lceil e^{\frac{k}{16}}\rceil$. If $k<d$, one can extend ${\mathcal{V}}_{k}\subset\mathbb{R}^{k}$ to a subset of $\mathbb{R}^{d}$ by considering ${\mathcal{V}}={\mathcal{V}}_{k}\times\{0\}^{d-k}$. Fix $\omega\in(0,1)$ and define a distribution $P_{v}\in{\mathcal{P}}(\mathsf{B}^{d}_{2}(r))$ for $v\in{\mathcal{V}}$ as follows: choose an index $j\in[k]$ uniformly and set $P_{v}(r\mathsf{e}_{j})=\frac{1+\omega v_{j}}{2}$ and $P_{v}(-r\mathsf{e}_{j})=\frac{1-\omega v_{j}}{2}$, where $\mathsf{e}_{j}$ is the $j$-th standard basis vector in $\mathbb{R}^{d}$. Given $v\in{\mathcal{V}}_{k}$, let $X\sim P_{v}$ be a random variable taking values in $\{\pm r\mathsf{e}_{j}\}_{j=1}^{k}$ and let $X^{n}$ be an i.i.d. sample of $X$. Furthermore, as before, let $Z^{n}$ be a privatized sample of $X^{n}$ obtained by ${\mathsf{K}}^{\otimes n}$, with ${\mathsf{K}}$ being an $(\varepsilon,\delta)$-LDP mechanism. To apply Fano's inequality, we first need to bound $I(Z^{n};V)$. According to Lemma 4, we have

$$\begin{aligned}I(Z^{n};V) &\leq \varphi_{n}(\varepsilon,\delta)\,I(X^{n};V)\\ &\leq \varphi_{n}(\varepsilon,\delta)\,\frac{n}{|{\mathcal{V}}_{k}|^{2}}\sum_{v,v^{\prime}}D_{\mathsf{KL}}(P_{v}\|P_{v^{\prime}}). \qquad (43)\end{aligned}$$

Hence, bounding $I(Z^{n};V)$ reduces to bounding $I(X^{n};V)$. To this end, first notice that $I(X^{n};V)\leq nI(X;V)$. Let $K$ be a uniform random variable on $[k]$, independent of $V$, that chooses the coordinate of $V$. Note that $K$ can be determined by $X$, and hence

$$\begin{aligned}I(X;V) &= I(X,K;V)\\ &= I(X;V|K)\\ &\leq \log 2-h_{\mathsf{b}}\left(\frac{1-\omega}{2}\right)\\ &\leq \omega\log 2,\end{aligned}$$

where the last inequality follows from the fact that $h_{\mathsf{b}}(a)\geq 2a\log 2$ for $a\in[0,\frac{1}{2}]$, due to the concavity of entropy. Consequently, we can write

$$I(Z^{n};V)\leq n\omega\varphi_{n}(\varepsilon,\delta)\log 2. \qquad (44)$$

Applying Fano’s inequality, we obtain

$$\begin{aligned}\mathcal{R}_{n}({\mathcal{P}},\ell,\varepsilon,\delta) &\geq \frac{r^{2}\omega^{2}}{k}\left[1-\frac{(1+n\omega\varphi_{n}(\varepsilon,\delta))\log 2}{\log|{\mathcal{V}}|}\right] \qquad (45)\\ &\geq \frac{r^{2}\omega^{2}}{k}\left[1-\frac{16(1+n\omega\varphi_{n}(\varepsilon,\delta))\log 2}{k}\right]. \qquad (46)\end{aligned}$$

Setting $\omega=\min\{1,\frac{k}{50n\varphi_{n}(\varepsilon,\delta)}\}$ and assuming $k\geq 16$, we can write

$$\mathcal{R}_{n}({\mathcal{P}},\ell,\varepsilon,\delta)\gtrsim r^{2}\max_{k\in[d]}\min\left\{\frac{1}{k},\frac{k}{n^{2}\varphi^{2}_{n}(\varepsilon,\delta)}\right\}. \qquad (47)$$

By choosing $k=\min\{n\varphi_{n}(\varepsilon,\delta),d\}$, we obtain

$$\mathcal{R}_{n}({\mathcal{P}},\ell,\varepsilon,\delta)\gtrsim r^{2}\min\left\{\frac{1}{n\varphi_{n}(\varepsilon,\delta)},\frac{d}{n^{2}\varphi^{2}_{n}(\varepsilon,\delta)}\right\}. \qquad (48)$$

∎
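The one analytic inequality in this proof, $h_{\mathsf{b}}(a)\geq 2a\log 2$ on $[0,\frac{1}{2}]$, together with the resulting step $\log 2-h_{\mathsf{b}}(\frac{1-\omega}{2})\leq\omega\log 2$, is easy to confirm numerically; a short sketch (names ours) follows.

```python
# Sketch: verify h_b(a) >= 2a*log(2) on [0, 1/2] and the resulting bound on I(X;V|K).
import numpy as np

def hb(a):
    a = np.clip(a, 1e-12, 1 - 1e-12)
    return -a * np.log(a) - (1 - a) * np.log(1 - a)

a = np.linspace(0.0, 0.5, 10001)
assert np.all(hb(a) >= 2 * a * np.log(2) - 1e-9)

w = np.linspace(0.0, 1.0, 10001)
assert np.all(np.log(2) - hb((1 - w) / 2) <= w * np.log(2) + 1e-9)
print("both inequalities hold on the grid")
```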

Proof of Theorem 2.

Let $\hat{\Theta}=\Psi(Z^{n})$ be an estimate of $\Theta$ for some $\Psi$, and let $p_{\zeta}\coloneqq P_{\Theta\hat{\Theta}}(\ell(\Theta,\hat{\Theta})\leq\zeta)$ and $q_{\zeta}\coloneqq(P_{\Theta}P_{\hat{\Theta}})(\ell(\Theta,\hat{\Theta})\leq\zeta)$; i.e., $p_{\zeta}$ and $q_{\zeta}$ are the probabilities of the event $\{\ell(\Theta,\hat{\Theta})\leq\zeta\}$ under the joint and product distributions, respectively. By definition, we have for any $\gamma\geq 1$

$$\begin{aligned}I_{\gamma}(\Theta;\hat{\Theta}) &= {\mathsf{E}}_{\gamma}(P_{\Theta\hat{\Theta}}\|P_{\Theta}P_{\hat{\Theta}})\\ &= \sup_{A\subset{\mathcal{T}}\times{\mathcal{T}}}\left[P_{\Theta\hat{\Theta}}(A)-\gamma(P_{\Theta}P_{\hat{\Theta}})(A)\right]\\ &\geq p_{\zeta}-\gamma q_{\zeta}\\ &\geq p_{\zeta}-\gamma{\mathcal{L}}(\zeta),\end{aligned}$$

where the last inequality follows from the fact that $q_{\zeta}\leq{\mathcal{L}}(\zeta)$, which can be shown as follows:

$$\begin{aligned}q_{\zeta} &= \int_{{\mathcal{T}}}\int_{{\mathcal{T}}}1_{\{\ell(\theta,\hat{\theta})\leq\zeta\}}\,P_{\Theta}(\mathrm{d}\theta)P_{\hat{\Theta}}(\mathrm{d}\hat{\theta})\\ &\leq \sup_{t\in{\mathcal{T}}}\int_{{\mathcal{T}}}1_{\{\ell(\theta,t)\leq\zeta\}}\,P_{\Theta}(\mathrm{d}\theta)\\ &= {\mathcal{L}}(\zeta).\end{aligned}$$

Recalling that $\Pr(\ell(\Theta,\hat{\Theta})>\zeta)=1-p_{\zeta}$, the above thus implies

$$\Pr(\ell(\Theta,\hat{\Theta})>\zeta)\geq 1-I_{\gamma}(\Theta;\hat{\Theta})-\gamma{\mathcal{L}}(\zeta). \qquad (49)$$

Since, by Markov's inequality, ${\mathbb{E}}[\ell(\Theta,\hat{\Theta})]\geq\zeta\Pr(\ell(\Theta,\hat{\Theta})\geq\zeta)$, setting $\gamma=e^{\varepsilon}$ we can write

Rnπ–‘π–Ίπ—’π–Ύπ—Œβ€‹(PΘ,β„“,Ξ΅,Ξ΄)\displaystyle R_{n}^{\mathsf{Bayes}}(P_{\Theta},\ell,\varepsilon,\delta) β‰₯΢​[1βˆ’IeΡ​(Θ;Θ^)βˆ’eΡ​ℒ​(ΞΆ)]\displaystyle\geq\zeta\left[1-I_{e^{\varepsilon}}(\Theta;\hat{\Theta})-e^{\varepsilon}{\mathcal{L}}(\zeta)\right]
β‰₯΢​[1βˆ’IeΡ​(Θ;Zn)βˆ’eΡ​ℒ​(ΞΆ)],\displaystyle\geq\zeta\left[1-I_{e^{\varepsilon}}(\Theta;Z^{n})-e^{\varepsilon}{\mathcal{L}}(\zeta)\right],

where the second inequality comes from the data processing inequality for $I_{\gamma}$. To further lower bound the right-hand side, we write

$$\begin{aligned}I_{e^{\varepsilon}}(\Theta;Z^{n}) &= \int_{{\mathcal{T}}}{\mathsf{E}}_{e^{\varepsilon}}(P_{Z^{n}|\Theta=\theta}\|P_{Z^{n}})\,P_{\Theta}(\mathrm{d}\theta)\\ &\leq \eta_{e^{\varepsilon}}(P_{Z^{n}|X^{n}})\int_{{\mathcal{T}}}{\mathsf{E}}_{e^{\varepsilon}}(P_{X^{n}|\Theta=\theta}\|P_{X^{n}})\,P_{\Theta}(\mathrm{d}\theta)\\ &= \eta_{e^{\varepsilon}}(P_{Z^{n}|X^{n}})\,I_{e^{\varepsilon}}(\Theta;X^{n}),\end{aligned}$$

where the inequality follows from the definition of the contraction coefficient. When $n=1$, we have $\eta_{e^{\varepsilon}}({\mathsf{K}})\leq\delta$, as ${\mathsf{K}}=P_{Z|X}$ is assumed to be $(\varepsilon,\delta)$-LDP. For $n>1$, we invoke Lemma 2 to obtain $\eta_{e^{\varepsilon}}(P_{Z^{n}|X^{n}})\leq\varphi_{n}(\varepsilon,\delta)$. ∎
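The first chain of inequalities in this proof is finite-dimensional and can be sanity-checked on a toy example: compute $I_{\gamma}(\Theta;\hat{\Theta})$ exactly for a small joint distribution and compare it with $p_{\zeta}-\gamma q_{\zeta}$ and with (49), as in the sketch below (the joint distribution, loss, and names are ours).

```python
# Sketch: verify (49) on a toy joint distribution of (Theta, Theta_hat)
# with loss l(t, t') = |t - t'| and zeta = 0.
import numpy as np

rng = np.random.default_rng(3)
J = rng.random((3, 3)); J /= J.sum()       # joint distribution P_{Theta, Theta_hat}
pt, pth = J.sum(axis=1), J.sum(axis=0)     # marginals
gamma, zeta = np.exp(0.5), 0.0

# I_gamma = E_gamma(joint || product): sum of positive parts of J - gamma * product.
I_gamma = np.maximum(J - gamma * np.outer(pt, pth), 0.0).sum()

event = np.abs(np.subtract.outer(np.arange(3), np.arange(3))) <= zeta
p_z = J[event].sum()                       # P(l <= zeta) under the joint
q_z = np.outer(pt, pth)[event].sum()       # P(l <= zeta) under the product
L_z = pt.max()                             # small-ball function L(zeta) at zeta = 0

assert I_gamma >= p_z - gamma * q_z >= p_z - gamma * L_z
assert 1 - p_z >= 1 - I_gamma - gamma * L_z          # this is exactly (49)
print(f"I_gamma = {I_gamma:.4f}, p - gamma*q = {p_z - gamma * q_z:.4f}")
```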

Proof of Corollary 4.

Let $\beta_{n}(\alpha)\coloneqq\beta^{\infty,1}_{n}(\alpha)$ be the non-private trade-off between the type I and type II error probabilities (i.e., $Z^{n}=X^{n}$). According to the Chernoff-Stein lemma (see, e.g., [45, Theorem 11.8.3]), we have

$$\lim_{n\to\infty}\frac{1}{n}\log\beta_{n}(\alpha)=-D_{\mathsf{KL}}(P_{0}\|P_{1}). \qquad (50)$$

Assume now that $Z^{n}$ is the output of ${\mathsf{K}}^{\otimes n}$ for an $(\varepsilon,\delta)$-LDP mechanism ${\mathsf{K}}$. According to (50), we obtain

$$\lim_{n\to\infty}\frac{1}{n}\log\beta^{\varepsilon,\delta}_{n}(\alpha)=-\sup_{{\mathsf{K}}\in{\mathcal{Q}}_{\varepsilon,\delta}}D_{\mathsf{KL}}(P_{0}{\mathsf{K}}\|P_{1}{\mathsf{K}}). \qquad (51)$$

Applying Lemma 1, we obtain the desired result. ∎

Proof of Corollary 5.

Consider the (trivial) Markov chain $X - X - Z$. According to (42), we can write $I(X;Z)\leq\eta_{\mathsf{KL}}({\mathsf{K}})H(X)$. The desired result then immediately follows from Lemma 1. ∎
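This bound, too, admits a quick numerical check: since $\eta_{\mathsf{KL}}({\mathsf{K}})\leq\eta_{\mathsf{TV}}({\mathsf{K}})$ by (30), it suffices to verify the weaker inequality $I(X;Z)\leq\eta_{\mathsf{TV}}({\mathsf{K}})H(X)$, as in the sketch below (the source, mechanism, and names are ours).

```python
# Sketch: I(X;Z) <= eta_TV(K) * H(X), a weaker form of Corollary 5 via (30).
import numpy as np
from itertools import combinations

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def eta_tv(K):
    return max(0.5 * np.abs(p - q).sum() for p, q in combinations(K, 2))

rng = np.random.default_rng(4)
k, eps = 5, 1.0
px = rng.dirichlet(np.ones(k))             # source distribution of X
K = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))   # k-ary randomized response
np.fill_diagonal(K, np.exp(eps) / (np.exp(eps) + k - 1))

pz = px @ K
mi = entropy(px) + entropy(pz) - entropy((px[:, None] * K).ravel())
assert mi <= eta_tv(K) * entropy(px) + 1e-12
print(f"I(X;Z) = {mi:.4f} <= {eta_tv(K) * entropy(px):.4f}")
```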