
Format Preserving Encryption in the Bounded Retrieval Model

Ben Morris, Hans Oberschelp, and Hamilton Samraj Santhakumar (Department of Mathematics, University of California, Davis)
(July 16, 2023)
Abstract

In the bounded retrieval model, the adversary can leak a certain amount of information from the message sender's computer (e.g., 10 percent of the hard drive). Bellare, Kane and Rogaway give an efficient symmetric encryption scheme in the bounded retrieval model. Their scheme uses a giant key (a key so large that only a fraction of it can be leaked). One property of their scheme is that the encrypted message is larger than the original message. Rogaway asked if an efficient scheme exists that does not increase the size of the message. In this paper we present such a scheme.

1 Introduction

The present paper attempts to solve the problem of format preserving encryption in the bounded retrieval model, by constructing a pseudorandom permutation and providing concrete security bounds in the random oracle model. The bounded retrieval model was introduced to study cryptographic protocols that remain secure in the presence of an adversary that can transmit or leak private information from the host's computer to a remote home base. One example of such an adversary is an APT (Advanced Persistent Threat), which is malware that stays undetected in the host's network and tries to exfiltrate the secret keys used by the host. The premise of the bounded retrieval model is that such an adversary cannot move a large amount of data to a remote base without being detected, or that it can only communicate with the remote base through a very narrow channel. That is, the model assumes an upper bound on the amount of data that an adversary can leak. In [1] Bellare, Kane and Rogaway introduce an efficient symmetric encryption scheme in this model and give concrete security bounds for it. They assume that the secret key is very large and model the leaked data as a function that takes the secret key as the input and outputs a smaller string. The length of this string is a parameter on which the security bounds depend. Their algorithm uses a random seed R along with the big key to generate a key of conventional length that is indistinguishable from a random string of the same length, even when the function used to model the leaked data depends on calls to the random oracle that the algorithm uses. It then uses this newly generated key and any of the conventionally available symmetric encryption schemes, say an AES mode of operation, to create a ciphertext C. Finally it outputs (R, C).

The above scheme is not format preserving since the final ciphertext (R, C) is longer than the original message M. A question posed by Phillip Rogaway (personal communication) is whether a secure format preserving encryption scheme exists in the bounded retrieval model. Another way to pose this question is as follows: if the adversary is allowed to leak data, is it possible to construct a pseudorandom permutation that is secure under some notion of security, say the CCA notion of security? The aim of this paper is to answer this question. Unfortunately it is not possible to come up with a pseudorandom permutation that is secure under the strong notion of CCA security. This is because in the CCA model, before trying to distinguish between a random permutation and the pseudorandom permutation, the adversary can choose to look at a sequence of plaintext-ciphertext pairs of its choosing. If leakage of data is allowed, the adversary can simply leak a single plaintext-ciphertext pair and use it to gain a very high CCA advantage. Hence we weaken the notion of security by requiring that the adversary can only look at a sequence of plaintext-ciphertext pairs where the plaintexts are uniformly random and distinct. We then ask the adversary to distinguish between a truly random permutation and the pseudorandom permutation. (See Section 3 for a precise definition of the security in our setup.)
In the present paper, we operate in the setting of the random oracle model (see [2]). Our contribution is to give a pseudorandom permutation in the bounded retrieval model and prove that it is secure in the weak sense that is discussed above.

Just as in [1] we use a big key. We now give a brief sketch of our approach. Note that if one fixes the string of leaked bits, the key is a uniform sample from the preimage of the leaked string. If the length of the leaked string is small, then on average the preimage is very large. What this means is that even when the leakage is known, with high probability the total entropy of the key is high. This implies that the sum of the entropies of the individual bits of our key is large. This means that many of the bits in the key are "unpredictable" in the sense that the probability of 1 is not close to 0 or 1. So, if one uses a random oracle to look at various positions of the key and take an XOR, it is likely that the resulting bit is close to an unbiased random bit. This idea of probing the key is similar to the one used in [1]. The content of Sections 6 and 7, which form the heart of this paper, is to show that bits generated by probing the key are close to i.i.d. unbiased random bits. To construct a pseudorandom permutation using these bits, we use a particular card shuffling scheme called the Thorp shuffle, just as in [9]. This construction is given in the next section.

2 Thorp shuffle/maximally unbalanced Feistel network

One method of turning a pseudorandom function into a pseudorandom permutation is to use a Feistel network (see [5]). The maximally unbalanced Feistel network is also known as the Thorp shuffle. Round r of this shuffle can be described as follows. Suppose that the current binary string (i.e., the value of the message after the first r−1 rounds of encryption) is LR, where length(L) = 1 and length(R) = 𝓂−1. Then round r transforms the string to RL′, where

L^{\prime}=L\oplus F_{k}(R,r),

and F_k is a pseudorandom function. Let X_t(m) denote the result of t Thorp shuffles on message m. The novel idea in the present paper is to use a pseudorandom function based on a big key k.
 
The Big Key Pseudorandom Function: Let 𝓀 be the length of the big key k. To compute F_k(R, r), apply the random oracle to (R, r) to obtain (P, 𝒮), where P = (P_1, …, P_n) is a sequence of n samples with replacement from {1, …, 𝓀} and 𝒮 is a uniform random subset of {1, …, n} that is independent of P. By analogy with [1], we define the random subkey by k[P] := (k[P_1], …, k[P_n]). Finally, define

F_{k}(R,r)=\oplus_{i\in\mathcal{S}}k[P_{i}]\,.

That is, F_k(R, r) is the XOR of a randomly chosen subsequence of the subkey. For a given key k we define our cipher to be X_T(·) for some fixed positive integer T.

Remark: We conjecture that it would also work (i.e., we would get a suitable pseudorandom function) if we took the XOR of the entire subkey; the current definition is used because it makes the proof simpler.
 

3 Security of the Cipher

In this section we introduce a notion of security for pseudorandom permutations under the assumption that there is a leakage of data. Let 𝒦 = {0,1}^𝓀 denote the set of keys and let ℳ = {0,1}^𝓂 denote the set of messages. We assume that the adversary can leak 𝓁 bits of data and, just as in [1], use a function Φ : 𝒦 → {0,1}^𝓁 to model this. Henceforth we will refer to this function as the leakage function. The adversary has the power to choose this function, and the function can depend on calls to the random oracle. For a key K, we will use L = Φ(K) to denote the output one gets by applying the leakage function to it. We will call this the leakage. We allow the adversary to make 𝓇 random oracle calls and decide on a leakage function Φ. After the adversary has chosen a leakage function, consider the following two worlds.
World 1: In this world, we first choose distinct uniformly random messages M_1, …, M_q ∈ ℳ. Then, for a uniformly random key K ∈ 𝒦, we set C_i = X_T(M_i), where X_t is the Thorp shuffle based cipher we defined in Section 2 and T is some fixed positive integer. We give the adversary access to the leakage L, the input-output pairs (M_1, C_1), …, (M_q, C_q) and the random oracle calls that were used by the algorithm to compute the X_T(M_i)'s.
World 0: In this world, again we choose distinct uniformly random messages M_1, …, M_q ∈ ℳ. We once again choose a random key K ∈ 𝒦 and compute L and all the random oracle calls necessary to evaluate the X_T(M_i)'s, just like world 1. However, instead of setting the C_i's to be the outputs of the cipher, we choose a uniformly random permutation π : ℳ → ℳ and set C_i = π(M_i). Just as in world 1, the adversary is provided access to the input-output pairs for the q messages, the leakage L and the random oracle calls.
We now place the adversary in these two worlds one at a time, without revealing which world is which, and in each case ask the adversary to guess which world it is. Let 𝒜(0) and 𝒜(1) denote the answers given in world 0 and world 1 respectively. Then, we define the advantage of an adversary as

\textbf{Adv}(\mathcal{A})=\mathbb{P}_{1}\big(\mathcal{A}(1)=1\big)-\mathbb{P}_{0}\big(\mathcal{A}(0)=1\big), (1)

where ℙ_i is the probability measure in world i. Define the maximum advantage

\mathbf{MaxAdv}_{\mathcal{r},q}=\max_{\mathcal{A}}\Big(\textbf{Adv}(\mathcal{A})\Big), (2)

where the maximum is taken over all adversaries satisfying the above mentioned conditions. Note that if, in the above setup, we allow the messages to be chosen by the adversary instead of being random, we get the notion of security of a block cipher against a chosen-plaintext attack (CPA) under leakage. Security against CPA is weaker than security against CCA (chosen-ciphertext attack). Unfortunately, if leakage is allowed, it is not possible to design a cipher that is secure in the CPA framework. To see this, consider the adversary who does the following. Let q = 1. Assume that the message length 𝓂 is less than 𝓁. For each key k, the adversary includes the ciphertext X_T(M_1) in the leakage, for a fixed message M_1. Then the adversary answers as follows: if C_1 = X_T(M_1), the adversary guesses world 1; else, the guess is world 0. In this case, ℙ_1(𝒜(1)=1) = 1 and ℙ_0(𝒜(0)=1) = 1/2^𝓂. Hence this adversary has a very high advantage. By instead providing the adversary with uniformly random plaintext-ciphertext pairs, we get the notion of security against a known-plaintext attack (KPA) under leakage. The main result of this paper is the following bound on the maximum advantage of such an adversary.

Theorem 1

The adversary’s advantage satisfies

\mathbf{MaxAdv}_{\mathcal{r},q}\leq\frac{q}{\mathcal{s}+1}\bigg(\frac{4\mathcal{m}q}{2^{\mathcal{m}}}\bigg)^{\mathcal{s}}+\frac{qT}{2}\Bigl[h^{-1}\Bigl(1-\frac{\alpha+n}{\mathcal{k}}\Bigr)\Bigr]^{n/2}+\frac{q\mathcal{r}}{2^{\mathcal{m}-1}}+\frac{qT}{2^{\mathcal{m}}},

where 𝓇 is the number of random oracle calls, α = 𝓁 + 𝓂(q+1) + T, 𝓈 is an integer satisfying the equation T = 𝓈(2𝓂−1), and h⁻¹ is the inverse of the function h restricted to [1/2, 1], where h is defined by h(p) = p log₂(1/p) + (1−p) log₂(1/(1−p)).

Let's try to make sense of this bound. The first two terms have exponents that we can control by choosing the parameters of the cipher. Specifically, with modest assumptions on the number of queries and the amount of leakage, we can make the first term as small as desired by running the cipher for T = 𝒪(log(q)) rounds, and make the second term equally small by sampling n = 𝒪(log(q)) probes in each round.
To make sense of the last two terms, let's consider an adversary which we will call the naive adversary. The naive adversary chooses a set ℳ′ of ⌊𝓁/𝓂⌋ messages and uses the 𝓁 bits of leakage to leak the ciphertext of each message in ℳ′. Next, when placed in either world 0 or world 1, the naive adversary checks if any of the q random messages provided is from the collection ℳ′. The naive adversary answers "world 1" if the corresponding ciphertext matches the leaked ciphertext. Otherwise, they answer "world 0". If none of the q provided messages are from ℳ′, then the naive adversary answers based on the flip of an independent fair coin. Let Adv_naive denote the advantage of the naive adversary. Then,

\mathbf{Adv_{naive}}=\mathbb{P}(M_{i}\in\mathcal{M}^{\prime}\text{ for some }1\leq i\leq q)\,(1-2^{-\mathcal{m}}).

Recall that the distinct messages M_1, …, M_q are sampled uniformly, and that |ℳ′| = ⌊𝓁/𝓂⌋. Let X = |{M_1, …, M_q} ∩ ℳ′|. Then X is a hypergeometric random variable and

\mathbb{P}(M_{i}\in\mathcal{M}^{\prime}\text{ for some }1\leq i\leq q)=\mathbb{P}(X>0).

Using the bound

\mathbb{P}(X>0)\geq\frac{\mathbb{E}X}{1+\mathbb{E}X}

for hypergeometric random variables, we get

\mathbf{Adv_{naive}}\geq\frac{q\lfloor\mathcal{l}/\mathcal{m}\rfloor 2^{-\mathcal{m}}}{1+q\lfloor\mathcal{l}/\mathcal{m}\rfloor 2^{-\mathcal{m}}}\cdot(1-2^{-\mathcal{m}}).
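As a sanity check, the hypergeometric bound ℙ(X&gt;0) ≥ 𝔼X/(1+𝔼X) can be verified exactly on small instances (a sketch; the population sizes below are arbitrary toy values):

```python
from math import comb

def check_bound(N: int, K: int, q: int) -> bool:
    """X ~ Hypergeometric: q draws without replacement from N items,
    K of which are 'special'. Check P(X > 0) >= E[X] / (1 + E[X])."""
    p_pos = 1 - comb(N - K, q) / comb(N, q)   # P(X = 0) = C(N-K, q) / C(N, q)
    ex = q * K / N                            # E[X] = qK/N
    return p_pos >= ex / (1 + ex) - 1e-12

assert all(check_bound(N, K, q)
           for N in (10, 40, 100) for K in (1, 4, 9) for q in (1, 3, 8))
```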

With the modest assumption that q⌊𝓁/𝓂⌋ ≤ 2^𝓂, and the fact that 𝓂 ≥ 1, we can simplify this bound to

\mathbf{Adv_{naive}}\geq\frac{q\lfloor\mathcal{l}/\mathcal{m}\rfloor}{4\cdot 2^{\mathcal{m}}}.

Returning to the bound on the advantage of the optimal adversary, if we assume that 𝓇, T ≤ ⌊𝓁/𝓂⌋ and q⌊𝓁/𝓂⌋ ≤ 2^𝓂, then

\mathbf{MaxAdv}_{\mathcal{r},q}\leq\frac{q}{\mathcal{s}+1}\bigg(\frac{4\mathcal{m}q}{2^{\mathcal{m}}}\bigg)^{\mathcal{s}}+\frac{qT}{2}\Bigl[h^{-1}\Bigl(1-\frac{\alpha+n}{\mathcal{k}}\Bigr)\Bigr]^{n/2}+12\cdot\mathbf{Adv_{naive}}.

Thus, with realistic assumptions, no adversary can do much better than the naive adversary. To make this precise, consider the following example. Let 𝓀 = 2^43 bits and 𝓁 = 𝓀/8 = 2^40 bits; i.e., the key has a size of 1 terabyte, of which 12.5%, or about 125 gigabytes, can be leaked. Assume that the message length is 𝓂 = 128 bits. Fix n = 500, 𝓈 = 2 and T = 𝓈(2𝓂−1) = 510. Let Γ(q) denote the two leading terms on the RHS of the above inequality, i.e.

\Gamma(q)=\frac{q}{\mathcal{s}+1}\bigg(\frac{4\mathcal{m}q}{2^{\mathcal{m}}}\bigg)^{\mathcal{s}}+\frac{qT}{2}\Bigl[h^{-1}\Bigl(1-\frac{\alpha+n}{\mathcal{k}}\Bigr)\Bigr]^{n/2},

with the values of 𝓀, 𝓁, 𝓂, n, 𝓈 and T fixed as discussed above. Figure 1 shows a plot of −log₂(Γ(q)) against log₂(q), for values of q satisfying q ≥ 0, q⌊𝓁/𝓂⌋ ≤ 2^𝓂, 1 − (α+n)/𝓀 ≥ 0 and Γ(q) ≤ 1. From this plot we can see that, for the example under consideration, until about q = 2^30 any adversary can only have a slightly higher advantage than 12 times the advantage obtained using the naive strategy.
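The curve in Figure 1 can be reproduced numerically. The sketch below evaluates Γ(q) for the example parameters, inverting h by bisection; the plotting itself is omitted:

```python
from math import log2

def h(p: float) -> float:
    """Binary entropy to base 2."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def h_inv(z: float) -> float:
    """Inverse of h restricted to [1/2, 1], where h is decreasing; bisection."""
    lo, hi = 0.5, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if h(mid) > z:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def Gamma(q, k=2**43, l=2**40, m=128, n=500, s=2, T=510):
    """Two leading terms of the Theorem 1 bound; defaults are the example parameters."""
    alpha = l + m * (q + 1) + T
    term1 = q / (s + 1) * (4 * m * q / 2**m) ** s
    term2 = q * T / 2 * h_inv(1 - (alpha + n) / k) ** (n / 2)
    return term1 + term2
```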

Figure 1: Plot of −log₂(Γ(q)) vs log₂(q) for a particular example.

If a bound avoiding the use of h⁻¹ is desired, we can make use of Lemma 3 in Section 5, which states

h^{-1}(z)\leq\frac{1}{2}+\frac{1}{2}\sqrt{1-z^{\ln 4}}.

This gives the following bound on the adversary’s advantage:

\mathbf{MaxAdv}_{\mathcal{r},q}\leq\frac{q}{\mathcal{s}+1}\bigg(\frac{4\mathcal{m}q}{2^{\mathcal{m}}}\bigg)^{\mathcal{s}}+\frac{qT}{2}\left[\frac{1}{2}+\frac{1}{2}\sqrt{1-\Bigl(1-\frac{\alpha+n}{\mathcal{k}}\Bigr)^{\ln 4}}\right]^{n/2}+\frac{q\mathcal{r}}{2^{\mathcal{m}-1}}+\frac{qT}{2^{\mathcal{m}}}.

4 Entropy Background and Notation

Let X, Y be two random variables. Then let ℒ(X) and ℒ(X | Y) denote the law of X and the law of X given Y respectively. For example, let X be a uniform random variable over {0,1}^𝓀₀ and suppose Ψ : {0,1}^𝓀₀ → {0,1}^𝓁₀. We write ℒ(X | Ψ(X)) for the random probability measure p defined by

p(x):=\begin{cases}\dfrac{1}{|\Psi^{-1}(\Psi(X))|}&\text{if }\Psi(x)=\Psi(X);\\[5.0pt] 0&\text{otherwise.}\end{cases}

That is, if Ψ(X) = l, then ℒ(X | Ψ(X)) is the uniform distribution over {x ∈ {0,1}^𝓀₀ : Ψ(x) = l}.
 
Let 𝐇 denote entropy to base 2; that is,

\mathbf{H}(p):=\sum_{x\in\Omega}p(x)\log_{2}\frac{1}{p(x)}\,.

For a set S, define 𝐇(S) := log₂|S|. That is, 𝐇(S) is the entropy of the uniform distribution over S.

Lemma 2

Let X be a uniform random variable over A ⊂ {0,1}^𝓀₀ and suppose Ψ : A → ℒ, where |ℒ| = 2^𝓁₀. Define S(X) := Ψ⁻¹(Ψ(X)). Then

\mathbb{E}\Bigl(\mathbf{H}(S(X))\Bigr)\geq\log_{2}|A|-\mathcal{l}_{0}.

Furthermore, for any m ∈ ℝ,

\mathbf{P}\Bigl(\mathbf{H}(S(X))<\log_{2}|A|-\mathcal{l}_{0}-m\Bigr)\leq 2^{-m}\,.

Proof: For l ∈ ℒ, let S_l := {x : Ψ(x) = l}. Note that if X ∈ S_l then S(X) = S_l. It follows that

\mathbb{E}\Bigl(\mathbf{H}(S(X))\Bigr)=\sum_{l\in\mathcal{L}}\frac{|S_{l}|}{|A|}\log_{2}|S_{l}| (3)
=\log_{2}|A|+\sum_{l\in\mathcal{L}}\frac{|S_{l}|}{|A|}\log_{2}\frac{|S_{l}|}{|A|}
=\log_{2}|A|+|\mathcal{L}|\Bigl[\frac{1}{|\mathcal{L}|}\sum_{l\in\mathcal{L}}\frac{|S_{l}|}{|A|}\log_{2}\frac{|S_{l}|}{|A|}\Bigr]\,.

The average (over l) of the quantity |S_l|/|A| is 1/|ℒ|. Therefore, since the function x log x is convex, Jensen's inequality implies that the quantity (3) is at least

\log_{2}|A|+|\mathcal{L}|\left(\frac{1}{|\mathcal{L}|}\log_{2}\frac{1}{|\mathcal{L}|}\right)=\log_{2}|A|-\mathcal{l}_{0}\,.

For the second part of the lemma, note that

\mathbf{P}\Bigl(\mathbf{H}(S(X))<\log_{2}|A|-\mathcal{l}_{0}-m\Bigr)=\mathbf{P}\bigl(|S(X)|<|A|\cdot 2^{-\mathcal{l}_{0}-m}\bigr)=\sum\frac{|S_{l}|}{|A|},

where the sum is over l such that |S_l| < |A|·2^{−𝓁₀−m}. Since each term in the sum is at most 2^{−𝓁₀−m} and there are at most 2^𝓁₀ terms, the sum is at most 2^{−m}. □
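The first claim of Lemma 2 can be checked by brute force on a toy instance, taking A to be all of {0,1}^𝓀₀ and Ψ to be a uniformly random function (a sketch with assumed toy parameters):

```python
from math import log2
import random

def avg_preimage_entropy(k0: int, l0: int, seed: int = 1):
    """Return (E[H(S(X))], log2|A| - l0) for X uniform on A = {0,1}^k0
    and a random leakage map Psi into 2^l0 values."""
    random.seed(seed)
    A = range(2 ** k0)
    counts = {}
    for x in A:
        l = random.randrange(2 ** l0)       # Psi(x)
        counts[l] = counts.get(l, 0) + 1    # |S_l|
    # E[H(S(X))] = sum over l of (|S_l| / |A|) * log2 |S_l|
    expected = sum(c / len(A) * log2(c) for c in counts.values())
    return expected, k0 - l0

e, bound = avg_preimage_entropy(8, 3)
assert e >= bound     # Lemma 2: expected entropy >= log2|A| - l0
```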

5 Entropy and Bernoulli Random Variables

Let h(p) denote the entropy of a Bernoulli(p) random variable. That is, define h : [0,1] → [0,1] by

h(p)=p\log_{2}\frac{1}{p}+(1-p)\log_{2}\frac{1}{1-p}\,.

The restriction of h to [1/2, 1] is a strictly decreasing and onto function and hence has an inverse h⁻¹ : [0,1] → [1/2, 1]. Since h is concave and decreasing on [1/2, 1], the function h⁻¹ is concave. Furthermore, note that for any p ∈ [0,1] we have h(p) = h(1−p), and hence

h^{-1}(h(p))=\max(p,1-p)\,. (4)

Theorem 1.2 of [12] gives the following bound:

h(p)\leq(4pq)^{1/\ln 4}, (5)

where q = 1 − p. This implies the following lemma.

Lemma 3

For any p ∈ [0,1], we have

p\leq\frac{1}{2}\left(1+\sqrt{1-h(p)^{\ln 4}}\right).

Proof: Let Δ = max(p, 1−p) − 1/2. Then

pq=\Bigl(\frac{1}{2}+\Delta\Bigr)\Bigl(\frac{1}{2}-\Delta\Bigr)=\frac{1}{4}-\Delta^{2},

and hence

4pq=1-4\Delta^{2}. (6)

Equation (5) implies that

4pq\geq h(p)^{\ln 4}.

Combining this with (6) gives

\Delta^{2}\leq\frac{1}{4}\bigl(1-h(p)^{\ln 4}\bigr),

and hence

\Delta\leq\frac{1}{2}\sqrt{1-h(p)^{\ln 4}}\,.

It follows that

p\leq\frac{1}{2}+\Delta\leq\frac{1}{2}+\frac{1}{2}\sqrt{1-h(p)^{\ln 4}}=\frac{1}{2}\left(1+\sqrt{1-h(p)^{\ln 4}}\right).

\square
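Both the bound (5) and Lemma 3 are easy to verify numerically over a grid of p values (a sketch; the grid and tolerance are arbitrary choices):

```python
from math import log, log2, sqrt

def h(p: float) -> float:
    """Binary entropy to base 2."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

for i in range(1, 1000):
    p = i / 1000
    # Theorem 1.2 of [12]: h(p) <= (4 p (1-p))^(1/ln 4)
    assert h(p) <= (4 * p * (1 - p)) ** (1 / log(4)) + 1e-12
    # Lemma 3: p <= (1 + sqrt(1 - h(p)^(ln 4))) / 2
    assert p <= (1 + sqrt(1 - h(p) ** log(4))) / 2 + 1e-12
```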

Recall that h(p) is the entropy of a Bernoulli(p) random variable and that if S ⊂ {0,1}^𝓀 then 𝐇(S) := log₂|S|. We shall need the following entropy decomposition lemma.

Lemma 4

Suppose that S ⊂ {0,1}^𝓀 and suppose that K is uniformly distributed over S. Then

\sum_{i=1}^{\mathcal{k}}h\bigl(\mathbb{P}(K[i]=1)\bigr)\geq\mathbf{H}(S).

Proof: Note that 𝐇(S) is the entropy of K. Applying the chain rule for entropy to K = (K[1], …, K[𝓀]) gives

\sum_{i=1}^{\mathcal{k}}\mathbf{H}\bigl(K[i]\ \big|\ K[i-1],K[i-2],\ldots,K[1]\bigr)=\mathbf{H}(S)\,.

It is well known that for any two discrete random variables Z, Z′ on a common probability space, 𝐇(Z | Z′) ≤ 𝐇(Z). So the above equality gives

\sum_{i=1}^{\mathcal{k}}\mathbf{H}(K[i])\geq\sum_{i=1}^{\mathcal{k}}\mathbf{H}\bigl(K[i]\ \big|\ K[i-1],\ldots,K[1]\bigr)=\mathbf{H}(S)\,.

Finally, note that 𝐇(K[i]) = h(ℙ(K[i]=1)). □
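Lemma 4 can likewise be checked by brute force: sample a subset S of {0,1}^𝓀, compute the marginal bit biases, and compare the entropy sum against log₂|S| (a sketch with toy parameters):

```python
from math import log2
import random

def entropy_decomposition_check(k: int, seed: int = 7) -> bool:
    """Lemma 4 on a random S: sum_i h(P(K[i]=1)) >= log2|S|, K uniform on S."""
    random.seed(seed)
    S = random.sample(range(2 ** k), 2 ** (k - 1))   # a random S with |S| = 2^(k-1)
    total = 0.0
    for i in range(k):
        p = sum((x >> i) & 1 for x in S) / len(S)    # P(K[i] = 1)
        if 0.0 < p < 1.0:
            total += -p * log2(p) - (1 - p) * log2(1 - p)
    return total >= log2(len(S)) - 1e-9

assert entropy_decomposition_check(8)
```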

6 Main Technical Results

Lemma 5

Let Y = (Y_1, Y_2, …, Y_n) ∈ {0,1}^n be a random n-bit string. For S ⊆ {1, …, n}, set f_S(Y) := (−1)^{⊕_{i∈S} Y_i}, with the convention that f_∅ ≡ 1. Also let

\mathcal{E}(S)=\mathbb{E}[f_{S}(Y)]=2\Bigl(\frac{1}{2}-\mathbb{P}(\oplus_{i\in S}Y_{i}=1)\Bigr)\;. (7)

Then for a uniformly chosen random subset 𝒮 ⊆ {1, …, n}, we have

\mathbb{E}\bigl[\mathcal{E}(\mathcal{S})^{2}\bigr]=\sum_{y\in\{0,1\}^{n}}\mathbb{P}(Y=y)^{2}. (8)

Lemma 5 is a well-known consequence of Parseval’s theorem (see page 24 of [11]). For completeness, we give a proof here:
 
Proof: Let Ω = {0,1}^n. Note that ℝ^Ω, the space of real valued functions on Ω, forms a vector space of dimension |Ω| over ℝ. Define the following inner product on ℝ^Ω:

\langle f,g\rangle=\frac{1}{2^{n}}\sum_{x\in\Omega}f(x)g(x)=\mathbb{E}[f(Z)g(Z)]\quad\text{for }f,g\in\mathbb{R}^{\Omega},

where Z = (Z_1, …, Z_n) and Z_1, Z_2, …, Z_n are i.i.d. Bernoulli(1/2) random variables. Observe that when S ≠ S′,

\langle f_{S},f_{S^{\prime}}\rangle=\mathbb{E}[f_{S}(Z)f_{S^{\prime}}(Z)]=\mathbb{E}\Bigl[\prod_{i\in S\cap S^{\prime}}(-1)^{2Z_{i}}\prod_{j\in S\triangle S^{\prime}}(-1)^{Z_{j}}\Bigr]=\prod_{i\in S\cap S^{\prime}}\mathbb{E}\bigl[(-1)^{2Z_{i}}\bigr]\prod_{j\in S\triangle S^{\prime}}\mathbb{E}\bigl[(-1)^{Z_{j}}\bigr]=0,

since 𝔼[(−1)^{Z_i}] = 0 and S △ S′ is non-empty when S ≠ S′. Also observe that

\langle f_{S},f_{S}\rangle=\mathbb{E}\bigl[(-1)^{2(\oplus_{i\in S}Z_{i})}\bigr]=\mathbb{E}[1]=1.

Therefore, {f_S}_{S⊆[n]} forms an orthonormal basis for ℝ^Ω. Next, let U(y) = 1/2^n and P(y) = ℙ(Y = y) for y ∈ Ω. Then P, U, P/U ∈ ℝ^Ω. Now note that

\langle P/U,f_{S}\rangle=\frac{1}{2^{n}}\sum_{x\in\Omega}2^{n}P(x)f_{S}(x)=\sum_{x\in\Omega}P(x)f_{S}(x)=\mathbb{E}[f_{S}(Y)]=\mathcal{E}(S)\,.

It follows that

\frac{1}{2^{n}}\langle P/U,f_{S}\rangle^{2}=\frac{1}{2^{n}}\mathcal{E}(S)^{2}.

Summing the above equation over all subsets S ⊆ {1, …, n} and using the fact that the f_S's form an orthonormal basis, we get

\frac{1}{2^{n}}\langle P/U,P/U\rangle=\sum_{S\subseteq\{1,\ldots,n\}}\frac{1}{2^{n}}\langle P/U,f_{S}\rangle^{2}=\sum_{S\subseteq\{1,\ldots,n\}}\frac{1}{2^{n}}\mathcal{E}(S)^{2}=\mathbb{E}\bigl[\mathcal{E}(\mathcal{S})^{2}\bigr].

The left hand side of the above equation simplifies to Σ_{y∈Ω} P(y)², and hence the proof is complete. □
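For small n, equation (8) can be verified by direct enumeration over all 2ⁿ subsets (a sketch; the law of Y below is an arbitrary random distribution):

```python
import random

def parseval_check(n: int, seed: int = 3) -> bool:
    """Check E[E(S)^2], S a uniform random subset, equals sum_y P(Y=y)^2."""
    random.seed(seed)
    w = [random.random() for _ in range(2 ** n)]
    P = [x / sum(w) for x in w]          # an arbitrary law for Y on {0,1}^n
    lhs = 0.0
    for S in range(2 ** n):              # subsets of {1..n} encoded as bitmasks
        # E(S) = sum_y P(y) * (-1)^(parity of the bits of y selected by S)
        E = sum(P[y] * (-1) ** bin(y & S).count("1") for y in range(2 ** n))
        lhs += E * E
    lhs /= 2 ** n                        # average over the 2^n subsets
    return abs(lhs - sum(p * p for p in P)) < 1e-9

assert parseval_check(4)
```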

Note that ℰ(S) is a measure of the bias in the parity of the bits of Y whose positions are in S. More precisely, recall that for probability distributions μ and ν on a finite set Ω, the total variation distance is

\lVert\mu-\nu\rVert_{TV}:=\frac{1}{2}\sum_{x\in\Omega}|\mu(x)-\nu(x)|. (9)

For a {0,1}-valued random variable W, the total variation distance is

\lVert W-{\rm Bernoulli}({\textstyle\frac{1}{2}})\rVert_{TV}=|\mathbb{P}(W=1)-{\textstyle\frac{1}{2}}|\;.

Equation (7) implies that

\frac{1}{2}-\mathbb{P}(\oplus_{i\in S}Y_{i}=1)=\frac{1}{2}\mathcal{E}(S),

and hence

\lVert\oplus_{i\in S}Y_{i}-{\rm Bernoulli}({\textstyle\frac{1}{2}})\rVert_{TV}=\frac{1}{2}|\mathcal{E}(S)|. (10)

For S ⊆ {1, …, n}, define ℬ(S) := ‖⊕_{i∈S} Y_i − Bernoulli(1/2)‖_TV. Then

\mathbb{E}(\mathcal{B}(\mathcal{S}))=\frac{1}{2}\mathbb{E}|\mathcal{E}(\mathcal{S})|\leq\frac{1}{2}\sqrt{\mathbb{E}(\mathcal{E}(\mathcal{S})^{2})}=\frac{1}{2}\Biggl[\sum_{y\in\{0,1\}^{n}}\mathbb{P}(Y=y)^{2}\Biggr]^{1/2},

where the first equality follows from equation (10), the inequality follows from Jensen's inequality and the final equality follows from Lemma 5. This leads to the following:

Corollary 6

Let K be a random string in {0,1}^𝓀. Let (p_1, …, p_n) be a choice of probes. Let c′ be a Bernoulli(1/2) random variable, and for S ⊆ {1, …, n}, define

c(S):=\oplus_{i\in S}K[p_{i}];\qquad d(S):=\lVert c(S)-c^{\prime}\rVert_{TV}.

If 𝒮 is a uniform random subset of {1, 2, …, n} then

\mathbb{E}(d(\mathcal{S}))\leq\frac{1}{2}\mathbb{E}\Biggl[\sqrt{\sum_{y\in\{0,1\}^{n}}\mathbb{P}\bigl(K[p_{1},\dots,p_{n}]=y\bigr)^{2}}\Biggr].

This shows that the expectation (taken over the subprobes) of the distance between the random bit c(𝒮) and a Bernoulli(1/2) random variable can be bounded in terms of the l²-norm of the distribution of K[p_1, …, p_n].

7 Main Lemma

Suppose K is a uniform random element of {0,1}^𝓀 and suppose Ψ : {0,1}^𝓀 → ℒ. For l ∈ ℒ, let S_l := Ψ⁻¹(l). Define the probability measure ℙ_l by

\mathbb{P}_{l}(\,\cdot\,):=\mathbb{P}(\,\cdot\,{\;|\;}\Psi(K)=l)\,,

and write 𝔼_l for the expectation operator with respect to ℙ_l. Note that under ℙ_l, the distribution of K is uniform over S_l. For an integer r with 1 ≤ r ≤ n and probes p_1, …, p_r define

g_{l}(p_{1},\dots,p_{r}):=\sum_{x\in\{0,1\}^{r}}\mathbb{P}_{l}(K[p_{1},\dots,p_{r}]=x)^{2}\,.
Lemma 7

Suppose that the probes P_1, P_2, …, P_n are chosen independently and uniformly at random from {1, 2, …, 𝓀}. If 𝐇(S_l) ≥ 𝓀 − α, then

\mathbb{E}_{l}\bigl(g_{l}(P_{1},\dots,P_{n})\bigr)\leq\Bigl[h^{-1}\Bigl(1-\frac{\alpha+n}{\mathcal{k}}\Bigr)\Bigr]^{n}\,.
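Before turning to the proof, the lemma can be sanity-checked by exhaustive computation on a toy instance (a sketch; the choices 𝓀 = 6, a single leakage class S_l of size 56, and n = 3 probes are arbitrary, and probes index bits from 0):

```python
from itertools import product
from math import log2

def h(p):
    """Binary entropy to base 2."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def h_inv(z):
    """Inverse of h on [1/2, 1] by bisection (h is decreasing there)."""
    lo, hi = 0.5, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) > z else (lo, mid)
    return (lo + hi) / 2

def g(S, probes):
    """g_l(p_1..p_r): collision probability of K[probes] for K uniform on S."""
    counts = {}
    for key in S:
        x = tuple((key >> p) & 1 for p in probes)
        counts[x] = counts.get(x, 0) + 1
    return sum((c / len(S)) ** 2 for c in counts.values())

k, n = 6, 3
S = list(range(56))               # one leakage class; H(S) = log2 56
alpha = k - log2(len(S))          # chosen so that H(S) = k - alpha exactly
avg_g = sum(g(S, ps) for ps in product(range(k), repeat=n)) / k ** n
assert avg_g <= h_inv(1 - (alpha + n) / k) ** n
```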

Proof: Fix l with 𝐇(S_l) ≥ 𝓀 − α. For x ∈ {0,1}^r and a choice of probes p_1, …, p_{r+1}, define

\lambda_{l}(x,p_{1},\dots,p_{r+1}):=\mathbb{P}_{l}(K[p_{r+1}]=1{\;|\;}K[p_{1},\dots,p_{r}]=x)=\mathbb{P}(K[p_{r+1}]=1{\;|\;}K[p_{1},\dots,p_{r}]=x,\Psi(K)=l)\,.

Define μ_l^r(x) := ℙ_l(K[p_1, …, p_r] = x). Note that conditional on Ψ(K) = l and K[p_1, …, p_r] = x, the distribution of K is uniform over S_l ∩ {A ∈ {0,1}^𝓀 : A[p_1, …, p_r] = x}. Furthermore,

|S_{l}\cap\{A\in\{0,1\}^{\mathcal{k}}:A[p_{1},\dots,p_{r}]=x\}|=|S_{l}|\cdot\mu_{l}^{r}(x)\,.

Hence Lemma 4 implies that

\frac{1}{\mathcal{k}}\sum_{j=1}^{\mathcal{k}}h(\lambda_{l}(x,p_{1},\dots,p_{r},j))\geq\frac{1}{\mathcal{k}}\log_{2}\bigl(|S_{l}|\cdot\mu_{l}^{r}(x)\bigr)\,. (11)

For any p1,,pr+1p_{1},\dots,p_{r+1}, we have

g_{l}(p_{1},\dots,p_{r+1})=\sum_{y\in\{0,1\}^{r+1}}\mathbb{P}_{l}(K[p_{1},\dots,p_{r+1}]=y)^{2} (13)
=\sum_{x\in\{0,1\}^{r}}\mathbb{P}_{l}(K[p_{1},\dots,p_{r}]=x)^{2}\Bigl[\lambda_{l}(x,p_{1},\dots,p_{r+1})^{2}+(1-\lambda_{l}(x,p_{1},\dots,p_{r+1}))^{2}\Bigr] (14)
=\sum_{x\in\{0,1\}^{r}}\mu_{l}^{r}(x)^{2}\Bigl[\lambda_{l}(x,p_{1},\dots,p_{r+1})^{2}+(1-\lambda_{l}(x,p_{1},\dots,p_{r+1}))^{2}\Bigr]\,. (15)

Note that for any $p\in[0,1]$ we have $p^{2}+(1-p)^{2}\leq\max(p,1-p)$. Hence, the quantity in square brackets in equation (15) is at most

h1(h(λl(x,p1,,pr+1)))h^{-1}(h({\lambda}_{l}(x,p_{1},\dots,p_{r+1})))

by equation (4). Thus

gl(p1,,pr+1)\displaystyle{g}_{l}(p_{1},\dots,p_{r+1}) \displaystyle\leq x{0,1}rμlr(x)2h1(h(λl(x,p1,,pr+1))).\displaystyle\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)^{2}h^{-1}(h({\lambda}_{l}(x,p_{1},\dots,p_{r+1})))\,. (16)
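As a numerical sanity check (not part of the formal argument), both facts used here, $p^{2}+(1-p)^{2}\leq\max(p,1-p)$ and $\max(p,1-p)=h^{-1}(h(p))$, can be verified on a grid of values. The sketch below is an illustrative implementation: $h$ is the binary entropy function and $h^{-1}$ is its inverse on $[1/2,1]$, computed by bisection.

```python
from math import log2

def h(p):
    # Binary entropy function h(p) = -p log2 p - (1-p) log2 (1-p).
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def h_inv(y, tol=1e-12):
    # Inverse of h restricted to [1/2, 1], where h is decreasing; bisection.
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) > y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for i in range(101):
    p = i / 100
    assert p ** 2 + (1 - p) ** 2 <= max(p, 1 - p) + 1e-12
    assert abs(h_inv(h(p)) - max(p, 1 - p)) < 1e-6
```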

Recall that the probe Pr+1P_{r+1} is chosen uniformly at random from {1,2,,𝓀}\{1,2,\dots,\mathcal{k}\}. It follows that

𝔼l(gl(p1,,pr,Pr+1))\displaystyle{{\mathbb{E}_{l}}}({g}_{l}(p_{1},\dots,p_{r},P_{r+1})) \displaystyle\leq 1𝓀j=1𝓀x{0,1}rμlr(x)2h1(h(λl(x,p1,,pr,j)))\displaystyle{1\over\mathcal{k}}\sum_{j=1}^{\mathcal{k}}\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)^{2}h^{-1}(h({\lambda}_{l}(x,p_{1},\dots,p_{r},j))) (17)
=\displaystyle= x{0,1}rμlr(x)2[1𝓀j=1𝓀h1(h(λl(x,p1,,pr,j)))].\displaystyle\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)^{2}\Bigl{[}{1\over\mathcal{k}}\sum_{j=1}^{\mathcal{k}}h^{-1}(h({\lambda}_{l}(x,p_{1},\dots,p_{r},j)))\Bigr{]}\,. (18)

Recall that in Section 5 we showed that h1h^{-1} is concave. Thus, Jensen’s inequality implies that the quantity (18) is at most

x{0,1}rμlr(x)2h1(1𝓀j=1𝓀h(λl(x,p1,,pr,j)))\displaystyle\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)^{2}h^{-1}\Bigl{(}{1\over\mathcal{k}}\sum_{j=1}^{\mathcal{k}}h({\lambda}_{l}(x,p_{1},\dots,p_{r},j))\Bigr{)} (19)
\displaystyle\leq x{0,1}rμlr(x)2h1(1𝓀log2(|Sl|μlr(x))),\displaystyle\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)^{2}h^{-1}\Bigl{(}{1\over\mathcal{k}}\log_{2}(|S_{l}|\cdot{\mu_{l}^{r}}(x))\Bigr{)}, (20)

where the inequality follows from (11) and the fact that h1h^{-1} is decreasing. Recall that the Harris-FKG inequality (see Section 2.2 of [3]) implies that if XX is a random variable and ff (respectively, gg) is an increasing (respectively, decreasing) function, then

𝔼(f(X)g(X))𝔼(f(X))𝔼(g(X)).{{\mathbb{E}}}(f(X)g(X))\leq{{\mathbb{E}}}(f(X)){{\mathbb{E}}}(g(X))\,.
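This instance of the Harris-FKG inequality (for a single real random variable it reduces to Chebyshev's sum inequality) is easy to check numerically on a finite distribution. In the sketch below the sample points and the choices $f(x)=x$ (increasing) and $g(x)=e^{-x}$ (decreasing) are arbitrary illustrations, not taken from the paper.

```python
import random
from math import exp

random.seed(7)
# Empirical distribution of a real-valued random variable X.
xs = [random.random() for _ in range(1000)]

f = lambda x: x            # increasing
g = lambda x: exp(-x)      # decreasing

mean = lambda vals: sum(vals) / len(vals)
lhs = mean([f(x) * g(x) for x in xs])       # E(f(X) g(X))
rhs = mean([f(x) for x in xs]) * mean([g(x) for x in xs])  # E(f(X)) E(g(X))
assert lhs <= rhs
```

The inequality holds exactly for any finite sample with monotone $f$ and $g$ of opposite orientation, so the assertion does not depend on the random seed.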

Now, consider the probability measure that assigns mass ${\mu_{l}^{r}}(x)$ to each $x\in\{0,1\}^{r}$, and let $X:\{0,1\}^{r}\to{\bf R}$ be the random variable defined by $X(x)={\mu_{l}^{r}}(x)$. Let $f$ be the identity function on ${\bf R}$ and define $g:{\bf R}\to{\bf R}$ by $g(u)=h^{-1}\Bigl{(}{1\over\mathcal{k}}\log_{2}(|S_{l}|\cdot u)\Bigr{)}$. Then $f$ is increasing, and since $h^{-1}$ is decreasing, $g$ is decreasing. Thus the Harris-FKG inequality implies that the quantity (20) is at most

\Bigl{(}\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)^{2}\Bigr{)}\Bigl{(}\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)h^{-1}\Bigl{(}{1\over\mathcal{k}}\log_{2}(|S_{l}|\cdot{\mu_{l}^{r}}(x))\Bigr{)}\Bigr{)} (22)
\displaystyle\leq (x{0,1}rμlr(x)2)h1(1𝓀x{0,1}rμlr(x)log2(|Sl|μlr(x))).\displaystyle\Bigl{(}\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)^{2}\Bigr{)}h^{-1}\Bigl{(}{1\over\mathcal{k}}\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)\log_{2}(|S_{l}|\cdot{\mu_{l}^{r}}(x))\Bigr{)}\,. (23)

Applying the first part of Lemma 2 with Ψ(K)=K[p1,p2,,pr]{\Psi}(K)=K[p_{1},p_{2},\dots,p_{r}] gives

x{0,1}rμlr(x)log2(|Sl|μlr(x))\displaystyle\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)\log_{2}(|S_{l}|\cdot{\mu_{l}^{r}}(x)) \displaystyle\geq 𝐇(Sl)r\displaystyle{\mathbf{H}}(S_{l})-r
\displaystyle\geq 𝓀αn,\displaystyle\mathcal{k}-\alpha-n,

where the second inequality follows from the fact that ${\mathbf{H}}(S_{l})\geq\mathcal{k}-\alpha$ and $r\leq n$. It follows that the quantity (23) is at most

(x{0,1}rμlr(x)2)h1(1𝓀(𝓀αn))\displaystyle\Bigl{(}\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)^{2}\Bigr{)}h^{-1}\Bigl{(}{1\over\mathcal{k}}(\mathcal{k}-\alpha-n)\Bigr{)} (24)
=\displaystyle= (x{0,1}rμlr(x)2)h1(1α+n𝓀)\displaystyle\Bigl{(}\sum_{x\in\{0,1\}^{r}}{\mu_{l}^{r}}(x)^{2}\Bigr{)}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)} (25)
=\displaystyle= gl(p1,p2,,pr)h1(1α+n𝓀).\displaystyle{g}_{l}(p_{1},p_{2},\dots,p_{r})h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\,. (26)

We have shown that for any choice of p1,p2,,prp_{1},p_{2},\dots,p_{r} we have

𝔼l(gl(p1,,pr,Pr+1))gl(p1,p2,,pr)h1(1α+n𝓀).{{\mathbb{E}_{l}}}({g}_{l}(p_{1},\dots,p_{r},P_{r+1}))\leq{g}_{l}(p_{1},p_{2},\dots,p_{r})h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\,.

It follows that

𝔼l(gl(P1,,Pr+1))𝔼l(gl(P1,P2,,Pr))h1(1α+n𝓀).{{\mathbb{E}_{l}}}({g}_{l}(P_{1},\dots,P_{r+1}))\leq{{\mathbb{E}_{l}}}({g}_{l}(P_{1},P_{2},\dots,P_{r}))h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\,.

Applying this bound repeatedly for $r=n-1,n-2,\dots,1$ gives

𝔼l(gl(P1,,Pn))𝔼l(gl(P1))[h1(1α+n𝓀)]n1.{{\mathbb{E}_{l}}}({g}_{l}(P_{1},\dots,P_{n}))\leq{{\mathbb{E}_{l}}}({g}_{l}(P_{1}))\Bigl{[}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\Bigr{]}^{n-1}\,.

Finally, an argument similar to the one above (eliminating the sums over $\{0,1\}^{r}$ and replacing ${\mu_{l}^{r}}(x)$ by $1$) shows that

𝔼l(gl(P1))h1(1α+n𝓀),{{\mathbb{E}_{l}}}({g}_{l}(P_{1}))\leq h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\,,

and the lemma follows. \square
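As a numerical sanity check on Lemma 7 (not part of the formal argument), the following sketch computes $\mathbb{E}_{l}(g_{l}(P_{1},\dots,P_{n}))$ exactly for a toy key length and a toy leakage function, and compares it with the bound $[h^{-1}(1-(\alpha+n)/\mathcal{k})]^{n}$. The parameters and the leakage function (revealing the first two key bits) are illustrative choices, not taken from the paper.

```python
import itertools
from collections import Counter
from math import log2

def h(p):
    # Binary entropy function.
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def h_inv(y, tol=1e-12):
    # Inverse of h on [1/2, 1], where h is decreasing; bisection.
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) > y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k, n = 8, 3          # toy key length and number of probes
alpha = 2            # toy leakage: the first 2 key bits are revealed
S_l = [key for key in itertools.product([0, 1], repeat=k)
       if key[0] == key[1] == 0]          # H(S_l) = k - alpha

def g(probes):
    # Collision probability of the probed bits under the uniform
    # distribution on S_l (repeated probes are allowed).
    counts = Counter(tuple(key[p] for p in probes) for key in S_l)
    return sum((c / len(S_l)) ** 2 for c in counts.values())

# Exact expectation over independent uniform probes P_1, ..., P_n.
all_probes = list(itertools.product(range(k), repeat=n))
avg_g = sum(g(ps) for ps in all_probes) / len(all_probes)

bound = h_inv(1 - (alpha + n) / k) ** n
assert avg_g <= bound
```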

8 Proof of Main Theorem

In this section we prove Theorem 1, which bounds the advantage of a known-plaintext (KPA) adversary with leakage against the cipher described in Section 2. Recall that the bound in question is

\mathbf{MaxAdv}_{\mathbf{\mathcal{r},q}}\leq\frac{q}{\mathcal{s}+1}\bigg{(}\frac{4\mathcal{m}q}{2^{\mathcal{m}}}\bigg{)}^{\mathcal{s}}+{qT\over 2}\Bigl{[}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\Bigr{]}^{n/2}+\frac{q\mathcal{r}}{2^{\mathcal{m}-1}}+\frac{qT}{2^{\mathcal{m}}}\,.

First, we prove the bound assuming that the adversary makes no random oracle calls. Let (Mi,Ci)i=1q({{M}}_{i},{{C}}_{i})_{i=1}^{q} be the uniform random sequence of input/output pairs given to the adversary.
The adversary’s advantage satisfies

𝐌𝐚𝐱𝐀𝐝𝐯𝟎,𝐪(Mi,Ci)i=1q(Miu,Ciu)i=1qTV,\mathbf{MaxAdv}_{\mathbf{0,q}}\leq\lVert({{M}}_{i},C_{i})_{i=1}^{q}-({{M}}^{u}_{i},C^{u}_{i})_{i=1}^{q}\rVert_{TV},

where $({{M}}^{u}_{i},C^{u}_{i})_{i=1}^{q}$ are $q$ uniform random queries from a uniform random permutation. Let $({{M}}^{{\rm Th}}_{i},C^{{\rm Th}}_{i})_{i=1}^{q}$ be $q$ uniform random queries from $T$ rounds of an idealized Thorp shuffle that uses a uniform random round function $F$ instead of a pseudorandom function.
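For concreteness, here is a small sketch of $T$ rounds of an idealized Thorp shuffle on $\{0,1\}^{m}$ with a lazily sampled uniform round function. The toy values of $m$ and $T$ and the bit-indexing convention (leftmost bit paired with the rightmost $m-1$ bits) are illustrative assumptions; the check confirms that the rounds compose to a permutation.

```python
import random

random.seed(1)
m, T = 8, 24          # toy domain {0,1}^m and round count
F = {}                # uniform random round function, sampled lazily

def round_bit(rnd, R):
    # One fresh uniform bit per (round, R) pair, reused if queried again.
    if (rnd, R) not in F:
        F[(rnd, R)] = random.randrange(2)
    return F[(rnd, R)]

def thorp(x):
    # One round sends (b, R) to (R, b XOR F(round, R)), where b is the
    # leftmost bit and R the rightmost m-1 bits of the current state.
    for rnd in range(T):
        b, R = x >> (m - 1), x & ((1 << (m - 1)) - 1)
        x = (R << 1) | (b ^ round_bit(rnd, R))
    return x

out = [thorp(x) for x in range(2 ** m)]
assert sorted(out) == list(range(2 ** m))   # rounds compose to a permutation
```

Each round is a bijection because the two states sharing the same $R$ are sent to distinct outputs, which is why the assertion holds for any realization of $F$.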

In [9], Morris, Rogaway and Stegers prove the following.

Theorem 8

[9] Let $T=\mathcal{s}(2\mathcal{m}-1)$ for some whole number $\mathcal{s}$, where $2^{\mathcal{m}}=|\mathcal{M}|$. Then

(MiTh,CiTh)i=1q(Miu,Ciu)i=1qTVq𝓈+1(4𝓂q2𝓂)𝓈.\lVert({{M}}^{{\rm Th}}_{i},C^{{\rm Th}}_{i})_{i=1}^{q}-({{M}}^{u}_{i},C^{u}_{i})_{i=1}^{q}\rVert_{TV}\leq\frac{q}{\mathcal{s}+1}\bigg{(}\frac{4\mathcal{m}q}{2^{\mathcal{m}}}\bigg{)}^{\mathcal{s}}.

Combining this result with a bound on

(Mi,Ci)i=1q(MiTh,CiTh)i=1qTV\lVert({{M}}_{i},C_{i})_{i=1}^{q}-({{M}}^{{\rm Th}}_{i},C^{{\rm Th}}_{i})_{i=1}^{q}\rVert_{TV} (27)

will give the claimed bound on the adversary’s advantage. To bound (27) we use a hybrid argument.
 
For $0\leq l\leq q$, let $C^{l}_{i}$ be the result of applying $T$ rounds of the Thorp shuffle to $M_{i}$, where the round function used to determine the “random bits” of the shuffle is defined as follows:

  1. If $i\leq l$ then we use the round function $F_{K}$.

  2. If $l+1\leq i\leq q$, then at any step not already determined by the round functions used to evaluate the first $l$ queries, we use a uniform random function $F$.

Define $Q_{l}=({{M}}_{i},{{C}}^{l}_{i})_{i=1}^{q}$. Thus, the first $l$ queries of $Q_{l}$ correspond to the Thorp shuffle using the pseudorandom round function $F_{K}$ and the final $q-l$ queries correspond to the Thorp shuffle using a uniform random round function (except at steps that are already “forced” by the trajectories of the first $l$ queries). Note that $Q_{0}$ corresponds to the Thorp shuffle with a uniform random round function and $Q_{q}$ corresponds to the Thorp shuffle with the “big key” pseudorandom round function $F_{K}$. Thus, the triangle inequality gives us

(Mi,Ci)i=1q(MiTh,CiTh)i=1qTV\displaystyle\lVert({{M}}_{i},C_{i})_{i=1}^{q}-({{M}}^{{\rm Th}}_{i},C^{{\rm Th}}_{i})_{i=1}^{q}\rVert_{TV} \displaystyle\leq k=0q1Qk+1QkTV\displaystyle\sum_{k=0}^{q-1}\lVert Q_{k+1}-Q_{k}\rVert_{TV}

To bound the terms of this sum, we prove the following lemma:

Lemma 9

For all $s$ with $0\leq s\leq q-1$ we have

Qs+1QsTVT2[h1(1α+n𝓀)]n/2+T2𝓂.\lVert Q_{s+1}-Q_{s}\rVert_{TV}\leq{T\over 2}\Bigl{[}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\Bigr{]}^{n/2}\,+T\cdot 2^{-\mathcal{m}}.

Proof: It is sufficient to bound

ZsZsTV,\lVert Z_{s}-Z^{\prime}_{s}\rVert_{TV},

where

Z_{s}=(Q_{s+1},{\mathcal{T}}_{s+1});\qquad Z^{\prime}_{s}=(Q_{s},{\mathcal{T}}_{s+1}),

and ${\mathcal{T}}_{s+1}=(X_{0}(M_{s+1}),\dots,X_{T}({{M}}_{s+1}))$ is the trajectory of message ${{M}}_{s+1}$.
 
Again, we use a hybrid argument. For ii with 1iT1\leq i\leq T, let Ps,iP_{s,i} be the algorithm defined as follows:
 
Algorithm $P_{s,i}$: For the first $s$ queries and for the first $i$ rounds of query $s+1$, we use the pseudorandom function $F_{K}$. For rounds $i+1,\dots,T$ of query $s+1$ and for queries $s+2,\dots,q$, any random bit (that was not already determined by the previous queries) will be defined using a uniform random function $F$.
 
Let Ws,iW_{s,i} be the value of ((Mi,Ci)i=1q,𝒯s+1)\Bigl{(}({{M}}_{i},{{C}}_{i})_{i=1}^{q},{\mathcal{T}}_{s+1}\Bigr{)} when algorithm Ps,iP_{s,i} is followed. Note that Ws,TW_{s,T} has the distribution of ZsZ_{s} and Ws,0W_{s,0} has the distribution of ZsZ^{\prime}_{s}. Therefore, by another application of the triangle inequality, we have

ZsZsTVi=0T1Ws,i+1Ws,iTV\lVert Z_{s}-Z_{s}^{\prime}\rVert_{TV}\leq\sum\limits_{i=0}^{T-1}\lVert W_{s,i+1}-W_{s,i}\rVert_{TV}

The only difference between Ps,i+1P_{s,i+1} and Ps,iP_{s,i} is the bit used in round i+1i+1 of query s+1s+1. If this bit was determined by the previous queries, then it has the same value in both Ps,i+1P_{s,i+1} and Ps,iP_{s,i}. Otherwise, it is a Bernoulli(12){{\rm Bernoulli}({{{\textstyle{1\over 2}}}})} random variable in Ps,iP_{s,i} and it uses the “big key” pseudorandom function FKF_{K} in Ps,i+1P_{s,i+1}. It is enough to show that the claimed bound on the total variation distance holds even if we condition on the input messages M1,,MqM_{1},\dots,M_{q}. So let m1,,mqm_{1},\dots,m_{q} be arbitrary input messages. Let Ψ{\Psi} be a function on {0,1}𝓀\{0,1\}^{\mathcal{k}} such that Ψ(K){\Psi}(K) encodes

  1. $\Phi(K)$;

  2. the values of $C_{1},\dots,C_{s}$ and $X_{1}(m_{s+1}),\dots,X_{i}(m_{s+1})$ when algorithm $P_{s,i+1}$ is used with key $K$ and input messages $m_{1},\dots,m_{s+1}$.

Let L=Ψ(K)L={\Psi}(K). Note that there are at most

2𝓁(2𝓂)s2i=2𝓁+𝓂s+i2𝓁+𝓂q+T2^{\mathcal{l}}\cdot\left(2^{\mathcal{m}}\right)^{s}\cdot 2^{i}=2^{\mathcal{l}+\mathcal{m}s+i}\leq 2^{\mathcal{l}+\mathcal{m}q+T}

possible values of $L$. Define $S_{l}:={\Psi}^{-1}(l)$. We can use Lemma 2 to get a lower bound on the entropy ${\mathbf{H}}(S_{L})$ that holds with high probability. More precisely, Lemma 2 gives

\mathbb{P}\Big{(}{\mathbf{H}}(S_{L})\leq\mathcal{k}-\mathcal{l}-\mathcal{m}(q+1)-T\Big{)}\leq 2^{-\mathcal{m}}\,.

On the event that ${\mathbf{H}}(S_{L})>\mathcal{k}-\mathcal{l}-\mathcal{m}(q+1)-T$, we can use Lemma 7 to bound the total variation distance. Using Lemma 7 with $\alpha=\mathcal{l}+\mathcal{m}(q+1)+T$ and combining this with Corollary 6 shows that if $B$ is the random bit generated by Algorithm $P_{s,i+1}$ then

𝔼(BBernoulli(12)TV)12[h1(1α+n𝓀)]n/2+2𝓂.{{\mathbb{E}}}\Bigl{(}\lVert B-{{\rm Bernoulli}({{{\textstyle{1\over 2}}}})}\rVert_{TV}\Bigr{)}\leq{1\over 2}\Bigl{[}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\Bigr{]}^{n/2}+2^{-\mathcal{m}}\,.

Since this one random bit is the only nondeterministic difference between $W_{s,i+1}$ and $W_{s,i}$, we have

Ws,i+1Ws,iTV12[h1(1α+n𝓀)]n/2+2𝓂.\lVert W_{s,i+1}-W_{s,i}\rVert_{TV}\leq{1\over 2}\Bigl{[}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\Bigr{]}^{n/2}+2^{-\mathcal{m}}\,.

This quantity is independent of ii, so

ZsZsTVT2[h1(1α+n𝓀)]n/2+T2𝓂.\lVert Z_{s}-Z_{s}^{\prime}\rVert_{TV}\leq{T\over 2}\Bigl{[}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\Bigr{]}^{n/2}+T\cdot 2^{-\mathcal{m}}\,.

\square

Now we use Lemma 9 to bound the sum,

(Mi,Ci)i=1q(MiTh,CiTh)i=1qTV\displaystyle\lVert({{M}}_{i},C_{i})_{i=1}^{q}-({{M}}^{{\rm Th}}_{i},C^{{\rm Th}}_{i})_{i=1}^{q}\rVert_{TV} \displaystyle\leq k=0q1Qk+1QkTV\displaystyle\sum_{k=0}^{q-1}\lVert Q_{k+1}-Q_{k}\rVert_{TV}
\displaystyle\leq qT2[h1(1α+n𝓀)]n/2+qT2𝓂.\displaystyle{qT\over 2}\Bigl{[}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\Bigr{]}^{n/2}+qT\cdot 2^{-\mathcal{m}}\,.

Combining this with Theorem 8 and another application of the triangle inequality gives

(Mi,Ci)i=1q(Miu,Ciu)i=1qTVqT2[h1(1α+n𝓀)]n/2+qT2𝓂+q𝓈+1(4𝓂q2𝓂)𝓈.\lVert({{M}}_{i},C_{i})_{i=1}^{q}-({{M}}^{u}_{i},C^{u}_{i})_{i=1}^{q}\rVert_{TV}\leq{qT\over 2}\Bigl{[}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\Bigr{]}^{n/2}+qT\cdot 2^{-\mathcal{m}}+\frac{q}{\mathcal{s}+1}\bigg{(}\frac{4\mathcal{m}q}{2^{\mathcal{m}}}\bigg{)}^{\mathcal{s}}\,. (28)

Finally, we consider the effect of random oracle calls made by the adversary before calculation of Φ\Phi. Let 𝒞{\mathcal{C}} be the set of random oracle calls made by the adversary. Note that

𝒞=i=1T𝒞i,{\mathcal{C}}=\cup_{i=1}^{T}{\mathcal{C}}_{i},

where ${\mathcal{C}}_{i}$ is the set of random oracle calls whose input is $(R,i)$ for some $R$. Let $E$ be the event that at least one of the random oracle calls used to evaluate the ${{M}}_{i}$ is in ${\mathcal{C}}$. Note that for a uniform random message, the value of $R$ (the rightmost $\mathcal{m}-1$ bits) after any number of Thorp shuffle rounds is uniform over $\{0,1\}^{\mathcal{m}-1}$. Hence, the probability that the random oracle call used in stage $i$ of the shuffle is in ${\mathcal{C}}_{i}$ is $|{\mathcal{C}}_{i}|/2^{\mathcal{m}-1}$. Taking a union bound over queries and time steps gives

(E)\displaystyle\mathbb{P}(E) \displaystyle\leq qi=1T|𝒞i|2𝓂1\displaystyle q\sum_{i=1}^{T}\frac{|{\mathcal{C}}_{i}|}{2^{\mathcal{m}-1}}
=\displaystyle= q|𝒞|2𝓂1\displaystyle\frac{q|{\mathcal{C}}|}{2^{\mathcal{m}-1}}
=\displaystyle= q𝓇2𝓂1.\displaystyle\frac{q\mathcal{r}}{2^{\mathcal{m}-1}}.

On the event $E^{C}$, the adversary's random oracle calls are disjoint from all the oracle calls used to compute each $M_{i}$. Since distinct random oracle calls are independent of each other, the information from the adversary's random oracle calls is irrelevant to determining whether the adversary is in world 0 or world 1. Therefore, unless $E$ occurs, the adversary is no better than an adversary with no random oracle calls. We complete the proof of the theorem by adding $\mathbb{P}(E)$ to the advantage of an adversary with no random oracle calls:

𝐌𝐚𝐱𝐀𝐝𝐯𝓇,𝐪\displaystyle\mathbf{MaxAdv}_{\mathbf{\mathcal{r},q}} \displaystyle\leq 𝐌𝐚𝐱𝐀𝐝𝐯𝟎,𝐪+(E)\displaystyle\mathbf{MaxAdv}_{\mathbf{0,q}}+\mathbb{P}(E)
\displaystyle\leq qT2[h1(1α+n𝓀)]n/2+qT2𝓂+q𝓈+1(4𝓂q2𝓂)𝓈+q𝓇2𝓂1.\displaystyle{qT\over 2}\Bigl{[}h^{-1}\Bigl{(}1-{\alpha+n\over\mathcal{k}}\Bigr{)}\Bigr{]}^{n/2}+qT\cdot 2^{-\mathcal{m}}+\frac{q}{\mathcal{s}+1}\bigg{(}\frac{4\mathcal{m}q}{2^{\mathcal{m}}}\bigg{)}^{\mathcal{s}}+\frac{q\mathcal{r}}{2^{\mathcal{m}-1}}\,.

References

  • [1] M. Bellare, D. Kane, and P. Rogaway. Big-Key Symmetric Encryption: Resisting Key Exfiltration. CRYPTO 2016, pp. 373-402, 2016
  • [2] M. Bellare and P. Rogaway. Random Oracles are Practical: A Paradigm for Designing Efficient Protocols. ACM Conference on Computer and Communications Security, pp. 62–73, 1993.
  • [3] G. Grimmett. Percolation. Springer-Verlag, 1999.
  • [4] D. Levin, Y. Peres, and E. Wilmer. Markov chains and mixing times. American Mathematical Society, 2008.
  • [5] M. Luby and C. Rackoff. How to Construct Pseudorandom Permutations from Pseudorandom Functions. SIAM Journal on Computing, 17(2), pp. 373–386, 1988.
  • [6] B. Morris. Improved mixing time bounds for the Thorp shuffle. Combinatorics, Probability and Computing, 22(1), 2013.
  • [7] B. Morris. The mixing time of the Thorp shuffle. SIAM J. on Computing, 38(2), pp. 484–504, 2008. Earlier version in STOC 2005.
  • [8] B. Morris and P. Rogaway. Sometimes-Recurse shuffle: Almost-random permutations in logarithmic expected time. EUROCRYPT 2014, LNCS vol. 8441, Springer, pp. 311–326, 2014.
  • [9] B. Morris, P. Rogaway, and T. Stegers. How to encipher messages on a small domain: deterministic encryption and the Thorp shuffle. CRYPTO 2009, LNCS vol. 5677, Springer, pp. 286–302, 2009.
  • [10] B. Morris and Y. Peres. Evolving sets, mixing and heat kernel bounds. Probability Theory and Related Fields, 2005.
  • [11] R. O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  • [12] F. Topsøe. Bounds for entropy and divergence for distributions over a two-element set. Journal of Inequalities in Pure and Applied Mathematics, 2001.