
A Parallel Repetition Theorem for the GHZ Game

Justin Holmgren (NTT Research; e-mail: justin.holmgren@ntt-research.com). Research conducted at Princeton University, supported in part by the Simons Collaboration on Algorithms and Geometry and NSF grant No. CCF-1714779.
Ran Raz (Department of Computer Science, Princeton University; e-mail: ranr@cs.princeton.edu). Research supported by the Simons Collaboration on Algorithms and Geometry, by a Simons Investigator Award and by the National Science Foundation grants No. CCF-1714779, CCF-2007462.
Abstract

We prove that parallel repetition of the (3-player) GHZ game reduces the value of the game polynomially fast to 0. That is, the value of the GHZ game repeated in parallel $t$ times is at most $t^{-\Omega(1)}$. Previously, only a bound of $\approx\frac{1}{\alpha(t)}$, where $\alpha$ is the inverse Ackermann function, was known [Ver96].

The GHZ game was recently identified by Dinur, Harsha, Venkat and Yuen as a multi-player game where all existing techniques for proving strong bounds on the value of the parallel repetition of the game fail. Indeed, to prove our result we use a completely new proof technique. Dinur, Harsha, Venkat and Yuen speculated that progress on bounding the value of the parallel repetition of the GHZ game may lead to further progress on the general question of parallel repetition of multi-player games. They suggested that the strong correlations present in the GHZ question distribution represent the “hardest instance” of the multi-player parallel repetition problem [DHVY17].

Another motivation for studying the parallel repetition of the GHZ game comes from the field of quantum information. The GHZ game, first introduced by Greenberger, Horne and Zeilinger [GHZ89], is a central game in the study of quantum entanglement and has been studied in numerous works. For example, it is used for testing quantum entanglement and for device-independent quantum cryptography. In such applications a game is typically repeated to reduce the probability of error, and hence bounds on the value of the parallel repetition of the game may be useful.

1 Introduction

In a $k$-player game, players are given correlated “questions” $(q_{1},\ldots,q_{k})$ sampled from a distribution $Q$ and must produce corresponding “answers” $(a_{1},\ldots,a_{k})$ such that $(q_{1},\ldots,q_{k},a_{1},\ldots,a_{k})$ satisfy a fixed predicate $\pi$. Crucially, the players are not allowed to communicate amongst themselves after receiving their questions (but they may agree upon a strategy beforehand). The value of the game is the probability with which the players can win with an optimal strategy. Multi-player games play a central role in theoretical computer science due to their intimate connection with multi-prover interactive proofs (MIPs) [BGKW88], hardness of approximation [FGL+91], communication complexity [PRW97, BJKS04], and the EPR paradox and non-local games [EPR35, CHTW04].

One basic operation on multi-player games is parallel repetition. In the $t$-wise parallel repetition of a game, question tuples $(q^{(i)}_{1},\ldots,q^{(i)}_{k})$ are sampled independently for $i\in[t]$. The $j^{th}$ player is given $(q^{(1)}_{j},\ldots,q^{(t)}_{j})$, and is required to produce $(a^{(1)}_{j},\ldots,a^{(t)}_{j})$. The players win if for every $i\in[t]$, $(a^{(i)}_{1},\ldots,a^{(i)}_{k})$ is a winning answer for questions $(q^{(i)}_{1},\ldots,q^{(i)}_{k})$. Parallel repetition was first proposed in [FRS94] as an intuitive attempt to reduce the value of a game from $\epsilon$ to $\epsilon^{t}$, but in general this is not what happens [For89, Fei91, FV96, Raz11]. The actual effect is far more subtle, and a summary of some of the known results is given in Table 1.

              | Two-player games             | $\geq 3$-player games
Classical     | $\exp(-\Omega(t))$ [Raz98]   | $O\left(\frac{1}{\alpha(t)}\right)$ [Ver96]
Entangled     | $t^{-\Omega(1)}$ [Yue16]     | $O(1)$ (trivial)
Non-Signaling | $\exp(-\Omega(t))$ [Hol09]   | $\Omega(1)$ [HY19]

Table 1: Known bounds on the worst-case (slowest) decay for various values of the $t$-wise parallel repetition of a non-trivial game. $\alpha$ denotes the inverse Ackermann function.

Much less is known about games with three or more players than about two-player games. Only very weak bounds are known on how $t$-wise parallel repetition decreases the value of a three-player game (as a function of $t$). There is a similar gap in our understanding when players are allowed to share an entangled state; in fact, no bounds whatsoever are known here in the three-player case. If players are more generally allowed to use any no-signaling strategy, then there are in fact counterexamples (lower bounds) showing that parallel repetition may utterly fail to reduce the (no-signaling) value of a three-player game.

1.1 The GHZ Game

The GHZ game, which we will denote by ${\cal G}_{\mathsf{GHZ}}$, is a three-player game with query distribution $Q_{\mathsf{GHZ}}$ that is uniform on $\{x\in{\mathbb{F}}_{2}^{3}:x_{1}+x_{2}+x_{3}=0\}$. To win, players are required on input $(x_{1},x_{2},x_{3})$ to produce $(y_{1},y_{2},y_{3})$ such that $y_{1}\oplus y_{2}\oplus y_{3}=x_{1}\lor x_{2}\lor x_{3}$. It is easily verified that the value of ${\cal G}_{\mathsf{GHZ}}$ is $3/4$.
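The claimed value of $3/4$ can be checked by exhaustive search. The following Python sketch is an illustration only: the names `QUERIES`, `wins` and `ghz_value` are ours, and the search is restricted to deterministic strategies, which loses nothing since randomized strategies are no better than deterministic ones.

```python
from fractions import Fraction
from itertools import product

# Support of the GHZ query distribution: x in F_2^3 with x1 + x2 + x3 = 0.
QUERIES = [x for x in product((0, 1), repeat=3) if sum(x) % 2 == 0]


def wins(x, y):
    """GHZ predicate: y1 XOR y2 XOR y3 must equal x1 OR x2 OR x3."""
    return (y[0] ^ y[1] ^ y[2]) == (x[0] | x[1] | x[2])


def ghz_value():
    """Maximum winning probability over deterministic product strategies."""
    # One player's strategy is a map {0,1} -> {0,1}, stored as the pair (f(0), f(1)).
    single = list(product((0, 1), repeat=2))
    best = Fraction(0)
    for f1, f2, f3 in product(single, repeat=3):
        won = sum(wins(x, (f1[x[0]], f2[x[1]], f3[x[2]])) for x in QUERIES)
        best = max(best, Fraction(won, len(QUERIES)))
    return best


print(ghz_value())  # 3/4
```

The upper bound also follows by hand: summing the four winning constraints over ${\mathbb{F}}_{2}$ gives $0$ on the left and $1$ on the right, so no deterministic strategy satisfies all four.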

Dinur et al. [DHVY17] identified the GHZ game as a simple example of a game for which we do not know exponential decay bounds, writing

“We suspect that progress on bounding the value of the parallel repetition of the GHZ game will lead to further progress on the general question.”

and

“We believe that the strong correlations present in the GHZ question distribution represent the “hardest instance” of the multiplayer parallel repetition problem. Existing techniques from the two-player case (which we leverage in this paper) appear to be incapable of analyzing games with question distributions with such strong correlations.”

The GHZ game also plays an important role in quantum information theory, in particular in entanglement testing and device-independent quantum cryptography. Its salient properties are that it is an XOR game that quantum (entangled) players can play perfectly, whereas classical players can win only with probability strictly less than $1$ [MS13]. No such two-player game is known. Moreover, the GHZ game has the so-called self-testing property: all quantum strategies that achieve value $1$ are essentially equivalent. This property is important for entanglement testing and device-independent quantum cryptography.

Prior to our work, the best known parallel repetition bound for the GHZ game was due to Verbitsky [Ver96], who observed a connection between parallel repetition and the density Hales-Jewett theorem from Ramsey theory [FK91]. Using modern quantitative versions of this theorem [Pol12], Verbitsky’s result implies a bound of approximately $\frac{1}{\alpha(t)}$, where $\alpha$ is the inverse Ackermann function.

We prove a bound of $t^{-\Omega(1)}$.

2 Technical Overview

To prove our parallel repetition theorem for the GHZ game, we show that for an arbitrary strategy, even if we condition on that strategy winning in several coordinates $i_{1},\ldots,i_{m}$, there still exists some coordinate in which that strategy loses with significant probability. We consider the finer-grained event that also specifies specific queries and answers in coordinates $i_{1},\ldots,i_{m}$, and abstract it out as a sufficiently dense product event $E$ over the three players’ inputs.

Given an arbitrary product event $E$ that occurs with sufficiently high probability, we show that some coordinate of $\tilde{P}\stackrel{\mathsf{def}}{=}P|E$ is hard. We do this in three high-level steps:

  1. We first prove this for the simpler case in which $E$ is an affine subspace of ${\mathbb{F}}_{2}^{3\times n}$. In fact, we show in this case that many coordinates of $\tilde{P}$ are hard.

  2. We then prove that when $E$ is arbitrary, $\tilde{P}$ can be written as a convex combination of components $\tilde{P}|{\cal W}$, where ${\cal W}$ is a large affine subspace, with most such components “indistinguishable” from $P|{\cal W}$. Specifically, our main requirement is that for all sufficiently compressing linear functions $\phi$ on ${\cal W}$, the KL divergence of $\phi(\tilde{X})$ from $\phi(X)$ is small, where we sample $\tilde{X}\leftarrow\tilde{P}|{\cal W}$ and $X\leftarrow P|{\cal W}$.

  3. With this notion of indistinguishability, we prove that if $\tilde{P}|{\cal W}$ is indistinguishable from $P|{\cal W}$, then the GHZ game (or any game with a constant-sized answer alphabet) is roughly as hard in every coordinate with query distribution $\tilde{P}|{\cal W}$ as with $P|{\cal W}$.

We conclude that for many coordinates $i$, there is a significant portion of $\tilde{P}$ for which the $i^{th}$ coordinate is hard. We emphasize that unlike all previous parallel repetition bounds, our proof does not construct a local embedding of $Q_{\mathsf{GHZ}}$ into $\tilde{P}$ for general $E$.

Local Embeddability in Affine Subspaces

We first show that if $E$ is any affine subspace of sufficiently low codimension $m$ in ${\mathbb{F}}_{2}^{3\times n}$, then there exist many coordinates $i\in[n]$ for which $Q_{\mathsf{GHZ}}$ is locally embeddable in the $i^{th}$ coordinate of the conditional distribution $\tilde{P}$. In fact, it will suffice for us to consider only affine “power” subspaces, i.e. of the form $w+{\cal V}^{3}$ for some linear subspace ${\cal V}$ in ${\mathbb{F}}_{2}^{n}$ and vector $w\in{\mathbb{F}}_{2}^{3\times n}$. Let $X^{1},\ldots,X^{n}\in{\mathbb{F}}_{2}^{3}$ denote the queries in each of the $n$ repetitions.

Our observation is that when $E$ is affine there exists a subset of coordinates $S\subseteq[n]$ with $|S|\geq 2$ such that for any $i\in S$, $E$ depends on $(X^{i'})_{i'\in S}$ only via the differences $(X^{i'}-X^{i})_{i'\in S\setminus\{i\}}$. Indeed, if $E=E_{1}\times E_{2}\times E_{3}$ and if each $E_{j}$ is given by an affine equation $(X^{1}_{j},\ldots,X^{n}_{j})\cdot A=b_{j}$ for a sufficiently “skinny” matrix $A$, then by the pigeonhole principle there must exist two distinct subset row-sums of $A$ with equal values. By considering the symmetric difference of these subsets, and using the fact that we are working over ${\mathbb{F}}_{2}$, there is a set $S\subseteq[n]$ such that the $S$-subset row-sum of $A$ is $0$. Thus the value of $(X^{1}_{j},\ldots,X^{n}_{j})\cdot A$ is unchanged if $X^{i}_{j}$ is subtracted from $X^{i'}_{j}$ for every $i'\in S$.
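To make the existence of such a set $S$ concrete, here is a minimal Python sketch. It is not the pigeonhole counting argument itself: it uses incremental elimination over ${\mathbb{F}}_{2}$, which finds a set of the same kind whenever the matrix has more rows than columns. The function name `zero_sum_row_subset` and the bitmask encoding of rows are assumptions made for this illustration.

```python
def zero_sum_row_subset(rows):
    """Find a nonempty set S of row indices whose rows XOR to 0 over F_2.

    `rows` is a list of non-negative ints; bit c of rows[i] is entry (i, c) of A.
    Returns None if the rows are linearly independent over F_2.
    """
    basis = {}  # leading bit -> (reduced vector, bitmask of the row indices it combines)
    for i, row in enumerate(rows):
        vec, used = row, 1 << i
        # Invariant: vec equals the XOR of rows[k] over all k whose bit is set in `used`.
        while vec:
            lead = vec.bit_length() - 1
            if lead not in basis:
                basis[lead] = (vec, used)
                break
            basis_vec, basis_used = basis[lead]
            vec ^= basis_vec
            used ^= basis_used
        else:
            # vec was reduced to 0, so the rows indexed by `used` sum to 0.
            return {k for k in range(len(rows)) if (used >> k) & 1}
    return None


# A 4 x 3 matrix over F_2 (more rows than columns), one int per row.
A = [0b101, 0b011, 0b110, 0b001]
print(zero_sum_row_subset(A))  # {0, 1, 2}
```

If some row of $A$ is identically zero, the sketch may return a singleton; the requirement $|S|\geq 2$ needs that case to be handled separately.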

As a result, the players can all sample $(X^{i'}-X^{i})_{i'\in S\setminus\{i\}}$ and $(X^{i'})_{i'\notin S}$, which are independent of $X^{i}$, using shared randomness. On input $X^{i}_{j}$, the $j^{th}$ player can locally compute $(X^{i'}_{j})_{i'\in S}$ from $X^{i}_{j}$ and $(X^{i'}-X^{i})$.

Pseudo-Affine Decompositions

At a high level, we next show that if $E$ is an arbitrary product event (with sufficient probability mass) then $\tilde{P}$ has a “pseudo-affine decomposition”. That is, there is a partition $\Pi$ of $({\mathbb{F}}_{2}^{n})^{3}$ into affine subspaces such that if ${\cal W}$ is a random part of $\Pi$ (as weighted by $\tilde{P}$), then any strategy for $\tilde{P}|{\cal W}$ can be extended to a strategy for $P|{\cal W}$ that is similarly successful in expectation.

To construct $\Pi$, we prove the following sufficient conditions for $\Pi$ to be a pseudo-affine decomposition:

  • When ${\cal W}$ is a random part of $\Pi$ (as weighted by $\tilde{P}$), the distributions $\tilde{P}|{\cal W}$ and $P|{\cal W}$ are indistinguishable to all sufficiently compressing linear distinguishers. That is, if ${\cal W}$ is an affine shift of ${\cal V}^{3}$, then for all subspaces ${\cal U}\leq{\cal V}$ of sufficiently small co-dimension, the distributions $\tilde{P}|{\cal W}$ and $P|{\cal W}$ are statistically close modulo ${\cal U}^{3}$.

  • Each part ${\cal W}$ of $\Pi$ is in fact an affine shift of a product space ${\cal V}^{3}$ for some linear space ${\cal V}$.

We construct $\Pi$ satisfying these conditions iteratively. Starting with the singleton partition, as long as a random part ${\cal W}$ of $\Pi$ has some subspace ${\cal U}$ for which $\tilde{P}|{\cal W}$ and $P|{\cal W}$ are distributed differently mod ${\cal U}^{3}$, we replace each part ${\cal W}$ of $\Pi$ by all the affine shifts of ${\cal U}^{3}$ in ${\cal W}$. We show that this process cannot be repeated too many times when $E$ has sufficient density.

Pseudorandomness Preserves Hardness

The high-level reason these conditions suffice is that for any strategy $f=f_{1}\times f_{2}\times f_{3}$, they enable us to refine $\Pi$ to a partition $\Pi'_{f}$ such that when $X$ is sampled from $\tilde{P}|{\cal W}'$ for a random part ${\cal W}'$ in $\Pi'_{f}$, the distribution of $f(X)$ is as if $X$ were sampled uniformly from ${\cal W}'\cap E$ (i.e. with $X_{1}$, $X_{2}$, and $X_{3}$ mutually independent). Moreover, when we construct $\Pi'_{f}$ we partition each part ${\cal W}$ of $\Pi$ into all affine shifts of some linear space ${\cal U}^{3}$ where the codimension of ${\cal U}^{3}$ in ${\cal W}$ is not too large. Thus the strategy $f$ on $\tilde{P}|{\cal W}$ effectively has the players acting as independent (randomized) functions of their inputs modulo ${\cal U}$. Such strategies generalize to $P|{\cal W}$ by the first property of pseudo-affine decompositions stated above.

To construct $\Pi'_{f}$, we ensure that $f_{1}$ is uncorrelated with every affine function on $\tilde{P}|{\cal W}'$ when ${\cal W}'$ is a random part of $\Pi'_{f}$, and then prove the desired independence by Fourier analysis. We construct $\Pi'_{f}$ by iterative refinement of $\Pi$. Start by considering a random part ${\cal W}$ of $\Pi$. Whenever $f_{1}(X_{1})$ is correlated with an affine ${\mathbb{F}}_{2}$-valued function $\chi$, replace ${\cal W}$ in $\Pi$ by ${\cal W}\cap\chi^{-1}(0)$ and ${\cal W}\cap\chi^{-1}(1)$, and do this in parallel for all parts of $\Pi$. We show that this cannot be repeated too many times, and thus we quickly arrive at our desired $\Pi'_{f}$.

3 Preliminaries

In this section we describe some preliminary definitions that are somewhat specific to this work. More standard preliminaries are given in Appendices A and B.

3.1 Set Theory

Definition 3.1.

For any set $S$, a partition of $S$ is a pairwise disjoint set of subsets of $S$, whose union is all of $S$.

If $\Pi$ is a partition of $S$ and $x$ is an element of $S$, we write $\Pi(x)$ to denote the (unique) element of $\Pi$ that contains $x$.

3.2 Linear Algebra

If $U$ is a linear subspace of $V$, we write $U\leq V$ rather than $U\subseteq V$ to emphasize that $U$ is a subspace rather than an unstructured subset.

We crucially rely on the Cauchy-Schwarz inequality:

Definition 3.2 (Inner Product Space).

A real inner product space is a vector space $V$ over ${\mathbb{R}}$ together with an operation $\langle\cdot,\cdot\rangle:V\times V\to{\mathbb{R}}$ satisfying the following axioms for all $x,y,z\in V$:

  • Symmetry: $\langle x,y\rangle=\langle y,x\rangle$.

  • Linearity in the first argument: $\langle ax+by,z\rangle=a\langle x,z\rangle+b\langle y,z\rangle$. (Because of symmetry, this also implies linearity in the second argument, i.e. bilinearity.)

  • Positive Definiteness: $\langle x,x\rangle>0$ if $x\neq 0$.

Theorem 3.3 (Cauchy-Schwarz).

In any inner product space, it holds for all vectors $u$ and $v$ that $|\langle u,v\rangle|^{2}\leq\langle u,u\rangle\cdot\langle v,v\rangle$.

3.3 Multi-Player Games

In parallel repetition we often work with Cartesian product sets of the form ${\cal X}=({\cal X}_{1}\times\cdots\times{\cal X}_{k})^{n}$. For these sets, we will use superscripts to index the outer product and subscripts to index the inner product. That is, we view elements $x$ of ${\cal X}$ as tuples $(x^{1},\ldots,x^{n})$, where the $i^{th}$ component of $x^{j}$ is $x^{j}_{i}$. We will also write $x_{i}$ to denote the vector $(x^{1}_{i},\ldots,x^{n}_{i})$. If $\{E_{i}\subseteq{\cal X}_{i}\}_{i\in[k]}$ is a collection of subsets indexed by subscripts, we write $E_{1}\times\cdots\times E_{k}$ or $\prod_{i\in[k]}E_{i}$ to denote the set $\{x\in{\cal X}:\,\forall i\in[k],x_{i}\in E_{i}\}$. Similarly, if ${\cal Y}$ is a product set $({\cal Y}_{1}\times\cdots\times{\cal Y}_{k})^{m}$, we say $f:{\cal X}\to{\cal Y}$ is a product function $f_{1}\times\cdots\times f_{k}$ if $f(x)=y$ for $y_{i}=f_{i}(x_{i})$.

Definition 3.4 (Multi-player Games).

A $k$-player game is a tuple $({\cal X},{\cal Y},P,W)$, where ${\cal X}={\cal X}_{1}\times\cdots\times{\cal X}_{k}$ and ${\cal Y}={\cal Y}_{1}\times\cdots\times{\cal Y}_{k}$ are finite sets, $P$ is a probability measure on ${\cal X}$, and $W:{\cal X}\times{\cal Y}\to\{0,1\}$ is a “winning probability” predicate.

Definition 3.5 (Parallel Repetition).

Given a $k$-player game ${\cal G}=({\cal X},{\cal Y},Q,W)$, its $n$-fold parallel repetition, denoted ${\cal G}^{n}$, is defined as the $k$-player game $({\cal X}^{n},{\cal Y}^{n},Q^{n},W^{n})$, where $W^{n}(x,y)\stackrel{\mathsf{def}}{=}\bigwedge_{j=1}^{n}W(x^{j},y^{j})$.

Definition 3.6.

The success probability of a function $f=f_{1}\times\cdots\times f_{k}:{\cal X}\to{\cal Y}$ in a $k$-player game ${\cal G}=({\cal X},{\cal Y},Q,W)$ is

\[v[f]({\cal G})\stackrel{\mathsf{def}}{=}\Pr_{x\leftarrow Q}\Big[W\big(x,f(x)\big)=1\Big].\]
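As an illustration of the definitions of parallel repetition and success probability, the following Python sketch evaluates $v[f]({\cal G}^{n})$ for the GHZ game by enumerating all query tuples. The helper names (`repeated_wins`, `success_probability`, `always_one`) are ours, and the strategy shown is only one simple product strategy, not an optimal one for the repeated game.

```python
from fractions import Fraction
from itertools import product

# Support of the GHZ query distribution Q (uniform on these four triples).
GHZ_QUERIES = [x for x in product((0, 1), repeat=3) if sum(x) % 2 == 0]


def ghz_wins(x, y):
    """Single-copy GHZ predicate W."""
    return (y[0] ^ y[1] ^ y[2]) == (x[0] | x[1] | x[2])


def repeated_wins(xs, ys):
    """W^n: the players must win in every one of the n coordinates."""
    return all(ghz_wins(x, y) for x, y in zip(xs, ys))


def success_probability(strategies, n):
    """v[f](G^n) for f = f_1 x f_2 x f_3, with queries drawn from Q^n."""
    won, total = 0, 0
    for xs in product(GHZ_QUERIES, repeat=n):  # xs[j] is the j-th query triple
        # Player i only sees its own row x_i = (x_i^1, ..., x_i^n).
        answers = [strategies[i](tuple(x[i] for x in xs)) for i in range(3)]
        ys = [tuple(answers[i][j] for i in range(3)) for j in range(n)]
        won += repeated_wins(xs, ys)
        total += 1
    return Fraction(won, total)


def always_one(bits):
    """Answer 1 in every coordinate, ignoring the input."""
    return tuple(1 for _ in bits)


print(success_probability([always_one] * 3, 1))  # 3/4
print(success_probability([always_one] * 3, 2))  # 9/16
```

Because `always_one` treats the coordinates independently, its success probability in the $n$-fold game is exactly $(3/4)^{n}$; strategies that correlate coordinates are the reason the value of the repeated game is harder to bound.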
Definition 3.7.

The value of a $k$-player game ${\cal G}=({\cal X},{\cal Y},Q,W)$, denoted $v({\cal G})$, is the maximum, over all functions $f=f_{1}\times\cdots\times f_{k}:{\cal X}\to{\cal Y}$, of $v[f]({\cal G})$.

Fact 3.8.

Randomized strategies are no better than deterministic strategies.

Definition 3.9 (Value in $j^{th}$ coordinate).

If ${\cal G}=({\cal X},{\cal Y},Q,W^{n})$ is a game (with a product winning predicate), the value of ${\cal G}$ in the $j^{th}$ coordinate, denoted $v^{j}({\cal G})$, is the value of the game $({\cal X},{\cal Y},Q,W')$, where $W'(x,y)=W(x^{j},y^{j})$.

Definition 3.10 (Game with Modified Query Distribution).

If ${\cal G}=({\cal X},{\cal Y},Q,W)$ is a game, and $P$ is a probability measure on ${\cal X}$, we write ${\cal G}|P$ to denote the game $({\cal X},{\cal Y},P,W)$.

4 Key Lemmas

In this section, we give some Fourier-analytic conditions (see Appendix B for the basics of Fourier analysis) that imply independence of random variables under the (parallel repeated) GHZ query distribution.

It will be convenient for us to work with probability distributions in terms of their densities (see Appendix A for basic probability definitions and notation).

Definition 4.1 (Probability Densities).

If $P:\Omega\to{\mathbb{R}}$ is a probability distribution with support contained in $A$, then the density of $P$ in $A$ is

\[\varphi:A\to{\mathbb{R}},\qquad x\mapsto|A|\cdot P(x).\]

If $A$ is unspecified, then by default it is taken to be $\Omega$.

Lemma 4.2.

Let ${\cal V}$ be a (finite) vector space over ${\mathbb{F}}_{2}$, let $P$ be uniform on $\{x\in{\cal V}^{3}:x_{1}+x_{2}+x_{3}=0\}$, and let $U$ be uniform on ${\cal V}^{3}$.

For any subset $E=E_{1}\times E_{2}\times E_{3}$ of ${\cal V}^{3}$,

\[P(E)=\sum_{\chi\in\hat{{\cal V}}}\prod_{i\in[3]}\hat{1}_{E_{i}}(\chi)=U(E)\cdot\sum_{\chi\in\hat{{\cal V}}}\prod_{i\in[3]}\hat{\varphi}_{E_{i}}(\chi),\]

where $\varphi_{E_{i}}$ denotes the density in ${\cal V}$ of the uniform distribution on $E_{i}$.

Proof.

Let $\varphi_{P}$ denote the density in ${\cal V}^{3}$ of $P$. That is,

\[\varphi_{P}(x_{1},x_{2},x_{3})=\begin{cases}|{\cal V}|&\text{if $x_{1}+x_{2}+x_{3}=0$}\\ 0&\text{otherwise.}\end{cases}\]

Then

\begin{align}
P(E)&=\mathop{\mathbb{E}}_{x\leftarrow{\cal V}^{3}}\left[\varphi_{P}(x)\cdot 1_{E}(x)\right]\notag\\
&=\sum_{\chi\in\widehat{{\cal V}^{3}}}\hat{\varphi}_{P}(\chi)\cdot\hat{1}_{E}(\chi).\qquad\text{(Plancherel)}\tag{1}
\end{align}

We now compute $\hat{\varphi}_{P}(\chi)$ and $\hat{1}_{E}(\chi)$. We start by noting that the dual space $\widehat{{\cal V}^{3}}$ is isomorphic to $\hat{{\cal V}}^{3}$. That is, each character $\chi\in\widehat{{\cal V}^{3}}$ is of the form $\chi(x_{1},x_{2},x_{3})=\chi_{1}(x_{1})\chi_{2}(x_{2})\chi_{3}(x_{3})$ for some (uniquely determined) $\chi_{1},\chi_{2},\chi_{3}\in\hat{{\cal V}}$, and conversely, each choice of $\chi_{1},\chi_{2},\chi_{3}\in\hat{{\cal V}}$ gives rise to some $\chi\in\widehat{{\cal V}^{3}}$.

The Fourier transform of $\varphi_{P}$ is given by

\[\hat{\varphi}_{P}(\chi_{1},\chi_{2},\chi_{3})=\begin{cases}1&\text{if $\chi_{1}=\chi_{2}=\chi_{3}$}\\ 0&\text{otherwise.}\end{cases}\tag{2}\]

Since $E$ is a product event, the Fourier transform of $1_{E}:{\cal V}^{3}\to\{0,1\}$ is given by

\begin{align}
\hat{1}_{E}(\chi_{1},\chi_{2},\chi_{3})&=\prod_{i\in[3]}\hat{1}_{E_{i}}(\chi_{i})\notag\\
&=U(E)\cdot\prod_{i\in[3]}\hat{\varphi}_{E_{i}}(\chi_{i}).\tag{3}
\end{align}

Substituting Eqs. 2 and 3 into Eq. 1 concludes the proof of the lemma. ∎
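The identity of Lemma 4.2 can be sanity-checked numerically for ${\cal V}={\mathbb{F}}_{2}^{n}$. The Python sketch below assumes the expectation-normalized Fourier convention $\hat{g}(\chi)=\mathop{\mathbb{E}}_{x\leftarrow{\cal V}}[g(x)\chi(x)]$ with characters $\chi_{a}(x)=(-1)^{\langle a,x\rangle}$, which is consistent with $\hat{\varphi}(1)=1$ for densities; the function names and the example sets are ours.

```python
from fractions import Fraction


def character(a, x):
    """chi_a(x) = (-1)^{<a, x>} for a, x in F_2^n encoded as bitmasks."""
    return -1 if bin(a & x).count("1") % 2 else 1


def indicator_hat(E_i, a, n):
    """Assumed convention: hat{1}_{E_i}(chi_a) = E_{x in F_2^n}[1_{E_i}(x) chi_a(x)]."""
    return Fraction(sum(character(a, x) for x in E_i), 2 ** n)


def prob_direct(E1, E2, E3, n):
    """P(E) for P uniform on the |V|^2 triples with x1 + x2 + x3 = 0."""
    hits = sum(1 for x1 in E1 for x2 in E2 if (x1 ^ x2) in E3)
    return Fraction(hits, 4 ** n)


def prob_fourier(E1, E2, E3, n):
    """Right-hand side of Lemma 4.2: sum over characters of the product of coefficients."""
    return sum(
        indicator_hat(E1, a, n) * indicator_hat(E2, a, n) * indicator_hat(E3, a, n)
        for a in range(2 ** n)
    )


n = 3
E1, E2, E3 = {0, 1, 2, 5}, {3, 4, 7}, {1, 2, 6, 7}
assert prob_direct(E1, E2, E3, n) == prob_fourier(E1, E2, E3, n)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to floating-point error.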

Corollary 4.3.

With ${\cal V}$, $P$, $E$, and $U$ as in Lemma 4.2,

\[\left|P(E)-U(E)\right|\leq\sum_{\chi\in\hat{{\cal V}}\setminus\{1\}}\prod_{i\in[3]}\big|\hat{1}_{E_{i}}(\chi)\big|,\]

where $1\in\hat{{\cal V}}$ denotes the trivial character.
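The bound of Corollary 4.3 can likewise be checked on small instances with ${\cal V}={\mathbb{F}}_{2}^{n}$. This is a minimal sketch under the same assumed convention $\hat{g}(\chi)=\mathop{\mathbb{E}}_{x\leftarrow{\cal V}}[g(x)\chi(x)]$; the helper names `hat` and `gap_and_bound` are hypothetical, and the index $a=0$ plays the role of the trivial character.

```python
from fractions import Fraction


def hat(E_i, a, n):
    """Assumed convention: E_{x in F_2^n}[1_{E_i}(x) * (-1)^{<a, x>}], x and a as bitmasks."""
    total = sum(-1 if bin(a & x).count("1") % 2 else 1 for x in E_i)
    return Fraction(total, 2 ** n)


def gap_and_bound(E1, E2, E3, n):
    """Return (|P(E) - U(E)|, sum over nontrivial characters of prod_i |hat{1}_{E_i}|)."""
    size = 2 ** n
    p = Fraction(sum(1 for x1 in E1 for x2 in E2 if (x1 ^ x2) in E3), size ** 2)
    u = Fraction(len(E1) * len(E2) * len(E3), size ** 3)
    bound = sum(
        abs(hat(E1, a, n) * hat(E2, a, n) * hat(E3, a, n))
        for a in range(1, size)  # skip a = 0, the trivial character
    )
    return abs(p - u), bound


gap, bound = gap_and_bound({0, 1, 2, 5}, {3, 4, 7}, {1, 2, 6, 7}, 3)
assert gap <= bound
```

When every $E_{i}$ is all of ${\cal V}$, both returned quantities are $0$, matching the fact that $P$ and $U$ agree on the whole space.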

Proof.

For any probability density function $\varphi$, we have $\hat{\varphi}(1)=1$, so

\begin{align*}
\left|P(E)-U(E)\right|&=U(E)\cdot\left|\frac{P(E)}{U(E)}-1\right|\\
&\leq U(E)\cdot\left|\sum_{\chi\in\hat{{\cal V}}\setminus\{1\}}\prod_{i\in[3]}\hat{\varphi}_{E_{i}}(\chi)\right|\\
&\leq\sum_{\chi\in\hat{{\cal V}}\setminus\{1\}}\prod_{i\in[3]}\left|\hat{1}_{E_{i}}(\chi)\right|.
\end{align*}
∎

Lemma 4.4.

Let ${\cal V}$ be a (finite) vector space over ${\mathbb{F}}_{2}$, let $P$ be uniform on $\{x\in{\cal V}^{3}:x_{1}+x_{2}+x_{3}=0\}$, let $U$ be uniform on ${\cal V}^{3}$, and let $X=(X_{1},X_{2},X_{3})$ denote the identity random variable on ${\cal V}^{3}$. (Specifically, with the formalism of random variables as functions on a sample space, we mean that $X$ is the identity function, mapping $(x_{1},x_{2},x_{3})$ to $(x_{1},x_{2},x_{3})$.) Let $Y_{i}=Y_{i}(X_{i})$ be a ${\cal Y}_{i}$-valued random variable for each $i\in[3]$, let $Y=(Y_{1},Y_{2},Y_{3})$, and let ${\cal Y}={\cal Y}_{1}\times{\cal Y}_{2}\times{\cal Y}_{3}$.

Let ${\cal W}$ be a subspace of ${\cal V}$. If for all $\chi\in\hat{{\cal W}}$,

\[\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\left[d_{\mathsf{TV}}\big(P_{\chi(X_{1})|X\in x+{\cal W}^{3},Y_{1}=y_{1}},U_{\chi(X_{1})|X\in x+{\cal W}^{3}}\big)\right]\leq\epsilon,\tag{4}\]

then

\[\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[d_{\mathsf{TV}}\big(P_{Y|X\in x+{\cal W}^{3}},U_{Y|X\in x+{\cal W}^{3}}\big)\right]\leq\epsilon\cdot\sqrt{|{\cal Y}_{2}|\cdot|{\cal Y}_{3}|}.\]
Proof.

For $x\in{\cal V}^{3}$, we will write $\bar{x}$ to denote the set $x+{\cal W}^{3}$. Recall that ${\cal V}/{\cal W}$ denotes the set of all cosets $\{x+{\cal W}\}_{x\in{\cal V}}$. For every $i\in[3]$, every $\bar{x}_{i}\in{\cal V}/{\cal W}$, and every $y_{i}\in{\cal Y}_{i}$, define $1_{i,\bar{x}_{i},y_{i}}:\bar{x}_{i}\to\{0,1\}$ to be the indicator for the set $Y_{i}^{-1}(y_{i})\cap\bar{x}_{i}$. Define $\varphi_{i,\bar{x}_{i},y_{i}}$ to be the density (in $\bar{x}_{i}$) of the uniform distribution on $Y_{i}^{-1}(y_{i})\cap\bar{x}_{i}$. That is,

\[\begin{array}{l}\varphi_{i,\bar{x}_{i},y_{i}}:\bar{x}_{i}\to{\mathbb{R}}\\ \varphi_{i,\bar{x}_{i},y_{i}}(x'_{i})=\begin{cases}\frac{|\bar{x}_{i}|}{|Y_{i}^{-1}(y_{i})\cap\bar{x}_{i}|}&\text{if $Y_{i}(x'_{i})=y_{i}$}\\ 0&\text{otherwise.}\end{cases}\end{array}\]

$\varphi_{i,\bar{x}_{i},y_{i}}$ is easily seen to be related to $1_{i,\bar{x}_{i},y_{i}}$ as

\[1_{i,\bar{x}_{i},y_{i}}=P_{Y_{i}|\bar{X}_{i}=\bar{x}_{i}}(y_{i})\cdot\varphi_{i,\bar{x}_{i},y_{i}}.\]

With this notation, our assumption that Eq. 4 holds (for all $\chi\in\hat{{\cal W}}$) is equivalent to assuming that for all $\chi\in\hat{{\cal W}}\setminus\{1\}$,

\[\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\left[\big|\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)\big|\right]\leq 2\epsilon.\tag{5}\]

This is because for all $\chi\in\hat{{\cal W}}\setminus\{1\}$, the distribution $U_{\chi(X_{1})|X\in x+{\cal W}^{3}}$ is uniform on $\{\pm 1\}$.

In general for $x\in\mathrm{Supp}(P_{X})$, we have (by Corollary 4.3) that for any $y\in{\cal Y}$,

\[\left|P_{Y|X\in x+{\cal W}^{3}}(y)-U_{Y|X\in x+{\cal W}^{3}}(y)\right|\leq\sum_{\chi\in\hat{{\cal W}}\setminus\{1\}}\prod_{i\in[3]}\big|\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)\big|\tag{6}\]

because:

  • the event $E=\{Y=y\}$ is a product event $E_{1}\times E_{2}\times E_{3}$, where each $E_{i}=\{Y_{i}=y_{i}\}$ depends only on $X_{i}$ or equivalently on $X_{i}-x_{i}$,

  • the distribution $P_{X-x|\bar{X}=\bar{x}}$ is uniform on $\{(\mathbf{w}_{1},\mathbf{w}_{2},\mathbf{w}_{3})\in{\cal W}^{3}:\mathbf{w}_{1}+\mathbf{w}_{2}+\mathbf{w}_{3}=0\}$, and

  • the distribution $U_{X-x|\bar{X}=\bar{x}}$ is uniform on $\{(\mathbf{w}_{1},\mathbf{w}_{2},\mathbf{w}_{3})\in{\cal W}^{3}\}$.

Thus we have

\begin{align*}
2\cdot\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[d_{\mathsf{TV}}\big(P_{Y|X\in x+{\cal W}^{3}},U_{Y|X\in x+{\cal W}^{3}}\big)\right]&=\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\left|P_{Y|X\in x+{\cal W}^{3}}(y)-U_{Y|X\in x+{\cal W}^{3}}(y)\right|\\
&\leq\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\sum_{\chi\neq 1}\prod_{i\in[3]}\left|\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)\right|\\
&=\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\sum_{\chi\neq 1}\prod_{i\in\{2,3\}}\sqrt{\left|\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\right|\cdot\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}}.
\end{align*}

Now, we apply Cauchy-Schwarz on the inner product space whose elements are real-valued functions of $(x,y,\chi)$, and where the inner product is defined by $\langle f,g\rangle\stackrel{\mathsf{def}}{=}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\sum_{\chi\neq 1}f(x,y,\chi)\cdot g(x,y,\chi)$. This bounds the above by

\begin{align*}
&\sqrt{\prod_{i\in\{2,3\}}\left(\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\sum_{\chi\neq 1}\big|\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\big|\cdot\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right)}\\
={}&\sqrt{\prod_{i\in\{2,3\}}\left(\sum_{\chi\neq 1}\sum_{y\in{\cal Y}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\Big[\big|\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\big|\cdot\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\Big]\right)}.
\end{align*}

By the independence of $(X_{1},Y_{1})$ and $(X_{i},Y_{i})$ under $P$ for $i\in\{2,3\}$, this is equal to

\begin{align*}
&\prod_{i\in\{2,3\}}\sqrt{\sum_{\chi\neq 1}\sum_{y\in{\cal Y}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\Big[\big|\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\big|\Big]\cdot\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]}\\
={}&\prod_{i\in\{2,3\}}\sqrt{|{\cal Y}_{5-i}|\cdot\sum_{\chi\neq 1}\left(\sum_{y_{1}\in{\cal Y}_{1}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\Big[\big|\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\big|\Big]\right)\cdot\left(\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]\right)}.
\end{align*}

But the function $1_{1,\bar{x}_{1},y_{1}}$ is just $P_{Y_{1}|\bar{X}_{1}=\bar{x}_{1}}(y_{1})\cdot\varphi_{1,\bar{x}_{1},y_{1}}$, so the above is

\begin{align*}
&\prod_{i\in\{2,3\}}\sqrt{|{\cal Y}_{5-i}|\cdot\sum_{\chi\neq 1}\left(\sum_{y_{1}\in{\cal Y}_{1}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[P_{Y_{1}|\bar{X}_{1}=\bar{x}_{1}}(y_{1})\cdot\big|\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)\big|\right]\right)\cdot\left(\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]\right)}\\
={}&\prod_{i\in\{2,3\}}\sqrt{|{\cal Y}_{i}|\cdot\sum_{\chi\neq 1}\left(\sum_{y_{1}\in{\cal Y}_{1}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[P_{Y_{1}|\bar{X}_{1}=\bar{x}_{1}}(y_{1})\cdot\big|\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)\big|\right]\right)\cdot\left(\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]\right)}
\end{align*}

which by the definition of expectation is

\[\prod_{i\in\{2,3\}}\sqrt{|{\cal Y}_{i}|\cdot\sum_{\chi\neq 1}\left(\mathop{\mathbb{E}}_{x,y\leftarrow P_{X,Y}}\left[\big|\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)\big|\right]\right)\cdot\left(\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]\right)}.\]

We use Eq. 5 to bound this by

\begin{align*}
&\prod_{i\in\{2,3\}}\sqrt{2\epsilon|{\cal Y}_{i}|\cdot\sum_{\chi\neq 1}\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]}\\
\leq{}&\prod_{i\in\{2,3\}}\sqrt{2\epsilon|{\cal Y}_{i}|\cdot\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\mathop{\mathbb{E}}_{x'\leftarrow\bar{x}_{i}}\left[1_{i,\bar{x}_{i},y_{i}}(x')^{2}\right]\right]}\qquad\text{(Parseval’s Theorem)}\\
={}&\prod_{i\in\{2,3\}}\sqrt{2\epsilon|{\cal Y}_{i}|\cdot\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\mathop{\mathbb{E}}_{x'\leftarrow\bar{x}_{i}}\left[\sum_{y_{i}\in{\cal Y}_{i}}1_{i,\bar{x}_{i},y_{i}}(x')^{2}\right]\right]}.
\end{align*}

But for $y_{i}\neq y'_{i}$, the supports of $1_{i,\bar{x}_{i},y_{i}}$ and $1_{i,\bar{x}_{i},y'_{i}}$ are disjoint, so this is at most $2\epsilon\sqrt{|{\cal Y}_{2}|\cdot|{\cal Y}_{3}|}$. ∎

5 Local Embeddability in Affine Subspaces

In this section we show that the parallel repeated GHZ query distribution has many coordinates in which the GHZ query distribution can be locally embedded, even conditioned on any affine event of low co-dimension. We first recall the notion of a local embedding.

Definition 5.1.

Let Σ\Sigma be a finite set, let kk and nn be positive integers, let QQ be a probability distribution on Σk\Sigma^{k}, and let P~\tilde{P} be a probability distribution on Σk×n\Sigma^{k\times n}.

We say that QQ is locally embeddable in the jthj^{th} coordinate of P~\tilde{P} if there exists a probability distribution RR on a set \mathcal{R} and functions e1,,ek:Σ×Σne_{1},\ldots,e_{k}:\Sigma\times\mathcal{R}\to\Sigma^{n} such that when sampling qQq\leftarrow Q, rRr\leftarrow R, if X~\tilde{X} denotes the random variable

X~=𝖽𝖾𝖿(e1(q1,r)ek(qk,r)),\tilde{X}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\left(\begin{array}[]{c}e_{1}(q_{1},r)^{\top}\\ \vdots\\ e_{k}(q_{k},r)^{\top}\end{array}\right),

then:

  1. The probability law of X~\tilde{X} is exactly P~\tilde{P}.

  2. It holds with probability 11 that X~j=q\tilde{X}^{j}=q.

Proposition 5.2.

Let nn and mm be positive integers with m<nm<n. Let QQ denote the GHZ query distribution (uniform on the set 𝒬={x𝔽23:x1+x2+x3=0}{\cal Q}=\{x\in{\mathbb{F}}_{2}^{3}:x_{1}+x_{2}+x_{3}=0\}), and let 𝒲{\cal W} be an affine shift of 𝒱3{\cal V}^{3} for a subspace 𝒱𝔽21×n{\cal V}\leq{\mathbb{F}}_{2}^{1\times n} of codimension mm with Qn(𝒲)>0Q^{n}({\cal W})>0.

Then there exist at least nmn-m distinct values of j[n]j\in[n] for which QQ is locally embeddable in the jthj^{th} coordinate of P~=𝖽𝖾𝖿Qn|𝒲\tilde{P}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}Q^{n}|{\cal W}.

Proof.

Suppose otherwise. Without loss of generality, suppose that the coordinates that are not locally embeddable include the first n=𝖽𝖾𝖿m+1n^{\prime}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}m+1 coordinates (otherwise, 𝒱{\cal V} can be permuted to make this so). That is, for each j[n]j\in[n^{\prime}], QQ is not locally embeddable in the jthj^{th} coordinate of P~\tilde{P}.

Let the defining equations for 𝒱{\cal V} be written as

𝒱=𝖽𝖾𝖿{x𝔽21×n:xA=0}{\cal V}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\left\{x\in{\mathbb{F}}_{2}^{1\times n}:x\cdot A=0\right\}

for some choice of A𝔽2n×mA\in{\mathbb{F}}_{2}^{n\times m}, and let 𝐯𝔽23×n\mathbf{v}\in{\mathbb{F}}_{2}^{3\times n} be such that 𝒲=𝐯+𝒱3{\cal W}=\mathbf{v}+{\cal V}^{3}.

Because 2n>2m2^{n^{\prime}}>2^{m}, the pigeonhole principle implies that there exist two distinct sets S0,S1[n]S_{0},S_{1}\subseteq[n^{\prime}] such that

jS0Aj=jS1Aj,\sum_{j\in S_{0}}A_{j}=\sum_{j\in S_{1}}A_{j},

where we recall that AjA_{j} denotes the jthj^{th} row of AA. Thus, there is a non-empty subset S=𝖽𝖾𝖿S0ΔS1[n]S\stackrel{{\scriptstyle\mathsf{def}}}{{=}}S_{0}\Delta S_{1}\subseteq[n^{\prime}] such that

jSAj=0.\sum_{j\in S}A_{j}=0. (7)
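The pigeonhole step can be made concrete by a brute-force search. The following Python sketch is only an illustration under stated assumptions: the function name `dependent_row_subset` and the example matrix are hypothetical, rows are indexed from 0 rather than 1, and the search is exponential in m, so it is meant for toy sizes only. It enumerates subsets of the first m+1 rows of A, and as soon as two distinct subsets have the same row sum over F_2 it returns their symmetric difference.

```python
from itertools import combinations

def dependent_row_subset(A):
    """Return a non-empty set S of indices among the first m+1 rows of A
    (a list of 0/1 rows, each of length m) whose rows sum to zero over F_2."""
    m = len(A[0])
    n_prime = m + 1
    assert len(A) >= n_prime, "need at least m + 1 rows"
    seen = {}  # row-sum vector -> subset that produced it
    for size in range(n_prime + 1):
        for subset in combinations(range(n_prime), size):
            total = tuple(sum(A[j][c] for j in subset) % 2 for c in range(m))
            if total in seen:
                # Two distinct subsets with equal sums: their symmetric
                # difference is non-empty and its rows sum to zero.
                return set(subset) ^ set(seen[total])
            seen[total] = subset
    raise AssertionError("unreachable: 2^(m+1) subsets but only 2^m possible sums")

# Hypothetical example with n = 4 rows and m = 2 columns.
A = [[1, 0], [0, 1], [1, 1], [1, 0]]
S = dependent_row_subset(A)
```

In this example the search stops at the pair of subsets {0, 1} and {2}, whose rows both sum to (1, 1).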

Fix some such SS. We will show that for any jSj\in S, QQ is locally embeddable in the jthj^{th} coordinate of P~\tilde{P}, which is a contradiction. Let XX denote the 𝔽23×n{\mathbb{F}}_{2}^{3\times n}-valued random variable given by the identity function.

Claim 5.3.

For any jSj\in S, the distribution P~Xj\tilde{P}_{X^{j}} is identical to QQ (i.e., uniformly random on 𝒬{\cal Q}).

Proof.

Let jSj\in S be given. It suffices to show that for every q,q𝒬q,q^{\prime}\in{\cal Q}, there is a bijection Φq,q:𝒬n𝒲𝒬n𝒲\Phi_{q,q^{\prime}}:{\cal Q}^{n}\cap{\cal W}\to{\cal Q}^{n}\cap{\cal W} such that x𝒬n𝒲x\in{\cal Q}^{n}\cap{\cal W} satisfies xj=qx^{j}=q if and only if y=𝖽𝖾𝖿Φq,q(x)y\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\Phi_{q,q^{\prime}}(x) satisfies yj=qy^{j}=q^{\prime}. Such a bijection Φq,q\Phi_{q,q^{\prime}} can be constructed by defining, for all j[n]j^{\prime}\in[n],

Φq,q(x)j={xj+qqif jSxjotherwise.\Phi_{q,q^{\prime}}(x)^{j^{\prime}}=\begin{cases}x^{j^{\prime}}+q^{\prime}-q&\text{if $j^{\prime}\in S$}\\ x^{j^{\prime}}&\text{otherwise.}\end{cases}

Φq,q\Phi_{q,q^{\prime}} clearly is an injective map from 𝒬n{\cal Q}^{n} to 𝒬n{\cal Q}^{n} and satisfies Φq,q(x)j=xj+qq\Phi_{q,q^{\prime}}(x)^{j}=x^{j}+q^{\prime}-q, so the only remaining thing to check is that it indeed maps 𝒲{\cal W} into 𝒲{\cal W}. This is true because it preserves xAx\cdot A. Indeed, for any i[3]i\in[3],

Φq,q(x)iA\displaystyle\Phi_{q,q^{\prime}}(x)_{i}\cdot A =xiA+jS(qiqi)Aj\displaystyle=x_{i}\cdot A+\sum_{j^{\prime}\in S}(q^{\prime}_{i}-q_{i})\cdot A_{j^{\prime}}
=xiA+(qiqi)jSAj\displaystyle=x_{i}\cdot A+(q^{\prime}_{i}-q_{i})\cdot\sum_{j^{\prime}\in S}A_{j^{\prime}}
=xiA\displaystyle=x_{i}\cdot A (by Eq. 7). ∎

For any jSj\in S, let Δ(j)\Delta^{(j)} denote the random variable (XjXj)jS{j}\big{(}X^{j^{\prime}}-X^{j}\big{)}_{j^{\prime}\in S\setminus\{j\}}.

Claim 5.4.

For any jSj\in S, it holds in P~\tilde{P} that (Δ(j),X[n]S)\big{(}\Delta^{(j)},X^{[n]\setminus S}\big{)} and XjX^{j} are independent.

Proof.

Equivalently (using the definition of P~\tilde{P}), let EE denote the event that X𝒲X\in{\cal W}, i.e. for all i[3]i\in[3],

(Xi𝐯i)A=0.(X_{i}-\mathbf{v}_{i})\cdot A=0.

We need to show that in PP, the random variables XjX^{j} and (Δ(j),X[n]S)\big{(}\Delta^{(j)},X^{[n]\setminus S}\big{)} are conditionally independent given EE. To show this, we rely on the following fact:

Fact 5.5.

If YY and ZZ are any independent random variables, and if EE is any event that depends only on ZZ (and occurs with non-zero probability), then YY and ZZ are conditionally independent given EE.

It is clear that XjX^{j} and (Δ(j),X[n]S)\big{(}\Delta^{(j)},X^{[n]\setminus S}\big{)} are independent in PP. It is also the case that EE depends only on (Δ(j),X[n]S)\big{(}\Delta^{(j)},X^{[n]\setminus S}\big{)}: EE is defined by the constraint that for all i[3]i\in[3],

0\displaystyle 0 =(Xi𝐯i)A\displaystyle=(X_{i}-\mathbf{v}_{i})\cdot A
=jS(XijXij𝐯ij)Aj+j[n]S(Xij𝐯ij)Aj\displaystyle=\sum_{j^{\prime}\in S}(X^{j^{\prime}}_{i}-X^{j}_{i}-\mathbf{v}^{j^{\prime}}_{i})\cdot A_{j^{\prime}}+\sum_{j^{\prime}\in[n]\setminus S}(X^{j^{\prime}}_{i}-\mathbf{v}^{j^{\prime}}_{i})\cdot A_{j^{\prime}} (by Eq. 7)
=𝐯ijAj+jS{j}(XijXij𝐯ij)Ajdepends only on Δ(j)+j[n]S(Xij𝐯ij)Ajdepends only on X[n]S.\displaystyle=-\mathbf{v}^{j}_{i}\cdot A_{j}+\underbrace{\sum_{j^{\prime}\in S\setminus\{j\}}(X^{j^{\prime}}_{i}-X^{j}_{i}-\mathbf{v}^{j^{\prime}}_{i})\cdot A_{j^{\prime}}}_{\text{depends only on $\Delta^{(j)}$}}+\underbrace{\sum_{j^{\prime}\in[n]\setminus S}(X^{j^{\prime}}_{i}-\mathbf{v}^{j^{\prime}}_{i})\cdot A_{j^{\prime}}}_{\text{depends only on $X^{[n]\setminus S}$}}.

We now put everything together. Fix any jSj\in S. We construct a local embedding of QQ into the jthj^{th} coordinate of P~\tilde{P}. For each i[3]i\in[3], we define ei:𝔽2×(𝔽23×n)𝔽21×ne_{i}:{\mathbb{F}}_{2}\times({\mathbb{F}}_{2}^{3\times n})\to{\mathbb{F}}_{2}^{1\times n} such that for each j[n]j^{\prime}\in[n]:

ei(x,r)j={xif j=jx+rijrijif jS{j}rijif jS.e_{i}(x,r)^{j^{\prime}}=\begin{cases}x&\text{if $j^{\prime}=j$}\\ x+r^{j^{\prime}}_{i}-r^{j}_{i}&\text{if $j^{\prime}\in S\setminus\{j\}$}\\ r^{j^{\prime}}_{i}&\text{if $j^{\prime}\notin S$.}\end{cases}

Define the distribution P(𝖾𝗆𝖻𝖾𝖽)P^{(\mathsf{embed})} to be the distribution on x𝔽23×nx\in{\mathbb{F}}_{2}^{3\times n} obtained by independently sampling qQq\leftarrow Q and rP~r\leftarrow\tilde{P}, then defining

x=𝖽𝖾𝖿(e1(q1,r)e2(q2,r)e3(q3,r)).x\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\left(\begin{array}[]{c}e_{1}(q_{1},r)\\ e_{2}(q_{2},r)\\ e_{3}(q_{3},r)\end{array}\right).

It clearly holds with probability 11 that q=xjq=x^{j}.

Claim 5.6.

P(𝖾𝗆𝖻𝖾𝖽)P~P^{(\mathsf{embed})}\equiv\tilde{P}.

Proof.

By definition, it is immediate that: PXj(𝖾𝗆𝖻𝖾𝖽)P~XjP^{(\mathsf{embed})}_{X^{j}}\equiv\tilde{P}_{X^{j}} and PΔ(j),X[n]S(𝖾𝗆𝖻𝖾𝖽)P~Δ(j),X[n]SP^{(\mathsf{embed})}_{\Delta^{(j)},X^{[n]\setminus S}}\equiv\tilde{P}_{\Delta^{(j)},X^{[n]\setminus S}}.

Finally, XX is fully determined by XjX^{j} and (Δ(j),X[n]S)(\Delta^{(j)},X^{[n]\setminus S}), which are independent in both P(𝖾𝗆𝖻𝖾𝖽)P^{(\mathsf{embed})} (because qq and rr are sampled independently in the definition of P(𝖾𝗆𝖻𝖾𝖽)P^{(\mathsf{embed})}) and P~\tilde{P} (by Claim 5.4). ∎

We have constructed an embedding of QQ into one of the first nn^{\prime} coordinates of P~\tilde{P}, which is the desired contradiction. ∎
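The embedding constructed in this proof can be checked exhaustively on a toy instance. The sketch below is an illustration only, under assumptions that are not in the paper: the instance (n = 3, m = 1, a particular matrix A and shift v), the 0-based indexing, and all function names are hypothetical. A point x is stored as a tuple of n columns, each column being a triple in the GHZ query set. The code enumerates the support of the conditioned distribution, applies the maps e_1, e_2, e_3 with q drawn from Q and r drawn from the conditioned distribution, and compares the two laws exactly using rational arithmetic.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# GHZ query set: triples over F_2 with even parity.
QUERIES = [q for q in product((0, 1), repeat=3) if sum(q) % 2 == 0]

def in_W(x, A, v):
    """Check (x_i - v_i) * A = 0 over F_2 for every player i."""
    n, m = len(A), len(A[0])
    return all(
        sum((x[j][i] - v[j][i]) * A[j][c] for j in range(n)) % 2 == 0
        for i in range(3) for c in range(m)
    )

def conditioned_support(A, v):
    """Support of Q^n conditioned on W (the conditioned law is uniform on it)."""
    return [x for x in product(QUERIES, repeat=len(A)) if in_W(x, A, v)]

def embed(q, r, S, j):
    """Apply e_1, e_2, e_3 column by column, following the three cases of the proof."""
    cols = []
    for jp in range(len(r)):
        if jp == j:
            cols.append(q)
        elif jp in S:
            cols.append(tuple((q[i] + r[jp][i] - r[j][i]) % 2 for i in range(3)))
        else:
            cols.append(r[jp])
    return tuple(cols)

def law(samples):
    """Exact empirical law of a list of equally weighted samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {x: Fraction(c, total) for x, c in counts.items()}

# Hypothetical toy instance: rows 0 and 1 of A sum to zero, so S = {0, 1}.
A = [[1], [1], [0]]
v = ((1, 1, 0), (0, 0, 0), (0, 0, 0))  # columns of the affine shift
S, j = {0, 1}, 0

support = conditioned_support(A, v)
p_tilde = law(support)
p_embed = law([embed(q, r, S, j) for q in QUERIES for r in support])
```

On this instance the two dictionaries coincide, and the j-th column of every embedded point equals q by construction. This confirms the claim only for the chosen toy parameters and is not a substitute for the proof.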

6 Decomposition Into Pseudorandom Affine Components

In this section we show that if EE is an arbitrary event with sufficient probability mass under P=Q𝖦𝖧𝖹nP=Q_{\mathsf{GHZ}}^{n}, then P~=P|E\tilde{P}=P|E can be decomposed into components with affine support that are “similar” to corresponding components of PP. We will call such components pseudorandom.

We say that Π\Pi is an affine partition of 𝔽23×n{\mathbb{F}}_{2}^{3\times n} to mean that:

  • Each part Π(x)\Pi(x) of Π\Pi has the form 𝐰(x)+𝒱(x)3\mathbf{w}(x)+{\cal V}(x)^{3} where 𝒱(x){\cal V}(x) is a subspace of 𝔽2n{\mathbb{F}}_{2}^{n}, and

  • Each 𝒱(x){\cal V}(x) has the same dimension, which we refer to as the dimension of Π\Pi and denote by dim(Π)\dim(\Pi). The codimension of Π\Pi is defined to be ndim(Π)n-\dim(\Pi).

Definition 6.1.

If 𝒲{\cal W} is an affine shift of a vector space 𝒱3{\cal V}^{3} (for 𝒱𝔽2n{\cal V}\leq{\mathbb{F}}_{2}^{n}), we say that a 𝒲{\cal W}-valued random variable XX is (m,ϵ)(m,\epsilon)-close to YY if for all linear functions ϕ:𝔽2n𝔽2m\phi:{\mathbb{F}}_{2}^{n}\to{\mathbb{F}}_{2}^{m} we have d𝖪𝖫(ϕ3(X)ϕ3(Y))ϵd_{\mathsf{KL}}(\phi^{3}(X)\|\phi^{3}(Y))\leq\epsilon, where ϕ3\phi^{3} denotes the function mapping

(x1x2x3)(ϕ(x1)ϕ(x2)ϕ(x3)).\left(\begin{array}[]{l}x_{1}\\ x_{2}\\ x_{3}\end{array}\right)\mapsto\left(\begin{array}[]{l}\phi(x_{1})\\ \phi(x_{2})\\ \phi(x_{3})\end{array}\right).

We write dm(XY)d_{m}(X\|Y) to denote the minimum ϵ\epsilon for which XX is (m,ϵ)(m,\epsilon)-close to YY.

We remark that dm(XY)d_{m}(X\|Y) is a non-decreasing function of mm.

Lemma 6.2.

Let PP denote the distribution Q𝖦𝖧𝖹nQ_{\mathsf{GHZ}}^{n}, let XX be the identity random variable, let EE be an event with P(XE)=eΔP(X\in E)=e^{-\Delta}, and let P~=P|(XE)\tilde{P}=P\big{|}(X\in E). For any δ>0\delta>0 and any m+m\in{\mathbb{Z}}^{+}, there exists an affine partition Π\Pi of 𝔽23×n{\mathbb{F}}_{2}^{3\times n}, of codimension at most mΔδm\cdot\frac{\Delta}{\delta}, such that:

𝔼πP~Π(X)[dm(P~X|XπPX|Xπ)]δ.\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[d_{m}\Big{(}\tilde{P}_{X|X\in\pi}\Big{\|}P_{X|X\in\pi}\Big{)}\right]\leq\delta. (8)
Proof.

We construct the claimed partition iteratively. Start with the trivial nn-dimensional affine partition Π0={𝔽23×n}\Pi_{0}=\{{\mathbb{F}}_{2}^{3\times n}\}. Whenever Πi\Pi_{i} is a partition Π\Pi for which Eq. 8 does not hold, there exists a function ϕi:𝔽23×n𝔽23×m\phi_{i}:{\mathbb{F}}_{2}^{3\times n}\to{\mathbb{F}}_{2}^{3\times m} that:

  • When restricted to any part π\pi of Πi\Pi_{i}, ϕi\phi_{i} is of the form ϕi,π3\phi_{i,\pi}^{3} for some linear function ϕi,π:𝔽2n𝔽2m\phi_{i,\pi}:{\mathbb{F}}_{2}^{n}\to{\mathbb{F}}_{2}^{m}, and

  • d𝖪𝖫(P~ϕi(X)|Πi(X)Pϕi(X)|Πi(X))>δ.d_{\mathsf{KL}}\Big{(}\tilde{P}_{\phi_{i}(X)|\Pi_{i}(X)}\Big{\|}P_{\phi_{i}(X)|\Pi_{i}(X)}\Big{)}>\delta. (9)

Without loss of generality, we additionally assume that each ϕi,π\phi_{i,\pi} is “full rank” when restricted to π\pi. That is, if π\pi is an affine shift of 𝒱3{\cal V}^{3}, where 𝒱{\cal V} has dimension kk, then the restriction of ϕi,π\phi_{i,\pi} to 𝒱{\cal V} is a linear map of rank min(k,m)\min(k,m). It is clear that any ϕi,π\phi_{i,\pi} may be modified to be full rank without decreasing the KL divergence of Eq. 9.

Then by the chain rule for KL divergences,

d𝖪𝖫(P~X|Πi(X),ϕi(X)PX|Πi(X),ϕi(X))<d𝖪𝖫(P~X|Πi(X)PX|Πi(X))δ.d_{\mathsf{KL}}\Big{(}\tilde{P}_{X|\Pi_{i}(X),\phi_{i}(X)}\Big{\|}P_{X|\Pi_{i}(X),\phi_{i}(X)}\Big{)}<d_{\mathsf{KL}}\Big{(}\tilde{P}_{X|\Pi_{i}(X)}\Big{\|}P_{X|\Pi_{i}(X)}\Big{)}-\delta. (10)

The left-hand side of Eq. 10 is equivalent to

d𝖪𝖫(P~X|Πi+1(X)PX|Πi+1(X))d_{\mathsf{KL}}\Big{(}\tilde{P}_{X|\Pi_{i+1}(X)}\Big{\|}P_{X|\Pi_{i+1}(X)}\Big{)}

with Πi+1={π{x:ϕi(x)=z}}πΠi,z𝔽23×m\Pi_{i+1}=\big{\{}\pi\cap\{x:\phi_{i}(x)=z\}\big{\}}_{\pi\in\Pi_{i},z\in{\mathbb{F}}_{2}^{3\times m}}, which is an affine partition of dimension at least dim(Πi)m\dim(\Pi_{i})-m.

Thus with the non-negative potential function

Φ(Π)=𝖽𝖾𝖿d𝖪𝖫(P~X|Π(X)PX|Π(X)),\Phi(\Pi)\stackrel{{\scriptstyle\mathsf{def}}}{{=}}d_{\mathsf{KL}}\Big{(}\tilde{P}_{X|\Pi(X)}\Big{\|}P_{X|\Pi(X)}\Big{)},

we have Φ(Πi+1)<Φ(Πi)δ\Phi(\Pi_{i+1})<\Phi(\Pi_{i})-\delta. But Φ(Π0)=ln(P(E))=Δ\Phi(\Pi_{0})=-\ln\left(P(E)\right)=\Delta, so there must exist iΔδi^{\star}\leq\frac{\Delta}{\delta} for which Eq. 8 holds with Π=Πi\Pi=\Pi_{i^{\star}}, which has co-dimension at most mΔδm\cdot\frac{\Delta}{\delta}. ∎
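The identity used for the initial potential, namely that the KL divergence of a conditioned distribution P|E from P equals -ln P(E), is easy to confirm numerically. The following Python sketch does so for a small hypothetical distribution; the outcome labels, the probabilities, and the function name `kl_conditioned` are illustrative assumptions.

```python
import math

def kl_conditioned(p, event):
    """KL divergence (in nats) of (p | event) from p, where p maps outcomes
    to probabilities and event is a set of outcomes with positive mass."""
    mass = sum(p[w] for w in event)
    cond = {w: p[w] / mass for w in event}
    return sum(cw * math.log(cw / p[w]) for w, cw in cond.items() if cw > 0)

# Hypothetical four-point distribution and event.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
event = {"b", "c"}
Delta = -math.log(sum(p[w] for w in event))
potential = kl_conditioned(p, event)
```

Here `potential` and `Delta` agree up to floating-point error. Since each iteration lowers the potential by more than δ and the potential is never negative, at most Δ/δ iterations can occur.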

7 Pseudorandomness Preserves Hardness

Proposition 7.1.

Let 𝒲𝔽23×n{\cal W}\subseteq{\mathbb{F}}_{2}^{3\times n} be an affine shift of a linear subspace 𝒱3{\cal V}^{3} and let PP be the uniform distribution on {w𝒲:w1+w2+w3=0}\{w\in{\cal W}:w_{1}+w_{2}+w_{3}=0\}, which we assume to be non-empty. Let XX denote the identity random variable, let E=E1×E2×E3E=E_{1}\times E_{2}\times E_{3} be an event with P(XE)=eΔP(X\in E)=e^{-\Delta}, and define P~=𝖽𝖾𝖿P|(XE)\tilde{P}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}P\big{|}(X\in E). Suppose that P~X\tilde{P}_{X} is (1δ,δ)(\lceil\frac{1}{\delta}\rceil,\delta)-close to PXP_{X} as in Definition 6.1, for δ\delta satisfying δmin(Δ232e4Δ/ϵ,Δ232e2, 2ϵ2)\delta\leq\min(\frac{\Delta^{2}}{32}\cdot e^{-4\Delta/\epsilon},\ \frac{\Delta^{2}}{32e^{2}},\ 2\epsilon^{2}).

Then for each j[n]j\in[n], we have vj(𝒢𝖦𝖧𝖹n|P~)vj(𝒢𝖦𝖧𝖹n|P)+2ϵv^{j}({\cal G}_{\mathsf{GHZ}}^{n}|\tilde{P})\leq v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P)+2\epsilon.

Proof.

Fix j[n]j\in[n] to be any coordinate, and let f~=f~1×f~2×f~3:𝒲𝔽23\tilde{f}=\tilde{f}_{1}\times\tilde{f}_{2}\times\tilde{f}_{3}:{\cal W}\to{\mathbb{F}}_{2}^{3} be an arbitrary strategy. Let YY denote f~(X)\tilde{f}(X).

Claim 7.2.

There exists a subspace 𝒰𝒱{\cal U}\leq{\cal V} of codimension at most 1δ\lceil\frac{1}{\delta}\rceil such that:

  • The jthj^{th} coordinate xjx^{j} of x𝔽23×nx\in{\mathbb{F}}_{2}^{3\times n} depends only on x+𝒰3x+{{\cal U}^{3}}.

  • For all χ𝒰^\chi\in\hat{{\cal U}},

    𝔼(x,y)PX,Y[d𝖪𝖫(Pχ(X1)|Xx+𝒰3,Y1=y1Uχ(X1)|Xx+𝒰3)]δ,\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\left[d_{\mathsf{KL}}\Big{(}P_{\chi(X_{1})|X\in x+{\cal U}^{3},Y_{1}=y_{1}}\Big{\|}U_{\chi(X_{1})|X\in x+{\cal U}^{3}}\Big{)}\right]\leq\delta,

    where UU denotes the uniform distribution on 𝒲{\cal W}.

Proof.

Start with 𝒰1={u𝒱:uj=0}{\cal U}_{1}=\{u\in{\cal V}:u^{j}=0\} (this ensures that any subspace 𝒰𝒰1{\cal U}\leq{\cal U}_{1} satisfies the first desired property). Define a potential function

Z(𝒰)=𝖽𝖾𝖿dim(𝒰)𝔼(x,y)PX,Y[H(X1|X1x1+𝒰,Y1=y1)],Z({\cal U})\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\dim({\cal U})-\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\big{[}H(X_{1}|X_{1}\in x_{1}+{\cal U},Y_{1}=y_{1})\big{]},

which is clearly non-negative. Additionally, Z(𝒰)Z({\cal U}) (and in particular Z(𝒰1)Z({\cal U}_{1})) is at most 11 because for any subspace 𝒰𝒱{\cal U}\leq{\cal V} and any x1𝒱x_{1}\in{\cal V}, the entropy chain rule implies

𝔼yPY|X1x1+𝒰[H(X1|X1x1+𝒰,Y1=y1)]\displaystyle\mathop{\mathbb{E}}_{y\leftarrow P_{Y|X_{1}\in x_{1}+{\cal U}}}\big{[}H(X_{1}|X_{1}\in x_{1}+{\cal U},Y_{1}=y_{1})\big{]} =H(X1|X1x1+𝒰)H(Y1|X1x1+𝒰)\displaystyle=H(X_{1}|X_{1}\in x_{1}+{\cal U})-H(Y_{1}|X_{1}\in x_{1}+{\cal U})
dim(𝒰)1.\displaystyle\geq\dim({\cal U})-1.

(In the first step we used the fact that Y1Y_{1} is a function of X1X_{1}.)

For i1i\geq 1, define χi𝒰^i{1}\chi_{i}\in\hat{{\cal U}}_{i}\setminus\{1\} to maximize

bi\displaystyle b_{i} =𝖽𝖾𝖿𝔼(x,y)PX,Y[d𝖪𝖫(Pχi(X1)|Xx+𝒰i3,Y1=y1Uχi(X1)|Xx+𝒰i3)]\displaystyle\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\Big{[}d_{\mathsf{KL}}\Big{(}P_{\chi_{i}(X_{1})|X\in x+{\cal U}_{i}^{3},Y_{1}=y_{1}}\Big{\|}U_{\chi_{i}(X_{1})|X\in x+{\cal U}_{i}^{3}}\Big{)}\Big{]}
=𝔼(x,y)PX,Y[d𝖪𝖫(Pχi(X1)|Xx+𝒰i3,Y1=y1Unif{±1})]\displaystyle=\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\Big{[}d_{\mathsf{KL}}\Big{(}P_{\chi_{i}(X_{1})|X\in x+{\cal U}_{i}^{3},Y_{1}=y_{1}}\Big{\|}\mathrm{Unif}_{\{\pm 1\}}\Big{)}\Big{]}
=𝔼(x,y)PX,Y[d𝖪𝖫(Pχi(X1)|X1x1+𝒰i,Y1=y1Unif{±1})]\displaystyle=\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\Big{[}d_{\mathsf{KL}}\Big{(}P_{\chi_{i}(X_{1})|X_{1}\in x_{1}+{\cal U}_{i},Y_{1}=y_{1}}\Big{\|}\mathrm{Unif}_{\{\pm 1\}}\Big{)}\Big{]}
=1𝔼(x,y)PX,Y[H(χi(X1)|X1x1+𝒰i,Y1=y1)],\displaystyle=1-\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\Big{[}H\big{(}\chi_{i}(X_{1})|X_{1}\in x_{1}+{\cal U}_{i},Y_{1}=y_{1}\big{)}\Big{]},

and define 𝒰i+1=𝖽𝖾𝖿{u𝒰i:χi(u)=1}{\cal U}_{i+1}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\{u\in{\cal U}_{i}:\chi_{i}(u)=1\}. By the entropy chain rule, we have Z(𝒰i+1)Z(𝒰i)biZ({\cal U}_{i+1})\leq Z({\cal U}_{i})-b_{i}.

Since the initial potential is at most 11, and all potentials are at least 0, there must be some i1δi^{\star}\leq\lceil\frac{1}{\delta}\rceil for which biδb_{i^{\star}}\leq\delta. The corresponding 𝒰i{\cal U}_{i^{\star}} is the desired subspace of 𝒱{\cal V}. ∎

Now let 𝒰{\cal U} be as given by Claim 7.2. By Lemma 4.4, we have

𝔼xPX[d𝖳𝖵(PY|Xx+𝒰3,i[3]PYi|Xixi+𝒰)]2δ.\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[d_{\mathsf{TV}}\big{(}P_{Y|X\in x+{\cal U}^{3}},\prod_{i\in[3]}\ P_{Y_{i}|X_{i}\in x_{i}+{\cal U}}\big{)}\right]\leq\sqrt{2\delta}.

By assumption of Proposition 7.1 (together with Pinsker’s inequality), PX+𝒰3P_{X+{\cal U}^{3}} and P~X+𝒰3\tilde{P}_{X+{\cal U}^{3}} are δ2\sqrt{\frac{\delta}{2}}-close in total variational distance. We thus have that

𝔼xP~X[d𝖳𝖵(PY|Xx+𝒰3,i[3]PYi|Xixi+𝒰)]8δ,\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\left[d_{\mathsf{TV}}\big{(}P_{Y|X\in x+{\cal U}^{3}},\prod_{i\in[3]}\ P_{Y_{i}|X_{i}\in x_{i}+{\cal U}}\big{)}\right]\leq\sqrt{8\delta}, (11)

by the general fact that if PP and QQ are two distributions that are ϵ\epsilon-close in total variational distance, and if XX is a BB-bounded random variable, then |𝔼P[X]𝔼Q[X]|2Bϵ\big{|}\mathop{\mathbb{E}}_{P}[X]-\mathop{\mathbb{E}}_{Q}[X]\big{|}\leq 2B\epsilon.
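The general fact invoked here, and the arithmetic that turns the two error terms into the right-hand side of Eq. 11, can be sanity-checked numerically. The Python sketch below is an illustration only: the two distributions, the bounded function f, and the sample value of δ are hypothetical. In the application, the bounded random variable is a total variational distance, so B = 1, and the closeness parameter is the square root of δ/2.

```python
import math

def tv_distance(p, q):
    """Total variational distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(p[w] - q[w]) for w in p)

def expectation(p, f):
    """Expectation of the real-valued function f under the distribution p."""
    return sum(p[w] * f[w] for w in p)

# Hypothetical distributions and a function bounded by B = 1.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
f = {"a": 1.0, "b": -1.0, "c": 0.5}
B = max(abs(value) for value in f.values())

gap = abs(expectation(p, f) - expectation(q, f))
bound = 2 * B * tv_distance(p, q)

# Arithmetic behind Eq. 11 for an arbitrary sample value of delta:
# sqrt(2*delta) from the previous display plus 2 * B * sqrt(delta/2) with B = 1.
delta = 0.01
combined = math.sqrt(2 * delta) + 2 * 1 * math.sqrt(delta / 2)
```

In this example `gap` is well below `bound`, and `combined` equals the square root of 8δ up to rounding.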

We now obtain a probabilistic lower bound on P(E|X+𝒰3)P(E|X+{\cal U}^{3}). We first lower bound its log-expectation:

𝔼xP~X[lnP(E|Xx+𝒰3)]\displaystyle\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\Big{[}-\ln P\big{(}E|X\in x+{\cal U}^{3}\big{)}\Big{]} =𝔼xP~X[d𝖪𝖫(P~X|Xx+𝒰3PX|Xx+𝒰3)]\displaystyle=\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\Big{[}d_{\mathsf{KL}}\big{(}\tilde{P}_{X|X\in x+{\cal U}^{3}}\|P_{X|X\in x+{\cal U}^{3}}\big{)}\Big{]}
d𝖪𝖫(P~XPX)\displaystyle\leq d_{\mathsf{KL}}\big{(}\tilde{P}_{X}\|P_{X}\big{)} (Fact A.17)
Δ.\displaystyle\leq\Delta.

Markov’s inequality then implies that for any τ\tau,

PrxP~X[P(E|Xx+𝒰3)τ]Δln(1/τ).\Pr_{x\leftarrow\tilde{P}_{X}}\big{[}P(E|X\in x+{\cal U}^{3})\leq\tau\big{]}\leq\frac{\Delta}{\ln(1/\tau)}. (12)

Combining Eq. 12 with Eq. 11 and Fact A.18, we get

𝔼xP~X[d𝖳𝖵(P~Y|Xx+𝒰3,i[3](P|XiEi)Yi|Xixi+𝒰)]Δln(1/τ)+42δτ.\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\left[d_{\mathsf{TV}}\big{(}\tilde{P}_{Y|X\in x+{\cal U}^{3}},\prod_{i\in[3]}\ (P|X_{i}\in E_{i})_{Y_{i}|X_{i}\in x_{i}+{\cal U}}\big{)}\right]\leq\frac{\Delta}{\ln(1/\tau)}+\frac{4\sqrt{2\delta}}{\tau}.

Since this holds for all τ[0,1]\tau\in[0,1] and because δΔ232e2\delta\leq\frac{\Delta^{2}}{32e^{2}}, Corollary C.2 implies that

𝔼xP~X[d𝖳𝖵(P~Y|Xx+𝒰3,i[3](P|XiEi)Yi|Xixi+𝒰)]4Δln(Δ32δ)ϵ,\displaystyle\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\left[d_{\mathsf{TV}}\big{(}\tilde{P}_{Y|X\in x+{\cal U}^{3}},\prod_{i\in[3]}\ (P|X_{i}\in E_{i})_{Y_{i}|X_{i}\in x_{i}+{\cal U}}\big{)}\right]\leq\frac{4\Delta}{\ln\left(\frac{\Delta}{\sqrt{32\delta}}\right)}\leq\epsilon, (13)

where the last inequality follows from our assumption that δΔ232e4Δ/ϵ\delta\leq\frac{\Delta^{2}}{32}\cdot e^{-4\Delta/\epsilon}.

Putting everything together, we have

P~X+𝒰3,Y\displaystyle\tilde{P}_{X+{\cal U}^{3},Y} =P~X+𝒰3P~Y|X+𝒰3\displaystyle=\tilde{P}_{X+{\cal U}^{3}}\tilde{P}_{Y|X+{\cal U}^{3}}
ϵP~X+𝒰3i[3](P|XiEi)Yi|Xi+𝒰\displaystyle\approx_{\epsilon}\tilde{P}_{X+{\cal U}^{3}}\cdot\prod_{i\in[3]}(P|X_{i}\in E_{i})_{Y_{i}|X_{i}+{\cal U}}
δ2PX+𝒰3i[3](P|XiEi)Yi|Xi+𝒰,\displaystyle\approx_{\sqrt{\frac{\delta}{2}}}P_{X+{\cal U}^{3}}\cdot\prod_{i\in[3]}(P|X_{i}\in E_{i})_{Y_{i}|X_{i}+{\cal U}},

where \approx denotes closeness in total variational distance.

But PX+𝒰3i[3](P|XiEi)Yi|Xi+𝒰P_{X+{\cal U}^{3}}\cdot\prod_{i\in[3]}(P|X_{i}\in E_{i})_{Y_{i}|X_{i}+{\cal U}} is just the distribution on (x+𝒰3,y)(x+{\cal U}^{3},y) obtained by sampling xPXx\leftarrow P_{X}, yF(x)y\leftarrow F(x), where F=F1×F2×F3F=F_{1}\times F_{2}\times F_{3} is the following randomized strategy. On input xix_{i}, FiF_{i} uses local randomness to sample and output yi(P|XiEi)Yi|Xixi+𝒰y_{i}\leftarrow(P|X_{i}\in E_{i})_{Y_{i}|X_{i}\in x_{i}+{\cal U}}. By Fact 3.8, the probability that W(xj,y)=1W(x^{j},y)=1 (which is well-defined because xjx^{j} is a function of x+𝒰3x+{\cal U}^{3}) is at most vj(𝒢𝖦𝖧𝖹n|P)v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P).

We thus have

vj[f~](𝒢𝖦𝖧𝖹n|P~)\displaystyle v^{j}[\tilde{f}]({\cal G}_{\mathsf{GHZ}}^{n}|\tilde{P}) =P~X+𝒰3,Y(W(Xj,Y)=1)\displaystyle=\tilde{P}_{X+{\cal U}^{3},Y}\big{(}W(X^{j},Y)=1\big{)}
vj(𝒢𝖦𝖧𝖹n|P)+ϵ+δ2\displaystyle\leq v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P)+\epsilon+\sqrt{\frac{\delta}{2}}
vj(𝒢𝖦𝖧𝖹n|P)+2ϵ.\displaystyle\leq v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P)+2\epsilon.

Since this holds for arbitrary f~\tilde{f}, we have vj(𝒢𝖦𝖧𝖹n|P~)vj(𝒢𝖦𝖧𝖹n|P)+2ϵv^{j}({\cal G}_{\mathsf{GHZ}}^{n}|\tilde{P})\leq v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P)+2\epsilon. ∎

8 Proof of Main Theorem

Theorem 8.1.

If 𝒢=(𝒳,𝒴,Q,W){\cal G}=({\cal X},{\cal Y},Q,W) denotes the GHZ game, then v(𝒢n)nΩ(1)v({\cal G}^{n})\leq n^{-\Omega(1)}.

Proof.

Recall v(𝒢)=3/4v({\cal G})=3/4.

Let PP denote QnQ^{n}; that is PP is uniform on {(X1,X2,X3)𝔽23×n:X1+X2+X3=0}\big{\{}(X_{1},X_{2},X_{3})\in{\mathbb{F}}_{2}^{3\times n}:X_{1}+X_{2}+X_{3}=0\big{\}}. Let E=E1×E2×E3E=E_{1}\times E_{2}\times E_{3} be any product event in 𝔽23×n{\mathbb{F}}_{2}^{3\times n} with P(E)eΔP(E)\geq e^{-\Delta} (where Δ\Delta is a parameter we will specify later), and let P~\tilde{P} denote P|EP|E.

Let δ>0\delta>0 be a parameter we will specify later, and let m=1δm=\lceil\frac{1}{\delta}\rceil. Recall our definition of dmd_{m} (Definition 6.1). Lemma 6.2 states that there exists an affine partition Π\Pi of 𝔽23×n{\mathbb{F}}_{2}^{3\times n}, of codimension at most mΔδm\cdot\frac{\Delta}{\delta}, such that:

𝔼πP~Π(X)[dm(P~X|XπPX|Xπ)]δ.\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[d_{m}\Big{(}\tilde{P}_{X|X\in\pi}\Big{\|}P_{X|X\in\pi}\Big{)}\right]\leq\delta.

Moreover,

𝔼πP~Π(X)[d(P~X|XπPX|Xπ)]\displaystyle\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[d_{\infty}\Big{(}\tilde{P}_{X|X\in\pi}\Big{\|}P_{X|X\in\pi}\Big{)}\right] =d𝖪𝖫(P~X|Π(X)PX|Π(X))\displaystyle=d_{\mathsf{KL}}\Big{(}\tilde{P}_{X|\Pi(X)}\Big{\|}P_{X|\Pi(X)}\Big{)}
d𝖪𝖫(P~XPX)\displaystyle\leq d_{\mathsf{KL}}\big{(}\tilde{P}_{X}\|P_{X}\big{)}
Δ.\displaystyle\leq\Delta.

Markov’s inequality thus implies that with probability at least 1/31/3 when sampling πP~Π(X)\pi\leftarrow\tilde{P}_{\Pi(X)}, it holds that dm(P~X|XπPX|Xπ)3δd_{m}\Big{(}\tilde{P}_{X|X\in\pi}\Big{\|}P_{X|X\in\pi}\Big{)}\leq 3\delta and d(P~X|XπPX|Xπ)3Δd_{\infty}\Big{(}\tilde{P}_{X|X\in\pi}\Big{\|}P_{X|X\in\pi}\Big{)}\leq 3\Delta. Call such a π\pi pseudorandom, and let {\cal R} denote the set of pseudorandom π\pi.

By Proposition 7.1, for each pseudorandom π\pi we have

vj(𝒢n|(P~|π))vj(𝒢n|(P|π))+2ϵv^{j}\big{(}{\cal G}^{n}|(\tilde{P}|\pi)\big{)}\leq v^{j}\big{(}{\cal G}^{n}|(P|\pi)\big{)}+2\epsilon (14)

as long as

3δmin(9Δ232e12Δ/ϵ,9Δ232e2, 2ϵ2),3\delta\leq\min(\frac{9\Delta^{2}}{32}\cdot e^{-12\Delta/\epsilon},\ \frac{9\Delta^{2}}{32e^{2}},\ 2\epsilon^{2}), (15)

where ϵ\epsilon is a parameter we will specify later.

By Proposition 5.2, for each πΠ\pi\in\Pi (with P(π)>0P(\pi)>0), for all but at most mΔδm\cdot\frac{\Delta}{\delta} values of j[n]j\in[n] we have vj(𝒢n|(P|π))=v(𝒢)=3/4v^{j}\big{(}{\cal G}^{n}\big{|}(P|\pi)\big{)}=v({\cal G})=3/4. By averaging, there exists some j[n]j^{\star}\in[n] such that

𝔼πP~Π(X)|Π(X)[vj(𝒢n|(P|π))]mΔnδ+(1mΔnδ)34,\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)|\Pi(X)\in{\cal R}}}\left[v^{j^{\star}}\big{(}{\cal G}^{n}\big{|}(P|\pi)\big{)}\right]\leq\frac{m\Delta}{n\delta}+\left(1-\frac{m\Delta}{n\delta}\right)\cdot\frac{3}{4},

which is at most 7/87/8 if

δ2mΔn.\delta\geq\frac{2m\Delta}{n}. (16)

Putting everything together, we have

vj(𝒢n|P~)\displaystyle v^{j^{\star}}\big{(}{\cal G}^{n}|\tilde{P}\big{)} 𝔼πP~Π(X)[vj(𝒢n|(P~|π))]\displaystyle\leq\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\Big{[}v^{j^{\star}}\big{(}{\cal G}^{n}|(\tilde{P}|\pi)\big{)}\Big{]}
PrπP~Π(X)[π]+PrπP~Π(X)[π]𝔼πP~Π(X)|Π(X)[vj(𝒢n|(P~|π))]\displaystyle\leq\Pr_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[\pi\notin{\cal R}\right]+\Pr_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[\pi\in{\cal R}\right]\cdot\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)|\Pi(X)\in{\cal R}}}\Big{[}v^{j^{\star}}\big{(}{\cal G}^{n}|(\tilde{P}|\pi)\big{)}\Big{]}
23+13(78+2ϵ)\displaystyle\leq\frac{2}{3}+\frac{1}{3}\cdot(\frac{7}{8}+2\epsilon)
4748\displaystyle\leq\frac{47}{48}

if Eqs. 15 and 16 are satisfied and if ϵ132\epsilon\leq\frac{1}{32}. Setting ϵ=132\epsilon=\frac{1}{32}, Δ=0.0005lnn\Delta=0.0005\ln n, δ=n0.4\delta=n^{-0.4}, m=n0.4m=n^{0.4} ensures that these constraints are all satisfied for sufficiently large nn.
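To get a feeling for what "sufficiently large" means with these parameter choices, one can evaluate Eqs. 15 and 16 numerically. The Python sketch below is an illustration under stated assumptions: the function name `constraints_hold` is hypothetical, m is treated as the real number n^0.4 rather than an integer, and the check is done in floating point.

```python
import math

def constraints_hold(n, eps=1 / 32, big_delta_coeff=0.0005, exponent=0.4):
    """Evaluate Eqs. 15 and 16 (and eps <= 1/32) for the stated parameter choices."""
    Delta = big_delta_coeff * math.log(n)
    delta = n ** (-exponent)
    m = n ** exponent  # real-valued stand-in for the integer m
    eq15 = 3 * delta <= min(
        9 * Delta**2 / 32 * math.exp(-12 * Delta / eps),
        9 * Delta**2 / (32 * math.e**2),
        2 * eps**2,
    )
    eq16 = delta >= 2 * m * Delta / n
    return eq15 and eq16 and eps <= 1 / 32

small = constraints_hold(1e6)
large = constraints_hold(1e30)
```

Under these assumptions the constraints fail for n = 10^6 and hold for n = 10^30, which is consistent with the claim being asymptotic.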

Applying Lemma 8.2 below with ρ(n)=n0.0005\rho(n)=n^{-0.0005} and ϵ=148\epsilon=\frac{1}{48} completes the proof. ∎

Lemma 8.2 (Parallel Repetition Criterion).

Let 𝒢=(𝒳,𝒴,Q,W){\cal G}=({\cal X},{\cal Y},Q,W) be a game, and let PP denote QnQ^{n}. Suppose ρ:+\rho:{\mathbb{Z}}^{+}\to{\mathbb{R}} is a function with ρ(n)eO(n)\rho(n)\geq e^{-O(n)} and ϵ>0\epsilon>0 is a constant such that for all E=E1××Ek𝒳nE=E_{1}\times\cdots\times E_{k}\subseteq{\cal X}^{n} with Pn(E)ρ(n)P^{n}(E)\geq\rho(n) there exists jj such that vj(𝒢n|(P|E))1ϵv^{j}\big{(}{\cal G}^{n}|(P|E)\big{)}\leq 1-\epsilon. Then

v(𝒢n)ρ(n)Ω(1).v({\cal G}^{n})\leq\rho(n)^{\Omega(1)}.
Proof.

Fix any f=f1××fk:𝒳n𝒴nf=f_{1}\times\cdots\times f_{k}:{\cal X}^{n}\to{\cal Y}^{n}. Consider the probability space defined by sampling XPn{X}\leftarrow P^{n}, and let Y=f(X){Y}=f({X}). We define additional random variables J1,,Jn[n]J_{1},\ldots,J_{n}\in[n] and Z1,,Zn𝒳×𝒴Z_{1},\ldots,Z_{n}\in{\cal X}\times{\cal Y} where J1J_{1} is an arbitrary fixed value, Zi=𝖽𝖾𝖿(XJi,YJi)Z_{i}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}({X}^{J_{i}},{Y}^{J_{i}}) for all ii, and Ji+1J_{i+1} depends deterministically on Zi=𝖽𝖾𝖿(Z1,,Zi){Z}_{\leq i}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}(Z_{1},\ldots,Z_{i}) as follows. When Zi=zi{Z}_{\leq i}={z}_{\leq i}, Ji+1J_{i+1} is defined to be a value j[n]j\in[n] that minimizes Pn(W(Xj,Yj)=1|Zi=zi)P^{n}\big{(}W(X^{j},Y^{j})=1\big{|}{Z}_{\leq i}={z}_{\leq i}\big{)}. With these definitions, each event {Zi=zi}\{Z_{\leq i}=z_{\leq i}\} is a product event. In particular, if Pn(Zi=zi)ρ(n)P^{n}({Z}_{\leq i}={z}_{\leq i})\geq\rho(n) then Pn(W(XJi+1,YJi+1)=1|Zi=zi)1ϵP^{n}\big{(}W(X^{J_{i+1}},Y^{J_{i+1}})=1\big{|}{Z}_{\leq i}={z}_{\leq i}\big{)}\leq 1-\epsilon.

Let Wini\textsc{Win}_{i} denote the event that W(Zi)=1W(Z_{i})=1, let Wini\textsc{Win}_{\leq i} denote the event Win1Wini\textsc{Win}_{1}\land\cdots\land\textsc{Win}_{i}, and let wiw_{i} denote Pn(Wini)P^{n}\big{(}\textsc{Win}_{\leq i}\big{)}. Since Wini\textsc{Win}_{\leq i} is the union of some subset of the |𝒳|i|𝒴|i|{\cal X}|^{i}\cdot|{\cal Y}|^{i} disjoint product events {Zi=zi}\{{Z}_{\leq i}={z}_{\leq i}\}, we have

PrziPZi|Winin[Pn(Zi=zi)ρ(n)]1|𝒳|i|𝒴|iρ(n)wi.\Pr_{{z}_{\leq i}\leftarrow P^{n}_{{Z}_{\leq i}|\textsc{Win}_{\leq i}}}\left[P^{n}({Z}_{\leq i}={z}_{\leq i})\geq\rho(n)\right]\geq 1-|{\cal X}|^{i}\cdot|{\cal Y}|^{i}\cdot\frac{\rho(n)}{w_{i}}.

Moreover, for all zi{z}_{\leq i} for which Pn(Zi=zi)ρ(n)P^{n}({Z}_{\leq i}={z}_{\leq i})\geq\rho(n), we know that Pn(Wini+1|Zi=zi)1ϵP^{n}\big{(}\textsc{Win}_{i+1}\big{|}{Z}_{\leq i}={z}_{\leq i}\big{)}\leq 1-\epsilon. Thus as long as wi2|𝒳|i|𝒴|iρ(n)w_{i}\geq 2\cdot|{\cal X}|^{i}\cdot|{\cal Y}|^{i}\cdot\rho(n), we have

wi+1\displaystyle w_{i+1} =wiPn(Wini+1|Wini)\displaystyle=w_{i}\cdot P^{n}(\textsc{Win}_{i+1}|\textsc{Win}_{\leq i})
=wi𝔼ziPZi|Winin[Pn(Wini+1|Zi=zi)]\displaystyle=w_{i}\cdot\mathop{\mathbb{E}}_{{z}_{\leq i}\leftarrow P^{n}_{{Z}_{\leq i}|\textsc{Win}_{\leq i}}}\big{[}P^{n}(\textsc{Win}_{i+1}|{Z}_{\leq i}={z}_{\leq i})\big{]}
wi(PrziPZi|Winin[Pn(Zi=zi)<ρ]+PrziPZi|Winin[Pn(Zi=zi)ρ](1ϵ))\displaystyle\leq w_{i}\cdot\left(\Pr_{{z}_{\leq i}\leftarrow P^{n}_{{Z}_{\leq i}|\textsc{Win}_{\leq i}}}\big{[}P^{n}\big{(}{Z}_{\leq i}={z}_{\leq i}\big{)}<\rho\big{]}+\Pr_{{z}_{\leq i}\leftarrow P^{n}_{{Z}_{\leq i}|\textsc{Win}_{\leq i}}}\big{[}P^{n}\big{(}{Z}_{\leq i}={z}_{\leq i}\big{)}\geq\rho\big{]}\cdot(1-\epsilon)\right)
wi(12+12(1ϵ))\displaystyle\leq w_{i}\cdot\left(\frac{1}{2}+\frac{1}{2}\cdot(1-\epsilon)\right)
=wi(1ϵ2)\displaystyle=w_{i}\cdot\left(1-\frac{\epsilon}{2}\right)

Iterating this inequality as long as the condition wi2|𝒳|i|𝒴|iρ(n)w_{i}\geq 2\cdot|{\cal X}|^{i}\cdot|{\cal Y}|^{i}\cdot\rho(n) is satisfied, we find wiw_{i^{\star}} such that wimin(2|𝒳|i|𝒴|iρ(n),(1ϵ2)i)w_{i^{\star}}\leq\min\big{(}2\cdot|{\cal X}|^{i^{\star}}\cdot|{\cal Y}|^{i^{\star}}\cdot\rho(n),(1-\frac{\epsilon}{2})^{i^{\star}}\big{)}. This is minimized for i=Θ(log1ρ(n))i^{\star}=\Theta(\log\frac{1}{\rho(n)}) or i=ni^{\star}=n and gives v(𝒢n)wiρ(n)Ω(1)v({\cal G}^{n})\leq w_{i^{\star}}\leq\rho(n)^{\Omega(1)}. ∎
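The trade-off in this last step can be explored numerically. The Python sketch below reflects one reading of the argument and is not taken from the paper: the stopping index is unknown in advance, so the code takes the worst case over stopping indices of the minimum of the two quantities, and uses the geometric term alone if the iteration runs for all n steps. The function name `worst_case_bound`, the parameter c (standing for the product of the sizes of the question and answer alphabets), and the sample values are all hypothetical.

```python
def worst_case_bound(rho, eps, c, n):
    """Worst case over the stopping index i of min(2 * c**i * rho, (1 - eps/2)**i),
    with the geometric term alone when the iteration never stops before i = n."""
    worst = 0.0
    for i in range(n + 1):
        decay = (1 - eps / 2) ** i
        threshold = 2 * c**i * rho
        candidate = decay if i == n else min(decay, threshold)
        worst = max(worst, candidate)
        if threshold >= decay:
            # Past the crossing point the minimum is the decaying term,
            # which only shrinks, so no later index can be worse.
            break
    return worst

# Hypothetical sample values.
bound = worst_case_bound(rho=1e-12, eps=1 / 48, c=64, n=1000)
```

With these sample values the two terms cross at i = 7, an index of order log(1/ρ), and the resulting bound is about 0.93.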

Appendix A Probability Theory

We recall the notions of probability theory that we will need.

Definition A.1.

A probability distribution on a finite set Ω\Omega is a function P:ΩP:\Omega\to{\mathbb{R}} satisfying P(ω)0P(\omega)\geq 0 for all ωΩ\omega\in\Omega and ωΩP(ω)=1\sum_{\omega\in\Omega}P(\omega)=1. We extend the domain of PP to 2Ω2^{\Omega} by writing P(E)P(E) to denote ωEP(ω)\sum_{\omega\in E}P(\omega) for any “event” EΩE\subseteq\Omega.

Definition A.2.

The support of P:ΩP:\Omega\to{\mathbb{R}} is the set {ωΩ:P(ω)>0}\{\omega\in\Omega:P(\omega)>0\}.

Definition A.3.

A Σ\Sigma-valued random variable on a sample space Ω\Omega is a function X:ΩΣX:\Omega\to\Sigma.

Definition A.4 (Expectations).

If P:ΩP:\Omega\to{\mathbb{R}} is a probability distribution and X:ΩX:\Omega\to{\mathbb{R}} is a random variable, the expectation of XX under PP, denoted 𝔼P[X]\mathop{\mathbb{E}}_{P}[X], is defined to be ωΩP(ω)X(ω)\sum_{\omega\in\Omega}P(\omega)\cdot X(\omega).

We refer to subsets of Ω\Omega as events. We use standard shorthand for denoting events. For instance, if XX is a Σ\Sigma-valued random variable and xΣx\in\Sigma, we write X=xX=x to denote the event {ωΩ:X(ω)=x}\{\omega\in\Omega:X(\omega)=x\}.

Definition A.5 (Indicator Random Variables).

For any event EE, we write 1E1_{E} to denote a random variable defined as

1E(ω)={1if ωE0otherwise.1_{E}(\omega)=\begin{cases}1&\text{if $\omega\in E$}\\ 0&\text{otherwise.}\end{cases}
Definition A.6 (Independence).

Events E1,,EkΩE_{1},\ldots,E_{k}\subseteq\Omega are said to be independent under a probability distribution PP if P(E1Ek)=i[k]P(Ei)P(E_{1}\cap\cdots\cap E_{k})=\prod_{i\in[k]}P(E_{i}). Random variables X1,,XkX_{1},\ldots,X_{k} are said to be independent if the events X1=x1,,Xk=xkX_{1}=x_{1},\ldots,X_{k}=x_{k} are independent for any choice of x1,,xkx_{1},\ldots,x_{k}.

Definition A.7 (Conditional Probabilities).

If $P:\Omega\to\mathbb{R}$ is a probability distribution and $E\subseteq\Omega$ is an event with $P(E)>0$, then the conditional distribution of $P$ given $E$ is denoted $(P|E):\Omega\to\mathbb{R}$ and is defined to be

\[(P|E)(\omega)=\begin{cases}P(\omega)/P(E)&\text{if $\omega\in E$}\\ 0&\text{otherwise.}\end{cases}\]

If $X$ is a random variable and $P$ is a probability distribution, we write $P_{X}$ to denote the induced distribution of $X$ under $P$. That is, $P_{X}(x)=P(X=x)$.

If $E$ is an event, we write $P_{X|E}$ as shorthand for $(P|E)_{X}$.

Definition A.8 (Entropy).

If $P:\Omega\to\mathbb{R}$ is a probability distribution, the entropy (in nats) of $P$ is

\[H(P)\stackrel{\mathsf{def}}{=}-\sum_{\omega\in\Omega}P(\omega)\cdot\ln\big(P(\omega)\big),\]

where terms with $P(\omega)=0$ are treated as $0$.

When $X$ is a random variable associated with a probability distribution $P$, we sometimes write $H(X)$ as shorthand for $H(P_{X})$.

Definition A.9 (Conditional Entropy).

If $P$ is a probability distribution with random variables $X$ and $Y$, we write

\[H(P_{X|Y})\stackrel{\mathsf{def}}{=}\mathop{\mathbb{E}}_{y\leftarrow P_{Y}}\left[H(P_{X|Y=y})\right].\]

Fact A.10 (Chain Rule of Conditional Entropy).

For any probability distribution $P$ and any random variables $X$, $Y$, it holds that

\[H(P_{X|Y})=H(P_{X,Y})-H(P_{Y}).\]
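As an informal sanity check of the chain rule, the following Python sketch evaluates both sides on a small joint distribution of a pair $(X,Y)$. The helper names (`entropy`, `marginal`, `conditional_entropy_x_given_y`) and the example distribution are illustrative assumptions and do not come from the paper.

```python
import math
from collections import defaultdict

def entropy(P):
    """Entropy in nats; zero-probability outcomes contribute 0."""
    return -sum(p * math.log(p) for p in P.values() if p > 0)

def marginal(joint, index):
    """Marginal of a joint distribution over tuples, on coordinate `index`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[index]] += p
    return dict(out)

def conditional_entropy_x_given_y(joint):
    """H(P_{X|Y}) as the P_Y-average of H(P_{X|Y=y}); outcomes are (x, y) pairs."""
    total = 0.0
    for y, py in marginal(joint, 1).items():
        if py == 0:
            continue
        cond = {x: p / py for (x, y2), p in joint.items() if y2 == y}
        total += py * entropy(cond)
    return total

# Hypothetical joint distribution of (X, Y) on {0,1} x {0,1}.
joint = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.3}

lhs = conditional_entropy_x_given_y(joint)
rhs = entropy(joint) - entropy(marginal(joint, 1))
print(lhs, rhs)  # the two values should agree up to floating-point error
```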

A.1 Divergences

Definition A.11 (Total Variational Distance).

If $P,Q:\Omega\to\mathbb{R}$ are two probability distributions, then the total variational distance between $P$ and $Q$, denoted $d_{\mathsf{TV}}(P,Q)$, is

\[d_{\mathsf{TV}}(P,Q)\stackrel{\mathsf{def}}{=}\max_{E\subseteq\Omega}\Big|P(E)-Q(E)\Big|.\]

An equivalent definition is

\[d_{\mathsf{TV}}(P,Q)\stackrel{\mathsf{def}}{=}\frac{1}{2}\sum_{\omega\in\Omega}\big|P(\omega)-Q(\omega)\big|.\]
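The equivalence of the two definitions can be checked by brute force on a small sample space. The sketch below is only an illustration: the function names and the example distributions are assumptions made here, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import chain, combinations

def tv_max_over_events(P, Q):
    """d_TV as the maximum of |P(E) - Q(E)| over all events E (exponential time)."""
    outcomes = list(P)
    events = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1)
    )
    return max(abs(sum(P[w] - Q[w] for w in event)) for event in events)

def tv_half_l1(P, Q):
    """d_TV as half of the L1 distance between P and Q."""
    return Fraction(1, 2) * sum(abs(P[w] - Q[w]) for w in P)

# Hypothetical distributions on the sample space {a, b, c}.
P = {"a": Fraction(1, 2), "b": Fraction(3, 10), "c": Fraction(1, 5)}
Q = {"a": Fraction(1, 5), "b": Fraction(3, 10), "c": Fraction(1, 2)}

print(tv_max_over_events(P, Q), tv_half_l1(P, Q))
```

The maximum is attained at the event containing exactly the outcomes where $P$ exceeds $Q$, which is why the two expressions coincide.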
Definition A.12 (Kullback-Leibler (KL) Divergence).

If $P,Q:\Omega\to\mathbb{R}$ are probability distributions, the Kullback-Leibler divergence of $P$ from $Q$ is

\[d_{\mathsf{KL}}(P\|Q)\stackrel{\mathsf{def}}{=}\sum_{\omega\in\Omega}P(\omega)\ln\left(\frac{P(\omega)}{Q(\omega)}\right),\]

where terms of the form $p\cdot\ln(p/0)$ are treated as $0$ if $p=0$ and $+\infty$ otherwise, and terms of the form $0\cdot\ln(0/q)$ are treated as $0$.

The following relation between total variational distance and Kullback-Leibler divergence, known as Pinsker’s inequality, is of fundamental importance.

Theorem A.13 (Pinsker’s Inequality).

For any probability distributions $P,Q:\Omega\to\mathbb{R}$, it holds that $d_{\mathsf{TV}}(P,Q)\leq\sqrt{\frac{1}{2}d_{\mathsf{KL}}(P\|Q)}$.
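To make the two quantities concrete, here is a small Python sketch that computes the KL divergence (in nats, with the conventions of Definition A.12) and the total variational distance for one pair of distributions and compares them as in Pinsker’s inequality. The function names and the example distributions are assumptions chosen for illustration.

```python
import math

def kl(P, Q):
    """KL divergence of P from Q in nats, with 0*ln(0/q) = 0 and p*ln(p/0) = +inf."""
    total = 0.0
    for w, p in P.items():
        if p == 0:
            continue
        q = Q.get(w, 0.0)
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total

def tv(P, Q):
    """Total variational distance as half of the L1 distance."""
    return 0.5 * sum(abs(P[w] - Q[w]) for w in P)

# Hypothetical distributions on {0, 1}.
P = {0: 0.5, 1: 0.5}
Q = {0: 0.9, 1: 0.1}

pinsker_bound = math.sqrt(0.5 * kl(P, Q))
print(tv(P, Q), pinsker_bound)
```

When $Q$ assigns probability zero to an outcome in the support of $P$, `kl` returns infinity and the bound holds trivially.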

Definition A.14 (Conditional KL Divergence).

If $P,Q:\Omega\to\mathbb{R}$ are probability distributions and if $W$, $X$, $Y$, and $Z$ are random variables on $\Omega$, we write

\[d_{\mathsf{KL}}(P_{W|X}\|Q_{Y|Z})\stackrel{\mathsf{def}}{=}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[d_{\mathsf{KL}}(P_{W|X=x}\|Q_{Y|Z=x})\right],\]

which is taken to be $+\infty$ if there exists $x$ with $P_{X}(x)>0$ but $Q_{Z}(x)=0$.

KL divergence obeys a chain rule analogous to that for entropy.

Fact A.15 (Chain Rule for KL Divergence).

If $P,Q:\Omega\to\mathbb{R}$ are probability distributions and $W,X,Y,Z$ are random variables on $\Omega$, then

\[d_{\mathsf{KL}}(P_{W,X}\|Q_{Y,Z})=d_{\mathsf{KL}}(P_{X}\|Q_{Z})+d_{\mathsf{KL}}(P_{W|X}\|Q_{Y|Z}).\]
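The chain rule can be checked numerically in the special case where the same pair of random variables $(W,X)$ is considered under both distributions, that is, $Y=W$ and $Z=X$. The sketch below makes that simplifying assumption; all function names and the two joint distributions are hypothetical.

```python
import math
from collections import defaultdict

def kl(P, Q):
    """KL divergence of P from Q in nats, with the usual zero conventions."""
    total = 0.0
    for w, p in P.items():
        if p == 0:
            continue
        q = Q.get(w, 0.0)
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total

def marginal_x(joint):
    """Marginal on X of a joint distribution over pairs (w, x)."""
    out = defaultdict(float)
    for (_, x), p in joint.items():
        out[x] += p
    return dict(out)

def conditional_w_given_x(joint, x):
    """Distribution of W conditioned on X = x (assumes X = x has positive probability)."""
    px = sum(p for (_, x2), p in joint.items() if x2 == x)
    return {w: p / px for (w, x2), p in joint.items() if x2 == x}

def conditional_kl(P, Q):
    """P_X-average of the KL divergence between the conditionals of W given X = x."""
    return sum(
        px * kl(conditional_w_given_x(P, x), conditional_w_given_x(Q, x))
        for x, px in marginal_x(P).items()
        if px > 0
    )

# Hypothetical joint distributions of (W, X) on {0,1} x {0,1}.
P = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.3}
Q = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.4}

lhs = kl(P, Q)
rhs = kl(marginal_x(P), marginal_x(Q)) + conditional_kl(P, Q)
print(lhs, rhs)  # the two values should agree up to floating-point error
```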

A.2 Conditional KL Divergence

Fact A.16.

If $P:\Omega\to\mathbb{R}$ is a probability distribution and $E\subseteq\Omega$ is an event with $P(E)>0$, then

\[d_{\mathsf{KL}}\big(P|E\,\big\|\,P\big)=\ln\left(\frac{1}{P(E)}\right).\]
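This identity follows because every $\omega\in E$ contributes $(P|E)(\omega)\cdot\ln(1/P(E))$ to the sum and the weights $(P|E)(\omega)$ add up to 1. A minimal Python sketch of the same computation, with hypothetical helper names and an example distribution chosen here for illustration:

```python
import math

def condition(P, E):
    """Conditional distribution (P|E); assumes P(E) > 0."""
    p_e = sum(P[w] for w in E)
    return {w: (P[w] / p_e if w in E else 0.0) for w in P}

def kl(P, Q):
    """KL divergence of P from Q in nats, with the usual zero conventions."""
    total = 0.0
    for w, p in P.items():
        if p == 0:
            continue
        if Q[w] == 0:
            return math.inf
        total += p * math.log(p / Q[w])
    return total

# Hypothetical distribution and event with P(E) = 0.5.
P = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
E = {"b", "c"}

lhs = kl(condition(P, E), P)
rhs = math.log(1 / sum(P[w] for w in E))
print(lhs, rhs)  # both should be close to ln 2
```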
Fact A.17.

Let $P,Q:\Omega\to\mathbb{R}$ be probability distributions and let $X$, $Y$ be random variables on $\Omega$ with $Y$ a function of $X$. Then

\[d_{\mathsf{KL}}(P_{X|Y}\|Q_{X|Y})\leq d_{\mathsf{KL}}(P_{X}\|Q_{X}).\]

Proof.

This is well known, but for completeness:

\begin{align*}
d_{\mathsf{KL}}(P_{X|Y}\|Q_{X|Y}) &= d_{\mathsf{KL}}(P_{X,Y}\|Q_{X,Y})-d_{\mathsf{KL}}(P_{Y}\|Q_{Y}) && \text{(chain rule)}\\
&= d_{\mathsf{KL}}(P_{X}\|Q_{X})-d_{\mathsf{KL}}(P_{Y}\|Q_{Y}) && \text{($Y$ is a function of $X$)}\\
&\leq d_{\mathsf{KL}}(P_{X}\|Q_{X}). && \text{(non-negativity of KL)}
\end{align*}
∎

A.3 Conditional Statistical Distance

Fact A.18.

Let $P,Q:\Omega\to\mathbb{R}$ be probability distributions, and let $E\subseteq\Omega$ be an event with $P(E)>0$ and $Q(E)>0$. Then

\[d_{\mathsf{TV}}(P|E,Q|E)\leq\frac{2\cdot d_{\mathsf{TV}}(P,Q)}{P(E)}.\]

Proof.

Suppose for the sake of contradiction that for some $A\subseteq E$, we have

\[\left|(P|E)(A)-(Q|E)(A)\right|>\frac{2d_{\mathsf{TV}}(P,Q)}{P(E)}.\]

Multiplying both sides by $P(E)$ and using $P(E)\cdot(P|E)(A)=P(A)$, we obtain

\[\left|P(A)-P(E)\cdot(Q|E)(A)\right|>2d_{\mathsf{TV}}(P,Q).\]

Since $|P(E)-Q(E)|\leq d_{\mathsf{TV}}(P,Q)$ and $(Q|E)(A)\leq 1$, the quantity $P(E)\cdot(Q|E)(A)$ differs from $Q(E)\cdot(Q|E)(A)=Q(A)$ by at most $d_{\mathsf{TV}}(P,Q)$. By the triangle inequality, we have

\[\left|P(A)-Q(A)\right|>d_{\mathsf{TV}}(P,Q),\]

which contradicts the definition of $d_{\mathsf{TV}}$. ∎
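For intuition, the bound can be evaluated exactly on a small example. The following Python sketch uses rational arithmetic; the distributions, the event and the helper names are assumptions made for illustration only.

```python
from fractions import Fraction as F

def tv(P, Q):
    """Total variational distance as half of the L1 distance."""
    return F(1, 2) * sum(abs(P[w] - Q[w]) for w in P)

def condition(P, E):
    """Conditional distribution (P|E); assumes P(E) > 0."""
    p_e = sum(P[w] for w in E)
    return {w: (P[w] / p_e if w in E else F(0)) for w in P}

# Hypothetical distributions on {0, 1, 2, 3} and the event E = {0, 1}.
P = {0: F(1, 4), 1: F(1, 4), 2: F(1, 4), 3: F(1, 4)}
Q = {0: F(1, 2), 1: F(1, 8), 2: F(1, 8), 3: F(1, 4)}
E = {0, 1}

lhs = tv(condition(P, E), condition(Q, E))   # distance after conditioning on E
rhs = 2 * tv(P, Q) / sum(P[w] for w in E)    # bound in terms of the original distance
print(lhs, rhs)
```

In this example the conditioned distance is well below the bound, which illustrates that the factor $2/P(E)$ is a worst-case guarantee rather than a typical value.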

Corollary A.19.

Let $P:\Omega\to\mathbb{R}$ be a probability distribution, let $X$, $Y$ and $Z$ be random variables on $\Omega$, let $E$ be an event such that $\Pr_{z\leftarrow P_{Z}}[P(E|Z=z)\geq\delta]\geq 1-\tau$, and let $\tilde{P}$ denote $P|E$. Then

\[\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big[d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})\big]\leq\tau+\frac{2\cdot\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big[d_{\mathsf{TV}}(P_{X|Z=z},P_{Y|Z=z})\big]}{\delta}.\]

Proof.

\begin{align*}
&\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big[d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})\big]\\
&\quad=\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big[1_{P(E|Z=z)<\delta}\cdot d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})+1_{P(E|Z=z)\geq\delta}\cdot d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})\big]\\
&\quad\leq\tau+\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big[1_{P(E|Z=z)\geq\delta}\cdot d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})\big] && \text{($d_{\mathsf{TV}}\leq 1$)}\\
&\quad\leq\tau+\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\left[1_{P(E|Z=z)\geq\delta}\cdot\frac{2\cdot d_{\mathsf{TV}}(P_{X|Z=z},P_{Y|Z=z})}{P(E|Z=z)}\right] && \text{(Fact A.18)}\\
&\quad\leq\tau+\frac{2\cdot\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\left[d_{\mathsf{TV}}(P_{X|Z=z},P_{Y|Z=z})\right]}{\delta}.
\end{align*}
∎

Appendix B Fourier Analysis

For any (finite) vector space $V$ over $\mathbb{F}_{2}$, the character group of $V$, denoted $\hat{V}$, is the set of group homomorphisms mapping $V$ (viewed as an additive group) to $\{\pm 1\}$ (viewed as a multiplicative group). Each such homomorphism is called a character of $V$.

We will distinguish the space of functions $V\to\mathbb{R}$ from the space of functions $\hat{V}\to\mathbb{R}$ and view them as two different inner product spaces. For functions mapping $V\to\mathbb{R}$, we define the inner product

\[\langle f,g\rangle\stackrel{\mathsf{def}}{=}\mathop{\mathbb{E}}_{x\leftarrow V}\left[f(x)g(x)\right],\]

and for functions mapping $\hat{V}\to\mathbb{R}$, we define the inner product

\[\langle\hat{f},\hat{g}\rangle\stackrel{\mathsf{def}}{=}\sum_{\chi\in\hat{V}}\hat{f}(\chi)\cdot\hat{g}(\chi).\]

If there is danger of ambiguity, we use $\hat{\langle}\cdot,\cdot\hat{\rangle}$ to denote the latter inner product, and $\hat{\|}\cdot\hat{\|}$ to denote its corresponding norm.

Fact B.1.

Given a choice of basis for $V$, there is a canonical isomorphism between $V$ and $\hat{V}$. Specifically, if $V=\mathbb{F}_{2}^{n}$, then the characters of $V$ are the functions of the form

\[\chi_{\gamma}(v)=(-1)^{\gamma\cdot v}\]

for $\gamma\in\mathbb{F}_{2}^{n}$.

Definition B.2.

For any function $f:V\to\mathbb{R}$, its Fourier transform is the function $\hat{f}:\hat{V}\to\mathbb{R}$ defined by

\[\hat{f}(\chi)\stackrel{\mathsf{def}}{=}\langle f,\chi\rangle=\mathop{\mathbb{E}}_{x\leftarrow V}\left[f(x)\chi(x)\right].\]

One can verify that the characters of $V$ are orthonormal. Since $V$ is finite and has exactly $|V|$ characters, they form an orthonormal basis of the space of functions $V\to\mathbb{R}$, and we can deduce that $f$ is equal to $\sum_{\chi\in\hat{V}}\hat{f}(\chi)\cdot\chi$.

Theorem B.3 (Plancherel).

For any $f,g:V\to\mathbb{R}$,

\[\langle f,g\rangle=\langle\hat{f},\hat{g}\rangle.\]

An important special case of Plancherel’s theorem is Parseval’s theorem:

Theorem B.4 (Parseval).

For any $f:V\to\mathbb{R}$,

\[\|f\|=\|\hat{f}\|.\]
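The definitions above can be exercised directly for $V=\mathbb{F}_{2}^{3}$. The following Python sketch computes the Fourier transform by brute force, then checks Parseval’s theorem (in squared form) and the inversion formula $f=\sum_{\chi}\hat{f}(\chi)\cdot\chi$. The function names and the test function `f` are assumptions chosen for illustration; characters are indexed by $\gamma\in\mathbb{F}_{2}^{n}$ as in Fact B.1.

```python
from fractions import Fraction
from itertools import product

def character(gamma, v):
    """chi_gamma(v) = (-1)^(gamma . v), with the dot product taken over F_2."""
    return -1 if sum(g * x for g, x in zip(gamma, v)) % 2 else 1

def fourier_transform(f, n):
    """Map each gamma in F_2^n to f_hat(chi_gamma) = E_x[f(x) * chi_gamma(x)]."""
    points = list(product((0, 1), repeat=n))
    return {
        gamma: sum(Fraction(f[x]) * character(gamma, x) for x in points) / len(points)
        for gamma in points
    }

n = 3
points = list(product((0, 1), repeat=n))
# Hypothetical real-valued function on F_2^3.
f = {x: x[0] + 2 * x[1] * x[2] - 1 for x in points}
f_hat = fourier_transform(f, n)

# Squared norms: expectation over V on one side, plain sum over characters on the other.
norm_sq_f = sum(Fraction(f[x]) ** 2 for x in points) / len(points)
norm_sq_f_hat = sum(c ** 2 for c in f_hat.values())

# Fourier inversion: rebuild f from its coefficients.
reconstructed = {x: sum(f_hat[g] * character(g, x) for g in points) for x in points}
print(norm_sq_f, norm_sq_f_hat)
```

Note the asymmetry in normalization: the inner product on functions of $V$ is an expectation, while the inner product on functions of $\hat{V}$ is an unnormalized sum, and the code mirrors that choice.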

Appendix C Bound on Optimization Problem

Let $W:\mathbb{R}^{+}\to\mathbb{R}^{+}$ denote the inverse of the function $x\mapsto x\cdot e^{x}$; $W$ is known in the literature as the principal branch of the Lambert $W$ function. We rely on the following theorem:

Theorem C.1 ([HH00, Corollary 2.4]).

There exists a constant $C$ (in particular, $C=\ln\left(1+\frac{1}{e}\right)$ works) such that for all $y\geq e$,

\[W(y)\leq\ln y-\ln\ln y+C.\]

The following corollary is more directly suited to our needs.

Corollary C.2.

For any $A,B>0$ satisfying $A\geq eB$,

\[\min_{\tau\in(0,1)}\frac{A}{\ln\left(\frac{1}{\tau}\right)}+\frac{B}{\tau}\leq\frac{4A}{\ln(A/B)}.\]

Proof.

The minimum is achieved (up to a factor of two) when $\frac{A}{\ln\left(\frac{1}{\tau}\right)}=\frac{B}{\tau}$ because $\frac{A}{\ln\left(\frac{1}{\tau}\right)}$ is monotonically increasing with $\tau$ while $\frac{B}{\tau}$ is monotonically decreasing. Making the change of variables $z=-\ln(\tau)$, this is equivalent to $ze^{z}=\frac{A}{B}$, i.e. $z=W(\frac{A}{B})$. Since $\frac{A}{B}\geq e$, we have $z\geq W(e)=1$, so $\tau=e^{-z}$ indeed lies in $(0,1)$. This choice of $z$ (or equivalently $\tau$) gives

\begin{align*}
\frac{A}{\ln\left(\frac{1}{\tau}\right)}+\frac{B}{\tau} &=\frac{2A}{W(A/B)}\\
&=2B\cdot\frac{A/B}{W(A/B)}\\
&=2B\cdot\exp\big(W(A/B)\big) && \text{(Definition of $W$)}\\
&\leq\frac{2A\cdot(1+e^{-1})}{\ln(A/B)} && \text{(Theorem C.1)}\\
&\leq\frac{4A}{\ln(A/B)}.
\end{align*}
∎
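The choice of $\tau$ in the proof can be reproduced numerically. The sketch below computes $W$ by bisection (an implementation choice made here, not something taken from the paper), sets $\tau=e^{-W(A/B)}$, and compares the resulting objective value with the bound $4A/\ln(A/B)$. The function names and the sample values of $A$ and $B$ are hypothetical.

```python
import math

def lambert_w(y, iterations=200):
    """Principal branch of Lambert W for y >= e, by bisection on z * exp(z) = y."""
    lo, hi = 0.0, math.log(y)  # for y >= e, the solution lies in [0, ln y]
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid * math.exp(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def objective(A, B, tau):
    """The quantity A / ln(1/tau) + B / tau being minimized over tau in (0, 1)."""
    return A / math.log(1 / tau) + B / tau

def value_and_bound(A, B):
    """Objective at tau = exp(-W(A/B)) together with the bound 4A / ln(A/B)."""
    z = lambert_w(A / B)
    tau = math.exp(-z)
    return objective(A, B, tau), 4 * A / math.log(A / B)

# Hypothetical parameter pairs, all satisfying A >= e * B.
for A, B in [(math.e, 1.0), (10.0, 1.0), (100.0, 1.0), (5.0, 0.01)]:
    value, bound = value_and_bound(A, B)
    print(A, B, value, bound)
```

At the chosen $\tau$ the two terms of the objective are equal, so the value is $2A/W(A/B)$, as in the first line of the derivation.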

References

  • [BGKW88] Michael Ben-Or, Shafi Goldwasser, Joe Kilian, and Avi Wigderson. Multi-prover interactive proofs: How to remove intractability assumptions. In STOC, pages 113–131. ACM, 1988.
  • [BJKS04] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci., 68(4):702–732, 2004.
  • [CHTW04] Richard Cleve, Peter Høyer, Benjamin Toner, and John Watrous. Consequences and limits of nonlocal strategies. In CCC, pages 236–249. IEEE Computer Society, 2004.
  • [DHVY17] Irit Dinur, Prahladh Harsha, Rakesh Venkat, and Henry Yuen. Multiplayer parallel repetition for expanding games. In ITCS, volume 67 of LIPIcs, pages 37:1–37:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
  • [EPR35] Albert Einstein, Boris Podolsky, and Nathan Rosen. Can quantum-mechanical description of physical reality be considered complete? Physical Review, 47(10):777, 1935.
  • [Fei91] Uriel Feige. On the success probability of the two provers in one-round proof systems. In Structure in Complexity Theory Conference, pages 116–123. IEEE Computer Society, 1991.
  • [FGL+91] Uriel Feige, Shafi Goldwasser, László Lovász, Shmuel Safra, and Mario Szegedy. Approximating clique is almost NP-complete (preliminary version). In FOCS, pages 2–12. IEEE Computer Society, 1991.
  • [FK91] H. Furstenberg and Y. Katznelson. A density version of the Hales-Jewett theorem. Journal d’Analyse Mathématique, 57(1):64–119, December 1991.
  • [For89] Lance Jeremy Fortnow. Complexity-theoretic aspects of interactive proof systems. PhD thesis, MIT, 1989.
  • [FRS94] Lance Fortnow, John Rompel, and Michael Sipser. On the power of multi-prover interactive protocols. Theor. Comput. Sci., 134(2):545–557, 1994.
  • [FV96] Uriel Feige and Oleg Verbitsky. Error reduction by parallel repetition - a negative result. In Steven Homer and Jin-Yi Cai, editors, CCC, pages 70–76. IEEE Computer Society, 1996.
  • [GHZ89] Daniel M. Greenberger, Michael A. Horne, and Anton Zeilinger. Going Beyond Bell’s Theorem, pages 69–72. Springer Netherlands, Dordrecht, 1989.
  • [HH00] Abdolhossein Hoorfar and Mehdi Hassani. Inequalities on the Lambert W function and hyperpower function. J. Inequal. Pure and Appl. Math, 2000.
  • [Hol09] Thomas Holenstein. Parallel repetition: Simplification and the no-signaling case. Theory Comput., 5(1):141–172, 2009.
  • [HY19] Justin Holmgren and Lisa Yang. The parallel repetition of non-signaling games: counterexamples and dichotomy. In STOC, pages 185–192. ACM, 2019.
  • [MS13] Carl A. Miller and Yaoyun Shi. Optimal robust self-testing by binary nonlocal XOR games. In TQC, volume 22 of LIPIcs, pages 254–262. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2013.
  • [Pol12] D.H.J. Polymath. A new proof of the density Hales-Jewett theorem. Annals of Mathematics, 175(3):1283–1327, May 2012.
  • [PRW97] Itzhak Parnafes, Ran Raz, and Avi Wigderson. Direct product results and the GCD problem, in old and new communication models. In Frank Thomson Leighton and Peter W. Shor, editors, STOC, pages 363–372. ACM, 1997.
  • [Raz98] Ran Raz. A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998.
  • [Raz11] Ran Raz. A counterexample to strong parallel repetition. SIAM J. Comput., 40(3):771–777, 2011.
  • [Ver96] Oleg Verbitsky. Towards the parallel repetition conjecture. Theor. Comput. Sci., 157(2):277–282, 1996.
  • [Yue16] Henry Yuen. A parallel repetition theorem for all entangled games. In ICALP, volume 55 of LIPIcs, pages 77:1–77:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.