A Parallel Repetition Theorem for the GHZ Game

Justin Holmgren NTT Research. E-mail: justin.holmgren@ntt-research.com. Research conducted at Princeton University, supported in part by the Simons Collaboration on Algorithms and Geometry and NSF grant No. CCF-1714779. Ran Raz Department of Computer Science, Princeton University. E-mail: ranr@cs.princeton.edu. Research supported by the Simons Collaboration on Algorithms and Geometry, by a Simons Investigator Award and by the National Science Foundation grants No. CCF-1714779, CCF-2007462.

Abstract

We prove that parallel repetition of the (3-player) GHZ game reduces the value of the game polynomially fast to 0. That is, the value of the GHZ game repeated in parallel $t$ times is at most $t^{-\Omega(1)}$ . Previously, only a bound of $\approx\frac{1}{\alpha(t)}$ , where $\alpha$ is the inverse Ackermann function, was known [Ver96].

The GHZ game was recently identified by Dinur, Harsha, Venkat and Yuen as a multi-player game where all existing techniques for proving strong bounds on the value of the parallel repetition of the game fail. Indeed, to prove our result we use a completely new proof technique. Dinur, Harsha, Venkat and Yuen speculated that progress on bounding the value of the parallel repetition of the GHZ game may lead to further progress on the general question of parallel repetition of multi-player games. They suggested that the strong correlations present in the GHZ question distribution represent the “hardest instance” of the multi-player parallel repetition problem [DHVY17].

Another motivation for studying the parallel repetition of the GHZ game comes from the field of quantum information. The GHZ game, first introduced by Greenberger, Horne and Zeilinger [GHZ89], is a central game in the study of quantum entanglement and has been studied in numerous works. For example, it is used for testing quantum entanglement and for device-independent quantum cryptography. In such applications a game is typically repeated to reduce the probability of error, and hence bounds on the value of the parallel repetition of the game may be useful.

1 Introduction

In a $k$ -player game, players are given correlated “questions” $(q_{1},\ldots,q_{k})$ sampled from a distribution $Q$ and must produce corresponding “answers” $(a_{1},\ldots,a_{k})$ such that $(q_{1},\ldots,q_{k},a_{1},\ldots,a_{k})$ satisfies a fixed predicate $\pi$ . Crucially, the players are not allowed to communicate amongst themselves after receiving their questions (but they may agree upon a strategy beforehand). The value of the game is the probability with which the players can win with an optimal strategy. Multi-player games play a central role in theoretical computer science due to their intimate connection with multi-prover interactive proofs (MIPs) [BGKW88], hardness of approximation [FGL⁺91], communication complexity [PRW97, BJKS04], and the EPR paradox and non-local games [EPR35, CHTW04].

One basic operation on multi-player games is parallel repetition. In the $t$ -wise parallel repetition of a game, question tuples $(q^{(i)}_{1},\ldots,q^{(i)}_{k})$ are sampled independently for $i\in[t]$ . The $j^{th}$ player is given $(q^{(1)}_{j},\ldots,q^{(t)}_{j})$ , and is required to produce $(a^{(1)}_{j},\ldots,a^{(t)}_{j})$ . The players win if for every $i\in[t]$ , $(a^{(i)}_{1},\ldots,a^{(i)}_{k})$ is a winning answer for questions $(q^{(i)}_{1},\ldots,q^{(i)}_{k})$ . Parallel repetition was first proposed in [FRS94] as an intuitive attempt to reduce the value of a game from $\epsilon$ to $\epsilon^{t}$ , but in general this is not what happens [For89, Fei91, FV96, Raz11]. The actual effect is far more subtle and a summary of some of the known results is given in Table 1.

	Two-player games	$\geq 3$ -player games
Classical	$\exp(-\Omega(t))$ [Raz98]	$O\left(\frac{1}{\alpha(t)}\right)$ [Ver96]
Entangled	$t^{-\Omega(1)}$ [Yue16]	$O(1)$ (trivial)
Non-Signaling	$\exp\left(-\Omega\left(t\right)\right)$ [Hol09]	$\Omega(1)$ [HY19]

Table 1: Known bounds on the worst-case (slowest) decay for various values of the $t$-wise parallel repetition of a non-trivial game. $\alpha$ denotes the inverse Ackermann function.

Much less is known about games with three or more players than about two-player games. Only very weak bounds are known on how $t$ -wise parallel repetition decreases the value of a three-player game (as a function of $t$ ). There is a similar gap in our understanding when players are allowed to share an entangled state; in fact, no bounds whatsoever are known here in the three-player case. If players are more generally allowed to use any no-signaling strategy, then there are in fact counterexamples (lower bounds) showing that parallel repetition may utterly fail to reduce the (no-signaling) value of a three-player game.

1.1 The GHZ Game

The GHZ game, which we will denote by ${\cal G}_{\mathsf{GHZ}}$ , is a three-player game with query distribution $Q_{\mathsf{GHZ}}$ that is uniform on $\{x\in{\mathbb{F}}_{2}^{3}:x_{1}+x_{2}+x_{3}=0\}$ . To win, players are required on input $(x_{1},x_{2},x_{3})$ to produce $(y_{1},y_{2},y_{3})$ such that $y_{1}\oplus y_{2}\oplus y_{3}=x_{1}\lor x_{2}\lor x_{3}$ . It is easily verified that the value of ${\cal G}_{\mathsf{GHZ}}$ is $3/4$ .
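The value $3/4$ can be confirmed by exhaustive search. The following Python sketch enumerates all $4^{3}$ deterministic product strategies (deterministic strategies suffice, since randomized ones are no better). The names `QUERIES`, `wins`, and `ghz_value` are ours and are introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Support of the GHZ query distribution: x in F_2^3 with x1 + x2 + x3 = 0.
QUERIES = [x for x in product((0, 1), repeat=3) if sum(x) % 2 == 0]

def wins(x, y):
    """GHZ predicate: y1 XOR y2 XOR y3 must equal x1 OR x2 OR x3."""
    return (y[0] ^ y[1] ^ y[2]) == (x[0] | x[1] | x[2])

def ghz_value():
    """Maximize the winning probability over deterministic product strategies."""
    # A deterministic strategy for one player is the pair (f(0), f(1)).
    single = list(product((0, 1), repeat=2))
    best = Fraction(0)
    for f in product(single, repeat=3):
        won = sum(wins(x, tuple(f[i][x[i]] for i in range(3))) for x in QUERIES)
        best = max(best, Fraction(won, len(QUERIES)))
    return best

print(ghz_value())  # 3/4
```

For instance, the strategy in which every player always answers $1$ wins on the three non-zero queries and loses only on $(0,0,0)$.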

Dinur et al. [DHVY17] identified the GHZ game as a simple example of a game for which we do not know exponential decay bounds, writing

“We suspect that progress on bounding the value of the parallel repetition of the GHZ game will lead to further progress on the general question.”

and

“We believe that the strong correlations present in the GHZ question distribution represent the “hardest instance” of the multiplayer parallel repetition problem. Existing techniques from the two-player case (which we leverage in this paper) appear to be incapable of analyzing games with question distributions with such strong correlations.”

The GHZ game also plays an important role in quantum information theory and in particular in entanglement testing and device-independent quantum cryptography. Its salient properties are that it is an XOR game for which quantum (entangled) players can play perfectly, but classical players can win only with probability strictly less than $1$ [MS13]. No such two-player game is known. Moreover, the GHZ game has the so-called self-testing property: all quantum strategies that achieve value 1 are essentially equivalent. This property is important for entanglement testing and device-independent quantum cryptography.

Prior to our work, the best known parallel repetition bound for the GHZ game was due to Verbitsky [Ver96], who observed a connection between parallel repetition and the density Hales-Jewett theorem from Ramsey theory [FK91]. Using modern quantitative versions of this theorem [Pol12], Verbitsky’s result implies a bound of approximately $\frac{1}{\alpha(t)}$ , where $\alpha$ is the inverse Ackermann function.

We prove a bound of $t^{-\Omega(1)}$ .

2 Technical Overview

To prove our parallel repetition theorem for the GHZ game we show that for an arbitrary strategy, even if we condition on that strategy winning in several coordinates $i_{1},\ldots,i_{m}$ , there still exists some coordinate in which that strategy loses with significant probability. We consider the finer-grained event that also specifies specific queries and answers in coordinates $i_{1},\ldots,i_{m}$ , and abstract it out as a sufficiently dense product event $E$ over the three players’ inputs.

Given an arbitrary product event $E$ that occurs with sufficiently high probability, we show that some coordinate of $\tilde{P}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}P|E$ is hard. We do this in three high-level steps:

1. We first prove this for the simpler case in which $E$ is an affine subspace of ${\mathbb{F}}_{2}^{3\times n}$ . In fact, we show in this case that many coordinates of $\tilde{P}$ are hard.
2. We then prove that when $E$ is arbitrary, $\tilde{P}$ can be written as a convex combination of components $\tilde{P}|{\cal W}$ , where ${\cal W}$ is a large affine subspace, with most such components “indistinguishable” from $P|{\cal W}$ . Specifically, our main requirement is that for all sufficiently compressing linear functions $\phi$ on ${\cal W}$ , the KL divergence of $\phi(\tilde{X})$ from $\phi(X)$ is small, where we sample $\tilde{X}\leftarrow\tilde{P}|{\cal W}$ and $X\leftarrow P|{\cal W}$ .
3. With this notion of indistinguishability, we prove that if $\tilde{P}|{\cal W}$ is indistinguishable from $P|{\cal W}$ , then the GHZ game (or any game with a constant-sized answer alphabet) is roughly as hard in every coordinate with query distribution $\tilde{P}|{\cal W}$ as with $P|{\cal W}$ .

We conclude that for many coordinates $i$ , there is a significant portion of $\tilde{P}$ for which the $i^{th}$ coordinate is hard. We emphasize that unlike all previous parallel repetition bounds, our proof does not construct a local embedding of $Q_{\mathsf{GHZ}}$ into $\tilde{P}$ for general $E$ .

Local Embeddability in Affine Subspaces

We first show that if $E$ is any affine subspace of sufficiently low codimension $m$ in ${\mathbb{F}}_{2}^{3\times n}$ , then there exist many coordinates $i\in[n]$ for which $Q_{\mathsf{GHZ}}$ is locally embeddable in the $i^{th}$ coordinate of the conditional distribution $\tilde{P}$ . In fact, it will suffice for us to consider only affine “power” subspaces, i.e. of the form $w+{\cal V}^{3}$ for some linear subspace ${\cal V}$ in ${\mathbb{F}}_{2}^{n}$ and vector $w\in{\mathbb{F}}_{2}^{3\times n}$ . Let $X^{1},\ldots,X^{n}\in{\mathbb{F}}_{2}^{3}$ denote the queries in each of the $n$ repetitions.

Our observation is that when $E$ is affine there exists a subset of coordinates $S\subseteq[n]$ with $|S|\geq 2$ such that for any $i\in S$ , $E$ depends on $(X^{i^{\prime}})_{i^{\prime}\in S}$ only via the differences $(X^{i^{\prime}}-X^{i})_{i^{\prime}\in S\setminus\{i\}}$ . Indeed, if $E=E_{1}\times E_{2}\times E_{3}$ and if each $E_{j}$ is given by an affine equation $(X^{1}_{j},\ldots,X^{n}_{j})\cdot A=b_{j}$ for a sufficiently “skinny” matrix $A$ , then by the pigeonhole principle there must exist two distinct subsets of the rows of $A$ with equal sums. By considering the symmetric difference of these subsets, and using the fact that we are working over ${\mathbb{F}}_{2}$ , there is a set $S\subseteq[n]$ such that the rows of $A$ indexed by $S$ sum to $0$ . Thus the value of $(X^{1}_{j},\ldots,X^{n}_{j})\cdot A$ is unchanged if $X^{i}_{j}$ is subtracted from $X^{i^{\prime}}_{j}$ for every $i^{\prime}\in S$ .

As a result, the players can all sample $(X^{i^{\prime}}-X^{i})_{i^{\prime}\in S\setminus\{i\}}$ and $(X^{i^{\prime}})_{i^{\prime}\notin S}$ , which are independent of $X^{i}$ , using shared randomness. On input $X^{i}_{j}$ , the $j^{th}$ player can locally compute $(X^{i^{\prime}}_{j})_{i^{\prime}\in S}$ from $X^{i}_{j}$ and $(X^{i^{\prime}}-X^{i})$ .
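The pigeonhole step can be made concrete with a small Python sketch. It is illustrative only: the function name `zero_sum_rows` and the list-of-tuples encoding of $A$ are our own choices, and the subset enumeration takes exponential time (Gaussian elimination would be the efficient route).

```python
from itertools import combinations

def zero_sum_rows(A):
    """Find a non-empty set S of row indices of A whose rows sum to 0 over F_2.

    A is a list of n rows, each a tuple of m bits, with n > m.
    """
    n = len(A)
    m = len(A[0])
    assert n > m, "pigeonhole needs more rows than columns"
    seen = {}  # subset row-sum -> first subset that produced it
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            total = tuple(sum(A[i][c] for i in subset) % 2 for c in range(m))
            if total in seen:
                # Two distinct subsets with equal sums: over F_2 the rows
                # indexed by their symmetric difference sum to zero.
                return sorted(set(subset) ^ set(seen[total]))
            seen[total] = subset
    raise AssertionError("unreachable when n > m")

print(zero_sum_rows([(1, 0), (0, 1), (1, 1)]))  # [0, 1, 2]
```

Since there are $2^{n}$ subsets but only $2^{m}<2^{n}$ possible row-sums, a collision is guaranteed before the enumeration ends.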

Pseudo-Affine Decompositions

At a high level, we next show that if $E$ is an arbitrary product event (with sufficient probability mass) then $\tilde{P}$ has a “pseudo-affine decomposition”. That is, there is a partition $\Pi$ of $({\mathbb{F}}_{2}^{n})^{3}$ into affine subspaces such that if ${\cal W}$ is a random part of $\Pi$ (as weighted by $\tilde{P}$ ), then any strategy for $\tilde{P}|{\cal W}$ can be extended to a strategy for $P|{\cal W}$ that is similarly successful in expectation.

To construct $\Pi$ , we prove the following sufficient conditions for $\Pi$ to be a pseudo-affine decomposition:

• When ${\cal W}$ is a random part of $\Pi$ (as weighted by $\tilde{P}$ ), the distributions $\tilde{P}|{\cal W}$ and $P|{\cal W}$ are indistinguishable to all sufficiently compressing linear distinguishers. That is, if ${\cal W}$ is an affine shift of ${\cal V}^{3}$ , then for all subspaces ${\cal U}\leq{\cal V}$ of sufficiently small co-dimension, the distributions $\tilde{P}|{\cal W}$ and $P|{\cal W}$ are statistically close modulo ${\cal U}^{3}$ .
• Each part ${\cal W}$ of $\Pi$ is in fact an affine shift of a product space ${\cal V}^{3}$ for some linear space ${\cal V}$ .

We construct $\Pi$ satisfying these conditions iteratively. Starting with the singleton partition, as long as a random part ${\cal W}$ of $\Pi$ has some subspace ${\cal U}$ for which $\tilde{P}|{\cal W}$ and $P|{\cal W}$ are distributed differently mod ${\cal U}^{3}$ , we replace each part ${\cal W}$ of $\Pi$ by all the affine shifts of ${\cal U}^{3}$ in ${\cal W}$ . We show that this process cannot be repeated too many times when $E$ has sufficient density.

Pseudorandomness Preserves Hardness

The high-level reason these conditions suffice is that, for any strategy $f=f_{1}\times f_{2}\times f_{3}$ , they enable us to refine $\Pi$ to a partition $\Pi^{\prime}_{f}$ such that when $X$ is sampled from $\tilde{P}|{\cal W}^{\prime}$ for a random part ${\cal W}^{\prime}$ in $\Pi^{\prime}_{f}$ , the distribution of $f(X)$ is as if $X$ were sampled uniformly from ${\cal W}^{\prime}\cap E$ (i.e. with $X_{1}$ , $X_{2}$ , and $X_{3}$ mutually independent). Moreover, when we construct $\Pi^{\prime}_{f}$ we partition each part ${\cal W}$ of $\Pi$ into all affine shifts of some linear space ${\cal U}^{3}$ where the codimension of ${\cal U}^{3}$ in ${\cal W}$ is not too large. Thus the strategy $f$ on $\tilde{P}|{\cal W}$ effectively has the players acting as independent (randomized) functions of their inputs modulo ${\cal U}$ . Such strategies generalize to $P|{\cal W}$ by the first property of pseudo-affine decompositions stated above.

To construct $\Pi^{\prime}_{f}$ , we ensure that $f_{1}$ is uncorrelated with every affine function on $\tilde{P}|{\cal W}^{\prime}$ when ${\cal W}^{\prime}$ is a random part of $\Pi^{\prime}_{f}$ , and then prove the desired independence by Fourier analysis. We construct $\Pi^{\prime}_{f}$ by iterative refinement of $\Pi$ . Start by considering a random part ${\cal W}$ of $\Pi$ . Whenever $f_{1}(X_{1})$ is correlated with an affine ${\mathbb{F}}_{2}$ -valued function $\chi$ , replace ${\cal W}$ in $\Pi$ by ${\cal W}\cap\chi^{-1}(0)$ and ${\cal W}\cap\chi^{-1}(1)$ , and do this in parallel for all parts of $\Pi$ . We show that this cannot be repeated too many times, and thus we quickly arrive at our desired $\Pi^{\prime}_{f}$ .

3 Preliminaries

In this section we describe some preliminary definitions that are somewhat specific to this work. More standard preliminaries are given in Appendices A and B.

3.1 Set Theory

Definition 3.1.

For any set $S$ , a partition of $S$ is a pairwise disjoint set of subsets of $S$ , whose union is all of $S$ .

If $\Pi$ is a partition of $S$ and $x$ is an element of $S$ , we write $\Pi(x)$ to denote the (unique) element of $\Pi$ that contains $x$ .

3.2 Linear Algebra

If $U$ is a linear subspace of $V$ , we write $U\leq V$ rather than $U\subseteq V$ to emphasize that $U$ is a subspace rather than an unstructured subset.

We crucially rely on the Cauchy-Schwarz inequality:

Definition 3.2 (Inner Product Space).

A real inner product space is a vector space $V$ over ${\mathbb{R}}$ together with an operation $\langle\cdot,\cdot\rangle:V\times V\to{\mathbb{R}}$ satisfying the following axioms for all $x,y,z\in V$ :

• Symmetry: $\langle x,y\rangle=\langle y,x\rangle$ .
• Linearity in the first argument: $\langle ax+by,z\rangle=a\langle x,z\rangle+b\langle y,z\rangle$ . (Because of symmetry, this also implies linearity in the second argument, i.e., bilinearity.)
• Positive Definiteness: $\langle x,x\rangle>0$ if $x\neq 0$ .

Theorem 3.3 (Cauchy-Schwarz).

In any inner product space, it holds for all vectors $u$ and $v$ that $|\langle u,v\rangle|^{2}\leq\langle u,u\rangle\cdot\langle v,v\rangle$ .

3.3 Multi-Player Games

In parallel repetition we often work with Cartesian product sets of the form ${\cal X}=({\cal X}_{1}\times\cdots\times{\cal X}_{k})^{n}$ . For these sets, we will use superscripts to index the outer product and subscripts to index the inner product. That is, we view elements $x$ of ${\cal X}$ as tuples $(x^{1},\ldots,x^{n})$ , where the $i^{th}$ component of $x^{j}$ is $x^{j}_{i}$ . We will also write $x_{i}$ to denote the vector $(x^{1}_{i},\ldots,x^{n}_{i})$ . If $\{E_{i}\subseteq{\cal X}_{i}\}_{i\in[k]}$ is a collection of subsets indexed by subscripts, we write $E_{1}\times\cdots\times E_{k}$ or $\prod_{i\in[k]}E_{i}$ to denote the set $\{x\in{\cal X}:\,\forall i\in[k],x_{i}\in E_{i}\}$ . Similarly, if ${\cal Y}$ is a product set $({\cal Y}_{1}\times\cdots\times{\cal Y}_{k})^{m}$ , we say $f:{\cal X}\to{\cal Y}$ is a product function $f_{1}\times\cdots\times f_{k}$ if $f(x)=y$ for $y_{i}=f_{i}(x_{i})$ .

Definition 3.4 (Multi-player Games).

A $k$ -player game is a tuple $({\cal X},{\cal Y},P,W)$ , where ${\cal X}={\cal X}_{1}\times\cdots\times{\cal X}_{k}$ and ${\cal Y}={\cal Y}_{1}\times\cdots\times{\cal Y}_{k}$ are finite sets, $P$ is a probability measure on ${\cal X}$ , and $W:{\cal X}\times{\cal Y}\to\{0,1\}$ is a “winning” predicate.

Definition 3.5 (Parallel Repetition).

Given a $k$ -player game ${\cal G}=({\cal X},{\cal Y},Q,W)$ , its $n$ -fold parallel repetition, denoted ${\cal G}^{n}$ , is defined as the $k$ -player game $({\cal X}^{n},{\cal Y}^{n},Q^{n},W^{n})$ , where $W^{n}(x,y)\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\bigwedge_{j=1}^{n}W(x^{j},y^{j})$ .
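As a sanity check on this definition, the following Python sketch builds $Q^{n}$ and $W^{n}$ from a base game and evaluates one simple strategy for the $2$-fold GHZ game. The helper `repeat_game` and the dictionary encoding of $Q$ are our own illustrative choices, not notation from this paper.

```python
from fractions import Fraction
from itertools import product

def repeat_game(queries, predicate, n):
    """Return (Q^n as a dict, W^n) for a base game.

    `queries` maps each base query tuple to its probability under Q.
    """
    dist = {}
    for xs in product(queries, repeat=n):
        p = Fraction(1)
        for x in xs:
            p *= queries[x]   # coordinates are sampled independently
        dist[xs] = p

    def predicate_n(xs, ys):
        # W^n(x, y) is the AND over coordinates j of W(x^j, y^j).
        return all(predicate(x, y) for x, y in zip(xs, ys))

    return dist, predicate_n

# Base GHZ game: uniform queries with x1 + x2 + x3 = 0 over F_2.
ghz_q = {x: Fraction(1, 4) for x in product((0, 1), repeat=3) if sum(x) % 2 == 0}
ghz_w = lambda x, y: (y[0] ^ y[1] ^ y[2]) == (x[0] | x[1] | x[2])

dist, w2 = repeat_game(ghz_q, ghz_w, 2)
# Every player answers 1 in both coordinates, ignoring the questions.
p_win = sum(p for xs, p in dist.items() if w2(xs, [(1, 1, 1)] * 2))
print(p_win)  # 9/16
```

This strategy plays the two coordinates independently and so wins with probability $(3/4)^{2}$ ; the value of the repeated game is a maximum over all strategies, including those that correlate coordinates, and may be larger.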

Definition 3.6.

The success probability of a function $f=f_{1}\times\cdots\times f_{k}:{\cal X}\to{\cal Y}$ in a $k$ -player game ${\cal G}=({\cal X},{\cal Y},Q,W)$ is

v[f]({\cal G})\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\Pr_{x\leftarrow Q}\Big{[}W\big{(}x,f(x)\big{)}=1\Big{]}.

Definition 3.7.

The value of a $k$ -player game ${\cal G}=({\cal X},{\cal Y},Q,W)$ , denoted $v({\cal G})$ , is the maximum, over all functions $f=f_{1}\times\cdots\times f_{k}:{\cal X}\to{\cal Y}$ , of $v[f]({\cal G})$ .

Fact 3.8.

Randomized strategies are no better than deterministic strategies.

Definition 3.9 (Value in $j^{th}$ coordinate).

If ${\cal G}=({\cal X},{\cal Y},Q,W^{n})$ is a game (with a product winning predicate), the value of ${\cal G}$ in the $j^{th}$ coordinate, denoted $v^{j}({\cal G})$ , is the value of the game $({\cal X},{\cal Y},Q,W^{\prime})$ , where $W^{\prime}(x,y)=W(x^{j},y^{j})$ .

Definition 3.10 (Game with Modified Query Distribution).

If ${\cal G}=({\cal X},{\cal Y},Q,W)$ is a game, and $P$ is a probability measure on ${\cal X}$ , we write ${\cal G}|P$ to denote the game $({\cal X},{\cal Y},P,W)$ .

4 Key Lemmas

In this section, we give some Fourier-analytic conditions (see Appendix B for the basics of Fourier analysis) that imply independence of random variables under the (parallel repeated) GHZ query distribution.

It will be convenient for us to work with probability distributions in terms of their densities (see Appendix A for basic probability definitions and notation).

Definition 4.1 (Probability Densities).

If $P:\Omega\to{\mathbb{R}}$ is a probability distribution with support contained in $A$ , then the density of $P$ in $A$ is

	$\displaystyle\varphi:A$	$\displaystyle\to{\mathbb{R}}$
	$\displaystyle x$	$\displaystyle\mapsto|A|\cdot P(x).$

If $A$ is unspecified, then by default it is taken to be $\Omega$ .

Lemma 4.2.

Let ${\cal V}$ be a (finite) vector space over ${\mathbb{F}}_{2}$ , let $P$ be uniform on $\{x\in{\cal V}^{3}:x_{1}+x_{2}+x_{3}=0\}$ , and let $U$ be uniform on ${\cal V}^{3}$ .

For any subset $E=E_{1}\times E_{2}\times E_{3}$ of ${\cal V}^{3}$ ,

P(E)=\sum_{\chi\in\hat{{\cal V}}}\prod_{i\in[3]}\hat{1}_{E_{i}}(\chi)=U(E)\cdot\sum_{\chi\in\hat{{\cal V}}}\prod_{i\in[3]}\hat{\varphi}_{E_{i}}(\chi),

where $\varphi_{E_{i}}$ denotes the density in ${\cal V}$ of the uniform distribution on $E_{i}$ .

Proof.

Let $\varphi_{P}$ denote the density in ${\cal V}^{3}$ of $P$ . That is,

\varphi_{P}(x_{1},x_{2},x_{3})=\begin{cases}|{\cal V}|&\text{if $x_{1}+x_{2}+x_{3}=0$}\\ 0&\text{otherwise.}\end{cases}

Then

	$\displaystyle P(E)$	$\displaystyle=\mathop{\mathbb{E}}_{x\leftarrow{\cal V}^{3}}\left[\varphi_{P}(x)\cdot 1_{E}(x)\right]$
		$\displaystyle=\sum_{\chi\in\widehat{{\cal V}^{3}}}\hat{\varphi}_{P}(\chi)\cdot\hat{1}_{E}(\chi).$	(Plancherel)		(1)

We now compute $\hat{\varphi}_{P}(\chi)$ and $\hat{1}_{E}(\chi)$ . We start by noting that the dual space $\widehat{{\cal V}^{3}}$ is isomorphic to $\hat{{\cal V}}^{3}$ . That is, each character $\chi\in\widehat{{\cal V}^{3}}$ is of the form $\chi(x_{1},x_{2},x_{3})=\chi_{1}(x_{1})\chi_{2}(x_{2})\chi_{3}(x_{3})$ for some (uniquely determined) $\chi_{1},\chi_{2},\chi_{3}\in\hat{{\cal V}}$ and conversely, each choice of $\chi_{1},\chi_{2},\chi_{3}\in\hat{{\cal V}}$ gives rise to some $\chi\in\widehat{{\cal V}^{3}}$ .

The Fourier transform of $\varphi_{P}$ is given by

\hat{\varphi}_{P}(\chi_{1},\chi_{2},\chi_{3})=\begin{cases}1&\text{if $\chi_{1}=\chi_{2}=\chi_{3}$}\\ 0&\text{otherwise.}\end{cases}\tag{2}

Since $E$ is a product event, the Fourier transform of $1_{E}:{\cal V}^{3}\to\{0,1\}$ is given by

	$\displaystyle\hat{1}_{E}(\chi_{1},\chi_{2},\chi_{3})$	$\displaystyle=\prod_{i\in[3]}\hat{1}_{E_{i}}(\chi_{i})$
		$\displaystyle=U(E)\cdot\prod_{i\in[3]}\hat{\varphi}_{E_{i}}(\chi_{i}).$		(3)

Substituting Eqs. 2 and 3 into Eq. 1 concludes the proof of the lemma. ∎
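The identity of Lemma 4.2 can be checked numerically on a small example. The Python sketch below takes ${\cal V}={\mathbb{F}}_{2}^{2}$ (encoded as bitmasks) and assumes the normalization $\hat{f}(\chi)=\mathop{\mathbb{E}}_{x}[f(x)\chi(x)]$ , which we believe matches Appendix B but have introduced here as an assumption; all function names and the particular sets $E_{i}$ are ours.

```python
from fractions import Fraction
from itertools import product

N_BITS = 2
V = range(2 ** N_BITS)          # F_2^n as bitmasks; addition is XOR

def chi(a, x):
    """Character of F_2^n indexed by a: x -> (-1)^<a, x>."""
    return -1 if bin(a & x).count("1") % 2 else 1

def fourier(f, a):
    """Assumed normalization: hat f(a) = E_{x in V}[f(x) * chi_a(x)]."""
    return Fraction(sum(f(x) * chi(a, x) for x in V), len(V))

def p_of(E):
    """P(E) for P uniform on {x in V^3 : x1 + x2 + x3 = 0}, E = E1 x E2 x E3."""
    hits = sum(1 for x1, x2 in product(E[0], E[1]) if (x1 ^ x2) in E[2])
    return Fraction(hits, len(V) ** 2)

def fourier_side(E):
    """Sum over characters of the product of the three Fourier coefficients."""
    total = Fraction(0)
    for a in V:
        term = Fraction(1)
        for Ei in E:
            term *= fourier(lambda x, Ei=Ei: int(x in Ei), a)
        total += term
    return total

E = ({0, 1, 3}, {1, 2}, {0, 2, 3})
print(p_of(E), fourier_side(E))  # the two sides of Lemma 4.2 agree
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to floating-point error.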

Corollary 4.3.

With ${\cal V}$ , $P$ , $E$ , and $U$ as in Lemma 4.2,

\left|P(E)-U(E)\right|\leq\sum_{\chi\in\hat{{\cal V}}\setminus\{1\}}\prod_{i\in[3]}\big{|}\hat{1}_{E_{i}}(\chi)\big{|},

where $1\in\hat{{\cal V}}$ denotes the trivial character.

Proof.

For any probability density function $\varphi$ , we have $\hat{\varphi}(1)=1$ , so

	$\displaystyle\left|P(E)-U(E)\right|$	$\displaystyle=U(E)\cdot\left|\frac{P(E)}{U(E)}-1\right|$
		$\displaystyle\leq U(E)\cdot\left|\sum_{\chi\in\hat{{\cal V}}\setminus\{1\}}\prod_{i\in[3]}\hat{\varphi}_{E_{i}}(\chi)\right|$
		$\displaystyle\leq\sum_{\chi\in\hat{{\cal V}}\setminus\{1\}}\prod_{i\in[3]}\left|\hat{1}_{E_{i}}(\chi)\right|.$ ∎

Lemma 4.4.

Let ${\cal V}$ be a (finite) vector space over ${\mathbb{F}}_{2}$ , let $P$ be uniform on $\{x\in{\cal V}^{3}:x_{1}+x_{2}+x_{3}=0\}$ , let $U$ be uniform on ${\cal V}^{3}$ , and let $X=(X_{1},X_{2},X_{3})$ denote the identity random variable on ${\cal V}^{3}$ (specifically, with the formalism of random variables as functions on a sample space, we mean that $X$ is the identity function, mapping $(x_{1},x_{2},x_{3})$ to $(x_{1},x_{2},x_{3})$ ). Let $Y_{i}=Y_{i}(X_{i})$ be a ${\cal Y}_{i}$ -valued random variable for each $i\in[3]$ , let $Y=(Y_{1},Y_{2},Y_{3})$ , and let ${\cal Y}={\cal Y}_{1}\times{\cal Y}_{2}\times{\cal Y}_{3}$ .

Let ${\cal W}$ be a subspace of ${\cal V}$ . If for all $\chi\in\hat{{\cal W}}$ ,

\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\left[d_{\mathsf{TV}}\big{(}P_{\chi(X_{1})|X\in x+{\cal W}^{3},Y_{1}=y_{1}},U_{\chi(X_{1})|X\in x+{\cal W}^{3}}\big{)}\right]\leq\epsilon,\tag{4}

then

\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[d_{\mathsf{TV}}\big{(}P_{Y|X\in x+{\cal W}^{3}},U_{Y|X\in x+{\cal W}^{3}}\big{)}\right]\leq\epsilon\cdot\sqrt{|{\cal Y}_{2}|\cdot|{\cal Y}_{3}|}.

Proof.

For $x\in{\cal V}^{3}$ , we will write $\bar{x}$ to denote the set $x+{\cal W}^{3}$ . Recall that ${\cal V}/{\cal W}$ denotes the set of all cosets $\{x+{\cal W}\}_{x\in{\cal V}}$ . For every $i\in[3]$ , every $\bar{x}_{i}\in{\cal V}/{\cal W}$ , and every $y_{i}\in{\cal Y}_{i}$ , define $1_{i,\bar{x}_{i},y_{i}}:\bar{x}_{i}\to\{0,1\}$ to be the indicator for the set $Y_{i}^{-1}(y_{i})\cap\bar{x}_{i}$ . Define $\varphi_{i,\bar{x}_{i},y_{i}}$ to be the density (in $\bar{x}_{i}$ ) of the uniform distribution on $Y_{i}^{-1}(y_{i})\cap\bar{x}_{i}$ . That is,

\begin{array}[]{l}\varphi_{i,\bar{x}_{i},y_{i}}:\bar{x}_{i}\to{\mathbb{R}}\\ \varphi_{i,\bar{x}_{i},y_{i}}(x^{\prime}_{i})=\begin{cases}\frac{|\bar{x}_{i}|}{|Y_{i}^{-1}(y_{i})\cap\bar{x}_{i}|}&\text{if $Y_{i}(x^{\prime}_{i})=y_{i}$}\\ 0&\text{otherwise.}\end{cases}\end{array}

$\varphi_{i,\bar{x}_{i},y_{i}}$ is easily seen to be related to $1_{i,\bar{x}_{i},y_{i}}$ as

1_{i,\bar{x}_{i},y_{i}}=P_{Y_{i}|\bar{X}_{i}=\bar{x}_{i}}(y_{i})\cdot\varphi_{i,\bar{x}_{i},y_{i}}.

With this notation, our assumption that Eq. 4 holds (for all $\chi\in\hat{{\cal W}}$ ) is equivalent to assuming that for all $\chi\in\hat{{\cal W}}\setminus\{1\}$ ,

\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\left[\big{|}\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)\big{|}\right]\leq 2\epsilon.\tag{5}

This is because for all $\chi\in\hat{{\cal W}}\setminus\{1\}$ , the distribution $U_{\chi(X_{1})|X\in x+{\cal W}^{3}}$ is uniform on $\{\pm 1\}$ .
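The factor of $2$ relating Eq. 4 to Eq. 5 comes from a standard fact about $\{\pm 1\}$ -valued random variables, spelled out below as a sketch. The symbols $Z$ , $p$ , and $U_{\pm}$ are introduced only for this illustration.

```latex
% Z takes values in \{\pm 1\} with \Pr[Z=1]=p; U_{\pm} is uniform on \{\pm 1\}.
d_{\mathsf{TV}}(Z,U_{\pm})
  = \tfrac{1}{2}\Big(\big|p-\tfrac{1}{2}\big|+\big|(1-p)-\tfrac{1}{2}\big|\Big)
  = \big|p-\tfrac{1}{2}\big|
  = \tfrac{1}{2}\,\big|\mathbb{E}[Z]\big|,
\qquad\text{since } \mathbb{E}[Z]=2p-1.
```

Taking $Z=\chi(X_{1})$ conditioned on $X\in x+{\cal W}^{3}$ and $Y_{1}=y_{1}$ , and assuming the Fourier normalization under which $\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)$ is the expectation of $\chi$ under the uniform distribution on $Y_{1}^{-1}(y_{1})\cap\bar{x}_{1}$ , the total variation distance in Eq. 4 equals $\frac{1}{2}\big|\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)\big|$ .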

In general for $x\in\mathrm{Supp}(P_{X})$ , we have (by Corollary 4.3) that for any $y\in{\cal Y}$ ,

\left|P_{Y|X\in x+{\cal W}^{3}}(y)-U_{Y|X\in x+{\cal W}^{3}}(y)\right|\leq\sum_{\chi\in\hat{{\cal W}}\setminus\{1\}}\prod_{i\in[3]}\big{|}\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)\big{|}\tag{6}

because:

• the event $E=\{Y=y\}$ is a product event $E_{1}\times E_{2}\times E_{3}$ , where each $E_{i}=\{Y_{i}=y_{i}\}$ depends only on $X_{i}$ or equivalently on $X_{i}-x_{i}$ ,
• the distribution $P_{X-x|\bar{X}=\bar{x}}$ is uniform on $\{(\mathbf{w}_{1},\mathbf{w}_{2},\mathbf{w}_{3})\in{\cal W}^{3}:\mathbf{w}_{1}+\mathbf{w}_{2}+\mathbf{w}_{3}=0\}$ , and
• the distribution $U_{X-x|\bar{X}=\bar{x}}$ is uniform on $\{(\mathbf{w}_{1},\mathbf{w}_{2},\mathbf{w}_{3})\in{\cal W}^{3}\}$ .

Thus we have

	$\displaystyle 2\cdot\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[d_{\mathsf{TV}}\big{(}P_{Y|X\in x+{\cal W}^{3}},U_{Y|X\in x+{\cal W}^{3}}\big{)}\right]$	$\displaystyle=\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\left|P_{Y|X\in x+{\cal W}^{3}}(y)-U_{Y|X\in x+{\cal W}^{3}}(y)\right|$
		$\displaystyle\leq\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\sum_{\chi\neq 1}\prod_{i\in[3]}\left|\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)\right|$
		$\displaystyle=\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\sum_{\chi\neq 1}\prod_{i\in\{2,3\}}\sqrt{\left|\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\right|\cdot\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}}.$

Now, we apply Cauchy-Schwarz on the inner product space whose elements are real-valued functions of $(x,y,\chi)$ , and where the inner product is defined by $\langle f,g\rangle\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\sum_{\chi\neq 1}f(x,y,\chi)\cdot g(x,y,\chi)$ . This bounds the above by

		$\displaystyle\sqrt{\prod_{i\in\{2,3\}}\left(\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\sum_{y\in{\cal Y}}\sum_{\chi\neq 1}\big{|}\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\big{|}\cdot\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right)}$
	$\displaystyle=$	$\displaystyle\sqrt{\prod_{i\in\{2,3\}}\left(\sum_{\chi\neq 1}\sum_{y\in{\cal Y}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\Big{[}\big{|}\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\big{|}\cdot\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\Big{]}\right)}.$

By the independence of $(X_{1},Y_{1})$ and $(X_{i},Y_{i})$ under $P$ for $i\in\{2,3\}$ , this is equal to

		$\displaystyle\prod_{i\in\{2,3\}}\sqrt{\sum_{\chi\neq 1}\sum_{y\in{\cal Y}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\Big{[}\big{|}\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\big{|}\Big{]}\cdot\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]}$
	$\displaystyle=$	$\displaystyle\prod_{i\in\{2,3\}}\sqrt{|{\cal Y}_{5-i}|\cdot\sum_{\chi\neq 1}\left(\sum_{y_{1}\in{\cal Y}_{1}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\Big{[}\big{|}\hat{1}_{1,\bar{x}_{1},y_{1}}(\chi)\big{|}\Big{]}\right)\cdot\left(\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]\right)}.$

But the function $1_{1,\bar{x}_{1},y_{1}}$ is just $P_{Y_{1}|\bar{X}_{1}=\bar{x}_{1}}(y_{1})\cdot\varphi_{1,\bar{x}_{1},y_{1}}$ , so the above is

	$\displaystyle\prod_{i\in\{2,3\}}\sqrt{|{\cal Y}_{5-i}|\cdot\sum_{\chi\neq 1}\left(\sum_{y_{1}\in{\cal Y}_{1}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[P_{Y_{1}|\bar{X}_{1}=\bar{x}_{1}}(y_{1})\cdot\big{|}\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)\big{|}\right]\right)\cdot\left(\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]\right)}$
	$\displaystyle=\prod_{i\in\{2,3\}}\sqrt{|{\cal Y}_{i}|\cdot\sum_{\chi\neq 1}\left(\sum_{y_{1}\in{\cal Y}_{1}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[P_{Y_{1}|\bar{X}_{1}=\bar{x}_{1}}(y_{1})\cdot\big{|}\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)\big{|}\right]\right)\cdot\left(\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]\right)}$

which by the definition of expectation is

\prod_{i\in\{2,3\}}\sqrt{|{\cal Y}_{i}|\cdot\sum_{\chi\neq 1}\left(\mathop{\mathbb{E}}_{x,y\leftarrow P_{X,Y}}\left[\big{|}\hat{\varphi}_{1,\bar{x}_{1},y_{1}}(\chi)\big{|}\right]\right)\cdot\left(\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]\right)}.

We use Eq. 5 to bound this by

	$\displaystyle\prod_{i\in\{2,3\}}\sqrt{2\epsilon|{\cal Y}_{i}|\cdot\sum_{\chi\neq 1}\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\hat{1}_{i,\bar{x}_{i},y_{i}}(\chi)^{2}\right]}$
	$\displaystyle\leq\prod_{i\in\{2,3\}}\sqrt{2\epsilon|{\cal Y}_{i}|\cdot\sum_{y_{i}\in{\cal Y}_{i}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\mathop{\mathbb{E}}_{x^{\prime}\leftarrow\bar{x}_{i}}\left[1_{i,\bar{x}_{i},y_{i}}(x^{\prime})^{2}\right]\right]}$	(Parseval’s Theorem)
	$\displaystyle=\prod_{i\in\{2,3\}}\sqrt{2\epsilon|{\cal Y}_{i}|\cdot\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[\mathop{\mathbb{E}}_{x^{\prime}\leftarrow\bar{x}_{i}}\left[\sum_{y_{i}\in{\cal Y}_{i}}1_{i,\bar{x}_{i},y_{i}}(x^{\prime})^{2}\right]\right]}.$

But for $y_{i}\neq y^{\prime}_{i}$ , the supports of $1_{i,\bar{x}_{i},y_{i}}$ and $1_{i,\bar{x}_{i},y^{\prime}_{i}}$ are disjoint, so this is at most $2\epsilon\sqrt{|{\cal Y}_{2}|\cdot|{\cal Y}_{3}|}$ . ∎
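For completeness, the last step of the proof of Lemma 4.4 can be written out as follows; this is only a restatement of the chain of inequalities in that proof, with no new ingredients.

```latex
% Each x' lies in the support of at most one indicator 1_{i,\bar{x}_i,y_i}.
\sum_{y_{i}\in{\cal Y}_{i}}1_{i,\bar{x}_{i},y_{i}}(x^{\prime})^{2}\leq 1
\quad\Longrightarrow\quad
2\cdot\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\Big[d_{\mathsf{TV}}\big(P_{Y|X\in x+{\cal W}^{3}},U_{Y|X\in x+{\cal W}^{3}}\big)\Big]
\leq\prod_{i\in\{2,3\}}\sqrt{2\epsilon\,|{\cal Y}_{i}|}
=2\epsilon\sqrt{|{\cal Y}_{2}|\cdot|{\cal Y}_{3}|}.
```

Dividing both sides by $2$ gives the bound $\epsilon\cdot\sqrt{|{\cal Y}_{2}|\cdot|{\cal Y}_{3}|}$ claimed in the lemma.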

5 Local Embeddability in Affine Subspaces

In this section we show that the parallel repeated GHZ query distribution has many coordinates in which the GHZ query distribution can be locally embedded, even conditioned on any affine event of low co-dimension. We first recall the notion of a local embedding.

Definition 5.1.

Let $\Sigma$ be a finite set, let $k$ and $n$ be positive integers, let $Q$ be a probability distribution on $\Sigma^{k}$ , and let $\tilde{P}$ be a probability distribution on $\Sigma^{k\times n}$ .

We say that $Q$ is locally embeddable in the $j^{th}$ coordinate of $\tilde{P}$ if there exists a probability distribution $R$ on a set $\mathcal{R}$ and functions $e_{1},\ldots,e_{k}:\Sigma\times\mathcal{R}\to\Sigma^{n}$ such that when sampling $q\leftarrow Q$ , $r\leftarrow R$ , if $\tilde{X}$ denotes the random variable

\tilde{X}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\left(\begin{array}[]{c}e_{1}(q_{1},r)^{\top}\\ \vdots\\ e_{k}(q_{k},r)^{\top}\end{array}\right),

then:

1. The probability law of $\tilde{X}$ is exactly $\tilde{P}$ .
2. It holds with probability $1$ that $\tilde{X}^{j}=q$ .
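To illustrate Definition 5.1, the following Python sketch checks both conditions by brute force on a toy affine event for the GHZ query distribution. Everything specific here is a hypothetical choice of ours: $n=3$ , a single linear constraint with column $A=(1,1,0)^{\top}$ , right-hand sides `B`, the zero-sum row set `S`, and the embedded coordinate `J`. The shared randomness consists of the differences $x^{i}-x^{J}$ for $i\in S$ and the raw questions for $i\notin S$ .

```python
from collections import Counter
from fractions import Fraction
from itertools import product

Q = [q for q in product((0, 1), repeat=3) if sum(q) % 2 == 0]  # GHZ queries

# Hypothetical toy instance: n = 3 repetitions, one linear constraint (m = 1).
N = 3
A = [1, 1, 0]        # single column of A; rows 0 and 1 sum to zero over F_2
B = (1, 0, 1)        # B[k] is the required value of x_k . A for player k
S, J = [0, 1], 0     # zero-sum row set S and the embedded coordinate J in S

def in_event(x):
    """x[i][k] is player k's question in repetition i; test x_k . A = B[k]."""
    return all(sum(x[i][k] * A[i] for i in range(N)) % 2 == B[k] for k in range(3))

SUPPORT = [x for x in product(Q, repeat=N) if in_event(x)]   # support of P~

def shared(x):
    """Shared randomness: differences x^i - x^J on S, raw questions off S."""
    return tuple(
        tuple(a ^ b for a, b in zip(x[i], x[J])) if i in S else x[i]
        for i in range(N)
    )

def embed(k, q_k, r):
    """Player k's local map e_k: rebuild its n questions from q_k and r."""
    return tuple(r[i][k] ^ q_k if i in S else r[i][k] for i in range(N))

def embedded_law():
    """Law of X~ when q is drawn from Q and r from its marginal under P~."""
    law = Counter()
    for q, x in product(Q, SUPPORT):
        r = shared(x)
        rows = [embed(k, q[k], r) for k in range(3)]      # one row per player
        x_tilde = tuple(tuple(rows[k][i] for k in range(3)) for i in range(N))
        assert x_tilde[J] == q                            # condition 2
        law[x_tilde] += Fraction(1, len(Q) * len(SUPPORT))
    return law
```

Calling `embedded_law()` and comparing the result with the uniform distribution on `SUPPORT` checks condition 1; note that `embed` reads only player $k$ 's own question $q_{k}$ and the shared value `r`, as a local embedding requires.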

Proposition 5.2.

Let $n$ and $m$ be positive integers with $m<n$ . Let $Q$ denote the GHZ query distribution (uniform on the set ${\cal Q}=\{x\in{\mathbb{F}}_{2}^{3}:x_{1}+x_{2}+x_{3}=0\}$ ), and let ${\cal W}$ be an affine shift of ${\cal V}^{3}$ for a subspace ${\cal V}\leq{\mathbb{F}}_{2}^{1\times n}$ of codimension $m$ with $Q^{n}({\cal W})>0$ .

Then there exist at least $n-m$ distinct values of $j\in[n]$ for which $Q$ is locally embeddable in the $j^{th}$ coordinate of $\tilde{P}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}Q^{n}|{\cal W}$ .

Proof.

Suppose otherwise. Without loss of generality, suppose that the coordinates that are not locally embeddable include the first $n^{\prime}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}m+1$ coordinates (otherwise, the coordinates can be permuted to make this so). That is, for each $j\in[n^{\prime}]$ , $Q$ is not locally embeddable in the $j^{th}$ coordinate of $\tilde{P}$ .

Let the defining equations for ${\cal V}$ be written as

{\cal V}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\left\{x\in{\mathbb{F}}_{2}^{1\times n}:x\cdot A=0\right\}

for some choice of $A\in{\mathbb{F}}_{2}^{n\times m}$ , and let $\mathbf{v}\in{\mathbb{F}}_{2}^{3\times n}$ be such that ${\cal W}=\mathbf{v}+{\cal V}^{3}$ .

Because $2^{n^{\prime}}>2^{m}$ , the pigeonhole principle implies that there exist two distinct sets $S_{0},S_{1}\subseteq[n^{\prime}]$ such that

\sum_{j\in S_{0}}A_{j}=\sum_{j\in S_{1}}A_{j},

where we recall that $A_{j}$ denotes the $j^{th}$ row of $A$ . Thus, there is a non-empty subset $S\stackrel{{\scriptstyle\mathsf{def}}}{{=}}S_{0}\Delta S_{1}\subseteq[n^{\prime}]$ such that

\sum_{j\in S}A_{j}=0.

(7)

Fix some such $S$ . We will show that for any $j\in S$ , $Q$ is locally embeddable in the $j^{th}$ coordinate of $\tilde{P}$ , which is a contradiction. Let $X$ denote the ${\mathbb{F}}_{2}^{3\times n}$ -valued random variable given by the identity function.
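As an illustrative aside (not part of the original argument), the pigeonhole step can be carried out by brute force when $m$ is small. The following Python sketch assumes that each row $A_{j}$ is encoded as an integer bitmask and that indices are 0-based; the helper name `find_zero_sum_subset` is hypothetical.

```python
from itertools import combinations


def find_zero_sum_subset(rows, m):
    """Return a non-empty S inside the first m+1 row indices whose rows XOR to 0.

    rows: list of ints; rows[j] encodes the row A_j of A as an m-bit mask.
    """
    n_prime = m + 1
    seen = {}  # maps a subset sum (as a bitmask) to the subset that produced it
    for size in range(n_prime + 1):
        for subset in combinations(range(n_prime), size):
            acc = 0
            for j in subset:
                acc ^= rows[j]  # addition over F_2 is XOR
            if acc in seen:
                # Two distinct subsets with equal sums: their symmetric
                # difference is non-empty and its rows sum to zero.
                return sorted(set(subset) ^ set(seen[acc]))
            seen[acc] = subset
    raise AssertionError("unreachable: 2^(m+1) subsets but only 2^m possible sums")


# Example with m = 2 (rows are 2-bit masks); only the first m + 1 rows are used.
example_rows = [0b01, 0b10, 0b11, 0b01]
example_S = find_zero_sum_subset(example_rows, m=2)
```

The enumeration is exponential in $m$ and is meant only to make the counting argument concrete; Gaussian elimination over ${\mathbb{F}}_{2}$ would find such an $S$ efficiently.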

Claim 5.3.

For any $j\in S$ , the distribution $\tilde{P}_{X^{j}}$ is identical to $Q$ (i.e., uniformly random on ${\cal Q}$ ).

Proof.

Let $j\in S$ be given. It suffices to show that for every $q,q^{\prime}\in{\cal Q}$ , there is a bijection $\Phi_{q,q^{\prime}}:{\cal Q}^{n}\cap{\cal W}\to{\cal Q}^{n}\cap{\cal W}$ such that $x\in{\cal Q}^{n}\cap{\cal W}$ satisfies $x^{j}=q$ if and only if $y\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\Phi_{q,q^{\prime}}(x)$ satisfies $y^{j}=q^{\prime}$ . Such a bijection $\Phi_{q,q^{\prime}}$ can be constructed by defining, for all $j^{\prime}\in[n]$ ,

\Phi_{q,q^{\prime}}(x)^{j^{\prime}}=\begin{cases}x^{j^{\prime}}+q^{\prime}-q&\text{if $j^{\prime}\in S$}\\ x^{j^{\prime}}&\text{otherwise.}\end{cases}

$\Phi_{q,q^{\prime}}$ clearly is an injective map from ${\cal Q}^{n}$ to ${\cal Q}^{n}$ and satisfies $\Phi_{q,q^{\prime}}(x)^{j}=x^{j}+q^{\prime}-q$ , so the only remaining thing to check is that it indeed maps ${\cal W}$ into ${\cal W}$ . This is true because it preserves $x\cdot A$ . Indeed, for any $i\in[3]$ ,

\begin{aligned}
\Phi_{q,q^{\prime}}(x)_{i}\cdot A&=x_{i}\cdot A+\sum_{j^{\prime}\in S}(q^{\prime}_{i}-q_{i})\cdot A_{j^{\prime}}\\
&=x_{i}\cdot A+(q^{\prime}_{i}-q_{i})\cdot\sum_{j^{\prime}\in S}A_{j^{\prime}}\\
&=x_{i}\cdot A&&\text{(by Eq. 7).}
\end{aligned}

∎

For any $j\in S$ , let $\Delta^{(j)}$ denote the random variable $\big{(}X^{j^{\prime}}-X^{j}\big{)}_{j^{\prime}\in S\setminus\{j\}}$ .

Claim 5.4.

For any $j\in S$ , it holds in $\tilde{P}$ that $\big{(}\Delta^{(j)},X^{[n]\setminus S}\big{)}$ and $X^{j}$ are independent.

Proof.

Equivalently (using the definition of $\tilde{P}$ ), let $E$ denote the event that $X\in{\cal W}$ , i.e. for all $i\in[3]$ ,

(X_{i}-\mathbf{v}_{i})\cdot A=0.

We need to show that in $P$ , the random variables $X^{j}$ and $\big{(}\Delta^{(j)},X^{[n]\setminus S}\big{)}$ are conditionally independent given $E$ . To show this, we rely on the following fact:

Fact 5.5.

If $Y$ and $Z$ are any independent random variables, and if $E$ is any event that depends only on $Z$ (and occurs with non-zero probability), then $Y$ and $Z$ are conditionally independent given $E$ .

It is clear that $X^{j}$ and $\big{(}\Delta^{(j)},X^{[n]\setminus S}\big{)}$ are independent in $P$ . It is also the case that $E$ depends only on $\big{(}\Delta^{(j)},X^{[n]\setminus S}\big{)}$ : $E$ is defined by the constraint that for all $i\in[3]$ ,

\begin{aligned}
0&=(X_{i}-\mathbf{v}_{i})\cdot A\\
&=\sum_{j^{\prime}\in S}(X^{j^{\prime}}_{i}-X^{j}_{i}-\mathbf{v}^{j^{\prime}}_{i})\cdot A_{j^{\prime}}+\sum_{j^{\prime}\in[n]\setminus S}(X^{j^{\prime}}_{i}-\mathbf{v}^{j^{\prime}}_{i})\cdot A_{j^{\prime}}&&\text{(by Eq. 7)}\\
&=-\mathbf{v}^{j}_{i}\cdot A_{j}+\underbrace{\sum_{j^{\prime}\in S\setminus\{j\}}(X^{j^{\prime}}_{i}-X^{j}_{i}-\mathbf{v}^{j^{\prime}}_{i})\cdot A_{j^{\prime}}}_{\text{depends only on $\Delta^{(j)}$}}+\underbrace{\sum_{j^{\prime}\in[n]\setminus S}(X^{j^{\prime}}_{i}-\mathbf{v}^{j^{\prime}}_{i})\cdot A_{j^{\prime}}}_{\text{depends only on $X^{[n]\setminus S}$}}.
\end{aligned}

We now put everything together. Fix any $j\in S$ . We construct a local embedding of $Q$ into the $j^{th}$ coordinate of $\tilde{P}$ . For each $i\in[3]$ , we define $e_{i}:{\mathbb{F}}_{2}\times({\mathbb{F}}_{2}^{3\times n})\to{\mathbb{F}}_{2}^{1\times n}$ such that for each $j^{\prime}\in[n]$ :

e_{i}(x,r)^{j^{\prime}}=\begin{cases}x&\text{if $j^{\prime}=j$}\\ x+r^{j^{\prime}}_{i}-r^{j}_{i}&\text{if $j^{\prime}\in S\setminus\{j\}$}\\ r^{j^{\prime}}_{i}&\text{if $j^{\prime}\notin S$.}\end{cases}

Define the distribution $P^{(\mathsf{embed})}$ to be the distribution on $x\in{\mathbb{F}}_{2}^{3\times n}$ obtained by independently sampling $q\leftarrow Q$ and $r\leftarrow\tilde{P}$ , then defining

x\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\left(\begin{array}[]{c}e_{1}(q_{1},r)\\ e_{2}(q_{2},r)\\ e_{3}(q_{3},r)\end{array}\right).

It clearly holds with probability $1$ that $q=x^{j}$ .

Claim 5.6.

$P^{(\mathsf{embed})}\equiv\tilde{P}$ .

Proof.

By definition, it is immediate that: $P^{(\mathsf{embed})}_{X^{j}}\equiv\tilde{P}_{X^{j}}$ and $P^{(\mathsf{embed})}_{\Delta^{(j)},X^{[n]\setminus S}}\equiv\tilde{P}_{\Delta^{(j)},X^{[n]\setminus S}}$ .

Finally, $X$ is fully determined by $X^{j}$ and $(\Delta^{(j)},X^{[n]\setminus S})$ , which are independent in both $P^{(\mathsf{embed})}$ (because $q$ and $r$ are sampled independently in the definition of $P^{(\mathsf{embed})}$ ) and $\tilde{P}$ (by Claim 5.4). ∎

We have constructed an embedding of $Q$ into one of the first $n^{\prime}$ coordinates of $\tilde{P}$ , which is the desired contradiction. ∎
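The construction in the proof above can be sanity-checked by exhaustive enumeration on a tiny instance. The following Python sketch is only an illustration under assumed parameters ( $n=3$ , $m=1$ , a particular $A$ , shift $\mathbf{v}$ , set $S$ and coordinate $j$ , all chosen here and 0-indexed); the function names `support` and `embed` are hypothetical. It enumerates the support of $\tilde{P}$ , applies the maps $e_{1},e_{2},e_{3}$ to every pair $(q,r)$ , and records how often each matrix is produced.

```python
from collections import Counter
from itertools import product


def support(n, A, v):
    """All x in F_2^{3 x n} with every column in Q and every row x_i in v_i + V."""
    rows = list(product((0, 1), repeat=n))

    def in_shifted_V(row, shift):
        # (row - shift) . A = 0 over F_2, checked for every column c of A
        return all(
            sum((row[j] ^ shift[j]) & A[j][c] for j in range(n)) % 2 == 0
            for c in range(len(A[0]))
        )

    out = []
    for x1, x2 in product(rows, repeat=2):
        x3 = tuple(a ^ b for a, b in zip(x1, x2))  # forces x1 + x2 + x3 = 0
        if in_shifted_V(x1, v[0]) and in_shifted_V(x2, v[1]) and in_shifted_V(x3, v[2]):
            out.append((x1, x2, x3))
    return out


def embed(q, r, j, S, n):
    """Apply the maps e_1, e_2, e_3 from the proof, one row per player."""
    x = []
    for i in range(3):
        row = []
        for jp in range(n):
            if jp == j:
                row.append(q[i])
            elif jp in S:
                row.append(q[i] ^ r[i][jp] ^ r[i][j])
            else:
                row.append(r[i][jp])
        x.append(tuple(row))
    return tuple(x)


# GHZ query set Q: triples over F_2 that sum to zero.
Q = [q for q in product((0, 1), repeat=3) if sum(q) % 2 == 0]

# Assumed toy instance: A has rows A_0 = A_1 = 1 and A_2 = 0, so S = {0, 1} sums to zero.
n, A, S, j = 3, [[1], [1], [0]], {0, 1}, 0
v = ((0, 0, 0), (1, 0, 1), (1, 0, 1))  # columns of v lie in Q, so W meets the support of Q^n

supp = support(n, A, v)  # tilde{P} is uniform on this list
embedded = Counter(embed(q, r, j, S, n) for q in Q for r in supp)
```

In this instance every matrix in the support is produced equally often, which is what Claim 5.6 predicts, and the $j^{th}$ column of each output equals $q$ by construction. The check covers one small instance only and is no substitute for the proof.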

6 Decomposition Into Pseudorandom Affine Components

In this section we show that if $E$ is an arbitrary event with sufficient probability mass under $P=Q_{\mathsf{GHZ}}^{n}$ , then $\tilde{P}=P|E$ can be decomposed into components with affine support that are “similar” to corresponding components of $P$ . We will call such components pseudorandom.

We say that $\Pi$ is an affine partition of ${\mathbb{F}}_{2}^{3\times n}$ to mean that:

• Each part $\Pi(x)$ of $\Pi$ has the form $\mathbf{w}(x)+{\cal V}(x)^{3}$ where ${\cal V}(x)$ is a subspace of ${\mathbb{F}}_{2}^{n}$ , and

• Each ${\cal V}(x)$ has the same dimension, which we refer to as the dimension of $\Pi$ and denote by $\dim(\Pi)$ . The codimension of $\Pi$ is defined to be $n-\dim(\Pi)$ .

Definition 6.1.

If ${\cal W}$ is an affine shift of a vector space ${\cal V}^{3}$ (for ${\cal V}\leq{\mathbb{F}}_{2}^{n}$ ), we say that a ${\cal W}$ -valued random variable $X$ is $(m,\epsilon)$ -close to $Y$ if for all linear functions $\phi:{\mathbb{F}}_{2}^{n}\to{\mathbb{F}}_{2}^{m}$ we have $d_{\mathsf{KL}}(\phi^{3}(X)\|\phi^{3}(Y))\leq\epsilon$ , where $\phi^{3}$ denotes the function mapping

\left(\begin{array}[]{l}x_{1}\\ x_{2}\\ x_{3}\end{array}\right)\mapsto\left(\begin{array}[]{l}\phi(x_{1})\\ \phi(x_{2})\\ \phi(x_{3})\end{array}\right).

We write $d_{m}(X\|Y)$ to denote the minimum $\epsilon$ for which $X$ is $(m,\epsilon)$ -close to $Y$ .

We remark that $d_{m}(X\|Y)$ is a non-decreasing function of $m$ .

Lemma 6.2.

Let $P$ denote the distribution $Q_{\mathsf{GHZ}}^{n}$ , let $X$ be the identity random variable, let $E$ be an event with $P(X\in E)=e^{-\Delta}$ , and let $\tilde{P}=P\big{|}(X\in E)$ . For any $\delta>0$ and any $m\in{\mathbb{Z}}^{+}$ , there exists an affine partition $\Pi$ of ${\mathbb{F}}_{2}^{3\times n}$ , of codimension at most $m\cdot\frac{\Delta}{\delta}$ , such that:

\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[d_{m}\Big{(}\tilde{P}_{X|X\in\pi}\Big{\|}P_{X|X\in\pi}\Big{)}\right]\leq\delta.

(8)

Proof.

We construct the claimed partition iteratively. Start with the trivial $n$ -dimensional affine partition $\Pi_{0}=\{{\mathbb{F}}_{2}^{3\times n}\}$ . Whenever $\Pi_{i}$ is a partition $\Pi$ for which Eq. 8 does not hold, there exists a function $\phi_{i}:{\mathbb{F}}_{2}^{3\times n}\to{\mathbb{F}}_{2}^{3\times m}$ that:

• When restricted to any part $\pi$ of $\Pi_{i}$ , $\phi_{i}$ is of the form $\phi_{i,\pi}^{3}$ for some linear function $\phi_{i,\pi}:{\mathbb{F}}_{2}^{n}\to{\mathbb{F}}_{2}^{m}$ , and

• $d_{\mathsf{KL}}\Big{(}\tilde{P}_{\phi_{i}(X)|\Pi_{i}(X)}\Big{\|}P_{\phi_{i}(X)|\Pi_{i}(X)}\Big{)}>\delta.$ (9)

Without loss of generality, we additionally assume that each $\phi_{i,\pi}$ is “full rank” when restricted to $\pi$ . That is, if $\pi$ is an affine shift of ${\cal V}^{3}$ , where ${\cal V}$ has dimension $k$ , then the restriction of $\phi_{i,\pi}$ to ${\cal V}$ is a linear map of rank $\min(k,m)$ . It is clear that any $\phi_{i,\pi}$ may be modified to be full rank without decreasing the KL divergence of Eq. 9.

Then by the chain rule for KL divergences,

d_{\mathsf{KL}}\Big{(}\tilde{P}_{X|\Pi_{i}(X),\phi_{i}(X)}\Big{\|}P_{X|\Pi_{i}(X),\phi_{i}(X)}\Big{)}<d_{\mathsf{KL}}\Big{(}\tilde{P}_{X|\Pi_{i}(X)}\Big{\|}P_{X|\Pi_{i}(X)}\Big{)}-\delta.

(10)

The left-hand side of Eq. 10 is equivalent to

d_{\mathsf{KL}}\Big{(}\tilde{P}_{X|\Pi_{i+1}(X)}\Big{\|}P_{X|\Pi_{i+1}(X)}\Big{)}

with $\Pi_{i+1}=\big{\{}\pi\cap\{x:\phi_{i}(x)=z\}\big{\}}_{\pi\in\Pi_{i},z\in{\mathbb{F}}_{2}^{3\times m}}$ , which is an affine partition of dimension at least $\dim(\Pi_{i})-m$ .

Thus with the non-negative potential function

\Phi(\Pi)\stackrel{{\scriptstyle\mathsf{def}}}{{=}}d_{\mathsf{KL}}\Big{(}\tilde{P}_{X|\Pi(X)}\Big{\|}P_{X|\Pi(X)}\Big{)},

we have $\Phi(\Pi_{i+1})<\Phi(\Pi_{i})-\delta$ . But $\Phi(\Pi_{0})=-\ln\left(P(E)\right)=\Delta$ , so there must exist $i^{\star}\leq\frac{\Delta}{\delta}$ for which Eq. 8 holds with $\Pi=\Pi_{i^{\star}}$ , which has co-dimension at most $m\cdot\frac{\Delta}{\delta}$ . ∎

7 Pseudorandomness Preserves Hardness

Proposition 7.1.

Let ${\cal W}\subseteq{\mathbb{F}}_{2}^{3\times n}$ be an affine shift of a linear subspace ${\cal V}^{3}$ and let $P$ be the uniform distribution on $\{w\in{\cal W}:w_{1}+w_{2}+w_{3}=0\}$ , which we assume to be non-empty. Let $X$ denote the identity random variable, let $E=E_{1}\times E_{2}\times E_{3}$ be an event with $P(X\in E)=e^{-\Delta}$ , and define $\tilde{P}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}P\big{|}(X\in E)$ . Suppose that $\tilde{P}_{X}$ is $(\lceil\frac{1}{\delta}\rceil,\delta)$ -close to $P_{X}$ as in Definition 6.1, for $\delta$ satisfying $\delta\leq\min(\frac{\Delta^{2}}{32}\cdot e^{-4\Delta/\epsilon},\ \frac{\Delta^{2}}{32e^{2}},\ 2\epsilon^{2})$ .

Then for each $j\in[n]$ , we have $v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|\tilde{P})\leq v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P)+2\epsilon$ .

Proof.

Fix $j\in[n]$ to be any coordinate, and let $\tilde{f}=\tilde{f}_{1}\times\tilde{f}_{2}\times\tilde{f}_{3}:{\cal W}\to{\mathbb{F}}_{2}^{3}$ be an arbitrary strategy. Let $Y$ denote $\tilde{f}(X)$ .

Claim 7.2.

There exists a subspace ${\cal U}\leq{\cal V}$ of codimension at most $\lceil\frac{1}{\delta}\rceil$ such that:

• The $j^{th}$ coordinate $x^{j}$ of $x\in{\mathbb{F}}_{2}^{3\times n}$ depends only on $x+{{\cal U}^{3}}$ .

• For all $\chi\in\hat{{\cal U}}$ ,

\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\left[d_{\mathsf{KL}}\Big{(}P_{\chi(X_{1})|X\in x+{\cal U}^{3},Y_{1}=y_{1}}\Big{\|}U_{\chi(X_{1})|X\in x+{\cal U}^{3}}\Big{)}\right]\leq\delta,

where $U$ denotes the uniform distribution on ${\cal W}$ .

Proof.

Start with ${\cal U}_{1}=\{u\in{\cal V}:u^{j}=0\}$ (this ensures that any subspace ${\cal U}\leq{\cal U}_{1}$ satisfies the first desired property). Define a potential function

Z({\cal U})\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\dim({\cal U})-\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\big{[}H(X_{1}|X_{1}\in x_{1}+{\cal U},Y_{1}=y_{1})\big{]},

which is clearly non-negative. Additionally, $Z({\cal U})$ (and in particular $Z({\cal U}_{1})$ ) is at most $1$ because for any subspace ${\cal U}\leq{\cal V}$ and any $x_{1}\in{\cal V}$ , the entropy chain rule implies

\begin{aligned}
\mathop{\mathbb{E}}_{y\leftarrow P_{Y|X_{1}\in x_{1}+{\cal U}}}\big[H(X_{1}|X_{1}\in x_{1}+{\cal U},Y_{1}=y_{1})\big]&=H(X_{1}|X_{1}\in x_{1}+{\cal U})-H(Y_{1}|X_{1}\in x_{1}+{\cal U})\\
&\geq\dim({\cal U})-1.
\end{aligned}

(In the first step we used the fact that $Y_{1}$ is a function of $X_{1}$ .)

For $i\geq 1$ , define $\chi_{i}\in\hat{{\cal U}}_{i}\setminus\{1\}$ to maximize

\begin{aligned}
b_{i}&\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\Big[d_{\mathsf{KL}}\Big(P_{\chi_{i}(X_{1})|X\in x+{\cal U}_{i}^{3},Y_{1}=y_{1}}\Big\|U_{\chi_{i}(X_{1})|X\in x+{\cal U}_{i}^{3}}\Big)\Big]\\
&=\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\Big[d_{\mathsf{KL}}\Big(P_{\chi_{i}(X_{1})|X\in x+{\cal U}_{i}^{3},Y_{1}=y_{1}}\Big\|\mathrm{Unif}_{\{\pm 1\}}\Big)\Big]\\
&=\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\Big[d_{\mathsf{KL}}\Big(P_{\chi_{i}(X_{1})|X_{1}\in x_{1}+{\cal U}_{i},Y_{1}=y_{1}}\Big\|\mathrm{Unif}_{\{\pm 1\}}\Big)\Big]\\
&=1-\mathop{\mathbb{E}}_{(x,y)\leftarrow P_{X,Y}}\Big[H\big(\chi_{i}(X_{1})|X_{1}\in x_{1}+{\cal U}_{i},Y_{1}=y_{1}\big)\Big],
\end{aligned}

and define ${\cal U}_{i+1}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\{u\in{\cal U}_{i}:\chi_{i}(u)=1\}$ . By the entropy chain rule, we have $Z({\cal U}_{i+1})\leq Z({\cal U}_{i})-b_{i}$ .

Since the initial potential is at most $1$ , and all potentials are at least $0$ , there must be some $i^{\star}\leq\lceil\frac{1}{\delta}\rceil$ for which $b_{i^{\star}}\leq\delta$ . The corresponding ${\cal U}_{i^{\star}}$ is the desired subspace of ${\cal V}$ . ∎

Now let ${\cal U}$ be as given by Claim 7.2. By Lemma 4.4, we have

\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[d_{\mathsf{TV}}\big{(}P_{Y|X\in x+{\cal U}^{3}},\prod_{i\in[3]}\ P_{Y_{i}|X_{i}\in x_{i}+{\cal U}}\big{)}\right]\leq\sqrt{2\delta}.

By assumption of Proposition 7.1 (together with Pinsker’s inequality), $P_{X+{\cal U}^{3}}$ and $\tilde{P}_{X+{\cal U}^{3}}$ are $\sqrt{\frac{\delta}{2}}$ -close in total variational distance. We thus have that

\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\left[d_{\mathsf{TV}}\big{(}P_{Y|X\in x+{\cal U}^{3}},\prod_{i\in[3]}\ P_{Y_{i}|X_{i}\in x_{i}+{\cal U}}\big{)}\right]\leq\sqrt{8\delta},

(11)

by the general fact that if $P$ and $Q$ are two distributions that are $\epsilon$ -close in total variational distance, and if $X$ is a $B$ -bounded random variable, then $\big{|}\mathop{\mathbb{E}}_{P}[X]-\mathop{\mathbb{E}}_{Q}[X]\big{|}\leq 2B\epsilon$ .
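The general fact just quoted follows from $\big{|}\sum_{\omega}(P(\omega)-Q(\omega))X(\omega)\big{|}\leq B\sum_{\omega}|P(\omega)-Q(\omega)|=2B\cdot d_{\mathsf{TV}}(P,Q)$ . As a quick numerical illustration (a sketch on randomly generated finite distributions, with all names below hypothetical):

```python
import random


def tv(p, q):
    """Total variational distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def random_dist(k, rng):
    """A random probability vector of length k."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


def expectation_gap_ok(k, B, rng):
    """Check |E_P[X] - E_Q[X]| <= 2 * B * d_TV(P, Q) for a random B-bounded X."""
    p, q = random_dist(k, rng), random_dist(k, rng)
    f = [rng.uniform(-B, B) for _ in range(k)]  # values of the random variable X
    e_p = sum(pi * fi for pi, fi in zip(p, f))
    e_q = sum(qi * fi for qi, fi in zip(q, f))
    return abs(e_p - e_q) <= 2 * B * tv(p, q) + 1e-12  # small float tolerance


rng = random.Random(0)
all_ok = all(expectation_gap_ok(6, 3.0, rng) for _ in range(200))
```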

We now obtain a probabilistic lower bound on $P(E|X+{\cal U}^{3})$ . We first lower bound its log-expectation:

\begin{aligned}
\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\Big[-\ln P\big(E|X\in x+{\cal U}^{3}\big)\Big]&=\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\Big[d_{\mathsf{KL}}\big(\tilde{P}_{X|X\in x+{\cal U}^{3}}\big\|P_{X|X\in x+{\cal U}^{3}}\big)\Big]\\
&\leq d_{\mathsf{KL}}\big(\tilde{P}_{X}\big\|P_{X}\big)&&\text{(Fact A.17)}\\
&\leq\Delta.
\end{aligned}

Markov’s inequality then implies that for any $\tau$ ,

\Pr_{x\leftarrow\tilde{P}_{X}}\big{[}P(E|X\in x+{\cal U}^{3})\leq\tau\big{]}\leq\frac{\Delta}{\ln(1/\tau)}.

(12)

Combining Eq. 12 with Eq. 11 and Fact A.18, we get

\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\left[d_{\mathsf{TV}}\big{(}\tilde{P}_{Y|X\in x+{\cal U}^{3}},\prod_{i\in[3]}\ (P|X_{i}\in E_{i})_{Y_{i}|X_{i}\in x_{i}+{\cal U}}\big{)}\right]\leq\frac{\Delta}{\ln(1/\tau)}+\frac{4\sqrt{2\delta}}{\tau}.

Since this holds for all $\tau\in[0,1]$ and because $\delta\leq\frac{\Delta^{2}}{32e^{2}}$ , Corollary C.2 implies that

\displaystyle\mathop{\mathbb{E}}_{x\leftarrow\tilde{P}_{X}}\left[d_{\mathsf{TV}}\big{(}\tilde{P}_{Y|X\in x+{\cal U}^{3}},\prod_{i\in[3]}\ (P|X_{i}\in E_{i})_{Y_{i}|X_{i}\in x_{i}+{\cal U}}\big{)}\right]\leq\frac{4\Delta}{\ln\left(\frac{\Delta}{\sqrt{32\delta}}\right)}\leq\epsilon,

(13)

where the last inequality follows from our assumption that $\delta\leq\frac{\Delta^{2}}{32}\cdot e^{-4\Delta/\epsilon}$ .

Putting everything together, we have

\begin{aligned}
\tilde{P}_{X+{\cal U}^{3},Y}&=\tilde{P}_{X+{\cal U}^{3}}\tilde{P}_{Y|X+{\cal U}^{3}}\\
&\approx_{\epsilon}\tilde{P}_{X+{\cal U}^{3}}\cdot\prod_{i\in[3]}(P|X_{i}\in E_{i})_{Y_{i}|X_{i}+{\cal U}}\\
&\approx_{\sqrt{\frac{\delta}{2}}}P_{X+{\cal U}^{3}}\cdot\prod_{i\in[3]}(P|X_{i}\in E_{i})_{Y_{i}|X_{i}+{\cal U}},
\end{aligned}

where $\approx$ denotes closeness in total variational distance.

But $P_{X+{\cal U}^{3}}\cdot\prod_{i\in[3]}(P|X_{i}\in E_{i})_{Y_{i}|X_{i}+{\cal U}}$ is just the distribution on $(x+{\cal U}^{3},y)$ obtained by sampling $x\leftarrow P_{X}$ , $y\leftarrow F(x)$ , where $F=F_{1}\times F_{2}\times F_{3}$ is the following randomized strategy. On input $x_{i}$ , $F_{i}$ uses local randomness to sample and output $y_{i}\leftarrow(P|X_{i}\in E_{i})_{Y_{i}|X_{i}\in x_{i}+{\cal U}}$ . By Fact 3.8, the probability that $W(x^{j},y)=1$ (which is well-defined because $x^{j}$ is a function of $x+{\cal U}^{3}$ ) is at most $v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P)$ .

We thus have

\begin{aligned}
v^{j}[\tilde{f}]({\cal G}_{\mathsf{GHZ}}^{n}|\tilde{P})&=\tilde{P}_{X+{\cal U}^{3},Y}\big(W(X^{j},Y)=1\big)\\
&\leq v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P)+\epsilon+\sqrt{\frac{\delta}{2}}\\
&\leq v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P)+2\epsilon.
\end{aligned}

Since this holds for arbitrary $\tilde{f}$ , we have $v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|\tilde{P})\leq v^{j}({\cal G}_{\mathsf{GHZ}}^{n}|P)+2\epsilon$ . ∎

8 Proof of Main Theorem

Theorem 8.1.

If ${\cal G}=({\cal X},{\cal Y},Q,W)$ denotes the GHZ game, then $v({\cal G}^{n})\leq n^{-\Omega(1)}$ .

Proof.

Recall $v({\cal G})=3/4$ .

Let $P$ denote $Q^{n}$ ; that is $P$ is uniform on $\big{\{}(X_{1},X_{2},X_{3})\in{\mathbb{F}}_{2}^{3\times n}:X_{1}+X_{2}+X_{3}=0\big{\}}$ . Let $E=E_{1}\times E_{2}\times E_{3}$ be any product event in ${\mathbb{F}}_{2}^{3\times n}$ with $P(E)\geq e^{-\Delta}$ (where $\Delta$ is a parameter we will specify later), and let $\tilde{P}$ denote $P|E$ .

Let $\delta>0$ be a parameter we will specify later, and let $m=\lceil\frac{1}{\delta}\rceil$ . Recall our definition of $d_{m}$ (Definition 6.1). Lemma 6.2 states that there exists an affine partition $\Pi$ of ${\mathbb{F}}_{2}^{3\times n}$ , of codimension at most $m\cdot\frac{\Delta}{\delta}$ , such that:

\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[d_{m}\Big{(}\tilde{P}_{X|X\in\pi}\Big{\|}P_{X|X\in\pi}\Big{)}\right]\leq\delta.

Moreover,

\begin{aligned}
\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[d_{\infty}\Big(\tilde{P}_{X|X\in\pi}\Big\|P_{X|X\in\pi}\Big)\right]&=d_{\mathsf{KL}}\Big(\tilde{P}_{X|\Pi(X)}\Big\|P_{X|\Pi(X)}\Big)\\
&\leq d_{\mathsf{KL}}\big(\tilde{P}_{X}\big\|P_{X}\big)\\
&\leq\Delta.
\end{aligned}

Markov’s inequality thus implies that with probability at least $1/3$ when sampling $\pi\leftarrow\tilde{P}_{\Pi(X)}$ , it holds that $d_{m}\Big{(}\tilde{P}_{X|X\in\pi}\Big{\|}P_{X|X\in\pi}\Big{)}\leq 3\delta$ and $d_{\infty}\Big{(}\tilde{P}_{X|X\in\pi}\Big{\|}P_{X|X\in\pi}\Big{)}\leq 3\Delta$ . Call such a $\pi$ pseudorandom, and let ${\cal R}$ denote the set of pseudorandom $\pi$ .

By Proposition 7.1, for each pseudorandom $\pi$ we have

v^{j}\big{(}{\cal G}^{n}|(\tilde{P}|\pi)\big{)}\leq v^{j}\big{(}{\cal G}^{n}|(P|\pi)\big{)}+2\epsilon

(14)

as long as

3\delta\leq\min(\frac{9\Delta^{2}}{32}\cdot e^{-12\Delta/\epsilon},\ \frac{9\Delta^{2}}{32e^{2}},\ 2\epsilon^{2}),

(15)

where $\epsilon$ is a parameter we will specify later.

By Proposition 5.2, for each $\pi\in\Pi$ (with $P(\pi)>0$ ), it holds for all but $m\cdot\frac{\Delta}{\delta}$ values of $j\in[n]$ that $v^{j}\big{(}{\cal G}^{n}\big{|}(P|\pi)\big{)}=v({\cal G})=3/4$ . By averaging, there exists some $j^{\star}\in[n]$ such that

\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)|\Pi(X)\in{\cal R}}}\left[v^{j^{\star}}\big{(}{\cal G}^{n}\big{|}(P|\pi)\big{)}\right]\leq\frac{m\Delta}{n\delta}+\left(1-\frac{m\Delta}{n\delta}\right)\cdot\frac{3}{4},

which is at most $7/8$ if

\delta\geq\frac{2m\Delta}{n}.

(16)

Putting everything together, we have

\begin{aligned}
v^{j^{\star}}\big({\cal G}^{n}|\tilde{P}\big)&\leq\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\Big[v^{j^{\star}}\big({\cal G}^{n}|(\tilde{P}|\pi)\big)\Big]\\
&\leq\Pr_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[\pi\notin{\cal R}\right]+\Pr_{\pi\leftarrow\tilde{P}_{\Pi(X)}}\left[\pi\in{\cal R}\right]\cdot\mathop{\mathbb{E}}_{\pi\leftarrow\tilde{P}_{\Pi(X)|\Pi(X)\in{\cal R}}}\Big[v^{j^{\star}}\big({\cal G}^{n}|(\tilde{P}|\pi)\big)\Big]\\
&\leq\frac{2}{3}+\frac{1}{3}\cdot\Big(\frac{7}{8}+2\epsilon\Big)\\
&\leq\frac{47}{48}
\end{aligned}

if Eqs. 15 and 16 are satisfied and if $\epsilon\leq\frac{1}{32}$ . Setting $\epsilon=\frac{1}{32}$ , $\Delta=0.0005\ln n$ , $\delta=n^{-0.4}$ , $m=n^{0.4}$ ensures that these constraints are all satisfied for sufficiently large $n$ .
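The arithmetic behind this parameter choice can be checked mechanically. The following Python sketch is an illustration only: the function name `constraints_hold` is hypothetical, $m$ is treated as exactly $n^{0.4}$ (ignoring the ceiling), and the inequalities are compared on a logarithmic scale so that very large $n$ cause no overflow. It tests Eqs. 15 and 16 for a given $\ln n$ and evaluates the final bound exactly with rational arithmetic.

```python
from fractions import Fraction
from math import e, log


def constraints_hold(log_n, eps=1 / 32, c=0.0005, a=0.4):
    """Check Eqs. 15 and 16 for Delta = c ln n, delta = n^(-a), m = n^a."""
    Delta = c * log_n
    log_delta = -a * log_n
    log_m = a * log_n
    log_3delta = log(3) + log_delta
    # Eq. 15: 3 delta is at most each of the three terms inside the minimum.
    eq15 = (
        log_3delta <= log(9 * Delta**2 / 32) - 12 * Delta / eps
        and log_3delta <= log(9 * Delta**2 / (32 * e**2))
        and log_3delta <= log(2 * eps**2)
    )
    # Eq. 16: delta >= 2 m Delta / n.
    eq16 = log_delta >= log(2 * Delta) + log_m - log_n
    return eq15 and eq16


# Final bound 2/3 + (1/3)(7/8 + 2 eps) at eps = 1/32, computed exactly.
final_bound = Fraction(2, 3) + Fraction(1, 3) * (Fraction(7, 8) + 2 * Fraction(1, 32))

holds_at_1e30 = constraints_hold(log(10**30))
holds_at_1e6 = constraints_hold(log(10**6))
```

Under these assumptions the constraints hold at $n=10^{30}$ but not at $n=10^{6}$ , which is consistent with the "sufficiently large $n$" qualifier, and `final_bound` equals $\frac{47}{48}$ .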

Applying Lemma 8.2 below with $\rho(n)=n^{-0.0005}$ and $\epsilon=\frac{1}{48}$ completes the proof. ∎

Lemma 8.2 (Parallel Repetition Criterion).

Let ${\cal G}=({\cal X},{\cal Y},Q,W)$ be a game, and let $P$ denote $Q^{n}$ . Suppose $\rho:{\mathbb{Z}}^{+}\to{\mathbb{R}}$ is a function with $\rho(n)\geq e^{-O(n)}$ and $\epsilon>0$ is a constant such that for all $E=E_{1}\times\cdots\times E_{k}\subseteq{\cal X}^{n}$ with $P^{n}(E)\geq\rho(n)$ there exists $j$ such that $v^{j}\big{(}{\cal G}^{n}|(P|E)\big{)}\leq 1-\epsilon$ . Then

v({\cal G}^{n})\leq\rho(n)^{\Omega(1)}.

Proof.

Fix any $f=f_{1}\times\cdots\times f_{k}:{\cal X}^{n}\to{\cal Y}^{n}$ . Consider the probability space defined by sampling ${X}\leftarrow P^{n}$ , and let ${Y}=f({X})$ . We define additional random variables $J_{1},\ldots,J_{n}\in[n]$ and $Z_{1},\ldots,Z_{n}\in{\cal X}\times{\cal Y}$ where $J_{1}$ is an arbitrary fixed value, $Z_{i}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}({X}^{J_{i}},{Y}^{J_{i}})$ for all $i$ , and $J_{i+1}$ depends deterministically on ${Z}_{\leq i}\stackrel{{\scriptstyle\mathsf{def}}}{{=}}(Z_{1},\ldots,Z_{i})$ as follows. When ${Z}_{\leq i}={z}_{\leq i}$ , $J_{i+1}$ is defined to be a value $j\in[n]$ that minimizes $P^{n}\big{(}W(X^{j},Y^{j})=1\big{|}{Z}_{\leq i}={z}_{\leq i}\big{)}$ . With these definitions, each event $\{Z_{\leq i}=z_{\leq i}\}$ is a product event. In particular, if $P^{n}({Z}_{\leq i}={z}_{\leq i})\geq\rho(n)$ then $P^{n}\big{(}W(X^{J_{i+1}},Y^{J_{i+1}})=1\big{|}{Z}_{\leq i}={z}_{\leq i}\big{)}\leq 1-\epsilon$ .

Let $\textsc{Win}_{i}$ denote the event that $W(Z_{i})=1$ , let $\textsc{Win}_{\leq i}$ denote the event $\textsc{Win}_{1}\land\cdots\land\textsc{Win}_{i}$ , and let $w_{i}$ denote $P^{n}\big{(}\textsc{Win}_{\leq i}\big{)}$ . Since $\textsc{Win}_{\leq i}$ is the union of some subset of the $|{\cal X}|^{i}\cdot|{\cal Y}|^{i}$ disjoint product events $\{{Z}_{\leq i}={z}_{\leq i}\}$ , we have

\Pr_{{z}_{\leq i}\leftarrow P^{n}_{{Z}_{\leq i}|\textsc{Win}_{\leq i}}}\left[P^{n}({Z}_{\leq i}={z}_{\leq i})\geq\rho(n)\right]\geq 1-|{\cal X}|^{i}\cdot|{\cal Y}|^{i}\cdot\frac{\rho(n)}{w_{i}}.

Moreover, for all ${z}_{\leq i}$ for which $P^{n}({Z}_{\leq i}={z}_{\leq i})\geq\rho(n)$ , we know that $P^{n}\big{(}\textsc{Win}_{i+1}\big{|}{Z}_{\leq i}={z}_{\leq i}\big{)}\leq 1-\epsilon$ . Thus as long as $w_{i}\geq 2\cdot|{\cal X}|^{i}\cdot|{\cal Y}|^{i}\cdot\rho(n)$ , we have

\begin{aligned}
w_{i+1}&=w_{i}\cdot P^{n}(\textsc{Win}_{i+1}|\textsc{Win}_{\leq i})\\
&=w_{i}\cdot\mathop{\mathbb{E}}_{{z}_{\leq i}\leftarrow P^{n}_{{Z}_{\leq i}|\textsc{Win}_{\leq i}}}\big[P^{n}(\textsc{Win}_{i+1}|{Z}_{\leq i}={z}_{\leq i})\big]\\
&\leq w_{i}\cdot\left(\Pr_{{z}_{\leq i}\leftarrow P^{n}_{{Z}_{\leq i}|\textsc{Win}_{\leq i}}}\big[P^{n}\big({Z}_{\leq i}={z}_{\leq i}\big)<\rho\big]+\Pr_{{z}_{\leq i}\leftarrow P^{n}_{{Z}_{\leq i}|\textsc{Win}_{\leq i}}}\big[P^{n}\big({Z}_{\leq i}={z}_{\leq i}\big)\geq\rho\big]\cdot(1-\epsilon)\right)\\
&\leq w_{i}\cdot\left(\frac{1}{2}+\frac{1}{2}\cdot(1-\epsilon)\right)\\
&=w_{i}\cdot\left(1-\frac{\epsilon}{2}\right).
\end{aligned}

Iterating this inequality as long as the condition $w_{i}\geq 2\cdot|{\cal X}|^{i}\cdot|{\cal Y}|^{i}\cdot\rho(n)$ is satisfied, we find $w_{i^{\star}}$ such that $w_{i^{\star}}\leq\min\big{(}2\cdot|{\cal X}|^{i^{\star}}\cdot|{\cal Y}|^{i^{\star}}\cdot\rho(n),(1-\frac{\epsilon}{2})^{i^{\star}}\big{)}$ . This is minimized for $i^{\star}=\Theta(\log\frac{1}{\rho(n)})$ or $i^{\star}=n$ and gives $v({\cal G}^{n})\leq w_{i^{\star}}\leq\rho(n)^{\Omega(1)}$ . ∎

Appendix A Probability Theory

We recall the notions of probability theory that we will need.

Definition A.1.

A probability distribution on a finite set $\Omega$ is a function $P:\Omega\to{\mathbb{R}}$ satisfying $P(\omega)\geq 0$ for all $\omega\in\Omega$ and $\sum_{\omega\in\Omega}P(\omega)=1$ . We extend the domain of $P$ to $2^{\Omega}$ by writing $P(E)$ to denote $\sum_{\omega\in E}P(\omega)$ for any “event” $E\subseteq\Omega$ .

Definition A.2.

The support of $P:\Omega\to{\mathbb{R}}$ is the set $\{\omega\in\Omega:P(\omega)>0\}$ .

Definition A.3.

A $\Sigma$ -valued random variable on a sample space $\Omega$ is a function $X:\Omega\to\Sigma$ .

Definition A.4 (Expectations).

If $P:\Omega\to{\mathbb{R}}$ is a probability distribution and $X:\Omega\to{\mathbb{R}}$ is a random variable, the expectation of $X$ under $P$ , denoted $\mathop{\mathbb{E}}_{P}[X]$ , is defined to be $\sum_{\omega\in\Omega}P(\omega)\cdot X(\omega)$ .

We refer to subsets of $\Omega$ as events. We use standard shorthand for denoting events. For instance, if $X$ is a $\Sigma$ -valued random variable and $x\in\Sigma$ , we write $X=x$ to denote the event $\{\omega\in\Omega:X(\omega)=x\}$ .

Definition A.5 (Indicator Random Variables).

For any event $E$ , we write $1_{E}$ to denote a random variable defined as

1_{E}(\omega)=\begin{cases}1&\text{if $\omega\in E$}\\ 0&\text{otherwise.}\end{cases}

Definition A.6 (Independence).

Events $E_{1},\ldots,E_{k}\subseteq\Omega$ are said to be independent under a probability distribution $P$ if $P(E_{1}\cap\cdots\cap E_{k})=\prod_{i\in[k]}P(E_{i})$ . Random variables $X_{1},\ldots,X_{k}$ are said to be independent if the events $X_{1}=x_{1},\ldots,X_{k}=x_{k}$ are independent for any choice of $x_{1},\ldots,x_{k}$ .

Definition A.7 (Conditional Probabilities).

If $P:\Omega\to{\mathbb{R}}$ is a probability distribution and $E\subseteq\Omega$ is an event with $P(E)>0$ , then the conditional distribution of $P$ given $E$ is denoted $(P|E):\Omega\to{\mathbb{R}}$ and is defined to be

(P|E)(\omega)=\begin{cases}P(\omega)/P(E)&\text{if $\omega\in E$}\\ 0&\text{otherwise.}\end{cases}

If $X$ is a random variable and $P$ is a probability distribution, we write $P_{X}$ to denote the induced distribution of $X$ under $P$ . That is, $P_{X}(x)=P(X=x)$ .

If $E$ is an event, we write $P_{X|E}$ as shorthand for $(P|E)_{X}$ .

Definition A.8 (Entropy).

If $P:\Omega\to{\mathbb{R}}$ is a probability distribution, the entropy (in nats) of $P$ is

H(P)\stackrel{{\scriptstyle\mathsf{def}}}{{=}}-\sum_{\omega\in\Omega}P(\omega)\cdot\ln\big{(}P(\omega)\big{)}.

When $X$ is a random variable associated with a probability distribution $P$ , we sometimes write $H(X)$ as shorthand for $H(P_{X})$ .

Definition A.9 (Conditional Entropy).

If $P$ is a probability measure with random variables $X$ and $Y$ , we write

H(P_{X|Y})\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\mathop{\mathbb{E}}_{y\leftarrow P_{Y}}\left[H(P_{X|Y=y})\right].

Fact A.10 (Chain Rule of Conditional Entropy).

For any probability measure $P$ and any random variables $X$ , $Y$ , it holds that

H(P_{X|Y})=H(P_{X,Y})-H(P_{Y}).
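As a small numerical illustration of Fact A.10 (a sketch on one assumed joint distribution; the helper names are hypothetical), the two sides of the chain rule can be computed directly from Definitions A.8 and A.9:

```python
from collections import defaultdict
from math import log


def entropy(probs):
    """Entropy in nats of a collection of probabilities (Definition A.8)."""
    return -sum(p * log(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(P_{X|Y}) = E_{y <- P_Y}[H(P_{X|Y=y})] for a joint table {(x, y): prob}."""
    by_y = defaultdict(dict)
    for (x, y), p in joint.items():
        by_y[y][x] = p
    total = 0.0
    for row in by_y.values():
        p_y = sum(row.values())
        if p_y > 0:
            total += p_y * entropy([p / p_y for p in row.values()])
    return total


def marginal_y(joint):
    """Probabilities of the marginal distribution P_Y."""
    out = defaultdict(float)
    for (_, y), p in joint.items():
        out[y] += p
    return list(out.values())


# Assumed example joint distribution of (X, Y) on {0, 1} x {0, 1}.
joint = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}
lhs = conditional_entropy(joint)                              # H(P_{X|Y})
rhs = entropy(joint.values()) - entropy(marginal_y(joint))    # H(P_{X,Y}) - H(P_Y)
```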

A.1 Divergences

Definition A.11 (Total Variational Distance).

If $P,Q:\Omega\to{\mathbb{R}}$ are two probability distributions, then the total variational distance between $P$ and $Q$ , denoted $d_{\mathsf{TV}}(P,Q)$ , is

d_{\mathsf{TV}}(P,Q)\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\max_{E\subseteq\Omega}\Big{|}P(E)-Q(E)\Big{|}.

An equivalent definition is

d_{\mathsf{TV}}(P,Q)\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\frac{1}{2}\sum_{\omega\in\Omega}\big{|}P(\omega)-Q(\omega)\big{|}

Definition A.12 (Kullback-Leibler (KL) Divergence).

If $P,Q:\Omega\to{\mathbb{R}}$ are probability distributions, the Kullback-Leibler divergence of $P$ from $Q$ is

d_{\mathsf{KL}}(P\|Q)\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\sum_{\omega\in\Omega}P(\omega)\ln\left(\frac{P(\omega)}{Q(\omega)}\right),

where terms of the form $p\cdot\ln(p/0)$ are treated as $0$ if $p=0$ and $+\infty$ otherwise, and terms of the form $0\cdot\ln(0/q)$ are treated as $0$ .

The following relation between total variational distance and Kullback-Leibler divergence, known as Pinsker’s inequality, is of fundamental importance.

Theorem A.13 (Pinsker’s Inequality).

For any probability distributions $P,Q:\Omega\to{\mathbb{R}}$ , it holds that $d_{\mathsf{TV}}(P,Q)\leq\sqrt{\frac{1}{2}d_{\mathsf{KL}}(P\|Q)}$ .
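Pinsker's inequality can be spot-checked numerically. The sketch below is an illustration only, with hypothetical helper names; it uses natural logarithms, matching Definition A.12, and tests the inequality on randomly generated distributions with full support.

```python
import random
from math import log, sqrt


def tv(p, q):
    """Total variational distance (Definition A.11, second form)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl(p, q):
    """KL divergence in nats; assumes q is positive wherever p is."""
    return sum(a * log(a / b) for a, b in zip(p, q) if a > 0)


def random_dist(k, rng):
    """A random probability vector of length k with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


def pinsker_holds(p, q):
    """Check d_TV(P, Q) <= sqrt(d_KL(P || Q) / 2), with a small float tolerance."""
    return tv(p, q) <= sqrt(max(0.0, 0.5 * kl(p, q))) + 1e-12


rng = random.Random(1)
all_ok = all(pinsker_holds(random_dist(5, rng), random_dist(5, rng)) for _ in range(500))
```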

Definition A.14 (Conditional KL Divergence).

If $P,Q:\Omega\to{\mathbb{R}}$ are probability distributions and if $W$ , $X$ , $Y$ , and $Z$ are random variables on $\Omega$ , we write

d_{\mathsf{KL}}(P_{W|X}\|Q_{Y|Z})\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\mathop{\mathbb{E}}_{x\leftarrow P_{X}}\left[d_{\mathsf{KL}}(P_{W|X=x}\|Q_{Y|Z=x})\right],

which is taken to be $+\infty$ if there exists $x$ with $P_{X}(x)>0$ but $Q_{Z}(x)=0$ .

KL divergence obeys a chain rule analogous to that for entropy.

Fact A.15 (Chain Rule for KL Divergence).

If $P,Q:\Omega\to{\mathbb{R}}$ are probability distributions and $W,X,Y,Z$ are random variables on $\Omega$ , then

d_{\mathsf{KL}}(P_{W,X}\|Q_{Y,Z})=d_{\mathsf{KL}}(P_{X}\|Q_{Z})+d_{\mathsf{KL}}(P_{W|X}\|Q_{Y|Z}).

A.2 Conditional KL Divergence

Fact A.16.

If $P:\Omega\to{\mathbb{R}}$ is a probability distribution and $E\subseteq\Omega$ is an event, then

d_{\mathsf{KL}}\big{(}P|E\big{\|}P\big{)}=\ln\left(\frac{1}{P(E)}\right).
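Fact A.16 follows by direct computation: every $\omega\in E$ contributes $\frac{P(\omega)}{P(E)}\cdot\ln\frac{1}{P(E)}$ to the sum, and these weights add up to $1$ . A minimal numerical sketch on an assumed four-point distribution (helper names hypothetical):

```python
from math import log


def condition(p, event):
    """The conditional distribution (P|E) of Definition A.7, as a dict."""
    mass = sum(p[w] for w in event)
    return {w: (p[w] / mass if w in event else 0.0) for w in p}


def kl(p, q):
    """KL divergence in nats between two dicts over the same sample space."""
    return sum(p[w] * log(p[w] / q[w]) for w in p if p[w] > 0)


# Assumed example distribution and event.
P = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
E = {"b", "d"}

lhs = kl(condition(P, E), P)              # d_KL(P|E || P)
rhs = log(1 / sum(P[w] for w in E))       # ln(1 / P(E))
```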

Fact A.17.

Let $P,Q:\Omega\to{\mathbb{R}}$ be probability distributions and let $X$ , $Y$ be random variables on $\Omega$ with $Y$ a function of $X$ . Then

d_{\mathsf{KL}}(P_{X|Y}\|Q_{X|Y})\leq d_{\mathsf{KL}}(P_{X}\|Q_{X}).

Proof.

This is well known, but for completeness:

\begin{aligned}
d_{\mathsf{KL}}(P_{X|Y}\|Q_{X|Y})&=d_{\mathsf{KL}}(P_{X,Y}\|Q_{X,Y})-d_{\mathsf{KL}}(P_{Y}\|Q_{Y})&&\text{(chain rule)}\\
&=d_{\mathsf{KL}}(P_{X}\|Q_{X})-d_{\mathsf{KL}}(P_{Y}\|Q_{Y})&&\text{($Y$ is a function of $X$)}\\
&\leq d_{\mathsf{KL}}(P_{X}\|Q_{X}).&&\text{(non-negativity of KL)}
\end{aligned}

∎

A.3 Conditional Statistical Distance

Fact A.18.

Let $P,Q:\Omega\to{\mathbb{R}}$ be probability distributions, and let $E\subseteq\Omega$ be an arbitrary event. Then

d_{\mathsf{TV}}(P|E,Q|E)\leq\frac{2\cdot d_{\mathsf{TV}}(P,Q)}{P(E)}.

Proof.

Suppose for the sake of contradiction that for some $A\subseteq E$ , we have

\left|(P|E)(A)-(Q|E)(A)\right|>\frac{2d_{\mathsf{TV}}(P,Q)}{P(E)}.

Multiplying on both sides by $P(E)$ , we obtain

\left|P(A)-P(E)\cdot(Q|E)(A)\right|>2d_{\mathsf{TV}}(P,Q).

Since $|P(E)-Q(E)|\leq d_{\mathsf{TV}}(P,Q)$ and $(Q|E)(A)\leq 1$ , we have

\left|P(A)-Q(A)\right|>d_{\mathsf{TV}}(P,Q),

which is a contradiction. ∎
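As a numerical illustration of Fact A.18 (a sketch only; the helper names are hypothetical, and the random distributions are generated with full support so that both conditionals are well defined), one can compare the two sides on random inputs:

```python
import random


def tv(p, q):
    """Total variational distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def random_dist(k, rng):
    """A random probability vector of length k with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


def condition(p, event):
    """The conditional distribution (P|E); event is a set of indices."""
    mass = sum(p[i] for i in event)
    return [p[i] / mass if i in event else 0.0 for i in range(len(p))]


def fact_a18_holds(p, q, event):
    """Check d_TV(P|E, Q|E) <= 2 d_TV(P, Q) / P(E), with a small float tolerance."""
    lhs = tv(condition(p, event), condition(q, event))
    rhs = 2 * tv(p, q) / sum(p[i] for i in event)
    return lhs <= rhs + 1e-12


rng = random.Random(2)
results = []
for _ in range(300):
    p, q = random_dist(6, rng), random_dist(6, rng)
    event = {i for i in range(6) if rng.random() < 0.5} or {0}  # non-empty event
    results.append(fact_a18_holds(p, q, event))
all_ok = all(results)
```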

Corollary A.19.

Let $P:\Omega\to{\mathbb{R}}$ be a probability distribution, let $X$ , $Y$ and $Z$ be random variables on $\Omega$ , and let $E$ be an event such that $\Pr_{z\leftarrow P_{Z}}[P(E|Z=z)\geq\delta]\geq 1-\tau$ , and let $\tilde{P}$ denote $P|E$ . Then

\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big{[}d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})\big{]}\leq\tau+\frac{2\cdot\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big{[}d_{\mathsf{TV}}(P_{X|Z=z},P_{Y|Z=z})\big{]}}{\delta}.

Proof.

\begin{aligned}
&\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big[d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})\big]\\
&\quad=\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big[1_{P(E|Z=z)<\delta}\cdot d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})+1_{P(E|Z=z)\geq\delta}\cdot d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})\big]\\
&\quad\leq\tau+\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\big[1_{P(E|Z=z)\geq\delta}\cdot d_{\mathsf{TV}}(\tilde{P}_{X|Z=z},\tilde{P}_{Y|Z=z})\big]\\
&\quad\leq\tau+\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\left[1_{P(E|Z=z)\geq\delta}\cdot\frac{2\cdot d_{\mathsf{TV}}(P_{X|Z=z},P_{Y|Z=z})}{P(E|Z=z)}\right]\\
&\quad\leq\tau+\frac{2\cdot\mathop{\mathbb{E}}_{z\leftarrow P_{Z}}\left[d_{\mathsf{TV}}(P_{X|Z=z},P_{Y|Z=z})\right]}{\delta}.
\end{aligned}

∎

Appendix B Fourier Analysis

For any (finite) vector space $V$ over ${\mathbb{F}}_{2}$ , the character group of $V$ , denoted $\hat{V}$ , is the set of group homomorphisms mapping $V$ (viewed as an additive group) to $\{\pm 1\}$ (viewed as a multiplicative group). Each such homomorphism is called a character of $V$ .

We distinguish the space of functions $V\to{\mathbb{R}}$ from the space of functions $\hat{V}\to{\mathbb{R}}$ and view them as two different inner product spaces. For functions mapping $V\to{\mathbb{R}}$, we define the inner product

\langle f,g\rangle\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\mathop{\mathbb{E}}_{x\leftarrow V}\left[f(x){g(x)}\right],

and for functions mapping $\hat{V}\to{\mathbb{R}}$ , we define the inner product

\langle\hat{f},\hat{g}\rangle\stackrel{\mathsf{def}}{=}\sum_{\chi\in\hat{V}}\hat{f}(\chi)\cdot\hat{g}(\chi).

If there is danger of ambiguity, we use $\hat{\langle}\cdot,\cdot\hat{\rangle}$ to denote the latter inner product, and $\hat{\|}\cdot\hat{\|}$ to denote its corresponding norm.

Fact B.1.

Given a choice of basis for $V$ , there is a canonical isomorphism between $V$ and $\hat{V}$ . Specifically, if $V={\mathbb{F}}_{2}^{n}$ , then the characters of $V$ are the functions of the form

\chi_{\gamma}(v)=(-1)^{\gamma\cdot v}

for $\gamma\in{\mathbb{F}}_{2}^{n}$ .

Definition B.2.

For any function $f:V\to{\mathbb{R}}$ , its Fourier transform is the function $\hat{f}:\hat{V}\to{\mathbb{R}}$ defined by

\hat{f}(\chi)\stackrel{{\scriptstyle\mathsf{def}}}{{=}}\langle f,\chi\rangle=\mathop{\mathbb{E}}_{x\leftarrow V}\left[f(x)\chi(x)\right].

One can verify that the characters of $V$ are orthonormal. Together with the assumption that $V$ is finite, we can deduce that $f$ is equal to $\sum_{\chi\in\hat{V}}\hat{f}(\chi)\cdot\chi$ .

Theorem B.3 (Plancherel).

For any $f,g:V\to{\mathbb{R}}$ ,

\langle f,g\rangle=\langle\hat{f},\hat{g}\rangle.

An important special case of Plancherel’s theorem is Parseval’s theorem:

Theorem B.4 (Parseval).

For any $f:V\to{\mathbb{R}}$ ,

\|f\|=\|\hat{f}\|.
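
The definitions above can be exercised on a small case. The following sketch takes $V={\mathbb{F}}_{2}^{3}$, identifies characters with vectors $\gamma$ as in Fact B.1, and checks the inversion formula, Plancherel's theorem and Parseval's theorem for random real-valued functions. The dimension `n = 3`, the random functions, and all helper names are assumptions made for illustration.

```python
import itertools
import random

n = 3
V = list(itertools.product([0, 1], repeat=n))    # elements of F_2^n

def chi(gamma, v):
    """Character chi_gamma(v) = (-1)^(gamma . v)."""
    return -1 if sum(g * x for g, x in zip(gamma, v)) % 2 else 1

def fourier(f):
    """hat f(chi_gamma) = E_{x <- V}[ f(x) * chi_gamma(x) ]."""
    return {g: sum(f[x] * chi(g, x) for x in V) / len(V) for g in V}

def inner(f, g):
    """Inner product on functions V -> R (an expectation over V)."""
    return sum(f[x] * g[x] for x in V) / len(V)

def inner_hat(fh, gh):
    """Inner product on functions hat V -> R (a sum over characters)."""
    return sum(fh[c] * gh[c] for c in fh)

rng = random.Random(0)
f = {x: rng.uniform(-1, 1) for x in V}
g = {x: rng.uniform(-1, 1) for x in V}
fh, gh = fourier(f), fourier(g)

plancherel_gap = abs(inner(f, g) - inner_hat(fh, gh))
parseval_gap = abs(inner(f, f) - inner_hat(fh, fh))   # squared norms agree
inversion_gap = max(abs(f[x] - sum(fh[c] * chi(c, x) for c in V)) for x in V)
```

All three gaps vanish up to floating-point error. The asymmetry between the two inner products (an expectation on $V$, a sum on $\hat{V}$) is what makes the identities hold without extra normalization factors.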

Appendix C Bound on Optimization Problem

Let $W:{\mathbb{R}}^{+}\to{\mathbb{R}}^{+}$ denote the inverse of the function $x\mapsto x\cdot e^{x}$ ( $W$ is known in the literature as the (principal branch of the) Lambert W function). We rely on the following theorem:

Theorem C.1 ([HH00, Corollary 2.4]).

There exists a constant $C$ (in particular, $C=\ln\left(1+\frac{1}{e}\right)$ works) such that for all $y\geq e$ ,

W(y)\leq\ln y-\ln\ln y+C.

The following corollary is more directly suited to our needs.

Corollary C.2.

For any $A,B>0$ satisfying $A\geq eB$ ,

\min_{\tau\in(0,1)}\frac{A}{\ln\left(\frac{1}{\tau}\right)}+\frac{B}{\tau}\leq\frac{4A}{\ln(A/B)}.

Proof.

The minimum is achieved (up to a factor of two) when $\frac{A}{\ln\left(\frac{1}{\tau}\right)}=\frac{B}{\tau}$ because $\frac{A}{\ln\left(\frac{1}{\tau}\right)}$ is monotonically increasing with $\tau$ while $\frac{B}{\tau}$ is monotonically decreasing. Making the change of variables $z=-\ln(\tau)$ , this is equivalent to $ze^{z}=\frac{A}{B}$ , i.e. $z=W(\frac{A}{B})$ . This choice of $z$ (or equivalently $\tau$ ) gives

$\displaystyle\frac{A}{\ln\left(\frac{1}{\tau}\right)}+\frac{B}{\tau}$	$\displaystyle=\frac{2A}{W(A/B)}$
	$\displaystyle=2B\cdot\frac{A/B}{W(A/B)}$
	$\displaystyle=2B\cdot\exp\big{(}W(A/B)\big{)}$	(Definition of $W$ )
	$\displaystyle\leq\frac{2A\cdot(1+e^{-1})}{\ln(A/B)}$	(Theorem C.1)
	$\displaystyle\leq\frac{4A}{\ln(A/B)}.$
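
The choice of $\tau$ made in the proof can be checked numerically. The sketch below computes $W$ by Newton's method (an implementation choice for this illustration, not something the paper prescribes), sets $\tau=e^{-W(A/B)}$, and compares the resulting objective value with $\frac{2A}{W(A/B)}$ and with the bound $\frac{4A}{\ln(A/B)}$. The sample pairs $(A,B)$ are arbitrary values satisfying $A\geq eB$.

```python
import math

def lambert_w(y, iters=50):
    """Principal branch W(y) for y > 0, via Newton's method on w * e^w = y."""
    w = math.log(1 + y)      # start at or to the right of the root
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - y) / (ew * (w + 1))
    return w

def objective(A, B, tau):
    """The quantity A / ln(1/tau) + B / tau minimized in Corollary C.2."""
    return A / math.log(1 / tau) + B / tau

C = math.log(1 + 1 / math.e)     # constant from Theorem C.1

results = []
for A, B in [(math.e, 1.0), (10.0, 1.0), (1e3, 2.0), (1e6, 0.5)]:
    y = A / B
    z = lambert_w(y)
    tau = math.exp(-z)           # the choice of tau made in the proof
    value = objective(A, B, tau)
    bound = 4 * A / math.log(y)
    c1_holds = z <= math.log(y) - math.log(math.log(y)) + C + 1e-12
    results.append((value, 2 * A / z, bound, c1_holds))
```

For each pair, the first two entries agree because both terms of the objective equal $A/W(A/B)$ at this $\tau$, and the value stays below the stated bound.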

References

[BGKW88] Michael Ben-Or, Shafi Goldwasser, Joe Kilian, and Avi Wigderson. Multi-prover interactive proofs: How to remove intractability assumptions. In STOC, pages 113–131. ACM, 1988.
[BJKS04] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci., 68(4):702–732, 2004.
[CHTW04] Richard Cleve, Peter Høyer, Benjamin Toner, and John Watrous. Consequences and limits of nonlocal strategies. In CCC, pages 236–249. IEEE Computer Society, 2004.
[DHVY17] Irit Dinur, Prahladh Harsha, Rakesh Venkat, and Henry Yuen. Multiplayer parallel repetition for expanding games. In ITCS, volume 67 of LIPIcs, pages 37:1–37:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
[EPR35] Albert Einstein, Boris Podolsky, and Nathan Rosen. Can quantum-mechanical description of physical reality be considered complete? Physical Review, 47(10):777, 1935.
[Fei91] Uriel Feige. On the success probability of the two provers in one-round proof systems. In Structure in Complexity Theory Conference, pages 116–123. IEEE Computer Society, 1991.
[FGL⁺91] Uriel Feige, Shafi Goldwasser, László Lovász, Shmuel Safra, and Mario Szegedy. Approximating clique is almost NP-complete (preliminary version). In FOCS, pages 2–12. IEEE Computer Society, 1991.
[FK91] H. Furstenberg and Y. Katznelson. A density version of the Hales-Jewett theorem. Journal d’Analyse Mathématique, 57(1):64–119, December 1991.
[For89] Lance Jeremy Fortnow. Complexity-theoretic aspects of interactive proof systems. PhD thesis, MIT, 1989.
[FRS94] Lance Fortnow, John Rompel, and Michael Sipser. On the power of multi-prover interactive protocols. Theor. Comput. Sci., 134(2):545–557, 1994.
[FV96] Uriel Feige and Oleg Verbitsky. Error reduction by parallel repetition - a negative result. In Steven Homer and Jin-Yi Cai, editors, CCC, pages 70–76. IEEE Computer Society, 1996.
[GHZ89] Daniel M. Greenberger, Michael A. Horne, and Anton Zeilinger. Going Beyond Bell’s Theorem, pages 69–72. Springer Netherlands, Dordrecht, 1989.
[HH00] Abdolhossein Hoorfar and Mehdi Hassani. Inequalities on the Lambert W function and hyperpower function. J. Inequal. Pure and Appl. Math, 2000.
[Hol09] Thomas Holenstein. Parallel repetition: Simplification and the no-signaling case. Theory Comput., 5(1):141–172, 2009.
[HY19] Justin Holmgren and Lisa Yang. The parallel repetition of non-signaling games: counterexamples and dichotomy. In STOC, pages 185–192. ACM, 2019.
[MS13] Carl A. Miller and Yaoyun Shi. Optimal robust self-testing by binary nonlocal XOR games. In TQC, volume 22 of LIPIcs, pages 254–262. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2013.
[Pol12] D.H.J. Polymath. A new proof of the density Hales-Jewett theorem. Annals of Mathematics, 175(3):1283–1327, May 2012.
[PRW97] Itzhak Parnafes, Ran Raz, and Avi Wigderson. Direct product results and the GCD problem, in old and new communication models. In Frank Thomson Leighton and Peter W. Shor, editors, STOC, pages 363–372. ACM, 1997.
[Raz98] Ran Raz. A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998.
[Raz11] Ran Raz. A counterexample to strong parallel repetition. SIAM J. Comput., 40(3):771–777, 2011.
[Ver96] Oleg Verbitsky. Towards the parallel repetition conjecture. Theor. Comput. Sci., 157(2):277–282, 1996.
[Yue16] Henry Yuen. A parallel repetition theorem for all entangled games. In ICALP, volume 55 of LIPIcs, pages 77:1–77:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.

