
The Approximate Degree of DNF and CNF Formulas

Alexander A. Sherstov
Abstract.

The approximate degree of a Boolean function $f\colon\{0,1\}^{n}\to\{0,1\}$ is the minimum degree of a real polynomial $p$ that approximates $f$ pointwise: $|f(x)-p(x)|\leqslant 1/3$ for all $x\in\{0,1\}^{n}$. For every $\delta>0$, we construct CNF and DNF formulas of polynomial size with approximate degree $\Omega(n^{1-\delta})$, essentially matching the trivial upper bound of $n$. This improves polynomially on previous lower bounds and fully resolves the approximate degree of constant-depth circuits ($\mathsf{AC}^{0}$), a question that has seen extensive research over the past 10 years. Prior to our work, an $\Omega(n^{1-\delta})$ lower bound was known only for $\mathsf{AC}^{0}$ circuits of depth that grows with $1/\delta$ (Bun and Thaler, FOCS 2017). Furthermore, the CNF and DNF formulas that we construct are the simplest possible in that they have constant width. Our result holds even for one-sided approximation: for any $\delta>0$, we construct a polynomial-size constant-width CNF formula with one-sided approximate degree $\Omega(n^{1-\delta})$.

Our work has the following consequences.

  (i) We essentially settle the communication complexity of $\mathsf{AC}^{0}$ circuits in the bounded-error quantum model, $k$-party number-on-the-forehead randomized model, and $k$-party number-on-the-forehead nondeterministic model: we prove that for every $\delta>0$, these models require $\Omega(n^{1-\delta})$, $\Omega(n/4^{k}k^{2})^{1-\delta}$, and $\Omega(n/4^{k}k^{2})^{1-\delta}$, respectively, bits of communication even for polynomial-size constant-width CNF formulas.

  (ii) In particular, we show that the multiparty communication class $\mathsf{coNP}_{k}$ can be separated essentially optimally from $\mathsf{NP}_{k}$ and $\mathsf{BPP}_{k}$ by a particularly simple function, a polynomial-size constant-width CNF formula.

  (iii) We give an essentially tight separation, of $O(1)$ versus $\Omega(n^{1-\delta})$, for the one-sided versus two-sided approximate degree of a function; and $O(1)$ versus $\Omega(n^{1-\delta})$ for the one-sided approximate degree of a function $f$ versus its negation $\neg f$.

Our proof departs significantly from previous approaches and contributes a novel, number-theoretic method for amplifying approximate degree.

This manuscript is a much-expanded version of the STOC ’22 paper, with several new results. Work supported by NSF grants CCF-1814947 and CCF-2220232.
      Author affiliation: Computer Science Department, UCLA, Los Angeles, CA 90095. Email: sherstov@cs.ucla.edu

1. Introduction

Representations of Boolean functions by real polynomials play a central role in theoretical computer science. Our focus in this paper is on approximate degree, a particularly natural and useful complexity measure. Formally, the $\epsilon$-approximate degree of a Boolean function $f\colon\{0,1\}^{n}\to\{0,1\}$ is denoted $\deg_{\epsilon}(f)$ and defined as the minimum degree of a real polynomial $p$ that approximates $f$ within $\epsilon$ pointwise: $|f(x)-p(x)|\leqslant\epsilon$ for all $x\in\{0,1\}^{n}$. The standard choice of the error parameter is $\epsilon=1/3$, which is a largely arbitrary setting that can be replaced by any other constant in $(0,1/2)$ without affecting the approximate degree by more than a multiplicative constant. Since every function $f\colon\{0,1\}^{n}\to\{0,1\}$ can be computed with zero error by a polynomial of degree at most $n$, the $\epsilon$-approximate degree is always at most $n$.
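The trivial upper bound of $n$ can be checked mechanically on small examples. The following is a minimal sketch, not taken from the paper: it recovers the unique multilinear polynomial that agrees with a given function on $\{0,1\}^{n}$ by Möbius inversion and confirms that its degree is at most $n$. The helper names (`multilinear_coefficients`, `evaluate`, `degree`, `or_function`) and the choice of OR on three bits are illustrative assumptions.

```python
from itertools import product


def multilinear_coefficients(f, n):
    """Coefficients {S: c_S} of the unique multilinear polynomial agreeing with f on {0,1}^n."""
    coeffs = {}
    for x in product((0, 1), repeat=n):
        support = [i for i in range(n) if x[i]]
        # Moebius inversion: c_S = sum over subsets T of S of (-1)^(|S|-|T|) * f(1_T).
        total = 0
        for mask in product((0, 1), repeat=len(support)):
            chosen = {support[j] for j in range(len(support)) if mask[j]}
            y = tuple(1 if i in chosen else 0 for i in range(n))
            total += (-1) ** (len(support) - len(chosen)) * f(y)
        if total != 0:
            coeffs[frozenset(support)] = total
    return coeffs


def evaluate(coeffs, x):
    """Evaluate sum_S c_S * prod_{i in S} x_i at a 0/1 point x."""
    return sum(c for S, c in coeffs.items() if all(x[i] for i in S))


def degree(coeffs):
    """Largest monomial size with a nonzero coefficient."""
    return max((len(S) for S in coeffs), default=0)


def or_function(x):
    return int(any(x))


# Illustrative example (an assumption of this sketch): OR on n = 3 bits.
n = 3
p = multilinear_coefficients(or_function, n)
exact = all(evaluate(p, x) == or_function(x) for x in product((0, 1), repeat=n))
print(exact, degree(p))
```

The polynomial is exact, so it is in particular an $\epsilon$-approximant for every $\epsilon\geqslant 0$; the interesting question, studied in this paper, is how far below $n$ the degree can drop once a constant error is allowed.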

The notion of approximate degree originated three decades ago in the pioneering work of Nisan and Szegedy [31] and has since proved to be a powerful tool in theoretical computer science. Upper bounds on approximate degree have algorithmic applications, whereas lower bounds are a staple in complexity theory. On the algorithmic side, approximate degree underlies many of the strongest results obtained to date in computational learning, differentially private data release, and algorithm design in general. In complexity theory, the notion of approximate degree has produced breakthroughs in quantum query complexity, communication complexity, and circuit complexity. A detailed bibliographic overview of these applications can be found in [47, 17].

Approximate degree has been particularly prominent in the study of $\mathsf{AC}^{0}$, the class of polynomial-size constant-depth circuits with gates $\vee,\wedge,\neg$ of unbounded fan-in. The simplest functions in $\mathsf{AC}^{0}$ are conjunctions and disjunctions, which have depth $1$, followed by polynomial-size CNF and DNF formulas, which have depth $2$, followed in turn by higher-depth circuits. Lower bounds on the approximate degree of $\mathsf{AC}^{0}$ functions have been used to settle the quantum query complexity of Grover search [8], element distinctness [1], and a host of other problems [14]; resolve the communication complexity of set disjointness in the two-party quantum model [33, 38] and number-on-the-forehead multiparty model [37, 38, 28, 20, 36, 11, 44, 43]; separate the communication complexity classes $\mathsf{PP}$ and $\mathsf{UPP}$ [13, 37]; and separate the polynomial hierarchy in communication complexity from the communication class $\mathsf{UPP}$ [34]. Despite this array of applications and decades of study, our understanding of the approximate degree of $\mathsf{AC}^{0}$ has remained surprisingly fragmented and incomplete. In this paper, we set out to resolve this question in full.

In more detail, previous work on the approximate degree of $\mathsf{AC}^{0}$ started with the seminal 1994 paper of Nisan and Szegedy [31], who proved that the OR function on $n$ bits has approximate degree $\Theta(\sqrt{n})$. This was the best result until Aaronson and Shi’s celebrated lower bound of $\Omega(n^{2/3})$ for the element distinctness problem [1]. In a beautiful paper from 2017, Bun and Thaler [17] showed that $\mathsf{AC}^{0}$ contains functions in $n$ variables with approximate degree $\Omega(n^{1-\delta})$, where the constant $\delta>0$ can be made arbitrarily small at the expense of increasing the depth of the circuit. In follow-up work, Bun and Thaler [18] proved an $\Omega(n^{1-\delta})$ lower bound for approximating $\mathsf{AC}^{0}$ circuits even with error exponentially close to $1/2$, where once again the circuit depth grows with $1/\delta$. A stronger yet result was obtained by Sherstov and Wu [49], who showed that $\mathsf{AC}^{0}$ has essentially the maximum possible threshold degree (defined as the limit of $\epsilon$-approximate degree as $\epsilon\nearrow 1/2$) and sign-rank (a generalization of threshold degree to arbitrary bases rather than just the basis of monomials). Quantitatively, the authors of [49] proved a lower bound of $\Omega(n^{1-\delta})$ for threshold degree and $\exp(\Omega(n^{1-\delta}))$ for sign-rank, essentially matching the trivial upper bounds. As before, $\delta>0$ can be made arbitrarily small at the expense of increasing the circuit depth. In particular, $\mathsf{AC}^{0}$ requires a polynomial of degree $\Omega(n^{1-\delta})$ even for approximation to error doubly (triply, quadruply, quintuply…) exponentially close to $1/2$.

The lower bounds of [17, 18, 49] show that $\mathsf{AC}^{0}$ functions have essentially the maximum possible complexity, but only if one is willing to look at circuits of arbitrarily large constant depth. What happens at small depths has been a wide open problem, with no techniques to address it. Bun and Thaler observe that their $\mathsf{AC}^{0}$ circuit in [17] with approximate degree $\Omega(n^{1-\delta})$ can be flattened to produce a DNF formula of size $\exp(\log^{O(\log(1/\delta))}n)$, but this is superpolynomial and thus no longer in $\mathsf{AC}^{0}$. The only progress of which we are aware is an $\Omega(n^{3/4-\delta})$ lower bound obtained for polynomial-size DNF formulas in [14, 29]. This leaves a polynomial gap in the approximate degree for small depth versus arbitrary constant depth. Our main contribution is to definitively resolve the approximate degree of $\mathsf{AC}^{0}$ by constructing, for any constant $\delta>0$, a polynomial-size DNF formula with approximate degree $\Omega(n^{1-\delta})$. We now describe our main result and its generalizations and applications.

1.1. Approximate degree of DNF and CNF formulas

Recall that a literal is a Boolean variable $x_{1},x_{2},\ldots,x_{n}$ or its negation $\overline{x_{1}},\overline{x_{2}},\ldots,\overline{x_{n}}$. A conjunction of literals is called a term, and a disjunction of literals is called a clause. The width of a term or clause is the number of literals that it contains. A DNF formula is a disjunction of terms, and analogously a CNF formula is a conjunction of clauses. The width of a DNF or CNF formula is the maximum width of a term or clause in it, respectively. One often refers to DNF and CNF formulas of width $k$ as $k$-DNF and $k$-CNF formulas. The size of a DNF or CNF formula is the total number of terms or clauses, respectively, that it contains. Thus, $\mathsf{AC}^{0}$ circuits of depth $1$ correspond precisely to clauses and terms, whereas $\mathsf{AC}^{0}$ circuits of depth $2$ correspond precisely to polynomial-size DNF and CNF formulas. Our main result on approximate degree is as follows.
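The definitions of term, clause, width, and size translate directly into a small data structure. The following is a minimal sketch under assumed conventions that are not from the paper: a literal is a pair `(index, negated)`, a term or clause is a list of literals, and a formula is a list of terms or clauses. The example formula and all function names are hypothetical.

```python
from itertools import product


def eval_literal(literal, x):
    """Value of the literal x_i or its negation at the 0/1 point x."""
    index, negated = literal
    return 1 - x[index] if negated else x[index]


def eval_dnf(terms, x):
    """A DNF formula is a disjunction of terms (conjunctions of literals)."""
    return int(any(all(eval_literal(l, x) for l in term) for term in terms))


def eval_cnf(clauses, x):
    """A CNF formula is a conjunction of clauses (disjunctions of literals)."""
    return int(all(any(eval_literal(l, x) for l in clause) for clause in clauses))


def width(formula):
    """Maximum number of literals in a term or clause."""
    return max((len(part) for part in formula), default=0)


def size(formula):
    """Total number of terms or clauses."""
    return len(formula)


def negate(formula):
    """De Morgan: flipping every literal of a DNF gives a CNF for the negated
    function (and vice versa), with the same width and size."""
    return [[(index, not negated) for index, negated in part] for part in formula]


# Hypothetical example: f = (x0 AND NOT x1) OR (x1 AND x2), a 2-DNF of size 2.
dnf = [[(0, False), (1, True)], [(1, False), (2, False)]]
cnf = negate(dnf)
agree = all(eval_cnf(cnf, x) == 1 - eval_dnf(dnf, x) for x in product((0, 1), repeat=3))
print(width(dnf), size(dnf), agree)
```

The `negate` helper reflects why results for DNF formulas carry over to CNF formulas of the same width and size whenever the complexity measure in question is invariant under negation.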

Theorem 1.1 (Main result).

Let $\delta>0$ be any constant. Then for each $n\geqslant 1$, there is an (explicitly given) function $f\colon\{0,1\}^{n}\to\{0,1\}$ that has approximate degree

$$\deg_{1/3}(f)=\Omega(n^{1-\delta})$$

and is computable by a DNF formula of size $n^{O(1)}$ and width $O(1)$.

Theorem 1.1 almost matches the trivial upper bound of $n$ on the approximate degree of any function. Thus, the theorem shows that $\mathsf{AC}^{0}$ circuits of depth $2$ already achieve essentially the maximum possible approximate degree. This depth cannot be reduced further because $\mathsf{AC}^{0}$ circuits of depth $1$ have approximate degree $O(\sqrt{n})$. Finally, the DNF formulas constructed in Theorem 1.1 are the simplest possible in that they have constant width.

Recall that previously, a lower bound of $\Omega(n^{1-\delta})$ for $\mathsf{AC}^{0}$ was known only for circuits of large constant depth that grows with $1/\delta$. The lack of progress on small-depth $\mathsf{AC}^{0}$ prior to this paper had experts seriously entertaining [18] the possibility that $\mathsf{AC}^{0}$ circuits of any given depth $d$ have approximate degree $O(n^{1-\delta_{d}})$, for some constant $\delta_{d}=\delta_{d}(d)>0$. Such an upper bound would have far-reaching consequences in computational learning and circuit complexity. Theorem 1.1 rules it out.

1.2. Large-error approximation

Any Boolean function can be approximated pointwise within $1/2$ in a trivial manner, by a constant polynomial. Approximation within $\frac{1}{2}-o(1)$, on the other hand, is a meaningful and extremely useful notion. We obtain the following strengthening of our main result, in which the approximation error is relaxed from $1/3$ to an optimal $\frac{1}{2}-\frac{1}{n^{\Theta(1)}}$.

Theorem 1.2 (Main result for large error).

Let $\delta>0$ and $C\geqslant 1$ be any constants. Then for each $n\geqslant 1$, there is an (explicitly given) function $f\colon\{0,1\}^{n}\to\{0,1\}$ that has approximate degree

$$\deg_{\frac{1}{2}-\frac{1}{n^{C}}}(f)=\Omega(n^{1-\delta})$$

and is computable by a DNF formula of size $n^{O(1)}$ and width $O(1)$.

To rephrase Theorem 1.2, polynomial-size DNF formulas require degree $\Omega(n^{1-\delta})$ for approximation not only to constant error but even to error $\frac{1}{2}-\frac{1}{n^{C}}$, where $C\geqslant 1$ is an arbitrarily large constant. The error parameter in Theorem 1.2 cannot be relaxed further to $\frac{1}{2}-\frac{1}{n^{\omega(1)}}$ because any DNF formula with $m$ terms can be approximated to error $\frac{1}{2}-\Omega(\frac{1}{m})$ by a polynomial of degree $O(\sqrt{n\log m})$.

Negating a function has no effect on the approximate degree. Indeed, if $f$ is approximated to error $\epsilon$ by a polynomial $p$, then the negated function $\neg f=1-f$ is approximated to the same error $\epsilon$ by the polynomial $1-p$. With this observation, Theorems 1.1 and 1.2 carry over to CNF formulas:
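The negation argument amounts to the identity $|(1-f(x))-(1-p(x))|=|f(x)-p(x)|$, which can be checked on a toy instance. The following is a minimal sketch; the function (AND on two bits) and its degree-1 approximant $(x_{1}+x_{2})/3$ are illustrative choices that do not appear in the paper.

```python
from fractions import Fraction
from itertools import product


def max_error(f, p, n):
    """Largest pointwise gap |f(x) - p(x)| over the Boolean cube {0,1}^n."""
    return max(abs(f(x) - p(x)) for x in product((0, 1), repeat=n))


def and2(x):
    return x[0] & x[1]


def p(x):
    # Illustrative degree-1 approximant of AND on two bits (an assumption of this sketch).
    return Fraction(x[0] + x[1], 3)


def not_and2(x):
    return 1 - and2(x)


def one_minus_p(x):
    return 1 - p(x)


error_f = max_error(and2, p, 2)
error_not_f = max_error(not_and2, one_minus_p, 2)
print(error_f, error_not_f)
```

Exact rational arithmetic is used so that the comparison with the threshold $1/3$ is not affected by floating-point rounding.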

Corollary 1.3.

Let $\delta>0$ and $C\geqslant 1$ be any constants. Then for each $n\geqslant 1$, there is an (explicitly given) function $g\colon\{0,1\}^{n}\to\{0,1\}$ that has approximate degree

$$\deg_{\frac{1}{2}-\frac{1}{n^{C}}}(g)=\Omega(n^{1-\delta})$$

and is computable by a CNF formula of size $n^{O(1)}$ and width $O(1)$.

1.3. One-sided approximation

There is a natural notion of one-sided approximation for Boolean functions. Specifically, the one-sided $\epsilon$-approximate degree of a function $f\colon\{0,1\}^{n}\to\{0,1\}$ is defined as the minimum degree of a real polynomial $p$ such that

\begin{align*}
f(x)=0 &\qquad\Rightarrow\qquad p(x)\in[-\epsilon,\epsilon],\\
f(x)=1 &\qquad\Rightarrow\qquad p(x)\in[1-\epsilon,+\infty)
\end{align*}

for every $x\in\{0,1\}^{n}$. This complexity measure is denoted $\deg^{+}_{\epsilon}(f)$. It plays a considerable role [23, 15, 40, 16, 45, 44, 43] in the area, both in its own right and due to its applications to other asymmetric notions of computation such as nondeterminism and Merlin–Arthur protocols. One-sided approximation is meaningful for any error parameter $\epsilon\in[0,1/2)$, and as before the standard setting is $\epsilon=1/3$. By definition, one-sided approximate degree is always at most $n$. Observe that the definitions of $\epsilon$-approximate degree $\deg_{\epsilon}(f)$ and its one-sided variant $\deg^{+}_{\epsilon}(f)$ impose the same requirement for inputs $x\in f^{-1}(0)$: the approximating polynomial must approximate $f$ within $\epsilon$ at each such $x$. For inputs $x\in f^{-1}(1)$, on the other hand, the definitions of $\deg_{\epsilon}(f)$ and $\deg^{+}_{\epsilon}(f)$ diverge dramatically, with one-sided $\epsilon$-approximate degree not requiring any upper bound on the approximating polynomial $p$. As a result, one always has $\deg^{+}_{\epsilon}(f)\leqslant\deg_{\epsilon}(f)$, and it is reasonable to expect a large gap between the two quantities for some $f$. Moreover, the one-sided approximate degree of a function is in general not equal to that of its negation: $\deg^{+}_{\epsilon}(f)\neq\deg^{+}_{\epsilon}(\neg f)$. This contrasts with the equality $\deg_{\epsilon}(f)=\deg_{\epsilon}(\neg f)$ for two-sided approximation.

In this light, there are three particularly natural questions to ask about one-sided approximate degree:

  (i) What is the one-sided approximate degree of $\mathsf{AC}^{0}$ circuits?

  (ii) What is the largest possible gap between approximate degree and one-sided approximate degree?

  (iii) What is the largest possible gap between the one-sided approximate degree of a function $f$ and that of its negation $\neg f$?

In this paper, we resolve all three questions in detail. For question (i), we prove that polynomial-size CNF formulas achieve essentially the maximum possible one-sided approximate degree. In fact, our result holds even for approximation to error vanishingly close to random guessing, $\frac{1}{2}-o(1)$:

Theorem 1.4.

Let $\delta>0$ and $C\geqslant 1$ be any constants. Then for each $n\geqslant 1$, there is an (explicitly given) function $g\colon\{0,1\}^{n}\to\{0,1\}$ that has one-sided approximate degree

$$\deg^{+}_{\frac{1}{2}-\frac{1}{n^{C}}}(g)=\Omega(n^{1-\delta})$$

and is computable by a CNF formula of size $n^{O(1)}$ and width $O(1)$.

Theorem 1.4 essentially settles the one-sided approximate degree of $\mathsf{AC}^{0}$. The theorem is optimal with respect to circuit depth; recall that depth-$1$ circuits have approximate degree $O(\sqrt{n})$ and hence also one-sided approximate degree $O(\sqrt{n})$. Previous work on the one-sided approximate degree of $\mathsf{AC}^{0}$ was suboptimal with respect to the degree bound and/or circuit depth. Specifically, the best previous lower bounds were $\Omega(n/\log n)^{2/3}$ due to Bun and Thaler [16] for a polynomial-size CNF formula, and $\Omega(n^{1-\delta})$ due to Sherstov and Wu [49] for $\mathsf{AC}^{0}$ circuits of depth that grows with $1/\delta$.

As an application of Theorem 1.4, we resolve questions (ii) and (iii) in full, establishing a gap of $O(1)$ versus $\Omega(n^{1-\delta})$ in each case. Moreover, we prove that these gaps remain valid well beyond the standard error regime of $\epsilon=1/3$. A detailed statement of our separations follows.

Corollary 1.5.

Let $\delta>0$ and $C\geqslant 1$ be any constants. Then for each $n\geqslant 1$, there is an (explicitly given) function $f\colon\{0,1\}^{n}\to\{0,1\}$ with

$$\deg^{+}_{0}(f)=O(1) \tag{1.1}$$

but

\begin{align}
\deg_{\frac{1}{2}-\frac{1}{n^{C}}}(f)&=\Omega(n^{1-\delta}), \tag{1.2}\\
\deg^{+}_{\frac{1}{2}-\frac{1}{n^{C}}}(\neg f)&=\Omega(n^{1-\delta}). \tag{1.3}
\end{align}

Moreover, $f$ is computable by a DNF formula of size $n^{O(1)}$ and width $O(1)$.

Equations (1.1) and (1.2) in this result give the promised $O(1)$ versus $\Omega(n^{1-\delta})$ separation for question (ii). Analogously, (1.1) and (1.3) give an $O(1)$ versus $\Omega(n^{1-\delta})$ separation for question (iii). Of particular note in both separations is the error regime: the upper bound remains valid even under the stronger requirement of zero error, whereas the lower bounds remain valid even under the weaker requirement of error $\frac{1}{2}-o(1)$. Our separations improve on previous work. For question (ii), the best previous separation was $(\log n)^{O_{\delta}(1)}$ versus $\Omega(n^{1-\delta})$ for any fixed $\delta>0$, implicit in [17]. For the harder question (iii), the best previous separation [16] was $O(\log n)$ versus $\Omega(n/\log n)^{2/3}$, which is polynomially weaker than ours.

The derivation of Corollary 1.5 from Theorem 1.4 is short and illustrative, and we include it here.

Proof of Corollary 1.5.

Let $g$ be the function from Theorem 1.4, and set $f=\neg g$. Then (1.3) is immediate. Equation (1.2) follows from (1.3) in light of the basic relations $\deg_{\epsilon}(f)=\deg_{\epsilon}(\neg f)\geqslant\deg^{+}_{\epsilon}(\neg f)$, valid for all $f$ and $\epsilon$. Finally, (1.1) can be seen as follows. Since $g$ is a CNF formula of width $O(1)$, its negation $f$ is a DNF formula of width $O(1)$. Thus, every term of $f$ can be represented exactly by a polynomial of degree $O(1)$. Summing these polynomials gives a $0$-error one-sided approximant for $f$.
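The last step of this proof can be illustrated concretely. The following is a minimal sketch, with a hypothetical 2-DNF on four variables and hypothetical helper names: each term is written as a product of factors $x_{i}$ or $1-x_{i}$, the products are summed, and the sum is checked against the one-sided requirement with $\epsilon=0$.

```python
from itertools import product


def term_value(term, x):
    """Product of x_i or (1 - x_i) over the term's literals: a polynomial of
    degree len(term) that equals the term on {0,1}^n."""
    value = 1
    for index, negated in term:
        value *= (1 - x[index]) if negated else x[index]
    return value


def is_one_sided_approximant(f, p, n, eps):
    """Check p(x) in [-eps, eps] when f(x) = 0 and p(x) >= 1 - eps when f(x) = 1."""
    for x in product((0, 1), repeat=n):
        if f(x) == 0 and not (-eps <= p(x) <= eps):
            return False
        if f(x) == 1 and not (p(x) >= 1 - eps):
            return False
    return True


# Hypothetical 2-DNF: (x0 AND x1) OR (x1 AND NOT x2) OR (x0 AND x3).
terms = [[(0, False), (1, False)], [(1, False), (2, True)], [(0, False), (3, False)]]
n = 4


def f(x):
    return int(any(term_value(t, x) for t in terms))


def p(x):
    # Sum of the term polynomials; its degree is the width of the DNF formula.
    return sum(term_value(t, x) for t in terms)


one_sided_ok = is_one_sided_approximant(f, p, n, 0)
largest_value = max(p(x) for x in product((0, 1), repeat=n))
print(one_sided_ok, largest_value)
```

The sum counts the satisfied terms, so it vanishes exactly on $f^{-1}(0)$ and is at least $1$ on $f^{-1}(1)$. It can exceed $1$ by a wide margin when several terms are satisfied at once, which is permitted for one-sided approximation but not for two-sided approximation.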

We now discuss applications of our results on approximate degree and one-sided approximate degree to fundamental questions in communication complexity.

1.4. Randomized multiparty communication

We adopt the number-on-the-forehead model of Chandra, Furst, and Lipton [19], which is the most powerful formalism of multiparty communication. The model features $k$ communicating players and a Boolean function $F\colon X_{1}\times X_{2}\times\cdots\times X_{k}\to\{0,1\}$ with $k$ arguments. An input $(x_{1},x_{2},\dots,x_{k})$ is distributed among the $k$ players by giving the $i$-th player the arguments $x_{1},\dots,x_{i-1},x_{i+1},\dots,x_{k}$ but not $x_{i}$. This arrangement can be visualized as having the $k$ players seated in a circle with $x_{i}$ written on the $i$-th player’s forehead, whence the name of the model. Number-on-the-forehead is the canonical model in the area because any other way of assigning arguments to players results in a less powerful model, provided of course that one does not assign all the arguments to some player, in which case there is never a need to communicate.

The players communicate according to a protocol agreed upon in advance. The communication occurs in the form of broadcasts, with a message sent by any given player instantly reaching everyone else. The players’ objective is to compute $F$ on any given input with minimal communication. To this end, the players have access to an unbounded supply of shared random bits which they can use in deciding what message to send at any given point in the protocol. The cost of a protocol is the total bit length of all the messages broadcast in a worst-case execution. The $\epsilon$-error randomized communication complexity $R_{\epsilon}(F)$ of a given function $F$ is the least cost of a protocol that computes $F$ with probability of error at most $\epsilon$ on every input. As with approximate degree, the standard setting of the error parameter is $\epsilon=1/3$.

The number-on-the-forehead communication complexity of constant-depth circuits is a challenging question that has been the focus of extensive research, e.g., [12, 28, 20, 36, 11, 44, 43, 17]. In contrast to the two-party model, where a lower bound of $\Omega(\sqrt{n})$ for $\mathsf{AC}^{0}$ circuits is straightforward to prove from first principles [4], the first $n^{\Omega(1)}$ multiparty lower bound [44] for $\mathsf{AC}^{0}$ was obtained only in 2012. The strongest known multiparty lower bounds for $\mathsf{AC}^{0}$ are obtained using the pattern matrix method of [43], which transforms approximate degree lower bounds in a black-box manner into communication lower bounds. In the most recent application of this method, Bun and Thaler [17] gave a $k$-party communication problem $F\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ in $\mathsf{AC}^{0}$ with communication complexity $\Omega(n/4^{k}k^{2})^{1-\delta}$, where the constant $\delta>0$ can be taken arbitrarily small at the expense of increasing the depth of the $\mathsf{AC}^{0}$ circuit. This shows that $\mathsf{AC}^{0}$ has essentially the maximum possible multiparty communication complexity, as long as one is willing to use circuits of arbitrarily large constant depth. For circuits of small depth, the best lower bound is polynomially weaker: $\Omega(n/4^{k}k^{2})^{3/4-\delta}$ for the $k$-party communication complexity of polynomial-size DNF formulas, which can be proved by applying the pattern matrix method to the approximate degree lower bounds in [14, 29]. This fragmented state of the art closely parallels that for approximate degree prior to our work.

We resolve the multiparty communication complexity of $\mathsf{AC}^{0}$ in detail in the following theorem.

Theorem 1.6.

Fix any constants $\delta\in(0,1]$ and $C\geqslant 1$. Then for all integers $n,k\geqslant 2$, there is an (explicitly given) $k$-party communication problem $F_{n,k}\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ with

\begin{align*}
R_{1/3}(F_{n,k})&\geqslant\left(\frac{n}{c^{\prime}4^{k}k^{2}}\right)^{1-\delta},\\
R_{\frac{1}{2}-\frac{1}{n^{C}}}(F_{n,k})&\geqslant\frac{n^{1-\delta}}{c^{\prime}4^{k}},
\end{align*}

where $c^{\prime}\geqslant 1$ is a constant independent of $n$ and $k$. Moreover, $F_{n,k}$ is computable by a DNF formula of size $n^{c^{\prime}}$ and width $c^{\prime}k$.

Theorem 1.6 essentially represents the state of the art for multiparty communication lower bounds. Indeed, the best communication lower bound to date for any explicit function $F\colon(\{0,1\}^{n})^{k}\to\{0,1\}$, whether or not $F$ is computable by an $\mathsf{AC}^{0}$ circuit, is $\Omega(n/2^{k})$ [6]. Theorem 1.6 comes close to matching the trivial upper bound of $n+1$ for any communication problem, thereby showing that $\mathsf{AC}^{0}$ circuits of depth $2$ achieve nearly the maximum possible communication complexity. Moreover, our result holds not only for bounded-error communication but also for communication with error $\frac{1}{2}-\frac{1}{n^{C}}$ for any $C\geqslant 1$. The error parameter in Theorem 1.6 is optimal and cannot be further increased to $\frac{1}{2}-\frac{1}{n^{\omega(1)}}$; indeed, it is straightforward to see that any DNF formula with $m$ terms has a communication protocol with error $\frac{1}{2}-\Omega(\frac{1}{m})$ and cost $2$ bits. Theorem 1.6 is also optimal with respect to circuit depth because the multiparty communication complexity of $\mathsf{AC}^{0}$ circuits of depth $1$ is at most $2$ bits.

Since randomized communication complexity is invariant under function negation, Theorem 1.6 remains valid with the word “DNF” replaced with “CNF.”

1.5. Nondeterministic and Merlin–Arthur multiparty communication

Here again, we adopt the $k$-party number-on-the-forehead model of Chandra, Furst, and Lipton [19]. Nondeterministic communication is defined in complete analogy with computational complexity. Specifically, a nondeterministic protocol starts with a guess string, whose length counts toward the protocol’s communication cost, and proceeds deterministically thenceforth. A nondeterministic protocol for a given communication problem $F\colon X_{1}\times X_{2}\times\cdots\times X_{k}\to\{0,1\}$ is required to output the correct answer for all guess strings when presented with a negative instance of $F$, and for some guess string when presented with a positive instance. We further consider Merlin–Arthur protocols [3, 5], a communication model that combines the power of randomization and nondeterminism. As before, a Merlin–Arthur protocol for a given problem $F$ starts with a guess string, whose length counts toward the communication cost. From then on, the parties run an ordinary randomized protocol. The randomized phase in a Merlin–Arthur protocol must produce the correct answer with probability at least $2/3$ for all guess strings when presented with a negative instance of $F$, and for some guess string when presented with a positive instance. Thus, the cost of a nondeterministic or Merlin–Arthur protocol is the sum of the costs of the guessing phase and communication phase. The minimum cost of a valid protocol for $F$ in these models is called the nondeterministic communication complexity of $F$, denoted $N(F)$, and Merlin–Arthur communication complexity of $F$, denoted $\mathit{MA}_{1/3}(F)$. The quantity $N(\neg F)$ is called the co-nondeterministic communication complexity of $F$.

Nondeterministic and Merlin–Arthur protocols have been extensively studied for $k=2$ parties but are much less understood in the multiparty setting [10, 23, 44, 43]. Prior to our paper, the best lower bounds in these models for an $\mathsf{AC}^{0}$ circuit $F\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ were $\Omega(\sqrt{n}/2^{k}k)$ for nondeterministic communication and $\Omega(\sqrt{n}/2^{k}k)^{1/2}$ for Merlin–Arthur communication, obtained in [43] for the set disjointness problem. We give a quadratic improvement on these lower bounds. In particular, our result for nondeterminism essentially matches the trivial upper bound. Moreover, we obtain our result for a particularly simple function in $\mathsf{AC}^{0}$, namely, a polynomial-size CNF formula of constant width. A detailed statement follows.

Theorem 1.7.

Let $\delta>0$ be arbitrary. Then for all integers $n,k\geqslant 2$, there is an (explicitly given) $k$-party communication problem $G_{n,k}\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ with

$$N(\neg G_{n,k})\leqslant c\log n$$

but

\begin{align}
N(G_{n,k})&\geqslant\left(\frac{n}{c4^{k}k^{2}}\right)^{1-\delta}, \tag{1.4}\\
R_{1/3}(G_{n,k})&\geqslant\left(\frac{n}{c4^{k}k^{2}}\right)^{1-\delta}, \tag{1.5}\\
\mathit{MA}_{1/3}(G_{n,k})&\geqslant\left(\frac{n}{c4^{k}k^{2}}\right)^{\frac{1-\delta}{2}}, \tag{1.6}
\end{align}

where $c\geqslant 1$ is a constant independent of $n$ and $k$. Moreover, $G_{n,k}$ is computable by a CNF formula of width $ck$ and size $n^{c}$.

This result can be viewed as a far-reaching generalization of Theorem 1.6 to nondeterministic and Merlin–Arthur protocols. To obtain Theorem 1.7, we adapt the pattern matrix method [43] to be able to transform any lower bound on one-sided approximate degree into a multiparty communication lower bound in the nondeterministic and Merlin–Arthur models. With this tool in hand, we obtain Theorem 1.7 from our one-sided approximate degree lower bound (Theorem 1.4).

1.6. Multiparty communication classes

Theorem 1.7 sheds new light on communication complexity classes, defined in the seminal work of Babai, Frankl, and Simon [4]. An infinite family $\{F_{n}\}_{n=1}^{\infty}$, where each $F_{n}\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ is a $k$-party number-on-the-forehead communication problem, is said to be efficiently solvable in a given model of communication if $F_{n}$ has communication complexity at most $\log^{c}n$ in that model, for a large enough constant $c>1$ and all $n>c$. One defines $\mathsf{BPP}_{k}$, $\mathsf{NP}_{k}$, $\mathsf{coNP}_{k}$, and $\mathsf{MA}_{k}$ as the classes of families that are efficiently solvable in the randomized, nondeterministic, co-nondeterministic, and Merlin–Arthur models, respectively. In particular, $\mathsf{MA}_{k}$ is a superset of $\mathsf{NP}_{k}$ and $\mathsf{BPP}_{k}$. In these definitions, $k=k(n)$ can be any function of $n$, including constant functions such as $k=3$. The relations among these multiparty classes have been actively studied over the past decade [9, 28, 20, 22, 11, 10, 23, 44, 43]. In particular, for $k\leqslant\Theta(\log n)$, it is known that $\mathsf{coNP}_{k}$ is not contained in $\mathsf{BPP}_{k}$, $\mathsf{NP}_{k}$, or even $\mathsf{MA}_{k}$. Quantitatively, these results can be summarized as follows.

  (i) Prior to our work, the strongest $k$-party separation of co-nondeterministic versus randomized communication complexity was $O(\log n)$ versus $\Omega(\sqrt{n}/2^{k}k)$, proved in [43] for the set disjointness function.

  (ii) The best previous $k$-party separations of co-nondeterministic versus nondeterministic communication complexity were: $O(\log n)$ versus $\Omega(n)$, proved in [43] nonconstructively by the probabilistic method; and $O(\log n)$ versus $\Omega(\sqrt{n}/2^{k}k)$, proved in [43] for the set disjointness problem.

  (iii) The best previous $k$-party separation of co-nondeterministic versus Merlin–Arthur communication complexity was $O(\log n)$ versus $\Omega(\sqrt{n}/2^{k}k)^{1/2}$, proved in [43] for the set disjointness problem.

Theorem 1.7 gives a quadratic improvement on these previous separations, excluding the nonconstructive separation of $\mathsf{coNP}_{k}$ from $\mathsf{NP}_{k}$ in [10]. Moreover, our quadratically improved separations are achieved for a particularly simple function, namely, the polynomial-size constant-width CNF formula $G_{n,k}$. In the regime $k\leqslant\Theta(\log n)$, our separations of $\mathsf{coNP}_{k}$ from $\mathsf{BPP}_{k}$ and $\mathsf{NP}_{k}$ are essentially optimal, and our separation of $\mathsf{coNP}_{k}$ from $\mathsf{MA}_{k}$ is within a square of optimal. Recall that no explicit lower bounds at all are currently known in the regime $k\geqslant\log n$, even for deterministic communication. We state our contributions for communication complexity classes as a corollary below.

Corollary 1.8.

Let $k=k(n)$ be a function with $k(n)\leqslant(\frac{1}{2}-\epsilon)\log n$ for some constant $\epsilon>0$. Then the communication problem $G_{n,k}$ from Theorem 1.7 satisfies

$$\{G_{n,k}\}_{n=1}^{\infty}\in\mathsf{coNP}_{k}\setminus\mathsf{BPP}_{k},$$
$$\{G_{n,k}\}_{n=1}^{\infty}\in\mathsf{coNP}_{k}\setminus\mathsf{NP}_{k},$$
$$\{G_{n,k}\}_{n=1}^{\infty}\in\mathsf{coNP}_{k}\setminus\mathsf{MA}_{k}.$$

Analogously, the communication problem $F_{n,k}$ from Theorem 1.6 satisfies

$$\{F_{n,k}\}_{n=1}^{\infty}\in\mathsf{NP}_{k}\setminus\mathsf{BPP}_{k}.$$
Proof.

The claims for $G_{n,k}$ are immediate from Theorem 1.7 and the definitions of $\mathsf{NP}_{k},\mathsf{coNP}_{k},\mathsf{BPP}_{k},\mathsf{MA}_{k}$. For the remaining separation, we need only prove the upper bound $N(F_{n,k})=O(\log n)$. Recall from Theorem 1.6 that $F_{n,k}$ is a DNF formula with $n^{c^{\prime}}$ terms. This gives the desired nondeterministic protocol: the parties “guess” one of the terms in $F_{n,k}$ (for a cost of $\lceil\log n^{c^{\prime}}\rceil$ bits), evaluate it (using another $2$ bits of communication), and output the result. ∎
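The guess-and-evaluate protocol from the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DNF encoding as a list of `(variable index, required bit)` pairs and the function names are hypothetical, and the sketch models only the cost accounting and the acceptance condition, not the distribution of inputs among the players.

```python
def nondeterministic_cost(num_terms: int) -> int:
    """Bits used by guess-and-evaluate: a term index plus 2 bits to evaluate the term."""
    index_bits = (num_terms - 1).bit_length()   # equals ceil(log2(num_terms)) for num_terms >= 1
    return index_bits + 2


def term_holds(term, x):
    """A term is a list of (variable index, required bit) pairs (hypothetical encoding)."""
    return all(x[i] == b for i, b in term)


def some_guess_accepts(dnf, x):
    """The protocol accepts x iff at least one guessed term evaluates to true."""
    return any(term_holds(term, x) for term in dnf)


# Toy formula: (x0 AND x1) OR (NOT x2)
toy_dnf = [[(0, 1), (1, 1)], [(2, 0)]]
```

With $n^{c'}$ terms, the cost is $\lceil c'\log n\rceil+2=O(\log n)$, matching the bound claimed in the proof.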

1.7. Quantum communication complexity

We adopt the standard model of quantum communication, where two parties exchange quantum messages according to an agreed-upon protocol in order to solve a two-party communication problem $F\colon X\times Y\to\{0,1\}$. As usual, an input $(x,y)\in X\times Y$ is split between the parties, with one party knowing only $x$ and the other party knowing only $y$. We allow arbitrary prior entanglement at the start of the communication. A measurement at the end of the protocol produces a single-bit answer, which is interpreted as the protocol output. An $\epsilon$-error protocol for $F$ is required to output, on every input $(x,y)\in X\times Y$, the correct value $F(x,y)$ with probability at least $1-\epsilon$. The cost of a quantum protocol is the total number of quantum bits exchanged in the worst case on any input. The $\epsilon$-error quantum communication complexity of $F$, denoted $Q_{\epsilon}^{*}(F)$, is the least cost of an $\epsilon$-error quantum protocol for $F$. The asterisk in $Q_{\epsilon}^{*}(F)$ indicates that the parties share arbitrary prior entanglement. The standard setting of the error parameter is $\epsilon=1/3$, which as usual is without loss of generality. For a detailed formal description of the quantum model, we refer the reader to [51, 33, 38].

Proving lower bounds for bounded-error quantum communication is significantly more challenging than for randomized communication. An illustrative example is the set disjointness problem on $n$ bits. Babai, Frankl, and Simon [4] obtained an $\Omega(\sqrt{n})$ randomized communication lower bound for this function in 1986 using a short and elementary proof, which was later improved to a tight $\Omega(n)$ in [25, 32, 7]. This is in stark contrast with the quantum model, where the best lower bound for set disjointness was for a long time a trivial $\Omega(\log n)$ until a tight $\Omega(\sqrt{n})$ was proved by Razborov [33] in 2002.

A completely different proof of the $\Omega(\sqrt{n})$ lower bound for set disjointness was given in [38] by introducing the pattern matrix method. Since then, the method has produced the strongest known quantum lower bounds for $\operatorname{\mathsf{AC}}^{0}$. Of these, the best lower bound prior to our work was $\Omega(n^{1-\delta})$ due to Bun and Thaler [17], where the constant $\delta>0$ can be taken arbitrarily small at the expense of circuit depth. In the following theorem, we resolve the quantum communication complexity of $\operatorname{\mathsf{AC}}^{0}$ in full by proving that polynomial-size DNF formulas achieve near-maximum communication complexity.

Theorem 1.9.

Let $\delta>0$ and $C\geqslant 1$ be any constants. Then for each $n\geqslant 1$, there is an (explicitly given) two-party communication problem $F\colon\{0,1\}^{n}\times\{0,1\}^{n}\to\{0,1\}$ that has quantum communication complexity

$$Q_{\frac{1}{2}-\frac{1}{n^{C}}}^{*}(F)=\Omega(n^{1-\delta})$$

and is representable by a DNF formula of size $n^{O(1)}$ and width $O(1)$.

This theorem remains valid for CNF formulas since quantum communication complexity is invariant under function negation. As in all of our results, Theorem 1.9 essentially matches the trivial upper bound, showing that $\operatorname{\mathsf{AC}}^{0}$ circuits of depth $2$ achieve nearly the maximum possible complexity. Again analogous to our other results, Theorem 1.9 holds not only for bounded-error communication but also for communication with error $\frac{1}{2}-\frac{1}{n^{C}}$ for any $C\geqslant 1$. The error parameter in Theorem 1.9 is optimal and cannot be further increased to $\frac{1}{2}-\frac{1}{n^{\omega(1)}}$: as remarked above, any DNF formula with $m$ terms has a classical communication protocol with error $\frac{1}{2}-\Omega(\frac{1}{m})$ and cost $2$ bits. Lastly, Theorem 1.9 is optimal with respect to circuit depth because $\operatorname{\mathsf{AC}}^{0}$ circuits of depth $1$ have communication complexity at most $2$ bits even in the classical deterministic model.

In our overview so far, we have separately considered the classical multiparty model and the quantum two-party model. By combining the features of these models, one arrives at the $k$-party number-on-the-forehead model with quantum players. Our results readily generalize to this setting. Specifically, for any constants $\delta>0$ and $C\geqslant 1$, we give an explicit DNF formula $F_{n,k}\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ of size $n^{O(1)}$ and width $O(k)$ such that computing $F_{n,k}$ in the $k$-party quantum number-on-the-forehead model with error $\frac{1}{2}-\frac{1}{n^{C}}$ requires $\Omega(n^{1-\delta}/4^{k}k)$ quantum bits. For more details, see Remark 5.9.

1.8. Previous approaches

In the remainder of the introduction, we sketch our proof of Theorem 1.1. To properly set the stage for our work, we start by reviewing the relevant background and previous approaches. The notation that we adopt below is standard, and we defer its formal review to Section 2.

Dual view of approximation

Let $f\colon X\to\{0,1\}$ be a Boolean function of interest, where $X$ is an arbitrary finite subset of Euclidean space. The approximate degree of $f$ is defined analogously to functions on the Boolean hypercube: $\deg_{\epsilon}(f)$ is the minimum degree of a real polynomial $p$ such that $|f(x)-p(x)|\leqslant\epsilon$ for every $x\in X$. A valuable tool in the analysis of approximate degree is linear programming duality, which gives a powerful dual view of approximation [38]. This dual characterization states that $\deg_{\epsilon}(f)\geqslant d$ if and only if there is a function $\phi\colon X\to\mathbb{R}$ with the following two properties: $\langle\phi,f\rangle>\epsilon\|\phi\|_{1}$; and $\langle\phi,p\rangle=0$ for every polynomial $p$ of degree less than $d$. Rephrasing, $\phi$ must be correlated with $f$ but completely uncorrelated with any polynomial of degree less than $d$. Such a function $\phi$ is variously referred to in the literature as a “dual object,” “dual polynomial,” or “witness” for $f$. The dual characterization makes it possible to prove any approximate degree lower bound by constructing the corresponding witness $\phi$. This good news comes with a caveat: for all but the simplest functions, the construction of $\phi$ is very demanding, and linear programming duality gives no guidance in this regard.
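To make the dual characterization concrete, the following Python sketch checks both witness properties for a textbook case that is not taken from this paper: the parity function on $n=3$ bits, with the candidate witness $\phi(x)=1$ for odd $|x|$ and $\phi(x)=-1$ otherwise. The choice of function, witness, and helper names is an assumption made purely for illustration.

```python
from itertools import combinations, product

n = 3
cube = list(product((0, 1), repeat=n))


def parity(x):
    return sum(x) % 2


def phi(x):
    """Candidate dual witness: +1 on odd-weight inputs, -1 on even-weight inputs."""
    return 1 if sum(x) % 2 == 1 else -1


def inner(u, v):
    return sum(u(x) * v(x) for x in cube)


# Multilinear monomials of degree < n span all polynomials of degree < n on the cube.
low_degree_monomials = [S for d in range(n) for S in combinations(range(n), d)]
orthogonal = all(
    inner(phi, lambda x, S=S: int(all(x[i] for i in S))) == 0
    for S in low_degree_monomials
)

correlation = inner(phi, parity)          # <phi, f>
l1_norm = sum(abs(phi(x)) for x in cube)  # ||phi||_1
```

Here $\langle\phi,f\rangle=\frac{1}{2}\|\phi\|_{1}$, so the witness certifies $\deg_{\epsilon}(f)\geqslant n$ for every $\epsilon<1/2$. For the functions studied in this paper, no such closed-form witness is available, which is the difficulty the text describes.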

Componentwise composition

The construction of a dual object is more approachable for composed functions since one can hope to break them up into constituent parts, construct a dual object for each, and recombine these results. Formally, define the componentwise composition of functions $f\colon\{0,1\}^{n}\to\{0,1\}$ and $g\colon X\to\{0,1\}$ as the Boolean function $f\circ g\colon X^{n}\to\{0,1\}$ given by $(f\circ g)(x_{1},\ldots,x_{n})=f(g(x_{1}),\ldots,g(x_{n}))$. To construct a dual object for $f\circ g$, one starts by obtaining dual objects $\phi$ and $\psi$ for the constituent functions $f$ and $g$, respectively, either by direct construction or by appeal to linear programming duality. They are then combined to yield a dual object $\Phi$ for the composed function, using dual componentwise composition [41, 26]:

$$\Phi(x_{1},x_{2},\ldots,x_{n})=\phi(\mathbf{I}[\psi(x_{1})>0],\ldots,\mathbf{I}[\psi(x_{n})>0])\prod_{i=1}^{n}|\psi(x_{i})|. \tag{1.7}$$

This composed dual object typically requires additional work to ensure strong enough correlation with the composed function $f\circ g$. Among the generic tools available to assist in this process is a “corrector” object $\zeta$ due to Razborov and Sherstov [34], with the following four properties: (i) $\zeta$ is orthogonal to low-degree polynomials; (ii) $\zeta$ takes on $1$ at a prescribed point of the hypercube; (iii) $\zeta$ is bounded at inputs of low Hamming weight; and (iv) $\zeta$ vanishes at all other points of the hypercube. Using $\zeta$, suitably shifted and scaled, one can surgically correct the behavior of a given dual object $\Phi$ at a substantial fraction of the inputs without affecting $\Phi$’s orthogonality to low-degree polynomials. This technique played an important role in previous work, e.g., [17, 14, 18, 49].
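The dual componentwise composition (1.7) translates directly into code. The sketch below is a minimal illustration: the toy objects `phi` and `psi` are assumptions chosen only so that the arithmetic is easy to follow, and they are not dual objects used anywhere in the paper.

```python
from fractions import Fraction
from itertools import product


def compose_dual(phi, psi):
    """Return Phi as in (1.7); psi is a dict on X, phi a function on Boolean tuples."""
    def Phi(xs):
        signs = tuple(1 if psi[x] > 0 else 0 for x in xs)   # I[psi(x_i) > 0]
        weight = Fraction(1)
        for x in xs:
            weight *= abs(psi[x])                           # prod_i |psi(x_i)|
        return phi(signs) * weight
    return Phi


# Toy inner object on X = {0, 1, 2}: unit l1 norm, orthogonal to constants.
psi = {0: Fraction(1, 2), 1: Fraction(-1, 4), 2: Fraction(-1, 4)}
# Toy outer object on {0,1}^2: the parity witness, orthogonal to degree < 2.
phi = lambda z: 1 if sum(z) % 2 == 1 else -1

Phi = compose_dual(phi, psi)
points = list(product(psi, repeat=2))
l1_norm = sum(abs(Phi(xs)) for xs in points)
total = sum(Phi(xs) for xs in points)
```

In this toy case $\|\Phi\|_{1}=1$ and $\Phi$ sums to zero, so $\Phi$ inherits orthogonality to constants from its constituents.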

Componentwise composition by itself does not allow one to construct hard-to-approximate functions from easy ones. To see why, consider arbitrary functions $f\colon\{0,1\}^{n_{1}}\to\{0,1\}$ and $g\colon\{0,1\}^{n_{2}}\to\{0,1\}$ with approximate degrees at most $n_{1}^{\alpha}$ and $n_{2}^{\alpha}$, respectively, for some $0<\alpha<1$. It is well-known [42] that the composed function $f\circ g$ on $n_{1}n_{2}$ variables has approximate degree $O(n_{1}^{\alpha}n_{2}^{\alpha})=O(n_{1}n_{2})^{\alpha}$. This means that relative to the new number of variables, the composed function $f\circ g$ is asymptotically no harder to approximate than the constituent functions $f$ and $g$. In particular, one cannot use componentwise composition to transform functions on $n$ bits with $1/3$-approximate degree at most $n^{\alpha}$ into functions on $N$ bits with $1/3$-approximate degree $\omega(N^{\alpha})$.

Previous best bound for $\operatorname{\mathsf{AC}}^{0}$

In the previous best result on the $1/3$-approximate degree of $\operatorname{\mathsf{AC}}^{0}$, Bun and Thaler [17] approached the componentwise composition $f\circ g$ in an ingenious way to amplify the approximate degree for a careful choice of $g$. Let $f\colon\{0,1\}^{n}\to\{0,1\}$ be given, with $1/3$-approximate degree $n^{\alpha}$ for some $0\leqslant\alpha<1$. Bun and Thaler consider the componentwise composition $F=f\circ(\mathrm{AND}_{\Theta(\log m)}\circ\mathrm{OR}_{m})$, for a small enough parameter $m=\operatorname{poly}(n)$. It was shown in earlier work [41, 16] that dual componentwise composition witnesses the lower bound $\deg_{1/3}(F)=\Omega(\deg_{1/3}(\mathrm{OR}_{m})\deg_{1/3}(f))=\Omega(\sqrt{m}\deg_{1/3}(f))$. Bun and Thaler make the crucial observation that the dual object for $\mathrm{OR}_{m}$ has most of its $\ell_{1}$ mass on inputs of Hamming weight $O(1)$, which in view of (1.7) implies that the dual object for $F$ places most of its $\ell_{1}$ mass on inputs of Hamming weight $\tilde{O}(n)$. The authors of [17] then use the Razborov–Sherstov corrector object to transfer the small amount of $\ell_{1}$ mass that the dual object for $F$ places on inputs of high Hamming weight, to inputs of low Hamming weight. The resulting dual object is supported entirely on inputs of low Hamming weight and therefore witnesses a lower bound on the approximate degree of the restriction $F^{\prime}$ of $F$ to inputs of low Hamming weight.

The restriction $F^{\prime}$ takes as input $N:=\Theta(nm\log m)$ variables but is defined only when its input string has Hamming weight $\tilde{O}(n)$. This makes it possible to represent the input to $F^{\prime}$ more economically, by specifying the locations of the $\tilde{O}(n)$ nonzero bits inside the array of $N$ variables. Since each such location can be specified using $\lceil\log N\rceil$ bits, the entire input to $F^{\prime}$ can be specified using $\lceil\log N\rceil\cdot\tilde{O}(n)=\tilde{O}(n)$ bits. This yields a function $F^{\prime\prime}$ on $\tilde{O}(n)$ variables. A careful calculation shows that this “input compression” does not hurt the approximate degree. Thus, the approximate degree of $F^{\prime\prime}$ is at least the approximate degree of $F^{\prime}$, which as discussed above is $\Omega(\sqrt{m}\deg_{1/3}(f))$. With $m$ set appropriately, the approximate degree of $F^{\prime\prime}$ is polynomially larger than that of $f$.
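The bookkeeping behind input compression can be sketched as follows. This is a simplified illustration, not the encoding of [17]: the function names are hypothetical, and the sketch omits the padding to a fixed number of blocks that an actual fixed-length encoding would need.

```python
def compress(bits):
    """Encode a sparse 0/1 list as the binary locations of its nonzero entries."""
    width = max(1, (len(bits) - 1).bit_length())   # ceil(log2 N) bits per location
    return [format(i, f"0{width}b") for i, b in enumerate(bits) if b]


def decompress(locations, n_bits):
    """Invert compress: set the bit at every listed location."""
    bits = [0] * n_bits
    for loc in locations:
        bits[int(loc, 2)] = 1
    return bits


sparse = [0] * 16
sparse[3] = sparse[10] = 1
encoded = compress(sparse)
encoded_length = sum(len(loc) for loc in encoded)
```

An input of Hamming weight $w$ among $N$ variables is thus described by $w\lceil\log N\rceil$ bits, which is the source of the $\lceil\log N\rceil\cdot\tilde{O}(n)$ count above.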

This passage from $f$ to $F^{\prime\prime}$ is the desired hardness amplification for approximate degree. To obtain an $\Omega(n^{1-\delta})$ lower bound on the approximate degree of $\operatorname{\mathsf{AC}}^{0}$, the authors of [17] start with a trivial circuit and apply the hardness amplification step a constant number of times, until approximate degree $\Omega(n^{1-\delta})$ is reached.

Limitations of previous approaches to $\operatorname{\mathsf{AC}}^{0}$

Bun and Thaler’s hardness amplification for approximate degree rests on two pillars. The first is componentwise composition, whereby the given function $f\colon\{0,1\}^{n}\to\{0,1\}$ is composed componentwise with $n$ independent copies of the gadget $\mathrm{AND}_{\Theta(\log m)}\circ\mathrm{OR}_{m}$. In this gadget, the $\mathrm{AND}_{\Theta(\log m)}$ gate is necessary to control the accumulation of error and to ensure the correlation property of the dual polynomial. The resulting composed function $F=f\circ(\mathrm{AND}_{\Theta(\log m)}\circ\mathrm{OR}_{m})$ is defined on $N=\Theta(nm\log m)$ variables. The second pillar of [17] is input compression, where the length-$N$ input to $F$ is represented compactly as an array of $\tilde{O}(n)$ strings of length $\lceil\log N\rceil$ each. The circuitry to implement these two pillars is expensive, requiring in both cases a polynomial-size DNF formula of width $\Theta(\log n+\log m)$. As a result, even a single iteration of the Bun–Thaler hardness amplification cannot be implemented as a polynomial-size DNF or CNF formula.

To prove an $\Omega(n^{1-\delta})$ approximate degree lower bound for small $\delta>0$ in the framework of [17], one needs a number of iterations that grows with $1/\delta$. Thus, the overall circuit produced in [17] has a large constant number of alternating layers of AND and OR gates of logarithmic and polynomial fan-in, respectively, and in particular cannot be flattened into a polynomial-size DNF or CNF formula. Proving Theorem 1.1 within this framework would require reducing the fan-in of the AND gates from $\Theta(\log n+\log m)$ to $O(1)$, which would completely destroy the componentwise composition and input compression pillars of [17]. These pillars are present in all follow-up papers [17, 14, 18, 49] and seem impossible to get around, prompting the authors of [18, p. 14] to entertain the possibility that the approximate degree of $\operatorname{\mathsf{AC}}^{0}$ at any given depth is much smaller than once conjectured. We show that this is not the case.

1.9. Our proof

In this paper, we design hardness amplification from first principles, without using componentwise composition or input compression. Our approach efficiently amplifies the approximate degree even for functions with sparse input, while ensuring that each hardness amplification stage is implementable by a monotone circuit of constant depth with AND gates of constant fan-in and OR gates of polynomial fan-in. As a result, repeating our process any constant number of times produces a polynomial-size DNF formula of constant width.

Our approach at a high level

Let $f\colon\{0,1\}^{N}\to\{0,1\}$ be a given function. Let $f|_{\leqslant\theta}$ denote the restriction of $f$ to inputs of Hamming weight at most $\theta$, and let $d=\deg_{1/3}(f|_{\leqslant\theta})$ be the approximate degree of this restriction. The total number of variables $N$ can be vastly larger than $\theta$; in the actual proof, we will set $N=\theta^{C}$ for a constant $C\geqslant 1$. Since an input $y\in\{0,1\}^{N}$ to $f|_{\leqslant\theta}$ is guaranteed to have Hamming weight at most $\theta$, we can think of $y$ as the disjunction of $\theta$ vectors of Hamming weight at most $1$ each:

$$y=y_{1}\vee y_{2}\vee\cdots\vee y_{\theta},$$

where each $y_{i}$ is either the zero vector $0^{N}$ or a basis vector $e_{1},e_{2},\ldots,e_{N}$, and the disjunction on the right-hand side is applied coordinate-wise. Our approach centers around encoding each $y_{i}$ as a string of $n\ll N$ bits so as to make the decoding difficult for polynomials but easy for circuits. Ideally, we would like a decoding function $h\colon\{0,1\}^{n}\to\{0,1\}^{N}$ with the following properties:

  1. (i)

    the sets $h^{-1}(v)$ for $v\in\{e_{1},e_{2},\ldots,e_{N},0^{N}\}$ are indistinguishable by polynomials of degree up to $D$, for some parameter $D$;

  2. (ii)

    the sets $h^{-1}(v)$ for $v\in\{e_{1},e_{2},\ldots,e_{N},0^{N}\}$ contain only strings of Hamming weight $O(1)$;

  3. (iii)

    $h$ is computable by a constant-depth monotone circuit with AND gates of constant fan-in and OR gates of polynomial fan-in.

With such $h$ in hand, define $F\colon(\{0,1\}^{n})^{\theta}\to\{0,1\}$ by

$$F(x_{1},x_{2},\ldots,x_{\theta})=f\left(\bigvee_{i=1}^{\theta}h(x_{i})\right).$$

Then, one can reasonably expect that approximating $F$ is harder than approximating $f|_{\leqslant\theta}$. Indeed, an approximating polynomial has access only to the encoded input $(x_{1},x_{2},\ldots,x_{\theta})$. Decoding this input presumably involves computing $(x_{1},x_{2},\ldots,x_{\theta})\mapsto(h(x_{1}),h(x_{2}),\ldots,h(x_{\theta}))$ one way or another, which by property (i) requires a polynomial of degree greater than $D$. Once the decoded string $h(x_{1})\vee h(x_{2})\vee\cdots\vee h(x_{\theta})$ is available, the polynomial supposedly needs to compute $f$ on that input, which in and of itself requires degree $d$. Altogether, we expect $F$ to have approximate degree on the order of $Dd$. Moreover, property (ii) ensures that $F$ is hard to approximate even on inputs of Hamming weight $O(\theta)$, putting us in a strong position for another round of hardness amplification. Finally, property (iii) guarantees that the result of constantly many rounds of hardness amplification is computable by a DNF formula of polynomial size and constant width.
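The two ingredients of this plan, the decomposition $y=y_{1}\vee\cdots\vee y_{\theta}$ and the definition of $F$ from $f$ and $h$, can be sketched in Python. The toy decoder `toy_h` and the toy function `toy_f` below are hypothetical placeholders; in particular, `toy_h` does not have the indistinguishability property (i).

```python
def split_into_unit_vectors(y, theta):
    """Write y (Hamming weight <= theta) as theta vectors of Hamming weight <= 1."""
    n_vars = len(y)
    ones = [i for i, bit in enumerate(y) if bit]
    assert len(ones) <= theta, "input must have Hamming weight at most theta"
    parts = [[int(i == j) for i in range(n_vars)] for j in ones]
    parts += [[0] * n_vars for _ in range(theta - len(ones))]   # pad with zero vectors
    return parts


def bitwise_or(vectors):
    return [int(any(column)) for column in zip(*vectors)]


def amplify(f, h):
    """Return F(x_1, ..., x_theta) = f(h(x_1) OR ... OR h(x_theta))."""
    return lambda *xs: f(bitwise_or([h(x) for x in xs]))


# Hypothetical toy decoder {0,1}^2 -> {e_1, e_2, e_3, 0^3} and toy outer function.
toy_h = lambda x: [int(x == (0, 1)), int(x == (1, 0)), int(x == (1, 1))]
toy_f = lambda y: int(sum(y) >= 2)
F = amplify(toy_f, toy_h)
```

For instance, `F((0, 1), (1, 0))` first decodes its two blocks to $e_{1}$ and $e_{2}$ and then applies `toy_f` to their disjunction.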

Actual implementation

As one might suspect, the above program is too bold and cannot be implemented literally. Our actual construction of $h$ achieves (i)–(iii) only approximately. In more detail, let $k$ be a sufficiently large constant. For each $v\in\{e_{1},e_{2},\ldots,e_{N},0^{N}\}$, we construct a probability distribution $\lambda_{v}$ on $\{0,1\}^{n}$ that has all but a vanishing fraction of its mass on inputs of Hamming weight exactly $k$, and moreover any two such distributions $\lambda_{v}$ and $\lambda_{v^{\prime}}$ are indistinguishable by polynomials of low degree. We are further able to ensure that an input of Hamming weight $k$ belongs to the support of at most one of the distributions $\lambda_{v}$. Thus, the $\lambda_{v}$ are in essence supported on pairwise disjoint sets of strings of Hamming weight $k$, and are pairwise indistinguishable by polynomials of low degree. The decoding function $h$ works by taking an input $x\in\{0,1\}^{n}$ of Hamming weight $k$ and determining which of the distributions has $x$ in its support, a highly efficient computation realizable as a monotone $k$-DNF formula. With small probability, $h$ will receive as input a string of Hamming weight larger than $k$, in which case the decoding may fail.
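A decoder of this shape can be sketched as a monotone $k$-DNF in Python. The sketch assumes that the supports are described by a coloring of the $k$-element subsets of the input positions; the round-robin coloring used below is an arbitrary stand-in and has none of the balance properties that the paper requires.

```python
from itertools import combinations


def make_decoder(coloring, num_outputs):
    """coloring maps frozenset(k-subset of positions) to a color in 0..num_outputs.

    Colors 0..num_outputs-1 stand for e_1..e_N; color num_outputs stands for 0^N.
    Output bit j is the OR, over subsets A of color j, of the AND of x_i for i in A.
    """
    def h(x):
        out = [0] * num_outputs
        for subset, color in coloring.items():
            if color < num_outputs and all(x[i] for i in subset):
                out[color] = 1
        return out
    return h


# Toy parameters (assumptions): n = 4 positions, k = 2, N = 2 outputs.
toy_coloring = {
    frozenset(A): idx % 3 for idx, A in enumerate(combinations(range(4), 2))
}
h = make_decoder(toy_coloring, 2)
```

On an input of Hamming weight exactly $k$, only one term can fire, so the output has weight at most $1$. On heavier inputs several terms may fire at once, which corresponds to the decoding failure mentioned above.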

Construction of the $\lambda_{v}$

Central to our work is the number-theoretic notion of $m$-discrepancy, which is a measure of pseudorandomness or aperiodicity of a given set of integers modulo $m$. Formally, the $m$-discrepancy of a nonempty finite set $S\subseteq\mathbb{Z}$ is defined as

$$\operatorname{disc}_{m}(S)=\max_{k=1,2,\ldots,m-1}\left|\frac{1}{|S|}\sum_{s\in S}\xi^{ks}\right|,$$

where $\xi$ is a primitive $m$-th root of unity. The construction of sparse sets with low discrepancy is a well-studied problem in combinatorics and theoretical computer science. By building on previous work [2, 48], we construct a sparse set of integers with small discrepancy in our regime of interest. For our application, we set the modulus $m=N+1$.
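The definition of $m$-discrepancy can be evaluated numerically for small examples. The following Python sketch takes $\xi=e^{2\pi\mathbf{i}/m}$; the function name is hypothetical, and the computation uses floating-point arithmetic, so results are approximate.

```python
import cmath
import math


def discrepancy(S, m):
    """m-discrepancy of a nonempty finite collection S of integers, for m >= 2."""
    S = list(S)

    def average(k):
        # (1/|S|) * sum over s of xi^(k*s), with xi = exp(2*pi*i/m)
        return sum(cmath.exp(2j * math.pi * ((k * s) % m) / m) for s in S) / len(S)

    return max(abs(average(k)) for k in range(1, m))


full_residues = discrepancy(range(7), 7)   # every residue once: perfectly aperiodic
single_point = discrepancy([0], 7)         # a single point: maximally periodic
two_points = discrepancy([0, 1], 4)
```

The two extremes are $0$ for a complete residue system and $1$ for a single point; sparse sets with discrepancy close to $0$ are what the construction requires.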

Continuing, let $\binom{[n]}{k}$ denote the family of cardinality-$k$ subsets of $[n]=\{1,2,\ldots,n\}$. To design the distributions $\lambda_{v}$, we need an explicit coloring $\gamma\colon\binom{[n]}{k}\to[N+1]$ that is balanced, in the sense that for nearly all large enough subsets $A\subseteq\{1,2,\ldots,n\}$ and all $i\in[N+1]$, the family $\gamma^{-1}(i)$ accounts for almost exactly a $1/(N+1)$ fraction of all cardinality-$k$ subsets of $A$. The existence of a highly balanced coloring follows by the probabilistic method, and we construct one explicitly using the sparse set of integers with small $(N+1)$-discrepancy constructed earlier in the proof.

Our next ingredient is a dual polynomial $\omega$ for the OR function, a staple in approximate degree lower bounds. An important property of $\omega$ is that it places a constant fraction of its $\ell_{1}$ mass on the point $0^{n}$. Translating $\omega$ from $0^{n}$ to a point $z$ of slightly larger Hamming weight results in a new dual polynomial, call it $\omega_{z}$. Analogous to $\omega$, the new dual polynomial has a constant fraction of its $\ell_{1}$ mass on $z$ and the rest on inputs that are greater than or equal to $z$ componentwise.

For notational convenience, let us now rename $\gamma$’s range elements $1,2,\ldots,N+1$ to $e_{1},e_{2},\ldots,e_{N},0^{N}$, respectively. For $v\in\{e_{1},e_{2},\ldots,e_{N},0^{N}\}$, define $\Phi_{v}$ to be the average of the dual polynomials $\omega_{z}$ where $z$ ranges over all characteristic vectors of the sets in $\gamma^{-1}(v)$. Being a convex combination of dual polynomials, each $\Phi_{v}$ is a dual object orthogonal to polynomials of low degree. Observe further that each $\Phi_{v}$ is supported on inputs of Hamming weight at least $k$, and any input of Hamming weight exactly $k$ belongs to the support of exactly one $\Phi_{v}$. For inputs $x$ of Hamming weight greater than $k$, a remarkable thing happens: $\Phi_{v}(x)$ is almost the same for all $v$. We prove this by exploiting the fact that $\gamma$ is highly balanced. As a result, the “common part” of the $\Phi_{v}$ for inputs of Hamming weight greater than $k$ can be subtracted out to obtain a function $\widetilde{\Phi_{v}}$ for each $v\in\{e_{1},e_{2},\ldots,e_{N},0^{N}\}$. While these new functions are not dual polynomials, the difference of any two of them is since $\widetilde{\Phi_{v}}-\widetilde{\Phi_{v^{\prime}}}=\Phi_{v}-\Phi_{v^{\prime}}$. Put another way, the $\widetilde{\Phi_{v}}$ are pairwise indistinguishable by low-degree polynomials. By defining the $\widetilde{\Phi_{v}}$ in a somewhat more subtle way, we further ensure that each $\widetilde{\Phi_{v}}$ is nonnegative. The distribution $\lambda_{v}$ can then be taken to be the normalized function $\widetilde{\Phi_{v}}/\|\widetilde{\Phi_{v}}\|_{1}$. This construction ensures all the properties that we need: $\lambda_{v}$ has nearly all of its mass on inputs of Hamming weight $k$; an input of Hamming weight $k$ belongs to the support of at most one distribution $\lambda_{v}$; and any pair of distributions $\lambda_{v},\lambda_{v^{\prime}}$ are indistinguishable by a low-degree polynomial. Observe that in our construction, $\lambda_{v}$ is close to the uniform probability distribution on the characteristic vectors of the sets in $\gamma^{-1}(v)$.

2. Preliminaries

2.1. General notation

For a string $x\in\{0,1\}^{n}$ and a set $S\subseteq\{1,2,\ldots,n\}$, we let $x|_{S}$ denote the restriction of $x$ to the indices in $S$. In other words, $x|_{S}=x_{i_{1}}x_{i_{2}}\ldots x_{i_{|S|}}$, where $i_{1}<i_{2}<\cdots<i_{|S|}$ are the elements of $S$. The characteristic vector $\mathbf{1}_{S}$ of a set $S\subseteq\{1,2,\ldots,n\}$ is given by

$$(\mathbf{1}_{S})_{i}=\begin{cases}1&\text{if }i\in S,\\ 0&\text{otherwise.}\end{cases}$$

Given an arbitrary set $X$ and elements $x,y\in X$, the Kronecker delta $\delta_{x,y}$ is defined by

$$\delta_{x,y}=\begin{cases}1&\text{if }x=y,\\ 0&\text{otherwise.}\end{cases}$$

For a logical condition $C$, we use the Iverson bracket

$$\mathbf{I}[C]=\begin{cases}1&\text{if $C$ holds,}\\ 0&\text{otherwise.}\end{cases}$$

We let $\mathbb{N}=\{0,1,2,3,\ldots\}$ denote the set of natural numbers. We use the comparison operators in a unary capacity to denote one-sided intervals of the real line. Thus, ${<}a$, ${\leqslant}a$, ${>}a$, ${\geqslant}a$ stand for $(-\infty,a)$, $(-\infty,a]$, $(a,\infty)$, $[a,\infty)$, respectively. We let $\ln x$ and $\log x$ stand for the natural logarithm of $x$ and the logarithm of $x$ to base $2$, respectively. The term Euclidean space refers to $\mathbb{R}^{n}$ for some positive integer $n$. We let $e_{i}$ denote the vector whose $i$-th component is $1$ and the others are $0$. Thus, the vectors $e_{1},e_{2},\dots,e_{n}$ form the standard basis for $\mathbb{R}^{n}$. For a complex number $x$, we denote the real part, imaginary part, and complex conjugate of $x$ as usual by $\operatorname{Re}(x)$, $\operatorname{Im}(x)$, and $\overline{x}$, respectively. We typeset the imaginary unit $\mathbf{i}$ in boldface to distinguish it from the index variable $i$. For an arbitrary integer $a$ and a positive integer $m$, recall that $a\bmod m$ denotes the unique element of $\{0,1,2,\ldots,m-1\}$ that is congruent to $a$ modulo $m$.

For a set $X$, we let $\mathbb{R}^{X}$ denote the linear space of real-valued functions on $X$. The support of a function $f\in\mathbb{R}^{X}$ is denoted $\operatorname{supp}f=\{x\in X:f(x)\neq 0\}$. For real-valued functions with finite support, we adopt the usual norms and inner product:

$$\|f\|_{\infty}=\max_{x\in\operatorname{supp}f}\,|f(x)|,$$
$$\|f\|_{1}=\sum_{x\in\operatorname{supp}f}\,|f(x)|,$$
$$\langle f,g\rangle=\sum_{x\in\operatorname{supp}f\,\cap\,\operatorname{supp}g}f(x)g(x).$$

This covers as a special case functions on finite sets. Analogous to functions, we adopt the familiar norms for vectors $x\in\mathbb{R}^{n}$ in Euclidean space: $\|x\|_{\infty}=\max_{i=1,\ldots,n}|x_{i}|$ and $\|x\|_{1}=\sum_{i=1}^{n}|x_{i}|$. The tensor product of $f\in\mathbb{R}^{X}$ and $g\in\mathbb{R}^{Y}$ is denoted $f\otimes g\in\mathbb{R}^{X\times Y}$ and given by $(f\otimes g)(x,y)=f(x)g(y)$. The tensor product $f\otimes f\otimes\cdots\otimes f$ ($n$ times) is abbreviated $f^{\otimes n}$. We frequently omit the argument in equations and inequalities involving functions, as in $\operatorname{sgn}p=(-1)^{f}$. Such statements are to be interpreted pointwise. For example, the statement “$f\geqslant 2|g|$ on $X$” means that $f(x)\geqslant 2|g(x)|$ for every $x\in X$. For vectors $x$ and $y$, the notation $x\leqslant y$ means that $x_{i}\leqslant y_{i}$ for each $i$.

We adopt the standard notation for function composition, with $f\circ g$ defined by $(f\circ g)(x)=f(g(x))$. In addition, we use the $\circ$ operator to denote the componentwise composition of Boolean functions. Formally, the componentwise composition of $f\colon\{0,1\}^{n}\to\{0,1\}$ and $g\colon X\to\{0,1\}$ is the function $f\circ g\colon X^{n}\to\{0,1\}$ given by $(f\circ g)(x_{1},x_{2},\ldots,x_{n})=f(g(x_{1}),g(x_{2}),\ldots,g(x_{n}))$. Componentwise composition is consistent with standard composition, which in the context of Boolean functions is only defined for $n=1$. Thus, the meaning of $f\circ g$ is determined by the range of $g$ and is never in doubt.

For a natural number $n$, we abbreviate $[n]=\{1,2,\ldots,n\}$. For a set $S$ and an integer $k$, we let $\binom{S}{k}$ stand for the family of cardinality-$k$ subsets of $S$:

$$\binom{S}{k}=\{A\subseteq S:|A|=k\}.$$

Analogously, for any set $I$, we define

$$\binom{S}{I}=\{A\subseteq S:|A|\in I\}.$$

To illustrate, $\binom{S}{{\leqslant}k}$ denotes the family of subsets of $S$ that have cardinality at most $k$. Analogously, we have the symbols $\binom{S}{{<}k},\binom{S}{{\geqslant}k},\binom{S}{{>}k}$. Throughout this manuscript, we use brace notation as in $\{z_{1},z_{2},\ldots,z_{n}\}$ to specify multisets rather than sets, the distinction being that the number of times an element occurs is taken into account. The cardinality $|Z|$ of a finite multiset $Z$ is defined to be the total number of element occurrences in $Z$, with each element counted as many times as it occurs. The equality and subset relations on multisets are defined analogously, with the number of element occurrences taken into account. For example, $\{1,1,2\}=\{1,2,1\}$ but $\{1,1,2\}\neq\{1,2\}$. Similarly, $\{1,2\}\subseteq\{1,1,2\}$ but $\{1,1,2\}\nsubseteq\{1,2\}$.
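The multiset conventions can be checked mechanically. The sketch below models multisets with Python's `collections.Counter`, which is one convenient representation chosen here for illustration; the helper names are hypothetical.

```python
from collections import Counter


def multiset_cardinality(z):
    """Total number of element occurrences, counted with multiplicity."""
    return sum(Counter(z).values())


def multiset_equal(a, b):
    return Counter(a) == Counter(b)


def multiset_subset(a, b):
    """a is a sub-multiset of b iff no element occurs more often in a than in b."""
    count_a, count_b = Counter(a), Counter(b)
    return all(count_b[element] >= times for element, times in count_a.items())
```

With these helpers, the four examples from the text evaluate as stated: `multiset_equal([1, 1, 2], [1, 2, 1])` holds, `multiset_equal([1, 1, 2], [1, 2])` fails, and the subset relation holds in one direction only.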

2.2. Boolean strings and functions

We identify the Boolean values “true” and “false” with $1$ and $0$, respectively, and view Boolean functions as mappings $X\to\{0,1\}$ for a finite set $X$. The familiar functions $\mathrm{OR}_{n}\colon\{0,1\}^{n}\to\{0,1\}$ and $\mathrm{AND}_{n}\colon\{0,1\}^{n}\to\{0,1\}$ are given by $\mathrm{OR}_{n}(x)=\bigvee_{i=1}^{n}x_{i}$ and $\mathrm{AND}_{n}(x)=\bigwedge_{i=1}^{n}x_{i}$. We abbreviate $\mathrm{NOR}_{n}=\neg\mathrm{OR}_{n}$. For Boolean strings $x,y\in\{0,1\}^{n}$, we let $x\oplus y$ denote their bitwise XOR. The strings $x\wedge y$ and $x\vee y$ are defined analogously, with the binary operator applied bitwise.

For a vector $v\in\mathbb{N}^{n}$, we define its weight $|v|$ to be $|v|=v_{1}+v_{2}+\cdots+v_{n}$. If $x\in\{0,1\}^{n}$ is a Boolean string, then $|x|$ is precisely the Hamming weight of $x$. For any sets $X\subseteq\mathbb{N}^{n}$ and $W\subseteq\mathbb{R}$, we define $X|_{W}$ to be the subset of vectors in $X$ whose weight belongs to $W$:

$$X|_{W}=\{x\in X:|x|\in W\}.$$

In the case of a one-element set $W=\{w\}$, we further shorten $X|_{\{w\}}$ to $X|_{w}$. For example, $\mathbb{N}^{n}|_{\leqslant w}$ denotes the set of vectors whose $n$ components are natural numbers and sum to at most $w$, whereas $\{0,1\}^{n}|_{w}$ denotes the set of Boolean strings of length $n$ and Hamming weight exactly $w$. For a function $f\colon X\to\mathbb{R}$ on a subset $X\subseteq\{0,1\}^{n}$, we let $f|_{W}$ denote the restriction of $f$ to $X|_{W}$. Thus, $f|_{W}$ is a function with domain $X|_{W}$ given by $f|_{W}(x)=f(x)$. A typical instance of this notation would be $f|_{\leqslant w}$ for some real number $w$, corresponding to the restriction of $f$ to Boolean strings of Hamming weight at most $w$.
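As a small illustration of the restriction notation, the following Python sketch enumerates $X|_{W}$ and builds $f|_{W}$ for the Boolean cube. Representing the weight set $W$ by a predicate and the restricted function by a dictionary are assumptions made for convenience.

```python
from itertools import product


def restrict_set(X, in_W):
    """Return X|_W, where in_W is a predicate deciding membership of a weight in W."""
    return [x for x in X if in_W(sum(x))]


def restrict_function(f, X, in_W):
    """Return f|_W as a dictionary whose keys are exactly the points of X|_W."""
    return {x: f(x) for x in restrict_set(X, in_W)}


cube = list(product((0, 1), repeat=4))
weight_two = restrict_set(cube, lambda w: w == 2)          # {0,1}^4 |_2
or_on_light = restrict_function(lambda x: int(any(x)), cube, lambda w: w <= 1)
```

Here `weight_two` has $\binom{4}{2}=6$ elements, and `or_on_light` is the OR function restricted to the $5$ strings of Hamming weight at most $1$.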

2.3. Concentration of measure

Throughout this manuscript, we view probability distributions as real functions. This convention makes available the shorthand notation introduced above. In particular, for probability distributions μ\mu and λ,\lambda, the symbol suppμ\operatorname{supp}\mu denotes the support of μ\mu, and μλ\mu\otimes\lambda denotes the probability distribution given by (μλ)(x,y)=μ(x)λ(y).(\mu\otimes\lambda)(x,y)=\mu(x)\lambda(y). We use the notation μ×λ\mu\times\lambda interchangeably with μλ,\mu\otimes\lambda, the former being more standard for probability distributions. If μ\mu is a probability distribution on X,X, we consider μ\mu to be defined also on any superset of XX with the understanding that μ=0\mu=0 outside X.X.

We recall the following multiplicative form of the Chernoff bound [21].

Theorem 2.1 (Chernoff bound).

Let $X_{1},X_{2},\ldots,X_{n}\in\{0,1\}$ be i.i.d. random variables with $\operatorname*{\mathbf{E}}X_{i}=p.$ Then for all $0\leqslant\delta\leqslant 1,$

\[\operatorname*{\mathbf{P}}\left[\left|\sum_{i=1}^{n}X_{i}-pn\right|\geqslant\delta pn\right]\leqslant 2\exp\left(-\frac{\delta^{2}pn}{3}\right).\]
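For intuition, the Chernoff bound can be compared against the exact binomial tail on small instances. The sketch below is a numerical sanity check, not part of the paper's argument; the names `two_sided_tail` and `chernoff_bound` and the chosen parameters are assumptions made for illustration.

```python
from math import comb, exp

def two_sided_tail(n, p, delta):
    """Exact P[|sum X_i - pn| >= delta*p*n] for n i.i.d. Bernoulli(p) variables."""
    mean = p * n
    return sum(comb(n, s) * p**s * (1 - p)**(n - s)
               for s in range(n + 1)
               if abs(s - mean) >= delta * mean)

def chernoff_bound(n, p, delta):
    """Right-hand side of the multiplicative Chernoff bound."""
    return 2 * exp(-delta**2 * p * n / 3)

for delta in (0.2, 0.5, 1.0):
    exact = two_sided_tail(100, 0.5, delta)
    bound = chernoff_bound(100, 0.5, delta)
    print(f"delta={delta}: exact tail {exact:.3e}, Chernoff bound {bound:.3e}")
```

The exact tail is typically far below the bound; the bound is valued for its simple closed form rather than its tightness.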

Theorem 2.1 assumes i.i.d. Bernoulli random variables. Hoeffding’s inequality [24], stated next, is a more general concentration-of-measure result that applies to any independent bounded random variables.

Theorem 2.2 (Hoeffding’s inequality).

Let $X_{1},X_{2},\ldots,X_{n}$ be independent random variables with $X_{i}\in[a_{i},b_{i}].$ Define $p=\sum_{i=1}^{n}\operatorname*{\mathbf{E}}X_{i}.$ Then for all $\delta\geqslant 0,$

\[\operatorname*{\mathbf{P}}\left[\left|\sum_{i=1}^{n}X_{i}-p\right|\geqslant\delta\right]\leqslant 2\exp\left(-\frac{2\delta^{2}}{\sum_{i=1}^{n}(b_{i}-a_{i})^{2}}\right).\]

The standard version of Hoeffding’s inequality, stated above, requires X1,X2,,XnX_{1},X_{2},\ldots,X_{n} to be independent. Less known are Hoeffding’s results for dependent random variables, which he obtained along with Theorem 2.2 in his original paper [24]. We will specifically need the following concentration inequality for sampling without replacement [24, Section 6].

Theorem 2.3 (Hoeffding’s sampling without replacement).

Let $\omega_{1},\omega_{2},\ldots,\omega_{N}$ be given reals, with $\omega_{i}\in[a,b]$ for all $i.$ Let $J_{1},J_{2},\ldots,J_{n}\in[N]$ be uniformly random integers that are pairwise distinct. Let $X_{i}=\omega_{J_{i}}$ for $i=1,2,\ldots,n,$ and define $p=\sum_{i=1}^{n}\operatorname*{\mathbf{E}}X_{i}.$ Then for all $\delta\geqslant 0,$

\[\operatorname*{\mathbf{P}}\left[\left|\sum_{i=1}^{n}X_{i}-p\right|\geqslant\delta\right]\leqslant 2\exp\left(-\frac{2\delta^{2}}{n(b-a)^{2}}\right).\]
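On a small population, the tail probability in Theorem 2.3 can be computed exactly by enumerating all samples. The sketch below does so under two simplifying assumptions that are not in the paper: the population values are integers, and samples are enumerated as unordered subsets (which yields the same distribution for the sum as ordered draws without replacement). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import exp

def tail_without_replacement(omega, n, delta):
    """Exact P[|X_1 + ... + X_n - p| >= delta] when sampling n items without replacement."""
    N = len(omega)
    p = Fraction(n * sum(omega), N)      # each X_i is marginally uniform on omega
    samples = list(combinations(range(N), n))
    hits = sum(1 for S in samples
               if abs(sum(omega[j] for j in S) - p) >= delta)
    return Fraction(hits, len(samples))

def hoeffding_bound(n, a, b, delta):
    """Right-hand side of Theorem 2.3."""
    return 2 * exp(-2 * delta**2 / (n * (b - a)**2))

omega = [0] * 6 + [1] * 4                # a population of N = 10 values in [0, 1]
print(tail_without_replacement(omega, 5, 2))   # 1/21
print(hoeffding_bound(5, 0, 1, 2))
```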

Hoeffding’s two theorems are clearly incomparable. On the one hand, Theorem 2.2 requires independence and therefore does not apply to sampling without replacement. On the other hand, each random variable XiX_{i} in Theorem 2.3 must be uniformly distributed on a finite multiset of values, which must further be the same multiset for all XiX_{i}; none of this is assumed in Theorem 2.2.

Finally, we will need a concentration-of-measure result due to Bun and Thaler [17, Lemma 4.7] for product distributions on n\mathbb{N}^{n}.

Lemma 2.4 (cf. Bun and Thaler).

Let λ1,λ2,,λn\lambda_{1},\lambda_{2},\ldots,\lambda_{n} be distributions on \mathbb{N} with finite support such that

λi(t)\displaystyle\lambda_{i}(t) Cαt(t+1)2,\displaystyle\leqslant\frac{C\alpha^{t}}{(t+1)^{2}}, t,\displaystyle t\in\mathbb{N},

where C0C\geqslant 0 and 0α10\leqslant\alpha\leqslant 1. Then for all T8Cen(1+lnn),T\geqslant 8C\mathrm{e}n(1+\ln n),

𝐏vλ1×λ2××λn[v1T]αT/2.\operatorname*{\mathbf{P}}_{v\sim\lambda_{1}\times\lambda_{2}\times\cdots\times\lambda_{n}}[\|v\|_{1}\geqslant T]\leqslant\alpha^{T/2}.

Bun and Thaler’s result in [17, Lemma 4.7] differs slightly from the statement above. The proof of Lemma 2.4 as stated can be found in [49, Lemma 3.6]. By leveraging Lemma 2.4, we obtain the following concentration result for probability distributions that are supported on the Boolean hypercube, rather than \mathbb{N}, and are shifted from the origin.

Lemma 2.5.

Fix integers Bk0.B\geqslant k\geqslant 0. Let λ1,λ2,,λ\lambda_{1},\lambda_{2},\ldots,\lambda_{\ell} be probability distributions on {0,1}B\{0,1\}^{B} with support contained in {0,1}B|k.\{0,1\}^{B}|_{\geqslant k}. Suppose further that

λi({0,1}B|t)\displaystyle\lambda_{i}(\{0,1\}^{B}|_{t}) Cαtk(tk+1)2,\displaystyle\leqslant\frac{C\alpha^{t-k}}{(t-k+1)^{2}}, i[],t{k,k+1,,B},\displaystyle i\in[\ell],\;\;t\in\{k,k+1,\ldots,B\},

where C0C\geqslant 0 and 0α1.0\leqslant\alpha\leqslant 1. Then for all T8Ce(1+ln)+k,T\geqslant 8C\mathrm{e}\ell(1+\ln\ell)+\ell k,

𝐏(x1,,x)λ1××λ[i=1|xi|T]α(Tk)/2.\operatorname*{\mathbf{P}}_{(x_{1},\ldots,x_{\ell})\sim\lambda_{1}\times\cdots\times\lambda_{\ell}}\left[\sum_{i=1}^{\ell}|x_{i}|\geqslant T\right]\leqslant\alpha^{(T-\ell k)/2}.
Proof.

For i=1,2,,,i=1,2,\ldots,\ell, consider the distribution μi\mu_{i} on {0,1,,Bk}\{0,1,\ldots,B-k\} given by μi(t)=λi({0,1}B|t+k).\mu_{i}(t)=\lambda_{i}(\{0,1\}^{B}|_{t+k}). Then

μi(t)\displaystyle\mu_{i}(t) Cαt(t+1)2,\displaystyle\leqslant\frac{C\alpha^{t}}{(t+1)^{2}}, i[],t0.\displaystyle i\in[\ell],\;\;t\geqslant 0. (2.1)

Moreover, the random variable |xi||x_{i}| with xiλix_{i}\sim\lambda_{i} has the same distribution as the random variable ui+ku_{i}+k for uiμi.u_{i}\sim\mu_{i}. As a result,

𝐏(x1,,x)λ1××λ[i=1|xi|T]\displaystyle\operatorname*{\mathbf{P}}_{(x_{1},\ldots,x_{\ell})\sim\lambda_{1}\times\cdots\times\lambda_{\ell}}\left[\sum_{i=1}^{\ell}|x_{i}|\geqslant T\right] =𝐏uμ1×μ2××μ[i=1(ui+k)T]\displaystyle=\operatorname*{\mathbf{P}}_{u\sim\mu_{1}\times\mu_{2}\times\cdots\times\mu_{\ell}}\left[\sum_{i=1}^{\ell}(u_{i}+k)\geqslant T\right]
=𝐏uμ1×μ2××μ[u1Tk]\displaystyle=\operatorname*{\mathbf{P}}_{u\sim\mu_{1}\times\mu_{2}\times\cdots\times\mu_{\ell}}[\|u\|_{1}\geqslant T-k\ell]
α(Tk)/2,\displaystyle\leqslant\alpha^{(T-\ell k)/2},

where the last step uses Lemma 2.4 along with (2.1) and the hypothesis that T8Ce(1+ln)+k.T\geqslant 8C\mathrm{e}\ell(1+\ln\ell)+\ell k.

2.4. Orthogonal content

For a multivariate polynomial $p\colon\mathbb{R}^{n}\to\mathbb{R}$, we let $\deg p$ denote the total degree of $p$, i.e., the largest degree of any monomial of $p.$ We use the terms degree and total degree interchangeably in this paper. It will be convenient to define the degree of the zero polynomial by $\deg 0=-\infty.$ For a real-valued function $\phi$ supported on a finite subset of $\mathbb{R}^{n}$, the orthogonal content of $\phi,$ denoted $\operatorname{orth}\phi$, is the minimum degree of a real polynomial $p$ for which $\langle\phi,p\rangle\neq 0.$ We adopt the convention that $\operatorname{orth}\phi=\infty$ if no such polynomial exists. It is clear that $\operatorname{orth}\phi\in\mathbb{N}\cup\{\infty\},$ with the extremal cases $\operatorname{orth}\phi=0\;\Leftrightarrow\;\langle\phi,1\rangle\neq 0$ and $\operatorname{orth}\phi=\infty\;\Leftrightarrow\;\phi=0.$ Additional facts about orthogonal content are given by the following two propositions.

Proposition 2.6.

Let XX and YY be nonempty finite subsets of Euclidean space. Then:

  1. (i)

    orth(ϕ+ψ)min{orthϕ,orthψ}\operatorname{orth}(\phi+\psi)\geqslant\min\{\operatorname{orth}\phi,\operatorname{orth}\psi\} for all ϕ,ψ:X;\phi,\psi\colon X\to\mathbb{R};

  2. (ii)

    orth(ϕψ)=orth(ϕ)+orth(ψ)\operatorname{orth}(\phi\otimes\psi)=\operatorname{orth}(\phi)+\operatorname{orth}(\psi) for all ϕ:X\phi\colon X\to\mathbb{R} and ψ:Y.\psi\colon Y\to\mathbb{R}.

A proof of Proposition 2.6 can be found in [49, Proposition 2.1].
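To make orthogonal content concrete, the following sketch computes it in the univariate case, for integer-valued functions on a finite set of integers, and spot-checks Proposition 2.6 (i) on one pair of functions. The restriction to one variable and the name `orth_univariate` are simplifications introduced here. In one variable, $\operatorname{orth}\phi$ is the least $d$ with $\sum_{t}\phi(t)t^{d}\neq 0$, since the monomials $1,t,\ldots,t^{d}$ span the polynomials of degree at most $d$.

```python
from math import comb, inf

def orth_univariate(phi):
    """Orthogonal content of phi, given as a dict {integer point: integer value}."""
    support = [t for t, v in phi.items() if v != 0]
    # A nonzero function on s points has a nonzero moment of order below s
    # (Vandermonde argument), so the search can stop at len(support) - 1.
    for d in range(len(support)):
        if sum(phi[t] * t**d for t in support) != 0:
            return d
    return inf   # reached only for the zero function

n = 3
phi = {t: (-1)**t * comb(n, t) for t in range(n + 1)}   # n-th finite difference
psi = {0: 1, 1: -1, 2: 0, 3: 0}
total = {t: phi[t] + psi[t] for t in phi}

print(orth_univariate(phi), orth_univariate(psi), orth_univariate(total))  # 3 1 1
```

Here the sum has orthogonal content 1, which equals the minimum of 3 and 1, consistent with part (i).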

Proposition 2.7.

Define V={0N,e1,e2,,eN}N.V=\{0^{N},e_{1},e_{2},\ldots,e_{N}\}\subseteq\mathbb{R}^{N}. Fix functions ϕv:X\phi_{v}\colon X\to\mathbb{R} (vV),(v\in V), where XX is a finite subset of Euclidean space. Suppose that

orth(ϕuϕv)\displaystyle\operatorname{orth}(\phi_{u}-\phi_{v}) D,\displaystyle\geqslant D, u,vV,\displaystyle u,v\in V, (2.2)

where DD is a positive integer. Then for every polynomial p:X,p\colon X^{\ell}\to\mathbb{R}, the mapping zi=1ϕzi,pz\mapsto\langle\bigotimes_{i=1}^{\ell}\phi_{z_{i}},p\rangle is a polynomial on VV^{\ell} of degree at most (degp)/D.(\deg p)/D.

Proof.

By linearity, it suffices to consider factored polynomials p(x1,,x)=i=1pi(xi),p(x_{1},\ldots,x_{\ell})=\prod_{i=1}^{\ell}p_{i}(x_{i}), where each pip_{i} is a nonzero polynomial on X.X. In this setting,

i=1ϕzi,p\displaystyle\left\langle\bigotimes_{i=1}^{\ell}\phi_{z_{i}},p\right\rangle =i=1ϕzi,pi.\displaystyle=\prod_{i=1}^{\ell}\left\langle\phi_{z_{i}},p_{i}\right\rangle. (2.3)

By (2.2), we have ϕ0N,pi=ϕe1,pi=ϕe2,pi==ϕeN,pi\langle\phi_{0^{N}},p_{i}\rangle=\langle\phi_{e_{1}},p_{i}\rangle=\langle\phi_{e_{2}},p_{i}\rangle=\cdots=\langle\phi_{e_{N}},p_{i}\rangle for any index ii with degpi<D.\deg p_{i}<D. As a result, polynomials pip_{i} with degpi<D\deg p_{i}<D do not contribute to the degree of the right-hand side of (2.3) as a function of z.z. For the other polynomials pip_{i}, the inner product ϕzi,pi\langle\phi_{z_{i}},p_{i}\rangle is a linear polynomial in zi,z_{i}, namely,

ϕzi,pi=zi,1ϕe1,pi+zi,2ϕe2,pi++zi,NϕeN,pi+(1j=1Nzi,j)ϕ0N,pi.\langle\phi_{z_{i}},p_{i}\rangle=z_{i,1}\langle\phi_{e_{1}},p_{i}\rangle+z_{i,2}\langle\phi_{e_{2}},p_{i}\rangle+\cdots+z_{i,N}\langle\phi_{e_{N}},p_{i}\rangle\\ +\left(1-\sum_{j=1}^{N}z_{i,j}\right)\langle\phi_{0^{N}},p_{i}\rangle.

Thus, polynomials pip_{i} with degpiD\deg p_{i}\geqslant D contribute at most 11 each to the degree. Summarizing, the right-hand side of (2.3) is a real polynomial in z1,z2,,zz_{1},z_{2},\ldots,z_{\ell} of degree at most |{i:degpiD}|degpD.|\{i:\deg p_{i}\geqslant D\}|\leqslant\frac{\deg p}{D}.

Proposition 2.7 generalizes an analogous result in [49, Proposition 2.2], where the special case N=1N=1 was treated.

2.5. Polynomial approximation

For a real number ϵ0\epsilon\geqslant 0 and a function f:Xf\colon X\to\mathbb{R} on a finite subset XX of Euclidean space, the ϵ\epsilon-approximate degree of ff is denoted degϵ(f)\deg_{\epsilon}(f) and is defined to be the minimum degree of a polynomial pp such that fpϵ.\|f-p\|_{\infty}\leqslant\epsilon. For ϵ<0\epsilon<0, it will be convenient to define degϵ(f)=+\deg_{\epsilon}(f)=+\infty since no polynomial satisfies fpϵ\|f-p\|_{\infty}\leqslant\epsilon in this case. We focus on the approximate degree of Boolean functions f:X{0,1}.f\colon X\to\{0,1\}. In this setting, the standard choice of the error parameter is ϵ=1/3.\epsilon=1/3. This choice is without loss of generality since degϵ(f)=Θ(deg1/3(f))\deg_{\epsilon}(f)=\Theta(\deg_{1/3}(f)) for every Boolean function ff and every constant 0<ϵ<1/2.0<\epsilon<1/2. In what follows, we refer to 1/31/3-approximate degree simply as “approximate degree.” The notion of approximate degree has the following dual characterization [38, 39].

Fact 2.8.

Let $f\colon X\to\mathbb{R}$ be given, for a finite set $X\subset\mathbb{R}^{n}$. Let $d\geqslant 0$ be an integer and $\epsilon\geqslant 0$ a real number. Then $\deg_{\epsilon}(f)\geqslant d$ if and only if there exists a function $\psi\colon X\to\mathbb{R}$ such that

\begin{align*}
\langle f,\psi\rangle &>\epsilon\|\psi\|_{1},\\
\operatorname{orth}\psi &\geqslant d.
\end{align*}

This characterization of approximate degree can be verified using linear programming duality, cf. [38, 39]. We now recall a variant of approximate degree for one-sided approximation. For a Boolean function f:X{0,1}f\colon X\to\{0,1\} and ϵ0,\epsilon\geqslant 0, the one-sided ϵ\epsilon-approximate degree of ff is denoted degϵ+(f)\deg^{+}_{\epsilon}(f) and defined to be the minimum degree of a real polynomial pp such that

f(x)ϵ\displaystyle f(x)-\epsilon p(x)f(x)+ϵ,\displaystyle\leqslant p(x)\leqslant f(x)+\epsilon, xf1(0),\displaystyle x\in f^{-1}(0),
f(x)ϵ\displaystyle f(x)-\epsilon p(x),\displaystyle\leqslant p(x), xf1(1).\displaystyle x\in f^{-1}(1).

We refer to any such polynomial as a one-sided approximant for ff with error ϵ.\epsilon. As usual, the canonical setting of the error parameter is ϵ=1/3.\epsilon=1/3. In the pathological case ϵ<0\epsilon<0, it will be convenient to define degϵ+(f)=+\deg^{+}_{\epsilon}(f)=+\infty. Observe the asymmetric treatment of f1(0)f^{-1}(0) and f1(1)f^{-1}(1) in this formalism. In particular, the one-sided approximate degree of Boolean functions is in general not invariant under negation. One-sided approximate degree enjoys the following dual characterization [16].

Fact 2.9.

Let $f\colon X\to\{0,1\}$ be given, for a finite set $X\subset\mathbb{R}^{n}$. Let $d\geqslant 0$ be an integer and $\epsilon\geqslant 0$ a real number. Then $\deg^{+}_{\epsilon}(f)\geqslant d$ if and only if there exists a function $\psi\colon X\to\mathbb{R}$ such that

\begin{align*}
\langle f,\psi\rangle &>\epsilon\|\psi\|_{1},\\
\operatorname{orth}\psi &\geqslant d,\\
\psi(x) &\geqslant 0\;\text{whenever }f(x)=1.
\end{align*}
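As a toy instance of Facts 2.8 and 2.9, the sketch below verifies a dual witness for the parity function on $n=3$ bits, a standard textbook example that is not taken from this paper. The witness $\psi(x)=-(-1)^{|x|}$ and all helper names are choices made here for illustration. On the Boolean hypercube every polynomial agrees with a multilinear one of no larger degree, so it suffices to test orthogonality against multilinear monomials.

```python
from itertools import combinations, product

n = 3
cube = list(product((0, 1), repeat=n))
f = {x: sum(x) % 2 for x in cube}                   # parity as a 0/1-valued function
psi = {x: 1 if sum(x) % 2 else -1 for x in cube}    # psi(x) = -(-1)^{|x|}

def monomial(S, x):
    """Value of the multilinear monomial prod_{i in S} x_i at the point x."""
    return int(all(x[i] for i in S))

def orth_at_least(psi, d):
    """True iff psi is orthogonal to every multilinear monomial of degree below d."""
    return all(sum(psi[x] * monomial(S, x) for x in cube) == 0
               for k in range(d) for S in combinations(range(n), k))

correlation = sum(f[x] * psi[x] for x in cube)      # <f, psi>
l1_norm = sum(abs(v) for v in psi.values())         # ||psi||_1

print(3 * correlation > l1_norm)                    # correlation exceeds (1/3)*||psi||_1
print(orth_at_least(psi, n))                        # orth(psi) >= n
print(all(psi[x] >= 0 for x in cube if f[x] == 1))  # one-sided sign condition
```

All three conditions hold, so this $\psi$ certifies that both the approximate degree and the one-sided approximate degree of parity on three bits equal $3$.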

2.6. Dual polynomials

Facts 2.8 and 2.9 make it possible to prove lower bounds on approximate degree in a constructive manner, by exhibiting a dual object $\psi$ that serves as a witness. This object is referred to as a dual polynomial. Often, a dual polynomial for a composed function $f$ can be constructed by combining dual objects for various components of $f.$ Of particular importance in the study of $\operatorname{\mathsf{AC}}^{0}$ is the dual object for the OR function. The first dual polynomial for OR was constructed by Špalek [50], with many refinements and generalizations obtained in follow-up work [15, 45, 46, 17, 14, 49]. We will use the following construction from [49, Lemma B.2].

Lemma 2.10.

Let $\epsilon$ be given, $0<\epsilon<1$. Then for some constant $c=c(\epsilon)\in(0,1)$ and every integer $n\geqslant 1,$ there is an (explicitly given) function $\omega\colon\{0,1,2,\dots,n\}\to\mathbb{R}$ such that

ω(0)>1ϵ2ω1,\displaystyle\omega(0)>\frac{1-\epsilon}{2}\cdot\|\omega\|_{1},
|ω(t)|1ct2 2ct/nω1\displaystyle|\omega(t)|\leqslant\frac{1}{ct^{2}\,2^{ct/\sqrt{n}}}\cdot\|\omega\|_{1} (t=1,2,,n),\displaystyle(t=1,2,\ldots,n),
(1)tω(t)0\displaystyle(-1)^{t}\omega(t)\geqslant 0 (t=0,1,2,,n),\displaystyle(t=0,1,2,\dots,n),
orthωcn.\displaystyle\operatorname{orth}\omega\geqslant c\sqrt{n}.

A useful tool in the construction of dual polynomials is the following lemma due to Razborov and Sherstov [34].

Lemma 2.11 (Razborov and Sherstov).

Fix integers $D$ and $n,$ where $0\leqslant D<n.$ Then there is an (explicitly given) function $\zeta\colon\{0,1\}^{n}\to\mathbb{R}$ such that

\begin{align}
\operatorname{supp}\zeta &\subseteq\{0,1\}^{n}|_{\leqslant D}\cup\{1^{n}\}, \tag{2.4}\\
\zeta(1^{n}) &=1, \tag{2.5}\\
\|\zeta\|_{1} &\leqslant 1+2^{D}\binom{n}{D}, \tag{2.6}\\
\operatorname{orth}\zeta &>D. \tag{2.7}
\end{align}
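The statement of Lemma 2.11 does not spell out the function $\zeta$. The sketch below builds one symmetric function with properties (2.4)–(2.7) by spreading a univariate dual object on the points $\{0,1,\ldots,D,n\}$ evenly over each Hamming weight level. This construction is an assumption made for illustration and is not claimed to coincide with the function from [34]; the names `zeta_symmetric` and `inner_with_monomial` are hypothetical. The properties are checked by brute force for one small pair $(n,D)$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def zeta_symmetric(n, D):
    """A symmetric candidate for zeta on {0,1}^n, returned as a dict of exact fractions."""
    points = list(range(D + 1)) + [n]
    def h(t):
        # h(t) = 1 / prod_{s != t} (t - s) annihilates univariate polynomials of degree <= D.
        denom = 1
        for s in points:
            if s != t:
                denom *= t - s
        return Fraction(1, denom)
    scale = 1 / h(n)                       # normalize so that zeta(1^n) = 1
    zeta = {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        zeta[x] = scale * h(t) / comb(n, t) if t in points else Fraction(0)
    return zeta

def inner_with_monomial(zeta, S):
    """<zeta, prod_{i in S} x_i>, summing zeta over inputs with x_i = 1 for all i in S."""
    return sum(v for x, v in zeta.items() if all(x[i] for i in S))

n, D = 5, 2
zeta = zeta_symmetric(n, D)
l1_norm = sum(abs(v) for v in zeta.values())
orthogonal = all(inner_with_monomial(zeta, S) == 0
                 for k in range(D + 1) for S in combinations(range(n), k))
print(zeta[(1,) * n], l1_norm, orthogonal)   # 1 32 True
```

For $n=5$ and $D=2$, the computed norm $32$ is below the bound $1+2^{2}\binom{5}{2}=41$ from (2.6).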

In more detail, this result corresponds to taking k=Dk=D and ζ=(1)ng\zeta=(-1)^{n}g in the proof of Lemma 3.2 of [34]. We will need the following natural generalization of Lemma 2.11.

Lemma 2.12.

Fix integers $D$ and $B,$ where $0\leqslant D<B.$ Let $y\in\{0,1\}^{B}$ be a string with $|y|>D.$ Then there is an (explicitly given) function $\zeta_{y}\colon\{0,1\}^{B}\to\mathbb{R}$ such that

suppζy{x:xy and |x|D}{y},\displaystyle\operatorname{supp}\zeta_{y}\subseteq\{x:x\leqslant y\text{ and }|x|\leqslant D\}\cup\{y\}, (2.8)
ζy(y)=1,\displaystyle\zeta_{y}(y)=1, (2.9)
ζy11+2D(BD),\displaystyle\|\zeta_{y}\|_{1}\leqslant 1+2^{D}\binom{B}{D}, (2.10)
orthζy>D.\displaystyle\operatorname{orth}\zeta_{y}>D. (2.11)
Proof.

Set n=|y|.n=|y|. Lemma 2.11 gives an explicit function ζ:{0,1}n\zeta\colon\{0,1\}^{n}\to\mathbb{R} that satisfies (2.4)–(2.7). Define ζy:{0,1}B\zeta_{y}\colon\{0,1\}^{B}\to\mathbb{R} by

ζy(x)=ζ(x|S)iS(1xi),\zeta_{y}(x)=\zeta(x|_{S})\prod_{i\notin S}(1-x_{i}),

where S={i:yi=1}S=\{i:y_{i}=1\}. Then (2.9) and (2.10) are immediate from (2.5) and (2.6), respectively. Property (2.11) follows from (2.7) in light of Proposition 2.6 (ii). To verify the remaining property (2.8), fix any input xx with ζy(x)0\zeta_{y}(x)\neq 0. Then the definition of ζy\zeta_{y} implies that x|S¯x|_{\overline{S}} is the zero vector, whereas (2.4) implies that x|Sx|_{S} is either 1n1^{n} or a string of Hamming weight at most D.D. In the former case, we have x=yx=y; in the latter case, xyx\leqslant y and |x|D.|x|\leqslant D.

Informally, Lemmas 2.11 and 2.12 are useful when one needs to adjust a dual object’s metric properties while preserving its orthogonality to low-degree polynomials. These lemmas play a basic role in several recent papers [34, 17, 14, 18, 49] as well as our work. For the reader’s benefit, we encapsulate this procedure as Lemma 2.13 below and provide a detailed proof.

Lemma 2.13.

Let $\Phi\colon\{0,1\}^{B}\to\mathbb{R}$ be given. Fix integers $T\geqslant D\geqslant 0$. Then there is an (explicitly given) function $\tilde{\Phi}\colon\{0,1\}^{B}\to\mathbb{R}$ such that

suppΦ~{0,1}B|T,\displaystyle\operatorname{supp}\tilde{\Phi}\subseteq\{0,1\}^{B}|_{\leqslant T}, (2.12)
orth(ΦΦ~)>D,\displaystyle\operatorname{orth}(\Phi-\tilde{\Phi})>D, (2.13)
ΦΦ~1(1+2D(BD))x:|x|>T|Φ(x)|.\displaystyle\|\Phi-\tilde{\Phi}\|_{1}\leqslant\left(1+2^{D}\binom{B}{D}\right)\sum_{x:|x|>T}|\Phi(x)|. (2.14)
Proof (adapted from [34, 17, 14, 18, 49]).

For TB,T\geqslant B, the lemma holds trivially with Φ~=Φ.\tilde{\Phi}=\Phi. In what follows, we treat the complementary case T<B.T<B.

For each y{0,1}B|>Ty\in\{0,1\}^{B}|_{>T}, Lemma 2.12 constructs a function ζy:{0,1}B\zeta_{y}\colon\{0,1\}^{B}\to\mathbb{R} that obeys (2.8)–(2.11). Define

Φ~=Φy{0,1}B|>TΦ(y)ζy.\tilde{\Phi}=\Phi-\sum_{y\in\{0,1\}^{B}|_{>T}}\Phi(y)\zeta_{y}.

Then for x{0,1}B|>Tx\in\{0,1\}^{B}|_{>T}, properties (2.8) and (2.9) force ζy(x)=δx,y\zeta_{y}(x)=\delta_{x,y} and consequently Φ~(x)=Φ(x)Φ(x)=0.\tilde{\Phi}(x)=\Phi(x)-\Phi(x)=0. This settles (2.12). Property (2.13) is justified by

orth(ΦΦ~)=orth(y{0,1}B|>TΦ(y)ζy)miny{0,1}B|>Torthζy>D,\operatorname{orth}(\Phi-\tilde{\Phi})=\operatorname{orth}\left(\sum_{y\in\{0,1\}^{B}|_{>T}}\Phi(y)\zeta_{y}\right)\geqslant\min_{y\in\{0,1\}^{B}|_{>T}}\operatorname{orth}\zeta_{y}>D,

where the last two steps use Proposition 2.6(i) and (2.11), respectively. The final property (2.14) can be derived as follows:

ΦΦ~1\displaystyle\|\Phi-\tilde{\Phi}\|_{1} =y{0,1}B|>TΦ(y)ζy1\displaystyle=\left\|\sum_{y\in\{0,1\}^{B}|_{>T}}\Phi(y)\zeta_{y}\right\|_{1}
y{0,1}B|>T|Φ(y)|ζy1\displaystyle\leqslant\sum_{y\in\{0,1\}^{B}|_{>T}}|\Phi(y)|\|\zeta_{y}\|_{1}
(1+2D(BD))y{0,1}B|>T|Φ(y)|,\displaystyle\leqslant\left(1+2^{D}\binom{B}{D}\right)\sum_{y\in\{0,1\}^{B}|_{>T}}|\Phi(y)|,

where the last two steps use the triangle inequality and (2.10), respectively. ∎

2.7. Symmetrization

Let SnS_{n} denote the symmetric group on nn elements. For a permutation σSn\sigma\in S_{n} and an arbitrary sequence x=(x1,x2,,xn),x=(x_{1},x_{2},\ldots,x_{n}), we adopt the shorthand σx=(xσ(1),xσ(2),,xσ(n)).\sigma x=(x_{\sigma(1)},x_{\sigma(2)},\ldots,x_{\sigma(n)}). A function f(x1,x2,,xn)f(x_{1},x_{2},\ldots,x_{n}) is called symmetric if it is invariant under permutation of the input variables: f(x1,x2,,xn)=f(xσ(1),xσ(2),,xσ(n))f(x_{1},x_{2},\ldots,x_{n})=f(x_{\sigma(1)},x_{\sigma(2)},\ldots,x_{\sigma(n)}) for all xx and σ.\sigma. Symmetric functions on {0,1}n\{0,1\}^{n} are intimately related to univariate polynomials, as was first observed by Minsky and Papert in their symmetrization argument [30].

Proposition 2.14 (Minsky and Papert).

Let $p\colon\mathbb{R}^{n}\to\mathbb{R}$ be a given polynomial. Then the mapping

\[t\mapsto\operatorname*{\mathbf{E}}_{x\in\{0,1\}^{n}|_{t}}\;p(x)\]

is a univariate polynomial on $\{0,1,2,\ldots,n\}$ of degree at most $\deg p.$
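The symmetrization in Proposition 2.14 is easy to check by brute force for small $n$. The sketch below averages a polynomial over each Hamming weight level and recovers the degree of the resulting univariate function from its forward differences. The sample polynomials and the names `symmetrize` and `degree_on_grid` are illustrative assumptions; the helper returns $-1$ rather than $-\infty$ for the zero polynomial.

```python
from fractions import Fraction
from itertools import product

def symmetrize(p, n):
    """Return [E_{|x| = t} p(x) for t = 0, ..., n] as exact fractions."""
    totals = [Fraction(0)] * (n + 1)
    counts = [0] * (n + 1)
    for x in product((0, 1), repeat=n):
        t = sum(x)
        totals[t] += p(x)
        counts[t] += 1
    return [totals[t] / counts[t] for t in range(n + 1)]

def degree_on_grid(values):
    """Degree of the interpolating polynomial through (0, v_0), (1, v_1), ...

    Uses Newton's forward differences: the degree is the largest k for which
    the k-th forward difference at 0 is nonzero (or -1 if all values vanish).
    """
    degree, row, k = -1, list(values), 0
    while row:
        if row[0] != 0:
            degree = k
        row = [b - a for a, b in zip(row, row[1:])]
        k += 1
    return degree

n = 5
p1 = lambda x: x[0] * x[1] * x[2] + x[3]   # total degree 3
p2 = lambda x: x[0] * x[1] - x[2] * x[3]   # total degree 2, symmetrizes to zero

print(degree_on_grid(symmetrize(p1, n)))   # 3
print(degree_on_grid(symmetrize(p2, n)))   # -1
```

The second example shows that the degree can drop strictly under symmetrization; the proposition only guarantees an upper bound.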

The next result, proved in [49, Corollary 2.13], generalizes Minsky and Papert’s symmetrization to the setting when x1,x2,,xnx_{1},x_{2},\ldots,x_{n} are vectors rather than bits.

Fact 2.15 (Sherstov and Wu).

Let p:(N)θp\colon(\mathbb{R}^{N})^{\theta}\to\mathbb{R} be a given polynomial. Then the mapping

v𝐄x{0N,e1,e2,,eN}θ:x1+x2++xθ=vp(x)v\mapsto\operatorname*{\mathbf{E}}_{\begin{subarray}{c}x\in\{0^{N},e_{1},e_{2},\ldots,e_{N}\}^{\theta}:\\ x_{1}+x_{2}+\cdots+x_{\theta}=v\end{subarray}}\;\;p(x) (2.15)

is a polynomial on N|θ\mathbb{N}^{N}|_{\leqslant\theta} of degree at most degp.\deg p.

Minsky and Papert’s symmetrization corresponds to N=1N=1 in Fact 2.15.

2.8. Number theory

For positive integers $a$ and $b$ that are relatively prime, we let $(a^{-1})_{b}\in\{1,2,\ldots,b-1\}$ denote the multiplicative inverse of $a$ modulo $b.$ The following fact is well-known and straightforward to verify; see, e.g., [48, Fact 2.8].

Fact 2.16.

For any positive integers $a$ and $b$ that are relatively prime,

\[\frac{(a^{-1})_{b}}{b}+\frac{(b^{-1})_{a}}{a}-\frac{1}{ab}\in\mathbb{Z}.\]
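Fact 2.16 can be spot-checked with exact rational arithmetic. The sketch below assumes Python 3.8 or later, where the built-in `pow(a, -1, b)` returns the modular inverse, and restricts attention to $a,b\geqslant 2$ so that both inverses lie in the ranges stated above. The function names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def inv_mod(a, b):
    """The multiplicative inverse of a modulo b, as an element of {1, ..., b-1}."""
    return pow(a, -1, b)

def fact_2_16_value(a, b):
    """The expression in Fact 2.16, evaluated exactly."""
    return (Fraction(inv_mod(a, b), b)
            + Fraction(inv_mod(b, a), a)
            - Fraction(1, a * b))

print(fact_2_16_value(3, 5))   # 2/5 + 2/3 - 1/15 = 1

coprime_pairs = [(a, b) for a in range(2, 40) for b in range(2, 40) if gcd(a, b) == 1]
print(all(fact_2_16_value(a, b).denominator == 1 for a, b in coprime_pairs))  # True
```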

The prime counting function $\pi(x)$ for a real argument $x\geqslant 0$ evaluates to the number of prime numbers less than or equal to $x.$ In this manuscript, it will be clear from the context whether $\pi$ refers to $3.14159\ldots$ or the prime counting function. The asymptotic growth of the latter is given by the prime number theorem, which states that $\pi(n)\sim n/\ln n.$ The following explicit bound on $\pi(n)$ is due to Rosser [35].

Fact 2.17 (Rosser).

For $n\geqslant 55,$

\[\frac{n}{\ln n+2}<\pi(n)<\frac{n}{\ln n-4}.\]

The number of distinct prime divisors of a natural number $n$ is denoted $\nu(n)$. The following first-principles bound on $\nu(n)$ is asymptotically tight for infinitely many $n;$ see [48, Fact 2.11] for details.

Fact 2.18.

The number of distinct prime divisors of $n$ obeys

\[(\nu(n)+1)!\leqslant n.\]

In particular,

\[\nu(n)\leqslant(1+o(1))\frac{\ln n}{\ln\ln n}.\]
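The first inequality of Fact 2.18 is simple to confirm numerically. The sketch below computes $\nu(n)$ by trial division, which is adequate only for small $n$; the function name `nu` mirrors the paper's symbol, and the tested range is an arbitrary choice.

```python
from math import factorial

def nu(n):
    """Number of distinct prime divisors of n, by trial division."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

print(nu(30), nu(2310), nu(1024))   # 3 5 1
print(all(factorial(nu(n) + 1) <= n for n in range(1, 5000)))   # True
```

Equality holds, for example, at $n=2$ and $n=6$.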

3. Balanced colorings

For integers $n\geqslant k\geqslant 1$ and $r\geqslant 1,$ consider a mapping $\gamma\colon\binom{[n]}{k}\to[r].$ We refer to any such $\gamma$ as a coloring of $\binom{[n]}{k}$ with $r$ colors. An important ingredient in our work is the construction of a balanced coloring, in the following technical sense.

Definition 3.1.

Let $\gamma\colon\binom{[n]}{k}\to[r]$ be a given coloring. For a subset $A\subseteq[n],$ we say that $\gamma$ is $\epsilon$-balanced on $A$ iff for each $i\in[r],$

\[\frac{1-\epsilon}{r}\binom{|A|}{k}\leqslant\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|\leqslant\frac{1+\epsilon}{r}\binom{|A|}{k}.\]

We define $\gamma$ to be $(\epsilon,\delta,m)$-balanced iff

\[\operatorname*{\mathbf{P}}_{A\in\binom{[n]}{\ell}}[\gamma\text{ is $\epsilon$-balanced on $A$}]\geqslant 1-\delta\]

for all $\ell\in\{m,m+1,\ldots,n\}.$
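Definition 3.1 can be checked by brute force on tiny parameters. The sketch below measures how balanced a coloring is on a given set and computes the exact fraction of sets of a given size on which it is balanced. The toy coloring (sum of elements modulo 2) and the names `balance_defect` and `balanced_fraction` are assumptions made for illustration; the code presumes that the set has at least $k$ elements.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def balance_defect(gamma, A, k, r):
    """Smallest eps for which gamma is eps-balanced on A (requires |A| >= k)."""
    counts = [0] * r
    for S in combinations(sorted(A), k):
        counts[gamma(S) - 1] += 1            # colors are numbered 1, ..., r
    target = Fraction(comb(len(A), k), r)
    return max(abs(c - target) for c in counts) / target

def balanced_fraction(gamma, n, k, r, ell, eps):
    """Probability over uniformly random A of size ell that gamma is eps-balanced on A."""
    sets = list(combinations(range(1, n + 1), ell))
    good = sum(1 for A in sets if balance_defect(gamma, A, k, r) <= eps)
    return Fraction(good, len(sets))

def gamma(S):
    """Toy 2-coloring of k-subsets: parity of the sum of the elements."""
    return sum(S) % 2 + 1

n, k, r = 4, 2, 2
print(balance_defect(gamma, {1, 2, 3, 4}, k, r))               # 1/3
print(balanced_fraction(gamma, n, k, r, 3, Fraction(1, 3)))    # 1
print(balanced_fraction(gamma, n, k, r, 2, Fraction(1, 2)))    # 0
```

In the terminology of the definition, this toy coloring is $(1/3,0,3)$-balanced, but no choice of $\epsilon<1$ works for sets of size $2$, where only one pair is available.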

As one might expect, a uniformly random coloring is balanced with high probability; we establish this fact in Section 3.1. In Sections 3.2–3.5 that follow, we construct a highly balanced coloring based on an integer set with low discrepancy. The reader who is interested only in the quantitative aspect of our theorems and is not concerned about explicitness may read Section 3.1 and skip without loss of continuity to Section 4.

3.1. Existence of balanced colorings

The next lemma uses the probabilistic method to establish the existence of balanced colorings with excellent parameters.

Lemma 3.2.

Let ϵ,δ(0,1]\epsilon,\delta\in(0,1] be given. Let n,m,k,rn,m,k,r be positive integers with nmkn\geqslant m\geqslant k and

(mk)3rϵ2ln2rnδ.\binom{m}{k}\geqslant\frac{3r}{\epsilon^{2}}\cdot\ln\frac{2rn}{\delta}. (3.1)

Then there exists an (ϵ,δ,m)(\epsilon,\delta,m)-balanced coloring γ:([n]k)[r]\gamma\colon\binom{[n]}{k}\to[r].

Proof.

Let γ:([n]k)[r]\gamma\colon\binom{[n]}{k}\to[r] be a uniformly random coloring. For fixed ii and A([n]m),A\in\binom{[n]}{{\geqslant}m}, the cardinality |γ1(i)(Ak)||\gamma^{-1}(i)\cap\binom{A}{k}| is the sum of (|A|k)\binom{|A|}{k} independent Bernoulli random variables, each with expected value 1/r.1/r. As a result,

𝐏γ[γ is not ϵ-balanced on A]\displaystyle\operatorname*{\mathbf{P}}_{\gamma}[\text{$\gamma$ is not $\epsilon$-balanced on $A$}]
=𝐏γ[maxi[r]||γ1(i)(Ak)|1r(|A|k)|>ϵr(|A|k)]\displaystyle\qquad\qquad=\operatorname*{\mathbf{P}}_{\gamma}\left[\max_{i\in[r]}\left|\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|-\frac{1}{r}\binom{|A|}{k}\right|>\frac{\epsilon}{r}\binom{|A|}{k}\right]
rmaxi[r]𝐏γ[||γ1(i)(Ak)|1r(|A|k)|>ϵr(|A|k)]\displaystyle\qquad\qquad\leqslant r\max_{i\in[r]}\;\;\operatorname*{\mathbf{P}}_{\gamma}\left[\left|\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|-\frac{1}{r}\binom{|A|}{k}\right|>\frac{\epsilon}{r}\binom{|A|}{k}\right]
r2exp(ϵ23r(|A|k))\displaystyle\qquad\qquad\leqslant r\cdot 2\exp\left(-\frac{\epsilon^{2}}{3r}\binom{|A|}{k}\right)
r2exp(ϵ23r(mk))\displaystyle\qquad\qquad\leqslant r\cdot 2\exp\left(-\frac{\epsilon^{2}}{3r}\binom{m}{k}\right)
δn,\displaystyle\qquad\qquad\leqslant\frac{\delta}{n}, (3.2)

where the second step applies the union bound over i[r],i\in[r], the third step uses the Chernoff bound (Theorem 2.1), and the fifth step uses (3.1). Now

𝐄γmax{m,m+1,,n}𝐏A([n])[γ is not ϵ-balanced on A]\displaystyle\operatorname*{\mathbf{E}}_{\gamma}\;\max_{\ell\in\{m,m+1,\ldots,n\}}\operatorname*{\mathbf{P}}_{A\in\binom{[n]}{\ell}}[\text{$\gamma$ is not $\epsilon$-balanced on $A$]}
𝐄γ=mn𝐏A([n])[γ is not ϵ-balanced on A]\displaystyle\qquad\qquad\leqslant\operatorname*{\mathbf{E}}_{\gamma}\;\;\sum_{\ell=m}^{n}\operatorname*{\mathbf{P}}_{A\in\binom{[n]}{\ell}}[\text{$\gamma$ is not $\epsilon$-balanced on $A$]}
==mn𝐄A([n])𝐏γ[γ is not ϵ-balanced on A]\displaystyle\qquad\qquad=\sum_{\ell=m}^{n}\operatorname*{\mathbf{E}}_{A\in\binom{[n]}{\ell}}\operatorname*{\mathbf{P}}_{\gamma}[\text{$\gamma$ is not $\epsilon$-balanced on $A$}]
=mnδn\displaystyle\qquad\qquad\leqslant\sum_{\ell=m}^{n}\frac{\delta}{n}
δ,\displaystyle\qquad\qquad\leqslant\delta,

where the next-to-last step uses (3.2). We conclude that there exists a coloring γ\gamma with

max{m,m+1,,n}𝐏A([n])[γ is not ϵ-balanced on A]δ,\max_{\ell\in\{m,m+1,\ldots,n\}}\operatorname*{\mathbf{P}}_{A\in\binom{[n]}{\ell}}[\text{$\gamma$ is not $\epsilon$-balanced on $A$]}\leqslant\delta,

which is the definition of an (ϵ,δ,m)(\epsilon,\delta,m)-balanced coloring. ∎

For our purposes, the following consequence of Lemma 3.2 will be sufficient.

Corollary 3.3.

Let n,m,k,rn,m,k,r be positive integers with nmk2.n\geqslant m\geqslant k^{2}. Let ϵ(0,1]\epsilon\in(0,1] be given with

ϵ3rkln(n+1)mk/4.\epsilon\geqslant\frac{3r\sqrt{k\ln(n+1)}}{m^{k/4}}.

Then there exists an (ϵ,ϵ,m)(\epsilon,\epsilon,m)-balanced coloring γ:([n]k)[r].\gamma\colon\binom{[n]}{k}\to[r].

Proof.

We have

3rϵ2ln2rnϵ\displaystyle\frac{3r}{\epsilon^{2}}\cdot\ln\frac{2rn}{\epsilon} 3rmk/29r2kln(n+1)ln(2rn3rkln(n+1)mk/4)\displaystyle\leqslant\frac{3rm^{k/2}}{9r^{2}k\ln(n+1)}\cdot\ln\left(\frac{2rn}{3r\sqrt{k\ln(n+1)}}\cdot m^{k/4}\right)
mk/23rkln(n+1)ln(nmk/4)\displaystyle\leqslant\frac{m^{k/2}}{3rk\ln(n+1)}\cdot\ln(n\cdot m^{k/4})
mk/23rkln(n+1)2klnn\displaystyle\leqslant\frac{m^{k/2}}{3rk\ln(n+1)}\cdot 2k\ln n
mk/2\displaystyle\leqslant m^{k/2}
(mk)k\displaystyle\leqslant\left(\frac{m}{k}\right)^{k}
(mk),\displaystyle\leqslant\binom{m}{k},

where the next-to-last step uses the hypothesis mk2.m\geqslant k^{2}. By Lemma 3.2, we conclude that there is an (ϵ,ϵ,m)(\epsilon,\epsilon,m)-balanced coloring γ:([n]k)[r].\gamma\colon\binom{[n]}{k}\to[r].

In Sections 3.2–3.5 below, we will give an explicit coloring with parameters essentially matching Corollary 3.3.

3.2. Discrepancy defined

Discrepancy is a measure of pseudorandomness or aperiodicity of a multiset of integers with respect to a given modulus $M.$ Formally, let $M\geqslant 2$ be a given integer. The $M$-discrepancy of a nonempty multiset $Z=\{z_{1},z_{2},\ldots,z_{n}\}$ of arbitrary integers is defined as

\[\operatorname{disc}_{M}(Z)=\max_{k=1,2,\ldots,M-1}\left|\frac{1}{n}\sum_{j=1}^{n}\omega^{kz_{j}}\right|,\]

where $\omega$ is a primitive $M$-th root of unity; the right-hand side is obviously the same for any such $\omega$. Equivalently, we may write

\[\operatorname{disc}_{M}(Z)=\max_{\omega\neq 1:\omega^{M}=1}\left|\frac{1}{n}\sum_{j=1}^{n}\omega^{z_{j}}\right|,\]

where the maximum is over $M$-th roots of unity $\omega$ other than $1$. Yet another way to think of $M$-discrepancy is in terms of the discrete Fourier transform on $\mathbb{Z}_{M}.$ Specifically, consider the frequency vector $(f_{0},f_{1},\ldots,f_{M-1})$ of $Z$, where $f_{j}$ is the total number of element occurrences in $Z$ that are congruent to $j$ modulo $M.$ Applying the discrete Fourier transform to $(f_{j})_{j=0}^{M-1}$ produces the sequence $(\sum_{j=0}^{M-1}f_{j}\exp(-2\pi\mathbf{i}kj/M))_{k=0}^{M-1}=(\sum_{j=1}^{n}\exp(-2\pi\mathbf{i}kz_{j}/M))_{k=0}^{M-1},$ which is a permutation of $(n,\sum_{j=1}^{n}\omega^{z_{j}},\ldots,\sum_{j=1}^{n}\omega^{(M-1)z_{j}})$ for a primitive $M$-th root of unity $\omega.$ Thus, the $M$-discrepancy of $Z$ coincides up to a normalizing factor with the largest absolute value of a nonconstant Fourier coefficient of the frequency vector of $Z.$ The notion of $M$-discrepancy has a long history in combinatorics and theoretical computer science; see [48] for a bibliographic overview.
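The definition of discrepancy translates directly into a short floating-point computation. The sketch below is a naive evaluation that takes time proportional to $M$ times the size of the multiset, intended only to make the definition tangible; the function name `disc` mirrors the paper's notation, and the example multisets are chosen here for illustration.

```python
import cmath

def disc(Z, M):
    """M-discrepancy of a nonempty multiset Z of integers, given as a list."""
    n = len(Z)
    worst = 0.0
    for k in range(1, M):
        # Sum of omega^(k * z) over z in Z, for omega = exp(2*pi*i / M).
        s = sum(cmath.exp(2j * cmath.pi * ((k * z) % M) / M) for z in Z)
        worst = max(worst, abs(s) / n)
    return worst

print(disc(list(range(7)), 7))   # approximately 0: a full residue system is perfectly spread
print(disc([0], 5))              # 1.0: a single point is maximally concentrated
print(disc([0, 1], 4))           # approximately 0.7071, attained at k = 1 and k = 3
```

The extreme values $0$ and $1$ correspond to the most and least pseudorandom multisets, respectively, with respect to the modulus.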

Lemma 3.4 (Discrepancy under sampling without replacement).

Fix integers nt1n\geqslant t\geqslant 1 and M2.M\geqslant 2. Let Z={z1,z2,,zn}Z=\{z_{1},z_{2},\ldots,z_{n}\} be a multiset of integers. Then for all α[0,1],\alpha\in[0,1],

𝐏S([n]t)[discM({zi:iS})discM(Z)α]4Mexp(tα28),\operatorname*{\mathbf{P}}_{S\in\binom{[n]}{t}}[\operatorname{disc}_{M}(\{z_{i}:i\in S\})-\operatorname{disc}_{M}(Z)\geqslant\alpha]\leqslant 4M\exp\left(-\frac{t\alpha^{2}}{8}\right),

where {zi:iS}\{z_{i}:i\in S\} is understood to be a multiset of cardinality tt.

Proof.

Fix an MM-th root of unity ω.\omega. Then Re(ωz1),Re(ωz2),,Re(ωzn)\operatorname{Re}(\omega^{z_{1}}),\operatorname{Re}(\omega^{z_{2}}),\ldots,\operatorname{Re}(\omega^{z_{n}}) range in [1,1][-1,1]. Now, let S([n]t)S\in\binom{[n]}{t} be a uniformly random subset. Then the Hoeffding inequality for sampling without replacement (Theorem 2.3) implies that

𝐏S([n]t)[|1tjSRe(ωzj)1nj=1nRe(ωzj)|α2]2exp(tα28).\operatorname*{\mathbf{P}}_{S\in\binom{[n]}{t}}\left[\left|\frac{1}{t}\sum_{j\in S}\operatorname{Re}(\omega^{z_{j}})-\frac{1}{n}\sum_{j=1}^{n}\operatorname{Re}(\omega^{z_{j}})\right|\geqslant\frac{\alpha}{2}\right]\leqslant 2\exp\left(-\frac{t\alpha^{2}}{8}\right).

Analogously,

𝐏S([n]t)[|1tjSIm(ωzj)1nj=1nIm(ωzj)|α2]2exp(tα28).\operatorname*{\mathbf{P}}_{S\in\binom{[n]}{t}}\left[\left|\frac{1}{t}\sum_{j\in S}\operatorname{Im}(\omega^{z_{j}})-\frac{1}{n}\sum_{j=1}^{n}\operatorname{Im}(\omega^{z_{j}})\right|\geqslant\frac{\alpha}{2}\right]\leqslant 2\exp\left(-\frac{t\alpha^{2}}{8}\right).

Combining these two equations shows that for every MM-th root of unity ω,\omega,

𝐏S([n]t)[|1tjSωzj1nj=1nωzj|α]4exp(tα28).\operatorname*{\mathbf{P}}_{S\in\binom{[n]}{t}}\left[\left|\frac{1}{t}\sum_{j\in S}\omega^{z_{j}}-\frac{1}{n}\sum_{j=1}^{n}\omega^{z_{j}}\right|\geqslant\alpha\right]\leqslant 4\exp\left(-\frac{t\alpha^{2}}{8}\right). (3.3)

Now

\[
\begin{aligned}
\operatorname{disc}_{M}(\{z_{i}:i\in S\})-\operatorname{disc}_{M}(Z)
&=\max_{\omega}\left|\frac{1}{t}\sum_{j\in S}\omega^{z_{j}}\right|-\max_{\omega}\left|\frac{1}{n}\sum_{j=1}^{n}\omega^{z_{j}}\right|\\
&\leqslant\max_{\omega}\left\{\left|\frac{1}{t}\sum_{j\in S}\omega^{z_{j}}\right|-\left|\frac{1}{n}\sum_{j=1}^{n}\omega^{z_{j}}\right|\right\}\\
&\leqslant\max_{\omega}\left|\frac{1}{t}\sum_{j\in S}\omega^{z_{j}}-\frac{1}{n}\sum_{j=1}^{n}\omega^{z_{j}}\right|,
\end{aligned}
\tag{3.4}
\]

where the maximum in all equations is taken over $M$-th roots of unity $\omega\neq 1.$ Using (3.3) and the union bound over $\omega,$ we see that the right-hand side of (3.4) is less than $\alpha$ with probability at least $1-4M\exp(-t\alpha^{2}/8).$ ∎
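For small parameters, the statement just proved can be checked by exhaustive enumeration. The sketch below is only an illustration: it reads $\operatorname{disc}_{M}$ as the maximum over $M$-th roots of unity $\omega\neq 1$ of the absolute value of the average of $\omega^{z}$, as in the proof above, and the function names `disc`, `excess_fraction`, and `lemma_bound` are ours, not the paper's.

```python
import cmath
from itertools import combinations
from math import exp


def disc(M, Z):
    """M-discrepancy of the multiset Z: max over roots w != 1 of |average of w**z|."""
    Z = list(Z)
    best = 0.0
    for a in range(1, M):  # w = exp(2*pi*i*a/M) runs over the M-th roots of unity other than 1
        total = sum(cmath.exp(2j * cmath.pi * ((a * z) % M) / M) for z in Z)
        best = max(best, abs(total) / len(Z))
    return best


def excess_fraction(M, Z, t, alpha):
    """Exact fraction of t-subsets S with disc_M({z_i : i in S}) - disc_M(Z) >= alpha."""
    base = disc(M, Z)
    subsets = list(combinations(range(len(Z)), t))
    bad = sum(1 for S in subsets if disc(M, [Z[i] for i in S]) - base >= alpha)
    return bad / len(subsets)


def lemma_bound(M, t, alpha):
    """Right-hand side 4*M*exp(-t*alpha^2/8) of the lemma."""
    return 4 * M * exp(-t * alpha ** 2 / 8)
```

The enumeration is exponential in general, so it is meant only for toy instances such as $n=20$, $t=18$, $M=2$, where the bound is already nontrivial for $\alpha=1$.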

3.3. A low-discrepancy set

The construction of sparse integer sets with small discrepancy relative to a given modulus $M$ is a well-studied problem. There is an inherent trade-off between the size of the set and the discrepancy it achieves, and different works have focused on different regimes depending on the application at hand. We work in a regime not considered previously: for any constant $\epsilon>0$, we construct a set of cardinality at most $M^{\epsilon}$ that has $M$-discrepancy at most $M^{-\delta}$ for some constant $\delta=\delta(\epsilon)>0.$ We construct such a set based on the following result.

Theorem 3.5 (cf. [2, 48]).

Fix an integer $R\geqslant 1$ and reals $P\geqslant 2$ and $\Delta\geqslant 1$. Let $M$ be an integer with

\[
M\geqslant P^{2}(R+1).
\]

Fix a set $S_{p}\subseteq\{1,2,\ldots,p-1\}$ for each prime $p\in(P/2,P]$ with $p\nmid M$. Suppose further that the cardinalities of any two sets from among the $S_{p}$ differ by a factor of at most $\Delta.$ Consider the multiset

\[
S=\{(r+s\cdot(p^{-1})_{M})\bmod M:\ r=1,\ldots,R;\quad p\in(P/2,P]\text{ prime with }p\nmid M;\quad s\in S_{p}\}. \tag{3.5}
\]

Then the elements of $S$ are pairwise distinct and nonzero. Moreover, if $S\neq\varnothing$ then

\[
\operatorname{disc}_{M}(S)\leqslant\frac{c}{\sqrt{R}}+\frac{c\log M}{\log\log M}\cdot\frac{\log P}{P}\cdot\Delta+\max_{p}\{\operatorname{disc}_{p}(S_{p})\}
\]

for some (explicitly given) constant $c\geqslant 1$ independent of $P,R,M,\Delta.$

Ajtai et al. [2] proved a special case of Theorem 3.5 for $M$ prime and $\Delta=1.$ Their argument was generalized in [48, Theorem 3.6] to arbitrary moduli $M$, again in the setting of $\Delta=1$. The treatment in [48] in turn readily generalizes to any $\Delta\geqslant 1,$ and for the reader's convenience we provide a complete proof of Theorem 3.5 in Appendix A. With this result in hand, we obtain the low-discrepancy set with the needed parameters:

Theorem 3.6 (Explicit low-discrepancy set).

For all integers $M\geqslant 2$ and $t\geqslant 2,$ there is an (explicitly given) nonempty set $S\subseteq\{1,2,\ldots,M\}$ with

\[
\begin{aligned}
|S|&\leqslant t, &\qquad& \text{(3.6)}\\
\operatorname{disc}_{M}(S)&\leqslant\frac{C^{*}\log t}{t^{1/4}}\cdot\frac{\log M}{1+\log\log M}, &\qquad& \text{(3.7)}
\end{aligned}
\]

where $C^{*}\geqslant 1$ is an (explicitly given) absolute constant independent of $M$ and $t.$

Proof.

Facts 2.17 and 2.18 imply that

\[
\begin{aligned}
\pi(P)-\pi\left(\frac{P}{2}\right)&\geqslant\frac{P}{C\log P} &&\text{for all }P\geqslant C, &\qquad& \text{(3.8)}\\
\nu(M)&\leqslant\frac{C\log M}{1+\log\log M} &&\text{for all }M\geqslant 2, &\qquad& \text{(3.9)}
\end{aligned}
\]

for some integer $C\geqslant 1$ that is an absolute constant. Moreover, $C$ can be easily calculated from the explicit bounds in Facts 2.17 and 2.18. We will show that the theorem holds for some constant $C^{*}\geqslant 4C^{2}.$

For $t\geqslant M,$ the theorem is trivial since the set $S=\{1,2,\ldots,M\}$ achieves $\operatorname{disc}_{M}(S)=0.$ Also, if the right-hand side of (3.7) exceeds $1$, then (3.7) holds trivially for the set $S=\{1\}.$ In what follows, we treat the remaining case when

\[
\begin{aligned}
&t<M, &\qquad& \text{(3.10)}\\
&\frac{4C^{2}\log t}{t^{1/4}}\cdot\frac{\log M}{1+\log\log M}\leqslant 1. &\qquad& \text{(3.11)}
\end{aligned}
\]

The latter condition forces

\[
t\geqslant\max\{81,C^{8}\}. \tag{3.12}
\]

Set $P=\lfloor t^{1/4}\rfloor$ and $R=\lfloor\sqrt{t}-1\rfloor.$ Then (3.10) and (3.12) imply that $P\geqslant\max\{3,C\}$, $R\geqslant 1,$ and $M\geqslant P^{2}(R+1).$ As a result, Theorem 3.5 is applicable with the sets $S_{p}=\{1,2,\ldots,p-1\}$ for prime $p\in(P/2,P].$ The discrepancy of these sets is given by $\operatorname{disc}_{p}(S_{p})=1/(p-1)$. Define $S$ by (3.5). The interval $(P/2,P]$ contains $\pi(P)-\pi(P/2)$ prime numbers, of which at most $\nu(M)$ are divisors of $M.$ We have

\[
\begin{aligned}
\pi(P)-\pi\left(\frac{P}{2}\right)-\nu(M)
&\geqslant\frac{P}{C\log P}-\frac{C\log M}{1+\log\log M}\\
&\geqslant\frac{t^{1/4}}{C\log t}-\frac{C\log M}{1+\log\log M}\\
&>0,
\end{aligned}
\]

where the first step uses (3.8), (3.9), and $P\geqslant C$, and the last step uses (3.11). We conclude that $(P/2,P]$ contains a prime that does not divide $M,$ which in turn implies that $S$ is nonempty. Continuing, $P\geqslant 3$ forces $\Delta\leqslant(P-1)/(\lceil P/2\rceil-1)\leqslant 3$ in the notation of Theorem 3.5. As a result, Theorem 3.5 guarantees (3.7) for a large enough constant $C^{*}.$ We note that $C^{*}$ can be easily calculated from the constant $c$ in Theorem 3.5. Since $|S|\leqslant RP^{2}\leqslant t$ by definition, the proof is complete. ∎
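The main case of this construction is short enough to sketch in code. The following is a minimal illustration under stated assumptions, not the paper's implementation: it takes $(p^{-1})_{M}$ to mean the multiplicative inverse of $p$ modulo $M$, it covers only the case $t<M$ with $P\geqslant 2$, and it does not enforce condition (3.11), so the returned list can be empty when every prime in $(P/2,P]$ divides $M$. The names `primes_in` and `low_discrepancy_set` are hypothetical.

```python
from math import isqrt


def primes_in(lo, hi):
    """Primes p with lo < p <= hi, by trial division (adequate for small hi)."""
    return [p for p in range(max(2, lo + 1), hi + 1)
            if all(p % q for q in range(2, isqrt(p) + 1))]


def low_discrepancy_set(M, t):
    """Multiset (3.5) with P = floor(t**(1/4)), R = floor(sqrt(t) - 1), S_p = {1, ..., p-1}."""
    P = isqrt(isqrt(t))   # floor of the fourth root of t
    R = isqrt(t) - 1      # floor(sqrt(t) - 1)
    if not (t < M and P >= 2 and R >= 1 and M >= P * P * (R + 1)):
        raise ValueError("parameters outside the main case of the proof")
    S = []
    for p in primes_in(P // 2, P):   # primes in the interval (P/2, P]
        if M % p == 0:               # skip primes that divide M
            continue
        p_inv = pow(p, -1, M)        # inverse of p modulo M (Python 3.8+)
        for r in range(1, R + 1):
            for s in range(1, p):    # s ranges over S_p = {1, ..., p-1}
                S.append((r + s * p_inv) % M)
    return S
```

For instance, `low_discrepancy_set(1009, 256)` uses $P=4$, $R=15$ and the single prime $3$, giving $30$ pairwise distinct nonzero residues, in line with the first assertion of Theorem 3.5.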

3.4. Discrepancy and balanced colorings

We will leverage the low-discrepancy integer set in Theorem 3.6 to construct a balanced coloring of $\binom{[n]}{k}.$ For this, we now develop a connection between these two notions of pseudorandomness. We will henceforth denote the modulus by $r$ since in our construction, the modulus is set equal to the number of colors in the coloring of $\binom{[n]}{k}.$ We start with a technical lemma.

Lemma 3.7.

Fix integers $\ell,k,r$ with $\ell\geqslant k\geqslant 1$ and $r\geqslant 2.$ Let $Z=\{z_{1},z_{2},\ldots,z_{\ell}\}$ be a multiset of integers. Then for all $\alpha\in[0,1],$

\[
\max_{a\in\mathbb{Z}}\left|\operatorname*{\mathbf{P}}_{S\in\binom{[\ell]}{k}}\left[\sum_{i\in S}z_{i}\equiv a\pmod{r}\right]-\frac{1}{r}\right|
\leqslant 4rk\exp\left(-\frac{\lfloor\ell/k\rfloor\alpha^{2}}{8}\right)+(\operatorname{disc}_{r}(Z)+\alpha)^{k}.
\]
Proof.

Let $\omega$ be a primitive $r$-th root of unity. Then

\[
\begin{aligned}
\operatorname*{\mathbf{P}}_{S\in\binom{[\ell]}{k}}\left[\sum_{i\in S}z_{i}\equiv a\pmod{r}\right]
&=\operatorname*{\mathbf{E}}_{S\in\binom{[\ell]}{k}}\mathbf{I}\left[\sum_{i\in S}z_{i}\equiv a\pmod{r}\right]\\
&=\operatorname*{\mathbf{E}}_{S\in\binom{[\ell]}{k}}\frac{1}{r}\sum_{t=0}^{r-1}\omega^{t(\sum_{i\in S}z_{i}-a)}\\
&=\frac{1}{r}\sum_{t=0}^{r-1}\operatorname*{\mathbf{E}}_{S\in\binom{[\ell]}{k}}\omega^{t(\sum_{i\in S}z_{i}-a)}\\
&=\frac{1}{r}+\frac{1}{r}\sum_{t=1}^{r-1}\operatorname*{\mathbf{E}}_{S\in\binom{[\ell]}{k}}\omega^{t(\sum_{i\in S}z_{i}-a)}\\
&=\frac{1}{r}+\frac{1}{r}\sum_{t=1}^{r-1}\operatorname*{\mathbf{E}}_{i_{1},i_{2},\ldots,i_{k}}\omega^{t(z_{i_{1}}+z_{i_{2}}+\cdots+z_{i_{k}}-a)},
\end{aligned}
\]

where the final expectation is taken over a uniformly random tuple of indices $i_{1},i_{2},\ldots,i_{k}\in[\ell]$ that are pairwise distinct. Therefore,

\[
\begin{aligned}
\left|\operatorname*{\mathbf{P}}_{S\in\binom{[\ell]}{k}}\left[\sum_{i\in S}z_{i}\equiv a\pmod{r}\right]-\frac{1}{r}\right|
&=\left|\frac{1}{r}\sum_{t=1}^{r-1}\operatorname*{\mathbf{E}}_{i_{1},i_{2},\ldots,i_{k}}\omega^{t(z_{i_{1}}+z_{i_{2}}+\cdots+z_{i_{k}}-a)}\right|\\
&\leqslant\frac{1}{r}\sum_{t=1}^{r-1}\left|\operatorname*{\mathbf{E}}_{i_{1},i_{2},\ldots,i_{k}}\omega^{t(z_{i_{1}}+z_{i_{2}}+\cdots+z_{i_{k}}-a)}\right|\\
&=\frac{1}{r}\sum_{t=1}^{r-1}\left|\operatorname*{\mathbf{E}}_{i_{1},i_{2},\ldots,i_{k}}\prod_{j=1}^{k}\omega^{tz_{i_{j}}}\right|.
\end{aligned}
\tag{3.13}
\]

We now introduce conditioning to make $i_{1},i_{2},\ldots,i_{k}$ independent random variables. Specifically, $i_{1},i_{2},\ldots,i_{k}$ can be generated by the following two-step procedure:

  (i) pick uniformly random sets $S_{1},S_{2},\ldots,S_{k}\in\binom{[\ell]}{\lfloor\ell/k\rfloor}$ that are pairwise disjoint;

  (ii) for $j=1,2,\ldots,k,$ pick $i_{j}$ uniformly at random from among the elements of $S_{j}$.

By symmetry, this procedure generates every tuple $(i_{1},i_{2},\ldots,i_{k})$ of pairwise distinct integers with equal probability. Importantly, conditioning on $S_{1},S_{2},\ldots,S_{k}$ makes $i_{1},i_{2},\ldots,i_{k}$ independent. Now (3.13) gives

\[
\begin{aligned}
\max_{a\in\mathbb{Z}}&\left|\operatorname*{\mathbf{P}}_{S\in\binom{[\ell]}{k}}\left[\sum_{i\in S}z_{i}\equiv a\pmod{r}\right]-\frac{1}{r}\right|\\
&\leqslant\frac{1}{r}\sum_{t=1}^{r-1}\left|\operatorname*{\mathbf{E}}_{i_{1},i_{2},\ldots,i_{k}}\prod_{j=1}^{k}\omega^{tz_{i_{j}}}\right|\\
&=\frac{1}{r}\sum_{t=1}^{r-1}\left|\operatorname*{\mathbf{E}}_{S_{1},S_{2},\ldots,S_{k}}\prod_{j=1}^{k}\operatorname*{\mathbf{E}}_{i_{j}\in S_{j}}\omega^{tz_{i_{j}}}\right|\\
&\leqslant\frac{1}{r}\sum_{t=1}^{r-1}\operatorname*{\mathbf{E}}_{S_{1},S_{2},\ldots,S_{k}}\prod_{j=1}^{k}\left|\operatorname*{\mathbf{E}}_{i_{j}\in S_{j}}\omega^{tz_{i_{j}}}\right|\\
&\leqslant\frac{1}{r}\sum_{t=1}^{r-1}\operatorname*{\mathbf{E}}_{S_{1},S_{2},\ldots,S_{k}}\prod_{j=1}^{k}\operatorname{disc}_{r}(\{z_{i}:i\in S_{j}\})\\
&\leqslant\operatorname*{\mathbf{E}}_{S_{1},S_{2},\ldots,S_{k}}\prod_{j=1}^{k}\operatorname{disc}_{r}(\{z_{i}:i\in S_{j}\}),
\end{aligned}
\tag{3.14}
\]

where $\{z_{i}:i\in S_{j}\}$ for each $j$ is a multiset of cardinality $\lfloor\ell/k\rfloor.$
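The symmetry claim behind the two-step procedure can be confirmed exactly on small instances. The sketch below is illustrative only: it uses 0-based indices, interprets step (i) as a uniform choice among all ordered $k$-tuples of pairwise disjoint sets of size $\lfloor\ell/k\rfloor$, and the function names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def disjoint_set_tuples(ell, k):
    """All k-tuples (S_1, ..., S_k) of pairwise disjoint subsets of range(ell) of size ell // k."""
    q = ell // k

    def extend(chosen, remaining):
        if len(chosen) == k:
            yield tuple(chosen)
            return
        for S in combinations(sorted(remaining), q):
            yield from extend(chosen + [S], remaining - set(S))

    yield from extend([], set(range(ell)))


def two_step_distribution(ell, k):
    """Exact law of (i_1, ..., i_k): pick the sets uniformly, then one index from each set."""
    tuples = list(disjoint_set_tuples(ell, k))
    q = ell // k
    weight = Fraction(1, len(tuples) * q ** k)  # probability of one (sets, indices) outcome
    law = Counter()
    for sets in tuples:
        for idx in product(*sets):               # one index from each S_j
            law[idx] += weight
    return law
```

With $\ell=5$ and $k=2$, every ordered pair of distinct indices receives probability exactly $1/20$, matching the uniform distribution on pairwise distinct tuples.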

Let $B_{j}$ be the event that $\{z_{i}:i\in S_{j}\}$ has $r$-discrepancy greater than $\operatorname{disc}_{r}(Z)+\alpha,$ and let $B=B_{1}\vee B_{2}\vee\cdots\vee B_{k}$. Conditioned on $B,$ we get $\prod_{j}\operatorname{disc}_{r}(\{z_{i}:i\in S_{j}\})\leqslant 1$ since $r$-discrepancy is at most $1.$ Conditioned on $\overline{B},$ we have by definition that $\prod_{j}\operatorname{disc}_{r}(\{z_{i}:i\in S_{j}\})\leqslant(\operatorname{disc}_{r}(Z)+\alpha)^{k}.$ Thus,

\[
\operatorname*{\mathbf{E}}_{S_{1},S_{2},\ldots,S_{k}}\prod_{j=1}^{k}\operatorname{disc}_{r}(\{z_{i}:i\in S_{j}\})\leqslant\operatorname*{\mathbf{P}}_{S_{1},S_{2},\ldots,S_{k}}[B]+(\operatorname{disc}_{r}(Z)+\alpha)^{k}. \tag{3.15}
\]

Recall that $S_{1},S_{2},\ldots,S_{k}$ are identically distributed, namely, each $S_{j}$ has the distribution of a uniformly random subset of $[\ell]$ of cardinality $\lfloor\ell/k\rfloor.$ As a result, Lemma 3.4 guarantees that $B_{j}$ occurs with probability at most $4r\exp(-\lfloor\ell/k\rfloor\alpha^{2}/8)$. Applying the union bound over all $j,$

\[
\operatorname*{\mathbf{P}}_{S_{1},S_{2},\ldots,S_{k}}[B]\leqslant 4rk\exp\left(-\frac{\lfloor\ell/k\rfloor\alpha^{2}}{8}\right). \tag{3.16}
\]

Combining (3.14)–(3.16) concludes the proof. ∎
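The first display in this proof rests on the identity $\mathbf{I}[x\equiv a \pmod r]=\frac{1}{r}\sum_{t=0}^{r-1}\omega^{t(x-a)}$ for a primitive $r$-th root of unity $\omega$. As a rough numerical check, one can compare the probability computed by direct counting with the one obtained from the Fourier expansion; the helper names below are hypothetical, and the enumeration is feasible only for small multisets.

```python
import cmath
from itertools import combinations


def prob_direct(Z, k, r, a):
    """P over k-subsets S of positions of Z that the sum of the chosen z_i is congruent to a mod r."""
    subsets = list(combinations(Z, k))  # combinations works by position, so repeats in Z are kept
    return sum(1 for S in subsets if (sum(S) - a) % r == 0) / len(subsets)


def prob_fourier(Z, k, r, a):
    """Same probability via (1/r) * sum over t of E[w**(t*(sum - a))], w a primitive r-th root."""
    subsets = list(combinations(Z, k))
    w = cmath.exp(2j * cmath.pi / r)
    total = 0
    for t in range(r):
        total += sum(w ** ((t * (sum(S) - a)) % r) for S in subsets) / len(subsets)
    return (total / r).real


def max_deviation(Z, k, r):
    """Left-hand side of Lemma 3.7: max over residues a of |P[sum = a mod r] - 1/r|."""
    return max(abs(prob_direct(Z, k, r, a) - 1 / r) for a in range(r))
```

The two functions agree up to floating-point error; `max_deviation` is the quantity that the lemma bounds in terms of $\operatorname{disc}_{r}(Z)$.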

We are now in a position to give our general transformation of a low-discrepancy integer set into a balanced coloring of $\binom{[n]}{k}.$

Theorem 3.8 (From a low-discrepancy set to a balanced coloring).

Let $n,m,k,r$ be integers with $n\geqslant m\geqslant k\geqslant 1$ and $r\geqslant 2.$ Let $Z=\{z_{1},z_{2},\ldots,z_{n}\}$ be a multiset of integers. Define $\gamma\colon\binom{[n]}{k}\to[r]$ by

\[
\gamma(S)=1+\left(\left(\sum_{i\in S}z_{i}\right)\bmod r\right). \tag{3.17}
\]

Let $\beta,\zeta\in[0,1]$ be arbitrary. Then $\gamma$ is $(\epsilon,\delta,m)$-balanced, where

\[
\begin{aligned}
\epsilon&=4r^{2}k\exp\left(-\frac{\lfloor m/k\rfloor\zeta^{2}}{8}\right)+r(\operatorname{disc}_{r}(Z)+\beta+\zeta)^{k},\\
\delta&=4r\exp\left(-\frac{m\beta^{2}}{8}\right).
\end{aligned}
\]
Proof.

Let $\ell\in\{m,m+1,\ldots,n\}$ be arbitrary. Then Lemma 3.4 implies that for all but a $\delta$ fraction of the sets $A\in\binom{[n]}{\ell},$

\[
\operatorname{disc}_{r}(\{z_{i}:i\in A\})\leqslant\operatorname{disc}_{r}(Z)+\beta. \tag{3.18}
\]

It remains to prove that $\gamma$ is $\epsilon$-balanced on every set $A\in\binom{[n]}{\ell}$ that satisfies (3.18). We have

\[
\begin{aligned}
\max_{a\in[r]}&\left|\frac{|\gamma^{-1}(a)\cap\binom{A}{k}|}{\binom{|A|}{k}}-\frac{1}{r}\right|\\
&=\max_{a\in[r]}\left|\operatorname*{\mathbf{P}}_{S\in\binom{A}{k}}[\gamma(S)=a]-\frac{1}{r}\right|\\
&=\max_{a\in\mathbb{Z}}\left|\operatorname*{\mathbf{P}}_{S\in\binom{A}{k}}\left[\sum_{i\in S}z_{i}\equiv a\pmod{r}\right]-\frac{1}{r}\right|\\
&\leqslant 4rk\exp\left(-\frac{\lfloor\ell/k\rfloor\zeta^{2}}{8}\right)+(\operatorname{disc}_{r}(\{z_{i}:i\in A\})+\zeta)^{k}\\
&\leqslant 4rk\exp\left(-\frac{\lfloor m/k\rfloor\zeta^{2}}{8}\right)+(\operatorname{disc}_{r}(Z)+\beta+\zeta)^{k}\\
&=\frac{\epsilon}{r},
\end{aligned}
\]

where the second step uses the definition of $\gamma,$ the third step applies Lemma 3.7, the fourth step uses (3.18) and $\ell\geqslant m$, and the fifth step uses the definition of $\epsilon$. We have shown that $\gamma$ is $\epsilon$-balanced on $A$, thereby completing the proof. ∎
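The coloring (3.17) and the balance condition verified above are easy to evaluate by brute force on toy inputs. The sketch below is an illustration under assumptions: it uses 0-based indices, and it reads "$\epsilon$-balanced on $A$" as $\max_{a}\left|\,|\gamma^{-1}(a)\cap\binom{A}{k}|/\binom{|A|}{k}-1/r\right|\leqslant\epsilon/r$, which is the inequality established in the display; the names `gamma` and `imbalance` are ours.

```python
from itertools import combinations
from math import comb


def gamma(S, z, r):
    """Color in {1, ..., r} of the index set S, following (3.17)."""
    return 1 + sum(z[i] for i in S) % r


def imbalance(A, z, k, r):
    """Smallest eps for which gamma is eps-balanced on A: r * max_a |share of color a - 1/r|."""
    counts = [0] * (r + 1)                 # counts[a] = number of k-subsets of A with color a
    for S in combinations(sorted(A), k):
        counts[gamma(S, z, r)] += 1
    total = comb(len(A), k)
    return r * max(abs(counts[a] / total - 1 / r) for a in range(1, r + 1))
```

For example, with $z_{i}=i$, $r=3$ and $k=1$, the set $A=\{0,1,\ldots,5\}$ is perfectly balanced, whereas $A=\{0,3\}$ puts every subset in color $1$ and has imbalance $2$.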

3.5. An explicit balanced coloring

Theorem 3.8 transforms any integer set with small $r$-discrepancy into a balanced coloring with $r$ colors. We now apply this transformation to the low-discrepancy integer set constructed earlier, resulting in an explicit balanced coloring.

Theorem 3.9 (Explicit balanced coloring).

Let $n,m,k,r$ be integers with $n/2\geqslant m\geqslant k\geqslant 1$ and $r\geqslant 2.$ Let $\beta,\zeta\in[0,1]$ be arbitrary. Then there is an (explicitly given) integer $n^{\prime}\in(n/2,n]$ and an (explicitly given) $(\epsilon,\delta,m)$-balanced coloring $\gamma\colon\binom{[n^{\prime}]}{k}\to[r],$ where

\[
\begin{aligned}
\epsilon&=4r^{2}k\exp\left(-\frac{\lfloor m/k\rfloor\zeta^{2}}{8}\right)+r\left(\frac{C^{*}\log n}{n^{1/4}}\cdot\frac{\log r}{1+\log\log r}+\beta+\zeta\right)^{k}, &\qquad& \text{(3.19)}\\
\delta&=4r\exp\left(-\frac{m\beta^{2}}{8}\right), &\qquad& \text{(3.20)}
\end{aligned}
\]

and $C^{*}\geqslant 1$ is the absolute constant from Theorem 3.6.

Proof.

By hypothesis, $n\geqslant 2.$ Invoke Theorem 3.6 with $M=r$ and $t=n$ to obtain an explicit nonempty set $S\subseteq\{1,2,\ldots,r\}$ with

\[
\begin{aligned}
|S|&\leqslant n,\\
\operatorname{disc}_{r}(S)&\leqslant\frac{C^{*}\log n}{n^{1/4}}\cdot\frac{\log r}{1+\log\log r}.
\end{aligned}
\]

Let $Z$ be the union of $\lfloor n/|S|\rfloor$ copies of $S.$ Then $\operatorname{disc}_{r}(Z)=\operatorname{disc}_{r}(S)$ by the definition of $r$-discrepancy. Letting $n^{\prime}=|Z|,$ we claim that $n^{\prime}\in(n/2,n].$ Indeed, the upper bound is justified by $n^{\prime}=|S|\cdot\lfloor n/|S|\rfloor\leqslant n,$ whereas the lower bound is the arithmetic mean of the bounds $n^{\prime}\geqslant|S|$ and $n^{\prime}>|S|(n/|S|-1).$

Now, let $z_{1},z_{2},\ldots,z_{n^{\prime}}$ be the elements of $Z$ and define $\gamma\colon\binom{[n^{\prime}]}{k}\to[r]$ by (3.17). Then Theorem 3.8 implies that $\gamma$ is $(\epsilon,\delta,m)$-balanced with $\epsilon,\delta$ given by (3.19) and (3.20), respectively. ∎
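The padding step in this proof, which replaces $S$ by $\lfloor n/|S|\rfloor$ copies of itself, can be mirrored in a couple of lines. This is only a sketch with a hypothetical helper name; it represents the multiset $Z$ as a Python list.

```python
def replicate(S, n):
    """Multiset Z formed by floor(n / |S|) copies of S, returned as a list."""
    S = list(S)
    return S * (n // len(S))
```

A quick exhaustive check over small $n$ and all sizes $1\leqslant|S|\leqslant n$ confirms that the resulting size $n^{\prime}=|Z|$ always satisfies $n/2<n^{\prime}\leqslant n$.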

Taking $\beta=\zeta=m^{-1/4}$ in Theorem 3.9, we obtain:

Corollary 3.10 (Explicit balanced coloring).

Let $n,m,k,r$ be integers with $n/2\geqslant m\geqslant k\geqslant 1$ and $r\geqslant 2.$ Then there is an (explicitly given) integer $n^{\prime}\in(n/2,n]$ and an (explicitly given) $(\epsilon,\delta,m)$-balanced coloring $\gamma\colon\binom{[n^{\prime}]}{k}\to[r],$ where

\[
\begin{aligned}
\epsilon&=4r^{2}k\exp\left(-\frac{\sqrt{m}}{16k}\right)+r\left(\frac{3C^{*}\log^{2}(n+r)}{m^{1/4}}\right)^{k}, &\qquad& \text{(3.21)}\\
\delta&=4r\exp\left(-\frac{\sqrt{m}}{8}\right), &\qquad& \text{(3.22)}
\end{aligned}
\]

and $C^{*}\geqslant 1$ is the absolute constant from Theorem 3.6.
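For the reader who wishes to trace the substitution $\beta=\zeta=m^{-1/4}$, the exponential terms of (3.21) and (3.22) follow from (3.19) and (3.20) as sketched below. The only extra ingredient is the elementary bound $\lfloor m/k\rfloor\geqslant m/(2k)$, valid because $m\geqslant k\geqslant 1$; the bound on the second summand of (3.21) is not reproduced here.

```latex
\begin{align*}
\exp\left(-\frac{\lfloor m/k\rfloor\,\zeta^{2}}{8}\right)
  &\leqslant \exp\left(-\frac{m}{2k}\cdot\frac{m^{-1/2}}{8}\right)
   = \exp\left(-\frac{\sqrt{m}}{16k}\right),\\
\exp\left(-\frac{m\beta^{2}}{8}\right)
  &= \exp\left(-\frac{m\cdot m^{-1/2}}{8}\right)
   = \exp\left(-\frac{\sqrt{m}}{8}\right).
\end{align*}
```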

The parameters in Corollary 3.10 generously meet our requirements. In our setting of interest, the integers $n,m,r$ are polynomially related. Thus, we obtain an $(m^{-K},m^{-K},m)$-balanced coloring for any desired constant $K\geqslant 1$ by invoking Corollary 3.10 with a large enough constant $k=k(K)$.

4. Hardness amplification

In Section 3, we laid the foundation for our main result by constructing an explicit integer set with small discrepancy and transforming it into a highly balanced coloring of $\binom{[n]}{k}$. In this section, we use this coloring to design a hardness amplification method for approximate degree and its one-sided variant.

4.1. Pseudodistributions from balanced colorings

Recall from the introduction that our approach centers around encoding the vectors $e_{1},e_{2},\ldots,e_{N},0^{N}$ as $n$-bit strings with $n\ll N$ so as to make the decoding easy for circuits but hard for low-degree polynomials. The construction of this code requires several steps. As a first step, we show how to convert any balanced coloring of $\binom{[n]}{k}$ with $r$ colors into an explicit sequence of functions $\phi_{1},\phi_{2},\ldots,\phi_{r}\colon\{0,1\}^{n}\to\mathbb{R}$ that are almost everywhere nonnegative, are supported almost entirely on pairwise disjoint sets of strings of Hamming weight $k$, and are pairwise indistinguishable by low-degree polynomials. We call them pseudodistributions to highlight the fact that each $\phi_{i}$ has $\ell_{1}$ norm approximately $1$, nearly all of it coming from the points where $\phi_{i}$ is nonnegative.

Theorem 4.1.

Let $\epsilon,\delta\in[0,1)$ be given. Let $n,m,k,r$ be positive integers with $n\geqslant m>k$. Let $\gamma\colon\binom{[n]}{k}\to[r]$ be a given $(\epsilon,\delta,m)$-balanced coloring. Then there are (explicitly given) functions $\phi_{1},\phi_{2},\ldots,\phi_{r}\colon\{0,1\}^{n}\to\mathbb{R}$ with the following properties.

  (i) Support: $\operatorname{supp}\phi_{i}\subseteq\{x\in\{0,1\}^{n}:|x|=k\text{ or }|x|\geqslant m\};$

  (ii) Essential support: $\{0,1\}^{n}|_{k}\cap\operatorname{supp}\phi_{i}=\{\mathbf{1}_{S}:S\in\gamma^{-1}(i)\};$

  (iii) Nonnegativity: $\phi_{i}\geqslant 0$ on $\{0,1\}^{n}|_{k};$

  (iv) Normalization: $\sum_{x:|x|=k}\phi_{i}(x)=1;$

  (v) Tail bound: $\sum_{x:|x|\neq k}|\phi_{i}(x)|\leqslant(8\epsilon+4r\delta)/(1-\epsilon);$

  (vi) Graded bound: for some absolute constant $c^{\prime}\in(0,1),$

\[
\sum_{x:|x|=\ell}|\phi_{i}(x)|\leqslant\frac{\epsilon+r\delta}{1-\epsilon}\cdot\frac{m^{2}}{c^{\prime}\ell^{2}}\cdot\exp\left(-\frac{c^{\prime}(\ell-k)}{\sqrt{nm}}\right),\qquad\ell>k;
\]

  (vii) Orthogonality: for some absolute constant $c^{\prime\prime}\in(0,1),$

\[
\operatorname{orth}(\phi_{i}-\phi_{j})\geqslant c^{\prime\prime}\sqrt{\frac{n}{m}},\qquad i,j\in[r].
\]
Proof.

Define

\[
\begin{aligned}
\Delta&=m-k, &\qquad& \text{(4.1)}\\
D&=\left\lfloor\frac{n-k}{\Delta}\right\rfloor. &\qquad& \text{(4.2)}
\end{aligned}
\]

Setting $\epsilon=1/2$ in Lemma 2.10 gives an explicit function $\omega\colon\{0,1,2,\ldots,D\}\to\mathbb{R}$ with

\[
\begin{aligned}
&\omega(0)>\frac{1}{4}\|\omega\|_{1}, &\qquad& \text{(4.3)}\\
&|\omega(t)|\leqslant\frac{1}{ct^{2}\,2^{ct/\sqrt{D}}}\cdot\|\omega\|_{1}\qquad(t=1,2,\ldots,D), &\qquad& \text{(4.4)}\\
&\operatorname{orth}\omega\geqslant c\sqrt{D}, &\qquad& \text{(4.5)}
\end{aligned}
\]

where $0<c<1$ is an absolute constant. For convenience of notation, we will extend $\omega$ to all of $\mathbb{R}$ by setting $\omega(t)=0$ for $t\notin\{0,1,2,\ldots,D\}.$ With this extension, (4.4) gives

\[
|\omega(t)|\leqslant\frac{1}{ct^{2}\,2^{ct/\sqrt{D}}}\cdot\|\omega\|_{1},\qquad t\in[1,\infty). \tag{4.6}
\]

For $S\in\binom{[n]}{k},$ define an auxiliary dual object $\phi_{S}\colon\{0,1\}^{n}\to\mathbb{R}$ by

\[
\phi_{S}(x)=\binom{n-k}{|x|-k}^{-1}\omega(0)^{-1}\omega\left(\frac{|x|-k}{\Delta}\right)\prod_{i\in S}x_{i}. \tag{4.7}
\]

Then

\[
\phi_{S}(\mathbf{1}_{S})=1,\qquad S\in\binom{[n]}{k}. \tag{4.8}
\]

Since $\phi_{S}(x)=0$ unless $x|_{S}=1^{k},$ we see that $\mathbf{1}_{S}$ is in fact the only input of Hamming weight $k$ at which $\phi_{S}$ is nonzero:

\[
\phi_{S}(\mathbf{1}_{T})=\delta_{S,T},\qquad S,T\in\binom{[n]}{k}. \tag{4.9}
\]

Since $\operatorname{supp}\omega\subseteq\{0,1,2,\ldots,D\},$ the only inputs $x$ other than $\mathbf{1}_{S}$ in the support of $\phi_{S}$ have Hamming weight $|x|\in\{k+\Delta,k+2\Delta,\ldots,k+D\Delta\},$ so that in particular $|x|\geqslant m.$ In summary,

\[
\begin{aligned}
\operatorname{supp}\phi_{S}&\subseteq\{x:x=\mathbf{1}_{S}\text{ or }|x|\geqslant m\}, & S&\in\binom{[n]}{k}, &\qquad& \text{(4.10)}\\
\operatorname{supp}\phi_{S}&\subseteq\bigcup_{i=0}^{D}\{x:|x|=k+i\Delta\}, & S&\in\binom{[n]}{k}. &\qquad& \text{(4.11)}
\end{aligned}
\]
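Properties (4.8)–(4.11) depend on $\omega$ only through $\omega(0)\neq 0$ and the extension by zero, so they can be checked with any placeholder values. The sketch below is illustrative: `omega` is a hypothetical list of values $\omega(0),\ldots,\omega(D)$ that is not the function from Lemma 2.10, indices are 0-based, and the helper names are ours.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def indicator(T, n):
    """Characteristic vector 1_T of the set T as a tuple of n bits."""
    return tuple(1 if i in T else 0 for i in range(n))


def make_phi_S(S, omega, n, k, m):
    """phi_S from (4.7); omega lists omega(0), ..., omega(D) and is zero elsewhere."""
    delta = m - k
    D = (n - k) // delta
    assert len(omega) == D + 1 and omega[0] != 0

    def phi(x):
        if any(x[i] == 0 for i in S):        # the product of x_i over i in S vanishes
            return Fraction(0)
        steps, rem = divmod(sum(x) - k, delta)
        if rem != 0 or steps > D:            # omega is zero off {0, 1, ..., D}
            return Fraction(0)
        return Fraction(omega[steps]) / (omega[0] * comb(n - k, sum(x) - k))

    return phi


def verify_phi_S(n, k, m, omega):
    """Check (4.9) and (4.10) for every S by exhaustive enumeration."""
    subsets = list(combinations(range(n), k))
    for S in subsets:
        phi = make_phi_S(S, omega, n, k, m)
        if any(phi(indicator(T, n)) != (1 if S == T else 0) for T in subsets):
            return False
        for x in product((0, 1), repeat=n):
            if phi(x) != 0 and x != indicator(S, n) and sum(x) < m:
                return False
    return True
```

Exact rational arithmetic via `Fraction` avoids any floating-point ambiguity in the comparison with $\delta_{S,T}$.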

We now turn to the construction of the $\phi_{i}$. By definition of an $(\epsilon,\delta,m)$-balanced coloring, the given coloring $\gamma\colon\binom{[n]}{k}\to[r]$ satisfies

\[
\operatorname*{\mathbf{P}}_{A\in\binom{[n]}{\ell}}\left[\frac{1-\epsilon}{r}\binom{|A|}{k}\leqslant\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|\leqslant\frac{1+\epsilon}{r}\binom{|A|}{k}\right]\geqslant 1-\delta,\qquad\ell=m,m+1,\ldots,n. \tag{4.12}
\]

Since $\delta<1,$ taking $\ell=n$ in this equation leads to

\[
\left||\gamma^{-1}(i)|-\frac{1}{r}\binom{n}{k}\right|\leqslant\frac{\epsilon}{r}\binom{n}{k},\qquad i\in[r], \tag{4.13}
\]

and in particular

\[
|\gamma^{-1}(i)|\geqslant\frac{1-\epsilon}{r}\binom{n}{k},\qquad i\in[r]. \tag{4.14}
\]

For $i=1,2,\ldots,r,$ we define $\phi_{i}\colon\{0,1\}^{n}\to\mathbb{R}$ by

\[
\phi_{i}(x)=\operatorname*{\mathbf{E}}_{S\in\gamma^{-1}(i)}\phi_{S}(x)-\mathbf{I}[|x|\geqslant m]\operatorname*{\mathbf{E}}_{S\in\binom{[n]}{k}}\phi_{S}(x).
\]

This definition is legitimate since $\gamma^{-1}(i)\neq\varnothing$ for every $i$ due to (4.14) and $\epsilon<1.$

Claim 4.2.

For all $i\in[r]$ and $\ell\in\{m,m+1,\ldots,n\},$

\[
\operatorname*{\mathbf{E}}_{A\in\binom{[n]}{\ell}}\left|\operatorname*{\mathbf{P}}_{S\in\gamma^{-1}(i)}[S\subseteq A]-\operatorname*{\mathbf{P}}_{S\in\binom{[n]}{k}}[S\subseteq A]\right|\leqslant\frac{2\epsilon+r\delta}{1-\epsilon}\binom{n}{k}^{-1}\binom{\ell}{k}.
\]
Proof.

Fix $i\in[r]$ and $\ell\in\{m,m+1,\ldots,n\}$ arbitrarily for the remainder of the proof. Let $A\in\binom{[n]}{\ell}$ be uniformly random. If $\gamma$ is $\epsilon$-balanced on $A,$ then by definition

\[
\left|\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|-\frac{1}{r}\binom{|A|}{k}\right|\leqslant\frac{\epsilon}{r}\binom{|A|}{k}.
\]

If $\gamma$ is not $\epsilon$-balanced on $A,$ we have the trivial bound

\[
\left|\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|-\frac{1}{r}\binom{|A|}{k}\right|\leqslant\binom{|A|}{k}.
\]

Combining these two equations, we arrive at

\[
\left|\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|-\frac{1}{r}\binom{|A|}{k}\right|\leqslant\left(\frac{\epsilon}{r}+Y_{A}\right)\binom{|A|}{k} \tag{4.15}
\]

for all $A,$ where $Y_{A}$ is the indicator random variable for the event that $\gamma$ is not $\epsilon$-balanced on $A.$ Since $\gamma$ is $(\epsilon,\delta,m)$-balanced, we further have

\[
\operatorname*{\mathbf{E}}_{A\in\binom{[n]}{\ell}}Y_{A}\leqslant\delta. \tag{4.16}
\]

Now

\[
\begin{aligned}
&\left|\operatorname*{\mathbf{P}}_{S\in\gamma^{-1}(i)}[S\subseteq A]-\operatorname*{\mathbf{P}}_{S\in\binom{[n]}{k}}[S\subseteq A]\right|\\
&\qquad=\left|\frac{|\gamma^{-1}(i)\cap\binom{A}{k}|}{|\gamma^{-1}(i)|}-\frac{\binom{|A|}{k}}{\binom{n}{k}}\right|\\
&\qquad=\frac{1}{|\gamma^{-1}(i)|}\binom{n}{k}^{-1}\left|\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|\binom{n}{k}-|\gamma^{-1}(i)|\binom{|A|}{k}\right|\\
&\qquad\leqslant\frac{r}{1-\epsilon}\binom{n}{k}^{-2}\left|\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|\binom{n}{k}-|\gamma^{-1}(i)|\binom{|A|}{k}\right|\\
&\qquad\leqslant\frac{r}{1-\epsilon}\binom{n}{k}^{-2}\left|\left|\gamma^{-1}(i)\cap\binom{A}{k}\right|\binom{n}{k}-\frac{1}{r}\binom{|A|}{k}\binom{n}{k}\right|\\
&\qquad\qquad+\frac{r}{1-\epsilon}\binom{n}{k}^{-2}\left|\frac{1}{r}\binom{|A|}{k}\binom{n}{k}-|\gamma^{-1}(i)|\binom{|A|}{k}\right|\\
&\qquad\leqslant\frac{r}{1-\epsilon}\binom{n}{k}^{-2}\left(\frac{\epsilon}{r}+\frac{\epsilon}{r}+Y_{A}\right)\binom{|A|}{k}\binom{n}{k}\\
&\qquad=\frac{r}{1-\epsilon}\binom{n}{k}^{-1}\left(\frac{2\epsilon}{r}+Y_{A}\right)\binom{\ell}{k},
\end{aligned}
\tag{4.17}
\]

where the third step is valid by (4.14), the fourth step applies the triangle inequality, the fifth step uses (4.13) and (4.15), and the last step uses $|A|=\ell.$ It remains to pass to expectations with respect to $A$:

\[
\begin{aligned}
&\operatorname*{\mathbf{E}}_{A\in\binom{[n]}{\ell}}\left|\operatorname*{\mathbf{P}}_{S\in\gamma^{-1}(i)}[S\subseteq A]-\operatorname*{\mathbf{P}}_{S\in\binom{[n]}{k}}[S\subseteq A]\right|\\
&\qquad\leqslant\operatorname*{\mathbf{E}}_{A\in\binom{[n]}{\ell}}\frac{r}{1-\epsilon}\binom{n}{k}^{-1}\left(\frac{2\epsilon}{r}+Y_{A}\right)\binom{\ell}{k}\\
&\qquad=\frac{r}{1-\epsilon}\binom{n}{k}^{-1}\left(\frac{2\epsilon}{r}+\operatorname*{\mathbf{E}}_{A\in\binom{[n]}{\ell}}Y_{A}\right)\binom{\ell}{k}\\
&\qquad\leqslant\frac{2\epsilon+r\delta}{1-\epsilon}\binom{n}{k}^{-1}\binom{\ell}{k},
\end{aligned}
\]

where the last step uses (4.16). ∎

Claim 4.3.

For each $i\in[r]$ and $\ell\in\{m,m+1,\ldots,n\}$,

\[
\sum_{x:|x|=\ell}|\phi_{i}(x)|\leqslant\frac{2\epsilon+r\delta}{1-\epsilon}\cdot\left|\frac{\omega(\frac{\ell-k}{\Delta})}{\omega(0)}\right|.
\]
Proof.

Fix $i\in[r]$ and $\ell\in\{m,m+1,\ldots,n\}$ arbitrarily for the remainder of the proof. Consider any input $x=\mathbf{1}_{A}$ with $|A|=\ell.$ In this case, the definition of $\phi_{i}$ simplifies to

\[
\phi_{i}(\mathbf{1}_{A})=\operatorname*{\mathbf{E}}_{S\in\gamma^{-1}(i)}\phi_{S}(\mathbf{1}_{A})-\operatorname*{\mathbf{E}}_{S\in\binom{[n]}{k}}\phi_{S}(\mathbf{1}_{A}).
\]

Recall from (4.7) that

\[
\phi_{S}(\mathbf{1}_{A})=\frac{\omega(\frac{\ell-k}{\Delta})}{\omega(0)\binom{n-k}{\ell-k}}\cdot\mathbf{I}[S\subseteq A].
\]

As a result,

\[
\phi_{i}(\mathbf{1}_{A})=\frac{\omega(\frac{\ell-k}{\Delta})}{\omega(0)\binom{n-k}{\ell-k}}\left(\operatorname*{\mathbf{P}}_{S\in\gamma^{-1}(i)}[S\subseteq A]-\operatorname*{\mathbf{P}}_{S\in\binom{[n]}{k}}[S\subseteq A]\right).
\]

Passing to absolute values and summing over $A\in\binom{[n]}{\ell},$ we obtain

\[
\begin{aligned}
\sum_{A\in\binom{[n]}{\ell}}|\phi_{i}(\mathbf{1}_{A})|
&=\left|\frac{\omega(\frac{\ell-k}{\Delta})}{\omega(0)\binom{n-k}{\ell-k}}\right|\sum_{A\in\binom{[n]}{\ell}}\left|\operatorname*{\mathbf{P}}_{S\in\gamma^{-1}(i)}[S\subseteq A]-\operatorname*{\mathbf{P}}_{S\in\binom{[n]}{k}}[S\subseteq A]\right|\\
&\leqslant\left|\frac{\omega(\frac{\ell-k}{\Delta})}{\omega(0)\binom{n-k}{\ell-k}}\right|\cdot\binom{n}{\ell}\cdot\frac{2\epsilon+r\delta}{1-\epsilon}\cdot\binom{n}{k}^{-1}\binom{\ell}{k}\\
&=\left|\frac{\omega(\frac{\ell-k}{\Delta})}{\omega(0)}\right|\cdot\frac{2\epsilon+r\delta}{1-\epsilon},
\end{aligned}
\]

where the second step applies Claim 4.2, and the final step is justified by

\[
\begin{aligned}
&\binom{n-k}{\ell-k}^{-1}\binom{n}{\ell}\binom{n}{k}^{-1}\binom{\ell}{k}\\
&\qquad=\frac{(\ell-k)!\;(n-\ell)!}{(n-k)!}\cdot\frac{n!}{\ell!\;(n-\ell)!}\cdot\frac{k!\;(n-k)!}{n!}\cdot\frac{\ell!}{k!\;(\ell-k)!}=1.
\end{aligned}
\]
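The binomial identity used in the last step is equivalent to $\binom{n}{\ell}\binom{\ell}{k}=\binom{n}{k}\binom{n-k}{\ell-k}$, which can be confirmed in exact integer arithmetic for small parameters. The helper name below is hypothetical and the snippet is only a sanity check, not part of the argument.

```python
from math import comb


def binomial_identity_holds(n, ell, k):
    """Check comb(n, ell) * comb(ell, k) == comb(n, k) * comb(n - k, ell - k) exactly."""
    return comb(n, ell) * comb(ell, k) == comb(n, k) * comb(n - k, ell - k)
```

Both sides count the pairs $(S,A)$ with $S\subseteq A\subseteq[n]$, $|S|=k$ and $|A|=\ell$, which is why the four binomial factors cancel.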

We now turn to the verification of properties (i)–(vii) in the theorem statement.

Properties (i)–(iv). Equation (4.10) shows that $\phi_{i}$ is a linear combination of functions whose support is contained in $\{x:|x|=k\text{ or }|x|\geqslant m\}.$ This settles the support requirement (i). For $T\in\binom{[n]}{k}$,

\[
\begin{aligned}
\phi_{i}(\mathbf{1}_{T})
&=\operatorname*{\mathbf{E}}_{S\in\gamma^{-1}(i)}\phi_{S}(\mathbf{1}_{T})\\
&=\operatorname*{\mathbf{E}}_{S\in\gamma^{-1}(i)}\delta_{S,T}\\
&=\frac{\mathbf{I}[T\in\gamma^{-1}(i)]}{|\gamma^{-1}(i)|},\qquad T\in\binom{[n]}{k},
\end{aligned}
\tag{4.18}
\]

where the first step is immediate from the defining equation for $\phi_{i},$ and the second step applies (4.9). The essential support property (ii) and nonnegativity property (iii) are now immediate from (4.18). The normalization requirement (iv) follows by summing (4.18) over $T\in\binom{[n]}{k}.$

Properties (v) and (vi). The tail bound (v) for $i\in[r]$ can be seen as follows:

\[
\begin{aligned}
\sum_{x:|x|\neq k}|\phi_{i}(x)|
&=\sum_{x:|x|\geqslant m}|\phi_{i}(x)|\\
&=\sum_{\ell=m}^{n}\,\sum_{x:|x|=\ell}|\phi_{i}(x)|\\
&\leqslant\frac{2\epsilon+r\delta}{1-\epsilon}\cdot\sum_{\ell=m}^{n}\left|\omega(0)^{-1}\omega\left(\frac{\ell-k}{\Delta}\right)\right|\\
&\leqslant\frac{2\epsilon+r\delta}{1-\epsilon}\cdot\frac{\|\omega\|_{1}}{|\omega(0)|}\\
&\leqslant\frac{8\epsilon+4r\delta}{1-\epsilon},
\end{aligned}
\]

where the first step uses the support property (i), the third step is valid by Claim 4.3, and the last step applies (4.3).

The graded bound (vi) for $\ell\in(k,m)$ holds trivially since $\phi_{i}$ vanishes on inputs of Hamming weight in $(k,m),$ by the support property (i). The validity of (vi) for $\ell\geqslant m$ is borne out by

\begin{align*}
\sum_{x:|x|=\ell}|\phi_{i}(x)| &\leqslant\frac{2\epsilon+r\delta}{1-\epsilon}\cdot\left|\omega(0)^{-1}\omega\left(\frac{\ell-k}{\Delta}\right)\right|\\
&\leqslant\frac{8\epsilon+4r\delta}{1-\epsilon}\cdot\frac{1}{\|\omega\|_{1}}\cdot\left|\omega\left(\frac{\ell-k}{\Delta}\right)\right|\\
&\leqslant\frac{8\epsilon+4r\delta}{1-\epsilon}\cdot\frac{1}{c\left(\frac{\ell-k}{\Delta}\right)^{2}2^{c(\ell-k)/(\Delta\sqrt{D})}}\\
&=\frac{8\epsilon+4r\delta}{1-\epsilon}\cdot\frac{1}{c\left(\frac{\ell-k}{m-k}\right)^{2}\,2^{c(\ell-k)/((m-k)\sqrt{\lfloor(n-k)/(m-k)\rfloor})}}\\
&\leqslant\frac{8\epsilon+4r\delta}{1-\epsilon}\cdot\frac{m^{2}}{c\ell^{2}\,2^{c(\ell-k)/\sqrt{nm}}},
\end{align*}

where the first step restates Claim 4.3, the second step is justified by (4.3), the third step appeals to (4.6), and the fourth step substitutes the values from (4.1) and (4.2).

Property (vii). To begin with, we claim that

\begin{align}
\operatorname{orth}\phi_{S} &\geqslant c\sqrt{D}, & S&\in\binom{[n]}{k}. \tag{4.19}
\end{align}

Indeed, let $p$ be a real polynomial on $\{0,1\}^{n}$ with $\deg p<c\sqrt{D}$. By linearity, it suffices to consider polynomials $p$ that factor as $p(x)=p_{1}(x|_{S})p_{2}(x|_{\overline{S}})$ for some nonzero polynomials $p_{1},p_{2}$. Now, Minsky and Papert's symmetrization argument (Proposition 2.14) guarantees that

\begin{align}
\operatorname*{\mathbf{E}}_{\substack{y\in\{0,1\}^{n-k}\\ |y|=i}}p_{2}(y) &=p_{2}^{*}(i), & i&=0,1,2,\ldots,n-k, \tag{4.20}
\end{align}

for some univariate polynomial $p_{2}^{*}$ of degree at most $\deg p_{2}$. As a result,

\begin{align*}
\langle\phi_{S},p\rangle &=\sum_{\substack{x\in\{0,1\}^{n}:\\ x|_{S}=1^{k}}}\phi_{S}(x)p(x)\\
&=\sum_{i=0}^{D}\sum_{\substack{x\in\{0,1\}^{n}:\\ |x|=k+i\Delta,\;x|_{S}=1^{k}}}\phi_{S}(x)p(x)\\
&=\sum_{i=0}^{D}\sum_{\substack{x\in\{0,1\}^{n}:\\ |x|=k+i\Delta,\;x|_{S}=1^{k}}}\binom{n-k}{i\Delta}^{-1}\frac{\omega(i)}{\omega(0)}\cdot p(x)\\
&=\sum_{i=0}^{D}\;\operatorname*{\mathbf{E}}_{\substack{y\in\{0,1\}^{n-k}:\\ |y|=i\Delta}}\left[\frac{\omega(i)}{\omega(0)}\cdot p_{1}(1^{k})p_{2}(y)\right]\\
&=\frac{p_{1}(1^{k})}{\omega(0)}\sum_{i=0}^{D}\omega(i)p_{2}^{*}(i\Delta)\\
&=0,
\end{align*}

where the first and third steps use the definition of $\phi_{S},$ the second step is justified by (4.11), the fourth step uses the factorization $p(x)=p_{1}(x|_{S})p_{2}(x|_{\overline{S}})$ along with the fact that exactly $\binom{n-k}{i\Delta}$ strings $x$ satisfy $|x|=k+i\Delta$ and $x|_{S}=1^{k},$ the next-to-last step uses (4.20), and the last step is valid by (4.5) since $\deg p_{2}^{*}\leqslant\deg p_{2}\leqslant\deg p<c\sqrt{D}.$ This settles (4.19).
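The symmetrization step (4.20) can be illustrated by brute force on a small example. The sketch below is an assumption-laden toy, not part of the paper: the polynomial `poly`, its dictionary encoding, and the helper names are all hypothetical. It averages a degree-3 multilinear polynomial over each Hamming-weight layer of $\{0,1\}^{6}$ and checks, via forward differences, that the resulting sequence agrees with a univariate polynomial of degree at most 3.

```python
from fractions import Fraction
from itertools import combinations

def evaluate(poly, y):
    """Evaluate a multilinear polynomial given as {frozenset_of_indices: coeff}."""
    return sum(c for mono, c in poly.items() if all(y[j] for j in mono))

def symmetrize(poly, n):
    """Return [E_{|y|=i} poly(y) for i = 0..n] as exact fractions."""
    averages = []
    for i in range(n + 1):
        total, count = Fraction(0), 0
        for ones in combinations(range(n), i):
            y = [0] * n
            for j in ones:
                y[j] = 1
            total += evaluate(poly, y)
            count += 1
        averages.append(total / count)
    return averages

def forward_difference(seq, order):
    """Apply the forward-difference operator `order` times."""
    for _ in range(order):
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return seq

# Hypothetical degree-3 polynomial on 6 variables (0-based indices).
poly = {frozenset(): 2, frozenset({0}): -1,
        frozenset({1, 3}): 5, frozenset({0, 2, 4}): 3}
averages = symmetrize(poly, 6)
# A sequence on 0..n agrees with a polynomial of degree <= d exactly when
# its (d+1)-th forward differences all vanish.
fourth_differences = forward_difference(averages, 4)
```

Here `fourth_differences` consists of zeros, consistent with $\deg p_{2}^{*}\leqslant\deg p_{2}$.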

Now the orthogonality requirement (vii) can be seen as follows:

\begin{align*}
\operatorname{orth}(\phi_{i}-\phi_{j}) &=\operatorname{orth}\left(\operatorname*{\mathbf{E}}_{S\in\gamma^{-1}(i)}\phi_{S}-\operatorname*{\mathbf{E}}_{S\in\gamma^{-1}(j)}\phi_{S}\right)\\
&\geqslant\min_{S\in\binom{[n]}{k}}\operatorname{orth}\phi_{S}\\
&\geqslant c\sqrt{D}\\
&=c\sqrt{\left\lfloor\frac{n-k}{m-k}\right\rfloor}\\
&\geqslant c\sqrt{\left\lfloor\frac{n}{m}\right\rfloor},
\end{align*}

where the second step uses Proposition 2.6(i), the third step is valid by (4.19), the fourth step applies the definition of $D,$ and the last step uses $n\geqslant m.$

4.2. Encoding via indistinguishable distributions

As our next step, we will show that the pseudodistributions $\phi_{1},\phi_{2},\ldots,\phi_{r}$ in Theorem 4.1 can be turned into actual probability distributions $\lambda_{1},\lambda_{2},\ldots,\lambda_{r}$ provided that the underlying coloring of $\binom{[n]}{k}$ is sufficiently balanced. The resulting distributions $\lambda_{i}$ inherit all the desirable analytic properties established for the $\phi_{i}$ in Theorem 4.1. Specifically, the $\lambda_{i}$ are supported almost entirely on pairwise disjoint sets of inputs of Hamming weight $k$ and are pairwise indistinguishable by low-degree polynomials.

Theorem 4.4.

Let $0<\beta<1$ be given. Let $n,n^{\prime},m,k,r$ be positive integers with $n\geqslant n^{\prime}\geqslant m>k$. Let $\gamma\colon\binom{[n^{\prime}]}{k}\to[r]$ be a given $(\frac{\beta}{16rm^{2}},\frac{\beta}{16r^{2}m^{2}},m)$-balanced coloring. Then there are (explicitly given) probability distributions $\lambda_{1},\lambda_{2},\ldots,\lambda_{r}$ on $\{0,1\}^{n}$ such that

\begin{align}
&\operatorname{supp}\lambda_{i}\subseteq\{x\in\{0,1\}^{n}:|x|=k\text{ or }|x|\geqslant m\}, && i\in[r], \tag{4.21}\\
&\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{i}=\{\mathbf{1}_{S}:S\in\gamma^{-1}(i)\}, && i\in[r], \tag{4.22}\\
&\lambda_{i}(\{0,1\}^{n}|_{k})\geqslant 1-\beta, && i\in[r], \tag{4.23}\\
&\lambda_{i}(\{0,1\}^{n}|_{\ell})\leqslant\frac{\exp(-c(\ell-k)/\sqrt{n^{\prime}m})}{c(\ell-k+1)^{2}}, && i\in[r],\;\ell\geqslant k, \tag{4.24}\\
&\operatorname{orth}(\lambda_{i}-\lambda_{j})\geqslant c\sqrt{\frac{n^{\prime}}{m}}, && i,j\in[r], \tag{4.25}
\end{align}

where $c\in(0,1)$ is an absolute constant, independent of $n,n^{\prime},m,k,r,\beta.$

Proof.

By hypothesis, $\gamma$ is $(\epsilon,\delta,m)$-balanced with

\begin{align*}
\epsilon&=\frac{\beta}{16rm^{2}},\\
\delta&=\frac{\beta}{16r^{2}m^{2}}.
\end{align*}
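For orientation, the following worked computation (a sketch that uses only the two parameter values just stated and the quantity $\frac{8\epsilon+4r\delta}{1-\epsilon}$ from the tail bound in the proof of Theorem 4.1) shows how these choices of $\epsilon$ and $\delta$ translate into a bound of order $\beta/r$:

```latex
\begin{align*}
8\epsilon+4r\delta
  &=\frac{8\beta}{16rm^{2}}+\frac{4r\beta}{16r^{2}m^{2}}
   =\frac{\beta}{2rm^{2}}+\frac{\beta}{4rm^{2}}
   =\frac{3\beta}{4rm^{2}},\\
1-\epsilon
  &\geqslant 1-\frac{1}{16}=\frac{15}{16}
   \qquad\text{(since } \beta<1 \text{ and } r,m\geqslant 1\text{)},\\
\frac{8\epsilon+4r\delta}{1-\epsilon}
  &\leqslant\frac{16}{15}\cdot\frac{3\beta}{4rm^{2}}
   =\frac{4\beta}{5rm^{2}}
   \leqslant\frac{\beta}{rm^{2}}
   \leqslant\frac{\beta}{r}.
\end{align*}
```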

Applying Theorem 4.1 with these parameters gives functions $\phi_{1},\phi_{2},\ldots,\phi_{r}\colon\{0,1\}^{n^{\prime}}\to\mathbb{R}$ that obey

\begin{align}
&\operatorname{supp}\phi_{i}\subseteq\{x\in\{0,1\}^{n^{\prime}}:|x|=k\text{ or }|x|\geqslant m\}, \tag{4.26}\\
&\{0,1\}^{n^{\prime}}|_{k}\cap\operatorname{supp}\phi_{i}=\{\mathbf{1}_{S}:S\in\gamma^{-1}(i)\}, \tag{4.27}\\
&\phi_{i}\geqslant 0\quad\text{on }\{0,1\}^{n^{\prime}}|_{k}, \tag{4.28}\\
&\sum_{x:|x|=k}\phi_{i}(x)=1, \tag{4.29}\\
&\sum_{x:|x|\neq k}|\phi_{i}(x)|\leqslant\frac{\beta}{r}, \tag{4.30}\\
&\sum_{x:|x|=\ell}|\phi_{i}(x)|\leqslant\frac{\beta}{rc^{\prime}\ell^{2}}\cdot\exp\left(-\frac{c^{\prime}(\ell-k)}{\sqrt{n^{\prime}m}}\right),\qquad\ell>k, \tag{4.31}\\
&\operatorname{orth}(\phi_{i}-\phi_{j})\geqslant c^{\prime\prime}\sqrt{\frac{n^{\prime}}{m}}\qquad\text{for all }i,j\in[r], \tag{4.32}
\end{align}

where $c^{\prime},c^{\prime\prime}\in(0,1)$ are the absolute constants defined in Theorem 4.1. For $i\in[r],$ define $\tilde{\phi}_{i}\colon\{0,1\}^{n^{\prime}}\to\mathbb{R}$ by

\[
\tilde{\phi}_{i}(x)=\phi_{i}(x)-\mathbf{I}[|x|>k]\min_{j\in[r]}\phi_{j}(x). \tag{4.33}
\]

Equation (4.26) shows that every function appearing on the right-hand side of (4.33) vanishes outside $\{x\in\{0,1\}^{n^{\prime}}:|x|=k\text{ or }|x|\geqslant m\}.$ As a result,

\[
\operatorname{supp}\tilde{\phi}_{i}\subseteq\{x\in\{0,1\}^{n^{\prime}}:|x|=k\text{ or }|x|\geqslant m\}. \tag{4.34}
\]

Since $\tilde{\phi}_{i}=\phi_{i}$ on $\{0,1\}^{n^{\prime}}|_{k},$ we obtain from (4.27) and (4.29) that

\begin{align}
&\{0,1\}^{n^{\prime}}|_{k}\cap\operatorname{supp}\tilde{\phi}_{i}=\{\mathbf{1}_{S}:S\in\gamma^{-1}(i)\}, \tag{4.35}\\
&\sum_{x:|x|=k}\tilde{\phi}_{i}(x)=1. \tag{4.36}
\end{align}

In particular,

\[
\|\tilde{\phi}_{i}\|_{1}\geqslant 1. \tag{4.37}
\]

We further claim that

\begin{align}
\tilde{\phi}_{i}(x) &\geqslant 0, & x&\in\{0,1\}^{n^{\prime}}. \tag{4.38}
\end{align}

Indeed, the nonnegativity of $\tilde{\phi}_{i}(x)$ for $x\in\{0,1\}^{n^{\prime}}|_{k}$ follows from $\tilde{\phi}_{i}(x)=\phi_{i}(x)$ and (4.28), whereas the nonnegativity of $\tilde{\phi}_{i}(x)$ for $x\in\{0,1\}^{n^{\prime}}|_{>k}$ follows from (4.33) via $\tilde{\phi}_{i}(x)=\phi_{i}(x)-\min_{j\in[r]}\phi_{j}(x)\geqslant\phi_{i}(x)-\phi_{i}(x)\geqslant 0.$

On $\{0,1\}^{n^{\prime}}|_{>k},$ we have

\[
\tilde{\phi}_{i}=\phi_{i}-\min_{j\in[r]}\phi_{j}=\max_{j\in[r]}\{\phi_{i}-\phi_{j}\}\leqslant\max_{j\in[r]}|\phi_{i}-\phi_{j}|\leqslant\sum_{j=1}^{r}|\phi_{j}|.
\]

This conclusion is also valid on $\{0,1\}^{n^{\prime}}|_{<k}$ due to (4.34). Thus,

\begin{align}
\tilde{\phi}_{i}(x) &\leqslant\sum_{j=1}^{r}|\phi_{j}(x)|, & |x|&\neq k. \tag{4.39}
\end{align}

Summing over $x$ gives

\begin{align}
\sum_{x:|x|\neq k}\tilde{\phi}_{i}(x) &\leqslant\sum_{x:|x|\neq k}\sum_{j=1}^{r}|\phi_{j}(x)| \notag\\
&=\sum_{j=1}^{r}\sum_{x:|x|\neq k}|\phi_{j}(x)| \notag\\
&\leqslant\beta, \tag{4.40}
\end{align}

where the third step applies (4.30).

For all $i\in[r]$ and $\ell\in\{m,m+1,\ldots,n^{\prime}\}$, we have the graded bound

\begin{align}
\sum_{x:|x|=\ell}\tilde{\phi}_{i}(x) &\leqslant\sum_{j=1}^{r}\sum_{x:|x|=\ell}|\phi_{j}(x)| \notag\\
&\leqslant\frac{1}{c^{\prime}\ell^{2}}\cdot\exp\left(-\frac{c^{\prime}(\ell-k)}{\sqrt{n^{\prime}m}}\right), \tag{4.41}
\end{align}

where the first step uses (4.39), and the second step uses (4.31). Finally, for $i,j\in[r],$ we have

\begin{align}
\operatorname{orth}(\tilde{\phi}_{i}-\tilde{\phi}_{j}) &=\operatorname{orth}(\phi_{i}-\phi_{j}) \notag\\
&\geqslant c^{\prime\prime}\sqrt{\frac{n^{\prime}}{m}}, \tag{4.42}
\end{align}

where the first step uses the definition (4.33), and the second step uses (4.32).
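The effect of the shift (4.33) can be seen on toy data. The sketch below is illustrative only: the functions produced by `toy_family` are random integer-valued stand-ins (nonnegative on weight $k$, zero below weight $k$, arbitrary sign above), not the $\phi_{i}$ of Theorem 4.1, and all helper names are hypothetical. It checks three facts used above: nonnegativity (4.38), the domination bound (4.39), and the identity $\tilde{\phi}_{i}-\tilde{\phi}_{j}=\phi_{i}-\phi_{j}$ behind the first step of (4.42).

```python
from itertools import product
import random

def toy_family(n, k, r, seed=0):
    """Random integer-valued stand-ins for phi_1..phi_r on {0,1}^n."""
    rng = random.Random(seed)
    family = []
    for _ in range(r):
        phi = {}
        for x in product((0, 1), repeat=n):
            w = sum(x)
            if w < k:
                phi[x] = 0                    # nothing below weight k
            elif w == k:
                phi[x] = rng.randint(0, 5)    # nonnegative on weight k
            else:
                phi[x] = rng.randint(-5, 5)   # arbitrary sign above weight k
        family.append(phi)
    return family

def shift_by_pointwise_min(phis, k):
    """phi~_i(x) = phi_i(x) - I[|x| > k] * min_j phi_j(x), as in (4.33)."""
    return [
        {x: phi[x] - (min(p[x] for p in phis) if sum(x) > k else 0) for x in phi}
        for phi in phis
    ]

def check_properties(phis, k):
    tilde = shift_by_pointwise_min(phis, k)
    r = len(phis)
    nonneg = all(v >= 0 for t in tilde for v in t.values())
    same_diff = all(
        tilde[i][x] - tilde[j][x] == phis[i][x] - phis[j][x]
        for i in range(r) for j in range(r) for x in phis[0]
    )
    dominated = all(
        tilde[i][x] <= sum(abs(p[x]) for p in phis)
        for i in range(r) for x in phis[0] if sum(x) != k
    )
    return nonneg, same_diff, dominated
```

Because the same function $\min_{j}\phi_{j}$ is subtracted from every $\phi_{i}$, pairwise differences are untouched, which is why the orthogonality bound carries over unchanged.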

Define $c=\min\{c^{\prime},c^{\prime\prime}\}.$ Equations (4.36) and (4.38) show that each $\tilde{\phi}_{i}$ is a nonnegative function and is not identically zero, making it possible to define a probability distribution $\lambda_{i}$ on $\{0,1\}^{n}$ by

\[
\lambda_{i}(x)=\frac{1}{\|\tilde{\phi}_{i}\|_{1}}\tilde{\phi}_{i}(x_{1}x_{2}\ldots x_{n^{\prime}})\prod_{j=n^{\prime}+1}^{n}(1-x_{j}).
\]

In other words, $\lambda_{i}$ is nonzero only on inputs $x$ with $x_{n^{\prime}+1}=x_{n^{\prime}+2}=\cdots=x_{n}=0,$ and on such inputs $\lambda_{i}(x)$ is the properly normalized version of the nonnegative function $\tilde{\phi}_{i}(x_{1}x_{2}\ldots x_{n^{\prime}})$. Then properties (4.21) and (4.22) are immediate from (4.34) and (4.35), respectively. Property (4.23) follows from

\begin{align*}
\lambda_{i}(\{0,1\}^{n}|_{k}) &=\frac{1}{\|\tilde{\phi}_{i}\|_{1}}\sum_{x\in\{0,1\}^{n}|_{k}}\tilde{\phi}_{i}(x_{1}x_{2}\ldots x_{n^{\prime}})\prod_{j=n^{\prime}+1}^{n}(1-x_{j})\\
&=\frac{1}{\|\tilde{\phi}_{i}\|_{1}}\sum_{x\in\{0,1\}^{n^{\prime}}|_{k}}\tilde{\phi}_{i}(x_{1}x_{2}\ldots x_{n^{\prime}})\\
&=\frac{1}{\|\tilde{\phi}_{i}\|_{1}}\\
&\geqslant\frac{1}{1+\beta}\\
&\geqslant 1-\beta,
\end{align*}

where the third step uses (4.36), and the fourth step uses (4.36) and (4.40). Property (4.24) is trivial for $\ell=k$ and follows for $\ell>k$ from

\begin{align*}
\lambda_{i}(\{0,1\}^{n}|_{\ell}) &=\frac{1}{\|\tilde{\phi}_{i}\|_{1}}\sum_{x\in\{0,1\}^{n}|_{\ell}}\tilde{\phi}_{i}(x_{1}x_{2}\ldots x_{n^{\prime}})\prod_{j=n^{\prime}+1}^{n}(1-x_{j})\\
&=\frac{1}{\|\tilde{\phi}_{i}\|_{1}}\sum_{x\in\{0,1\}^{n^{\prime}}|_{\ell}}\tilde{\phi}_{i}(x_{1}x_{2}\ldots x_{n^{\prime}})\\
&\leqslant\frac{1}{\|\tilde{\phi}_{i}\|_{1}}\cdot\frac{\exp(-c^{\prime}(\ell-k)/\sqrt{n^{\prime}m})}{c^{\prime}\ell^{2}}\\
&\leqslant\frac{\exp(-c(\ell-k)/\sqrt{n^{\prime}m})}{c\ell^{2}},
\end{align*}

where the third step uses (4.41), and the fourth step uses (4.37) and $c=\min\{c^{\prime},c^{\prime\prime}\}.$ Since $\ell\geqslant\ell-k+1,$ the last expression is at most the right-hand side of (4.24).
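The passage from $\tilde{\phi}_{i}$ to $\lambda_{i}$ (zero-padding the last $n-n^{\prime}$ coordinates, then normalizing) can be sketched as follows. The helper name `pad_and_normalize` and the dictionary encoding of functions on the Boolean cube are assumptions made for illustration; the input is assumed nonnegative and not identically zero, as guaranteed by (4.36) and (4.38).

```python
from fractions import Fraction
from itertools import product

def pad_and_normalize(phi_tilde, n_prime, n):
    """Return lambda on {0,1}^n with
    lambda(x) = phi~(x_1..x_{n'}) * prod_{j > n'} (1 - x_j) / ||phi~||_1."""
    norm = sum(abs(v) for v in phi_tilde.values())
    lam = {}
    for x in product((0, 1), repeat=n):
        head, tail = x[:n_prime], x[n_prime:]
        # The product of (1 - x_j) is 1 exactly when the padded tail is all zeros.
        lam[x] = Fraction(phi_tilde[head], norm) if not any(tail) else Fraction(0)
    return lam

# Toy nonnegative function on {0,1}^2, padded to {0,1}^4.
phi = {(0, 0): 0, (0, 1): 1, (1, 0): 3, (1, 1): 0}
lam = pad_and_normalize(phi, 2, 4)
```

Padding with zeros leaves the Hamming weight of every supported input unchanged, which is why the layer-by-layer bounds for $\tilde{\phi}_{i}$ transfer directly to $\lambda_{i}$.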

It remains to verify (4.25). For this, fix $i,j\in[r]$ arbitrarily. Then $\|\tilde{\phi}_{i}\|_{1}-\|\tilde{\phi}_{j}\|_{1}=\langle\tilde{\phi}_{i},1\rangle-\langle\tilde{\phi}_{j},1\rangle=\langle\tilde{\phi}_{i}-\tilde{\phi}_{j},1\rangle=0,$ where the first step uses (4.38), and the third step uses (4.42). We thus see that

\[
\|\tilde{\phi}_{i}\|_{1}=\|\tilde{\phi}_{j}\|_{1}. \tag{4.43}
\]

Next, observe that $\lambda_{i}$ can be written as the product of two functions on disjoint sets of variables, and likewise for $\lambda_{j}.$ Namely,

\begin{align*}
\lambda_{i} &=\frac{1}{\|\tilde{\phi}_{i}\|_{1}}\;\tilde{\phi}_{i}\otimes\mathrm{NOR}_{n-n^{\prime}},\\
\lambda_{j} &=\frac{1}{\|\tilde{\phi}_{j}\|_{1}}\;\tilde{\phi}_{j}\otimes\mathrm{NOR}_{n-n^{\prime}}.
\end{align*}

Now

\begin{align*}
\operatorname{orth}(\lambda_{i}-\lambda_{j}) &=\operatorname{orth}\left(\left(\frac{\tilde{\phi}_{i}}{\|\tilde{\phi}_{i}\|_{1}}-\frac{\tilde{\phi}_{j}}{\|\tilde{\phi}_{j}\|_{1}}\right)\otimes\mathrm{NOR}_{n-n^{\prime}}\right)\\
&\geqslant\operatorname{orth}\left(\frac{\tilde{\phi}_{i}}{\|\tilde{\phi}_{i}\|_{1}}-\frac{\tilde{\phi}_{j}}{\|\tilde{\phi}_{j}\|_{1}}\right)\\
&=\operatorname{orth}\left(\frac{\tilde{\phi}_{i}-\tilde{\phi}_{j}}{\|\tilde{\phi}_{i}\|_{1}}\right)\\
&=\operatorname{orth}(\tilde{\phi}_{i}-\tilde{\phi}_{j})\\
&\geqslant c^{\prime\prime}\sqrt{\frac{n^{\prime}}{m}},
\end{align*}

where the second step uses Proposition 2.6(ii), the third step applies (4.43), and the last step is justified by (4.42). In view of $c=\min\{c^{\prime},c^{\prime\prime}\},$ this settles (4.25) and completes the proof. ∎

4.3. Hardness amplification for approximate degree

We have reached the crux of our proof, a hardness amplification theorem for approximate degree. Unlike previous work, our hardness amplification is directly applicable to Boolean functions with sparse input and does not use componentwise composition or input compression. The theorem statement below has a large number of parameters, for maximum generality and black-box integration with the auxiliary results of previous sections. We will later derive a succinct and easy-to-apply corollary that will suffice for our hardness amplification purposes.

Theorem 4.5.

Let $C^{*}\geqslant 1$ and $c\in(0,1)$ be the absolute constants from Theorems 3.6 and 4.4, respectively. Fix a real number $0<\beta<1$ and positive integers $n,m,k,N,\theta,D,T$ such that

\begin{align}
&n/2\geqslant m>k, \tag{4.44}\\
&4(N+1)^{2}k\exp\left(-\frac{\sqrt{m}}{16k}\right)+(N+1)\left(\frac{3C^{*}\log^{2}(n+N+1)}{m^{1/4}}\right)^{k}\leqslant\frac{\beta}{16(N+1)^{2}m^{2}}, \tag{4.45}\\
&T\geqslant\frac{8\mathrm{e}}{c}\cdot\theta(1+\ln\theta)+\theta k, \tag{4.46}\\
&T\geqslant D. \tag{4.47}
\end{align}

Define

\[
\Delta=\left(1+2^{D}\binom{n\theta}{D}\right)\exp\left(-\frac{c(T-\theta k)}{2\sqrt{nm}}\right). \tag{4.48}
\]

Then there is an (explicitly given) mapping $H\colon(\{0,1\}^{n})^{\theta}\to\{0,1\}^{N}$ such that:

  (i) each output bit of $H$ is computable by a monotone $(k+1)$-DNF formula;

  (ii) for every $\epsilon\in[0,1]$ and every $f\colon\{0,1\}^{N}\to\{0,1\},$ one has

\[
\deg_{\epsilon-\beta\theta-2\Delta}((f\circ H)|_{\leqslant T})\geqslant\min\left\{c\deg_{\epsilon}(f|_{\leqslant\theta})\sqrt{\frac{n}{2m}},D\right\}.
\]
Proof.

We may assume that

\[
\epsilon-\beta\theta-2\Delta\geqslant 0 \tag{4.49}
\]

since otherwise the left-hand side in the approximate degree lower bound of (ii) is by definition $+\infty$. Define $V\subseteq\mathbb{R}^{N}$ by $V=\{0^{N},e_{1},e_{2},\ldots,e_{N}\}$ and set $r=N+1.$ In view of (4.44) and (4.45), Corollary 3.10 gives an explicit integer $n^{\prime}\in(n/2,n]$ and an explicit $(\frac{\beta}{16r^{2}m^{2}},\frac{\beta}{16r^{2}m^{2}},m)$-balanced coloring $\gamma\colon\binom{[n^{\prime}]}{k}\to[r]$. Alternatively, if one is not concerned about explicitness, the existence of $\gamma$ can be deduced from the much simpler Corollary 3.3. Specifically, (4.45) forces $\sqrt{m}\geqslant k$ and in particular $n\geqslant m\geqslant k^{2}\geqslant 1.$ Moreover, (4.45) implies that $3r\sqrt{k\ln(n+1)}/m^{k/4}\leqslant\frac{\beta}{16r^{2}m^{2}}.$ Now Corollary 3.3 guarantees the existence of a $(\frac{\beta}{16r^{2}m^{2}},\frac{\beta}{16r^{2}m^{2}},m)$-balanced coloring $\gamma\colon\binom{[n]}{k}\to[r]$.

Since $n^{\prime}\geqslant m>k,$ Theorem 4.4 gives explicit distributions $\lambda_{0^{N}},\lambda_{e_{1}},\lambda_{e_{2}},\ldots,\lambda_{e_{N}}$ on $\{0,1\}^{n}$ such that

\begin{align}
&\operatorname{supp}\lambda_{v}\subseteq\{x\in\{0,1\}^{n}:|x|=k\text{ or }|x|\geqslant m\}, && v\in V, \tag{4.50}\\
&\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{e_{i}}=\{\mathbf{1}_{S}:S\in\gamma^{-1}(i)\}, && i\in[N], \tag{4.51}\\
&\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{0^{N}}=\{\mathbf{1}_{S}:S\in\gamma^{-1}(N+1)\}, \tag{4.52}\\
&\lambda_{v}(\{0,1\}^{n}|_{k})\geqslant 1-\beta, && v\in V, \tag{4.53}\\
&\lambda_{v}(\{0,1\}^{n}|_{t})\leqslant\frac{\exp(-c(t-k)/\sqrt{nm})}{c(t-k+1)^{2}}, && v\in V,\;t\geqslant k, \tag{4.54}\\
&\operatorname{orth}(\lambda_{v}-\lambda_{u})\geqslant c\sqrt{\frac{n}{2m}}, && v,u\in V. \tag{4.55}
\end{align}

Properties (4.51) and (4.52) imply that

\begin{align}
\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{u}\cap\operatorname{supp}\lambda_{v} &=\varnothing, & u,v&\in V,\;u\neq v. \tag{4.56}
\end{align}

For $\mathbf{v}=(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{\theta})\in V^{\theta}$, define

\[
\Lambda_{\mathbf{v}}=\bigotimes_{i=1}^{\theta}\lambda_{\mathbf{v}_{i}}. \tag{4.57}
\]
Claim 4.6.

For each $\mathbf{v}\in V^{\theta},$ there is a function $\widetilde{\Lambda_{\mathbf{v}}}\colon(\{0,1\}^{n})^{\theta}\to\mathbb{R}$ such that

\begin{align}
&\operatorname{supp}\widetilde{\Lambda_{\mathbf{v}}}\subseteq(\{0,1\}^{n})^{\theta}|_{\leqslant T}, \tag{4.58}\\
&\operatorname{orth}(\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}})>D, \tag{4.59}\\
&\|\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\leqslant\Delta. \tag{4.60}
\end{align}

We will settle Claim 4.6, and all other claims, after the proof of the theorem.

We now turn to the construction of the monotone mapping $H$ in the theorem statement. Define $h\colon\{0,1\}^{n}\to\{0,1\}^{N}$ by

\begin{align}
(h(z))_{j} &=\bigvee_{S\in\binom{[n]}{k+1}\cup\gamma^{-1}(j)}\;\bigwedge_{s\in S}z_{s}, & j&=1,2,\ldots,N. \tag{4.61}
\end{align}

Clearly, this is a monotone DNF formula of width $k+1$. Define $H\colon(\{0,1\}^{n})^{\theta}\to\{0,1\}^{N}$ by

\begin{align}
H(x_{1},x_{2},\ldots,x_{\theta}) &=\bigvee_{i=1}^{\theta}h(x_{i}), & x_{1},x_{2},\ldots,x_{\theta}&\in\{0,1\}^{n}, \tag{4.62}
\end{align}

where the right-hand side is the componentwise disjunction of the Boolean vectors $h(x_{1}),h(x_{2}),\ldots,h(x_{\theta}).$ Observe that both $h$ and $H$ are monotone and are given explicitly in closed form in terms of the coloring $\gamma$ constructed at the beginning of the proof. This settles (i).
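The definitions (4.61) and (4.62) translate directly into code. The sketch below is a toy illustration under stated assumptions: indices are 0-based, the coloring `gamma` is an arbitrary hypothetical rule rather than a balanced coloring from Section 3, it is defined on all $k$-subsets of the $n$ coordinates (that is, the toy takes $n^{\prime}=n$), and the names `make_h` and `make_H` are invented for this example.

```python
from itertools import combinations

def make_h(n, k, N, gamma):
    """Build h from (4.61). gamma maps frozenset S (|S| = k) to a color in 1..N+1."""
    big_terms = [frozenset(S) for S in combinations(range(n), k + 1)]

    def h(z):
        ones = {s for s in range(n) if z[s]}
        # Any (k+1)-term fires as soon as z has Hamming weight at least k+1.
        heavy = any(S <= ones for S in big_terms)
        return tuple(
            int(heavy or any(S <= ones for S, color in gamma.items() if color == j))
            for j in range(1, N + 1)
        )
    return h

def make_H(h):
    """Build H from (4.62): componentwise OR of h over the theta blocks."""
    def H(*blocks):
        outputs = [h(x) for x in blocks]
        return tuple(int(any(bits)) for bits in zip(*outputs))
    return H

# Toy parameters and a hypothetical coloring of the 2-subsets of {0,1,2,3}.
n, k, N = 4, 2, 2
gamma = {frozenset(S): (sum(S) % 3) + 1 for S in combinations(range(n), k)}
h = make_h(n, k, N, gamma)
H = make_H(h)
```

On an input of Hamming weight exactly $k$, only the width-$k$ terms can fire, so $h$ returns the unit vector for the color of the input's support, or the zero vector when that color is $N+1$.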

For (ii), fix an arbitrary function $f\colon\{0,1\}^{N}\to\{0,1\}$ and abbreviate

\[
d=\deg_{\epsilon}(f|_{\leqslant\theta}).
\]

By the dual characterization of approximate degree (Fact 2.8), there is a function $\psi\colon\{0,1\}^{N}|_{\leqslant\theta}\to\mathbb{R}$ such that

\begin{align}
&\|\psi\|_{1}=1, \tag{4.63}\\
&\langle f,\psi\rangle>\epsilon, \tag{4.64}\\
&\operatorname{orth}\psi\geqslant d. \tag{4.65}
\end{align}

Define $\Psi\colon(\{0,1\}^{n})^{\theta}\to\mathbb{R}$ by

\[
\Psi=\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)\operatorname*{\mathbf{E}}_{\substack{\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u}}\widetilde{\Lambda_{\mathbf{v}}}. \tag{4.66}
\]

We will now use (4.44)–(4.66) to prove a sequence of claims.

Claim 4.7.

One has

\begin{align}
&\operatorname{supp}\Psi\subseteq(\{0,1\}^{n})^{\theta}|_{\leqslant T}, \tag{4.67}\\
&\|\Psi\|_{1}\leqslant 1+\Delta, \tag{4.68}\\
&\operatorname{orth}\Psi\geqslant\min\left\{cd\sqrt{\frac{n}{2m}},D\right\}. \tag{4.69}
\end{align}

Claim 4.8.

Let $v\in V$ be given. Then for all $z\in\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{v},$ one has $h(z)=v.$

Claim 4.9.

Let $u\in\{0,1\}^{N}|_{\leqslant\theta}$ and $\mathbf{v}=(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{\theta})\in V^{\theta}$ be given such that $\mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u.$ Then

\[
|f(u)-\langle\widetilde{\Lambda_{\mathbf{v}}},f\circ H\rangle|\leqslant\beta\theta+\Delta. \tag{4.70}
\]

Claim 4.10.

One has

\[
\langle f\circ H,\Psi\rangle>(\epsilon-\beta\theta-2\Delta)\|\Psi\|_{1}. \tag{4.71}
\]

Note from (4.67) that $\Psi$ is supported on inputs of Hamming weight at most $T$ and can therefore be regarded as a function on $(\{0,1\}^{n})^{\theta}|_{\leqslant T}$. Now the claimed bound in (ii) follows by Fact 2.8 in view of (4.69) and (4.71). The proof of the theorem is complete. ∎

Proof of Claim 4.6..

Equations (4.46), (4.50), and (4.54) ensure that Lemma 2.5 is applicable to the distributions $\lambda_{\mathbf{v}_{1}},\lambda_{\mathbf{v}_{2}},\ldots,\lambda_{\mathbf{v}_{\theta}}$ with parameters $\ell=\theta,$ $B=n,$ $C=1/c$, and $\alpha=\exp(-c/\sqrt{nm})$, whence

\begin{align*}
\Lambda_{\mathbf{v}}((\{0,1\}^{n})^{\theta}|_{>T}) &\leqslant\exp\left(-\frac{c(T-\theta k)}{2\sqrt{nm}}\right), & \mathbf{v}&\in V^{\theta}.
\end{align*}

In view of (4.47), we can now invoke Lemma 2.13 with parameter $B=n\theta$ to obtain a function $\widetilde{\Lambda_{\mathbf{v}}}\colon(\{0,1\}^{n})^{\theta}\to\mathbb{R}$ that satisfies (4.58)–(4.60). ∎

Proof of Claim 4.7..

Observe from (4.58) that $\Psi$ is a linear combination of functions supported on inputs of Hamming weight at most $T.$ This settles the support property (4.67). Property (4.68) can be verified as follows:

\begin{align*}
\|\Psi\|_{1} &\leqslant\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}|\psi(u)|\operatorname*{\mathbf{E}}_{\substack{\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u}}\|\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\\
&\leqslant\left(\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}|\psi(u)|\right)\max_{\mathbf{v}\in V^{\theta}}\|\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\\
&=\|\psi\|_{1}\max_{\mathbf{v}\in V^{\theta}}\|\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\\
&\leqslant\|\psi\|_{1}\max_{\mathbf{v}\in V^{\theta}}\{\|\Lambda_{\mathbf{v}}\|_{1}+\|\widetilde{\Lambda_{\mathbf{v}}}-\Lambda_{\mathbf{v}}\|_{1}\}\\
&\leqslant 1+\Delta,
\end{align*}

where the first and fourth steps apply the triangle inequality, and the last step uses (4.60) and (4.63).

To settle (4.69), consider an arbitrary polynomial $P\colon(\{0,1\}^{n})^{\theta}\to\mathbb{R}$ of degree less than $\min\{cd\sqrt{n/(2m)},D\}.$ Then

\begin{align}
\langle\Psi,P\rangle &=\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)\operatorname*{\mathbf{E}}_{\substack{\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u}}\langle\widetilde{\Lambda_{\mathbf{v}}},P\rangle \notag\\
&=\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)\operatorname*{\mathbf{E}}_{\substack{\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u}}[\langle\Lambda_{\mathbf{v}},P\rangle+\langle\widetilde{\Lambda_{\mathbf{v}}}-\Lambda_{\mathbf{v}},P\rangle] \notag\\
&=\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)\operatorname*{\mathbf{E}}_{\substack{\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u}}\langle\Lambda_{\mathbf{v}},P\rangle, \tag{4.72}
\end{align}

where the first and second steps use the linearity of inner product, and the third step is valid by (4.59). Equation (4.55) allows us to invoke Proposition 2.7 with $\ell=\theta$ and $\phi_{v}=\lambda_{v}$ to infer that the inner product $\langle\Lambda_{\mathbf{v}},P\rangle$ is a polynomial in $\mathbf{v}$ of degree less than $d.$ As a result, Fact 2.15 implies that the expected value in (4.72) is a polynomial in $u$ of degree less than $d.$ In summary, (4.72) is the inner product of $\psi$ with a polynomial of degree less than $d$ and is therefore zero by (4.65). The proof of (4.69) is complete. ∎

Proof of Claim 4.8..

Consider an arbitrary string $z\in\{0,1\}^{n}|_{k}$. Then

\[
(h(z))_{j}=\bigvee_{S\in\gamma^{-1}(j)}\;\bigwedge_{s\in S}z_{s}=\mathbf{I}[z\in\operatorname{supp}\lambda_{e_{j}}],
\]

where the first step uses the defining equation (4.61) together with $|z|=k,$ and the second step applies (4.51) along with $|z|=k.$ Thus, $h(z)$ can be written out explicitly as

\[
h(z)=(\mathbf{I}[z\in\operatorname{supp}\lambda_{e_{1}}],\mathbf{I}[z\in\operatorname{supp}\lambda_{e_{2}}],\ldots,\mathbf{I}[z\in\operatorname{supp}\lambda_{e_{N}}]). \tag{4.73}
\]

Now recall from (4.56) that a string $z$ of Hamming weight $k$ can belong to at most one of the sets $\operatorname{supp}\lambda_{0^{N}},\operatorname{supp}\lambda_{e_{1}},\operatorname{supp}\lambda_{e_{2}},\ldots,\operatorname{supp}\lambda_{e_{N}}.$ As a result, if $z\in\operatorname{supp}\lambda_{e_{i}}$ then $z\notin\operatorname{supp}\lambda_{e_{j}}$ for all $j\neq i$ and consequently $h(z)=e_{i}$ by (4.73). Analogously, if $z\in\operatorname{supp}\lambda_{0^{N}}$ then $z\notin\operatorname{supp}\lambda_{e_{j}}$ for all $j$ and consequently $h(z)=0^{N}$ by (4.73). This settles the claim for all $v\in V.$ ∎

Proof of Claim 4.9..

Since $u$ is a Boolean vector, the equality $\mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u$ forces

\[
\mathbf{v}_{1}\vee\mathbf{v}_{2}\vee\cdots\vee\mathbf{v}_{\theta}=u, \tag{4.74}
\]

where the disjunction is applied componentwise. For any input $(x_{1},x_{2},\ldots,x_{\theta})$ where $x_{i}\in\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{\mathbf{v}_{i}},$ we have

\[
(f\circ H)(x_{1},x_{2},\ldots,x_{\theta})=f\left(\bigvee_{i=1}^{\theta}h(x_{i})\right)=f\left(\bigvee_{i=1}^{\theta}\mathbf{v}_{i}\right)=f(u),
\]

where the second and third steps use Claim 4.8 and (4.74), respectively. Since $\operatorname{supp}\Lambda_{\mathbf{v}}=\prod_{i=1}^{\theta}\operatorname{supp}\lambda_{\mathbf{v}_{i}},$ we have shown that

\[
f\circ H\equiv f(u)\qquad\qquad\text{on }\;\;(\{0,1\}^{n}|_{k})^{\theta}\cap\operatorname{supp}\Lambda_{\mathbf{v}}. \tag{4.75}
\]

Furthermore,

\[
\Lambda_{\mathbf{v}}((\{0,1\}^{n}|_{k})^{\theta})=\prod_{i=1}^{\theta}\lambda_{\mathbf{v}_{i}}(\{0,1\}^{n}|_{k})\geqslant(1-\beta)^{\theta}\geqslant 1-\beta\theta, \tag{4.76}
\]

where the second step uses (4.53). Now

\begin{align*}
|f(u)-\langle\widetilde{\Lambda_{\mathbf{v}}},f\circ H\rangle|
&\leqslant|f(u)-\langle\Lambda_{\mathbf{v}},f\circ H\rangle|+|\langle\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}},f\circ H\rangle|\\
&\leqslant|f(u)-\langle\Lambda_{\mathbf{v}},f\circ H\rangle|+\|\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\\
&=\left|f(u)-\operatorname*{\mathbf{E}}_{\Lambda_{\mathbf{v}}}f\circ H\right|+\|\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\\
&\leqslant\operatorname*{\mathbf{E}}_{\Lambda_{\mathbf{v}}}|f(u)-f\circ H|+\|\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\\
&\leqslant 0\cdot\Lambda_{\mathbf{v}}((\{0,1\}^{n}|_{k})^{\theta})+1\cdot\Lambda_{\mathbf{v}}(\overline{(\{0,1\}^{n}|_{k})^{\theta}})+\|\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\\
&\leqslant\beta\theta+\|\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\\
&\leqslant\beta\theta+\Delta,
\end{align*}

where the last three steps use (4.75), (4.76), and (4.60), respectively. ∎
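The opening step (4.74) of this proof, that a Boolean sum of vectors from $V$ coincides with their componentwise disjunction, can be checked exhaustively for small parameters. The sketch below is illustrative; the helper names are hypothetical and the check covers only the tiny cases it enumerates.

```python
from itertools import product

def elements_of_V(N):
    """Return V = {0^N, e_1, ..., e_N} as tuples."""
    zero = tuple([0] * N)
    units = [tuple(int(i == j) for i in range(N)) for j in range(N)]
    return [zero] + units

def sum_and_or(vs):
    """Componentwise integer sum and componentwise OR of a list of vectors."""
    total = tuple(sum(col) for col in zip(*vs))
    disj = tuple(int(any(col)) for col in zip(*vs))
    return total, disj

def boolean_sum_forces_or(N, theta):
    """Check: whenever v_1 + ... + v_theta is Boolean, it equals v_1 OR ... OR v_theta."""
    for vs in product(elements_of_V(N), repeat=theta):
        total, disj = sum_and_or(vs)
        if max(total) <= 1 and total != disj:
            return False
    return True
```

The sum and the disjunction differ only when some unit vector $e_{j}$ is repeated, and in that case the sum has an entry of at least 2 and is therefore not Boolean.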

Proof of Claim 4.10..

To begin with,

\begin{align*}
\langle f,\psi\rangle-\langle f\circ H,\Psi\rangle
&=\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)f(u)\\
&\qquad\qquad-\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)\operatorname*{\mathbf{E}}_{\begin{subarray}{c}\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u\end{subarray}}\langle\widetilde{\Lambda_{\mathbf{v}}},f\circ H\rangle\\
&=\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)\operatorname*{\mathbf{E}}_{\begin{subarray}{c}\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u\end{subarray}}[f(u)-\langle\widetilde{\Lambda_{\mathbf{v}}},f\circ H\rangle]\\
&\leqslant\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}|\psi(u)|\operatorname*{\mathbf{E}}_{\begin{subarray}{c}\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u\end{subarray}}|f(u)-\langle\widetilde{\Lambda_{\mathbf{v}}},f\circ H\rangle|\\
&\leqslant\|\psi\|_{1}\max_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\max_{\begin{subarray}{c}\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u\end{subarray}}|f(u)-\langle\widetilde{\Lambda_{\mathbf{v}}},f\circ H\rangle|\\
&\leqslant\|\psi\|_{1}(\beta\theta+\Delta)\\
&=\beta\theta+\Delta, \tag{4.77}
\end{align*}

where the last two steps use Claim 4.9 and (4.63), respectively. Then

\begin{align*}
\langle f\circ H,\Psi\rangle
&>\epsilon-\beta\theta-\Delta\\
&\geqslant\frac{\epsilon-\beta\theta-\Delta}{1+\Delta}\cdot\|\Psi\|_{1}\\
&\geqslant(\epsilon-\beta\theta-2\Delta)\|\Psi\|_{1},
\end{align*}

where the first step uses (4.64) and (4.77), the second step is justified by (4.49) and (4.68), and the third step is legitimate since $a/(1+b)\geqslant a-b$ for all $a\in[0,1]$ and $b\geqslant 0$. This completes the proof of (4.71). ∎
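For completeness, the elementary inequality invoked in the third step of the preceding chain can be checked directly. The short derivation below is our own expansion, not taken from the source; it is applied with $a=\epsilon-\beta\theta-\Delta$ and $b=\Delta$, under the standing assumption $\epsilon-\beta\theta-2\Delta\geqslant 0$ (compare (4.95)), which together with $\epsilon\leqslant 1$ places $a$ in $[0,1]$.

```latex
\[
\frac{a}{1+b}-(a-b)
=\frac{a-(a-b)(1+b)}{1+b}
=\frac{b(1-a)+b^{2}}{1+b}
\geqslant 0
\qquad (a\in[0,1],\; b\geqslant 0).
\]
```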

4.4. Hardness amplification for one-sided approximate degree

In this section, we will prove that the construction of Theorem 4.5 amplifies not only approximate degree but also its one-sided variant. We start with a technical lemma.

Lemma 4.11.

Let $n,m,k,\theta,D,T$ be positive integers with

\begin{align*}
T&\geqslant n+D, \tag{4.78}\\
T&\geqslant\theta k. \tag{4.79}
\end{align*}

Let $y\in(\{0,1\}^{n})^{\theta}|_{>T}$ be given. Then there exists $\zeta_{y}\colon(\{0,1\}^{n})^{\theta}\to\mathbb{R}$ such that

\begin{align*}
&\operatorname{supp}\zeta_{y}\subseteq(\{0,1\}^{n})^{\theta}|_{\leqslant T}\cup\{y\}, \tag{4.80}\\
&\zeta_{y}(y)=1, \tag{4.81}\\
&\operatorname{orth}\zeta_{y}>D, \tag{4.82}\\
&\|\zeta_{y}\|_{1}\leqslant 1+2^{D}\binom{n(\theta-1)}{D}, \tag{4.83}\\
&\zeta_{y}=0\quad\text{ on }\quad(\{0,1\}^{n}|_{\leqslant k})^{\theta}. \tag{4.84}
\end{align*}
Proof.

Since $|y|>T\geqslant\theta k$ by (4.79), the string $y=(y_{1},y_{2},\ldots,y_{\theta})$ has a coordinate with Hamming weight greater than $k.$ By symmetry, we may assume that

\[|y_{1}|>k. \tag{4.85}\]

We have $|y_{2}y_{3}\ldots y_{\theta}|=|y|-|y_{1}|>T-n\geqslant D,$ where the second step uses the hypothesis $|y|>T$ along with the trivial bound $|y_{1}|\leqslant n,$ whereas the third step is legitimate by (4.78). Thanks to the newly obtained inequality $|y_{2}y_{3}\ldots y_{\theta}|>D,$ Lemma 2.12 is applicable with $B=n(\theta-1)$ and gives a function $\zeta\colon(\{0,1\}^{n})^{\theta-1}\to\mathbb{R}$ such that

\begin{align*}
&\operatorname{supp}\zeta\subseteq(\{0,1\}^{n})^{\theta-1}|_{\leqslant D}\cup\{y_{2}y_{3}\ldots y_{\theta}\}, \tag{4.86}\\
&\zeta(y_{2}y_{3}\ldots y_{\theta})=1, \tag{4.87}\\
&\|\zeta\|_{1}\leqslant 1+2^{D}\binom{n(\theta-1)}{D}, \tag{4.88}\\
&\operatorname{orth}\zeta>D. \tag{4.89}
\end{align*}

We will prove that the claimed properties (4.80)–(4.84) are enjoyed by the function

\[\zeta_{y}(x)=\delta_{x_{1},y_{1}}\zeta(x_{2}x_{3}\ldots x_{\theta}).\]

To verify the support property (4.80), fix any $x$ with $\zeta_{y}(x)\neq 0.$ Then necessarily $\delta_{x_{1},y_{1}}=1,$ forcing $x_{1}=y_{1}.$ Now (4.86) implies that $x$ either equals $y$ or has Hamming weight at most $|y_{1}|+D.$ Since $|y_{1}|+D\leqslant n+D\leqslant T$ by (4.78), this completes the proof of (4.80).

The remaining properties are straightforward. Property (4.81) follows from the corresponding property (4.87) of $\zeta$. Likewise, property (4.82) follows from (4.89) in light of Proposition 2.6 (ii). Property (4.83) is immediate from (4.88). Finally, (4.84) is a consequence of (4.85): the function $\zeta_{y}$ vanishes unless $x_{1}=y_{1},$ and $|y_{1}|>k.$ ∎

We are now ready to state and prove our hardness amplification result, which is a far-reaching generalization of Theorem 4.5.

Theorem 4.12.

Let $C^{*}\geqslant 1$ and $c\in(0,1)$ be the absolute constants from Theorems 3.6 and 4.4, respectively. Fix a real number $0<\beta<1$ and positive integers $n,m,k,N,\theta,D,T$ such that

\begin{align*}
&n/2\geqslant m>k, \tag{4.90}\\
&4(N+1)^{2}k\exp\left(-\frac{\sqrt{m}}{16k}\right)+(N+1)\left(\frac{3C^{*}\log^{2}(n+N+1)}{m^{1/4}}\right)^{k}\leqslant\frac{\beta}{16(N+1)^{2}m^{2}}, \tag{4.91}\\
&T\geqslant\frac{8\mathrm{e}}{c}\cdot\theta(1+\ln\theta)+\theta k, \tag{4.92}\\
&T\geqslant D+n. \tag{4.93}
\end{align*}

Define

\[\Delta=\left(1+2^{D}\binom{n\theta}{D}\right)\exp\left(-\frac{c(T-\theta k)}{2\sqrt{nm}}\right). \tag{4.94}\]

Then there is an (explicitly given) mapping $H\colon(\{0,1\}^{n})^{\theta}\to\{0,1\}^{N}$ such that:

  (i) each output bit of $H$ is computable by a monotone $(k+1)$-DNF formula;

  (ii) for every $\epsilon\in[0,1]$ and every $f\colon\{0,1\}^{N}\to\{0,1\},$ one has

    \[\deg_{\epsilon-\beta\theta-2\Delta}((f\circ H)|_{\leqslant T})\geqslant\min\left\{c\deg_{\epsilon}(f|_{\leqslant\theta})\sqrt{\frac{n}{2m}},D\right\};\]

  (iii) for every $\epsilon\in[0,1]$ and every $f\colon\{0,1\}^{N}\to\{0,1\}$ with $f(1^{N})=0,$ one has

    \[\deg^{+}_{\epsilon-\beta\theta-2\Delta}((f\circ H)|_{\leqslant T})\geqslant\min\left\{c\deg^{+}_{\epsilon}(f|_{\leqslant\theta})\sqrt{\frac{n}{2m}},D\right\}.\]
Proof.

As in the proof of Theorem 4.5, we may assume that

\[\epsilon-\beta\theta-2\Delta\geqslant 0 \tag{4.95}\]

since otherwise the left-hand side in the lower bounds of (ii) and (iii) is by definition $+\infty$. Define $V\subseteq\mathbb{R}^{N}$ by $V=\{0^{N},e_{1},e_{2},\ldots,e_{N}\}$ and set $r=N+1.$ Arguing as in the proof of Theorem 4.5, we obtain an explicit integer $n^{\prime}\in(n/2,n]$ and an explicit $(\frac{\beta}{16r^{2}m^{2}},\frac{\beta}{16r^{2}m^{2}},m)$-balanced coloring $\gamma\colon\binom{[n^{\prime}]}{k}\to[r]$, which in turn results in explicit distributions $\lambda_{0^{N}},\lambda_{e_{1}},\lambda_{e_{2}},\ldots,\lambda_{e_{N}}$ on $\{0,1\}^{n}$ such that

\begin{align*}
&\operatorname{supp}\lambda_{v}\subseteq\{x\in\{0,1\}^{n}:|x|=k\text{ or }|x|\geqslant m\}, && v\in V, \tag{4.96}\\
&\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{e_{i}}=\{\mathbf{1}_{S}:S\in\gamma^{-1}(i)\}, && i\in[N], \tag{4.97}\\
&\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{0^{N}}=\{\mathbf{1}_{S}:S\in\gamma^{-1}(N+1)\}, \tag{4.98}\\
&\lambda_{v}(\{0,1\}^{n}|_{k})\geqslant 1-\beta, && v\in V, \tag{4.99}\\
&\lambda_{v}(\{0,1\}^{n}|_{t})\leqslant\frac{\exp(-c(t-k)/\sqrt{nm})}{c(t-k+1)^{2}}, && v\in V,\;t\geqslant k, \tag{4.100}\\
&\operatorname{orth}(\lambda_{v}-\lambda_{u})\geqslant c\sqrt{\frac{n}{2m}}, && v,u\in V. \tag{4.101}
\end{align*}

Properties (4.97) and (4.98) imply that

\begin{align*}
&\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{u}\cap\operatorname{supp}\lambda_{v}=\varnothing, && u,v\in V,\;u\neq v. \tag{4.102}
\end{align*}

For $\mathbf{v}=(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{\theta})\in V^{\theta}$, define

\[\Lambda_{\mathbf{v}}=\bigotimes_{i=1}^{\theta}\lambda_{\mathbf{v}_{i}}. \tag{4.103}\]
Claim 4.13.

For each $\mathbf{v}\in V^{\theta},$ there is a function $\widetilde{\Lambda_{\mathbf{v}}}\colon(\{0,1\}^{n})^{\theta}\to\mathbb{R}$ such that

\begin{align*}
&\operatorname{supp}\widetilde{\Lambda_{\mathbf{v}}}\subseteq(\{0,1\}^{n})^{\theta}|_{\leqslant T}, \tag{4.104}\\
&\operatorname{orth}(\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}})>D, \tag{4.105}\\
&\|\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}}\|_{1}\leqslant\Delta, \tag{4.106}\\
&\widetilde{\Lambda_{\mathbf{v}}}=\Lambda_{\mathbf{v}}\qquad\text{on }(\{0,1\}^{n}|_{\leqslant k})^{\theta}. \tag{4.107}
\end{align*}

We will settle Claim 4.13 after the proof of the theorem. We now define the monotone mapping $H$ exactly the same way as in the proof of Theorem 4.5. Specifically, define $h\colon\{0,1\}^{n}\to\{0,1\}^{N}$ by

\begin{align*}
(h(z))_{j}&=\bigvee_{S\in\binom{[n]}{k+1}\cup\gamma^{-1}(j)}\;\bigwedge_{s\in S}z_{s}, && j=1,2,\ldots,N. \tag{4.108}
\end{align*}

Define $H\colon(\{0,1\}^{n})^{\theta}\to\{0,1\}^{N}$ by

\begin{align*}
H(x_{1},x_{2},\ldots,x_{\theta})&=\bigvee_{i=1}^{\theta}h(x_{i}), && x_{1},x_{2},\ldots,x_{\theta}\in\{0,1\}^{n}, \tag{4.109}
\end{align*}

where the right-hand side is the componentwise disjunction of the Boolean vectors $h(x_{1}),h(x_{2}),\ldots,h(x_{\theta}).$ With these definitions, items (i) and (ii) are immediate because they are restatements of Theorem 4.5 (i), (ii). To prove the remaining item (iii), fix an arbitrary function $f\colon\{0,1\}^{N}\to\{0,1\}$ with

\[f(1^{N})=0, \tag{4.110}\]

and abbreviate

\[d=\deg^{+}_{\epsilon}(f|_{\leqslant\theta}).\]

By the dual characterization of one-sided approximate degree (Fact 2.9), there is a function $\psi\colon\{0,1\}^{N}|_{\leqslant\theta}\to\mathbb{R}$ such that

\begin{align*}
&\|\psi\|_{1}=1, \tag{4.111}\\
&\langle f,\psi\rangle>\epsilon, \tag{4.112}\\
&\operatorname{orth}\psi\geqslant d, \tag{4.113}\\
&\psi(x)\geqslant 0\quad\text{whenever}\quad f(x)=1. \tag{4.114}
\end{align*}

Define $\Psi\colon(\{0,1\}^{n})^{\theta}\to\mathbb{R}$ by

\[\Psi=\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)\operatorname*{\mathbf{E}}_{\begin{subarray}{c}\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u\end{subarray}}\widetilde{\Lambda_{\mathbf{v}}}. \tag{4.115}\]

Equations (4.90)–(4.115) subsume the corresponding equations (4.44)–(4.66) in the proof of Theorem 4.5. Recall that from (4.44)–(4.66), we deduced Claims 4.7–4.10. As a result, Claims 4.7–4.10 remain valid here as well. In particular, we have

\begin{align*}
&\operatorname{supp}\Psi\subseteq(\{0,1\}^{n})^{\theta}|_{\leqslant T}, \tag{4.116}\\
&\operatorname{orth}\Psi\geqslant\min\left\{cd\sqrt{\frac{n}{2m}},D\right\}, \tag{4.117}\\
&h\equiv v\quad\text{on }\{0,1\}^{n}|_{k}\cap\operatorname{supp}\lambda_{v} && (v\in V), \tag{4.118}\\
&\langle f\circ H,\Psi\rangle>(\epsilon-\beta\theta-2\Delta)\|\Psi\|_{1}. \tag{4.119}
\end{align*}

Moreover, we will shortly prove the following new claim.

Claim 4.14.

$\Psi(x)\geqslant 0$ whenever $(f\circ H)(x)=1.$

The lower bound on the one-sided approximate degree in (iii) now follows from the dual characterization of one-sided approximate degree (Fact 2.9) in view of (4.116)–(4.119) and Claim 4.14. This completes the proof of Theorem 4.12. ∎
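To make the definitions (4.108) and (4.109) concrete, the following is a minimal brute-force sketch of the mappings $h$ and $H$ on a toy instance. The coloring `gamma` below is a hypothetical placeholder (a round-robin assignment of colors to $k$-subsets), not the balanced coloring constructed in the paper, and the helper names `h`, `H`, and `indicator` are ours. Each term of the DNF has at most $k+1$ variables, matching item (i).

```python
from itertools import combinations

def h(z, k, gamma, N):
    """Evaluate (4.108): bit j is an OR of ANDs over all (k+1)-subsets of
    coordinates and over the k-subsets that gamma colors with j."""
    n = len(z)
    big_terms = [frozenset(S) for S in combinations(range(n), k + 1)]
    out = []
    for j in range(1, N + 1):
        terms = big_terms + [S for S, color in gamma.items() if color == j]
        out.append(int(any(all(z[s] for s in S) for S in terms)))
    return tuple(out)

def H(xs, k, gamma, N):
    """Evaluate (4.109): componentwise OR of h over the theta blocks."""
    blocks = [h(x, k, gamma, N) for x in xs]
    return tuple(int(any(b[j] for b in blocks)) for j in range(N))

def indicator(S, n):
    """Characteristic vector 1_S of a subset S of {0, ..., n-1}."""
    return tuple(int(i in S) for i in range(n))

# Toy parameters (assumed for illustration only): colors 1..N+1 assigned
# round-robin to the k-subsets of {0, ..., n-1}.
n, k, N = 4, 2, 2
gamma = {frozenset(S): (idx % (N + 1)) + 1
         for idx, S in enumerate(combinations(range(n), k))}
```

On this toy instance, an input of weight exactly $k$ whose support has color $j\leqslant N$ is mapped to $e_{j}$, a support of color $N+1$ is mapped to $0^{N}$, and any input of weight greater than $k$ is mapped to $1^{N}$, which is the behavior the proof relies on in (4.118) and in the first case of Claim 4.14.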

Proof of Claim 4.13.

Fix $\mathbf{v}\in V^{\theta}$ arbitrarily for the remainder of the proof. Equations (4.92), (4.96), and (4.100) ensure that Lemma 2.5 is applicable to the distributions $\lambda_{\mathbf{v}_{1}},\lambda_{\mathbf{v}_{2}},\ldots,\lambda_{\mathbf{v}_{\theta}}$ with parameters $\ell=\theta,$ $B=n,$ $C=1/c$, and $\alpha=\exp(-c/\sqrt{nm})$, whence

\[\Lambda_{\mathbf{v}}((\{0,1\}^{n})^{\theta}|_{>T})\leqslant\exp\left(-\frac{c(T-\theta k)}{2\sqrt{nm}}\right). \tag{4.120}\]

Recall from (4.93) and (4.92) that $T\geqslant D+n$ and $T\geqslant\theta k,$ respectively, which makes Lemma 4.11 applicable. Define $\widetilde{\Lambda_{\mathbf{v}}}\colon(\{0,1\}^{n})^{\theta}\to\mathbb{R}$ by

\[\widetilde{\Lambda_{\mathbf{v}}}=\Lambda_{\mathbf{v}}-\sum_{y\in(\{0,1\}^{n})^{\theta}|_{>T}}\Lambda_{\mathbf{v}}(y)\zeta_{y}, \tag{4.121}\]

where $\zeta_{y}$ is as given by Lemma 4.11. To verify the support property (4.104), fix any input $x$ of Hamming weight $|x|>T.$ For all $y$ in the summation with $y\neq x$, we have $\zeta_{y}(x)=0$ in view of (4.80). As a result, (4.121) simplifies to $\widetilde{\Lambda_{\mathbf{v}}}(x)=\Lambda_{\mathbf{v}}(x)-\Lambda_{\mathbf{v}}(x)\zeta_{x}(x).$ In view of (4.81), we conclude that $\widetilde{\Lambda_{\mathbf{v}}}(x)=0$.

The orthogonality property (4.105) follows from

\[\operatorname{orth}(\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}})\geqslant\min_{y\in(\{0,1\}^{n})^{\theta}|_{>T}}\operatorname{orth}\zeta_{y}>D,\]

where the first step uses the defining equation (4.121) and Proposition 2.6 (i), and the second step is legitimate by (4.82).

Property (4.106) can be verified as follows:

\begin{align*}
\|\Lambda_{\mathbf{v}}-\widetilde{\Lambda_{\mathbf{v}}}\|_{1}
&\leqslant\sum_{y\in(\{0,1\}^{n})^{\theta}|_{>T}}\Lambda_{\mathbf{v}}(y)\|\zeta_{y}\|_{1}\\
&\leqslant\left(1+2^{D}\binom{n(\theta-1)}{D}\right)\sum_{y\in(\{0,1\}^{n})^{\theta}|_{>T}}\Lambda_{\mathbf{v}}(y)\\
&\leqslant\left(1+2^{D}\binom{n(\theta-1)}{D}\right)\exp\left(-\frac{c(T-\theta k)}{2\sqrt{nm}}\right)\\
&\leqslant\Delta,
\end{align*}

where the first step uses the triangle inequality along with the defining equation (4.121), the second step applies (4.83), the third step is valid by (4.120), and the fourth step uses the definition (4.94).

Finally, (4.107) follows from the definition (4.121) in view of (4.84). ∎

Proof of Claim 4.14.

We will prove the claim in contrapositive form. Specifically, fix an arbitrary string $x=(x_{1},x_{2},\ldots,x_{\theta})\in(\{0,1\}^{n})^{\theta}$ with $\Psi(x)<0$. Our objective is to deduce that $(f\circ H)(x)=0.$

There are two cases to consider. If $|x_{i}|>k$ for some $i,$ then the defining equation (4.108) implies that $h(x_{i})=1^{N}$. As a result,

\[(f\circ H)(x)=f(H(x))=f\left(\bigvee_{i=1}^{\theta}h(x_{i})\right)=f(1^{N})=0,\]

where the last step uses (4.110).

We now treat the complementary case $x\in(\{0,1\}^{n}|_{\leqslant k})^{\theta}.$ By (4.107) and (4.115),

\begin{align*}
\Psi(x)
&=\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)\operatorname*{\mathbf{E}}_{\begin{subarray}{c}\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u\end{subarray}}\Lambda_{\mathbf{v}}(x)\\
&=\sum_{u\in\{0,1\}^{N}|_{\leqslant\theta}}\psi(u)\operatorname*{\mathbf{E}}_{\begin{subarray}{c}\mathbf{v}\in V^{\theta}:\\ \mathbf{v}_{1}+\mathbf{v}_{2}+\cdots+\mathbf{v}_{\theta}=u\end{subarray}}\;\prod_{i=1}^{\theta}\lambda_{\mathbf{v}_{i}}(x_{i}). \tag{4.122}
\end{align*}

It follows from $\Psi(x)<0$ that the summation in (4.122) contains at least one negative term, corresponding to a string $u\in\{0,1\}^{N}|_{\leqslant\theta}$. This forces

\[\psi(u)<0 \tag{4.123}\]

and additionally implies the existence of $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{\theta}\in V$ with

\begin{align*}
&x_{i}\in\operatorname{supp}\lambda_{\mathbf{v}_{i}}, && i=1,2,\ldots,\theta, \tag{4.124}\\
&\sum_{i=1}^{\theta}\mathbf{v}_{i}=u. \tag{4.125}
\end{align*}

Since $x\in(\{0,1\}^{n}|_{\leqslant k})^{\theta}$ in the case under consideration, it follows from (4.96) and (4.124) that $|x_{i}|=k$ for all $i.$ Now (4.118) ensures that $h(x_{i})=\mathbf{v}_{i}$ for all $i,$ which in turn makes it possible to rewrite (4.125) as $\sum_{i=1}^{\theta}h(x_{i})=u.$ Since $u,h(x_{1}),h(x_{2}),\ldots,h(x_{\theta})\in\{0,1\}^{N},$ we conclude that $\bigvee_{i=1}^{\theta}h(x_{i})=u.$ As a result,

\[(f\circ H)(x)=f\left(\bigvee_{i=1}^{\theta}h(x_{i})\right)=f(u)=0,\]

where the last step is immediate from (4.114) and (4.123). ∎

4.5. Specializing the parameters

Theorems 4.5 and 4.12 have a large number of parameters that one can adjust to produce various hardness amplification theorems. We do so in this section. For any constants $\alpha\in(0,1]$ and $C\geqslant 1,$ we show how to transform a function $f$ on $\theta^{C}$ bits with approximate degree

\[\deg_{\epsilon}(f|_{\leqslant\theta})\geqslant\theta^{1-\alpha} \tag{4.126}\]

into a function $F$ on $T^{1+\alpha}$ bits with approximate degree

\[\deg_{\epsilon-\frac{1}{T}}(F|_{\leqslant T})\geqslant T^{1-\frac{2}{3}\alpha}. \tag{4.127}\]

Comparing the exponents in (4.126) and (4.127), we see that $F$ is harder to approximate than $f$ relative to the Hamming weight of the inputs for $F$ and $f$, respectively. Moreover, we show that $F$ is expressible as $F=f\circ H$ for some mapping $H$ whose output bits are computable by monotone DNF formulas of constant width. In particular, if $f$ is a monotone DNF formula of constant width, then so is $F.$ The formal statement follows.

Corollary 4.15.

Fix reals $\alpha\in(0,1],$ $A\geqslant 1,$ and $C\geqslant 1$ arbitrarily. Then for all large enough integers $\theta,$ there is an (explicitly given) mapping $H\colon\{0,1\}^{\lfloor T^{1+\alpha}\rfloor}\to\{0,1\}^{\lfloor\theta^{C}\rfloor}$ with $T=\lfloor\theta\log^{2}\theta\rfloor$ such that the output bits of $H$ are computable by monotone $\lceil 50(A+C)/\alpha\rceil$-DNF formulas and

\[\deg_{\epsilon-\frac{1}{T^{A}}}((f\circ H)|_{\leqslant T})\geqslant T^{1-\frac{2}{3}\alpha} \tag{4.128}\]

for every $\epsilon\in[0,1]$ and every function $f\colon\{0,1\}^{\lfloor\theta^{C}\rfloor}\to\{0,1\}$ with $\deg_{\epsilon}(f|_{\leqslant\theta})\geqslant\theta^{1-\alpha}$.

Proof.

Invoke Theorem 4.5 with parameters

\begin{align*}
\beta&=\frac{1}{2\theta\lfloor\theta\log^{2}\theta\rfloor^{A}}, \tag{4.129}\\
N&=\lfloor\theta^{C}\rfloor, \tag{4.130}\\
n&=\lfloor\theta^{\alpha}\rfloor, \tag{4.131}\\
m&=\lfloor\theta^{\alpha/4}\rfloor, \tag{4.132}\\
k&=\left\lceil\frac{50(A+C)}{\alpha}\right\rceil-1, \tag{4.133}\\
D&=\lceil\theta^{1-\frac{5}{8}\alpha}\rceil, \tag{4.134}\\
T&=\lfloor\theta\log^{2}\theta\rfloor. \tag{4.135}
\end{align*}

Provided that $\theta$ is large enough, these parameter settings satisfy the theorem hypotheses (4.44)–(4.47), whereas (4.48) gives

\[\Delta\leqslant\frac{1}{4T^{A}}. \tag{4.136}\]

As a result, Theorem 4.5 guarantees that

\[\deg_{\epsilon-\frac{1}{T^{A}}}((f\circ H)|_{\leqslant T})\geqslant\frac{c}{2}\cdot\theta^{1-\frac{5}{8}\alpha}, \tag{4.137}\]

where $c\in(0,1)$ is the absolute constant from Theorem 4.4 and $H\colon\{0,1\}^{\lfloor T^{1+\alpha}\rfloor}\to\{0,1\}^{\lfloor\theta^{C}\rfloor}$ is an explicit mapping whose output bits are computable by monotone $\lceil 50(A+C)/\alpha\rceil$-DNF formulas. (In fact, $H$ uses only $n\theta\approx(T/\log^{2}T)^{1+\alpha}$ input bits, but this improvement is not relevant for our purposes.) Provided that $\theta$ is large enough relative to the absolute constant $c,$ we infer (4.128) immediately from (4.137). ∎
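The accuracy parameter $1/T^{A}$ and the final inference from (4.137) to (4.128) can be unpacked as follows. This is our own sketch of the routine calculation under the parameter choices (4.129)–(4.135), assuming $\log\theta\geqslant 1$; it is not spelled out in the source and the constants are not optimized.

```latex
\begin{align*}
\beta\theta+2\Delta
  &\leqslant\frac{1}{2T^{A}}+\frac{2}{4T^{A}}=\frac{1}{T^{A}},\\
T^{1-\frac{2}{3}\alpha}
  &\leqslant(\theta\log^{2}\theta)^{1-\frac{2}{3}\alpha}
   \leqslant\theta^{1-\frac{2}{3}\alpha}\log^{2}\theta,\\
\frac{c}{2}\cdot\theta^{1-\frac{5}{8}\alpha}
  &=\frac{c}{2}\cdot\theta^{\frac{\alpha}{24}}\cdot\theta^{1-\frac{2}{3}\alpha}
   \geqslant\theta^{1-\frac{2}{3}\alpha}\log^{2}\theta
   \qquad\text{once }\theta^{\alpha/24}\geqslant\frac{2}{c}\log^{2}\theta.
\end{align*}
```

The first line uses $\beta\theta=1/(2T^{A})$ from (4.129) and (4.135) together with (4.136). The last condition holds for all large enough $\theta$ because a fixed positive power of $\theta$ eventually dominates $\log^{2}\theta$; here $\frac{2}{3}-\frac{5}{8}=\frac{1}{24}$ is the slack between the two exponents.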

Analogously, we have the following hardness amplification result for one-sided approximate degree.

Corollary 4.16.

Fix reals $\alpha\in(0,1],$ $A\geqslant 1,$ and $C\geqslant 1$ arbitrarily. Then for all large enough integers $\theta,$ there is an (explicitly given) mapping $H\colon\{0,1\}^{\lfloor T^{1+\alpha}\rfloor}\to\{0,1\}^{\lfloor\theta^{C}\rfloor}$ with $T=\lfloor\theta\log^{2}\theta\rfloor$ such that the output bits of $H$ are computable by monotone $\lceil 50(A+C)/\alpha\rceil$-DNF formulas and

\[\deg_{\epsilon-\frac{1}{T^{A}}}^{+}((f\circ H)|_{\leqslant T})\geqslant T^{1-\frac{2}{3}\alpha} \tag{4.138}\]

for every $\epsilon\in[0,1]$ and every function $f\colon\{0,1\}^{\lfloor\theta^{C}\rfloor}\to\{0,1\}$ such that $\deg_{\epsilon}^{+}(f|_{\leqslant\theta})\geqslant\theta^{1-\alpha}$ and $f(1^{\lfloor\theta^{C}\rfloor})=0.$

Proof.

The proof is the same, mutatis mutandis, as that of Corollary 4.15. Specifically, invoke Theorem 4.12 with parameters (4.129)–(4.135). Provided that $\theta$ is large enough, these parameter settings satisfy the theorem hypotheses (4.90)–(4.93), whereas (4.94) gives (4.136). As a result, Theorem 4.12 guarantees that

\[\deg_{\epsilon-\frac{1}{T^{A}}}^{+}((f\circ H)|_{\leqslant T})\geqslant\frac{c}{2}\cdot\theta^{1-\frac{5}{8}\alpha}, \tag{4.139}\]

where $c\in(0,1)$ is the absolute constant from Theorem 4.4 and $H\colon\{0,1\}^{\lfloor T^{1+\alpha}\rfloor}\to\{0,1\}^{\lfloor\theta^{C}\rfloor}$ is an explicit mapping whose output bits are computable by monotone $\lceil 50(A+C)/\alpha\rceil$-DNF formulas. Provided that $\theta$ is large enough relative to the absolute constant $c,$ this settles (4.138). ∎

5. Main results

In this section, we will settle our main results on approximate degree and present their applications to communication complexity.

5.1. Approximate degree of DNF and CNF formulas

We will start with the two-sided case. Our proof here amounts to taking the trivial one-variable formula x1x_{1} and iteratively applying the hardness amplification of Corollary 4.15.

Theorem 5.1.

For every $\delta\in(0,1]$ and $\Delta\geqslant 1,$ there is a constant $c\geqslant 1$ and an (explicitly given) family $\{f_{n}\}_{n=1}^{\infty}$ of functions $f_{n}\colon\{0,1\}^{n}\to\{0,1\}$ such that each $f_{n}$ is computable by a monotone $c$-DNF formula and satisfies

\begin{align*}
\deg_{\frac{1}{2}-\frac{1}{n^{\Delta}}}(f_{n})&\geqslant\frac{1}{c}\cdot n^{1-\delta}, && n=1,2,3,\ldots. \tag{5.1}
\end{align*}
Proof.

Let $K\geqslant 1$ be the smallest integer such that

\[\frac{1-(2/3)^{K}}{1+(2/3)^{K-1}}>1-\delta. \tag{5.2}\]

Define

\[A=2\Delta+3. \tag{5.3}\]
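As an illustration of how the iteration count $K$ in (5.2) depends on $\delta$, here is a small sketch that finds the least $K$ by direct search with exact rational arithmetic. The function name `smallest_K` is ours. The search terminates because the left-hand side of (5.2) increases with $K$ and tends to $1$.

```python
from fractions import Fraction

def smallest_K(delta):
    """Return the least integer K >= 1 with
    (1 - (2/3)^K) / (1 + (2/3)^(K-1)) > 1 - delta, as in (5.2)."""
    r = Fraction(2, 3)
    K = 1
    # The ratio is increasing in K, so a linear scan finds the minimum.
    while (1 - r**K) / (1 + r**(K - 1)) <= 1 - delta:
        K += 1
    return K
```

For example, $\delta=1$ gives $K=1$ (the ratio is $1/6>0$), and $\delta=1/2$ gives $K=4$ (the ratios for $K=1,2,3,4$ are $1/6,\,1/3,\,19/39,\,13/21$).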

Now, let $n\geqslant 1$ be any large enough integer. Define $T_{0},T_{1},T_{2},\ldots,T_{K}$ recursively by $T_{0}=\lfloor n/\log^{2K}n\rfloor$ and $T_{i}=\lfloor T_{i-1}\log^{2}T_{i-1}\rfloor$ for $i\geqslant 1.$ Thus,

\begin{align*}
T_{i}&\leqslant\frac{n}{\log^{2(K-i)}n}, && i=0,1,2,\ldots,K, \tag{5.4}\\
T_{i}&\sim\frac{n}{\log^{2(K-i)}n}, && i=0,1,2,\ldots,K, \tag{5.5}
\end{align*}

where $\sim$ denotes equality up to lower-order terms. Provided that $n$ is larger than a certain constant, inductive application of Corollary 4.15 gives functions

\begin{align*}
g_{n,i}\colon\{0,1\}^{\lfloor T_{i}^{1+(2/3)^{i-1}}\rfloor}&\to\{0,1\}, && i=0,1,2,\ldots,K, \tag{5.6}
\end{align*}

such that

\begin{align*}
\deg_{\frac{1}{2}-\frac{1}{T_{0}^{A}}-\frac{1}{T_{1}^{A}}-\cdots-\frac{1}{T_{i}^{A}}}(g_{n,i}|_{\leqslant T_{i}})&\geqslant T_{i}^{1-(2/3)^{i}}, && i=0,1,2,\ldots,K, \tag{5.7}
\end{align*}

and each $g_{n,i}$ is an explicitly constructed monotone $c_{i}$-DNF formula for some constant $c_{i}$ independent of $n.$ In more detail, the requirement (5.7) for $i=0$ is equivalent to $\deg_{\frac{1}{2}-\frac{1}{T_{0}^{A}}}(g_{n,0}|_{\leqslant T_{0}})>0$ and is trivially satisfied by the “dictator” function $g_{n,0}(x)=x_{1}$, whereas for $i\geqslant 1$ the function $g_{n,i}$ is obtained constructively from $g_{n,i-1}$ by invoking Corollary 4.15 with

\begin{align*}
\alpha&=\left(\frac{2}{3}\right)^{i-1},\\
C&=1+\left(\frac{2}{3}\right)^{i-2},\\
\theta&=T_{i-1},\\
f&=g_{n,i-1},\\
\epsilon&=\frac{1}{2}-\frac{1}{T_{0}^{A}}-\frac{1}{T_{1}^{A}}-\cdots-\frac{1}{T_{i-1}^{A}}.
\end{align*}

Specializing (5.4)–(5.7) to $i=K$, the function $g_{n,K}$ is a monotone $c_{K}$-DNF formula for some constant $c_{K}$ independent of $n,$ takes at most $N:=n^{1+(2/3)^{K-1}}$ input variables, and has approximate degree

\begin{align*}
\deg_{\frac{1}{2}-\frac{1}{N^{\Delta+1}}}(g_{n,K})
&\geqslant\deg_{\frac{1}{2}-\frac{1}{T_{0}^{A}}-\frac{1}{T_{1}^{A}}-\cdots-\frac{1}{T_{K}^{A}}}(g_{n,K})\\
&\geqslant\deg_{\frac{1}{2}-\frac{1}{T_{0}^{A}}-\frac{1}{T_{1}^{A}}-\cdots-\frac{1}{T_{K}^{A}}}(g_{n,K}|_{\leqslant T_{K}})\\
&=\Omega(n^{1-(2/3)^{K}})\\
&=\omega(N^{1-\delta}),
\end{align*}

where the first and last steps hold for all large enough $n$ due to (5.3) and (5.2), respectively. The desired function family $\{f_{n}\}_{n=1}^{\infty}$ can then be defined by setting

\[f_{n}=g_{\lfloor n^{1/(1+(2/3)^{K-1})}\rfloor,K}\]

for all $n$ larger than a certain constant $n_{0},$ and taking the remaining functions $f_{1},f_{2},\ldots,f_{n_{0}}$ to be the dictator function $x\mapsto x_{1}.$ ∎
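The recursion for $T_{0},T_{1},\ldots,T_{K}$ used in the proof of Theorem 5.1 and the upper bound (5.4) can be explored numerically. The sketch below is illustrative only: it assumes that $\log$ denotes the binary logarithm (the base is not stated in this section), and the function name `schedule` is ours. It also assumes $n$ is large enough that every $T_{i}$ is at least $2$.

```python
import math

def schedule(n, K):
    """Return [T_0, ..., T_K] with T_0 = floor(n / log^{2K} n) and
    T_i = floor(T_{i-1} * log^2 T_{i-1}), using base-2 logarithms."""
    T = [math.floor(n / math.log2(n) ** (2 * K))]
    for _ in range(K):
        T.append(math.floor(T[-1] * math.log2(T[-1]) ** 2))
    return T
```

For instance, `schedule(2**64, 4)` starts with $T_{0}=2^{16}$ and $T_{1}=2^{24}$, and every entry respects the bound $T_{i}\leqslant n/\log^{2(K-i)}n$ from (5.4).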

Theorem 5.1 immediately implies Theorems 1.1 and 1.2 from the introduction. We now move on to the one-sided case.

Theorem 5.2.

For every $\delta\in(0,1]$ and $\Delta\geqslant 1,$ there is a constant $c\geqslant 1$ and an (explicitly given) family $\{f_{n}\}_{n=1}^{\infty}$ of functions $f_{n}\colon\{0,1\}^{n}\to\{0,1\}$ such that each $f_{n}$ is computable by a monotone $c$-DNF formula and satisfies

\begin{align*}
\deg_{\frac{1}{2}-\frac{1}{n^{\Delta}}}^{+}(\neg f_{n})&\geqslant\frac{1}{c}\cdot n^{1-\delta}, && n=1,2,3,\ldots. \tag{5.8}
\end{align*}

This result subsumes Theorem 5.1 and settles Theorem 1.4 in the introduction. The proof below makes repeated use of the following observation: if one applies Corollary 4.16 to a function $f$ that is the negation of a constant-width monotone DNF formula, then the resulting composition $f\circ H$ is again the negation of a constant-width monotone DNF formula. This is easy to see by writing $\neg(f\circ H)=(\neg f)\circ H$ and noting that both $\neg f$ and $H$ are computable by constant-width monotone DNF formulas.

Proof of Theorem 5.2.

Much of the proof is identical to that of Theorem 5.1. As before, let $K\geqslant 1$ be the smallest integer such that

\[\frac{1-(2/3)^{K}}{1+(2/3)^{K-1}}>1-\delta. \tag{5.9}\]

Define

\[A=2\Delta+3. \tag{5.10}\]

Now, let $n\geqslant 1$ be any large enough integer. Define $T_{0},T_{1},T_{2},\ldots,T_{K}$ recursively by $T_{0}=\lfloor n/\log^{2K}n\rfloor$ and $T_{i}=\lfloor T_{i-1}\log^{2}T_{i-1}\rfloor$ for $i\geqslant 1.$ Thus,

\begin{align*}
T_{i}&\leqslant\frac{n}{\log^{2(K-i)}n}, && i=0,1,2,\ldots,K, \tag{5.11}\\
T_{i}&\sim\frac{n}{\log^{2(K-i)}n}, && i=0,1,2,\ldots,K, \tag{5.12}
\end{align*}

where $\sim$ denotes equality up to lower-order terms. Provided that $n$ is larger than a certain constant, inductive application of Corollary 4.16 gives functions

\begin{align*}
g_{n,i}\colon\{0,1\}^{\lfloor T_{i}^{1+(2/3)^{i-1}}\rfloor}&\to\{0,1\}, && i=0,1,2,\ldots,K, \tag{5.13}
\end{align*}

such that

\begin{align*}
\deg_{\frac{1}{2}-\frac{1}{T_{0}^{A}}-\frac{1}{T_{1}^{A}}-\cdots-\frac{1}{T_{i}^{A}}}^{+}(\neg g_{n,i}|_{\leqslant T_{i}})&\geqslant T_{i}^{1-(2/3)^{i}}, && i=0,1,2,\ldots,K, \tag{5.14}
\end{align*}

and each $g_{n,i}$ is an explicitly constructed monotone $c_{i}$-DNF formula for some constant $c_{i}$ independent of $n.$ In more detail, the requirement (5.14) for $i=0$ is equivalent to $\deg_{\frac{1}{2}-\frac{1}{T_{0}^{A}}}^{+}(\neg g_{n,0}|_{\leqslant T_{0}})>0$ and is trivially satisfied by the “dictator” function $g_{n,0}(x)=x_{1}$. For $i\geqslant 1$, we obtain $g_{n,i}$ from $g_{n,i-1}$ by applying Corollary 4.16 with

\begin{align*}
\alpha&=\left(\frac{2}{3}\right)^{i-1},\\
C&=1+\left(\frac{2}{3}\right)^{i-2},\\
\theta&=T_{i-1},\\
f&=\neg g_{n,i-1},\\
\epsilon&=\frac{1}{2}-\frac{1}{T_{0}^{A}}-\frac{1}{T_{1}^{A}}-\cdots-\frac{1}{T_{i-1}^{A}}.
\end{align*}

This appeal to Corollary 4.16 is legitimate because $g_{n,i-1}$ is a monotone DNF formula and therefore its negation $f=\neg g_{n,i-1}$ evaluates to $0$ on the all-ones input.

Specializing (5.11)–(5.14) to $i=K$, the function $g_{n,K}$ is a monotone $c_{K}$-DNF formula for some constant $c_{K}$ independent of $n,$ takes at most $N:=n^{1+(2/3)^{K-1}}$ input variables, and has one-sided approximate degree

\begin{align*}
\deg_{\frac{1}{2}-\frac{1}{N^{\Delta+1}}}^{+}(\neg g_{n,K})
&\geqslant\deg_{\frac{1}{2}-\frac{1}{T_{0}^{A}}-\frac{1}{T_{1}^{A}}-\cdots-\frac{1}{T_{K}^{A}}}^{+}(\neg g_{n,K})\\
&\geqslant\deg_{\frac{1}{2}-\frac{1}{T_{0}^{A}}-\frac{1}{T_{1}^{A}}-\cdots-\frac{1}{T_{K}^{A}}}^{+}(\neg g_{n,K}|_{\leqslant T_{K}})\\
&=\Omega(n^{1-(2/3)^{K}})\\
&=\omega(N^{1-\delta}),
\end{align*}

where the first and last steps hold for all large enough $n$ due to (5.10) and (5.9), respectively. The desired function family $\{f_{n}\}_{n=1}^{\infty}$ can then be defined by setting

\[f_{n}=g_{\lfloor n^{1/(1+(2/3)^{K-1})}\rfloor,K}\]

for all $n$ larger than a certain constant $n_{0},$ and taking the remaining functions $f_{1},f_{2},\ldots,f_{n_{0}}$ to be the dictator function $x\mapsto x_{1}.$ ∎

5.2. Quantum communication complexity

Using the pattern matrix method, we will “lift” our approximate degree results to a near-optimal lower bound on the communication complexity of DNF formulas in the two-party quantum model. Before we can apply the pattern matrix method, there is a technicality to address with regard to the representation of Boolean values as real numbers. In this paper, we have followed the standard convention of representing “true” and “false” as $1$ and $0,$ respectively. There is another common encoding, inspired by Fourier analysis and used in the pattern matrix method [38, 43], whereby “true” and “false” are represented as $-1$ and $1,$ respectively. To switch back and forth between these representations, we will use the following proposition.

Proposition 5.3.

For any function $f\colon X\to\mathbb{R}$ on a finite subset $X$ of Euclidean space, and any reals $\epsilon\geqslant 0$ and $c\neq 0,$

\begin{align*}
&\deg_{\epsilon}(f+c)=\deg_{\epsilon}(f),\\
&\deg_{|c|\epsilon}(cf)=\deg_{\epsilon}(f).
\end{align*}

Proof.

For any polynomial $p,$ we have the following equivalences:

\begin{align*}
\|f-p\|_{\infty}\leqslant\epsilon\qquad&\Leftrightarrow\qquad\|(f+c)-(p+c)\|_{\infty}\leqslant\epsilon,\\
\|f-p\|_{\infty}\leqslant\epsilon\qquad&\Leftrightarrow\qquad\|cf-cp\|_{\infty}\leqslant|c|\epsilon,
\end{align*}

where the second line uses $c\neq 0.$ ∎

As a corollary, we can relate in a precise way the approximate degree of a Boolean function $f\colon X\to\{0,1\}$ and the approximate degree of the associated $\pm 1$-valued function $f^{\prime}\colon X\to\{-1,+1\}$ given by $f^{\prime}=(-1)^{f}$.

Corollary 5.4.

For any Boolean function $f\colon X\to\{0,1\}$ and any $\epsilon\geqslant 0$,

\[ \deg_{\epsilon}((-1)^{f})=\deg_{\epsilon/2}(f). \]

Proof.

Since $f$ is Boolean-valued, we have the equality of functions $(-1)^{f}=1-2f$. Now $\deg_{\epsilon}((-1)^{f})=\deg_{\epsilon}(1-2f)=\deg_{\epsilon}(-2f)=\deg_{2\cdot\epsilon/2}(-2f)=\deg_{\epsilon/2}(f)$, where the second and fourth steps apply Proposition 5.3. ∎
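The rescaling in Corollary 5.4 can be illustrated numerically. The following Python sketch is only a sanity check under illustrative assumptions: the Boolean function `f` and the degree-1 approximant `p` are hypothetical choices made for this example, not objects from the paper. It confirms that $(-1)^{f}=1-2f$ pointwise and that replacing $p$ by $1-2p$ exactly doubles the pointwise error.

```python
from fractions import Fraction
from itertools import product


def sup_error(f, p, points):
    """Largest pointwise gap |f(x) - p(x)| over the given points."""
    return max(abs(f(x) - p(x)) for x in points)


points = list(product([0, 1], repeat=3))


def f(x):
    # Hypothetical Boolean target in the 0/1 encoding.
    return x[0] | (x[1] & x[2])


def p(x):
    # Hypothetical degree-1 approximant of f (exact rational arithmetic).
    return Fraction(1, 2) * x[0] + Fraction(1, 4) * (x[1] + x[2])


def f_sign(x):
    # The same function in the +/-1 encoding: 0 -> +1, 1 -> -1.
    return (-1) ** f(x)


def p_sign(x):
    # The approximant transported through the affine map t -> 1 - 2t.
    return 1 - 2 * p(x)


identity_holds = all(f_sign(x) == 1 - 2 * f(x) for x in points)
err_01 = sup_error(f, p, points)
err_pm = sup_error(f_sign, p_sign, points)
```

Because the map $t\mapsto 1-2t$ has degree 1, `p_sign` has the same degree as `p`, which is the content of the two applications of Proposition 5.3 in the proof.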

Corollary 5.4 makes it easy to convert approximate degree results between the $0,1$ representation and $\pm 1$ representation. For communication complexity, no conversion is necessary in the first place:

\begin{align*}
Q_{\epsilon}^{*}(F)=Q_{\epsilon}^{*}((-1)^{F}), && F\colon X\times Y\to\{0,1\}, \tag{5.15}
\end{align*}

where $Q_{\epsilon}^{*}$ denotes $\epsilon$-error quantum communication complexity with arbitrary prior entanglement. This equality holds because the representation of “true” and “false” in a communication protocol is a purely syntactic matter, and one can relabel the output values $0,1$ as $1,-1$, respectively, without affecting the protocol’s correctness or communication cost. We note that (5.15) and Corollary 5.4 pertain to the encoding of the output of a Boolean function $f$. How “true” and “false” bits are represented in the input to $f$ is immaterial both for communication complexity and approximate degree because the bijection $(0,1)\leftrightarrow(1,-1)$ is an affine map $b\mapsto 1-2b$, which does not change the degree of a polynomial.

We are now in a position to prove the promised communication lower bounds. The pattern matrix method for two-party quantum communication is given by the following theorem [38, Theorem 1.1].

Theorem 5.5 (Sherstov).

Let $f\colon\{0,1\}^{t}\to\{0,1\}$ be given. Define $F\colon\{0,1\}^{4t}\times\{0,1\}^{4t}\to\{0,1\}$ by

\[ F(x,y)=f\left(\bigvee_{i=1}^{4}(x_{1,i}\wedge y_{1,i}),\ldots,\bigvee_{i=1}^{4}(x_{t,i}\wedge y_{t,i})\right). \]

Then for all $\alpha\in[0,1)$ and $\beta<\alpha/2$,

\[ Q_{\beta}^{*}(F)\geqslant\frac{1}{4}\deg_{\alpha/2}(f)-\frac{1}{2}\log\left(\frac{3}{\alpha-2\beta}\right). \]

The original statement in [38, Theorem 1.1] uses the $\pm 1$ representation for the range of $f$ and $F$. We translated it to the $0,1$ representation, as stated in Theorem 5.5, by applying (5.15) to $F$ and Corollary 5.4 to $f$. By combining Theorems 5.1 and 5.5, we obtain our main result on the quantum communication complexity of DNF formulas:

Theorem 5.6.

For all $\delta\in(0,1]$ and $A\geqslant 1$, there is a constant $c\geqslant 1$ and an (explicitly given) family $\{F_{n}\}_{n=1}^{\infty}$ of two-party communication problems $F_{n}\colon\{0,1\}^{n}\times\{0,1\}^{n}\to\{0,1\}$ such that each $F_{n}$ is computable by a monotone $c$-DNF formula and satisfies

\begin{align*}
Q_{\frac{1}{2}-\frac{1}{n^{A}}}^{*}(F_{n})=\Omega(n^{1-\delta}). \tag{5.16}
\end{align*}

Proof.

Theorem 5.1 gives a constant $c^{\prime}\geqslant 1$ and an explicit family $\{f_{n}\}_{n=1}^{\infty}$ of functions $f_{n}\colon\{0,1\}^{n}\to\{0,1\}$ such that each $f_{n}$ is computable by a monotone $c^{\prime}$-DNF formula and satisfies

\begin{align*}
\deg_{\frac{1}{2}-\frac{1}{n^{2A}}}(f_{n})\geqslant\frac{1}{c^{\prime}}\cdot n^{1-\delta}, && n=1,2,3,\ldots. \tag{5.17}
\end{align*}

For $n\geqslant 4$, define $F_{n}\colon\{0,1\}^{n}\times\{0,1\}^{n}\to\{0,1\}$ by

\[ F_{n}(x,y)=f_{\lfloor n/4\rfloor}\left(\bigvee_{i=1}^{4}(x_{1,i}\wedge y_{1,i}),\ldots,\bigvee_{i=1}^{4}(x_{\lfloor n/4\rfloor,i}\wedge y_{\lfloor n/4\rfloor,i})\right), \]

where we index the strings $x$ and $y$ as arrays of $\lfloor n/4\rfloor\times 4$ bits. Clearly, $F_{n}$ is computable by a monotone $2c^{\prime}$-DNF formula. We now invoke the pattern matrix method for quantum communication (Theorem 5.5) with parameters

\begin{align*}
\alpha&=1-\frac{2}{\lfloor n/4\rfloor^{2A}},\\
\beta&=\frac{1}{2}-\frac{1}{n^{A}},\\
f&=f_{\lfloor n/4\rfloor},
\end{align*}

which satisfy $\beta<\alpha/2$ for all $n\geqslant 24$. As a result,

\begin{align*}
Q_{\frac{1}{2}-\frac{1}{n^{A}}}^{*}(F_{n})&\geqslant\frac{1}{4}\cdot\deg_{\frac{1}{2}-\frac{1}{\lfloor n/4\rfloor^{2A}}}(f_{\lfloor n/4\rfloor})-\frac{1}{2}\log\left(\frac{3}{\frac{2}{n^{A}}-\frac{2}{\lfloor n/4\rfloor^{2A}}}\right)\\
&\geqslant\frac{1}{4}\cdot\frac{1}{c^{\prime}}\cdot\left\lfloor\frac{n}{4}\right\rfloor^{1-\delta}-\frac{1}{2}\log\left(\frac{3}{\frac{2}{n^{A}}-\frac{2}{\lfloor n/4\rfloor^{2A}}}\right)
\end{align*}

for all $n\geqslant 24$, where the first inequality applies the pattern matrix method, and the second inequality uses (5.17). Now (5.16) follows since $A,c^{\prime},\delta$ are constants. ∎

Theorem 5.6 settles Theorem 1.9 from the introduction.
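As a sanity check on the composition in Theorem 5.5 and on the parameter choice in the proof of Theorem 5.6, the following Python sketch builds the composed function $F$ from a given $f$ and tests the condition $\beta<\alpha/2$ in exact integer arithmetic. The helper names `pattern_compose` and `beta_below_half_alpha` are illustrative and do not come from the paper. The check assumes an integer exponent $A$, in which case $\beta<\alpha/2$ is equivalent to $\lfloor n/4\rfloor^{2A}>n^{A}$.

```python
def pattern_compose(f, t):
    """Return F(x, y) = f(z), where z_j = OR_i (x[j][i] AND y[j][i]).

    x and y are t-by-4 arrays of bits, and f takes a tuple of t bits.
    """
    def F(x, y):
        z = tuple(
            int(any(x[j][i] & y[j][i] for i in range(4)))
            for j in range(t)
        )
        return f(z)
    return F


def beta_below_half_alpha(n, A):
    """Exact test of beta < alpha/2 for the parameters in the proof.

    alpha = 1 - 2/floor(n/4)^(2A) and beta = 1/2 - 1/n^A, so the
    condition reduces to floor(n/4)^(2A) > n^A (integer A assumed).
    """
    return (n // 4) ** (2 * A) > n ** A


# Small illustration with t = 2 and two hypothetical outer functions.
or_f = pattern_compose(lambda z: int(any(z)), 2)
and_f = pattern_compose(lambda z: int(all(z)), 2)
x = ((1, 0, 0, 0), (0, 0, 0, 0))
y = ((1, 0, 0, 0), (1, 1, 1, 1))
or_value = or_f(x, y)    # the first block intersects, the second does not
and_value = and_f(x, y)
```

Reducing the inequality to an integer comparison avoids floating-point error when the two sides are close, as happens for small $n$.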

5.3. Randomized multiparty communication

We now turn to communication lower bounds for DNF formulas in the $k$-party number-on-the-forehead model. Analogous to (5.15), we have

\begin{align*}
R_{\epsilon}(F)=R_{\epsilon}((-1)^{F}), && F\colon X_{1}\times X_{2}\times\cdots\times X_{k}\to\{0,1\}, \tag{5.18}
\end{align*}

where $R_{\epsilon}$ denotes $\epsilon$-error number-on-the-forehead randomized communication complexity. The $k$-party set disjointness problem $\text{\rm DISJ}_{n,k}\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ is given by

\[ \text{\rm DISJ}_{n,k}(x_{1},x_{2},\ldots,x_{k})=\bigwedge_{j=1}^{n}\bigvee_{i=1}^{k}\overline{x_{i,j}}. \]

In other words, the problem asks whether there is a coordinate $j$ in which each of the Boolean vectors $x_{1},x_{2},\ldots,x_{k}$ has a $1$. If one views $x_{1},x_{2},\ldots,x_{k}$ as the characteristic vectors of corresponding sets $S_{1},S_{2},\ldots,S_{k}$, then the set disjointness function evaluates to true if and only if $S_{1}\cap S_{2}\cap\cdots\cap S_{k}=\varnothing$. For a communication problem $g\colon X_{1}\times X_{2}\times\cdots\times X_{k}\to\{0,1\}$ and a function $f\colon\{0,1\}^{n}\to\{0,1\}$, we view the componentwise composition $f\circ g$ as a $k$-party communication problem on $X_{1}^{n}\times X_{2}^{n}\times\cdots\times X_{k}^{n}$. The multiparty pattern matrix method [43, Theorem 5.1] gives a lower bound on the communication complexity of $f\circ\text{\rm DISJ}_{m,k}$ in terms of the approximate degree of $f$:

Theorem 5.7 (Sherstov).

Let $f\colon\{0,1\}^{n}\to\{0,1\}$ be given. Consider the $k$-party communication problem $F$ defined by $F=f\circ\text{\rm DISJ}_{m,k}$. Then for all $\alpha,\beta\geqslant 0$ with $\beta<\alpha/2$, one has

\[ R_{\beta}(F)\geqslant\frac{\deg_{\alpha/2}(f)}{2}\cdot\log\left(\frac{\sqrt{m}}{C2^{k}k}\right)-\log\frac{1}{\alpha-2\beta}, \]

where $C>0$ is an absolute constant.
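For readers who prefer an executable definition, the set disjointness function and the componentwise composition $f\circ\text{\rm DISJ}_{m,k}$ from Theorem 5.7 can be sketched in Python as follows. The data layout is an assumption made for this illustration: an input to the composed problem is a list of $n$ blocks, and each block is a list of $k$ bit-vectors of length $m$, one per party. The names `disj` and `compose` are hypothetical.

```python
def disj(xs):
    """DISJ_{m,k} on a list xs of k bit-vectors of equal length m.

    Returns 1 if and only if no coordinate j has xs[i][j] = 1 for every i.
    """
    m = len(xs[0])
    return int(all(any(x[j] == 0 for x in xs) for j in range(m)))


def compose(f, g, blocks):
    """Componentwise composition: apply g to each block, then f to the results."""
    return f(tuple(g(block) for block in blocks))


# Two parties (k = 2), block length m = 1, outer arity n = 2.
blocks = [[(1,), (1,)], [(0,), (1,)]]
inner_values = tuple(disj(b) for b in blocks)
or_of_disj = compose(lambda z: int(any(z)), disj, blocks)
and_of_disj = compose(lambda z: int(all(z)), disj, blocks)
```

In the number-on-the-forehead model, party $i$ sees every vector except the $i$-th one in each block. The sketch only evaluates the function and does not model that restriction.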

The actual statement of the pattern matrix method in [43, Theorem 5.1] is for functions $f$ and $F$ with range $\{-1,+1\}$. Theorem 5.7 above, stated for functions with range $\{0,1\}$, is immediate from [43, Theorem 5.1] by applying (5.18) to $F$ and Corollary 5.4 to $f$. We are now ready for our main result on the randomized multiparty communication complexity of DNF formulas.

Theorem 5.8.

Fix arbitrary constants $\delta\in(0,1]$ and $A\geqslant 1$. Then for all integers $n,k\geqslant 2$, there is an (explicitly given) $k$-party communication problem $F_{n,k}\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ with

\begin{align*}
R_{1/3}(F_{n,k})&\geqslant\left(\frac{n}{c4^{k}k^{2}}\right)^{1-\delta}, \tag{5.19}\\
R_{\frac{1}{2}-\frac{1}{n^{A}}}(F_{n,k})&\geqslant\frac{n^{1-\delta}}{c4^{k}}, \tag{5.20}
\end{align*}

where $c\geqslant 1$ is a constant independent of $n$ and $k$. Moreover, each $F_{n,k}$ is computable by a monotone DNF formula of width $ck$ and size $n^{c}$.

It will be helpful to keep in mind that the conclusion of Theorem 5.8 is “monotone” in $c$, in the sense that proving Theorem 5.8 for a given constant $c$ proves it for all larger constants as well.

Proof.

Theorem 5.1 gives a constant $c^{\prime}\geqslant 1$ and an explicit family $\{f_{n}\}_{n=1}^{\infty}$ of functions $f_{n}\colon\{0,1\}^{n}\to\{0,1\}$ such that each $f_{n}$ is computable by a monotone DNF formula of width $c^{\prime}$ and satisfies

\begin{align*}
\deg_{\frac{1}{2}-\frac{1}{n^{2A/\delta}}}(f_{n})\geqslant\frac{1}{c^{\prime}}\cdot n^{1-\frac{\delta}{2}}, && n=1,2,3,\ldots. \tag{5.21}
\end{align*}

Let $C>0$ be the absolute constant from Theorem 5.7. For arbitrary integers $n,k\geqslant 2$, define

\[ F_{n,k}=\begin{cases}\text{\rm AND}_{k}&\text{if }n<\lceil C2^{k+1}k\rceil^{2},\\ f_{\lfloor n/\lceil C2^{k+1}k\rceil^{2}\rfloor}\circ\neg\text{\rm DISJ}_{\lceil C2^{k+1}k\rceil^{2},k}&\text{otherwise.}\end{cases} \]

We first analyze the cost of representing $F_{n,k}$ as a DNF formula. If $n<\lceil C2^{k+1}k\rceil^{2}$, then by definition $F_{n,k}$ is a monotone DNF formula of width $k$ and size $1$. In the complementary case, $f_{\lfloor n/\lceil C2^{k+1}k\rceil^{2}\rfloor}$ is by construction a monotone DNF formula of width $c^{\prime}$ and hence of size at most $n^{c^{\prime}}$, whereas $\neg\text{\rm DISJ}_{\lceil C2^{k+1}k\rceil^{2},k}$ is by definition a monotone DNF formula of width $k$ and size at most $\lceil C2^{k+1}k\rceil^{2}\leqslant n$. As a result, the composed function $F_{n,k}$ is a monotone DNF formula of width $c^{\prime}k$ and size at most $n^{c^{\prime}}\cdot n^{c^{\prime}}=n^{2c^{\prime}}$. In particular, the claim in the theorem statement regarding the width and size of $F_{n,k}$ as a monotone DNF formula is valid for any constant $c\geqslant 2c^{\prime}$.

We now turn to the communication complexity of $F_{n,k}$. Since $F_{n,k}$ is nonconstant, we have the trivial bound

\begin{align*}
R_{\epsilon}(F_{n,k})\geqslant 1, && 0\leqslant\epsilon<\frac{1}{2}. \tag{5.22}
\end{align*}

We further claim that

\begin{align*}
R_{\epsilon}(F_{n,k})\geqslant\frac{1}{2c^{\prime}}\cdot\left\lfloor\frac{n}{\lceil C2^{k+1}k\rceil^{2}}\right\rfloor^{1-\frac{\delta}{2}}+\log\left(1-\frac{2}{\lfloor n/\lceil C2^{k+1}k\rceil^{2}\rfloor^{2A/\delta}}-2\epsilon\right) \tag{5.23}
\end{align*}

whenever the logarithmic term is well-defined. For $n<\lceil C2^{k+1}k\rceil^{2}$, this claim is vacuous. In the complementary case $n\geqslant\lceil C2^{k+1}k\rceil^{2}$, consider the family $\{g_{n}\}_{n=1}^{\infty}$ of functions $g_{n}\colon\{0,1\}^{n}\to\{0,1\}$ given by $g_{n}(x_{1},x_{2},\ldots,x_{n})=f_{n}(\neg x_{1},\neg x_{2},\ldots,\neg x_{n})$. For each $n$, it is clear that $g_{n}$ and $f_{n}$ have the same approximate degree. Since $F_{n,k}=g_{\lfloor n/\lceil C2^{k+1}k\rceil^{2}\rfloor}\circ\text{\rm DISJ}_{\lceil C2^{k+1}k\rceil^{2},k}$, one now obtains (5.23) directly from (5.21) and the multiparty pattern matrix method (Theorem 5.7); note that $\sqrt{m}/(C2^{k}k)\geqslant 2$ for $m=\lceil C2^{k+1}k\rceil^{2}$, which bounds the logarithmic factor in Theorem 5.7 from below.

For a sufficiently large constant $c\geqslant 1$, the communication lower bound (5.19) follows from (5.23) for $n\geqslant c4^{k}k^{2}$ and follows from (5.22) for $n<c4^{k}k^{2}$.

The proof of (5.20) is more tedious. Take the constant $c\geqslant 1$ large enough that the following relations hold:

\begin{align*}
&\left\lfloor\frac{n^{\delta}}{\lceil 2C\rceil^{2}\log^{2}n}\right\rfloor\geqslant 2n^{\delta/2} && \text{for all }n\geqslant c, \tag{5.24}\\
&n^{\delta^{2}/4}\geqslant 1+A\log n && \text{for all }n\geqslant c, \tag{5.25}\\
&c\geqslant 2c^{\prime}. \tag{5.26}
\end{align*}

If $n^{1-\delta}<c4^{k}$, then (5.20) holds due to (5.22). In what follows, we treat the complementary case when

\[ \frac{n^{1-\delta}}{c4^{k}}\geqslant 1 \tag{5.27} \]

and in particular

\begin{align*}
n&\geqslant c, \tag{5.28}\\
k&\leqslant\log n. \tag{5.29}
\end{align*}

Then

\begin{align*}
\left\lfloor\frac{n}{\lceil C2^{k+1}k\rceil^{2}}\right\rfloor^{1-\frac{\delta}{2}}&\geqslant\left\lfloor\frac{n}{\lceil 2C\rceil^{2}4^{k}k^{2}}\right\rfloor^{1-\frac{\delta}{2}}\\
&\geqslant\left\lfloor\frac{n^{1-\delta}}{4^{k}}\cdot\frac{n^{\delta}}{\lceil 2C\rceil^{2}\log^{2}n}\right\rfloor^{1-\frac{\delta}{2}}\\
&\geqslant\left\lfloor\frac{n^{1-\delta}}{4^{k}}\cdot 2n^{\delta/2}\right\rfloor^{1-\frac{\delta}{2}}\\
&\geqslant\left(\frac{n^{1-\delta}}{4^{k}}\cdot n^{\delta/2}\right)^{1-\frac{\delta}{2}}\\
&\geqslant\frac{n^{1-\delta}}{4^{k}}\cdot n^{\delta^{2}/4}\\
&\geqslant\frac{n^{1-\delta}}{4^{k}}\cdot(1+A\log n), \tag{5.30}
\end{align*}

where the second step uses (5.29), the third step uses (5.24) and (5.28), the fourth step is valid by (5.27), and the last step uses (5.25) and (5.28). Continuing,

\begin{align*}
&\log\left(1-\frac{2}{\lfloor n/\lceil C2^{k+1}k\rceil^{2}\rfloor^{2A/\delta}}-2\left(\frac{1}{2}-\frac{1}{n^{A}}\right)\right)\\
&\qquad\qquad=\log\left(\frac{2}{n^{A}}-\frac{2}{\lfloor n/\lceil C2^{k+1}k\rceil^{2}\rfloor^{2A/\delta}}\right)\\
&\qquad\qquad\geqslant\log\left(\frac{2}{n^{A}}-\frac{2}{\lfloor n/(\lceil 2C\rceil^{2}4^{k}k^{2})\rfloor^{2A/\delta}}\right)\\
&\qquad\qquad\geqslant\log\left(\frac{2}{n^{A}}-\frac{2}{\lfloor n/(\lceil 2C\rceil^{2}n^{1-\delta}\log^{2}n)\rfloor^{2A/\delta}}\right)\\
&\qquad\qquad\geqslant\log\left(\frac{2}{n^{A}}-\frac{2}{(2n^{\delta/2})^{2A/\delta}}\right)\\
&\qquad\qquad\geqslant\log\left(\frac{2}{n^{A}}-\frac{1}{n^{A}}\right)\\
&\qquad\qquad=-A\log n, \tag{5.31}
\end{align*}

where the third step uses (5.27) and (5.29), and the fourth step uses (5.24) and (5.28). Now

\begin{align*}
R_{\frac{1}{2}-\frac{1}{n^{A}}}(F_{n,k})&\geqslant\frac{1}{2c^{\prime}}\cdot\frac{n^{1-\delta}}{4^{k}}\cdot(1+A\log n)-A\log n\\
&\geqslant\frac{n^{1-\delta}}{c4^{k}}\cdot(1+A\log n)-A\log n\\
&\geqslant\frac{n^{1-\delta}}{c4^{k}},
\end{align*}

where the first step substitutes the bounds (5.30) and (5.31) into (5.23), the second step uses (5.26), and the third step is valid by (5.27). This completes the proof of (5.20). ∎

Theorem 5.8 settles Theorem 1.6 from the introduction.
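One way to see why the proof of Theorem 5.8 uses inner blocks of length $m=\lceil C2^{k+1}k\rceil^{2}$ is that this choice makes the ratio $\sqrt{m}/(C2^{k}k)$ in Theorem 5.7 at least $2$. The Python sketch below checks this in exact rational arithmetic. The values tried for $C$ are hypothetical, since the paper only states that $C$ is an absolute constant, and the function names are illustrative. Assuming base-2 logarithms, a ratio of at least $2$ means that the logarithmic factor in Theorem 5.7 is at least $1$.

```python
import math
from fractions import Fraction


def block_length(C, k):
    """m = ceil(C * 2^(k+1) * k)^2, the DISJ block length used for F_{n,k}."""
    return math.ceil(C * 2 ** (k + 1) * k) ** 2


def sqrt_ratio(C, k):
    """Exact value of sqrt(m) / (C * 2^k * k) for m = block_length(C, k)."""
    root = math.ceil(C * 2 ** (k + 1) * k)  # sqrt(m), since m is a perfect square
    return Fraction(root) / (Fraction(C) * 2 ** k * k)


# Hypothetical values of the absolute constant C, for illustration only.
sample_constants = (Fraction(1, 3), Fraction(1), Fraction(7, 2))
ratios = {
    (C, k): sqrt_ratio(C, k)
    for C in sample_constants
    for k in range(2, 9)
}
```

The ratio is at least $2$ for every positive $C$ and every $k$, because rounding up can only increase the numerator $\lceil C2^{k+1}k\rceil\geqslant 2\cdot C2^{k}k$.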

Remark 5.9.

In this section, we considered $k$-party number-on-the-forehead bounded-error communication complexity with classical players. The model naturally extends to quantum players, and our lower bound in Theorem 5.8 implies an $\Omega(n^{1-\delta}/4^{k}k)$ communication lower bound in this quantum $k$-party number-on-the-forehead model for computing an explicit DNF formula $F\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ of size $n^{O(1)}$ and width $O(k)$ with error probability $\frac{1}{2}-\frac{1}{n^{A}}$, where the constants $\delta>0$ and $A\geqslant 1$ can be set arbitrarily. In more detail, the multiparty pattern matrix method actually gives a bound on the generalized discrepancy of the composed communication problem $F$. By the results of [27], generalized discrepancy leads in turn to a lower bound on the communication complexity of $F$ in the quantum $k$-party number-on-the-forehead model. Quantitatively, the authors of [27] show that any classical communication lower bound obtained via generalized discrepancy carries over to the quantum model with only a factor of $\Theta(k)$ loss.

5.4. Nondeterministic and Merlin–Arthur multiparty communication

To obtain our results on nondeterminism and Merlin–Arthur communication, we will now develop a general technique for transforming lower bounds on one-sided approximate degree into lower bounds in these communication models. The technique in question is implicit in the papers [23, 43] but has not been previously formalized in our sought generality.

Consider a $k$-party communication problem $F\colon X_{1}\times X_{2}\times\cdots\times X_{k}\to\{0,1\}$, for some finite sets $X_{1},X_{2},\ldots,X_{k}$. A fundamental notion in the study of multiparty communication is that of a cylinder intersection [6], defined as any function $\chi\colon X_{1}\times X_{2}\times\cdots\times X_{k}\to\{0,1\}$ of the form

\[ \chi(x_{1},\dots,x_{k})=\prod_{i=1}^{k}\phi_{i}(x_{1},\dots,x_{i-1},x_{i+1},\dots,x_{k}) \]

for some $\phi_{i}\colon X_{1}\times\cdots\times X_{i-1}\times X_{i+1}\times\cdots\times X_{k}\to\{0,1\}$, $i=1,2,\dots,k$. In other words, a cylinder intersection is the product of $k$ Boolean functions, where the $i$-th function does not depend on the $i$-th coordinate. For a probability distribution $\mu$ on the domain of $F$, the discrepancy of $F$ with respect to $\mu$ is denoted $\operatorname{disc}_{\mu}(F)$ and defined as

\[ \operatorname{disc}_{\mu}(F)=\max_{\chi}\left|\operatorname*{\mathbf{E}}_{(x_{1},\ldots,x_{k})\sim\mu}(-1)^{F(x_{1},\ldots,x_{k})}\chi(x_{1},\dots,x_{k})\right|, \]

where the maximum is taken over all cylinder intersections $\chi$. This notion of discrepancy was defined by Babai, Nisan, and Szegedy [6] and is unrelated to the one that we encountered in Section 3.2. It is of interest to us because of the following theorem [23, Theorem 4.1], which gives a lower bound on nondeterministic and Merlin–Arthur communication complexity in terms of discrepancy.

Theorem 5.10 (Gavinsky and Sherstov).

Let $F\colon X\to\{0,1\}$ be a given $k$-party communication problem, where $X=X_{1}\times\cdots\times X_{k}$. Fix a function $H\colon X\to\{0,1\}$ and a probability distribution $\Pi$ on $X$. Put

\begin{align*}
\alpha&=\Pi(F^{-1}(1)\cap H^{-1}(1)),\\
\beta&=\Pi(F^{-1}(1)\cap H^{-1}(0)),\\
Q&=\log\frac{\alpha}{\beta+\operatorname{disc}_{\Pi}(H)}.
\end{align*}

Then

\begin{align*}
N(F)&\geqslant Q, \tag{5.32}\\
\text{\it MA}_{1/3}(F)^{2}&\geqslant\min\left\{\Omega(Q),\;\Omega\left(\frac{Q}{\log\{2/\alpha\}}\right)^{2}\right\} \tag{5.33}\\
&\geqslant\Omega(Q)-\left(\log\frac{2}{\alpha}\right)^{2}. \tag{5.34}
\end{align*}

We note that the original statement in [23] is for functions with range $\{-1,+1\}$. The above version for $\{0,1\}$ follows immediately because the output values of a communication protocol serve as textual labels that can be changed at will. Equation (5.34), which is also not part of the statement in [23], follows from (5.33) in view of the inequality $(q/a)^{2}\geqslant 2q-a^{2}$ for all reals $q,a$ with $a\neq 0$. (Start with $(q/a-a)^{2}\geqslant 0$ and multiply out the left-hand side.)
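The elementary inequality behind (5.34) can be spot-checked in exact arithmetic. The Python sketch below is an illustration rather than a proof: it evaluates the gap $(q/a)^{2}-(2q-a^{2})$ on a small grid of rational values and compares it with the square $(q/a-a)^{2}$ named in the text. The function name `gap` and the grid are choices made for this example.

```python
from fractions import Fraction


def gap(q, a):
    """(q/a)^2 - (2q - a^2); algebraically equal to (q/a - a)^2."""
    return (q / a) ** 2 - (2 * q - a ** 2)


# A small grid of rational test values with a != 0.
qs = [Fraction(n, 3) for n in range(-9, 10)]
nonzero_as = [Fraction(n, 4) for n in range(-8, 9) if n != 0]

matches_square = all(
    gap(q, a) == (q / a - a) ** 2 for q in qs for a in nonzero_as
)
always_nonnegative = all(gap(q, a) >= 0 for q in qs for a in nonzero_as)
```

Equality holds exactly when $q=a^{2}$, which is why the bound (5.34) loses little when $Q$ is comparable to $(\log(2/\alpha))^{2}$.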

We will need yet another notion of discrepancy, introduced in [43] and called “repeated discrepancy.” Let $G$ be a $k$-party communication problem on $X=X_{1}\times X_{2}\times\cdots\times X_{k}$. A probability distribution $\pi$ on the domain of $G$ is called balanced if $\pi(G^{-1}(0))=\pi(G^{-1}(1))=1/2$. For such $\pi$, the repeated discrepancy of $G$ with respect to $\pi$ is given by

\[ \operatorname{rdisc}_{\pi}(G)\,=\,\sup_{d,r\in\mathbb{Z}^{+}}\;\,\max_{\chi}\left|\operatorname*{\mathbf{E}}_{\ldots,x_{i,j},\ldots}\left[\chi(\dots,x_{i,j},\ldots)\prod_{i=1}^{d}(-1)^{G(x_{i,1})}\right]\right|^{1/d}, \]

where the maximum is over $k$-dimensional cylinder intersections $\chi$ on $X^{dr}=X_{1}^{dr}\times X_{2}^{dr}\times\cdots\times X_{k}^{dr}$, and the arguments $x_{i,j}$ $(i=1,2,\dots,d;\;j=1,2,\dots,r)$ are chosen independently according to $\pi$ conditioned on $G(x_{i,1})=G(x_{i,2})=\cdots=G(x_{i,r})$ for each $i$. The repeated discrepancy of a communication problem is much harder to bound from above than standard discrepancy. The following result from [43, Theorem 4.27] bounds the repeated discrepancy of set disjointness.

Theorem 5.11 (Sherstov).

Let $m$ and $k$ be positive integers. Then there is a balanced probability distribution $\pi$ on the domain of $\text{\rm DISJ}_{m,k}$ such that

\[ \operatorname{rdisc}_{\pi}(\text{\rm DISJ}_{m,k})\leqslant\left(\frac{ck2^{k}}{\sqrt{m}}\right)^{1/2}, \]

where $c>0$ is an absolute constant independent of $m,k,\pi$.

It was shown in [43] that repeated discrepancy gives a highly efficient way to transform multiparty communication protocols into polynomials. For a nonnegative integer $d$ and a function $f$ on a finite subset of Euclidean space, define

\[ E(f,d)=\min_{p}\{\|f-p\|_{\infty}:\deg p\leqslant d\}, \]

where the minimum is taken over polynomials of degree at most $d$. In other words, $E(f,d)$ stands for the minimum error in an $\ell_{\infty}$-norm approximation of $f$ by a polynomial of degree at most $d$. The following result was proved in [43, Theorem 4.2].

Theorem 5.12 (Sherstov).

Let $G\colon X\to\{0,1\}$ be a $k$-party communication problem, where $X=X_{1}\times X_{2}\times\cdots\times X_{k}$. For an integer $n\geqslant 1$ and a balanced probability distribution $\pi$ on the domain of $G$, consider the linear operator $L_{\pi,n}\colon\mathbb{R}^{X^{n}}\to\mathbb{R}^{\{0,1\}^{n}}$ given by

\begin{align*}
(L_{\pi,n}\chi)(z)=\operatorname*{\mathbf{E}}_{x_{1}\sim\pi_{z_{1}}}\cdots\operatorname*{\mathbf{E}}_{x_{n}\sim\pi_{z_{n}}}\chi(x_{1},\dots,x_{n}), && z\in\{0,1\}^{n}, \tag{5.35}
\end{align*}

where $\pi_{0}$ and $\pi_{1}$ are the probability distributions induced by $\pi$ on $G^{-1}(0)$ and $G^{-1}(1)$, respectively. Then for some absolute constant $c>0$ and every $k$-dimensional cylinder intersection $\chi$ on $X^{n}=X_{1}^{n}\times X_{2}^{n}\times\cdots\times X_{k}^{n}$,

\begin{align*}
E(L_{\pi,n}\chi,d-1)\leqslant(c\operatorname{rdisc}_{\pi}(G))^{d}, && d=1,2,\dots,n.
\end{align*}

We are now in a position to derive the promised lower bound on nondeterministic and Merlin–Arthur communication complexity in terms of one-sided approximate degree. Our proof combines Theorems 5.10–5.12 in a way closely analogous to the proof of [43, Theorem 6.9].

Theorem 5.13.

Let $f\colon\{0,1\}^{n}\to\{0,1\}$ be given. Let $m$ and $k$ be positive integers, and put $F=f\circ\text{\rm DISJ}_{m,k}$. Then for all $\epsilon\in(0,1/2]$,

\begin{align*}
N(F)&\geqslant\frac{\deg^{+}_{\epsilon}(f)}{2}\log\left(\frac{\sqrt{m}}{Ck2^{k}}\right)-\log\frac{1}{\epsilon}, \tag{5.36}\\
\text{\it MA}_{1/3}(F)^{2}&\geqslant\frac{\deg^{+}_{\epsilon}(f)}{C}\log\left(\frac{\sqrt{m}}{Ck2^{k}}\right)-\left(\log\frac{2}{\epsilon}\right)^{2}, \tag{5.37}
\end{align*}

where $C\geqslant 1$ is an absolute constant, independent of $f,n,m,k,\epsilon$.

Proof.

Abbreviate $d=\deg^{+}_{\epsilon}(f)$. Let $X=(\{0,1\}^{m})^{k}$ denote the domain of $\text{\rm DISJ}_{m,k}$. By Theorem 5.11, there is a probability distribution $\pi$ on $X$ such that

\begin{align*}
&\pi(\text{\rm DISJ}_{m,k}^{-1}(0))=\pi(\text{\rm DISJ}_{m,k}^{-1}(1))=\frac{1}{2}, \tag{5.38}\\
&\operatorname{rdisc}_{\pi}(\text{\rm DISJ}_{m,k})\leqslant\left(\frac{c^{\prime}k2^{k}}{\sqrt{m}}\right)^{1/2}, \tag{5.39}
\end{align*}

where $c^{\prime}>0$ is an absolute constant independent of $m$ and $k$. By the dual characterization of one-sided approximate degree (Fact 2.9), there exists a function $\psi\colon\{0,1\}^{n}\to\mathbb{R}$ such that

\begin{align*}
&\langle f,\psi\rangle>\epsilon, \tag{5.40}\\
&\|\psi\|_{1}=1, \tag{5.41}\\
&\operatorname{orth}\psi\geqslant d, \tag{5.42}\\
&\psi\geqslant 0\;\text{ on }f^{-1}(1). \tag{5.43}
\end{align*}

Define $\Psi\colon X^{n}\to\mathbb{R}$ by

\[ \Psi(x)=2^{n}\psi(\text{\rm DISJ}_{m,k}(x_{1}),\ldots,\text{\rm DISJ}_{m,k}(x_{n}))\prod_{i=1}^{n}\pi(x_{i}). \tag{5.44} \]

Claim 5.14.

$\Psi$ satisfies

\begin{align*}
&\langle F,\Psi\rangle>\epsilon, \tag{5.45}\\
&\|\Psi\|_{1}=1, \tag{5.46}\\
&\Psi\geqslant 0\quad\text{ on }F^{-1}(1). \tag{5.47}
\end{align*}

We will carry on with the theorem proof and settle the claims later. Equation (5.46) allows us to write

\[ \Psi(x)=\Pi(x)\cdot(-1)^{1-H(x)} \tag{5.48} \]

for some Boolean function $H\colon X^{n}\to\{0,1\}$ and a probability distribution $\Pi$ on $X^{n}$. Indeed, one can explicitly define $\Pi(x)=|\Psi(x)|$ and $H(x)=\mathbf{I}[\Psi(x)\geqslant 0]$.
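The decomposition (5.48) is easy to reproduce on a toy example. In the Python sketch below, the function `decompose` and the four-point table `toy_Psi` are hypothetical stand-ins introduced for illustration only. The table has $\ell_{1}$ norm $1$, and the code recovers $\Pi=|\Psi|$ and $H=\mathbf{I}[\Psi\geqslant 0]$ as in the text.

```python
from fractions import Fraction


def decompose(Psi):
    """Split a table Psi (point -> value) into (Pi, H) with Psi = Pi * (-1)^(1 - H)."""
    Pi = {x: abs(v) for x, v in Psi.items()}
    H = {x: int(v >= 0) for x, v in Psi.items()}
    return Pi, H


# A hypothetical real-valued function with l1 norm equal to 1.
toy_Psi = {
    "a": Fraction(1, 2),
    "b": Fraction(-1, 4),
    "c": Fraction(0),
    "d": Fraction(1, 4),
}
Pi, H = decompose(toy_Psi)

is_distribution = sum(Pi.values()) == 1 and all(v >= 0 for v in Pi.values())
reconstructs = all(
    toy_Psi[x] == Pi[x] * (-1) ** (1 - H[x]) for x in toy_Psi
)
```

Points where the table vanishes are assigned $H=1$ by the indicator, which is harmless because they carry zero probability under $\Pi$.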

Claim 5.15.

One has

\begin{align*}
\Pi(F^{-1}(1)\cap H^{-1}(0))&=0, \tag{5.49}\\
\Pi(F^{-1}(1)\cap H^{-1}(1))&>\epsilon. \tag{5.50}
\end{align*}

Claim 5.16.

There is an absolute constant $c>0$ such that

\[ \operatorname{disc}_{\Pi}(H)\leqslant\left(\frac{ck2^{k}}{\sqrt{m}}\right)^{d/2}. \]

The sought communication bounds (5.36) and (5.37) follow from Theorem 5.10 in view of Claims 5.15 and 5.16. ∎

We now settle the claims used in the proof of Theorem 5.13.

Proof of Claim 5.14.

We have

\begin{align*}
\langle F,\Psi\rangle&=2^{n}\operatorname*{\mathbf{E}}_{x_{1},\ldots,x_{n}\sim\pi}f(\ldots,\text{\rm DISJ}_{m,k}(x_{i}),\ldots)\psi(\ldots,\text{\rm DISJ}_{m,k}(x_{i}),\ldots)\\
&=2^{n}\operatorname*{\mathbf{E}}_{z\in\{0,1\}^{n}}f(z)\psi(z)\\
&=\langle f,\psi\rangle\\
&>\epsilon,
\end{align*}

where the second step uses (5.38), and the last step is legitimate by (5.40). Analogously,

\begin{align*}
\|\Psi\|_{1}&=2^{n}\operatorname*{\mathbf{E}}_{x_{1},\ldots,x_{n}\sim\pi}|\psi(\text{\rm DISJ}_{m,k}(x_{1}),\ldots,\text{\rm DISJ}_{m,k}(x_{n}))|\\
&=2^{n}\operatorname*{\mathbf{E}}_{z\in\{0,1\}^{n}}|\psi(z)|\\
&=1,
\end{align*}

where the last two steps are valid by (5.38) and (5.41), respectively. The final property (5.47) can be seen from the following chain of implications:

\begin{align*}
x\in F^{-1}(1)&\Rightarrow(\text{\rm DISJ}_{m,k}(x_{1}),\ldots,\text{\rm DISJ}_{m,k}(x_{n}))\in f^{-1}(1)\\
&\Rightarrow\psi(\text{\rm DISJ}_{m,k}(x_{1}),\ldots,\text{\rm DISJ}_{m,k}(x_{n}))\geqslant 0\\
&\Rightarrow\Psi(x)\geqslant 0,
\end{align*}

where the first and third steps use the definitions of $F$ and $\Psi$, respectively, and the second step is valid by (5.43). ∎
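The three properties in Claim 5.14 can be verified by brute force on a very small instance. The Python sketch below takes $m=1$, $k=2$, and $n=2$, with $f=\text{\rm AND}_{2}$ and $\psi(z)=(-1)^{z_{1}+z_{2}}/4$. These are illustrative assumptions: in particular, `pi` is a hand-picked balanced distribution on the four inputs of $\text{\rm DISJ}_{1,2}$ and is not the distribution supplied by Theorem 5.11. The code builds $\Psi$ as in (5.44) and compares $\langle F,\Psi\rangle$ and $\|\Psi\|_{1}$ with their counterparts for $\psi$.

```python
from fractions import Fraction
from itertools import product


def G(block):
    """DISJ_{1,2}: equals 1 unless both single-bit inputs are 1."""
    a, b = block
    return int(not (a and b))


X = list(product([0, 1], repeat=2))
# Hypothetical balanced distribution: mass 1/2 on G^{-1}(0), 1/2 on G^{-1}(1).
pi = {x: (Fraction(1, 2) if G(x) == 0 else Fraction(1, 6)) for x in X}

n = 2


def f(z):
    return z[0] & z[1]


# Dual object: l1 norm 1, nonnegative on f^{-1}(1), orthogonal to degree < 2.
psi = {z: Fraction((-1) ** sum(z), 4) for z in product([0, 1], repeat=n)}


def lift(x):
    """Psi(x) = 2^n * psi(G(x_1), ..., G(x_n)) * prod_i pi(x_i), as in (5.44)."""
    z = tuple(G(block) for block in x)
    weight = Fraction(1)
    for block in x:
        weight *= pi[block]
    return 2 ** n * psi[z] * weight


points = list(product(X, repeat=n))
Psi = {x: lift(x) for x in points}
F = {x: f(tuple(G(block) for block in x)) for x in points}

inner_F_Psi = sum(F[x] * Psi[x] for x in points)
inner_f_psi = sum(f(z) * psi[z] for z in psi)
l1_Psi = sum(abs(v) for v in Psi.values())
sign_ok = all(Psi[x] >= 0 for x in points if F[x] == 1)
```

The factor $2^{n}$ compensates for the fact that, under a balanced $\pi$, each value $z\in\{0,1\}^{n}$ of the inner functions arises with probability exactly $2^{-n}$.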

Proof of Claim 5.15.

Fix any point $x\in F^{-1}(1)\cap H^{-1}(0)$. Then (5.47) implies that $\Psi(x)\geqslant 0$, or equivalently $\Pi(x)\cdot(-1)^{1-H(x)}\geqslant 0$. This forces $\Pi(x)\leqslant 0$ due to $H(x)=0$. Since $\Pi$ is a probability distribution, we conclude that $\Pi(x)=0$. The proof of (5.49) is complete. The remaining relation (5.50) can be seen as follows:

\begin{align*}
\Pi(F^{-1}(1)\cap H^{-1}(1))&=\Pi(F^{-1}(1)\cap H^{-1}(1))-\Pi(F^{-1}(1)\cap H^{-1}(0))\\
&=\sum_{X^{n}}\Pi(x)F(x)(-1)^{1-H(x)}\\
&=\langle\Psi,F\rangle\\
&>\epsilon,
\end{align*}

where the first step exploits (5.49), and the last step applies (5.45). ∎

Proof of Claim 5.16.

Let $\pi_{0}$ and $\pi_{1}$ be the probability distributions induced by $\pi$ on $\text{\rm DISJ}_{m,k}^{-1}(0)$ and $\text{\rm DISJ}_{m,k}^{-1}(1)$, respectively, and let $L_{\pi,n}\colon\mathbb{R}^{X^{n}}\to\mathbb{R}^{\{0,1\}^{n}}$ be the linear operator given by (5.35). Then for any cylinder intersection $\chi\colon X^{n}\to\{0,1\}$, we have

\begin{align*}
\left|\operatorname*{\mathbf{E}}_{x\sim\Pi}(-1)^{H(x)}\chi(x)\right|&=\left|\sum_{X^{n}}\Pi(x)(-1)^{H(x)}\chi(x)\right|\\
&=\left|\sum_{X^{n}}\Psi(x)\chi(x)\right|\\
&=\left|2^{n}\operatorname*{\mathbf{E}}_{x_{1},\ldots,x_{n}\sim\pi}\psi(\ldots,\text{\rm DISJ}_{m,k}(x_{i}),\ldots)\chi(x)\right|\\
&=\left|\sum_{z\in\{0,1\}^{n}}\psi(z)\operatorname*{\mathbf{E}}_{x_{1}\sim\pi_{z_{1}}}\ldots\operatorname*{\mathbf{E}}_{x_{n}\sim\pi_{z_{n}}}\chi(x)\right|\\
&=|\langle\psi,L_{\pi,n}\chi\rangle|, \tag{5.51}
\end{align*}

where the second step uses (5.48), the third step invokes the definition (5.44), the fourth step is justified by (5.38), and the last step is valid by the definition of $L_{\pi,n}$.

For every polynomial $p\colon\{0,1\}^{n}\to\mathbb{R}$ of degree less than $d$, we have

\begin{align*}
|\langle\psi,L_{\pi,n}\chi\rangle|&=|\langle\psi,L_{\pi,n}\chi-p\rangle+\langle\psi,p\rangle|\\
&=|\langle\psi,L_{\pi,n}\chi-p\rangle|\\
&\leqslant\|\psi\|_{1}\,\|L_{\pi,n}\chi-p\|_{\infty}\\
&=\|L_{\pi,n}\chi-p\|_{\infty}, \tag{5.52}
\end{align*}

where the second step uses (5.42), the third step applies Hölder’s inequality, and the fourth step substitutes (5.41). Taking the infimum in (5.52) over all polynomials $p$ of degree less than $d$, we arrive at

\[ |\langle\psi,L_{\pi,n}\chi\rangle|\leqslant E(L_{\pi,n}\chi,d-1). \tag{5.53} \]

Now

\begin{align*}
\operatorname{disc}_{\Pi}(H)&=\max_{\chi}\left|\operatorname*{\mathbf{E}}_{x\sim\Pi}(-1)^{H(x)}\chi(x)\right|\\
&\leqslant\max_{\chi}E(L_{\pi,n}\chi,d-1)\\
&\leqslant(c^{\prime\prime}\operatorname{rdisc}_{\pi}(\text{\rm DISJ}_{m,k}))^{d}\\
&\leqslant\left(c^{\prime\prime}\left(\frac{c^{\prime}k2^{k}}{\sqrt{m}}\right)^{1/2}\right)^{d},
\end{align*}

where the first step maximizes over all cylinder intersections $\chi$, the second step combines (5.51) and (5.53), the third step is valid for some absolute constant $c^{\prime\prime}>0$ by Theorem 5.12, and the fourth step holds by (5.39). ∎

This completes the proof of Theorem 5.13. By combining it with our main result on one-sided approximate degree, we now obtain our sought lower bounds for nondeterministic and Merlin–Arthur multiparty communication.

Theorem 5.17.

Let $\delta>0$ be arbitrary. Then for all integers $n,k\geqslant 2$, there is an (explicitly given) $k$-party communication problem $F_{n,k}\colon(\{0,1\}^{n})^{k}\to\{0,1\}$ with

\begin{align*}
N(F_{n,k}) &\leqslant c\log n, \tag{5.54}\\
N(\neg F_{n,k}) &\geqslant\left(\frac{n}{c4^{k}k^{2}}\right)^{1-\delta}, \tag{5.55}\\
R_{1/3}(\neg F_{n,k}) &\geqslant\left(\frac{n}{c4^{k}k^{2}}\right)^{1-\delta}, \tag{5.56}\\
\text{\it MA}_{1/3}(\neg F_{n,k}) &\geqslant\left(\frac{n}{c4^{k}k^{2}}\right)^{\frac{1-\delta}{2}}, \tag{5.57}
\end{align*}

where $c\geqslant 1$ is a constant independent of $n$ and $k$. Moreover, each $F_{n,k}$ is computable by a monotone DNF formula of width $ck$ and size $n^{c}$.

Proof.

Theorem 5.2 gives a constant $c'\geqslant 1$ and an explicit family $\{f_{n}\}_{n=1}^{\infty}$ of functions $f_{n}\colon\{0,1\}^{n}\to\{0,1\}$ such that each $f_{n}$ is computable by a monotone DNF formula of width $c'$ and satisfies

\[
\deg_{3/8}^{+}(\neg f_{n})\geqslant\frac{1}{c'}\cdot n^{1-\delta},\qquad n=1,2,3,\ldots. \tag{5.58}
\]

In particular,

\[
\deg_{3/8}(\neg f_{n})\geqslant\frac{1}{c'}\cdot n^{1-\delta},\qquad n=1,2,3,\ldots. \tag{5.59}
\]

Let $C\geqslant 1$ be the maximum of the absolute constants from Theorems 5.7 and 5.13. For arbitrary integers $n,k\geqslant 2$, define

\[
F_{n,k}=\begin{cases}\text{\rm AND}_{k}&\text{if }n<\lceil C2^{k+1}k\rceil^{2},\\ f_{\lfloor n/\lceil C2^{k+1}k\rceil^{2}\rfloor}\circ\neg\text{\rm DISJ}_{\lceil C2^{k+1}k\rceil^{2},k}&\text{otherwise.}\end{cases}
\]

We first analyze the cost of representing $F_{n,k}$ as a DNF formula. If $n<\lceil C2^{k+1}k\rceil^{2}$, then by definition $F_{n,k}$ is a monotone DNF formula of width $k$ and size $1$. In the complementary case, $f_{\lfloor n/\lceil C2^{k+1}k\rceil^{2}\rfloor}$ is by construction a monotone DNF formula of width $c'$ and hence of size at most $n^{c'}$, whereas $\neg\text{\rm DISJ}_{\lceil C2^{k+1}k\rceil^{2},k}$ is by definition a monotone DNF formula of width $k$ and size at most $\lceil C2^{k+1}k\rceil^{2}\leqslant n$. As a result, the composed function $F_{n,k}$ is a monotone DNF formula of width $c'k$ and size at most $n^{c'}\cdot n^{c'}=n^{2c'}$. In particular, the claim in the theorem statement regarding the width and size of $F_{n,k}$ as a monotone DNF formula is valid for any large enough $c$. This in turn implies the upper bound in (5.54): consider the nondeterministic protocol in which the parties “guess” one of the terms of the DNF formula for $F_{n,k}$ (for a cost of $\lceil\log n^{2c'}\rceil$ bits), evaluate it (using another $2$ bits of communication), and output the result.
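The width and size accounting in this paragraph can be illustrated on a toy instance. The following is a minimal sketch, not part of the proof: it composes a small monotone DNF with copies of $\neg\text{\rm DISJ}_{m,k}$ by distributing AND over OR. The outer formula, the values $m=3$ and $k=2$, and all function names are illustrative assumptions. The composed width is the product of the two widths, and the number of terms is at most the number of outer terms times $m$ raised to the outer width.

```python
from itertools import product

def neg_disj_terms(block, m, k):
    """Terms of the monotone DNF for NOT DISJ_{m,k} on one block of inputs.

    Variable (block, i, j) stands for bit j of party i's set in this block.
    The function is true iff some coordinate j is set for all k parties,
    so there are m terms, each of width k.
    """
    return [frozenset((block, i, j) for i in range(k)) for j in range(m)]

def compose(outer_terms, m, k):
    """Expand an outer monotone DNF (terms are sets of block indices)
    composed with NOT DISJ_{m,k} into a single monotone DNF."""
    composed = set()
    for term in outer_terms:
        inner = [neg_disj_terms(b, m, k) for b in sorted(term)]
        # Distribute AND over OR: pick one inner term per block of the outer term.
        for pick in product(*inner):
            composed.add(frozenset().union(*pick))
    return composed

# Hypothetical toy outer formula of width 2: (y0 AND y1) OR (y1 AND y2).
outer = [frozenset({0, 1}), frozenset({1, 2})]
m, k = 3, 2
terms = compose(outer, m, k)
width = max(len(t) for t in terms)
print(len(terms), width)
```

In this toy case the outer width is 2 and the inner width is 2, so every composed term has width 4, and the term count stays within the product bound used in the text.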

We now turn to the communication lower bounds. Since $F_{n,k}$ is nonconstant, we have the trivial bounds

\begin{align*}
N(\neg F_{n,k}) &\geqslant 1, \tag{5.60}\\
R_{1/3}(\neg F_{n,k}) &\geqslant 1, \tag{5.61}\\
\text{\it MA}_{1/3}(\neg F_{n,k}) &\geqslant 1. \tag{5.62}
\end{align*}

We further claim that

\begin{align*}
N(\neg F_{n,k}) &\geqslant\frac{1}{2c'}\cdot\left\lfloor\frac{n}{\lceil C2^{k+1}k\rceil^{2}}\right\rfloor^{1-\delta}-\log\frac{8}{3}, \tag{5.63}\\
R_{1/3}(\neg F_{n,k}) &\geqslant\frac{1}{2c'}\cdot\left\lfloor\frac{n}{\lceil C2^{k+1}k\rceil^{2}}\right\rfloor^{1-\delta}-\log 12, \tag{5.64}\\
\text{\it MA}_{1/3}(\neg F_{n,k})^{2} &\geqslant\frac{1}{Cc'}\cdot\left\lfloor\frac{n}{\lceil C2^{k+1}k\rceil^{2}}\right\rfloor^{1-\delta}-\left(\log\frac{16}{3}\right)^{2}. \tag{5.65}
\end{align*}

For $n<\lceil C2^{k+1}k\rceil^{2}$, these claims are trivial since communication complexity is nonnegative. In the complementary case $n\geqslant\lceil C2^{k+1}k\rceil^{2}$, consider the family $\{g_{n}\}_{n=1}^{\infty}$ of functions $g_{n}\colon\{0,1\}^{n}\to\{0,1\}$ given by $g_{n}(x_{1},x_{2},\ldots,x_{n})=\neg f_{n}(\neg x_{1},\neg x_{2},\ldots,\neg x_{n})$. For each $n$, it is clear that $g_{n}$ and $\neg f_{n}$ have the same one-sided approximate degree. Since $\neg F_{n,k}=g_{\lfloor n/\lceil C2^{k+1}k\rceil^{2}\rfloor}\circ\text{\rm DISJ}_{\lceil C2^{k+1}k\rceil^{2},k}$, one now obtains (5.63) and (5.65) directly from (5.58) and Theorem 5.13. Analogously, $g_{n}$ and $\neg f_{n}$ have the same two-sided approximate degree for each $n$, and one obtains (5.64) from (5.59) and Theorem 5.7.

For a large enough constant $c\geqslant 1$, the communication lower bounds (5.55)–(5.57) follow from (5.63)–(5.65) for $n\geqslant c4^{k}k^{2}$, and from (5.60)–(5.62) for $n<c4^{k}k^{2}$. ∎

Theorem 5.17 settles Theorem 1.7 from the introduction.

Acknowledgments

The author is thankful to Justin Thaler and Mark Bun for useful comments on an earlier version of this paper.

References

  • [1] S. Aaronson and Y. Shi, Quantum lower bounds for the collision and the element distinctness problems, J. ACM, 51 (2004), pp. 595–605, doi:10.1145/1008731.1008735.
  • [2] M. Ajtai, H. Iwaniec, J. Komlós, J. Pintz, and E. Szemerédi, Construction of a thin set with small Fourier coefficients, Bulletin of the London Mathematical Society, 22 (1990), pp. 583–590, doi:10.1112/blms/22.6.583.
  • [3] L. Babai, Trading group theory for randomness, in Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing (STOC), 1985, pp. 421–429, doi:10.1145/22145.22192.
  • [4] L. Babai, P. Frankl, and J. Simon, Complexity classes in communication complexity theory, in Proceedings of the Twenty-Seventh Annual IEEE Symposium on Foundations of Computer Science (FOCS), 1986, pp. 337–347, doi:10.1109/SFCS.1986.15.
  • [5] L. Babai and S. Moran, Arthur-Merlin games: A randomized proof system, and a hierarchy of complexity classes, J. Comput. Syst. Sci., 36 (1988), pp. 254–276, doi:10.1016/0022-0000(88)90028-1.
  • [6] L. Babai, N. Nisan, and M. Szegedy, Multiparty protocols, pseudorandom generators for logspace, and time-space trade-offs, J. Comput. Syst. Sci., 45 (1992), pp. 204–232, doi:10.1016/0022-0000(92)90047-M.
  • [7] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar, An information statistics approach to data stream and communication complexity, J. Comput. Syst. Sci., 68 (2004), pp. 702–732, doi:10.1016/j.jcss.2003.11.006.
  • [8] R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf, Quantum lower bounds by polynomials, J. ACM, 48 (2001), pp. 778–797, doi:10.1145/502090.502097.
  • [9] P. Beame, M. David, T. Pitassi, and P. Woelfel, Separating deterministic from nondeterministic NOF multiparty communication complexity, in Proceedings of the Thirty-Fourth International Colloquium on Automata, Languages and Programming (ICALP), 2007, pp. 134–145, doi:10.1007/978-3-540-73420-8_14.
  • [10] P. Beame, M. David, T. Pitassi, and P. Woelfel, Separating deterministic from randomized multiparty communication complexity, Theory of Computing, 6 (2010), pp. 201–225, doi:10.4086/toc.2010.v006a009.
  • [11] P. Beame and T. Huynh, Multiparty communication complexity and threshold circuit size of $\mathsf{AC}^{0}$, SIAM J. Comput., 41 (2012), pp. 484–518, doi:10.1137/100792779.
  • [12] P. Beame, T. Pitassi, N. Segerlind, and A. Wigderson, A strong direct product theorem for corruption and the multiparty communication complexity of disjointness, Computational Complexity, 15 (2006), pp. 391–432, doi:10.1007/s00037-007-0220-2.
  • [13] H. Buhrman and R. de Wolf, Communication complexity lower bounds by polynomials, in Proceedings of the Sixteenth Annual IEEE Conference on Computational Complexity (CCC), 2001, pp. 120–130, doi:10.1109/CCC.2001.933879.
  • [14] M. Bun, R. Kothari, and J. Thaler, The polynomial method strikes back: Tight quantum query bounds via dual polynomials, Theory Comput., 16 (2020), pp. 1–71, doi:10.4086/toc.2020.v016a010.
  • [15] M. Bun and J. Thaler, Dual lower bounds for approximate degree and Markov–Bernstein inequalities, Inf. Comput., 243 (2015), pp. 2–25, doi:10.1016/j.ic.2014.12.003.
  • [16] M. Bun and J. Thaler, Hardness amplification and the approximate degree of constant-depth circuits, in Proceedings of the Forty-Second International Colloquium on Automata, Languages and Programming (ICALP), 2015, pp. 268–280, doi:10.1007/978-3-662-47672-7_22.
  • [17] M. Bun and J. Thaler, A nearly optimal lower bound on the approximate degree of $\mathsf{AC}^{0}$, SIAM J. Comput., 49 (2020), doi:10.1137/17M1161737.
  • [18] M. Bun and J. Thaler, The large-error approximate degree of $\mathsf{AC}^{0}$, Theory of Computing, 17 (2021), pp. 1–46, doi:10.4086/toc.2021.v017a007.
  • [19] A. K. Chandra, M. L. Furst, and R. J. Lipton, Multi-party protocols, in Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing (STOC), 1983, pp. 94–99, doi:10.1145/800061.808737.
  • [20] A. Chattopadhyay and A. Ada, Multiparty communication complexity of disjointness, in Electronic Colloquium on Computational Complexity (ECCC), January 2008. Report TR08-002.
  • [21] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math. Statist., 23 (1952), pp. 493–507.
  • [22] M. David, T. Pitassi, and E. Viola, Improved separations between nondeterministic and randomized multiparty communication, ACM Transactions on Computation Theory (TOCT), 1 (2009), doi:10.1145/1595391.1595392.
  • [23] D. Gavinsky and A. A. Sherstov, A separation of $\mathsf{NP}$ and $\mathsf{coNP}$ in multiparty communication complexity, Theory of Computing, 6 (2010), pp. 227–245, doi:10.4086/toc.2010.v006a010.
  • [24] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, 58 (1963), pp. 13–30, doi:10.1080/01621459.1963.10500830.
  • [25] B. Kalyanasundaram and G. Schnitger, The probabilistic communication complexity of set intersection, SIAM J. Discrete Math., 5 (1992), pp. 545–557, doi:10.1137/0405044.
  • [26] T. Lee, A note on the sign degree of formulas, 2009. Available at http://arxiv.org/abs/0909.4607.
  • [27] T. Lee, G. Schechtman, and A. Shraibman, Lower bounds on quantum multiparty communication complexity, in Proceedings of the Twenty-Fourth Annual IEEE Conference on Computational Complexity (CCC), 2009, pp. 254–262, doi:10.1109/CCC.2009.24.
  • [28] T. Lee and A. Shraibman, Disjointness is hard in the multiparty number-on-the-forehead model, Computational Complexity, 18 (2009), pp. 309–336, doi:10.1007/s00037-009-0276-2.
  • [29] N. S. Mande, J. Thaler, and S. Zhu, Improved approximate degree bounds for $k$-distinctness, in Proceedings of the 15th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC), vol. 158, 2020, pp. 2:1–2:22, doi:10.4230/LIPIcs.TQC.2020.2.
  • [30] M. L. Minsky and S. A. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, Cambridge, Mass., 1969.
  • [31] N. Nisan and M. Szegedy, On the degree of Boolean functions as real polynomials, Computational Complexity, 4 (1994), pp. 301–313, doi:10.1007/BF01263419.
  • [32] A. A. Razborov, On the distributional complexity of disjointness, Theor. Comput. Sci., 106 (1992), pp. 385–390, doi:10.1016/0304-3975(92)90260-M.
  • [33] A. A. Razborov, Quantum communication complexity of symmetric predicates, Izvestiya of the Russian Academy of Sciences, Mathematics, 67 (2002), pp. 145–159.
  • [34] A. A. Razborov and A. A. Sherstov, The sign-rank of $\mathsf{AC}^{0}$, SIAM J. Comput., 39 (2010), pp. 1833–1855, doi:10.1137/080744037.
  • [35] B. Rosser, Explicit bounds for some functions of prime numbers, American Journal of Mathematics, 63 (1941), pp. 211–232.
  • [36] A. A. Sherstov, Communication lower bounds using dual polynomials, Bulletin of the EATCS, 95 (2008), pp. 59–93.
  • [37] A. A. Sherstov, Separating $\mathsf{AC}^{0}$ from depth-$2$ majority circuits, SIAM J. Comput., 38 (2009), pp. 2113–2129, doi:10.1137/08071421X.
  • [38] A. A. Sherstov, The pattern matrix method, SIAM J. Comput., 40 (2011), pp. 1969–2000, doi:10.1137/080733644.
  • [39] A. A. Sherstov, Strong direct product theorems for quantum communication and query complexity, SIAM J. Comput., 41 (2012), pp. 1122–1165, doi:10.1137/110842661.
  • [40] A. A. Sherstov, Approximating the AND-OR tree, Theory of Computing, 9 (2013), pp. 653–663, doi:10.4086/toc.2013.v009a020.
  • [41] A. A. Sherstov, The intersection of two halfspaces has high threshold degree, SIAM J. Comput., 42 (2013), pp. 2329–2374, doi:10.1137/100785260.
  • [42] A. A. Sherstov, Making polynomials robust to noise, Theory of Computing, 9 (2013), pp. 593–615, doi:10.4086/toc.2013.v009a018.
  • [43] A. A. Sherstov, Communication lower bounds using directional derivatives, J. ACM, 61 (2014), pp. 1–71, doi:10.1145/2629334.
  • [44] A. A. Sherstov, The multiparty communication complexity of set disjointness, SIAM J. Comput., 45 (2016), pp. 1450–1489, doi:10.1137/120891587.
  • [45] A. A. Sherstov, Breaking the Minsky–Papert barrier for constant-depth circuits, SIAM J. Comput., 47 (2018), pp. 1809–1857, doi:10.1137/15M1015704.
  • [46] A. A. Sherstov, The power of asymmetry in constant-depth circuits, SIAM J. Comput., 47 (2018), pp. 2362–2434, doi:10.1137/16M1064477.
  • [47] A. A. Sherstov, Algorithmic polynomials, SIAM J. Comput., 49 (2020), pp. 1173–1231, doi:10.1137/19M1278831.
  • [48] A. A. Sherstov, The hardest halfspace, Comput. Complex., 30 (2021), pp. 1–85, doi:10.1007/s00037-021-00211-4.
  • [49] A. A. Sherstov and P. Wu, Near-optimal lower bounds on the threshold degree and sign-rank of $\mathsf{AC}^{0}$, in Proceedings of the Fifty-First Annual ACM Symposium on Theory of Computing (STOC), 2019, pp. 401–412, doi:10.1145/3313276.3316408.
  • [50] R. Špalek, A dual polynomial for OR. Available at http://arxiv.org/abs/0803.4516, 2008.
  • [51] R. de Wolf, Quantum Computing and Communication Complexity, PhD thesis, University of Amsterdam, 2001.

Appendix A Constructing low-discrepancy integer sets

The purpose of this appendix is to provide a detailed and self-contained proof of Theorem 3.5, restated below.

Theorem.

Fix an integer $R\geqslant 1$ and reals $P\geqslant 2$ and $\Delta\geqslant 1$. Let $m$ be an integer with

\[
m\geqslant P^{2}(R+1).
\]

Fix a set $S_{p}\subseteq\{1,2,\ldots,p-1\}$ for each prime $p\in(P/2,P]$ with $p\nmid m$. Suppose further that the cardinalities of any two sets from among the $S_{p}$ differ by a factor of at most $\Delta$. Consider the multiset

\begin{multline*}
S=\{(r+s\cdot(p^{-1})_{m})\bmod m:\\
r=1,\ldots,R;\quad p\in(P/2,P]\text{ prime with }p\nmid m;\quad s\in S_{p}\}.
\end{multline*}

Then the elements of $S$ are pairwise distinct and nonzero. Moreover, if $S\neq\varnothing$ then

\[
\operatorname{disc}_{m}(S)\leqslant\frac{c}{\sqrt{R}}+\frac{c\log m}{\log\log m}\cdot\frac{\log P}{P}\cdot\Delta+\max_{p}\{\operatorname{disc}_{p}(S_{p})\} \tag{A.1}
\]

for some (explicitly given) constant $c\geqslant 1$ independent of $P,R,m,\Delta$.

The special case $\Delta=1$ in this result was proved in [48, Theorem 3.6], and that proof applies with cosmetic changes to any $\Delta\geqslant 1$. As a service to the reader, we provide the complete derivation below; the treatment here is the same word for word as in [48] except for one minor point of departure to handle arbitrary $\Delta\geqslant 1$. We use the same notation as in [48] and in particular denote the modulus by lowercase $m$, as opposed to the uppercase $M$ in the main body of our paper (Theorem 3.5). Analogous to [48], the presentation is broken down into five key milestones, corresponding to Sections A.1–A.5 below.

A.1. Exponential notation

In the remainder of this manuscript, we adopt the shorthand

\[
e(x)=\exp(2\pi x\mathbf{i}),
\]

where $\mathbf{i}$ is the imaginary unit. We will need the following bounds [48, Section 6.1]:

\begin{align*}
|1-e(x)| &\leqslant 2\pi x, && 0\leqslant x\leqslant 1, \tag{A.2}\\
|1-e(x)| &\geqslant 4\min(x,1-x), && 0\leqslant x\leqslant 1. \tag{A.3}
\end{align*}
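As a quick numerical sanity check of (A.2) and (A.3), and not a substitute for the proof in [48], one can evaluate both sides on a grid. The grid size and the floating-point tolerance below are arbitrary choices made for this sketch.

```python
import cmath
import math

def e(x):
    """e(x) = exp(2*pi*i*x), as in the text."""
    return cmath.exp(2j * math.pi * x)

def check_bounds(samples=1000, tol=1e-12):
    """Check (A.2) and (A.3) on an evenly spaced grid in [0, 1]."""
    for t in range(samples + 1):
        x = t / samples
        gap = abs(1 - e(x))
        if gap > 2 * math.pi * x + tol:          # upper bound (A.2)
            return False
        if gap < 4 * min(x, 1 - x) - tol:        # lower bound (A.3)
            return False
    return True

print(check_bounds())
```

The tolerance matters only near $x=0$, $x=1/2$, and $x=1$, where (A.3) holds with equality up to rounding.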

Let $\mathcal{P}$ denote the set of prime numbers $p\in(P/2,P]$ with $p\nmid m$. In this notation, the multiset $S$ is given by

\[
S=\{(r+s\cdot(p^{-1})_{m})\bmod m:p\in\mathcal{P},\;s\in S_{p},\;r=1,2,\ldots,R\}.
\]

There are precisely $\pi(P)-\pi(P/2)$ primes in $(P/2,P]$, of which at most $\nu(m)$ are prime divisors of $m$. Therefore,

\[
|\mathcal{P}|\geqslant\pi(P)-\pi\left(\frac{P}{2}\right)-\nu(m). \tag{A.4}
\]
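The counting behind (A.4) can be reproduced on a small instance. In the sketch below, the parameters $P=20$, $R=1$, $m=803$ are hypothetical values chosen only so that $m\geqslant P^{2}(R+1)$ and so that one prime in $(P/2,P]$ divides $m$; the helper names are likewise illustrative, and $\nu$ is taken to count distinct prime divisors.

```python
import math

def primes_in(lo, hi):
    """Primes p with lo < p <= hi, by trial division (adequate for small hi)."""
    start = max(2, math.floor(lo) + 1)
    return [p for p in range(start, math.floor(hi) + 1)
            if all(p % q for q in range(2, math.isqrt(p) + 1))]

def nu(n):
    """Number of distinct prime divisors of n."""
    count, q = 0, 2
    while q * q <= n:
        if n % q == 0:
            count += 1
            while n % q == 0:
                n //= q
        q += 1
    return count + (1 if n > 1 else 0)

P, R, m = 20, 1, 803                     # 803 = 11 * 73 and 803 >= P^2 (R + 1) = 800
all_primes = primes_in(P / 2, P)         # primes in (P/2, P]
calP = [p for p in all_primes if m % p != 0]
print(calP, len(all_primes) - nu(m))     # the set P and the lower bound from (A.4)
```

Here the bound in (A.4) is not tight, since $\nu(m)$ also counts the prime divisor $73$, which lies outside $(P/2,P]$.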

A.2. Elements of S are nonzero and distinct

As our first step, we verify that the elements of $S$ are nonzero and distinct modulo $m$. This part of the argument is reproduced word for word from [48, Section 6.2].

Specifically, consider any $r\in\{1,2,\ldots,R\}$, any prime $p\in(P/2,P]$ with $p\nmid m$, and any $s\in S_{p}$. Then $pr+s\in[1,PR+P-1]\subseteq[1,m)$. This means that $pr+s\not\equiv 0\pmod{m}$, which in turn implies that $r+s\cdot(p^{-1})_{m}\not\equiv 0\pmod{m}$.

We now show that the multiset $S$ contains no repeated elements. For this, consider any $r,r'\in\{1,2,\ldots,R\}$, any primes $p,p'\in\mathcal{P}$, and any $s\in S_{p}$ and $s'\in S_{p'}$ such that

\[
r+s\cdot(p^{-1})_{m}\equiv r'+s'\cdot(p^{\prime-1})_{m}\pmod{m}. \tag{A.5}
\]

Our goal is to show that $p=p'$, $r=r'$, $s=s'$. To this end, multiply (A.5) through by $pp'$ to obtain

\[
r\cdot pp'+s\cdot p'\equiv r'\cdot pp'+s'\cdot p\pmod{m}. \tag{A.6}
\]

The left-hand side and right-hand side of (A.6) are integers in $[1,RP^{2}+(P-1)P]\subseteq[1,m)$, whence

\[
r\cdot pp'+s\cdot p'=r'\cdot pp'+s'\cdot p. \tag{A.7}
\]

This implies that $p\mid s\cdot p'$, which in view of $s<p$ and the primality of $p$ and $p'$ forces $p=p'$. Now (A.7) simplifies to

\[
r\cdot p+s=r'\cdot p+s', \tag{A.8}
\]

which in turn yields $s\equiv s'\pmod{p}$. Recalling that $s,s'\in\{1,2,\ldots,p-1\}$, we arrive at $s=s'$. Finally, substituting $s=s'$ in (A.8) gives $r=r'$.
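The construction of $S$ and the two properties just proved can be checked directly on a small instance. This is a minimal sketch under stated assumptions: the parameters $P=20$, $R=1$, $m=803$ and the sets $S_{p}$ are hypothetical toy choices that satisfy the hypotheses of the theorem, and `build_S` is an illustrative name. The modular inverse uses Python's built-in three-argument `pow` with exponent `-1`, which requires Python 3.8 or later.

```python
def build_S(m, R, S_sets):
    """Return the multiset S from the theorem as a list.

    S_sets maps each prime p in (P/2, P] with p not dividing m
    to a subset S_p of {1, ..., p-1}.
    """
    out = []
    for p, S_p in S_sets.items():
        p_inv = pow(p, -1, m)      # (p^{-1})_m; exists because p does not divide m
        for s in S_p:
            for r in range(1, R + 1):
                out.append((r + s * p_inv) % m)
    return out

P, R, m = 20, 1, 803               # 803 = 11 * 73, and 803 >= P^2 (R + 1) = 800
# Hypothetical sets S_p for the primes 13, 17, 19 in (10, 20] that do not divide m.
S_sets = {13: {1, 5, 12}, 17: {2, 3, 16}, 19: {4, 9, 18}}
S = build_S(m, R, S_sets)

nonzero = all(x != 0 for x in S)
distinct = len(set(S)) == len(S)
print(len(S), nonzero, distinct)
```

Both checks succeed here, as the argument above guarantees whenever $m\geqslant P^{2}(R+1)$; the size of $S$ is $R\sum_{p}|S_{p}|$.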

A.3. Correlation for k small

So far, we have shown that the elements of $S$ are distinct and nonzero. To bound the $m$-discrepancy of this set, we must bound the exponential sum

\[
\left|\sum_{s\in S}e\left(\frac{k}{m}\cdot s\right)\right| \tag{A.9}
\]

for all $k=1,2,\ldots,m-1$. This subsection and the next provide two complementary bounds on (A.9). The first bound, presented below, is preferable when $k$ is close to zero modulo $m$.
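For small moduli the quantity being bounded can be computed by brute force. The sketch below assumes that $\operatorname{disc}_{m}(S)$ is the maximum of (A.9) over $k=1,2,\ldots,m-1$, normalized by $|S|$; this reading is inferred from the statement that the final bound on (A.9) is equivalent to (A.1), and the function name `disc` is illustrative.

```python
import cmath
import math

def disc(m, S):
    """Assumed definition of the m-discrepancy of a nonempty collection S:
    max over k = 1..m-1 of |sum_{s in S} e(k*s/m)| / |S|."""
    S = list(S)
    best = 0.0
    for k in range(1, m):
        # Reduce k*s modulo m before dividing to keep the angle small.
        total = sum(cmath.exp(2j * math.pi * ((k * s) % m) / m) for s in S)
        best = max(best, abs(total) / len(S))
    return best

# Quadratic residues modulo 7: each sum is a Gauss-type sum of modulus sqrt(2).
print(round(disc(7, [1, 2, 4]), 4))
```

The running time is proportional to $m\,|S|$, so this is useful only for checking toy instances, not for the parameter ranges in the main body of the paper.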

Claim A.1.

Let $k\in\{1,2,\ldots,m-1\}$ be given. Then

\begin{multline*}
\left|\sum_{s\in S}e\left(\frac{k}{m}\cdot s\right)\right|\\
\leqslant\left(\frac{2\pi\min(k,m-k)}{m}+\max_{p\in\mathcal{P}}\{\operatorname{disc}_{p}(S_{p})\}+\frac{\nu(k)+\nu(m-k)}{|\mathcal{P}|}\cdot\Delta\right)|S|.
\end{multline*}

This claim generalizes the analogous statement in [48, Claim 6.10], where the special case $\Delta=1$ was considered.

Proof.

Let $\mathcal{P}'$ be the set of those primes in $\mathcal{P}$ that divide neither $k$ nor $m-k$. Then clearly

\[
|\mathcal{P}\setminus\mathcal{P}'|\leqslant\nu(k)+\nu(m-k). \tag{A.10}
\]

Exactly as in [48], we have

\begin{align*}
&\left|\sum_{s\in S}e\left(\frac{k}{m}\cdot s\right)\right|\\
&\qquad=\left|\sum_{r=1}^{R}\sum_{p\in\mathcal{P}}\sum_{s\in S_{p}}e\left(\frac{k}{m}\cdot(r+s\cdot(p^{-1})_{m})\right)\right|\\
&\qquad\leqslant\sum_{r=1}^{R}\sum_{p\in\mathcal{P}}\left|\sum_{s\in S_{p}}e\left(\frac{k}{m}\cdot(r+s\cdot(p^{-1})_{m})\right)\right|\\
&\qquad=R\sum_{p\in\mathcal{P}}\left|\sum_{s\in S_{p}}e\left(\frac{ks\cdot(p^{-1})_{m}}{m}\right)\right|\\
&\qquad\leqslant R\sum_{p\in\mathcal{P}'}\left|\sum_{s\in S_{p}}e\left(\frac{ks\cdot(p^{-1})_{m}}{m}\right)\right|+R\sum_{p\in\mathcal{P}\setminus\mathcal{P}'}\left|\sum_{s\in S_{p}}e\left(\frac{ks\cdot(p^{-1})_{m}}{m}\right)\right|\\
&\qquad\leqslant R\sum_{p\in\mathcal{P}'}\left|\sum_{s\in S_{p}}e\left(\frac{ks\cdot(p^{-1})_{m}}{m}\right)\right|+R\sum_{p\in\mathcal{P}\setminus\mathcal{P}'}|S_{p}|. \tag{A.11}
\end{align*}

We proceed to bound the two summations in (A.11). Bounding the second summation is straightforward:

\begin{align*}
R\sum_{p\in\mathcal{P}\setminus\mathcal{P}'}|S_{p}| &\leqslant R\cdot\frac{|\mathcal{P}\setminus\mathcal{P}'|}{|\mathcal{P}|}\sum_{p\in\mathcal{P}}\Delta|S_{p}|\\
&=\frac{|\mathcal{P}\setminus\mathcal{P}'|}{|\mathcal{P}|}\cdot\Delta|S|\\
&\leqslant\frac{\nu(k)+\nu(m-k)}{|\mathcal{P}|}\cdot\Delta|S|, \tag{A.12}
\end{align*}

where the first step is valid because the cardinalities of any two sets $S_{p}$ differ by a factor of at most $\Delta$, and the last step uses (A.10). This three-line derivation is our only point of departure from the treatment in [48].

The other summation in (A.11) is analyzed exactly as in [48]. For $p\in\mathcal{P}'$ and $K\in\{k,k-m\}$, we have

\begin{align*}
&\left|\sum_{s\in S_{p}}e\left(\frac{ks\cdot(p^{-1})_{m}}{m}\right)\right|\\
&\qquad=\left|\sum_{s\in S_{p}}e\left(\frac{Ks\cdot(p^{-1})_{m}}{m}\right)\right|\\
&\qquad=\left|\sum_{s\in S_{p}}e\left(-\frac{Ks\cdot(m^{-1})_{p}}{p}\right)e\left(\frac{Ks}{pm}\right)\right|\\
&\qquad\leqslant\left|\sum_{s\in S_{p}}e\left(-\frac{Ks\cdot(m^{-1})_{p}}{p}\right)\left(e\left(\frac{Ks}{pm}\right)-1\right)\right|+\left|\sum_{s\in S_{p}}e\left(-\frac{Ks\cdot(m^{-1})_{p}}{p}\right)\right|\\
&\qquad\leqslant\left|\sum_{s\in S_{p}}e\left(-\frac{Ks\cdot(m^{-1})_{p}}{p}\right)\left(e\left(\frac{Ks}{pm}\right)-1\right)\right|+\operatorname{disc}_{p}(S_{p})\cdot|S_{p}|\\
&\qquad\leqslant\sum_{s\in S_{p}}\left|e\left(\frac{Ks}{pm}\right)-1\right|+\operatorname{disc}_{p}(S_{p})\cdot|S_{p}|\\
&\qquad=\sum_{s\in S_{p}}\left|e\left(\frac{|K|s}{pm}\right)-1\right|+\operatorname{disc}_{p}(S_{p})\cdot|S_{p}|\\
&\qquad\leqslant|S_{p}|\cdot\frac{2\pi|K|}{m}+\operatorname{disc}_{p}(S_{p})\cdot|S_{p}|,
\end{align*}

where the second step uses Fact 2.16 and the relative primality of $p$ and $m$; the third step applies the triangle inequality; the fourth step follows from $p\nmid|K|$; and the last step is valid by (A.2) and $s<p$. We have shown that

\[
\left|\sum_{s\in S_{p}}e\left(\frac{ks\cdot(p^{-1})_{m}}{m}\right)\right|\leqslant\frac{2\pi\min(k,m-k)}{m}\cdot|S_{p}|+\operatorname{disc}_{p}(S_{p})\cdot|S_{p}|
\]

for $p\in\mathcal{P}'$. Summing over $\mathcal{P}'$,

\begin{align*}
R\sum_{p\in\mathcal{P}'} &\left|\sum_{s\in S_{p}}e\left(\frac{ks\cdot(p^{-1})_{m}}{m}\right)\right|\\
&\qquad\leqslant R\sum_{p\in\mathcal{P}'}\left(\frac{2\pi\min(k,m-k)}{m}\cdot|S_{p}|+\operatorname{disc}_{p}(S_{p})\cdot|S_{p}|\right)\\
&\qquad\leqslant R\sum_{p\in\mathcal{P}}\left(\frac{2\pi\min(k,m-k)}{m}\cdot|S_{p}|+\operatorname{disc}_{p}(S_{p})\cdot|S_{p}|\right)\\
&\qquad\leqslant\left(\frac{2\pi\min(k,m-k)}{m}+\max_{p\in\mathcal{P}}\{\operatorname{disc}_{p}(S_{p})\}\right)R\sum_{p\in\mathcal{P}}|S_{p}|\\
&\qquad=\left(\frac{2\pi\min(k,m-k)}{m}+\max_{p\in\mathcal{P}}\{\operatorname{disc}_{p}(S_{p})\}\right)|S|. \tag{A.13}
\end{align*}

By (A.11)–(A.13), the proof of the claim is complete. ∎

A.4. Correlation for k large

We now present an alternative bound on the exponential sum (A.9), which is preferable to the bound of Claim A.1 when $k$ is far from zero modulo $m$. This part of the proof is reproduced verbatim from [48, Section 6.4].

Claim A.2.

Let $k\in\{1,2,\ldots,m-1\}$ be given. Then

\[
\left|\sum_{s\in S}e\left(\frac{k}{m}\cdot s\right)\right|\leqslant\frac{m}{2R\min(k,m-k)}\cdot|S|.
\]

Proof.

\begin{align*}
\left|\sum_{s\in S}e\left(\frac{k}{m}\cdot s\right)\right| &=\left|\sum_{p\in\mathcal{P}}\sum_{s\in S_{p}}\sum_{r=1}^{R}e\left(\frac{k}{m}\cdot(r+s\cdot(p^{-1})_{m})\right)\right|\\
&\leqslant\sum_{p\in\mathcal{P}}\sum_{s\in S_{p}}\left|\sum_{r=1}^{R}e\left(\frac{k}{m}\cdot(r+s\cdot(p^{-1})_{m})\right)\right|\\
&=\sum_{p\in\mathcal{P}}\sum_{s\in S_{p}}\left|\sum_{r=1}^{R}e\left(\frac{kr}{m}\right)\right|\\
&=\sum_{p\in\mathcal{P}}\sum_{s\in S_{p}}\frac{|1-e(kR/m)|}{|1-e(k/m)|}\\
&\leqslant\sum_{p\in\mathcal{P}}\sum_{s\in S_{p}}\frac{2}{|1-e(k/m)|}\\
&\leqslant\sum_{p\in\mathcal{P}}\sum_{s\in S_{p}}\frac{m}{2\min(k,m-k)}\\
&=\frac{m}{2R\min(k,m-k)}\cdot|S|,
\end{align*}

where the last two steps use (A.3) and $|S|=R\sum_{p\in\mathcal{P}}|S_{p}|$, respectively. ∎
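The heart of this proof is the bound $\left|\sum_{r=1}^{R}e(kr/m)\right|\leqslant m/(2\min(k,m-k))$ on the inner geometric sum; summing it over the $|S|/R$ pairs $(p,s)$ gives the claim. The sketch below checks that inner bound numerically for every $k$. The values of $m$ and $R$ and the floating-point tolerance are arbitrary choices made for illustration.

```python
import cmath
import math

def inner_sum(k, m, R):
    """|sum_{r=1}^{R} e(k*r/m)|, the inner sum in the proof of Claim A.2."""
    return abs(sum(cmath.exp(2j * math.pi * k * r / m) for r in range(1, R + 1)))

def inner_bound_holds(m, R, tol=1e-9):
    """Check |sum_r e(kr/m)| <= m / (2 min(k, m-k)) for every k = 1..m-1."""
    return all(inner_sum(k, m, R) <= m / (2 * min(k, m - k)) + tol
               for k in range(1, m))

print(inner_bound_holds(m=101, R=7))
```

The bound does not depend on $R$, which is why the final estimate improves by a factor of $R$ relative to the trivial bound $|S|$ once $k$ is far from zero modulo $m$.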

A.5. Finishing the proof

The remainder of the proof is reproduced without changes from [48, Section 6.5], except for the use of the updated bound in Claim A.1 for arbitrary $\Delta\geqslant 1$.

Specifically, Facts 2.17 and 2.18 imply that

\begin{align*}
\pi(P)-\pi\left(\frac{P}{2}\right) &\geqslant\frac{P}{C\log P}\qquad\qquad(P\geqslant C), \tag{A.14}\\
\max_{k=1,2,\ldots,m}\nu(k) &\leqslant\frac{C\log m}{\log\log m}, \tag{A.15}
\end{align*}

where $C\geqslant 1$ is a constant independent of $R,P,m,\Delta$. Moreover, $C$ can be easily calculated from the explicit bounds in Facts 2.17 and 2.18. We will show that the theorem conclusion (A.1) holds with $c=4C^{2}$. We may assume that

\begin{align*}
&P\geqslant C, \tag{A.16}\\
&\frac{C\log m}{\log\log m}\leqslant\frac{P}{2C\log P}, \tag{A.17}
\end{align*}

since otherwise the right-hand side of (A.1) exceeds $1$ and the theorem is trivially true. By (A.4) and (A.14)–(A.17), we obtain

\[
|\mathcal{P}|\geqslant\frac{P}{2C\log P},
\]

which along with (A.15) gives

\begin{align*}
\max_{k=1,2,\ldots,m-1}\frac{\nu(k)+\nu(m-k)}{|\mathcal{P}|} &\leqslant\frac{2C\log m}{\log\log m}\cdot\frac{2C\log P}{P}\\
&=\frac{c\log m}{\log\log m}\cdot\frac{\log P}{P}. \tag{A.18}
\end{align*}

Claims A.1 and A.2 ensure that for every $k=1,2,\ldots,m-1$,

\begin{align*}
\left|\sum_{s\in S}e\left(\frac{k}{m}\cdot s\right)\right| &\leqslant\left(\min\left(\frac{2\pi\min(k,m-k)}{m},\frac{m}{2R\min(k,m-k)}\right)\right.\\
&\qquad\qquad\left.+\max_{p\in\mathcal{P}}\{\operatorname{disc}_{p}(S_{p})\}+\frac{\nu(k)+\nu(m-k)}{|\mathcal{P}|}\cdot\Delta\right)|S|\\
&\leqslant\left(\sqrt{\frac{\pi}{R}}+\max_{p\in\mathcal{P}}\{\operatorname{disc}_{p}(S_{p})\}+\frac{\nu(k)+\nu(m-k)}{|\mathcal{P}|}\cdot\Delta\right)|S|\\
&\leqslant\left(\frac{c}{\sqrt{R}}+\max_{p\in\mathcal{P}}\{\operatorname{disc}_{p}(S_{p})\}+\frac{\nu(k)+\nu(m-k)}{|\mathcal{P}|}\cdot\Delta\right)|S|;
\end{align*}

here we are using the updated bound from Claim A.1 in this paper for general $\Delta$. Substituting the estimate from (A.18), we conclude that

\begin{multline*}
\max_{k=1,2,\ldots,m-1}\left|\sum_{s\in S}e\left(\frac{k}{m}\cdot s\right)\right|\\
\leqslant\left(\frac{c}{\sqrt{R}}+\max_{p\in\mathcal{P}}\{\operatorname{disc}_{p}(S_{p})\}+\frac{c\log m}{\log\log m}\cdot\frac{\log P}{P}\cdot\Delta\right)|S|.
\end{multline*}

This conclusion is equivalent to (A.1). The proof of Theorem 3.5 is complete.
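For completeness, the step in Section A.5 that replaces the minimum of the two bounds by $\sqrt{\pi/R}$ is not spelled out in the text. One way to see it, offered here as a reader's note, is that the minimum of two nonnegative numbers is at most their geometric mean. Writing $a=\min(k,m-k)$, a shorthand introduced only for this note:

```latex
\[
\min\left(\frac{2\pi a}{m},\,\frac{m}{2Ra}\right)
\leqslant\sqrt{\frac{2\pi a}{m}\cdot\frac{m}{2Ra}}
=\sqrt{\frac{\pi}{R}}
\leqslant\frac{c}{\sqrt{R}},
\qquad a=\min(k,m-k).
\]
```

The last inequality holds because $c=4C^{2}\geqslant 4>\sqrt{\pi}$.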