The Approximate Degree of DNF and CNF Formulas∗
Abstract.
The approximate degree of a Boolean function f: {0,1}^n → {0,1} is the minimum degree of a real polynomial that approximates f pointwise: |f(x) − p(x)| ≤ 1/3 for all x. For every δ > 0, we construct CNF and DNF formulas of polynomial size with approximate degree Ω(n^{1−δ}), essentially matching the trivial upper bound of n. This improves polynomially on previous lower bounds and fully resolves the approximate degree of constant-depth circuits (AC⁰), a question that has seen extensive research over the past 10 years. Prior to our work, an Ω(n^{1−δ}) lower bound was known only for AC⁰ circuits of depth that grows with 1/δ (Bun and Thaler, FOCS 2017). Furthermore, the CNF and DNF formulas that we construct are the simplest possible in that they have constant width. Our result holds even for one-sided approximation: for any δ > 0, we construct a polynomial-size constant-width CNF formula with one-sided approximate degree Ω(n^{1−δ}).
Our work has the following consequences.
(i) We essentially settle the communication complexity of AC⁰ circuits in the bounded-error quantum model, the k-party number-on-the-forehead randomized model, and the k-party number-on-the-forehead nondeterministic model: we prove that for every δ > 0, these models require essentially maximum communication, n^{1−δ} up to factors depending on k, even for polynomial-size constant-width CNF formulas.
(ii) In particular, we show that the multiparty communication class coNP_k can be separated essentially optimally from NP_k and BPP_k by a particularly simple function, a polynomial-size constant-width CNF formula.
(iii) We give essentially the largest possible separation for the one-sided versus two-sided approximate degree of a function, and for the one-sided approximate degree of a function f versus its negation ¬f: in each case, the larger quantity is Ω(n^{1−δ}), nearly matching the trivial upper bound of n.
Our proof departs significantly from previous approaches and contributes a novel, number-theoretic method for amplifying approximate degree.
1. Introduction
Representations of Boolean functions by real polynomials play a central role in theoretical computer science. Our focus in this paper is on approximate degree, a particularly natural and useful complexity measure. Formally, the ε-approximate degree of a Boolean function f: {0,1}^n → {0,1} is denoted deg_ε(f) and defined as the minimum degree of a real polynomial p that approximates f within ε pointwise: |f(x) − p(x)| ≤ ε for all x ∈ {0,1}^n. The standard choice of the error parameter is ε = 1/3, which is a largely arbitrary setting that can be replaced by any other constant in (0, 1/2) without affecting the approximate degree by more than a multiplicative constant. Since every function f: {0,1}^n → {0,1} can be computed with zero error by a polynomial of degree at most n, the ε-approximate degree is always at most n.
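To make the pointwise-approximation definition concrete, here is a short illustrative check (our own toy example, not a construction from the paper): the degree-1 polynomial p(x) = (1 + x₁ + x₂ + x₃)/3 approximates OR on 3 bits within 1/3, witnessing that the 1/3-approximate degree of OR₃ is at most 1.

```python
from itertools import product

# Degree-1 polynomial p(x) = (1 + x1 + x2 + x3) / 3, an explicit approximant for OR_3.
def p(x):
    return (1 + sum(x)) / 3

def OR(x):
    return 1 if any(x) else 0

# Pointwise error over all 8 inputs of the hypercube {0,1}^3.
max_err = max(abs(p(x) - OR(x)) for x in product((0, 1), repeat=3))
print(max_err)  # ≈ 1/3, attained at (0,0,0) and (1,1,1)
```

For larger n, degree 1 no longer suffices at error 1/3, in line with the Θ(√n) approximate degree of OR discussed below.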
The notion of approximate degree originated three decades ago in the pioneering work of Nisan and Szegedy [31] and has since proved to be a powerful tool in theoretical computer science. Upper bounds on approximate degree have algorithmic applications, whereas lower bounds are a staple in complexity theory. On the algorithmic side, approximate degree underlies many of the strongest results obtained to date in computational learning, differentially private data release, and algorithm design in general. In complexity theory, the notion of approximate degree has produced breakthroughs in quantum query complexity, communication complexity, and circuit complexity. A detailed bibliographic overview of these applications can be found in [47, 17].
Approximate degree has been particularly prominent in the study of AC⁰, the class of polynomial-size constant-depth circuits with gates of unbounded fan-in. The simplest functions in AC⁰ are conjunctions and disjunctions, which have depth 1, followed by polynomial-size CNF and DNF formulas, which have depth 2, followed in turn by higher-depth circuits. Lower bounds on the approximate degree of AC⁰ functions have been used to settle the quantum query complexity of Grover search [8], element distinctness [1], and a host of other problems [14]; resolve the communication complexity of set disjointness in the two-party quantum model [33, 38] and number-on-the-forehead multiparty model [37, 38, 28, 20, 36, 11, 44, 43]; separate the communication complexity classes PP and UPP [13, 37]; and separate the polynomial hierarchy in communication complexity from the communication class UPP [34]. Despite this array of applications and decades of study, our understanding of the approximate degree of AC⁰ has remained surprisingly fragmented and incomplete. In this paper, we set out to resolve this question in full.
In more detail, previous work on the approximate degree of AC⁰ started with the seminal 1994 paper of Nisan and Szegedy [31], who proved that the OR function on n bits has approximate degree Θ(√n). This was the best result until Aaronson and Shi's celebrated lower bound of Ω(n^{2/3}) for the element distinctness problem [1]. In a beautiful paper from 2017, Bun and Thaler [17] showed that AC⁰ contains functions in n variables with approximate degree Ω(n^{1−δ}), where the constant δ > 0 can be made arbitrarily small at the expense of increasing the depth of the circuit. In follow-up work, Bun and Thaler [18] proved an Ω(n^{1−δ}) lower bound for approximating AC⁰ circuits even with error exponentially close to 1/2, where once again the circuit depth grows with 1/δ. A stronger yet result was obtained by Sherstov and Wu [49], who showed that AC⁰ has essentially the maximum possible threshold degree (defined as the limit of ε-approximate degree as ε → 1/2) and sign-rank (a generalization of threshold degree to arbitrary bases rather than just the basis of monomials). Quantitatively, the authors of [49] proved a lower bound of Ω(n^{1−δ}) for threshold degree and 2^{Ω(n^{1−δ})} for sign-rank, essentially matching the trivial upper bounds. As before, δ can be made arbitrarily small at the expense of increasing the circuit depth. In particular, AC⁰ requires a polynomial of degree Ω(n^{1−δ}) even for approximation to error doubly (triply, quadruply, quintuply…) exponentially close to 1/2.
The lower bounds of [17, 18, 49] show that AC⁰ functions have essentially the maximum possible complexity, but only if one is willing to look at circuits of arbitrarily large constant depth. What happens at small depths has been a wide open problem, with no techniques to address it. Bun and Thaler observe that their circuit in [17] with approximate degree Ω(n^{1−δ}) can be flattened to produce a DNF formula of quasipolynomial size, but this is superpolynomial and thus no longer in AC⁰. The only progress of which we are aware is an Ω̃(n^{3/4}) lower bound obtained for polynomial-size DNF formulas in [14, 29]. This leaves a polynomial gap in the approximate degree for small depth versus arbitrary constant depth. Our main contribution is to definitively resolve the approximate degree of AC⁰ by constructing, for any constant δ > 0, a polynomial-size DNF formula with approximate degree Ω(n^{1−δ}). We now describe our main result and its generalizations and applications.
1.1. Approximate degree of DNF and CNF formulas
Recall that a literal is a Boolean variable x_i or its negation ¬x_i. A conjunction of literals is called a term, and a disjunction of literals is called a clause. The width of a term or clause is the number of literals that it contains. A DNF formula is a disjunction of terms, and analogously a CNF formula is a conjunction of clauses. The width of a DNF or CNF formula is the maximum width of a term or clause in it, respectively. One often refers to DNF and CNF formulas of width w as w-DNF and w-CNF formulas. The size of a DNF or CNF formula is the total number of terms or clauses, respectively, that it contains. Thus, circuits of depth 1 correspond precisely to clauses and terms, whereas circuits of depth 2 correspond precisely to polynomial-size DNF and CNF formulas. Our main result on approximate degree is as follows.
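The definitions above can be made concrete with a toy example (a hypothetical formula, not from the paper), representing a DNF as a list of terms and computing its size and width:

```python
# A DNF formula as a list of terms; each term is a list of literals (var, polarity),
# where polarity True stands for x_i and False for its negation. Toy formula.
dnf = [[(0, True), (1, True)],    # x0 AND x1
       [(1, False), (2, True)],   # (NOT x1) AND x2
       [(0, False), (2, False)]]  # (NOT x0) AND (NOT x2)

size = len(dnf)                   # number of terms: 3
width = max(len(t) for t in dnf)  # maximum term width: 2

def eval_dnf(f, x):
    # A DNF evaluates to 1 iff some term has all its literals satisfied.
    return int(any(all(x[i] == int(b) for i, b in term) for term in f))

print(size, width, eval_dnf(dnf, (1, 1, 0)))  # 3 2 1
```

In this terminology, the formula above is a 2-DNF of size 3.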
Theorem 1.1 (Main result).
Let δ > 0 be any constant. Then for each n, there is an explicitly given function f: {0,1}^n → {0,1} that has approximate degree Ω(n^{1−δ})
and is computable by a DNF formula of size n^{O(1)} and width O(1).
Theorem 1.1 almost matches the trivial upper bound of n on the approximate degree of any function. Thus, the theorem shows that AC⁰ circuits of depth 2 already achieve essentially the maximum possible approximate degree. This depth cannot be reduced further because circuits of depth 1 have approximate degree O(√n). Finally, the DNF formulas constructed in Theorem 1.1 are the simplest possible in that they have constant width.
Recall that previously, a lower bound of Ω(n^{1−δ}) for AC⁰ was known only for circuits of large constant depth that grows with 1/δ. The lack of progress on small-depth AC⁰ prior to this paper had experts seriously entertaining [18] the possibility that AC⁰ circuits of any given depth d have approximate degree O(n^{1−δ_d}), for some constant δ_d > 0 depending on d. Such an upper bound would have far-reaching consequences in computational learning and circuit complexity. Theorem 1.1 rules it out.
1.2. Large-error approximation
Any Boolean function can be approximated pointwise within 1/2 in a trivial manner, by the constant polynomial 1/2. Approximation within error smaller than 1/2, on the other hand, is a meaningful and extremely useful notion. We obtain the following strengthening of our main result, in which the approximation error is relaxed from 1/3 to an optimal 1/2 − n^{−c}, for an arbitrarily large constant c.
Theorem 1.2 (Main result for large error).
Let δ > 0 and c ≥ 1 be any constants. Then for each n, there is an explicitly given function f: {0,1}^n → {0,1} that has approximate degree deg_{1/2 − n^{-c}}(f) = Ω(n^{1−δ})
and is computable by a DNF formula of size n^{O(1)} and width O(1).
To rephrase Theorem 1.2, polynomial-size DNF formulas require degree Ω(n^{1−δ}) for approximation not only to constant error but even to error 1/2 − n^{−c}, where c ≥ 1 is an arbitrarily large constant. The error parameter in Theorem 1.2 cannot be relaxed further to 1/2 − n^{−ω(1)} because any DNF formula with s terms can be approximated to error 1/2 − Ω(1/s) by a polynomial of degree O(√(n log s)).
Negating a function has no effect on its approximate degree. Indeed, if f is approximated to error ε by a polynomial p, then the negated function ¬f is approximated to the same error by the polynomial 1 − p. With this observation, Theorems 1.1 and 1.2 carry over to CNF formulas:
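The negation argument can be checked mechanically on a small example (illustrative only, reusing the toy OR₃ approximant from above): if p approximates f within ε, then 1 − p approximates ¬f within exactly the same error.

```python
from itertools import product

def OR(x):  return 1 if any(x) else 0
def NOR(x): return 1 - OR(x)

# Toy degree-1 approximant for OR_3 (illustrative, error 1/3).
def p(x): return (1 + sum(x)) / 3

# |(1 - p(x)) - (1 - f(x))| = |f(x) - p(x)|, so the errors coincide pointwise.
err_f   = max(abs(p(x) - OR(x))        for x in product((0, 1), repeat=3))
err_not = max(abs((1 - p(x)) - NOR(x)) for x in product((0, 1), repeat=3))
assert err_f == err_not
```

The same algebra applies verbatim to any function and any approximant, which is all the corollary below needs.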
Corollary 1.3.
Let δ > 0 and c ≥ 1 be any constants. Then for each n, there is an explicitly given function f: {0,1}^n → {0,1} that has approximate degree deg_{1/2 − n^{-c}}(f) = Ω(n^{1−δ})
and is computable by a CNF formula of size n^{O(1)} and width O(1).
1.3. One-sided approximation
There is a natural notion of one-sided approximation for Boolean functions. Specifically, the one-sided ε-approximate degree of a function f: {0,1}^n → {0,1} is defined as the minimum degree of a real polynomial p such that |f(x) − p(x)| ≤ ε for every x ∈ f⁻¹(0), and p(x) ≥ 1 − ε
for every x ∈ f⁻¹(1). This complexity measure is denoted deg⁺_ε(f). It plays a considerable role [23, 15, 40, 16, 45, 44, 43] in the area, both in its own right and due to its applications to other asymmetric notions of computation such as nondeterminism and Merlin–Arthur protocols. One-sided approximation is meaningful for any error parameter ε ∈ [0, 1), and as before the standard setting is ε = 1/3. By definition, one-sided approximate degree is always at most n. Observe that the definitions of ε-approximate degree and its one-sided variant impose the same requirement for inputs x ∈ f⁻¹(0): the approximating polynomial must approximate f within ε at each such x. For inputs x ∈ f⁻¹(1), on the other hand, the definitions of deg_ε and deg⁺_ε diverge dramatically, with one-sided ε-approximate degree not requiring any upper bound on the approximating polynomial p. As a result, one always has deg⁺_ε(f) ≤ deg_ε(f), and it is reasonable to expect a large gap between the two quantities for some f. Moreover, the one-sided approximate degree of a function is in general not equal to that of its negation: deg⁺_ε(f) ≠ deg⁺_ε(¬f). This contrasts with the equality deg_ε(f) = deg_ε(¬f) for two-sided approximation.
In this light, there are three particularly natural questions to ask about one-sided approximate degree:
(i) What is the one-sided approximate degree of AC⁰ circuits?
(ii) What is the largest possible gap between approximate degree and one-sided approximate degree?
(iii) What is the largest possible gap between the one-sided approximate degree of a function f and that of its negation ¬f?
In this paper, we resolve all three questions in detail. For question (i), we prove that polynomial-size CNF formulas achieve essentially the maximum possible one-sided approximate degree. In fact, our result holds even for approximation to error vanishingly close to 1/2, the error achievable by random guessing:
Theorem 1.4.
Let δ > 0 and c ≥ 1 be any constants. Then for each n, there is an explicitly given function f: {0,1}^n → {0,1} that has one-sided approximate degree deg⁺_{1/2 − n^{-c}}(f) = Ω(n^{1−δ})
and is computable by a CNF formula of size n^{O(1)} and width O(1).
Theorem 1.4 essentially settles the one-sided approximate degree of AC⁰. The theorem is optimal with respect to circuit depth; recall that depth-1 circuits have approximate degree O(√n) and hence also one-sided approximate degree O(√n). Previous work on the one-sided approximate degree of AC⁰ was suboptimal with respect to the degree bound and/or circuit depth. Specifically, the best previous lower bounds were due to Bun and Thaler [16] for a polynomial-size CNF formula, and due to Sherstov and Wu [49] for circuits of depth that grows with 1/δ.
As an application of Theorem 1.4, we resolve questions (ii) and (iii) in full, establishing essentially the largest possible gap in each case. Moreover, we prove that these gaps remain valid well beyond the standard error regime of ε = 1/3. A detailed statement of our separations follows.
Corollary 1.5.
Let δ > 0 and c ≥ 1 be any constants. Then for each n, there is an explicitly given function f: {0,1}^n → {0,1} with
(1.1)  deg⁺₀(f) = O(1),
but
(1.2)  deg_{1/2 − n^{-c}}(f) = Ω(n^{1−δ}),
(1.3)  deg⁺_{1/2 − n^{-c}}(¬f) = Ω(n^{1−δ}).
Moreover, f is computable by a DNF formula of size n^{O(1)} and width O(1).
Equations (1.1) and (1.2) in this result give the promised separation for question (ii). Analogously, (1.1) and (1.3) give the promised separation for question (iii). Of particular note in both separations is the error regime: the upper bound remains valid even under the stronger requirement of zero error, whereas the lower bounds remain valid even under the weaker requirement of error vanishingly close to 1/2. Our separations improve on previous work. For question (ii), the best previous separation, implicit in [17], was polynomially weaker in the lower bound. For the harder question (iii), the best previous separation [16] was likewise polynomially weaker than ours.
Proof of Corollary 1.5.
Let F be the function from Theorem 1.4, and set f = ¬F. Then (1.3) is immediate. Equation (1.2) follows from (1.3) in light of the basic relations deg_ε(g) = deg_ε(¬g) and deg_ε(g) ≥ deg⁺_ε(g), valid for all functions g and all ε. Finally, (1.1) can be seen as follows. Since F is a CNF formula of constant width, its negation f is a DNF formula of constant width. Thus, every term of f can be represented exactly by a polynomial of constant degree. Summing these polynomials gives a zero-error one-sided approximant for f. ∎
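A minimal sketch of the last step of the proof above, on a toy constant-width DNF (assuming the convention used in this section, under which a one-sided approximant must be accurate on negative inputs and only bounded from below on positive ones):

```python
from itertools import product

# Toy constant-width DNF g = x0 x1  OR  (NOT x1) x2; not the paper's construction.
terms = [[(0, True), (1, True)], [(1, False), (2, True)]]

def term_indicator(term, x):
    # Exact 0/1 polynomial for one term: a product of literals,
    # of degree equal to the width of the term.
    prod = 1
    for i, b in term:
        prod *= x[i] if b else (1 - x[i])
    return prod

def q(x):
    # Sum of the exact term polynomials: the claimed one-sided approximant.
    return sum(term_indicator(t, x) for t in terms)

def g(x):
    return int(any(term_indicator(t, x) == 1 for t in terms))

for x in product((0, 1), repeat=3):
    if g(x) == 0:
        assert q(x) == 0   # zero error on negative inputs
    else:
        assert q(x) >= 1   # bounded below on positive inputs; may exceed 1
```

The sum can exceed 1 when several terms fire, which is exactly why this approximant is one-sided rather than two-sided.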
We now discuss applications of our results on approximate degree and one-sided approximate degree to fundamental questions in communication complexity.
1.4. Randomized multiparty communication
We adopt the number-on-the-forehead model of Chandra, Furst, and Lipton [19], which is the most powerful formalism of multiparty communication. The model features k communicating players and a Boolean function f with k arguments x₁, x₂, …, x_k. An input (x₁, x₂, …, x_k) is distributed among the players by giving the i-th player all the arguments except x_i. This arrangement can be visualized as having the players seated in a circle with x_i written on the i-th player's forehead, whence the name of the model. Number-on-the-forehead is the canonical model in the area because any other way of assigning arguments to players results in a less powerful model, provided of course that one does not assign all the arguments to some player, in which case there is never a need to communicate.
The players communicate according to a protocol agreed upon in advance. The communication occurs in the form of broadcasts, with a message sent by any given player instantly reaching everyone else. The players' objective is to compute f on any given input with minimal communication. To this end, the players have access to an unbounded supply of shared random bits, which they can use in deciding what message to send at any given point in the protocol. The cost of a protocol is the total bit length of all the messages broadcast in a worst-case execution. The ε-error randomized communication complexity R_ε(f) of a given function f is the least cost of a protocol that computes f with probability of error at most ε on every input. As with approximate degree, the standard setting of the error parameter is ε = 1/3.
The number-on-the-forehead communication complexity of constant-depth circuits is a challenging question that has been the focus of extensive research, e.g., [12, 28, 20, 36, 11, 44, 43, 17]. In contrast to the two-party model, where an Ω(√n) lower bound for AC⁰ is straightforward to prove from first principles [4], the first multiparty lower bound [44] for AC⁰ was obtained only in 2012. The strongest known multiparty lower bounds for AC⁰ are obtained using the pattern matrix method of [43], which transforms approximate degree lower bounds in a black-box manner into communication lower bounds. In the most recent application of this method, Bun and Thaler [17] gave a k-party communication problem in AC⁰ with communication complexity n^{1−δ}, up to factors depending on k, where the constant δ > 0 can be taken arbitrarily small at the expense of increasing the depth of the circuit. This shows that AC⁰ has essentially the maximum possible multiparty communication complexity, as long as one is willing to use circuits of arbitrarily large constant depth. For circuits of small depth, the best lower bound is polynomially weaker: it follows for the k-party communication complexity of polynomial-size DNF formulas by applying the pattern matrix method to the approximate degree lower bounds in [14, 29]. This fragmented state of the art closely parallels that for approximate degree prior to our work.
We resolve the multiparty communication complexity of AC⁰ in detail in the following theorem.
Theorem 1.6.
Fix any constants δ > 0 and c ≥ 1. Then for all integers n and k, there is an explicitly given k-party communication problem F with
R_{1/2 − n^{-c}}(F) = Ω(n^{1−δ}/c₀^k),
where c₀ > 1 is a constant independent of n and k. Moreover, F is computable by a DNF formula of size n^{O(1)} and width O(1).
Theorem 1.6 essentially represents the state of the art for multiparty communication lower bounds. Indeed, the best communication lower bound to date for any explicit function F, whether or not F is computable by an AC⁰ circuit, is Ω(n/4^k) [6]. Theorem 1.6 comes close to matching the trivial upper bound of O(n) for any communication problem, thereby showing that AC⁰ circuits of depth 2 achieve nearly the maximum possible communication complexity. Moreover, our result holds not only for bounded-error communication but also for communication with error 1/2 − n^{−c} for any constant c ≥ 1. The error parameter in Theorem 1.6 is optimal and cannot be further increased to 1/2 − n^{−ω(1)}; indeed, it is straightforward to see that any DNF formula with s terms has a communication protocol with error 1/2 − Ω(1/s) and cost O(1) bits. Theorem 1.6 is also optimal with respect to circuit depth because the multiparty communication complexity of circuits of depth 1 is at most O(1) bits.
Since randomized communication complexity is invariant under function negation, Theorem 1.6 remains valid with the word “DNF” replaced with “CNF.”
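To illustrate why DNF formulas admit cheap large-error protocols (the observation invoked above), here is an exact computation of the advantage of a simple shared-randomness protocol on a toy DNF: sample a uniformly random term, evaluate it with O(1) bits of communication, output 1 if it is satisfied, and otherwise output 1 with probability 1/2 − 1/(2s). This is an illustrative analysis with a hypothetical formula, not the paper's construction.

```python
from itertools import product
from fractions import Fraction

# Toy DNF with s = 3 terms over 3 variables.
terms = [[(0, True), (1, True)], [(1, False), (2, True)], [(0, False), (2, False)]]
s = len(terms)

def sat(term, x):
    return all(x[i] == int(b) for i, b in term)

def f(x):
    return int(any(sat(t, x) for t in terms))

worst_err = Fraction(0)
for x in product((0, 1), repeat=3):
    hit = Fraction(sum(sat(t, x) for t in terms), s)   # P[random term satisfied]
    # Output 1 if the sampled term is satisfied; else output 1 w.p. 1/2 - 1/(2s).
    p_one = hit + (1 - hit) * (Fraction(1, 2) - Fraction(1, 2 * s))
    err = 1 - p_one if f(x) == 1 else p_one
    worst_err = max(worst_err, err)

print(worst_err < Fraction(1, 2))  # True: advantage over random guessing on every input
```

The exact arithmetic shows an error bounded away from 1/2 by an amount polynomial in 1/s, in line with the discussion of the error regime above.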
1.5. Nondeterministic and Merlin–Arthur multiparty communication
Here again, we adopt the k-party number-on-the-forehead model of Chandra, Furst, and Lipton [19]. Nondeterministic communication is defined in complete analogy with computational complexity. Specifically, a nondeterministic protocol starts with a guess string, whose length counts toward the protocol's communication cost, and proceeds deterministically thenceforth. A nondeterministic protocol for a given communication problem F is required to output the correct answer for all guess strings when presented with a negative instance of F, and for some guess string when presented with a positive instance. We further consider Merlin–Arthur protocols [3, 5], a communication model that combines the power of randomization and nondeterminism. As before, a Merlin–Arthur protocol for a given problem F starts with a guess string, whose length counts toward the communication cost. From then on, the parties run an ordinary randomized protocol. The randomized phase in a Merlin–Arthur protocol must produce the correct answer with probability at least 2/3 for all guess strings when presented with a negative instance of F, and for some guess string when presented with a positive instance. Thus, the cost of a nondeterministic or Merlin–Arthur protocol is the sum of the costs of the guessing phase and communication phase. The minimum cost of a valid protocol for F in these models is called the nondeterministic communication complexity of F, denoted N(F), and the Merlin–Arthur communication complexity of F, denoted MA(F). The quantity coN(F) = N(¬F) is called the co-nondeterministic communication complexity of F.
Nondeterministic and Merlin–Arthur protocols have been extensively studied for k = 2 parties but are much less understood in the multiparty setting [10, 23, 44, 43]. Prior to our paper, the best lower bounds in these models for an AC⁰ circuit were, up to factors depending on k, of order n^{1/2} for nondeterministic communication and n^{1/4} for Merlin–Arthur communication, obtained in [43] for the set disjointness problem. We give a quadratic improvement on these lower bounds. In particular, our result for nondeterminism essentially matches the trivial upper bound. Moreover, we obtain our result for a particularly simple function in AC⁰, namely, a polynomial-size CNF formula of constant width. A detailed statement follows.
Theorem 1.7.
Let δ > 0 be arbitrary. Then for all integers n and k, there is an explicitly given k-party communication problem F with
coN(F) = O(log n),
but
(1.4)  N(F) = Ω(n^{1−δ}/c₀^k),
(1.5)  R_{1/3}(F) = Ω(n^{1−δ}/c₀^k),
(1.6)  MA_{1/3}(F) = Ω((n^{1−δ}/c₀^k)^{1/2}),
where c₀ > 1 is a constant independent of n and k. Moreover, F is computable by a CNF formula of width O(1) and size n^{O(1)}.
This result can be viewed as a far-reaching generalization of Theorem 1.6 to nondeterministic and Merlin–Arthur protocols. To obtain Theorem 1.7, we adapt the pattern matrix method [43] to be able to transform any lower bound on one-sided approximate degree into a multiparty communication lower bound in the nondeterministic and Merlin–Arthur models. With this tool in hand, we obtain Theorem 1.7 from our one-sided approximate degree lower bound (Theorem 1.4).
1.6. Multiparty communication classes
Theorem 1.7 sheds new light on communication complexity classes, defined in the seminal work of Babai, Frankl, and Simon [4]. An infinite family {F_n}, where each F_n is a k-party number-on-the-forehead communication problem, is said to be efficiently solvable in a given model of communication if F_n has communication complexity at most log^c n in that model, for a large enough constant c and all n. One defines BPP_k, NP_k, coNP_k, and MA_k as the classes of families that are efficiently solvable in the randomized, nondeterministic, co-nondeterministic, and Merlin–Arthur models, respectively. In particular, MA_k is a superset of NP_k and BPP_k. In these definitions, k can be any function of n, including constant functions such as k = 3. The relations among these multiparty classes have been actively studied over the past decade [9, 28, 20, 22, 11, 10, 23, 44, 43]. In particular, for small enough k, it is known that coNP_k is not contained in BPP_k or even MA_k. Quantitatively, these results can be summarized as follows.
(i) Prior to our work, the strongest k-party separation of co-nondeterministic versus randomized communication complexity was O(log n) versus Ω̃(√n), up to factors depending on k, proved in [43] for the set disjointness function.
(ii) A nonconstructive separation was also obtained in [10].
(iii) The best previous k-party separation of co-nondeterministic versus Merlin–Arthur communication complexity was O(log n) versus Ω̃(n^{1/4}), up to factors depending on k, proved in [43] for the set disjointness problem.
Theorem 1.7 gives a quadratic improvement on these previous separations, excluding the nonconstructive separation in [10]. Moreover, our quadratically improved separations are achieved for a particularly simple function, namely, the polynomial-size constant-width CNF formula of Theorem 1.7. In the regime of constant k, our separations of coNP_k from BPP_k and NP_k are essentially optimal, and our separation of coNP_k from MA_k is within a square of optimal. Recall that no explicit lower bounds at all are currently known in the regime k ≥ log n, even for deterministic communication. We state our contributions for communication complexity classes as a corollary below.
Corollary 1.8.
coNP_k ⊄ NP_k, coNP_k ⊄ BPP_k, and coNP_k ⊄ MA_k.
Proof.
The claims involving NP_k and MA_k are immediate from Theorem 1.7 and the definitions of NP_k, coNP_k, and MA_k. For the remaining separation, we need only prove the upper bound N(F) = O(log n) for the function F of Theorem 1.6. Recall from Theorem 1.6 that F is computable by a DNF formula with n^{O(1)} terms. This gives the desired nondeterministic protocol: the parties “guess” one of the terms (for a cost of O(log n) bits), evaluate it (using another O(1) bits of communication), and output the result. ∎
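The guess-and-verify protocol in the proof can be sketched as follows on a toy DNF (hypothetical formula; the O(1)-bit cost of evaluating a guessed term is abstracted away):

```python
import math
from itertools import product

# Toy DNF; the nondeterministic guess is the index of a term, for ceil(log2 s) bits.
terms = [[(0, True), (1, True)], [(1, False), (2, True)], [(0, False), (2, False)]]
s = len(terms)
guess_bits = math.ceil(math.log2(s))

def sat(term, x):
    return all(x[i] == int(b) for i, b in term)

def f(x):
    return int(any(sat(t, x) for t in terms))

# Nondeterministic acceptance: some guess leads to acceptance iff f(x) = 1.
for x in product((0, 1), repeat=3):
    accepts = any(sat(terms[g], x) for g in range(s))
    assert accepts == bool(f(x))

print(guess_bits)  # 2
```

For a formula with n^{O(1)} terms, the guess costs O(log n) bits, matching the accounting in the proof.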
1.7. Quantum communication complexity
We adopt the standard model of quantum communication, where two parties exchange quantum messages according to an agreed-upon protocol in order to solve a two-party communication problem F. As usual, an input (x, y) is split between the parties, with one party knowing only x and the other party knowing only y. We allow arbitrary prior entanglement at the start of the communication. A measurement at the end of the protocol produces a single-bit answer, which is interpreted as the protocol output. An ε-error protocol for F is required to output, on every input (x, y), the correct value F(x, y) with probability at least 1 − ε. The cost of a quantum protocol is the total number of quantum bits exchanged in the worst case on any input. The ε-error quantum communication complexity of F, denoted Q*_ε(F), is the least cost of an ε-error quantum protocol for F. The asterisk in Q*_ε(F) indicates that the parties share arbitrary prior entanglement. The standard setting of the error parameter is ε = 1/3, which is as usual without loss of generality. For a detailed formal description of the quantum model, we refer the reader to [51, 33, 38].
Proving lower bounds for bounded-error quantum communication is significantly more challenging than for randomized communication. An illustrative example is the set disjointness problem on n bits. Babai, Frankl, and Simon [4] obtained an Ω(√n) randomized communication lower bound for this function in 1986 using a short and elementary proof, which was later improved to a tight Ω(n) in [25, 32, 7]. This is in stark contrast with the quantum model, where the best lower bound for set disjointness was for a long time a trivial Ω(log n), until a tight Ω(√n) was proved by Razborov [33] in 2002.
A completely different proof of the Ω(√n) lower bound for set disjointness was given in [38] by introducing the pattern matrix method. Since then, the method has produced the strongest known quantum lower bounds for AC⁰. Of these, the best lower bound prior to our work was Ω(n^{1−δ}), due to Bun and Thaler [17], where the constant δ > 0 can be taken arbitrarily small at the expense of circuit depth. In the following theorem, we resolve the quantum communication complexity of AC⁰ in full by proving that polynomial-size DNF formulas achieve near-maximum communication complexity.
Theorem 1.9.
Let δ > 0 and c ≥ 1 be any constants. Then for each n, there is an explicitly given two-party communication problem F that has quantum communication complexity Q*_{1/2 − n^{-c}}(F) = Ω(n^{1−δ})
and is representable by a DNF formula of size n^{O(1)} and width O(1).
This theorem remains valid for CNF formulas since quantum communication complexity is invariant under function negation. As in all of our results, Theorem 1.9 essentially matches the trivial upper bound, showing that AC⁰ circuits of depth 2 achieve nearly the maximum possible complexity. Again analogous to our other results, Theorem 1.9 holds not only for bounded-error communication but also for communication with error 1/2 − n^{−c} for any constant c ≥ 1. The error parameter in Theorem 1.9 is optimal and cannot be further increased to 1/2 − n^{−ω(1)}: as remarked above, any DNF formula with s terms has a classical communication protocol with error 1/2 − Ω(1/s) and cost O(1) bits. Lastly, Theorem 1.9 is optimal with respect to circuit depth because circuits of depth 1 have communication complexity O(1) bits even in the classical deterministic model.
In our overview so far, we have separately considered the classical multiparty model and the quantum two-party model. By combining the features of these models, one arrives at the k-party number-on-the-forehead model with quantum players. Our results readily generalize to this setting. Specifically, for any constants δ > 0 and c ≥ 1, we give an explicit DNF formula F of size n^{O(1)} and width O(1) such that computing F in the k-party quantum number-on-the-forehead model with error 1/2 − n^{−c} requires Ω(n^{1−δ}/c₀^k) quantum bits, for a constant c₀ > 1 independent of n and k. For more details, see Remark 5.9.
1.8. Previous approaches
In the remainder of the introduction, we sketch our proof of Theorem 1.1. To properly set the stage for our work, we start by reviewing the relevant background and previous approaches. The notation that we adopt below is standard, and we defer its formal review to Section 2.
Dual view of approximation
Let be a Boolean function of interest, where is an arbitrary finite subset of Euclidean space. The approximate degree of is defined analogously to functions on the Boolean hypercube: is the minimum degree of a real polynomial such that for every A valuable tool in the analysis of approximate degree is linear programming duality, which gives a powerful dual view of approximation [38]. This dual characterization states that if and only if there is a function with the following two properties: ; and for every polynomial of degree less than . Rephrasing, must be correlated with but completely uncorrelated with any polynomial of degree less than Such a function is variously referred to in the literature as a “dual object,” “dual polynomial,” or “witness” for The dual characterization makes it possible to prove any approximate degree lower bound by constructing the corresponding witness This good news comes with a caveat: for all but the simplest functions, the construction of is very demanding, and linear programming duality gives no guidance in this regard.
Componentwise composition
The construction of a dual object is more approachable for composed functions since one can hope to break them up into constituent parts, construct a dual object for each, and recombine these results. Formally, define the componentwise composition of functions and as the Boolean function given by To construct a dual object for one starts by obtaining dual objects and for the constituent functions and , respectively, either by direct construction or by appeal to linear programming duality. They are then combined to yield a dual object for the composed function, using dual componentwise composition [41, 26]:
(1.7) |
This composed dual object typically requires additional work to ensure strong enough correlation with the composed function f ∘ g. Among the generic tools available to assist in this process is a “corrector” object due to Razborov and Sherstov [34], with the following four properties: (i) the corrector is orthogonal to low-degree polynomials; (ii) it takes on the value 1 at a prescribed point of the hypercube; (iii) it is bounded at inputs of low Hamming weight; and (iv) it vanishes at all other points of the hypercube. Using this object, suitably shifted and scaled, one can surgically correct the behavior of a given dual object ψ at a substantial fraction of the inputs without affecting ψ's orthogonality to low-degree polynomials. This technique played an important role in previous work, e.g., [17, 14, 18, 49].
Componentwise composition by itself does not allow one to construct hard-to-approximate functions from easy ones. To see why, consider arbitrary functions f and g on n and m variables, with approximate degrees at most n^{1−δ} and m^{1−δ}, respectively, for some δ > 0. It is well-known [42] that the composed function f ∘ g on nm variables has approximate degree O(n^{1−δ} m^{1−δ}) = O((nm)^{1−δ}). This means that relative to the new number of variables, the composed function is asymptotically no harder to approximate than the constituent functions f and g. In particular, one cannot use componentwise composition to transform functions on n bits with ε-approximate degree at most n^{1−δ} into functions on N bits with ε-approximate degree polynomially larger than N^{1−δ}.
Previous best bound for AC⁰
In the previous best result on the ε-approximate degree of AC⁰, Bun and Thaler [17] approached the componentwise composition in an ingenious way to amplify the approximate degree, for a careful choice of the inner function. Let f on n variables be given, with ε-approximate degree d. Bun and Thaler consider the componentwise composition f ∘ g, for a carefully chosen gadget g on a small enough number m of variables. It was shown in earlier work [41, 16] that dual componentwise composition witnesses a corresponding lower bound on the approximate degree of f ∘ g. Bun and Thaler make the crucial observation that the dual object for g has most of its mass on inputs of low Hamming weight, which in view of (1.7) implies that the dual object for f ∘ g places most of its mass on inputs of low Hamming weight. The authors of [17] then use the Razborov–Sherstov corrector object to transfer the small amount of mass that the dual object for f ∘ g places on inputs of high Hamming weight to inputs of low Hamming weight. The resulting dual object is supported entirely on inputs of low Hamming weight and therefore witnesses a lower bound on the approximate degree of the restriction of f ∘ g to inputs of low Hamming weight.
The restriction takes as input variables but is defined only when its input string has Hamming weight This makes it possible to represent the input to more economically, by specifying the locations of the nonzero bits inside the array of variables. Since each such location can be specified using bits, the entire input to can be specified using bits. This yields a function on variables. A careful calculation shows that this “input compression” does not hurt the approximate degree. Thus, the approximate degree of is at least the approximate degree of which as discussed above is With set appropriately, the approximate degree of is polynomially larger than that of
This passage from to is the desired hardness amplification for approximate degree. To obtain an lower bound on the approximate degree of , the authors of [17] start with a trivial circuit and apply the hardness amplification step a constant number of times, until approximate degree is reached.
Limitations of previous approaches to
Bun and Thaler’s hardness amplification for approximate degree rests on two pillars. The first is componentwise composition, whereby the given function is composed componentwise with independent copies of the gadget In this gadget, the gate is necessary to control the accumulation of error and to ensure the correlation property of the dual polynomial. The resulting composed function is defined on variables. The second pillar of [17] is input compression, where the length- input to is represented compactly as an array of strings of length each. The circuitry to implement these two pillars is expensive, requiring in both cases a polynomial-size DNF formula of width . As a result, even a single iteration of the Bun–Thaler hardness amplification cannot be implemented as a polynomial-size DNF or CNF formula.
To prove an approximate degree lower bound for small in the framework of [17], one needs a number of iterations that grows with . Thus, the overall circuit produced in [17] has a large constant number of alternating layers of AND and OR gates of logarithmic and polynomial fan-in, respectively, and in particular cannot be flattened into a polynomial-size DNF or CNF formula. Proving Theorem 1.1 within this framework would require reducing the fan-in of the AND gates from to which would completely destroy the componentwise composition and input compression pillars of [17]. These pillars are present in all follow-up papers [17, 14, 18, 49] and seem impossible to get around, prompting the authors of [18, p. 14] to entertain the possibility that the approximate degree of at any given depth is much smaller than once conjectured. We show that this is not the case.
1.9. Our proof
In this paper, we design hardness amplification from first principles, without using componentwise composition or input compression. Our approach efficiently amplifies the approximate degree even for functions with sparse input, while ensuring that each hardness amplification stage is implementable by a monotone circuit of constant depth with AND gates of constant fan-in and OR gates of polynomial fan-in. As a result, repeating our process any constant number of times produces a polynomial-size DNF formula of constant width.
Our approach at a high level
Let be a given function. Let denote the restriction of to inputs of Hamming weight at most and let be the approximate degree of this restriction. The total number of variables can be vastly larger than ; in the actual proof, we will set for a constant Since an input to is guaranteed to have Hamming weight at most we can think of as the disjunction of vectors of Hamming weight at most each:
where each is either the zero vector or a basis vector , and the disjunction on the right-hand side is applied coordinate-wise. Our approach centers around encoding each as a string of bits so as to make the decoding difficult for polynomials but easy for circuits. Ideally, we would like a decoding function with the following properties:
-
(i)
the sets for are indistinguishable by polynomials of degree up to , for some parameter ;
-
(ii)
the sets for contain only strings of Hamming weight
-
(iii)
is computable by a constant-depth monotone circuit with AND gates of constant fan-in and OR gates of polynomial fan-in.
With such in hand, define by
Then, one can reasonably expect that approximating is harder than approximating Indeed, an approximating polynomial has access only to the encoded input . Decoding this input presumably involves computing one way or another, which by property (i) requires a polynomial of degree greater than . Once the decoded string is available, the polynomial supposedly needs to compute on that input, which in and of itself requires degree Altogether, we expect to have approximate degree on the order of Moreover, property (ii) ensures that is hard to approximate even on inputs of Hamming weight putting us in a strong position for another round of hardness amplification. Finally, property (iii) guarantees that the result of constantly many rounds of hardness amplification is computable by a DNF formula of polynomial size and constant width.
Actual implementation
As one might suspect, the above program is too bold and cannot be implemented literally. Our actual construction of achieves (i)–(iii) only approximately. In more detail, let be a sufficiently large constant. For each we construct a probability distribution on that has all but a vanishing fraction of its mass on inputs of Hamming weight exactly and moreover any two such distributions and are indistinguishable by polynomials of low degree. We are further able to ensure that an input of Hamming weight belongs to the support of at most one of the distributions . Thus, the are in essence supported on pairwise disjoint sets of strings of Hamming weight and are pairwise indistinguishable by polynomials of low degree. The decoding function works by taking an input of Hamming weight and determining which of the distributions has in its support—a highly efficient computation realizable as a monotone -DNF formula. With small probability, will receive as input a string of Hamming weight larger than in which case the decoding may fail.
Construction of the
Central to our work is the number-theoretic notion of -discrepancy, which is a measure of pseudorandomness or aperiodicity of a given set of integers modulo Formally, the -discrepancy of a nonempty finite set is defined as
where is a primitive -th root of unity. The construction of sparse sets with low discrepancy is a well-studied problem in combinatorics and theoretical computer science. By building on previous work [2, 48], we construct a sparse set of integers with small discrepancy in our regime of interest. For our application, we set the modulus
Continuing, let denote the family of cardinality- subsets of To design the distributions we need an explicit coloring that is balanced, in the sense that for nearly all large enough subsets and all the family accounts for almost exactly a fraction of all cardinality- subsets of The existence of a highly balanced coloring follows by the probabilistic method, and we construct one explicitly using the sparse set of integers with small -discrepancy constructed earlier in the proof.
Our next ingredient is a dual polynomial for the OR function, a staple in approximate degree lower bounds. An important property of is that it places a constant fraction of its mass on the point Translating from to a point of slightly larger Hamming weight results in a new dual polynomial, call it Analogous to the new dual polynomial has a constant fraction of its mass on and the rest on inputs that are greater than or equal to componentwise.
For notational convenience, let us now rename ’s range elements to respectively. For define to be the average of the dual polynomials where ranges over all characteristic vectors of the sets in Being a convex combination of dual polynomials, each is a dual object orthogonal to polynomials of low degree. Observe further that each is supported on inputs of Hamming weight at least and any input of Hamming weight exactly belongs to the support of exactly one For inputs of Hamming weight greater than , a remarkable thing happens: is almost the same for all We prove this by exploiting the fact that is highly balanced. As a result, the “common part” of the for inputs of Hamming weight greater than can be subtracted out to obtain a function for each . While these new functions are not dual polynomials, the difference of any two of them is since . Put another way, the are pairwise indistinguishable by low-degree polynomials. By defining the in a somewhat more subtle way, we further ensure that each is nonnegative. The distribution can then be taken to be the normalized function This construction ensures all the properties that we need: has nearly all of its mass on inputs of Hamming weight ; an input of Hamming weight belongs to the support of at most one distribution ; and any pair of distributions are indistinguishable by a low-degree polynomial. Observe that in our construction, is close to the uniform probability distribution on the characteristic vectors of the sets in
2. Preliminaries
2.1. General notation
For a string and a set we let denote the restriction of to the indices in In other words, where are the elements of The characteristic vector of a set is given by
Given an arbitrary set and elements the Kronecker delta is defined by
For a logical condition we use the Iverson bracket
We let denote the set of natural numbers. We use the comparison operators in a unary capacity to denote one-sided intervals of the real line. Thus, stand for respectively. We let and stand for the natural logarithm of and the logarithm of to base respectively. The term Euclidean space refers to for some positive integer We let denote the vector whose -th component is and the others are Thus, the vectors form the standard basis for For a complex number we denote the real part, imaginary part, and complex conjugate of as usual by and respectively. We typeset the imaginary unit in boldface to distinguish it from the index variable . For an arbitrary integer and a positive integer , recall that denotes the unique element of that is congruent to modulo
For a set we let denote the linear space of real-valued functions on The support of a function is denoted For real-valued functions with finite support, we adopt the usual norms and inner product:
This covers as a special case functions on finite sets. Analogous to functions, we adopt the familiar norms for vectors in Euclidean space: and The tensor product of and is denoted and given by The tensor product ( times) is abbreviated We frequently omit the argument in equations and inequalities involving functions, as in . Such statements are to be interpreted pointwise. For example, the statement “ on ” means that for every For vectors and the notation means that for each .
We adopt the standard notation for function composition, with defined by In addition, we use the operator to denote the componentwise composition of Boolean functions. Formally, the componentwise composition of and is the function given by Componentwise composition is consistent with standard composition, which in the context of Boolean functions is only defined for Thus, the meaning of is determined by the range of and is never in doubt.
For a natural number we abbreviate . For a set and an integer we let stand for the family of cardinality- subsets of :
Analogously, for any set , we define
To illustrate, denotes the family of subsets of that have cardinality at most Analogously, we have the symbols Throughout this manuscript, we use brace notation as in to specify multisets rather than sets, the distinction being that the number of times an element occurs is taken into account. The cardinality of a finite multiset is defined to be the total number of element occurrences in , with each element counted as many times as it occurs. The equality and subset relations on multisets are defined analogously, with the number of element occurrences taken into account. For example, but . Similarly, but
2.2. Boolean strings and functions
We identify the Boolean values “true” and “false” with and respectively, and view Boolean functions as mappings for a finite set The familiar functions and are given by and We abbreviate For Boolean strings we let denote their bitwise XOR. The strings and are defined analogously, with the binary operator applied bitwise.
For a vector we define its weight to be If is a Boolean string, then is precisely the Hamming weight of . For any sets and we define to be the subset of vectors in whose weight belongs to :
In the case of a one-element set , we further shorten to For example, denotes the set of vectors whose components are natural numbers and sum to at most , whereas denotes the set of Boolean strings of length and Hamming weight exactly For a function on a subset we let denote the restriction of to Thus, is a function with domain given by A typical instance of this notation would be for some real number corresponding to the restriction of to Boolean strings of Hamming weight at most
2.3. Concentration of measure
Throughout this manuscript, we view probability distributions as real functions. This convention makes available the shorthand notation introduced above. In particular, for probability distributions and the symbol denotes the support of , and denotes the probability distribution given by We use the notation interchangeably with the former being more standard for probability distributions. If is a probability distribution on we consider to be defined also on any superset of with the understanding that outside
We recall the following multiplicative form of the Chernoff bound [21].
Theorem 2.1 (Chernoff bound).
Let be i.i.d. random variables with Then for all
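The displayed inequality is not legible in this copy. For concreteness, one standard multiplicative form states that if S is a sum of n i.i.d. Bernoulli random variables with mean p and mu = np, then Pr[|S − mu| ≥ delta·mu] ≤ 2·exp(−delta²·mu/3) for 0 < delta ≤ 1. The following Python sketch, an illustration rather than part of the formal development, compares an empirical tail estimate against this bound:

```python
import math
import random

def chernoff_demo(n, p, delta, trials=4000, seed=0):
    """Estimate Pr[|S - mu| >= delta*mu], where S is a sum of n i.i.d.
    Bernoulli(p) variables and mu = n*p, and return the estimate together
    with the multiplicative Chernoff bound 2*exp(-delta^2 * mu / 3)."""
    rng = random.Random(seed)
    mu = n * p
    hits = sum(
        abs(sum(rng.random() < p for _ in range(n)) - mu) >= delta * mu
        for _ in range(trials)
    )
    return hits / trials, 2 * math.exp(-delta ** 2 * mu / 3)

empirical, bound = chernoff_demo(n=500, p=0.5, delta=0.2)
assert empirical <= bound
```

For these parameters the deviation threshold is several standard deviations, so the empirical frequency is essentially zero while the bound is about 0.07.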
Theorem 2.1 assumes i.i.d. Bernoulli random variables. Hoeffding’s inequality [24], stated next, is a more general concentration-of-measure result that applies to any independent bounded random variables.
Theorem 2.2 (Hoeffding’s inequality).
Let be independent random variables with Define Then for all
The standard version of Hoeffding’s inequality, stated above, requires to be independent. Less known are Hoeffding’s results for dependent random variables, which he obtained along with Theorem 2.2 in his original paper [24]. We will specifically need the following concentration inequality for sampling without replacement [24, Section 6].
Theorem 2.3 (Hoeffding’s sampling without replacement).
Let be given reals, with for all Let be uniformly random integers that are pairwise distinct. Let for and define Then for all
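Hoeffding's result implies in particular that, for summands lying in [0, 1], the familiar tail bound Pr[|S − E S| ≥ t] ≤ 2·exp(−2t²/n) continues to hold when the n summands are drawn without replacement from a fixed finite list. The sketch below (an illustration only) checks this empirically:

```python
import math
import random

def without_replacement_tail(values, n, t, trials=20000, seed=0):
    """Estimate Pr[|S - E[S]| >= t], where S sums n values sampled
    uniformly without replacement from `values` (each value in [0, 1]),
    and return the estimate with Hoeffding's bound 2*exp(-2*t^2/n)."""
    rng = random.Random(seed)
    mean_sum = n * sum(values) / len(values)
    hits = 0
    for _ in range(trials):
        s = sum(rng.sample(values, n))
        if abs(s - mean_sum) >= t:
            hits += 1
    return hits / trials, 2 * math.exp(-2 * t * t / n)

values = [(i % 10) / 9 for i in range(200)]   # a fixed multiset in [0, 1]
empirical, bound = without_replacement_tail(values, n=50, t=8.0)
assert empirical <= bound
```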
Hoeffding’s two theorems are clearly incomparable. On the one hand, Theorem 2.2 requires independence and therefore does not apply to sampling without replacement. On the other hand, each random variable in Theorem 2.3 must be uniformly distributed on a finite multiset of values, which must further be the same multiset for all ; none of this is assumed in Theorem 2.2.
Finally, we will need a concentration-of-measure result due to Bun and Thaler [17, Lemma 4.7] for product distributions on .
Lemma 2.4 (cf. Bun and Thaler).
Let be distributions on with finite support such that
where and . Then for all
Bun and Thaler’s result in [17, Lemma 4.7] differs slightly from the statement above. The proof of Lemma 2.4 as stated can be found in [49, Lemma 3.6]. By leveraging Lemma 2.4, we obtain the following concentration result for probability distributions that are supported on the Boolean hypercube, rather than , and are shifted from the origin.
Lemma 2.5.
Fix integers Let be probability distributions on with support contained in Suppose further that
where and Then for all
2.4. Orthogonal content
For a multivariate polynomial , we let denote the total degree of , i.e., the largest degree of any monomial of We use the terms degree and total degree interchangeably in this paper. It will be convenient to define the degree of the zero polynomial by For a real-valued function supported on a finite subset of , the orthogonal content of denoted , is the minimum degree of a real polynomial for which We adopt the convention that if no such polynomial exists. It is clear that with the extremal cases and Additional facts about orthogonal content are given by the following two propositions.
Proposition 2.6.
Let and be nonempty finite subsets of Euclidean space. Then:
-
(i)
for all
-
(ii)
for all and
Proposition 2.7.
Define Fix functions where is a finite subset of Euclidean space. Suppose that
(2.2)
where is a positive integer. Then for every polynomial the mapping is a polynomial on of degree at most
Proof.
By linearity, it suffices to consider factored polynomials where each is a nonzero polynomial on In this setting,
(2.3)
By (2.2), we have for any index with As a result, polynomials with do not contribute to the degree of the right-hand side of (2.3) as a function of For the other polynomials , the inner product is a linear polynomial in namely,
Thus, polynomials with contribute at most each to the degree. Summarizing, the right-hand side of (2.3) is a real polynomial in of degree at most ∎
2.5. Polynomial approximation
For a real number and a function on a finite subset of Euclidean space, the -approximate degree of is denoted and is defined to be the minimum degree of a polynomial such that For , it will be convenient to define since no polynomial satisfies in this case. We focus on the approximate degree of Boolean functions In this setting, the standard choice of the error parameter is This choice is without loss of generality since for every Boolean function and every constant In what follows, we refer to -approximate degree simply as “approximate degree.” The notion of approximate degree has the following dual characterization [38, 39].
Fact 2.8.
Let be given, for a finite set . Let be an integer and a real number. Then if and only if there exists a function such that
This characterization of approximate degree can be verified using linear programming duality, cf. [38, 39]. We now recall a variant of approximate degree for one-sided approximation. For a Boolean function and the one-sided -approximate degree of is denoted and defined to be the minimum degree of a real polynomial such that
We refer to any such polynomial as a one-sided approximant for with error As usual, the canonical setting of the error parameter is In the pathological case , it will be convenient to define . Observe the asymmetric treatment of and in this formalism. In particular, the one-sided approximate degree of Boolean functions is in general not invariant under negation. One-sided approximate degree enjoys the following dual characterization [16].
Fact 2.9.
Let be given, for a finite set . Let be an integer and a real number. Then if and only if there exists a function such that
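To make the definitions of this section concrete, consider the OR function on three bits in the {0, 1} representation (an illustrative aside; the representation is an assumption, since the displayed conventions are not legible in this copy). By the symmetrization argument of Section 2.7, a degree-1 approximant may be averaged over permutations into a univariate line in the Hamming weight without increasing the error, so the least error achievable in degree 1 can be estimated by a grid search:

```python
def best_linear_error(step=0.01):
    """Minimum over lines a + b*w of the maximum error in approximating
    OR on three bits, viewed (via symmetrization) as the function of the
    Hamming weight w that equals 0 at w = 0 and 1 at w in {1, 2, 3}."""
    best = float("inf")
    steps = round(2 / step)
    for i in range(steps + 1):
        a = -1 + i * step
        for j in range(steps + 1):
            b = -1 + j * step
            err = max(abs(a + b * w - (1 if w > 0 else 0)) for w in (0, 1, 2, 3))
            if err < best:
                best = err
    return best

# The least degree-1 error is exactly 1/3, attained at a = b = 1/3.
e = best_linear_error()
assert abs(e - 1 / 3) < 0.02
```

Thus no degree-1 polynomial achieves error below 1/3, consistent with the approximate degree of OR on three bits being greater than 1.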
2.6. Dual polynomials
Facts 2.8 and 2.9 make it possible to prove lower bounds on approximate degree in a constructive manner, by exhibiting a dual object that serves as a witness. This object is referred to as a dual polynomial. Often, a dual polynomial for a composed function can be constructed by combining dual objects for various components of Of particular importance in the study of is the dual object for the OR function. The first dual polynomial for OR was constructed by Špalek [50], with many refinements and generalizations obtained in follow-up work [15, 45, 46, 17, 14, 49]. We will use the following construction from [49, Lemma B.2].
Lemma 2.10.
Let be given, . Then for some constant and every integer there is an explicitly given function such that
A useful tool in the construction of dual polynomials is the following lemma due to Razborov and Sherstov [34].
Lemma 2.11 (Razborov and Sherstov).
Fix integers and where Then there is an explicitly given function such that
(2.4)
(2.5)
(2.6)
(2.7)
In more detail, this result corresponds to taking and in the proof of Lemma 3.2 of [34]. We will need the following natural generalization of Lemma 2.11.
Lemma 2.12.
Fix integers and where Let be a string with Then there is an explicitly given function such that
(2.8)
(2.9)
(2.10)
(2.11)
Proof.
Set Lemma 2.11 gives an explicit function that satisfies (2.4)–(2.7). Define by
where . Then (2.9) and (2.10) are immediate from (2.5) and (2.6), respectively. Property (2.11) follows from (2.7) in light of Proposition 2.6 (ii). To verify the remaining property (2.8), fix any input with . Then the definition of implies that is the zero vector, whereas (2.4) implies that is either or a string of Hamming weight at most In the former case, we have ; in the latter case, and ∎
Informally, Lemmas 2.11 and 2.12 are useful when one needs to adjust a dual object’s metric properties while preserving its orthogonality to low-degree polynomials. These lemmas play a basic role in several recent papers [34, 17, 14, 18, 49] as well as our work. For the reader’s benefit, we encapsulate this procedure as Lemma 2.13 below and provide a detailed proof.
Lemma 2.13.
Let be given. Fix integers . Then there is an explicitly given function such that
(2.12)
(2.13)
(2.14)
Proof (adapted from [34, 17, 14, 18, 49]).
For the lemma holds trivially with In what follows, we treat the complementary case
For each , Lemma 2.12 constructs a function that obeys (2.8)–(2.11). Define
Then for , properties (2.8) and (2.9) force and consequently This settles (2.12). Property (2.13) is justified by
where the last two steps use Proposition 2.6(i) and (2.11), respectively. The final property (2.14) can be derived as follows:
where the last two steps use the triangle inequality and (2.10), respectively. ∎
2.7. Symmetrization
Let denote the symmetric group on elements. For a permutation and an arbitrary sequence we adopt the shorthand A function is called symmetric if it is invariant under permutation of the input variables: for all and Symmetric functions on are intimately related to univariate polynomials, as was first observed by Minsky and Papert in their symmetrization argument [30].
Proposition 2.14 (Minsky and Papert).
Let be a given polynomial. Then the mapping
is a univariate polynomial on of degree at most
The next result, proved in [49, Corollary 2.13], generalizes Minsky and Papert’s symmetrization to the setting when are vectors rather than bits.
Fact 2.15 (Sherstov and Wu).
Let be a given polynomial. Then the mapping
(2.15)
is a polynomial on of degree at most
Minsky and Papert’s symmetrization corresponds to in Fact 2.15.
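The following sketch illustrates Proposition 2.14 on a toy example (the polynomial p below is hypothetical, chosen only for illustration): averaging an asymmetric polynomial over all permutations of its inputs yields a function of the Hamming weight alone, and that function is interpolated by a univariate polynomial of no larger degree.

```python
from itertools import permutations, product

def p(x):
    # an asymmetric polynomial of degree 2 in three Boolean variables
    return 2 * x[0] * x[1] + x[2]

def symmetrized(x):
    """Average of p over all permutations of the input coordinates."""
    perms = list(permutations(range(3)))
    return sum(p(tuple(x[i] for i in pi)) for pi in perms) / len(perms)

# The symmetrized value depends only on the Hamming weight ...
by_weight = {}
for x in product((0, 1), repeat=3):
    w = sum(x)
    v = symmetrized(x)
    assert abs(by_weight.setdefault(w, v) - v) < 1e-12

# ... and agrees with a univariate polynomial of degree at most
# deg(p) = 2 (here w^2/3), recovered by Lagrange interpolation
# through w = 0, 1, 2 and verified at the remaining point w = 3.
def q(w):
    y0, y1, y2 = by_weight[0], by_weight[1], by_weight[2]
    return (y0 * (w - 1) * (w - 2) / 2
            - y1 * w * (w - 2)
            + y2 * w * (w - 1) / 2)

assert abs(q(3) - by_weight[3]) < 1e-12
```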
2.8. Number theory
For positive integers and that are relatively prime, we let denote the multiplicative inverse of modulo The following fact is well-known and straightforward to verify; see, e.g., [48, Fact 2.8].
Fact 2.16.
For any positive integers and that are relatively prime,
The prime counting function for a real argument evaluates to the number of prime numbers less than or equal to In this manuscript, it will be clear from the context whether refers to or the prime counting function. The asymptotic growth of the latter is given by the prime number theorem, which states that The following explicit bound on is due to Rosser [35].
Fact 2.17 (Rosser).
For
The number of distinct prime divisors of a natural number is denoted . The following first-principles bound on is asymptotically tight for infinitely many see [48, Fact 2.11] for details.
Fact 2.18.
The number of distinct prime divisors of obeys
In particular,
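Both quantities are easy to explore by brute force. The sketch below (illustrative only) checks Rosser's bound in the form pi(x) > x/ln x, valid for all x ≥ 17, and computes the number of distinct prime divisors, whose slow growth underlies Fact 2.18:

```python
import math

def primes_upto(x):
    """Primes <= x by trial division (adequate for small x)."""
    ps = []
    for n in range(2, x + 1):
        if all(n % p for p in ps if p * p <= n):
            ps.append(n)
    return ps

def omega(n):
    """Number of distinct prime divisors of n."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    return count + (1 if n > 1 else 0)

# Rosser's bound pi(x) > x / ln(x), valid for all x >= 17.
for x in (17, 100, 1000):
    assert len(primes_upto(x)) > x / math.log(x)

# omega grows very slowly: the primorial 2*3*5*7*11 = 2310 maximizes
# the number of distinct prime divisors among all integers below 2311.
assert omega(2310) == 5
assert max(omega(n) for n in range(2, 2311)) == 5
```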
3. Balanced colorings
For integers and consider a mapping We refer to any such as a coloring of with colors. An important ingredient in our work is the construction of a balanced coloring, in the following technical sense.
Definition 3.1.
Let be a given coloring. For a subset we say that is -balanced on iff for each
We define to be -balanced iff
for all
As one might expect, a uniformly random coloring is balanced with high probability; we establish this fact in Section 3.1. In Sections 3.2–3.5 that follow, we construct a highly balanced coloring based on an integer set with low discrepancy. The reader who is interested only in the quantitative aspect of our theorems and is not concerned about explicitness may read Section 3.1 and skip, without loss of continuity, to Section 4.
3.1. Existence of balanced colorings
The next lemma uses the probabilistic method to establish the existence of balanced colorings with excellent parameters.
Lemma 3.2.
Let be given. Let be positive integers with and
(3.1)
Then there exists an -balanced coloring .
Proof.
Let be a uniformly random coloring. For fixed and the cardinality is the sum of independent Bernoulli random variables, each with expected value As a result,
(3.2)
where the second step applies the union bound over the third step uses the Chernoff bound (Theorem 2.1), and the fifth step uses (3.1). Now
where the next-to-last step uses (3.2). We conclude that there exists a coloring with
which is the definition of an -balanced coloring. ∎
For our purposes, the following consequence of Lemma 3.2 will be sufficient.
Corollary 3.3.
Let be positive integers with Let be given with
Then there exists an -balanced coloring
Proof.
We have
where the next-to-last step uses the hypothesis By Lemma 3.2, we conclude that there is an -balanced coloring ∎
3.2. Discrepancy defined
Discrepancy is a measure of pseudorandomness or aperiodicity of a multiset of integers with respect to a given modulus Formally, let be a given integer. The -discrepancy of a nonempty multiset of arbitrary integers is defined as
where is a primitive -th root of unity; the right-hand side is obviously the same for any such . Equivalently, we may write
where the maximum is over -th roots of unity other than . Yet another way to think of -discrepancy is in terms of the discrete Fourier transform on Specifically, consider the frequency vector of , where is the total number of element occurrences in that are congruent to modulo Applying the discrete Fourier transform to produces the sequence which is a permutation of for a primitive -th root of unity Thus, the -discrepancy of coincides up to a normalizing factor with the largest absolute value of a nonconstant Fourier coefficient of the frequency vector of The notion of -discrepancy has a long history in combinatorics and theoretical computer science; see [48] for a bibliographic overview.
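The displayed definition is not legible in this copy; the sketch below assumes the standard formulation suggested by the surrounding prose, namely disc_D(Z) = |Z|⁻¹ · max over k = 1, …, D−1 of |Σ_{z∈Z} ω^{kz}|, where ω = e^{2πi/D}. Two sanity checks follow: a full residue system is perfectly equidistributed, while a multiset confined to a single residue class is maximally periodic.

```python
import cmath

def discrepancy(Z, D):
    """D-discrepancy of a multiset Z of integers, assumed here to be
    (1/|Z|) * max over k = 1, ..., D-1 of |sum over z in Z of omega^(k*z)|,
    where omega = exp(2*pi*i/D) is a primitive D-th root of unity."""
    omega = cmath.exp(2j * cmath.pi / D)
    return max(abs(sum(omega ** (k * z) for z in Z)) for k in range(1, D)) / len(Z)

D = 12
# A full residue system is perfectly equidistributed: discrepancy ~ 0.
assert discrepancy(list(range(D)), D) < 1e-9
# A multiset inside one residue class is maximally periodic: discrepancy ~ 1.
assert abs(discrepancy([0, D, 2 * D, 3 * D], D) - 1.0) < 1e-9
```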
Lemma 3.4 (Discrepancy under sampling without replacement).
Fix integers and Let be a multiset of integers. Then for all
where is understood to be a multiset of cardinality .
Proof.
Fix an -th root of unity Then range in . Now, let be a uniformly random subset. Then the Hoeffding inequality for sampling without replacement (Theorem 2.3) implies that
Analogously,
Combining these two equations shows that for every -th root of unity
(3.3)
Now
(3.4)
where the maximum in all equations is taken over -th roots of unity Using (3.3) and the union bound over we see that the right-hand side of (3.4) is bounded by with probability at least ∎
3.3. A low-discrepancy set
The construction of sparse integer sets with small discrepancy relative to a given modulus is a well-studied problem. There is an inherent trade-off between the size of the set and the discrepancy it achieves, and different works have focused on different regimes depending on the application at hand. We work in a regime not considered previously: for any constant , we construct a set of cardinality at most that has -discrepancy at most for some constant We construct such a set based on the following result.
Theorem 3.5 (cf. [2, 48]).
Fix an integer and reals and . Let be an integer with
Fix a set for each prime with . Suppose further that the cardinalities of any two sets from among the differ by a factor of at most Consider the multiset
(3.5)
Then the elements of are pairwise distinct and nonzero. Moreover, if then
for some explicitly given constant independent of
Ajtai et al. [2] proved a special case of Theorem 3.5 for prime and Their argument was generalized in [48, Theorem 3.6] to arbitrary moduli , again in the setting of . The treatment in [48] in turn readily generalizes to any and for the reader’s convenience we provide a complete proof of Theorem 3.5 in Appendix A. With this result in hand, we obtain the low-discrepancy set with the needed parameters:
Theorem 3.6 (Explicit low-discrepancy set).
For all integers and there is an explicitly given nonempty set with
(3.6)
(3.7)
where is an explicitly given absolute constant independent of and
Proof.
Facts 2.17 and 2.18 imply that
(3.8)
(3.9)
for some integer that is an absolute constant. Moreover, can be easily calculated from the explicit bounds in Facts 2.17 and 2.18. We will show that the theorem holds for some constant
For the theorem is trivial since the set achieves Also, if the right-hand side of (3.7) exceeds , then (3.7) holds trivially for the set In what follows, we treat the remaining case when
(3.10)
(3.11)
The latter condition forces
(3.12)
Set and Then (3.10) and (3.12) imply that , and As a result, Theorem 3.5 is applicable with the sets for prime The discrepancy of these sets is given by . Define by (3.5). The interval contains prime numbers, of which at most are divisors of We have
where the first step uses (3.8), (3.9), and , and the last step uses (3.11). We conclude that contains a prime that does not divide which in turn implies that is nonempty. Continuing, forces in the notation of Theorem 3.5. As a result, Theorem 3.5 guarantees (3.7) for a large enough constant We note that can be easily calculated from the constant in Theorem 3.5. Since by definition, the proof is complete. ∎
3.4. Discrepancy and balanced colorings
We will leverage the low-discrepancy integer set in Theorem 3.6 to construct a balanced coloring of For this, we now develop a connection between these two notions of pseudorandomness. We will henceforth denote the modulus by since in our construction, the modulus is set equal to the number of colors in the coloring of We start with a technical lemma.
Lemma 3.7.
Fix integers with and Let be a multiset of integers. Then for all
Proof.
Let be a primitive -th root of unity. Then
where the final expectation is taken over a uniformly random tuple of indices that are pairwise distinct. Therefore,
(3.13)
We now introduce conditioning to make independent random variables. Specifically, can be generated by the following two-step procedure:
-
(i)
pick uniformly random sets that are pairwise disjoint;
-
(ii)
for pick uniformly at random from among the elements of .
By symmetry, this procedure generates every tuple of pairwise distinct integers with equal probability. Importantly, conditioning on makes independent. Now (3.13) gives
(3.14)
where for each is a multiset of cardinality
Let be the event that has -discrepancy greater than and let . Conditioned on we get since -discrepancy is at most Conditioned on we have by definition that Thus,
(3.15)
Recall that are identically distributed, namely, each has the distribution of a uniformly random subset of of cardinality As a result, Lemma 3.4 guarantees that occurs with probability at most . Applying the union bound over all
(3.16)
We are now in a position to give our general transformation of a low-discrepancy integer set into a balanced coloring of
Theorem 3.8 (From a low-discrepancy set to a balanced coloring).
Let be integers with and Let be a multiset of integers. Define by
(3.17)
Let be arbitrary. Then is -balanced, where
Proof.
Let be arbitrary. Then Lemma 3.4 implies that for all but a fraction of the sets
(3.18)
It remains to prove that is -balanced on every set that satisfies (3.18). We have
where the second step uses the definition of the third step applies Lemma 3.7, the fourth step uses (3.18) and , and the fifth step uses the definition of . We have shown that is -balanced on , thereby completing the proof. ∎
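Equation (3.17) is not legible in this copy; the sketch below assumes the natural reading χ(T) = (Σ_{i∈T} z_i) mod m, which is consistent with the setup of this section. It contrasts a generic multiset Z, which spreads the m colors nearly uniformly over the cardinality-k subsets, with a Z confined to a single residue class, which is maximally unbalanced.

```python
import random
from itertools import combinations

def coloring_balance(z, m, S, k):
    """For the (assumed) coloring chi(T) = (sum of z_i over i in T) mod m
    of the cardinality-k subsets of S, return the maximum deviation of any
    color class from the ideal fraction 1/m."""
    counts = [0] * m
    total = 0
    for T in combinations(S, k):
        counts[sum(z[i] for i in T) % m] += 1
        total += 1
    return max(abs(c / total - 1 / m) for c in counts)

n, m, k = 16, 5, 4
rng = random.Random(1)

# A generic multiset Z spreads the colors nearly uniformly ...
z_good = [rng.randrange(10 * m) for _ in range(n)]
assert coloring_balance(z_good, m, range(n), k) < 0.2

# ... whereas a Z inside one residue class makes every subset the same
# color, so a single class holds everything.
z_bad = [m] * n
assert coloring_balance(z_bad, m, range(n), k) == 1 - 1 / m
```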
3.5. An explicit balanced coloring
Theorem 3.8 transforms any integer set with small -discrepancy into a balanced coloring with colors. We now apply this transformation to the low-discrepancy integer set constructed earlier, resulting in an explicit balanced coloring.
Theorem 3.9 (Explicit balanced coloring).
Let be integers with and Let be arbitrary. Then there is an explicitly given integer and an explicitly given -balanced coloring where
(3.19) | ||||
(3.20) |
and is the absolute constant from Theorem 3.6.
Proof.
By hypothesis, Invoke Theorem 3.6 with and to obtain an explicit nonempty set with
Let be the union of copies of Then by the definition of -discrepancy. Letting we claim that Indeed, the upper bound is justified by whereas the lower bound is the arithmetic mean of the bounds and
Taking in Theorem 3.9, we obtain:
Corollary 3.10 (Explicit balanced coloring).
Let be integers with and Then there is an explicitly given integer and an explicitly given -balanced coloring where
(3.21) | ||||
(3.22) |
and is the absolute constant from Theorem 3.6.
4. Hardness amplification
In Section 3, we laid the foundation for our main result by constructing an explicit integer set with small discrepancy and transforming it into a highly balanced coloring of . In this section, we use this coloring to design a hardness amplification method for approximate degree and its one-sided variant.
4.1. Pseudodistributions from balanced colorings
Recall from the introduction that our approach centers around encoding the vectors as -bit strings with so as to make the decoding easy for circuits but hard for low-degree polynomials. The construction of this code requires several steps. As a first step, we show how to convert any balanced coloring of with colors into an explicit sequence of functions that are almost everywhere nonnegative, are supported almost entirely on pairwise disjoint sets of strings of Hamming weight , and are pairwise indistinguishable by low-degree polynomials. We call them pseudodistributions to highlight the fact that each has norm approximately , nearly all of it coming from the points where is nonnegative.
Theorem 4.1.
Let be given. Let be positive integers with . Let be a given -balanced coloring. Then there are explicitly given functions with the following properties.
-
(i)
Support:
-
(ii)
Essential support:
-
(iii)
Nonnegativity: on
-
(iv)
Normalization:
-
(v)
Tail bound:
-
(vi)
Graded bound: for some absolute constant
-
(vii)
Orthogonality: for some absolute constant
Proof.
Define
(4.1) | ||||
(4.2) |
Setting in Lemma 2.10 gives an explicit function with
(4.3) | ||||
(4.4) | ||||
(4.5) |
where is an absolute constant. For convenience of notation, we will extend to all of by setting for With this extension, (4.4) gives
(4.6) |
For define an auxiliary dual object by
(4.7) |
Then
(4.8) |
Since unless we see that is in fact the only input of Hamming weight at which is nonzero:
(4.9) |
Since the only inputs other than in the support of have Hamming weight so that in particular In summary,
(4.10) | |||||
(4.11) |
We now turn to the construction of the . By definition of an -balanced coloring, the given coloring satisfies
(4.12) |
Since taking in this equation leads to
(4.13) |
and in particular
(4.14) |
For we define by
This definition is legitimate since for every due to (4.14) and
Claim 4.2.
For all and
Proof.
Fix and arbitrarily for the remainder of the proof. Let be uniformly random. If is -balanced on then by definition
If is not -balanced on we have the trivial bound
Combining these two equations, we arrive at
(4.15) |
for all where is the indicator random variable for the event that is not -balanced on Since is -balanced, we further have
(4.16) |
Now
(4.17) |
where the third step is valid by (4.14), the fourth step applies the triangle inequality, the fifth step uses (4.13) and (4.15), and the last step uses It remains to pass to expectations with respect to :
where the last step uses (4.16). ∎
Claim 4.3.
For each and ,
Proof.
Properties (i)–(iv). Equation (4.10) shows that is a linear combination of functions whose support is contained in This settles the support requirement (i). For ,
(4.18) |
where the first step is immediate from the defining equation for and the second step applies (4.9). The essential support property (ii) and nonnegativity property (iii) are now immediate from (4.18). The normalization requirement (iv) follows by summing (4.18) over
Properties (v) and (vi). The tail bound (v) for can be seen as follows:
where the first step uses the support property (i), the third step is valid by Claim 4.3, and the last step applies (4.3).
The graded bound (vi) for holds trivially since vanishes on inputs of Hamming weight in by the support property (i). The validity of (vi) for is borne out by
where the first step restates Claim 4.3, the second step is justified by (4.3), the third step appeals to (4.6), and the fourth step substitutes the values from (4.1) and (4.2).
Property (vii). To begin with, we claim that
(4.19) |
Indeed, let be a real polynomial on with . By linearity, it suffices to consider polynomials that factor as for some nonzero polynomials . Now, Minsky and Papert’s symmetrization argument (Proposition 2.14) guarantees that
(4.20) |
for some univariate polynomial of degree at most . As a result,
where the first and third steps use the definition of the second step is justified by (4.11), the next-to-last step uses (4.20), and the last step is valid by (4.5) since This settles (4.19).
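Minsky and Papert's symmetrization, invoked above as Proposition 2.14, averages a polynomial over all permutations of its Boolean inputs; the result depends only on the Hamming weight of the input and agrees with a univariate polynomial of no larger degree in that weight. A numerical check of the weight-dependence, on a small polynomial chosen here purely for illustration:

```python
from itertools import permutations, product
from fractions import Fraction

def p(x):
    # An arbitrary degree-2 multilinear polynomial, chosen for illustration.
    return 3 * x[0] * x[1] - 2 * x[2] + x[1] * x[3] + 1

def symmetrized(x):
    # Average p over all permutations of the coordinates of x.
    perms = list(permutations(range(len(x))))
    return Fraction(sum(p([x[pi[i]] for i in range(len(x))]) for pi in perms),
                    len(perms))

# The symmetrization depends only on the Hamming weight of the input.
values = {}
for x in product([0, 1], repeat=4):
    w = sum(x)
    v = symmetrized(list(x))
    assert values.setdefault(w, v) == v
```

Interpolating the five values of `values` through the weights 0 through 4 recovers a univariate quadratic, matching the degree bound in the symmetrization argument.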
4.2. Encoding via indistinguishable distributions
As our next step, we will show that the pseudodistributions in Theorem 4.1 can be turned into actual probability distributions provided that the underlying coloring of is sufficiently balanced. The resulting distributions inherit all the desirable analytic properties established for the in Theorem 4.1. Specifically, the are supported almost entirely on pairwise disjoint sets of inputs of Hamming weight and are pairwise indistinguishable by low-degree polynomials.
Theorem 4.4.
Let be given. Let be positive integers with . Let be a given -balanced coloring. Then there are explicitly given probability distributions on such that
(4.21) | ||||
(4.22) | ||||
(4.23) | ||||
(4.24) | ||||
(4.25) |
where is an absolute constant, independent of
Proof.
By hypothesis, is -balanced with
Applying Theorem 4.1 with these parameters gives functions that obey
(4.26) | |||
(4.27) | |||
(4.28) | |||
(4.29) | |||
(4.30) | |||
(4.31) | |||
(4.32) |
where are the absolute constants defined in Theorem 4.1. For define by
(4.33) |
Equation (4.26) shows that is a linear combination of functions whose support is contained in As a result,
(4.34) |
Since on we obtain from (4.27) and (4.29) that
(4.35) | |||
(4.36) |
In particular,
(4.37) |
We further claim that
(4.38) |
Indeed, the nonnegativity of for follows from and (4.28), whereas the nonnegativity of for follows from (4.33) via
On we have
This conclusion is also valid on due to (4.34). Thus,
(4.39) |
Summing over gives
(4.40) |
where the third step applies (4.30).
For all and , we have the graded bound
(4.41) |
where the first step uses (4.39), and the second step uses (4.31). Finally, for we have
(4.42) |
where the first step uses the definition (4.33), and the second step uses (4.32).
Define Equations (4.36) and (4.38) show that each is a nonnegative function and is not identically zero, making it possible to define a probability distribution on by
In other words, is nonzero only on inputs with and on such inputs is the properly normalized version of the nonnegative function . Then properties (4.21) and (4.22) are immediate from (4.34) and (4.35), respectively. Property (4.23) follows from
where the third step uses (4.36), and the fourth step uses (4.36) and (4.40). Property (4.24) is trivial for and follows for from
where the third step uses (4.41), and the fourth step uses (4.37) and
It remains to verify (4.25). For this, fix arbitrarily. Then where the first step uses (4.38), and the third step uses (4.42). We thus see that
(4.43) |
Next, observe that can be written as the product of two functions on disjoint sets of variables, and likewise for Namely,
Now
where the second step uses Proposition 2.6(ii), the third step applies (4.43), and the last step is justified by (4.42). In view of this settles (4.25) and completes the proof. ∎
4.3. Hardness amplification for approximate degree
We have reached the crux of our proof, a hardness amplification theorem for approximate degree. Unlike previous work, our hardness amplification is directly applicable to Boolean functions with sparse input and does not use componentwise composition or input compression. The theorem statement below has a large number of parameters, for maximum generality and black-box integration with the auxiliary results of previous sections. We will later derive a succinct and easy-to-apply corollary that will suffice for our hardness amplification purposes.
Theorem 4.5.
Let and be the absolute constants from Theorems 3.6 and 4.4, respectively. Fix a real number and positive integers such that
(4.44) | |||
(4.45) | |||
(4.46) | |||
(4.47) |
Define
(4.48) |
Then there is an explicitly given mapping such that:
-
(i)
each output bit of is computable by a monotone -DNF formula;
-
(ii)
for every and every one has
Proof.
We may assume that
(4.49) |
since otherwise the left-hand side in the approximate degree lower bound of (ii) is by definition . Define by and set In view of (4.44) and (4.45), Corollary 3.10 gives an explicit integer and an explicit -balanced coloring . Alternatively, if one is not concerned about explicitness, the existence of can be deduced from the much simpler Corollary 3.3. Specifically, (4.45) forces and in particular Moreover, (4.45) implies that Now Corollary 3.3 guarantees the existence of a -balanced coloring .
Since Theorem 4.4 gives explicit distributions on such that
(4.50) | ||||
(4.51) | ||||
(4.52) | ||||
(4.53) | ||||
(4.54) | ||||
(4.55) |
Properties (4.51) and (4.52) imply that
(4.56) |
For , define
(4.57) |
Claim 4.6.
For each there is a function such that
(4.58) | |||
(4.59) | |||
(4.60) |
We will settle Claim 4.6, and all other claims, after the proof of the theorem.
We now turn to the construction of the monotone mapping in the theorem statement. Define by
(4.61) |
Clearly, this is a monotone DNF formula of width . Define by
(4.62) |
where the right-hand side is the componentwise disjunction of the Boolean vectors Observe that both and are monotone and are given explicitly in closed form in terms of the coloring constructed at the beginning of the proof. This settles (i).
For (ii), fix an arbitrary function and abbreviate
By the dual characterization of approximate degree (Fact 2.8), there is a function such that
(4.63) | |||
(4.64) | |||
(4.65) |
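For reference, the dual characterization invoked above commonly takes the following form in the literature (stated here in a standard normalization, which may differ from the paper's Fact 2.8 in inessential details):

```latex
\[
  \widetilde{\operatorname{deg}}_{\epsilon}(f) > d
  \quad\Longleftrightarrow\quad
  \exists\,\psi\colon\{0,1\}^n\to\mathbb{R}:\;
  \|\psi\|_1 = 1,\;\;
  \langle \psi, f\rangle > \epsilon,\;\;
  \langle \psi, q\rangle = 0 \text{ for all } \deg q \le d.
\]
```

Such a $\psi$ is called a dual witness: it correlates with $f$ yet is orthogonal to every low-degree polynomial, so no low-degree polynomial can approximate $f$ well.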
Define by
(4.66) |
We will now use (4.44)–(4.66) to prove a sequence of claims.
Claim 4.7.
One has
(4.67) | |||
(4.68) | |||
(4.69) |
Claim 4.8.
Let be given. Then for all one has
Claim 4.9.
Let and be given such that Then
(4.70) |
Claim 4.10.
One has
(4.71) |
Proof of Claim 4.6.
Proof of Claim 4.7.
Observe from (4.58) that is a linear combination of functions supported on inputs of Hamming weight at most This settles the support property (4.67). Property (4.68) can be verified as follows:
where the first and fourth steps apply the triangle inequality, and the last step uses (4.60) and (4.63).
To settle (4.69), consider an arbitrary polynomial of degree less than Then
(4.72) |
where the first and second steps use the linearity of inner product, and the third step is valid by (4.59). Equation (4.55) allows us to invoke Proposition 2.7 with and to infer that the inner product is a polynomial in of degree less than As a result, Fact 2.15 implies that the expected value in (4.72) is a polynomial in of degree less than In summary, (4.72) is the inner product of with a polynomial of degree less than and is therefore zero by (4.65). The proof of (4.69) is complete. ∎
Proof of Claim 4.8.
Consider an arbitrary string . Then
where the first step uses the defining equation (4.61) together with and the second step applies (4.51) along with Thus, can be written out explicitly as
(4.73) |
Now recall from (4.56) that a string of Hamming weight can belong to at most one of the sets As a result, if then for all and consequently by (4.73). Analogously, if then for all and consequently by (4.73). This settles the claim for all ∎
Proof of Claim 4.9.
Since is a Boolean vector, the equality forces
(4.74) |
where the disjunction is applied componentwise. For any input where we have
where the second and third steps use Claim 4.8 and (4.74), respectively. Since we have shown that
(4.75) |
Furthermore,
(4.76) |
where the second step uses (4.53). Now
where the last three steps use (4.75), (4.76), and (4.60), respectively. ∎
Proof of Claim 4.10.
4.4. Hardness amplification for one-sided approximate degree
In this section, we will prove that the construction of Theorem 4.5 amplifies not only approximate degree but also its one-sided variant. We start with a technical lemma.
Lemma 4.11.
Let be positive integers with
(4.78) | ||||
(4.79) |
Let be given. Then there exists such that
(4.80) | |||
(4.81) | |||
(4.82) | |||
(4.83) | |||
(4.84) |
Proof.
It follows from (4.79) that has a coordinate with Hamming weight greater than By symmetry, we may assume that
(4.85) |
We have where the second step uses the hypothesis along with the trivial bound whereas the third step is legitimate by (4.78). Thanks to the newly obtained inequality Lemma 2.12 is applicable with and gives a function such that
(4.86) | |||
(4.87) | |||
(4.88) | |||
(4.89) |
We will prove that the claimed properties (4.80)–(4.84) are enjoyed by the function
To verify the support property (4.80), fix any with Then necessarily forcing Now (4.86) implies that either equals or has Hamming weight at most Since by (4.78), this completes the proof of (4.80).
We are now ready to state and prove our hardness amplification result, which is a far-reaching generalization of Theorem 4.5.
Theorem 4.12.
Let and be the absolute constants from Theorems 3.6 and 4.4, respectively. Fix a real number and positive integers such that
(4.90) | |||
(4.91) | |||
(4.92) | |||
(4.93) |
Define
(4.94) |
Then there is an explicitly given mapping such that:
-
(i)
each output bit of is computable by a monotone -DNF formula;
-
(ii)
for every and every one has
-
(iii)
for every and every with one has
Proof.
As in the proof of Theorem 4.5, we may assume that
(4.95) |
since otherwise the left-hand side in the lower bounds of (ii) and (iii) is by definition . Define by and set Arguing as in the proof of Theorem 4.5, we obtain an explicit integer and an explicit -balanced coloring , which in turn results in explicit distributions on such that
(4.96) | ||||
(4.97) | ||||
(4.98) | ||||
(4.99) | ||||
(4.100) | ||||
(4.101) |
Properties (4.97) and (4.98) imply that
(4.102) |
For , define
(4.103) |
Claim 4.13.
For each there is a function such that
(4.104) | |||
(4.105) | |||
(4.106) | |||
(4.107) |
We will settle Claim 4.13 after the proof of the theorem. We now define the monotone mapping exactly the same way as in the proof of Theorem 4.5. Specifically, define by
(4.108) |
Define by
(4.109) |
where the right-hand side is the componentwise disjunction of the Boolean vectors With these definitions, items (i) and (ii) are immediate because they are restatements of Theorem 4.5 (i), (ii). To prove the remaining item (iii), fix an arbitrary function with
(4.110) |
and abbreviate
By the dual characterization of one-sided approximate degree (Fact 2.9), there is a function such that
(4.111) | |||
(4.112) | |||
(4.113) | |||
(4.114) |
Define by
(4.115) |
Equations (4.90)–(4.115) subsume the corresponding equations (4.44)–(4.66) in the proof of Theorem 4.5. Recall that from (4.44)–(4.66), we deduced Claims 4.7–4.10. As a result, Claims 4.7–4.10 remain valid here as well. In particular, we have
(4.116) | ||||
(4.117) | ||||
(4.118) | ||||
(4.119) |
Moreover, we will shortly prove the following new claim.
Claim 4.14.
whenever
Proof of Claim 4.13.
Fix arbitrarily for the remainder of the proof. Equations (4.92), (4.96), and (4.100) ensure that Lemma 2.5 is applicable to the distributions with parameters , and , whence
(4.120) |
Recall from (4.92) and (4.93) that and which makes Lemma 4.11 applicable. Define by
(4.121) |
where is as given by Lemma 4.11. To verify the support property (4.104), fix any input of Hamming weight For all in the summation with , we have in view of (4.80). As a result, (4.121) simplifies to In view of (4.81), we conclude that .
The orthogonality property (4.105) follows from
where the first step uses the defining equation (4.121) and Proposition 2.6 (i), and the second step is legitimate by (4.82).
Property (4.106) can be verified as follows:
Proof of Claim 4.14.
We will prove the claim in contrapositive form. Specifically, fix an arbitrary string with . Our objective is to deduce that
There are two cases to consider. If for some then the defining equation (4.108) implies that . As a result,
where the last step uses (4.110).
We now treat the complementary case By (4.107) and (4.115),
(4.122) |
It follows from that the summation in (4.122) contains at least one negative term, corresponding to a string . This forces
(4.123) |
and additionally implies the existence of with
(4.124) | ||||
(4.125) |
Since in the case under consideration, it follows from (4.96) and (4.124) that for all Now (4.118) ensures that for all which in turn makes it possible to rewrite (4.125) as Since we conclude that As a result,
where the last step is immediate from (4.114) and (4.123). ∎
4.5. Specializing the parameters
Theorems 4.5 and 4.12 have a large number of parameters that one can adjust to produce various hardness amplification theorems. We do so in this section. For any constants and we show how to transform a function on bits with approximate degree
(4.126) |
into a function on bits with approximate degree
(4.127) |
Comparing the exponents in (4.126) and (4.127), we see that is harder to approximate than relative to the Hamming weight of the inputs for and , respectively. Moreover, we show that is expressible as for some mapping whose output bits are computable by monotone DNF formulas of constant width. In particular, if is a monotone DNF formula of constant width, then so is The formal statement follows.
Corollary 4.15.
Fix reals and arbitrarily. Then for all large enough integers there is an explicitly given mapping with such that the output bits of are computable by monotone -DNF formulas and
(4.128) |
for every and every function with .
Proof.
Invoke Theorem 4.5 with parameters
(4.129) | ||||
(4.130) | ||||
(4.131) | ||||
(4.132) | ||||
(4.133) | ||||
(4.134) | ||||
(4.135) |
Provided that is large enough, these parameter settings satisfy the theorem hypotheses (4.44)–(4.47), whereas (4.48) gives
(4.136) |
As a result, Theorem 4.5 guarantees that
(4.137) |
where is the absolute constant from Theorem 4.4 and is an explicit mapping whose output bits are computable by monotone -DNF formulas. (In fact, uses only input bits, but this improvement is not relevant for our purposes.) Provided that is large enough relative to the absolute constant we infer (4.128) immediately from (4.137). ∎
Analogously, we have the following hardness amplification result for one-sided approximate degree.
Corollary 4.16.
Fix reals and arbitrarily. Then for all large enough integers there is an explicitly given mapping with such that the output bits of are computable by monotone -DNF formulas and
(4.138) |
for every and every function such that and
Proof.
The proof is the same, mutatis mutandis, as that of Corollary 4.15. Specifically, invoke Theorem 4.12 with parameters (4.129)–(4.135). Provided that is large enough, these parameter settings satisfy the theorem hypotheses (4.90)–(4.93), whereas (4.94) gives (4.136). As a result, Theorem 4.12 guarantees that
(4.139) |
where is the absolute constant from Theorem 4.4 and is an explicit mapping whose output bits are computable by monotone -DNF formulas. Provided that is large enough relative to the absolute constant this settles (4.138). ∎
5. Main results
In this section, we will settle our main results on approximate degree and present their applications to communication complexity.
5.1. Approximate degree of DNF and CNF formulas
We will start with the two-sided case. Our proof here amounts to taking the trivial one-variable formula and iteratively applying the hardness amplification of Corollary 4.15.
Theorem 5.1.
For every and there is a constant and an explicitly given family of functions such that each is computable by a monotone -DNF formula and satisfies
(5.1) |
Proof.
Let be the smallest integer such that
(5.2) |
Define
(5.3) |
Now, let be any large enough integer. Define recursively by and for Thus,
(5.4) | |||||
(5.5) |
where denotes equality up to lower-order terms. Provided that is larger than a certain constant, inductive application of Corollary 4.15 gives functions
(5.6) |
such that
(5.7) |
and each is an explicitly constructed monotone -DNF formula for some constant independent of In more detail, the requirement (5.7) for is equivalent to and is trivially satisfied by the “dictator” function , whereas for the function is obtained constructively from by invoking Corollary 4.15 with
Specializing (5.4)–(5.7) to , the function is a monotone -DNF formula for some constant independent of takes at most input variables, and has approximate degree
where the first and last steps hold for all large enough due to (5.3) and (5.2), respectively. The desired function family can then be defined by setting
for all larger than a certain constant and taking the remaining functions to be the dictator function ∎
Theorem 5.1 immediately implies Theorems 1.1 and 1.2 from the introduction. We now move on to the one-sided case.
Theorem 5.2.
For every and there is a constant and an explicitly given family of functions such that each is computable by a monotone -DNF formula and satisfies
(5.8) |
This result subsumes Theorem 5.1 and settles Theorem 1.4 in the introduction. The proof below makes repeated use of the following observation: if one applies Corollary 4.16 to a function that is the negation of a constant-width monotone DNF formula, then the resulting composition is again the negation of a constant-width monotone DNF formula. This is easy to see by writing and noting that both and are computable by constant-width monotone DNF formulas.
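The closure property used here, that substituting constant-width monotone DNF formulas into a constant-width monotone DNF formula again yields a constant-width monotone DNF formula, can be verified mechanically. Below is a brute-force sketch on small hypothetical formulas, representing a monotone DNF as a list of terms, each term a list of variable indices; composing a width-w outer formula with width-k inner formulas gives width at most wk.

```python
from itertools import product

def eval_dnf(terms, x):
    # A monotone DNF is an OR of terms; each term is an AND of variables.
    return int(any(all(x[i] for i in t) for t in terms))

def compose(outer, inners):
    # Substitute inner DNFs into the outer DNF and distribute AND over OR,
    # producing an explicit monotone DNF for the composition.
    composed = []
    for term in outer:
        # Choose one inner term per outer variable and take their union.
        for choice in product(*(inners[i] for i in term)):
            composed.append(sorted(set().union(*map(set, choice))))
    return composed

# Hypothetical width-2 outer formula on 2 variables, width-2 inner formulas.
outer = [[0, 1]]                       # y0 AND y1
inners = [[[0], [1, 2]], [[2, 3]]]     # y0 = x0 OR (x1 AND x2); y1 = x2 AND x3
composed = compose(outer, inners)

for x in product([0, 1], repeat=4):
    ys = [eval_dnf(g, x) for g in inners]
    assert eval_dnf(composed, x) == eval_dnf(outer, ys)

# Width bound: outer width 2, inner width 2, so composed width is at most 4.
assert max(len(t) for t in composed) <= 4
```

Negating both sides of the same identity gives the variant needed in the proof below: the negation of a constant-width monotone DNF, composed with constant-width monotone DNFs, is again the negation of a constant-width monotone DNF.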
Proof of Theorem 5.2.
Much of the proof is identical to that of Theorem 5.1. As before, let be the smallest integer such that
(5.9) |
Define
(5.10) |
Now, let be any large enough integer. Define recursively by and for Thus,
(5.11) | |||||
(5.12) |
where denotes equality up to lower-order terms. Provided that is larger than a certain constant, inductive application of Corollary 4.16 gives functions
(5.13) |
such that
(5.14) |
and each is an explicitly constructed monotone -DNF formula for some constant independent of In more detail, the requirement (5.14) for is equivalent to and is trivially satisfied by the “dictator” function . For , we obtain from by applying Corollary 4.16 with
This appeal to Corollary 4.16 is legitimate because is a monotone DNF formula and therefore its negation evaluates to on the all-ones input.
Specializing (5.11)–(5.14) to , the function is a monotone -DNF formula for some constant independent of takes at most input variables, and has one-sided approximate degree
where the first and last steps hold for all large enough due to (5.10) and (5.9), respectively. The desired function family can then be defined by setting
for all larger than a certain constant and taking the remaining functions to be the dictator function ∎
5.2. Quantum communication complexity
Using the pattern matrix method, we will “lift” our approximate degree results to a near-optimal lower bound on the communication complexity of DNF formulas in the two-party quantum model. Before we can apply the pattern matrix method, there is a technicality to address with regard to the representation of Boolean values as real numbers. In this paper, we have followed the standard convention of representing “true” and “false” as and respectively. There is another common encoding, inspired by Fourier analysis and used in the pattern matrix method [38, 43], whereby “true” and “false” are represented as and respectively. To switch back and forth between these representations, we will use the following proposition.
Proposition 5.3.
For any function on a finite subset of Euclidean space, and any reals and
Proof.
For any polynomial we have the following equivalences:
where the second line uses ∎
As a corollary, we can relate in a precise way the approximate degree of a Boolean function and the approximate degree of the associated -valued function given by
Corollary 5.4.
For any Boolean function and any
Proof.
Since is Boolean-valued, we have the equality of functions Now where the second and fourth steps apply Proposition 5.3. ∎
Corollary 5.4 makes it easy to convert approximate degree results between the representation and representation. For communication complexity, no conversion is necessary in the first place:
(5.15) |
where denotes -error quantum communication complexity with arbitrary prior entanglement. This equality holds because the representation of “true” and “false” in a communication protocol is a purely syntactic matter, and one can relabel the output values as , respectively, without affecting the protocol’s correctness or communication cost. We note that (5.15) and Corollary 5.4 pertain to the encoding of the output of a Boolean function . How “true” and “false” bits are represented in the input to is immaterial both for communication complexity and approximate degree because the bijection is a linear map.
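Concretely, the conversion in Proposition 5.3 and Corollary 5.4 rests on a degree-preserving affine substitution between the two encodings. A sketch of the standard calculation, written with the usual notation $f_{\pm} = 1 - 2f$, which may differ from the paper's conventions in inessential details:

```latex
\[
  \bigl|(1 - 2p(x)) - (1 - 2f(x))\bigr| \;=\; 2\,\lvert p(x) - f(x)\rvert
  \qquad \text{for all } x,
\]
so a polynomial $p$ approximates $f$ pointwise within $\epsilon$ if and
only if $q := 1 - 2p$ approximates $f_{\pm} := 1 - 2f$ within $2\epsilon$.
The map $p \mapsto 1 - 2p$ is invertible and preserves degree, whence the
approximate degrees of $f$ and $f_{\pm}$ coincide under this change of
error parameter.
```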
We are now in a position to prove the promised communication lower bounds. The pattern matrix method for two-party quantum communication is given by the following theorem [38, Theorem 1.1].
Theorem 5.5 (Sherstov).
Let be given. Define by
Then for all and
The original statement in [38, Theorem 1.1] uses the representation for the range of and We translated it to the representation, as stated in Theorem 5.5, by applying (5.15) to and Corollary 5.4 to By combining Theorems 5.1 and 5.5, we obtain our main result on the quantum communication complexity of DNF formulas:
Theorem 5.6.
For all and there is a constant and an explicitly given family of two-party communication problems such that each is computable by a monotone -DNF formula and satisfies
(5.16) |
Proof.
Theorem 5.1 gives a constant and an explicit family of functions such that each is computable by a monotone -DNF formula and satisfies
(5.17) |
For define by
where we index the strings and as arrays of bits. Clearly, is computable by a monotone -DNF formula. We now invoke the pattern matrix method for quantum communication (Theorem 5.5) with parameters
which satisfy for all As a result,
for all where the first inequality applies the pattern matrix method, and the second inequality uses (5.17). Now (5.16) follows since are constants. ∎
5.3. Randomized multiparty communication
We now turn to communication lower bounds for DNF formulas in the -party number-on-the-forehead model. Analogous to (5.15), we have
(5.18) |
where denotes -error number-on-the-forehead randomized communication complexity. The -party set disjointness problem is given by
In other words, the problem asks whether there is a coordinate in which each of the Boolean vectors has a If one views as the characteristic vectors of corresponding sets , then the set disjointness function evaluates to true if and only if For a communication problem and a function we view the componentwise composition as a -party communication problem on The multiparty pattern matrix method [43, Theorem 5.1] gives a lower bound on the communication complexity of in terms of the approximate degree of :
Theorem 5.7 (Sherstov).
Let be given. Consider the -party communication problem defined by Then for all with one has
where is an absolute constant.
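The set disjointness predicate featured in Theorem 5.7 is simple to state operationally: it is true exactly when no coordinate equals 1 in all of the parties' vectors. A minimal Python sketch with hypothetical parameters (three parties, universe of size 4):

```python
def set_disjointness(vectors):
    """k-party set disjointness: True iff the k sets (given by their
    characteristic vectors) have empty intersection, i.e., no coordinate
    equals 1 in every vector."""
    return not any(all(column) for column in zip(*vectors))

assert set_disjointness([[1, 0, 1, 0],
                         [0, 1, 1, 0],
                         [1, 1, 0, 0]]) is True   # no all-ones column
assert set_disjointness([[1, 0, 1, 0],
                         [0, 1, 1, 0],
                         [1, 1, 1, 0]]) is False  # coordinate 2 is all ones
```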
The actual statement of the pattern matrix method in [43, Theorem 5.1] is for functions and with range . Theorem 5.7 above, stated for functions with range , is immediate from [43, Theorem 5.1] by applying (5.18) to and Corollary 5.4 to . We are now ready for our main result on the randomized multiparty communication complexity of DNF formulas.
Theorem 5.8.
Fix arbitrary constants and . Then for all integers there is an explicitly given -party communication problem with
(5.19) | |||
(5.20) |
where is a constant independent of and Moreover, each is computable by a monotone DNF formula of width and size .
It will be helpful to keep in mind that the conclusion of Theorem 5.8 is “monotone” in in the sense that proving Theorem 5.8 for a given constant proves it for all larger constants as well.
Proof.
Theorem 5.1 gives a constant and an explicit family of functions such that each is computable by a monotone DNF formula of width and satisfies
(5.21) |
Let be the absolute constant from Theorem 5.7. For arbitrary integers define
We first analyze the cost of representing as a DNF formula. If then by definition is a monotone DNF formula of width and size In the complementary case, is by construction a monotone DNF formula of width and hence of size at most whereas is by definition a monotone DNF formula of width and size at most As a result, the composed function is a monotone DNF formula of width and size at most In particular, the claim in the theorem statement regarding the width and size of as a monotone DNF formula is valid for any constant
We now turn to the communication complexity of Since is nonconstant, we have the trivial bound
(5.22) |
We further claim that
(5.23) |
whenever the logarithmic term is well-defined. For this claim is vacuous. In the complementary case consider the family of functions given by . For each it is clear that and have the same approximate degree. Since one now obtains (5.23) directly from (5.21) and the multiparty pattern matrix method (Theorem 5.7).
For a sufficiently large constant , the communication lower bound (5.19) follows from (5.23) for and follows from (5.22) for .
The proof of (5.20) is more tedious. Take the constant large enough that the following relations hold:
(5.24) | ||||
(5.25) | ||||
(5.26) |
If then (5.20) holds due to (5.22). In what follows, we treat the complementary case when
(5.27) |
and in particular
(5.28) | ||||
(5.29) |
Then
(5.30) |
where the second step uses (5.29), the third step uses (5.24) and (5.28), the fourth step is valid by (5.27), and the last step uses (5.25) and (5.28). Continuing,
(5.31) |
where the third step uses (5.27) and (5.29), and the fourth step uses (5.24) and (5.28). Now
Remark 5.9.
In this section, we considered -party number-on-the-forehead bounded-error communication complexity with classical players. The model naturally extends to quantum players, and our lower bound in Theorem 5.8 implies an communication lower bound in this quantum -party number-on-the-forehead model for computing an explicit DNF formula of size and width with error probability where the constants and can be set arbitrarily. In more detail, the multiparty pattern matrix method actually gives a bound on the generalized discrepancy of the composed communication problem . By the results of [27], generalized discrepancy leads in turn to a lower bound on the communication complexity of in the quantum -party number-on-the-forehead model. Quantitatively, the authors of [27] show that any classical communication lower bound obtained via generalized discrepancy carries over to the quantum model with only a factor of loss.
5.4. Nondeterministic and Merlin–Arthur multiparty communication
To obtain our results on nondeterministic and Merlin–Arthur communication, we will now develop a general technique for transforming lower bounds on one-sided approximate degree into lower bounds in these communication models. The technique in question is implicit in the papers [23, 43] but has not previously been formalized in the generality that we require.
Consider a -party communication problem for some finite sets . A fundamental notion in the study of multiparty communication is that of a cylinder intersection [6], defined as any function of the form
for some In other words, a cylinder intersection is the product of Boolean functions, where the -th function does not depend on the -th coordinate. For a probability distribution on the domain of the discrepancy of with respect to is denoted and defined as
where the maximum is taken over all cylinder intersections This notion of discrepancy was defined by Babai, Nisan, and Szegedy [6] and is unrelated to the one that we encountered in Section 3.2. It is of interest to us because of the following theorem [23, Theorem 4.1], which gives a lower bound on nondeterministic and Merlin–Arthur communication complexity in terms of discrepancy.
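For concreteness, the discrepancy of a small multiparty problem can be estimated by exhaustive search over cylinder intersections. The sketch below is purely illustrative and not part of the formal development; the three-party function F, the two-element domains, and the uniform distribution mu are our own choices.

```python
from itertools import product

# Toy 3-party problem on X1 = X2 = X3 = {0, 1}.
X = [0, 1]

def F(x1, x2, x3):
    # Illustrative choice: parity of the three inputs.
    return x1 ^ x2 ^ x3

# Uniform probability distribution mu over the 8 inputs.
mu = {x: 1 / 8 for x in product(X, X, X)}

# A cylinder intersection is chi(x) = phi1(x2,x3) * phi2(x1,x3) * phi3(x1,x2),
# where each phi_i is a 0/1 function that ignores the i-th coordinate.
pairs = list(product(X, X))

def all_boolean_functions():
    # Every 0/1 function on the four coordinate pairs, as a dict.
    for values in product([0, 1], repeat=len(pairs)):
        yield dict(zip(pairs, values))

# disc_mu(F) = max over cylinder intersections chi of
#              | sum_x mu(x) * (-1)^F(x) * chi(x) |.
best = 0.0
for phi1 in all_boolean_functions():
    for phi2 in all_boolean_functions():
        for phi3 in all_boolean_functions():
            corr = sum(
                mu[(x1, x2, x3)] * (-1) ** F(x1, x2, x3)
                * phi1[(x2, x3)] * phi2[(x1, x3)] * phi3[(x1, x2)]
                for (x1, x2, x3) in product(X, X, X)
            )
            best = max(best, abs(corr))

print(best)
```

The search over all 16^3 triples of functions is feasible only for such tiny domains; for the composed problems studied in this paper, discrepancy must instead be bounded analytically.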
Theorem 5.10 (Gavinsky and Sherstov).
Let be a given -party communication problem, where Fix a function and a probability distribution on Put
Then
(5.32)
(5.33)
(5.34)
We note that the original statement in [23] is for functions with range The above version for follows immediately because the output values of a communication protocol serve as textual labels that can be changed at will. Equation (5.34), which is also not part of the statement in [23], follows from (5.33) in view of the inequality for all reals with (Start with and multiply out the left-hand side.)
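The elementary inequality alluded to in the parenthetical hint is, on our reading, the arithmetic mean–geometric mean inequality; a one-line derivation under that assumption:

```latex
% For all reals a, b \ge 0:
(\sqrt{a} - \sqrt{b})^2 \ge 0
\;\Longrightarrow\; a - 2\sqrt{ab} + b \ge 0
\;\Longrightarrow\; \frac{a+b}{2} \ge \sqrt{ab}.
```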
We will need yet another notion of discrepancy, introduced in [43] and called “repeated discrepancy.” Let be a -party communication problem on . A probability distribution on the domain of is called balanced if For such , the repeated discrepancy of with respect to is given by
where the maximum is over -dimensional cylinder intersections on and the arguments are chosen independently according to conditioned on for each . The repeated discrepancy of a communication problem is much harder to bound from above than standard discrepancy. The following result from [43, Theorem 4.27] bounds the repeated discrepancy of set disjointness.
Theorem 5.11 (Sherstov).
Let and be positive integers. Then there is a balanced probability distribution on the domain of such that
where is an absolute constant independent of
It was shown in [43] that repeated discrepancy gives a highly efficient way to transform multiparty communication protocols into polynomials. For a nonnegative integer and a function on a finite subset of Euclidean space, define
where the minimum is taken over polynomials of degree at most In other words, stands for the minimum error in an -norm approximation of by a polynomial of degree at most The following result was proved in [43, Theorem 4.2].
Theorem 5.12 (Sherstov).
Let be a -party communication problem, where . For an integer and a balanced probability distribution on the domain of , consider the linear operator given by
(5.35)
where and are the probability distributions induced by on and respectively. Then for some absolute constant and every -dimensional cylinder intersection on
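The error quantity E(·, d) featured in this theorem is, for a function on a finite domain, the optimum of a small linear program and can be computed exactly for small instances. The sketch below is our own illustration (it assumes SciPy and evaluates the error for OR on {0,1}^3; none of these choices come from the paper):

```python
from itertools import combinations, product
from scipy.optimize import linprog

def approx_error(f, n, d):
    """Minimum of max_x |p(x) - f(x)| over real polynomials p of degree <= d,
    with x ranging over {0,1}^n, computed by linear programming."""
    monomials = [S for k in range(d + 1) for S in combinations(range(n), k)]
    points = list(product([0, 1], repeat=n))
    m = len(monomials)
    A_ub, b_ub = [], []
    for x in points:
        row = [float(all(x[i] for i in S)) for S in monomials]
        fx = float(f(x))
        A_ub.append(row + [-1.0]); b_ub.append(fx)                 #  p(x) - eps <= f(x)
        A_ub.append([-r for r in row] + [-1.0]); b_ub.append(-fx)  # -p(x) - eps <= -f(x)
    c = [0.0] * m + [1.0]  # minimize eps
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * m + [(0, None)])
    return res.fun

OR3 = lambda x: int(any(x))
errs = [approx_error(OR3, 3, d) for d in range(4)]
print(errs)
```

For degree 0 the best approximant is the constant 1/2, giving error 1/2, whereas degree n = 3 achieves error 0 because every function on {0,1}^n has an exact multilinear representation.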
We are now in a position to derive the promised lower bound on nondeterministic and Merlin–Arthur communication complexity in terms of one-sided approximate degree. Our proof combines Theorems 5.10–5.12 in a way closely analogous to the proof of [43, Theorem 6.9].
Theorem 5.13.
Let be given. Let and be positive integers, and put . Then for all
(5.36)
(5.37)
where is an absolute constant, independent of
Proof.
Abbreviate . Let denote the domain of . By Theorem 5.11, there is a probability distribution on such that
(5.38)
(5.39)
where is an absolute constant independent of and By the dual characterization of one-sided approximate degree (Fact 2.9), there exists a function such that
(5.40)
(5.41)
(5.42)
(5.43)
Define by
(5.44)
Claim 5.14.
satisfies
(5.45)
(5.46)
(5.47)
We proceed with the proof of the theorem and settle the claims afterward. Equation (5.46) allows us to write
(5.48)
for some Boolean function and a probability distribution on Indeed, one can explicitly define and
Claim 5.15.
One has
(5.49)
(5.50)
Claim 5.16.
There is an absolute constant such that
We now settle the claims used in the proof of Theorem 5.13.
Proof of Claim 5.14.
We have
where the second step uses (5.38), and the third step is legitimate by (5.40). Analogously,
where the last two steps are valid by (5.38) and (5.41), respectively. The final property (5.47) can be seen from the following chain of implications:
where the first and third steps use the definitions of and respectively, and the second step is valid by (5.43). ∎
Proof of Claim 5.15.
Proof of Claim 5.16.
Let and be the probability distributions induced by on and respectively, and let be the linear operator given by (5.35). Then for any cylinder intersection , we have
(5.51)
where the second step uses (5.48), the third step invokes the definition (5.44), the fourth step is justified by (5.38), and the last step is valid by the definition of .
For every polynomial of degree less than , we have
(5.52)
where the second step uses (5.42), the third step applies Hölder’s inequality, and the fourth step substitutes (5.41). Taking the infimum in (5.52) over all polynomials of degree less than we arrive at
(5.53)
Now
where the first step maximizes over all cylinder intersections the second step combines (5.51) and (5.53), the third step is valid for some absolute constant by Theorem 5.12, and the fourth step holds by (5.39). ∎
This completes the proof of Theorem 5.13. By combining it with our main result on one-sided approximate degree, we now obtain our sought lower bounds for nondeterministic and Merlin–Arthur multiparty communication.
Theorem 5.17.
Let be arbitrary. Then for all integers there is an explicitly given -party communication problem with
(5.54)
(5.55)
(5.56)
(5.57)
where is a constant independent of and Moreover, each is computable by a monotone DNF formula of width and size .
Proof.
Theorem 5.2 gives a constant and an explicit family of functions such that each is computable by a monotone DNF formula of width and satisfies
(5.58)
In particular,
(5.59)
Let be the maximum of the absolute constants from Theorems 5.7 and 5.13. For arbitrary integers define
We first analyze the cost of representing as a DNF formula. If then by definition is a monotone DNF formula of width and size In the complementary case, is by construction a monotone DNF formula of width and hence of size at most whereas is by definition a monotone DNF formula of width and size at most As a result, the composed function is a monotone DNF formula of width and size at most In particular, the claim in the theorem statement regarding the width and size of as a monotone DNF formula is valid for any large enough This in turn implies the upper bound in (5.54): consider the nondeterministic protocol in which the parties “guess” one of the terms of the DNF formula for (for a cost of bits), evaluate it (using another bits of communication), and output the result.
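The guess-and-verify structure of this protocol can be sketched abstractly. In the sketch below, the cost accounting is schematic: the one-bit-per-literal verification cost is our simplifying assumption, not the paper's accounting.

```python
import math

def dnf_eval(terms, x):
    """A DNF accepts iff some term is satisfied. Each term is a list of
    (variable index, required value) literals."""
    return any(all(x[i] == v for (i, v) in term) for term in terms)

def guess_and_verify_cost(num_terms, width, bits_per_literal=1):
    # Nondeterministic cost sketch: announce the guessed term's index,
    # then verify each of its at most `width` literals.
    return math.ceil(math.log2(num_terms)) + width * bits_per_literal

# Example: a monotone DNF with 8 terms of width 2.
terms = [[(2 * j, 1), (2 * j + 1, 1)] for j in range(8)]
x = [0] * 16
x[6] = x[7] = 1   # satisfies the term for j = 3
print(dnf_eval(terms, x), guess_and_verify_cost(8, 2))  # True 5
```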
We now turn to the communication lower bounds. Since is nonconstant, we have the trivial bounds
(5.60)
(5.61)
(5.62)
We further claim that
(5.63)
(5.64)
(5.65)
For these claims are trivial since communication complexity is nonnegative. In the complementary case consider the family of functions given by . For each it is clear that and have the same one-sided approximate degree. Since one now obtains (5.63) and (5.65) directly from (5.58) and Theorem 5.13. Analogously, and have the same two-sided approximate degree for each , and one obtains (5.64) from (5.59) and Theorem 5.7.
Acknowledgments
The author is thankful to Justin Thaler and Mark Bun for useful comments on an earlier version of this paper.
References
- [1] S. Aaronson and Y. Shi, Quantum lower bounds for the collision and the element distinctness problems, J. ACM, 51 (2004), pp. 595–605, doi:10.1145/1008731.1008735.
- [2] M. Ajtai, H. Iwaniec, J. Komlós, J. Pintz, and E. Szemerédi, Construction of a thin set with small Fourier coefficients, Bulletin of the London Mathematical Society, 22 (1990), pp. 583–590, doi:10.1112/blms/22.6.583.
- [3] L. Babai, Trading group theory for randomness, in Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing (STOC), 1985, pp. 421–429, doi:10.1145/22145.22192.
- [4] L. Babai, P. Frankl, and J. Simon, Complexity classes in communication complexity theory, in Proceedings of the Twenty-Seventh Annual IEEE Symposium on Foundations of Computer Science (FOCS), 1986, pp. 337–347, doi:10.1109/SFCS.1986.15.
- [5] L. Babai and S. Moran, Arthur-Merlin games: A randomized proof system, and a hierarchy of complexity classes, J. Comput. Syst. Sci., 36 (1988), pp. 254–276, doi:10.1016/0022-0000(88)90028-1.
- [6] L. Babai, N. Nisan, and M. Szegedy, Multiparty protocols, pseudorandom generators for logspace, and time-space trade-offs, J. Comput. Syst. Sci., 45 (1992), pp. 204–232, doi:10.1016/0022-0000(92)90047-M.
- [7] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar, An information statistics approach to data stream and communication complexity, J. Comput. Syst. Sci., 68 (2004), pp. 702–732, doi:10.1016/j.jcss.2003.11.006.
- [8] R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf, Quantum lower bounds by polynomials, J. ACM, 48 (2001), pp. 778–797, doi:10.1145/502090.502097.
- [9] P. Beame, M. David, T. Pitassi, and P. Woelfel, Separating deterministic from nondeterministic NOF multiparty communication complexity, in Proceedings of the Thirty-Fourth International Colloquium on Automata, Languages and Programming (ICALP), 2007, pp. 134–145, doi:10.1007/978-3-540-73420-8_14.
- [10] P. Beame, M. David, T. Pitassi, and P. Woelfel, Separating deterministic from randomized multiparty communication complexity, Theory of Computing, 6 (2010), pp. 201–225, doi:10.4086/toc.2010.v006a009.
- [11] P. Beame and T. Huynh, Multiparty communication complexity and threshold circuit size of , SIAM J. Comput., 41 (2012), pp. 484–518, doi:10.1137/100792779.
- [12] P. Beame, T. Pitassi, N. Segerlind, and A. Wigderson, A strong direct product theorem for corruption and the multiparty communication complexity of disjointness, Computational Complexity, 15 (2006), pp. 391–432, doi:10.1007/s00037-007-0220-2.
- [13] H. Buhrman and R. de Wolf, Communication complexity lower bounds by polynomials, in Proceedings of the Sixteenth Annual IEEE Conference on Computational Complexity (CCC), 2001, pp. 120–130, doi:10.1109/CCC.2001.933879.
- [14] M. Bun, R. Kothari, and J. Thaler, The polynomial method strikes back: Tight quantum query bounds via dual polynomials, Theory Comput., 16 (2020), pp. 1–71, doi:10.4086/toc.2020.v016a010.
- [15] M. Bun and J. Thaler, Dual lower bounds for approximate degree and Markov–Bernstein inequalities, Inf. Comput., 243 (2015), pp. 2–25, doi:10.1016/j.ic.2014.12.003.
- [16] M. Bun and J. Thaler, Hardness amplification and the approximate degree of constant-depth circuits, in Proceedings of the Forty-Second International Colloquium on Automata, Languages and Programming (ICALP), 2015, pp. 268–280, doi:10.1007/978-3-662-47672-7_22.
- [17] M. Bun and J. Thaler, A nearly optimal lower bound on the approximate degree of , SIAM J. Comput., 49 (2020), doi:10.1137/17M1161737.
- [18] M. Bun and J. Thaler, The large-error approximate degree of , Theory of Computing, 17 (2021), pp. 1–46, doi:10.4086/toc.2021.v017a007.
- [19] A. K. Chandra, M. L. Furst, and R. J. Lipton, Multi-party protocols, in Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing (STOC), 1983, pp. 94–99, doi:10.1145/800061.808737.
- [20] A. Chattopadhyay and A. Ada, Multiparty communication complexity of disjointness, in Electronic Colloquium on Computational Complexity (ECCC), January 2008. Report TR08-002.
- [21] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math. Statist., 23 (1952), pp. 493–507.
- [22] M. David, T. Pitassi, and E. Viola, Improved separations between nondeterministic and randomized multiparty communication, ACM Transactions on Computation Theory (TOCT), 1 (2009), doi:10.1145/1595391.1595392.
- [23] D. Gavinsky and A. A. Sherstov, A separation of and in multiparty communication complexity, Theory of Computing, 6 (2010), pp. 227–245, doi:10.4086/toc.2010.v006a010.
- [24] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, 58 (1963), pp. 13–30, doi:10.1080/01621459.1963.10500830.
- [25] B. Kalyanasundaram and G. Schnitger, The probabilistic communication complexity of set intersection, SIAM J. Discrete Math., 5 (1992), pp. 545–557, doi:10.1137/0405044.
- [26] T. Lee, A note on the sign degree of formulas, 2009. Available at http://arxiv.org/abs/0909.4607.
- [27] T. Lee, G. Schechtman, and A. Shraibman, Lower bounds on quantum multiparty communication complexity, in Proceedings of the Twenty-Fourth Annual IEEE Conference on Computational Complexity (CCC), 2009, pp. 254–262, doi:10.1109/CCC.2009.24.
- [28] T. Lee and A. Shraibman, Disjointness is hard in the multiparty number-on-the-forehead model, Computational Complexity, 18 (2009), pp. 309–336, doi:10.1007/s00037-009-0276-2.
- [29] N. S. Mande, J. Thaler, and S. Zhu, Improved approximate degree bounds for -distinctness, in Proceedings of the 15th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC), vol. 158, 2020, pp. 2:1–2:22, doi:10.4230/LIPIcs.TQC.2020.2.
- [30] M. L. Minsky and S. A. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, Cambridge, Mass., 1969.
- [31] N. Nisan and M. Szegedy, On the degree of Boolean functions as real polynomials, Computational Complexity, 4 (1994), pp. 301–313, doi:10.1007/BF01263419.
- [32] A. A. Razborov, On the distributional complexity of disjointness, Theor. Comput. Sci., 106 (1992), pp. 385–390, doi:10.1016/0304-3975(92)90260-M.
- [33] A. A. Razborov, Quantum communication complexity of symmetric predicates, Izvestiya of the Russian Academy of Sciences, Mathematics, 67 (2002), pp. 145–159.
- [34] A. A. Razborov and A. A. Sherstov, The sign-rank of , SIAM J. Comput., 39 (2010), pp. 1833–1855, doi:10.1137/080744037.
- [35] B. Rosser, Explicit bounds for some functions of prime numbers, American Journal of Mathematics, 63 (1941), pp. 211–232.
- [36] A. A. Sherstov, Communication lower bounds using dual polynomials, Bulletin of the EATCS, 95 (2008), pp. 59–93.
- [37] A. A. Sherstov, Separating from depth- majority circuits, SIAM J. Comput., 38 (2009), pp. 2113–2129, doi:10.1137/08071421X.
- [38] A. A. Sherstov, The pattern matrix method, SIAM J. Comput., 40 (2011), pp. 1969–2000, doi:10.1137/080733644.
- [39] A. A. Sherstov, Strong direct product theorems for quantum communication and query complexity, SIAM J. Comput., 41 (2012), pp. 1122–1165, doi:10.1137/110842661.
- [40] A. A. Sherstov, Approximating the AND-OR tree, Theory of Computing, 9 (2013), pp. 653–663, doi:10.4086/toc.2013.v009a020.
- [41] A. A. Sherstov, The intersection of two halfspaces has high threshold degree, SIAM J. Comput., 42 (2013), pp. 2329–2374, doi:10.1137/100785260.
- [42] A. A. Sherstov, Making polynomials robust to noise, Theory of Computing, 9 (2013), pp. 593–615, doi:10.4086/toc.2013.v009a018.
- [43] A. A. Sherstov, Communication lower bounds using directional derivatives, J. ACM, 61 (2014), pp. 1–71, doi:10.1145/2629334.
- [44] A. A. Sherstov, The multiparty communication complexity of set disjointness, SIAM J. Comput., 45 (2016), pp. 1450–1489, doi:10.1137/120891587.
- [45] A. A. Sherstov, Breaking the Minsky–Papert barrier for constant-depth circuits, SIAM J. Comput., 47 (2018), pp. 1809–1857, doi:10.1137/15M1015704.
- [46] A. A. Sherstov, The power of asymmetry in constant-depth circuits, SIAM J. Comput., 47 (2018), pp. 2362–2434, doi:10.1137/16M1064477.
- [47] A. A. Sherstov, Algorithmic polynomials, SIAM J. Comput., 49 (2020), pp. 1173–1231, doi:10.1137/19M1278831.
- [48] A. A. Sherstov, The hardest halfspace, Comput. Complex., 30 (2021), pp. 1–85, doi:10.1007/s00037-021-00211-4.
- [49] A. A. Sherstov and P. Wu, Near-optimal lower bounds on the threshold degree and sign-rank of , in Proceedings of the Fifty-First Annual ACM Symposium on Theory of Computing (STOC), 2019, pp. 401–412, doi:10.1145/3313276.3316408.
- [50] R. Špalek, A dual polynomial for OR, 2008. Available at http://arxiv.org/abs/0803.4516.
- [51] R. de Wolf, Quantum Computing and Communication Complexity, PhD thesis, University of Amsterdam, 2001.
Appendix A Constructing low-discrepancy integer sets
The purpose of this appendix is to provide a detailed and self-contained proof of Theorem 3.5, restated below.
Theorem.
Fix an integer and reals and . Let be an integer with
Fix a set for each prime with . Suppose further that the cardinalities of any two sets from among the differ by a factor of at most Consider the multiset
Then the elements of are pairwise distinct and nonzero. Moreover, if then
(A.1)
for some explicitly given constant independent of
The special case in this result was proved in [48, Theorem 3.6], and that proof applies with cosmetic changes to any As a service to the reader, we provide the complete derivation below; the treatment here is the same word for word as in [48] except for one minor point of departure to handle arbitrary . We use the same notation as in [48] and in particular denote the modulus by lowercase as opposed to the uppercase in the main body of our paper (Theorem 3.5). Analogous to [48], the presentation is broken down into five key milestones, corresponding to Sections A.1–A.5 below.
A.1. Exponential notation
In the remainder of this manuscript, we adopt the shorthand
where is the imaginary unit. We will need the following bounds [48, Section 6.1]:
(A.2)
(A.3)
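With the convention e(x) = e^{2πix}, bounds of this type follow from the identity |1 − e(x)| = 2|sin(πx)|. The estimates below are our reconstruction; the exact constants in (A.2) and (A.3) may differ from those in [48]:

```latex
% For real x with |x| \le 1/2:
|1 - \mathrm{e}(x)| = |1 - e^{2\pi i x}| = 2\,|\sin(\pi x)|,
\qquad\text{whence}\qquad
4\,|x| \;\le\; |1 - \mathrm{e}(x)| \;\le\; 2\pi\,|x|,
```

using the standard bounds 2t ≤ sin(πt) ≤ πt for t ∈ [0, 1/2].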
Let denote the set of prime numbers with In this notation, the multiset is given by
There are precisely primes in of which at most are prime divisors of Therefore,
(A.4)
A.2. Elements of S are nonzero and distinct
As our first step, we verify that the elements of are nonzero and distinct modulo . This part of the argument is reproduced word for word from [48, Section 6.2].
Specifically, consider any , any prime with and any Then This means that which in turn implies that .
We now show that the multiset contains no repeated elements. For this, consider any any primes and any and such that
(A.5)
Our goal is to show that To this end, multiply (A.5) through by to obtain
(A.6)
The left-hand side and right-hand side of (A.6) are integers in whence
(A.7)
This implies that which in view of and the primality of and forces Now (A.7) simplifies to
(A.8)
which in turn yields . Recalling that we arrive at Finally, substituting in (A.8) gives
A.3. Correlation for k small
So far, we have shown that the elements of are distinct and nonzero. To bound the -discrepancy of this set, we must bound the exponential sum
(A.9)
for all This subsection and the next provide two complementary bounds on (A.9). The first bound, presented below, is preferable when is close to zero modulo
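For small parameters, exponential sums of the form (A.9) can be evaluated directly. The following sketch is our own illustration of the underlying discrepancy notion; the normalization by |S| and the example sets are our choices, not the paper's.

```python
import cmath
import math

def max_exponential_sum(S, m):
    """max over k = 1, ..., m-1 of |sum_{s in S} e(k s / m)| / |S|,
    where e(x) = exp(2 pi i x)."""
    best = 0.0
    for k in range(1, m):
        total = sum(cmath.exp(2j * math.pi * k * s / m) for s in S)
        best = max(best, abs(total) / len(S))
    return best

m = 101  # a prime modulus
# The quadratic residues modulo m form a classical low-discrepancy set:
# every nontrivial exponential sum over them has magnitude O(sqrt(m)).
residues = sorted({(x * x) % m for x in range(1, m)})
# An arithmetic progression of the same size, by contrast, correlates
# strongly with some character e(k * / m).
progression = list(range(len(residues)))

print(max_exponential_sum(residues, m), max_exponential_sum(progression, m))
```

For the residues, Gauss-sum estimates bound the maximum by roughly (√m + 1)/(2|S|) ≈ 0.11 here, whereas the progression exceeds 0.6.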
Claim A.1.
Let be given. Then
This claim generalizes the analogous statement in [48, Claim 6.10], where the special case was considered.
Proof.
Let be the set of those primes in that divide neither nor Then clearly
(A.10)
Exactly as in [48], we have
(A.11)
We proceed to bound the two summations in (A.11). Bounding the second summation is straightforward:
(A.12)
where the first step is valid because the cardinalities of any two sets differ by a factor of at most and the last step uses (A.10). This three-line derivation is our only point of departure from the treatment in [48].
The other summation in (A.11) is analyzed exactly as in [48]. For and we have
where the second step uses Fact 2.16 and the relative primality of and ; the third step applies the triangle inequality; the fourth step follows from , and the last step is valid by (A.2) and . We have shown that
for Summing over
(A.13)
A.4. Correlation for k large
We now present an alternative bound on the exponential sum (A.9), which is preferable to the bound of Claim A.1 when is far from zero modulo This part of the proof is reproduced verbatim from [48, Section 6.4].
Claim A.2.
Let be given. Then
Proof.
where the last two steps use (A.3) and , respectively. ∎
A.5. Finishing the proof
The remainder of the proof is reproduced without changes from [48, Section 6.5], except for the use of the updated bound in Claim A.1 for arbitrary
Specifically, Facts 2.17 and 2.18 imply that
(A.14)
(A.15)
where is a constant independent of Moreover, can be easily calculated from the explicit bounds in Facts 2.17 and 2.18. We will show that the theorem conclusion (A.1) holds with We may assume that
(A.16)
(A.17)
since otherwise the right-hand side of (A.1) exceeds and the theorem is trivially true. By (A.4) and (A.14)–(A.17), we obtain
which along with (A.15) gives
(A.18)
Claims A.1 and A.2 ensure that for every
here we are using the updated bound from Claim A.1 in this paper for general . Substituting the estimate from (A.18), we conclude that
This conclusion is equivalent to (A.1). The proof of Theorem 3.5 is complete.