Lifting Theorems Meet Information Complexity: Known and New Lower Bounds of Set-disjointness (Working Paper)
Abstract
Set-disjointness is one of the most fundamental problems in communication complexity and has been extensively studied over the past decades. Given its importance, many lower bound techniques have been introduced to prove communication lower bounds for set-disjointness.
Combining ideas from information complexity and query-to-communication lifting theorems, we introduce a density increment argument to prove communication lower bounds for set-disjointness:
• We give a simple proof showing that a large rectangle cannot be -monochromatic for multi-party unique-disjointness.
• We interpret the direct-sum argument as a density increment process and give an alternative proof of randomized communication lower bounds for multi-party unique-disjointness.
• Avoiding full simulations in lifting theorems, we simplify and improve communication lower bounds for sparse unique-disjointness.
Potential applications that could be unified and improved by our density increment argument are also discussed.
1 Introduction
Set-disjointness is one of the most important problems in communication complexity. Since the formulation of the communication model [Yao79], many researchers have made great efforts to understand the communication complexity, both upper and lower bounds, of set-disjointness problems in various communication models [BFS86, KS92, Raz92, Raz03, BYJKS04, JRS03, HW07, KW09, BEO+13, ST13, GW16, WW15, Gav16, BO17, BGK+18, KPW21, DOR21]. Building upon communication lower bounds for set-disjointness, applications in diverse areas have been studied. For example, they give lower bounds for monotone circuit depth [GP18], streaming problems [AMS99, BYJKS04, KPW21], proof complexity [GP18], game theory [GNOR15, GS17], property testing [BBM11], data structures [MNSW95], extension complexity [BM13, GW16], and more.
Given the importance of this problem, many techniques were invented specifically to understand communication lower bounds of set-disjointness. Some remarkable methods include the rank method [Gri85, HW07, RY20], the discrepancy method [RY15], the corruption bound [Raz92], the smooth rectangle bound [JK10, HJ13], and information complexity [CSWY01, BYJKS04, Gro09, Jay09]. Among all of these methods, the information complexity framework seems to provide the best results so far. We refer interested readers to [CP10] for a good survey on these results.
In this paper, we continue the study of set-disjointness. Inspired by simulation methods in query-to-communication lifting theorems [RM97, GPW18, LMM+22, YZ22], we present a proof of lower bounds for set-disjointness based on a density increment argument (sometimes also called the structure-vs-pseudorandomness approach). Based on this method, we give several new lower bounds for set-disjointness in different communication models. Our proof can be considered as a combination of simulation methods and information complexity.
Compared with previous techniques, our proof is simpler and more general. It addresses some drawbacks of both simulation methods and information complexity methods. More details will be discussed in Section 1.2.
1.1 Our results
The main contribution of this work is "explicit proofs" of communication lower bounds, together with some new unique-disjointness lower bounds. We call these proofs explicit because our proof framework has several advantages compared to existing techniques:
• It has fewer restrictions on communication models.
• It allows us to use communication lower bound techniques in a non-black-box way.
• It provides a method to analyze distributions with correlations between different coordinates.
In Section 1.3, we discuss three potential applications of these advantages, one for each advantage above.
Our proof builds on a combination of simulation techniques from lifting theorems and information complexity. Specifically, we abstract the core idea from the Raz-McKenzie simulation [RM97] and recast it as a density increment argument. To explain further connections and comparisons with previous techniques, we present three lower bounds for the unique-disjointness problem.
We first study the multi-party communication model (). In this setting, there are parties, where each party holds a set (we use a binary string to represent a set). It is promised that either all sets are pairwise disjoint, or they share a unique common element. Formally, we define
• .
• .
We use to refer to the no instances and to refer to the yes instances. In this setting, we prove a structure lemma that any -large rectangle must intersect .
Theorem 1.1.
Let be a rectangle such that , then .
We note that Theorem 1.1 implies (and is stronger than) a deterministic communication lower bound of . For any protocol with communication bits, we can always find a rectangle in the partition such that ; Theorem 1.1 then tells us that is not disjoint from .
Our proof is an elementary, self-contained two-page argument. Furthermore, we do not even need notions like entropy or rank. This proof also reveals the main idea of query-to-communication lifting theorems. We will discuss more details in Section 1.2.
Our second contribution is a new proof of randomized communication lower bounds for . This problem has been extensively studied for many years. Building on a series of great papers [AMS99, BYJKS04, CKS03], the tight randomized communication lower bound was finally obtained by [Gro09, Jay09] through the information complexity framework. In this paper, we reprove this theorem via the density increment argument.
Theorem 1.2.
For any , the randomized communication complexity of is .
We first note that Theorem 1.2 does not imply Theorem 1.1 because Theorem 1.1 shows that every large rectangle (containing many no instances) cannot be monochromatic, whereas Theorem 1.2 only proves randomized communication lower bounds.
Our proof of Theorem 1.2 is a mix of information complexity and query-to-communication simulations. Roughly speaking, in the information complexity framework, we analyze the information cost for each coordinate and then apply a direct-sum argument to merge them. In our density increment argument, we merge these costs by borrowing the projection operation from query-to-communication simulations. Hence, our density increment argument can be interpreted as an alternative direct-sum argument.
Several papers [CFK+19, GJPW18, MM22] pointed out research directions connecting information complexity and lifting theorems, and our proof has great potential to unify information complexity and lifting theorems in this direction.
Our last result is a tight deterministic lower bound for (two-party) sparse unique-disjointness () for a large range of sparsity parameters. This problem, with sparsity parameter , can be described as follows: Alice holds a set and Bob holds a set with . It is promised that either or , and Alice and Bob need to distinguish the two cases with deterministic communication.
Two extreme choices of correspond to two important problems in communication complexity. If , this problem becomes the standard unique-disjointness problem (i.e., with ). When , the problem is essentially the EQUALITY problem. For , we prove the following theorem.
Theorem 1.3.
Let be any small constant. For any , the deterministic communication complexity of -sparse unique-disjointness is .
Prior to our work, Kushilevitz and Weinreb [KW09] proved the same lower bound for a smaller range of . Then Loff and Mukhopadhyay [LM19] improved this range to . Our Theorem 1.3 further pushes this range to .
Our proof of Theorem 1.3 builds on [LM19] with several differences; the main one is that we no longer fully simulate the communication tree by a decision tree. Instead, we aim to find a long path in the communication tree. This approach was suggested by Yang and Zhang [YZ22]. We believe it is possible to further improve the range to all , and we discuss more details in Section 5.
A similar task is to prove a deterministic lower bound for sparse set-disjointness without the uniqueness requirement. In this setting, Håstad and Wigderson [HW07] pointed out that the same bound can be proved via the rank method in [Juk11]. However, in the unique setting, [KW09] showed that the rank method cannot achieve such tight bounds.
We emphasize that Theorem 1.3 is a lower bound only for deterministic communication complexity. Allowing public randomness and a constant error, there exists a protocol that costs only bits [HW07]. Therefore, this factor is a separation between randomized communication and deterministic communication. This also implies that all lower bound techniques that simultaneously imply randomized communication lower bounds, including information complexity approaches, cannot reprove our bounds.
Furthermore, Braverman [Bra12] gave a zero-error protocol for -sparse unique-disjointness with constant information cost which can be extended to all .
Lemma 1.4.
For any , there is a zero-error protocol for -sparse unique-disjointness with information cost .
Overall, Theorem 1.3 demonstrates that the density increment argument has fewer restrictions on communication models and is able to circumvent the barriers faced by rank methods and information complexity.
1.2 Our techniques
Here we give an overview of our proof technique and discuss connections to lifting theorems and information complexity. We focus on Theorem 1.1. We recall that the no instances are
Our main idea of Theorem 1.1 is a density increment argument. In this argument, we first define the density of a rectangle on by
It is clear that because . Now Theorem 1.1 is equivalent to saying that any rectangle with density cannot be monochromatic.
Let be any monochromatic rectangle containing only no instances. We will perform a projection operation to convert into another monochromatic rectangle with a larger density . Now, since is still monochromatic, we repeat this projection for rounds, where each round increases the density by a factor of . Letting be the rectangle after projections, we then have
Combining and , this gives a contradiction.
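To make the counting behind this contradiction explicit, here is a minimal worked derivation; the density symbol $\delta$, the per-round factor $c>1$, and the number of rounds $r$ are placeholders for the precise quantities elided above, not the paper's exact constants.

```latex
% A minimal sketch of the density increment contradiction, with placeholder symbols:
% \delta(.) is the density, c > 1 the per-projection increment factor, r the number of rounds.
% Assumption: every rectangle R satisfies \delta(R) <= 1.
\[
  \delta(R_r) \;\ge\; c\,\delta(R_{r-1}) \;\ge\; \cdots \;\ge\; c^{\,r}\,\delta(R_0),
\]
so whenever $\delta(R_0) > c^{-r}$ and all of $R_0,\dots,R_r$ remain monochromatic,
we obtain $\delta(R_r) > 1$, contradicting $\delta(R_r)\le 1$.
```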
Now we briefly explain our projection process. Let be a monochromatic rectangle. For each party , the projection of on is a rectangle defined by:
• For each party , .
• For the party , .
It is not hard to see that (for any ) preserves the monochromatic property of . On the other hand, we show that there exists a party such that has a larger density than . In fact, this density increment captures the communication cost of party .
We give a full proof of Theorem 1.1 in Section 3. We suggest readers begin with Section 3, and then proceed to Section 4 and Section 5.
Connections to lifting theorems.
Query-to-communication lifting theorems are a generic method to lift the hardness of one-party functions (decision tree lower bounds) to two-party lower bounds in the communication model. This recent breakthrough gives many applications in diverse areas [Göö15, HN12, GPW18, GJW18, DRNV16, dR19, GGKS18, CKLM18, GR21, BR20, RPRC16, PR17, LRS15, CLRS16, KMR21, CMS20, GKY22]. The simulation method is widely used to prove such lifting theorems. During the simulation, one maintains certain pseudorandom properties to force the communication protocol to behave like a query protocol. In this process, a potential function is typically used to charge the number of communication bits against the pseudorandom property.
In this paper, we adopt the idea of the potential function argument and rephrase it as a density increment argument. In lifting theorems, the simulation is mainly used to maintain a good structure of rectangles, such as maintaining full-range rectangles. In contrast, our density increment argument is more flexible.
Connections to information complexity.
Information complexity is another important tool to prove communication complexity lower bounds. It analyzes the mutual information between communication transcripts and the random inputs held by Alice and Bob. From an information-theoretic perspective, the randomized communication complexity is then lower bounded by this mutual information. To analyze the mutual information, a very useful tool is the direct-sum argument [CSWY01, BYJKS04, CKS03, Gro09, Jay09]. Roughly speaking, based on this argument, we only need to analyze the mutual information between the communication transcripts and each coordinate of Alice and Bob's input. This step significantly reduces the difficulty of analyzing the mutual information.
In our proof (see Section 4 for example), we also use this local-to-global strategy. The difference is that the information complexity paradigm [BYJKS04, CKS03, Gro09, Jay09] uses a direct-sum argument, a useful tool from information theory. In comparison, the density increment argument uses a combinatorial operation called projection. The projection provides more flexibility in different applications. As [DOR21] pointed out, many applications in other settings are not amenable to the standard direct-sum argument, such as proving information-theoretic lower bounds for the number-on-forehead model. For the density increment argument, we do not see such barriers for now.
1.3 Potential applications of explicit proofs
We discuss some potential applications of our density increment arguments.
Different communication models.
As an active research area, many techniques have been invented to prove communication complexity lower bounds over the past decades. However, many of these techniques are specific to only one communication model. For example, the rank method mainly applies to deterministic communication; information complexity is usually used for randomized communication. We believe our density increment arguments provide more flexibility, having less dependency on the communication model. For example, in Theorem 1.1, we study a lower bound similar to (but not exactly) corruption bounds; in Theorem 1.2, we prove randomized communication lower bounds; in Theorem 1.3, we show deterministic lower bounds (with a separation from randomized communication). Overall, we demonstrate that (at least for set-disjointness problems) the density increment argument combines the advantages of lifting theorems and information complexity. To date, some communication models are still not fully understood. For example, the -game is an interesting communication model with applications in extension complexity [GJW18]. However, to the best of our knowledge, we still do not have a generic way to prove extension complexity lower bounds. Another example is the number-on-forehead model. It would be interesting to see if density increment arguments give new applications in various communication models.
Streaming lower bounds.
The connections between communication complexity and streaming lower bounds were explored by the seminal work of Alon, Matias and Szegedy [AMS99], which proved a streaming lower bound for frequency moment estimation based on a reduction from unique-disjointness lower bounds. After that, many subsequent works improved the lower bounds [BYJKS04, CKS03, Gro09, CCM08, AMOP08, GH09] for this problem. As [CMVW16] pointed out, any improved lower bound for frequency moment estimation automatically yields improved lower bounds for many other streaming problems.
However, the optimal bound for this fundamental problem is still not clear (we focus on the random-order streaming model; [GH09] claimed a tight bound, but [CMVW16] pointed out a flaw in [GH09]). To the best of our knowledge, all current lower bounds rely on (black-box) reductions from lower bounds. As we discussed, an bound for randomized communication of is already tight, and black-box reductions seem to be a dead end for achieving tight bounds for frequency moment estimation.
To resolve this barrier, we believe an important step is to open the black box. Put differently, we should extend communication complexity lower-bound techniques to streaming models. Since our proof of has fewer restrictions on the model, it is reasonable to try this argument in streaming settings. Concretely, could we prove a tight lower bound for frequency moment estimation by the density increment argument?
Coordinate-wise correlated hard distributions.
Many proofs of randomized communication complexity lower bounds start with Yao's minimax theorem and design a hard distribution. In some important applications, the hard distribution has a strong correlation between input coordinates. A good example is the Tseitin problem, whose lower bounds can be converted into lower bounds in proof complexity [GP18], extension complexity [GJW18], and monotone computation [PR17]. However, the hard distribution of Tseitin has a complicated coordinate-wise correlation, which makes the information complexity argument difficult to use. Known lower bounds for randomized communication [GP18, GJW18] all lose a factor (including one based on a black-box reduction from two-party unique-disjointness [GP18]). Again, it seems this loss cannot be avoided by black-box reductions, and it would be very interesting to see whether our density increment arguments can break this barrier.
Acknowledgements.
The authors thank Shachar Lovett and Xinyu Mao for helpful discussions. We are grateful to Kewen Wu for reading early versions of this paper and providing useful suggestions.
2 Preliminary
For integer , we use to denote the set . Throughout, is the logarithm with base . For a finite domain , we use to denote a random variable uniformly distributed over .
Definition 2.1 (Entropy).
Let be a random variable on . The entropy of is defined by
Let and be two random variables on and respectively. The conditional entropy of given is defined by
Definition 2.2 (Mutual information).
Let and be two (possibly correlated) random variables on and respectively. The mutual information of and is defined by
Let be a random variable on , the conditional mutual information of and given is defined by
We use several basic properties of entropy and mutual information.
Fact 2.3.
Let and be two (possibly correlated) random variables on and respectively.
1. Conditional entropy inequality: .
2. Chain rule: .
3. Nonnegativity: .
4. .
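As a numerical illustration of Definitions 2.1 and 2.2 and the chain rule in Fact 2.3, the following minimal Python sketch (our own code; the distribution and variable names are not from the paper) computes entropy, conditional entropy, and mutual information for a small joint distribution given as a table.

```python
import math
from collections import defaultdict

def entropy(p):
    """Shannon entropy (base 2) of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marginals(joint):
    """Marginal distributions of X and Y from a joint table {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), q in joint.items():
        px[x] += q
        py[y] += q
    return px, py

def conditional_entropy(joint):
    """H(X | Y) = H(X, Y) - H(Y)."""
    _, py = marginals(joint)
    return entropy(joint) - entropy(py)

def mutual_information(joint):
    """I(X; Y) = H(X) - H(X | Y)."""
    px, _ = marginals(joint)
    return entropy(px) - conditional_entropy(joint)

# Example: X is a uniform bit and Y is a noisy copy of X (flipped with probability 1/4).
joint = {(0, 0): 3/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 3/8}
px, py = marginals(joint)
# Chain rule check (Fact 2.3): H(X, Y) = H(Y) + H(X | Y).
assert abs(entropy(joint) - (entropy(py) + conditional_entropy(joint))) < 1e-9
print(mutual_information(joint))  # I(X; Y) = 1 - H(1/4) ≈ 0.1887
```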
3 Deterministic lower bound for multi-party unique-disjointness
In this section, we give a simple proof of Theorem 1.1. Our proof is based on a density increment argument. We first formally define the problem. Throughout, we use binary strings to represent sets: we associate a set with a corresponding string by setting iff .
Definition 3.1 (, deterministic version).
For each and , we define (no instances) and (yes instances) as follows:
• .
• .
A -party deterministic communication protocol solves if,
• for all , ,
• for all , .
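To make Definition 3.1 concrete, here is a minimal sketch (our own code, using the binary-string representation of sets described above) that classifies a tuple of binary strings as a no instance (pairwise disjoint) or a yes instance (a unique common element and disjoint elsewhere, as is standard for the unique-intersection promise).

```python
def classify_udisj(xs):
    """Classify a k-tuple of {0,1}-strings for unique-disjointness.

    Returns "no" if the sets are pairwise disjoint, "yes" if all k sets share a
    unique common element and are disjoint on every other coordinate, and None
    if the tuple falls outside the promise.
    """
    k, n = len(xs), len(xs[0])
    counts = [sum(x[j] for x in xs) for j in range(n)]  # how many parties hold element j
    if all(c <= 1 for c in counts):
        return "no"
    common = [j for j in range(n) if counts[j] == k]
    others_disjoint = all(c <= 1 for j, c in enumerate(counts) if j not in common)
    if len(common) == 1 and others_disjoint:
        return "yes"
    return None  # outside the promise

# Example with k = 3 parties and n = 4 elements.
print(classify_udisj([(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]))  # "no"
print(classify_udisj([(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1)]))  # "yes" (element 3)
```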
Since the projection may fix some coordinates, we also define the projected instances. For a set , we define (no instances on ) as
and define (yes instances on ) as
We also partition the yes instances by , where
Now we define the density function.
Definition 3.2 (Density function).
For each and , we define its density function as
Note that for any rectangle because . We simplify the notation as if is clear in the context. A crucial step in our argument is the projection operation.
Definition 3.3 (Projection).
Let be a rectangle. For an and , the projection of on is a rectangle defined by:
• for each , ,
• for , .
Here is the string in obtained by extending with .
The projection operation has two useful properties. The first one is that projection preserves the monochromatic property of the rectangle.
Fact 3.4.
Let be a rectangle such that . Then for every and , we have
The proof of Fact 3.4 follows from the definition and we omit it here. The next property is phrased as the following projection lemma.
Lemma 3.5 (Projection lemma).
Let be a rectangle. If there is a coordinate such that , then there is some such that
Given Lemma 3.5 and Fact 3.4, Theorem 1.1 becomes straightforward. We simply repeat the projection times for , where each time we use Lemma 3.5 to choose a good coordinate for the projection and increase the density function by . Now we prove Lemma 3.5.
Proof of Lemma 3.5.
Let be a rectangle such that . Let and
Here is the restriction of on . Note that for all , , and our goal is to show that there is a such that is large.
For every , define the extension set of as . Crucially, for every , we have
(1)
Note that, without the condition , it can only be bounded by . Inequality (1) is proved by contradiction: if there is a such that , then we must have , contradicting .
We now continue our proof. Partition into two parts:
First observe that for any , we have for every as is a rectangle. This implies for all . Hence
Applying (1) with , we have
For , since , we have
On the other hand, for every , there always exists one such that . By an averaging argument, there is at least one such that
As a result, for this fixed we have
By the definition of the density function, we have
(since for all )
∎
4 Randomized lower bound for multi-party unique-disjointness
We focus on randomized communication lower bounds in this section. By Yao's minimax theorem, this is equivalent to identifying a distribution that is hard on average for any deterministic communication protocol. We use the same notation as in the previous section (Definition 3.1).
Our hard distribution is supported on .
Definition 4.1.
For any , we define the hard distribution on as follows.
1. For every , uniformly and independently sample and .
2. For every , if and , then set ; otherwise set .
3. Sample and uniformly. If , then update for all .
4. Output .
Given this hard distribution , we also define the distribution .
Now we give some explanation of the random variables in this sampling process.
• The bit determines whether to output yes instances or no instances. In particular, for , we output a yes instance. Hence we update for all , where is uniformly sampled.
• For every , the variable captures which party () may have the -th element.
• For every , determines whether has the -th element or not.
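The sampling steps above admit a short implementation. The sketch below follows the verbal description only (one designated party per coordinate, a bit deciding whether that party actually holds the element, and a final coin deciding whether to plant a unique common element); the variable names and the uniform choice of the planted coordinate are our assumptions, not notation fixed by the paper.

```python
import random

def sample_hard_instance(n, k):
    """Sample one instance following the verbal description of Definition 4.1.

    For every coordinate j, pick a designated party p[j] uniformly and a bit b[j];
    only party p[j] may hold element j, and it does so iff b[j] = 1.  Then flip a
    coin z and pick a coordinate j_star uniformly; if z = 1, plant element j_star
    into every party's set, turning the instance into a yes instance.
    """
    xs = [[0] * n for _ in range(k)]
    p = [random.randrange(k) for _ in range(n)]   # designated party per coordinate
    b = [random.randrange(2) for _ in range(n)]   # does the designated party hold element j?
    for j in range(n):
        xs[p[j]][j] = b[j]
    z = random.randrange(2)                       # yes/no switch
    j_star = random.randrange(n)                  # coordinate of the planted intersection
    if z == 1:
        for i in range(k):
            xs[i][j_star] = 1
    return xs, z

xs, z = sample_hard_instance(n=8, k=3)
print("yes instance" if z == 1 else "no instance", xs)
```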
It is well-known that a deterministic protocol with communication complexity partitions the input domain into at most rectangles where each rectangle corresponds to a leaf in the communication tree. We then define the following random variable , which is the rectangle of a random leaf induced by the input distribution .
Definition 4.2.
For a fixed deterministic protocol , we define a distribution on leaf rectangles (of ) as follows.
1. Randomly sample .
2. Output the rectangle of containing .
We emphasize that is defined by , not by . Hence, for a protocol with a small error on , the random rectangle should be biased towards with high probability.
If is clear from the context, we will simply write for . For any rectangle , we also use to denote the distribution . Let , and let denote the joint distribution of and . We are now ready to state our theorem.
Theorem 4.3.
Let be a constant. For any deterministic protocol with error under , i.e.,
We have
We note that Theorem 4.3 implies Theorem 1.2 because the communication complexity of is lower bounded by , and is an upper bound of (Fact 2.3).
A similar lower bound was previously obtained by the information complexity framework [Gro09]. We reprove it by a density increment argument. In what follows, we fix the protocol . We first give a high-level view of our proof.
Sketch of the proof.
We first reinterpret the proof of the deterministic lower bound (Section 3) in an entropy perspective. Then we generalize it to randomized communication lower bounds.
Let be a deterministic communication protocol for . Every leaf of is a monochromatic rectangle. Let be any -monochromatic rectangle (i.e., ) of . Then for every input and ,
(2)
since is -monochromatic. Furthermore, since is a rectangle, there is a party such that
Recall samples no instances. Thus we also have
By the definition of , this is equivalent to (in the following, we replace the notation with when ).
In entropy language, recalling the definition of , this is equivalent to
In contrast, if we do not condition on , we have that
which can be written as,
This gap captures the mutual information of and on the -th coordinate.
For different choices of , we may have different witnessing the mutual information. But on average, we have
In particular, there exists a such that
Now we explain how we can view the projection as a decoupling process for this mutual information. We decompose the projection into two steps:
1. Fix , i.e., update .
2. Update the density function as
or equivalently
where (resp., ) is the marginal distribution of (resp., ) on .
In the first step, we pick the party that has mutual information. In the second step, we decouple the mutual information by simply removing it from the density function. The projection lemma (Lemma 3.5) captures how this decoupling step increases the density function. Another crucial fact is that, for any -monochromatic rectangle , the distribution is also supported on (see Fact 3.4), which guarantees that we can continue to increase the density by projections on different coordinates.
Now we generalize this to the randomized communication setting, where the rectangle is not necessarily monochromatic. By the correctness of the protocol, most rectangles are biased towards either yes instances or no instances.
For a rectangle biased towards no instances, we expect an inequality similar to (2) to hold: for most , most no instances , and most , it satisfies
where is a small constant depending on the error rate of the protocol.
On the other hand, we also need to argue that projections can be repeated. This part is slightly more complicated than the deterministic case, where we can simply fix for some . In the randomized case, we cannot fix it because we have to preserve the bias. This is addressed by the key lemmas (the projection lemma and the bias lemma) introduced in the next subsection.
4.1 Key definitions and lemmas
Now we introduce the key definitions and lemmas (bias lemma and projection lemma) needed for the randomized communication lower bound.
Definition 4.4 (-restriction).
For and , we call an -restriction, and denote as the distribution .
The -restriction corresponds to projections in the deterministic case: For , corresponds to the projection . Now we define our new density function.
Definition 4.5 (Density function).
Let be a rectangle and a set . For a restriction with , its density is defined by
The average density is defined by
In particular .
The main difference between the deterministic setting and the randomized setting is that, in the deterministic case, we consider for some fixed and . In the randomized case, however, we have to consider , which does not fix and , because the projection lemma (Lemma 4.10) and the bias lemma (Lemma 4.11) are not preserved under fixed and .
We also note that might be negative for some and . But is always nonnegative because it is mutual information.
As we mentioned before, in the randomized setting the leaves are no longer monochromatic but merely biased. We now define the following notion of bias to capture this (a randomized version of inequality (2)).
Definition 4.6.
Let be a rectangle and . Let be a restriction. For any and input , the bias of on the coordinate under is defined by
where is the set of yes instances with intersection witnessed by , i.e., is the support of . (Though is the leaf conditioned on an input from , it is still possible that , since the protocol is allowed to err; that is why is not implied by .) Then we define the average bias of a rectangle on as
The overall bias on is defined by
Finally, we define the projection for randomized communication. Recall that in the deterministic case, the projection has two steps. In the randomized case, since we average over , we can remove the first step. The projection can then be defined as follows.
Definition 4.7 (Projection).
Let be the set of unrestricted coordinates. For any , the projection on is to update the density function from to .
Remark 4.8.
We may use different projections for different communication problems. For example, the BPP lifting theorem [GPW17] used a very different projection because it studied low-discrepancy gadgets. We define the projection in this way because we are working with AND gadgets. Given this flexibility, we believe the density increment arguments may provide new applications beyond the information complexity framework.
Now we introduce three key lemmas in our proof.
Lemma 4.9.
Let be a constant. Let be a deterministic protocol with error under the distribution . There is a constant (depending only on ) and a set of coordinates with such that holds for each .
Since is a protocol with a small error under and is sampled according to (no instances), for a random it is very likely that is biased towards no instances. Lemma 4.9 can then be proved by an averaging argument. This is a generalization of the deterministic case, where for all . The proof of Lemma 4.9 is deferred to Appendix A as part of the proof of Lemma 4.11.
Lemma 4.10 (Projection lemma).
Let be a constant. For any and , if , the projection on increases the density function by , i.e.,
The projection lemma shows that the density function increases if we do a projection on a biased coordinate. We prove it in Section 4.2.
Our last lemma shows that the bias is preserved during the projections, as a counterpart to Fact 3.4 in the deterministic case.
Lemma 4.11 (Bias lemma).
Let be the constant and be the set from Lemma 4.9. For any and distinct , we have that
This lemma can be proved by a convexity inequality, and its proof is deferred to Appendix A. We now summarize these three lemmas and how they complete the proof of Theorem 4.3.
• Lemma 4.9 shows that, if is a communication protocol with a small error under , then is very small for many coordinates .
• The projection lemma (Lemma 4.10) converts the bias on into a density increment of the projection on the coordinate .
• The bias lemma (Lemma 4.11) shows that a projection on a coordinate preserves the bias on other coordinates , so the projection lemma can be applied many times.
4.2 Proof of the projection lemma
In our proof, we borrow a useful lemma from [Gro09] and [Jay09]. In [Gro09, Jay09], this lemma was used to analyze information cost.
Lemma 4.12 ([Gro09, Theorem 3.16]).
Let be a constant and . Fix a deterministic protocol . If , then
5 Deterministic lower bounds for sparse unique-disjointness
In this section, we discuss the sparse unique-disjointness problem.
Definition 5.1.
For each and , the -UDISJ problem is defined as follows:
• No instances: .
• Yes instances: .
Here is the Hamming weight of .
Theorem 1.3 aims to show that any deterministic communication protocol for -UDISJ requires communication bits. To prove this theorem, we consider the following Unique-Equality problem [ST13, LM19].
Definition 5.2.
Let and be integers. Let be a set with elements. The -UEQUAL problem is defined as follows:
• No instances: .
• Yes instances: .
There is a simple reduction from -UEQUAL to -UDISJ [ST13]. Hence it is sufficient to prove a communication lower bound for -UEQUAL. In Theorem 1.3, we focus on the regime that for any small constant . Our goal now is to prove that the communication complexity of -UEQUAL is .
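For intuition, here is a sketch of the natural block-indicator reduction from UEQUAL to sparse unique-disjointness; this is our reconstruction of the standard idea, and the exact parameters and encoding used in [ST13] may differ. Each coordinate of the equality instance becomes a block with one universe element per symbol, so the two sets intersect in a block exactly when the strings agree there, and a unique agreement becomes a unique intersection.

```python
def uequal_to_udisj(x, y, alphabet):
    """Map a UEQUAL instance (strings x, y over `alphabet`) to a sparse UDISJ instance.

    Block j of the universe has one coordinate per symbol; Alice marks her symbol
    x[j] and Bob marks y[j].  The sets intersect in block j iff x[j] == y[j], and
    both sets have exactly len(x) elements (the sparsity parameter).
    """
    index = {s: i for i, s in enumerate(alphabet)}
    m = len(alphabet)
    A = {j * m + index[x[j]] for j in range(len(x))}
    B = {j * m + index[y[j]] for j in range(len(y))}
    return A, B

# Example: 3 blocks over the alphabet {a, b, c}; the strings agree only at position 1.
A, B = uequal_to_udisj("abc", "cba", "abc")
print(A & B)  # {4}: block 1, symbol 'b' is the unique intersection
```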
We borrow the square idea from [LM19] but revise and simplify it as we do not need to fully simulate the protocol. See Section 5.2 for discussions.
Definition 5.3 (Square).
Let be a rectangle. A square in contains a set , a set , and for every , there is a set . We denote the family of these ’s as .
Given , we say it is a square in if, for every , there exists some and such that:
• and, for all , ;
• and, for all , .
As in previous sections, we use the set to denote the unrestricted coordinates and to denote the fixed coordinates. We remark that the definition above enforces that (as ) for all . Hence, the fixed coordinates do not reveal any information about whether the input is a yes instance or a no instance.
Similar to the Raz-McKenzie simulation, we also use a notion of thickness in the proof.
Definition 5.4 (Thickness).
A set is -thick if it is not empty and, for every and , we have
We say that a square is -thick if the set is -thick.
In our proof, we always choose , and we sometimes abbreviate -thick as thick. The following thickness-to-full-range lemma is a standard fact in query-to-communication simulations.
Lemma 5.5.
Let be a thick set. Then for every , there is a pair such that
The proof of this lemma will be included in the full version. As a byproduct of this lemma, we have the following corollary.
Corollary 5.6.
Let be a rectangle containing a square such that and is thick. Then is not monochromatic.
Definition 5.7 (Average degree).
Let . For each , we define the set as
We say that the average degree of is if holds for all . We say that a square has an average degree if the average degree of is .
Regarding the average degree, we have a simple but useful fact.
Fact 5.8.
For , let be a set that has average degree . Then any subset of size has average degree .
A crucial component in the Raz-McKenzie simulation, connecting thickness and average degree, is the thickness lemma. In our proof, we borrow a version from [LM19].
Lemma 5.9 (Thickness lemma [LM19]).
Let be parameters. Let and . If has average degree , then there is a -thick set of size .
We also fix and together with . Recall that for some . In this regime of parameters, we have . Hence, as long as we maintain a square with average degree , we are able to apply the thickness lemma.
Lemma 5.10 (Projection lemma).
Let be a rectangle and let be a thick square in . If the set has size more than , then there is a square in such that
• and ,
• has average degree ,
• .
Proof sketch.
We prove this lemma by a standard structure-vs-pseudorandomness approach. We first describe the process (Algorithm 1) to find the set and .
We note that the average degree of is at least , otherwise the algorithm would not stop. Following this algorithm, it is also clear that . This implies that because .
Now, for each , we randomly pick a set by independently including each element with probability . Let and be those strings such that, there exists an input and such that:
• and, for all , .
• and, for all , .
We show that, with high probability, the square is a witness for this lemma. We already argued that ; now we show that for every ,
This inequality uses the fact that is -thick, then a Chernoff bound on each , and a union bound over all . We omit the details here and will include them in the full version.
Once this is established, by an averaging argument there is a choice of such that
• ;
• has average degree , by Fact 5.8 and the fact that has average degree and .
∎
Now we are ready to explain how to find a long path in the communication tree.
5.1 Finding a long path in a communication tree
Before presenting our algorithm, we first fix some notation.
Definition 5.11.
Let be a square in a rectangle . For any sub-rectangle of , the sub-square is defined as follows:
• Keep and the same.
• contains all of those such that there exist inputs and ,
and, for all , ,
and, for all , .
Definition 5.12 (Density function).
For a square , we define its density as
Now we describe how to find a long path in the communication tree. Recall that every node in a communication tree has an associated rectangle. Starting from the root, we find a path as follows:
1. We maintain a square in each intermediate node.
2. For each intermediate node, the path always visits the left or right child whose associated rectangle maximizes the density.
The pseudo-code is given in Algorithm 2.
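Since the pseudo-code is not reproduced here, the following minimal sketch is our own rendering of the verbal description above (not the paper's Algorithm 2): walk down the communication tree, keep a square at each node, move to the child whose rectangle yields the denser sub-square, and stop at a leaf. The node attributes and the `restrict_square` and `density` callbacks are placeholders, and the interleaved projection steps (Lemma 5.10) and thickness maintenance (Lemma 5.9) are omitted.

```python
def find_long_path(root, restrict_square, density):
    """Walk down a communication tree, always moving to the denser child.

    Each internal node is assumed to have `left`/`right` children and an
    associated `rectangle`; the root carries an `initial_square`.
    `restrict_square(square, rectangle)` returns the sub-square inside a
    child's rectangle (Definition 5.11), and `density(square)` evaluates the
    density function (Definition 5.12).
    """
    node, square = root, root.initial_square
    path_length = 0
    while node.left is not None and node.right is not None:
        left_sq = restrict_square(square, node.left.rectangle)
        right_sq = restrict_square(square, node.right.rectangle)
        if density(left_sq) >= density(right_sq):
            node, square = node.left, left_sq
        else:
            node, square = node.right, right_sq
        path_length += 1
    return node, square, path_length
```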
Proof sketch of Theorem 1.3.
Let be the value of when the algorithm terminates. We note that this is the length of our path, which is a lower bound on the deterministic communication complexity. Now we argue a lower bound on by analyzing the changes to the density function in Algorithm 2. We consider two types of density changes, called simulation and projection respectively.
• Simulation.
• Projection. Line 21 is a projection. For every , if , then
Note that, to apply Lemma 5.9, we need control over the average degrees; to apply Lemma 5.10, we need control over the thickness. Indeed, we will inductively show that the following properties hold for all :
• : the average degree of is at least .
• : the average degree of is at least .
• : is -thick.
The base case holds because and . The rest can be proved by applying the thickness lemma and the projection lemma alternately. We skip the proof here and will include it in the full version.
Finally, we observe that we must have that when the algorithm terminates at step . Otherwise, it is not a monochromatic rectangle by Corollary 5.6.
We note that the total density decrease before the algorithm terminates is at most . On the other hand, the total density increase before termination is at least . This implies
and the result follows. ∎
5.2 Discussions and open problems
A very interesting follow-up open problem is to study -UEQUAL lower bounds for . In our proof, the main bottleneck is Lemma 5.9 (the thickness lemma), which requires that . Note that and . Hence, Lemma 5.9 only applies to the range . In fact, the thickness lemma (or similar lemmas) is also the main barrier in query-to-communication lifting theorems. Lifting theorems usually require a full-range lemma (something similar to Lemma 5.5) to maintain a full simulation of the communication tree. We use the term full simulation to refer to proofs that aim to construct a decision tree that exactly computes the Boolean function.
In contrast, we only attempt to find a long path in the communication tree. This approach was suggested by Yang and Zhang [YZ22]. In our analysis, only Corollary 5.6 (a direct corollary of the full-range lemma) is needed. This is much weaker than the full-range requirement. Recall that the full-range lemma shows that: For every , there is a pair such that
For the -UEQUAL problem, we only care about a subset of . Here is the indicator vector. This observation may give a chance to avoid the full-range barrier, providing tight lower bounds for all .
Overall, we believe that the long-path paradigm may provide more applications beyond the full-simulation paradigm.
References
- [AMOP08] Alexandr Andoni, Andrew McGregor, Krzysztof Onak, and Rina Panigrahy. Better bounds for frequency moments in random-order streams. arXiv preprint arXiv:0808.2222, 2008.
- [AMS99] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and system sciences, 58(1):137–147, 1999.
- [BBM11] Eric Blais, Joshua Brody, and Kevin Matulef. Property testing lower bounds via communication complexity. In 2011 IEEE 26th Annual Conference on Computational Complexity, pages 210–220, 2011.
- [BEO+13] Mark Braverman, Faith Ellen, Rotem Oshman, Toniann Pitassi, and Vinod Vaikuntanathan. A tight bound for set disjointness in the message-passing model. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 668–677. IEEE, 2013.
- [BFS86] László Babai, Peter Frankl, and Janos Simon. Complexity classes in communication complexity theory (preliminary version). In FOCS 1986, 1986.
- [BGK+18] Mark Braverman, Ankit Garg, Young Kun Ko, Jieming Mao, and Dave Touchette. Near-optimal bounds on the bounded-round quantum communication complexity of disjointness. SIAM Journal on Computing, 47(6):2277–2314, 2018.
- [BM13] Mark Braverman and Ankur Moitra. An information complexity approach to extended formulations. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 161–170, 2013.
- [BO17] Mark Braverman and Rotem Oshman. A rounds vs. communication tradeoff for multi-party set disjointness. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 144–155. IEEE, 2017.
- [BR20] Yakov Babichenko and Aviad Rubinstein. Communication complexity of nash equilibrium in potential games. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 1439–1445. IEEE, 2020.
- [Bra12] Mark Braverman. Interactive information complexity. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 505–524, 2012.
- [BYJKS04] Ziv Bar-Yossef, Thathachar S Jayram, Ravi Kumar, and D Sivakumar. An information statistics approach to data stream and communication complexity. Journal of Computer and System Sciences, 68(4):702–732, 2004.
- [CCM08] Amit Chakrabarti, Graham Cormode, and Andrew McGregor. Robust lower bounds for communication and stream computation. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 641–650, 2008.
- [CFK+19] Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, and Toniann Pitassi. Query-to-communication lifting using low-discrepancy gadgets. arXiv preprint arXiv:1904.13056, 2019.
- [CKLM18] Arkadev Chattopadhyay, Michal Kouckỳ, Bruno Loff, and Sagnik Mukhopadhyay. Simulation beats richness: New data-structure lower bounds. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1013–1020, 2018.
- [CKS03] Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In IEEE Conference on Computational Complexity, pages 107–117, 2003.
- [CLRS16] Siu On Chan, James R Lee, Prasad Raghavendra, and David Steurer. Approximate constraint satisfaction requires large lp relaxations. Journal of the ACM (JACM), 63(4):1–22, 2016.
- [CMS20] Arkadev Chattopadhyay, Nikhil S Mande, and Suhail Sherif. The log-approximate-rank conjecture is false. Journal of the ACM (JACM), 67(4):1–28, 2020.
- [CMVW16] Michael Crouch, Andrew McGregor, Gregory Valiant, and David P Woodruff. Stochastic streams: Sample complexity vs. space complexity. In 24th Annual European Symposium on Algorithms (ESA 2016). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
- [CP10] Arkadev Chattopadhyay and Toniann Pitassi. The story of set disjointness. ACM SIGACT News, 41(3):59–85, 2010.
- [CSWY01] Amit Chakrabarti, Yaoyun Shi, Anthony Wirth, and Andrew Yao. Informational complexity and the direct sum problem for simultaneous message complexity. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 270–278. IEEE, 2001.
- [DOR21] Nachum Dershowitz, Rotem Oshman, and Tal Roth. The communication complexity of multiparty set disjointness under product distributions. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, page 1194–1207, New York, NY, USA, 2021. Association for Computing Machinery.
- [dR19] Susanna F de Rezende. Lower Bounds and Trade-offs in Proof Complexity. PhD thesis, KTH Royal Institute of Technology, 2019.
- [DRNV16] Susanna F De Rezende, Jakob Nordström, and Marc Vinyals. How limited interaction hinders real communication (and what it means for proof and circuit complexity). In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 295–304. IEEE, 2016.
- [Gav16] Dmitry Gavinsky. Communication complexity of inevitable intersection. ArXiv, abs/1611.08842, 2016.
- [GGKS18] Ankit Garg, Mika Göös, Pritish Kamath, and Dmitry Sokolov. Monotone circuit lower bounds from resolution. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 902–911, 2018.
- [GH09] Sudipto Guha and Zhiyi Huang. Revisiting the direct sum theorem and space lower bounds in random order streams. In International Colloquium on Automata, Languages, and Programming, pages 513–524. Springer, 2009.
- [GJPW18] Mika Göös, T. S. Jayram, Toniann Pitassi, and Thomas Watson. Randomized communication versus partition number. ACM Trans. Comput. Theory, 10(1), jan 2018.
- [GJW18] Mika Göös, Rahul Jain, and Thomas Watson. Extension complexity of independent set polytopes. SIAM Journal on Computing, 47(1):241–269, 2018.
- [GKY22] Mika Göös, Stefan Kiefer, and Weiqiang Yuan. Lower bounds for unambiguous automata via communication complexity. Leibniz International Proceedings in Informatics, 229(1), 2022.
- [GNOR15] Yannai A. Gonczarowski, Noam Nisan, Rafail Ostrovsky, and Will Rosenbaum. A stable marriage requires communication. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’15, page 1003–1017, USA, 2015. Society for Industrial and Applied Mathematics.
- [Göö15] Mika Göös. Lower bounds for clique vs. independent set. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 1066–1076. IEEE, 2015.
- [GP18] Mika Göös and Toniann Pitassi. Communication lower bounds via critical block sensitivity. SIAM Journal on Computing, 47(5):1778–1806, 2018.
- [GPW17] Mika Göös, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for BPP. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 132–143. IEEE, 2017.
- [GPW18] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. SIAM Journal on Computing, 47(6):2435–2450, 2018.
- [GR21] Mika Göös and Aviad Rubinstein. Near-optimal communication lower bounds for approximate nash equilibria. SIAM Journal on Computing, pages FOCS18–316, 2021.
- [Gri85] Dima Grigoriev. Lower bounds in algebraic computational complexity. Journal of Soviet Mathematics, 1985.
- [Gro09] Andre Gronemeier. Asymptotically optimal lower bounds on the NIH multi-party information complexity of the AND-function and disjointness. In Proc. of the 26th International Symposium on Theoretical Aspects of Computer Science, STACS, pages 505–516, 2009.
- [GS17] Anat Ganor and Karthik C. S. Communication complexity of correlated equilibrium in two-player games. arXiv preprint arXiv:1704.01104, 2017.
- [GW16] Mika Göös and Thomas Watson. Communication complexity of set-disjointness for all probabilities. Theory of Computing, 12(1):1–23, 2016.
- [HJ13] Prahladh Harsha and Rahul Jain. A strong direct product theorem for the tribes function via the smooth-rectangle bound. arXiv preprint arXiv:1302.0275, 2013.
- [HN12] Trinh Huynh and Jakob Nordström. On the virtue of succinct proofs: Amplifying communication complexity hardness to time-space trade-offs in proof complexity. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 233–248, 2012.
- [HW07] Johan Håstad and Avi Wigderson. The randomized communication complexity of set disjointness. Theory of Computing, 3(1):211–219, 2007.
- [Jay09] T. S. Jayram. Hellinger strikes back: A note on the multi-party information complexity of AND. In APPROX/RANDOM 2009, pages 562–573, Berlin, Heidelberg, 2009. Springer-Verlag.
- [JK10] Rahul Jain and Hartmut Klauck. The partition bound for classical communication complexity and query complexity. In 2010 IEEE 25th Annual Conference on Computational Complexity, pages 247–258. IEEE, 2010.
- [JRS03] Rahul Jain, Jaikumar Radhakrishnan, and Pranab Sen. A lower bound for the bounded round quantum communication complexity of set disjointness. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 220–229. IEEE, 2003.
- [Juk11] Stasys Jukna. Extremal combinatorics: with applications in computer science, volume 571. Springer, 2011.
- [KMR21] Pravesh K Kothari, Raghu Meka, and Prasad Raghavendra. Approximating rectangles by juntas and weakly exponential lower bounds for lp relaxations of csps. SIAM Journal on Computing, pages STOC17–305, 2021.
- [KPW21] Akshay Kamath, Eric Price, and David P Woodruff. A simple proof of a new set disjointness with applications to data streams. arXiv preprint arXiv:2105.11338, 2021.
- [KS92] Bala Kalyanasundaram and Georg Schnitger. The probabilistic communication complexity of set intersection. SIAM J. Discret. Math., 5(4):545–557, 1992.
- [KW09] Eyal Kushilevitz and Enav Weinreb. The communication complexity of set-disjointness with small sets and 0-1 intersection. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 63–72, 2009.
- [LM19] Bruno Loff and Sagnik Mukhopadhyay. Lifting theorems for equality. In 36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
- [LMM+22] Shachar Lovett, Raghu Meka, Ian Mertz, Toniann Pitassi, and Jiapeng Zhang. Lifting with sunflowers. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
- [LRS15] James R Lee, Prasad Raghavendra, and David Steurer. Lower bounds on the size of semidefinite programming relaxations. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 567–576, 2015.
- [MM22] Yahel Manor and Or Meir. Lifting with inner functions of polynomial discrepancy. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
- [MNSW95] Peter Bro Miltersen, Noam Nisan, Shmuel Safra, and Avi Wigderson. On data structures and asymmetric communication complexity. In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’95, page 103–111, New York, NY, USA, 1995. Association for Computing Machinery.
- [PR17] Toniann Pitassi and Robert Robere. Strongly exponential lower bounds for monotone computation. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1246–1255, 2017.
- [Raz92] Aleksandr Razborov. On the distributional complexity of set disjointness. Theoretical Computer Science, 106:385–390, 1992.
- [Raz03] Alexander A Razborov. Quantum communication complexity of symmetric predicates. Izvestiya: Mathematics, 67(1):145, 2003.
- [RM97] Ran Raz and Pierre McKenzie. Separation of the monotone nc hierarchy. In Proceedings 38th Annual Symposium on Foundations of Computer Science, pages 234–243. IEEE, 1997.
- [RPRC16] Robert Robere, Toniann Pitassi, Benjamin Rossman, and Stephen A. Cook. Exponential lower bounds for monotone span programs. In Proceedings of the 57th Symposium on Foundations of Computer Science (FOCS), pages 406–415. IEEE Computer Society, 2016.
- [RY15] Anup Rao and Amir Yehudayoff. Simplified lower bounds on the multiparty communication complexity of disjointness. In 30th Conference on Computational Complexity (CCC 2015). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015.
- [RY20] Anup Rao and Amir Yehudayoff. Communication Complexity: and Applications. Cambridge University Press, 2020.
- [ST13] Mert Saglam and Gábor Tardos. On the communication complexity of sparse set disjointness and exists-equal problems. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 678–687, 2013.
- [WW15] Omri Weinstein and David P Woodruff. The simultaneous communication of disjointness with applications to data streams. In International Colloquium on Automata, Languages, and Programming, pages 1082–1093. Springer, 2015.
- [Yao79] Andrew Chi-Chih Yao. Some complexity questions related to distributive computing (preliminary report). In Proceedings of the eleventh annual ACM symposium on Theory of computing, pages 209–213, 1979.
- [YZ22] Guangxu Yang and Jiapeng Zhang. Simulation methods in communication lower bounds, revisited. Electron. Colloquium Comput. Complex., TR22-019, 2022.
Appendix A Missing proofs in Section 4
In this section, we give a proof for the bias lemma (Lemma 4.11). We first recall this lemma below.
Lemma A.1 (Lemma 4.11 restated).
Let be the constant and be the set from Lemma 4.9. For any and distinct , we have that
Lemma A.2.
[YZ22] For any and ,
Proof of Lemma A.1.
We first recall the random variable in the definition of (Definition 4.1). Then for any , we have that
For an , we call a rectangle good for if
Since the error of the deterministic protocol under is at most , we have
Let be the set of leaf rectangles where the protocol outputs and the set of leaf rectangles where the protocol outputs . We have
Since , we have,
Since , we have,
Thus,
Since
By an averaging argument, there is a set of coordinates with such that for any ,
For any ,
and
Intuitively, is the probability that happens under the distribution , is the probability that and happen under , and is the probability that and happen under .
We recall the connections between and . Let ,
and
By Lemma A.2, we have
Since , . Thus,
We can also prove that for any , by replacing with in the above proof. Thus, also satisfies Lemma 4.9. ∎