Tight Bounds for Distributed Functional Monitoring
Abstract
We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008), and receiving recent attention. In this model there are $k$ sites, each tracking its input stream and communicating with a central coordinator. The coordinator's task is to continuously maintain an approximate output to a function computed over the union of the $k$ streams. The goal is to minimize the number of bits communicated.
Let the $p$-th frequency moment be defined as $F_p = \sum_i f_i^p$, where $f_i$ is the frequency of element $i$. We show the randomized communication complexity of estimating the number of distinct elements (that is, $F_0$) up to a $1+\varepsilon$ factor is $\Omega(k/\varepsilon^2)$, improving upon the previous $\Omega(k + 1/\varepsilon^2)$ bound and matching known upper bounds up to a logarithmic factor. For $F_p$, $p > 1$, we improve the previous $\Omega(k + 1/\varepsilon^2)$ bits of communication bound to $\Omega(k^{p-1}/\varepsilon^2)$. We obtain similar improvements for heavy hitters, empirical entropy, and other problems. Our lower bounds are the first of any kind in distributed functional monitoring to depend on the product of $k$ and $1/\varepsilon^2$. Moreover, the lower bounds are for the static version of the distributed functional monitoring model, where the coordinator only needs to compute the function at the time when all input streams end; surprisingly, they almost match what is achievable in the (dynamic version of the) distributed functional monitoring model, where the coordinator needs to keep track of the function continuously at every time step. We also show that $F_p$, for any $p > 1$, can be estimated using $\tilde{O}(k^{p-1}\,\mathrm{poly}(1/\varepsilon))$ bits of communication. This drastically improves upon the previous bounds of Cormode, Muthukrishnan, and Yi for general $p$, and upon their $\tilde{O}(k^2/\varepsilon + k^{3/2}/\varepsilon^3)$ bits bound for $p = 2$. For $p = 2$, our bound resolves their main open question.
Our lower bounds are based on new direct sum theorems for approximate majority, and yield improvements to classical problems in the standard data stream model. First, we improve the known lower bound for estimating $F_p$, $p > 2$, in $t$ passes from $\Omega(n^{1-2/p}/(\varepsilon^{2/p} t))$ to $\Omega(n^{1-2/p}/(\varepsilon^{4/p} t))$, giving the first bound that matches what we expect when $p = 2$ for any constant number of passes. Second, we give the first lower bound of $\Omega(1/(\varepsilon^2 t))$ bits of space for estimating $F_0$ in $t$ passes that does not use the hardness of the gap-hamming problem.
1 Introduction
Recent applications in sensor networks and distributed systems have motivated the distributed functional monitoring model, initiated by Cormode, Muthukrishnan, and Yi [20]. In this model there are $k$ sites and a single central coordinator. Each site receives a stream of data over time, and the coordinator wants to keep track of a function $f$ that is defined over the multiset union of the $k$ data streams at each point in time. For example, the function could be the number of distinct elements in the union of the streams. We assume that there is a two-way communication channel between each site and the coordinator, so that the sites can communicate with the coordinator. The goal is to minimize the total amount of communication between the sites and the coordinator so that the coordinator can approximately maintain $f$ at all times. Minimizing the total communication is motivated by power constraints in sensor networks, since communication typically uses a power-hungry radio [25], and also by network bandwidth constraints in distributed systems. There is a large body of work on monitoring problems in this model, including maintaining a random sample [21, 50], estimating frequency moments [18, 20], finding the heavy hitters [5, 42, 45, 54], approximating the quantiles [19, 54, 35], and estimating the entropy [4].
We can think of the distributed functional monitoring model as follows. Each of the $k$ sites holds an $n$-dimensional vector $v_i$, where $n$ is the size of the universe. An update to a coordinate $j$ on site $i$ causes $(v_i)_j$ to increase by $1$. The goal is to estimate a statistic of $v = \sum_{i=1}^{k} v_i$, such as the $p$-th frequency moment $F_p = \sum_{j=1}^{n} v_j^p$, the number of distinct elements $F_0 = |\{j : v_j \neq 0\}|$, and the empirical entropy $H = \sum_{j=1}^{n} \frac{v_j}{\|v\|_1} \log \frac{\|v\|_1}{v_j}$. This is the standard insertion-only model. For many of these problems, with the exception of the empirical entropy, there are strong lower bounds if one allows updates to coordinates that cause $(v_i)_j$ to decrease [4]. The latter is called the update model. Thus, except for entropy, we follow previous work and consider the insertion-only model.
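To make these definitions concrete, here is a minimal Python sketch (our own toy example, not part of the paper's model or protocols) that aggregates the sites' vectors and evaluates $F_p$, $F_0$, and the empirical entropy of the union of the streams.

```python
import math

# Toy example: k = 3 sites over a universe of size n = 6.
# Each site's vector counts how many times it has seen each item.
site_vectors = [
    [2, 0, 1, 0, 0, 3],
    [0, 4, 1, 0, 0, 0],
    [1, 0, 0, 0, 2, 0],
]

# Aggregate vector v = sum of the sites' vectors (the multiset union).
v = [sum(col) for col in zip(*site_vectors)]

def frequency_moment(v, p):
    """p-th frequency moment F_p = sum_j v_j^p (over nonzero coordinates)."""
    return sum(x ** p for x in v if x > 0)

def distinct_elements(v):
    """F_0 = number of coordinates with nonzero frequency."""
    return sum(1 for x in v if x > 0)

def empirical_entropy(v):
    """H = sum_j (v_j / m) * log2(m / v_j), where m = sum_j v_j."""
    m = sum(v)
    return sum((x / m) * math.log2(m / x) for x in v if x > 0)

print(v)                        # aggregated frequencies
print(frequency_moment(v, 2))   # F_2
print(distinct_elements(v))     # F_0
print(empirical_entropy(v))     # empirical entropy
```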
To prove lower bounds, we consider the static version of the distributed functional monitoring model, where the coordinator only needs to compute the function at the time when all input streams end. It is clear that a lower bound for the static case is also a lower bound for the dynamic case, in which the coordinator has to keep track of the function at every point in time. The static version of the distributed functional monitoring model is closely related to the multiparty number-in-hand communication model, where we again have $k$ sites, each holding an $n$-dimensional vector, and they want to jointly compute a function defined on the $k$ input vectors. It is easy to see that these two models are essentially the same, since in the former, if one site would like to send a message to another, it can always send the message to the coordinator first, and the coordinator can then forward the message to the other site. Doing this only increases the total amount of communication by a factor of two. Therefore, we do not distinguish between these two models in this paper.
There are two variants of the multiparty number-in-hand communication model we will consider: the blackboard model, in which each message a site sends is received by all other sites, i.e., it is broadcast, and the message-passing model, in which each message is between the coordinator and a specific site.
Despite the large body of work in the distributed functional monitoring model, the complexity of basic problems is not well understood. For example, for estimating $F_0$ up to a $(1+\varepsilon)$-factor, the best upper bound is $\tilde{O}(k/\varepsilon^2)$ [20] (we use $\tilde{O}(f)$ to denote $f$ multiplied by a polylogarithmic factor; all communication and information bounds in this paper, if not otherwise stated, are in terms of bits), while the only known lower bound is $\Omega(k + 1/\varepsilon^2)$. The dependence on $1/\varepsilon^2$ in the lower bound is not very insightful, as it follows just by considering two sites [4, 16]. The real question is whether the $k$ and $1/\varepsilon^2$ factors should multiply. Even more embarrassingly, for the frequency moments $F_p$, $p > 1$, the known algorithms use an amount of communication that grows super-linearly in $k$, while the only known lower bound is $\Omega(k + 1/\varepsilon^2)$ [4, 16]. Even for $p = 2$, the best known upper bound is $\tilde{O}(k^2/\varepsilon + k^{3/2}/\varepsilon^3)$ [20], and the main open question of that paper is to close this gap for $F_2$: either show a better lower bound, or give a solution whose communication grows (nearly) linearly in $k$.
Our Results: We significantly improve the previous communication bounds for approximating the frequency moments, entropy, heavy hitters, and quantiles in the distributed functional monitoring model. In many cases our bounds are optimal. Our results are summarized in Table 1, where they are compared with previous bounds.
Table 1: Comparison of previous lower bounds (LB) and upper bounds (UB) with the bounds obtained in this paper for $F_0$; $F_2$; $F_p$, $p > 1$; All-quantile; Heavy Hitters; and Entropy. All of our lower bounds hold already in the static model, and (BB) marks bounds proven in the blackboard model.
We have three main results, each introducing a new technique:
1. We show that estimating $F_0$ up to a $(1+\varepsilon)$ factor in the message-passing model requires $\Omega(k/\varepsilon^2)$ bits of communication, matching the upper bound of [20] up to a polylogarithmic factor. Our lower bound holds in the static model, in which the sites only need to approximate $F_0$ once on their inputs.
2. We show that $F_p$, for any constant $p > 1$, can be estimated using $\tilde{O}(k^{p-1}\,\mathrm{poly}(1/\varepsilon))$ bits of communication; for $p = 2$ this resolves the main open question of [20].
3. We show that $\Omega(k^{p-1}/\varepsilon^2)$ bits of communication are necessary for approximating $F_p$, $p > 1$, up to a $(1+\varepsilon)$ factor in the blackboard model, significantly improving the prior $\Omega(k + 1/\varepsilon^2)$ bound. As with our lower bound for $F_0$, these are the first lower bounds which depend on the product of $k$ and $1/\varepsilon^2$. As with $F_0$, our lower bound holds in the static model in which the sites only approximate $F_p$ once.
Our other results in Table 1 are explained in the body of the paper, and use similar techniques.
We would like to mention that after the conference version of this paper appeared, our results found applications in proving a space lower bound at each site for tracking heavy hitters in the functional monitoring model [36], and a communication complexity lower bound for computing $\varepsilon$-approximations of range spaces in the message-passing model [34].
Our Techniques: Lower Bound for $F_0$: For illustration, suppose $\varepsilon = 1/\sqrt{k}$. There are $k$ sites, each holding a random independent bit. Their task is to approximate the sum of the bits up to an additive error of $\sqrt{k}$. Call this problem $k$-APPROX-SUM. (In the conference version of this paper we introduced a problem called $k$-GAP-MAJ, in which the sites need to decide whether clearly more than half or clearly fewer than half of the bits are $1$. We instead use $k$-APPROX-SUM here since we feel it is easier to work with: this problem is stronger than $k$-GAP-MAJ, and is thus easier to lower bound, and it suffices for our purposes. $k$-GAP-MAJ will be introduced and used in Section 6.1 for heavy hitters and quantiles.) We show any correct protocol must reveal $\Omega(k)$ bits of information about the sites' inputs. We "compose" this with $2$-party disjointness ($2$-DISJ) [48], in which each party has a bitstring of length $\Theta(1/\varepsilon^2)$, and either the strings have disjoint support (the solution is $0$) or there is a single coordinate which is $1$ in both strings (the solution is $1$). Let $\tau$ be the hard distribution for $2$-DISJ, shown to require $\Omega(1/\varepsilon^2)$ bits of communication to solve [48]. Suppose the coordinator and each site share an instance of $2$-DISJ in which the solution to $2$-DISJ is a random bit, which is the site's effective input to $k$-APPROX-SUM. The coordinator has the same input for each of the $k$ instances, while each site has an independent input drawn from $\tau$ conditioned on the coordinator's input, with its output bit determined as in $k$-APPROX-SUM. The inputs are chosen so that if the output of a $2$-DISJ instance is $1$, then $F_0$ increases by $1$, and otherwise it remains the same. This is not entirely accurate, but it illustrates the main idea. Now, the key is that by the rectangle property of $k$-party communication protocols, the different output bits are independent conditioned on the transcript. Thus if a protocol does not reveal $\Omega(k)$ bits of information about these output bits, by an anti-concentration theorem we can show that the protocol cannot succeed with large probability. Finally, since a $(1+\varepsilon)$-approximation to $F_0$ can decide $k$-APPROX-SUM, and since any correct protocol for $k$-APPROX-SUM must reveal $\Omega(k)$ bits of information, the protocol must solve $\Omega(k)$ instances of $2$-DISJ, each requiring $\Omega(1/\varepsilon^2)$ bits of communication (otherwise the coordinator could simulate $k-1$ of the sites and obtain a low-communication protocol for $2$-DISJ with the remaining site, contradicting the communication lower bound for $2$-DISJ on this distribution). We obtain an $\Omega(k/\varepsilon^2)$ bound for $F_0$ for general $\varepsilon$ by using similar arguments. One cannot show this in the blackboard model since there is an $\tilde{O}(k + 1/\varepsilon^2)$ upper bound for $F_0$ there. (The idea is to first obtain a constant-factor approximation. Then, sub-sample so that there are $\Theta(1/\varepsilon^2)$ distinct elements. Then the first party broadcasts its distinct sampled elements, the second party broadcasts the distinct sampled elements it has that the first party does not, and so on.)
Lower Bound for $F_p$: Our bound for $F_p$ cannot use the above reduction, since we do not know how to turn a protocol for approximating $F_p$ into a protocol for solving the composition of $k$-APPROX-SUM and $2$-DISJ. Instead, our starting point is a recent lower bound for the $2$-party gap-hamming distance problem GHD [16]. The parties have length-$n$ bitstrings $x$ and $y$, respectively, and they must decide whether the Hamming distance $\Delta(x, y) > n/2 + \sqrt{n}$ or $\Delta(x, y) < n/2 - \sqrt{n}$. A simplification by Sherstov [49] shows that a related problem, called $2$-GAP-ORT, also has communication complexity $\Omega(n)$ bits. Here there are two parties holding length-$n$ bitstrings $x$ and $y$, and they must decide whether the number of coordinates on which $x$ and $y$ differ deviates from $n/2$ by more than $\sqrt{n}$. Chakrabarti et al. [15] showed that any correct protocol for $2$-GAP-ORT must reveal $\Omega(n)$ bits of information about $(x, y)$. By independence and the chain rule, this means that for $\Omega(n)$ indices $i$, $\Omega(1)$ bits of information are revealed about $(x_i, y_i)$ conditioned on the values of the preceding coordinates. We now "embed" an independent copy of a variant of $k$-party disjointness, the $k$-XOR problem, on each of the coordinates of $2$-GAP-ORT. In this variant, there are $k$ parties, each holding a bitstring. On all but one "special", randomly chosen coordinate, there is a single site assigned to the coordinate, and that site uses private randomness to choose whether its value on the coordinate is $0$ or $1$ (with equal probability), while the remaining sites have $0$ on this coordinate. On the special coordinate, with probability $1/4$ all sites have a $0$ on this coordinate (a "00" instance), with probability $1/4$ the first $k/2$ parties have a $1$ on this coordinate and the remaining parties have a $0$ (a "10" instance), with probability $1/4$ the second $k/2$ parties have a $1$ on this coordinate and the remaining parties have a $0$ (a "01" instance), and with the remaining probability $1/4$ all parties have a $1$ on this coordinate (a "11" instance). We show, via a direct sum for distributional communication complexity, that any deterministic protocol that decides which case the special coordinate is in with sufficiently large probability has large conditional information cost. This implies that any protocol that can decide whether the instance type lies in the set $\{00, 11\}$ (the "XOR" of the two output bits) with sufficiently large probability also has large conditional information cost. We do the direct sum argument by conditioning the mutual information on low-entropy random variables which allow us to fill in inputs on the remaining coordinates without any communication between the parties and without asymptotically affecting our lower bound. We design a reduction so that on the $i$-th coordinate of $2$-GAP-ORT, the input of the first $k/2$ players of $k$-XOR is determined by the public coin (which we condition on) and the first party's $i$-th input bit to $2$-GAP-ORT, and the input of the second $k/2$ players of $k$-XOR is determined by the public coin and the second party's $i$-th input bit to $2$-GAP-ORT. We show that any protocol that solves the composition of $2$-GAP-ORT with $n$ copies of $k$-XOR, a problem that we call $k$-BTX, must reveal a constant amount of information about the two output bits of a constant fraction of the copies, and from our information cost lower bound for a single copy we obtain an overall lower bound for $k$-BTX. Finally, one can show that a $(1+\varepsilon)$-approximation algorithm for $F_p$ can be used to solve $k$-BTX.
Upper Bound for $F_p$: We illustrate the algorithm for $p = 2$ and constant $\varepsilon$. Unlike [20], we do not use AMS sketches [3]. A nice property of our protocol is that it is the first one-way protocol (the protocol of [20] is not), in the sense that only the sites send messages to the coordinator (the coordinator does not send any messages). Moreover, all messages are simple: if a site receives an update to the $i$-th coordinate, then, provided the frequency of coordinate $i$ in its stream exceeds a threshold, it decides with a certain probability to send $i$ to the coordinator. Unfortunately, one can show that this probability cannot be the same for all coordinates $i$, as otherwise the communication would be too large.
To determine the threshold and probability with which to send an update to a coordinate $i$, the sites use the public coin to randomly group all coordinates into buckets $B_1, B_2, \ldots$, where $B_\ell$ contains roughly a $2^{-\ell}$ fraction of the input coordinates. For coordinates in $B_\ell$, the threshold and probability are only a function of $\ell$. Inspired by work on sub-sampling [37], we try to estimate the number of coordinates whose magnitude lies in the range $[2^{t-1}, 2^{t})$, for each $t$. Call this class of coordinates $S_t$. If the contribution to $F_2$ from $S_t$ is significant, then $|S_t|$ cannot be too small, and to estimate $|S_t|$ we only consider those coordinates of $S_t$ that fall in a bucket $B_\ell$ for a value of $\ell$ under which the expected number of sampled coordinates of $S_t$ is neither too large nor too small. We do not know $|S_t|$, and so we also do not know the right $\ell$, but we can make a logarithmic number of guesses. We note that the work [37] was available to the authors of [20] for several years, but adapting it to the distributed framework here is tricky, in the sense that the "heavy hitters" algorithm used in [37] for finding elements in the different classes $S_t$ needs to be implemented in a $k$-party communication-efficient way.
When choosing the threshold and probability we have two competing constraints: on the one hand, these values must be chosen so that we can accurately estimate the class sizes $|S_t|$ from the samples; on the other hand, they must be chosen so that the communication is not excessive. Balancing these two constraints forces us to use a threshold instead of just the same probability for all coordinates in a bucket. By choosing the thresholds and probabilities to be appropriate functions of the bucket index, we can satisfy both constraints. Other minor issues in the analysis arise from the fact that different classes contribute at different times, and that the coordinator must be correct at all times. These issues can be resolved by conditioning on a quantity related to the protocol's correctness being accurate at a small number of selected times in the stream, and then arguing that the quantity is non-decreasing and that this implies correctness at all times.
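The following minimal Python sketch (our own illustration, not the paper's Algorithms 1 to 4; the bucket fractions, thresholds, and send probabilities below are placeholder choices) shows the shape of the site-side rule: each coordinate is hashed to a bucket class using the shared public coin, and once its local frequency passes that class's threshold, further updates are forwarded to the coordinator with the class's probability.

```python
import random

NUM_BUCKETS = 10  # placeholder: roughly a logarithmic number of bucket classes

def bucket_of(coord, seed=0):
    """Assign a coordinate to bucket class ell with probability ~ 2^{-ell},
    using a hash seeded by the shared public coin."""
    rng = random.Random(hash((coord, seed)))
    ell = 1
    while ell < NUM_BUCKETS and rng.random() < 0.5:
        ell += 1
    return ell

class Site:
    def __init__(self, site_id, public_seed=0):
        self.site_id = site_id
        self.public_seed = public_seed
        self.local_freq = {}   # local frequency of each coordinate
        self.outbox = []       # messages destined for the coordinator

    def update(self, coord):
        self.local_freq[coord] = self.local_freq.get(coord, 0) + 1
        ell = bucket_of(coord, self.public_seed)
        threshold = 2 ** ell        # placeholder threshold for class ell
        send_prob = 2.0 ** (-ell)   # placeholder send probability for class ell
        if self.local_freq[coord] >= threshold and random.random() < send_prob:
            self.outbox.append((self.site_id, coord))

# Usage sketch: feed a stream of coordinate updates to one site.
site = Site(site_id=0)
for coord in [3, 3, 3, 7, 3, 7, 7, 3]:
    site.update(coord)
print(site.outbox)  # messages that would be sent to the coordinator
```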
Implications for the Data Stream Model:
In 2003, Indyk and Woodruff introduced the GHD problem [38], and a one-round lower bound shortly followed [52]. Ever since, it seemed that the space complexity of estimating $F_0$ in a data stream with multiple passes hinged on whether GHD required $\Omega(1/\varepsilon^2)$ communication when the number of rounds is unrestricted; see, e.g., Question 10 in [2]. A flurry of recent work [9, 10, 16, 51, 49] finally resolved the communication complexity of GHD. What our lower bound shows for $F_0$ is that this is not the only way to prove the multi-pass space bound for $F_0$: indeed, we just needed to look at three parties instead of two. Since we have an $\Omega(1/\varepsilon^2)$ communication lower bound for $F_0$ with a constant number of parties, this implies an $\Omega(1/(\varepsilon^2 t))$ space bound for $t$-pass streaming algorithms approximating $F_0$. Arguably, our proof is simpler than the recent GHD lower bounds.
Our bound for $F_p$ also improves a long line of work on the space complexity of estimating $F_p$, $p > 2$, in a data stream. The current best upper bound is $n^{1-2/p} \cdot \mathrm{poly}(1/\varepsilon, \log n)$ bits of space [28]. See Figure 1 of [28] for a list of papers which make progress on the $\varepsilon$ and logarithmic factors. The previous best lower bound for $t$-pass algorithms is $\Omega(n^{1-2/p}/(\varepsilon^{2/p} t))$ [7]. By choosing the number of sites $k$ appropriately in our distributed lower bound, we obtain a total communication lower bound which implies a space lower bound of $\Omega(n^{1-2/p}/(\varepsilon^{4/p} t))$ for $t$-pass algorithms for $F_p$ in a data stream. This gives the first bound that matches the tight $\Theta(1/\varepsilon^2)$ bound as $p$ approaches $2$, for any constant number of passes. After our work, Ganguly [29] improved this bound for the special case of one pass; that is, for $1$-pass algorithms for estimating $F_p$, $p > 2$, he shows a stronger space lower bound.
Other Related Work: There are quite a few papers on multiparty number-in-hand communication complexity, though most are not directly relevant to the problems studied in this paper. Alon et al. [3] and Bar-Yossef et al. [7] studied lower bounds for multiparty set-disjointness, which has applications to $p$-th frequency moment estimation for $p > 2$ in the streaming model. Their results were further improved in [14, 31, 39]. Chakrabarti et al. [12] studied random-partition communication lower bounds for multiparty set-disjointness and pointer jumping, which have a number of applications in the random-order data stream model. Other work includes Chakrabarti et al. [13] for median selection, and Magniez et al. [44] and Chakrabarti et al. [11] for streaming language recognition. Very few studies have been conducted in the message-passing model. Duris and Rolim [23] proved several lower bounds in the message-passing model, but only for some simple boolean functions. Three related but more restrictive private-message models were studied by Gal and Gopalan [27], Ergün and Jowhari [24], and Guha and Huang [32]. The first two only investigated deterministic protocols, and the third was tailored to the random-order data stream model.
Recently, Phillips et al. [47] introduced a technique called symmetrization for the number-in-hand communication model. The idea is to find a symmetric hard distribution for the $k$ players, and then reduce the $k$-player problem to a $2$-player problem by assigning Alice the input of a random player and Bob the inputs of the remaining $k - 1$ players. The answer to the $2$-player problem gives the answer to the $k$-player problem. By symmetrization one can argue that if the communication lower bound for the resulting $2$-player problem is $L$, then the lower bound for the $k$-player problem is $\Omega(k \cdot L)$.
While the symmetrization technique developed in [47] can be used to solve some problems for which other techniques are not known, such as bitwise AND/OR and graph connectivity, it has several limitations. First, symmetrization requires a symmetric hard distribution, and for many problems (e.g., those in this paper) such a distribution is not known or is unlikely to exist. Second, for many problems (e.g., those in this paper), we need a direct-sum type of argument with certain combining functions (e.g., the majority function MAJ), while in [47] only outputting all copies, or combining with the function OR, is considered. Third, the symmetrization technique in [47] does not give information cost bounds, and so it is difficult to use when composing problems as is done in this paper. In this paper, we further develop symmetrization to make it work with the combining function MAJ and with information cost.
Paper Outline:
In Section 3 and Section 4 we prove our lower bounds for $F_0$ and for $F_p$, $p > 1$. The lower bounds apply to functional monitoring, but hold even in the static model.
In Section 5 we show improved upper bounds for $F_p$ for functional monitoring. Finally, in Section 6 we prove lower bounds for all-quantiles, heavy hitters, entropy, and $F_p$ in the blackboard model.
2 Preliminaries
In this section we review some basics on communication complexity and information theory.
Information Theory
We refer the reader to [22] for a comprehensive introduction to information theory. Here we review a few concepts and notations.
Let $H(X)$ denote the Shannon entropy of the random variable $X$, and let $H_b(p) = -p \log p - (1-p) \log (1-p)$ denote the binary entropy function for $p \in (0, 1)$. Let $H(X \mid Y)$ denote the conditional entropy of $X$ given $Y$. Let $I(X; Y)$ denote the mutual information between two random variables $X$ and $Y$. Let $I(X; Y \mid Z)$ denote the mutual information between two random variables $X$ and $Y$ conditioned on $Z$. The following is a summary of the basic properties of entropy and mutual information that we need.
Proposition 1
Let $X, Y, Z, W$ be random variables.
1. If $X$ takes values in a finite set $\Sigma$, then $H(X) \le \log |\Sigma|$.
2. $H(X) \ge H(X \mid Y) \ge 0$ and $I(X; Y) = H(X) - H(X \mid Y) \ge 0$.
3. If $Z$ is independent of $(X, Y)$, then $I(X; Y \mid Z) = I(X; Y)$. Similarly, if $Z$ is independent of $(X, Y)$ given $W$, then $I(X; Y \mid Z, W) = I(X; Y \mid W)$.
4. (Chain rule of mutual information) $I(X, Y; Z) = I(X; Z) + I(Y; Z \mid X)$. More generally, for any random variables $X_1, X_2, \ldots, X_n$ and $Y$, $I(X_1, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_1, \ldots, X_{i-1})$. The same holds with all terms conditioned on an additional random variable $Z$: $I(X_1, \ldots, X_n; Y \mid Z) = \sum_{i=1}^{n} I(X_i; Y \mid X_1, \ldots, X_{i-1}, Z)$.
5. (Data processing inequality) If $X$ and $Z$ are conditionally independent given $Y$, then $I(X; Z) \le I(X; Y)$ and $I(X; Z) \le I(Y; Z)$.
6. (Fano's inequality) Let $X$ be a random variable chosen from a domain $\mathcal{X}$ according to distribution $\mu_X$, and let $Y$ be a random variable chosen from a domain $\mathcal{Y}$ according to distribution $\mu_Y$. For any reconstruction function $g : \mathcal{Y} \to \mathcal{X}$ with error $\delta_g = \Pr[g(Y) \neq X]$, we have $H_b(\delta_g) + \delta_g \log(|\mathcal{X}| - 1) \ge H(X \mid Y)$.
7. (The Maximum Likelihood Estimation principle) With the notation as in Fano's inequality, if the (deterministic) reconstruction function is $g(y) = x$ for the $x$ that maximizes the conditional probability $\Pr[X = x \mid Y = y]$, then $\delta_g \le 1 - 2^{-H(X \mid Y)}$. Call this $g$ the maximum likelihood function.
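As a quick numerical illustration of these quantities (our own example, not from the paper), the following Python snippet computes entropies and mutual information for a small joint distribution and checks the chain rule $I(X, Y; Z) = I(X; Z) + I(Y; Z \mid X)$.

```python
from math import log2

# A hypothetical joint distribution p(x, y, z) over binary variables.
p = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.15,
    (1, 0, 0): 0.05, (1, 0, 1): 0.20,
    (1, 1, 0): 0.15, (1, 1, 1): 0.10,
}

def marginal(p, axes):
    """Marginal distribution over the given tuple of axes, e.g. (0, 2)."""
    q = {}
    for outcome, prob in p.items():
        key = tuple(outcome[a] for a in axes)
        q[key] = q.get(key, 0.0) + prob
    return q

def entropy(q):
    return -sum(prob * log2(prob) for prob in q.values() if prob > 0)

def mutual_information(p, a_axes, b_axes):
    """I(A; B) = H(A) + H(B) - H(A, B)."""
    return (entropy(marginal(p, a_axes)) + entropy(marginal(p, b_axes))
            - entropy(marginal(p, a_axes + b_axes)))

def conditional_mi(p, a_axes, b_axes, c_axes):
    """I(A; B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)."""
    return (entropy(marginal(p, a_axes + c_axes)) + entropy(marginal(p, b_axes + c_axes))
            - entropy(marginal(p, a_axes + b_axes + c_axes)) - entropy(marginal(p, c_axes)))

# Chain rule: I(X, Y; Z) = I(X; Z) + I(Y; Z | X).  Axes: X = 0, Y = 1, Z = 2.
lhs = mutual_information(p, (0, 1), (2,))
rhs = mutual_information(p, (0,), (2,)) + conditional_mi(p, (1,), (2,), (0,))
print(lhs, rhs)  # the two values agree up to floating-point error
```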
Communication complexity
In the two-party randomized communication complexity model (see, e.g., [43]), we have two players, Alice and Bob. Alice is given $x \in \mathcal{X}$ and Bob is given $y \in \mathcal{Y}$, and they want to jointly compute a function $f(x, y)$ by exchanging messages according to a protocol $\Pi$. Let $\Pi(x, y)$ denote the message transcript when Alice and Bob run protocol $\Pi$ on the input pair $(x, y)$. We sometimes abuse notation by identifying the protocol with the corresponding random transcript, as long as there is no confusion.
The communication complexity of a protocol is defined as the maximum number of bits exchanged, over all pairs of inputs. We say a protocol $\Pi$ computes $f$ with error probability $\delta$ if there exists a function $g$ such that for all input pairs $(x, y)$, $\Pr[g(\Pi(x, y)) \neq f(x, y)] \le \delta$. The $\delta$-error randomized communication complexity of $f$, denoted by $R^{\delta}(f)$, is the cost of the minimum-communication randomized protocol that computes $f$ with error probability $\delta$. The $(\mu, \delta)$-distributional communication complexity of $f$, denoted by $D_{\mu}^{\delta}(f)$, is the cost of the minimum-communication deterministic protocol that gives the correct answer for $f$ on at least a $(1 - \delta)$ fraction of all input pairs, weighted by the distribution $\mu$. Yao [53] showed the following.
Lemma 1 (Yao’s Lemma)
For every distribution $\mu$ and every $\delta > 0$, $D_{\mu}^{\delta}(f) \le R^{\delta}(f)$.
Thus, one way to prove a lower bound for randomized protocols is to find a hard distribution $\mu$ and to lower bound $D_{\mu}^{\delta}(f)$. This is called Yao's Minimax Principle.
We will use the notion of expected distributional communication complexity $ED_{\mu}^{\delta}(f)$, which was introduced in [47] (where it was written as $D_{\mu}^{\delta}(f)$, with a slight abuse of notation) and is defined to be the expected cost (rather than the worst-case cost) of the best deterministic protocol that gives the correct answer for $f$ on at least a $(1 - \delta)$ fraction of all inputs, where the expectation is taken over the distribution $\mu$.
The definitions for two-party protocols extend easily to the multiparty setting, where we have $k$ players and the $i$-th player is given an input $x_i$. Again the players want to jointly compute a function $f(x_1, \ldots, x_k)$ by exchanging messages according to a protocol $\Pi$.
Information complexity
Information complexity was introduced in a series of papers including [17, 7]. We refer the reader to Bar-Yossef’s Thesis [6]; see Chapter 6 for a detailed introduction. Here we briefly review the concepts of information cost and conditional information cost for -player communication problems. All of them are defined in the blackboard number-in-hand model.
Let $\mu$ be an input distribution on $\mathcal{X}_1 \times \cdots \times \mathcal{X}_k$ and let $X = (X_1, \ldots, X_k)$ be a random input chosen from $\mu$. Let $\Pi$ be a randomized protocol running on inputs in $\mathcal{X}_1 \times \cdots \times \mathcal{X}_k$. The information cost of $\Pi$ with respect to $\mu$ is defined as $I(X; \Pi(X))$. The information complexity of a problem $f$ with respect to a distribution $\mu$ and error parameter $\delta$, denoted $\mathrm{IC}_{\mu, \delta}(f)$, is the minimum information cost of a $\delta$-error protocol for $f$ with respect to $\mu$. We will work in the public coin model, in which all parties also share a common source of randomness.
We say a distribution $\nu$ partitions $\mu$ if, conditioned on $\nu$, $\mu$ is a product distribution. Let $X$ be a random input chosen from $\mu$ and let $D$ be a random variable chosen from $\nu$. For a randomized protocol $\Pi$, the conditional information cost of $\Pi$ with respect to the distribution $\mu$ and a distribution $\nu$ partitioning $\mu$ is defined as $I(X; \Pi(X) \mid D)$. The conditional information complexity of a problem $f$ with respect to a distribution $\mu$, a distribution $\nu$ partitioning $\mu$, and error parameter $\delta$, denoted $\mathrm{CIC}_{\mu, \nu, \delta}(f)$, is the minimum conditional information cost of a $\delta$-error protocol for $f$ with respect to $\mu$ and $\nu$. The following proposition can be found in [7].
Proposition 2
For any distribution $\mu$, any distribution $\nu$ partitioning $\mu$, and any error parameter $\delta > 0$, we have $R^{\delta}(f) \ge \mathrm{CIC}_{\mu, \nu, \delta}(f)$.
Statistical distance measures
Given two probability distributions and over the same space , the following statistical distance measures will be used in this paper:
1. Total variation distance: $V(P, Q) = \max_{A \subseteq \Omega} |P(A) - Q(A)| = \frac{1}{2} \sum_{x \in \Omega} |P(x) - Q(x)|$.
2. Hellinger distance: $h(P, Q) = \sqrt{\frac{1}{2} \sum_{x \in \Omega} \left(\sqrt{P(x)} - \sqrt{Q(x)}\right)^2}$.
We have the following relation between the total variation distance and the Hellinger distance (cf. [6]).
Proposition 3
$h^2(P, Q) \le V(P, Q) \le \sqrt{2} \cdot h(P, Q)$.
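A small self-contained Python check of these definitions and of the relation in Proposition 3 (our own illustration; the two distributions below are arbitrary):

```python
from math import sqrt

# Two hypothetical distributions over the same finite space.
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.4, 0.4]

def total_variation(P, Q):
    """V(P, Q) = (1/2) * sum_x |P(x) - Q(x)|."""
    return 0.5 * sum(abs(p - q) for p, q in zip(P, Q))

def hellinger(P, Q):
    """h(P, Q) = sqrt( (1/2) * sum_x (sqrt(P(x)) - sqrt(Q(x)))^2 )."""
    return sqrt(0.5 * sum((sqrt(p) - sqrt(q)) ** 2 for p, q in zip(P, Q)))

V = total_variation(P, Q)
h = hellinger(P, Q)
print(V, h)
# Relation from Proposition 3: h^2 <= V <= sqrt(2) * h.
assert h ** 2 <= V <= sqrt(2) * h
```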
The total variation distance of transcripts on a pair of inputs is closely related to the error of a randomized protocol. The following proposition can be found in [6], Proposition 6.22 (the original proposition is for the 2-party case, and generalizing it to the multiparty case is straightforward).
Proposition 4
Let $\delta > 0$, and let $\Pi$ be a $\delta$-error randomized protocol for a function $f$. Then, for every two inputs $x$ and $y$ for which $f(x) \neq f(y)$, it holds that $V(\Pi(x), \Pi(y)) \ge 1 - 2\delta$.
Conventions.
In the rest of the paper we refer to a player as a site, so as to be consistent with the distributed functional monitoring model. We denote $[n] = \{1, 2, \ldots, n\}$. Let $\oplus$ be the XOR function. All logarithms are base $2$ unless noted otherwise. We say $\tilde{X}$ is a $(1+\varepsilon)$-approximation of $X$ ($\varepsilon > 0$) if $(1 - \varepsilon) X \le \tilde{X} \le (1 + \varepsilon) X$.
3 A Lower Bound for $F_0$
We introduce a problem called $k$-APPROX-SUM, and then compose it with $2$-DISJ (studied, e.g., in [48]) to prove a lower bound for $F_0$. In this section we work in the message-passing model.
3.1 The $k$-APPROX-SUM Problem
In the $k$-APPROX-SUM$_{f,\tau}$ problem, we have $k$ sites and the coordinator. Let $f$ be an arbitrary function, and let $\tau$ be an arbitrary distribution on pairs of inputs such that for $(X, Y) \sim \tau$, $f(X, Y) = 1$ with probability $\beta$ and $f(X, Y) = 0$ with probability $1 - \beta$, where $\beta$ is a parameter; we assume throughout that $\beta k$ is at least a sufficiently large constant. We define the input distribution $\mu$ for $k$-APPROX-SUM$_{f,\tau}$ as follows: we first sample $X \sim \tau$, and then independently sample $Y_1, \ldots, Y_k$ conditioned on $X$. Note that each pair $(X, Y_i)$ is distributed according to $\tau$. Let $Z_i = f(X, Y_i)$. Thus the $Z_i$'s are i.i.d. Bernoulli($\beta$) random variables. Let $Z = \sum_{i \in [k]} Z_i$. We assign $Y_i$ to site $i$ for each $i \in [k]$, and assign $X$ to the coordinator.
In the $k$-APPROX-SUM$_{f,\tau}$ problem, the sites want to approximate $Z$ up to an additive error of $\sqrt{\beta k}$. In the rest of this section, for convenience, we omit the subscripts in $k$-APPROX-SUM$_{f,\tau}$, since our results hold for all $f$ and $\tau$ having the properties mentioned above.
For a fixed transcript , let . Thus . Let be a sufficiently large constant.
Definition 1
Given an input and a transcript , let and . For convenience, we define . We say
-
1.
is bad1 for (denoted by ) if , and for at least fraction of , it holds that , and
-
2.
is bad0 for (denoted by ) if , and for at least fraction of , it holds that .
And is good for otherwise.
In this section, we will prove the following theorem. Unless stated otherwise, all probabilities, expectations, and variances are taken with respect to the input distribution $\mu$.
Theorem 1
Let be the transcript of any deterministic protocol for -APPROX-SUM on input distribution with error probability for some sufficiently small constant , then .
The following observation, which easily follows from the rectangle property of communication protocols, is crucial to our proof. We have included a proof in Appendix A.
Observation 1
Conditioned on , are independent.
Definition 2
We say a transcript is rare+ if and rare- if . In both cases we say is rare. Otherwise we say it is normal.
Definition 3
We say is a joker+ if , and a joker- if . In both cases we say is a joker.
Lemma 2
Under the assumption of Theorem 1, .
-
Proof:
First, we can apply a Chernoff bound on random variables , and get
Second, by Observation 1, we can apply a Chernoff bound on random variables conditioned on being rare+,
Finally by Bayes’ theorem, we have that
Similarly, we can also show that . Therefore (recall that by our assumption for a sufficiently large constant ).
Definition 4
Let . We say a transcript is weak if , and strong otherwise.
Lemma 3
Under the assumption of Theorem 1, .
-
Proof:
We first show that for a normal and weak transcript , there exists a constant such that
(1) (2) The first inequality is a simple application of Chernoff-Hoeffding bound. Recall that for a normal , . We have
Now we prove the second inequality. We will need the following anti-concentration result, which is an easy consequence of Feller [26] (cf. [46]).
Fact 1
([46]) Let be a sum of independent random variables, each attaining values in , and let . Then for all , we have
for a universal constant .
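As a sanity check of this type of anti-concentration statement (a numerical illustration only; the exact statement and constants of Fact 1 are those given in [46]), the following Python snippet estimates the probability that a sum of independent Bernoulli variables exceeds its mean by one standard deviation, which stays bounded away from zero:

```python
import random

def exceed_probability(n=1000, trials=5000, p=0.5):
    """Estimate Pr[X >= E[X] + sqrt(Var[X])] for X a sum of n Bernoulli(p) variables."""
    mean = n * p
    std = (n * p * (1 - p)) ** 0.5
    hits = 0
    for _ in range(trials):
        x = sum(1 for _ in range(n) if random.random() < p)
        if x >= mean + std:
            hits += 1
    return hits / trials

print(exceed_probability())  # roughly 0.15, i.e., bounded below by a universal constant
```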
For a normal and weak , it holds that
Recall that by our assumption, for a sufficiently large constant , thus and . Using Fact 1, we have for a universal constant ,
By (1) and (2), it is easy to see that given that is normal, it cannot be weak with probability more than , since otherwise, by Lemma 2 and the analysis above, the error probability of the protocol would be at least , for an arbitrarily small constant error , violating the success guarantee assumed in Theorem 1. Therefore,
Now we analyze the probability of being good. For a , let and . We have the following two lemmas.
Lemma 4
Under the assumption of Theorem 1, .
-
Proof:
Consider any . First, by the definition of a normal , we have . Therefore the number of ’s such that and is at most . Second, by the definition of a strong , we have . Therefore the number of ’s such that and is at most . Also note that if is not joker, then . Thus conditioned on a normal and strong , as well as is not a joker, the number of ’s such that and is at least
where we have used our assumption that for a sufficiently large constant . We conclude that
Lemma 5
Under the assumption of Theorem 1, .
-
Proof:
We say is bad1 for a set (denoted by ) if for more than a fraction of , we have . Let if this holds and otherwise. We have
(4) The last inequality holds since in the last term of (4) we count, for each possible set of the given size, the probability that its elements are all , which upper bounds the corresponding summation in (4). Now for a fixed , conditioned on a normal , we consider the term
(5) W.l.o.g., we can assume that for an . We consider a pair . Terms in the summation (5) that includes either or can be written as
By the symmetry of , the sets and are the same. Using this fact and the AM-GM inequality, it is easy to see that the sum will not decrease if we set . Call such an operation an equalization. We repeatedly apply such equalizations to any pair , with the constraint that if and , then we only "average" them to the extent that if , and otherwise. We introduce this constraint because we do not want to change , since otherwise a set which was originally could become after these equalizations. We cannot apply further equalizations when one of the following happens.
(6) (7) We note that actually (7) cannot happen since is preserved during equalizations, and conditioned on a normal , we have .
Let . For a normal , it holds that . Let . Recall that , and we have set . We try to upper bound (5) using (6).
(9) In the bound above, the first term is the number of possible choices of the set with a fraction of items in , and the rest in , and the second term upper bounds according to the discussion above. Here we have assumed , since otherwise if , then , which is smaller than (9). Now, (4) can be upper bounded by
3.2 The $2$-DISJ Problem
In the $2$-DISJ problem, Alice has a set and Bob has a set . Their goal is to output if , and otherwise.
We define the input distribution as follows. Let . With probability , and are random subsets of such that and . And with probability , and are random subsets of such that and . Razborov [48] proved that for , . It is easy to extend this result to general and the average-case complexity.
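For concreteness, here is a small Python sampler in the spirit of Razborov's hard distribution for two-party disjointness (a simplified sketch under assumed parameters: the standard form uses a universe of size $4\ell - 1$, sets of size $\ell$, and an intersection of exactly one element with probability $1/4$; the actual distribution in [48], and the generalization used in this paper, are defined via a random partition of the universe and an adjustable intersection probability):

```python
import random

def sample_disjointness_instance(ell):
    """Sample (S, T) over the universe [4*ell - 1]: with probability 3/4 the
    sets are disjoint, and with probability 1/4 they intersect in exactly
    one element. Both sets have size ell."""
    universe = list(range(4 * ell - 1))
    random.shuffle(universe)
    intersecting = random.random() < 0.25
    if intersecting:
        common = universe[0]
        rest = universe[1:]
        S = {common} | set(rest[: ell - 1])
        T = {common} | set(rest[ell - 1 : 2 * (ell - 1)])
    else:
        S = set(universe[:ell])
        T = set(universe[ell : 2 * ell])
    return S, T, intersecting

S, T, intersecting = sample_disjointness_instance(ell=8)
print(len(S & T), intersecting)  # intersection size is 1 iff intersecting
```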
Theorem 2 ([47], Lemma 2.2)
For any , it holds that , where the expectation is taken over the input distribution .
In the rest of the section, we simply write as .
3.3 The Complexity of $F_0$
3.3.1 Connecting $F_0$ and $k$-APPROX-SUM
Set , , where is the small constant error parameter for -APPROX-SUM in Theorem 1.
We choose to be -DISJ with universe size , set its input distribution to be , and work on -APPROX-SUM. Let be the input distribution of -APPROX-SUM, which is a function of (see Section 3.1 for the detailed construction of from ). Let . Let . Let be the induced distribution of on which we choose to be the input distribution for . In the rest of this section, for convenience, we will omit the subscripts -DISJ and in -APPROX-SUM when there is no confusion.
Let . Let . The following lemma shows that will concentrate around its expectation , which can be calculated exactly.
Lemma 6
With probability at least , we have , where for some fixed constant .
-
Proof:
We can think of our problem as a balls-and-bins game: think of each pair such that -DISJ evaluates to as a ball (thus we have balls), and of the elements in the set as bins. Let . We throw each of the balls into one of the bins uniformly at random. Our goal is to estimate the number of non-empty bins at the end of the process.
By a Chernoff bound, with probability , . By Fact and Lemma in [41], we have and . Thus by Chebyshev’s inequality we have
Let . We can write
This series converges and thus we can write for some fixed constant .
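To make the balls-and-bins estimate concrete (our own numerical illustration; the values of m and B below are arbitrary), the expected number of non-empty bins after throwing m balls into B bins uniformly at random is B(1 - (1 - 1/B)^m), which a quick simulation confirms:

```python
import random

def simulate_nonempty_bins(m, B, trials=2000):
    """Average number of non-empty bins after throwing m balls into B bins."""
    total = 0
    for _ in range(trials):
        bins = set(random.randrange(B) for _ in range(m))
        total += len(bins)
    return total / trials

m, B = 500, 300
expected = B * (1 - (1 - 1 / B) ** m)
print(expected, simulate_nonempty_bins(m, B))  # the two values are close
```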
The next lemma shows that we can use a protocol for to solve -APPROX-SUM with good properties.
Lemma 7
Any protocol that computes a -approximation to (for a sufficiently small constant ) on input distribution with error probability can be used to compute -APPROX-SUM on input distribution with error probability .
-
Proof:
Given an input for -APPROX-SUM, the sites and the coordinator use to compute , which is a -approximation to , and then determine the answer to -APPROX-SUM to be
Recall that is some fixed constant, and .
Correctness.
Given a random input , the exact value of can be written as the sum of two components.
(10) |
where counts , and counts . First, from our construction it is easy to see by a Chernoff bound and the union bound that with probability , we have , since each element in will be chosen by every with a probability at least . Second, by Lemma 6 we know that with probability , is within from its mean for some fixed constant . Thus with probability , we can write Equation (10) as
(11) |
for a value and .
Set . Since computes a value which is a -approximation of , we can substitute with in Equation (11), resulting in the following.
(12) |
where , , and
Now we have
where . Therefore approximates correctly up to an additive error , thus computes -APPROX-SUM correctly. The total error probability of this simulation is at most , where the first term counts the error probability of and the second term counts the error probability introduced by the reduction. This is less than if we choose .
3.3.2 An Embedding Argument
Lemma 8
Suppose that there exists a deterministic protocol which computes -approximate (for a sufficiently small constant ) on input distribution with error probability (for a sufficiently small constant ) and communication , then there exists a deterministic protocol that computes -DISJ on input distribution with error probability and expected communication complexity , where the expectation is taken over the input distribution .
-
Proof:
In $2$-DISJ, Alice holds and Bob holds such that . We show that Alice and Bob can use the deterministic protocol to construct a deterministic protocol for $2$-DISJ with the desired error probability and communication complexity.
Alice and Bob first use to construct a protocol . During the construction they will use public and private randomness which will be fixed at the end. consists of two phases.
Input reduction phase. Alice and Bob construct an input for using and as follows: they pick a random site using public randomness. Alice assigns the input , and Bob constructs inputs for the remaining sites using . For each , Bob samples an according to using independent private randomness and assigns it to . Let . Note that and .
Simulation phase. Alice simulates and Bob simulates the remaining sites, and they run protocol on to compute up to a -approximation for a sufficiently small constant and error probability . Let be the protocol transcript, and let be the output. By Lemma 7, we can use to compute -APPROX-SUM with error probability . Then, by Theorem 1, for a fraction of over the input distribution and , it holds that for a fraction of , , and a fraction of , . Now outputs if , and otherwise. Since is chosen randomly among the sites, and the inputs for the sites are identically distributed, computes on input distribution correctly with probability .
We now describe the final protocol : Alice and Bob repeat independently for times for a large enough constant . At the -th repetition, in the input reduction phase, they choose a random permutation of using public randomness, and apply it to each element in before assigning them to the sites. After running for times, outputs the majority of the outcomes.
Since is fixed at each repetition, the inputs at each repetition have a small dependence, but conditioned on , they are all independent. Let be the input distribution of conditioned on . Let be the induced distribution of on . The success probability of a run of on is at least , where is the total variation distance between the distributions , which is at most
and can be bounded by (see, e.g., Fact 2.4 of [30]). Since conditioned on , the inputs at each repetition are independent, and the success probability of each run of is at least , by a Chernoff bound over the repetitions for a sufficiently large , we conclude that succeeds with error probability .
We next consider the communication complexity. At each run of , let be the expected communication cost between the site and the remaining players (more precisely, between and the coordinator, since in the coordinator model all sites only talk to the coordinator, whose initial input is ), where the expectation is taken over the input distribution and the choice of the random . Since conditioned on , all are independent and identically distributed, if we take a random site , the expected communication between and the coordinator is equal to the total communication divided by a factor of . Thus we have . Finally, by the linearity of expectation, the expected total communication cost of the runs of is .
At the end we fix all the randomness used in construction of protocol . We first use two Markov inequalities to fix all public randomness such that succeeds with error probability , and the expected total communication cost of the , where both the error probability and the cost expectation are taken over the input distribution and Bob’s private randomness. We next use another two Markov inequalities to fix Bob’s private randomness such that succeeds with error probability , and the expected total communication cost of the , where both the error probability and the cost expectation are taken over the input distribution .
The following theorem is a direct consequence of Lemma 8, Theorem 2 for -DISJ and Lemma 1 (Yao’s Lemma). Recall that we set and . In the definition for -APPROX-SUM we need for a sufficiently large constant , thus we require for a sufficiently large constant .
Theorem 3
Assume that for a sufficiently large constant . Then any randomized protocol that computes a -approximation to with error probability (for a sufficiently small constant ) has communication complexity .
4 A Lower Bound for $F_p$
We first introduce a problem called $k$-XOR, which can be considered to some extent as a combination of two multiparty disjointness (introduced in [3, 7]) instances, and then compose it with $2$-GAP-ORT (introduced in [49]) to create another problem that we call the $k$-BLOCK-THRESH-XOR ($k$-BTX) problem. We prove that the communication complexity of $k$-BTX is large. Finally, we prove a communication complexity lower bound for $F_p$ by performing a reduction from $k$-BTX. In this section we work in the blackboard model.
4.1 The $2$-GAP-ORT Problem
In the -GAP-ORT problem we have two players Alice and Bob. Alice has a vector and Bob has a vector . They want to compute
Let be the uniform distribution on the input pairs and let be a random input chosen from this distribution. The following theorem was recently obtained by Chakrabarti et al. [15]. (In our original conference paper, which was published before [15], we proved the theorem for protocols with bounded communication, using a simple argument based on [49] and Theorem 1.3 of [8].)
Theorem 4 ([15])
Let be the transcript of any randomized protocol for -GAP-ORT on input distribution with error probability , for a sufficiently small constant . Then, .
4.2 The $k$-XOR Problem
In the $k$-XOR problem we have $k$ sites. Each site holds a block of bits. Let be the inputs of the sites. Let be the sites' inputs on the -th coordinate. W.l.o.g., we assume $k$ is a power of $2$. The sites want to compute the following function.
We define the input distribution for the $k$-XOR problem as follows. For each coordinate there is a variable chosen uniformly at random from . Conditioned on , all sites but the -th set their inputs to , whereas the -th site sets its input to or with equal probability. We call the -th site the special site in the -th coordinate. Let denote this input distribution on one coordinate.
Next, we choose a random special coordinate and replace the sites’ inputs on the -th coordinate as follows: For the first sites, with probability we replace all sites’ inputs with , and with probability we replace all sites’ inputs with ; and we independently perform the same operation to the second sites. Let denote the distribution on this special coordinate. And let denote the input distribution that on the special coordinate is distributed as and on each of the remaining coordinates is distributed as .
Let be the corresponding random variables of when the input of -XOR is chosen according to the distribution . Let .
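A minimal Python sampler for this input distribution (our own illustration; the variable names and the block length are hypothetical, and "first half"/"second half" refer to the two groups of k/2 sites):

```python
import random

def sample_k_xor_input(k, n):
    """Sample a k x n input matrix for the k-XOR distribution described above."""
    X = [[0] * n for _ in range(k)]
    special = random.randrange(n)            # the special coordinate
    for j in range(n):
        if j == special:
            # Independently, the first k/2 sites all get 0 or all get 1,
            # and likewise for the second k/2 sites (so 00/10/01/11 each w.p. 1/4).
            first_bit = random.randrange(2)
            second_bit = random.randrange(2)
            for i in range(k // 2):
                X[i][j] = first_bit
            for i in range(k // 2, k):
                X[i][j] = second_bit
        else:
            # One uniformly random site owns the coordinate and flips a fair coin;
            # all other sites have 0 there.
            owner = random.randrange(k)
            X[owner][j] = random.randrange(2)
    return X, special

X, special = sample_k_xor_input(k=8, n=5)
print(special, X)
```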
4.3 The -GUESS Problem
The -GUESS problem can be seen as an augmentation of the -XOR problem. The sites are still given an input , as that in the -XOR problem. In addition, we introduce another player called the predictor. The predictor will be given an input , but it cannot talk to any of the sites (that is, it cannot write anything to the blackboard). After the sites finish the whole communication, the predictor computes the final output , where is the transcript of the sites’ communication on their input , and is the (deterministic) maximum likelihood function (see Proposition 1). In this section when we talk about protocol transcripts, we always mean the concatenation of the messages exchanged by the sites, but excluding the output of the predictor.
In the -GUESS problem, the goal is for the predictor to output , where if the inputs of the first sites in the special coordinate are all and otherwise, and if the inputs of the second sites in the special coordinate are all and otherwise. We say the instance is a -instance if , a -instance if and , a -instance if and , and a -instance if . Let be the type of an instance.
We define the following input distribution for -GUESS: We assign an input to the sites, and to the predictor, where are those used to construct the in the distribution . Slightly abusing notation, we also use to denote the joint distribution of and . We do the same for the one coordinate distributions and . That is, we also use (or ) to denote the joint distribution of and for a single coordinate .
Theorem 5
Let be the transcript of any randomized protocol for -GUESS on input distribution with success probability . Then we have , where the information is measured with respect to the input distribution . (When we say that the information is measured with respect to a distribution, we mean that the inputs to the protocol are distributed according to that distribution when computing the mutual information; note that the randomness used by the protocol also enters the mutual information.)
-
Proof:
By a Markov inequality, we know that for of , the protocol succeeds with probability conditioned on . Call an for which this holds eligible. Let . We say an is good if is both eligible and . Thus there are good . Let denote the random variable with -th component missing. We say a is nice for a good if the protocol succeeds with probability conditioned on and . By another Markov inequality, it holds that at least an fraction of is nice for a good .
Now we consider . Note that if we can show that , then it follows that , since , and . By the chain rule, expanding the conditioning, and letting be the inputs to the sites on the first coordinates, we have
(17) (19) where the step from (17) to (19) holds because is independent of given the other conditioning, and we apply Proposition 1.
Now let’s focus on a good and a nice for . We define a protocol which on input , attempts to output , where if and otherwise, and if and otherwise. Here are inputs of the sites and is the input of the predictor. The protocol has hardwired into it, and works as follows. First, the sites construct an input for -GUESS distributed according to , using and their private randomness, without any communication: They set the input on the -th coordinate to be , and use their private randomness to sample the inputs for coordinates using the value and the fact that the inputs to the sites are independent conditioned on . The predictor sets its input to be . Next, the sites run on their input . Finally, the predictor outputs .
Let . Let . Let and be the distributions and of after embedding to the -th coordinate conditioned on (recall that for a good , thus ), respectively. Since is good and is nice for , it follows that
where the probability is taken over , and is the total variation distance between distributions and , which can be bounded by . The proof will be given shortly, and here is where we use that .
Hence, for a good , and for a nice for , we have
(20) where is the protocol that minimizes the information cost when the information (on the right side of (20)) is measured with respect to the marginal distribution of on a good coordinate , and succeeds in outputting with probability when the sites and the predictor get input . The information on the left side of (20) is measured with respect to .
Combining (19) and (20), and given that we have good , as well as at least an fraction of that are nice for any good , we have
(21) Now we analyze . First note that we can just focus on coordinates , since the distributions and are the same on coordinates . Let and be the distribution of and on coordinates , respectively. Observe that can be thought of as a binomial distribution: for each coordinate , we set randomly to be or with equal probability. The remaining are all set to be . Moreover, can be generated in the following way: we first sample according to , and then randomly choose a coordinate and reset . Since is random, the total variation distance between and is the total variation distance between Binomial and Binomial (that is, by symmetry, only the number of 's in matters), which is at most (see, e.g., Fact 2.4 of [30]).
Let be the event that all sites have the value in the -th coordinate when the inputs are drawn from . Observe that , thus
where the information on the left hand side is measured with respect to inputs drawn from , and the information on the right hand side is measured with respect to inputs drawn from the marginal distribution of on a good coordinate , which is equivalent to since . By the third item of Proposition 1, and using that are independent of given and , we obtain
Finally, since are independent of and , it holds that , where the information is measured with respect to the input distribution , and is a protocol which succeeds with probability on .
It remains to show that , where the information is measured with respect to , and the correctness is measured with respect to . Let be the all- vector, be the all- vector and be the standard basis vector with the -th coordinate being . By the relationship between mutual information and Hellinger distance (see Proposition 2.51 and Proposition 2.53 of [6]), we have
where is the Hellinger distance (see Section 2 for a definition). Now we assume and are powers of , and we use Theorem of [39], which says that the following three statements hold:
-
1.
.
-
2.
.
-
3.
.
It follows that
By the Cauchy-Schwarz inequality we have,
We can rewrite this as (by changing the constant in the ):
By the triangle inequality of the Hellinger distance, we get
-
1.
,
-
2.
,
-
3.
.
Thus we have
The claim is that at least one of the terms on the RHS of the inequality above is , and this will complete the proof. By Proposition 3, this is true if the total variation distance is large for one such pair . We show that there must be such a pair , for the following reasons.
For a , let if . First, there must exist a pair such that
(22) since otherwise, if for all , then by Proposition 4, for any pair with , and a chosen from uniformly at random, it holds that , where the probability is taken over the distribution of . Consequently, for a chosen from uniformly at random, it holds that , violating the protocol’s success probability guarantee. Second, since is a deterministic function, and is independent of when , we have
(23)
4.4 The $k$-BTX Problem
The input of the -BTX problem is a concatenation of copies of inputs of the -XOR problem. That is, each site holds an input consisting of blocks each of which is an input for a site in the -XOR problem. More precisely, each holds an input where is a vector of bits. Let be the list of inputs to the sites in the -th block. Let be the list of inputs to the sites. In the -BTX problem the sites want to compute the following.
We define the input distribution for the -BTX problem as follows: The input of the sites in each block is chosen independently according to the input distribution , which is defined for the -XOR problem. Let be the corresponding random variables of when the input of -BTX is chosen according to the distribution . Let where is the special site in the -th coordinate of block , and let . Let where is the special coordinate in block . Let where is the type of the -XOR instance in block .
For each block , let if the inputs of the first sites in the special coordinate are all and otherwise; and similarly let if the inputs of the second sites in the coordinate are all and otherwise. Let and .
Linking -BTX to -GAP-ORT.
We show that Alice and Bob, who are given , can construct a -player protocol for -GAP-ORT using a protocol for -BTX.
They first construct an input for -BTX using . Alice simulates the first players, and Bob simulates the second players. Alice and Bob use the public randomness to generate and for each . For each , Alice sets the -th coordinate of each of the first players to . Similarly, Bob sets the -th coordinate of each of the last players to . Alice and Bob then use private randomness and the vectors to fill in the remaining coordinates. Observe that the resulting input (for -BTX) is distributed according to .
Alice and Bob then run the protocol on . Every time a message is sent between any two of the players in , it is appended to the transcript. That is, if the two players are among the first , Alice still forwards this message to Bob. If the two players are among the last , Bob still forwards this message to Alice. If the message is between a player in the first group and the second group, Alice and Bob exchange a message. The output of is equal to that of .
Theorem 6
Let be the transcript of any randomized protocol for -BTX on input distribution with error probability for a sufficiently small constant . Then , where the information is measured with respect to the uniform distribution on .
-
Proof:
By a Markov inequality, we have that for at least a fraction of choices of , the -party protocol computes -BTX with error probability at most . Call such a pair good. According to our reduction, we have that the transcript of is equal to the transcript of and the output of is the same as that of . Hence, for a good pair , the -party protocol computes -GAP-ORT with error probability at most on distribution . We have
Now we are ready to prove our main theorem for -BTX.
Theorem 7
Let be the transcript of any randomized protocol for -BTX on input distribution with error probability for a sufficiently small constant . We have , where the information is measured with respect to the input distribution .
-
Proof:
By Theorem 6 we have . Using the chain rule and a Markov inequality, it holds that
for at least of , where and similarly for . We call an for which this holds good.
Now we consider a good , and show that
Since determines given , and is independent of given and , by item of Proposition 1, it suffices to prove that . By expanding the conditioning, we can write as
By the definition of a good , we know by a Markov bound that with probability over the choice of , we have
Call these for which this holds good for .
Note that , since are independent of . Therefore, for a good and a tuple that is good for , we have
By the Maximum Likelihood Principle in Proposition 1, the maximum likelihood function computes from the transcript of and , with error probability , over and the randomness of , satisfying
(26) Now for a good , and a tuple that is good for , we define a protocol which computes the -GUESS problem on input correctly with probability . Here are inputs of the sites and is the input of the predictor. The protocol has hardwired into it, and works as follows. First, the sites construct an input for the -BTX problem distributed according to , using , and their private randomness, without any communication: They set , and use their private randomness to sample inputs for blocks using the values and the fact that the inputs to the sites are independent conditioned on . The predictor sets its input to be . Next, the sites run on their input . Finally, the predictor outputs .
By Proposition 2, which says that the randomized communication complexity is always at least the conditional information cost, we have the following immediate corollary.
Corollary 1
Any randomized protocol that computes -BTX on input distribution with error probability for some sufficiently small constant has communication complexity .
4.5 The Complexity of $F_p$
The input of -approximate is chosen to be the same as -BTX by setting . That is, we choose randomly according to distribution . is the input vector for site consisting of blocks each having coordinates. We prove the lower bound for by performing a reduction from -BTX.
Lemma 9
If there exists a protocol that computes a -approximate for a sufficiently small constant on input distribution with communication complexity and error probability at most , then there exists a protocol for -BTX on input distribution with communication complexity and error probability at most , where is an arbitrarily small constant.
-
Proof:
We pick a random input from distribution . Each coordinate (column) of represents an item. Thus we have a total of possible items, which we identify with the set . If we view each input vector as a set, then each site has a subset of corresponding to these bits. Let be the exact value of . can be written as the sum of three components:
(27) where are random variables (it will be clear why we write it this way in what follows). The first term of the RHS of Equation (27) is the contribution of non-special coordinates across all blocks in each of which one site has . The second term is the contribution of the special coordinates across all blocks in each of which sites have . The third term is the contribution of the special coordinates across all blocks in each of which all sites have .
Note that -BTX is if and if . Our goal is to use a protocol for to construct a protocol for -BTX such that we can differentiate the two cases (i.e., or ) with good probability.
Given a random input , let be the exact -value on the first sites, and be the exact -value on the second sites. That is, and . We have
(28) By Equation (27) and (28) we can cancel out :
(29) Let , and be the estimated , and obtained by running on the sites’ inputs, the first sites’ inputs and the second sites’ inputs, respectively. Observe that and . By the randomized approximation guarantee of and the discussion above we have that with probability at least ,
(30) where .
Protocol .
Given an input for -BTX, protocol first uses to obtain the value described above, and then determines the answer to -BTX as follows:
Correctness.
Note that with probability at least , we have , where is a sufficiently small constant, and thus . Therefore, in this case protocol will always succeed.
Theorem 8
Any protocol that computes a -approximate on input distribution with error probability for some sufficiently small constant has communication complexity .
5 An Upper Bound for
We describe the following protocol to give a factor -approximation to at all points in time in the union of streams each held by a different site. Each site has a non-negative vector (we use instead of for the universe size only in this section), which evolves with time, and at all times the coordinator holds a -approximation to . Let be the length of the union of the streams. We assume , and that is a power of .
As observed in [20], up to a factor of in communication, the problem is equivalent to the threshold problem: given a threshold , with probability : when , the coordinator outputs , when , the coordinator outputs , and for , the coordinator can output either or . (To see the equivalence, by independent repetition we can assume the success probability of the protocol for the threshold problem is . Then we can run a protocol for each , and we are correct on all instantiations with probability at least .)
We can thus assume we are given a threshold in the following algorithm description. For notational convenience, define for an integer . A nice property of the algorithm is that it is one-way, namely, all communication is from the sites to the coordinator. We leave optimization of the factors in the communication complexity to future work.
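To make the reduction from continuous tracking to the threshold problem concrete, the following is a minimal sketch of the geometric-threshold wrapper described above (illustrative only; the handle run_threshold_protocol and its fired() interface are hypothetical placeholders and not part of Algorithms 1–4):

```python
import math

def track_via_thresholds(max_value, eps, run_threshold_protocol):
    """Reduce continuous tracking to O(log_{1+eps}(max_value)) threshold
    instances: one copy of the threshold protocol per threshold (1+eps)^i.

    run_threshold_protocol(tau) is a hypothetical handle to one copy of the
    threshold protocol with threshold tau; fired() becomes (and stays) True
    once that copy decides the running value exceeds its threshold.
    """
    num_instances = int(math.log(max_value, 1 + eps)) + 1
    instances = [run_threshold_protocol((1 + eps) ** i) for i in range(num_instances)]

    def current_estimate():
        # The coordinator's estimate is the largest threshold whose instance
        # has fired; amplifying each copy's success probability and taking a
        # union bound gives simultaneous correctness for all thresholds.
        est = 1.0
        for i, inst in enumerate(instances):
            if inst.fired():
                est = (1 + eps) ** i
        return est

    return current_estimate
```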
5.1 Our Protocol
The protocol consists of four algorithms illustrated in Algorithm 1 to Algorithm 4. Let at any point in time during the union of the streams. At times we will make the following assumptions on the algorithm parameters and : we assume is sufficiently small, and and are sufficiently large.
5.2 Communication Cost
Lemma 10
Consider any setting of for which we have . Then the expected total communication is bits.
Proof:
Fix any particular and . Let equal if and equal otherwise. Let be the vector with coordinates for . Also let . Observe that .
Because of non-negativity of the ,
Notice that a is sent by a site with probability at most and only if . Hence the expected number of messages sent for this and , over all randomness, is
(34) where we used that is maximized subject to and when all the are equal to . Summing over all and , it follows that the expected number of messages sent in total is . Since each message is bits, the expected number of bits is .
5.3 Correctness
We let be a sufficiently large constant.
5.3.1 Concentration of Individual Frequencies
We shall make use of the following standard multiplicative Chernoff bound.
Fact 2
Let be i.i.d. Bernoulli random variables. Then for all ,
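The exact statement is not reproduced in this rendering; one standard multiplicative form that suffices for the arguments below (an assumed standard statement, not a quotation of Fact 2) is

$$\Pr\left[\left|\sum_{i=1}^{n} X_i - \mu\right| \ge \delta \mu\right] \;\le\; 2\exp\!\left(-\frac{\delta^{2}\mu}{3}\right), \qquad \mu = \mathbb{E}\left[\sum_{i=1}^{n} X_i\right], \quad 0 < \delta < 1.$$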
Lemma 11
For a sufficiently large constant , with probability , for all , , , and all times in the union of the streams,
1. , and
2. if , then .
Proof:
Fix a particular time snapshot in the stream. Let . Then is a sum of indicator variables, where the number of indicator variables depends on the values of the . The indicator variables are independent, each with expectation .
First part of lemma. The number of indicator variables is at most , and the expectation of each is at most . Hence, the probability that or more of them equal is at most
This part of the lemma now follows by scaling the by to obtain a bound on the .
Second part of lemma. Suppose at this time . The number of indicator variables is minimized when there are distinct for which , and one value of for which
Hence,
If the expectation is , then , and using that establishes this part of the lemma. Otherwise, applying Fact 2 with and , and using that , we have
Scaling by , we have
and since ,
and finally using that , and union-bounding over a stream of length as well as all choices of and , the lemma follows.
5.3.2 Estimating Class Sizes
Define the classes as follows:
Say that contributes at a point in time in the union of the streams if
Since the number of non-zero is , we have
(35)
Lemma 12
With probability , at all points in time in the union of the streams and for all and , for at least a fraction of the ,
Proof:
The random variable is a sum of independent Bernoulli random variables. By a Markov bound, . Letting be an indicator variable which is iff , the lemma follows by applying Fact 2 to the , using that is large enough, and union-bounding over a stream of length and all and .
For a given , let be the value of for which we have , or if no such exists.
Lemma 13
With probability , at all points in time in the union of the streams and for all , for at least a fraction of the ,
1. and
2. if at this time contributes and , then
Proof:
We show this statement for a fixed and at a particular point in time in the union of the streams. The lemma will follow by a union bound.
The first part of the lemma follows from Lemma 12.
We now prove the second part. In this case . We can assume that there exists an for which . Indeed, otherwise and and the second part of the lemma follows.
Let , which is a sum of independent indicator random variables and so . Also,
(36) Since contributes, , and combining this with (36),
It follows that for sufficiently large, and assuming which happens with probability , we have , and so by Chebyshev’s inequality,
Since , and is large enough, the lemma follows by a Chernoff bound.
5.3.3 Combining Individual Frequency Estimation and Class Size Estimation
We define the set to be the set of times in the input stream for which the -value of the union of the streams first exceeds for an satisfying
Lemma 14
With probability , for all times in and all ,
1. , and
2. if at this time contributes and , then
Proof:
By Lemma 11, for any for which , if
(37)
then . Let us first verify that for , we have . We have
(38)
and so
where the final inequality follows for large enough and .
It remains to consider the case when (37) does not hold.
Conditioned on all other randomness, is uniformly random subject to , or equivalently,
If (37) does not hold, then either
Hence, the probability over that inequality (37) holds is at least
It follows by a Markov bound that
(39)
Now we must consider the case that there is a for which for an . There are two cases, namely, if or if .
We handle each case in turn.
Case: . Then by Lemma 11,
Therefore, it suffices to show that
from which we can conclude that . But by (38),
where the last inequality follows for large enough .
Hence, .
Case: .
We claim that . Indeed, by Lemma 11 we must have
This is equivalent to
If for , then
which is impossible. Also, if for , then
which is impossible. Hence, .
Let . Then
(40)
By (39) and applying a Markov bound to (40), together with a union bound, with probability ,
(41)
(42)
By Lemma 12,
(43)
First part of the lemma. By the first part of Lemma 13,
(44)
Combining (42), (43), and (44), we have with probability at least ,
Since this holds for at least different , it follows that
and the first part of the lemma follows by a union bound. Indeed, the number of is , which with probability , say, is since with this probability . Also, . Hence, the probability this holds for all and all times in is .
Second part of the lemma. By the second part of Lemma 13, if at this time contributes and , then
(45)
Combining (41), (42), (43), and (45), we have with probability at least ,
Since this holds for at least different , it follows that
and the second part of the lemma now follows by a union bound over all and all times in , exactly in the same way as the first part of the lemma. Note that for small enough .
5.3.4 Putting It All Together
Lemma 15
With probability at least , at all times the coordinator’s output is correct.
Proof:
The coordinator outputs up until the first point in time in the union of the streams for which . It suffices to show that
(46) at all times in the stream. We first show that with probability at least , for all times in ,
(47) and then use the structure of and the protocol to argue that (46) holds at all times in the stream.
Fix a particular time in . We condition on the event of Lemma 14, which, by setting small enough, we can assume occurs with probability at least .
First, suppose at this point in time we have . Then by Lemma 14, for sufficiently small , we have
and so the coordinator will correctly output , provided .
We now handle the case . Then for all contributing , we have
while for all , we have
Hence, using (35),
For the other direction,
Hence, (47) follows for all times in provided that is small enough and is large enough.
It remains to argue that (46) holds for all points in time in the union of the streams. Recall that each time in the union of the streams for which for an integer is included in , provided .
The key observation is that the quantity is non-decreasing, since the values are non-decreasing. Now, the value of at a time not in is, by definition of , within a factor of of the value of for some time in . Since (47) holds for all times in , it follows that the value of at time satisfies
which implies for small enough that (46) holds for all points in time in the union of the streams. This completes the proof.
Theorem 9
(MAIN) With probability at least , at all times the coordinator’s output is correct and the total communication is bits.
Proof:
Consider the setting of at the first time in the stream for which . For any non-negative integer vector and any update , we have . Since is an integer and , we therefore have . By Lemma 10, the expected communication for these is bits, so with probability at least the communication is bits. By Lemma 15, with probability at least , the protocol terminates at or before the time for which the inputs held by the players equal . The theorem follows by a union bound.
6 Related Problems
In this section we show that the techniques we have developed for distributed and can also be used to solve other fundamental problems. In particular, we consider the following problems: all-quantile, heavy hitters, empirical entropy, and for any . For the first three problems, we are able to show that our lower bounds hold even if we allow some additive error . From the definitions below one can observe that lower bounds for additive -approximations also hold for their multiplicative -approximation counterparts.
6.1 The All-Quantile and Heavy Hitters
We first give the definitions of the problems. Given a multiset where each is drawn from the universe , let be the frequency of item in the set . Thus .
Definition 5
(-heavy hitters) For any , the set of -heavy hitters of is . If an -approximation is allowed, then the returned set of heavy hitters must contain and cannot include any such that . If , then may or may not be included in .
Definition 6
(-quantile) For any , the -quantile of is some such that there are at most items of that are smaller than and at most items of that are greater than . If an -approximation is allowed, then when asking for the -quantile of we are allowed to return any -quantile of such that .
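As a concrete illustration of Definitions 5 and 6 (an illustrative sketch only; the exact normalizations are elided in this rendering and the standard convention is assumed), the exact versions of both notions can be computed as follows:

```python
from collections import Counter

def heavy_hitters(items, phi):
    """Exact phi-heavy hitters: items whose frequency is at least phi * |S|.
    An eps-approximate answer (Definition 5) must contain this set and may
    additionally include items of frequency at least (phi - eps) * |S|."""
    n = len(items)
    freq = Counter(items)
    return {x for x, f in freq.items() if f >= phi * n}

def phi_quantile(items, phi):
    """An exact phi-quantile: an element with at most phi * |S| items smaller
    than it and at most (1 - phi) * |S| items larger than it.  An
    eps-approximate answer (Definition 6) may return any phi'-quantile with
    |phi' - phi| <= eps."""
    s = sorted(items)
    return s[min(int(phi * len(s)), len(s) - 1)]

# Example with phi = 0.25 on a small multiset.
data = [1, 1, 1, 2, 3, 3, 4, 4, 4, 4]
print(heavy_hitters(data, 0.25))   # {1, 4}
print(phi_quantile(data, 0.25))    # 1 (0 items smaller, 7 items larger)
```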
Definition 7
(All-quantile) The -approximate all-quantile (QUAN) problem is defined in the coordinator model, where we have sites and a coordinator. Site has a set of items. The sites want to communicate with the coordinator so that at the end of the process the coordinator can construct a data structure from which all -approximate -quantiles for any can be extracted. The cost is defined as the total number of bits exchanged between the coordinator and the sites.
Theorem 10
Any randomized protocol that computes -approximate QUAN or -approximate -heavy hitters with error probability for some sufficiently small constant has communication complexity bits.
6.1.1 The -GAP-MAJ Problem
Before proving Theorem 10, we introduce a problem we call -GAP-MAJ.
In this section we fix . In the -GAP-MAJ problem we have sites , and each site has a bit such that . Let be the distribution of . The sites want to compute the following function.
where means that the answer can be either or .
Notice that -GAP-MAJ is very similar to -APPROX-SUM: We set and directly assign ’s to the sites. Also, instead of approximating the sum, we just want to decide whether the sum is large or small, up to a gap which is roughly equal to the standard deviation.
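As an illustration only (the exact gap in the definition above is elided in this rendering; the constant c below is an assumed placeholder on the order of the standard deviation), the decision task can be pictured as:

```python
import math

def gap_maj(bits, c=1.0):
    """Toy version of k-GAP-MAJ: decide whether the number of 1-bits is
    'large' or 'small', with a don't-care zone of width about c * sqrt(k)
    around k/2 (the standard-deviation scale).  The constant c is an
    illustrative placeholder, not the paper's exact parameter."""
    k, s = len(bits), sum(bits)
    if s >= k / 2 + c * math.sqrt(k):
        return 1      # "large"
    if s <= k / 2 - c * math.sqrt(k):
        return 0      # "small"
    return None       # inside the gap: either answer is acceptable
```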
We will prove the following theorem for -GAP-MAJ.
Theorem 11
Let be the transcript of any private randomness protocol for -GAP-MAJ on input distribution with error probability for some sufficiently small constant . Then .
Remark 1
The theorem holds for private randomness protocols, though for our applications, we only need it to hold for deterministic protocols. Allowing private randomness could be useful when the theorem is used for direct-sum types of arguments in other settings.
The following definition is essentially the same as the earlier definition of rare and normal transcripts, but with a different setting of parameters. For convenience, we will still use the terms rare and normal. Let be a constant chosen later. For a transcript , we define . Thus .
Definition 8
We say a transcript is rare+ if and rare- if . In both cases we say is rare. Otherwise we say it is normal.
Let be the joint distribution of and the distribution of ’s private randomness. The following lemma is essentially the same as Lemma 2. For completeness we still include a proof.
Lemma 16
Under the assumption of Theorem 11, .
Proof:
Set . We will redefine the term joker, which was defined in Definition 3, with a different setting of parameters. We say is a joker+ if , and a joker- if . In both cases we say is a joker.
First, we can apply a Chernoff bound on random variables , and obtain
Second, by Observation 1, we can apply a Chernoff bound on random variables conditioned on being rare+,
Finally by Bayes’ theorem, we have that
By symmetry (since we have set ), we can also show that
Therefore (recall that we have set ).
The following definition is essentially the same as Definition 4, but is for private randomness protocols.
Definition 9
We say a transcript is weak if (for a sufficiently large constant ), and strong otherwise.
The following lemma is similar to Lemma 3, but with the new definition of a normal .
Lemma 17
Under the assumption of Theorem 11, .
Proof:
We first show that for a normal and weak transcript , there exists a universal constant such that
We only need to prove the first inequality. The second will follow by symmetry, since we set . For a normal and weak , we have
Set . By Fact 1 we have for a universal constant ,
Together with the fact that is normal, we obtain
Now set . Suppose conditioned on being normal, it is weak with probability more than . Then the error probability of the protocol (taken over the distribution ) is at least
for a sufficiently small constant , violating the success guarantee of Theorem 11. Therefore with probability at least , is both normal and strong.
Proof:
(for Theorem 11) Recall that for a transcript , we have defined . Let , thus . We will omit the superscript in when it is clear from the context.
For a strong , we have . Thus . For each , if (for a sufficiently large constant ), then (recall that is the binary entropy function). Otherwise, it holds that , since if . Thus we have
Therefore, if , then
Now, for a large enough constant and ,
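For reference, the binary entropy function invoked in the last step is the standard

$$H(x) \;=\; x\log_2\frac{1}{x} \;+\; (1-x)\log_2\frac{1}{1-x}, \qquad x \in (0,1),$$

with the usual convention H(0) = H(1) = 0.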
6.1.2 Proof of Theorem 10
Proof:
We first prove the theorem for QUAN. In the case that , we prove an information complexity lower bound. We prove this by a simple reduction from -GAP-MAJ. We can assume since if then we can just give inputs to the first sites. Set . Given a random input of -GAP-MAJ chosen from distribution , we simply give to site . It is easy to observe that a protocol that computes -approximate QUAN on with error probability also computes -GAP-MAJ on input distribution with error probability , since the answer to -GAP-MAJ is simply the answer to -quantile. The lower bound follows from Theorem 11.
In the case that , we prove an information complexity lower bound. We again perform a reduction from -GAP-MAJ. Set . The reduction works as follows. We are given independent copies of -GAP-MAJ with being the inputs, where is chosen from distribution . We construct an input for QUAN by giving the -th site the item set . It is not difficult to observe that a protocol that computes -approximate QUAN on the set with error probability also computes the answer to each copy of -GAP-MAJ on distribution with error probability , simply by returning for the -th copy of -GAP-MAJ, where is the -approximate -quantile.
On the other hand, any protocol that computes each of the independent copies of -GAP-MAJ correctly with error probability for a sufficiently small constant has information complexity . This is simply because for any transcript , by Theorem 11, independence and the chain rule we have that
(50) The proof for heavy hitters is done by essentially the same reduction as that for QUAN. In the case that (or in general), a protocol that computes -approximate -heavy hitters on with error probability also computes -GAP-MAJ on input distribution with error probability . In the case that , it also holds that a protocol that computes -approximate -heavy hitters on the set where with error probability also computes the answer to each copy of -GAP-MAJ on distribution with error probability .
6.2 Entropy Estimation
We are given a set where each is drawn from the universe , and denotes an insertion or a deletion of item . The entropy estimation problem (ENTROPY) asks for the value where and . In the -approximate ENTROPY problem, the items in the set are distributed among sites who want to compute a value for which . In this section we prove the following theorem.
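To fix ideas, here is an illustrative sketch of computing the empirical entropy of a stream of signed updates (the exact normalization in the definition above is elided in this rendering; the sketch assumes the standard convention with p_i proportional to the absolute net frequency of item i):

```python
import math
from collections import Counter

def empirical_entropy(updates):
    """Empirical entropy H = sum_i p_i * log2(1/p_i) of the multiset defined
    by a stream of (item, +1/-1) updates, with p_i = |f_i| / sum_j |f_j| and
    f_i the net frequency of item i (standard convention assumed)."""
    freq = Counter()
    for item, delta in updates:
        freq[item] += delta
    total = sum(abs(f) for f in freq.values())
    if total == 0:
        return 0.0
    return sum((abs(f) / total) * math.log2(total / abs(f))
               for f in freq.values() if f != 0)

# Example: insert a, a, b and delete one a -> net frequencies {a: 1, b: 1}.
print(empirical_entropy([("a", 1), ("a", 1), ("b", 1), ("a", -1)]))  # 1.0
```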
Theorem 12
Any randomized protocol that computes -approximate ENTROPY with error probability at most for some sufficiently small constant has communication complexity .
Proof:
As with , we prove the lower bound for the ENTROPY problem by a reduction from -BTX. Given a random input for -BTX according to distribution with for some parameter for a large enough constant , we construct an input for ENTROPY as follows. Each block in -BTX corresponds to one coordinate item in the vector for ENTROPY; so we have in total items in the entropy vector. The sites first use shared randomness to sample random values for each coordinate across all blocks in . (By Newman's theorem (cf. [43], Chapter 3) we can remove the public randomness by increasing the total communication complexity by no more than an additive factor, which is negligible in our proof.) Let be these random values. Each site looks at each of its bits , and generates an item (recall that denotes insertion or deletion of the item ) if . Call the resulting input distribution .
We call an item in group if the -XOR instance in the corresponding block is a -instance; in group if it is a -instance; and in group if it is a -instance or a -instance. Group is further divided into two subgroups and , containing all -instances and all -instances, respectively. Let be the cardinalities of these groups. Now we consider the frequency of each item type.
1. For an item , its frequency is distributed as follows: we choose a value from the binomial distribution on values each with probability , then we take the sum of i.i.d. random variables. We can thus write .
2. For an item , its frequency is distributed as follows: we choose a value from the binomial distribution on values each with probability , then we take the sum of i.i.d. random variables. Then we add the value , where is the index of the special column in block . We can thus write as . By a Chernoff-Hoeffding bound, with probability , we have . We choose , and thus . Therefore will not affect the sign of for any (by a union bound) and we can write . Since is symmetric about and is a random variable, we can simply drop and write .
3. For an item , its frequency is distributed as follows: we choose a value from the binomial distribution on values each with probability , then we take the sum of i.i.d. random variables. Then we add the value , where is the index of the special column in block . We can thus write as . As in the previous case, with probability , will not affect the sign of and we can write .
By a union bound, with error probability at most , each will not affect the sign of the corresponding . Moreover, by another Chernoff bound we have that with error probability , are equal to , and . Here can be made sufficiently small if we set the constant sufficiently large. Thus we have that with arbitrarily small constant error , all the concentration results claimed above hold. For simplicity we neglect this part of the error, since it can be made arbitrarily small and will not affect any of the analysis. In the rest of this section we will ignore arbitrarily small errors and drop some lower-order terms as long as such operations do not affect the analysis.
The analysis of the next part is similar to that for our lower bound, where we end up computing on three different vectors. Let us calculate and , which stand for the entropies of all -sites, the first sites and the second sites, respectively. Then we show that using and we can estimate well, and thus compute -BTX correctly with an arbitrarily small constant error. Thus if there is a protocol for ENTROPY on distribution then we obtain a protocol for -BTX on distribution with the same communication complexity, completing the reduction and consequently proving Theorem 12.
Before computing and , we first compute the total number of items. We can write
(51) The absolute value of the fourth term in (51) can be bounded by with arbitrarily large constant probability, using a Chernoff-Hoeffding bound, which will be and thus can be dropped. For the third term, by Chebyshev’s inequality we can assume (by increasing the constant in the big-Oh) that with arbitrarily large constant probability, , where follows by approximating the binomial distribution by a normal distribution (or, e.g., Khintchine’s inequality). Let be a value which can be computed exactly. Then, , and so we can drop the additive term.
Finally, we get,
(52) where is a value that can be computed by any site without any communication.
Let . We can write as follows.
(53) where
(54) We consider the three summands in (54) one by one. For the second term in (54), we have
(55) The second term in (55) is at most , and can be dropped. By a similar analysis we can obtain that the third term in (54) is (up to an term)
(56) Now consider the first term. We have
(57) where can be computed exactly. Then the second term in (57) is at most , and thus can be dropped. Let . By Equations (53), (52), (54), (55), (56), (57) we can write
(58) Let and , and thus and . Next we convert the RHS of (58) to a linear function of and .
(61) (62) up to factors (see below for discussion), where
(63)
1. In passing between the displayed equations we use the fact that , and the fact that . From (61) to (62) we use the fact that all terms of the form are at most (we are assuming , which is fine since we are neglecting polylog factors); therefore we can drop all of them, together with the other terms, and consequently obtain a linear function of and .
Next we calculate , and the calculation of will be exactly the same. The values used in the following expressions are essentially the same as those used for calculating , with and . Set and .
(64)
where
(65)
By the same calculation we can obtain the following equation for .
(66)
Note that . Combining (64) and (66) we have
(67)
It is easy to verify that Equations (62) and (67) are linearly independent: by direct calculation (noticing that are lower-order terms) we obtain and . Therefore . Similarly we can obtain . Therefore the two equations are linearly independent. Furthermore, we can compute all the coefficients up to a factor. Thus if we have additive approximations of for a sufficiently small constant , then we can estimate (and thus ) up to an additive error of for a sufficiently small constant by Equations (62) and (67), and therefore solve -BTX. This completes the proof.
6.3 for any constant
Consider an -dimensional vector with integer entries. It is well known that for a vector of i.i.d. random variables, . Hence, for any real , , where is the -th moment of the standard half-normal distribution (see [1] for a formula for these moments in terms of confluent hypergeometric functions). Let , and be independent -dimensional vectors of i.i.d. random variables. Let , so that . By Chebyshev's inequality, for sufficiently large, with probability at least for an arbitrarily small constant .
We thus have the following reduction which shows that estimating up to a -factor requires communication complexity for any . Let the parties have respective inputs , and let . The parties use the shared randomness to choose shared vectors as described above. For and , let , so that . Let . By the above, with probability at least for an arbitrarily small constant . We note that the entries of the can be discretized to bits, changing the -norm of by only a factor, which we ignore.
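The following sketch illustrates the estimator underlying this reduction (illustrative only; it uses numpy/scipy and centralizes the computation, whereas in the reduction the Gaussian vectors are shared randomness across the sites):

```python
import math
import numpy as np
from scipy.special import gamma

def half_normal_moment(p):
    """E|N(0,1)|^p = 2^(p/2) * Gamma((p+1)/2) / sqrt(pi)."""
    return 2 ** (p / 2) * gamma((p + 1) / 2) / math.sqrt(math.pi)

def estimate_l2_pth_power(x, p, t, seed=0):
    """Estimate ||x||_2^p by averaging |<x, z_j>|^p over t i.i.d. standard
    Gaussian vectors z_j.  Since <x, z_j> ~ N(0, ||x||_2^2), each term has
    expectation ||x||_2^p * m_p, so dividing by m_p and averaging gives a
    good approximation for t large enough, by Chebyshev's inequality."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    m_p = half_normal_moment(p)
    samples = [abs(float(rng.standard_normal(x.size) @ x)) ** p for _ in range(t)]
    return float(np.mean(samples)) / m_p
```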
Hence, given a randomized protocol for estimating up to a factor with probability , and given that the parties have respective inputs , this implies a randomized protocol for estimating up to a factor with probability at least , and hence a protocol for estimating up to a factor with this probability. The communication complexity of the protocol for is the same as that for . By our communication lower bound for estimating (in fact, for estimating in which all coordinates of are non-negative), this implies the following theorem.
Theorem 13
The randomized communication complexity of approximating the -norm, , up to a factor of with constant probability, is .
Acknowledgements
We would like to thank Elad Verbin for many helpful discussions, in particular, for helping us with the lower bound, which was discovered in joint conversations with him. We also thank Amit Chakrabarti and Oded Regev for helpful discussions, as well as the anonymous referees for useful comments. Finally, we thank the organizers of the Synergies in Lower Bounds workshop that took place in Aarhus for bringing the authors together.
References
- [1] http://en.wikipedia.org/wiki/Normal_distribution.
- [2] Open problems in data streams and related topics. http://www.cse.iitk.ac.in/users/sganguly/data-stream-probs.pdf, 2006.
- [3] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proc. ACM Symposium on Theory of Computing, 1996.
- [4] C. Arackaparambil, J. Brody, and A. Chakrabarti. Functional monitoring without monotonicity. In Proc. International Colloquium on Automata, Languages, and Programming, 2009.
- [5] B. Babcock and C. Olston. Distributed top-k monitoring. In Proc. ACM SIGMOD International Conference on Management of Data, 2003.
- [6] Z. Bar-Yossef. The complexity of massive data set computations. PhD thesis, University of California at Berkeley, 2002.
- [7] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci., 68:702–732, June 2004.
- [8] B. Barak, M. Braverman, X. Chen, and A. Rao. How to compress interactive communication. In Proc. ACM Symposium on Theory of Computing, pages 67–76, 2010.
- [9] J. Brody and A. Chakrabarti. A multi-round communication lower bound for gap hamming and some consequences. In IEEE Conference on Computational Complexity, pages 358–368, 2009.
- [10] J. Brody, A. Chakrabarti, O. Regev, T. Vidick, and R. de Wolf. Better gap-hamming lower bounds via better round elimination. In APPROX-RANDOM, pages 476–489, 2010.
- [11] A. Chakrabarti, G. Cormode, R. Kondapally, and A. McGregor. Information cost tradeoffs for augmented index and streaming language recognition. In Proc. IEEE Symposium on Foundations of Computer Science, pages 387–396, 2010.
- [12] A. Chakrabarti, G. Cormode, and A. McGregor. Robust lower bounds for communication and stream computation. In Proc. ACM Symposium on Theory of Computing, pages 641–650, 2008.
- [13] A. Chakrabarti, T. S. Jayram, and M. Patrascu. Tight lower bounds for selection in randomly ordered streams. In Proc. ACM-SIAM Symposium on Discrete Algorithms, pages 720–729, 2008.
- [14] A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In Proc. IEEE Conference on Computational Complexity, pages 107–117, 2003.
- [15] A. Chakrabarti, R. Kondapally, and Z. Wang. Information complexity versus corruption and applications to orthogonality and gap-hamming. In APPROX-RANDOM, pages 483–494, 2012.
- [16] A. Chakrabarti and O. Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. In Proc. ACM Symposium on Theory of Computing, 2011.
- [17] A. Chakrabarti, Y. Shi, A. Wirth, and A. Yao. Informational complexity and the direct sum problem for simultaneous message complexity. In Proc. IEEE Symposium on Foundations of Computer Science, pages 270–278, 2001.
- [18] G. Cormode and M. Garofalakis. Sketching streams through the net: Distributed approximate query tracking. In Proc. International Conference on Very Large Data Bases, 2005.
- [19] G. Cormode, M. Garofalakis, S. Muthukrishnan, and R. Rastogi. Holistic aggregates in a networked world: Distributed tracking of approximate quantiles. In Proc. ACM SIGMOD International Conference on Management of Data, 2005.
- [20] G. Cormode, S. Muthukrishnan, and K. Yi. Algorithms for distributed functional monitoring. ACM Transactions on Algorithms, 7(2):21, 2011.
- [21] G. Cormode, S. Muthukrishnan, K. Yi, and Q. Zhang. Optimal sampling from distributed streams. In Proc. ACM Symposium on Principles of Database Systems, 2010. Invited to Journal of the ACM.
- [22] T. Cover and J. Thomas. Elements of Information Theory. John Wiley and Sons, Inc., 1991.
- [23] P. Duris and J. D. P. Rolim. Lower bounds on the multiparty communication complexity. J. Comput. Syst. Sci., 56(1):90–95, 1998.
- [24] F. Ergün and H. Jowhari. On distance to monotonicity and longest increasing subsequence of a data stream. In Proc. ACM-SIAM Symposium on Discrete Algorithms, pages 730–736, 2008.
- [25] D. Estrin, R. Govindan, J. S. Heidemann, and S. Kumar. Next century challenges: Scalable coordination in sensor networks. In MOBICOM, pages 263–270, 1999.
- [26] W. Feller. Generalization of a probability limit theorem of Cramér. Trans. Amer. Math. Soc., 54(3):361–372, 1943.
- [27] A. Gál and P. Gopalan. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. In Proc. IEEE Symposium on Foundations of Computer Science, 2007.
- [28] S. Ganguly. Polynomial estimators for high frequency moments. CoRR, abs/1104.4552, 2011.
- [29] S. Ganguly. A lower bound for estimating high moments of a data stream. CoRR, abs/1201.0253, 2012.
- [30] P. Gopalan, R. Meka, O. Reingold, and D. Zuckerman. Pseudorandom generators for combinatorial shapes. In Proceedings of the 43rd annual ACM symposium on Theory of computing, STOC ’11, pages 253–262, New York, NY, USA, 2011. ACM.
- [31] A. Gronemeier. Asymptotically optimal lower bounds on the NIH multi-party information complexity of the AND-function and disjointness. In Symposium on Theoretical Aspects of Computer Science, pages 505–516, 2009.
- [32] S. Guha and Z. Huang. Revisiting the direct sum theorem and space lower bounds in random order streams. In Proc. International Colloquium on Automata, Languages, and Programming, 2009.
- [33] N. J. A. Harvey, J. Nelson, and K. Onak. Sketching and streaming entropy via approximation theory. In Proc. IEEE Symposium on Foundations of Computer Science, pages 489–498, 2008.
- [34] Z. Huang and K. Yi. Personal communication, 2012.
- [35] Z. Huang, K. Yi, and Q. Zhang. Randomized algorithms for tracking distributed count, frequencies, and ranks. CoRR, abs/1108.3413, 2011.
- [36] Z. Huang, K. Yi, and Q. Zhang. Randomized algorithms for tracking distributed count, frequencies, and ranks. In Proceedings of the 31st symposium on Principles of Database Systems, PODS ’12, pages 295–306, New York, NY, USA, 2012. ACM.
- [37] P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In Proc. ACM Symposium on Theory of Computing, 2005.
- [38] P. Indyk and D. P. Woodruff. Tight lower bounds for the distinct elements problem. In FOCS, pages 283–288, 2003.
- [39] T. S. Jayram. Hellinger strikes back: A note on the multi-party information complexity of AND. In APPROX-RANDOM, pages 562–573, 2009.
- [40] D. M. Kane, J. Nelson, E. Porat, and D. P. Woodruff. Fast moment estimation in data streams in optimal space. In STOC, pages 745–754, 2011.
- [41] D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In Proc. ACM Symposium on Principles of Database Systems, pages 41–52, 2010.
- [42] R. Keralapura, G. Cormode, and J. Ramamirtham. Communication-efficient distributed monitoring of thresholded counts. In Proc. ACM SIGMOD International Conference on Management of Data, 2006.
- [43] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
- [44] F. Magniez, C. Mathieu, and A. Nayak. Recognizing well-parenthesized expressions in the streaming model. In Proc. ACM Symposium on Theory of Computing, pages 261–270, 2010.
- [45] A. Manjhi, V. Shkapenyuk, K. Dhamdhere, and C. Olston. Finding (recently) frequent items in distributed data streams. In Proc. IEEE International Conference on Data Engineering, 2005.
- [46] J. Matousek and J. Vondrák. The probabilistic method. Lecture Notes, 2008.
- [47] J. M. Phillips, E. Verbin, and Q. Zhang. Lower bounds for number-in-hand multiparty communication complexity, made easy. In Proc. ACM-SIAM Symposium on Discrete Algorithms, 2012.
- [48] A. A. Razborov. On the distributional complexity of disjointness. In Proc. International Colloquium on Automata, Languages, and Programming, 1990.
- [49] A. A. Sherstov. The communication complexity of gap hamming distance. Electronic Colloquium on Computational Complexity (ECCC), 18:63, 2011.
- [50] S. Tirthapura and D. P. Woodruff. Optimal random sampling from distributed streams revisited. In The International Symposium on Distributed Computing, pages 283–297, 2011.
- [51] T. Vidick. A concentration inequality for the overlap of a vector on a large set, with application to the communication complexity of the gap-hamming-distance problem. Electronic Colloquium on Computational Complexity (ECCC), 18:51, 2011.
- [52] D. Woodruff. Optimal space lower bounds for all frequency moments. In Proc. ACM-SIAM Symposium on Discrete Algorithms, 2004.
- [53] A. C. Yao. Probabilistic computations: Towards a unified measure of complexity. In Proc. IEEE Symposium on Foundations of Computer Science, 1977.
- [54] K. Yi and Q. Zhang. Optimal tracking of distributed heavy hitters and quantiles. In Proc. ACM Symposium on Principles of Database Systems, 2009.
Appendix A Proof for Observation 1
We first show the rectangle property of private randomness protocols in the message-passing model. The proof requires only syntactic changes to the one in [6], Section 6.4.1, which was designed for the blackboard model. The only difference between the blackboard model and the message-passing model is that in the blackboard model, if one player speaks, everyone else can hear.
Property 1
Given a -party private randomness protocol on inputs in in the message passing model, for all , and for all possible transcripts , we have
(68)
where is the part of transcript that player sees (that is, the concatenation of all messages sent from or received by ), and is the players’ private randomness. Furthermore, we have
(69)
Proof:
We can view the input of as a pair , where and . Let be the combinatorial rectangle containing all tuples such that .
For each , for each , let be the projection of on pairs of the form , and be the collection of all pairs of the form . Note that . If each player chooses uniformly at random from , then the transcript will be if and only if . Since the choices of are all independent, it follows that .
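As a point of reference, a hedged reconstruction of the kind of factorization behind the omitted displays (68) and (69) (not a quotation of them): for a private randomness protocol in the message-passing model there exist nonnegative functions q_1, ..., q_k such that

$$\Pr[\Pi = \tau \mid X = x] \;=\; \prod_{i=1}^{k} q_i\bigl(x_i, \tau^{(i)}\bigr),$$

where \tau^{(i)} denotes the part of the transcript that player i sees.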
Now we prove Observation 1. That is, we show
We show this using the rectangle property.