
Tight Bounds for Distributed Functional Monitoring

David P. Woodruff
IBM Almaden
dpwoodru@us.ibm.com
   Qin Zhang
IBM Almaden
qinzhang@cse.ust.hk
Most of this work was done while Qin Zhang was a postdoc in MADALGO (Center for Massive Data Algorithmics - a Center of the Danish National Research Foundation), Aarhus University.
Abstract

We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008), and receiving recent attention. In this model there are $k$ sites, each tracking its input stream and communicating with a central coordinator. The coordinator's task is to continuously maintain an approximate output to a function computed over the union of the $k$ streams. The goal is to minimize the number of bits communicated.

Let the $p$-th frequency moment be defined as $F_p=\sum_i f_i^p$, where $f_i$ is the frequency of element $i$. We show the randomized communication complexity of estimating the number of distinct elements (that is, $F_0$) up to a $1+\varepsilon$ factor is $\tilde{\Omega}(k/\varepsilon^2)$, improving upon the previous $\Omega(k+1/\varepsilon^2)$ bound and matching known upper bounds up to a logarithmic factor. For $F_p$, $p>1$, we improve the previous $\Omega(k+1/\varepsilon^2)$ bits of communication bound to $\Omega(k^{p-1}/\varepsilon^2)$. We obtain similar improvements for heavy hitters, empirical entropy, and other problems. Our lower bounds are the first of any kind in distributed functional monitoring to depend on the product of $k$ and $1/\varepsilon^2$. Moreover, the lower bounds are for the static version of the distributed functional monitoring model, where the coordinator only needs to compute the function at the time when all $k$ input streams end; surprisingly, they almost match what is achievable in the (dynamic version of the) distributed functional monitoring model, where the coordinator needs to keep track of the function continuously at every time step. We also show that we can estimate $F_p$, for any $p>1$, using $\tilde{O}(k^{p-1}\mathrm{poly}(\varepsilon^{-1}))$ bits of communication. This drastically improves upon the previous $\tilde{O}(k^{2p+1}N^{1-2/p}\mathrm{poly}(\varepsilon^{-1}))$ bits bound of Cormode, Muthukrishnan, and Yi for general $p$, and their $\tilde{O}(k^2/\varepsilon+k^{1.5}/\varepsilon^3)$ bits bound for $p=2$. For $p=2$, our bound resolves their main open question.

Our lower bounds are based on new direct sum theorems for approximate majority, and yield improvements to classical problems in the standard data stream model. First, we improve the known lower bound for estimating $F_p$, $p>2$, in $t$ passes from $\tilde{\Omega}(n^{1-2/p}/(\varepsilon^{2/p}t))$ to $\Omega(n^{1-2/p}/(\varepsilon^{4/p}t))$, giving the first bound that matches what we expect when $p=2$ for any constant number of passes. Second, we give the first lower bound for estimating $F_0$ in $t$ passes with $\Omega(1/(\varepsilon^2 t))$ bits of space that does not use the hardness of the gap-hamming problem.

1 Introduction

Recent applications in sensor networks and distributed systems have motivated the distributed functional monitoring model, initiated by Cormode, Muthukrishnan, and Yi [20]. In this model there are $k$ sites and a single central coordinator. Each site $S_i\ (i\in[k])$ receives a stream of data $A_i(t)$ for timesteps $t=1,2,\ldots$, and the coordinator wants to keep track of a function $f$ that is defined over the multiset union of the $k$ data streams at each time $t$. For example, the function $f$ could be the number of distinct elements in the union of the $k$ streams. We assume that there is a two-way communication channel between each site and the coordinator so that the sites can communicate with the coordinator. The goal is to minimize the total amount of communication between the sites and the coordinator so that the coordinator can approximately maintain $f(A_1(t),\ldots,A_k(t))$ at any time $t$. Minimizing the total communication is motivated by power constraints in sensor networks, since communication typically uses a power-hungry radio [25], and also by network bandwidth constraints in distributed systems. There is a large body of work on monitoring problems in this model, including maintaining a random sample [21, 50], estimating frequency moments [18, 20], finding the heavy hitters [5, 42, 45, 54], approximating the quantiles [19, 54, 35], and estimating the entropy [4].

We can think of the distributed functional monitoring model as follows. Each of the $k$ sites holds an $N$-dimensional vector, where $N$ is the size of the universe. An update to a coordinate $j$ on site $S_i$ causes $v^i_j$ to increase by $1$. The goal is to estimate a statistic of $v=\sum_{i=1}^k v^i$, such as the $p$-th frequency moment $F_p=\|v\|_p^p$, the number of distinct elements $F_0=|\mathrm{support}(v)|$, and the empirical entropy $H=\sum_i\frac{v_i}{\|v\|_1}\log\frac{\|v\|_1}{v_i}$. This is the standard insertion-only model. For many of these problems, with the exception of the empirical entropy, there are strong lower bounds (e.g., $\Omega(N)$) if updates that cause $v^i_j$ to decrease are allowed [4]. The latter is called the update model. Thus, except for entropy, we follow previous work and consider the insertion-only model.
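To make these statistics concrete, here is a small self-contained sketch (purely illustrative, not part of any protocol in this paper) that aggregates $k$ insertion-only streams into the vector $v$ and evaluates $F_p$, $F_0$, and $H$ on it:

```python
from collections import Counter
from math import log2

def aggregate(streams):
    """Sum the k sites' insertion-only streams into one frequency vector v."""
    v = Counter()
    for stream in streams:
        for j in stream:
            v[j] += 1  # an update to coordinate j increases v_j by 1
    return v

def F(v, p):
    """p-th frequency moment F_p = sum_i v_i^p; F_0 counts distinct elements."""
    if p == 0:
        return sum(1 for c in v.values() if c > 0)
    return sum(c ** p for c in v.values())

def entropy(v):
    """Empirical entropy H = sum_i (v_i/||v||_1) * log(||v||_1 / v_i)."""
    m = sum(v.values())
    return sum((c / m) * log2(m / c) for c in v.values())

# k = 3 sites, each holding a small stream over the universe
streams = [[1, 2, 2], [2, 3], [1, 4, 4, 4]]
v = aggregate(streams)
print(F(v, 0), F(v, 1), F(v, 2), entropy(v))  # F_0 = 4, F_1 = 9, F_2 = 23
```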

To prove lower bounds, we consider the static version of the distributed functional monitoring model, where the coordinator only needs to compute the function at the time when all $k$ input streams end. It is clear that a lower bound for the static case is also a lower bound for the dynamic case, in which the coordinator has to keep track of the function at any point in time. The static version of the distributed functional monitoring model is closely related to the multiparty number-in-hand communication model, where we again have $k$ sites, each holding an $N$-dimensional vector $v^i$, and they want to jointly compute a function defined on the $k$ input vectors. It is easy to see that these two models are essentially the same: in the former, if site $S_i$ would like to send a message to $S_j$, it can always send the message first to the coordinator, and then the coordinator can forward the message to $S_j$. Doing this only increases the total amount of communication by a factor of two. Therefore, we do not distinguish between these two models in this paper.

There are two variants of the multiparty number-in-hand communication model we will consider: the blackboard model, in which each message a site sends is received by all other sites, i.e., it is broadcast, and the message-passing model, in which each message is between the coordinator and a specific site.

Despite the large body of work in the distributed functional monitoring model, the complexity of basic problems is not well understood. For example, for estimating $F_0$ up to a $(1+\varepsilon)$-factor, the best upper bound is $\tilde{O}(k/\varepsilon^2)$ [20] (we use $\tilde{O}(f)$ to denote a function of the form $f\cdot\log^{O(1)}(Nk/\varepsilon)$; all communication and information bounds in this paper, if not otherwise stated, are in terms of bits), while the only known lower bound is $\Omega(k+1/\varepsilon^2)$. The dependence on $\varepsilon$ in the lower bound is not very insightful, as the $\Omega(1/\varepsilon^2)$ bound follows just by considering two sites [4, 16]. The real question is whether the $k$ and $1/\varepsilon^2$ factors should multiply. Even more embarrassingly, for the frequency moments $F_p$, $p>2$, the known algorithms use communication $\tilde{O}(k^{2p+1}N^{1-2/p}\mathrm{poly}(1/\varepsilon))$, while the only known lower bound is $\Omega(k+1/\varepsilon^2)$ [4, 16]. Even for $p=2$, the best known upper bound is $\tilde{O}(k^2/\varepsilon+k^{1.5}/\varepsilon^3)$ [20], and the authors' main open question in their paper is "It remains to close the gap in the $F_2$ case: can a better lower bound than $\Omega(k)$ be shown, or do there exist $\tilde{O}(k\cdot\mathrm{poly}(1/\varepsilon))$ solutions?"

Our Results: We significantly improve the previous communication bounds for approximating the frequency moments, entropy, heavy hitters, and quantiles in the distributed functional monitoring model. In many cases our bounds are optimal. Our results are summarized in Table 1, where they are compared with previous bounds.

Problem | LB (previous work) | LB (this paper, all static) | UB (previous work) | UB (this paper)
$F_0$ | $\tilde{\Omega}(k)$ [20] | $\tilde{\Omega}(k/\varepsilon^2)$ | $\tilde{O}(k/\varepsilon^2)$ [20] |
$F_2$ | $\Omega(k)$ [20] | $\Omega(k/\varepsilon^2)$ (BB) | $\tilde{O}(k^2/\varepsilon+k^{1.5}/\varepsilon^3)$ [20] | $\tilde{O}(k/\mathrm{poly}(\varepsilon))$
$F_p\ (p>1)$ | $\Omega(k+1/\varepsilon^2)$ [4, 16] | $\Omega(k^{p-1}/\varepsilon^2)$ (BB) | $\tilde{O}(\frac{p}{\varepsilon^{1+2/p}}k^{2p+1}N^{1-2/p})$ [20] | $\tilde{O}(k^{p-1}/\mathrm{poly}(\varepsilon))$
All-quantile | $\tilde{\Omega}(\min\{\frac{\sqrt{k}}{\varepsilon},\frac{1}{\varepsilon^2}\})$ [35] | $\Omega(\min\{\frac{\sqrt{k}}{\varepsilon},\frac{1}{\varepsilon^2}\})$ (BB) | $\tilde{O}(\min\{\frac{\sqrt{k}}{\varepsilon},\frac{1}{\varepsilon^2}\})$ [35] |
Heavy Hitters | $\tilde{\Omega}(\min\{\frac{\sqrt{k}}{\varepsilon},\frac{1}{\varepsilon^2}\})$ [35] | $\Omega(\min\{\frac{\sqrt{k}}{\varepsilon},\frac{1}{\varepsilon^2}\})$ (BB) | $\tilde{O}(\min\{\frac{\sqrt{k}}{\varepsilon},\frac{1}{\varepsilon^2}\})$ [35] |
Entropy | $\tilde{\Omega}(1/\sqrt{\varepsilon})$ [4] | $\Omega(k/\varepsilon^2)$ (BB) | $\tilde{O}(\frac{k}{\varepsilon^3})$ [4], $\tilde{O}(\frac{k}{\varepsilon^2})$ (static) [33] |
$\ell_p\ (p\in(0,2])$ | | $\Omega(k/\varepsilon^2)$ (BB) | $\tilde{O}(k/\varepsilon^2)$ (static) [40] |

Table 1: UB denotes upper bound; LB denotes lower bound; BB denotes the blackboard model. $N$ denotes the universe size. All bounds are for randomized algorithms. We assume all bounds hold in the dynamic setting by default, and state explicitly if they hold in the static setting. For lower bounds we assume the message-passing model by default, and state explicitly if they also hold in the blackboard model.

We have three main results, each introducing a new technique:

  1. We show that for estimating $F_0$ in the message-passing model, $\tilde{\Omega}(k/\varepsilon^2)$ communication is required, matching an upper bound of [20] up to a polylogarithmic factor. Our lower bound holds in the static model, in which the $k$ sites just need to approximate $F_0$ once on their inputs.

  2. We show that we can estimate $F_p$, for any $p>1$, using $\tilde{O}(k^{p-1}\mathrm{poly}(\varepsilon^{-1}))$ communication in the message-passing model (we assume the total number of updates is $\mathrm{poly}(N)$). This drastically improves upon the previous bound $\tilde{O}(k^{2p+1}N^{1-2/p}\mathrm{poly}(\varepsilon^{-1}))$ of [20]. In particular, setting $p=2$, we resolve the main open question of [20].

  3. We show $\Omega(k^{p-1}/\varepsilon^2)$ communication is necessary for approximating $F_p\ (p>1)$ in the blackboard model, significantly improving the prior $\Omega(k+1/\varepsilon^2)$ bound. As with our lower bound for $F_0$, these are the first lower bounds which depend on the product of $k$ and $1/\varepsilon^2$. As with $F_0$, our lower bound holds in the static model in which the sites just approximate $F_p$ once.

Our other results in Table 1 are explained in the body of the paper, and use similar techniques.

We would like to mention that after the conference version of our paper, our results found applications in proving a space lower bound at each site for tracking heavy hitters in the functional monitoring model [36], and a communication complexity lower bound for computing $\varepsilon$-approximations of range spaces in $\mathbb{R}^2$ in the message-passing model [34].

Our Techniques: Lower Bound for $F_0$: For illustration, suppose $k=1/\varepsilon^2$. There are $1/\varepsilon^2$ sites, each holding a random independent bit. Their task is to approximate the sum of the $k$ bits up to an additive error of $1/\varepsilon$. Call this problem $k$-APPROX-SUM. (In the conference version of this paper we introduced a problem called $k$-GAP-MAJ, in which the sites need to decide if at least $1/(2\varepsilon^2)+1/\varepsilon$ of the bits are $1$, or at most $1/(2\varepsilon^2)-1/\varepsilon$ of the bits are $1$. We instead use $k$-APPROX-SUM here since we feel it is easier to work with: this problem is stronger than $k$-GAP-MAJ and thus easier to lower bound, and it suffices for our purpose. $k$-GAP-MAJ will be introduced and used in Section 6.1 for heavy hitters and quantiles.) We show any correct protocol must reveal $\Omega(1/\varepsilon^2)$ bits of information about the sites' inputs. We "compose" this with $2$-party disjointness ($2$-DISJ) [48], in which each party has a bitstring of length $1/\varepsilon^2$ and either the strings have disjoint support (the solution is $0$) or there is a single coordinate which is $1$ in both strings (the solution is $1$). Let $\tau$ be the hard distribution for $2$-DISJ, shown to require $\Omega(1/\varepsilon^2)$ bits of communication to solve [48]. Suppose the coordinator and each site share an instance of $2$-DISJ in which the solution to $2$-DISJ is a random bit, which is the site's effective input to $k$-APPROX-SUM. The coordinator has the same input for each of the $1/\varepsilon^2$ instances, while the sites have independent inputs drawn from $\tau$ conditioned on the coordinator's input and the output bit determined by $k$-APPROX-SUM. The inputs are chosen so that if the output of $2$-DISJ is $1$, then $F_0$ increases by $1$; otherwise it remains the same. This is not entirely accurate, but it illustrates the main idea. Now, the key is that by the rectangle property of $k$-party communication protocols, the $1/\varepsilon^2$ different output bits are independent conditioned on the transcript. Thus if a protocol does not reveal $\Omega(1/\varepsilon^2)$ bits of information about these output bits, by an anti-concentration theorem we can show that the protocol cannot succeed with large probability. Finally, since a $(1+\varepsilon)$-approximation to $F_0$ can decide $k$-APPROX-SUM, and since any correct protocol for $k$-APPROX-SUM must reveal $\Omega(1/\varepsilon^2)$ bits of information, the protocol must solve $\Omega(1/\varepsilon^2)$ instances of $2$-DISJ, each requiring $\Omega(1/\varepsilon^2)$ bits of communication (otherwise the coordinator could simulate $k-1$ of the sites and obtain an $o(1/\varepsilon^2)$-communication protocol for $2$-DISJ with the remaining site, contradicting the communication lower bound for $2$-DISJ on this distribution). We obtain an $\tilde{\Omega}(k/\varepsilon^2)$ bound for $k\geq 1/\varepsilon^2$ by using similar arguments. One cannot show this in the blackboard model, since there is an $\tilde{O}(k+1/\varepsilon^2)$ bound for $F_0$ there: the idea is to first obtain a $2$-approximation, then sub-sample so that there are $\Theta(1/\varepsilon^2)$ distinct elements; the first party broadcasts his distinct elements, the second party broadcasts the distinct elements he has that the first party does not, and so on.
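The blackboard protocol sketched at the end of the previous paragraph is simple enough to simulate. The following toy sketch assumes the $2$-approximation to $F_0$ is already given, and the sub-sampling rate $(1/\varepsilon^2)/F_0$ is an illustrative choice; it is meant only to show why the communication is $\tilde{O}(k+1/\varepsilon^2)$, not to reproduce the exact protocol:

```python
import random

def blackboard_f0(site_sets, f0_2approx, eps, seed=42):
    """Toy version of the O~(k + 1/eps^2) blackboard protocol for F_0:
    sub-sample the universe so that about 1/eps^2 distinct elements survive,
    then each site broadcasts only the surviving elements that no earlier
    site has already announced."""
    rate = min(1.0, (1 / eps ** 2) / f0_2approx)  # sub-sampling probability
    rnd = random.Random(seed)                     # shared public randomness
    universe = sorted({x for s in site_sets for x in s})
    survives = {x: rnd.random() < rate for x in universe}

    announced = set()                             # contents of the blackboard
    messages = 0
    for s in site_sets:                           # sites speak in turn
        new = {x for x in s if survives[x]} - announced
        announced |= new                          # broadcast the new elements
        messages += len(new)                      # ~1/eps^2 elements in total
    return len(announced) / rate, messages        # scale back up to estimate F_0

sites = [set(range(i, 1000, 7)) for i in range(5)]  # k = 5 overlapping sets
print(blackboard_f0(sites, f0_2approx=1000, eps=0.1))
```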

Lower Bound for $F_p$: Our $\Omega(k^{p-1}/\varepsilon^2)$ bound for $F_p$ cannot use the above reduction, since we do not know how to turn a protocol for approximating $F_p$ into a protocol for solving the composition of $k$-APPROX-SUM and $2$-DISJ. Instead, our starting point is a recent $\Omega(1/\varepsilon^2)$ lower bound for the $2$-party gap-hamming distance problem GHD [16]. The parties have length-$1/\varepsilon^2$ bitstrings $x$ and $y$, respectively, and they must decide if the Hamming distance $\Delta(x,y)>1/(2\varepsilon^2)+1/\varepsilon$ or $\Delta(x,y)<1/(2\varepsilon^2)-1/\varepsilon$. A simplification by Sherstov [49] shows that a related problem called $2$-GAP-ORT also has communication complexity $\Omega(1/\varepsilon^2)$ bits. Here there are two parties, each with a $1/\varepsilon^2$-length bitstring ($x$ and $y$, respectively), and they must decide if $|\Delta(x,y)-1/(2\varepsilon^2)|>2/\varepsilon$ or $|\Delta(x,y)-1/(2\varepsilon^2)|<1/\varepsilon$. Chakrabarti et al. [15] showed that any correct protocol for $2$-GAP-ORT must reveal $\Omega(1/\varepsilon^2)$ bits of information about $(x,y)$. By independence and the chain rule, this means that for $\Omega(1/\varepsilon^2)$ indices $i$, $\Omega(1)$ bits of information are revealed about $(x_i,y_i)$ conditioned on the values $(x_j,y_j)$ for $j<i$. We now "embed" an independent copy of a variant of $k$-party disjointness, the $k$-XOR problem, on each of the $1/\varepsilon^2$ coordinates of $2$-GAP-ORT. In this variant, there are $k$ parties, each holding a bitstring of length $k^p$. On all but one "special" randomly chosen coordinate, there is a single site assigned to the coordinate, and that site uses private randomness to choose whether the value on the coordinate is $0$ or $1$ (with equal probability), while the remaining $k-1$ sites have $0$ on this coordinate. On the special coordinate, with probability $1/4$ all sites have a $0$ on this coordinate (a "00" instance), with probability $1/4$ the first $k/2$ parties have a $1$ on this coordinate and the remaining $k/2$ parties have a $0$ (a "10" instance), with probability $1/4$ the second $k/2$ parties have a $1$ on this coordinate and the remaining $k/2$ parties have a $0$ (a "01" instance), and with the remaining probability $1/4$ all $k$ parties have a $1$ on this coordinate (a "11" instance). We show, via a direct sum for distributional communication complexity, that any deterministic protocol that decides which case the special coordinate is in with probability $1/4+\Omega(1)$ has conditional information cost $\Omega(k^{p-1})$. This implies that any protocol that can decide whether the output is in the set $\{10,01\}$ (the "XOR" of the output bits) with probability $1/2+\Omega(1)$ has conditional information cost $\Omega(k^{p-1})$. We do the direct sum argument by conditioning the mutual information on low-entropy random variables, which allows us to fill in inputs on the remaining coordinates without any communication between the parties and without asymptotically affecting our $\Omega(k^{p-1})$ lower bound.

We design a reduction so that on the $i$-th coordinate of $2$-GAP-ORT, the input of the first $k/2$ players of $k$-XOR is determined by the public coin (which we condition on) and the first party's input bit to $2$-GAP-ORT, and the input of the second $k/2$ players of $k$-XOR is determined by the public coin and the second party's input bit to $2$-GAP-ORT. We show that any protocol that solves the composition of $2$-GAP-ORT with $1/\varepsilon^2$ copies of $k$-XOR, a problem that we call $k$-BTX, must reveal $\Omega(1)$ bits of information about the two output bits of an $\Omega(1)$ fraction of the $1/\varepsilon^2$ copies, and from our $\Omega(k^{p-1})$ information cost lower bound for a single copy, we obtain an overall $\Omega(k^{p-1}/\varepsilon^2)$ bound. Finally, one can show that a $(1+\varepsilon)$-approximation algorithm for $F_p$ can be used to solve $k$-BTX.

Upper Bound for $F_p$: We illustrate the algorithm for $p=2$ and constant $\varepsilon$. Unlike [20], we do not use AMS sketches [3]. A nice property of our protocol is that it is the first $1$-way protocol (the protocol of [20] is not), in the sense that only the sites send messages to the coordinator (the coordinator does not send any messages). Moreover, all messages are simple: if a site receives an update to the $j$-th coordinate, provided the frequency of coordinate $j$ in its stream exceeds a threshold, it decides with a certain probability to send $j$ to the coordinator. Unfortunately, one can show that this probability cannot be the same for all coordinates $j$, as otherwise the communication would be too large.

To determine the threshold and probability for sending an update to a coordinate $j$, the sites use the public coin to randomly group all coordinates $j$ into buckets $S_\ell$, where $S_\ell$ contains a $1/2^\ell$ fraction of the input coordinates. For $j\in S_\ell$, the threshold and probability are only a function of $\ell$. Inspired by work on sub-sampling [37], we try to estimate the number of coordinates $j$ of magnitude in the range $[2^h,2^{h+1})$, for each $h$. Call this class of coordinates $C_h$. If the contribution to $F_2$ from $C_h$ is significant, then $|C_h|\approx 2^{-2h}\cdot F_2$, and to estimate $|C_h|$ we only consider those $j\in C_h$ that are in $S_\ell$ for a value $\ell$ which satisfies $|C_h|\cdot 2^{-\ell}\approx 2^{-2h}\cdot F_2\cdot 2^{-\ell}\approx 1$. We do not know $F_2$ and so we also do not know $\ell$, but we can make a logarithmic number of guesses. We note that the work [37] was available to the authors of [20] for several years, but adapting it to the distributed framework here is tricky, in the sense that the "heavy hitters" algorithm used in [37] for finding elements in the different $C_h$ needs to be implemented in a $k$-party communication-efficient way.

When choosing the threshold and probability we have two competing constraints: on the one hand, these values must be chosen so that we can accurately estimate the values $|C_h|$ from the samples; on the other hand, they must be chosen so that the communication is not excessive. Balancing these two constraints forces us to use a threshold instead of just the same probability for all coordinates in $S_\ell$. By choosing the thresholds and probabilities to be appropriate functions of $\ell$, we can satisfy both constraints. Other minor issues in the analysis arise from the fact that different classes contribute at different times, and that the coordinator must be correct at all times. These issues can be resolved by conditioning on a quantity related to the protocol's correctness being accurate at a small number of selected times in the stream, and then arguing that the quantity is non-decreasing and that this implies it is correct at all times.
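The site-side logic just described can be sketched as follows. This is a simplified toy, not the protocol of Section 5: the bucket assignment uses a public seed as the shared coin, and the per-level threshold and sending probability below are illustrative placeholders rather than the exact functions of $\ell$ chosen in the analysis:

```python
import random

def bucket_of(j, num_levels, seed=0):
    """Public-coin bucketing: coordinate j lands in S_ell with probability
    ~2^(-ell), so S_ell holds about a 1/2^ell fraction of the coordinates."""
    rnd = random.Random(f"{seed}:{j}")  # same across all sites: a public coin
    for ell in range(1, num_levels):
        if rnd.random() < 0.5:
            return ell
    return num_levels

# Illustrative placeholders; the real protocol picks these as functions of ell
# to balance estimation accuracy against total communication.
def threshold(ell):
    return 2 ** (ell // 2)

def send_prob(ell):
    return 2.0 ** -(ell // 2)

def site_process(stream, num_levels=20, seed=0, site_id=0):
    """Each site tracks its local counts; once coordinate j's local count
    passes its bucket's threshold, each further update to j is forwarded to
    the coordinator independently with the bucket's sending probability."""
    local, sent = {}, []
    rnd = random.Random(site_id + 1)    # private randomness of this site
    for j in stream:
        local[j] = local.get(j, 0) + 1
        ell = bucket_of(j, num_levels, seed)
        if local[j] >= threshold(ell) and rnd.random() < send_prob(ell):
            sent.append(j)              # one-way message: just the coordinate id
    return sent

print(len(site_process([1, 1, 1, 2, 3, 3] * 50)))
```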

Implications for the Data Stream Model: In 2003, Indyk and Woodruff introduced the GHD problem [38], and a $1$-round lower bound followed shortly after [52]. Ever since, it has seemed that the space complexity of estimating $F_0$ in a data stream with $t>1$ passes hinged on whether GHD requires $\Omega(1/\varepsilon^2)$ communication for $t$ rounds; see, e.g., Question 10 in [2]. A flurry [9, 10, 16, 51, 49] of recent work finally resolved the complexity of GHD. What our lower bound shows for $F_0$ is that this is not the only way to prove the $\Omega(1/\varepsilon^2)$ multi-pass space bound for $F_0$. Indeed, we just needed to look at $\Theta(1/\varepsilon^2)$ parties instead of $2$ parties. Since we have an $\Omega(1/\varepsilon^4)$ communication lower bound for $F_0$ with $\Theta(1/\varepsilon^2)$ parties, this implies an $\Omega((1/\varepsilon^4)/(t/\varepsilon^2))=\Omega(1/(t\varepsilon^2))$ bound for $t$-pass algorithms for approximating $F_0$. Arguably, our proof is simpler than the recent GHD lower bounds.

Our $\Omega(k^{p-1}/\varepsilon^2)$ bound for $F_p$ also improves a long line of work on the space complexity of estimating $F_p$ for $p>2$ in a data stream. The current best upper bound is $\tilde{O}(N^{1-2/p}\varepsilon^{-2})$ bits of space [28]. See Figure 1 of [28] for a list of papers which make progress on the $\varepsilon$ and logarithmic factors. The previous best lower bound is $\tilde{\Omega}(N^{1-2/p}\varepsilon^{-2/p}/t)$ for $t$ passes [7]. By setting $k^p=\varepsilon^2 N$, we obtain that the total communication is at least $\Omega(\varepsilon^{2-2/p}N^{1-1/p}/\varepsilon^2)$, and so the implied space lower bound for $t$-pass algorithms for $F_p$ in a data stream is $\Omega(\varepsilon^{-2/p}N^{1-1/p}/(tk))=\Omega(N^{1-2/p}/(\varepsilon^{4/p}t))$. This gives the first bound that agrees with the tight $\tilde{\Theta}(1/\varepsilon^2)$ bound when $p=2$ for any constant $t$. After our work, Ganguly [29] improved this for the special case $t=1$: for $1$-pass algorithms for estimating $F_p$, $p>2$, he shows a space lower bound of $\Omega(N^{1-2/p}/(\varepsilon^2\log n))$.
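For completeness, the algebra behind the implied space bound: with $k^p=\varepsilon^2 N$, i.e., $k=(\varepsilon^2 N)^{1/p}$, the communication lower bound becomes

$$\frac{k^{p-1}}{\varepsilon^2}=\frac{(\varepsilon^2N)^{(p-1)/p}}{\varepsilon^2}=\varepsilon^{-2/p}\,N^{1-1/p},$$

and since a $t$-pass, space-$S$ streaming algorithm yields a protocol with $O(tkS)$ communication (pass the memory contents around the $k$ sites in each pass), dividing by $tk$ gives

$$S=\Omega\left(\frac{\varepsilon^{-2/p}N^{1-1/p}}{t\,(\varepsilon^2N)^{1/p}}\right)=\Omega\left(\frac{N^{1-2/p}}{\varepsilon^{4/p}\,t}\right).$$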

Other Related Work: There are quite a few papers on multiparty number-in-hand communication complexity, though they are not directly relevant to the problems studied in this paper. Alon et al. [3] and Bar-Yossef et al. [7] studied lower bounds for multiparty set-disjointness, which has applications to $p$-th frequency moment estimation for $p>2$ in the streaming model. Their results were further improved in [14, 31, 39]. Chakrabarti et al. [12] studied random-partition communication lower bounds for multiparty set-disjointness and pointer jumping, which have a number of applications in the random-order data stream model. Other work includes Chakrabarti et al. [13] for median selection, and Magniez et al. [44] and Chakrabarti et al. [11] for streaming language recognition. Very few studies have been conducted in the message-passing model. Duris and Rolim [23] proved several lower bounds in the message-passing model, but only for some simple boolean functions. Three related but more restrictive private-message models were studied by Gal and Gopalan [27], Ergün and Jowhari [24], and Guha and Huang [32]. The first two only investigated deterministic protocols, and the third was tailored to the random-order data stream model.

Recently, Phillips et al. [47] introduced a technique called symmetrization for the number-in-hand communication model. The idea is to try to find a symmetric hard distribution for the $k$ players. One then reduces the $k$-player problem to a $2$-player problem by assigning Alice the input of a random player and Bob the inputs of the remaining $k-1$ players. The answer to the $k$-player problem gives the answer to the $2$-player problem. By symmetrization one can argue that if the communication lower bound for the resulting $2$-player problem is $L$, then the lower bound for the $k$-player problem is $\Omega(kL)$. While the symmetrization technique developed in [47] can be used to solve some problems for which other techniques are not known, such as bitwise AND/OR and graph connectivity, it has several limitations. First, symmetrization requires a symmetric hard distribution, and for many problems (e.g., $F_p\ (p>1)$ in this paper) such a distribution is not known or is unlikely to exist. Second, for many problems (e.g., $F_0$ in this paper), we need a direct-sum type of argument with certain combining functions (e.g., the majority (MAJ)), while in [47], only outputting all copies or combining with the function OR is considered. Third, the symmetrization technique in [47] does not give information cost bounds, and so it is difficult to use when composing problems as is done in this paper. In this paper, we further develop symmetrization to make it work with the combining function MAJ and with information cost.

Paper Outline: In Section 3 and Section 4 we prove our lower bounds for $F_0$ and $F_p$, $p>1$. The lower bounds apply to functional monitoring, but hold even in the static model. In Section 5 we show improved upper bounds for $F_p$, $p>1$, for functional monitoring. Finally, in Section 6 we prove lower bounds for all-quantile, heavy hitters, entropy, and $\ell_p$ for any $p\geq 1$ in the blackboard model.

2 Preliminaries

In this section we review some basics on communication complexity and information theory.

Information Theory

We refer the reader to [22] for a comprehensive introduction to information theory. Here we review a few concepts and notations.

Let $H(X)$ denote the Shannon entropy of the random variable $X$, and let $H_b(p)$ denote the binary entropy function for $p\in[0,1]$. Let $H(X\,|\,Y)$ denote the conditional entropy of $X$ given $Y$. Let $I(X;Y)$ denote the mutual information between two random variables $X,Y$, and let $I(X;Y\,|\,Z)$ denote the mutual information between $X,Y$ conditioned on $Z$. The following summarizes the basic properties of entropy and mutual information that we need.

Proposition 1

Let $X,Y,Z,W$ be random variables.

  1. If $X$ takes values in $\{1,2,\ldots,m\}$, then $H(X)\in[0,\log m]$.

  2. $H(X)\geq H(X\,|\,Y)$ and $I(X;Y)=H(X)-H(X\,|\,Y)\geq 0$.

  3. If $X$ and $Z$ are independent, then we have $I(X;Y\,|\,Z)\geq I(X;Y)$. Similarly, if $X,Z$ are independent given $W$, then $I(X;Y\,|\,Z,W)\geq I(X;Y\,|\,W)$.

  4. (Chain rule of mutual information) $I(X,Y;Z)=I(X;Z)+I(Y;Z\,|\,X)$. In general, for any random variables $X_1,X_2,\ldots,X_n,Y$, $I(X_1,\ldots,X_n;Y)=\sum_{i=1}^n I(X_i;Y\,|\,X_1,\ldots,X_{i-1})$. Thus, $I(X,Y;Z\,|\,W)\geq I(X;Z\,|\,W)$.

  5. (Data processing inequality) If $X$ and $Z$ are conditionally independent given $Y$, then $I(X;Y\,|\,Z,W)\leq I(X;Y\,|\,W)$.

  6. (Fano's inequality) Let $X$ be a random variable chosen from domain $\mathcal{X}$ according to distribution $\mu_X$, and let $Y$ be a random variable chosen from domain $\mathcal{Y}$ according to distribution $\mu_Y$. For any reconstruction function $g:\mathcal{Y}\to\mathcal{X}$ with error $\delta_g$, $H_b(\delta_g)+\delta_g\log(|\mathcal{X}|-1)\geq H(X\,|\,Y)$.

  7. (The Maximum Likelihood Estimation principle) With the notation as in Fano's inequality, if the (deterministic) reconstruction function is $g(y)=x$ for the $x$ that maximizes the conditional probability $\mu_X(x\,|\,Y=y)$, then $\delta_g\leq 1-\frac{1}{2^{H(X\,|\,Y)}}$. Call this $g$ the maximum likelihood function.
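The chain rule in item 4 is easy to verify numerically; the following standalone sketch (not used in any proof) checks $I(X,Y;Z)=I(X;Z)+I(Y;Z\,|\,X)$ on a random joint distribution of three binary variables:

```python
from itertools import product
from math import log2
import random

rnd = random.Random(0)
support = list(product([0, 1], repeat=3))       # outcomes of (X, Y, Z)
w = [rnd.random() for _ in support]
total = sum(w)
p = {xyz: wi / total for xyz, wi in zip(support, w)}

def marginal(p, idx):
    """Marginal distribution of the coordinates listed in idx."""
    m = {}
    for xyz, pr in p.items():
        key = tuple(xyz[i] for i in idx)
        m[key] = m.get(key, 0.0) + pr
    return m

def mi(p, a, b):
    """I(A;B), where a and b are tuples of coordinate indices."""
    pa, pb, pab = marginal(p, a), marginal(p, b), marginal(p, a + b)
    return sum(pr * log2(pr / (pa[key[:len(a)]] * pb[key[len(a):]]))
               for key, pr in pab.items() if pr > 0)

def cond_mi(p, a, b, c):
    """I(A;B|C) = I(A,C;B) - I(C;B), a rearrangement of the chain rule."""
    return mi(p, a + c, b) - mi(p, c, b)

lhs = mi(p, (0, 1), (2,))                               # I(X,Y;Z)
rhs = mi(p, (0,), (2,)) + cond_mi(p, (1,), (2,), (0,))  # I(X;Z)+I(Y;Z|X)
print(lhs, rhs)   # equal up to floating-point error
```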

Communication complexity

In the two-party randomized communication complexity model (see, e.g., [43]), we have two players, Alice and Bob. Alice is given $x\in\mathcal{X}$ and Bob is given $y\in\mathcal{Y}$, and they want to jointly compute a function $f(x,y)$ by exchanging messages according to a protocol $\Pi$. Let $\Pi(x,y)$ denote the message transcript when Alice and Bob run protocol $\Pi$ on the input pair $(x,y)$. We sometimes abuse notation by identifying the protocol with the corresponding random transcript, as long as there is no confusion.

The communication complexity of a protocol is defined as the maximum number of bits exchanged over all pairs of inputs. We say a protocol $\Pi$ computes $f$ with error probability $\delta\ (0\leq\delta\leq 1)$ if there exists a function $g$ such that for all input pairs $(x,y)$, $\mathsf{Pr}[g(\Pi(x,y))\neq f(x,y)]\leq\delta$. The $\delta$-error randomized communication complexity of $f$, denoted by $R^\delta(f)$, is the cost of the minimum-communication randomized protocol that computes $f$ with error probability $\delta$. The $(\mu,\delta)$-distributional communication complexity of $f$, denoted by $D_\mu^\delta(f)$, is the cost of the minimum-communication deterministic protocol that gives the correct answer for $f$ on at least a $1-\delta$ fraction of all input pairs, weighted by the distribution $\mu$. Yao [53] showed the following.

Lemma 1 (Yao's Lemma)

$R^\delta(f)\geq\max_\mu D_\mu^\delta(f)$.

Thus, one way to prove a lower bound for randomized protocols is to find a hard distribution $\mu$ and lower bound $D_\mu^\delta(f)$. This is called Yao's Minimax Principle.

We will use the notion of expected distributional communication complexity $\mathsf{ED}_\mu^\delta(f)$, which was introduced in [47] (where it was written as $\mathsf{E}[D_\mu^\delta(f)]$, with a slight abuse of notation) and is defined to be the expected cost (rather than the worst-case cost) of the deterministic protocol that gives the correct answer for $f$ on at least a $1-\delta$ fraction of all inputs, where the expectation is taken over the distribution $\mu$.

The definitions for two-party protocols can be easily extended to the multiparty setting, where we have $k$ players and the $i$-th player is given an input $x_i\in\mathcal{X}_i$. Again, the $k$ players want to jointly compute a function $f(x_1,x_2,\ldots,x_k)$ by exchanging messages according to a protocol $\Pi$.

Information complexity

Information complexity was introduced in a series of papers including [17, 7]. We refer the reader to Bar-Yossef's Thesis [6]; see Chapter 6 for a detailed introduction. Here we briefly review the concepts of information cost and conditional information cost for $k$-player communication problems. All of them are defined in the blackboard number-in-hand model.

Let $\mu$ be an input distribution on $\mathcal{X}_1\times\mathcal{X}_2\times\ldots\times\mathcal{X}_k$ and let $X$ be a random input chosen from $\mu$. Let $\Pi$ be a randomized protocol running on inputs in $\mathcal{X}_1\times\mathcal{X}_2\times\ldots\times\mathcal{X}_k$. The information cost of $\Pi$ with respect to $\mu$ is $I(X;\Pi)$. The information complexity of a problem $f$ with respect to a distribution $\mu$ and error parameter $\delta\ (0\leq\delta\leq 1)$, denoted $\mathrm{IC}_\mu^\delta(f)$, is the minimum information cost of a $\delta$-error protocol for $f$ with respect to $\mu$. We work in the public coin model, in which all parties also share a common source of randomness.

We say a distribution $\lambda$ partitions $\mu$ if, conditioned on $\lambda$, $\mu$ is a product distribution. Let $X$ be a random input chosen from $\mu$ and let $D$ be a random variable chosen from $\lambda$. For a randomized protocol $\Pi$ on $\mathcal{X}_1\times\mathcal{X}_2\times\ldots\times\mathcal{X}_k$, the conditional information cost of $\Pi$ with respect to the distribution $\mu$ on $\mathcal{X}_1\times\mathcal{X}_2\times\ldots\times\mathcal{X}_k$ and a distribution $\lambda$ partitioning $\mu$ is defined as $I(X;\Pi\,|\,D)$. The conditional information complexity of a problem $f$ with respect to a distribution $\mu$, a distribution $\lambda$ partitioning $\mu$, and error parameter $\delta\ (0\leq\delta\leq 1)$, denoted $\mathrm{IC}_\mu^\delta(f\,|\,\lambda)$, is the minimum conditional information cost of a $\delta$-error protocol for $f$ with respect to $\mu$ and $\lambda$. The following proposition can be found in [7].

Proposition 2

For any distribution $\mu$, distribution $\lambda$ partitioning $\mu$, and error parameter $\delta\ (0\leq\delta\leq 1)$,

$$R^\delta(f)\geq\mathrm{IC}_\mu^\delta(f)\geq\mathrm{IC}_\mu^\delta(f\,|\,\lambda).$$
Statistical distance measures

Given two probability distributions $\mu$ and $\nu$ over the same space $\mathcal{X}$, the following statistical distance measures will be used in this paper:

  1. Total variation distance: $\mathsf{TV}(\mu,\nu)\stackrel{\mathrm{def}}{=}\max_{A\subseteq\mathcal{X}}|\mu(A)-\nu(A)|$.

  2. Hellinger distance: $h(\mu,\nu)\stackrel{\mathrm{def}}{=}\sqrt{\frac{1}{2}\sum_{x\in\mathcal{X}}\left(\sqrt{\mu(x)}-\sqrt{\nu(x)}\right)^2}$.

We have the following relation between the total variation distance and the Hellinger distance (cf. [6], Chapter 2).

Proposition 3

$$h^2(\mu,\nu)\leq\mathsf{TV}(\mu,\nu)\leq h(\mu,\nu)\sqrt{2-h^2(\mu,\nu)}.$$
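Both distances, and the inequalities of Proposition 3, are easy to check numerically on finite supports, where $\mathsf{TV}$ equals half the $\ell_1$ distance. A small standalone sketch:

```python
from math import sqrt
import random

def tv(mu, nu):
    """Total variation distance; on a finite space it is half the L1 distance."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

def hellinger(mu, nu):
    """Hellinger distance sqrt((1/2) * sum_x (sqrt(mu(x)) - sqrt(nu(x)))^2)."""
    return sqrt(0.5 * sum((sqrt(mu[x]) - sqrt(nu[x])) ** 2 for x in mu))

rnd = random.Random(1)
for _ in range(1000):
    a = [rnd.random() for _ in range(4)]
    b = [rnd.random() for _ in range(4)]
    mu = {i: v / sum(a) for i, v in enumerate(a)}
    nu = {i: v / sum(b) for i, v in enumerate(b)}
    h, t = hellinger(mu, nu), tv(mu, nu)
    # Proposition 3: h^2 <= TV <= h * sqrt(2 - h^2)
    assert h ** 2 - 1e-12 <= t <= h * sqrt(2 - h ** 2) + 1e-12
```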

The total variation distance between the transcript distributions on a pair of inputs is closely related to the error of a randomized protocol. The following proposition can be found in [6], Proposition 6.22 (the original proposition is for the $2$-party case, and generalizing it to the multiparty case is straightforward).

Proposition 4

Let $0<\delta<1/2$, and let $\Pi$ be a $\delta$-error randomized protocol for a function $f:\mathcal{X}_1\times\ldots\times\mathcal{X}_k\to\mathcal{Z}$. Then, for every two inputs $(x_1,\ldots,x_k),(x'_1,\ldots,x'_k)\in\mathcal{X}_1\times\ldots\times\mathcal{X}_k$ for which $f(x_1,\ldots,x_k)\neq f(x'_1,\ldots,x'_k)$, it holds that

$$\mathsf{TV}(\Pi_{x_1,\ldots,x_k},\Pi_{x'_1,\ldots,x'_k})>1-2\delta.$$
Conventions.

In the rest of the paper we call a player a site, to be consistent with the distributed functional monitoring model. We denote $[n]=\{1,\ldots,n\}$. Let $\oplus$ be the XOR function. All logarithms are base $2$ unless noted otherwise. We say $\tilde{W}$ is a $(1+\varepsilon)$-approximation of $W$, $0<\varepsilon<1$, if $W\leq\tilde{W}\leq(1+\varepsilon)W$.

3 A Lower Bound for $F_0$

We introduce a problem called $k$-APPROX-SUM, and then compose it with $2$-DISJ (studied, e.g., in [48]) to prove a lower bound for $F_0$. In this section we work in the message-passing model.

3.1 The $k$-APPROX-SUM Problem

In the $k$-APPROX-SUM$_{f,\tau}$ problem, we have $k$ sites $S_1,S_2,\ldots,S_k$ and the coordinator. Let $f:\mathcal{X}\times\mathcal{Y}\to\{0,1\}$ be an arbitrary function, and let $\tau$ be an arbitrary distribution on $\mathcal{X}\times\mathcal{Y}$ such that for $(X,Y)\sim\tau$, $f(X,Y)=1$ with probability $\beta$ and $0$ with probability $1-\beta$, where $\beta$ ($c_\beta/k\leq\beta\leq 1/c_\beta$ for a sufficiently large constant $c_\beta$) is a parameter. We define the input distribution $\mu$ for $k$-APPROX-SUM$_{f,\tau}$ on $\{X_1,\ldots,X_k,Y\}\in\mathcal{X}^k\times\mathcal{Y}$ as follows: we first sample $(X_1,Y)\sim\tau$, and then independently sample $X_2,\ldots,X_k\sim\tau\,|\,Y$. Note that each pair $(X_i,Y)$ is distributed according to $\tau$. Let $Z_i=f(X_i,Y)$; thus the $Z_i$'s are i.i.d. Bernoulli($\beta$). Let $Z=\{Z_1,Z_2,\ldots,Z_k\}$. We assign $X_i$ to site $S_i$ for each $i\in[k]$, and assign $Y$ to the coordinator.

In the $k$-APPROX-SUM$_{f,\tau}$ problem, the $k$ sites want to approximate $\sum_{i\in[k]}Z_i$ up to an additive factor of $\sqrt{\beta k}$. In the rest of this section, for convenience, we omit the subscripts $f,\tau$ in $k$-APPROX-SUM$_{f,\tau}$, since our results will hold for all $f,\tau$ having the properties mentioned above.
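As a concrete toy instantiation of this setup (illustrative only; the instantiation actually used for $F_0$ takes $f$ to be $2$-DISJ, as in Section 3.3), take $\mathcal{X}=\mathcal{Y}=[m]$ with $\tau$ uniform on $[m]\times[m]$ and $f(x,y)=1$ iff $x=y$, so that $\beta=1/m$ and $\tau\,|\,Y$ is uniform on $\mathcal{X}$:

```python
import random

def sample_mu(k, m, rnd):
    """One draw from mu: sample Y, then X_1,...,X_k i.i.d. from tau | Y.
    Here tau is uniform on [m] x [m], so tau | Y=y is uniform on [m]."""
    y = rnd.randrange(m)                         # coordinator's input Y
    xs = [rnd.randrange(m) for _ in range(k)]    # site inputs X_i ~ tau | Y
    return xs, y

def z_sum(xs, y):
    """sum_i Z_i with Z_i = f(X_i, Y); the sites must output this value up
    to an additive sqrt(beta * k)."""
    return sum(1 for x in xs if x == y)

rnd = random.Random(0)
k, m = 10_000, 100                               # beta = 1/m, so beta*k = 100
xs, y = sample_mu(k, m, rnd)
print(z_sum(xs, y), (k / m) ** 0.5)              # sum near 100; target error 10
```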

For a fixed transcript $\Pi=\pi$, let $q_i^\pi=\mathsf{Pr}[Z_i=1\,|\,\Pi=\pi]$. Thus $\sum_{i\in[k]}q_i^\pi=\mathsf{E}[\sum_{i\in[k]}Z_i\,|\,\Pi=\pi]$. Let $c_0$ be a sufficiently large constant.

Definition 1

Given an input $(x_1,\ldots,x_k,y)$ and a transcript $\Pi=\pi$, let $z_i=f(x_i,y)$ and $z=\{z_1,\ldots,z_k\}$. For convenience, we define $\Pi(z)\triangleq\Pi(x_1,\ldots,x_k,y)$. We say

  1. $\pi$ is bad$_1$ for $z$ (denoted by $z\perp_1\pi$) if $\Pi(z)=\pi$ and, for at least a $0.1$ fraction of $\{i\in[k]\,|\,z_i=1\}$, it holds that $q_i^\pi\leq\beta/c_0$; and

  2. $\pi$ is bad$_0$ for $z$ (denoted by $z\perp_0\pi$) if $\Pi(z)=\pi$ and, for at least a $0.1$ fraction of $\{i\in[k]\,|\,z_i=0\}$, it holds that $q_i^\pi\geq\beta/c_0$.

Otherwise, $\pi$ is good for $z$.

In this section we will prove the following theorem. Unless stated explicitly, all probabilities, expectations, and variances are taken with respect to the input distribution $\mu$.

Theorem 1

Let $\Pi$ be the transcript of any deterministic protocol for $k$-APPROX-SUM on input distribution $\mu$ with error probability $\delta$ for some sufficiently small constant $\delta$. Then $\mathsf{Pr}[\Pi\text{ is good}]\geq 0.96$.

The following observation, which easily follows from the rectangle property of communication protocols, is crucial to our proof. We have included a proof in Appendix A.

Observation 1

Conditioned on $\Pi$, the random variables $Z_1,Z_2,\ldots,Z_k$ are independent.

Definition 2

We say a transcript $\pi$ is rare$^+$ if $\sum_{i\in[k]}q_i^\pi\geq 4\beta k$ and rare$^-$ if $\sum_{i\in[k]}q_i^\pi\leq\beta k/4$. In both cases we say $\pi$ is rare. Otherwise we say it is normal.

Definition 3

We say $Z=\{Z_1,Z_2,\ldots,Z_k\}$ is a joker$^+$ if $\sum_{i\in[k]}Z_i\geq 2\beta k$, and a joker$^-$ if $\sum_{i\in[k]}Z_i\leq\beta k/2$. In both cases we say $Z$ is a joker.

Lemma 2

Under the assumption of Theorem 1, $\mathsf{Pr}[\Pi\text{ is normal}]\geq 0.99$.

Proof: First, we can apply a Chernoff bound to the random variables $Z_1,\ldots,Z_k$ and get

$$\mathsf{Pr}[Z\text{ is a joker}^+]=\mathsf{Pr}\left[\sum_{i\in[k]}Z_i\geq 2\beta k\right]\leq e^{-\beta k/3}.$$

Second, by Observation 1, we can apply a Chernoff bound to the random variables $Z_1,\ldots,Z_k$ conditioned on $\Pi$ being rare$^+$:

$$\begin{aligned}
\mathsf{Pr}[Z\text{ is a joker}^+\,|\,\Pi\text{ is rare}^+]&\geq\sum_\pi\mathsf{Pr}[\Pi=\pi\,|\,\Pi\text{ is rare}^+]\cdot\mathsf{Pr}[Z\text{ is a joker}^+\,|\,\Pi=\pi,\ \Pi\text{ is rare}^+]\\
&=\sum_\pi\mathsf{Pr}[\Pi=\pi\,|\,\Pi\text{ is rare}^+]\cdot\mathsf{Pr}\left[\sum_{i\in[k]}Z_i\geq 2\beta k\ \middle|\ \sum_{i\in[k]}q_i^\pi\geq 4\beta k,\ \Pi=\pi\right]\\
&\geq\sum_\pi\mathsf{Pr}[\Pi=\pi\,|\,\Pi\text{ is rare}^+]\cdot\left(1-e^{-\beta k/2}\right)\\
&=1-e^{-\beta k/2}.
\end{aligned}$$

Finally, by Bayes' theorem, we have

$$\mathsf{Pr}[\Pi\text{ is rare}^+]=\frac{\mathsf{Pr}[Z\text{ is a joker}^+]\cdot\mathsf{Pr}[\Pi\text{ is rare}^+\,|\,Z\text{ is a joker}^+]}{\mathsf{Pr}[Z\text{ is a joker}^+\,|\,\Pi\text{ is rare}^+]}\leq\frac{e^{-\beta k/3}}{1-e^{-\beta k/2}}\leq 2e^{-\beta k/3}.$$

Similarly, we can also show that $\mathsf{Pr}[\Pi\text{ is rare}^-]\leq 2e^{-\beta k/8}$. Therefore $\mathsf{Pr}[\Pi\text{ is rare}]\leq 4e^{-\beta k/8}\leq 0.01$ (recall that by our assumption $\beta k\geq c_\beta$ for a sufficiently large constant $c_\beta$). □

Definition 4

Let $c_\ell=40c_0$. We say a transcript $\pi$ is weak if $\sum_{i\in[k]}q_i^\pi(1-q_i^\pi)\geq\beta k/c_\ell$, and strong otherwise.

Lemma 3

Under the assumption of Theorem 1, $\mathsf{Pr}[\Pi\text{ is normal and strong}]\geq 0.98$.

Proof: We first show that for a normal and weak transcript $\pi$, there exists a constant $\delta_\ell=\delta_\ell(c_\ell)$ such that

$$\mathsf{Pr}\left[\sum_{i\in[k]}Z_i\leq\sum_{i\in[k]}q_i^\pi+2\sqrt{\beta k}\ \middle|\ \Pi=\pi\right]\geq\delta_\ell,\quad(1)$$

$$\text{and}\quad\mathsf{Pr}\left[\sum_{i\in[k]}Z_i\geq\sum_{i\in[k]}q_i^\pi+4\sqrt{\beta k}\ \middle|\ \Pi=\pi\right]\geq\delta_\ell.\quad(2)$$

The first inequality is a simple application of the Chernoff-Hoeffding bound. Recall that for a normal $\pi$, $\sum_{i\in[k]}q_i^\pi\leq 4\beta k$. We have

$$\begin{aligned}
&\mathsf{Pr}\left[\sum_{i\in[k]}Z_i\leq\sum_{i\in[k]}q_i^\pi+2\sqrt{\beta k}\ \middle|\ \Pi=\pi,\ \Pi\text{ is normal}\right]\\
&\geq 1-\mathsf{Pr}\left[\sum_{i\in[k]}Z_i\geq\sum_{i\in[k]}q_i^\pi+2\sqrt{\beta k}\ \middle|\ \Pi=\pi,\ \Pi\text{ is normal}\right]\\
&\geq 1-e^{-\frac{8(\sqrt{\beta k})^2}{\sum_{i\in[k]}q_i^\pi}}\geq 1-e^{-2}\geq\delta_\ell\quad(\text{for a sufficiently small constant }\delta_\ell).
\end{aligned}$$

We now prove the second inequality. We will need the following anti-concentration result, which is an easy consequence of Feller [26] (cf. [46]).

Fact 1 ([46]) Let $Y$ be a sum of independent random variables, each attaining values in $[0,1]$, and let $\sigma=\sqrt{\mathsf{Var}[Y]}\geq 200$. Then for all $t\in[0,\sigma^2/100]$, we have

$$\mathsf{Pr}[Y\geq\mathsf{E}[Y]+t]\geq c\cdot e^{-t^2/(3\sigma^2)}$$

for a universal constant $c>0$.

For a normal and weak $\Pi=\pi$, it holds that

$$\mathsf{Var}\left[\sum_{i\in[k]}Z_i\ \middle|\ \Pi=\pi\right]=\sum_{i\in[k]}\mathsf{Var}[Z_i\,|\,\Pi=\pi]\quad(\text{by Observation 1})\quad=\sum_{i\in[k]}q_i^\pi(1-q_i^\pi)\geq\beta k/c_\ell\quad(\text{by the definition of a weak }\pi).$$

Recall that by our assumption $\beta k\geq c_\beta$ for a sufficiently large constant $c_\beta$; thus $\sqrt{\beta k}\leq\beta k/(100c_\ell)$ and $\beta k/c_\ell\geq 200^2$. Using Fact 1, we have, for a universal constant $c$,

$$\mathsf{Pr}\left[\sum_{i\in[k]}Z_i\geq\sum_{i\in[k]}q_i^\pi+4\sqrt{\beta k}\ \middle|\ \Pi=\pi,\ \Pi\text{ is weak}\right]\geq c\cdot e^{-\frac{(4\sqrt{\beta k})^2}{3\beta k/c_\ell}}\geq c\cdot e^{-16c_\ell/3}\geq\delta_\ell\quad(\text{for a sufficiently small constant }\delta_\ell).$$

By (1) and (2), it is easy to see that, given that $\Pi$ is normal, it cannot be weak with probability more than $0.01$ (roughly speaking, the output determined by $\pi$ cannot be within $\sqrt{\beta k}$ of sums satisfying (1) and, at the same time, of sums satisfying (2)): otherwise, by Lemma 2 and the analysis above, the error probability of the protocol would be at least $0.99\cdot 0.01\cdot\delta_\ell>\delta$ for an arbitrarily small constant error $\delta$, violating the success guarantee of the protocol. Therefore,

$$\mathsf{Pr}[\Pi\text{ is normal and strong}]\geq\mathsf{Pr}[\Pi\text{ is normal}]\cdot\mathsf{Pr}[\Pi\text{ is strong}\,|\,\Pi\text{ is normal}]\geq 0.99\cdot 0.99\geq 0.98.\ \square$$

Now we analyze the probability of $\Pi$ being good. For a $Z=z$, let $H_0(z)=\{i\,|\,z_i=0\}$ and $H_1(z)=\{i\,|\,z_i=1\}$. We have the following two lemmas.

Lemma 4

Under the assumption of Theorem 1, $\mathsf{Pr}[\Pi\text{ is bad}_0\,|\,\Pi\text{ is normal and strong}]\leq 0.01$.

Proof: Consider any $Z=z$. First, by the definition of a normal $\pi$, we have $\sum_{i:z_i=0}q_i^\pi\leq\sum_{i\in[k]}q_i^\pi\leq 4\beta k$. Therefore the number of $i$'s such that $z_i=0$ and $q_i^\pi>(1-\beta/c_0)$ is at most $4\beta k/(1-\beta/c_0)\leq 8\beta k$. Second, by the definition of a strong $\pi$, we have $\sum_{i:z_i=0}q_i^\pi(1-q_i^\pi)\leq\sum_{i\in[k]}q_i^\pi(1-q_i^\pi)\leq\beta k/c_\ell$. Therefore the number of $i$'s such that $z_i=0$ and $\beta/c_0\leq q_i^\pi\leq(1-\beta/c_0)$ is at most $\frac{\beta k/c_\ell}{\beta/c_0\cdot(1-\beta/c_0)}\leq 0.05k$ (recall $c_\ell=40c_0$). Also note that if $z$ is not a joker, then $|H_0(z)|\geq k-2\beta k$. Thus, conditioned on a normal and strong $\pi$, and on $z$ not being a joker, the number of $i$'s such that $z_i=0$ and $q_i^\pi<\beta/c_0$ is at least

$$(k-2\beta k)-8\beta k-0.05k>0.9k\geq 0.9|H_0(z)|,$$

where we have used our assumption that $\beta\leq 1/c_\beta$ for a sufficiently large constant $c_\beta$. We conclude that

$$\mathsf{Pr}[\Pi\text{ is bad}_0\,|\,\Pi\text{ is normal and strong}]\leq\mathsf{Pr}[Z\text{ is a joker}]\leq 2e^{-\beta k/8}\leq 0.01.\ \square$$
Lemma 5

Under the assumption of Theorem 1, $\mathsf{Pr}[\Pi\text{ is bad}_1\,|\,\Pi\text{ is normal}]\leq 0.01$.

Proof: Say that $\pi$ is bad$_1$ for a set $T\subseteq[k]$ (denoted by $T\perp_1\pi$) if for more than a $0.1$ fraction of $i\in T$ we have $q_i^\pi\leq\beta/c_0$. Let $\chi(\mathcal{E})=1$ if $\mathcal{E}$ holds and $\chi(\mathcal{E})=0$ otherwise. We have

$$\begin{aligned}
\mathsf{Pr}[\Pi\text{ is bad}_1\,|\,\Pi\text{ is normal}]&=\sum_\pi\mathsf{Pr}[\Pi=\pi\,|\,\Pi\text{ is normal}]\sum_z\mathsf{Pr}[Z=z\,|\,\Pi=\pi,\Pi\text{ is normal}]\,\chi(z\perp_1\pi)\\
&\leq\mathsf{Pr}[Z\text{ is a joker}]+\sum_\pi\mathsf{Pr}[\Pi=\pi\,|\,\Pi\text{ is normal}]\\
&\quad\quad\sum_{\ell\in[\beta k/2,2\beta k]}\sum_{T\subseteq[k]:|T|=\ell}\sum_z\mathsf{Pr}[Z=z\,|\,\Pi=\pi,\Pi\text{ is normal}]\,\chi(H_1(z)=T)\,\chi(T\perp_1\pi)\quad(4)\\
&\leq\mathsf{Pr}[Z\text{ is a joker}]+\sum_\pi\mathsf{Pr}[\Pi=\pi\,|\,\Pi\text{ is normal}]\sum_{\ell\in[\beta k/2,2\beta k]}\sum_{\substack{T\subseteq[k]:|T|=\ell\\ T\perp_1\pi}}\prod_{i\in T}q_i^\pi.
\end{aligned}$$

The last inequality holds since, for each possible set $T$ of size $\ell$ with $T\perp_1\pi$, we count the probability that all of its elements are $1$, which upper bounds the corresponding summation in (4). Now for a fixed $\ell$, conditioned on a normal $\pi$, we consider the term

$$\sum_{\substack{T\subseteq[k]:|T|=\ell\\ T\perp_1\pi}}\prod_{i\in T}q_i^\pi.\quad(5)$$

W.l.o.g., we can assume that $q_1^\pi\geq\ldots\geq q_s^\pi>\beta/c_0\geq q_{s+1}^\pi\geq\ldots\geq q_k^\pi$ for an $s=\kappa_s k$ ($0<\kappa_s\leq 1$). Consider a pair $(q_u^\pi,q_v^\pi)$ ($u,v\in[k]$). The terms in the summation (5) that include either $q_u^\pi$ or $q_v^\pi$ can be written as

$$q_u^\pi\sum_{\substack{T\subseteq[k]:|T|=\ell\\ T\perp_1\pi\\ u\in T,v\notin T}}\prod_{i\in T\setminus u}q_i^\pi+q_v^\pi\sum_{\substack{T\subseteq[k]:|T|=\ell\\ T\perp_1\pi\\ v\in T,u\notin T}}\prod_{i\in T\setminus v}q_i^\pi+q_u^\pi q_v^\pi\sum_{\substack{T\subseteq[k]:|T|=\ell\\ T\perp_1\pi\\ u\in T,v\in T}}\prod_{i\in T\setminus\{u,v\}}q_i^\pi.$$

By the symmetry of $q_u^\pi,q_v^\pi$, the sets $\{T\setminus u\,|\,T\subseteq[k],|T|=\ell,T\perp_1\pi,u\in T,v\notin T\}$ and $\{T\setminus v\,|\,T\subseteq[k],|T|=\ell,T\perp_1\pi,v\in T,u\notin T\}$ are the same. Using this fact and the AM-GM inequality, it is easy to see that the sum does not decrease if we set $(q_u^\pi)'=(q_v^\pi)'=(q_u^\pi+q_v^\pi)/2$. Call such an operation an equalization. We repeatedly apply equalizations to pairs $(q_u^\pi,q_v^\pi)$, with the constraint that if $u\in[1,s]$ and $v\in[s+1,k]$, then we only "average" them to the extent that $(q_u^\pi)'=\beta/c_0$, $(q_v^\pi)'=q_u^\pi+q_v^\pi-\beta/c_0$ if $q_u^\pi+q_v^\pi\leq 2\beta/c_0$, and $(q_v^\pi)'=\beta/c_0$, $(q_u^\pi)'=q_u^\pi+q_v^\pi-\beta/c_0$ otherwise. We introduce this constraint because we do not want to change $|\{i\,|\,(q_i^\pi)'\leq\beta/c_0\}|$, since otherwise a set $T$ which was originally $\perp_1\pi$ could become $\not\perp_1\pi$ after these equalizations. We can no longer apply equalizations when one of the following happens:

$$(q_1^\pi)'=\ldots=(q_s^\pi)'>\beta/c_0=(q_{s+1}^\pi)'=\ldots=(q_k^\pi)',\quad(6)$$

$$(q_1^\pi)'=\ldots=(q_s^\pi)'=\beta/c_0\geq(q_{s+1}^\pi)'=\ldots=(q_k^\pi)'.\quad(7)$$

We note that (7) cannot actually happen, since $\sum_{i\in[k]}(q_i^\pi)'=\sum_{i\in[k]}q_i^\pi$ is preserved during equalizations, and conditioned on a normal $\pi$ we have $\sum_{i\in[k]}q_i^\pi\geq\beta k/4>\beta k/c_0$.

Let $q=(q_1^\pi)'=\ldots=(q_s^\pi)'$. For a normal $\pi$, it holds that $\sum_{i\in[k]}(q_i^\pi)'=s\cdot q+(k-s)\cdot\beta/c_0=r\in[\beta k/4,4\beta k]$. Let $\alpha\in(0.1,1]$. Recall that $\ell\in[\beta k/2,2\beta k]$, and we have set $s=\kappa_s k$. We upper bound (5) using (6):

$$\begin{aligned}
\sum_{\substack{T\subseteq[k]:|T|=\ell\\ T\perp_1\pi}}\prod_{i\in T}q_i^\pi&\leq\left(\binom{k-s}{\alpha\ell}\cdot\binom{s}{(1-\alpha)\ell}\right)\cdot\left(\left(\frac{\beta}{c_0}\right)^{\alpha\ell}\cdot\left(\frac{r}{s}-\frac{(k-s)\beta}{c_0 s}\right)^{(1-\alpha)\ell}\right)\quad(8)\\
&\leq\left(\left(\frac{e(1-\kappa_s)k}{\alpha\ell}\right)^{\alpha\ell}\cdot\left(\frac{e\kappa_s k}{(1-\alpha)\ell}\right)^{(1-\alpha)\ell}\right)\cdot\left(\left(\frac{\beta}{c_0}\right)^{\alpha\ell}\cdot\left(\frac{r}{\kappa_s k}\right)^{(1-\alpha)\ell}\right)\\
&\leq\left(\frac{e}{\alpha c_0}\cdot\frac{\beta k}{\ell}\right)^{\alpha\ell}\cdot\left(\frac{er}{(1-\alpha)\ell}\right)^{(1-\alpha)\ell}\\
&\leq\left(\frac{8e}{(c_0)^\alpha\cdot\alpha^\alpha(1-\alpha)^{1-\alpha}}\right)^\ell\\
&\leq\left(\frac{8e}{(c_0)^{0.1}\cdot(1/e)^{2/e}}\right)^{\beta k/2}.\quad(9)
\end{aligned}$$

In (8), the first factor is the number of possible choices of the set $T$ ($|T|=\ell$) with an $\alpha$ fraction of its items in $[s+1,k]$ and the rest in $[1,s]$, and the second factor upper bounds $\prod_{i\in T}q_i^\pi$ according to the discussion above. Here we have assumed $\alpha<1$; if $\alpha=1$, then (8) is at most $\binom{k}{\ell}\cdot(\beta/c_0)^\ell\leq(2e/c_0)^{\beta k/2}$, which is smaller than (9). Now, (4) can be upper bounded by

$$2e^{-\beta k/8}+\sum_\pi\mathsf{Pr}[\Pi=\pi\,|\,\Pi\text{ is normal}]\cdot 2\beta k\cdot\left(\frac{8e}{(c_0)^{0.1}\cdot(1/e)^{2/e}}\right)^{\beta k/2}=2e^{-\beta k/8}+2\beta k\cdot\left(\frac{8e}{(c_0)^{0.1}\cdot(1/e)^{2/e}}\right)^{\beta k/2}\leq 0.01$$

for a sufficiently large constant $c_0$. □

Finally, combining Lemma 3, Lemma 4 and Lemma 5, we get

$\mathsf{Pr}[\Pi\text{ is good}] \geq \mathsf{Pr}[\Pi\text{ is good, normal and strong}]$

$= \mathsf{Pr}[\Pi\text{ is normal and strong}]\cdot(1-\mathsf{Pr}[\Pi\text{ is bad}_0\ |\ \Pi\text{ is normal and strong}]-\mathsf{Pr}[\Pi\text{ is bad}_1\ |\ \Pi\text{ is normal and strong}])$

$\geq \mathsf{Pr}[\Pi\text{ is normal and strong}]\cdot(1-\mathsf{Pr}[\Pi\text{ is bad}_0\ |\ \Pi\text{ is normal and strong}]) - \mathsf{Pr}[\Pi\text{ is normal}]\cdot\mathsf{Pr}[\Pi\text{ is bad}_1\ |\ \Pi\text{ is normal}]$

$\geq 0.98\cdot(1-0.01)-0.01\geq 0.96.$

3.2 The 2-DISJ Problem

In the 2-DISJ problem, Alice has a set $x\subseteq[n]$ and Bob has a set $y\subseteq[n]$. Their goal is to output $1$ if $x\cap y\neq\emptyset$, and $0$ otherwise.

We define the input distribution $\tau_\beta$ as follows. Let $\ell=(n+1)/4$. With probability $\beta$, $x$ and $y$ are random subsets of $[n]$ such that $|x|=|y|=\ell$ and $|x\cap y|=1$; with probability $1-\beta$, $x$ and $y$ are random subsets of $[n]$ such that $|x|=|y|=\ell$ and $x\cap y=\emptyset$. Razborov [48] proved that for $\beta=1/4$, $D^{1/400}_{\tau_{1/4}}(\text{2-DISJ})=\Omega(n)$. It is easy to extend this result to general $\beta$ and to the average-case complexity.
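For intuition, the distribution $\tau_\beta$ is straightforward to sample; here is a minimal Python sketch (the function name sample_tau and the shuffled-prefix construction are ours, but prefixes of a uniform shuffle do give uniformly random sets with the prescribed intersection pattern):

import random

def sample_tau(n, beta):
    """Sample (x, y) from tau_beta: |x| = |y| = (n+1)/4; with probability
    beta the sets share exactly one element, otherwise they are disjoint."""
    ell = (n + 1) // 4
    universe = list(range(n))
    random.shuffle(universe)  # prefixes of a uniform shuffle are uniform sets
    if random.random() < beta:
        common = universe[0]
        x = {common, *universe[1:ell]}
        y = {common, *universe[ell:2 * ell - 1]}
    else:
        x = set(universe[:ell])
        y = set(universe[ell:2 * ell])
    return x, y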

Theorem 2 ([47], Lemma 2.2)

For any $\beta\leq 1/4$, it holds that $\mathsf{ED}^{\beta/100}_{\tau_\beta}(\text{2-DISJ})=\Omega(n)$, where the expectation is taken over the input distribution $\tau_\beta$.

In the rest of the section, we simply write $\tau_\beta$ as $\tau$.

3.3 The Complexity of $F_0$

3.3.1 Connecting $F_0$ and $k$-APPROX-SUM$_{\text{2-DISJ},\tau}$

Set $\beta=1/(k\varepsilon^2)$ and $B=20000/\delta$, where $\delta$ is the small constant error parameter for $k$-APPROX-SUM in Theorem 1.

We choose $f$ to be 2-DISJ with universe size $n=B/\varepsilon^2$, set its input distribution to be $\tau$, and work with $k$-APPROX-SUM$_{\text{2-DISJ},\tau}$. Let $\mu$ be the input distribution of $k$-APPROX-SUM$_{\text{2-DISJ},\tau}$, which is a function of $\tau$ (see Section 3.1 for the detailed construction of $\mu$ from $\tau$). Let $\{X_1,\ldots,X_k,Y\}\sim\mu$, and let $Z_i=\text{2-DISJ}(X_i,Y)$. Let $\zeta$ be the distribution induced by $\mu$ on $\{X_1,\ldots,X_k\}$, which we choose as the input distribution for $F_0$. In the rest of this section, for convenience, we omit the subscripts 2-DISJ and $\tau$ in $k$-APPROX-SUM$_{\text{2-DISJ},\tau}$ when there is no confusion.

Let $N=\sum_{i\in[k]}Z_i=\sum_{i\in[k]}\text{2-DISJ}(X_i,Y)$, and let $R=F_0((\cup_{i\in[k]}X_i)\cap Y)$. The following lemma shows that $R$ concentrates around its expectation $\mathsf{E}[R]$, which can be calculated exactly.

Lemma 6

With probability at least $1-6500/B$, we have $|R-\mathsf{E}[R]|\leq 1/(10\varepsilon)$, where $\mathsf{E}[R]=(1-\lambda)N$ for some fixed constant $0\leq\lambda\leq 4/B$.

  • Proof:

    We can view our problem as a balls-and-bins game: think of each pair $(X_i,Y)$ with $\text{2-DISJ}(X_i,Y)=1$ as a ball (so we have $N$ balls), and of the elements of the set $Y$ as bins. Let $\ell=|Y|$. We throw each of the $N$ balls into one of the $\ell$ bins uniformly at random. Our goal is to estimate the number of non-empty bins at the end of the process.

    By a Chernoff bound, with probability $1-e^{-\beta k/3}\geq 1-100/B$, we have $N\leq 2\beta k=2/\varepsilon^2$. By Fact 1 and Lemma 1 in [41], we have $\mathsf{E}[R]=\ell\left(1-(1-1/\ell)^N\right)$ and $\mathsf{Var}[R]<4N^2/\ell$. Thus by Chebyshev's inequality we have

    $\mathsf{Pr}[|R-\mathsf{E}[R]|>1/(10\varepsilon)]\leq\frac{\mathsf{Var}[R]}{1/(100\varepsilon^2)}\leq\frac{6400}{B}.$

    Let $\theta=N/\ell\leq 8/B$. We can write

    $\mathsf{E}[R]=\ell\left(1-e^{-\theta}\right)+O(1)=\theta\ell\left(1-\frac{\theta}{2!}+\frac{\theta^2}{3!}-\frac{\theta^3}{4!}+\cdots\right)+O(1).$

    This series converges, and thus we can write $\mathsf{E}[R]=(1-\lambda)\theta\ell=(1-\lambda)N$ for some fixed constant $0\leq\lambda\leq\theta/2\leq 4/B$.
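As a quick sanity check on the balls-and-bins estimates in the proof, one can compare the empirical number of non-empty bins against $\ell(1-(1-1/\ell)^N)$. A small Monte Carlo sketch in Python, with illustrative (not paper-specified) parameters:

import random

def occupied_bins(N, ell, trials=2000):
    """Throw N balls into ell bins u.a.r. and return the average number of
    non-empty bins over independent trials."""
    total = 0
    for _ in range(trials):
        total += len({random.randrange(ell) for _ in range(N)})
    return total / trials

N, ell = 200, 10_000
print(occupied_bins(N, ell), ell * (1 - (1 - 1 / ell) ** N))  # close values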

The next lemma shows that we can use a protocol for $F_0$ to solve $k$-APPROX-SUM with good properties.

Lemma 7

Any protocol $\cal P$ that computes a $(1+\gamma\varepsilon)$-approximation to $F_0$ (for a sufficiently small constant $\gamma$) on input distribution $\zeta$ with error probability $\delta/2$ can be used to compute $k$-APPROX-SUM$_{\text{2-DISJ},\tau}$ on input distribution $\mu$ with error probability $\delta$.

  • Proof:

    Given an input $\{X_1,\ldots,X_k,Y\}\sim\mu$ for $k$-APPROX-SUM, the $k$ sites and the coordinator use $\cal P$ to compute $\tilde{W}$, a $(1+\gamma\varepsilon)$-approximation to $F_0(X_1,\ldots,X_k)$, and then output as the answer to $k$-APPROX-SUM

    $\frac{\tilde{W}-(n-\ell)}{1-\lambda}.$

    Recall that $0\leq\lambda\leq 4/B$ is a fixed constant, $n=B/\varepsilon^2$, and $\ell=(n+1)/4$.
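    In code, the coordinator's post-processing in this reduction is a single rescaling; a sketch, where W_tilde and lam stand for $\tilde{W}$ and $\lambda$:

    def approx_sum_from_F0(W_tilde, n, ell, lam):
        """Recover the estimate of N = sum_i Z_i from the F0 estimate, as in
        Lemma 7: remove the n - ell elements outside Y, then undo the
        (1 - lam) shrinkage caused by collisions inside Y."""
        return (W_tilde - (n - ell)) / (1 - lam)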

Correctness.

Given a random input $(X_1,\ldots,X_k,Y)\sim\mu$, the exact value of $W=F_0(X_1,\ldots,X_k)$ can be written as the sum of two components:

$W=Q+R,$ (10)

where $Q$ counts $F_0((\cup_{i\in[k]}X_i)\setminus Y)$ and $R$ counts $F_0((\cup_{i\in[k]}X_i)\cap Y)$. First, from our construction it is easy to see, by a Chernoff bound and a union bound, that with probability $1-1/\varepsilon^2\cdot e^{-\Omega(k)}\geq 1-100/B$ we have $Q=|[n]\setminus Y|=n-\ell$, since each element in $[n]\setminus Y$ is chosen by each $X_i\ (i=1,2,\ldots,k)$ with probability at least $1/4$. Second, by Lemma 6 we know that with probability $1-6500/B$, $R$ is within $1/(10\varepsilon)$ of its mean $(1-\lambda)N$ for some fixed constant $0\leq\lambda\leq 4/B$. Thus with probability $1-6600/B$, we can write Equation (10) as

$W=(n-\ell)+(1-\lambda)N+\kappa_1,$ (11)

for a value $|\kappa_1|\leq 1/(10\varepsilon)$, with $N\leq 2/\varepsilon^2$.

Set $\gamma=1/(20B)$. Since $\cal P$ computes a value $\tilde{W}$ which is a $(1+\gamma\varepsilon)$-approximation to $W$, we can substitute $\tilde{W}$ for $W$ in Equation (11), obtaining

$\tilde{W}=(n-\ell)+(1-\lambda)N+\kappa_1+\kappa_2,$ (12)

where $|\kappa_1|\leq 1/(10\varepsilon)$, $N\leq 2/\varepsilon^2$, and

$\kappa_2 \leq \gamma\varepsilon\cdot W = \gamma\varepsilon\cdot((n-\ell)+(1-\lambda)N+\kappa_1) \leq \gamma\varepsilon\cdot(B/\varepsilon^2+2/\varepsilon^2+1/(10\varepsilon)) \leq 1/(10\varepsilon).$

Now we have

$N = (\tilde{W}-(n-\ell)-\kappa_1-\kappa_2)/(1-\lambda) = (\tilde{W}-(n-\ell))/(1-\lambda)+\kappa_3,$

where $|\kappa_3|\leq(1/(10\varepsilon)+1/(10\varepsilon))/(1-4/B)\leq 1/(4\varepsilon)$. Therefore $(\tilde{W}-(n-\ell))/(1-\lambda)$ approximates $N=\sum_{i\in[k]}Z_i$ up to an additive error of $1/(4\varepsilon)<\sqrt{\beta k}=1/\varepsilon$, and thus computes $k$-APPROX-SUM correctly. The total error probability of this simulation is at most $\delta/2+6600/B$, where the first term accounts for the error probability of $\cal P$ and the second for the error introduced by the reduction. This is less than $\delta$ if we choose $B=20000/\delta$.

3.3.2 An Embedding Argument

Lemma 8

Suppose there exists a deterministic protocol $\cal P'$ which computes a $(1+\gamma\varepsilon)$-approximation to $F_0$ (for a sufficiently small constant $\gamma$) on input distribution $\zeta$ with error probability $\delta/2$ (for a sufficiently small constant $\delta$) and communication $o(C)$. Then there exists a deterministic protocol $\cal P$ that computes 2-DISJ on input distribution $\tau$ with error probability $\beta/100$ and expected communication complexity $o(\log(1/\beta)\cdot C/k)$, where the expectation is taken over the input distribution $\tau$.

  • Proof:

    In 2-DISJ, Alice holds $X$ and Bob holds $Y$, where $(X,Y)\sim\tau$. We show that Alice and Bob can use the deterministic protocol $\cal P'$ to construct a deterministic protocol $\cal P$ for $\text{2-DISJ}(X,Y)$ with the desired error probability and communication complexity.

    Alice and Bob first use $\cal P'$ to construct a protocol $\cal P''$. During the construction they use public and private randomness, which will be fixed at the end. $\cal P''$ consists of two phases.

    Input reduction phase. Alice and Bob construct an input for $F_0$ using $X$ and $Y$ as follows. They pick a random site $S_I\ (I\in[k])$ using public randomness. Alice assigns $S_I$ the input $X_I=X$, and Bob constructs inputs for the remaining $k-1$ sites using $Y$: for each $i\in[k]\setminus I$, Bob samples an $X_i$ according to $\tau\ |\ Y$ using independent private randomness and assigns it to $S_i$. Let $Z_i=\text{2-DISJ}(X_i,Y)$. Note that $\{X_1,\ldots,X_k,Y\}\sim\mu$ and $\{X_1,\ldots,X_k\}\sim\zeta$.

    Simulation phase. Alice simulates $S_I$ and Bob simulates the remaining $k-1$ sites, and they run protocol $\cal P'$ on $\{X_1,\ldots,X_k\}\sim\zeta$ to compute $F_0(X_1,\ldots,X_k)$ up to a $(1+\gamma\varepsilon)$-approximation, for a sufficiently small constant $\gamma$, with error probability $\delta/2$. Let $\pi$ be the protocol transcript and let $\tilde{W}$ be the output. By Lemma 7, we can use $\tilde{W}$ to compute $k$-APPROX-SUM with error probability $\delta$. Then by Theorem 1, for a $0.96$ fraction of $Z=z$ over the input distribution $\mu$ and $\pi=\Pi(z)$, it holds that for a $0.9$ fraction of $\{i\in[k]\ |\ z_i=0\}$ we have $q_i^{\pi}<\beta/c_0$, and for a $0.9$ fraction of $\{i\in[k]\ |\ z_i=1\}$ we have $q_i^{\pi}>\beta/c_0$. Now $\cal P''$ outputs $1$ if $q_I^{\pi}>\beta/c_0$, and $0$ otherwise. Since $S_I$ is chosen uniformly at random among the $k$ sites, and the inputs of the $k$ sites are identically distributed, $\cal P''$ computes $Z_I=\text{2-DISJ}(X,Y)$ on input distribution $\tau$ correctly with probability $0.96\cdot 0.9\geq 0.8$.

    We now describe the final protocol $\cal P$: Alice and Bob repeat $\cal P''$ independently $c_R\log(1/\beta)$ times for a large enough constant $c_R$. In the $j$-th repetition, in the input reduction phase, they choose a random permutation $\sigma_j$ of $[n]$ using public randomness and apply it to each element of $X_1,\ldots,X_k$ before assigning them to the $k$ sites. After running $\cal P''$ for $c_R\log(1/\beta)$ times, $\cal P$ outputs the majority of the outcomes.
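    The repetition step is standard success amplification by majority vote. A minimal Python sketch, with the hypothetical callback run_once standing in for one full execution of $\cal P''$ (including a fresh permutation):

    import math
    from collections import Counter

    def majority_amplify(run_once, beta, c_R=20):
        """Run the constant-error subprotocol c_R * log(1/beta) times and
        output the majority answer; a Chernoff bound drives the error down
        to poly(beta)."""
        reps = max(1, math.ceil(c_R * math.log(1 / beta)))
        votes = Counter(run_once() for _ in range(reps))
        return votes.most_common(1)[0][0]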

    Since $Z_I=\text{2-DISJ}(X,Y)$ is fixed across repetitions, the inputs $\{X_1,\ldots,X_k\}$ in different repetitions have a small dependence, but conditioned on $Z_I$ they are all independent. Let $\mu'$ be the input distribution of $\{X_1,\ldots,X_k,Y\}$ conditioned on $Z_I=b$, and let $\zeta'$ be the distribution induced by $\mu'$ on $\{X_1,\ldots,X_k\}$. The success probability of a run of $\cal P''$ on $\zeta'$ is at least $0.8-\mathsf{TV}(\zeta,\zeta')$, where $\mathsf{TV}(\zeta,\zeta')$ is the total variation distance between the distributions $\zeta$ and $\zeta'$, which is at most

    $\max\{\mathsf{TV}(\text{Binomial}(k,\beta),\text{Binomial}(k-1,\beta)),\ \mathsf{TV}(\text{Binomial}(k,\beta),\text{Binomial}(k-1,\beta)+1)\},$

    and can be bounded by $O(1/\sqrt{\beta k})=O(\varepsilon)$ (see, e.g., Fact 2.4 of [30]). Since conditioned on $Z_I$ the inputs in different repetitions are independent, and the success probability of each run of $\cal P''$ is at least $0.7$, a Chernoff bound over the $c_R\log(1/\beta)$ repetitions, for a sufficiently large $c_R$, shows that $\cal P$ succeeds with error probability $\beta/1600$.

    We next consider the communication complexity. In each run of $\cal P''$, let $\mathsf{CC}(S_I,S_{-I})$ be the expected communication cost between the site $S_I$ and the remaining players (more precisely, between $S_I$ and the coordinator, since in the coordinator model all sites talk only to the coordinator, whose initial input is $\emptyset$), where the expectation is taken over the input distribution $\zeta$ and the choice of the random $I\in[k]$. Since conditioned on $Y$ all $X_i\ (i\in[k])$ are independent and identically distributed, if we take a random site $S_I$, the expected communication between $S_I$ and the coordinator equals the total communication divided by $k$. Thus we have $\mathsf{CC}(S_I,S_{-I})=o(C/k)$. Finally, by linearity of expectation, the expected total communication cost of the $O(\log(1/\beta))$ runs of $\cal P''$ is $o(\log(1/\beta)\cdot C/k)$.

    At the end we fix all the randomness used in the construction of protocol $\cal P$. We first use two Markov inequalities to fix all public randomness such that $\cal P$ succeeds with error probability $\beta/400$ and has expected total communication cost $o(\log(1/\beta)C/k)$, where both the error probability and the expected cost are taken over the input distribution $\mu$ and Bob's private randomness. We then use another two Markov inequalities to fix Bob's private randomness such that $\cal P$ succeeds with error probability $\beta/100$ and has expected total communication cost $o(\log(1/\beta)C/k)$, where both the error probability and the expected cost are taken over the input distribution $\mu$.

The following theorem is a direct consequence of Lemma 8, Theorem 2 for 2-DISJ, and Lemma 1 (Yao's Lemma). Recall that we set $n=O(1/\varepsilon^2)$ and $1/\beta=\varepsilon^2 k$. In the definition of $k$-APPROX-SUM we need $c_\beta/k\leq\beta\leq 1/c_\beta$ for a sufficiently large constant $c_\beta$; thus we require $c_\beta\leq 1/\varepsilon^2\leq k/c_\beta$.

Theorem 3

Assume that $c_\beta\leq 1/\varepsilon^2\leq k/c_\beta$ for a sufficiently large constant $c_\beta$. Any randomized protocol that computes a $(1+\varepsilon)$-approximation to $F_0$ with error probability $\delta$ (for a sufficiently small constant $\delta$) has communication complexity $\Omega\left(\frac{k}{\varepsilon^2\log(\varepsilon^2 k)}\right)$.

4 A Lower Bound for $F_p\ (p>1)$

We first introduce a problem called $k$-XOR, which can be viewed, to some extent, as a combination of two $k$-DISJ instances (introduced in [3, 7]), and then compose it with 2-GAP-ORT (introduced in [49]) to create a problem that we call $k$-BLOCK-THRESH-XOR ($k$-BTX). We prove that the communication complexity of $k$-BTX is large. Finally, we prove a communication complexity lower bound for $F_p$ via a reduction from $k$-BTX. In this section we work in the blackboard model.

4.1 The 2-GAP-ORT Problem

In the 2-GAP-ORT problem we have two players, Alice and Bob. Alice has a vector $x=\{x_1,\ldots,x_{1/\varepsilon^2}\}\in\{0,1\}^{1/\varepsilon^2}$ and Bob has a vector $y=\{y_1,\ldots,y_{1/\varepsilon^2}\}\in\{0,1\}^{1/\varepsilon^2}$. They want to compute

$2\textrm{-GAP-ORT}(x,y)=\begin{cases}1,&\left|\sum_{i\in[1/\varepsilon^2]}\textrm{XOR}(x_i,y_i)-\frac{1}{2\varepsilon^2}\right|\geq\frac{2}{\varepsilon},\\ 0,&\left|\sum_{i\in[1/\varepsilon^2]}\textrm{XOR}(x_i,y_i)-\frac{1}{2\varepsilon^2}\right|\leq\frac{1}{\varepsilon},\\ \ast,&\text{otherwise},\end{cases}$

where $\ast$ means that on inputs outside the promise either output is allowed.
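For reference, the promise problem is easy to state in code; a sketch (returning None outside the promise):

def two_gap_ort(x, y, eps):
    """2-GAP-ORT on bit vectors x, y of length 1/eps^2."""
    t = sum(xi ^ yi for xi, yi in zip(x, y))
    dev = abs(t - 1 / (2 * eps ** 2))
    if dev >= 2 / eps:
        return 1
    if dev <= 1 / eps:
        return 0
    return None  # outside the promise; either output is allowed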

Let $\phi$ be the uniform distribution on $\{0,1\}^{1/\varepsilon^2}\times\{0,1\}^{1/\varepsilon^2}$, and let $(X,Y)$ be a random input chosen from distribution $\phi$. The following theorem was recently obtained by Chakrabarti et al. [15]. (In our original conference paper, which was published before [15], we proved the theorem for protocols with $\mathrm{poly}(N)$ communication, using a simple argument based on [49] and Theorem 1.3 of [8].)

Theorem 4 ([15])

Let $\Pi$ be the transcript of any randomized protocol for 2-GAP-ORT on input distribution $\phi$ with error probability $\iota$, for a sufficiently small constant $\iota>0$. Then $I(X,Y;\Pi)\geq\Omega(1/\varepsilon^2)$.

4.2 The $k$-XOR Problem

In the $k$-XOR problem we have $k$ sites $S_1,\ldots,S_k$. Each site $S_i\ (i=1,2,\ldots,k)$ holds a block $b_i=\{b_{i,1},\ldots,b_{i,n}\}$ of $n$ bits. Let $b=(b_1,\ldots,b_k)$ be the inputs of the $k$ sites, and let $b_{[k],\ell}$ denote the $k$ sites' inputs on the $\ell$-th coordinate. W.l.o.g., we assume $k$ is a power of $2$. The $k$ sites want to compute the following function:

$k\textrm{-XOR}(b_1,\ldots,b_k)=\begin{cases}1,&\text{if $\exists\, j\in[n]$ such that $b_{i,j}=1$ for exactly $k/2$ $i$'s},\\ 0,&\text{otherwise}.\end{cases}$
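Equivalently, in code (a sketch; blocks[i] is site $i$'s $n$-bit vector):

def k_xor(blocks):
    """k-XOR: 1 iff some coordinate j has exactly k/2 ones across the k sites."""
    k, n = len(blocks), len(blocks[0])
    return int(any(sum(b[j] for b in blocks) == k // 2 for j in range(n)))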

We define the input distribution $\varphi_n$ for the $k$-XOR problem as follows. For each coordinate $\ell\ (\ell\in[n])$ there is a variable $D_\ell$ chosen uniformly at random from $\{1,2,\ldots,k\}$. Conditioned on $D_\ell$, all but the $D_\ell$-th site set their inputs to $0$, whereas the $D_\ell$-th site sets its input to $0$ or $1$ with equal probability. We call the $D_\ell$-th site the special site of the $\ell$-th coordinate. Let $\varphi_1$ denote this input distribution on one coordinate.

Next, we choose a random special coordinate $M\in[n]$ and replace the $k$ sites' inputs on the $M$-th coordinate as follows: for the first $k/2$ sites, with probability $1/2$ we replace all of their inputs with $0$, and with probability $1/2$ we replace all of their inputs with $1$; we independently perform the same operation on the second $k/2$ sites. Let $\psi_1$ denote the distribution on this special coordinate, and let $\psi_n$ denote the input distribution that is distributed as $\psi_1$ on the special coordinate $M$ and as $\varphi_1$ on each of the remaining $n-1$ coordinates.
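A small sampler may help fix the construction of $\psi_n$; a Python sketch (function and variable names are ours):

import random

def sample_psi_n(n, k):
    """Sample (B, D, M) ~ psi_n: every coordinate ell has a uniformly random
    special site D_ell holding a random bit (phi_1), and on the special
    coordinate M each half of the sites is set to all-0 or all-1."""
    B = [[0] * n for _ in range(k)]
    D = [random.randrange(k) for _ in range(n)]
    for ell in range(n):
        B[D[ell]][ell] = random.randint(0, 1)   # phi_1 on coordinate ell
    M = random.randrange(n)                      # special coordinate
    left, right = random.randint(0, 1), random.randint(0, 1)
    for i in range(k):
        B[i][M] = left if i < k // 2 else right  # psi_1 on coordinate M
    return B, D, M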

Let $B,B_i,B_{[k],\ell},B_{i,\ell}$ be the random variables corresponding to $b,b_i,b_{[k],\ell},b_{i,\ell}$ when the input of $k$-XOR is chosen according to the distribution $\psi_n$. Let $D=\{D_1,\ldots,D_n\}$.

4.3 The $k$-GUESS Problem

The $k$-GUESS problem can be seen as an augmentation of the $k$-XOR problem. The $k$ sites are again given an input $B$, as in the $k$-XOR problem. In addition, we introduce another player called the predictor. The predictor is given an input $Z$, but it cannot talk to any of the $k$ sites (that is, it cannot write anything to the blackboard). After the $k$ sites finish all their communication, the predictor computes the final output $g(\Pi(B),Z)$, where $\Pi(B)$ is the transcript of the $k$ sites' communication on their input $B$, and $g$ is the (deterministic) maximum likelihood function (see Proposition 1). In this section, when we talk about protocol transcripts we always mean the concatenation of the messages exchanged by the $k$ sites, excluding the output of the predictor.

In the $k$-GUESS problem, the goal is for the predictor to output $(X,Y)$, where $X=1$ if the inputs of the first $k/2$ sites in the special coordinate $M$ are all $1$ and $X=0$ otherwise, and $Y=1$ if the inputs of the second $k/2$ sites in the special coordinate $M$ are all $1$ and $Y=0$ otherwise. We say the instance $B$ is a $00$-instance if $X=Y=0$, a $10$-instance if $X=1$ and $Y=0$, a $01$-instance if $X=0$ and $Y=1$, and a $11$-instance if $X=Y=1$. Let $S\in\{00,01,10,11\}$ denote the type of an instance.

We define the following input distribution for $k$-GUESS: we assign an input $B\sim\psi_n$ to the $k$ sites, and $Z=\{D,M\}$ to the predictor, where $D,M$ are the variables used to construct $B$ under the distribution $\psi_n$. Slightly abusing notation, we also use $\psi_n$ to denote the joint distribution of $B$ and $\{D,M\}$. We do the same for the one-coordinate distributions $\psi_1$ and $\varphi_1$; that is, we also use $\psi_1$ (or $\varphi_1$) to denote the joint distribution of $B_{[k],\ell}$ and $D_\ell$ for a single coordinate $\ell$.

Theorem 5

Let $\Pi$ be the transcript of any randomized protocol for $k$-GUESS on input distribution $\psi_n$ with success probability $1/4+\Omega(1)$. Then we have $I(B;\Pi\ |\ D,M)=\Omega(n/k)$, where the information is measured with respect to the input distribution $\psi_n$. (When we say that the information is measured with respect to a distribution $\mu$, we mean that the inputs to the protocol are distributed according to $\mu$ when computing the mutual information; note that there is also randomness used by $\Pi$ when measuring the mutual information.)

  • Proof:

    By a Markov inequality, we know that for $\kappa_1=\Omega(n)$ of the $\ell\in[n]$, the protocol succeeds with probability $1/4+\Omega(1)$ conditioned on $M=\ell$. Call such an $\ell$ eligible. Let $\kappa=n-\kappa_1/2$. We say an $\ell$ is good if it is both eligible and satisfies $\ell\leq\kappa$. Thus there are $\kappa_1+\kappa-n=\Omega(n)$ good $\ell$. Let $D_{-\ell}$ denote the random variable $D$ with the $\ell$-th component removed. We say a $d$ is nice for a good $\ell$ if the protocol succeeds with probability $1/4+\Omega(1)$ conditioned on $M=\ell$ and $D_{-\ell}=d$. By another Markov inequality, at least an $\Omega(1)$ fraction of $d$ are nice for a good $\ell$.

    Now we consider $I(B;\Pi\ |\ D,M,S=00,M>\kappa)$. Note that if we can show $I(B;\Pi\ |\ D,M,S=00,M>\kappa)=\Omega(n/k)$, then $I(B;\Pi\ |\ D,M)=\Omega(n/k)$ follows, since $H(S)=2$ and $\mathsf{Pr}[(S=00)\wedge(M>\kappa)]=1/4\cdot\Omega(1)=\Omega(1)$. By the chain rule, expanding the conditioning, and letting $B_{[k],<\ell}$ be the inputs to the $k$ sites on the first $\ell-1$ coordinates, we have

    $I(B;\Pi\ |\ D,M,S=00,M>\kappa) = \sum_{\ell\in[n]} I(B_{[k],\ell};\Pi\ |\ D,M,S=00,M>\kappa,B_{[k],<\ell})$ (17)

    $= \sum_{\ell\in[n]} I(B_{[k],\ell};\Pi\ |\ D,M,S=00,M>\kappa)$ (18)

    $\geq \sum_{\text{good }\ell} I(B_{[k],\ell};\Pi\ |\ D,M,S=00,M>\kappa)$

    $= \sum_{\text{good }\ell}\sum_d \mathsf{Pr}[D_{-\ell}=d]\cdot I(B_{[k],\ell};\Pi\ |\ D_\ell,M,S=00,M>\kappa,D_{-\ell}=d)$

    $\geq \sum_{\text{good }\ell}\ \sum_{\text{nice }d\text{ for }\ell} \mathsf{Pr}[D_{-\ell}=d]\cdot I(B_{[k],\ell};\Pi\ |\ D_\ell,M,S=00,M>\kappa,D_{-\ell}=d),$ (19)

    where the step from (17) to (18) uses that $B_{[k],<\ell}$ is independent of $B_{[k],\ell}$ given the other conditioning, together with item 3 of Proposition 1.

    Now let us focus on a good $\ell$ and a nice $d$ for $\ell$. We define a protocol $\Pi_{\ell,d}$ which, on input $(A_1,\ldots,A_k,R)\sim\psi_1$, attempts to output $(U,V)$, where $U=1$ if $A_1=\ldots=A_{k/2}=1$ and $U=0$ otherwise, and $V=1$ if $A_{k/2+1}=\ldots=A_k=1$ and $V=0$ otherwise. Here $A_1,\ldots,A_k$ are the inputs of the $k$ sites and $R$ is the input of the predictor. The protocol $\Pi_{\ell,d}$ has $(\ell,d)$ hardwired into it, and works as follows. First, the $k$ sites construct an input $B$ for $k$-GUESS distributed according to $\psi_n$, using $\{A_1,\ldots,A_k\}$, $d$, and their private randomness, without any communication: they set the input on the $\ell$-th coordinate to be $B_{[k],\ell}=\{A_1,\ldots,A_k\}$, and use their private randomness to sample the inputs for coordinates $\ell'\neq\ell$ using the value $d$ and the fact that the inputs to the $k$ sites are independent conditioned on $D_{-\ell}=d$. The predictor sets its input to be $\{D_\ell=R,D_{-\ell}=d,M=\ell\}$. Next, the $k$ sites run $\Pi$ on their input $B$. Finally, the predictor outputs $(U,V)=g(\Pi(B),R,d,\ell)$.

    Let $\psi_{n,00,d}=(\psi_n\ |\ S=00,D_{-\ell}=d)$ and $\varphi_{n,d}=(\varphi_n\ |\ D_{-\ell}=d)$. Let $\psi_{n,00,d}^{\ell}$ and $\varphi_{n,d}^{\ell}$ be the distributions $\psi_{n,00,d}$ and $\varphi_{n,d}$ of $B$ after embedding $(A_1,\ldots,A_k)$ into the $\ell$-th coordinate, conditioned on $M>\kappa$ (recall that $\kappa\geq\ell$ for a good $\ell$, thus $M>\ell$), respectively. Since $\ell$ is good and $d$ is nice for $\ell$, it follows that

    $\mathsf{Pr}[\Pi_{\ell,d}(A_1,\ldots,A_k,R)=(U,V)\ |\ S=00,M>\kappa] = 1/4+\Omega(1)-\mathsf{TV}(\psi_{n,00,d}^{\ell},\varphi_{n,d}^{\ell}) \geq 1/4+\Omega(1),$

    where the probability is taken over $(A_1,\ldots,A_k,R)\sim\psi_1$, and $\mathsf{TV}(\psi_{n,00,d}^{\ell},\varphi_{n,d}^{\ell})$ is the total variation distance between the distributions $\psi_{n,00,d}^{\ell}$ and $\varphi_{n,d}^{\ell}$, which can be bounded by $O(1/\sqrt{n-\kappa})=O(1/\sqrt{n})$. The proof of this bound is given shortly; this is where we use that $\kappa=n-\Omega(n)$.

    Hence, for a good $\ell$ and a nice $d$ for $\ell$, we have

    $I(B_{[k],\ell};\Pi\ |\ D_\ell,M,S=00,M>\kappa,D_{-\ell}=d)\geq I(A_1,\ldots,A_k;\Pi'\ |\ R,M,S=00,M>\kappa),$ (20)

    where $\Pi'$ is the protocol that minimizes the information cost, with the information (on the right side of (20)) measured with respect to the marginal distribution of $\psi_n$ on a good coordinate $\ell$, among protocols that succeed in outputting $(U,V)$ with probability $1/4+\Omega(1)$ when the $k$ sites and the predictor get input $\{A_1,\ldots,A_k,R\}\sim\psi_1$. The information on the left side of (20) is measured with respect to $\psi_n$.

    Combining (19) and (20), and using that there are $\Omega(n)$ good $\ell$ and that at least an $\Omega(1)$ fraction of $d$ are nice for any good $\ell$, we have

    $I(B;\Pi\ |\ D,M,S=00,M>\kappa)\geq\Omega(n)\cdot I(A_1,\ldots,A_k;\Pi'\ |\ R,M,S=00,M>\kappa).$ (21)

    Now we analyze $\mathsf{TV}(\psi_{n,00,d}^{\ell},\varphi_{n,d}^{\ell})$. First note that we can focus on coordinates $\ell'>\ell$, since the distributions $\psi_{n,00,d}^{\ell}$ and $\varphi_{n,d}^{\ell}$ agree on coordinates $\ell'\leq\ell$. Let $\varsigma_1$ and $\varsigma_2$ be the distributions of $\psi_{n,00,d}^{\ell}$ and $\varphi_{n,d}^{\ell}$ on the coordinates $\{\ell'\ |\ \ell'>\ell,\ \ell'\in[n]\}$, respectively. Observe that $\varsigma_2$ can be thought of as a binomial distribution: for each coordinate $\ell'>\ell$, we set $B_{D_{\ell'},\ell'}$ to $0$ or $1$ with equal probability, and the remaining $B_{i,\ell'}\ (i\neq D_{\ell'})$ are all set to $0$. Moreover, $\varsigma_1$ can be generated as follows: first sample according to $\varsigma_2$, then choose a random coordinate $M>\kappa\geq\ell$ and reset $B_{D_M,M}=0$. Since $M$ is random, the total variation distance between $\varsigma_1$ and $\varsigma_2$ equals the total variation distance between $\text{Binomial}(n-\ell,1/2)$ and $\text{Binomial}(n-\ell-1,1/2)$ (that is, by symmetry, only the number of $1$'s in $\{B_{D_{\ell'},\ell'}\ |\ \ell'>\ell\}$ matters), which is at most $O(1/\sqrt{n-\ell})\leq O(1/\sqrt{n-\kappa})$ (see, e.g., Fact 2.4 of [30]).

    Let $\mathcal{E}$ be the event that all sites have the value $0$ in the $M$-th coordinate when the inputs are drawn from $\varphi_n$. Observe that $(\varphi_n|\mathcal{E})=(\psi_n|S=00)$; thus

    $I(B;\Pi\ |\ D,M,S=00,M>\kappa)\geq\Omega(n)\cdot I(A_1,\ldots,A_k;\Pi'\ |\ R,M,M>\kappa,\mathcal{E}),$

    where the information on the left-hand side is measured with respect to inputs $(B,\{D,M\})$ drawn from $\psi_n$, and the information on the right-hand side is measured with respect to inputs $(A_1,\ldots,A_k,R)$ drawn from the marginal distribution of $\varphi_n$ on a good coordinate $\ell$, which is equivalent to $\varphi_1$ since $M>\kappa\geq\ell$. By the third item of Proposition 1, and using that $A_1,\ldots,A_k$ are independent of $\mathcal{E}$ given $M>\kappa\geq\ell$ and $R$, we obtain

    $I(A_1,\ldots,A_k;\Pi'\ |\ R,M,M>\kappa,\mathcal{E})\geq I(A_1,\ldots,A_k;\Pi'\ |\ R,M,M>\kappa).$

    Finally, since $M$ and the event $M>\kappa$ are independent of $A_1,\ldots,A_k$ and $R$, it holds that $I(A_1,\ldots,A_k;\Pi'\ |\ R,M,M>\kappa)\geq I(A_1,\ldots,A_k;\Pi'\ |\ R)$, where the information is measured with respect to the input distribution $\varphi_1$ and $\Pi'$ is a protocol which succeeds with probability $1/4+\Omega(1)$ on $\psi_1$.

    It remains to show that $I(A_1,\ldots,A_k;\Pi'\ |\ R)=\Omega(1/k)$, where the information is measured with respect to $\varphi_1$ and the correctness with respect to $\psi_1$. Let $\mathbf{0}$ be the all-$0$ vector, $\mathbf{1}$ the all-$1$ vector, and $\mathbf{e_i}$ the standard basis vector with the $i$-th coordinate equal to $1$. By the relationship between mutual information and Hellinger distance (see Propositions 2.51 and 2.53 of [6]), we have

    $I(A_1,\ldots,A_k;\Pi'\ |\ R) = (1/k)\cdot\sum_{i\in[k]} I(A_1,\ldots,A_k;\Pi'\ |\ R=i) = \Omega(1/k)\cdot\sum_{i\in[k]} h^2(\Pi'(\mathbf{0}),\Pi'(\mathbf{e_i})),$

    where $h(\cdot,\cdot)$ is the Hellinger distance (see Section 2 for a definition). Now we assume $k$ and $k/2$ are powers of $2$, and we use Theorem 7 of [39], which says that the following three statements hold:

    1. $\sum_{i\in[k]} h^2(\Pi'(\mathbf{0}),\Pi'(\mathbf{e_i}))=\Omega(1)\cdot h^2(\Pi'(\mathbf{0}),\Pi'(1^{k/2}0^{k/2}))$.

    2. $\sum_{i\in[k]} h^2(\Pi'(\mathbf{0}),\Pi'(\mathbf{e_i}))=\Omega(1)\cdot h^2(\Pi'(\mathbf{0}),\Pi'(0^{k/2}1^{k/2}))$.

    3. $\sum_{i\in[k]} h^2(\Pi'(\mathbf{0}),\Pi'(\mathbf{e_i}))=\Omega(1)\cdot h^2(\Pi'(\mathbf{0}),\Pi'(\mathbf{1}))$.

    It follows that

    $I(A_1,\ldots,A_k;\Pi'\ |\ R) = \Omega(1/k)\cdot\left(h^2(\Pi'(\mathbf{0}),\Pi'(1^{k/2}0^{k/2})) + h^2(\Pi'(\mathbf{0}),\Pi'(0^{k/2}1^{k/2})) + h^2(\Pi'(\mathbf{0}),\Pi'(\mathbf{1}))\right).$

    By the Cauchy-Schwarz inequality we have

    $I(A_1,\ldots,A_k;\Pi'\ |\ R) = \Omega(1/k)\cdot\left(h(\Pi'(\mathbf{0}),\Pi'(1^{k/2}0^{k/2})) + h(\Pi'(\mathbf{0}),\Pi'(0^{k/2}1^{k/2})) + h(\Pi'(\mathbf{0}),\Pi'(\mathbf{1}))\right)^2.$

    We can rewrite this as (changing the constant in the $\Omega(1/k)$):

    $I(A_1,\ldots,A_k;\Pi'\ |\ R) = \Omega(1/k)\cdot\left(3h(\Pi'(\mathbf{0}),\Pi'(1^{k/2}0^{k/2})) + 3h(\Pi'(\mathbf{0}),\Pi'(0^{k/2}1^{k/2})) + 3h(\Pi'(\mathbf{0}),\Pi'(\mathbf{1}))\right)^2.$

    By the triangle inequality for the Hellinger distance, we get

    1. $h(\Pi'(\mathbf{0}),\Pi'(\mathbf{1}))+h(\Pi'(\mathbf{0}),\Pi'(1^{k/2}0^{k/2}))\geq h(\Pi'(\mathbf{1}),\Pi'(1^{k/2}0^{k/2}))$,

    2. $h(\Pi'(\mathbf{0}),\Pi'(\mathbf{1}))+h(\Pi'(\mathbf{0}),\Pi'(0^{k/2}1^{k/2}))\geq h(\Pi'(\mathbf{1}),\Pi'(0^{k/2}1^{k/2}))$,

    3. $h(\Pi'(\mathbf{0}),\Pi'(0^{k/2}1^{k/2}))+h(\Pi'(\mathbf{0}),\Pi'(1^{k/2}0^{k/2}))\geq h(\Pi'(0^{k/2}1^{k/2}),\Pi'(1^{k/2}0^{k/2}))$.

    Thus we have

    $I(A_1,\ldots,A_k;\Pi'\ |\ R) = \Omega(1/k)\cdot\Big(\sum_{a,b\in\{\mathbf{0},\,\mathbf{1},\,1^{k/2}0^{k/2},\,0^{k/2}1^{k/2}\}} h(\Pi'(a),\Pi'(b))\Big)^2.$

    We claim that at least one of the terms $h(\Pi'(a),\Pi'(b))$ on the right-hand side of the last equation is $\Omega(1)$; this will complete the proof. By Proposition 3, this holds if the total variation distance $\mathsf{TV}(\Pi'(a),\Pi'(b))=\Omega(1)$ for some pair $(a,b)\in\{\mathbf{0},\,\mathbf{1},\,1^{k/2}0^{k/2},\,0^{k/2}1^{k/2}\}^2$. We show that there must be such a pair $(a,b)$, for the following reasons.

    For a $z\in\{\mathbf{0},\,\mathbf{1},\,1^{k/2}0^{k/2},\,0^{k/2}1^{k/2}\}$, let $\chi(z)=xy$ if $z=x^{k/2}y^{k/2}$. First, there must exist a pair $(a,b)$ such that

    $\mathsf{TV}(g(\Pi'(a),R),g(\Pi'(b),R))=\Omega(1),$ (22)

    since otherwise, if $\mathsf{TV}(g(\Pi'(a),R),g(\Pi'(b),R))=o(1)$ for all $(a,b)\in\{\mathbf{0},\,\mathbf{1},\,1^{k/2}0^{k/2},\,0^{k/2}1^{k/2}\}^2$, then by Proposition 4, for any pair $(a,b)$ with $a\neq b$ and a $c$ chosen from $\{a,b\}$ uniformly at random, it holds that $\mathsf{Pr}[g(\Pi'(c),R)=\chi(c)]\leq 1/2+o(1)$, where the probability is taken over the distribution of $c$. Consequently, for a $c$ chosen from $\{\mathbf{0},\,\mathbf{1},\,1^{k/2}0^{k/2},\,0^{k/2}1^{k/2}\}$ uniformly at random, it holds that $\mathsf{Pr}[g(\Pi'(c),R)=\chi(c)]\leq 1/4+o(1)$, violating the protocol's success probability guarantee. Second, since $g$ is a deterministic function, and $R$ is independent of $A_1,\ldots,A_k$ when $(A_1,\ldots,A_k,R)\sim\psi_1$, we have

    $\mathsf{TV}(\Pi'(a),\Pi'(b))\geq\mathsf{TV}(g(\Pi'(a),R),g(\Pi'(b),R)).$ (23)

    The claim follows from (22) and (23).

4.4 The $k$-BTX Problem

The input of the $k$-BTX problem is a concatenation of $1/\varepsilon^2$ copies of inputs of the $k$-XOR problem. That is, each site $S_i\ (i=1,2,\ldots,k)$ holds an input consisting of $1/\varepsilon^2$ blocks, each of which is an input for a site in the $k$-XOR problem. More precisely, each $S_i\ (i\in[k])$ holds an input $b_i=(b_i^1,\ldots,b_i^{1/\varepsilon^2})$, where $b_i^j=(b_{i,1}^j,\ldots,b_{i,n}^j)\ (j\in[1/\varepsilon^2])$ is a vector of $n$ bits. Let $b^j=(b_1^j,\ldots,b_k^j)$ be the list of inputs to the $k$ sites in the $j$-th block, and let $b=(b_1,\ldots,b_k)$ be the list of inputs to the $k$ sites. In the $k$-BTX problem the $k$ sites want to compute the following:

$k\textrm{-BTX}(b_1,\ldots,b_k)=\begin{cases}1,&\text{if }\left|\sum_{j\in[1/\varepsilon^2]}k\textrm{-XOR}(b_1^j,\ldots,b_k^j)-\frac{1}{2\varepsilon^2}\right|\geq 2/\varepsilon,\\ 0,&\text{if }\left|\sum_{j\in[1/\varepsilon^2]}k\textrm{-XOR}(b_1^j,\ldots,b_k^j)-\frac{1}{2\varepsilon^2}\right|\leq 1/\varepsilon,\\ \ast,&\text{otherwise},\end{cases}$

where, as before, $\ast$ means that either output is allowed.
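In code, $k$-BTX simply applies $k$-XOR blockwise and tests the same kind of gap as 2-GAP-ORT; a sketch reusing the k_xor function above, where inputs[i][j] is site $i$'s $n$-bit vector in block $j$:

def k_btx(inputs, eps):
    """k-BTX: threshold the number of blocks whose k-XOR value is 1."""
    num_blocks = len(inputs[0])  # 1/eps^2 blocks
    t = sum(k_xor([inputs[i][j] for i in range(len(inputs))])
            for j in range(num_blocks))
    dev = abs(t - 1 / (2 * eps ** 2))
    if dev >= 2 / eps:
        return 1
    if dev <= 1 / eps:
        return 0
    return None  # outside the promise; either output is allowed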

We define the input distribution $\nu$ for the $k$-BTX problem as follows: the input of the $k$ sites in each block is chosen independently according to the input distribution $\psi_n$ defined for the $k$-XOR problem. Let $B,B_i,B^j,B_i^j,B_{i,\ell}^j$ be the random variables corresponding to $b,b_i,b^j,b_i^j,b_{i,\ell}^j$ when the input of $k$-BTX is chosen according to the distribution $\nu$. Let $D^j=(D_1^j,\ldots,D_n^j)$, where $D_\ell^j\ (\ell\in[n],j\in[1/\varepsilon^2])$ is the special site in the $\ell$-th coordinate of block $j$, and let $D=(D^1,\ldots,D^{1/\varepsilon^2})$. Let $M=(M^1,\ldots,M^{1/\varepsilon^2})$, where $M^j$ is the special coordinate in block $j$. Let $S=(S^1,\ldots,S^{1/\varepsilon^2})$, where $S^j\in\{00,01,10,11\}$ is the type of the $k$-XOR instance in block $j$.

For each block $j\ (j\in[1/\varepsilon^2])$, let $X^j=1$ if the inputs of the first $k/2$ sites in the special coordinate $M^j$ are all $1$ and $X^j=0$ otherwise; similarly, let $Y^j=1$ if the inputs of the second $k/2$ sites in the coordinate $M^j$ are all $1$ and $Y^j=0$ otherwise. Let $X=(X^1,\ldots,X^{1/\varepsilon^2})$ and $Y=(Y^1,\ldots,Y^{1/\varepsilon^2})$.

Linking $k$-BTX to 2-GAP-ORT.

We show that Alice and Bob, who are given $(X,Y)\sim\phi$, can construct a 2-player protocol $\Pi'$ for 2-GAP-ORT$(X,Y)$ using a protocol $\Pi$ for $k$-BTX.

They first construct an input for $k$-BTX using $(X,Y)$. Alice simulates the first $k/2$ players and Bob simulates the second $k/2$ players. Alice and Bob use public randomness to generate $M^j$ and $D^j$ for each $j\in[1/\varepsilon^2]$. For each $j\in[1/\varepsilon^2]$, Alice sets the $M^j$-th coordinate of each of the first $k/2$ players to $X^j$; similarly, Bob sets the $M^j$-th coordinate of each of the last $k/2$ players to $Y^j$. Alice and Bob then use private randomness and the $D^j$ vectors to fill in the remaining coordinates. Observe that the resulting input $B$ (for $k$-BTX) is distributed according to $\nu$.

Alice and Bob then run the protocol $\Pi$ on $B$. Every time a message is sent between any two of the $k$ players in $\Pi$, it is appended to the transcript. That is, if the two players are among the first $k/2$, Alice still forwards this message to Bob; if the two players are among the last $k/2$, Bob still forwards this message to Alice; and if the message is between a player in the first group and one in the second group, Alice and Bob exchange a message. The output of $\Pi'$ is equal to that of $\Pi$.

Theorem 6

Let $\Pi$ be the transcript of any randomized protocol for $k$-BTX on input distribution $\nu$ with error probability $\delta$, for a sufficiently small constant $\delta$. Then $I(X,Y;\Pi\ |\ M,D)=\Omega(1/\varepsilon^2)$, where the information is measured with respect to the uniform distribution on $X,Y$.

  • Proof:

    By a Markov inequality, we have that for at least a $1/2$ fraction of the choices of $(M,D)=(m,d)$, the $k$-party protocol $\Pi$ computes $k$-BTX with error probability at most $2\delta$. We say such a pair $(m,d)$ is good. By our reduction, the transcript of $\Pi$ is equal to the transcript of $\Pi'$, and the output of $\Pi'$ is the same as the output of $\Pi$. Hence, for a good pair $(m,d)$, the 2-party protocol $\Pi'$ computes 2-GAP-ORT with error probability at most $2\delta$ on distribution $\phi$. We have

    $I(X,Y;\Pi\ |\ M,D) = \sum_{(m,d)}\mathsf{Pr}[(M,D)=(m,d)]\cdot I(X,Y;\Pi\ |\ (M,D)=(m,d))$

    $\geq \sum_{\text{good }(m,d)}\mathsf{Pr}[(M,D)=(m,d)]\cdot I(X,Y;\Pi\ |\ (M,D)=(m,d))$

    $= \sum_{\text{good }(m,d)}\mathsf{Pr}[(M,D)=(m,d)]\cdot I(X,Y;\Pi'\ |\ (M,D)=(m,d))$

    $\geq 1/2\cdot\Omega(1/\varepsilon^2) = \Omega(1/\varepsilon^2).\quad\text{(By Theorem 4)}$
     

Now we are ready to prove our main theorem for $k$-BTX.

Theorem 7

Let $\Pi$ be the transcript of any randomized protocol for $k$-BTX on input distribution $\nu$ with error probability $\delta$, for a sufficiently small constant $\delta$. We have $I(B;\Pi\ |\ M,D)\geq\Omega(n/(k\varepsilon^2))$, where the information is measured with respect to the input distribution $\nu$.

  • Proof:

    By Theorem 6 we have $I(X,Y;\Pi\ |\ M,D)=\Omega(1/\varepsilon^2)$. Using the chain rule and a Markov inequality, it holds that

    $I(X^j,Y^j;\Pi\ |\ M,D,X^{<j},Y^{<j})=\Omega(1)$

    for at least $\Omega(1/\varepsilon^2)$ of the $j\in[1/\varepsilon^2]$, where $X^{<j}=\{X^1,\ldots,X^{j-1}\}$ and similarly for $Y^{<j}$. We say such a $j$ is good.

    Now we consider a good $j\in[1/\varepsilon^2]$, and show that

    $I(B^j;\Pi\ |\ M,D,B^{<j})=\Omega(n/k).$

    Since $B^{<j}$ determines $(X^{<j},Y^{<j})$ given $M$, and $B^{<j}$ is independent of $B^j$ given $M$ and $D$, by item 3 of Proposition 1 it suffices to prove that $I(B^j;\Pi\ |\ M,D,X^{<j},Y^{<j})=\Omega(n/k)$. By expanding the conditioning, we can write $I(B^j;\Pi\ |\ M,D,X^{<j},Y^{<j})$ as

    $\sum_{(x,y,m,d)}\mathsf{Pr}[(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y)]\cdot I(B^j;\Pi\ |\ M^j,D^j,(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y)).$

    By the definition of a good $j\in[1/\varepsilon^2]$, we know by a Markov bound that with probability $\Omega(1)$ over the choice of $(x,y,m,d)$, we have

    $I(X^j,Y^j;\Pi\ |\ M^j,D^j,(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y))=\Omega(1).$

    Call the $(x,y,m,d)$ for which this holds good for $j$.

    Note that $H(X^j,Y^j\ |\ M^j,D^j,(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y))=2$, since $M,D,X^{<j},Y^{<j}$ are independent of $X^j,Y^j$. Therefore, for a good $j$ and a tuple $(x,y,m,d)$ that is good for $j$, we have

    $H(X^j,Y^j\ |\ \Pi,M^j,D^j,(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y))$

    $= H(X^j,Y^j\ |\ M^j,D^j,(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y)) - I(X^j,Y^j;\Pi\ |\ M^j,D^j,(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y))$

    $= 2-\Omega(1).$

    By the Maximum Likelihood Principle in Proposition 1, the maximum likelihood function $g$ computes $(X^j,Y^j)$ from the transcript of $\Pi$ and $M^j,D^j,(j,m,d,x,y)$, with error probability $\delta_g$, over $X^j,Y^j,M^j,D^j$ and the randomness of $\Pi$, satisfying

    $1-\delta_g \geq \frac{1}{2^{H(X^j,Y^j\ |\ \Pi,M^j,D^j,(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y))}} \geq \frac{1}{2^{2-\Omega(1)}} = \frac{1}{4}+\Omega(1).$ (26)

    Now, for a good $j$ and a tuple $(x,y,m,d)$ that is good for $j$, we define a protocol $\Pi_{j,x,y,m,d}$ which computes the $k$-GUESS problem on input $(A_1,\ldots,A_k,\{Q,R\})\sim\psi_n$ correctly with probability $1/4+\Omega(1)$. Here $A_1,\ldots,A_k$ are the inputs of the $k$ sites and $\{Q,R\}$ is the input of the predictor. The protocol $\Pi_{j,x,y,m,d}$ has $(j,x,y,m,d)$ hardwired into it, and works as follows. First, the $k$ sites construct an input $B$ for the $k$-BTX problem distributed according to $\nu$, using $\{A_1,\ldots,A_k\}$, $(j,x,y,m,d)$, and their private randomness, without any communication: they set $B^j=\{A_1,\ldots,A_k\}$, and use their private randomness to sample inputs for blocks $j'\neq j$ using the values $(x,y,m,d)$ and the fact that the inputs to the $k$ sites are independent conditioned on $(x,y,m,d)$. The predictor sets its input to be $\{M^j=Q,D^j=R,(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y)\}$. Next, the $k$ sites run $\Pi$ on their input $B$. Finally, the predictor outputs $(X^j,Y^j)=g(\Pi(B),Q,R,j,m,d,x,y)$.

    Combining these with Theorem 5, we obtain

    $I(B;\Pi\ |\ M,D) \geq \sum_{\text{good }j} I(B^j;\Pi\ |\ M,D,B^{<j})$

    $\geq \sum_{\text{good }j}\ \sum_{\text{good }(x,y,m,d)\text{ for }j}\mathsf{Pr}[(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y)]\cdot I(B^j;\Pi\ |\ M^j,D^j,(M^{-j},D^{-j},X^{<j},Y^{<j})=(m,d,x,y))$

    $= \Omega(1/\varepsilon^2)\cdot\Omega(n/k)\quad\text{(by (26) and Theorem 5)}$

    $\geq \Omega(n/(k\varepsilon^2)).$

    This completes the proof.  

By Proposition 2, which says that the randomized communication complexity is always at least the conditional information cost, we have the following immediate corollary.

Corollary 1

Any randomized protocol that computes $k$-BTX on input distribution $\nu$ with error probability $\delta$, for a sufficiently small constant $\delta$, has communication complexity $\Omega(n/(k\varepsilon^2))$.

4.5 The Complexity of $F_p\ (p>1)$

The input for $\varepsilon$-approximate $F_p\ (p>1)$ is chosen to be the same as that for $k$-BTX, with $n=k^p$. That is, we choose $\{B_1,\ldots,B_k\}$ randomly according to distribution $\nu$, where $B_i$ is the input vector for site $S_i$, consisting of $1/\varepsilon^2$ blocks each having $n=k^p$ coordinates. We prove the lower bound for $F_p$ via a reduction from $k$-BTX.

Lemma 9

If there exists a protocol $\cal P'$ that computes a $(1+\alpha\varepsilon)$-approximation to $F_p\ (p>1)$, for a sufficiently small constant $\alpha$, on input distribution $\nu$ with communication complexity $C$ and error probability at most $\delta$, then there exists a protocol $\cal P$ for $k$-BTX on input distribution $\nu$ with communication complexity $C$ and error probability at most $3\delta+\sigma$, where $\sigma$ is an arbitrarily small constant.

  • Proof:

    We pick a random input $B=\{B_1,\ldots,B_k\}$ from distribution $\nu$. Each coordinate (column) of $B$ represents an item; thus we have a total of $1/\varepsilon^2\cdot k^p=k^p/\varepsilon^2$ possible items, which we identify with the set $[k^p/\varepsilon^2]$. If we view each input vector $B_i\ (i\in[k])$ as a set, then each site has a subset of $[k^p/\varepsilon^2]$ corresponding to its $1$ bits. Let $W_0$ be the exact value of $F_p(B)$. $W_0$ can be written as the sum of three components:

    $W_0 = \left(\frac{k^p-1}{2\varepsilon^2}+Q\right)\cdot 1^p + \left(\frac{1}{2\varepsilon^2}+U\right)\cdot(k/2)^p + \left(\frac{1}{4\varepsilon^2}+V\right)\cdot k^p,$ (27)

    where $Q,U,V$ are random variables (it will become clear below why we write it this way). The first term on the RHS of Equation (27) is the contribution of the non-special coordinates across all blocks, in each of which exactly one site has a $1$. The second term is the contribution of the special coordinates across all blocks in each of which $k/2$ sites have a $1$. The third term is the contribution of the special coordinates across all blocks in each of which all $k$ sites have a $1$.

    Note that $k\textrm{-BTX}(B_1,\ldots,B_k)$ is $1$ if $|U|\geq 2/\varepsilon$ and $0$ if $|U|\leq 1/\varepsilon$. Our goal is to use a protocol $\cal P'$ for $F_p$ to construct a protocol $\cal P$ for $k$-BTX that differentiates the two cases (i.e., $|U|\geq 2/\varepsilon$ or $|U|\leq 1/\varepsilon$) with very good probability.

    Given a random input $B$, let $W_1$ be the exact $F_p$-value of the first $k/2$ sites and $W_2$ the exact $F_p$-value of the second $k/2$ sites; that is, $W_1=F_p(B_1,\ldots,B_{k/2})$ and $W_2=F_p(B_{k/2+1},\ldots,B_k)$. We have

    $W_1+W_2 = \left(\frac{k^p-1}{2\varepsilon^2}+Q\right)\cdot 1^p + \left(\frac{1}{2\varepsilon^2}+U\right)\cdot(k/2)^p + \left(\frac{1}{4\varepsilon^2}+V\right)\cdot 2\cdot(k/2)^p.$ (28)

    By Equations (27) and (28) we can cancel out $V$:

    $2^{p-1}(W_1+W_2)-W_0 = (2^{p-1}-1)\left(\left(\frac{k^p-1}{2\varepsilon^2}+Q\right)+\left(\frac{1}{2\varepsilon^2}+U\right)\cdot(k/2)^p\right).$ (29)
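    The cancellation of $V$ in Equation (29) can be checked mechanically; a Python sketch with arbitrary illustrative values for the random variables $Q,U,V$:

    def check_cancellation(p=3, k=8, eps=0.1, Q=5.0, U=2.0, V=1.0):
        """Verify that 2^(p-1) (W1 + W2) - W0 does not depend on V (Eq. (29))."""
        ones = (k ** p - 1) / (2 * eps ** 2) + Q         # weight-1 coordinates
        half = (1 / (2 * eps ** 2) + U) * (k / 2) ** p   # (k/2)-heavy coordinates
        W0 = ones + half + (1 / (4 * eps ** 2) + V) * k ** p
        W12 = ones + half + (1 / (4 * eps ** 2) + V) * 2 * (k / 2) ** p
        lhs = 2 ** (p - 1) * W12 - W0
        rhs = (2 ** (p - 1) - 1) * (ones + half)         # right side of (29)
        assert abs(lhs - rhs) <= 1e-9 * abs(rhs)

    check_cancellation()  # passes for any choice of Q, U, V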

    Let $\tilde{W}_0$, $\tilde{W}_1$, and $\tilde{W}_2$ be the estimates of $W_0$, $W_1$, and $W_2$ obtained by running $\cal P'$ on the $k$ sites' inputs, the first $k/2$ sites' inputs, and the second $k/2$ sites' inputs, respectively. Observe that $W_0\leq(2^p+1)k^p/\varepsilon^2$ and $W_1,W_2\leq 2k^p/\varepsilon^2$. By the randomized approximation guarantee of $\cal P'$ and the discussion above, we have that with probability at least $1-3\delta$,

    $2^{p-1}(W_1+W_2)-W_0 = 2^{p-1}(\tilde{W}_1+\tilde{W}_2)-\tilde{W}_0 \pm \beta' k^p/\varepsilon,$ (30)

    where $|\beta'|\leq 3(2^p+1)\alpha$.

    By a Chernoff bound, we have $|Q|\leq c_1 k^{p/2}/\varepsilon$ with probability at least $1-\sigma$, where $\sigma$ is an arbitrarily small constant and $c_1\leq\kappa\log^{1/2}(1/\sigma)$ for some universal constant $\kappa$. Combining this fact with Equations (29) and (30), and letting $\tilde{W}=(2^{p-1}(\tilde{W}_1+\tilde{W}_2)-\tilde{W}_0)/(2^{p-1}-1)$, we have that with probability at least $1-3\delta-\sigma$,

    $U = \frac{2^p\tilde{W}}{k^p} - \frac{2^p+1}{2\varepsilon^2} - \frac{2^p\beta}{(2^{p-1}-1)\varepsilon},$ (31)

    where $|\beta|\leq 3(2^p+1)\alpha+o(1)$.

Protocol $\cal P$.

Given an input $B$ for $k$-BTX, protocol $\cal P$ first uses $\cal P'$ to obtain the value $\tilde{W}$ described above, and then determines the answer to $k$-BTX as follows:

$k\textrm{-BTX}(B)=\begin{cases}1,&\text{if }\left|2^p\tilde{W}/k^p-(2^p+1)/(2\varepsilon^2)\right|\geq 1.5/\varepsilon,\\ 0,&\text{otherwise}.\end{cases}$
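In code, $\cal P$'s final decision is a single threshold test; a sketch, with W_tilde denoting the combined estimate $\tilde{W}$:

def k_btx_from_Fp(W_tilde, k, p, eps):
    """Protocol P's decision rule: threshold the rescaled Fp estimate,
    which approximates U up to additive error < 0.5/eps."""
    stat = abs(2 ** p * W_tilde / k ** p - (2 ** p + 1) / (2 * eps ** 2))
    return 1 if stat >= 1.5 / eps else 0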
Correctness.

Note that with probability at least $1-3\delta-\sigma$, we have $|\beta|\leq 3(2^p+1)\alpha+o(1)$, where $\alpha>0$ is a sufficiently small constant, and thus $\left|\frac{2^p\beta}{(2^{p-1}-1)\varepsilon}\right|<0.5/\varepsilon$. Therefore, in this case protocol $\cal P$ always succeeds.

Theorem 7 (with $n=k^p$) and Lemma 9 directly imply the following main theorem for $F_p$.

Theorem 8

Any protocol that computes a $(1+\varepsilon)$-approximation to $F_p\ (p>1)$ on input distribution $\nu$ with error probability $\delta$, for a sufficiently small constant $\delta$, has communication complexity $\Omega(k^{p-1}/\varepsilon^2)$.

5 An Upper Bound for Fp(p>1)F_{p}\ (p>1)

We describe the following protocol to give a factor (1+Θ(ε))(1+\Theta(\varepsilon))-approximation to FpF_{p} at all points in time in the union of kk streams each held by a different site. Each site has a non-negative vector vimv^{i}\in\mathbb{R}^{m}777We use mm instead of NN for universe size only in this section. which evolves with time, and at all times the coordinator holds a (1+Θ(ε))(1+\Theta(\varepsilon))-approximation to i=1kvipp\|\sum_{i=1}^{k}v^{i}\|_{p}^{p}. Let nn be the length of the union of the kk streams. We assume n=poly(m)n={\mathrm{poly}}(m), and that kk is a power of 22.

As observed in [20], up to a factor of O(\varepsilon^{-1}\log n\log(\varepsilon^{-1}\log n)) in communication, the problem is equivalent to the threshold problem: given a threshold \tau, with probability 2/3, when \|\sum_{i=1}^{k}v^{i}\|_{p}^{p}>\tau the coordinator outputs 1, when \|\sum_{i=1}^{k}v^{i}\|_{p}^{p}<\tau/(1+\varepsilon) the coordinator outputs 0, and for \tau/(1+\varepsilon)\leq\|\sum_{i=1}^{k}v^{i}\|_{p}^{p}\leq\tau the coordinator can output either 0 or 1. (To see the equivalence: by independent repetition, we can assume the success probability of the protocol for the threshold problem is 1-\Theta(\varepsilon/\log n). Then we can run a protocol for each \tau=1,(1+\varepsilon),(1+\varepsilon)^{2},(1+\varepsilon)^{3},\ldots,\Theta(n^{2}), and we are correct on all instantiations with probability at least 2/3.)
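A minimal sketch of this reduction, assuming a black-box protocol for the threshold problem whose success probability has been boosted to 1-\Theta(\varepsilon/\log n) by independent repetition; the Python below only enumerates the geometric grid of thresholds, which is the one ingredient that is not black-box.

def geometric_thresholds(eps, n):
    """Thresholds 1, (1+eps), (1+eps)^2, ..., up to Theta(n^2).

    One boosted threshold protocol is run per value; the coordinator's
    continuous output is read off from the largest tau whose protocol fired.
    """
    taus, tau = [], 1.0
    while tau <= n * n:
        taus.append(tau)
        tau *= 1 + eps
    return taus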

We can thus assume we are given a threshold τ\tau in the following algorithm description. For notational convenience, define τ=τ/2\tau_{\ell}=\tau/2^{\ell} for an integer \ell. A nice property of the algorithm is that it is one-way, namely, all communication is from the sites to the coordinator. We leave optimization of the poly(ε1logn){\mathrm{poly}}(\varepsilon^{-1}\log n) factors in the communication complexity to future work.

5.1 Our Protocol

The protocol consists of four algorithms illustrated in Algorithm 1 to Algorithm 4. Let v=i=1kviv=\sum_{i=1}^{k}v^{i} at any point in time during the union of the kk streams. At times we will make the following assumptions on the algorithm parameters γ,B,\gamma,B, and rr: we assume γ=Θ(ε)\gamma=\Theta(\varepsilon) is sufficiently small, and B=poly(ε1logn)B={\mathrm{poly}}(\varepsilon^{-1}\log n) and r=Θ(logn)r=\Theta(\log n) are sufficiently large.

r=Θ(logn)r=\Theta(\log n) /* A parameter used by the sites and coordinator */
for z=1,2,,rz=1,2,\ldots,r do
 for =0,1,2,,logm\ell=0,1,2,\ldots,\log m do
     Create a set SzS^{z}_{\ell} by including each coordinate in [m][m] independently with probability 22^{-\ell}.
 
Algorithm 1 Interpretation of the random public coin by sites and the coordinator
γ=Θ(ε),B=poly(ε1logn)\gamma=\Theta(\varepsilon),B={\mathrm{poly}}(\varepsilon^{-1}\log n). Choose η[0,1]\eta\in[0,1] uniformly at random /* Parameters */
for z=1,2,,rz=1,2,\ldots,r do
 for =0,1,2,,logm\ell=0,1,2,\ldots,\log m do
    for j=1,2,,mj=1,2,\ldots,m do
       fz,,j0f_{z,\ell,j}\leftarrow 0 /* Initialize all frequencies seen to 0 */
       
    
 
out0out\leftarrow 0 /* The coordinator’s current output */
Algorithm 2 Initialization at Coordinator
for z=1,2,,rz=1,2,\ldots,r do
 for =0,1,2,,logm\ell=0,1,2,\ldots,\log m do
    if jSzj\in S_{\ell}^{z} and vji>τ1/p/(kB)v_{j}^{i}>\tau_{\ell}^{1/p}/(kB) then
        With probability min(B/τ1/p,1)\min(B/\tau_{\ell}^{1/p},1), send (j,z,)(j,z,\ell) to the coordinator
    
 
Algorithm 3 When Site ii receives an update vivi+ejv^{i}\leftarrow v^{i}+e_{j} for standard unit vector eje_{j}
fz,,jfz,,j+τ1/p/Bf_{z,\ell,j}\leftarrow f_{z,\ell,j}+\tau_{\ell}^{1/p}/B
for h=0,1,2,,O(γ1log(n/ηp))h=0,1,2,\ldots,O(\gamma^{-1}\log(n/\eta^{p})) do
 for z=1,2,,rz=1,2,\ldots,r do
      Choose \ell for which 2τηp(1+γ)phB<2+12^{\ell}\leq\frac{\tau}{\eta^{p}(1+\gamma)^{ph}B}<2^{\ell+1}, or =0\ell=0 if no such \ell exists
      Let Fz,h={j[m]fz,,j[η(1+γ)h,η(1+γ)h+1)}F_{z,h}=\{j\in[m]\mid f_{z,\ell,j}\in[\eta(1+\gamma)^{h},\eta(1+\gamma)^{h+1})\}
    
 c~h=medianz 2|Fz,h|\tilde{c}_{h}=\textrm{median}_{z}\ 2^{\ell}\cdot|F_{z,h}|
if h0c~hηp(1+γ)ph>(1ε)τ\sum_{h\geq 0}\tilde{c}_{h}\cdot\eta^{p}\cdot(1+\gamma)^{ph}>(1-\varepsilon)\tau then
 out1out\leftarrow 1
   Terminate the protocol
Algorithm 4 Algorithm at Coordinator if a tuple (j,z,)(j,z,\ell) arrives
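To make the interplay of Algorithms 1-3 and the coordinator's table concrete, the following self-contained Python toy simulates the shared sets S_{\ell}^{z}, the site-side sampling, and the scaled frequencies f_{z,\ell,j}. It is a sketch only: the parameter values below are illustrative, whereas the paper requires B={\mathrm{poly}}(\varepsilon^{-1}\log n) and r=\Theta(\log n).

import random

def make_sets(m, r, rng):
    """Algorithm 1: S[z][l] contains each coordinate of [m] with probability 2^-l."""
    levels = m.bit_length()  # roughly log2(m) + 1 levels
    return [[{j for j in range(m) if rng.random() < 2.0 ** -l}
             for l in range(levels)] for _ in range(r)]

def site_update(v_i, j, S, tau, p, k, B, coord_f, rng):
    """Algorithm 3: site i receives v^i <- v^i + e_j and may message the coordinator."""
    v_i[j] += 1
    for z, site_levels in enumerate(S):
        for l, S_zl in enumerate(site_levels):
            tau_l = tau / 2 ** l
            if j in S_zl and v_i[j] > tau_l ** (1.0 / p) / (k * B):
                if rng.random() < min(B / tau_l ** (1.0 / p), 1.0):
                    # First line of Algorithm 4: the coordinator rescales the message.
                    key = (z, l, j)
                    coord_f[key] = coord_f.get(key, 0.0) + tau_l ** (1.0 / p) / B

# Toy run with illustrative parameters.
rng = random.Random(0)
m, r, k, B, tau, p = 16, 5, 4, 8.0, 64.0, 2
S, coord_f = make_sets(m, r, rng), {}
v1 = [0] * m  # site 1's vector
for _ in range(200):
    site_update(v1, rng.randrange(m), S, tau, p, k, B, coord_f, rng)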

5.2 Communication Cost

Lemma 10

Consider any setting of v1,,vkv^{1},\ldots,v^{k} for which we have i=1kvipp2pτ.\|\sum_{i=1}^{k}v^{i}\|_{p}^{p}\leq 2^{p}\cdot\tau. Then the expected total communication is kp1poly(ε1logn)k^{p-1}\cdot{\mathrm{poly}}(\varepsilon^{-1}\log n) bits.

  • Proof:

Fix any particular z\in[r] and \ell\in\{0,1,\ldots,\log m\}. Let v_{j}^{i,\ell} equal v_{j}^{i} if j\in S_{\ell}^{z} and equal 0 otherwise. Let v^{i,\ell} be the vector with coordinates v_{j}^{i,\ell} for j\in[m]. Also let v^{\ell}=\sum_{i=1}^{k}v^{i,\ell}. Observe that {\bf E}[\|v^{\ell}\|_{p}^{p}]\leq 2^{p}\cdot\tau/2^{\ell}=2^{p}\cdot\tau_{\ell}.

    Because of non-negativity of the viv^{i},

    i=1kjS(vji,)pi=1kvi,ppvpp.\sum_{i=1}^{k}\sum_{j\in S_{\ell}}(v_{j}^{i,\ell})^{p}\leq\sum_{i=1}^{k}\|v^{i,\ell}\|_{p}^{p}\leq\|v^{\ell}\|_{p}^{p}.

    Notice that a jSj\in S_{\ell} is sent by a site with probability at most B/τ1/pB/\tau_{\ell}^{1/p} and only if (vji)pτkpBp(v_{j}^{i})^{p}\geq\frac{\tau_{\ell}}{k^{p}B^{p}}. Hence the expected number of messages sent for this zz and \ell, over all randomness, is

    Bτ1/p𝐄[i,j(vji)pτkpBpvji]Bτ1/p𝐄[vpp]τ/(kpBp)τ1/pkB2pτkp1Bpτ=2pkp1Bp,\frac{B}{\tau_{\ell}^{1/p}}{\bf E}\left[\sum_{i,j\ \mid\ (v_{j}^{i})^{p}\geq\frac{\tau_{\ell}}{k^{p}B^{p}}}v_{j}^{i}\right]\leq\frac{B}{\tau_{\ell}^{1/p}}\cdot\frac{{\bf E}[\|v^{\ell}\|_{p}^{p}]}{\tau_{\ell}/(k^{p}B^{p})}\cdot\frac{\tau_{\ell}^{1/p}}{kB}\leq\frac{2^{p}\cdot\tau_{\ell}\cdot k^{p-1}\cdot B^{p}}{\tau_{\ell}}=2^{p}\cdot k^{p-1}\cdot B^{p}, (34)

    where we used that vji\sum v_{j}^{i} is maximized subject to (vji)pτkpBp(v_{j}^{i})^{p}\geq\frac{\tau_{\ell}}{k^{p}B^{p}} and (vji)pvpp\sum(v_{j}^{i})^{p}\leq\|v^{\ell}\|_{p}^{p} when all the vjiv_{j}^{i} are equal to τ1/p/(kB)\tau_{\ell}^{1/p}/(kB). Summing over all zz and \ell, it follows that the expected number of messages sent in total is O(kp1Bplog2n)O(k^{p-1}B^{p}\log^{2}n). Since each message is O(logn)O(\log n) bits, the expected number of bits is kp1poly(ε1logn)k^{p-1}\cdot{\mathrm{poly}}(\varepsilon^{-1}\log n).  

5.3 Correctness

We let C>0C>0 be a sufficiently large constant.

5.3.1 Concentration of Individual Frequencies

We shall make use of the following standard multiplicative Chernoff bound.

Fact 2

Let X1,XsX_{1},\ldots X_{s} be i.i.d. Bernoulli(q)(q) random variables. Then for all 0<β<10<\beta<1,

𝖯𝗋[|i=1sXiqs|βqs]2eβ2qs3.\mathsf{Pr}\left[|\sum_{i=1}^{s}X_{i}-qs|\geq\beta qs\right]\leq 2\cdot e^{-\frac{\beta^{2}qs}{3}}.
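As a quick numerical illustration of Fact 2 (not part of any proof; the parameter values below are arbitrary), one can compare the empirical deviation probability of a Bernoulli sum against the stated bound:

import math
import random

random.seed(0)
s, q, beta, trials = 2000, 0.3, 0.2, 500
threshold = beta * q * s
deviations = sum(
    abs(sum(random.random() < q for _ in range(s)) - q * s) >= threshold
    for _ in range(trials))
print(deviations / trials, "<=", 2 * math.exp(-beta ** 2 * q * s / 3))
# The bound evaluates to 2e^{-8}, about 0.00067; the empirical rate is ~0.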
Lemma 11

For a sufficiently large constant C>0C>0, with probability 1nΩ(C)1-n^{-\Omega(C)}, for all zz, \ell, jSj\in S_{\ell}, and all times in the union of the kk streams,

1.

    fz,,j2evj+Cτ1/plognBf_{z,\ell,j}\leq 2e\cdot v_{j}+\frac{C\tau_{\ell}^{1/p}\log n}{B}, and

2.

    if vjC(log5n)τ1/pBγ10v_{j}\geq\frac{C(\log^{5}n)\tau_{\ell}^{1/p}}{B\gamma^{10}}, then |fz,,jvj|γ5log2nvj|f_{z,\ell,j}-v_{j}|\leq\frac{\gamma^{5}}{\log^{2}n}\cdot v_{j}.

  • Proof:

    Fix a particular time snapshot in the stream. Let gz,,j=fz,,jB/τ1/pg_{z,\ell,j}=f_{z,\ell,j}\cdot B/\tau_{\ell}^{1/p}. Then gz,,jg_{z,\ell,j} is a sum of indicator variables, where the number of indicator variables depends on the values of the vjiv_{j}^{i}. The indicator variables are independent, each with expectation min(B/τ1/p,1)\min(B/\tau_{\ell}^{1/p},1).

    First part of lemma. The number ss of indicator variables is at most vjv_{j}, and the expectation of each is at most B/τ1/pB/\tau_{\ell}^{1/p}. Hence, the probability that w=2evjB/τ1/p+Clognw=2e\cdot v_{j}\cdot B/\tau_{\ell}^{1/p}+C\log n or more of them equal 11 is at most

    (vjw)(Bτ1/p)w(evjBwτ1/p)w(12)Clogn=nC.{v_{j}\choose w}\cdot\left(\frac{B}{\tau_{\ell}^{1/p}}\right)^{w}\leq\left(\frac{ev_{j}B}{w\tau_{\ell}^{1/p}}\right)^{w}\leq\left(\frac{1}{2}\right)^{C\log n}=n^{-C}.

    This part of the lemma now follows by scaling the gz,,jg_{z,\ell,j} by τ1/p/B\tau_{\ell}^{1/p}/B to obtain a bound on the fz,,jf_{z,\ell,j}.

    Second part of lemma. Suppose at this time vjC(log5n)τ1/pBγ10v_{j}\geq\frac{C(\log^{5}n)\tau_{\ell}^{1/p}}{B\gamma^{10}}. The number ss of indicator variables is minimized when there are k1k-1 distinct ii for which vji=τ1/pkBv_{j}^{i}=\frac{\tau_{\ell}^{1/p}}{kB}, and one value of ii for which

    vji=vj(k1)τ1/pkB.v_{j}^{i}=v_{j}-(k-1)\cdot\frac{\tau_{\ell}^{1/p}}{kB}.

    Hence,

    svj(k1)τ1/pkBτ1/pkB=vjτ1/pB.s\geq v_{j}-(k-1)\cdot\frac{\tau_{\ell}^{1/p}}{kB}-\frac{\tau_{\ell}^{1/p}}{kB}=v_{j}-\frac{\tau_{\ell}^{1/p}}{B}.

    If the expectation is 11, then fz,,j=vjτ1/pBf_{z,\ell,j}=v_{j}-\frac{\tau_{\ell}^{1/p}}{B}, and using that vjC(log5n)τ1/pBγ10v_{j}\geq\frac{C(\log^{5}n)\tau_{\ell}^{1/p}}{B\gamma^{10}} establishes this part of the lemma. Otherwise, applying Fact 2 with svjτ1/pBC(log5n)τ1/p2Bγ10s\geq v_{j}-\frac{\tau_{\ell}^{1/p}}{B}\geq\frac{C(\log^{5}n)\tau_{\ell}^{1/p}}{2B\gamma^{10}} and q=Bτ1/pq=\frac{B}{\tau_{\ell}^{1/p}}, and using that qsClog5n2γ10qs\geq\frac{C\log^{5}n}{2\gamma^{10}}, we have

    𝖯𝗋[|gz,,jqs|>γ5qs2log2n]=nΩ(C).\mathsf{Pr}\left[|g_{z,\ell,j}-qs|>\frac{\gamma^{5}qs}{2\log^{2}n}\right]=n^{-\Omega(C)}.

    Scaling by τ1/pB=1q\frac{\tau_{\ell}^{1/p}}{B}=\frac{1}{q}, we have

\mathsf{Pr}\left[|f_{z,\ell,j}-s|>\frac{\gamma^{5}s}{2\log^{2}n}\right]=n^{-\Omega(C)},

    and since vjτ1/pBsvjv_{j}-\frac{\tau_{\ell}^{1/p}}{B}\leq s\leq v_{j},

\mathsf{Pr}\left[|f_{z,\ell,j}-v_{j}|\geq\frac{\gamma^{5}v_{j}}{2\log^{2}n}+\frac{\tau_{\ell}^{1/p}}{B}\right]=n^{-\Omega(C)},

    and finally using that τ1/pB<γ5vj2log2n\frac{\tau_{\ell}^{1/p}}{B}<\frac{\gamma^{5}v_{j}}{2\log^{2}n}, and union-bounding over a stream of length nn as well as all choices of z,,z,\ell, and jj, the lemma follows.  

5.3.2 Estimating Class Sizes

Define the classes ChC_{h} as follows:

Ch={j[m]η(1+γ)hvj<η(1+γ)h+1}.C_{h}=\{j\in[m]\mid\eta(1+\gamma)^{h}\leq v_{j}<\eta(1+\gamma)^{h+1}\}.

Say that ChC_{h} contributes at a point in time in the union of the kk streams if

|Ch|ηp(1+γ)phγvppB1/2log(n/ηp).|C_{h}|\cdot\eta^{p}(1+\gamma)^{ph}\geq\frac{\gamma\|v\|_{p}^{p}}{B^{1/2}\log(n/\eta^{p})}.

Since the number of non-zero |Ch||C_{h}| is O(γ1log(n/ηp))O(\gamma^{-1}\log(n/\eta^{p})), we have

 non-contributing h|Ch|ηp(1+γ)ph+p=O(vppB1/2).\displaystyle\sum_{\textrm{ non-contributing }h}|C_{h}|\cdot\eta^{p}(1+\gamma)^{ph+p}=O\left(\frac{\|v\|_{p}^{p}}{B^{1/2}}\right). (35)
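Continuing the toy simulation from Section 5.1, the coordinator's estimate in Algorithm 4 can be phrased directly in terms of these classes. The sketch below computes \tilde{c}_{h}=\textrm{median}_{z}\,2^{\ell}|F_{z,h}| and the sum \sum_{h}\tilde{c}_{h}\eta^{p}(1+\gamma)^{ph}; the clamping of \ell to [0,\log m] and the exact loop bounds are illustrative choices.

import math
import statistics

def coordinator_estimate(coord_f, tau, p, eta, gamma, B, r, n, m):
    """Estimate ||v||_p^p from the table f_{z,l,j}, following Algorithm 4."""
    total = 0.0
    num_h = int(math.log(n / eta ** p) / gamma) + 1  # O(gamma^-1 log(n/eta^p)) classes
    max_l = m.bit_length() - 1
    for h in range(num_h):
        target = tau / (eta ** p * (1 + gamma) ** (p * h) * B)
        # Choose l with 2^l <= target < 2^{l+1}, or l = 0 if no such l exists.
        l = min(max_l, int(math.floor(math.log2(target)))) if target >= 1 else 0
        sizes = []
        for z in range(r):
            F_zh = [j for j in range(m)
                    if eta * (1 + gamma) ** h
                    <= coord_f.get((z, l, j), 0.0)
                    < eta * (1 + gamma) ** (h + 1)]
            sizes.append(2 ** l * len(F_zh))
        total += statistics.median(sizes) * eta ** p * (1 + gamma) ** (p * h)
    return total  # the coordinator outputs 1 once this exceeds (1 - eps) * tau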
Lemma 12

With probability 1nΩ(C)1-n^{-\Omega(C)}, at all points in time in the union of the kk streams and for all hh and \ell, for at least a 3/53/5 fraction of the z[r]z\in[r],

|C_{h}\cap S_{\ell}^{z}|\leq 3\cdot 2^{-\ell}\cdot|C_{h}|.
  • Proof:

    The random variable |ChSz||C_{h}\cap S_{\ell}^{z}| is a sum of |Ch||C_{h}| independent Bernoulli(2)(2^{-\ell}) random variables. By a Markov bound, 𝖯𝗋[|ChSz|32|Ch|]2/3\mathsf{Pr}[|C_{h}\cap S_{\ell}^{z}|\leq 3\cdot 2^{-\ell}|C_{h}|]\geq 2/3. Letting XzX_{z} be an indicator variable which is 11 iff |ChSz|32|Ch||C_{h}\cap S_{\ell}^{z}|\leq 3\cdot 2^{-\ell}|C_{h}|, the lemma follows by applying Fact 2 to the XzX_{z}, using that rr is large enough, and union-bounding over a stream of length nn and all hh and \ell.  

For a given ChC_{h}, let (h)\ell(h) be the value of \ell for which we have 2τηp(1+γ)phB<2+12^{\ell}\leq\frac{\tau}{\eta^{p}(1+\gamma)^{ph}B}<2^{\ell+1}, or =0\ell=0 if no such \ell exists.

Lemma 13

With probability 1nΩ(C)1-n^{-\Omega(C)}, at all points in time in the union of the kk streams and for all hh, for at least a 3/53/5 fraction of the z[r]z\in[r],

1.

    2(h)|ChS(h)z|3|Ch|,2^{\ell(h)}\cdot|C_{h}\cap S_{\ell(h)}^{z}|\leq 3|C_{h}|, and

2.

    if at this time ChC_{h} contributes and vppτ5\|v\|_{p}^{p}\geq\frac{\tau}{5}, then 2(h)|ChS(h)z|=(1±γ)|Ch|.2^{\ell(h)}\cdot|C_{h}\cap S_{\ell(h)}^{z}|=\left(1\pm\gamma\right)|C_{h}|.

  • Proof:

    We show this statement for a fixed hh and at a particular point in time in the union of the kk streams. The lemma will follow by a union bound.

    The first part of the lemma follows from Lemma 12.

    We now prove the second part. In this case vppτ5\|v\|_{p}^{p}\geq\frac{\tau}{5}. We can assume that there exists an \ell for which 2τηp(1+γ)phB<2+12^{\ell}\leq\frac{\tau}{\eta^{p}(1+\gamma)^{ph}B}<2^{\ell+1}. Indeed, otherwise (h)=0\ell(h)=0 and |ChS(h)z|=|Ch||C_{h}\cap S_{\ell(h)}^{z}|=|C_{h}| and the second part of the lemma follows.

    Let q(z)=|ChS(h)z|q(z)=|C_{h}\cap S_{\ell(h)}^{z}|, which is a sum of independent indicator random variables and so 𝐕𝐚𝐫[q(z)]𝐄[q(z)]{\bf Var}[q(z)]\leq{\bf E}[q(z)]. Also,

    𝐄[q(z)]\displaystyle{\bf E}[q(z)] =\displaystyle= 2|Ch|ηp(1+γ)phBτ|Ch|.\displaystyle 2^{-\ell}|C_{h}|\geq\frac{\eta^{p}(1+\gamma)^{ph}B}{\tau}\cdot|C_{h}|. (36)

    Since ChC_{h} contributes, |Ch|ηp(1+γ)phγvppB1/2log(n/ηp)|C_{h}|\cdot\eta^{p}\cdot(1+\gamma)^{ph}\geq\frac{\gamma\|v\|_{p}^{p}}{B^{1/2}\log(n/\eta^{p})}, and combining this with (36),

    𝐄[q(z)]BγvppB1/2τlog(n/ηp)B1/2γ5log(n/ηp).{\bf E}[q(z)]\geq\frac{B\gamma\|v\|_{p}^{p}}{B^{1/2}\tau\log(n/\eta^{p})}\geq\frac{B^{1/2}\gamma}{5\log(n/\eta^{p})}.

    It follows that for BB sufficiently large, and assuming η1/nC\eta\geq 1/n^{C} which happens with probability 11/nC1-1/n^{C}, we have 𝐄[q(z)]3γ2{\bf E}[q(z)]\geq\frac{3}{\gamma^{2}}, and so by Chebyshev’s inequality,

    𝖯𝗋[|q(z)𝐄[q(z)]|γ𝐄[q(z)]]𝐕𝐚𝐫[q(z)]γ2𝐄2[q(z)]13.\mathsf{Pr}\left[|q(z)-{\bf E}[q(z)]|\geq\gamma{\bf E}[q(z)]\right]\leq\frac{{\bf Var}[q(z)]}{\gamma^{2}\cdot{\bf E}^{2}[q(z)]}\leq\frac{1}{3}.

    Since 𝐄[q(z)]=2|Ch|{\bf E}[q(z)]=2^{-\ell}|C_{h}|, and r=Θ(logn)r=\Theta(\log n) is large enough, the lemma follows by a Chernoff bound.  

5.3.3 Combining Individual Frequency Estimation and Class Size Estimation

We define the set TT to be the set of times in the input stream for which the FpF_{p}-value of the union of the kk streams first exceeds (1+γ)i(1+\gamma)^{i} for an ii satisfying

0ilog(1+γ)2pτ.0\leq i\leq\log_{(1+\gamma)}2^{p}\cdot\tau.
Lemma 14

With probability 1O(γ)1-O(\gamma), for all times in TT and all hh,

1.

    c~h3|Ch|+3γ(2+γ)(|Ch1|+|Ch+1|)\tilde{c}_{h}\leq 3|C_{h}|+3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|), and

2.

    if at this time ChC_{h} contributes and vppτ5\|v\|_{p}^{p}\geq\frac{\tau}{5}, then

    (14γ)|Ch|c~h(1+γ)|Ch|+3γ(2+γ)(|Ch1|+|Ch+1|).(1-4\gamma)|C_{h}|\leq\tilde{c}_{h}\leq(1+\gamma)|C_{h}|+3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|).
  • Proof:

    We assume the events of Lemma 11 and Lemma 13 occur, and we add nΩ(C)n^{-\Omega(C)} to the error probability. Let us fix a class ChC_{h}, a point in time in TT, and a z[r]z\in[r] which is among the at least 3r/53r/5 different zz that satisfy Lemma 13 at this point in time.

By Lemma 11, for any jChS(h)zj\in C_{h}\cap S_{\ell(h)}^{z} for which vjC(log5n)τ(h)1/pBγ10v_{j}\geq\frac{C(\log^{5}n)\tau_{\ell(h)}^{1/p}}{B\gamma^{10}}, if

|min(vjη(1+γ)h,η(1+γ)h+1vj)|γ5log2nvj,\displaystyle|\min(v_{j}-\eta(1+\gamma)^{h},\eta(1+\gamma)^{h+1}-v_{j})|\geq\frac{\gamma^{5}}{\log^{2}n}\cdot v_{j}, (37)

then jFz,hj\in F_{z,h}. Let us first verify that for jChj\in C_{h}, we have vjC(log5n)τ(h)1/pBγ10v_{j}\geq\frac{C(\log^{5}n)\tau_{\ell(h)}^{1/p}}{B\gamma^{10}}. We have

vjpηp(1+γ)phτ2(h)+1Bτ(h)2B,\displaystyle v_{j}^{p}\geq\eta^{p}(1+\gamma)^{ph}\geq\frac{\tau}{2^{\ell(h)+1}B}\geq\frac{\tau_{\ell(h)}}{2B}, (38)

and so

vj(τ(h)2B)1/pC(log5n)τ(h)1/pBγ10,v_{j}\geq\left(\frac{\tau_{\ell(h)}}{2B}\right)^{1/p}\geq\frac{C(\log^{5}n)\tau_{\ell(h)}^{1/p}}{B\gamma^{10}},

where the final inequality follows for large enough B=poly(ε1logn)B={\mathrm{poly}}(\varepsilon^{-1}\log n) and p>1p>1.

It remains to consider the case when (37) does not hold.

Conditioned on all other randomness, \eta\in[0,1] is uniformly random subject to j\in C_{h}, or equivalently,

vj(1+γ)h+1<ηvj(1+γ)h.\frac{v_{j}}{(1+\gamma)^{h+1}}<\eta\leq\frac{v_{j}}{(1+\gamma)^{h}}.

If (37) does not hold, then either

(1γ5/log2n)vj(1+γ)hη, or η(1+γ5/log2n)vj(1+γ)h+1.\frac{(1-\gamma^{5}/\log^{2}n)v_{j}}{(1+\gamma)^{h}}\leq\eta,\textrm{ or }\eta\leq\frac{(1+\gamma^{5}/\log^{2}n)v_{j}}{(1+\gamma)^{h+1}}.

Hence, the probability over η\eta that inequality (37) holds is at least

1γ5vj(1+γ)hlog2n+γ5vj(1+γ)h+1log2nvj(1+γ)hvj(1+γ)h+1=1γ4(2+γ)log2n.1-\frac{\frac{\gamma^{5}v_{j}}{(1+\gamma)^{h}\log^{2}n}+\frac{\gamma^{5}v_{j}}{(1+\gamma)^{h+1}\log^{2}n}}{\frac{v_{j}}{(1+\gamma)^{h}}-\frac{v_{j}}{(1+\gamma)^{h+1}}}=1-\frac{\gamma^{4}(2+\gamma)}{\log^{2}n}.

It follows by a Markov bound that

\displaystyle\mathsf{Pr}\left[\left|\{j\in C_{h}\cap S_{\ell(h)}^{z}:\textrm{(37) fails for }j\}\right|\geq\gamma(2+\gamma)\cdot|C_{h}\cap S_{\ell(h)}^{z}|\right]\leq\frac{\gamma^{3}}{\log^{2}n}. (39)

Now we must consider the case that there is a jChS(h)zj^{\prime}\in C_{h^{\prime}}\cap S^{z}_{\ell(h)} for which jFz,hj^{\prime}\in F_{z,h} for an hhh^{\prime}\neq h. There are two cases, namely, if vj<C(log5n)τ(h)1/pBγ10v_{j^{\prime}}<\frac{C(\log^{5}n)\tau_{\ell(h)}^{1/p}}{B\gamma^{10}} or if vjC(log5n)τ(h)1/pBγ10v_{j^{\prime}}\geq\frac{C(\log^{5}n)\tau_{\ell(h)}^{1/p}}{B\gamma^{10}}. We handle each case in turn.

Case: vj<C(log5n)τ(h)1/pBγ10v_{j^{\prime}}<\frac{C(\log^{5}n)\tau_{\ell(h)}^{1/p}}{B\gamma^{10}}. Then by Lemma 11,

fz,(h),j2evj+Cτ(h)1/plognB.f_{z,\ell(h),j^{\prime}}\leq 2e\cdot v_{j^{\prime}}+\frac{C\tau_{\ell(h)}^{1/p}\log n}{B}.

Therefore, it suffices to show that

2eC(log5n)τ(h)1/pBγ10+Cτ(h)1/plognB<η(1+γ)h,2e\cdot\frac{C(\log^{5}n)\tau_{\ell(h)}^{1/p}}{B\gamma^{10}}+\frac{C\tau_{\ell(h)}^{1/p}\log n}{B}<\eta(1+\gamma)^{h},

from which we can conclude that jFz,hj^{\prime}\notin F_{z,h}. But by (38),

η(1+γ)h(τ(h)2B)1/p>2eC(log5n)τ(h)1/pBγ10+Cτ(h)1/plognB,\eta(1+\gamma)^{h}\geq\left(\frac{\tau_{\ell(h)}}{2B}\right)^{1/p}>2e\cdot\frac{C(\log^{5}n)\tau_{\ell(h)}^{1/p}}{B\gamma^{10}}+\frac{C\tau_{\ell(h)}^{1/p}\log n}{B},

where the last inequality follows for large enough B=poly(ε1logn)B={\mathrm{poly}}(\varepsilon^{-1}\log n). Hence, jFz,hj^{\prime}\notin F_{z,h}.

Case: vjC(log5n)τ(h)1/pBγ10v_{j^{\prime}}\geq\frac{C(\log^{5}n)\tau_{\ell(h)}^{1/p}}{B\gamma^{10}}. We claim that h{h1,h+1}h^{\prime}\in\{h-1,h+1\}. Indeed, by Lemma 11 we must have

η(1+γ)hγ5log2nvjvjη(1+γ)h+1+γ5log2nvj.\eta(1+\gamma)^{h}-\frac{\gamma^{5}}{\log^{2}n}\cdot v_{j^{\prime}}\leq v_{j^{\prime}}\leq\eta(1+\gamma)^{h+1}+\frac{\gamma^{5}}{\log^{2}n}\cdot v_{j^{\prime}}.

This is equivalent to

\frac{\eta(1+\gamma)^{h}}{1+\gamma^{5}/\log^{2}n}\leq v_{j^{\prime}}\leq\frac{\eta(1+\gamma)^{h+1}}{1-\gamma^{5}/\log^{2}n}.

If jChj^{\prime}\in C_{h^{\prime}} for h<h1h^{\prime}<h-1, then

vjη(1+γ)h1=η(1+γ)h1+γ<η(1+γ)h1+γ5/log2n,v_{j^{\prime}}\leq\eta(1+\gamma)^{h-1}=\frac{\eta(1+\gamma)^{h}}{1+\gamma}<\frac{\eta(1+\gamma)^{h}}{1+\gamma^{5}/\log^{2}n},

which is impossible. Also, if jChj^{\prime}\in C_{h^{\prime}} for h>h+1h^{\prime}>h+1, then

vjη(1+γ)h+2=η(1+γ)h+1(1+γ)>η(1+γ)h+11γ5/log2n,v_{j^{\prime}}\geq\eta(1+\gamma)^{h+2}=\eta(1+\gamma)^{h+1}\cdot(1+\gamma)>\frac{\eta(1+\gamma)^{h+1}}{1-\gamma^{5}/\log^{2}n},

which is impossible. Hence, h{h1,h+1}h^{\prime}\in\{h-1,h+1\}.

Let Nz,h=Fz,hChN_{z,h}=F_{z,h}\setminus C_{h}. Then

\displaystyle{\bf E}\left[|N_{z,h}|\right]\leq\frac{\gamma^{4}(2+\gamma)}{\log^{2}n}\cdot(|C_{h-1}\cap S_{\ell(h)}^{z}|+|C_{h+1}\cap S_{\ell(h)}^{z}|). (40)

By (39) and applying a Markov bound to (40), together with a union bound, with probability 12γ3log2n\geq 1-\frac{2\gamma^{3}}{\log^{2}n},

(1γ(2+γ))|ChS(h)z||Fz,h|\displaystyle(1-\gamma(2+\gamma))\cdot|C_{h}\cap S_{\ell(h)}^{z}|\leq|F_{z,h}| (41)
|Fz,h||ChS(h)z|+γ(2+γ)(|Ch1S(h)z|+|Ch+1S(h)z|).|F_{z,h}|\leq|C_{h}\cap S_{\ell(h)}^{z}|+\gamma(2+\gamma)\cdot(|C_{h-1}\cap S_{\ell(h)}^{z}|+|C_{h+1}\cap S_{\ell(h)}^{z}|). (42)

By Lemma 12,

2(h)|Ch1S(h)z|3|Ch1| and  2(h)|Ch+1S(h)z|3|Ch+1|.2^{\ell(h)}|C_{h-1}\cap S_{\ell(h)}^{z}|\leq 3|C_{h-1}|\textrm{ and }\ 2^{\ell(h)}|C_{h+1}\cap S_{\ell(h)}^{z}|\leq 3|C_{h+1}|. (43)

First part of lemma. At this point we can prove the first part of this lemma. By the first part of Lemma 13,

2(h)|ChS(h)z|3|Ch|.\displaystyle 2^{\ell(h)}\cdot|C_{h}\cap S_{\ell(h)}^{z}|\leq 3|C_{h}|. (44)

Combining (42), (43), and (44), we have with probability at least 12γ3log2nnΩ(C)1-\frac{2\gamma^{3}}{\log^{2}n}-n^{-\Omega(C)},

2(h)|Fz,h|3|Ch|+3γ(2+γ)(|Ch1|+|Ch+1|).2^{\ell(h)}|F_{z,h}|\leq 3|C_{h}|+3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|).

Since this holds for at least 3r/53r/5 different zz, it follows that

c~h3|Ch|+3γ(2+γ)(|Ch1|+|Ch+1|),\tilde{c}_{h}\leq 3|C_{h}|+3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|),

and the first part of the lemma follows by a union bound. Indeed, the number of hh is O(γ1log(n/ηp))O(\gamma^{-1}\log(n/\eta^{p})), which with probability 11/n1-1/n, say, is O(γ1logn)O(\gamma^{-1}\log n) since with this probability ηp1/np\eta^{p}\geq 1/n^{p}. Also, |T|=O(γ1logn)|T|=O(\gamma^{-1}\log n). Hence, the probability this holds for all hh and all times in TT is 1O(γ)1-O(\gamma).

Second part of the lemma. Now we can prove the second part of the lemma. By the second part of Lemma 13, if at this time ChC_{h} contributes and vppτ5\|v\|_{p}^{p}\geq\frac{\tau}{5}, then

2(h)|ChS(h)z|=(1±γ)|Ch|.\displaystyle 2^{\ell(h)}\cdot|C_{h}\cap S_{\ell(h)}^{z}|=(1\pm\gamma)|C_{h}|. (45)

Combining (41), (42), (43), and (45), we have with probability at least 12γ3log2nnΩ(C)1-\frac{2\gamma^{3}}{\log^{2}n}-n^{-\Omega(C)},

(1γ(2+γ))(1γ)|Ch|2(h)|Fz,h|(1+γ)|Ch|+3γ(2+γ)(|Ch1|+|Ch+1|).(1-\gamma(2+\gamma))(1-\gamma)|C_{h}|\leq 2^{\ell(h)}|F_{z,h}|\leq(1+\gamma)|C_{h}|+3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|).

Since this holds for at least 3r/53r/5 different zz, it follows that

(1-\gamma(2+\gamma))(1-\gamma)|C_{h}|\leq\tilde{c}_{h}\leq(1+\gamma)|C_{h}|+3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|),

and the second part of the lemma now follows by a union bound over all hh and all times in TT, exactly in the same way as the first part of the lemma. Note that 14γ(1γ(2+γ))(1γ)1-4\gamma\leq(1-\gamma(2+\gamma))(1-\gamma) for small enough γ=Θ(ε)\gamma=\Theta(\varepsilon).  

5.3.4 Putting It All Together

Lemma 15

With probability at least 5/65/6, at all times the coordinator’s output is correct.

  • Proof:

The coordinator outputs 0 up until the first point in time in the union of the k streams for which \sum_{h\geq 0}\tilde{c}_{h}\cdot\eta^{p}\cdot(1+\gamma)^{ph}>(1-\varepsilon)\tau, the test used in Algorithm 4. It suffices to show that

    h0c~hηp(1+γ)ph=(1±ε/2)vpp\displaystyle\sum_{h\geq 0}\tilde{c}_{h}\eta^{p}(1+\gamma)^{ph}=(1\pm\varepsilon/2)\|v\|_{p}^{p} (46)

    at all times in the stream. We first show that with probability at least 5/65/6, for all times in TT,

    h0c~hηp(1+γ)ph=(1±ε/4)vpp,\displaystyle\sum_{h\geq 0}\tilde{c}_{h}\eta^{p}(1+\gamma)^{ph}=(1\pm\varepsilon/4)\|v\|_{p}^{p}, (47)

    and then use the structure of TT and the protocol to argue that (46) holds at all times in the stream.

Fix a particular time in T. We condition on the event of Lemma 14, which, by setting \gamma=\Theta(\varepsilon) small enough, we can assume occurs with probability at least 5/6.

    First, suppose at this point in time we have vpp<τ5\|v\|_{p}^{p}<\frac{\tau}{5}. Then by Lemma 14, for sufficiently small γ=Θ(ε)\gamma=\Theta(\varepsilon), we have

    h0c~hηp(1+γ)ph\displaystyle\sum_{h\geq 0}\tilde{c}_{h}\cdot\eta^{p}(1+\gamma)^{ph} \displaystyle\leq h0(3|Ch|+3γ(2+γ)(|Ch1|+|Ch+1|))ηp(1+γ)ph\displaystyle\sum_{h\geq 0}(3|C_{h}|+3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|))\cdot\eta^{p}(1+\gamma)^{ph}
    \displaystyle\leq h0(3jChvjp+3γ(2+γ)(1+γ)2jCh1Ch+1vjp)\displaystyle\sum_{h\geq 0}\left(3\sum_{j\in C_{h}}v_{j}^{p}+3\gamma(2+\gamma)(1+\gamma)^{2}\sum_{j\in C_{h-1}\cup C_{h+1}}v_{j}^{p}\right)
    \displaystyle\leq 4vpp\displaystyle 4\|v\|_{p}^{p}
    \displaystyle\leq 4τ5,\displaystyle\frac{4\tau}{5},

    and so the coordinator will correctly output 0, provided ε<15\varepsilon<\frac{1}{5}.

    We now handle the case vppτ5\|v\|_{p}^{p}\geq\frac{\tau}{5}. Then for all contributing ChC_{h}, we have

    (14γ)|Ch|c~h(1+γ)|Ch|+3γ(2+γ)(|Ch1|+|Ch+1|),(1-4\gamma)|C_{h}|\leq\tilde{c}_{h}\leq(1+\gamma)|C_{h}|+3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|),

    while for all ChC_{h}, we have

    c~h3|Ch|+3γ(2+γ)(|Ch1|+|Ch+1|).\tilde{c}_{h}\leq 3|C_{h}|+3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|).

    Hence, using (35),

    h0c~hηp(1+γ)ph\displaystyle\sum_{h\geq 0}\tilde{c}_{h}\cdot\eta^{p}(1+\gamma)^{ph} \displaystyle\geq contributing Ch(14γ)|Ch|ηp(1+γ)ph\displaystyle\sum_{\textrm{contributing }C_{h}}(1-4\gamma)|C_{h}|\eta^{p}(1+\gamma)^{ph}
    \displaystyle\geq (14γ)(1+γ)2contributing ChjChvjp\displaystyle\frac{(1-4\gamma)}{(1+\gamma)^{2}}\sum_{\textrm{contributing }C_{h}}\sum_{j\in C_{h}}v_{j}^{p}
    \displaystyle\geq (16γ)(1O(1/B1/2))vpp.\displaystyle(1-6\gamma)\cdot(1-O(1/B^{1/2}))\cdot\|v\|_{p}^{p}.

    For the other direction,

    h0c~hηp(1+γ)ph\displaystyle\sum_{h\geq 0}\tilde{c}_{h}\cdot\eta^{p}(1+\gamma)^{ph}
    \displaystyle\leq contributing Ch(1+γ)|Ch|ηp(1+γ)ph+non-contributing Ch3|Ch|ηp(1+γ)ph\displaystyle\sum_{\textrm{contributing }C_{h}}(1+\gamma)|C_{h}|\eta^{p}(1+\gamma)^{ph}+\sum_{\textrm{non-contributing }C_{h}}3|C_{h}|\eta^{p}(1+\gamma)^{ph}
    +h03γ(2+γ)(|Ch1|+|Ch+1|)ηp(1+γ)ph\displaystyle+\sum_{h\geq 0}3\gamma(2+\gamma)(|C_{h-1}|+|C_{h+1}|)\eta^{p}(1+\gamma)^{ph}
    \displaystyle\leq (1+γ)contributing ChjChvjp+O(1/B1/2)vpp+O(γ)vpp\displaystyle(1+\gamma)\sum_{\textrm{contributing }C_{h}}\sum_{j\in C_{h}}v_{j}^{p}+O(1/B^{1/2})\cdot\|v\|_{p}^{p}+O(\gamma)\cdot\|v\|_{p}^{p}
    \displaystyle\leq (1+O(γ)+O(1/B1/2))vpp.\displaystyle(1+O(\gamma)+O(1/B^{1/2}))\|v\|_{p}^{p}.

    Hence, (47) follows for all times in TT provided that γ=Θ(ε)\gamma=\Theta(\varepsilon) is small enough and B=poly(ε1logn)B={\mathrm{poly}}(\varepsilon^{-1}\log n) is large enough.

    It remains to argue that (46) holds for all points in time in the union of the kk streams. Recall that each time in the union of the kk streams for which vpp(1+γ)i\|v\|_{p}^{p}\geq(1+\gamma)^{i} for an integer ii is included in TT, provided vpp2pτ\|v\|_{p}^{p}\leq 2^{p}\tau.

    The key observation is that the quantity h0c~hηp(1+γ)ph\sum_{h\geq 0}\tilde{c}_{h}\eta^{p}(1+\gamma)^{ph} is non-decreasing, since the values |Fz,h||F_{z,h}| are non-decreasing. Now, the value of vpp\|v\|_{p}^{p} at a time tt not in TT is, by definition of TT, within a factor of (1±γ)(1\pm\gamma) of the value of vpp\|v\|_{p}^{p} for some time in TT. Since (47) holds for all times in TT, it follows that the value of h0c~hηp(1+γ)ph\sum_{h\geq 0}\tilde{c}_{h}\eta^{p}(1+\gamma)^{ph} at time tt satisfies

    (1γ)(1ε/4)vpph0c~hηp(1+γ)ph(1+γ)(1+ε/4)vpp,(1-\gamma)(1-\varepsilon/4)\|v\|_{p}^{p}\leq\sum_{h\geq 0}\tilde{c}_{h}\eta^{p}(1+\gamma)^{ph}\leq(1+\gamma)(1+\varepsilon/4)\|v\|_{p}^{p},

    which implies for γ=Θ(ε)\gamma=\Theta(\varepsilon) small enough that (46) holds for all points in time in the union of the kk streams. This completes the proof.  

Theorem 9

(MAIN) With probability at least 2/32/3, at all times the coordinator’s output is correct and the total communication is kp1poly(ε1logn)k^{p-1}\cdot{\mathrm{poly}}(\varepsilon^{-1}\log n) bits.

  • Proof:

    Consider the setting of v1,,vkv^{1},\ldots,v^{k} at the first time in the stream for which i=1kvipp>τ\|\sum_{i=1}^{k}v^{i}\|_{p}^{p}>\tau. For any non-negative integer vector ww and any update eje_{j}, we have w+ejpp(wp+1)p2pwpp\|w+e_{j}\|_{p}^{p}\leq(\|w\|_{p}+1)^{p}\leq 2^{p}\|w\|_{p}^{p}. Since i=1kvipp\|\sum_{i=1}^{k}v^{i}\|_{p}^{p} is an integer and τ1\tau\geq 1, we therefore have i=1kvipp2pτ\|\sum_{i=1}^{k}v^{i}\|_{p}^{p}\leq 2^{p}\cdot\tau. By Lemma 10, the expected communication for these v1,,vkv^{1},\ldots,v^{k} is kp1poly(ε1logn)k^{p-1}\cdot{\mathrm{poly}}(\varepsilon^{-1}\log n) bits, so with probability at least 5/65/6 the communication is kp1poly(ε1logn)k^{p-1}\cdot{\mathrm{poly}}(\varepsilon^{-1}\log n) bits. By Lemma 15, with probability at least 5/65/6, the protocol terminates at or before the time for which the inputs held by the players equal v1,,vkv^{1},\ldots,v^{k}. The theorem follows by a union bound.  

6 Related Problems

In this section we show that the techniques we have developed for distributed F_{0} and F_{p}\ (p>1) can also be used to solve other fundamental problems. In particular, we consider the all-quantile, heavy hitters, empirical entropy, and \ell_{p}\ (p>0) problems. For the first three problems, we are able to show that our lower bounds hold even if we allow some additive error \varepsilon. From the definitions below one can observe that lower bounds for additive \varepsilon-approximations also hold for their multiplicative (1+\varepsilon)-approximation counterparts.

6.1 The All-Quantile and Heavy Hitters

We first give the definitions of the problems. Given a multiset A={a1,a2,,am}A=\{a_{1},a_{2},\ldots,a_{m}\} where each aia_{i} is drawn from the universe [N][N], let fif_{i} be the frequency of item ii in the set AA. Thus i[N]fi=m\sum_{i\in[N]}f_{i}=m.

Definition 5

(\phi-heavy hitters) For any 0\leq\phi\leq 1, the set of \phi-heavy hitters of A is H_{\phi}(A)=\{x\ |\ f_{x}\geq\phi m\}. If an \varepsilon-approximation is allowed, then the returned set of heavy hitters must contain H_{\phi}(A) and cannot include any x such that f_{x}<(\phi-\varepsilon)m. If (\phi-\varepsilon)m\leq f_{x}<\phi m, then x may or may not be included in the returned set.

Definition 6

(ϕ\phi-quantile) For any 0ϕ10\leq\phi\leq 1, the ϕ\phi-quantile of AA is some xx such that there are at most ϕm\phi m items of AA that are smaller than xx and at most (1ϕ)m(1-\phi)m items of AA that are greater than xx. If an ε\varepsilon-approximation is allowed, then when asking for the ϕ\phi-quantile of AA we are allowed to return any ϕ\phi^{\prime}-quantile of AA such that ϕεϕϕ+ε\phi-\varepsilon\leq\phi^{\prime}\leq\phi+\varepsilon.

Definition 7

(All-quantile) The ε\varepsilon-approximate all-quantile (QUAN) problem is defined in the coordinator model, where we have kk sites and a coordinator. Site Si(i[k])S_{i}\ (i\in[k]) has a set AiA_{i} of items. The kk sites want to communicate with the coordinator so that at the end of the process the coordinator can construct a data structure from which all ε\varepsilon-approximate ϕ\phi-quantiles for any 0ϕ10\leq\phi\leq 1 can be extracted. The cost is defined as the total number of bits exchanged between the coordinator and the kk sites.

Theorem 10

Any randomized protocol that computes ε\varepsilon-approximate QUAN or ε\varepsilon-approximate min{12,εk2}\min\{\frac{1}{2},\frac{\varepsilon\sqrt{k}}{2}\}-heavy hitters with error probability δ\delta for some sufficiently small constant δ\delta has communication complexity Ω(min{k/ε,1/ε2})\Omega(\min\{\sqrt{k}/\varepsilon,1/\varepsilon^{2}\}) bits.

6.1.1 The kk-GAP-MAJ Problem

Before proving Theorem 10, we introduce a problem we call kk-GAP-MAJ.

In this section we fix β=1/2\beta=1/2. In the kk-GAP-MAJ problem we have kk sites S1,S2,,SkS_{1},S_{2},\ldots,S_{k}, and each site has a bit Zi(1ik)Z_{i}\ (1\leq i\leq k) such that 𝖯𝗋[Zi=0]=𝖯𝗋[Zi=1]=1/2\mathsf{Pr}[Z_{i}=0]=\mathsf{Pr}[Z_{i}=1]=1/2. Let ð\eth be the distribution of {Z1,,Zk}\{Z_{1},\ldots,Z_{k}\}. The sites want to compute the following function.

\displaystyle k\mbox{-GAP-MAJ}(Z_{1},\ldots,Z_{k})=\left\{\begin{array}[]{rl}0,&\text{if }\sum_{i\in[k]}Z_{i}\leq\beta k-\sqrt{\beta k},\\ 1,&\text{if }\sum_{i\in[k]}Z_{i}\geq\beta k+\sqrt{\beta k},\\ *,&\text{otherwise,}\end{array}\right.

where * means that the answer can be either 0 or 11.

Notice that kk-GAP-MAJ is very similar to kk-APPROX-SUM: We set β=1/2\beta=1/2 and directly assign ZiZ_{i}’s to the kk sites. Also, instead of approximating the sum, we just want to decide whether the sum is large or small, up to a gap which is roughly equal to the standard deviation.
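The following Python sketch (illustrative only) samples an input from \eth with \beta=1/2 and evaluates k-GAP-MAJ, returning None in the * case, where either answer is accepted:

import math
import random

def k_gap_maj(bits, beta=0.5):
    """Evaluate k-GAP-MAJ on the sites' bits; None encodes the '*' case."""
    k, s = len(bits), sum(bits)
    if s <= beta * k - math.sqrt(beta * k):
        return 0
    if s >= beta * k + math.sqrt(beta * k):
        return 1
    return None  # within about one standard deviation of beta*k

random.seed(0)
k = 10000
bits = [random.randint(0, 1) for _ in range(k)]  # the distribution eth
print(sum(bits), k_gap_maj(bits))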

We will prove the following theorem for kk-GAP-MAJ.

Theorem 11

Let Π\Pi be the transcript of any private randomness protocol for kk-GAP-MAJ on input distribution ð\eth with error probability δ\delta for some sufficiently small constant δ\delta, then I(Z;Π)=Ω(k)I(Z;\Pi)=\Omega(k).

Remark 1

The theorem holds for private randomness protocols, though for our applications, we only need it to hold for deterministic protocols. Allowing private randomness could be useful when the theorem is used for direct-sum types of arguments in other settings.

The following definition is essentially the same as Definition 2, but with a different setting of parameters. For convenience, we will still use the terms rare and normal. Let \kappa_{1} be a constant chosen later. For a transcript \pi, we define q_{i}^{\pi}=\mathsf{Pr}_{\eth^{\prime}}[Z_{i}=1\ |\ \Pi=\pi]. Thus \sum_{i\in[k]}q_{i}^{\pi}=\mathsf{E}_{\eth^{\prime}}[\sum_{i\in[k]}Z_{i}\ |\ \Pi=\pi].

Definition 8

We say a transcript π\pi is rare+ if i[k]qiπβk+κ1βk\sum_{i\in[k]}q_{i}^{\pi}\geq\beta k+\kappa_{1}\sqrt{\beta k} and rare- if i[k]qiπβkκ1βk\sum_{i\in[k]}q_{i}^{\pi}\leq\beta k-\kappa_{1}\sqrt{\beta k}. In both cases we say π\pi is rare. Otherwise we say it is normal.

Let ð\eth^{\prime} be the joint distribution of ð\eth and the distribution of Π\Pi’s private randomness. The following lemma is essentially the same as Lemma 2. For completeness we still include a proof.

Lemma 16

Under the assumption of Theorem 11, 𝖯𝗋ð[Π is normal]18eκ12/4\mathsf{Pr}_{\eth^{\prime}}[\Pi\textrm{ is normal}]\geq 1-8e^{-\kappa_{1}^{2}/4}.

  • Proof:

    Set κ2=κ11\kappa_{2}=\kappa_{1}-1. We will redefine the term joker which was defined in Definition 3, with a different setting of parameters. We say Z={Z1,Z2,,Zk}Z=\{Z_{1},Z_{2},\ldots,Z_{k}\} is a joker+ if i[k]Ziβk+κ2βk\sum_{i\in[k]}Z_{i}\geq\beta k+\kappa_{2}\sqrt{\beta k}, and a joker- if i[k]Ziβkκ2βk\sum_{i\in[k]}Z_{i}\leq\beta k-\kappa_{2}\sqrt{\beta k}. In both cases we say ZZ is a joker.

    First, we can apply a Chernoff bound on random variables Z1,,ZkZ_{1},\ldots,Z_{k}, and obtain

    𝖯𝗋ð[Z is a joker+]=𝖯𝗋ð[i[k]Ziβk+κ2βk]eκ22/3.\textstyle\mathsf{Pr}_{\eth}[Z\textrm{ is a joker}^{+}]=\mathsf{Pr}_{\eth}\left[\sum_{i\in[k]}Z_{i}\geq\beta k+\kappa_{2}\sqrt{\beta k}\right]\leq e^{-\kappa_{2}^{2}/3}.

Second, by Observation 1, we can apply a Chernoff bound on random variables Z_{1},\ldots,Z_{k} conditioned on \Pi being rare+,

\displaystyle\mathsf{Pr}_{\eth^{\prime}}[Z\textrm{ is a joker}^{+}\ |\ \Pi\textrm{ is rare}^{+}]
\displaystyle\geq\sum_{\pi}\mathsf{Pr}_{\eth^{\prime}}\left[\Pi=\pi\ |\ \Pi\textrm{ is rare}^{+}\right]\mathsf{Pr}_{\eth^{\prime}}\left[Z\textrm{ is a joker}^{+}\ |\ \Pi=\pi,\Pi\textrm{ is rare}^{+}\right]
\displaystyle=\sum_{\pi}\mathsf{Pr}_{\eth^{\prime}}\left[\Pi=\pi\ |\ \Pi\textrm{ is rare}^{+}\right]\textstyle\mathsf{Pr}_{\eth^{\prime}}\left[\left.\sum_{i\in[k]}Z_{i}\geq\beta k+\kappa_{2}\sqrt{\beta k}\ \right|\sum_{i\in[k]}q_{i}^{\pi}\geq\beta k+\kappa_{1}\sqrt{\beta k},\Pi=\pi\right]
\displaystyle\geq\sum_{\pi}\mathsf{Pr}_{\eth^{\prime}}\left[\Pi=\pi\ |\ \Pi\textrm{ is rare}^{+}\right]\left(1-e^{-(\kappa_{1}-\kappa_{2})^{2}/3}\right)
\displaystyle=\left(1-e^{-(\kappa_{1}-\kappa_{2})^{2}/3}\right).

    Finally by Bayes’ theorem, we have that

\displaystyle\mathsf{Pr}_{\eth^{\prime}}[\Pi\textrm{ is rare}^{+}]=\frac{\mathsf{Pr}_{\eth}[Z\textrm{ is a joker}^{+}]\cdot\mathsf{Pr}_{\eth^{\prime}}[\Pi\textrm{ is rare}^{+}\ |\ Z\textrm{ is a joker}^{+}]}{\mathsf{Pr}_{\eth^{\prime}}[Z\textrm{ is a joker}^{+}\ |\ \Pi\textrm{ is rare}^{+}]}\leq\frac{e^{-\kappa_{2}^{2}/3}}{1-e^{-(\kappa_{1}-\kappa_{2})^{2}/3}}.

    By symmetry (since we have set β=1/2\beta=1/2), we can also show that

\mathsf{Pr}_{\eth^{\prime}}[\Pi\textrm{ is rare}^{-}]\leq{e^{-\kappa_{2}^{2}/3}}/{(1-e^{-(\kappa_{1}-\kappa_{2})^{2}/3})}.

Therefore \mathsf{Pr}_{\eth^{\prime}}[\Pi\textrm{ is rare}]\leq 2{e^{-(\kappa_{1}-1)^{2}/3}}/{(1-e^{-1/3})}\leq 8e^{-\kappa_{1}^{2}/4} (recall that we have set \kappa_{2}=\kappa_{1}-1).

The following definition is essentially the same as Definition 4, but is for private randomness protocols.

Definition 9

We say a transcript π\pi is weak if i[k]qiπ(1qiπ)βk/(40c0)\sum_{i\in[k]}q_{i}^{\pi}(1-q_{i}^{\pi})\geq\beta k/(40c_{0}) (for a sufficiently large constant c0c_{0}), and strong otherwise.

The following lemma is similar to Lemma 3, but with the new definition of a normal π\pi.

Lemma 17

Under the assumption of Theorem 11, 𝖯𝗋ð[Π is normal and strong]0.98\mathsf{Pr}_{\eth^{\prime}}[\Pi\textrm{ is normal and strong}]\geq 0.98.

  • Proof:

    We first show that for a normal and weak transcript π\pi, there exists a universal constant cc such that

    𝖯𝗋ð[i[k]Ziβkκ1βk|Π=π]\displaystyle\textstyle\mathsf{Pr}_{\eth^{\prime}}\left[\left.\sum_{i\in[k]}Z_{i}\leq\beta k-\kappa_{1}\sqrt{\beta k}\ \right|\ \Pi=\pi\right] \displaystyle\geq ce60c0κ12,\displaystyle c\cdot e^{-60c_{0}\kappa_{1}^{2}},
    and 𝖯𝗋ð[i[k]Ziβk+κ1βk|Π=π]\displaystyle\text{and \quad}\textstyle\mathsf{Pr}_{\eth^{\prime}}\left[\left.\sum_{i\in[k]}Z_{i}\geq\beta k+\kappa_{1}\sqrt{\beta k}\ \right|\ \Pi=\pi\right] \displaystyle\geq ce60c0κ12.\displaystyle c\cdot e^{-60c_{0}\kappa_{1}^{2}}.

We only need to prove one of the two inequalities; the other follows by symmetry, since we set \beta=1/2. For a normal and weak \Pi=\pi, we have

    𝖵𝖺𝗋ð(i[k]Zi|Π=π)=i[k]𝖵𝖺𝗋ð(Zi|Π=π)βk/(40c0).\displaystyle\mathsf{Var}_{\eth^{\prime}}\left(\sum_{i\in[k]}Z_{i}\ |\ \Pi=\pi\right)=\sum_{i\in[k]}\mathsf{Var}_{\eth^{\prime}}(Z_{i}\ |\ \Pi=\pi)\geq\beta k/(40c_{0}).

    Set κ3=2κ1\kappa_{3}=2\kappa_{1}. By Fact 1 we have for a universal constant cc,

    𝖯𝗋ð[i[k]Zii[k]qiπ+κ3βk|Π=π]\displaystyle\textstyle\mathsf{Pr}_{\eth^{\prime}}\left[\left.\sum_{i\in[k]}Z_{i}\geq\sum_{i\in[k]}q_{i}^{\pi}+\kappa_{3}\sqrt{\beta k}\ \right|\ \Pi=\pi\right]
    \displaystyle\geq ce(κ3βk)23βk/(40c0)ce60c0κ12.\displaystyle c\cdot e^{-\frac{(\kappa_{3}\sqrt{\beta k})^{2}}{3\cdot\beta k/(40c_{0})}}\geq c\cdot e^{-60c_{0}\kappa_{1}^{2}}.

Together with the fact that \pi is normal, we obtain

    𝖯𝗋ð[i[k]Ziβk+κ1βk|Π=π]\displaystyle\textstyle\mathsf{Pr}_{\eth^{\prime}}\left[\left.\sum_{i\in[k]}Z_{i}\geq\beta k+\kappa_{1}\sqrt{\beta k}\ \right|\ \Pi=\pi\right]
    =\displaystyle= 𝖯𝗋ð[i[k]Ziβk+(κ3κ1)βk|Π=π]\displaystyle\textstyle\mathsf{Pr}_{\eth^{\prime}}\left[\left.\sum_{i\in[k]}Z_{i}\geq\beta k+(\kappa_{3}-\kappa_{1})\sqrt{\beta k}\ \right|\ \Pi=\pi\right]
    \displaystyle\geq 𝖯𝗋ð[i[k]Zii[k]qiπκ3βk|Π=π]\displaystyle\textstyle\mathsf{Pr}_{\eth^{\prime}}\left[\left.\sum_{i\in[k]}Z_{i}-\sum_{i\in[k]}q_{i}^{\pi}\geq\kappa_{3}\sqrt{\beta k}\ \right|\ \Pi=\pi\right]
    \displaystyle\geq ce60c0κ12.\displaystyle c\cdot e^{-60c_{0}\kappa_{1}^{2}}.

    Now set κ1=4ln800\kappa_{1}=\sqrt{4\ln 800}. Suppose conditioned on Π\Pi being normal, it is weak with probability more than 0.010.01. Then the error probability of the protocol (taken over the distribution ð\eth^{\prime}) is at least

    (18eκ12/4)0.01ce60c0κ12δ,(1-8e^{-\kappa_{1}^{2}/4})\cdot 0.01\cdot c\cdot e^{-60c_{0}\kappa_{1}^{2}}\geq\delta,

    for a sufficiently small constant δ\delta, violating the success guarantee of Theorem 11. Therefore with probability at least

(1-8e^{-\kappa_{1}^{2}/4})\cdot(1-0.01)\geq 0.98,

    Π\Pi is both normal and strong.  

  • Proof:

    (for Theorem 11) Recall that for a transcript π\pi, we have defined qiπ=𝖯𝗋ð[Zi=1|Π=π]q_{i}^{\pi}=\mathsf{Pr}_{\eth^{\prime}}[Z_{i}=1\ |\ \Pi=\pi]. Let piπ=min{qiπ,1qiπ}p_{i}^{\pi}=\min\{q_{i}^{\pi},1-q_{i}^{\pi}\}, thus piπ[0,1/2]p_{i}^{\pi}\in[0,1/2]. We will omit the superscript π\pi in piπp_{i}^{\pi} when it is clear from the context.

    For a strong π\pi, we have i[k]pi1/2i[k]pi(1pi)<βk40c0\sum_{i\in[k]}p_{i}\cdot 1/2\leq\sum_{i\in[k]}p_{i}(1-p_{i})<\frac{\beta k}{40c_{0}}. Thus i[k]pi<βk20c0\sum_{i\in[k]}p_{i}<\frac{\beta k}{20c_{0}}. For each pip_{i}, if pi<β16c0<1/2p_{i}<\frac{\beta}{16c_{0}}<1/2 (for a sufficiently large constant c0c_{0}), then Hb(pi)<Hb(β16c0)H_{b}(p_{i})<H_{b}\left(\frac{\beta}{16c_{0}}\right) (recall that Hb()H_{b}(\cdot) is the binary entropy function). Otherwise, it holds that Hb(pi)=pilog1pi+(1pi)log11pipilog16c0β+2(1pi)piH_{b}(p_{i})=p_{i}\log\frac{1}{p_{i}}+(1-p_{i})\log\frac{1}{1-p_{i}}\leq p_{i}\log\frac{16c_{0}}{\beta}+2(1-p_{i})p_{i}, since log11pi=pi+pi2/2+pi3/3+2pi\log\frac{1}{1-p_{i}}=p_{i}+p_{i}^{2}/2+p_{i}^{3}/3+\ldots\leq 2p_{i} if pi1/2p_{i}\leq 1/2. Thus we have

    i[k]H(Zi|Π=π)=i[k]Hb(pi)\displaystyle\sum_{i\in[k]}H(Z_{i}\ |\ \Pi=\pi)=\sum_{i\in[k]}H_{b}(p_{i})
    \displaystyle\leq i[k]max{Hb(β16c0),pilog16c0β+2(1pi)pi}\displaystyle\sum_{i\in[k]}\max\left\{H_{b}\left(\frac{\beta}{16c_{0}}\right),\ p_{i}\log\frac{16c_{0}}{\beta}+2(1-p_{i})p_{i}\right\}
    \displaystyle\leq max{kHb(β16c0),i[k]pilog16c0β+2i[k]pi(1pi)}\displaystyle\max\left\{k\cdot H_{b}\left(\frac{\beta}{16c_{0}}\right),\ \sum_{i\in[k]}p_{i}\cdot\log\frac{16c_{0}}{\beta}+2\sum_{i\in[k]}p_{i}(1-p_{i})\right\}
    <\displaystyle< max{kHb(β16c0),βk20c0log16c0β+βk20c0}\displaystyle\max\left\{k\cdot H_{b}\left(\frac{\beta}{16c_{0}}\right),\ \frac{\beta k}{20c_{0}}\cdot\log\frac{16c_{0}}{\beta}+\frac{\beta k}{20c_{0}}\right\}
    =\displaystyle= kHb(β16c0).\displaystyle k\cdot H_{b}\left(\frac{\beta}{16c_{0}}\right).

    Therefore, if 𝖯𝗋ð[Π is strong]0.98\mathsf{Pr}_{\eth^{\prime}}[\Pi\textrm{ is strong}]\geq 0.98, then

    i[k]H(Zi|Π)\displaystyle\sum_{i\in[k]}H(Z_{i}\ |\ \Pi)
    =\displaystyle= π:π is strong(𝖯𝗋ð[Π=π]i[k]H(Zi|Π=π))+π:π is weak(𝖯𝗋ð[Π=π]i[k]H(Zi|Π=π))\displaystyle\sum_{\pi:\pi\textrm{ is strong}}\left(\mathsf{Pr}_{\eth^{\prime}}[\Pi=\pi]\sum_{i\in[k]}H(Z_{i}\ |\ \Pi=\pi)\right)+\sum_{\pi:\pi\textrm{ is weak}}\left(\mathsf{Pr}_{\eth^{\prime}}[\Pi=\pi]\sum_{i\in[k]}H(Z_{i}\ |\ \Pi=\pi)\right)
    \displaystyle\leq 0.98kHb(β16c0)+0.02k1\displaystyle 0.98\cdot k\cdot H_{b}\left(\frac{\beta}{16c_{0}}\right)+0.02\cdot k\cdot 1
    \displaystyle\leq kHb(β16c0)+0.02k.\displaystyle k\cdot H_{b}\left(\frac{\beta}{16c_{0}}\right)+0.02k.

    Now, for a large enough constant c0c_{0} and β=1/2\beta=1/2,

    I(Z;Π)i[k]I(Zi;Π)i[k](H(Zi)H(Zi|Π))kHb(β)kHb(β16c0)0.02kΩ(k).I(Z;\Pi)\geq\sum_{i\in[k]}I(Z_{i};\Pi)\geq\sum_{i\in[k]}(H(Z_{i})-H(Z_{i}\ |\ \Pi))\geq k\cdot H_{b}(\beta)-k\cdot H_{b}\left(\frac{\beta}{16c_{0}}\right)-0.02k\geq\Omega(k).
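Numerically, the final gap is a positive constant. A quick check of the coefficient in k\cdot H_{b}(\beta)-k\cdot H_{b}\left(\frac{\beta}{16c_{0}}\right)-0.02k with \beta=1/2 (the value c_{0}=100 below is an arbitrary illustrative choice, since the proof only needs c_{0} sufficiently large):

import math

def H_b(x):
    """Binary entropy function, in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

beta, c0 = 0.5, 100
print(H_b(beta) - H_b(beta / (16 * c0)) - 0.02)  # ~0.976, so I(Z; Pi) = Omega(k)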
     

6.1.2 Proof of Theorem 10

  • Proof:

    We first prove the theorem for QUAN. In the case that k1/ε2k\geq 1/\varepsilon^{2}, we prove an Ω(1/ε2)\Omega(1/\varepsilon^{2}) information complexity lower bound. We prove this by a simple reduction from kk-GAP-MAJ. We can assume k=1/ε2k=1/\varepsilon^{2} since if k>1/ε2k>1/\varepsilon^{2} then we can just give inputs to the first 1/ε21/\varepsilon^{2} sites. Set β=1/2\beta=1/2. Given a random input Z1,Z2,,ZkZ_{1},Z_{2},\ldots,Z_{k} of kk-GAP-MAJ chosen from distribution ð\eth, we simply give ZiZ_{i} to site SiS_{i}. It is easy to observe that a protocol that computes ε/2\varepsilon/2-approximate QUAN on A={Z1,Z2,,Zk}A=\{Z_{1},Z_{2},\ldots,Z_{k}\} with error probability δ\delta also computes kk-GAP-MAJ on input distribution ð\eth with error probability δ\delta, since the answer to kk-GAP-MAJ is simply the answer to 12\frac{1}{2}-quantile. The Ω(1/ε2)\Omega(1/\varepsilon^{2}) lower bound follows from Theorem 11.

In the case that k<1/\varepsilon^{2}, we prove an \Omega(\sqrt{k}/\varepsilon) information complexity lower bound. We again perform a reduction from k-GAP-MAJ. Set \beta=1/2. The reduction works as follows. We are given \ell=1/(\varepsilon\sqrt{k}) independent copies of k-GAP-MAJ with Z^{1},Z^{2},\ldots,Z^{\ell} being the inputs, where Z^{i}=\{Z^{i}_{1},Z^{i}_{2},\ldots,Z^{i}_{k}\}\in\{0,1\}^{k} is chosen from distribution \eth. We construct an input for QUAN by giving the j-th site the item set A_{j}=\{Z^{1}_{j},2+Z^{2}_{j},4+Z^{3}_{j},\ldots,2(\ell-1)+Z^{\ell}_{j}\} (see the sketch following this paragraph). It is not difficult to observe that a protocol that computes \varepsilon/2-approximate QUAN on the set A=\{A_{1},A_{2},\ldots,A_{k}\} with error probability \delta also computes the answer to each copy of k-GAP-MAJ on distribution \eth with error probability \delta, simply by returning (X_{i}-2(i-1)) for the i-th copy of k-GAP-MAJ, where X_{i} is the \varepsilon/2-approximate \frac{i-1/2}{\ell}-quantile.
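A sketch of the packing step in this reduction, with illustrative names: Z[i][j] is site j's bit in the (i+1)-st k-GAP-MAJ copy, site j's item set is A_j, and each copy's answer is read off from the corresponding approximate quantile.

def build_quan_inputs(Z):
    """Pack l = len(Z) copies of k-GAP-MAJ into one QUAN instance.

    Site j receives A_j = {Z^1_j, 2 + Z^2_j, ..., 2(l-1) + Z^l_j}, so the
    i-th copy occupies the value range {2(i-1), 2(i-1) + 1}.
    """
    l, k = len(Z), len(Z[0])
    return [[2 * i + Z[i][j] for i in range(l)] for j in range(k)]

def recover_copy_answer(i, X_i):
    """Copy i's k-GAP-MAJ answer from the approximate (i - 1/2)/l-quantile X_i."""
    return X_i - 2 * (i - 1)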

    On the other hand, any protocol that computes each of the \ell independent copies of kk-GAP-MAJ correctly with error probability δ\delta for a sufficiently small constant δ\delta has information complexity Ω(k/ε)\Omega(\sqrt{k}/\varepsilon). This is simply because for any transcript Π\Pi, by Theorem 11, independence and the chain rule we have that

    I(Z1,Z2,,Z;Π)i[]I(Zi;Π)Ω(k)Ω(k/ε).I(Z^{1},Z^{2},\ldots,Z^{\ell};\Pi)\geq\sum_{i\in[\ell]}I(Z^{i};\Pi)\geq\Omega(\ell k)\geq\Omega(\sqrt{k}/\varepsilon). (50)

The proof for heavy hitters is by essentially the same reduction as that for QUAN. In the case that k=1/\varepsilon^{2} (or k\geq 1/\varepsilon^{2} in general), a protocol that computes \varepsilon/2-approximate \frac{1}{2}-heavy hitters on A=\{Z_{1},Z_{2},\ldots,Z_{k}\} with error probability \delta also computes k-GAP-MAJ on input distribution \eth with error probability \delta. In the case that k<1/\varepsilon^{2}, it also holds that a protocol that computes \varepsilon/2-approximate \frac{\varepsilon\sqrt{k}}{2}-heavy hitters on the set A=\{A_{1},A_{2},\ldots,A_{k}\}, where A_{j}=\{Z^{1}_{j},2+Z^{2}_{j},4+Z^{3}_{j},\ldots,2(\ell-1)+Z^{\ell}_{j}\}, with error probability \delta also computes the answer to each copy of k-GAP-MAJ on distribution \eth with error probability \delta.

6.2 Entropy Estimation

We are given a set A={(e1,a1),(e2,a2),,(em,am)}A=\{(e_{1},a_{1}),(e_{2},a_{2}),\ldots,(e_{m},a_{m})\} where each ek(k[m])e_{k}\ (k\in[m]) is drawn from the universe [N][N], and ak{+1,1}a_{k}\in\{+1,-1\} denotes an insertion or a deletion of item eke_{k}. The entropy estimation problem (ENTROPY) asks for the value H(A)=j[N](|fj|/L)log(L/|fj|)H(A)=\sum_{j\in[N]}(\left|f_{j}\right|/L)\log(L/\left|f_{j}\right|) where fj=k:ek=jakf_{j}=\sum_{k:e_{k}=j}a_{k} and L=j[N]|fj|L=\sum_{j\in[N]}\left|f_{j}\right|. In the ε\varepsilon-approximate ENTROPY problem, the items in the set AA are distributed among kk sites who want to compute a value H~(A)\tilde{H}(A) for which |H~(A)H(A)|ε\left|\tilde{H}(A)-H(A)\right|\leq\varepsilon. In this section we prove the following theorem.
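For reference, H(A) can be computed exactly offline by a direct transcription of the definition (we take the logarithm base 2, which the definition leaves implicit):

import math
from collections import Counter

def empirical_entropy(updates):
    """H(A) = sum_j (|f_j|/L) log(L/|f_j|) for updates (e_k, a_k) with a_k in {+1,-1}."""
    f = Counter()
    for e, a in updates:
        f[e] += a
    L = sum(abs(x) for x in f.values())
    return sum((abs(x) / L) * math.log2(L / abs(x)) for x in f.values() if x != 0)

# Example: frequencies (2, 1, 0, 1) give L = 4 and H = 1.5.
print(empirical_entropy([(1, +1), (1, +1), (2, +1), (3, +1), (3, -1), (4, +1)]))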

Theorem 12

Any randomized protocol that computes ε\varepsilon-approximate ENTROPY  with error probability at most δ\delta for some sufficiently small constant δ\delta has communication complexity Ω(k/ε2){\Omega}(k/\varepsilon^{2}).

  • Proof:

As with F_{2}, we prove the lower bound for the ENTROPY problem by a reduction from k-BTX. Given a random input B for k-BTX according to distribution \nu with n=\gamma^{2}k^{2} for some parameter \gamma=\log^{-d}(k/\varepsilon) for a large enough constant d, we construct an input for ENTROPY as follows. Each block j\in[1/\varepsilon^{2}] in k-BTX corresponds to one coordinate item e_{j} in the vector for ENTROPY; so we have in total 1/\varepsilon^{2} items in the entropy vector. The k sites first use shared randomness to sample \gamma^{2}k^{2}/\varepsilon^{2} random \pm 1 values, one for each coordinate across all blocks in B. (By Newman's theorem (cf. [43], Chapter 3) we can get rid of the public randomness by increasing the total communication complexity by no more than an additive O(\log(\gamma k/\varepsilon)) term, which is negligible in our proof.) Let \{R_{1}^{1},R_{1}^{2},\ldots,R_{\gamma^{2}k^{2}}^{1/\varepsilon^{2}}\} be these random \pm 1 values. Each site looks at each of its bits B_{i,\ell}^{j}\ (i\in[k],\ell\in[\gamma^{2}k^{2}],j\in[1/\varepsilon^{2}]), and generates an item (e_{j},R_{\ell}^{j}) (recall that R_{\ell}^{j} denotes insertion or deletion of the item e_{j}) if B_{i,\ell}^{j}=1. Call the resulting input distribution \nu^{\prime}.

We call an item in group G_{P} if the k-XOR instance in the corresponding block is a 00-instance; in group G_{Q} if it is a 11-instance; and in group G_{U} if it is a 01-instance or a 10-instance. Group G_{U} is further divided into two subgroups G_{U_{1}} and G_{U_{2}}, containing all 10-instances and all 01-instances, respectively. Let P,Q,U,U_{1},U_{2} be the cardinalities of these groups. Now we consider the frequency of each item type.

1.

      For an item ejGPe_{j}\in G_{P}, its frequency fjf_{j} is distributed as follows: we choose a value ii from the binomial distribution on nn values each with probability 1/21/2, then we take the sum κj\kappa_{j} of ii i.i.d. ±1\pm 1 random variables. We can thus write |fj|=|κj||f_{j}|=|\kappa_{j}|.

2.

      For an item ejGQe_{j}\in G_{Q}, its frequency fjf_{j} is distributed as follows: we choose a value ii from the binomial distribution on nn values each with probability 1/21/2, then we take the sum κj\kappa_{j} of ii i.i.d. ±1\pm 1 random variables. Then we add the value RjkR_{\ell^{*}}^{j}\cdot k, where \ell^{*} is the index of the special column in block jj. We can thus write |fj||f_{j}| as |k+Rjκj||k+R_{\ell^{*}}^{j}\cdot\kappa_{j}|. By a Chernoff-Hoeffding bound, with probability 12eλ2/21-2e^{-\lambda^{2}/2}, we have |κj|λγk\left|\kappa_{j}\right|\leq\lambda\gamma k. We choose λ=log(k/ε)\lambda=\log(k/\varepsilon), and thus λγ=o(1)\lambda\gamma=o(1). Therefore κj\kappa_{j} will not affect the sign of fjf_{j} for any jj (by a union bound) and we can write |fj|=k+Rjκj\left|f_{j}\right|=k+R_{\ell^{*}}^{j}\cdot\kappa_{j}. Since κj\kappa_{j} is symmetric about 0 and RjR_{\ell^{*}}^{j} is a random ±1\pm 1 variable, we can simply drop RjR_{\ell^{*}}^{j} and write |fj|=k+κj\left|f_{j}\right|=k+\kappa_{j}.

3.

For an item e_{j}\in G_{U}, its frequency f_{j} is distributed as follows: we choose a value i from the binomial distribution on n values each with probability 1/2, then we take the sum \kappa_{j} of i i.i.d. \pm 1 random variables. Then we add the value R_{\ell^{*}}^{j}\cdot k/2, where \ell^{*} is the index of the special column in block j. We can thus write |f_{j}| as |k/2+R_{\ell^{*}}^{j}\cdot\kappa_{j}|. As in the previous case, with probability 1-2e^{-\lambda^{2}/2}, \kappa_{j} will not affect the sign of f_{j} and we can write \left|f_{j}\right|=k/2+\kappa_{j} (a toy sampler of these three cases follows this list).
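The following toy sampler mirrors the three cases above (the sizes are illustrative; the reduction itself uses n=\gamma^{2}k^{2} with \gamma polylogarithmically small, so that |\kappa_{j}|=o(k) with high probability):

import random

def sample_kappa(n, rng):
    """kappa_j: draw a Binomial(n, 1/2) count of i.i.d. +-1 signs, then sum them."""
    i = sum(rng.random() < 0.5 for _ in range(n))
    return sum(rng.choice((-1, 1)) for _ in range(i))

rng = random.Random(1)
k, n = 64, 256  # toy sizes only
kappa = sample_kappa(n, rng)
print(abs(kappa), abs(k + kappa), abs(k // 2 + kappa))  # |f_j| in G_P, G_Q, G_U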

By a union bound, with error probability at most \delta_{1}=1/\varepsilon^{2}\cdot 2e^{-\lambda^{2}/2}=o(1), each \kappa_{j}\ (e_{j}\in G_{Q}\cup G_{U}) will not affect the sign of the corresponding f_{j}. Moreover, by another Chernoff bound we have that with error probability \delta_{2}=10e^{-c_{0}^{2}/3}, P,Q,U_{1},U_{2} are equal to 1/(4\varepsilon^{2})\pm c_{0}/\varepsilon, and U=1/(2\varepsilon^{2})\pm c_{0}/\varepsilon. Here \delta_{2} can be made sufficiently small by setting the constant c_{0} sufficiently large. Thus with an arbitrarily small constant error \delta_{0}=\delta_{1}+\delta_{2}, all the concentration results claimed above hold. For simplicity we neglect this part of the error, since it can be made arbitrarily small and will not affect any of the analysis. In the rest of this section we will ignore arbitrarily small errors and drop some lower-order terms, as long as doing so does not affect the analysis.

The analysis of the next part is similar to that for our F_{2} lower bound, where we end up computing F_{2} on three different vectors. Let us calculate H_{0}, H_{1} and H_{2}, which stand for the entropies of the items held by all k sites, by the first k/2 sites, and by the second k/2 sites, respectively. Then we show that using H_{0}, H_{1} and H_{2} we can estimate U well, and thus compute k-BTX correctly with an arbitrarily small constant error. Thus if there is a protocol for ENTROPY on distribution \nu^{\prime} then we obtain a protocol for k-BTX on distribution \nu with the same communication complexity, completing the reduction and consequently proving Theorem 12.

    Before computing H0,H1H_{0},H_{1} and H2H_{2}, we first compute the total number LL of items. We can write

    L\displaystyle L =\displaystyle= ejGP|fj|+ejGQ|fj|+ejGU|fj|\displaystyle\sum_{e_{j}\in G_{P}}\left|f_{j}\right|+\sum_{e_{j}\in G_{Q}}\left|f_{j}\right|+\sum_{e_{j}\in G_{U}}\left|f_{j}\right| (51)
    =\displaystyle= Qk+Uk/2+ejGp|κj|+ejGQGUκj.\displaystyle Q\cdot k+U\cdot k/2+\sum_{e_{j}\in G_{p}}\left|\kappa_{j}\right|+\sum_{e_{j}\in G_{Q}\cup G_{U}}\kappa_{j}.

    The absolute value of the fourth term in (51) can be bounded by O(γk/ε)O(\gamma k/\varepsilon) with arbitrarily large constant probability, using a Chernoff-Hoeffding bound, which will be o(εL)o(\varepsilon L) and thus can be dropped. For the third term, by Chebyshev’s inequality we can assume (by increasing the constant in the big-Oh) that with arbitrarily large constant probability, ejGp|κj|=(1±ε)1/(4ε2)𝐄[|κj|]\sum_{e_{j}\in G_{p}}|\kappa_{j}|=(1\pm\varepsilon)\cdot 1/(4\varepsilon^{2})\cdot{\bf E}[|\kappa_{j}|], where 𝐄[|κj|]=Θ(γk){\bf E}[|\kappa_{j}|]=\Theta(\gamma k) follows by approximating the binomial distribution by a normal distribution (or, e.g., Khintchine’s inequality). Let z1=𝐄[|κj|]z_{1}={\bf E}[|\kappa_{j}|] be a value which can be computed exactly. Then, ejGp|κj|=1/(4ε2)z1±O(γk/ε)=z1/(4ε2)±o(εL)\sum_{e_{j}\in G_{p}}|\kappa_{j}|=1/(4\varepsilon^{2})\cdot z_{1}\pm O(\gamma k/\varepsilon)=z_{1}/(4\varepsilon^{2})\pm o(\varepsilon L), and so we can drop the additive o(εL)o(\varepsilon L) term.

    Finally, we get,

    L\displaystyle L =\displaystyle= Qk+Uk/2+r1\displaystyle Q\cdot k+U\cdot k/2+r_{1} (52)

where r_{1}=z_{1}/(4\varepsilon^{2})=\Theta(\gamma k/\varepsilon^{2}) is a value that can be computed by any site without any communication.

Let p_{j}=\left|f_{j}\right|/L\ (j\in[1/\varepsilon^{2}]). We can write H_{0} as follows.

\displaystyle H_{0}=\sum_{e_{j}\in P}p_{j}\log(1/p_{j})+\sum_{e_{j}\in Q}p_{j}\log(1/p_{j})+\sum_{e_{j}\in U}p_{j}\log(1/p_{j}) (53)
=\log L-S/L,

    where

    S\displaystyle S =\displaystyle= ejP|fj|log|fj|+ejQ|fj|log|fj|+ejU|fj|log|fj|\displaystyle\sum_{e_{j}\in P}\left|f_{j}\right|\log\left|f_{j}\right|+\sum_{e_{j}\in Q}\left|f_{j}\right|\log\left|f_{j}\right|+\sum_{e_{j}\in U}\left|f_{j}\right|\log\left|f_{j}\right| (54)

    We consider the three summands in (54) one by one. For the second term in (54), we have

    ejQ|fj|log|fj|\displaystyle\sum_{e_{j}\in Q}\left|f_{j}\right|\log\left|f_{j}\right| =\displaystyle= ejQ(k+κj)log(k+κj)\displaystyle\sum_{e_{j}\in Q}(k+\kappa_{j})\log(k+\kappa_{j}) (55)
    =\displaystyle= Qklogk+kejQlog(1+κj/k)\displaystyle Q\cdot k\log k+k\sum_{e_{j}\in Q}\log(1+\kappa_{j}/k)
    =\displaystyle= Qklogk±O(ejQκj).\displaystyle Q\cdot k\log k\pm O(\sum_{e_{j}\in Q}\kappa_{j}).

The second term in (55) is at most o(\varepsilon Q\cdot\gamma k)=o(k/\varepsilon), and can be dropped. By a similar analysis we can obtain that the third term in (54) is (up to an o(k/\varepsilon) term)

    ejU|fj|log|fj|=U(k/2)log(k/2).\displaystyle\sum_{e_{j}\in U}\left|f_{j}\right|\log\left|f_{j}\right|=U\cdot(k/2)\log(k/2). (56)

    Now consider the first term. We have

    ejP|fj|log|fj|\displaystyle\sum_{e_{j}\in P}\left|f_{j}\right|\log\left|f_{j}\right| =\displaystyle= ejP|κj|log|κj|\displaystyle\sum_{e_{j}\in P}\left|\kappa_{j}\right|\log\left|\kappa_{j}\right| (57)
    =\displaystyle= (1/4ε2±O(1/ε))𝖤[|κj|log|κj|]\displaystyle(1/4\varepsilon^{2}\pm O(1/\varepsilon))\cdot\mathsf{E}[\left|\kappa_{j}\right|\log\left|\kappa_{j}\right|]
    =\displaystyle= (1/4ε2)𝖤[|κj|log|κj|]±O(1/ε)𝖤[|κj|log|κj|]\displaystyle(1/4\varepsilon^{2})\cdot\mathsf{E}[\left|\kappa_{j}\right|\log\left|\kappa_{j}\right|]\pm O(1/\varepsilon)\cdot\mathsf{E}[\left|\kappa_{j}\right|\log\left|\kappa_{j}\right|]

where z2=𝖤[|κj|log|κj|]=O(γklogk)z_{2}=\mathsf{E}[\left|\kappa_{j}\right|\log\left|\kappa_{j}\right|]=O(\gamma k\log k) can be computed exactly. Then the second term in (57) is at most O(γklogk/ε)=o(k/ε)O(\gamma k\log k/\varepsilon)=o(k/\varepsilon), and thus can be dropped. Let r2=(1/4ε2)z2=O(γklogk/ε2)r_{2}=(1/4\varepsilon^{2})\cdot z_{2}=O(\gamma k\log k/\varepsilon^{2}). By Equations (52)–(57) we can write

H0\displaystyle H_{0} =\displaystyle= log(Qk+Uk/2+r1)Qklogk+U(k/2)log(k/2)+r2Qk+Uk/2+r1.\displaystyle\log(Q\cdot k+U\cdot k/2+r_{1})-\frac{Q\cdot k\log k+U\cdot(k/2)\log(k/2)+r_{2}}{Q\cdot k+U\cdot k/2+r_{1}}. (58)
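As a sanity check on (58), in the degenerate case γ = 0 (so that all κj = 0 and r1 = r2 = 0) the formula must agree exactly with the entropy computed directly from the frequencies. A minimal Python sketch with illustrative values of k, Q, and U:

```python
import numpy as np

k, Q, U = 64, 50, 100                      # illustrative parameters (not from the construction)
f = np.array([k] * Q + [k // 2] * U, dtype=float)  # Q elements of frequency k, U of frequency k/2
L = f.sum()

H_direct = np.sum((f / L) * np.log2(L / f))
H_formula = (np.log2(Q * k + U * (k / 2))
             - (Q * k * np.log2(k) + U * (k / 2) * np.log2(k / 2))
             / (Q * k + U * (k / 2)))      # Eq. (58) with r1 = r2 = 0
print(H_direct, H_formula)                 # the two agree exactly when all kappa_j = 0
```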

Let U=1/(2ε2)+UU=1/(2\varepsilon^{2})+U^{\prime} and Q=1/(4ε2)+QQ=1/(4\varepsilon^{2})+Q^{\prime}, so that U=O(1/ε)U^{\prime}=O(1/\varepsilon) and Q=O(1/ε)Q^{\prime}=O(1/\varepsilon). Next we convert the RHS of (58) into a linear function of UU^{\prime} and QQ^{\prime}.

H0\displaystyle H_{0} =\displaystyle= log(k/(2ε2)+Qk+Uk/2+r1)(14ε2+Q)klogk+(1(2ε2)+U)k2logk2+r2k/(2ε2)+Qk+Uk/2+r1\displaystyle\log(k/(2\varepsilon^{2})+Q^{\prime}k+U^{\prime}k/2+r_{1})-\frac{\left(\frac{1}{4\varepsilon^{2}}+Q^{\prime}\right)\cdot k\log k+\left(\frac{1}{(2\varepsilon^{2})}+U^{\prime}\right)\cdot\frac{k}{2}\log\frac{k}{2}+r_{2}}{k/(2\varepsilon^{2})+Q^{\prime}k+U^{\prime}k/2+r_{1}} (61)
    =\displaystyle= log(k/(2ε2)+r1)+log(1+(U+2Q)ε21+2ε2r1/k)\displaystyle\log(k/(2\varepsilon^{2})+r_{1})+\log\left(1+(U^{\prime}+2Q^{\prime})\frac{\varepsilon^{2}}{1+2\varepsilon^{2}r_{1}/k}\right)
((14ε2+Q)klogk+(1(2ε2)+U)k2logk2+r2)2ε2k+2ε2r1\displaystyle-\ \left(\left(\frac{1}{4\varepsilon^{2}}+Q^{\prime}\right)\cdot k\log k+\left(\frac{1}{(2\varepsilon^{2})}+U^{\prime}\right)\cdot\frac{k}{2}\log\frac{k}{2}+r_{2}\right)\cdot\frac{2\varepsilon^{2}}{k+2\varepsilon^{2}r_{1}}\cdot
    (1(U+2Q)ε21+2ε2r1/k)\displaystyle\left(1-(U^{\prime}+2Q^{\prime})\frac{\varepsilon^{2}}{1+2\varepsilon^{2}r_{1}/k}\right)
    ±O(ε2)\displaystyle\pm\ O(\varepsilon^{2})
    =\displaystyle= log(k/(2ε2)+r1)+(U+2Q)ε21+2ε2r1/k\displaystyle\log(k/(2\varepsilon^{2})+r_{1})+(U^{\prime}+2Q^{\prime})\frac{\varepsilon^{2}}{1+2\varepsilon^{2}r_{1}/k}
((k(2logk1)4ε2+r2)+Qklogk+Uk2logk2)2ε2k+2ε2r1\displaystyle-\ \left(\left(\frac{k(2\log{k}-1)}{4\varepsilon^{2}}+r_{2}\right)+Q^{\prime}\cdot k\log k+U^{\prime}\cdot\frac{k}{2}\log\frac{k}{2}\right)\cdot\frac{2\varepsilon^{2}}{k+2\varepsilon^{2}r_{1}}\cdot
    (1Uε21+2ε2r1/kQ2ε21+2ε2r1/k)\displaystyle\quad\left(1-U^{\prime}\cdot\frac{\varepsilon^{2}}{1+2\varepsilon^{2}r_{1}/k}-Q^{\prime}\cdot\frac{2\varepsilon^{2}}{1+2\varepsilon^{2}r_{1}/k}\right)
    ±O(ε2)\displaystyle\pm\ O(\varepsilon^{2})
    =\displaystyle= α1+α2U+α3Q,\displaystyle\alpha_{1}+\alpha_{2}U^{\prime}+\alpha_{3}Q^{\prime}, (62)

up to additive o(ε)o(\varepsilon) terms (see the discussion below), where

α1\displaystyle\alpha_{1} =\displaystyle= log(k/(2ε2)+r1)(k(2logk1)4ε2+r2)2ε2k+2ε2r1,\displaystyle\log(k/(2\varepsilon^{2})+r_{1})-\left(\frac{k(2\log{k}-1)}{4\varepsilon^{2}}+r_{2}\right)\cdot\frac{2\varepsilon^{2}}{k+2\varepsilon^{2}r_{1}},
α2\displaystyle\alpha_{2} =\displaystyle= ε21+2ε2r1/k(k2logk2(k(2logk1)4ε2+r2)ε21+2ε2r1/k)2ε2k+2ε2r1,\displaystyle\frac{\varepsilon^{2}}{1+2\varepsilon^{2}r_{1}/k}-\left(\frac{k}{2}\log\frac{k}{2}-\left(\frac{k(2\log{k}-1)}{4\varepsilon^{2}}+r_{2}\right)\cdot\frac{\varepsilon^{2}}{1+2\varepsilon^{2}r_{1}/k}\right)\cdot\frac{2\varepsilon^{2}}{k+2\varepsilon^{2}r_{1}},
α3\displaystyle\alpha_{3} =\displaystyle= 2ε21+2ε2r1/k(klogk(k(2logk1)4ε2+r2)2ε21+2ε2r1/k)2ε2k+2ε2r1,\displaystyle\frac{2\varepsilon^{2}}{1+2\varepsilon^{2}r_{1}/k}-\left(k\log k-\left(\frac{k(2\log{k}-1)}{4\varepsilon^{2}}+r_{2}\right)\cdot\frac{2\varepsilon^{2}}{1+2\varepsilon^{2}r_{1}/k}\right)\cdot\frac{2\varepsilon^{2}}{k+2\varepsilon^{2}r_{1}}, (63)

From the first to the second equality in (61) we use the fact that 1/(1+ε)=1ε+O(ε2)1/(1+\varepsilon)=1-\varepsilon+O(\varepsilon^{2}). From the second to the third equality we use the fact that log(1+ε)=ε+O(ε2)\log(1+\varepsilon)=\varepsilon+O(\varepsilon^{2}). From (61) to (62) we use the fact that all terms of the form UQ,U2,Q2U^{\prime}Q^{\prime},{U^{\prime}}^{2},{Q^{\prime}}^{2} are at most ±o(ε)\pm o(\varepsilon) (we are assuming O(ε2logk)=o(ε)O(\varepsilon^{2}\log k)=o(\varepsilon), which is fine since we are neglecting polylog(N)(N) factors); therefore we can drop all of them, together with the other ±O(ε2)\pm O(\varepsilon^{2}) terms, and obtain a linear function in UU^{\prime} and QQ^{\prime}.
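The two first-order expansions used above are easy to check symbolically; the sympy snippet below is a minimal illustration. (For the dropped cross terms: a term like U′Q′ appears with a weight of O(ε⁴ log k), and since U′Q′ = O(1/ε²) its total contribution is O(ε² log k) = o(ε) under the stated assumption.)

```python
import sympy as sp

x = sp.symbols('x')
print(sp.series(1 / (1 + x), x, 0, 2))     # 1 - x + O(x**2)
print(sp.series(sp.log(1 + x), x, 0, 2))   # x + O(x**2)
```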

Next we calculate H1H_{1}; the calculation of H2H_{2} is exactly the same. The values t1,t2t_{1},t_{2} used in the following expressions play the same roles as r1,r2r_{1},r_{2} did in the calculation of H0H_{0}, with t1=Θ(γk/ε2)t_{1}=\Theta(\gamma k/\varepsilon^{2}) and t2=O(γklogk/ε2)t_{2}=O(\gamma k\log k/\varepsilon^{2}). Set U1=U11/(4ε2)U^{\prime}_{1}=U_{1}-1/(4\varepsilon^{2}) and U2=U21/(4ε2)U^{\prime}_{2}=U_{2}-1/(4\varepsilon^{2}).

H1\displaystyle H_{1} =\displaystyle= log((Q+U1)k/2+t1)Qk2logk2+U1k2logk2+t2(Q+U1)k/2+t1\displaystyle\log((Q+U_{1})k/2+t_{1})-\frac{Q\cdot\frac{k}{2}\log\frac{k}{2}+U_{1}\cdot\frac{k}{2}\log\frac{k}{2}+t_{2}}{(Q+U_{1})k/2+t_{1}} (64)
=\displaystyle= log(k/(4ε2)+(Q+U1)k/2+t1)Qk2logk2+U1k2logk2+klog(k/2)4ε2+t2k/(4ε2)+(Q+U1)k/2+t1\displaystyle\log(k/(4\varepsilon^{2})+(Q^{\prime}+U^{\prime}_{1})k/2+t_{1})-\frac{Q^{\prime}\cdot\frac{k}{2}\log\frac{k}{2}+U^{\prime}_{1}\cdot\frac{k}{2}\log\frac{k}{2}+\frac{k\log(k/2)}{4\varepsilon^{2}}+t_{2}}{k/(4\varepsilon^{2})+(Q^{\prime}+U^{\prime}_{1})k/2+t_{1}}
=\displaystyle= log(k/(4ε2)+t1)+(Q+U1)2ε21+4ε2t1/k\displaystyle\log(k/(4\varepsilon^{2})+t_{1})+(Q^{\prime}+U^{\prime}_{1})\frac{2\varepsilon^{2}}{1+4\varepsilon^{2}t_{1}/k}
(Qk2logk2+U1k2logk2+klog(k/2)4ε2+t2)4ε2k+4ε2t1(1Q2ε2+U12ε21+4ε2t1/k)\displaystyle-\left(Q^{\prime}\cdot\frac{k}{2}\log\frac{k}{2}+U^{\prime}_{1}\cdot\frac{k}{2}\log\frac{k}{2}+\frac{k\log(k/2)}{4\varepsilon^{2}}+t_{2}\right)\cdot\frac{4\varepsilon^{2}}{k+4\varepsilon^{2}t_{1}}\cdot\left(1-\frac{Q^{\prime}\cdot 2\varepsilon^{2}+U^{\prime}_{1}\cdot 2\varepsilon^{2}}{1+4\varepsilon^{2}t_{1}/k}\right)
=\displaystyle= β1+β2U1+β3Q.\displaystyle\beta_{1}+\beta_{2}U^{\prime}_{1}+\beta_{3}Q^{\prime}.

where

β1\displaystyle\beta_{1} =\displaystyle= log(k/(4ε2)+t1)(klog(k/2)4ε2+t2)4ε2k+4ε2t1,\displaystyle\log(k/(4\varepsilon^{2})+t_{1})-\left(\frac{k\log(k/2)}{4\varepsilon^{2}}+t_{2}\right)\cdot\frac{4\varepsilon^{2}}{k+4\varepsilon^{2}t_{1}},
β2\displaystyle\beta_{2} =\displaystyle= 2ε21+4ε2t1/k(k2logk2(klog(k/2)4ε2+t2)2ε21+4ε2t1/k)4ε2k+4ε2t1,\displaystyle\frac{2\varepsilon^{2}}{1+4\varepsilon^{2}t_{1}/k}-\left(\frac{k}{2}\log\frac{k}{2}-\left(\frac{k\log(k/2)}{4\varepsilon^{2}}+t_{2}\right)\frac{2\varepsilon^{2}}{1+4\varepsilon^{2}t_{1}/k}\right)\frac{4\varepsilon^{2}}{k+4\varepsilon^{2}t_{1}},
β3\displaystyle\beta_{3} =\displaystyle= 2ε21+4ε2t1/k(k2logk2(klog(k/2)4ε2+t2)2ε21+4ε2t1/k)4ε2k+4ε2t1.\displaystyle\frac{2\varepsilon^{2}}{1+4\varepsilon^{2}t_{1}/k}-\left(\frac{k}{2}\log\frac{k}{2}-\left(\frac{k\log(k/2)}{4\varepsilon^{2}}+t_{2}\right)\frac{2\varepsilon^{2}}{1+4\varepsilon^{2}t_{1}/k}\right)\frac{4\varepsilon^{2}}{k+4\varepsilon^{2}t_{1}}. (65)

By the same calculation we can obtain the following equation for H2H_{2}.

H2\displaystyle H_{2} =\displaystyle= β1+β2U2+β3Q.\displaystyle\beta_{1}+\beta_{2}U^{\prime}_{2}+\beta_{3}Q^{\prime}. (66)

Note that U=U1+U2U^{\prime}=U^{\prime}_{1}+U^{\prime}_{2}. Combining (64) and (66) we have

H1+H2\displaystyle H_{1}+H_{2} =\displaystyle= 2β1+β2U+2β3Q.\displaystyle 2\beta_{1}+\beta_{2}U^{\prime}+2\beta_{3}Q^{\prime}. (67)

It is easy to verify that Equations (62) and (67) are linearly independent: by direct calculation (notice that r1,r2r_{1},r_{2} are lower-order terms) we obtain α2=3ε22(1±o(1))\alpha_{2}=\frac{3\varepsilon^{2}}{2}(1\pm o(1)) and α3=ε2(1±o(1))\alpha_{3}=\varepsilon^{2}(1\pm o(1)). Therefore α2/α3=3(1±o(1))/2\alpha_{2}/\alpha_{3}=3(1\pm o(1))/2, while similarly β2/2β3=(1±o(1))/2\beta_{2}/2\beta_{3}=(1\pm o(1))/2, so the two equations are indeed linearly independent. Furthermore, we can compute all the coefficients α1,α2,α3,β1,β2,β3\alpha_{1},\alpha_{2},\alpha_{3},\beta_{1},\beta_{2},\beta_{3} up to a (1±o(ε))(1\pm o(\varepsilon)) factor. Thus if we have σε\sigma\varepsilon-additive approximations of H0,H1,H2H_{0},H_{1},H_{2} for a sufficiently small constant σ\sigma, then by Equations (62) and (67) we can estimate UU^{\prime} (and thus UU) up to an additive error of σ/ε\sigma^{\prime}/\varepsilon for a sufficiently small constant σ\sigma^{\prime}, and therefore solve kk-BTX. This completes the proof.  
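To make this last step concrete, here is a minimal Python sketch of how U′ and Q′ are recovered from approximations of H0 and H1+H2. The coefficients are only the leading-order asymptotics derived above (α2 ≈ 3ε²/2, α3 ≈ ε², β2 = β3 ≈ 2ε²); the constant terms and the "true" values are made up for illustration, and the σε-noise on the entropy estimates is omitted:

```python
import numpy as np

eps = 0.01
a1, b1 = 5.0, 4.0                        # placeholder constant terms alpha_1, beta_1
a2, a3 = 1.5 * eps ** 2, eps ** 2        # leading-order alpha_2, alpha_3
b2 = b3 = 2 * eps ** 2                   # leading-order beta_2 = beta_3

U_true, Q_true = 37.0, -12.0             # hypothetical U', Q' (both O(1/eps))
h0 = a1 + a2 * U_true + a3 * Q_true          # stands for an estimate of H_0
h12 = 2 * b1 + b2 * U_true + 2 * b3 * Q_true  # stands for an estimate of H_1 + H_2

A = np.array([[a2, a3], [b2, 2 * b3]])   # invertible since a2/a3 != b2/(2*b3)
rhs = np.array([h0 - a1, h12 - 2 * b1])
print(np.linalg.solve(A, rhs))           # recovers (U', Q'); an additive sigma*eps error in
                                         # h0, h12 becomes O(sigma/eps) here, as in the text
```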

6.3 p\ell_{p} for any constant p1p\geq 1

Consider an nn-dimensional vector xx with integer entries. It is well known that if vv is a vector of nn i.i.d. N(0,1)N(0,1) random variables, then v,xN(0,x22)\langle v,x\rangle\sim N(0,\|x\|_{2}^{2}). Hence, for any real p>0p>0, 𝐄[|v,x|p]=x2pGp{\bf E}[|\langle v,x\rangle|^{p}]=\|x\|_{2}^{p}G_{p}, where Gp>0G_{p}>0 is the pp-th moment of the standard half-normal distribution (see [1] for a formula for these moments in terms of confluent hypergeometric functions). Let r=O(ε2)r=O(\varepsilon^{-2}), and let v1,,vrv^{1},\ldots,v^{r} be independent nn-dimensional vectors of i.i.d. N(0,1)N(0,1) random variables. Let yj=vj,x/(rGp)1/py_{j}=\langle v^{j},x\rangle/(rG_{p})^{1/p} and y=(y1,,yr)y=(y_{1},\ldots,y_{r}), so that 𝐄[ypp]=x2p{\bf E}[\|y\|_{p}^{p}]=\|x\|_{2}^{p}. By Chebyshev’s inequality, for r=O(ε2)r=O(\varepsilon^{-2}) sufficiently large, ypp=(1±ε/3)x2p\|y\|_{p}^{p}=(1\pm\varepsilon/3)\|x\|_{2}^{p} with probability at least 1c1-c for an arbitrarily small constant c>0c>0.

We thus have the following reduction which shows that estimating p\ell_{p} up to a (1+ε)(1+\varepsilon)-factor requires communication complexity Ω(k/ε2){\Omega}(k/\varepsilon^{2}) for any p>0p>0. Let the kk parties have respective inputs x1,,xkx^{1},\ldots,x^{k}, and let x=i=1kxix=\sum_{i=1}^{k}x^{i}. The parties use the shared randomness to choose shared vectors v1,,vrv^{1},\ldots,v^{r} as described above. For i=1,,ki=1,\ldots,k and j=1,,rj=1,\ldots,r, let yji=vj,xi/(rGp)1/py^{i}_{j}=\langle v^{j},x^{i}\rangle/(rG_{p})^{1/p}, so that yi=(y1i,,yri)y^{i}=(y^{i}_{1},\ldots,y^{i}_{r}). Let y=i=1kyiy=\sum_{i=1}^{k}y^{i}. By the above, ypp=(1±ε/3)x2p\|y\|_{p}^{p}=(1\pm\varepsilon/3)\|x\|_{2}^{p} with probability at least 1c1-c for an arbitrarily small constant c>0c>0. We note that the entries of the vjv^{j} can be discretized to O(logn)O(\log n) bits, changing the pp-norm of yy by only a (1±O(1/n))(1\pm O(1/n)) factor, which we ignore.
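The embedding at the heart of this reduction is easy to simulate. The Python sketch below is a rough illustration rather than part of the proof: it embeds an integer vector x using r = O(ε⁻²) shared Gaussian vectors and checks that ‖y‖p^p concentrates around ‖x‖2^p. The constant 10 in the choice of r and all of the sizes are arbitrary:

```python
import numpy as np
from math import gamma, pi, sqrt

def half_normal_moment(p):
    # p-th absolute moment of N(0,1): E|Z|^p = 2^{p/2} * Gamma((p+1)/2) / sqrt(pi)
    return 2 ** (p / 2) * gamma((p + 1) / 2) / sqrt(pi)

rng = np.random.default_rng(0)
n, p, eps = 1000, 1.5, 0.1
r = int(10 / eps ** 2)                   # r = O(eps^{-2}); the constant 10 is illustrative
x = rng.integers(-5, 6, size=n)          # integer-valued input vector

V = rng.standard_normal((r, n))          # the shared vectors v^1, ..., v^r
Gp = half_normal_moment(p)
y = (V @ x) / (r * Gp) ** (1 / p)        # y_j = <v^j, x> / (r G_p)^{1/p}

print(np.sum(np.abs(y) ** p))            # ~ ||x||_2^p, up to a 1 +/- O(eps) factor
print(np.linalg.norm(x, 2) ** p)
```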

Hence, given a randomized protocol for estimating ypp\|y\|_{p}^{p} up to a (1+ε/3)(1+\varepsilon/3) factor with probability 1δ1-\delta, and given that the parties have respective inputs y1,,yky^{1},\ldots,y^{k}, this implies a randomized protocol for estimating x2p\|x\|_{2}^{p} up to a (1±ε/3)(1±ε/3)=(1±ε)(1\pm\varepsilon/3)\cdot(1\pm\varepsilon/3)=(1\pm\varepsilon) factor with probability at least 1δc1-\delta-c, and hence a protocol for estimating 2\ell_{2} up to a (1±ε)(1\pm\varepsilon) factor with this probability. The communication complexity of the protocol for 2\ell_{2} is the same as that for p\ell_{p}. By our communication lower bound for estimating 2\ell_{2} (in fact, for estimating F2F_{2} in which all coordinates of xx are non-negative), this implies the following theorem.

Theorem 13

The randomized communication complexity of approximating the p\ell_{p}-norm, p1p\geq 1, up to a factor of 1+ε1+\varepsilon with constant probability, is Ω(k/ε2){\Omega}(k/\varepsilon^{2}).

Acknowledgements

We would like to thank Elad Verbin for many helpful discussions, in particular, for helping us with the F0F_{0} lower bound, which was discovered in joint conversations with him. We also thank Amit Chakrabarti and Oded Regev for helpful discussions, as well as the anonymous referees for useful comments. Finally, we thank the organizers of the Synergies in Lower Bounds workshop that took place in Aarhus for bringing the authors together.

References

  • [1] http://en.wikipedia.org/wiki/Normal_distribution.
  • [2] Open problems in data streams and related topics. http://www.cse.iitk.ac.in/users/sganguly/data-stream-probs.pdf, 2006.
  • [3] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proc. ACM Symposium on Theory of Computing, 1996.
  • [4] C. Arackaparambil, J. Brody, and A. Chakrabarti. Functional monitoring without monotonicity. In Proc. International Colloquium on Automata, Languages, and Programming, 2009.
  • [5] B. Babcock and C. Olston. Distributed top-k monitoring. In Proc. ACM SIGMOD International Conference on Management of Data, 2003.
  • [6] Z. Bar-Yossef. The complexity of massive data set computations. PhD thesis, University of California at Berkeley, 2002.
  • [7] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci., 68:702–732, June 2004.
  • [8] B. Barak, M. Braverman, X. Chen, and A. Rao. How to compress interactive communication. In Proc. ACM Symposium on Theory of Computing, pages 67–76, 2010.
  • [9] J. Brody and A. Chakrabarti. A multi-round communication lower bound for gap hamming and some consequences. In IEEE Conference on Computational Complexity, pages 358–368, 2009.
  • [10] J. Brody, A. Chakrabarti, O. Regev, T. Vidick, and R. de Wolf. Better gap-hamming lower bounds via better round elimination. In APPROX-RANDOM, pages 476–489, 2010.
  • [11] A. Chakrabarti, G. Cormode, R. Kondapally, and A. McGregor. Information cost tradeoffs for augmented index and streaming language recognition. In Proc. IEEE Symposium on Foundations of Computer Science, pages 387–396, 2010.
  • [12] A. Chakrabarti, G. Cormode, and A. McGregor. Robust lower bounds for communication and stream computation. In Proc. ACM Symposium on Theory of Computing, pages 641–650, 2008.
  • [13] A. Chakrabarti, T. S. Jayram, and M. Patrascu. Tight lower bounds for selection in randomly ordered streams. In Proc. ACM-SIAM Symposium on Discrete Algorithms, pages 720–729, 2008.
  • [14] A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In Proc. IEEE Conference on Computational Complexity, pages 107–117, 2003.
  • [15] A. Chakrabarti, R. Kondapally, and Z. Wang. Information complexity versus corruption and applications to orthogonality and gap-hamming. In APPROX-RANDOM, pages 483–494, 2012.
  • [16] A. Chakrabarti and O. Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. In Proc. ACM Symposium on Theory of Computing, 2011.
  • [17] A. Chakrabarti, Y. Shi, A. Wirth, and A. Yao. Informational complexity and the direct sum problem for simultaneous message complexity. In Proc. IEEE Symposium on Foundations of Computer Science, pages 270–278, 2001.
  • [18] G. Cormode and M. Garofalakis. Sketching streams through the net: Distributed approximate query tracking. In Proc. International Conference on Very Large Data Bases, 2005.
  • [19] G. Cormode, M. Garofalakis, S. Muthukrishnan, and R. Rastogi. Holistic aggregates in a networked world: Distributed tracking of approximate quantiles. In Proc. ACM SIGMOD International Conference on Management of Data, 2005.
  • [20] G. Cormode, S. Muthukrishnan, and K. Yi. Algorithms for distributed functional monitoring. ACM Transactions on Algorithms, 7(2):21, 2011.
  • [21] G. Cormode, S. Muthukrishnan, K. Yi, and Q. Zhang. Optimal sampling from distributed streams. In Proc. ACM Symposium on Principles of Database Systems, 2010. Invited to Journal of the ACM.
  • [22] T. Cover and J. Thomas. Elements of Information Theory. John Wiley and Sons, Inc., 1991.
  • [23] P. Duris and J. D. P. Rolim. Lower bounds on the multiparty communication complexity. J. Comput. Syst. Sci., 56(1):90–95, 1998.
  • [24] F. Ergün and H. Jowhari. On distance to monotonicity and longest increasing subsequence of a data stream. In Proc. ACM-SIAM Symposium on Discrete Algorithms, pages 730–736, 2008.
  • [25] D. Estrin, R. Govindan, J. S. Heidemann, and S. Kumar. Next century challenges: Scalable coordination in sensor networks. In MOBICOM, pages 263–270, 1999.
  • [26] W. Feller. Generalization of a probability limit theorem of Cramér. Trans. Amer. Math. Soc., 54(3):361–372, 1943.
  • [27] A. Gál and P. Gopalan. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. In Proc. IEEE Symposium on Foundations of Computer Science, 2007.
  • [28] S. Ganguly. Polynomial estimators for high frequency moments. CoRR, abs/1104.4552, 2011.
  • [29] S. Ganguly. A lower bound for estimating high moments of a data stream. CoRR, abs/1201.0253, 2012.
  • [30] P. Gopalan, R. Meka, O. Reingold, and D. Zuckerman. Pseudorandom generators for combinatorial shapes. In Proc. ACM Symposium on Theory of Computing, pages 253–262, 2011.
  • [31] A. Gronemeier. Asymptotically optimal lower bounds on the nih-multi-party information complexity of the and-function and disjointness. In Symposium on Theoretical Aspects of Computer Science, pages 505–516, 2009.
  • [32] S. Guha and Z. Huang. Revisiting the direct sum theorem and space lower bounds in random order streams. In Proc. International Colloquium on Automata, Languages, and Programming, 2009.
  • [33] N. J. A. Harvey, J. Nelson, and K. Onak. Sketching and streaming entropy via approximation theory. In Proc. IEEE Symposium on Foundations of Computer Science, pages 489–498, 2008.
  • [34] Z. Huang and K. Yi. Personal communication, 2012.
  • [35] Z. Huang, K. Yi, and Q. Zhang. Randomized algorithms for tracking distributed count, frequencies, and ranks. CoRR, abs/1108.3413, 2011.
  • [36] Z. Huang, K. Yi, and Q. Zhang. Randomized algorithms for tracking distributed count, frequencies, and ranks. In Proc. ACM Symposium on Principles of Database Systems, pages 295–306, 2012.
  • [37] P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In Proc. ACM Symposium on Theory of Computing, 2005.
  • [38] P. Indyk and D. P. Woodruff. Tight lower bounds for the distinct elements problem. In FOCS, pages 283–288, 2003.
  • [39] T. S. Jayram. Hellinger strikes back: A note on the multi-party information complexity of and. In APPROX-RANDOM, pages 562–573, 2009.
  • [40] D. M. Kane, J. Nelson, E. Porat, and D. P. Woodruff. Fast moment estimation in data streams in optimal space. In STOC, pages 745–754, 2011.
  • [41] D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In Proc. ACM Symposium on Principles of Database Systems, pages 41–52, 2010.
  • [42] R. Keralapura, G. Cormode, and J. Ramamirtham. Communication-efficient distributed monitoring of thresholded counts. In Proc. ACM SIGMOD International Conference on Management of Data, 2006.
  • [43] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
  • [44] F. Magniez, C. Mathieu, and A. Nayak. Recognizing well-parenthesized expressions in the streaming model. In Proc. ACM Symposium on Theory of Computing, pages 261–270, 2010.
  • [45] A. Manjhi, V. Shkapenyuk, K. Dhamdhere, and C. Olston. Finding (recently) frequent items in distributed data streams. In Proc. IEEE International Conference on Data Engineering, 2005.
  • [46] J. Matousek and J. Vondrák. The probabilistic method. Lecture Notes, 2008.
  • [47] J. M. Phillips, E. Verbin, and Q. Zhang. Lower bounds for number-in-hand multiparty communication complexity, made easy. In Proc. ACM-SIAM Symposium on Discrete Algorithms, 2012.
  • [48] A. A. Razborov. On the distributional complexity of disjointness. In Proc. International Colloquium on Automata, Languages, and Programming, 1990.
  • [49] A. A. Sherstov. The communication complexity of gap hamming distance. Electronic Colloquium on Computational Complexity (ECCC), 18:63, 2011.
  • [50] S. Tirthapura and D. P. Woodruff. Optimal random sampling from distributed streams revisited. In The International Symposium on Distributed Computing, pages 283–297, 2011.
  • [51] T. Vidick. A concentration inequality for the overlap of a vector on a large set, with application to the communication complexity of the gap-hamming-distance problem. Electronic Colloquium on Computational Complexity (ECCC), 18:51, 2011.
  • [52] D. Woodruff. Optimal space lower bounds for all frequency moments. In Proc. ACM-SIAM Symposium on Discrete Algorithms, 2004.
  • [53] A. C. Yao. Probabilistic computations: Towards a unified measure of complexity. In Proc. IEEE Symposium on Foundations of Computer Science, 1977.
  • [54] K. Yi and Q. Zhang. Optimal tracking of distributed heavy hitters and quantiles. In Proc. ACM Symposium on Principles of Database Systems, 2009.

Appendix A Proof for Observation 1

We first show the rectangle property of private-randomness protocols in the message-passing model. The proof is just a syntactic change to that in [6], Section 6.4.1, which was designed for the blackboard model; the only difference between the two models is that in the blackboard model, when one player speaks every other player can hear, whereas in the message-passing model each message has a single recipient.

Property 1

Given a kk-party private randomness protocol Π\Pi on inputs in 𝒵1××𝒵k\mathcal{Z}_{1}\times\cdots\times\mathcal{Z}_{k} in the message passing model, for all {z1,,zk}𝒵1××𝒵k\{z_{1},\ldots,z_{k}\}\in\mathcal{Z}_{1}\times\cdots\times\mathcal{Z}_{k}, and for all possible transcripts π{0,1}\pi\in\{0,1\}^{*}, we have

𝖯𝗋R[Π=π|Z1=z1,,Zk=zk]=i=1k𝖯𝗋Ri[Πi=πi|Zi=zi],\mathsf{Pr}_{R}[\Pi=\pi\ |\ Z_{1}=z_{1},\ldots,Z_{k}=z_{k}]=\prod_{i=1}^{k}\mathsf{Pr}_{R_{i}}[\Pi_{i}=\pi_{i}\ |\ Z_{i}=z_{i}], (68)

where πi\pi_{i} is the part of transcript π\pi that player ii sees (that is, the concatenation of all messages sent from or received by SiS_{i}), and R={R1,,Rk}{1××k}R=\{R_{1},\ldots,R_{k}\}\in\{\mathcal{R}_{1}\times\ldots\times\mathcal{R}_{k}\} is the players’ private randomness. Furthermore, we have

𝖯𝗋μ,R[Π=π]=i[k]𝖯𝗋μ,Ri[Πi=πi].\mathsf{Pr}_{\mu,R}[\Pi=\pi]=\prod_{i\in[k]}\mathsf{Pr}_{\mu,R_{i}}[\Pi_{i}=\pi_{i}]. (69)
  • Proof:

We can view the input of Pi(i[k])P_{i}\ (i\in[k]) as a pair (zi,ri)(z_{i},r_{i}), where zi𝒵iz_{i}\in\mathcal{Z}_{i} and riir_{i}\in\mathcal{R}_{i}. Let 𝒞1(π1)××𝒞k(πk)\mathcal{C}_{1}(\pi_{1})\times\cdots\times\mathcal{C}_{k}(\pi_{k}) be the combinatorial rectangle consisting of all tuples {(z1,r1),,(zk,rk)}\{(z_{1},r_{1}),\ldots,(z_{k},r_{k})\} such that Π(z1,,zk,r1,,rk)=π\Pi(z_{1},\ldots,z_{k},r_{1},\ldots,r_{k})=\pi; that this set is indeed a combinatorial rectangle follows from the standard induction on the messages of the protocol, exactly as in the blackboard model.

    For each i[k]i\in[k], for each zi𝒵iz_{i}\in\mathcal{Z}_{i}, let 𝒞i(zi,πi)\mathcal{C}_{i}(z_{i},\pi_{i}) be the projection of 𝒞i(πi)\mathcal{C}_{i}(\pi_{i}) on pairs of the form (zi,)(z_{i},*), and 𝒞i(zi)\mathcal{C}_{i}(z_{i}) be the collection of all pairs of the form (zi,)(z_{i},*). Note that |𝒞i(zi,πi)|/|𝒞i(zi)|=𝖯𝗋Ri[Πi=πi|Zi=zi]\left|\mathcal{C}_{i}(z_{i},\pi_{i})\right|/\left|\mathcal{C}_{i}(z_{i})\right|=\mathsf{Pr}_{R_{i}}[\Pi_{i}=\pi_{i}\ |\ Z_{i}=z_{i}]. If each player PiP_{i} chooses rir_{i} uniformly at random from i\mathcal{R}_{i}, then the transcript will be π\pi if and only if (zi,ri)𝒞i(zi,πi)(z_{i},r_{i})\in\mathcal{C}_{i}(z_{i},\pi_{i}). Since the choices of R1,,RkR_{1},\ldots,R_{k} are all independent, it follows that 𝖯𝗋R[Π=π|Z1=z1,,Zk=zk]=i[k]𝖯𝗋Ri[Πi=πi|Zi=zi]\mathsf{Pr}_{R}[\Pi=\pi\ |\ Z_{1}=z_{1},\ldots,Z_{k}=z_{k}]=\prod_{i\in[k]}\mathsf{Pr}_{R_{i}}[\Pi_{i}=\pi_{i}\ |\ Z_{i}=z_{i}].

To show (69), we sum over all possible values of Z={Z1,,Zk}Z=\{Z_{1},\ldots,Z_{k}\}. Let Zi={Z1,,Zi1,Zi+1,,Zk}Z_{-i}=\{Z_{1},\ldots,Z_{i-1},Z_{i+1},\ldots,Z_{k}\}.

    𝖯𝗋μ,R[Π=π]\displaystyle\mathsf{Pr}_{\mu,R}[\Pi=\pi]
    =\displaystyle= z(𝖯𝗋R[Π=π|Z=z]𝖯𝗋μ[Z=z])\displaystyle\sum_{z}(\mathsf{Pr}_{R}[\Pi=\pi\ |\ Z=z]\mathsf{Pr}_{\mu}[Z=z])
    =\displaystyle= z(i[k]𝖯𝗋Ri[Πi=πi|Zi=zi]i[k]𝖯𝗋μ[Zi=zi])(by (68) and Zi are independent)\displaystyle\sum_{z}\left(\prod_{i\in[k]}\mathsf{Pr}_{R_{i}}[\Pi_{i}=\pi_{i}\ |\ Z_{i}=z_{i}]\prod_{i\in[k]}\mathsf{Pr}_{\mu}[Z_{i}=z_{i}]\right)\quad(\text{by (\ref{eq:x-1}) and $Z_{i}$ are independent})
    =\displaystyle= zi[k]𝖯𝗋μ,Ri[Πi=πiZi=zi]\displaystyle\sum_{z}\prod_{i\in[k]}\mathsf{Pr}_{\mu,R_{i}}[\Pi_{i}=\pi_{i}\wedge Z_{i}=z_{i}]
    =\displaystyle= zi((zi𝖯𝗋μ,Ri[Πi=πiZi=zi])ji𝖯𝗋μ,Rj[Πj=πjZj=zj])\displaystyle\sum_{z_{-i}}\left(\left(\sum_{z_{i}}\mathsf{Pr}_{\mu,R_{i}}[\Pi_{i}=\pi_{i}\wedge Z_{i}=z_{i}]\right)\prod_{j\neq i}\mathsf{Pr}_{\mu,R_{j}}[\Pi_{j}=\pi_{j}\wedge Z_{j}=z_{j}]\right)
    =\displaystyle= 𝖯𝗋μ,Ri[Πi=πi]ziji𝖯𝗋μ,Rj[Πj=πjZj=zj]\displaystyle\mathsf{Pr}_{\mu,R_{i}}[\Pi_{i}=\pi_{i}]\cdot\sum_{z_{-i}}\prod_{j\neq i}\mathsf{Pr}_{\mu,R_{j}}[\Pi_{j}=\pi_{j}\wedge Z_{j}=z_{j}]
=\displaystyle= i[k]𝖯𝗋μ,Ri[Πi=πi](iterating the same argument over the remaining players).\displaystyle\prod_{i\in[k]}\mathsf{Pr}_{\mu,R_{i}}[\Pi_{i}=\pi_{i}]\quad(\text{iterating the same argument over the remaining players}).
     

Now we prove Observation 1; that is, we show that

𝖯𝗋μ,R[Z1=z1,,Zk=zk|Π=π]=i[k]𝖯𝗋μ,Ri[Zi=zi|Πi=πi].\mathsf{Pr}_{\mu,R}[Z_{1}=z_{1},\ldots,Z_{k}=z_{k}\ |\ \Pi=\pi]=\prod_{i\in[k]}\mathsf{Pr}_{\mu,R_{i}}[Z_{i}=z_{i}\ |\ \Pi_{i}=\pi_{i}].

We show this using the rectangle property.

𝖯𝗋μ,R[Z1=z1,,Zk=zk|Π=π]\displaystyle\mathsf{Pr}_{\mu,R}[Z_{1}=z_{1},\ldots,Z_{k}=z_{k}\ |\ \Pi=\pi]
=\displaystyle= 𝖯𝗋R[Π=π|Z1=z1,,Zk=zk]𝖯𝗋μ[Z1=z1,,Zk=zk]𝖯𝗋μ,R[Π=π](Bayes’ theorem)\displaystyle\frac{\mathsf{Pr}_{R}[\Pi=\pi\ |\ Z_{1}=z_{1},\ldots,Z_{k}=z_{k}]\cdot\mathsf{Pr}_{\mu}[Z_{1}=z_{1},\ldots,Z_{k}=z_{k}]}{\mathsf{Pr}_{\mu,R}[\Pi=\pi]}\quad(\text{Bayes' theorem})
=\displaystyle= i[k]𝖯𝗋Ri[Πi=πi|Zi=zi]i[k]𝖯𝗋μ[Zi=zi]i[k]𝖯𝗋μ,Ri[Πi=πi](by (68),(69) and Zi are independent)\displaystyle\frac{\prod_{i\in[k]}\mathsf{Pr}_{R_{i}}[\Pi_{i}=\pi_{i}\ |\ Z_{i}=z_{i}]\cdot\prod_{i\in[k]}\mathsf{Pr}_{\mu}[Z_{i}=z_{i}]}{\prod_{i\in[k]}\mathsf{Pr}_{\mu,R_{i}}[\Pi_{i}=\pi_{i}]}\quad(\text{by }(\ref{eq:x-1}),(\ref{eq:x-2})\text{ and }Z_{i}\text{ are independent})
=\displaystyle= i[k]𝖯𝗋Ri[Πi=πi|Zi=zi]𝖯𝗋μ[Zi=zi]𝖯𝗋μ,Ri[Πi=πi]\displaystyle\prod_{i\in[k]}\frac{\mathsf{Pr}_{R_{i}}[\Pi_{i}=\pi_{i}\ |\ Z_{i}=z_{i}]\cdot\mathsf{Pr}_{\mu}[Z_{i}=z_{i}]}{\mathsf{Pr}_{\mu,R_{i}}[\Pi_{i}=\pi_{i}]}
=\displaystyle= i[k]𝖯𝗋μ,Ri[Zi=zi|Πi=πi](Bayes’ theorem)\displaystyle\prod_{i\in[k]}\mathsf{Pr}_{\mu,R_{i}}[Z_{i}=z_{i}\ |\ \Pi_{i}=\pi_{i}]\quad(\text{Bayes' theorem})
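Finally, the factorization (68) underlying this proof can be checked exhaustively on a small example. The two-player protocol below is our own, purely illustrative construction (not from the paper): P1 sends m1 = z1 XOR r1, and P2 replies m2 = (z2 AND m1) XOR r2; with only two players, each player's view πi is the whole transcript π.

```python
from itertools import product

# Toy private-randomness protocol: P1 sends m1 = z1 XOR r1;
# P2 replies m2 = (z2 AND m1) XOR r2. Randomness r1, r2 is uniform over {0,1}.
for z1, z2 in product([0, 1], repeat=2):
    for pi in product([0, 1], repeat=2):
        m1, m2 = pi
        # Pr over (r1, r2) that the transcript equals pi, given the inputs
        joint = sum((z1 ^ r1, (z2 & (z1 ^ r1)) ^ r2) == pi
                    for r1, r2 in product([0, 1], repeat=2)) / 4
        p1 = sum((z1 ^ r1) == m1 for r1 in (0, 1)) / 2         # Pr_{R1}[Pi_1 = pi_1 | z1]
        p2 = sum(((z2 & m1) ^ r2) == m2 for r2 in (0, 1)) / 2  # Pr_{R2}[Pi_2 = pi_2 | z2]
        assert joint == p1 * p2
print("Eq. (68) holds for every input and transcript of the toy protocol")
```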