Deterministic Lower Bounds for $k$-Edge Connectivity in the Distributed Sketching Model

Anonymous    Peter Robinson
School of Computer & Cyber Sciences
Augusta University
Peter Robinson was supported in part by National Science Foundation (NSF) grant CCF-2402836 Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
   Ming Ming Tan
School of Computer & Cyber Sciences
Augusta University
Corresponding author. Ming Ming Tan was supported in part by National Science Foundation (NSF) grant CCF-2348346 CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms.
Abstract

We study the $k$-edge connectivity problem on undirected graphs in the distributed sketching model, where we have $n$ nodes and a referee. Each node sends a single message to the referee based on its 1-hop neighborhood in the graph, and the referee must decide whether the graph is $k$-edge connected by taking into account the received messages.

We present the first lower bound for deciding a graph connectivity problem in this model with a deterministic algorithm. Concretely, we show that the worst case message length is $\Omega(k)$ bits for $k$-edge connectivity, for any super-constant $k = O(\sqrt{n})$. Previously, only a lower bound of $\Omega(\log^{3} n)$ bits was known for ($1$-edge) connectivity, due to Yu (SODA 2021). In fact, our result is the first super-polylogarithmic lower bound for a connectivity decision problem in the distributed graph sketching model.

To obtain our result, we introduce a new lower bound graph construction, as well as a new 3-party communication complexity problem that we call $\mathsf{UniqueOverlap}$. As this problem does not appear to be amenable to reductions to existing hard problems such as set disjointness or indexing due to correlations between the inputs of the three players, we leverage results from cross-intersecting set families to prove the hardness of $\mathsf{UniqueOverlap}$ for deterministic algorithms. Finally, we obtain the sought lower bound for deciding $k$-edge connectivity via a novel simulation argument that, in contrast to previous works, does not introduce any probability of error and thus works for deterministic algorithms.

1 Introduction

We consider the distributed graph sketching model [BMN+11], where $n$ distributed nodes each observe their list of neighbors in a graph. Every node sends a single message to a central entity, called the referee, who must compute a function of the graph based on the received messages. In this setting, any graph problem can be solved trivially by instructing each one of the $n$ nodes to simply include their entire neighborhood information in the message to the referee. To obtain more efficient algorithms that scale to large graphs, the main goal is to keep the maximum size of these messages (also called sketches) as small as possible, preferably polylogarithmic in $n$. In contrast, the full list of neighbors of a node could be as large as $\Theta(n)$ bits, under the standard assumption that the node IDs are a permutation of $\{1,\ldots,n\}$. Consequently, nodes can afford to convey only incomplete information (hence the name “sketch”) of their local neighbors to the referee, which may paint a somewhat ambiguous picture of the actual graph. While it may be tempting to conclude that small sketches do not allow us to solve interesting graph problems in this model, the surprising breakthrough of Ahn, Guha, and McGregor [AGM12a] showed that several fundamental graph problems, such as connectivity and spanning trees, can indeed be solved with sketches of only $O(\log^{3} n)$ bits, if nodes have access to shared randomness and one is willing to accept a polynomially small probability of error.
Their technique (called “AGM sketches”) paved the way for sketch-based graph algorithms for many other graph problems, including vertex connectivity [GMT15] and approximate graph cuts [AGM12b]; see the survey of [McG14] for a list of additional graph problems (see Footnote 1).

Footnote 1: Strictly speaking, [AGM12a] showed these results for fully dynamic graph streams in the semi-streaming model, but it is straightforward to adapt them to the distributed graph sketching model, as observed in [BMRT14].
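To make the model concrete, the trivial protocol described above can be sketched as follows. This is only an illustration and not part of the original paper: the function names `trivial_sketch`, `referee_is_connected`, and `run_protocol` are hypothetical, node IDs are taken from $\{0,\ldots,n-1\}$ rather than $\{1,\ldots,n\}$ for convenience, and the referee decides plain connectivity only.

```python
def trivial_sketch(neighbors, n):
    """Encode a node's whole neighborhood as an n-bit string (Theta(n) bits)."""
    return "".join("1" if u in neighbors else "0" for u in range(n))


def referee_is_connected(sketches):
    """Rebuild the graph from all n sketches and test connectivity by DFS."""
    n = len(sketches)
    adj = [[u for u, bit in enumerate(s) if bit == "1"] for s in sketches]
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == n


def run_protocol(adjacency):
    """adjacency[v] is the neighbor set of node v; returns the referee's answer."""
    n = len(adjacency)
    sketches = [trivial_sketch(adjacency[v], n) for v in range(n)]
    return referee_is_connected(sketches)
```

Each sketch here has exactly $n$ bits, which is the cost that small-sketch algorithms aim to avoid.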

1.1 Randomized Lower Bounds for Connectivity Problems

The question whether AGM sketches are optimal for connectivity problems remained an open problem for several years, until the seminal work by Nelson and Yu [NY19] showed that $\Omega(\log^{3} n)$ bits are indeed required for computing spanning forests in the distributed graph sketching model. The main idea of their lower bound is to use a reduction to a variant of the universal relation problem [KRW95] in 2-party communication complexity, denoted by $\mathsf{UR}^{\subset}$, where Alice starts with a subset $S$ of some universe $U$, and Bob gets a proper subset $T \subset S$. Alice sends a single message to Bob, who must output some element in $S \setminus T$. In [KNP+17a], Kapralov, Nelson, Pachocki, Wang, and Woodruff showed that $\Omega(\log^{3} n)$ bits are required for solving $\mathsf{UR}^{\subset}$ with high probability. To obtain a reduction to $\mathsf{UR}^{\subset}$ in the distributed sketching model, [NY19] define a tripartite graph on vertex sets $V^{l}$, $V^{m}$, and a much smaller set $V^{r}$. Each node $v \in V^{m}$ has a set of unique neighbors in $V^{l}$ (not shared with any other node in $V^{m}$) as well as some neighbors in $V^{r}$. Since every node in $V^{l}$ is connected to at most one node $v \in V^{m}$, any spanning forest must include some edge between $v$ and a node in $V^{r}$. The neighborhood of a randomly chosen $v^{*} \in V^{m}$ is determined by the input of $\mathsf{UR}^{\subset}$ such that Alice’s set $S$ corresponds to all of $v^{*}$’s neighbors, whereby Bob’s set $T$ specifies the subset of neighbors in $V^{l}$. They show that the players can jointly simulate the nodes in $V^{l}$ and $V^{m}$ (but not $V^{r}$!), whereby Bob also simulates the referee. From the resulting spanning forest, Bob can extract an edge between $v^{*}$ and $V^{r}$, which corresponds to an element in $S \setminus T$, and thus solves $\mathsf{UR}^{\subset}$.
Notably absent in their simulation are the nodes in $V^{r}$, which neither Alice nor Bob can simulate, due to their lack of knowledge of the set $S \setminus T$. In fact, trying to faithfully simulate every node executing a distributed sketching algorithm in a 2-party model poses a major technical challenge since every edge is shared between its two endpoints. However, [NY19] show that choosing $V^{r}$ to be polynomially smaller than $V^{m}$ suffices to overcome this obstacle: Since each $w \in V^{r}$ will be a neighbor of many $v \in V^{m}$ and $w$ does not know which of them is the important node $v^{*}$ (as all of these neighborhoods are sampled from the same hard input distribution for $\mathsf{UR}^{\subset}$), it follows that the amount of information that the referee can learn about the neighbors of $v^{*}$ from the sketches of the nodes in $V^{r}$ is only (roughly) $\tilde{O}\left(\frac{|V^{r}|}{|V^{m}|}\right) = o(1)$, and thus negligible. Hence, Alice and Bob can omit these messages in their simulation altogether, while only causing a small increase in the overall error probability, due to Pinsker’s inequality [Pin64]. We remark that the actual lower bound construction of [NY19] uses multiple copies of this basic building block.
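For intuition about the $\mathsf{UR}^{\subset}$ problem itself, the following minimal sketch shows the trivial zero-error one-way protocol, in which Alice sends her entire set. It is an illustration under stated assumptions, not a construction from [NY19] or [KNP+17a]: the universe is assumed to be $\{0,\ldots,|U|-1\}$, and the names `alice_message` and `bob_output` are hypothetical.

```python
def alice_message(S, universe_size):
    """Alice sends the characteristic vector of S: |U| bits, zero error."""
    return [1 if x in S else 0 for x in range(universe_size)]


def bob_output(message, T):
    """Bob returns some element of S \\ T, using only the message and his set T."""
    for x, bit in enumerate(message):
        if bit and x not in T:
            return x
    raise ValueError("promise violated: T is not a proper subset of S")


# Usage: S = {1, 4, 7} and T = {1, 7}, so the only valid answer is 4.
answer = bob_output(alice_message({1, 4, 7}, 8), {1, 7})
```

The randomized protocols discussed above replace the $|U|$-bit message by a much shorter one at the price of a small error probability.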

More recently, Robinson [Rob23] extended the approach of [NY19] to constructing a $k$-edge connected spanning subgraph ($k$-ECSS), with the main difference being a modified graph construction that enables a reduction to the multi-output variant of universal relation, called $\mathsf{UR}^{\subset}_{k}$, which has a lower bound of $\Omega(k\log^{2} n)$ bits [KNP+17a] and requires Bob to find $k$ elements of $S \setminus T$. The simulation argument in [Rob23] does not directly use Pinsker’s inequality as in [NY19], and instead obtains a slightly smaller, but nevertheless nonzero, probability of error, by devising a mechanism for reconstructing the output when omitting the sketches of the nodes in $V^{r}$. At the risk of stating the obvious, we point out that this result does not have any implications for the problem of deciding whether a given graph is $k$-edge connected, which is the focus of the current paper.

In [Yu21], Yu proved that the decision problem of graph connectivity is subject to the same lower bound on the message length as computing a spanning forest, which fully resolved the complexity of connectivity in the distributed sketching model. For this purpose, [Yu21] introduced and proved the hardness of a new decision version of universal relation, called $\mathsf{UR}^{\subset}_{\text{dec}}$. Just as for the $\mathsf{UR}^{\subset}$ problem, Alice again gets a set $S \subseteq U$. However, Bob not only gets $T \subset S$ but, in addition, he also knows a partition $(P_{1},P_{2})$ of $U \setminus T$. The inputs adhere to the promise that either $S \setminus T \subseteq P_{1}$ or $S \setminus T \subseteq P_{2}$, and, upon receiving Alice’s message, Bob needs to decide which is the case. In order to embed an instance of $\mathsf{UR}^{\subset}_{\text{dec}}$ in a graph, [Yu21] adapted the construction of [NY19] by partitioning the set $V^{r}$ into $V^{r}_{1}$ and $V^{r}_{2}$, and by sampling the neighborhood of each node in $V^{m}$ according to the hard distribution of $\mathsf{UR}^{\subset}_{\text{dec}}$. Following the overall simulation approach of [NY19], Alice’s input of $\mathsf{UR}^{\subset}_{\text{dec}}$ is embedded at some special node $v^{*} \in V^{m}$. Accordingly, the partition-promise of $\mathsf{UR}^{\subset}_{\text{dec}}$ translates to each $v \in V^{m}$ having all its $V^{r}$-neighbors in either $V^{r}_{1}$ or $V^{r}_{2}$.

We point out that there does not appear to be a natural extension of the connectivity lower bound of [Yu21] to $k$-edge connectivity. Even though $\mathsf{UR}^{\subset}_{\text{dec}}$ can be generalized to $k > 1$, proving a lower bound that scales linearly with $k$ for this generalization appears to be far from straightforward. Interestingly, the situation is reversed for the standard “search” version of $\mathsf{UR}^{\subset}$, where the proof of the lower bound for $\mathsf{UR}^{\subset}_{k}$ is technically less involved than the corresponding result for $\mathsf{UR}^{\subset}$, see [KNP+17a]. A second obstacle is that, in contrast to the problem of finding a $k$-edge connected spanning subgraph [Rob23], designing a suitable lower bound graph for deciding $k$-edge connectivity is itself nontrivial. This is because choosing the neighborhood of every node in $V^{m}$ according to the hard distribution of the generalized variant of $\mathsf{UR}^{\subset}_{\text{dec}}$ and embedding it in the constructions of [NY19, Yu21] may result in a graph that is very likely to be $k$-edge connected, independently of the neighborhood of the special node $v^{*}$.

1.2 Why Deterministic Lower Bounds?

Interestingly, none of the above-mentioned lower bound results provide a straightforward way towards obtaining stronger bounds for deterministic graph sketching algorithms (that never fail): Since any deterministic one-way algorithm for $\mathsf{UR}^{\subset}$, $\mathsf{UR}^{\subset}_{k}$, or $\mathsf{UR}^{\subset}_{\text{dec}}$ needs to send Alice’s entire input to Bob (see Footnote 2), it may appear, at first glance, that we can simply “plug in” these results for deterministic protocols and directly obtain stronger lower bounds in the graph sketching model. This intuition, however, turns out to be misleading, as a major technical challenge is that the simulations used in [NY19, Yu21, Rob23] are themselves randomized and, more importantly, are necessarily “lossy” in the sense that they introduce a nonzero probability of error, due to forgoing the simulation of some nodes, in particular all nodes in $V^{r}$. This poses a major technical obstacle towards obtaining lower bounds for algorithms that fail with exponentially small probability, including deterministic ones (see Footnote 3). To the best of our knowledge, achieving a perfect (i.e., $0$-error) simulation of graph sketching algorithms in a two-party communication complexity model remains a challenging open problem.

Footnote 2: While there is no published deterministic lower bound for $\mathsf{UR}^{\subset}_{\text{dec}}$, the same communication complexity bound as for the search version $\mathsf{UR}^{\subset}$ follows from standard fooling set and counting arguments for one-way communication; see, e.g., [KN97].

Footnote 3: We discuss prior work on lower bounds for other graph problems in the sketching model in Section 1.4.

1.3 Our Contributions

In this work, we take the first step toward determining lower bounds for deterministic connectivity algorithms in the graph sketching model. In more detail, we focus on the problem of deciding $k$-edge connectivity, and we show that any deterministic sketching algorithm has a worst case sketch length that is almost linear (in $k$):

Theorem 1.

Every deterministic algorithm that decides $k$-edge connectivity on $n$-node graphs in the distributed sketching model has a worst case message length of $\Omega(k)$ bits, for any super-constant $k = k(n) \leq \gamma\sqrt{n}$, where $\gamma > 0$ is a suitable constant.

The proof of Theorem 1 requires us to overcome several technical challenges:

  • To obtain a graph that is hard for deciding $k$-edge connectivity in the distributed sketching model, we need to find a suitable graph construction. We cannot directly extend the constructions used for deciding graph connectivity in [Yu21], as this crucially rests on the assumption that the neighborhood of every node in $V^{m}$ (see Sec. 1.1) must be chosen from the same distribution. In fact, the straightforward generalization of this idea to $k$-edge connectivity would yield a graph that is very likely to be $k$-edge connected, for large values of $k$. In this work, we propose a new construction that does not preserve the uniformity property of [Yu21] regarding the input distribution. Instead, there is a special node $v_{\sigma}$, with a somewhat larger degree than its peers, which we connect in a way such that the $k$-edge connectivity of the graph entirely rests on the choice of neighborhood for this node. Nevertheless, we ensure that its neighbors have no easy way of distinguishing $v_{\sigma}$ from the vast majority of unimportant nodes. This approach introduces additional technical difficulties that we address in our simulation.

  • As elaborated in more detail in Section 1.2, we need to depart from the well-trodden path of using the “partial simulation” argument pioneered in [NY19] (and also used in the work of [Yu21] and [Rob23]), where some nodes are omitted from being simulated, as this would result in a nonzero error algorithm. To avoid this pitfall, we design a completely different simulation that is suitable for deterministic algorithms. We point out that the main technical difficulty of the distributed sketching model, namely the “sharing” of edges between nodes, also surfaces in our setting and prevents us from devising a simulation where every node $v$’s neighborhood is part of the input of one of the simulating players. Instead, we exploit some structural properties of the algorithm at hand that must hold for any deterministic protocol. In more detail, the main idea behind our approach is that it suffices to compute a sketch for $v$ that looks “compatible” with our (limited) knowledge of $v$’s neighborhood, and we prove that this indeed ensures an error-free simulation.

  • Our simulation works in a simultaneous 3-party model of communication complexity, where two players, Alice and Bob, each send a single message to Charlie who computes the output. For this setting, we introduce the $\mathsf{UniqueOverlap}$ problem, which abstracts the core difficulty present in our lower bound graph construction: nodes are unaware which of their incident edges point to the crucial node $v_{\sigma}$, even though $v_{\sigma}$ itself is aware of its special role. We capture this property in the communication complexity setting by equipping Alice and Bob each with an input vector that has a single common coordinate whose XOR is $1$. While Charlie knows the location of this coordinate, he does not know the values of the respective bits of Alice and Bob, which are needed for the correct output. The hardness of $\mathsf{UniqueOverlap}$ does not appear to follow from reductions to standard problems such as set disjointness or variants of indexing (see Sec. 4.1 for a more detailed discussion), due to the correlation between the inputs of the three players and since we consider ternary input vectors, whose entries may be either $0$, $1$, or $\perp$. A further complication is that we need to assume Charlie knows the entire support (i.e., the indices of all non-$\perp$ entries) of Alice’s and Bob’s inputs. Instead, we devise a purely combinatorial argument based on properties of cross-intersecting set families to show the following result:

Theorem 2.

For any sufficiently large $m$, there exists an $s = \Theta(m)$ such that the deterministic one-way communication complexity of $\mathsf{UniqueOverlap}_{m,s}$ in the simultaneous 3-party model is $\Omega(m)$, where each $m$-length input vector has support $s$.

1.4 Additional Related Work

The distributed graph sketching model was first introduced by Becker, Matamala, Nisse, Rapaport, Suchan, and Todinca in [BMN+11], where they proved the hardness of several fundamental graph problems for deterministic sketching algorithms: In particular, they showed that problems including deciding whether the diameter is at most $3$, and whether the graph contains a certain subgraph such as a triangle or a square, require sketches of large (i.e., super-polylogarithmic) size. Their approach works by showing that the existence of an efficient protocol for any of these problems can be leveraged for reconstructing graphs with certain properties using similar-sized sketches. However, as pointed out in [BMN+11], their approach does not extend to graph connectivity problems.

The model was further explored by Becker, Montealegre, Rapaport, and Todinca in [BMRT14], where they show separations between the power of deterministic, private randomness and public randomness algorithms. Apart from the aforementioned works of [NY19, Yu21, Rob23] on connectivity-related problems, Assadi, Kol, and Oshman [AKO20] showed a lower bound of $\Omega(n^{1/2-\varepsilon})$ bits on the sketch size for computing a maximal independent set. For the distributed sketching model with private randomness, Holm, King, Thorup, Zamir, and Zwick [HKT+19] showed that computing a spanning tree is possible with sketches of $O(\sqrt{n}\log n)$ bits.

We point out that the distributed graph sketching model is known to be equivalent to a single-round variant of the broadcast congested clique, as observed in several earlier works [JLN18, AKO20], where each one of $n$ nodes can broadcast a single message per round that becomes visible to all other nodes at the end of the current round. In that context, the worst case sketch size corresponds to the available bandwidth, i.e., the size of the broadcast message. Several works have studied tradeoffs between the number of rounds and the message size in the (multi-round) broadcast congested clique: Pai and Pemmaraju [PP20] showed that $\Omega\left(\frac{\log n}{b}\right)$ rounds are required if nodes can broadcast messages of size $b$. The work of Montealegre and Todinca [MT16] revealed that there is a deterministic $r$-round connectivity algorithm that sends messages of size $O(n^{1/r}\log n)$. Subsequently, Jurdzinski and Nowicki [JN17] obtained an improved round complexity of $O(\log n/\log\log n)$ when considering the standard assumption of $O(\log n)$ bits per message.

1.5 Notation and Preliminaries

In the distributed sketching model, each node $v$ is equipped with an ID of $O(\log n)$ bits. We sometimes abuse notation and use $v$ to refer to either the node itself or its ID, depending on the context. Thus, when referring to the neighborhood of a node, we also mean the list of IDs associated with the node’s neighbors.

We use $[m]$ to denote the set of integers from $1$ to $m$. For a set $S$ and some positive integer $m$, we borrow the standard notation from extremal combinatorics (e.g., see [Juk01]) and use $\binom{S}{m}$ to denote the set family of all $m$-element subsets of $S$.

1.6 Roadmap

We structure the proof of Theorem 1 as follows: In Section 2, we introduce our new graph construction and analyze how its properties relate to $k$-edge connectivity. Then, we show in Section 3 that, for any given deterministic algorithm that sends short sketches, there exists an instance of this lower bound graph such that many nodes send the same message for certain pairs of inputs. After introducing and proving the hardness of the $\mathsf{UniqueOverlap}$ problem in Section 4, we show in Section 5 how to solve it by simulating a $k$-edge connectivity algorithm on our lower bound graph. In Section 6, we combine these results to complete the proof of Theorem 1. Finally, we discuss some open problems in Section 7.

2 The Lower Bound Graph

In this section, we define the class of lower bound graphs for proving Theorem 1. Consider parameters $n$ and $k$, where

$$\omega(1) \leq k \leq \gamma\sqrt{n}, \tag{1}$$

for some suitable constant $\gamma > 0$, which, in particular, rules out $k$ being a constant. We define the class of $n$-node graphs $\mathcal{G}_{k,n}$ on a vertex set $V \cup W \cup \{u_{A},u_{B}\}$, such that

$$|V| = n - \left\lfloor\sqrt{n}\right\rfloor - 2, \tag{2}$$
$$|W| = \left\lfloor\sqrt{n}\right\rfloor. \tag{3}$$

The set $W$ is further partitioned into vertex sets $A$ and $B$, each of size at least $k$. Every vertex has a unique identifier (ID), and we use any fixed arbitrary ID assignment from the set $[n]$ for all graphs in $\mathcal{G}_{k,n}$. Every graph $G \in \mathcal{G}_{k,n}$ is associated with a determining index $\sigma \in [|V|]$. As we will see below, this name is motivated by the fact that the edges in the cut $E(v_{\sigma},A \cup B)$ determine whether the graph is $k$-edge connected.

Next, we define the edges of GG:

  (E1) $G[A]$ and $G[B]$ are cliques.

  (E2) For every $w_{i} \in A$, we add the edge $\{u_{A},w_{i}\}$, and, similarly, for every $w_{j} \in B$, we add $\{u_{B},w_{j}\}$.

We now define the edges in the cut $(V,W)$: For every $v_{i} \in V$ ($i \neq \sigma$), the incident edges are such that exactly one of the following two properties holds:

  (E3) $|E(v_{i},A)| \geq 1$ and $|E(v_{i},B)| = 0$, or $|E(v_{i},A \cup B)| = 0$; and there are $k$ parallel edges between $u_{A}$ and $v_{i}$.

  (E4) $|E(v_{i},A)| = 0$ and $|E(v_{i},B)| \geq 1$, and there are $k$ parallel edges between $u_{B}$ and $v_{i}$.

We say that a node $v_{i}$ is $A$-restricted if it satisfies (E3) and call $v_{i}$ $B$-restricted if (E4) holds. Note that (E3) holds for a node $v_{i}$ even in the case that there are no edges at all between $v_{i}$ and $A \cup B = W$.

Node $v_{\sigma}$, on the other hand, has exactly $2k-1$ edges to nodes in $W$, and

  (E5) there are $k$ parallel edges between $v_{\sigma}$ and $u_{A}$.

Moreover, one of the following two conditions holds:

  (C0) $|E(v_{\sigma},A)| \geq k$ and $|E(v_{\sigma},B)| \leq k-1$;

  (C1) $|E(v_{\sigma},A)| \leq k-1$ and $|E(v_{\sigma},B)| \geq k$.

We point out that (E3), (E4), and (E5) require the input graph to be a multi-graph. This can be avoided by simply replacing $u_{A}$ and $u_{B}$ by cliques of size $k$ if desired, without changing the asymptotic bounds. Apart from the list of their neighbors, we provide additional power to the algorithm by revealing to each node $v_{i}$ whether it is $A$-restricted, $B$-restricted, or $v_{\sigma}$. Clearly, this can only strengthen the lower bound. Figure 1 shows an example of a graph $G \in \mathcal{G}_{k,n}$.

Figure 1: An instance of a lower bound graph for $k = 3$. Notice that $v_{\sigma}$ is the only vertex with edges to both $A$ and $B$. In this example, condition (C0) holds, since $v_{\sigma}$ has $k$ edges to $A$. Thus, the graph is not $k$-edge connected.

We now state some properties of this construction. The next lemma is immediate from the description:

Lemma 1.

Every node $v_{i} \in V$ is either $v_{\sigma}$, $A$-restricted, or $B$-restricted. Moreover, $v_{i}$ knows this information as part of its initial state.

Lemma 2.

Graph $G \in \mathcal{G}_{k,n}$ is $k$-edge connected if and only if (C1) holds.

Proof.

Let $V_{A} \subseteq V$ be the subset of nodes that are $A$-restricted, i.e., satisfy (E3), and also include $v_{\sigma}$ in $V_{A}$. Similarly, we define $V_{B} \subseteq V$ to contain all $B$-restricted nodes, which satisfy (E4). Every $v_{j} \in V_{A}$ has $k$ parallel edges to $u_{A}$, which itself has an edge to each node in $A$. Since $A$ is a clique of size at least $k$, it follows that there are at least $k$ edge-disjoint paths between every pair of nodes in $G[A \cup \{u_{A}\} \cup V_{A}]$, which implies that $G[A \cup \{u_{A}\} \cup V_{A}]$ is $k$-edge connected. A similar argument shows that $B$, $u_{B}$, and $V_{B}$ induce a $k$-edge connected subgraph. By properties (E1)-(E5), any edge in the cut $(A \cup \{u_{A}\} \cup V_{A}, B \cup \{u_{B}\} \cup V_{B})$ must be in $E(v_{\sigma},B)$. Hence, this cut contains exactly $|E(v_{\sigma},B)|$ edges, which is at least $k$ under (C1) and at most $k-1$ under (C0). Thus, the graph is $k$-edge connected if and only if Case (C1) occurs. ∎
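Lemma 2 can be checked numerically on a toy instance. The following is a minimal sketch, not part of the paper's proof: it builds a small multigraph following (E1)-(E5) and computes its edge connectivity by running a textbook augmenting-path max-flow from one fixed node to every other node. All names (`build_graph`, `max_flow`, `edge_connectivity`, the node labels) and the concrete parameters are assumptions chosen for illustration, and the instance ignores the size constraints (1)-(3).

```python
from collections import defaultdict, deque


def add_edge(cap, u, v, mult=1):
    """Add `mult` parallel undirected edges between u and v."""
    cap[u][v] += mult
    cap[v][u] += mult


def build_graph(k, A, B, restricted, sigma_nbrs):
    """restricted maps a node to ('A' or 'B', set of its W-neighbors)."""
    cap = defaultdict(lambda: defaultdict(int))
    for side, hub in ((A, "uA"), (B, "uB")):
        for i, w in enumerate(side):
            add_edge(cap, hub, w)                    # (E2)
            for x in side[i + 1:]:
                add_edge(cap, w, x)                  # (E1): clique edges
    for v, (kind, nbrs) in restricted.items():
        add_edge(cap, v, "uA" if kind == "A" else "uB", k)   # (E3)/(E4)
        for w in nbrs:
            add_edge(cap, v, w)
    add_edge(cap, "v_sigma", "uA", k)                # (E5)
    for w in sigma_nbrs:
        add_edge(cap, "v_sigma", w)
    return cap


def max_flow(cap, s, t):
    """Max flow with BFS augmenting paths; edge multiplicities are capacities."""
    res = {u: dict(nb) for u, nb in cap.items()}
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck


def edge_connectivity(cap):
    """Global min cut = min over t of max_flow(s, t) for one fixed source s."""
    nodes = list(cap)
    return min(max_flow(cap, nodes[0], t) for t in nodes[1:])


k = 2
A, B = ["a0", "a1", "a2"], ["b0", "b1", "b2"]
restricted = {"v1": ("A", {"a0"}), "v2": ("B", {"b1", "b2"}), "v3": ("A", set())}
graph_c0 = build_graph(k, A, B, restricted, {"a0", "a1", "b0"})  # (C0) holds
graph_c1 = build_graph(k, A, B, restricted, {"a0", "b0", "b1"})  # (C1) holds
```

In this instance, `edge_connectivity(graph_c0)` falls below `k` because only one edge crosses from $v_{\sigma}$ to $B$, whereas `edge_connectivity(graph_c1)` is at least `k`, in line with Lemma 2.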

3 Existence of Indistinguishable Separated Pairs

As our overall goal is to prove Theorem 1, throughout this section we assume towards a contradiction that each node sends at most $L = o(k)$ bits.

For a node $v_{i} \in V$, we call the subset of its neighbors in $W$ its $W$-neighborhood. Here, our main focus will be to show that there is a way to choose the partition $\{A,B\}$ of $W$ that has properties suitable for our simulation in Section 5.

Definition 1 (Separated Pair).

We say that two $W$-neighborhoods $S_{0}$ and $S_{1}$ form a separated pair for $v_{\sigma}$ with respect to $A$ and $B$, when the following are true:

  • if $v_{\sigma}$’s $W$-neighborhood corresponds to $S_{0}$, then condition (C0) holds. In other words, $|S_{0} \cap A| \geq k$ and $|S_{0} \cap B| \leq k-1$.

  • if $v_{\sigma}$’s $W$-neighborhood is given by $S_{1}$, then condition (C1) holds. Thus, $|S_{1} \cap A| \leq k-1$ and $|S_{1} \cap B| \geq k$.

Hence, if $v_{\sigma}$’s $W$-neighborhood is $S_{1}$, the graph is $k$-edge connected, whereas if it is $S_{0}$, then the graph is not $k$-edge connected.
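Definition 1 translates directly into a predicate on sets. The following is a minimal sketch with hypothetical function names (`satisfies_c0`, `satisfies_c1`, `is_separated_pair`); neighborhoods and the sides $A$ and $B$ are represented as Python sets of node IDs, which is an assumption of this illustration.

```python
def satisfies_c0(S, A, B, k):
    """(C0): at least k neighbors in A and at most k-1 neighbors in B."""
    return len(S & A) >= k and len(S & B) <= k - 1


def satisfies_c1(S, A, B, k):
    """(C1): at most k-1 neighbors in A and at least k neighbors in B."""
    return len(S & A) <= k - 1 and len(S & B) >= k


def is_separated_pair(S0, S1, A, B, k):
    """S0 must put the graph in case (C0) and S1 in case (C1)."""
    return satisfies_c0(S0, A, B, k) and satisfies_c1(S1, A, B, k)


# Usage with k = 2, so each neighborhood has 2k - 1 = 3 elements.
A, B = {0, 1, 2}, {3, 4, 5}
example = is_separated_pair({0, 1, 3}, {2, 4, 5}, A, B, k=2)
```

Note that the order of the pair matters: swapping $S_{0}$ and $S_{1}$ in the example yields a pair that is not separated in the sense of Definition 1.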

Since node $v_{\sigma}$ has exactly $2k-1$ neighbors in $W$ according to our lower bound graph description, we know that the set of all $W$-neighborhoods of $v_{\sigma}$ is of size $\binom{|W|}{2k-1}$. For technical reasons, however, we need to ensure that any pair of possible $W$-neighborhoods of $v_{\sigma}$ have an intersection that is at most a small constant fraction of $k$. The proof of the next lemma follows via the probabilistic method and is similar to that of Lemma 6 in [KNP+17a] (see Footnote 4). We include a full proof in Appendix A for completeness.

Footnote 4: This is Lemma 7 in the full version of their paper on arXiv [KNP+17b].

Lemma 3.

Consider a set $W$, any small $\varepsilon > 0$, and any positive integer $d \leq \frac{\varepsilon|W|}{2e^{2+4/\varepsilon}}$. There exists a set family $\mathcal{S} \subseteq \binom{W}{d}$ such that, for any two distinct $S_{1},S_{2} \in \mathcal{S}$, we have $|S_{1} \cap S_{2}| \leq \frac{\varepsilon d}{2}$, and $|\mathcal{S}| \geq e^{d-1}$.

Instantiating Lemma 3 with $d = 2k-1$ and a suitable small constant $\varepsilon > 0$ shows that

$$|\mathcal{S}| \geq e^{2k-2}. \tag{4}$$

Given an arbitrary set $D$, we say that $\{\mathcal{B}_{1},\ldots,\mathcal{B}_{\ell}\}$ is a partition $\mathcal{P}$ of $D$, and each set $\mathcal{B}_{i} \subseteq D$ is called a block of $\mathcal{P}$, if $\bigcup_{i=1}^{\ell}\mathcal{B}_{i} = D$ and all blocks are pairwise disjoint. For a fixed partition $\{A,B\}$ of $W$ and any $S \in \mathcal{S}$, we define the projections $\phi_{A}(S) = S \cap A$ and $\phi_{B}(S) = S \cap B$, and we also define the resulting image sets $\Phi_{A} = \{\phi_{A}(S) \mid S \in \mathcal{S}\}$ and $\Phi_{B} = \{\phi_{B}(S) \mid S \in \mathcal{S}\}$. Consider the algorithm executed by some node $v_{i}$. Let us assume that, for the time being, $\sigma = i$, and that node $v_{i}$ is equipped with ID $i$. Together with the node IDs in its neighborhood, this fully determines the message $v_{i}$ sends to the referee. In particular, $v_{i}$ can send at most $2^{L}$ distinct messages, and hence the given algorithm induces a partition $\mathcal{P}_{i,\sigma}$ of $\mathcal{S}$ into at most $2^{L}$ distinct blocks, where each block is a set of $W$-neighborhoods for which $v_{i}$ sends the same message to the referee. Let $\mathcal{B}_{i,\sigma}$ denote the largest one of these blocks, and observe that $|\mathcal{B}_{i,\sigma}| \geq \frac{|\mathcal{S}|}{2^{L}}$. We emphasize that $\mathcal{B}_{i,\sigma}$ depends on the ID $i$ of $v_{i}$; it may be the case that the resulting set $\mathcal{B}_{j,\sigma}$ contains a completely different set of neighborhoods for node $v_{j}$ ($i \neq j$). Recall from Lemma 1 that a node $v_{i} \in V$ knows whether $\sigma = i$ and, consequently, the algorithm at $v_{i}$ may behave differently when $\sigma = i$ compared to the case where $v_{i}$ is either $A$-restricted or $B$-restricted.
That is, for a given partition $\{A,B\}$ of $W$, the message sent by any $A$-restricted node $v_{i} \in V$ ($i \neq \sigma$) induces a partition $\mathcal{P}_{i,A}$ on $\Phi_{A}$, whereas the message sent by a $B$-restricted node $v_{j}$ induces a partition $\mathcal{P}_{j,B}$ on $\Phi_{B}$.
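The pigeonhole step behind $|\mathcal{B}_{i,\sigma}| \geq |\mathcal{S}|/2^{L}$ can be illustrated with a short sketch. The message function `toy_sketch` below is a made-up stand-in for an arbitrary deterministic $L$-bit sketch and is not taken from any actual algorithm; the family used is simply all $3$-subsets of an $8$-element set rather than the family $\mathcal{S}$ from Lemma 3.

```python
from collections import defaultdict
from itertools import combinations


def largest_block(family, sketch):
    """Group neighborhoods by the message they produce; return the biggest group."""
    blocks = defaultdict(list)
    for S in family:
        blocks[sketch(S)].append(S)
    return max(blocks.values(), key=len)


L = 2  # hypothetical sketch length in bits


def toy_sketch(S):
    """Hypothetical deterministic L-bit message: sum of neighbor IDs mod 2^L."""
    return sum(S) % (2 ** L)


family = [frozenset(c) for c in combinations(range(8), 3)]  # 56 neighborhoods
block = largest_block(family, toy_sketch)
```

Whatever deterministic function replaces `toy_sketch`, as long as it has at most $2^{L}$ distinct outputs, the returned block contains at least `len(family) / 2**L` neighborhoods that the referee cannot tell apart from this node's message alone.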

The next definition places some additional restrictions on a separated pair with respect to the three partitions $\mathcal{P}_{i,\sigma}$, $\mathcal{P}_{i,A}$, and $\mathcal{P}_{i,B}$, which we will leverage in Section 5.

Definition 2 (Indistinguishable Separated Pair).

We say that $S_{0}$ and $S_{1}$ form an indistinguishable separated pair for $v_{i}$, if the following properties hold:

  (i) $S_{0}$ and $S_{1}$ are in the same block of $\mathcal{P}_{i,\sigma}$,

  (ii) $\phi_{A}(S_{0})$ and $\phi_{A}(S_{1})$ are distinct and in the same block of $\mathcal{P}_{i,A}$,

  (iii) $\phi_{B}(S_{0})$ and $\phi_{B}(S_{1})$ are distinct and in the same block of $\mathcal{P}_{i,B}$, and

  (iv) $S_{0}$ and $S_{1}$ form a separated pair for $v_{i}$.
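The four properties of Definition 2 can be checked mechanically once the three message functions are given. The sketch below is only an illustration: it reads property (iii) as comparing $\phi_{B}(S_{0})$ with $\phi_{B}(S_{1})$, it represents each partition implicitly by a hypothetical message function (`sketch_sigma`, `sketch_A`, `sketch_B`), where two inputs are in the same block exactly when they produce the same message, and it uses Python sets for neighborhoods.

```python
def is_indistinguishable_separated_pair(S0, S1, A, B, k,
                                        sketch_sigma, sketch_A, sketch_B):
    """Check properties (i)-(iv) for a pair of W-neighborhoods."""
    a0, a1 = frozenset(S0 & A), frozenset(S1 & A)   # projections phi_A
    b0, b1 = frozenset(S0 & B), frozenset(S1 & B)   # projections phi_B
    same_sigma = sketch_sigma(frozenset(S0)) == sketch_sigma(frozenset(S1))  # (i)
    same_a = a0 != a1 and sketch_A(a0) == sketch_A(a1)                       # (ii)
    same_b = b0 != b1 and sketch_B(b0) == sketch_B(b1)                       # (iii)
    separated = (len(a0) >= k and len(b0) <= k - 1
                 and len(a1) <= k - 1 and len(b1) >= k)                      # (iv)
    return same_sigma and same_a and same_b and separated


# Usage with k = 2 and hypothetical message functions.
A, B = {0, 1, 2}, {3, 4, 5}
constant = lambda T: 0                 # sends the same message for every input
identity = lambda T: tuple(sorted(T))  # reveals the whole input
pair_hidden = is_indistinguishable_separated_pair(
    {0, 1, 3}, {2, 4, 5}, A, B, 2, constant, constant, constant)
pair_revealed = is_indistinguishable_separated_pair(
    {0, 1, 3}, {2, 4, 5}, A, B, 2, identity, constant, constant)
```

With constant messages the pair is indistinguishable, while a message function that reveals the full neighborhood of $v_{\sigma}$ violates property (i).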

Lemma 4.

Suppose that we obtain the partition $\{A,B\}$ by assigning each vertex in $W$ uniformly at random to either $A$ or $B$. Consider any $v_{i} \in V$. Then, with probability at least $\frac{3}{4}$, there exist distinct $S_{0},S_{1} \in \mathcal{S}$ that form an indistinguishable separated pair for $v_{i}$.

Proof.

We start by defining additional partitions 𝒫~i,A\tilde{\mathcal{P}}_{i,A} and 𝒫~i,B\tilde{\mathcal{P}}_{i,B} that we obtain from 𝒫i,A\mathcal{P}_{i,A} and 𝒫i,B\mathcal{P}_{i,B}, respectively, as follows: For each block 𝒫i,A\mathcal{B}\in\mathcal{P}_{i,A}, we define

ϕA1()=T{S𝒮ϕA(S)=T}.\phi^{-1}_{A}(\mathcal{B})=\bigcup_{T\in\mathcal{B}}\{S\in\mathcal{S}\mid\phi_{A}(S)=T\}.

Partition 𝒫~i,A\tilde{\mathcal{P}}_{i,A} has exactly the same number of blocks as 𝒫i,A\mathcal{P}_{i,A}, and we obtain ~𝒫~i,A\tilde{\mathcal{B}}\in\tilde{\mathcal{P}}_{i,A} from 𝒫i,A\mathcal{B}\in\mathcal{P}_{i,A} by defining it as the set containing all elements in ϕA1()\phi^{-1}_{A}(\mathcal{B}). Analogously, we construct 𝒫~i,B\tilde{\mathcal{P}}_{i,B}, which will have the same number of blocks as 𝒫i,B\mathcal{P}_{i,B}, and, for each 𝒫i,B\mathcal{B}^{\prime}\in\mathcal{P}_{i,B}, we get block ~𝒫~i,B\tilde{\mathcal{B}^{\prime}}\in\tilde{\mathcal{P}}_{i,B}, which contains all elements in ϕB1()=T{S𝒮ϕB(S)=T}\phi^{-1}_{B}(\mathcal{B}^{\prime})=\bigcup_{T\in\mathcal{B}^{\prime}}\{S\in\mathcal{S}\mid\phi_{B}(S)=T\}. Thus, the number of blocks in 𝒫~i,A\tilde{\mathcal{P}}_{i,A}, 𝒫~i,B\tilde{\mathcal{P}}_{i,B}, and 𝒫i,σ\mathcal{P}_{i,\sigma} is at most 2L2^{L}, and all three of them partition the set 𝒮\mathcal{S} given by Lemma 3.
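The lifting of a partition of $\Phi_{A}$ to a partition of $\mathcal{S}$ can be illustrated with a minimal Python sketch. The function name `pull_back` and the toy family below are assumptions made for illustration only; they do not appear in the construction itself.

```python
def pull_back(blocks, family, A):
    """Lift a partition of the projections {S & A} to a partition of `family`.

    Each lifted block is phi_A^{-1}(block): all sets S in `family`
    whose projection onto A lies in the given block.
    """
    return [{S for S in family if (S & A) in block} for block in blocks]


# Toy data (hypothetical): four neighborhoods and a side A of the partition.
family = {
    frozenset({1, 2, 5}),
    frozenset({1, 3, 6}),
    frozenset({2, 3, 5}),
    frozenset({1, 2, 6}),
}
A = frozenset({1, 2, 3})

# A partition of the image set Phi_A = {{1,2}, {1,3}, {2,3}} into two blocks.
blocks = [
    {frozenset({1, 2})},
    {frozenset({1, 3}), frozenset({2, 3})},
]

lifted = pull_back(blocks, family, A)
```

The lifted partition has the same number of blocks as the original one, and two sets with the same projection always land in the same lifted block.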

To show the existence of the sought sets S0S_{0} and S1S_{1}, we need the following combinatorial statement:

Claim 1.

Let $D$ be a set of size $s$, and let $\mathcal{P}_{1},\ldots,\mathcal{P}_{r}$ be $r$ partitions of $D$, such that the number of blocks in each of the partitions is at most $n^{\prime}<s^{1/(r+1)}$. Then, for every $i\in[r]$, there exists a block $B_{i}$ in $\mathcal{P}_{i}$ that contains at least $s^{1-i/(r+1)}$ elements that are in the same block in each of the partitions $\mathcal{P}_{1},\ldots,\mathcal{P}_{i}$.

Proof of Claim 1.

The case $i=1$ is immediate by the pigeonhole principle, since $\mathcal{P}_{1}$ has fewer than $s^{1/(r+1)}$ blocks. For the inductive step, assume that the claim is true for some $i<r$. That is, there is some block $B_{i}\in\mathcal{P}_{i}$ containing at least $s^{1-i/(r+1)}$ elements that are also in the same block in each of the partitions $\mathcal{P}_{1},\ldots,\mathcal{P}_{i}$. Since $\mathcal{P}_{i+1}$ has at most $n^{\prime}<s^{1/(r+1)}$ blocks, the pigeonhole principle yields a block $B_{i+1}\in\mathcal{P}_{i+1}$ that contains at least $s^{1-i/(r+1)}/s^{1/(r+1)}=s^{1-(i+1)/(r+1)}$ of these elements, as required. ∎
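The iterated pigeonhole argument of Claim 1 can be sketched in Python as follows. Representing each partition as a labeling function, and the concrete labelings used here, are illustrative assumptions rather than part of the claim.

```python
from collections import defaultdict


def common_block_subset(D, partitions):
    """Iterated pigeonhole: refine D through each partition in turn,
    always keeping the largest group of elements that share a block.

    Each partition is given as a function mapping an element to a block label.
    """
    current = set(D)
    for label in partitions:
        groups = defaultdict(set)
        for x in current:
            groups[label(x)].add(x)
        current = max(groups.values(), key=len)
    return current


# Toy example (hypothetical): s = 1000 elements, r = 3 partitions,
# each with n' = 3 blocks, so that n' < s^(1/(r+1)) ~ 5.62.
D = range(1000)
partitions = [
    lambda x: x % 3,
    lambda x: (x // 7) % 3,
    lambda x: (x * x) % 3,
]
T = common_block_subset(D, partitions)
```

Each round shrinks the surviving set by a factor of at most $n^{\prime}$, so the result has at least $s/(n^{\prime})^{r}$ elements, which is at least $s^{1-r/(r+1)}$ under the assumption of Claim 1.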

Instantiating Claim 1 with n=2L=2o(k)n^{\prime}=2^{L}=2^{o(k)} and partitions 𝒫~i,A\tilde{\mathcal{P}}_{i,A}, 𝒫~i,B\tilde{\mathcal{P}}_{i,B}, and 𝒫i,σ\mathcal{P}_{i,\sigma}, it follows that there exists a set 𝒯𝒮\mathcal{T}\subseteq\mathcal{S}, such that all sets in 𝒯\mathcal{T} are in the same block in all three partitions. Moreover, (4) tells us that

|𝒯||𝒮|1/4e(k1)/2.\displaystyle|\mathcal{T}|\geq|\mathcal{S}|^{1/4}\geq e^{(k-1)/2}.

Without loss of generality, assume that |𝒯||\mathcal{T}| is even, and pair up the sets in 𝒯\mathcal{T} in some arbitrary way. Let 𝒯^\hat{\mathcal{T}} denote this set of pairs, and observe that

|𝒯^|e(k1)/2log2.\displaystyle|\hat{\mathcal{T}}|\geq e^{(k-1)/2-\log 2}. (5)

Consider any pair (S0,S1)𝒯^(S_{0},S_{1})\in\hat{\mathcal{T}}, and note that we have already shown that Property (i) of Definition 2 holds. To show Properties (ii) and (iii), suppose that S0,S1~A𝒫~i,AS_{0},S_{1}\in\tilde{\mathcal{B}}_{A}\in\tilde{\mathcal{P}}_{i,A}. By construction of 𝒫~i,A\tilde{\mathcal{P}}_{i,A}, we know that there exist some T0,T1AT_{0},T_{1}\in\mathcal{B}_{A} where A\mathcal{B}_{A} is a block in 𝒫i,A\mathcal{P}_{i,A}, such that T0=ϕA(S0)T_{0}=\phi_{A}(S_{0}) and T1=ϕA(S1)T_{1}=\phi_{A}(S_{1}), and, analogously, it follows that ϕB(S0)\phi_{B}(S_{0}) and ϕB(S1)\phi_{B}(S_{1}) are in a common block of 𝒫i,B\mathcal{P}_{i,B}.

To complete the proof, we need to bound the probability that S0S_{0} and S1S_{1} form a separated pair. Note that this will also imply ϕA(S0)ϕA(S1)\phi_{A}(S_{0})\neq\phi_{A}(S_{1}) and ϕB(S0)ϕB(S1)\phi_{B}(S_{0})\neq\phi_{B}(S_{1}), by Definition 1 (see page 1).

Since |S0|=|S1|=2k1|S_{0}|=|S_{1}|=2k-1, it must be true that either |S0A|k|S_{0}\cap A|\geq k or |S0B|k|S_{0}\cap B|\geq k. Without loss of generality, we can assume the former inequality holds—otherwise, simply exchange S0S_{0} and S1S_{1}. Recalling that S0,S1𝒮S_{0},S_{1}\in\mathcal{S}, we know that |S0S1|εk|S_{0}\cap S_{1}|\leq\varepsilon k, which means that, conditioned on |S0A|k|S_{0}\cap A|\geq k, there are still at least

|S1S0|2k1εk\displaystyle|S_{1}\setminus S_{0}|\geq 2k-1-\lceil\varepsilon k\rceil (6)

elements of S1S_{1}, whose random assignment is not fixed by the conditioning. Let \ell be the largest even integer such that 2k1εk,\ell\leq 2k-1-\left\lceil\varepsilon k\right\rceil, and note that

k(2ε)2k(2ε)1.\displaystyle k(2-\varepsilon)-2\leq\ell\leq k(2-\varepsilon)-1. (7)

Fix some order of the elements in S1S0S_{1}\setminus S_{0}, and define ZiZ_{i} (i[]i\in[\ell]) to be the indicator random variable that is 11 if and only if the ii-th element is in BB. Let Z=i=1ZiZ=\sum_{i=1}^{\ell}Z_{i}. Our goal is to show that |(S1S0)B|Zk|(S_{1}\setminus S_{0})\cap B|\geq Z\geq k. In the worst case, all εk\left\lceil\varepsilon k\right\rceil elements in S1S0S_{1}\cap S_{0} lie in AA, and hence E[Z||S0A|k]<k\operatorname*{\textbf{{E}}}\left[Z\ \middle|\ |S_{0}\cap A|\geq k\right]<k. Therefore, we need to make use of the following anti-concentration result to show that there is a sufficiently large probability for ZkZ\geq k:

Lemma 5 (Proposition 7.3.2 in [MV01]).

Let $\ell$ be even, and let $X_{1},\ldots,X_{\ell}$ be independent random variables, where each takes the values $0$ and $1$ with probability $\frac{1}{2}$. Let $X=X_{1}+X_{2}+\cdots+X_{\ell}$. For any integer $t\in[\ell/8]$, it holds that $\operatorname{\mathbf{Pr}}\left[X\geq\frac{\ell}{2}+t\right]\geq\frac{1}{15}e^{-16t^{2}/\ell}$.

To apply Lemma 5, define tt to be an integer such that

εk+12tεk+42,\displaystyle\frac{\varepsilon k+1}{2}\leq t\leq\frac{\varepsilon k+4}{2}, (8)

and note that $t<\ell/8$ holds as long as $\varepsilon\leq\frac{2}{5}-\frac{18}{5k}$, where the subtracted term vanishes as $k=\omega(1)$ grows. By Lemma 5, we have

\begin{align*}
\operatorname{\mathbf{Pr}}\Big[Z\geq k\ \Big|\ |S_{0}\cap A|\geq k\Big] &\geq\operatorname{\mathbf{Pr}}\Big[Z\geq\frac{\ell}{2}+t\ \Big|\ |S_{0}\cap A|\geq k\Big] \\
&\geq\frac{1}{15}\exp\left(-\frac{\ell}{4}\right) && \text{(by Lem. 5, and $t<\ell/8$)} \\
&\geq\frac{1}{15}\exp\left(-\frac{k(2-\varepsilon)}{4}\right) && \text{(by (7))}
\end{align*}

Recalling (5), it follows that the probability that none of the pairs in 𝒯^\hat{\mathcal{T}} is a separated pair is at most

(1ek(2ε)415)ek12log2exp(ek(2ε)4+k12log215)exp(eεk4215)\displaystyle\left(1-\frac{e^{-\frac{k(2-\varepsilon)}{4}}}{15}\right)^{e^{\frac{k-1}{2}-\log 2}}\leq\exp\left(-\frac{e^{-\frac{k(2-\varepsilon)}{4}+\frac{k-1}{2}-\log 2}}{15}\right)\leq\exp\left(-\frac{e^{\frac{\varepsilon k}{4}-2}}{15}\right)

which is at most 14\frac{1}{4} for sufficiently large nn.

It follows that one of the pairs in 𝒯^\hat{\mathcal{T}}, all of which satisfy Properties (i)-(iii), is a separated pair with probability at least 34\frac{3}{4}, which shows Property (iv) and completes the proof of Lemma 4. ∎
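To get a feel for what "sufficiently large" means in the final bound of the proof, the following Python sketch evaluates the expression $\exp\left(-e^{\varepsilon k/4-2}/15\right)$ and searches for the smallest $k$ at which it drops to $\frac{1}{4}$. The value $\varepsilon=0.2$ and the helper names are illustrative assumptions; the proof only requires that $\varepsilon$ satisfies the constraint stated before the application of Lemma 5.

```python
import math


def failure_bound(k, eps):
    """Upper bound exp(-e^{eps*k/4 - 2} / 15) on the probability that
    none of the pairs is a separated pair."""
    return math.exp(-math.exp(eps * k / 4 - 2) / 15)


def smallest_k(eps, target=0.25):
    """Smallest positive integer k for which the bound is at most `target`."""
    k = 1
    while failure_bound(k, eps) > target:
        k += 1
    return k


k0 = smallest_k(eps=0.2)  # eps = 0.2 is a hypothetical choice
```

Since the bound is decreasing in $k$, every $k\geq k_{0}$ also satisfies it.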

Recall that $|V|=\Theta\left(n\right)$. For a randomly chosen partition $\{A,B\}$ of $W$, Lemma 4 and linearity of expectation imply that, in expectation, at least a $\frac{3}{4}$-fraction of the nodes $v_{i}\in V$ have indistinguishable separated pairs. Therefore, there must be some concrete choice $\{A^{*},B^{*}\}$ for $\{A,B\}$ that ensures this property. We are now ready to summarize the main result of this section while recalling the assumed upper bound on the message size:

Corollary 1.

For a given kk-edge connectivity algorithm, let VgoodVV_{\text{good}}\subseteq V be the set of nodes for which there exist indistinguishable separated pairs (see Def. 2 on page 2). If each node sends at most o(k)o(k) bits, then there exists a partition {A,B}\{A^{*},B^{*}\} of WW such that |Vgood|34|V||V_{\text{good}}|\geq\frac{3}{4}|V|.

4 The 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉\mathsf{UniqueOverlap} Problem

We now introduce a communication complexity problem in a 3-party model, whose hardness turns out to be crucial for obtaining a lower bound for deciding kk-edge connectivity in the sketching model.

In the 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉m,s\mathsf{UniqueOverlap}_{m,s} problem, we are given two positive integers mm and ss, where sm/2s\leq\lceil m/2\rceil, and there are three players Alice, Bob, and Charlie. We focus on one-way communication protocols, where Alice and Bob each send a single message to Charlie, who outputs the result. Alice’s input is a ternary vector of length mm, denoted by X{0,1,}mX\in\{0,1,\perp\}^{m}, and, similarly, Bob gets a vector Y{0,1,}mY\in\{0,1,\perp\}^{m}. We use 𝗌𝗎𝗉𝗉(Z)\operatorname{\mathsf{supp}}(Z) to denote the support of a vector ZZ, i.e., the set of indices in [m][m], for which ZiZ_{i}\neq\perp. Charlie does not know anything about XX and YY, except 𝗌𝗎𝗉𝗉(X)\operatorname{\mathsf{supp}}(X) and 𝗌𝗎𝗉𝗉(Y)\operatorname{\mathsf{supp}}(Y).

Definition 3 (Valid Instance for 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉m,s\mathsf{UniqueOverlap}_{m,s}).

The tuple $(X,Y)$ is a valid instance for $\mathsf{UniqueOverlap}_{m,s}$, if $|\operatorname{\mathsf{supp}}(X)|=|\operatorname{\mathsf{supp}}(Y)|=s$ and the following properties hold:

  1. (P1)

    There is exactly one index σ[m]\sigma\in[m] such that XσYσ=1X_{\sigma}\oplus Y_{\sigma}=1, where \oplus denotes the XOR operator.

  2. (P2)

    For every i[m]{σ}i\in[m]\setminus\{\sigma\}, it holds that Xi=X_{i}=\perp or Yi=Y_{i}=\perp.

In conjunction, Properties (P1) and (P2) ensure that |𝗌𝗎𝗉𝗉(X)𝗌𝗎𝗉𝗉(Y)|=1|\operatorname{\mathsf{supp}}(X)\cap\operatorname{\mathsf{supp}}(Y)|=1, and thus Charlie can deduce the value of σ\sigma directly from his input. To solve the problem, Alice and Bob each send a single message to Charlie, who must output “yes” if Xσ=0X_{\sigma}=0 and Yσ=1Y_{\sigma}=1, and “no” otherwise. See Figure 2 for an example.

Figure 2: An instance of 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉8,3\mathsf{UniqueOverlap}_{8,3}. Since (Xσ,Yσ)=(1,0)(X_{\sigma},Y_{\sigma})=(1,0), Charlie needs to answer “no”.
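The definition of a valid instance and Charlie's required output can be made concrete with a small Python sketch. The encoding of $\perp$ as `None`, the 0-based indexing, and the example vectors are assumptions made for illustration; the vectors are not claimed to match the instance drawn in Figure 2.

```python
BOT = None  # stands for the symbol ⊥


def support(Z):
    """Indices at which the vector Z is not ⊥."""
    return {i for i, z in enumerate(Z) if z is not BOT}


def unique_overlap_answer(X, Y, s):
    """Return "yes"/"no" for a valid instance; raise ValueError otherwise."""
    sx, sy = support(X), support(Y)
    common = sx & sy
    # Both supports have size s and overlap in exactly one index (P1, P2).
    if len(X) != len(Y) or len(sx) != s or len(sy) != s or len(common) != 1:
        raise ValueError("not a valid instance")
    (sigma,) = common
    if X[sigma] == Y[sigma]:  # (P1) requires X_sigma XOR Y_sigma = 1
        raise ValueError("not a valid instance")
    return "yes" if (X[sigma], Y[sigma]) == (0, 1) else "no"


# Hypothetical instance with m = 8 and s = 3; the shared index is 4.
X = [1, BOT, 0, BOT, 1, BOT, BOT, BOT]
Y = [BOT, 1, BOT, BOT, 0, BOT, 1, BOT]
answer = unique_overlap_answer(X, Y, s=3)
```

This function has access to both inputs, so it only specifies the correct output; in the actual problem, Charlie sees just the two supports and the two messages.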

4.1 Relationship to Other Problems in Communication Complexity.

In [AAD+23], the authors define the Leave-One Index problem (Problem 3 in [AAD+23]), where there are three players Alice, Bob, and Charlie. Alice and Bob hold the same string x{0,1}Nx\in\{0,1\}^{N} and Charlie knows an index i[N]i^{*}\in[N]. In addition, Alice has a set S[N]S\subseteq[N] and Bob has a set T[N]T\subseteq[N] with the property that ST=[N]{i}S\cup T=[N]\setminus\{i^{*}\}, and the goal is that the players output xix_{i^{*}}, whereby the order of communication is Alice, Bob, and then Charlie. We point out that Leave-One Index lacks an important requirement of the 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉\mathsf{UniqueOverlap} problem, namely that Charlie also knows the exact support sets (not just their size) of the sets SS and TT given to Alice and Bob, which is crucial for the correctness of our simulation. Thus, it is unclear whether there exists a reduction between these problems.

The uniqueness of the shared index σ\sigma may give the impression, at first, that 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉\mathsf{UniqueOverlap} is related to a (promise) variant of computing the set intersection. However, we doubt the existence of a straightforward reduction, as, for the 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉\mathsf{UniqueOverlap} problem, the input vectors of Alice and Bob are ternary, and Charlie knows the (single) intersecting index as well as the support of the two input sets.

4.2 Hardness of the 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉\mathsf{UniqueOverlap} Problem

We now show that either Alice or Bob needs to send a message of size Ω(m)\Omega\left(m\right) bits to Charlie in the worst case.

Theorem 2 (restated).

For any sufficiently large mm, there exists an s=Θ(m)s=\Theta\left(m\right) such that the deterministic one-way communication complexity of 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉m,s\mathsf{UniqueOverlap}_{m,s} in the simultaneous 3-party model is Ω(m)\Omega\left(m\right), where each mm-length input vector has support ss.

Proof.

Consider any protocol for 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉m,s\mathsf{UniqueOverlap}_{m,s} and let CC be the worst case length of any message sent. Assume towards a contradiction that

Cm61.\displaystyle C\leq\left\lfloor\frac{m}{6}\right\rfloor-1. (9)

Suppose that we fix Alice’s support to be a set $I$ of $C+f$ indices from $[m]$, where $f$ is an integer parameter described below. We can partition her $2^{C+f}$ possible input vectors into $2^{C}$ blocks such that the $i$-th block contains all inputs on which Alice sends the number $i$ to Charlie. Clearly, the largest block $\mathcal{B}$ in this partition contains at least $2^{f}$ inputs. Since $2^{f}$ distinct binary vectors on $I$ cannot all agree outside a set of fewer than $f$ indices, there must exist a set $F_{\text{Alice}}(I)\subseteq I$ of $f$ indices in $I$ such that, for each $i\in F_{\text{Alice}}(I)$, there are inputs $X,\hat{X}\in\mathcal{B}$ such that $X_{i}\neq\hat{X}_{i}$, i.e., Alice sends the same message for two inputs in which the bit at index $i$ is $0$ and $1$, respectively. We say that index $i$ is flipped for $I$, and we refer to $F_{\text{Alice}}(I)$ as Alice’s flipped indices for support $I$. Similarly, we can identify a set of flipped indices $F_{\text{Bob}}(I)$ for Bob by partitioning all possible input vectors with support $I$ into blocks. Note that $F_{\text{Alice}}(I)$ and $F_{\text{Bob}}(I)$ need not be the same since Alice and Bob can execute different algorithms.
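For small supports, the flipped indices can be found by brute force. The following Python sketch does this for a toy protocol; the function `flipped_indices` and the message function are hypothetical stand-ins and are not part of the proof.

```python
from collections import defaultdict
from itertools import product


def flipped_indices(support, message):
    """Flipped indices of a largest message block for a fixed support.

    `support` is a tuple of indices; `message` maps a dict {index: bit}
    to a hashable message (a stand-in for Alice's protocol).
    """
    blocks = defaultdict(list)
    for bits in product((0, 1), repeat=len(support)):
        x = dict(zip(support, bits))
        blocks[message(x)].append(x)
    largest = max(blocks.values(), key=len)
    # An index is flipped if both bit values occur at it within the block.
    return {i for i in support if len({x[i] for x in largest}) == 2}


support = (0, 3, 5, 6)
# Toy protocol: Alice sends only the bits at indices 0 and 3 (C = 2 bits).
F = flipped_indices(support, lambda x: (x[0], x[3]))
```

Here $|I|=4$ and $C=2$, so $f=2$, and the two indices that the toy protocol ignores are exactly the flipped ones.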

A crucial observation is that the algorithm fails, if there exist support sets I1I_{1} and I2I_{2} such that

  1. (a)

    |I1I2|=1|I_{1}\cap I_{2}|=1, and

  2. (b)

    (I1I2)(FAlice(I1)FBob(I2))\left(I_{1}\cap I_{2}\right)\subseteq\left(F_{\text{Alice}}(I_{1})\cap F_{\text{Bob}}(I_{2})\right).

These conditions imply that the important index σI1I2\sigma\in I_{1}\cap I_{2} is flipped for Alice’s as well as Bob’s support. In more detail, this means that there are two inputs XX and X^\hat{X} for Alice that are in the same block Alice\mathcal{B}_{\text{Alice}}, where XσX^σX_{\sigma}\neq\hat{X}_{\sigma} and Alice sends the same message π(Alice)\pi(\mathcal{B}_{\text{Alice}}), and there are two inputs YY and Y^\hat{Y} in the same block Bob\mathcal{B}_{\text{Bob}}, where YσY^σY_{\sigma}\neq\hat{Y}_{\sigma}, for which Bob sends π(Bob)\pi(\mathcal{B}_{\text{Bob}}), thus causing Charlie to fail in one of the (two) valid input combinations.

We show that if an algorithm manages to avoid (a) and (b) for all possible pairs of support sets, then it must send at least C=Ω(m)C=\Omega\left(m\right) bits. Let =([m]s)\mathcal{I}=\binom{[m]}{s} be the set family of all possible ss-element support sets for the inputs of Alice and Bob. For each i[m]i\in[m], define

iAlice={IiFAlice(I)},\mathcal{R}_{i}^{\text{Alice}}=\left\{I\in\mathcal{I}\mid i\in F_{\text{Alice}}(I)\right\},

which means that iAlice\mathcal{R}_{i}^{\text{Alice}} contains every support set II where ii is a flipped index for II of Alice, and we define iBob\mathcal{R}_{i}^{\text{Bob}} analogously.

For the rest of the proof, if a statement holds for both $\mathcal{R}_{i}^{\text{Alice}}$ and $\mathcal{R}_{i}^{\text{Bob}}$, we simply omit the superscript and write $\mathcal{R}_{i}$ instead. Recall that each support $I\in\mathcal{I}$ contains at least $f$ flipped indices, i.e., $|F_{\text{Alice}}(I)|\geq f$ and $|F_{\text{Bob}}(I)|\geq f$. This tells us that $I$ must be a member of at least $f$ of the set families $\mathcal{R}_{i}^{\text{Alice}}$ and of at least $f$ of the set families $\mathcal{R}_{i}^{\text{Bob}}$, and thus

$$\sum_{i=1}^{m}\left(|\mathcal{R}_{i}^{\text{Alice}}|+|\mathcal{R}_{i}^{\text{Bob}}|\right)\geq 2f\binom{m}{s}. \tag{10}$$

Consider $I_{1}\in\mathcal{R}_{i}^{\text{Alice}}$ and $I_{2}\in\mathcal{R}_{i}^{\text{Bob}}$. The only way to avoid (a) and (b) is if $I_{1}$ and $I_{2}$ are not valid choices for the support of Alice’s input $X$ and Bob’s input $Y$, respectively (see Def. 3), which is the case if $|I_{1}\cap I_{2}|\geq 2$ (recall that $I_{1}\cap I_{2}=\emptyset$ is impossible since $i\in I_{1}\cap I_{2}$). For any $\mathcal{R}_{i}$, we can obtain a new set family $\hat{\mathcal{R}}_{i}$ by removing the index $i$ from each set in $\mathcal{R}_{i}$, i.e.,

^i={I([m]s1)Ii:I=I{i}}.\displaystyle\hat{\mathcal{R}}_{i}=\left\{I^{\prime}\in\binom{[m]}{s-1}\mid\exists I\in\mathcal{R}_{i}\colon I=I^{\prime}\cup\{i\}\right\}.

The correctness of the assumed protocol implies the following:

Fact 1.

Set families ^iAlice\hat{\mathcal{R}}_{i}^{\text{Alice}} and ^iBob\hat{\mathcal{R}}_{i}^{\text{Bob}} form a cross-intersecting family, which means that

I1^iAliceI2^iBob:|I1I2|1.\displaystyle\forall I_{1}^{\prime}\in\hat{\mathcal{R}}_{i}^{\text{Alice}}\ \forall I_{2}^{\prime}\in\hat{\mathcal{R}}_{i}^{\text{Bob}}\colon|I_{1}^{\prime}\cap I_{2}^{\prime}|\geq 1. (11)

Next, we employ an upper bound on the sum of the cardinalities of any two cross-intersecting set families:

Lemma 6 ([HM67]).

Let 𝒜\mathcal{A} and \mathcal{B} be such that each is a family of \ell-element sets on the same underlying set of size pp and assume that 𝒜\mathcal{A} and \mathcal{B} are cross-intersecting. If p2p\geq 2\ell, then |𝒜|+||1+(p)(p).|\mathcal{A}|+|\mathcal{B}|\leq 1+\binom{p}{\ell}-\binom{p-\ell}{\ell}.
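The bound of Lemma 6 can be attained, which the following Python sketch checks on a small example: one family consists of a single set $S_{0}$, and the other contains every $\ell$-element set that intersects $S_{0}$. The parameters $p=9$ and $\ell=3$ and the function name are illustrative assumptions; the code only exhibits one pair of families and does not prove the lemma.

```python
from itertools import combinations
from math import comb


def extremal_pair(p, l):
    """A cross-intersecting pair: {S0} and all l-subsets of [p] meeting S0."""
    S0 = frozenset(range(l))
    family_a = [S0]
    family_b = [
        frozenset(c) for c in combinations(range(p), l) if S0 & frozenset(c)
    ]
    return family_a, family_b


p, l = 9, 3  # hypothetical parameters with p >= 2 * l
fa, fb = extremal_pair(p, l)
bound = 1 + comb(p, l) - comb(p - l, l)
total = len(fa) + len(fb)
```

The second family has $\binom{p}{\ell}-\binom{p-\ell}{\ell}$ members, because exactly $\binom{p-\ell}{\ell}$ of the $\ell$-element sets avoid $S_{0}$.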

Choosing f=m3+1f=\left\lfloor\frac{m}{3}\right\rfloor+1 and recalling (9), implies that

s=C+fm2,\displaystyle s=C+f\leq\frac{m}{2}, (12)

and therefore 2(s1)m12(s-1)\leq m-1. Thus, we can apply Lemma 6 with parameters p=m1p=m-1 and =s1\ell=s-1 to obtain

|^iAlice|+|^iBob|1+(m1s1)(mss1).\displaystyle|\hat{\mathcal{R}}_{i}^{\text{Alice}}|+|\hat{\mathcal{R}}_{i}^{\text{Bob}}|\leq 1+\binom{m-1}{s-1}-\binom{m-s}{s-1}. (13)

Since $|\hat{\mathcal{R}}_{i}|=|\mathcal{R}_{i}|$, summing (13) over all $i\in[m]$ and combining the result with (10) yields

$$2f\binom{m}{s}\leq m\left(1+\binom{m-1}{s-1}-\binom{m-s}{s-1}\right). \tag{14}$$

However, for sufficiently large $m$, it also holds that

\begin{align*}
2f\binom{m}{s} &=2\left(\left\lfloor\frac{m}{3}\right\rfloor+1\right)\frac{m}{s}\binom{m-1}{s-1} \\
&\geq 4\left(\left(\frac{m}{3}-1\right)+1\right)\binom{m-1}{s-1} && \text{(by (12))} \\
&=\frac{4}{3}m\binom{m-1}{s-1} \\
&>m\left(1+\binom{m-1}{s-1}\right),
\end{align*}

which provides a contradiction to (14), thus completing the proof of Theorem 2. ∎
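As a numerical sanity check of the parameter choices, the following Python sketch evaluates both ends of the inequality chain for a range of values of $m$, taking $C$ at its assumed maximum $\left\lfloor\frac{m}{6}\right\rfloor-1$. The range of $m$ and the function names are illustrative assumptions; the check complements the proof and does not replace it.

```python
from math import comb


def parameters(m):
    """C at its assumed maximum from (9), f as chosen in the proof, s = C + f."""
    C = m // 6 - 1
    f = m // 3 + 1
    return C, f, C + f


def contradiction_holds(m):
    """Check s <= m/2 and 2f * binom(m, s) > m * (1 + binom(m-1, s-1))."""
    C, f, s = parameters(m)
    lhs = 2 * f * comb(m, s)
    rhs = m * (1 + comb(m - 1, s - 1))
    return 2 * s <= m and lhs > rhs


checked = all(contradiction_holds(m) for m in range(12, 301))
```

Python integers have arbitrary precision, so the binomial coefficients are compared exactly.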

4.3 A Deterministic Algorithm for 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉\mathsf{UniqueOverlap}

The argument developed for our lower bound proof in Section 4.2 inspires the design of a simple deterministic algorithm that allows Alice and Bob to save a single bit by sending messages of length s1s-1 under certain conditions. As this is not needed for proving our main result, we relegate the details to Appendix B.

Theorem 3.

If m3<s\lceil\frac{m}{3}\rceil<s, then there exists a deterministic one-way protocol for 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉m,s\mathsf{UniqueOverlap}_{m,s} in the simultaneous 33-party model, such that Alice and Bob send at most s1s-1 bits to Charlie.

5 Simulation

In this section, we show how Alice, Bob, and Charlie can jointly simulate a given kk-edge connectivity algorithm 𝒜k-conn\mathcal{A}^{\text{$k$-conn}} in the 3-party model to solve the 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉m,s\mathsf{UniqueOverlap}_{m,s} problem, for some given integers mm and ss, assuming that the sketches are sufficiently small for Corollary 1 to hold.

We now describe the details of the simulation and how the players can create the lower bound graph (see Sec. 2), given a valid instance of 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉m,s\mathsf{UniqueOverlap}_{m,s}. First, each player locally computes

n=24m3,\displaystyle n=2\left\lceil\frac{4m}{3}\right\rceil, (15)

which will correspond to the number of nodes in the graph $G\in\mathcal{G}_{k,n}$ that the players construct. According to Corollary 1, there is a partition $\{A^{*},B^{*}\}$ of $W$, as well as a set $V_{\text{good}}$ of at least $m$ vertices, with the property that there exists an indistinguishable separated pair (see Def. 2) for each of the nodes in $V_{\text{good}}$. We use $V_{\text{good}}^{\leq m}$ to denote the first $m$ nodes in $V_{\text{good}}$, ordered by IDs. Note that the players know, in advance, the IDs in the sets $V$, $V_{\text{good}}$, $A^{*}$, and $B^{*}$, as these are fixed for all possible lower bound graphs in $\mathcal{G}_{k,n}$. Furthermore, they can also pre-compute the indistinguishable separated pair $(S_{0}^{i},S_{1}^{i})$ for every $v_{i}\in V_{\text{good}}^{\leq m}$, as these are fully determined by fixing $k$, $n$, and the algorithm at hand.

(a) Alice simulates all nodes in AA^{*}.
(b) Bob simulates all nodes in BB^{*}.
(c) Charlie simulates uAu_{A} and uBu_{B}, as well as all nodes in VV, even though he only knows a subset of their incident edges.
Figure 3: Overview of the Simulation. Each player simulates a certain subset of the nodes and their incident edges. Note that the edges of vσv_{\sigma} are “split” between Alice and Bob.

Alice’s and Bob’s Simulation:

Alice simulates all nodes in AA^{*}, whereas Bob is responsible for the nodes in BB^{*}. See Figures 3(a) and 3(b) for an example. First, Alice creates a clique on AA^{*} and connects uAu_{A} to each node in this clique. Then, for each i𝗌𝗎𝗉𝗉(X)i\in\operatorname{\mathsf{supp}}(X) where viVgoodmv_{i}\in V_{\text{good}}^{\leq m}, Alice inspects the input XiX_{i} and adds edges incident to the nodes in AA^{*} as follows:

  • If Xi=0X_{i}=0, then she includes the edges between viv_{i} and the nodes in S1iAS_{1}^{i}\cap A^{*}.

  • On the other hand, if Xi=1X_{i}=1, then Alice adds all edges connecting viv_{i} to S0iAS_{0}^{i}\cap A^{*}.

Bob adds a clique on BB^{*} and, analogously to Alice, includes an edge between uBu_{B} and each node in BB^{*}. Next, we describe how Bob creates the edges between BB^{*} and node vjVgoodmv_{j}\in V_{\text{good}}^{\leq m}, for each j𝗌𝗎𝗉𝗉(Y)j\in\operatorname{\mathsf{supp}}(Y):

  • If Yj=0Y_{j}=0, then he adds edges between vjv_{j} and S0jBS_{0}^{j}\cap B^{*}.

  • If Yj=1Y_{j}=1, he connects vjv_{j} to each node in S1jBS_{1}^{j}\cap B^{*}.

Finally, Alice and Bob locally execute algorithm 𝒜k-conn\mathcal{A}^{\text{$k$-conn}} to compute the sketches produced by their simulated nodes in AA^{*} and BB^{*}, respectively, which they send to Charlie.
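The edge rules for Alice and Bob described above can be sketched in Python as follows. The data layout (a dictionary `pairs` mapping an index $i$ to $(S_{0}^{i},S_{1}^{i})$, string node names, `None` for $\perp$) is an assumption made for illustration, and the sketch omits the cliques on $A^{*}$ and $B^{*}$ as well as the edges to $u_{A}$ and $u_{B}$.

```python
BOT = None  # stands for the symbol ⊥


def alice_edges(X, pairs, A_star):
    """Edges (i, w) between v_i and nodes w in A*, following Alice's rule."""
    edges = set()
    for i, x in enumerate(X):
        if x is BOT or i not in pairs:
            continue
        S0, S1 = pairs[i]
        chosen = S1 if x == 0 else S0  # X_i = 0 -> S_1^i, X_i = 1 -> S_0^i
        edges |= {(i, w) for w in chosen & A_star}
    return edges


def bob_edges(Y, pairs, B_star):
    """Edges (j, w) between v_j and nodes w in B*, following Bob's rule."""
    edges = set()
    for j, y in enumerate(Y):
        if y is BOT or j not in pairs:
            continue
        S0, S1 = pairs[j]
        chosen = S0 if y == 0 else S1  # Y_j = 0 -> S_0^j, Y_j = 1 -> S_1^j
        edges |= {(j, w) for w in chosen & B_star}
    return edges


# Toy data (hypothetical) with k = 2, so each set has 2k - 1 = 3 elements.
A_star = frozenset({"a1", "a2"})
B_star = frozenset({"b1", "b2"})
pairs = {0: (frozenset({"a1", "a2", "b1"}), frozenset({"a1", "b1", "b2"}))}

edges_a = alice_edges([0], pairs, A_star)
edges_b = bob_edges([1], pairs, B_star)
```

In this toy run, the inputs $X_{0}=0$ and $Y_{0}=1$ make both players use $S_{1}^{0}$, so $v_{0}$ receives one edge into $A^{*}$ and two edges into $B^{*}$.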

Charlie’s Simulation:

Charlie is responsible for computing the sketches for $u_{A}$, $u_{B}$, and all nodes in $V$, and he will also simulate the referee. To this end, he creates edges between $u_{A}$ and each node in $A^{*}$ (recall that these edges were also created by Alice for simulating the nodes in $A^{*}$), and he also connects $u_{A}$ via $k$ parallel edges to every $v_{i}\in V$ with $i\notin\operatorname{\mathsf{supp}}(Y)\setminus\{\sigma\}$. Note that this covers every $i$ such that $i\in\operatorname{\mathsf{supp}}(X)$ (including $i=\sigma$) or $i\notin\operatorname{\mathsf{supp}}(X)\cup\operatorname{\mathsf{supp}}(Y)$, whereby the latter case is equivalent to $X_{i}=Y_{i}=\perp$. Charlie also includes $k$ parallel edges connecting $u_{B}$ to every $v_{j}$ where $j\in\operatorname{\mathsf{supp}}(Y)\setminus\{\sigma\}$. Similarly, he adds edges between $u_{B}$ and each node in $B^{*}$. See Figure 3(c).

Next, we describe how Charlie simulates the nodes in VV:

  • First, consider the crucial node $v_{i}\in V_{\text{good}}^{\leq m}$, where $\sigma=i$: Since Charlie does not know whether Alice and Bob used $S_{0}^{i}$ or $S_{1}^{i}$ for creating the edges between their nodes in $A^{*}\cup B^{*}$ and $v_{\sigma}$, he cannot hope to faithfully recreate the entire neighborhood of $v_{i}$. However, what he can do instead is to directly compute the sketch that algorithm $\mathcal{A}^{\text{$k$-conn}}$ produces for node $v_{i}$, in the case that $\sigma=i$. To see why this is true, recall that $S_{0}^{i}$ and $S_{1}^{i}$ form an indistinguishable separated pair, and Charlie knows the block $\mathcal{B}_{i}$ in partition $\mathcal{P}_{i,\sigma}$ that contains $S_{0}^{\sigma}$ as well as $S_{1}^{\sigma}$. This ensures that $v_{i}$ sends the same sketch $\pi(\mathcal{B}_{i})$ given either neighborhood.

  • For any node vjVgoodmv_{j}\in V_{\text{good}}^{\leq m} (jσj\neq\sigma), Charlie proceeds analogously as in the case j=σj=\sigma, with the only difference being that he no longer uses 𝒫j,σ\mathcal{P}_{j,\sigma} for identifying the common block that contains S0jS_{0}^{j} and S1jS_{1}^{j} (and deriving vjv_{j}’s sketch). Instead, if vjv_{j} is AA-restricted (i.e., j𝗌𝗎𝗉𝗉(X)𝗌𝗎𝗉𝗉(Y)j\in\operatorname{\mathsf{supp}}(X)\setminus\operatorname{\mathsf{supp}}(Y)), then he uses 𝒫~j,A\tilde{\mathcal{P}}_{j,A^{*}} to find the common block that contains S0jS_{0}^{j} and S1jS_{1}^{j} and uses the associated block in 𝒫j,A\mathcal{P}_{j,A^{*}} (see Page 4) to derive vjv_{j}’s sketch. Otherwise, vjv_{j} must be BB-restricted (j𝗌𝗎𝗉𝗉(Y)𝗌𝗎𝗉𝗉(X)j\in\operatorname{\mathsf{supp}}(Y)\setminus\operatorname{\mathsf{supp}}(X)), prompting him to find the common block in 𝒫~j,B\tilde{\mathcal{P}}_{j,B^{*}}, and compute vjv_{j}’s sketch accordingly.

  • Charlie can directly simulate the remaining nodes in VV, i.e., every

    vr(VVgoodm){viVgoodm|i(𝗌𝗎𝗉𝗉(X)𝗌𝗎𝗉𝗉(Y))},v_{r}\in(V\setminus V_{\text{good}}^{\leq m})\cup\left\{v_{i}\in V_{\text{good}}^{\leq m}\ \middle|\ i\notin\left(\operatorname{\mathsf{supp}}(X)\cup\operatorname{\mathsf{supp}}(Y)\right)\right\},

    since the only edges added for vrv_{r} are kk parallel edges to uAu_{A}.

Charlie then simulates the referee: he provides it with the sketches that he computed for the nodes in $V$, $u_{A}$, and $u_{B}$, as well as the sketches that he received from Alice and Bob for the nodes in $A^{*}$ and $B^{*}$, which is sufficient for invoking $\mathcal{A}^{\text{$k$-conn}}$ and obtaining the referee's decision. Charlie answers “yes” if and only if the referee decides that the graph is $k$-edge connected.
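Charlie's case distinction for a node $v_{i}\in V_{\text{good}}^{\leq m}$ depends only on the two supports, which the following Python sketch makes explicit. The function name, the string labels for the partitions, and the example supports are hypothetical; the hub assignment follows Points 5 and 6 of Definition 4.

```python
def charlie_case(i, supp_x, supp_y):
    """Return (partition Charlie consults for v_i, hub that v_i attaches to).

    The partition is None when v_i has no edges into A* or B*.
    """
    in_x, in_y = i in supp_x, i in supp_y
    if in_x and in_y:  # i is the unique shared index sigma
        return ("P_{i,sigma}", "u_A")
    if in_x:           # v_i is A-restricted
        return ("P_{i,A*}", "u_A")
    if in_y:           # v_i is B-restricted
        return ("P_{i,B*}", "u_B")
    return (None, "u_A")


# Hypothetical supports with the shared index sigma = 4.
supp_x = {0, 2, 4}
supp_y = {1, 4, 6}
cases = {i: charlie_case(i, supp_x, supp_y) for i in range(8)}
```

None of the cases requires the values of $X$ or $Y$, which is why Charlie can produce these sketches without further communication.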

5.1 Correctness and Complexity Bounds of the Simulation

As made evident by the above description, our simulation constructs only certain graphs in $\mathcal{G}_{k,n}$. In particular, these are graphs where all edges in the cut $E(V,W)$ are determined by indistinguishable separated pairs. We start by formalizing this restriction:

Definition 4.

Consider any graph $G\in\mathcal{G}_{k,n}$ and recall the existence of the set $V_{\text{good}}$, the partition $\{A^{*},B^{*}\}$, and the indistinguishable separated pairs $(S_{0}^{i},S_{1}^{i})$ for each $v_{i}\in V^{\leq m}_{\text{good}}$, guaranteed by applying Corollary 1 to algorithm $\mathcal{A}^{\text{$k$-conn}}$. We say that $G$ is compatible with a valid instance $(X,Y)$ of $\mathsf{UniqueOverlap}_{m,s}$ if $|V_{\text{good}}|\geq m$ and, for every $i\in[m]$:

  1. 1.

    If Xi=0X_{i}=0, then E(vi,A)=AS1iE(v_{i},A^{*})=A^{*}\cap S_{1}^{i}.

  2. 2.

    If Xi=1X_{i}=1, then E(vi,A)=AS0iE(v_{i},A^{*})=A^{*}\cap S_{0}^{i}.

  3. 3.

    If Yi=0Y_{i}=0, then E(vi,B)=BS0iE(v_{i},B^{*})=B^{*}\cap S_{0}^{i}.

  4. 4.

    If Yi=1Y_{i}=1, then E(vi,B)=BS1iE(v_{i},B^{*})=B^{*}\cap S_{1}^{i}.

  5. 5.

    If i𝗌𝗎𝗉𝗉(X)i\in\operatorname{\mathsf{supp}}(X) or i(𝗌𝗎𝗉𝗉(X)𝗌𝗎𝗉𝗉(Y))i\notin(\operatorname{\mathsf{supp}}(X)\cup\operatorname{\mathsf{supp}}(Y)), then viv_{i} has kk edges to uAu_{A}.

  6. 6.

    If i𝗌𝗎𝗉𝗉(Y)𝗌𝗎𝗉𝗉(X)i\in\operatorname{\mathsf{supp}}(Y)\setminus\operatorname{\mathsf{supp}}(X), then viv_{i} has kk edges to uBu_{B}.

Lemma 7.

Let GG be a graph that is compatible with an instance I=(X,Y)I=(X,Y) of 𝖴𝗇𝗂𝗊𝗎𝖾𝖮𝗏𝖾𝗋𝗅𝖺𝗉\mathsf{UniqueOverlap}. It holds that GG is kk-edge connected if and only if the solution to II is “yes”, i.e., Xσ=0X_{\sigma}=0 and Yσ=1Y_{\sigma}=1.

Proof.

Since $G$ is a compatible graph, we know that $G\in\mathcal{G}_{k,n}$. Thus, it follows from Lemma 2 that $G$ is $k$-edge connected if and only if condition (C1) holds, i.e., $|E(v_{\sigma},A^{*})|\leq k-1$ and $|E(v_{\sigma},B^{*})|\geq k$. Since $(S_{0}^{\sigma},S_{1}^{\sigma})$ form a separated pair, we know from Definition 1 that this is satisfied for $S_{1}^{\sigma}$. According to Definition 4, we use $S^{\sigma}_{1}$ to determine the neighbors of $v_{\sigma}$ in $A^{*}\cup B^{*}$ if and only if $X_{\sigma}=0$ and $Y_{\sigma}=1$, which proves the correspondence between $k$-edge connectivity and the solution to $\mathsf{UniqueOverlap}$. ∎

Equipped with Definition 4, we are ready to show that the sketches produced by the simulation indeed correspond to an actual execution of 𝒜k-conn\mathcal{A}^{\text{$k$-conn}} on a suitable graph in 𝒢k,n\mathcal{G}_{k,n}.

Lemma 8.

Suppose that algorithm 𝒜k-conn\mathcal{A}^{\text{$k$-conn}} has a maximum sketch length of L=o(k)L=o(k) bits. Then, Charlie provides as input to the referee in the simulation the same set of sketches that the referee receives when executing algorithm 𝒜k-conn\mathcal{A}^{\text{$k$-conn}} on a graph compatible with the instance (X,Y)(X,Y).

Proof.

Since the players construct a graph G𝒢k,nG\in\mathcal{G}_{k,n}, where n=24m3n=2\left\lceil\frac{4m}{3}\right\rceil, it follows from (2) (on page 2) that |V|4m3|V|\geq\left\lceil\frac{4m}{3}\right\rceil. According to Corollary 1, this implies that the set VgoodV_{\text{good}} must have a size of cmc\cdot m for some constant c1c\geq 1, which ensures that the set VgoodmVgoodV_{\text{good}}^{\leq m}\subseteq V_{\text{good}} of size mm exists. Recall that all players can compute VgoodV_{\text{good}}, since this is fully determined by the algorithm at hand (i.e., 𝒜k-conn\mathcal{A}^{\text{$k$-conn}}) and the parameters nn, kk, and LL. We now argue that the simulation computes the correct sketch for every type of node in graph GG:

  • Nodes uAu_{A} and uBu_{B}: These nodes are only simulated by Charlie who knows 𝗌𝗎𝗉𝗉(X)\operatorname{\mathsf{supp}}(X), 𝗌𝗎𝗉𝗉(Y)\operatorname{\mathsf{supp}}(Y), and thus also σ\sigma. By the description of the simulation, Charlie creates their incident edges in a way that satisfies Points 5 and 6 of Definition 4.

  • Node $w_{j}\in A^{*}\cup B^{*}$: Without loss of generality, assume that $w_{j}\in A^{*}$; the case $w_{j}\in B^{*}$ is symmetric. In the execution on the compatible graph $G$, the edges in the cut $E(V,w_{j})$ are fully determined by the separated pairs $(S_{0}^{1},S_{1}^{1}),\ldots,(S_{0}^{m},S_{1}^{m})$, which corresponds to how Alice creates the edges incident to $w_{j}$ in the simulation. Moreover, Alice will also add an edge $\{u_{A},w_{j}\}$, which ensures the same sketch for $w_{j}$ as in the actual execution.

  • Node viVgoodmv_{i}\in V_{\text{good}}^{\leq m}: According to the simulation, Charlie will select one of the blocks σ,~A,~B\mathcal{B}_{\sigma},\tilde{\mathcal{B}}_{A},\tilde{\mathcal{B}}_{B}, where σPi,σ\mathcal{B}_{\sigma}\in P_{i,\sigma}, ~AP~i,A\tilde{\mathcal{B}}_{A}\in\tilde{P}_{i,A^{*}} and ~BP~i,B\tilde{\mathcal{B}}_{B}\in\tilde{P}_{i,B^{*}}, with the property that each of the blocks contains (S0i,S1i)(S_{0}^{i},S_{1}^{i}). Recall that these blocks exist due to Corollary 1. Let APi,A\mathcal{B}_{A}\in P_{i,A^{*}} be the associated block of ~A\tilde{\mathcal{B}}_{A}, and BPi,B\mathcal{B}_{B}\in P_{i,B^{*}} be the associated block of ~B\tilde{\mathcal{B}}_{B}, see Page 4. If i=σi=\sigma, then he computes π(σ)\pi(\mathcal{B}_{\sigma}) where π(σ)\pi(\mathcal{B}_{\sigma}) corresponds to the message sent by viv_{i} for the inputs in σ\mathcal{B}_{\sigma}; otherwise, if i𝗌𝗎𝗉𝗉(X)𝗌𝗎𝗉𝗉(Y)i\in\operatorname{\mathsf{supp}}(X)\setminus\operatorname{\mathsf{supp}}(Y), then he computes π(A)\pi(\mathcal{B}_{A}). Finally, in the case that i𝗌𝗎𝗉𝗉(Y)𝗌𝗎𝗉𝗉(X)i\in\operatorname{\mathsf{supp}}(Y)\setminus\operatorname{\mathsf{supp}}(X), he computes π(B)\pi(\mathcal{B}_{B}) instead. A crucial observation is that, even though Charlie does not know whether Alice used S0iAS_{0}^{i}\cap A^{*} or S1iAS_{1}^{i}\cap A^{*} to create the edges in E(vi,A)E(v_{i},A^{*}), this does not prevent him from computing the correct sketch for viv_{i}, since S0iS_{0}^{i} and S1iS_{1}^{i} are guaranteed to be in the same block in each of the partitions; a similar argument applies to the edges in E(vi,B)E(v_{i},B^{*}).

  • Node viVVgoodmv_{i}\in V\setminus V_{\text{good}}^{\leq m}: These nodes do not have any edges to ABA^{*}\cup B^{*}. Instead, Charlie only creates kk parallel edges to uAu_{A}, which matches the neighborhood of viv_{i} in the compatible graph GG. ∎

Lemma 9.

Suppose there exists a deterministic $k$-edge connectivity algorithm with the property that every node sends a sketch of length at most $L=o\left(k\right)$ bits when executing on $\mathcal{G}_{k,n}$. Then, $\mathsf{UniqueOverlap}_{m,s}$ has a deterministic communication complexity of $o\left(k\cdot\sqrt{m}\right)$ bits.

Proof.

By Lemma 8, Charlie invokes the referee with the same set of sketches as in the execution of algorithm $\mathcal{A}^{\text{$k$-conn}}$ on the compatible graph $G\in\mathcal{G}_{k,n}$. Thus, correctness follows directly from Lemma 7.

Next, we show the claimed bound on the communication complexity of $\mathsf{UniqueOverlap}_{m,s}$. By (15) (see page 15), we know that the size $n$ of $G$ is chosen such that $n=\Theta\left(m\right)$. Since every node in $A\cup B$ sends a sketch of at most $L$ bits in the simulation, and these are all the nodes simulated by Alice and Bob, sending these concatenated sketches to Charlie requires at most

\[
O\left(|A\cup B|\cdot L\right)=O\left(|W|\cdot L\right)=O\left(L\cdot\sqrt{n}\right)=o\left(k\cdot\sqrt{m}\right) \tag{16}
\]

bits, where we have used (3) in the second-last equality. ∎

6 Combining the Pieces

We now use the results from the previous sections to prove Theorem 1.

Theorem 1 (restated).

Every deterministic algorithm that decides $k$-edge connectivity on $n$-node graphs in the distributed sketching model has a worst case message length of $\Omega(k)$ bits, for any super-constant $k=k(n)\leq\gamma\sqrt{n}$, where $\gamma>0$ is a suitable constant.

Let $L$ be the maximum length of a sketch. Suppose, towards a contradiction, that every node in $V$ sends at most $L=o\left(k\right)$ bits, and consider an instance of $\mathsf{UniqueOverlap}_{m,s}$. Lemma 9 tells us that we can solve this instance with Alice and Bob sending messages of $o\left(k\cdot\sqrt{m}\right)$ bits. Since $k=O\left(\sqrt{n}\right)=O\left(\sqrt{m}\right)$, the total length of the messages sent is thus $o(m)$ bits. This contradicts Theorem 2, which states that any deterministic algorithm for $\mathsf{UniqueOverlap}_{m,s}$ must send $\Omega\left(m\right)$ bits in the worst case.
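For reference, the asymptotic chain behind this contradiction can be written out in one display. It only restates the bounds quoted above from Lemma 9, (3), and (15); the symbol $C$ is a name introduced here for the total number of bits sent by Alice and Bob and does not appear elsewhere in the paper.

```latex
\begin{align*}
C &= O\left(|W|\cdot L\right) = O\left(L\cdot\sqrt{n}\right) && \text{by (16), using (3)} \\
  &= o\left(k\cdot\sqrt{m}\right) && \text{since } L=o(k) \text{ and } n=\Theta(m) \\
  &= o\left(\sqrt{m}\cdot\sqrt{m}\right) = o(m) && \text{since } k\leq\gamma\sqrt{n}=O\left(\sqrt{m}\right),
\end{align*}
```

whereas Theorem 2 requires $C=\Omega(m)$.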

7 Conclusion and Open Problems

A problem left open by our work is the complexity of randomized sketching algorithms for deciding $k$-edge connectivity. As outlined in Section 1, the existing lower bound approach for graph connectivity does not appear to be amenable to a straightforward generalization. This suggests that either new technical ideas are needed for proving such a bound, or perhaps there is indeed a more efficient randomized algorithm:

Open Problem 1.

What is the sketch length of deciding $k$-edge connectivity, if we allow a small probability of error?

Our result implies that any deterministic sketching algorithm must send sketches of $\Omega(k)$ bits, i.e., at least linear in $k$, in the worst case. While the randomized AGM sketches [AGM12a] yield an upper bound of $O(k\log^{3}n)$ bits, currently there is no non-trivial deterministic upper bound:

Open Problem 2.

Is there a deterministic $k$-edge connectivity algorithm that matches the best known randomized upper bound of $O\left(k\log^{3}n\right)$ bits on the sketch length?

Finally, we point out that our techniques do not shed light on the most fundamental open problem in this setting, namely (1-edge) connectivity:

Open Problem 3.

Can we solve connectivity deterministically in the distributed sketching model if messages are of length $o(n)$?

Acknowledgements

The authors would like to thank the anonymous reviewer for pointing out the relationship between the $\mathsf{UniqueOverlap}$ problem and the Leave-One Index problem of [AAD+23].

Appendix

Appendix A Proof of Lemma 3

Lemma 3 (restated).

Consider a set $W$, any small $\varepsilon>0$, and any positive integer $d\leq\frac{\varepsilon|W|}{2e^{2+4/\varepsilon}}$. There exists a set family $\mathcal{S}\subseteq\binom{W}{d}$ such that, for any two distinct $S_{1},S_{2}\in\mathcal{S}$, we have $|S_{1}\cap S_{2}|\leq\frac{\varepsilon d}{2}$, and $|\mathcal{S}|\geq e^{d-1}$.

Proof.

The proof is similar to Lemma 6 in [KNP+17a]. Fix $N=\left\lfloor\sqrt{\left(\frac{\varepsilon|W|}{2ed}\right)^{\varepsilon d/2}-1}\right\rfloor$. We choose $S_{1},\dots,S_{N}$ independently and uniformly at random from $\binom{W}{d}$. We first bound the expected size of the overlap of two given subsets. Consider two distinct indices $i,j\leq N$, and define indicator random variables $X_{1},\dots,X_{d}$ such that $X_{k}=1$ if and only if the $k$-th element of $S_{j}$ is contained in $S_{i}$, which happens with probability $\frac{d}{|W|}$, and thus $\operatorname{\mathbf{E}}\left[|S_{i}\cap S_{j}|\right]=\sum_{k=1}^{d}\operatorname{\mathbf{E}}\left[X_{k}\right]=\frac{d^{2}}{|W|}$. While the $X_{k}$ variables are not independent, they are negatively dependent, and thus we can use a Chernoff bound to show concentration. For $\delta=\frac{\varepsilon|W|}{2d}-1$, we get

\begin{align*}
\operatorname{\mathbf{Pr}}\left[|S_{i}\cap S_{j}|\geq(1+\delta)\operatorname{\mathbf{E}}\left[|S_{i}\cap S_{j}|\right]\right]
&\leq\left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\operatorname{\mathbf{E}}\left[|S_{i}\cap S_{j}|\right]} \\
&\leq\left(\frac{e^{\frac{\varepsilon|W|}{2d}}}{\left(\frac{\varepsilon|W|}{2d}\right)^{\frac{\varepsilon|W|}{2d}}}\right)^{d^{2}/|W|} \\
&=\left(\frac{\varepsilon|W|}{2ed}\right)^{-\varepsilon d/2}.
\end{align*}

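Since $(1+\delta)\operatorname{\mathbf{E}}\left[|S_{i}\cap S_{j}|\right]=\frac{\varepsilon|W|}{2d}\cdot\frac{d^{2}}{|W|}=\frac{\varepsilon d}{2}$, this bounds the probability that a fixed pair of subsets violates the intersection bound. It follows by a union bound over the $\binom{N}{2}\leq\left(\frac{\varepsilon|W|}{2ed}\right)^{\varepsilon d/2}-1$ possible pairs of subsets that, with nonzero probability, the sampled sets satisfy the required bounded intersection property; hence a set family $\mathcal{S}$ with this property exists. Note that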

\begin{align*}
|\mathcal{S}|=\exp\left(\log N\right)
&\geq\exp\left(\frac{1}{2}\left(\log\left(\left(\frac{\varepsilon|W|}{2ed}\right)^{\varepsilon d/2}-1\right)-2\right)\right) \\
&\geq\exp\left(\frac{1}{2}\left(\log\left(\left(e^{1+4/\varepsilon}\right)^{\varepsilon d/2}-1\right)-2\right)\right) && \text{(since $d\leq\frac{\varepsilon|W|}{2e^{2+4/\varepsilon}}$)} \\
&\geq\exp\left(\frac{1}{2}\left(\log\left(\left(e^{4/\varepsilon}\right)^{\varepsilon d/2}\right)-2\right)\right) \\
&\geq\exp\left(\frac{1}{2}\left(\log\left(e^{2d}\right)-2\right)\right) \\
&\geq e^{d-1}.
\end{align*}
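The sampling argument can be tried out numerically. The following Python sketch is an illustration only, not part of the proof: it draws $N$ uniform $d$-subsets of $\{0,\ldots,|W|-1\}$ and resamples until all pairwise intersections are at most $\varepsilon d/2$. The function names `family_size` and `sample_family`, the `seed` parameter, and the resampling loop are assumptions made for this example; the proof itself only needs the existence of one good sample.

```python
import math
import random
from itertools import combinations


def family_size(universe_size, d, eps):
    """N = floor(sqrt((eps*|W| / (2*e*d))**(eps*d/2) - 1)), as fixed in the proof."""
    base = eps * universe_size / (2 * math.e * d)
    return math.floor(math.sqrt(base ** (eps * d / 2) - 1))


def sample_family(universe_size, d, eps, seed=0):
    """Sample N uniform d-subsets; retry until every pairwise overlap is <= eps*d/2.

    The retry loop is an assumption of this sketch (the proof is non-constructive).
    It expects d <= eps*|W| / (2*e**(2 + 4/eps)), as in the lemma.
    """
    rng = random.Random(seed)
    n_sets = family_size(universe_size, d, eps)
    limit = eps * d / 2
    while True:
        family = [frozenset(rng.sample(range(universe_size), d))
                  for _ in range(n_sets)]
        # A repeated set has overlap d > limit, so a passing family has distinct sets.
        if all(len(a & b) <= limit for a, b in combinations(family, 2)):
            return family
```

For instance, with $|W|=4000$, $d=4$ and $\varepsilon=1$ the precondition $d\leq\varepsilon|W|/(2e^{6})$ holds, and `sample_family(4000, 4, 1.0)` returns a family whose size can be compared against the bound $e^{d-1}$.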

Appendix B Proof of Theorem 3

Theorem 3 (restated).

If $\lceil\frac{m}{3}\rceil<s$, then there exists a deterministic one-way protocol for $\mathsf{UniqueOverlap}_{m,s}$ in the simultaneous $3$-party model, such that Alice and Bob each send at most $s-1$ bits to Charlie.

Proof.

Consider an input vector $X$ with $\operatorname{\mathsf{supp}}(X)=\{i_{1},i_{2},\ldots,i_{s}\}$ where $i_{1}<i_{2}<\ldots<i_{s}$. We can obtain a binary string of length $s=|\operatorname{\mathsf{supp}}(X)|$ from $X$ by simply removing all $\perp$ from the vector. In other words, the binary string of $X$ is $X_{i_{1}}X_{i_{2}}\ldots X_{i_{s}}$. For the rest of the proof, we use $X$ to refer to the input vector or its binary string representation depending on the context.

We define $\pi_{i_{a}}(X)$ to be the mapping of the bit string of $X$ to an $(s-1)$-length bit string that we get by removing $X_{i_{a}}$. That is,

\[
\pi_{i_{a}}(X_{i_{1}}X_{i_{2}}\ldots X_{i_{a}}\ldots X_{i_{s}})=X_{i_{1}}X_{i_{2}}\ldots X_{i_{a-1}}X_{i_{a+1}}\ldots X_{i_{s}}.
\]

Our algorithm determines the message sent by a party given the input vector $X$ to be $\pi_{i_{a}}(X)$, whereby $i_{a}$ is chosen in a way that guarantees that Charlie can compute the output correctly. Note that, given $\operatorname{\mathsf{supp}}(X)$ and $\pi_{i_{a}}(X)$, the only bit value of $X$ that Charlie cannot deduce with certainty is the value at index $i_{a}$, i.e., $X_{i_{a}}$. Now, consider a valid input pair with $\operatorname{\mathsf{supp}}(X)=\{i_{1},\ldots,i_{s}\}$ and $\operatorname{\mathsf{supp}}(Y)=\{j_{1},\ldots,j_{s}\}$, where $\{\sigma\}=\operatorname{\mathsf{supp}}(X)\cap\operatorname{\mathsf{supp}}(Y)$. Upon receiving the messages $\pi_{i_{a}}(X)$ and $\pi_{j_{b}}(Y)$ sent by Alice and Bob respectively, for some values $i_{a}$ and $j_{b}$, Charlie can deduce $X_{\sigma}$ and $Y_{\sigma}$ only if at least one of $i_{a}$ or $j_{b}$ is not $\sigma$.

In the following, we describe how to choose the value $i_{a}\in\operatorname{\mathsf{supp}}(X)$ so that Charlie can compute the output correctly: We construct a partition of $\binom{[m]}{s}$ into blocks $\mathcal{R}_{1},\ldots,\mathcal{R}_{m}$ such that for each $i\in[m]$:

  • (a) For all $I\in\mathcal{R}_{i}$, we have $i\in I$.

  • (b) For all $I,J$ in $\mathcal{R}_{i}$, it holds that $|I\cap J|\geq 2$. (Footnote 5: The reader may notice that the set families $\mathcal{R}_{1},\ldots,\mathcal{R}_{m}$ are defined analogously to those in the lower bound proof in Section 4.2.)

To see why Properties (a) and (b) are sufficient, consider $I=\operatorname{\mathsf{supp}}(X)$ and $J=\operatorname{\mathsf{supp}}(Y)$ such that $I\cap J=\{\sigma\}$. Let $i$ and $j$ be such that $I\in\mathcal{R}_{i}$ and $J\in\mathcal{R}_{j}$. The messages sent by Alice and Bob given inputs $X$ and $Y$ are thus $\pi_{i}(X)$ and $\pi_{j}(Y)$, respectively. Since $|I\cap J|=1$, Property (b) implies that $I$ and $J$ lie in different blocks, and it follows that $i\neq j$. Hence, at least one of them is not equal to $\sigma$, which allows Charlie to deduce the correct value of $X_{\sigma}$ and $Y_{\sigma}$.

To complete the proof, we give a concrete construction of the blocks $\mathcal{R}_{1},\ldots,\mathcal{R}_{m}$: We partition $[m]$ into $\lceil\frac{m}{3}\rceil$ intervals, each of length $3$, except possibly one interval, where the length is either $1$ or $2$. We define $\Phi(b)$ to be the successor of $b$ in the interval containing $b$, following the cyclic order. For instance, if the interval is $[b,b+1,b+2]$, then

\[
\Phi(b+i)=b+((i+1)\bmod{3}),
\]

for $i=0,1,2$. Since we are considering cyclic order, the intervals are referred to as cycles. Note that if a cycle has only one element, say $a$, we set $\Phi(a)$ to be an arbitrary element different from $a$.

Let $\mathcal{I}=\binom{[m]}{s}$. We define $\mathcal{R}_{i}$ to be the set that contains all $I$ in $\mathcal{I}$ such that both $i$ and $\Phi(i)$ are in $I$, i.e., $\mathcal{R}_{i}=\{I\in\mathcal{I}\mid\{i,\Phi(i)\}\subseteq I\}$. To ensure that $\mathcal{R}_{i}$ and $\mathcal{R}_{j}$ are disjoint for $i\neq j$, we remove any $I$ that occurs in $\mathcal{R}_{i}\cap\mathcal{R}_{j}$ from one of them. It is straightforward to verify that the resulting $\mathcal{R}_{1},\ldots,\mathcal{R}_{m}$ indeed satisfy Properties (a) and (b).

It remains to show that $\bigcup_{i=1}^{m}\mathcal{R}_{i}=\mathcal{I}$. Consider $I\in\mathcal{I}$. Recall that there are $s$ elements in $I$ and $\lceil\frac{m}{3}\rceil$ cycles. Given that $s>\lceil\frac{m}{3}\rceil$ by assumption, there must exist distinct elements $a$ and $b$ in $I$ that are in the same cycle. Since every cycle has at most three elements, one of $a$ and $b$ is the cyclic successor of the other, i.e., $\Phi(a)=b$ or $\Phi(b)=a$. Hence $I$ is in some $\mathcal{R}_{i}$. ∎
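The construction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: indices are 1-based, an input is a dictionary mapping each support index to its bit, Charlie is assumed to know both supports, overlapping blocks are made disjoint by always taking the smallest qualifying index, and $\Phi$ of a singleton cycle is set to index 1. All function names (`successor`, `dropped_index`, `encode`, `recover_bit`, `check_all_pairs`) are hypothetical.

```python
from itertools import combinations


def successor(i, m):
    """Phi(i): cyclic successor of i (1-based) inside its interval of length <= 3."""
    start = 3 * ((i - 1) // 3) + 1       # first element of the interval containing i
    length = min(3, m - start + 1)       # the last interval may have length 1 or 2
    if length == 1:
        # Singleton cycle: any element other than i is allowed (assumed choice).
        return 1 if i != 1 else 2
    return start + (i - start + 1) % length


def dropped_index(support, m):
    """Index i with {i, Phi(i)} inside support; smallest such i (assumed tie-break)."""
    members = set(support)
    for i in sorted(members):
        if successor(i, m) in members:
            return i
    raise ValueError("no block found; the construction needs s > ceil(m / 3)")


def encode(x, m):
    """Message for input x (dict: support index -> bit): all bits except the dropped one."""
    drop = dropped_index(x, m)
    return tuple(x[i] for i in sorted(x) if i != drop)


def recover_bit(support, message, sigma, m):
    """Charlie's view: the sender's bit at sigma, or None if that bit was dropped."""
    drop = dropped_index(support, m)
    if drop == sigma:
        return None
    kept = [i for i in sorted(support) if i != drop]
    return message[kept.index(sigma)]


def check_all_pairs(m, s):
    """Exhaustively confirm that supports sharing exactly one index drop different indices."""
    supports = list(combinations(range(1, m + 1), s))
    for I in supports:
        for J in supports:
            if len(set(I) & set(J)) == 1 and dropped_index(I, m) == dropped_index(J, m):
                return False
    return True
```

As a usage example, take $m=7$ and $s=4$ with Alice holding `{1: 0, 2: 1, 4: 1, 7: 0}` and Bob holding `{3: 1, 5: 0, 6: 1, 7: 0}`, so that $\sigma=7$. Alice drops index 1 and Bob drops index 5, so neither message withholds the bit at $\sigma$; `check_all_pairs(7, 4)` verifies by brute force that the two dropped indices differ for every pair of supports that overlap in a single index.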

References

  • [AAD+23] Vikrant Ashvinkumar, Sepehr Assadi, Chengyuan Deng, Jie Gao, and Chen Wang. Evaluating stability in massive social networks: Efficient streaming algorithms for structural balance. In APPROX/RANDOM, 2023.
  • [AGM12a] Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Analyzing graph structure via linear measurements. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 459–467. SIAM, 2012.
  • [AGM12b] Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Graph sketches: sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems, pages 5–14, 2012.
  • [AKO20] Sepehr Assadi, Gillat Kol, and Rotem Oshman. Lower bounds for distributed sketching of maximal matchings and maximal independent sets. In PODC ’20: ACM Symposium on Principles of Distributed Computing, Virtual Event, Italy, August 3-7, 2020, pages 79–88, 2020.
  • [BMN+11] Florent Becker, Martin Matamala, Nicolas Nisse, Ivan Rapaport, Karol Suchan, and Ioan Todinca. Adding a referee to an interconnection network: What can (not) be computed in one round. In 2011 IEEE International Parallel & Distributed Processing Symposium, pages 508–514. IEEE, 2011.
  • [BMRT14] Florent Becker, Pedro Montealegre, Ivan Rapaport, and Ioan Todinca. The simultaneous number-in-hand communication model for networks: Private coins, public coins and determinism. In Structural Information and Communication Complexity - 21st International Colloquium, SIROCCO 2014, Takayama, Japan, July 23-25, 2014. Proceedings, pages 83–95, 2014.
  • [GMT15] Sudipto Guha, Andrew McGregor, and David Tench. Vertex and hyperedge connectivity in dynamic graph streams. In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 241–247, 2015.
  • [HKT+19] Jacob Holm, Valerie King, Mikkel Thorup, Or Zamir, and Uri Zwick. Random k-out subgraph leaves only o(n/k) inter-component edges. In 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 896–909, 2019.
  • [HM67] Anthony JW Hilton and Eric C Milner. Some intersection theorems for systems of finite sets. The Quarterly Journal of Mathematics, 18(1):369–384, 1967.
  • [JLN18] Tomasz Jurdzinski, Krzysztof Lorys, and Krzysztof Nowicki. Communication complexity in vertex partition whiteboard model. In International Colloquium on Structural Information and Communication Complexity, pages 264–279. Springer, 2018.
  • [JN17] Tomasz Jurdzinski and Krzysztof Nowicki. Brief announcement: on connectivity in the broadcast congested clique. In 31st International Symposium on Distributed Computing (DISC 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
  • [Juk01] Stasys Jukna. Extremal combinatorics: with applications in computer science, volume 29. Springer, 2001.
  • [KN97] Eyal Kushilevitz and Noam Nisan. Communication Complexity. Cambridge University Press, 1997.
  • [KNP+17a] Michael Kapralov, Jelani Nelson, Jakub Pachocki, Zhengyu Wang, David P Woodruff, and Mobin Yahyazadeh. Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 475–486. IEEE, 2017.
  • [KNP+17b] Michael Kapralov, Jelani Nelson, Jakub Pachocki, Zhengyu Wang, David P Woodruff, and Mobin Yahyazadeh. Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams. arXiv preprint arXiv:1704.00633, 2017.
  • [KRW95] Mauricio Karchmer, Ran Raz, and Avi Wigderson. Super-logarithmic depth lower bounds via the direct sum in communication complexity. Computational Complexity, 5:191–204, 1995.
  • [McG14] Andrew McGregor. Graph stream algorithms: a survey. ACM SIGMOD Record, 43(1):9–20, 2014.
  • [MT16] Pedro Montealegre and Ioan Todinca. Brief announcement: deterministic graph connectivity in the broadcast congested clique. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, pages 245–247, 2016.
  • [MV01] Jiří Matoušek and Jan Vondrák. The probabilistic method. Lecture Notes, Department of Applied Mathematics, Charles University, Prague, 2001.
  • [NY19] Jelani Nelson and Huacheng Yu. Optimal lower bounds for distributed and streaming spanning forest computation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 1844–1860, 2019.
  • [Pin64] Mark S Pinsker. Information and information stability of random variables and processes. Holden-Day, 1964.
  • [PP20] Shreyas Pai and Sriram V Pemmaraju. Connectivity lower bounds in broadcast congested clique. In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
  • [Rob23] Peter Robinson. Distributed sketching lower bounds for k-edge connected spanning subgraphs, BFS trees, and LCL problems. In 37th International Symposium on Distributed Computing (DISC 2023), pages 32–1. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2023.
  • [Yu21] Huacheng Yu. Tight distributed sketching lower bound for connectivity. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 1856–1873, 2021.