
Practical Rateless Set Reconciliation

Lei Yang (leiy@csail.mit.edu), Massachusetts Institute of Technology, Cambridge, MA, USA; Yossi Gilad (yossigi@cs.huji.ac.il), Hebrew University of Jerusalem, Jerusalem, Israel; and Mohammad Alizadeh (alizadeh@csail.mit.edu), Massachusetts Institute of Technology, Cambridge, MA, USA
(2024)
Abstract.

Set reconciliation, where two parties hold fixed-length bit strings and run a protocol to learn the strings they are missing from each other, is a fundamental task in many distributed systems. We present Rateless Invertible Bloom Lookup Tables (Rateless IBLTs), the first set reconciliation protocol, to the best of our knowledge, that achieves low computation cost and near-optimal communication cost across a wide range of scenarios: set differences of one to millions, bit strings of a few bytes to megabytes, and workloads injected by potential adversaries. Rateless IBLT is based on a novel encoder that incrementally encodes the set difference into an infinite stream of coded symbols, resembling rateless error-correcting codes. We compare Rateless IBLT with state-of-the-art set reconciliation schemes and demonstrate significant improvements. Rateless IBLT achieves 3–4× lower communication cost than non-rateless schemes with similar computation cost, and 2–2000× lower computation cost than schemes with similar communication cost. We show the real-world benefits of Rateless IBLT by applying it to synchronize the state of the Ethereum blockchain, and demonstrate 5.6× lower end-to-end completion time and 4.4× lower communication cost compared to the system used in production.

Set Reconciliation, Rateless Codes, Universal Codes, Data Synchronization, Randomized Algorithms
journalyear: 2024; copyright: rights retained; conference: ACM SIGCOMM 2024 Conference (ACM SIGCOMM ’24), August 4–8, 2024, Sydney, NSW, Australia; doi: 10.1145/3651890.3672219; isbn: 979-8-4007-0614-1/24/08; ccs: Networks (Application layer protocols; Network algorithms; Network protocol design); Mathematics of computing (Coding theory); Computer systems organization (Dependable and fault-tolerant systems and networks)

1. Introduction

Recent years have seen growing interest in distributed applications, such as blockchains (Nakamoto, 2008; Wood, 2014; Androulaki et al., 2018), social networks (Raman et al., 2019), mesh messaging (Perry et al., 2022), and file hosting (Trautwein et al., 2022). In these applications, nodes (participating servers) maintain replicas of the entire or part of the application state, and synchronize their replicas by exchanging messages on a peer-to-peer network.

The naive approach to solving this problem requires communication proportional to the size of the set one party holds. For example, some applications send the entire set while other applications have parties exchange the hashes of their items or a Bloom filter of their sets (the Bloom filter’s size is proportional to the set size). A node then requests the items it is missing. All these solutions induce high overhead, especially when nodes have large, overlapping sets. This is a common scenario in distributed applications, such as nodes in a blockchain network synchronizing transactions or account balances, social media servers synchronizing users’ posts, or a name system synchronizing certificates or revocation lists (Summermatter and Grothoff, 2022).

An emerging alternative solution to the state synchronization problem is set reconciliation. It abstracts the application's state as a set and then uses a reconciliation protocol to synchronize replicas. Crucially, the overhead is determined by the set difference size rather than the set size, allowing support for applications with very large states. However, existing set reconciliation protocols suffer from at least one of two major caveats. First, most protocols are parameterized by the size of the set difference between the two participating parties. However, in practice, setting this parameter is difficult, since scenarios such as temporary network disconnections or fluctuating load on the system make it challenging to know the exact difference size ahead of time. Thus, application designers often resort to running online estimation protocols, which induce additional latency and only give a statistical estimate of the set difference size. Such estimators are inaccurate, forcing the application designers to tailor the parameters to the tail of the potential error, resulting in high communication overhead. The second type of caveat is that some set reconciliation protocols suffer from high decoding complexity, where the recipient has to run a quadratic-time or worse algorithm with relatively expensive operations.

We propose a rateless set reconciliation scheme called Rateless Invertible Bloom Lookup Tables (Rateless IBLT) that addresses these challenges. In Rateless IBLT, a sender generates an infinite stream of coded symbols that encode the set difference, and the recipient can decode the set difference once it receives enough coded symbols. Rateless IBLT has no parameters and does not need an estimate of the set difference size. With overwhelming probability, the recipient can decode the set difference after receiving a number of coded symbols that is proportional to the set difference size rather than the entire set size, resulting in low overhead. Rateless IBLT's coded symbols are universal: the same sequence of coded symbols can be used to reconcile any number of differences with any other set. Therefore, the sender can create coded symbols once and use them to synchronize with any number of peers. The latter property is particularly useful for applications such as blockchain peer-to-peer networks, where nodes may synchronize with multiple sources with overlapping states, since it allows a node to recover the union of their states using coded symbols it concurrently receives from all of them.

In summary, we make the following contributions:

  1. The design of Rateless IBLT, the first set reconciliation protocol that achieves low computation cost and near-optimal communication cost across a wide range of scenarios: set differences of one to millions, bit strings of a few bytes to megabytes, and workloads injected by potential adversaries.

  2. A mathematical analysis of Rateless IBLT's communication and computation costs. We prove that as the set difference size $d$ goes to infinity, Rateless IBLT reconciles $d$ differences with $1.35d$ communication. We show in simulations that the communication cost is between $1.35d$ and $1.72d$ on average for all values of $d$, and that it quickly converges to $1.35d$ when $d$ is in the low hundreds.

  3. An implementation of Rateless IBLT as a library. When reconciling 1000 differences, our implementation can process input data (sets being reconciled) at 120 MB/s using a single core of a 2016-model CPU.

  4. Extensive experiments comparing Rateless IBLT with state-of-the-art solutions. Rateless IBLT achieves 3–4× lower communication cost than regular IBLT (Goodrich and Mitzenmacher, 2011) and MET-IBLT (Lázaro and Matuz, 2023), two non-rateless schemes; and 2–2000× lower computation cost than PinSketch (Dodis et al., 2008).

  5. Demonstration of Rateless IBLT's real-world benefits by applying our implementation to synchronize the account states of the Ethereum blockchain. Compared to Merkle trie (Yue et al., 2020), today's de facto solution, Rateless IBLT achieves 5.6× lower completion time and 4.4× lower communication cost on historic traces.

2. Motivation and Related Work

We first formally define the set reconciliation problem (Minsky et al., 2003; Eppstein et al., 2011). Let $A$ and $B$ be two sets containing items (bit strings) of the same length $\ell$. $A$ and $B$ are stored by two distinct parties, Alice and Bob. They want to efficiently compute the symmetric difference of $A$ and $B$, i.e., $(A\cup B)\setminus(A\cap B)$, denoted as $A\bigtriangleup B$. By convention (Eppstein et al., 2011), we assume that only one of the parties, Bob, wants to compute $A\bigtriangleup B$, because he can send the result to Alice afterward if needed.

While straightforward solutions exist, such as exchanging Bloom filters (Bloom, 1970) or hashes of the items, they incur $O(|A|+|B|)$ communication and computation costs. The costs can be improved to logarithmic by hashing the sets into Merkle tries (Yue et al., 2020), where a trie node at depth $i$ is the hash of a $(1/2^{i})$-fraction of the set. Alice and Bob traverse and compare their tries, only descending into a sub-trie (subset) if their roots (hashes) differ. However, the costs still depend on $|A|$ and $|B|$, and reconciliation now takes $O(\log|A|+\log|B|)$ round trips.

In contrast, the information-theoretic lower bound (Minsky et al., 2003, § 2) of the communication cost is $d\ell$, where $d=|A\bigtriangleup B|$. (More precisely, the lower bound is $d\ell-d\log_{2}d$ (Minsky et al., 2003, § 2), but the second term can be neglected when $d\ll 2^{\ell}$.) State-of-the-art solutions get close to this lower bound using techniques from coding theory. On a high level, we can view $B$ as a copy of $A$ with $d$ errors (insertions and/or deletions), and the goal of set reconciliation is to correct these errors. Alice encodes $A$ into a list of $m$ coded symbols and sends them to Bob. Bob then uses the coded symbols and $B$ to decode $A\bigtriangleup B$. The coded symbols are the parity data in a systematic error-correcting code that can correct set insertions and deletions (Mitzenmacher and Varghese, 2012). Using appropriate codes, it takes $m=O(d)$ coded symbols, each of length $O(\ell)$, to correct the $d$ errors, resulting in a communication cost of $O(d\ell)$.

The performance of existing solutions varies depending on the codes they use. Characteristic Polynomial Interpolation (CPI) (Minsky et al., 2003) uses a variant of Reed-Solomon codes (Reed and Solomon, 1960), where coded symbols are evaluations of a polynomial uniquely constructed from $A$. CPI has a communication cost of $d\ell$, achieving the information-theoretic lower bound. However, its computation cost is $O(|A|d\ell)$ for Alice, and $O(|B|d\ell+d^{3}\ell^{4})$ for Bob. The latter was improved to $O(|B|d\ell+d^{2}\ell^{2})$ in PinSketch (Dodis et al., 2008; Wuille, 2018) using BCH codes (Bose and Ray-Chaudhuri, 1960) that are easier to decode. Nevertheless, as we show in § 7.2, computation on both Alice and Bob quickly becomes intractable even at moderate $|A|$, $|B|$, and $d$, limiting its applicability. For example, Shrec (Han et al., 2020) attempted to use PinSketch to synchronize transactions in a high-throughput blockchain but found that its high computation complexity severely limits system throughput (Han et al., 2020, § 5.2).

Invertible Bloom Lookup Tables (IBLTs) (Goodrich and Mitzenmacher, 2011) use sparse graph codes similar to LT (Luby, 2002) and LDPC (Gallager, 1962) codes. Each set item is summed into $k$ coded symbols, denoted as its neighbors in a random, sparse graph. Some variants also consider graphs with richer structure, such as varying $k$ depending on the set item (Lázaro and Matuz, 2021). The computation cost is $O(|A|k\ell)$ for Alice, and $O((|B|+d)k\ell)$ for Bob. The communication cost is $O(d\ell)$ with a coefficient strictly larger than 1 (e.g., 4–10 for small $d$, see § 7.1). Due to their random nature, IBLTs may fail to decode even if properly parameterized (Eppstein et al., 2011). We provide more background on IBLTs in § 3.

The aforementioned discussion assumes that the codes are properly parameterized. In particular, we need to decide $m$, the number of coded symbols Alice sends to Bob. Decoding will fail if $m$ is too small compared to $d$, and we incur redundant communication and computation if $m$ is too large. The optimal choice of $m$ is thus a function of $d$. However, accurate prediction of $d$ is usually difficult (Naumenko et al., 2019; Han et al., 2020), and sometimes outright impossible (Ozisik et al., 2019). Existing systems often resort to online estimation protocols (Eppstein et al., 2011) and over-provision $m$ to accommodate the ensuing errors (Eppstein et al., 2011; Ozisik et al., 2019).

The case for rateless reconciliation. A key feature of Rateless IBLT is that it can generate an infinite stream of coded symbols for a set, resembling rateless error-correcting codes (Byers et al., 1998). For any $m>0$, the first $m$ coded symbols can reconcile $O(m)$ set differences with a coefficient close to 1 (0.74 in most cases, see § 7.1). This means that Rateless IBLT does not require parameterization, making real-world deployments easy and robust. Alice simply keeps sending coded symbols to Bob, and Bob can decode as soon as he receives enough—which we show analytically (§ 5) and experimentally (§ 7.1) to be about $1.35d$ in most cases—coded symbols. Neither Alice nor Bob needs to know $d$ beforehand. The encoding and decoding algorithms have zero parameters.

The concept of incrementally generating coded symbols was first mentioned in CPI (Minsky et al., 2003). However, as mentioned before, its real-world use has been limited by its high computation cost. We discuss these limitations in § 7, and demonstrate that Rateless IBLT reduces the computation cost by 2–2000×, while incurring a communication cost of less than 2× the information-theoretic lower bound. Concurrently with our work, MET-IBLT (Lázaro and Matuz, 2023) proposes to simultaneously optimize the parameters of IBLTs for multiple pre-selected values of $d$, e.g., $d_{1},d_{2},\dots,d_{n}$, such that the list of coded symbols for $d_{i}$ is a prefix/suffix of that for $d_{j}$. However, it only considers a few values of $d$ due to the complexity of the optimization, so it still requires workload-dependent parameterization. As we show in § 7.1, its communication cost is 4–10× higher for the $d$ values it is not optimized for. In addition, MET-IBLT does not provide a practical algorithm to incrementally generate coded symbols. Rateless IBLT has none of these issues.

Rateless IBLT offers additional benefits. Imagine that Alice has the canonical system state, and multiple peers wish to reconcile their states with Alice. In a non-rateless scheme, Alice must separately produce coded symbols for each peer depending on the particular number of differences $d$. This incurs additional computation and storage I/O for every peer she reconciles with. More importantly, Alice must produce the coded symbols on the fly, because she does not know $d$ before a peer arrives. In comparison, using Rateless IBLT, Alice simply maintains a universal sequence of coded symbols and streams it to anyone who wishes to reconcile. Rateless IBLT also allows her to incrementally update the coded symbols as she modifies the state (set), further amortizing the encoding costs.

To the best of our knowledge, Rateless IBLT is the first set reconciliation solution that simultaneously achieves the following properties:

  • Ratelessness. The encoder generates an infinite sequence of coded symbols, capable of reconciling any number of differences $d$ with low overhead.

  • Universality. The same algorithm works efficiently for any $|A|$, $|B|$, $d$, and $\ell$ without any parameters.

  • Low communication cost. The average communication cost peaks at $1.72d\ell$ when $d=4$, and quickly converges to $1.35d\ell$ when $d$ reaches the low hundreds.

  • Low computation cost. Encoding costs $O(\ell\log d)$ per set item, and decoding costs $O(\ell\log d)$ per difference. In practice, a single core on a 2016-model CPU can encode (decode) 3.4 million items (differences) per second when $d=1000$ and $\ell=8$ bytes.

We demonstrate these advantages by comparing with all the aforementioned schemes in § 7 and applying Rateless IBLT to a potential application in § 7.3.

3. Background

Our Rateless IBLT retains the format of coded symbols and the decoding algorithm of regular IBLTs, but employs a new encoder that is oblivious to the number of differences to reconcile. In this section, we provide the necessary background on IBLTs (Eppstein et al., 2011; Goodrich and Mitzenmacher, 2011) and explain why regular IBLTs fail to provide the rateless property that we desire. We discuss the new rateless encoder in the next section.

On a high level, an IBLT is an encoding of a set. We call the items (bit strings) in the set the source symbols, and an IBLT comprises a list of $m$ coded symbols. Each source symbol is mapped to $k$ coded symbols uniformly at random, e.g., by using $k$ hash functions. Here, $m$ and $k$ are design parameters.

Coded symbol format. A coded symbol contains two fields: sum, the bitwise exclusive-or (XOR) sum of the source symbols mapped to it; and checksum, the bitwise XOR sum of the hashes of the source symbols mapped to it. In practice, there is usually a third field, count, which we will discuss shortly. Fig. 1 provides an example.

Figure 1. Example of constructing a regular IBLT for set $A$ with source symbols $x_{0},x_{1},x_{2},x_{3}$. The IBLT has $m=6$ coded symbols: $a_{0},a_{1},a_{2},a_{3},a_{4},a_{5}$. Each source symbol is mapped to $k=3$ coded symbols. Solid lines represent the mapping between source and coded symbols. For example, for $a_{4}$, $\mathtt{sum}=x_{1}\oplus x_{3}$, $\mathtt{checksum}=\mathtt{Hash}(x_{1})\oplus\mathtt{Hash}(x_{3})$, and $\mathtt{count}=2$. $\oplus$ is the bitwise exclusive-or operator.
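To make the format concrete, the following Go sketch shows one way to represent a coded symbol (the Symbol type, field widths, and hashSymbol mixer are illustrative assumptions, not the exact layout of our library):

package riblt

// Symbol is one set item. We use uint64 for brevity; the actual
// implementation supports arbitrary fixed-length bit strings.
type Symbol = uint64

// CodedSymbol is one coded symbol of an IBLT.
type CodedSymbol struct {
	Sum      Symbol // XOR sum of the source symbols mapped here
	Checksum uint64 // XOR sum of the hashes of those source symbols
	Count    int64  // number of source symbols mapped here
}

// hashSymbol stands in for the agreed-upon hash function; this 64-bit
// mixer is for illustration only (see § 4.3 for the keyed hash used
// in practice).
func hashSymbol(x Symbol) uint64 {
	x ^= x >> 33
	x *= 0xff51afd7ed558ccd
	x ^= x >> 33
	return x
}

// apply XORs source symbol x into c; dir is +1 when adding a symbol
// during encoding, and ±1 when the decoder removes a recovered one.
func (c *CodedSymbol) apply(x Symbol, dir int64) {
	c.Sum ^= x
	c.Checksum ^= hashSymbol(x)
	c.Count += dir
}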

Peeling decoder. To recover source symbols from a list of coded symbols, the decoder runs a recursive procedure called “peeling”. We say a coded symbol is pure when exactly one source symbol is mapped to it; or, equivalently, when its checksum equals the hash of its sum (Eppstein et al., 2011), barring a hash collision, which happens with probability negligible in the length of the hash (see § 4.3). In this case, its sum field is the source symbol itself, which is now recovered. The decoder then removes the recovered source symbol from any other coded symbols it is mapped to (determined by the $k$ agreed-upon hash functions), by XOR-ing the source symbol and its hash into their sum and checksum fields, respectively. This process may generate additional pure symbols; the decoder repeats until no pure symbols are left. Decoding fails if it stops before recovering all source symbols (Goodrich and Mitzenmacher, 2011). Fig. 2 shows the example of decoding the IBLT in Fig. 1.

(a) Iteration 1
(b) Iteration 2
(c) Iteration 3
Figure 2. Example of decoding the IBLT in Fig. 1 using peeling. Dark colors represent pure coded symbols at the beginning of each iteration, and source symbols recovered so far. Dashed edges are removed at the end of each iteration, by XOR-ing the source symbol (now recovered) and its hash on one end of the edge into the sum and checksum fields of the coded symbol on the other end.
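The loop below sketches the peeling procedure in Go, continuing the CodedSymbol type above. The neighbors helper, which returns the coded-symbol indices a source symbol maps to under the agreed-upon hash functions, is assumed; counts of ±1 arise after the subtraction described next.

// peel repeatedly extracts pure coded symbols from cells and returns
// the recovered source symbols. Decoding failed if some cells remain
// non-zero afterwards.
func peel(cells []CodedSymbol, neighbors func(Symbol, int) []int) []Symbol {
	var recovered []Symbol
	for progress := true; progress; {
		progress = false
		for i := range cells {
			c := cells[i] // snapshot; cells[i] may change below
			// Pure: exactly one source symbol remains in this cell.
			if (c.Count == 1 || c.Count == -1) && c.Checksum == hashSymbol(c.Sum) {
				x := c.Sum
				recovered = append(recovered, x)
				// Remove x from every cell it maps to, including this one.
				for _, j := range neighbors(x, len(cells)) {
					cells[j].apply(x, -c.Count)
				}
				progress = true
			}
		}
	}
	return recovered
}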

Subtraction of coded symbols. $a\oplus b$ denotes the subtraction of two coded symbols $a$ and $b$. For the resulting coded symbol, its sum is the bitwise XOR of $a.\mathtt{sum}$ and $b.\mathtt{sum}$; its checksum is the bitwise XOR of $a.\mathtt{checksum}$ and $b.\mathtt{checksum}$; and its count is $a.\mathtt{count}-b.\mathtt{count}$.
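In code, continuing the sketch above:

// subtract computes the coded symbol a ⊖ b.
func subtract(a, b CodedSymbol) CodedSymbol {
	return CodedSymbol{
		Sum:      a.Sum ^ b.Sum,
		Checksum: a.Checksum ^ b.Checksum,
		Count:    a.Count - b.Count,
	}
}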

Set reconciliation using IBLTs. IBLTs with the same parameter configuration ($m$, $k$, and the hash functions mapping source symbols to coded symbols) can be subtracted (Eppstein et al., 2011). For any two sets $A$ and $B$, $\text{IBLT}(A)\oplus\text{IBLT}(B)=\text{IBLT}(A\bigtriangleup B)$, where the $\oplus$ operator subtracts each corresponding pair of coded symbols from the two IBLTs. This is because if a source symbol is present in both $A$ and $B$, then it is XOR-ed twice into each coded symbol it is mapped to in $\text{IBLT}(A)\oplus\text{IBLT}(B)$, resulting in no effect. As a result, $\text{IBLT}(A)\oplus\text{IBLT}(B)$ is an encoding of the source symbols that appear exactly once across $A$ and $B$, i.e., $A\bigtriangleup B$.

To reconcile $A$ and $B$, Alice sends $\text{IBLT}(A)$ to Bob, who then computes and decodes $\text{IBLT}(A)\oplus\text{IBLT}(B)$ to recover $A\bigtriangleup B$. To determine whether a recovered source symbol belongs to $A$ or $B$, we use the count field. (Alternatively, Bob may try looking up each item in $A\bigtriangleup B$ against $B$, but this requires indexing $B$, which is undesirable when $|B|$ is large.) It records the number of source symbols mapped to a coded symbol. When a coded symbol is pure, $\mathtt{count}=1$ indicates that the recovered source symbol is exclusive to $A$, and $\mathtt{count}=-1$ indicates $B$ (Eppstein et al., 2011).

Limitations of IBLTs. IBLTs are not rateless. An IBLT with a particular set of parameters $m$, $k$ only works for a narrow range of difference sizes $d$. It quickly becomes inefficient when used for more or fewer differences than parameterized for. In § A, we show Theorems A.1 and A.2, which we summarize informally here. First, with high probability, Bob cannot recover any source symbol in $A\bigtriangleup B$ when $d>m$, making undersized IBLTs completely useless. On the other hand, we cannot simply default to a very large $m$ to accommodate a potentially large $d$. If $d$ turns out to be small, i.e., $d\ll m$, Alice still has to send almost the entire IBLT ($m$ coded symbols) for Bob to decode successfully, leading to high communication cost. Alice cannot dynamically enlarge $m$, either. Each source symbol is already uniformly mapped to $k$ out of $m$ coded symbols upon encoding. Increasing $m$ post hoc would require remapping the source symbols to the expanded space of coded symbols so that the mapping remains uniform. This requires Alice to rebuild and re-send the entire IBLT. Figs. 3(a) and 3(b) show an example.

4. Design

(a) Regular IBLT, $m=4$
(b) Regular IBLT, $m=7$
(c) Rateless IBLT, prefix of $m=4$
(d) Rateless IBLT, prefix of $m=7$
Figure 3. Regular IBLTs and prefixes of Rateless IBLT for 5 source symbols. Figs. a, c (left) have too few coded symbols and are undecodable. Figs. b, d (right) are decodable. Red edges are common across each row. Dark coded symbols in Figs. b, d are new or changed compared to their counterparts in Figs. a, c. Imagine that Alice sends 4 coded symbols but Bob fails to decode. In regular IBLT, to enlarge $m$, she has to send all 7 coded symbols, since the existing 4 symbols also changed. In Rateless IBLT, she only needs to send the 3 new symbols; the existing 4 symbols stay the same.

For any set $S$, Rateless IBLT defines an infinite sequence of coded symbols. Intuitively, an infinite number of coded symbols for $A$ and $B$ allows Rateless IBLT to accommodate an arbitrarily large set difference. Every prefix of this infinite sequence functions like a normal IBLT and can reconcile a number of differences proportional to its length. Meanwhile, because these prefixes belong to a common infinite sequence, Alice simply streams the sequence until Bob receives a long enough prefix to decode. For any $d>0$, on average, reconciling $d$ differences requires only the first $1.35d$–$1.72d$ coded symbols in the sequence.

4.1. Coded Symbol Sequence

Our main task is to design the algorithm that encodes any set $S$ into an infinite sequence of coded symbols, denoted as $s_{0},s_{1},s_{2},\dots$. It should provide the following properties:

  • Decodability. With high probability, the peeling decoder can recover all source symbols in a set $S$ using a prefix of $s_{0},s_{1},s_{2},\dots$ with length $O(|S|)$.

  • Linearity. For any sets $A$ and $B$, $a_{0}\oplus b_{0},a_{1}\oplus b_{1},a_{2}\oplus b_{2},\dots$ is the coded symbol sequence for $A\bigtriangleup B$.

  • Universality. The encoding algorithm does not need any extra information other than the set being encoded.

These properties allow us to build the following simple protocol for rateless reconciliation. To reconcile $A$ and $B$, Alice incrementally sends $a_{0},a_{1},a_{2},\dots$ to Bob. Bob computes $a_{0}\oplus b_{0},a_{1}\oplus b_{1},a_{2}\oplus b_{2},\dots$, and tries to decode these symbols using the peeling decoder. Bob notifies Alice to stop when he has recovered all source symbols in $A\bigtriangleup B$. As we will soon show, the first symbol $a_{0}\oplus b_{0}$ in Rateless IBLT is decoded only after all source symbols are recovered. This is the indicator for Bob to terminate.

Linearity guarantees that $a_{0}\oplus b_{0},a_{1}\oplus b_{1},a_{2}\oplus b_{2},\dots$ is the coded symbol sequence for $A\bigtriangleup B$. Decodability guarantees that Bob can decode after receiving $O(|A\bigtriangleup B|)$ coded symbols and recover all source symbols in $A\bigtriangleup B$. Universality guarantees that Alice and Bob do not need any prior context to run the protocol.
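Bob's side of this protocol can be sketched as follows; the Decoder interface and the recv/local callbacks are illustrative assumptions, not our library's API.

// Decoder is an assumed interface to a peeling decoder for the coded
// symbols of A △ B.
type Decoder interface {
	AddCodedSymbol(CodedSymbol)
	IndexZeroDecoded() bool // has coded symbol 0 been peeled?
	Recovered() []Symbol
}

// reconcile runs Bob's loop: subtract his i-th coded symbol from
// Alice's, feed the difference to the decoder, and stop once coded
// symbol 0 decodes, which happens only after all of A △ B is
// recovered (see § 4.1.2).
func reconcile(recv func() CodedSymbol, local func(int) CodedSymbol, dec Decoder) []Symbol {
	for i := 0; ; i++ {
		dec.AddCodedSymbol(subtract(recv(), local(i)))
		if dec.IndexZeroDecoded() {
			return dec.Recovered()
		}
	}
}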

If Alice regularly reconciles with multiple peers, she may cache coded symbols for $A$ to avoid recomputing them every session. Universality implies that Alice can reuse the same cached symbols across different peers. Linearity implies that if she updates her set $A$ to $A^{\prime}$, she can incrementally update the cached symbols by treating the updates $A\bigtriangleup A^{\prime}$ as a set and subtracting its coded symbols from the cached ones for $A$.

We now discuss how we design an encoding algorithm that satisfies the three properties we set out to achieve.

4.1.1. Linearity & Universality

Our key observation is that to ensure linearity, it is sufficient to define a consistent mapping rule which, given any source symbol $x\in\{0,1\}^{*}$ and any index $i\geq 0$, deterministically decides whether $x$ should be mapped to the $i$-th coded symbol when encoding a set that contains $x$. This ensures that if $x\in A\cap B$, then it will be mapped to both $a_{i}$ and $b_{i}$ or neither; in either case, $x$ will not be reflected in $a_{i}\oplus b_{i}$. On the other hand, if $x\in A\bigtriangleup B$, and $x$ should be mapped to index $i$ according to the rule, then it will be mapped to exactly one of $a_{i}$ or $b_{i}$, and therefore will be reflected in $a_{i}\oplus b_{i}$. Since the mapping rule makes decisions based only on $x$ and $i$, the resulting encoding algorithm also satisfies universality.

4.1.2. Decodability

Whether the peeling decoder can recover all source symbols from a set of coded symbols is fully determined by the mapping between the source and the coded symbols. Let $\rho(i)$ be the probability that a random source symbol maps to the $i$-th coded symbol, which we refer to as the mapping probability. It is the key property that defines the behavior of a mapping rule. In the remainder of this subsection, we constrain $\rho(i)$ by examining two necessary conditions for peeling to succeed. Our key conclusion is that in order for decodability to hold, $\rho(i)$ must be inversely proportional to $i$. This rejects most functions as candidates for $\rho(i)$ and leads us to a concrete instantiation of $\rho(i)$. We will design a concrete algorithm (mapping rule) that realizes the mapping probability in the next subsection, and mathematically prove that it satisfies decodability in § 5.

First, to kick-start the peeling decoder, there must be a coded symbol with exactly one source symbol mapped to it (a pure coded symbol). For a set $S$ and index $i$, the probability that this happens decreases quasi-exponentially in $\rho(i)|S|$. This implies that $\rho(i)$ must decrease quickly with $i$. Otherwise, each of the first $O(|S|)$ coded symbols would have an exponentially small probability of being pure, and it would be likely that none of them is pure, violating decodability.

The following lemma shows that for this reason, the mapping probability $\rho(i)$ cannot decrease slower than $1/i^{1-\epsilon}$ for any positive $\epsilon$, i.e., it must be almost inversely proportional to $i$. We defer the proof to § C.

Lemma 4.1.

For any $\epsilon>0$, any mapping probability $\rho(i)$ such that $\rho(i)=\Omega\left(1/i^{1-\epsilon}\right)$, and any $\sigma>0$, if there exists at least one pure coded symbol within the first $m$ coded symbols for a random set $S$ with probability $\sigma$, then $m=\omega(|S|)$.

Second, to recover all source symbols in a set $S$, we need at least $|S|$ non-empty coded symbols. This is because during peeling, each pure symbol (which must be non-empty) yields at most one source symbol. Intuitively, $\rho(i)$ cannot decrease too fast with index $i$. Otherwise, the probability that a coded symbol is empty would quickly grow towards 1 as $i$ increases. The first $O(|S|)$ coded symbols would not reliably contain at least $|S|$ non-empty symbols, violating decodability.

The following lemma shows that for this reason, the mapping probability $\rho(i)$ cannot decrease faster than $1/i$. We defer the proof to § C.

Lemma 4.2.

For any mapping probability $\rho(i)$ such that $\rho(i)=o\left(1/i\right)$, and any $\sigma>0$, if there exist at least $|S|$ non-empty coded symbols within the first $m$ coded symbols for a random set $S$ with probability $\sigma$, then $m=\omega(|S|)$.

The constraints above reject functions that decrease faster than $1/i$, as well as functions that decrease slower than $i^{\epsilon}/i$ for any $\epsilon>0$. For simplicity, we ignore the degree of freedom stemming from the $i^{\epsilon}$ factor, since for a sufficiently small $\epsilon$ and any practical $i$, it is very close to 1. The remaining candidates for $\rho(i)$ are the functions in between, i.e., those of order $1/i$. We choose the simplest function in this class:

(1) $\rho(i)=\frac{1}{1+\alpha i},$

where $\alpha>0$ is a parameter. We shift the denominator by 1 because $i$ starts at 0. In § 5, we prove that this $\rho(i)$ achieves decodability with high efficiency: recovering a set $S$ only requires the first $1.35|S|$–$1.72|S|$ coded symbols on average.

We highlight two interesting properties of our $\rho(i)$. First, $\rho(0)=1$. This means that for any set, every source symbol is mapped to the first coded symbol. This coded symbol is only decoded after all source symbols are recovered. So, Bob can tell whether reconciliation has finished by checking if $a_{0}\oplus b_{0}$ is decoded. Second, among the first $m$ indices, a source symbol is mapped to $\sum_{i=0}^{m-1}\rho(i)$ of them on average, or $O(\log m)$. It means that the density of the mapping, which decides the computation cost of encoding and decoding, decreases quickly as $m$ increases. As we will show in § 7.2, the low density allows Rateless IBLT to achieve 2–2000× higher throughput than PinSketch.
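The $O(\log m)$ density can be verified with a standard harmonic-sum estimate, approximating the sum by an integral:

$\sum_{i=0}^{m-1}\rho(i)=\sum_{i=0}^{m-1}\frac{1}{1+\alpha i}\approx\frac{1}{\alpha}\ln(1+\alpha m)=O(\log m).$

For example, with $\alpha=0.5$ (our final choice, see § 4.2) and $m=10^{4}$, a source symbol is mapped to roughly $2\ln(5001)\approx 17$ of the first $10^{4}$ coded symbols.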

4.2. Realizing the Mapping Probability

We now design an efficient deterministic algorithm for mapping a source symbol $s$ to coded symbols that achieves the mapping probability identified in the previous section.

Recall that for a random source symbol $s$, we want the probability that $s$ is mapped to the $i$-th coded symbol to be $\rho(i)$ in Eq. 1. A simple strawman solution, for example, is to use a hash function that, given $s$, outputs a hash value uniformly distributed in $[0,1)$. We then compare the hash value to $\rho(i)$, and map $s$ to the $i$-th coded symbol if the hash value is smaller. Given a random $s$, because its hash value is uniformly distributed, the mapping happens with probability $\rho(i)$.

However, this approach has major issues. First, it requires comparing the hash value and $\rho(i)$ for every pair of source symbol $s$ and index $i$. As mentioned in § 4.1.2, the density of the mapping is $O(\log m)$ for the first $m$ coded symbols. In contrast, generating the $m$ coded symbols using this algorithm would require $m$ comparisons for each source symbol, significantly inflating the computation cost. Another issue is that we cannot use the same hash function when mapping $s$ to different indices $i$ and $j$. Otherwise, the mappings to them would not be independent: if $\rho(i)<\rho(j)$ and $s$ is mapped to $i$, then it will always be mapped to $j$. Using different, independent hash functions when mapping the same source symbol to different indices means that we also need to hash the symbol $m$ times.

We design an algorithm that maps each source symbol to the first $m$ coded symbols using only $O(\log m)$ computation. The strawman solution is inefficient because we roll a die (compare hash and $\rho(i)$) for every index $i$, even though we end up not mapping $s$ to the majority of them ($m-O(\log m)$ out of $m$), so reaching the next mapped index takes many die rolls ($m/O(\log m)$ on average). Our key idea is to directly sample the distance (number of indices) to skip before reaching the next mapped index. We achieve this with constant cost per sample, so we can jump from one mapped index straight to the next in constant time.

We describe our algorithm recursively. Suppose that, according to our algorithm, a source symbol $s$ has been mapped to the $i$-th coded symbol. We now wish to compute, in constant time, the next index $j$ that $s$ is mapped to. Let $G$ be the random variable such that $j-i=G$ for a random $s$, and let $P_{g}$ ($g\geq 1$) be the probability that $G=g$. In other words, $P_{g}$ is the probability that a random $s$ is not mapped to any of $i+1,i+2,\dots,i+g-1$, but is mapped to $i+g$, which are all independent events. So,

$P_{g}=(1-\rho(i+1))(1-\rho(i+2))\cdots(1-\rho(i+g-1))\,\rho(i+g).$

Generating $j$ is then equivalent to sampling $g\leftarrow G$, whose distribution is described by $P_{g}$, and then computing $j=i+g$.

However, since there are $g$ (which can go to infinity) terms in $P_{g}$, it is still unclear how to sample $G$ in constant time. The key observation is that the cumulative mass function of $G$, denoted as $C(x)$, has a remarkably simple form. In particular,

(2) $C(x)=\sum_{g=1}^{x}P_{g}=1-\frac{\Gamma(i+1+\frac{1}{\alpha})\,\Gamma(x+i+1)}{\Gamma(i+1)\,\Gamma(x+i+1+\frac{1}{\alpha})}.$

We defer the step-by-step derivation to § B.

Let $C^{-1}(r)$ be the inverse of $C(x)$. The simple form of $C(x)$ allows us to compute $C^{-1}(r)$ easily, as we will soon explain. To sample $G$, we sample $r\leftarrow[0,1)$ uniformly, and compute $g=\lceil C^{-1}(r)\rceil$. To make the algorithm deterministic, $r$ may come from a pseudorandom number generator seeded with the source symbol $s$. The algorithm outputs $i+g$ as the next index to which $s$ is mapped, updates $i\leftarrow i+g$, and is ready to produce another index. Because every source symbol is mapped to the first coded symbol (recall that $\rho(0)=1$), we start the recursion with $i=0$.

Finally, we explain how to compute $C^{-1}(r)$. It is simplest if we set the parameter $\alpha$ in $\rho(i)$ to 0.5. Plugging $\alpha=0.5$ into Eq. 2, we get

$C(x)=\frac{x(2i+x+3)}{(i+x+1)(i+x+2)}.$

Its inverse is

$C^{-1}(r)=\sqrt{\frac{(3+2i)^{2}-r}{4(1-r)}}-\frac{3+2i}{2}\approx(1.5+i)\left((1-r)^{-\frac{1}{2}}-1\right).$

For a generic $\alpha$, we can use Stirling's approximation (Robbins, 1955) and get

$C(x)\approx 1-\left(\frac{i+1}{x+i+1}\right)^{\frac{1}{\alpha}}.$

Consequently,

$C^{-1}(r)\approx(i+1)\left((1-r)^{-\alpha}-1\right).$

In our final design, we set $\alpha=0.5$. The main reason is that computing $C^{-1}(r)$ when $\alpha=0.5$ only requires computing square roots, while other settings involve raising $1-r$ to non-integer powers. We observe that the latter is significantly slower on older CPUs. Meanwhile, as we will show in § 5, setting $\alpha=0.5$ results in negligible extra communication compared to the optimal setting.
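The following Go sketch puts the sampling procedure together for $\alpha=0.5$. It is illustrative rather than a faithful excerpt of our library: we assume the pseudorandom generator is math/rand seeded with (a hash of) the source symbol, and we use the approximate inverse above.

import (
	"math"
	"math/rand"
)

// indexGenerator enumerates the coded-symbol indices that one source
// symbol maps to, for alpha = 0.5. Seeding the PRNG with the source
// symbol makes the mapping deterministic, as § 4.1.1 requires.
type indexGenerator struct {
	i   int64      // last emitted index
	rng *rand.Rand // seeded with (a hash of) the source symbol
}

func newIndexGenerator(seed int64) *indexGenerator {
	// rho(0) = 1: every source symbol maps to coded symbol 0, so the
	// recursion starts at index 0.
	return &indexGenerator{i: 0, rng: rand.New(rand.NewSource(seed))}
}

// next returns the next index this source symbol is mapped to.
func (g *indexGenerator) next() int64 {
	r := g.rng.Float64() // uniform in [0, 1)
	// g = ceil(C^{-1}(r)), with C^{-1}(r) ≈ (1.5 + i)((1-r)^{-1/2} - 1).
	skip := int64(math.Ceil((1.5 + float64(g.i)) * (1/math.Sqrt(1-r) - 1)))
	if skip < 1 {
		skip = 1 // the gap G is at least 1
	}
	g.i += skip
	return g.i
}

An encoder would map the symbol to index 0 first, then call next() repeatedly until the returned index exceeds the highest coded symbol it needs.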

4.3. Resistance to Malicious Workload

In some applications, rogue users may inject items into Alice's or Bob's sets. For example, in a distributed social media application where servers exchange posts, users can craft any post they like. This setting may create an “adversarial workload,” where the hash of the symbol representing the user's input is not uniformly distributed. If the user injects into Bob's set a source symbol that hashes to the same value as another source symbol that Alice has, then Bob will never be able to reconcile his set with Alice's. This is because Bob will XOR the malicious symbol into the coded symbol stream he receives from Alice, but it will only cancel out the hash of Alice's colliding symbol from the checksum field, and will corrupt the sum field.

The literature on set reconciliation is aware of this issue, but typically does not specify the properties required of the hash function to mitigate it; most use hash functions with strong properties such as random oracles (Mitzenmacher and Pagh, 2018), which have long outputs (e.g., 256 bits). It is sufficient, however, to use a keyed hash function with uniform and shorter outputs (e.g., 64 bits). This allows Alice and Bob to coordinate a secret key and use it to choose a hash function from the family of keyed hashes. Although with short hashes an attacker can computationally enumerate enough symbols to find a collision for an item that Alice has, the attacker does not know the key, i.e., the hash function that Alice and Bob use, so she cannot target a collision at one of Alice's symbols. This allows Rateless IBLT to minimize the size of a coded symbol and save bandwidth, particularly in applications where symbols are short and checksums account for much of the overhead. In practice, we use the SipHash (Aumasson and Bernstein, 2012) keyed hash function. A trade-off we make is that Alice has to compute the checksums separately for each key she uses, which increases her computation load. We believe this is a worthwhile trade-off, as SipHash is very efficient, and we find in experiments (§ 7.2) that computing the hashes has negligible cost compared to computing sums, which are still universal. Also, we expect different keys to be used only in applications where malicious workloads are a concern.
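To illustrate the shape of this computation, the sketch below uses Go's standard hash/maphash, a keyed 64-bit hash. This is only an analogy: our implementation uses SipHash with a key that Alice and Bob coordinate, whereas maphash seeds cannot be shared across processes.

import "hash/maphash"

// keyedChecksum hashes one source symbol under a key. The seed plays
// the role of the secret key that Alice and Bob agree on.
func keyedChecksum(seed maphash.Seed, item []byte) uint64 {
	var h maphash.Hash
	h.SetSeed(seed)
	h.Write(item)
	return h.Sum64()
}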

5. Analysis

In this section, we use density evolution (Richardson and Urbanke, 2001; Luby et al., 1998) to analyze the behavior of the peeling decoder when decoding coded symbols in Rateless IBLTs. We mathematically prove that as the difference size $d$ goes to infinity, the overhead of Rateless IBLTs converges to 1.35; i.e., reconciling $d$ differences requires only the first $1.35d$ coded symbols. We then use Monte Carlo simulations to show the behavior for finite $d$. In particular, we show that the overhead converges quickly, when $d$ is in the low hundreds.

Density evolution is a standard technique for analyzing the iterative decoding processes of error-correcting codes based on sparse graphs (Richardson and Urbanke, 2001; Luby et al., 1998), and has been applied to IBLTs with simpler mappings between source and coded symbols (Lázaro and Matuz, 2021). Its high-level idea is to iteratively compute the probability that a random source symbol has not been recovered while simulating the peeling decoder statistically. If this probability keeps decreasing towards 0 as the peeling decoder runs for more iterations, then decoding will succeed with probability converging to 1 (Luby et al., 1998, § 2). The following theorem states our main conclusion. We defer its proof to § C.

Theorem 5.1.

For a random set of $n$ source symbols, the probability that the peeling decoder successfully recovers the set using the first $\eta n$ coded symbols (as defined in § 4.1) tends to 1 as $n$ goes to infinity, provided that $\eta$ is any positive constant that satisfies

(3) $\forall q\in(0,1]:\;e^{\frac{1}{\alpha}\mathrm{Ei}\left(-\frac{q}{\alpha\eta}\right)}<q.$

Recall that $\mathrm{Ei}(\cdot)$ is the exponential integral function, $\mathrm{Ei}(x)=-\int_{-x}^{\infty}\frac{e^{-t}}{t}\,dt$, and $\alpha$ is the parameter in the mapping probability $\rho(i)=\frac{1}{1+\alpha i}$ as discussed in § 4.1. We stated Theorem 5.1 with respect to a generic set of source symbols and its corresponding coded symbol sequence; in practice, the set is $A\bigtriangleup B$. The decoder (Bob) knows the coded symbol sequence for $A\bigtriangleup B$ because he subtracts the coded symbols for $B$ (generated locally) from those for $A$ (received from Alice), as defined in § 3.

Theorem 5.1 implies that for any choice of parameter $\alpha$, there exists a corresponding threshold $\eta^{*}$, the smallest $\eta$ that satisfies Eq. 3. Any $\eta>\eta^{*}$ also satisfies Eq. 3 because the left-hand side monotonically decreases with respect to $\eta$. (Intuitively, this must be true, as a larger $\eta$ means more coded symbols, which should be strictly beneficial for decoding.) As long as Bob receives more than $\eta^{*}$ coded symbols per source symbol, he can decode with high probability. In other words, $\eta^{*}$ is the communication overhead of Rateless IBLTs, i.e., the average number of coded symbols required to recover each source symbol. $\eta^{*}$ is a function of $\alpha$. As discussed in § 4.2, we set $\alpha=0.5$ in our final design to simplify the process of generating mappings according to $\rho(i)$. We solve for $\eta^{*}$ when $\alpha=0.5$ and get the following result.

Corollary 5.2.

The average overhead of Rateless IBLTs converges to 1.35 as the difference size $d=|A\bigtriangleup B|$ goes to infinity.
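Concretely, substituting $\alpha=0.5$ (so $1/\alpha=2$ and $q/(\alpha\eta)=2q/\eta$) into Eq. 3 gives the condition

$\forall q\in(0,1]:\;e^{2\,\mathrm{Ei}\left(-\frac{2q}{\eta}\right)}<q,$

and the smallest $\eta$ satisfying it, found numerically, is $\eta^{*}\approx 1.35$.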

5.1. Monte Carlo Simulations

Theorem 5.1 and Corollary 5.2 from the density evolution analysis state the behavior of Rateless IBLTs when the difference size $d$ goes to infinity. To understand the behavior when $d$ is finite, we run Monte Carlo simulations and compare the results with the theorems.

Fig. 4 shows the main results. It compares the overhead predicted by Theorem 5.1 and that observed in simulations. First, notice that as the difference size increases, simulation results converge to the analysis for all $\alpha$. How fast the results converge depends on $\alpha$. For all $\alpha\leq 0.55$, convergence happens quickly, and the overhead observed in simulations stays within 10% of the analysis even for the smallest difference size we test. On the other hand, for $\alpha=0.95$, simulation results are still 12% higher than the analysis at the largest difference size we test. Second, the figure shows that setting $\alpha=0.5$ is close to optimal for the communication overhead. Setting $\alpha=0.5$ results in $\eta^{*}=1.35$, while the optimal setting is $\alpha=0.64$, which results in $\eta^{*}=1.31$, a difference of only 3%.

Figure 4. Relationship between the communication overhead $\eta^{*}$ and the parameter $\alpha$ in $\rho(i)$. “DE” shows results from the density evolution analysis, which assumes the difference size goes to infinity. Points show results from Monte Carlo simulations for various finite difference sizes. Each point is the average over 100 runs.

Next, we focus on $\alpha=0.5$, the parameter we choose for our final design. Fig. 5 shows the overhead as we vary the difference size $d$. It peaks at 1.72 when $d=4$ and then converges to 1.35, as predicted by Corollary 5.2. Convergence happens quickly: for all $d>128$, the overhead is less than 1.40.

Figure 5. Overhead of Rateless IBLTs at varying difference sizes $d$. We run 100 simulations for each data point and report the average. The shaded area shows the standard deviation. The dashed line shows 1.35, the overhead predicted by density evolution.

The density evolution analysis also predicts how decoding progresses as the decoder receives more coded symbols. The fixed points of $q$ in Eq. 3 represent the expected fraction of source symbols that the peeling decoder fails to recover before stalling, as $d$ goes to infinity. Fig. 6 compares this result with simulations (we plot $1-q$, the fraction that the decoder can recover), and they match closely. There is a sharp increase in the fraction of recovered source symbols towards the end, a behavior also seen in other codes that use the peeling decoder, such as LT codes (Luby, 2002).

Figure 6. The fraction of recovered source symbols after receiving different numbers of coded symbols (normalized by the difference size $d$), as observed in simulations (average of 1000 runs), and as predicted by density evolution. The dashed line shows 1.35, the overhead predicted by density evolution.

6. Implementation

We implement Rateless IBLT as a library in 353 lines of Go code. The implementation is self-contained and does not use third-party code. In this section, we discuss some important optimizations in the implementation.

Efficient incremental encoding. A key feature of Rateless IBLT is that it allows Alice to generate and send coded symbols one by one until Bob can decode. Suppose that Alice has generated coded symbols up to index $i-1$, and now wishes to generate the $i$-th coded symbol. She needs to quickly find the source symbols that are mapped to it. A strawman solution is to store alongside each source symbol the next index it is mapped to, and scan all the source symbols to find the ones mapped to $i$. However, this takes $O(|A|)$ time. In our implementation, we store pointers to source symbols in a heap. It implements a priority queue, where the priority is the index of the next coded symbol that a source symbol is mapped to. A smaller value indicates higher priority. This ensures that source symbols used for generating the next coded symbol are always at the head of the queue, so the encoder can access them efficiently without scanning all the source symbols.
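A minimal sketch of this priority queue using Go's container/heap, reusing the CodedSymbol and indexGenerator types from the earlier sketches (names and structure are illustrative, not excerpts of our library):

import "container/heap"

// queueItem tracks one source symbol and the next coded-symbol index
// it is mapped to.
type queueItem struct {
	sym  Symbol
	next int64           // next index sym is mapped to; 0 initially
	gen  *indexGenerator // yields sym's subsequent mapped indices
}

// symbolQueue is a min-heap on next, so the symbols needed for the
// next coded symbol sit at the front. Initialize with heap.Init.
type symbolQueue []*queueItem

func (q symbolQueue) Len() int           { return len(q) }
func (q symbolQueue) Less(i, j int) bool { return q[i].next < q[j].next }
func (q symbolQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *symbolQueue) Push(x any)        { *q = append(*q, x.(*queueItem)) }
func (q *symbolQueue) Pop() any {
	old := *q
	it := old[len(old)-1]
	*q = old[:len(old)-1]
	return it
}

// emit produces the i-th coded symbol: it drains every source symbol
// whose next mapped index is i, then re-queues each with its
// following index, so the cost is proportional to the number of
// symbols actually mapped to index i.
func emit(q *symbolQueue, i int64) CodedSymbol {
	var c CodedSymbol
	for q.Len() > 0 && (*q)[0].next == i {
		it := heap.Pop(q).(*queueItem)
		c.apply(it.sym, 1)
		it.next = it.gen.next()
		heap.Push(q, it)
	}
	return c
}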

Variable-length encoding for count. Recall that the count field stores the number of source symbols that are mapped to a coded symbol during encoding. The standard approach is to allocate a fixed number of bytes for it (Eppstein et al., 2011; Lázaro and Matuz, 2023), which inflates the size of each coded symbol by a constant amount. However, in Rateless IBLT, the value stored in count decreases with the index of the coded symbol according to a known pattern: the $i$-th coded symbol for a set $S$ is expected to have a count of $|S|\rho(i)$. This pattern allows us to aggressively compress the count field. Instead of storing the value itself, we store the difference between the actual value and the aforementioned expected value, which is a much smaller number. The node receiving the coded symbol can reconstruct the actual value of count, because it knows $|S|$ (transmitted with the 0-th coded symbol) and $i$ (assuming a transport that preserves ordering). Instead of allocating a fixed number of bytes, we use a variable-length quantity (Wang et al., 2017), which uses $\lceil\log_{128}x\rceil$ bytes to store any number $x$. Using our approach, the count field takes only 1.05 bytes per coded symbol on average when encoding a set of $10^{6}$ items into $10^{4}$ coded symbols, keeping the resulting communication cost to a minimum.
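A sketch of this transformation using Go's signed varints, which are a variable-length-quantity variant (the rounding of the expected value is our assumption):

import (
	"encoding/binary"
	"math"
)

// expectedCount is the anticipated count of the i-th coded symbol for
// a set of n items: n * rho(i) with alpha = 0.5.
func expectedCount(n, i int64) int64 {
	return int64(math.Round(float64(n) / (1 + 0.5*float64(i))))
}

// encodeCount writes count as a varint-encoded offset from its
// expected value and returns the number of bytes written.
func encodeCount(buf []byte, count, n, i int64) int {
	return binary.PutVarint(buf, count-expectedCount(n, i))
}

// decodeCount reverses the transformation on the receiving side.
func decodeCount(buf []byte, n, i int64) (int64, int) {
	diff, size := binary.Varint(buf)
	return expectedCount(n, i) + diff, size
}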

7. Evaluation

We compare Rateless IBLT with state-of-the-art set reconciliation schemes, and demonstrate its low communication (§ 7.1) and computation (§ 7.2) costs across a wide range of workloads (set sizes, difference sizes, and item lengths). We then apply Rateless IBLT to synchronize the account states of Ethereum and demonstrate significant improvements over the production system on real workloads (§ 7.3).

Schemes compared. We compare with regular IBLT (Goodrich and Mitzenmacher, 2011; Eppstein et al., 2011), MET-IBLT (Lázaro and Matuz, 2023), PinSketch (Dodis et al., 2008), and Merkle tries (Yue et al., 2020). For Rateless IBLT, we use our implementation discussed in § 6. For regular IBLT and MET-IBLT, we implement each scheme in Python. We use the recommended parameters (Eppstein et al., 2011, § 6.1; Lázaro and Matuz, 2023, §§ V-A, V-C), and allocate 8 bytes each for the checksum and count fields. For PinSketch, we use Minisketch (Naumenko et al., 2019, § 6), a state-of-the-art implementation (Wuille, 2018) written in C++ and deployed in Bitcoin. For Merkle tries, we use the implementation in Geth (go-ethereum Authors, 2024a), the most popular client for Ethereum.

7.1. Communication Cost

We first measure the communication overhead, defined as the amount of data transmitted during reconciliation divided by the size of the set difference, both measured in bytes. We test with set differences of 1–400 items. Beyond 400, the overhead of all schemes stays stable. The set size is 1 million items (recall that it only affects Merkle trie's communication cost). Each item is 32 bytes, the size of a SHA256 hash, commonly used as keys in open-permission distributed systems (Nakamoto, 2008; Trautwein et al., 2022). For Rateless IBLT and MET-IBLT, we generate coded symbols until decoding succeeds, repeat each experiment 100 times, and then report the average overhead and the standard deviation. Regular IBLTs cannot be dynamically expanded, and tuning the number of coded symbols $m$ requires precise knowledge of the size of the set difference. Usually, this is achieved by sending an estimator before reconciliation (Eppstein et al., 2011), which incurs an extra communication cost of at least 15 KB according to the recommended setup (Lázaro and Matuz, 2023, § V-C). We report the overhead of regular IBLT with and without this extra cost. Also, unlike the other schemes, regular IBLTs may fail to decode probabilistically. We gradually increase the number of coded symbols $m$ until the decoding failure rate drops below 1/3000.

Fig. 7 shows the overhead of all schemes except for Merkle trie, whose overhead is significantly higher than the rest, at over 40 across all difference sizes we test. Rateless IBLT consistently achieves lower overhead compared to regular IBLT and MET-IBLT, especially when the set difference is small. For example, the overhead is 2–4× lower when the set difference is less than 50. The improvement is more significant when considering the cost of the estimator for regular IBLTs. On the other hand, PinSketch consistently achieves an overhead of 1, which is 37–60% lower than Rateless IBLT. However, as we will soon show, Rateless IBLT incurs 2–2000× less computation than PinSketch on both the encoder and the decoder. We believe that the extra communication cost is worthwhile in most applications for the significant reduction in computation cost.

Figure 7. Communication overhead of various schemes. Each item is 32 bytes. Shaded areas show the standard deviation for Rateless IBLT and MET-IBLT. Regular IBLT + Estimator shows the overhead of regular IBLT with an estimator for the size of the set difference. We do not plot Merkle Trie, as its overhead is significantly higher (over 40) than the rest.

Scalability of Rateless IBLT. We quickly remark on how Rateless IBLT's communication cost scales to longer or shorter items. Like other schemes based on sparse graphs, the checksum and count fields add a constant cost to each coded symbol. For Rateless IBLT, these two fields together occupy about 9 bytes. Longer items will better amortize this fixed cost. When reconciling shorter items, this fixed cost might become more prominent. However, it is possible to reduce the length of the checksum field if the differences are smaller, because there will be fewer opportunities for hash collisions. We found that hashes of 4 bytes are enough to reliably reconcile differences of tens of thousands. It is also possible to remove the count field altogether; Bob can still recover the symmetric difference, as the peeling decoder (§ 3) does not use this field.

7.2. Computation Cost

(a) $N=1\,000\,000$
(b) $N=10\,000$
Figure 8. Encoding throughput and time for sets of sizes $N=1\,000\,000$ and $N=10\,000$. Solid lines show the throughput (left Y-axis), and dashed lines show the encoding time (right Y-axis).
Figure 9. Decoding throughput and time. Solid lines show the throughput (left Y-axis), and dashed lines show the decoding time (right Y-axis).

There are two potential computation bottlenecks in set reconciliation: encoding the sets into coded symbols, and decoding the coded symbols to recover the symmetric difference. Encoding happens at Alice, and both encoding and decoding happen at Bob. In this experiment, we measure the encoding and decoding throughput for sets of various sizes and differences. We focus on comparing with PinSketch. We fix the item size to 8 bytes, because this is the maximum size that the PinSketch implementation supports. We do not compare with regular IBLT or MET-IBLT, as we cannot find high-quality open-source implementations, and they have similar complexity to Rateless IBLT. (The complexity is linear in the average number of coded symbols each source symbol is mapped to: $O(\log m)$ for Rateless IBLT and MET-IBLT (Lázaro and Matuz, 2023), and constant for regular IBLT, where $m$ is the number of coded symbols. However, the cost is amortized over the size of the set difference, which is $O(m)$. So, in all three IBLT-based schemes, the cost to encode for each set difference decreases quickly as $m$ increases.) We will compare with Merkle trie in § 7.3. We run the benchmarks on a server with two Intel Xeon E5-2697 v4 CPUs. Both Rateless IBLT and PinSketch are single-threaded, and we pin the executions to one CPU core using cpuset(1).

Encoding. Fig. 8 shows in solid lines the encoding throughput, defined as the difference size divided by the time it takes for the encoder to generate enough coded symbols for successful reconciliation. It indicates the number of items that can be reconciled per second with a compute-bound encoder. Rateless IBLT achieves 2–2000× higher encoding throughput than PinSketch when reconciling differences of 2–$10^{5}$ items. The significant gain is because the mapping between source and coded symbols is sparse in Rateless IBLT, and the sparsity increases rapidly with $m$, so the average cost to generate a coded symbol decreases quickly. In comparison, generating a coded symbol in PinSketch always requires evaluating the entire characteristic polynomial, causing the throughput to converge to a constant.

As the difference size increases, the encoding throughput of Rateless IBLT increases almost linearly, enabling the encoder to scale to large differences. In Fig. 8, we plot in dashed lines the time it takes to finish encoding. As the difference size increases by 50,000×, the encoding time of Rateless IBLT grows by less than 6×. Meanwhile, the encoding time of PinSketch grows by 5,000×.

Decoding. Fig. 9 shows the decoding throughput (solid lines) and time (dashed lines), defined similarly as in the encoding experiment. We do not make a distinction of the set size, because it does not affect the decoding complexity. (Recall that decoders operate on coded symbols of the symmetric difference only.) Rateless IBLT achieves 10–$10^{7}$× higher decoding throughput than PinSketch. This is because decoding PinSketch is equivalent to interpolating polynomials (Dodis et al., 2008), which has $O(m^{2})$ complexity (Wuille, 2018), while decoding Rateless IBLT has only $O(m\log m)$ complexity thanks to the sparse mapping between source and coded symbols. As the difference size grows by 50,000×, the decoding throughput of Rateless IBLT drops by only 34%, allowing it to scale to large differences. For example, it takes Rateless IBLT 0.01 seconds to decode $10^{5}$ differences. In contrast, it takes PinSketch more than a minute to decode $10^{4}$ differences.

Figure 10. Encoding time of 1000 differences and varying set size N.
Figure 11. Slowdown when encoding items of different sizes.

Scalability of Rateless IBLT. We now show that Rateless IBLT preserves its computation efficiency when scaling to larger sets, larger differences, and longer items.

The set size N affects encoding, but not decoding, because the decoder operates on coded symbols that represent the symmetric difference. The computation cost of encoding grows linearly with N, as each source symbol is mapped to the same number of coded symbols on average and thus adds the same amount of work. For example, in Fig. 9, the encoding time for 10^3 differences is 2.9 milliseconds when N = 10^4, and 294 milliseconds when N = 10^6, a difference of 100× that matches the change in N. Fig. 10 shows the encoding time measured in experiments with the same configuration for a wider range of N.

The difference size d affects both encoding and decoding. Recall that Rateless IBLT uses about 1.35d coded symbols to reconcile d differences (§ 5). As d increases, the encoder needs to generate more coded symbols. However, unlike PinSketch, where the cost is linear in d, the cost of Rateless IBLT grows logarithmically. For example, in Fig. 8(a), the encoding time grows by only 6× as the set difference increases from 1 to 10^5. This is because the mapping from source to coded symbols is sparse: each source symbol is only mapped to an average of O(log d) coded symbols. The same result applies to decoding. For example, in Fig. 9, the decoding throughput drops by only 2× as d grows by 10^4×.
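To make the logarithmic growth concrete, the following minimal Go sketch (our own illustration, not the library code) sums the mapping probabilities ρ(i) = 1/(1+αi) of § 4 over the first m coded symbols, which is the expected number of coded symbols a single source symbol is mapped to; the sum closely tracks (1/α)ln(1+αm).

```go
package main

import (
	"fmt"
	"math"
)

// expectedMappings sums rho(i) = 1/(1+alpha*i) over the first m coded
// symbols: the expected number of coded symbols one source symbol maps to.
func expectedMappings(m int, alpha float64) float64 {
	sum := 0.0
	for i := 0; i < m; i++ {
		sum += 1 / (1 + alpha*float64(i))
	}
	return sum
}

func main() {
	const alpha = 0.5 // the parameter used by Rateless IBLT (§ 4.2)
	for _, m := range []int{10, 100, 1000, 10000, 100000} {
		fmt.Printf("m=%6d  mappings per source symbol: %6.2f  (1/a)*ln(1+a*m): %6.2f\n",
			m, expectedMappings(m, alpha), math.Log(1+alpha*float64(m))/alpha)
	}
}
```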

The item size ℓ affects both encoding and decoding because it determines the time it takes to compute the XOR of two symbols, which dominates the computation cost in Rateless IBLT. Fig. 11 shows the relative slowdown as ℓ grows from 8 bytes to 32 KB. Initially, the slowdown is sublinear (e.g., less than 4× when ℓ grows by 16× from 8 to 128 bytes) because the other costs that are independent of ℓ (e.g., generating the mappings) are better amortized. However, after 2 KB, the slowdown becomes linear. This implies that the data rate at which the encoder can process source symbols, measured in bytes per second, stays constant. For example, when encoding for d = 1000, the encoder can process source symbols at 124.8 MB/s. The same analysis applies to decoding. In comparison, the encoding complexity of PinSketch increases linearly with ℓ, and the decoding complexity increases quadratically (Dodis et al., 2008; Wuille, 2018).
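The flat byte rate for large items can be reproduced with a microbenchmark of the XOR operation alone, which the analysis above identifies as the dominant cost; the sizes and iteration counts in the sketch below are arbitrary choices of ours. Roughly constant MB/s across the large sizes corresponds to per-symbol time that is linear in ℓ.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	var sink byte
	for _, l := range []int{8, 128, 2048, 32768} {
		a := make([]byte, l)
		b := make([]byte, l)
		iters := (1 << 28) / l // XOR ~256 MB worth of data at every size
		start := time.Now()
		for n := 0; n < iters; n++ {
			for i := range a { // the per-byte XOR of two symbols
				a[i] ^= b[i]
			}
		}
		elapsed := time.Since(start).Seconds()
		sink ^= a[0] // keep the result alive
		fmt.Printf("item size %6d B: %8.1f MB/s\n", l, float64(iters*l)/elapsed/1e6)
	}
	_ = sink
}
```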

7.3. Application

We now apply Rateless IBLT to a prominent application, the Ethereum blockchain. Whenever a blockchain replica comes online, it must synchronize with others to get the latest ledger state before it can validate new transactions or serve user queries. The ledger state is a key-value table, where the keys are 20-byte wallet addresses and the values are 72-byte account states, such as the account's balance. There are 230 million accounts as of January 4, 2024. Synchronizing the ledger state is equivalent to reconciling the set of all key-value pairs, a natural application of Rateless IBLT.

Ethereum (as well as most other blockchains) currently uses Merkle tries (§ 2) to synchronize ledger states between replicas. It applies a few optimizations: using a 16-ary trie instead of a binary one, and shortening sub-tries that have no branches. The protocol is called state heal and has been deployed in Geth (go-ethereum Authors, 2024a), the implementation that accounts for 84% of all Ethereum replicas (Sonic, 2024). Variants of Geth also power other major blockchains, such as Binance Smart Chain and Optimism.

State heal retains the issues with Merkle tries despite the optimizations. To discover a differing key-value pair (leaf), replicas must visit and compare every internal node on the branch from the root to the differing leaf. This amplifies the communication, computation, and storage I/O costs by as much as the depth of the trie, i.e., O(log N) for a set of N key-value pairs. In addition, replicas must descend the branch in lockstep, so the process takes O(log N) round trips. As a result, some Ethereum replicas have reported spending weeks on state heal, e.g., (go-ethereum Authors, 2024b). In comparison, Rateless IBLT does not have these issues. Its communication and computation costs depend only on the size of the difference rather than the entire ledger state, and it requires no interactivity between replicas besides streaming coded symbols at line rate.

Setup. We compare state heal with Rateless IBLT in synchronizing Ethereum ledger states. We implement a prototype in 1,903 lines of Go code. The prototype loads a snapshot of the ledger state from disk and synchronizes with a peer over the network using either scheme. For state heal, we use the implementation (go-ethereum Authors, 2024c) in Geth v1.13.10 without modification. For Rateless IBLT, we use our implementation discussed in § 6. We wrap it with a simple network protocol where a replica requests synchronization by opening a TCP connection to the peer, and the peer streams coded symbols until the requesting replica closes the connection to signal successful decoding.
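A minimal Go sketch of this wire protocol follows, assuming a fixed wire size per coded symbol and placeholder encoder and decoder callbacks; it illustrates the structure of the exchange rather than reproducing the prototype's actual code.

```go
package main

import (
	"io"
	"net"
)

const symbolSize = 64 // hypothetical wire size of one coded symbol

// serve streams coded symbols until the requester closes the connection,
// which signals that it has decoded successfully.
func serve(conn net.Conn, nextSymbol func() []byte) {
	defer conn.Close()
	for {
		if _, err := conn.Write(nextSymbol()); err != nil {
			return // requester closed the connection (or the link failed)
		}
	}
}

// request opens a connection, feeds incoming symbols to the decoder, and
// closes the connection once the set difference has been recovered.
func request(addr string, tryDecode func(symbol []byte) (done bool)) error {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	buf := make([]byte, symbolSize)
	for {
		if _, err := io.ReadFull(conn, buf); err != nil {
			return err
		}
		if tryDecode(buf) {
			return nil // closing the connection stops the stream
		}
	}
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go func() {
		conn, _ := ln.Accept()
		serve(conn, func() []byte { return make([]byte, symbolSize) })
	}()
	received := 0
	request(ln.Addr().String(), func(symbol []byte) bool {
		received++
		return received >= 3 // stand-in for "decoding succeeded"
	})
}
```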

To obtain workloads for the experiments, we extract snapshots of the Ethereum ledger state as of blocks 18908312–18938312, corresponding to a 100-hour time span between December 31, 2023 and January 4, 2024. Each snapshot represents the ledger state when a block was just produced in the live Ethereum blockchain. (Ethereum produces a block every 12 seconds; each block is a batch of transactions that update the ledger state.) For each experiment, we set up two replicas: Alice always loads the latest snapshot (block 18938312); Bob loads snapshots of different staleness and synchronizes with Alice. This simulates the scenario where Bob goes offline at some point in time (depending on the snapshot he loads), wakes up when block 18938312 was just produced, and synchronizes with Alice to get the latest ledger state. We run both replicas on a server with two Intel Xeon E5-2698 v4 CPUs running FreeBSD 14.0. We use Dummynet (Rizzo, 1997) to inject a 50 ms one-way propagation delay between the replicas and enforce bandwidth caps of 10 to 100 Mbps.

Results. We first vary the state snapshot that Bob loads and measure the completion time and the communication cost for Bob to synchronize with Alice. We fix the bandwidth to 20 Mbps. Fig. 12 shows the results. As Bob's state becomes more stale, more updates happen between his state and the latest one, and the difference between the two grows linearly. As a result, the completion time and the communication cost of both schemes increase linearly. Meanwhile, Rateless IBLT consistently achieves 4.8–13.6× lower completion time and 4.4–8.6× lower communication cost compared to state heal. As discussed previously, state heal has a much higher communication cost because it must transmit the differing internal nodes of the Merkle trie in addition to the leaves. For example, this amplifies the number of trie nodes transmitted by 3.6× when Bob's state is 30 hours stale. The higher communication cost leads to proportionally longer completion time, as the system is throughput-bound.

(a) Staleness between 20 minutes and 100 hours.
(b) Staleness between 1 minute and 20 minutes.
Figure 12. Completion time and communication cost when synchronizing Ethereum ledger states at different staleness over a network link with 50 ms of propagation delay and 20 Mbps of bandwidth. A staleness of x hours means the state is x hours old when synchronization starts.

In our experiments, state heal requires at least 11 rounds of interactivity, as Alice and Bob descend from the roots of their tries to the differing leaves in lockstep. Rateless IBLT, in comparison, requires only half a round trip, because Alice streams coded symbols without waiting for any feedback. This advantage is most pronounced when reconciling a small difference, where the system is latency-bound. For example, Rateless IBLT is 8.2× faster than state heal when Bob's ledger state is only 1 block (12 seconds) stale.

We highlight the impact of interactivity. Fig. 13 shows traces of bandwidth usage when synchronizing one block's worth of state difference. For Rateless IBLT, the first coded symbol arrives at Bob 1 round-trip time (RTT) after his TCP socket opens (0.5 RTT for the TCP ACK to reach Alice, and another 0.5 RTT for the first symbol to arrive). Subsequent symbols arrive at line rate, as the peak at 1 RTT indicates. In comparison, for state heal, Alice and Bob reach the bottom of their tries only after 11 RTTs; before that, they do not know the actual key-value pairs that differ, and the network link stays almost idle.

Figure 13. Time series of bandwidth usage when synchronizing Ethereum ledger states that are 1 block (12 seconds) stale. The network link has 50 ms of propagation delay and 20 Mbps of bandwidth. Time starts when Bob sees the TCP socket open.

Finally, we demonstrate that Rateless IBLT consistently outperforms state heal across different network conditions. We fix Bob's snapshot to be 10 hours stale and vary the bandwidth cap. Fig. 14 shows the results. Rateless IBLT is 4.8× faster than state heal at 10 Mbps, and the gain increases to 16× at 100 Mbps. Notice that the completion time of state heal stays constant beyond 20 Mbps; it cannot utilize any extra bandwidth. We observe that state heal becomes compute-bound: Bob cannot process the trie nodes he receives fast enough to saturate the network. The completion time does not change even if we remove the bandwidth cap. In contrast, Rateless IBLT is throughput-bound, as its completion time keeps decreasing with increasing bandwidth. If we remove the bandwidth cap, Rateless IBLT takes 2.5 seconds to finish and can saturate a 170 Mbps link using one CPU core on each side.

Figure 14. Completion time when synchronizing Ethereum ledger states that are 10 hours stale over a network link with 50 ms of propagation delay and different bandwidth caps.

Before concluding this section, we briefly discuss a few other potential solutions and how Rateless IBLT compares with them. When Bob's state is consistent with some particular block, he may request that Alice compute and send the state delta from his block to the latest block, which would be as efficient as an optimal set reconciliation scheme. However, this is often not the case when Bob needs synchronization, such as when he recovers from database corruption or has downloaded inconsistent shards of state snapshots from multiple sources, a routine occurrence when Geth replicas bootstrap (Taverna and Paterson, 2023). Rateless IBLT (and state heal) does not assume consistent states. Coded symbols in traditional set reconciliation schemes like regular IBLT are tailored to a fixed difference size (§ 3). Alice has to generate a new batch of coded symbols for each peer with a different state. This would add minutes to the latency for large sets like Ethereum's, incur significant computation costs, and create denial-of-service vulnerabilities. (These issues also apply to the aforementioned state delta solution, to a lesser degree, because Alice has to compute the state deltas on the fly.) In contrast, Rateless IBLT allows Alice to prepare a single stream of coded symbols that is efficient for all peers. Because of linearity (§ 4), Alice can incrementally update the coded symbols as her ledger state changes. For an average Ethereum block, it takes 11 ms to update 50 million coded symbols (7 GB) using one CPU core to reflect the state changes.
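To illustrate the incremental update that linearity enables, here is a minimal Go sketch (ours; the cell layout, hash, and mapping callback are simplified stand-ins for § 4): reflecting one account change only requires XOR-ing the old key-value pair out of, and the new pair into, the cells each pair maps to.

```go
package main

import "hash/fnv"

const pairSize = 92 // a 20-byte address plus a 72-byte account state

type pair [pairSize]byte

// cell is one coded symbol; a simplified stand-in for the structure in § 4.
type cell struct {
	sum   pair   // XOR of the key-value pairs mapped to this cell
	check uint64 // XOR of their hashes
	count int64
}

func hashPair(p pair) uint64 {
	h := fnv.New64a()
	h.Write(p[:])
	return h.Sum64()
}

// apply XORs a pair into (dir = +1) or out of (dir = -1) a cell.
func (c *cell) apply(p pair, dir int64) {
	for i := range c.sum {
		c.sum[i] ^= p[i]
	}
	c.check ^= hashPair(p)
	c.count += dir
}

// update reflects one account change in a long-lived stream of coded
// symbols: by linearity, delete the old pair and insert the new one.
// indices stands in for the mapping rule of § 4.2.
func update(cells []cell, indices func(pair) []int, oldPair, newPair pair) {
	for _, i := range indices(oldPair) {
		cells[i].apply(oldPair, -1)
	}
	for _, i := range indices(newPair) {
		cells[i].apply(newPair, +1)
	}
}

func main() {
	cells := make([]cell, 16)
	toy := func(p pair) []int { // toy mapping: two cells chosen by hash
		h := hashPair(p)
		return []int{int(h % 16), int((h >> 32) % 16)}
	}
	var oldPair, newPair pair
	oldPair[0], newPair[0] = 1, 2
	update(cells, toy, oldPair, newPair)
}
```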

8. Irregular Rateless IBLTs

When designing Rateless IBLTs (§ 4.1), the key task was to define the mapping rule that decides whether a source symbol x should be mapped to the i-th coded symbol. Lemmas 4.1 and 4.2 showed that the mapping rule cannot be uniform over coded symbols: the probability that a random source symbol is mapped to the i-th coded symbol must decrease with i. In other words, different coded symbols are statistically inequivalent. On average, a coded symbol with a smaller index sees more source symbols mapped to it than one with a larger index does.

However, the same is not true for source symbols in our design: every subset of source symbols uses the same mapping probability ρ(i) = 1/(1+αi) with the same parameter α. This leaves a degree of freedom that we have not explored. We may divide source symbols into multiple subsets and use a different ρ(i) (in particular, a different α) for each subset. Similar techniques have successfully improved the communication costs of regular IBLTs (Lázaro and Matuz, 2021). In this section, we apply this technique to Rateless IBLTs and discuss the implications.

Concretely, we partition source symbols into c mutually exclusive subsets, where c is a parameter. A random source symbol belongs to subset j (0 ≤ j < c) with probability w_j, which is another set of parameters. For a source symbol x, we choose the subset it belongs to based on its hash: x belongs to subset j if \sum_{b=0}^{j-1} w_b \le \mathtt{Hash}(x) < \sum_{b=0}^{j} w_b, assuming the hash is uniformly distributed in [0,1). For each subset j, we define a parameter α_j and use mapping probability ρ_j(i) = 1/(1+α_j i) when mapping source symbols in this subset. In other words, we replace α with a subset-specific α_j in the algorithm described in § 4.2. By convention, we call this generalized design Irregular Rateless IBLTs. Rateless IBLTs as discussed prior to this section are a special case where c = 1, w_0 = 1, and α_0 = 0.5.

As mentioned, the main benefit of Irregular Rateless IBLTs over Rateless IBLTs is a lower communication cost. Unfortunately, the density evolution analysis does not produce a closed-form result like the one in Theorem 5.1. To find a good configuration of c, w_j, and α_j that minimizes the overhead, we use brute force and try different values in simulations. To limit the complexity of the search, we set the number of subsets c to 3 and found the following optimal configuration:

c = 3,
w_0, w_1, w_2 = 0.18, 0.56, 0.26,
α_0, α_1, α_2 = 0.11, 0.68, 0.82.
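A minimal Go sketch of this construction, using the configuration above (the hash-to-[0,1) conversion is an illustrative stand-in):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// Configuration found by our search (§ 8).
var (
	w     = []float64{0.18, 0.56, 0.26} // subset weights w_0, w_1, w_2
	alpha = []float64{0.11, 0.68, 0.82} // per-subset parameters alpha_j
)

// subset picks the subset of x from its hash, treated as uniform in [0,1).
func subset(x []byte) int {
	f := fnv.New64a()
	f.Write(x)
	u := float64(f.Sum64()) / float64(math.MaxUint64)
	cum := 0.0
	for j, wj := range w {
		cum += wj
		if u < cum {
			return j
		}
	}
	return len(w) - 1 // guard against floating-point rounding
}

// rho is the probability that x is mapped to the i-th coded symbol.
func rho(x []byte, i int) float64 {
	return 1 / (1 + alpha[subset(x)]*float64(i))
}

func main() {
	x := []byte("example item")
	fmt.Printf("subset: %d, rho(x, 100): %.4f\n", subset(x), rho(x, 100))
}
```

With c = 1, w_0 = 1, and α_0 = 0.5, the same code reduces to the regular Rateless IBLT mapping.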

As shown in Fig. 15, the resulting communication overhead converges to 1.10, which is 19% lower than Rateless IBLTs (§ 4) and only 10% above the information-theoretic lower bound. Meanwhile, encoding and decoding are 1.88 times slower than Rateless IBLTs. As mentioned in § 4.2, the main reason is that computing mappings when α ≠ 0.5 requires raising numbers to arbitrary non-integer powers, while the case of α = 0.5 only requires computing square roots, which is faster on modern hardware. We leave further optimizations of the parameters and the implementation to future work.

Figure 15. Communication overhead of Rateless IBLTs and Irregular Rateless IBLTs as the difference size changes. We run 100 simulations for each data point and report the average.

9. Conclusion

We designed, mathematically analyzed, and experimentally evaluated Rateless IBLT. To the best of our knowledge, Rateless IBLT is the first set reconciliation solution with universally low computation cost and near-optimal communication cost across workloads. The distinguishing feature is ratelessness: it encodes any set into an infinitely long codeword, of which any prefix is capable of reconciling a proportional number of differences with another set. Ratelessness simplifies deployment, as there is no parameter to tune; reduces overhead, as nodes can incrementally send longer prefixes without over- or under-committing resources to fixed-size codewords; and naturally supports concurrent synchronization with multiple nodes. We mathematically proved its asymptotic efficiency and showed with extensive simulations that the actual performance converges quickly. We implemented Rateless IBLT as a library and benchmarked its performance. Finally, we applied Rateless IBLT to a popular distributed application and demonstrated significant gains in state synchronization over the production system.

We point out a few interesting future directions: optimizing the parameters and the implementation of Irregular Rateless IBLTs; considering scenarios where Alice's and Bob's sets change in the middle of reconciliation; and designing efficient solutions for reconciliation across more than two parties.

Acknowledgments

We thank Francisco Lázaro for fruitful discussions. Lei Yang was supported by a gift from the Ethereum Foundation. Yossi Gilad was partially supported by the Alon Fellowship.

This work does not raise any ethical issues. Appendices are supporting material that has not been peer-reviewed.

References

  • Androulaki et al. (2018) Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, Srinivasan Muralidharan, Chet Murthy, Binh Nguyen, Manish Sethi, Gari Singh, Keith Smith, Alessandro Sorniotti, Chrysoula Stathakopoulou, Marko Vukolić, Sharon Weed Cocco, and Jason Yellick. 2018. Hyperledger fabric: a distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth EuroSys Conference (Porto, Portugal) (EuroSys ’18). Association for Computing Machinery, New York, NY, USA, Article 30, 15 pages. https://doi.org/10.1145/3190508.3190538
  • Aumasson and Bernstein (2012) Jean-Philippe Aumasson and Daniel J. Bernstein. 2012. SipHash: A Fast Short-Input PRF. In 13th International Conference on Cryptology in India (INDOCRYPT 2012) (Kolkata, India) (Lecture Notes in Computer Science, Vol. 7668). Springer, New York, NY, USA, 489–508. https://doi.org/10.1007/978-3-642-34931-7_28
  • Bloom (1970) Burton H. Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7 (jul 1970), 422–426. https://doi.org/10.1145/362686.362692
  • Bose and Ray-Chaudhuri (1960) R. C. Bose and Dwijendra K. Ray-Chaudhuri. 1960. On A Class of Error Correcting Binary Group Codes. Inf. Control. 3, 1 (1960), 68–79. https://doi.org/10.1016/S0019-9958(60)90287-4
  • Byers et al. (1998) John W. Byers, Michael Luby, Michael Mitzenmacher, and Ashutosh Rege. 1998. A digital fountain approach to reliable distribution of bulk data. In Proceedings of the ACM SIGCOMM ’98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (Vancouver, British Columbia, Canada) (SIGCOMM ’98). Association for Computing Machinery, New York, NY, USA, 56–67. https://doi.org/10.1145/285237.285258
  • Cam (1960) Lucien Le Cam. 1960. An approximation theorem for the Poisson binomial distribution. Pacific J. Math. 10, 4 (1960), 1181–1197.
  • Dodis et al. (2008) Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin, and Adam D. Smith. 2008. Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data. SIAM J. Comput. 38, 1 (2008), 97–139. https://doi.org/10.1137/060651380
  • Eppstein et al. (2011) David Eppstein, Michael T. Goodrich, Frank Uyeda, and George Varghese. 2011. What’s the difference? efficient set reconciliation without prior context. In Proceedings of the ACM SIGCOMM 2011 Conference (Toronto, Ontario, Canada) (SIGCOMM ’11). Association for Computing Machinery, New York, NY, USA, 218–229. https://doi.org/10.1145/2018436.2018462
  • Gallager (1962) Robert G. Gallager. 1962. Low-density parity-check codes. IRE Trans. Inf. Theory 8, 1 (1962), 21–28. https://doi.org/10.1109/TIT.1962.1057683
  • go-ethereum Authors (2024a) The go-ethereum Authors. 2024a. go-ethereum: Official Go implementation of the Ethereum protocol. https://geth.ethereum.org.
  • go-ethereum Authors (2024b) The go-ethereum Authors. 2024b. State heal phase is very slow (not finished after 2 weeks). https://github.com/ethereum/go-ethereum/issues/23191.
  • go-ethereum Authors (2024c) The go-ethereum Authors. 2024c. trie package - go-ethereum. https://pkg.go.dev/github.com/ethereum/go-ethereum/trie.
  • Goodrich and Mitzenmacher (2011) Michael T. Goodrich and Michael Mitzenmacher. 2011. Invertible Bloom Lookup Tables. In 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton 2011) (Monticello, IL, USA). IEEE, New York, NY, USA, 792–799. https://doi.org/10.1109/ALLERTON.2011.6120248
  • Han et al. (2020) Yilin Han, Chenxing Li, Peilun Li, Ming Wu, Dong Zhou, and Fan Long. 2020. Shrec: bandwidth-efficient transaction relay in high-throughput blockchain systems. In Proceedings of the 11th ACM Symposium on Cloud Computing (Virtual Event, USA) (SoCC ’20). Association for Computing Machinery, New York, NY, USA, 238–252. https://doi.org/10.1145/3419111.3421283
  • Lázaro and Matuz (2021) Francisco Lázaro and Balázs Matuz. 2021. Irregular Invertible Bloom Look-Up Tables. In 11th International Symposium on Topics in Coding (ISTC 2021) (Montreal, QC, Canada). IEEE, New York, NY, USA, 1–5. https://doi.org/10.1109/ISTC49272.2021.9594198
  • Lázaro and Matuz (2023) Francisco Lázaro and Balázs Matuz. 2023. A Rate-Compatible Solution to the Set Reconciliation Problem. IEEE Trans. Commun. 71, 10 (2023), 5769–5782. https://doi.org/10.1109/TCOMM.2023.3296630
  • Luby (2002) Michael Luby. 2002. LT Codes. In 43rd Symposium on Foundations of Computer Science (FOCS 2002) (Vancouver, BC, Canada). IEEE Computer Society, Los Alamitos, CA, USA, 271. https://doi.org/10.1109/SFCS.2002.1181950
  • Luby et al. (1998) Michael G. Luby, Michael Mitzenmacher, and M. Amin Shokrollahi. 1998. Analysis of random processes via And-Or tree evaluation. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, California, USA) (SODA ’98). Society for Industrial and Applied Mathematics, USA, 364–373.
  • Minsky et al. (2003) Yaron Minsky, Ari Trachtenberg, and Richard Zippel. 2003. Set reconciliation with nearly optimal communication complexity. IEEE Trans. Inf. Theory 49, 9 (2003), 2213–2218. https://doi.org/10.1109/TIT.2003.815784
  • Mitzenmacher and Pagh (2018) Michael Mitzenmacher and Rasmus Pagh. 2018. Simple multi-party set reconciliation. Distributed Comput. 31, 6 (2018), 441–453. https://doi.org/10.1007/S00446-017-0316-0
  • Mitzenmacher and Varghese (2012) Michael Mitzenmacher and George Varghese. 2012. Biff (Bloom filter) codes: Fast error correction for large data sets. In Proceedings of the 2012 IEEE International Symposium on Information Theory (ISIT 2012) (Cambridge, MA, USA). IEEE, New York, NY, USA, 483–487. https://doi.org/10.1109/ISIT.2012.6284236
  • Nakamoto (2008) Satoshi Nakamoto. 2008. Bitcoin: A peer-to-peer electronic cash system.
  • Naumenko et al. (2019) Gleb Naumenko, Gregory Maxwell, Pieter Wuille, Alexandra Fedorova, and Ivan Beschastnikh. 2019. Erlay: Efficient Transaction Relay for Bitcoin. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (London, United Kingdom) (CCS ’19). Association for Computing Machinery, New York, NY, USA, 817–831. https://doi.org/10.1145/3319535.3354237
  • Ozisik et al. (2019) A. Pinar Ozisik, Gavin Andresen, Brian N. Levine, Darren Tapp, George Bissias, and Sunny Katkuri. 2019. Graphene: efficient interactive set reconciliation applied to blockchain propagation. In Proceedings of the ACM Special Interest Group on Data Communication (Beijing, China) (SIGCOMM ’19). Association for Computing Machinery, New York, NY, USA, 303–317. https://doi.org/10.1145/3341302.3342082
  • Perry et al. (2022) Neil Perry, Bruce Spang, Saba Eskandarian, and Dan Boneh. 2022. Strong Anonymity for Mesh Messaging. arXiv:2207.04145 [cs.CR]
  • Raman et al. (2019) Aravindh Raman, Sagar Joglekar, Emiliano De Cristofaro, Nishanth Sastry, and Gareth Tyson. 2019. Challenges in the Decentralised Web: The Mastodon Case. In Proceedings of the Internet Measurement Conference (Amsterdam, Netherlands) (IMC ’19). Association for Computing Machinery, New York, NY, USA, 217–229. https://doi.org/10.1145/3355369.3355572
  • Reed and Solomon (1960) Irving S. Reed and Gustave Solomon. 1960. Polynomial Codes Over Certain Finite Fields. J. Soc. Indust. Appl. Math. 8, 2 (1960), 300–304. https://doi.org/10.1137/0108018
  • Richardson and Urbanke (2001) Thomas J. Richardson and Rüdiger L. Urbanke. 2001. The capacity of low-density parity-check codes under message-passing decoding. IEEE Trans. Inf. Theory 47, 2 (2001), 599–618. https://doi.org/10.1109/18.910577
  • Rizzo (1997) Luigi Rizzo. 1997. Dummynet: a simple approach to the evaluation of network protocols. SIGCOMM Comput. Commun. Rev. 27, 1 (jan 1997), 31–41. https://doi.org/10.1145/251007.251012
  • Robbins (1955) Herbert Robbins. 1955. A remark on Stirling's formula. The American Mathematical Monthly 62, 1 (1955), 26–29.
  • Sonic (2024) Sonic. 2024. Ethereum Execution Client Diversity. https://execution-diversity.info (retrieved January 22, 2024).
  • Steele (1994) J Michael Steele. 1994. Le Cam’s inequality and Poisson approximations. The American Mathematical Monthly 101, 1 (1994), 48–54.
  • Summermatter and Grothoff (2022) E. Summermatter and C. Grothoff. 2022. Byzantine Fault Tolerant Set Reconciliation. https://lsd.gnunet.org/lsd0003/.
  • Taverna and Paterson (2023) Massimiliano Taverna and Kenneth G. Paterson. 2023. Snapping Snap Sync: Practical Attacks on Go Ethereum Synchronising Nodes. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 3331–3348. https://www.usenix.org/conference/usenixsecurity23/presentation/taverna
  • Trautwein et al. (2022) Dennis Trautwein, Aravindh Raman, Gareth Tyson, Ignacio Castro, Will Scott, Moritz Schubotz, Bela Gipp, and Yiannis Psaras. 2022. Design and evaluation of IPFS: a storage layer for the decentralized web. In Proceedings of the ACM SIGCOMM 2022 Conference (Amsterdam, Netherlands) (SIGCOMM ’22). Association for Computing Machinery, New York, NY, USA, 739–752. https://doi.org/10.1145/3544216.3544232
  • Wang et al. (2017) Jianguo Wang, Chunbin Lin, Yannis Papakonstantinou, and Steven Swanson. 2017. An Experimental Study of Bitmap Compression vs. Inverted List Compression. In Proceedings of the 2017 ACM International Conference on Management of Data (Chicago, Illinois, USA) (SIGMOD ’17). Association for Computing Machinery, New York, NY, USA, 993–1008. https://doi.org/10.1145/3035918.3064007
  • Wood (2014) Gavin Wood. 2014. Ethereum: A secure decentralised generalised transaction ledger.
  • Wuille (2018) Pieter Wuille. 2018. Minisketch: a library for BCH-based set reconciliation. https://github.com/sipa/minisketch.
  • Yue et al. (2020) Cong Yue, Zhongle Xie, Meihui Zhang, Gang Chen, Beng Chin Ooi, Sheng Wang, and Xiaokui Xiao. 2020. Analysis of Indexing Structures for Immutable Data. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 925–935. https://doi.org/10.1145/3318464.3389773

Appendix A Inflexibility of Regular IBLTs

We state and prove Theorems A.1 and A.2, which show that the efficiency of regular IBLTs degrades exponentially fast when they are used to reconcile more or fewer differences than parameterized for. We state the theorems with respect to generic sets of source symbols. When using IBLTs for set reconciliation (§ 3), the sets are A △ B.

Theorem A.1.

For a random set of n source symbols and a corresponding regular IBLT with m coded symbols, the probability that the peeling decoder can recover at least one source symbol decreases exponentially in n/m.

Proof.

For the peeling decoder to recover at least one source symbol, there must be at least one pure coded symbol at the beginning. Otherwise, peeling cannot start, and no source symbol can be recovered. We now calculate a lower bound on the probability p_nopure that no pure coded symbol exists. Note that there is another parameter for regular IBLTs, k, which determines the number of coded symbols each source symbol is mapped to (§ 3). However, it can be shown that the probability that no pure coded symbol exists increases with k, so we set k = 1 to get a lower bound.

We consider the equivalent problem: if we throw n balls (source symbols) uniformly at random into m bins (coded symbols), what is a lower bound on the probability that no bin ends up having exactly one ball? We compute the number of ways f such that at least one bin has exactly one ball, which is the opposite of the event we are interested in. We set aside one of the m bins, which will get exactly one ball, and assign one of the n balls to this bin. We then throw the remaining n−1 balls into the remaining m−1 bins freely. This counting contains duplicates, so we get an upper bound

f \leq mn(m-1)^{n-1}.

The total number of ways to throw n balls into m bins is

g = m^{n}.

Each way of throwing is equally likely, so the probability p_nopure that no bin ends up with exactly one ball has a lower bound

p_{\text{nopure}} = 1 - f/g
\geq 1 - \frac{mn(m-1)^{n-1}}{m^{n}}
= 1 - \frac{n(m-1)^{n-1}}{m^{n-1}}
= 1 - n\left(1-\frac{1}{m}\right)^{n-1}.

We are interested in the event where peeling can start, which is the opposite event. Its probability has an upper bound for n/m > 1:

p_{\text{haspure}} = 1 - p_{\text{nopure}}
\leq n\left(1-\frac{1}{m}\right)^{n-1}
\leq ne^{-\frac{n-1}{m}}
= o\left(1.5^{-\frac{n}{m}}\right).
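A quick Monte Carlo check of this behavior (our illustration, not part of the proof): throw n balls into m bins, estimate the probability that some bin ends up with exactly one ball, and compare it with the n·e^{−(n−1)/m} bound derived above.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// hasPure reports whether some bin receives exactly one of n balls thrown
// uniformly at random into m bins.
func hasPure(n, m int, rng *rand.Rand) bool {
	bins := make([]int, m)
	for i := 0; i < n; i++ {
		bins[rng.Intn(m)]++
	}
	for _, c := range bins {
		if c == 1 {
			return true
		}
	}
	return false
}

func main() {
	rng := rand.New(rand.NewSource(1))
	const m, trials = 50, 100000
	for _, ratio := range []float64{2, 4, 6, 8} {
		n := int(ratio) * m
		hits := 0
		for t := 0; t < trials; t++ {
			if hasPure(n, m, rng) {
				hits++
			}
		}
		bound := float64(n) * math.Exp(-float64(n-1)/float64(m))
		fmt.Printf("n/m=%.0f  estimated p_haspure=%.4f  bound=%.4f\n",
			ratio, float64(hits)/trials, bound)
	}
}
```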

The following theorem says that when dropping a fraction of a regular IBLT to reconcile a smaller number of differences n with a constant overhead η (the ratio between the number of used coded symbols and the number of source symbols n), the success probability decreases quickly as a larger fraction gets dropped.

Theorem A.2.

Consider a random set of n source symbols and a corresponding regular IBLT with m coded symbols, where each source symbol is mapped to k coded symbols. The peeling decoder tries to recover all source symbols using the first ηn coded symbols. k and η are constants, and ηn ≤ m. The probability that it succeeds decreases exponentially in 1 − ηn/m.

Proof.

For the peeling decoder to succeed, each of the source symbols must be mapped at least once to the first ηn coded symbols. Because each source symbol is uniformly mapped to k of the m coded symbols, the probability that one is only mapped to the remaining m − ηn coded symbols that the decoder does not use ("missed") is

p_{\text{missone}} = \binom{m-\eta n}{k}\Big/\binom{m}{k}
= \frac{(m-\eta n)(m-\eta n-1)\dots(m-\eta n-k+1)}{m(m-1)\dots(m-k+1)}
\approx \left(1-\frac{\eta n}{m}\right)^{k}.

The last step approximates each factor \frac{m-\eta n-j}{m-j} (0 ≤ j ≤ k−1) with \frac{m-\eta n}{m}. This does not change the result qualitatively because k is a constant.

The probability that no source symbol is missed is

p_{\text{nomiss}} = (1-p_{\text{missone}})^{n}
\leq e^{-np_{\text{missone}}}
= e^{-n\left(1-\frac{\eta n}{m}\right)^{k}}.

Appendix B Calculation of P_g and C(x)

In this section, we calculate P_g and C(x) as defined in § 4.2.

P_{g} = (1-\rho(i+1))(1-\rho(i+2))\dots(1-\rho(i+g-1))\,\rho(i+g)
= \frac{1/\alpha}{i+g+1/\alpha}\prod_{n=1}^{g-1}\frac{i+n}{i+n+1/\alpha}
= \frac{(i+1)_{g-1}}{\alpha(i+1+1/\alpha)_{g}}.

Here, (x)_n is the Pochhammer symbol: (x)_n = x(x+1)\dots(x+n-1).

Before proceeding to calculate C(x), we first prove a useful identity about quotients of Gamma functions,

\frac{\Gamma(x)}{\Gamma(x+y)} \equiv \frac{1}{y-1}\left(\frac{\Gamma(x)}{\Gamma(x+y-1)}-\frac{\Gamma(x+1)}{\Gamma(x+y)}\right).

We start from the right-hand side,

\frac{1}{y-1}\left(\frac{\Gamma(x)}{\Gamma(x+y-1)}-\frac{\Gamma(x+1)}{\Gamma(x+y)}\right)
= \frac{1}{y-1}\left(\frac{(x+y-1)\Gamma(x)}{(x+y-1)\Gamma(x+y-1)}-\frac{\Gamma(x+1)}{\Gamma(x+y)}\right)
= \frac{1}{y-1}\left(\frac{(y-1)\Gamma(x)+\Gamma(x+1)}{\Gamma(x+y)}-\frac{\Gamma(x+1)}{\Gamma(x+y)}\right)
= \frac{\Gamma(x)}{\Gamma(x+y)}.

The identity immediately implies the following, by telescoping the sum:

(4)\qquad \sum_{x=a}^{b}\frac{\Gamma(x)}{\Gamma(x+y)} \equiv \frac{1}{y-1}\left(\frac{\Gamma(a)}{\Gamma(a+y-1)}-\frac{\Gamma(b+1)}{\Gamma(b+y)}\right).

We now calculate C(x).

C(x) = \sum_{g=1}^{x}P_{g}
= \sum_{g=1}^{x}\frac{(i+1)_{g-1}}{\alpha(i+1+1/\alpha)_{g}}
= \sum_{g=1}^{x}\frac{\Gamma(i+g)\,\Gamma(i+1+1/\alpha)}{\alpha\,\Gamma(i+1)\,\Gamma(i+1+g+1/\alpha)}
= \frac{\Gamma(i+1+1/\alpha)}{\alpha\,\Gamma(i+1)}\sum_{g=1}^{x}\frac{\Gamma(i+g)}{\Gamma(i+g+1+1/\alpha)}
= \frac{\Gamma(i+1+\frac{1}{\alpha})}{\Gamma(i+1)}\left(\frac{\Gamma(i+1)}{\Gamma(i+1+\frac{1}{\alpha})}-\frac{\Gamma(i+x+1)}{\Gamma(i+x+1+\frac{1}{\alpha})}\right)
= 1-\frac{\Gamma(i+1+\frac{1}{\alpha})\,\Gamma(x+i+1)}{\Gamma(i+1)\,\Gamma(x+i+1+\frac{1}{\alpha})}.

The second-to-last equality follows from applying Eq. 4 with y = 1 + 1/α.
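The closed form is easy to check numerically. The following Go sketch (ours) compares the direct sum of P_g, computed from the product form, against the Gamma-function expression, using log-Gamma for numerical stability.

```go
package main

import (
	"fmt"
	"math"
)

func rho(alpha float64, i int) float64 { return 1 / (1 + alpha*float64(i)) }

// direct computes C(x) = sum_{g=1}^{x} P_g using the product form of P_g.
func direct(alpha float64, i, x int) float64 {
	sum := 0.0
	for g := 1; g <= x; g++ {
		p := rho(alpha, i+g)
		for n := 1; n <= g-1; n++ {
			p *= 1 - rho(alpha, i+n)
		}
		sum += p
	}
	return sum
}

// closed evaluates 1 - Gamma(i+1+1/a)Gamma(x+i+1) / (Gamma(i+1)Gamma(x+i+1+1/a))
// in log space to avoid overflow for large arguments.
func closed(alpha float64, i, x int) float64 {
	a := 1 / alpha
	lg := func(v float64) float64 { r, _ := math.Lgamma(v); return r }
	return 1 - math.Exp(lg(float64(i)+1+a)+lg(float64(x+i)+1)-
		lg(float64(i)+1)-lg(float64(x+i)+1+a))
}

func main() {
	for _, x := range []int{1, 10, 100, 1000} {
		fmt.Printf("i=3 x=%4d  direct=%.10f  closed=%.10f\n",
			x, direct(0.5, 3, x), closed(0.5, 3, x))
	}
}
```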

Appendix C Deferred Proofs

Lemma C.1 (Restatement of Lemma 4.1).

For any ϵ > 0, any mapping probability ρ(i) such that ρ(i) = Ω(1/i^{1−ϵ}), and any σ > 0, if there exists at least one pure coded symbol within the first m coded symbols for a random set S with probability σ, then m = ω(|S|).

Proof.

We need to show that ∀η > 0 ∃|S|_0 > 0 ∀|S| > |S|_0 : m > η|S|. Because ρ(i) = Ω(1/i^{1−ϵ}), there exist δ > 0 and i_0 > 0 such that ρ(i) ≥ δ/i^{1−ϵ} for all i > i_0. Let ρ_0 be the smallest non-zero value among ρ(i) for all 0 ≤ i ≤ i_0. Let

|S|_{0} = \max\left(\left(\frac{\eta^{1-\epsilon}}{\delta}\right)^{\frac{1}{\epsilon}},\; \frac{1}{\eta}\left(\frac{\delta}{\rho_{0}}\right)^{\frac{1}{1-\epsilon}},\; |S|^{*}\right),

where |S|^* is such that for all |S| > |S|^*,

e\cdot\delta\eta^{\epsilon}|S|^{1+\epsilon}\exp\left(-\frac{\delta|S|^{\epsilon}}{\eta^{1-\epsilon}}\right) < \sigma.

Note that \exp\left(\frac{\delta|S|^{\epsilon}}{\eta^{1-\epsilon}}\right) = \omega\left(|S|^{1+\epsilon}\right), so such an |S|^* always exists.

For any i ≥ 0, the i-th coded symbol is pure if and only if exactly one source symbol is mapped to it, which happens with probability

P_{i} = |S|\rho(i)(1-\rho(i))^{|S|-1}
\leq e\cdot|S|\rho(i)e^{-|S|\rho(i)}.

The inequality comes from the fact that (1−x)^y ≤ e^{−xy} for any 0 ≤ x ≤ 1 and y ≥ 1.

By the definition of |S|_0, for any |S| > |S|_0 and any 0 ≤ i ≤ η|S|, either ρ(i) = 0, or ρ(i) ≥ \frac{\delta}{(\eta|S|)^{1-\epsilon}} and |S|ρ(i) > 1. In either case,

P_{i} \leq e\cdot\frac{\delta|S|^{\epsilon}}{\eta^{1-\epsilon}}\exp\left(-\frac{\delta|S|^{\epsilon}}{\eta^{1-\epsilon}}\right).

Recall that we want at least one pure symbol among the first m coded symbols. Assume for contradiction that m ≤ η|S|. Then, failure happens with probability

P_{\text{fail}} = \prod_{i=0}^{m-1}(1-P_{i})
\geq \prod_{i=0}^{\eta|S|-1}(1-P_{i})
\geq \left(1-e\cdot\frac{\delta|S|^{\epsilon}}{\eta^{1-\epsilon}}\exp\left(-\frac{\delta|S|^{\epsilon}}{\eta^{1-\epsilon}}\right)\right)^{\eta|S|}
\geq 1-e\cdot\delta\eta^{\epsilon}|S|^{1+\epsilon}\exp\left(-\frac{\delta|S|^{\epsilon}}{\eta^{1-\epsilon}}\right)
> 1-\sigma.

This contradicts the assumption that a pure coded symbol exists with probability σ, so m > η|S|.

We remark that a stronger result, which only requires ρ(i) = ω(log i / i), can be shown with a very similar proof; we omit it for simplicity and lack of practical implications. We may also consider a generalization of this lemma, by requiring that there be at least k coded symbols with at most k source symbols mapped to each, for every k ≤ |S|. (Lemma 4.1 is the special case of k = 1.) This may lead to an even tighter bound on ρ(i), which we conjecture to be ρ(i) = ω(1/i).

Lemma C.2 (Restatement of Lemma 4.2).

For any mapping probability ρ(i) such that ρ(i) = o(1/i), and any σ > 0, if there exist at least |S| non-empty coded symbols within the first m coded symbols for a random set S with probability σ, then m = ω(|S|).

Proof.

We need to show that ∀η > 0 ∃|S|_0 > 0 ∀|S| > |S|_0 : m > η|S|. First, note that for there to be |S| non-empty symbols within the first m coded symbols, m cannot be smaller than |S|, so the statement is trivially true for 0 < η < 1. We now prove the case of η ≥ 1.

For any η ≥ 1, let δ = \frac{1}{4\eta}. Because ρ(i) = o(1/i), there must exist i_0 > 0 such that ρ(i) < δ/i for all i > i_0. Let

|S|_{0} = \max\left(2i_{0},\; 4\eta^{2}(1-2\eta)\log(\sigma)\right).

For all i ≥ |S|/2, the i-th coded symbol is non-empty with probability

P_{i} = 1-(1-\rho(i))^{|S|}
< 1-\left(1-\frac{2\delta}{|S|}\right)^{|S|}
\leq 2\delta.

The first inequality is because ρ(i) < δ/i ≤ 2δ/|S| for i ≥ |S|/2, and the second inequality is because (1−x)^y ≥ 1−xy for any x < 1 and y ≥ 1.

In order to get |S| non-empty symbols among the first m coded symbols, there must be at least |S|/2 non-empty symbols from index i = |S|/2 to index i = m−1. To derive an upper bound on this probability, we assume that each is non-empty with probability 2δ, which, as we just saw, is strictly an overestimate. By Hoeffding's inequality, the probability that there are at least |S|/2 non-empty symbols has an upper bound

P_{\text{succ}} < \exp\left(\left(|S|-2m\right)\left(2\delta-\frac{|S|}{2m-|S|}\right)^{2}\right)

when m ≤ \left(\frac{1}{4\delta}+\frac{1}{2}\right)|S|, which is true for all m ≤ η|S|.

Assume m ≤ η|S| for contradiction. By the definition of δ, the previous upper bound becomes

P_{\text{succ}} < \exp\left(\frac{|S|}{4\eta^{2}(1-2\eta)}\right).

The right-hand side monotonically decreases with |S| (note that 1−2η < 0). So, by the definition of |S|_0, for all |S| > |S|_0,

P_{\text{succ}} < \exp\left(\frac{|S|_{0}}{4\eta^{2}(1-2\eta)}\right) \leq \sigma.

Theorem C.3 (Restatement of Theorem 5.1).

For a random set of n source symbols, the probability that the peeling decoder successfully recovers the set using the first ηn coded symbols (as defined in § 4) goes to 1 as n goes to infinity. Here, η is any positive constant such that

\forall q\in(0,1]:\; e^{\frac{1}{\alpha}\mathrm{Ei}\left(-\frac{q}{\alpha\eta}\right)} < q.
Figure 16. Example of the bipartite graph representation of a set of source symbols, x_0, x_1, x_2, x_3, and its first 6 coded symbols, a_0, a_1, …, a_5.

Before proving the theorem, we introduce the graph representation of a set of source symbols and the corresponding coded symbols. Imagine a bipartite graph where each source or coded symbol is a vertex, and there is an edge between a source and a coded symbol if and only if the former is mapped to the latter during encoding. Fig. 16 is an example. We define the degree of a symbol as the number of neighbors it has in this bipartite graph, i.e., its degree as in graph theory. For example, in Fig. 16, source symbol x_0 has degree 4, and coded symbol a_1 has degree 2.

We also define the degree of an edge in the graph (Lázaro and Matuz, 2021). The source degree of an edge is the degree of the source symbol it connects to, and its coded degree is the degree of the coded symbol it connects to. For example, for the edge connecting x_3 and a_3 in Fig. 16, its source degree is 5 because x_3 has degree 5, and its coded degree is 2 because a_3 has degree 2.

We remark that density evolution is a standard technique (Richardson and Urbanke, 2001; Luby et al., 1998) for analyzing codes that are based on random graphs, such as LT (Luby, 2002) and LDPC (Gallager, 1962) codes. Our proof mostly follows these analyses, in particular (Luby et al., 1998, § 2) and (Lázaro and Matuz, 2021, § III). However, the mapping probability ρ(i) in Rateless IBLTs is a function whose argument i goes to infinity as the set size goes to infinity. This is a key challenge that we solve in our analysis, which enables us to obtain the closed-form expression in Theorem 5.1.

Proof.

Consider n random source symbols and their first m coded symbols. Let Λ be the random variable denoting the degree of a random source symbol, and let Λ_u (0 ≤ u ≤ m) be the probability that Λ takes value u. Similarly, let Ψ be the random variable denoting the degree of a random coded symbol, and let Ψ_v (0 ≤ v ≤ n) be the probability that Ψ takes value v. Define the probability generating functions of Λ and Ψ,

\Lambda(x) = \sum_{u=0}^{m}\Lambda_{u}x^{u},
\Psi(x) = \sum_{v=0}^{n}\Psi_{v}x^{v}.

We also consider the degree of a random edge. Let λ be the random variable denoting the source degree of a random edge, and let λ_u (0 ≤ u ≤ m) be the probability that λ takes value u. It is the fraction of edges with source degree u among all edges, i.e.,

\lambda_{u} = \frac{\Lambda_{u}u}{\sum_{w=0}^{m}\Lambda_{w}w} = \frac{\Lambda_{u}u}{\mathbb{E}(\Lambda)}.

Let λ(x) be the generating function of λ, defined as

\lambda(x) = \sum_{u=0}^{m}\lambda_{u}x^{u-1} = \frac{1}{\mathbb{E}(\Lambda)}\sum_{u=0}^{m}\Lambda_{u}ux^{u-1} = \frac{\Lambda'(x)}{\mathbb{E}(\Lambda)}.

Similarly, let φ be the random variable denoting the coded degree of a random edge, and let φ_v (0 ≤ v ≤ n) be the probability that φ takes value v. It is the fraction of edges with coded degree v among all edges, i.e.,

\varphi_{v} = \frac{\Psi_{v}v}{\sum_{w=0}^{n}\Psi_{w}w} = \frac{\Psi_{v}v}{\mathbb{E}(\Psi)}.

Let φ(x) be the generating function of φ, defined as

\varphi(x) = \sum_{v=0}^{n}\varphi_{v}x^{v-1} = \frac{1}{\mathbb{E}(\Psi)}\sum_{v=0}^{n}\Psi_{v}vx^{v-1} = \frac{\Psi'(x)}{\mathbb{E}(\Psi)}.

Let us now consider Ψ(x). Recall that each of the n random source symbols is mapped to the i-th coded symbol independently with probability ρ(i). The degree of the i-th coded symbol thus follows a binomial distribution, which takes value v with probability \binom{n}{v}\rho^{v}(i)(1-\rho(i))^{n-v}. Because we are interested in a random coded symbol, its index i takes 0, 1, …, m−1 with equal probability 1/m. By the law of total probability,

\Psi_{v} = \frac{1}{m}\sum_{i=0}^{m-1}\binom{n}{v}\rho^{v}(i)(1-\rho(i))^{n-v}.

Plugging this into the definition of Ψ(x), we get

\Psi(x) = \sum_{v=0}^{n}\Psi_{v}x^{v}
= \sum_{v=0}^{n}\frac{1}{m}\sum_{i=0}^{m-1}\binom{n}{v}\rho^{v}(i)(1-\rho(i))^{n-v}x^{v}
= \frac{1}{m}\sum_{i=0}^{m-1}\sum_{v=0}^{n}\binom{n}{v}(x\rho(i))^{v}(1-\rho(i))^{n-v}
= \frac{1}{m}\sum_{i=0}^{m-1}(1-(1-x)\rho(i))^{n}.

Here, the last step follows from the binomial theorem. Plugging this into the definition of φ(x), we get

\varphi(x) = \frac{n}{m\,\mathbb{E}(\Psi)}\sum_{i=0}^{m-1}\rho(i)(\rho(i)(x-1)+1)^{n-1}.

By the handshaking lemma, the sum of the degrees of all source symbols equals the sum of the degrees of all coded symbols, so

m\,\mathbb{E}(\Psi) = m\sum_{v=0}^{n}\Psi_{v}v = n\sum_{u=0}^{m}\Lambda_{u}u = n\,\mathbb{E}(\Lambda).

So, we can further simplify φ(x) as

\varphi(x) = \frac{1}{\mathbb{E}(\Lambda)}\sum_{i=0}^{m-1}\rho(i)(\rho(i)(x-1)+1)^{n-1}.

Next, let us consider Λ(x). A random source symbol is mapped to the i-th coded symbol independently with probability ρ(i). Its degree Λ is thus the sum of independent Bernoulli random variables with success probabilities ρ(0), ρ(1), …, ρ(m−1), which follows a Poisson binomial distribution. By an extension (Steele, 1994, § 5) of Le Cam's theorem (Cam, 1960), we can approximate this distribution with a Poisson distribution of rate \sum_{i=0}^{m-1}\rho(i), i.e., 𝔼(Λ), with the total variation distance between the two distributions tending to zero as m goes to infinity. That is,

\sum_{u=0}^{\infty}\left|\Lambda_{u}-\frac{(\mathbb{E}(\Lambda))^{u}e^{-\mathbb{E}(\Lambda)}}{u!}\right| < \frac{2}{\mathbb{E}(\Lambda)}\sum_{i=0}^{m-1}\rho^{2}(i).

When ρ(i) = \frac{1}{1+\alpha i} for any α > 0, the right-hand side of the inequality goes to zero as m goes to infinity.

Recall that the probability generating function of a Poisson random variable with rate 𝔼(Λ) is

\Lambda(x) = e^{\mathbb{E}(\Lambda)(x-1)}.

Plugging this into the definition of λ(x), we get

\lambda(x) = e^{\mathbb{E}(\Lambda)(x-1)}.

Let q denote the probability that a randomly chosen edge connects to a source symbol that is not yet recovered. As decoding progresses, q is updated according to the following function (Luby et al., 1998; Lázaro and Matuz, 2021):

f(q) = \lambda(1-\varphi(1-q))
= e^{-\mathbb{E}(\Lambda)\varphi(1-q)}
= e^{-\sum_{i=0}^{m-1}\rho(i)(1-q\rho(i))^{n-1}}.

Let us consider f(q) when the number of source symbols n goes to infinity and the ratio of coded to source symbols is fixed, i.e., η = m/n where η is a positive constant. Recall that ρ(i) = \frac{1}{1+\alpha i}. Notice that

e^{-\frac{nq}{\alpha i}} \leq (1-q\rho(i))^{n-1} \leq e^{-\frac{(n-1)q}{1+\alpha i}}

holds for all n ≥ 1, i ≥ 0, α > 0, and 0 ≤ q ≤ 1. We use this inequality and the squeeze theorem to calculate the limit of the exponent of f(q) as n goes to infinity.

We first calculate the lower bound.

\lim_{n\rightarrow\infty}-\ln(f(q)) = \lim_{n\rightarrow\infty}\sum_{i=0}^{\eta n-1}\rho(i)(1-q\rho(i))^{n-1}
\geq \lim_{n\rightarrow\infty}\sum_{i=0}^{\eta n-1}\rho(i)e^{-nq/(\alpha i)}
= \lim_{n\rightarrow\infty}\sum_{i=0}^{\eta n-1}\frac{1}{(1+\alpha i)e^{nq/(\alpha i)}}
= \lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=0}^{\eta n-1}\frac{1}{(\frac{1}{n}+\alpha\cdot\frac{i}{n})e^{\frac{q}{\alpha}\cdot\frac{n}{i}}}
= \int_{0}^{\eta}\frac{1}{\alpha xe^{\frac{q}{\alpha x}}}\,dx
= -\frac{1}{\alpha}\mathrm{Ei}\left(-\frac{q}{\alpha\eta}\right).

Here, Ei(·) is the exponential integral function.

We then calculate the upper bound.

\lim_{n\rightarrow\infty}-\ln(f(q)) = \lim_{n\rightarrow\infty}\sum_{i=0}^{\eta n-1}\rho(i)(1-q\rho(i))^{n-1}
\leq \lim_{n\rightarrow\infty}\sum_{i=0}^{\eta n-1}\rho(i)e^{-\frac{(n-1)q}{1+\alpha i}}
= \lim_{n\rightarrow\infty}\sum_{i=0}^{\eta n-1}\frac{1}{(1+\alpha i)e^{\frac{(n-1)q}{1+\alpha i}}}
= \lim_{n\rightarrow\infty}\frac{1}{(n-1)q}\sum_{i=0}^{\eta n-1}\frac{1}{\frac{1+\alpha i}{(n-1)q}e^{\frac{(n-1)q}{1+\alpha i}}}
= \frac{1}{\alpha}\int_{0}^{\alpha\eta/q}\frac{1}{xe^{1/x}}\,dx
= -\frac{1}{\alpha}\mathrm{Ei}\left(-\frac{q}{\alpha\eta}\right).

By the squeeze theorem,

\lim_{n\rightarrow\infty}-\ln(f(q)) = -\frac{1}{\alpha}\mathrm{Ei}\left(-\frac{q}{\alpha\eta}\right).

Plugging this into f(q), we have

\lim_{n\rightarrow\infty}f(q) = e^{\frac{1}{\alpha}\mathrm{Ei}\left(-\frac{q}{\alpha\eta}\right)}.

By standard results (Luby et al., 1998, § 2) (Lázaro and Matuz, 2021, § III.B) of density evolution analysis, if

f(q) < q

holds for all q ∈ (0,1], then the probability that all source symbols are recovered when the decoding process terminates tends to 1 as n goes to infinity. Plugging in the closed-form result for \lim_{n\rightarrow\infty}f(q), we get the condition

e^{\frac{1}{\alpha}\mathrm{Ei}\left(-\frac{q}{\alpha\eta}\right)} < q,

which should hold for all q ∈ (0,1] for the success probability to converge to 1. ∎

We refer readers to the literature (Luby et al., 1998) for a formal treatment of density evolution, in particular the result (Luby et al., 1998, § 2.2) that ∀q ∈ (0,1] : f(q) < q is a sufficient condition for the success probability to converge to 1, which we use directly in our proof. Here, we give some intuition. Recall that q is the probability that a random edge in the bipartite graph connects to a source symbol that is not yet recovered. Let p be the probability that a random edge connects to a coded symbol that is not yet decoded, i.e., one that has more than one neighbor not yet recovered. Density evolution iteratively updates q and p by simulating the peeling decoder. For a random edge with source degree u, the source symbol it connects to is not yet recovered if none of the source symbol's other u−1 neighbors is decoded. This happens with probability p^{u−1}. Similarly, for a random edge with coded degree v, the coded symbol it connects to is not decoded if not all of the coded symbol's other v−1 neighbors are recovered. This happens with probability 1−(1−q)^{v−1}.

Because q and p are probabilities with regard to a random edge, we take the mean over the distributions of the source and coded degrees of the edge, and the results are the new values of q and p after one iteration of peeling. In particular, in each iteration (Lázaro and Matuz, 2021; Luby et al., 1998),

p \leftarrow \sum_{v}\varphi_{v}\left(1-(1-q)^{v-1}\right),
q \leftarrow \sum_{u}\lambda_{u}p^{u-1}.

By the definition of the generating functions of φ and λ, the above equations can be written as

p \leftarrow 1-\varphi(1-q),
q \leftarrow \lambda(p).

Combining the two equations, we get

q \leftarrow \lambda(1-\varphi(1-q)).

Notice that the right-hand side is f(q). Intuitively, by requiring f(q) < q for all q ∈ (0,1], we make sure that the peeling decoder always makes progress, i.e., the non-recovery probability q gets smaller, regardless of the current q. Conversely, if the inequality has a fixed point q^* such that f(q^*) = q^*, then the decoder stops making progress after recovering a (1−q^*) fraction of the source symbols, implying a failure.
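The threshold overhead can also be recovered numerically from this condition. The following Go sketch (ours) evaluates −ln f(q) through the integral form derived in the proof and binary-searches for the smallest η such that f(q) < q on a grid over (0,1]; for α = 0.5 it should land near the ≈1.35 overhead reported in § 5, up to grid and integration error.

```go
package main

import (
	"fmt"
	"math"
)

// negLnF approximates -ln f(q) = ∫_0^eta exp(-q/(alpha*x))/(alpha*x) dx
// (the integral derived in the proof) with the midpoint rule; the
// integrand vanishes as x -> 0, so the open endpoint is harmless.
func negLnF(q, alpha, eta float64) float64 {
	const steps = 20000
	h := eta / steps
	sum := 0.0
	for i := 0; i < steps; i++ {
		x := (float64(i) + 0.5) * h
		sum += math.Exp(-q/(alpha*x)) / (alpha * x)
	}
	return sum * h
}

// holds reports whether f(q) < q on a grid over (0, 1].
func holds(alpha, eta float64) bool {
	for q := 0.005; q <= 1.0; q += 0.005 {
		if math.Exp(-negLnF(q, alpha, eta)) >= q {
			return false
		}
	}
	return true
}

func main() {
	const alpha = 0.5
	lo, hi := 1.0, 2.0 // bracket the threshold overhead, then bisect
	for hi-lo > 1e-3 {
		mid := (lo + hi) / 2
		if holds(alpha, mid) {
			hi = mid
		} else {
			lo = mid
		}
	}
	fmt.Printf("smallest eta with f(q) < q on (0,1]: ~%.3f\n", hi)
}
```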