
Technical Report: Dealing with Undependable Workers
in Decentralized Network Supercomputing
(This work is supported in part by the NSF award 1017232.)

Seda Davtyan, Kishori M. Konwar, Alexander Russell, Alexander A. Shvartsman
Department of Computer Science & Engineering, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs CT 06269, USA. Emails: {seda,acr,aas}@engr.uconn.edu.
University of British Columbia, Vancouver, BC V6T 1Z3, CANADA. Email: kishori@interchange.ubc.ca.
Abstract

Internet supercomputing is an approach to solving partitionable, computation-intensive problems by harnessing the power of a vast number of interconnected computers. This paper presents a new algorithm for the problem of using network supercomputing to perform a large collection of independent tasks, while dealing with undependable processors. The adversary may cause the processors to return bogus results for tasks with certain probabilities, and may cause a subset $F$ of the initial set of processors $P$ to crash. The adversary is constrained in two ways. First, for the set of non-crashed processors $P-F$, the average probability of a processor returning a bogus result is inferior to $\frac{1}{2}$. Second, the adversary may crash a subset of processors $F$, provided the size of $P-F$ is bounded from below. We consider two models: the first bounds the size of $P-F$ by a fractional polynomial, the second bounds this size by a poly-logarithm. Both models yield adversaries that are much stronger than previously studied. Our randomized synchronous algorithm is formulated for $n$ processors and $t$ tasks, with $n\leq t$, where depending on the number of crashes each live processor is able to terminate dynamically with the knowledge that the problem is solved with high probability. For the adversary constrained by a fractional polynomial, the round complexity of the algorithm is $O(\frac{t}{n^{\varepsilon}}\log n\log\log n)$, its work is $O(t\log n\log\log n)$, and its message complexity is $O(n\log n\log\log n)$. For the poly-log constrained adversary, the round complexity is $O(t)$, work is $O(tn^{\varepsilon})$, and message complexity is $O(n^{1+\varepsilon})$. All bounds are shown to hold with high probability.

1 Introduction

Cooperative network supercomputing is becoming increasingly popular for harnessing the power of the global Internet computing platform. A typical Internet supercomputer, e.g., [1, 2], consists of a master computer and a large number of computers called workers, performing computation on behalf of the master. Despite the simplicity and benefits of a single-master approach, as the scale of such computing environments grows, it becomes unrealistic to assume the existence of an infallible master that is able to coordinate the activities of multitudes of workers. Large-scale distributed systems are inherently dynamic and are subject to perturbations, such as failures of computers and network links; thus it is also necessary to consider fully distributed peer-to-peer solutions.

One could address the single-point-of-failure issue by providing multiple redundant masters, yet this would remain a centralized scheme that is not suitable for big data processing involving a large amount of input and output data. For example, consider applications in molecular biology that require large reference databases of gene models or annotated protein sequences, and large sets of unknown protein sequences [15]. Dealing with such voluminous data requires a large-scale platform providing the necessary computational power and storage.

Therefore, a more scalable approach is to use a decentralized system, where the input is distributed and, once the processing is complete, the output is distributed across multiple nodes. Interestingly, computers returning bogus results is a phenomenon of increasing concern. While this may occur unintentionally, e.g., as a result of over-clocked processors, workers may in fact wrongly claim to have performed assigned work so as to obtain incentives associated with the system, e.g., higher rank. To address this problem, several works, e.g., [5, 10, 11, 18], study approaches based on a reliable master coordinating unreliable workers. The drawback in these approaches is the reliance on a reliable, bandwidth-unlimited master processor.

In our recent work [6, 7] we began to address this drawback of centralized systems by removing the assumption of an infallible and bandwidth-unlimited master processor. We introduced a decentralized approach, where a collection of worker processors cooperates on a large set of independent tasks without reliance on centralized control. Our prior algorithm is able to perform all tasks with high probability (whp), while dealing with undependable processors under an assumption that the average probability of live (non-crashed) processors returning incorrect results remains inferior to $\frac{1}{2}$ during the computation. There the adversary is only allowed to crash a constant fraction of processors, and the correct termination of the $n$-processor algorithm strongly depends on the availability of $\Omega(n)$ live processors.

The goal of this work is to develop a new $n$-processor algorithm that is able to deal with much stronger adversaries, e.g., those that can crash all but a fractional polynomial in $n$, or even a poly-log in $n$, number of processors, while still remaining in the synchronous setting with reliable communication. One of the challenges here is to enable an algorithm to terminate efficiently in the presence of any allowable number of crashes. Of course, to be interesting, such a solution must be efficient in terms of its work and communication complexities.

Contributions. We consider the problem of performing $t$ tasks in a distributed system of $n$ workers without centralized control. The tasks are independent, they admit at-least-once execution semantics, and each task can be performed by any worker in constant time. We assume that tasks can be obtained from some repository (else we can assume that the tasks are initially known). The fully-connected message-passing system is synchronous and reliable. We deal with failure models where crash-prone workers can return incorrect results. We present a randomized decentralized algorithm and analyze it for two different adversaries of increasing strength: constrained by a fractional polynomial, and poly-log constrained. In each of these settings, we assume that at any point of the computation live processors return bogus results with the average probability inferior to $\frac{1}{2}$. In more detail, our contributions are as follows.

  • 1.

    Given the initial set of processors $P$, with $|P|=n$, we formulate two adversarial models, where the adversary can crash a set $F$ of processors, subject to the model constraints:

    • a)

      For the first adversary, constrained by a fractional polynomial, we have $|P-F|=\Omega(n^{\varepsilon})$, for a constant $\varepsilon\in(0,1)$.

    • b)

      For the second, poly-log constrained model, we have $|P-F|=\Omega(\log^{c}n)$, for a constant $c\geq 1$.

    In both models the adversary may assign arbitrary constant probabilities to processors, provided that processors in $P$ return bogus results with the average probability inferior to $\frac{1}{2}$. The adversary is additionally constrained, so that the average probability of returning bogus results for processors in $P-F$ must remain inferior to $\frac{1}{2}$.

  • 2.

    We present a randomized algorithm for $n$ processors and $t$ tasks that works in synchronous rounds, where each processor performs a random task and shares its cumulative knowledge of results with one randomly chosen processor. Each processor starts as a “worker,” and once a processor accumulates a “sufficient” number of results, it becomes “enlightened.” Enlightened processors then “profess” their knowledge by multicasting it to exponentially growing random subsets of processors. When a processor receives a “sufficient” number of such messages, it halts. We note that workers become enlightened without any synchronization, using only local knowledge. The values that control “sufficient” numbers of results and messages are established in our analysis and are used as compile-time constants.

    We consider the protocol by which the “enlightened” processors “profess” their knowledge and reach termination to be of independent interest. The protocol’s message complexity does not depend on crashes, and the processors can terminate without explicit coordination. This addresses one of the challenges associated with termination when $P-F$ can vary broadly in both models.

  • 3.

    We analyze the quality and performance of the algorithm for the two adversarial models. For each model we show that all live workers obtain the results of all tasks whp, and that these results are correct whp. Complexity results for the algorithm also hold whp:

    • a)

      For the polynomially constrained adversary we show that the algorithm has work complexity $O(t\log n\log\log n)$ and message complexity $O(n\log n\log\log n)$.

    • b)

      For the poly-log constrained adversary we show that the algorithm has work complexity $O(tn^{\varepsilon})$ and message complexity $O(n^{1+\varepsilon})$, for any $0<\varepsilon<1$. For this model we note that trivial solutions with all workers doing all tasks may be work-efficient, but they do not guarantee that the results are correct.

Prior work. Earlier approaches explored ways of improving the quality of the results obtained from untrusted workers in the settings where a bandwidth-unlimited and infallible master is coordinating the workers. Fernandez et al. [11, 10] and Konwar et al. [18] consider a distributed master-worker system where the workers may act maliciously by returning wrong results. Works [11, 10, 18] design algorithms that help the master determine correct results whp, while minimizing work. The failure models assume that some fraction of processors can exhibit faulty behavior. Another recent work by Christoforou et al. [5] pursues a game-theoretic approach. Paquette and Pelc [21] consider a model of a fault-prone system in which a decision has to be made on the basis of unreliable information and design a deterministic strategy that leads to a correct decision whp.

As already mentioned, our prior work [6] introduced the decentralized approach that eliminates the master, and provided a synchronous algorithm that is able to perform all tasks whp. That algorithm requires $\Omega(n)$ live processors to terminate correctly. Our new algorithm uses a similar approach to performing tasks; however, it takes a completely different approach to termination that enables it to tolerate a much broader spectrum of crashes. The approach uses the new notion of “enlightened” processors that, having acquired sufficient knowledge, “profess” this knowledge to other processors, ultimately leading to termination. The behavior of the two algorithms is similar while $\Omega(n)$ processors remain in the computation, and we use the results from our prior analysis for this case.

A related problem, called Do-All, deals with the setting where a set of processors must perform a collection of tasks in the presence of adversity [12, 16]. For Do-All the termination condition is that all tasks must be performed and at least one processor is aware of that fact. The problem in this paper is different in that each non-crashed processor must learn the results of all tasks. Additionally, the failure model in our problem allows processors to return incorrect results, and our solution requires that each task is performed a certain minimum number of times so that the correct result can be discerned, whereas Do-All algorithms only guarantee that each task is performed at least once. Thus major changes are required to adapt a solution for Do-All to our setting. Do-All, being a key problem in the study of cooperative distributed computation, was considered in a variety of models, including message-passing [9, 22] and shared-memory models [17, 19]. Chlebus et al. [4] study the Do-All problem in the synchronous setting, considering work (total number of steps taken) and communication (total number of point-to-point messages) as equivalent, i.e., they consider the complexity work + communication as the cost metric. They derive upper bounds for the bounded adversary, upper and lower bounds for the $f$-bounded adversary, and almost matching upper and lower bounds for the linearly-bounded adversary.

Another related problem is the Omni-Do problem [8, 14, 13]. Here the problem is to perform all tasks in a network that is prone to fragmentations (partitions), thus here too the results must be known to all processors. However, the failure models are quite different (network fragmentation and merges), and so is the analysis for these models. For linearly-bounded weakly-adaptive adversaries Chlebus and Kowalski [3] give a very efficient randomized algorithm for $t=n$ with work and communication complexities $O(n\log^{*}n)$.

Probabilistic quantification of trustworthiness of participants is also used in distributed reputation mechanisms. Yu et al. [23] propose a probabilistic model for distributed reputation management, where agents, i.e., participants, keep ratings of trustworthiness on one another, and update the values, using referrals and testimonies, through interactions.

Document structure. In Section 2 we give the models of computation and adversity, and measures of efficiency. Section 3 presents our algorithm. In Section 4 we carry out the analysis of the algorithm and derive complexity bounds. We conclude in Section 5 with a discussion.

2 Model of Computation and Definitions

System model. There are $n$ processors, each with a unique identifier (id) from set $P=[n]$. We refer to the processor with id $i$ as processor $i$. The system is synchronous and processors communicate by exchanging reliable messages. Computation is structured in terms of synchronous rounds, where in each round a processor performs three steps: send, receive, and compute. In these steps, respectively, processors can send and receive messages, and perform local polynomial computation, where the local computation time is assumed to be negligible compared to message latency. Messages received by a processor in a given step include all messages sent to it in the previous step. The duration of each round depends on the algorithm.

Tasks. There are $t$ tasks to be performed, each with a unique id from set $\mathcal{T}=[t]$. We refer to the task with id $j$ as $Task[j]$. The tasks are (a) similar, meaning that any task can be done in constant time by any processor, (b) independent, meaning that each task can be performed independently of other tasks, and (c) idempotent, meaning that each task admits at-least-once execution semantics and can be performed concurrently. For simplicity, we assume that the outcome of each task is a binary value. The problem is most interesting when there are at least as many tasks as there are processors, thus we consider $t\geq n$.

Models of Adversity. Processors are undependable in that a processor may compute the results of tasks incorrectly and it may crash. A processor can crash at any moment during the computation; following a crash, a processor performs no further actions.

Otherwise, each processor adheres to the protocol established by the algorithm it executes. We refer to non-crashed processors as live. We consider an oblivious adversary that decides prior to the computation what processors to crash and when to crash them. The maximum number of processors that can crash is established by the adversarial models (specified below).

For each processor $i\in P$, we define $p_{i}$ to be the probability of processor $i$ returning incorrect results, independently of other processors, such that $\frac{1}{n}\sum_{i}p_{i}<\frac{1}{2}-\zeta$, for some $\zeta>0$. That is, the average probability of processors in $P$ returning incorrect results is inferior to $\frac{1}{2}$. We use the constant $\zeta$ to ensure that the average probability of incorrect computation does not become arbitrarily close to $\frac{1}{2}$ as $n$ grows arbitrarily large. The individual probabilities of incorrect computation are unknown to the processors.

For an execution of an algorithm, let $F$ be the set of processors that the adversary crashes. The adversary is constrained in that the average probability of processors in $P-F$ computing results incorrectly remains inferior to $\frac{1}{2}$. We define two adversarial models:

Model $\mathcal{F}_{fp}$, adversary constrained by a fractional polynomial:
     $|P-F|=\Omega(n^{\varepsilon})$, for a constant $\varepsilon\in(0,1)$.

Model $\mathcal{F}_{pl}$, poly-log constrained adversary:
     $|P-F|=\Omega(\log^{c}n)$, for a constant $c\geq 1$.

Measures of efficiency. We assess the efficiency of algorithms in terms of time, work, and message complexities. We use the conventional measures of time complexity and work complexity. Message complexity assesses the number of point-to-point messages sent during the execution of an algorithm. Lastly, we use the common definition of an event $\mathcal{E}$ occurring with high probability (whp) to mean that ${\bf Pr}[\mathcal{E}]=1-O(n^{-\alpha})$ for some constant $\alpha>0$.

3 Algorithm Description

We now present our decentralized solution, called algorithm daks (for Decentralized Algorithm with Knowledge Sharing), that employs no master and instead uses a gossip-based approach. We start by specifying in detail the algorithm for $n$ processors and $t=n$ tasks, then we generalize it for $t$ tasks, where $t\geq n$.

The algorithm is structured in terms of a main loop. The principal data structures at each processor are two arrays of size linear in $n$: one accumulates knowledge gathered from the processors, and another stores the results. All processors start as workers. In each iteration, any worker performs one randomly selected task and sends its knowledge to just one other randomly selected processor. When a worker obtains “enough” knowledge about the tasks performed in the system, it computes the final results, stops being a worker, and becomes “enlightened.” Such processors no longer perform tasks, and instead “profess” their knowledge to other processors by means of multicasts to exponentially increasing random sets of processors. The main loop terminates when a certain number of messages is received from enlightened processors. The pseudocode for algorithm daks is given in Figure 1. We now give the details.

 
0: Procedure for processor $i$;
1:    external $n$, /* $n$ is the number of processors and tasks */
2:         $\mathfrak{H},\mathfrak{K}$ /* positive constants */
3:    $Task[1..n]$ /* set of tasks */
4:    $R_{i}[1..n]$ init $\emptyset^{n}$ /* set of collected results */
5:    $Results_{i}[1..n]$ init $\bot$ /* array of results */
6:    $prof\_ctr$ init 0 /* number of profess messages received */
7:    $r$ init 0 /* round number */
8:    $\ell$ init 0 /* number of profess messages to be sent per iteration */
9:    $worker$ init true /* indicates whether the processor is still a worker */
10:   while $prof\_ctr<\mathfrak{H}\log n$ do
10:      Send:
11:      if $worker$ then
12:         Let $q$ be a randomly selected processor from $P$
13:         Send $\langle{\sf share},R_{i}[\;]\rangle$ to processor $q$
14:      else
15:         Let $D$ be a set of $2^{\ell}\log n$ randomly selected processors from $P$
15:         /* Here the selection is with replacement */
16:         Send $\langle{\sf profess},R_{i}[\;]\rangle$ to processors in $D$
17:         $\ell\leftarrow\ell+1$
17:      Receive:
18:      Let $M$ be the set of received messages
19:      $prof\_ctr\leftarrow prof\_ctr+|\{m:m\in M\wedge m.type={\sf profess}\}|$
20:      for all $j\in\mathcal{T}$ do
21:         $R_{i}[j]\leftarrow R_{i}[j]\cup(\bigcup_{m\in M}m.R[j])$ /* update knowledge */
21:      Compute:
22:      $r\leftarrow r+1$
23:      if $worker$ then
24:         Randomly select $j\in\mathcal{T}$ and compute the result $v_{j}$ for $Task[j]$
25:         $R_{i}[j]\leftarrow R_{i}[j]\cup\{\langle v_{j},i,r\rangle\}$
26:         if $\min_{j\in\mathcal{T}}\{|R_{i}[j]|\}\geq\mathfrak{K}\log n$ then /* $i$ has enough results */
27:            for all $j\in\mathcal{T}$ do
28:               $Results_{i}[j]\leftarrow u$ such that triples $\langle u,\_,\_\rangle$ form a plurality in $R_{i}[j]$
29:            $worker\leftarrow{\sf false}$ /* worker becomes enlightened */
29: end
 
Figure 1: Algorithm daks for $t=n$; code at processor $i$ for $i\in P$.

Local knowledge and state. The algorithm is parameterized by $n$, the number of processors and tasks, and by compile-time constants $\mathfrak{H}$ and $\mathfrak{K}$ that are discussed later (they emerge from the analysis). Every processor $i$ maintains the following:

  • Array of results $R_{i}[1..n]$, where element $R_{i}[j]$, for $j\in\mathcal{T}$, is a set of results for $Task[j]$. Each $R_{i}[j]$ is a set of triples $\langle v,i,r\rangle$, where $v$ is the result computed for $Task[j]$ by processor $i$ in round $r$ (here the inclusion of $r$ ensures that the results computed by processor $i$ in different rounds are preserved).

  • The array $Results_{i}[1..n]$ stores the final results.

  • $prof\_ctr$ stores the number of messages received from enlightened processors.

  • $r$ is the round (iteration) number that is used by workers to timestamp the computed results.

  • $\ell$ is the exponent that controls the number of messages multicast by enlightened processors.

Control flow. The algorithm iterations are controlled by the main while-loop, and we use the term round to refer to a single iteration of the loop. The loop contains three stages, viz., Send, Receive, and Compute.

Processors communicate using messages $m$ that contain pairs $\langle type,R[\;]\rangle$. Here $m.R[\;]$ is the sender’s array of results. When a processor is a worker, it sends messages with $m.type={\sf share}$. When a processor becomes enlightened, it sends messages with $m.type={\sf profess}$. The loop is controlled by the counter $prof\_ctr$ that keeps track of the received messages of type profess. We next describe the stages in detail.

Send stage: Any worker chooses a target processor $q$ at random and sends its array of results $R[\;]$ to processor $q$ in a share message. Any enlightened processor chooses a set $D\subseteq P$ of processors at random and sends the array of results $R[\;]$ to processors in $D$ in a profess message. The size of the set $D$ is $2^{\ell}\log n$, where initially $\ell=0$, and once a processor is enlightened, it increments $\ell$ by $1$ in every round. (Strictly speaking, $D$ is a multiset, because the random selection is with replacement. However this is done only for the purpose of the analysis, and $D$ can be safely treated as a set for the purpose of sending profess messages.)
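For concreteness, the following minimal Python sketch (an illustration only, not the authors' implementation; the base of the logarithm is an assumption that only affects constants) mirrors the destination selection of the Send stage: a worker draws a single uniformly random target for its share message, while an enlightened processor draws a multiset $D$ of $2^{\ell}\log n$ targets with replacement for its profess message and then increments $\ell$.

import math
import random

def send_stage(worker, ell, n):
    """Return (message_type, destinations, new_ell) for one Send stage.
    Processors are identified by 0..n-1; log is taken base 2 (an assumption)."""
    if worker:
        q = random.randrange(n)                       # one uniformly random processor
        return "share", [q], ell
    size = (2 ** ell) * math.ceil(math.log2(n))       # |D| = 2^ell * log n
    D = [random.randrange(n) for _ in range(size)]    # selection with replacement
    return "profess", D, ell + 1                      # ell grows by one per round

# Example: an enlightened processor in a 1024-processor system, second profess round
print(send_stage(False, 1, 1024))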

Receive stage: Processor $i$ receives messages (if any) sent to it in the preceding Send stage. The processor increments its $prof\_ctr$ by the number of profess messages received. For each task $j$, the processor updates its $R_{i}[j]$ by including the results received in all messages.

Compute stage: Any worker $i$ randomly selects task $j$, computes the result $v_{j}$, and adds the triple $\langle v_{j},i,r\rangle$ for round $r$ to $R_{i}[j]$. For each task the worker checks whether “enough” results were collected. Once at least $\mathfrak{K}\log n$ results for each task are obtained, the worker stores the final results in $Results_{i}[\;]$ by taking the plurality of results for each task, and becomes enlightened. (In Section 4 we reason about the compile-time constant $\mathfrak{K}$, and establish that $\mathfrak{K}\log n$ results are sufficient for our claims.) Enlightened processors rest on their laurels in subsequent Compute stages.
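The plurality step can be illustrated with the short Python sketch below (again an illustration under our own assumptions, not the paper's code; K_CONST is a hypothetical stand-in for the compile-time constant $\mathfrak{K}$). A worker becomes enlightened only when every task has accumulated at least $\mathfrak{K}\log n$ result triples, at which point the most frequent result value per task is recorded.

import math
from collections import Counter

K_CONST = 3   # hypothetical placeholder for the compile-time constant K

def try_enlighten(R_i, n):
    """If every task has at least K_CONST*log(n) collected result triples
    (value, proc_id, round), return the plurality value per task; else None."""
    threshold = K_CONST * math.log2(n)
    if min(len(R_i[j]) for j in range(len(R_i))) < threshold:
        return None                                   # still a worker
    results = []
    for triples in R_i:
        counts = Counter(v for (v, _, _) in triples)
        results.append(counts.most_common(1)[0][0])   # plurality value for this task
    return results                                    # the processor is now enlightened

# Example: two tasks with mostly-correct collected results
R = [{(1, p, r) for p in range(8) for r in range(4)} | {(0, 9, 1)},
     {(0, p, r) for p in range(8) for r in range(4)}]
print(try_enlighten(R, n=16))   # -> [1, 0]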

Reaching Termination. We note that a processor must become enlightened before it can terminate. Processors can become enlightened at different times and without any synchronization. Once enlightened, they profess their knowledge by multicasting it to exponentially growing random subsets $D$ of processors. When a processor receives sufficiently many such messages, i.e., $\mathfrak{H}\log n$, it halts, again without any synchronization, and using only the local knowledge. We consider this protocol to be of independent interest. In Section 4 we reason about the compile-time constant $\mathfrak{H}$, and establish that $\mathfrak{H}\log n$ profess messages are sufficient for our claims; additionally we show that the protocol’s efficiency can be assessed independently of the number of crashes.

Extending the Algorithm for $t\geq n$. We now show how to modify the algorithm to handle an arbitrary number of tasks $t$ such that $t\geq n$. Let $\mathcal{T}^{\prime}=[t]$ be the set of unique task identifiers, where $t\geq n$. We segment the $t$ tasks into chunks of $\lceil t/n\rceil$ tasks, and construct a new array of chunk-tasks with identifiers in $\mathcal{T}=[n]$, where each chunk-task takes $\Theta(t/n)$ time to perform by any live processor. We now use algorithm daks, where the only difference is that each Compute stage takes $\Theta(t/n)$ time to perform a chunk-task. In the sequel, we use daks as the name of the algorithm when $t=n$, and we use daks$_{t,n}$ as the name of the algorithm when $t\geq n$.
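The chunking can be sketched as follows (an illustration under our own conventions; the helper name and the 1-based task ids are assumptions). The $t$ original tasks are grouped into $n$ chunk-tasks of at most $\lceil t/n\rceil$ tasks each, and performing chunk-task $j$ means performing every original task in chunk $j$.

import math

def make_chunks(t, n):
    """Partition task ids 1..t into n chunks of at most ceil(t/n) tasks each.
    If t is not much larger than n, trailing chunks may be empty."""
    size = math.ceil(t / n)
    return [list(range(j * size + 1, min((j + 1) * size, t) + 1)) for j in range(n)]

print(make_chunks(10, 4))   # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]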

4 Algorithm Analysis

We present the performance analysis of algorithm daks in the two adversarial failure models. We first present the analysis that deals with the case when $t=n$, then extend the results to the general case with $t\geq n$ for algorithm daks$_{t,n}$.

4.1 Foundational Lemmas

We proceed by giving lemmas relevant to both adversarial models, starting with the statement of the well-known Chernoff bound.

Lemma 1 (Chernoff Bounds)

Let $X_{1},X_{2},\ldots,X_{n}$ be $n$ independent Bernoulli random variables with ${\bf Pr}[X_{i}=1]=p_{i}$ and ${\bf Pr}[X_{i}=0]=1-p_{i}$; then it holds for $X=\sum_{i=1}^{n}X_{i}$ and $\mu=\mathbb{E}[X]=\sum_{i=1}^{n}p_{i}$ that for all $\delta>0$, (i) ${\bf Pr}[X\geq(1+\delta)\mu]\leq e^{-\frac{\mu\delta^{2}}{3}}$, and (ii) ${\bf Pr}[X\leq(1-\delta)\mu]\leq e^{-\frac{\mu\delta^{2}}{2}}$.
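As a quick numerical sanity check (an aside for illustration, not part of the analysis; the parameter values are arbitrary), the Python snippet below compares the lower-tail bound (ii) against an empirical estimate obtained by simulating sums of independent Bernoulli variables.

import math
import random

def chernoff_lower_tail_demo(n=200, p=0.3, delta=0.25, trials=20000):
    """Empirical frequency of X <= (1-delta)*mu versus the bound exp(-mu*delta^2/2)."""
    mu = n * p
    bound = math.exp(-mu * delta * delta / 2)
    hits = sum(
        1 for _ in range(trials)
        if sum(1 for _ in range(n) if random.random() < p) <= (1 - delta) * mu
    )
    return hits / trials, bound

empirical, bound = chernoff_lower_tail_demo()
print(f"empirical tail = {empirical:.5f}, Chernoff bound = {bound:.5f}")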

Now we show that if $\Theta(n\log n)$ profess messages are sent by the enlightened processors, then the algorithm terminates whp in one round.

Lemma 2

Let $r$ be the first round by which the total number of profess messages is $\Theta(n\log n)$. Then by the end of this round every live processor halts whp.

Proof. Let $\tilde{n}=kn\log n$ be the number of ${\sf profess}$ messages sent by round $r$, where $k>1$ is a sufficiently large constant. We show that whp every live processor receives at least $(1-\delta)k\log n$ ${\sf profess}$ messages, for some constant $\delta\in(0,1)$. Let us assume that there exists a processor $q$ that receives fewer than $(1-\delta)k\log n$ such messages. We prove that whp such a processor does not exist.

Since $\tilde{n}$ ${\sf profess}$ messages are sent by round $r$, there were $\tilde{n}$ random selections of processors from set $P$ in line 15 of algorithm daks (Figure 1), possibly by different enlightened processors. We denote by $i$ the index of one of the random selections in line 15. Let $X_{i}$ be a Bernoulli random variable such that $X_{i}=1$ if processor $q$ was chosen by an enlightened processor in the $i$-th selection and $X_{i}=0$ otherwise.

We define a random variable $X=\sum_{i=1}^{\tilde{n}}X_{i}$ to estimate the total number of times processor $q$ is selected by round $r$. In line 15 every enlightened processor chooses the set of destinations for the ${\sf profess}$ message uniformly at random, and hence ${\bf Pr}[X_{i}=1]=\frac{1}{n}$. Let $\mu=\mathbb{E}[X]=\sum_{i=1}^{\tilde{n}}\mathbb{E}[X_{i}]=\frac{1}{n}kn\log n=k\log n$; then by applying the Chernoff bound, for the same $\delta$ chosen as above, we have:

$${\bf Pr}[X\leq(1-\delta)\mu]\leq e^{-\frac{\mu\delta^{2}}{2}}\leq e^{-\frac{(k\log n)\delta^{2}}{2}}\leq\frac{1}{n^{\frac{b\delta^{2}}{2}}}\leq\frac{1}{n^{\alpha}}$$

where $\alpha>1$ for some sufficiently large $b$. We now define $\mathfrak{H}$ to be $\mathfrak{H}=(1-\delta)k$. Thus, with this $\mathfrak{H}$, we have ${\bf Pr}[X\leq\mathfrak{H}\log n]\leq\frac{1}{n^{\alpha}}$ for some $\alpha>1$. Now let us denote by $\mathcal{E}_{q}$ the event that $prof\_ctr_{q}\geq\mathfrak{H}\log n$ by the end of round $r$, and let $\bar{\mathcal{E}}_{q}$ be the complement of that event. By Boole’s inequality we have ${\bf Pr}[\cup_{q}\bar{\mathcal{E}}_{q}]\leq\sum_{q}{\bf Pr}[\bar{\mathcal{E}}_{q}]\leq\frac{1}{n^{\beta}}$, where $\beta=\alpha-1>0$. Hence each processor $q\in P$ is the destination of at least $\mathfrak{H}\log n$ ${\sf profess}$ messages whp, i.e.,

$${\bf Pr}[\cap_{q}\mathcal{E}_{q}]={\bf Pr}[\overline{\cup_{q}\bar{\mathcal{E}}_{q}}]=1-{\bf Pr}[\cup_{q}\bar{\mathcal{E}}_{q}]\geq 1-\frac{1}{n^{\beta}}$$

and hence every live processor halts (line 10). $\Box$

We use the constant $\mathfrak{H}$ from the proof of Lemma 2 as a compile-time constant in algorithm daks (Figure 1). The constant is used in the main while loop (line 10) to determine when a sufficient number of profess messages is received from enlightened processors, causing the loop to terminate.

We now show that once a processor that the adversary does not crash becomes enlightened, then, whp, in $O(\log n)$ rounds every other live processor becomes enlightened and halts.

Lemma 3

Once a processor $q\in P-F$ becomes enlightened, every live processor halts in additional $O(\log n)$ rounds whp.

Proof. According to Lemma 2, if $\Theta(n\log n)$ ${\sf profess}$ messages are sent, then every processor halts whp. Given that processor $q$ does not crash, it takes $q$ at most $\log n$ rounds to send $n\log n$ ${\sf profess}$ messages (per line 15 in Figure 1), regardless of the actions of other processors. Hence, whp every live processor halts in $O(\log n)$ rounds. $\Box$

Next we establish the work and message complexities of algorithm daks for the case when the number of crashes is small, specifically when at least a linear number of processors do not crash. As we mentioned in the introduction, while $\Omega(n)$ processors remain active in the computation, algorithm daks performs tasks in exactly the same pattern as algorithm $A$ in [6] (to avoid a complete restatement, we kindly refer the reader to that earlier paper). This forms the basis for the next lemma.

Lemma 4

Algorithm daks has work and message complexity $\Theta(n\log n)$ when $\Omega(n)$ processors do not crash.

Proof. Algorithm daks chooses tasks to perform in the same pattern as algorithm $A$ in [6]; however, the two algorithms have very different termination strategies. Theorems 2 and 4 of [6] establish that in the presence of at most $f\cdot n$ crashes, for a constant $f\in(0,1)$, the work and message complexities of algorithm $A$ are $\Theta(n\log n)$. The termination strategy of algorithm daks is completely different; however, per Lemmas 2 and 3, after at least one processor from $P-F$ is enlightened, every live processor halts in $\Theta(\log n)$ rounds whp, having sent $\Theta(n\log n)$ ${\sf profess}$ messages. Thus, with at least a linear number of processors remaining, the work and message complexities, relative to algorithm $A$, increase by an additive $\Theta(n\log n)$ term. The result follows. $\Box$

We denote by $L$ the number of rounds required for a processor from the set $P-F$ to become enlightened. We next analyze the value of $L$ for models $\mathcal{F}_{fp}$ and $\mathcal{F}_{pl}$.

4.2 Analysis for Model $\mathcal{F}_{fp}$

In model $\mathcal{F}_{fp}$ we have $|F|\leq n-n^{\varepsilon}$. Let $F_{r}$ be the set of processors that crash prior to round $r$. For the purpose of analysis we divide an execution of the algorithm into two epochs: epoch $\mathfrak{a}$ consists of all rounds $r$ where $|F_{r}|$ is at most linear in $n$, so that the number of live processors is at least $c^{\prime}n$ for some suitable constant $c^{\prime}$; epoch $\mathfrak{b}$ consists of all rounds $r$ starting with the first round $r^{\prime}$ (it can be round 1) in which the number of live processors drops below $c^{\prime}n$ and becomes $c^{\prime\prime}n^{\varepsilon}$ for some suitable constant $c^{\prime\prime}$. Note that either epoch may be empty.

For the small number of crashes in epoch $\mathfrak{a}$, Lemma 4 gives the worst-case work and message complexities as $\Theta(n\log n)$; these upper bounds apply whether or not the algorithm terminates in this epoch.

Next we consider epoch $\mathfrak{b}$. If the algorithm terminates in round $r^{\prime}$, the first round of the epoch, the cost remains the same as given by Lemma 4. If it does not terminate, it incurs additional costs associated with the processors in $P-F_{r^{\prime}}$, where $|P-F_{r^{\prime}}|\leq c^{\prime\prime}n^{\varepsilon}$. We analyze the costs for epoch $\mathfrak{b}$ in the rest of this section. The final message and work complexities will be at most the worst-case complexity for epoch $\mathfrak{a}$ plus the additional costs for epoch $\mathfrak{b}$ incurred while $|P-F|=\Omega(n^{\varepsilon})$ per model $\mathcal{F}_{fp}$.

First we show that whp it will take $L=O(n^{1-\varepsilon}\log n\log\log n)$ rounds for a worker from the set $P-F$ to become enlightened in epoch $\mathfrak{b}$.

Lemma 5

In $O(n^{1-\varepsilon}\log n)$ rounds of epoch $\mathfrak{b}$ every task is performed $\Theta(\log n)$ times whp by processors in $P-F$.

Proof. If the algorithm terminates within $O(n^{1-\varepsilon}\log n)$ rounds of epoch $\mathfrak{b}$, then each task is performed $\Theta(\log n)$ times as reasoned earlier. Suppose the algorithm does not terminate (in this case its performance is going to be worse).

Let us assume that after $\tilde{r}=\kappa n^{1-\varepsilon}\log n$ rounds of algorithm daks, where $\kappa$ is a sufficiently large constant and $\varepsilon\in(0,1)$ is the constant of model $\mathcal{F}_{fp}$, there exists a task $\tau$ that is performed less than $(1-\delta)\kappa\log n$ times among all live workers, for some $\delta>0$. We prove that whp such a task does not exist.

We define $k_{2}$ to be such that $k_{2}=(1-\delta)\kappa$ (the constant $k_{2}$ will play a role in establishing the value of the compile-time constant $\mathfrak{K}$ of algorithm daks; we come back to this later in this section). According to the above assumption, at the end of round $\tilde{r}$, for some task $\tau$, we have $|\cup_{j=1}^{n}R_{j}[\tau]|<(1-\delta)\kappa\log n=k_{2}\log n$.

Let us consider all algorithm iterations individually performed by each processor in $P-F$ during the $\tilde{r}$ rounds. Let $\xi$ be the total number of such individual iterations. Then $\xi\geq\tilde{r}|P-F|\geq\tilde{r}cn^{\varepsilon}$. During any such iteration, a processor from $P-F$ selects and performs task $\tau$ in line 24 independently with probability $\frac{1}{n}$. Let us arbitrarily enumerate said iterations from $1$ to $\xi$. Let $X_{1},\ldots,X_{x},\ldots,X_{\xi}$ be Bernoulli random variables, such that $X_{x}$ is $1$ if task $\tau$ is performed in iteration $x$, and $0$ otherwise. We define $X\equiv\sum_{x=1}^{\xi}X_{x}$, the random variable that describes the total number of times task $\tau$ is performed during the $\tilde{r}$ rounds by processors in $P-F$. We define $\mu$ to be $\mathbb{E}[X]$. Since ${\bf Pr}[X_{x}=1]=\frac{1}{n}$, for $x\in\{1,\ldots,\xi\}$, where $\xi\geq\tilde{r}cn^{\varepsilon}$, by linearity of expectation we obtain $\mu=\mathbb{E}[X]=\frac{\xi}{n}\geq\frac{\tilde{r}cn^{\varepsilon}}{n}=\kappa c\log n$. Now by applying the Chernoff bound, for the same $\delta>0$ chosen as above, we have:

$${\bf Pr}[X\leq(1-\delta)\mu]\leq e^{-\frac{\mu\delta^{2}}{2}}\leq e^{-\frac{(\kappa c\log n)\delta^{2}}{2}}\leq\frac{1}{n^{\frac{b\delta^{2}}{2}}}\leq\frac{1}{n^{\alpha}}$$

where $\alpha>1$ for some sufficiently large $b$. Now let us denote by $\mathcal{E}_{\tau}$ the event that $|\cup_{i=1}^{n}R_{i}[\tau]|\geq k_{2}\log n$ by round $\tilde{r}$ of the algorithm, and we denote by $\bar{\mathcal{E}}_{\tau}$ the complement of that event. Next, by Boole’s inequality we have ${\bf Pr}[\cup_{\tau}\bar{\mathcal{E}}_{\tau}]\leq\sum_{\tau}{\bf Pr}[\bar{\mathcal{E}}_{\tau}]\leq\frac{1}{n^{\beta}}$, where $\beta=\alpha-1>0$. Hence each task is performed at least $\Theta(\log n)$ times by workers in $P-F$ whp, i.e.,

$${\bf Pr}[\cap_{\tau}\mathcal{E}_{\tau}]={\bf Pr}[\overline{\cup_{\tau}\bar{\mathcal{E}}_{\tau}}]\geq 1-\frac{1}{n^{\beta}}. \qquad\Box$$

We now focus only on the set of live processors $P-F$ with $|P-F|\geq cn^{\varepsilon}$. Our goal is to show that in $O(n^{1-\varepsilon}\log n\log\log n)$ rounds of algorithm daks at least one processor from $P-F$ becomes enlightened. In reasoning about Lemmas 6, 7, and 8 that follow, we note that if the algorithm terminates within $O(n^{1-\varepsilon}\log n\log\log n)$ rounds of epoch $\mathfrak{b}$, then every processor in $P-F$ is enlightened as reasoned earlier. Suppose the algorithm does not terminate (in focusing on this case we note that the algorithm’s performance is going to be worse).

We first show that any triple $z$ generated by a processor in $P-F$ is known to all processors in $P-F$ in $O(n^{1-\varepsilon}\log n\log\log n)$ rounds of algorithm daks.

We denote by $S(r)\subseteq P-F$ the set of processors that know a certain triple $z$ by round $r$, and let $s(r)=|S(r)|$. The next lemma shows that by round $r_{1}=r^{\prime}+\Theta(n^{1-\varepsilon}\log n\log\log n)$ in epoch $\mathfrak{b}$ we have $s(r_{1})=\Theta(\log^{3}n)$.

Lemma 6

By round $r_{1}=r^{\prime}+\Theta(n^{1-\varepsilon}\log n\log\log n)$ of epoch $\mathfrak{b}$, $s(r_{1})=\Theta(\log^{3}n)$ whp.

Proof. Consider a scenario where a processor $p\in P-F$ generates a triple $z$. Then the probability that processor $p$ sends triple $z$ to at least one other processor $q\in P-F$, where $p\neq q$, in $n^{1-\varepsilon}\log n$ rounds is at least

$$1-\Big(1-\frac{cn^{\varepsilon}}{n}\Big)^{n^{1-\varepsilon}\log n}\geq 1-e^{-b\log n}>1-\frac{1}{n^{\alpha}},\quad\text{for some }\alpha>0,$$

for some appropriately chosen $b$ and for a sufficiently large $n$. Similarly, it is straightforward to show that the number of live processors that learn about $z$ doubles every $n^{1-\varepsilon}\log n$ rounds, hence whp after $(n^{1-\varepsilon}\log n)\cdot 3\log\log n=\Theta(n^{1-\varepsilon}\log n\log\log n)$ rounds the number of processors in $P-F$ that learn about $z$ is $\Theta(\log^{3}n)$. $\Box$

In the next lemma we reason about the growth of $s(r)$ after round $r_{1}$.

Lemma 7

Let $r_{2}$ be the first round after round $r_{1}$ in epoch $\mathfrak{b}$ such that $r_{2}-r_{1}=\Theta(n^{1-\varepsilon}\log n)$. Then $s(r_{2})\geq\frac{3}{5}|P-F|$ whp.

Proof. Per model $\mathcal{F}_{fp}$, let constant $c$ be such that $|P-F|\geq cn^{\varepsilon}$. We would like to apply the Chernoff bound to approximate the number of processors from $(P-F)-S(r_{1})$ that learn about triple $z$ by round $r_{2}$. According to algorithm daks, if a processor $i\in(P-F)-S(r_{1})$ learns about triple $z$ in some round $r$, $r_{1}<r<r_{2}$, then in round $r+1$ processor $i$ forwards $z$ to some randomly chosen processor $j\in P$ (lines 12-13 of the algorithm). Let $Y_{i}$, where $i\in(P-F)-S(r_{1})$, be a random variable such that $Y_{i}=1$ if processor $i$ receives the triple $z$ from some processor $j\in S(r_{1})$ in some round $r$, $r_{1}<r<r_{2}$, and $Y_{i}=0$ otherwise. It is clear that if some processor $k\in(P-F)-S(r)$, where $k\neq i$, receives triple $z$ from processor $i$ in round $r+1<r_{2}$, then the random variables $Y_{i}$ and $Y_{k}$ are not independent, and hence the Chernoff bound cannot be applied directly. To circumvent this, we consider the rounds between $r_{1}$ and $r_{2}$ and partition them into blocks of $\frac{1}{c}n^{1-\varepsilon}$ consecutive rounds. For instance, rounds $r_{1}+1,\ldots,r_{1}+\frac{1}{c}n^{1-\varepsilon}$ form the first block, rounds $r_{1}+\frac{1}{c}n^{1-\varepsilon}+1,\ldots,r_{1}+\frac{2}{c}n^{1-\varepsilon}$ form the second block, etc. The final block may contain fewer than $\frac{1}{c}n^{1-\varepsilon}$ rounds.

We are interested in estimating the fraction of the processors in $(P-F)-S(r_{1})$ that learn about triple $z$ at the end of each block.

For the purpose of the analysis we consider another algorithm, called daks′. The difference between algorithms daks and daks′ is that in daks′ a processor does not forward triple $z$ in round $r$ if $z$ was first received in a round that belongs to the same block as $r$. This allows us to apply the Chernoff bound (with negative dependencies) to approximate the number of processors in $(P-F)-S(r_{1})$ that learn about triple $z$ in a block. We let $S^{\prime}(r)$ be the subset of processors in $P-F$ that are aware of triple $z$ by round $r$ in algorithm daks′, and we let $s^{\prime}(r)=|S^{\prime}(r)|$. Note that since in daks′ triple $z$ is forwarded less often than in daks, the number of processors from $P-F$ that learn about $z$ in daks is at least as large as the number of processors from $P-F$ that learn about $z$ in daks′, and, in particular, $S^{\prime}(r)\subseteq S(r)$ for any $r$. This allows us to consider algorithm daks′ instead of daks for assessing the number of processors from $P-F$ that learn about $z$ by round $r_{2}$, and we do this by having $s^{\prime}(r)$ serve as a lower bound for $s(r)$.

Let $X_{i}$, where $i\in(P-F)-S^{\prime}(r)$, be a random variable such that $X_{i}=1$ if processor $i$ receives the triple $z$ from some processor $j\in S^{\prime}(r)$ in the block that starts with round $r+1$ (e.g., for the first block $r=r_{1}$), and $X_{i}=0$ otherwise. Let us next define the random variable $X=\sum_{i\in(P-F)-S^{\prime}(r)}X_{i}=s^{\prime}(r+\frac{1}{c}n^{1-\varepsilon})-s^{\prime}(r)$ to count the number of processors in $(P-F)-S^{\prime}(r)$ that receive triple $z$ in the block that starts with round $r+1$.

Next, we calculate $\mathbb{E}[X]$, the expected number of processors in $(P-F)-S^{\prime}(r)$ that learn about triple $z$ by the end of the block that begins with round $r+1$ in algorithm daks′. There are $s^{\prime}(r)$ processors in $S^{\prime}(r)$ that are aware of triple $z$. Note that there are $\frac{1}{c}n^{1-\varepsilon}$ consecutive rounds in a block, and during every round every processor $q\in S^{\prime}(r)$ picks a processor from $P$ uniformly at random and sends the triple $z$ to it. Note also that in algorithm daks′, triple $z$ is not forwarded by a processor during the same block in which it is received. Therefore, every processor $p$ in $(P-F)-S^{\prime}(r)$ has probability $\frac{1}{n}$ of being selected by a processor $q\in S^{\prime}(r)$ in one round, and, conversely, probability $1-\frac{1}{n}$ of not being selected by $q$. The number of trials is $\frac{s^{\prime}(r)}{c}n^{1-\varepsilon}$, hence the probability that processor $p\in(P-F)-S^{\prime}(r)$ is not selected is $(1-\frac{1}{n})^{\frac{s^{\prime}(r)}{c}n^{1-\varepsilon}}$, and the probability that it is selected is $1-(1-\frac{1}{n})^{\frac{s^{\prime}(r)}{c}n^{1-\varepsilon}}$. Therefore, the expected number of processors from $(P-F)-S^{\prime}(r)$ that learn about triple $z$ by the end of the block in algorithm daks′ is $(cn^{\varepsilon}-s^{\prime}(r))\big(1-(1-\frac{1}{n})^{\frac{s^{\prime}(r)}{c}n^{1-\varepsilon}}\big)$. Next, by applying the binomial expansion, we have:

$$\begin{aligned}
(cn^{\varepsilon}-s^{\prime}(r))\Big(1-\big(1-\tfrac{1}{n}\big)^{\frac{s^{\prime}(r)}{c}n^{1-\varepsilon}}\Big)
&\geq (cn^{\varepsilon}-s^{\prime}(r))\Big(\frac{s^{\prime}(r)n^{1-\varepsilon}}{cn}-\frac{s^{\prime}(r)^{2}n^{2-2\varepsilon}}{2c^{2}n^{2}}\Big)\\
&= cn^{\varepsilon}\Big(1-\frac{s^{\prime}(r)}{cn^{\varepsilon}}\Big)\frac{s^{\prime}(r)n^{1-\varepsilon}}{cn}\Big(1-\frac{s^{\prime}(r)n^{1-\varepsilon}}{2cn}\Big)\\
&= s^{\prime}(r)\Big(1-\frac{s^{\prime}(r)}{cn^{\varepsilon}}\Big)\Big(1-\frac{s^{\prime}(r)n^{1-\varepsilon}}{2cn}\Big)
\end{aligned}$$

The number of processors from $(P-F)-S^{\prime}(r)$ that become aware of triple $z$ in the block of $\frac{1}{c}n^{1-\varepsilon}$ rounds that starts with round $r+1$ is $s^{\prime}(r+\frac{1}{c}n^{1-\varepsilon})-s^{\prime}(r)$, while, as shown above, the expected number of processors that learn about triple $z$ is $\mu=\mathbb{E}[X]=s^{\prime}(r)(1-\frac{s^{\prime}(r)}{cn^{\varepsilon}})(1-\frac{s^{\prime}(r)}{2cn^{\varepsilon}})$.

On the other hand, because in algorithm daks′ no processor that learns about triple $z$ in a block forwards it in the same block, we have negative dependencies among the random variables $X_{i}$, and hence we can apply the regular Chernoff bound, with $\delta=\frac{1}{\log n}$. Considering also that $s^{\prime}(r)\geq s(r_{1})$ and that $s(r_{1})=\Theta(\log^{3}n)$ by Lemma 6, we obtain:

$${\bf Pr}\Big[X\leq\Big(1-\frac{1}{\log n}\Big)\mu\Big]\leq e^{-s^{\prime}(r)\big(1-\frac{s^{\prime}(r)}{cn^{\varepsilon}}\big)\big(1-\frac{s^{\prime}(r)}{2cn^{\varepsilon}}\big)\frac{1}{2}\frac{1}{\log^{2}n}}\leq e^{-\frac{k\log^{3}n}{2\log^{2}n}}=e^{-\frac{k}{2}\log n}\leq\frac{1}{n^{\alpha}}$$

where $\alpha>0$ for some sufficiently large $k>2$.

Therefore, whp the number of processors that know triple $z$ by the end of the block that starts with round $r+1$ is

$$\begin{aligned}
s^{\prime}\Big(r+\frac{1}{c}n^{1-\varepsilon}\Big) &\geq s^{\prime}(r)+s^{\prime}(r)\Big(1-\frac{s^{\prime}(r)}{cn^{\varepsilon}}\Big)\Big(1-\frac{s^{\prime}(r)}{2cn^{\varepsilon}}\Big)\Big(1-\frac{1}{\log n}\Big)\\
&\geq s^{\prime}(r)+s^{\prime}(r)\Big(1-\frac{3s^{\prime}(r)}{2cn^{\varepsilon}}\Big)\Big(1-\frac{1}{\log n}\Big)\\
&\geq s^{\prime}(r)+s^{\prime}(r)\Big(1-\frac{3}{2}\cdot\frac{3}{5}\Big)\Big(1-\frac{1}{\log n}\Big)\\
&\geq \frac{21}{20}\,s^{\prime}(r)
\end{aligned}$$

for a sufficiently large $n$, and given that $s^{\prime}(r)<\frac{3}{5}cn^{\varepsilon}$ (otherwise the lemma is proved).

Hence we have shown that the number of processors from $P-F$ that know triple $z$ by the end of the block that starts with round $r+1$ is at least $\frac{21}{20}s^{\prime}(r)$ whp. It remains to show that $s(r_{2})\geq\frac{3}{5}|P-F|$ whp. Since the block size is $\frac{1}{c}n^{1-\varepsilon}$ and $r_{2}-r_{1}=\Theta(n^{1-\varepsilon}\log n)$, there are $\Theta(\log n)$ blocks. Indeed, even assuming that processors that learn about triple $z$ following round $r$ do not disseminate it, after repeating the process described above over these $\Theta(\log n)$ blocks, it is clear that whp $s^{\prime}(r_{2})\geq\frac{3}{5}|P-F|$.

Thus whp we have $s^{\prime}(r_{2})\geq\frac{3}{5}|P-F|$ for $r_{2}-r_{1}=\Theta(n^{1-\varepsilon}\log n)$, and since $S^{\prime}(r)\subseteq S(r)$ we have $s(r_{2})\geq\frac{3}{5}|P-F|$. $\Box$

In the proof of the next lemma we use the Coupon Collector’s problem [20]:

Definition 1

The Coupon Collector’s Problem (CCP). There are $n$ types of coupons and at each trial a coupon is chosen at random. Each random coupon is equally likely to be of any of the $n$ types, and the random choices of the coupons are mutually independent. Let $m$ be the number of trials. The goal is to study the relationship between $m$ and the probability of having collected at least one copy of each of the $n$ types.

In [20] it is shown that the expected number of trials needed to collect all $n$ coupon types is $n\ln n+O(n)$, and that whp the number of trials lies in a small interval centered about its expected value.
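A small simulation (our own illustration, independent of [20]) shows this concentration: the empirical mean of the number of trials is close to $n\ln n$ plus a linear correction.

import math
import random

def coupon_collector_trials(n):
    """Number of uniform draws needed to see every one of the n coupon types."""
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        trials += 1
    return trials

n = 500
runs = [coupon_collector_trials(n) for _ in range(200)]
print(sum(runs) / len(runs), n * math.log(n))   # empirical mean vs. n ln n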

Next we calculate the number of rounds required for the remaining $\frac{2}{5}|P-F|$ processors in $P-F$ to learn $z$. Let $U_{d}\subset P-F$ be the set of workers that do not learn $z$ after $O(n^{1-\varepsilon}\log n\log\log n)$ rounds of algorithm daks. According to Lemma 7 we have $|U_{d}|\leq\frac{2}{5}|P-F|$.

Lemma 8

Once every task is performed $\Theta(\log n)$ times in epoch $\mathfrak{b}$ by processors in $P-F$, at least one worker from $P-F$ becomes enlightened in $O(n^{1-\varepsilon}\log n\log\log n)$ rounds, whp.

Proof. According to Lemmas 6 and 7, in $O(n^{1-\varepsilon}\log n\log\log n)$ rounds of algorithm daks at least $\frac{3}{5}|P-F|$ of the workers are aware of triple $z$ generated by a processor in $P-F$. Let us denote this subset of workers by $S_{d}$, where $d$ is the first such round.

We are interested in the number of rounds required for every processor in $U_{d}$ to learn about $z$ whp by receiving a message from a processor in $S_{d}$ in some round following $d$.

We show that, by an analysis similar to CCP, in $O(n^{1-\varepsilon}\log n\log\log n)$ rounds triple $z$ is known to all processors in $P-F$, whp. Every processor in $P-F$ has a unique id, hence we consider these processors as different types of coupons, and we assume that the processors in $S_{d}$ collectively represent the coupon collector. In this case, however, we do not require that every processor in $S_{d}$ contacts all processors in $U_{d}$ whp. Instead, we require only that the processors in $S_{d}$ collectively contact all processors in $U_{d}$ whp. According to our algorithm, in every round every processor in $P-F$ ($S_{d}\subset P-F$) selects a processor uniformly at random and sends all its data to it. Let us denote by $m$ the collective number of trials by processors in $S_{d}$ to contact processors in $U_{d}$. According to CCP, if $m=O(n\ln n)$ then whp processors in $S_{d}$ collectively contact every processor in $P-F$, including those in $U_{d}$. Since there are at least $\frac{3}{5}cn^{\varepsilon}$ processors in $S_{d}$, in every round the number of trials is at least $\frac{3}{5}cn^{\varepsilon}$; hence in $O(n^{1-\varepsilon}\ln n)$ rounds whp all processors in $U_{d}$ learn about $z$. Therefore, in $O(n^{1-\varepsilon}\log n\log\log n)$ rounds whp all processors in $U_{d}$, and thus in $P-F$, learn about $z$.

Let $\mathcal{V}$ be the set of triples such that for every task $j\in\mathcal{T}$ there are $\Theta(\log n)$ triples generated by processors in $P-F$, and hence $|\mathcal{V}|=\Theta(n\log n)$. Now by applying Boole’s inequality we want to show that whp in $O(n^{1-\varepsilon}\log n\log\log n)$ rounds all triples in $\mathcal{V}$ become known to all processors in $P-F$.

Let $\overline{\mathcal{E}}_{z}$ be the event that a triple $z\in\mathcal{V}$ is not known to all processors in $P-F$. In the preceding part of the proof we have shown that ${\bf Pr}[\overline{\mathcal{E}}_{z}]<\frac{1}{n^{\beta}}$, where $\beta>1$. By Boole’s inequality, the probability that there exists a triple in $\mathcal{V}$ that is not known to all processors in $P-F$ can be bounded as

$${\bf Pr}[\cup_{z\in\mathcal{V}}\overline{\mathcal{E}}_{z}]\leq\sum_{z\in\mathcal{V}}{\bf Pr}[\overline{\mathcal{E}}_{z}]=\Theta(n\log n)\cdot\frac{1}{n^{\beta}}\leq\frac{1}{n^{\gamma}}$$

where $\gamma>0$. This implies that every processor in $P-F$ collects all $\Theta(n\log n)$ triples generated by processors in $P-F$, whp. Hence, at least one of these processors becomes enlightened after $O(n^{1-\varepsilon}\log n\log\log n)$ rounds. $\Box$

Theorem 1

Algorithm daks makes known the correct results of all $n$ tasks at every live processor in epoch $\mathfrak{b}$ after $O(n^{1-\varepsilon}\log n\log\log n)$ rounds whp.

Proof. According to algorithm daks (line 28) every live processor computes the result of every task $\tau$ by taking a plurality among all the results. We want to prove that the majority of the results for any task $\tau$ are correct at any enlightened processor, whp.

To do that, for a task $\tau$ we estimate (with a concentration bound) the number of times the results are computed correctly, then we bound the total number of times task $\tau$ is computed (whether correctly or incorrectly), and we show that a majority of the results are computed correctly.
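The intuition behind this argument can be illustrated with a small simulation (our own illustration with arbitrary parameters, not part of the proof): when the average probability of an incorrect result stays below $\frac{1}{2}$, the plurality over a logarithmic number of independently computed results is correct with overwhelming probability.

import random
from collections import Counter

def plurality_correct(error_probs, num_results, correct=1, wrong=0):
    """Collect num_results results, each computed by a processor whose error
    probability is drawn from error_probs; return True if the plurality is correct."""
    votes = Counter()
    for _ in range(num_results):
        p = random.choice(error_probs)
        votes[wrong if random.random() < p else correct] += 1
    return votes.most_common(1)[0][0] == correct

probs = [0.1] * 50 + [0.5] * 50     # average error probability 0.3 < 1/2
ok = sum(plurality_correct(probs, 40) for _ in range(10000)) / 10000
print(f"fraction of tasks decided correctly: {ok:.4f}")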

Let us consider random variables $X_{ir}$ that denote the success or failure of correctly computing the result of some task $\tau$ in round $r$ by worker $i$. Specifically, $X_{ir}=1$ if in round $r$ worker $i$ computes the result of task $\tau$ correctly, otherwise $X_{ir}=0$. According to our algorithm we observe that for a live processor $i$ we have ${\bf Pr}[X_{ir}=1]=\frac{q_{i}}{n}$ and ${\bf Pr}[X_{ir}=0]=1-{\bf Pr}[X_{ir}=1]$, where $q_{i}\equiv 1-p_{i}$. We want to count the number of correct results calculated for task $\tau$ when a processor $i\in P-F$ becomes enlightened. As before, we let $F_{r}$ be the set of processors that crash prior to round $r$.

Let $X_{r}\equiv\sum_{i\in P-F_{r}}X_{ir}$ denote the number of correctly computed results for task $\tau$ among all live workers during round $r$. By linearity of expected values of a sum of random variables we have

$$\mathbb{E}[X_{r}]=\mathbb{E}\Big[\sum_{i\in P-F_{r}}X_{ir}\Big]=\sum_{i\in P-F_{r}}\mathbb{E}[X_{ir}]=\sum_{i\in P-F_{r}}\frac{q_{i}}{n}$$

We denote by $L^{\prime}$ the minimum number of rounds required for at least one processor from $P-F$ to become enlightened. It follows from line 26 of algorithm daks that a processor becomes enlightened only when there are at least $\mathfrak{K}\log n$ results for every task $\tau\in\mathcal{T}$ (the constant $\mathfrak{K}$ is chosen later in this section). We see that $L^{\prime}\geq\tilde{c}n^{1-\varepsilon}\mathfrak{K}\log n$, where $0<\tilde{c}\leq 1$. This is because there are $t=n$ tasks to be performed, and in epoch $\mathfrak{b}$ we have $|P-F|\geq cn^{\varepsilon}$ for a constant $c\geq 1$.

We further denote by $X\equiv\sum_{r=1}^{L^{\prime}}X_{r}$ the number of correctly computed results for task $\tau$ when the condition in line 26 of the algorithm is satisfied. Again, using the linearity of expected values of a sum of random variables we have

$$\mathbb{E}[X]=\mathbb{E}\Big[\sum_{r=1}^{L^{\prime}}X_{r}\Big]=\sum_{r=1}^{L^{\prime}}\mathbb{E}[X_{r}]=\sum_{r=1}^{L^{\prime}}\sum_{i\in P-F_{r}}\frac{q_{i}}{n}=\frac{L^{\prime}}{n}\sum_{i\in P-F_{r}}q_{i}$$

Note that, according to our adversarial model definition, for every round $r\leq L^{\prime}$ we have $\frac{1}{|P-F_{r}|}\sum_{i\in P-F_{r}}q_{i}>\frac{1}{2}+\zeta^{\prime}$, for some fixed $\zeta^{\prime}>0$. Note also that $\frac{1}{|P-F_{r}|}\sum_{i\in P-F_{r}}q_{i}\geq\frac{1}{n}\sum_{i\in P-F_{r}}q_{i}$, and hence there exists some $\delta>0$ such that $(1-\delta)\frac{L^{\prime}}{n}\sum_{i\in P-F_{r}}q_{i}>(1+\delta)\frac{L^{\prime}}{2}$. Also, observe that the random variables $X_{1},X_{2},\ldots,X_{L^{\prime}}$ are mutually independent, since we consider an oblivious adversary and the random variables correspond to different rounds of the execution of the algorithm. Therefore, by applying the Chernoff bound to $X_{1},X_{2},\ldots,X_{L^{\prime}}$ we have:

𝐏𝐫[X(1δ)𝔼[X]]=𝐏𝐫[X(1δ)LniPFrqi]eδ2L(1+δ)4(1δ)1nα1,{\bf Pr}\left[X\leq(1-\delta)\mathbb{E}[X]\right]={\bf Pr}\left[X\leq(1-\delta)\frac{L^{\prime}}{n}\sum_{i\in P-F_{r}}{q_{i}}\right]\leq e^{-\frac{\delta^{2}L^{\prime}(1+\delta)}{4(1-\delta)}}\leq\frac{1}{n^{{\alpha_{1}}}}\;,

where Lc~n1a𝔎lognL^{\prime}\geq\tilde{c}n^{1-a}\mathfrak{K}\log{n} as above and α1>1{\alpha_{1}}>1 for a sufficiently large nn.

Let us now count the total number of times task τ\tau is chosen to be performed during the execution of the algorithm until every live processor halts. We represent the choice of task τ\tau by worker ii during round rr by a random variable YirY_{ir}. We assume Yir=1Y_{ir}=1 if τ\tau is chosen by worker ii in round rr, otherwise Yir=0Y_{ir}=0.

At this juncture, we address a technical point regarding the total number of results for τ\tau used for computing plurality. Note that even after round LL^{\prime} any processor that is still a worker continues to perform tasks, thereby adding more results for task τ\tau. According to Lemma 3 every processor is enlightened in O(logn)O(\log{n}) rounds after LL^{\prime}. Furthermore, in epoch 𝔟\!\mathfrak{b} following round LL^{\prime}, the number of processors that are still workers is n′′<|PF|n^{\prime\prime}<|P-F|. Hence, the expected number of results computed for every task τ\tau by workers is klognnak\frac{\log{n}}{n^{a}}, for some k>0k>0, that is, O(1na)O(\frac{1}{n^{a^{\prime}}}), for some a>0a^{\prime}>0. Therefore, the number of results computed for task τ\tau, starting from round LL^{\prime} and until the termination is negligible. Let us denote by YY the total number of results computed for a task τ\tau at termination. We express the random variable YY as Yr=1LiPFrYirY\equiv\sum_{r=1}^{L}\sum_{i\in P-F_{r}}Y_{ir}, where LL is the last round prior to termination. As argued above, the total number of results computed for task τ\tau between rounds LL^{\prime} and LL is O(1na)O(\frac{1}{n^{a^{\prime}}}), for some a>0a^{\prime}>0, and hence LL=1+o(1)\frac{L}{L^{\prime}}=1+o(1). Note that the outer sum terms of YY consisting of the inner sums are mutually independent because each sum pertains to a different round; this allows us to use Chernoff bounds. From above it is clear that 𝔼[Y]=c~n1a𝔎logn+1no(1)\mathbb{E}[Y]=\tilde{c}n^{1-a}\mathfrak{K}\log{n}+\frac{1}{n^{o(1)}}. Therefore, by applying Chernoff bound for the same δ>0\delta>0 as chosen above we have:

𝐏𝐫[Y(1+δ)𝔼[Y]]=𝐏𝐫[Y(1+δ)L]eδ2c~n1a𝔎logn31nα2,{\bf Pr}[Y\geq(1+\delta)\mathbb{E}[Y]]={\bf Pr}[Y\geq(1+\delta)L^{\prime}]\leq e^{-\frac{\delta^{2}\tilde{c}n^{1-a}\mathfrak{K}\log{n}}{3}}\leq\frac{1}{n^{{\alpha}_{2}}}\;,

where α2>1{\alpha}_{2}>1 for a sufficiently large nn.

Then, applying Boole’s inequality to the above two events, we have

𝐏𝐫[{X(1δ)LniPFrqi}{Y(1+δ)L}]2nα{\bf Pr}[\{X\leq(1-\delta)\frac{L^{\prime}}{n}\sum_{i\in P-F_{r}}{q_{i}}\}\cup\{Y\geq(1+\delta)L^{\prime}\}]\leq\frac{2}{n^{\alpha}}

where α=min{α1,α2}>1\alpha=\min\{{\alpha}_{1},{\alpha}_{2}\}>1.

Therefore, from above, and from the fact that (1δ)LniPFrqi>(1+δ)L2(1-\delta)\frac{L^{\prime}}{n}\sum_{i\in P-F_{r}}q_{i}>(1+\delta)\frac{L^{\prime}}{2} (so that whenever neither of the two events occurs we have X>(1+δ)L2>Y2X>(1+\delta)\frac{L^{\prime}}{2}>\frac{Y}{2}), we have 𝐏𝐫[Y/2<X]11nβ{\bf Pr}[Y/2<X]\geq 1-\frac{1}{n^{\beta}} for some β>1\beta>1. Hence, at termination, whp, the majority of calculated results for task τ\tau are correct. Let us denote this event by τ\mathcal{E}_{\tau}. It follows that 𝐏𝐫[¯τ]1nβ{\bf Pr}[\overline{\mathcal{E}}_{\tau}]\leq\frac{1}{n^{\beta}}. Now, by Boole’s inequality we obtain

𝐏𝐫[τ𝒯¯τ]τ𝒯𝐏𝐫[¯τ]1nβ11nγ{\bf Pr}[\bigcup_{\tau\in{\mathcal{T}}}\overline{\mathcal{E}}_{\tau}]\leq\sum_{\tau\in\mathcal{T}}{\bf Pr}[\overline{\mathcal{E}}_{\tau}]\leq\frac{1}{n^{\beta-1}}\leq\frac{1}{n^{\gamma}}

where 𝒯{\mathcal{T}} is the set of all nn tasks, and γ>0\gamma>0.

By Lemmas 3, 7, and 8, whp, in O(n1alognloglogn)O(n^{1-a}\log{n}\log{\log{n}}) rounds of the algorithm, at least Θ(nlogn)\Theta(n\log{n}) triples generated by processors in PFP-F are disseminated across all workers. Thus, the majority of the results computed for any task at any worker is the same among all workers, and moreover these results are correct whp. \Box
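The plurality argument above can be illustrated with a small Monte Carlo sketch; this is not part of algorithm daks, and the population size, error probabilities, and stopping threshold below are illustrative assumptions. A pool of live workers whose error probabilities average below 12\frac{1}{2} repeatedly pick a task uniformly at random and report a correct or a bogus result; once the tracked task has accumulated on the order of 𝔎logn\mathfrak{K}\log{n} results, the majority of the collected results is almost always correct.

import math
import random

def simulate_plurality(n=2000, live=100, avg_error=0.35, K=12, trials=20):
    """Illustrative sketch: 'live' workers (playing the role of P-F), each
    with an error probability centered on avg_error (< 1/2), pick one of n
    tasks uniformly at random in every synchronous round.  We track a single
    task tau, stop once it has accumulated K*log(n) results, and check
    whether the majority of those results is correct."""
    threshold = int(K * math.log(n))
    wrong_majorities = 0
    for _ in range(trials):
        # heterogeneous error probabilities whose average stays below 1/2
        errors = [min(0.9, max(0.0, random.gauss(avg_error, 0.1)))
                  for _ in range(live)]
        correct = bogus = 0
        while correct + bogus < threshold:
            for p in errors:                    # one synchronous round
                if random.random() < 1.0 / n:   # this worker picked tau
                    if random.random() < p:
                        bogus += 1
                    else:
                        correct += 1
        if correct <= bogus:
            wrong_majorities += 1
    return wrong_majorities, trials

if __name__ == "__main__":
    print(simulate_plurality())  # wrong majorities are rare (typically 0 of 20)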

According to Lemma 8, after O(n1εlognloglogn)O(n^{1-\varepsilon}\log{n}\log{\log{n}}) rounds of epoch 𝔟\!\mathfrak{b} at least one processor in PFP-F becomes enlightened. Furthermore, once a processor in PFP-F becomes enlightened, according to Lemma 3 after O(logn)O(\log{n}) rounds of the algorithm every live processor becomes enlightened and then terminates, whp. Next we assess work and message complexities.

Theorem 2

For t=nt=n algorithm daks has work and message complexity O(nlognloglogn)O(n\log n\log{\log{n}}).

Proof. To obtain the result we combine the costs associated with epoch 𝔞\!\mathfrak{a} with the costs of epoch 𝔟\!\mathfrak{b}. The work and message complexity bounds for epoch 𝔞\!\mathfrak{a} are given by Lemma 4 as Θ(nlogn)\Theta(n\log n).

For epoch 𝔟\!\mathfrak{b} (if it is not empty), where |PF|=O(nε)|P-F|=O(n^{\varepsilon}), the algorithm terminates after O(n1εlognloglogn)O(n^{1-\varepsilon}\log{n}\log{\log{n}}) rounds whp and there are Θ(nε)\Theta(n^{\varepsilon}) live processors, thus its work is O(nlognloglogn)O(n\log{n}\log{\log{n}}). In every round if a processor is a worker it sends a share message to one randomly chosen processor. If a processor is enlightened then it sends profess messages to a randomly selected subset of processors. In every round Θ(nε)\Theta(n^{\varepsilon}) share messages are sent. Since the algorithm terminates, whp, in O(n1εlognloglogn)O(n^{1-\varepsilon}\log{n}\log{\log{n}}) rounds, Θ(nlognloglogn)\Theta(n\log{n}\log{\log{n}}) share messages are sent. On the other hand, according to Lemma 2, if during the execution of the algorithm Θ(nlogn)\Theta(n\log{n}) profess messages are sent then every processor terminates whp. Hence, the message complexity is O(nlognloglogn)O(n\log{n}\log{\log{n}}).

The worst case costs of the algorithm correspond to the executions with non-empty epoch 𝔟\!\mathfrak{b}, where the algorithm does not terminate early. In this case the costs from epoch 𝔞\!\mathfrak{a} are asymptotically absorbed into the worst case costs of epoch 𝔟\!\mathfrak{b} computed above. \Box

Finally, we consider the efficiency of algorithm dakst,n for tt tasks, where tnt\geq n. Note that the only change in the algorithm is that, instead of one task, processors perform chunks of t/nt/n tasks. The communication pattern in the algorithm remains exactly the same. The following result is directly obtained from the analysis of algorithm daks for t=nt=n by multiplying the time and work complexities by the size Θ(t/n)\Theta(t/n) of the chunk of tasks; the message complexity is unchanged.
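As a small illustration of this change, the tt tasks can be split into nn chunks of size Θ(t/n)\Theta(t/n), and a processor then selects a chunk wherever algorithm daks would select a single task. The sketch below is for illustration only; the exact chunk layout used by dakst,n is not spelled out here, so the contiguous partition is an assumption.

def make_chunks(t, n):
    """Illustrative sketch: partition task ids 0..t-1 into n chunks of size
    ceil(t/n), so that picking a random chunk in daks_{t,n} plays the role
    of picking a random task in daks."""
    size = -(-t // n)  # ceil(t / n)
    return [list(range(i, min(i + size, t))) for i in range(0, t, size)]

# Example: 10^6 tasks over 10^3 processors -> 1000 chunks of 1000 tasks each.
chunks = make_chunks(t=1_000_000, n=1_000)
assert len(chunks) == 1_000 and all(len(c) == 1_000 for c in chunks)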

Theorem 3

Algorithm dakst,n, with tnt\geq n, computes the results of tt tasks correctly in model fp{\cal F}\!\!_{fp} whp, with time complexity O(tnεlognloglogn)O(\frac{t}{n^{\varepsilon}}\log{n}\log{\log{n}}), work complexity O(tlognloglogn)O(t\log{n}\log{\log{n}}), and message complexity O(nlognloglogn)O(n\log{n}\log{\log{n}}).

Proof. For epoch 𝔞\!\mathfrak{a} algorithm daks has time Θ(logn)\Theta(\log n), work Θ(tlogn)\Theta(t\log n), and message complexity is Θ(nlogn)\Theta(n\log n). The same holds for algorithm dakst,n. For epoch 𝔟\!\mathfrak{b} algorithm daks takes O(n1εlognloglogn)O(n^{1-\varepsilon}\log n\log{\log{n}}) iterations for at least one processor from set PFP-F to become enlightened whp. The same holds for dakst,n, except that each iteration is extended by Θ(t/n)\Theta(t/n) rounds due to the size of chunks (recall that no communication takes place during these rounds). This yields round complexity O(tnεlognloglogn)O(\frac{t}{n^{\varepsilon}}\log{n}\log{\log{n}}). Work complexity is then O(tlognloglogn)O(t\log{n}\log{\log{n}}). Message complexity remains the same as for algorithm daks at O(nlognloglogn)O(n\log{n}\log{\log{n}}) as the number of messages does not change. The final assessment is obtained by combining the costs of epoch 𝔞\!\mathfrak{a} and epoch 𝔟\!\mathfrak{b}. \Box

4.3 Failure Model pl{\cal F}\!\!_{pl}

We start with the analysis of algorithm daks, then extend the main result to algorithm dakst,n, for the adversarial model pl{\cal F}\!\!_{pl}, where |PF|=Ω(𝑝𝑜𝑙𝑦(logn))|P-F|=\Omega({\it poly}(\log n)); here we use the term 𝑝𝑜𝑙𝑦(logn){\it poly}(\log n) to denote a member of the class of functions k1O(logkn)\bigcup_{k\geq 1}O(\log^{k}{n}). As a motivation, first note that when a large number of crashes makes |PF|=Θ(𝑝𝑜𝑙𝑦(logn))|P-F|=\Theta({\it poly}(\log n)), one may attempt a trivial solution where all live processors perform all tt tasks. While this approach has efficient work, it does not guarantee that workers compute correct results; in fact, since the overall probability of live workers producing bogus results can be close to 12\frac{1}{2}, this may yield on average just slightly more than t/2t/2 correct results.
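To see the quantitative claim, consider as an illustrative instance (an assumption consistent with the model, not a case analyzed elsewhere in this paper) a live worker ii whose probability of returning a bogus result is pi=12ζp_{i}=\frac{1}{2}-\zeta, for a small ζ>0\zeta>0, and that performs all tt tasks on its own, with no plurality taken across workers. The expected number of correct results it ends up holding is only

(1pi)t=(12+ζ)t,(1-p_{i})\,t=\left(\frac{1}{2}+\zeta\right)t\;,

i.e., barely more than t/2t/2.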

For executions in pl{\cal F}\!\!_{pl}, let |PF||P-F| be at least alogcna\log^{c}n, for specific constants aa and cc satisfying the model constraints. As before, let FrF_{r} be the set of processors that crash prior to round rr. For the purpose of analysis we divide an execution of the algorithm into two epochs: epoch 𝔟\!\mathfrak{b^{\prime}} and epoch 𝔠\!\mathfrak{c}. In epoch 𝔟\!\mathfrak{b^{\prime}} we include all rounds rr where FrF_{r} remains constrained as in model fp{\cal F}\!\!_{fp}, i.e., |PFr|bnε|P-F_{r}|\geq bn^{\varepsilon}, for some constants b>0b>0 and ε(0,1)\varepsilon\in(0,1); for reference, this epoch combines epoch 𝔞\!\mathfrak{a} and epoch 𝔟\!\mathfrak{b} from the previous section. In epoch 𝔠\!\mathfrak{c} we include all rounds rr starting with the first round r′′r^{\prime\prime} (it can be round 1) when the number of live processors drops below bnεbn^{\varepsilon}, but remains Ω(logcn)\Omega(\log^{c}{n}) per model pl{\cal F}\!\!_{pl}. Also note that either epoch may be empty.

In epoch 𝔟\!\mathfrak{b^{\prime}} the algorithm incurs costs exactly as in model fp{\cal F}\!\!_{fp}. Next we consider epoch 𝔠\!\mathfrak{c}. If algorithm daks terminates in round r′′r^{\prime\prime}, the first round of the epoch, the costs remain the same as the costs analyzed for fp{\cal F}\!\!_{fp} in the previous section.

If it does not terminate, it incurs additional costs associated with the processors in PFr′′P-F_{r^{\prime\prime}}, where alogcn|PFr′′|bnεa\log^{c}n\leq|P-F_{r^{\prime\prime}}|\leq bn^{\varepsilon}. We analyze the costs for epoch 𝔠\!\mathfrak{c} next. The final message and work complexities are then at most the worst case complexity for epoch 𝔟\!\mathfrak{b^{\prime}} plus the additional costs for epoch 𝔠\!\mathfrak{c}.

In the next lemmas we use the fact that |PFr′′|=Ω(logcn)|P-F_{r^{\prime\prime}}|=\Omega(\log^{c}{n}). The first lemma shows that within some O(n)O(n) rounds in epoch 𝔠\!\mathfrak{c} every task is chosen for execution Θ(logn)\Theta(\log{n}) times by processors in PFP-F whp.

Lemma 9

In O(n)O(n) rounds of epoch 𝔠\!\mathfrak{c} every task is performed Θ(logn)\Theta(\log{n}) times whp by processors in PFP-F.

Proof. If the algorithm terminates within O(n)O(n) rounds of epoch 𝔠\!\mathfrak{c}, then each task is performed Θ(logn)\Theta(\log n) times as reasoned earlier. Suppose the algorithm does not terminate (its performance is worse in this case). Let us assume that after r~\tilde{r} rounds of algorithm daks, where r~=κ~n\tilde{r}=\tilde{\kappa}n (κ~\tilde{\kappa} is a sufficiently large constant), there exists a task τ\tau that is performed less than (1δ)κ~logn(1-\delta)\tilde{\kappa}\log{n} times by the processors in PFP-F, for some δ>0\delta>0. We prove that whp such a task does not exist.

We define k3k_{3} to be such that k3=(1δ)κ~k_{3}=(1-\delta)\tilde{\kappa} (the constant k3k_{3} will play a role in establishing the value of the compile-time constant 𝔎\mathfrak{K} of algorithm daks; we come back to this at the end of Section 4). According to the above assumption, at the end of round r~\tilde{r} for some task τ\tau, we have |j=1nRj[τ]|<(1δ)κ~logn=k3logn|\cup_{j=1}^{n}R_{j}[\tau]|<(1-\delta)\tilde{\kappa}\log n=k_{3}\log{n}.

Let us consider all algorithm iterations individually performed by each processor in PFP-F during the r~\tilde{r} rounds. Let ξ\xi be the total number of such individual iterations. Then ξr~|PF|r~alogcn\xi\geq\tilde{r}|P-F|\geq\tilde{r}a\log^{c}{n}. During any such iteration, a processor from PFP-F selects and performs task τ\tau in line 24 independently with probability 1n\frac{1}{n}. Let us arbitrarily enumerate said iterations from 11 to ξ\xi. Let X1,,Xx,,XξX_{1},\ldots,X_{x},\dots,X_{\xi} be Bernoulli random variables, such that XxX_{x} is 11 if task τ\tau is performed in iteration xx, and 0 otherwise. We define Xx=1ξXxX\equiv\sum_{x=1}^{\xi}X_{x}, the random variable that describes the total number of times task τ\tau is performed during the r~\tilde{r} rounds by processors in PFP-F. We define μ\mu to be 𝔼[X]{\mathbb{E}}[X]. Since 𝐏𝐫[Xx=1]=1n{\bf Pr}[X_{x}=1]=\frac{1}{n}, for x{1,,ξ}x\in\{1,\ldots,\xi\}, where ξr~alogcn\xi\geq\tilde{r}a\log^{c}{n}, by linearity of expectation, we obtain μ=𝔼[X]=x=1ξ1nκ~alogcn>k3logn\mu={\mathbb{E}}[X]=\sum_{x=1}^{\xi}\frac{1}{n}\geq\tilde{\kappa}a\log^{c}{n}>k_{3}\log{n}. Now by applying Chernoff bound for the same δ>0\delta>0 as chosen above, we have:

𝐏𝐫[X(1δ)μ]eμδ22e(κ~alogcn)δ221nklogc1nδ221nα{\bf Pr}[X\leq(1-\delta)\mu]\leq e^{-\frac{\mu\delta^{2}}{2}}\leq e^{-\frac{(\tilde{\kappa}a\log^{c}{n})\delta^{2}}{2}}\leq\frac{1}{n^{\frac{k\log^{c-1}n\delta^{2}}{2}}}\leq\frac{1}{n^{\alpha}}

where α>1\alpha>1 for some sufficiently large kk. Now let τ\mathcal{E}_{\tau} denote the event that |i=1nRi(τ)|>k3logn|\cup_{i=1}^{n}R_{i}(\tau)|>k_{3}\log{n} by round r~\tilde{r} of the algorithm, and let τ¯\bar{\mathcal{E}_{\tau}} be its complement. Next, by Boole’s inequality we have 𝐏𝐫[τ¯τ]τ𝐏𝐫[τ¯]1nβ{\bf Pr}[\cup_{\tau}\bar{\mathcal{E}}_{\tau}]\leq\sum_{\tau}{\bf Pr}[\bar{\mathcal{E}_{\tau}}]\leq\frac{1}{n^{\beta}}, where β=α1>0\beta=\alpha-1>0. Hence, whp, each task is performed at least Θ(logn)\Theta(\log{n}) times, i.e., 𝐏𝐫[ττ]=𝐏𝐫[ττ¯¯]11nβ{\bf Pr}[\cap_{\tau}\mathcal{E}_{\tau}]={\bf Pr}[\overline{\cup_{\tau}\bar{\mathcal{E}_{\tau}}}]\geq 1-\frac{1}{n^{\beta}}. \Box
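The balls-into-bins style argument in this proof can be checked with a small simulation sketch; the values of nn, Λ(n)=alogcn\Lambda(n)=a\log^{c}{n}, and the horizon κ~n\tilde{\kappa}n used below are illustrative assumptions, not values prescribed by the algorithm. With Λ(n)\Lambda(n) live processors each choosing one of the nn tasks uniformly at random in every round, after κ~n\tilde{\kappa}n rounds even the least frequently chosen task has been performed far more than κ~logn\tilde{\kappa}\log{n} times.

import math
import random
from collections import Counter

def min_task_count(n=2000, a=1.0, c=2, kappa=4):
    """Illustrative sketch of Lemma 9's setting: Lambda = a*log(n)**c live
    processors each pick one of n tasks uniformly at random in each of
    kappa*n rounds.  Returns the count of the least frequently chosen task,
    to be compared against kappa*log(n)."""
    Lambda = max(1, int(a * math.log(n) ** c))
    counts = Counter()
    for _ in range(kappa * n):
        for _ in range(Lambda):
            counts[random.randrange(n)] += 1
    least = min(counts.get(task, 0) for task in range(n))
    return least, kappa * math.log(n)

if __name__ == "__main__":
    least, target = min_task_count()
    print(least, ">>", round(target, 1))  # the minimum count dwarfs kappa*log(n)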

Next we show that once each task is done a logarithmic number of times by processors in PFP-F, then at least one worker in PFP-F acquires a sufficient collection of triples in at most a linear number of rounds to become enlightened. We note that if the algorithm terminates within O(n)O(n) rounds of epoch 𝔠\!\mathfrak{c}, then every processor in PFP-F is enlightened as reasoned earlier. Suppose the algorithm does not terminate (leading to its worst case performance).

Lemma 10

Once every task is performed Θ(logn)\Theta(\log{n}) times by processors in PFP-F then at least one worker in PFP-F becomes enlightened whp after O(n)O(n) rounds in epoch 𝔠\!\mathfrak{c}.

Proof. Assume that after rr rounds of algorithm daks, every task j𝒯j\in\cal{T} is done Θ(logn)\Theta(\log{n}) times by processors in PFP-F, and let 𝒱\mathcal{V} be the set of corresponding triples in the system. Consider a triple z𝒱z\in\cal{V} that was generated in some round r~\tilde{r}. We want to prove that whp it takes O(n)O(n) rounds for the rest of the processors in PFP-F to learn about zz.

Let Λ(n)\Lambda(n) be the number of processors in PFP-F; then |PF|=Λ(n)alogcn|P-F|=\Lambda(n)\geq a\log^{c}{n}, by the constraint of model pl{\cal F}\!\!_{pl}. While there may be more than Λ(n)\Lambda(n) processors that start epoch 𝔠\!\mathfrak{c}, we focus only on processors in PFP-F. This is sufficient for our purpose of establishing an upper bound on the number of rounds before at least one worker becomes enlightened: in line 12 of algorithm daks every live processor chooses a destination for a share message uniformly at random, and hence having more processors only causes a processor in PFP-F to become enlightened sooner.

Let Z(r)PFZ(r)\subseteq{P-F} be the set of processors that are aware of triple zz at round rr. Beginning with round r~\tilde{r} when the triple is generated, we have |Z(r~)|1|Z(\tilde{r})|\geq 1 (at least one processor is aware of the triple). For any rounds rr^{\prime} and r′′r^{\prime\prime}, where r~rr′′\tilde{r}\leq r^{\prime}\leq r^{\prime\prime}, we have Z(r~)Z(r)Z(r′′)PFZ(\tilde{r})\subseteq Z(r^{\prime})\subseteq Z(r^{\prime\prime})\subseteq P-F because the considered processors that become aware of zz do not crash; thus |Z(r)||Z(r)| is monotonically non-decreasing with respect to rr.

We want to estimate an upper bound on the total number of rounds rr required for |Z(r)||Z(r)| to become Λ(n)\Lambda(n). We will do this by constructing a sequence of mutually independent random variables, each corresponding to a contiguous segment of rounds r1,,rkr_{1},...,r_{k}, for k1k\geq 1, in an execution of the algorithm. Let r0r_{0} be the round that precedes round r1r_{1}. Such a contiguous segment of rounds has the following properties: (a) |Z(rx)|=|Z(r0)||Z(r_{x})|=|Z(r_{0})| for 1x<k1\leq x<k, i.e., during the rounds rxr_{x} the set Z(rx)Z(r_{x}) does not grow (the set of such rounds may be empty), and (b) |Z(rk)|>|Z(r0)||Z(r_{k})|>|Z(r_{0})|, i.e., the size of the set grows.

For the purposes of analysis, we assume that |Z(rk)|=|Z(r0)|+1|Z(r_{k})|=|Z(r_{0})|+1, i.e., the set grows by exactly one processor. Of course it is possible that this set grows by more than one in a single round. Thus we consider an ‘amnesiac’ version of the algorithm where, if more than one processor learns about the triple in a round, then all but one of them ‘forget’ that triple. Information propagates more slowly in the amnesiac algorithm, but this suffices to establish the needed upper bound on the number of rounds required to propagate the triple in question.

Consider some round rr with |Z(r)|=λ|Z(r)|=\lambda. We define the random variable TλT_{\lambda} that represents the number of rounds required for |Z(r+Tλ)|=λ+1|Z(r+T_{\lambda})|=\lambda+1, i.e., TλT_{\lambda} corresponds to the number kk of rounds in the contiguous segment of rounds we defined above. The random variables TλT_{\lambda} are independent geometric random variables. Hence, we acquire a sequence of random variables T1,,TΛ(n)1T_{1},...,T_{\Lambda(n)-1}, since |PF|=Λ(n)|P-F|=\Lambda(n) and according to our amnesiac algorithm |Z(r+1)||Z(r)|+1|Z(r+1)|\leq|Z(r)|+1 for any round rr~r\geq\tilde{r}.

Let us define the random variable TT as Tλ=1Λ(n)1TλT\equiv\sum_{\lambda=1}^{\Lambda(n)-1}T_{\lambda}; TT is the total number of rounds required for all processors in PFP-F to learn about triple zz. By Markov’s inequality we have:

𝐏𝐫(κT>κη)=𝐏𝐫(eκT>eκη)𝔼[eκT]eκη,{\bf Pr}(\kappa T>\kappa\eta)={\bf Pr}(e^{\kappa T}>e^{\kappa\eta})\leq\frac{\mathbb{E}[e^{\kappa T}]}{e^{\kappa\eta}}\;,

for some κ>0\kappa>0 and η>0\eta>0 to be specified later in the proof.

We say that “a transmission in round r>r~r>\tilde{r} is successful” if some processor jZ(r)j\in Z(r) sends a message to some processor l(PF)Z(r)l\in(P-F)-Z(r); otherwise we say that “the transmission is unsuccessful.” Let pjp_{j} be the probability that the transmission is successful in a round, and qj=1pjq_{j}=1-p_{j} be the probability that it is unsuccessful. Note that if a transmission is unsuccessful then in that round none of the processors in Z(r)Z(r), where |Z(r)|=λ|Z(r)|=\lambda, were able to contact a processor in (PF)Z(r)(P-F)-Z(r) (here |(PF)Z(r)|=Λ(n)λ|(P-F)-Z(r)|=\Lambda(n)-\lambda), and hence we have:

qj=(1Λ(n)λn)λq_{j}=(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda}

By geometric distribution, we have the following:

𝔼[eκTλ]=pjeκ+pje2κqj+pje3κqj2+=pjeκ(1+qjeκ+qj2e2κ+)\displaystyle\mathbb{E}[e^{\kappa T_{\lambda}}]=p_{j}e^{\kappa}+p_{j}e^{2\kappa}q_{j}+p_{j}e^{3\kappa}{q_{j}}^{2}+...=p_{j}e^{\kappa}(1+q_{j}e^{\kappa}+{q_{j}}^{2}e^{2\kappa}+...)

In order to sum the infinite geometric series, we need to have qjeκ<1q_{j}e^{\kappa}<1. Assume that qjeκ<1q_{j}e^{\kappa}<1 (we will later choose κ\kappa so that this inequality is satisfied); then, summing the geometric series, we have:

𝔼[eκTλ]=pjeκ1qjeκ\displaystyle\mathbb{E}[e^{\kappa T_{\lambda}}]=\frac{p_{j}e^{\kappa}}{1-q_{j}e^{\kappa}}

In the remainder of the proof we focus on deriving a tight bound on 𝔼[eκTλ]\mathbb{E}[e^{\kappa T_{\lambda}}], and subsequently apply Boole’s inequality across all triples in 𝒱{\mathcal{V}}:

𝐏𝐫[κTκη]\displaystyle{\bf Pr}[\kappa T\geq\kappa\eta] \displaystyle\leq 𝔼[eκT]eκη=𝔼[eκλ=1Λ(n)1Tλ]eκη=λ=1Λ(n)1𝔼[eκTλ]eκη\displaystyle\frac{\mathbb{E}[e^{\kappa T}]}{e^{\kappa\eta}}=\frac{\mathbb{E}[e^{\kappa\sum_{\lambda=1}^{\Lambda(n)-1}T_{\lambda}}]}{e^{\kappa\eta}}=\frac{\prod_{\lambda=1}^{\Lambda(n)-1}\mathbb{E}[e^{\kappa T_{\lambda}}]}{e^{\kappa\eta}}
=\displaystyle= 1eκηλ=1Λ(n)1(1(1Λ(n)λn)λ)eκ1(1Λ(n)λn)λeκ\displaystyle\frac{1}{e^{\kappa\eta}}\prod_{\lambda=1}^{\Lambda(n)-1}\frac{(1-(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda})e^{\kappa}}{1-(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda}e^{\kappa}}
\displaystyle\leq 1eκηλ=1Λ(n)1(11+λn(Λ(n)λ))eκ1(1Λ(n)λn)λeκ\displaystyle\frac{1}{e^{\kappa\eta}}\prod_{\lambda=1}^{\Lambda(n)-1}\frac{(1-1+\frac{\lambda}{n}(\Lambda(n)-\lambda))e^{\kappa}}{1-(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda}e^{\kappa}}
=\displaystyle= 1eκηλ=1Λ(n)1λ(Λ(n)λ)eκn(1(1Λ(n)λn)λeκ)\displaystyle\frac{1}{e^{\kappa\eta}}\prod_{\lambda=1}^{\Lambda(n)-1}\frac{\lambda(\Lambda(n)-\lambda)e^{\kappa}}{n(1-(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda}e^{\kappa})}

Remember that we assumed that eκqj=eκ(1Λ(n)λn)λ<1e^{\kappa}q_{j}=e^{\kappa}(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda}<1, for jZ(r)j\in Z(r), λ=1,2,,Λ(n)1\lambda=1,2,\cdots,\Lambda(n)-1 and κ>0\kappa>0. Let κ\kappa be such that eκ=1+Λ(n)2ne^{\kappa}=1+\frac{\Lambda(n)}{2n}, then we have the following

eκ(1Λ(n)λn)λ\displaystyle e^{\kappa}(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda} =\displaystyle= (1+Λ(n)2n)(1Λ(n)λn)λ\displaystyle(1+\frac{\Lambda(n)}{2n})(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda}
=\displaystyle= 1+Λ(n)2n(Λ(n)λ)λnO(1n2)\displaystyle 1+\frac{\Lambda(n)}{2n}-\frac{(\Lambda(n)-\lambda)\lambda}{n}-O(\frac{1}{n^{2}})
=\displaystyle= 1λ(Λ(n)λ)12Λ(n)nO(1n2)\displaystyle 1-\frac{\lambda(\Lambda(n)-\lambda)-\frac{1}{2}\Lambda(n)}{n}-O(\frac{1}{n^{2}})

In order to show that eκ(1Λ(n)λn)λ<1e^{\kappa}(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda}<1 it remains to show that λ(Λ(n)λ)12Λ(n)\lambda(\Lambda(n)-\lambda)-\frac{1}{2}\Lambda(n) is positive. Note that λ(Λ(n)λ)\lambda(\Lambda(n)-\lambda) is increasing for λΛ(n)2\lambda\leq\frac{\Lambda(n)}{2} and that we consider λ=1,2,,Λ(n)1\lambda=1,2,\cdots,\Lambda(n)-1. Hence, the minimal value of λ(Λ(n)λ)\lambda(\Lambda(n)-\lambda) is attained at either λ=1\lambda=1 or λ=Λ(n)1\lambda=\Lambda(n)-1, where it equals Λ(n)1\Lambda(n)-1; in both cases λ(Λ(n)λ)12Λ(n)=12Λ(n)1>0\lambda(\Lambda(n)-\lambda)-\frac{1}{2}\Lambda(n)=\frac{1}{2}\Lambda(n)-1>0 for sufficiently large nn.

Let us now evaluate the following expression:

n(1eκ(1Λ(n)λn)λ)\displaystyle n(1-e^{\kappa}(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda})

=\displaystyle= n(1(1λ(Λ(n)λ)12Λ(n)nO(1n2)))\displaystyle n(1-(1-\frac{\lambda(\Lambda(n)-\lambda)-\frac{1}{2}\Lambda(n)}{n}-O(\frac{1}{n^{2}})))
=\displaystyle= n(λ(Λ(n)λ)12Λ(n)n+O(1n2))\displaystyle n(\frac{\lambda(\Lambda(n)-\lambda)-\frac{1}{2}\Lambda(n)}{n}+O(\frac{1}{n^{2}}))
=\displaystyle= λ(Λ(n)λ)12Λ(n)+O(1n)\displaystyle\lambda(\Lambda(n)-\lambda)-\frac{1}{2}\Lambda(n)+O(\frac{1}{n})

Then, we have

λ=1Λ(n)1λ(Λ(n)λ)n(1eκ(1Λ(n)λn)λ)\displaystyle\prod_{\lambda=1}^{\Lambda(n)-1}\frac{\lambda(\Lambda(n)-\lambda)}{n(1-e^{\kappa}(1-\frac{\Lambda(n)-\lambda}{n})^{\lambda})}

\displaystyle\leq λ=1Λ(n)1λ(Λ(n)λ)λ(Λ(n)λ)12Λ(n)+O(1n)\displaystyle\prod_{\lambda=1}^{\Lambda(n)-1}\frac{\lambda(\Lambda(n)-\lambda)}{\lambda(\Lambda(n)-\lambda)-\frac{1}{2}\Lambda(n)+O(\frac{1}{n})}
\displaystyle\leq λ=1Λ(n)1(1Λ(n)2λ(Λ(n)λ))1\displaystyle\prod_{\lambda=1}^{\Lambda(n)-1}(1-\frac{\Lambda(n)}{2\lambda(\Lambda(n)-\lambda)})^{-1}
\displaystyle\leq λ=1Λ(n)1(12)12Λ(n)\displaystyle\prod_{\lambda=1}^{\Lambda(n)-1}\left(\frac{1}{2}\right)^{-1}\leq 2^{\Lambda(n)}

The last inequality holds because λ(Λ(n)λ)\lambda(\Lambda(n)-\lambda) achieves its minimal value when λ=1\lambda=1. Now, since Λ(n)alogcn\Lambda(n)\geq a\log^{c}{n} we have:

𝐏𝐫(κT>κη)eκ(alogcn1)eκη2alogcn\displaystyle{\bf Pr}(\kappa T>\kappa\eta)\leq\frac{e^{\kappa(a\log^{c}{n}-1)}}{e^{\kappa\eta}}2^{a\log^{c}{n}}

Since eκ=1+alogcn2ne^{\kappa}=1+\frac{a\log^{c}{n}}{2n}, by taking the natural logarithm of both sides and using the Taylor series for ln(1+x)\ln(1+x), where |x|<1|x|<1, we have κalogcn2n\kappa\leq\frac{a\log^{c}{n}}{2n}. Hence eκ(alogcn1)=O(1)e^{\kappa(a\log^{c}{n}-1)}=O(1), since κalogcn=O(log2cn/n)=o(1)\kappa\,a\log^{c}{n}=O(\log^{2c}{n}/n)=o(1), and we get

𝐏𝐫(κT>κη)2alogcneκη\displaystyle{\bf Pr}(\kappa T>\kappa\eta)\leq\frac{2^{a\log^{c}{n}}}{e^{\kappa\eta}}

By taking η=kn\eta=kn, where k>2k>2 is a sufficiently large constant, we get

𝐏𝐫(κT>κη)2alogcnealogcnkn2n1nα\displaystyle{\bf Pr}(\kappa T>\kappa\eta)\leq\frac{2^{a\log^{c}{n}}}{e^{\frac{a\log^{c}{n}kn}{2n}}}\leq\frac{1}{n^{\alpha}}

where α>1\alpha>1 for some sufficiently large constant k>2k>2.

Thus we have shown that if a new triple is generated by a worker in PFP-F then whp it becomes known to all processors in PFP-F in O(n)O(n) rounds. Now, by applying Boole’s inequality, we show that whp in O(n)O(n) rounds all triples in 𝒱\cal{V} become known to all processors in PFP-F.

Let ¯z\overline{\mathcal{E}}_{z} be the event that some triple z𝒱z\in\cal{V} is not disseminated to all workers in PFP-F. In the preceding part of the proof we have shown that 𝐏𝐫[¯z]<1nα{\bf Pr}[\overline{\mathcal{E}}_{z}]<\frac{1}{n^{\alpha}}, where α>1\alpha>1. By Boole’s inequality, the probability that there exists a triple that does not reach all workers in PFP-F can be bounded as

𝐏𝐫[z𝒱¯z]Σz𝒱𝐏𝐫[¯z]=Θ(nlogcn)1nα1nβ{\bf Pr}[\cup_{z\in{\mathcal{V}}}\overline{\mathcal{E}}_{z}]\leq\Sigma_{z\in{\mathcal{V}}}{\bf Pr}[\overline{\mathcal{E}}_{z}]=\Theta(n\log^{c}{n})\frac{1}{n^{\alpha}}\leq\frac{1}{n^{\beta}}

where β>0\beta>0. This implies that every worker in PFP-F collects all Θ(nlogn)\Theta(n\log n) triples generated by processors in PFP-F whp. Thus, at least one worker in PFP-F becomes enlightened after O(n)O(n) rounds. \Box
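The ‘amnesiac’ spreading process analyzed in this proof can also be illustrated with a short simulation sketch; the values of nn and Λ(n)=alogcn\Lambda(n)=a\log^{c}{n} below are illustrative assumptions. In each round every informed processor contacts one of the nn processors uniformly at random, the informed set is allowed to grow by at most one processor per round (the amnesiac restriction), and the number of rounds until all Λ(n)\Lambda(n) processors in PFP-F are informed stays well below nn.

import math
import random

def amnesiac_spread_rounds(n=5000, a=1.0, c=2):
    """Illustrative sketch of the amnesiac process from Lemma 10: lam informed
    processors each pick a destination among n processors uniformly at random;
    a round is 'successful' if at least one informed processor hits one of the
    (Lambda - lam) uninformed processors of P-F, in which case exactly one new
    processor becomes informed.  Returns the number of rounds T until lam
    reaches Lambda, to be compared with the O(n) bound."""
    Lambda = max(2, int(a * math.log(n) ** c))
    lam, rounds = 1, 0
    while lam < Lambda:
        rounds += 1
        # probability that one fixed informed processor misses (P-F) - Z
        q_single = 1.0 - (Lambda - lam) / n
        # the round succeeds unless all lam informed processors miss
        if random.random() > q_single ** lam:
            lam += 1  # amnesiac rule: the informed set grows by exactly one
    return rounds, n

if __name__ == "__main__":
    print(amnesiac_spread_rounds())  # rounds is typically a small fraction of n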

The following theorem shows that, with high probability, during epoch 𝔠\!\mathfrak{c} the correct results for all nn tasks are available at all live processors in O(n)O(n) rounds.

Theorem 4

Algorithm daks makes known the correct results of all nn tasks at every live processor in epoch 𝔠\!\mathfrak{c} after O(n)O(n) rounds whp.

Proof sketch. The proof of this theorem is similar to the proof of Theorem 1. This is because, by Lemma 9, in O(n)O(n) rounds the processors in PFP-F generate Θ(logcn)\Theta(\log^{c}{n}) triples, where c1c\geq 1 is a constant. According to Lemmas 3 and 10 in O(n)O(n) rounds every live worker becomes enlightened. \Box

According to Lemma 10, after O(n)O(n) rounds of epoch 𝔠\!\mathfrak{c} at least one processor in PFP-F becomes enlightened. Furthermore, once a processor in PFP-F becomes enlightened, according to Lemma 3 after additional O(logn)O(\log{n}) rounds every live processor becomes enlightened and then terminates, whp. Next we assess work and message complexities (using the approach in the proof of Theorem 2). Recall that we may choose arbitrary ε\varepsilon, such that 0<ε<10<\varepsilon<1.

Theorem 5

Under adversarial model pl{\cal F}\!\!_{pl} algorithm daks has work complexity and message complexity O(n1+ε)O(n^{1+\varepsilon}), for any 0<ε<10<\varepsilon<1.

Proof. To obtain the result we combine the costs associated with epoch 𝔟\!\mathfrak{b^{\prime}} with the costs of epoch 𝔠\!\mathfrak{c}. As reasoned earlier, the worst case costs for epoch 𝔟\!\mathfrak{b^{\prime}} are given in Theorem 2.

For epoch 𝔠\!\mathfrak{c} (if it is not empty), where |PF|=Ω(logcn)|P-F|=\Omega(\log^{c}n), algorithm daks terminates after O(n)O(n) rounds whp and there are up to O(nε)O(n^{\varepsilon}) live processors. Thus its work is O(n)O(nε)=O(n1+ε)O(n)\cdot O(n^{\varepsilon})=O(n^{1+\varepsilon}). In every round, if a processor is a worker it sends a share message to one randomly chosen processor. If a processor is enlightened then it sends profess messages to a randomly selected subset of processors. In every round O(nε)O(n^{\varepsilon}) share messages are sent. Since whp algorithm daks terminates in O(n)O(n) rounds, O(n1+ε)O(n^{1+\varepsilon}) share messages are sent. On the other hand, according to Lemma 2, if during an execution Θ(nlogn)\Theta(n\log{n}) profess messages are sent then every processor terminates whp. Hence, the message complexity is O(n1+ε)O(n^{1+\varepsilon}).

The worst case costs of the algorithm correspond to executions with non-empty epoch 𝔠\!\mathfrak{c}, where the algorithm does not terminate early. In this case the costs from epoch 𝔟\!\mathfrak{b^{\prime}} are asymptotically absorbed into the worst case costs of epoch 𝔠\!\mathfrak{c} computed above. \Box

Last, we extend our analysis to assess the efficiency of algorithm dakst,n for tt tasks, where tnt\geq n. This follows from the definition of algorithm dakst,n, using the same observations as in the discussion of Theorem 3.

Theorem 6

Algorithm dakst,n, with tnt\geq n, computes the results of tt tasks correctly in adversarial model pl{\cal F}\!\!_{pl} whp, in O(t)O(t) rounds, with work complexity O(tnε)O(t\cdot n^{\varepsilon}) and message complexity O(n1+ε)O(n^{1+\varepsilon}), for any 0<ε<10<\varepsilon<1.

Proof. The result for algorithm dakst,n is obtained (as in Theorem 3) by combining the costs from epoch 𝔟\!\mathfrak{b^{\prime}} (ibid.) with the costs of epoch 𝔠\!\mathfrak{c} derived from the analysis of algorithm daks for t=nt=n (Theorem 5). This is done by multiplying the number of rounds and work complexities by the size of the chunk Θ(t/n)\Theta(t/n); the message complexity is unchanged. \Box

We note that it should be possible to derive tighter bounds on the complexity of the algorithm. This is because we only assume that for all rounds in epoch 𝔠\!\mathfrak{c} the number of live processors is bounded by the generous range alogcn|PFr|bnεa\log^{c}n\leq|P-F_{r}|\leq bn^{\varepsilon}. In particular, if in all rounds of epoch 𝔠\!\mathfrak{c} there are Θ(𝑝𝑜𝑙𝑦(logn))\Theta({\it poly}(\log n)) live processors, the round and message complexities both become O(t𝑝𝑜𝑙𝑦(logn))O(t\,{\it poly}(\log n)) as follows from the arguments along the lines of the proofs of Theorems 5 and 6.

4.4 Finalizing Algorithm Parameterization

Lastly, we discuss the compile-time constants \mathfrak{H} and 𝔎\mathfrak{K} that appear in algorithm daks (starting with line 2). Recall that we have already given the constant \mathfrak{H} in Section 4.1; the constant stems from the proof of Lemma 2.

We compute 𝔎\mathfrak{K} as max{k1,k2,k3}\max\{k_{1},k_{2},k_{3}\}, where k2k_{2} and k3k_{3} come from the proofs of Lemmas 5 and 9. The constant k1k_{1}, as we detail below, emerges from the proof of Lemma 2 of [6] in the same way that the constants k2k_{2} and k3k_{3} are established in Lemmas 5 and 9.

As we discussed in conjunction with Lemma 4, algorithm daks in epoch 𝔞\!\mathfrak{a} performs tasks in the same pattern as in algorithm AA [6] when Ω(n)\Omega(n) processors do not crash. Lemma 2 of [6] shows that after Θ(logn)\Theta(\log n) rounds of algorithm AA there is no task that is performed less than k(1δ)lognk(1-\delta)\log n times, whp, for a suitably large constant kk and some constant δ(0,1)\delta\in(0,1). Thus, we let k1=k(1δ)k_{1}=k(1-\delta). This allows us to define 𝔎=max{k1,k2,k3}\mathfrak{K}=\max\{k_{1},k_{2},k_{3}\}, ensuring that the constant 𝔎\mathfrak{K} in algorithm daks (and thus in algorithm dakst,n) is large enough to satisfy all requirements of the analysis.

5 Conclusion

We presented a synchronous decentralized algorithm that can perform a set of tasks using a distributed system of undependable, crash-prone processors. Our randomized algorithm allows the processors to compute the correct results and make the results available at every live participating processor, whp. We provided time, message, and work complexity bounds for two adversarial strategies, viz., (a) all but Ω(nε)\Omega(n^{\varepsilon}), ε>0\varepsilon>0, processors can crash, and (b) all but a poly-logarithmic number of processors can crash. In this work our focus was on stronger adversarial behaviors, while still assuming synchrony and reliable communication. Future work considers the problem in synchronous and asynchronous decentralized systems, with more virulent adversarial settings in both. We plan to derive strong lower bounds on the message, time, and work complexities in various models. Our algorithm solves the problem (whp) even if only one processor remains operational. Thus it is worthwhile to understand its behavior in light of failure dynamics during executions. Accordingly we plan to derive complexity bounds that depend on the number of processors and tasks, and also on the actual number of crashes.

References

  • [1] Distributed.net. http://www.distributed.net/.
  • [2] Seti@home. http://setiathome.ssl.berkeley.edu/.
  • [3] B. S. Chlebus and D. R. Kowalski. Randomization helps to perform independent tasks reliably. Random Structures and Algorithms, 24(1):11–41, 2004.
  • [4] Bogdan S. Chlebus, Leszek Gasieniec, Dariusz R. Kowalski, and Alexander A. Shvartsman. Bounding work and communication in robust cooperative computation. In DISC, pages 295–310, 2002.
  • [5] E. Christoforou, A. Fernandez, Ch. Georgiou, and M. Mosteiro. Algorithmic mechanisms for internet supercomputing under unreliable communication. In NCA, pages 275–280, 2011.
  • [6] S. Davtyan, K. M. Konwar, and A. A. Shvartsman. Robust network supercomputing without centralized control. In Proc. of the 15th Int-l Conf. on Principles of Distributed Systems, pages 435–450, 2011.
  • [7] S. Davtyan, K. M. Konwar, and A. A. Shvartsman. Decentralized network supercomputing in the presence of malicious and crash-prone workers. In Proc. of 31st ACM Symp. on Principles of Distributed Computing, pages 231–232, 2012.
  • [8] S. Dolev, R. Segala, and A.A. Shvartsman. Dynamic load balancing with group communication. Theoretical Computer Science, 369(1–3):348–360, 2006. A preliminary version appeared in SIROCCO 1999.
  • [9] C. Dwork, J. Y. Halpern, and O. Waarts. Performing work efficiently in the presence of faults. SIAM J. Comput., 27(5):1457–1491, 1998.
  • [10] A. Fernandez, C. Georgiou, L. Lopez, and A. Santos. Reliably executing tasks in the presence of malicious processors. Technical Report Numero 9 (RoSaC-2005-9), Grupo de Sistemas y Comunicaciones, Universidad Rey Juan Carlos, 2005. http://gsyc.escet.urjc.es/publicaciones/tr/RoSaC-2005-9.pdf.
  • [11] A. Fernandez, C. Georgiou, L. Lopez, and A. Santos. Reliably executing tasks in the presence of untrusted entities. In SRDS, pages 39–50, 2006.
  • [12] C. Georgiou and A.A. Shvartsman. Cooperative Task-Oriented Computing: Algorithms and Complexity. Morgan & Claypool Publishers, first edition, 2011.
  • [13] Ch. Georgiou, A. Russell, and A.A. Shvartsman. Work-competitive scheduling for cooperative computing with dynamic groups. SIAM Journal on Computing, 34(4):848–862, 2005. A preliminary version appeared in STOC 2003.
  • [14] Ch. Georgiou and A.A. Shvartsman. Cooperative computing with fragmentable and mergeable groups. Journal of Discrete Algorithms, 1(2):211–235, 2003. A preliminary version appeared in SIROCCO 2000.
  • [15] N.W. Hanson, K.M. Konwar, S.-J. Wu, and S.J. Hallam. Metapathways v2.0: A master-worker model for environmental pathway/genome database construction on grids and clouds. In IEEE Conf. on Comput. Intelligence in Bioinf. and Comput. Biology, Hawaii, 2014 (to appear).
  • [16] P. C. Kanellakis and A. A. Shvartsman. Fault-Tolerant Parallel Computation. Kluwer Academic Publishers, 1997.
  • [17] Z.M. Kedem, K.V. Palem, A. Raghunathan, and P. Spirakis. Combining tentative and definite executions for dependable parallel computing. In Proceedings of the 23rd ACM Symposium on Theory of Computing, pages 381–390, 1991.
  • [18] K. M. Konwar, S. Rajasekaran, and A. A. Shvartsman. Robust network supercomputing with malicious processes. In Proceedings of the 17th International Symposium on Distributed Computing, pages 474–488, 2006.
  • [19] C. Martel and R. Subramonian. On the complexity of certified write-all algorithms. Journal of Algorithms, 16(3):361–387, 1994.
  • [20] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
  • [21] M. Paquette and A. Pelc. Optimal decision strategies in Byzantine environments. Journal of Parallel and Distributed Computing, 66(3):419–427, 2006.
  • [22] R. De Prisco, A. Mayer, and M. Yung. Time-optimal message-efficient work performance in the presence of faults. In Proceedings of the 13th ACM Symposium on Principles of Distributed Computing, pages 161–172, 1994.
  • [23] Bin Yu and Munindar P. Singh. An evidential model of distributed reputation management. In AAMAS, pages 294–301, 2002.