
Information-Theoretic Lower Bounds for Distributed Function Computation

Aolin Xu and Maxim Raginsky

This work was supported by the NSF under grant CCF-1017564, CAREER award CCF-1254041, by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370, and by ONR under grant N00014-12-1-0998. The material in this paper was presented in part at the IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, July 2014. The authors are with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois, Urbana, IL 61801, USA. E-mail: {aolinxu2,maxim}@illinois.edu.
Abstract

We derive information-theoretic converses (i.e., lower bounds) for the minimum time required by any algorithm for distributed function computation over a network of point-to-point channels with finite capacity, where each node of the network initially has a random observation and aims to compute a common function of all observations to a given accuracy with a given confidence by exchanging messages with its neighbors. We obtain the lower bounds on computation time by examining the conditional mutual information between the actual function value and its estimate at an arbitrary node, given the observations in an arbitrary subset of nodes containing that node. The main contributions include: 1) A lower bound on the conditional mutual information via so-called small ball probabilities, which captures the dependence of the computation time on the joint distribution of the observations at the nodes, the structure of the function, and the accuracy requirement. For linear functions, the small ball probability can be expressed by Lévy concentration functions of sums of independent random variables, for which tight estimates are available that lead to strict improvements over existing lower bounds on computation time. 2) An upper bound on the conditional mutual information via strong data processing inequalities, which complements and strengthens existing cutset-capacity upper bounds. 3) A multi-cutset analysis that quantifies the loss (dissipation) of the information needed for computation as it flows across a succession of cutsets in the network. This analysis is based on reducing a general network to a line network with bidirectional links and self-links, and the results highlight the dependence of the computation time on the diameter of the network, a fundamental parameter that is missing from most of the existing lower bounds on computation time.

Index Terms:
Distributed function computation, computation time, small ball probability, Lévy concentration function, strong data processing inequality, cutset bound, multi-cutset analysis

I Introduction and preview of results

I-A Model and problem formulation

The problem of distributed function computation arises in such applications as inference and learning in networks and consensus or coordination of multiple agents. Each node of the network has an initial random observation and aims to compute a common function of the observations of all the nodes by exchanging messages with its neighbors over discrete memoryless point-to-point channels and by performing local computations. A problem of theoretical and practical interest is to determine the fundamental limits on the computation time, i.e., the minimum number of steps needed by any distributed computation algorithm to guarantee that, when the algorithm terminates, each node has an accurate estimate of the function value with high probability.

Formally, a network consisting of nodes connected by point-to-point channels is represented by a directed graph $G=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is a finite set of nodes and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is a set of edges. Node $u$ can send messages to node $v$ only if $(u,v)\in\mathcal{E}$. Accordingly, to each edge $e\in\mathcal{E}$ we associate a discrete memoryless channel with finite input alphabet $\mathsf{X}_e$, finite output alphabet $\mathsf{Y}_e$, and stochastic transition law $K_e$ that specifies the transition probabilities $K_e(y_e|x_e)$ for all $(x_e,y_e)\in\mathsf{X}_e\times\mathsf{Y}_e$. The channels corresponding to different edges are assumed to be independent. Initially, each node $v$ has access to an observation given by a random variable (r.v.) $W_v$ taking values in some space $\mathsf{W}_v$. We assume that the joint probability law $\mathbb{P}_W$ of $W\triangleq(W_v)_{v\in\mathcal{V}}$ is known to all the nodes. Given a function $f:\prod_{v\in\mathcal{V}}\mathsf{W}_v\to\mathsf{Z}$, each node aims to estimate the value $Z=f(W)$ via local communication and computation. For example, when $f$ is the identity mapping, i.e., $Z=W$, the goal of each node is to estimate the observations of all other nodes in the network.

The operation of the network is synchronized, and takes place in discrete time. A $T$-step algorithm $\mathcal{A}$ is a collection of deterministic encoders $(\varphi_{v,t})$ and estimators $(\psi_v)$, for all $v\in\mathcal{V}$ and $t\in\{1,\ldots,T\}$, given by mappings

\[
\varphi_{v,t}:\mathsf{W}_v\times\mathsf{Y}^{t-1}_{v\leftarrow}\to\mathsf{X}_{v\rightarrow},\qquad \psi_v:\mathsf{W}_v\times\mathsf{Y}^T_{v\leftarrow}\to\mathsf{Z},
\]

where $\mathsf{X}_{v\rightarrow}=\prod_{u\in\mathcal{N}_{v\rightarrow}}\mathsf{X}_{(v,u)}$ and $\mathsf{Y}_{v\leftarrow}=\prod_{u\in\mathcal{N}_{v\leftarrow}}\mathsf{Y}_{(u,v)}$. Here, $\mathcal{N}_{v\leftarrow}\triangleq\{u\in\mathcal{V}:(u,v)\in\mathcal{E}\}$ and $\mathcal{N}_{v\rightarrow}\triangleq\{u\in\mathcal{V}:(v,u)\in\mathcal{E}\}$ are, respectively, the in-neighborhood and the out-neighborhood of node $v$. The algorithm operates as follows: at each step $t$, each node $v$ computes $X_{v,t}\triangleq(X_{(v,u),t})_{u\in\mathcal{N}_{v\rightarrow}}=\varphi_{v,t}\big(W_v,Y^{t-1}_v\big)\in\mathsf{X}_{v\rightarrow}$, and then transmits each message $X_{(v,u),t}$ along the edge $(v,u)\in\mathcal{E}$. For each $(u,v)\in\mathcal{E}$, the received message $Y_{(u,v),t}$ at each $t$ is related to the transmitted message $X_{(u,v),t}$ via the stochastic transition law $K_{(u,v)}$. At step $T$, each node $v$ computes $\widehat{Z}_v=\psi_v(W_v,Y^T_v)$ as an estimate of $Z$, where $Y_{v,t}\triangleq(Y_{(u,v),t})_{u\in\mathcal{N}_{v\leftarrow}}\in\mathsf{Y}_{v\leftarrow}$ for $t\in\{1,\ldots,T\}$.

Given a nonnegative distortion function $d:\mathsf{Z}\times\mathsf{Z}\to\mathbb{R}^+$, we use the excess distortion probability $\mathbb{P}\big[d(Z,\widehat{Z}_v)>\varepsilon\big]$ to quantify the computation fidelity of the algorithm at node $v$. A key fundamental limit of distributed function computation is the $(\varepsilon,\delta)$-computation time:

\[
T(\varepsilon,\delta)\triangleq\inf\Big\{T\in\mathbb{N}: \exists\text{ a $T$-step algorithm $\mathcal{A}$ such that } \max_{v\in\mathcal{V}}\mathbb{P}\big[d(Z,\widehat{Z}_v)>\varepsilon\big]\leq\delta\Big\}. \tag{1}
\]

If an algorithm $\mathcal{A}$ has the property that

\[
\max_{v\in\mathcal{V}}\mathbb{P}\big[d(Z,\widehat{Z}_v)>\varepsilon\big]\leq\delta,
\]

then we say that it achieves accuracy $\varepsilon$ with confidence $1-\delta$. Thus, $T(\varepsilon,\delta)$ is the minimum number of time steps needed by any algorithm to achieve accuracy $\varepsilon$ with confidence $1-\delta$. The objective of this paper is to derive general lower bounds on $T(\varepsilon,\delta)$ for arbitrary network topologies, discrete memoryless channel models, continuous or discrete observations, and functions $f$.
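To make the model concrete, the following short Python sketch (not part of the original development) simulates one simple $T$-step scheme on the two-node BSC network that reappears in Example 1, and estimates its excess-distortion probability by Monte Carlo; the repetition/majority-vote scheme and the values $p=0.3$, $\delta=0.05$ are illustrative assumptions, not claims about the optimal algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def bsc(bit, p):
    """Pass one bit through a BSC(p)."""
    return int(bit) ^ int(rng.random() < p)

def run_once(T, p):
    """One run of a simple (hypothetical) T-step scheme on a two-node network:
    each node repeats its own bit T times, the other node majority-votes,
    and both estimate Z = W1 xor W2."""
    w1, w2 = (int(b) for b in rng.integers(0, 2, size=2))
    rx_at_2 = [bsc(w1, p) for _ in range(T)]       # node 1 -> node 2
    rx_at_1 = [bsc(w2, p) for _ in range(T)]       # node 2 -> node 1
    z = w1 ^ w2
    z_hat_1 = w1 ^ int(sum(rx_at_1) * 2 > T)       # majority vote, then xor
    z_hat_2 = w2 ^ int(sum(rx_at_2) * 2 > T)
    return (z_hat_1 != z) or (z_hat_2 != z)        # excess-distortion event (eps = 0)

def error_prob(T, p, n_trials=5000):
    return np.mean([run_once(T, p) for _ in range(n_trials)])

# Empirical upper bound on T(0, delta): smallest odd T whose simulated
# error probability falls below delta.
p, delta = 0.3, 0.05
for T in range(1, 40, 2):
    if error_prob(T, p) <= delta:
        print("achievable T for delta = %.2f: about %d" % (delta, T))
        break
```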

Previously, this problem (for real-valued functions and quadratic distortion) has been studied by Ayaso et al. [1] and by Como and Dahleh [2] using information-theoretic techniques. This problem is also related to the study of communication complexity of distributed computing over noisy channels. In that context, Goyal et al. [3] studied the problem of computing Boolean functions in complete graphs, where each pair of nodes communicates over a pair of independent binary symmetric channels (BSCs), and obtained tight lower bounds on the number of serial broadcasts using an approach tailored to that special problem. The technique used in [3] has been extended to random planar networks by Dutta et al. [4]. Other related, but differently formulated, problems include communication complexity and information complexity in distributed computing over noiseless channels, surveyed in [5]; minimum communication rates for distributed computing [6, 7, 8], compression, or estimation based on infinite sequences of observations, surveyed in [9, Chap. 21]; and distributed computing in wireless networks, surveyed in [10]. Some achievability results for specific distributed function computation problems can be found in [11, 12, 13, 1, 14, 15, 16, 17, 18].

I-B Method of analysis and summary of main results

Our analysis builds upon the information-theoretic framework proposed by Ayaso et al. [1] and Como and Dahleh [2]. The underlying idea is rather natural and exploits a fundamental trade-off between the minimal amount of information any good algorithm must necessarily extract about the function value $Z$ when it terminates and the maximal amount of information any algorithm is able to obtain due to time and communication constraints. To be more precise, given any set of nodes $\mathcal{S}\subseteq\mathcal{V}$, let $W_{\mathcal{S}}\triangleq(W_v)_{v\in\mathcal{S}}$ denote the vector of observations at all the nodes in $\mathcal{S}$. The quantity that plays a key role in the analysis is the conditional mutual information $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ between the actual function value $Z$ and the estimate $\widehat{Z}_v$ at an arbitrary node $v$, given the observations in an arbitrary subset of nodes $\mathcal{S}$ containing $v$.

Consider an arbitrary $T$-step algorithm $\mathcal{A}$ that achieves accuracy $\varepsilon$ with confidence $1-\delta$. Then, as we show in Lemma 1 of Sec. II-A, this mutual information can be lower-bounded by

\[
I(Z;\widehat{Z}_v|W_{\mathcal{S}}) \geq (1-\delta)\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]}-h_2(\delta), \tag{2}
\]

where $h_2(\delta)\triangleq-\delta\log\delta-(1-\delta)\log(1-\delta)$ is the binary entropy function, and

\[
L(w_{\mathcal{S}},\varepsilon) \triangleq \sup_{z\in\mathsf{Z}}\mathbb{P}[d(Z,z)\leq\varepsilon\,|\,W_{\mathcal{S}}=w_{\mathcal{S}}] = \sup_{z\in\mathsf{Z}}\mathbb{P}[d(f(W),z)\leq\varepsilon\,|\,W_{\mathcal{S}}=w_{\mathcal{S}}]
\]

is the conditional small ball probability of $Z=f(W)$ given $W_{\mathcal{S}}=w_{\mathcal{S}}$. The conditional small ball probability quantifies the difficulty of localizing the value of $Z=f(W)$ in a "distortion ball" of size $\varepsilon$ given partial knowledge about the value of $W$, namely $W_{\mathcal{S}}=w_{\mathcal{S}}$. For example, as discussed in Sec. IV, when $f$ is a linear function of the observations $W$, the conditional small ball probability can be expressed in terms of so-called Lévy concentration functions [19], for which tight estimates are available under various regularity conditions.
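As a concrete, purely illustrative instance of these definitions, the sketch below evaluates the conditional small ball probability and the right-hand side of (2) for a hypothetical setup in which the observations are i.i.d. standard Gaussian, $f$ is their sum, and $d$ is the absolute distortion; none of these choices (nor the numerical values) are prescribed by the paper.

```python
import math

# Toy instance (my own): n nodes with i.i.d. N(0,1) observations,
# Z = sum of all observations, d(z, zhat) = |z - zhat|,
# and a cut leaving n_outside nodes outside S.
def small_ball(eps, n_outside):
    """L(w_S, eps): given W_S, Z is Gaussian with variance n_outside, so the
    best distortion ball of radius eps is centered at the conditional mean
    and has probability 2*Phi(eps/sigma) - 1 (independent of w_S)."""
    sigma = math.sqrt(n_outside)
    return math.erf(eps / (sigma * math.sqrt(2)))

def info_lower_bound(eps, delta, n_outside):
    """Right-hand side of (2), in nats."""
    L = small_ball(eps, n_outside)
    h2 = -delta * math.log(delta) - (1 - delta) * math.log(1 - delta)
    return (1 - delta) * math.log(1.0 / L) - h2

for eps in (1.0, 0.1, 0.01):
    print(eps, info_lower_bound(eps, delta=0.05, n_outside=8))
# The bound grows like log(1/eps) as eps -> 0, since L(w_S, eps) = O(eps).
```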

Figure 1: A four-node network with a cut defined by $\mathcal{S}=\{2,3\}$ and $\mathcal{S}^c=\{1,4\}$. The cutset $\mathcal{E}_{\mathcal{S}}$ consists of edges $(1,2)$ and $(4,3)$, marked in blue.

On the other hand, if $\mathcal{A}$ is a $T$-step algorithm, then the amount of information any node $v$ has about $Z$ once $\mathcal{A}$ terminates can be upper-bounded by a quantity that increases with $T$ and also depends on the network topology and on the information transmission capabilities of the channels connecting the nodes. To quantify this amount of information, we consider a cut of the network, i.e., a partition of the set of nodes $\mathcal{V}$ into two disjoint subsets $\mathcal{S}$ and $\mathcal{S}^c\triangleq\mathcal{V}\backslash\mathcal{S}$, such that $v\in\mathcal{S}$. The underlying intuition is that any information that nodes in $\mathcal{S}$ receive about $W_{\mathcal{S}^c}$ must flow across the edges from nodes in $\mathcal{S}^c$ to nodes in $\mathcal{S}$. The set of these edges, denoted by $\mathcal{E}_{\mathcal{S}}$, is referred to as the cutset induced by $\mathcal{S}$. Figure 1 illustrates these concepts on a simple four-node network. We then have the following upper bound [1, 2] (see also Lemma 2 in Sec. II-B):

\[
I(Z;\widehat{Z}_v|W_{\mathcal{S}}) \leq T C_{\mathcal{S}}. \tag{3}
\]

The quantity $C_{\mathcal{S}}$, referred to as the cutset capacity, is the sum of the Shannon capacities of all the channels located on the edges in the cutset $\mathcal{E}_{\mathcal{S}}$. Thus, if there exists a cut $(\mathcal{S},\mathcal{S}^c)$ with a small value of $C_{\mathcal{S}}$, then the amount of information gained by the nodes in $\mathcal{S}$ about $Z$ will also be small. Note that the cutset upper bound grows linearly with $T$. However, when the initial observations $W$ are discrete, we also know that

\[
I(Z;\widehat{Z}_v|W_{\mathcal{S}}) \leq I(W_{\mathcal{S}^c};\widehat{Z}_v|W_{\mathcal{S}}) \leq H(W_{\mathcal{S}^c}|W_{\mathcal{S}}),
\]

where $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$ is the conditional entropy of $W_{\mathcal{S}^c}$ given $W_{\mathcal{S}}$, which does not depend on $T$. In fact, we sharpen this bound by showing in Lemma 5 in Sec. II-D that

\[
I(Z;\widehat{Z}_v|W_{\mathcal{S}}) \leq \big(1-(1-\eta_v)^T\big)H(W_{\mathcal{S}^c}|W_{\mathcal{S}}). \tag{4}
\]

Here, $\eta_v$ is defined as

\[
\eta_v=\sup\frac{I(U;Y_v)}{I(U;X_v)},
\]

where the supremum is over all triples $(U,X_v,Y_v)$ of r.v.'s such that $U$ takes values in an arbitrary alphabet, $U\to X_v\to Y_v$ is a Markov chain, $X_v$ takes values in $\mathsf{X}_{v\leftarrow}$, $Y_v$ takes values in $\mathsf{Y}_{v\leftarrow}$, and the conditional probability law $\mathbb{P}_{Y_v|X_v}$ is equal to the product of all the channels entering $v$. As we discuss in detail in Sec. II-C, this constant is related to so-called strong data processing inequalities (SDPIs) [20], and quantifies the information transmission capabilities of the channels entering $v$. When $\eta_v<1$, the upper bound (4) is strictly smaller than $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$. With the upper bound (4), we can strengthen the cutset bound to the following:

\[
I(Z;\widehat{Z}_v|W_{\mathcal{S}}) \leq \min\big\{TC_{\mathcal{S}},\,\big(1-(1-\eta_v)^T\big)H(W_{\mathcal{S}^c}|W_{\mathcal{S}})\big\}. \tag{5}
\]

Combining the bounds in (2) and (5), we conclude that, if there exists a $T$-step algorithm $\mathcal{A}$ that achieves accuracy $\varepsilon$ with confidence $1-\delta$, then

\[
T \geq \max\Bigg\{\frac{1}{C_{\mathcal{S}}}\left((1-\delta)\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]}-h_2(\delta)\right),\;
\frac{\log\left(1-\frac{1}{H(W_{\mathcal{S}^c}|W_{\mathcal{S}})}\left((1-\delta)\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]}-h_2(\delta)\right)\right)}{\log(1-\eta_v)}\Bigg\}; \tag{6}
\]

moreover, this inequality holds for all choices of $\mathcal{S}\subset\mathcal{V}$ and $v\in\mathcal{S}$. The precise statements of the resulting lower bounds on $T(\varepsilon,\delta)$ are given in Theorem 1 and Theorem 2.
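The bound (6) is straightforward to evaluate once the quantities appearing in it are known. The sketch below does so for placeholder values of $C_{\mathcal{S}}$, $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$, $\eta_v$, and $\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]$ that are my own illustrative assumptions, not taken from the paper.

```python
import math

def lower_bound_T(ell, C_S, H_cond, eta_v):
    """Evaluate the two terms in (6), given
    ell = (1-delta) log(1/E[L]) - h2(delta), the cutset capacity C_S,
    H_cond = H(W_{S^c} | W_S), and the SDPI constant eta_v (all in nats)."""
    t_cutset = ell / C_S
    # The SDPI term is only meaningful when ell < H(W_{S^c} | W_S).
    t_sdpi = (math.log(1 - ell / H_cond) / math.log(1 - eta_v)
              if ell < H_cond else float("inf"))
    return max(t_cutset, t_sdpi)

# Illustrative placeholder numbers.
delta, E_L = 0.01, 0.25
h2 = -delta * math.log(delta) - (1 - delta) * math.log(1 - delta)
ell = (1 - delta) * math.log(1 / E_L) - h2
print(lower_bound_T(ell, C_S=0.5, H_cond=2 * math.log(2), eta_v=0.16))
```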

The lower bound in (6) accounts for the difficulty of estimating the value of $Z=f(W)$ given only a subset of observations $W_{\mathcal{S}}$ through the small ball probability $L(W_{\mathcal{S}},\varepsilon)$, and for the communication bottlenecks in the network through the cutset capacity $C_{\mathcal{S}}$ and the constants $\eta_v$. The presence of $L(W_{\mathcal{S}},\varepsilon)$ in the bound ensures the correct scaling of $T(\varepsilon,\delta)$ in the high-accuracy limit $\varepsilon\to 0$. In particular, when the function $f$ is real-valued and the probability distribution of $Z=f(W)$ has a density, it is not hard to see that $L(W_{\mathcal{S}},\varepsilon)=O(\varepsilon)$, and therefore $T(\varepsilon,\delta)$ grows without bound at the rate of $\log(1/\varepsilon)$ as $\varepsilon\to 0$. By contrast, the bounds of Ayaso et al. [1] saturate at a finite constant even when no computation error is allowed, i.e., when $\varepsilon=0$. A detailed comparison with existing bounds is given in Sec. IV, where we particularize our lower bounds to the computation of linear functions. Moreover, in certain cases our lower bound on $T(\varepsilon,\delta)$ tends to infinity in the high-confidence regime $\delta\to 0$. By contrast, existing lower bounds that rely on cutset capacity estimates remain bounded regardless of how small we make $\delta$.

Throughout the paper, we provide several concrete examples that illustrate the tightness of the general lower bound in (6). In particular, Example 1 in Sec. II-E concerns the problem of computing the mod-2 sum of two independent ${\rm Bern}(\tfrac{1}{2})$ random variables in a network of two nodes communicating over binary symmetric channels (BSCs). For that problem, we obtain a lower bound on $T(0,\delta)$ that matches an achievable upper bound within a factor of 2. In Example 2 in Sec. II-E, we consider the case where the nodes aim to distribute their discrete observations to all other nodes, and obtain a lower bound on $T(0,\delta)$ that captures the conductance of the network, which plays a prominent role in the previously published bounds of Ayaso et al. [1]. In Sec. V, we study two more examples: computing a sum of independent Rademacher random variables in a dumbbell network of BSCs, and distributed averaging of real-valued observations in an arbitrary network of binary erasure channels (BECs). Our lower bound for the former example precisely captures the dependence of the computation time on the number of nodes in the network, while for the latter example it captures the correct dependence of the computation time on the accuracy parameter $\varepsilon$.

Figure 2: (a) A six-node network partitioned into three sets, $\mathcal{S}_1=\{1,4\}$, $\mathcal{S}_2=\{2,5\}$, and $\mathcal{S}_3=\{3,6\}$. Here, $\mathcal{P}_1=\{1,4\}$, $\mathcal{P}_2=\{1,2,4,5\}$, and the cutsets $\mathcal{E}_{\mathcal{P}_1}=\{(2,1),(2,4)\}$, $\mathcal{E}_{\mathcal{P}_2}=\{(3,2),(6,5)\}$, $\mathcal{E}_{\mathcal{P}^c_1}=\{(1,5),(4,5)\}$, and $\mathcal{E}_{\mathcal{P}^c_2}=\{(2,3),(5,6)\}$ are disjoint. Observe that nodes in $\mathcal{S}_1$ communicate only with nodes in $\mathcal{S}_1$ and $\mathcal{S}_2$, nodes in $\mathcal{S}_2$ communicate only with nodes in $\mathcal{S}_1,\mathcal{S}_2,\mathcal{S}_3$, and nodes in $\mathcal{S}_3$ communicate only with nodes in $\mathcal{S}_2,\mathcal{S}_3$. (b) The bidirected chain reduced from this network.

A significant limitation of the analysis based on a single cut $(\mathcal{S},\mathcal{S}^c)$ of the network is that it only captures the flow of information across the cutset $\mathcal{E}_{\mathcal{S}}$, but does not account for the time it takes the algorithm to disseminate this information to all the nodes in $\mathcal{S}$. We address this limitation in Sec. III through a multi-cutset analysis. The main idea is to partition the set of nodes $\mathcal{V}$ into several subsets $\mathcal{S}_1,\ldots,\mathcal{S}_n$, such that, for all $\mathcal{P}_i\triangleq\mathcal{S}_1\cup\ldots\cup\mathcal{S}_i$, the cutsets $\mathcal{E}_{\mathcal{P}_1},\ldots,\mathcal{E}_{\mathcal{P}_{n-1}},\mathcal{E}_{\mathcal{P}^c_1},\ldots,\mathcal{E}_{\mathcal{P}^c_{n-1}}$ are disjoint, and to analyze the flow of information across this sequence of cutsets. Once such a partition is selected, the analysis is based on a network reduction argument (Lemma 7), which lumps all the nodes in each $\mathcal{S}_i$ into a single virtual "supernode." The construction of the partition ensures that each supernode $i$ only communicates with supernodes $i-1$ and $i+1$, and can also send noisy messages to itself (this is needed to simulate noisy communication among the nodes within $\mathcal{S}_i$ in the original network). Thus, the reduced network takes the form of a chain with $n$ nodes communicating with their nearest neighbors over bidirectional noisy links and, in addition, sending noisy messages to themselves. We refer to this network as a bidirected chain of length $n-1$. Figure 2a shows the partition of a six-node network, and the bidirected chain reduced from this network is shown in Fig. 2b.

Once this reduction is carried out, we can convert any $T$-step algorithm $\mathcal{A}$ running on the original network into a randomized $T$-step algorithm $\mathcal{A}'$ running on the reduced network with the same accuracy and confidence guarantees as $\mathcal{A}$. Consequently, it suffices to analyze distributed function computation in bidirected chains. The key quantitative statement that emerges from this analysis can be informally stated as follows: for any bidirected chain with $n>3$ nodes, there exists a constant $\eta\in[0,1]$ that plays the same role as $\eta_v$ in (4) and quantifies the information transmission capabilities of the channels in the chain, such that, for any algorithm $\mathcal{A}$ that runs on this chain and takes time $T=O(n/\eta)$, the conditional mutual information between the function value $Z$ and its estimate $\widehat{Z}_n$ at the rightmost node $n$, given the observations of nodes 2 through $n$, is upper-bounded by

\[
I(Z;\widehat{Z}_n|W_{2:n})=O\left(\frac{C_{(1,2)}n^2}{\eta}e^{-2n\eta^2}\right), \tag{7}
\]

where $C_{(1,2)}$ is the Shannon capacity of the channel from node 1 to node 2. The precise statement is given in Lemma 8 in Sec. III-A. Intuitively, this shows that, unless the algorithm uses $\Omega(n/\eta)$ steps, the information about $W_1$ will dissipate at an exponential rate by the time it propagates through the chain from node 1 to node $n$. Combining (7) with the lower bound on $I(Z;\widehat{Z}_n|W_{2:n})$ based on small ball probabilities, we can obtain lower bounds on the computation time $T(\varepsilon,\delta)$. The precise statement is given in Theorem 3. Moreover, as we show, it is always possible to reduce an arbitrary network with bidirectional point-to-point channels between the nodes to a bidirected chain whose length is equal to the diameter of the original network, which implies that, for networks with sufficiently large diameter and for sufficiently small values of $\varepsilon,\delta$,

\[
T(\varepsilon,\delta)=\Omega\left(\frac{{\rm diam}(G)}{\eta}\right), \tag{8}
\]

where ${\rm diam}(G)$ denotes the diameter. This dependence on ${\rm diam}(G)$, which cannot be captured by the single-cutset analysis, is missing in almost all of the existing lower bounds on computation time. An exception is the paper by Rajagopalan and Schulman [13], which gives an asymptotic lower bound on the time required to broadcast a single bit over a chain of unidirectional BSCs. Our multi-cutset analysis applies to both discrete and continuous observations, and to general network topologies. It can be straightforwardly particularized to specific networks, such as bidirected chains, rings, trees, and grids, as discussed in Sec. III-B. We note that techniques involving multiple (though not necessarily disjoint) cutsets have also been proposed in the study of multi-party communication complexity by Tiwari [21] and more recently by Chattopadhyay et al. [22]; our concern, by contrast, is the influence of network topology and channel noise on the computation time.
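To get a feel for the scaling in (7), the following sketch evaluates its right-hand side with all hidden constants set to one (an assumption made only for illustration; the precise constants appear in Lemma 8) for a chain of BSC($p$) links, using $\eta=(1-2p)^2$ and $C_{(1,2)}=\log 2-h_2(p)$ nats.

```python
import math

def dissipation_bound(n, eta, C12):
    """Order-of-magnitude evaluation of the right-hand side of (7),
    with constants suppressed, for a bidirected chain of n nodes with
    per-link SDPI constant eta and first-link capacity C12 (nats)."""
    return C12 * n**2 / eta * math.exp(-2 * n * eta**2)

# Chain of BSC(p) links (illustrative choice of p).
p = 0.3
eta = (1 - 2 * p) ** 2
h2 = -p * math.log(p) - (1 - p) * math.log(1 - p)
C12 = math.log(2) - h2
for n in (50, 100, 200, 400):
    print(n, dissipation_bound(n, eta, C12))
# For large n the exponential factor dominates, so any algorithm with
# T = O(n/eta) retains very little information about W_1 at node n.
```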

I-C Organization of the paper

The remainder of the paper is structured as follows. We start with the single-cutset analysis in Sec. II. The lower bound on the conditional mutual information via the conditional small ball probability is presented in Sec. II-A. The cutset upper bound and the SDPI upper bound on the conditional mutual information are presented in Sec. II-B and Sec. II-D, respectively. An introduction to SDPIs is given in Sec. II-C. The lower bounds on computation time are given in Sec. II-E, along with two concrete examples. Sec. III is devoted to the multi-cutset analysis, where we first present the network reduction argument in Sec. III-A, and then derive general lower bounds on computation time and particularize the results to special networks in Sec. III-B. In Sec. IV, we discuss lower bounds for computing linear functions, where we relate the conditional small ball probability to Lévy concentration functions and evaluate them in a number of special cases. We also make detailed comparisons of our results with existing lower bounds in Sec. IV-D. In Sec. V, we compare the lower bounds on computation time with achievable upper bounds for two more examples: computing a sum of independent Rademacher random variables in a dumbbell network of BSCs, and distributed averaging of real-valued observations in an arbitrary network of binary erasure channels (BECs). We conclude the paper and point out future research directions in Sec. VI. A couple of lengthy technical proofs are relegated to a series of appendices.

II Single-cutset analysis

We start by deriving information-theoretic lower bounds on the computation time $T(\varepsilon,\delta)$ based on a single cutset in the network. Recall that a cutset associated to a partition of $\mathcal{V}$ into two disjoint sets $\mathcal{S}$ and $\mathcal{S}^c\triangleq\mathcal{V}\setminus\mathcal{S}$ consists of all edges that connect a node in $\mathcal{S}^c$ to a node in $\mathcal{S}$:

\[
\mathcal{E}_{\mathcal{S}}\triangleq\big\{(u,v)\in\mathcal{E}:u\in\mathcal{S}^c,v\in\mathcal{S}\big\}\equiv(\mathcal{S}^c\times\mathcal{S})\cap\mathcal{E}.
\]

When $\mathcal{S}$ is a singleton, i.e., $\mathcal{S}=\{v\}$, we will write $\mathcal{E}_v$ instead of the more clunky $\mathcal{E}_{\{v\}}$. As the discussion in Sec. I-B indicates, our analysis revolves around the conditional mutual information $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ for an arbitrary set of nodes $\mathcal{S}\subset\mathcal{V}$ and for an arbitrary node $v\in\mathcal{S}$. The lower bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ expresses quantitatively the intuition that any algorithm that achieves

\[
\max_{v\in\mathcal{V}}\mathbb{P}\big[d(Z,\widehat{Z}_v)>\varepsilon\big]\leq\delta
\]

must necessarily extract a sufficient amount of information about the value of $Z=f(W)=f(W_{\mathcal{S}},W_{\mathcal{S}^c})$. On the other hand, the upper bounds on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ formalize the idea that this amount cannot be too large, since any information that nodes in $\mathcal{S}$ receive about $W_{\mathcal{S}^c}$ must flow across the edges in the cutset $\mathcal{E}_{\mathcal{S}}$ (cf. [23, Sec. 15.10] for a typical illustration of this type of cutset argument). We capture this information limitation in two ways: via channel capacity and via SDPI constants.

The remainder of this section is organized as follows. We first present the conditional mutual information lower bound in Sec. II-A. Then we state the upper bound based on cutset capacity in Sec. II-B. After a brief detour to introduce the SDPIs in Sec. II-C, we state the SDPI-based upper bounds in Sec. II-D. Finally, we combine the lower and upper bounds to derive lower bounds on $T(\varepsilon,\delta)$ in Sec. II-E.

II-A Lower bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$

For any $\varepsilon\geq 0$, $\mathcal{S}\subset\mathcal{V}$, and $w_{\mathcal{S}}\in\prod_{v\in\mathcal{S}}\mathsf{W}_v$, define the conditional small ball probability of $Z$ given $W_{\mathcal{S}}=w_{\mathcal{S}}$ as

\[
L(w_{\mathcal{S}},\varepsilon)\triangleq\sup_{z\in\mathsf{Z}}\mathbb{P}[d(Z,z)\leq\varepsilon\,|\,W_{\mathcal{S}}=w_{\mathcal{S}}]. \tag{9}
\]

This quantity measures how well the conditional distribution of $Z$ given $W_{\mathcal{S}}=w_{\mathcal{S}}$ concentrates in a small region of size $\varepsilon$, as measured by $d(\cdot,\cdot)$. The following lower bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ in terms of the conditional small ball probability is essential for proving lower bounds on $T(\varepsilon,\delta)$.

Lemma 1.

If an algorithm $\mathcal{A}$ achieves

\[
\max_{v\in\mathcal{V}}\mathbb{P}\big[d(Z,\widehat{Z}_v)>\varepsilon\big]\leq\delta\leq 1/2, \tag{10}
\]

then for any set $\mathcal{S}\subset\mathcal{V}$ and any node $v\in\mathcal{S}$,

\[
I(Z;\widehat{Z}_v|W_{\mathcal{S}}) \geq (1-\delta)\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]}-h_2(\delta), \tag{11}
\]

where $h_2(\delta)=-\delta\log\delta-(1-\delta)\log(1-\delta)$ is the binary entropy function.

Proof:

Fix an arbitrary $\mathcal{S}\subset\mathcal{V}$ and an arbitrary $v\in\mathcal{S}$. Consider the probability distributions $\mathbb{P}=\mathbb{P}_{W_{\mathcal{S}},Z,\widehat{Z}_v}$ and $\mathbb{Q}=\mathbb{P}_{W_{\mathcal{S}}}\otimes\mathbb{P}_{Z|W_{\mathcal{S}}}\otimes\mathbb{P}_{\widehat{Z}_v|W_{\mathcal{S}}}$. Define the indicator random variable $\Upsilon\triangleq\mathbf{1}\{d(Z,\widehat{Z}_v)\leq\varepsilon\}$. Then from (10) it follows that $\mathbb{P}[\Upsilon=1]\geq 1-\delta$. On the other hand, since $Z\to W_{\mathcal{S}}\to\widehat{Z}_v$ form a Markov chain under $\mathbb{Q}$, by Fubini's theorem,

\[
\begin{aligned}
\mathbb{Q}[\Upsilon=1]
&=\int_{\mathsf{W}_{\mathcal{S}}}\int_{\mathsf{Z}}\int_{\mathsf{Z}}\mathbf{1}\big\{d(z,\widehat{z}_v)\leq\varepsilon\big\}\,\mathbb{P}({\rm d}z|w_{\mathcal{S}})\,\mathbb{P}({\rm d}\widehat{z}_v|w_{\mathcal{S}})\,\mathbb{P}({\rm d}w_{\mathcal{S}})\\
&=\int_{\mathsf{W}_{\mathcal{S}}}\int_{\mathsf{Z}}\mathbb{P}\big[d(Z,\widehat{z}_v)\leq\varepsilon\,\big|\,W_{\mathcal{S}}=w_{\mathcal{S}}\big]\,\mathbb{P}({\rm d}\widehat{z}_v|w_{\mathcal{S}})\,\mathbb{P}({\rm d}w_{\mathcal{S}})\\
&\leq\int_{\mathsf{W}_{\mathcal{S}}}\sup_{\widehat{z}_v\in\mathsf{Z}}\mathbb{P}\big[d(Z,\widehat{z}_v)\leq\varepsilon\,\big|\,W_{\mathcal{S}}=w_{\mathcal{S}}\big]\,\mathbb{P}({\rm d}w_{\mathcal{S}})\\
&=\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]. \qquad (12)
\end{aligned}
\]

Consequently,

\[
\begin{aligned}
I(Z;\widehat{Z}_v|W_{\mathcal{S}})&=D(\mathbb{P}\|\mathbb{Q})\\
&\overset{\rm(a)}{\geq}d_2(\mathbb{P}[\Upsilon=1]\,\|\,\mathbb{Q}[\Upsilon=1])\\
&\overset{\rm(b)}{\geq}\mathbb{P}[\Upsilon=1]\log\frac{1}{\mathbb{Q}[\Upsilon=1]}-h_2(\mathbb{P}[\Upsilon=1])\\
&\overset{\rm(c)}{\geq}(1-\delta)\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]}-h_2(\delta),
\end{aligned}
\]

where

(a) follows from the data processing inequality for divergence, where $d_2(p\|q)\triangleq p\log(p/q)+(1-p)\log((1-p)/(1-q))$ is the binary divergence function;

(b) follows from the fact that $d_2(p\|q)\geq p\log(1/q)-h_2(p)$;

(c) follows from the facts that $\mathbb{P}[\Upsilon=1]\geq 1-\delta\geq 1/2$ by (10), and $\mathbb{Q}[\Upsilon=1]\leq\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]$ by (12).

For a fixed $\varepsilon$, Lemma 1 captures the intuition that the more spread out the conditional distribution $\mathbb{P}_{Z|W_{\mathcal{S}}}$ is, the more information we need about $Z$ to achieve the required accuracy; similarly, for a fixed $\mathbb{P}_{Z|W_{\mathcal{S}}}$, the smaller the accuracy parameter $\varepsilon$, the more information is necessary. In Section IV, we provide explicit expressions and upper bounds for the conditional small ball probability $L(w_{\mathcal{S}},\varepsilon)$ in the context of computing linear functions of real-valued r.v.'s with absolutely continuous probability distributions. We show that, in such cases, $L(w_{\mathcal{S}},\varepsilon)=O(\varepsilon)$, which implies that the lower bound of Lemma 1 grows at least as fast as $\log(1/\varepsilon)$ in the high-accuracy limit $\varepsilon\to 0$.
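A quick Monte Carlo check of this $O(\varepsilon)$ scaling, for the hypothetical case $Z\sim N(0,1)$ with absolute distortion, is sketched below; it is an illustration of the claim, not part of the paper's development.

```python
import numpy as np

rng = np.random.default_rng(1)

# Z ~ N(0,1), d(z, zhat) = |z - zhat|, so L(eps) = sup_z P(|Z - z| <= eps).
z_samples = rng.standard_normal(1_000_000)

def small_ball_est(eps, centers=np.linspace(-3, 3, 61)):
    # crude sup over a grid of candidate centers z
    return max(np.mean(np.abs(z_samples - z) <= eps) for z in centers)

for eps in (0.4, 0.2, 0.1, 0.05):
    print(eps, small_ball_est(eps))
# Halving eps roughly halves the estimate, consistent with L(eps) = O(eps),
# so the bound of Lemma 1 grows like log(1/eps) as eps -> 0.
```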

II-B Upper bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ via cutset capacity

Our first upper bound involves the cutset capacity $C_{\mathcal{S}}$, defined as

\[
C_{\mathcal{S}}\triangleq\sum_{e\in\mathcal{E}_{\mathcal{S}}}C_e.
\]

Here, $C_e$ denotes the Shannon capacity of the channel $K_e$.
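As an illustration, the sketch below computes $C_{\mathcal{S}}$ for the four-node network of Fig. 1 under an assumed assignment of BSC crossover probabilities to the edges; the figure itself does not specify the channels, so both the topology details and the numbers are hypothetical.

```python
import math

def bsc_capacity(p):
    """Shannon capacity of a BSC(p), in bits."""
    h2 = 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)
    return 1.0 - h2

# Hypothetical channel assignment for the four-node network of Fig. 1:
# each directed edge is a BSC with the crossover probability listed below.
edges = {(1, 2): 0.1, (2, 1): 0.1, (2, 3): 0.2, (3, 2): 0.2,
         (3, 4): 0.1, (4, 3): 0.1, (1, 4): 0.3, (4, 1): 0.3}

def cutset_capacity(S):
    """C_S = sum of capacities of edges (u, v) with u outside S and v in S."""
    return sum(bsc_capacity(p) for (u, v), p in edges.items()
               if u not in S and v in S)

print(cutset_capacity({2, 3}))   # cutset {(1,2), (4,3)}, as in Fig. 1
```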

Lemma 2.

For any set $\mathcal{S}\subset\mathcal{V}$, let $\widehat{Z}_{\mathcal{S}}\triangleq(\widehat{Z}_v)_{v\in\mathcal{S}}$. Then, for any $T$-step algorithm $\mathcal{A}$ and for any $v\in\mathcal{S}$,

\[
I(Z;\widehat{Z}_v|W_{\mathcal{S}})\leq I(Z;\widehat{Z}_{\mathcal{S}}|W_{\mathcal{S}})\leq TC_{\mathcal{S}}.
\]

Proof:

The first inequality follows from the data processing inequality for mutual information. The second inequality has been obtained in [1] and [2] as well, but the proof in [1] relies heavily on differential entropy. Our proof is more general, as it only uses the properties of mutual information.

For a set of nodes $\mathcal{S}\subset\mathcal{V}$, let $X_{\mathcal{S},t}\triangleq(X_{v,t})_{v\in\mathcal{S}}$ and $Y_{\mathcal{S},t}\triangleq(Y_{v,t})_{v\in\mathcal{S}}$. For two subsets $\mathcal{S}_1$ and $\mathcal{S}_2$ of $\mathcal{V}$, define $X_{(\mathcal{S}_1,\mathcal{S}_2),t}\triangleq\big(X_{(u,v),t}:u\in\mathcal{S}_1,v\in\mathcal{S}_2,(u,v)\in\mathcal{E}\big)$ as the messages sent from nodes in $\mathcal{S}_1$ to nodes in $\mathcal{S}_2$ at step $t$, and $Y_{(\mathcal{S}_1,\mathcal{S}_2),t}\triangleq\big(Y_{(u,v),t}:u\in\mathcal{S}_1,v\in\mathcal{S}_2,(u,v)\in\mathcal{E}\big)$ as the messages received by nodes in $\mathcal{S}_2$ from nodes in $\mathcal{S}_1$ at step $t$. We will be using this notation in the proofs that follow as well.

If $T=0$, then for any $v\in\mathcal{S}$, $\widehat{Z}_v=\psi_v(W_v)$; hence $I(Z;\widehat{Z}_{\mathcal{S}}|W_{\mathcal{S}})\leq I(Z;W_{\mathcal{S}}|W_{\mathcal{S}})=0$. For $T\geq 1$, we start with the following chain of inequalities:

\[
\begin{aligned}
I(Z;\widehat{Z}_{\mathcal{S}}|W_{\mathcal{S}})
&\overset{\rm(a)}{\leq}I(W_{\mathcal{S}},W_{\mathcal{S}^c};W_{\mathcal{S}},Y_{\mathcal{S}}^T|W_{\mathcal{S}})\\
&=I(W_{\mathcal{S}^c};Y_{\mathcal{S}}^T|W_{\mathcal{S}})\\
&=\sum_{t=1}^T I(W_{\mathcal{S}^c};Y_{\mathcal{S},t}|W_{\mathcal{S}},Y_{\mathcal{S}}^{t-1})\\
&\overset{\rm(b)}{=}\sum_{t=1}^T I(W_{\mathcal{S}^c};Y_{\mathcal{S},t}|W_{\mathcal{S}},Y_{\mathcal{S}}^{t-1},X_{\mathcal{S},t})\\
&\leq\sum_{t=1}^T I(W_{\mathcal{S}^c},X_{\mathcal{S}^c,t};Y_{\mathcal{S},t}|W_{\mathcal{S}},Y_{\mathcal{S}}^{t-1},X_{\mathcal{S},t})\\
&=\sum_{t=1}^T\Big(I(X_{\mathcal{S}^c,t};Y_{\mathcal{S},t}|W_{\mathcal{S}},Y_{\mathcal{S}}^{t-1},X_{\mathcal{S},t})
+I(W_{\mathcal{S}^c};Y_{\mathcal{S},t}|W_{\mathcal{S}},Y_{\mathcal{S}}^{t-1},X_{\mathcal{S},t},X_{\mathcal{S}^c,t})\Big)\\
&\overset{\rm(c)}{=}\sum_{t=1}^T I(X_{\mathcal{S}^c,t};Y_{\mathcal{S},t}|W_{\mathcal{S}},Y_{\mathcal{S}}^{t-1},X_{\mathcal{S},t})\\
&\overset{\rm(d)}{\leq}\sum_{t=1}^T I(X_{\mathcal{S}^c,t};Y_{\mathcal{S},t}|X_{\mathcal{S},t}), \qquad (13)
\end{aligned}
\]

where

(a) follows from the data processing inequality, and the fact that $Z=f(W_{\mathcal{S}},W_{\mathcal{S}^c})$ and $\widehat{Z}_v=\psi_v(W_v,Y_v^T)$;

(b) follows from the fact that $X_{v,t}=\varphi_{v,t}(W_v,Y_v^{t-1})$;

(c) follows from the memorylessness of the channels, hence the Markov chain $(W_{\mathcal{S}^c},W_{\mathcal{S}},Y_{\mathcal{S}}^{t-1})\to(X_{\mathcal{S},t},X_{\mathcal{S}^c,t})\to Y_{\mathcal{S},t}$, and the weak union property of conditional independence [24, p. 25];

(d) follows from the Markov chain

\[
(W_{\mathcal{S}},Y_{\mathcal{S}}^{t-1})\to(X_{\mathcal{S},t},X_{\mathcal{S}^c,t})\to Y_{\mathcal{S},t},
\]

together with the fact that, if $X\to(A,B)\to C$ form a Markov chain, then

\[
I(A;C|X,B)\leq I(A;C|B).
\]

To prove this, we expand $I(A,X;C|B)$ in two ways to get

\[
I(A,X;C|B)=I(X;C|B)+I(A;C|X,B)=I(A;C|B)+I(X;C|A,B).
\]

The claim follows because $I(X;C|A,B)=0$.

From now on we drop the step index $t$ and write $X_{\mathcal{S}_1\mathcal{S}_2}$ for $X_{(\mathcal{S}_1,\mathcal{S}_2),t}$ to simplify the notation. Note that $X_{\mathcal{S}}=(X_{\mathcal{S}\mathcal{S}},X_{\mathcal{S}\mathcal{S}^c})$ and $Y_{\mathcal{S}}=(Y_{\mathcal{S}\mathcal{S}},Y_{\mathcal{S}^c\mathcal{S}})$. We have

\[
\begin{aligned}
I(X_{\mathcal{S}^c};Y_{\mathcal{S}}|X_{\mathcal{S}})
&=I(X_{\mathcal{S}^c};Y_{\mathcal{S}^c\mathcal{S}},Y_{\mathcal{S}\mathcal{S}}|X_{\mathcal{S}})\\
&=I(X_{\mathcal{S}^c};Y_{\mathcal{S}^c\mathcal{S}}|X_{\mathcal{S}})+I(X_{\mathcal{S}^c};Y_{\mathcal{S}\mathcal{S}}|X_{\mathcal{S}},Y_{\mathcal{S}^c\mathcal{S}})\\
&\overset{\rm(a)}{=}I(X_{\mathcal{S}^c\mathcal{S}},X_{\mathcal{S}^c\mathcal{S}^c};Y_{\mathcal{S}^c\mathcal{S}}|X_{\mathcal{S}})\\
&=I(X_{\mathcal{S}^c\mathcal{S}};Y_{\mathcal{S}^c\mathcal{S}}|X_{\mathcal{S}})+I(X_{\mathcal{S}^c\mathcal{S}^c};Y_{\mathcal{S}^c\mathcal{S}}|X_{\mathcal{S}},X_{\mathcal{S}^c\mathcal{S}})\\
&\overset{\rm(b)}{\leq}I(X_{\mathcal{S}^c\mathcal{S}};Y_{\mathcal{S}^c\mathcal{S}})\\
&\overset{\rm(c)}{\leq}\sum_{e\in\mathcal{E}_{\mathcal{S}}}C_e, \qquad (14)
\end{aligned}
\]

where

(a) follows from the Markov chain $(X_{\mathcal{S}^c},Y_{\mathcal{S}^c\mathcal{S}})\to X_{\mathcal{S}}\to Y_{\mathcal{S}\mathcal{S}}$ and the weak union property of conditional independence;

(b) follows from the Markov chains $X_{\mathcal{S}}\to X_{\mathcal{S}^c\mathcal{S}}\to Y_{\mathcal{S}^c\mathcal{S}}$ and $(X_{\mathcal{S}^c\mathcal{S}^c},X_{\mathcal{S}})\to X_{\mathcal{S}^c\mathcal{S}}\to Y_{\mathcal{S}^c\mathcal{S}}$, and the weak union property of conditional independence;

(c) follows from the fact that the channels associated with $\mathcal{E}_{\mathcal{S}}$ are independent, and the fact that the capacity of a product channel is at most the sum of the capacities of the constituent channels [25].

Then the statement of Lemma 2 follows from (13) and (14). ∎

II-C Preliminaries on strong data processing inequalities

In Sec. II-D, we will upper-bound $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ using so-called strong data processing inequalities (SDPIs) for discrete channels (cf. [20] and references therein). Here we provide the necessary background. A discrete memoryless channel is specified by a triple $(\mathsf{X},\mathsf{Y},K)$, where $\mathsf{X}$ is the input alphabet, $\mathsf{Y}$ is the output alphabet, and $K=\big(K(y|x)\big)_{(x,y)\in\mathsf{X}\times\mathsf{Y}}$ is the stochastic transition law. We say that the channel $(\mathsf{X},\mathsf{Y},K)$ satisfies an SDPI at input distribution $\mathbb{P}_X$ with constant $c\in[0,1)$ if $D(\mathbb{Q}_Y\|\mathbb{P}_Y)\leq cD(\mathbb{Q}_X\|\mathbb{P}_X)$ for any other input distribution $\mathbb{Q}_X$. Here $\mathbb{P}_Y$ and $\mathbb{Q}_Y$ denote the marginal distributions of the channel output when the input has distribution $\mathbb{P}_X$ and $\mathbb{Q}_X$, respectively. Define the SDPI constant of $K$ as

\[
\eta(K)\triangleq\sup_{\mathbb{P}_X}\sup_{\mathbb{Q}_X\neq\mathbb{P}_X}\frac{D(\mathbb{Q}_Y\|\mathbb{P}_Y)}{D(\mathbb{Q}_X\|\mathbb{P}_X)}.
\]

The SDPI constants of some common discrete channels have closed-form expressions. For example, for a binary symmetric channel (BSC) with crossover probability $p$, $\eta({\rm BSC}(p))=(1-2p)^2$ [26], and for a binary erasure channel (BEC) with erasure probability $p$, $\eta({\rm BEC}(p))=1-p$. It can be shown that $\eta(K)$ is also the maximum mutual information contraction ratio in a Markov chain $U\to X\to Y$ with $\mathbb{P}_{Y|X}=K$ [27]:

\[
\eta(K)=\sup_{\mathbb{P}_{U,X}}\frac{I(U;Y)}{I(U;X)}
\]

(see [28, App. B] for a proof of this formula in the setting of abstract alphabets). Consequently, for any such Markov chain,

\[
I(U;Y)\leq\eta(K)I(U;X).
\]

This is a stronger result than the ordinary data processing inequality for mutual information, as it quantitatively captures the amount by which the information contracts after passing through a channel. We will also need a conditional version of the SDPI:

Lemma 3.

For any Markov chain $(U,V)\to X\to Y$ with $\mathbb{P}_{Y|X}=K$,

\[
I(U;Y|V)\leq\eta(K)I(U;X|V).
\]

For binary channels, this result was first proved by Evans and Schulman [29, Corollary 1]. A proof for the general case is included in [30, Lemma 2.7]. Finally, we will need a bound on the SDPI constant of a product channel. The tensor product of two channels $(\mathsf{X}_1,\mathsf{Y}_1,K_1)$ and $(\mathsf{X}_2,\mathsf{Y}_2,K_2)$ is a channel $(\mathsf{X}_1\times\mathsf{X}_2,\mathsf{Y}_1\times\mathsf{Y}_2,K_1\otimes K_2)$ with

\[
K_1\otimes K_2(y_1,y_2|x_1,x_2)\triangleq K_1(y_1|x_1)K_2(y_2|x_2)
\]

for all $(x_1,x_2)\in\mathsf{X}_1\times\mathsf{X}_2$, $(y_1,y_2)\in\mathsf{Y}_1\times\mathsf{Y}_2$. The extension to more than two channels is obvious. The following lemma is a special case of Corollary 2 of Polyanskiy and Wu [31], obtained using the method of Evans and Schulman [29]. We give the proof, since we adapt the underlying technique at several points in this paper.

Lemma 4.

For a product channel $K=\bigotimes_{i=1}^m K_i$, if the constituent channels satisfy $\eta(K_i)\leq\eta$ for $i\in\{1,\ldots,m\}$, then

\[
\eta(K)\leq 1-(1-\eta)^m.
\]

Proof:

Let $X^m$ and $Y^m$ be the input and output of the product channel $K=K_1\otimes\ldots\otimes K_m$. Let $U$ be an arbitrary random variable such that $U\to X^m\to Y^m$ form a Markov chain. It suffices to show that

\[
I(U;Y^m)\leq\big(1-(1-\eta)^m\big)I(U;X^m). \tag{15}
\]

From the chain rule,

\[
I(U;Y^m)=I(U;Y^{m-1})+I(U;Y_m|Y^{m-1}).
\]

Since $(U,Y^{m-1})\to X_m\to Y_m$ form a Markov chain, and $\mathbb{P}_{Y_m|X_m}=K_m$, Lemma 3 gives

\[
I(U;Y_m|Y^{m-1})\leq\eta(K_m)I(U;X_m|Y^{m-1})\leq\eta I(U;X_m|Y^{m-1}).
\]

It follows that

\[
\begin{aligned}
I(U;Y^m)&\leq I(U;Y^{m-1})+\eta I(U;X_m|Y^{m-1})\\
&=(1-\eta)I(U;Y^{m-1})+\eta I(U;Y^{m-1},X_m)\\
&\leq(1-\eta)I(U;Y^{m-1})+\eta I(U;X^m),
\end{aligned}
\]

where the last step follows from the ordinary data processing inequality and the Markov chain $U\to X^m\to(Y^{m-1},X_m)$. Unrolling the above recursive upper bound on $I(U;Y^m)$ and noting that $I(U;Y_1)\leq\eta I(U;X_1)$, we get

\[
\begin{aligned}
I(U;Y^m)&\leq(1-\eta)^{m-1}\eta I(U;X_1)+\ldots+(1-\eta)\eta I(U;X^{m-1})+\eta I(U;X^m)\\
&\leq\big((1-\eta)^{m-1}+\ldots+(1-\eta)+1\big)\eta I(U;X^m)\\
&=\big(1-(1-\eta)^m\big)I(U;X^m),
\end{aligned}
\]

which proves (15) and hence Lemma 4. ∎
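A simple numerical sanity check of the contraction-ratio characterization of $\eta(K)$ is sketched below: it randomly samples joint distributions $\mathbb{P}_{U,X}$ for a BSC($p$) and compares the best ratio $I(U;Y)/I(U;X)$ found with the closed form $(1-2p)^2$. This is a crude estimate that approaches the true constant from below, offered only as an illustration of the definition.

```python
import numpy as np

def mi(pxy):
    """Mutual information (nats) of a joint pmf given as a 2-D array."""
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def eta_contraction(K, trials=20000, rng=np.random.default_rng(0)):
    """Crude estimate of eta(K) = sup I(U;Y)/I(U;X) over binary U and
    randomly sampled joint distributions P_{U,X}."""
    best = 0.0
    for _ in range(trials):
        pux = rng.dirichlet(np.ones(2 * K.shape[0])).reshape(2, K.shape[0])
        puy = pux @ K          # joint of (U, Y), since Y | X ~ K
        ix = mi(pux)
        if ix > 1e-9:
            best = max(best, mi(puy) / ix)
    return best

p = 0.3
bsc = np.array([[1 - p, p], [p, 1 - p]])
print(eta_contraction(bsc), (1 - 2 * p) ** 2)  # estimate approaches 0.16 from below
```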

II-D Upper bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ via SDPI

Having the necessary background at hand, we can now state our upper bounds based on SDPI constants. Let $K_v\triangleq\bigotimes_{e\in\mathcal{E}_v}K_e$ be the overall transition law of the channels across the cutset $\mathcal{E}_v$. Define

\[
\eta_v\triangleq\eta(K_v)
\]

as the SDPI constant of $K_v$, and

\[
\eta^*_v\triangleq\max_{e\in\mathcal{E}_v}\eta(K_e)
\]

as the largest SDPI constant among all the channels across $\mathcal{E}_v$. Our second upper bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ involves these SDPI constants and the conditional entropy of $W_{\mathcal{S}^c}$ given $W_{\mathcal{S}}$.

Lemma 5.

For any set $\mathcal{S}\subset\mathcal{V}$, any node $v\in\mathcal{S}$, and any $T$-step algorithm $\mathcal{A}$,

\[
I(Z;\widehat{Z}_v|W_{\mathcal{S}}) \leq \big(1-(1-\eta_v)^T\big)H(W_{\mathcal{S}^c}|W_{\mathcal{S}}) \leq \big(1-(1-\eta^*_v)^{|\mathcal{E}_v|T}\big)H(W_{\mathcal{S}^c}|W_{\mathcal{S}}).
\]
Proof:

We adapt the proof of Lemma 4. For any $v$ and $t$, define the shorthand $X_{v\leftarrow,t}\triangleq X_{(\mathcal{N}_{v\leftarrow},v),t}$. If $T=0$, then for any $v\in\mathcal{S}$, $\widehat{Z}_v=\psi_v(W_v)$; hence $I(Z;\widehat{Z}_v|W_{\mathcal{S}})\leq I(Z;W_v|W_{\mathcal{S}})=0$. If $T\geq 1$, then for any $v\in\mathcal{S}$,

\[
\begin{aligned}
I(Z;\widehat{Z}_v|W_{\mathcal{S}})
&\leq I(W_{\mathcal{S}},W_{\mathcal{S}^c};W_v,Y_v^T|W_{\mathcal{S}})\\
&=I(W_{\mathcal{S}^c};Y_v^T|W_{\mathcal{S}})\\
&=I(W_{\mathcal{S}^c};Y_v^{T-1}|W_{\mathcal{S}})+I(W_{\mathcal{S}^c};Y_{v,T}|W_{\mathcal{S}},Y_v^{T-1})\\
&\overset{\rm(a)}{\leq}I(W_{\mathcal{S}^c};Y_v^{T-1}|W_{\mathcal{S}})+\eta_v I(W_{\mathcal{S}^c};X_{v\leftarrow,T}|W_{\mathcal{S}},Y_v^{T-1})\\
&=(1-\eta_v)I(W_{\mathcal{S}^c};Y_v^{T-1}|W_{\mathcal{S}})+\eta_v I(W_{\mathcal{S}^c};Y_v^{T-1},X_{v\leftarrow,T}|W_{\mathcal{S}}),
\end{aligned}
\]

where (a) follows from the conditional SDPI (Lemma 3) and the fact that $(W_{\mathcal{S}^c},W_{\mathcal{S}},Y_v^{t-1})\to X_{v\leftarrow,t}\to Y_{v,t}$ form a Markov chain for $t\in\{1,\ldots,T\}$. Unrolling the above recursive upper bound on $I(W_{\mathcal{S}^c};Y_v^T|W_{\mathcal{S}})$, and noting that $I(W_{\mathcal{S}^c};Y_{v,1}|W_{\mathcal{S}})\leq\eta_v I(W_{\mathcal{S}^c};X_{v\leftarrow,1}|W_{\mathcal{S}})$, we get

\[
\begin{aligned}
I(W_{\mathcal{S}^c};Y_v^T|W_{\mathcal{S}})
&\leq(1-\eta_v)^{T-1}\eta_v I(W_{\mathcal{S}^c};X_{v\leftarrow,1}|W_{\mathcal{S}})+\ldots
+(1-\eta_v)\eta_v I(W_{\mathcal{S}^c};Y_v^{T-2},X_{v\leftarrow,T-1}|W_{\mathcal{S}})
+\eta_v I(W_{\mathcal{S}^c};Y_v^{T-1},X_{v\leftarrow,T}|W_{\mathcal{S}})\\
&\leq\big((1-\eta_v)^{T-1}+\ldots+(1-\eta_v)+1\big)\eta_v H(W_{\mathcal{S}^c}|W_{\mathcal{S}})\\
&=\big(1-(1-\eta_v)^T\big)H(W_{\mathcal{S}^c}|W_{\mathcal{S}}).
\end{aligned}
\]

The weakened upper bound follows from the fact that $\eta_v\leq 1-(1-\eta^*_v)^{|\mathcal{E}_v|}$, due to Lemma 4. This completes the proof of Lemma 5. ∎

Comparing Lemma 2 and Lemma 5, we note that the upper bound in Lemma 2 captures the communication constraints through the cutset capacity alone, in accordance with the fact that the communication constraints do not depend on $W$ or $Z$. The bound applies when $W$ is either discrete or continuous; however, it grows linearly with $T$. By contrast, the upper bound in Lemma 5 builds on the fact that $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ is upper-bounded by $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$, and goes a step further by capturing the communication constraint through a multiplicative contraction of $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$. It never exceeds $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$ as $T$ increases. However, it is useful only when the conditional entropy $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$ is well-defined and finite (e.g., when $W$ is discrete). We give an explicit comparison of Lemma 2 and Lemma 5 in the following example:

Example 1.

Consider a two-node network, where the nodes are connected by BSCs. The problem is for the two nodes to compute the mod-2 sum of their one-bit observations. Formally, we have $G=(\mathcal{V},\mathcal{E})$ with $\mathcal{V}=\{1,2\}$, $\mathcal{E}=\{(1,2),(2,1)\}$, $K_{(1,2)}=K_{(2,1)}={\rm BSC}(p)$, $W_1$ and $W_2$ are independent ${\rm Bern}(\frac{1}{2})$ r.v.'s, $Z=W_1\oplus W_2$, and $d(z,\widehat{z})=\mathbf{1}\{z\neq\widehat{z}\}$.

Choosing $\mathcal{S}=\{2\}$, Lemma 2 gives

\[
I(Z;\widehat{Z}_2|W_2)\leq(1-h_2(p))T, \tag{16}
\]

whereas Lemma 5, together with the fact that $\eta({\rm BSC}(p))=(1-2p)^2$, gives

\[
I(Z;\widehat{Z}_2|W_2)\leq 1-(4p\bar{p})^T, \tag{17}
\]

where, for $p\in[0,1]$, $\bar{p}\triangleq 1-p$. For this example, the cutset-capacity upper bound is always tighter for small $T$, since

\[
\frac{\partial\big(1-(4p\bar{p})^T\big)}{\partial T}\Big|_{T=0}=\log\frac{1}{4p\bar{p}}\geq 1-h_2(p),\quad p\in[0,1].
\]

Fig. 3 shows the two upper bounds with $p=0.3$: the cutset-capacity upper bound is tighter when $T<5$.

Figure 3: Comparison of the upper bounds in Lemma 2 and Lemma 5 for computing the mod-2 sum in a two-node network.
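The comparison in Fig. 3 can be reproduced directly from (16) and (17); the sketch below tabulates both bounds (in bits) for $p=0.3$ and confirms the crossover at $T=5$.

```python
import math

# Cutset-capacity bound (16) vs. SDPI bound (17) for Example 1, p = 0.3.
p = 0.3
h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
capacity = 1 - h2                      # capacity of BSC(p), bits/use
eta = (1 - 2 * p) ** 2                 # SDPI constant of BSC(p)

for T in range(0, 11):
    cutset_bound = capacity * T                 # (16)
    sdpi_bound = 1 - (4 * p * (1 - p)) ** T     # (17), i.e., 1 - (1 - eta)^T
    print(T, round(cutset_bound, 3), round(sdpi_bound, 3))
# The cutset bound is smaller (tighter) up to T = 4; the SDPI bound, which
# saturates at H(W1) = 1 bit, takes over from T = 5 on.
```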

II-E Lower bounds on computation time

We now proceed to derive lower bounds on the computation time $T(\varepsilon,\delta)$ based on the previously derived lower and upper bounds on the conditional mutual information $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$. Define the shorthand notation

\[
\ell(\mathcal{S},\varepsilon,\delta)\triangleq(1-\delta)\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]}-h_2(\delta), \tag{18}
\]

which is the lower bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ in Lemma 1.

II-E1 Cutset-capacity bounds

Combined with the conditional small ball probability lower bound in Lemma 1, the cutset-capacity upper bound in Lemma 2 leads to a lower bound on $T(\varepsilon,\delta)$:

Theorem 1.

For an arbitrary network, for any $\varepsilon\geq 0$ and $\delta\in[0,1/2]$,

\[
T(\varepsilon,\delta)\geq\max_{\mathcal{S}\subset\mathcal{V}}\frac{\ell(\mathcal{S},\varepsilon,\delta)}{C_{\mathcal{S}}}.
\]

From an operational point of view, the lower bound of Theorem 1 reflects the fact that the problem of distributed function computation is, in a certain sense, a joint source-channel coding (JSCC) problem with possibly noisy feedback. In particular, the lower bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ from Lemma 1, which is used to prove Theorem 1, can be interpreted in terms of a reduction of JSCC to generalized list decoding [32, Sec. III.B]. Given any algorithm $\mathcal{A}$ and any node $v\in\mathcal{V}$, we may construct a "list decoder" as follows: given the estimate $\widehat{Z}_v$, we generate a "list" $\{z\in\mathsf{Z}:d(z,\widehat{Z}_v)\leq\varepsilon\}$. If we fix a set $\mathcal{S}\subset\mathcal{V}$ and allow all the nodes in $\mathcal{S}$ to share their observations $W_{\mathcal{S}}$, then $\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]$ is an upper bound on the $\mathbb{P}_W$-measure of the list of any node $v\in\mathcal{S}$. Therefore, $\ell(\mathcal{S},\varepsilon,\delta)$ is a lower bound on the total amount of information that is necessary for the JSCC problem. The complementary cutset upper bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ bounds the amount of information that can be accumulated with each channel use. The lower bound on $T(\varepsilon,\delta)$ can thus be interpreted as a lower bound on the blocklength of the JSCC problem.

As we will demonstrate in Section IV, based on Theorem 1, it is possible to exploit structural properties of the function $f$ (such as linearity) and of the probability law $\mathbb{P}_W$ (such as log-concavity) to derive lower bounds on the computation time that are often tighter than existing bounds.

II-E2 SDPI bounds

Combining the lower bound of Lemma 1 with the SDPI upper bound of Lemma 5, we get the following:

Theorem 2.

For an arbitrary network, for any ε0\varepsilon\geq 0 and δ[0,1/2]\delta\in[0,1/2],

T(ε,δ)max𝒮𝒱maxv𝒮log(1(𝒮,ε,δ)H(W𝒮c|W𝒮))1|v|log(1ηv)1\displaystyle T(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\max_{v\in{\mathcal{S}}}\frac{\log\big{(}1-\frac{\ell({\mathcal{S}},\varepsilon,\delta)}{H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}})}\big{)}^{-1}}{|{\mathcal{E}}_{v}|\log(1-\eta^{*}_{v})^{-1}} (19)

where ηvmaxevη(Ke)\eta^{*}_{v}\triangleq\max_{e\in{\mathcal{E}}_{v}}\eta(K_{e}).
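For comparison, the sketch below (again an added illustration with hypothetical inputs) evaluates the right-hand side of (19) for one cut 𝒮 and one node v, given ℓ = ℓ(𝒮,ε,δ), H = H(W_{𝒮^c}|W_𝒮), the in-degree |ℰ_v|, and η*_v; the ratio of logarithms makes the choice of base irrelevant.

```python
import math

def sdpi_lower_bound(ell_val, H, in_degree, eta_star):
    """Right-hand side of (19) for a single cut S and a node v in S."""
    numerator = math.log2(1.0 / (1.0 - ell_val / H))
    denominator = in_degree * math.log2(1.0 / (1.0 - eta_star))
    return numerator / denominator

if __name__ == "__main__":
    # hypothetical inputs; requires ell_val < H and 0 < eta_star < 1
    print(sdpi_lower_bound(ell_val=0.8, H=1.0, in_degree=2, eta_star=0.3))
```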

The lower bounds in Theorem 1 and Theorem 2 can behave quite differently. To illustrate this, we compare them in two cases:

When H(W𝒮c|W𝒮)log1𝔼[L(W𝒮,ε)]H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}})\gg\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]}, Theorem 2 gives

T(ε,δ)\displaystyle T(\varepsilon,\delta) max𝒮𝒱maxv𝒮log(1(𝒮,ε,δ)H(W𝒮c|W𝒮))1|v|log(1ηv)1\displaystyle\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\max_{v\in{\mathcal{S}}}\frac{\log\big{(}1-\frac{\ell({\mathcal{S}},\varepsilon,\delta)}{H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}})}\big{)}^{-1}}{|{\mathcal{E}}_{v}|\log(1-\eta^{*}_{v})^{-1}}
max𝒮𝒱maxv𝒮(𝒮,ε,δ)logeH(W𝒮c|W𝒮)|v|log(1ηv)1,\displaystyle\approx\max_{{\mathcal{S}}\subset{\mathcal{V}}}\max_{v\in{\mathcal{S}}}\frac{\ell({\mathcal{S}},\varepsilon,\delta)\log e}{H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}})|{\mathcal{E}}_{v}|\log(1-\eta^{*}_{v})^{-1}},

which has essentially the same dependence on (𝒮,ε,δ)\ell({\mathcal{S}},\varepsilon,\delta) as the lower bound given by Theorem 1. In this case, Theorem 1 gives more useful lower bounds as long as C𝒮H(W𝒮c|W𝒮)C_{\mathcal{S}}\ll H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}}), especially when WW is continuous.

When H(W𝒮c|W𝒮)log1𝔼[L(W𝒮,ε)]H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}})\approx\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]} and δ\delta is small, H(W𝒮c|W𝒮)H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}}) serves as a sharp proxy of (𝒮,ε,δ)\ell({\mathcal{S}},\varepsilon,\delta). Theorem 1 in this case gives

T(ε,δ)max𝒮𝒱(𝒮,ε,δ)C𝒮max𝒮𝒱H(W𝒮c|W𝒮)C𝒮,\displaystyle T(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{\ell({\mathcal{S}},\varepsilon,\delta)}{C_{\mathcal{S}}}\approx\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}})}{C_{\mathcal{S}}},

while Theorem 2 gives

T(ε,δ)\displaystyle T(\varepsilon,\delta) max𝒮𝒱maxv𝒮log(1(𝒮,ε,δ)H(W𝒮c|W𝒮))1|v|log(1ηv)1\displaystyle\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\max_{v\in{\mathcal{S}}}\frac{\log\big{(}1-\frac{\ell({\mathcal{S}},\varepsilon,\delta)}{H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}})}\big{)}^{-1}}{|{\mathcal{E}}_{v}|\log(1-\eta^{*}_{v})^{-1}}
max𝒮𝒱maxv𝒮logH(W𝒮c|W𝒮)+log1h2(δ)|v|log(1ηv)1\displaystyle\approx\max_{{\mathcal{S}}\subset{\mathcal{V}}}\max_{v\in{\mathcal{S}}}\frac{\log{H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}})}+\log\frac{1}{h_{2}(\delta)}}{|{\mathcal{E}}_{v}|\log(1-\eta^{*}_{v})^{-1}}

where in the last step we have used the fact that log(δ+h2(δ)H(W𝒮c|W𝒮))log(h2(δ)H(W𝒮c|W𝒮))\log\left(\delta+\frac{h_{2}(\delta)}{H(W_{{\mathcal{S}}^{c}}|W_{{\mathcal{S}}})}\right)\sim\log\left(\frac{h_{2}(\delta)}{H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}})}\right) as δ0\delta\rightarrow 0. Theorem 1 in this case is sharper in capturing the dependence of T(ε,δ)T(\varepsilon,\delta) on the amount of information contained in ZZ, in that the lower bound is proportional to H(W𝒮c|W𝒮)H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}}), whereas the lower bound given by Theorem 2 depends on H(W𝒮c|W𝒮)H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}}) only through logH(W𝒮c|W𝒮)\log H(W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}}). On the other hand, Theorem 2 in this case is much sharper in capturing the dependence of T(ε,δ)T(\varepsilon,\delta) on the confidence parameter δ\delta, since logh2(δ)\log h_{2}(\delta) grows without bound as δ0\delta\rightarrow 0, while the lower bound given by Theorem 1 remains bounded. We consider two examples for this case.

The first is Example 1 in Section II-D, for the two-node mod-22 sum problem. We have L(w2,ε)=maxz{0,1}[W1W2=z|W2=w2]=12L(w_{2},\varepsilon)=\max_{z\in\{0,1\}}{\mathbb{P}}[W_{1}\oplus W_{2}=z|W_{2}=w_{2}]=\frac{1}{2}, and (𝒮,0,δ)=1δh2(δ)\ell({\mathcal{S}},0,\delta)=1-\delta-h_{2}(\delta). Theorems 1 and 2 imply the following:

Corollary 1.

For the problem in Example 1, for δ[0,1/2]\delta\in[0,1/2], the (0,δ)(0,\delta)-computation time satisfies

T(0,δ)\displaystyle T(0,\delta) max{1δh2(δ)1h2(p),log(δ+h2(δ))1log(4pp¯)1},\displaystyle\geq\max\Big{\{}\frac{1-\delta-h_{2}(\delta)}{1-h_{2}(p)},\frac{\log(\delta+h_{2}(\delta))^{-1}}{\log(4p\bar{p})^{-1}}\Big{\}}, (20)

where the first lower bound is given by Theorem 1, and the second one is given by Theorem 2.

To obtain an achievable upper bound on T(0,δ)T(0,\delta) in Example 1, we consider the algorithm where each node uses a length-TT repetition code to send its one-bit observation to the other node. Using the Chernoff bound, as in [33], it can be shown that the probability of decoding error at each node is upper-bounded by (4pp¯)T/2(4p\bar{p})^{T/2}, and therefore this algorithm achieves accuracy ε=0\varepsilon=0 with confidence parameter δ(4pp¯)T/2\delta\leq(4p\bar{p})^{T/2}. This gives the upper bound

T(0,δ)\displaystyle T(0,\delta) 2logδ1log(4pp¯)1.\displaystyle\leq\frac{2\log\delta^{-1}}{\log(4p\bar{p})^{-1}}. (21)

Comparing (21) with the second lower bound in (20), we see that they asymptotically differ only by a factor of 22 as δ0\delta\rightarrow 0, as limδ0log(δ+h2(δ))/log(δ)=1\lim_{\delta\rightarrow 0}\log(\delta+h_{2}(\delta))/\log(\delta)=1. Thus, for the problem in Example 1, the converse lower bound on T(0,δ)T(0,\delta) obtained from the SDPI closely matches the achievable upper bound on T(0,δ)T(0,\delta).
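This comparison is easy to carry out numerically. The script below (a sketch added here; the values of p and δ are hypothetical) evaluates both lower bounds in (20) and the repetition-code upper bound (21), showing the asymptotic factor-of-2 gap between the SDPI lower bound and the upper bound as δ → 0.

```python
import math

def h2(d):
    if d in (0.0, 1.0):
        return 0.0
    return -d * math.log2(d) - (1 - d) * math.log2(1 - d)

def mod2_bounds(p, delta):
    pbar = 1.0 - p
    cutset_lb = (1 - delta - h2(delta)) / (1 - h2(p))                                   # first bound in (20)
    sdpi_lb = math.log2(1.0 / (delta + h2(delta))) / math.log2(1.0 / (4 * p * pbar))    # second bound in (20)
    repetition_ub = 2 * math.log2(1.0 / delta) / math.log2(1.0 / (4 * p * pbar))        # upper bound (21)
    return cutset_lb, sdpi_lb, repetition_ub

if __name__ == "__main__":
    for delta in (1e-2, 1e-4, 1e-8):
        print(delta, mod2_bounds(p=0.1, delta=delta))
```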

The second example concerns the problem of disseminating all of the observations through an arbitrary network:

Example 2.

Consider the problem where WvW_{v}’s are i.i.d. samples from the uniform distribution over {1,,M}\{1,\ldots,M\}, Z=WZ=W, and d(z,z^)=𝟏{zz^}d(z,{\widehat{z}})=\mathbf{1}\{z\neq{\widehat{z}}\}. In other words, the goal of the nodes is to distribute their observations to all other nodes.

In this example, H(W𝒮c|W𝒮)=|𝒮c|logMH(W_{{\mathcal{S}}^{c}}|W_{{\mathcal{S}}})=|{\mathcal{S}}^{c}|\log M, and (𝒮,0,δ)=(1δ)|𝒮c|logMh2(δ)\ell({\mathcal{S}},0,\delta)=(1-\delta)|{\mathcal{S}}^{c}|\log M-h_{2}(\delta). Following Ayaso et al. [1, Def. III.4], we define the conductance of the network GG as

Φ(G)min𝒮𝒱:|𝒱|/2<|𝒮|<|𝒱|C𝒮|𝒮c|.\Phi(G)\triangleq\min_{{\mathcal{S}}\subset{\mathcal{V}}:|{\mathcal{V}}|/2<|{\mathcal{S}}|<|{\mathcal{V}}|}\frac{C_{{\mathcal{S}}}}{|{\mathcal{S}}^{c}|}.

Then we have the following corollary:

Corollary 2.

For the problem in Example 2, Theorem 1 gives

T(0,δ)\displaystyle T(0,\delta) max𝒮𝒱(1δ)|𝒮c|logMh2(δ)C𝒮\displaystyle\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{(1-\delta)|{\mathcal{S}}^{c}|\log M-h_{2}(\delta)}{C_{\mathcal{S}}} (22)
logMΦ(G)as δ0,\displaystyle\gtrsim\frac{\log M}{\Phi(G)}\qquad\text{as $\delta\rightarrow 0$}, (23)

whereas Theorem 2 gives

T(0,δ)\displaystyle T(0,\delta) max𝒮𝒱maxv𝒮log(|𝒮c|logM)+logh2(δ)1|v|log(1ηv)1\displaystyle\gtrsim\max_{{\mathcal{S}}\subset{\mathcal{V}}}\max_{v\in{\mathcal{S}}}\frac{\log\big{(}|{\mathcal{S}}^{c}|\log M\big{)}+\log h_{2}(\delta)^{-1}}{|{\mathcal{E}}_{v}|\log(1-\eta^{*}_{v})^{-1}} (24)

as δ0\delta\rightarrow 0.

Again, we see that the lower bound obtained from SDPI is much sharper for capturing the dependence of T(0,δ)T(0,\delta) on δ\delta, since logh2(δ)1+\log h_{2}(\delta)^{-1}\to+\infty as δ0\delta\to 0. On the other hand, the lower bound obtained from the cutset capacity upper bound is tighter in its dependence on MM, and can also capture the dependence on the conductance of the network.
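For small networks, the conductance Φ(G) in Corollary 2 can be computed by brute force. The sketch below is an added illustration; it assumes unit-capacity links, so that C_𝒮 simply counts the edges crossing from 𝒮^c into 𝒮, and the 4-node ring is a made-up example.

```python
from itertools import combinations

def conductance(nodes, edges):
    """Brute-force Phi(G): min over |V|/2 < |S| < |V| of C_S / |S^c|,
    with unit capacities, so C_S is the number of edges from S^c into S."""
    n = len(nodes)
    best = float("inf")
    for size in range(n // 2 + 1, n):
        for subset in combinations(nodes, size):
            S = set(subset)
            C_S = sum(1 for (u, v) in edges if u not in S and v in S)
            best = min(best, C_S / (n - size))
    return best

if __name__ == "__main__":
    nodes = [1, 2, 3, 4]   # a made-up 4-node bidirectional ring
    edges = {(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3), (4, 1), (1, 4)}
    print(conductance(nodes, edges))   # prints 2.0 for this example
```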

Finally, we point out that Theorem 1 gives the correct lower bound T(ε,δ)=+T(\varepsilon,\delta)=+\infty when the network graph GG is disconnected (assuming ff depends on the observations of all nodes): If 𝒱{\mathcal{V}} consists of two disconnected components 𝒮{\mathcal{S}} and 𝒮c{\mathcal{S}}^{c}, then C𝒮=0C_{\mathcal{S}}=0, which results in T(ε,δ)=+T(\varepsilon,\delta)=+\infty. Despite the sharp dependence of the lower bounds of Theorems 1 and 2 on ε\varepsilon and δ\delta, they have the same limitation as all previously known bounds obtained via single-cutset arguments: they examine only the flow of information across a cutset 𝒮{\mathcal{E}}_{\mathcal{S}}, but not within 𝒮{\mathcal{S}}; hence they cannot capture the dependence of computation time on the diameter of the network. We address this limitation in the following section.

III Multi-cutset analysis

We now extend the techniques of Section II to a multi-cutset analysis, to address the limitation of the results obtained from the single-cutset analysis. In particular, the new results are able to quantify the dissipation of information as it flows across a succession of cutsets in the network. As briefly sketched in Sec. I-B, we accomplish this by partitioning a general network using multiple disjoint cutsets, such that the operation of any algorithm on the network can be simulated by another algorithm running on a chain of bidirectional noisy links. We then derive tight mutual information upper bounds for such chains, which in turn can be used to lower-bound the computation time for the original network.

III-A Network reduction

Consider an arbitrary network G=(𝒱,)G=({\mathcal{V}},{\mathcal{E}}). If there exists a collection of nested subsets 𝒫1𝒫n1{\mathcal{P}}_{1}\subset\ldots\subset{\mathcal{P}}_{n-1} of 𝒱{\mathcal{V}}, such that the associated cutsets 𝒫1,,𝒫n1{\mathcal{E}}_{{\mathcal{P}}_{1}},\ldots,{\mathcal{E}}_{{\mathcal{P}}_{n-1}} are disjoint, and the cutsets 𝒫1c,,𝒫n1c{\mathcal{E}}_{{\mathcal{P}}_{1}^{c}},\ldots,{\mathcal{E}}_{{\mathcal{P}}_{n-1}^{c}} are also disjoint, then we say that GG is successively partitioned according to 𝒫1,,𝒫n1{\mathcal{P}}_{1},\ldots,{\mathcal{P}}_{n-1} into nn subsets 𝒮1,,𝒮n{\mathcal{S}}_{1},\ldots,{\mathcal{S}}_{n}, where 𝒮i=𝒫i𝒫i1{\mathcal{S}}_{i}={\mathcal{P}}_{i}\setminus{\mathcal{P}}_{i-1}, with 𝒫0{\mathcal{P}}_{0}\triangleq\varnothing and 𝒫n𝒱{\mathcal{P}}_{n}\triangleq{\mathcal{V}}. For i{2,,n}i\in\{2,\ldots,n\}, a node in 𝒮i{\mathcal{S}}_{i} is called a left-bound node of 𝒮i{\mathcal{S}}_{i} if there is an edge from it to a node in 𝒮i1{\mathcal{S}}_{i-1}. The set of left-bound nodes of 𝒮i{\mathcal{S}}_{i} is denoted by 𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}. For 𝒮1{\mathcal{S}}_{1}, define 𝒮1={v}\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{1}=\{v\} for an arbitrary v𝒮1v\in{\mathcal{S}}_{1}. In addition, for i{2,,n}i\in\{2,\ldots,n\}, let

di|𝒫i1c|+|𝒫i|+|{(𝒮i×𝒮i)}|\displaystyle d_{i}\triangleq|{\mathcal{E}}_{{\mathcal{P}}_{i-1}^{c}}|+|{\mathcal{E}}_{{\mathcal{P}}_{i}}|+|\{{\mathcal{E}}\cap({\mathcal{S}}_{i}\times\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i})\}| (25)

be the number of edges entering 𝒮i{\mathcal{S}}_{i} from its neighbors 𝒮i1{\mathcal{S}}_{i-1} and 𝒮i+1{\mathcal{S}}_{i+1}, plus the number of edges entering 𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i} from 𝒮i{\mathcal{S}}_{i} itself. For example, Fig. 2a in Sec. I-B illustrates a successive partition of a six-node network into three subsets 𝒮1={1,4}{\mathcal{S}}_{1}=\{1,4\}, 𝒮2={2,5}{\mathcal{S}}_{2}=\{2,5\} and 𝒮3={3,6}{\mathcal{S}}_{3}=\{3,6\}, with 𝒮1={4}\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{1}=\{4\}, 𝒮2={2}\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{2}=\{2\} and 𝒮3={3,6}\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{3}=\{3,6\}. In addition, d2=5d_{2}=5 and d3=4d_{3}=4. As another example, the network in Fig. 4a, where each undirected edge represents a pair of channels with opposite directions, can be successively partitioned into 𝒮1={1}{\mathcal{S}}_{1}=\{1\}, 𝒮2={2,7}{\mathcal{S}}_{2}=\{2,7\}, 𝒮3={3,6,8,9}{\mathcal{S}}_{3}=\{3,6,8,9\}, 𝒮4={4,10}{\mathcal{S}}_{4}=\{4,10\}, and 𝒮5={5}{\mathcal{S}}_{5}=\{5\}, with 𝒮1={1}\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{1}=\{1\}, 𝒮2={2,7}\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{2}=\{2,7\}, 𝒮3={3,8}\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{3}=\{3,8\}, 𝒮4={4,10}\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{4}=\{4,10\}, and 𝒮5={5}\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{5}=\{5\}. In addition, d2=6d_{2}=6, d3=7d_{3}=7, d4=6d_{4}=6, and d5=2d_{5}=2.
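To fix ideas, here is a small helper (an added sketch, not from the paper) that computes d_i from (25) for a given successive partition; it relies on the fact, established later in this subsection, that in a successive partition edges only run between adjacent subsets or within a subset. The edge set in the example is made up and is not the network of Fig. 2a.

```python
def d_i(edges, parts, i):
    """d_i from (25): parts = [S_1, ..., S_n] as a list of sets, i in {2, ..., n}.
    Edges are directed pairs (u, v); left-bound nodes of S_i have an edge into S_{i-1}."""
    S_i, S_prev = parts[i - 1], parts[i - 2]
    S_next = parts[i] if i < len(parts) else set()
    left_bound = {u for u in S_i if any((u, v) in edges for v in S_prev)}
    from_prev = sum(1 for (u, v) in edges if u in S_prev and v in S_i)    # |E_{P_{i-1}^c}|
    from_next = sum(1 for (u, v) in edges if u in S_next and v in S_i)    # |E_{P_i}|
    internal = sum(1 for (u, v) in edges if u in S_i and v in left_bound)  # edges into left-bound nodes
    return from_prev + from_next + internal

if __name__ == "__main__":
    # made-up 6-node example partitioned into S_1 = {1,4}, S_2 = {2,5}, S_3 = {3,6}
    parts = [{1, 4}, {2, 5}, {3, 6}]
    edges = {(1, 2), (2, 1), (4, 5), (5, 4), (2, 3), (3, 2), (5, 6), (6, 5), (2, 5), (5, 2)}
    print(d_i(edges, parts, 2), d_i(edges, parts, 3))
```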

Figure 4: (a) A successive partition of a network; (b) the chain reduced according to it.

Figure 5: (a) Another successive partition (using the construction in the proof of Lemma 6); (b) the chain reduced according to it.

Formally, a network GG has bidirectional links if, for any pair of nodes u,v𝒱u,v\in{\mathcal{V}}, (u,v)(u,v)\in{\mathcal{E}} if and only if (v,u)(v,u)\in{\mathcal{E}}. A path between uu and vv is a sequence of edges {(vi,vi+1)}i=1k1\{(v_{i},v_{i+1})\}^{k-1}_{i=1}, such that v1=uv_{1}=u and vk=vv_{k}=v (if GG is connected, there is at least one path between any pair of nodes). The graph distance between uu and vv, denoted by dG(u,v)d_{G}(u,v), is the length of a shortest path between uu and vv (shortest paths are not necessarily unique). The diameter of GG is then defined by

diam(G)maxu𝒱maxv𝒱dG(u,v).{\rm diam}(G)\triangleq\max_{u\in{\mathcal{V}}}\max_{v\in{\mathcal{V}}}d_{G}(u,v).

The following lemma states that any such network GG can be successively partitioned into n=diam(G)+1n={\rm diam}(G)+1 subsets:

Lemma 6.

Any network G=(𝒱,)G=({\mathcal{V}},{\mathcal{E}}) with bidirectional links (i.e., (u,v)(u,v)\in{\mathcal{E}} if and only if (v,u)(v,u)\in{\mathcal{E}}) admits a successive partition into subsets 𝒮1,,𝒮n{\mathcal{S}}_{1},\ldots,{\mathcal{S}}_{n} with n=diam(G)+1n={\rm diam}(G)+1.

Proof:

For any v𝒱v\in{\mathcal{V}} and any r{0:diam(G)}r\in\{0:{\rm diam}(G)\}, we define the sets

𝔹G(v,r){u𝒱:dG(v,u)r}\displaystyle{\mathbb{B}}_{G}(v,r)\triangleq\left\{u\in{\mathcal{V}}:d_{G}(v,u)\leq r\right\}

and

SSG(v,r){u𝒱:dG(v,u)=r},\displaystyle\SS_{G}(v,r)\triangleq\left\{u\in{\mathcal{V}}:d_{G}(v,u)=r\right\},

i.e., the ball and the sphere of radius rr centered at vv. In particular, 𝔹G(v,r)=𝔹G(v,r1)SSG(v,r){\mathbb{B}}_{G}(v,r)={\mathbb{B}}_{G}(v,r-1)\cup\SS_{G}(v,r).

We now construct the desired successive partition. Let n=diam(G)+1n={\rm diam}(G)+1, and pick any pair of nodes v0,v1𝒱v_{0},v_{1}\in{\mathcal{V}} that achieve the maximum in the definition of diam(G){\rm diam}(G). With this, we take

𝒫i=𝔹G(v0,i1),i=1,,n.{\mathcal{P}}_{i}={\mathbb{B}}_{G}(v_{0},i-1),\qquad i=1,\ldots,n.

Clearly, 𝒫1={v0}𝒫2𝒫n=𝒱{\mathcal{P}}_{1}=\{v_{0}\}\subset{\mathcal{P}}_{2}\subset\ldots\subset{\mathcal{P}}_{n}={\mathcal{V}}, and moreover

𝒮i=SSG(v0,i1),i=1,,n.{\mathcal{S}}_{i}=\SS_{G}(v_{0},i-1),\qquad i=1,\ldots,n.

From this construction, we see that

𝒫i={(u,v):u𝒮i+1,v𝒮i}{\mathcal{E}}_{{\mathcal{P}}_{i}}=\left\{(u,v)\in{\mathcal{E}}:u\in{\mathcal{S}}_{i+1},\,v\in{\mathcal{S}}_{i}\right\}

and

𝒫ic={(u,v):u𝒮i,v𝒮i+1}.{\mathcal{E}}_{{\mathcal{P}}^{c}_{i}}=\left\{(u,v)\in{\mathcal{E}}:u\in{\mathcal{S}}_{i},\,v\in{\mathcal{S}}_{i+1}\right\}.

The pairwise disjointness of the cutsets 𝒫i{\mathcal{E}}_{{\mathcal{P}}_{i}}, as well as of the cutsets 𝒫ic{\mathcal{E}}_{{\mathcal{P}}^{c}_{i}}, is immediate. ∎

Remarks:

  • Using the construction underlying the proof, we can also show that, for any two nodes u,vu,v in GG, we can successively partition GG into n=dG(u,v)+1n=d_{G}(u,v)+1 subsets.

  • For the successive partition constructed in the proof, all nodes in 𝒮i{\mathcal{S}}_{i} are left-bound nodes, and did_{i} is the sum of the in-degrees of the nodes in 𝒮i{\mathcal{S}}_{i}.

As an example, Fig. 5a shows the successive partition of the network in Fig. 4a using the construction in the proof, where 𝒮1={1}{\mathcal{S}}_{1}=\{1\}, 𝒮2={2,7}{\mathcal{S}}_{2}=\{2,7\}, 𝒮3={3,8}{\mathcal{S}}_{3}=\{3,8\}, 𝒮4={4,6,9}{\mathcal{S}}_{4}=\{4,6,9\}, 𝒮5={5,10}{\mathcal{S}}_{5}=\{5,10\}, with 𝒮i=𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}={\mathcal{S}}_{i}, i{1,,5}i\in\{1,\ldots,5\}, and d2=6d_{2}=6, d3=6d_{3}=6, d4=9d_{4}=9, and d5=5d_{5}=5.
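The construction in the proof of Lemma 6 amounts to a breadth-first search from an endpoint of a diameter-achieving pair. A minimal sketch (added here; the adjacency list is a made-up graph with bidirectional links) is given below.

```python
from collections import deque

def bfs_layers(adj, v0):
    """Return [S_1, S_2, ...], where S_i is the sphere of radius i-1 around v0."""
    dist = {v0: 0}
    queue = deque([v0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    layers = [set() for _ in range(max(dist.values()) + 1)]
    for v, d in dist.items():
        layers[d].add(v)
    return layers

if __name__ == "__main__":
    # made-up connected graph with bidirectional links; node 1 is an endpoint of a
    # diameter-achieving pair, so the number of layers equals diam(G) + 1
    adj = {1: [2], 2: [1, 3, 6], 3: [2, 4], 4: [3, 5], 5: [4], 6: [2]}
    print(bfs_layers(adj, 1))   # [{1}, {2}, {3, 6}, {4}, {5}]
```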

The successive partition of GG ensures that nodes in 𝒮i{\mathcal{S}}_{i} only communicate with nodes in 𝒮i1{\mathcal{S}}_{i-1} and 𝒮i+1{\mathcal{S}}_{i+1}, as well as among themselves. Indeed, suppose that the network graph GG includes an edge e=(u,v)e=(u,v)\in{\mathcal{E}} with u𝒮iu\in{\mathcal{S}}_{i} and v𝒮jv\in{\mathcal{S}}_{j}, where i>j+1i>j+1. By construction of the successive partition, u𝒫j+1c𝒫jcu\in{\mathcal{P}}^{c}_{j+1}\subset{\mathcal{P}}^{c}_{j} and v𝒫j𝒫j+1v\in{\mathcal{P}}_{j}\subset{\mathcal{P}}_{j+1}. Therefore, ee belongs to both 𝒫j{\mathcal{E}}_{{\mathcal{P}}_{j}} and 𝒫j+1{\mathcal{E}}_{{\mathcal{P}}_{j+1}}. However, the cutsets 𝒫j{\mathcal{E}}_{{\mathcal{P}}_{j}} and 𝒫j+1{\mathcal{E}}_{{\mathcal{P}}_{j+1}} are disjoint, so we arrive at a contradiction. Likewise, we can use the disjointness of the cutsets 𝒫ic{\mathcal{E}}_{{\mathcal{P}}^{c}_{i}} and 𝒫jc{\mathcal{E}}_{{\mathcal{P}}^{c}_{j}} to show that the network graph contains no edges (u,v)(u,v) with u𝒮iu\in{\mathcal{S}}_{i}, v𝒮jv\in{\mathcal{S}}_{j}, and j>i+1j>i+1.

In view of this, we can associate to the partition {𝒮i}\{{\mathcal{S}}_{i}\} a bidirected chain G=(𝒱,)G^{\prime}=({\mathcal{V}}^{\prime},{\mathcal{E}}^{\prime}), i.e., a network with vertex set 𝒱={1,,n}{\mathcal{V}}^{\prime}=\{1^{\prime},\ldots,n^{\prime}\}, edge set

={(i,(i1))}i=2n{(i,(i+1))}i=1n1{(i,i)}i=1n,\displaystyle{\mathcal{E}}^{\prime}=\big{\{}(i^{\prime},(i-1)^{\prime})\big{\}}_{i=2}^{n}\cup\big{\{}(i^{\prime},(i+1)^{\prime})\big{\}}_{i=1}^{n-1}\cup\big{\{}(i^{\prime},i^{\prime})\big{\}}_{i=1}^{n},

and channel transition laws

K(i,(i1))\displaystyle K_{(i^{\prime},(i-1)^{\prime})} =(u,v):u𝒮i,v𝒮i1K(u,v)\displaystyle=\bigotimes_{(u,v)\in{\mathcal{E}}:u\in{\mathcal{S}}_{i},v\in{\mathcal{S}}_{i-1}}K_{(u,v)} (26)
K(i,(i+1))\displaystyle K_{(i^{\prime},(i+1)^{\prime})} =(u,v):u𝒮i,v𝒮i+1K(u,v)\displaystyle=\bigotimes_{(u,v)\in{\mathcal{E}}:u\in{\mathcal{S}}_{i},v\in{\mathcal{S}}_{i+1}}K_{(u,v)} (27)
K(i,i)\displaystyle K_{(i^{\prime},i^{\prime})} =(u,v):u𝒮i,v𝒮iK(u,v),\displaystyle=\bigotimes_{(u,v)\in{\mathcal{E}}:u\in{\mathcal{S}}_{i},v\in\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}}K_{(u,v)}, (28)

where node ii^{\prime} in GG^{\prime} observes

Wi=W𝒮i.\displaystyle W_{i^{\prime}}=W_{{\mathcal{S}}_{i}}.

In other words, the subset 𝒮i{\mathcal{S}}_{i} in GG is reduced to node ii^{\prime} in GG^{\prime}; the channels across the subsets in GG are reduced to the channels between the nodes in GG^{\prime}; and the channels from 𝒮i{\mathcal{S}}_{i} to 𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i} in GG are reduced to a self-loop at node ii^{\prime} in GG^{\prime}. The channels from 𝒮i{\mathcal{S}}_{i} to 𝒮i𝒮i{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i} in GG are not included in GG^{\prime}, and will be simulated by node ii^{\prime} using private randomness. For the network in Fig. 2a in Sec. I-B, according to the illustrated partition, it can be reduced to a 33-node bidirected chain in Fig. 2b, with K(1,1)=K(1,4)K_{(1^{\prime},1^{\prime})}=K_{(1,4)}, K(2,2)=K(5,2)K_{(2^{\prime},2^{\prime})}=K_{(5,2)}, and K(3,3)=K(3,6)K(6,3)K_{(3^{\prime},3^{\prime})}=K_{(3,6)}\otimes K_{(6,3)}. For the network in Fig. 4a, according to the illustrated partition, it can be reduced to a 55-node bidirected chain in Fig. 4b, with K(2,2)=K(2,7)K(7,2)K_{(2^{\prime},2^{\prime})}=K_{(2,7)}\otimes K_{(7,2)}, K(3,3)=K(6,3)K(6,8)K(9,8)K_{(3^{\prime},3^{\prime})}=K_{(6,3)}\otimes K_{(6,8)}\otimes K_{(9,8)}, and K(4,4)=K(4,10)K(10,4)K_{(4^{\prime},4^{\prime})}=K_{(4,10)}\otimes K_{(10,4)}. According to the partition in Fig. 5a, the same network can be reduced to a 55-node bidirected chain in Fig. 5b, with K(2,2)=K(2,7)K(7,2)K_{(2^{\prime},2^{\prime})}=K_{(2,7)}\otimes K_{(7,2)}, K(4,4)=K(6,9)K(9,6)K_{(4^{\prime},4^{\prime})}=K_{(6,9)}\otimes K_{(9,6)}, and K(5,5)=K(5,10)K(10,5)K_{(5^{\prime},5^{\prime})}=K_{(5,10)}\otimes K_{(10,5)}.

For the bidirected chain GG^{\prime} reduced from GG, we consider a class of randomized TT-step algorithms that run on GG^{\prime} and are of a more general form compared to the deterministic algorithms considered so far. Such a randomized algorithm operates as follows: at step t{1,,T}t\in\{1,\ldots,T\}, node ii^{\prime} computes the outgoing messages X(i,(i1)),t=φi,t(Wi,Yit1)X_{(i^{\prime},(i-1)^{\prime}),t}=\overset{{}_{\leftarrow}}{\varphi}_{i^{\prime},t}(W_{i^{\prime}},Y^{t-1}_{i^{\prime}}), X(i,(i+1)),t=φi,t(Wi,Yit1,Uit1)X_{(i^{\prime},(i+1)^{\prime}),t}=\overset{{}_{\rightarrow}}{\varphi}_{i^{\prime},t}(W_{i^{\prime}},Y^{t-1}_{i^{\prime}},U^{t-1}_{i^{\prime}}), and X(i,i),t=φ̊i,t(Wi,Yit1,Uit1)X_{(i^{\prime},i^{\prime}),t}=\mathring{\varphi}_{i^{\prime},t}(W_{i^{\prime}},Y^{t-1}_{i^{\prime}},U^{t-1}_{i^{\prime}}), and computes the private message Ui,t=ϑi,t(Wi,Yit1,Uit1,Ri,t)U_{i^{\prime},t}=\vartheta_{i^{\prime},t}(W_{i^{\prime}},Y_{i^{\prime}}^{t-1},U^{t-1}_{i^{\prime}},R_{i^{\prime},t}), where Ri,tR_{i^{\prime},t} is the private randomness held by node ii^{\prime}, uniformly distributed on [0,1][0,1] and independent across i𝒱i^{\prime}\in{\mathcal{V}}^{\prime} and t{1,,T}t\in\{1,\ldots,T\}. At step TT, node ii^{\prime} computes the final estimate Z^i=ψi(Wi,YiT){\widehat{Z}}_{i^{\prime}}=\psi_{i^{\prime}}(W_{i^{\prime}},Y^{T}_{i^{\prime}}) of ZZ. These randomized algorithms have the feature that the message sent to the node on the left and the final estimate of a node are computed solely based on the node’s initial observation and received messages, whereas the messages sent to the node on the right and to itself are computed based on the node’s initial observation, received messages, as well as private messages, and the computation of the private messages involves the node’s private randomness. Define

T(ε,δ)=inf{\displaystyle T^{\prime}(\varepsilon,\delta)=\inf\Big{\{} T: a randomized T-step algorithm 𝒜\displaystyle T\in{\mathbb{N}}:\exists\text{ a randomized $T$-step algorithm }{\mathcal{A}}^{\prime}
such that maxi𝒱[d(Z,Z^i)>ε]δ}\displaystyle\text{ such that }\max_{i^{\prime}\in{\mathcal{V}}^{\prime}}{\mathbb{P}}\big{[}d(Z,{\widehat{Z}}_{i^{\prime}})>\varepsilon\big{]}\!\leq\delta\Big{\}} (29)

as the (ε,δ)(\varepsilon,\delta)-computation time for ZZ on GG^{\prime} using the randomized algorithms described above. The following lemma indicates that we can obtain lower bounds on T(ε,δ)T(\varepsilon,\delta) by lower-bounding T(ε,δ)T^{\prime}(\varepsilon,\delta).

Lemma 7.

Consider an arbitrary network GG that can be successively partitioned into 𝒮1,,𝒮n{\mathcal{S}}_{1},\ldots,{\mathcal{S}}_{n}, such that 𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}’s are all nonempty. Let G=(𝒱,)G^{\prime}=({\mathcal{V}}^{\prime},{\mathcal{E}}^{\prime}) be the bidirected chain constructed from GG according to the partition. Then, given any TT-step algorithm on GG that achieves maxv𝒱[d(Z,Z^v)>ε]δ\max_{v\in{\mathcal{V}}}{\mathbb{P}}[d(Z,{\widehat{Z}}_{v})>\varepsilon]\leq\delta, we can construct a randomized TT-step algorithm 𝒜{\mathcal{A}}^{\prime} on GG^{\prime}, such that maxi𝒱[d(Z,Z^i)>ε]δ\max_{i^{\prime}\in{\mathcal{V}}^{\prime}}{\mathbb{P}}[d(Z,{\widehat{Z}}_{i^{\prime}})>\varepsilon]\leq\delta. Consequently, T(ε,δ)T(\varepsilon,\delta) for computing ZZ on GG is lower bounded by T(ε,δ)T^{\prime}(\varepsilon,\delta) defined in (29).

Proof:

Appendix A. ∎

Remark: In the network reduction, we can alternatively map all the channels from 𝒮i{\mathcal{S}}_{i} to 𝒮i{\mathcal{S}}_{i} (instead of only mapping the channels from 𝒮i{\mathcal{S}}_{i} to 𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}) in the original network GG to the self-loop at node ii^{\prime} of the reduced chain GG^{\prime}. By doing so, to simulate the operation of an algorithm 𝒜{\mathcal{A}} that runs on GG, the algorithm 𝒜{\mathcal{A}}^{\prime} that runs on GG^{\prime} no longer needs to generate private messages using the nodes’ private randomness, since all the channels in GG are preserved in GG^{\prime}. In other words, under this alternative reduction, any TT-step algorithm 𝒜{\mathcal{A}} that runs on GG can be simulated by a TT-step algorithm 𝒜{\mathcal{A}}^{\prime} of the same deterministic type as 𝒜{\mathcal{A}} that runs on GG^{\prime}. However, this alternative reduction increases the information transmission capability of the self-loops in GG^{\prime}, and will result in a looser lower bound on T(ε,δ)T(\varepsilon,\delta), as will be discussed in the remark following Theorem 3.

In light of Lemma 7, in order to lower-bound T(ε,δ)T(\varepsilon,\delta) for computing ZZ on GG, we just need to lower-bound T(ε,δ)T^{\prime}(\varepsilon,\delta) defined in (29). To this end, we derive upper bounds on the conditional mutual information for bidirected chains by extending the techniques behind Lemma 2 and Lemma 5:

Lemma 8.

Consider an nn-node bidirected chain with vertex set 𝒱={1,,n}{\mathcal{V}}=\{1,\ldots,n\} and edge set

={(i,i1)}i=2n{(i,i+1)}i=1n1{(i,i)}i=1n,\displaystyle{\mathcal{E}}=\big{\{}(i,i-1)\big{\}}_{i=2}^{n}\cup\big{\{}(i,i+1)\big{\}}_{i=1}^{n-1}\cup\big{\{}(i,i)\big{\}}_{i=1}^{n},

and an arbitrary randomized TT-step algorithm 𝒜{\mathcal{A}}^{\prime} that runs on this chain. Let ηiη(Ki)\eta_{i}\triangleq\eta(K_{i}) denote the SDPI constant of the channel Kij:(j,i)K(j,i)K_{i}\triangleq\bigotimes_{j:\,(j,i)\in{\mathcal{E}}}K_{(j,i)}, and let ηmaxi=1,,nηi\eta\triangleq\max_{i=1,\ldots,n}\eta_{i}. If Tn2T\leq n-2, then

I(Z;Z^n|W2:n)=0.\displaystyle I(Z;{\widehat{Z}}_{n}|W_{2:n})=0.

If Tn1T\geq n-1, then

I(Z;Z^n|W2:n)\displaystyle I(Z;{\widehat{Z}}_{n}|W_{2:n})\leq
H(W1|W2:n)ηi=1Tn+2(Ti,n2,η),\displaystyle H(W_{1}|W_{2:n})\eta\sum_{i=1}^{T-n+2}{\mathcal{B}}(T-i,n-2,\eta), ​​​​​​​​​ n2n\geq 2\qquad (30)
C(1,2)ηi=1Tn+2(Ti1,n3,η)i,\displaystyle C_{(1,2)}\eta\sum_{i=1}^{T-n+2}{\mathcal{B}}(T-i-1,n-3,\eta)i, ​​​​​​​​​ n3n\geq 3\qquad (31)

with (m,k,p)(mk)pk(1p)mk{\mathcal{B}}(m,k,p)\triangleq{m\choose k}p^{k}(1-p)^{m-k}. For n2n\geq 2, the above upper bounds can be weakened to

I(Z;Z^n|W2:n)\displaystyle\quad I(Z;{\widehat{Z}}_{n}|W_{2:n})\leq
H(W1|W2:n)(1(1η)Tn+2)n1,\displaystyle\!\!\!H(W_{1}|W_{2:n})\big{(}1-(1-\eta)^{T-n+2}\big{)}^{n-1}, (32)
C(1,2)(Tn+2)(1(1η)Tn+2)n2.\displaystyle\!\!\!C_{(1,2)}(T-n+2)\big{(}1-(1-\eta)^{T-n+2}\big{)}^{n-2}. (33)

Moreover, if n4n\geq 4 and

n1T2+(n3)γηn-1\leq T\leq 2+\frac{(n-3)\gamma}{\eta}

for some γ(0,1)\gamma\in(0,1), then

I(Z;Z^n|W2:n)\displaystyle I(Z;{\widehat{Z}}_{n}|W_{2:n})\leq
C(1,2)(n3)2γ2ηexp(2(ηγη)2(n3)).\displaystyle\qquad C_{(1,2)}\frac{(n-3)^{2}\gamma^{2}}{\eta}\exp\left(-2\left(\frac{\eta}{\gamma}-\eta\right)^{2}(n-3)\right). (34)
Proof:

Appendix B. ∎

Equation (30) is reminiscent of a result of Rajagopalan and Schulman [13] on the evolution of mutual information in broadcasting a bit over a unidirectional chain of BSCs. The result in [13] is obtained by solving a system of recursive inequalities on the mutual information involving suboptimal SDPI constants. Our results apply to chains of general bidirectional links and to the computation of general functions. We arrive at a system of inequalities similar to the one in [13], which can be solved in a similar manner and gives (30) and (31). We also obtain weakened upper bounds in (32) and (33), which show that, for a fixed TT, the conditional mutual information decays at least exponentially fast in nn. The upper bound in (34) provides another weakening of (30) and (31), and shows explicitly the dependence of the upper bound on nn.

Assuming for simplicity that H(W1|W2:n)=1H(W_{1}|W_{2:n})=1, Fig. 6 compares (30) with the weakened upper bound in (32). We can see that the gap can be large when nn is large and TT is much larger than nn. Nevertheless, the weakened upper bounds in (32) and (33) allow us to derive lower bounds on computation time that are non-asymptotic in nn, and explicit in ε\varepsilon, δ\delta, and channel properties.
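The comparison in Fig. 6 can be reproduced numerically. The sketch below (an added illustration) evaluates (30) and its weakened form (32) with H(W_1|W_{2:n}) = 1 for hypothetical values of the chain length n, the SDPI constant η, and the number of steps T.

```python
from math import comb

def binom_pmf(m, k, p):
    """B(m, k, p) = C(m, k) p^k (1 - p)^{m - k}."""
    if k < 0 or k > m:
        return 0.0
    return comb(m, k) * p**k * (1 - p)**(m - k)

def bound_30(T, n, eta, H=1.0):
    """Right-hand side of (30)."""
    return H * eta * sum(binom_pmf(T - i, n - 2, eta) for i in range(1, T - n + 3))

def bound_32(T, n, eta, H=1.0):
    """Right-hand side of the weakened bound (32)."""
    return H * (1 - (1 - eta)**(T - n + 2))**(n - 1)

if __name__ == "__main__":
    n, eta = 10, 0.2           # hypothetical chain length and SDPI constant
    for T in (n - 1, 2 * n, 5 * n, 20 * n):
        print(T, bound_30(T, n, eta), bound_32(T, n, eta))
```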

Figure 6: Upper bound in (30) (solid line) vs. the weakened one in (32) (dashed line) for chains.

III-B Lower bounds on computation time

We now build on the results presented above to obtain lower bounds on T(ε,δ)T(\varepsilon,\delta) by reducing the original problem to function computation over bidirected chains. We first provide the result for an arbitrary network, and then particularize it to several specific topologies (namely, chains, rings, grids, and trees).

III-B1 Lower bound for an arbitrary network

Theorem 3 below contains general lower bounds on computation time for an arbitrary network. The statement of the theorem is somewhat lengthy, but can be parsed as follows: Given an arbitrary connected network with bidirectional links, any reduction of that network to a bidirected chain gives rise to a system of inequalities that must be satisfied by the computation time T(ε,δ)T(\varepsilon,\delta). These inequalities, presented in (35), are nonasymptotic in nature and involve explicitly computable parameters of the network, but cannot be solved in closed form. The first inequality follows from an SDPI-based analysis analogous to Theorem 2, while the second inequality is a cutset bound in the spirit of Theorem 1. Explicit but weaker expressions that lower-bound T(ε,δ)T(\varepsilon,\delta) in terms of network parameters appear below as (36) and (37), together with asymptotic expressions for large nn (the size of the reduced bidirected chain). Both of these bounds state that T(ε,δ)T(\varepsilon,\delta) is lower-bounded by the size of the bidirected chain plus a correction term that accounts for the effect of channel noise (via channel capacities and SDPI constants). Finally, (38) and (39) provide the precise version of the bound in (8): asymptotically, the computation time T(ε,δ)T(\varepsilon,\delta) scales as Ω(n/η~)\Omega(n/\tilde{\eta}), where η~\tilde{\eta} is the worst-case SDPI constant of the reduced network. By Lemma 6, it is always possible to reduce the network to a bidirected chain of length diam(G)+1{\rm diam}(G)+1, so the main message of Theorem 3 is that the computation time T(ε,δ)T(\varepsilon,\delta) scales at least linearly in the network diameter. Thus, the main advantage of the multi-cutset analysis over the usual single-cutset analysis is that it can capture this dependence on the network diameter.

Theorem 3.

Assume the following:

  • The network graph G=(𝒱,)G=({\mathcal{V}},{\mathcal{E}}) is connected, the capacities of all edge links are upper-bounded by CC, and the SDPI constants of edge links are upper-bounded by η\eta.

  • GG admits a successive partition into 𝒮1,,𝒮n{\mathcal{S}}_{1},\ldots,{\mathcal{S}}_{n}, such that 𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}’s are all nonempty.

Let

Δmaxi{2:n}di\Delta\triangleq\max_{i\in\{2:n\}}d_{i}

where

di=|𝒫i1c|+|𝒫i|+|{(𝒮i×𝒮i)}|d_{i}=|{\mathcal{E}}_{{\mathcal{P}}_{i-1}^{c}}|+|{\mathcal{E}}_{{\mathcal{P}}_{i}}|+|\{{\mathcal{E}}\cap({\mathcal{S}}_{i}\times\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i})\}|

as defined in (25), and let

η~=1(1η)Δ.\tilde{\eta}=1-(1-\eta)^{\Delta}.

Then for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2], the (ε,δ)(\varepsilon,\delta)-computation time T(ε,δ)T(\varepsilon,\delta) must satisfy the inequalities

(𝒮1c,ε,δ)\displaystyle\ell({\mathcal{S}}_{1}^{c},\varepsilon,\delta)\leq
{H(W𝒮1|W𝒮1c)η~i=1T(ε,δ)n+2(T(ε,δ)i,n2,η~),n2C𝒮1cη~i=1T(ε,δ)n+2(T(ε,δ)i1,n3,η~)i,n3.\displaystyle\begin{cases}H(W_{{\mathcal{S}}_{1}}|W_{{\mathcal{S}}_{1}^{c}})\tilde{\eta}\displaystyle\sum\limits_{i=1}^{T(\varepsilon,\delta)-n+2}{\mathcal{B}}(T(\varepsilon,\delta)-i,n-2,\tilde{\eta}),&\!\!n\geq 2\\ C_{{\mathcal{S}}_{1}^{c}}\tilde{\eta}\displaystyle\sum\limits_{i=1}^{T(\varepsilon,\delta)-n+2}{\mathcal{B}}(T(\varepsilon,\delta)-i-1,n-3,\tilde{\eta})i,&\!\!n\geq 3.\end{cases} (35)

The above results can be weakened to

T(ε,δ)\displaystyle T(\varepsilon,\delta) log(1((𝒮1c,ε,δ)H(W𝒮1|W𝒮1c))1n1)1Δlog(1η)1+n2\displaystyle\geq\frac{\log\left(1-\big{(}\frac{\ell({\mathcal{S}}_{1}^{c},\varepsilon,\delta)}{H(W_{{\mathcal{S}}_{1}}|W_{{\mathcal{S}}_{1}^{c}})}\big{)}^{\frac{1}{n-1}}\right)^{-1}}{\Delta\log(1-\eta)^{-1}}+n-2 (36)
log(n1)+log(1(𝒮1c,ε,δ)H(W𝒮1|W𝒮1c))1Δlog(1η)1+n2,\displaystyle\sim\frac{\log(n-1)+\log\big{(}1-\frac{\ell\left({\mathcal{S}}_{1}^{c},\varepsilon,\delta\right)}{H(W_{{\mathcal{S}}_{1}}|W_{{\mathcal{S}}_{1}^{c}})}\big{)}^{-1}}{\Delta\log(1-\eta)^{-1}}+n-2,

as nn\rightarrow\infty, and

T(ε,δ)(𝒮1c,ε,δ)C𝒮1c+n2.\displaystyle T(\varepsilon,\delta)\geq\frac{\ell\left({\mathcal{S}}_{1}^{c},\varepsilon,\delta\right)}{C_{{\mathcal{S}}_{1}^{c}}}+n-2. (37)

Moreover, if the partition size nn is large enough, so that n4n\geq 4 and

C|𝒱|2(n3)24ηexp(2η2(n3))<(𝒮1c,ε,δ),\displaystyle\frac{C|{\mathcal{V}}|^{2}(n-3)^{2}}{4\eta}\exp\left(-2\eta^{2}(n-3)\right)<\ell({\mathcal{S}}^{c}_{1},\varepsilon,\delta), (38)

then

T(ε,δ)>2+n32η~2+n32Δη.\displaystyle T(\varepsilon,\delta)>2+\frac{n-3}{2\tilde{\eta}}\geq 2+\frac{n-3}{2\Delta\eta}. (39)
Proof:

In light of Lemma 7, it suffices to show that the lower bounds in Theorem 3 need to be satisfied by T(ε,δ)T^{\prime}(\varepsilon,\delta) for the bidirected chain GG^{\prime}, to which GG reduces according to the partition {𝒮i}\{{\mathcal{S}}_{i}\}.

Consider any randomized TT-step algorithm 𝒜{\mathcal{A}}^{\prime} that achieves maxi𝒱[d(Z,Z^i)>ε]δ\max_{i^{\prime}\in{\mathcal{V}}^{\prime}}{\mathbb{P}}[d(Z,{\widehat{Z}}_{i^{\prime}})>\varepsilon]\leq\delta on GG^{\prime}. From Lemma 1,

I(Z;Z^n|W2:n)({2:n},ε,δ).I(Z;{\widehat{Z}}_{n^{\prime}}|W_{2^{\prime}:n^{\prime}})\geq\ell(\{2^{\prime}:n^{\prime}\},\varepsilon,\delta).

Then from Lemma 8 and the fact that

ηi\displaystyle\eta_{i^{\prime}} =η(K((i1),i)K((i+1),i)Ki,i)\displaystyle=\eta(K_{((i-1)^{\prime},i^{\prime})}\otimes K_{((i+1)^{\prime},i^{\prime})}\otimes K_{i^{\prime},i^{\prime}})
1(1η)di\displaystyle\leq 1-(1-\eta)^{d_{i}}
1(1η)Δ,\displaystyle\leq 1-(1-\eta)^{\Delta}, (40)

we have

({2:n},ε,δ)\displaystyle\ell(\{2^{\prime}:n^{\prime}\},\varepsilon,\delta)\leq
{H(W1|W2:n)η~i=1Tn+2(Ti,n2,η~),n2C(1,2)η~i=1Tn+2(Ti1,n3,η~)i,n3,\displaystyle\begin{cases}H(W_{1^{\prime}}|W_{2^{\prime}:n^{\prime}})\tilde{\eta}\displaystyle\sum\limits_{i=1}^{T-n+2}{\mathcal{B}}(T-i,n-2,\tilde{\eta}),&\quad n\geq 2\\ C_{(1^{\prime},2^{\prime})}\tilde{\eta}\displaystyle\sum\limits_{i=1}^{T-n+2}{\mathcal{B}}(T-i-1,n-3,\tilde{\eta})i,&\quad n\geq 3,\end{cases}

and for n2n\geq 2,

({2:n},ε,δ)\displaystyle\ell(\{2^{\prime}:n^{\prime}\},\varepsilon,\delta)\leq
{H(W1|W2:n)i=2n(1(1η)di(Tn+2))C(1,2)(Tn+2)i=3n(1(1η)di(Tn+2)).\displaystyle\begin{cases}H(W_{1^{\prime}}|W_{2^{\prime}:n^{\prime}})\displaystyle\prod\limits_{i=2}^{n}\big{(}1-(1-\eta)^{d_{i}(T-n+2)}\big{)}\\ C_{(1^{\prime},2^{\prime})}(T-n+2)\displaystyle\prod\limits_{i=3}^{n}\big{(}1-(1-\eta)^{d_{i}(T-n+2)}\big{)}.\end{cases} (41)

Since ({2:n},ε,δ)=(𝒮1c,ε,δ)\ell(\{2^{\prime}:n^{\prime}\},\varepsilon,\delta)=\ell({\mathcal{S}}_{1}^{c},\varepsilon,\delta), H(W1|W2:n)=H(W𝒮1|W𝒮1c)H(W_{1^{\prime}}|W_{2^{\prime}:n^{\prime}})=H(W_{{\mathcal{S}}_{1}}|W_{{\mathcal{S}}_{1}^{c}}), and C(1,2)=C𝒮1cC_{(1^{\prime},2^{\prime})}=C_{{\mathcal{S}}_{1}^{c}}, we see that T(ε,δ)T^{\prime}(\varepsilon,\delta) must satisfy (35) in Theorem 3.

Using (40), (41) can be weakened to

(𝒮1c,ε,δ)\displaystyle\ell({\mathcal{S}}_{1}^{c},\varepsilon,\delta)\leq
{H(W𝒮1|W𝒮1c)(1(1η)Δ(Tn+2))n1C𝒮1c(Tn+2)(1(1η)Δ(Tn+2))n2.\displaystyle\begin{cases}H(W_{{\mathcal{S}}_{1}}|W_{{\mathcal{S}}_{1}^{c}})\big{(}1-(1-\eta)^{\Delta(T-n+2)}\big{)}^{n-1}\\ C_{{\mathcal{S}}_{1}^{c}}(T-n+2)\big{(}1-(1-\eta)^{\Delta(T-n+2)}\big{)}^{n-2}\end{cases}. (42)

The first line of (42) leads to

T(ε,δ)\displaystyle T^{\prime}(\varepsilon,\delta) log(1((𝒮1c,ε,δ)H(W𝒮1|W𝒮1c))1n1)1Δlog(1η)1+n2\displaystyle\geq\frac{\log\left(1-\big{(}\frac{\ell({\mathcal{S}}_{1}^{c},\varepsilon,\delta)}{H(W_{{\mathcal{S}}_{1}}|W_{{\mathcal{S}}_{1}^{c}})}\big{)}^{\frac{1}{n-1}}\right)^{-1}}{\Delta\log(1-\eta)^{-1}}+n-2
log(n1)+log(1(𝒮1c,ε,δ)H(W𝒮1|W𝒮1c))1Δlog(1η)1+n2,\displaystyle\sim\frac{\log(n-1)+\log\big{(}1-\frac{\ell\left({\mathcal{S}}_{1}^{c},\varepsilon,\delta\right)}{H(W_{{\mathcal{S}}_{1}}|W_{{\mathcal{S}}_{1}^{c}})}\big{)}^{-1}}{\Delta\log(1-\eta)^{-1}}+n-2,

where the last step follows from the fact that log(1p1n)1logn1p\log\big{(}1-p^{\frac{1}{n}}\big{)}^{-1}\sim\log\frac{n}{1-p} as nn\rightarrow\infty for p(0,1)p\in(0,1). The second line of (42) leads to

T(ε,δ)(𝒮1c,ε,δ)C𝒮1c+n2.\displaystyle T^{\prime}(\varepsilon,\delta)\geq\frac{\ell\left({\mathcal{S}}_{1}^{c},\varepsilon,\delta\right)}{C_{{\mathcal{S}}_{1}^{c}}}+n-2.

Finally, we prove that T(ε,δ)=Ω(n/η~)T^{\prime}(\varepsilon,\delta)=\Omega(n/\tilde{\eta}) under the assumption that (38) holds. Suppose that T(ε,δ)2+(n3)/2η~T^{\prime}(\varepsilon,\delta)\leq 2+(n-3)/2\tilde{\eta}. Then, from (34) in Lemma 8, we have

(𝒮1c,ε,δ)C𝒮1c(n3)24η~exp(2η~2(n3)),if n4.\displaystyle\ell({\mathcal{S}}_{1}^{c},\varepsilon,\delta)\leq C_{{\mathcal{S}}_{1}^{c}}\frac{(n-3)^{2}}{4\tilde{\eta}}\exp\left(-2{\tilde{\eta}}^{2}(n-3)\right),\quad\text{if $n\geq 4$}.

Note that Δ1\Delta\geq 1 by the assumption that GG is connected, thus η~=1(1η)Δη\tilde{\eta}=1-(1-\eta)^{\Delta}\geq\eta. Moreover, C𝒮1cC||C|𝒱|2C_{{\mathcal{S}}_{1}^{c}}\leq C|{\mathcal{E}}|\leq C|{\mathcal{V}}|^{2}. As a result,

(𝒮1c,ε,δ)\displaystyle\ell({\mathcal{S}}_{1}^{c},\varepsilon,\delta) C|𝒱|2(n3)24ηexp(2η2(n3)),if n4,\displaystyle\leq\frac{C|{\mathcal{V}}|^{2}(n-3)^{2}}{4\eta}\exp\left(-2\eta^{2}(n-3)\right),\quad\text{if $n\geq 4$,}

which contradicts the assumption that (38) holds. Thus,

T(ε,δ)>2+n32η~2+n32ΔηT^{\prime}(\varepsilon,\delta)>2+\frac{n-3}{2\tilde{\eta}}\geq 2+\frac{n-3}{2\Delta\eta}

Theorem 3 then follows from Lemma 7. ∎

Remarks:

  • We call a node in 𝒮i{\mathcal{S}}_{i} a boundary node if there is an edge (either inward or outward) between it and a node in 𝒮i1{\mathcal{S}}_{i-1} or 𝒮i+1{\mathcal{S}}_{i+1}. Denote the set of boundary nodes of 𝒮i{\mathcal{S}}_{i} by 𝒮i\partial{\mathcal{S}}_{i}. The results in Theorem 3 can be weakened by replacing did_{i} with

    di=v𝒮i|v|,\displaystyle\partial d_{i}=\sum_{v\in\partial{\mathcal{S}}_{i}}|{\mathcal{E}}_{v}|,

    namely the summation of the in-degrees of boundary nodes of 𝒮i{\mathcal{S}}_{i}, since didid_{i}\leq\partial d_{i} for i{2,,n}i\in\{2,\ldots,n\}.

  • As discussed in the remark following Lemma 7, an alternative network reduction is to map all the channels from 𝒮i{\mathcal{S}}_{i} to 𝒮i{\mathcal{S}}_{i} (instead of only mapping the channels from 𝒮i{\mathcal{S}}_{i} to 𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}) in the original network GG to the self-loop at node ii^{\prime} of the reduced chain GG^{\prime}. Using the same proof strategy with this alternative reduction, we can obtain lower bounds on T(ε,δ)T(\varepsilon,\delta) of the same form as the results in Theorem 3, but with did_{i}’s replaced by

    d~i|𝒫i1c|+|𝒫i|+|{(𝒮i×𝒮i)}|.\tilde{d}_{i}\triangleq|{\mathcal{E}}_{{\mathcal{P}}_{i-1}^{c}}|+|{\mathcal{E}}_{{\mathcal{P}}_{i}}|+|\{{\mathcal{E}}\cap({\mathcal{S}}_{i}\times{\mathcal{S}}_{i})\}|.

    Since didid~id_{i}\leq\partial d_{i}\leq\tilde{d}_{i} for i{2,,n}i\in\{2,\ldots,n\}, the lower bounds on T(ε,δ)T(\varepsilon,\delta) obtained by this alternative network reduction are weaker than the results in Theorem 3, and are even weaker than the results obtained by replacing did_{i}’s with di\partial d_{i}’s.

  • Due to Lemma 6, for a network GG with bidirectional links, we can always find a successive partition of GG such that nn in Theorem 3 is equal to diam(G)+1{\rm diam}(G)+1. By contrast, the diameter cannot be captured in general by the theorems in Section II.

  • Choosing a successive partition of GG with n=2n=2 is equivalent to choosing a single cutset. In that case, we see that (37) recovers Theorem 1, while (36) recovers a weakened version of Theorem 2 (in (36), Δ=d2\Delta=d_{2} is at least the sum of the in-degrees of the left-bound nodes of 𝒮2{\mathcal{S}}_{2}, while Theorem 2 involves the in-degree of only one node in 𝒮2{\mathcal{S}}_{2}).
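As a quick numerical illustration (added here, with made-up parameters), the closed-form bounds (36) and (37) can be evaluated as follows; ell stands for ℓ(𝒮_1^c, ε, δ), H for H(W_{𝒮_1}|W_{𝒮_1^c}), and C_cut for C_{𝒮_1^c}.

```python
import math

def bound_36(ell, H, n, Delta, eta):
    """Right-hand side of (36); the ratio of logarithms makes the base irrelevant."""
    ratio = (ell / H) ** (1.0 / (n - 1))
    return math.log(1.0 / (1.0 - ratio)) / (Delta * math.log(1.0 / (1.0 - eta))) + n - 2

def bound_37(ell, C_cut, n):
    """Right-hand side of (37)."""
    return ell / C_cut + n - 2

if __name__ == "__main__":
    # made-up parameters for a reduction into n = 20 subsets
    print(bound_36(ell=0.9, H=1.0, n=20, Delta=4, eta=0.3))
    print(bound_37(ell=0.9, C_cut=2.0, n=20))
```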

We now apply Theorem 3 to networks with specific topologies. We assume that nodes communicate via bidirectional links. Thus, any such network will be represented by an undirected graph, where each undirected edge represents a pair of channels with opposite directions.

III-B2 Chains

For chains, the proof of Theorem 3 already contains lower bounds on T(ε,δ)T^{\prime}(\varepsilon,\delta). These lower bounds apply to T(ε,δ)T(\varepsilon,\delta) as well, since the class of TT-step algorithms on a chain is a subcollection of randomized TT-step algorithms on the same chain. We thus have the following corollary.

Corollary 3.

Consider an nn-node bidirected chain without self-loops, where the SDPI constants of all channels are upper bounded by η\eta. Then for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2], T(ε,δ)T(\varepsilon,\delta) must satisfy the inequalities in Theorem 3 with 𝒮1={1}{\mathcal{S}}_{1}=\{1\} and di=2d_{i}=2 for all i{1,,n}i\in\{1,\ldots,n\}. In particular, if all channels are BSC(p){\rm BSC}(p), then

T(ε,\displaystyle T(\varepsilon, δ)max{(𝒱{1},ε,δ)1h2(p),\displaystyle\delta)\geq\max\bigg{\{}\frac{\ell\big{(}{\mathcal{V}}\setminus\{1\},\varepsilon,\delta\big{)}}{1-h_{2}(p)},
log(n1)+log(1(𝒱{1},ε,δ)H(W1|W𝒱{1}))12log(4pp¯)1}+n2\displaystyle\frac{\log(n-1)+\log\big{(}1-\frac{\ell\big{(}{\mathcal{V}}\setminus\{1\},\varepsilon,\delta\big{)}}{H(W_{1}|W_{{\mathcal{V}}\setminus\{1\}})}\big{)}^{-1}}{2\log(4p\bar{p})^{-1}}\bigg{\}}+n-2

for all sufficiently large nn.

Here and below, the estimates for a network of bidirectional BSCs are obtained using the bounds (16) and (17).

III-B3 Rings

Consider a ring with 2n22n-2 nodes, where the nodes are labeled clockwise from 11 to 2n22n-2. The diameter is equal to n1n-1. According to the successive partition in the proof of Lemma 6, this ring can be partitioned into 𝒮1={1}{\mathcal{S}}_{1}=\{1\}, 𝒮i={i,2ni}{\mathcal{S}}_{i}=\{i,2n-i\}, i{2,,n1}i\in\{2,\ldots,n-1\}, and 𝒮n={n}{\mathcal{S}}_{n}=\{n\}. As an example, Fig. 7a shows a 66-node ring and Fig. 7b shows the chain reduced from it.

Figure 7: (a) A ring network; (b) the chain reduced from it.

With this partition, we can apply Theorem 3 and get the following corollary.

Corollary 4.

Consider a (2n2)(2n-2)-node ring, where the SDPI constants of all channels are upper bounded by η\eta. Then for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2], T(ε,δ)T(\varepsilon,\delta) must satisfy the inequalities in Theorem 3 with 𝒮1={1}{\mathcal{S}}_{1}=\{1\} and di=4d_{i}=4 for all i{1,,n}i\in\{1,\ldots,n\}. In particular, if all channels are BSC(p){\rm BSC}(p), then

T(ε,δ)\displaystyle T(\varepsilon,\delta) max{(𝒱{1},ε,δ)2(1h2(p)),\displaystyle\geq\max\bigg{\{}\frac{\ell\big{(}{\mathcal{V}}\setminus\{1\},\varepsilon,\delta\big{)}}{2(1-h_{2}(p))},
log(n1)+log(1(𝒱{1},ε,δ)H(W1|W𝒱{1}))14log(4pp¯)1}+n2\displaystyle\frac{\log(n-1)+\log\big{(}1-\frac{\ell\left({\mathcal{V}}\setminus\{1\},\varepsilon,\delta\right)}{H(W_{1}|W_{{\mathcal{V}}\setminus\{1\}})}\big{)}^{-1}}{4\log(4p\bar{p})^{-1}}\bigg{\}}+n-2

for all sufficiently large nn.

III-B4 Grids

Consider an n+12×n+12\frac{n+1}{2}\times\frac{n+1}{2} grid (where we assume nn is odd), which has diameter n1n-1. Figure 8a shows a successive partition of an n+12×n+12\frac{n+1}{2}\times\frac{n+1}{2} grid into n+12\frac{n+1}{2} subsets, with Δ=maxi{2:n}di=2n\Delta=\max_{i\in\{2:n\}}d_{i}=2n. Figure 8b shows the successive partition in the proof of Lemma 6, which partitions the network into nn subsets, with Δ=maxi{2:n}di=2(n1)\Delta=\max_{i\in\{2:n\}}d_{i}=2(n-1), thus resulting in strictly tighter lower bounds on computation time compared to the ones obtained from the partition in Fig. 8a. With the latter partition, we get the following corollary.

Figure 8: Successive partitions (a) and (b) of a 4×4 (n=7) grid network. The length of the labeled path is the diameter of the network.
Corollary 5.

Consider an n+12×n+12\frac{n+1}{2}\times\frac{n+1}{2} grid, where 1n1-\ldots-n is one of the longest paths. Assume that the SDPI constants of all channels are upper bounded by η\eta. Then for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2], T(ε,δ)T(\varepsilon,\delta) must satisfy the inequalities in Theorem 3 with 𝒮1={1}{\mathcal{S}}_{1}=\{1\}, di=dn+1i=4(i2)+6d_{i}=d_{n+1-i}=4(i-2)+6, i{1,,n12}i\in\{1,\ldots,\frac{n-1}{2}\}, and d(n+1)/2=2(n1)d_{{(n+1)}/{2}}=2(n-1). In particular, if all channels are BSC(p){\rm BSC}(p), then

T(ε,\displaystyle T(\varepsilon, δ)max{(𝒱{1},ε,δ)2(1h2(p)),\displaystyle\delta)\geq\max\bigg{\{}\frac{\ell\big{(}{\mathcal{V}}\setminus\{1\},\varepsilon,\delta\big{)}}{2(1-h_{2}(p))},
log(n1)+log(1(𝒱{1},ε,δ)H(W1|W𝒱{1}))12(n1)log(4pp¯)1}+n2\displaystyle\frac{\log(n-1)+\log\big{(}1-\frac{\ell\left({\mathcal{V}}\setminus\{1\},\varepsilon,\delta\right)}{H(W_{1}|W_{{\mathcal{V}}\setminus\{1\}})}\big{)}^{-1}}{2(n-1)\log(4p\bar{p})^{-1}}\bigg{\}}+n-2

for all sufficiently large nn.

III-B5 Trees

Consider a tree, whose nodes are numbered in such a way that 1n1-\ldots-n is one of the longest paths. Then the diameter of the tree is n1n-1, and nodes 11 and nn are necessarily leaf nodes. The tree can be viewed as being rooted at node 11. Let 𝒟i{\mathcal{D}}_{i} be the union of node ii and its descendants in the rooted tree, and let 𝒮i=𝒟i𝒟i+1{\mathcal{S}}_{i}={\mathcal{D}}_{i}\setminus{\mathcal{D}}_{i+1}, i{1,,n}i\in\{1,\ldots,n\}. The tree can then be successively partitioned into 𝒮1,,𝒮n{\mathcal{S}}_{1},\ldots,{\mathcal{S}}_{n}. In the nn-node bidirected chain reduced according to this partition, the edges between nodes ii^{\prime} and (i+1)(i+1)^{\prime} are the pair of channels between nodes ii and i+1i+1 in the tree, and the self-loop of node ii^{\prime}, i{2,,n1}i\in\{2,\ldots,n-1\}, is the channel from 𝒮i{i}{\mathcal{S}}_{i}\setminus\{i\} to node ii in the tree.

Figure 9: Successive partitions (a) and (b) of a tree network.

As an example, Fig. 9a shows this partition of a tree network, and the chain reduced from it has the same form as the one in Fig. 4b. With this partition, we get the following corollary.

Corollary 6.

Consider a dd-regular tree network where 1n1-\ldots-n is one of the longest paths. Assume that the SDPI constants of all channels are upper bounded by η\eta. Then for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2], T(ε,δ)T(\varepsilon,\delta) must satisfy the inequalities in Theorem 3 with 𝒮1={1}{\mathcal{S}}_{1}=\{1\} and di=dd_{i}=d for all i{1,,n}i\in\{1,\ldots,n\}. In particular, if all channels are BSC(p){\rm BSC}(p), then

T(ε,\displaystyle T(\varepsilon, δ)max{(𝒱{1},ε,δ)1h2(p),\displaystyle\delta)\geq\max\bigg{\{}\frac{\ell\big{(}{\mathcal{V}}\setminus\{1\},\varepsilon,\delta\big{)}}{1-h_{2}(p)},
log(n1)+log(1(𝒱{1},ε,δ)H(W1|W𝒱{1}))1dlog(4pp¯)1}+n2\displaystyle\frac{\log(n-1)+\log\big{(}1-\frac{\ell\left({\mathcal{V}}\setminus\{1\},\varepsilon,\delta\right)}{H(W_{1}|W_{{\mathcal{V}}\setminus\{1\}})}\big{)}^{-1}}{d\log(4p\bar{p})^{-1}}\bigg{\}}+n-2

for all sufficiently large nn.

If we use the successive partition in the proof of Lemma 6 on a dd-regular tree with diameter n1n-1, then the tree will be reduced to an nn-node bidirected chain without self-loops. Figure 9b shows such an example. However, with this partition, Δ=maxi{2:n}di\Delta=\max_{i\in\{2:n\}}d_{i} increases with nn, which renders the resulting lower bound on computation time looser than the one in Corollary 6. This means that, although the partition in the proof of Lemma 6 always captures the diameter of a network, it may not always give the best lower bound on computation time among all possible successive partitions.

IV Small ball probability estimates for computation of linear functions

The bounds stated in the preceding sections involve the conditional small ball probability, defined in (9). In this section, we provide estimates for this quantity in the context of a distributed computation problem of wide interest — the computation of linear functions. Specifically, we assume that the observations Wv,v𝒱W_{v},v\in{\mathcal{V}}, are independent real-valued random variables, and the objective is to compute a linear function

Z=f(W)=v𝒱avWv\displaystyle Z=f(W)=\sum_{v\in{\mathcal{V}}}a_{v}W_{v} (43)

for a fixed vector of coefficients (av)v𝒱|𝒱|(a_{v})_{v\in{\mathcal{V}}}\in\mathbb{R}^{|{\mathcal{V}}|}, subject to the absolute error criterion d(z,z^)=|zz^|d(z,{\widehat{z}})=|z-{\widehat{z}}|. We will use the following shorthand notation: for any set 𝒮𝒱{\mathcal{S}}\subset{\mathcal{V}}, let a𝒮=(av)v𝒮a_{\mathcal{S}}=(a_{v})_{v\in{\mathcal{S}}} and a𝒮,W𝒮=v𝒮avWv\langle a_{\mathcal{S}},W_{\mathcal{S}}\rangle=\sum_{v\in{\mathcal{S}}}a_{v}W_{v}.

The independence of the WvW_{v}’s and the additive structure of ff allow us to express the conditional small ball probability L(w𝒮,ε)L(w_{\mathcal{S}},\varepsilon) defined in (9) in terms of so-called Lévy concentration functions of random sums [19]. The Lévy concentration function of a real-valued r.v. UU (also known as the “small ball probability”) is defined as

(U,ρ)=supu[|Uu|ρ],ρ>0.\displaystyle{\mathcal{L}}(U,\rho)=\sup_{u\in\mathbb{R}}{\mathbb{P}}\left[|U-u|\leq\rho\right],\qquad\rho>0. (44)

If we fix a subset 𝒮𝒱{\mathcal{S}}\subset{\mathcal{V}}, and consider a specific realization W𝒮=w𝒮W_{\mathcal{S}}=w_{\mathcal{S}} of the observations of the nodes in 𝒮{\mathcal{S}}, then

L(w𝒮,ε)\displaystyle L(w_{\mathcal{S}},\varepsilon) =supz[|v𝒱avWvz|ε|W𝒮=w𝒮]\displaystyle=\sup_{z\in\mathbb{R}}{\mathbb{P}}\Bigg{[}\Big{|}\sum_{v\in{\mathcal{V}}}a_{v}W_{v}-z\Big{|}\leq\varepsilon\Bigg{|}W_{\mathcal{S}}=w_{\mathcal{S}}\Bigg{]}
=supz[|v𝒮cavWv+v𝒮avwvz|ε]\displaystyle=\sup_{z\in\mathbb{R}}{\mathbb{P}}\left[\Big{|}\sum_{v\in{\mathcal{S}}^{c}}a_{v}W_{v}+\sum_{v\in{\mathcal{S}}}a_{v}w_{v}-z\Big{|}\leq\varepsilon\right]
=supz[|v𝒮cavWvz|ε]\displaystyle=\sup_{z\in\mathbb{R}}{\mathbb{P}}\Bigg{[}\Big{|}\sum_{v\in{\mathcal{S}}^{c}}a_{v}W_{v}-z\Big{|}\leq\varepsilon\Bigg{]}
=(a𝒮c,W𝒮c,ε),\displaystyle={\mathcal{L}}\left(\langle a_{{\mathcal{S}}^{c}},W_{{\mathcal{S}}^{c}}\rangle,\varepsilon\right), (45)

where in the second line we have used the fact that the WvW_{v}’s are independent r.v.’s, while in the third line we have used the fact that for any function g:g:\mathbb{R}\to\mathbb{R} and any aa\in\mathbb{R}, supzg(z)=supzg(z+a)\sup_{z}g(z)=\sup_{z}g(z+a). In other words, for a fixed 𝒮{\mathcal{S}}, the quantity L(w𝒮,ε)L(w_{\mathcal{S}},\varepsilon) is independent of the boundary condition w𝒮w_{\mathcal{S}}, and is controlled by the probability law of the random sum a𝒮c,W𝒮c\langle a_{{\mathcal{S}}^{c}},W_{{\mathcal{S}}^{c}}\rangle, i.e., the part of the function ff that depends on the observations of the nodes in 𝒮c{\mathcal{S}}^{c}.
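A quick Monte Carlo check of (45) (an added sketch, with made-up Gaussian observations and coefficients) is given below; it also compares the empirical value against the closed-form Gaussian estimate used in the next subsection.

```python
import numpy as np

def levy_concentration(samples, rho, grid_size=1000):
    """Crude estimate of sup_u P[|U - u| <= rho] from i.i.d. samples of U."""
    grid = np.linspace(samples.min(), samples.max(), grid_size)
    return max(np.mean(np.abs(samples - u) <= rho) for u in grid)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a_Sc = np.array([1.0, -0.5, 2.0])             # made-up coefficients of the nodes in S^c
    W_Sc = rng.standard_normal((50000, a_Sc.size))
    U = W_Sc @ a_Sc                               # the residual sum <a_{S^c}, W_{S^c}>
    eps = 0.1
    print(levy_concentration(U, eps))
    print(np.sqrt(2 / np.pi) * eps / np.linalg.norm(a_Sc))   # Gaussian small ball estimate
```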

The problem of estimating Lévy concentration functions of sums of independent random variables has a long history in the theory of probability — for random variables with densities, some of the first results go back at least to Kolmogorov [34], while for discrete random variables it is closely related to the so-called Littlewood–Offord problem [35]. We provide a few examples to illustrate how one can exploit available estimates for Lévy concentration functions under various regularity conditions to obtain tight lower bounds on the computation time for linear functions. The examples are illustrated through Theorem 1, as it tightly captures the dependence of computation time on (𝒮,ε,δ)\ell({\mathcal{S}},\varepsilon,\delta). (However, since the results of Theorems 2 and 3 also involve the quantity (𝒮,ε,δ)\ell({\mathcal{S}},\varepsilon,\delta), the estimates for Lévy concentration functions can be applied there as well.)

IV-A Computing linear functions of continuous observations

IV-A1 Gaussian sums

Suppose that the local observations WvW_{v}, v𝒱v\in{\mathcal{V}}, are i.i.d. standard Gaussian random variables. Then, for any 𝒮𝒱{\mathcal{S}}\subseteq{\mathcal{V}}, a𝒮,W𝒮\langle a_{{\mathcal{S}}},W_{{\mathcal{S}}}\rangle is a zero-mean Gaussian r.v. with variance a𝒮22=v𝒮av2\|a_{{\mathcal{S}}}\|^{2}_{2}=\sum_{v\in{\mathcal{S}}}a^{2}_{v} (here, 2\|\cdot\|_{2} is the usual Euclidean 2\ell_{2} norm). A simple calculation shows that

L(w𝒮,ε)\displaystyle L(w_{\mathcal{S}},\varepsilon) =(N(0,a𝒮c22),ε)2πεa𝒮c2.\displaystyle={\mathcal{L}}\left(N\left(0,\|a_{{\mathcal{S}}^{c}}\|^{2}_{2}\right),\varepsilon\right)\leq\sqrt{\frac{2}{\pi}}\frac{\varepsilon}{\|a_{{\mathcal{S}}^{c}}\|_{2}}.

Using this in Theorem 1, we get the following result.

Corollary 7.

For the problem of computing a linear function in (43), where (Wv)i.i.d.N(0,1)(W_{v})\overset{{\rm i.i.d.}}{\sim}N(0,1), suppose that the coefficients ava_{v} are all nonzero. Then for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2],

T(ε,δ)max𝒮𝒱1C𝒮(1δ2logπa𝒮c222ε2h2(δ)).\displaystyle T(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\left(\frac{1-\delta}{2}\log\frac{\pi\|a_{{\mathcal{S}}^{c}}\|^{2}_{2}}{2\varepsilon^{2}}-h_{2}(\delta)\right).

Thus, the lower bound on the computation time for (43) depends on the vector of coefficients aa only through its 2\ell_{2} norm.
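A direct numerical evaluation of the Corollary 7 bound (an added sketch, assuming capacities and logarithms are measured in bits; all inputs are hypothetical) looks as follows.

```python
import math

def h2(d):
    if d in (0.0, 1.0):
        return 0.0
    return -d * math.log2(d) - (1 - d) * math.log2(1 - d)

def corollary7_bound(a_norm_sq, eps, delta, C_S):
    """Corollary 7 lower bound for one cut S, with a_norm_sq = ||a_{S^c}||_2^2."""
    return ((1 - delta) / 2 * math.log2(math.pi * a_norm_sq / (2 * eps**2)) - h2(delta)) / C_S

if __name__ == "__main__":
    print(corollary7_bound(a_norm_sq=100.0, eps=0.01, delta=0.05, C_S=2.0))
```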

IV-A2 Sums of independent r.v.’s with log-concave distributions

Another instance in which sharp bounds on the Lévy concentration function are available is when the observations of the nodes are independent random variables with log-concave distributions (we recall that a real-valued r.v. UU is said to have a log-concave distribution if it has a density of the form pU(u)=eF(u)p_{U}(u)=e^{-F(u)}, where F:(,+]F:\mathbb{R}\to(-\infty,+\infty] is a convex function; this includes Gaussian, Laplace, uniform, etc.). The following result was obtained recently by Bobkov and Chistyakov [36, Theorem 1.1]: Let U1,,UkU_{1},\ldots,U_{k} be independent random variables with log-concave distributions, and let Sk=U1++UkS_{k}=U_{1}+\ldots+U_{k}. Then, for any ρ0\rho\geq 0,

13ρVar(Sk)+ρ2/3(Sk,ρ)2ρVar(Sk)+ρ2/3.\displaystyle\!\!\!\!\frac{1}{\sqrt{3}}\frac{\rho}{\sqrt{{\rm Var}(S_{k})+{\rho^{2}}/{3}}}\leq{\mathcal{L}}(S_{k},\rho)\leq\!\frac{2\rho}{\sqrt{{\rm Var}(S_{k})+{\rho^{2}}/{3}}}. (46)
Corollary 8.

For the problem of computing a linear function in (43), where the WvW_{v}’s are independent random variables with log-concave distributions and with variances at least σ2\sigma^{2}, suppose that the coefficients ava_{v} are all nonzero. Then for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2],

T(ε,δ)max𝒮𝒱1C𝒮(1δ2log(σ2a𝒮c224ε2+112)h2(δ)).\displaystyle\!T(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\left(\frac{1-\delta}{2}\log\left(\frac{\sigma^{2}\|a_{{\mathcal{S}}^{c}}\|^{2}_{2}}{4\varepsilon^{2}}+\frac{1}{12}\right)-h_{2}(\delta)\right).
Proof:

For each v𝒱v\in{\mathcal{V}}, avWva_{v}W_{v} also has a log-concave distribution, and, for any 𝒮𝒱{\mathcal{S}}\subset{\mathcal{V}},

Var(a𝒮c,W𝒮c)=v𝒮c|av|2Var(Wv)a𝒮c22σ2.\displaystyle{\rm Var}(\langle{a_{{\mathcal{S}}^{c}}},W_{{\mathcal{S}}^{c}}\rangle)=\sum_{v\in{\mathcal{S}}^{c}}|a_{v}|^{2}{\rm Var}(W_{v})\geq\|a_{{\mathcal{S}}^{c}}\|^{2}_{2}\sigma^{2}.

The lower bound follows from Theorem 1 and from (46). ∎

IV-A3 Sums of independent r.v.’s with bounded third moments

It is known that random variables with log-concave distributions have bounded moments of any order. Under the much weaker assumption that the local observations WvW_{v}, v𝒱v\in{\mathcal{V}}, have bounded third moments, we can prove the following result.

Corollary 9.

Consider the problem of computing the linear function in (43), where the WvW_{v}’s are independent zero-mean r.v.’s with variances at least 11 and with third moments bounded by BB, and the coefficients ava_{v} satisfy the constraint K1|av|K2K_{1}\leq|a_{v}|\leq K_{2} for some K1,K2>0K_{1},K_{2}>0. Then for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2],

T(ε,δ)max𝒮𝒱1C𝒮(1δ2log|𝒱𝒮|M2(ε)h2(δ))\displaystyle T(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\Bigg{(}\frac{1-\delta}{2}\log\frac{|{\mathcal{V}}\setminus{\mathcal{S}}|}{M^{2}(\varepsilon)}-h_{2}(\delta)\Bigg{)}

where M(ε)c(ε/K1+B(K2/K1)3)M(\varepsilon)\triangleq c\big{(}\varepsilon/K_{1}+B(K_{2}/K_{1})^{3}\big{)} with some absolute constant cc.

Proof:

Under the conditions of the theorem, a small ball estimate due to Rudelson and Vershynin [37, Corollary 2.10] can be used to show that, for any 𝒮𝒱{\mathcal{S}}\subset{\mathcal{V}},

(a𝒮,W𝒮,ε)M(ε)|𝒮|.\displaystyle{\mathcal{L}}(\langle a_{{\mathcal{S}}},W_{{\mathcal{S}}}\rangle,\varepsilon)\leq\frac{M(\varepsilon)}{\sqrt{|{\mathcal{S}}|}}.

The desired conclusion follows immediately. ∎

IV-B Linear vector-valued functions

Similar to the Lévy concentration function of a real-valued random variable, the Lévy concentration function of a random vector UU taking values in n\mathbb{R}^{n} can be defined as

(U,ρ)=supun[Uu2ρ],ρ>0.\displaystyle{\mathcal{L}}(U,\rho)=\sup_{u\in\mathbb{R}^{n}}{\mathbb{P}}\left[\|U-u\|_{2}\leq\rho\right],\qquad\rho>0.

Consider the case where each node observes an independent real-valued random variable WvW_{v}, and the observations form a |𝒱|×1|{\mathcal{V}}|\times 1 vector W𝒱W_{\mathcal{V}}. Suppose the nodes wish to compute a linear transform of W𝒱W_{\mathcal{V}},

Z=AW𝒱\displaystyle Z=AW_{\mathcal{V}} (47)

with some fixed n×|𝒱|n\times|{\mathcal{V}}| matrix AA, subject to the Euclidean-norm distortion criterion d(z,z^)=zz^2d(z,{\widehat{z}})=\|z-{\widehat{z}}\|_{2}. In this case

L(w𝒮,ε)\displaystyle L(w_{\mathcal{S}},\varepsilon) =supzn[AW𝒱z2ε|W𝒮=w𝒮]\displaystyle=\sup_{z\in\mathbb{R}^{n}}{\mathbb{P}}[\|AW_{\mathcal{V}}-z\|_{2}\leq\varepsilon|W_{\mathcal{S}}=w_{\mathcal{S}}]
=supzn[A𝒮cW𝒮c+A𝒮w𝒮z2ε]\displaystyle=\sup_{z\in\mathbb{R}^{n}}{\mathbb{P}}[\|A_{{\mathcal{S}}^{c}}W_{{\mathcal{S}}^{c}}+A_{{\mathcal{S}}}w_{\mathcal{S}}-z\|_{2}\leq\varepsilon]
=supzn[A𝒮cW𝒮cz2ε]\displaystyle=\sup_{z\in\mathbb{R}^{n}}{\mathbb{P}}[\|A_{{\mathcal{S}}^{c}}W_{{\mathcal{S}}^{c}}-z\|_{2}\leq\varepsilon]
=(A𝒮cW𝒮c,ε)\displaystyle={\mathcal{L}}(A_{{\mathcal{S}}^{c}}W_{{\mathcal{S}}^{c}},\varepsilon)

where A𝒮cA_{{\mathcal{S}}^{c}} is the submatrix formed by the columns of AA with indices in 𝒮c{\mathcal{S}}^{c}. We will need the following result, due to Rudelson and Vershynin [38]. Let sj(A𝒮c)s_{j}(A_{{\mathcal{S}}^{c}}), j=1,,min{n,|𝒮c|}j=1,\ldots,\min\{n,|{\mathcal{S}}^{c}|\}, denote the singular values of A𝒮cA_{{\mathcal{S}}^{c}} arranged in non-increasing order, and define the stable rank of A𝒮cA_{{\mathcal{S}}^{c}} by

r(A𝒮c)=A𝒮cHS2A𝒮c2\displaystyle r(A_{{\mathcal{S}}^{c}})=\Bigg{\lfloor}\frac{\|A_{{\mathcal{S}}^{c}}\|_{\rm HS}^{2}}{\|A_{{\mathcal{S}}^{c}}\|^{2}}\Bigg{\rfloor}

where A𝒮cHS=(j=1min{n,|𝒮c|}sj(A𝒮c)2)1/2\|A_{{\mathcal{S}}^{c}}\|_{\rm HS}=\big{(}\sum_{j=1}^{\min\{n,|{\mathcal{S}}^{c}|\}}s_{j}(A_{{\mathcal{S}}^{c}})^{2}\big{)}^{1/2} is the Hilbert-Schmidt norm of A𝒮cA_{{\mathcal{S}}^{c}}, and A𝒮c=s1(A𝒮c)\|A_{{\mathcal{S}}^{c}}\|=s_{1}(A_{{\mathcal{S}}^{c}}) is the spectral norm of A𝒮cA_{{\mathcal{S}}^{c}}. (Note that for any nonzero matrix A𝒮cA_{{\mathcal{S}}^{c}}, 1r(A𝒮c)rank(A𝒮c)1\leq r(A_{{\mathcal{S}}^{c}})\leq{\rm rank}(A_{{\mathcal{S}}^{c}}).) Then, provided

(Wv,ε/A𝒮cHS)p\displaystyle{\mathcal{L}}(W_{v},\varepsilon/\|A_{{\mathcal{S}}^{c}}\|_{\rm HS})\leq p

for all v𝒮cv\in{\mathcal{S}}^{c}, we will have

(A𝒮cW𝒮c,ε)(cp)0.9r(A𝒮c)\displaystyle{\mathcal{L}}(A_{{\mathcal{S}}^{c}}W_{{\mathcal{S}}^{c}},\varepsilon)\leq(cp)^{0.9r(A_{{\mathcal{S}}^{c}})}

where cc is an absolute constant [38, Theorem 1.4]. This result relates the Lévy concentration function of a linear transform of a random vector to the Lévy concentration functions of its individual coordinates. Combining it with Theorem 1, we get a lower bound on T(ε,δ)T(\varepsilon,\delta) for computing linear vector-valued functions.
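The stable rank and the resulting small ball bound are straightforward to compute numerically. The sketch below (an aside) does this for a random matrix, treating the absolute constant c and the per-coordinate level p as given inputs; the numerical values of c and p are placeholders, since [38] does not specify the constant.

import numpy as np

def stable_rank(A):
    # r(A) = floor(||A||_HS^2 / ||A||^2): squared Hilbert-Schmidt norm over squared spectral norm
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.floor(np.sum(s**2) / s[0]**2))

rng = np.random.default_rng(1)
A_Sc = rng.standard_normal((5, 12))   # stand-in for the submatrix A_{S^c}
r = stable_rank(A_Sc)                 # always between 1 and rank(A_Sc)
c, p = 1.0, 0.1                       # placeholder values for the absolute constant c and the level p
print(r, (c * p)**(0.9 * r))          # stable rank and the bound (c p)^{0.9 r(A_{S^c})}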

Corollary 10.

For the problem of computing a linear transform of the observations defined in (47), where WvW_{v}’s are independent real-valued r.v.s, suppose the rows of AA are nonzero vectors. Then for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2],

T(ε,δ)\displaystyle T(\varepsilon,\delta)\geq max𝒮𝒱1C𝒮(0.9(1δ)r(A𝒮c)\displaystyle\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\Bigg{(}0.9(1-\delta)r(A_{{\mathcal{S}}^{c}})
log1cmaxv𝒮c(Wv,ε/A𝒮cHS)h2(δ))\displaystyle\log\frac{1}{c\max_{v\in{\mathcal{S}}^{c}}{\mathcal{L}}(W_{v},\varepsilon/\|A_{{\mathcal{S}}^{c}}\|_{\rm HS})}-h_{2}(\delta)\Bigg{)}

for some absolute constant cc.

IV-C Linear function of discrete observations

Finally, we consider the case where the local observations WvW_{v} have discrete distributions. Specifically, let the WvW_{v}’s be i.i.d. Rademacher random variables, i.e., each WvW_{v} takes values ±1\pm 1 with equal probability. We still use the absolute distortion function d(z,z^)=|zz^|d(z,{\widehat{z}})=|z-{\widehat{z}}| to quantify the estimation error. In this case, the Lévy concentration function (a𝒮,W𝒮,ε){\mathcal{L}}(\langle a_{\mathcal{S}},W_{\mathcal{S}}\rangle,\varepsilon) is highly sensitive to the direction of the vector a𝒮a_{{\mathcal{S}}}, not just to its norm. For example, consider the extreme case where av=|𝒱|a_{v}=|{\mathcal{V}}| for a single node v𝒮v\in{\mathcal{S}} and all other coefficients are zero. Then (a𝒮,W𝒮,0)=(|𝒱|Wv,0)=1/2{\mathcal{L}}(\langle a_{\mathcal{S}},W_{\mathcal{S}}\rangle,0)={\mathcal{L}}(|{\mathcal{V}}|W_{v},0)=1/2. On the other hand, if av=1a_{v}=1 for all v𝒱v\in{\mathcal{V}} and |𝒮||{\mathcal{S}}| is even, then

(a𝒮,W𝒮,0)=2|𝒮|(|𝒮||𝒮|/2)2π|𝒮|as |𝒮|\displaystyle{\mathcal{L}}(\langle a_{{\mathcal{S}}},W_{{\mathcal{S}}}\rangle,0)=2^{-|{\mathcal{S}}|}{|{\mathcal{S}}|\choose|{\mathcal{S}}|/2}\sim\sqrt{\frac{2}{\pi|{\mathcal{S}}|}}\quad\text{as $|{\mathcal{S}}|\rightarrow\infty$}

where the last step is due to Stirling’s approximation. Moreover, a celebrated result due to Littlewood and Offord, improved later by Erdős [39], says that, if |av|1|a_{v}|\geq 1 for all vv, then

(a𝒮,W𝒮,1)2|𝒮|(|𝒮||𝒮|/2)2π|𝒮|as |𝒮|,\displaystyle{\mathcal{L}}(\langle a_{{\mathcal{S}}},W_{{\mathcal{S}}}\rangle,1)\leq 2^{-|{\mathcal{S}}|}{|{\mathcal{S}}|\choose\lfloor|{\mathcal{S}}|/2\rfloor}\sim\sqrt{\frac{2}{\pi|{\mathcal{S}}|}}\quad\text{as $|{\mathcal{S}}|\rightarrow\infty$,}

which translates into a lower bound on the (1,δ)(1,\delta)-computation time that is of the same order as the lower bound on the zero-error computation time.
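The central binomial asymptotics invoked above are easy to verify numerically; the short sketch below (an aside) compares 2^{-k} C(k, k/2) with the Stirling approximation sqrt(2/(pi k)) for a few even values of k.

from math import comb, pi, sqrt

for k in (10, 100, 1000):
    exact = comb(k, k // 2) / 2**k     # 2^{-k} * binomial(k, k/2)
    approx = sqrt(2.0 / (pi * k))      # Stirling approximation
    print(k, exact, approx)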

Corollary 11.

For the problem of computing the linear function in (43), where the WvW_{v}’s are independent Rademacher random variables, suppose that |av|1|a_{v}|\geq 1 for all vv, and δ<1/2\delta<{1}/{2}. Then

T(0,δ)T(1,δ)max𝒮𝒱1C𝒮(1δ2logπ|𝒱𝒮|2h2(δ))\displaystyle T(0,\delta)\geq T(1,\delta)\gtrsim\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\left(\frac{1-\delta}{2}\log\frac{\pi|{\mathcal{V}}\setminus{\mathcal{S}}|}{2}-h_{2}(\delta)\right)

as |𝒮||{\mathcal{S}}|\rightarrow\infty.

IV-D Comparison with existing results

We illustrate the utility of the above bounds through comparison with some existing results. For example, Ayaso et al. [1] derive lower bounds on a related quantity

T~(ε,δ)inf{T\displaystyle\tilde{T}(\varepsilon,\delta)\triangleq\inf\Big{\{}T : a T-step algorithm 𝒜 such that\displaystyle\in{\mathbb{N}}:\exists\text{ a $T$-step algorithm }{\mathcal{A}}\text{ such that }
maxv𝒱[Z^v[(1ε)Z,(1+ε)Z]]<δ}.\displaystyle\max_{v\in{\mathcal{V}}}{\mathbb{P}}\big{[}{\widehat{Z}}_{v}\notin\left[(1-\varepsilon)Z,(1+\varepsilon)Z\right]\big{]}<\delta\Big{\}}.

One of their results is as follows: if Z=f(W)Z=f(W) is a linear function of the form (43) and (Wv)i.i.d.Uniform([1,1+B])\left(W_{v}\right)\overset{{\rm i.i.d.}}{\sim}\text{Uniform}([1,1+B]) for some B>0B>0, then

T~(ε,δ)max𝒮𝒱|𝒮|2C𝒮log1Bε2+κδ+(1/B)2/|𝒱|\displaystyle\tilde{T}(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{|{\mathcal{S}}|}{2C_{\mathcal{S}}}\log\frac{1}{B\varepsilon^{2}+\kappa\delta+(1/B)^{2/|{\mathcal{V}}|}} (48)

for all sufficiently small ε,δ>0\varepsilon,\delta>0, where κ>0\kappa>0 is a fixed constant [1, Theorem III.5]. Let us compare (48) with what we can obtain using our techniques. It is not hard to show that

T~(ε,δ)T(a1(1+B)ε,δ)\displaystyle\tilde{T}(\varepsilon,\delta)\geq T\big{(}\|a\|_{1}(1+B)\varepsilon,\delta\big{)} (49)

where a1=v𝒱|av|\|a\|_{1}=\sum_{v\in{\mathcal{V}}}|a_{v}| is the 1\ell_{1} norm of aa. Moreover, since any r.v. uniformly distributed on a bounded interval of the real line has a log-concave distribution, we can use Corollary 8 to lower-bound the right-hand side of (49). This gives

T~(ε,δ)max𝒮𝒱1C𝒮(1δ2logB2a𝒮c2248(B+1)2a12ε2h2(δ))\displaystyle\tilde{T}(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\left(\frac{1-\delta}{2}\log\frac{B^{2}\|a_{{\mathcal{S}}^{c}}\|^{2}_{2}}{48(B+1)^{2}\|a\|^{2}_{1}\varepsilon^{2}}-h_{2}(\delta)\right) (50)

for all sufficiently small ε,δ>0\varepsilon,\delta>0. We immediately see that this bound is tighter than the one in (48). In particular, the right-hand side of (48) remains bounded for vanishingly small ε\varepsilon and δ\delta, and in the limit of ε,δ0\varepsilon,\delta\to 0 tends to

max𝒮𝒱|𝒮|C𝒮logB|𝒱|logBmin𝒮𝒱C𝒮.\displaystyle\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{|{\mathcal{S}}|}{C_{\mathcal{S}}}\frac{\log B}{|{\mathcal{V}}|}\leq\frac{\log B}{\min_{{\mathcal{S}}\subset{\mathcal{V}}}C_{\mathcal{S}}}.

By contrast, as ε,δ0\varepsilon,\delta\to 0, the right-hand side of (50) grows without bound as log(1/ε)\log(1/\varepsilon).
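The contrast between (48) and (50) can also be seen numerically. The sketch below evaluates both right-hand sides for a single cutset with |S| = |S^c| = |V|/2, all a_v = 1, δ = ε, and logarithms to base 2; the values of |V|, B, κ, and C_S are placeholder choices of ours, not taken from [1]. As ε shrinks, the (48) column saturates near |S| log_2(B)/(|V| C_S), while the (50) column keeps growing like log(1/ε).

import numpy as np

V, B, kappa, C_S = 20, 4.0, 1.0, 1.0
S, Sc = V // 2, V // 2                 # |S| and |S^c| for the chosen cutset
a_Sc_sq, a_1 = float(Sc), float(V)     # ||a_{S^c}||_2^2 and ||a||_1 when all a_v = 1
for eps in (1e-3, 1e-4, 1e-5, 1e-6):
    delta = eps
    rhs48 = S / (2.0 * C_S) * np.log2(1.0 / (B * eps**2 + kappa * delta + (1.0 / B)**(2.0 / V)))
    h2 = -delta * np.log2(delta) - (1 - delta) * np.log2(1 - delta)
    rhs50 = ((1 - delta) / 2.0 * np.log2(B**2 * a_Sc_sq / (48.0 * (B + 1)**2 * a_1**2 * eps**2)) - h2) / C_S
    print(f"eps = {eps:.0e}:  (48) -> {rhs48:6.2f}   (50) -> {rhs50:6.2f}")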

Another lower bound on the (ε,δ)(\varepsilon,\delta)-computation time T(ε,δ)T(\varepsilon,\delta) was obtained by Como and Dahleh [2]. Their starting point is the following continuum generalization of Fano’s inequality [2, Lemma 2] in terms of conditional differential entropy: if Z,Z^Z,{\widehat{Z}} are two jointly distributed real-valued r.v.’s, such that 𝔼Z2<\mathbb{E}Z^{2}<\infty, then, for any ε>0\varepsilon>0,

h(Z|Z^)[|ZZ^|ε]logε+12log(16πe𝔼Z2).\displaystyle h(Z|{\widehat{Z}})\leq{\mathbb{P}}\big{[}|Z-{\widehat{Z}}|\leq\varepsilon\big{]}\log\varepsilon+\frac{1}{2}\log\big{(}16\pi e\mathbb{E}Z^{2}\big{)}. (51)

If we use (51) instead of Lemma 1 to lower-bound I(Z;Z^v|W𝒮)I(Z;{\widehat{Z}}_{v}|W_{\mathcal{S}}), then we get

T(ε,δ)max𝒮𝒱1C𝒮(\displaystyle T(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\Bigg{(} 1δ2log1ε2+h(Z|W𝒮)\displaystyle\frac{1-\delta}{2}\log\frac{1}{\varepsilon^{2}}+h(Z|W_{\mathcal{S}})
12log(16πe𝔼Z2)).\displaystyle-\frac{1}{2}\log\big{(}16\pi e\mathbb{E}Z^{2}\big{)}\Bigg{)}. (52)

Again, let us consider the case where Z=f(W)Z=f(W) is a linear function of the form (43) with all ava_{v} nonzero and with (Wv)i.i.d.N(0,1)(W_{v})\overset{{\rm i.i.d.}}{\sim}N(0,1). Then (52) becomes

T(ε,δ)max𝒮𝒱1C𝒮(1δ2log1ε2+12loga𝒮c228a22).\displaystyle T(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\left(\frac{1-\delta}{2}\log\frac{1}{\varepsilon^{2}}+\frac{1}{2}\log\frac{\|a_{{\mathcal{S}}^{c}}\|^{2}_{2}}{8\|a\|^{2}_{2}}\right). (53)

The lower bound of our Corollary 7 will be tighter than (53) for all ε>0\varepsilon>0 as long as

1δ2logπa𝒮c222h2(δ)12loga𝒮c228a22,𝒮𝒱.\displaystyle\frac{1-\delta}{2}\log\frac{\pi\|a_{{\mathcal{S}}^{c}}\|^{2}_{2}}{2}-h_{2}(\delta)\geq\frac{1}{2}\log\frac{\|a_{{\mathcal{S}}^{c}}\|^{2}_{2}}{8\|a\|^{2}_{2}},\quad\forall{\mathcal{S}}\subset{\mathcal{V}}.

Note that the quantity on the right-hand side is nonpositive. More generally, for observations with log-concave distributions, the result of Lemma 1 can be weakened to get a lower bound involving the conditional differential entropy h(Z|W𝒮)h(Z|W_{\mathcal{S}}), which is tighter than similar results obtained in [2].

Corollary 12.

If the observations WvW_{v}, v𝒱v\in{\mathcal{V}}, have log-concave distributions, then for computing the sum Z=v𝒱WvZ=\sum_{v\in{\mathcal{V}}}W_{v} subject to the absolute error criterion d(z,z^)=|zz^|d(z,{\widehat{z}})=|z-{\widehat{z}}|, for ε0\varepsilon\geq 0 and δ(0,1/2]\delta\in(0,1/2],

T(ε,δ)max𝒮𝒱1C𝒮((1δ)(h(Z|W𝒮)+log12eε)h2(δ)).\displaystyle T(\varepsilon,\delta)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\left(\!(1-\delta)\!\left(\!h(Z|W_{\mathcal{S}})+\log\frac{1}{2e\varepsilon}\!\right)-h_{2}(\delta)\!\right).
Proof:

Let p𝒮(z)p_{\mathcal{S}}(z) denote the probability density of v𝒮cWv\sum_{v\in{\mathcal{S}}^{c}}W_{v}. Then from (45),

L(w𝒮,ε)=supzzεz+εp𝒮(z′)dz′2εp𝒮\displaystyle L(w_{\mathcal{S}},\varepsilon)=\sup_{z\in\mathbb{R}}\int^{z+\varepsilon}_{z-\varepsilon}p_{\mathcal{S}}(z^{\prime}){\rm d}z^{\prime}\leq 2\varepsilon\|p_{\mathcal{S}}\|_{\infty} (54)

for all w𝒮v𝒮𝖶vw_{\mathcal{S}}\in\prod_{v\in{\mathcal{S}}}{\mathsf{W}}_{v}, where p𝒮\|p_{\mathcal{S}}\|_{\infty} is the sup norm of p𝒮p_{\mathcal{S}}. By a result of Bobkov and Madiman [40, Proposition I.2], if UU is a real-valued r.v. with a log-concave density pp, then the differential entropy h(U)h(U) is upper-bounded by loge+logp1\log e+\log\|p\|^{-1}_{\infty}. Using this fact together with (54), the log-concavity of p𝒮p_{\mathcal{S}}, and the fact that the WvW_{v}’s are mutually independent, we can write

log1𝔼[L(W𝒮,ε)]\displaystyle\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]} log12ε+log1p𝒮\displaystyle\geq\log\frac{1}{2\varepsilon}+\log\frac{1}{\|p_{\mathcal{S}}\|_{\infty}}
log12eε+h(v𝒮cWv)\displaystyle\geq\log\frac{1}{2e\varepsilon}+h\Big{(}\sum_{v\in{\mathcal{S}}^{c}}W_{v}\Big{)}
=log12eε+h(Z|W𝒮).\displaystyle=\log\frac{1}{2e\varepsilon}+h(Z|W_{\mathcal{S}}).

Using this estimate in Theorem 1, we get the desired lower bound on T(ε,δ)T(\varepsilon,\delta). ∎
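The Bobkov–Madiman bound h(U) <= log e + log(1/||p||_inf) used in this proof can be sanity-checked on two standard log-concave densities (an aside; entropies are in nats here):

import numpy as np

sigma = 2.0                                              # N(0, sigma^2)
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
bound_gauss = 1.0 + np.log(np.sqrt(2 * np.pi) * sigma)   # ||p||_inf = 1/(sqrt(2*pi)*sigma)

lam = 0.5                                                # Exponential(lam): the bound holds with equality
h_exp = 1.0 - np.log(lam)
bound_exp = 1.0 + np.log(1.0 / lam)                      # ||p||_inf = lam

print(h_gauss, bound_gauss)   # approximately 2.11 <= 2.61
print(h_exp, bound_exp)       # approximately 1.69 == 1.69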

V Comparison with upper bounds on computation time

For the two-node mod-22 sum problem in Example 1, we have shown in Corollary 1 that the lower bound on computation time given by Theorem 2 can tightly match the upper bound. In this section, we provide two more examples in which our lower bounds on computation time are tight. In the first example, our lower bound precisely captures the dependence of the computation time on the number of nodes in the network. In the second example, our lower bound tightly captures the dependence of the computation time on the accuracy parameter ε\varepsilon.

V-A Rademacher sum over a dumbbell network

Example 3.

Consider a dumbbell network of bidirectional BSCs with the same crossover probability. Formally, suppose |𝒱||{\mathcal{V}}| is even, and let the nodes be indexed from 11 to |𝒱||{\mathcal{V}}|. Nodes 11 to |𝒱|/2|{\mathcal{V}}|/2 form a clique (i.e., each pair of nodes is connected by a pair of BSCs), while nodes |𝒱|/2+1|{\mathcal{V}}|/2+1 to |𝒱||{\mathcal{V}}| form another clique. The two cliques are connected by a pair of BSCs between nodes |𝒱|/2|{\mathcal{V}}|/2 and |𝒱|/2+1|{\mathcal{V}}|/2+1. Each node initially observes a Bern(12){\rm Bern}(\frac{1}{2}) (or Rademacher) r.v. The goal is for the nodes to compute the sum of the observations of all nodes. The distortion function is d(z,z^)=|zz^|d(z,{\widehat{z}})=|z-{\widehat{z}}|.

By choosing the cutset as the pair of BSCs that joins the two cliques, our lower bound for random Rademacher sums in Corollary 11 gives the following lower bound on computation time.

Corollary 13.

For the problem in Example 3, for δ(0,1/2)\delta\in(0,1/2),

T(0,δ)1C(1δ2logπ|𝒱|4h2(δ))as |𝒱|,\displaystyle T(0,\delta)\gtrsim\frac{1}{C}\left(\frac{1-\delta}{2}\log\frac{\pi|{\mathcal{V}}|}{4}-h_{2}(\delta)\right)\quad\text{as $|{\mathcal{V}}|\rightarrow\infty$},

which implies

T(0,δ)=Ω(log|𝒱|).\displaystyle T(0,\delta)=\Omega\left(\log|{\mathcal{V}}|\right).

Now we show that the above lower bound matches the upper bound on the computation time, which turns out to be

T(0,δ)=O(log|𝒱|).T(0,\delta)=O\left(\log|{\mathcal{V}}|\right).

As shown by Gallager [11], for a fixed success probability, nodes |𝒱|/2|{\mathcal{V}}|/2 and |𝒱|/2+1|{\mathcal{V}}|/2+1 can learn the partial sum of the observations in their respective cliques in O(loglog|𝒱|)O\big{(}\log\log|{\mathcal{V}}|\big{)} steps. These two nodes then exchange their partial sum estimates using binary block codes. Each partial sum can take |𝒱|/2+1|{\mathcal{V}}|/2+1 values, and can be encoded losslessly with log(|𝒱|/2+1)\log(|{\mathcal{V}}|/2+1) bits. The blocklength needed for transmission of the encoded partial sums is thus O(log(|𝒱|/2+1))O\big{(}\log(|{\mathcal{V}}|/2+1)\big{)}, where the hidden factor depends on the required success probability and the channel crossover probability, but not on |𝒱||{\mathcal{V}}|. Having learned the partial sum of the other clique, nodes |𝒱|/2|{\mathcal{V}}|/2 and |𝒱|/2+1|{\mathcal{V}}|/2+1 then broadcast this partial sum to the other nodes in their own cliques. This takes another O(log(|𝒱|/2+1))O\big{(}\log(|{\mathcal{V}}|/2+1)\big{)} steps. In total, all nodes can learn the sum of all observations in O(loglog|𝒱|)+2O(log(|𝒱|/2+1))=O(log|𝒱|)O\big{(}\log\log|{\mathcal{V}}|\big{)}+2O\big{(}\log(|{\mathcal{V}}|/2+1)\big{)}=O(\log|{\mathcal{V}}|) steps, for any prescribed success probability. This shows that T(0,δ)=O(log|𝒱|)T(0,\delta)=O\left(\log|{\mathcal{V}}|\right).

V-B Distributed averaging over discrete noisy channels

Example 4.

Consider a network where the nodes are connected by binary erasure channels with the same erasure probability. Each node initially observes a log-concave r.v. The goal is for the nodes to compute the average of the observations of all nodes.

For this example, Carli et al. [14] define the computation time as

T~(ε)\displaystyle\tilde{T}(\varepsilon) inf{T:1|𝒱|v𝒱𝔼[(ZZ^v(t))2]ε,tT}\displaystyle\triangleq\inf\Big{\{}T\in{\mathbb{N}}:\frac{1}{|{\mathcal{V}}|}\sum_{v\in{\mathcal{V}}}\mathbb{E}\big{[}(Z-{\widehat{Z}}_{v}(t))^{2}\big{]}\leq\varepsilon,\,\forall t\geq T\Big{\}}

and show that

T~(ε)c1+c2log3ε1log2ρ1\displaystyle\tilde{T}(\varepsilon)\leq c_{1}+c_{2}\frac{\log^{3}\varepsilon^{-1}}{\log^{2}\rho^{-1}} (55)

where ρ\rho is the second largest singular value of the consensus matrix adapted to the network, and c1c_{1} and c2c_{2} are positive constants depending only on the channel erasure probability. It can be shown that the above upper bound still holds (with different constants) when the channels are BSCs.

We use Corollary 12 to derive the following lower bound on T~(ε)\tilde{T}(\varepsilon).

Corollary 14.

For the problem in Example 4,

T~(ε)max𝒮𝒱12C𝒮(h(Z|W𝒮)+log14e|𝒱|+12log1ε2).\displaystyle\tilde{T}(\varepsilon)\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{2C_{\mathcal{S}}}\left(\!h(Z|W_{\mathcal{S}})+\log\frac{1}{4e|{\mathcal{V}}|}+\frac{1}{2}\log\frac{1}{\varepsilon}-2\!\right). (56)
Proof:

Using Jensen’s inequality twice, we can write

1|𝒱|v𝒱𝔼[(ZZ^v(T))2]\displaystyle\frac{1}{|{\mathcal{V}}|}\sum_{v\in{\mathcal{V}}}\mathbb{E}\big{[}(Z-{\widehat{Z}}_{v}(T))^{2}\big{]} 1|𝒱|v𝒱(𝔼|ZZ^v(T)|)2\displaystyle\geq\frac{1}{|{\mathcal{V}}|}\sum_{v\in{\mathcal{V}}}\big{(}\mathbb{E}|Z-{\widehat{Z}}_{v}(T)|\big{)}^{2}
(1|𝒱|v𝒱𝔼|ZZ^v(T)|)2.\displaystyle\geq\left(\frac{1}{|{\mathcal{V}}|}\sum_{v\in{\mathcal{V}}}\mathbb{E}|Z-{\widehat{Z}}_{v}(T)|\right)^{2}.

Therefore, |𝒱|1v𝒱𝔼[(ZZ^v(T))2]ε|{\mathcal{V}}|^{-1}\sum_{v\in{\mathcal{V}}}\mathbb{E}\big{[}(Z-{\widehat{Z}}_{v}(T))^{2}\big{]}\leq\varepsilon implies that 𝔼|ZZ^v(T)||𝒱|ε\mathbb{E}|Z-{\widehat{Z}}_{v}(T)|\leq|{\mathcal{V}}|\sqrt{\varepsilon} for all v𝒱v\in{\mathcal{V}}, and

[|ZZ^v(T)||𝒱|εδ]δ,v𝒱,δ(0,1/2]{\mathbb{P}}\left[|Z-{\widehat{Z}}_{v}(T)|\geq\frac{|{\mathcal{V}}|\sqrt{\varepsilon}}{\delta}\right]\leq\delta,\qquad\forall v\in{\mathcal{V}},\delta\in(0,1/2]

by Markov’s inequality. Then by Corollary 12,

T~(ε)T(|𝒱|εδ,δ)\displaystyle\tilde{T}(\varepsilon)\geq T\left(\frac{|{\mathcal{V}}|\sqrt{\varepsilon}}{\delta},\delta\right)
max𝒮𝒱1C𝒮((1δ)(h(Z|W𝒮)+logδ2e|𝒱|ε)h2(δ)).\displaystyle\geq\max_{{\mathcal{S}}\subset{\mathcal{V}}}\frac{1}{C_{\mathcal{S}}}\left((1-\delta)\left(h(Z|W_{\mathcal{S}})+\log\frac{\delta}{2e|{\mathcal{V}}|\sqrt{\varepsilon}}\right)-h_{2}(\delta)\right).

Choosing δ=1/2\delta=1/2, we obtain (56). ∎
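As a consistency check on the last step (δ = 1/2, with logarithms in bits so that h_2(1/2) = 1), the two expressions below should coincide for any choice of h(Z|W_S), |V|, ε, and C_S; the numerical values are placeholders.

import numpy as np

def cor12_with_markov(h, V, eps, delta, C_S):
    # Corollary 12 evaluated at accuracy |V|*sqrt(eps)/delta (logs in bits)
    h2 = -delta * np.log2(delta) - (1 - delta) * np.log2(1 - delta)
    return ((1 - delta) * (h + np.log2(delta / (2 * np.e * V * np.sqrt(eps)))) - h2) / C_S

def rhs_56(h, V, eps, C_S):
    return (h + np.log2(1.0 / (4 * np.e * V)) + 0.5 * np.log2(1.0 / eps) - 2.0) / (2 * C_S)

h, V, eps, C_S = 10.0, 25, 1e-6, 0.5
print(cor12_with_markov(h, V, eps, 0.5, C_S), rhs_56(h, V, eps, C_S))  # the two values agree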

The lower bound given by (56) states that T~(ε)\tilde{T}(\varepsilon) is necessarily logarithmic in ε1\varepsilon^{-1}, which tightly matches the poly-logarithmic dependence on ε1\varepsilon^{-1} in the upper bound given by (55). As pointed out in Carli et al. [41], it is possible to prove that a computation time logarithmic in ε1\varepsilon^{-1} is achievable by embedding a quantized consensus algorithm for noiseless networks into the simulation framework developed by Rajagopalan and Schulman for noisy networks in [13].

VI Conclusion and future research directions

We have studied the fundamental time limits of distributed function computation from an information-theoretic perspective. The computation time depends on the amount of information about the function value needed by each node and the rate for the nodes to accumulate such an amount of information. The small ball probability lower bound on conditional mutual information reveals how much information is necessary, while the cutset-capacity upper bound and the SDPI upper bound capture the bottleneck on the rate for the information to be accumulated. The multi-cutset analysis provides a more refined characterization of the information dissipation in a network.

Here are some questions that are worth considering in the future:

  • In the multi-cutset analysis, the purpose of introducing self-loops when reducing the network to a chain is to establish necessary Markov relations for proving upper bounds on I(Z;Z^n|W𝒮)I(Z;{\widehat{Z}}_{n}|W_{{\mathcal{S}}}) in bidirected chains, and the reason for considering left-bound nodes is to improve the lower bounds on computation time. We could have included all channels from 𝒮i{\mathcal{S}}_{i} to 𝒮i{\mathcal{S}}_{i} in the self-loop at node ii^{\prime} in GG^{\prime}, but this would result in looser lower bounds on computation time (cf. the remark after Theorem 3). However, there might be other network reduction methods, e.g., different ways to construct the bidirected chain, that would yield even tighter lower bounds on computation time than our proposed method.

  • In the first step of the derivation of Lemma 2 and Lemma 5, we have upper-bounded I(Z;Z^v|W𝒮)I(Z;{\widehat{Z}}_{v}|W_{{\mathcal{S}}}) using the ordinary data processing inequality as

    I(Z;Z^v|W𝒮)\displaystyle I(Z;{\widehat{Z}}_{v}|W_{{\mathcal{S}}}) I(W𝒮c;Z^v|W𝒮).\displaystyle\leq I(W_{{\mathcal{S}}^{c}};{\widehat{Z}}_{v}|W_{{\mathcal{S}}}).

    One may wonder whether we can tighten this step by a judicious use of SDPIs. The answer is negative. It can be shown that

    I(Z;Z^v|W𝒮)I(W𝒮c;Z^v|W𝒮)supw𝒮v𝒮𝖶vη(W𝒮c|W𝒮=w𝒮,Z|W𝒮c,W𝒮=w𝒮)\displaystyle I(Z;{\widehat{Z}}_{v}|W_{{\mathcal{S}}})\leq I(W_{{\mathcal{S}}^{c}};{\widehat{Z}}_{v}|W_{{\mathcal{S}}})\cdot\sup_{w_{\mathcal{S}}\in\prod_{v\in{\mathcal{S}}}{\mathsf{W}}_{v}}\eta\big{(}{\mathbb{P}}_{W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}}=w_{\mathcal{S}}},{\mathbb{P}}_{Z|W_{{\mathcal{S}}^{c}},W_{\mathcal{S}}=w_{\mathcal{S}}}\big{)}

    where the contraction coefficient depends on the joint distribution of the observations W{\mathbb{P}}_{W} and the function Z=f(W)Z=f(W). However,

    η(W𝒮c|W𝒮=w𝒮,Z|W𝒮c,W𝒮=w𝒮)=1\displaystyle\eta\big{(}{\mathbb{P}}_{W_{{\mathcal{S}}^{c}}|W_{\mathcal{S}}=w_{\mathcal{S}}},{\mathbb{P}}_{Z|W_{{\mathcal{S}}^{c}},W_{\mathcal{S}}=w_{\mathcal{S}}}\big{)}=1

    for both discrete and continuous observations. For discrete observations, this is a consequence of the fact that η(X,Y|X)<1\eta({\mathbb{P}}_{X},{\mathbb{P}}_{Y|X})<1 if and only if the graph {(x,y):X(x)>0,Y|X(y|x)>0}\big{\{}(x,y):{\mathbb{P}}_{X}(x)>0,{\mathbb{P}}_{Y|X}(y|x)>0\big{\}} is connected [26], and the fact that, for any Y|X{\mathbb{P}}_{Y|X} induced by a deterministic function f:𝖷𝖸f:{\mathsf{X}}\rightarrow{\mathsf{Y}}, this graph is always disconnected. This condition can be extended to continuous alphabets [42]. It would be interesting to see whether nonlinear SDPI’s, e.g., of the sort recently introduced by Polyanskiy and Wu [28], can be somehow applied here to tighten the upper bounds.

  • If the function to be computed is the identity mapping, i.e., Z=WZ=W, then the goal of the nodes is to distribute their observations to all other nodes in the network. In this case, our results on the computation time can provide non-asymptotic lower bounds on the blocklength of the codes for the source-channel coding problems in multi-terminal networks. In Example 2, we have considered one such case with discrete observations, and obtained lower bounds in Corollary 2 based on the single cutset analysis. It would be interesting to apply the multi-cutset analysis to the source-channel coding problems in multi-terminal, multi-hop networks.

Acknowledgment

The authors would like to thank the Associate Editor Prof. Chandra Nair and two anonymous referees for numerous constructive suggestions on how to improve the flow and the structure of the paper.

Appendix A Proof of Lemma 7

The goal of this proof is to show that, given any TT-step algorithm 𝒜{\mathcal{A}} running on GG, we can construct a randomized TT-step algorithm 𝒜{\mathcal{A}}^{\prime} running on GG^{\prime} that simulates 𝒜{\mathcal{A}}. Fix any TT-step algorithm 𝒜{\mathcal{A}} that runs on GG. For each tt, we can factor the conditional distribution of the messages Xt(Xv,t)v𝒱X_{t}\triangleq(X_{v,t})_{v\in{\mathcal{V}}} given W,Xt1,Yt1W,X^{t-1},Y^{t-1} as follows:

Xt|W,Xt1,Yt1(xt|w,xt1,yt1)\displaystyle\quad\,\,{\mathbb{P}}_{X_{t}|W,X^{t-1},Y^{t-1}}(x_{t}|w,x^{t-1},y^{t-1})
=v𝒱Xv,t|Wv,Yvt1(xv,t|wv,yvt1)\displaystyle=\prod_{v\in{\mathcal{V}}}{\mathbb{P}}_{X_{v,t}|W_{v},Y^{t-1}_{v}}(x_{v,t}|w_{v},y^{t-1}_{v})
=i=1nv𝒮iXv,t|Wv,Yvt1(xv,t|wv,yvt1)\displaystyle=\prod^{n}_{i=1}\prod_{v\in{\mathcal{S}}_{i}}{\mathbb{P}}_{X_{v,t}|W_{v},Y^{t-1}_{v}}\Big{(}x_{v,t}\Big{|}w_{v},y^{t-1}_{v}\Big{)}
=i=1nX𝒮i,t|W𝒮i,Y𝒮it1(x𝒮i,t|w𝒮i,y𝒮it1).\displaystyle=\prod^{n}_{i=1}{\mathbb{P}}_{X_{{\mathcal{S}}_{i},t}|W_{{\mathcal{S}}_{i}},Y^{t-1}_{{\mathcal{S}}_{i}}}\Big{(}x_{{\mathcal{S}}_{i},t}\Big{|}w_{{\mathcal{S}}_{i}},y^{t-1}_{{\mathcal{S}}_{i}}\Big{)}. (A.1)

Likewise, the conditional distribution of the received messages Yt(Yv,t)v𝒱Y_{t}\triangleq(Y_{v,t})_{v\in{\mathcal{V}}} given W,Xt,Yt1W,X^{t},Y^{t-1} can be factored as

Yt|W,Xt,Yt1(yt|w,xt,yt1)\displaystyle\quad\,\,{\mathbb{P}}_{Y_{t}|W,X^{t},Y^{t-1}}(y_{t}|w,x^{t},y^{t-1})
=eYe,t|Xe,t(ye,t|xe,t)\displaystyle=\prod_{e\in{\mathcal{E}}}{\mathbb{P}}_{Y_{e,t}|X_{e,t}}(y_{e,t}|x_{e,t})
=eKe(ye,t|xe,t)\displaystyle=\prod_{e\in{\mathcal{E}}}K_{e}(y_{e,t}|x_{e,t})
=i=1nu𝒮iv𝒱:(u,v)K(u,v)(y(u,v),t|x(u,v),t).\displaystyle=\prod^{n}_{i=1}\prod_{u\in{\mathcal{S}}_{i}}\prod_{v\in{\mathcal{V}}:\,(u,v)\in{\mathcal{E}}}K_{(u,v)}(y_{(u,v),t}|x_{(u,v),t}). (A.2)

Since the successive partition of GG ensures that nodes in 𝒮i{\mathcal{S}}_{i} can communicate with nodes in 𝒮j{\mathcal{S}}_{j} only if |ij|1|i-j|\leq 1, the messages originating from 𝒮i{\mathcal{S}}_{i} at step tt can be decomposed as

X𝒮i,t\displaystyle X_{{\mathcal{S}}_{i},t} =(X(𝒮i,𝒮i1),t,X(𝒮i,𝒮i+1),t,X(𝒮i,𝒮i),t)\displaystyle=(X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i-1}),t},X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i+1}),t},X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i}),t})
=(X(𝒮i,𝒮i1),t,X(𝒮i,𝒮i+1),t,X(𝒮i,𝒮i),t,X(𝒮i,𝒮i𝒮i),t),\displaystyle=(X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i-1}),t},X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i+1}),t},X\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}},X\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}}),

and the messages received by nodes in 𝒮i{\mathcal{S}}_{i} at step tt can be decomposed as

Y𝒮i,t\displaystyle Y_{{\mathcal{S}}_{i},t} =(Y(𝒮i1,𝒮i),t,Y(𝒮i+1,𝒮i),t,Y(𝒮i,𝒮i),t)\displaystyle=(Y_{({\mathcal{S}}_{i-1},{\mathcal{S}}_{i}),t},Y_{({\mathcal{S}}_{i+1},{\mathcal{S}}_{i}),t},Y_{({\mathcal{S}}_{i},{\mathcal{S}}_{i}),t})
=(Y(𝒮i1,𝒮i),t,Y(𝒮i+1,𝒮i),t,Y(𝒮i,𝒮i),t,Y(𝒮i,𝒮i𝒮i),t).\displaystyle=(Y_{({\mathcal{S}}_{i-1},{\mathcal{S}}_{i}),t},Y_{({\mathcal{S}}_{i+1},{\mathcal{S}}_{i}),t},Y\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}},Y\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}}). (A.3)

According to the operation of algorithm 𝒜{\mathcal{A}}, for each (u,v)(u,v)\in{\mathcal{E}} there exists a mapping φ(u,v),t\varphi_{(u,v),t}, such that X(u,v),t=φ(u,v),t(Wu,Yut1)X_{(u,v),t}=\varphi_{(u,v),t}(W_{u},Y_{u}^{t-1}). By the definition of 𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}, we can write

X(𝒮i,𝒮i1),t\displaystyle X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i-1}),t} =(φ(u,v),t(Wu,Yut1):\displaystyle=\big{(}\varphi_{(u,v),t}(W_{u},Y_{u}^{t-1}):
(u,v),u𝒮i,v𝒮i1).\displaystyle\quad(u,v)\in{\mathcal{E}},u\in\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i},v\in{\mathcal{S}}_{i-1}\big{)}.

Thus, there exists a mapping φ𝒮i,t\overset{{}_{\leftarrow}}{\varphi}_{{\mathcal{S}}_{i},t}, such that

X(𝒮i,𝒮i1),t\displaystyle X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i-1}),t} =φ𝒮i,t(W𝒮i,Yt1𝒮i)\displaystyle=\overset{{}_{\leftarrow}}{\varphi}_{{\mathcal{S}}_{i},t}(W\thinspace{\raisebox{-4.0pt}{$\scriptstyle{\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}}$}},Y^{t-1}\thinspace{\raisebox{-5.0pt}{$\scriptstyle{\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}}$}}) (A.4)

where

Y𝒮i,t=(Y(𝒮i1,𝒮i),t,Y(𝒮i+1,𝒮i),t,Y(𝒮i,𝒮i),t).\displaystyle Y\thinspace{\raisebox{-3.0pt}{$\scriptstyle{\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i},t}$}}=\big{(}Y\thinspace{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i-1},\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}},Y\thinspace{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i+1},\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}},Y\thinspace{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}}\big{)}. (A.5)

By the same token, there exist mappings φ𝒮i,t\overset{{}_{\rightarrow}}{\varphi}_{{\mathcal{S}}_{i},t}, φ̊𝒮i,t\mathring{\varphi}_{{\mathcal{S}}_{i},t} and φ¯𝒮i,t\bar{\varphi}_{{\mathcal{S}}_{i},t}, such that

X(𝒮i,𝒮i+1),t\displaystyle X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i+1}),t} =φ𝒮i,t(W𝒮i,Y𝒮it1),\displaystyle=\overset{{}_{\rightarrow}}{\varphi}_{{\mathcal{S}}_{i},t}(W_{{\mathcal{S}}_{i}},Y^{t-1}_{{\mathcal{S}}_{i}}), (A.6)
X(𝒮i,𝒮i),t\displaystyle X\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}} =φ̊𝒮i,t(W𝒮i,Y𝒮it1),\displaystyle=\mathring{\varphi}_{{\mathcal{S}}_{i},t}(W_{{\mathcal{S}}_{i}},Y^{t-1}_{{\mathcal{S}}_{i}}), (A.7)
X(𝒮i,𝒮i𝒮i),t\displaystyle X\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}} =φ¯𝒮i,t(W𝒮i,Y𝒮it1).\displaystyle=\bar{\varphi}_{{\mathcal{S}}_{i},t}(W_{{\mathcal{S}}_{i}},Y^{t-1}_{{\mathcal{S}}_{i}}). (A.8)

Define the random variables

Wi\displaystyle W_{i} W𝒮i,\displaystyle\triangleq W_{{\mathcal{S}}_{i}},
Xi,t\displaystyle X_{i,t} =(X(i,i1),t,X(i,i+1),t,X(i,i),t)\displaystyle=(X_{(i,i-1),t},X_{(i,i+1),t},X_{(i,i),t})
(X(𝒮i,𝒮i1),t,X(𝒮i,𝒮i+1),t,X(𝒮i,𝒮i),t),\displaystyle\triangleq(X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i-1}),t},X_{({\mathcal{S}}_{i},{\mathcal{S}}_{i+1}),t},X\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}}),
Yi,t\displaystyle Y_{i,t} =(Y(i1,i),t,Y(i+1,i),t,Y(i,i),t)\displaystyle=(Y_{(i-1,i),t},Y_{(i+1,i),t},Y_{(i,i),t})
(Y(𝒮i1,𝒮i),t,Y(𝒮i+1,𝒮i),t,Y(𝒮i,𝒮i),t),\displaystyle\triangleq(Y_{({\mathcal{S}}_{i-1},{\mathcal{S}}_{i}),t},Y_{({\mathcal{S}}_{i+1},{\mathcal{S}}_{i}),t},Y\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}}),
Ui,t\displaystyle U_{i,t} (X(𝒮i,𝒮i𝒮i),t,Y(𝒮i,𝒮i𝒮i),t).\displaystyle\triangleq(X\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}},Y\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}}).

From the decomposition of Y𝒮i,tY_{{\mathcal{S}}_{i},t} in (A.3), we know that (Yit1,Uit1)(Y_{i}^{t-1},U_{i}^{t-1}) contains Y𝒮it1Y_{{\mathcal{S}}_{i}}^{t-1}; while from the decomposition of Y𝒮i,tY\thinspace{\raisebox{-3.0pt}{$\scriptstyle{\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i},t}$}} in (A.5), we know that Yit1Y_{i}^{t-1} contains Yt1𝒮iY^{t-1}\thinspace{\raisebox{-5.0pt}{$\scriptstyle{\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}}$}}. Therefore, from Eqs. (A.4) and (A.6)-(A.8), we deduce the existence of mappings φi,t\overset{{}_{\leftarrow}}{\varphi}_{i,t}, φi,t\overset{{}_{\rightarrow}}{\varphi}_{i,t}, φ̊i,t\mathring{\varphi}_{i,t}, and φ¯i,t\bar{\varphi}_{i,t}, such that the messages transmitted by nodes in 𝒮i{\mathcal{S}}_{i} at time tt can be generated as

X(i,i1),t\displaystyle X_{(i,i-1),t} =φi,t(Wi,Yit1),\displaystyle=\overset{{}_{\leftarrow}}{\varphi}_{i,t}(W_{i},Y_{i}^{t-1}), (A.9)
X(i,i+1),t\displaystyle X_{(i,i+1),t} =φi,t(Wi,Yit1,Uit1),\displaystyle=\overset{{}_{\rightarrow}}{\varphi}_{i,t}(W_{i},Y_{i}^{t-1},U_{i}^{t-1}), (A.10)
X(i,i),t\displaystyle X_{(i,i),t} =φ̊i,t(Wi,Yit1,Uit1),\displaystyle=\mathring{\varphi}_{i,t}(W_{i},Y_{i}^{t-1},U_{i}^{t-1}), (A.11)
X(𝒮i,𝒮i𝒮i),t\displaystyle X\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}} =φ¯i,t(Wi,Yit1,Uit1).\displaystyle=\bar{\varphi}_{i,t}(W_{i},Y^{t-1}_{i},U^{t-1}_{i}). (A.12)

Note that the computation of X(i,i1),tX_{(i,i-1),t} does not involve Uit1U_{i}^{t-1}. Next, the messages received by nodes in 𝒮i{\mathcal{S}}_{i} at step tt are related to the transmitted messages as

X(i1,i),t\displaystyle X_{(i-1,i),t} K(i1,i)Y(i1,i),t,\displaystyle\xrightarrow{K_{(i-1,i)}}Y_{(i-1,i),t},
X(i+1,i),t\displaystyle X_{(i+1,i),t} K(i+1,i)Y(i+1,i),t,\displaystyle\xrightarrow{K_{(i+1,i)}}Y_{(i+1,i),t},
X(i,i),t\displaystyle X_{(i,i),t} K(i,i)Y(i,i),t,\displaystyle\xrightarrow{K_{(i,i)}}Y_{(i,i),t},

where the stochastic transition laws have the same form as those in Eqs. (26) to (28). In addition, since X(𝒮i,𝒮i𝒮i),tX\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}} and Y(𝒮i,𝒮i𝒮i),tY\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}} are related through the channels from 𝒮i{\mathcal{S}}_{i} to 𝒮i𝒮i{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}, there exists a mapping κi,t\kappa_{i,t} such that Y(𝒮i,𝒮i𝒮i),tY\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}} can be realized as

Y(𝒮i,𝒮i𝒮i),t\displaystyle Y\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}} =κi,t(X(𝒮i,𝒮i𝒮i),t,Ri,t),\displaystyle=\kappa_{i,t}(X\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}},R_{i,t}), (A.13)

where Ri,tR_{i,t} can be taken as a random variable uniformly distributed over [0,1][0,1] and independent of everything else. From (A.12) and (A.13), we know that Ui,tU_{i,t} can be realized by a mapping ϑi,t\vartheta_{i,t} as

Ui,t\displaystyle U_{i,t} =ϑi,t(Wi,Yit1,Uit1,Ri,t).\displaystyle=\vartheta_{i,t}(W_{i},Y_{i}^{t-1},U_{i}^{t-1},R_{i,t}). (A.14)

Taking all of this into account, we can rewrite the factorization (A.1) as follows:

Xt|W,Xt1,Yt1(xt|w,xt1,yt1)\displaystyle\quad\,\,{\mathbb{P}}_{X_{t}|W,X^{t-1},Y^{t-1}}(x_{t}|w,x^{t-1},y^{t-1})
=i=1n𝟏{x(i1,i),t=φi,t(wi,yit1)}\displaystyle=\prod^{n}_{i=1}\mathbf{1}\big{\{}x_{(i-1,i),t}=\overset{{}_{\leftarrow}}{\varphi}_{i,t}(w_{i},y_{i}^{t-1})\big{\}}
𝟏{x(i,i+1),t=φi,t(wi,yit1,uit1)}\displaystyle\quad\cdot\mathbf{1}\big{\{}x_{(i,i+1),t}=\overset{{}_{\rightarrow}}{\varphi}_{i,t}(w_{i},y_{i}^{t-1},u_{i}^{t-1})\big{\}}
𝟏{x(i,i),t=φ̊i,t(wi,yit1,uit1)}\displaystyle\quad\cdot\mathbf{1}\big{\{}x_{(i,i),t}=\mathring{\varphi}_{i,t}(w_{i},y_{i}^{t-1},u_{i}^{t-1})\big{\}}
𝟏{x(𝒮i,𝒮i𝒮i),t=φ¯i,t(wi,yit1,uit1)},\displaystyle\quad\cdot\mathbf{1}\big{\{}x\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}}=\bar{\varphi}_{i,t}(w_{i},y^{t-1}_{i},u^{t-1}_{i})\big{\}}, (A.15)

and we can rewrite the factorization (A.2) as

Yt|W,Xt,Yt1(yt|w,xt,yt1)\displaystyle\quad\,\,{\mathbb{P}}_{Y_{t}|W,X^{t},Y^{t-1}}(y_{t}|w,x^{t},y^{t-1})
=i=1nK(i1,i)(y(i1,i),t|x(i1,i),t)\displaystyle=\prod^{n}_{i=1}K_{(i-1,i)}(y_{(i-1,i),t}|x_{(i-1,i),t})
K(i+1,i)(y(i+1,i),t|x(i+1,i),t)K(i,i)(y(i,i),t|x(i,i),t)\displaystyle\quad\cdot K_{(i+1,i)}(y_{(i+1,i),t}|x_{(i+1,i),t})\cdot K_{(i,i)}(y_{(i,i),t}|x_{(i,i),t})
(u,v):u𝒮i,v𝒮i𝒮iK(u,v)(y(𝒮i,𝒮i𝒮i),t|x(𝒮i,𝒮i𝒮i),t)\displaystyle\quad\cdot\bigotimes_{(u,v)\in{\mathcal{E}}:u\in{\mathcal{S}}_{i},v\in{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}}K_{(u,v)}(y\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}}|x\!{\raisebox{-3.0pt}{$\scriptstyle{({\mathcal{S}}_{i},{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}),t}$}}) (A.16)

where the channel (u,v):u𝒮i,v𝒮i𝒮iK(u,v)\bigotimes_{(u,v)\in{\mathcal{E}}:u\in{\mathcal{S}}_{i},v\in{\mathcal{S}}_{i}\setminus\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}}K_{(u,v)} can be realized by the mapping κi,t\kappa_{i,t} with the r.v. Ri,tR_{i,t}.

To summarize: the mappings defined in (A.9) to (A.11) and (A.14) specify a randomized TT-step algorithm 𝒜{\mathcal{A}}^{\prime} that runs on GG^{\prime} and simulates the TT-step algorithm 𝒜{\mathcal{A}} that runs on GG. Specifically, using these mappings, each node ii^{\prime} in GG^{\prime} can generate all the transmitted and received messages of 𝒮i{\mathcal{S}}_{i} in 𝒜{\mathcal{A}} as (XiT,YiT,UiT)(X_{i^{\prime}}^{T},Y_{i^{\prime}}^{T},U_{i^{\prime}}^{T}). Moreover, from (A.15) and (A.16) we see that the random objects

(W𝒮i,X𝒮iT,Y𝒮iT:i{1,,n})\big{(}W_{{\mathcal{S}}_{i}},X_{{\mathcal{S}}_{i}}^{T},Y_{{\mathcal{S}}_{i}}^{T}:i\in\{1,\ldots,n\}\big{)}

and

(Wi,XiT,YiT,UiT:i{1,,n})\big{(}W_{i^{\prime}},X_{i^{\prime}}^{T},Y_{i^{\prime}}^{T},U_{i^{\prime}}^{T}:i^{\prime}\in\{1,\ldots,n\}\big{)}

have the same joint distribution.

Finally, as we have assumed that 𝒮i\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}’s are all nonempty, we can define

Z^iZ^v=ψv(Wv,YvT)\displaystyle{\widehat{Z}}_{i}\triangleq{\widehat{Z}}_{v}=\psi_{v}(W_{v},Y_{v}^{T})

with an arbitrary v𝒮iv\in\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i}. From the definition of Yi,tY_{i,t} and the fact that YiTY_{i}^{T} contains YvTY_{v}^{T}, it follows that there exists a mapping ψi\psi_{i} such that

Z^i=ψi(Wi,YiT).\displaystyle{\widehat{Z}}_{i}=\psi_{i}(W_{i},Y_{i}^{T}).

Using this mapping, node ii^{\prime} in GG^{\prime} can generate the final estimate of the chosen v𝒮iv\in\overset{{}_{\leftarrow}}{\partial}{\mathcal{S}}_{i} in 𝒜{\mathcal{A}} as Z^i{\widehat{Z}}_{i^{\prime}}, such that (Z,Z^i:i{1,,n})(Z,{\widehat{Z}}_{i}:i\in\{1,\ldots,n\}) and (Z,Z^i:i{1,,n})(Z,{\widehat{Z}}_{i^{\prime}}:i\in\{1,\ldots,n\}) have the same joint distribution. This guarantees that

maxi𝒱[d(Z,Z^i)>ε]\displaystyle\max_{i^{\prime}\in{\mathcal{V}}^{\prime}}{\mathbb{P}}[d(Z,{\widehat{Z}}_{i^{\prime}})>\varepsilon] =maxi{1:n}[d(Z,Z^i)>ε]\displaystyle=\max_{i\in\{1:n\}}{\mathbb{P}}[d(Z,{\widehat{Z}}_{i})>\varepsilon]
maxv𝒱[d(Z,Z^v)>ε]\displaystyle\leq\max_{v\in{\mathcal{V}}}{\mathbb{P}}[d(Z,{\widehat{Z}}_{v})>\varepsilon]
δ.\displaystyle\leq\delta.

The claim that T(ε,δ)T(\varepsilon,\delta) for computing ZZ on GG is lower bounded by T(ε,δ)T^{\prime}(\varepsilon,\delta) for computing ZZ on GG^{\prime} then follows from the definition of T(ε,δ)T^{\prime}(\varepsilon,\delta) in (III-A). This proves Lemma 7.

Appendix B Proof of Lemma 8

Recall that, for any randomized TT-step algorithm 𝒜{\mathcal{A}}^{\prime}, at step t{1,,T}t\in\{1,\ldots,T\}, node i{1,,n}i\in\{1,\ldots,n\} computes the outgoing messages X(i,i1),t=φi,t(Wi,Yit1)X_{(i,i-1),t}=\overset{{}_{\leftarrow}}{\varphi}_{i,t}(W_{i},Y^{t-1}_{i}), X(i,i+1),t=φi,t(Wi,Yit1,Uit1)X_{(i,i+1),t}=\overset{{}_{\rightarrow}}{\varphi}_{i,t}(W_{i},Y^{t-1}_{i},U^{t-1}_{i}), and X(i,i),t=φ̊i,t(Wi,Yit1,Uit1)X_{(i,i),t}=\mathring{\varphi}_{i,t}(W_{i},Y^{t-1}_{i},U^{t-1}_{i}), and the private message Ui,t=ϑi,t(Wi,Yit1,Uit1,Ri,t)U_{i,t}=\vartheta_{i,t}(W_{i},Y_{i}^{t-1},U^{t-1}_{i},R_{i,t}), where Ri,tR_{i,t} is the private randomness of node ii. At step TT, node ii computes Z^i=ψi(Wi,YiT){\widehat{Z}}_{i}=\psi_{i}(W_{i},Y^{T}_{i}). We will use the Bayesian network formed by all the relevant variables and the d-separation criterion [24, Theorem 3.3] to find conditional independences among these variables. To simplify the Bayesian network, we merge some of the variables by defining

U~i,t(X(i,i),t,X(i,i+1),t,Ui,t)\tilde{U}_{i,t}\triangleq(X_{(i,i),t},X_{(i,i+1),t},U_{i,t})

and

Y~i,t(Y(i,i),t,Y(i+1,i),t)\tilde{Y}_{i,t}\triangleq(Y_{(i,i),t},Y_{(i+1,i),t})

for i{1,,n}i\in\{1,\ldots,n\}. The joint distribution of the variables can then be factored as

W,XT,UT,YT(w,xT,uT,yT)\displaystyle\quad\,\,{\mathbb{P}}_{W,X^{T},U^{T},Y^{T}}(w,x^{T},u^{T},y^{T})
=W(w)t=1Ti=1n𝟏{x(i,i1),t=φi,t(wi,yit1)}\displaystyle={\mathbb{P}}_{W}(w)\prod_{t=1}^{T}\prod_{i=1}^{n}\mathbf{1}\big{\{}x_{(i,i-1),t}=\overset{{}_{\leftarrow}}{\varphi}_{i,t}(w_{i},y_{i}^{t-1})\big{\}}
U~i,t|Wi,Yit1,U~it1(u~i,t|wi,yit1,u~it1)\displaystyle\quad\cdot{\mathbb{P}}_{\tilde{U}_{i,t}|W_{i},Y_{i}^{t-1},\tilde{U}_{i}^{t-1}}(\tilde{u}_{i,t}|w_{i},y_{i}^{t-1},\tilde{u}_{i}^{t-1})
i=1nY(i1,i),t|U~i1,t(y(i1,i),t|u~i1,t)\displaystyle\quad\cdot\prod_{i=1}^{n}{\mathbb{P}}_{Y_{(i-1,i),t}|\tilde{U}_{i-1,t}}(y_{(i-1,i),t}|\tilde{u}_{i-1,t})
Y~i,t|U~i,t,X(i+1,i),t(y~i,t|u~i,t,x(i+1,i),t).\displaystyle\quad\cdot{\mathbb{P}}_{\tilde{Y}_{i,t}|\tilde{U}_{i,t},X_{(i+1,i),t}}(\tilde{y}_{i,t}|\tilde{u}_{i,t},x_{(i+1,i),t}). (B.1)

The Bayesian network corresponding to this factorization for n=4n=4 and T=4T=4 is shown in Fig. 10.

If T=0T=0, then Z^n=ψn(Wn){\widehat{Z}}_{n}=\psi_{n}(W_{n}), hence I(Z;Z^n|W2:n)I(Z;Wn|W2:n)=0I(Z;{\widehat{Z}}_{n}|W_{2:n})\leq I(Z;W_{n}|W_{2:n})=0. For T1T\geq 1, we prove the upper bounds in the following steps, where we assume n4n\geq 4. The case n=3n=3 can be proved by skipping Step 2, and the case n=2n=2 can be proved by skipping Step 1 and Step 2.

Step 1:
For any ii and tt, define the shorthand Xi,tX(𝒩i,i),tX_{i\leftarrow,t}\triangleq X_{({\mathcal{N}}_{i\leftarrow},i),t}, where 𝒩i{\mathcal{N}}_{i\leftarrow} is the in-neighborhood of node ii. From the Markov chain W,YnT1Xn,TYn,TW,Y_{n}^{T-1}\rightarrow X_{n\leftarrow,T}\rightarrow Y_{n,T} and Lemma 3, we follow the same argument as the one used for proving Lemma 5 to show that

I(Z;Z^n|W2:n)\displaystyle I(Z;{\widehat{Z}}_{n}|W_{2:n}) I(W1;YnT|W2:n)\displaystyle\leq I(W_{1};Y_{n}^{T}|W_{2:n})
(1ηn)I(W1;YnT1|W2:n)\displaystyle\leq(1-\eta_{n})I(W_{1};Y_{n}^{T-1}|W_{2:n})
+ηnI(W1;YnT1,Xn,T|W2:n).\displaystyle\quad+\eta_{n}I(W_{1};Y_{n}^{T-1},X_{{n\leftarrow},T}|W_{2:n}).

Applying the d-separation criterion to the Bayesian network corresponding to (B.1) (see Fig. 10 for an illustration), we can read off the Markov chain

W1W2:n,Yn1t1Ynt1,U~n1,t,U~n,t\displaystyle{W_{1}\rightarrow W_{2:n},Y_{n-1}^{t-1}\rightarrow Y_{n}^{t-1},\tilde{U}_{n-1,t},\tilde{U}_{n,t}}

for t{1,,T}t\in\{1,\ldots,T\}, since all trails from W1W_{1} to (Ynt1,U~n1,t,U~n,t)(Y_{n}^{t-1},\tilde{U}_{n-1,t},\tilde{U}_{n,t}) are blocked by (W2:n,Y(n2,n1)t1)(W_{2:n},Y_{(n-2,n-1)}^{t-1}), and all trails from (Ynt1,U~n1,t,U~n,t)(Y_{n}^{t-1},\tilde{U}_{n-1,t},\tilde{U}_{n,t}) to W1W_{1} are blocked by (W2:n,Y~n1t1)(W_{2:n},\tilde{Y}_{n-1}^{t-1}). This implies the Markov chain W1W2:n,Yn1T1YnT1,Xn,TW_{1}\rightarrow W_{2:n},Y_{n-1}^{T-1}\rightarrow Y_{n}^{T-1},X_{n\leftarrow,T}, since X(n1,n),TX_{(n-1,n),T} is included in U~n1,T\tilde{U}_{n-1,T} and X(n,n),TX_{(n,n),T} is included in U~n,T\tilde{U}_{n,T}. Consequently (this follows from the ordinary DPI and from the fact that, if XA,BCX\to A,B\to C is a Markov chain, then XBCX\to B\to C is a Markov chain conditioned on A=aA=a),

I(W1;YnT|W2:n)\displaystyle I(W_{1};Y_{n}^{T}|W_{2:n}) (1ηn)I(W1;YnT1|W2:n)\displaystyle\leq(1-\eta_{n})I(W_{1};Y_{n}^{T-1}|W_{2:n})
+ηnI(W1;Yn1T1|W2:n).\displaystyle\quad+\eta_{n}I(W_{1};Y_{n-1}^{T-1}|W_{2:n}). (B.2)

Also note that I(W1;Yn,1|W2:n)I(W1;Xn,1|W2:n)I(W1;W𝒩n|W2:n)=0I(W_{1};Y_{n,1}|W_{2:n})\leq I(W_{1};X_{n\leftarrow,1}|W_{2:n})\leq I(W_{1};W_{{\mathcal{N}}_{n\leftarrow}}|W_{2:n})=0.

Step 2:
For i{1,,n3}i\in\{1,\ldots,n-3\}, from the Markov chain W,YniTi1X(ni),TiYni,TiW,Y_{n-i}^{T-i-1}\rightarrow X_{{{(n-i)}\leftarrow},T-i}\rightarrow Y_{n-i,T-i} and Lemma 3,

I(W1;YniTi|\displaystyle I(W_{1};Y_{n-i}^{T-i}| W2:n)(1ηni)I(W1;YniTi1|W2:n)\displaystyle W_{2:n})\leq(1-\eta_{n-i})I(W_{1};Y_{n-i}^{T-i-1}|W_{2:n})
+ηniI(W1;YniTi1,X(ni),Ti|W2:n)\displaystyle+\eta_{n-i}I(W_{1};Y_{n-i}^{T-i-1},X_{({n-i)}\leftarrow,T-i}|W_{2:n})

From the Bayesian network corresponding to (B.1), we can read off the Markov chain

W1\displaystyle W_{1} W2:n,Yni1t1\displaystyle\rightarrow W_{2:n},Y_{n-i-1}^{t-1}
Ynit1,U~ni1,t,U~ni,t,X(ni+1,ni),t\displaystyle\rightarrow Y_{n-i}^{t-1},\tilde{U}_{n-i-1,t},\tilde{U}_{n-i,t},X_{(n-i+1,n-i),t}

for t{1,,Ti}t\in\{1,\ldots,T-i\}, since all trails from W1W_{1} to

(Ynit1,U~ni1,t,U~ni,t,X(ni+1,ni),t)(Y_{n-i}^{t-1},\tilde{U}_{n-i-1,t},\tilde{U}_{n-i,t},X_{(n-i+1,n-i),t})

are blocked by (W2:n,Y(ni2,ni1)t1)(W_{2:n},Y_{(n-i-2,n-i-1)}^{t-1}), and all trails from

(Ynit1,U~ni1,t,U~ni,t,X(ni+1,ni),t)(Y_{n-i}^{t-1},\tilde{U}_{n-i-1,t},\tilde{U}_{n-i,t},X_{(n-i+1,n-i),t})

to W1W_{1} are blocked by (W2:n,Y~ni1t1)(W_{2:n},\tilde{Y}_{n-i-1}^{t-1}). This implies the Markov chain

W1W2:n,Yni1Ti1YniTi1,X(ni),Ti,W_{1}\rightarrow W_{2:n},Y_{n-i-1}^{T-i-1}\rightarrow Y_{n-i}^{T-i-1},X_{(n-i)\leftarrow,T-i},

since X(ni1,ni),TiX_{(n-i-1,n-i),T-i} is included in U~ni1,Ti\tilde{U}_{n-i-1,T-i} and X(ni,ni),TiX_{(n-i,n-i),T-i} is included in U~ni,Ti\tilde{U}_{n-i,T-i}. Therefore,

I(W1;YniTi|W2:n)\displaystyle I(W_{1};Y_{n-i}^{T-i}|W_{2:n}) (1ηni)I(W1;YniTi1|W2:n)\displaystyle\leq(1-\eta_{n-i})I(W_{1};Y_{n-i}^{T-i-1}|W_{2:n})
+ηniI(W1;Yni1Ti1|W2:n)\displaystyle\quad+\eta_{n-i}I(W_{1};Y_{n-i-1}^{T-i-1}|W_{2:n}) (B.3)

for i{1,,n3}i\in\{1,\ldots,n-3\}. Also note that

I(W1;Yni,1|W2:n)\displaystyle I(W_{1};Y_{n-i,1}|W_{2:n}) I(W1;X(ni),1|W2:n)\displaystyle\leq I(W_{1};X_{(n-i)\leftarrow,1}|W_{2:n})
I(W1;W𝒩(ni)|W2:n)\displaystyle\leq I(W_{1};W_{{\mathcal{N}}_{(n-i)\leftarrow}}|W_{2:n})
=0.\displaystyle=0.

Step 3:
Finally, we upper-bound I(W1;Y2Tn+2|W2:n)I(W_{1};Y_{2}^{T-n+2}|W_{2:n}) for Tn1T\geq n-1. From the Markov chain W,Y2t1X2,tY2,tW,Y_{2}^{t-1}\rightarrow X_{2\leftarrow,t}\rightarrow Y_{2,t} and Lemma 3,

I(W1;Y2Tn+2|W2:n)\displaystyle I(W_{1};Y_{2}^{T-n+2}|W_{2:n}) (1η2)I(W1;Y2Tn+1|W2:n)\displaystyle\leq(1-\eta_{2})I(W_{1};Y_{2}^{T-n+1}|W_{2:n})
+η2H(W1|W2:n).\displaystyle\quad+\eta_{2}H(W_{1}|W_{2:n}). (B.4)

This upper bound is useful only when H(W1|W2:n)H(W_{1}|W_{2:n}) is finite. If the observations are continuous r.v.’s, we can upper bound I(W1;Y2Tn+2|W2:n)I(W_{1};Y_{2}^{T-n+2}|W_{2:n}) in terms of the channel capacity C(1,2)C_{(1,2)}:

I(W1;Y2Tn+2|W2:n)\displaystyle\quad\,\,I(W_{1};Y_{2}^{T-n+2}|W_{2:n})
=t=1Tn+2I(W1;Y2,t|W2:n,Y2t1)\displaystyle=\sum_{t=1}^{T-n+2}I(W_{1};Y_{2,t}|W_{2:n},Y_{2}^{t-1})
=(a)t=1Tn+2(I(W1;Y(1,2),t|W2:n,Y2t1)\displaystyle\overset{\rm(a)}{=}\sum_{t=1}^{T-n+2}\Big{(}I(W_{1};Y_{(1,2),t}|W_{2:n},Y_{2}^{t-1})
+I(W1;Y~2,t|W2:n,Y2t1,Y(1,2),t))\displaystyle\qquad\qquad+I(W_{1};\tilde{Y}_{2,t}|W_{2:n},Y_{2}^{t-1},Y_{(1,2),t})\Big{)}
(b)t=1Tn+2I(X(1,2),t;Y(1,2),t|W2:n,Y2t1)\displaystyle\overset{\rm(b)}{\leq}\sum_{t=1}^{T-n+2}I(X_{(1,2),t};Y_{(1,2),t}|W_{2:n},Y_{2}^{t-1})
(c)t=1Tn+2I(X(1,2),t;Y(1,2),t)\displaystyle\overset{\rm(c)}{\leq}\sum_{t=1}^{T-n+2}I(X_{(1,2),t};Y_{(1,2),t})
C(1,2)(Tn+2),\displaystyle\leq C_{(1,2)}(T-n+2), (B.5)

where we have used the Markov chain W1W2:n,Y2t1,Y(1,2),tY~2,tW_{1}\rightarrow W_{2:n},Y_{2}^{t-1},Y_{(1,2),t}\rightarrow\tilde{Y}_{2,t} for t{1,,Tn+2}t\in\{1,\ldots,T-n+2\}, which follows by applying the d-separation criterion to the Bayesian network corresponding to the factorization in (B.1), so that the second term in (a) is zero; the Markov chain W,Y2t1X(1,2),tY(1,2),tW,Y_{2}^{t-1}\rightarrow X_{(1,2),t}\rightarrow Y_{(1,2),t}, which also implies the Markov chain W1X(1,2),t,W2:n,Y2t1Y(1,2),tW_{1}\rightarrow X_{(1,2),t},W_{2:n},Y_{2}^{t-1}\rightarrow Y_{(1,2),t} by the weak union property of conditional independence, hence (b) and (c); and the fact that I(X(1,2),t;Y(1,2),t)C(1,2)I(X_{(1,2),t};Y_{(1,2),t})\leq C_{(1,2)}.

Step 4:
Define Ii,t=I(W1;Yit|W2:n)I_{i,t}=I(W_{1};Y_{i}^{t}|W_{2:n}) for i2i\geq 2 and t1t\geq 1. From (B.2), (B.3), (B.4), and (B.5), we can write, for n3n\geq 3, Tn1T\geq n-1, and i{0,,n3}i\in\{0,\ldots,n-3\},

Ini,Ti\displaystyle I_{n-i,T-i} η¯niIni,Ti1+ηniIni1,Ti1\displaystyle\leq\bar{\eta}_{n-i}I_{n-i,T-i-1}+\eta_{n-i}I_{n-i-1,T-i-1} (B.6)

where η¯ni=1ηni\bar{\eta}_{n-i}=1-\eta_{n-i}, and Ini,1=0I_{n-i,1}=0. In addition, for Tn1T\geq n-1,

I2,Tn+2\displaystyle I_{2,T-n+2} {η¯2I2,Tn+1+η2H(W1|W2:n)C(1,2)(Tn+2),\displaystyle\leq\begin{cases}\bar{\eta}_{2}I_{2,T-n+1}+\eta_{2}H(W_{1}|W_{2:n})\\ C_{(1,2)}(T-n+2)\end{cases}, (B.7)

and I2,0=0I_{2,0}=0.

An upper bound on I(W1;YnT|W2:n)I(W_{1};Y_{n}^{T}|W_{2:n}) can be obtained by solving this set of recursive inequalities with the specified boundary conditions. It can be checked by induction that I(W1;YnT|W2:n)=0I(W_{1};Y_{n}^{T}|W_{2:n})=0 if Tn2T\leq n-2. For Tn1T\geq n-1, if ηiη~\eta_{i}\leq\tilde{\eta} for all i{1,,n}i\in\{1,\ldots,n\}, then the above inequalities continue to hold with ηi\eta_{i}’s replaced with η~\tilde{\eta}. The resulting set of inequalities is similar to the one obtained by Rajagopalan and Schulman [13] for the evolution of mutual information in broadcasting a bit over a unidirectional chain of BSCs. With

(m,k,p)(mk)pk(1p)mk,{\mathcal{B}}(m,k,p)\triangleq{m\choose k}p^{k}(1-p)^{m-k},

the exact solution is given by

I(W1;YnT|W2:n)\displaystyle\quad\,\,I(W_{1};Y_{n}^{T}|W_{2:n})
H(W1|W2:n)η~i=1Tn+2η~n2(1η~)Tin+2(Tin2)\displaystyle\leq H(W_{1}|W_{2:n})\tilde{\eta}\sum_{i=1}^{T-n+2}\tilde{\eta}^{n-2}(1-\tilde{\eta})^{T-i-n+2}{T-i\choose n-2}
=H(W1|W2:n)η~i=1Tn+2(Ti,n2,η~)\displaystyle=H(W_{1}|W_{2:n})\tilde{\eta}\sum_{i=1}^{T-n+2}{\mathcal{B}}(T-i,n-2,\tilde{\eta})

for n2n\geq 2, and

I(W1;YnT|W2:n)\displaystyle\quad\,\,I(W_{1};Y_{n}^{T}|W_{2:n})
C(1,2)η~i=1Tn+2η~n3(1η~)Tin+2(Ti1n3)i\displaystyle\leq C_{(1,2)}\tilde{\eta}\sum_{i=1}^{T-n+2}\tilde{\eta}^{n-3}(1-\tilde{\eta})^{T-i-n+2}{T-i-1\choose n-3}i
=C(1,2)η~i=1Tn+2(Ti1,n3,η~)i\displaystyle=C_{(1,2)}\tilde{\eta}\sum_{i=1}^{T-n+2}{\mathcal{B}}(T-i-1,n-3,\tilde{\eta})i

for n3n\geq 3. This proves (30) and (31).
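The first closed-form expression above (the one involving H(W_1|W_{2:n})) can be cross-checked by iterating the recursion (B.6)–(B.7) directly, with all contraction coefficients equal to a common value and the H(W_1|W_{2:n})-branch of (B.7) taken with equality; the following sketch (an aside, with arbitrary test values) does exactly that.

from math import comb

def iterate_recursion(n, T, eta, H):
    # Run (B.6)-(B.7) at equality with all eta_i = eta, using the H(W_1|W_{2:n}) branch of (B.7)
    q = 1.0 - eta
    I_prev = [H * (1.0 - q**s) for s in range(T + 1)]   # I_{2,s}, s = 0,...,T
    for j in range(3, n + 1):                           # build I_{j,s} for j = 3,...,n
        I_cur = [0.0] * (T + 1)
        for s in range(1, T + 1):
            I_cur[s] = q * I_cur[s - 1] + eta * I_prev[s - 1]
        I_prev = I_cur
    return I_prev[T]

def closed_form(n, T, eta, H):
    # H * eta * sum_{i=1}^{T-n+2} B(T-i, n-2, eta)
    return H * eta * sum(comb(T - i, n - 2) * eta**(n - 2) * (1.0 - eta)**(T - i - n + 2)
                         for i in range(1, T - n + 3))

n, T, eta, H = 5, 12, 0.3, 1.0
print(iterate_recursion(n, T, eta, H), closed_form(n, T, eta, H))  # the two values agree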

For general ηi\eta_{i}’s, we obtain a suboptimal upper bound by unrolling the first term in (B.6) for each ii and using the fact that Ini,t=0I_{n-i,t}=0 for tni2t\leq n-i-2, getting

Ini,Ti\displaystyle I_{n-i,T-i} η¯niTn+1ηniIni1,ni2+\displaystyle\leq\bar{\eta}_{n-i}^{T-n+1}\eta_{n-i}I_{n-i-1,n-i-2}+\ldots
+η¯niηniIni1,Ti2+ηniIni1,Ti1\displaystyle\quad+\bar{\eta}_{n-i}\eta_{n-i}I_{n-i-1,T-i-2}+\eta_{n-i}I_{n-i-1,T-i-1}
(η¯niTn+1++η¯ni+1)ηniIni1,Ti1\displaystyle\leq\big{(}\bar{\eta}_{n-i}^{T-n+1}+\ldots+\bar{\eta}_{n-i}+1\big{)}\eta_{n-i}I_{n-i-1,T-i-1}
=(1η¯niTn+2)Ini1,Ti1.\displaystyle=\big{(}1-\bar{\eta}_{n-i}^{T-n+2}\big{)}I_{n-i-1,T-i-1}.

Iterating over ii, and noting that

I2,Tn+2\displaystyle\quad\,\,I_{2,T-n+2}
min{H(W1|W2:n)(1η¯2Tn+2),C(1,2)(Tn+2)},\displaystyle\leq\min\big{\{}H(W_{1}|W_{2:n})(1-\bar{\eta}_{2}^{T-n+2}),C_{(1,2)}(T-n+2)\big{\}},

we get for n2n\geq 2 and Tn1T\geq n-1,

I(W1;YnT|W2:n)\displaystyle I(W_{1};Y_{n}^{T}|W_{2:n})\leq
{H(W1|W2:n)i=2n(1(1ηi)Tn+2)C(1,2)(Tn+2)i=3n(1(1ηi)Tn+2).\displaystyle\begin{cases}H(W_{1}|W_{2:n})\prod_{i=2}^{n}\big{(}1-(1-\eta_{i})^{T-n+2}\big{)}\\ C_{(1,2)}(T-n+2)\prod_{i=3}^{n}\big{(}1-(1-\eta_{i})^{T-n+2}\big{)}\end{cases}. (B.8)

The weakened upper bounds in (32) and (33) are obtained by replacing ηi\eta_{i} in (B.8) with

ηmaxi=1,,nηi.\eta\triangleq\max_{i=1,\ldots,n}\eta_{i}.

Finally, we show (8) using an argument similar to the one in [13]. If n4n\geq 4 and T2+(n3)γ/ηT\leq 2+(n-3)\gamma/\eta for some γ(0,1)\gamma\in(0,1), then

η<ηγn3T2n2T11\displaystyle\eta<\frac{\eta}{\gamma}\leq\frac{n-3}{T-2}\leq\frac{n-2}{T-1}\leq 1

where the last inequality follows from the assumption that Tn1T\geq n-1, since otherwise I(Z;Z^n|W2:n)=0I(Z;{\widehat{Z}}_{n}|W_{2:n})=0. The upper bounds in (30) and (31) can be weakened to

I(Z;Z^n|W2:n)\displaystyle\quad\,\,I(Z;{\widehat{Z}}_{n}|W_{2:n})
(a){H(W1|W2:n)η(Tn+2)(T1,n2,η)C(1,2)η(Tn+2)2(T2,n3,η)\displaystyle\overset{\rm(a)}{\leq}\begin{cases}H(W_{1}|W_{2:n})\eta(T-n+2){\mathcal{B}}(T-1,n-2,\eta)\\ C_{(1,2)}\eta(T-n+2)^{2}{\mathcal{B}}(T-2,n-3,\eta)\end{cases}
(b)min{H(W1|W2:n),C(1,2)}η(Tn+2)2(T2,n3,η)\displaystyle\overset{\rm(b)}{\leq}\min\big{\{}H(W_{1}|W_{2:n}),C_{(1,2)}\big{\}}\eta(T-n+2)^{2}{\mathcal{B}}(T-2,n-3,\eta)
(c)C(1,2)η(Tn+2)2exp(2(n3T2η)2(T2))\displaystyle\overset{\rm(c)}{\leq}C_{(1,2)}\eta(T-n+2)^{2}\exp\left(-2\left(\frac{n-3}{T-2}-\eta\right)^{2}(T-2)\right)
(d)C(1,2)(n3)2γ2ηexp(2(ηγη)2(n3))\displaystyle\overset{\rm(d)}{\leq}C_{(1,2)}\frac{(n-3)^{2}\gamma^{2}}{\eta}\exp\left(-2\left(\frac{\eta}{\gamma}-\eta\right)^{2}(n-3)\right)

where

  • (a) and (b) follow from monotonicity properties of the binomial distribution;
  • (c) follows from the Chernoff–Hoeffding bound;
  • (d) follows from the fact that the channels associated with 𝒮{\mathcal{E}}_{\mathcal{S}} are independent, and from the assumption that n4n\geq 4 and n1T2+(n3)γ/ηn-1\leq T\leq 2+(n-3)\gamma/\eta. A small numerical illustration of the resulting bound is given below.
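The final bound decays exponentially in n-3 once T stays below 2+(n-3)γ/η. The small sketch below evaluates it for placeholder values of C_{(1,2)}, γ, and η chosen by us.

import numpy as np

C12, gamma, eta = 1.0, 0.5, 0.4
for n in (10, 20, 40, 80):
    bound = C12 * (n - 3)**2 * gamma**2 / eta * np.exp(-2.0 * (eta / gamma - eta)**2 * (n - 3))
    print(n, bound)   # decays roughly like exp(-2*(eta/gamma - eta)^2 * n)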

Figure 10: Bayesian network of (W,XT,UT,YT)(W,X^{T},U^{T},Y^{T}) for the randomized algorithm 𝒜{\mathcal{A}}^{\prime} on a 44-node bidirected chain with T=4T=4. (W1:4W_{1:4} are arbitrarily correlated, and not all edges emanating from W2:4W_{2:4} are shown.)

References

  • [1] O. Ayaso, D. Shah, and M. Dahleh, “Information-theoretic bounds for distributed computation over networks of point-to-point channels,” IEEE Trans. Inform. Theory, vol. 56, no. 12, pp. 6020–6039, 2010.
  • [2] G. Como and M. Dahleh, “Lower bounds on the estimation error in problems of distributed computation,” in Proc. Inform. Theory and Applications Workshop, 2009, pp. 70–76.
  • [3] N. Goyal, G. Kindler, and M. Saks, “Lower bounds for the noisy broadcast problem,” SIAM Journal on Computing, vol. 37, no. 6, pp. 1806–1841, 2008.
  • [4] C. Dutta, Y. Kanoria, D. Manjunath, and J. Radhakrishnan, “A tight lower bound for parity in noisy communication networks,” in Proc. ACM Symposium on Discrete Algorithms (SODA), 2014, pp. 1056–1065.
  • [5] M. Braverman, “Interactive information and coding theory,” in Proc. Int. Congress Math., 2014.
  • [6] A. Orlitsky and J. Roche, “Coding for computing,” IEEE Trans. Inform. Theory, vol. 47, no. 3, pp. 903–917, 2001.
  • [7] J. Körner and K. Marton, “How to encode the modulo-two sum of binary sources,” IEEE Trans. Inform. Theory, vol. 25, no. 2, pp. 219–221, 1979.
  • [8] A. B. Wagner, S. Tavildar, and P. Viswanath, “Rate region of the quadratic Gaussian two-encoder source-coding problem,” IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1938–1961, 2008.
  • [9] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge Univ. Press, 2011.
  • [10] A. Giridhar and P. Kumar, “Toward a theory of in-network computation in wireless sensor networks,” IEEE Communications Magazine, vol. 44, no. 4, pp. 98–107, April 2006.
  • [11] R. Gallager, “Finding parity in a simple broadcast network,” IEEE Trans. Inform. Theory, vol. 34, no. 2, pp. 176–180, 1988.
  • [12] L. Schulman, “Coding for interactive communication,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1745–1756, 1996.
  • [13] S. Rajagopalan and L. Schulman, “A coding theorem for distributed computation,” in ACM Symposium on Theory of Computing, 1994.
  • [14] R. Carli, G. Como, P. Frasca, and F. Garin, “Distributed averaging on digital erasure networks,” Automatica, vol. 47, pp. 115–121, 2011.
  • [15] S. Kar and J. Moura, “Distributed consensus algorithms in sensor networks with imperfect communication: Link failures and channel noise,” IEEE Trans. Signal Process., vol. 57, no. 1, pp. 355–369, 2009.
  • [16] N. Noorshams and M. Wainwright, “Non-asymptotic analysis of an optimal algorithm for network-constrained averaging with noisy links,” IEEE J. Sel. Top. Sign. Proces., vol. 5, no. 4, pp. 833–844, 2011.
  • [17] L. Ying, R. Srikant, and G. Dullerud, “Distributed symmetric function computation in noisy wireless sensor networks with binary data,” in International Symposium on Modeling and Optimization in Mobile, Ad-Hoc and Wireless networks (WiOpt), 2006.
  • [18] S. Deb, M. Medard, and C. Choute, “Algebraic gossip: a network coding approach to optimal multiple rumor mongering,” IEEE Trans. Inform. Theory, vol. 52, no. 6, pp. 2486–2507, 2006.
  • [19] V. V. Petrov, Sums of Independent Random Variables. Berlin: Springer-Verlag, 1975.
  • [20] M. Raginsky, “Strong data processing inequalities and Φ\Phi-Sobolev inequalities for discrete channels,” IEEE Trans. Inform. Theory, vol. 62, no. 6, pp. 3355–3389, 2016.
  • [21] P. Tiwari, “Lower bounds on communication complexity in distributed computer networks,” J. ACM, vol. 34, no. 4, pp. 921–938, Oct. 1987.
  • [22] A. Chattopadhyay, J. Radhakrishnan, and A. Rudra, “Topology matters in communication,” in Proc. IEEE Annu. Symp. on Foundations of Comp. Sci. (FOCS), Oct 2014, pp. 631–640.
  • [23] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley, 2006.
  • [24] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • [25] Y. Polyanskiy and Y. Wu, “Lecture Notes on Information Theory,” Lecture Notes for ECE563 (UIUC) and 6.441 (MIT), 2012-2016. [Online]. Available: http://people.lids.mit.edu/yp/homepage/data/itlectures_v4.pdf
  • [26] R. Ahlswede and P. Gács, “Spreading of sets in product spaces and hypercontraction of the Markov operator,” Ann. Probab., vol. 4, no. 6, pp. 925–939, 1976.
  • [27] V. Anantharam, A. Gohari, S. Kamath, and C. Nair, “On maximal correlation, hypercontractivity, and the data processing inequality studied by Erkip and Cover,” arXiv preprint, 2013. [Online]. Available: http://arxiv.org/abs/1304.6133
  • [28] Y. Polyanskiy and Y. Wu, “Dissipation of information in channels with input constraints,” IEEE Trans. Inform. Theory, vol. 62, no. 1, pp. 35–55, 2016.
  • [29] W. Evans and L. Schulman, “Signal propagation and noisy circuits,” IEEE Trans. Inform. Theory, vol. 45, no. 7, pp. 2367–2373, 1999.
  • [30] A. Xu, “Information-theoretic limitations of distributed information processing,” Ph.D. dissertation, University of Illinois at Urbana-Champaign, 2016.
  • [31] Y. Polyanskiy and Y. Wu, “Strong data-processing inequalities for channels and Bayesian networks,” arXiv preprint, 2015. [Online]. Available: http://arxiv.org/abs/1508.06025
  • [32] V. Kostina and S. Verdú, “Lossy joint source-channel coding in the finite blocklength regime,” IEEE Trans. Inform. Theory, vol. 59, no. 5, pp. 2545–2575, 2013.
  • [33] R. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
  • [34] A. Kolmogorov, “Sur les propriétés des fonctions de concentrations de M. P. Lévy,” Ann. Inst. H. Poincaré, vol. 16, pp. 27–34, 1958.
  • [35] H. H. Nguyen and V. H. Vu, “Small ball probability, inverse theorems, and applications,” in Erdős Centennial, ser. Bolyai Society Mathematical Studies. Springer, 2013, vol. 25. [Online]. Available: http://arxiv.org/abs/1301.0019
  • [36] S. G. Bobkov and G. P. Chistyakov, “On concentration functions of random variables,” J. Theor. Probab., vol. 28, no. 3, pp. 976–988, 2015, published online.
  • [37] M. Rudelson and R. Vershynin, “The Littlewood–Offord problem and invertibility of random matrices,” Adv. Math., vol. 218, pp. 600–633, 2008.
  • [38] M. Rudelson and R. Vershynin, “Small ball probabilities for linear images of high dimensional distributions,” arXiv preprint arXiv:1402.4492, Feb. 2014. [Online]. Available: https://arxiv.org/abs/1402.4492
  • [39] P. Erdős, “On a lemma of Littlewood and Offord,” Bull. Amer. Math. Soc., vol. 51, pp. 898–902, 1945.
  • [40] S. Bobkov and M. Madiman, “The entropy per coordinate of a random vector is highly constrained under convexity conditions,” IEEE Trans. Inform. Theory, vol. 57, no. 8, pp. 4940–4954, 2011.
  • [41] R. Carli, G. Como, P. Frasca, and F. Garin, “Average consensus on digital noisy networks,” 1st IFAC Workshop on Estimation and Control of Networked Systems, 2009.
  • [42] H. S. Witsenhausen, “On sequences of pairs of dependent random variables,” SIAM J. Appl. Math., vol. 28, no. 1, pp. 100–113, Jan. 1975.