Achievable Rates and Low-Complexity Encoding of Posterior Matching for the BSC
Abstract
Horstein, Burnashev, Shayevitz and Feder, Naghshvar et al., and others have studied sequential transmission of a $K$-bit message over the binary symmetric channel (BSC) with full, noiseless feedback using posterior matching. Yang et al. provide an improved lower bound on the achievable rate using a martingale analysis that relies on the small-enough-difference (SED) partitioning introduced by Naghshvar et al. SED requires a relatively complex encoder and decoder. To reduce complexity, this paper replaces SED with relaxed constraints that admit the small-enough-absolute-difference (SEAD) partitioning rule. The main analytical results show that achievable-rate bounds higher than those found by Yang et al. [2] are possible even under the new constraints, which are less restrictive than SED. The new analysis does not use martingale theory for the confirmation phase and applies a surrogate channel technique to tighten the results. An initial systematic transmission further increases the achievable-rate bound. The simplified encoder associated with SEAD has complexity below order $K^2$ and allows simulations for message sizes of at least 1000 bits. For example, simulations achieve % of the channel's -bit capacity with an average block size of 200 bits for a target codeword error rate of .
Index Terms:
Posterior matching, binary symmetric channel, noiseless feedback, random coding.

I Introduction

Consider sequential transmission over the binary symmetric channel (BSC) with full, noiseless feedback as depicted in Fig. 1. The source data at the transmitter is a $K$-bit message $\theta$, sampled uniformly from the set of all $2^K$ possible messages. At each time $t$, input symbol $X_t$ is transmitted across the channel and output symbol $Y_t$ is received, where $X_t, Y_t \in \{0,1\}$ and each input is flipped independently with crossover probability $p$. The received symbol $Y_t$ is available to the transmitter for encoding symbol $X_{t+1}$ (and subsequent symbols) via the noiseless feedback channel.

The process terminates at a stopping time $\tau$ when a reliability threshold is achieved, at which point the receiver computes an estimate $\hat{\theta}$ of $\theta$ from the received symbols $Y_1, \dots, Y_\tau$. The communication problem consists of obtaining the decoding estimate $\hat{\theta}$ at the smallest possible time index while keeping the error probability $P(\hat{\theta} \neq \theta)$ bounded by a small threshold $\epsilon$.
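To make the setting concrete, here is a minimal, brute-force sketch of this feedback loop, assuming Python, an exhaustive posterior vector over all $2^K$ messages, and a simple greedy split standing in for the partitioning rules discussed later; it illustrates the model, not the low-complexity encoder of Sec. VII.

```python
import random

def transmit(theta, K, p, eps=1e-3, rng=random.Random(0)):
    """One variable-length transmission of message theta in {0,...,2^K - 1}
    over a BSC(p) with full noiseless feedback (brute-force posteriors)."""
    M = 2 ** K
    rho = [1.0 / M] * M                       # uniform prior over messages
    t = 0
    while max(rho) < 1.0 - eps:               # reliability threshold
        # Greedy split: S0 collects high-posterior messages up to mass 1/2.
        order = sorted(range(M), key=lambda i: -rho[i])
        S0, mass = set(), 0.0
        for i in order:
            if mass >= 0.5:
                break
            S0.add(i)
            mass += rho[i]
        x = 0 if theta in S0 else 1           # encoder knows theta
        y = x ^ (rng.random() < p)            # BSC flips x with probability p
        # Bayes update; transmitter and receiver both know y via feedback.
        rho = [r * ((1.0 - p) if (i in S0) == (y == 0) else p)
               for i, r in enumerate(rho)]
        total = sum(rho)
        rho = [r / total for r in rho]
        t += 1
    return rho.index(max(rho)), t             # (decoded message, blocklength)

# Example: transmit(theta=5, K=4, p=0.05) typically decodes 5 correctly.
```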
I-A Background
Shannon [3] showed that feedback cannot increase the capacity of discrete memoryless channels (DMCs). However, Burnashev [4] showed that, when combined with variable-length coding, feedback can increase the decay rate of the frame error rate (FER) as a function of blocklength. One such variable-length coding method was pioneered by Horstein [5]. Horstein's sequential transmission scheme was presumed to achieve the capacity of the BSC, a result later proved by Shayevitz and Feder [6], who showed that it satisfies the criteria of a posterior matching scheme. A posterior matching (PM) scheme was defined by Shayevitz and Feder as one that satisfies the two requirements of the posterior matching principle:

1. The input symbol at time $t$, $X_t$, is a fixed function of a random variable that is independent of the received symbol history $Y^{t-1}$; and

2. The transmitted message, $\theta$, can be uniquely recovered a.s. from that random variable together with the received symbols.

Gorantla and Coleman [7] used Lyapunov functions for an alternative proof that PM schemes achieve the channel capacity. Later, Li and El-Gamal [8] proposed a capacity-achieving "posterior matching" scheme with fixed blocklength for DMCs. Their scheme used a random cyclic shift that was later used by Shayevitz and Feder for a simpler proof that Horstein's scheme achieves capacity [9]. Naghshvar et al. [10] proposed a variable-length, single-phase "posterior matching" scheme for DMCs with feedback that exhibits Burnashev's optimal error exponent, and used a sub-martingale analysis to prove that it achieves the channel capacity. Bae and Anastasopoulos [11] proposed a PM scheme that achieves the capacity of finite-state channels with feedback. Since then, other "posterior matching" algorithms have been developed; see [12, 13, 14, 15, 16]. Other variable-length schemes that attain Burnashev's optimal error exponent have also been developed; some can be found in [17, 18, 19, 20, 21].
Feedback communication over the BSC in particular has been the subject of extensive investigation. Capacity-approaching, fixed-length schemes have been developed such as [8], but these schemes only achieve low frame error rates (FERs) at block sizes larger than 1000 bits. For shorter block lengths, capacity-approaching, variable-length schemes have also been developed, e.g., [5], [4], [10]. Recently, Yang et al. [2] provided the best currently available achievability bound for these variable-length schemes. Yang et al. derive an achievable rate using encoders that satisfy the small-enough-difference (SED) constraint. However, the complexity of variable-length schemes satisfying that constraint can grow quickly with message size, becoming too complex for practical implementation even at block lengths significantly below those addressed by the fixed-length schemes such as in [8].
I-B Contributions
In our precursor conference paper [1], we simplified the implementation of an encoder that enforces the SED constraint both by initially sending systematic bits and by grouping the messages according to Hamming distance from the received systematic bits. The contributions of the current paper include the following:
• This paper provides a new analysis framework for posterior matching on the BSC that avoids martingale analysis in the communication phase in order to show that the achievable rate of [2] can be attained with a broader set of encoders that satisfy less restrictive criteria than the SED constraint. Thm. 3 provides an example of a constraint, the small-enough-absolute-difference (SEAD) constraint, that meets the new, relaxed criteria.

• The relaxed criteria allow a significant reduction of encoder complexity. Specifically, this paper shows that applying a new partitioning algorithm, thresholding of ordered posteriors (TOP), induces a partitioning that meets the SEAD constraints. The TOP algorithm facilitates further complexity reduction by avoiding explicit computation of posterior updates for the majority of messages, since those posterior updates are not required to compute the threshold position. This low-complexity encoding algorithm achieves the same rate performance that has previously been established for SED encoders in, e.g., [2].

• We also show that using systematic transmissions as in [1] to initially send the message meets both the relaxed criteria, including SEAD, and the SED constraint. Complexity is reduced during the systematic transmission, with the required operations limited to simply storing the received sequence.

• We generalize the concept of the "surrogate process" used in Sec. V-E of [2] to a broader class of processes that are not necessarily sub-martingales. The ability to construct such "surrogate" processes allows tighter bounds that also apply to the original process.

• Taken together, these results demonstrate that variable-length coding with full noiseless feedback can closely approach capacity with modest complexity.

• Regarding complexity, the simplified encoder associated with SEAD has complexity below order $K^2$ and allows simulations for message sizes of at least 1000 bits. The simplified encoder organizes messages according to their type, i.e., their Hamming distance from the received word, orders messages according to their posteriors, and partitions the messages with a simple threshold without requiring any swaps.

• Regarding proximity to capacity, our achievable-rate bounds show that with a codeword error rate of , SEAD posterior matching can achieve % of the channel's -bit capacity for an average blocklength of bits corresponding to a message with bits. Simulations with our simplified encoder achieve % of the channel's -bit capacity for a target codeword error rate of with an average block size of bits corresponding to a message with bits.
I-C Organization
The rest of the paper proceeds as follows. Sec. II describes the communication process, introduces the problem statement, and reviews the best existing achievability bound, by Yang et al. [2], as well as the scheme that achieves it, by Naghshvar et al. [10]. Sec. III introduces Thms. 1, 2, and 3, which together relax the constraints sufficient to guarantee a rate above Yang's lower bound and further tighten Yang's bound. Sec. IV introduces Lemmas 1-5 and uses them to prove Thm. 1. Sec. V provides the proofs of Lemmas 1-5 and of Thms. 2 and 3. Sec. VI generalizes the new rate lower bound to arbitrary input distributions and derives an improved lower bound for the special case where a uniform input distribution is transformed into a binomial distribution through a systematic transmission phase. Sec. VII describes the TOP partitioning method and implements a simplified encoder that organizes messages according to their type, applies TOP, and employs initial systematic transmissions. Sec. VIII compares performance from simulations using the simplified encoder to the new achievability bounds. Sec. IX provides our conclusions. The Appendix provides a detailed proof of the second part of Thm. 3 and the proof of Claim 1.
II Posterior Matching with SED Partitioning
II-A Communication Scheme
Our proposed communication scheme and simplified encoding algorithm are based on the single-phase transmission scheme proposed by Naghshvar et al. [10]. Before each transmission, both the transmitter and the receiver partition the message set into two sets, $S_0$ and $S_1$. The partition is based on the received symbols according to a specified deterministic algorithm known to both the transmitter and receiver. Then, the transmitter sends $X_{t+1} = 0$ if $\theta \in S_0$ and $X_{t+1} = 1$ if $\theta \in S_1$, i.e.,

$X_{t+1} = \begin{cases} 0, & \theta \in S_0 \\ 1, & \theta \in S_1 \end{cases}$ (1)

After receiving symbol $Y_{t+1}$, the receiver computes the posterior probabilities:

$\rho_i(t+1) = P(\theta = i \mid Y^{t+1})$ (2)

The transmitter also computes these posteriors, as it has access to the received symbol via the noiseless feedback channel, which allows both transmitter and receiver to run the same deterministic partitioning algorithm. The process repeats until the first time that a single message attains a posterior of at least $1-\epsilon$. The receiver chooses this message as the estimate $\hat{\theta}$. Since $\theta$ is uniformly sampled, every possible message has the same prior: $\rho_i(0) = 2^{-K}$.
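As a sketch of this update, assuming a NumPy array of posteriors and a boolean membership mask (the names are our own illustration), both ends of the link can run:

```python
import numpy as np

def update_posteriors(rho, in_S1, y, p):
    """Bayes update of the message posteriors after receiving symbol y.
    rho: posterior vector; in_S1: boolean mask of membership in S_1;
    p: BSC crossover probability."""
    sent = in_S1.astype(int)                 # symbol each message would send
    like = np.where(sent == y, 1.0 - p, p)   # channel likelihood P(y | sent)
    post = rho * like
    return post / post.sum()                 # normalize to sum to one
```

Since the symbol $y$ is known to both ends via feedback, the transmitter and receiver posteriors never diverge.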
To prove that the SED scheme of Naghshvar et al. [10] is a posterior matching BSC scheme as described in [6], it suffices to show that the scheme uses the same encoding function as [6] applied to a permutation of the messages. Since the posteriors are fully determined by the history of received symbols, a permutation of the messages can be defined by concatenating the messages in $S_0$ and $S_1$, each sorted by decreasing posterior. This permutation induces a c.d.f. on the corresponding posteriors. Then, to satisfy the posterior matching principle, the random variable can be taken to be the c.d.f. evaluated at the last message before $\theta$. The resulting encoding function outputs $0$ if $\theta \in S_0$ and $1$ otherwise, matching (1).

Naghshvar et al. proposed two methods to construct the partitions $S_0$ and $S_1$. The simplest one, described as the small-enough-difference (SED) encoder in [2], consists of an algorithm that terminates when the SED constraint below is met:

(3)

The algorithm starts with all messages in $S_0$ and a vector of the messages' posteriors. Items are moved to $S_1$ one by one, from smallest to largest posterior. The process ends at any point where rule (3) is met. If the accumulated probability in $S_0$ falls below that of $S_1$, then the labelings of $S_0$ and $S_1$ are swapped, after which the process resumes.

The worst-case complexity of this algorithm is of order $M^2$, where $M$ is the number of posteriors. The $M$ is squared because part of the process repeats after every swap, and in the worst case the number of swaps is proportional to $M$. However, a likely scenario is that the process ends after very few swaps, in which case the complexity is of order $M$.
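The following sketch illustrates the move-and-swap procedure and its worst-case quadratic behavior; we hedge on the exact form of rule (3) and use one common statement of the SED condition as an assumption.

```python
def sed_partition(rho):
    """Greedy SED-style partitioner (sketch). rho: posteriors summing to 1.
    Returns index lists (S0, S1)."""
    S0, S1 = list(range(len(rho))), []
    P0, P1 = 1.0, 0.0
    # Assumed SED form: 0 <= P(S0) - P(S1) <= smallest posterior in S0.
    while len(S0) > 1 and not (0.0 <= P0 - P1 <= min(rho[i] for i in S0)):
        i = min(S0, key=lambda j: rho[j])    # smallest posterior still in S0
        S0.remove(i)
        S1.append(i)                         # move it to S1
        P0 -= rho[i]
        P1 += rho[i]
        if P0 < P1:                          # S0 fell behind: swap the labels
            S0, S1, P0, P1 = S1, S0, P1, P0  # and resume the process
    return S0, S1
```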
The second method by which Naghshvar et al. proposed to construct $S_0$ and $S_1$ consists of an exhaustive search over all possible partitions, i.e., the power set of the message set, with a metric to determine the optimal partition. This search would clearly include the partitioning of the first method and, therefore, also provide the guarantees of equations (9) and (14).
II-B Yang’s Achievable Rate
Yang et al. [2] developed the upper bound (7) on the expected blocklength of the SED encoder that, to the best of our knowledge, is the best upper bound that has been developed for this model.

The analysis by Yang et al. consists of two steps. The first step, a theorem in [2], splits the single-phase process of Naghshvar et al. [10] into a two-phase process: a communication phase, ending at the intermediate stopping time at which the posterior of the transmitted message first crosses $1/2$, and a confirmation phase occupying the remaining time. This method was first used by Burnashev in [4]. With the first step alone, the following upper bound on the expected blocklength can be constructed:

(4)

where $C$ is the channel capacity of the BSC, and the remaining constants from [2] are given by:

(5)
(6)

The second step, a lemma in [2], synthesizes a surrogate martingale that is a degraded version of the original sub-martingale and whose stopping time upper bounds that of the original. The surrogate martingale guarantees that, upon entering the confirmation phase, its value sits exactly at the threshold rather than overshooting it, while still satisfying the constraints needed to guarantee the bound (4). An achievability bound on the expected blocklength for the surrogate process is constructed from (4) by replacing the overshoot terms accordingly. The new bound from the lemma in [2] is given by:

(7)

This bound also applies to the original process, since the blocklength of the surrogate process upper bounds that of the original. The bound (7) is lower because the surrogate's overshoot term is smaller. The improvement is more significant as the crossover probability approaches zero, where one of the constants remains bounded while the other grows to infinity. The rate lower bound is given by the ratio of the message size to the expected blocklength, where the expected blocklength is upper bounded by (7).
II-C Original Constraints that Ensure Yang’s Achievable Rate
Let $\mathcal{F}_t$ denote the $\sigma$-algebra generated by the sequence of received symbols up to time $t$. For each message $i$, define the log-likelihood-ratio process

$U_i(t) = \log_2 \frac{\rho_i(t)}{1 - \rho_i(t)}$ (8)

Yang et al. show that the SED encoder of Naghshvar et al. [10] guarantees that the following constraints (9)-(12) are met:

(9)
(10)
(11)
(12)

Meanwhile, Naghshvar et al. show that the SED encoder also satisfies the stricter condition that the average log-likelihood ratio defined in equation (13) is a submartingale satisfying equation (14), which is equivalent to (15):

(13)
(14)
(15)

The averaged process is a weighted average of the values $U_i(t)$, some of which increase and some of which decrease after the next transmission.

To derive the bound (4), Yang et al. split the decoding time into the communication-phase portion and the confirmation-phase portion, where the intermediate stopping time is defined by the first crossing into the confirmation phase. The first expectation is analyzed in [2] as a martingale stopping time and requires that, during the communication phase, the process be a strict submartingale satisfying the inequalities (9) and (10). The second expectation is analyzed using a Markov chain that exploits the larger, fixed-magnitude step size (12) and inequality (11). Since the intermediate stopping time marks only the first crossing into the confirmation phase, the Markov chain model needs to include the time that the message takes to return to the confirmation phase if it has fallen back to the communication phase.
III A New Bound and Relaxed Partitioning
In this section, we introduce relaxed conditions that are still sufficient to allow a sequential encoder over the BSC with full feedback to attain the performance of Yang's bound (7). Specifically, we replace the requirement in (9), which applies separately to each message, with a new requirement in (20) that applies to an average over all possible messages. For each individual message, we require in (16) only that each expected step size be bounded below by the same positive constant. The relaxed conditions are easier to enforce than (9), e.g., by the SEAD partitioning constraint introduced in Thm. 3.
III-A Relaxed Constraints that Also Guarantee Bound (4)
We begin with a theorem that introduces relaxed conditions and shows that they guarantee the performance (4), corresponding to the first step of Yang’s analysis.
Theorem 1.
Let $\tau$ be the stopping time of a sequential transmission system over the BSC. At each time $t$, let the posteriors be as defined in (2) and the log-likelihood ratios $U_i(t)$ be as defined in (8). Suppose that for all times $t$, for all received histories, and for each message $i$, the constraints (16)-(19) are satisfied:

(16)
(17)
(18)
(19)

Suppose further that for all times and histories the following condition is satisfied:

(20)

Then the expected stopping time is upper bounded by (21).

(21)

The proof is provided in Sec. IV-B.

In equation (20), the conditioning fixes the values that are functions of the received history. The inner expectation is then simply a constant, and we can also write (20) as the weighted sum of expectations:

(22)

Meanwhile, the value of $U_i(t+1)$ for each message is a random variable that takes on two possible values depending on the value of the next received symbol.
The sequential transmission process begins by randomly selecting a message from the message set. Using that selected message, at each time until the decoding process terminates, the process computes an input symbol, which induces a received symbol at the receiver. The original constraint (9) dictates that each $U_i(t)$ is a sub-martingale and allows for a bound on its value at any future time for any possible selected message. This is no longer the case with the new constraints in Thm. 1. While equation (16) of the new constraints makes the process a sub-martingale, it only guarantees a drift bounded below by a constant that could be any small positive value. The left side of equation (20) is a sum that includes all realizations of the message; it is a constraint, for each fixed time and each fixed history, that governs the behavior across the entire message set and does not define a sub-martingale. For this reason, the martingale analysis used by Naghshvar et al. in [10] and Yang et al. in [2] no longer applies.

A new analysis is needed to derive (21), the bound on the expected stopping time, using only the constraints of Thm. 1. This new analysis needs to exploit the fact that the expected stopping time averages over all messages, a property the original analysis does not use because it guarantees that the bound (7) holds for each individual message. Note, however, that the original constraint (9) does imply that the new constraints are satisfied, so the results we derive below also apply to the setting of Naghshvar et al. [10] and Yang et al. [2]. The new constraints allow for a much simpler encoder and decoder design. This simpler design motivates our new analysis, which forgoes the simplicity afforded by modeling the process as a martingale.

The new analysis seeks to accumulate the entire time that a message is not in its confirmation phase, i.e., the time during which the encoder is either in the communication phase or in some other message's confirmation phase. For each message, let one sequence of times mark when the confirmation phase for that message starts for the $j$th time (or the process terminates) and let another mark when the encoder exits the confirmation phase for that message for the $j$th time (or the process terminates). These times are defined recursively by (23) and (24):

(23)
(24)
Thus, the total time the process is not in its confirmation phase is given by:
(25) |
III-B A “Surrogate Process” that can Tighten the Bound
First, note that the bound (21) is loose compared to (7). It is loose because when the expectation is split into two parts that are analyzed separately, a sub-optimal factor is introduced in the expression for the communication-phase time. The sub-optimality derives from the overshoot term, the largest value the process can take at the start of the confirmation phase, which makes the bound large. However, this large overshoot is not needed to satisfy any of the constraints in Thm. 1. To overcome this sub-optimality, we use a surrogate process that is a degraded version of the original process, where the value at the start of the confirmation phase is bounded by a smaller constant. The surrogate process is a degradation in the sense that it always lies below the value of the original process.

Perhaps the utility of the surrogate process can be better understood through the following frog-race analogy, illustrated in Fig. 2. A frog traverses a race track of a given length, jumping from one point to the next. The distance traveled by the frog in a single jump is upper bounded by a maximum jump length. The jumps are not necessarily i.i.d., but we know that the expected length of each jump is lower bounded by a positive constant. It is also possible that the frog takes some jumps backwards. With only this information, we want to determine an upper bound on the average number of jumps the frog takes to reach the end of the track. This can be done using Doob's optional stopping theorem [22] to compute the upper bound as the maximum distance traveled from the origin to the last jump, divided by the lower bound on the average distance of a single jump.

Perhaps this bound can be improved. The final point is reached in a single jump, so it lies at most one maximum jump length beyond the finish line. If, for instance, the frog were restricted to only forward jumps, we could shrink this overshoot allowance, but the process actually can take steps backwards. Instead we exploit another property: the maximum step size is not needed to guarantee the lower bound on the average step size.
Suppose now that a surrogate frog participates in the race alongside the original, but with the following restrictions:

1. The surrogate and the original frog start in the same place and always jump at the same time.

2. The surrogate is never ahead of the original: when the original jumps forward, the surrogate jumps at most as far, and when the original jumps backwards, the surrogate jumps at least as far.

3. Moreover, the forward distance traveled by the surrogate frog in a single jump is upper bounded by a smaller maximum jump length.

4. Despite its slower progress, the surrogate frog still satisfies the property that the expected length of each jump is lower bounded by the same constant.

The average number of jumps taken by the surrogate is then upper bounded, also by Doob's optional stopping theorem, with the smaller maximum jump length now setting the overshoot allowance. Since the surrogate is never ahead of the original, the surrogate crossing the finish line implies that the original frog has crossed as well. Thus, the surrogate's bound is also an upper bound on the average number of jumps required for the original frog to cross the finish line. A numerical check of this argument follows.
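Here is a quick Monte Carlo sanity check of the frog-race bound with assumed (hypothetical) jump parameters, comparing the empirical average number of jumps against the Doob-style bound of track length plus maximum jump, divided by the minimum average jump:

```python
import random

L = 100.0              # track length (assumed)
fwd, back = 3.0, 1.0   # forward / backward jump sizes (assumed)
q = 0.6                # probability of a forward jump (assumed)
delta = q * fwd - (1 - q) * back   # lower bound on the mean jump length

def race(rng):
    """Number of jumps for the frog to reach the end of the track."""
    pos, jumps = 0.0, 0
    while pos < L:
        pos += fwd if rng.random() < q else -back
        jumps += 1
    return jumps

rng = random.Random(1)
trials = 5000
avg = sum(race(rng) for _ in range(trials)) / trials
print(avg, "<=", (L + fwd) / delta)   # empirical mean vs. Doob-style bound
```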
The equivalent of the surrogate frog is what we proceed to define in Thm. 2, where the track length, the jump bounds, and the average jump length play the analogous roles.
[Fig. 2: Illustration of the frog-race analogy for the surrogate process.]
Theorem 2 (Surrogate Process Theorem).
Let the surrogate process be a degraded version of the original process that still satisfies the constraints of Thm. 1. Initialize the surrogate process to the same starting value and reset it at each time the encoder exits a confirmation phase round for the corresponding message. Suppose that, for some constant, the surrogate process also satisfies the following constraints:

(26)
(27)
(28)

Then the total time the surrogate process is not in its confirmation phase is given by the corresponding sum, and is bounded by:

(29)

Note that the surrogate never exceeds the original process, by the definition of the resets and constraint (26). Also note that after the process terminates at the stopping time, the entry and exit times coincide, which makes their difference zero. Then the communication-phase times are each a sum of finitely many non-zero values.

The proof is provided in Sec. V-B.
III-C Relaxed Constraints that Achieve a Tighter Bound
The following theorem introduces partitioning constraints that guarantee that the constraints in Thm. 2 are satisfied by the surrogate process. The new constraints are looser than the original SED constraint and therefore are satisfied by an encoder that enforces the SED constraint. Using a new analysis, we show that such an encoder guarantees an achievability bound tighter than the bound (7) obtained in the second step of Yang's analysis described in Sec. II. The chosen constant is the lowest possible value that satisfies the constraints (26)-(28) of Thm. 2 for a system that enforces the original SED constraint (3). The new achievability bound applies to an encoder that satisfies the new relaxed constraints as well as to one that satisfies the SED constraint.
Theorem 3.
Consider sequential transmission over the BSC with noiseless feedback as described in Sec. I with an encoder that enforces the small-enough-absolute-difference (SEAD) encoding constraints, equations (30) and (31) below:

(30)
(31)

Then the constraints (19)-(20) in Thm. 1 are satisfied, and a surrogate process as described in Thm. 2 can be constructed. The resulting upper bound on the expected stopping time is given by:

(32)

which is lower than (7) from [2]. Note that meeting the SEAD constraints guarantees that both sets $S_0$ and $S_1$ are non-empty. This is because if either set were empty, the other would be the whole message set and the difference in (30) would be $1$, which is greater than any posterior in a message set with more than one element.

The proof is provided in Sec. V-B.

The requirement in (31) is needed to satisfy constraint (19) and guarantees constraint (18). This requirement is also enforced by the SED partitioning constraints in [10] and [2].

The SEAD partitioning constraint is satisfied whenever the SED constraint (3) is. However, the SEAD partitioning constraint allows constructions of $S_0$ and $S_1$ that do not meet either of the SED constraints in [10] and [2], and it is therefore looser. In particular, SEAD partitioning allows the case where the total posterior of $S_0$ is smaller than that of $S_1$, which often arises in the implementation shown in Sec. VII-A. This case is not allowed under either of the SED constraints because they both demand that:

$P(S_0 \mid Y^t) \ge P(S_1 \mid Y^t)$ (33)
IV Supporting Lemmas and Proof of Theorem 1
This section presents the supporting lemmas and the full proof of Thm. 1. Let the communication time of the transmitted message be the total time it spends in the communication phase or in an incorrect confirmation phase, as defined in equation (25). Note that this definition is different from the intermediate stopping time used in [2] and described in Sec. II. The proof of Thm. 1 consists of bounding the communication-phase and confirmation-phase expectations by expressions that derive from the constraints (16)-(20).

Since the posterior threshold corresponds to a threshold on the log-likelihood ratio, the stopping rule described in Sec. II can be expressed in terms of the process (8). To prove Thm. 1, we will instead use the stopping rule introduced by Yang et al. [2], defined by:

(34)

where the threshold is rounded up accordingly. This rule models the confirmation phase as a fixed Markov chain with a fixed number of states. Because of the rounding, the stopping time under the new rule is larger than or equal to that of the original rule without the ceiling, as explained in [2].
IV-A Five Helping Lemmas to Aid the Proof of Thms. 1-3
The expression that bounds the communication-time expectation is constructed via five inequalities (or equalities), each of which derives from one of the following five lemmas. The proofs of the lemmas are provided in Sec. V-A.
Lemma 1.
Let the total time the transmitted message spends in the communication phase (or an incorrect confirmation phase) and the confirmation-phase entry and exit times be as defined in (23)-(25). Define the set

(35)

then:

(36)

Note that the first round time is the time before entering the correct confirmation phase for the first time, that is, the time spent in the communication phase (or an incorrect confirmation phase) before the posterior of the transmitted message ever crosses the confirmation threshold. If the decoder stops (in error) before ever entering the correct confirmation phase, then it is the time until the decoder stops. For later rounds, each round time is the time between falling back from the correct confirmation phase and either stopping (in error) or reentering the correct confirmation phase. Thus, the total communication time of the transmitted message is also given by the sum of the round times. Also note that if the decoder stops before entering the correct confirmation phase for a given round, then all subsequent round times are zero.
Lemma 3.

Lemma 4.

Let the decoding threshold and decoding rule be as in (34). Define the fallback probability as the probability, computed at the start of a confirmation phase, that a subsequent round of the communication phase occurs. Then this fallback probability is a constant, independent of the message and of the number of previous confirmation phase rounds, and is given by:
(40) |
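The constant in (40) is a gambler's-ruin absorption probability. As an illustration, under the assumption that the confirmation phase behaves as a $\pm 1$ random walk that moves up with probability $1-p$, falls back upon dropping one step below its start, and decodes upon climbing a fixed number of steps, Doob's optional stopping theorem gives a closed form that a simulation can confirm (all names here are our own):

```python
import random

def fallback_prob(p, m):
    """P(hit -1 before +m) for a +/-1 walk that steps up w.p. 1-p (p < 1/2).
    Derived from Doob's optional stopping theorem applied to the
    martingale s**S_n with s = p / (1 - p)."""
    s = p / (1.0 - p)
    return (1.0 - s ** m) / (s ** -1 - s ** m)

def simulate(p, m, trials=200000, rng=random.Random(2)):
    falls = 0
    for _ in range(trials):
        pos = 0
        while -1 < pos < m:
            pos += 1 if rng.random() > p else -1
        falls += (pos == -1)
    return falls / trials

# fallback_prob(0.05, 5) and simulate(0.05, 5) agree to a few decimal
# places, and the value does not depend on which message is confirmed.
```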
IV-B Proof of Thm. 1 Using Lemmas 1-5:
Proof:
Using Lemmas 1 and 2, the expectation is bounded as follows:
(43)
(44)
By Lemma 3, expression (44) is equal to the left side of inequality (41), which is bounded by (45) according to Lemma 5:
(45) |
where the subtracted term is the expected value of the log-likelihood ratio of the true message according to the a priori message distribution, i.e., from the perspective of the receiver before any symbols have been received. This value is a known constant for a uniform a priori input distribution. Then, equations (43)-(45) yield the following bound on the communication-time expectation:
(46) |
A bound on the confirmation-time expectation can be obtained using the Markov analysis in [2, Sec. V-F]. However, our analysis of the communication time already accounts for all time spent in the communication phase, including the additional communication phases that occur after the system falls back from the confirmation phase. Accordingly, we reduce the self-loop weight of the Markov chain in [2, Sec. V-F]. The resulting bound is given by:
(47) |
The inequality in (47) is not an equality because, in our analysis, the transmission ends if any message other than the transmitted message attains the decoding threshold, whereas in [2] the transmission only terminates when the transmitted message attains it. The upper bound on the expected stopping time is obtained by adding the bounds in equations (46) and (47) and replacing the threshold by its definition in equation (34):
(48) |
Note that the resulting terms can be consolidated, as in [2], into a more compact upper bound expression.
∎
V Proof of Lemmas 1-5, Thm. 2, and Thm. 3
Before proceeding to prove Lemmas 1-5, we introduce a claim that will aid in some of the proofs.
Claim 1.
For the communication scheme described in Sec. II, the following are equivalent:

(i)

(ii) or

This claim implies that for constraint (19) to hold, the set containing an item whose posterior is at least $1/2$ must be a singleton.
Proof:
See Appendix A. ∎
V-A Proof of Lemmas 1-5
Proof:
We begin by defining sets that are used in the proof. First we define the set of received sequences of a given length for which the process has not stopped:

(49)

and let its complement be defined accordingly.

For each such sequence, consider the set of time values at which message $i$ begins an interval outside its confirmation phase. This includes time zero and all times at which the message transitions out of its confirmation phase, i.e., the decoder falls back from the confirmation phase to the communication phase.

(50)

Now we define the set of sequences for which the following are all true: 1) the decoder has not stopped, 2) the decoder has entered the confirmation phase for message $i$ a given number of times, and 3) the decoder is not in the confirmation phase for message $i$ at the time where the sequence ends.

(51)

For each such sequence, consider the concatenation of the corresponding strings. Now we define the subset of sequences that have a given sequence as a prefix.

(52)

As our final definition, let the set contain only the sequences where the final received symbol is the one for which the decoder resumes the communication phase for message $i$ for the $j$th time, or the empty string, that is:

(53)

Each such prefix sets an initial condition for the communication phase, so that the corresponding exit time is of the form defined in (24). By the tower property of conditional expectation, the expectation of the round time is given by:

(54)

Now we explicitly write this expression as a function of all the possible initial conditions for each of the communication phase rounds:

(55)
(56)
(57)
Now we write the last expectation (57) using the tail-sum formula for expectations in (58) and then as an expectation of an indicator in (59). Then, since the summand is a random function of the received history, (60) follows:

(58)
(59)
(60)

Expanding the expectation in (60), we obtain (61). Since the indicator in (61) appears both outside and inside, it is omitted in (62), where we have only considered values that intersect the conditioning event. Then (63) follows:

(61)
(62)
(63)

The product of the conditional probabilities in (57) and in (63) collapses into a single conditional probability. Replacing the expectation in (57) by (63), the inner-most sum in (57) becomes (64). The summation in (64) is over the prefixes of each sequence in the set, which is equivalent to the sum over their union, and (65) follows:

(64)
(65)

We can now rewrite (55) by replacing the expectation in (56) by (65) to obtain (66). In (67) the two summations are consolidated into a single sum over the union of all the prefix sets:

(66)
(67)

To conclude the proof, note that this union is the set defined in the statement of Lemma 1. ∎
Proof:
Define the set by:

(68)

where the threshold does not depend on the summation index. Let the set be partitioned into two parts. Then, we can split the sum on the left side of (69), which is the left side of (38) in Lemma 2, into a sum over the first part, on the right side of (69), and a sum over the remaining sets, expression (70), as follows:

(69)
(70)

For the sequences in the second part, constraint (19) and Claim 1 apply (see the proof of Thm. 3 for the definition of the relevant quantity), and by equation (126) the required bound holds for all such terms. Note that in this case the SED constraint (3) is satisfied. It suffices to show the bound also holds for (69). The product of conditional probabilities in (69) can be factored, and since one factor does not depend on the inner index, the summation order in (69) can be reversed to obtain:

(71)

Using the definition of the conditional probability in (71), (72) follows. In (73) the factor is moved inside the expectation to obtain the form in constraint (20) of Thm. 1:

(72)
(73)
(74)

Constraint (20) dictates that (73) is lower bounded, and (75) follows. Then we multiply through to produce (76). This is used to obtain (77) and then (78):

(75)
(76)
(77)
(78)

In both (69) and (70), this replacement provides an upper bound on the original expression. Combining the two upper bounds we recover the lemma. ∎
Proof:
We start by writing, on the left side of (79), the sum from the left side of equation (39) of Lemma 3, using an equivalent form for the defining set; this equivalent form was also used in the proof of Lemma 1. Then in (79) we break it into two summations:

(79)

The set is a subset of the non-stopped sequences and therefore can be expressed as a union of intersections. We use this new form to rewrite the inner sum in (79) as the left side of (80). Then, we remove the intersections by using the corresponding indicator on the right side of (80):

(80)

Recall the definition from (37). Also recall from (8) that the log-likelihood ratio is a random function of the received history. We then expand the relevant term as:

(81)

The product of the probabilities in (80) and (81) collapses into a single probability. Replacing in (80) using (81), we obtain the left side of (82). The equality in (82) follows by the definition of expectation:

(82)

We expand the term using its definition to write (82) as the left side of (83) and use linearity of expectation to obtain the equality in (83). The indicator is zero before the entry time and after the exit time, and is one in between. Accordingly, in (84) we adjust the limits of summation and remove the indicator function. Note that the entry and exit times are themselves random variables. Lastly, observe that (84) is a telescoping sum that is replaced by its end points in (85):

(83)
(84)
(85)

To conclude the proof, we replace the inner-most summation in (79) with (85):

(86)
∎
Proof:
The confirmation phase starts at a time of the form defined in (23), at which the transmitted message attains the confirmation threshold. Then, like the product martingale in [23], the associated exponential process is a martingale with respect to the received-symbol filtration, where:

(87)

Note that the confirmation-phase log-likelihood ratio is a biased random walk (see the Markov chain in [2]) whose steps are distributed according to:

(88)

We verify the martingale property as follows:

(89)

Let the confirmation round end at the time at which decoding either terminates or a fallback occurs. Then the process is a two-sided bounded martingale and:

(90)
(91)

By Doob's optional stopping theorem [22], the expectations at the start and end of the round are equal. Letting the fallback probability be the unknown, we can solve for it using equations (90) and (91) by setting both right sides equal. In (92) we factor and cancel common terms. In (93) we collect the terms with the common factor, and in (94) we solve for the fallback probability:

(92)
(93)
(94)

Since the fallback probability is just a function of the crossover probability and the threshold, it is the same constant across all messages and round indices. To complete the proof we use the definition from equation (5) and express the result accordingly:

(95)
∎
Proof:
We start by conditioning the expectation on the left side of equation (41) in Lemma 5 on events whose union is the original conditioning event, to express the original conditional probability using Bayes' rule:

(96)

Note that the quantity of interest is non-negative; thus the last term on the right side of (96) vanishes, and the second term on the right side of (96) also vanishes, leaving only the first term. Let the auxiliary event be that message $i$ enters confirmation after the given time, rather than another message ending the process by attaining the threshold. Then, the probability can be expressed as:

(97)

Note that the first probability on the right side of (97) is just the fallback probability computed in Lemma 4. The last probability in (97) can also be expressed as a product of conditional probabilities; see (98). In (98), note that the occurrence of a confirmation phase round implies that a preceding communication phase round occurs, and the first factor in the product of probabilities in (98) vanishes:

(98)

Combining (97) and (98) we obtain:

(99)

We can also bound the remaining probability as follows:

(100)

Then, we can recursively bound the probability using (99) and (100). This results in the general bound:

(101)

Using (101) we can bound the expectation on the left side of (96) by:

(102)

The value at time zero is a constant that can be directly computed for every message from the initial distribution. Using (102), we can then bound the second sum on the left side of equation (41) by:

(103)
(104)
(105)
The conditioning event implies the earlier events because, once the process has stopped, no further communication rounds occur. The event also implies that the corresponding round of communication occurs, and therefore the starting value is given by the previous crossing value minus the step guaranteed by constraint (19); then:

(106)
(107)

The inequality in (107) follows from the following inequality:

(108)

For the proof of inequality (108) see Appendix D. From (107) it follows that:

(109)
(110)

In (110) we have factored a term inside the expectation. We can now replace (105) by (110) to upper bound (103):

(111)
(112)

The expectation in (112) combines the two sums from (111) by subtracting one from the other. The first term is the value of the process at the communication-phase stopping time. In the second term, the difference is the unique value that the process can take once the corresponding confirmation phase round starts. Equation (112) is an important intermediate result in the proof of Thm. 2. This is because when considering the surrogate process, the starting value of each communication-phase round is still that of the original process, and therefore the argument of the expectation would change accordingly. For the proof of Lemma 5, we just need to bound (112), so we write the sum in (112) as:

(113)

By constraint (17), the step magnitudes are bounded, and the expectation is bounded accordingly. Then (113) is bounded by:

(114)

Finally, the left side of equation (41) in Lemma 5 (the left side of (115)) is upper bounded using the bounds (112) and (114) on the inner sum (109) as follows:

(115)
(116)

To transition from (115) to (116) we have used the definition of the fallback probability from Lemma 4. The proof of Lemma 5 is complete. ∎
V-B Proof of Theorems 2 and 3
Proof:
Suppose the surrogate is a process that satisfies the constraints (16)-(20) of Thm. 1 and the constraints (26)-(28) of Thm. 2. Because the constraints of Thm. 1 are satisfied, Lemmas 1-5 all hold for the surrogate process. To bound its communication time, we begin by bounding the sum on the right side of Lemma 3, which is (39), but using the new process. Dividing the new bound by the drift constant produces the desired result. We follow the procedure in the proof of Lemma 5, but replacing the original process by the surrogate, up to equation (105). By the definition of the surrogate and equation (19), the corresponding implications hold, so from equation (105) to (112) we use the surrogate values instead. Using equation (112) we have that:

(117)

We can replace the starting values using the definition of the surrogate. We further claim that the constraints of Thm. 2 guarantee the required bound. This follows from constraint (28) by an appropriate substitution, which is possible because of (94) and constraint (17). Therefore, the expectation in the last sum can be replaced with a constant for an upper bound, to obtain:

(118)

Then, the value in equation (118) replaces the inner sum on the left side of (45) to obtain:

(119)
The proof is complete. ∎
Proof:
When a message attains a posterior of at least $1/2$, constraint (31) is the same as the SED constraint (3), and therefore constraints (19) and (18) are satisfied as shown in [2]. It remains to show that constraints (16), (17), and (20) are also satisfied. We start the proof by deriving expressions for the log-likelihood ratios to find bounds in terms of the constraints of the theorem. The posterior probabilities are computed according to Bayes' rule:

(120)

The top conditional probability in equation (120) can be split in two. Since the received history fully characterizes the vector of posterior probabilities, as well as the construction of $S_0$ and $S_1$, the conditioning event sets the value of the encoding function via its definition. We can write the first probability as the channel transition probability, which reduces to $1-p$ when the transmitted and received symbols agree and to $p$ otherwise. The second probability is just the current posterior. The bottom conditional probability can be written similarly; by the memoryless property of the channel, the next output given the input is independent of the past. We therefore write:

(121)

The encoding function dictates the symbol each message would transmit, so each posterior is scaled by one of two weights according to its partition assignment, and the updated log-likelihood ratio for each case can be obtained from equation (121). Assume first one value of the received symbol to obtain the update; for the other value the only difference is the sign of one term. Adding a sign coefficient for a general expression, multiply both terms of the fraction inside the logarithm by a common factor and expand it to obtain:

(122)
(123)
(124)

Now subtract a term and add it back as a factor in the logarithm to recover the desired form. The logarithm from (122) is split in two, and the common factor divides the arguments of the logarithms in (123). Equation (124) follows from applying Jensen's inequality to the convex function. Then:

(125)
(126)

By the SEAD constraints, equations (30) and (31), the arguments of the logarithms in (126) are both suitably bounded for every message. This suffices to show that constraints (20) and (16) are satisfied in this first case.
It remains to prove that constraints (20) and (16) hold in the complementary case. Note that:

(127)

and therefore:

Since this holds for all messages, then:

(128)
(129)

To satisfy constraint (20) we only need the logarithm term in (129) to be non-negative, which requires a bound on the difference of the set posteriors. It suffices that this difference satisfy the following, which is equivalent to:

(130)

The SEAD constraints, equations (30) and (31), guarantee the required bound, which satisfies inequality (130). Then the SEAD constraints guarantee that constraint (20) is satisfied while restricting only the absolute difference between the total posteriors of $S_0$ and $S_1$.

To prove that constraint (16) is satisfied, note that equation (30) of the SEAD constraints guarantees a bound on the difference of the set posteriors. Starting from equation (124), note the worst-case scenario. Using (127) to go from (131) to (132), we obtain:

(131)
(132)
(133)
(134)
(135)

To transition from (134) to (135) we need to show that the remaining term is non-negative. For this we find a small constant that makes the difference between two convex functions also convex. Take the second derivatives and subtract them; the constant is found by equating them at the boundary.

The SEAD constraints guarantee that both sets $S_0$ and $S_1$ are non-empty. Then, since the maximum absolute-value difference is bounded, constraint (17) is satisfied; see the proof of Claim 1.

For the proof of existence of a surrogate process with the required constant, see Appendix B. ∎
VI Extension to Arbitrary Initial Distributions
The proof of Thm. 1 only used the uniform input distribution to assert the initial value of the log-likelihood ratio and substitute it in equation (46). In Lemma 5, we required an upper bound on the initial value that does not hold for every initial distribution. To avoid this requirement, the case where the initial value exceeds the bound needs to be accounted for. Also, in that case, the probability that an initial fallback occurs is only upper bounded by the fallback probability, which can be inferred from the proof of Lemma 4. Then, to obtain an upper bound on the expected stopping time for an arbitrary input distribution, it suffices to multiply the terms in the proof of Lemma 5, equation (104), by the corresponding indicator. Then, the bound of Lemma 5 becomes:

(136)

By Thm. 3, we can use the tighter constant in (136). Using the definition of the fallback probability from (40) we obtain the bound:

(137)
VI-A Generalized Achievability Bound
An upper bound on the expected stopping time for an arbitrary initial distribution is then obtained by combining this bound (137) with the bound on the confirmation time from equation (47):

(138)

For the relevant special case, the log-likelihood ratio can be approximated to obtain a simpler expression of the bound (138):

(139)

where the remaining quantity is the entropy of the p.d.f. in bits.
VI-B Uniform and Binomial Initial Distribution
Using the bound of equation (138), we can develop a better upper bound on the blocklength for a systematic encoder with a uniform input distribution. It can be shown that the systematic transmissions transform the uniform distribution into a binomial distribution; see [1]. The bound is constructed by adding the systematic transmissions to the bound in (138) applied to the binomial distribution as follows:

(140)

This bound, which assumes SEAD and systematic transmission, is the tightest achievability bound that we have developed for this model.
VII Algorithm and Implementation
In this section we introduce a systematic posterior matching (SPM) algorithm with partitioning by thresholding of ordered posteriors (TOP), which we call SPM-TOP. The SPM-TOP algorithm guarantees the performance of bound (32) of Thm. 3 because both systematic encoding and partitioning via TOP enforce the SEAD partitioning constraints in equations (30) and (31). The SPM-TOP algorithm also guarantees the performance of the bound (140) because it is a systematic algorithm.
VII-A Partitioning by Thresholding of Ordered Posteriors (TOP)
The TOP rule is a simple method to construct $S_0$ and $S_1$ at any time from the vector of posteriors, and it enforces the SEAD partitioning constraints of Thm. 3. The rule requires an ordering of the vector of posteriors such that $\rho_{[1]} \ge \rho_{[2]} \ge \cdots$. TOP builds $S_0$ and $S_1$ by finding a threshold that splits the ordered vector into two contiguous segments. To determine the threshold position, the rule first determines an index $i^*$ such that:

$\sum_{j=1}^{i^*-1} \rho_{[j]} < \frac{1}{2} \le \sum_{j=1}^{i^*} \rho_{[j]}$ (141)

Once $i^*$ is found, the rule must select between two possible alternatives: either the threshold falls immediately before item $i^*$ or immediately after it. In other words, all that remains to decide is whether to place item $i^*$ in $S_0$ or in $S_1$. We select the choice that guarantees that the absolute difference between the total posteriors of $S_0$ and $S_1$ is no larger than the posterior of item $i^*$. The threshold is selected as follows:

(142)
(143)

Note that, because the posteriors are ordered, the posterior of item $i^*$ is no larger than that of any item before it.
Claim 2.
Proof:
The TOP partitioning rule sets the threshold that separates $S_0$ and $S_1$ exactly before or exactly after item $i^*$, depending on whether case (142) or case (143) occurs. To show that the TOP rule guarantees that the SEAD constraints in Thm. 3 are satisfied, note that the threshold lies before item $i^*$ if case (142) occurs. Then, by the first inequality of (141) and by (142):

(144)
(145)

When case (143) occurs, the threshold is set after item $i^*$. Then, by the second inequality of (141) and by (143):

(146)
(147)

In either case we have:

(148)

Scaling equation (148) and subtracting the total probability yields the SEAD bound on the absolute difference of the set posteriors. This concludes the proof. ∎

We have shown that the construction of $S_0$ and $S_1$ can be as simple as finding the threshold item at which the c.d.f. induced by the ordered vector of posteriors crosses $1/2$, then determining whether the threshold should be placed before or after that item, and finally allocating all items before the threshold to $S_0$ and all items after the threshold to $S_1$. A sketch of this procedure follows.
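The following sketch implements this threshold search on an already-ordered posterior vector; the function name and the tie-breaking between the two alternatives (choosing whichever side yields the smaller absolute difference, which is at most the posterior of the threshold item) are our own illustration:

```python
def top_partition(rho):
    """TOP rule sketch. rho: posteriors sorted in decreasing order, sum 1.
    Returns (S0, S1) as index lists with |P(S0) - P(S1)| <= rho[i_star]."""
    csum = 0.0
    for i_star, r in enumerate(rho):
        csum += r
        if csum >= 0.5:          # first index where the c.d.f. crosses 1/2
            break
    before = csum - rho[i_star]  # probability mass strictly before i_star
    # Place item i_star on the side that minimizes |P(S0) - P(S1)|; the two
    # candidate differences sum to 2 * rho[i_star], so the minimum is at
    # most rho[i_star], which is the SEAD requirement.
    if abs(2.0 * csum - 1.0) <= abs(2.0 * before - 1.0):
        split = i_star + 1       # item i_star joins S0
    else:
        split = i_star           # item i_star joins S1
    return list(range(split)), list(range(split, len(rho)))
```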
VII-B The Systematic Posterior Matching Algorithm
The SPM-TOP algorithm is a low-complexity scheme that implements sequential transmission over the BSC with noiseless feedback for a source message sampled from a uniform distribution. The algorithm has the usual communication and confirmation phases, and the communication phase starts with systematic transmission. The systematic transmissions of the communication phase are treated as a separate systematic phase, for a total of three phases that we proceed to describe in detail.
VII-C Systematic phase
Let the sampled message have bits $b_1, \dots, b_K$. For the first $K$ transmissions the message bits are transmitted directly and a noisy version of them is received. After the $K$-th transmission, both transmitter and receiver initialize a list of groups, where each group is a tuple containing: the count of messages in the group; the index of the first message in the group; the Hamming distance shared between the received word and every message in the group; and the group's shared posterior. The groups are sorted in order of decreasing probability, equivalent to increasing Hamming distance, since $p < 1/2$. At the end of the systematic phase, the transmitter finds the index of the group and the index within the group corresponding to the sampled message. The group index is given by the Hamming distance between the transmitted message and the received word, and the index within the group is found via Algorithm 5. The systematic phase is described by Algorithm 1.

[Algorithm 1: The systematic phase.]
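A sketch of this initialization, assuming Python and the group-tuple layout just described (count, first index, distance, shared posterior), shows why the posteriors need no explicit normalization under a uniform prior:

```python
from math import comb

def init_groups(K, p):
    """Groups after the K systematic transmissions of a K-bit message.
    All comb(K, d) messages at Hamming distance d from the received word
    share the posterior p**d * (1-p)**(K-d); these weights already sum
    to 1, so no normalization is needed."""
    groups = []
    first = 0
    for d in range(K + 1):          # increasing d means decreasing posterior
        count = comb(K, d)
        shared_posterior = (p ** d) * ((1.0 - p) ** (K - d))
        groups.append((count, first, d, shared_posterior))
        first += count
    return groups

# e.g. sum(c * q for (c, _, _, q) in init_groups(10, 0.05)) equals 1.0
# up to floating point, so the list is a valid posterior distribution.
```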
VII-D Communication Phase
The communication phase consists of the transmissions after the systematic phase, while all posteriors remain below $1/2$. During the communication phase, the transmitter attempts to boost the posterior of the transmitted message past the threshold $1/2$, though any other message could cross the threshold instead due to channel errors.

The list of groups initialized in the systematic phase is kept ordered by decreasing common posterior. The list of groups is partitioned into $S_0$ and $S_1$ before each transmission using rule (141). For this, the group that contains the threshold item is found first; then all groups before it are assigned to $S_0$ and all groups after it are assigned to $S_1$. To assign the threshold group itself, the index of the item that sets the threshold is determined within the group. The TOP rule demands that all items up to the threshold be assigned to $S_0$ and all items past it to $S_1$. For this we split the group in two by creating a new group with the segment of items past the threshold that belongs in $S_1$. However, if the item that sets the threshold is the last item in its group, then the entire group belongs in $S_0$ and no splitting is required.

After each transmission, the posterior probabilities of the groups are updated using the received symbol according to equation (120). Each posterior is multiplied by a weight update, computed using equation (121), according to its assignment to $S_0$ or $S_1$. Then, the lists that comprise $S_0$ and $S_1$ are merged into a single sorted list. The process repeats for the next transmission, and the communication phase is interrupted to transition to the confirmation phase when the posterior of a candidate message crosses the threshold. The communication phase might resume if the posterior of the message that triggered the confirmation falls back below $1/2$ rather than increasing past the decoding threshold, while all other posteriors are still below $1/2$.

Not all groups need to be updated at every transmission. The partitioning method only requires visiting the groups up to the threshold group. After the symbol is received, we need to combine $S_0$ and $S_1$ into a single list, sorted by decreasing order of the updated posteriors. Figure 3 shows the three operations that are executed during the communication phase: partitioning the list, updating the posteriors of the partitions, and combining the updated partitions into a single sorted list. Note that in both partitions, either the entirety or a tail segment of the partition is simply appended at the end of the new sorted list. This segment starts at the first group whose posterior is smaller than the smallest posterior in the other partition. We avoid explicit operations on these segments and only save the "weight" update factor as another item in the group tuple described at the beginning of this section. Every subsequent item in the list shares this update coefficient and can be updated later if it is encountered while partitioning the list, updating the posteriors, or combining the partitions. If this happens, the "weight" update is simply "pushed" to the next list item. Since most of the groups belong to this "tail" segment, we expect to perform explicit operations for only a small fraction of the groups. This results in a large complexity reduction, which is validated by the simulation data of Figure 6. A sketch of this lazy update follows.
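As a minimal sketch of the pushed-weight mechanism, assuming the group tuple is extended with a pending multiplicative weight (the field layout and function name are our own):

```python
def materialize(groups, i):
    """Apply group i's pending weight to its posterior and push the weight
    onto the next group, so untouched tail groups are never visited.
    Each group is (count, first_index, distance, posterior, pending_w)."""
    count, first, d, post, w = groups[i]
    groups[i] = (count, first, d, post * w, 1.0)   # weight now applied
    if w != 1.0 and i + 1 < len(groups):
        c2, f2, d2, p2, w2 = groups[i + 1]
        groups[i + 1] = (c2, f2, d2, p2, w2 * w)   # defer for the tail
```

Only groups that the partition, update, or merge steps actually touch call this routine, so a shared factor for the entire tail is carried by a single list entry.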
VII-E Confirmation Phase
The confirmation phase is triggered when a candidate message attains a posterior of at least $1/2$. During this phase the transmitter attempts to boost the posterior of the candidate past the decoding threshold if it is the true message. Otherwise, it attempts to drive the candidate's posterior back below $1/2$. Clearly, the randomness of the channel could allow the posterior to grow past the decoding threshold even for a wrong message, resulting in a decoding error. Alternatively, the right message could still fall back to the communication phase, also due to channel errors. The confirmation phase lasts for as long as the posterior of the message that triggered it stays between $1/2$ and the decoding threshold.

There are no partitioning, update, or combining operations during the confirmation phase. If a message is in its confirmation phase, then the partitioning simply places that message alone in $S_0$ and every other message in $S_1$. A single update is executed if a fallback occurs, restoring the list of groups to the state it had when the confirmation round started. This is because every negative update that follows a positive update returns every posterior to its earlier state, as summarized in Claim 3 below. During the confirmation phase it suffices to check whether the candidate's posterior has crossed the decoding threshold, in which case the process terminates, or has fallen below $1/2$, in which case a fallback occurs.

Claim 3 (Confirmation Phase is a Discrete Markov Chain).

Suppose the partitioning at two consecutive times is the same, with the candidate message alone in $S_0$, and that a positive update is followed by a negative update. Then every posterior returns to its value before the two updates.

Proof:

See Appendix C. ∎

During the confirmation phase we only need to count the difference between boosting updates and attenuating updates. Since the candidate's log-likelihood ratio changes in steps of fixed magnitude, there is a unique number of net boosting updates needed to reach the decoding threshold. Starting from the beginning of the confirmation round, any received symbol that reinforces the candidate is a boosting update and any other symbol is an attenuating update. Let the counter be the difference between boosting and attenuating updates. The transmission terminates the first time the counter reaches its target; however, a fallback occurs if the counter ever goes negative before reaching the target. The target can be computed directly at the start of the confirmation round. Once it is computed, all that remains is to track the counter, return to the communication phase if it goes negative, or terminate the process if it reaches the target. A sketch of this counter follows.
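Here is a sketch of the confirmation-phase loop, with a hypothetical iterator of received symbols already mapped to boosting (1) or attenuating (0) updates:

```python
def confirmation_phase(updates, target):
    """Track the net count of boosting minus attenuating updates.
    Returns 'decode' when the count reaches target, or 'fallback' if it
    ever drops below zero (posterior back under 1/2). updates: iterable
    of 1 (boosting) or 0 (attenuating) per received symbol."""
    net = 0
    for boost in updates:
        net += 1 if boost else -1
        if net < 0:
            return "fallback"     # resume the communication phase
        if net == target:
            return "decode"       # posterior crossed the threshold
    return "fallback"             # exhausted input (sketch convention)
```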
Theorem 4.
[From [1]] Suppose the prior distribution is uniform. Then the partitioning rule results in systematic transmission of the message bits during the first $K$ transmissions and achieves exactly equal partitioning.

Proof:

First note that for each bit position, exactly half of the messages have bit value $0$ and the other half have bit value $1$. The theorem holds for the first transmission, since the partitioning results in half the messages in each partition and all the messages have the same prior. For the subsequent systematic transmissions, note that the partitioning only considers the bits transmitted so far. Thus, all messages that share a prefix sequence have shared the same partition at all earlier times, and therefore share the same posterior; there are exactly as many distinct posteriors as prefixes. Also, exactly half of the messages that share a given prefix have next bit $0$ and are assigned to $S_0$, and the other half have next bit $1$ and are assigned to $S_1$. Then $S_0$ and $S_1$ will each hold half the messages of each posterior group at each subsequent time, and therefore equal partitioning holds throughout the systematic phase. ∎
VII-F Complexity of the SPM-TOP Algorithm
The memory complexity of the SPM-TOP algorithm is of order because we use a triangular array of all combinations of the form . The algorithm itself stores a list of groups that grow linearly with , since the list size is bounded by the decoding time .
The time complexity of the SPM-TOP algorithm is of order . To obtain this result note that the total number of items that the system tracks is bounded by the transmission index . At each transmission , partitioning, update and combine operations require visiting every item at most once. Furthermore, because of the complexity reduction described in Sec. VII-D, the system executes operations for only a fraction of all the items that are stored. The time complexity at each transmission is then of order , with a small constant coefficient. The number of transmissions required is approximately as the scheme approaches capacity. A linear number of transmissions, each of which requires a linear number of operations, results in an overall quadratic complexity, that is, order , for fixed channel capacity .
The systematic transmissions only require storing the bits, and in the confirmation phase we just add each received symbol to a running sum, so both of those phases are linear in complexity. Therefore, the quadratic cost arises only from the communication phase.
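The quadratic total is just a triangular sum. The toy count below illustrates the order estimate under the assumption that transmission t visits at most t stored items once each; it is an illustration of the argument, not a measured operation count.

```python
def communication_phase_ops(n):
    """If transmission t performs O(t) partition/update/combine visits,
    the total over n transmissions is n(n+1)/2, i.e., order n**2."""
    return sum(t for t in range(1, n + 1))
```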
VIII SPM-TOP Simulation Results

We validate the theoretical achievability bounds in Sec. III and Sec. VI via simulations of the SPM-TOP algorithm. Figure 4 shows simulated rate vs. blocklength performance curves of the SPM-TOP algorithm and the corresponding frame error rate (FER). The plots show simulated rate curves for the standard SPM-TOP algorithm and a randomized version, as well as their associated error rates. The rate for the standard SPM-TOP algorithm is shown by the red curve with dots at the top, and the corresponding FER is the red, jagged curve at the bottom. The standard SPM-TOP algorithm stops when a message attains . Note that the FER is well below the threshold . The randomized version of the SPM-TOP algorithm implements a stopping rule that randomly alternates between the standard rule, which is stopping when a message attains , and stopping when a message attains , which requires one less correctly received transmission. The simulated rate of the randomized SPM-TOP version is the purple curve above the standard version. This randomized version aims to obtain a higher rate by forcing the FER to be close to the threshold rather than upper bounded by it, as sketched below. Note that the corresponding FER, the horizontal purple curve with dots, is very close to the threshold , but not necessarily bounded above by it. The simulation consisted of trials for each value of and for a decoding threshold and a channel with capacity . The simulated rate curves attain an average rate that approaches capacity rapidly and stays above all the theoretical bounds, also shown in Figure 4, which we describe next.
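A hedged sketch of the randomized stopping rule follows. The 1/2 mixing probability and the exact form of the relaxed threshold (one boosting step, of magnitude log2((1-p)/p), below the standard log-odds threshold) are our assumptions for illustration, not the paper's specification.

```python
import math
import random

def stopping_threshold(p, eps, randomize, rng=random):
    """Return the log-odds stopping threshold for one message. With
    randomization, alternate between the standard threshold and one
    boosting step below it, so the realized FER sits near eps rather
    than strictly below it."""
    thr = math.log2((1 - eps) / eps)       # standard rule: posterior >= 1-eps
    if randomize and rng.random() < 0.5:
        thr -= math.log2((1 - p) / p)      # one fewer correct transmission
    return thr
```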
The two rate lower bounds introduced in this paper are shown in Figure 4 for the same channel, capacity, and threshold used in the simulations. The highest lower bound is labeled Sys. Lower Bound and is the bound developed in Sec. VI-B for a system that uses a systematic phase to turn a uniform initial distribution into a binomial distribution and then enforces the SEAD constraints. The next highest bound is labeled SEAD Rate Bound and is the lower bound (32) introduced in Thm. 3 for a system that enforces the SEAD constraints. This bound is a slight improvement over the SED lower bound by Yang et al. [2], which we show for comparison, labeled Yang's Lower Bound. Also for comparison, we show Polyanskiy's VLF lower bound, developed for a stop-feedback system. Since a stop-feedback system is less capable than a full-feedback system, we expect the lower bound for the full-feedback system to approach capacity faster than the VLF bound, which is indeed the behavior the bounds above exhibit.

The empirical time-complexity results of the SPM-TOP algorithm simulations vs. message size are shown in Figures 5 and 6. All simulations were performed on a 2019 MacBook Pro laptop with a GHz, -core processor and GB of RAM, with transmitter and receiver operating alternately on the same processor. First, Figure 5 shows the average time, in milliseconds, taken per message (yellow line) and per transmitted symbol (green line). The average time per symbol drops very fast as the message size grows from to and then slowly stabilizes. This drop could be explained by the initialization time needed for each new message; however, the computer temperature and other processes managed by the computer's OS could also play a role in the time measurements. For a more accurate characterization of the complexity's evolution as a function of message size , we count the number of operations executed during the transmission of each symbol and each message, namely probability checks for partitioning before transmitting a symbol and probability updates after the transmission of a symbol.
The average number of probability checks and probability-update operations per message vs. message size is shown in the top of Figure 6. To compare the data with a quadratic trend, we fitted a parabola to the update-merge simulated data. Also for reference, the blue line shows the function to highlight that the complexity per message is below quadratic in the region of interest. The average number of probability checks and probability-update operations per transmitted symbol is shown in the bottom of Figure 6. The number of operations per symbol falls well below , even when the number of probabilities that the system tracks is larger than . This number is at and only increases with . Note that for both averages are below . These results show that the complexity of the SPM-TOP algorithm allows for fast execution times, and they validate the theory that the complexity order as a function of message size is linear for each transmission and quadratic for the whole message.
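The parabola fit used for the reference curve can be reproduced with a standard least-squares polynomial fit. The sketch below is our re-creation; the array names are placeholders, and the paper's fitted coefficients are not reproduced here.

```python
import numpy as np

def fit_update_merge(msg_sizes, op_counts):
    """Least-squares fit of a parabola a*k**2 + b*k + c to per-message
    operation counts, mirroring the quadratic reference line in Fig. 6."""
    a, b, c = np.polyfit(np.asarray(msg_sizes, float),
                         np.asarray(op_counts, float), 2)
    return a, b, c
```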

IX Conclusion
Naghshvar et al. [10] established the “small enough difference” (SED) rule for posterior matching partitioning, used martingale theory to study asymptotic behavior, and showed how to develop a non-asymptotic lower bound on achievable rate. Yang et al. [2] significantly improved the non-asymptotic achievable-rate bound using martingale theory for the communication phase and a Markov model for the confirmation phase, still maintaining the SED rule. However, partitioning algorithms that enforce the SED rule require a complex process of swapping messages back and forth between the two message sets and updating the posteriors.
To reduce complexity, this paper replaces SED with the small enough absolute difference (SEAD) partitioning constraints. The SEAD constraints are more relaxed than SED, and they admit the TOP partitioning rule. In this way, SEAD allows a low-complexity approach that organizes messages according to their type, i.e., their Hamming distance from the received word, orders messages according to their posterior, and partitions the messages with a simple threshold without requiring any swaps, as sketched below.
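As a rough illustration of why no swaps are needed, the sketch below renders a simplified version of the thresholding step: messages arrive already ordered by posterior, and a single sweep cuts the list where the running sum first reaches 1/2, choosing the cut closest to an even split. This is our simplified rendering for intuition, not the paper's implementation, which operates on groups of equal-posterior messages.

```python
def top_partition(posteriors_sorted):
    """Given posteriors in sorted order, return the cut index k such
    that the first k messages form one set and the rest the other,
    with the split as close to 1/2 as a single cut allows."""
    s = 0.0
    for k, q in enumerate(posteriors_sorted):
        if s + q >= 0.5:
            # keep whichever adjacent cut lands closer to an even split
            return k + 1 if abs(s + q - 0.5) <= abs(s - 0.5) else k
        s += q
    return len(posteriors_sorted)
```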
The main analytical results show that the SEAD constraints suffice to achieve at least the same lower bound that Yang et al. [2] showed to be achievable by SED. Moreover, the new SEAD analysis establishes achievable rate bounds higher than those found by Yang et al. [2]. The analysis does not use martingale theory for the communication phase and applies a surrogate channel technique to tighten the results. An initial systematic transmission further increases the achievable rate bound.
The simplified encoder associated with SEAD has a complexity below order and allows simulations for message sizes of at least 1000 bits. These simulations reveal actual achievable rates sufficiently far above our new achievable-rate bounds that further analytical investigation to obtain even tighter achievable-rate bounds is warranted. From a practical perspective, the simulation results themselves provide new lower bounds on the rates achievable for the BSC with full feedback. For example, with an average block size of 200.97 bits corresponding to message bits, simulation results for a target codeword error rate of show a rate of for the channel with capacity 0.5, i.e., % of capacity.
X Acknowledgements
The authors would like to thank Hengjie Yang and Minghao Pan for their help with this manuscript.
Appendix A Proof of Claim 1
Proof that is equivalent to ( or ). We first prove the converse: if the set containing is not a singleton, then constraint (19) does not hold. Without loss of generality, assume and suppose there exists such that . Since , then . By equation (123), when , then:
(149)
(150) |
Note from equation (149) that the expression decreases with ; therefore, replacing with a lower bound gives an upper bound on the difference (149). For a lower bound, note that , and that setting in (149) results in . In the case where , or , by equation (123), the difference is:
(151)
(152)
(153)
(154) |
To prove that if the set containing is a singleton, then , note that . The inequalities therefore become equalities, and equations (149) and (151) become and respectively.
Appendix B Proof of existence of in Thm. 2
The proof that a process like the one described in Thm. 2 exists consists of constructing one such process. Define the process by if , and define , where, to enforce constraints (16) and (20), if , the update is defined by (162); when , in which case , to enforce constraints (11) and (19) the update is given by .
Denote the transmitted symbol by and the received symbol by , and let . The symbol could be either or . Also, we could have or . These cases combine into four possible events. Define , and note that can be derived from equation (123) as follows:
(155) |
where , , and are given by:
(156)
(157)
(158)
(159) |
Let and be defined by:
(160)
(161) |
then, define the update by:
(162) |
We need to show that constraints (20)-(19) of Thm. 1 and constraints (26) and (28) are satisfied. When , since is defined in the same manner as , constraints (18) and (19) are satisfied.
The proof that satisfies constraints (20), (16) and (26) is split into the case where and the case where .
B-A Case .
It suffices to show that for all and for all , if , then , since (constraint (16)), and any weighted average would add up to (constraint (20)).
When , then , and therefore, since . The expectation can be computed from (123), where depends on whether or . The expectation is given by either (163) or (164) respectively:
(163)
(164) |
This proves that constraints (20) and (16) are satisfied. To prove that constraint (26) is satisfied, we need to show that . It suffices to compare the cases where , when , and within these, only the terms in which the pairs differ, that is, that and . For this comparison, express and as positive logarithms as follows:
(165)
(166)
(167)
(168) |
Since , we only need to show that the argument of the logarithm in (165) is greater than that of (166), and similarly for (167) and (168). All arguments share the term , so it only remains to show that inequalities (169) and (170) hold:
(169)
(170) |
The numerators in both inequalities are the same and positive, since and if then and in all cases regardless. Both denominators on the left-hand side are smaller than those on the right-hand side by exactly the numerators, and therefore the inequalities hold.
B-B Case .
Next we show that constraints (20), (16) and (26) are satisfied when . In the case where , then is still by equation (167). However, whenever , the term is added to . To show that constraint (20) holds, recall from (126) that:
(171)
(172)
(173) |
To obtain , replace by in equation (173), and let . Then and , and replace to obtain:
(174)
(175)
(176)
(177)
(178) |
To show that (constraint (16)), note that is either unchanged from or the same as when , and therefore it holds for . For , note from the first term of equation (174) that . Since , the expectation is either or greater.
We need to show constraint (26), . It suffices to show that and . Again, since is either or the same as when , we only need to show that . It suffices to show that for a positive scalar :
(179) |
When , then . We have that:
(180) |
Recall from equation (129) that:
(181) |
and let , then, the scaled difference between left and right terms in (179) is given by:
(182)
(183)
(184)
(185) |
Equation (182) follows from (181). In (183), Jensen's inequality is used. In (184), note that . Finally, .
We conclude that , and therefore .
B-C Proof of constraint (28)
Finally, we need to show that constraint (28) is satisfied, that is: . We have shown that the update allows the process to meet constraints (16)-(19) of Thm. 1 and constraint (26). Note that by the definition of in Thm. 2 it is possible that the process restarts when falls from confirmation, at a time , without ever attaining . We construct a third process that preserves all the properties of , and with if . The process can be initialized by and then letting with step size defined by: . The inner minimum guarantees that the third process reaches the threshold if the original does, and the outer maximum guarantees that its step size is at least that of the original. Then the two processes cross at the same time and share the same values when they do, that is:
(186)
(187) |
Using the process and equation (187), the expression in constraint (28) becomes:
(188) |
In the case where , we have that
and , then:
(189)
(190)
(191) |
The first inequality in (191) follows since and the second inequality holds because:
(192) |
For the case where , we solve a constrained maximization of expression (188), where the constraint is (or ). For simplicity we subtract the constant from (188).
Let be arbitrary and let , and . Using the definitions of and in (155), (162) and (160)-(161), we explicitly find expressions for in terms of , , and .
When or the expression is given by:
(193)
and when or it is given by:
(194)
B-D Maximizing (193)
The maximum of (193) happens when , since the term with the indicator function is non-negative. Since , then , and we proceed to solve:
(195)
(196) |
where is defined by:
(197)
First we show that is increasing in , by showing . Note that and
(198) |
Factor out the positive constant , to obtain:
(199)
(200) |
It suffices to show that the numerator of equation (200) is non-negative. Since it decreases when , then:
(201)
(202)
(203)
(204) |
In equation (204) we have used and . Since , is increasing in and we can replace by for an upper bound. Since , then is given by:
(205)
(206)
(207) |
To complete the proof, it suffices to show that the last expression decreases in :
(208)
(209)
(210)
(211) |
Since is decreasing, the maximum of equation (193) is , attained at .
B-E Maximizing (194)
The expression (194) is given by where is defined by:
(212)
We proceed to solve:
maximize (213)
subject to (214)
Combining the last two terms we obtain:
(215) |
The first term increases with , and the second one decreases as the quotient increases in absolute value. For , it is possible to have , leaving only (215). However, for , the quotient is positive because . The smallest value of the quotient is then with square . Let be defined in equation (216), then the maximum of is bounded by the maximum of , where:
(216)
To determine the maximum, we examine the behavior of by taking the first derivative:
(217) |
Then, scale by the positive term to obtain:
(218)
(219)
(220)
(221)
(222) |
To show that in , it suffices to show that the numerator of equation (222) is positive:
(223)
(224)
(225)
(226)
(227) |
When , then and the second term is non-negative and therefore the derivative is positive. When we have , , then:
therefore, is increasing in and the maximum is at where . The maximum is given by:
(228) |
Then the maximum of is zero. Since the maxima of both expressions (193) and (194) are zero, we conclude that .
Finally, we prove the last claim, that is the smallest value for a system that enforces the SED constraint. It suffices to note that the surrogate process described in [2, Sec. V-E] is a strict martingale. A process with a lower value would not comply with constraint (9) and therefore would also fail to meet constraint (20).
Appendix C Proof of Claim 3 (Confirmation Phase Markov Chain)
Proof:
Suppose that for times and the partitioning is fixed at and . Suppose also that and . We need to show that . Using the update formula (121), at time we have, for :
(229)
(230) |
At time , since , equation (121) for results in:
(231)
(232) |
And for equation (121) results in:
(233)
(234) |
Then for all , each posterior at time is given by . The same equalities hold when and , where the only difference is that and are interchanged. By induction, the same holds over any stretch of times in which the partitions are fixed and the boosting and attenuating updates balance. ∎
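For intuition, the cancellation driving this induction can be written out for a BSC with crossover probability p; the notation below is our reconstruction, since the update formula (121) is not reproduced here. Writing ρ for the candidate's posterior, a boosting update followed by an attenuating update returns it to its original value:

\[
\rho' = \frac{(1-p)\,\rho}{(1-p)\,\rho + p\,(1-\rho)},
\qquad
\frac{p\,\rho'}{p\,\rho' + (1-p)\,(1-\rho')}
= \frac{p(1-p)\,\rho}{p(1-p)\,\rho + p(1-p)\,(1-\rho)} = \rho .
\]

The posteriors of the other messages scale by the complementary factors and cancel in the same way, which is exactly the statement of Claim 3.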
Appendix D Proof of Inequality (108), Sec. III
Proof:
We need to show that the following inequality holds:
(235) |
Recall that is the event that message enters confirmation after time , rather than another message ending the process by attaining . This event is defined by . Using Bayes' rule, the expectation can be expanded as a sum of expectations conditioned on events that are defined in terms of , and , and whose union is the full event space, leaving only the original conditioning . These events are , , and . Note that , and therefore the third event vanishes. The expansion is given by:
(236)
(237)
(238) |
Since , we can omit the conditioning on and when accompanied by . By the independence of the confirmation phase from the crossing value, which follows from the fixed state count of the Markov chain, we have that:
(239) |
Therefore, we can replace the expectation in (237) by the one in (236) and then add the probabilities in (236) and (237) to obtain . Note that ; thus the conditioning on is redundant with . Then the expectation on the left of (236) is also given by:
(240)
(241) |
The event implies that the process decodes in error at the th communication-phase round, which results in . Therefore, we have that . This makes the left side of (240) an average of the positive quantity on the right of (240) and the negative quantity in (241). Then:
(242)
(243) |
The last inequality (243) follows because the expectation is positive and is multiplied by a probability, , in (242). ∎
References
- [1] A. Antonini, H. Yang, and R. D. Wesel, “Low complexity algorithms for transmission of short blocks over the BSC with full feedback,” in 2020 IEEE International Symposium on Information Theory (ISIT), 2020, pp. 2173–2178.
- [2] H. Yang, M. Pan, A. Antonini, and R. D. Wesel, “Sequential transmission over binary asymmetric channels with feedback,” IEEE Trans. Inf. Theory, 2021. [Online]. Available: https://arxiv.org/abs/2111.15042
- [3] C. Shannon, “The zero error capacity of a noisy channel,” IRE Trans. Inf. Theory, vol. 2, no. 3, pp. 8–19, September 1956.
- [4] M. V. Burnashev, “Data transmission over a discrete channel with feedback. Random transmission time,” Problemy Peredachi Inf., vol. 12, no. 4, pp. 10–30, 1976.
- [5] M. Horstein, “Sequential transmission using noiseless feedback,” IEEE Trans. Inf. Theory, vol. 9, no. 3, pp. 136–143, July 1963.
- [6] O. Shayevitz and M. Feder, “Optimal feedback communication via posterior matching,” IEEE Trans. Inf. Theory, vol. 57, no. 3, pp. 1186–1222, March 2011.
- [7] S. K. Gorantla and T. P. Coleman, “A stochastic control approach to coding with feedback over degraded broadcast channels,” in 49th IEEE Conference on Decision and Control (CDC), 2010, pp. 1516–1521.
- [8] C. T. Li and A. E. Gamal, “An efficient feedback coding scheme with low error probability for discrete memoryless channels,” IEEE Trans. Inf. Theory, vol. 61, no. 6, pp. 2953–2963, June 2015.
- [9] O. Shayevitz and M. Feder, “A simple proof for the optimality of randomized posterior matching,” IEEE Trans. Inf. Theory, vol. 62, no. 6, pp. 3410–3418, June 2016.
- [10] M. Naghshvar, T. Javidi, and M. Wigger, “Extrinsic Jensen–Shannon divergence: Applications to variable-length coding,” IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 2148–2164, April 2015.
- [11] J. H. Bae and A. Anastasopoulos, “A posterior matching scheme for finite-state channels with feedback,” in 2010 IEEE Int. Symp. Inf. Theory, 2010, pp. 2338–2342.
- [12] V. Kostina, Y. Polyanskiy, and S. Verdú, “Joint source-channel coding with feedback,” IEEE Trans. Inf. Theory, vol. 63, no. 6, pp. 3502–3515, 2017.
- [13] S. Kim, R. Ma, D. Mesa, and T. P. Coleman, “Efficient Bayesian inference methods via convex optimization and optimal transport,” in 2013 IEEE Int. Symp. Inf. Theory, 2013, pp. 2259–2263.
- [14] O. Sabag, H. H. Permuter, and N. Kashyap, “Feedback capacity and coding for the BIBO channel with a no-repeated-ones input constraint,” IEEE Trans. Inf. Theory, vol. 64, no. 7, pp. 4940–4961, 2018.
- [15] L. V. Truong, “Posterior matching scheme for Gaussian multiple access channel with feedback,” in 2014 IEEE Information Theory Workshop (ITW 2014), 2014, pp. 476–480.
- [16] A. Anastasopoulos, “A sequential transmission scheme for unifilar finite-state channels with feedback based on posterior matching,” in 2012 IEEE Int. Symp. Inf. Theory, 2012, pp. 2914–2918.
- [17] J. Schalkwijk, “A class of simple and optimal strategies for block coding on the binary symmetric channel with noiseless feedback,” IEEE Trans. Inf. Theory, vol. 17, no. 3, pp. 283–287, May 1971.
- [18] J. Schalkwijk and K. Post, “On the error probability for a class of binary recursive feedback strategies,” IEEE Trans. Inf. Theory, vol. 19, no. 4, pp. 498–511, July 1973.
- [19] A. Tchamkerten and E. Telatar, “A feedback strategy for binary symmetric channels,” in Proc. IEEE Int. Symp. Inf. Theory, June 2002, pp. 362–362.
- [20] A. Tchamkerten and I. E. Telatar, “Variable length coding over an unknown channel,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 2126–2145, May 2006.
- [21] M. Naghshvar, M. Wigger, and T. Javidi, “Optimal reliability over a class of binary-input channels with feedback,” in 2012 IEEE Inf. Theory Workshop, Sep. 2012, pp. 391–395.
- [22] R. Durrett, Probability: Theory and Examples, Chapter 4.8, Optional Stopping Theorems, 5th ed., ser. Cambridge Series in Statistical and Probabilistic Mathematics; 49. Cambridge: Cambridge University Press, 2019.
- [23] ——, Probability: Theory and Examples, Chapter 4.2, Exponential Martingale, 5th ed., ser. Cambridge Series in Statistical and Probabilistic Mathematics; 49. Cambridge: Cambridge University Press, 2019.