
Achievable Rates and Low-Complexity Encoding of Posterior Matching for the BSC

Amaael Antonini, Rita Gimelshein, Minghao Pan,
and Richard Wesel
This work was supported by the National Science Foundation (NSF) under Grant CCF-1955660. An earlier version of this paper was presented in part at the 2020 IEEE International Symposium on Information Theory (ISIT) [1] [DOI: 10.1109/ISIT44484.2020.9174232]. (Corresponding author: Amaael Antonini.) A. Antonini is with the Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, CA, 90095 USA (e-mail: amaael@ucla.edu). R. Gimelshein is with the Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, CA, 90095 USA (e-mail: rgimel@ucla.edu). Minghao Pan is with the California Institute of Technology. R. D. Wesel is with the Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, CA, 90095 USA (e-mail: wesel@ucla.edu).
Abstract

Horstein, Burnashev, Shayevitz and Feder, Naghshvar et al., and others have studied sequential transmission of a $k$-bit message over the binary symmetric channel (BSC) with full, noiseless feedback using posterior matching. Yang et al. provide an improved lower bound on the achievable rate using martingale analysis that relies on the small-enough difference (SED) partitioning introduced by Naghshvar et al. SED requires a relatively complex encoder and decoder. To reduce complexity, this paper replaces SED with relaxed constraints that admit the small enough absolute difference (SEAD) partitioning rule. The main analytical results show that achievable-rate bounds higher than those found by Yang et al. [2] are possible even under the new constraints, which are less restrictive than SED. The new analysis does not use martingale theory for the confirmation phase and applies a surrogate channel technique to tighten the results. An initial systematic transmission further increases the achievable rate bound. The simplified encoder associated with SEAD has a complexity below order $O(K^2)$ and allows simulations for message sizes of at least 1000 bits. For example, simulations achieve 99% of the channel's 0.50-bit capacity with an average block size of 200 bits for a target codeword error rate of $10^{-3}$.

Index Terms:
Posterior matching, binary symmetric channel, noiseless feedback, random coding.

I Introduction

Consider sequential transmission over the binary symmetric channel with full, noiseless feedback as depicted in Fig. 1. The source data at the transmitter is a $K$-bit message $\theta$, uniformly sampled from $\{0,1\}^K \triangleq \Omega$. At each time $t = 1, 2, \dots, \tau$, input symbol $X_t$ is transmitted across the channel, and output symbol $Y_t$ is received, where $X_t, Y_t \in \{0,1\}$ and $\Pr(Y_t = 1 \mid X_t = 0) = \Pr(Y_t = 0 \mid X_t = 1) = p\ \forall t$. The received symbol $Y_t$ is available to the transmitter for encoding symbol $X_{t+1}$ (and subsequent symbols) via the noiseless feedback channel.

Figure 1: System diagram of a BSC with full, noiseless feedback.

The process terminates at stopping time $t = \tau$ when a reliability threshold is achieved, at which point the receiver computes an estimate $\hat{\theta} \in \Omega$ of $\theta$ from the received symbols $Y_1, Y_2, \dots, Y_\tau$. The communication problem consists of obtaining a decoding estimate of $\theta$ at the smallest possible time index $\tau$ while keeping the error probability $\Pr(\hat{\theta} \neq \theta)$ bounded by a small threshold $\epsilon$.

I-A Background

Shannon [3] showed that feedback cannot increase the capacity of discrete memoryless channels (DMCs). However, when combined with variable-length coding, Burnashev [4] showed that feedback can increase the frame error rate's (FER) decay rate as a function of blocklength. One such variable-length coding method was pioneered by Horstein [5]. Horstein's sequential transmission scheme was long presumed to achieve the capacity of the BSC; this was later proved by Shayevitz and Feder [6], who showed that it satisfies the criteria of a posterior matching scheme. A posterior matching (PM) scheme was defined by Shayevitz and Feder as one that satisfies the two requirements of the posterior matching principle:

  1. The input symbol at time $t+1$, $X_{t+1}$, is a fixed function of a random variable $U$ that is independent of the received symbol history $Y^t \triangleq \{Y_1, Y_2, \dots, Y_t\}$; and

  2. The transmitted message, $\theta$, can be uniquely recovered from $(U, Y^t)$ a.s.

Gorantla and Coleman [7] used Lyapunov functions for an alternative proof that PM schemes achieve the channel capacity. Later, Li and El-Gamal [8] proposed a capacity-achieving "posterior matching" scheme with fixed blocklength for DMCs. Their scheme used a random cyclic shift that was later used by Shayevitz and Feder for a simpler proof that Horstein's scheme achieves capacity [9]. Naghshvar et al. [10] proposed a variable-length, single-phase "posterior matching" scheme for DMCs with feedback that exhibits Burnashev's optimal error exponent, and used a sub-martingale analysis to prove that it achieves the channel capacity. Bae and Anastasopoulos [11] proposed a PM scheme that achieves the capacity of finite-state channels with feedback. Since then, other "posterior matching" algorithms have been developed; see [12, 13, 14, 15, 16]. Other variable-length schemes that attain Burnashev's optimal error exponent have also been developed, and some can be found in [17, 18, 19, 20, 21].

Feedback communication over the BSC in particular has been the subject of extensive investigation. Capacity-approaching, fixed-length schemes have been developed such as [8], but these schemes only achieve low frame error rates (FERs) at block sizes larger than 1000 bits. For shorter block lengths, capacity-approaching, variable-length schemes have also been developed, e.g., [5], [4], [10]. Recently, Yang et al. [2] provided the best currently available achievability bound for these variable-length schemes. Yang et al. derive an achievable rate using encoders that satisfy the small-enough-difference (SED) constraint. However, the complexity of variable-length schemes satisfying that constraint can grow quickly with message size, becoming too complex for practical implementation even at block lengths significantly below those addressed by the fixed-length schemes such as in [8].

I-B Contributions

In our precursor conference paper [1], we simplified the implementation of an encoder that enforces the SED constraint both by initially sending systematic bits and by grouping the messages according to Hamming distance from the received systematic bits. The contributions of the current paper include the following:

  • This paper provides a new analysis framework for posterior matching on the BSC that avoids martingale analysis in the communication phase in order to show that the achievable rate of [2] can be achieved with a broader set of encoders that satisfy less restrictive criteria than the SED constraint. Thm. 3 provides an example of a constraint, the small-enough-absolute-difference (SEAD) constraint, that meets the new, relaxed criteria.

  • The relaxed criteria allow a significant reduction of encoder complexity. Specifically, this paper shows that applying a new partitioning algorithm, thresholding of ordered posteriors (TOP), induces a partitioning that meets the SEAD constraints. The TOP algorithm facilitates further complexity reduction by avoiding explicit computation of posterior updates for the majority of messages, since those posterior updates are not required to compute the threshold position. This low-complexity encoding algorithm achieves the same rate performance that has previously been established for SED encoders in, e.g., [2].

  • Our new analysis further tightens the achievable rate bound provided in [2]. This new achievable rate lower bound applies to both the SED encoder analyzed in [2] and to our new, simpler, encoder.

  • We also show that using systematic transmissions, as in [1], to initially send the message meets both the relaxed criteria (including SEAD) and the SED constraint. Complexity is reduced during the systematic transmission, with the required operations limited to simply storing the received sequence.

  • We generalize the concept of the "surrogate process" $U'_i(t)$, used in Sec. V-E of [2], to a broader class of processes that are not necessarily sub-martingales. The ability to construct such "surrogate" processes allows tighter bounds that also apply to the original process.

  • Taken together, these results demonstrate that variable-length coding with full noiseless feedback can closely approach capacity with modest complexity.

  • Regarding complexity, the simplified encoder associated with SEAD has a complexity below order $O(K^2)$ and allows simulations for message sizes of at least 1000 bits. The simplified encoder organizes messages according to their type, i.e., their Hamming distance from the received word, orders messages according to their posterior, and partitions the messages with a simple threshold without requiring any swaps.

  • Regarding proximity to capacity, our achievable rate bounds show that with a codeword error rate of $10^{-3}$, SEAD posterior matching can achieve 96% of the channel's 0.50-bit capacity for an average blocklength of 199.08 bits, corresponding to a message with $k = 47$ bits. Simulations with our simplified encoder achieve 99% of the channel's 0.50-bit capacity for a target codeword error rate of $10^{-3}$ with an average block size of 201.08 bits, corresponding to a message with $k = 49$ bits.

I-C Organization

The rest of the paper proceeds as follows. Sec. II describes the communication process, introduces the problem statement, and reviews the highest existing achievability bound, by Yang et al. [2], as well as the scheme that achieves it, by Naghshvar et al. [10]. Sec. III introduces Thms. 1, 2, and 3, which together relax the sufficient constraints that guarantee a rate above Yang's lower bound and further tighten Yang's bound. Sec. IV introduces Lemmas 1-5 and provides the proof of Thm. 1 via those lemmas. Sec. V provides the proofs of Lemmas 1-5, Thm. 2, and Thm. 3. Sec. VI generalizes the new rate lower bound to arbitrary input distributions and derives an improved lower bound for the special case where a uniform input distribution is transformed into a binomial distribution through a systematic transmission phase. Sec. VII describes the TOP partitioning method and implements a simplified encoder that organizes messages according to their type, applies TOP, and employs initial systematic transmissions. Sec. VIII compares performance from simulations using the simplified encoder to the new achievability bounds. Sec. IX provides our conclusions. The Appendix provides a detailed proof of the second part of Thm. 3 and the proof of Claim 1.

II Posterior Matching with SED Partitioning

II-A Communication Scheme

Our proposed communication scheme and simplified encoding algorithm are based on the single-phase transmission scheme proposed by Naghshvar et al. [10]. Before each transmission, both the transmitter and the receiver partition the message set $\Omega = \{0,1\}^K$ into two sets, $S_0$ and $S_1$. The partition is based on the received symbols $Y^t$ according to a specified deterministic algorithm known to both the transmitter and receiver. Then, the transmitter encodes $X_t = 0$ if $\theta \in S_0$ and $X_t = 1$ if $\theta \in S_1$, i.e.

$$X_t = \text{enc}(\theta, Y^t) = \mathbbm{1}_{\theta \in S_1} \quad (1)$$

After receiving symbol $Y_t$, the receiver computes the posterior probabilities:

$$\rho_i(y^t) \triangleq P(\theta = i \mid Y^t = y^t), \ \forall i \in \{0,1\}^K. \quad (2)$$

The transmitter also computes these posteriors, as it has access to the received symbol $Y_t$ via the noiseless feedback channel, which allows both transmitter and receiver to use the same deterministic partitioning algorithm. The process repeats until the first time $\tau$ that a single message $i$ attains a posterior $\rho_i(y^\tau) \geq 1 - \epsilon$. The receiver chooses this message $i$ as the estimate $\hat{\theta}$. Since $\theta$ is uniformly sampled, every possible message $j \in \{0,1\}^K$ has the same prior: $\Pr(\theta = j) = 2^{-K}$.
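The posterior recursion implied by (2) is a standard Bayesian update driven by the partition. The sketch below illustrates one such update for the full posterior vector; the function name and array layout are our own illustration, not part of the scheme's specification.

```python
import numpy as np

def posterior_update(rho, in_S1, y, p):
    """One Bayes update of all message posteriors after observing y.

    rho   : length-M vector of current posteriors rho_i(y^t)
    in_S1 : boolean mask, True where message i lies in partition S_1
    y     : received BSC output symbol (0 or 1)
    p     : BSC crossover probability
    """
    q = 1.0 - p
    x = in_S1.astype(int)                 # symbol each message would have sent
    likelihood = np.where(x == y, q, p)   # P(Y_t = y | X_t = x_i)
    rho_new = rho * likelihood            # unnormalized Bayes numerator
    return rho_new / rho_new.sum()        # normalize over all M messages
```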

To prove that the SED scheme of Naghshvar et al. [10] is a posterior matching BSC scheme as described in [6], it suffices to show that the scheme uses the same encoding function as [6] applied to a permutation of the messages. Since the posteriors $\rho_i(y^t)$ are fully determined by the history of received symbols $Y^t$, a permutation of the messages can be defined by concatenating the messages in $S_0$ and $S_1$, each sorted by decreasing posterior. This permutation induces a c.d.f. on the corresponding posteriors. Then, to satisfy the posterior matching principle, the random variable $U$ can be taken as the c.d.f. evaluated at the last message before $\theta$. The resulting encoding function is given by $X_{t+1} = 0$ if $U < 1/2$, and $X_{t+1} = 1$ otherwise.

Naghshvar et al. proposed two methods to construct the partitions $S_0$ and $S_1$. The simplest one, described as the small enough difference (SED) encoder [2], consists of an algorithm that terminates when the SED constraint below is met:

$$0 \leq \sum_{i \in S_0} \rho_i(y^t) - \sum_{i \in S_1} \rho_i(y^t) < \min_{i \in S_0} \rho_i(y^t)\,. \quad (3)$$

The algorithm starts with all messages in $S_0$ and a vector of posteriors $\bm{\rho}_t \triangleq [\rho_1(y^t), \dots, \rho_{2^K}(y^t)]$ of the messages $\{1, \dots, 2^K\}$. The items are moved to $S_1$ one by one, from smallest to largest posterior. The process ends at any point where rule (3) is met. If the accumulated probability in $S_0$ falls below $\frac{1}{2}$, then the labelings of $S_0$ and $S_1$ are swapped, after which the process resumes.

The worst-case complexity of this algorithm is of order $O(M^2)$, where $M = 2^K$ is the number of posteriors. The complexity is quadratic in $M$ because part of the process repeats after every swap, and in the worst case the number of swaps is proportional to $M$. However, a likely scenario is that the process ends after very few swaps, in which case the complexity is of order $O(M) = O(2^K)$.
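For concreteness, the following sketch transcribes the loop just described under our own naming; it is meant to illustrate the partitioning logic, not to be an optimized implementation.

```python
import numpy as np

def sed_partition(rho):
    """Move smallest-posterior items from S_0 to S_1 until rule (3) holds.

    rho: array of the M >= 2 message posteriors, summing to 1.
    Returns (S0, S1) as lists of message indices.
    """
    S0 = sorted(range(len(rho)), key=lambda i: rho[i])  # ascending posterior
    S1 = []
    P0, P1 = float(np.sum(rho)), 0.0
    while True:
        delta = P0 - P1
        if 0.0 <= delta < rho[S0[0]]:
            return S0, S1              # SED constraint (3) is met
        if delta < 0.0:                # S_0 fell below 1/2: swap the labels
            S0, S1 = sorted(S1, key=lambda i: rho[i]), S0
            P0, P1 = P1, P0
            continue
        i = S0.pop(0)                  # move smallest item of S_0 to S_1
        S1.append(i)
        P0 -= rho[i]
        P1 += rho[i]
```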

The second method by which Naghshvar et al. proposed to construct $S_0$ and $S_1$ consists of an exhaustive search over all possible partitions, i.e., the power set $2^\Omega$, and a metric to determine the optimal partition. This search clearly includes the partitioning of the first method and therefore also provides the guarantees of equations (9) and (14).

II-B Yang’s Achievable Rate

Yang et al. [2] developed the upper bound (7) on the expected blocklength $\tau$ of the SED encoder; to the best of our knowledge, this is the best upper bound that has been developed for the model.

The analysis by Yang et al. consists of two steps. The first step, in [2, Thm. 7], consists of splitting the single-phase process from Naghshvar et al. [10] into a two-phase process: a communication phase, with stopping time $T$, where $\rho_i(y^t) < \frac{1}{2}$, and a confirmation phase, where $\rho_i(y^t) \geq \frac{1}{2}$, when the transmitted message $\theta$ is the message $i$. This is a method first used by Burnashev in [4]. With the first step alone, the following upper bound on the expected blocklength can be constructed:

$$\mathsf{E}[\tau] \leq \frac{\log_2(M-1) + C_2}{C} + \left\lceil\frac{\log_2(\frac{1-\epsilon}{\epsilon})}{C_2}\right\rceil\frac{C_2}{C_1} + 2^{-C_2}\left(\frac{2C_2}{C} - \frac{C_2}{C_1}\right)\frac{1 - \frac{\epsilon}{1-\epsilon}2^{-C_2}}{1 - 2^{-C_2}}\,, \quad (4)$$

where $C$ is the channel capacity, defined by $C \triangleq 1 - H(p)$ with $H(p) \triangleq -p\log_2(p) - (1-p)\log_2(1-p)$ and $q \triangleq 1-p$, and the constants $C_2$ and $C_1$ from [2] are given by:

$$C_2 \triangleq \log_2\left(\frac{q}{p}\right) \quad (5)$$
$$C_1 \triangleq q\log_2\left(\frac{q}{p}\right) + p\log_2\left(\frac{p}{q}\right)\,. \quad (6)$$

The second step, in [2, Lemma 4], consists of synthesizing a surrogate martingale $U'_i(t)$ with stopping time $T'$ that upper bounds $T$; the surrogate is a degraded version of the sub-martingale $U_i(t)$. The martingale $U'_i(t)$ guarantees that whenever $U'_i(t) < 0$, then $U'_i(t+1) \leq \frac{1}{q}\log_2(2q)$, while still satisfying the constraints needed to guarantee the bound (4). An achievability bound on the expected blocklength for the surrogate process $U'_i(t)$ is constructed from (4) by replacing some of the $C_2$ values by $\frac{1}{q}\log_2(2q)$. The new bound from [2, Lemma 4] is given by:

$$\mathsf{E}[\tau] \leq \frac{\log_2(M-1)}{C} + \frac{\log_2(2q)}{qC} + \left\lceil\frac{\log_2(\frac{1-\epsilon}{\epsilon})}{C_2}\right\rceil\frac{C_2}{C_1} + 2^{-C_2}\left(\frac{C_2 + \frac{\log_2(2q)}{q}}{C} - \frac{C_2}{C_1}\right)\frac{1 - \frac{\epsilon}{1-\epsilon}2^{-C_2}}{1 - 2^{-C_2}}\,. \quad (7)$$

This bound also applies to the original process $U_i(t)$, since the blocklength of the process $U'_i(t)$ upper bounds that of $U_i(t)$. The bound (7) is lower because $\frac{1}{q}\log_2(2q)$ is smaller than $C_2$. The improvement is more significant as $p \rightarrow 0$ because $\frac{1}{q}\log_2(2q)$ grows from 0 to 1 as $p \rightarrow 0$, while $C_2$, instead, grows from 0 to infinity. The rate lower bound is given by $\frac{K}{\mathsf{E}[\tau]}$, where $\mathsf{E}[\tau]$ is upper bounded by (7) from [2, Thm. 7].
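The bound (7) is straightforward to evaluate numerically. The following sketch (the function name is ours) computes the implied rate lower bound $K/\mathsf{E}[\tau]$ directly from $K$, $p$, and $\epsilon$:

```python
import numpy as np

def rate_lower_bound(K, p, eps):
    """Evaluate the upper bound (7) on E[tau], then return K / E[tau]."""
    q = 1.0 - p
    C = 1.0 + p * np.log2(p) + q * np.log2(q)     # capacity C = 1 - H(p)
    C2 = np.log2(q / p)                           # constant (5)
    C1 = q * np.log2(q / p) + p * np.log2(p / q)  # constant (6)
    M = 2.0 ** K
    N = np.ceil(np.log2((1 - eps) / eps) / C2)    # ceiling term in (7)
    E_tau = (np.log2(M - 1) / C + np.log2(2 * q) / (q * C)
             + N * C2 / C1
             + 2**(-C2) * ((C2 + np.log2(2 * q) / q) / C - C2 / C1)
             * (1 - eps / (1 - eps) * 2**(-C2)) / (1 - 2**(-C2)))
    return K / E_tau

# For instance, p = 0.11 gives a BSC capacity of roughly 0.50 bits,
# the operating point used in the simulations discussed later.
print(rate_lower_bound(47, 0.11, 1e-3))
```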

II-C Original Constraints that Ensure Yang’s Achievable Rate

Let $\mathcal{F}_t \triangleq \sigma(Y^t)$ be the $\sigma$-algebra generated by the sequence of received symbols up to time $t$, where $Y^t = [Y_1, Y_2, \dots, Y_t]$. For each $i = 1, \dots, M$, define the process $U_i(Y^t)$ by:

$$U_i(t) = U_i(Y^t) \triangleq \log_2\left(\frac{\rho_i(Y^t)}{1 - \rho_i(Y^t)}\right)\,. \quad (8)$$

Yang et al. show that the SED encoder from Naghshvar et al. [10] guarantees that the following constraints (9)-(12) are met:

$$\mathsf{E}[U_i(t+1) \mid \mathcal{F}_t, \theta = i] \geq U_i(t) + C, \quad \text{if } U_i(t) < 0, \quad (9)$$
$$|U_i(t+1) - U_i(t)| \leq C_2, \quad (10)$$
$$\mathsf{E}[U_i(t+1) \mid \mathcal{F}_t, \theta = i] = U_i(t) + C_1, \quad \text{if } U_i(t) \geq 0, \quad (11)$$
$$|U_i(t+1) - U_i(t)| = C_2, \quad \text{if } U_i(t) \geq 0. \quad (12)$$

Meanwhile, Naghshvar et al. show that the SED encoder also satisfies the stricter constraint that the average log likelihood ratio $\mathbf{U}(t)$, as defined in equation (13), is also a submartingale that satisfies equation (14), which is equivalent to (15):

$$\mathbf{U}(Y^t) \triangleq \sum_{i=1}^{M}\rho_i(Y^t)U_i(Y^t) \quad (13)$$
$$\mathsf{E}[\mathbf{U}(Y^{t+1}) \mid \mathcal{F}_t] \geq \mathbf{U}(Y^t) + C \quad (14)$$
$$\mathsf{E}\left[\sum_{i=1}^{M}\left(\rho_i(y^{t+1})U_i(t+1) - \rho_i(y^t)U_i(t)\right)\Bigg|\mathcal{F}_t\right] \geq C\,. \quad (15)$$

The process $\mathbf{U}(t)$ is a weighted average of the values $U_i(t)$, some of which increase and some of which decrease after the next transmission $t+1$.

To derive the bound (4), Yang et al. split the decoding time $\tau$ into $T$ and $\tau - T$, where $T$ is an intermediate stopping time defined by the first crossing into the confirmation phase. The expectation $\mathsf{E}[T]$ is analyzed in [2] as a martingale stopping time and requires that, if $\theta = i$, then $U_i(t)$ be a strict submartingale that satisfies (9) and (10). The expectation $\mathsf{E}[\tau - T]$ is analyzed using a Markov chain that exploits the larger, fixed-magnitude step size (12) and the equality (11). Since $T$ is the time of the first crossing into the confirmation phase, the Markov chain model needs to include in the time $\tau - T$ the time that message $i$ takes to return to the confirmation phase if it has fallen back to the communication phase, that is, if $\rho_i(y^t) < \frac{1}{2}$ for some $t > T$.

III A New Bound and Relaxed Partitioning

In the following section, we introduce relaxed conditions that are still sufficient to allow a sequential encoder over the BSC with full feedback to attain the performance of Yang's bound (7). Specifically, we replace the requirement in (9) that applies separately to each message with a new requirement in (20) that applies to an average over all possible messages. For each individual message, we require in (16) only that the expected step size exceed the same positive constant $a$. The relaxed conditions are easier to enforce than (9), e.g., by the SEAD partitioning constraint introduced in Thm. 3.

III-A Relaxed Constraints that Also Guarantee Bound (4)

We begin with a theorem that introduces relaxed conditions and shows that they guarantee the performance (4), corresponding to the first step of Yang’s analysis.

Theorem 1.

Let $\tau$ be the stopping time of a sequential transmission system over the BSC. At each time $t$, let the posteriors $\rho_1(Y^t), \rho_2(Y^t), \dots, \rho_M(Y^t)$ be as defined in (2) and the log likelihood ratios $U_1(t), \dots, U_M(t)$ be as defined in (8). Suppose that for all times $t$, for all received symbols $y^t$, and for each $j \in \Omega$, the constraints (16)-(19) are satisfied:

$$\mathsf{E}[U_j(t+1) - U_j(t) \mid \mathcal{F}_t, \theta = j] \geq a\,, \quad \text{where } a > 0\,, \quad (16)$$
$$U_j(t+1) - U_j(t) \leq C_2\,, \quad \text{if } U_j(t) \leq 0\,, \quad (17)$$
$$\mathsf{E}[U_j(t+1) - U_j(t) \mid \mathcal{F}_t, \theta = j] = C_1\,, \quad \text{if } U_j(t) \geq 0\,, \quad (18)$$
$$|U_j(t+1) - U_j(t)| = C_2\,, \quad \text{if } U_j(t) \geq 0\,. \quad (19)$$

Suppose further that for all $t$ and $y^t$ the following condition is satisfied:

$$\sum_{j=1}^{M}\mathsf{E}[\rho_j(y^t)\left(U_j(t+1) - U_j(t)\right) \mid Y^t = y^t, \theta = j] \geq C. \quad (20)$$

Then, the expected stopping time $\mathsf{E}[\tau]$ is upper bounded by (21):

$$\mathsf{E}[\tau] \leq \frac{\log_2(M-1) + C_2}{C} + \left\lceil\frac{\log_2(\frac{1-\epsilon}{\epsilon})}{C_2}\right\rceil\frac{C_2}{C_1} + 2^{-C_2}\left(\frac{C_2}{C} - \frac{C_2}{C_1}\right)\frac{1 - \frac{\epsilon}{1-\epsilon}2^{-C_2}}{1 - 2^{-C_2}}\,. \quad\boxempty \quad (21)$$

The proof is provided in Sec. IV-B.

In equation (20), the values of $\rho_j(y^t)$ and $U_j(t)$ are fixed since they are functions of the $y^t$ specified by the conditioning. Then, the expectation $\mathsf{E}[\rho_j(y^t) \mid Y^t = y^t, \theta = j]$ is simply the constant $\rho_j(y^t)$, and we can also write (20) as the weighted sum of expectations:

$$\sum_{j=1}^{M}\rho_j(y^t)\,\mathsf{E}[\left(U_j(t+1) - U_j(t)\right) \mid Y^t = y^t, \theta = j] \geq C. \quad (22)$$

Meanwhile, the value of $U_j(t+1)$ for each $j$ is a random variable that takes on two possible values depending on the value of $Y_{t+1}$.

The sequential transmission process begins by randomly selecting a message $\theta$ from $\Omega$. Using that selected message, at each time $t$ until the decoding process terminates, the process computes an $X_t = x_t$, which induces a $Y_t = y_t$ at the receiver. The original constraint (9) dictates that $\{U_i(t), \theta = i\}$ is a sub-martingale and allows for a bound on $U_i(t)$ at any future time $t+s$ for any possible selected message $i$, i.e., $\mathsf{E}[U_i(t+s) \mid \mathcal{F}_t, \theta = i] \geq U_i(t) + sC$. This is no longer the case with the new constraints in Thm. 1. While equation (16) of the new constraints makes the process $U_i(t)$ a sub-martingale, it only guarantees that $\mathsf{E}[U_i(t+s) \mid \mathcal{F}_t, \theta = i] \geq U_i(t) + sa$, and $a$ could be any small positive constant. The left side of equation (20) is a sum that includes all $M$ realizations of the message; it is a constraint for each fixed time $t$ and each fixed event $Y^t = y^t$ that governs the behavior across the entire message set and does not define a sub-martingale. For this reason, the martingale analysis used by Naghshvar et al. in [10] and Yang et al. in [2] no longer applies.

A new analysis is needed to derive (21), the bound on the expected stopping time $\tau$, using only the constraints of Thm. 1. This new analysis needs to exploit the property that the expected stopping time is an average over all messages, that is, $\mathsf{E}[\tau] = \sum_{i=1}^{M}\Pr(\theta = i)\,\mathsf{E}[\tau \mid \theta = i]$, which the original analysis does not use because it guarantees that the bound (7) holds for each message individually, i.e., for each $\mathsf{E}[\tau \mid \theta = i], i = 1, \dots, M$. Note, however, that the original constraint (9) does imply that the new constraints are satisfied, so the results we derive below also apply to the setting of Naghshvar et al. in [10] and Yang et al. in [2]. The new constraints allow for a much simpler encoder and decoder design. This simpler design motivates our new analysis that forgoes the simplicity afforded by modeling the process $\{U_i(t), \theta = i\}$ as a martingale.

The new analysis seeks to accumulate the entire time that a message $i$ is not in its confirmation phase, i.e., the time during which the encoder is either in the communication phase or in some other message's confirmation phase. For each $n = 1, 2, 3, \dots$, let $T_n$ be the time at which the confirmation phase for message $i$ starts for the $n$th time (or the process terminates) and let $t^{(n)}_0$ be the time the encoder exits the confirmation phase for message $i$ for the $(n-1)$th time (or the process terminates). That is, for each $n = 1, 2, 3, \dots$, let $t^{(n)}_0$ and $T_n$ be defined recursively by $t^{(1)}_0 = 0$ and:

$$T_n = \min\{t \geq t^{(n)}_0 : U_i(t) \geq 0\ \text{or } t = \tau\} \quad (23)$$
$$t^{(n+1)}_0 = \min\{t \geq T_n : U_i(t) < 0\ \text{or } t = \tau\}\,. \quad (24)$$

Thus, the total time the process $U_i(t)$ is not in its confirmation phase is given by:

$$T \triangleq \sum_{n=1}^{\infty}\left(T_n - t^{(n)}_0\right)\,. \quad (25)$$

III-B A “Surrogate Process” that can Tighten the Bound

First, we note that the bound (21) is loose compared to (7). It is loose because when the expectation $\mathsf{E}[\tau]$ is split into two parts, $\mathsf{E}[T]$ and $\mathsf{E}[\tau - T]$, to analyze them separately, a sub-optimal factor is introduced in the expression for $\mathsf{E}[T]$, which is $\frac{1}{C}(\log_2(M-1) + C_2)$. The sub-optimality comes from the term $C_2$, which is the largest value that $U_i(t)$ can take at the start of the confirmation phase and makes the term $\mathsf{E}[T]$ large. However, this large $C_2$ is not needed to satisfy any of the constraints in Thm. 1. To overcome this sub-optimality, we use a surrogate process that is a degraded version of the process $U_i(t)$, where the value at the start of the confirmation phase is bounded by a constant smaller than $C_2$. The surrogate process is a degradation in the sense that it is always below the value of the original process $U_i(t)$.

Perhaps the utility of the surrogate process can be better understood through the following frog-race analogy, illustrated in Fig. 2. A frog $f_1$ traverses a race track of length $L$ jumping from one point to the next. The distance traveled by frog $f_1$ in a single jump is upper bounded by $u_1$. The jumps are not necessarily i.i.d., but we know that the expected length of each jump is lower bounded by $l$. It is also possible that $f_1$ takes some jumps backwards. With only this information, we want to determine an upper bound on the average number of jumps frog $f_1$ takes to reach the end of the track. This can be done using Doob's optional stopping theorem [22] to compute the upper bound as $\frac{L+u_1}{l}$: the maximum distance $L+u_1$ traveled from the origin to the last jump, divided by the lower bound $l$ on the average distance of a single jump.

Perhaps this bound can be improved. The final point is located between $L$ and $L+u_1$ and is reached in a single jump from a point between $L-u_1$ and $L$. If, for instance, the frog were restricted to only forward jumps, we could replace $u_1/l$ by just 1, but the process $U_i(t)$ actually can take steps backwards. Instead, we exploit another property of $U_i(t)$: the maximum step size $C_2$ is not needed to guarantee the lower bound $C$ on the average step size.

Suppose now that a surrogate frog $f_2$ participates in the race alongside $f_1$, but with the following restrictions:

  1. $f_1$ and $f_2$ start in the same place and always jump at the same time.

  2. $f_2$ is never ahead of $f_1$, i.e., when $f_1$ jumps forward, $f_2$ jumps at most as far, and when $f_1$ jumps backwards, $f_2$ jumps at least as far.

  3. Moreover, the forward distance traveled by frog $f_2$ in a single jump is upper bounded by $u_2 < u_1$.

  4. Despite its slower progress, the surrogate frog $f_2$ still satisfies the property that the expected length of each jump is lower bounded by $l$.

The average number of jumps taken by $f_2$ will be upper bounded by $\frac{L+u_2}{l}$, also by Doob's optional stopping theorem. Since $f_2$ is never ahead of $f_1$, $f_2$ crossing the finish line implies that $f_1$ has crossed it as well. Thus, $\frac{L+u_2}{l}$ is also an upper bound on the average number of jumps required for frog $f_1$ to travel across $L$.

The equivalent of the surrogate frog $f_2$ is what we proceed to define in Thm. 2, where the length is $L = \log(M-1)$, $u_1 = C_2$, $u_2 = B$, and $l = C$.

Figure 2: Example: frogs $f_1$ and $f_2$ jumping from 0 to $L$. The length of a single jump by $f_1$ is at most $u_1$. Frog $f_2$ jumps at the same times as $f_1$; however, the length of a single jump by $f_2$ is at most $u_2 < u_1$. This restriction forces frog $f_2$ to be always behind $f_1$ and thus reach $L$ no sooner than frog $f_1$.
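A quick Monte Carlo illustrates the first frog's bound. The jump law below is an arbitrary choice of ours that satisfies the stated assumptions (jumps bounded above by $u_1$, mean at least $l$, backward jumps allowed):

```python
import numpy as np

rng = np.random.default_rng(0)
L, u1, l = 100.0, 3.0, 1.0        # track length, max jump, mean-jump lower bound

def jumps_to_finish():
    """Count jumps until the frog crosses L; backward jumps are allowed."""
    pos, n = 0.0, 0
    while pos < L:
        # example law: +u1 w.p. 2/3, -1 w.p. 1/3, so E[jump] = 5/3 >= l
        pos += u1 if rng.random() < 2/3 else -1.0
        n += 1
    return n

avg = np.mean([jumps_to_finish() for _ in range(5000)])
print(avg, (L + u1) / l)   # empirical mean vs. Doob bound (L+u1)/l = 103
```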
Theorem 2 (Surrogate Process Theorem).

Let the surrogate process $U'_i(t)$ be a degraded version of $U_i(t)$ that still satisfies the constraints of Thm. 1. Initialize the surrogate process as $U'_i(0) = U_i(0)$ and reset $U'_i(t)$ to $U_i(t)$ at every $t = t^{(n)}_0$, that is, at each $t$ at which the encoder exits a confirmation phase round for message $i$. Define $T'_n \triangleq \min\{t \geq t^{(n)}_0 : U'_i(t) \geq 0\ \text{or } t = t^{(n+1)}_0\}$. Suppose that for some $B < C_2$, the process $U'_i(t)$ also satisfies the following constraints:

$$U_i(t) < 0 \implies U'_i(t+1) - U'_i(t) \leq U_i(t+1) - U_i(t) \quad (26)$$
$$U'_i(t) < 0 \implies U'_i(t+1) \leq B \quad (27)$$
$$U'_i\left(T'_n\right) - \frac{p}{q}\left(U_i(T_n) - C_2\right) \leq B. \quad (28)$$

Then the total time $U'_i(t)$ is not in its confirmation phase is given by $T' \triangleq \sum_{n=1}^{\infty}\left(T'_n - t^{(n)}_0\right)$, and $\mathsf{E}[T]$ is bounded by:

$$\mathsf{E}[T] \leq \mathsf{E}[T'] \leq \frac{B}{C}\left(1 + 2^{-C_2}\frac{1 - 2^{-NC_2}}{1 - 2^{-C_2}}\right) - \frac{\mathsf{E}[U_i(0)]}{C}\,. \quad (29)$$

Note that $T_n \leq T'_n$ for all $n$ because $U_i(t) \geq U'_i(t)$ by the definition of $U'_i(t)$ and constraint (26); therefore $T \leq T'$. Also note that after the process terminates at the stopping time $\tau$, both $T_n$ and $t^{(n)}_0$ are equal to $\tau$, which makes their difference 0. Thus the communication phase times $T$ and $T'$ are each a sum of finitely many non-zero terms.  $\boxempty$

The proof is provided in Sec. V-B.

III-C Relaxed Constraints that Achieve a Tighter Bound

The following theorem introduces partitioning constraints that guarantee that the constraints in Thm. 2 are satisfied with a value of $B = \log_2(2q)/q$ for the surrogate process. The new constraints are looser than the original SED constraint, and therefore are satisfied by an encoder that enforces the SED constraint. Using a new analysis, we show that this encoder guarantees an achievability bound tighter than the bound (7) obtained in the second step of Yang's analysis described in Sec. II. The value $B = \log_2(2q)/q$ is the lowest possible $B$ value that satisfies the constraints (26)-(28) of Thm. 2 for a system that enforces the original SED constraint (3). The new achievability bound applies to an encoder that satisfies the new relaxed constraints as well as to one that satisfies the SED constraint.

Theorem 3.

Consider sequential transmission over the BSC with noiseless feedback as described in Sec. I with an encoder that enforces the Small Enough Absolute Difference (SEAD) encoding constraints, equations (30) and (31) below:

$$\left|\sum_{i \in S_0}\rho_i(y^t) - \sum_{i \in S_1}\rho_i(y^t)\right| \leq \min_{i \in S_0}\rho_i(y^t) \quad (30)$$
$$\rho_i(y^t) \geq \frac{1}{2} \implies S_0 = \{i\}\ \text{or } S_1 = \{i\}\,. \quad (31)$$

Then, the constraints (19)-(20) in Thm. 1 are satisfied and a process $U'_i(t)$, $i = 1, \dots, M$, as described in Thm. 2, can be constructed with $B = \frac{1}{q}\log_2(2q)$. The resulting upper bound on $\mathsf{E}[\tau]$ is given by:

$$\mathsf{E}[\tau] \leq \frac{\log_2(M-1) + \frac{\log_2(2q)}{q}}{C} + \frac{C_2}{C_1}\left\lceil\frac{\log_2\left(\frac{1-\epsilon}{\epsilon}\right)}{C_2}\right\rceil + 2^{-C_2}\frac{1 - \frac{\epsilon}{1-\epsilon}2^{-C_2}}{1 - 2^{-C_2}}\left(\frac{\log_2(2q)}{qC} - \frac{C_2}{C_1}\right)\,, \quad (32)$$

which is lower than (7) from [2]. Note that meeting the SEAD constraints guarantees that both sets $S_0$ and $S_1$ are non-empty. This is because if either set is empty, then the other one is the whole space $\Omega$ and the difference in (30) is 1, which is greater than any posterior in a space with more than one element.  $\boxempty$

The proof is provided in Sec. V-B.
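Like (7), the tightened bound (32) is simple to evaluate numerically; the helper below (our own sketch, mirroring the one in Sec. II-B) returns the right side of (32):

```python
import numpy as np

def sead_expected_blocklength(K, p, eps):
    """Evaluate the right side of (32), the tightened bound on E[tau]."""
    q = 1.0 - p
    C = 1.0 + p * np.log2(p) + q * np.log2(q)      # capacity 1 - H(p)
    C2 = np.log2(q / p)
    C1 = q * np.log2(q / p) + p * np.log2(p / q)
    B = np.log2(2 * q) / q                          # surrogate constant B
    N = np.ceil(np.log2((1 - eps) / eps) / C2)
    return ((np.log2(2.0**K - 1) + B) / C
            + N * C2 / C1
            + 2**(-C2) * (1 - eps / (1 - eps) * 2**(-C2)) / (1 - 2**(-C2))
            * (B / C - C2 / C1))
```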

The requirement $\rho_i(y^t) \geq \frac{1}{2} \implies S_0 = \{i\}\ \text{or } S_1 = \{i\}$ is needed to satisfy constraint (19) and guarantees constraint (18). This requirement is also enforced by the SED partitioning constraints in [10] and [2].

The SEAD partitioning constraint is satisfied whenever the SED constraint (3) is. However, the SEAD partitioning constraint allows for constructions of $S_0$, $S_1$ that do not meet either of the SED constraints in [10] and [2], and it is therefore looser. In particular, with $P_k \triangleq \sum_{j \in S_k}\rho_j(y^t)$, SEAD partitioning allows for the case where $P_1 - P_0 > \max_{j \in S_1}\rho_j(y^t)$, which often arises in the implementation shown in Sec. VII-A. This case is not allowed under either of the SED constraints because they both demand that:

$$-\min_{j \in S_1}\rho_j(y^t) \leq P_0 - P_1 \leq \min_{j \in S_0}\rho_j(y^t)\,. \quad (33)$$
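For concreteness, the sketch below (our own helper, assuming both sets are non-empty) tests the SEAD constraints (30)-(31) for a candidate partition; tightening the first test to the two-sided condition (33) would recover an SED check instead:

```python
import numpy as np

def satisfies_sead(rho, in_S1):
    """Test constraints (30)-(31) for a partition given as a boolean mask."""
    P0 = rho[~in_S1].sum()
    P1 = rho[in_S1].sum()
    if abs(P0 - P1) > rho[~in_S1].min():      # constraint (30) violated
        return False
    for i in np.flatnonzero(rho >= 0.5):      # constraint (31): a message with
        # posterior >= 1/2 must sit alone in its set
        if np.count_nonzero(in_S1 == in_S1[i]) != 1:
            return False
    return True
```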

IV Supporting Lemmas and Proof of Theorem 1

This section presents supporting lemmas and the full proof of Thm. 1. Let $T$ be the time the transmitted message $\theta$ spends in the communication phase or in an incorrect confirmation phase, that is, for $\theta = i$, the time with $U_i(t) < 0$, as defined in equation (25). Note that this definition is different from the stopping time used in [2] and described in Sec. II. The proof of Thm. 1 consists of bounding $\mathsf{E}[T]$ and $\mathsf{E}[\tau - T]$ by expressions that derive from the constraints (16)-(20).

Since $\rho_i(y^t) \geq 1-\epsilon \iff \frac{\rho_i(y^t)}{1-\rho_i(y^t)} \geq \frac{1-\epsilon}{\epsilon} \iff U_i(t) \geq \log_2\left(\frac{1-\epsilon}{\epsilon}\right)$, the stopping rule described in Sec. II can be expressed as $\tau \triangleq \min\{t : \max_i\{U_i(t)\} \geq \log_2\left(\frac{1-\epsilon}{\epsilon}\right)\}$. To prove Thm. 1, we will instead use the stopping rule introduced by Yang et al. [2], defined by:

$$\tau \triangleq \min\{t : \max_i\{U_i(t)\} \geq NC_2\}\,, \quad (34)$$

where $N \triangleq \left\lceil\frac{\log_2\left(\frac{1-\epsilon}{\epsilon}\right)}{C_2}\right\rceil$. This rule models the confirmation phase as a fixed Markov chain with exactly $N+1$ states. Since $NC_2 \geq \log_2\left(\frac{1-\epsilon}{\epsilon}\right)$, the stopping time under the new rule is larger than or equal to that of the original rule without the ceiling, as explained in [2].

IV-A Five Helping Lemmas to Aid the Proof of Thms. 1-3

The expression that bounds the expectation $\mathsf{E}[T]$ is constructed via five inequalities (or equalities), each of which derives from one of the following five lemmas. The proofs of the lemmas are provided in Sec. V-A.

Lemma 1.

Let the total time the transmitted message spends in the communication phase (or an incorrect confirmation phase) be $T$, and let $T_n$ and $t^{(n)}_0$ be as defined in (23) and (24). Define $T^{(n)} \triangleq T_n - t^{(n)}_0$ and let

$$\mathcal{Y}_\epsilon^{(i)} \triangleq \{y^t : \rho_i(y^t) < \tfrac{1}{2},\ \rho_j(s) < 1-\epsilon\ \forall s \leq t, j \in \Omega\} \quad (35)$$

then:

$$\mathsf{E}[T] = \sum_{i=1}^{M}\Pr(\theta = i)\sum_{t=1}^{\infty}\sum_{y^t \in \mathcal{Y}_\epsilon^{(i)}}\Pr(Y^t = y^t \mid \theta = i)\,. \quad (36)$$

Note that $T^{(1)} = T_1$ is the time before entering the correct confirmation phase for the first time, that is, the time spent in the communication phase (or an incorrect confirmation phase) before the posterior $\rho_i(y^t)$ of the transmitted message ever crosses $\frac{1}{2}$. If the decoder stops (in error) before ever entering the correct confirmation phase, then $T^{(1)}$ is the time until the decoder stops. For $n > 1$, $T^{(n)}$ is the time between falling back from the correct confirmation phase for the $(n-1)$th time and either stopping (in error) or reentering the correct confirmation phase for the $n$th time. Thus, the total time the transmitted message $\theta = i$ has $U_i(t) < 0$ is also given by $T = \sum_{n=1}^{\infty}T^{(n)}$. Also note that if the decoder stops before entering the correct confirmation phase for the $n$th time, then $T^{(n+m)} = 0$ for all $m \geq 1$.  $\boxempty$

Lemma 2.

Suppose constraints (16), (19), and (20) of Thm. 1 are satisfied and let:

$$V_i(y^t) \triangleq \mathsf{E}[U_i(t+1) - U_i(t) \mid Y^t = y^t, \theta = i]\,, \quad (37)$$

then:

$$\sum_{i=1}^{M}\Pr(\theta = i)\sum_{y^t \in \mathcal{Y}_\epsilon^{(i)}}V_i(y^t)\Pr(Y^t = y^t \mid \theta = i) \geq C\sum_{i=1}^{M}\Pr(\theta = i)\sum_{y^t \in \mathcal{Y}_\epsilon^{(i)}}\Pr(Y^t = y^t \mid \theta = i)\,. \quad\boxempty \quad (38)$$
Lemma 3.

Let $T_n$ and $t^{(n)}_0$ be the times defined in (23) and (24). Then, for the left side of the sum (38) in Lemma 2, the following equality holds:

$$\sum_{i=1}^{M}\Pr(\theta = i)\sum_{y^t \in \mathcal{Y}_\epsilon^{(i)}}V_i(y^t)\Pr(Y^t = y^t \mid \theta = i) = \sum_{i=1}^{M}\Pr(\theta = i)\sum_{n=1}^{\infty}\mathsf{E}[U_i(T_n) - U_i(t_0^{(n)}) \mid \theta = i]\,. \quad\boxempty \quad (39)$$
Lemma 4.

Let $\epsilon$ be the decoding threshold and let the decoding rule be (34). Define the fallback probability as the probability, computed at the start of a confirmation phase, that a subsequent round of communication phase occurs. Then, this fallback probability is a constant $p_f$ independent of the message $i = 1, \dots, M$, independent of the number of previous confirmation phase rounds $n$, and is given by:

$$p_f = 2^{-C_2}\frac{1 - 2^{-NC_2}}{1 - 2^{-(N+1)C_2}}\,. \quad\boxempty \quad (40)$$
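Expression (40) is the ruin probability of a $\pm C_2$ confirmation-phase random walk that enters one step above zero. The toy simulation below (our own model of that walk, with arbitrarily chosen parameters) matches (40) to within sampling error:

```python
import numpy as np

def fallback_mc(p, N, trials=100_000, seed=0):
    """Simulate the +/-C2 confirmation walk: start one step above 0,
    move up w.p. q and down w.p. p, absorb at 0 (fallback) or at the top."""
    rng = np.random.default_rng(seed)
    q = 1.0 - p
    falls = 0
    for _ in range(trials):
        state = 1                      # position in units of C2 steps
        while 0 < state < N + 1:
            state += 1 if rng.random() < q else -1
        falls += (state == 0)
    return falls / trials

p, N = 0.11, 4
C2 = np.log2((1 - p) / p)
pf = 2**(-C2) * (1 - 2**(-N * C2)) / (1 - 2**(-(N + 1) * C2))  # equation (40)
print(fallback_mc(p, N), pf)           # the two values should be close
```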
Lemma 5.

Let $p_f$ be the fallback probability in Lemma 4 and suppose that $U_i(0) < 0\ \forall i = 1, \dots, M$. Then the expectation (39) in Lemma 3 is upper bounded by:

$$\sum_{i=1}^{M}\Pr(\theta = i)\sum_{n=1}^{\infty}\mathsf{E}[U_i(T_n) - U_i(t_0^{(n)}) \mid \theta = i] \leq \sum_{i=1}^{M}\Pr(\theta = i)\left(\frac{p_f}{1-p_f}C_2 + C_2 - U_i(0)\right) \quad (41)$$
$$\leq 2^{-C_2}\frac{1 - 2^{-NC_2}}{1 - 2^{-C_2}}C_2 + C_2 - \mathsf{E}[U_i(0)]\,. \quad\boxempty \quad (42)$$

IV-B Proof of Thm. 1 Using Lemmas 1-5:

Proof:

Using Lemmas 1 and 2, the expectation $\mathsf{E}[T]$ is bounded as follows:

$$\mathsf{E}[T] = \sum_{i=1}^{M}\sum_{y^t \in \mathcal{Y}_\epsilon^{(i)}}\Pr(Y^t = y^t, \theta = i) \quad (43)$$
$$\leq \frac{1}{C}\sum_{i=1}^{M}\sum_{y^t \in \mathcal{Y}_\epsilon^{(i)}}V_i(y^t)\Pr(Y^t = y^t, \theta = i)\,. \quad (44)$$

By Lemma 3, expression (44) is equal to the left side of inequality (41), which is bounded by (45) according to Lemma 5:

$$\frac{1}{C}\sum_{i=1}^{M}\Pr(\theta = i)\sum_{n=1}^{\infty}\mathsf{E}[U_i(T_n) - U_i(t_0^{(n)}) \mid \theta = i] \leq \left(1 + 2^{-C_2}\frac{1 - 2^{-NC_2}}{1 - 2^{-C_2}}\right)\frac{C_2}{C} - \frac{\mathbf{U}(Y^0)}{C}\,, \quad (45)$$

where $\mathbf{U}(Y^0)$ is the expected value of the log likelihood ratio of the true message according to the a priori message distribution, i.e., from the perspective of the receiver before any symbols have been received. Note that $\mathbf{U}(Y^0)$ is $-\log(M-1)$ for a uniform a priori input distribution. Then, equations (43)-(45) yield the following bound on $\mathsf{E}[T]$:

$$\mathsf{E}[T] \leq 2^{-C_2}\frac{1 - 2^{-NC_2}}{1 - 2^{-C_2}}\frac{C_2}{C} + \frac{C_2 - \mathbf{U}(Y^0)}{C}\,. \quad (46)$$

A bound on the expectation $\mathsf{E}[\tau - T]$ can be obtained using the Markov chain analysis in [2, Sec. V-F]. However, our analysis of $\mathsf{E}[T]$ already accounts for all time spent in the communication phase, including the additional communication phases that occur after the system falls back from the confirmation phase. Accordingly, we reduce the self-loop weight $\Delta_0$ in [2, Sec. V-F] from $\Delta_0 = 1 + \frac{C_2}{C} + \frac{\log_2(2q)}{qC}$ to $\Delta_0 = 1$. The resulting bound is given by:

$$\mathsf{E}[\tau - T] \leq \left(N - 2^{-C_2}\frac{1 - 2^{-NC_2}}{1 - 2^{-C_2}}\right)\frac{C_2}{C_1}\,. \quad (47)$$

The inequality in (47) is not an equality because, in our analysis, the transmission ends if any message $j$, other than the transmitted message $\theta$, attains $U_j(t) \geq NC_2$. However, in [2] the transmission only terminates when $U_i(t) \geq NC_2$. The upper bound on the expected stopping time $\mathsf{E}[\tau]$ is obtained by adding the bounds in equations (46) and (47) and replacing $N$ by its definition in equation (34):

$$\mathsf{E}[\tau] \leq \frac{\log_2(M-1) + C_2}{C} + \frac{C_2}{C_1}\left\lceil\frac{\log_2\left(\frac{1-\epsilon}{\epsilon}\right)}{C_2}\right\rceil + 2^{-C_2}\frac{1 - \frac{\epsilon}{1-\epsilon}2^{-C_2}}{1 - 2^{-C_2}}\left(\frac{C_2}{C} - \frac{C_2}{C_1}\right)\,. \quad (48)$$

Note that $\frac{C_2}{C} - \frac{C_2}{C_1} \geq 0$ and, since $N = \left\lceil\frac{1}{C_2}\log_2\left(\frac{1-\epsilon}{\epsilon}\right)\right\rceil \leq \frac{1}{C_2}\log_2\left(\frac{1-\epsilon}{\epsilon}\right) + 1$, we have $2^{-NC_2} \geq 2^{-\log_2\left(\frac{1-\epsilon}{\epsilon}\right) - C_2} = \frac{\epsilon}{1-\epsilon}2^{-C_2}$, which is also used in [2] for a more compact upper bound expression.

V Proof of Lemmas 1-5, Thm. 2, and Thm. 3

Before proceeding to prove Lemmas 1-5, we will introduce a claim that will aid in some of the proofs.

Claim 1.

For the communication scheme described in Sec. II, the following are equivalent:

  (i) $|U_j(t+1) - U_j(t)| = C_2$

  (ii) $S_0 = \{j\}$ or $S_1 = \{j\}$

This claim implies that for constraint (19) to hold, the set containing item $j$, with $U_j(t) \geq 0$, must be a singleton.

Proof:

See Appendix A.

V-A Proof of Lemmas 1-5

Proof:

We begin by defining sets that are used in the proof. First, we define $\mathcal{E}_t^\epsilon$, the set of sequences of length $t$ where the process has not stopped:

$$\mathcal{E}_t^\epsilon \triangleq \{y^t \in \{0,1\}^t \mid \rho_j(s) < 1-\epsilon,\ \forall j \in \Omega, s \leq t\}\,, \quad (49)$$

and let $\mathcal{E}^\epsilon \triangleq \cup_{t=0}^{\infty}\mathcal{E}_t^\epsilon$.

For each sequence $y^t \in \mathcal{E}^\epsilon$, $N_i(y^t)$ is the set of time values $t^{(1)}_0, t^{(2)}_0, \dots, t^{(n)}_0 \leq t$ where message $i$ begins an interval with $U_i(t) < 0$. This includes time zero and all the times $s$ where, from time $s-1$ to $s$, message $i$ transitions from $U_i(s-1) \geq 0$ to $U_i(s) < 0$, i.e., the decoder falls back from the confirmation phase to the communication phase.

$$N_i(y^t) \triangleq \{0\} \cup \{s \leq t : U_i(s) < 0, U_i(s-1) \geq 0\}\,. \quad (50)$$

Now we define the set $\mathcal{Y}^{(i,n)}_\epsilon$ of sequences $y^t$ for which the following are all true: 1) the decoder has not stopped, 2) the decoder has entered the confirmation phase for message $i$ $n$ times, and 3) the decoder is not in the confirmation phase for message $i$ at time $t$, where the sequence ends.

$$\mathcal{Y}^{(i,n)}_\epsilon \triangleq \{y^t \in \mathcal{E}^\epsilon : \left|N_i(y^t)\right| = n,\ U_i(t) < 0\}\,. \quad (51)$$

For $t \geq s$, let $y^{s:t} = [y_{s+1}, \dots, y_t]$ and let $y^s y^{s:t} = \text{cat}([y_1, \dots, y_s], [y_{s+1}, \dots, y_t]) = [y_1, \dots, y_s, y_{s+1}, \dots, y_t]$, the concatenation of the strings $y^s$ and $y^{s:t}$. Now we define the set $\mathcal{Y}^{(i,n)}_\epsilon(y^s)$, the subset of sequences in $\mathcal{Y}^{(i,n)}_\epsilon$ that have the sequence $y^s$ as a prefix:

$$\mathcal{Y}^{(i,n)}_\epsilon(y^s) \triangleq \{y^t \in \mathcal{Y}^{(i,n)}_\epsilon : y^t = y^s y^{s:t}\} \quad (52)$$

As our final definition, let $\mathcal{B}^{(i,n)}_\epsilon$ be the set containing only the sequences where the final received symbol $y_t$ is the symbol for which the decoder resumes the communication phase for message $i$ for the $n$th time, or the empty string, that is:

$$\mathcal{B}^{(i,n)}_\epsilon \triangleq \{y^t \in \mathcal{Y}^{(i,n)}_\epsilon \mid t \in N_i(y^t)\}\,. \quad (53)$$

Each $y^t \in \mathcal{B}^{(i,n)}_\epsilon$ sets an initial condition for the communication phase where $U_i(t) < 0$, so that $T^{(n)} \geq 1$; that is, $t$ is of the form $t^{(n)}_0$ defined in (24). By the law of total expectation, $\mathsf{E}[T]$ is given by:

$$\mathsf{E}[T] = \sum_{i=1}^{M}\Pr(\theta = i)\,\mathsf{E}[T \mid \theta = i]\,. \quad (54)$$

Now we explicitly write this expression as a function of all the possible initial conditions for each of the communication phase rounds $n$, that is, the set $\mathcal{B}^{(i,n)}_\epsilon$:

$$\sum_{i=1}^{M}\Pr(\theta = i)\,\mathsf{E}[T \mid \theta = i] = \sum_{i=1}^{M}\Pr(\theta = i)\,\mathsf{E}\left[\left(\sum_{n=1}^{\infty}T^{(n)}\right)\Bigg|\theta = i\right] \quad (55)$$
$$= \sum_{i=1}^{M}\Pr(\theta = i)\sum_{n=1}^{\infty}\mathsf{E}\left[T^{(n)}\Big|\theta = i\right] \quad (56)$$
$$= \sum_{i=1}^{M}\Pr(\theta = i)\sum_{n=1}^{\infty}\sum_{y^s \in \mathcal{B}^{(i,n)}_\epsilon}\Pr(Y^s = y^s \mid \theta = i)\,\mathsf{E}\left[T^{(n)}\Big|Y^s = y^s, \theta = i\right]\,. \quad (57)$$

Now we proceed to write the last expectation in (57) using the tail-sum formula for expectations in (58), and then as an expectation of the indicator of $\{T^{(n)} > r\}$ in (59). Then, since $T^{(n)}$ is a random function of $Y^t = Y^s Y^r$, where $Y^r \in \{0,1\}^r$, given by $\mathbbm{1}_{T^{(n)} > r} = \mathbbm{1}_{Y^s Y^r \in \mathcal{Y}^{(i,n)}_\epsilon}$, (60) follows:

$$\mathsf{E}\left[T^{(n)}\Big|Y^s = y^s, \theta = i\right] = \sum_{r=0}^{\infty}\Pr(T^{(n)} > r \mid Y^s = y^s, \theta = i) \quad (58)$$
$$= \sum_{r=0}^{\infty}\mathsf{E}[\mathbbm{1}_{T^{(n)} > r} \mid Y^s = y^s, \theta = i] \quad (59)$$
$$= \sum_{r=0}^{\infty}\mathsf{E}[\mathbbm{1}_{Y^{s+r} \in \mathcal{Y}^{(i,n)}_\epsilon(y^s)} \mid Y^s = y^s, \theta = i]\,. \quad (60)$$

Expanding the expectation in (60), we obtain (61). Since the indicator in (61) is 0 outside $\mathcal{Y}^{(i,n)}_\epsilon$ and 1 inside, it is omitted in (62), where we only consider values of $y^s z^r$ that intersect with $\mathcal{Y}^{(i,n)}_\epsilon$. Since $\cup_{r=1}^{\infty}\{\{0,1\}^r \cap \mathcal{Y}^{(i,n)}_\epsilon(y^s)\} = \mathcal{Y}^{(i,n)}_\epsilon(y^s)$, (63) follows:

$$\sum_{r=0}^{\infty}\mathsf{E}[\mathbbm{1}_{Y^{s+r} \in \mathcal{Y}^{(i,n)}_\epsilon(y^s)} \mid Y^s = y^s, \theta = i] = \sum_{r=0}^{\infty}\sum_{z^r \in \{0,1\}^r}\mathbbm{1}_{y^s z^r \in \mathcal{Y}^{(i,n)}_\epsilon(y^s)}\Pr(Y^{s+r} = y^s z^r \mid Y^s = y^s, \theta = i) \quad (61)$$
$$= \sum_{y^s z^r \in \cup_{r=1}^{\infty}\{\{0,1\}^r \cap \mathcal{Y}^{(i,n)}_\epsilon(y^s)\}}\Pr(Y^{s+r} = y^s z^r \mid Y^s = y^s, \theta = i) \quad (62)$$
$$= \sum_{y^{s+r} \in \mathcal{Y}^{(i,n)}_\epsilon(y^s)}\Pr(Y^{s+r} = y^s z^r \mid Y^s = y^s, \theta = i)\,. \quad (63)$$

The product of the conditional probabilities $\Pr(Y^s = y^s \mid \theta = i)$ in (57) and $\Pr(Y^{s+r} = y^s z^r \mid Y^s = y^s, \theta = i)$ in (63) is given by $\Pr(Y^{s+r} = y^s z^r \mid \theta = i)$. Replacing the expectation in (57) by (63), the inner-most sum in (57) becomes (64). The summation in (64) is over $\mathcal{Y}^{(i,n)}_\epsilon(y^s)$ for each $y^s$ in $\mathcal{B}^{(i,n)}_\epsilon$, which is equivalent to the sum over $\cup_{y^s \in \mathcal{B}^{(i,n)}_\epsilon}\mathcal{Y}^{(i,n)}_\epsilon(y^s) = \mathcal{Y}^{(i,n)}_\epsilon$, and (65) follows:

$$\sum_{y^s \in \mathcal{B}^{(i,n)}_\epsilon}\Pr(Y^s = y^s \mid \theta = i)\,\mathsf{E}\left[T^{(n)}\Big|Y^s = y^s, \theta = i\right] = \sum_{y^s \in \mathcal{B}^{(i,n)}_\epsilon}\sum_{y^{s+r} \in \mathcal{Y}^{(i,n)}_\epsilon(y^s)}\Pr(Y^{s+r} = y^s z^r \mid \theta = i) \quad (64)$$
$$= \sum_{y^t \in \mathcal{Y}^{(i,n)}_\epsilon}\Pr(Y^t = y^t \mid \theta = i)\,. \quad (65)$$

We can now rewrite (55) by replacing the expectation in (56) by (65) to obtain (66). In (67), the two summations are consolidated into a single sum over the union over all $n$ of the sets $\mathcal{Y}^{(i,n)}_\epsilon$:

$$\sum_{i=1}^{M}\Pr(\theta = i)\sum_{n=1}^{\infty}\sum_{r=0}^{\infty}\mathsf{E}[\mathbbm{1}_{T^{(n)} > r} \mid \theta = i] = \sum_{i=1}^{M}\Pr(\theta = i)\sum_{n=1}^{\infty}\sum_{y^t \in \mathcal{Y}^{(i,n)}_\epsilon}\Pr(Y^t = y^t \mid \theta = i) \quad (66)$$
$$= \sum_{i=1}^{M}\Pr(\theta = i)\sum_{y^t \in \cup_{n=0}^{\infty}\mathcal{Y}^{(i,n)}_\epsilon}\Pr(Y^t = y^t \mid \theta = i)\,. \quad (67)$$

To conclude the proof, note that the union $\cup_{n=0}^{\infty}\mathcal{Y}^{(i,n)}_\epsilon$ is the set $\mathcal{Y}^{(i)}_\epsilon$ defined in the statement of Lemma 1. ∎

Proof:

Define the set $\mathcal{A}_\epsilon$ by:

$$\mathcal{A}_\epsilon \triangleq \{y^t \in \mathcal{Y}_\epsilon^{(i)} : \rho_j(y^t) < \tfrac{1}{2}\ \forall j = 1, \dots, M\}\,, \quad (68)$$

where $\mathcal{A}_\epsilon$ does not depend on $i$. Let the set $\mathcal{Y}_\epsilon^{(i)}$ be partitioned into $\mathcal{A}_\epsilon$ and $\mathcal{Y}_\epsilon^{(i)} \setminus \mathcal{A}_\epsilon$. Then, we can split the sum on the left side of (69), which is the left side of (38) in Lemma 2, into a sum over $\mathcal{A}_\epsilon$, the right side of (69), and a sum over the sets $\mathcal{Y}_\epsilon^{(i)} \setminus \mathcal{A}_\epsilon$, expression (70), as follows:

\sum_{i=1}^{M}\Pr(\theta=i)\sum_{y^{t}\in\mathcal{Y}_{\epsilon}^{(i)}}\Pr(Y^{t}=y^{t}\mid\theta=i)V_{i}(y^{t})
=\sum_{i=1}^{M}\Pr(\theta=i)\sum_{y^{t}\in\mathcal{A}_{\epsilon}}\Pr(Y^{t}=y^{t}\mid\theta=i)V_{i}(y^{t}) (69)
+\sum_{i=1}^{M}\Pr(\theta=i)\sum_{y^{t}\in\mathcal{Y}_{\epsilon}^{(i)}\setminus\mathcal{A}_{\epsilon}}\Pr(Y^{t}=y^{t}\mid\theta=i)V_{i}(y^{t})\,. (70)

For y^{t}\in\mathcal{Y}_{\epsilon}^{(i)}\setminus\mathcal{A}_{\epsilon} there exists j\neq i such that U_{j}(t)\geq 0 and U_{i}(t)<0. Then S_{0}=\{j\} by constraint (19) and Claim 1, and therefore \Delta>0 (see the proof of Thm. 3 for the definition of \Delta). Since \Delta\geq 0, equation (126) gives \mathsf{E}[U_{i}(t+1)-U_{i}(t)\mid Y^{t}=y^{t},\theta=i]\geq C for all i. Note that \{U_{j}(t)\geq 0\}\cap\{S_{0}=\{j\}\} means that in this case the SED constraint (3) is satisfied. It suffices to show that the bound also holds for (69). The product of the probabilities \Pr(\theta=i) and \Pr(Y^{t}=y^{t}\mid\theta=i) in (69) equals \Pr(Y^{t}=y^{t},\theta=i), which can be factored as \Pr(Y^{t}=y^{t})\Pr(\theta=i\mid Y^{t}=y^{t}). Since 0<V_{i}(y^{t})\leq C_{2} and \mathcal{A}_{\epsilon} does not depend on i, the summation order in (69) can be reversed to obtain:

\sum_{i=1}^{M}\sum_{y^{t}\in\mathcal{A}_{\epsilon}}\Pr(Y^{t}=y^{t})\Pr(\theta=i\mid Y^{t}=y^{t})V_{i}(y^{t})
=\sum_{y^{t}\in\mathcal{A}_{\epsilon}}\Pr(Y^{t}=y^{t})\sum_{i=1}^{M}\Pr(\theta=i\mid Y^{t}=y^{t})V_{i}(y^{t}) (71)

The probability \Pr(\theta=i\mid Y^{t}=y^{t}) in (71) is just \rho_{i}(y^{t}), and (72) follows from the definition of V_{i}(y^{t}). In (73), \rho_{i}(y^{t}) is moved inside the expectation to obtain the form in constraint (20) of Thm. 1:

i=1M\displaystyle\sum_{i=1}^{M} Pr(θ=iYt=yt)Vi(yt)\displaystyle\Pr(\theta=i\mid Y^{t}=y^{t})V_{i}(y^{t})
=i=1Mρi(yt)𝖤[Ui(t+1)Ui(t)Yt=yt,θ=i]\displaystyle=\sum_{i=1}^{M}\!\rho_{i}(y^{t})\mathsf{E}[U_{i}(t\!+\!1)\!-\!U_{i}(t)\!\mid\!Y^{t}=y^{t},\theta=i] (72)
=i=1M𝖤[ρi(yt)(Ui(t+1)Ui(t))Yt=yt,θ=i]\displaystyle=\sum_{i=1}^{M}\mathsf{E}[\rho_{i}(y^{t})(U_{i}(t\!+\!1)\!-\!U_{i}(t))\!\mid\!Y^{t}=y^{t},\theta=i] (73)
C\displaystyle\geq C (74)

Constraint (20) dictates that (73) is lower bounded by C, and (75) follows. We then multiply by 1=\sum_{i=1}^{M}\rho_{i}(y^{t}) to produce (76). For (77), note that \rho_{i}(y^{t})=\Pr(\theta=i\mid Y^{t}=y^{t}) and that the product \Pr(\theta=i\mid Y^{t}=y^{t})\Pr(Y^{t}=y^{t}) is given by \Pr(Y^{t}=y^{t},\theta=i)=\Pr(Y^{t}=y^{t}\mid\theta=i)\Pr(\theta=i). This is used to obtain (77) and then (78):

yt𝒜ϵ\displaystyle\underset{y^{t}\in\mathcal{A}_{\epsilon}}{\sum}\! Pr(Yt=yt)i=1MPr(θ=iYt=yt)Vi(yt)\displaystyle\Pr(Y^{t}=y^{t})\sum_{i=1}^{M}\!\Pr(\theta=i\mid Y^{t}=y^{t})V_{i}(y^{t})
yt𝒜ϵPr(Yt=yt)C\displaystyle\geq\sum_{y^{t}\in\mathcal{A}_{\epsilon}}\Pr(Y^{t}=y^{t})C (75)
=Cyt𝒜ϵPr(Yt=yt)i=1Mρi(yt)\displaystyle=C\sum_{y^{t}\in\mathcal{A}_{\epsilon}}\Pr(Y^{t}=y^{t})\sum_{i=1}^{M}{\rho_{i}(y^{t})} (76)
=Ci=1Myt𝒜ϵPr(Yt=yt,θ=i)\displaystyle=C\sum_{i=1}^{M}\sum_{y^{t}\in\mathcal{A}_{\epsilon}}\Pr(Y^{t}=y^{t},\theta=i) (77)
=Ci=1MPr(θ=i)yt𝒜ϵPr(Yt=ytθ=i).\displaystyle=C\sum_{i=1}^{M}\Pr(\theta=i)\sum_{y^{t}\in\mathcal{A}_{\epsilon}}\Pr(Y^{t}=y^{t}\mid\theta=i)\,. (78)

In both (69) and (70), replacing V_{i}(y^{t}) by C provides a lower bound on the original expression. Combining the two lower bounds recovers the Lemma. ∎

Proof:

We start by writing, on the left side of (79), the sum from the left side of equation (39) in Lemma 3, using the equivalent form \mathcal{Y}^{(i)}_{\epsilon}=\cup_{n=0}^{\infty}\mathcal{Y}^{(i,n)}_{\epsilon}, which was also used in the proof of Lemma 1. Then in (79) we break it into two summations, first over n and then over \mathcal{Y}^{(i,n)}_{\epsilon}:

\sum_{i=1}^{M}\Pr(\theta\!=\!i)\sum_{y^{t}\in\cup_{n=0}^{\infty}\mathcal{Y}^{(i,n)}_{\epsilon}}V_{i}(y^{t})\Pr(Y^{t}\!=\!y^{t}\mid\theta=i)
=\sum_{i=1}^{M}\Pr(\theta\!=\!i)\sum_{n=1}^{\infty}\sum_{y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}}V_{i}(y^{t})\Pr(Y^{t}\!=\!y^{t}\mid\theta\!=\!i)\,. (79)

The set \mathcal{Y}^{(i,n)}_{\epsilon} is a subset of \cup_{t=0}^{\infty}\{0,1\}^{t} and therefore can be expressed as a union of intersections over t: \mathcal{Y}^{(i,n)}_{\epsilon}=\cup_{t=0}^{\infty}\{\mathcal{Y}^{(i,n)}_{\epsilon}\cap\{0,1\}^{t}\}. We use this form to rewrite the inner sum in (79) as the left side of (80). Then, we remove the intersections with \mathcal{Y}^{(i,n)}_{\epsilon} by using its indicator on the right side of (80):

t=0\displaystyle\sum_{t=0}^{\infty} yt𝒴ϵ(i,n){0,1}tVi(yt)Pr(Yt=ytθ=i)\displaystyle\underset{y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}\cap\{0,1\}^{t}}{\sum}V_{i}(y^{t})\Pr(Y^{t}=y^{t}\mid\theta=i)
=\displaystyle= t=0yt{0,1}t𝟙yt𝒴ϵ(i,n)Vi(yt)Pr(Yt=ytθ=i).\displaystyle\sum_{t=0}^{\infty}\sum_{y^{t}\in\{0,1\}^{t}}\mathbbm{1}_{y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}}V_{i}(y^{t})\Pr(Y^{t}\!=\!y^{t}\mid\theta\!=\!i)\,. (80)

Recall that Vi(yt)=𝖤[Ui(t+1)Ui(t)Yt=yt,θ=i]V_{i}(y^{t})=\mathsf{E}\left[U_{i}(t+1)\!-\!U_{i}(t)\mid\!Y^{t}=y^{t},\theta\!=\!i\right] from (37). Also recall from (8) that Ui(t)=Ui(Yt)U_{i}(t)=U_{i}(Y^{t}), a random function of YtY^{t}. Let Di(Yt+1)Ui(Yt+1)Ui(Yt)D_{i}(Y^{t+1})\triangleq U_{i}(Y^{t+1})-U_{i}(Y^{t}), then we expand Vi(yt)V_{i}(y^{t}) as:

V_{i}(y^{t})=\mathsf{E}[U_{i}(t+1)-U_{i}(t)\mid Y^{t}=y^{t},\theta=i]
=\sum_{z\in\{0,1\}}D_{i}(y^{t}z)\Pr(Y_{t+1}\!=\!z\mid Y^{t}\!=\!y^{t},\theta\!=\!i)\,. (81)

The product of the probabilities in (80) and (81) is given by \Pr(Y^{t+1}=y^{t}z\mid\theta=i). Replacing V_{i}(y^{t}) in (80) with (81), we obtain the left side of (82). The equality in (82) follows from the definition of expectation:

\sum_{t=0}^{\infty}\sum_{y^{t+1}\in\{0,1\}^{t+1}}D_{i}(y^{t+1})\mathbbm{1}_{y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}}\Pr(Y^{t+1}\!=\!y^{t+1}\mid\theta\!=\!i)
=\sum_{t=0}^{\infty}\mathsf{E}[D_{i}(Y^{t+1})\mathbbm{1}_{Y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}}\mid\theta\!=\!i]\,. (82)

We expand D_{i}(Y^{t+1}) using its definition to write (82) as the left side of (83), and use linearity of expectation to obtain the equality in (83). The indicator \mathbbm{1}_{Y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}} is zero before time t=t^{(n)}_{0} and after time t=t^{(n)}_{0}+T^{(n)}-1, and is one in between. Accordingly, in (84) we adjust the limits of summation and remove the indicator function. Note that the times t_{0}^{(n)} and T^{(n)} are themselves random variables. Lastly, observe that (84) is a telescoping sum, which is replaced by its end points in (85):

t=0\displaystyle\sum_{t=0}^{\infty} 𝖤[(Ui(Yt+1)Ui(Yt))𝟙Yt𝒴ϵ(i,n)|θ=i]\displaystyle\mathsf{E}\left[\left(U_{i}\left(Y^{t+1}\right)\!-\!U_{i}\left(Y^{t}\right)\right)\mathbbm{1}_{Y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}}\big{|}\theta\!=\!i\right]
=𝖤[t=0(Ui(Yt+1)Ui(Yt))𝟙Yt𝒴ϵ(i,n)|θ=i]\displaystyle=\mathsf{E}\left[\sum_{t=0}^{\infty}\left(U_{i}\!\left(Y^{t+1}\right)\!-\!U_{i}\left(Y^{t}\right)\right)\!\mathbbm{1}_{Y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}}\Bigg{|}\theta\!=\!i\right] (83)
=𝖤[t=t0(n)T(n)+t0(n)1(Ui(Yt+1)Ui(Yt))|θ=i]\displaystyle=\mathsf{E}\left[\sum_{t=t_{0}^{(n)}}^{T^{(n)}+t_{0}^{(n)}-1}(U_{i}(Y^{t+1})\!-\!U_{i}(Y^{t}))\Bigg{|}\theta=i\right] (84)
=𝖤[Ui(t0(n)+T(n))Ui(t0(n))|θ=i].\displaystyle=\mathsf{E}\left[U_{i}\left(t_{0}^{(n)}+T^{(n)}\right)-U_{i}\left(t_{0}^{(n)}\right)\big{|}\theta=i\right]\,. (85)

To conclude the proof, we replace the innermost summation in (79) with (85):

\sum_{i=1}^{M}\Pr(\theta\!=\!i)\sum_{n=1}^{\infty}\sum_{y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}}V_{i}(y^{t})\Pr(Y^{t}\!=\!y^{t}\mid\theta\!=\!i) (86)
=\sum_{i=1}^{M}\Pr(\theta\!=\!i)\sum_{n=1}^{\infty}\mathsf{E}\left[U_{i}\left(t_{0}^{(n)}+T^{(n)}\right)-U_{i}\left(t_{0}^{(n)}\right)\mid\theta\!=\!i\right]\,. ∎

Proof:

The confirmation phase starts at a time t of the form T_{n} defined in (23), at which the transmitted message i attains U_{i}(T_{n})\geq 0 and U_{i}(T_{n}-1)<0. Then, like the product martingale in [23], the process \zeta_{i}(t), t\geq T_{n}, is a martingale with respect to \mathcal{F}_{t}=\sigma(Y^{t}), where:

ζi(t)=(pq)Ui(t)C2.\displaystyle\zeta_{i}(t)=\left(\frac{p}{q}\right)^{\frac{U_{i}(t)}{C_{2}}}\,. (87)

Note that U_{i}(t) is a biased random walk (see the Markov chain in [2]) with U_{i}(t)=U_{i}(T_{n})+\sum_{m=T_{n}+1}^{t}\xi_{m}, where each \xi_{m} is an R.V. distributed according to:

ξm={+C2w.p. qC2w.p. p,\displaystyle\xi_{m}=\begin{cases}+C_{2}\quad\text{w.p. }q\\ -C_{2}\quad\text{w.p. }p\end{cases}\,, (88)

We verify that 𝖤[ζi(t+1)t]=ζi(t)\mathsf{E}[\zeta_{i}(t+1)\mid\mathcal{F}_{t}]=\zeta_{i}(t) as follows:

𝖤[ζi(t+1)t]\displaystyle\mathsf{E}[\zeta_{i}(t+1)\mid\mathcal{F}_{t}] =ζi(t)(p(pq)1+q(pq)1)\displaystyle=\zeta_{i}(t)\left(p\left(\frac{p}{q}\right)^{-1}+q\left(\frac{p}{q}\right)^{1}\right)
=ζi(t)(p+q)=ζi(t).\displaystyle=\zeta_{i}(t)\left(p+q\right)=\zeta_{i}(t)\,. (89)

Let S_{n} be the time at which decoding either terminates, at U_{i}(t)=U_{i}(T_{n})+NC_{2}, or a fall back occurs, at U_{i}(t)=U_{i}(T_{n})-C_{2}<0; that is, S_{n}\triangleq\min\{t\geq T_{n}:U_{i}(t)\in\{U_{i}(T_{n})-C_{2},U_{i}(T_{n})+NC_{2}\}\}. Then the stopped process \zeta_{i}(t\wedge S_{n}) is a martingale bounded on both sides and:

𝖤[ζi(Sn)]\displaystyle\mathsf{E}[\zeta_{i}(S_{n})] =pf(pq)Ui(Tn)C21+(1pf)(pq)Ui(Tn)C2+N\displaystyle=p_{f}\left(\frac{p}{q}\right)^{\frac{U_{i}(T_{n})}{C_{2}}-1}\!\!+(1\!-\!p_{f})\left(\frac{p}{q}\right)^{\frac{U_{i}(T_{n})}{C_{2}}+N} (90)
𝖤[ζi(Tn)]\displaystyle\mathsf{E}[\zeta_{i}(T_{n})] =(pq)Ui(Tn)C2\displaystyle=\left(\frac{p}{q}\right)^{\frac{U_{i}(T_{n})}{C_{2}}} (91)

Let the fall back probability be p_{f}\triangleq\Pr(U_{i}(S_{n})=U_{i}(T_{n})-C_{2}\mid t=T_{n}). By Doob's optional stopping theorem [22], \mathsf{E}[\zeta_{i}(S_{n})] is equal to \mathsf{E}[\zeta_{i}(T_{n})], so we can solve for p_{f} by setting the right sides of (90) and (91) equal. In (92) we factor out and cancel (p/q)^{U_{i}(T_{n})/C_{2}}; in (93) we collect the terms with factor p_{f}; and in (94) we solve for p_{f}.

1\displaystyle 1 =pfqp+(1pf)(pq)N\displaystyle=p_{f}\frac{q}{p}+(1-p_{f})\left(\frac{p}{q}\right)^{N} (92)
0\displaystyle 0 =pfqp(1(pq)N+1)(1(pq)N)\displaystyle=p_{f}\frac{q}{p}\left(1-\left(\frac{p}{q}\right)^{N+1}\right)-\left(1-\left(\frac{p}{q}\right)^{N}\right) (93)
pf\displaystyle p_{f} =pq1(pq)N1(pq)N+1.\displaystyle=\frac{p}{q}\frac{1-\left(\frac{p}{q}\right)^{N}}{1-\left(\frac{p}{q}\right)^{N+1}}\,. (94)

Since p_{f} is just a function of N and p, it is the same constant across all messages i=1,\dots,M and indices n=1,2,\dots. To complete the proof we use the definition of C_{2} from equation (5), C_{2}=\log_{2}\left(\frac{q}{p}\right), to express p_{f} in terms of C_{2}:

pf\displaystyle p_{f} =2log2(qp)12Nlog2(qp)12(N+1)log2(qp)\displaystyle=2^{-\log_{2}(\frac{q}{p})}\frac{1-2^{-N\log_{2}(\frac{q}{p})}}{1-2^{-(N+1)\log_{2}(\frac{q}{p})}}
=2^{-C_{2}}\frac{1-2^{-NC_{2}}}{1-2^{-(N+1)C_{2}}}\,. (95) ∎
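The closed form (94)-(95) is easy to check numerically. The Python sketch below is our own illustration (names such as fallback_prob_mc are hypothetical, not from the paper): it simulates the confirmation-phase random walk implied by (88) and compares the empirical fall back frequency with (94).

```python
import random

def fallback_prob_mc(p, N, trials=200_000, seed=1):
    """Monte Carlo estimate of the fall back probability p_f.

    In units of C_2 relative to U_i(T_n), the walk of (88) starts at 0,
    steps +1 w.p. q = 1 - p and -1 w.p. p, and stops on hitting -1
    (fall back) or +N (decoding terminates)."""
    rng = random.Random(seed)
    q = 1 - p
    fallbacks = 0
    for _ in range(trials):
        z = 0
        while -1 < z < N:
            z += 1 if rng.random() < q else -1
        fallbacks += (z == -1)
    return fallbacks / trials

def fallback_prob_closed_form(p, N):
    """Equation (94): p_f = (p/q)(1 - (p/q)^N) / (1 - (p/q)^(N+1))."""
    r = p / (1 - p)
    return r * (1 - r ** N) / (1 - r ** (N + 1))

# The two values agree up to Monte Carlo error; for large N, p_f -> p/q.
# print(fallback_prob_mc(0.05, 8), fallback_prob_closed_form(0.05, 8))
```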

Proof:

We start by conditioning the expectation on the left side of equation (41) in Lemma 5 on the events \{T^{(n)}>0,\theta=i\}, \{T^{(n)}=0,\theta=i\}, and \{T^{(n)}<0,\theta=i\}, whose union is the original conditioning event \{\theta=i\}, and expand the expectation by the law of total expectation:

𝖤[Ui(Tn)Ui(t0(n))θ=i]\displaystyle\mathsf{E}[U_{i}(T_{n})-U_{i}(t_{0}^{(n)})\mid\theta=i] (96)
=𝖤[Ui(Tn)Ui(t0(n))T(n)>0,θ=i]Pr(T(n)>0θ=i)\displaystyle=\mathsf{E}[U_{i}(T_{n})\!-\!U_{i}(t_{0}^{(n)})\!\mid\!T^{(n)}\!>\!0,\theta\!=\!i]\Pr(T^{(n)}\!>\!0\mid\!\theta\!=\!i)
+𝖤[Ui(Tn)Ui(t0(n))T(n)=0,θ=i]Pr(T(n)=0θ=i)\displaystyle+\mathsf{E}[U_{i}(T_{n})\!-\!U_{i}(t_{0}^{(n)})\!\mid\!T^{(n)}\!=\!0,\theta\!=\!i]\Pr(T^{(n)}\!=\!0\mid\!\theta\!=\!i)
+𝖤[Ui(Tn)Ui(t0(n))T(n)<0,θ=i]Pr(T(n)<0θ=i).\displaystyle+\mathsf{E}[U_{i}(T_{n})\!-\!U_{i}(t_{0}^{(n)})\!\mid\!T^{(n)}\!<\!0,\theta\!=\!i]\Pr(T^{(n)}\!<\!0\mid\!\theta\!=\!i)\,.

Note that T^{(n)} is non-negative, thus the last term on the right side of (96) vanishes because \Pr(T^{(n)}<0)=0. When T^{(n)}=0, then T_{n}=t_{0}^{(n)}+T^{(n)}=t_{0}^{(n)}, so that U_{i}(T_{n})=U_{i}(t_{0}^{(n)}). Therefore, the second term on the right side of (96) also vanishes, leaving only the first term, conditioned on \{T^{(n)}>0,\theta=i\}. Let \mathcal{C}(t^{(n)}_{0}) be the event that message i enters confirmation after time t^{(n)}_{0}, rather than another message j\neq i ending the process by attaining U_{j}(t)\geq\log_{2}(1-\epsilon)-\log_{2}(\epsilon); that is, \mathcal{C}(t^{(n)}_{0})\triangleq\{\exists t>t^{(n)}_{0}:U_{i}(t)\geq 0\}. Then the probability \Pr(T^{(n+1)}\geq 0\mid\theta=i) can be expressed as:

Pr(\displaystyle\Pr( T(n+1)0θ=i)\displaystyle T^{(n+1)}\geq 0\mid\theta=i)
=Pr(T(n+1)0T(n)>0,𝒞(t0(n)),θ=i)\displaystyle=\Pr(T^{(n+1)}\geq 0\mid T^{(n)}>0,\mathcal{C}(t^{(n)}_{0}),\theta=i) (97)
Pr(T(n)>0,𝒞(t0(n))θ=i).\displaystyle\quad\quad\quad\quad\quad\quad\,\cdot\Pr(T^{(n)}>0,\mathcal{C}(t^{(n)}_{0})\mid\theta=i)\,.

Note that the first probability on the right side of (97) is just the fall back probability p_{f} computed in Lemma 4. The last probability in (97) can also be expressed as a product of conditional probabilities, see (98). In (98), note that \mathcal{C}(t^{(n)}_{0}) is the event that an n-th confirmation phase occurs, which implies that a preceding n-th communication phase round occurs. Then \mathcal{C}(t^{(n)}_{0})\implies T^{(n)}>0, so the first factor in the product of probabilities in (98) equals one and drops out:

Pr\displaystyle\Pr (T(n)>0,𝒞(t0(n))θ=i)\displaystyle(T^{(n)}>0,\mathcal{C}(t^{(n)}_{0})\mid\theta=i)
=Pr(T(n)>0𝒞(t0(n)),θ=i)Pr(𝒞(t0(n))θ=i)\displaystyle=\Pr(T^{(n)}\!>\!0\mid\mathcal{C}(t^{(n)}_{0}),\theta\!=\!i)\Pr(\mathcal{C}(t^{(n)}_{0})\mid\theta\!=\!i)
=Pr(𝒞(t0(n))θ=i).\displaystyle=\Pr(\mathcal{C}(t^{(n)}_{0})\mid\theta=i)\,. (98)

Combining (97) and (98) we obtain:

Pr(T(n+1)>0θ=i)=Pr(𝒞(t0(n))θ=i)pf.\displaystyle\Pr(T^{(n+1)}\!>\!0\mid\theta\!=\!i)=\Pr(\mathcal{C}(t^{(n)}_{0})\mid\theta=i)p_{f}\,. (99)

We can also bound Pr(𝒞(t0(n))θ=i)\Pr(\mathcal{C}(t^{(n)}_{0})\mid\theta=i) as follows:

Pr\displaystyle\Pr (𝒞(t0(n))θ=i)\displaystyle(\mathcal{C}(t^{(n)}_{0})\mid\theta=i)
=Pr(𝒞(t0(n))T(n)>0,θ=i)Pr(T(n)>0θ=i)\displaystyle=\Pr(\mathcal{C}(t^{(n)}_{0})\mid T^{(n)}\!>\!0,\theta\!=\!i)\Pr(T^{(n)}\!>\!0\mid\theta\!=\!i)
Pr(T(n)>0θ=i).\displaystyle\leq\Pr(T^{(n)}\!>\!0\!\mid\theta\!=\!i)\,. (100)

Then, we can recursively bound Pr(T(n+1)>0θ=i)\Pr(T^{(n+1)}\!>\!0\mid\theta\!=\!i) by Pr(T(n)>0θ=i)pf\Pr(T^{(n)}\!>\!0\mid\theta\!=\!i)p_{f} using (99) and (100). For n1n\geq 1, this results in the general bound:

Pr(T(n)0θ=i)pfn1.\Pr(T^{(n)}\!\geq\!0\mid\theta\!=\!i)\leq p_{f}^{n\!-\!1}\,. (101)

Using (101) we can bound the expectation 𝖤[Ui(Tn)Ui(t0(n))θ=i]\mathsf{E}[U_{i}(T_{n})-U_{i}(t_{0}^{(n)})\mid\theta=i] in the left side of (96) by:

𝖤[Ui(\displaystyle\mathsf{E}[U_{i}( Tn)Ui(t0(n))θ=i]\displaystyle T_{n})-U_{i}(t_{0}^{(n)})\mid\theta=i]
𝖤[Ui(Tn)Ui(t0(n))T(n)>0,θ=i]pfn1.\displaystyle\leq\mathsf{E}[U_{i}(T_{n})\!-\!U_{i}(t_{0}^{(n)})\!\mid\!T^{(n)}\!>\!0,\theta\!=\!i]p^{n-1}_{f}\,. (102)

The value of U_{i}(t^{(n)}_{0}) at n=1, when t^{(1)}_{0}=0, is a constant that can be computed directly for every i from the initial distribution. Using (102), we can then bound the second sum on the left side of equation (41) by:

n=1\displaystyle\sum_{n=1}^{\infty} 𝖤[Ui(Tn)Ui(t0(n))θ=i]\displaystyle\mathsf{E}[U_{i}(T_{n})-U_{i}(t_{0}^{(n)})\mid\theta=i] (103)
𝖤[Ui(T(1))Ui(0)T(1)>0,θ=i]pf0\displaystyle\leq\mathsf{E}[U_{i}(T^{(1)})-U_{i}(0)\mid\!T^{(1)}\!>\!0,\theta=i]p_{f}^{0}
+n=2𝖤[Ui(Tn)Ui(t0(n))T(n)>0,θ=i]pfn1\displaystyle+\sum_{n=2}^{\infty}\mathsf{E}[U_{i}(T_{n})\!-\!U_{i}(t_{0}^{(n)})\mid\!T^{(n)}\!>\!0,\theta\!=\!i]p_{f}^{n\!-\!1} (104)
=\displaystyle= n=1𝖤[Ui(Tn)T(n)>0,θ=i]pfn1Ui(0)\displaystyle\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})\mid T^{(n)}>0,\theta=i]p_{f}^{n-1}-U_{i}(0)
n=2𝖤[Ui(t0(n))T(n)>0,θ=i]pfn1.\displaystyle-\sum_{n=2}^{\infty}\mathsf{E}[U_{i}(t_{0}^{(n)})\mid T^{(n)}>0,\theta=i]p_{f}^{n-1}\,. (105)

The conditioning event \{T^{(n)}>0\} implies the events \{T^{(m)}>0\} for m=1,\dots,n, because T^{(m)}=0 means the process has stopped and no further communication rounds occur. The event \{T^{(n)}>0\} also implies that the n-th round of communication occurs, and therefore U_{i}(t_{0}^{(n)}) is given by the previous crossing value U_{i}(T_{n-1}) minus C_{2}, by constraint (19). Then:

n=2\displaystyle\sum_{n=2}^{\infty} 𝖤[Ui(t0(n))T(n)>0,θ=i]pfn1\displaystyle\mathsf{E}[U_{i}(t_{0}^{(n)})\mid T^{(n)}>0,\theta=i]p_{f}^{n-1}
=n=2𝖤[Ui(Tn1)C2T(n)>0,θ=i]pfn1\displaystyle=\sum_{n=2}^{\infty}\mathsf{E}[U_{i}(T_{n-1})\!-\!C_{2}\mid T^{(n)}>0,\theta\!=\!i]p_{f}^{n-1} (106)
=n=1𝖤[Ui(Tn)C2T(n+1)>0,θ=i]pfn\displaystyle=\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})\!-\!C_{2}\mid T^{(n+1)}>0,\theta\!=\!i]p_{f}^{n}
n=1𝖤[Ui(Tn)C2T(n)>0,θ=i]pfn.\displaystyle\geq\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})\!-\!C_{2}\mid T^{(n)}\!>\!0,\theta\!=\!i]p_{f}^{n}\,. (107)

The inequality in (107) follows from:

𝖤[Ui(Tn)T(n+1)>0,θ=i]𝖤[Ui(Tn)T(n)>0,θ=i].\displaystyle\mathsf{E}[U_{i}(T_{n})\mid T^{(n+1)}\!>\!0,\theta\!=\!i]\geq\mathsf{E}[U_{i}(T_{n})\mid T^{(n)}\!>\!0,\theta\!=\!i]\,. (108)

For the proof of inequality (108) see Appendix D. From (107) it follows that:

-\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})\!-\!C_{2}\mid T^{(n+1)}\!>\!0,\theta\!=\!i]p_{f}^{n}
\displaystyle\leq n=1𝖤[Ui(Tn)C2T(n)>0,θ=i]pfn\displaystyle-\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})\!-\!C_{2}\mid T^{(n)}\!>\!0,\theta\!=\!i]p_{f}^{n} (109)
=\displaystyle= n=1𝖤[pf(Ui(Tn)C2)T(n)>0,θ=i]pfn1.\displaystyle\!-\!\sum_{n=1}^{\infty}\mathsf{E}[p_{f}(U_{i}(T_{n})\!-\!C_{2})\!\mid\!T^{(n)}\!>\!0,\theta\!=\!i]p_{f}^{n\!-\!1}\,. (110)

In (110) we have factored one pfp_{f} inside the expectation. We can now replace (105) by (110) to upper bound (103):

n=1𝖤[Ui(t0(n)+T(n))Ui(t0(n))θ=i]\displaystyle\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(t_{0}^{(n)}+T^{(n)})-U_{i}(t_{0}^{(n)})\mid\theta=i]
Ui(0)+n=1𝖤[Ui(Tn)T(n)>0,θ=i]pfn1\displaystyle\leq-U_{i}(0)+\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})\mid T^{(n)}>0,\theta=i]p_{f}^{n-1} (111)
n=1𝖤[pf(Ui(Tn)C2)T(n)>0,θ=i]pfn1\displaystyle\quad\quad-\sum_{n=1}^{\infty}\mathsf{E}[p_{f}(U_{i}(T_{n})\!-\!C_{2})\!\mid T^{(n)}\!>\!0,\theta\!=\!i]p_{f}^{n\!-\!1}
=n=1𝖤[Ui(Tn)pf(Ui(Tn)C2)T(n)>0,θ=i]pfn1\displaystyle=\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})\!-\!p_{f}(U_{i}(T_{n})\!-\!C_{2})\!\mid\!T^{(n)}\!>\!0,\theta\!=\!i]p_{f}^{n\!-\!1}
Ui(0).\displaystyle\quad\quad-U_{i}(0)\,. (112)

The expectation in (112) combines the two sums from (111) by subtracting p_{f}(U_{i}(T_{n})-C_{2}) from U_{i}(T_{n}). The first term, U_{i}(T_{n}), is the value of U_{i}(t) at the communication-phase stopping time t=T_{n}. In the second term, p_{f}(U_{i}(T_{n})-C_{2}), the difference U_{i}(T_{n})-C_{2} is the unique value that U_{i}(t_{0}^{(n+1)}) can take once the n-th confirmation phase round starts at the point U_{i}(T_{n}). Equation (112) is an important intermediate result in the proof of Thm. 2: when considering the process U^{\prime}_{i}(t), the starting value of each communication-phase round U^{\prime}_{i}(t_{0}^{(n+1)}) is still that of the original process, U_{i}(T_{n})-C_{2}, and therefore the argument of the expectation changes to U^{\prime}_{i}(T^{\prime}_{n})-p_{f}(U_{i}(T_{n})-C_{2}). For the proof of Lemma 5, we just need to bound (112), so we write the sum in (112) as:

\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})(1-p_{f})\!+\!p_{f}C_{2}\mid T^{(n)}\!>\!0,\theta\!=\!i]p_{f}^{n\!-\!1} (113)
=\displaystyle= n=1𝖤[Ui(Tn)T(n)>0,θ=i](1pf)pfn1+n=1C2pfn.\displaystyle\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})\mid T^{(n)}\!>\!0,\theta\!=\!i](1\!-\!p_{f})p_{f}^{n\!-\!1}\!+\!\sum_{n=1}^{\infty}C_{2}p_{f}^{n}.

By constraint (17), U_{i}(T_{n})\leq U_{i}(T_{n}-1)+C_{2}, and since U_{i}(T_{n}-1)<0, U_{i}(T_{n}) is bounded by C_{2}. Thus the expectation \mathsf{E}[U_{i}(T_{n})\mid T^{(n)}\!>\!0,\theta\!=\!i] is also bounded by C_{2}, and (113) is bounded by:

n=1C2(1pf)pfn1+n=1C2pfn=C2+pf1pfC2\displaystyle\sum_{n=1}^{\infty}C_{2}(1\!-\!p_{f})p_{f}^{n\!-\!1}+\sum_{n=1}^{\infty}C_{2}p_{f}^{n}=C_{2}+\frac{p_{f}}{1-p_{f}}C_{2} (114)

Finally, the left side of equation (41) in Lemma 5, which is the left side of (115), is upper bounded using the bounds (112) and (114) on the inner sum as follows:

i=1M\displaystyle\sum_{i=1}^{M} Pr(θ=i)n=1𝖤[Ui(Tn)Ui(t0(n))θ=i]\displaystyle\Pr(\theta=i)\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(T_{n})\!-\!U_{i}(t_{0}^{(n)})\mid\theta=i]
i=1MPr(θ=i)(pf1pfC2+C2Ui(0))\displaystyle\leq\sum_{i=1}^{M}\Pr(\theta=i)\left(\frac{p_{f}}{1-p_{f}}C_{2}+C_{2}-U_{i}(0)\right) (115)
=2C212NC212C2C2+C2𝖤[Ui(0)].\displaystyle=2^{-C_{2}}\frac{1-2^{-NC_{2}}}{1-2^{-C_{2}}}C_{2}+C_{2}-\mathsf{E}[U_{i}(0)]\,. (116)

To transition from (115) to (116) we have used the definition of pfp_{f} from Lemma 4. The proof of Lemma 5 is complete. ∎

V-B Proof of Theorems 2 and 3

Proof:

Suppose U^{\prime}_{i}(t) is a process that satisfies constraints (16)-(20) of Thm. 1 and constraints (26)-(28) of Thm. 2 for some B<C_{2}. Because the constraints of Thm. 1 are satisfied, Lemmas 1-5 all hold for the process U^{\prime}_{i}(t). To bound \mathsf{E}[T^{\prime}], we begin by bounding the sum on the right side of Lemma 3, which is (39), but for the new process U^{\prime}_{i}(t); dividing the new bound by C produces the desired result. We follow the procedure in the proof of Lemma 5, with U_{i}(t) replaced by U^{\prime}_{i}(t), up to equation (105). By the definition of U^{\prime}_{i}(t) we have U^{\prime}_{i}(t_{0}^{(n)})=U_{i}(t_{0}^{(n)}), and from equation (19) it follows that, for n>1, T^{(n)}>0 implies U_{i}(t_{0}^{(n)})=U_{i}(T_{n-1})-C_{2}. Then, from equation (105) to (112), we replace U^{\prime}_{i}(t_{0}^{(n)}) by U_{i}(T_{n-1})-C_{2} instead. Using equation (112) we have that:

n=1\displaystyle\sum_{n=1}^{\infty} 𝖤[Ui(Tn)Ui(t0(n))θ=i]Ui(0)\displaystyle\mathsf{E}[U^{\prime}_{i}(T^{\prime}_{n})-U^{\prime}_{i}(t_{0}^{(n)})\mid\theta=i]\leq-U^{\prime}_{i}(0) (117)
+\displaystyle+ n=1𝖤[Ui(Tn)pf(Ui(Tn)C2)T(n)>0,θ=i]pfn1\displaystyle\sum_{n=1}^{\infty}\mathsf{E}[U^{\prime}_{i}(T^{\prime}_{n})\!-\!p_{f}(U_{i}(T_{n})-C_{2})\!\mid T^{(n)}\!>\!0,\theta\!=\!i]p_{f}^{n\!-\!1}

By the definition of U^{\prime}_{i}(t), we can replace U^{\prime}_{i}(0) by U_{i}(0). We further claim that the constraints of Thm. 2 guarantee that U^{\prime}_{i}(T^{\prime}_{n})-p_{f}(U_{i}(T_{n})-C_{2})\leq B. This is derived from constraint (28), U^{\prime}_{i}(T^{\prime}_{n})-\frac{p}{q}(U_{i}(T_{n})-C_{2})\leq B, by replacing \frac{p}{q} with p_{f}. The replacement is possible because p_{f}<\frac{p}{q}, see (94), and U_{i}(T_{n})-C_{2}<0 by constraint (17). Therefore, the expectation in the last sum can be replaced with B for an upper bound to obtain:

n=1\displaystyle\sum_{n=1}^{\infty} 𝖤[Ui(Tn)Ui(t0(n))θ=i]Ui(0)+n=1Bpfn1\displaystyle\mathsf{E}[U^{\prime}_{i}(T^{\prime}_{n})\!-\!U_{i}(t_{0}^{(n)})\!\mid\!\theta=i]\leq-U_{i}(0)\!+\!\sum_{n=1}^{\infty}Bp_{f}^{n\!-\!1}
=\displaystyle= B+Bpf1pfUi(0)=B+2C212NC212C2BUi(0).\displaystyle B\!+\!\frac{Bp_{f}}{1\!-\!p_{f}}\!-\!U_{i}(0)=B\!+\!2^{-C_{2}}\frac{1\!-\!2^{-NC_{2}}}{1\!-\!2^{-C_{2}}}B-U_{i}(0)\,. (118)

Then, the value in equation (118) replaces the inner sum in the left side of (45) to obtain:

\frac{1}{C}\sum_{i=1}^{M}\Pr(\theta\!=\!i)\sum_{n=1}^{\infty}\mathsf{E}[U^{\prime}_{i}(T^{\prime}_{n})\!-\!U^{\prime}_{i}(t_{0}^{(n)})\!\mid\!\theta\!=\!i]
\leq\frac{1}{C}\sum_{i=1}^{M}\Pr(\theta\!=\!i)\left(B\!+\!2^{-C_{2}}\frac{1\!-\!2^{-NC_{2}}}{1-2^{-C_{2}}}B\!-\!U_{i}(0)\right)
=\frac{B}{C}\left(1\!+\!2^{-C_{2}}\frac{1\!-\!2^{-NC_{2}}}{1\!-\!2^{-C_{2}}}\right)-\frac{\mathsf{E}[U_{i}(0)]}{C}\,. (119)

The proof is complete. ∎

Proof:

When U_{i}(t)\geq 0 for some i, constraint (31) is the same as the SED constraint (3), and therefore constraints (19) and (18) are satisfied, as shown in [2]. It remains to show that constraints (16), (17) and (20) are also satisfied. We start the proof by deriving expressions for \mathsf{E}[U_{i}(t\!+\!1)\!-\!U_{i}(t)\mid Y^{t},\theta\!=\!i] to find bounds in terms of the constraints of the theorem. The posterior probabilities \rho_{i}(y^{t+1}) are computed according to Bayes' rule:

ρi(yt+1)=Pr(θ=i,Yt+1=yt+1Yt)Pr(Yt+1=yt+1Yt).\displaystyle\rho_{i}(y^{t+1})=\frac{\Pr(\theta=i,Y_{t+1}=y_{t+1}\mid Y^{t})}{\Pr(Y_{t+1}=y_{t+1}\mid Y^{t})}\,. (120)

The numerator in equation (120) can be split into \Pr(Y_{t+1}=y_{t+1}\mid\theta=i,Y^{t}=y^{t})\Pr(\theta=i\mid y^{t}). The received history y^{t} fully characterizes the vector of posterior probabilities \bm{\rho}_{t}\triangleq[\rho_{1}(y^{t}),\rho_{2}(y^{t}),\dots,\rho_{M}(y^{t})], and hence the construction of S_{0} and S_{1}; the conditioning event \{\theta=i\} then sets the value of the encoding function X_{t+1}=\text{enc}(i,Y^{t}) via its definition \text{enc}(i,Y^{t})=\mathbbm{1}_{i\in S_{1}}. We can therefore write the first probability as \Pr(Y_{t+1}\mid\text{enc}(i,Y^{t})), which reduces to q if Y_{t+1}=\text{enc}(i,Y^{t}) and to p if Y_{t+1}\neq\text{enc}(i,Y^{t}). The second probability, \Pr(\theta=i\mid Y^{t}=y^{t}), is just \rho_{i}(y^{t}).

The denominator can be written as \sum_{x_{t+1}\in\{0,1\}}\Pr(Y_{t+1}=y_{t+1}\mid X_{t+1}=x_{t+1},Y^{t})\Pr(X_{t+1}=x_{t+1}\mid Y^{t}). By the channel's memoryless property, the next output Y_{t+1} given the input X_{t+1} is independent of the past Y^{t}, that is, \Pr(Y_{t+1}=y_{t+1}\mid X_{t+1},Y^{t})=\Pr(Y_{t+1}=y_{t+1}\mid X_{t+1}). Since \Pr(X_{t+1}=x_{t+1}\mid Y^{t})=\Pr(\theta\in S_{x_{t+1}})=\sum_{i\in S_{x_{t+1}}}\rho_{i}(y^{t}), we write:

ρi\displaystyle\rho_{i} (t+1)=Pr(Yt+1i)ρi(yt)jΩPr(Yt+1j)ρj(yt)\displaystyle(t+1)=\frac{\Pr(Y_{t+1}\mid i)\rho_{i}(y^{t})}{\sum_{j\in\Omega}\Pr(Y_{t+1}\mid j)\rho_{j}(y^{t})} (121)
=Pr(Yt+1i)ρi(yt)qjSyt+1ρj(yt)+pjΩSyt+1ρj(yt).\displaystyle=\frac{\Pr(Y_{t+1}\mid i)\rho_{i}(y^{t})}{q\sum_{j\in S_{y_{t+1}}}\rho_{j}(y^{t})+p\sum_{j\in\Omega\setminus S_{y_{t+1}}}\rho_{j}(y^{t})}\,.

For \{i=\theta\}, the encoding function X_{t+1}=\mathbbm{1}_{\theta\in S_{1}} dictates that X_{t+1}=\mathbbm{1}_{i\in S_{1}}. Thus \Pr(Y_{t+1}=\mathbbm{1}_{i\in S_{1}}\mid i=\theta)=\Pr(Y_{t+1}=X_{t+1})=q, and \Pr(Y_{t+1}=\mathbbm{1}_{i\notin S_{1}}\mid i=\theta)=\Pr(Y_{t+1}=X_{t+1}\oplus 1)=p. Let P_{0}=\sum_{j\in S_{0}}\rho_{j}(y^{t}), P_{1}=\sum_{j\in S_{1}}\rho_{j}(y^{t}), and \Delta\triangleq P_{0}-P_{1}, so that P_{0}=\frac{1}{2}+\frac{\Delta}{2} and P_{1}=\frac{1}{2}-\frac{\Delta}{2}. The value of U_{i}(t+1) for each Y_{t+1}\in\{0,1\} can be obtained from equation (121).
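To make the update (121) concrete, the following minimal Python sketch (our own illustration; the flat list representation is an assumption, whereas the implementation of Section VII operates on groups of equal posteriors) performs one Bayes update of the posterior vector over a BSC(p):

```python
def bayes_update(rho, in_S1, y, p):
    """One posterior update per (121) over a BSC with crossover p.

    rho   : posteriors rho_i(y^t), summing to 1
    in_S1 : in_S1[i] == 1 if message i is in S_1 (it would send bit 1)
    y     : received channel output, 0 or 1
    """
    q = 1 - p
    # Pr(Y_{t+1} = y | theta = i) is q when y matches message i's input bit.
    lik = [q if in_S1[i] == y else p for i in range(len(rho))]
    # Denominator of (121): q * P_y + p * (1 - P_y).
    Z = sum(l * r for l, r in zip(lik, rho))
    return [l * r / Z for l, r in zip(lik, rho)]

# Example: three messages with the last two in S_1, received y = 0:
# bayes_update([0.4, 0.35, 0.25], [0, 1, 1], 0, 0.1)
# boosts the mass of S_0 by the factor q / (P_0 q + P_1 p).
```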

Assume first that i\in S_{0} to obtain the value of \mathsf{E}[U_{i}(t+1)\mid Y^{t}=y^{t},\theta=i]:

𝖤[\displaystyle\mathsf{E}[ Ui(t+1)Yt=yt,θ=i]\displaystyle U_{i}(t+1)\mid Y^{t}=y^{t},\theta=i]
=qlog2ρi(yt)qP0q+P1p1ρi(yt)qP0q+P1p+plog2ρi(yt)pP0p+P1q1ρi(yt)pP0p+P1q\displaystyle=q\log_{2}\frac{\frac{\rho_{i}(y^{t})q}{P_{0}q+P_{1}p}}{1-\frac{\rho_{i}(y^{t})q}{P_{0}q+P_{1}p}}+p\log_{2}\frac{\frac{\rho_{i}(y^{t})p}{P_{0}p+P_{1}q}}{1-\frac{\rho_{i}(y^{t})p}{P_{0}p+P_{1}q}}
=qlog2ρi(yt)q12+Δ(qp)2ρi(yt)q+plog2ρi(yt)p12Δ(qp)2ρi(yt)p.\displaystyle=q\log_{2}\dfrac{\rho_{i}(y^{t})q}{\frac{1}{2}\!+\!\frac{\Delta(q-p)}{2}\!-\!\rho_{i}(y^{t})q}+p\log_{2}\cfrac{\rho_{i}(y^{t})p}{\frac{1}{2}\!-\!\frac{\Delta(q-p)}{2}\!-\!\rho_{i}(y^{t})p}\,.

For i\in S_{1} the only difference is the sign of the term with \Delta. Let \iota_{i}=\mathbbm{1}_{i\in S_{0}}-\mathbbm{1}_{i\in S_{1}}, that is, 1 if i\in S_{0} and -1 if i\in S_{1}, and attach the coefficient \iota_{i} to each \Delta for a general expression. Multiply the numerator and denominator of each fraction inside the logarithms by 2 and expand to obtain:

\mathsf{E}[U_{i}(t\!+\!1)\!-\!U_{i}(t)\mid Y^{t},\theta\!=\!i]=\log_{2}(\rho_{i}(y^{t})) (122)
+q\left(\log_{2}(2q)\!-\!\log_{2}\left(1\!-\!\rho_{i}(y^{t})\!+\!(q\!-\!p)(\iota_{i}\Delta\!-\!\rho_{i}(y^{t}))\right)\right)
+p\left(\log_{2}(2p)\!-\!\log_{2}\left(1\!-\!\rho_{i}(y^{t})\!-\!(q\!-\!p)(\iota_{i}\Delta\!-\!\rho_{i}(y^{t}))\right)\right)
=q\left(\log_{2}(2q)\!-\!\log_{2}\left(1\!+\!(q\!-\!p)\frac{\iota_{i}\Delta\!-\!\rho_{i}(y^{t})}{1-\rho_{i}(y^{t})}\right)\right) (123)
+p\left(\log_{2}(2p)\!-\!\log_{2}\left(1\!-\!(q\!-\!p)\frac{\iota_{i}\Delta\!-\!\rho_{i}(y^{t})}{1\!-\!\rho_{i}(y^{t})}\right)\right)
\geq C-\log_{2}\left(1+(q-p)^{2}\frac{\iota_{i}\Delta-\rho_{i}(y^{t})}{1-\rho_{i}(y^{t})}\right)\,. (124)

Now subtract the term \log_{2}(1-\rho_{i}(y^{t})) and add it back as a factor inside the logarithms, to recover U_{i}(t) from \log_{2}(\rho_{i}(y^{t})). Note that 2\rho_{i}(y^{t})q=\rho_{i}(y^{t})+(q-p)\rho_{i}(y^{t}) and 2\rho_{i}(y^{t})p=\rho_{i}(y^{t})-(q-p)\rho_{i}(y^{t}), and also that q\log_{2}(2q)+p\log_{2}(2p)=C.

The logarithm log2(1ρi(yt))\log_{2}(1-\rho_{i}(y^{t})) from (122) is split into plog2(1ρi(yt))+qlog2(1ρi(yt))p\log_{2}(1-\rho_{i}(y^{t}))+q\log_{2}(1-\rho_{i}(y^{t})), and 1ρi(yt)1-\rho_{i}(y^{t}) divides the arguments of the logarithms in (123). Equation (124) follows from applying Jensen’s inequality to the convex function log2()-\log_{2}(\cdot). Then:

i=1M\displaystyle\sum_{i=1}^{M} 𝖤[Ui(t+1)Ui(t)Yt,θ=i]ρi(Yt)\displaystyle\mathsf{E}[U_{i}(t+1)-U_{i}(t)\mid Y^{t},\theta=i]\rho_{i}(Y^{t})
C\displaystyle\geq C- i=1Mρi(yt)log2(1+(qp)2ιiΔρi(yt)1ρi(yt))\displaystyle\sum_{i=1}^{M}\rho_{i}(y^{t})\log_{2}\left(1+(q-p)^{2}\frac{\iota_{i}\Delta-\rho_{i}(y^{t})}{1-\rho_{i}(y^{t})}\right) (125)
=C\displaystyle=C- iS0ρi(yt)log2(1+(qp)2Δρi(yt)1ρi(yt))\displaystyle\sum_{i\in S_{0}}\rho_{i}(y^{t})\log_{2}\left(1+(q-p)^{2}\frac{\Delta-\rho_{i}(y^{t})}{1-\rho_{i}(y^{t})}\right)
\displaystyle- iS1ρi(yt)log2(1(qp)2Δ+ρi(yt)1ρi(yt)).\displaystyle\sum_{i\in S_{1}}\rho_{i}(y^{t})\log_{2}\left(1-(q-p)^{2}\frac{\Delta+\rho_{i}(y^{t})}{1-\rho_{i}(y^{t})}\right)\,. (126)

By the SEAD constraints, equations (30) and (31), if i\in S_{0} then \Delta\leq\rho_{\min}\leq\rho_{i}(y^{t}). In the case \Delta\geq 0, i\in S_{0}\implies\Delta-\rho_{i}(y^{t})\leq 0 and -\Delta-\rho_{i}(y^{t})<0. Then the arguments of the logarithms in (126) are both at most 1 for every i. This suffices to show that constraints (20) and (16) are satisfied when P_{0}\geq P_{1}, i.e., when \Delta\geq 0.

It remains to prove that constraints (20) and (16) hold in the case where P1>P0P_{1}>P_{0}, or equivalently Δ<0\Delta<0. Let α=Δ>0\alpha=-\Delta>0, and note that since 0<α<10<\alpha<1, then:

α1ρminα=α1ρi(yt)1ρi(yt)αρi(yt)1ρi(yt),\frac{\alpha}{1\!-\!\rho_{\min}}\geq\alpha=\alpha\frac{1\!-\!\rho_{i}(y^{t})}{1\!-\!\rho_{i}(y^{t})}\geq\frac{\alpha\!-\!\rho_{i}(y^{t})}{1\!-\!\rho_{i}(y^{t})}\,, (127)

and \rho_{i}\geq\rho_{\min}\implies\alpha+\rho_{i}\geq\alpha+\rho_{\min} and 1-\rho_{i}\leq 1-\rho_{\min}, therefore:

log2(1(qp)2α+ρi(yt)1ρi(yt))\displaystyle\log_{2}\!\left(\!1\!-\!(q\!-\!p)^{2}\frac{\alpha\!+\!\rho_{i}(y^{t})}{1\!-\!\rho_{i}(y^{t})}\right)\! log2(1(qp)2α+ρmin1ρmin)\displaystyle\leq\log_{2}\!\left(\!1\!-\!(q\!-\!p)^{2}\frac{\alpha\!+\!\rho_{\min}}{1\!-\!\rho_{\min}}\right)
log2(1+(qp)2αρi(yt)1ρi(yt))\displaystyle\log_{2}\!\left(\!1\!+\!(q\!-\!p)^{2}\frac{\alpha\!-\!\rho_{i}(y^{t})}{1\!-\!\rho_{i}(y^{t})}\right)\! log2(1+(qp)2α1ρmin).\displaystyle\leq\log_{2}\!\left(\!1\!+(q\!-\!p)^{2}\frac{\alpha}{1-\rho_{\min}}\right)\,.

Since this holds for all i=1,,Mi=1,\dots,M, then:

i=1M𝖤[Ui(t+1)Ui(t)Yt=yt,θ=i]ρi(yt)C\displaystyle\sum_{i=1}^{M}\mathsf{E}[U_{i}(t+1)-U_{i}(t)\mid Y^{t}=y^{t},\theta=i]\rho_{i}(y^{t})\geq C (128)
P0log2(1(qp)2α+ρmin1ρmin)P1log2(1+α(qp)21ρmin)\displaystyle-P_{0}\log_{2}\!\left(\!1\!-\!(q\!-\!p)^{2}\frac{\alpha\!+\!\rho_{\min}}{1\!-\!\rho_{\min}}\right)\!-\!P_{1}\log_{2}\!\left(\!1\!+\alpha\frac{(q\!-\!p)^{2}}{1\!-\!\rho_{\min}}\right)
Clog2(1(qp)21ρmin[P0(α+ρmin)P1α]).\displaystyle\geq C-\log_{2}\left(1-\frac{(q-p)^{2}}{1-\rho_{\min}}[P_{0}(\alpha+\rho_{\min})-P_{1}\alpha]\right)\,. (129)

To satisfy constraint (20), we only need the logarithm term in (129) to be non-negative, which requires P_{0}(\alpha+\rho_{\min})-P_{1}\alpha\geq 0. Since P_{0}-P_{1}=\Delta, we have P_{0}(\alpha+\rho_{\min})-P_{1}\alpha=(P_{0}-P_{1})\alpha+P_{0}\rho_{\min}=-\Delta^{2}+P_{0}\rho_{\min}. Thus it suffices that -\Delta^{2}+P_{0}\rho_{\min}\geq 0, which is equivalent to:

Δ2P0ρmin.\Delta^{2}\leq P_{0}\rho_{\min}\,. (130)

The SEAD constraints, equations (30) and (31), guarantee that \Delta^{2}\leq\rho^{2}_{\min}. Since P_{0}\geq\min_{i\in S_{0}}\rho_{i}(y^{t})=\rho_{\min}, then \Delta^{2}\leq\rho^{2}_{\min}\leq P_{0}\rho_{\min}, which satisfies inequality (130). Thus the SEAD constraints guarantee that constraint (20) is satisfied, while restricting only the absolute difference between P_{0} and P_{1}.

To prove that constraint (16) is satisfied, note that equation (30) of the SEAD constraints guarantees that if \rho_{j}(t)\leq\frac{1}{2}\;\forall j=1,\dots,M, then |\Delta|\leq\frac{1}{3}. Starting from equation (124), note that the worst case is \iota_{i}\Delta=\frac{1}{3}. Using (127) with \alpha=\frac{1}{3} to go from (131) to (132), we obtain:

𝖤[Ui(t+1)\displaystyle\mathsf{E}[U_{i}(t+1) Ui(t)Yt,θ=i]\displaystyle-U_{i}(t)\mid Y^{t},\theta=i]
\displaystyle\geq Clog2(1+(qp)2ιiΔρi(yt)1ρi(yt))\displaystyle C-\log_{2}\left(1+(q-p)^{2}\frac{\iota_{i}\Delta-\rho_{i}(y^{t})}{1-\rho_{i}(y^{t})}\right) (131)
\displaystyle\geq Clog2(1+(qp)23)\displaystyle C-\log_{2}\left(1+\frac{(q-p)^{2}}{3}\right) (132)
\displaystyle\geq C(qp)23\displaystyle C-\frac{(q-p)^{2}}{3} (133)
=\displaystyle= C(qp)22ln(2)+32ln(2)6ln(2)(qp)2\displaystyle C-\frac{(q-p)^{2}}{2\ln(2)}+\frac{3-2\ln(2)}{6\ln(2)}(q-p)^{2} (134)
\displaystyle\geq 32ln(2)6ln(2)(qp)2>(qp)23.\displaystyle\frac{3-2\ln(2)}{6\ln(2)}(q-p)^{2}>\frac{(q-p)^{2}}{3}\,. (135)

To transition from (134) to (135) we need to show that 2\ln(2)C\geq(q-p)^{2}. For this we find a small constant a that makes aC-(q-p)^{2}, the difference between two convex functions, also convex. Taking second derivatives, \frac{d^{2}}{dp^{2}}aC=\frac{1}{\ln(2)}\frac{a}{pq} and \frac{d^{2}}{dp^{2}}(q-p)^{2}=8, and subtracting them, the constant a is found by noting that pq\leq\frac{1}{4}.
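Concretely, with a=2\ln(2), define f(p)\triangleq 2\ln(2)C-(q-p)^{2}. Then

f\Big(\tfrac{1}{2}\Big)=0,\qquad f^{\prime}\Big(\tfrac{1}{2}\Big)=0,\qquad f^{\prime\prime}(p)=\frac{2}{pq}-8\geq 0\quad\text{since }pq\leq\tfrac{1}{4},

so f is convex and attains its minimum value 0 at p=\frac{1}{2}; hence 2\ln(2)C\geq(q-p)^{2} for all p.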

The SEAD constraints guarantee that both sets S_{0} and S_{1} are non-empty. Then, since the maximum absolute difference |U_{i}(t+1)-U_{i}(t)| is C_{2}, constraint (17) is satisfied; see the proof of Claim 1.

For the proof of existence of a process Ui(t)U^{\prime}_{i}(t), with B=1qlog2(2q)B=\frac{1}{q}\log_{2}(2q), see Appendix B. ∎

VI Extension to Arbitrary Initial Distributions

The proof of Thm. 1 only used the uniform input distribution to assert U_{i}(0)=U_{1}(0) and to replace \mathsf{E}[U_{i}(0)] by U_{1}(0)=-\log_{2}(M-1) in equation (46). In Lemma 5, we required that U_{i}(0)<0\;\forall i. However, even with a uniform input distribution this is not the case when \Omega=\{0,1\}. To avoid this requirement, the case where \exists i:U_{i}(0)\geq 0, and therefore T^{(1)}=0, needs to be accounted for. Also, if U_{i}(0)\geq C_{2}, then the probability that an initial fall back occurs is only upper bounded by p_{f}, which can be inferred from the proof of Lemma 4. Then, to obtain an upper bound on the expected stopping time \mathsf{E}[\tau] for an arbitrary input distribution, it suffices to multiply the terms \mathsf{E}[U_{i}(T^{(1)})\mid T^{(1)}>0,\theta=i]-U_{i}(0) in the proof of Lemma 5, equation (104), by the indicator \mathbbm{1}_{U_{i}(0)<0}. Then the bound of Lemma 5 becomes:

i=1M\displaystyle\sum_{i=1}^{M} n=1𝖤[Ui(t0(n)+T(n))Ui(t0(n))θ=i]Pr(θ=i)\displaystyle\sum_{n=1}^{\infty}\mathsf{E}[U_{i}(t_{0}^{(n)}+T^{(n)})-U_{i}(t_{0}^{(n)})\mid\theta=i]\Pr(\theta=i)
C2pq1(pq)N1pq+𝖤[(C2Ui(0))𝟙Ui(0)<0].\displaystyle\leq C_{2}\frac{p}{q}\frac{1\!-\!\left(\frac{p}{q}\right)^{N}}{1\!-\!\frac{p}{q}}\!+\!\mathsf{E}[\left(C_{2}\!-\!U_{i}(0)\right)\mathbbm{1}_{U_{i}(0)<0}]\,. (136)

By Thm. 3, we can replace C2C_{2} with q1log2(2q)q^{-1}\log_{2}(2q) in (136). Using the definition of pfp_{f} from (40) we obtain the bound:

𝖤[T]\displaystyle\mathsf{E}[T] 𝖤[T]2C212NC212C2log2(2q)qC\displaystyle\leq\mathsf{E}[T^{\prime}]\leq 2^{-C_{2}}\frac{1-2^{-NC_{2}}}{1-2^{-C_{2}}}\frac{\log_{2}(2q)}{qC}
+𝖤[(log2(2q)qUi(0))𝟙Ui(0)<0C].\displaystyle+\mathsf{E}\left[\left(\frac{\log_{2}(2q)}{q}-U_{i}(0)\right)\frac{\mathbbm{1}_{U_{i}(0)<0}}{C}\right]\,. (137)

VI-A Generalized Achievability Bound

An upper bound on \mathsf{E}[\tau] for an arbitrary initial distribution \bm{\rho}_{0} then follows by combining the bound (137) with the bound on \mathsf{E}[\tau-T] from equation (47):

\mathsf{E}[\tau]\leq\sum_{i=1}^{M}\left(\frac{\log_{2}\left(\frac{1-\rho_{i}(0)}{\rho_{i}(0)}\right)}{C}+\frac{\log_{2}(2q)}{qC}\right)\rho_{i}(0)\mathbbm{1}_{\rho_{i}(0)<0.5}
+\left\lceil\frac{\log_{2}(\frac{1-\epsilon}{\epsilon})}{C_{2}}\right\rceil\frac{C_{2}}{C_{1}}+\left(\frac{\log_{2}(2q)}{qC}-\frac{C_{2}}{C_{1}}\right)\frac{1-\frac{\epsilon}{1\!-\!\epsilon}2^{-C_{2}}}{1\!-\!2^{-C_{2}}}2^{-C_{2}}\,. (138)

For the special case where \rho_{i}(0)\ll\frac{1}{2}\;\forall i=1,\dots,M, the log-likelihood ratio can be approximated by \log_{2}(\frac{\rho_{i}(0)}{1-\rho_{i}(0)})\lessapprox\log_{2}(\rho_{i}(0)), which yields a simpler expression of the bound (138):

𝖤[τ]\displaystyle\mathsf{E}[\tau] <(𝝆0)C+log2(2q)qC+log2(1ϵϵ)C2C2C1\displaystyle<\frac{\mathcal{H}(\bm{\rho}_{0})}{C}+\frac{\log_{2}(2q)}{q\cdot C}+\left\lceil\frac{\log_{2}(\frac{1-\epsilon}{\epsilon})}{C_{2}}\right\rceil\frac{C_{2}}{C_{1}}
+(log2(2q)qCC2C1)1ϵ1ϵ2C212C22C2,\displaystyle\phantom{=\,}+\left(\frac{\log_{2}(2q)}{qC}-\frac{C_{2}}{C_{1}}\right)\frac{1-\frac{\epsilon}{1\!-\!\epsilon}2^{-C_{2}}}{1\!-\!2^{-C_{2}}}2^{-C_{2}}\,, (139)

where \mathcal{H}(\bm{\rho}_{0}) is the entropy of the distribution \bm{\rho}_{0} in bits.

VI-B Uniform and Binomial Initial Distribution

Using the bound of equation (138), we can develop a tighter upper bound on the blocklength for a systematic encoder with uniform input distribution when \Omega=\{0,1\}^{K}. It can be shown that the systematic transmissions transform the uniform distribution into a binomial distribution; see [1]. The bound is constructed by adding the K systematic transmissions to the bound (138) applied to the binomial distribution, as follows:

𝖤[τ]K+\displaystyle\mathsf{E}[\tau]\leq K+ (140)
i=0K[log2(1piqKipiqKi)C+log2(2q)qC](Ki)piqKi𝟙(qKipi<0.5)\displaystyle\sum_{i=0}^{K}\!\left[\frac{\log_{2}(\frac{1-p^{i}q^{K\!-\!i}}{p^{i}q^{K\!-\!i}})}{C}\!+\!\frac{\log_{2}(2q)}{qC}\right]\!\!\binom{K}{i}p^{i}q^{K-i}\mathbbm{1}_{(q^{K-i}p^{i}<0.5)}
+log2(1ϵϵ)C2C2C1+(log2(2q)qCC2C1)1ϵ1ϵ2C212C22C2.\displaystyle+\left\lceil\frac{\log_{2}(\frac{1-\epsilon}{\epsilon})}{C_{2}}\right\rceil\frac{C_{2}}{C_{1}}\!+\!\left(\frac{\log_{2}(2q)}{qC}\!-\!\frac{C_{2}}{C_{1}}\right)\frac{1-\frac{\epsilon}{1\!-\!\epsilon}2^{-C_{2}}}{1\!-\!2^{-C_{2}}}2^{-C_{2}}\,.

This bound, which assumes SEAD and systematic transmission, is the tightest achievability bound that we have developed for the model.
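The bound (140) is straightforward to evaluate numerically. The Python sketch below is our own helper, not the authors' code; it assumes the definitions C=1-h(p) (the BSC capacity), C_{2}=\log_{2}(q/p), and C_{1}=(q-p)\log_{2}(q/p) used earlier in the paper, and should be adjusted if those definitions differ.

```python
from math import comb, log2, ceil

def spm_top_bound(K, p, eps):
    """Right side of (140): upper bound on E[tau] for a systematic encoder
    with a uniform K-bit message over a BSC(p) and error target eps.

    Assumes C = 1 - h(p), C2 = log2(q/p), C1 = (q - p) * log2(q/p)."""
    q = 1 - p
    C = 1 + p * log2(p) + q * log2(q)
    C2 = log2(q / p)
    C1 = (q - p) * C2
    # Communication phase: binomial posteriors after K systematic symbols.
    comm = 0.0
    for i in range(K + 1):
        rho = p ** i * q ** (K - i)          # shared posterior at weight i
        if rho < 0.5:
            comm += (log2((1 - rho) / rho) / C + log2(2 * q) / (q * C)) \
                    * comb(K, i) * rho
    # Confirmation-phase terms of (140).
    conf = ceil(log2((1 - eps) / eps) / C2) * C2 / C1
    tail = (log2(2 * q) / (q * C) - C2 / C1) * 2 ** (-C2) \
           * (1 - eps / (1 - eps) * 2 ** (-C2)) / (1 - 2 ** (-C2))
    return K + comm + conf + tail

# Example: print(spm_top_bound(100, 0.05, 1e-3))
```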

VII Algorithm and Implementation

In this section we introduce a systematic posterior matching (SPM) algorithm with partitioning by thresholding of ordered posteriors (TOP), which we call SPM-TOP. The SPM-TOP algorithm guarantees the performance of bound (21) in Thm. 3 because both systematic encoding and partitioning via TOP enforce the SEAD partitioning constraints in equations (30) and (31). Because it is a systematic algorithm, SPM-TOP also guarantees the performance of the bound (140).

VII-A Partitioning by Thresholding of Ordered Posteriors (TOP)

The TOP rule is a simple method to construct S0S_{0} and S1S_{1} at any time tt from the vector of posteriors 𝝆t\bm{\rho}_{t}, which enforces the SEAD partitioning constraint of Thm. 3. The rule requires an ordering {b1,,bM}\{b_{1},\dots,b_{M}\} of the vector of posteriors such that ρb1(t)ρb2(t)ρbM(t)\rho_{b_{1}}(t)\geq\rho_{b_{2}}(t)\geq\cdots\geq\rho_{b_{M}}(t). TOP builds S0S_{0} and S1S_{1} by finding a threshold mm to split {b1,,bM}\{b_{1},\dots,b_{M}\} into two contiguous segments {b1,,bm}=S0\{b_{1},\dots,b_{m}\}=S_{0} and {bm+1,,bM}=S1\{b_{m+1},\dots,b_{M}\}=S_{1}. To determine the threshold position, the rule first determines an index m{1,,M}m^{\prime}\in\{1,\dots,M\} such that:

\sum_{j=1}^{m^{\prime}-1}\rho_{b_{j}}(y^{t})<\frac{1}{2}\leq\sum_{j=1}^{m^{\prime}}\rho_{b_{j}}(y^{t})\,. (141)

Once m^{\prime} is found, the rule selects between two alternatives: either m=m^{\prime} or m=m^{\prime}-1. In other words, all that remains is to decide whether to place b_{m^{\prime}} in S_{0} or in S_{1}. We select the choice that guarantees that the absolute difference between P_{0} and P_{1} is no larger than the posterior of b_{m^{\prime}}. The threshold m is selected from \{m^{\prime}-1,m^{\prime}\} as follows:

if i=1mρbi(t)12>12ρbm(t)then: m=m1\displaystyle\textrm{if }\sum_{i=1}^{m^{\prime}}\rho_{b_{i}}(t)-\frac{1}{2}>\frac{1}{2}\rho_{b_{m^{\prime}}}(t)\quad\text{then: }m=m^{\prime}-1 (142)
if i=1mρbi(t)1212ρbm(t)then: m=m.\displaystyle\textrm{if }\sum_{i=1}^{m^{\prime}}\rho_{b_{i}}(t)-\frac{1}{2}\leq\frac{1}{2}\rho_{b_{m^{\prime}}}(t)\quad\text{then: }m=m^{\prime}\,. (143)

Note that since m\in\{m^{\prime}-1,m^{\prime}\}, the posterior of b_{m^{\prime}} is no larger than that of b_{m}, and the posterior of b_{m} is the value of \rho_{\min}=\min_{i\in S_{0}}\{\rho_{i}(t)\}.
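In code, the TOP rule reduces to a single pass over the sorted posteriors. The following Python sketch (our own illustration; it operates on individual posteriors rather than the groups used in Section VII-D) returns the threshold m of (142)-(143):

```python
def top_threshold(rho_sorted):
    """TOP rule (141)-(143) on posteriors sorted in decreasing order.

    Returns m such that S_0 = {b_1, ..., b_m} and S_1 = {b_{m+1}, ...}."""
    cum = 0.0
    for m_prime, r in enumerate(rho_sorted, start=1):
        cum += r
        if cum >= 0.5:
            # Case (142): overshoot above rho_{b_m'}/2, so b_m' goes to S_1.
            # Case (143): otherwise b_m' stays in S_0.
            return m_prime - 1 if cum - 0.5 > r / 2 else m_prime
    return len(rho_sorted)

# Example: top_threshold([0.4, 0.3, 0.2, 0.1]) returns 1, so S_0 = {b_1}:
# P_0 = 0.4 gives |Delta| = 0.2 <= rho_min = 0.4, whereas m = 2 would give
# |Delta| = 0.4 > 0.3, violating SEAD.
```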

Claim 2.

The TOP rule guarantees that the SEAD constraints of Thm. 3, given by (30) and (31), are satisfied.

Proof:

The TOP partitioning rule sets the threshold that separates S_{0} and S_{1} either exactly before or exactly after item b_{m^{\prime}}, depending on whether case (142) or case (143) occurs. To show that the TOP rule guarantees the SEAD constraints of Thm. 3, note that the threshold lies before item b_{m^{\prime}} if case (142) occurs. Then, by the first inequality of (141) and by (142):

P0\displaystyle P_{0} =i=1m1ρbi(yt)=i=1mρbi(yt)ρbm(t)<12\displaystyle=\sum_{i=1}^{m^{\prime}-1}\rho_{b_{i}}(y^{t})=\sum_{i=1}^{m^{\prime}}\rho_{b_{i}}(y^{t})-\rho_{b_{m^{\prime}}}(t)<\frac{1}{2} (144)
P0\displaystyle P_{0} >12+12ρbm(t)ρbm(t)=1212ρbm(t).\displaystyle>\frac{1}{2}+\frac{1}{2}\rho_{b_{m^{\prime}}}(t)-\rho_{b_{m^{\prime}}}(t)=\frac{1}{2}-\frac{1}{2}\rho_{b_{m^{\prime}}}(t)\,. (145)

When case (143) occurs, the threshold is set after item b_{m^{\prime}}. Then, by the second inequality of (141) and by (143):

P0\displaystyle P_{0} =i=1mρbi(yt)12\displaystyle=\sum_{i=1}^{m^{\prime}}\rho_{b_{i}}(y^{t})\geq\frac{1}{2} (146)
P0\displaystyle P_{0} 12+12ρbm(t),\displaystyle\leq\frac{1}{2}+\frac{1}{2}\rho_{b_{m^{\prime}}}(t)\,, (147)

In either case we have:

\frac{1}{2}-\frac{1}{2}\rho_{b_{m^{\prime}}}(t)\leq P_{0}\leq\frac{1}{2}+\frac{1}{2}\rho_{b_{m^{\prime}}}(t) (148)

By definition, \Delta=P_{0}-P_{1}=P_{0}-(1-P_{0})=2P_{0}-1. Scaling (148) by 2 and subtracting 1 gives -\rho_{b_{m^{\prime}}}(t)\leq 2P_{0}-1\leq\rho_{b_{m^{\prime}}}(t). Then |\Delta|\leq\rho_{b_{m^{\prime}}}(t)\leq\rho_{b_{m}}(t)=\rho_{\min}. This concludes the proof. ∎

We have shown that the construction of S_{0} and S_{1} can be as simple as finding the threshold item b_{m^{\prime}} where the c.d.f. induced by the ordered vector of posteriors crosses \frac{1}{2}, then determining whether the threshold should be placed before or after item b_{m^{\prime}}, and finally allocating all items before the threshold to S_{0} and all items after the threshold to S_{1}.

VII-B The Systematic Posterior Matching Algorithm

The SPM-TOP algorithm is a low-complexity scheme that implements sequential transmission over the BSC with noiseless feedback for a source message sampled from a uniform distribution. The algorithm has the usual communication and confirmation phases, and the communication phase starts with systematic transmission. These systematic transmissions are treated as a separate systematic phase, for a total of three phases that we proceed to describe in detail.

VII-C Systematic Phase

Let the sampled message be \theta\in\{0,1\}^{K}, with bits b_{i}^{(\theta)}, that is, \theta=\{b_{1}^{(\theta)},b_{2}^{(\theta)},\dots,b_{K}^{(\theta)}\}. For t=1,\dots,K the bits b_{t}^{(\theta)} are transmitted and the vector y^{K}\triangleq\{y_{1},\dots,y_{K}\} is received. After the K-th transmission, both transmitter and receiver initialize a list of K+1 groups \{\mathcal{G}_{0},\dots,\mathcal{G}_{K}\}, where each \mathcal{G}_{i} is a tuple \mathcal{G}_{i}=[N_{i},L_{i},h_{i},\rho_{i}(y^{t})]. Here N_{i} is the count of messages in the group; L_{i} is the index of the first message in the group; h_{i} is the Hamming distance shared between y^{K} and every message in the group, that is, l,s\in\mathcal{G}_{i}\implies\sum_{j=1}^{K}b^{(l)}_{j}\oplus y_{j}=\sum_{j=1}^{K}b^{(s)}_{j}\oplus y_{j}; and \rho_{i}(y^{t}) is the group's shared posterior. At time t=K, each group \mathcal{G}_{i}, i=0,\dots,K, has N_{i}=\binom{K}{i}, L_{i}=0, h_{i}=i, and \rho_{i}(K)=p^{i}q^{K-i}. The groups are sorted in order of decreasing probability, equivalent to increasing Hamming weight, since j>l\implies p^{j}q^{K-j}<p^{l}q^{K-l}. At the end of the systematic phase, the transmitter finds the group index h_{\theta} and the within-group index n_{\theta} corresponding to the sampled message \theta. The index h_{\theta} is given by h_{\theta}=\sum_{j=1}^{K}b_{j}^{(\theta)}\oplus y_{j}, and the index n_{\theta}\in\{0,\dots,\binom{K}{h_{\theta}}-1\} is found via algorithm 5 (a ranking sketch follows Algorithm 1 below). The systematic phase is described by algorithm 1.

Input: Tx: \theta=[b_{1}^{(\theta)},\dots,b_{K}^{(\theta)}] ▷ Transmitted message.
for t=1,\dots,K do
      channel input: x_{t}=b_{t}^{(\theta)}, output: y_{t};
end for
Construct \{\mathcal{G}_{0},\dots,\mathcal{G}_{K}\}, \mathcal{G}_{i}=[N_{i},L_{i},h_{i},\rho_{i}(K)];
N_{i}\leftarrow\binom{K}{i} ▷ N_{i}\triangleq|\mathcal{G}_{i}|;
\rho_{i}(K)\leftarrow q^{K-i}p^{i} ▷ j\in\mathcal{G}_{i}\rightarrow\rho_{j}(K)=\rho_{i};
L_{i}\leftarrow 0, h_{i}\leftarrow i;
map h_{\theta}, n_{\theta}\in\mathcal{G}_{h_{\theta}}\triangleq\mathcal{G}^{(\theta)} ▷ Only Tx, algorithm 5;
Algorithm 1 Systematic Phase
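Algorithm 5 is not reproduced in this section; one standard way to realize such a message-to-index map is the combinatorial number system, sketched below in Python (our own choice of ranking, a hypothetical stand-in; the paper's Algorithm 5 may order the messages within a group differently):

```python
from math import comb

def group_indices(theta_bits, y_bits):
    """Map a message to (h_theta, n_theta) after the systematic phase.

    h_theta is the Hamming distance between theta and y^K; n_theta ranks
    the error pattern theta XOR y^K among all K-bit patterns of the same
    weight (combinadic rank), so n_theta lies in {0, ..., C(K, h) - 1}."""
    err = [b ^ y for b, y in zip(theta_bits, y_bits)]
    h = sum(err)
    n, ones_seen = 0, 0
    for pos, e in enumerate(err):
        if e:
            ones_seen += 1
            n += comb(pos, ones_seen)  # rank contribution of this one-bit
    return h, n

# Example with K = 4: group_indices([1, 0, 1, 1], [1, 1, 1, 0])
# gives h_theta = 2 and n_theta = comb(1, 1) + comb(3, 2) = 4.
```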
Figure 3: The two cases for updating and combining S_{0} and S_{1} after partitioning with the TOP rule of Claim 2.

VII-D Communication Phase

The communication phase consists of the transmissions after the systematic phase, while all posteriors remain below \frac{1}{2}. During the communication phase, the transmitter attempts to boost the posterior of the transmitted message past the threshold \frac{1}{2}, though any other message could cross the threshold instead due to channel errors.

The list of groups initialized in the systematic phase is kept ordered by decreasing common posterior. Before each transmission, the list of groups is partitioned into S_{0} and S_{1} using rule (141). For this, the group \mathcal{G}_{m} that contains the threshold item b_{m} is found first; then all groups before \mathcal{G}_{m} are assigned to S_{0} and all groups after \mathcal{G}_{m} are assigned to S_{1}. To assign group \mathcal{G}_{m}, the within-group index n_{m} of the threshold item b_{m} is determined. The TOP rule demands that all items j\in\mathcal{G}_{m} with index n^{(j)}_{m}\leq n_{m} be assigned to S_{0} and all items i\in\mathcal{G}_{m} with index n^{(i)}_{m}>n_{m} to S_{1}. For this, we split the group \mathcal{G}_{m} in two by creating a new group with the segment of items past n_{m} that belongs in S_{1}. However, if the item with index n_{m} is the last item in \mathcal{G}_{m}, then the entire group \mathcal{G}_{m} belongs in S_{0} and no splitting is required.

After each transmission t, the posterior probabilities of the groups are updated using the received symbol Y_{t} according to equation (120): each posterior is multiplied by a weight update, computed using equation (121), according to its assignment to S_{0} or S_{1}. Then the lists that comprise S_{0} and S_{1} are merged into a single sorted list. The process repeats for the next transmission, and the communication phase is interrupted, transitioning to the confirmation phase, when the posterior of a candidate message crosses the \frac{1}{2} threshold. The communication phase resumes if the posterior of the message i that triggered the confirmation falls back below \frac{1}{2}, rather than increasing past 1-\epsilon, while all other posteriors are still below \frac{1}{2}.

Not all groups need to be updated at every transmission. The partitioning method only requires visiting groups \mathcal{G}_{1},\mathcal{G}_{2},\dots,\mathcal{G}_{m}. After the symbol Y_{t+1} is received, we need to combine S_{0} and S_{1} into a single list, sorted by decreasing updated posterior. Figure 3 shows the three operations executed during the communication phase: partitioning the list, updating the posteriors of the partitions, and combining the updated partitions into a single sorted list. Note that in both cases, Y_{t+1}=0 and Y_{t+1}=1, either the entirety or a tail segment of the partition S_{1} is simply appended at the end of the new sorted list. This segment starts at the first group \mathcal{G}_{j}\in S_{1} whose posterior \rho_{j}(t) is smaller than the smallest posterior in S_{0}. We avoid explicit operations on this segment and only save the "weight" update factor as another item in the tuple described at the beginning of this section. Every subsequent item in the list shares this update coefficient and can be updated later if it is encountered while partitioning the list, updating the posteriors, or combining the partitions; when this happens, the "weight" update is simply "pushed" to the next list item, as sketched below. Since most of the groups belong to this "tail" segment, we expect to perform explicit operations on only a small fraction of the groups. This results in a large complexity reduction, which is validated by the simulation data of figure 6.
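The following Python sketch illustrates this lazy-update idea on a simplified group representation (our own illustration: only the posterior and pending weight are kept, message counts and the other tuple fields are omitted, and the bookkeeping differs in detail from Algorithm 3, e.g., the sketch compares fully updated posteriors):

```python
def lazy_update_and_merge(S0, S1, W0, W1):
    """Merge S0 and S1 (each sorted by decreasing posterior) into one
    sorted list, applying W0 explicitly and deferring W1 on the tail.

    Each group g is a dict {'rho': posterior, 'w': pending weight}."""
    if S1:
        S1[0]['w'] *= W1              # defer W1 on the head of S1
    merged, j = [], 0
    for g0 in S0:
        # Emit S1 groups whose updated posterior beats g0's updated one.
        while j < len(S1) and W0 * g0['rho'] < S1[j]['w'] * S1[j]['rho']:
            g1 = S1[j]
            g1['rho'] *= g1['w']      # realize the pending weight
            if j + 1 < len(S1):
                S1[j + 1]['w'] *= g1['w']   # push it to the next group
            g1['w'] = 1.0
            merged.append(g1)
            j += 1
        g0['rho'] *= W0               # groups in S0 are updated explicitly
        merged.append(g0)
    return merged + S1[j:]            # untouched tail keeps its pending w
```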

Result: index θ^\hat{\theta} s.t. ρθ^(τ)12\rho_{\hat{\theta}}(\tau)\geq\frac{1}{2}
Input: List of Groups {𝒢0,,𝒢K}\{\mathcal{G}_{0},\dots,\mathcal{G}_{K}\}
Input: hθh_{\theta}, nθ𝒢hθn_{\theta}\in\mathcal{G}_{h_{\theta}} \triangleright Tx Only
while ρ0(t)<12\rho_{0}(t)<\frac{1}{2} do
       m0,P00,P10m\leftarrow 0,P_{0}\leftarrow 0,P_{1}\leftarrow 0, S0S_{0}\leftarrow\emptyset;
       while P0+Nmρm<12P_{0}+N_{m}\rho_{m}<\frac{1}{2} do
             P0P0+Nmρm(t)P_{0}\leftarrow P_{0}+N_{m}\rho_{m}(t);
             mm+1m\leftarrow m+1;
            
       end while
      S0{𝒢0,,𝒢m1}S_{0}\leftarrow\{\mathcal{G}_{0},\dots,\mathcal{G}_{m-1}\}, S1{𝒢m+1,}S_{1}\leftarrow\{\mathcal{G}_{m+1},\dots\};
       n0.5P0ρm(t)n\leftarrow\lceil\frac{0.5-P_{0}}{\rho_{m}(t)}\rceil \triangleright Initial nn value;
       if P0+nρm(t)>12(1+ρm(t))P_{0}+n\rho_{m}(t)>\frac{1}{2}(1+\rho_{m}(t)) then
             nn1n\leftarrow n-1 \triangleright TOP rule
       end if
      if n>0n>0 && n<Nmn<N_{m} then
             𝒢newcopy(𝒢m)\mathcal{G}_{new}\leftarrow\textbf{copy}\left(\mathcal{G}_{m}\right);
             NnewNmnN_{new}\leftarrow N_{m}-n, LnewLm+n,NmnL_{new}\leftarrow L_{m}+n,N_{m}\leftarrow n;
             S0S0𝒢m,S1𝒢newS1S_{0}\leftarrow S_{0}\cup\mathcal{G}_{m},S_{1}\leftarrow\mathcal{G}_{new}\cup S_{1};
            
      else
             if n==0n==0 then S1=𝒢mS1S_{1}=\mathcal{G}_{m}\cup S_{1}, mm1m\leftarrow m-1;
else S_{0}\leftarrow S_{0}\cup\mathcal{G}_{m}
       end if
      P0P0+nρm(t),P11P0P_{0}\leftarrow P_{0}+n\rho_{m}(t),P_{1}\leftarrow 1-P_{0};
      
      if Tx then
             if 𝒢m==𝒢(θ)\mathcal{G}_{m}==\mathcal{G}^{(\theta)} && nθLm+nn_{\theta}\leq L_{m}+n then
                  𝒢(θ)𝒢new0\mathcal{G}^{(\theta)}\leftarrow\mathcal{G}_{new}^{0}
             end if
            if 𝒢(θ)S0\mathcal{G}^{(\theta)}\in S_{0} then xt+1=0x_{t+1}=0, else xt+1=1x_{t+1}=1 end;
       end if
      tt+1t\leftarrow t+1 \triangleright Increase time index;
       Update and Merge S0S_{0} and S1S_{1} via algorithm 3;
      
end while
Algorithm 2 Communication Phase
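To make the split step concrete, the following Python sketch re-expresses the partitioning logic of Algorithm 2 under simplifying assumptions: each group is a plain dictionary with fields N (size), L (first index), h (type), and rho (per-message posterior), the input list is already sorted by decreasing posterior, and the transmitter-side bookkeeping is omitted. It is an illustration of the TOP rule, not the authors' implementation.

```python
import math

# Hypothetical sketch of the partitioning step of Algorithm 2. A group is a
# dict {"N": size, "L": first index, "h": type, "rho": per-message posterior};
# `groups` is assumed sorted by decreasing rho.
def partition(groups):
    P0, m = 0.0, 0
    # accumulate whole groups into S0 while P0 stays below 1/2
    while P0 + groups[m]["N"] * groups[m]["rho"] < 0.5:
        P0 += groups[m]["N"] * groups[m]["rho"]
        m += 1
    g = groups[m]
    n = math.ceil((0.5 - P0) / g["rho"])      # smallest n with P0 + n*rho >= 1/2
    if P0 + n * g["rho"] > 0.5 * (1 + g["rho"]):
        n -= 1                                # TOP rule: keep |P0 - P1| <= rho_m
    S0, S1 = groups[:m], groups[m + 1:]
    if 0 < n < g["N"]:                        # split the straddling group
        S0.append(dict(g, N=n))
        S1.insert(0, dict(g, N=g["N"] - n, L=g["L"] + n))
    elif n == 0:
        S1.insert(0, g)                       # the whole group falls to S1
    else:
        S0.append(g)                          # n == N_m: the whole group stays in S0
    return S0, S1
```

Under this choice of nn, the difference |P0P1||P_{0}-P_{1}| never exceeds the per-message posterior ρm\rho_{m} of the split group, which is the small-enough-absolute-difference behavior the TOP rule is designed to provide.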
Data: channel input: xt+1x_{t+1}, output: yt+1y_{t+1}
Data: mm: index of last item in S0S_{0}
Data: WiW_{i}: Additional group parameter initialized at 11: 𝒢i=[Ni,Li,hi,ρi(yt),Wi]\mathcal{G}_{i}=[N_{i},L_{i},h_{i},\rho_{i}(y^{t}),W_{i}]
W0qPyt+1(qp)+pW_{0}\leftarrow\frac{q}{P_{y_{t+1}}(q-p)+p} \triangleright Weight update for items in S0S_{0} ;
W1pPyt+1(qp)+pW_{1}\leftarrow\frac{p}{P_{y_{t+1}}(q-p)+p} \triangleright Weight Update for items in S1S_{1} ;
n00n_{0}\leftarrow 0, n10n_{1}\leftarrow 0 \triangleright index of first groups in S0S_{0} and S1S_{1} ;
Wn1Wn1W1W_{n_{1}}\leftarrow W_{n_{1}}\cdot W_{1} \triangleright Update weight of first group in S1S_{1};
while n0mn_{0}\leq m do
       if ρn0(t)<Wn1ρn1(t)\rho_{n_{0}}(t)<W_{n_{1}}\cdot\rho_{n_{1}}(t) then
             ρn1(t)ρn1(t)Wn1\rho_{n_{1}}(t)\leftarrow\rho_{n_{1}}(t)\cdot W_{n_{1}};
             Wn1+1Wn1+1Wn1W_{n_{1}+1}\leftarrow W_{n_{1}+1}\cdot W_{n_{1}} \triangleright Update Next weight;
             Wn11W_{n_{1}}\leftarrow 1 \triangleright Reset weight Wn1W_{n_{1}} ;
             insert 𝒢n1\mathcal{G}_{n_{1}} in S0S_{0} before 𝒢n0\mathcal{G}_{n_{0}};
             n1n1+1n_{1}\leftarrow n_{1}+1 \triangleright Get next item from S1S_{1};
            
      else
             ρn0(t)ρn0(t)W0\rho_{n_{0}}(t)\leftarrow\rho_{n_{0}}(t)\cdot W_{0} \triangleright Update ρn0(t)\rho_{n_{0}}(t) ;
             n0n0+1n_{0}\leftarrow n_{0}+1 \triangleright Get next item from S0S_{0};
            
       end if
      
end while
Algorithm 3 Simplified Update and Merge Algorithm
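The following Python sketch is one possible rendering of Algorithm 3. It assumes that groups in S0S_{0} carry up-to-date posteriors, that any deferred weight lives in the optional field W of the head of S1S_{1}'s tail (defaulting to 1), and, for clarity, it compares fully updated posteriors on both sides of the merge. It is a sketch of the deferred-update idea, not the authors' code.

```python
# Hypothetical sketch of Algorithm 3 (update and merge). W0 and W1 are the
# Bayes weights for the groups in S0 and S1, e.g. W0 = q/(P_y*(q-p)+p) and
# W1 = p/(P_y*(q-p)+p) as in the listing above.
def update_and_merge(S0, S1, W0, W1):
    if S1:
        S1[0]["W"] = S1[0].get("W", 1.0) * W1  # park this round's weight on the head
    merged, i = [], 0
    for g0 in S0:
        # pull S1 groups forward while their (deferred) posterior is larger
        while i < len(S1) and W0 * g0["rho"] < S1[i]["W"] * S1[i]["rho"]:
            S1[i]["rho"] *= S1[i]["W"]         # realize the deferred weight
            if i + 1 < len(S1):                # push it to the next tail group
                S1[i + 1]["W"] = S1[i + 1].get("W", 1.0) * S1[i]["W"]
            S1[i]["W"] = 1.0
            merged.append(S1[i])
            i += 1
        g0["rho"] *= W0                        # S0 groups are always updated
        merged.append(g0)
    return merged + S1[i:]                     # sorted tail appended untouched
```

Only the groups that change relative order are touched; the tail of S1S_{1} is appended in constant time, which is the source of the per-symbol complexity reduction measured in Figure 6.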
Data: Group 𝒢i\mathcal{G}_{i} with Ni=1N_{i}=1 and ρi(yt)12\rho_{i}(y^{t})\geq\frac{1}{2}
Z(t)0Z(t)\leftarrow 0, NC21log2(1ϵϵ)N\leftarrow\left\lceil C_{2}^{-1}\log_{2}\left(\frac{1-\epsilon}{\epsilon}\right)\right\rceil;
if Ui(t)+(N1)C2log2(1ϵϵ)U_{i}(t)+(N-1)C_{2}\geq\log_{2}\left(\frac{1-\epsilon}{\epsilon}\right) then NN1N\leftarrow N-1;
while Z(t)0Z(t)\geq 0 && Z(t)<NZ(t)<N do
       Tx: Xt0X_{t}\leftarrow 0 if 𝒢(θ)==𝒢i\mathcal{G}^{(\theta)}==\mathcal{G}_{i} else Xt1X_{t}\leftarrow 1;
       Z(t)Z(t)+12YtZ(t)\leftarrow Z(t)+1-2\cdot Y_{t};
      
end while
if Z(t)==1Z(t)==-1 then
       Run update and merge \triangleright Algorithm 3;
       Go to Communication Phase \triangleright Algorithm 2;
      
else
       Rx: Get estimate θ^\hat{\theta} \triangleright Algorithm 6;
      
end if
Algorithm 4 Confirmation Phase Algorithm

VII-E Confirmation Phase

The Confirmation Phase is triggered when a candidate ii attains a posterior ρi(yt)12\rho_{i}(y^{t})\geq\frac{1}{2}. During this phase the transmitter attempts to boost ρi(yt)\rho_{i}(y^{t}), the posterior of candidate ii, past the 1ϵ1-\epsilon threshold if ii is the true message θ\theta; otherwise it attempts to drive this posterior below 12\frac{1}{2}. Clearly, the randomness of the channel could allow the posterior ρi(yt)\rho_{i}(y^{t}) to grow past 1ϵ1-\epsilon even if ii is the wrong message, resulting in a decoding error. Alternatively, the true message could still fall back to the communication phase, also because of channel errors. The confirmation phase lasts for as long as the posterior of the message that triggered it stays between 12\frac{1}{2} and 1ϵ1-\epsilon, or, equivalently, Ui(t)U_{i}(t) stays between 0 and ϵUlog2(1ϵϵ)\epsilon_{U}\triangleq\log_{2}\left(\frac{1-\epsilon}{\epsilon}\right).

There are no partitioning, update, or combining operations during the confirmation phase. If jj is the message in confirmation, then the partitioning is simply S0={j}S_{0}=\{j\}, S1=Ω{j}S_{1}=\Omega\setminus\{j\}. A single update is executed if a fallback occurs, setting ρi(yt)=ρi(yTn)i=1,,M\rho_{i}(y^{t})=\rho_{i}(y^{T_{n}})\quad\forall i=1,\dots,M, where nn is the index of the confirmation-phase round that just ended and TnT_{n} is the time at which it started. This is because every negative update that follows a positive update returns every ρi(yt)\rho_{i}(y^{t}) to its state at time t2t-2, as summarized in Claim 3 below. During the confirmation phase it therefore suffices to check whether Uj(t)ϵUU_{j}(t)\geq\epsilon_{U}, in which case the process terminates, or whether Uj(t)<0U_{j}(t)<0, in which case a fallback occurs.

Claim 3 (Confirmation Phase is a Discrete Markov Chain).

Let the partitioning of Ω\Omega at times t=st=s and t=s+1t=s+1 be fixed at S0={j}S_{0}=\{j\}, S1=Ω{j}S_{1}=\Omega\setminus\{j\}, and suppose that Ys+1=0Y_{s+1}=0 and Ys+2=1Y_{s+2}=1. Then for all i=1,,Mi=1,\dots,M, ρi(ys+2)=ρi(ys)\rho_{i}(y^{s+2})=\rho_{i}(y^{s}).

Proof:

See Appendix C.

During the confirmation phase we only need to count the difference between boosting updates and attenuating updates. Since Ui(t)U_{i}(t) changes in steps of magnitude C2C_{2}, there is a unique number NN such that Ui(Tn)+NC2ϵUU_{i}(T_{n})+NC_{2}\geq\epsilon_{U} and Ui(Tn)+(N1)C2<ϵUU_{i}(T_{n})+(N-1)C_{2}<\epsilon_{U}. Starting at time t=Tnt=T_{n}, since S0={j}S_{0}=\{j\}, any event Yt+1=0Y_{t+1}=0 is a boosting update that results in Ui(t+1)=Ui(t)+C2U_{i}(t+1)=U_{i}(t)+C_{2} and any event Yt+1=1Y_{t+1}=1 is an attenuating update that results in Ui(t+1)=Ui(t)C2U_{i}(t+1)=U_{i}(t)-C_{2}. A net of NN boosting updates is needed to reach Ui(Tn)+NC2U_{i}(T_{n})+NC_{2}. Let the difference between boosting and attenuating updates be Z(t)s=Tn+1t(12Ys)Z(t)\triangleq\sum_{s=T_{n}+1}^{t}(1-2Y_{s}). The transmission terminates at the first time τ\tau where Z(τ)=NZ(\tau)=N. However, a fallback occurs if Z(t)Z(t) reaches 1-1 before reaching NN. The value of NN can be computed as in Algorithm 4: let N1C21log2(1ϵϵ)N_{1}\triangleq\left\lceil C_{2}^{-1}\log_{2}\left(\frac{1-\epsilon}{\epsilon}\right)\right\rceil; then N=N11N=N_{1}-1 if Ui(Tn)+(N11)C2log2(1ϵϵ)U_{i}(T_{n})+(N_{1}-1)C_{2}\geq\log_{2}\left(\frac{1-\epsilon}{\epsilon}\right), and N=N1N=N_{1} otherwise. Once NN is computed, all that remains is to track Z(t)Z(t), where Z(t+1)=Z(t)+(12Yt+1)Z(t+1)=Z(t)+(1-2Y_{t+1}), and to return to the communication phase if Z(t)Z(t) reaches 1-1 or terminate the process if Z(t)Z(t) reaches NN.
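A minimal sketch of this bookkeeping in Python, under the assumption that channel outputs are supplied by an iterator and with helper names that are ours rather than the paper's, might look as follows:

```python
import math
import random

def confirmation_phase(U_i, C2, eps, channel):
    """Follow the random walk Z(t); return True to decode, False to fall back."""
    eps_U = math.log2((1 - eps) / eps)
    N = math.ceil(eps_U / C2)            # N1 in the text
    if U_i + (N - 1) * C2 >= eps_U:      # one fewer boosting update suffices
        N -= 1
    Z = 0
    while 0 <= Z < N:
        Z += 1 - 2 * next(channel)       # Y=0 boosts, Y=1 attenuates
    return Z == N

def bsc(p, x=0, seed=0):
    """Channel outputs for repeatedly transmitting x over a BSC(p)."""
    rng = random.Random(seed)
    while True:
        yield int(rng.random() < p) ^ x

p = 0.11                                 # the confirmed message sends X=0
print(confirmation_phase(0.3, math.log2((1 - p) / p), 1e-3, bsc(p)))
```

Here C2=log2(q/p)C_{2}=\log_{2}(q/p) is the step size of the walk, so with p=0.11p=0.11 and ϵ=103\epsilon=10^{-3} the sketch needs a net of three or four boosting updates to decode.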

Input: i,Ki,K \triangleright ii: message, KK: message length
Input: channel output: yKy_{K}
Result: tuple (h,n)(h,n) \triangleright (type, index)
ei=iyKe^{i}=i\oplus y^{K};
hj=0K1ejih\leftarrow\sum^{K-1}_{j=0}e^{i}_{j};
n0n\leftarrow 0;
chc\leftarrow h;
for j=0,,K1j=0,\dots,K-1 do
       if c==0c==0 then
            Break
       end if
      if eji==0e^{i}_{j}==0 then
             nn+(Kj1c1)n\leftarrow n+\binom{K-j-1}{c-1};
            
      else
             cc1c\leftarrow c-1;
            
       end if
      
end for
Algorithm 5 map message i to index n and type h
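A hypothetical Python rendering of this map, using the combinatorial number system, is given below; the MSB-first bit order is our assumption, and math.comb stands in for the precomputed triangular array of binomial coefficients.

```python
from math import comb

def message_to_type_index(i, y, K):
    """Map message i to (h, n): h is the Hamming distance between i and the
    received word y, and n is the rank of the error pattern among all
    K-bit patterns of weight h."""
    e = i ^ y                                          # error pattern
    bits = [(e >> (K - 1 - j)) & 1 for j in range(K)]  # MSB first (assumption)
    h = sum(bits)
    n, c = 0, h
    for j in range(K):
        if c == 0:
            break
        if bits[j] == 0:
            n += comb(K - j - 1, c - 1)   # skip the patterns that have a 1 here
        else:
            c -= 1
    return h, n
```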
Theorem 4.

[from [1]] Suppose that Ω={0,1}K\Omega=\{0,1\}^{K} and ρi(0)=2KiΩ\rho_{i}(0)=2^{-K}\,\forall i\in\Omega. Then, for t=1,,Kt=1,\dots,K, the partitioning rule S0={iΩbt(i)=0},S1={iΩbt(i)=1}S_{0}=\{i\in\Omega\mid b^{(i)}_{t}=0\},S_{1}=\{i\in\Omega\mid b^{(i)}_{t}=1\} results in systematic transmission, xK=θx^{K}=\theta, and achieves exactly equal partitioning P0=P1=12P_{0}=P_{1}=\frac{1}{2}.

Proof:

First note that if Ω={0,1}K\Omega=\{0,1\}^{K}, then for each t=1,,Kt=1,\dots,K, exactly half of the items iΩi\in\Omega have bit bt(i)=0b^{(i)}_{t}=0 and the other half have bit bt(i)=1b^{(i)}_{t}=1. The theorem holds for t=1t=1, since the partitioning S0={iΩb1(i)=0},S1={iΩb1(i)=1}S_{0}=\{i\in\Omega\mid b^{(i)}_{1}=0\},S_{1}=\{i\in\Omega\mid b^{(i)}_{1}=1\} places half the messages in each partition and all the messages have the same prior. For t=1,,K1t=1,\dots,K-1, note that the partitioning S0={iΩbt(i)=0},S1={iΩbt(i)=1}S_{0}=\{i\in\Omega\mid b^{(i)}_{t}=0\},S_{1}=\{i\in\Omega\mid b^{(i)}_{t}=1\} only considers the first tt bits b1(i),,bt(i)b^{(i)}_{1},\dots,b^{(i)}_{t} of each message ii. Thus, all items {jΩb1(j),,bt(j)=b1,,bt}\{j\in\Omega\mid b^{(j)}_{1},\dots,b^{(j)}_{t}=b_{1},\dots,b_{t}\} that share a prefix sequence b1,,btb_{1},\dots,b_{t} have shared the same partition at times s=1,,ts=1,\dots,t, and therefore share the same posterior. There are exactly 2t2^{t} such prefix groups, each containing 2Kt2^{K-t} messages. Also, exactly half of the items that share the sequence b1,,btb_{1},\dots,b_{t} have bit bt+1=0b_{t+1}=0 and are assigned to S0S_{0} at time t+1t+1, and the other half have bit bt+1=1b_{t+1}=1 and are assigned to S1S_{1} at time t+1t+1. Then S0S_{0} and S1S_{1} each hold half the items of every posterior group at time t+1t+1 for t=1,,K1t=1,\dots,K-1, and therefore equal partitioning also holds at times t=2,,Kt=2,\dots,K. ∎
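The statement is also easy to check numerically. The snippet below (illustrative only) starts from the uniform prior, partitions by bit tt, applies the Bayes update for an arbitrary sequence of received symbols, and confirms that P0=12P_{0}=\frac{1}{2} at every step; the value p=0.11p=0.11 matches the channel used elsewhere in this section.

```python
import itertools
import random

K, p = 6, 0.11
q = 1 - p
rng = random.Random(1)
msgs = list(itertools.product([0, 1], repeat=K))
rho = {m: 2.0 ** -K for m in msgs}          # uniform prior on {0,1}^K
for t in range(K):
    P0 = sum(rho[m] for m in msgs if m[t] == 0)
    assert abs(P0 - 0.5) < 1e-12            # equal partitioning at time t+1
    y = rng.randint(0, 1)                   # an arbitrary channel output
    match = P0 if y == 0 else 1.0 - P0      # prior probability of matching y
    Z = q * match + p * (1.0 - match)       # Pr(Y_{t+1} = y)
    for m in msgs:
        rho[m] *= (q if m[t] == y else p) / Z
```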

Data: Index LiL_{i}, Hamming weight hih_{i}
Result: θ^\hat{\theta} \triangleright Receiver Estimate of θ\theta
θ^yK\hat{\theta}\leftarrow y^{K}, nLin\leftarrow L_{i}, hhih\leftarrow h_{i};
for j=0,,K1j=0,\dots,K-1 do
       if h==0h==0 then
            Break
       end if
      if n<(K1jh1)n<\binom{K-1-j}{h-1} then
             θ^j¬θ^j\hat{\theta}_{j}\leftarrow\neg\hat{\theta}_{j};
             hh1h\leftarrow h-1;
            
      else
            
            nn(Kj1h1)n\leftarrow n-\binom{K-j-1}{h-1};
            
       end if
      
end for
Algorithm 6 get estimate θ^\hat{\theta} from index n and type h
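The inverse map can be sketched the same way; together with message_to_type_index from the sketch following Algorithm 5, it round-trips over all messages, a property one can verify directly for small KK:

```python
from math import comb

def type_index_to_message(h, n, y, K):
    """Recover the message from its type h and index n, given the received y."""
    bits = [0] * K
    for j in range(K):
        if h == 0:
            break
        if n < comb(K - 1 - j, h - 1):
            bits[j] = 1                    # the error pattern has a 1 here
            h -= 1
        else:
            n -= comb(K - 1 - j, h - 1)
    e = 0
    for b in bits:
        e = (e << 1) | b                   # MSB first, matching the encoder sketch
    return e ^ y

K, y = 8, 0b10110100
assert all(type_index_to_message(*message_to_type_index(i, y, K), y, K) == i
           for i in range(2 ** K))
```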

VII-F Complexity of the SPM-TOP Algorithm

The memory complexity of the SPM-TOP algorithm is of order O(K2)O(K^{2}) because we use a triangular array that stores the binomial coefficients (ji)\binom{j}{i} for 0ijK0\leq i\leq j\leq K. The algorithm itself stores a list of groups that grows linearly with KK, since the list size is bounded by the decoding time τ\tau.

The time complexity of the SPM-TOP algorithm is of order O(K2)O(K^{2}). To obtain this result, note that the total number of groups that the system tracks is bounded by the transmission index tt. At each transmission tt, the partitioning, update, and combine operations require visiting every group at most once. Furthermore, because of the complexity reduction described in Sec. VII-D, the system executes operations for only a fraction of all the groups that are stored. The time complexity at each transmission is then of order O(K)O(K), with a small constant coefficient. The number of transmissions required is approximately K/CK/C as the scheme approaches capacity. A linear number of transmissions, each of which requires a linear number of operations, results in an overall quadratic complexity, that is, order O(K2)O(K^{2}), for fixed channel capacity CC.

The KK systematic transmissions only require storing the received bits, and in the confirmation phase we just add each symbol YtY_{t} to the running sum. The complexity of these two phases is then of order O(K)O(K). Therefore, the O(K2)O(K^{2}) complexity arises only from the communication phase.
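As an illustration of this memory cost (a sketch under our assumptions, not the paper's code), the triangular array can be filled once with Pascal's rule, after which every binomial coefficient needed by Algorithms 5 and 6 is a constant-time lookup:

```python
def pascal_triangle(K):
    """C[j][i] = binom(j, i) for 0 <= i <= j <= K; O(K^2) stored integers."""
    C = [[1] * (j + 1) for j in range(K + 1)]
    for j in range(2, K + 1):
        for i in range(1, j):
            C[j][i] = C[j - 1][i - 1] + C[j - 1][i]
    return C

C = pascal_triangle(1000)   # roughly half a million entries for K = 1000
assert C[10][3] == 120
```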

VIII SPM-TOP simulation results

Refer to caption
Figure 4: SPM-TOP rate and FER performance as a function of average blocklength. The red lines are rate and FER for the standard version of the SPM-TOP algorithm. The purple lines are rate and FER for a randomized version of the SPM-TOP algorithm. The orange dash-dot line shows the bound defined in Sec. VI-B for a systematic encoder; the yellow dash-dot line shows the new bound introduced in Thm. 3 for a system that enforces the SEAD constraints with a uniform initial distribution; the black dash-dot line shows the SED lower bound by Yang et al. [2], equation (7), Sec. II; and the green dash-dot line shows Polyanskiy’s VLF lower bound for a stop-feedback system. The simulation consists of 1M Monte Carlo trials; for all curves the channel has capacity C=0.50C=0.50, where p0.110p\approx 0.110, and the decoding threshold is ϵ=103\epsilon=10^{-3}.

We validate the theoretical achievability bounds in Sec. III and Sec. VI via simulations of the SPM-TOP algorithm. Figure 4 shows simulated rate vs. blocklength performance curves of the SPM-TOP algorithm and the corresponding frame error rate (FER). The plots show simulated rate curves for the standard SPM-TOP algorithm and a randomized version, as well as their associated error rates. The rate for the standard SPM-TOP algorithm is shown by the red curve with dots at the top, and the corresponding FER is the red, jagged curve at the bottom. The standard SPM-TOP algorithm stops when a message ii attains Ui(t)log2((1ϵ)/ϵ)U_{i}(t)\geq\log_{2}\left((1-\epsilon)/\epsilon\right). Note that the FER is well below the threshold ϵ\epsilon. The randomized version of the SPM-TOP algorithm implements a stopping rule that randomly alternates between the standard rule, which stops when a message ii attains Ui(t)log2((1ϵ)/ϵ)U_{i}(t)\geq\log_{2}\left((1-\epsilon)/\epsilon\right), and stopping when a message attains Ui(t)log2((1ϵ)/ϵ)C2U_{i}(t)\geq\log_{2}\left((1-\epsilon)/\epsilon\right)-C_{2}, which requires one less correctly received transmission. The simulated rate of the randomized SPM-TOP version is the purple curve above the standard version. This randomized version aims to obtain a higher rate by forcing the FER to be close to the threshold ϵ\epsilon rather than upper bounded by ϵ\epsilon. Note that the corresponding FER, the horizontal purple curve with dots, is very close to the threshold ϵ=103\epsilon=10^{-3}, but not necessarily bounded above by it. The simulation consisted of 10610^{6} trials for each value of K=1,,100K=1,\dots,100, for a decoding threshold ϵ=103\epsilon=10^{-3} and a channel with capacity C=0.50C=0.50. The simulated rate curves attain an average rate that approaches capacity rapidly and stays above all the theoretical bounds shown in Figure 4, which we describe next.

The two rate lower bounds introduced in this paper are shown in Figure 4 for the same channel capacity C=0.50C=0.50 and threshold ϵ=103\epsilon=10^{-3} used in the simulations. The highest lower bound, labeled Sys. Lower Bound, is the bound developed in Sec. VI-B for a system that uses a systematic phase to turn a uniform initial distribution into a binomial distribution and then enforces the SEAD constraints. The next highest bound, labeled SEAD Rate Bound, is the lower bound (32) introduced in Thm. 3 for a system that enforces the SEAD constraints. This bound is a slight improvement over the SED lower bound by Yang et al. [2], which we show for comparison under the label Yang’s Lower Bound. Also for comparison, we show Polyanskiy’s VLF lower bound, developed for a stop-feedback system. Since a stop-feedback system is less capable than a full-feedback system, we expect the lower bounds for the full-feedback system to approach capacity faster than the VLF bound, which is what all three of these bounds achieve.

Refer to caption
Figure 5: Average time as a function of KK for values of K=10,20,,1000K=10,20,\dots,1000. The yellow line shows the time in milliseconds per frame transmission and the green line shows the time in microseconds per symbol transmission. The number of trials used to obtain this data is 100,000100,000 and the channel capacity is C=0.50C=0.50.

The empirical time complexity of the SPM-TOP algorithm simulations vs. message size KK is shown in Figures 5 and 6. All simulations were performed on a 2019 MacBook Pro laptop with a 2.4 GHz 8-core i9 processor and 16 GB of RAM, with transmitter and receiver operating alternately on the same processor. First, Figure 5 shows the average time taken per message, in milliseconds (yellow line), and per transmitted symbol, in microseconds (green line). The average time per symbol drops very fast as the message size grows from 10 to 200 and then slowly stabilizes. This drop could be explained by the initialization time needed for each new message. However, the computer temperature and other processes managed by the computer’s OS could also play a role in the time measurements. For a more accurate characterization of the complexity’s evolution as a function of message size KK, we count the number of operations executed during the transmission of each symbol and each message; these operations are probability checks for partitioning before transmitting a symbol and probability updates after the transmission of a symbol.

The average number of probability-check and probability-update operations per message vs. message size KK is shown in the top of Figure 6. To compare the data with a quadratic line, we fitted the parabola 0.0154K2+4.4316K25.99050.0154K^{2}+4.4316K-25.9905 to the simulated update-merge data. Also for reference, the blue line shows the function 0.17K1.690.17K^{1.69} to highlight that the complexity per message is below quadratic in the region of interest. The average number of probability-check and probability-update operations per transmitted symbol is shown in the bottom of Figure 6. The number of operations per symbol falls well below KK, even though the number of probabilities that the system tracks is larger than KK; this number is K+1K+1 at t=Kt=K and only increases with tt. Note that for K=1000K=1000 both averages are below 4040. These results show that the complexity of the SPM-TOP algorithm allows for fast execution times and validate the theory that the complexity order as a function of KK is linear for each transmission and quadratic for the whole message.

Refer to caption
Figure 6: The plots show the average number of links visited for update-merge operations (orange solid line) and for partitioning the list of groups into S0S_{0} and S1S_{1} (red solid line), as a function of message size KK. The top plot shows the average for the entire transmission of a frame, while the bottom plot shows the average for a single transmission. The top plot includes a quadratic line (green line with dots) fitted to the update-merge curve for comparison with the simulation data. The top plot also includes the function 0.17K1.690.17K^{1.69} as a reference to show that the complexity order of the number of links visited during the transmission of a frame is below quadratic. This data was obtained with a simulation of 100,000100,000 trials for a channel with capacity C=0.5C=0.5.

IX Conclusion

Naghshvar et al. [10] established the “small enough difference” (SED) rule for posterior matching partitioning and used martingale theory to study asymptotic behavior and also showed how to develop a non-asymptotic lower bound on achievable rate. Yang et al. [2] significantly improved the non-asymptotic achievable rate bound using martingale theory for the communication phase and a Markov model for the confirmation phase, still maintaining the SED rule. However, partitioning algorithms that enforce the SED rule require a complex process of swapping messages back and forth between the two message sets S0S_{0} and S1S_{1} and updating the posteriors.

To reduce complexity, this paper replaces SED with the small enough absolute difference (SEAD) partitioning constraints. The SEAD constraints are more relaxed than SED, and they admit the TOP partitioning rule. In this way, SEAD allows a low complexity approach that organizes messages according to their type, i.e. their Hamming distance from the received word, orders messages according to their posterior, and partitions the messages with a simple threshold without requiring any swaps.

The main analytical results show that the SEAD constraints suffice to achieve at least the same lower bound that Yang et al. [2] showed to be achievable by SED. Moreover, the new SEAD analysis establishes achievable rate bounds higher than those found by Yang et al. [2]. The analysis does not use martingale theory for the communication phase and applies a surrogate channel technique to tighten the results. An initial systematic transmission further increases the achievable rate bound.

The simplified encoder associated with SEAD has a complexity below order O(K2)O(K^{2}) and allows simulations for message sizes of at least 1000 bits. These simulations reveal actual achievable rates that are sufficiently far above our new achievable-rate bounds that further analytical investigation to obtain even tighter achievable-rate bounds is warranted. From a practical perspective, the simulation results themselves provide new lower bounds on the achievable rates possible for the BSC with full feedback. For example, with an average block size of 200.97 bits corresponding to k=99k=99 message bits, simulation results for a target codeword error rate of 10310^{-3} show a rate of R=0.493R=0.493 for the channel with capacity 0.5, i.e., 99% of capacity.

X Acknowledgements

The authors would like to thank Hengjie Yang and Minghao Pan for their help with this manuscript.

Appendix A Proof of Claim 1

Proof that Ui(t+1)Ui(t)=C2\mid U_{i}(t+1)-U_{i}(t)\mid=C_{2} is equivalent to S0={j}S_{0}=\{j\} (or S1={j}S_{1}=\{j\}). First we prove the converse: if the set containing jj is not a singleton, then constraint (19) does not hold. Without loss of generality, assume jS0j\in S_{0} and suppose that there exists ljl\neq j such that lS0l\in S_{0}. Since P0ρj(yt)+ρl(yt)P_{0}\geq\rho_{j}(y^{t})+\rho_{l}(y^{t}), we have Δ=2P012ρj(yt)+2ρl(yt)12ρj(yt)1>0\Delta=2P_{0}-1\geq 2\rho_{j}(y^{t})+2\rho_{l}(y^{t})-1\geq 2\rho_{j}(y^{t})-1>0. By equation (123), when jSyj\in S_{y}, then:

Uj\displaystyle U_{j} (t+1)Uj(t)\displaystyle(t+1)-U_{j}(t)
=log2(2q)log2(1+(qp)Δρj(yt)1ρj(yt))\displaystyle=\log_{2}(2q)-\log_{2}\left(1+(q-p)\frac{\Delta-\rho_{j}(y^{t})}{1-\rho_{j}(y^{t})}\right) (149)
log2(2q)log2(1+(qp)2ρj(yt)+2ρl(t)1ρj(yt)1ρj(yt))\displaystyle\leq\log_{2}(2q)\!-\!\log_{2}\left(1\!+\!(q\!-\!p)\frac{2\rho_{j}(y^{t})+2\rho_{l}(t)\!-\!1\!-\!\rho_{j}(y^{t})}{1\!-\!\rho_{j}(y^{t})}\right)
=log2(2q)log2(1(qp)+(qp)2ρl(t)1ρj(yt))\displaystyle=\log_{2}(2q)\!-\!\log_{2}\left(1\!-\!(q\!-\!p)\!+\!(q\!-\!p)\frac{2\rho_{l}(t)}{1\!-\!\rho_{j}(y^{t})}\right)
<log2(2q)log2(1(qp))=C2.\displaystyle<\log_{2}(2q)-\log_{2}\left(1-(q-p)\right)=C_{2}\,. (150)

Note from equation (149) that Uj(t+1)Uj(t)U_{j}(t+1)-U_{j}(t) decreases with Δ\Delta; therefore, replacing Δ\Delta with a lower bound gives an upper bound on the difference (149). For a lower bound, note that Δ1\Delta\leq 1 and that setting Δ=1\Delta=1 in (149) results in Uj(t+1)Uj(t)=0U_{j}(t+1)-U_{j}(t)=0. In the case where Yt+1=Xt+11Y_{t+1}=X_{t+1}\oplus 1, that is, jSycj\in S_{y^{c}}, by equation (123) the difference Uj(t+1)Uj(t)U_{j}(t+1)-U_{j}(t) is:

Uj\displaystyle U_{j} (t+1)Uj(t)\displaystyle(t+1)-U_{j}(t)
=log2(2p)log2(1(qp)Δρj(yt)1ρj(yt))\displaystyle=\log_{2}(2p)-\log_{2}\left(1-(q-p)\frac{\Delta-\rho_{j}(y^{t})}{1-\rho_{j}(y^{t})}\right) (151)
log2(2p)log2(1(qp)2ρj(yt)+2ρl(t)1ρj(yt)1ρj(yt))\displaystyle\geq\log_{2}(2p)\!-\!\log_{2}\left(\!1\!-(q\!-\!p)\frac{2\rho_{j}(y^{t})\!+\!2\rho_{l}(t)\!-\!1-\!\rho_{j}(y^{t})}{1-\rho_{j}(y^{t})}\right) (152)
=log2(2p)log2(1+(qp)(12ρl(t)1ρj(yt)))\displaystyle=\log_{2}(2p)\!-\!\log_{2}\left(\!1\!+\!(q\!-\!p)\left(1\!-\!\frac{2\rho_{l}(t)}{1\!-\!\rho_{j}(y^{t})}\right)\right) (153)
>log2(2p)log2(2q)=C2.\displaystyle>\log_{2}(2p)-\log_{2}\left(2q\right)=-C_{2}\,. (154)

To prove that if the set containing jj is a singleton then |Uj(t+1)Uj(t)|=C2|U_{j}(t+1)-U_{j}(t)|=C_{2}, note that S0={j}Δ=2ρj(yt)1S_{0}=\{j\}\implies\Delta=2\rho_{j}(y^{t})-1. The inequalities therefore become equalities, and equations (149) and (151) evaluate to C2C_{2} and C2-C_{2}, respectively.
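As a numeric illustration of the claim (not a substitute for the proof), one can check that with a singleton S0={j}S_{0}=\{j\} the log likelihood ratio of jj moves by exactly C2C_{2} in either direction:

```python
import math

p = 0.11
q = 1 - p
C2 = math.log2(q / p)                  # equals log2(2q) - log2(2p)

def log_odds(r):
    return math.log2(r / (1 - r))

rho = 0.62                             # posterior of the lone message j in S0
for y in (0, 1):
    Z = q * rho + p * (1 - rho) if y == 0 else p * rho + q * (1 - rho)
    rho_new = (q if y == 0 else p) * rho / Z
    assert abs(abs(log_odds(rho_new) - log_odds(rho)) - C2) < 1e-12
```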

Appendix B Proof of existence of Ui(t)U^{\prime}_{i}(t) in Thm. 2

The proof that a process like the one described in Thm. 2 exists consists of constructing one such process. Define the process Ui(t)U^{\prime}_{i}(t) by Ui(t)=Ui(t)U^{\prime}_{i}(t)=U_{i}(t) if Ytϵ(i,n)Y^{t}\in\mathcal{B}^{(i,n)}_{\epsilon}, and define Ui(t+1)Ui(t)+Wi(t+1)U^{\prime}_{i}(t+1)\triangleq U^{\prime}_{i}(t)+W^{\prime}_{i}(t+1), where the update Wi(t+1)W^{\prime}_{i}(t+1) is chosen as follows. If Yt𝒴ϵ(i,n)Y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon}, then, to enforce constraints (16) and (20), Wi(t+1)W^{\prime}_{i}(t+1) is defined by (162). When Yt𝒴ϵ(i,n)Y^{t}\notin\mathcal{Y}^{(i,n)}_{\epsilon}, in which case Ui(t)0U^{\prime}_{i}(t)\geq 0, then, to enforce constraints (11) and (19), the update Wi(t+1)W^{\prime}_{i}(t+1) is given by C2(𝟙(Yt+1=0)𝟙(Yt+1=1))C_{2}(\mathbbm{1}_{(Y_{t+1}=0)}-\mathbbm{1}_{(Y_{t+1}=1)}).

Denote the transmitted symbol Xt+1X_{t+1} by XX and the received symbol Yt+1Y_{t+1} by YY, and let Xc=X1X^{c}=X\oplus 1. The symbol YY can be either XX or XcX^{c}, and we can have either {iS0}\{i\in S_{0}\} or {iS1}\{i\in S_{1}\}. These cases combine into four possible events. Define Wi(t+1)Ui(t+1)Ui(t)W_{i}(t+1)\triangleq U_{i}(t+1)-U_{i}(t), and note that Wi(t+1)W_{i}(t+1) can be derived from equation (123) as follows:

Wi(t+1)={log2(2q)+aiif iS0,Y=Xlog2(2p)+biif iS0,Y=Xclog2(2p)+ciif iS1,Y=Xclog2(2q)+diif iS1,Y=X,W_{i}(t\!+\!1)=\begin{cases}\log_{2}(2q)\!+\!a_{i}\quad\text{if }i\in S_{0},Y=X\\ \log_{2}(2p)\!+\!b_{i}\quad\text{if }i\in S_{0},Y=X^{c}\\ \log_{2}(2p)\!+\!c_{i}\quad\text{if }i\in S_{1},Y=X^{c}\\ \log_{2}(2q)\!+\!d_{i}\quad\text{if }i\in S_{1},Y=X\end{cases}\,, (155)

where aia_{i}, bib_{i}, cic_{i} and did_{i} are given by:

ai\displaystyle a_{i} =log2(1(qp)ρj(yt)Δ1ρj(yt))\displaystyle=-\log_{2}\left(1-(q-p)\frac{\rho_{j}(y^{t})-\Delta}{1-\rho_{j}(y^{t})}\right) (156)
bi\displaystyle b_{i} =log2(1+(qp)ρj(yt)Δ1ρj(yt))\displaystyle=-\log_{2}\left(1+(q-p)\frac{\rho_{j}(y^{t})-\Delta}{1-\rho_{j}(y^{t})}\right) (157)
ci\displaystyle c_{i} =log2(1+(qp)ρj(yt)+Δ1ρj(yt))\displaystyle=-\log_{2}\left(1+(q-p)\frac{\rho_{j}(y^{t})+\Delta}{1-\rho_{j}(y^{t})}\right) (158)
di\displaystyle d_{i} =log2(1(qp)ρj(yt)+Δ1ρj(yt)).\displaystyle=-\log_{2}\left(1-(q-p)\frac{\rho_{j}(y^{t})+\Delta}{1-\rho_{j}(y^{t})}\right)\,. (159)

Let aia^{\prime}_{i} and did^{\prime}_{i} be defined by:

ai\displaystyle a^{\prime}_{i} 𝟙(Δ<0)1Δ1+Δlog2(1(qp)2Δ)pqbi,\triangleq\mathbb{1}_{(\Delta<0)}\frac{1-\Delta}{1+\Delta}\log_{2}\left(1-(q-p)^{2}\Delta\right)-\frac{p}{q}b_{i}, (160)
di\displaystyle d^{\prime}_{i} 𝟙(di<0)di𝟙(di0)pqci,\displaystyle\triangleq\mathbb{1}_{(d_{i}<0)}d_{i}-\mathbb{1}_{(d_{i}\geq 0)}\frac{p}{q}c_{i}\,, (161)

then, define the update Wi(t+1)W^{\prime}_{i}(t+1) by:

Wi(t+1)={log2(2q)+aiif iS0,Y=Xlog2(2p)+biif iS0,Y=Xclog2(2p)+ciif iS1,Y=Xclog2(2q)+diif iS1,Y=X.W^{\prime}_{i}(t\!+\!1)=\begin{cases}\log_{2}(2q)\!+\!a^{\prime}_{i}\quad\text{if }i\in S_{0},Y=X\\ \log_{2}(2p)\!+\!b_{i}\quad\text{if }i\in S_{0},Y=X^{c}\\ \log_{2}(2p)\!+\!c_{i}\quad\text{if }i\in S_{1},Y=X^{c}\\ \log_{2}(2q)\!+\!d^{\prime}_{i}\quad\text{if }i\in S_{1},Y=X\end{cases}\,. (162)

We need to show that constraints (16)–(20) of Thm. 1 and constraints (26) and (28) are satisfied. When Ui(t)0U^{\prime}_{i}(t)\geq 0, since Wi(t+1)W^{\prime}_{i}(t+1) is defined in the same manner as Wi(t+1)W_{i}(t+1), constraints (18) and (19) are satisfied.

The proof that Ui(t)U^{\prime}_{i}(t) satisfies constraints (20), (16) and (26) is split into the case where Δ0\Delta\geq 0 and the case where Δ<0\Delta<0.

B-A Case Δ0\Delta\geq 0.

It suffices to show that for all yt𝒴ϵ(i,n)y^{t}\in\mathcal{Y}^{(i,n)}_{\epsilon} and for all i=1,,Mi=1,\dots,M, if Δ0\Delta\geq 0, then 𝖤[Wi(t+1)θ=i,Yt=yt]=C\mathsf{E}[W^{\prime}_{i}(t+1)\mid\theta=i,Y^{t}=y^{t}]=C. Since C>0C>0, constraint (16) is satisfied, and any weighted average of these expectations adds up to CC, so constraint (20) is satisfied as well.

When Δ0\Delta\geq 0, the indicator term in (160) vanishes, so ai=pqbia^{\prime}_{i}=-\frac{p}{q}b_{i}; also ρi(yt)+Δ>0\rho_{i}(y^{t})+\Delta>0 implies di>0d_{i}>0, so di=pqcid^{\prime}_{i}=-\frac{p}{q}c_{i}. The expectation 𝖤[Wi(t+1)θ=i,Yt=yt]\mathsf{E}[W^{\prime}_{i}(t+1)\mid\theta=i,Y^{t}=y^{t}] can be computed from (123), where the form of the update depends on whether iS0i\in S_{0} or iS1i\in S_{1}. The expectation is given by (163) or (164), respectively:

qlog2(2q)pbi+plog2(2p)+pbi=Cif iS0\displaystyle q\log_{2}(2q)-pb_{i}+p\log_{2}(2p)+pb_{i}=C\;\text{if }i\in S_{0} (163)
qlog2(2q)pci+plog2(2p)+pci=Cif iS1.\displaystyle q\log_{2}(2q)-pc_{i}+p\log_{2}(2p)+pc_{i}=C\;\text{if }i\in S_{1}\,. (164)

This proves that constraints (20) and (16) are satisfied. To prove that constraint (26) is satisfied, we need to show that Wi(t+1)Wi(t+1)W^{\prime}_{i}(t+1)\leq W_{i}(t+1). It suffices to compare the cases where Wi(t+1)Wi(t+1)W^{\prime}_{i}(t+1)\neq W_{i}(t+1), which occur when Y=XY=X, and there it suffices to compare the terms in which the pairs differ, that is, to show that aiaia^{\prime}_{i}\leq a_{i} and didid^{\prime}_{i}\leq d_{i}. For this comparison, express aia_{i} and did_{i} as positive logarithms as follows:

ai\displaystyle a_{i} =log2(1+(qp)(ρi(yt)Δ)1ρi(yt)(qp)(ρi(yt)Δ))\displaystyle=\log_{2}\left(1+\frac{(q-p)(\rho_{i}(y^{t})-\Delta)}{1-\rho_{i}(y^{t})-(q-p)(\rho_{i}(y^{t})-\Delta)}\right) (165)
ai\displaystyle a^{\prime}_{i} =pqlog2(1+(qp)(ρi(yt)Δ)1ρi(yt))\displaystyle=\frac{p}{q}\log_{2}\left(1+\frac{(q-p)(\rho_{i}(y^{t})-\Delta)}{1-\rho_{i}(y^{t})}\right) (166)
di\displaystyle d_{i} =log2(1+(qp)(Δ+ρi(yt))1ρi(yt)(qp)(Δ+ρi(yt)))\displaystyle=\log_{2}\left(1+\frac{(q-p)(\Delta+\rho_{i}(y^{t}))}{1-\rho_{i}(y^{t})-(q-p)(\Delta+\rho_{i}(y^{t}))}\right) (167)
di\displaystyle d^{\prime}_{i} =pqlog2(1+(qp)(ρi(yt)+Δ)1ρi(yt)).\displaystyle=\frac{p}{q}\log_{2}\left(1+\frac{(q-p)(\rho_{i}(y^{t})+\Delta)}{1-\rho_{i}(y^{t})}\right)\,. (168)

Since p12p\leq\frac{1}{2} implies pq1\frac{p}{q}\leq 1, we only need to show that the argument of the logarithm in (165) is greater than that of (166), and similarly for (167) and (168). All arguments share the additive term 11, so it suffices that inequalities (169) and (170) hold:

(qp)(ρi(yt)Δ)1ρi(yt)(qp)(ρi(yt)Δ)\displaystyle\frac{(q-p)(\rho_{i}(y^{t})-\Delta)}{1\!-\!\rho_{i}(y^{t})\!-\!(q\!-\!p)(\rho_{i}(y^{t})\!-\!\Delta)} (qp)(ρi(yt)Δ)1ρi(yt)\displaystyle\geq\frac{(q-p)(\rho_{i}(y^{t})-\Delta)}{1\!-\!\rho_{i}(y^{t})} (169)
(qp)(Δ+ρi(yt))1ρi(yt)(qp)(ρi(yt)+Δ)\displaystyle\frac{(q-p)(\Delta+\rho_{i}(y^{t}))}{1\!-\!\rho_{i}(y^{t})\!-\!(q\!-\!p)(\rho_{i}(y^{t})\!+\!\Delta)} (qp)(ρi(yt)+Δ)1ρi(yt).\displaystyle\geq\frac{(q-p)(\rho_{i}(y^{t})+\Delta)}{1\!-\!\rho_{i}(y^{t})}\,. (170)

The numerators of both inequalities are the same and positive, since qp>0q-p>0, ρi(yt)Δ0\rho_{i}(y^{t})-\Delta\geq 0 when iS0i\in S_{0}, and ρi(yt)+Δ0\rho_{i}(y^{t})+\Delta\geq 0 in all cases. The denominators on the left-hand side are smaller than those on the right-hand side by exactly the numerators, and therefore the inequalities hold.

B-B Case Δ<0\Delta<0.

Next we show that constraints (20), (16) and (26) are satisfied when Δ<0\Delta<0. In the case where ρi(yt)+Δ>0\rho_{i}(y^{t})+\Delta>0, did^{\prime}_{i} is still pqcidi-\frac{p}{q}c_{i}\leq d_{i} by equation (167). However, whenever Δ<0\Delta<0, the non-negative term 1Δ1+Δlog2(1(qp)2Δ)\frac{1-\Delta}{1+\Delta}\log_{2}\left(1-(q-p)^{2}\Delta\right) is added to aia^{\prime}_{i}. To show that constraint (20) holds, recall from (126) that:

i=1M\displaystyle\sum_{i=1}^{M} ρi𝖤[Wi(t+1)θ=i,Yt=yt]C\displaystyle\rho_{i}\mathsf{E}[W_{i}(t+1)\mid\theta=i,Y^{t}=y^{t}]-C
=\displaystyle= iS0ρi(yt)(qai+pbi)+iS1ρi(yt)(qdi+pci)\sum_{i\in S_{0}}\rho_{i}(y^{t})\left(qa_{i}+\!pb_{i}\right)+\!\sum_{i\in S_{1}}\rho_{i}(y^{t})\left(qd_{i}+\!pc_{i}\right) (171)
\displaystyle\geq iS0ρi(yt)(qai+pbi)\displaystyle\sum_{i\in S_{0}}\rho_{i}(y^{t})\left(qa_{i}+\!pb_{i}\right)
iS1ρi(yt)log2(1(qp)2ρi(yt)α1ρi(yt))\displaystyle-\sum_{i\in S_{1}}\rho_{i}(y^{t})\log_{2}\left(1-(q-p)^{2}\frac{\rho_{i}(y^{t})-\alpha}{1-\rho_{i}(y^{t})}\right) (172)
\displaystyle\geq iS0ρi(yt)(qai+pbi)1+α2log2(1+(qp)2α).\displaystyle\sum_{i\in S_{0}}\!\rho_{i}(y^{t})\left(qa_{i}\!+\!pb_{i}\right)\!-\!\frac{1\!+\!\alpha}{2}\log_{2}\left(1\!+\!(q\!-\!p)^{2}\alpha\right). (173)

To obtain 𝖤[Wi(t+1)t,θ=i]\mathsf{E}[W^{\prime}_{i}(t+1)\mid\mathcal{F}_{t},\theta=i], replace aia_{i} by aia^{\prime}_{i} in equation (173) and let ei1+α1αlog2(1+(qp)2α)e_{i}\triangleq\frac{1+\alpha}{1-\alpha}\log_{2}\left(1+(q-p)^{2}\alpha\right). Then ai=eipqbia^{\prime}_{i}=e_{i}-\frac{p}{q}b_{i} and qai+pbi=qeiqa^{\prime}_{i}+pb_{i}=qe_{i}; substituting yields:

i=1M\displaystyle\sum_{i=1}^{M} ρi𝖤[Wi(t+1)θ=i,Yt=yt]C\displaystyle\rho_{i}\mathsf{E}[W^{\prime}_{i}(t+1)\mid\theta=i,Y^{t}=y^{t}]-C
=\displaystyle= iS0ρi(yt)qei+iS1ρi(yt)(qdi+pci)\displaystyle\sum_{i\in S_{0}}\!\rho_{i}(y^{t})qe_{i}\!+\!\sum_{i\in S_{1}}\!\rho_{i}(y^{t})\left(qd_{i}+pc_{i}\right) (174)
\displaystyle\geq qe1iS0ρi(yt)1+α2log2(1+(qp)2α)\displaystyle qe_{1}\sum_{i\in S_{0}}\rho_{i}(y^{t})-\frac{1+\alpha}{2}\log_{2}\left(1+(q-p)^{2}\alpha\right) (175)
=\displaystyle= 1α2e11+α2log2(1+(qp)2α)\displaystyle\frac{1-\alpha}{2}e_{1}-\frac{1+\alpha}{2}\log_{2}\left(1+(q-p)^{2}\alpha\right) (176)
=\displaystyle= (1α21+α1α1+α2)log2(1+(qp)2α)\displaystyle\left(\frac{1-\alpha}{2}\frac{1+\alpha}{1-\alpha}-\frac{1+\alpha}{2}\right)\log_{2}\left(1+(q-p)^{2}\alpha\right) (177)
=\displaystyle=  0.\displaystyle\;0\,. (178)

To show that 𝖤[Wi(t+1)θ=i,Yt=yt]0\mathsf{E}[W^{\prime}_{i}(t+1)\mid\theta=i,Y^{t}=y^{t}]\geq 0 (constraint (16)), note that did^{\prime}_{i} is either unchanged from did_{i} or the same as when Δ0\Delta\geq 0, and therefore the constraint holds for iS1i\in S_{1}. For iS0i\in S_{0}, note from the first term of equation (174) that 𝖤[Wi(t+1)θ=i,Yt=yt]C=qei\mathsf{E}[W^{\prime}_{i}(t+1)\mid\theta=i,Y^{t}=y^{t}]-C=qe_{i}. Since ei0e_{i}\geq 0, the expectation is at least CC.

We also need to show that Wi(t+1)Wi(t+1)W^{\prime}_{i}(t+1)\leq W_{i}(t+1) (constraint (26)), for which it suffices to show that aiaia^{\prime}_{i}\leq a_{i} and didid^{\prime}_{i}\leq d_{i}. Again, since did^{\prime}_{i} is either did_{i} or the same as when Δ0\Delta\geq 0, we only need to show that ai=eipqbiaia^{\prime}_{i}=e_{i}-\frac{p}{q}b_{i}\leq a_{i}. It suffices to show that for a positive scalar γ\gamma:

γ(q(eipqbi)+pbi)\displaystyle\gamma\left(q\left(e_{i}-\frac{p}{q}b_{i}\right)+pb_{i}\right) γ(qai+pbi).\displaystyle\leq\gamma\left(qa_{i}+pb_{i}\right)\,. (179)

When Δ<0\Delta<0, then ei>0e_{i}>0. We have that:

q(eipqbi)+pbi=1+α1αlog2(1+(qp)2α).\displaystyle q\left(e_{i}-\frac{p}{q}b_{i}\right)+\!pb_{i}=\frac{1+\alpha}{1-\alpha}\log_{2}\left(1+(q-p)^{2}\alpha\right)\,. (180)

Recall from equation (129) that:

qai+pbilog2(1(qp)2ρmin+α1ρmin),\displaystyle qa_{i}+pb_{i}\geq-\log_{2}\left(1-(q-p)^{2}\frac{\rho_{\min}+\alpha}{1-\rho_{\min}}\right)\,, (181)

and let γ=1α2\gamma=\frac{1-\alpha}{2}; then the scaled difference between the left and right terms in (179) is given by:

1α2\displaystyle\frac{1-\alpha}{2} (qai+pbi)1+α2log2(1+(qp)2α)\displaystyle\left(qa_{i}+pb_{i}\right)-\frac{1+\alpha}{2}\log_{2}(1+(q-p)^{2}\alpha)
\displaystyle\geq 1α2log2(1(qp)2ρmin+α1ρmin)\displaystyle-\frac{1-\alpha}{2}\log_{2}\left(1-(q-p)^{2}\frac{\rho_{\min}+\alpha}{1-\rho_{\min}}\right)
1+α2log2(1(qp)2αρmin11ρmin)\displaystyle-\frac{1+\alpha}{2}\log_{2}\left(1-(q-p)^{2}\alpha\frac{\rho_{\min}-1}{1-\rho_{\min}}\right) (182)
\displaystyle\geq log2(1(qp)21ρmin(ρmin1+α22α2))\displaystyle-\log_{2}\left(\!1\!-\!\frac{(q\!-\!p)^{2}}{1\!-\!\rho_{\min}}\left(\rho_{\min}\frac{1\!+\!\alpha^{2}}{2}\!-\!\alpha^{2}\right)\right) (183)
\displaystyle\geq log2(1(qp)21ρmin(ρmin1+α22ρminα))\displaystyle-\log_{2}\left(\!1\!-\!\frac{(q\!-\!p)^{2}}{1\!-\!\rho_{\min}}\left(\rho_{\min}\frac{1\!+\!\alpha^{2}}{2}\!-\!\rho_{\min}\alpha\right)\right) (184)
=\displaystyle= log2(1(qp)21ρmin(ρmin(1α)22))0.\displaystyle-\log_{2}\left(\!1\!-\!\frac{(q\!-\!p)^{2}}{1\!-\!\rho_{\min}}\left(\rho_{\min}\frac{(1\!-\!\alpha)^{2}}{2}\right)\right)\geq 0\,. (185)

Equation (182) follows from (181). In (183), Jensen's inequality is used, where
1α2(ρmin+α)+1+α2α(ρmin1)=α2+ρmin1+α22\frac{1-\alpha}{2}(\rho_{\min}+\alpha)+\frac{1+\alpha}{2}\alpha(\rho_{\min}-1)=-\alpha^{2}+\rho_{\min}\frac{1+\alpha^{2}}{2}. In (184), note that αρminα2ρminα\alpha\leq\rho_{\min}\implies\alpha^{2}\leq\rho_{\min}\alpha. Finally, 12α+α2=(1α)201-2\alpha+\alpha^{2}=(1-\alpha)^{2}\geq 0. We conclude that eipqbiaie_{i}-\frac{p}{q}b_{i}\leq a_{i}, and therefore Wi(t+1)Wi(t+1)W^{\prime}_{i}(t+1)\leq W_{i}(t+1).

B-C Proof of constraint (28)

Finally, we need to show that constraint (28) is satisfied, that is: Ui(Tn+1)pq(Ui(Tn)C2)1qlog2(2q)U^{\prime}_{i}(T_{n+1})-\frac{p}{q}(U_{i}(T_{n})-C_{2})\leq\frac{1}{q}\log_{2}(2q). We have shown that the update Wi(t)W^{\prime}_{i}(t) allows the process Ui(t)U^{\prime}_{i}(t) to meet constraints (16)–(19) of Thm. 1 and constraint (26). Note that by the definition of TnT^{\prime}_{n} in Thm. 2 it is possible that the process Ui(t)U^{\prime}_{i}(t) restarts when Ui(t)U_{i}(t) falls back from confirmation, at a time t0(n+1)t_{0}^{(n+1)}, without ever attaining Ui(t)0U^{\prime}_{i}(t)\geq 0. We can construct a third process Ui′′(t)U^{\prime\prime}_{i}(t) that preserves all the properties of Ui(t)U^{\prime}_{i}(t) and satisfies Ui′′(t)0U^{\prime\prime}_{i}(t)\geq 0 whenever Ui(t)0U_{i}(t)\geq 0. The process Ui′′(t)U^{\prime\prime}_{i}(t) is initialized by Ui′′(t0(n))=Ui(t0(n))U^{\prime\prime}_{i}(t_{0}^{(n)})=U_{i}(t_{0}^{(n)}) and evolves as Ui′′(t+1)=Ui′′(t)+Wi′′(t+1)U^{\prime\prime}_{i}(t+1)=U^{\prime\prime}_{i}(t)+W^{\prime\prime}_{i}(t+1), with step size Wi′′(t+1)=max{min{Wi(t+1),Ui(t)},Wi(t+1)}W^{\prime\prime}_{i}(t+1)=\max\{\min\{W_{i}(t+1),-U_{i}(t)\},W^{\prime}_{i}(t+1)\}. The inner minimum guarantees that Ui′′(t)U^{\prime\prime}_{i}(t) reaches 0 if Ui(t)U_{i}(t) does, and the outer maximum guarantees that the step size is at least that of Ui(t)U^{\prime}_{i}(t). Then the processes Ui(t)U_{i}(t) and Ui′′(t)U^{\prime\prime}_{i}(t) cross 0 at the same time and share the same values when Ui(t)<0U_{i}(t)<0, that is:

Ui(t+1)0\displaystyle U_{i}(t\!+\!1)\geq\!0\quad \displaystyle\implies Ui′′(t+1)0\displaystyle U^{\prime\prime}_{i}(t+1)\geq 0 (186)
Ui(t)0\displaystyle U_{i}(t)\leq 0 \displaystyle\implies Ui′′(t)=Ui(t).\displaystyle U^{\prime\prime}_{i}(t)=U_{i}(t)\,. (187)

Using the process Ui′′(t)U^{\prime\prime}_{i}(t) and equation (187), the expression in constraint (28) becomes:

Ui′′(\displaystyle U^{\prime\prime}_{i}( t+1)pq(Ui(t+1)C2)\displaystyle t+1)-\frac{p}{q}\left(U_{i}(t+1)-C_{2}\right)
=\displaystyle= Ui(t)(1pq)+Wi′′(t+1)pq(Wi(t+1)C2).\displaystyle U_{i}(t)\!\left(\!1\!-\!\frac{p}{q}\right)\!+\!W^{\prime\prime}_{i}(t\!+\!1)\!-\!\frac{p}{q}\left(W_{i}(t\!+\!1)\!-\!C_{2}\right). (188)

In the case where Wi′′(t+1)=Ui(t)W^{\prime\prime}_{i}(t+1)=-U_{i}(t), we have that
Ui′′(t+1)=0U^{\prime\prime}_{i}(t+1)=0 and Ui(t+1)[0,C2]U_{i}(t+1)\in[0,C_{2}], then:

Ui′′(t+1)\displaystyle U^{\prime\prime}_{i}(t+1) pq(Ui(t+1)C2)\displaystyle\!-\!\frac{p}{q}(U_{i}(t\!+\!1)\!-\!C_{2})
=pq(Ui(t)+Wi(t+1)C2)\displaystyle=-\frac{p}{q}(U_{i}(t)+W_{i}(t+1)-C_{2}) (189)
=pqC2pq(Ui(t)+Wi(t+1))\displaystyle=\frac{p}{q}C_{2}-\frac{p}{q}(U_{i}(t)+W_{i}(t+1)) (190)
pqC21qlog2(2q).\displaystyle\leq\frac{p}{q}C_{2}\leq\frac{1}{q}\log_{2}(2q)\,. (191)

The first inequality in (191) follows since Wi′′(t+1)=Ui(t)W^{\prime\prime}_{i}(t+1)=-U_{i}(t) implies Wi(t+1)Ui(t)W_{i}(t+1)\geq-U_{i}(t), so that Ui(t)+Wi(t+1)0U_{i}(t)+W_{i}(t+1)\geq 0; the second inequality holds because:

log2(2q)pC2\displaystyle\log_{2}(2q)-pC_{2} =1+(1p)log2(1p)+plog2(p)\displaystyle=1+(1-p)\log_{2}(1-p)+p\log_{2}(p)
=C0.\displaystyle=C\geq 0\,. (192)

For the case where Wi′′(t+1)>Ui(t)W^{\prime\prime}_{i}(t+1)>-U_{i}(t), we solve a constrained maximization of expression (188), where the constraint is Ui(t)<0U_{i}(t)<0 (or ρi(yt)<12\rho_{i}(y^{t})<\frac{1}{2}). For simplicity we subtract the constant 1qlog2(2q)\frac{1}{q}\log_{2}(2q) from (188).

Let i{1,,M}i\in\{1,\dots,M\} be arbitrary and let ρρi(yt)\rho\triangleq\rho_{i}(y^{t}), and α|Δ|\alpha\triangleq|\Delta|. Using the definitions of Wi(t)W_{i}(t) and Wi′′(t)W^{\prime\prime}_{i}(t) in (155), (162) and (160)-(161), we explicitly find expressions for Ui′′(t+1)pq(Ui(t+1)C2)U^{\prime\prime}_{i}(t+1)\!-\!\frac{p}{q}(U_{i}(t\!+\!1)\!-\!C_{2}) in terms of ρ\rho, pp, qq and α\alpha.

When {iS0}{Δ<0}\{i\in S_{0}\}\cap\{\Delta<0\} or {iS1}{Δ0}\{i\in S_{1}\}\cap\{\Delta\geq 0\} the expression is given by:

log2(ρ1ρ)(1pq)pqlog2(2q)pqlog2(2p)\displaystyle\log_{2}\left(\frac{\rho}{1\!-\!\rho}\right)\left(1\!-\!\frac{p}{q}\right)-\frac{p}{q}\log_{2}(2q)-\frac{p}{q}\log_{2}(2p)
+𝟙Δ<01+α1αlog2(1+(qp)2α)\displaystyle+\mathbbm{1}_{\Delta<0}\frac{1+\alpha}{1-\alpha}\log_{2}(1+(q-p)^{2}\alpha) (193)
+pqlog2(1+(qp)ρ+α1ρ)+pqlog2(1(qp)ρ+α1ρ),\displaystyle+\frac{p}{q}\log_{2}\left(1\!+\!(q\!-\!p)\frac{\rho\!+\!\alpha}{1\!-\!\rho}\right)\!+\!\frac{p}{q}\log_{2}\left(1\!-\!(q\!-\!p)\frac{\rho\!+\!\alpha}{1\!-\!\rho}\right)\,,

and when {iS0}{Δ0}\{i\in S_{0}\}\cap\{\Delta\geq 0\} or {iS1}{Δ<0}\{i\in S_{1}\}\cap\{\Delta<0\} it is given by:

log2(ρ1ρ)(1pq)pqlog2(2q)pqlog2(2p)\displaystyle\log_{2}\left(\frac{\rho}{1\!-\!\rho}\right)\left(1\!-\!\frac{p}{q}\right)-\frac{p}{q}\log_{2}(2q)-\frac{p}{q}\log_{2}(2p) (194)
+pqlog2(1+(qp)ρα1ρ)+pqlog2(1(qp)ρα1ρ).\displaystyle+\frac{p}{q}\log_{2}\left(1\!+\!(q\!-\!p)\frac{\rho\!-\!\alpha}{1\!-\!\rho}\right)+\frac{p}{q}\log_{2}\left(1\!-\!(q\!-\!p)\frac{\rho\!-\!\alpha}{1\!-\!\rho}\right)\,.

B-D Maximizing (193)

The maximum of (193) occurs when Δ<0\Delta<0, since the term with the indicator function is non-negative. Since α13\alpha\leq\frac{1}{3}, we have 1+α1α2\frac{1+\alpha}{1-\alpha}\leq 2, and we proceed to solve:

maximize f(ρ,α)\displaystyle\textbf{maximize }f(\rho,\alpha) (195)
subject to ρ12,α12ρ,\displaystyle\textbf{subject to }\rho\leq\frac{1}{2},\alpha\leq 1-2\rho\,, (196)

where f(ρ,α)f(\rho,\alpha) is defined by:

f(ρ,α)log2(ρ1ρ)(1pq)\displaystyle f(\rho,\alpha)\triangleq\log_{2}\left(\frac{\rho}{1\!-\!\rho}\right)\left(1\!-\!\frac{p}{q}\right) (197)
+2log2(1+(qp)2α)pqlog2(2q)pqlog2(2p)\displaystyle+2\log_{2}(1+(q-p)^{2}\alpha)-\frac{p}{q}\log_{2}(2q)-\frac{p}{q}\log_{2}(2p)
+pqlog2(1+(qp)ρ+α1ρ)+pqlog2(1(qp)ρ+α1ρ).\displaystyle+\frac{p}{q}\log_{2}\left(1\!+\!(q\!-\!p)\frac{\rho\!+\!\alpha}{1\!-\!\rho}\right)\!+\!\frac{p}{q}\log_{2}\left(1\!-\!(q\!-\!p)\frac{\rho\!+\!\alpha}{1\!-\!\rho}\right)\,.

First we show that ff is increasing in ρ\rho by showing that ddρf0\frac{d}{d\rho}f\geq 0. Note that ddρρ1ρ=1(1ρ)2\frac{d}{d\rho}\frac{\rho}{1-\rho}=\frac{1}{(1-\rho)^{2}} and ddρρ+α1ρ=1+α(1ρ)2\frac{d}{d\rho}\frac{\rho+\alpha}{1-\rho}=\frac{1+\alpha}{(1-\rho)^{2}}, so that:

ρ\displaystyle\frac{\partial}{\partial\rho} ln(2)f(ρ,α)=qpq1(1ρ)ρ+\displaystyle\ln(2)f(\rho,\alpha)=\frac{q-p}{q}\frac{1}{(1-\rho)\rho}+
pq1+α(1ρ)2((qp)1+(qp)ρ+α1ρ(qp)1(qp)ρ+α1ρ).\displaystyle\frac{p}{q}\frac{1\!+\!\alpha}{(1\!-\!\rho)^{2}}\left(\frac{(q-p)}{1+(q\!-\!p)\frac{\rho+\alpha}{1-\rho}}-\frac{(q-p)}{1-(q\!-\!p)\frac{\rho+\alpha}{1-\rho}}\right)\,. (198)

Factor out the positive constant 1qqp1ρ\frac{1}{q}\frac{q-p}{1-\rho}, to obtain:

1ρ+p1+α1ρ(11+(qp)ρ+α1ρ11(qp)ρ+α1ρ)\displaystyle\frac{1}{\rho}+p\frac{1\!+\!\alpha}{1\!-\!\rho}\left(\frac{1}{1+(q\!-\!p)\frac{\rho+\alpha}{1-\rho}}-\frac{1}{1-(q\!-\!p)\frac{\rho+\alpha}{1-\rho}}\right)
=1ρ+p1+α1ρ(1(qp)ρ+α1ρ)(1+(qp)ρ+α1ρ)1(qp)2(ρ+α1ρ)2\displaystyle=\frac{1}{\rho}+p\frac{1\!+\!\alpha}{1\!-\!\rho}\frac{\left(1\!-\!(q\!-\!p)\frac{\rho\!+\!\alpha}{1\!-\!\rho}\right)-\left(1\!+\!(q\!-\!p)\frac{\rho\!+\!\alpha}{1\!-\!\rho}\right)}{1-(q-p)^{2}\left(\frac{\rho+\alpha}{1-\rho}\right)^{2}} (199)
=1ρ2p(1+α)(qp)(ρ+α)(1ρ)2(qp)2(ρ+α)2=\displaystyle=\frac{1}{\rho}-2\frac{p(1+\alpha)(q-p)(\rho+\alpha)}{(1-\rho)^{2}-(q-p)^{2}(\rho+\alpha)^{2}}=
(1ρ)2(qp)2(ρ+α)22pρ(1+α)(qp)(ρ+α)ρ(1ρ)2ρ(qp)2(ρ+α)2.\displaystyle\frac{(1\!-\!\rho)^{2}\!-\!(q\!-\!p)^{2}(\rho\!+\!\alpha)^{2}\!-\!2p\rho(1\!+\!\alpha)(q\!-\!p)(\rho\!+\!\alpha)}{\rho(1-\rho)^{2}-\rho(q-p)^{2}(\rho+\alpha)^{2}}. (200)

It suffices to show that the numerator of equation (200) is non-negative. Since the numerator decreases in α\alpha and α12ρ\alpha\leq 1-2\rho, substituting α=12ρ\alpha=1-2\rho yields the lower bound:

(1ρ)2\displaystyle(1-\rho)^{2} (qp)2(ρ+α)22pρ(1+α)(qp)(ρ+α)\displaystyle-(q-p)^{2}(\rho+\alpha)^{2}-2p\rho(1+\alpha)(q-p)(\rho+\alpha)
\displaystyle\geq (1ρ)2(qp)2(ρ+12ρ)2\displaystyle(1-\rho)^{2}-(q-p)^{2}(\rho+1-2\rho)^{2}
2pρ(1+12ρ)(qp)(ρ+12ρ)\displaystyle-2p\rho(1+1-2\rho)(q-p)(\rho+1-2\rho) (201)
=\displaystyle= (1ρ)2(qp)2(1ρ)2\displaystyle(1-\rho)^{2}-(q-p)^{2}(1-\rho)^{2}
2pρ(22ρ)(qp)(1ρ)\displaystyle-2p\rho(2-2\rho)(q-p)(1-\rho) (202)
=\displaystyle= (1ρ)2(1(qp)2(qp)4pρ)\displaystyle(1-\rho)^{2}(1-(q-p)^{2}-(q-p)4p\rho) (203)
=\displaystyle= (1ρ)24p(qρ(qp))\displaystyle(1-\rho)^{2}4p(q-\rho(q-p))
>4p(1ρ)2(qρ)>0.\displaystyle>4p(1-\rho)^{2}(q-\rho)>0\,. (204)

In equation (204) we have used (qp)2=(12p)2=14p+4p2=14pq(q-p)^{2}=(1-2p)^{2}=1-4p+4p^{2}=1-4pq and ρ(qp)<ρq<ρ\rho(q-p)<\rho q<\rho. Since ρf>0\frac{\partial}{\partial\rho}f>0, ff is increasing in ρ\rho and we can replace ρ\rho by 1α2\frac{1-\alpha}{2} for an upper bound. Since ρ+α1ρ=1α2+α11α2=1+α1+α=1\frac{\rho+\alpha}{1-\rho}=\frac{\frac{1-\alpha}{2}+\alpha}{1-\frac{1-\alpha}{2}}=\frac{1+\alpha}{1+\alpha}=1, the function f(α)f(1α2,α)f(\alpha)\triangleq f\left(\frac{1-\alpha}{2},\alpha\right) is given by:

f(α)=\displaystyle f(\alpha)= log2(1α1+α)(1pq)+2log2(1+(qp)2α)\displaystyle\log_{2}\left(\frac{1\!-\!\alpha}{1\!+\!\alpha}\right)\left(1\!-\!\frac{p}{q}\right)\!+\!2\log_{2}(1\!+\!(q\!-\!p)^{2}\alpha) (205)
pqlog2(2q)pqlog2(2p)\displaystyle-\frac{p}{q}\log_{2}(2q)-\frac{p}{q}\log_{2}(2p)
+pqlog2(1+(qp))+pqlog2(1(qp))\displaystyle+\frac{p}{q}\log_{2}(1\!+\!(q\!-\!p))+\frac{p}{q}\log_{2}(1-(q-p)) (206)
=\displaystyle= log2(1α1+α)(1pq)+2log2(1+(qp)2α).\displaystyle\log_{2}\left(\frac{1\!-\!\alpha}{1\!+\!\alpha}\right)\!\left(1\!-\!\frac{p}{q}\right)\!+\!2\log_{2}(1\!+\!(q\!-\!p)^{2}\alpha). (207)

To complete the proof, it suffices to show that the last expression decreases in α\alpha:

ddα\displaystyle\frac{d}{d\alpha} ln(2)f(α)\displaystyle\ln(2)f(\alpha)
=\displaystyle= (1pq)1+α1α(1+α)(1α)(1+α)2+2(qp)21+(qp)2α\displaystyle\left(1\!-\!\frac{p}{q}\right)\frac{1\!+\!\alpha}{1\!-\!\alpha}\frac{-(1\!+\!\alpha)\!-\!(1\!-\!\alpha)}{(1+\alpha)^{2}}\!+\!\frac{2(q-p)^{2}}{1\!+\!(q\!-\!p)^{2}\alpha} (208)
=\displaystyle= 21qqp1α2+2(qp)21+(qp)2α\displaystyle-2\frac{1}{q}\frac{q-p}{1-\alpha^{2}}+2\frac{(q-p)^{2}}{1+(q-p)^{2}\alpha} (209)
=\displaystyle= 2(qp)(1q11α2+qp1+(qp)2α)\displaystyle 2(q-p)\left(-\frac{1}{q}\frac{1}{1-\alpha^{2}}+\frac{q-p}{1+(q-p)^{2}\alpha}\right) (210)
2(qp)(1+pqq+p)\displaystyle\leq-2(q-p)\left(1+\frac{p}{q}-q+p\right)
=2p(qp)(2+1q)<0.\displaystyle=-2p(q-p)\left(2+\frac{1}{q}\right)<0\,. (211)

Since ff is decreasing in α\alpha, the maximum of equation (193) is 0, attained at α=0\alpha=0.

B-E Maximizing (194)

The expression (194) is given by g(ρ,α)pqlog2(2q)pqlog2(2p)g(\rho,\alpha)-\frac{p}{q}\log_{2}(2q)-\frac{p}{q}\log_{2}(2p), where g(ρ,α)g(\rho,\alpha) is defined by:

g\displaystyle g (ρ,α)log2(ρ1ρ)(1pq)\displaystyle(\rho,\alpha)\triangleq\log_{2}\left(\frac{\rho}{1\!-\!\rho}\right)\left(1\!-\!\frac{p}{q}\right) (212)
+pqlog2(1+(qp)ρα1ρ)+pqlog2(1(qp)ρα1ρ).\displaystyle+\frac{p}{q}\log_{2}\left(1\!+\!(q\!-\!p)\frac{\rho\!-\!\alpha}{1\!-\!\rho}\right)+\frac{p}{q}\log_{2}\left(1\!-\!(q\!-\!p)\frac{\rho\!-\!\alpha}{1\!-\!\rho}\right)\,.

We proceed to solve:

maximize g(ρ,α)\displaystyle g(\rho,\alpha) (213)
subject to ρ12,α12ρ.\displaystyle\rho\leq\frac{1}{2},\alpha\leq 1-2\rho\,. (214)

Combining the last two terms we obtain:

g(ρ,α)=\displaystyle g(\rho,\alpha)= log2(ρ1ρ)(1pq)\displaystyle\log_{2}\left(\frac{\rho}{1-\rho}\right)\left(1-\frac{p}{q}\right)
+pqlog2(1(qp)2(ρα1ρ)2).\displaystyle+\frac{p}{q}\log_{2}\left(1-(q-p)^{2}\left(\frac{\rho-\alpha}{1-\rho}\right)^{2}\right)\,. (215)

The first term increases with ρ\rho, and the second one decreases as the quotient ρα1ρ\frac{\rho-\alpha}{1-\rho} increases in absolute value. For ρ13\rho\leq\frac{1}{3} it is possible to have ρ=α\rho=\alpha, which leaves only the first term of (215). However, for ρ13\rho\geq\frac{1}{3} the quotient is non-negative because α12ρρ\alpha\leq 1-2\rho\leq\rho. The smallest value of the quotient is then 3ρ11ρ=2ρ1ρ1\frac{3\rho-1}{1-\rho}=\frac{2\rho}{1-\rho}-1, with square 14ρ(12ρ)(1ρ)21-\frac{4\rho(1-2\rho)}{(1-\rho)^{2}}. Let g(ρ)g(\rho) be defined by equation (216); then the maximum of g(ρ,α)g(\rho,\alpha) is bounded by the maximum of g(ρ)g(\rho), where:

g(ρ)\displaystyle g(\rho) log2(ρ1ρ)(1pq)\displaystyle\triangleq\log_{2}\left(\frac{\rho}{1-\rho}\right)\left(1-\frac{p}{q}\right) (216)
+\displaystyle+ pqlog2(1(qp)3ρ11ρ)+pqlog2(1+(qp)3ρ11ρ).\displaystyle\frac{p}{q}\log_{2}\left(1\!-\!(q\!-\!p)\frac{3\rho\!-\!1}{1\!-\!\rho}\right)+\frac{p}{q}\log_{2}\left(1\!+\!(q\!-\!p)\frac{3\rho\!-\!1}{1\!-\!\rho}\right)\,.

To determine the max, we find the behavior of g(ρ)g(\rho) by taking the first derivative:

ddρ\displaystyle\frac{d}{d\rho} g(ρ)=qpqln(2)1ρρ1(1ρ)2\displaystyle g(\rho)=\frac{q-p}{q\ln(2)}\frac{1-\rho}{\rho}\frac{1}{(1-\rho)^{2}}
+\displaystyle+ pqln(2)((qp)2(1ρ)21+(qp)3ρ11ρ(qp)2(1ρ)21(qp)3ρ11ρ).\displaystyle\frac{p}{q\ln(2)}\left(\frac{(q-p)\frac{2}{(1-\rho)^{2}}}{1+(q-p)\frac{3\rho-1}{1-\rho}}-\frac{(q-p)\frac{2}{(1-\rho)^{2}}}{1-(q-p)\frac{3\rho-1}{1-\rho}}\right)\,. (217)

Then, scale by the positive term (1ρ)2qpqln(2)\frac{(1-\rho)^{2}}{q-p}q\ln(2) to obtain:

g\displaystyle g^{\prime} (ρ)(1ρ)2qpqln(2)\displaystyle(\rho)\frac{(1-\rho)^{2}}{q-p}q\ln(2)
=\displaystyle= 1ρρp21(qp)3ρ11ρ+p21+(qp)3ρ11ρ\displaystyle\frac{1-\rho}{\rho}-p\frac{2}{1-(q-p)\frac{3\rho-1}{1-\rho}}+p\frac{2}{1+(q-p)\frac{3\rho-1}{1-\rho}} (218)
=\displaystyle= 1ρρ+2p1(qp)3ρ11ρ1(qp)3ρ11ρ1(qp)3ρ11ρ\displaystyle\frac{1-\rho}{\rho}+2p\frac{1-(q-p)\frac{3\rho-1}{1-\rho}-1-(q-p)\frac{3\rho-1}{1-\rho}}{1-(q-p)\frac{3\rho-1}{1-\rho}} (219)
=\displaystyle= 1ρρ4p(qp)3ρ11ρ1(qp)3ρ11ρ\displaystyle\frac{1-\rho}{\rho}-4p(q-p)\frac{\frac{3\rho-1}{1-\rho}}{1-(q-p)\frac{3\rho-1}{1-\rho}} (220)
=\displaystyle= 1ρρ4p(qp)3ρ11(qp)(3ρ1)\displaystyle\frac{1-\rho}{\rho}-4p(q-p)\frac{3\rho-1}{1-(q-p)(3\rho-1)} (221)
=\displaystyle= 1ρ(qp)(3ρ1)(1ρ+4pρ)ρ(qp)ρ(3ρ1).\displaystyle\frac{1-\rho-(q-p)(3\rho-1)(1-\rho+4p\rho)}{\rho-(q-p)\rho(3\rho-1)}\,. (222)

To show that g0g^{\prime}\geq 0 on [13,12][\frac{1}{3},\frac{1}{2}], it suffices to show that the numerator of equation (222) is positive:

1ρ\displaystyle 1-\rho (qp)(3ρ1)(1ρ+4pρ)\displaystyle-(q-p)(3\rho-1)(1-\rho+4p\rho)
\displaystyle\geq 112(qp)(321)(1ρ(14p))\displaystyle 1-\frac{1}{2}-(q-p)\left(\frac{3}{2}-1\right)(1-\rho(1-4p)) (223)
=\displaystyle= 1212p2(1ρ(14p))\displaystyle\frac{1}{2}-\frac{1-2p}{2}(1-\rho(1-4p)) (224)
=\displaystyle= 1212p2+12p2ρ(14p)\displaystyle\frac{1}{2}-\frac{1-2p}{2}+\frac{1-2p}{2}\rho(1-4p) (225)
=\displaystyle= 1212+p+ρ(12p)(14p)2\displaystyle\frac{1}{2}-\frac{1}{2}+p+\rho\frac{(1-2p)(1-4p)}{2} (226)
=\displaystyle= p+ρ(12p)(14p)2.\displaystyle p+\rho\frac{(1-2p)(1-4p)}{2}\,. (227)

When p14p\leq\frac{1}{4}, then (14p)>0(1-4p)>0 and the second term is non-negative and therefore the derivative is positive. When p>14p>\frac{1}{4} we have 14p11-4p\geq-1, 012p<120\leq 1-2p<\frac{1}{2}, then:

p+ρ(12p)(14p)214ρ14=1ρ418>0,\displaystyle p+\rho\frac{(1-2p)(1-4p)}{2}\geq\frac{1}{4}-\rho\frac{1}{4}=\frac{1-\rho}{4}\geq\frac{1}{8}>0\,,

therefore, gg is increasing in ρ\rho and the maximum is attained at ρ=12\rho=\frac{1}{2}, where ρ1ρ=1\frac{\rho}{1-\rho}=1. The maximum is given by:

g(12)=\displaystyle g\left(\frac{1}{2}\right)= log2(1)(1pq)\displaystyle\log_{2}\left(1\right)\left(1-\frac{p}{q}\right)
+pqlog2(1+(qp))+pqlog2(1(qp))\displaystyle+\frac{p}{q}\log_{2}\left(1\!+\!(q\!-\!p)\right)+\frac{p}{q}\log_{2}\left(1\!-\!(q\!-\!p)\right)
=\displaystyle= pqlog2(2q)+pqlog2(2p).\displaystyle\frac{p}{q}\log_{2}(2q)+\frac{p}{q}\log_{2}(2p)\,. (228)

Then the maximum of g(ρ,α)pqlog2(2q)pqlog2(2p)g(\rho,\alpha)-\frac{p}{q}\log_{2}(2q)-\frac{p}{q}\log_{2}(2p) is zero. Since the maxima of both expressions (193) and (194) are zero, we conclude that Ui(t+1)pq(Ui(t+1)C2)1qlog2(2q)U^{\prime}_{i}(t+1)-\frac{p}{q}\left(U_{i}(t+1)-C_{2}\right)\leq\frac{1}{q}\log_{2}(2q).
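As a numeric sanity check of this conclusion (illustrative only), both expressions can be evaluated on a grid over the feasible region; the sketch below uses p=0.11p=0.11, the constraints ρ12\rho\leq\frac{1}{2} and α12ρ\alpha\leq 1-2\rho, and the additional α13\alpha\leq\frac{1}{3} used for (193).

```python
import math

p = 0.11
q = 1 - p
lg = math.log2

def f(rho, alpha):                  # equation (197); upper-bounds (193)
    r = (rho + alpha) / (1 - rho)
    return (lg(rho / (1 - rho)) * (1 - p / q)
            + 2 * lg(1 + (q - p) ** 2 * alpha)
            - (p / q) * (lg(2 * q) + lg(2 * p))
            + (p / q) * (lg(1 + (q - p) * r) + lg(1 - (q - p) * r)))

def g_expr(rho, alpha):             # expression (194), via g of (212)
    r = (rho - alpha) / (1 - rho)
    return (lg(rho / (1 - rho)) * (1 - p / q)
            - (p / q) * (lg(2 * q) + lg(2 * p))
            + (p / q) * (lg(1 + (q - p) * r) + lg(1 - (q - p) * r)))

rhos = [i / 200 for i in range(1, 100)]
fracs = (0.0, 0.25, 0.5, 0.75, 1.0)
worst = max(max(f(r, a * min(1 / 3, 1 - 2 * r)) for r in rhos for a in fracs),
            max(g_expr(r, a * (1 - 2 * r)) for r in rhos for a in fracs))
assert worst <= 1e-9                # both maxima are (numerically) at most zero
```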

Finally, we prove the last claim, that B=1qlog2(2q)B=\frac{1}{q}\log_{2}(2q) is the smallest value for a system that enforces the SED constraint. It suffices to note that the surrogate process described in [2, Sec. V-E] is a strict martingale. A process Ui(t)U^{\prime}_{i}(t) with a lower BB value would not comply with constraint (9) and therefore would also fail to meet constraint (20).

Appendix C Proof of Claim 3

Proof:

Suppose that for times t=st=s and t=s+1t=s+1 the partitioning is fixed at S0={j}S_{0}=\{j\} and S1=Ω{j}S_{1}=\Omega\setminus\{j\}. Suppose also that Ys+1=0Y_{s+1}=0 and Ys+2=1Y_{s+2}=1. We need to show that for all i=1,,Mi=1,\dots,M we have ρi(ys)=ρi(ys+2)\rho_{i}(y^{s})=\rho_{i}(y^{s+2}). Using the update formula (121), at time t=s+1t=s+1 we have, for iji\neq j:

ρi(ys+1)\displaystyle\rho_{i}(y^{s+1}) =pρi(ys)qρj(ys)+p(1ρj(ys))=pρi(ys)ρj(ys)(qp)+p\displaystyle=\frac{p\rho_{i}(y^{s})}{q\rho_{j}(y^{s})\!+\!p(1\!-\!\rho_{j}(y^{s}))}=\frac{p\rho_{i}(y^{s})}{\rho_{j}(y^{s})(q\!-\!p)\!+\!p} (229)
ρj(ys+1)\displaystyle\rho_{j}(y^{s+1}) =qρj(ys)qρj(ys)+p(1ρj(ys))=qρj(ys)ρj(ys)(qp)+p.=\frac{q\rho_{j}(y^{s})}{q\rho_{j}(y^{s})+p(1-\rho_{j}(y^{s}))}=\frac{q\rho_{j}(y^{s})}{\rho_{j}(y^{s})(q\!-\!p)\!+\!p}\,. (230)

At time t=s+2t=s+2, since Ys+2=1Y_{s+2}=1, equation (121) for iji\neq j results in:

ρi(ys+2)\displaystyle\rho_{i}(y^{s+2}) =qρi(ys+1)(pq)ρj(ys+1)+q=qpρi(ys)ρj(ys)(qp)+p(pq)qρj(ys)ρj(ys)(qp)+p+q\displaystyle=\frac{q\rho_{i}(y^{s+1})}{(p\!-\!q)\rho_{j}(y^{s+1})\!+\!q}=\frac{q\frac{p\rho_{i}(y^{s})}{\rho_{j}(y^{s})(q\!-\!p)\!+\!p}}{(p\!-\!q)\frac{q\rho_{j}(y^{s})}{\rho_{j}(y^{s})(q\!-\!p)\!+\!p}\!+\!q}
=qpρi(ys)(pq)qρj(ys)+q(ρj(ys)(qp)+p)\displaystyle=\frac{qp\rho_{i}(y^{s})}{(p\!-\!q)q\rho_{j}(y^{s})\!+\!q(\rho_{j}(y^{s})(q\!-\!p)\!+\!p)} (231)
=qpρi(ys)(qp)qρj(ys)+(qp)qρj(ys)+qp\displaystyle=\frac{qp\rho_{i}(y^{s})}{-(q\!-\!p)q\rho_{j}(y^{s})\!+\!(q\!-\!p)q\rho_{j}(y^{s})\!+\!qp}
=qpρi(ys)qp=ρi(ys).\displaystyle=\frac{qp\rho_{i}(y^{s})}{qp}=\rho_{i}(y^{s})\,. (232)

And for i=ji=j equation (121) results in:

ρj(ys+2)\displaystyle\rho_{j}(y^{s+2}) =pρj(ys+1)(pq)ρj(ys+1)+q=pqρj(ys)ρj(ys)(qp)+p(pq)qρj(ys)ρj(ys)(qp)+p+q\displaystyle=\frac{p\rho_{j}(y^{s+1})}{(p\!-\!q)\rho_{j}(y^{s+1})\!+\!q}=\frac{p\frac{q\rho_{j}(y^{s})}{\rho_{j}(y^{s})(q\!-\!p)\!+\!p}}{(p\!-\!q)\frac{q\rho_{j}(y^{s})}{\rho_{j}(y^{s})(q\!-\!p)\!+\!p}\!+\!q}
=pqρj(ys)(pq)qρj(ys)+q(ρj(ys)(qp)+p)\displaystyle=\frac{pq\rho_{j}(y^{s})}{(p\!-\!q)q\rho_{j}(y^{s})\!+\!q(\rho_{j}(y^{s})(q\!-\!p)\!+\!p)} (233)
=qpρj(ys)(qp)qρj(ys)+(qp)qρj(ys)+qp\displaystyle=\frac{qp\rho_{j}(y^{s})}{-\!(q\!-\!p)q\rho_{j}(y^{s})\!+\!(q\!-\!p)q\rho_{j}(y^{s})\!+\!qp}
=qpρj(ys)qp=ρj(ys).\displaystyle=\frac{qp\rho_{j}(y^{s})}{qp}=\rho_{j}(y^{s})\,. (234)

Then for all i=1,,Mi=1,\dots,M each posterior at time t=s+2t=s+2 is given by ρi(ys+2)=ρi(ys)\rho_{i}(y^{s+2})=\rho_{i}(y^{s}). The same equalities hold when Ys+1=1Y_{s+1}=1 and Ys+2=0Y_{s+2}=0, where the only difference is that pp and qq are interchanged. By induction, we have that ρi(ys+2r)=ρi(ys)\rho_{i}(y^{s+2r})=\rho_{i}(y^{s}) for r=1,2,r=1,2,\dots, provided that for every t=s,,s+2r1t=s,\dots,s+2r-1 the partitions are fixed at S0={j}S_{0}=\{j\} and S1=Ω{j}S_{1}=\Omega\setminus\{j\} and k=12r(12Ys+k)=0\sum_{k=1}^{2r}(1-2Y_{s+k})=0. ∎
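The two-step invariance is easy to confirm numerically (illustrative only): a boosting update followed by an attenuating update, both with the singleton partition, returns every posterior to its starting value.

```python
import random

p = 0.11
q = 1 - p
rng = random.Random(2)
rho = [rng.random() for _ in range(8)]
total = sum(rho)
rho = [r / total for r in rho]                   # a generic posterior vector
j = max(range(len(rho)), key=lambda i: rho[i])   # the message in confirmation

def update(post, j, y):
    """Bayes update with S0 = {j}: j is boosted by Y=0, attenuated by Y=1."""
    Z = (q if y == 0 else p) * post[j] + (p if y == 0 else q) * (1 - post[j])
    return [r * ((q if (i == j) == (y == 0) else p) / Z)
            for i, r in enumerate(post)]

after = update(update(rho, j, 0), j, 1)          # boost, then attenuate
assert all(abs(a - b) < 1e-12 for a, b in zip(rho, after))
```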

Appendix D Proof of Inequality (108), Sec. III

Proof:

Need to show that the following inequality holds:

𝖤[Ui(Tn)T(n+1)>0,θ=i]𝖤[Ui(Tn)T(n)>0,θ=i]\displaystyle\mathsf{E}[U_{i}(T_{n})\mid T^{(n+1)}\!>\!0,\theta\!=\!i]\geq\mathsf{E}[U_{i}(T_{n})\mid T^{(n)}\!>\!0,\theta\!=\!i] (235)

Recall that 𝒞(t0(n))\mathcal{C}(t^{(n)}_{0}) is the event that message ii enters confirmation after time t0(n)t^{(n)}_{0}, rather than another message jij\neq i ending the process by attaining Uj(t)log2(1ϵ)log2(ϵ)U_{j}(t)\geq\log_{2}(1-\epsilon)-\log_{2}(\epsilon). This event is defined by 𝒞(t0(n)){t>t0(n):Ui(t)0}\mathcal{C}(t^{(n)}_{0})\triangleq\{\exists t>t^{(n)}_{0}:U_{i}(t)\geq 0\}. Using Bayes' rule, the expectation 𝖤[Ui(Tn)T(n)>0,θ=i]\mathsf{E}[U_{i}(T_{n})\mid T^{(n)}>0,\theta\!=\!i] can be expanded as a sum of expectations conditioned on events defined in terms of {T(n+1)>0}\{T^{(n+1)}>0\}, {T(n)>0}\{T^{(n)}>0\} and 𝒞(t0(n))\mathcal{C}(t^{(n)}_{0}), whose union is the full event space, so that only the original conditioning {T(n)>0}\{T^{(n)}>0\} remains. These events are 𝒞(t0(n)){T(n+1)>0}\mathcal{C}(t^{(n)}_{0})\cap\{T^{(n+1)}>0\}, 𝒞(t0(n)){T(n+1)0}\mathcal{C}(t^{(n)}_{0})\cap\{T^{(n+1)}\leq 0\}, ¬𝒞(t0(n)){T(n+1)>0}\neg\mathcal{C}(t^{(n)}_{0})\cap\{T^{(n+1)}>0\} and ¬𝒞(t0(n)){T(n+1)0}\neg\mathcal{C}(t^{(n)}_{0})\cap\{T^{(n+1)}\leq 0\}. Note that ¬𝒞(t0(n)){T(n+1)0}\neg\mathcal{C}(t^{(n)}_{0})\implies\{T^{(n+1)}\leq 0\}, and therefore the third event vanishes. The expansion is given by:

\begin{align}
\mathsf{E}[&U_{i}(T_{n})\mid T^{(n)}>0,\theta=i] \nonumber \\
=\ &\mathsf{E}[U_{i}(T_{n})\mid T^{(n+1)}>0,T^{(n)}>0,\theta=i] \nonumber \\
&\qquad\cdot\Pr(\mathcal{C}(t^{(n)}_{0}),T^{(n+1)}>0\mid T^{(n)}>0,\theta=i) \tag{236} \\
&+\mathsf{E}[U_{i}(T_{n})\mid\mathcal{C}(t^{(n)}_{0}),T^{(n+1)}\leq 0,T^{(n)}>0,\theta=i] \nonumber \\
&\qquad\cdot\Pr(\mathcal{C}(t^{(n)}_{0}),T^{(n+1)}\leq 0\mid T^{(n)}>0,\theta=i) \tag{237} \\
&+\mathsf{E}[U_{i}(T_{n})\mid\neg\mathcal{C}(t^{(n)}_{0}),T^{(n+1)}\leq 0,T^{(n)}>0,\theta=i] \nonumber \\
&\qquad\cdot\Pr(\neg\mathcal{C}(t^{(n)}_{0}),T^{(n+1)}\leq 0\mid T^{(n)}>0,\theta=i)\,. \tag{238}
\end{align}

Since $\{T^{(n+1)}>0\}\implies\mathcal{C}(t^{(n)}_{0})\cap\{T^{(n)}>0\}$, we can omit the conditioning on $\mathcal{C}(t^{(n)}_{0})$ and $\{T^{(n)}>0\}$ whenever $\{T^{(n+1)}>0\}$ is present. By the independence of the confirmation phase from the crossing value $U_{i}(T_{n})$, which follows from the fixed state count of the Markov chain, we have that:

\begin{align}
\mathsf{E}[U_{i}(T_{n})&\mid\mathcal{C}(t^{(n)}_{0}),T^{(n+1)}\leq 0,T^{(n)}>0,\theta=i] \nonumber \\
&=\mathsf{E}[U_{i}(T_{n})\mid T^{(n+1)}>0,T^{(n)}>0,\theta=i]\,. \tag{239}
\end{align}

Therefore, we can replace the expectation in (237) by the one in (236), and then add the probabilities in (236) and (237) to obtain $\Pr(\mathcal{C}(t^{(n)}_{0})\mid T^{(n)}>0,\theta=i)$. Note that $\neg\mathcal{C}(t^{(n)}_{0})\implies\{T^{(n+1)}\leq 0\}$; thus the conditioning on $\{T^{(n+1)}\leq 0\}$ is redundant given $\neg\mathcal{C}(t^{(n)}_{0})$. Then the expectation on the left-hand side of (236) is also given by:

\begin{align}
\mathsf{E}[U_{i}(T_{n})&\mid T^{(n)}>0,\theta=i] \nonumber \\
&=\mathsf{E}[U_{i}(T_{n})\mid T^{(n+1)}>0,\theta=i] \nonumber \\
&\qquad\cdot\Pr(\mathcal{C}(t^{(n)}_{0})\mid T^{(n)}>0,\theta=i) \tag{240} \\
&\;+\mathsf{E}[U_{i}(T_{n})\mid\neg\mathcal{C}(t^{(n)}_{0}),T^{(n)}>0,\theta=i] \nonumber \\
&\qquad\cdot\Pr(\neg\mathcal{C}(t^{(n)}_{0})\mid T^{(n)}>0,\theta=i)\,. \tag{241}
\end{align}

The event $\neg\mathcal{C}(t^{(n)}_{0})\cap\{T^{(n)}>0\}\cap\{\theta=i\}$ implies that the process decodes in error in the $n$th communication-phase round, which results in $U_{i}(T_{n})<0$. Therefore, $\mathsf{E}[U_{i}(T_{n})\mid\neg\mathcal{C}(t^{(n)}_{0}),T^{(n+1)}\leq 0,\theta=i]<0$. This makes the left-hand side of (240) a weighted average of the positive quantity on the right of (240) and the negative quantity in (241). Then:

\begin{align}
\mathsf{E}[U_{i}(T_{n})\mid{}&T^{(n)}>0,\theta=i] \nonumber \\
&\leq\mathsf{E}[U_{i}(T_{n})\mid T^{(n+1)}>0,\theta=i] \nonumber \\
&\qquad\cdot\Pr(\mathcal{C}(t^{(n)}_{0})\mid T^{(n)}>0,\theta=i) \tag{242} \\
&\leq\mathsf{E}[U_{i}(T_{n})\mid T^{(n+1)}>0,\theta=i]\,. \tag{243}
\end{align}

The last inequality, (243), follows because the expectation is positive and is multiplied in (242) by a probability satisfying $0\leq\Pr(\mathcal{C}(t^{(n)}_{0})\mid T^{(n)}>0,\theta=i)\leq 1$. ∎
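The core of the argument is elementary: when a conditional expectation decomposes as a probability-weighted average of a positive term and a negative term, it can only fall below the positive term alone. The following toy numerical sketch checks this decomposition-and-bound logic; the random variables and events are invented stand-ins for illustration, not the paper's communication process.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Toy stand-ins (illustrative assumptions, not the paper's process):
# A plays the role of {T^(n) > 0}; C plays the role of C(t_0^(n)),
# the event that the true message re-enters confirmation.
A = rng.random(n) < 0.7
C = A & (rng.random(n) < 0.8)             # C is a sub-event of A

# X plays the role of U_i(T_n): positive given C, negative given not-C.
X = np.where(C, rng.exponential(2.0, n),  # positive crossing values
             -rng.exponential(1.0, n))    # error case: negative values

E_A     = X[A].mean()                     # E[X | A]
E_AC    = X[A & C].mean()                 # E[X | A, C]
E_AnotC = X[A & ~C].mean()                # E[X | A, not C]
p_C     = C[A].mean()                     # Pr(C | A)

# Law of total expectation: E[X|A] is the weighted average ...
assert np.isclose(E_A, E_AC * p_C + E_AnotC * (1 - p_C))
# ... and since E_AnotC < 0 < E_AC, the average lies below the positive term:
assert E_A <= E_AC
print(f"E[X|A] = {E_A:.3f} <= E[X|A,C] = {E_AC:.3f}")
\end{verbatim}

Here the first assertion mirrors the expansion (240)--(241) and the second mirrors the chain (242)--(243), with $\mathsf{E}[X\mid A,C]$ standing in for $\mathsf{E}[U_{i}(T_{n})\mid T^{(n+1)}>0,\theta=i]$.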
