
Variable-Length Sparse Feedback Codes for Point-to-Point, Multiple Access, and Random Access Channels

Recep Can Yavas, Victoria Kostina, and Michelle Effros

Manuscript received January 31, 2023; revised October 9, 2023; accepted November 17, 2023. When this work was completed, R. C. Yavas, V. Kostina, and M. Effros were all with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA. R. C. Yavas is currently with CNRS@CREATE, 138602, Singapore (e-mail: vkostina, effros@caltech.edu, recep.yavas@cnrsatcreate.sg). This work was supported in part by the National Science Foundation (NSF) under grants CCF-1817241 and CCF-1956386. This paper was presented in part at ISIT 2021 [1] and at ITW 2021 [2].
Abstract

This paper investigates variable-length stop-feedback codes for memoryless channels in point-to-point, multiple access, and random access communication scenarios. The proposed codes employ $L$ decoding times $n_1, n_2, \dots, n_L$ for the point-to-point and multiple access channels and $KL+1$ decoding times for the random access channel with at most $K$ active transmitters. In the point-to-point and multiple access channels, the decoder uses the observed channel outputs to decide whether to decode at each of the allowed decoding times $n_1, \dots, n_L$, at each time telling the encoder whether or not to stop transmitting using a single bit of feedback. In the random access scenario, the decoder estimates the number of active transmitters at time $n_0$ and then chooses among decoding times $n_{k,1}, \dots, n_{k,L}$ if it believes that there are $k$ active transmitters. In all cases, the choice of allowed decoding times is part of the code design; given a fixed value $L$, the allowed decoding times are chosen to minimize the expected decoding time for a given codebook size and target average error probability. The number $L$ in each scenario is assumed to be constant even when the blocklength is allowed to grow; the resulting code therefore requires only sparse feedback. The central results are asymptotic approximations of achievable rates as a function of the error probability, the expected decoding time, and the number of decoding times. A converse for variable-length stop-feedback codes with uniformly-spaced decoding times is included for the point-to-point channel.

Index Terms:
Variable-length coding, multiple-access, random-access, feedback codes, sparse feedback, second-order analysis, channel dispersion, moderate deviations, sequential hypothesis testing.

I Introduction

Although feedback does not increase the capacity of memoryless, point-to-point channels (PPCs) [3], feedback can simplify coding schemes and improve the speed of approach to capacity with blocklength. Examples that demonstrate this effect include Horstein’s scheme for the binary symmetric channel (BSC) [4] and Schalkwijk and Kailath’s scheme for the Gaussian channel [5], both of which leverage full channel feedback to simplify coding in the fixed-length regime. Wagner et al. [6] show that feedback improves the second-order term in the achievable rate as a function of blocklength for fixed-rate coding over discrete, memoryless, point-to-point channels (DM-PPCs) that have multiple capacity-achieving input distributions giving distinct dispersions.

I-A Literature Review on Variable-Length Feedback Codes

The benefits of feedback increase for codes with multiple decoding times (called variable-length or rateless codes). In [7], Burnashev shows that feedback significantly improves the optimal error exponent of variable-length codes for DM-PPCs. In [8], Polyanskiy et al. extend the work of Burnashev to the finite-length regime with non-vanishing error probabilities, introducing variable-length feedback (VLF) codes and deriving achievability and converse bounds on their performance. Tchamkerten and Telatar [9] show that Burnashev's optimal error exponent is achieved for a family of BSCs and Z channels, where the cross-over probability of the channel is unknown. For the BSC, Naghshvar et al. [10] propose a VLF coding scheme with a novel encoder called the small-enough-difference (SED) encoder and derive a non-asymptotic achievability bound. Their scheme is an alternative to Burnashev's scheme to achieve the optimal error exponent. Yang et al. [11] extend the SED encoder to the binary asymmetric channel, of which the BSC is a special case, and derive refined non-asymptotic achievability bounds for the binary asymmetric channel. Guo and Kostina [12] propose an instantaneous SED code for a source whose symbols progressively arrive at the encoder in real time.

The feedback in VLF codes can be limited in its amount and frequency. Here, the amount refers to how much feedback is sent from the receiver at each time feedback is available; the frequency refers to how many times feedback is available throughout the communication epoch. The extreme cases in the frequency are no feedback and feedback after every channel use. The extreme cases in the amount are full feedback and stop feedback. With full feedback, at time $n_i$, the receiver sends all symbols received until that time, $Y^{n_i}$, which can be used by the transmitter to encode the $(n_{i+1})$-th symbol. With stop feedback, the receiver sends a single bit of feedback to inform the transmitter whether or not to stop transmitting. Unlike full-feedback codes, variable-length stop-feedback (VLSF) codes employ codewords that are fixed when the code is designed; that is, feedback affects how much of a codeword is sent but does not affect the codeword's value.

In [8], Polyanskiy et al. define VLSF codes with feedback after every channel use. The result in [8, Th. 2] shows that variable-length coding improves the first-order term in the asymptotic expansion of the maximum achievable message set size from $NC$ to $\frac{NC}{1-\epsilon}$, where $C$ is the capacity of the DM-PPC, $N$ is the average decoding time (averaging is with respect to both the random message and the random noise), and $\epsilon$ is the average error probability. The second-order term achievable for VLF codes is $O(\log N)$, which means that VLF codes have zero dispersion and that the convergence to the capacity is much faster than that achieved by fixed-length codes [13, 14]. In [15], Altuğ et al. modify the VLSF coding paradigm by replacing the average decoding time constraint with a constraint on the probability that the decoding time exceeds a target value; the benefit in the first-order term does not appear under this probabilistic delay constraint, and the dispersion is no longer zero. A VLSF scenario with noisy feedback and a finite largest available decoding time is studied in [16]. For VLSF codes, Forney [17] shows an achievable error exponent that is strictly better than that of fixed-length, no-feedback codes and strictly worse than Burnashev's error exponent for variable-length full-feedback codes. Ginzach et al. [18] derive the exact error exponent of VLSF codes for the BSC.

Bounds on the performance of VLSF codes that allow feedback after every channel use are derived for several network communication problems. Truong and Tan [19, 20] extend the results from [8] to the Gaussian multiple access channel (MAC) under an average power constraint. Trillingsgaard et al. [21] study the VLSF scenario where a common message is transmitted across a $K$-user discrete memoryless broadcast channel. Heidari et al. [22] extend Burnashev's work from the DM-PPC to the DM-MAC, deriving lower and upper bounds on the error exponents of VLF codes for the DM-MAC. Bounds on the performance of VLSF codes for the DM-MAC with an unbounded number of decoding times appear in [23]. The achievability bounds for the $K$-transmitter MAC in [20] and [23] employ $2^K - 1$ simultaneous information density threshold rules.

While high rates of feedback are impractical for many applications — especially wireless applications on half-duplex devices — most prior work on VLSF codes (e.g., [8, 15, 19, 20, 21, 22, 23]) considers the densest feedback possible, using feedback at each of the (at most) $n_{\max}$ time steps before decoding, where $n_{\max}$ is the largest blocklength used by a given VLSF code. To consider more limited feedback scenarios, let $L$ denote the number of potential decoding times in a VLSF code, a number that we assume to be independent of the blocklength. We further assume that feedback is available only at the $L$ decoding times $n_1, \dots, n_L$, which are fixed in the code design and known by the transmitter and receiver before the start of transmission. In [24], Kim et al. choose the decoding time for each message from the set $\{d, 2d, \dots, Ld\}$ for some positive integer $d$ and $L < \infty$. In [25], Williamson et al. numerically optimize the values of $L$ decoding times and employ punctured convolutional codes and a Viterbi algorithm. In [26], Vakilinia et al. introduce a sequential differential optimization (SDO) algorithm to optimize the choices of the $L$ potential decoding times $n_1, \dots, n_L$, approximating the random decoding time $\tau$ by a Gaussian random variable. Vakilinia et al. apply the SDO algorithm to non-binary low-density parity-check codes over binary-input, additive white Gaussian noise channels; the mean and variance of $\tau$ are determined through simulation. Heidarzadeh et al. [27] extend [26] to account for the feedback rate and apply the SDO algorithm to random linear codes over the binary erasure channel. In [28], we develop a communication strategy for a random access scenario with a total of $K$ transmitters; in this scenario, neither the transmitters nor the receiver knows the set of active transmitters, which can vary from one epoch to the next. The code in [28] is a VLSF code with decoding times $n_0 < n_1 < \dots < n_K$. The decoder decodes messages at time $n_k$ only if it decides at that time that $k$ out of the $K$ transmitters are active. It informs the transmitters about its decision by sending a one-bit signal at each time $n_i$ until the time at which it decodes. We show that our random access channel (RAC) code with sparse stop feedback achieves performance identical in the capacity and dispersion terms to that of the best-known code without feedback for a MAC in which the set of active transmitters is known a priori. An extension of [28] to low-density parity-check codes appears in [29]. Building upon an earlier version of the present paper [1], Yang et al. [30] construct an integer program to minimize the upper bound on the average blocklength subject to constraints on the average error probability and the minimum gap between consecutive decoding times. By employing a combination of the Edgeworth expansion [31, Sec. XVI.4] and the Petrov expansion (Lemma 2), that paper develops an approximation to the cumulative distribution function of the information density random variable $\imath(X^n; Y^n)$; a numerical comparison of this approximation with the empirical cumulative distribution function shows that the approximation is tight even for small values of $n$. Their analysis uses this tight approximation to numerically evaluate the non-asymptotic achievability bound (Theorem 1, below) for the BSC, binary erasure channel, and binary-input Gaussian PPC for all $L \leq 32$. The resulting numerical results show performance that closely approaches Polyanskiy's VLSF achievability bound [8] with a relatively small $L$. For the binary erasure channel, [30] also proposes a new zero-error code that employs systematic transmission followed by random linear fountain coding; the proposed code outperforms Polyanskiy's achievability bound.

Sparse feedback is known to achieve the optimal error exponent for VLF codes. Yamamoto and Itoh [32] construct a two-phase scheme that achieves Burnashev's optimal error exponent [7]. Although their scheme allows an unlimited number of feedback instances and decoding times, it is sparse in the sense that feedback is available only at times $\alpha n, n, (1+\alpha)n, 2n, \dots$ for some $\alpha \in (0,1)$ and integer $n$. Lalitha and Javidi [33] show that Burnashev's optimal error exponent can be achieved with only $L = 3$ decoding times by truncating the Yamamoto–Itoh scheme.

Decoding for VLSF codes can be accomplished by running a sequential hypothesis test (SHT) on each possible message. At each of an increasing sequence of stopping times, the SHT compares a hypothesis $H_0$ corresponding to a particular transmitted message to the hypothesis $H_1$ corresponding to the marginal distribution of the channel output. In [34], Berlin et al. derive a bound on the average stopping time of an SHT. They then use this bound to derive a non-asymptotic converse bound for VLF codes. This result provides an alternative proof of the converse for Burnashev's error exponent [7].

I-B Contributions of This Work

Like [26, 25, 27], this paper studies VLSF codes under a finite constraint $L$ on the number of decoding times. While [26, 25, 27] focus on practical coding and performance, our goal is to derive new achievability bounds on the asymptotic rate achievable by VLSF codes between $L = 1$ (the fixed-length regime analyzed in [13, 35]) and $L = n_{\max}$ (the classical variable-length regime defined in [8, Def. 1], where all decoding times $1, 2, \dots, n_{\max}$ are available).

Our contributions are summarized as follows.

  1. We derive second-order achievability bounds for VLSF codes over DM-PPCs, DM-MACs, DM-RACs, and the Gaussian PPC with maximal power constraints. These bounds are presented in Theorems 2, 5, 6, and 7, respectively. In our analysis for each problem, we consider the asymptotic regime where the number of decoding times $L$ is fixed while the average decoding time $N$ grows without bound, i.e., $L = O(1)$ with respect to $N$. Each of our asymptotic bounds follows from the corresponding non-asymptotic bound that employs an information-density threshold rule with a stop-at-time-zero procedure. Asymptotically optimizing the values of the $L$ decoding times yields the given results. By viewing the proposed decoder as a special case of SHT-based decoders, we show a more general non-asymptotic achievability bound; Theorem 8 employs an arbitrary SHT to decide whether a message is transmitted.

  2. Linking the error probability of any given VLSF code to that of an SHT, in Theorem 9 we prove a converse bound in the spirit of the meta-converse bound from [13, Th. 27]. Analyzing the new bound with infinitely many uniformly-spaced decoding times over Cover–Thomas symmetric channels, in Theorem 3 we prove a converse bound for VLSF codes; the resulting bound is tight up to its second-order term. Unfortunately, since analyzing our meta-converse bound is challenging in the general case of an arbitrary DM-PPC and an arbitrary number $L$ of decoding times (see [36, Th. 3.2.3] for the structure of optimal SHTs with finitely many decoding times), whether or not the second-order term is tight in the general case remains an open question.

TABLE I: The performance of VLSF codes according to the number of decoding times $L$ and the channel type

Number of decoding times | Channel type | First-order term | Second-order term, lower bound | Second-order term, upper bound
Fixed-length, no-feedback ($L=1$) | DM-PPC | $NC$ | $-\sqrt{NV}\,Q^{-1}(\epsilon)$ [13] | $-\sqrt{NV}\,Q^{-1}(\epsilon)$ [13]
Variable-length ($1<L<\infty$) | DM-PPC | $\frac{NC}{1-\epsilon}$ | $-\sqrt{N\log_{(L-1)}(N)\frac{V}{1-\epsilon}}$ (Theorem 2) | $+O(1)$ [8]
Variable-length ($L=\infty$) | DM-PPC | $\frac{NC}{1-\epsilon}$ | $-\log N+O(1)$ [8] | $+O(1)$ [8]
Fixed-length, no-feedback ($L=1$) | DM-MAC | $NI_K$ | $-\sqrt{NV_K}\,Q^{-1}(\epsilon)$ [37] | $+O(\sqrt{N})$ [38]
Variable-length ($1<L<\infty$) | DM-MAC | $\frac{NI_K}{1-\epsilon}$ | $-\sqrt{N\log_{(L-1)}(N)\frac{V_K}{1-\epsilon}}$ (Theorem 5) | $+O(1)$ [20]
Variable-length ($L=\infty$) | DM-MAC | $\frac{NI_K}{1-\epsilon}$ | $-\log N+O(1)$, eq. (44) | $+O(1)$ [20]
Variable-length ($L=\infty$) | Gaussian MAC (average power) | $\frac{NC(KP)}{1-\epsilon}$ | $-O(\sqrt{N})$ [20] | $+O(1)$ [20]
Fixed-length, no-feedback ($L=1$) | DM-RAC | $N_kI_k$ | $-\sqrt{N_kV_k}\,Q^{-1}(\epsilon_k)$ [28] | $+O(\sqrt{N_k})$ [38]
Fixed-length, no-feedback ($L=1$) | Gaussian RAC (maximal power) | $N_kC(kP)$ | $-\sqrt{N_kV_k(P)}\,Q^{-1}(\epsilon_k)$ [39] | $+O(\sqrt{N_k})$ [38]
Variable-length ($1<L<\infty$) | DM-RAC | $\frac{N_kI_k}{1-\epsilon_k}$ | $-\sqrt{N_k\log_{(L-1)}(N_k)\frac{V_k}{1-\epsilon_k}}$ (Theorem 6) | $+O(1)$ [20]
Variable-length ($1<L<\infty$) | Gaussian PPC (maximal power) | $\frac{NC(P)}{1-\epsilon}$ | $-\sqrt{N\log_{(L-1)}(N)\frac{V(P)}{1-\epsilon}}$ (Theorem 7) | $+O(1)$ [19]
Variable-length ($L=\infty$) | Gaussian PPC (average power) | $\frac{NC(P)}{1-\epsilon}$ | $-\log N+O(1)$ [19] | $+O(1)$ [19]

Below, we detail these contributions. Our main result shows that for VLSF codes with $L = O(1) \geq 2$ decoding times over a DM-PPC, a message set size $M$ satisfying

\log M \approx \frac{NC}{1-\epsilon} - \sqrt{N \log_{(L-1)}(N) \frac{V}{1-\epsilon}} \quad (1)

is achievable. Here $\log_{(L)}(\cdot)$ denotes the $L$-fold nested logarithm, $N$ is the average decoding time, $\epsilon$ is the average error probability, and $C$ and $V$ are the capacity and dispersion of the DM-PPC, respectively. Similar formulas arise for the DM-MAC and DM-RAC, where $C$ and $V$ are replaced by the sum-rate mutual information and the sum-rate dispersion. The speed of convergence to $\frac{C}{1-\epsilon}$ depends on $L$. It is slower than the convergence to $C$ in the fixed-length scenario, which has second-order term $O(\sqrt{N})$ [13]. The $L = 2$ case in (1) recovers the rate of convergence for the variable-length scenario without feedback, which has second-order term $O(\sqrt{N \log N})$ [8, Proof of Th. 1]; that rate is achieved with $n_1 = 0$. The nested logarithm term in (1) arises because, after writing the average decoding time as $\mathbb{E}[\tau] = n_1 + \sum_{i=1}^{L-1} (n_{i+1} - n_i) \mathbb{P}[\tau > n_i]$, the decoding time choices in (22), below, satisfy $(n_{i+1} - n_i) \mathbb{P}[\tau > n_i] = o(\sqrt{n_1})$ for $i \in [L-1]$, making the effect of each decoding time on the average decoding time asymptotically similar. We then use the SDO algorithm introduced in [26] to show that our particular choice of $n_1, \dots, n_L$ is second-order optimal (see Appendix B.II). Despite the order-wise dependence of the rate of convergence on $L$, (1) grows so slowly with $L$ that it suggests little benefit to choosing a large $L$. For example, when $L = 4$, $\sqrt{N \log_{(L-1)}(N)}$ behaves very similarly to $O(\sqrt{N})$ for practical values of $N$ (e.g., $N \in [10^3, 10^5]$). Notice, however, that the given achievability result provides a lower bound on the benefit of increasing $L$; bounding the benefit from above requires a converse result. We note, however, that the numerical results in [30] support our conclusion from the asymptotic achievability bound (1) that the improvement in achievable $\log M$ from $L$ to $L+1$ decoding times diminishes as $L$ increases.

For the PPC and MAC, the feedback rate of our code is $\frac{\ell}{n_\ell}$ if the decoding time is $n_\ell$; for the RAC, that rate becomes $\frac{(k-1)L + \ell + 1}{n_{k,\ell}}$ if the decoding time is $n_{k,\ell}$. In both cases, our feedback rate approaches 0 as the decoding time grows. In contrast, VLSF codes such as those in [8, 17] use a feedback rate of 1 bit per channel use. In our VLSF codes for the RAC, the decoder decodes at one of the available times $n_{k,1}, n_{k,2}, \dots, n_{k,L}$ if it estimates that the number of active transmitters is $k \neq 0$; we reserve a single decoding time $n_0$ for the possibility that no transmitters are active. Theorem 6 extends the RAC code in [28] from $L = 1$ to any $L \geq 2$.

The converse result in Theorem 3 shows that in order to achieve (1) with evenly spaced decoding times, one needs at least $L = \Omega\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right)$ decoding times. In contrast, our optimized codes achieve (1) with a finite $L$ that does not grow with the average decoding time $N$, which highlights the importance of optimizing the values of the decoding times in a VLSF code.

Table I summarizes the literature on VLSF codes and the new results from this work, showing how they vary with the number of decoding times and the channel type.

In what follows, Section II gives notation and definitions. Sections III–VI introduce variable-length sparse stop-feedback codes for the DM-PPC, DM-MAC, DM-RAC, and the Gaussian PPC, respectively, and present our main theorems for those channel models; Section VII concludes the paper. The proofs appear in the Appendix.

II Preliminaries

II-A Notation

For any positive integers $k$ and $n$, $[k] \triangleq \{1, \dots, k\}$, $x^n \triangleq (x_1, \dots, x_n)$, and $x^{a:b} \triangleq (x_a, x_{a+1}, \dots, x_b)$. The collection of length-$n$ vectors from the transmitter index set $\mathcal{A}$ is denoted by $x_{\mathcal{A}}^n \triangleq (x_a^n \colon a \in \mathcal{A})$; we drop the superscript $n$ if $n = 1$, i.e., $x_{\mathcal{A}}^1 = x_{\mathcal{A}}$. The collection of non-empty strict subsets of a set $\mathcal{A}$ is denoted by $\mathcal{P}(\mathcal{A}) \triangleq \{\mathcal{B} \colon \mathcal{B} \subseteq \mathcal{A},\, 0 < |\mathcal{B}| < |\mathcal{A}|\}$. All-zero and all-one vectors are denoted by $\mathbf{0}$ and $\mathbf{1}$, respectively; the dimension is determined from the context. The sets of positive integers and non-negative integers are denoted by $\mathbb{Z}_+$ and $\mathbb{Z}_{\geq}$, respectively. We write $x^n \stackrel{\pi}{=} y^n$ if there exists a permutation $\pi$ of $x^n$ such that $\pi(x^n) = y^n$, and $x^n \stackrel{\pi}{\neq} y^n$ if no such permutation exists. The identity matrix of dimension $n$ is denoted by $\mathsf{I}_n$. The Euclidean norm of a vector $x^n$ is denoted by $\lVert x^n \rVert \triangleq \sqrt{\sum_{i=1}^n x_i^2}$. Unless specified otherwise, all logarithms and exponents have base $e$, and information is measured in nats. The standard $O(\cdot)$, $o(\cdot)$, and $\Omega(\cdot)$ notations are defined as follows: $f(n) = O(g(n))$ if $\limsup_{n\to\infty} |f(n)/g(n)| < \infty$, $f(n) = o(g(n))$ if $\lim_{n\to\infty} |f(n)/g(n)| = 0$, and $f(n) = \Omega(g(n))$ if $\lim_{n\to\infty} |f(n)/g(n)| > 0$. The distribution of a random variable $X$ is denoted by $P_X$; $\mathcal{N}(\bm{\mu}, \mathsf{V})$ denotes the Gaussian distribution with mean $\bm{\mu}$ and covariance matrix $\mathsf{V}$; $Q(\cdot)$ denotes the complementary standard Gaussian cumulative distribution function $Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp\{-\frac{t^2}{2}\}\, dt$; and $Q^{-1}(\cdot)$ is its functional inverse. We define the nested logarithm function

\log_{(L)}(x) \triangleq \begin{cases} \log(x) & \text{if } L = 1,\ x > 0 \\ \log(\log_{(L-1)}(x)) & \text{if } L \geq 2,\ \log_{(L-1)}(x) > 0; \end{cases} \quad (2)

$\log_{(L)}(x)$ is undefined for all other $(L, x)$ pairs.
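A minimal sketch of (2) in Python (assuming natural logarithms, as used throughout the paper):

```python
import math

def nested_log(L, x):
    """L-fold nested logarithm log_(L)(x) from (2); returns None where undefined."""
    val = x
    for _ in range(L):
        if val <= 0:
            return None   # the (L, x) pair falls outside the domain in (2)
        val = math.log(val)
    return val

# Example from Section I-B: for L = 4 and N = 1000, log_(L-1)(N) = log_(3)(1000) ~ 0.659.
print(nested_log(3, 1000.0))
```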

We denote the Radon–Nikodym derivative of distribution $P$ with respect to distribution $Q$ by $\frac{\mathrm{d}P}{\mathrm{d}Q}$. We denote the relative entropy and relative entropy variance between $P$ and $Q$ by $D(P\|Q) = \mathbb{E}\left[\log\frac{\mathrm{d}P}{\mathrm{d}Q}(X)\right]$ and $V(P\|Q) = \mathrm{Var}\left[\log\frac{\mathrm{d}P}{\mathrm{d}Q}(X)\right]$, respectively, where $X \sim P$. The $\sigma$-algebra generated by a random variable $X$ is denoted by $\mathcal{F}(X)$. A random variable $X$ is called arithmetic if there exists some $d > 0$ such that $\mathbb{P}[X \in d\mathbb{Z}] = 1$; the largest such $d$ is called the span. If no such $d$ exists, then the random variable is non-arithmetic. We denote $X^+ \triangleq \max\{0, X\}$ and $X^- \triangleq -\min\{0, X\}$ for any random variable $X$.

II-B Discrete Memoryless Channel and Information Density

A DM-PPC is defined by the triple $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$, where $\mathcal{X}$ is the finite input alphabet, $P_{Y|X}$ is the channel transition kernel, and $\mathcal{Y}$ is the finite output alphabet. The $n$-letter input-output relationship of the channel is given by $P_{Y^n|X^n}(y^n|x^n) = \prod_{i=1}^n P_{Y|X}(y_i|x_i)$ for all $n$, $x^n$, and $y^n$.

The $n$-letter information density of a channel $P_{Y|X}$ under input distribution $P_{X^n}$ is defined as

\imath(x^n; y^n) \triangleq \log \frac{P_{Y^n|X^n}(y^n|x^n)}{P_{Y^n}(y^n)}, \quad (3)

where $P_{Y^n}$ is the $Y^n$ marginal of $P_{X^n} P_{Y^n|X^n}$. If the inputs $X_1, X_2, \dots, X_n$ are independently and identically distributed (i.i.d.) according to $P_X$, then

\imath(x^n; y^n) = \sum_{i=1}^n \imath(x_i; y_i), \quad (4)

where the single-letter information density is given by

\imath(x; y) \triangleq \log \frac{P_{Y|X}(y|x)}{P_Y(y)}, \quad x \in \mathcal{X},\ y \in \mathcal{Y}. \quad (5)

The mutual information and dispersion are defined as

I(X;Y) \triangleq \mathbb{E}\left[\imath(X;Y)\right] \quad (6)
V(X;Y) \triangleq \mathrm{Var}\left[\imath(X;Y)\right], \quad (7)

respectively, where $(X, Y) \sim P_X P_{Y|X}$.

Let $\mathcal{P}$ denote the set of all distributions on the alphabet $\mathcal{X}$. The capacity of the DM-PPC is

C = \max_{P_X \in \mathcal{P}} I(X;Y), \quad (8)

and the dispersion of the DM-PPC is

V = \min_{P_X \in \mathcal{P} \colon I(X;Y) = C} V(X;Y). \quad (9)
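As a concrete illustration of definitions (5)–(9), the following sketch computes $I(X;Y)$ and $V(X;Y)$ for an arbitrary finite channel and input distribution; for the BSC with crossover probability 0.11 (the channel used in Fig. 1 below), the uniform input is capacity- and dispersion-achieving, so the computed values equal $C$ and $V$. The function name and the numerical values in the comments are ours, not from the paper.

```python
import numpy as np

def info_density_moments(P_X, P_YgX):
    """Return (I(X;Y), V(X;Y)) in nats, per (6)-(7), for P_X[x] and P_YgX[x, y]."""
    P_XY = P_X[:, None] * P_YgX              # joint distribution of (X, Y)
    P_Y = P_XY.sum(axis=0)                   # output marginal
    with np.errstate(divide="ignore"):
        dens = np.log(P_YgX) - np.log(P_Y)   # single-letter information density (5)
    mask = P_XY > 0                          # zero-probability pairs do not contribute
    I = np.sum(P_XY[mask] * dens[mask])
    V = np.sum(P_XY[mask] * (dens[mask] - I) ** 2)
    return I, V

delta = 0.11                                 # BSC crossover probability
P_YgX = np.array([[1 - delta, delta], [delta, 1 - delta]])
print(info_density_moments(np.array([0.5, 0.5]), P_YgX))
# approximately (0.347, 0.428): C ~ 0.347 nats/use, V ~ 0.428 nats^2/use
```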

III VLSF Codes for the DM-PPC

III-A VLSF Codes with LL Decoding Times

We consider VLSF codes with a finite number of potential decoding times $n_1 < n_2 < \cdots < n_L$ over a DM-PPC. The receiver chooses to end the transmission at the first time $n_\ell \in \{n_1, \dots, n_L\}$ at which it is ready to decode. The transmitter learns of the receiver's decision via a single bit of feedback at each of the times $n_1, \dots, n_\ell$. Feedback bit “0” at time $n_i$ means that the receiver is not yet ready to decode, and transmission should continue; feedback bit “1” means that the receiver can decode at time $n_i$, which signals the transmitter to stop. Using this feedback, the transmitter and the receiver are synchronized and aware of the current state of the transmission at all times. Since $n_L$ is the last available decoding time, the receiver always makes a final decision if time $n_L$ is reached. Unlike [7, 32, 25], we do not allow re-transmission of the message after time $n_L$. Since the transmitter and the receiver both know the values of the decoding times, the receiver does not need to send feedback at the last available time $n_L$. We assume that the transmitter and the receiver know the channel transition kernel $P_{Y|X}$. We employ average decoding time and average error probability constraints. Definition 1, below, formalizes our code description.
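The following schematic sketch (ours, not a construction from the paper) summarizes this interaction; `channel`, `ready_to_decode`, and `decode` are hypothetical placeholders for the channel and for the receiver's decision and decoding rules.

```python
def run_vlsf_session(codeword, decoding_times, channel, ready_to_decode, decode):
    """Transmit until the receiver signals 'stop' at one of the allowed times."""
    received = []
    for i, n in enumerate(decoding_times):
        # send the next segment of the (fixed) codeword up to time n
        received.extend(channel(codeword[len(received):n]))
        forced = (i == len(decoding_times) - 1)   # decision is forced at n_L
        if ready_to_decode(received) or forced:
            return n, decode(received)            # feedback bit "1": stop
        # feedback bit "0": not ready, continue transmitting
```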

Definition 1

Fix $\epsilon \in (0,1)$, positive integers $L$ and $M$, and a positive scalar $N$. An $(N, L, M, \epsilon)$-VLSF code for the DM-PPC comprises

  1. non-negative integer-valued decoding times $n_1 < \dots < n_L$,

  2. a finite alphabet $\mathcal{U}$ and a probability distribution $P_U$ on $\mathcal{U}$ defining a common randomness random variable $U$ that is revealed to both the transmitter and the receiver before the start of the transmission (the realization $u$ of $U$ specifies the codebook),

  3. an encoding function $\mathsf{f}_n \colon \mathcal{U} \times [M] \to \mathcal{X}$, for each $n = 1, \dots, n_L$, that assigns a codeword

     \mathsf{f}(u, m)^{n_L} \triangleq (\mathsf{f}_1(u, m), \dots, \mathsf{f}_{n_L}(u, m)) \quad (10)

     to each message $m \in [M]$ and common randomness instance $u \in \mathcal{U}$,

  4. a non-negative integer-valued random stopping time $\tau \in \{n_1, \dots, n_L\}$ for the filtration generated by $\{U, Y^{n_i}\}_{i=1}^L$ that satisfies an average decoding time constraint

     \mathbb{E}\left[\tau\right] \leq N, \quad (11)

  5. and a decoding function $\mathsf{g}_{n_\ell} \colon \mathcal{U} \times \mathcal{Y}^{n_\ell} \to [M] \cup \{\mathsf{e}\}$ for each $\ell \in [L]$ (where $\mathsf{e}$ is the erasure symbol used to indicate that the receiver is not ready to decode), satisfying an average error probability constraint

     \mathbb{P}\left[\mathsf{g}_\tau(U, Y^\tau) \neq W\right] \leq \epsilon, \quad (12)

     where the message $W$ is equiprobably distributed on the set $[M]$, and $X^\tau = \mathsf{f}(U, W)^\tau$.

Recall that Definition 1 with $L = 1$ recovers the fixed-length no-feedback codes in [13]. As in [8, 28, 21], we need common randomness here because the traditional random-coding argument does not prove the existence of a single (deterministic) code that simultaneously satisfies conditions (11) and (12). Therefore, randomized codes are necessary for our achievability argument; here, $|\mathcal{U}| \leq 2$ suffices [28, Appendix D].

We define the maximum achievable message set size $M^*(N, L, \epsilon)$ with $L$ decoding times, average decoding time $N$, and average error probability $\epsilon$ as

M^*(N, L, \epsilon) \triangleq \max\{M \colon \text{an } (N, L, M, \epsilon)\text{-VLSF code exists}\}. \quad (13)

The maximum achievable message set size for VLSF codes with $L$ decoding times $n_1, \dots, n_L$ that are restricted to belong to a subset $\mathcal{N} \subseteq \mathbb{Z}_{\geq}$ is denoted by $M^*(N, L, \epsilon, \mathcal{N})$.

III-B Related Work

The following discussion summarizes prior asymptotic expansions of the maximum achievable message set size for the DM-PPC.

  a) $M^*(N, 1, \epsilon)$: For $L = 1$ and $\epsilon \in (0, 1/2)$, Polyanskiy et al. [13, Th. 49] show that

     \log M^*(N, 1, \epsilon) = NC - \sqrt{NV} Q^{-1}(\epsilon) + O(\log N). \quad (14)

     For $\epsilon \in [1/2, 1)$, the dispersion $V$ in (9) is replaced by the maximum dispersion $V_{\max} \triangleq \max_{P_X \colon I(X;Y) = C} V(X;Y)$. The $O(\log N)$ term is lower bounded by $O(1)$ and upper bounded by $\frac{1}{2}\log N + O(1)$. For nonsingular DM-PPCs, i.e., channels that satisfy $\mathbb{E}\left[\mathrm{Var}\left[\imath(X;Y)|Y\right]\right] > 0$ for the distributions that achieve the capacity $C$ and the dispersion $V$, the $O(\log N)$ term equals $\frac{1}{2}\log N + O(1)$ [40]. Moulin [41] derives lower and upper bounds on the $O(1)$ term in the asymptotic expansion when the channel is nonsingular with non-lattice information density.

  b) $M^*(N, \infty, \epsilon)$: For VLSF codes with $L = n_{\max} = \infty$, Polyanskiy et al. [8, Th. 2] show that for $\epsilon \in (0,1)$,

     \log M^*(N, \infty, \epsilon) \geq \frac{NC}{1-\epsilon} - \log N + O(1) \quad (15)
     \log M^*(N, \infty, \epsilon) \leq \frac{NC}{1-\epsilon} + \frac{h_b(\epsilon)}{1-\epsilon}, \quad (16)

     where $h_b(\epsilon) \triangleq -\epsilon\log\epsilon - (1-\epsilon)\log(1-\epsilon)$ is the binary entropy function (in nats). The bounds in (15)–(16) indicate that the $\epsilon$-capacity (the first-order achievable term) is

     \liminf_{N\to\infty} \frac{1}{N} \log M^*(N, \infty, \epsilon) = \frac{C}{1-\epsilon}. \quad (17)

     The achievable dispersion term is zero, i.e., the second-order term in the fundamental limit in (15)–(16) is $o(\sqrt{N})$.

III-C Our Achievability Bounds

Theorem 1, below, is our non-asymptotic achievability bound for VLSF codes with $L$ decoding times.

Theorem 1

Fix a constant $\gamma$, decoding times $n_1 < \cdots < n_L$, and a positive integer $M$. For any positive number $N$ and $\epsilon \in (0,1)$, there exists an $(N, L, M, \epsilon)$-VLSF code for the DM-PPC $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ with

\epsilon \leq \mathbb{P}\left[\imath(X^{n_L}; Y^{n_L}) < \gamma\right] + (M-1)\exp\{-\gamma\}, \quad (18)
N \leq n_1 + \sum_{\ell=1}^{L-1} (n_{\ell+1} - n_\ell) \mathbb{P}\left[\imath(X^{n_\ell}; Y^{n_\ell}) < \gamma\right], \quad (19)

where $P_{X^{n_L}}$ is a product of distributions of $L$ sub-vectors of lengths $n_j - n_{j-1}$, $j \in [L]$, i.e.,

P_{X^{n_L}}(x^{n_L}) = \prod_{j=1}^L P_{X^{n_{j-1}+1:n_j}}(x^{n_{j-1}+1:n_j}), \quad (20)

where $n_0 = 0$.

Proof:

Polyanskiy et al. [13] interpret the information-density threshold test for a fixed-length code as a collection of hypothesis tests aimed at determining whether the channel output is ($H_0$) or is not ($H_1$) dependent on a given codeword. In our coding scheme, we use SHTs in a similar way. The strategy is as follows.

The VLSF decoder at each time $n_1, \dots, n_L$ runs $M$ SHTs between a hypothesis $H_0$ that the channel output results from transmission of the $m$-th codeword, $m \in [M]$, and the hypothesis $H_1$ that the channel output is drawn from the unconditional channel output distribution. The former indicates that the decoder hypothesizes that message $m$ is the sent message. The latter indicates that the decoder hypothesizes that message $m$ has not been sent and thus can be removed from the list of possible messages to decode. Transmission stops at the first time $n_i$ at which hypothesis $H_0$ is accepted for some message $m$ or at which hypothesis $H_1$ is accepted for all $m$. If the latter happens, decoding fails and we declare an error. Transmission continues as long as one of the SHTs has not accepted either $H_0$ or $H_1$. If $H_0$ is declared for multiple messages at the same decoding time, then we stop and declare an error. Since $n_L$ is the last available decoding time, the SHTs are forced to decide between $H_0$ and $H_1$ at time $n_L$. Once $H_0$ or $H_1$ is decided for some message, the decision cannot be reversed at a later time.

The optimal SHT has the form of a two-sided information density threshold rule, where the thresholds depend on the individual decision times [36, Th. 3.2.3]. To simplify the analysis, we employ sub-optimal SHTs for which the upper threshold is set to a value $\gamma \in \mathbb{R}$ that is independent of the decoding times, and the lower thresholds are set to $-\infty$ for $n_\ell < n_L$ and to $\gamma$ for $n_\ell = n_L$. That is, we declare $H_1$ for a message if and only if the corresponding information density does not reach $\gamma$ at any of the decoding times $n_1, \dots, n_L$. Theorem 1 analyzes the error probability and the average decoding time of the sub-optimal SHT-based decoder above, and it extends the achievability bound in [8, Th. 3], which considers $L = \infty$, to the scenario where only a finite number of decoding times is allowed. The bound on the average decoding time (19) is obtained by rewriting the bound on the average decoding time in [8, eq. (27)] using the fact that the stopping time $\tau$ lies in $\{n_1, \dots, n_L\}$. Comparing Theorem 1 with [8, Th. 3], we see that the error probability bound in (18) has an extra term $\mathbb{P}\left[\imath(X^{n_L}; Y^{n_L}) < \gamma\right]$. This term appears because transmission always stops at or before time $n_L$.

Theorem 1 is related to [24, Lemma 1], which similarly treats $L < \infty$ but requires $n_{\ell+1} - n_\ell = d$ for some constant $d \geq 1$, and to [25, Cor. 2], where the transmitter retransmits the message if decoding attempts at times $n_1, \dots, n_L$ are unsuccessful.

See Appendix A for the proof details.
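To make the bound concrete, the following sketch (ours) evaluates (18)–(19) for the BSC with crossover probability $\delta$ and i.i.d. equiprobable inputs. In that case $\imath(X^n; Y^n) = n\log(2(1-\delta)) + T\log\frac{\delta}{1-\delta}$ with $T \sim \mathrm{Binomial}(n, \delta)$, so each probability in (18)–(19) is a binomial tail; the values of $\gamma$, $\log M$, and the decoding times below are illustrative only.

```python
import numpy as np
from scipy.stats import binom

def prob_below_threshold(n, gamma, delta):
    """P[i(X^n;Y^n) < gamma] for the BSC(delta) with equiprobable inputs."""
    if n == 0:
        return 1.0 if gamma > 0 else 0.0
    # i < gamma  <=>  T > (n*log(2(1-delta)) - gamma) / log((1-delta)/delta)
    t_star = (n * np.log(2 * (1 - delta)) - gamma) / np.log((1 - delta) / delta)
    return binom.sf(np.floor(t_star), n, delta)

def theorem1_bounds(times, gamma, log_M, delta):
    eps_bound = (prob_below_threshold(times[-1], gamma, delta)
                 + np.exp(log_M - gamma))                      # (18), upper bounding M-1 by M
    N_bound = times[0] + sum((times[i + 1] - times[i])
                             * prob_below_threshold(times[i], gamma, delta)
                             for i in range(len(times) - 1))   # (19)
    return eps_bound, N_bound

print(theorem1_bounds([600, 800, 1000, 1200],
                      gamma=300 * np.log(2), log_M=290 * np.log(2), delta=0.11))
```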

Theorem 2, stated next, is our second-order achievability bound for VLSF codes with $L = O(1)$ decoding times over the DM-PPC. The proof of Theorem 2 builds upon the non-asymptotic bound in Theorem 1.

Theorem 2

Fix an integer $L = O(1) \geq 2$ and real numbers $N > 0$ and $\epsilon \in (0,1)$. For the DM-PPC with $V > 0$, the maximum message set size (13) achievable by $(N, L, M, \epsilon)$-VLSF codes satisfies

\log M^*(N, L, \epsilon) \geq \frac{NC}{1-\epsilon} - \sqrt{N \log_{(L-1)}(N) \frac{V}{1-\epsilon}} + O\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right). \quad (21)

The decoding times $\{n_1, \dots, n_L\}$ that achieve (21) satisfy the equations

\log M = n_\ell C - \sqrt{n_\ell \log_{(L-\ell+1)}(n_\ell) V} - \log n_\ell + O(1) \quad (22)

for $\ell \in \{2, \dots, L\}$, and $n_1 = 0$.

Proof:

Inspired by [8, Th. 2], the proof employs a time-sharing strategy between an $(N', L-1, M, \epsilon_N')$-VLSF code whose smallest decoding time is nonzero and a simple “stop-at-time-zero” procedure that does not involve any code and declares an error at time 0. Specifically, we take the VLSF code that achieves the bound in Theorem 1, and we use the VLSF code and the stop-at-time-zero procedure with probabilities $1-p$ and $p$, respectively, where $p$ and $\epsilon_N'$ satisfy

\epsilon_N' = \frac{1}{\sqrt{N' \log N'}} \quad (23)
p = \frac{\epsilon - \epsilon_N'}{1 - \epsilon_N'}. \quad (24)

The error probability of the resulting code is bounded by $\epsilon$, and the average decoding time is

N = N'(1-p) = N'(1-\epsilon) + O\left(\sqrt{\frac{N'}{\log N'}}\right). \quad (25)

For the scenario where $L = \infty$, we again use time-sharing with the stop-at-time-zero procedure in the achievability bound in [8, Th. 2], with $\epsilon_N' = \frac{1}{N'}$ instead of (23). In the asymptotic regime $L = O(1)$, the choice in (23) results in a better second-order term than that achieved by $\epsilon_N' = \frac{1}{N'}$.

In the analysis of Theorem 1, we need to bound the probability $\mathbb{P}\left[\imath(X^{n_L}; Y^{n_L}) < \gamma\right] = \epsilon_N'(1 - o(1))$. Since this probability decays sub-exponentially to zero due to (23), we use a moderate deviations result from [42, Ch. 8] to bound it. Such a tool was not needed in the proof of [8, Th. 2] for $L = \infty$ because when $n_L = \infty$, the term $\mathbb{P}\left[\imath(X^{n_L}; Y^{n_L}) < \gamma\right]$ disappears from (18), and the average decoding time is bounded via a martingale analysis instead of (19). Finally, we apply the Karush–Kuhn–Tucker conditions to show that the decoding times in (22) yield a value of $\log M$ that is the maximal value achievable by the non-asymptotic bound up to terms of order $O\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right)$. The details of the proof appear in Appendix B. ∎
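For a sense of scale, a small numerical illustration of (23)–(25), with an assumed underlying average decoding time $N' = 2000$ and target $\epsilon = 0.05$ (our numbers, for illustration only):

```python
import math

N_prime, eps = 2000, 0.05
eps_prime = 1 / math.sqrt(N_prime * math.log(N_prime))   # (23): ~ 0.0081
p = (eps - eps_prime) / (1 - eps_prime)                   # (24): ~ 0.042
N = N_prime * (1 - p)                                     # (25), first equality: ~ 1915.5
print(eps_prime, p, N)
```

Stopping at time zero with probability $p$ spends part of the error budget to shorten the average decoding time, which is what boosts the first-order term from $NC$ to $\frac{NC}{1-\epsilon}$.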

The non-asymptotic achievability bounds obtained from the coding scheme described in the proof sketch of Theorem 2 are illustrated for the BSC in Fig. 1. For $L \in \{2, 3, 4\}$, the decoding times $n_1, \dots, n_L$ are chosen as described in (22) with the $O(1)$ term ignored, and $\epsilon_N'$ in the stop-at-time-zero procedure is replaced with the right-hand side of (18). For $L = 1$, Fig. 1 shows the random coding union bound in [13, Th. 16], which is a non-asymptotic achievability bound for fixed-length no-feedback codes. For $L = \infty$, Fig. 1 shows the non-asymptotic bound in [8, eq. (102)]. The curves for $L = 1$ and $L = 2$ cross because the choice of decoding times in (22) requires $\epsilon \gg \frac{1}{\sqrt{N \log N}}$ and is optimal only as $N \to \infty$. In [30], Yang et al. construct a computationally intensive integer program for the numerical optimization of the decoding times at finite $N$. If such a precise optimization is desired, our approximate decoding times in (22) can be used as starting points for that integer program.
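A sketch (ours) of this choice of decoding times: solve (22) for each $n_\ell$ by root finding after dropping the $O(1)$ term. The constants below are the BSC(0.11) values computed earlier; the bracketing heuristic and $\log M$ are assumptions for illustration.

```python
import math
from scipy.optimize import brentq

def nested_log(L, x):
    for _ in range(L):
        if x <= 0:
            return float("-inf")   # outside the domain of (2)
        x = math.log(x)
    return x

def decoding_times(log_M, C, V, L, n_max=10**6):
    """Approximate n_2 < ... < n_L from (22) with the O(1) term dropped; n_1 = 0."""
    def gap(n, ell):
        return n * C - math.sqrt(n * nested_log(L - ell + 1, n) * V) - math.log(n) - log_M
    times = [0]
    for ell in range(2, L + 1):
        n_lo = 10.0
        # push the lower bracket up until the nested log is positive and gap < 0,
        # so that brentq sees a sign change on [n_lo, n_max]
        while nested_log(L - ell + 1, n_lo) <= 0 or gap(n_lo, ell) >= 0:
            n_lo *= 1.5
        times.append(round(brentq(lambda n: gap(n, ell), n_lo, n_max)))
    return times

print(decoding_times(log_M=290 * math.log(2), C=0.3466, V=0.4280, L=4))
```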

Figure 1: The non-asymptotic achievability bounds obtained from Theorems 1 and 2 and the non-asymptotic converse bound (16) on the maximum achievable rate $\frac{\log M^*(N, L, \epsilon)}{N}$ are shown for the BSC with crossover probability 0.11, $L \in \{1, 2, 3, 4, \infty\}$, and $\epsilon = 0.05$. The curves for $L = 1$ and $L = \infty$ are Polyanskiy et al.'s achievability bounds from [13, Th. 16] and [8, eq. (102)], respectively.

Replacing the information-density-based decoding rule in the proof sketch with the optimal SHT would improve the performance achieved on the right-hand side of (21) by only $O(1)$.

Since any $(N, L, M, \epsilon)$-VLSF code is also an $(N, \infty, M, \epsilon)$-VLSF code, (16) provides an upper bound on $\log M^*(N, L, \epsilon)$ for arbitrary $L$. The order of the second-order term, $-\sqrt{N \log_{(L-1)}(N) \frac{V}{1-\epsilon}}$, depends on the number of decoding times $L$: the larger $L$ is, the faster the achievable rate converges to the capacity. However, the dependence on $L$ is weak, since $\log_{(L-1)}(N)$ grows very slowly in $N$ even if $L$ is small. For example, for $L = 4$ and $N = 1000$, $\log_{(L-1)}(N) \approx 0.659$. For finite $L$, this bound falls short of the $-\log N$ second-order term achievable with $L = \infty$ in (15). Whether the second-order term achieved in Theorem 2 is tight remains an open problem.

The following theorem gives achievability and converse bounds for VLSF codes with decoding times uniformly spaced as $\{0, d_N, 2d_N, \dots\}$.

Theorem 3

Fix $\epsilon \in (0,1)$. Let $d_N = o(N)$ with $d_N \to \infty$, and let $P_{Y|X}$ be any DM-PPC. Then,

\log M^*(N, \infty, \epsilon, d_N\mathbb{Z}_{\geq}) \geq \frac{NC}{1-\epsilon} - \frac{d_N C}{2} - \log N + o(d_N). \quad (26)

If the DM-PPC $P_{Y|X}$ is a Cover–Thomas symmetric DM-PPC [43, p. 190], i.e., the rows (and, respectively, the columns) of the transition probability matrix are permutations of each other, then

\log M^*(N, \infty, \epsilon, d_N\mathbb{Z}_{\geq}) \leq \frac{NC}{1-\epsilon} - \frac{d_N C}{2} + o(d_N). \quad (27)
Proof:

The achievability bound (26) employs the sub-optimal SHT from the proof sketch of Theorem 2. To prove the converse in (27), we first derive, in Theorem 9 in Appendix C below, a meta-converse bound for VLSF codes. The meta-converse bound in Theorem 9 bounds the error probability of any given VLSF code from below by the minimum achievable type-II error probability of the corresponding SHT; it is an extension and a tightening of Polyanskiy et al.'s converse in (16), since for $d_N = 1$, weakening it by applying a loose bound on the performance of SHTs from [36, Th. 3.2.2] recovers (16). The Cover–Thomas symmetry assumption allows us to circumvent the maximization of that minimum type-II error probability over codes, since the log-likelihood ratio $\log\frac{P_{Y|X}(Y|x)}{P_Y(Y)}$ is the same regardless of the channel input $x$ for that channel class. In both bounds (26)–(27), we use the expansions for the average stopping time and the type-II error probability from [36, Ch. 2-3]. See Appendix C for details. ∎

Theorem 3 establishes that when $\frac{d_N}{\log N} \to \infty$, the second-order term of the logarithm of the maximum achievable message set size among VLSF codes with uniformly spaced decoding times is $-\frac{d_N C}{2}$. Theorem 3 implies that in order to achieve the same performance as achieved in (21) with $L$ decoding times, one needs on average $\Omega\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right)$ uniformly spaced stop-feedback instances, suggesting that the optimization of the available decoding times considered in Theorem 2 is crucial for attaining the second-order term in (21).

The case where $d_N = \Omega(N)$ is not as interesting as the case where $d_N = o(N)$: analyzing Theorem 9 using the Chernoff bound yields a bound on the probability that the optimal SHT makes a decision at times other than $n_1 = 0$ and $\frac{N}{1-\epsilon}(1 + o(1))$. Since that probability decays exponentially with $N$, the scenario where $L$ is unbounded and $d_N = \Omega(N)$ is asymptotically equivalent to $L = 2$. For example, for $d_N = \frac{1}{\ell}\frac{N}{1-\epsilon}\left(1 + O\left(\frac{1}{\sqrt{N \log N}}\right)\right)$ for some $\ell \in \mathbb{Z}_+$, the right-hand side of (21) is tight up to the second-order term.

IV VLSF Codes for the DM-MAC

We begin by introducing the definitions used for the multi-transmitter setting.

IV-A Definitions

A $K$-transmitter DM-MAC is defined by a triple $\left(\prod_{k=1}^K \mathcal{X}_k, P_{Y_K|X_{[K]}}, \mathcal{Y}_K\right)$, where $\mathcal{X}_k$ is the finite input alphabet for transmitter $k \in [K]$, $\mathcal{Y}_K$ is the finite output alphabet of the channel, and $P_{Y_K|X_{[K]}}$ is the channel transition probability.

In what follows, subscripts and superscripts indicate the corresponding transmitter indices and codeword lengths, respectively. Let $P_{Y_K}$ denote the marginal output distribution induced by the input distribution $P_{X_{[K]}}$. The unconditional and conditional information densities are defined for each non-empty $\mathcal{A} \subseteq [K]$ as

\imath_K(x_{\mathcal{A}}; y) \triangleq \log \frac{P_{Y_K|X_{\mathcal{A}}}(y|x_{\mathcal{A}})}{P_{Y_K}(y)} \quad (28)
\imath_K(x_{\mathcal{A}}; y|x_{\mathcal{A}^c}) \triangleq \log \frac{P_{Y_K|X_{[K]}}(y|x_{[K]})}{P_{Y_K|X_{\mathcal{A}^c}}(y|x_{\mathcal{A}^c})}, \quad (29)

where $\mathcal{A}^c = [K] \setminus \mathcal{A}$. Note that in (28)–(29), the information density functions depend on the transmitter set $\mathcal{A}$ unless further symmetry conditions are assumed (e.g., in some cases we assume that the components of $P_{X_{[K]}}$ are i.i.d. and that $P_{Y_K|X_{[K]}}$ is invariant to permutations of the inputs $X_{[K]}$).

The corresponding mutual informations under the input distribution $P_{X_{[K]}}$ and the channel transition probability $P_{Y_K|X_{[K]}}$ are defined as

I_K(X_{\mathcal{A}}; Y_K) \triangleq \mathbb{E}\left[\imath_K(X_{\mathcal{A}}; Y_K)\right] \quad (30)
I_K(X_{\mathcal{A}}; Y_K|X_{\mathcal{A}^c}) \triangleq \mathbb{E}\left[\imath_K(X_{\mathcal{A}}; Y_K|X_{\mathcal{A}^c})\right]. \quad (31)

The dispersions are defined as

V_K(X_{\mathcal{A}}; Y_K) \triangleq \mathrm{Var}\left[\imath_K(X_{\mathcal{A}}; Y_K)\right] \quad (32)
V_K(X_{\mathcal{A}}; Y_K|X_{\mathcal{A}^c}) \triangleq \mathrm{Var}\left[\imath_K(X_{\mathcal{A}}; Y_K|X_{\mathcal{A}^c})\right]. \quad (33)

For brevity, we define

I_K \triangleq I_K(X_{[K]}; Y_K) \quad (34)
V_K \triangleq \mathrm{Var}\left[\imath_K(X_{[K]}; Y_K)\right]. \quad (35)

A VLSF code for the MAC with $K$ transmitters is defined similarly to the VLSF code for the PPC.

Definition 2

Fix $\epsilon \in (0,1)$, $N \in (0, \infty)$, and positive integers $M_k$, $k \in [K]$. An $(N, L, M_{[K]}, \epsilon)$-VLSF code for the MAC comprises

  1. non-negative integer-valued decoding times $n_1 < \cdots < n_L$,

  2. $K$ finite alphabets $\mathcal{U}_k$, $k \in [K]$, defining common randomness random variables $U_1, \dots, U_K$,

  3. $K$ sequences of encoding functions $\mathsf{f}_n^{(k)} \colon \mathcal{U}_k \times [M_k] \to \mathcal{X}_k$, $k \in [K]$,

  4. a stopping time $\tau \in \{n_1, \dots, n_L\}$ for the filtration generated by $\{U_1, \dots, U_K, Y_K^{n_\ell}\}_{\ell=1}^L$, satisfying the average decoding time constraint (11), and

  5. $L$ decoding functions $\mathsf{g}_{n_\ell} \colon \mathcal{U}_{[K]} \times \mathcal{Y}_K^{n_\ell} \to \prod_{k=1}^K [M_k] \cup \{\mathsf{e}\}$ for $\ell \in [L]$, satisfying an average error probability constraint

     \mathbb{P}\left[\mathsf{g}_\tau(U_{[K]}, Y_K^\tau) \neq W_{[K]}\right] \leq \epsilon, \quad (36)

     where the independent messages $W_1, \dots, W_K$ are uniformly distributed on the sets $[M_1], \dots, [M_K]$, respectively.

IV-B Our Achievability Bounds

Our main results are second-order achievability bounds for rates approaching a point on the sum-rate boundary of the MAC achievable region expanded by a factor of $\frac{1}{1-\epsilon}$.

Theorem 4, below, is a non-asymptotic achievability bound for any DM-MAC with $K$ transmitters and $L$ decoding times.

Theorem 4

Fix constants $\epsilon \in (0,1)$, $\gamma \in \mathbb{R}$, $\lambda^{(\mathcal{A})} > 0$ for $\mathcal{A} \in \mathcal{P}([K])$, integers $0 \leq n_1 < \cdots < n_L$, and distributions $P_{X_k}$, $k \in [K]$. For any DM-MAC with $K$ transmitters $(\prod_{k=1}^K \mathcal{X}_k, P_{Y_K|X_{[K]}}, \mathcal{Y}_K)$, there exists an $(N, L, M_{[K]}, \epsilon)$-VLSF code with

\epsilon \leq \mathbb{P}\left[\imath_K(X_{[K]}^{n_L}; Y_K^{n_L}) < \gamma\right] \quad (37)
    + \prod_{k=1}^K (M_k - 1) \exp\{-\gamma\} \quad (38)
    + \sum_{\ell=1}^L \sum_{\mathcal{A} \in \mathcal{P}([K])} \mathbb{P}\left[\imath_K(X_{\mathcal{A}}^{n_\ell}; Y_K^{n_\ell}) > N(I_K(X_{\mathcal{A}}; Y_K) + \lambda^{(\mathcal{A})})\right] \quad (39)
    + \sum_{\mathcal{A} \in \mathcal{P}([K])} \left(\prod_{k \in \mathcal{A}^{\mathrm{c}}} (M_k - 1)\right) \exp\{-\gamma + N I_K(X_{\mathcal{A}}; Y_K) + N \lambda^{(\mathcal{A})}\} \quad (40)

N \leq n_1 + \sum_{\ell=1}^{L-1} (n_{\ell+1} - n_\ell) \mathbb{P}\left[\imath_K(X_{[K]}^{n_\ell}; Y_K^{n_\ell}) < \gamma\right]. \quad (41)
Proof:

The proof of Theorem 4 uses a random coding argument that employs $K$ independent codebook ensembles, each with distribution $P_{X_k}^{n_L}$, $k \in [K]$. The receiver employs $L$ decoders that operate by comparing an information density $\imath_K(x_{[K]}^{n_\ell}; y^{n_\ell})$ for each possible transmitted codeword set to a threshold. At time $n_\ell$, decoder $\mathsf{g}_{n_\ell}$ computes the information densities $\imath_K(X_{[K]}^{n_\ell}(m_{[K]}); Y_K^{n_\ell})$; if there exists a unique message vector $\hat{m}_{[K]}$ satisfying $\imath_K(X_{[K]}^{n_\ell}(\hat{m}_{[K]}); Y_K^{n_\ell}) > \gamma$, then the receiver decodes to the message vector $\hat{m}_{[K]}$; if there exist multiple such message vectors, then the receiver stops the transmission and declares an error. If no such message vector exists at time $n_\ell$, then the receiver emits the output $\mathsf{e}$ and passes the decoding time $n_\ell$ without decoding if $n_\ell < n_L$, and it declares an error if $n_\ell = n_L$. The term (37) bounds the probability that the information density corresponding to the true messages is below the threshold at all decoding times; (38) bounds the probability that all messages are decoded incorrectly; and (39)–(40) bound the probability that the messages from the transmitter index set $\mathcal{A} \subseteq [K]$ are decoded incorrectly while the messages from the index set $\mathcal{A}^c$ are decoded correctly. The proof of Theorem 4 appears in Appendix D. ∎

Theorem 5, below, is a second-order achievability bound in the asymptotic regime $L = O(1)$ for any DM-MAC. It follows from an application of Theorem 4.

Theorem 5

Fix $\epsilon \in (0,1)$, an integer $L = O(1) \geq 2$, and distributions $P_{X_k}$, $k \in [K]$. For any $K$-transmitter DM-MAC $(\prod_{k=1}^K \mathcal{X}_k, P_{Y_K|X_{[K]}}, \mathcal{Y}_K)$, there exists a $K$-tuple $M_{[K]}$ and an $(N, L, M_{[K]}, \epsilon)$-VLSF code satisfying

\sum_{k \in [K]} \log M_k = \frac{N I_K}{1-\epsilon} - \sqrt{N \log_{(L-1)}(N) \frac{V_K}{1-\epsilon}} + O\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right). \quad (42)
Proof:

See Appendix D. ∎

In the application of Theorem 4 to prove Theorem 5, we choose the parameters $\lambda^{(\mathcal{A})}$ and $\gamma$ so that the terms in (39)–(40) decay exponentially with $N$ and thus become negligible compared to (37) and (38). Between (37) and (38), the term (37) is dominant when $L$ does not grow with $N$, and (38) is dominant when $L$ grows linearly with $N$.

Like the single-threshold rule from [28] for the RAC, the single-threshold rule employed in the proof of Theorem 4 differs from the decoding rules employed in [20] for VLSF codes over the Gaussian MAC with expected power constraints and in [23] for the DM-MAC. In both [20] and [23], $L = n_{\max} = \infty$, and the decoder employs $2^K - 1$ simultaneous threshold rules, one for each of the boundaries that define the achievable region of the MAC with $K$ transmitters. Those rules fix thresholds $\gamma^{(\mathcal{A})}$, $\mathcal{A} \in \mathcal{P}([K])$, and decode messages $m_{[K]}$ if for all $\mathcal{A} \in \mathcal{P}([K])$, the codeword for $m_{[K]}$ satisfies

\imath_K(X_{\mathcal{A}}^{n_\ell}(m_{\mathcal{A}}); Y_K^{n_\ell}|X_{\mathcal{A}^c}^{n_\ell}(m_{\mathcal{A}^c})) > \gamma^{(\mathcal{A})}, \quad (43)

for some $\gamma^{(\mathcal{A})}$, $\mathcal{A} \in \mathcal{P}([K])$. Our decoder can be viewed as a special case of (43) obtained by setting $\gamma^{(\mathcal{A})} = -\infty$ for $\mathcal{A} \neq [K]$.
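A schematic sketch (ours) of the simultaneous threshold rules in (43); `cond_info_density` is a hypothetical placeholder returning $\imath_K(X_{\mathcal{A}}^{n_\ell}(m_{\mathcal{A}}); Y_K^{n_\ell} | X_{\mathcal{A}^c}^{n_\ell}(m_{\mathcal{A}^c}))$ for a candidate message vector, and the full set $[K]$ is included among the checked subsets. Our single-threshold decoder corresponds to setting every threshold except the full-set one to $-\infty$.

```python
from itertools import combinations
import math

def passes_all_thresholds(K, msgs, y, gamma, cond_info_density):
    """Check rule (43) for every non-empty subset A of [K] (2^K - 1 rules)."""
    for r in range(1, K + 1):
        for A in combinations(range(1, K + 1), r):
            # thresholds absent from gamma default to -infinity (always satisfied)
            if cond_info_density(A, msgs, y) <= gamma.get(A, -math.inf):
                return False
    return True

# Single-threshold special case: only the full-set rule carries a finite threshold.
# gamma = {tuple(range(1, K + 1)): gamma_full}
```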

Analyzing Theorem 4 in the asymptotic regime $L = \Omega(N)$, we determine that there exists a $K$-tuple $M_{[K]}$ and an $(N, \infty, M_{[K]}, \epsilon)$-VLSF code satisfying

\sum_{k \in [K]} \log M_k = \frac{N I_K}{1-\epsilon} - \log N + O(1). \quad (44)

Both (42) and (44) are achieved at rate points that approach a point on the sum-rate boundary of the $K$-MAC achievable region expanded by a factor of $\frac{1}{1-\epsilon}$.

For any VLSF code, the $L = \infty$ case can be treated as $L = \Omega(N)$ regardless of the number of transmitters: if we truncate an infinite-length code at time $n_{\max} = 2N$, then, by the Chernoff bound, the resulting penalty term added to the error probability decays exponentially with $N$, and its effect in (44) is $o(1)$. See Appendix D.III for the proof of (44).

For $L = n_{\max} = \infty$, Trillingsgaard and Popovski [23] numerically evaluate their non-asymptotic achievability bound for a DM-MAC, while Truong and Tan [20] provide an achievability bound with second-order term $-O(\sqrt{N})$ for the Gaussian MAC with average power constraints. Applying our single-threshold rule and analysis to the Gaussian MAC with average power constraints improves the second-order term in [20] from $-O(\sqrt{N})$ to $-\log N + O(1)$ for all non-corner points in the achievable region. The main challenge in [20] is to derive a tight bound on the expected value of the maximum over $\mathcal{A} \subseteq [K]$ of the stopping times $\tau^{(\mathcal{A})}$ for the corresponding threshold rules in (43). In our analysis, we avoid that challenge by employing a single-threshold decoder whose average decoding time is bounded by $\mathbb{E}\left[\tau^{([K])}\right]$.

Under the same model and assumptions on LL, to achieve non-corner rate points that do not lie on the sum-rate boundary, we modify our single-threshold rule to (43), where 𝒜\mathcal{A} is the transmitter index set whose rate constraint is active at the (non-corner) point of interest. Following steps similar to the proof of (44) gives the second-order term logN+O(1)-\log N+O(1) for those points as well. For corner points, more than one boundary constraint is active (the capacity region of a KK-transmitter MAC is bounded by 2K12^{K}-1 planes, and, by definition of a corner point, at least two of the corresponding inequalities hold with equality there); therefore, more than one threshold rule in (43) is needed at the decoder. In this case, again for L=L=\infty, [20] proves an achievability bound with a second-order term O(N)-O(\sqrt{N}). Whether this bound can be improved to logN+O(1)-\log N+O(1) as in (44) remains an open problem.

V VLSF Codes for the DM-RAC with at Most KK Transmitters

Definition 3 (Yavas et al. [28, eq. (1)])

A permutation-invariant, reducible DM-RAC for the maximal number of transmitters K<K<\infty is defined by a family of DM-MACs {(𝒳k,PYk|X[k],𝒴k)}k=0K\left\{\left(\mathcal{X}^{k},P_{Y_{k}|X_{[k]}},\mathcal{Y}_{k}\right)\right\}_{k=0}^{K}, where the kk-th DM-MAC defines the channel for kk active transmitters.

By assumption, each of the DM-MACs satisfies the permutation-invariance condition

PYk|X[k](y|x[k])=PYk|X[k](y|xπ[k])\displaystyle P_{Y_{k}|X_{[k]}}(y|x_{[k]})=P_{Y_{k}|X_{[k]}}(y|x_{\pi[k]}) (45)

for all permutations π[k]\pi[k] of [k][k], and y𝒴ky\in\mathcal{Y}_{k}, and the reducibility condition

PYs|X[s](y|x[s])=PYk|X[k](y|x[s],0ks)\displaystyle P_{Y_{s}|X_{[s]}}(y|x_{[s]})=P_{Y_{k}|X_{[k]}}(y|x_{[s]},0^{k-s})\quad (46)

for all s<ks<k, x[s]𝒳[s]x_{[s]}\in{\mathcal{X}}_{[s]}, and y𝒴sy\in{\mathcal{Y}}_{s}, where 0𝒳0\in\mathcal{X} specifies a unique “silence” symbol that is transmitted when a transmitter is silent.

The permutation-invariance (45) and reducibility (46) conditions simplify the presentation and enable us to show, using a single-threshold rule at the decoder [28], that the symmetrical rate point (R,R,,R)(R,R,\dots,R) at which the code operates lies on the sum-rate boundary of each of the underlying DM-MACs.
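
For concreteness, the following Python sketch (our own illustration; the array representation of the channels and the assumption that 𝒴s\mathcal{Y}_{s} occupies the first output indices of 𝒴k\mathcal{Y}_{k} are ours) checks the permutation-invariance (45) and reducibility (46) conditions for channels given as conditional-probability arrays.

import numpy as np
from itertools import permutations

def is_permutation_invariant(Pk):
    # Condition (45): P_{Y_k|X_[k]} is unchanged under permuting the k input axes.
    k = Pk.ndim - 1
    return all(np.allclose(Pk, np.transpose(Pk, axes=perm + (k,)))
               for perm in permutations(range(k)))

def is_reducible(Ps, Pk):
    # Condition (46): silencing users s+1, ..., k (input symbol 0) recovers the
    # s-user channel; we assume Y_s occupies the first |Y_s| output indices of Y_k.
    s, k = Ps.ndim - 1, Pk.ndim - 1
    reduced = Pk[(slice(None),) * s + (0,) * (k - s)][..., :Ps.shape[-1]]
    return np.allclose(Ps, reduced)

# Tiny demo: a noiseless 2-user binary adder Y = X1 + X2 and the 1-user channel
# obtained by silencing user 2.
P2 = np.zeros((2, 2, 3))
for x1 in range(2):
    for x2 in range(2):
        P2[x1, x2, x1 + x2] = 1.0
P1 = P2[:, 0, :2]
print(is_permutation_invariant(P2), is_reducible(P1, P2))  # True True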

The VLSF RAC code defined here combines our rateless communication strategy from [28] with the sparse feedback VLSF PPC and MAC codes with optimized average decoding times described above. Specifically, the decoder estimates the value of kk at time n0n_{0}. If the estimate k^\hat{k} is not zero, it decodes at one of the LL decoding times nk^,1<nk^,2<<nk^,Ln_{\hat{k},1}<n_{\hat{k},2}<\dots<n_{\hat{k},L} (rather than just the single time nk^n_{\hat{k}} used in [28, 39]). For every k[K]k\in[K], the locations of the LL decoding times are optimized to attain the minimum average decoding delay. As in [28], we do not assume any probability distribution on the user activity pattern. We seek instead to optimize the rate-reliability trade-off simultaneously for all possible activity patterns. (By the permutation-invariance assumption, there are only KK distinguishable activity patterns to consider here indexed by the number of active transmitters.) If the decoder concludes that no transmitters are active, then it ends the transmission at time n0n_{0} decoding no messages. At each time ni,n_{i,\ell}, i<k^i<\hat{k}, the receiver broadcasts “0” to the transmitters, signaling that they should continue to transmit. At time nk^,n_{\hat{k},\ell}, the receiver broadcasts feedback bit “1” to the transmitters if it is able to decode k^\hat{k} messages; otherwise, it outputs an erasure symbol “𝖾\mathsf{e}” and sends feedback bit “0”, again signaling that decoding has not occurred and transmission should continue.

As in [28, 39], we assume that the transmitters know nothing about the set 𝒜\mathcal{A} except their own membership and the receiver’s feedback at potential decoding times. We employ identical encoding [44], that is, all transmitters use the same codebook. This implies that the RAC code operates at the symmetrical rate point, i.e., Mi=MM_{i}=M for i[K]i\in[K]. As in [44, 28], the decoder is required to decode the list of messages transmitted by the active transmitters but not the identities of these transmitters.

To deal with the scenario where the number of transmitters in the RAC grows linearly with the blocklength, i.e., K=Ω(N)K=\Omega(N), [44] employs the per-user error probability (PUPE) constraint rather than the joint error probability used here and in the analysis of the MAC (e.g., [20, 28, 39]). The PUPE is a weaker error probability constraint since, under PUPE, an error at one decoder does not count as an error at all other decoders. In [28], it is shown that when K=O(1)K=O(1), PUPE and joint error probability constraints have the same second-order performance for random access coding. As a result, there is no advantage to using PUPE rather than the more stringent joint error criterion when K=O(1)K=O(1). Therefore, we employ the joint error probability constraint throughout.

We formally define VLSF codes for the RAC as follows.

Definition 4

Fix ϵ0,,ϵK(0,1)\epsilon_{0},\dots,\epsilon_{K}\in(0,1) and N0,,NK(0,)N_{0},\dots,N_{K}\in(0,\infty). An ({Nk}k=0K,L,M,{ϵk}k=0K)(\{N_{k}\}_{k=0}^{K},L,M,\{\epsilon_{k}\}_{k=0}^{K})-VLSF code with identical encoders comprises

  1.

    a set of integers 𝒩{n0}{nk,:k[K],[L]}\mathcal{N}\triangleq\{n_{0}\}\cup\{n_{k,\ell}\colon k\in[K],\ell\in[L]\} (without loss of generality, we assume that nK,Ln_{K,L} is the largest available decoding time),

  2.

    a common randomness random variable UU on an alphabet 𝒰\mathcal{U},

  3.

    a sequence of encoding functions 𝖿n:𝒰×[M]𝒳\mathsf{f}_{n}\colon\mathcal{U}\times[M]\to\mathcal{X}, n=1,2,,nK,Ln=1,2,\ldots,n_{K,L}, defining MM length-nK,Ln_{K,L} codewords,

  4.

    K+1K+1 non-negative integer-valued random stopping times τk𝒩\tau_{k}\in\mathcal{N}, k{0}[K]k\in\{0\}\cup[K], for the filtration generated by {U,Ykn}n𝒩\{U,Y_{k}^{n}\}_{n\in\mathcal{N}}, satisfying

    𝔼[τk]Nk\displaystyle\mathbb{E}\left[\tau_{k}\right]\leq N_{k} (47)

    if k{0}[K]k\in\{0\}\cup[K] transmitters are active, and

  5.

    KL+1KL+1 decoding functions 𝗀n0:𝒰×𝒴0n0{}{𝖾}\mathsf{g}_{n_{0}}\colon\mathcal{U}\times\mathcal{Y}_{0}^{n_{0}}\to\{\emptyset\}\cup\{\mathsf{e}\} and 𝗀nk,:𝒰×𝒴knk,[M]k{𝖾}\mathsf{g}_{n_{k,\ell}}\colon\mathcal{U}\times\mathcal{Y}_{k}^{n_{k,\ell}}\to[M]^{k}\cup\{\mathsf{e}\}, k[K]k\in[K] and [L]\ell\in[L], satisfying an average error probability constraint

    [𝗀τk(U,Ykτk)πW[k]]ϵk\displaystyle\mathbb{P}\left[\mathsf{g}_{\tau_{k}}(U,Y_{k}^{\tau_{k}})\stackrel{{\scriptstyle\pi}}{{\neq}}W_{[k]}\right]\leq\epsilon_{k} (48)

    when k[K]k\in[K] messages W[k]=(W1,,Wk)W_{[k]}=(W_{1},\dots,W_{k}) are transmitted, where W1,,WkW_{1},\dots,W_{k} are independent and equiprobable on the set [M][M], and

    [𝗀τ0(U,Y0τ0)]ϵ0\displaystyle\mathbb{P}\left[\mathsf{g}_{\tau_{0}}(U,Y_{0}^{\tau_{0}})\neq\emptyset\right]\leq\epsilon_{0} (49)

    when no transmitters are active.

To guarantee that the symmetrical rate point arising from identical encoding lies on the sum-rate boundary for all k[K]k\in[K], following [28], we assume that there exists an input distribution PXP_{X} that satisfies the interference assumptions

PX[t]|YkPX[s]|YkPX[s+1:t]|Yks<tkK.\displaystyle P_{X_{[t]}|Y_{k}}\neq P_{X_{[s]}|Y_{k}}\,P_{X_{[s+1:t]}|Y_{k}}\quad\forall\,s<t\leq k\leq K. (50)

Permutation-invariance (45), reducibility (46), and interference (50) together imply that the mutual information per transmitter, Ikk\frac{I_{k}}{k}, strictly decreases with increasing kk (see [28, Lemma 1]). This property guarantees the existence of decoding times satisfying nk1,1<nk2,2n_{k_{1},\ell_{1}}<n_{k_{2},\ell_{2}} for any k1<k2k_{1}<k_{2} and 1,2[L]\ell_{1},\ell_{2}\in[L].

In order to be able to detect the number of active transmitters using the received symbols Ynk,Y^{n_{k,\ell}} but not the codewords themselves, we require that the input distribution PXP_{X} satisfies the distinguishability assumption

PYk1PYk2k1k2{0}[K],\displaystyle P_{Y_{k_{1}}}\neq P_{Y_{k_{2}}}\quad\forall\,k_{1}\neq k_{2}\in\{0\}\cup[K], (51)

where PYkP_{Y_{k}} is the marginal output distribution under the kk-transmitter DM-MAC with input distribution PX[k]=(PX)kP_{X_{[k]}}=(P_{X})^{k}.

An example of a permutation-invariant and reducible DM-RAC that satisfies the interference (50) and distinguishability (51) assumptions is the adder-erasure RAC in [45, 28]:

Yk={i=1kXi,w.p. 1δ𝖾w.p. δ,\displaystyle Y_{k}=\begin{cases}\sum_{i=1}^{k}X_{i},&\text{w.p. }1-\delta\\ \mathsf{e}&\text{w.p. }\delta,\end{cases} (52)

where Xi{0,1}X_{i}\in\{0,1\}, Yk{0,,k}{𝖾}Y_{k}\in\left\{0,\ldots,k\right\}\cup\{\mathsf{e}\}, and δ(0,1)\delta\in(0,1).
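
A minimal Python simulation of the channel law (52), included only as an illustration, is given below; the symbol 'e' stands for the erasure output.

import random

def adder_erasure_rac(inputs, delta, rng=random):
    # Channel law (52): output the sum of the k binary inputs w.p. 1 - delta,
    # and the erasure symbol 'e' w.p. delta.
    return 'e' if rng.random() < delta else sum(inputs)

random.seed(0)
k, delta = 3, 0.1
outputs = [adder_erasure_rac([random.randint(0, 1) for _ in range(k)], delta)
           for _ in range(10)]
print(outputs)  # sums in {0, ..., k}, with occasional erasures 'e'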

Theorem 6

Fix ϵ(0,1)\epsilon\in(0,1), finite integers K1K\geq 1 and L2L\geq 2, and a distribution PXP_{X} satisfying (50)–(51). For any permutation-invariant (45) and reducible (46) DM-RAC {(𝒳k,PYk|X[k],𝒴k)}k=0K\left\{(\mathcal{X}^{k},P_{Y_{k}|X_{[k]}},\mathcal{Y}_{k})\right\}_{k=0}^{K}, there exists an ({Nk}k=0K,L,M,{ϵk}k=0K)(\{N_{k}\}_{k=0}^{K},L,M,\{\epsilon_{k}\}_{k=0}^{K})-VLSF code satisfying

klogM\displaystyle k\log M =NkIk1ϵkNklog(L1)(Nk)Vk1ϵk\displaystyle={\frac{N_{k}I_{k}}{1-\epsilon_{k}}}-\sqrt{N_{k}\log_{(L-1)}(N_{k})\frac{V_{k}}{1-\epsilon_{k}}}
+O(Nklog(L)(Nk))\displaystyle\quad+O\left(\sqrt{\frac{N_{k}}{\log_{(L)}(N_{k})}}\right) (53)

for k[K]k\in[K], and

N0=clogN1+o(logN1)\displaystyle N_{0}=c\log N_{1}+o(\log N_{1}) (54)

for some c>0c>0.

Proof:

The coding strategy to prove Theorem 6 is as follows. The decoder applies a (K+1)(K+1)-ary hypothesis test using the output sequence Yn0Y^{n_{0}} and forms an estimate k^\hat{k} of the number of active transmitters k{0,1,,K}k\in\{0,1,\dots,K\}. If the hypothesis test declares that k^=0\hat{k}=0, then the receiver stops the transmission at time n0n_{0}, decoding no messages. If k^0\hat{k}\neq 0, then the receiver decodes k^\hat{k} messages at one of the times nk^,1,,nk^,Ln_{\hat{k},1},\dots,n_{\hat{k},L} using the VLSF code in Theorem 5 for the k^\hat{k}-transmitter DM-MAC with LL decoding times. If the receiver decodes at time nk^,n_{\hat{k},\ell}, then it sends feedback bit ‘0’ at all previous decoding times {n𝒩:n<nk^,}\{n\in\mathcal{N}\colon n<n_{\hat{k},\ell}\} and feedback bit ‘1’ at time nk^,n_{\hat{k},\ell}. Note that, alternatively, the receiver can send its estimate k^\hat{k} using log2(K+1)\lceil\log_{2}(K+1)\rceil bits at time n0n_{0}, informing the transmitters that it will decode at one of the times nk^,1,,nk^,Ln_{\hat{k},1},\dots,n_{\hat{k},L}; in this case, the total number of feedback bits decreases from the worst-case KL+1KL+1 of the strategy described above to at most log2(K+1)+L\lceil\log_{2}(K+1)\rceil+L. The details of the proof appear in Appendix E. ∎
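
To illustrate the activity-detection step, the following Python sketch implements one natural instantiation of the (K+1)(K+1)-ary test: a maximum-likelihood comparison of the candidate output marginals PYkP_{Y_{k}} over the prefix Yn0Y^{n_{0}}. The actual test used in the proof and its error analysis are given in Appendix E; the distributions in the toy usage below are hypothetical.

import math

def estimate_active_transmitters(y_prefix, output_pmfs):
    # output_pmfs[k] maps output symbols to P_{Y_k}(y); return the maximum-
    # likelihood estimate of the number of active transmitters k.
    def log_lik(pmf):
        return sum(math.log(pmf[y]) if pmf.get(y, 0.0) > 0 else -math.inf
                   for y in y_prefix)
    return max(range(len(output_pmfs)), key=lambda k: log_lik(output_pmfs[k]))

# Toy usage with hypothetical binary-output marginals for K = 2:
pmfs = [{0: 0.9, 1: 0.1}, {0: 0.6, 1: 0.4}, {0: 0.3, 1: 0.7}]
print(estimate_active_transmitters([1, 1, 0, 1], pmfs))  # prints 2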

VI VLSF Codes for the Gaussian PPC with Maximal Power Constraints

VI-A Gaussian PPC

The output of a memoryless, Gaussian PPC of blocklength nn in response to the input XnnX^{n}\in\mathbb{R}^{n} is

Yn=Xn+Zn,\displaystyle Y^{n}=X^{n}+Z^{n}, (55)

where Z1,,ZnZ_{1},\ldots,Z_{n} are drawn i.i.d. from 𝒩(0,1)\mathcal{N}(0,1), independent of XnX^{n}.

The channel’s capacity C(P)C(P) and dispersion V(P)V(P) are

C(P)\displaystyle C(P) =12log(1+P)\displaystyle=\frac{1}{2}\log(1+P) (56)
V(P)\displaystyle V(P) =P(P+2)2(1+P)2.\displaystyle=\frac{P(P+2)}{2(1+P)^{2}}. (57)
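
For quick numerical reference, the capacity (56) and dispersion (57) can be evaluated as follows (a small Python helper of ours, with rates in nats).

import math

def gaussian_capacity(P):
    return 0.5 * math.log(1.0 + P)                   # (56), in nats

def gaussian_dispersion(P):
    return P * (P + 2.0) / (2.0 * (1.0 + P) ** 2)    # (57)

print(gaussian_capacity(1.0), gaussian_dispersion(1.0))  # approx 0.3466 and 0.375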

VI-B Related Work on the Gaussian PPC

We first introduce the maximal and average power constraints on VLSF codes for the PPC. Given a VLSF code with LL decoding times n1,,nLn_{1},\dots,n_{L}, the maximal power constraint requires that the length-nn prefixes, n{n1,,nL}n\in\{n_{1},\dots,n_{L}\}, of each codeword all satisfy a power constraint PP, i.e.,

𝖿(u,m)n2nP for all m[M],u𝒰,[L].\displaystyle\left\lVert\mathsf{f}(u,m)^{n_{\ell}}\right\rVert^{2}\leq n_{\ell}P\,\text{ for all }m\in[M],u\in\mathcal{U},\quad\ell\in[L]. (58)

The average power constraint on the length-nLn_{L} codewords, as defined by [20, Def. 1], is

𝔼[𝖿(U,W)nL2]\displaystyle\mathbb{E}\left[\left\lVert\mathsf{f}(U,W)^{n_{L}}\right\rVert^{2}\right] NP.\displaystyle\leq NP. (59)

The definitions of (N,L,M,ϵ,P)max(N,L,M,\epsilon,P)_{\mathrm{max}} and (N,L,M,ϵ,P)ave(N,L,M,\epsilon,P)_{\mathrm{ave}}-VLSF codes for the Gaussian PPC are similar to Definition 1 with the addition of maximal (58) and average (59) power constraints, respectively. Similar to (13), M(N,L,ϵ,P)maxM^{*}(N,L,\epsilon,P)_{\mathrm{max}} (resp. M(N,L,ϵ,P)aveM^{*}(N,L,\epsilon,P)_{\mathrm{ave}}) denotes the maximum achievable message set size with LL decoding times, average decoding time NN, average error probability ϵ\epsilon, and maximal (resp. average) power constraint PP.

In the following, we discuss prior asymptotic expansions of M(N,L,ϵ,P)maxM^{*}(N,L,\epsilon,P)_{\mathrm{max}} and M(N,L,ϵ,P)aveM^{*}(N,L,\epsilon,P)_{\mathrm{ave}} for the Gaussian PPC, where L{1,}L\in\{1,\infty\}.

  a)

    M(N,1,ϵ,P)maxM^{*}(N,1,\epsilon,P)_{\mathrm{max}}: For L=1L=1, P>0P>0, and ϵ(0,1)\epsilon\in(0,1), Tan and Tomamichel [35, Th. 1] and Polyanskiy et al. [13, Th. 54] show that

    logM(N,1,ϵ,P)max\displaystyle\log M^{*}(N,1,\epsilon,P)_{\mathrm{max}}
    =NC(P)NV(P)Q1(ϵ)+12logN+O(1).\displaystyle=NC(P)-\sqrt{NV(P)}Q^{-1}(\epsilon)+\frac{1}{2}\log N+O(1). (60)

    The converse for (60) is derived in [13, Th. 54] and the achievability for (60) in [35, Th. 1]. The achievability scheme in [35, Th. 1] generates i.i.d. codewords uniformly distributed on the nn-dimensional sphere with radius nP\sqrt{nP}, and applies maximum likelihood (ML) decoding. These results imply that random codewords uniformly distributed on a sphere and ML decoding are, together, third-order optimal, meaning that the gap between the achievability and converse bounds in (60) is O(1)O(1).

  b)

    M(N,1,ϵ,P)aveM^{*}(N,1,\epsilon,P)_{\mathrm{ave}}: For L=1L=1 with an average-power-constraint, Yang et al. show in [46] that

    logM(N,1,ϵ,P)ave=NC(P1ϵ)\displaystyle\log M^{*}(N,1,\epsilon,P)_{\mathrm{ave}}=N\,C\left(\frac{P}{1-\epsilon}\right)
    NlogNV(P1ϵ)+O(N).\displaystyle\quad-\sqrt{N\log N\,V\left(\frac{P}{1-\epsilon}\right)}+O(\sqrt{N}). (61)

    Yang et al. use a power control argument to show the achievability of (61). They divide the messages into disjoint sets 𝒜\mathcal{A} and [M]𝒜[M]\setminus\mathcal{A}, where |𝒜|=M(1ϵ)(1o(1))|\mathcal{A}|=M(1-\epsilon)(1-o(1)). For the messages in 𝒜\mathcal{A}, they use an (N,1,|𝒜|,2NlogN,P1ϵ(1o(1)))\left(N,1,|\mathcal{A}|,\frac{2}{\sqrt{N\log N}},\frac{P}{1-\epsilon}(1-o(1))\right)-VLSF code with a single decoding time NN. The codewords are generated i.i.d. uniformly on the sphere with center at 0 and radius NP1ϵ(1o(1))\sqrt{N\frac{P}{1-\epsilon}(1-o(1))}. The messages in [M]𝒜[M]\setminus\mathcal{A} are assigned the all-zero codeword. The converse for (61) follows from an application of the meta-converse [13, Th. 26].

  c)

    M(N,,ϵ,P)aveM^{*}(N,\infty,\epsilon,P)_{\mathrm{ave}}: For VLSF codes with L=nmax=L=n_{\max}=\infty and average power constraint (59), Truong and Tan show in [19, Th. 1] that for ϵ(0,1)\epsilon\in(0,1) and P>0P>0,

    logM(N,,ϵ,P)ave\displaystyle\log M^{*}(N,\infty,\epsilon,P)_{\mathrm{ave}} NC(P)1ϵlogN+O(1)\displaystyle\geq\frac{NC(P)}{1-\epsilon}-\log N+O(1) (62)
    logM(N,,ϵ,P)ave\displaystyle\log M^{*}(N,\infty,\epsilon,P)_{\mathrm{ave}} NC(P)1ϵ+hb(ϵ)1ϵ,\displaystyle\leq\frac{NC(P)}{1-\epsilon}+\frac{h_{b}(\epsilon)}{1-\epsilon}, (63)

    where hbh_{b} is the binary entropy function. The results in (62)–(63) are analogous to the fundamental limits for DM-PPCs (15)–(16) and follow from arguments similar to those in [8]. Since the information density ı(X;Y)\imath(X;Y) for the Gaussian channel is unbounded, bounding the expected value of the decoding time in the proof of [19, Th. 1] requires different techniques from those applicable to DM-PPCs [8].

TABLE II: The performance of VLSF codes for the Gaussian channel in scenarios distinguished by the number of available decoding times LL, the type of the power constraint, and the presence of feedback.
Scenario | Power constraint | First-order term | Second-order term (lower bound) | Second-order term (upper bound)
Fixed-length (L=1)(L=1), no feedback | Max. power | NC(P)NC(P) | NV(P)Q1(ϵ)-\sqrt{NV(P)}Q^{-1}(\epsilon) ([35, 13]) | NV(P)Q1(ϵ)-\sqrt{NV(P)}Q^{-1}(\epsilon) ([13])
Fixed-length (L=1)(L=1), no feedback | Ave. power | NC(P1ϵ)NC\left(\frac{P}{1-\epsilon}\right) | NlogNV(P1ϵ)-\sqrt{N\log NV\left(\frac{P}{1-\epsilon}\right)} ([46]) | NlogNV(P1ϵ)-\sqrt{N\log NV\left(\frac{P}{1-\epsilon}\right)} ([46])
Fixed-length (L=1)(L=1), feedback | Max. power | NC(P)NC(P) | NV(P)Q1(ϵ)-\sqrt{NV(P)}Q^{-1}(\epsilon) ([35, 13]) | NV(P)Q1(ϵ)-\sqrt{NV(P)}Q^{-1}(\epsilon) ([14])
Fixed-length (L=1)(L=1), feedback | Ave. power | NC(P1ϵ)NC\left(\frac{P}{1-\epsilon}\right) | O(log(L)(N))-O(\log_{(L)}(N)) ([47]) | +NlogNV(P1ϵ)+\sqrt{N\log NV\left(\frac{P}{1-\epsilon}\right)} ([47])
Variable-length (L<)(L<\infty) | Max. power | NC(P)1ϵ\frac{NC(P)}{1-\epsilon} | Nlog(L1)(N)V(P)1ϵ-\sqrt{N\log_{(L-1)}(N)\frac{V(P)}{1-\epsilon}} (Theorem 7) | +O(1)+O(1) ([19])
Variable-length (L<)(L<\infty) | Ave. power | NC(P)1ϵ\frac{NC(P)}{1-\epsilon} | Nlog(L1)(N)V(P)1ϵ-\sqrt{N\log_{(L-1)}(N)\frac{V(P)}{1-\epsilon}} (Theorem 7) | +O(1)+O(1) ([19])
Variable-length (L=nmax=)(L=n_{\max}=\infty) | Max. power | NC(P)1ϵ\frac{NC(P)}{1-\epsilon} | O(N)-O(\sqrt{N}) ([1]) | +O(1)+O(1) ([19])
Variable-length (L=nmax=)(L=n_{\max}=\infty) | Ave. power | NC(P)1ϵ\frac{NC(P)}{1-\epsilon} | logN-\log N ([19]) | +O(1)+O(1) ([19])

Table II combines the L=1L=1 summary from [47, Table I] with the corresponding results for L>1L>1 to summarize the performance of VLSF codes for the Gaussian channel in different communication scenarios.

VI-C Main Result

The theorem below is our main result for the Gaussian PPC under the maximal power constraint (58).

Theorem 7

Fix an integer L=O(1)2L=O(1)\geq 2 and real numbers P>0P>0 and ϵ(0,1)\epsilon\in(0,1). For the Gaussian channel with maximal power constraint (58), the maximum message set size achievable by (N,L,M,ϵ,P)(N,L,M,\epsilon,P)-VLSF codes satisfies

logM(N,L,ϵ,P)max\displaystyle\log M^{*}\left(N,L,\epsilon,P\right)_{\max} NC(P)1ϵNlog(L1)(N)V(P)1ϵ\displaystyle\geq{\frac{NC(P)}{1-\epsilon}}-\sqrt{N\log_{(L-1)}(N)\frac{V(P)}{1-\epsilon}}
+O(Nlog(L1)(N)).\displaystyle+O\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right). (64)

The decoding times that achieve (64) satisfy the equations

logM(N,L,ϵ,P)\displaystyle\log M^{*}\left(N,L,\epsilon,P\right)
=nC(P)nlog(L+1)(n)V(P)logn+O(1)\displaystyle=n_{\ell}C(P)-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V(P)}-\log{n_{\ell}}+O(1) (65)

for {2,,L}\ell\in\{2,\dots,L\}, and n1=0n_{1}=0.

Proof:

See Appendix F. ∎

Note that the achievability bound in Theorem 7 has the same form as the one in Theorem 2 with CC and VV replaced with the Gaussian capacity C(P)C(P) and the Gaussian dispersion V(P)V(P), respectively. The bound in (64) holds for the average power constraint as well since any code that satisfies the maximal power constraint also satisfies the average power constraint.

From Shannon’s work in [48], it is known that for the Gaussian channel with a maximal power constraint, drawing i.i.d. Gaussian codewords yields a performance inferior to that achieved by the uniform distribution on the power sphere. As a result, almost all tight achievability bounds for the Gaussian channel in the fixed-length regime under a variety of settings (e.g., all four combinations of the maximal/average power constraint and feedback/no feedback [35, 46, 14, 47] in Table I) employ random codewords drawn uniformly at random on the power sphere. A notable exception is Truong and Tan’s result in (62) [19, Th. 1], which considers VLSF codes with an average power constraint; that result employs i.i.d. Gaussian inputs. The Gaussian distribution works in this scenario because when L=L=\infty, the usually dominant term [ı(XnL;YnL)<γ]\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right] in (18) disappears. The second term (M1)exp{γ}(M-1)\exp\{-\gamma\} in (18) is not affected by the input distribution. Unfortunately, the approach from [19, Th. 1] does not work here since drawing codewords i.i.d. 𝒩(0,P)\mathcal{N}(0,P) satisfies the average power constraint (59) but not the maximal power constraint (58). When L=O(1)L=O(1) and the probability [ı(XnL;YnL)<γ]\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right] dominates, using i.i.d. 𝒩(0,P)\mathcal{N}(0,P) inputs achieves a worse second-order term in the asymptotic expansion (64) of the maximum achievable message set size. For the case L=O(1)L=O(1), we draw codewords according to the rule that the sub-codewords indexed from nj1+1n_{j-1}+1 to njn_{j} are drawn uniformly on the (njnj1)(n_{j}-n_{j-1})-dimensional sphere of radius (njnj1)P\sqrt{(n_{j}-n_{j-1})P} for j[L]j\in[L], independently of each other. Note that this input distribution is dispersion-achieving for the fixed-length no-feedback case, i.e., L=1L=1 [13] and is superior to choosing codewords i.i.d. 𝒩(0,P)\mathcal{N}(0,P), even under the average power constraint. In particular, i.i.d. 𝒩(0,P)\mathcal{N}(0,P) inputs achieve (21), where the dispersion V(P)V(P) is replaced by the variance V~(P)=P1+P\tilde{V}(P)=\frac{P}{1+P} of ı(X;Y)\imath(X;Y) when X𝒩(0,P)X\sim\mathcal{N}(0,P); here V~(P)\tilde{V}(P) is greater than the dispersion V(P)V(P) for all P>0P>0 (see [49, eq. (2.56)]). Whether or not our input distribution is optimal in the second-order term remains an open question.
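
The following Python sketch (our own illustration; the decoding times and the power value are hypothetical) draws a codeword according to this rule and verifies that every prefix ending at a decoding time meets the maximal power constraint (58).

import numpy as np

def draw_codeword(decoding_times, P, rng):
    # Sub-codeword j occupies positions n_{j-1}+1, ..., n_j and is drawn uniformly
    # on the sphere of radius sqrt((n_j - n_{j-1}) P), independently across j.
    blocks, prev = [], 0
    for n in decoding_times:
        d = n - prev
        g = rng.standard_normal(d)
        blocks.append(np.sqrt(d * P) * g / np.linalg.norm(g))
        prev = n
    return np.concatenate(blocks)

rng = np.random.default_rng(0)
x = draw_codeword([100, 250, 500], P=1.0, rng=rng)
# Every prefix ending at a decoding time then meets (58) with equality:
print([float(np.linalg.norm(x[:n]) ** 2 / n) for n in [100, 250, 500]])  # each is P = 1.0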

VII Conclusions

This paper investigates the maximum achievable message set size for sparse VLSF codes over the DM-PPC (Theorem 2), DM-MAC (Theorem 5), DM-RAC (Theorem 6), and Gaussian PPC (Theorem 7) in the asymptotic regime where the number of decoding times LL is constant as the average decoding time NN grows without bound. Under our second-order achievability bounds, the performance improvement due to adding more decoding time opportunities to our code quickly diminishes as LL increases. For example, for the BSC with crossover probability 0.11, at average decoding time N=2000N=2000, our VLSF coding bound with only L=4L=4 decoding times achieves 96% of the rate of Polyanskiy et al.’s VLSF coding bound for L=L=\infty. Incremental redundancy automatic repeat request codes, which are some of the most common feedback codes, employ only a small number of decoding times and stop feedback. Our analysis shows that such a code design is not only practical but also has performance competitive with the best known dense feedback codes.

In all channel types considered, the first-order term in our achievability bounds is NC1ϵ\frac{NC}{1-\epsilon}, where NN is the average decoding time, ϵ\epsilon is the error probability, and CC is the capacity (or the sum-rate capacity in the multi-transmitter case), and the second-order term is O(Nlog(L1)(N))O\left(\sqrt{N\log_{(L-1)}(N)}\right). For DM-PPCs, there is a mismatch between the second-order term of our achievability bound for VLSF codes with L=O(1)L=O(1) decoding times (Theorem 2) and the second-order term of the best known converse bound (16); the latter applies to L=L=\infty, and therefore to any LL. Towards closing the gap between the achievability and converse bounds, in Theorem 9 in Appendix C, below, we derive a non-asymptotic converse bound that links the error probability of a VLSF code with the minimum achievable type-II error probability of an SHT. However, since the threshold values of the optimal SHT with LL decoding times do not have a closed-form expression [36, pp. 153-154], analyzing the non-asymptotic converse bound in Theorem 9 is a difficult task. Whether the second-order term in Theorem 2 is optimal is a question left to future work.

In sparse VLSF codes, optimizing the values of the LL available decoding times is important: to achieve the same performance as L=O(1)L=O(1) optimized decoding times (Theorem 2), one needs Ω(Nlog(L1)(N))\Omega\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right) uniformly spaced decoding times (Theorem 3).

Appendix A Proof of Theorem 1

In this section, we derive an achievability bound based on a general SHT, which we use to prove Theorem 1.

A.I A General SHT-based Achievability Bound

A.I1 SHT definitions

We begin by formally defining an SHT. We extend the definition in [36, Ch. 3] to allow non-i.i.d. distributions and finitely many testing times. Let {Zi}i=1nL\{Z_{i}\}_{i=1}^{n_{L}} be the observed sequence. Consider two hypotheses for the distribution of ZnLZ^{n_{L}}

H0\displaystyle H_{0} :ZnLP0\displaystyle\colon Z^{n_{L}}\sim P_{0} (66)
H1\displaystyle H_{1} :ZnLP1,\displaystyle\colon Z^{n_{L}}\sim P_{1}, (67)

where P0P_{0} and P1P_{1} are distributions on a common alphabet 𝒵nL\mathcal{Z}^{n_{L}}. Let 𝒩{0,1,2,,nL}\mathcal{N}\subseteq\{0,1,2,\dots,n_{L}\} be the set of times that the hypothesis is tested. Let Pi(n)P_{i}^{(n_{\ell})} denote the marginal distribution of the first nn_{\ell} symbols in PiP_{i}, i{0,1}i\in\{0,1\}. At time n𝒩n_{\ell}\in\mathcal{N}, we either decide H0:ZnP0(n)H_{0}\colon Z^{n_{\ell}}\sim P_{0}^{(n_{\ell})}, H1:ZnP1(n)H_{1}\colon Z^{n_{\ell}}\sim P_{1}^{(n_{\ell})}, or we wait until the next available time n+1n_{\ell+1} in 𝒩\mathcal{N}. Let τ\tau be a stopping time adapted to the filtration {(Xn)}n𝒩\{\mathcal{F}(X^{n})\}_{n\in\mathcal{N}}. Let δ\delta be a {0,1}\{0,1\}-valued, (τ)\mathcal{F}(\tau)-measurable function. An SHT is a triple (δ,τ,𝒩)(\delta,\tau,\mathcal{N}), where δ\delta is called the decision rule, τ\tau is called the stopping time, and 𝒩\mathcal{N} is the set of available decision times. Type-I and type-II error probabilities are defined as

α\displaystyle\alpha [δ=1|H0]\displaystyle\triangleq\mathbb{P}\left[\delta=1|H_{0}\right] (68)
β\displaystyle\beta [δ=0|H1].\displaystyle\triangleq\mathbb{P}\left[\delta=0|H_{1}\right]. (69)

Below, we derive an achievability bound using a general SHT.

A.I2 Achievability Bound

Given some input distribution PXnLP_{X^{n_{L}}}, define the common randomness random variable UU on MnL\mathbb{R}^{Mn_{L}} with the distribution

PU=PXnL×PXnL××PXnLM times.\displaystyle P_{U}=\underbrace{P_{X^{n_{L}}}\times P_{X^{n_{L}}}\times\cdots\times P_{X^{n_{L}}}}_{M\text{ times}}. (70)

The realization of UU defines MM length-nLn_{L} codewords XnL(1),XnL(2),,XnL(M)X^{n_{L}}(1),X^{n_{L}}(2),\dots,X^{n_{L}}(M). Denote the set of available decoding times by

𝒩{n1,,nL}.\displaystyle\mathcal{N}\triangleq\{n_{1},\dots,n_{L}\}. (71)

Let {(δm,τ~m,𝒩)}m=1M\{(\delta_{m},\tilde{\tau}_{m},\mathcal{N})\}_{m=1}^{M} be MM copies of an SHT that distinguishes between the hypotheses

H0\displaystyle H_{0} :(XnL,YnL)PXnL×PY|XnL\displaystyle\colon(X^{n_{L}},Y^{n_{L}})\sim P_{X^{n_{L}}}\times P_{Y|X}^{n_{L}} (72)
H1\displaystyle H_{1} :(XnL,YnL)PXnL×PYnL\displaystyle\colon(X^{n_{L}},Y^{n_{L}})\sim P_{X^{n_{L}}}\times P_{{Y}^{n_{L}}} (73)

for each message m[M]m\in[M], where the type-I and type-II error probabilities are α\alpha and β\beta, respectively. Define for m[M]m\in[M] and j{0,1}j\in\{0,1\},

τmj{τ~mif δm=jotherwise.\displaystyle\tau_{m}^{j}\triangleq\begin{cases}\tilde{\tau}_{m}&\text{if }\delta_{m}=j\\ \infty&\text{otherwise}.\end{cases} (74)

Theorem 8, below, is an achievability bound that employs an arbitrary SHT with LL decoding times.

Theorem 8

Fix LL\leq\infty, integers M>0M>0 and 0n1<n2<<nL0\leq n_{1}<n_{2}<\dots<n_{L}\leq\infty, a distribution PXnLP_{X^{n_{L}}} as in (70), and MM copies of an SHT {(δm,τ~m,{n1,nL})}m=1M\{(\delta_{m},\tilde{\tau}_{m},\{n_{1},\dots n_{L}\})\}_{m=1}^{M} as in (72)–(74). There exists an (N,L,M,ϵ)(N,L,M,\epsilon)-VLSF code for the DM-PPC (𝒳,PY|X,𝒴)(\mathcal{X},P_{Y|X},\mathcal{Y}) with

ϵ\displaystyle\epsilon α+(M1)β\displaystyle\leq\alpha+(M-1)\beta (75)
N\displaystyle N 𝔼[min{minm[M]{τm0},maxm[M]{τm1}}].\displaystyle\leq\mathbb{E}\left[\min\left\{\min\limits_{m\in[M]}\left\{\tau_{m}^{0}\right\},\max\limits_{m\in[M]}\left\{\tau_{m}^{1}\right\}\right\}\right]. (76)
Proof:

We generate MM i.i.d. codewords according to (70). For each of MM messages, we run the hypothesis test given in (72)–(73). We decode at the earliest time that one of the following events happens

  • H0H_{0} is declared for some message m[M]m\in[M],

  • H1H_{1} is declared for all m[M]m\in[M].

The decoding output is mm if H0H_{0} is declared for mm; if more than one such mm exists or if no such mm exists, the decoder declares an error.

Mathematically, the random decoding time of this code is expressed as

τ=min{minm[M]{τm0},maxm[M]{τm1}}.\displaystyle\tau^{*}=\min\left\{\min\limits_{m\in[M]}\left\{\tau_{m}^{0}\right\},\max\limits_{m\in[M]}\left\{\tau_{m}^{1}\right\}\right\}. (77)

Note that τ\tau^{*} is bounded by nLn_{L} by construction. The average decoding time bound in (76) immediately follows from (77). The decoder output is

W^{mif !m[M] s. t. τ=τm0errorotherwise.\displaystyle\hat{W}\triangleq\begin{cases}m&\text{if }\exists!\,m\in[M]\text{ s. t. }\tau^{*}=\tau_{m}^{0}\\ \text{error}&\text{otherwise.}\end{cases} (78)

Since the messages are equiprobable, without loss of generality, assume that message m=1m=1 is transmitted. An error occurs if and only if H1H_{1} is decided for m=1m=1 or if H0H_{0} is decided for some m1m\neq 1, giving

ϵ=[{δ1=1}{m=2M{δm=0}}].\displaystyle\epsilon=\mathbb{P}\left[\{\delta_{1}=1\}\cup\left\{\bigcup_{m=2}^{M}\{\delta_{m}=0\}\right\}\right]. (79)

Applying the union bound to (79) shows (75). ∎

A.II Proof of Theorem 1

Theorem 1 particularizes the SHT in Theorem 8 to an information density threshold rule.

In addition to the random code design in (70), let PXnLP_{X^{n_{L}}} satisfy (20). We here specify the stopping rule τm\tau_{m} and the decision rule δm\delta_{m} for the SHT in (72)–(73).

Define the information density for message mm and decoding time nn_{\ell} as

Sm,nı(Xn(m);Yn) for m[M],[L].\displaystyle S_{m,n_{\ell}}\triangleq\imath(X^{n_{\ell}}(m);Y^{n_{\ell}})\text{ for }m\in[M],\ell\in[L]. (80)

Note that Sm,nS_{m,n_{\ell}} is the log-likelihood ratio between the distributions in hypotheses H0H_{0} and H1H_{1}. We fix a threshold γ\gamma\in\mathbb{R} and construct the SHTs

τm\displaystyle\tau_{m} =inf{n𝒩:Sm,nγ}\displaystyle=\inf\{n_{\ell}\in\mathcal{N}\colon S_{m,n_{\ell}}\geq\gamma\} (81)
τ~m\displaystyle\tilde{\tau}_{m} =min{τm,nL}\displaystyle=\min\{\tau_{m},n_{L}\} (82)
δm\displaystyle\delta_{m} ={0if Sm,τ~mγ1if Sm,τ~m<γ\displaystyle=\begin{cases}0&\text{if }S_{m,\tilde{\tau}_{m}}\geq\gamma\\ 1&\text{if }S_{m,\tilde{\tau}_{m}}<\gamma\end{cases} (83)

for all m[M]m\in[M], that is, we decide H0H_{0} for message mm at the first time nn_{\ell} that Sm,nS_{m,n_{\ell}} passes γ\gamma; if this never happens for n{n1,,nL}n_{\ell}\in\{n_{1},\dots,n_{L}\}, then we decide H1H_{1} for mm. Without loss of generality, assume that message 1 is transmitted.
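
The following Python sketch (a schematic illustration of ours, operating on precomputed information-density values rather than on the codewords themselves) mirrors the stopping and decision rules (81)–(83) and the resulting decoder (77)–(78).

def sht_decision(info_densities, gamma):
    # Rules (81)-(83): stop at the first decoding time whose accumulated
    # information density reaches gamma; otherwise decide H_1 (index None).
    for ell, s in enumerate(info_densities):
        if s >= gamma:
            return 0, ell
    return 1, None

def decode(all_info_densities, gamma):
    # Rules (77)-(78): decode the unique message that crosses the threshold at
    # the earliest decoding time; ties or no crossing are declared errors.
    stops = [sht_decision(s, gamma) for s in all_info_densities]
    crossed = [(ell, m) for m, (d, ell) in enumerate(stops) if d == 0]
    if not crossed:
        return 'error'
    first = min(ell for ell, _ in crossed)
    winners = [m for ell, m in crossed if ell == first]
    return winners[0] if len(winners) == 1 else 'error'

# Toy usage: three messages, L = 3 decoding times, threshold gamma = 5.
print(decode([[2.0, 4.5, 6.0], [1.0, 2.0, 3.0], [0.5, 5.5, 7.0]], gamma=5.0))  # prints 2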

Bounding (76) from above, we get

N\displaystyle N 𝔼[min{τ1,nL}]\displaystyle\leq\mathbb{E}\left[\min\{\tau_{1},n_{L}\}\right] (84)
=n=0[min{τ1,nL}>n]\displaystyle=\sum_{n=0}^{\infty}\mathbb{P}\left[\min\{\tau_{1},n_{L}\}>n\right] (85)
=n1+=1L1(n+1n)[τ1>n].\displaystyle=n_{1}+\sum_{\ell=1}^{L-1}(n_{\ell+1}-n_{\ell})\mathbb{P}\left[\tau_{1}>n_{\ell}\right]. (86)

The probability [τ1>n]\mathbb{P}\left[\tau_{1}>n_{\ell}\right] is further bounded as

[τ1>n]\displaystyle\mathbb{P}\left[\tau_{1}>n_{\ell}\right] =[j=1{ı(Xnj(1);Ynj)<γ}]\displaystyle=\mathbb{P}\left[\bigcap_{j=1}^{\ell}\{\imath(X^{n_{j}}(1);Y^{n_{j}})<\gamma\}\right] (87)
[ı(Xn(1);Yn)<γ].\displaystyle\leq\mathbb{P}\left[\imath(X^{n_{\ell}}(1);Y^{n_{\ell}})<\gamma\right]. (88)

Combining (86) and (88) proves (19).

We bound the type-I error probability of the given SHT as

α\displaystyle\alpha [δ1=1]\displaystyle\triangleq\mathbb{P}\left[\delta_{1}=1\right] (89)
=[τ1=]\displaystyle=\mathbb{P}\left[\tau_{1}=\infty\right] (90)
=[j=1L{ı(Xnj(1);Ynj)<γ}]\displaystyle=\mathbb{P}\left[\bigcap_{j=1}^{L}\{\imath(X^{n_{j}}(1);Y^{n_{j}})<\gamma\}\right] (91)
[ı(XnL(1);YnL)<γ],\displaystyle\leq\mathbb{P}\left[\imath(X^{n_{L}}(1);Y^{n_{L}})<\gamma\right], (92)

where (91) uses the definition of the decision rule (83). The type-II error probability is bounded as

β\displaystyle\beta [δ2=0]\displaystyle\triangleq\mathbb{P}\left[\delta_{2}=0\right] (93)
[τ2<]\displaystyle\leq\mathbb{P}\left[\tau_{2}<\infty\right] (94)
=𝔼[exp{ı(XnL(1);YnL)}1{τ1<}]\displaystyle=\mathbb{E}\left[\exp\{-\imath(X^{n_{L}}(1);Y^{n_{L}})\}1\{\tau_{1}<\infty\}\right] (95)
=𝔼[exp{ı(Xτ(1);Yτ)}1{τ1<}]\displaystyle=\mathbb{E}\left[\exp\{-\imath(X^{\tau}(1);Y^{\tau})\}1\{\tau_{1}<\infty\}\right] (96)
exp{γ},\displaystyle\leq\exp\{-\gamma\}, (97)

where (95) follows from changing measure from PXnL(2)YnL=PXnLPYnLP_{X^{n_{L}}(2)Y^{n_{L}}}=P_{X^{n_{L}}}P_{Y^{n_{L}}} to PXnL(1),YnL=PXnLPY|XnLP_{X^{n_{L}}(1),Y^{n_{L}}}=P_{X^{n_{L}}}P_{Y|X}^{n_{L}}. Equality (96) uses the same arguments as in [8, eq. (111)-(118)] and the fact that {exp{ı(Xn(1);Yn)}:n𝒩}{\{\exp\{-\imath(X^{n_{\ell}}(1);Y^{n_{\ell}})\}\colon n_{\ell}\in\mathcal{N}\}} is a martingale due to the product distribution in (20). Inequality (97) follows from the definition of τ1\tau_{1} in (81). Applying (75) together with (92) and (97) proves (18).

In his analysis of the error exponent regime, Forney [17] uses a slightly different threshold rule than the one in (81). Specifically, he uses a maximum a posteriori threshold rule, which can also be written as

logPYn|Xn(Yn|Xn(m))1Mj=1MPYn|Xn(Yn|Xn(j))γ,\displaystyle\log\frac{P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}}(m))}{\frac{1}{M}\sum_{j=1}^{M}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}}(j))}\geq\gamma, (98)

whose denominator is the output distribution induced by the code rather than by the random codeword distribution PXnP_{X}^{n_{\ell}}.

Appendix B Proof of Theorem 2

The proof uses an idea similar to that in [13, Th. 2]: it combines an achievability bound for a VLSF code whose error probability decays sub-exponentially with the stop-at-time-zero procedure. The difference is that we set the sub-exponentially decaying error probability to ϵN=1NlogN\epsilon_{N}^{\prime}=\frac{1}{\sqrt{N^{\prime}\log N^{\prime}}}, while [13, Th. 2] sets it to 1N\frac{1}{N^{\prime}}. This modification yields a better second-order term for finite LL.

Inverting (25), we get

N=N1ϵ(1+O(1NlogN)).\displaystyle N^{\prime}=\frac{N}{1-\epsilon}\left(1+O\left(\frac{1}{\sqrt{N\log N}}\right)\right). (99)

Next, we particularize the decision rules in the SHT at times n2,,nLn_{2},\dots,n_{L} to the information density threshold rule. Lemma 1, below, is an achievability bound for an (N,L,M,1NlogN)\left(N,L,M,\frac{1}{\sqrt{N\log N}}\right)-VLSF code that employs the information density threshold rule with the optimized decoding times and the threshold value.

Lemma 1

Fix an integer L=O(1)1L=O(1)\geq 1. For the DM-PPC with V>0V>0, the maximum message set size (13) achievable by (N,L,M,1NlogN)\left(N,L,M,\frac{1}{\sqrt{N\log N}}\right)-VLSF codes satisfies

logM(N,L,1NlogN)\displaystyle\log M^{*}\left(N,L,\frac{1}{\sqrt{N\log N}}\right) NCNlog(L)(N)V\displaystyle\geq{NC}-\sqrt{N\log_{(L)}(N)\,V}
+O(Nlog(L)(N)).\displaystyle\quad+O\left(\sqrt{\frac{N}{\log_{(L)}(N)}}\right). (100)

The decoding times n1,,nLn_{1},\dots,n_{L} that achieve (100) satisfy the equations

logM\displaystyle\log M =nCnlog(L+1)(n)Vlogn+O(1)\displaystyle=n_{\ell}C-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V}-\log{n_{\ell}}+O(1) (101)

for [L]\ell\in[L].

Proof:

Lemma 1 analyzes Theorem 1. See Appendix B.I, below. For L=O(1)L=O(1), the proof differs significantly from the proof of the corresponding result in [8, eq. (102)] for L=L=\infty because the dominant error probability term [ı(XnL;YnL)<γ]\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right] in (18) disappears when L=L=\infty. ∎

The average decoding time and the average error probability of the VLSF code in Lemma 1 play the roles of NN^{\prime} and ϵN\epsilon_{N}^{\prime} in (23). By Lemma 1, there exists an (N,L1,M,ϵN)(N^{\prime},L-1,M,\epsilon_{N}^{\prime})-VLSF code with

logM\displaystyle\log M =NCNlog(L1)(N)V\displaystyle={N^{\prime}C}-\sqrt{N^{\prime}\log_{(L-1)}(N^{\prime})\,V}
+O(Nlog(L1)(N)).\displaystyle\quad+O\left(\sqrt{\frac{N^{\prime}}{\log_{(L-1)}(N^{\prime})}}\right). (102)

Plugging (99) into (102) and applying the necessary Taylor series expansions complete the proof.

Lemma 1 is an achievability bound in the moderate deviations regime since the error probability 1NlogN\frac{1}{\sqrt{N\log N}} decays sub-exponentially to zero. The fixed-length scenario in Lemma 1, i.e., L=1L=1, is recovered by [50], which investigates the moderate deviations regime in channel coding. A comparison between the right-hand side of (100) and [50, Th. 2] highlights the benefit of using VLSF codes in the moderate deviations regime. The second-order rate achieved by a VLSF code with L2L\geq 2 decoding times, average decoding time NN, and error probability 1NlogN\frac{1}{\sqrt{N\log N}} is achieved by a fixed-length code with blocklength NN and error probability 1log(L1)(N)log(L)(N)\frac{1}{\sqrt{\log_{(L-1)}(N)\log_{(L)}(N)}}.

In the remainder of the appendix, we prove Lemma 1 and show the second-order optimality of the parameters used in the proof of Theorem 2 including the decoding times set in (22).

B.I Proof of Lemma 1

We first present two lemmas used in the proof of Lemma 1 (step 1). We then choose the distribution PXnLP_{X}^{n_{L}} of the random codewords (step 2) and the parameters n1,,nL,γn_{1},\dots,n_{L},\gamma in Theorem 1 (step 3). Finally, we analyze the bounds in Theorem 1 using the supporting lemmas (step 4).

B.I1 Supporting lemmas

Lemma 2, below, is the moderate deviations result that bounds the probability that a sum of nn zero-mean i.i.d. random variables is above a function of nn that grows at most as quickly as n2/3n^{2/3}.

Lemma 2 (Petrov [42, Ch. 8, Th.  4])

Let Z1,,ZnZ_{1},\dots,Z_{n} be i.i.d. random variables. Let 𝔼[Z1]=0\mathbb{E}\left[Z_{1}\right]=0, σ2=Var[Z1]\sigma^{2}=\mathrm{Var}\left[Z_{1}\right], and μ3=𝔼[Z13]\mu_{3}=\mathbb{E}\left[Z_{1}^{3}\right]. Suppose that the moment generating function 𝔼[exp{tZ}]\mathbb{E}\left[\exp\{tZ\}\right] is finite in the neighborhood of t=0t=0. (This condition is known as Cramér’s condition.) Let 0zn=O(n1/6)0\leq z_{n}=O(n^{1/6}). As nn\to\infty, it holds that

[i=1nZiznσn]\displaystyle\mathbb{P}\left[\sum\limits_{i=1}^{n}Z_{i}\geq z_{n}\sigma\sqrt{n}\right]
=Q(zn)exp{zn3μ36nσ3}+O(1nexp{zn22})\displaystyle\quad=Q(z_{n})\exp\left\{\frac{z_{n}^{3}\mu_{3}}{6\sqrt{n}\sigma^{3}}\right\}+O\left(\frac{1}{\sqrt{n}}\exp\left\{-\frac{z_{n}^{2}}{2}\right\}\right) (103)
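
The following Python sketch (a Monte Carlo illustration of ours, not part of the proof) compares the right-hand side of (103) with an empirical tail probability for centered Bernoulli sums; all parameter values below are hypothetical.

import math
import numpy as np

def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

p, n, trials = 0.3, 2000, 200000
sigma = math.sqrt(p * (1 - p))
mu3 = p * (1 - p) * (1 - 2 * p)          # third central moment of Bernoulli(p)
z_n = math.sqrt(math.log(n))             # satisfies z_n = O(n^{1/6}) here

rng = np.random.default_rng(1)
sums = rng.binomial(n, p, size=trials) - n * p
empirical = float(np.mean(sums >= z_n * sigma * math.sqrt(n)))
approx = q_function(z_n) * math.exp(z_n ** 3 * mu3 / (6 * math.sqrt(n) * sigma ** 3))
print(empirical, approx)                 # the two agree to within the O(.) correction in (103)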

Lemma 3, below, gives the asymptotic expansion of the root of an equation. We use Lemma 3 to find the asymptotic expansion for the gap between two consecutive decoding times nn_{\ell} and n+1n_{\ell+1}.

Lemma 3

Let f(x)f(x) be a differentiable increasing function that satisfies f(x)0f^{\prime}(x)\to 0 as xx\to\infty. Suppose that

x+f(x)=y.\displaystyle x+f(x)=y. (104)

Then, as xx\to\infty it holds that

x=yf(y)(1o(1)).\displaystyle x=y-f(y)\left(1-o(1)\right). (105)
Proof:

Define the function F(x)x+f(x)yF(x)\triangleq x+f(x)-y. Applying Newton’s method with the starting point x0=yx_{0}=y yields

x1\displaystyle x_{1} =x0F(x0)F(x0)\displaystyle=x_{0}-\frac{F(x_{0})}{F^{\prime}(x_{0})} (106)
=yf(y)1+f(y)\displaystyle=y-\frac{f(y)}{1+f^{\prime}(y)} (107)
=yf(y)(1f(y)+O(f(y)2)).\displaystyle=y-f(y)(1-f^{\prime}(y)+O(f^{\prime}(y)^{2})). (108)

Recall that f(y)=o(1)f^{\prime}(y)=o(1) by assumption. Equality (108) follows from the Taylor series expansion of the function 11+x\frac{1}{1+x} around x=0x=0. Let

x=yf(y)(1o(1)).\displaystyle x^{\star}=y-f(y)(1-o(1)). (109)

From Taylor’s theorem, it follows that

f(x)=f(y)f(y0)f(y)(1o(1)),\displaystyle f(x^{\star})=f(y)-f^{\prime}(y_{0})f(y)(1-o(1)), (110)

for some y0[yf(y)(1o(1)),y]y_{0}\in[y-f(y)(1-o(1)),y]. Therefore, f(y0)=o(1)f^{\prime}(y_{0})=o(1), and f(x)=f(y)(1o(1))f(x^{\star})=f(y)(1-o(1)). Substituting (109)–(110) into (104), we see that xx^{\star} is a solution to (104). ∎
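
The following Python sketch (our own numerical illustration, with f(x)=xf(x)=\sqrt{x} as a hypothetical example) applies Newton's method as in (106)–(108) and confirms that the root of (104) is close to yf(y)y-f(y) for large yy.

import math

def solve(f, y, iters=20):
    # Newton's method for x + f(x) = y with starting point x0 = y, as in (106)-(108);
    # the derivative of f is approximated numerically.
    x = y
    for _ in range(iters):
        h = 1e-6 * max(1.0, abs(x))
        fprime = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= (x + f(x) - y) / (1.0 + fprime)
    return x

y = 1e8
x = solve(math.sqrt, y)
print(x, y - math.sqrt(y))  # the root and the approximation y - f(y) agree to leading order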

B.I2 Random encoder design

We set the distribution of the random codewords PXnLP_{X^{n_{L}}} as the product of PXP_{X}^{*}’s, where PXP_{X}^{*} is the capacity-achieving distribution with minimum dispersion, i.e.,

PXnL\displaystyle P_{X^{n_{L}}} =(PX)nL\displaystyle=(P_{X}^{*})^{n_{L}} (111)
PX\displaystyle P_{X}^{*} =argminPX{V(X;Y):I(X;Y)=C}.\displaystyle=\arg\min_{P_{X}}\{V(X;Y)\colon I(X;Y)=C\}. (112)

B.I3 Choosing the threshold γ\gamma and decoding times n1,,nLn_{1},\dots,n_{L}

We choose γ,n1,,nL\gamma,n_{1},\dots,n_{L} so that the equalities

γ=nCnlog(L+1)(n)V\displaystyle\gamma=n_{\ell}C-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V} (113)

hold for all [L]\ell\in[L]. This choice minimizes our upper bound (19) on the average decoding time up to the second-order term in the asymptotic expansion. See Appendix B.II for the proof. Applying Lemma 3 with

x\displaystyle x =n+1\displaystyle=n_{\ell+1} (114)
y\displaystyle y =n1Cnlog(L+1)(n)V\displaystyle=n_{\ell}-\frac{1}{C}\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V} (115)
f(x)\displaystyle f(x) =1Cn+1log(L)(n+1)V\displaystyle=-\frac{1}{C}\sqrt{n_{\ell+1}\log_{(L-\ell)}(n_{\ell+1})V} (116)

for {1,,L1}\ell\in\{1,\dots,L-1\}, gives the following gaps between consecutive decoding times

n+1n\displaystyle n_{\ell+1}-n_{\ell} =1C(nlog(L)(n)V\displaystyle=\frac{1}{C}\Big{(}\sqrt{n_{\ell}\log_{(L-\ell)}(n_{\ell})V}
nlog(L+1)(n)V)(1+o(1)).\displaystyle-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V}\Big{)}(1+o(1)). (117)
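
The following Python sketch (our own numerical illustration; the values of CC, VV, LL, and γ\gamma are hypothetical) computes decoding times satisfying (113) by bisection, using the iterated logarithm log(k)\log_{(k)}; the resulting times increase with \ell, consistent with (117).

import math

def iter_log(x, k):
    # k-fold iterated logarithm log_(k)(x).
    for _ in range(k):
        x = math.log(x)
    return x

def decoding_time(gamma, C, V, k, lo=20.0, hi=1e12):
    # Solve n*C - sqrt(n * log_(k)(n) * V) = gamma for n by bisection.
    g = lambda n: n * C - math.sqrt(n * iter_log(n, k) * V) - gamma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

C, V, L, gamma = 0.5, 0.25, 3, 50000.0
times = [decoding_time(gamma, C, V, L - ell + 1) for ell in range(1, L + 1)]
print([round(n) for n in times])  # increasing times n_1 < n_2 < n_3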

B.I4 Analyzing the bounds in Theorem 1

The information density ı(X;Y)\imath(X;Y) of a DM-PPC is a bounded random variable. Therefore, ı(X;Y)\imath(X;Y) satisfies Cramér’s condition (see Lemma 2). (Here, Cramér’s condition is the bottleneck that determines whether our proof technique applies to a specific DM-PPC. For DM-PPCs with infinite input or output alphabets, Cramér’s condition may or may not be satisfied. Our proof technique applies to any DM-PPC for which the information density satisfies Cramér’s condition.) For each [L]\ell\in[L], applying Lemma 2 with γ,n1,,nL\gamma,n_{1},\dots,n_{L} satisfying (113) gives

[ı(Xn;Yn)<γ]\displaystyle\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right] (119)
\displaystyle\leq Q(log(L+1)(n))exp{(log(L+1)(n))3/2μ36nV3/2}\displaystyle Q\left(\sqrt{\log_{(L-\ell+1)}(n_{\ell})}\right)\exp\left\{\frac{-(\log_{(L-\ell+1)}(n_{\ell}))^{3/2}\mu_{3}}{6\sqrt{n}V^{3/2}}\right\}
+O(1nexp{log(L+1)(n)2})\displaystyle+O\left(\frac{1}{\sqrt{n}}\exp\left\{-\frac{\log_{(L-\ell+1)}(n_{\ell})}{2}\right\}\right)
\displaystyle\leq 12π1log(L)(n)1log(L+1)(n)\displaystyle\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{\log_{(L-\ell)}(n_{\ell})}}\frac{1}{\sqrt{\log_{(L-\ell+1)}(n_{\ell})}}\,
(1+O((log(L+1)(n))(3/2)n))\displaystyle\cdot\left(1+O\left(\frac{(\log_{(L-\ell+1)}(n_{\ell}))^{(3/2)}}{\sqrt{n_{\ell}}}\right)\right)

for <L\ell<L, where

μ3𝔼[(ı(X;Y)C)3]<,\displaystyle\mu_{3}\triangleq\mathbb{E}\left[(\imath(X;Y)-C)^{3}\right]<\infty, (120)

and (119) follows from the Taylor series expansion exp{x}=1+x+O(x2)\exp\{x\}=1+x+O(x^{2}) as x0x\to 0, and the well-known bound (e.g., [42, Ch. 8, eq. (2.46)])

Q(x)12π1xexp{x22}for x>0.\displaystyle Q(x)\leq\frac{1}{\sqrt{2\pi}}\frac{1}{x}\exp\left\{-\frac{x^{2}}{2}\right\}\quad\text{for }x>0. (121)

For =L\ell=L, Lemma 2 gives

[ı(XnL;YnL)<γ]\displaystyle\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right] (122)
\displaystyle\leq 12π1nL1lognL(1+O((lognL)(3/2)nL)).\displaystyle\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{n_{L}}}\frac{1}{\sqrt{\log n_{L}}}\left(1+O\left(\frac{(\log n_{L})^{(3/2)}}{\sqrt{n_{L}}}\right)\right).

By Theorem 1, there exists a VLSF code with LL decoding times n1<n2<<nLn_{1}<n_{2}<\cdots<n_{L} such that the expected decoding time is bounded as

Nn1+=1L1(n+1n)[ı(Xn;Yn)<γ].\displaystyle N\leq n_{1}+\sum_{\ell=1}^{L-1}(n_{\ell+1}-n_{\ell})\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right]. (123)

By (117), we have n+1=n(1+o(1))n_{\ell+1}=n_{\ell}(1+o(1)) for [L1]\ell\in[L-1]. Multiplying these asymptotic equations, we get

n=n1(1+o(1))\displaystyle n_{\ell}=n_{1}(1+o(1)) (124)

for [L]\ell\in[L]. Plugging (117), (119), and (124) into (123), we get

Nn1+V2πCn1log(L)(n1)(1+o(1)).\displaystyle N\leq n_{1}+\frac{\sqrt{V}}{\sqrt{2\pi}\,C}\frac{\sqrt{n_{1}}}{\sqrt{\log_{(L)}(n_{1})}}(1+o(1)). (125)

Applying Lemma 3 to (125), we get

n1NV2πCNlog(L)(N)(1+o(1)).\displaystyle n_{1}\geq N-\frac{\sqrt{V}}{\sqrt{2\pi}\,C}\frac{\sqrt{N}}{\sqrt{\log_{(L)}(N)}}(1+o(1)). (126)

Comparing (126) and (117), we observe that for n1n_{1} large enough,

n1<N<n2<<nL.\displaystyle n_{1}<N<n_{2}<\dots<n_{L}. (127)

Further, from (113) and (125), we have

nL=N(1+O(logNN)).\displaystyle n_{L}=N\left(1+O\left(\sqrt{\frac{\log N}{N}}\right)\right). (128)

Finally, we set message set size MM such that

logM=γlogN.\displaystyle\log M=\gamma-\log N. (129)

Plugging (122) and (129) into (18), we bound the error probability as

[𝗀τ(U,Yτ)W]\displaystyle\mathbb{P}\left[\mathsf{g}_{\tau^{*}}(U,Y^{\tau^{*}})\neq W\right] (130)
\displaystyle\leq [ı(XnL;YnL)<γ]+(M1)exp{γ}\displaystyle\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]+(M-1)\exp\{-\gamma\}
\displaystyle\leq 12π1nL1lognL(1+o(1))+1N\displaystyle\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{n_{L}}}\frac{1}{\sqrt{\log n_{L}}}(1+o(1))+\frac{1}{N} (131)
\displaystyle\leq 12π1N1logN(1+o(1))+1N,\displaystyle\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{N}}\frac{1}{\sqrt{\log N}}(1+o(1))+\frac{1}{N}, (132)

where (132) follows from (127). Inequality (132) implies that the error probability is bounded by 1NlogN\frac{1}{\sqrt{N\log N}} for NN large enough. Plugging (126) and (129) into (113) with =1\ell=1, we conclude that there exists an (N,L,M,1NlogN)\left(N,L,M,\frac{1}{\sqrt{N\log N}}\right)-VLSF code with

logM\displaystyle\log M NCNlog(L)(N)V\displaystyle\geq NC-\sqrt{N\log_{(L)}(N)V}
12πNVlog(L)(N)(1+o(1))logN,\displaystyle\quad-\frac{1}{\sqrt{2\pi}}\sqrt{\frac{NV}{\log_{(L)}(N)}}(1+o(1))-\log N, (133)

which completes the proof.

B.II Second-order optimality of the decoding times in Theorem 2

From the code construction described in Theorems 12, the average decoding time is

N(n2,,nL,γ)=N(1ϵ)11ϵN,\displaystyle N(n_{2},\dots,n_{L},\gamma)=N^{\prime}(1-\epsilon)\frac{1}{1-\epsilon_{N}^{\prime}}, (134)

where

N\displaystyle N^{\prime} =n2+i=2L1(ni+1ni)[ı(Xni;Yni)<γ]\displaystyle=n_{2}+\sum_{i=2}^{L-1}(n_{i+1}-n_{i})\mathbb{P}\left[\imath(X^{n_{i}};Y^{n_{i}})<\gamma\right] (135)
ϵN\displaystyle\epsilon_{N}^{\prime} =[ı(XnL;YnL)<γ]+Mexp{γ}.\displaystyle=\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]+M\exp\{-\gamma\}. (136)

We here show that, given a fixed MM, the parameters n2,n3,,nL,γn_{2},n_{3},\dots,n_{L},\gamma chosen according to (113) and (129) (and hence also the error value ϵN\epsilon_{N}^{\prime} chosen in (23), since ϵN\epsilon_{N}^{\prime} is a function of (nL,γ)(n_{L},\gamma)) minimize the average decoding time in (134) in the sense that the second-order term in the expansion of logM\log M in terms of NN is maximized. That is, our parameter choice optimizes our bound for our code construction.

B.II1 Optimality of n2,,nL1n_{2},\dots,n_{L-1}

We first set nLn_{L} and γ\gamma to satisfy the equations

1nLlognL\displaystyle\frac{1}{\sqrt{n_{L}\log n_{L}}} =[ı(XnL;YnL)<γ]+(M1)exp{γ}\displaystyle=\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]+(M-1)\exp\{-\gamma\} (137)
logM\displaystyle\log M =γlognL,\displaystyle=\gamma-\log n_{L}, (138)

and optimize the values of n2,,nL1n_{2},\dots,n_{L-1} under (137)–(138). Section B.II2 proves the optimality of the choices in (137)–(138).

Under (137)–(138), the optimization problem in (134)–(136) reduces to

min\displaystyle\min N(n2,,nL1)\displaystyle N^{\prime}(n_{2},\dots,n_{L-1}) (139)
=n2+i=2L1(ni+1ni)[ı(Xni;Yni)<γ]\displaystyle=n_{2}+\sum_{i=2}^{L-1}(n_{i+1}-n_{i})\mathbb{P}\left[\imath(X^{n_{i}};Y^{n_{i}})<\gamma\right]
s.t. 1nLlognL=[ı(XnL;YnL)<γ]\displaystyle\frac{1}{\sqrt{n_{L}\log n_{L}}}=\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]
+(M1)exp{γ}.\displaystyle\quad\quad\quad\quad\quad\quad+(M-1)\exp\{-\gamma\}.

Next, we define the functions

g(n)\displaystyle g(n) nCγnV\displaystyle\triangleq\frac{nC-\gamma}{\sqrt{nV}} (140)
F(n)\displaystyle F(n) Q(g(n))=1Q(g(n))\displaystyle\triangleq Q(-g(n))=1-Q(g(n)) (141)
f(n)\displaystyle f(n) F(n)=12πexp{g(n)22}g(n).\displaystyle\triangleq F^{\prime}(n)=\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{g(n)^{2}}{2}\right\}\,g^{\prime}(n). (142)

Assume that γ=γn\gamma=\gamma_{n} is such that g(n)=O(n1/6)g(n)=O(n^{1/6}), and limng(n)=\lim\limits_{n\to\infty}g(n)=\infty. Then by Lemma 2, [ı(Xn;Yn)<γ]\mathbb{P}\left[\imath(X^{n};Y^{n})<\gamma\right], which is a step-wise constant function of nn, is approximated by differentiable function 1F(n)1-F(n) as

[ı(Xn;Yn)<γ]=(1F(n))(1+o(1)).\displaystyle\mathbb{P}\left[\imath(X^{n};Y^{n})<\gamma\right]=(1-F(n))(1+o(1)). (143)

Taylor series expansions give

1F(n)\displaystyle 1-F(n) =1g(n)12πexp{g(n)22}(1+o(1))\displaystyle=\frac{1}{g(n)}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{g(n)^{2}}{2}\right\}(1+o(1)) (144)
f(n)\displaystyle f(n) =(1F(n))g(n)g(n)(1+o(1))\displaystyle=(1-F(n))g(n)g^{\prime}(n)(1+o(1)) (145)
g(n)\displaystyle g^{\prime}(n) =CnV(1+o(1)).\displaystyle=\frac{C}{\sqrt{nV}}(1+o(1)). (146)

Let 𝐧=(n2,,nL1){\mathbf{n}^{*}=(n_{2}^{*},\dots,n_{L-1}^{*})} denote the solution to the optimization problem in (139) with [ı(Xn;Yn)<γ]\mathbb{P}\left[\imath(X^{n};Y^{n})<\gamma\right] replaced by (1F(n))(1-F(n)). Then 𝐧\mathbf{n}^{*} must satisfy the Karush-Kuhn-Tucker conditions N(𝐧)=𝟎\nabla N^{\prime}(\mathbf{n}^{*})=\mathbf{0}, giving

Nn2|𝐧=𝐧\displaystyle\left.\frac{\partial N^{\prime}}{\partial n_{2}}\right|_{\mathbf{n}=\mathbf{n}^{*}} =F(n2)(n3n2)f(n2)=0\displaystyle=F(n_{2}^{*})-(n_{3}^{*}-n_{2}^{*})f(n_{2}^{*})=0 (147)
Nn|𝐧=𝐧\displaystyle\left.\frac{\partial N^{\prime}}{\partial n_{\ell}}\right|_{\mathbf{n}=\mathbf{n}^{*}} =F(n)F(n1)(n+1n)f(n)=0\displaystyle=F(n_{\ell}^{*})-F(n_{\ell-1}^{*})-(n_{\ell+1}^{*}-n_{\ell}^{*})f(n_{\ell}^{*})=0 (148)

for =3,,L1\ell=3,\dots,L-1.

Let 𝐧~=(n~2,,n~L1)\tilde{\mathbf{n}}=(\tilde{n}_{2},\dots,\tilde{n}_{L-1}) be the decoding times chosen in (113). We evaluate

g(n~)\displaystyle g(\tilde{n}_{\ell}) =log(L+1)(n~)(1+o(1))\displaystyle=\sqrt{\log_{(L-\ell+1)}(\tilde{n}_{\ell})}(1+o(1)) (149)
1F(n~)\displaystyle 1-F(\tilde{n}_{\ell}) =12π1g(n~+1)g(n~)(1+o(1))\displaystyle=\frac{1}{\sqrt{2\pi}}\frac{1}{g(\tilde{n}_{\ell+1})g(\tilde{n}_{\ell})}(1+o(1)) (150)
f(n~)\displaystyle f(\tilde{n}_{\ell}) =12πg(n~)g(n~+1)\displaystyle=\frac{1}{\sqrt{2\pi}}\frac{g^{\prime}(\tilde{n}_{\ell})}{g(\tilde{n}_{\ell+1})} (151)
n~+1n~\displaystyle\tilde{n}_{\ell+1}-\tilde{n}_{\ell} =g(n~+1)g(n~)(1+o(1))\displaystyle=\frac{g(\tilde{n}_{\ell+1})}{g^{\prime}(\tilde{n}_{\ell})}(1+o(1)) (152)

for =2,,L1\ell=2,\dots,L-1. Plugging (149)–(152) into (147)–(148) for Nn|𝐧=𝐧~\left.\frac{\partial N^{\prime}}{\partial n_{\ell}}\right|_{\mathbf{n}=\mathbf{\tilde{n}}}, we get

N(𝐧~)\displaystyle\nabla N^{\prime}(\tilde{\mathbf{n}}) =(112π,12π,12π,,12π)\displaystyle=\left(1-\frac{1}{\sqrt{2\pi}},-\frac{1}{\sqrt{2\pi}},-\frac{1}{\sqrt{2\pi}},\dots,-\frac{1}{\sqrt{2\pi}}\right)
(1+o(1)).\displaystyle\quad(1+o(1)). (153)

Our goal is to find a vector Δ𝐧=(Δn2,,ΔnL1)\Delta\mathbf{n}=(\Delta n_{2},\dots,\Delta n_{L-1}) such that

N(𝐧~+Δ𝐧)=𝟎,\displaystyle\nabla N^{\prime}(\tilde{\mathbf{n}}+\Delta\mathbf{n})=\mathbf{0}, (154)

Assume that Δn=O(n)\Delta n=O(\sqrt{n}). By plugging n+Δnn+\Delta n into (144)–(146) and using the Taylor series expansion of g(n+Δn)g(n+\Delta n), we get

1F(n+Δn)\displaystyle 1-F(n+\Delta n) =(1F(n))\displaystyle=(1-F(n))
exp{Δng(n)g(n)}(1+o(1))\displaystyle\quad\cdot\exp\{-\Delta ng(n)g^{\prime}(n)\}(1+o(1)) (155)
f(n+Δn)\displaystyle f(n+\Delta n) =f(n)exp{Δng(n)g(n)}(1+o(1)).\displaystyle=f(n)\exp\{-\Delta ng(n)g^{\prime}(n)\}(1+o(1)). (156)

Using (155)–(156), and putting 𝐧~+Δ𝐧\tilde{\mathbf{n}}+\Delta\mathbf{n} in (147)–(148), we solve (154) as

Δn2\displaystyle\Delta n_{2} =log2πg(n~2)g(n~2)(1+o(1))\displaystyle=-\frac{\log\sqrt{2\pi}}{g(\tilde{n}_{2})g^{\prime}(\tilde{n}_{2})}(1+o(1)) (157)
=Vlog2πCn~2log(L1)(n~2)(1+o(1))\displaystyle=-\frac{\sqrt{V}\,\log{\sqrt{2\pi}}}{C}\frac{\sqrt{\tilde{n}_{2}}}{\sqrt{\log_{(L-1)}(\tilde{n}_{2})}}(1+o(1)) (158)
Δni\displaystyle\Delta n_{i} =12g(n~i1)2g(n~i)g(n~i)=o(Δn2)(1+o(1))\displaystyle=\frac{1}{2}\frac{g(\tilde{n}_{i-1})^{2}}{g(\tilde{n}_{i})g^{\prime}(\tilde{n}_{i})}=o(\Delta n_{2})(1+o(1)) (159)

for i=3,,L1i=3,\dots,L-1. Hence, 𝐧~+Δ𝐧\tilde{\mathbf{n}}+\Delta{\mathbf{n}} satisfies the optimality criterion, and 𝐧=𝐧~+Δ𝐧\mathbf{n}^{*}=\tilde{\mathbf{n}}+\Delta{\mathbf{n}}.

It remains only to evaluate the gap N(𝐧)N(𝐧~)N^{\prime}(\mathbf{n}^{*})-N^{\prime}(\tilde{\mathbf{n}}). We have

N(𝐧)N(𝐧~)\displaystyle N^{\prime}(\mathbf{n}^{*})-N^{\prime}(\tilde{\mathbf{n}})
=(Δn2+i=2L1(n~i+1n~i)Q(g(n~i))\displaystyle=\bigg{(}\Delta n_{2}+\sum_{i=2}^{L-1}(\tilde{n}_{i+1}-\tilde{n}_{i})Q(g(\tilde{n}_{i}))
(exp{Δnig(n~i)g(n~i)}1))(1+o(1))\displaystyle\quad\cdot\left(\exp\{-\Delta n_{i}g(\tilde{n}_{i})g^{\prime}(\tilde{n}_{i})\}-1\right)\bigg{)}(1+o(1)) (160)
=(Δn2+(112π)1g(n~1)g(n~i)i=3L1Δni)\displaystyle=\left(\Delta n_{2}+\left(1-\frac{1}{\sqrt{2\pi}}\right)\frac{1}{g(\tilde{n}_{1})g^{\prime}(\tilde{n}_{i})}-\sum_{i=3}^{L-1}\Delta n_{i}\right)
(1+o(1))\displaystyle\quad\cdot(1+o(1)) (161)
=Bn~2log(L1)(n~2)(1+o(1)),\displaystyle=-B\frac{\sqrt{\tilde{n}_{2}}}{\sqrt{\log_{(L-1)}(\tilde{n}_{2})}}(1+o(1)), (162)

where B=(log2π+12π1)VCB=\left(\log\sqrt{2\pi}+\frac{1}{\sqrt{2\pi}}-1\right)\frac{\sqrt{V}}{C} is a positive constant. From the relationship between nn_{\ell} and n2n_{2} in (124) and the equality (162), we get

N(𝐧~)=N(𝐧)+BN(𝐧)log(L1)(N(𝐧))(1+o(1)).\displaystyle N^{\prime}(\tilde{\mathbf{n}})=N^{\prime}(\mathbf{n}^{*})+B\frac{\sqrt{N^{\prime}(\mathbf{n}^{*})}}{\sqrt{\log_{(L-1)}(N^{\prime}(\mathbf{n}^{*}))}}(1+o(1)). (163)

Plugging (163) into our VLSF achievability bound (133) gives

logM\displaystyle\log M N(𝐧)CN(𝐧)log(L1)(N(𝐧))V\displaystyle\geq N^{\prime}(\mathbf{n}^{*})C-\sqrt{N^{\prime}(\mathbf{n}^{*})\log_{(L-1)}(N^{\prime}(\mathbf{n}^{*}))V}
O(N(𝐧)log(L1)(N(𝐧))).\displaystyle\quad-O\left(\sqrt{\frac{N^{\prime}(\mathbf{n}^{*})}{\log_{(L-1)}(N^{\prime}(\mathbf{n}^{*}))}}\right). (164)

Comparing (164) and (133), note that the decoding times chosen in (113) have the optimal second-order term in the asymptotic expansion of the maximum achievable message set size within our code construction. Moreover, the order of the third-order term in (164) is the same as the order of the third-order term in (133).

The method of approximating the probability [ı(Xn;Yn)γ]\mathbb{P}\left[\imath(X^{n};Y^{n})\geq\gamma\right], which is a lower bound on [τn]\mathbb{P}\left[\tau\leq n\right] (see (88)), by a differentiable function F(n)F(n) is introduced in [26, Sec. III] for the optimization problem in (139). In [26], Vakilinia et al. approximate the distribution of the random stopping time τ\tau by a Gaussian distribution whose mean 𝔼[τ]\mathbb{E}\left[\tau\right] and variance Var[τ]\mathrm{Var}\left[\tau\right] are found empirically. They derive the Karush-Kuhn-Tucker conditions in (147)–(148) and solve them numerically; this procedure is known as the SDO algorithm. A similar analysis appears in [27] for the binary erasure channel. The analyses in [26, 27] use the SDO algorithm to numerically solve the equations (147)–(148) for fixed LL, MM, and ϵ\epsilon. Unlike [26, 27], we find the analytic solution to (147)–(148) as the decoding times n2,,nLn_{2},\dots,n_{L} approach infinity, and we derive the achievable rate in Theorem 2 as a function of LL.
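
The following Python sketch reflects our reading of this SDO-style numerical approach (it is not the code of [26] or [27]): starting from a guess for n2n_{2}, each of the conditions (147)–(148) determines the next decoding time, and an outer search over n2n_{2} (e.g., bisection) would then be used to match a prescribed nLn_{L} as in (137). The functions g, F, and f below are the Gaussian approximations (140)–(142); the values of γ\gamma, CC, VV, and the starting guess for n2n_{2} are hypothetical.

import math

gamma, C, V, L = 500.0, 0.5, 0.25, 5    # hypothetical values

def g(n):                                # (140)
    return (n * C - gamma) / math.sqrt(n * V)

def F(n):                                # (141): Q(-g(n))
    return 0.5 * math.erfc(-g(n) / math.sqrt(2.0))

def f(n):                                # (142): F'(n) = phi(g(n)) * g'(n)
    g_prime = C / math.sqrt(n * V) - g(n) / (2.0 * n)
    return math.exp(-g(n) ** 2 / 2.0) / math.sqrt(2.0 * math.pi) * g_prime

n = [980.0]                              # guess for n_2
n.append(n[0] + F(n[0]) / f(n[0]))       # (147) determines n_3
for _ in range(L - 3):                   # (148) determines n_4, ..., n_L
    n.append(n[-1] + (F(n[-1]) - F(n[-2])) / f(n[-1]))
print([round(x, 1) for x in n])          # increasing decoding times n_2, ..., n_L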

B.II2 Optimality of nLn_{L} and γ\gamma

Let (𝐧,γ)=(n2,,nL,γ)(\mathbf{n}^{*},\gamma^{*})=(n_{2}^{*},\dots,n_{L}^{*},\gamma^{*}) be the solution to N(𝐧,γ)=𝟎\nabla N(\mathbf{n}^{*},\gamma^{*})=\mathbf{0} in (134). Section B.II1 finds the values of n2,n3,,nL1n_{2}^{*},n_{3}^{*},\dots,n_{L-1}^{*} that minimize NN^{\prime}. Minimizing NN^{\prime} also minimizes NN in (134) since ϵN\epsilon_{N}^{\prime} depends only on nLn_{L} and γ\gamma, and ϵ\epsilon is a constant. Therefore, to minimize NN, it only remains to find (nL,γ)(n_{L}^{*},\gamma^{*}) such that

NnL|(𝐧,γ)=(𝐧,γ)\displaystyle\left.\frac{\partial N}{\partial n_{L}}\right|_{(\mathbf{n},\gamma)=(\mathbf{n}^{*},\gamma^{*})} =0\displaystyle=0 (165)
Nγ|(𝐧,γ)=(𝐧,γ)\displaystyle\left.\frac{\partial N}{\partial\gamma}\right|_{(\mathbf{n},\gamma)=(\mathbf{n}^{*},\gamma^{*})} =0.\displaystyle=0. (166)

Consider the case where L>2L>2. Solving (165) and (166) using (147)–(152) gives

g(nL)\displaystyle g(n_{L}^{*}) =lognL+log(2)(nL)+log(3)(nL)+O(1)\displaystyle=\sqrt{\log n_{L}^{*}+\log_{(2)}(n_{L}^{*})+\log_{(3)}{(n_{L}^{*})}+O(1)} (167)
0\displaystyle 0 =c0+N(12πnLexp{g(nL)22}(1+o(1))\displaystyle=c_{0}+N^{\prime}\bigg{(}\frac{1}{\sqrt{2\pi n_{L}^{*}}}\exp\left\{-\frac{g(n_{L}^{*})^{2}}{2}\right\}(1+o(1))
Mexp{γ}),\displaystyle\quad-M\exp\{-\gamma^{*}\}\bigg{)}, (168)

where c0c_{0} is a positive constant. Solving (167)–(168) simultaneously, we obtain

logM\displaystyle\log M =γlognL+O(1).\displaystyle=\gamma^{*}-\log n_{L}^{*}+O(1). (169)

Plugging (167) and (169) into (136), we get

ϵN=c1nLlog(2)(nL)lognL(1+o(1)),\displaystyle\epsilon_{N}^{\prime*}=\frac{c_{1}}{\sqrt{n_{L}^{*}\log_{(2)}(n_{L}^{*})}\log n_{L}^{*}}(1+o(1)), (170)

where $c_{1}$ is a constant. Let $(\tilde{\mathbf{n}},\tilde{\gamma})=(\tilde{n}_{2},\dots,\tilde{n}_{L},\tilde{\gamma})$ be the parameters chosen in (113) and (129). Note that $\epsilon_{N}^{\prime*}$ is order-wise different from $\epsilon_{N}^{\prime}$ in (23). Following steps similar to (160)–(162), we compute

N(𝐧,γ)N(𝐧~,γ~)=O(nLlognL).\displaystyle N(\mathbf{n}^{*},\gamma^{*})-N(\tilde{\mathbf{n}},\tilde{\gamma})=-O\left(\sqrt{\frac{n_{L}^{*}}{\log n_{L}^{*}}}\right). (171)

Plugging (171) into (21) gives

logM\displaystyle\log M =N(𝐧,γ)C1ϵ\displaystyle={\frac{N(\mathbf{n}^{*},\gamma^{*})C}{1-\epsilon}}
N(𝐧,γ)log(L1)(N(𝐧,γ))V1ϵ\displaystyle-\sqrt{N(\mathbf{n}^{*},\gamma^{*})\log_{(L-1)}(N(\mathbf{n}^{*},\gamma^{*}))\frac{V}{1-\epsilon}}
+O(N(𝐧,γ)log(L1)(N(𝐧,γ))).\displaystyle+O\left(\sqrt{\frac{N(\mathbf{n}^{*},\gamma^{*})}{\log_{(L-1)}(N(\mathbf{n}^{*},\gamma^{*}))}}\right). (172)

Comparing (21) and (172), we see that although (23) and (170) are different, the parameters (𝐧~,γ~)(\tilde{\mathbf{n}},\tilde{\gamma}) chosen in (113) and (129) have the same second-order term in the asymptotic expansion of the maximum achievable message set size as the parameters (𝐧,γ)(\mathbf{n}^{*},\gamma^{*}) that minimize the average decoding time in the achievability bound in Theorem 1.

For L=2L=2, the summation term in (135) disappears; in this case, the solution to (165) gives

ϵN=c2nLlognL(1+o(1))\displaystyle\epsilon_{N}^{\prime*}=\frac{c_{2}}{\sqrt{n_{L}^{*}\log n_{L}^{*}}}(1+o(1)) (173)

for some constant c2c_{2}. Following the steps in (171)–(172), we conclude that the parameter choices in (113) and (129) are second-order optimal for L=2L=2 as well.

Appendix C Proof of Theorem 3

Let P0P_{0} and P1P_{1} be two distributions. Let ZlogdP0dP1Z\triangleq\log\frac{\mathrm{d}P_{0}}{\mathrm{d}P_{1}} be the log-likelihood ratio, and let

Sn=i=1nZi,\displaystyle S_{n}=\sum_{i=1}^{n}Z_{i}, (174)

where ZiZ_{i}’s are i.i.d. and have the same distribution as ZZ. For i{0,1}i\in\{0,1\}, we denote the probability measures and expectations under distribution PiP_{i} by i\mathbb{P}_{i} and 𝔼i\mathbb{E}_{i}, respectively. Given a threshold a0a_{0}\in\mathbb{R}, define the stopping time

T\displaystyle T inf{n1:Sna0}\displaystyle\triangleq\inf\{n\geq 1\colon S_{n}\geq a_{0}\} (175)

and the overshoot

ξ0=STa0.\displaystyle\xi_{0}=S_{T}-a_{0}. (176)

The following lemma from [36], which gives the refined asymptotics for the stopping time TT, is the main tool to prove our bounds.

Lemma 4 ([36, Cor. 2.3.1, Th. 2.3.3, Th. 2.5.3, Lemma 3.1.1])

Suppose that $\mathbb{E}_{0}[(Z_{1}^{+})^{2}]<\infty$ and that $Z_{1}$ is non-arithmetic. Then, as $a_{0}\to\infty$, it holds that

𝔼0[T]\displaystyle\mathbb{E}_{0}[T] =1D(P0P1)(a0+𝔼0[ξ0])\displaystyle=\frac{1}{D(P_{0}\|P_{1})}(a_{0}+\mathbb{E}_{0}[\xi_{0}]) (177)
=1D(P0P1)(a0+𝔼0[Z12]2D(P0P1)\displaystyle=\frac{1}{D(P_{0}\|P_{1})}\Bigg{(}a_{0}+\frac{\mathbb{E}_{0}[Z_{1}^{2}]}{2D(P_{0}\|P_{1})}
n=11n𝔼0[Sn]+o(1)),\displaystyle\quad-\sum_{n=1}^{\infty}\frac{1}{n}\mathbb{E}_{0}[S_{n}^{-}]+o(1)\Bigg{)}, (178)

and

0[T<]\displaystyle\mathbb{P}_{0}[T<\infty] =1\displaystyle=1 (179)
1[T<]\displaystyle\mathbb{P}_{1}[T<\infty] =ea0𝔼0[eξ0]\displaystyle=e^{-a_{0}}\mathbb{E}_{0}[e^{-\xi_{0}}] (180)
𝔼0[eλξ0]\displaystyle\mathbb{E}_{0}[e^{-\lambda\xi_{0}}] =1+o(1)λD(P0P1)exp{n=11n𝔼0[eλSn+]}.\displaystyle=\frac{1+o(1)}{\lambda D(P_{0}\|P_{1})}\exp\left\{-\sum_{n=1}^{\infty}\frac{1}{n}\mathbb{E}_{0}[e^{-\lambda S_{n}^{+}}]\right\}. (181)
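Before proceeding, we note that the first-order content of (177) is easy to check empirically. The following Monte Carlo sketch (illustration only, not used in the proof) simulates the one-sided stopping time in (175) for a Bernoulli pair $P_0=\mathrm{Bern}(0.7)$, $P_1=\mathrm{Bern}(0.3)$, chosen here purely for concreteness, and compares the empirical mean of $T$ with $(a_0+\mathbb{E}_0[\xi_0])/D(P_0\|P_1)$.

```python
# A Monte Carlo sanity check (illustration only, not used in the proof) of the
# expansion (177)-(178) at first order: for the one-sided stopping time T in
# (175), E_0[T] = (a_0 + E_0[xi_0]) / D(P_0 || P_1) by Wald's identity.  The
# Bernoulli pair P_0 = Bern(0.7), P_1 = Bern(0.3) is an assumed toy example.
import numpy as np

rng = np.random.default_rng(0)
p0, p1, a0, trials = 0.7, 0.3, 30.0, 10000

llr = {1: np.log(p0 / p1), 0: np.log((1 - p0) / (1 - p1))}   # Z for each symbol
D = p0 * llr[1] + (1 - p0) * llr[0]                          # D(P_0 || P_1)

stop_times, overshoots = [], []
for _ in range(trials):
    s, n = 0.0, 0
    while s < a0:
        s += llr[int(rng.random() < p0)]                     # sample Z_i under P_0
        n += 1
    stop_times.append(n)
    overshoots.append(s - a0)

print("empirical E_0[T]      :", np.mean(stop_times))
print("(a_0 + E_0[xi_0]) / D :", (a0 + np.mean(overshoots)) / D)
```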

C.I Achievability Proof

Let PXP_{X} be a capacity-achieving distribution of the DM-PPC. Define the hypotheses

H0\displaystyle H_{0} :(XdN,YdN)P0=((PX×PY|X)dN)\displaystyle\colon(X^{d_{N}},Y^{d_{N}})^{\infty}\sim P_{0}^{\infty}=((P_{X}\times P_{Y|X})^{d_{N}})^{\infty} (182)
H1\displaystyle H_{1} :(XdN,YdN)P1=((PX×PY)dN),\displaystyle\colon(X^{d_{N}},Y^{d_{N}})^{\infty}\sim P_{1}^{\infty}=((P_{X}\times P_{Y})^{d_{N}})^{\infty}, (183)

and the random variables

Wi\displaystyle W_{i} 1dNlogdP0dNdP1dN(X(i1)dN+1:idN,Y(i1)dN+1:idN)\displaystyle\triangleq\frac{1}{d_{N}}\log\frac{\mathrm{d}P_{0}^{d_{N}}}{\mathrm{d}P_{1}^{d_{N}}}\left(X^{(i-1)d_{N}+1:id_{N}},Y^{(i-1)d_{N}+1:id_{N}}\right) (184)
=1dNı(X(i1)dN+1:idN;Y(i1)dN+1:idN)\displaystyle=\frac{1}{d_{N}}\imath(X^{(i-1)d_{N}+1:id_{N}};Y^{(i-1)d_{N}+1:id_{N}}) (185)

for i.i\in\mathbb{N}. Note that under H0H_{0}, 𝔼0[Wi]=C\mathbb{E}_{0}[W_{i}]=C. Here, we use WiW_{i} in the place of ZiZ_{i} in (174). Define

Sni=1nWi,\displaystyle S_{n}\triangleq\sum_{i=1}^{n}W_{i}, (186)

and

τ\displaystyle\tau inf{k1:Ska0/dN}\displaystyle\triangleq\inf\{k\geq 1\colon S_{k}\geq a_{0}/d_{N}\} (187)
T\displaystyle T dNτ.\displaystyle\triangleq d_{N}\,\tau. (188)

We employ the stop-at-time-zero procedure described in the proof sketch of Theorem 2 with ϵN=1𝔼0[T]\epsilon_{N}^{\prime}=\frac{1}{\mathbb{E}_{0}[T]} and the information density threshold rule (80)–(83) from the proof of Theorem 1, where the threshold γ\gamma is set to a0a_{0}. Here, TT is as in (175). We set MM and a0a_{0} so that

M1[T<]\displaystyle M\mathbb{P}_{1}[T<\infty] Mea0=ϵN=1𝔼0[T],\displaystyle\leq Me^{-a_{0}}=\epsilon_{N}^{\prime}=\frac{1}{\mathbb{E}_{0}[T]}, (189)

where the inequality follows from (180). Following steps identical to (99), and noting that 0[T=]=0\mathbb{P}_{0}[T=\infty]=0, we get

N=(1ϵ)𝔼0[T]+O(1),\displaystyle N=(1-\epsilon)\mathbb{E}_{0}[T]+O(1), (190)

and the average error probability of the code is bounded by ϵ\epsilon.

To evaluate 𝔼0[T]\mathbb{E}_{0}[T], we use Lemma 4 with WiW_{i} in place of ZiZ_{i}. A straightforward calculation yields

\displaystyle\mathbb{E}_{0}[W_{1}^{2}]=\mathbb{E}_{0}[W_{1}]^{2}+\mathrm{Var}\left[\frac{1}{d_{N}}\sum_{i=1}^{d_{N}}\imath(X_{i};Y_{i})\right] (191)
\displaystyle=C^{2}+\frac{1}{d_{N}}\mathrm{Var}\left[\imath(X_{1};Y_{1})\right]. (192)

Next, we have that

𝔼0[Sn]=ndN𝔼[1ndNSn1{1ndNSn0}],\displaystyle\mathbb{E}_{0}[S_{n}^{-}]=-nd_{N}\mathbb{E}\left[\frac{1}{nd_{N}}S_{n}1\left\{\frac{1}{nd_{N}}S_{n}\leq 0\right\}\right], (193)

where Sn=j=1ndNı(Xj;Yj)S_{n}=\sum_{j=1}^{nd_{N}}\imath(X_{j};Y_{j}). Applying the saddlepoint approximation (e.g., [51, eq. (1.2)]) to 1ndNSn\frac{1}{nd_{N}}S_{n}, we get

𝔼0[Sn]=ndN0c(x)ndNendNg(x)+logxdx,\displaystyle\mathbb{E}_{0}[S_{n}^{-}]=nd_{N}\int_{-\infty}^{0}c(x)\sqrt{nd_{N}}e^{-nd_{N}g(x)+\log x}\mathrm{d}x, (194)

where $c(x)$ and $g(x)$ are bounded from below by a positive constant for all $x\in(-\infty,0]$. Applying Laplace's method for integrals [51, eq. (2.5)] to (194), we get

𝔼0[Sn]=endNcn+o(ndN)\displaystyle\mathbb{E}_{0}[S_{n}^{-}]=e^{-nd_{N}c_{n}+o(nd_{N})} (195)

for all n+n\in\mathbb{Z}_{+}, where each cnc_{n} is a positive constant depending on nn. Putting (192) and (195) into (178) and (188), we get

𝔼0[T]=a0C+dN2+o(dN).\displaystyle\mathbb{E}_{0}[T]=\frac{a_{0}}{C}+\frac{d_{N}}{2}+o(d_{N}). (196)

From (189)–(190), we get

𝔼0[T]\displaystyle\mathbb{E}_{0}[T] =N1ϵ+O(1)\displaystyle=\frac{N}{1-\epsilon}+O\left(1\right) (197)
logM\displaystyle\log M =a0logN.\displaystyle=a_{0}-\log N. (198)

Putting (196)–(198) together completes the proof of (26).

C.II Converse Proof

Recall the definition of an SHT (δ,τ,𝒩)(\delta,\tau,\mathcal{N}) from Appendix A.I1 that tests the hypotheses

H0\displaystyle H_{0} :ZP0\displaystyle\colon Z^{\infty}\sim P_{0} (199)
H1\displaystyle H_{1} :ZP1,\displaystyle\colon Z^{\infty}\sim P_{1}, (200)

where P0P_{0} and P1P_{1} are distributions on a common alphabet 𝒵\mathcal{Z}^{\infty}. Here, Z(Z1,Z2,)Z^{\infty}\triangleq(Z_{1},Z_{2},\dots) denotes a vector of infinite length whose joint distribution is either P0P_{0} or P1P_{1}, which need not be product distributions in general. We define the minimum achievable type-II error probability, subject to a type-I error probability bound and a maximal expected decoding time constraint, with decision times restricted to the set 𝒩\mathcal{N} as

β(ϵ,N,𝒩)(P0,P1)min(δ,τ,𝒩):0[δ=1]ϵ,max{𝔼0[τ],𝔼1[τ]}N1[δ=0],\displaystyle\beta_{(\epsilon,N,\mathcal{N})}(P_{0},P_{1})\triangleq\min_{\begin{subarray}{c}(\delta,\tau,\mathcal{N})\colon\mathbb{P}_{0}[\delta=1]\leq{\epsilon},\\ \max\{\mathbb{E}_{0}[\tau],\mathbb{E}_{1}[\tau]\}\leq N\end{subarray}}{\mathbb{P}_{1}[\delta=0]}, (201)

which is the SHT version of the βα\beta_{\alpha}-function defined for the fixed-length binary hypothesis test [13].

The following theorem extends the meta-converse bound [13, Th. 27], which is a fundamental theorem used to show converse results in fixed-length channel coding without feedback and many other applications.

Theorem 9

Fix any set 𝒩\mathcal{N}\subseteq\mathbb{Z}_{\geq}, a real number N>0N>0, and a DM-PPC PY|XP_{Y|X}. Then, it holds that

logM(N,|𝒩|,ϵ,𝒩)\displaystyle\log M^{*}(N,|\mathcal{N}|,\epsilon,\mathcal{N})
supPXinfQYlogβ(ϵ,N,𝒩)(PX×PY|X,PX×QY).\displaystyle\quad\leq\sup_{P_{X^{\infty}}}\inf_{Q_{Y^{\infty}}}-\log\beta_{(\epsilon,N,\mathcal{N})}(P_{X^{\infty}}\times P_{Y|X}^{\infty},P_{X^{\infty}}\times Q_{Y^{\infty}}). (202)
Proof:

The proof is similar to that in [13]. Let $W$ denote a message equiprobably distributed on $[M]$, and let $\hat{W}$ be its reconstruction. Given any VLSF code with the set of available decoding times $\mathcal{N}$, average decoding time $N$, error probability $\epsilon$, and codebook size $M$, let $\hat{P}_{X^{\infty}}$ denote the input distribution induced by the code's codebook. The code operation creates a Markov chain $W\to X^{\infty}\to Y^{\infty}\to\hat{W}$. While full feedback would break this Markov chain, stop feedback does not, since under stop feedback the channel inputs are conditionally independent of the channel outputs given the message $W$. Fix an arbitrary output distribution $Q_{Y^{\infty}}$, and consider the SHT

H0\displaystyle H_{0} :(X,Y)P^X×PY|X\displaystyle\colon(X^{\infty},Y^{\infty})\sim\hat{P}_{X^{\infty}}\times P_{Y|X}^{\infty} (203)
H1\displaystyle H_{1} :(X,Y)P^X×QY\displaystyle\colon(X^{\infty},Y^{\infty})\sim\hat{P}_{X^{\infty}}\times Q_{Y^{\infty}} (204)

with a test δ=1{W^W}\delta=1\{\hat{W}\neq W\}, where (W,W^)(W,\hat{W}) are generated by the (potentially random) encoder-decoder pair of the VLSF code. The type-I and type-II error probabilities of this code-induced SHT are

α\displaystyle\alpha =0[δ=1]=[W^W]ϵ\displaystyle=\mathbb{P}_{0}[\delta=1]=\mathbb{P}\left[\hat{W}\neq W\right]\leq\epsilon (205)
β\displaystyle\beta =1[δ=0]=1M,\displaystyle=\mathbb{P}_{1}[\delta=0]=\frac{1}{M}, (206)

where (206) follows since the sequence $Y^{\infty}$ is independent of $X^{\infty}$ under $H_{1}$. The expected stopping time of this SHT under $H_{0}$ or $H_{1}$ is bounded by $N$ by the definition of a VLSF code. Since the error probabilities in (205)–(206) cannot be better than those of the optimal SHT, it holds that

logM\displaystyle\log M
logβ(ϵ,N,𝒩)(P^X×PY|X,P^X×QY)\displaystyle\leq-\log\beta_{(\epsilon,N,\mathcal{N})}(\hat{P}_{X^{\infty}}\times P_{Y|X}^{\infty},\hat{P}_{X^{\infty}}\times Q_{Y^{\infty}}) (207)
infQYlogβ(ϵ,N,𝒩)(P^X×PY|X,P^X×QY)\displaystyle\leq\inf_{Q_{Y^{\infty}}}-\log\beta_{(\epsilon,N,\mathcal{N})}(\hat{P}_{X^{\infty}}\times P_{Y|X}^{\infty},\hat{P}_{X^{\infty}}\times Q_{Y^{\infty}}) (208)
supPXinfQYlogβ(ϵ,N,𝒩)(PX×PY|X,PX×QY),\displaystyle\leq\sup_{P_{X^{\infty}}}\inf_{Q_{Y^{\infty}}}-\log\beta_{(\epsilon,N,\mathcal{N})}(P_{X^{\infty}}\times P_{Y|X}^{\infty},P_{X^{\infty}}\times Q_{Y^{\infty}}), (209)

where (208) follows since the choice QYQ_{Y^{\infty}} is arbitrary. ∎

To prove (27), we apply Theorem 9 and get

logMlogβ(ϵ,N,𝒩)(PY|X,PY),\displaystyle\log M\leq-\log\beta_{(\epsilon,N,\mathcal{N})}(P_{Y|X}^{\infty},P_{Y}^{\infty}), (210)

where PYP_{Y} is the capacity-achieving output distribution, and 𝒩={0,dN,2dN,}\mathcal{N}=\{0,d_{N},2d_{N},\dots\}. The reduction from Theorem 9 to (210) follows since logPY|X(Y|x)PY(Y)\log\frac{P_{Y|X}(Y|x)}{P_{Y}(Y)} has the same distribution for all x𝒳x\in\mathcal{X} for Cover–Thomas symmetric channels [43, p. 190]. In the remainder of the proof, we derive an upper bound for the right-hand side of (210).

Consider any SHT $(\delta,\tau,\mathcal{N})$ with $\mathbb{E}_{0}[\tau]\leq N$ and $\mathbb{E}_{1}[\tau]\leq N$. Our definition in (201) is slightly different from the classical SHT definition in [52] since our definition allows one to make a decision at time 0. Notice that at time 0, any test has three choices: decide $H_{0}$, decide $H_{1}$, or decide to start taking samples. When the test decides to start taking samples, the remainder of the procedure becomes a classical SHT. From this observation, any test satisfies

ϵα\displaystyle\epsilon\geq\alpha =ϵ0+(1ϵ0ϵ1)αϵ0\displaystyle=\epsilon_{0}+(1-\epsilon_{0}-\epsilon_{1})\alpha^{\prime}\geq\epsilon_{0} (211)
β\displaystyle\beta =ϵ1+(1ϵ0ϵ1)β(1ϵ0)β,\displaystyle=\epsilon_{1}+(1-\epsilon_{0}-\epsilon_{1})\beta^{\prime}\geq(1-\epsilon_{0})\beta^{\prime}, (212)

where at time 0, the test decides HiH_{i} with probability ϵ1i\epsilon_{1-i}, and α\alpha^{\prime} and β\beta^{\prime} are the type-I and type-II error probabilities conditioned on the event that the test decides to take samples at time 0, which occurs with probability 1ϵ0ϵ11-\epsilon_{0}-\epsilon_{1}.

Let $\tau^{\prime}$ denote the stopping time of the test with error probabilities $(\alpha^{\prime},\beta^{\prime})$, conditioned on the event that the test starts taking samples at time 0. We have

𝔼0[τ]\displaystyle\mathbb{E}_{0}[\tau] =(1ϵ0ϵ1)𝔼0[τ]\displaystyle=(1-\epsilon_{0}-\epsilon_{1})\mathbb{E}_{0}[\tau^{\prime}] (213)
=(1ϵ0)(𝔼0[τ]+eO(N))N\displaystyle=(1-\epsilon_{0})(\mathbb{E}_{0}[\tau^{\prime}]+e^{-O(N)})\leq N (214)
𝔼1[τ]\displaystyle\mathbb{E}_{1}[\tau] =(1ϵ0ϵ1)𝔼1[τ]\displaystyle=(1-\epsilon_{0}-\epsilon_{1})\mathbb{E}_{1}[\tau^{\prime}] (215)
=(1ϵ0)(𝔼1[τ]+eO(N))N\displaystyle=(1-\epsilon_{0})(\mathbb{E}_{1}[\tau^{\prime}]+e^{-O(N)})\leq N (216)

since β\beta decays exponentially with 𝔼0[τ]\mathbb{E}_{0}[\tau].

The following argument is similar to that in [53, Sec. V-C]. Set an arbitrary ν>0\nu>0 and the thresholds

a~0\displaystyle\tilde{a}_{0} =C(N1ϵ0dN2o(dN)+ν)\displaystyle=C\left(\frac{N}{1-\epsilon_{0}}-\frac{d_{N}}{2}-o(d_{N})+{\nu}\right) (217)
a~1\displaystyle\tilde{a}_{1} =D(PYPY|X=x)(N1ϵ0dN2o(dN)+ν),\displaystyle=D(P_{Y}\|P_{Y|X=x})\left(\frac{N}{1-\epsilon_{0}}-\frac{d_{N}}{2}-o(d_{N})+{\nu}\right), (218)

where x𝒳x\in\mathcal{X} is arbitrary, and let (δ~,τ~,𝒩)(\tilde{\delta},\tilde{\tau},\mathcal{N}) be the SPRT associated with the thresholds (a~1,a~0)(-\tilde{a}_{1},\tilde{a}_{0}), and type-I and type-II error probabilities α~\tilde{\alpha} and β~\tilde{\beta}.

Applying [36, eq. (3.56)] to (196), we get

𝔼0[τ~]\displaystyle\mathbb{E}_{0}[\tilde{\tau}] =a~0C+dN2+o(dN)\displaystyle=\frac{\tilde{a}_{0}}{C}+\frac{d_{N}}{2}+o(d_{N}) (219)
𝔼1[τ~]\displaystyle\mathbb{E}_{1}[\tilde{\tau}] =a~1D(PYPY|X=x)+dN2+o(dN).\displaystyle=\frac{\tilde{a}_{1}}{D(P_{Y}\|P_{Y|X=x})}+\frac{d_{N}}{2}+o(d_{N}). (220)

Combining (217)–(220) gives

𝔼0[τ~]\displaystyle\mathbb{E}_{0}[\tilde{\tau}] N1ϵ0+ν\displaystyle\geq\frac{N}{1-\epsilon_{0}}+{\nu} (221)
𝔼1[τ~]\displaystyle\mathbb{E}_{1}[\tilde{\tau}] N1ϵ0+ν.\displaystyle\geq\frac{N}{1-\epsilon_{0}}+{\nu}. (222)

Letting ν=O(1N)\nu=O\left(\frac{1}{N}\right), it follows from (213)–(216) and (221)–(222) that

𝔼0[τ~]\displaystyle\mathbb{E}_{0}[\tilde{\tau}] 𝔼0[τ]\displaystyle\geq\mathbb{E}_{0}[\tau^{\prime}] (223)
𝔼1[τ~]\displaystyle\mathbb{E}_{1}[\tilde{\tau}] 𝔼1[τ]\displaystyle\geq\mathbb{E}_{1}[\tau^{\prime}] (224)

for a large enough NN. Using Wald and Wolfowitz’s SPRT optimality result [54], we get

α\displaystyle\alpha^{\prime} α~\displaystyle\geq\tilde{\alpha} (225)
β\displaystyle\beta^{\prime} β~.\displaystyle\geq\tilde{\beta}. (226)

Now it only remains to lower bound β~\tilde{\beta}. Applying [36, Th. 3.1.2, 3.1.3] and (181) gives

β~=ζ~ea~0(1+o(1)),\displaystyle\tilde{\beta}=\tilde{\zeta}e^{-\tilde{a}_{0}}(1+o(1)), (227)

where

\displaystyle\tilde{\zeta}=\frac{1}{d_{N}C}\exp\left\{-\sum_{n=1}^{\infty}\frac{1}{n}\left(\mathbb{P}_{0}[S_{n}<0]+\mathbb{P}_{1}[S_{n}>0]\right)\right\}, (228)

and SnS_{n} is as in (186). Since SnS_{n} is a sum of ndNnd_{N}\to\infty i.i.d. random variables, where the summands have a non-zero mean, the Chernoff bound implies that each of the probabilities in (228) decays exponentially with dNd_{N}. Thus,

ζ~=1dNC(1+o(1)).\displaystyle\tilde{\zeta}=\frac{1}{d_{N}C}(1+o(1)). (229)

From (217) and (229), we get

logβ~=C(N1ϵ0dN2o(dN)+o(logdN))\displaystyle-\log\tilde{\beta}=C\left(\frac{N}{1-\epsilon_{0}}-\frac{d_{N}}{2}-o(d_{N})+o(\log d_{N})\right) (230)
C(N1ϵdN2o(dN)+o(logdN)),\displaystyle\leq C\left(\frac{N}{1-\epsilon}-\frac{d_{N}}{2}-o(d_{N})+o(\log d_{N})\right), (231)

where (231) follows from (211). Inequalities (212), (226), and (231) imply (27).

Appendix D Proofs for the DM-MAC

In this section, we prove our main results for the DM-MAC, beginning with Theorem 4, which is used to prove Theorem 5.

D.I Proof of Theorem 4

For each transmitter $k\in[K]$, we generate $M_{k}$ codewords of dimension $n_{L}$ i.i.d. from $P_{X_{k}}^{n_{L}}$. Codewords for distinct transmitters are drawn independently of each other. Denote the codeword for transmitter $k$ and message $m_{k}$ by $X_{k}^{n_{L}}(m_{k})$ for $k\in[K]$ and $m_{k}\in[M_{k}]$. The proof extends the DM-PPC achievability bound in Theorem 1, which is based on a sub-optimal SHT, to the DM-MAC. Below, we explain the differences.

Without loss of generality, assume that m[K]=𝟏m_{[K]}=\bm{1} is transmitted. The hypothesis test in (72)–(73) is replaced by

H0\displaystyle H_{0} :(X[K]nL,YKnL)(k=1KPXknL)×PYK|X[K]nL\displaystyle\colon(X_{[K]}^{n_{L}},Y_{K}^{n_{L}})\sim\left(\prod_{k=1}^{K}P_{X_{k}}^{n_{L}}\right)\times P_{Y_{K}|X_{[K]}}^{n_{L}} (232)
H1\displaystyle H_{1} :(X[K]nL,YKnL)(k=1KPXknL)×PYKnL,\displaystyle\colon(X_{[K]}^{n_{L}},Y_{K}^{n_{L}})\sim\left(\prod_{k=1}^{K}P_{X_{k}}^{n_{L}}\right)\times P_{Y_{K}}^{n_{L}}, (233)

which is run for every message tuple m[K]k=1K[Mk]m_{[K]}\in\prod\limits_{k=1}^{K}[M_{k}]. The information density (80), the stopping times (81)–(82), and the decision rule (83) are extended to the DM-MAC as

Sm[K],n\displaystyle S_{m_{[K]},n_{\ell}} ıK(X[K]n(m[K]);YKn)\displaystyle\triangleq\imath_{K}(X_{[K]}^{n_{\ell}}(m_{[K]});Y_{K}^{n_{\ell}}) (234)
τm[K]\displaystyle\tau_{m_{[K]}} inf{n𝒩:Sm[K],nγ}\displaystyle\triangleq\inf\{n_{\ell}\in\mathcal{N}\colon S_{m_{[K]},n_{\ell}}\geq\gamma\} (235)
τ~m[K]\displaystyle\tilde{\tau}_{m_{[K]}} min{τm[K],nL}\displaystyle\triangleq\min\{\tau_{m_{[K]}},n_{L}\} (236)
δm[K]\displaystyle\delta_{m_{[K]}} {0if Sm[K],nγ1if Sm[K],n<γ\displaystyle\triangleq\begin{cases}0&\text{if }S_{m_{[K]},n_{\ell}}\geq\gamma\\ 1&\text{if }S_{m_{[K]},n_{\ell}}<\gamma\end{cases} (237)

for every message tuple m[K]m_{[K]} and decoding time nn_{\ell}. For brevity, let (X[K]n,YKn,X¯[K]n)(X_{[K]}^{n_{\ell}},Y_{K}^{n_{\ell}},\bar{X}_{[K]}^{n_{\ell}}) be drawn i.i.d. according to the joint distribution

PX[K],YK,X¯[K](x[K],y,x¯[K])\displaystyle P_{X_{[K]},Y_{K},\bar{X}_{[K]}}(x_{[K]},y,\bar{x}_{[K]})
=PYK|X[K](y|x[K])k=1KPXk(xk)PXk(x¯k).\displaystyle=P_{Y_{K}|X_{[K]}}(y|x_{[K]})\prod_{k=1}^{K}P_{X_{k}}(x_{k})P_{X_{k}}(\bar{x}_{k}). (238)
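To make the stopping rule (234)–(237) concrete, the following toy Python sketch simulates it for a noiseless binary adder MAC ($Y=X_{1}+X_{2}$ with i.i.d. Bernoulli(1/2) inputs) with two transmitters, a small codebook, and an arbitrarily chosen threshold. The channel, the parameter values, and the tie-breaking by enumeration order are assumptions made for this illustration only and are not part of the code construction analyzed in the proof.

```python
# A toy sketch of the threshold rule (234)-(237) for a noiseless binary adder
# MAC Y = X_1 + X_2 with X_k ~ Bern(1/2).  The channel, the codebook size, the
# decoding times, and the threshold are assumptions made for this illustration.
import itertools
import numpy as np

rng = np.random.default_rng(5)
K, M = 2, 8
times = [40, 55, 70]                          # decoding times n_1 < n_2 < n_3
gamma = K * np.log(M) + np.log(times[-1])     # threshold, in the spirit of (257)
pY = {0: 0.25, 1: 0.5, 2: 0.25}               # P_Y under Bern(1/2) inputs

def info_density(x1, x2, y):
    # i_2(x1, x2; y) = log P(y | x1, x2) / P(y); equals -inf when y != x1 + x2
    pr = np.array([pY[v] for v in y])
    return np.where(x1 + x2 == y, -np.log(pr), -np.inf)

books = [rng.integers(0, 2, size=(M, times[-1])) for _ in range(K)]  # codebooks
w = (3, 5)                                    # true message pair
y = books[0][w[0]] + books[1][w[1]]           # noiseless adder MAC output

decoded = None
for n in times:                               # first time some pair clears gamma
    for m1, m2 in itertools.product(range(M), repeat=2):
        if np.sum(info_density(books[0][m1][:n], books[1][m2][:n], y[:n])) >= gamma:
            decoded = (m1, m2, n)
            break
    if decoded:
        break
print("decoded (m1, m2) at time n:", decoded)   # expected: (3, 5, 40)
```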

Expected decoding time analysis: Following steps identical to (84)–(86), we get (41).

Error probability analysis: The following error analysis extends the PPC bounds in (79) and (89)–(97) to the DM-MAC.

In the analysis below, for brevity, we write m𝒜1m_{\mathcal{A}}\neq 1 to denote that mi1m_{i}\neq 1 for i𝒜i\in\mathcal{A}. The error probability is bounded as

ϵ\displaystyle\epsilon [m[K]𝟏{τm[K]τ𝟏<}{τ𝟏=}]\displaystyle\leq\mathbb{P}\bigg{[}\bigcup_{m_{[K]}\neq\mathbf{1}}\{\tau_{m_{[K]}}\leq\tau_{\mathbf{1}}<\infty\}\bigcup\{\tau_{\mathbf{1}}=\infty\}\bigg{]} (239)
[τ𝟏=]+[m[K]1{τm[K]<}]\displaystyle\leq\mathbb{P}\left[\tau_{\mathbf{1}}=\infty\right]+\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{[K]}\neq 1\end{subarray}}\{\tau_{m_{[K]}}<\infty\}\right] (240)
+[m[K]𝟏:i[K]mi=1{τm[K]<}],\displaystyle\quad+\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{[K]}\neq\mathbf{1}\colon\exists\,i\in[K]\\ m_{i}=1\end{subarray}}\{\tau_{m_{[K]}}<\infty\}\right], (241)

where (240)–(241) apply the union bound to separate the probabilities of the following error events:

  1. the information density of the true message tuple does not satisfy the threshold test for any available decoding time;

  2. the information density of a message tuple in which all messages are incorrect satisfies the threshold test for some decoding time;

  3. the information density of a message tuple in which the messages from some transmitters are correct and the messages from the other transmitters are incorrect satisfies the threshold test for some decoding time.

The terms in (240) are bounded using steps identical to (89)–(97) as

[τ𝟏=]\displaystyle\mathbb{P}\left[\tau_{\mathbf{1}}=\infty\right] [ıK(X[K]nL;YKnL)<γ]\displaystyle\leq\mathbb{P}\left[\imath_{K}(X_{[K]}^{n_{L}};Y_{K}^{n_{L}})<\gamma\right] (242)
[m[K]1{τm[K]<}]\displaystyle\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{[K]}\neq 1\end{subarray}}\{\tau_{m_{[K]}}<\infty\}\right] k=1K(Mk1)exp{γ}.\displaystyle\leq\prod_{k=1}^{K}(M_{k}-1)\exp\{-\gamma\}. (243)

For the cases where at least one message is decoded correctly, we delay the application of the union bound. Let 𝒜𝒫([K])\mathcal{A}\in\mathcal{P}([K]) be the set of transmitters whose messages are decoded correctly. Define

(𝒜)\displaystyle\mathcal{M}^{(\mathcal{A})} {m[K][M]K:mk=1 for k𝒜,\displaystyle\triangleq\{m_{[K]}\in[M]^{K}\colon m_{k}=1\text{ for }k\in\mathcal{A},
mk1 for k𝒜c}\displaystyle\quad\quad m_{k}\neq 1\text{ for }k\in\mathcal{A}^{\mathrm{c}}\} (244)
~(𝒜)\displaystyle\tilde{\mathcal{M}}^{(\mathcal{A})} {m𝒜[M]|𝒜|:mk1 for k𝒜}.\displaystyle\triangleq\{m_{\mathcal{A}}\in[M]^{|\mathcal{A}|}\colon m_{k}\neq 1\text{ for }k\in\mathcal{A}\}. (245)

We first bound the probability term in (241) by applying the union bound according to which subset 𝒜\mathcal{A} of the transmitter set [K][K] is decoded correctly, and get

[m[K]𝟏:i[K]mi=1{τm[K]<}]\displaystyle\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{[K]}\neq\mathbf{1}\colon\exists\,i\in[K]\\ m_{i}=1\end{subarray}}\{\tau_{m_{[K]}}<\infty\}\right]
𝒜𝒫([K])[m[K](𝒜){τm[K]<}]\displaystyle\leq\sum_{\mathcal{A}\in\mathcal{P}([K])}\mathbb{P}\left[\bigcup_{m_{[K]}\in\mathcal{M}^{(\mathcal{A})}}\{\tau_{m_{[K]}}<\infty\}\right] (246)
=𝒜𝒫([K])[m𝒜c~(𝒜c)n𝒩{ıK(X¯𝒜cn(m𝒜c),X𝒜n;YKn)γ}],\displaystyle=\sum_{\mathcal{A}\in\mathcal{P}([K])}\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{\mathcal{A}^{c}}\in\tilde{\mathcal{M}}^{(\mathcal{A}^{c})}\\ n_{\ell}\in\mathcal{N}\end{subarray}}\left\{\imath_{K}(\bar{X}_{\mathcal{A}^{c}}^{n_{\ell}}(m_{\mathcal{A}^{c}}),X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})\geq\gamma\right\}\right], (247)

where $\bar{X}_{\mathcal{A}^{\mathrm{c}}}^{n_{\ell}}(m_{\mathcal{A}^{\mathrm{c}}})$ refers to a random sample from the codebooks of the transmitters in $\mathcal{A}^{\mathrm{c}}$, independent of the codewords $X_{\mathcal{A}^{\mathrm{c}}}^{n_{\ell}}$ transmitted by the transmitters in $\mathcal{A}^{\mathrm{c}}$ and of the received output $Y^{n_{\ell}}$.

We bound the right-hand side of (247) using the same method as in [28, eq. (65)–(66)]. This step is crucial in enabling the single-threshold rule for the rate vectors approaching a point on the sum-rate boundary. Set an arbitrary λ(𝒜)>0\lambda^{(\mathcal{A})}>0. Define two events

(𝒜)\displaystyle\mathcal{E}(\mathcal{A}) n𝒩{ıK(X𝒜n;YKn)>NIK(X𝒜;YK)+Nλ(𝒜)}\displaystyle\triangleq\bigcup_{n_{\ell}\in\mathcal{N}}\left\{\imath_{K}(X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})>NI_{K}(X_{\mathcal{A}};Y_{K})+N\lambda^{(\mathcal{A})}\right\} (248)
(𝒜)\displaystyle\mathcal{F}(\mathcal{A}) m𝒜c~(𝒜c)n𝒩{ıK(X¯𝒜cn(m𝒜c),X𝒜n;YKn)γ}.\displaystyle\triangleq\bigcup_{\begin{subarray}{c}m_{\mathcal{A}^{c}}\in\tilde{\mathcal{M}}^{(\mathcal{A}^{c})}\\ n_{\ell}\in\mathcal{N}\end{subarray}}\left\{\imath_{K}(\bar{X}_{\mathcal{A}^{c}}^{n_{\ell}}(m_{{\mathcal{A}}^{c}}),X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})\geq\gamma\right\}. (249)

Define the threshold

γ¯(𝒜)γNIK(X𝒜;YK)Nλ(𝒜).\displaystyle\bar{\gamma}^{(\mathcal{A})}\triangleq\gamma-NI_{K}(X_{\mathcal{A}};Y_{K})-N\lambda^{(\mathcal{A})}. (250)

We have

[(𝒜)]\displaystyle\mathbb{P}\left[\mathcal{F}(\mathcal{A})\right]
=[(𝒜)(𝒜)]+[(𝒜)(𝒜)c]\displaystyle=\mathbb{P}\left[\mathcal{F}(\mathcal{A})\cap\mathcal{E}(\mathcal{A})\right]+\mathbb{P}\left[\mathcal{F}(\mathcal{A})\cap\mathcal{E}(\mathcal{A})^{c}\right] (251)
[(𝒜)]\displaystyle\leq\mathbb{P}\left[\mathcal{E}(\mathcal{A})\right]
+[m𝒜c~(𝒜c)n𝒩{ıK(X¯𝒜cn(m𝒜c);YKn|X𝒜n)γ¯(𝒜)}]\displaystyle+\mathbb{P}\Biggm{[}\bigcup_{\begin{subarray}{c}\begin{subarray}{c}m_{\mathcal{A}^{c}}\in\tilde{\mathcal{M}}^{(\mathcal{A}^{c})}\end{subarray}\\ n_{\ell}\in\mathcal{N}\end{subarray}}\bigg{\{}\imath_{K}(\bar{X}_{\mathcal{A}^{\mathrm{c}}}^{n_{\ell}}(m_{\mathcal{A}^{\mathrm{c}}});Y_{K}^{n_{\ell}}|X_{\mathcal{A}}^{n_{\ell}})\geq\bar{\gamma}^{(\mathcal{A})}\bigg{\}}\Biggm{]} (252)
n𝒩[ıK(X𝒜n;YKn)>NIK(X𝒜;YK)+Nλ(𝒜)]\displaystyle\leq\sum_{n_{\ell}\in\mathcal{N}}\mathbb{P}\left[\imath_{K}(X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})>NI_{K}(X_{\mathcal{A}};Y_{K})+N\lambda^{(\mathcal{A})}\right]
+k𝒜c(Mk1)[n𝒩{ıK(X¯𝒜cn;YKn|X𝒜n)γ¯(𝒜)}]\displaystyle+\prod_{k\in\mathcal{A}^{\mathrm{c}}}(M_{k}-1)\mathbb{P}\left[\bigcup_{n_{\ell}\in\mathcal{N}}\left\{\imath_{K}(\bar{X}_{\mathcal{A}^{\mathrm{c}}}^{n_{\ell}};Y_{K}^{n_{\ell}}|X_{\mathcal{A}}^{n_{\ell}})\geq\bar{\gamma}^{(\mathcal{A})}\right\}\right] (253)
n𝒩[ıK(X𝒜n;YKn)>NIK(X𝒜;YK)+Nλ(𝒜)]\displaystyle\leq\sum_{n_{\ell}\in\mathcal{N}}\mathbb{P}\left[\imath_{K}(X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})>NI_{K}(X_{\mathcal{A}};Y_{K})+N\lambda^{(\mathcal{A})}\right]
+k𝒜c(Mk1)exp{γ¯(𝒜)},\displaystyle\quad+\prod_{k\in\mathcal{A}^{\mathrm{c}}}(M_{k}-1)\exp\{-\bar{\gamma}^{(\mathcal{A})}\}, (254)

where inequality (252) uses the chain rule for the information density, (253) applies the union bound, and (254) follows from [20, eq. (88)].

Applying the bound in (254) to each of the probabilities in (247) and plugging (242), (243), and (247) into (240)–(241), we complete the proof of Theorem 4.

D.II Proof of Theorem 5

We employ the stop-at-time-zero procedure in the proof sketch of Theorem 2 with ϵN=1NlogN\epsilon_{N}^{\prime}=\frac{1}{\sqrt{N^{\prime}\log N^{\prime}}}. Therefore, we first show that there exists an (N,L,M[K],1/NlogN)(N,L,M_{[K]},1/\sqrt{N\log N})-VLSF code satisfying

k=1KlogMk\displaystyle\sum_{k=1}^{K}\log M_{k} =NIKNlog(L)(N)VK\displaystyle=NI_{K}-\sqrt{N\log_{(L)}(N)V_{K}}
+O(NVKlog(L)(N)).\displaystyle\quad+O\left(\sqrt{\frac{NV_{K}}{\log_{(L)}(N)}}\right). (255)

We set the parameters

γ\displaystyle\gamma =nIKnlog(L+1)(n)VK[L]\displaystyle=n_{\ell}I_{K}-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V_{K}}\quad\forall\,\ell\in[L] (256)
=k=1KlogMk+logN\displaystyle=\sum_{k=1}^{K}\log M_{k}+\log N (257)
λ(𝒜)\displaystyle\lambda^{(\mathcal{A})} =NIK(X𝒜c;YK|X𝒜)k𝒜clogMk2N,𝒜𝒫([K]).\displaystyle=\frac{NI_{K}(X_{\mathcal{A}^{c}};Y_{K}|X_{\mathcal{A}})-\sum_{k\in\mathcal{A}^{c}}\log M_{k}}{2N},\quad\mathcal{A}\in\mathcal{P}([K]). (258)

Note that λ(𝒜)\lambda^{(\mathcal{A})}’s are bounded below by a positive constant for rate points lying on the sum-rate boundary.
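The decoding times implied by (256) can be computed numerically. The following sketch recovers $n_{1}<\dots<n_{L}$ from an equation of the same form by one-dimensional root finding, where $\log_{(j)}$ denotes the $j$-fold iterated logarithm; the constants $\gamma$, $I_{K}$, and $V_{K}$ are placeholders, and $L=3$ is used so that the iterated logarithms remain positive at moderate blocklengths.

```python
# A sketch of recovering the decoding times from an equation of the form (256):
# each n_l solves n*I_K - sqrt(n * log_(L-l+1)(n) * V_K) = gamma, where log_(j)
# is the j-fold iterated logarithm.  gamma, I_K, V_K are placeholder values;
# L = 3 keeps the iterated logarithms positive at moderate blocklengths.
import numpy as np
from scipy.optimize import brentq

def iter_log(n, j):                 # log_(j)(n); must stay positive on the bracket
    for _ in range(j):
        n = np.log(n)
    return n

def decoding_times(gamma, I, V, L):
    times = []
    for ell in range(1, L + 1):
        j = L - ell + 1
        f = lambda n: n * I - np.sqrt(n * iter_log(n, j) * V) - gamma
        times.append(brentq(f, gamma / I + 1e-6, 10 * gamma / I))
    return times

gamma, I_K, V_K, L = 150.0, 0.5, 0.3, 3
print([round(n, 1) for n in decoding_times(gamma, I_K, V_K, L)])  # increasing in l
```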

By Theorem 4, there exists a VLSF code with LL decoding times n1<n2<<nLn_{1}<n_{2}<\cdots<n_{L} such that the average decoding time is bounded as

Nn1+=1L1(n+1n)[ıK(X[K]n;YKn)<γ].\displaystyle N\leq n_{1}+\sum_{\ell=1}^{L-1}(n_{\ell+1}-n_{\ell})\mathbb{P}\left[\imath_{K}(X_{[K]}^{n_{\ell}};Y_{K}^{n_{\ell}})<\gamma\right]. (259)

Following the analysis in (125)–(128), we conclude that

n=N(1+o(1))\displaystyle n_{\ell}=N(1+o(1)) (260)

for all [L]\ell\in[L]. Applying the Chernoff bound to the probability terms in (39)–(40) using (256) and (260), we get that the sum of the terms in (39)–(40) is bounded by exp{NE}\exp\{-NE\} for some E>0E>0.

Applying Lemma 2 to the probability in (37) with (256) gives

[ıK(X[K]nL;YKnL)<γ]\displaystyle\mathbb{P}\left[\imath_{K}(X_{[K]}^{n_{L}};Y_{K}^{n_{L}})<\gamma\right] 12π1nL1lognL\displaystyle\leq\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{n_{L}}}\frac{1}{\sqrt{\log n_{L}}}
(1+O((lognL)(3/2)nL)).\displaystyle\quad\cdot\left(1+O\left(\frac{(\log n_{L})^{(3/2)}}{\sqrt{n_{L}}}\right)\right). (261)

Applying Theorem 4 with (257), (261), and the exponential bound on the sum of the terms in (39)–(40), we bound the error probability as

[𝗀τ(U,Yτ)W[K]]\displaystyle\mathbb{P}\left[\mathsf{g}_{\tau^{*}}(U,Y^{\tau^{*}})\neq W_{[K]}\right]
12π1N1logN(1+O((logN)(3/2)N))\displaystyle\leq\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{N}}\frac{1}{\sqrt{\log N}}\cdot\left(1+O\left(\frac{(\log N)^{(3/2)}}{\sqrt{N}}\right)\right)
+1N+exp{NE},\displaystyle\quad+\frac{1}{N}+\exp\{-NE\}, (262)

which is further bounded by 1NlogN\frac{1}{\sqrt{N\log N}} for NN large enough. Following steps identical to (125)–(133), we prove the existence of a VLSF code that satisfies (255) for the DM-MAC with LL decoding times and error probability 1NlogN\frac{1}{\sqrt{N\log N}}.

Finally, invoking (255) with LL replaced by L1L-1 and the stop-at-time-zero procedure in the proof sketch of Theorem 2 with ϵN=1NlogN\epsilon_{N}^{\prime}=\frac{1}{\sqrt{N^{\prime}\log N^{\prime}}}, we complete the proof of Theorem 5.

D.III Proof of (44)

The proof of (44) follows steps similar to those in the proof of [8, Th. 2]. Below, we detail the differences between the proofs of (44), Theorem 5, and [8, Th. 2].

  1.

    In (44), we choose cN+1cN+1 decoding times as ni=i1n_{i}=i-1 for i=1,,cN+1i=1,\dots,cN+1 for a sufficiently large constant c>1c>1. This differs from Theorem 5 where LL does not grow with NN (L=O(1)L=O(1)) and the gaps between consecutive decoding times can differ. In [8, Th. 2], any integer time is available for decoding, giving L=nmax=L=n_{\max}=\infty.

  2.

    We here set the parameter γ\gamma differently from how it was set in (256) and (257). The difference accounts for the error event that some of the messages are decoded incorrectly and some of the messages are decoded correctly. Specifically, we set

    γ\displaystyle\gamma =NIKa\displaystyle=NI_{K}-a (263)
    =k=1KlogMk+logN+b,\displaystyle=\sum_{k=1}^{K}\log M_{k}+\log N+b, (264)

    where aa is an upper bound on the information density ıK(X[K];YK)\imath_{K}(X_{[K]};Y_{K}), and bb is a positive constant to be determined later. Since the number of decoding times LL grows linearly with NN and c>1c>1, applying the Chernoff bound gives

    (37)+(39)+(40)exp{NE}\displaystyle\eqref{eq:true}+\eqref{eq:1errorS}+\eqref{eq:1errorExp}\leq\exp\{-NE\} (265)

    for some E>0E>0 and NN large enough. Hence, the error probability ϵ\epsilon in Theorem 4 is bounded by exp{b}N+exp{NE}\frac{\exp\{-b\}}{N}+\exp\{-NE\}, which can be further bounded by 1N\frac{1}{N} by appropriately choosing the constant bb.

    The term (37) disappears in [8, Th. 2] because nL=n_{L}=\infty; the terms (39) and (40) disappear in [8, Th. 2] because the channel is point-to-point. Therefore, bb is set to 0 in [8, Th. 2].

  3.

    We bound the average decoding time 𝔼[τ]\mathbb{E}\left[\tau^{*}\right] as

    𝔼[τ]\displaystyle\mathbb{E}\left[\tau^{*}\right] γ+aIK=N\displaystyle\leq\frac{\gamma+a}{I_{K}}=N (266)

    using Doob's optional stopping theorem as in [8, eqs. (106)–(107)], whereas $\mathbb{E}\left[\tau^{*}\right]$ in the proof of Theorem 5 is bounded via the tail probability of the information density.

    The steps above show the achievability of an (N,cN,M[K],1/N)(N,cN,M_{[K]},1/N) code with

    k=1KlogMk=NIKlogN+O(1).\displaystyle\sum_{k=1}^{K}\log M_{k}=NI_{K}-\log N+O(1). (267)
  4.

    Lastly, as in [8, Th. 2], we invoke the stop-at-time-zero procedure from the proof sketch of Theorem 2 with ϵN=1N\epsilon_{N}^{\prime}=\frac{1}{N^{\prime}}.

Appendix E Proof of Theorem 6

In Theorem 6, we employ a multiple hypothesis test at some early time $n_{0}$ to estimate the number of active transmitters, followed by VLSF MAC coding. Since the VLSF MAC codeword design employed in Theorem 5 is unchanged (up to the coding dimension), the VLSF MAC code employs a single nested codebook, as described in the proof below. If the test decides that the number of active transmitters is $\hat{k}=0$, then the decoder declares that no transmitters are active and stops the transmission at time $n_{0}$. If the estimated number of active transmitters is $\hat{k}\neq 0$, then the decoder decides to decode at one of the available times $n_{\hat{k},1},\dots,n_{\hat{k},L}$ using the decoder for the MAC with $\hat{k}$ transmitters.

E.I Encoding and decoding

Encoding: As in the DM-PPC and DM-MAC cases, the codewords are generated i.i.d. from the distribution PXnK,LP_{X}^{n_{K,L}}.

Decoding: The decoder combines a (K+1)(K+1)-ary hypothesis test and the threshold test that is used for the DM-MAC.

Multiple hypothesis test: Given distributions PYkP_{Y_{k}}, k{0,,K}k\in\{0,\dots,K\} where 𝒴K\mathcal{Y}_{K} is the common alphabet, we test the hypotheses

Hk:Yn0PYkn0,k{0,,K}.\displaystyle H_{k}\colon Y^{n_{0}}\sim P_{Y_{k}}^{n_{0}},\quad k\in\{0,\dots,K\}. (268)

The error probability constraints of our test are

[Decide Hs where s0|H0]\displaystyle\mathbb{P}\left[\text{Decide }H_{s}\text{ where }s\neq 0|H_{0}\right] ϵ0\displaystyle\leq\epsilon_{0} (269)
[Decide Hs where sk|Hk]\displaystyle\mathbb{P}\left[\text{Decide }H_{s}\text{ where }s\neq k|H_{k}\right] exp{n0E+o(n0)}\displaystyle\leq\exp\{-n_{0}E+o(n_{0})\} (270)

for k[K]k\in[K], where E>0E>0 is a constant.

Due to the asymmetry in (269)–(270), we employ a composite hypothesis test to decide whether $H_{0}$ is true; that is, the test declares $H_{0}$ if

logPY0n0(yn0)PYsn0(yn0)ηs\displaystyle\log\frac{P_{Y_{0}}^{n_{0}}(y^{n_{0}})}{P_{Y_{s}}^{n_{0}}(y^{n_{0}})}\geq\eta_{s} (271)

for all s[K]s\in[K], where the threshold values ηs\eta_{s}, s[K]s\in[K], are chosen to satisfy (269). If the condition in (271) is not satisfied, then the test applies the maximum likelihood decoding rule, i.e., the output is HsH_{s}, where

\displaystyle s=\arg\max_{j\in[K]}P_{Y_{j}}^{n_{0}}(y^{n_{0}}). (272)
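The following sketch shows the decision rule in (271)–(272) on a toy example; the output distributions, the blocklength $n_{0}$, and the thresholds $\eta_{s}$ below are illustrative choices and are not derived from (269).

```python
# A sketch of the decision rule (271)-(272): declare H_0 if every log-likelihood
# ratio against P_{Y_s} clears its threshold eta_s; otherwise use maximum
# likelihood among H_1, ..., H_K.  The distributions, n_0, and the thresholds
# below are illustrative choices, not derived from (269).
import numpy as np

def estimate_k(y, pmf_list, etas):
    loglik = [float(np.sum(np.log(pmf[y]))) for pmf in pmf_list]  # log P_{Y_k}^{n_0}(y)
    if all(loglik[0] - loglik[s] >= etas[s - 1] for s in range(1, len(pmf_list))):
        return 0                                     # composite test declares H_0
    return 1 + int(np.argmax(loglik[1:]))            # ML rule among H_1, ..., H_K

rng = np.random.default_rng(1)
pmfs = [np.array([0.90, 0.08, 0.02]),                # P_{Y_0}: mostly "silence"
        np.array([0.45, 0.45, 0.10]),                # P_{Y_1}
        np.array([0.20, 0.40, 0.40])]                # P_{Y_2}
n0, etas = 60, [0.0, 0.0]
y = rng.choice(3, size=n0, p=pmfs[2])                # n_0 outputs with k = 2 active
print("estimated number of active transmitters:", estimate_k(y, pmfs, etas))
```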

From the asymptotics of the error probability bound for composite hypothesis testing in [28, Th. 5], the maximum type-II error of the composite hypothesis test is bounded as

maxk[K][Decide H0|Hk]\displaystyle\max_{k\in[K]}\mathbb{P}\left[\text{Decide }H_{0}|H_{k}\right]
exp{n0mink[K]D(PY0PYk)+O(n0)}.\displaystyle\quad\leq\exp\left\{-n_{0}\min_{k\in[K]}D(P_{Y_{0}}\|P_{Y_{k}})+O(\sqrt{n_{0}})\right\}. (273)

If $P_{Y_{0}}$ is not absolutely continuous with respect to $P_{Y_{k}}$, then (273) remains valid with $D(P_{Y_{0}}\|P_{Y_{k}})=\infty$ since in that case an arbitrarily large type-II error exponent can be achieved (see [13, Lemmas 57–58]).

From [55], the maximum likelihood test yields

\displaystyle\max_{\substack{(k,s)\in[K]^{2}\\ k\neq s}}\mathbb{P}\left[\text{Decide }H_{s}|H_{k}\right]\leq\exp\{-n_{0}E_{C}+o(n_{0})\}, (274)

where

\displaystyle E_{C}=\min_{\substack{k,s\in[K]\\ k\neq s}}\left(-\min_{\lambda\in(0,1)}\log\sum_{y\in\mathcal{Y}_{K}}P_{Y_{k}}(y)^{1-\lambda}P_{Y_{s}}(y)^{\lambda}\right) (275)

is the minimum Chernoff distance between the pairs (PYk,PYs)(P_{Y_{k}},P_{Y_{s}}), ks[K]k\neq s\in[K]. Combining (273) and (275), the conditions in (269)–(270) are satisfied with

E=min{mink[K]D(PY0PYk),EC}>0.\displaystyle E=\min\left\{\min_{k\in[K]}D(P_{Y_{0}}\|P_{Y_{k}}),E_{C}\right\}>0. (276)
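For the same toy output distributions as in the previous sketch, the exponent $E$ in (276) can be evaluated numerically by computing the Chernoff information of each pair $(P_{Y_{k}},P_{Y_{s}})$ and the divergences $D(P_{Y_{0}}\|P_{Y_{k}})$; the distributions below are again illustrative placeholders.

```python
# A numerical sketch of the exponent E in (276) for the same toy output
# distributions as above: the Chernoff information of each pair (P_{Y_k}, P_{Y_s})
# and the divergences D(P_{Y_0} || P_{Y_k}); all values here are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_information(p, q):        # -min_lambda log sum_y p^(1-lambda) q^lambda
    obj = lambda lam: np.log(np.sum(p ** (1 - lam) * q ** lam))
    return -minimize_scalar(obj, bounds=(1e-6, 1 - 1e-6), method="bounded").fun

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

pmfs = [np.array([0.90, 0.08, 0.02]),
        np.array([0.45, 0.45, 0.10]),
        np.array([0.20, 0.40, 0.40])]

E_C = min(chernoff_information(pmfs[k], pmfs[s])
          for k in range(1, 3) for s in range(1, 3) if k != s)
E = min(min(kl(pmfs[0], pmfs[k]) for k in range(1, 3)), E_C)
print("E_C =", round(E_C, 4), " E =", round(E, 4))
```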

If the hypothesis test declares the hypothesis Hk^H_{\hat{k}}, k^0\hat{k}\neq 0, then the receiver decides to decode k^\hat{k} messages at one of the decoding times in {nk^,1,,nk^,L}\{n_{\hat{k},1},\dots,n_{\hat{k},L}\} using the VLSF code in Section D for the k^\hat{k}-MAC, where nk^,1n_{\hat{k},1} is set to n0n_{0}.

E.II Error analysis

We here bound the probability of error for the RAC code in Definition 4.

No active transmitters: For k=0k=0, the only error event is that the composite hypothesis test at time n0n_{0} does not declare H0H_{0} given that H0H_{0} is true. By (269), the probability of this event is bounded by ϵ0\epsilon_{0}.

k1k\geq 1 active transmitters: When there is at least one active transmitter, there is an error if and only if at least one of the following events occurs:

  • k^k\mathcal{E}_{\hat{k}\neq k}: The number of active transmitters is estimated incorrectly at time n0n_{0}, i.e., k^k\hat{k}\neq k, which results in decoding of k^\hat{k} messages instead of kk messages.

  • mes\mathcal{E}_{\textnormal{mes}}: A list of messages m[k]m[k]m_{[k]}^{\prime}\neq m_{[k]} is decoded at one of the times in {nk,1,,nk,L}\{n_{k,1},\dots,n_{k,L}\}.

In the following discussion, we bound the probability of these events separately, and apply the union bound to combine them.

Since the encoders are identical, under the event $\mathcal{E}_{\textnormal{rep}}\triangleq\{W_{i}=W_{j}\text{ for some }i\neq j\}$, in which two or more transmitters send the same message, the transmitted codewords are not independent. Since our analysis relies on the independence of the transmitted codewords, we treat $\mathcal{E}_{\textnormal{rep}}$ as an error, which simplifies the analysis. By the union bound, we have

[rep]k(k1)2M.\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]\leq\frac{k(k-1)}{2M}. (277)

Applying the union bound, we bound the error probability as

ϵk\displaystyle\epsilon_{k} \displaystyle\leq [rep]+[repc][k^kmes|repc]\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\cup\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right] (278)
\displaystyle\leq [rep]+[k^k|repc]+[mes|repck^kc].\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right].

By (270), the probability [k^k|repc]\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right] is bounded as

[k^k|repc]exp{n0E+o(n0)}.\displaystyle\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]\leq\exp\{-n_{0}E+o(n_{0})\}. (279)

Since the number of active transmitters kk is not available at the decoder at time 0, we here slightly modify the stop-at-time-zero procedure from the proof sketch of Theorem 2. We set the smallest decoding time nj,1n_{j,1} to n00n_{0}\neq 0 for all j[K]j\in[K]. Given the estimate k^\hat{k} of the number of active transmitters kk, we employ the stop-at-time-zero procedure with the triple (N,ϵ,ϵN)(N^{\prime},\epsilon,\epsilon_{N}^{\prime}) replaced by (Nk^,ϵk^,ϵNk^)(N_{\hat{k}}^{\prime},\epsilon_{\hat{k}},\epsilon_{N_{\hat{k}}^{\prime}}).

Let stop\mathcal{E}_{\textnormal{stop}} denote the event that the decoder chooses to stop at time nk,1=n0n_{k,1}=n_{0}. We bound [mes|repck^kc]\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right] as

[mes|repck^kc][stop|repck^kc]\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]\leq\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]
+[stopc|repck^kc][mes|repck^kcstopc].\displaystyle\quad+\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}^{\mathrm{c}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\cap\mathcal{E}^{\mathrm{c}}_{\textnormal{stop}}\right]. (280)

By Theorem 4, when the RAC decoder decodes a list of $k$ messages from $[M]$ at time $n_{k,\ell}$, we get

[mes|repck^kcstopc]\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\cap\mathcal{E}^{\mathrm{c}}_{\textnormal{stop}}\right] (281)
[ık(X[k]nk,L;Yknk,L)<γk]\displaystyle\quad\leq\mathbb{P}\left[\imath_{k}(X_{[k]}^{n_{k,L}};Y_{k}^{n_{k,L}})<\gamma_{k}\right] (282)
+(Mkk)exp{γk}\displaystyle\quad\quad+\binom{M-k}{k}\exp\{-\gamma_{k}\} (283)
+=2L𝒜𝒫([k])\displaystyle\quad\quad+\sum_{\ell=2}^{L}\sum_{\mathcal{A}\in\mathcal{P}([k])}
[ık(X𝒜nk,;Yknk,)>NkIk(X𝒜;Yk)+Nkλ(k,𝒜)]\displaystyle\quad\quad\quad\mathbb{P}\left[\imath_{k}(X_{\mathcal{A}}^{n_{k,\ell}};Y_{k}^{n_{k,\ell}})>N_{k}^{\prime}I_{k}(X_{\mathcal{A}};Y_{k})+N_{k}^{\prime}\lambda^{(k,\mathcal{A})}\right] (284)
+𝒜𝒫([k])(Mk|𝒜|)\displaystyle\quad\quad+\sum_{\mathcal{A}\in\mathcal{P}([k])}\binom{M-k}{|\mathcal{A}|}
\displaystyle\quad\quad\quad\exp\{-\gamma_{k}+N_{k}^{\prime}I_{k}(X_{\mathcal{A}};Y_{k})+N_{k}^{\prime}\lambda^{(k,\mathcal{A})}\}, (285)

where NkN_{k}^{\prime} is the average decoding time given stopc\mathcal{E}_{\textnormal{stop}}^{c}, and γk\gamma_{k} and λ(k,𝒜)\lambda^{(k,\mathcal{A})} are constants chosen to satisfy the equations

γk\displaystyle\gamma_{k} =nk,Iknk,log(L+1)(nk,)Vk\displaystyle=n_{k,\ell}I_{k}-\sqrt{n_{k,\ell}\log_{(L-\ell+1)}(n_{k,\ell})V_{k}} (286)
=klogM+logNk+O(1)\displaystyle=k\log M+\log N_{k}^{\prime}+O(1) (287)

for all {2,,L}\ell\in\{2,\dots,L\}, and

λ(k,𝒜)\displaystyle\lambda^{(k,\mathcal{A})} =NkIk(X𝒜c;Yk|X𝒜)|𝒜c|logM2Nk,𝒜𝒫([k]).\displaystyle=\frac{N_{k}^{\prime}I_{k}(X_{\mathcal{A}^{c}};Y_{k}|X_{\mathcal{A}})-|\mathcal{A}^{c}|\log M}{2N_{k}^{\prime}},\quad\mathcal{A}\in\mathcal{P}([k]). (288)

The fact that each λ(k,𝒜)\lambda^{(k,\mathcal{A})} is bounded below by a positive constant follows from (287), [28, Lemma 1], and the symmetry assumptions on the RAC.

Following the analysis in Appendix D.II, we conclude that

klogM=NkIkNklog(L1)(Nk)Vk\displaystyle k\log M=N_{k}^{\prime}I_{k}-\sqrt{N_{k}^{\prime}\log_{(L-1)}(N_{k}^{\prime})V_{k}}
+O(NkVklog(L1)(Nk))\displaystyle\quad+O\left(\sqrt{\frac{N_{k}^{\prime}V_{k}}{\log_{(L-1)}(N_{k}^{\prime})}}\right) (289)
[mes|repck^kcstopc]1NklogNk.\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\cap\mathcal{E}^{\mathrm{c}}_{\textnormal{stop}}\right]\leq\frac{1}{\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}. (290)

Note that by (277) and (289), the bound on [rep]\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right] decays exponentially with NkN_{k}. A consequence of (286) and (289) is that

Nk=nk,(1+o(1))\displaystyle N_{k}^{\prime}=n_{k,\ell}(1+o(1)) (291)

for all 2\ell\geq 2 and k[K]k\in[K].

Note that from (289), the right-hand side of (277) is bounded by 1Nk\frac{1}{N_{k}^{\prime}} for NkN_{k}^{\prime} large enough. We set the time n0n_{0} so that the right-hand side of (279) is bounded by 14NklogNk\frac{1}{4\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}} for all k[K]k\in[K]. This condition is satisfied if

n012ElogNk+o(logNk).\displaystyle n_{0}\geq\frac{1}{2E}\log N_{k}^{\prime}+o(\log N_{k}^{\prime}). (292)

The above arguments imply that

[rep]+[k^k|repc]12NklogNk\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]\leq\frac{1}{2\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}} (293)

for NkN_{k}^{\prime} large enough. As in the DM-MAC case, we set

p[stop|repck^kc]=ϵk1NklogNk11NklogNk\displaystyle p\triangleq\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]=\frac{\epsilon_{k}^{\prime}-\frac{1}{\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}}{1-\frac{1}{\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}} (294)

where

ϵk=ϵk12NklogNk.\displaystyle\epsilon_{k}^{\prime}=\epsilon_{k}-\frac{1}{2\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}. (295)

Combining (278), (280), (290), and (293)–(294), the error probability of the RAC code is bounded by

[rep]+[k^k|repc]+[stop|repck^kc]\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]
+[stopc|repck^kc][mes|repck^kcstopc]\displaystyle+\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}^{\mathrm{c}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\cap\mathcal{E}^{\mathrm{c}}_{\textnormal{stop}}\right] (296)
12NklogNk+p+(1p)1NklogNk\displaystyle\quad\leq\frac{1}{2\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}+p+(1-p)\frac{1}{\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}} (297)
=ϵk.\displaystyle\quad=\epsilon_{k}. (298)

The average decoding time of the code is bounded as

Nk\displaystyle N_{k} 𝔼[τk|k^krep][k^krep]\displaystyle\leq\mathbb{E}\left[\tau_{k}^{*}|\mathcal{E}_{\hat{k}\neq k}\cup\mathcal{E}_{\textnormal{rep}}\right]\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\cup\mathcal{E}_{\textnormal{rep}}\right]
\displaystyle\quad+\mathbb{E}\left[\tau_{k}^{*}|\mathcal{E}_{\hat{k}\neq k}^{\mathrm{c}}\cap\mathcal{E}_{\textnormal{rep}}^{\mathrm{c}}\right]\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}^{\mathrm{c}}\cap\mathcal{E}_{\textnormal{rep}}^{\mathrm{c}}\right] (299)
\displaystyle\leq\frac{n_{K,L}}{2\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}+n_{0}p+N_{k}^{\prime}(1-p). (300)

From (291) and (294)–(295), we get

Nk=Nk1ϵk(1+O(1NklogNk)).\displaystyle N_{k}^{\prime}=\frac{N_{k}}{1-\epsilon_{k}^{\prime}}\left(1+O\left(\frac{1}{\sqrt{N_{k}\log N_{k}}}\right)\right). (301)

Plugging (301) into (289) completes the proof.

Appendix F Proof of Theorem 7

The non-asymptotic achievability bound in Theorem 1 applies to the Gaussian PPC with maximal power constraint PP (58) with the modification that the error probability (18) has an additional term for power constraint violations

[=1L{Xn2>nP}].\displaystyle\mathbb{P}\left[\bigcup_{\ell=1}^{L}\left\{\left\lVert X^{n_{\ell}}\right\rVert^{2}>n_{\ell}P\right\}\right]. (302)

The proof follows similarly to the proof of Theorem 2 as we employ the stop-at-time-zero procedure in the proof sketch of Theorem 2. We extend Lemma 1 to the Gaussian PPC, showing

logM(N,L,1NlogN,P)\displaystyle\log M^{*}\left(N,L,\frac{1}{\sqrt{N\log N}},P\right)
NC(P)Nlog(L)(N)V(P)+O(Nlog(L)(N)).\displaystyle\geq{NC(P)}-\sqrt{N\log_{(L)}(N)\,V(P)}+O\left(\sqrt{\frac{N}{\log_{(L)}(N)}}\right). (303)

The input distribution $P_{X^{n_{L}}}$ used in the proof of (303) differs from the one used in the proof of Lemma 1, which changes the analysis of the probability $\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]$ and of the threshold $\gamma$ in (113). Below, we detail these differences.

F.1 The input distribution PXnLP_{X^{n_{L}}}

We choose the distribution of the random codewords, PXnLP_{X^{n_{L}}}, in Theorem 1 as follows. Set n0=0n_{0}=0. For each codeword, independently draw sub-codewords Xnj1+1:njX^{n_{j-1}+1:n_{j}}, j[L]j\in[L] from the uniform distribution on the (njnj1)(n_{j}-n_{j-1})-dimensional sphere of radius (njnj1)P\sqrt{(n_{j}-n_{j-1})P}. Let PXnLP_{X^{n_{L}}} denote the distribution of the length-nLn_{L} random codewords described above. Since codewords chosen under PXnLP_{X^{n_{L}}} never violate the power constraint (58), the power violation probability in (302) is 0. Furthermore, the power constraint is satisfied with equality at each of the dimensions n1,,nLn_{1},\dots,n_{L}; our analysis in [39] shows that for any finite LL, and sufficiently large increments nn1n_{\ell}-n_{\ell-1} for all [L]\ell\in[L], using this restricted subset instead of the entire nLn_{L}-dimensional power sphere results in no change in the asymptotic expansion (60) for the fixed-length no-feedback codes up to the third-order term.
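A minimal sketch of this codeword generation (with hypothetical decoding times and power) is given below: each sub-codeword is an i.i.d. Gaussian vector scaled to the corresponding sphere, so the power constraint (58) is met with equality at every decoding time.

```python
# A minimal sketch (hypothetical decoding times and power) of drawing one
# codeword from the input distribution P_{X^{n_L}} described above: each
# sub-codeword is an i.i.d. Gaussian vector scaled to the sphere of radius
# sqrt((n_j - n_{j-1}) P), so the power constraint (58) holds with equality
# at every decoding time.
import numpy as np

def draw_codeword(times, P, rng):
    blocks, prev = [], 0
    for n in times:
        d = n - prev                                  # sub-codeword dimension
        g = rng.standard_normal(d)
        blocks.append(np.sqrt(d * P) * g / np.linalg.norm(g))
        prev = n
    return np.concatenate(blocks)

rng = np.random.default_rng(7)
times, P = [100, 180, 250], 2.0
x = draw_codeword(times, P, rng)
print([round(float(np.sum(x[:n] ** 2) / n), 3) for n in times])   # ~ [2.0, 2.0, 2.0]
```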

F.2 Bounding the probability of the information density random variable

We begin by bounding the probability

[ı(Xn;Yn)<γ],[L],\displaystyle\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right],\quad\ell\in[L], (304)

that appears in Theorem 1 under the input distribution described above. Under this choice of PXnLP_{X^{n_{L}}}, the random variable ı(Xn;Yn)\imath(X^{n_{\ell}};Y^{n_{\ell}}) is not a sum of nn_{\ell} i.i.d. random variables. We wish to apply the moderate deviations result in Lemma 2. To do this, we first introduce the following lemma from [56], which uniformly bounds the Radon-Nikodym derivative of the channel output distribution in response to the uniform distribution on a sphere as compared to the channel output distribution in response to i.i.d. Gaussian inputs.

Lemma 5 (MolavianJazi and Laneman [56, Prop. 2])

Let XnX^{n} be distributed uniformly over the nn-dimensional sphere of radius nP\sqrt{nP}. Let X~n𝒩(𝟎,P𝖨n)\tilde{X}^{n}\sim\mathcal{N}(\mathbf{0},P\mathsf{I}_{n}). Let PYnP_{Y^{n}} and PY~nP_{\tilde{Y}^{n}} denote the channel output distributions in response to PXnP_{X^{n}} and PX~nP_{\tilde{X}^{n}}, respectively, where PYn|XnP_{Y^{n}|X^{n}} is the point-to-point Gaussian channel (55). Then there exists an n0n_{0}\in\mathbb{N} such that for all nn0n\geq n_{0} and ynny^{n}\in\mathbb{R}^{n}, it holds that

dPYn(yn)dPY~n(yn)\displaystyle\frac{\mathrm{d}P_{Y^{n}}(y^{n})}{\mathrm{d}P_{\tilde{Y}^{n}}(y^{n})} J(P)27π81+P1+2P.\displaystyle\leq J(P)\triangleq{27}{\sqrt{\frac{\pi}{8}}}\frac{1+P}{\sqrt{1+2P}}. (305)

Let PY~nP_{\tilde{Y}}^{n_{\ell}} be 𝒩(𝟎,(1+P)𝖨n)\mathcal{N}(\mathbf{0},(1+P)\mathsf{I}_{n_{\ell}}). By Lemma 5, we bound (304)\eqref{eq:probimath} as

\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right]
\displaystyle=\mathbb{P}\left[\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}<\gamma+\log\frac{\mathrm{d}P_{Y^{n_{\ell}}}(Y^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}\right] (306)
\displaystyle\leq\mathbb{P}\left[\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}<\gamma+\ell\log J(P)\right], (307)

where J(P)J(P) is the constant given in (305), and (307) follows from the fact that PYnP_{Y^{n_{\ell}}} is the product of \ell output distributions of dimensions njnj1,j[]n_{j}-n_{j-1},j\in[\ell], each induced by a uniform distribution over a sphere of the corresponding radius. As argued in [13, 35, 56, 39], by spherical symmetry, the distribution of the random variable

logdPYn|Xn(Yn|Xn)dPY~n(Yn)\displaystyle\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})} (308)

depends on XnX^{n_{\ell}} only through its norm Xn\left\lVert X^{n_{\ell}}\right\rVert. Since Xn2=nP\left\lVert X^{n_{\ell}}\right\rVert^{2}=n_{\ell}P with probability 1, any choice of xnx^{n_{\ell}} such that xni2=niP\left\lVert x^{n_{i}}\right\rVert^{2}=n_{i}P for i[]i\in[\ell] gives

[logdPYn|Xn(Yn|Xn)dPY~n(Yn)<γ+logJ(P)]=\displaystyle\mathbb{P}\left[\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}<\gamma+\ell\log J(P)\right]=
[logdPYn|Xn(Yn|Xn)dPY~n(Yn)<γ+logJ(P)|Xn=xn].\displaystyle\mathbb{P}\left[\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}<\gamma+\ell\log J(P)\middle|X^{n_{\ell}}=x^{n_{\ell}}\right]. (309)

We set xn=(P,P,,P)=P𝟏x^{n_{\ell}}=(\sqrt{P},\sqrt{P},\dots,\sqrt{P})=\sqrt{P}\mathbf{1} to obtain an i.i.d. sum in (309). Given Xn=P𝟏X^{n_{\ell}}=\sqrt{P}\mathbf{1}, the distribution of (308) is the same as the distribution of the sum

i=1nAi\displaystyle\sum_{i=1}^{n_{\ell}}A_{i} (310)

of nn_{\ell} i.i.d. random variables

Ai=C(P)+P2(1+P)(1Zi2+2PZi),i[n],\displaystyle A_{i}=C(P)+\frac{P}{2(1+P)}\left(1-Z_{i}^{2}+\frac{2}{\sqrt{P}}Z_{i}\right),\quad i\in[n_{\ell}], (311)

where Z1,,ZnZ_{1},\dots,Z_{n_{\ell}} are drawn independently from 𝒩(0,1)\mathcal{N}(0,1) (see e.g., [13, eq. (205)]). The mean and variance of A1A_{1} are

𝔼[A1]\displaystyle\mathbb{E}\left[A_{1}\right] =C(P)\displaystyle=C(P) (312)
Var[A1]\displaystyle\mathrm{Var}\left[A_{1}\right] =V(P).\displaystyle=V(P). (313)
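The closed forms $C(P)=\frac{1}{2}\log(1+P)$ and $V(P)=\frac{P(P+2)}{2(1+P)^{2}}$ (in nats) assumed in the following Monte Carlo sketch are consistent with (311); the sketch is only a numerical sanity check of (312)–(313).

```python
# A Monte Carlo sanity check (illustration only) of (312)-(313): the summands
# A_i in (311) have mean C(P) and variance V(P).  The closed forms
# C(P) = (1/2) log(1+P) and V(P) = P(P+2)/(2(1+P)^2), in nats, are assumed here.
import numpy as np

P, n = 2.0, 10 ** 6
Z = np.random.default_rng(3).standard_normal(n)
A = 0.5 * np.log(1 + P) + P / (2 * (1 + P)) * (1 - Z ** 2 + 2 / np.sqrt(P) * Z)

print("mean    :", A.mean(), " vs C(P) =", 0.5 * np.log(1 + P))
print("variance:", A.var(),  " vs V(P) =", P * (P + 2) / (2 * (1 + P) ** 2))
```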

From (307)–(310), we get

[ı(Xn;Yn)<γ][i=1nAi<γ+logJ(P)].\displaystyle\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right]\leq\mathbb{P}\left[\sum_{i=1}^{n_{\ell}}A_{i}<\gamma+\ell\log J(P)\right]. (314)

To verify that Lemma 2 is applicable to the right-hand side of (314), it only remains to show that $\mathbb{E}\left[(A_{1}-C(P))^{3}\right]$ is finite and that $A_{1}-C(P)$ satisfies Cramér's condition, that is, there exists some $t_{0}>0$ such that $\mathbb{E}\left[\exp\{t(A_{1}-C(P))\}\right]<\infty$ for all $|t|<t_{0}$. From (311), $(A_{1}-C(P))^{3}$ has the same distribution as a degree-6 polynomial of the Gaussian random variable $Z\sim\mathcal{N}(0,1)$. This polynomial has a finite mean since all moments of $Z$ are finite. Let $c\triangleq\frac{P}{2(1+P)}$, $f\triangleq\frac{2}{\sqrt{P}}$, and $t^{\prime}\triangleq tc$. To show that Cramér's condition holds, we compute

𝔼[exp{t(A1C(P))}]\displaystyle\mathbb{E}\left[\exp\{t(A_{1}-C(P))\}\right]
=𝔼[exp{t(1Z2+fZ)}]\displaystyle=\mathbb{E}\left[\exp\{t^{\prime}(1-Z^{2}+fZ)\}\right] (315)
=12πexp{x22+t(1x2+fx)}dx\displaystyle=\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{x^{2}}{2}+t^{\prime}(1-x^{2}+fx)\right\}\mathrm{d}x (316)
\displaystyle=\frac{1}{\sqrt{1+2t^{\prime}}}\exp\left\{t^{\prime}+\frac{(t^{\prime}f)^{2}}{2(1+2t^{\prime})}\right\}. (317)

Thus, 𝔼[exp{t(A1C(P))}]<\mathbb{E}\left[\exp\{t(A_{1}-C(P))\}\right]<\infty for t>12t^{\prime}>-\frac{1}{2}, and t0=12c>0t_{0}=\frac{1}{2c}>0 satisfies Cramér’s condition.
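As a numerical check of the closed form in (317) (illustration only), the following snippet compares direct numerical integration of $\mathbb{E}\left[\exp\{t^{\prime}(1-Z^{2}+fZ)\}\right]$ with the right-hand side of (317) for an arbitrary admissible $t$.

```python
# A numerical check (illustration only) of the closed form in (317): the moment
# generating function E[exp{t'(1 - Z^2 + f Z)}], Z ~ N(0,1), computed by direct
# integration and by the closed form, for an arbitrary admissible t.
import numpy as np
from scipy.integrate import quad

P = 2.0
c, f = P / (2 * (1 + P)), 2 / np.sqrt(P)
t = 0.8                                        # any |t| < t_0 = 1/(2c)
tp = t * c                                     # t' = t c

integrand = lambda x: np.exp(-x ** 2 / 2 + tp * (1 - x ** 2 + f * x)) / np.sqrt(2 * np.pi)
numeric, _ = quad(integrand, -np.inf, np.inf)
closed = np.exp(tp + (tp * f) ** 2 / (2 * (1 + 2 * tp))) / np.sqrt(1 + 2 * tp)
print(numeric, closed)                         # the two values agree
```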

F.3 The threshold γ\gamma

We set γ,n1,,nL\gamma,n_{1},\dots,n_{L} so that the equalities

γ=nC(P)nlog(L+1)(n)V(P)logJ(P)\displaystyle\gamma=n_{\ell}C(P)-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V(P)}-\ell\log J(P) (318)

hold for all [L]\ell\in[L].

The rest of the proof follows identically to (119)–(133) with CC and VV replaced by C(P)C(P) and V(P)V(P), respectively, giving

logM\displaystyle\log M NC(P)Nlog(L)(N)V(P)\displaystyle\geq NC(P)-\sqrt{N\log_{(L)}(N)V(P)}
12πNV(P)log(L)(N)(1+o(1))logNLlogJ(P),\displaystyle-\frac{1}{\sqrt{2\pi}}\sqrt{\frac{NV(P)}{\log_{(L)}(N)}}(1+o(1))-\log N-L\log J(P), (319)

which completes the proof.

References

  • [1] R. C. Yavas, V. Kostina, and M. Effros, “Variable-length feedback codes with several decoding times for the Gaussian channel,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), July 2021, pp. 1883–1888.
  • [2] ——, “Nested sparse feedback codes for point-to-point, multiple access, and random access channels,” in IEEE Information Theory Workshop (ITW), Kanazawa, Japan, Oct. 2021, pp. 1–6.
  • [3] C. Shannon, “The zero error capacity of a noisy channel,” IRE Trans. Inf. Theory, vol. 2, no. 3, pp. 8–19, Sep. 1956.
  • [4] M. Horstein, “Sequential transmission using noiseless feedback,” IEEE Trans. Inf. Theory, vol. 9, no. 3, pp. 136–143, July 1963.
  • [5] J. Schalkwijk and T. Kailath, “A coding scheme for additive noise channels with feedback–I: No bandwidth constraint,” IEEE Trans. Inf. Theory, vol. 12, no. 2, pp. 172–182, Apr. 1966.
  • [6] A. B. Wagner, N. V. Shende, and Y. Altuğ, “A new method for employing feedback to improve coding performance,” IEEE Trans. Inf. Theory, vol. 66, no. 11, pp. 6660–6681, Nov. 2020.
  • [7] M. V. Burnashev, “Data transmission over a discrete channel with feedback: Random transmission time,” Problems of Information Transmission, vol. 12, no. 4, pp. 10–30, Oct. 1976.
  • [8] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Feedback in the non-asymptotic regime,” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 4903–4925, Aug. 2011.
  • [9] A. Tchamkerten and I. E. Telatar, “Variable length coding over an unknown channel,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 2126–2145, May 2006.
  • [10] M. Naghshvar, T. Javidi, and M. Wigger, “Extrinsic Jensen–Shannon divergence: Applications to variable-length coding,” IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 2148–2164, Apr. 2015.
  • [11] H. Yang, M. Pan, A. Antonini, and R. D. Wesel, “Sequential transmission over binary asymmetric channels with feedback,” IEEE Trans. Inf. Theory, vol. 68, no. 11, pp. 7023–7042, Nov. 2022.
  • [12] N. Guo and V. Kostina, “Reliability function for streaming over a DMC with feedback,” IEEE Trans. Inf. Theory, vol. 69, no. 4, pp. 2165–2192, Apr. 2023.
  • [13] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
  • [14] S. L. Fong and V. Y. F. Tan, “Asymptotic expansions for the AWGN channel with feedback under a peak power constraint,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Hong Kong, China, June 2015, pp. 311–315.
  • [15] Y. Altuğ, H. V. Poor, and S. Verdú, “Variable-length channel codes with probabilistic delay guarantees,” in 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, Sep. 2015, pp. 642–649.
  • [16] J. Östman, R. Devassy, G. Durisi, and E. G. Ström, “Short-packet transmission via variable-length codes in the presence of noisy stop feedback,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 214–227, Jan. 2021.
  • [17] G. Forney, “Exponential error bounds for erasure, list, and decision feedback schemes,” IEEE Trans. Inf. Theory, vol. 14, no. 2, pp. 206–220, Mar. 1968.
  • [18] S. Ginzach, N. Merhav, and I. Sason, “Random-coding error exponent of variable-length codes with a single-bit noiseless feedback,” in IEEE Inf. Theory Workshop (ITW), Kaohsiung, Taiwan, Nov. 2017, pp. 584–588.
  • [19] L. V. Truong and V. Y. F. Tan, “On Gaussian MACs with variable-length feedback and non-vanishing error probabilities,” arXiv:1609.00594v2, Sep. 2016.
  • [20] ——, “On Gaussian MACs with variable-length feedback and non-vanishing error probabilities,” IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2333–2346, Apr. 2018.
  • [21] K. F. Trillingsgaard, W. Yang, G. Durisi, and P. Popovski, “Common-message broadcast channels with feedback in the nonasymptotic regime: Stop feedback,” IEEE Trans. Inf. Theory, vol. 64, no. 12, pp. 7686–7718, Dec. 2018.
  • [22] M. Heidari, A. Anastasopoulos, and S. S. Pradhan, “On the reliability function of discrete memoryless multiple-access channel with feedback,” in 2018 IEEE Information Theory Workshop (ITW), Guangzhou, China, Nov. 2018, pp. 1–5.
  • [23] K. F. Trillingsgaard and P. Popovski, “Variable-length coding for short packets over a multiple access channel with feedback,” in 2014 11th International Symposium on Wireless Communications Systems (ISWCS), Barcelona, Spain, Aug. 2014, pp. 796–800.
  • [24] S. H. Kim, D. K. Sung, and T. Le-Ngoc, “Variable-length feedback codes under a strict delay constraint,” IEEE Communications Letters, vol. 19, no. 4, pp. 513–516, Apr. 2015.
  • [25] A. R. Williamson, T. Chen, and R. D. Wesel, “Variable-length convolutional coding for short blocklengths with decision feedback,” IEEE Trans. Commun., vol. 63, no. 7, pp. 2389–2403, July 2015.
  • [26] K. Vakilinia, S. V. S. Ranganathan, D. Divsalar, and R. D. Wesel, “Optimizing transmission lengths for limited feedback with nonbinary LDPC examples,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2245–2257, June 2016.
  • [27] A. Heidarzadeh, J. Chamberland, R. D. Wesel, and P. Parag, “A systematic approach to incremental redundancy with application to erasure channels,” IEEE Trans. Commun., vol. 67, no. 4, pp. 2620–2631, Apr. 2019.
  • [28] R. C. Yavas, V. Kostina, and M. Effros, “Random access channel coding in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 67, no. 4, pp. 2115–2140, Apr. 2021.
  • [29] Y. Liu and M. Effros, “Finite-blocklength and error-exponent analyses for LDPC codes in point-to-point and multiple access communication,” in IEEE Int. Symp. Inf. Theory (ISIT), Los Angeles, California, USA, June 2020, pp. 361–366.
  • [30] H. Yang, R. C. Yavas, V. Kostina, and R. D. Wesel, “Variable-length stop-feedback codes with finite optimal decoding times for BI-AWGN channels,” in IEEE Int. Symp. Inf. Theory (ISIT), Espoo, Finland, July 2022, pp. 2327–2332.
  • [31] W. Feller, An Introduction to Probability Theory and its Applications, 2nd ed.   John Wiley & Sons, 1971, vol. II.
  • [32] H. Yamamoto and K. Itoh, “Asymptotic performance of a modified Schalkwijk-Barron scheme for channels with noiseless feedback (corresp.),” IEEE Trans. Inf. Theory, vol. 25, no. 6, pp. 729–733, Nov. 1979.
  • [33] A. Lalitha and T. Javidi, “On error exponents of almost-fixed-length channel codes and hypothesis tests,” arXiv:2012.00077, Nov. 2020.
  • [34] P. Berlin, B. Nakiboğlu, B. Rimoldi, and E. Telatar, “A simple converse of Burnashev’s reliability function,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3074–3080, July 2009.
  • [35] V. Y. F. Tan and M. Tomamichel, “The third-order term in the normal approximation for the AWGN channel,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2430–2438, May 2015.
  • [36] A. Tartakovsky, I. Nikiforov, and M. Basseville, Sequential Analysis: Hypothesis Testing and Changepoint Detection, 1st ed.   Chapman and Hall CRC, 2014.
  • [37] V. Y. F. Tan and O. Kosut, “On the dispersions of three network information theory problems,” IEEE Trans. Inf. Theory, vol. 60, no. 2, pp. 881–903, Feb. 2014.
  • [38] O. Kosut, “A second-order converse bound for the multiple-access channel via wringing dependence,” IEEE Trans. Inf. Theory, vol. 68, no. 6, pp. 3552–3584, June 2022.
  • [39] R. C. Yavas, V. Kostina, and M. Effros, “Gaussian multiple and random access channels: Finite-blocklength analysis,” IEEE Trans. Inf. Theory, vol. 67, no. 11, pp. 6983–7009, Nov. 2021.
  • [40] M. Tomamichel and V. Y. F. Tan, “A tight upper bound for the third-order asymptotics of discrete memoryless channels,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Istanbul, Turkey, July 2013, pp. 1536–1540.
  • [41] P. Moulin, “The log-volume of optimal codes for memoryless channels, asymptotically within a few nats,” IEEE Trans. Inf. Theory, vol. 63, no. 4, pp. 2278–2313, Apr. 2017.
  • [42] V. V. Petrov, Sums of independent random variables.   New York, USA: Springer, Berlin, Heidelberg, 1975.
  • [43] T. M. Cover and J. A. Thomas, Elements of Information Theory.   NJ, USA: Wiley, 2006.
  • [44] Y. Polyanskiy, “A perspective on massive random-access,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, June 2017, pp. 2523–2527.
  • [45] M. Ebrahimi, F. Lahouti, and V. Kostina, “Coded random access design for constrained outage,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, June 2017, pp. 2732–2736.
  • [46] W. Yang, G. Caire, G. Durisi, and Y. Polyanskiy, “Optimum power control at finite blocklength,” IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 4598–4615, Sep. 2015.
  • [47] L. V. Truong, S. L. Fong, and V. Y. F. Tan, “On Gaussian channels with feedback under expected power constraints and with non-vanishing error probabilities,” IEEE Trans. Inf. Theory, vol. 63, no. 3, pp. 1746–1765, Mar. 2017.
  • [48] C. E. Shannon, “Probability of error for optimal codes in a Gaussian channel,” The Bell System Technical Journal, vol. 38, no. 3, pp. 611–656, May 1959.
  • [49] E. MolavianJazi, “A unified approach to Gaussian channels with finite blocklength,” Ph.D. dissertation, University of Notre Dame, July 2014.
  • [50] Y. Polyanskiy and S. Verdú, “Channel dispersion and moderate deviations limits for memoryless channels,” in 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, USA, Sep. 2010, pp. 1334–1339.
  • [51] R. W. Butler, Saddlepoint Approximations with Applications, ser. Cambridge Series in Statistical and Probabilistic Mathematics.   Cambridge University Press, 2007.
  • [52] A. Wald, “Sequential tests of statistical hypotheses,” The Annals of Mathematical Statistics, vol. 16, no. 2, pp. 117–186, June 1945.
  • [53] Y. Li and V. Y. F. Tan, “Second-order asymptotics of sequential hypothesis testing,” IEEE Trans. Inf. Theory, vol. 66, no. 11, pp. 7222–7230, Nov. 2020.
  • [54] A. Wald and J. Wolfowitz, “Optimum character of the sequential probability ratio test,” The Annals of Mathematical Statistics, vol. 19, no. 3, pp. 326–339, Sep. 1948.
  • [55] C. Leang and D. Johnson, “On the asymptotics of m-hypothesis Bayesian detection,” IEEE Trans. Inf. Theory, vol. 43, no. 1, pp. 280–282, 1997.
  • [56] E. MolavianJazi and J. N. Laneman, “A second-order achievable rate region for Gaussian multi-access channels via a central limit theorem for functions,” IEEE Trans. Inf. Theory, vol. 61, no. 12, pp. 6719–6733, Dec. 2015.
Recep Can Yavas (S’18–M’22) received the B.S. degree (Hons.) in electrical engineering from Bilkent University, Ankara, Turkey, in 2016. He received the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology (Caltech) in 2017 and 2023, respectively. He is currently a research fellow at CNRS at CREATE, Singapore. His research interests include information theory, probability theory, and multi-armed bandits.
Victoria Kostina (S’12–M’14–SM’22) is a professor of electrical engineering and of computing and mathematical sciences at Caltech. She received the bachelor’s degree from Moscow Institute of Physics and Technology (MIPT) in 2004, the master’s degree from the University of Ottawa in 2006, and the Ph.D. degree from Princeton University in 2013. During her studies at MIPT, she was affiliated with the Institute for Information Transmission Problems of the Russian Academy of Sciences. Her research interests lie in information theory, coding, communications, learning, and control. She has served as an Associate Editor for the IEEE Transactions on Information Theory and as a Guest Editor for the IEEE Journal on Selected Areas in Information Theory. She received the Natural Sciences and Engineering Research Council of Canada postgraduate scholarship during 2009–2012, the Princeton Electrical Engineering Best Dissertation Award in 2013, the Simons-Berkeley research fellowship in 2015, and the NSF CAREER award in 2017.
Michelle Effros (S’93–M’95–SM’03–F’09) is the George Van Osdol Professor of Electrical Engineering and Vice Provost at the California Institute of Technology. She was a co-founder of Code On Technologies, a technology licensing firm, which was sold in 2016. Dr. Effros is a fellow of the IEEE and has received a number of awards including Stanford’s Frederick Emmons Terman Engineering Scholastic Award (for excellence in engineering), the Hughes Masters Full-Study Fellowship, the National Science Foundation Graduate Fellowship, the AT&T Ph.D. Scholarship, the NSF CAREER Award, the Charles Lee Powell Foundation Award, the Richard Feynman-Hughes Fellowship, an Okawa Research Grant, and the Communications Society and Information Theory Society Joint Paper Award. She was cited by Technology Review as one of the world’s top 100 young innovators in 2002, became a fellow of the IEEE in 2009, and is a member of Tau Beta Pi, Phi Beta Kappa, and Sigma Xi. She received the B.S. (with distinction), M.S., and Ph.D. degrees in electrical engineering from Stanford University. Her research interests include information theory (with a focus on source, channel, and network coding for multi-node networks) and theoretical neuroscience (with a focus on neurostability and memory). Dr. Effros served as the Editor of the IEEE Information Theory Society Newsletter from 1995 to 1998 and as a Member of the Board of Governors of the IEEE Information Theory Society from 1998 to 2003 and from 2008 to 2017. She served as President of the IEEE Information Theory Society in 2015 and as Executive Director for the film “The Bit Player,” a movie about Claude Shannon, which came out in 2018. She was a member of the Advisory Committee and the Committee of Visitors for the Computer and Information Science and Engineering (CISE) Directorate at the National Science Foundation from 2009 to 2012 and in 2014, respectively. She served on the IEEE Signal Processing Society Image and Multi-Dimensional Signal Processing (IMDSP) Technical Committee from 2001 to 2007 and on ISAT from 2006 to 2009. She served as Associate Editor for the joint special issue on Networking and Information Theory in the IEEE Transactions on Information Theory and the IEEE/ACM Transactions on Networking, as Associate Editor for the special issue honoring the scientific legacy of Ralf Koetter in the IEEE Transactions on Information Theory and, from 2004 to 2007 served as Associate Editor for Source Coding for the IEEE Transactions on Information Theory. She has served on numerous technical program committees and review boards, including serving as general co-chair for the 2009 Network Coding Workshop and technical program committee co-chair for the 2012 IEEE International Symposium on Information Theory and the 2023 IEEE Information Theory Workshop.