
Variable-Length Sparse Feedback Codes for Point-to-Point, Multiple Access, and Random Access Channels

Recep Can Yavas, Victoria Kostina, and Michelle Effros

Manuscript received January 31, 2023; revised October 9, 2023; accepted November 17, 2023. When this work was completed, R. C. Yavas, V. Kostina, and M. Effros were all with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA. R. C. Yavas is currently with CNRS@CREATE, 138602, Singapore (e-mail: vkostina, effros@caltech.edu, recep.yavas@cnrsatcreate.sg). This work was supported in part by the National Science Foundation (NSF) under grants CCF-1817241 and CCF-1956386. This paper was presented in part at ISIT 2021 [1] and at ITW 2021 [2].
Abstract

This paper investigates variable-length stop-feedback codes for memoryless channels in point-to-point, multiple access, and random access communication scenarios. The proposed codes employ $L$ decoding times $n_1, n_2, \dots, n_L$ for the point-to-point and multiple access channels and $KL+1$ decoding times for the random access channel with at most $K$ active transmitters. In the point-to-point and multiple access channels, the decoder uses the observed channel outputs to decide whether to decode at each of the allowed decoding times $n_1, \dots, n_L$, at each time telling the encoder whether or not to stop transmitting using a single bit of feedback. In the random access scenario, the decoder estimates the number of active transmitters at time $n_0$ and then chooses among decoding times $n_{k,1}, \dots, n_{k,L}$ if it believes that there are $k$ active transmitters. In all cases, the choice of allowed decoding times is part of the code design; given a fixed value $L$, the allowed decoding times are chosen to minimize the expected decoding time for a given codebook size and target average error probability. The number $L$ in each scenario is assumed to be constant even when the blocklength is allowed to grow; the resulting code therefore requires only sparse feedback. The central results are asymptotic approximations of achievable rates as a function of the error probability, the expected decoding time, and the number of decoding times. A converse for variable-length stop-feedback codes with uniformly-spaced decoding times is included for the point-to-point channel.

Index Terms:
Variable-length coding, multiple-access, random-access, feedback codes, sparse feedback, second-order analysis, channel dispersion, moderate deviations, sequential hypothesis testing.

I Introduction

Although feedback does not increase the capacity of memoryless, point-to-point channels (PPCs) [3], feedback can simplify coding schemes and improve the speed of approach to capacity with blocklength. Examples that demonstrate this effect include Horstein’s scheme for the binary symmetric channel (BSC) [4] and Schalkwijk and Kailath’s scheme for the Gaussian channel [5], both of which leverage full channel feedback to simplify coding in the fixed-length regime. Wagner et al. [6] show that feedback improves the second-order term in the achievable rate as a function of blocklength for fixed-rate coding over discrete, memoryless, point-to-point channels (DM-PPCs) that have multiple capacity-achieving input distributions giving distinct dispersions.

I-A Literature Review on Variable-Length Feedback Codes

The benefits of feedback increase for codes with multiple decoding times (called variable-length or rateless codes). In [7], Burnashev shows that feedback significantly improves the optimal error exponent of variable-length codes for DM-PPCs. In [8], Polyanskiy et al. extend the work of Burnashev to the finite-length regime with non-vanishing error probabilities, introducing variable-length feedback (VLF) codes and deriving achievability and converse bounds on their performance. Tchamkerten and Telatar [9] show that Burnashev's optimal error exponent is achieved for a family of BSCs and Z channels, where the cross-over probability of the channel is unknown. For the BSC, Naghshvar et al. [10] propose a VLF coding scheme with a novel encoder called the small-enough-difference (SED) encoder and derive a non-asymptotic achievability bound. Their scheme is an alternative to Burnashev's scheme to achieve the optimal error exponent. Yang et al. [11] extend the SED encoder to the binary asymmetric channel, of which the BSC is a special case, and derive refined non-asymptotic achievability bounds for the binary asymmetric channel. Guo and Kostina [12] propose an instantaneous SED code for a source whose symbols progressively arrive at the encoder in real time.

The feedback in VLF codes can be limited in its amount and frequency. Here, the amount refers to how much feedback is sent from the receiver at each time feedback is available; the frequency refers to how many times feedback is available throughout the communication epoch. The extreme cases in the frequency are no feedback and feedback after every channel use. The extreme cases in the amount are full feedback and stop feedback. With full feedback, at time $n_i$, the receiver sends all symbols received until that time, $Y^{n_i}$, which can be used by the transmitter to encode the $(n_{i+1})$-th symbol. With stop feedback, the receiver sends a single bit of feedback to inform the transmitter whether or not to stop transmitting. Unlike full-feedback codes, variable-length stop-feedback (VLSF) codes employ codewords that are fixed when the code is designed; that is, feedback affects how much of a codeword is sent but does not affect the codeword's value.

In [8], Polyanskiy et al. define VLSF codes with feedback after every channel use. The result in [8, Th. 2] shows that variable-length coding improves the first-order term in the asymptotic expansion of the maximum achievable message set size from $NC$ to $\frac{NC}{1-\epsilon}$, where $C$ is the capacity of the DM-PPC, $N$ is the average decoding time (averaging is with respect to both the random message and the random noise), and $\epsilon$ is the average error probability. The second-order term achievable for VLF codes is $O(\log N)$, which means that VLF codes have zero dispersion and that the convergence to the capacity is much faster than that achieved by fixed-length codes [13, 14]. In [15], Altuğ et al. modify the VLSF coding paradigm by replacing the average decoding time constraint with a constraint on the probability that the decoding time exceeds a target value; the benefit in the first-order term does not appear under this probabilistic delay constraint, and the dispersion is no longer zero. A VLSF scenario with noisy feedback and a finite largest available decoding time is studied in [16]. For VLSF codes, Forney [17] shows an achievable error exponent that is strictly better than that of fixed-length, no-feedback codes and strictly worse than Burnashev's error exponent for variable-length full-feedback codes. Ginzach et al. [18] derive the exact error exponent of VLSF codes for the BSC.

Bounds on the performance of VLSF codes that allow feedback after every channel use are derived for several network communication problems. Truong and Tan [19, 20] extend the results from [8] to the Gaussian multiple access channel (MAC) under an average power constraint. Trillingsgaard et al. [21] study the VLSF scenario where a common message is transmitted across a $K$-user discrete memoryless broadcast channel. Heidari et al. [22] extend Burnashev's work from the DM-PPC to the DM-MAC, deriving lower and upper bounds on the error exponents of VLF codes for the DM-MAC. Bounds on the performance of VLSF codes for the DM-MAC with an unbounded number of decoding times appear in [23]. The achievability bounds for the $K$-transmitter MAC in [20] and [23] employ $2^K - 1$ simultaneous information density threshold rules.

While high rates of feedback are impractical for many applications — especially wireless applications on half-duplex devices — most prior work on VLSF codes (e.g., [8, 15, 19, 20, 21, 22, 23]) considers the densest feedback possible, using feedback at each of the (at most) $n_{\max}$ time steps before decoding, where $n_{\max}$ is the largest blocklength used by a given VLSF code. To consider more limited feedback scenarios, let $L$ denote the number of potential decoding times in a VLSF code, a number that we assume to be independent of the blocklength. We further assume that feedback is available only at the $L$ decoding times $n_1, \dots, n_L$, which are fixed in the code design and known by the transmitter and receiver before the start of transmission. In [24], Kim et al. choose the decoding time for each message from the set $\{d, 2d, \dots, Ld\}$ for some positive integer $d$ and $L < \infty$. In [25], Williamson et al. numerically optimize the values of $L$ decoding times and employ punctured convolutional codes and a Viterbi algorithm. In [26], Vakilinia et al. introduce a sequential differential optimization (SDO) algorithm to optimize the choices of the $L$ potential decoding times $n_1, \dots, n_L$, approximating the random decoding time $\tau$ by a Gaussian random variable. Vakilinia et al. apply the SDO algorithm to non-binary low-density parity-check codes over binary-input, additive white Gaussian noise channels; the mean and variance of $\tau$ are determined through simulation. Heidarzadeh et al. [27] extend [26] to account for the feedback rate and apply the SDO algorithm to random linear codes over the binary erasure channel. In [28], we develop a communication strategy for a random access scenario with a total of $K$ transmitters; in this scenario, neither the transmitters nor the receiver knows the set of active transmitters, which can vary from one epoch to the next. The code in [28] is a VLSF code with decoding times $n_0 < n_1 < \dots < n_K$. The decoder decodes messages at time $n_k$ only if it decides at that time that $k$ out of the $K$ transmitters are active. It informs the transmitters about its decision by sending a one-bit signal at each time $n_i$ until the time at which it decodes. We show that our random access channel (RAC) code with sparse stop feedback achieves performance identical in the capacity and dispersion terms to that of the best-known code without feedback for a MAC in which the set of active transmitters is known a priori. An extension of [28] to low-density parity-check codes appears in [29]. Building upon an earlier version of the present paper [1], Yang et al. [30] construct an integer program to minimize the upper bound on the average blocklength subject to constraints on the average error probability and the minimum gap between consecutive decoding times. By employing a combination of the Edgeworth expansion [31, Sec. XVI.4] and the Petrov expansion (Lemma 2), that paper develops an approximation to the cumulative distribution function of the information density random variable $\imath(X^n; Y^n)$; a numerical comparison of this approximation with the empirical cumulative distribution function shows that the approximation is tight even for small values of $n$. Their analysis uses this tight approximation to numerically evaluate the non-asymptotic achievability bound (Theorem 1, below) for the BSC, binary erasure channel, and binary-input Gaussian PPC for all $L \leq 32$. The resulting numerical results show performance that closely approaches Polyanskiy's VLSF achievability bound [8] with a relatively small $L$. For the binary erasure channel, [30] also proposes a new zero-error code that employs systematic transmission followed by random linear fountain coding; the proposed code outperforms Polyanskiy's achievability bound.

Sparse feedback is known to achieve the optimal error exponent for VLF codes. Yamamoto and Itoh [32] construct a two-phase scheme that achieves Burnashev's optimal error exponent [7]. Although their scheme allows an unlimited number of feedback instances and decoding times, it is sparse in the sense that feedback is available only at times $\alpha n, n, (1+\alpha)n, 2n, \dots$ for some $\alpha \in (0,1)$ and integer $n$. Lalitha and Javidi [33] show that Burnashev's optimal error exponent can be achieved with only $L = 3$ decoding times by truncating the Yamamoto–Itoh scheme.

Decoding for VLSF codes can be accomplished by running a sequential hypothesis test (SHT) on each possible message. At each of an increasing sequence of stopping times, the SHT compares a hypothesis $H_0$ corresponding to a particular transmitted message to the hypothesis $H_1$ corresponding to the marginal distribution of the channel output. In [34], Berlin et al. derive a bound on the average stopping time of an SHT. They then use this bound to derive a non-asymptotic converse bound for VLF codes. This result provides an alternative proof of the converse for Burnashev's error exponent [7].

I-B Contributions of This Work

Like [26, 25, 27], this paper studies VLSF codes under a finite constraint $L$ on the number of decoding times. While [26, 25, 27] focus on practical coding and performance, our goal is to derive new achievability bounds on the asymptotic rate achievable by VLSF codes between $L = 1$ (the fixed-length regime analyzed in [13, 35]) and $L = n_{\max}$ (the classical variable-length regime defined in [8, Def. 1], where all decoding times $1, 2, \dots, n_{\max}$ are available).

Our contributions are summarized as follows.

  1. We derive second-order achievability bounds for VLSF codes over DM-PPCs, DM-MACs, DM-RACs, and the Gaussian PPC with maximal power constraints. These bounds are presented in Theorems 2, 5, 6, and 7, respectively. In our analysis for each problem, we consider the asymptotic regime where the number of decoding times $L$ is fixed while the average decoding time $N$ grows without bound, i.e., $L = O(1)$ with respect to $N$. Each of our asymptotic bounds follows from the corresponding non-asymptotic bound that employs an information-density threshold rule with a stop-at-time-zero procedure. Asymptotically optimizing the values of the $L$ decoding times yields the given results. By viewing the proposed decoder as a special case of SHT-based decoders, we show a more general non-asymptotic achievability bound; Theorem 8 employs an arbitrary SHT to decide whether a message is transmitted.

  2. Linking the error probability of any given VLSF code to that of an SHT, in Theorem 9 we prove a converse bound in the spirit of the meta-converse bound from [13, Th. 27]. Analyzing the new bound with infinitely many uniformly-spaced decoding times over Cover–Thomas symmetric channels, in Theorem 3 we prove a converse bound for VLSF codes; the resulting bound is tight up to its second-order term. Unfortunately, since analyzing our meta-converse bound is challenging in the general case of an arbitrary DM-PPC and an arbitrary number $L$ of decoding times (see [36, Th. 3.2.3] for the structure of optimal SHTs with finitely many decoding times), whether or not the second-order term is tight in the general case remains an open question.

TABLE I: The performance of VLSF codes according to the number of decoding times $L$ and the channel type

Number of decoding times | Channel type | First-order term | Second-order term, lower bound | Second-order term, upper bound
Fixed-length, no-feedback ($L=1$) | DM-PPC | $NC$ | $-\sqrt{NV}\,Q^{-1}(\epsilon)$ [13] | $-\sqrt{NV}\,Q^{-1}(\epsilon)$ [13]
Variable-length ($1<L<\infty$) | DM-PPC | $\frac{NC}{1-\epsilon}$ | $-\sqrt{N\log_{(L-1)}(N)\frac{V}{1-\epsilon}}$ (Theorem 2) | $+O(1)$ [8]
Variable-length ($L=\infty$) | DM-PPC | $\frac{NC}{1-\epsilon}$ | $-\log N+O(1)$ [8] | $+O(1)$ [8]
Fixed-length, no-feedback ($L=1$) | DM-MAC | $NI_K$ | $-\sqrt{NV_K}\,Q^{-1}(\epsilon)$ [37] | $+O(\sqrt{N})$ [38]
Variable-length ($1<L<\infty$) | DM-MAC | $\frac{NI_K}{1-\epsilon}$ | $-\sqrt{N\log_{(L-1)}(N)\frac{V_K}{1-\epsilon}}$ (Theorem 5) | $+O(1)$ [20]
Variable-length ($L=\infty$) | DM-MAC | $\frac{NI_K}{1-\epsilon}$ | $-\log N+O(1)$, eq. (44) | $+O(1)$ [20]
Variable-length ($L=\infty$) | Gaussian MAC (average power) | $\frac{NC(KP)}{1-\epsilon}$ | $-O(\sqrt{N})$ [20] | $+O(1)$ [20]
Fixed-length, no-feedback ($L=1$) | DM-RAC | $N_kI_k$ | $-\sqrt{N_kV_k}\,Q^{-1}(\epsilon_k)$ [28] | $+O(\sqrt{N_k})$ [38]
Fixed-length, no-feedback ($L=1$) | Gaussian RAC (maximal power) | $N_kC(kP)$ | $-\sqrt{N_kV_k(P)}\,Q^{-1}(\epsilon_k)$ [39] | $+O(\sqrt{N_k})$ [38]
Variable-length ($1<L<\infty$) | DM-RAC | $\frac{N_kI_k}{1-\epsilon_k}$ | $-\sqrt{N_k\log_{(L-1)}(N_k)\frac{V_k}{1-\epsilon_k}}$ (Theorem 6) | $+O(1)$ [20]
Variable-length ($1<L<\infty$) | Gaussian PPC (maximal power) | $\frac{NC(P)}{1-\epsilon}$ | $-\sqrt{N\log_{(L-1)}(N)\frac{V(P)}{1-\epsilon}}$ (Theorem 7) | $+O(1)$ [19]
Variable-length ($L=\infty$) | Gaussian PPC (average power) | $\frac{NC(P)}{1-\epsilon}$ | $-\log N+O(1)$ [19] | $+O(1)$ [19]

Below, we detail these contributions. Our main result shows that for VLSF codes with $L = O(1) \geq 2$ decoding times over a DM-PPC, a message set size $M$ satisfying

\log M \approx \frac{NC}{1-\epsilon} - \sqrt{N \log_{(L-1)}(N) \frac{V}{1-\epsilon}} \quad (1)

is achievable. Here $\log_{(L)}(\cdot)$ denotes the $L$-fold nested logarithm, $N$ is the average decoding time, $\epsilon$ is the average error probability, and $C$ and $V$ are the capacity and dispersion of the DM-PPC, respectively. Similar formulas arise for the DM-MAC and DM-RAC, where $C$ and $V$ are replaced by the sum-rate mutual information and the sum-rate dispersion. The speed of convergence to $\frac{C}{1-\epsilon}$ depends on $L$. It is slower than the convergence to $C$ in the fixed-length scenario, which has second-order term $O(\sqrt{N})$ [13]. The $L = 2$ case in (1) recovers the rate of convergence for the variable-length scenario without feedback, which has second-order term $O(\sqrt{N \log N})$ [8, Proof of Th. 1]; that rate is achieved with $n_1 = 0$. The nested logarithm term in (1) arises because, after writing the average decoding time as $\mathbb{E}[\tau] = n_1 + \sum_{i=1}^{L-1} (n_{i+1} - n_i) \mathbb{P}[\tau > n_i]$, the decoding time choices in (22), below, satisfy $(n_{i+1} - n_i) \mathbb{P}[\tau > n_i] = o(\sqrt{n_1})$ for $i \in [L-1]$, making the effect of each decoding time on the average decoding time asymptotically similar. We then use the SDO algorithm introduced in [26] to show that our particular choice of $n_1, \dots, n_L$ is second-order optimal (see Appendix B.II). Despite the order-wise dependence of the rate of convergence on $L$, (1) grows so slowly with $L$ that it suggests little benefit to choosing a large $L$. For example, when $L = 4$, $\sqrt{N \log_{(L-1)}(N)}$ behaves very similarly to $O(\sqrt{N})$ for practical values of $N$ (e.g., $N \in [10^3, 10^5]$). Notice, however, that the given achievability result provides a lower bound on the benefit of increasing $L$; bounding the benefit from above requires a converse result. We note, however, that the numerical results in [30] support our conclusion from the asymptotic achievability bound (1) that the improvement in achievable $\log M$ from $L$ to $L+1$ decoding times diminishes as $L$ increases.

For the PPC and MAC, the feedback rate of our code is $\frac{\ell}{n_\ell}$ if the decoding time is $n_\ell$; for the RAC, that rate becomes $\frac{(k-1)L + \ell + 1}{n_{k,\ell}}$ if the decoding time is $n_{k,\ell}$. In both cases, our feedback rate approaches 0 as the decoding time grows. In contrast, VLSF codes such as those in [8, 17] use a feedback rate of 1 bit per channel use. In our VLSF codes for the RAC, the decoder decodes at one of the available times $n_{k,1}, n_{k,2}, \dots, n_{k,L}$ if it estimates that the number of active transmitters is $k \neq 0$; we reserve a single decoding time $n_0$ for the possibility that no transmitters are active. Theorem 6 extends the RAC code in [28] from $L = 1$ to any $L \geq 2$.

The converse result in Theorem 3 shows that in order to achieve (1) with evenly spaced decoding times, one needs at least $L = \Omega\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right)$ decoding times. In contrast, our optimized codes achieve (1) with a finite $L$ that does not grow with the average decoding time $N$, which highlights the importance of optimizing the values of the decoding times in a VLSF code.

Table I summarizes the literature on VLSF codes and the new results from this work, showing how they vary with the number of decoding times and the channel type.

In what follows, Section II gives notation and definitions. Sections III–VI introduce variable-length sparse stop-feedback codes for the DM-PPC, DM-MAC, DM-RAC, and the Gaussian PPC, respectively, and present our main theorems for those channel models; Section VII concludes the paper. The proofs appear in the Appendix.

II Preliminaries

II-A Notation

For any positive integers $k$ and $n$, $[k] \triangleq \{1, \dots, k\}$, $x^n \triangleq (x_1, \dots, x_n)$, and $x^{a:b} \triangleq (x_a, x_{a+1}, \dots, x_b)$. The collection of length-$n$ vectors from the transmitter index set $\mathcal{A}$ is denoted by $x_{\mathcal{A}}^n \triangleq (x_a^n \colon a \in \mathcal{A})$; we drop the superscript $n$ if $n = 1$, i.e., $x_{\mathcal{A}}^1 = x_{\mathcal{A}}$. The collection of non-empty strict subsets of a set $\mathcal{A}$ is denoted by $\mathcal{P}(\mathcal{A}) \triangleq \{\mathcal{B} \colon \mathcal{B} \subseteq \mathcal{A},\, 0 < |\mathcal{B}| < |\mathcal{A}|\}$. All-zero and all-one vectors are denoted by $\mathbf{0}$ and $\mathbf{1}$, respectively; the dimension is determined from the context. The sets of positive integers and non-negative integers are denoted by $\mathbb{Z}_+$ and $\mathbb{Z}_{\geq}$, respectively. We write $x^n \stackrel{\pi}{=} y^n$ if there exists a permutation $\pi$ of $x^n$ such that $\pi(x^n) = y^n$, and $x^n \stackrel{\pi}{\neq} y^n$ if no such permutation exists. The identity matrix of dimension $n$ is denoted by $\mathsf{I}_n$. The Euclidean norm of a vector $x^n$ is denoted by $\lVert x^n \rVert \triangleq \sqrt{\sum_{i=1}^n x_i^2}$. Unless specified otherwise, all logarithms and exponents have base $e$, and information is measured in nats. The standard $O(\cdot)$, $o(\cdot)$, and $\Omega(\cdot)$ notations are defined as follows: $f(n) = O(g(n))$ if $\limsup_{n\to\infty} |f(n)/g(n)| < \infty$, $f(n) = o(g(n))$ if $\lim_{n\to\infty} |f(n)/g(n)| = 0$, and $f(n) = \Omega(g(n))$ if $\lim_{n\to\infty} |f(n)/g(n)| > 0$. The distribution of a random variable $X$ is denoted by $P_X$; $\mathcal{N}(\bm{\mu}, \mathsf{V})$ denotes the Gaussian distribution with mean $\bm{\mu}$ and covariance matrix $\mathsf{V}$; $Q(\cdot)$ denotes the complementary standard Gaussian cumulative distribution function $Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp\{-\frac{t^2}{2}\}\, dt$; and $Q^{-1}(\cdot)$ is its functional inverse. We define the nested logarithm function

\log_{(L)}(x) \triangleq \begin{cases} \log(x) & \text{if } L = 1,\ x > 0 \\ \log(\log_{(L-1)}(x)) & \text{if } L \geq 2,\ \log_{(L-1)}(x) > 0; \end{cases} \quad (2)

$\log_{(L)}(x)$ is undefined for all other $(L, x)$ pairs.
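A minimal sketch of (2) in Python (assuming natural logarithms, as used throughout the paper):

```python
import math

def nested_log(L, x):
    """L-fold nested logarithm log_(L)(x) from (2); returns None where undefined."""
    val = x
    for _ in range(L):
        if val <= 0:
            return None   # the (L, x) pair falls outside the domain in (2)
        val = math.log(val)
    return val

# Example from Section I-B: for L = 4 and N = 1000, log_(L-1)(N) = log_(3)(1000) ~ 0.659.
print(nested_log(3, 1000.0))
```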

We denote the Radon–Nikodym derivative of distribution $P$ with respect to distribution $Q$ by $\frac{\mathrm{d}P}{\mathrm{d}Q}$. We denote the relative entropy and relative entropy variance between $P$ and $Q$ by $D(P\|Q) = \mathbb{E}\left[\log\frac{\mathrm{d}P}{\mathrm{d}Q}(X)\right]$ and $V(P\|Q) = \mathrm{Var}\left[\log\frac{\mathrm{d}P}{\mathrm{d}Q}(X)\right]$, respectively, where $X \sim P$. The $\sigma$-algebra generated by a random variable $X$ is denoted by $\mathcal{F}(X)$. A random variable $X$ is called arithmetic if there exists some $d > 0$ such that $\mathbb{P}[X \in d\mathbb{Z}] = 1$; the largest such $d$ is called the span. If no such $d$ exists, then the random variable is non-arithmetic. We denote $X^+ \triangleq \max\{0, X\}$ and $X^- \triangleq -\min\{0, X\}$ for any random variable $X$.

II-B Discrete Memoryless Channel and Information Density

A DM-PPC is defined by the triple $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$, where $\mathcal{X}$ is the finite input alphabet, $P_{Y|X}$ is the channel transition kernel, and $\mathcal{Y}$ is the finite output alphabet. The $n$-letter input-output relationship of the channel is given by $P_{Y^n|X^n}(y^n|x^n) = \prod_{i=1}^n P_{Y|X}(y_i|x_i)$ for all $n$, $x^n$, and $y^n$.

The $n$-letter information density of a channel $P_{Y|X}$ under input distribution $P_{X^n}$ is defined as

\imath(x^n; y^n) \triangleq \log \frac{P_{Y^n|X^n}(y^n|x^n)}{P_{Y^n}(y^n)}, \quad (3)

where $P_{Y^n}$ is the $Y^n$ marginal of $P_{X^n} P_{Y^n|X^n}$. If the inputs $X_1, X_2, \dots, X_n$ are independently and identically distributed (i.i.d.) according to $P_X$, then

\imath(x^n; y^n) = \sum_{i=1}^n \imath(x_i; y_i), \quad (4)

where the single-letter information density is given by

\imath(x; y) \triangleq \log \frac{P_{Y|X}(y|x)}{P_Y(y)}, \quad x \in \mathcal{X},\ y \in \mathcal{Y}. \quad (5)

The mutual information and dispersion are defined as

I(X;Y) \triangleq \mathbb{E}\left[\imath(X;Y)\right] \quad (6)
V(X;Y) \triangleq \mathrm{Var}\left[\imath(X;Y)\right], \quad (7)

respectively, where $(X, Y) \sim P_X P_{Y|X}$.

Let $\mathcal{P}$ denote the set of all distributions on the alphabet $\mathcal{X}$. The capacity of the DM-PPC is

C = \max_{P_X \in \mathcal{P}} I(X;Y), \quad (8)

and the dispersion of the DM-PPC is

V = \min_{P_X \in \mathcal{P} \colon I(X;Y) = C} V(X;Y). \quad (9)
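As a concrete illustration of definitions (5)–(9), the following sketch computes $I(X;Y)$ and $V(X;Y)$ for an arbitrary finite channel and input distribution; for the BSC with crossover probability 0.11 (the channel used in Fig. 1 below), the uniform input is capacity- and dispersion-achieving, so the computed values equal $C$ and $V$. The function name and the numerical values in the comments are ours, not from the paper.

```python
import numpy as np

def info_density_moments(P_X, P_YgX):
    """Return (I(X;Y), V(X;Y)) in nats, per (6)-(7), for P_X[x] and P_YgX[x, y]."""
    P_XY = P_X[:, None] * P_YgX              # joint distribution of (X, Y)
    P_Y = P_XY.sum(axis=0)                   # output marginal
    with np.errstate(divide="ignore"):
        dens = np.log(P_YgX) - np.log(P_Y)   # single-letter information density (5)
    mask = P_XY > 0                          # zero-probability pairs do not contribute
    I = np.sum(P_XY[mask] * dens[mask])
    V = np.sum(P_XY[mask] * (dens[mask] - I) ** 2)
    return I, V

delta = 0.11                                 # BSC crossover probability
P_YgX = np.array([[1 - delta, delta], [delta, 1 - delta]])
print(info_density_moments(np.array([0.5, 0.5]), P_YgX))
# approximately (0.347, 0.428): C ~ 0.347 nats/use, V ~ 0.428 nats^2/use
```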

III VLSF Codes for the DM-PPC

III-A VLSF Codes with LL Decoding Times

We consider VLSF codes with a finite number of potential decoding times $n_1 < n_2 < \cdots < n_L$ over a DM-PPC. The receiver chooses to end the transmission at the first time $n_\ell \in \{n_1, \dots, n_L\}$ at which it is ready to decode. The transmitter learns of the receiver's decision via a single bit of feedback at each of the times $n_1, \dots, n_\ell$. Feedback bit “0” at time $n_i$ means that the receiver is not yet ready to decode, and transmission should continue; feedback bit “1” means that the receiver can decode at time $n_i$, which signals the transmitter to stop. Using this feedback, the transmitter and the receiver are synchronized and aware of the current state of the transmission at all times. Since $n_L$ is the last available decoding time, the receiver always makes a final decision if time $n_L$ is reached. Unlike [7, 32, 25], we do not allow re-transmission of the message after time $n_L$. Since the transmitter and the receiver both know the values of the decoding times, the receiver does not need to send feedback at the last available time $n_L$. We assume that the transmitter and the receiver know the channel transition kernel $P_{Y|X}$. We employ average decoding time and average error probability constraints. Definition 1, below, formalizes our code description.
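The following schematic sketch (ours, not a construction from the paper) summarizes this interaction; `channel`, `ready_to_decode`, and `decode` are hypothetical placeholders for the channel and for the receiver's decision and decoding rules.

```python
def run_vlsf_session(codeword, decoding_times, channel, ready_to_decode, decode):
    """Transmit until the receiver signals 'stop' at one of the allowed times."""
    received = []
    for i, n in enumerate(decoding_times):
        # send the next segment of the (fixed) codeword up to time n
        received.extend(channel(codeword[len(received):n]))
        forced = (i == len(decoding_times) - 1)   # decision is forced at n_L
        if ready_to_decode(received) or forced:
            return n, decode(received)            # feedback bit "1": stop
        # feedback bit "0": not ready, continue transmitting
```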

Definition 1

Fix $\epsilon \in (0,1)$, positive integers $L$ and $M$, and a positive scalar $N$. An $(N, L, M, \epsilon)$-VLSF code for the DM-PPC comprises

  1. non-negative integer-valued decoding times $n_1 < \dots < n_L$,

  2. a finite alphabet $\mathcal{U}$ and a probability distribution $P_U$ on $\mathcal{U}$ defining a common randomness random variable $U$ that is revealed to both the transmitter and the receiver before the start of the transmission (the realization $u$ of $U$ specifies the codebook),

  3. an encoding function $\mathsf{f}_n \colon \mathcal{U} \times [M] \to \mathcal{X}$, for each $n = 1, \dots, n_L$, that assigns a codeword

     \mathsf{f}(u, m)^{n_L} \triangleq (\mathsf{f}_1(u, m), \dots, \mathsf{f}_{n_L}(u, m)) \quad (10)

     to each message $m \in [M]$ and common randomness instance $u \in \mathcal{U}$,

  4. a non-negative integer-valued random stopping time $\tau \in \{n_1, \dots, n_L\}$ for the filtration generated by $\{U, Y^{n_i}\}_{i=1}^L$ that satisfies an average decoding time constraint

     \mathbb{E}\left[\tau\right] \leq N, \quad (11)

  5. and a decoding function $\mathsf{g}_{n_\ell} \colon \mathcal{U} \times \mathcal{Y}^{n_\ell} \to [M] \cup \{\mathsf{e}\}$ for each $\ell \in [L]$ (where $\mathsf{e}$ is the erasure symbol used to indicate that the receiver is not ready to decode), satisfying an average error probability constraint

     \mathbb{P}\left[\mathsf{g}_\tau(U, Y^\tau) \neq W\right] \leq \epsilon, \quad (12)

     where the message $W$ is equiprobably distributed on the set $[M]$, and $X^\tau = \mathsf{f}(U, W)^\tau$.

Recall that Definition 1 with $L = 1$ recovers the fixed-length no-feedback codes in [13]. As in [8, 28, 21], we need common randomness here because the traditional random-coding argument does not prove the existence of a single (deterministic) code that simultaneously satisfies conditions (11) and (12). Therefore, randomized codes are necessary for our achievability argument; here, $|\mathcal{U}| \leq 2$ suffices [28, Appendix D].

We define the maximum achievable message set size $M^*(N, L, \epsilon)$ with $L$ decoding times, average decoding time $N$, and average error probability $\epsilon$ as

M^*(N, L, \epsilon) \triangleq \max\{M \colon \text{an } (N, L, M, \epsilon)\text{-VLSF code exists}\}. \quad (13)

The maximum achievable message set size for VLSF codes with $L$ decoding times $n_1, \dots, n_L$ that are restricted to belong to a subset $\mathcal{N} \subseteq \mathbb{Z}_{\geq}$ is denoted by $M^*(N, L, \epsilon, \mathcal{N})$.

III-B Related Work

The following discussion summarizes prior asymptotic expansions of the maximum achievable message set size for the DM-PPC.

  a) $M^*(N, 1, \epsilon)$: For $L = 1$ and $\epsilon \in (0, 1/2)$, Polyanskiy et al. [13, Th. 49] show that

     \log M^*(N, 1, \epsilon) = NC - \sqrt{NV} Q^{-1}(\epsilon) + O(\log N). \quad (14)

     For $\epsilon \in [1/2, 1)$, the dispersion $V$ in (9) is replaced by the maximum dispersion $V_{\max} \triangleq \max_{P_X \colon I(X;Y) = C} V(X;Y)$. The $O(\log N)$ term is lower bounded by $O(1)$ and upper bounded by $\frac{1}{2}\log N + O(1)$. For nonsingular DM-PPCs, i.e., channels that satisfy $\mathbb{E}\left[\mathrm{Var}\left[\imath(X;Y)|Y\right]\right] > 0$ for the distributions that achieve the capacity $C$ and the dispersion $V$, the $O(\log N)$ term equals $\frac{1}{2}\log N + O(1)$ [40]. Moulin [41] derives lower and upper bounds on the $O(1)$ term in the asymptotic expansion when the channel is nonsingular with non-lattice information density.

  b) $M^*(N, \infty, \epsilon)$: For VLSF codes with $L = n_{\max} = \infty$, Polyanskiy et al. [8, Th. 2] show that for $\epsilon \in (0,1)$,

     \log M^*(N, \infty, \epsilon) \geq \frac{NC}{1-\epsilon} - \log N + O(1) \quad (15)
     \log M^*(N, \infty, \epsilon) \leq \frac{NC}{1-\epsilon} + \frac{h_b(\epsilon)}{1-\epsilon}, \quad (16)

     where $h_b(\epsilon) \triangleq -\epsilon\log\epsilon - (1-\epsilon)\log(1-\epsilon)$ is the binary entropy function (in nats). The bounds in (15)–(16) indicate that the $\epsilon$-capacity (the first-order achievable term) is

     \liminf_{N\to\infty} \frac{1}{N} \log M^*(N, \infty, \epsilon) = \frac{C}{1-\epsilon}. \quad (17)

     The achievable dispersion term is zero, i.e., the second-order term in the fundamental limit in (15)–(16) is $o(\sqrt{N})$.

III-C Our Achievability Bounds

Theorem 1, below, is our non-asymptotic achievability bound for VLSF codes with $L$ decoding times.

Theorem 1

Fix a constant $\gamma$, decoding times $n_1 < \cdots < n_L$, and a positive integer $M$. For any positive number $N$ and $\epsilon \in (0,1)$, there exists an $(N, L, M, \epsilon)$-VLSF code for the DM-PPC $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ with

\epsilon \leq \mathbb{P}\left[\imath(X^{n_L}; Y^{n_L}) < \gamma\right] + (M-1)\exp\{-\gamma\}, \quad (18)
N \leq n_1 + \sum_{\ell=1}^{L-1} (n_{\ell+1} - n_\ell) \mathbb{P}\left[\imath(X^{n_\ell}; Y^{n_\ell}) < \gamma\right], \quad (19)

where $P_{X^{n_L}}$ is a product of distributions of $L$ sub-vectors of lengths $n_j - n_{j-1}$, $j \in [L]$, i.e.,

P_{X^{n_L}}(x^{n_L}) = \prod_{j=1}^L P_{X^{n_{j-1}+1:n_j}}(x^{n_{j-1}+1:n_j}), \quad (20)

where $n_0 = 0$.

Proof:

Polyanskiy et al. [13] interpret the information-density threshold test for a fixed-length code as a collection of hypothesis tests aimed at determining whether the channel output is ($H_0$) or is not ($H_1$) dependent on a given codeword. In our coding scheme, we use SHTs in a similar way. The strategy is as follows.

The VLSF decoder at each time $n_1, \dots, n_L$ runs $M$ SHTs between a hypothesis $H_0$ that the channel output results from transmission of the $m$-th codeword, $m \in [M]$, and the hypothesis $H_1$ that the channel output is drawn from the unconditional channel output distribution. The former indicates that the decoder hypothesizes that message $m$ is the sent message. The latter indicates that the decoder hypothesizes that message $m$ has not been sent and thus can be removed from the list of possible messages to decode. Transmission stops at the first time $n_i$ at which hypothesis $H_0$ is accepted for some message $m$ or at which hypothesis $H_1$ is accepted for all $m$. If the latter happens, decoding fails and we declare an error. Transmission continues as long as one of the SHTs has not accepted either $H_0$ or $H_1$. If $H_0$ is declared for multiple messages at the same decoding time, then we stop and declare an error. Since $n_L$ is the last available decoding time, the SHTs are forced to decide between $H_0$ and $H_1$ at time $n_L$. Once $H_0$ or $H_1$ is decided for some message, the decision cannot be reversed at a later time.

The optimal SHT has the form of a two-sided information density threshold rule, where the thresholds depend on the individual decision times [36, Th. 3.2.3]. To simplify the analysis, we employ sub-optimal SHTs for which the upper threshold is set to a value $\gamma \in \mathbb{R}$ that is independent of the decoding times, and the lower thresholds are set to $-\infty$ for $n_\ell < n_L$ and to $\gamma$ for $n_\ell = n_L$. That is, we declare $H_1$ for a message if and only if the corresponding information density does not reach $\gamma$ at any of the decoding times $n_1, \dots, n_L$. Theorem 1 analyzes the error probability and the average decoding time of the sub-optimal SHT-based decoder above, and it extends the achievability bound in [8, Th. 3], which considers $L = \infty$, to the scenario where only a finite number of decoding times is allowed. The bound on the average decoding time (19) is obtained by rewriting the bound on the average decoding time in [8, eq. (27)] using the fact that the stopping time $\tau$ lies in $\{n_1, \dots, n_L\}$. Comparing Theorem 1 with [8, Th. 3], we see that the error probability bound in (18) has an extra term $\mathbb{P}\left[\imath(X^{n_L}; Y^{n_L}) < \gamma\right]$. This term appears because transmission always stops at or before time $n_L$.

Theorem 1 is related to [24, Lemma 1], which similarly treats $L < \infty$ but requires $n_{\ell+1} - n_\ell = d$ for some constant $d \geq 1$, and to [25, Cor. 2], where the transmitter retransmits the message if decoding attempts at times $n_1, \dots, n_L$ are unsuccessful.

See Appendix A for the proof details.
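To make the bound concrete, the following sketch (ours) evaluates (18)–(19) for the BSC with crossover probability $\delta$ and i.i.d. equiprobable inputs. In that case $\imath(X^n; Y^n) = n\log(2(1-\delta)) + T\log\frac{\delta}{1-\delta}$ with $T \sim \mathrm{Binomial}(n, \delta)$, so each probability in (18)–(19) is a binomial tail; the values of $\gamma$, $\log M$, and the decoding times below are illustrative only.

```python
import numpy as np
from scipy.stats import binom

def prob_below_threshold(n, gamma, delta):
    """P[i(X^n;Y^n) < gamma] for the BSC(delta) with equiprobable inputs."""
    if n == 0:
        return 1.0 if gamma > 0 else 0.0
    # i < gamma  <=>  T > (n*log(2(1-delta)) - gamma) / log((1-delta)/delta)
    t_star = (n * np.log(2 * (1 - delta)) - gamma) / np.log((1 - delta) / delta)
    return binom.sf(np.floor(t_star), n, delta)

def theorem1_bounds(times, gamma, log_M, delta):
    eps_bound = (prob_below_threshold(times[-1], gamma, delta)
                 + np.exp(log_M - gamma))                      # (18), upper bounding M-1 by M
    N_bound = times[0] + sum((times[i + 1] - times[i])
                             * prob_below_threshold(times[i], gamma, delta)
                             for i in range(len(times) - 1))   # (19)
    return eps_bound, N_bound

print(theorem1_bounds([600, 800, 1000, 1200],
                      gamma=300 * np.log(2), log_M=290 * np.log(2), delta=0.11))
```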

Theorem 2, stated next, is our second-order achievability bound for VLSF codes with $L = O(1)$ decoding times over the DM-PPC. The proof of Theorem 2 builds upon the non-asymptotic bound in Theorem 1.

Theorem 2

Fix an integer $L = O(1) \geq 2$ and real numbers $N > 0$ and $\epsilon \in (0,1)$. For the DM-PPC with $V > 0$, the maximum message set size (13) achievable by $(N, L, M, \epsilon)$-VLSF codes satisfies

\log M^*(N, L, \epsilon) \geq \frac{NC}{1-\epsilon} - \sqrt{N \log_{(L-1)}(N) \frac{V}{1-\epsilon}} + O\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right). \quad (21)

The decoding times $\{n_1, \dots, n_L\}$ that achieve (21) satisfy the equations

\log M = n_\ell C - \sqrt{n_\ell \log_{(L-\ell+1)}(n_\ell) V} - \log n_\ell + O(1) \quad (22)

for $\ell \in \{2, \dots, L\}$, and $n_1 = 0$.

Proof:

Inspired by [8, Th. 2], the proof employs a time-sharing strategy between an $(N', L-1, M, \epsilon_N')$-VLSF code whose smallest decoding time is nonzero and a simple “stop-at-time-zero” procedure that does not involve any code and declares an error at time 0. Specifically, we take the VLSF code that achieves the bound in Theorem 1, and we use the VLSF code and the stop-at-time-zero procedure with probabilities $1-p$ and $p$, respectively, where $p$ and $\epsilon_N'$ satisfy

\epsilon_N' = \frac{1}{\sqrt{N' \log N'}} \quad (23)
p = \frac{\epsilon - \epsilon_N'}{1 - \epsilon_N'}. \quad (24)

The error probability of the resulting code is bounded by $\epsilon$, and the average decoding time is

N = N'(1-p) = N'(1-\epsilon) + O\left(\sqrt{\frac{N'}{\log N'}}\right). \quad (25)

For the scenario where $L = \infty$, we again use time-sharing with the stop-at-time-zero procedure in the achievability bound in [8, Th. 2], with $\epsilon_N' = \frac{1}{N'}$ instead of (23). In the asymptotic regime $L = O(1)$, the choice in (23) results in a better second-order term than that achieved by $\epsilon_N' = \frac{1}{N'}$.

In the analysis of Theorem 1, we need to bound the probability $\mathbb{P}\left[\imath(X^{n_L}; Y^{n_L}) < \gamma\right] = \epsilon_N'(1 - o(1))$. Since this probability decays sub-exponentially to zero due to (23), we use a moderate deviations result from [42, Ch. 8] to bound it. Such a tool was not needed in the proof of [8, Th. 2] for $L = \infty$ because when $n_L = \infty$, the term $\mathbb{P}\left[\imath(X^{n_L}; Y^{n_L}) < \gamma\right]$ disappears from (18), and the average decoding time is bounded via a martingale analysis instead of (19). Finally, we apply the Karush–Kuhn–Tucker conditions to show that the decoding times in (22) yield a value of $\log M$ that is the maximal value achievable by the non-asymptotic bound up to terms of order $O\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right)$. The details of the proof appear in Appendix B. ∎
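For a sense of scale, a small numerical illustration of (23)–(25), with an assumed underlying average decoding time $N' = 2000$ and target $\epsilon = 0.05$ (our numbers, for illustration only):

```python
import math

N_prime, eps = 2000, 0.05
eps_prime = 1 / math.sqrt(N_prime * math.log(N_prime))   # (23): ~ 0.0081
p = (eps - eps_prime) / (1 - eps_prime)                   # (24): ~ 0.042
N = N_prime * (1 - p)                                     # (25), first equality: ~ 1915.5
print(eps_prime, p, N)
```

Stopping at time zero with probability $p$ spends part of the error budget to shorten the average decoding time, which is what boosts the first-order term from $NC$ to $\frac{NC}{1-\epsilon}$.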

The non-asymptotic achievability bounds obtained from the coding scheme described in the proof sketch of Theorem 2 are illustrated for the BSC in Fig. 1. For $L \in \{2, 3, 4\}$, the decoding times $n_1, \dots, n_L$ are chosen as described in (22) with the $O(1)$ term ignored, and $\epsilon_N'$ in the stop-at-time-zero procedure is replaced with the right-hand side of (18). For $L = 1$, Fig. 1 shows the random coding union bound in [13, Th. 16], which is a non-asymptotic achievability bound for fixed-length no-feedback codes. For $L = \infty$, Fig. 1 shows the non-asymptotic bound in [8, eq. (102)]. The curves for $L = 1$ and $L = 2$ cross because the choice of decoding times in (22) requires $\epsilon \gg \frac{1}{\sqrt{N \log N}}$ and is optimal only as $N \to \infty$. In [30], Yang et al. construct a computationally intensive integer program for the numerical optimization of the decoding times at finite $N$. If such a precise optimization is desired, our approximate decoding times in (22) can be used as starting points for that integer program.
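A sketch (ours) of this choice of decoding times: solve (22) for each $n_\ell$ by root finding after dropping the $O(1)$ term. The constants below are the BSC(0.11) values computed earlier; the bracketing heuristic and $\log M$ are assumptions for illustration.

```python
import math
from scipy.optimize import brentq

def nested_log(L, x):
    for _ in range(L):
        if x <= 0:
            return float("-inf")   # outside the domain of (2)
        x = math.log(x)
    return x

def decoding_times(log_M, C, V, L, n_max=10**6):
    """Approximate n_2 < ... < n_L from (22) with the O(1) term dropped; n_1 = 0."""
    def gap(n, ell):
        return n * C - math.sqrt(n * nested_log(L - ell + 1, n) * V) - math.log(n) - log_M
    times = [0]
    for ell in range(2, L + 1):
        n_lo = 10.0
        # push the lower bracket up until the nested log is positive and gap < 0,
        # so that brentq sees a sign change on [n_lo, n_max]
        while nested_log(L - ell + 1, n_lo) <= 0 or gap(n_lo, ell) >= 0:
            n_lo *= 1.5
        times.append(round(brentq(lambda n: gap(n, ell), n_lo, n_max)))
    return times

print(decoding_times(log_M=290 * math.log(2), C=0.3466, V=0.4280, L=4))
```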

Figure 1: The non-asymptotic achievability bounds obtained from Theorems 1 and 2 and the non-asymptotic converse bound (16) on the maximum achievable rate $\frac{\log M^*(N, L, \epsilon)}{N}$ are shown for the BSC with crossover probability 0.11, $L \in \{1, 2, 3, 4, \infty\}$, and $\epsilon = 0.05$. The curves for $L = 1$ and $L = \infty$ are Polyanskiy et al.'s achievability bounds from [13, Th. 16] and [8, eq. (102)], respectively.

Replacing the information-density-based decoding rule in the proof sketch with the optimal SHT would improve the performance achieved on the right-hand side of (21) by only $O(1)$.

Since any $(N, L, M, \epsilon)$-VLSF code is also an $(N, \infty, M, \epsilon)$-VLSF code, (16) provides an upper bound on $\log M^*(N, L, \epsilon)$ for arbitrary $L$. The order of the second-order term, $-\sqrt{N \log_{(L-1)}(N) \frac{V}{1-\epsilon}}$, depends on the number of decoding times $L$: the larger $L$ is, the faster the achievable rate converges to the capacity. However, the dependence on $L$ is weak, since $\log_{(L-1)}(N)$ grows very slowly in $N$ even if $L$ is small. For example, for $L = 4$ and $N = 1000$, $\log_{(L-1)}(N) \approx 0.659$. For finite $L$, this bound falls short of the $-\log N$ second-order term achievable with $L = \infty$ in (15). Whether the second-order term achieved in Theorem 2 is tight remains an open problem.

The following theorem gives achievability and converse bounds for VLSF codes with decoding times uniformly spaced as $\{0, d_N, 2d_N, \dots\}$.

Theorem 3

Fix $\epsilon \in (0,1)$. Let $d_N = o(N)$ with $d_N \to \infty$, and let $P_{Y|X}$ be any DM-PPC. Then,

\log M^*(N, \infty, \epsilon, d_N\mathbb{Z}_{\geq}) \geq \frac{NC}{1-\epsilon} - \frac{d_N C}{2} - \log N + o(d_N). \quad (26)

If the DM-PPC $P_{Y|X}$ is a Cover–Thomas symmetric DM-PPC [43, p. 190], i.e., the rows (and, respectively, the columns) of the transition probability matrix are permutations of each other, then

\log M^*(N, \infty, \epsilon, d_N\mathbb{Z}_{\geq}) \leq \frac{NC}{1-\epsilon} - \frac{d_N C}{2} + o(d_N). \quad (27)
Proof:

The achievability bound (26) employs the sub-optimal SHT from the proof sketch of Theorem 2. To prove the converse in (27), we first derive, in Theorem 9 in Appendix C below, a meta-converse bound for VLSF codes. The meta-converse bound in Theorem 9 bounds the error probability of any given VLSF code from below by the minimum achievable type-II error probability of the corresponding SHT; it is an extension and a tightening of Polyanskiy et al.'s converse in (16), since for $d_N = 1$, weakening it by applying a loose bound on the performance of SHTs from [36, Th. 3.2.2] recovers (16). The Cover–Thomas symmetry assumption allows us to circumvent the maximization of that minimum type-II error probability over codes, since the log-likelihood ratio $\log\frac{P_{Y|X}(Y|x)}{P_Y(Y)}$ is the same regardless of the channel input $x$ for that channel class. In both bounds (26)–(27), we use the expansions for the average stopping time and the type-II error probability from [36, Ch. 2-3]. See Appendix C for details. ∎

Theorem 3 establishes that when $\frac{d_N}{\log N} \to \infty$, the second-order term of the logarithm of the maximum achievable message set size among VLSF codes with uniformly spaced decoding times is $-\frac{d_N C}{2}$. Theorem 3 implies that in order to achieve the same performance as achieved in (21) with $L$ decoding times, one needs on average $\Omega\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right)$ uniformly spaced stop-feedback instances, suggesting that the optimization of the available decoding times considered in Theorem 2 is crucial for attaining the second-order term in (21).

The case where $d_N = \Omega(N)$ is not as interesting as the case where $d_N = o(N)$: analyzing Theorem 9 using the Chernoff bound yields a bound on the probability that the optimal SHT makes a decision at times other than $n_1 = 0$ and $\frac{N}{1-\epsilon}(1 + o(1))$. Since that probability decays exponentially with $N$, the scenario where $L$ is unbounded and $d_N = \Omega(N)$ is asymptotically equivalent to $L = 2$. For example, for $d_N = \frac{1}{\ell}\frac{N}{1-\epsilon}\left(1 + O\left(\frac{1}{\sqrt{N \log N}}\right)\right)$ for some $\ell \in \mathbb{Z}_+$, the right-hand side of (21) is tight up to the second-order term.

IV VLSF Codes for the DM-MAC

We begin by introducing the definitions used for the multi-transmitter setting.

IV-A Definitions

A $K$-transmitter DM-MAC is defined by a triple $\left(\prod_{k=1}^K \mathcal{X}_k, P_{Y_K|X_{[K]}}, \mathcal{Y}_K\right)$, where $\mathcal{X}_k$ is the finite input alphabet for transmitter $k \in [K]$, $\mathcal{Y}_K$ is the finite output alphabet of the channel, and $P_{Y_K|X_{[K]}}$ is the channel transition probability.

In what follows, subscripts and superscripts indicate the corresponding transmitter indices and codeword lengths, respectively. Let $P_{Y_K}$ denote the marginal output distribution induced by the input distribution $P_{X_{[K]}}$. The unconditional and conditional information densities are defined for each non-empty $\mathcal{A} \subseteq [K]$ as

\imath_K(x_{\mathcal{A}}; y) \triangleq \log \frac{P_{Y_K|X_{\mathcal{A}}}(y|x_{\mathcal{A}})}{P_{Y_K}(y)} \quad (28)
\imath_K(x_{\mathcal{A}}; y|x_{\mathcal{A}^c}) \triangleq \log \frac{P_{Y_K|X_{[K]}}(y|x_{[K]})}{P_{Y_K|X_{\mathcal{A}^c}}(y|x_{\mathcal{A}^c})}, \quad (29)

where $\mathcal{A}^c = [K] \setminus \mathcal{A}$. Note that in (28)–(29), the information density functions depend on the transmitter set $\mathcal{A}$ unless further symmetry conditions are assumed (e.g., in some cases we assume that the components of $P_{X_{[K]}}$ are i.i.d. and that $P_{Y_K|X_{[K]}}$ is invariant to permutations of the inputs $X_{[K]}$).

The corresponding mutual informations under the input distribution $P_{X_{[K]}}$ and the channel transition probability $P_{Y_K|X_{[K]}}$ are defined as

I_K(X_{\mathcal{A}}; Y_K) \triangleq \mathbb{E}\left[\imath_K(X_{\mathcal{A}}; Y_K)\right] \quad (30)
I_K(X_{\mathcal{A}}; Y_K|X_{\mathcal{A}^c}) \triangleq \mathbb{E}\left[\imath_K(X_{\mathcal{A}}; Y_K|X_{\mathcal{A}^c})\right]. \quad (31)

The dispersions are defined as

V_K(X_{\mathcal{A}}; Y_K) \triangleq \mathrm{Var}\left[\imath_K(X_{\mathcal{A}}; Y_K)\right] \quad (32)
V_K(X_{\mathcal{A}}; Y_K|X_{\mathcal{A}^c}) \triangleq \mathrm{Var}\left[\imath_K(X_{\mathcal{A}}; Y_K|X_{\mathcal{A}^c})\right]. \quad (33)

For brevity, we define

I_K \triangleq I_K(X_{[K]}; Y_K) \quad (34)
V_K \triangleq \mathrm{Var}\left[\imath_K(X_{[K]}; Y_K)\right]. \quad (35)

A VLSF code for the MAC with $K$ transmitters is defined similarly to the VLSF code for the PPC.

Definition 2

Fix $\epsilon \in (0,1)$, $N \in (0, \infty)$, and positive integers $M_k$, $k \in [K]$. An $(N, L, M_{[K]}, \epsilon)$-VLSF code for the MAC comprises

  1. non-negative integer-valued decoding times $n_1 < \cdots < n_L$,

  2. $K$ finite alphabets $\mathcal{U}_k$, $k \in [K]$, defining common randomness random variables $U_1, \dots, U_K$,

  3. $K$ sequences of encoding functions $\mathsf{f}_n^{(k)} \colon \mathcal{U}_k \times [M_k] \to \mathcal{X}_k$, $k \in [K]$,

  4. a stopping time $\tau \in \{n_1, \dots, n_L\}$ for the filtration generated by $\{U_1, \dots, U_K, Y_K^{n_\ell}\}_{\ell=1}^L$, satisfying the average decoding time constraint (11), and

  5. $L$ decoding functions $\mathsf{g}_{n_\ell} \colon \mathcal{U}_{[K]} \times \mathcal{Y}_K^{n_\ell} \to \prod_{k=1}^K [M_k] \cup \{\mathsf{e}\}$ for $\ell \in [L]$, satisfying an average error probability constraint

     \mathbb{P}\left[\mathsf{g}_\tau(U_{[K]}, Y_K^\tau) \neq W_{[K]}\right] \leq \epsilon, \quad (36)

     where the independent messages $W_1, \dots, W_K$ are uniformly distributed on the sets $[M_1], \dots, [M_K]$, respectively.

IV-B Our Achievability Bounds

Our main results are second-order achievability bounds for rates approaching a point on the sum-rate boundary of the MAC achievable region expanded by a factor of $\frac{1}{1-\epsilon}$.

Theorem 4, below, is a non-asymptotic achievability bound for any DM-MAC with $K$ transmitters and $L$ decoding times.

Theorem 4

Fix constants $\epsilon \in (0,1)$, $\gamma \in \mathbb{R}$, $\lambda^{(\mathcal{A})} > 0$ for $\mathcal{A} \in \mathcal{P}([K])$, integers $0 \leq n_1 < \cdots < n_L$, and distributions $P_{X_k}$, $k \in [K]$. For any DM-MAC with $K$ transmitters $(\prod_{k=1}^K \mathcal{X}_k, P_{Y_K|X_{[K]}}, \mathcal{Y}_K)$, there exists an $(N, L, M_{[K]}, \epsilon)$-VLSF code with

\epsilon \leq \mathbb{P}\left[\imath_K(X_{[K]}^{n_L}; Y_K^{n_L}) < \gamma\right] \quad (37)
    + \prod_{k=1}^K (M_k - 1) \exp\{-\gamma\} \quad (38)
    + \sum_{\ell=1}^L \sum_{\mathcal{A} \in \mathcal{P}([K])} \mathbb{P}\left[\imath_K(X_{\mathcal{A}}^{n_\ell}; Y_K^{n_\ell}) > N(I_K(X_{\mathcal{A}}; Y_K) + \lambda^{(\mathcal{A})})\right] \quad (39)
    + \sum_{\mathcal{A} \in \mathcal{P}([K])} \left(\prod_{k \in \mathcal{A}^{\mathrm{c}}} (M_k - 1)\right) \exp\{-\gamma + N I_K(X_{\mathcal{A}}; Y_K) + N \lambda^{(\mathcal{A})}\} \quad (40)

N \leq n_1 + \sum_{\ell=1}^{L-1} (n_{\ell+1} - n_\ell) \mathbb{P}\left[\imath_K(X_{[K]}^{n_\ell}; Y_K^{n_\ell}) < \gamma\right]. \quad (41)
Proof:

The proof of Theorem 4 uses a random coding argument that employs $K$ independent codebook ensembles, each with distribution $P_{X_k}^{n_L}$, $k \in [K]$. The receiver employs $L$ decoders that operate by comparing an information density $\imath_K(x_{[K]}^{n_\ell}; y^{n_\ell})$ for each possible transmitted codeword set to a threshold. At time $n_\ell$, decoder $\mathsf{g}_{n_\ell}$ computes the information densities $\imath_K(X_{[K]}^{n_\ell}(m_{[K]}); Y_K^{n_\ell})$; if there exists a unique message vector $\hat{m}_{[K]}$ satisfying $\imath_K(X_{[K]}^{n_\ell}(\hat{m}_{[K]}); Y_K^{n_\ell}) > \gamma$, then the receiver decodes to the message vector $\hat{m}_{[K]}$; if there exist multiple such message vectors, then the receiver stops the transmission and declares an error. If no such message vector exists at time $n_\ell$, then the receiver emits the output $\mathsf{e}$ and passes the decoding time $n_\ell$ without decoding if $n_\ell < n_L$, and it declares an error if $n_\ell = n_L$. The term (37) bounds the probability that the information density corresponding to the true messages is below the threshold at all decoding times; (38) bounds the probability that all messages are decoded incorrectly; and (39)–(40) bound the probability that the messages from the transmitter index set $\mathcal{A} \subseteq [K]$ are decoded incorrectly while the messages from the index set $\mathcal{A}^c$ are decoded correctly. The proof of Theorem 4 appears in Appendix D. ∎

Theorem 5, below, is a second-order achievability bound in the asymptotic regime $L = O(1)$ for any DM-MAC. It follows from an application of Theorem 4.

Theorem 5

Fix $\epsilon \in (0,1)$, an integer $L = O(1) \geq 2$, and distributions $P_{X_k}$, $k \in [K]$. For any $K$-transmitter DM-MAC $(\prod_{k=1}^K \mathcal{X}_k, P_{Y_K|X_{[K]}}, \mathcal{Y}_K)$, there exists a $K$-tuple $M_{[K]}$ and an $(N, L, M_{[K]}, \epsilon)$-VLSF code satisfying

\sum_{k \in [K]} \log M_k = \frac{N I_K}{1-\epsilon} - \sqrt{N \log_{(L-1)}(N) \frac{V_K}{1-\epsilon}} + O\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right). \quad (42)
Proof:

See Appendix D. ∎

In the application of Theorem 4 to prove Theorem 5, we choose the parameters $\lambda^{(\mathcal{A})}$ and $\gamma$ so that the terms in (39)–(40) decay exponentially with $N$ and thus become negligible compared to (37) and (38). Between (37) and (38), the term (37) is dominant when $L$ does not grow with $N$, and (38) is dominant when $L$ grows linearly with $N$.

Like the single-threshold rule from [28] for the RAC, the single-threshold rule employed in the proof of Theorem 4 differs from the decoding rules employed in [20] for VLSF codes over the Gaussian MAC with expected power constraints and in [23] for the DM-MAC. In both [20] and [23], $L = n_{\max} = \infty$, and the decoder employs $2^K - 1$ simultaneous threshold rules, one for each of the boundaries that define the achievable region of the MAC with $K$ transmitters. Those rules fix thresholds $\gamma^{(\mathcal{A})}$, $\mathcal{A} \in \mathcal{P}([K])$, and decode messages $m_{[K]}$ if for all $\mathcal{A} \in \mathcal{P}([K])$, the codeword for $m_{[K]}$ satisfies

\imath_K(X_{\mathcal{A}}^{n_\ell}(m_{\mathcal{A}}); Y_K^{n_\ell}|X_{\mathcal{A}^c}^{n_\ell}(m_{\mathcal{A}^c})) > \gamma^{(\mathcal{A})}, \quad (43)

for some $\gamma^{(\mathcal{A})}$, $\mathcal{A} \in \mathcal{P}([K])$. Our decoder can be viewed as a special case of (43) obtained by setting $\gamma^{(\mathcal{A})} = -\infty$ for $\mathcal{A} \neq [K]$.
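A schematic sketch (ours) of the simultaneous threshold rules in (43); `cond_info_density` is a hypothetical placeholder returning $\imath_K(X_{\mathcal{A}}^{n_\ell}(m_{\mathcal{A}}); Y_K^{n_\ell} | X_{\mathcal{A}^c}^{n_\ell}(m_{\mathcal{A}^c}))$ for a candidate message vector, and the full set $[K]$ is included among the checked subsets. Our single-threshold decoder corresponds to setting every threshold except the full-set one to $-\infty$.

```python
from itertools import combinations
import math

def passes_all_thresholds(K, msgs, y, gamma, cond_info_density):
    """Check rule (43) for every non-empty subset A of [K] (2^K - 1 rules)."""
    for r in range(1, K + 1):
        for A in combinations(range(1, K + 1), r):
            # thresholds absent from gamma default to -infinity (always satisfied)
            if cond_info_density(A, msgs, y) <= gamma.get(A, -math.inf):
                return False
    return True

# Single-threshold special case: only the full-set rule carries a finite threshold.
# gamma = {tuple(range(1, K + 1)): gamma_full}
```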

Analyzing Theorem 4 in the asymptotic regime $L = \Omega(N)$, we determine that there exists a $K$-tuple $M_{[K]}$ and an $(N, \infty, M_{[K]}, \epsilon)$-VLSF code satisfying

\sum_{k \in [K]} \log M_k = \frac{N I_K}{1-\epsilon} - \log N + O(1). \quad (44)

Both (42) and (44) are achieved at rate points that approach a point on the sum-rate boundary of the $K$-MAC achievable region expanded by a factor of $\frac{1}{1-\epsilon}$.

For any VLSF code, the $L = \infty$ case can be treated as $L = \Omega(N)$ regardless of the number of transmitters: if we truncate an infinite-length code at time $n_{\max} = 2N$, then, by the Chernoff bound, the resulting penalty term added to the error probability decays exponentially with $N$, and its effect in (44) is $o(1)$. See Appendix D.III for the proof of (44).

For $L = n_{\max} = \infty$, Trillingsgaard and Popovski [23] numerically evaluate their non-asymptotic achievability bound for a DM-MAC, while Truong and Tan [20] provide an achievability bound with second-order term $-O(\sqrt{N})$ for the Gaussian MAC with average power constraints. Applying our single-threshold rule and analysis to the Gaussian MAC with average power constraints improves the second-order term in [20] from $-O(\sqrt{N})$ to $-\log N + O(1)$ for all non-corner points in the achievable region. The main challenge in [20] is to derive a tight bound on the expected value of the maximum over $\mathcal{A} \subseteq [K]$ of the stopping times $\tau^{(\mathcal{A})}$ for the corresponding threshold rules in (43). In our analysis, we avoid that challenge by employing a single-threshold decoder whose average decoding time is bounded by $\mathbb{E}\left[\tau^{([K])}\right]$.

Under the same model and assumptions on LL, to achieve non-corner rate points that do not lie on the sum-rate boundary, we modify our single-threshold rule to (43), where 𝒜\mathcal{A} is the transmitter index set whose rate constraint is active at the (non-corner) point of interest. Following steps similar to the proof of (44) gives the second-order term logN+O(1)-\log N+O(1) for those points as well. For corner points, more than one boundary constraint is active (the capacity region of a KK-transmitter MAC is bounded by 2K12^{K}-1 planes, and, by definition of a corner point, at least two of the corresponding inequalities hold with equality there); therefore, more than one threshold rule in (43) is needed at the decoder. In this case, again for L=L=\infty, [20] proves an achievability bound with a second-order term O(N)-O(\sqrt{N}). Whether this bound can be improved to logN+O(1)-\log N+O(1) as in (44) remains an open problem.

V VLSF Codes for the DM-RAC with at Most KK Transmitters

Definition 3 (Yavas et al. [28, eq. (1)])

A permutation-invariant, reducible DM-RAC for the maximal number of transmitters K<K<\infty is defined by a family of DM-MACs {(𝒳k,PYk|X[k],𝒴k)}k=0K\left\{\left(\mathcal{X}^{k},P_{Y_{k}|X_{[k]}},\mathcal{Y}_{k}\right)\right\}_{k=0}^{K}, where the kk-th DM-MAC defines the channel for kk active transmitters.

By assumption, each of the DM-MACs satisfies the permutation-invariance condition

PYk|X[k](y|x[k])=PYk|X[k](y|xπ[k])\displaystyle P_{Y_{k}|X_{[k]}}(y|x_{[k]})=P_{Y_{k}|X_{[k]}}(y|x_{\pi[k]}) (45)

for all permutations π[k]\pi[k] of [k][k], and y𝒴ky\in\mathcal{Y}_{k}, and the reducibility condition

PYs|X[s](y|x[s])=PYk|X[k](y|x[s],0ks)\displaystyle P_{Y_{s}|X_{[s]}}(y|x_{[s]})=P_{Y_{k}|X_{[k]}}(y|x_{[s]},0^{k-s})\quad (46)

for all s<ks<k, x[s]𝒳[s]x_{[s]}\in{\mathcal{X}}_{[s]}, and y𝒴sy\in{\mathcal{Y}}_{s}, where 0𝒳0\in\mathcal{X} specifies a unique “silence” symbol that is transmitted when a transmitter is silent.

The permutation-invariance (45) and reducibility (46) conditions simplify the presentation and enable us to show, using a single-threshold rule at the decoder [28], that the symmetrical rate point (R,R,,R)(R,R,\dots,R) at which the code operates lies on the sum-rate boundary of each of the underlying DM-MACs.
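
For concreteness, the following Python sketch (our own illustration; the array representation of the channels and the assumption that 𝒴s\mathcal{Y}_{s} occupies the first output indices of 𝒴k\mathcal{Y}_{k} are ours) checks the permutation-invariance (45) and reducibility (46) conditions for channels given as conditional-probability arrays.

import numpy as np
from itertools import permutations

def is_permutation_invariant(Pk):
    # Condition (45): P_{Y_k|X_[k]} is unchanged under permuting the k input axes.
    k = Pk.ndim - 1
    return all(np.allclose(Pk, np.transpose(Pk, axes=perm + (k,)))
               for perm in permutations(range(k)))

def is_reducible(Ps, Pk):
    # Condition (46): silencing users s+1, ..., k (input symbol 0) recovers the
    # s-user channel; we assume Y_s occupies the first |Y_s| output indices of Y_k.
    s, k = Ps.ndim - 1, Pk.ndim - 1
    reduced = Pk[(slice(None),) * s + (0,) * (k - s)][..., :Ps.shape[-1]]
    return np.allclose(Ps, reduced)

# Tiny demo: a noiseless 2-user binary adder Y = X1 + X2 and the 1-user channel
# obtained by silencing user 2.
P2 = np.zeros((2, 2, 3))
for x1 in range(2):
    for x2 in range(2):
        P2[x1, x2, x1 + x2] = 1.0
P1 = P2[:, 0, :2]
print(is_permutation_invariant(P2), is_reducible(P1, P2))  # True True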

The VLSF RAC code defined here combines our rateless communication strategy from [28] with the sparse feedback VLSF PPC and MAC codes with optimized average decoding times described above. Specifically, the decoder estimates the value of kk at time n0n_{0}. If the estimate k^\hat{k} is not zero, it decodes at one of the LL decoding times nk^,1<nk^,2<<nk^,Ln_{\hat{k},1}<n_{\hat{k},2}<\dots<n_{\hat{k},L} (rather than just the single time nk^n_{\hat{k}} used in [28, 39]). For every k[K]k\in[K], the locations of the LL decoding times are optimized to attain the minimum average decoding delay. As in [28], we do not assume any probability distribution on the user activity pattern. We seek instead to optimize the rate-reliability trade-off simultaneously for all possible activity patterns. (By the permutation-invariance assumption, there are only KK distinguishable activity patterns to consider here indexed by the number of active transmitters.) If the decoder concludes that no transmitters are active, then it ends the transmission at time n0n_{0} decoding no messages. At each time ni,n_{i,\ell}, i<k^i<\hat{k}, the receiver broadcasts “0” to the transmitters, signaling that they should continue to transmit. At time nk^,n_{\hat{k},\ell}, the receiver broadcasts feedback bit “1” to the transmitters if it is able to decode k^\hat{k} messages; otherwise, it outputs an erasure symbol “𝖾\mathsf{e}” and sends feedback bit “0”, again signaling that decoding has not occurred and transmission should continue.

As in [28, 39], we assume that the transmitters know nothing about the set 𝒜\mathcal{A} except their own membership and the receiver’s feedback at potential decoding times. We employ identical encoding [44], that is, all transmitters use the same codebook. This implies that the RAC code operates at the symmetrical rate point, i.e., Mi=MM_{i}=M for i[K]i\in[K]. As in [44, 28], the decoder is required to decode the list of messages transmitted by the active transmitters but not the identities of these transmitters.

To deal with the scenario where the number of transmitters in the RAC grows linearly with the blocklength, i.e., K=Ω(N)K=\Omega(N), [44] employs the per-user error probability (PUPE) constraint rather than the joint error probability used here and in the analysis of the MAC (e.g., [20, 28, 39]). The PUPE is a weaker error probability constraint since, under PUPE, an error at one decoder does not count as an error at all other decoders. In [28], it is shown that when K=O(1)K=O(1), PUPE and joint error probability constraints have the same second-order performance for random access coding. As a result, there is no advantage to using PUPE rather than the more stringent joint error criterion when K=O(1)K=O(1). Therefore, we employ the joint error probability constraint throughout.

We formally define VLSF codes for the RAC as follows.

Definition 4

Fix ϵ0,,ϵK(0,1)\epsilon_{0},\dots,\epsilon_{K}\in(0,1) and N0,,NK(0,)N_{0},\dots,N_{K}\in(0,\infty). An ({Nk}k=0K,L,M,{ϵk}k=0K)(\{N_{k}\}_{k=0}^{K},L,M,\{\epsilon_{k}\}_{k=0}^{K})-VLSF code with identical encoders comprises

  1.

    a set of integers 𝒩{n0}{nk,:k[K],[L]}\mathcal{N}\triangleq\{n_{0}\}\cup\{n_{k,\ell}\colon k\in[K],\ell\in[L]\} (without loss of generality, we assume that nK,Ln_{K,L} is the largest available decoding time),

  2.

    a common randomness random variable UU on an alphabet 𝒰\mathcal{U},

  3.

    a sequence of encoding functions 𝖿n:𝒰×[M]𝒳\mathsf{f}_{n}\colon\mathcal{U}\times[M]\to\mathcal{X}, n=1,2,,nK,Ln=1,2,\ldots,n_{K,L}, defining MM length-nK,Ln_{K,L} codewords,

  4.

    K+1K+1 non-negative integer-valued random stopping times τk𝒩\tau_{k}\in\mathcal{N}, k{0}[K]k\in\{0\}\cup[K], for the filtration generated by {U,Ykn}n𝒩\{U,Y_{k}^{n}\}_{n\in\mathcal{N}}, satisfying

    𝔼[τk]Nk\displaystyle\mathbb{E}\left[\tau_{k}\right]\leq N_{k} (47)

    if k{0}[K]k\in\{0\}\cup[K] transmitters are active, and

  5.

    KL+1KL+1 decoding functions 𝗀n0:𝒰×𝒴0n0{}{𝖾}\mathsf{g}_{n_{0}}\colon\mathcal{U}\times\mathcal{Y}_{0}^{n_{0}}\to\{\emptyset\}\cup\{\mathsf{e}\} and 𝗀nk,:𝒰×𝒴knk,[M]k{𝖾}\mathsf{g}_{n_{k,\ell}}\colon\mathcal{U}\times\mathcal{Y}_{k}^{n_{k,\ell}}\to[M]^{k}\cup\{\mathsf{e}\}, k[K]k\in[K] and [L]\ell\in[L], satisfying an average error probability constraint

    [𝗀τk(U,Ykτk)πW[k]]ϵk\displaystyle\mathbb{P}\left[\mathsf{g}_{\tau_{k}}(U,Y_{k}^{\tau_{k}})\stackrel{{\scriptstyle\pi}}{{\neq}}W_{[k]}\right]\leq\epsilon_{k} (48)

    when k[K]k\in[K] messages W[k]=(W1,,Wk)W_{[k]}=(W_{1},\dots,W_{k}) are transmitted, where W1,,WkW_{1},\dots,W_{k} are independent and equiprobable on the set [M][M], and

    [𝗀τ0(U,Y0τ0)]ϵ0\displaystyle\mathbb{P}\left[\mathsf{g}_{\tau_{0}}(U,Y_{0}^{\tau_{0}})\neq\emptyset\right]\leq\epsilon_{0} (49)

    when no transmitters are active.

To guarantee that the symmetrical rate point arising from identical encoding lies on the sum-rate boundary for all k[K]k\in[K], following [28], we assume that there exists an input distribution PXP_{X} that satisfies the interference assumptions

PX[t]|YkPX[s]|YkPX[s+1:t]|Yks<tkK.\displaystyle P_{X_{[t]}|Y_{k}}\neq P_{X_{[s]}|Y_{k}}\,P_{X_{[s+1:t]}|Y_{k}}\quad\forall\,s<t\leq k\leq K. (50)

Permutation-invariance (45), reducibility (46), and interference (50) together imply that the mutual information per transmitter, Ikk\frac{I_{k}}{k}, strictly decreases with increasing kk (see [28, Lemma 1]). This property guarantees the existence of decoding times satisfying nk1,1<nk2,2n_{k_{1},\ell_{1}}<n_{k_{2},\ell_{2}} for any k1<k2k_{1}<k_{2} and 1,2[L]\ell_{1},\ell_{2}\in[L].

In order to be able to detect the number of active transmitters using the received symbols Ynk,Y^{n_{k,\ell}} but not the codewords themselves, we require that the input distribution PXP_{X} satisfies the distinguishability assumption

PYk1PYk2k1k2{0}[K],\displaystyle P_{Y_{k_{1}}}\neq P_{Y_{k_{2}}}\quad\forall\,k_{1}\neq k_{2}\in\{0\}\cup[K], (51)

where PYkP_{Y_{k}} is the marginal output distribution under the kk-transmitter DM-MAC with input distribution PX[k]=(PX)kP_{X_{[k]}}=(P_{X})^{k}.

An example of a permutation-invariant and reducible DM-RAC that satisfies the interference (50) and distinguishability (51) assumptions is the adder-erasure RAC in [45, 28]:

Yk={i=1kXi,w.p. 1δ𝖾w.p. δ,\displaystyle Y_{k}=\begin{cases}\sum_{i=1}^{k}X_{i},&\text{w.p. }1-\delta\\ \mathsf{e}&\text{w.p. }\delta,\end{cases} (52)

where Xi{0,1}X_{i}\in\{0,1\}, Yk{0,,k}{𝖾}Y_{k}\in\left\{0,\ldots,k\right\}\cup\{\mathsf{e}\}, and δ(0,1)\delta\in(0,1).
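
A minimal Python simulation of the channel law (52), included only as an illustration, is given below; the symbol 'e' stands for the erasure output.

import random

def adder_erasure_rac(inputs, delta, rng=random):
    # Channel law (52): output the sum of the k binary inputs w.p. 1 - delta,
    # and the erasure symbol 'e' w.p. delta.
    return 'e' if rng.random() < delta else sum(inputs)

random.seed(0)
k, delta = 3, 0.1
outputs = [adder_erasure_rac([random.randint(0, 1) for _ in range(k)], delta)
           for _ in range(10)]
print(outputs)  # sums in {0, ..., k}, with occasional erasures 'e'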

Theorem 6

Fix ϵ(0,1)\epsilon\in(0,1), finite integers K1K\geq 1 and L2L\geq 2, and a distribution PXP_{X} satisfying (50)–(51). For any permutation-invariant (45) and reducible (46) DM-RAC {(𝒳k,PYk|X[k],𝒴k)}k=0K\left\{(\mathcal{X}^{k},P_{Y_{k}|X_{[k]}},\mathcal{Y}_{k})\right\}_{k=0}^{K}, there exists an ({Nk}k=0K,L,M,{ϵk}k=0K)(\{N_{k}\}_{k=0}^{K},L,M,\{\epsilon_{k}\}_{k=0}^{K})-VLSF code satisfying

klogM\displaystyle k\log M =NkIk1ϵkNklog(L1)(Nk)Vk1ϵk\displaystyle={\frac{N_{k}I_{k}}{1-\epsilon_{k}}}-\sqrt{N_{k}\log_{(L-1)}(N_{k})\frac{V_{k}}{1-\epsilon_{k}}}
+O(Nklog(L)(Nk))\displaystyle\quad+O\left(\sqrt{\frac{N_{k}}{\log_{(L)}(N_{k})}}\right) (53)

for k[K]k\in[K], and

N0=clogN1+o(logN1)\displaystyle N_{0}=c\log N_{1}+o(\log N_{1}) (54)

for some c>0c>0.

Proof:

The coding strategy to prove Theorem 6 is as follows. The decoder applies a (K+1)(K+1)-ary hypothesis test using the output sequence Yn0Y^{n_{0}} and forms an estimate k^\hat{k} of the number of active transmitters k{0,1,,K}k\in\{0,1,\dots,K\}. If the hypothesis test declares that k^=0\hat{k}=0, then the receiver stops the transmission at time n0n_{0}, decoding no messages. If k^0\hat{k}\neq 0, then the receiver decodes k^\hat{k} messages at one of the times nk^,1,,nk^,Ln_{\hat{k},1},\dots,n_{\hat{k},L} using the VLSF code in Theorem 5 for the k^\hat{k}-transmitter DM-MAC with LL decoding times. If the receiver decodes at time nk^,n_{\hat{k},\ell}, then it sends feedback bit ‘0’ at all previous decoding times {n𝒩:n<nk^,}\{n\in\mathcal{N}\colon n<n_{\hat{k},\ell}\} and feedback bit ‘1’ at time nk^,n_{\hat{k},\ell}. Note that, alternatively, the receiver can send its estimate k^\hat{k} using log2(K+1)\lceil\log_{2}(K+1)\rceil bits at time n0n_{0}, informing the transmitters that it will decode at one of the times nk^,1,,nk^,Ln_{\hat{k},1},\dots,n_{\hat{k},L}; in this case, the total number of feedback bits decreases from the worst-case KL+1KL+1 of the strategy described above to at most log2(K+1)+L\lceil\log_{2}(K+1)\rceil+L. The details of the proof appear in Appendix E. ∎
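
To illustrate the activity-detection step, the following Python sketch implements one natural instantiation of the (K+1)(K+1)-ary test: a maximum-likelihood comparison of the candidate output marginals PYkP_{Y_{k}} over the prefix Yn0Y^{n_{0}}. The actual test used in the proof and its error analysis are given in Appendix E; the distributions in the toy usage below are hypothetical.

import math

def estimate_active_transmitters(y_prefix, output_pmfs):
    # output_pmfs[k] maps output symbols to P_{Y_k}(y); return the maximum-
    # likelihood estimate of the number of active transmitters k.
    def log_lik(pmf):
        return sum(math.log(pmf[y]) if pmf.get(y, 0.0) > 0 else -math.inf
                   for y in y_prefix)
    return max(range(len(output_pmfs)), key=lambda k: log_lik(output_pmfs[k]))

# Toy usage with hypothetical binary-output marginals for K = 2:
pmfs = [{0: 0.9, 1: 0.1}, {0: 0.6, 1: 0.4}, {0: 0.3, 1: 0.7}]
print(estimate_active_transmitters([1, 1, 0, 1], pmfs))  # prints 2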

VI VLSF Codes for the Gaussian PPC with Maximal Power Constraints

VI-A Gaussian PPC

The output of a memoryless, Gaussian PPC of blocklength nn in response to the input XnnX^{n}\in\mathbb{R}^{n} is

Yn=Xn+Zn,\displaystyle Y^{n}=X^{n}+Z^{n}, (55)

where Z1,,ZnZ_{1},\ldots,Z_{n} are drawn i.i.d. from 𝒩(0,1)\mathcal{N}(0,1), independent of XnX^{n}.

The channel’s capacity C(P)C(P) and dispersion V(P)V(P) are

C(P)\displaystyle C(P) =12log(1+P)\displaystyle=\frac{1}{2}\log(1+P) (56)
V(P)\displaystyle V(P) =P(P+2)2(1+P)2.\displaystyle=\frac{P(P+2)}{2(1+P)^{2}}. (57)
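
For quick numerical reference, the capacity (56) and dispersion (57) can be evaluated as follows (a small Python helper of ours, with rates in nats).

import math

def gaussian_capacity(P):
    return 0.5 * math.log(1.0 + P)                   # (56), in nats

def gaussian_dispersion(P):
    return P * (P + 2.0) / (2.0 * (1.0 + P) ** 2)    # (57)

print(gaussian_capacity(1.0), gaussian_dispersion(1.0))  # approx 0.3466 and 0.375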

VI-B Related Work on the Gaussian PPC

We first introduce the maximal and average power constraints on VLSF codes for the PPC. Given a VLSF code with LL decoding times n1,,nLn_{1},\dots,n_{L}, the maximal power constraint requires that the length-nn prefixes, n{n1,,nL}n\in\{n_{1},\dots,n_{L}\}, of each codeword all satisfy a power constraint PP, i.e.,

𝖿(u,m)n2nP for all m[M],u𝒰,[L].\displaystyle\left\lVert\mathsf{f}(u,m)^{n_{\ell}}\right\rVert^{2}\leq n_{\ell}P\,\text{ for all }m\in[M],u\in\mathcal{U},\quad\ell\in[L]. (58)

The average power constraint on the length-nLn_{L} codewords, as defined by [20, Def. 1], is

𝔼[𝖿(U,W)nL2]\displaystyle\mathbb{E}\left[\left\lVert\mathsf{f}(U,W)^{n_{L}}\right\rVert^{2}\right] NP.\displaystyle\leq NP. (59)

The definitions of (N,L,M,ϵ,P)max(N,L,M,\epsilon,P)_{\mathrm{max}} and (N,L,M,ϵ,P)ave(N,L,M,\epsilon,P)_{\mathrm{ave}}-VLSF codes for the Gaussian PPC are similar to Definition 1 with the addition of maximal (58) and average (59) power constraints, respectively. Similar to (13), M(N,L,ϵ,P)maxM^{*}(N,L,\epsilon,P)_{\mathrm{max}} (resp. M(N,L,ϵ,P)aveM^{*}(N,L,\epsilon,P)_{\mathrm{ave}}) denotes the maximum achievable message set size with LL decoding times, average decoding time NN, average error probability ϵ\epsilon, and maximal (resp. average) power constraint PP.

In the following, we discuss prior asymptotic expansions of M(N,L,ϵ,P)maxM^{*}(N,L,\epsilon,P)_{\mathrm{max}} and M(N,L,ϵ,P)aveM^{*}(N,L,\epsilon,P)_{\mathrm{ave}} for the Gaussian PPC, where L{1,}L\in\{1,\infty\}.

  a)

    M(N,1,ϵ,P)maxM^{*}(N,1,\epsilon,P)_{\mathrm{max}}: For L=1L=1, P>0P>0, and ϵ(0,1)\epsilon\in(0,1), Tan and Tomamichel [35, Th. 1] and Polyanskiy et al. [13, Th. 54] show that

    logM(N,1,ϵ,P)max\displaystyle\log M^{*}(N,1,\epsilon,P)_{\mathrm{max}}
    =NC(P)NV(P)Q1(ϵ)+12logN+O(1).\displaystyle=NC(P)-\sqrt{NV(P)}Q^{-1}(\epsilon)+\frac{1}{2}\log N+O(1). (60)

    The converse for (60) is derived in [13, Th. 54] and the achievability for (60) in [35, Th. 1]. The achievability scheme in [35, Th. 1] generates i.i.d. codewords uniformly distributed on the nn-dimensional sphere with radius nP\sqrt{nP}, and applies maximum likelihood (ML) decoding. These results imply that random codewords uniformly distributed on a sphere and ML decoding are, together, third-order optimal, meaning that the gap between the achievability and converse bounds in (60) is O(1)O(1).

  b)

    M(N,1,ϵ,P)aveM^{*}(N,1,\epsilon,P)_{\mathrm{ave}}: For L=1L=1 with an average-power-constraint, Yang et al. show in [46] that

    logM(N,1,ϵ,P)ave=NC(P1ϵ)\displaystyle\log M^{*}(N,1,\epsilon,P)_{\mathrm{ave}}=N\,C\left(\frac{P}{1-\epsilon}\right)
    NlogNV(P1ϵ)+O(N).\displaystyle\quad-\sqrt{N\log N\,V\left(\frac{P}{1-\epsilon}\right)}+O(\sqrt{N}). (61)

    Yang et al. use a power control argument to show the achievability of (61). They divide the messages into disjoint sets 𝒜\mathcal{A} and [M]𝒜[M]\setminus\mathcal{A}, where |𝒜|=M(1ϵ)(1o(1))|\mathcal{A}|=M(1-\epsilon)(1-o(1)). For the messages in 𝒜\mathcal{A}, they use an (N,1,|𝒜|,2NlogN,P1ϵ(1o(1)))\left(N,1,|\mathcal{A}|,\frac{2}{\sqrt{N\log N}},\frac{P}{1-\epsilon}(1-o(1))\right)-VLSF code with a single decoding time NN. The codewords are generated i.i.d. uniformly on the sphere with center at 0 and radius NP1ϵ(1o(1))\sqrt{N\frac{P}{1-\epsilon}(1-o(1))}. The messages in [M]𝒜[M]\setminus\mathcal{A} are assigned the all-zero codeword. The converse for (61) follows from an application of the meta-converse [13, Th. 26].

  c)

    M(N,,ϵ,P)aveM^{*}(N,\infty,\epsilon,P)_{\mathrm{ave}}: For VLSF codes with L=nmax=L=n_{\max}=\infty and average power constraint (59), Truong and Tan show in [19, Th. 1] that for ϵ(0,1)\epsilon\in(0,1) and P>0P>0,

    logM(N,,ϵ,P)ave\displaystyle\log M^{*}(N,\infty,\epsilon,P)_{\mathrm{ave}} NC(P)1ϵlogN+O(1)\displaystyle\geq\frac{NC(P)}{1-\epsilon}-\log N+O(1) (62)
    logM(N,,ϵ,P)ave\displaystyle\log M^{*}(N,\infty,\epsilon,P)_{\mathrm{ave}} NC(P)1ϵ+hb(ϵ)1ϵ,\displaystyle\leq\frac{NC(P)}{1-\epsilon}+\frac{h_{b}(\epsilon)}{1-\epsilon}, (63)

    where hbh_{b} is the binary entropy function. The results in (62)–(63) are analogous to the fundamental limits for DM-PPCs (15)–(16) and follow from arguments similar to those in [8]. Since the information density ı(X;Y)\imath(X;Y) for the Gaussian channel is unbounded, bounding the expected value of the decoding time in the proof of [19, Th. 1] requires different techniques from those applicable to DM-PPCs [8].

TABLE II: The performance of VLSF codes for the Gaussian channel in scenarios distinguished by the number of available decoding times LL, the type of the power constraint, and the presence of feedback.
Scenario | Power constraint | First-order term | Second-order term (lower bound) | Second-order term (upper bound)
Fixed-length (L=1)(L=1), no feedback | Max. power | NC(P)NC(P) | NV(P)Q1(ϵ)-\sqrt{NV(P)}Q^{-1}(\epsilon) ([35, 13]) | NV(P)Q1(ϵ)-\sqrt{NV(P)}Q^{-1}(\epsilon) ([13])
Fixed-length (L=1)(L=1), no feedback | Ave. power | NC(P1ϵ)NC\left(\frac{P}{1-\epsilon}\right) | NlogNV(P1ϵ)-\sqrt{N\log NV\left(\frac{P}{1-\epsilon}\right)} ([46]) | NlogNV(P1ϵ)-\sqrt{N\log NV\left(\frac{P}{1-\epsilon}\right)} ([46])
Fixed-length (L=1)(L=1), feedback | Max. power | NC(P)NC(P) | NV(P)Q1(ϵ)-\sqrt{NV(P)}Q^{-1}(\epsilon) ([35, 13]) | NV(P)Q1(ϵ)-\sqrt{NV(P)}Q^{-1}(\epsilon) ([14])
Fixed-length (L=1)(L=1), feedback | Ave. power | NC(P1ϵ)NC\left(\frac{P}{1-\epsilon}\right) | O(log(L)(N))-O(\log_{(L)}(N)) ([47]) | +NlogNV(P1ϵ)+\sqrt{N\log NV\left(\frac{P}{1-\epsilon}\right)} ([47])
Variable-length (L<)(L<\infty) | Max. power | NC(P)1ϵ\frac{NC(P)}{1-\epsilon} | Nlog(L1)(N)V(P)1ϵ-\sqrt{N\log_{(L-1)}(N)\frac{V(P)}{1-\epsilon}} (Theorem 7) | +O(1)+O(1) ([19])
Variable-length (L<)(L<\infty) | Ave. power | NC(P)1ϵ\frac{NC(P)}{1-\epsilon} | Nlog(L1)(N)V(P)1ϵ-\sqrt{N\log_{(L-1)}(N)\frac{V(P)}{1-\epsilon}} (Theorem 7) | +O(1)+O(1) ([19])
Variable-length (L=nmax=)(L=n_{\max}=\infty) | Max. power | NC(P)1ϵ\frac{NC(P)}{1-\epsilon} | O(N)-O(\sqrt{N}) ([1]) | +O(1)+O(1) ([19])
Variable-length (L=nmax=)(L=n_{\max}=\infty) | Ave. power | NC(P)1ϵ\frac{NC(P)}{1-\epsilon} | logN-\log N ([19]) | +O(1)+O(1) ([19])

Table II combines the L=1L=1 summary from [47, Table I] with the corresponding results for L>1L>1 to summarize the performance of VLSF codes for the Gaussian channel in different communication scenarios.

VI-C Main Result

The theorem below is our main result for the Gaussian PPC under the maximal power constraint (58).

Theorem 7

Fix an integer L=O(1)2L=O(1)\geq 2 and real numbers P>0P>0 and ϵ(0,1)\epsilon\in(0,1). For the Gaussian channel with maximal power constraint (58), the maximum message set size achievable by (N,L,M,ϵ,P)(N,L,M,\epsilon,P)-VLSF codes satisfies

logM(N,L,ϵ,P)max\displaystyle\log M^{*}\left(N,L,\epsilon,P\right)_{\max} NC(P)1ϵNlog(L1)(N)V(P)1ϵ\displaystyle\geq{\frac{NC(P)}{1-\epsilon}}-\sqrt{N\log_{(L-1)}(N)\frac{V(P)}{1-\epsilon}}
+O(Nlog(L1)(N)).\displaystyle+O\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right). (64)

The decoding times that achieve (64) satisfy the equations

logM(N,L,ϵ,P)\displaystyle\log M^{*}\left(N,L,\epsilon,P\right)
=nC(P)nlog(L+1)(n)V(P)logn+O(1)\displaystyle=n_{\ell}C(P)-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V(P)}-\log{n_{\ell}}+O(1) (65)

for {2,,L}\ell\in\{2,\dots,L\}, and n1=0n_{1}=0.

Proof:

See Appendix F. ∎

Note that the achievability bound in Theorem 7 has the same form as the one in Theorem 2 with CC and VV replaced with the Gaussian capacity C(P)C(P) and the Gaussian dispersion V(P)V(P), respectively. The bound in (64) holds for the average power constraint as well since any code that satisfies the maximal power constraint also satisfies the average power constraint.

From Shannon’s work in [48], it is known that for the Gaussian channel with a maximal power constraint, drawing i.i.d. Gaussian codewords yields a performance inferior to that achieved by the uniform distribution on the power sphere. As a result, almost all tight achievability bounds for the Gaussian channel in the fixed-length regime under a variety of settings (e.g., all four combinations of the maximal/average power constraint and feedback/no feedback [35, 46, 14, 47] in Table I) employ random codewords drawn uniformly at random on the power sphere. A notable exception is Truong and Tan’s result in (62) [19, Th. 1], which considers VLSF codes with an average power constraint; that result employs i.i.d. Gaussian inputs. The Gaussian distribution works in this scenario because when L=L=\infty, the usually dominant term [ı(XnL;YnL)<γ]\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right] in (18) disappears. The second term (M1)exp{γ}(M-1)\exp\{-\gamma\} in (18) is not affected by the input distribution. Unfortunately, the approach from [19, Th. 1] does not work here since drawing codewords i.i.d. 𝒩(0,P)\mathcal{N}(0,P) satisfies the average power constraint (59) but not the maximal power constraint (58). When L=O(1)L=O(1) and the probability [ı(XnL;YnL)<γ]\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right] dominates, using i.i.d. 𝒩(0,P)\mathcal{N}(0,P) inputs achieves a worse second-order term in the asymptotic expansion (64) of the maximum achievable message set size. For the case L=O(1)L=O(1), we draw codewords according to the rule that the sub-codewords indexed from nj1+1n_{j-1}+1 to njn_{j} are drawn uniformly on the (njnj1)(n_{j}-n_{j-1})-dimensional sphere of radius (njnj1)P\sqrt{(n_{j}-n_{j-1})P} for j[L]j\in[L], independently of each other. Note that this input distribution is dispersion-achieving for the fixed-length no-feedback case, i.e., L=1L=1 [13] and is superior to choosing codewords i.i.d. 𝒩(0,P)\mathcal{N}(0,P), even under the average power constraint. In particular, i.i.d. 𝒩(0,P)\mathcal{N}(0,P) inputs achieve (21), where the dispersion V(P)V(P) is replaced by the variance V~(P)=P1+P\tilde{V}(P)=\frac{P}{1+P} of ı(X;Y)\imath(X;Y) when X𝒩(0,P)X\sim\mathcal{N}(0,P); here V~(P)\tilde{V}(P) is greater than the dispersion V(P)V(P) for all P>0P>0 (see [49, eq. (2.56)]). Whether or not our input distribution is optimal in the second-order term remains an open question.
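
The following Python sketch (our own illustration; the decoding times and the power value are hypothetical) draws a codeword according to this rule and verifies that every prefix ending at a decoding time meets the maximal power constraint (58).

import numpy as np

def draw_codeword(decoding_times, P, rng):
    # Sub-codeword j occupies positions n_{j-1}+1, ..., n_j and is drawn uniformly
    # on the sphere of radius sqrt((n_j - n_{j-1}) P), independently across j.
    blocks, prev = [], 0
    for n in decoding_times:
        d = n - prev
        g = rng.standard_normal(d)
        blocks.append(np.sqrt(d * P) * g / np.linalg.norm(g))
        prev = n
    return np.concatenate(blocks)

rng = np.random.default_rng(0)
x = draw_codeword([100, 250, 500], P=1.0, rng=rng)
# Every prefix ending at a decoding time then meets (58) with equality:
print([float(np.linalg.norm(x[:n]) ** 2 / n) for n in [100, 250, 500]])  # each is P = 1.0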

VII Conclusions

This paper investigates the maximum achievable message set size for sparse VLSF codes over the DM-PPC (Theorem 2), DM-MAC (Theorem 5), DM-RAC (Theorem 6), and Gaussian PPC (Theorem 7) in the asymptotic regime where the number of decoding times LL is constant as the average decoding time NN grows without bound. Under our second-order achievability bounds, the performance improvement due to adding more decoding time opportunities to our code quickly diminishes as LL increases. For example, for the BSC with crossover probability 0.11, at average decoding time N=2000N=2000, our VLSF coding bound with only L=4L=4 decoding times achieves 96% of the rate of Polyanskiy et al.’s VLSF coding bound for L=L=\infty. Incremental redundancy automatic repeat request codes, which are some of the most common feedback codes, employ only a small number of decoding times and stop feedback. Our analysis shows that such a code design is not only practical but also has performance competitive with the best known dense feedback codes.

In all channel types considered, the first-order term in our achievability bounds is NC1ϵ\frac{NC}{1-\epsilon}, where NN is the average decoding time, ϵ\epsilon is the error probability, and CC is the capacity (or the sum-rate capacity in the multi-transmitter case), and the second-order term is O(Nlog(L1)(N))O\left(\sqrt{N\log_{(L-1)}(N)}\right). For DM-PPCs, there is a mismatch between the second-order term of our achievability bound for VLSF codes with L=O(1)L=O(1) decoding times (Theorem 2) and the second-order term of the best known converse bound (16); the latter applies to L=L=\infty, and therefore to any LL. Towards closing the gap between the achievability and converse bounds, in Theorem 9 in Appendix C, below, we derive a non-asymptotic converse bound that links the error probability of a VLSF code with the minimum achievable type-II error probability of an SHT. However, since the threshold values of the optimal SHT with LL decoding times do not have a closed-form expression [36, pp. 153-154], analyzing the non-asymptotic converse bound in Theorem 9 is a difficult task. Whether the second-order term in Theorem 2 is optimal is a question left to future work.

In sparse VLSF codes, optimizing the values of the LL available decoding times is important: to achieve the same performance as L=O(1)L=O(1) optimized decoding times (Theorem 2), one needs Ω(Nlog(L1)(N))\Omega\left(\sqrt{\frac{N}{\log_{(L-1)}(N)}}\right) uniformly spaced decoding times (Theorem 3).

Appendix A Proof of Theorem 1

In this section, we derive an achievability bound based on a general SHT, which we use to prove Theorem 1.

A.I A General SHT-based Achievability Bound

A.I1 SHT definitions

We begin by formally defining an SHT. We extend the definition in [36, Ch. 3] to allow non-i.i.d. distributions and finitely many testing times. Let {Zi}i=1nL\{Z_{i}\}_{i=1}^{n_{L}} be the observed sequence. Consider two hypotheses for the distribution of ZnLZ^{n_{L}}

H0\displaystyle H_{0} :ZnLP0\displaystyle\colon Z^{n_{L}}\sim P_{0} (66)
H1\displaystyle H_{1} :ZnLP1,\displaystyle\colon Z^{n_{L}}\sim P_{1}, (67)

where P0P_{0} and P1P_{1} are distributions on a common alphabet 𝒵nL\mathcal{Z}^{n_{L}}. Let 𝒩{0,1,2,,nL}\mathcal{N}\subseteq\{0,1,2,\dots,n_{L}\} be the set of times that the hypothesis is tested. Let Pi(n)P_{i}^{(n_{\ell})} denote the marginal distribution of the first nn_{\ell} symbols in PiP_{i}, i{0,1}i\in\{0,1\}. At time n𝒩n_{\ell}\in\mathcal{N}, we either decide H0:ZnP0(n)H_{0}\colon Z^{n_{\ell}}\sim P_{0}^{(n_{\ell})}, H1:ZnP1(n)H_{1}\colon Z^{n_{\ell}}\sim P_{1}^{(n_{\ell})}, or we wait until the next available time n+1n_{\ell+1} in 𝒩\mathcal{N}. Let τ\tau be a stopping time adapted to the filtration {(Xn)}n𝒩\{\mathcal{F}(X^{n})\}_{n\in\mathcal{N}}. Let δ\delta be a {0,1}\{0,1\}-valued, (τ)\mathcal{F}(\tau)-measurable function. An SHT is a triple (δ,τ,𝒩)(\delta,\tau,\mathcal{N}), where δ\delta is called the decision rule, τ\tau is called the stopping time, and 𝒩\mathcal{N} is the set of available decision times. Type-I and type-II error probabilities are defined as

α\displaystyle\alpha [δ=1|H0]\displaystyle\triangleq\mathbb{P}\left[\delta=1|H_{0}\right] (68)
β\displaystyle\beta [δ=0|H1].\displaystyle\triangleq\mathbb{P}\left[\delta=0|H_{1}\right]. (69)

Below, we derive an achievability bound using a general SHT.

A.I2 Achievability Bound

Given some input distribution PXnLP_{X^{n_{L}}}, define the common randomness random variable UU on MnL\mathbb{R}^{Mn_{L}} with the distribution

PU=PXnL×PXnL××PXnLM times.\displaystyle P_{U}=\underbrace{P_{X^{n_{L}}}\times P_{X^{n_{L}}}\times\cdots\times P_{X^{n_{L}}}}_{M\text{ times}}. (70)

The realization of UU defines MM length-nLn_{L} codewords XnL(1),XnL(2),,XnL(M)X^{n_{L}}(1),X^{n_{L}}(2),\dots,X^{n_{L}}(M). Denote the set of available decoding times by

𝒩{n1,,nL}.\displaystyle\mathcal{N}\triangleq\{n_{1},\dots,n_{L}\}. (71)

Let {(δm,τ~m,𝒩)}m=1M\{(\delta_{m},\tilde{\tau}_{m},\mathcal{N})\}_{m=1}^{M} be MM copies of an SHT that distinguishes between the hypotheses

H0\displaystyle H_{0} :(XnL,YnL)PXnL×PY|XnL\displaystyle\colon(X^{n_{L}},Y^{n_{L}})\sim P_{X^{n_{L}}}\times P_{Y|X}^{n_{L}} (72)
H1\displaystyle H_{1} :(XnL,YnL)PXnL×PYnL\displaystyle\colon(X^{n_{L}},Y^{n_{L}})\sim P_{X^{n_{L}}}\times P_{{Y}^{n_{L}}} (73)

for each message m[M]m\in[M], where the type-I and type-II error probabilities are α\alpha and β\beta, respectively. Define for m[M]m\in[M] and j{0,1}j\in\{0,1\},

τmj{τ~mif δm=jotherwise.\displaystyle\tau_{m}^{j}\triangleq\begin{cases}\tilde{\tau}_{m}&\text{if }\delta_{m}=j\\ \infty&\text{otherwise}.\end{cases} (74)

Theorem 8, below, is an achievability bound that employs an arbitrary SHT with LL decoding times.

Theorem 8

Fix LL\leq\infty, integers M>0M>0 and 0n1<n2<<nL0\leq n_{1}<n_{2}<\dots<n_{L}\leq\infty, a distribution PXnLP_{X^{n_{L}}} as in (70), and MM copies of an SHT {(δm,τ~m,{n1,nL})}m=1M\{(\delta_{m},\tilde{\tau}_{m},\{n_{1},\dots n_{L}\})\}_{m=1}^{M} as in (72)–(74). There exists an (N,L,M,ϵ)(N,L,M,\epsilon)-VLSF code for the DM-PPC (𝒳,PY|X,𝒴)(\mathcal{X},P_{Y|X},\mathcal{Y}) with

ϵ\displaystyle\epsilon α+(M1)β\displaystyle\leq\alpha+(M-1)\beta (75)
N\displaystyle N 𝔼[min{minm[M]{τm0},maxm[M]{τm1}}].\displaystyle\leq\mathbb{E}\left[\min\left\{\min\limits_{m\in[M]}\left\{\tau_{m}^{0}\right\},\max\limits_{m\in[M]}\left\{\tau_{m}^{1}\right\}\right\}\right]. (76)
Proof:

We generate MM i.i.d. codewords according to (70). For each of MM messages, we run the hypothesis test given in (72)–(73). We decode at the earliest time that one of the following events happens

  • H0H_{0} is declared for some message m[M]m\in[M],

  • H1H_{1} is declared for all m[M]m\in[M].

The decoding output is mm if H0H_{0} is declared for mm; if more than one such mm exists or if no such mm exists, the decoder declares an error.

Mathematically, the random decoding time of this code is expressed as

τ=min{minm[M]{τm0},maxm[M]{τm1}}.\displaystyle\tau^{*}=\min\left\{\min\limits_{m\in[M]}\left\{\tau_{m}^{0}\right\},\max\limits_{m\in[M]}\left\{\tau_{m}^{1}\right\}\right\}. (77)

Note that τ\tau^{*} is bounded by nLn_{L} by construction. The average decoding time bound in (76) immediately follows from (77). The decoder output is

W^{mif !m[M] s. t. τ=τm0errorotherwise.\displaystyle\hat{W}\triangleq\begin{cases}m&\text{if }\exists!\,m\in[M]\text{ s. t. }\tau^{*}=\tau_{m}^{0}\\ \text{error}&\text{otherwise.}\end{cases} (78)

Since the messages are equiprobable, without loss of generality, assume that message m=1m=1 is transmitted. An error occurs if and only if H1H_{1} is decided for m=1m=1 or if H0H_{0} is decided for some m1m\neq 1, giving

ϵ=[{δ1=1}{m=2M{δm=0}}].\displaystyle\epsilon=\mathbb{P}\left[\{\delta_{1}=1\}\cup\left\{\bigcup_{m=2}^{M}\{\delta_{m}=0\}\right\}\right]. (79)

Applying the union bound to (79) shows (75). ∎

A.II Proof of Theorem 1

Theorem 1 particularizes the SHT in Theorem 8 to an information density threshold rule.

In addition to the random code design in (70), let PXnLP_{X^{n_{L}}} satisfy (20). We here specify the stopping rule τm\tau_{m} and the decision rule δm\delta_{m} for the SHT in (72)–(73).

Define the information density for message mm and decoding time nn_{\ell} as

Sm,nı(Xn(m);Yn) for m[M],[L].\displaystyle S_{m,n_{\ell}}\triangleq\imath(X^{n_{\ell}}(m);Y^{n_{\ell}})\text{ for }m\in[M],\ell\in[L]. (80)

Note that Sm,nS_{m,n_{\ell}} is the log-likelihood ratio between the distributions in hypotheses H0H_{0} and H1H_{1}. We fix a threshold γ\gamma\in\mathbb{R} and construct the SHTs

τm\displaystyle\tau_{m} =inf{n𝒩:Sm,nγ}\displaystyle=\inf\{n_{\ell}\in\mathcal{N}\colon S_{m,n_{\ell}}\geq\gamma\} (81)
τ~m\displaystyle\tilde{\tau}_{m} =min{τm,nL}\displaystyle=\min\{\tau_{m},n_{L}\} (82)
δm\displaystyle\delta_{m} ={0if Sm,τ~mγ1if Sm,τ~m<γ\displaystyle=\begin{cases}0&\text{if }S_{m,\tilde{\tau}_{m}}\geq\gamma\\ 1&\text{if }S_{m,\tilde{\tau}_{m}}<\gamma\end{cases} (83)

for all m[M]m\in[M], that is, we decide H0H_{0} for message mm at the first time nn_{\ell} that Sm,nS_{m,n_{\ell}} passes γ\gamma; if this never happens for n{n1,,nL}n_{\ell}\in\{n_{1},\dots,n_{L}\}, then we decide H1H_{1} for mm. Without loss of generality, assume that message 1 is transmitted.
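
The following Python sketch (a schematic illustration of ours, operating on precomputed information-density values rather than on the codewords themselves) mirrors the stopping and decision rules (81)–(83) and the resulting decoder (77)–(78).

def sht_decision(info_densities, gamma):
    # Rules (81)-(83): stop at the first decoding time whose accumulated
    # information density reaches gamma; otherwise decide H_1 (index None).
    for ell, s in enumerate(info_densities):
        if s >= gamma:
            return 0, ell
    return 1, None

def decode(all_info_densities, gamma):
    # Rules (77)-(78): decode the unique message that crosses the threshold at
    # the earliest decoding time; ties or no crossing are declared errors.
    stops = [sht_decision(s, gamma) for s in all_info_densities]
    crossed = [(ell, m) for m, (d, ell) in enumerate(stops) if d == 0]
    if not crossed:
        return 'error'
    first = min(ell for ell, _ in crossed)
    winners = [m for ell, m in crossed if ell == first]
    return winners[0] if len(winners) == 1 else 'error'

# Toy usage: three messages, L = 3 decoding times, threshold gamma = 5.
print(decode([[2.0, 4.5, 6.0], [1.0, 2.0, 3.0], [0.5, 5.5, 7.0]], gamma=5.0))  # prints 2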

Bounding (76) from above, we get

N\displaystyle N 𝔼[min{τ1,nL}]\displaystyle\leq\mathbb{E}\left[\min\{\tau_{1},n_{L}\}\right] (84)
=n=0[min{τ1,nL}>n]\displaystyle=\sum_{n=0}^{\infty}\mathbb{P}\left[\min\{\tau_{1},n_{L}\}>n\right] (85)
=n1+=1L1(n+1n)[τ1>n].\displaystyle=n_{1}+\sum_{\ell=1}^{L-1}(n_{\ell+1}-n_{\ell})\mathbb{P}\left[\tau_{1}>n_{\ell}\right]. (86)

The probability [τ1>n]\mathbb{P}\left[\tau_{1}>n_{\ell}\right] is further bounded as

[τ1>n]\displaystyle\mathbb{P}\left[\tau_{1}>n_{\ell}\right] =[j=1{ı(Xnj(1);Ynj)<γ}]\displaystyle=\mathbb{P}\left[\bigcap_{j=1}^{\ell}\{\imath(X^{n_{j}}(1);Y^{n_{j}})<\gamma\}\right] (87)
[ı(Xn(1);Yn)<γ].\displaystyle\leq\mathbb{P}\left[\imath(X^{n_{\ell}}(1);Y^{n_{\ell}})<\gamma\right]. (88)

Combining (86) and (88) proves (19).

We bound the type-I error probability of the given SHT as

α\displaystyle\alpha [δ1=1]\displaystyle\triangleq\mathbb{P}\left[\delta_{1}=1\right] (89)
=[τ1=]\displaystyle=\mathbb{P}\left[\tau_{1}=\infty\right] (90)
=[j=1L{ı(Xnj(1);Ynj)<γ}]\displaystyle=\mathbb{P}\left[\bigcap_{j=1}^{L}\{\imath(X^{n_{j}}(1);Y^{n_{j}})<\gamma\}\right] (91)
[ı(XnL(1);YnL)<γ],\displaystyle\leq\mathbb{P}\left[\imath(X^{n_{L}}(1);Y^{n_{L}})<\gamma\right], (92)

where (91) uses the definition of the decision rule (83). The type-II error probability is bounded as

β\displaystyle\beta [δ2=0]\displaystyle\triangleq\mathbb{P}\left[\delta_{2}=0\right] (93)
[τ2<]\displaystyle\leq\mathbb{P}\left[\tau_{2}<\infty\right] (94)
=𝔼[exp{ı(XnL(1);YnL)}1{τ1<}]\displaystyle=\mathbb{E}\left[\exp\{-\imath(X^{n_{L}}(1);Y^{n_{L}})\}1\{\tau_{1}<\infty\}\right] (95)
=𝔼[exp{ı(Xτ(1);Yτ)}1{τ1<}]\displaystyle=\mathbb{E}\left[\exp\{-\imath(X^{\tau}(1);Y^{\tau})\}1\{\tau_{1}<\infty\}\right] (96)
exp{γ},\displaystyle\leq\exp\{-\gamma\}, (97)

where (95) follows from changing measure from PXnL(2)YnL=PXnLPYnLP_{X^{n_{L}}(2)Y^{n_{L}}}=P_{X^{n_{L}}}P_{Y^{n_{L}}} to PXnL(1),YnL=PXnLPY|XnLP_{X^{n_{L}}(1),Y^{n_{L}}}=P_{X^{n_{L}}}P_{Y|X}^{n_{L}}. Equality (96) uses the same arguments as in [8, eq. (111)-(118)] and the fact that {exp{ı(Xn(1);Yn)}:n𝒩}{\{\exp\{-\imath(X^{n_{\ell}}(1);Y^{n_{\ell}})\}\colon n_{\ell}\in\mathcal{N}\}} is a martingale due to the product distribution in (20). Inequality (97) follows from the definition of τ1\tau_{1} in (81). Applying (75) together with (92) and (97) proves (18).

In his analysis of the error exponent regime, Forney [17] uses a slightly different threshold rule than the one in (81). Specifically, he uses a maximum a posteriori threshold rule, which can also be written as

logPYn|Xn(Yn|Xn(m))1Mj=1MPYn|Xn(Yn|Xn(j))γ,\displaystyle\log\frac{P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}}(m))}{\frac{1}{M}\sum_{j=1}^{M}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}}(j))}\geq\gamma, (98)

whose denominator is the output distribution induced by the code rather than by the random codeword distribution PXnP_{X}^{n_{\ell}}.

Appendix B Proof of Theorem 2

The proof uses an idea similar to that in [13, Th. 2]: it combines an achievability bound for a VLSF code whose error probability decays sub-exponentially with the stop-at-time-zero procedure. The difference is that we set the sub-exponentially decaying error probability to ϵN=1NlogN\epsilon_{N}^{\prime}=\frac{1}{\sqrt{N^{\prime}\log N^{\prime}}}, while [13, Th. 2] sets it to 1N\frac{1}{N^{\prime}}. This modification yields a better second-order term for finite LL.

Inverting (25), we get

N=N1ϵ(1+O(1NlogN)).\displaystyle N^{\prime}=\frac{N}{1-\epsilon}\left(1+O\left(\frac{1}{\sqrt{N\log N}}\right)\right). (99)

Next, we particularize the decision rules in the SHT at times n2,,nLn_{2},\dots,n_{L} to the information density threshold rule. Lemma 1, below, is an achievability bound for an (N,L,M,1NlogN)\left(N,L,M,\frac{1}{\sqrt{N\log N}}\right)-VLSF code that employs the information density threshold rule with the optimized decoding times and the threshold value.

Lemma 1

Fix an integer L=O(1)1L=O(1)\geq 1. For the DM-PPC with V>0V>0, the maximum message set size (13) achievable by (N,L,M,1NlogN)\left(N,L,M,\frac{1}{\sqrt{N\log N}}\right)-VLSF codes satisfies

logM(N,L,1NlogN)\displaystyle\log M^{*}\left(N,L,\frac{1}{\sqrt{N\log N}}\right) NCNlog(L)(N)V\displaystyle\geq{NC}-\sqrt{N\log_{(L)}(N)\,V}
+O(Nlog(L)(N)).\displaystyle\quad+O\left(\sqrt{\frac{N}{\log_{(L)}(N)}}\right). (100)

The decoding times n1,,nLn_{1},\dots,n_{L} that achieve (100) satisfy the equations

logM\displaystyle\log M =nCnlog(L+1)(n)Vlogn+O(1)\displaystyle=n_{\ell}C-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V}-\log{n_{\ell}}+O(1) (101)

for [L]\ell\in[L].

Proof:

Lemma 1 analyzes Theorem 1. See Appendix B.I, below. For L=O(1)L=O(1), the proof differs significantly from the proof of the corresponding result in [8, eq. (102)] for L=L=\infty because the dominant error probability term [ı(XnL;YnL)<γ]\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right] in (18) disappears when L=L=\infty. ∎

The average decoding time and the average error probability of the VLSF code in Lemma 1 play the roles of NN^{\prime} and ϵN\epsilon_{N}^{\prime} in (23). By Lemma 1, there exists an (N,L1,M,ϵN)(N^{\prime},L-1,M,\epsilon_{N}^{\prime})-VLSF code with

logM\displaystyle\log M =NCNlog(L1)(N)V\displaystyle={N^{\prime}C}-\sqrt{N^{\prime}\log_{(L-1)}(N^{\prime})\,V}
+O(Nlog(L1)(N)).\displaystyle\quad+O\left(\sqrt{\frac{N^{\prime}}{\log_{(L-1)}(N^{\prime})}}\right). (102)

Plugging (99) into (102) and applying the necessary Taylor series expansions complete the proof.

Lemma 1 is an achievability bound in the moderate deviations regime since the error probability 1NlogN\frac{1}{\sqrt{N\log N}} decays sub-exponentially to zero. The fixed-length scenario in Lemma 1, i.e., L=1L=1, is recovered by [50], which investigates the moderate deviations regime in channel coding. A comparison between the right-hand side of (100) and [50, Th. 2] highlights the benefit of using VLSF codes in the moderate deviations regime. The second-order rate achieved by a VLSF code with L2L\geq 2 decoding times, average decoding time NN, and error probability 1NlogN\frac{1}{\sqrt{N\log N}} is achieved by a fixed-length code with blocklength NN and error probability 1log(L1)(N)log(L)(N)\frac{1}{\sqrt{\log_{(L-1)}(N)\log_{(L)}(N)}}.

In the remainder of the appendix, we prove Lemma 1 and show the second-order optimality of the parameters used in the proof of Theorem 2 including the decoding times set in (22).

B.I Proof of Lemma 1

We first present two lemmas used in the proof of Lemma 1 (step 1). We then choose the distribution PXnLP_{X}^{n_{L}} of the random codewords (step 2) and the parameters n1,,nL,γn_{1},\dots,n_{L},\gamma in Theorem 1 (step 3). Finally, we analyze the bounds in Theorem 1 using the supporting lemmas (step 4).

B.I1 Supporting lemmas

Lemma 2, below, is the moderate deviations result that bounds the probability that a sum of nn zero-mean i.i.d. random variables is above a function of nn that grows at most as quickly as n2/3n^{2/3}.

Lemma 2 (Petrov [42, Ch. 8, Th.  4])

Let Z1,,ZnZ_{1},\dots,Z_{n} be i.i.d. random variables. Let 𝔼[Z1]=0\mathbb{E}\left[Z_{1}\right]=0, σ2=Var[Z1]\sigma^{2}=\mathrm{Var}\left[Z_{1}\right], and μ3=𝔼[Z13]\mu_{3}=\mathbb{E}\left[Z_{1}^{3}\right]. Suppose that the moment generating function 𝔼[exp{tZ}]\mathbb{E}\left[\exp\{tZ\}\right] is finite in the neighborhood of t=0t=0. (This condition is known as Cramér’s condition.) Let 0zn=O(n1/6)0\leq z_{n}=O(n^{1/6}). As nn\to\infty, it holds that

[i=1nZiznσn]\displaystyle\mathbb{P}\left[\sum\limits_{i=1}^{n}Z_{i}\geq z_{n}\sigma\sqrt{n}\right]
=Q(zn)exp{zn3μ36nσ3}+O(1nexp{zn22})\displaystyle\quad=Q(z_{n})\exp\left\{\frac{z_{n}^{3}\mu_{3}}{6\sqrt{n}\sigma^{3}}\right\}+O\left(\frac{1}{\sqrt{n}}\exp\left\{-\frac{z_{n}^{2}}{2}\right\}\right) (103)
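
The following Python sketch (a Monte Carlo illustration of ours, not part of the proof) compares the right-hand side of (103) with an empirical tail probability for centered Bernoulli sums; all parameter values below are hypothetical.

import math
import numpy as np

def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

p, n, trials = 0.3, 2000, 200000
sigma = math.sqrt(p * (1 - p))
mu3 = p * (1 - p) * (1 - 2 * p)          # third central moment of Bernoulli(p)
z_n = math.sqrt(math.log(n))             # satisfies z_n = O(n^{1/6}) here

rng = np.random.default_rng(1)
sums = rng.binomial(n, p, size=trials) - n * p
empirical = float(np.mean(sums >= z_n * sigma * math.sqrt(n)))
approx = q_function(z_n) * math.exp(z_n ** 3 * mu3 / (6 * math.sqrt(n) * sigma ** 3))
print(empirical, approx)                 # the two agree to within the O(.) correction in (103)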

Lemma 3, below, gives the asymptotic expansion of the root of an equation. We use Lemma 3 to find the asymptotic expansion for the gap between two consecutive decoding times nn_{\ell} and n+1n_{\ell+1}.

Lemma 3

Let f(x)f(x) be a differentiable increasing function that satisfies f(x)0f^{\prime}(x)\to 0 as xx\to\infty. Suppose that

x+f(x)=y.\displaystyle x+f(x)=y. (104)

Then, as xx\to\infty it holds that

x=yf(y)(1o(1)).\displaystyle x=y-f(y)\left(1-o(1)\right). (105)
Proof:

Define the function F(x)x+f(x)yF(x)\triangleq x+f(x)-y. Applying Newton’s method with the starting point x0=yx_{0}=y yields

x1\displaystyle x_{1} =x0F(x0)F(x0)\displaystyle=x_{0}-\frac{F(x_{0})}{F^{\prime}(x_{0})} (106)
=yf(y)1+f(y)\displaystyle=y-\frac{f(y)}{1+f^{\prime}(y)} (107)
=yf(y)(1f(y)+O(f(y)2)).\displaystyle=y-f(y)(1-f^{\prime}(y)+O(f^{\prime}(y)^{2})). (108)

Recall that f(y)=o(1)f^{\prime}(y)=o(1) by assumption. Equality (108) follows from the Taylor series expansion of the function 11+x\frac{1}{1+x} around x=0x=0. Let

x=yf(y)(1o(1)).\displaystyle x^{\star}=y-f(y)(1-o(1)). (109)

From Taylor’s theorem, it follows that

f(x)=f(y)f(y0)f(y)(1o(1)),\displaystyle f(x^{\star})=f(y)-f^{\prime}(y_{0})f(y)(1-o(1)), (110)

for some y0[yf(y)(1o(1)),y]y_{0}\in[y-f(y)(1-o(1)),y]. Therefore, f(y0)=o(1)f^{\prime}(y_{0})=o(1), and f(x)=f(y)(1o(1))f(x^{\star})=f(y)(1-o(1)). Substituting (109)–(110) into (104), we see that xx^{\star} is a solution to (104). ∎
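
The following Python sketch (our own numerical illustration, with f(x)=xf(x)=\sqrt{x} as a hypothetical example) applies Newton's method as in (106)–(108) and confirms that the root of (104) is close to yf(y)y-f(y) for large yy.

import math

def solve(f, y, iters=20):
    # Newton's method for x + f(x) = y with starting point x0 = y, as in (106)-(108);
    # the derivative of f is approximated numerically.
    x = y
    for _ in range(iters):
        h = 1e-6 * max(1.0, abs(x))
        fprime = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= (x + f(x) - y) / (1.0 + fprime)
    return x

y = 1e8
x = solve(math.sqrt, y)
print(x, y - math.sqrt(y))  # the root and the approximation y - f(y) agree to leading order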

B.I2 Random encoder design

We set the distribution of the random codewords PXnLP_{X^{n_{L}}} as the product of PXP_{X}^{*}’s, where PXP_{X}^{*} is the capacity-achieving distribution with minimum dispersion, i.e.,

PXnL\displaystyle P_{X^{n_{L}}} =(PX)nL\displaystyle=(P_{X}^{*})^{n_{L}} (111)
PX\displaystyle P_{X}^{*} =argminPX{V(X;Y):I(X;Y)=C}.\displaystyle=\arg\min_{P_{X}}\{V(X;Y)\colon I(X;Y)=C\}. (112)

B.I3 Choosing the threshold γ\gamma and decoding times n1,,nLn_{1},\dots,n_{L}

We choose γ,n1,,nL\gamma,n_{1},\dots,n_{L} so that the equalities

γ=nCnlog(L+1)(n)V\displaystyle\gamma=n_{\ell}C-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V} (113)

hold for all [L]\ell\in[L]. This choice minimizes our upper bound (19) on the average decoding time up to the second-order term in the asymptotic expansion. See Appendix B.II for the proof. Applying Lemma 3 with

x\displaystyle x =n+1\displaystyle=n_{\ell+1} (114)
y\displaystyle y =n1Cnlog(L+1)(n)V\displaystyle=n_{\ell}-\frac{1}{C}\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V} (115)
f(x)\displaystyle f(x) =1Cn+1log(L)(n+1)V\displaystyle=-\frac{1}{C}\sqrt{n_{\ell+1}\log_{(L-\ell)}(n_{\ell+1})V} (116)

for {1,,L1}\ell\in\{1,\dots,L-1\}, gives the following gaps between consecutive decoding times

n+1n\displaystyle n_{\ell+1}-n_{\ell} =1C(nlog(L)(n)V\displaystyle=\frac{1}{C}\Big{(}\sqrt{n_{\ell}\log_{(L-\ell)}(n_{\ell})V}
nlog(L+1)(n)V)(1+o(1)).\displaystyle-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V}\Big{)}(1+o(1)). (117)
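
The following Python sketch (our own numerical illustration; the values of CC, VV, LL, and γ\gamma are hypothetical) computes decoding times satisfying (113) by bisection, using the iterated logarithm log(k)\log_{(k)}; the resulting times increase with \ell, consistent with (117).

import math

def iter_log(x, k):
    # k-fold iterated logarithm log_(k)(x).
    for _ in range(k):
        x = math.log(x)
    return x

def decoding_time(gamma, C, V, k, lo=20.0, hi=1e12):
    # Solve n*C - sqrt(n * log_(k)(n) * V) = gamma for n by bisection.
    g = lambda n: n * C - math.sqrt(n * iter_log(n, k) * V) - gamma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

C, V, L, gamma = 0.5, 0.25, 3, 50000.0
times = [decoding_time(gamma, C, V, L - ell + 1) for ell in range(1, L + 1)]
print([round(n) for n in times])  # increasing times n_1 < n_2 < n_3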

B.I4 Analyzing the bounds in Theorem 1

The information density ı(X;Y)\imath(X;Y) of a DM-PPC is a bounded random variable. Therefore, ı(X;Y)\imath(X;Y) satisfies Cramér’s condition (see Lemma 2). (Here, Cramér’s condition is the bottleneck that determines whether our proof technique applies to a specific DM-PPC. For DM-PPCs with infinite input or output alphabets, Cramér’s condition may or may not be satisfied. Our proof technique applies to any DM-PPC for which the information density satisfies Cramér’s condition.) For each [L]\ell\in[L], applying Lemma 2 with γ,n1,,nL\gamma,n_{1},\dots,n_{L} satisfying (113) gives

[ı(Xn;Yn)<γ]\displaystyle\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right] (119)
\displaystyle\leq Q(log(L+1)(n))exp{(log(L+1)(n))3/2μ36nV3/2}\displaystyle Q\left(\sqrt{\log_{(L-\ell+1)}(n_{\ell})}\right)\exp\left\{\frac{-(\log_{(L-\ell+1)}(n_{\ell}))^{3/2}\mu_{3}}{6\sqrt{n}V^{3/2}}\right\}
+O(1nexp{log(L+1)(n)2})\displaystyle+O\left(\frac{1}{\sqrt{n}}\exp\left\{-\frac{\log_{(L-\ell+1)}(n_{\ell})}{2}\right\}\right)
\displaystyle\leq 12π1log(L)(n)1log(L+1)(n)\displaystyle\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{\log_{(L-\ell)}(n_{\ell})}}\frac{1}{\sqrt{\log_{(L-\ell+1)}(n_{\ell})}}\,
(1+O((log(L+1)(n))(3/2)n))\displaystyle\cdot\left(1+O\left(\frac{(\log_{(L-\ell+1)}(n_{\ell}))^{(3/2)}}{\sqrt{n_{\ell}}}\right)\right)

for <L\ell<L, where

μ3𝔼[(ı(X;Y)C)3]<,\displaystyle\mu_{3}\triangleq\mathbb{E}\left[(\imath(X;Y)-C)^{3}\right]<\infty, (120)

and (119) follows from the Taylor series expansion exp{x}=1+x+O(x2)\exp\{x\}=1+x+O(x^{2}) as x0x\to 0, and the well-known bound (e.g., [42, Ch. 8, eq. (2.46)])

Q(x)12π1xexp{x22}for x>0.\displaystyle Q(x)\leq\frac{1}{\sqrt{2\pi}}\frac{1}{x}\exp\left\{-\frac{x^{2}}{2}\right\}\quad\text{for }x>0. (121)

For =L\ell=L, Lemma 2 gives

[ı(XnL;YnL)<γ]\displaystyle\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right] (122)
\displaystyle\leq 12π1nL1lognL(1+O((lognL)(3/2)nL)).\displaystyle\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{n_{L}}}\frac{1}{\sqrt{\log n_{L}}}\left(1+O\left(\frac{(\log n_{L})^{(3/2)}}{\sqrt{n_{L}}}\right)\right).

By Theorem 1, there exists a VLSF code with LL decoding times n1<n2<<nLn_{1}<n_{2}<\cdots<n_{L} such that the expected decoding time is bounded as

Nn1+=1L1(n+1n)[ı(Xn;Yn)<γ].\displaystyle N\leq n_{1}+\sum_{\ell=1}^{L-1}(n_{\ell+1}-n_{\ell})\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right]. (123)

By (117), we have n+1=n(1+o(1))n_{\ell+1}=n_{\ell}(1+o(1)) for [L1]\ell\in[L-1]. Multiplying these asymptotic equations, we get

n=n1(1+o(1))\displaystyle n_{\ell}=n_{1}(1+o(1)) (124)

for [L]\ell\in[L]. Plugging (117), (119), and (124) into (123), we get

Nn1+V2πCn1log(L)(n1)(1+o(1)).\displaystyle N\leq n_{1}+\frac{\sqrt{V}}{\sqrt{2\pi}\,C}\frac{\sqrt{n_{1}}}{\sqrt{\log_{(L)}(n_{1})}}(1+o(1)). (125)

Applying Lemma 3 to (125), we get

n1NV2πCNlog(L)(N)(1+o(1)).\displaystyle n_{1}\geq N-\frac{\sqrt{V}}{\sqrt{2\pi}\,C}\frac{\sqrt{N}}{\sqrt{\log_{(L)}(N)}}(1+o(1)). (126)

Comparing (126) and (117), we observe that for n1n_{1} large enough,

n1<N<n2<<nL.\displaystyle n_{1}<N<n_{2}<\dots<n_{L}. (127)

Further, from (113) and (125), we have

nL=N(1+O(logNN)).\displaystyle n_{L}=N\left(1+O\left(\sqrt{\frac{\log N}{N}}\right)\right). (128)

Finally, we set message set size MM such that

logM=γlogN.\displaystyle\log M=\gamma-\log N. (129)

Plugging (122) and (129) into (18), we bound the error probability as

[𝗀τ(U,Yτ)W]\displaystyle\mathbb{P}\left[\mathsf{g}_{\tau^{*}}(U,Y^{\tau^{*}})\neq W\right] (130)
\displaystyle\leq [ı(XnL;YnL)<γ]+(M1)exp{γ}\displaystyle\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]+(M-1)\exp\{-\gamma\}
\displaystyle\leq 12π1nL1lognL(1+o(1))+1N\displaystyle\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{n_{L}}}\frac{1}{\sqrt{\log n_{L}}}(1+o(1))+\frac{1}{N} (131)
\displaystyle\leq 12π1N1logN(1+o(1))+1N,\displaystyle\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{N}}\frac{1}{\sqrt{\log N}}(1+o(1))+\frac{1}{N}, (132)

where (132) follows from (127). Inequality (132) implies that the error probability is bounded by 1NlogN\frac{1}{\sqrt{N\log N}} for NN large enough. Plugging (126) and (129) into (113) with =1\ell=1, we conclude that there exists an (N,L,M,1NlogN)\left(N,L,M,\frac{1}{\sqrt{N\log N}}\right)-VLSF code with

logM\displaystyle\log M NCNlog(L)(N)V\displaystyle\geq NC-\sqrt{N\log_{(L)}(N)V}
12πNVlog(L)(N)(1+o(1))logN,\displaystyle\quad-\frac{1}{\sqrt{2\pi}}\sqrt{\frac{NV}{\log_{(L)}(N)}}(1+o(1))-\log N, (133)

which completes the proof.

B.II Second-order optimality of the decoding times in Theorem 2

From the code construction described in Theorems 12, the average decoding time is

N(n2,,nL,γ)=N(1ϵ)11ϵN,\displaystyle N(n_{2},\dots,n_{L},\gamma)=N^{\prime}(1-\epsilon)\frac{1}{1-\epsilon_{N}^{\prime}}, (134)

where

N\displaystyle N^{\prime} =n2+i=2L1(ni+1ni)[ı(Xni;Yni)<γ]\displaystyle=n_{2}+\sum_{i=2}^{L-1}(n_{i+1}-n_{i})\mathbb{P}\left[\imath(X^{n_{i}};Y^{n_{i}})<\gamma\right] (135)
ϵN\displaystyle\epsilon_{N}^{\prime} =[ı(XnL;YnL)<γ]+Mexp{γ}.\displaystyle=\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]+M\exp\{-\gamma\}. (136)

We here show that, given a fixed MM, the parameters n2,n3,,nL,γn_{2},n_{3},\dots,n_{L},\gamma chosen according to (113) and (129) (and hence also the error value ϵN\epsilon_{N}^{\prime} chosen in (23), since ϵN\epsilon_{N}^{\prime} is a function of (nL,γ)(n_{L},\gamma)) minimize the average decoding time in (134) in the sense that the second-order term in the expansion of logM\log M in terms of NN is maximized. That is, our parameter choice optimizes our bound for our code construction.

B.II1 Optimality of n2,,nL1n_{2},\dots,n_{L-1}

We first set nLn_{L} and γ\gamma to satisfy the equations

1nLlognL\displaystyle\frac{1}{\sqrt{n_{L}\log n_{L}}} =[ı(XnL;YnL)<γ]+(M1)exp{γ}\displaystyle=\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]+(M-1)\exp\{-\gamma\} (137)
logM\displaystyle\log M =γlognL,\displaystyle=\gamma-\log n_{L}, (138)

and optimize the values of n2,,nL1n_{2},\dots,n_{L-1} under (137)–(138). Section B.II2 proves the optimality of the choices in (137)–(138).

Under (137)–(138), the optimization problem in (134)–(136) reduces to

min\displaystyle\min N(n2,,nL1)\displaystyle N^{\prime}(n_{2},\dots,n_{L-1}) (139)
=n2+i=2L1(ni+1ni)[ı(Xni;Yni)<γ]\displaystyle=n_{2}+\sum_{i=2}^{L-1}(n_{i+1}-n_{i})\mathbb{P}\left[\imath(X^{n_{i}};Y^{n_{i}})<\gamma\right]
s.t. 1nLlognL=[ı(XnL;YnL)<γ]\displaystyle\frac{1}{\sqrt{n_{L}\log n_{L}}}=\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]
+(M1)exp{γ}.\displaystyle\quad\quad\quad\quad\quad\quad+(M-1)\exp\{-\gamma\}.

Next, we define the functions

g(n)\displaystyle g(n) nCγnV\displaystyle\triangleq\frac{nC-\gamma}{\sqrt{nV}} (140)
F(n)\displaystyle F(n) Q(g(n))=1Q(g(n))\displaystyle\triangleq Q(-g(n))=1-Q(g(n)) (141)
f(n)\displaystyle f(n) F(n)=12πexp{g(n)22}g(n).\displaystyle\triangleq F^{\prime}(n)=\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{g(n)^{2}}{2}\right\}\,g^{\prime}(n). (142)

Assume that γ=γn\gamma=\gamma_{n} is such that g(n)=O(n1/6)g(n)=O(n^{1/6}), and limng(n)=\lim\limits_{n\to\infty}g(n)=\infty. Then by Lemma 2, [ı(Xn;Yn)<γ]\mathbb{P}\left[\imath(X^{n};Y^{n})<\gamma\right], which is a step-wise constant function of nn, is approximated by differentiable function 1F(n)1-F(n) as

[ı(Xn;Yn)<γ]=(1F(n))(1+o(1)).\displaystyle\mathbb{P}\left[\imath(X^{n};Y^{n})<\gamma\right]=(1-F(n))(1+o(1)). (143)

Taylor series expansions give

1F(n)\displaystyle 1-F(n) =1g(n)12πexp{g(n)22}(1+o(1))\displaystyle=\frac{1}{g(n)}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{g(n)^{2}}{2}\right\}(1+o(1)) (144)
f(n)\displaystyle f(n) =(1F(n))g(n)g(n)(1+o(1))\displaystyle=(1-F(n))g(n)g^{\prime}(n)(1+o(1)) (145)
g(n)\displaystyle g^{\prime}(n) =CnV(1+o(1)).\displaystyle=\frac{C}{\sqrt{nV}}(1+o(1)). (146)

Let 𝐧=(n2,,nL1){\mathbf{n}^{*}=(n_{2}^{*},\dots,n_{L-1}^{*})} denote the solution to the optimization problem in (139) with [ı(Xn;Yn)<γ]\mathbb{P}\left[\imath(X^{n};Y^{n})<\gamma\right] replaced by (1F(n))(1-F(n)). Then 𝐧\mathbf{n}^{*} must satisfy the Karush-Kuhn-Tucker conditions N(𝐧)=𝟎\nabla N^{\prime}(\mathbf{n}^{*})=\mathbf{0}, giving

Nn2|𝐧=𝐧\displaystyle\left.\frac{\partial N^{\prime}}{\partial n_{2}}\right|_{\mathbf{n}=\mathbf{n}^{*}} =F(n2)(n3n2)f(n2)=0\displaystyle=F(n_{2}^{*})-(n_{3}^{*}-n_{2}^{*})f(n_{2}^{*})=0 (147)
Nn|𝐧=𝐧\displaystyle\left.\frac{\partial N^{\prime}}{\partial n_{\ell}}\right|_{\mathbf{n}=\mathbf{n}^{*}} =F(n)F(n1)(n+1n)f(n)=0\displaystyle=F(n_{\ell}^{*})-F(n_{\ell-1}^{*})-(n_{\ell+1}^{*}-n_{\ell}^{*})f(n_{\ell}^{*})=0 (148)

for =3,,L1\ell=3,\dots,L-1.

Let 𝐧~=(n~2,,n~L1)\tilde{\mathbf{n}}=(\tilde{n}_{2},\dots,\tilde{n}_{L-1}) be the decoding times chosen in (113). We evaluate

g(n~)\displaystyle g(\tilde{n}_{\ell}) =log(L+1)(n~)(1+o(1))\displaystyle=\sqrt{\log_{(L-\ell+1)}(\tilde{n}_{\ell})}(1+o(1)) (149)
1F(n~)\displaystyle 1-F(\tilde{n}_{\ell}) =12π1g(n~+1)g(n~)(1+o(1))\displaystyle=\frac{1}{\sqrt{2\pi}}\frac{1}{g(\tilde{n}_{\ell+1})g(\tilde{n}_{\ell})}(1+o(1)) (150)
f(n~)\displaystyle f(\tilde{n}_{\ell}) =12πg(n~)g(n~+1)\displaystyle=\frac{1}{\sqrt{2\pi}}\frac{g^{\prime}(\tilde{n}_{\ell})}{g(\tilde{n}_{\ell+1})} (151)
n~+1n~\displaystyle\tilde{n}_{\ell+1}-\tilde{n}_{\ell} =g(n~+1)g(n~)(1+o(1))\displaystyle=\frac{g(\tilde{n}_{\ell+1})}{g^{\prime}(\tilde{n}_{\ell})}(1+o(1)) (152)

for =2,,L1\ell=2,\dots,L-1. Plugging (149)–(152) into (147)–(148) for Nn|𝐧=𝐧~\left.\frac{\partial N^{\prime}}{\partial n_{\ell}}\right|_{\mathbf{n}=\mathbf{\tilde{n}}}, we get

N(𝐧~)\displaystyle\nabla N^{\prime}(\tilde{\mathbf{n}}) =(112π,12π,12π,,12π)\displaystyle=\left(1-\frac{1}{\sqrt{2\pi}},-\frac{1}{\sqrt{2\pi}},-\frac{1}{\sqrt{2\pi}},\dots,-\frac{1}{\sqrt{2\pi}}\right)
(1+o(1)).\displaystyle\quad(1+o(1)). (153)

Our goal is to find a vector Δ𝐧=(Δn2,,ΔnL1)\Delta\mathbf{n}=(\Delta n_{2},\dots,\Delta n_{L-1}) such that

N(𝐧~+Δ𝐧)=𝟎,\displaystyle\nabla N^{\prime}(\tilde{\mathbf{n}}+\Delta\mathbf{n})=\mathbf{0}, (154)

Assume that Δn=O(n)\Delta n=O(\sqrt{n}). By plugging n+Δnn+\Delta n into (144)–(146) and using the Taylor series expansion of g(n+Δn)g(n+\Delta n), we get

1F(n+Δn)\displaystyle 1-F(n+\Delta n) =(1F(n))\displaystyle=(1-F(n))
exp{Δng(n)g(n)}(1+o(1))\displaystyle\quad\cdot\exp\{-\Delta ng(n)g^{\prime}(n)\}(1+o(1)) (155)
f(n+Δn)\displaystyle f(n+\Delta n) =f(n)exp{Δng(n)g(n)}(1+o(1)).\displaystyle=f(n)\exp\{-\Delta ng(n)g^{\prime}(n)\}(1+o(1)). (156)

Using (155)–(156), and putting 𝐧~+Δ𝐧\tilde{\mathbf{n}}+\Delta\mathbf{n} in (147)–(148), we solve (154) as

Δn2\displaystyle\Delta n_{2} =log2πg(n~2)g(n~2)(1+o(1))\displaystyle=-\frac{\log\sqrt{2\pi}}{g(\tilde{n}_{2})g^{\prime}(\tilde{n}_{2})}(1+o(1)) (157)
=Vlog2πCn~2log(L1)(n~2)(1+o(1))\displaystyle=-\frac{\sqrt{V}\,\log{\sqrt{2\pi}}}{C}\frac{\sqrt{\tilde{n}_{2}}}{\sqrt{\log_{(L-1)}(\tilde{n}_{2})}}(1+o(1)) (158)
Δni\displaystyle\Delta n_{i} =12g(n~i1)2g(n~i)g(n~i)=o(Δn2)(1+o(1))\displaystyle=\frac{1}{2}\frac{g(\tilde{n}_{i-1})^{2}}{g(\tilde{n}_{i})g^{\prime}(\tilde{n}_{i})}=o(\Delta n_{2})(1+o(1)) (159)

for i=3,,L1i=3,\dots,L-1. Hence, 𝐧~+Δ𝐧\tilde{\mathbf{n}}+\Delta{\mathbf{n}} satisfies the optimality criterion, and 𝐧=𝐧~+Δ𝐧\mathbf{n}^{*}=\tilde{\mathbf{n}}+\Delta{\mathbf{n}}.

It remains only to evaluate the gap N(𝐧)N(𝐧~)N^{\prime}(\mathbf{n}^{*})-N^{\prime}(\tilde{\mathbf{n}}). We have

N(𝐧)N(𝐧~)\displaystyle N^{\prime}(\mathbf{n}^{*})-N^{\prime}(\tilde{\mathbf{n}})
=(Δn2+i=2L1(n~i+1n~i)Q(g(n~i))\displaystyle=\bigg{(}\Delta n_{2}+\sum_{i=2}^{L-1}(\tilde{n}_{i+1}-\tilde{n}_{i})Q(g(\tilde{n}_{i}))
(exp{Δnig(n~i)g(n~i)}1))(1+o(1))\displaystyle\quad\cdot\left(\exp\{-\Delta n_{i}g(\tilde{n}_{i})g^{\prime}(\tilde{n}_{i})\}-1\right)\bigg{)}(1+o(1)) (160)
=(Δn2+(112π)1g(n~1)g(n~i)i=3L1Δni)\displaystyle=\left(\Delta n_{2}+\left(1-\frac{1}{\sqrt{2\pi}}\right)\frac{1}{g(\tilde{n}_{1})g^{\prime}(\tilde{n}_{i})}-\sum_{i=3}^{L-1}\Delta n_{i}\right)
(1+o(1))\displaystyle\quad\cdot(1+o(1)) (161)
=Bn~2log(L1)(n~2)(1+o(1)),\displaystyle=-B\frac{\sqrt{\tilde{n}_{2}}}{\sqrt{\log_{(L-1)}(\tilde{n}_{2})}}(1+o(1)), (162)

where B=(log2π+12π1)VCB=\left(\log\sqrt{2\pi}+\frac{1}{\sqrt{2\pi}}-1\right)\frac{\sqrt{V}}{C} is a positive constant. From the relationship between nn_{\ell} and n2n_{2} in (124) and the equality (162), we get

N(𝐧~)=N(𝐧)+BN(𝐧)log(L1)(N(𝐧))(1+o(1)).\displaystyle N^{\prime}(\tilde{\mathbf{n}})=N^{\prime}(\mathbf{n}^{*})+B\frac{\sqrt{N^{\prime}(\mathbf{n}^{*})}}{\sqrt{\log_{(L-1)}(N^{\prime}(\mathbf{n}^{*}))}}(1+o(1)). (163)

Plugging (163) into our VLSF achievability bound (133) gives

logM\displaystyle\log M N(𝐧)CN(𝐧)log(L1)(N(𝐧))V\displaystyle\geq N^{\prime}(\mathbf{n}^{*})C-\sqrt{N^{\prime}(\mathbf{n}^{*})\log_{(L-1)}(N^{\prime}(\mathbf{n}^{*}))V}
O(N(𝐧)log(L1)(N(𝐧))).\displaystyle\quad-O\left(\sqrt{\frac{N^{\prime}(\mathbf{n}^{*})}{\log_{(L-1)}(N^{\prime}(\mathbf{n}^{*}))}}\right). (164)

Comparing (164) and (133), note that the decoding times chosen in (113) have the optimal second-order term in the asymptotic expansion of the maximum achievable message set size within our code construction. Moreover, the order of the third-order term in (164) is the same as the order of the third-order term in (133).

The method of approximating the probability [ı(Xn;Yn)γ]\mathbb{P}\left[\imath(X^{n};Y^{n})\geq\gamma\right], which is a lower bound on [τn]\mathbb{P}\left[\tau\leq n\right] (see (88)), by a differentiable function F(n)F(n) is introduced in [26, Sec. III] for the optimization problem in (139). In [26], Vakilinia et al. approximate the distribution of the random stopping time τ\tau by a Gaussian distribution whose mean 𝔼[τ]\mathbb{E}\left[\tau\right] and variance Var[τ]\mathrm{Var}\left[\tau\right] are found empirically. They derive the Karush-Kuhn-Tucker conditions in (147)–(148) and solve them numerically; this procedure is known as the SDO algorithm. A similar analysis appears in [27] for the binary erasure channel. The analyses in [26, 27] use the SDO algorithm to numerically solve the equations (147)–(148) for fixed LL, MM, and ϵ\epsilon. Unlike [26, 27], we find the analytic solution to (147)–(148) as the decoding times n2,,nLn_{2},\dots,n_{L} approach infinity, and we derive the achievable rate in Theorem 2 as a function of LL.
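
The following Python sketch reflects our reading of this SDO-style numerical approach (it is not the code of [26] or [27]): starting from a guess for n2n_{2}, each of the conditions (147)–(148) determines the next decoding time, and an outer search over n2n_{2} (e.g., bisection) would then be used to match a prescribed nLn_{L} as in (137). The functions g, F, and f below are the Gaussian approximations (140)–(142); the values of γ\gamma, CC, VV, and the starting guess for n2n_{2} are hypothetical.

import math

gamma, C, V, L = 500.0, 0.5, 0.25, 5    # hypothetical values

def g(n):                                # (140)
    return (n * C - gamma) / math.sqrt(n * V)

def F(n):                                # (141): Q(-g(n))
    return 0.5 * math.erfc(-g(n) / math.sqrt(2.0))

def f(n):                                # (142): F'(n) = phi(g(n)) * g'(n)
    g_prime = C / math.sqrt(n * V) - g(n) / (2.0 * n)
    return math.exp(-g(n) ** 2 / 2.0) / math.sqrt(2.0 * math.pi) * g_prime

n = [980.0]                              # guess for n_2
n.append(n[0] + F(n[0]) / f(n[0]))       # (147) determines n_3
for _ in range(L - 3):                   # (148) determines n_4, ..., n_L
    n.append(n[-1] + (F(n[-1]) - F(n[-2])) / f(n[-1]))
print([round(x, 1) for x in n])          # increasing decoding times n_2, ..., n_L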

B.II2 Optimality of nLn_{L} and γ\gamma

Let (𝐧,γ)=(n2,,nL,γ)(\mathbf{n}^{*},\gamma^{*})=(n_{2}^{*},\dots,n_{L}^{*},\gamma^{*}) be the solution to N(𝐧,γ)=𝟎\nabla N(\mathbf{n}^{*},\gamma^{*})=\mathbf{0} in (134). Section B.II1 finds the values of n2,n3,,nL1n_{2}^{*},n_{3}^{*},\dots,n_{L-1}^{*} that minimize NN^{\prime}. Minimizing NN^{\prime} also minimizes NN in (134) since ϵN\epsilon_{N}^{\prime} depends only on nLn_{L} and γ\gamma, and ϵ\epsilon is a constant. Therefore, to minimize NN, it only remains to find (nL,γ)(n_{L}^{*},\gamma^{*}) such that

NnL|(𝐧,γ)=(𝐧,γ)\displaystyle\left.\frac{\partial N}{\partial n_{L}}\right|_{(\mathbf{n},\gamma)=(\mathbf{n}^{*},\gamma^{*})} =0\displaystyle=0 (165)
Nγ|(𝐧,γ)=(𝐧,γ)\displaystyle\left.\frac{\partial N}{\partial\gamma}\right|_{(\mathbf{n},\gamma)=(\mathbf{n}^{*},\gamma^{*})} =0.\displaystyle=0. (166)

Consider the case where L>2L>2. Solving (165) and (166) using (147)–(152) gives

g(nL)\displaystyle g(n_{L}^{*}) =lognL+log(2)(nL)+log(3)(nL)+O(1)\displaystyle=\sqrt{\log n_{L}^{*}+\log_{(2)}(n_{L}^{*})+\log_{(3)}{(n_{L}^{*})}+O(1)} (167)
0\displaystyle 0 =c0+N(12πnLexp{g(nL)22}(1+o(1))\displaystyle=c_{0}+N^{\prime}\bigg{(}\frac{1}{\sqrt{2\pi n_{L}^{*}}}\exp\left\{-\frac{g(n_{L}^{*})^{2}}{2}\right\}(1+o(1))
Mexp{γ}),\displaystyle\quad-M\exp\{-\gamma^{*}\}\bigg{)}, (168)

where c0c_{0} is a positive constant. Solving (167)–(168) simultaneously, we obtain

logM\displaystyle\log M =γlognL+O(1).\displaystyle=\gamma^{*}-\log n_{L}^{*}+O(1). (169)

Plugging (167) and (169) into (136), we get

ϵN=c1nLlog(2)(nL)lognL(1+o(1)),\displaystyle\epsilon_{N}^{\prime*}=\frac{c_{1}}{\sqrt{n_{L}^{*}\log_{(2)}(n_{L}^{*})}\log n_{L}^{*}}(1+o(1)), (170)

where $c_{1}$ is a constant. Let $(\tilde{\mathbf{n}},\tilde{\gamma})=(\tilde{n}_{2},\dots,\tilde{n}_{L},\tilde{\gamma})$ be the parameters chosen in (113) and (129). Note that $\epsilon_{N}^{\prime*}$ is order-wise different from $\epsilon_{N}^{\prime}$ in (23). Following steps similar to (160)–(162), we compute

N(𝐧,γ)N(𝐧~,γ~)=O(nLlognL).\displaystyle N(\mathbf{n}^{*},\gamma^{*})-N(\tilde{\mathbf{n}},\tilde{\gamma})=-O\left(\sqrt{\frac{n_{L}^{*}}{\log n_{L}^{*}}}\right). (171)

Plugging (171) into (21) gives

logM\displaystyle\log M =N(𝐧,γ)C1ϵ\displaystyle={\frac{N(\mathbf{n}^{*},\gamma^{*})C}{1-\epsilon}}
N(𝐧,γ)log(L1)(N(𝐧,γ))V1ϵ\displaystyle-\sqrt{N(\mathbf{n}^{*},\gamma^{*})\log_{(L-1)}(N(\mathbf{n}^{*},\gamma^{*}))\frac{V}{1-\epsilon}}
+O(N(𝐧,γ)log(L1)(N(𝐧,γ))).\displaystyle+O\left(\sqrt{\frac{N(\mathbf{n}^{*},\gamma^{*})}{\log_{(L-1)}(N(\mathbf{n}^{*},\gamma^{*}))}}\right). (172)

Comparing (21) and (172), we see that although (23) and (170) are different, the parameters (𝐧~,γ~)(\tilde{\mathbf{n}},\tilde{\gamma}) chosen in (113) and (129) have the same second-order term in the asymptotic expansion of the maximum achievable message set size as the parameters (𝐧,γ)(\mathbf{n}^{*},\gamma^{*}) that minimize the average decoding time in the achievability bound in Theorem 1.

For L=2L=2, the summation term in (135) disappears; in this case, the solution to (165) gives

ϵN=c2nLlognL(1+o(1))\displaystyle\epsilon_{N}^{\prime*}=\frac{c_{2}}{\sqrt{n_{L}^{*}\log n_{L}^{*}}}(1+o(1)) (173)

for some constant c2c_{2}. Following the steps in (171)–(172), we conclude that the parameter choices in (113) and (129) are second-order optimal for L=2L=2 as well.

Appendix C Proof of Theorem 3

Let P0P_{0} and P1P_{1} be two distributions. Let ZlogdP0dP1Z\triangleq\log\frac{\mathrm{d}P_{0}}{\mathrm{d}P_{1}} be the log-likelihood ratio, and let

Sn=i=1nZi,\displaystyle S_{n}=\sum_{i=1}^{n}Z_{i}, (174)

where ZiZ_{i}’s are i.i.d. and have the same distribution as ZZ. For i{0,1}i\in\{0,1\}, we denote the probability measures and expectations under distribution PiP_{i} by i\mathbb{P}_{i} and 𝔼i\mathbb{E}_{i}, respectively. Given a threshold a0a_{0}\in\mathbb{R}, define the stopping time

T\displaystyle T inf{n1:Sna0}\displaystyle\triangleq\inf\{n\geq 1\colon S_{n}\geq a_{0}\} (175)

and the overshoot

ξ0=STa0.\displaystyle\xi_{0}=S_{T}-a_{0}. (176)

The following lemma from [36], which gives the refined asymptotics for the stopping time TT, is the main tool to prove our bounds.

Lemma 4 ([36, Cor. 2.3.1, Th. 2.3.3, Th. 2.5.3, Lemma 3.1.1])

Suppose that $\mathbb{E}_{0}[(Z_{1}^{+})^{2}]<\infty$ and that $Z_{1}$ is non-arithmetic. Then, as $a_{0}\to\infty$, it holds that

𝔼0[T]\displaystyle\mathbb{E}_{0}[T] =1D(P0P1)(a0+𝔼0[ξ0])\displaystyle=\frac{1}{D(P_{0}\|P_{1})}(a_{0}+\mathbb{E}_{0}[\xi_{0}]) (177)
=1D(P0P1)(a0+𝔼0[Z12]2D(P0P1)\displaystyle=\frac{1}{D(P_{0}\|P_{1})}\Bigg{(}a_{0}+\frac{\mathbb{E}_{0}[Z_{1}^{2}]}{2D(P_{0}\|P_{1})}
n=11n𝔼0[Sn]+o(1)),\displaystyle\quad-\sum_{n=1}^{\infty}\frac{1}{n}\mathbb{E}_{0}[S_{n}^{-}]+o(1)\Bigg{)}, (178)

and

0[T<]\displaystyle\mathbb{P}_{0}[T<\infty] =1\displaystyle=1 (179)
1[T<]\displaystyle\mathbb{P}_{1}[T<\infty] =ea0𝔼0[eξ0]\displaystyle=e^{-a_{0}}\mathbb{E}_{0}[e^{-\xi_{0}}] (180)
𝔼0[eλξ0]\displaystyle\mathbb{E}_{0}[e^{-\lambda\xi_{0}}] =1+o(1)λD(P0P1)exp{n=11n𝔼0[eλSn+]}.\displaystyle=\frac{1+o(1)}{\lambda D(P_{0}\|P_{1})}\exp\left\{-\sum_{n=1}^{\infty}\frac{1}{n}\mathbb{E}_{0}[e^{-\lambda S_{n}^{+}}]\right\}. (181)
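Before proceeding, we note that the first-order content of (177) is easy to check empirically. The following Monte Carlo sketch (illustration only, not used in the proof) simulates the one-sided stopping time in (175) for a Bernoulli pair $P_0=\mathrm{Bern}(0.7)$, $P_1=\mathrm{Bern}(0.3)$, chosen here purely for concreteness, and compares the empirical mean of $T$ with $(a_0+\mathbb{E}_0[\xi_0])/D(P_0\|P_1)$.

```python
# A Monte Carlo sanity check (illustration only, not used in the proof) of the
# expansion (177)-(178) at first order: for the one-sided stopping time T in
# (175), E_0[T] = (a_0 + E_0[xi_0]) / D(P_0 || P_1) by Wald's identity.  The
# Bernoulli pair P_0 = Bern(0.7), P_1 = Bern(0.3) is an assumed toy example.
import numpy as np

rng = np.random.default_rng(0)
p0, p1, a0, trials = 0.7, 0.3, 30.0, 10000

llr = {1: np.log(p0 / p1), 0: np.log((1 - p0) / (1 - p1))}   # Z for each symbol
D = p0 * llr[1] + (1 - p0) * llr[0]                          # D(P_0 || P_1)

stop_times, overshoots = [], []
for _ in range(trials):
    s, n = 0.0, 0
    while s < a0:
        s += llr[int(rng.random() < p0)]                     # sample Z_i under P_0
        n += 1
    stop_times.append(n)
    overshoots.append(s - a0)

print("empirical E_0[T]      :", np.mean(stop_times))
print("(a_0 + E_0[xi_0]) / D :", (a0 + np.mean(overshoots)) / D)
```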

C.I Achievability Proof

Let PXP_{X} be a capacity-achieving distribution of the DM-PPC. Define the hypotheses

H0\displaystyle H_{0} :(XdN,YdN)P0=((PX×PY|X)dN)\displaystyle\colon(X^{d_{N}},Y^{d_{N}})^{\infty}\sim P_{0}^{\infty}=((P_{X}\times P_{Y|X})^{d_{N}})^{\infty} (182)
H1\displaystyle H_{1} :(XdN,YdN)P1=((PX×PY)dN),\displaystyle\colon(X^{d_{N}},Y^{d_{N}})^{\infty}\sim P_{1}^{\infty}=((P_{X}\times P_{Y})^{d_{N}})^{\infty}, (183)

and the random variables

Wi\displaystyle W_{i} 1dNlogdP0dNdP1dN(X(i1)dN+1:idN,Y(i1)dN+1:idN)\displaystyle\triangleq\frac{1}{d_{N}}\log\frac{\mathrm{d}P_{0}^{d_{N}}}{\mathrm{d}P_{1}^{d_{N}}}\left(X^{(i-1)d_{N}+1:id_{N}},Y^{(i-1)d_{N}+1:id_{N}}\right) (184)
=1dNı(X(i1)dN+1:idN;Y(i1)dN+1:idN)\displaystyle=\frac{1}{d_{N}}\imath(X^{(i-1)d_{N}+1:id_{N}};Y^{(i-1)d_{N}+1:id_{N}}) (185)

for i.i\in\mathbb{N}. Note that under H0H_{0}, 𝔼0[Wi]=C\mathbb{E}_{0}[W_{i}]=C. Here, we use WiW_{i} in the place of ZiZ_{i} in (174). Define

Sni=1nWi,\displaystyle S_{n}\triangleq\sum_{i=1}^{n}W_{i}, (186)

and

τ\displaystyle\tau inf{k1:Ska0/dN}\displaystyle\triangleq\inf\{k\geq 1\colon S_{k}\geq a_{0}/d_{N}\} (187)
T\displaystyle T dNτ.\displaystyle\triangleq d_{N}\,\tau. (188)

We employ the stop-at-time-zero procedure described in the proof sketch of Theorem 2 with ϵN=1𝔼0[T]\epsilon_{N}^{\prime}=\frac{1}{\mathbb{E}_{0}[T]} and the information density threshold rule (80)–(83) from the proof of Theorem 1, where the threshold γ\gamma is set to a0a_{0}. Here, TT is as in (175). We set MM and a0a_{0} so that

M1[T<]\displaystyle M\mathbb{P}_{1}[T<\infty] Mea0=ϵN=1𝔼0[T],\displaystyle\leq Me^{-a_{0}}=\epsilon_{N}^{\prime}=\frac{1}{\mathbb{E}_{0}[T]}, (189)

where the inequality follows from (180). Following steps identical to (99), and noting that 0[T=]=0\mathbb{P}_{0}[T=\infty]=0, we get

N=(1ϵ)𝔼0[T]+O(1),\displaystyle N=(1-\epsilon)\mathbb{E}_{0}[T]+O(1), (190)

and the average error probability of the code is bounded by ϵ\epsilon.

To evaluate 𝔼0[T]\mathbb{E}_{0}[T], we use Lemma 4 with WiW_{i} in place of ZiZ_{i}. A straightforward calculation yields

\displaystyle\mathbb{E}_{0}[W_{1}^{2}]=\mathbb{E}_{0}[W_{1}]^{2}+\mathrm{Var}\left[\frac{1}{d_{N}}\sum_{i=1}^{d_{N}}\imath(X_{i};Y_{i})\right] (191)
\displaystyle=C^{2}+\frac{1}{d_{N}}\mathrm{Var}\left[\imath(X_{1};Y_{1})\right]. (192)

Next, we have that

𝔼0[Sn]=ndN𝔼[1ndNSn1{1ndNSn0}],\displaystyle\mathbb{E}_{0}[S_{n}^{-}]=-nd_{N}\mathbb{E}\left[\frac{1}{nd_{N}}S_{n}1\left\{\frac{1}{nd_{N}}S_{n}\leq 0\right\}\right], (193)

where Sn=j=1ndNı(Xj;Yj)S_{n}=\sum_{j=1}^{nd_{N}}\imath(X_{j};Y_{j}). Applying the saddlepoint approximation (e.g., [51, eq. (1.2)]) to 1ndNSn\frac{1}{nd_{N}}S_{n}, we get

𝔼0[Sn]=ndN0c(x)ndNendNg(x)+logxdx,\displaystyle\mathbb{E}_{0}[S_{n}^{-}]=nd_{N}\int_{-\infty}^{0}c(x)\sqrt{nd_{N}}e^{-nd_{N}g(x)+\log x}\mathrm{d}x, (194)

where $c(x)$ and $g(x)$ are bounded from below by a positive constant for all $x\in(-\infty,0]$. Applying Laplace's method for integrals [51, eq. (2.5)] to (194), we get

𝔼0[Sn]=endNcn+o(ndN)\displaystyle\mathbb{E}_{0}[S_{n}^{-}]=e^{-nd_{N}c_{n}+o(nd_{N})} (195)

for all n+n\in\mathbb{Z}_{+}, where each cnc_{n} is a positive constant depending on nn. Putting (192) and (195) into (178) and (188), we get

𝔼0[T]=a0C+dN2+o(dN).\displaystyle\mathbb{E}_{0}[T]=\frac{a_{0}}{C}+\frac{d_{N}}{2}+o(d_{N}). (196)

From (189)–(190), we get

𝔼0[T]\displaystyle\mathbb{E}_{0}[T] =N1ϵ+O(1)\displaystyle=\frac{N}{1-\epsilon}+O\left(1\right) (197)
logM\displaystyle\log M =a0logN.\displaystyle=a_{0}-\log N. (198)

Putting (196)–(198) together completes the proof of (26).

C.II Converse Proof

Recall the definition of an SHT (δ,τ,𝒩)(\delta,\tau,\mathcal{N}) from Appendix A.I1 that tests the hypotheses

H0\displaystyle H_{0} :ZP0\displaystyle\colon Z^{\infty}\sim P_{0} (199)
H1\displaystyle H_{1} :ZP1,\displaystyle\colon Z^{\infty}\sim P_{1}, (200)

where P0P_{0} and P1P_{1} are distributions on a common alphabet 𝒵\mathcal{Z}^{\infty}. Here, Z(Z1,Z2,)Z^{\infty}\triangleq(Z_{1},Z_{2},\dots) denotes a vector of infinite length whose joint distribution is either P0P_{0} or P1P_{1}, which need not be product distributions in general. We define the minimum achievable type-II error probability, subject to a type-I error probability bound and a maximal expected decoding time constraint, with decision times restricted to the set 𝒩\mathcal{N} as

β(ϵ,N,𝒩)(P0,P1)min(δ,τ,𝒩):0[δ=1]ϵ,max{𝔼0[τ],𝔼1[τ]}N1[δ=0],\displaystyle\beta_{(\epsilon,N,\mathcal{N})}(P_{0},P_{1})\triangleq\min_{\begin{subarray}{c}(\delta,\tau,\mathcal{N})\colon\mathbb{P}_{0}[\delta=1]\leq{\epsilon},\\ \max\{\mathbb{E}_{0}[\tau],\mathbb{E}_{1}[\tau]\}\leq N\end{subarray}}{\mathbb{P}_{1}[\delta=0]}, (201)

which is the SHT version of the βα\beta_{\alpha}-function defined for the fixed-length binary hypothesis test [13].

The following theorem extends the meta-converse bound [13, Th. 27], which is a fundamental theorem used to show converse results in fixed-length channel coding without feedback and many other applications.

Theorem 9

Fix any set 𝒩\mathcal{N}\subseteq\mathbb{Z}_{\geq}, a real number N>0N>0, and a DM-PPC PY|XP_{Y|X}. Then, it holds that

logM(N,|𝒩|,ϵ,𝒩)\displaystyle\log M^{*}(N,|\mathcal{N}|,\epsilon,\mathcal{N})
supPXinfQYlogβ(ϵ,N,𝒩)(PX×PY|X,PX×QY).\displaystyle\quad\leq\sup_{P_{X^{\infty}}}\inf_{Q_{Y^{\infty}}}-\log\beta_{(\epsilon,N,\mathcal{N})}(P_{X^{\infty}}\times P_{Y|X}^{\infty},P_{X^{\infty}}\times Q_{Y^{\infty}}). (202)
Proof:

The proof is similar to that in [13]. Let $W$ denote a message equiprobably distributed on $[M]$, and let $\hat{W}$ be its reconstruction. Given any VLSF code with the set of available decoding times $\mathcal{N}$, average decoding time $N$, error probability $\epsilon$, and codebook size $M$, let $\hat{P}_{X^{\infty}}$ denote the input distribution induced by the code's codebook. The code operation creates a Markov chain $W\to X^{\infty}\to Y^{\infty}\to\hat{W}$. While full feedback would break this Markov chain, stop feedback does not, since under stop feedback the channel inputs are conditionally independent of the channel outputs given the message $W$. Fix an arbitrary output distribution $Q_{Y^{\infty}}$, and consider the SHT

H0\displaystyle H_{0} :(X,Y)P^X×PY|X\displaystyle\colon(X^{\infty},Y^{\infty})\sim\hat{P}_{X^{\infty}}\times P_{Y|X}^{\infty} (203)
H1\displaystyle H_{1} :(X,Y)P^X×QY\displaystyle\colon(X^{\infty},Y^{\infty})\sim\hat{P}_{X^{\infty}}\times Q_{Y^{\infty}} (204)

with a test δ=1{W^W}\delta=1\{\hat{W}\neq W\}, where (W,W^)(W,\hat{W}) are generated by the (potentially random) encoder-decoder pair of the VLSF code. The type-I and type-II error probabilities of this code-induced SHT are

α\displaystyle\alpha =0[δ=1]=[W^W]ϵ\displaystyle=\mathbb{P}_{0}[\delta=1]=\mathbb{P}\left[\hat{W}\neq W\right]\leq\epsilon (205)
β\displaystyle\beta =1[δ=0]=1M,\displaystyle=\mathbb{P}_{1}[\delta=0]=\frac{1}{M}, (206)

where (206) follows since the sequence $Y^{\infty}$ is independent of $X^{\infty}$ under $H_{1}$. The expected stopping time of this SHT under $H_{0}$ or $H_{1}$ is bounded by $N$ by the definition of a VLSF code. Since the error probabilities in (205)–(206) cannot be better than those of the optimal SHT, it holds that

logM\displaystyle\log M
logβ(ϵ,N,𝒩)(P^X×PY|X,P^X×QY)\displaystyle\leq-\log\beta_{(\epsilon,N,\mathcal{N})}(\hat{P}_{X^{\infty}}\times P_{Y|X}^{\infty},\hat{P}_{X^{\infty}}\times Q_{Y^{\infty}}) (207)
infQYlogβ(ϵ,N,𝒩)(P^X×PY|X,P^X×QY)\displaystyle\leq\inf_{Q_{Y^{\infty}}}-\log\beta_{(\epsilon,N,\mathcal{N})}(\hat{P}_{X^{\infty}}\times P_{Y|X}^{\infty},\hat{P}_{X^{\infty}}\times Q_{Y^{\infty}}) (208)
supPXinfQYlogβ(ϵ,N,𝒩)(PX×PY|X,PX×QY),\displaystyle\leq\sup_{P_{X^{\infty}}}\inf_{Q_{Y^{\infty}}}-\log\beta_{(\epsilon,N,\mathcal{N})}(P_{X^{\infty}}\times P_{Y|X}^{\infty},P_{X^{\infty}}\times Q_{Y^{\infty}}), (209)

where (208) follows since the choice QYQ_{Y^{\infty}} is arbitrary. ∎

To prove (27), we apply Theorem 9 and get

logMlogβ(ϵ,N,𝒩)(PY|X,PY),\displaystyle\log M\leq-\log\beta_{(\epsilon,N,\mathcal{N})}(P_{Y|X}^{\infty},P_{Y}^{\infty}), (210)

where PYP_{Y} is the capacity-achieving output distribution, and 𝒩={0,dN,2dN,}\mathcal{N}=\{0,d_{N},2d_{N},\dots\}. The reduction from Theorem 9 to (210) follows since logPY|X(Y|x)PY(Y)\log\frac{P_{Y|X}(Y|x)}{P_{Y}(Y)} has the same distribution for all x𝒳x\in\mathcal{X} for Cover–Thomas symmetric channels [43, p. 190]. In the remainder of the proof, we derive an upper bound for the right-hand side of (210).

Consider any SHT $(\delta,\tau,\mathcal{N})$ with $\mathbb{E}_{0}[\tau]\leq N$ and $\mathbb{E}_{1}[\tau]\leq N$. Our definition in (201) is slightly different from the classical SHT definition in [52] since our definition allows one to make a decision at time 0. Notice that at time 0, any test has three choices: decide $H_{0}$, decide $H_{1}$, or decide to start taking samples. When the test decides to start taking samples, the remainder of the procedure becomes a classical SHT. From this observation, any test satisfies

ϵα\displaystyle\epsilon\geq\alpha =ϵ0+(1ϵ0ϵ1)αϵ0\displaystyle=\epsilon_{0}+(1-\epsilon_{0}-\epsilon_{1})\alpha^{\prime}\geq\epsilon_{0} (211)
β\displaystyle\beta =ϵ1+(1ϵ0ϵ1)β(1ϵ0)β,\displaystyle=\epsilon_{1}+(1-\epsilon_{0}-\epsilon_{1})\beta^{\prime}\geq(1-\epsilon_{0})\beta^{\prime}, (212)

where at time 0, the test decides HiH_{i} with probability ϵ1i\epsilon_{1-i}, and α\alpha^{\prime} and β\beta^{\prime} are the type-I and type-II error probabilities conditioned on the event that the test decides to take samples at time 0, which occurs with probability 1ϵ0ϵ11-\epsilon_{0}-\epsilon_{1}.

Let $\tau^{\prime}$ denote the stopping time of the test with error probabilities $(\alpha^{\prime},\beta^{\prime})$, conditioned on the event that the test starts taking samples at time 0. We have

𝔼0[τ]\displaystyle\mathbb{E}_{0}[\tau] =(1ϵ0ϵ1)𝔼0[τ]\displaystyle=(1-\epsilon_{0}-\epsilon_{1})\mathbb{E}_{0}[\tau^{\prime}] (213)
=(1ϵ0)(𝔼0[τ]+eO(N))N\displaystyle=(1-\epsilon_{0})(\mathbb{E}_{0}[\tau^{\prime}]+e^{-O(N)})\leq N (214)
𝔼1[τ]\displaystyle\mathbb{E}_{1}[\tau] =(1ϵ0ϵ1)𝔼1[τ]\displaystyle=(1-\epsilon_{0}-\epsilon_{1})\mathbb{E}_{1}[\tau^{\prime}] (215)
=(1ϵ0)(𝔼1[τ]+eO(N))N\displaystyle=(1-\epsilon_{0})(\mathbb{E}_{1}[\tau^{\prime}]+e^{-O(N)})\leq N (216)

since β\beta decays exponentially with 𝔼0[τ]\mathbb{E}_{0}[\tau].

The following argument is similar to that in [53, Sec. V-C]. Set an arbitrary ν>0\nu>0 and the thresholds

a~0\displaystyle\tilde{a}_{0} =C(N1ϵ0dN2o(dN)+ν)\displaystyle=C\left(\frac{N}{1-\epsilon_{0}}-\frac{d_{N}}{2}-o(d_{N})+{\nu}\right) (217)
a~1\displaystyle\tilde{a}_{1} =D(PYPY|X=x)(N1ϵ0dN2o(dN)+ν),\displaystyle=D(P_{Y}\|P_{Y|X=x})\left(\frac{N}{1-\epsilon_{0}}-\frac{d_{N}}{2}-o(d_{N})+{\nu}\right), (218)

where x𝒳x\in\mathcal{X} is arbitrary, and let (δ~,τ~,𝒩)(\tilde{\delta},\tilde{\tau},\mathcal{N}) be the SPRT associated with the thresholds (a~1,a~0)(-\tilde{a}_{1},\tilde{a}_{0}), and type-I and type-II error probabilities α~\tilde{\alpha} and β~\tilde{\beta}.

Applying [36, eq. (3.56)] to (196), we get

𝔼0[τ~]\displaystyle\mathbb{E}_{0}[\tilde{\tau}] =a~0C+dN2+o(dN)\displaystyle=\frac{\tilde{a}_{0}}{C}+\frac{d_{N}}{2}+o(d_{N}) (219)
𝔼1[τ~]\displaystyle\mathbb{E}_{1}[\tilde{\tau}] =a~1D(PYPY|X=x)+dN2+o(dN).\displaystyle=\frac{\tilde{a}_{1}}{D(P_{Y}\|P_{Y|X=x})}+\frac{d_{N}}{2}+o(d_{N}). (220)

Combining (217)–(220) gives

𝔼0[τ~]\displaystyle\mathbb{E}_{0}[\tilde{\tau}] N1ϵ0+ν\displaystyle\geq\frac{N}{1-\epsilon_{0}}+{\nu} (221)
𝔼1[τ~]\displaystyle\mathbb{E}_{1}[\tilde{\tau}] N1ϵ0+ν.\displaystyle\geq\frac{N}{1-\epsilon_{0}}+{\nu}. (222)

Letting ν=O(1N)\nu=O\left(\frac{1}{N}\right), it follows from (213)–(216) and (221)–(222) that

𝔼0[τ~]\displaystyle\mathbb{E}_{0}[\tilde{\tau}] 𝔼0[τ]\displaystyle\geq\mathbb{E}_{0}[\tau^{\prime}] (223)
𝔼1[τ~]\displaystyle\mathbb{E}_{1}[\tilde{\tau}] 𝔼1[τ]\displaystyle\geq\mathbb{E}_{1}[\tau^{\prime}] (224)

for a large enough NN. Using Wald and Wolfowitz’s SPRT optimality result [54], we get

α\displaystyle\alpha^{\prime} α~\displaystyle\geq\tilde{\alpha} (225)
β\displaystyle\beta^{\prime} β~.\displaystyle\geq\tilde{\beta}. (226)

Now it only remains to lower bound β~\tilde{\beta}. Applying [36, Th. 3.1.2, 3.1.3] and (181) gives

β~=ζ~ea~0(1+o(1)),\displaystyle\tilde{\beta}=\tilde{\zeta}e^{-\tilde{a}_{0}}(1+o(1)), (227)

where

\displaystyle\tilde{\zeta}=\frac{1}{d_{N}C}\exp\left\{-\sum_{n=1}^{\infty}\frac{1}{n}\left(\mathbb{P}_{0}[S_{n}<0]+\mathbb{P}_{1}[S_{n}>0]\right)\right\}, (228)

and SnS_{n} is as in (186). Since SnS_{n} is a sum of ndNnd_{N}\to\infty i.i.d. random variables, where the summands have a non-zero mean, the Chernoff bound implies that each of the probabilities in (228) decays exponentially with dNd_{N}. Thus,

ζ~=1dNC(1+o(1)).\displaystyle\tilde{\zeta}=\frac{1}{d_{N}C}(1+o(1)). (229)

From (217) and (229), we get

logβ~=C(N1ϵ0dN2o(dN)+o(logdN))\displaystyle-\log\tilde{\beta}=C\left(\frac{N}{1-\epsilon_{0}}-\frac{d_{N}}{2}-o(d_{N})+o(\log d_{N})\right) (230)
C(N1ϵdN2o(dN)+o(logdN)),\displaystyle\leq C\left(\frac{N}{1-\epsilon}-\frac{d_{N}}{2}-o(d_{N})+o(\log d_{N})\right), (231)

where (231) follows from (211). Inequalities (212), (226), and (231) imply (27).

Appendix D Proofs for the DM-MAC

In this section, we prove our main results for the DM-MAC, beginning with Theorem 4, which is used to prove Theorem 5.

D.I Proof of Theorem 4

For each transmitter $k\in[K]$, we generate $M_{k}$ codewords of dimension $n_{L}$ i.i.d. from $P_{X_{k}}^{n_{L}}$. Codewords for distinct transmitters are drawn independently of each other. Denote the codeword for transmitter $k$ and message $m_{k}$ by $X_{k}^{n_{L}}(m_{k})$ for $k\in[K]$ and $m_{k}\in[M_{k}]$. The proof extends the DM-PPC achievability bound in Theorem 1, which is based on a sub-optimal SHT, to the DM-MAC. Below, we explain the differences.

Without loss of generality, assume that m[K]=𝟏m_{[K]}=\bm{1} is transmitted. The hypothesis test in (72)–(73) is replaced by

H0\displaystyle H_{0} :(X[K]nL,YKnL)(k=1KPXknL)×PYK|X[K]nL\displaystyle\colon(X_{[K]}^{n_{L}},Y_{K}^{n_{L}})\sim\left(\prod_{k=1}^{K}P_{X_{k}}^{n_{L}}\right)\times P_{Y_{K}|X_{[K]}}^{n_{L}} (232)
H1\displaystyle H_{1} :(X[K]nL,YKnL)(k=1KPXknL)×PYKnL,\displaystyle\colon(X_{[K]}^{n_{L}},Y_{K}^{n_{L}})\sim\left(\prod_{k=1}^{K}P_{X_{k}}^{n_{L}}\right)\times P_{Y_{K}}^{n_{L}}, (233)

which is run for every message tuple m[K]k=1K[Mk]m_{[K]}\in\prod\limits_{k=1}^{K}[M_{k}]. The information density (80), the stopping times (81)–(82), and the decision rule (83) are extended to the DM-MAC as

Sm[K],n\displaystyle S_{m_{[K]},n_{\ell}} ıK(X[K]n(m[K]);YKn)\displaystyle\triangleq\imath_{K}(X_{[K]}^{n_{\ell}}(m_{[K]});Y_{K}^{n_{\ell}}) (234)
τm[K]\displaystyle\tau_{m_{[K]}} inf{n𝒩:Sm[K],nγ}\displaystyle\triangleq\inf\{n_{\ell}\in\mathcal{N}\colon S_{m_{[K]},n_{\ell}}\geq\gamma\} (235)
τ~m[K]\displaystyle\tilde{\tau}_{m_{[K]}} min{τm[K],nL}\displaystyle\triangleq\min\{\tau_{m_{[K]}},n_{L}\} (236)
δm[K]\displaystyle\delta_{m_{[K]}} {0if Sm[K],nγ1if Sm[K],n<γ\displaystyle\triangleq\begin{cases}0&\text{if }S_{m_{[K]},n_{\ell}}\geq\gamma\\ 1&\text{if }S_{m_{[K]},n_{\ell}}<\gamma\end{cases} (237)

for every message tuple m[K]m_{[K]} and decoding time nn_{\ell}. For brevity, let (X[K]n,YKn,X¯[K]n)(X_{[K]}^{n_{\ell}},Y_{K}^{n_{\ell}},\bar{X}_{[K]}^{n_{\ell}}) be drawn i.i.d. according to the joint distribution

PX[K],YK,X¯[K](x[K],y,x¯[K])\displaystyle P_{X_{[K]},Y_{K},\bar{X}_{[K]}}(x_{[K]},y,\bar{x}_{[K]})
=PYK|X[K](y|x[K])k=1KPXk(xk)PXk(x¯k).\displaystyle=P_{Y_{K}|X_{[K]}}(y|x_{[K]})\prod_{k=1}^{K}P_{X_{k}}(x_{k})P_{X_{k}}(\bar{x}_{k}). (238)
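To make the stopping rule (234)–(237) concrete, the following toy Python sketch simulates it for a noiseless binary adder MAC ($Y=X_{1}+X_{2}$ with i.i.d. Bernoulli(1/2) inputs) with two transmitters, a small codebook, and an arbitrarily chosen threshold. The channel, the parameter values, and the tie-breaking by enumeration order are assumptions made for this illustration only and are not part of the code construction analyzed in the proof.

```python
# A toy sketch of the threshold rule (234)-(237) for a noiseless binary adder
# MAC Y = X_1 + X_2 with X_k ~ Bern(1/2).  The channel, the codebook size, the
# decoding times, and the threshold are assumptions made for this illustration.
import itertools
import numpy as np

rng = np.random.default_rng(5)
K, M = 2, 8
times = [40, 55, 70]                          # decoding times n_1 < n_2 < n_3
gamma = K * np.log(M) + np.log(times[-1])     # threshold, in the spirit of (257)
pY = {0: 0.25, 1: 0.5, 2: 0.25}               # P_Y under Bern(1/2) inputs

def info_density(x1, x2, y):
    # i_2(x1, x2; y) = log P(y | x1, x2) / P(y); equals -inf when y != x1 + x2
    pr = np.array([pY[v] for v in y])
    return np.where(x1 + x2 == y, -np.log(pr), -np.inf)

books = [rng.integers(0, 2, size=(M, times[-1])) for _ in range(K)]  # codebooks
w = (3, 5)                                    # true message pair
y = books[0][w[0]] + books[1][w[1]]           # noiseless adder MAC output

decoded = None
for n in times:                               # first time some pair clears gamma
    for m1, m2 in itertools.product(range(M), repeat=2):
        if np.sum(info_density(books[0][m1][:n], books[1][m2][:n], y[:n])) >= gamma:
            decoded = (m1, m2, n)
            break
    if decoded:
        break
print("decoded (m1, m2) at time n:", decoded)   # expected: (3, 5, 40)
```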

Expected decoding time analysis: Following steps identical to (84)–(86), we get (41).

Error probability analysis: The following error analysis extends the PPC bounds in (79) and (89)–(97) to the DM-MAC.

In the analysis below, for brevity, we write m𝒜1m_{\mathcal{A}}\neq 1 to denote that mi1m_{i}\neq 1 for i𝒜i\in\mathcal{A}. The error probability is bounded as

ϵ\displaystyle\epsilon [m[K]𝟏{τm[K]τ𝟏<}{τ𝟏=}]\displaystyle\leq\mathbb{P}\bigg{[}\bigcup_{m_{[K]}\neq\mathbf{1}}\{\tau_{m_{[K]}}\leq\tau_{\mathbf{1}}<\infty\}\bigcup\{\tau_{\mathbf{1}}=\infty\}\bigg{]} (239)
[τ𝟏=]+[m[K]1{τm[K]<}]\displaystyle\leq\mathbb{P}\left[\tau_{\mathbf{1}}=\infty\right]+\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{[K]}\neq 1\end{subarray}}\{\tau_{m_{[K]}}<\infty\}\right] (240)
+[m[K]𝟏:i[K]mi=1{τm[K]<}],\displaystyle\quad+\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{[K]}\neq\mathbf{1}\colon\exists\,i\in[K]\\ m_{i}=1\end{subarray}}\{\tau_{m_{[K]}}<\infty\}\right], (241)

where (240)–(241) apply the union bound to separate the probabilities of the following error events:

  1. the information density of the true message tuple does not satisfy the threshold test for any available decoding time;

  2. the information density of a message tuple in which all messages are incorrect satisfies the threshold test for some decoding time;

  3. the information density of a message tuple in which the messages from some transmitters are correct and the messages from the other transmitters are incorrect satisfies the threshold test for some decoding time.

The terms in (240) are bounded using steps identical to (89)–(97) as

[τ𝟏=]\displaystyle\mathbb{P}\left[\tau_{\mathbf{1}}=\infty\right] [ıK(X[K]nL;YKnL)<γ]\displaystyle\leq\mathbb{P}\left[\imath_{K}(X_{[K]}^{n_{L}};Y_{K}^{n_{L}})<\gamma\right] (242)
[m[K]1{τm[K]<}]\displaystyle\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{[K]}\neq 1\end{subarray}}\{\tau_{m_{[K]}}<\infty\}\right] k=1K(Mk1)exp{γ}.\displaystyle\leq\prod_{k=1}^{K}(M_{k}-1)\exp\{-\gamma\}. (243)

For the cases where at least one message is decoded correctly, we delay the application of the union bound. Let 𝒜𝒫([K])\mathcal{A}\in\mathcal{P}([K]) be the set of transmitters whose messages are decoded correctly. Define

(𝒜)\displaystyle\mathcal{M}^{(\mathcal{A})} {m[K][M]K:mk=1 for k𝒜,\displaystyle\triangleq\{m_{[K]}\in[M]^{K}\colon m_{k}=1\text{ for }k\in\mathcal{A},
mk1 for k𝒜c}\displaystyle\quad\quad m_{k}\neq 1\text{ for }k\in\mathcal{A}^{\mathrm{c}}\} (244)
~(𝒜)\displaystyle\tilde{\mathcal{M}}^{(\mathcal{A})} {m𝒜[M]|𝒜|:mk1 for k𝒜}.\displaystyle\triangleq\{m_{\mathcal{A}}\in[M]^{|\mathcal{A}|}\colon m_{k}\neq 1\text{ for }k\in\mathcal{A}\}. (245)

We first bound the probability term in (241) by applying the union bound according to which subset 𝒜\mathcal{A} of the transmitter set [K][K] is decoded correctly, and get

[m[K]𝟏:i[K]mi=1{τm[K]<}]\displaystyle\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{[K]}\neq\mathbf{1}\colon\exists\,i\in[K]\\ m_{i}=1\end{subarray}}\{\tau_{m_{[K]}}<\infty\}\right]
𝒜𝒫([K])[m[K](𝒜){τm[K]<}]\displaystyle\leq\sum_{\mathcal{A}\in\mathcal{P}([K])}\mathbb{P}\left[\bigcup_{m_{[K]}\in\mathcal{M}^{(\mathcal{A})}}\{\tau_{m_{[K]}}<\infty\}\right] (246)
=𝒜𝒫([K])[m𝒜c~(𝒜c)n𝒩{ıK(X¯𝒜cn(m𝒜c),X𝒜n;YKn)γ}],\displaystyle=\sum_{\mathcal{A}\in\mathcal{P}([K])}\mathbb{P}\left[\bigcup_{\begin{subarray}{c}m_{\mathcal{A}^{c}}\in\tilde{\mathcal{M}}^{(\mathcal{A}^{c})}\\ n_{\ell}\in\mathcal{N}\end{subarray}}\left\{\imath_{K}(\bar{X}_{\mathcal{A}^{c}}^{n_{\ell}}(m_{\mathcal{A}^{c}}),X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})\geq\gamma\right\}\right], (247)

where $\bar{X}_{\mathcal{A}^{\mathrm{c}}}^{n_{\ell}}(m_{\mathcal{A}^{\mathrm{c}}})$ refers to a random sample from the codebooks of the transmitters in $\mathcal{A}^{\mathrm{c}}$, independent of the codewords $X_{\mathcal{A}^{\mathrm{c}}}^{n_{\ell}}$ transmitted by the transmitters in $\mathcal{A}^{\mathrm{c}}$ and of the received output $Y^{n_{\ell}}$.

We bound the right-hand side of (247) using the same method as in [28, eq. (65)–(66)]. This step is crucial in enabling the single-threshold rule for the rate vectors approaching a point on the sum-rate boundary. Set an arbitrary λ(𝒜)>0\lambda^{(\mathcal{A})}>0. Define two events

(𝒜)\displaystyle\mathcal{E}(\mathcal{A}) n𝒩{ıK(X𝒜n;YKn)>NIK(X𝒜;YK)+Nλ(𝒜)}\displaystyle\triangleq\bigcup_{n_{\ell}\in\mathcal{N}}\left\{\imath_{K}(X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})>NI_{K}(X_{\mathcal{A}};Y_{K})+N\lambda^{(\mathcal{A})}\right\} (248)
(𝒜)\displaystyle\mathcal{F}(\mathcal{A}) m𝒜c~(𝒜c)n𝒩{ıK(X¯𝒜cn(m𝒜c),X𝒜n;YKn)γ}.\displaystyle\triangleq\bigcup_{\begin{subarray}{c}m_{\mathcal{A}^{c}}\in\tilde{\mathcal{M}}^{(\mathcal{A}^{c})}\\ n_{\ell}\in\mathcal{N}\end{subarray}}\left\{\imath_{K}(\bar{X}_{\mathcal{A}^{c}}^{n_{\ell}}(m_{{\mathcal{A}}^{c}}),X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})\geq\gamma\right\}. (249)

Define the threshold

γ¯(𝒜)γNIK(X𝒜;YK)Nλ(𝒜).\displaystyle\bar{\gamma}^{(\mathcal{A})}\triangleq\gamma-NI_{K}(X_{\mathcal{A}};Y_{K})-N\lambda^{(\mathcal{A})}. (250)

We have

[(𝒜)]\displaystyle\mathbb{P}\left[\mathcal{F}(\mathcal{A})\right]
=[(𝒜)(𝒜)]+[(𝒜)(𝒜)c]\displaystyle=\mathbb{P}\left[\mathcal{F}(\mathcal{A})\cap\mathcal{E}(\mathcal{A})\right]+\mathbb{P}\left[\mathcal{F}(\mathcal{A})\cap\mathcal{E}(\mathcal{A})^{c}\right] (251)
[(𝒜)]\displaystyle\leq\mathbb{P}\left[\mathcal{E}(\mathcal{A})\right]
+[m𝒜c~(𝒜c)n𝒩{ıK(X¯𝒜cn(m𝒜c);YKn|X𝒜n)γ¯(𝒜)}]\displaystyle+\mathbb{P}\Biggm{[}\bigcup_{\begin{subarray}{c}\begin{subarray}{c}m_{\mathcal{A}^{c}}\in\tilde{\mathcal{M}}^{(\mathcal{A}^{c})}\end{subarray}\\ n_{\ell}\in\mathcal{N}\end{subarray}}\bigg{\{}\imath_{K}(\bar{X}_{\mathcal{A}^{\mathrm{c}}}^{n_{\ell}}(m_{\mathcal{A}^{\mathrm{c}}});Y_{K}^{n_{\ell}}|X_{\mathcal{A}}^{n_{\ell}})\geq\bar{\gamma}^{(\mathcal{A})}\bigg{\}}\Biggm{]} (252)
n𝒩[ıK(X𝒜n;YKn)>NIK(X𝒜;YK)+Nλ(𝒜)]\displaystyle\leq\sum_{n_{\ell}\in\mathcal{N}}\mathbb{P}\left[\imath_{K}(X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})>NI_{K}(X_{\mathcal{A}};Y_{K})+N\lambda^{(\mathcal{A})}\right]
+k𝒜c(Mk1)[n𝒩{ıK(X¯𝒜cn;YKn|X𝒜n)γ¯(𝒜)}]\displaystyle+\prod_{k\in\mathcal{A}^{\mathrm{c}}}(M_{k}-1)\mathbb{P}\left[\bigcup_{n_{\ell}\in\mathcal{N}}\left\{\imath_{K}(\bar{X}_{\mathcal{A}^{\mathrm{c}}}^{n_{\ell}};Y_{K}^{n_{\ell}}|X_{\mathcal{A}}^{n_{\ell}})\geq\bar{\gamma}^{(\mathcal{A})}\right\}\right] (253)
n𝒩[ıK(X𝒜n;YKn)>NIK(X𝒜;YK)+Nλ(𝒜)]\displaystyle\leq\sum_{n_{\ell}\in\mathcal{N}}\mathbb{P}\left[\imath_{K}(X_{\mathcal{A}}^{n_{\ell}};Y_{K}^{n_{\ell}})>NI_{K}(X_{\mathcal{A}};Y_{K})+N\lambda^{(\mathcal{A})}\right]
+k𝒜c(Mk1)exp{γ¯(𝒜)},\displaystyle\quad+\prod_{k\in\mathcal{A}^{\mathrm{c}}}(M_{k}-1)\exp\{-\bar{\gamma}^{(\mathcal{A})}\}, (254)

where inequality (252) uses the chain rule for the information density, (253) applies the union bound, and (254) follows from [20, eq. (88)].

Applying the bound in (254) to each of the probabilities in (247) and plugging (242), (243), and (247) into (240)–(241), we complete the proof of Theorem 4.

D.II Proof of Theorem 5

We employ the stop-at-time-zero procedure in the proof sketch of Theorem 2 with ϵN=1NlogN\epsilon_{N}^{\prime}=\frac{1}{\sqrt{N^{\prime}\log N^{\prime}}}. Therefore, we first show that there exists an (N,L,M[K],1/NlogN)(N,L,M_{[K]},1/\sqrt{N\log N})-VLSF code satisfying

k=1KlogMk\displaystyle\sum_{k=1}^{K}\log M_{k} =NIKNlog(L)(N)VK\displaystyle=NI_{K}-\sqrt{N\log_{(L)}(N)V_{K}}
+O(NVKlog(L)(N)).\displaystyle\quad+O\left(\sqrt{\frac{NV_{K}}{\log_{(L)}(N)}}\right). (255)

We set the parameters

γ\displaystyle\gamma =nIKnlog(L+1)(n)VK[L]\displaystyle=n_{\ell}I_{K}-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V_{K}}\quad\forall\,\ell\in[L] (256)
=k=1KlogMk+logN\displaystyle=\sum_{k=1}^{K}\log M_{k}+\log N (257)
λ(𝒜)\displaystyle\lambda^{(\mathcal{A})} =NIK(X𝒜c;YK|X𝒜)k𝒜clogMk2N,𝒜𝒫([K]).\displaystyle=\frac{NI_{K}(X_{\mathcal{A}^{c}};Y_{K}|X_{\mathcal{A}})-\sum_{k\in\mathcal{A}^{c}}\log M_{k}}{2N},\quad\mathcal{A}\in\mathcal{P}([K]). (258)

Note that λ(𝒜)\lambda^{(\mathcal{A})}’s are bounded below by a positive constant for rate points lying on the sum-rate boundary.
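The decoding times implied by (256) can be computed numerically. The following sketch recovers $n_{1}<\dots<n_{L}$ from an equation of the same form by one-dimensional root finding, where $\log_{(j)}$ denotes the $j$-fold iterated logarithm; the constants $\gamma$, $I_{K}$, and $V_{K}$ are placeholders, and $L=3$ is used so that the iterated logarithms remain positive at moderate blocklengths.

```python
# A sketch of recovering the decoding times from an equation of the form (256):
# each n_l solves n*I_K - sqrt(n * log_(L-l+1)(n) * V_K) = gamma, where log_(j)
# is the j-fold iterated logarithm.  gamma, I_K, V_K are placeholder values;
# L = 3 keeps the iterated logarithms positive at moderate blocklengths.
import numpy as np
from scipy.optimize import brentq

def iter_log(n, j):                 # log_(j)(n); must stay positive on the bracket
    for _ in range(j):
        n = np.log(n)
    return n

def decoding_times(gamma, I, V, L):
    times = []
    for ell in range(1, L + 1):
        j = L - ell + 1
        f = lambda n: n * I - np.sqrt(n * iter_log(n, j) * V) - gamma
        times.append(brentq(f, gamma / I + 1e-6, 10 * gamma / I))
    return times

gamma, I_K, V_K, L = 150.0, 0.5, 0.3, 3
print([round(n, 1) for n in decoding_times(gamma, I_K, V_K, L)])  # increasing in l
```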

By Theorem 4, there exists a VLSF code with LL decoding times n1<n2<<nLn_{1}<n_{2}<\cdots<n_{L} such that the average decoding time is bounded as

Nn1+=1L1(n+1n)[ıK(X[K]n;YKn)<γ].\displaystyle N\leq n_{1}+\sum_{\ell=1}^{L-1}(n_{\ell+1}-n_{\ell})\mathbb{P}\left[\imath_{K}(X_{[K]}^{n_{\ell}};Y_{K}^{n_{\ell}})<\gamma\right]. (259)

Following the analysis in (125)–(128), we conclude that

n=N(1+o(1))\displaystyle n_{\ell}=N(1+o(1)) (260)

for all [L]\ell\in[L]. Applying the Chernoff bound to the probability terms in (39)–(40) using (256) and (260), we get that the sum of the terms in (39)–(40) is bounded by exp{NE}\exp\{-NE\} for some E>0E>0.

Applying Lemma 2 to the probability in (37) with (256) gives

[ıK(X[K]nL;YKnL)<γ]\displaystyle\mathbb{P}\left[\imath_{K}(X_{[K]}^{n_{L}};Y_{K}^{n_{L}})<\gamma\right] 12π1nL1lognL\displaystyle\leq\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{n_{L}}}\frac{1}{\sqrt{\log n_{L}}}
(1+O((lognL)(3/2)nL)).\displaystyle\quad\cdot\left(1+O\left(\frac{(\log n_{L})^{(3/2)}}{\sqrt{n_{L}}}\right)\right). (261)

Applying Theorem 4 with (257), (261), and the exponential bound on the sum of the terms in (39)–(40), we bound the error probability as

[𝗀τ(U,Yτ)W[K]]\displaystyle\mathbb{P}\left[\mathsf{g}_{\tau^{*}}(U,Y^{\tau^{*}})\neq W_{[K]}\right]
12π1N1logN(1+O((logN)(3/2)N))\displaystyle\leq\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{N}}\frac{1}{\sqrt{\log N}}\cdot\left(1+O\left(\frac{(\log N)^{(3/2)}}{\sqrt{N}}\right)\right)
+1N+exp{NE},\displaystyle\quad+\frac{1}{N}+\exp\{-NE\}, (262)

which is further bounded by 1NlogN\frac{1}{\sqrt{N\log N}} for NN large enough. Following steps identical to (125)–(133), we prove the existence of a VLSF code that satisfies (255) for the DM-MAC with LL decoding times and error probability 1NlogN\frac{1}{\sqrt{N\log N}}.

Finally, invoking (255) with LL replaced by L1L-1 and the stop-at-time-zero procedure in the proof sketch of Theorem 2 with ϵN=1NlogN\epsilon_{N}^{\prime}=\frac{1}{\sqrt{N^{\prime}\log N^{\prime}}}, we complete the proof of Theorem 5.

D.III Proof of (44)

The proof of (44) follows steps similar to those in the proof of [8, Th. 2]. Below, we detail the differences between the proofs of (44), Theorem 5, and [8, Th. 2].

  1.

    In (44), we choose cN+1cN+1 decoding times as ni=i1n_{i}=i-1 for i=1,,cN+1i=1,\dots,cN+1 for a sufficiently large constant c>1c>1. This differs from Theorem 5 where LL does not grow with NN (L=O(1)L=O(1)) and the gaps between consecutive decoding times can differ. In [8, Th. 2], any integer time is available for decoding, giving L=nmax=L=n_{\max}=\infty.

  2.

    We here set the parameter γ\gamma differently from how it was set in (256) and (257). The difference accounts for the error event that some of the messages are decoded incorrectly and some of the messages are decoded correctly. Specifically, we set

    γ\displaystyle\gamma =NIKa\displaystyle=NI_{K}-a (263)
    =k=1KlogMk+logN+b,\displaystyle=\sum_{k=1}^{K}\log M_{k}+\log N+b, (264)

    where aa is an upper bound on the information density ıK(X[K];YK)\imath_{K}(X_{[K]};Y_{K}), and bb is a positive constant to be determined later. Since the number of decoding times LL grows linearly with NN and c>1c>1, applying the Chernoff bound gives

    (37)+(39)+(40)exp{NE}\displaystyle\eqref{eq:true}+\eqref{eq:1errorS}+\eqref{eq:1errorExp}\leq\exp\{-NE\} (265)

    for some E>0E>0 and NN large enough. Hence, the error probability ϵ\epsilon in Theorem 4 is bounded by exp{b}N+exp{NE}\frac{\exp\{-b\}}{N}+\exp\{-NE\}, which can be further bounded by 1N\frac{1}{N} by appropriately choosing the constant bb.

    The term (37) disappears in [8, Th. 2] because nL=n_{L}=\infty; the terms (39) and (40) disappear in [8, Th. 2] because the channel is point-to-point. Therefore, bb is set to 0 in [8, Th. 2].

  3.

    We bound the average decoding time 𝔼[τ]\mathbb{E}\left[\tau^{*}\right] as

    𝔼[τ]\displaystyle\mathbb{E}\left[\tau^{*}\right] γ+aIK=N\displaystyle\leq\frac{\gamma+a}{I_{K}}=N (266)

    using Doob's optional stopping theorem as in [8, eqs. (106)–(107)], whereas $\mathbb{E}\left[\tau^{*}\right]$ in the proof of Theorem 5 is bounded via the tail probability of the information density.

    The steps above show the achievability of an (N,cN,M[K],1/N)(N,cN,M_{[K]},1/N) code with

    k=1KlogMk=NIKlogN+O(1).\displaystyle\sum_{k=1}^{K}\log M_{k}=NI_{K}-\log N+O(1). (267)
  4.

    Lastly, as in [8, Th. 2], we invoke the stop-at-time-zero procedure from the proof sketch of Theorem 2 with ϵN=1N\epsilon_{N}^{\prime}=\frac{1}{N^{\prime}}.

Appendix E Proof of Theorem 6

In Theorem 6, we employ a multiple hypothesis test at some early time $n_{0}$ to estimate the number of active transmitters, followed by VLSF MAC coding. Since the VLSF MAC codeword design employed in Theorem 5 is unchanged (up to the coding dimension), the VLSF MAC code employs a single nested codebook, as described in the proof below. If the test decides that the number of active transmitters is $\hat{k}=0$, then the decoder declares that no transmitters are active and stops the transmission at time $n_{0}$. If the estimated number of active transmitters is $\hat{k}\neq 0$, then the decoder decides to decode at one of the available times $n_{\hat{k},1},\dots,n_{\hat{k},L}$ using the decoder for the MAC with $\hat{k}$ transmitters.

E.I Encoding and decoding

Encoding: As in the DM-PPC and DM-MAC cases, the codewords are generated i.i.d. from the distribution PXnK,LP_{X}^{n_{K,L}}.

Decoding: The decoder combines a (K+1)(K+1)-ary hypothesis test and the threshold test that is used for the DM-MAC.

Multiple hypothesis test: Given distributions PYkP_{Y_{k}}, k{0,,K}k\in\{0,\dots,K\} where 𝒴K\mathcal{Y}_{K} is the common alphabet, we test the hypotheses

Hk:Yn0PYkn0,k{0,,K}.\displaystyle H_{k}\colon Y^{n_{0}}\sim P_{Y_{k}}^{n_{0}},\quad k\in\{0,\dots,K\}. (268)

The error probability constraints of our test are

[Decide Hs where s0|H0]\displaystyle\mathbb{P}\left[\text{Decide }H_{s}\text{ where }s\neq 0|H_{0}\right] ϵ0\displaystyle\leq\epsilon_{0} (269)
[Decide Hs where sk|Hk]\displaystyle\mathbb{P}\left[\text{Decide }H_{s}\text{ where }s\neq k|H_{k}\right] exp{n0E+o(n0)}\displaystyle\leq\exp\{-n_{0}E+o(n_{0})\} (270)

for k[K]k\in[K], where E>0E>0 is a constant.

Due to the asymmetry in (269)–(270), we employ a composite hypothesis test to decide whether $H_{0}$ is true; that is, the test declares $H_{0}$ if

logPY0n0(yn0)PYsn0(yn0)ηs\displaystyle\log\frac{P_{Y_{0}}^{n_{0}}(y^{n_{0}})}{P_{Y_{s}}^{n_{0}}(y^{n_{0}})}\geq\eta_{s} (271)

for all s[K]s\in[K], where the threshold values ηs\eta_{s}, s[K]s\in[K], are chosen to satisfy (269). If the condition in (271) is not satisfied, then the test applies the maximum likelihood decoding rule, i.e., the output is HsH_{s}, where

\displaystyle s=\arg\max_{j\in[K]}P_{Y_{j}}^{n_{0}}(y^{n_{0}}). (272)
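The following sketch shows the decision rule in (271)–(272) on a toy example; the output distributions, the blocklength $n_{0}$, and the thresholds $\eta_{s}$ below are illustrative choices and are not derived from (269).

```python
# A sketch of the decision rule (271)-(272): declare H_0 if every log-likelihood
# ratio against P_{Y_s} clears its threshold eta_s; otherwise use maximum
# likelihood among H_1, ..., H_K.  The distributions, n_0, and the thresholds
# below are illustrative choices, not derived from (269).
import numpy as np

def estimate_k(y, pmf_list, etas):
    loglik = [float(np.sum(np.log(pmf[y]))) for pmf in pmf_list]  # log P_{Y_k}^{n_0}(y)
    if all(loglik[0] - loglik[s] >= etas[s - 1] for s in range(1, len(pmf_list))):
        return 0                                     # composite test declares H_0
    return 1 + int(np.argmax(loglik[1:]))            # ML rule among H_1, ..., H_K

rng = np.random.default_rng(1)
pmfs = [np.array([0.90, 0.08, 0.02]),                # P_{Y_0}: mostly "silence"
        np.array([0.45, 0.45, 0.10]),                # P_{Y_1}
        np.array([0.20, 0.40, 0.40])]                # P_{Y_2}
n0, etas = 60, [0.0, 0.0]
y = rng.choice(3, size=n0, p=pmfs[2])                # n_0 outputs with k = 2 active
print("estimated number of active transmitters:", estimate_k(y, pmfs, etas))
```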

From the asymptotics of the error probability bound for composite hypothesis testing in [28, Th. 5], the maximum type-II error of the composite hypothesis test is bounded as

maxk[K][Decide H0|Hk]\displaystyle\max_{k\in[K]}\mathbb{P}\left[\text{Decide }H_{0}|H_{k}\right]
exp{n0mink[K]D(PY0PYk)+O(n0)}.\displaystyle\quad\leq\exp\left\{-n_{0}\min_{k\in[K]}D(P_{Y_{0}}\|P_{Y_{k}})+O(\sqrt{n_{0}})\right\}. (273)

If $P_{Y_{0}}$ is not absolutely continuous with respect to $P_{Y_{k}}$, then (273) remains valid with $D(P_{Y_{0}}\|P_{Y_{k}})=\infty$ since in that case an arbitrarily large type-II error exponent can be achieved (see [13, Lemmas 57–58]).

From [55], the maximum likelihood test yields

\displaystyle\max_{\substack{(k,s)\in[K]^{2}\\ k\neq s}}\mathbb{P}\left[\text{Decide }H_{s}|H_{k}\right]\leq\exp\{-n_{0}E_{C}+o(n_{0})\}, (274)

where

\displaystyle E_{C}=\min_{\substack{k,s\in[K]\\ k\neq s}}\left(-\min_{\lambda\in(0,1)}\log\sum_{y\in\mathcal{Y}_{K}}P_{Y_{k}}(y)^{1-\lambda}P_{Y_{s}}(y)^{\lambda}\right) (275)

is the minimum Chernoff distance between the pairs (PYk,PYs)(P_{Y_{k}},P_{Y_{s}}), ks[K]k\neq s\in[K]. Combining (273) and (275), the conditions in (269)–(270) are satisfied with

E=min{mink[K]D(PY0PYk),EC}>0.\displaystyle E=\min\left\{\min_{k\in[K]}D(P_{Y_{0}}\|P_{Y_{k}}),E_{C}\right\}>0. (276)
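For the same toy output distributions as in the previous sketch, the exponent $E$ in (276) can be evaluated numerically by computing the Chernoff information of each pair $(P_{Y_{k}},P_{Y_{s}})$ and the divergences $D(P_{Y_{0}}\|P_{Y_{k}})$; the distributions below are again illustrative placeholders.

```python
# A numerical sketch of the exponent E in (276) for the same toy output
# distributions as above: the Chernoff information of each pair (P_{Y_k}, P_{Y_s})
# and the divergences D(P_{Y_0} || P_{Y_k}); all values here are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_information(p, q):        # -min_lambda log sum_y p^(1-lambda) q^lambda
    obj = lambda lam: np.log(np.sum(p ** (1 - lam) * q ** lam))
    return -minimize_scalar(obj, bounds=(1e-6, 1 - 1e-6), method="bounded").fun

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

pmfs = [np.array([0.90, 0.08, 0.02]),
        np.array([0.45, 0.45, 0.10]),
        np.array([0.20, 0.40, 0.40])]

E_C = min(chernoff_information(pmfs[k], pmfs[s])
          for k in range(1, 3) for s in range(1, 3) if k != s)
E = min(min(kl(pmfs[0], pmfs[k]) for k in range(1, 3)), E_C)
print("E_C =", round(E_C, 4), " E =", round(E, 4))
```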

If the hypothesis test declares the hypothesis Hk^H_{\hat{k}}, k^0\hat{k}\neq 0, then the receiver decides to decode k^\hat{k} messages at one of the decoding times in {nk^,1,,nk^,L}\{n_{\hat{k},1},\dots,n_{\hat{k},L}\} using the VLSF code in Section D for the k^\hat{k}-MAC, where nk^,1n_{\hat{k},1} is set to n0n_{0}.

E.II Error analysis

We here bound the probability of error for the RAC code in Definition 4.

No active transmitters: For k=0k=0, the only error event is that the composite hypothesis test at time n0n_{0} does not declare H0H_{0} given that H0H_{0} is true. By (269), the probability of this event is bounded by ϵ0\epsilon_{0}.

k1k\geq 1 active transmitters: When there is at least one active transmitter, there is an error if and only if at least one of the following events occurs:

  • k^k\mathcal{E}_{\hat{k}\neq k}: The number of active transmitters is estimated incorrectly at time n0n_{0}, i.e., k^k\hat{k}\neq k, which results in decoding of k^\hat{k} messages instead of kk messages.

  • mes\mathcal{E}_{\textnormal{mes}}: A list of messages m[k]m[k]m_{[k]}^{\prime}\neq m_{[k]} is decoded at one of the times in {nk,1,,nk,L}\{n_{k,1},\dots,n_{k,L}\}.

In the following discussion, we bound the probability of these events separately, and apply the union bound to combine them.

Since the encoders are identical, under the event $\mathcal{E}_{\textnormal{rep}}\triangleq\{W_{i}=W_{j}\text{ for some }i\neq j\}$, in which two or more transmitters send the same message, the transmitted codewords are not independent. Since our analysis relies on the independence of the transmitted codewords, we treat $\mathcal{E}_{\textnormal{rep}}$ as an error, which simplifies the analysis. By the union bound, we have

[rep]k(k1)2M.\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]\leq\frac{k(k-1)}{2M}. (277)

Applying the union bound, we bound the error probability as

ϵk\displaystyle\epsilon_{k} \displaystyle\leq [rep]+[repc][k^kmes|repc]\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\cup\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right] (278)
\displaystyle\leq [rep]+[k^k|repc]+[mes|repck^kc].\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right].

By (270), the probability [k^k|repc]\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right] is bounded as

[k^k|repc]exp{n0E+o(n0)}.\displaystyle\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]\leq\exp\{-n_{0}E+o(n_{0})\}. (279)

Since the number of active transmitters kk is not available at the decoder at time 0, we here slightly modify the stop-at-time-zero procedure from the proof sketch of Theorem 2. We set the smallest decoding time nj,1n_{j,1} to n00n_{0}\neq 0 for all j[K]j\in[K]. Given the estimate k^\hat{k} of the number of active transmitters kk, we employ the stop-at-time-zero procedure with the triple (N,ϵ,ϵN)(N^{\prime},\epsilon,\epsilon_{N}^{\prime}) replaced by (Nk^,ϵk^,ϵNk^)(N_{\hat{k}}^{\prime},\epsilon_{\hat{k}},\epsilon_{N_{\hat{k}}^{\prime}}).

Let stop\mathcal{E}_{\textnormal{stop}} denote the event that the decoder chooses to stop at time nk,1=n0n_{k,1}=n_{0}. We bound [mes|repck^kc]\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right] as

[mes|repck^kc][stop|repck^kc]\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]\leq\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]
+[stopc|repck^kc][mes|repck^kcstopc].\displaystyle\quad+\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}^{\mathrm{c}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\cap\mathcal{E}^{\mathrm{c}}_{\textnormal{stop}}\right]. (280)

By Theorem 4, when the RAC decoder decodes a list of $k$ messages from $[M]$ at time $n_{k,\ell}$, we get

[mes|repck^kcstopc]\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\cap\mathcal{E}^{\mathrm{c}}_{\textnormal{stop}}\right] (281)
[ık(X[k]nk,L;Yknk,L)<γk]\displaystyle\quad\leq\mathbb{P}\left[\imath_{k}(X_{[k]}^{n_{k,L}};Y_{k}^{n_{k,L}})<\gamma_{k}\right] (282)
+(Mkk)exp{γk}\displaystyle\quad\quad+\binom{M-k}{k}\exp\{-\gamma_{k}\} (283)
+=2L𝒜𝒫([k])\displaystyle\quad\quad+\sum_{\ell=2}^{L}\sum_{\mathcal{A}\in\mathcal{P}([k])}
[ık(X𝒜nk,;Yknk,)>NkIk(X𝒜;Yk)+Nkλ(k,𝒜)]\displaystyle\quad\quad\quad\mathbb{P}\left[\imath_{k}(X_{\mathcal{A}}^{n_{k,\ell}};Y_{k}^{n_{k,\ell}})>N_{k}^{\prime}I_{k}(X_{\mathcal{A}};Y_{k})+N_{k}^{\prime}\lambda^{(k,\mathcal{A})}\right] (284)
+𝒜𝒫([k])(Mk|𝒜|)\displaystyle\quad\quad+\sum_{\mathcal{A}\in\mathcal{P}([k])}\binom{M-k}{|\mathcal{A}|}
\displaystyle\quad\quad\quad\exp\{-\gamma_{k}+N_{k}^{\prime}I_{k}(X_{\mathcal{A}};Y_{k})+N_{k}^{\prime}\lambda^{(k,\mathcal{A})}\}, (285)

where NkN_{k}^{\prime} is the average decoding time given stopc\mathcal{E}_{\textnormal{stop}}^{c}, and γk\gamma_{k} and λ(k,𝒜)\lambda^{(k,\mathcal{A})} are constants chosen to satisfy the equations

γk\displaystyle\gamma_{k} =nk,Iknk,log(L+1)(nk,)Vk\displaystyle=n_{k,\ell}I_{k}-\sqrt{n_{k,\ell}\log_{(L-\ell+1)}(n_{k,\ell})V_{k}} (286)
=klogM+logNk+O(1)\displaystyle=k\log M+\log N_{k}^{\prime}+O(1) (287)

for all {2,,L}\ell\in\{2,\dots,L\}, and

λ(k,𝒜)\displaystyle\lambda^{(k,\mathcal{A})} =NkIk(X𝒜c;Yk|X𝒜)|𝒜c|logM2Nk,𝒜𝒫([k]).\displaystyle=\frac{N_{k}^{\prime}I_{k}(X_{\mathcal{A}^{c}};Y_{k}|X_{\mathcal{A}})-|\mathcal{A}^{c}|\log M}{2N_{k}^{\prime}},\quad\mathcal{A}\in\mathcal{P}([k]). (288)

The fact that each λ(k,𝒜)\lambda^{(k,\mathcal{A})} is bounded below by a positive constant follows from (287), [28, Lemma 1], and the symmetry assumptions on the RAC.

Following the analysis in Appendix D.II, we conclude that

klogM=NkIkNklog(L1)(Nk)Vk\displaystyle k\log M=N_{k}^{\prime}I_{k}-\sqrt{N_{k}^{\prime}\log_{(L-1)}(N_{k}^{\prime})V_{k}}
+O(NkVklog(L1)(Nk))\displaystyle\quad+O\left(\sqrt{\frac{N_{k}^{\prime}V_{k}}{\log_{(L-1)}(N_{k}^{\prime})}}\right) (289)
[mes|repck^kcstopc]1NklogNk.\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\cap\mathcal{E}^{\mathrm{c}}_{\textnormal{stop}}\right]\leq\frac{1}{\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}. (290)

Note that by (277) and (289), the bound on [rep]\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right] decays exponentially with NkN_{k}. A consequence of (286) and (289) is that

Nk=nk,(1+o(1))\displaystyle N_{k}^{\prime}=n_{k,\ell}(1+o(1)) (291)

for all 2\ell\geq 2 and k[K]k\in[K].

Note that from (289), the right-hand side of (277) is bounded by 1Nk\frac{1}{N_{k}^{\prime}} for NkN_{k}^{\prime} large enough. We set the time n0n_{0} so that the right-hand side of (279) is bounded by 14NklogNk\frac{1}{4\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}} for all k[K]k\in[K]. This condition is satisfied if

n012ElogNk+o(logNk).\displaystyle n_{0}\geq\frac{1}{2E}\log N_{k}^{\prime}+o(\log N_{k}^{\prime}). (292)

The above arguments imply that

[rep]+[k^k|repc]12NklogNk\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]\leq\frac{1}{2\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}} (293)

for NkN_{k}^{\prime} large enough. As in the DM-MAC case, we set

p[stop|repck^kc]=ϵk1NklogNk11NklogNk\displaystyle p\triangleq\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]=\frac{\epsilon_{k}^{\prime}-\frac{1}{\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}}{1-\frac{1}{\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}} (294)

where

ϵk=ϵk12NklogNk.\displaystyle\epsilon_{k}^{\prime}=\epsilon_{k}-\frac{1}{2\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}. (295)

Combining (278), (280), (290), and (293)–(294), the error probability of the RAC code is bounded by

[rep]+[k^k|repc]+[stop|repck^kc]\displaystyle\mathbb{P}\left[\mathcal{E}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\right]+\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]
+[stopc|repck^kc][mes|repck^kcstopc]\displaystyle+\mathbb{P}\left[\mathcal{E}_{\textnormal{stop}}^{\mathrm{c}}|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\right]\mathbb{P}\left[\mathcal{E}_{\textnormal{mes}}\middle|\mathcal{E}^{\mathrm{c}}_{\textnormal{rep}}\cap\mathcal{E}^{\mathrm{c}}_{\hat{k}\neq k}\cap\mathcal{E}^{\mathrm{c}}_{\textnormal{stop}}\right] (296)
12NklogNk+p+(1p)1NklogNk\displaystyle\quad\leq\frac{1}{2\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}+p+(1-p)\frac{1}{\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}} (297)
=ϵk.\displaystyle\quad=\epsilon_{k}. (298)

The average decoding time of the code is bounded as

Nk\displaystyle N_{k} 𝔼[τk|k^krep][k^krep]\displaystyle\leq\mathbb{E}\left[\tau_{k}^{*}|\mathcal{E}_{\hat{k}\neq k}\cup\mathcal{E}_{\textnormal{rep}}\right]\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}\cup\mathcal{E}_{\textnormal{rep}}\right]
\displaystyle\quad+\mathbb{E}\left[\tau_{k}^{*}|\mathcal{E}_{\hat{k}\neq k}^{\mathrm{c}}\cap\mathcal{E}_{\textnormal{rep}}^{\mathrm{c}}\right]\mathbb{P}\left[\mathcal{E}_{\hat{k}\neq k}^{\mathrm{c}}\cap\mathcal{E}_{\textnormal{rep}}^{\mathrm{c}}\right] (299)
\displaystyle\leq\frac{n_{K,L}}{2\sqrt{N_{k}^{\prime}\log N_{k}^{\prime}}}+n_{0}p+N_{k}^{\prime}(1-p). (300)

From (291) and (294)–(295), we get

Nk=Nk1ϵk(1+O(1NklogNk)).\displaystyle N_{k}^{\prime}=\frac{N_{k}}{1-\epsilon_{k}^{\prime}}\left(1+O\left(\frac{1}{\sqrt{N_{k}\log N_{k}}}\right)\right). (301)

Plugging (301) into (289) completes the proof.

Appendix F Proof of Theorem 7

The non-asymptotic achievability bound in Theorem 1 applies to the Gaussian PPC with maximal power constraint PP (58) with the modification that the error probability (18) has an additional term for power constraint violations

[=1L{Xn2>nP}].\displaystyle\mathbb{P}\left[\bigcup_{\ell=1}^{L}\left\{\left\lVert X^{n_{\ell}}\right\rVert^{2}>n_{\ell}P\right\}\right]. (302)

The proof follows similarly to the proof of Theorem 2 as we employ the stop-at-time-zero procedure in the proof sketch of Theorem 2. We extend Lemma 1 to the Gaussian PPC, showing

logM(N,L,1NlogN,P)\displaystyle\log M^{*}\left(N,L,\frac{1}{\sqrt{N\log N}},P\right)
NC(P)Nlog(L)(N)V(P)+O(Nlog(L)(N)).\displaystyle\geq{NC(P)}-\sqrt{N\log_{(L)}(N)\,V(P)}+O\left(\sqrt{\frac{N}{\log_{(L)}(N)}}\right). (303)

The input distribution $P_{X^{n_{L}}}$ used in the proof of (303) differs from the one used in the proof of Lemma 1, which changes the analysis of the probability $\mathbb{P}\left[\imath(X^{n_{L}};Y^{n_{L}})<\gamma\right]$ and of the threshold $\gamma$ in (113). Below, we detail these differences.

F.1 The input distribution PXnLP_{X^{n_{L}}}

We choose the distribution of the random codewords, PXnLP_{X^{n_{L}}}, in Theorem 1 as follows. Set n0=0n_{0}=0. For each codeword, independently draw sub-codewords Xnj1+1:njX^{n_{j-1}+1:n_{j}}, j[L]j\in[L] from the uniform distribution on the (njnj1)(n_{j}-n_{j-1})-dimensional sphere of radius (njnj1)P\sqrt{(n_{j}-n_{j-1})P}. Let PXnLP_{X^{n_{L}}} denote the distribution of the length-nLn_{L} random codewords described above. Since codewords chosen under PXnLP_{X^{n_{L}}} never violate the power constraint (58), the power violation probability in (302) is 0. Furthermore, the power constraint is satisfied with equality at each of the dimensions n1,,nLn_{1},\dots,n_{L}; our analysis in [39] shows that for any finite LL, and sufficiently large increments nn1n_{\ell}-n_{\ell-1} for all [L]\ell\in[L], using this restricted subset instead of the entire nLn_{L}-dimensional power sphere results in no change in the asymptotic expansion (60) for the fixed-length no-feedback codes up to the third-order term.
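A minimal sketch of this codeword generation (with hypothetical decoding times and power) is given below: each sub-codeword is an i.i.d. Gaussian vector scaled to the corresponding sphere, so the power constraint (58) is met with equality at every decoding time.

```python
# A minimal sketch (hypothetical decoding times and power) of drawing one
# codeword from the input distribution P_{X^{n_L}} described above: each
# sub-codeword is an i.i.d. Gaussian vector scaled to the sphere of radius
# sqrt((n_j - n_{j-1}) P), so the power constraint (58) holds with equality
# at every decoding time.
import numpy as np

def draw_codeword(times, P, rng):
    blocks, prev = [], 0
    for n in times:
        d = n - prev                                  # sub-codeword dimension
        g = rng.standard_normal(d)
        blocks.append(np.sqrt(d * P) * g / np.linalg.norm(g))
        prev = n
    return np.concatenate(blocks)

rng = np.random.default_rng(7)
times, P = [100, 180, 250], 2.0
x = draw_codeword(times, P, rng)
print([round(float(np.sum(x[:n] ** 2) / n), 3) for n in times])   # ~ [2.0, 2.0, 2.0]
```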

F.2 Bounding the probability of the information density random variable

We begin by bounding the probability

[ı(Xn;Yn)<γ],[L],\displaystyle\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right],\quad\ell\in[L], (304)

that appears in Theorem 1 under the input distribution described above. Under this choice of PXnLP_{X^{n_{L}}}, the random variable ı(Xn;Yn)\imath(X^{n_{\ell}};Y^{n_{\ell}}) is not a sum of nn_{\ell} i.i.d. random variables. We wish to apply the moderate deviations result in Lemma 2. To do this, we first introduce the following lemma from [56], which uniformly bounds the Radon-Nikodym derivative of the channel output distribution in response to the uniform distribution on a sphere as compared to the channel output distribution in response to i.i.d. Gaussian inputs.

Lemma 5 (MolavianJazi and Laneman [56, Prop. 2])

Let XnX^{n} be distributed uniformly over the nn-dimensional sphere of radius nP\sqrt{nP}. Let X~n𝒩(𝟎,P𝖨n)\tilde{X}^{n}\sim\mathcal{N}(\mathbf{0},P\mathsf{I}_{n}). Let PYnP_{Y^{n}} and PY~nP_{\tilde{Y}^{n}} denote the channel output distributions in response to PXnP_{X^{n}} and PX~nP_{\tilde{X}^{n}}, respectively, where PYn|XnP_{Y^{n}|X^{n}} is the point-to-point Gaussian channel (55). Then there exists an n0n_{0}\in\mathbb{N} such that for all nn0n\geq n_{0} and ynny^{n}\in\mathbb{R}^{n}, it holds that

dPYn(yn)dPY~n(yn)\displaystyle\frac{\mathrm{d}P_{Y^{n}}(y^{n})}{\mathrm{d}P_{\tilde{Y}^{n}}(y^{n})} J(P)27π81+P1+2P.\displaystyle\leq J(P)\triangleq{27}{\sqrt{\frac{\pi}{8}}}\frac{1+P}{\sqrt{1+2P}}. (305)

Let PY~nP_{\tilde{Y}}^{n_{\ell}} be 𝒩(𝟎,(1+P)𝖨n)\mathcal{N}(\mathbf{0},(1+P)\mathsf{I}_{n_{\ell}}). By Lemma 5, we bound (304)\eqref{eq:probimath} as

\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right]
\displaystyle=\mathbb{P}\left[\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}<\gamma+\log\frac{\mathrm{d}P_{Y^{n_{\ell}}}(Y^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}\right] (306)
\displaystyle\leq\mathbb{P}\left[\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}<\gamma+\ell\log J(P)\right], (307)

where J(P)J(P) is the constant given in (305), and (307) follows from the fact that PYnP_{Y^{n_{\ell}}} is the product of \ell output distributions of dimensions njnj1,j[]n_{j}-n_{j-1},j\in[\ell], each induced by a uniform distribution over a sphere of the corresponding radius. As argued in [13, 35, 56, 39], by spherical symmetry, the distribution of the random variable

logdPYn|Xn(Yn|Xn)dPY~n(Yn)\displaystyle\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})} (308)

depends on XnX^{n_{\ell}} only through its norm Xn\left\lVert X^{n_{\ell}}\right\rVert. Since Xn2=nP\left\lVert X^{n_{\ell}}\right\rVert^{2}=n_{\ell}P with probability 1, any choice of xnx^{n_{\ell}} such that xni2=niP\left\lVert x^{n_{i}}\right\rVert^{2}=n_{i}P for i[]i\in[\ell] gives

[logdPYn|Xn(Yn|Xn)dPY~n(Yn)<γ+logJ(P)]=\displaystyle\mathbb{P}\left[\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}<\gamma+\ell\log J(P)\right]=
[logdPYn|Xn(Yn|Xn)dPY~n(Yn)<γ+logJ(P)|Xn=xn].\displaystyle\mathbb{P}\left[\log\frac{\mathrm{d}P_{Y^{n_{\ell}}|X^{n_{\ell}}}(Y^{n_{\ell}}|X^{n_{\ell}})}{\mathrm{d}P_{\tilde{Y}^{n_{\ell}}}(Y^{n_{\ell}})}<\gamma+\ell\log J(P)\middle|X^{n_{\ell}}=x^{n_{\ell}}\right]. (309)

We set xn=(P,P,,P)=P𝟏x^{n_{\ell}}=(\sqrt{P},\sqrt{P},\dots,\sqrt{P})=\sqrt{P}\mathbf{1} to obtain an i.i.d. sum in (309). Given Xn=P𝟏X^{n_{\ell}}=\sqrt{P}\mathbf{1}, the distribution of (308) is the same as the distribution of the sum

i=1nAi\displaystyle\sum_{i=1}^{n_{\ell}}A_{i} (310)

of nn_{\ell} i.i.d. random variables

Ai=C(P)+P2(1+P)(1Zi2+2PZi),i[n],\displaystyle A_{i}=C(P)+\frac{P}{2(1+P)}\left(1-Z_{i}^{2}+\frac{2}{\sqrt{P}}Z_{i}\right),\quad i\in[n_{\ell}], (311)

where Z1,,ZnZ_{1},\dots,Z_{n_{\ell}} are drawn independently from 𝒩(0,1)\mathcal{N}(0,1) (see e.g., [13, eq. (205)]). The mean and variance of A1A_{1} are

𝔼[A1]\displaystyle\mathbb{E}\left[A_{1}\right] =C(P)\displaystyle=C(P) (312)
Var[A1]\displaystyle\mathrm{Var}\left[A_{1}\right] =V(P).\displaystyle=V(P). (313)
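The closed forms $C(P)=\frac{1}{2}\log(1+P)$ and $V(P)=\frac{P(P+2)}{2(1+P)^{2}}$ (in nats) assumed in the following Monte Carlo sketch are consistent with (311); the sketch is only a numerical sanity check of (312)–(313).

```python
# A Monte Carlo sanity check (illustration only) of (312)-(313): the summands
# A_i in (311) have mean C(P) and variance V(P).  The closed forms
# C(P) = (1/2) log(1+P) and V(P) = P(P+2)/(2(1+P)^2), in nats, are assumed here.
import numpy as np

P, n = 2.0, 10 ** 6
Z = np.random.default_rng(3).standard_normal(n)
A = 0.5 * np.log(1 + P) + P / (2 * (1 + P)) * (1 - Z ** 2 + 2 / np.sqrt(P) * Z)

print("mean    :", A.mean(), " vs C(P) =", 0.5 * np.log(1 + P))
print("variance:", A.var(),  " vs V(P) =", P * (P + 2) / (2 * (1 + P) ** 2))
```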

From (307)–(310), we get

[ı(Xn;Yn)<γ][i=1nAi<γ+logJ(P)].\displaystyle\mathbb{P}\left[\imath(X^{n_{\ell}};Y^{n_{\ell}})<\gamma\right]\leq\mathbb{P}\left[\sum_{i=1}^{n_{\ell}}A_{i}<\gamma+\ell\log J(P)\right]. (314)

To verify that Lemma 2 is applicable to the right-hand side of (314), it only remains to show that $\mathbb{E}\left[(A_{1}-C(P))^{3}\right]$ is finite and that $A_{1}-C(P)$ satisfies Cramér's condition, that is, there exists some $t_{0}>0$ such that $\mathbb{E}\left[\exp\{t(A_{1}-C(P))\}\right]<\infty$ for all $|t|<t_{0}$. From (311), $(A_{1}-C(P))^{3}$ has the same distribution as a degree-6 polynomial of the Gaussian random variable $Z\sim\mathcal{N}(0,1)$. This polynomial has a finite mean since all moments of $Z$ are finite. Let $c\triangleq\frac{P}{2(1+P)}$, $f\triangleq\frac{2}{\sqrt{P}}$, and $t^{\prime}\triangleq tc$. To show that Cramér's condition holds, we compute

𝔼[exp{t(A1C(P))}]\displaystyle\mathbb{E}\left[\exp\{t(A_{1}-C(P))\}\right]
=𝔼[exp{t(1Z2+fZ)}]\displaystyle=\mathbb{E}\left[\exp\{t^{\prime}(1-Z^{2}+fZ)\}\right] (315)
=12πexp{x22+t(1x2+fx)}dx\displaystyle=\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{x^{2}}{2}+t^{\prime}(1-x^{2}+fx)\right\}\mathrm{d}x (316)
\displaystyle=\frac{1}{\sqrt{1+2t^{\prime}}}\exp\left\{t^{\prime}+\frac{(t^{\prime}f)^{2}}{2(1+2t^{\prime})}\right\}. (317)

Thus, 𝔼[exp{t(A1C(P))}]<\mathbb{E}\left[\exp\{t(A_{1}-C(P))\}\right]<\infty for t>12t^{\prime}>-\frac{1}{2}, and t0=12c>0t_{0}=\frac{1}{2c}>0 satisfies Cramér’s condition.
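As a numerical check of the closed form in (317) (illustration only), the following snippet compares direct numerical integration of $\mathbb{E}\left[\exp\{t^{\prime}(1-Z^{2}+fZ)\}\right]$ with the right-hand side of (317) for an arbitrary admissible $t$.

```python
# A numerical check (illustration only) of the closed form in (317): the moment
# generating function E[exp{t'(1 - Z^2 + f Z)}], Z ~ N(0,1), computed by direct
# integration and by the closed form, for an arbitrary admissible t.
import numpy as np
from scipy.integrate import quad

P = 2.0
c, f = P / (2 * (1 + P)), 2 / np.sqrt(P)
t = 0.8                                        # any |t| < t_0 = 1/(2c)
tp = t * c                                     # t' = t c

integrand = lambda x: np.exp(-x ** 2 / 2 + tp * (1 - x ** 2 + f * x)) / np.sqrt(2 * np.pi)
numeric, _ = quad(integrand, -np.inf, np.inf)
closed = np.exp(tp + (tp * f) ** 2 / (2 * (1 + 2 * tp))) / np.sqrt(1 + 2 * tp)
print(numeric, closed)                         # the two values agree
```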

F.3 The threshold γ\gamma

We set γ,n1,,nL\gamma,n_{1},\dots,n_{L} so that the equalities

γ=nC(P)nlog(L+1)(n)V(P)logJ(P)\displaystyle\gamma=n_{\ell}C(P)-\sqrt{n_{\ell}\log_{(L-\ell+1)}(n_{\ell})V(P)}-\ell\log J(P) (318)

hold for all [L]\ell\in[L].

The rest of the proof follows identically to (119)–(133) with CC and VV replaced by C(P)C(P) and V(P)V(P), respectively, giving

logM\displaystyle\log M NC(P)Nlog(L)(N)V(P)\displaystyle\geq NC(P)-\sqrt{N\log_{(L)}(N)V(P)}
12πNV(P)log(L)(N)(1+o(1))logNLlogJ(P),\displaystyle-\frac{1}{\sqrt{2\pi}}\sqrt{\frac{NV(P)}{\log_{(L)}(N)}}(1+o(1))-\log N-L\log J(P), (319)

which completes the proof.

References

  • [1] R. C. Yavas, V. Kostina, and M. Effros, “Variable-length feedback codes with several decoding times for the Gaussian channel,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), July 2021, pp. 1883–1888.
  • [2] ——, “Nested sparse feedback codes for point-to-point, multiple access, and random access channels,” in IEEE Information Theory Workshop (ITW), Kanazawa, Japan, Oct. 2021, pp. 1–6.
  • [3] C. Shannon, “The zero error capacity of a noisy channel,” IRE Trans. Inf. Theory, vol. 2, no. 3, pp. 8–19, Sep. 1956.
  • [4] M. Horstein, “Sequential transmission using noiseless feedback,” IEEE Trans. Inf. Theory, vol. 9, no. 3, pp. 136–143, July 1963.
  • [5] J. Schalkwijk and T. Kailath, “A coding scheme for additive noise channels with feedback–I: No bandwidth constraint,” IEEE Trans. Inf. Theory, vol. 12, no. 2, pp. 172–182, Apr. 1966.
  • [6] A. B. Wagner, N. V. Shende, and Y. Altuğ, “A new method for employing feedback to improve coding performance,” IEEE Trans. Inf. Theory, vol. 66, no. 11, pp. 6660–6681, Nov. 2020.
  • [7] M. V. Burnashev, “Data transmission over a discrete channel with feedback: Random transmission time,” Problems of Information Transmission, vol. 12, no. 4, pp. 10–30, Oct. 1976.
  • [8] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Feedback in the non-asymptotic regime,” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 4903–4925, Aug. 2011.
  • [9] A. Tchamkerten and I. E. Telatar, “Variable length coding over an unknown channel,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 2126–2145, May 2006.
  • [10] M. Naghshvar, T. Javidi, and M. Wigger, “Extrinsic Jensen–Shannon divergence: Applications to variable-length coding,” IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 2148–2164, Apr. 2015.
  • [11] H. Yang, M. Pan, A. Antonini, and R. D. Wesel, “Sequential transmission over binary asymmetric channels with feedback,” IEEE Trans. Inf. Theory, vol. 68, no. 11, pp. 7023–7042, Nov. 2022.
  • [12] N. Guo and V. Kostina, “Reliability function for streaming over a DMC with feedback,” IEEE Trans. Inf. Theory, vol. 69, no. 4, pp. 2165–2192, Apr. 2023.
  • [13] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
  • [14] S. L. Fong and V. Y. F. Tan, “Asymptotic expansions for the AWGN channel with feedback under a peak power constraint,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Hong Kong, China, June 2015, pp. 311–315.
  • [15] Y. Altuğ, H. V. Poor, and S. Verdú, “Variable-length channel codes with probabilistic delay guarantees,” in 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, Sep. 2015, pp. 642–649.
  • [16] J. Östman, R. Devassy, G. Durisi, and E. G. Ström, “Short-packet transmission via variable-length codes in the presence of noisy stop feedback,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 214–227, Jan. 2021.
  • [17] G. Forney, “Exponential error bounds for erasure, list, and decision feedback schemes,” IEEE Trans. Inf. Theory, vol. 14, no. 2, pp. 206–220, Mar. 1968.
  • [18] S. Ginzach, N. Merhav, and I. Sason, “Random-coding error exponent of variable-length codes with a single-bit noiseless feedback,” in IEEE Inf. Theory Workshop (ITW), Kaohsiung, Taiwan, Nov. 2017, pp. 584–588.
  • [19] L. V. Truong and V. Y. F. Tan, “On Gaussian MACs with variable-length feedback and non-vanishing error probabilities,” arXiv:1609.00594v2, Sep. 2016.
  • [20] ——, “On Gaussian MACs with variable-length feedback and non-vanishing error probabilities,” IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2333–2346, Apr. 2018.
  • [21] K. F. Trillingsgaard, W. Yang, G. Durisi, and P. Popovski, “Common-message broadcast channels with feedback in the nonasymptotic regime: Stop feedback,” IEEE Trans. Inf. Theory, vol. 64, no. 12, pp. 7686–7718, Dec. 2018.
  • [22] M. Heidari, A. Anastasopoulos, and S. S. Pradhan, “On the reliability function of discrete memoryless multiple-access channel with feedback,” in 2018 IEEE Information Theory Workshop (ITW), Guangzhou, China, Nov. 2018, pp. 1–5.
  • [23] K. F. Trillingsgaard and P. Popovski, “Variable-length coding for short packets over a multiple access channel with feedback,” in 2014 11th International Symposium on Wireless Communications Systems (ISWCS), Barcelona, Spain, Aug. 2014, pp. 796–800.
  • [24] S. H. Kim, D. K. Sung, and T. Le-Ngoc, “Variable-length feedback codes under a strict delay constraint,” IEEE Communications Letters, vol. 19, no. 4, pp. 513–516, Apr. 2015.
  • [25] A. R. Williamson, T. Chen, and R. D. Wesel, “Variable-length convolutional coding for short blocklengths with decision feedback,” IEEE Trans. Commun., vol. 63, no. 7, pp. 2389–2403, July 2015.
  • [26] K. Vakilinia, S. V. S. Ranganathan, D. Divsalar, and R. D. Wesel, “Optimizing transmission lengths for limited feedback with nonbinary LDPC examples,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2245–2257, June 2016.
  • [27] A. Heidarzadeh, J. Chamberland, R. D. Wesel, and P. Parag, “A systematic approach to incremental redundancy with application to erasure channels,” IEEE Trans. Commun., vol. 67, no. 4, pp. 2620–2631, Apr. 2019.
  • [28] R. C. Yavas, V. Kostina, and M. Effros, “Random access channel coding in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 67, no. 4, pp. 2115–2140, Apr. 2021.
  • [29] Y. Liu and M. Effros, “Finite-blocklength and error-exponent analyses for LDPC codes in point-to-point and multiple access communication,” in IEEE Int. Symp. Inf. Theory (ISIT), Los Angeles, California, USA, June 2020, pp. 361–366.
  • [30] H. Yang, R. C. Yavas, V. Kostina, and R. D. Wesel, “Variable-length stop-feedback codes with finite optimal decoding times for BI-AWGN channels,” in IEEE Int. Symp. Inf. Theory (ISIT), Espoo, Finland, July 2022, pp. 2327–2332.
  • [31] W. Feller, An Introduction to Probability Theory and its Applications, 2nd ed.   John Wiley & Sons, 1971, vol. II.
  • [32] H. Yamamoto and K. Itoh, “Asymptotic performance of a modified Schalkwijk-Barron scheme for channels with noiseless feedback (corresp.),” IEEE Trans. Inf. Theory, vol. 25, no. 6, pp. 729–733, Nov. 1979.
  • [33] A. Lalitha and T. Javidi, “On error exponents of almost-fixed-length channel codes and hypothesis tests,” arXiv:2012.00077, Nov. 2020.
  • [34] P. Berlin, B. Nakiboğlu, B. Rimoldi, and E. Telatar, “A simple converse of Burnashev’s reliability function,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3074–3080, July 2009.
  • [35] V. Y. F. Tan and M. Tomamichel, “The third-order term in the normal approximation for the AWGN channel,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2430–2438, May 2015.
  • [36] A. Tartakovsky, I. Nikiforov, and M. Basseville, Sequential Analysis: Hypothesis Testing and Changepoint Detection, 1st ed.   Chapman and Hall CRC, 2014.
  • [37] V. Y. F. Tan and O. Kosut, “On the dispersions of three network information theory problems,” IEEE Trans. Inf. Theory, vol. 60, no. 2, pp. 881–903, Feb. 2014.
  • [38] O. Kosut, “A second-order converse bound for the multiple-access channel via wringing dependence,” IEEE Trans. Inf. Theory, vol. 68, no. 6, pp. 3552–3584, June 2022.
  • [39] R. C. Yavas, V. Kostina, and M. Effros, “Gaussian multiple and random access channels: Finite-blocklength analysis,” IEEE Trans. Inf. Theory, vol. 67, no. 11, pp. 6983–7009, Nov. 2021.
  • [40] M. Tomamichel and V. Y. F. Tan, “A tight upper bound for the third-order asymptotics of discrete memoryless channels,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Istanbul, Turkey, July 2013, pp. 1536–1540.
  • [41] P. Moulin, “The log-volume of optimal codes for memoryless channels, asymptotically within a few nats,” IEEE Trans. Inf. Theory, vol. 63, no. 4, pp. 2278–2313, Apr. 2017.
  • [42] V. V. Petrov, Sums of independent random variables.   New York, USA: Springer, Berlin, Heidelberg, 1975.
  • [43] T. M. Cover and J. A. Thomas, Elements of Information Theory.   NJ, USA: Wiley, 2006.
  • [44] Y. Polyanskiy, “A perspective on massive random-access,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, June 2017, pp. 2523–2527.
  • [45] M. Ebrahimi, F. Lahouti, and V. Kostina, “Coded random access design for constrained outage,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, June 2017, pp. 2732–2736.
  • [46] W. Yang, G. Caire, G. Durisi, and Y. Polyanskiy, “Optimum power control at finite blocklength,” IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 4598–4615, Sep. 2015.
  • [47] L. V. Truong, S. L. Fong, and V. Y. F. Tan, “On Gaussian channels with feedback under expected power constraints and with non-vanishing error probabilities,” IEEE Trans. Inf. Theory, vol. 63, no. 3, pp. 1746–1765, Mar. 2017.
  • [48] C. E. Shannon, “Probability of error for optimal codes in a Gaussian channel,” The Bell System Technical Journal, vol. 38, no. 3, pp. 611–656, May 1959.
  • [49] E. MolavianJazi, “A unified approach to Gaussian channels with finite blocklength,” Ph.D. dissertation, University of Notre Dame, July 2014.
  • [50] Y. Polyanskiy and S. Verdú, “Channel dispersion and moderate deviations limits for memoryless channels,” in 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, USA, Sep. 2010, pp. 1334–1339.
  • [51] R. W. Butler, Saddlepoint Approximations with Applications, ser. Cambridge Series in Statistical and Probabilistic Mathematics.   Cambridge University Press, 2007.
  • [52] A. Wald, “Sequential tests of statistical hypotheses,” The Annals of Mathematical Statistics, vol. 16, no. 2, pp. 117–186, June 1945.
  • [53] Y. Li and V. Y. F. Tan, “Second-order asymptotics of sequential hypothesis testing,” IEEE Trans. Inf. Theory, vol. 66, no. 11, pp. 7222–7230, Nov. 2020.
  • [54] A. Wald and J. Wolfowitz, “Optimum character of the sequential probability ratio test,” The Annals of Mathematical Statistics, vol. 19, no. 3, pp. 326–339, Sep. 1948.
  • [55] C. Leang and D. Johnson, “On the asymptotics of m-hypothesis Bayesian detection,” IEEE Trans. Inf. Theory, vol. 43, no. 1, pp. 280–282, 1997.
  • [56] E. MolavianJazi and J. N. Laneman, “A second-order achievable rate region for Gaussian multi-access channels via a central limit theorem for functions,” IEEE Trans. Inf. Theory, vol. 61, no. 12, pp. 6719–6733, Dec. 2015.
Recep Can Yavas (S’18–M’22) received the B.S. degree (Hons.) in electrical engineering from Bilkent University, Ankara, Turkey, in 2016. He received the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology (Caltech) in 2017 and 2023, respectively. He is currently a research fellow at CNRS at CREATE, Singapore. His research interests include information theory, probability theory, and multi-armed bandits.
Victoria Kostina (S’12–M’14–SM’22) is a professor of electrical engineering and of computing and mathematical sciences at Caltech. She received the bachelor’s degree from Moscow Institute of Physics and Technology (MIPT) in 2004, the master’s degree from the University of Ottawa in 2006, and the Ph.D. degree from Princeton University in 2013. During her studies at MIPT, she was affiliated with the Institute for Information Transmission Problems of the Russian Academy of Sciences. Her research interests lie in information theory, coding, communications, learning, and control. She has served as an Associate Editor for the IEEE Transactions on Information Theory and as a Guest Editor for the IEEE Journal on Selected Areas in Information Theory. She received the Natural Sciences and Engineering Research Council of Canada postgraduate scholarship during 2009–2012, the Princeton Electrical Engineering Best Dissertation Award in 2013, the Simons-Berkeley research fellowship in 2015, and the NSF CAREER award in 2017.
Michelle Effros (S’93–M’95–SM’03–F’09) is the George Van Osdol Professor of Electrical Engineering and Vice Provost at the California Institute of Technology. She was a co-founder of Code On Technologies, a technology licensing firm, which was sold in 2016. Dr. Effros is a fellow of the IEEE and has received a number of awards including Stanford’s Frederick Emmons Terman Engineering Scholastic Award (for excellence in engineering), the Hughes Masters Full-Study Fellowship, the National Science Foundation Graduate Fellowship, the AT&T Ph.D. Scholarship, the NSF CAREER Award, the Charles Lee Powell Foundation Award, the Richard Feynman-Hughes Fellowship, an Okawa Research Grant, and the Communications Society and Information Theory Society Joint Paper Award. She was cited by Technology Review as one of the world’s top 100 young innovators in 2002, became a fellow of the IEEE in 2009, and is a member of Tau Beta Pi, Phi Beta Kappa, and Sigma Xi. She received the B.S. (with distinction), M.S., and Ph.D. degrees in electrical engineering from Stanford University. Her research interests include information theory (with a focus on source, channel, and network coding for multi-node networks) and theoretical neuroscience (with a focus on neurostability and memory). Dr. Effros served as the Editor of the IEEE Information Theory Society Newsletter from 1995 to 1998 and as a Member of the Board of Governors of the IEEE Information Theory Society from 1998 to 2003 and from 2008 to 2017. She served as President of the IEEE Information Theory Society in 2015 and as Executive Director for the film “The Bit Player,” a movie about Claude Shannon, which came out in 2018. She was a member of the Advisory Committee and the Committee of Visitors for the Computer and Information Science and Engineering (CISE) Directorate at the National Science Foundation from 2009 to 2012 and in 2014, respectively. She served on the IEEE Signal Processing Society Image and Multi-Dimensional Signal Processing (IMDSP) Technical Committee from 2001 to 2007 and on ISAT from 2006 to 2009. She served as Associate Editor for the joint special issue on Networking and Information Theory in the IEEE Transactions on Information Theory and the IEEE/ACM Transactions on Networking, as Associate Editor for the special issue honoring the scientific legacy of Ralf Koetter in the IEEE Transactions on Information Theory and, from 2004 to 2007 served as Associate Editor for Source Coding for the IEEE Transactions on Information Theory. She has served on numerous technical program committees and review boards, including serving as general co-chair for the 2009 Network Coding Workshop and technical program committee co-chair for the 2012 IEEE International Symposium on Information Theory and the 2023 IEEE Information Theory Workshop.