
Every Bit Counts: Second-Order Analysis of Cooperation in the Multiple-Access Channel

Oliver Kosut, Michelle Effros, Michael Langberg. O. Kosut is with the School of Electrical, Computer and Energy Engineering at Arizona State University. Email: okosut@asu.edu. M. Effros is with the Department of Electrical Engineering at the California Institute of Technology. Email: effros@caltech.edu. M. Langberg is with the Department of Electrical Engineering at the University at Buffalo (State University of New York). Email: mikel@buffalo.edu. This work is supported in part by NSF grants CCF-1817241, CCF-1908725, and CCF-1909451.
Abstract

This paper presents a finite-blocklength analysis of the multiple access channel (MAC) sum-rate under the cooperation facilitator (CF) model. The CF model, in which independent encoders coordinate through an intermediary node, is known to yield significant rate benefits, even when the rate of cooperation is limited. We continue this line of study for cooperation rates that are sub-linear in the blocklength $n$. Roughly speaking, our results show that if the facilitator transmits $\log K$ bits, there is a sum-rate benefit of order $\sqrt{\log K/n}$. This result extends across a wide range of $K$: even a single bit of cooperation is shown to provide a sum-rate benefit of order $1/\sqrt{n}$.

I Introduction

The multiple access channel (MAC) model lies at an interesting conceptual intersection between the notions of cooperation and interference in wireless communications. When viewed from the perspective of any single transmitter, codewords transmitted by other transmitters can only inhibit the first transmitter’s individual communication rate; thus each transmitter sees the others as a source of interference. When viewed from the perspective of the receiver, however, maximizing the total rate delivered to the receiver often requires all transmitters to communicate simultaneously; from the receiver’s perspective, then, the transmitters must cooperate through their simultaneous transmissions to maximize the sum-rate delivered to the receiver.

Simultaneous transmission is, perhaps, the weakest form of cooperation imaginable in a wireless communication model. Nonetheless, the fact that even simultaneous transmission of independent codewords from interfering transmitters can increase the sum-rate deliverable to the MAC receiver raises the question of how much more could be achieved through more significant forms of MAC transmitter cooperation.

The information theory literature devotes considerable effort to studying the impact of encoder cooperation in the MAC. A variety of cooperation models are considered. Examples include the "conferencing" cooperation model [1], in which encoders share information directly in order to coordinate their channel inputs; the "cribbing" cooperation model [2], in which transmitters cooperate by sharing their codeword information (at times causally); and the "cooperation facilitator" (CF) cooperation model [3], in which users coordinate their channel inputs with the help of an intermediary called the CF. The CF distinguishes the amount of information that must be understood to facilitate cooperation (i.e., the rate $R_{\rm IN}$ to the CF) from the amount of information employed in the coordination (i.e., the rate $R_{\rm OUT}$ from the CF). Key results using the CF model show that for many MACs, no matter what the (non-zero) fixed rate $R_{\rm IN}$, the curve describing the maximal sum-rate as a function of $R_{\rm OUT}$ has infinite slope at $R_{\rm OUT}=0$ [4]. That is, very little coordination through a CF can change the MAC capacity considerably. This phenomenon holds for both average and maximum error sum-rates; it is most extreme in the latter case, where even a finite number of bits (independent of the blocklength) — that is, $R_{\rm OUT}=0$ — can suffice to change the MAC capacity region [5, 6, 7].

We study the CF model for 2-user MACs under the average error criterion. In this setting, the maximal sum-rate is a continuous function of $R_{\rm OUT}$ at $R_{\rm OUT}=0$ [6, 7], implying that cooperation at sub-linear rates yields no first-order benefit. However, sub-linear CF cooperation may still increase the sum-rate through second-order terms. In this work, we seek to understand the impact of the CF over a wide range of cooperation rates. Specifically, we consider a CF that, after viewing both messages, can transmit one of $K$ signals to both transmitters. We prove achievable bounds that express the benefit of this cooperation as a function of $K$. These bounds extend all the way from constant $K$ to exponential $K$. Interestingly, we find that even for $K=2$ (i.e., one bit of cooperation), there is a benefit in the second-order (i.e., dispersion) term, corresponding to an improvement of $O(\sqrt{n})$ message bits. We prove two main achievable bounds, each of which is optimal for a different range of $K$ values. The proof of the first bound is based on refined asymptotic analysis similar to typical second-order bounds. The proof of the second bound is based on the method of types. For a wide range of $K$ values, we find that the benefit is $O(\sqrt{n\log K})$ message bits.

II Problem Setup

An $(M_1,M_2,K)$ facilitated multiple access code for a multiple access channel (MAC)

$$({\cal X}_1\times{\cal X}_2,\ p_{Y|X_1,X_2}(y|x_1,x_2),\ {\cal Y})$$

is defined by a facilitator code

$$e: [M_1]\times[M_2]\rightarrow[K],$$

a pair of encoders

\begin{align*}
f_1:&\ [M_1]\times[K]\rightarrow{\cal X}_1\\
f_2:&\ [M_2]\times[K]\rightarrow{\cal X}_2,
\end{align*}

and a decoder

$$g:{\cal Y}\rightarrow[M_1]\times[M_2].$$

The encoders' outputs are sometimes described using the abbreviated notation

\begin{align*}
X_1(m_1,m_2) &= f_1(m_1,e(m_1,m_2))\\
X_2(m_1,m_2) &= f_2(m_2,e(m_1,m_2)).
\end{align*}

The average error probability for the given code is

$$P_e = \frac{1}{M_1M_2}\sum_{m_1=1}^{M_1}\sum_{m_2=1}^{M_2}\Pr\big(g(Y)\neq(m_1,m_2)\ \big|\ (X_1,X_2)=(X_1(m_1,m_2),X_2(m_1,m_2))\big).$$

We also consider codes for the $n$-length product channel, where ${\cal X}_1,{\cal X}_2,{\cal Y}$ are replaced by ${\cal X}_1^n,{\cal X}_2^n,{\cal Y}^n$ respectively, and where

$$p_{Y^n|X_1^n,X_2^n}(y^n|x_1^n,x_2^n)=\prod_{i=1}^{n}p_{Y|X_1,X_2}(y_i|x_{1i},x_{2i}).$$

An $(M_1,M_2,K)$ code for the $n$-length channel achieving average probability of error at most $\epsilon$ is called an $(n,M_1,M_2,K,\epsilon)$ code. We assume that all alphabets are finite.
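To make these definitions concrete, the following minimal sketch (ours, not from the paper) evaluates the average error probability of a single-letter $(M_1,M_2,K)$ facilitated code by brute force; the toy noisy-adder channel and all names are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical toy channel: binary inputs, ternary output,
# p(y|x1,x2) concentrated on y = x1 + x2.
M1, M2, K = 2, 2, 2
W = np.zeros((2, 2, 3))
for x1, x2 in itertools.product(range(2), range(2)):
    for y in range(3):
        W[x1, x2, y] = 0.8 if y == x1 + x2 else 0.1

def avg_error(e, f1, f2, g):
    """Average error probability P_e of the facilitated code (e, f1, f2, g).

    e[m1][m2] in [K]; f1[m1][k] and f2[m2][k] are channel inputs;
    g[y] is the decoded message pair for output y."""
    pe = 0.0
    for m1, m2 in itertools.product(range(M1), range(M2)):
        k = e[m1][m2]
        x1, x2 = f1[m1][k], f2[m2][k]
        # channel mass of outputs decoded to the wrong message pair
        pe += sum(W[x1, x2, y] for y in range(3) if g[y] != (m1, m2))
    return pe / (M1 * M2)
```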

The following notation will be useful. Given a MAC

$$({\cal X}_1\times{\cal X}_2,\ p_{Y|X_1,X_2}(y|x_1,x_2),\ {\cal Y}),$$

the sum-capacity without cooperation is given by

$$C_{\text{sum}}=\max_{p_{X_1}p_{X_2}}I(X_1,X_2;Y). \qquad (1)$$

Let ${\cal P}^{\star}$ be the set of product distributions $p_{X_1}p_{X_2}$ achieving the maximum in (1). For any $p_{X_1}p_{X_2}\in{\cal P}^{\star}$, let $p_Y$ be the resulting marginal on the channel output, giving

$$p_Y(y)=\sum_{(x_1,x_2)\in{\cal X}_1\times{\cal X}_2}p_{X_1}(x_1)p_{X_2}(x_2)p_{Y|X_1,X_2}(y|x_1,x_2)$$

for all $y\in{\cal Y}$. We use $i(x_1,x_2;y)$, $i(x_1;y|x_2)$, and $i(x_2;y|x_1)$ to represent the joint and conditional information densities

\begin{align*}
i(x_1,x_2;y) &= \log\left(\frac{p_{Y|X_1,X_2}(y|x_1,x_2)}{p_Y(y)}\right)\\
i(x_1;y|x_2) &= \log\left(\frac{p_{Y|X_1,X_2}(y|x_1,x_2)}{p_{Y|X_2}(y|x_2)}\right)\\
i(x_2;y|x_1) &= \log\left(\frac{p_{Y|X_1,X_2}(y|x_1,x_2)}{p_{Y|X_1}(y|x_1)}\right),
\end{align*}

where $p_{Y|X_1}$ and $p_{Y|X_2}$ are the conditional marginals on $Y$ under the joint distribution

$$p_{X_1,X_2,Y}=p_{X_1}p_{X_2}p_{Y|X_1,X_2}.$$

We denote the 3-vector of all three quantities as

$$\mathbf{i}(x_1,x_2;y)=\begin{bmatrix} i(x_1,x_2;y)\\ i(x_1;y|x_2)\\ i(x_2;y|x_1)\end{bmatrix}.$$

It will be convenient to define

$$i(x_1,x_2) = E[i(x_1,x_2;Y)\,|\,(X_1,X_2)=(x_1,x_2)] = D(p_{Y|X_1=x_1,X_2=x_2}\|p_Y).$$

Let

\begin{align*}
V_1 &= \text{Var}(i(X_1,X_2)), \qquad (2)\\
V_2 &= E[\text{Var}(i(X_1,X_2;Y)\,|\,X_1,X_2)]. \qquad (3)
\end{align*}

Roughly speaking, $V_1$ represents the information-variance of the codewords, whereas $V_2$ represents the information-variance of the channel noise. Given two distributions $p_X,q_X$, let the divergence-variance be

$$V(p_X\|q_X)=\text{Var}_{p_X}\left(\log\frac{p_X(X)}{q_X(X)}\right).$$

Note that

$$V_2=\sum_{x_1,x_2}p_{X_1}(x_1)p_{X_2}(x_2)\,V(p_{Y|X_1=x_1,X_2=x_2}\|p_Y).$$
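As a numerical illustration of these quantities, the sketch below (ours; it assumes the channel is given as an array `W[x1, x2, y]` and the inputs as probability vectors `p1`, `p2`) computes $I(X_1,X_2;Y)$, $V_1$, and $V_2$ in nats. When $p_{X_1}p_{X_2}\in{\cal P}^{\star}$, the first return value equals $C_{\text{sum}}$.

```python
import numpy as np

def info_variances(p1, p2, W):
    """Return I(X1,X2;Y), V1, and V2 for product input p1 x p2 and channel W."""
    pY = np.einsum('a,b,aby->y', p1, p2, W)          # output marginal
    with np.errstate(divide='ignore', invalid='ignore'):
        logratio = np.where(W > 0, np.log(W / pY), 0.0)
    i12 = np.einsum('aby,aby->ab', W, logratio)      # i(x1,x2) = D(p(.|x1,x2) || pY)
    pj = np.outer(p1, p2)
    I = np.sum(pj * i12)                             # I(X1,X2;Y)
    V1 = np.sum(pj * (i12 - I) ** 2)                 # Var(i(X1,X2)), eq. (2)
    var_y = np.einsum('aby,aby->ab', W, logratio**2) - i12**2
    V2 = np.sum(pj * var_y)                          # E[Var(i(X1,X2;Y)|X1,X2)], eq. (3)
    return I, V1, V2
```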

III Main Results

Define the fundamental sum-rate limit for the facilitated MAC as

$$R_{\text{sum}}(n,\epsilon,K)=\sup\left\{\frac{\log(M_1M_2)}{n} : \exists\,(n,M_1,M_2,K,\epsilon)\text{ code}\right\}.$$

In the literature on second-order rates, there are typically two types of results: (i) finite blocklength results, with no asymptotic terms, that are typically written in terms of abstract alphabets, and (ii) asymptotic results that derive from these finite blocklength results, which are typically easier to understand. The following is an achievable result which has some flavor of both: the channel noise is dealt with via an asymptotic analysis, but the dependence on the randomness in the codewords is written as in a finite blocklength result. We provide this "intermediate" result because, depending on the CF parameter $K$, the relevant aspect of the codeword distribution may be in the central limit, moderate deviations, or large deviations regime. Thus, in this form one may plug in any concentration bound to derive an achievable bound. Subsequently, Theorem 2 gives specific achievable results based on two different concentration bounds. We also prove another achievable bound, Theorem 3, which does not rely on Theorem 1, but instead uses an approach based on the method of types that applies at larger values of $K$.

Theorem 1.

Assume $\log K=o(n)$. For any distributions $p_{X_1},p_{X_2}$, let $X_j^n(k)$ be an i.i.d. sequence from $p_{X_j}$ for each $k\in[K]$, with all sequences mutually independent. There exists an $(n,M_1,M_2,K,\epsilon)$ code if

\begin{align*}
\epsilon &\geq \Pr\Bigg(\max_{k\in[K]}\sum_{i=1}^{n}i(X_{1i}(k),X_{2i}(k))+\sqrt{nV_2}\,Z_0 < \log(M_1M_2K)+\frac{1}{2}\log n\Bigg)\\
&\qquad +O\left(\sqrt{\frac{\log n}{n}}\right)+O\left(\sqrt{\frac{\log K}{n}}\right) \qquad (4)\\
\log M_1 &\leq nI(X_1;Y|X_2)-c\sqrt{n\log K+n\log n} \qquad (5)\\
\log M_2 &\leq nI(X_2;Y|X_1)-c\sqrt{n\log K+n\log n} \qquad (6)
\end{align*}

where $Z_0$ is a standard Gaussian, and where $c$ is a constant.

For fixed $K$, let $Z_0,\ldots,Z_K$ be drawn i.i.d. from $\mathcal{N}(0,1)$. Let

$$S_K=\sqrt{V_2}\,Z_0+\sqrt{V_1}\max_{k\in[1:K]}Z_k,$$

and define the CDF of $S_K$ as

$$F_{S_K}(s)=\Pr(S_K\leq s).$$

Also let $F_{S_K}^{-1}$ be the inverse of the CDF; that is,

$$F_{S_K}^{-1}(p)=\sup\{s:F_{S_K}(s)\leq p\}\text{ for }p\in[0,1].$$

In what follows we use Theorem 1 and the function $F_{S_K}^{-1}$ to explicitly lower-bound the sum-rate benefit of cooperation for a range of values of $K$. A numerical computation of $F_{S_K}^{-1}(\epsilon)$ as a function of $K$ is shown in Fig. 1. The following is a technical estimate of $F_{S_K}^{-1}$.

Figure 1: The inverse CDF $F_{S_K}^{-1}(\epsilon)$ for $\epsilon=0.01$, for $V_1=V_2=1$, across a range of $K$. Note that the horizontal axis is $\log_2 K$, i.e., the number of bits transmitted from the CF.
Lemma 1.

For $K$ and $\epsilon$ that satisfy $K>e^3\sqrt{2\pi}\ln(4/\epsilon)$, $F_{S_K}^{-1}(\epsilon)$ is at least

$$\sqrt{V_1\big(2\ln K-2\ln\ln(4/\epsilon)-\ln\ln K-\ln(4\pi)\big)}-\sqrt{2V_2\ln(2/\epsilon)}.$$

Moreover, for all $K$ and $\epsilon$,

$$F_{S_K}^{-1}(1-\epsilon)\leq\sqrt{2V_1\ln K}+\sqrt{2V_1\ln(4/\epsilon)}+\sqrt{2V_2\ln(2/\epsilon)}.$$
Proof:

Let $Z(K)=\max_{k\in[K]}Z_k$. From [8], it holds that $\Pr(Z(K)\leq\sqrt{\kappa-\ln\kappa})\leq\epsilon/2$ for $\kappa=2\ln(K/\sqrt{2\pi})-2\ln\ln(4/\epsilon)\geq 6$. Moreover, $\Pr(\sqrt{V_2}\,Z_0\leq-\sqrt{2V_2\ln(2/\epsilon)})\leq\epsilon/2$. Combining these bounds gives the desired lower bound.

For the upper bound, [9, 10] imply that for any $K$, $\Pr\big(\sqrt{V_1}\,Z(K)\geq\sqrt{2V_1\ln K}+\sqrt{2V_1\ln(4/\epsilon)}\big)\leq\epsilon/2$. Moreover, $\Pr(\sqrt{V_2}\,Z_0\geq\sqrt{2V_2\ln(2/\epsilon)})\leq\epsilon/2$. Thus, $F_{S_K}^{-1}(1-\epsilon)\leq\sqrt{2V_1\ln K}+\sqrt{2V_1\ln(4/\epsilon)}+\sqrt{2V_2\ln(2/\epsilon)}$. ∎

Theorem 2.

For any $p_{X_1}p_{X_2}\in{\cal P}^{\star}$ and the associated constants $V_1$ and $V_2$, if $\log K=o(n^{1/3})$, then

$$R_{\text{sum}}(n,\epsilon,K)\geq C_{\text{sum}}+\frac{1}{\sqrt{n}}\,F_{S_K}^{-1}(\epsilon)-\theta_n$$

where

\begin{align*}
\theta_n &= O\left(\frac{\log n}{n}\right), &&\text{if } K\leq\log n \qquad (7)\\
\theta_n &= O\left(\frac{K}{n}\right), &&\text{if } \log n\leq K\leq\log^{3/2}n \qquad (8)\\
\theta_n &= O\left(\frac{\log^{3/2}n}{n}\right), &&\text{if } \log^{3/2}n\leq K\leq n \qquad (9)\\
\theta_n &= O\left(\frac{\log^{3/2}K}{n}\right), &&\text{if } K\geq n. \qquad (10)
\end{align*}

For larger $K$, our achievability bound employs the function

$$\Delta(a)=\max_{p_{X_1,X_2}:\,I(X_1;X_2)\leq a}I(X_1,X_2;Y)-C_{\text{sum}}.$$

Note that $\Delta(0)=0$. Lemma 2 captures the behavior of $\Delta(a)$ for small $a$. (See Appendix A for the proof.)

Lemma 2.

In the limit as $a\to 0$,

$$\Delta(a)=\sqrt{2aV_1^{\star}\ln 2}+o(\sqrt{a})$$

where

$$V_1^{\star}=\max_{p_{X_1}p_{X_2}\in{\cal P}^{\star}}\text{Var}(i(X_1,X_2)). \qquad (11)$$
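For intuition, $\Delta(a)$ can be lower-bounded numerically by perturbing an optimal product distribution and keeping only joint distributions that satisfy the constraint $I(X_1;X_2)\leq a$. A crude random-search sketch (ours; the perturbation scale $\sqrt{a}$ is an arbitrary heuristic, so this gives only a lower estimate):

```python
import numpy as np

def mi(pj):
    """I(X1;X2) of a joint pmf matrix pj, in nats."""
    prod = np.outer(pj.sum(1), pj.sum(0))
    m = pj > 0
    return float(np.sum(pj[m] * np.log(pj[m] / prod[m])))

def iy(pj, W):
    """I(X1,X2;Y) for joint input pj and channel W[x1,x2,y], in nats."""
    pY = np.einsum('ab,aby->y', pj, W)
    m = W > 0
    r = np.zeros_like(W)
    r[m] = np.log((W / pY)[m])
    return float(np.einsum('ab,aby->', pj, W * r))

def delta_lower(W, p1, p2, a, samples=20_000, seed=0):
    """Random-search lower estimate of Delta(a) around the product pmf p1 x p2."""
    rng = np.random.default_rng(seed)
    base = np.outer(p1, p2)
    csum, best = iy(base, W), 0.0
    for _ in range(samples):
        pj = np.clip(base + rng.normal(scale=np.sqrt(a), size=base.shape), 1e-12, None)
        pj /= pj.sum()                       # project back to a valid pmf
        if mi(pj) <= a:                      # feasible for I(X1;X2) <= a
            best = max(best, iy(pj, W) - csum)
    return best
```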
Theorem 3.

For any $K$ such that $\log K=\omega(\log n)$,

$$R_{\text{sum}}(n,\epsilon,K)\geq C_{\text{sum}}+\Delta\left(\frac{\log K}{n}-O\left(\frac{\log n}{n}\right)\right)-O\left(\frac{1}{\sqrt{n}}\right).$$
Remark 1.

While Theorems 2 and 3 appear quite different, Lemmas 1 and 2 imply that for mid-range $K$ values they give similar results. In particular, if

$$\log n\ll\log K\ll n^{1/3},$$

then applying Theorem 2 and choosing the distribution $p_{X_1}p_{X_2}\in{\cal P}^{\star}$ that achieves the maximum in (11) gives

$$R_{\text{sum}}(n,\epsilon,K)-C_{\text{sum}} \geq \frac{1}{\sqrt{n}}F_{S_K}^{-1}(\epsilon)-\theta_n \approx \sqrt{\frac{2V_1^{\star}\ln K}{n}}.$$

For the same range of $K$, Theorem 3 gives

\begin{align*}
R_{\text{sum}}(n,\epsilon,K)-C_{\text{sum}} &\geq \Delta\left(\frac{\log K}{n}-O\left(\frac{\log n}{n}\right)\right)-O\left(\frac{1}{\sqrt{n}}\right)\\
&\approx \sqrt{\frac{2V_1^{\star}\log K\,\ln 2}{n}} = \sqrt{\frac{2V_1^{\star}\ln K}{n}}.
\end{align*}

III-A Comparison to prior work

In [4], an analog to Theorem 3 is proven in the asymptotic blocklength regime. Namely, in our notation, [4] proves that for any $\epsilon>0$ and $\delta>0$, if we set $K=2^{\Omega(n)}$ then there exists $n$ such that

$$R_{\text{sum}}(n,\epsilon,K)-C_{\text{sum}}>\Delta\left(\frac{\log K}{n}\right)-\delta.$$

Similarly, in [4, 7], an analog to Lemma 2 is shown for asymptotic blocklength. Specifically, it is shown that the existence of distributions $p_{X_1}p_{X_2}\in{\cal P}^{\star}$ and $p_{\tilde X_1\tilde X_2}$ over ${\cal X}_1\times{\cal X}_2$ such that (a) the support of $p_{\tilde X_1\tilde X_2}$ is included in that of $p_{X_1}p_{X_2}$, and (b)

$$I(\tilde X_1,\tilde X_2;\tilde Y)+D(p_{\tilde X_1\tilde X_2}\|p_{X_1}p_{X_2})>I(X_1,X_2;Y)$$

for

\begin{align*}
&p_{X_1,X_2,\tilde X_1,\tilde X_2,Y,\tilde Y}(x_1,x_2,\tilde x_1,\tilde x_2,y,\tilde y)\\
&= p_{X_1}(x_1)\,p_{X_2}(x_2)\,p_{\tilde X_1,\tilde X_2}(\tilde x_1,\tilde x_2)\,p_{Y|X_1,X_2}(y|x_1,x_2)\,p_{Y|X_1,X_2}(\tilde y|\tilde x_1,\tilde x_2),
\end{align*}

implies that there exists a constant $\sigma_0>0$ such that

$$\liminf_{a\rightarrow 0}\frac{\Delta(a)}{\sqrt{a}}\geq\sigma_0.$$

Although Theorem 3 and Lemma 2 (and their proof techniques) are similar in nature to those of [4, 7], the analysis presented here is more refined in that it captures higher-order behavior in the blocklength $n$, and it is further optimized to address the challenges of studying values of $K$ that are sub-exponential in $n$.

We may also compare our results against prior achievable bounds without cooperation. Note that the standard MAC, with no cooperation, corresponds to $K=1$. In fact, in this case Theorem 2 gives the same second-order term as the best-known achievable bound for the MAC sum-rate [11, 12, 13, 14, 15]. This can be seen by noting that $S_1\sim\mathcal{N}(0,V_1+V_2)$, and so $F_{S_1}^{-1}(\epsilon)=\sqrt{V_1+V_2}\,\Phi^{-1}(\epsilon)$. Thus Theorem 2 gives

$$R_{\text{sum}}(n,\epsilon,1)\geq C_{\text{sum}}+\sqrt{\frac{V_1+V_2}{n}}\,\Phi^{-1}(\epsilon)-O\left(\frac{\log n}{n}\right).$$

Moreover,

$$V_1+V_2=\text{Var}(i(X_1,X_2;Y)),$$

which, for the optimal input distribution, is precisely the best-known achievable dispersion. The proof of Theorem 2 uses i.i.d. codebooks, which, as shown in [14], can be outperformed in second-order rate by constant-composition codebooks. However, as pointed out in [15, Sec. III-B], the two approaches give the same bounds on the sum-rate itself.

Another interesting conclusion comes from comparing the no-cooperation case ($K=1$) with a single bit of cooperation ($K=2$). As long as $V_1^{\star}>0$, it is easy to see that $F_{S_2}^{-1}(\epsilon)>F_{S_1}^{-1}(\epsilon)$ for any $\epsilon\in(0,1)$ (Fig. 1 shows an example). Thus, the second-order coefficient in Theorem 2 for $K=2$ is strictly improved compared to $K=1$. Therefore, even a single bit of cooperation allows for $O(\sqrt{n})$ additional message bits.
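This one-bit gain is easy to check empirically; with the hypothetical normalization $V_1=V_2=1$ (a sketch of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500_000
Z0 = rng.standard_normal(N)
s1 = Z0 + rng.standard_normal(N)                                      # S_1, i.e., K = 1
s2 = Z0 + np.maximum(rng.standard_normal(N), rng.standard_normal(N))  # S_2, i.e., K = 2
print(np.quantile(s1, 0.01), np.quantile(s2, 0.01))  # the K = 2 quantile is larger
```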

IV Proof of Theorem 1

We use random code design, beginning with independent design of the codewords for both transmitters. Precisely, we draw

\begin{align*}
f_1(1,1),f_1(1,2),\ldots,f_1(M_1,K) &\sim \text{i.i.d. } p_{X_1}\\
f_2(1,1),f_2(1,2),\ldots,f_2(M_2,K) &\sim \text{i.i.d. } p_{X_2}.
\end{align*}

The facilitator code $e(m_1,m_2)$ is then designed in an attempt to maximize the likelihood $p_{Y|X_1,X_2}$ of the received channel output $Y$. We begin by defining the threshold decoder $g(y)$ employed in our analysis; maximum likelihood decoding is expected to give the best performance, but we employ a threshold decoder here for simplicity. For notational efficiency, let

$$(X_1,X_2)(m_1,m_2)=(X_1(m_1,m_2),X_2(m_1,m_2)) = (f_1(m_1,e(m_1,m_2)),\,f_2(m_2,e(m_1,m_2))),$$

where $e(m_1,m_2)$ is the (fixed) facilitator function to be defined below. Given a constant vector $\mathbf{c}^{\star}=[c_{12}^{\star},c_1^{\star},c_2^{\star}]^T$, we define the decoder $g(y)$ to choose the unique message pair $(m_1,m_2)$ such that

$$\mathbf{i}((X_1,X_2)(m_1,m_2);y)\geq\mathbf{c}^{\star},$$

where the vector inequality means that all three scalar inequalities hold simultaneously. Conversely, we use the notation $\not\geq$ between vectors to mean that at least one of the three inequalities fails. If the number of message pairs meeting this constraint is not exactly one, we declare an error. In an attempt to ensure that $i((X_1,X_2)(m_1,m_2);Y)$ is, in some sense, large for random channel outputs $Y$ that may result from that codeword pair's transmission, for each $(m_1,m_2)\in[M_1]\times[M_2]$ we define

$$e(m_1,m_2)=\operatorname*{arg\,max}_{k\in[K]}s(f_1(m_1,k),f_2(m_2,k)),$$

where $s(x_1,x_2)$ is a score function to be chosen below.
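In code, the CF's rule is a per-message-pair argmax over the $K$ candidate codeword pairs. A sketch (array layout and names ours; in the product-channel analysis below, `score` becomes $s(x_1^n,x_2^n)=\sum_i i(x_{1i},x_{2i})$):

```python
import numpy as np

def facilitator(f1, f2, score):
    """e(m1,m2) = argmax_k score(f1[m1,k], f2[m2,k]).

    f1 has shape (M1, K, ...) and f2 has shape (M2, K, ...)."""
    M1, K = f1.shape[0], f1.shape[1]
    M2 = f2.shape[0]
    e = np.zeros((M1, M2), dtype=int)
    for m1 in range(M1):
        for m2 in range(M2):
            e[m1, m2] = max(range(K), key=lambda k: score(f1[m1, k], f2[m2, k]))
    return e
```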

Under this code design, the expected error probability satisfies

\begin{align*}
E[P_e] &= E\left[\Pr\left(g(Y)\neq(1,1)\ \middle|\ (X_1,X_2)=(X_1,X_2)(1,1)\right)\right]\\
&\leq \Pr\Big(\mathbf{i}((X_1,X_2)(1,1);Y)\not\geq\mathbf{c}^{\star}\ \text{ or }\ \mathbf{i}(X_1(\hat m_1,\hat k),X_2(\hat m_2,\hat k);Y)\geq\mathbf{c}^{\star}\\
&\qquad\text{ for some }(\hat m_1,\hat m_2)\neq(1,1),\ \hat k\in[K]\ \Big|\ (X_1,X_2)=(X_1,X_2)(1,1)\Big).
\end{align*}

To further upper bound the error probability, we define the following random variables. Let $p_{\tilde X_1,\tilde X_2}$ be the joint distribution of $(X_1,X_2)(1,1)$ that results from our choice of CF; this distribution is the same for every message pair. Let the variables $X_1$, $X_2$, $\tilde X_1$, $\tilde X_2$, $Y$, $\tilde Y$ have joint distribution

\begin{align*}
&p_{X_1,X_2,\tilde X_1,\tilde X_2,Y,\tilde Y}(x_1,x_2,\tilde x_1,\tilde x_2,y,\tilde y)\\
&= p_{X_1}(x_1)\,p_{X_2}(x_2)\,p_{\tilde X_1,\tilde X_2}(\tilde x_1,\tilde x_2)\,p_{Y|X_1,X_2}(y|x_1,x_2)\,p_{Y|X_1,X_2}(\tilde y|\tilde x_1,\tilde x_2).
\end{align*}

Under transmission of message pair $(1,1)$, $(X_1,X_2,Y)$ captures the relationship between channel inputs and output in a standard MAC, whereas $(\tilde X_1,\tilde X_2,\tilde Y)$ captures the corresponding relationship with the CF. Moreover, $(\tilde X_1,X_2,\tilde Y)$, $(X_1,\tilde X_2,\tilde Y)$, and $(X_1,X_2,\tilde Y)$ capture the relationship between the channel output and one or more untransmitted codewords from our random code. Assume without loss of generality that $e(1,1)=1$; i.e., $(X_1,X_2)(1,1)=(f_1(1,1),f_2(1,1))$. We now analyze the error probability by considering the following cases.

$\hat m_1$ | $\hat m_2$ | $\hat k$ | Number of values | Distribution
$\neq 1$ | $1$ | $1$ | $M_1-1$ | $p_{X_1}\,p_{\tilde X_2,\tilde Y}$
$\neq 1$ | $1$ | $\neq 1$ | $(M_1-1)(K-1)$ | $p_{X_1}p_{X_2}p_{\tilde Y}$
$1$ | $\neq 1$ | $1$ | $M_2-1$ | $p_{\tilde X_1,\tilde Y}\,p_{X_2}$
$1$ | $\neq 1$ | $\neq 1$ | $(M_2-1)(K-1)$ | $p_{X_1}p_{X_2}p_{\tilde Y}$
$\neq 1$ | $\neq 1$ | any | $(M_1-1)(M_2-1)K$ | $p_{X_1}p_{X_2}p_{\tilde Y}$

Note that we have excluded the cases where $\hat m_1=\hat m_2=1$, since those are not errors (even if $\hat k\neq 1$). Moreover, the number of cases in which $(X_1(\hat m_1,\hat k),X_2(\hat m_2,\hat k),Y)$ has joint distribution $p_{X_1}p_{X_2}p_{\tilde Y}$ is less than $M_1M_2K$. We can upper bound the expected error probability as

\begin{align*}
E[P_e] &\leq \Pr\left(\mathbf{i}(\tilde X_1,\tilde X_2;\tilde Y)\not\geq\mathbf{c}^{\star}\right)+M_1M_2K\Pr(\mathbf{i}(X_1,X_2;\tilde Y)\geq\mathbf{c}^{\star})\\
&\quad+M_1\Pr(\mathbf{i}(X_1,\tilde X_2;\tilde Y)\geq\mathbf{c}^{\star})+M_2\Pr(\mathbf{i}(\tilde X_1,X_2;\tilde Y)\geq\mathbf{c}^{\star})\\
&\leq \Pr\left(\mathbf{i}(\tilde X_1,\tilde X_2;\tilde Y)\not\geq\mathbf{c}^{\star}\right)+M_1M_2K\Pr(i(X_1,X_2;\tilde Y)\geq c_{12}^{\star})\\
&\quad+M_1\Pr(i(X_1;\tilde Y|\tilde X_2)\geq c_1^{\star})+M_2\Pr(i(X_2;\tilde Y|\tilde X_1)\geq c_2^{\star}).
\end{align*}

Note that

\begin{align*}
\Pr(i(X_1;\tilde Y|\tilde X_2)\geq c_1^{\star}) &= \sum_{x_1,x_2,y}p_{X_1}(x_1)\,p_{\tilde X_2,\tilde Y}(x_2,y)\,1\left(i(x_1;y|x_2)\geq c_1^{\star}\right)\\
&= \sum_{x_1,x_2,y}p_{X_1|X_2}(x_1|x_2)\,p_{\tilde X_2,\tilde Y}(x_2,y)\,1\left(i(x_1;y|x_2)\geq c_1^{\star}\right)\\
&= \sum_{x_2,y}p_{\tilde X_2,\tilde Y}(x_2,y)\sum_{x_1}p_{X_1|X_2,Y}(x_1|x_2,y)\frac{p_{X_1|X_2}(x_1|x_2)}{p_{X_1|X_2,Y}(x_1|x_2,y)}\,1\left(i(x_1;y|x_2)\geq c_1^{\star}\right)\\
&= \sum_{x_2,y}p_{\tilde X_2,\tilde Y}(x_2,y)\sum_{x_1}p_{X_1|X_2,Y}(x_1|x_2,y)\frac{p_{Y|X_2}(y|x_2)}{p_{Y|X_1,X_2}(y|x_1,x_2)}\,1\left(\log\frac{p_{Y|X_1,X_2}(y|x_1,x_2)}{p_{Y|X_2}(y|x_2)}\geq c_1^{\star}\right)\\
&\leq \sum_{x_2,y}p_{\tilde X_2,\tilde Y}(x_2,y)\sum_{x_1}p_{X_1|X_2,Y}(x_1|x_2,y)\exp(-c_1^{\star})\\
&= \exp(-c_1^{\star}).
\end{align*}

Applying similar arguments to the other terms, we find

$$E[P_e]\leq\Pr\left(\mathbf{i}(\tilde X_1,\tilde X_2;\tilde Y)\not\geq\mathbf{c}^{\star}\right)+M_1M_2K\exp(-c_{12}^{\star})+M_1\exp(-c_1^{\star})+M_2\exp(-c_2^{\star}). \qquad (12)$$

Note that (12) may be viewed as a finite blocklength achievable result. While our primary goal is asymptotic second-order analysis, we proceed by analyzing this bound on the $n$-length product channel. Specifically, we now focus on the case where $({\cal X}_1\times{\cal X}_2,p_{Y|X_1,X_2},{\cal Y})$ captures $n$ uses of a discrete memoryless channel. We designate this special case by

$$({\cal X}_1^n\times{\cal X}_2^n,\ (p_{Y|X_1,X_2})^n,\ {\cal Y}^n)$$

and add a superscript $n$ to all coding functions as a reminder of the scenario in operation. Assume that each codeword entry is drawn i.i.d. from $p_{X_1}$ or $p_{X_2}$. Define the CF's score function as

$$s(x_1^n,x_2^n)=\sum_{i=1}^{n}i(x_{1i},x_{2i}).$$

If we choose

\begin{align*}
c_{12}^{\star} &= \log(M_1M_2K)+\frac{1}{2}\log n,\\
c_1^{\star} &= \log M_1+\frac{1}{2}\log n,\\
c_2^{\star} &= \log M_2+\frac{1}{2}\log n,
\end{align*}

then

\begin{align*}
E[P_e] &\leq \Pr\left(\mathbf{i}(\tilde X_1^n,\tilde X_2^n;\tilde Y^n)\not\geq\mathbf{c}^{\star}\right)+\frac{3}{\sqrt{n}}\\
&\leq \Pr\left(i(\tilde X_1^n,\tilde X_2^n;\tilde Y^n)<\log(M_1M_2K)+\frac{1}{2}\log n\right)\\
&\quad+\Pr\left(i(\tilde X_1^n;\tilde Y^n|\tilde X_2^n)<\log M_1+\frac{1}{2}\log n\right)\\
&\quad+\Pr\left(i(\tilde X_2^n;\tilde Y^n|\tilde X_1^n)<\log M_2+\frac{1}{2}\log n\right)+\frac{3}{\sqrt{n}}. \qquad (13)
\end{align*}

We begin by bounding the second and third terms of (13) before returning to bound the first. For the second term in (13), recall that $\tilde X_1^n,\tilde X_2^n$ are drawn from the distribution of $(X_1^n,X_2^n)(1,1)$ induced by the cooperation facilitator; since the CF selects one of $K$ i.i.d. candidate codeword pairs, the probability of any realization under this distribution is at most $K$ times its probability under the i.i.d. product distribution, so

\begin{align*}
&\Pr\left(i(\tilde X_1^n;\tilde Y^n|\tilde X_2^n)<\log M_1+\frac{1}{2}\log n\right) \qquad (14)\\
&\leq K\Pr\left(i(X_1^n;Y^n|X_2^n)<\log M_1+\frac{1}{2}\log n\right)\\
&\leq K\exp\left\{-\frac{a_1}{n}\left(nI(X_1;Y|X_2)-\log M_1-\frac{1}{2}\log n\right)^2\right\}
\end{align*}

where $a_1$ is a constant, and the last inequality follows from Hoeffding's inequality and the fact that $i(X_1;Y|X_2)$ is bounded. By the assumption on $\log M_1$ in the statement of the theorem, this quantity is at most $1/\sqrt{n}$ for a suitable constant $c$. A similar bound applies to the third term in (13).

Now we consider the first term in (13). For fixed $x_1^n,x_2^n$,

$$E[i(x_1^n,x_2^n;Y^n)]=\sum_{i=1}^{n}i(x_{1i},x_{2i})=s(x_1^n,x_2^n).$$

Thus we can apply the Berry-Esseen theorem to write

\begin{align*}
&\Pr\left(i(x_1^n,x_2^n;Y^n)<c_{12}^{\star}\ \middle|\ (X_1^n,X_2^n)=(x_1^n,x_2^n)\right)\\
&\leq \Pr\left(s(x_1^n,x_2^n)+\sqrt{\sum_{i=1}^{n}V(p(y|x_{1i},x_{2i})\|p_Y)}\,Z_0<c_{12}^{\star}\right)+\frac{B}{\sqrt{n}},
\end{align*}

where, as in the statement of the theorem, $Z_0$ is a standard Gaussian random variable.

Assume

$$V(p_{Y|X_1=x_1,X_2=x_2}\|p_Y)\leq V_{\max}\text{ for all }x_1,x_2.$$

Let

$$\gamma=V_{\max}\sqrt{\frac{\ln K+\frac{1}{2}\ln n}{2n}}.$$

Note that

$$\gamma=O\left(\sqrt{\frac{\log K}{n}}\right)+O\left(\sqrt{\frac{\log n}{n}}\right).$$

By the assumption that $\log K=o(n)$, $\gamma=o(1)$. By Hoeffding's inequality, we may write

$$\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n}V(p(y|X_{1i},X_{2i})\|p_Y)-V_2\right|>\gamma\right)\leq 2\exp\left\{-\frac{2n\gamma^2}{V_{\max}^2}\right\}=\frac{2}{K\sqrt{n}}.$$

Thus, by the union bound

$$\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n}V(p(y|f_{1i}(1,k),f_{2i}(1,k))\|p_Y)-V_2\right|>\gamma\ \text{ for any }k\in[K]\right)\leq\frac{2}{\sqrt{n}}.$$

Thus

\begin{align*}
E[P_e] &\leq \Pr\left(s(\tilde X_1^n,\tilde X_2^n)+\sqrt{\sum_{i=1}^{n}V(p(y|\tilde X_{1i},\tilde X_{2i})\|p_Y)}\,Z_0<c_{12}^{\star}\right)+O\left(\frac{1}{\sqrt{n}}\right)\\
&\leq E\left[\max_{V'\in[V_2-\gamma,V_2+\gamma]}\Pr\left(s(\tilde X_1^n,\tilde X_2^n)+\sqrt{nV'}\,Z_0<c_{12}^{\star}\ \middle|\ \tilde X_1^n,\tilde X_2^n\right)\right]+O\left(\frac{1}{\sqrt{n}}\right)\\
&= E\left[\max_{V'\in[V_2-\gamma,V_2+\gamma]}\Phi\left(\frac{c_{12}^{\star}-s(\tilde X_1^n,\tilde X_2^n)}{\sqrt{nV'}}\right)\right]+O\left(\frac{1}{\sqrt{n}}\right),
\end{align*}

where $\Phi(\cdot)$ is the Gaussian CDF. Similarly, let $\phi(\cdot)$ be the Gaussian PDF. Given $x_1^n,x_2^n$, let

$$z=\frac{c_{12}^{\star}-s(x_1^n,x_2^n)}{\sqrt{n}}.$$

If $z\geq 0$, then we may bound

\begin{align*}
\max_{V'\in[V_2-\gamma,V_2+\gamma]}\Phi\left(\frac{z}{\sqrt{V'}}\right) &= \Phi\left(\frac{z}{\sqrt{V_2-\gamma}}\right)\\
&= \Phi\left(\frac{z}{\sqrt{V_2}}\right)+\int_{z/\sqrt{V_2}}^{z/\sqrt{V_2-\gamma}}\phi(y)\,dy\\
&\leq \Phi\left(\frac{z}{\sqrt{V_2}}\right)+\left(\frac{1}{\sqrt{V_2-\gamma}}-\frac{1}{\sqrt{V_2}}\right)z\,\phi\left(\frac{z}{\sqrt{V_2}}\right)\\
&\leq \Phi\left(\frac{z}{\sqrt{V_2}}\right)+\left(\frac{1}{\sqrt{V_2-\gamma}}-\frac{1}{\sqrt{V_2}}\right)\sqrt{\frac{V_2}{2\pi e}}\\
&= \Phi\left(\frac{z}{\sqrt{V_2}}\right)+\left(\sqrt{\frac{V_2}{V_2-\gamma}}-1\right)\frac{1}{\sqrt{2\pi e}}.
\end{align*}

If $z<0$, then we may bound

\begin{align*}
\max_{V'\in[V_2-\gamma,V_2+\gamma]}\Phi\left(\frac{z}{\sqrt{V'}}\right) &= \Phi\left(\frac{z}{\sqrt{V_2+\gamma}}\right)\\
&= \Phi\left(\frac{z}{\sqrt{V_2}}\right)+\int_{z/\sqrt{V_2}}^{z/\sqrt{V_2+\gamma}}\phi(y)\,dy\\
&\leq \Phi\left(\frac{z}{\sqrt{V_2}}\right)+\left(\frac{1}{\sqrt{V_2}}-\frac{1}{\sqrt{V_2+\gamma}}\right)|z|\,\phi\left(\frac{z}{\sqrt{V_2+\gamma}}\right)\\
&\leq \Phi\left(\frac{z}{\sqrt{V_2}}\right)+\left(\frac{1}{\sqrt{V_2}}-\frac{1}{\sqrt{V_2+\gamma}}\right)\sqrt{\frac{V_2+\gamma}{2\pi e}}\\
&= \Phi\left(\frac{z}{\sqrt{V_2}}\right)+\left(\sqrt{\frac{V_2+\gamma}{V_2}}-1\right)\frac{1}{\sqrt{2\pi e}}.
\end{align*}

Since $\gamma=o(1)$, combining the above bounds gives

$$\max_{V'\in[V_2-\gamma,V_2+\gamma]}\Phi\left(\frac{z}{\sqrt{V'}}\right)\leq\Phi\left(\frac{z}{\sqrt{V_2}}\right)+O(\gamma).$$

Thus,

\begin{align*}
E[P_e] &\leq E\left[\Phi\left(\frac{c_{12}^{\star}-s(\tilde X_1^n,\tilde X_2^n)}{\sqrt{nV_2}}\right)\right]+O(\gamma)+O\left(\frac{1}{\sqrt{n}}\right)\\
&= \Pr\left(s(\tilde X_1^n,\tilde X_2^n)+\sqrt{nV_2}\,Z_0<c_{12}^{\star}\right)+O(\gamma)+O\left(\frac{1}{\sqrt{n}}\right)\\
&= \Pr\left(\max_{k\in[K]}\sum_{i=1}^{n}i(X_{1i}(k),X_{2i}(k))+\sqrt{nV_2}\,Z_0<c_{12}^{\star}\right)+O\left(\sqrt{\frac{\log K}{n}}\right)+O\left(\sqrt{\frac{\log n}{n}}\right).
\end{align*}

V Proof of Theorem 2

Given $\epsilon$, our goal is to choose $M_1,M_2$ to satisfy the conditions of Theorem 1, while

$$\log(M_1M_2)=nC_{\text{sum}}+\sqrt{n}\,F_{S_K}^{-1}(\epsilon)-n\theta_n \qquad (15)$$

where $\theta_n$ satisfies one of (7)–(10) depending on $K$. Given $p_{X_1}p_{X_2}\in{\cal P}^{\star}$, let $r_1,r_2$ be rates where

\begin{align*}
r_1+r_2 &= I(X_1,X_2;Y)=C_{\text{sum}}, \qquad (16)\\
r_1 &< I(X_1;Y|X_2), \qquad (17)\\
r_2 &< I(X_2;Y|X_1). \qquad (18)
\end{align*}

Let

$$\log M_j=nr_j+\frac{1}{2}\left[\sqrt{n}\,F_{S_K}^{-1}(\epsilon)-n\theta_n\right].$$

This choice clearly satisfies (15). By Lemma 1, $F_{S_K}^{-1}(\epsilon)=\sqrt{2V_1\ln K}+O(1)$, so for sufficiently large $n$, (5)–(6) are easily satisfied. It remains to prove (4). Let $p_e$ be the probability in (4). We divide the remainder of the proof into two cases.

Case 1: $K\leq\log^{3/2}n$. We adopt the notation from the proof of Theorem 1, specifically

\begin{align*}
s(X_1^n,X_2^n) &= \sum_{i=1}^{n}i(X_{1i},X_{2i}),\\
c_{12}^{\star} &= \log(M_1M_2K)+\frac{1}{2}\log n.
\end{align*}

Thus

\begin{align*}
p_e &\leq \int_{-\infty}^{\infty}\phi(z)\Pr\left(\max_{k\in[K]}s(X_1^n(k),X_2^n(k))<c_{12}^{\star}-\sqrt{nV_2}\,z\right)dz\\
&= \int_{-\infty}^{\infty}\phi(z)\Pr\left(s(X_1^n,X_2^n)<c_{12}^{\star}-\sqrt{nV_2}\,z\right)^K dz.
\end{align*}

Note that $s(X_1^n,X_2^n)$ is an i.i.d. sum where each term has expectation

$$E[i(X_1,X_2)]=I(X_1,X_2;Y)=C_{\text{sum}}$$

and variance $V_1$. Thus, by the Berry-Esseen theorem,

$$p_e\leq\int_{-\infty}^{\infty}\phi(z)\left[\Pr\left(nC_{\text{sum}}+\sqrt{nV_1}\,Z_1<c_{12}^{\star}-\sqrt{nV_2}\,z\right)+\frac{B_1}{\sqrt{n}}\right]^K dz,$$

where $Z_1\sim\mathcal{N}(0,1)$. For any $p\in[0,1]$ and any $0\leq q\leq 1/K$, we can bound

\begin{align*}
(p+q)^K &= \sum_{\ell=0}^{K}\binom{K}{\ell}p^{K-\ell}q^{\ell}\\
&\leq p^K+\sum_{\ell=1}^{K}\binom{K}{\ell}q^{\ell}\\
&= p^K+(1+q)^K-1\\
&\leq p^K+e^{qK}-1\\
&\leq p^K+2qK.
\end{align*}

By the assumption that $K\leq\log^{3/2}n$, for sufficiently large $n$, $\frac{B_1}{\sqrt{n}}\leq\frac{1}{K}$. Thus

\begin{align*}
p_e &\leq \int_{-\infty}^{\infty}\phi(z)\Pr\left(nC_{\text{sum}}+\sqrt{nV_1}\,Z_1<c_{12}^{\star}-\sqrt{nV_2}\,z\right)^K dz+\frac{2B_1K}{\sqrt{n}}\\
&= \Pr\left(nC_{\text{sum}}+\sqrt{n}\,S_K<c_{12}^{\star}\right)+O\left(\frac{K}{\sqrt{n}}\right)\\
&= F_{S_K}\left(\frac{\log(M_1M_2K)+\frac{1}{2}\log n-nC_{\text{sum}}}{\sqrt{n}}\right)+O\left(\frac{K}{\sqrt{n}}\right).
\end{align*}

Recalling Theorem 1, we can achieve probability of error $\epsilon$ if

$$p_e+O\left(\sqrt{\frac{\log n}{n}}\right)+O\left(\sqrt{\frac{\log K}{n}}\right)\leq\epsilon.$$

This condition is satisfied if

$$\log(M_1M_2K)+\frac{1}{2}\log n=nC_{\text{sum}}+\sqrt{n}\,F_{S_K}^{-1}\left(\epsilon-c_1\frac{K}{\sqrt{n}}-c_2\sqrt{\frac{\log n}{n}}-c_3\sqrt{\frac{\log K}{n}}\right)$$

for suitable constants $c_1,c_2,c_3$ and sufficiently large $n$. To simplify the second term, we need the following lemma, which is proved in Appendix B.

Lemma 3.

Fix $\epsilon\in(0,1)$ and $V_1,V_2>0$. Then

$$\sup_{K\geq 1}\frac{d}{dp}F_{S_K}^{-1}(p)\bigg|_{p=\epsilon}<\infty.$$

Applying Lemma 3, there exists a sequence of codes if

$$\frac{\log(M_1M_2)}{n}\geq C_{\text{sum}}+\frac{1}{\sqrt{n}}\,F_{S_K}^{-1}(\epsilon)-O\left(\frac{K}{n}\right)-O\left(\frac{\log n}{n}\right).$$

This achieves the ranges of $K$ given by (7)–(8).

Case 2: $K\geq\log^{3/2}n$ and $\log K=o(n^{1/3})$. For convenience, define

$$A=\frac{c_{12}^{\star}-\max_{k\in[K]}s(X_1^n(k),X_2^n(k))}{\sqrt{nV_2}}.$$

Thus,

\begin{align*}
p_e &= \Pr(Z_0<A)\\
&\leq \Pr(Z_0<A,\ |Z_0|<\sqrt{\ln n})+\Pr(|Z_0|\geq\sqrt{\ln n})\\
&\leq \Pr(Z_0<A,\ |Z_0|<\sqrt{\ln n})+O\left(\frac{1}{\sqrt{n}}\right)\\
&= \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}}\phi(z)\Pr(z<A)\,dz+O\left(\frac{1}{\sqrt{n}}\right)\\
&= \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}}\phi(z)\Pr\left(\max_{k\in[K]}s(X_1^n(k),X_2^n(k))<c_{12}^{\star}-\sqrt{nV_2}\,z\right)dz+O\left(\frac{1}{\sqrt{n}}\right)\\
&= \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}}\phi(z)\Pr\left(s(X_1^n,X_2^n)<c_{12}^{\star}-\sqrt{nV_2}\,z\right)^K dz+O\left(\frac{1}{\sqrt{n}}\right).
\end{align*}

To continue, we need the moderate deviations bound given by the following lemma.

Lemma 4 (Moderate deviations [16]).

Let $X_1,X_2,\ldots$ be i.i.d. random variables with zero mean and unit variance, and let $W=\sum_{i=1}^{n}X_i/\sqrt{n}$, where $c=E[e^{t|X_1|}]<\infty$ for some $t>0$. There exist constants $a_0$ and $b_0$ depending only on $t$ and $c$ such that, for any $0\leq w\leq a_0 n^{1/6}$,

$$\left|\frac{\Pr(W\geq w)}{Q(w)}-1\right|\leq\frac{b_0(1+w^3)}{\sqrt{n}},$$

where $Q(w)=1-\Phi(w)$ is the complementary CDF of the standard Gaussian distribution.

To apply the moderate deviations bound, we can write

$$\Pr\left(s(X_1^n,X_2^n)<c_{12}^{\star}-\sqrt{nV_2}\,z\right)=\Pr\left(\frac{s(X_1^n,X_2^n)-nC_{\text{sum}}}{\sqrt{nV_1}}<w_z\right)$$

where

$$w_z=\frac{c_{12}^{\star}-\sqrt{nV_2}\,z-nC_{\text{sum}}}{\sqrt{nV_1}}.$$

Since in our integral $|z|\leq\sqrt{\ln n}$, in order to apply the moderate deviations bound, we need to show that $|w_z|\leq a_0 n^{1/6}$ whenever $|z|\leq\sqrt{\ln n}$. We have

$$|w_z|\leq\frac{|c_{12}^{\star}-nC_{\text{sum}}|}{\sqrt{nV_1}}+\sqrt{\frac{2V_2\ln n}{V_1}}.$$

From the target for $M_1M_2$ in (15),

\begin{align*}
c_{12}^{\star} &= \log(M_1M_2K)+\frac{1}{2}\log n\\
&= nC_{\text{sum}}+\sqrt{n}\,F_{S_K}^{-1}(\epsilon)-n\theta_n+\log K+\frac{1}{2}\log n\\
&\leq nC_{\text{sum}}+\sqrt{2V_1n\ln K}+\log K+O(\log n)\\
&= nC_{\text{sum}}+O(n^{2/3}).
\end{align*}

By the assumption that $\log K=o(n^{1/3})$, $\sqrt{n\ln K}\gg\log K$, so

$$|w_z|=O(\sqrt{\log K})+O(\sqrt{\log n}).$$

Thus $|w_z|=o(n^{1/6})$, so indeed we may apply the moderate deviations bound. Let

\begin{align*}
\lambda_n &= \max_{|z|\leq\sqrt{\ln n}}\frac{b_0}{\sqrt{n}}(1+|w_z|^3)\\
&= O\left(\frac{\log^{3/2}K}{\sqrt{n}}\right)+O\left(\frac{\log^{3/2}n}{\sqrt{n}}\right).
\end{align*}

Letting $Z_1\sim\mathcal{N}(0,1)$, we now have

\begin{align*}
p_e &\leq \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}}\phi(z)\left(1-\Pr(Z_1>w_z)(1-\lambda_n)\right)^K dz+O\left(\frac{1}{\sqrt{n}}\right)\\
&\leq \int_{-\sqrt{\ln n}}^{\sqrt{\ln n}}\phi(z)\left(1-Q(w_z)(1-\lambda_n)\right)^K dz+O\left(\frac{1}{\sqrt{n}}\right).
\end{align*}

We now claim that for any $w\geq 0$ and any $0\leq\lambda\leq 3/4$,

$$Q(w)(1-\lambda)\geq Q(w+2\lambda).$$

Indeed, it is easy to see that

$$\frac{Q(w+2\lambda)}{Q(w)}\leq\frac{Q(2\lambda)}{Q(0)}=2Q(2\lambda)\leq 1-\lambda,$$

where the last inequality holds if $\lambda\leq 3/4$. Note that $\lambda_n=o(1)$, so this inequality holds for sufficiently large $n$. Thus,

\begin{align*}
p_e &\leq E\left[\left(1-Q(w_{Z_0})(1-\lambda_n)\right)^K\cdot 1\left(w_{Z_0}\geq 0\right)\right]+\Pr\left(w_{Z_0}<0\right)+O\left(\frac{1}{\sqrt{n}}\right)\\
&\leq E\left[\left(1-Q(w_{Z_0}+2\lambda_n)\right)^K\right]+Q\left(\frac{c_{12}^{\star}-nC_{\text{sum}}}{\sqrt{nV_2}}\right)+O\left(\frac{1}{\sqrt{n}}\right).
\end{align*}

Note that

\begin{align*}
E\left[\left(1-Q(w_{Z_0}+2\lambda_n)\right)^K\right] &= E\left[\Pr\left(Z_1<\frac{c_{12}^{\star}-\sqrt{nV_2}\,Z_0-nC_{\text{sum}}}{\sqrt{nV_1}}+2\lambda_n\ \middle|\ Z_0\right)^K\right]\\
&= \Pr\left(\max_{k\in[K]}Z_k<\frac{c_{12}^{\star}-\sqrt{nV_2}\,Z_0-nC_{\text{sum}}}{\sqrt{nV_1}}+2\lambda_n\right)\\
&= F_{S_K}\left(\frac{c_{12}^{\star}-nC_{\text{sum}}}{\sqrt{n}}+2\sqrt{V_1}\,\lambda_n\right).
\end{align*}

At this point, we make the choice of $M_1M_2$ slightly more precise; in particular, let

\begin{align*}
\log(M_1M_2) &= nC_{\text{sum}}+\sqrt{n}\,F_{S_K}^{-1}\left(\epsilon-c_1\sqrt{\frac{\log K}{n}}-c_2\sqrt{\frac{\log n}{n}}-K^{-V_1/(2V_2)}\right)\\
&\quad-2\sqrt{nV_1}\,\lambda_n-\frac{1}{2}\log n-\log K
\end{align*}

for suitable constants $c_1$ and $c_2$. From Lemma 1,

$$\frac{c_{12}^{\star}-nC_{\text{sum}}}{\sqrt{n}}\geq\sqrt{2V_1\ln K}-o(1).$$

Thus

\begin{align*}
Q\left(\frac{c_{12}^{\star}-nC_{\text{sum}}}{\sqrt{nV_2}}\right) &\leq \exp\left\{-\frac{1}{2}\left(\sqrt{\frac{2V_1\ln K}{V_2}}-o(1)\right)^2\right\}\\
&\leq K^{-V_1/(2V_2)},
\end{align*}

where the last inequality holds for sufficiently large $n$. From Theorem 1, there exists a code with probability of error at most

\begin{align*}
&p_e+O\left(\sqrt{\frac{\log K}{n}}\right)+O\left(\sqrt{\frac{\log n}{n}}\right)\\
&\leq \epsilon-c_1\sqrt{\frac{\log K}{n}}-c_2\sqrt{\frac{\log n}{n}}+O\left(\frac{1}{\sqrt{n}}\right)+O\left(\sqrt{\frac{\log K}{n}}\right)+O\left(\sqrt{\frac{\log n}{n}}\right)\\
&\leq \epsilon,
\end{align*}

assuming $c_1,c_2$ are chosen properly. This proves that we can achieve the sum-rate

\begin{align*}
\frac{\log(M_1M_2)}{n} &\geq C_{\text{sum}}+\frac{1}{\sqrt{n}}F_{S_K}^{-1}\left(\epsilon-c_1\sqrt{\frac{\log K}{n}}-c_2\sqrt{\frac{\log n}{n}}-K^{-V_1/(2V_2)}\right)-\frac{2\sqrt{V_1}\,\lambda_n}{\sqrt{n}}-\frac{\log n}{2n}-\frac{\log K}{n}\\
&\geq C_{\text{sum}}+\frac{1}{\sqrt{n}}F_{S_K}^{-1}(\epsilon)-O\left(\frac{\log^{3/2}K}{n}\right)-O\left(\frac{\log^{3/2}n}{n}\right),
\end{align*}

where in the last inequality we have used Lemma 3 as well as the bound on $\lambda_n$. This achieves the ranges of $K$ given by (9)–(10).

VI Proof of Theorem 3

This proof uses the method of types. A probability mass function $p_X$ is an $n$-length type on alphabet ${\cal X}$ if $p_X(x)$ is a multiple of $1/n$ for each $x\in{\cal X}$. For an $n$-length type $p_X$, the type class, denoted $T(p_X)$, is the set of sequences in ${\cal X}^n$ with empirical distribution $p_X$.

Let $p_{X_1,X_2}$ be an $n$-length joint type on alphabet ${\cal X}_1\times{\cal X}_2$. Note that the marginal distributions $p_{X_1}$ and $p_{X_2}$ are also $n$-length types. We employ the following random code construction. Draw codewords uniformly from the type classes $T(p_{X_1})$ and $T(p_{X_2})$. Given message pair $(m_1,m_2)$, the cooperation facilitator chooses uniformly from the set of $k\in[K]$ where

$$(f_1(m_1,k),f_2(m_2,k))\in T(p_{X_1,X_2}).$$

If there is no such $k$, the CF chooses $k$ uniformly at random. These random choices at the CF are taken to be part of the random code design; a sketch of this construction appears below. For the purposes of this proof, the three information densities employ the joint distribution $p_{X_1,X_2}$. The quantity $V_2$ is also defined as in (3), using the information density for this joint distribution. The decoder is as follows. Given $y^n$, choose the unique message pair $(m_1,m_2)$ such that

  1. $\mathbf{i}((X_1^n,X_2^n)(m_1,m_2);y^n)\geq\mathbf{c}^{\star}$, and

  2. $(X_1^n,X_2^n)(m_1,m_2)\in T(p_{X_1,X_2})$,

for a constant vector $\mathbf{c}^{\star}=[c_{12}^{\star},c_1^{\star},c_2^{\star}]^T$ to be determined. If no message pair, or more than one, satisfies these conditions, declare an error. Note that, given

$$(X_1^n,X_2^n)(m_1,m_2)\in T(p_{X_1,X_2}),$$

$(X_1^n,X_2^n)(m_1,m_2)$ is uniformly distributed on $T(p_{X_1,X_2})$. Let $q(x_1^n,x_2^n)$ be the uniform distribution on the type class $T(p_{X_1,X_2})$, with corresponding conditional distributions $q(x_1^n|x_2^n)$ and $q(x_2^n|x_1^n)$. Define random variables $X_1^n,X_2^n,Y^n$ to have distribution

$$p_{X_1^n,X_2^n,Y^n}(x_1^n,x_2^n,y^n)=q(x_1^n,x_2^n)\,p_{Y^n|X_1^n,X_2^n}(y^n|x_1^n,x_2^n).$$

Furthermore, define $Y_1^n,Y_2^n,Y_{12}^n$ where

$$p_{Y_1^n,Y_2^n,Y_{12}^n|X_1^n,X_2^n,Y^n}(y_1^n,y_2^n,y_{12}^n|x_1^n,x_2^n,y^n)=p_{Y^n|X_2^n}(y_1^n|x_2^n)\,p_{Y^n|X_1^n}(y_2^n|x_1^n)\,p_{Y^n}(y_{12}^n).$$
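For concreteness, here is a minimal sketch of the construction above (helper names ours): a uniform draw from a type class is a random permutation of any fixed sequence with the right composition, and the CF compares empirical joint types.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def draw_from_type_class(counts):
    """Uniform draw from T(p_X): permute a sequence with composition counts[x] = n*p_X(x)."""
    return rng.permutation(np.repeat(np.arange(len(counts)), counts))

def cf_choice(cands1, cands2, joint_type, K):
    """CF rule: uniform over k whose pair (cands1[k], cands2[k]) has empirical joint
    type equal to joint_type (a Counter over symbol pairs); else uniform over [K]."""
    good = [k for k in range(K)
            if Counter(zip(cands1[k].tolist(), cands2[k].tolist())) == joint_type]
    return int(rng.choice(good if good else range(K)))
```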

Now we may bound the expected error probability by

\begin{align*}
E[P_e] &\leq \Pr\big((X_1^n,X_2^n)(1,1)\notin T(p_{X_1,X_2})\big) \qquad (19)\\
&\quad+\Pr\big(\mathbf{i}((X_1^n,X_2^n)(1,1);Y^n)\not\geq\mathbf{c}^{\star}\big)\\
&\quad+\sum_{(\hat m_1,\hat m_2)\neq(1,1)}\Pr\Big((X_1^n,X_2^n)(\hat m_1,\hat m_2)\in T(p_{X_1,X_2}),\ \mathbf{i}((X_1^n,X_2^n)(\hat m_1,\hat m_2);Y^n)\geq\mathbf{c}^{\star}\ \Big|\ (X_1^n,X_2^n)(1,1)\Big)\\
&\leq \Pr\big((X_1^n,X_2^n)(1,1)\notin T(p_{X_1,X_2})\big)\\
&\quad+\Pr\big(\mathbf{i}((X_1^n,X_2^n)(1,1);Y^n)\not\geq\mathbf{c}^{\star}\big)\\
&\quad+\sum_{(\hat m_1,\hat m_2)\neq(1,1)}\Pr\Big(\mathbf{i}((X_1^n,X_2^n)(\hat m_1,\hat m_2);Y^n)\geq\mathbf{c}^{\star}\ \Big|\ (X_1^n,X_2^n)(1,1),\ (X_1^n,X_2^n)(\hat m_1,\hat m_2)\in T(p_{X_1,X_2})\Big). \qquad (20)
\end{align*}

In the summation in (20), consider a term where $\hat m_1\neq 1$ and $\hat m_2\neq 1$. In this case, $(X_1^n,X_2^n)(\hat m_1,\hat m_2)$ is independent of $Y^n$, so we may write

$$((X_1^n,X_2^n)(\hat m_1,\hat m_2),Y^n)\stackrel{d}{=}(X_1^n,X_2^n,Y_{12}^n),$$

where $Y_{12}^n$ has the same distribution as $Y^n$ but is independent of $X_1^n,X_2^n$; i.e.,

$$p_{Y_{12}^n|X_1^n,X_2^n}(y^n|x_1^n,x_2^n)=p_{Y^n}(y^n).$$

Now consider a term in (20) where $\hat m_1=1$ but $\hat m_2\neq 1$. In this case, whether the transmitted signal from user 1 with message pair $(1,\hat m_2)$ is the same as that with message pair $(1,1)$ depends on whether $e(1,\hat m_2)=e(1,1)$. Thus, the term in (20) is no more than

\begin{align*}
&\Pr\Big(\mathbf{i}((X_1^n,X_2^n)(1,\hat m_2);Y^n)\geq\mathbf{c}^{\star}\ \Big|\ (X_1^n,X_2^n)(1,1),\ (X_1^n,X_2^n)(1,\hat m_2)\in T(p_{X_1,X_2}),\ e(1,\hat m_2)=e(1,1)\Big)\\
&\quad+\Pr\Big(\mathbf{i}((X_1^n,X_2^n)(1,\hat m_2);Y^n)\geq\mathbf{c}^{\star}\ \Big|\ (X_1^n,X_2^n)(1,1),\ (X_1^n,X_2^n)(1,\hat m_2)\in T(p_{X_1,X_2}),\ e(1,\hat m_2)\neq e(1,1)\Big).
\end{align*}

In the first term, $Y^n$ is the channel output when $X_1^n(1,\hat m_2)$ is one of the channel inputs but the channel input for user 2 is unrelated. However, by the condition that $(X_1^n,X_2^n)(1,\hat m_2)\in T(p_{X_1,X_2})$, these two codewords are distributed according to $q(x_1^n,x_2^n)$. Thus we may write

$$((X_1^n,X_2^n)(1,\hat m_2),Y^n)\stackrel{d}{=}(X_1^n,X_2^n,Y_2^n),$$

where

$$p_{Y_2^n|X_1^n,X_2^n}(y^n|x_1^n,x_2^n)=p_{Y^n|X_1^n}(y^n|x_1^n).$$

In the second term, the transmitted signals are unrelated, so the three sequences once again have the same distribution as $(X_1^n,X_2^n,Y_{12}^n)$. We may apply a similar analysis to the case where $\hat m_1\neq 1$ and $\hat m_2=1$, defining $Y_1^n$ by

$$p_{Y_1^n|X_1^n,X_2^n}(y^n|x_1^n,x_2^n)=p_{Y^n|X_2^n}(y^n|x_2^n).$$

Therefore

\begin{align*}
E[P_e] &\leq \Pr((X_1^n,X_2^n)(1,1)\notin T(p_{X_1,X_2}))+\Pr(\mathbf{i}(X_1^n,X_2^n;Y^n)\not\geq\mathbf{c}^{\star})\\
&\quad+M_1M_2\Pr(\mathbf{i}(X_1^n,X_2^n;Y_{12}^n)\geq\mathbf{c}^{\star})+M_1\Pr(\mathbf{i}(X_1^n,X_2^n;Y_1^n)\geq\mathbf{c}^{\star})+M_2\Pr(\mathbf{i}(X_1^n,X_2^n;Y_2^n)\geq\mathbf{c}^{\star})\\
&\leq \Pr((X_1^n,X_2^n)(1,1)\notin T(p_{X_1,X_2}))+\Pr(\mathbf{i}(X_1^n,X_2^n;Y^n)\not\geq\mathbf{c}^{\star})\\
&\quad+M_1M_2\Pr(i(X_1^n,X_2^n;Y_{12}^n)\geq c_{12}^{\star})+M_1\Pr(i(X_1^n;Y_1^n|X_2^n)\geq c_1^{\star})+M_2\Pr(i(X_2^n;Y_2^n|X_1^n)\geq c_2^{\star}).
\end{align*}

For any (x1n,x2n)T(pX1,X2)(x_{1}^{n},x_{2}^{n})\in T(p_{X_{1},X_{2}}),

q(x1n,x2n)\displaystyle q(x_{1}^{n},x_{2}^{n})
=1|T(pX1,X2)|\displaystyle=\frac{1}{|T(p_{X_{1},X_{2}})|}
(n+1)|𝒳1||𝒳2|2nH(X1,X2)\displaystyle\leq(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}2^{-nH(X_{1},X_{2})}
=(n+1)|𝒳1||𝒳2|x1,x2pX1,X2(x1,x2)npX1,X2(x1,x2)\displaystyle=(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}\prod_{x_{1},x_{2}}p_{X_{1},X_{2}}(x_{1},x_{2})^{np_{X_{1},X_{2}}(x_{1},x_{2})}
=(n+1)|𝒳1||𝒳2|i=1npX1,X2(x1i,x2i).\displaystyle=(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}\prod_{i=1}^{n}p_{X_{1},X_{2}}(x_{1i},x_{2i}).

Thus, for any x1n,x2nx_{1}^{n},x_{2}^{n} including those not in T(pX1,X2)T(p_{X_{1},X_{2}}),

q(x1n,x2n)(n+1)|𝒳1||𝒳2|i=1npX1,X2(x1i,x2i).q(x_{1}^{n},x_{2}^{n})\leq(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}\prod_{i=1}^{n}p_{X_{1},X_{2}}(x_{1i},x_{2i}).

By similar calculations,

q(x1n|x2n)(n+1)|𝒳1||𝒳2|i=1npX1|X2(x1i|x2i),\displaystyle q(x_{1}^{n}|x_{2}^{n})\leq(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}\prod_{i=1}^{n}p_{X_{1}|X_{2}}(x_{1i}|x_{2i}),
q(x2n|x1n)(n+1)|𝒳1||𝒳2|i=1npX2|X1(x2i|x1i).\displaystyle q(x_{2}^{n}|x_{1}^{n})\leq(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}\prod_{i=1}^{n}p_{X_{2}|X_{1}}(x_{2i}|x_{1i}).
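As a quick numerical sanity check (not part of the proof), the bound above can be verified directly for a toy joint type: the Python sketch below computes q(x_{1}^{n},x_{2}^{n})=1/|T(p_{X_{1},X_{2}})| exactly via a multinomial coefficient and compares it against the product-distribution bound. The blocklength and the counts defining the type are arbitrary choices.

```python
import math

# Toy numeric check (not from the paper) of the type-class bound
#   q(x1^n, x2^n) = 1/|T(p)| <= (n+1)^{|X1||X2|} * prod_i p(x1i, x2i)
# for a binary-by-binary joint type; n and the counts are arbitrary.
n = 12
counts = {(0, 0): 6, (0, 1): 2, (1, 0): 2, (1, 1): 2}  # n * p(x1, x2)
p = {k: c / n for k, c in counts.items()}

# |T(p)| is the multinomial coefficient n! / prod (n p(x1, x2))!.
size_T = math.factorial(n)
for c in counts.values():
    size_T //= math.factorial(c)
q = 1 / size_T

# For any pair in T(p), prod_i p(x1i, x2i) = prod_{x1,x2} p^{n p}.
prod_p = math.prod(p[k] ** counts[k] for k in counts)
bound = (n + 1) ** 4 * prod_p  # |X1| * |X2| = 4

print(f"q = {q:.3e}  <=  bound = {bound:.3e}: {q <= bound}")
```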

We bound Pr(i(X1n,X2n;Y12n)c12)\Pr(i(X_{1}^{n},X_{2}^{n};Y_{12}^{n})\geq c_{12}^{\star}) as

\displaystyle\Pr(i(X_{1}^{n},X_{2}^{n};Y_{12}^{n})\geq c_{12}^{\star})
\displaystyle=\sum_{x_{1}^{n},x_{2}^{n},y^{n}}q(x_{1}^{n},x_{2}^{n})p_{Y^{n}}(y^{n})1(i(x_{1}^{n},x_{2}^{n};y^{n})\geq c_{12}^{\star})
\displaystyle\leq(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}\sum_{x_{1}^{n},x_{2}^{n},y^{n}}\prod_{i=1}^{n}p_{X_{1},X_{2}}(x_{1i},x_{2i})p_{Y^{n}}(y^{n})
\qquad\cdot 1\left(\sum_{i=1}^{n}i(x_{1i},x_{2i};y_{i})\geq c_{12}^{\star}\right)
\displaystyle\leq(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}2^{-c_{12}^{\star}}
\displaystyle\quad\cdot\sum_{x_{1}^{n},x_{2}^{n},y^{n}}\prod_{i=1}^{n}p_{X_{1},X_{2}|Y}(x_{1i},x_{2i}|y_{i})p_{Y^{n}}(y^{n})
\displaystyle=(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}2^{-c_{12}^{\star}}.

Using similar bounds on the other terms, we have

E[Pe]\displaystyle E[P_{e}] Pr((X1n,X2n)(1,1)T(pX1,X2))\displaystyle\leq\Pr((X_{1}^{n},X_{2}^{n})(1,1)\notin T(p_{X_{1},X_{2}}))
+Pr(𝐢(X1n,X2n;Yn)𝐜)\displaystyle\quad+\Pr(\mathbf{i}(X_{1}^{n},X_{2}^{n};Y^{n})\not\geq\mathbf{c}^{\star})
\displaystyle\quad+(n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}\Big{(}M_{1}M_{2}2^{-c_{12}^{\star}}
\displaystyle\quad+M_{1}2^{-c_{1}^{\star}}+M_{2}2^{-c_{2}^{\star}}\Big{)}.

Next, choose

c12\displaystyle c_{12}^{\star} =log(M1M2)+12logn+|𝒳1||𝒳2|log(n+1),\displaystyle=\log(M_{1}M_{2})+\frac{1}{2}\log n+|{\cal X}_{1}|\cdot|{\cal X}_{2}|\log(n+1),
c1\displaystyle c_{1}^{\star} =logM1+12logn+|𝒳1||𝒳2|log(n+1),\displaystyle=\log M_{1}+\frac{1}{2}\log n+|{\cal X}_{1}|\cdot|{\cal X}_{2}|\log(n+1),
c2\displaystyle c_{2}^{\star} =logM2+12logn+|𝒳1||𝒳2|log(n+1).\displaystyle=\log M_{2}+\frac{1}{2}\log n+|{\cal X}_{1}|\cdot|{\cal X}_{2}|\log(n+1).

Then

E[Pe]\displaystyle E[P_{e}] Pr((X1n,X2n)(1,1)T(pX1,X2))\displaystyle\leq\Pr((X_{1}^{n},X_{2}^{n})(1,1)\notin T(p_{X_{1},X_{2}}))
+Pr(𝐢(X1n,X2n;Yn)𝐜)+3n\displaystyle\quad+\Pr(\mathbf{i}(X_{1}^{n},X_{2}^{n};Y^{n})\not\geq\mathbf{c}^{\star})+\frac{3}{\sqrt{n}}
Pr((X1n,X2n)(1,1)T(pX1,X2))\displaystyle\leq\Pr((X_{1}^{n},X_{2}^{n})(1,1)\notin T(p_{X_{1},X_{2}}))
+Pr(i(X1n,X2n;Yn)<c12)\displaystyle\quad+\Pr(i(X_{1}^{n},X_{2}^{n};Y^{n})<c_{12}^{\star})
+Pr(i(X1n;Yn|X2n)<c1)\displaystyle\quad+\Pr(i(X_{1}^{n};Y^{n}|X_{2}^{n})<c_{1}^{\star})
\displaystyle\quad+\Pr(i(X_{2}^{n};Y^{n}|X_{1}^{n})<c_{2}^{\star})+\frac{3}{\sqrt{n}}. (21)
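The 3/\sqrt{n} term above is pure bookkeeping: with the thresholds chosen as above, each penalty term (n+1)^{|{\cal X}_{1}|\cdot|{\cal X}_{2}|}\,M\,2^{-c^{\star}} collapses to exactly n^{-1/2}. A minimal check, using toy values for n, the alphabet product |{\cal X}_{1}|\cdot|{\cal X}_{2}|, and \log(M_{1}M_{2}) (none taken from the paper):

```python
import math

# With c* = log2(M) + (1/2) log2(n) + a*log2(n+1), the penalty term
# (n+1)^a * M * 2^{-c*} equals n^{-1/2} identically; values are toy choices.
n, a = 1000, 4        # blocklength and |X1|*|X2| (arbitrary)
log2_M = 200.0        # log2(M1*M2), arbitrary

c_star = log2_M + 0.5 * math.log2(n) + a * math.log2(n + 1)
term = (n + 1) ** a * 2 ** (log2_M - c_star)
print(term, 1 / math.sqrt(n))  # both 0.03162...
```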

As in the proof of Thm. 2, let (r1,r2)(r_{1},r_{2}) be a pair of rates satisfying (16)–(18). We now choose

logMj=rj12(nV2Q1(ϵ)+nθn),j=1,2\log M_{j}=r_{j}-\frac{1}{2}\left(\sqrt{nV_{2}}\,Q^{-1}(\epsilon)+n\theta_{n}\right),\quad j=1,2

where \theta_{n} is an error term chosen below to satisfy \theta_{n}\leq O(\frac{\log n}{n}). Thus

log(M1M2)=nI(X1,X2;Y)nV2Q1(ϵ)nθn.\log(M_{1}M_{2})=nI(X_{1},X_{2};Y)-\sqrt{nV_{2}}\,Q^{-1}(\epsilon)-n\theta_{n}.

Consider the first term in (21). Note that (X1n,X2n)(1,1)T(pX1,X2)(X_{1}^{n},X_{2}^{n})(1,1)\notin T(p_{X_{1},X_{2}}) only if

(f1(1,k),f2(1,k))T(pX1,X2) for all k[K].(f_{1}(1,k),f_{2}(1,k))\notin T(p_{X_{1},X_{2}})\text{ for all }k\in[K].

This occurs with probability bounded as

Pr((X1n,X2n)(1,1)T(pX1,X2))\displaystyle\Pr((X_{1}^{n},X_{2}^{n})(1,1)\notin T(p_{X_{1},X_{2}}))
=(1|T(pX1,X2)||T(pX1)||T(pX2)|)K\displaystyle=\left(1-\frac{|T(p_{X_{1},X_{2}})|}{|T(p_{X_{1}})|\cdot|T(p_{X_{2}})|}\right)^{K}
(1(n+1)|𝒳1||𝒳2|2nI(X1;X2))K\displaystyle\leq\left(1-(n+1)^{-|{\cal X}_{1}|\cdot|{\cal X}_{2}|}2^{-nI(X_{1};X_{2})}\right)^{K}
exp{K(n+1)|𝒳1||𝒳2|2nI(X1;X2)}\displaystyle\leq\exp\left\{-K(n+1)^{-|{\cal X}_{1}|\cdot|{\cal X}_{2}|}2^{-nI(X_{1};X_{2})}\right\}
1n,\displaystyle\leq\frac{1}{\sqrt{n}},

where the last inequality holds if

I(X1;X2)\displaystyle I(X_{1};X_{2}) 1n(logK|𝒳1||𝒳2|log(n+1)\displaystyle\leq\frac{1}{n}\bigg{(}\log K-|{\cal X}_{1}|\cdot|{\cal X}_{2}|\log(n+1)
log(12lnn)).\displaystyle\quad-\log\left(\frac{1}{2}\ln n\right)\bigg{)}. (22)
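Condition (22) makes precise how much input dependence a cooperation budget of \log K bits can support. The sketch below (illustrative only; the blocklength, binary alphabets, and values of \log_{2}K are arbitrary) evaluates the right-hand side of (22); a negative value indicates that the condition cannot be met by any dependent input distribution.

```python
import math

# Largest I(X1;X2), in bits per channel use, permitted by condition (22).
# Blocklength, alphabet sizes, and log2(K) below are illustrative only.
def max_I12(n, log2_K, a1=2, a2=2):
    return (log2_K - a1 * a2 * math.log2(n + 1)
            - math.log2(0.5 * math.log(n))) / n

n = 10000
for log2_K in [60, 100, 200]:
    print(log2_K, f"{max_I12(n, log2_K):.2e}")
```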

Now consider the second term in (21). For any (x1n,x2n)T(pX1,X2)(x_{1}^{n},x_{2}^{n})\in T(p_{X_{1},X_{2}}),

i=1nE[i(x1i,x2i;Yi)]=nI(X1,X2;Y),\displaystyle\sum_{i=1}^{n}E[i(x_{1i},x_{2i};Y_{i})]=nI(X_{1},X_{2};Y),
i=1nVar[i(x1i,x2i;Yi)]\displaystyle\sum_{i=1}^{n}\text{Var}[i(x_{1i},x_{2i};Y_{i})]
=nx1,x2pX1,X2(x1,x2)V(pY|X1=x1,X2=x2pY)=nV2.\displaystyle\quad=n\sum_{x_{1},x_{2}}p_{X_{1},X_{2}}(x_{1},x_{2})V(p_{Y|X_{1}=x_{1},X_{2}=x_{2}}\|p_{Y})=nV_{2}.

By the Berry-Esseen inequality,

Pr(i(X1n,X2n;Yn)<c12)\displaystyle\Pr(i(X_{1}^{n},X_{2}^{n};Y^{n})<c_{12}^{\star})
max(x1n,x2n)T(pX1,X2)Pr(i(x1n,x2n;Yn)<c12|x1n,x2n)\displaystyle\leq\max_{(x_{1}^{n},x_{2}^{n})\in T(p_{X_{1},X_{2}})}\Pr(i(x_{1}^{n},x_{2}^{n};Y^{n})<c_{12}^{\star}|x_{1}^{n},x_{2}^{n})
Q(nI(X1,X2;Y)c12nV2)+O(1n).\displaystyle\leq Q\left(\frac{nI(X_{1},X_{2};Y)-c_{12}^{\star}}{\sqrt{nV_{2}}}\right)+O\left(\frac{1}{\sqrt{n}}\right).

As in the proof of Thm. 2 (near (14)), we use Hoeffding’s inequality to bound the third and fourth terms in (21) from above by 1/n1/\sqrt{n}.

Putting together all the above bounds, for any pX1,X2p_{X_{1},X_{2}} satisfying (22), we find

E[Pe]\displaystyle E[P_{e}]
Q(nI(X1,X2;Y)c12nV2)+O(1n)\displaystyle\leq Q\left(\frac{nI(X_{1},X_{2};Y)-c_{12}^{\star}}{\sqrt{nV_{2}}}\right)+O\left(\frac{1}{\sqrt{n}}\right)
=Q(nI(X1,X2;Y)log(M1M2)O(logn)nV2)\displaystyle=Q\left(\frac{nI(X_{1},X_{2};Y)-\log(M_{1}M_{2})-O(\log n)}{\sqrt{nV_{2}}}\right)
+O(1n)\displaystyle\qquad+O\left(\frac{1}{\sqrt{n}}\right)
=Q(Q1(ϵ)+nV2θnO(lognn))+O(1n).\displaystyle=Q\left(Q^{-1}(\epsilon)+\sqrt{\frac{n}{V_{2}}}\,\theta_{n}-O\left(\frac{\log n}{\sqrt{n}}\right)\right)+O\left(\frac{1}{\sqrt{n}}\right).

There exists a choice of \theta_{n}=O(\frac{\log n}{n}) for which this bound is no greater than \epsilon. This proves that we can achieve the sum-rate

log(M1M2)n\displaystyle\frac{\log(M_{1}M_{2})}{n} I(X1,X2;Y)V2nQ1(ϵ)O(lognn)\displaystyle\geq I(X_{1},X_{2};Y)-\sqrt{\frac{V_{2}}{n}}\,Q^{-1}(\epsilon)-O\left(\frac{\log n}{n}\right)

for any pX1,X2p_{X_{1},X_{2}} satisfying (22).
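To get a feel for the size of the second-order backoff in this sum-rate, the following sketch evaluates \sqrt{V_{2}/n}\,Q^{-1}(\epsilon) for assumed toy values of I(X_{1},X_{2};Y), V_{2}, and \epsilon (none taken from the paper), dropping the O(\log n/n) residual.

```python
import math
from scipy.stats import norm

# Achievable sum-rate I - sqrt(V2/n) * Qinv(eps), ignoring the O(log n / n)
# residual; I, V2, and eps are illustrative placeholders.
I, V2, eps = 1.0, 0.5, 1e-3   # bits/use, bits^2/use, target error
for n in [100, 1000, 10000]:
    backoff = math.sqrt(V2 / n) * norm.isf(eps)   # norm.isf = Q^{-1}
    print(n, f"{I - backoff:.4f}")
```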

Appendix A Proof of Lemma 2

Throughout this proof, x\approx y means that x-y\to 0 as a\to 0. For small a, I(X_{1};X_{2})\leq a implies that p_{X_{1},X_{2}}\approx p_{X_{1}}p_{X_{2}}. Thus, the second-order Taylor approximation for the mutual information gives

I(X_{1};X_{2})\approx\frac{1}{2\ln 2}\sum_{x_{1},x_{2}}\frac{(p_{X_{1},X_{2}}(x_{1},x_{2})-p_{X_{1}}(x_{1})p_{X_{2}}(x_{2}))^{2}}{p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})}.
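This quadratic (chi-squared-type) approximation is easy to confirm numerically: for a small perturbation of a product distribution with zero row and column sums, I(X_{1};X_{2}) and the quadratic form agree to leading order. A minimal sketch with arbitrary toy marginals:

```python
import numpy as np

# Compare I(X1;X2) against (1/(2 ln 2)) * sum t^2 / (p1 p2) for a small
# perturbation t with zero row and column sums (marginals unchanged).
p1 = np.array([0.4, 0.6])
p2 = np.array([0.3, 0.7])
t = 1e-3 * np.array([[1.0, -1.0], [-1.0, 1.0]])

joint = np.outer(p1, p2) + t
I = np.sum(joint * np.log2(joint / np.outer(p1, p2)))
approx = np.sum(t ** 2 / np.outer(p1, p2)) / (2 * np.log(2))
print(I, approx)   # agree to leading order in t
```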

Moreover, the first-order Taylor approximation of the mutual information I(X1,X2;Y)I(X_{1},X_{2};Y) is

x1,x2,ypX1,X2(x1,x2)logpY|X1,X2(y|x1,x2)pY(y)\sum_{x_{1},x_{2},y}p_{X_{1},X_{2}}(x_{1},x_{2})\log\frac{p_{Y|X_{1},X_{2}}(y|x_{1},x_{2})}{p_{Y}(y)}

where

pY(y)=x1,x2pX1(x1)pX2(x2)pY|X1,X2(y|x1,x2).p_{Y}(y)=\sum_{x_{1},x_{2}}p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})p_{Y|X_{1},X_{2}}(y|x_{1},x_{2}).

As usual, let

i(x1,x2;y)\displaystyle i(x_{1},x_{2};y) =logpY|X1,X2(y|x1,x2)pY(y)\displaystyle=\log\frac{p_{Y|X_{1},X_{2}}(y|x_{1},x_{2})}{p_{Y}(y)}
i(x1,x2)\displaystyle i(x_{1},x_{2}) =ypY|X1,X2(y|x1,x2)i(x1,x2;y).\displaystyle=\sum_{y}p_{Y|X_{1},X_{2}}(y|x_{1},x_{2})i(x_{1},x_{2};y).

Also let

I0(X1,X2;Y)=x1,x2pX1(x1)pX2(x2)i(x1,x2)I_{0}(X_{1},X_{2};Y)=\sum_{x_{1},x_{2}}p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})i(x_{1},x_{2})

be the mutual information where X1X_{1} and X2X_{2} are independent. We can now rewrite the optimization problem for Δ(a)\Delta(a) in terms of the marginal distributions pX1,pX2,p_{X_{1}},p_{X_{2}}, and

r(x1,x2)=pX1,X2(x1,x2)pX1(x1)pX2(x2).r(x_{1},x_{2})=p_{X_{1},X_{2}}(x_{1},x_{2})-p_{X_{1}}(x_{1})p_{X_{2}}(x_{2}).

Note that

I(X_{1},X_{2};Y)-C_{\text{sum}}\approx\sum_{x_{1},x_{2}}r(x_{1},x_{2})i(x_{1},x_{2})+I_{0}(X_{1},X_{2};Y)-C_{\text{sum}}.

In particular, if we consider maximizing over only rr, the optimization problem is

maximizex1,x2r(x1,x2)i(x1,x2)subject tox1,x2r(x1,x2)2pX1(x1)pX2(x2)a 2ln2x2r(x1,x2)=0 for all x1𝒳1.x1r(x1,x2)=0 for all x2𝒳2\begin{array}[]{ll}\text{maximize}&\sum_{x_{1},x_{2}}r(x_{1},x_{2})i(x_{1},x_{2})\\ \text{subject to}&\sum_{x_{1},x_{2}}\frac{r(x_{1},x_{2})^{2}}{p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})}\leq a\,2\ln 2\\ &\sum_{x_{2}}r(x_{1},x_{2})=0\text{ for all }x_{1}\in{\cal X}_{1}.\\ &\sum_{x_{1}}r(x_{1},x_{2})=0\text{ for all }x_{2}\in{\cal X}_{2}\end{array} (23)

The Lagrangian for this problem is

\displaystyle\sum_{x_{1},x_{2}}r(x_{1},x_{2})i(x_{1},x_{2})-\lambda\left(\sum_{x_{1},x_{2}}\frac{r(x_{1},x_{2})^{2}}{p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})}-a\,2\ln 2\right)
+x1ν1(x1)x2r(x1,x2)+x2ν2(x2)x1r(x1,x2).\displaystyle\quad+\sum_{x_{1}}\nu_{1}(x_{1})\sum_{x_{2}}r(x_{1},x_{2})+\sum_{x_{2}}\nu_{2}(x_{2})\sum_{x_{1}}r(x_{1},x_{2}).

Differentiating with respect to r(x1,x2)r(x_{1},x_{2}) and setting to zero, we find that the optimal r(x1,x2)r(x_{1},x_{2}) is of the form

r(x1,x2)=pX1(x1)pX2(x2)2λ(i(x1,x2)+ν1(x1)+ν2(x2)).r(x_{1},x_{2})=\frac{p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})}{2\lambda}\left(i(x_{1},x_{2})+\nu_{1}(x_{1})+\nu_{2}(x_{2})\right).

We first find the values of the dual variables ν1\nu_{1} and ν2\nu_{2}. For any x1x_{1}, we need

0\displaystyle 0 =x2r(x1,x2)\displaystyle=\sum_{x_{2}}r(x_{1},x_{2})
=pX1(x1)2λ(E[i(x1,X2)]+ν1(x1)+E[ν2(X2)])\displaystyle=\frac{p_{X_{1}}(x_{1})}{2\lambda}\left(E[i(x_{1},X_{2})]+\nu_{1}(x_{1})+E[\nu_{2}(X_{2})]\right)

where the expectations are with respect to (X1,X2)pX1pX2(X_{1},X_{2})\sim p_{X_{1}}p_{X_{2}}. Combining this constraint with the equivalent one for ν2\nu_{2}, we must have

\displaystyle\nu_{1}(x_{1}) =-E[i(x_{1},X_{2})]-E[\nu_{2}(X_{2})]
\displaystyle\nu_{2}(x_{2}) =-E[i(X_{1},x_{2})]-E[\nu_{1}(X_{1})].

Taking the expectation of either constraint gives

E[\nu_{1}(X_{1})]+E[\nu_{2}(X_{2})]=-E[i(X_{1},X_{2})].

Thus

ν1(x1)+ν2(x2)=E[i(x1,X2)]E[i(X1,x2)]+E[i(X1,X2)].\nu_{1}(x_{1})+\nu_{2}(x_{2})\\ =-E[i(x_{1},X_{2})]-E[i(X_{1},x_{2})]+E[i(X_{1},X_{2})].

and so

r(x1,x2)=12λpX1(x1)pX2(x2)j(x1,x2)r(x_{1},x_{2})=\frac{1}{2\lambda}p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})j(x_{1},x_{2})

where

j(x1,x2)=i(x1,x2)E[i(x1,X2)]E[i(X1,x2)]+E[i(X1,X2)].j(x_{1},x_{2})=i(x_{1},x_{2})-E[i(x_{1},X_{2})]\\ -E[i(X_{1},x_{2})]+E[i(X_{1},X_{2})].
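The double centering in j is exactly what makes the marginal constraints of (23) hold automatically: multiplying j by p_{X_{1}}p_{X_{2}} and summing over either coordinate gives zero. The sketch below checks this with a random stand-in for i(x_{1},x_{2}) and toy marginals (all values illustrative):

```python
import numpy as np

# r(x1,x2) ∝ p1(x1) p2(x2) j(x1,x2) has zero row and column sums, so it is
# feasible for (23); i(x1,x2) is replaced by a random matrix for illustration.
rng = np.random.default_rng(0)
p1, p2 = np.array([0.4, 0.6]), np.array([0.3, 0.7])
i_mat = rng.standard_normal((2, 2))

j = (i_mat - (i_mat @ p2)[:, None]      # subtract E[i(x1, X2)]
     - (p1 @ i_mat)[None, :]            # subtract E[i(X1, x2)]
     + p1 @ i_mat @ p2)                 # add back E[i(X1, X2)]
r = np.outer(p1, p2) * j                # up to the 1/(2 lambda) scale
print(r.sum(axis=0), r.sum(axis=1))     # both numerically zero
```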

To find λ\lambda, we use the constraint

a 2ln2\displaystyle a\,2\ln 2 =x1,x2r(x1,x2)2pX1(x1)pX2(x2)\displaystyle=\sum_{x_{1},x_{2}}\frac{r(x_{1},x_{2})^{2}}{p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})}
=1(2λ)2E[j(X1,X2)2]\displaystyle=\frac{1}{(2\lambda)^{2}}E[j(X_{1},X_{2})^{2}]

so

12λ=a 2ln2E[j(X1,X2)2].\frac{1}{2\lambda}=\sqrt{\frac{a\,2\ln 2}{E[j(X_{1},X_{2})^{2}]}}.

We may now derive the optimal objective value for the optimization problem in (23), which is

x1,x2r(x1,x2)i(x1,x2)=a 2ln2E[j(X1,X2)2]E[j(X1,X2)i(X1,X2)].\sum_{x_{1},x_{2}}r(x_{1},x_{2})i(x_{1},x_{2})\\ =\sqrt{\frac{a\,2\ln 2}{E[j(X_{1},X_{2})^{2}]}}\,E[j(X_{1},X_{2})i(X_{1},X_{2})].

Now considering the optimization over pX1,pX2p_{X_{1}},p_{X_{2}}, we may write

Δ(a)\displaystyle\Delta(a) maxpX1pX2a 2ln2E[j(X1,X2)2]E[j(X1,X2)i(X1,X2)]\displaystyle\approx\max_{p_{X_{1}}p_{X_{2}}}\sqrt{\frac{a\,2\ln 2}{E[j(X_{1},X_{2})^{2}]}}\,E[j(X_{1},X_{2})i(X_{1},X_{2})]
+I0(X1,X2;Y)Csum.\displaystyle\qquad+I_{0}(X_{1},X_{2};Y)-C_{\text{sum}}.

Note that for small a, the RHS will be negative unless p_{X_{1}},p_{X_{2}} are such that I_{0}(X_{1},X_{2};Y)=C_{\text{sum}} (i.e., they are sum-capacity achieving). By the optimality conditions for the maximization defining the sum-capacity, this implies that

E[i(x1,X2)]\displaystyle E[i(x_{1},X_{2})] =Csum for all x1 where pX1(x1)>0\displaystyle=C_{\text{sum}}\text{ for all }x_{1}\text{ where }p_{X_{1}}(x_{1})>0
\displaystyle E[i(X_{1},x_{2})] =C_{\text{sum}}\text{ for all }x_{2}\text{ where }p_{X_{2}}(x_{2})>0.

Thus, for x1,x2x_{1},x_{2} where pX1(x1)pX2(x2)>0p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})>0, we have

j(x_{1},x_{2})=i(x_{1},x_{2})-C_{\text{sum}}.

Thus E[j(X1,X2)2]=Var(i(X1,X2))E[j(X_{1},X_{2})^{2}]=\text{Var}(i(X_{1},X_{2})), and

E[j(X1,X2)i(X1,X2)]\displaystyle E[j(X_{1},X_{2})i(X_{1},X_{2})] =E[i(X1,X2)2Csumi(X1,X2)]\displaystyle=E[i(X_{1},X_{2})^{2}-C_{\text{sum}}\,i(X_{1},X_{2})]
=E[i(X1,X2)2]Csum2\displaystyle=E[i(X_{1},X_{2})^{2}]-C_{\text{sum}}^{2}
=Var(i(X1,X2)).\displaystyle=\text{Var}(i(X_{1},X_{2})).

Therefore

Δ(a)\displaystyle\Delta(a) maxpX1pX2:I(X1,X2;Y)=Csuma 2ln2Var(i(X1,X2))\displaystyle\approx\max_{p_{X_{1}}p_{X_{2}}:I(X_{1},X_{2};Y)=C_{\text{sum}}}\sqrt{a\,2\ln 2\,\text{Var}(i(X_{1},X_{2}))}
=σa 2ln2.\displaystyle=\sigma\sqrt{a\,2\ln 2}.
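The square-root law \Delta(a)\approx\sigma\sqrt{a\,2\ln 2} makes the infinite slope at a=0 concrete: shrinking an already tiny cooperation rate by a factor of 100 shrinks the gain only by a factor of 10. A one-line illustration, with an assumed placeholder value of \sigma:

```python
import math

# Delta(a) ~ sigma * sqrt(2 a ln 2); sigma = 0.5 is an arbitrary placeholder.
sigma = 0.5
for a in [1e-6, 1e-4, 1e-2]:
    print(a, sigma * math.sqrt(2 * a * math.log(2)))
```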

Appendix B Proof of Lemma 3

We first need the following lemma.

Lemma 5.

Fix ϵ(0,1)\epsilon\in(0,1). Let YY and ZZ be independent random variables where

\displaystyle f_{Y}(y)\geq c\text{ for all }y\in[F_{Y}^{-1}(\epsilon/4),F_{Y}^{-1}({\textstyle\frac{3+\epsilon}{4}})],
\displaystyle d\geq f_{Z}(z)\geq c\text{ for all }z\in[F_{Z}^{-1}(\epsilon/4),F_{Z}^{-1}({\textstyle\frac{3+\epsilon}{4}})].

Then for X=Y+ZX=Y+Z,

fX(FX1(ϵ))min{3c4,c2ϵ4d}.f_{X}(F_{X}^{-1}(\epsilon))\geq\min\left\{\frac{3c}{4},\frac{c^{2}\epsilon}{4d}\right\}.
Proof:

Let x=FX1(ϵ)x=F_{X}^{-1}(\epsilon). Note that

FX(y+z)\displaystyle F_{X}(y+z) =Pr(Y+Zy+z)\displaystyle=\Pr(Y+Z\leq y+z)
Pr(Yy or Zz)\displaystyle\leq\Pr(Y\leq y\text{ or }Z\leq z)
FY(y)+FZ(z).\displaystyle\leq F_{Y}(y)+F_{Z}(z).

In particular,

FX(FY1(ϵ/2)+FZ1(ϵ/2))ϵF_{X}(F_{Y}^{-1}(\epsilon/2)+F_{Z}^{-1}(\epsilon/2))\leq\epsilon

so

xFY1(ϵ/2)+FZ1(ϵ/2).x\geq F_{Y}^{-1}(\epsilon/2)+F_{Z}^{-1}(\epsilon/2).

By similar reasoning,

xFY1(1+ϵ2)+FZ1(1+ϵ2).x\leq F_{Y}^{-1}\left(\frac{1+\epsilon}{2}\right)+F_{Z}^{-1}\left(\frac{1+\epsilon}{2}\right).

Define

y1\displaystyle y_{1} =FY1(ϵ/4),\displaystyle=F_{Y}^{-1}(\epsilon/4), y2\displaystyle y_{2} =FY1(3+ϵ4),\displaystyle=F_{Y}^{-1}({\textstyle\frac{3+\epsilon}{4}}),
z1\displaystyle z_{1} =FZ1(ϵ/4),\displaystyle=F_{Z}^{-1}(\epsilon/4), z2\displaystyle z_{2} =FZ1(3+ϵ4).\displaystyle=F_{Z}^{-1}({\textstyle\frac{3+\epsilon}{4}}).

Consider several cases. First, suppose that

y2+z1xy1+z2.y_{2}+z_{1}\leq x\leq y_{1}+z_{2}. (24)

Then

fX(x)\displaystyle f_{X}(x) =fY(xz)fZ(z)𝑑z\displaystyle=\int_{-\infty}^{\infty}f_{Y}(x-z)f_{Z}(z)dz
z1z2cfY(xz)𝑑z\displaystyle\geq\int_{z_{1}}^{z_{2}}cf_{Y}(x-z)dz
=cPr(xz2<Y<xz1)\displaystyle=c\Pr(x-z_{2}<Y<x-z_{1})
cPr(y1<Y<y2)\displaystyle\geq c\Pr(y_{1}<Y<y_{2})
=c(3+ϵ4ϵ4)\displaystyle=c\left(\frac{3+\epsilon}{4}-\frac{\epsilon}{4}\right)
3c4.\displaystyle\geq\frac{3c}{4}.

Similarly, if

y1+z2xy2+z1,y_{1}+z_{2}\leq x\leq y_{2}+z_{1}, (25)

then fX(x)3c4f_{X}(x)\geq\frac{3c}{4}. Now consider the case that neither (24) nor (25) holds. We have

\displaystyle f_{X}(x) \geq\int_{\max\{y_{1},x-z_{2}\}}^{\min\{y_{2},x-z_{1}\}}c^{2}dy
\displaystyle=c^{2}\left[\min\{y_{2},x-z_{1}\}-\max\{y_{1},x-z_{2}\}\right].

By the assumption that (24) and (25) do not hold, we have

f_{X}(x)\geq c^{2}\min\{y_{2}+z_{2}-x,x-y_{1}-z_{1}\}.

Note that

\displaystyle y_{2}+z_{2}-x \geq F_{Y}^{-1}(1/2+\epsilon)+F_{Z}^{-1}(1/2+\epsilon)
\displaystyle\qquad-F_{Y}^{-1}\left(\frac{1+\epsilon}{2}\right)-F_{Z}^{-1}\left(\frac{1+\epsilon}{2}\right)
\displaystyle\geq F_{Z}^{-1}(1/2+\epsilon)-F_{Z}^{-1}\left(\frac{1+\epsilon}{2}\right)
\displaystyle\geq\frac{\epsilon}{2d}.

Moreover

\displaystyle x-y_{1}-z_{1} \geq F_{Y}^{-1}({\textstyle\frac{\epsilon}{2}})+F_{Z}^{-1}({\textstyle\frac{\epsilon}{2}})-F_{Y}^{-1}({\textstyle\frac{\epsilon}{4}})-F_{Z}^{-1}({\textstyle\frac{\epsilon}{4}})
FZ1(ϵ/2)FZ1(ϵ/4)\displaystyle\geq F_{Z}^{-1}(\epsilon/2)-F_{Z}^{-1}(\epsilon/4)
ϵ4d.\displaystyle\geq\frac{\epsilon}{4d}.

Thus in this case,

f_{X}(x)\geq\frac{c^{2}\epsilon}{4d}.
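Since Lemma 5 is stated for generic densities, it can be sanity-checked in a case where everything is available in closed form, e.g., Y and Z independent standard normals, so that X=Y+Z is normal with variance 2. The sketch below is illustrative only:

```python
import math
from scipy.stats import norm

# For Y, Z ~ N(0,1) independent, X = Y + Z ~ N(0, 2). Compare f_X at the
# eps-quantile of X with the Lemma 5 bound min{3c/4, c^2 eps / (4d)}.
eps = 0.1
c = min(norm.pdf(norm.ppf(eps / 4)), norm.pdf(norm.ppf((3 + eps) / 4)))
d = norm.pdf(0.0)                     # upper bound on the N(0,1) density

x = norm.ppf(eps, scale=math.sqrt(2))
fx = norm.pdf(x, scale=math.sqrt(2))
print(fx, min(0.75 * c, c * c * eps / (4 * d)))  # density exceeds the bound
```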

We now complete the proof of Lemma 3. Recall that SK=V1Z(K)+V2Z0S_{K}=\sqrt{V_{1}}Z(K)+\sqrt{V_{2}}Z_{0} where Z(K)=maxk[K]ZkZ(K)=\max_{k\in[K]}Z_{k}. Let x=FSK1(ϵ)x=F_{S_{K}}^{-1}(\epsilon). Note that

ddsFSK(s)=fSK(s)\frac{d}{ds}F_{S_{K}}(s)=f_{S_{K}}(s)

and so

ddpFSK1(p)|p=ϵ=1fSK(x).\frac{d}{dp}F_{S_{K}}^{-1}(p)\bigg{|}_{p=\epsilon}=\frac{1}{f_{S_{K}}(x)}.

Thus it is sufficient to show that fSK(x)f_{S_{K}}(x) is bounded away from zero for all KK. Since Z0𝒩(0,1)Z_{0}\sim\mathcal{N}(0,1),

FZ01(ϵ/4)2ln(2/ϵ),FZ01(3+ϵ4)2ln(2/(1ϵ)).F_{Z_{0}}^{-1}(\epsilon/4)\!\geq\!-\sqrt{2\ln(2/\epsilon)},\quad F_{Z_{0}}^{-1}({\textstyle\frac{3+\epsilon}{4}})\!\leq\!\sqrt{2\ln(2/(1-\epsilon))}.

Thus, for z[FZ01(ϵ/4),FZ01(3+ϵ4)]z\in[F_{Z_{0}}^{-1}(\epsilon/4),F_{Z_{0}}^{-1}({\textstyle\frac{3+\epsilon}{4}})],

\displaystyle f_{Z_{0}}(z) =\phi(z)
\displaystyle\geq\min\{\phi(-\sqrt{2\ln(2/\epsilon)}),\phi(\sqrt{2\ln(2/(1-\epsilon))})\}
\displaystyle=\min\left\{\frac{\epsilon}{2\sqrt{2\pi}},\frac{1-\epsilon}{2\sqrt{2\pi}}\right\}
\displaystyle=\frac{\min\{\epsilon,1-\epsilon\}}{2\sqrt{2\pi}}.

Moreover, for all zz,

fZ0(z)12π.f_{Z_{0}}(z)\leq\frac{1}{\sqrt{2\pi}}.
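These Gaussian quantile and density bounds are classical but easy to get backwards; a quick numerical check at one representative \epsilon:

```python
import math
from scipy.stats import norm

# Verify the three standard-normal bounds used above at eps = 0.2.
eps = 0.2
lo, hi = norm.ppf(eps / 4), norm.ppf((3 + eps) / 4)
print(lo >= -math.sqrt(2 * math.log(2 / eps)))           # quantile lower bound
print(hi <= math.sqrt(2 * math.log(2 / (1 - eps))))      # quantile upper bound
print(min(norm.pdf(lo), norm.pdf(hi))
      >= min(eps, 1 - eps) / (2 * math.sqrt(2 * math.pi)))  # density bound
```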

Now we prove a lower bound on fZ(K)(y)f_{Z(K)}(y). Specifically let y=FZ(K)1(p)y=F_{Z(K)}^{-1}(p) for p[ϵ4,3+ϵ4]p\in[\frac{\epsilon}{4},\frac{3+\epsilon}{4}]. Note that

p=FZ(K)(y)=Φ(y)Kp=F_{Z(K)}(y)=\Phi(y)^{K}

so Φ(y)=p1/K\Phi(y)=p^{1/K}. We have

fZ(K)(y)\displaystyle f_{Z(K)}(y) =KΦ(y)K1ϕ(y)\displaystyle=K\Phi(y)^{K-1}\phi(y)
\displaystyle=Kp^{1-1/K}\phi(y).

Suppose p^{1/K}<1/2. Then y<0. Also, since K\geq 1 and p\geq\epsilon/4, we have

ϵ4\displaystyle\frac{\epsilon}{4} p1/K\displaystyle\leq p^{1/K}
=Φ(y)\displaystyle=\Phi(y)
=Q(y)\displaystyle=Q(-y)
ey2/2.\displaystyle\leq e^{-y^{2}/2}.

Thus

0>y2ln(4/ϵ)0>y\geq-\sqrt{2\ln(4/\epsilon)}

and so

fZ(K)(y)Kp11/Kϵ42πKpϵ42πϵ2162π.f_{Z(K)}(y)\geq Kp^{1-1/K}\frac{\epsilon}{4\sqrt{2\pi}}\geq Kp\frac{\epsilon}{4\sqrt{2\pi}}\geq\frac{\epsilon^{2}}{16\sqrt{2\pi}}.

Now suppose p1/K1/2p^{1/K}\geq 1/2, so y0y\geq 0. We have

p1/K\displaystyle p^{1/K} =Φ(y)\displaystyle=\Phi(y)
=1Q(y)\displaystyle=1-Q(y)
1ey2/2\displaystyle\geq 1-e^{-y^{2}/2}

and so

y2ln(1p1/K).y\leq\sqrt{-2\ln(1-p^{1/K})}.

Thus

fZ(K)(y)\displaystyle f_{Z(K)}(y) Kp11/Kϕ(2ln(1p1/K))\displaystyle\geq Kp^{1-1/K}\phi\left(\sqrt{-2\ln(1-p^{1/K})}\right)
=Kp11/K12π(1p1/K)\displaystyle=Kp^{1-1/K}\frac{1}{\sqrt{2\pi}}(1-p^{1/K})
\displaystyle=Kp\frac{1}{\sqrt{2\pi}}(p^{-1/K}-1).

Moreover, since e^{x}-1\geq x for all real x,

p1/K1\displaystyle p^{-1/K}-1 =exp{1Klnp}1\displaystyle=\exp\left\{-\frac{1}{K}\ln p\right\}-1
1Klnp.\displaystyle\geq-\frac{1}{K}\ln p.

Thus

fZ(K)(y)\displaystyle f_{Z(K)}(y) pln(1/p)2π.\displaystyle\geq\frac{p\ln(1/p)}{\sqrt{2\pi}}.

This proves that there exists a constant c>0 such that f_{Z(K)}(y)\geq c for all y in the range of interest. Since f_{Z_{0}} is bounded above and below as shown above, we may apply Lemma 5 (with Y=\sqrt{V_{1}}Z(K) and Z=\sqrt{V_{2}}Z_{0}) to complete the proof.

References

  • [1] F. Willems, “The discrete memoryless multiple access channel with partially cooperating encoders,” IEEE Transactions on Information Theory, vol. 29, no. 3, pp. 441–445, 1983.
  • [2] F. Willems and E. Van der Meulen, “The discrete memoryless multiple-access channel with cribbing encoders,” IEEE Transactions on Information Theory, vol. 31, no. 3, pp. 313–327, 1985.
  • [3] P. Noorzad, M. Effros, M. Langberg, and T. Ho, “On the power of cooperation: Can a little help a lot?” in IEEE International Symposium on Information Theory, 2014, pp. 3132–3136.
  • [4] P. Noorzad, M. Effros, and M. Langberg, “The unbounded benefit of encoder cooperation for the k-user MAC,” IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3655–3678, 2018.
  • [5] M. Langberg and M. Effros, “On the capacity advantage of a single bit,” in 2016 IEEE Globecom Workshops (GC Wkshps).   IEEE, 2016, pp. 1–6.
  • [6] P. Noorzad, M. Effros, and M. Langberg, “Can negligible cooperation increase capacity? the average-error case,” in Proceedings of IEEE International Symposium on Information Theory (ISIT), 2018, pp. 1256–1260.
  • [7] P. Noorzad, M. Langberg, and M. Effros, “Negligible Cooperation: Contrasting the Maximal- and Average-Error Cases,” Manuscript. Available on https://arxiv.org/pdf/1911.10449.pdf, 2019.
  • [8] J. Hartigan et al., “Bounding the maximum of dependent random variables,” Electronic Journal of Statistics, vol. 8, no. 2, pp. 3126–3140, 2014.
  • [9] C. Borell, “The Brunn-Minkowski inequality in Gauss space,” Inventiones Mathematicae, vol. 30, no. 2, pp. 207–216, 1975.
  • [10] B. Tsirelson, I. Ibragimov, and V. Sudakov, “Norms of Gaussian sample functions,” Proceedings of the Third Japan-USSR Symposium on Probability Theory, vol. 550, pp. 20–41, 1976.
  • [11] Y.-W. Huang and P. Moulin, “Finite blocklength coding for multiple access channels,” in 2012 IEEE International Symposium on Information Theory Proceedings.   IEEE, 2012, pp. 831–835.
  • [12] E. MolavianJazi and J. N. Laneman, “Simpler achievable rate regions for multiaccess with finite blocklength,” in 2012 IEEE International Symposium on Information Theory Proceedings.   IEEE, 2012, pp. 36–40.
  • [13] V. Y. Tan and O. Kosut, “On the dispersions of three network information theory problems,” IEEE Transactions on Information Theory, vol. 60, no. 2, pp. 881–903, 2014.
  • [14] J. Scarlett, A. Martinez, and A. G. i Fàbregas, “Second-order rate region of constant-composition codes for the multiple-access channel,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 157–172, 2015.
  • [15] R. C. Yavas, V. Kostina, and M. Effros, “Random access channel coding in the finite blocklength regime,” IEEE Transactions on Information Theory, 2020.
  • [16] L. H. Y. Chen, X. Fang, and Q.-M. Shao, “From Stein identities to moderate deviations,” Ann. Probab., vol. 41, no. 1, pp. 262–293, 2013.