
Error Exponents of Optimum Decoding for the Interference Channel

Raul H. Etkin,  Neri Merhav,  and Erik Ordentlich
raul.etkin@hp.com, merhav@ee.technion.ac.il, erik.ordentlich@hp.com
R. Etkin and E. Ordentlich are with Hewlett-Packard Laboratories, Palo Alto, CA 94304, USA. N. Merhav is with the Technion - Israel Institute of Technology, Haifa 32000, Israel. Part of this work was done while N. Merhav was visiting Hewlett–Packard Laboratories in the Summers of 2007 and 2008.
Abstract

Exponential error bounds for the finite–alphabet interference channel (IFC) with two transmitter–receiver pairs are investigated under the random coding regime. Our focus is on optimum decoding, as opposed to heuristic decoding rules that have been used in previous works, such as joint typicality decoding, decoding based on interference cancellation, and decoding that considers the interference as additional noise. Indeed, the fact that the actual interfering signal is a codeword and not an i.i.d. noise process complicates the application of conventional techniques to the performance analysis of the optimum decoder. Using analytical tools rooted in statistical physics, we derive a single letter expression for error exponents achievable under optimum decoding and demonstrate strict improvement over error exponents obtainable using suboptimal decoding rules, which are nevertheless amenable to more conventional analysis.

Index Terms:
Error exponent region, large deviations, method of types, statistical physics.

I Introduction

The $M$-user interference channel (IFC) models the communication between $M$ transmitter-receiver pairs, wherein each receiver must decode its corresponding transmitter's message from a signal that is corrupted by interference from the other transmitters, in addition to channel noise. The information theoretic analysis of the IFC was initiated over 30 years ago and has recently witnessed a resurgence of interest, motivated by new potential applications, such as wireless communication over unregulated spectrum.

Previous work on the IFC has focused on obtaining inner and outer bounds to the capacity region for memoryless interference and noise, with a precise characterization of the capacity region remaining elusive for most channels, even for $M=2$ users. The best known inner bound for the IFC is the Han-Kobayashi (HK) region, established in [1]. It has been found to be tight in certain special cases ([1, 2]), and recently was found to be tight to within 1 bit for the two user Gaussian IFC [3]. No achievable rates that lie outside the HK region are known for any IFC with $M=2$ users.

Our aim in this paper is to extend the study of achievable schemes to the analysis of error exponents, or exponential rates of decay of error probabilities, that are attainable as a function of user rates. To our knowledge, there has been no prior treatment of error exponents for the IFC. In particular, the error bounds underlying the achievability results in [1] yield vanishing error exponents (though still decaying error probability) at all rates.

The notion of an error exponent region, or a set of achievable exponential rates of decay in the error probabilities for different users at a given operating rate-tuple in a multi-user communication network, was formalized recently in [4], and studied therein for Gaussian multiple access and broadcast channels. Our main result, presented in Section IV, is a single letter characterization of an achievable error exponent region, as a function of user rates, for the $M=2$ user finite alphabet, memoryless interference channel. The region is derived by bounding the average error probability of random codebooks comprised of i.i.d. codewords uniformly distributed over a type class, under maximum likelihood (ML) decoding at each user. Unlike the single user setting, in this case, the effective channel determining each receiver's ML decoding rule is induced both by the noise and the interfering user's codebook. Our focus on optimal decoding is a departure from the conventional achievability arguments in [1] and elsewhere, which are based on joint-typicality decoding, with restrictions on the decoder to "treat interference as noise" or to "decode the interference" in part or in whole. However, in this work, we confine our analysis to codebook ensembles that are simpler than the superposition codebooks of [1].

The analysis of the probability of decoding error under optimal decoding is complicated due to correlations induced by the interfering signal. Usual methods for bounding the probability of error based on Jensen's inequality and other related inequalities (see, e.g., (8) below) fail to give good results. Our bounding approach combines some of the classical information theoretic approaches of [5] and [6] with an analytical technique from statistical physics that was applied recently to the study of single user channels in [7, 8]. More specifically, as in [5], we use auxiliary parameters $\rho$ and $\lambda$ to get an upper bound on the average probability of decoding error under ML decoding, which we then bound using the method of types [6]. Key in our derivation is the use of distance enumerators in the spirit of [7] and [8], which allows us to avoid using Jensen's inequality in some steps, and allows us to maintain exponential tightness in other inequalities by applying them to only polynomially few terms (as opposed to exponentially many) in certain sums that bound the probability of decoding error. It should be emphasized, in this context, that the use of this technique was pivotal to our results. Our earlier attempts, which were based on more 'traditional' tools, failed to provide meaningful results. In fact, they all turned out to be inferior to some trivial bounds.

The paper is organized as follows. The notation, various definitions, and the channel model assumed throughout the paper are detailed in Section II. In Section III, we derive an “easy” set of attainable error exponents which we shall treat as a benchmark for the exponents of the main section, Section IV. The “easy” exponents are obtained by simple extensions to the interference channel of existing error exponent results for single user and multiple access channels, based on random constant composition codebooks and suboptimal decoders. Then, in Section IV, we derive another set of attainable exponents by analyzing ML decoding for the channel induced by the interfering codebook. In Section V, we show that the minimizations required to evaluate the new error exponents can be written as convex optimization problems, and, as a result, can be solved efficiently. We follow this up in Section VI with a numerical comparison of the new exponents with the baseline exponents of Section III for a simple IFC. These numerical results demonstrate that the new exponents are never worse (at least for the chosen channel and parameters) and, for most rates, strictly improve over the baseline exponents.

An earlier version of this work was presented in [9].

II Notation, Definitions, and Channel Model

Unless otherwise stated, we use lowercase and uppercase letters for scalars, boldface lowercase letters for vectors, uppercase (boldface) letters for random variables (vectors), and calligraphic letters for sets. For example, $a$ is a scalar, $\boldsymbol{v}$ is a vector, $X$ is a random variable, $\boldsymbol{X}$ is a random vector, and $\mathcal{S}$ is a set. For a real number $a$ we shall, on occasion, let $\overline{a}$ denote $1-a$. Also, we use $\log(\cdot)$ to denote the natural logarithm, $\boldsymbol{E}$ to denote expectation, and $\Pr$ to denote probability. For independent random variables $X$ and $Y$ distributed according to $P_{X,Y}(x,y)=P_{X}(x)P_{Y}(y)$, $(x,y)\in\mathcal{X}\times\mathcal{Y}$, we define the conditional expectation operator $\boldsymbol{E}_{X}(\cdot)$ as $\boldsymbol{E}_{X}(f(X,Y))\triangleq\sum_{x\in\mathcal{X}}f(x,Y)P_{X}(x)$ for any function $f(\cdot,\cdot)$. All information quantities (entropy, mutual information, etc.) and rates are in nats. Finally, we use $\doteq$, $\stackrel{.}{\leq}$, etc., to denote equality or inequality to the first order in the exponent, i.e., $a_{n}\doteq b_{n}\Leftrightarrow\lim_{n\to\infty}\frac{1}{n}\log\frac{a_{n}}{b_{n}}=0$ and $a_{n}\stackrel{.}{\leq}b_{n}\Leftrightarrow\limsup_{n\to\infty}\frac{1}{n}\log\frac{a_{n}}{b_{n}}\leq 0$.

The empirical probability mass function of a finite alphabet sequence $\boldsymbol{v}=(v(1),\ldots,v(n))$ with alphabet $\mathcal{V}$ is denoted by the vector $\{P_{\boldsymbol{v}}(v),~v\in\mathcal{V}\}$, where each $P_{\boldsymbol{v}}(v)$ is the relative frequency of $v(i)=v$ along $\boldsymbol{v}$. The type class associated with an empirical probability mass function $P$, denoted $\mathcal{T}_{P}$, is the set of all $n$–vectors $\{\boldsymbol{v}\}$ whose empirical probability mass function equals $P$. Similar conventions apply to pairs and triples of vectors of length $n$, which are defined over the corresponding product alphabets. Information measures pertaining to empirical distributions are denoted using the standard notational conventions, except that we add a "$\hat{\ }$" as well as subscripts indicating the sequences from which these empirical distributions were extracted. For example, we write $\hat{H}_{\boldsymbol{x}\boldsymbol{y}\boldsymbol{z}}(X,Y|Z)$ and $\hat{I}_{\boldsymbol{x}\boldsymbol{y}\boldsymbol{z}}(X,Y;Z)$ to denote the conditional entropy of $(X,Y)$ given $Z$ and the mutual information between $(X,Y)$ and $Z$, respectively, computed with respect to the empirical distribution $P_{\boldsymbol{x}\boldsymbol{y}\boldsymbol{z}}(x,y,z)$. We denote the relative entropy, or Kullback-Leibler divergence, between distributions $P_{X}$ and $P_{Y}$ by $D(P_{X}||P_{Y})\triangleq\sum_{x}P_{X}(x)\log(P_{X}(x)/P_{Y}(x))$, and we write $D(P_{X|Z}||P_{Y|Z}|P_{Z})$ for the conditional relative entropy between the conditional distributions $P_{X|Z}$ and $P_{Y|Z}$ conditioned on $P_{Z}$, defined as $D(P_{X|Z}||P_{Y|Z}|P_{Z})\triangleq\sum_{x,z}P_{Z}(z)P_{X|Z}(x|z)\log(P_{X|Z}(x|z)/P_{Y|Z}(x|z))$.
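For concreteness, the empirical quantities above are easy to compute; the following short Python sketch (an illustrative aid of ours, not part of the paper) evaluates the empirical joint PMF of two sequences and the empirical mutual information $\hat{I}_{\boldsymbol{x}\boldsymbol{y}}(X;Y)$ in nats.

import numpy as np
from collections import Counter

def empirical_pmf(x, y):
    """Empirical joint PMF {P_xy(a,b)} of two equal-length sequences."""
    n = len(x)
    return {ab: c / n for ab, c in Counter(zip(x, y)).items()}

def empirical_mi(x, y):
    """Empirical mutual information I-hat_xy(X;Y) in nats."""
    n = len(x)
    p_x, p_y = Counter(x), Counter(y)
    return sum(p * np.log(p * n * n / (p_x[a] * p_y[b]))
               for (a, b), p in empirical_pmf(x, y).items())

# example: I-hat is zero iff the empirical joint PMF is a product distribution
print(empirical_mi([0, 0, 1, 1], [0, 1, 0, 1]))  # -> 0.0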

We continue with a formal description of the two–user IFC setting. Let $\boldsymbol{x}_{i}=(x_{i}(1),\ldots,x_{i}(n))\in\mathcal{X}^{n}_{i}$, $i=1,2$, denote the channel input signals of the two transmitters, and let $\boldsymbol{y}_{i}=(y_{i}(1),\ldots,y_{i}(n))\in\mathcal{Y}^{n}_{i}$ be the corresponding channel outputs received by decoders 1 and 2, where $\mathcal{X}_{i}$ and $\mathcal{Y}_{i}$ denote the input and output alphabets, which we assume to be finite. Each (random) output symbol pair $(Y_{1}(j),Y_{2}(j))$ is assumed to be conditionally independent of all other outputs, and all input symbols, given the two corresponding (random) input symbols $(X_{1}(j),X_{2}(j))$, and the corresponding conditional probability is assumed to be constant from symbol to symbol. An $(n,R_{1},R_{2})$ code for the IFC consists of pairs of encoding and decoding functions, $(f_{1},f_{2})$ and $(g_{1},g_{2})$, respectively, where $f_{i}:\{1,\ldots,M_{i}\}\rightarrow\mathcal{X}^{n}_{i}$, $M_{i}=\lceil e^{nR_{i}}\rceil$, and $g_{i}:\mathcal{Y}^{n}_{i}\rightarrow\{1,\ldots,M_{i}\}$, $i=1,2$. The performance of the code is characterized by a pair of error probabilities $P_{e,i}=\Pr(\hat{W}_{i}\neq W_{i})$, $i=1,2$, where $\hat{W}_{i}=g_{i}(\boldsymbol{Y}_{i})$ and $\boldsymbol{Y}_{i}$ is the random output when user $i$ transmits $\boldsymbol{X}_{i}=f_{i}(W_{i})$, assuming the messages $W_{i}$ are uniformly distributed on the index sets $\{1,2,\ldots,M_{i}\}$, $i=1,2$. The per-user error probabilities depend on the channel only through the marginal conditional distributions of the channel outputs given the corresponding channel input pairs. We denote these conditional distributions by $q_{i}(y|x_{1},x_{2})\triangleq\Pr(Y_{i}(j)=y|(X_{1}(j),X_{2}(j))=(x_{1},x_{2}))$.

A pair of error exponents $(E_{1},E_{2})$ is attainable at a rate pair $(R_{1},R_{2})$ if there is a sequence of $(n,R_{1},R_{2})$ codes satisfying $E_{i}\leq\liminf_{n\to\infty}-(1/n)\log P_{e,i}$ for $i=1,2$. The set of all attainable error exponents at $(R_{1},R_{2})$ comprises the error exponent region at $(R_{1},R_{2})$, which we denote by $\mathcal{E}(R_{1},R_{2})$. The main result of this paper is a single letter characterization of a non–trivial subset of $\mathcal{E}(R_{1},R_{2})$ for each $(R_{1},R_{2})$.

III Background

In this section, we present achievable error exponents for the interference channel which are based on known results of error exponents for single user and multiple access channels (MAC) for fixed composition codebooks [12, 13, 11]. These exponents will be used as a baseline for comparing the performance of the error exponents that we derive in Section IV.

In the following, we will focus on the error performance of user 1, and as a result, all explanations and expressions will be specialized to receiver 1. Similar expressions also hold for user 2 with the exchange of indices $1\leftrightarrow 2$.

A possibly suboptimal decoder for the interference channel can be obtained from a given multiple access channel decoder by simply ignoring the decoded message of the interfering transmitter. For example, following [13], we can use a minimum entropy decoder that, for a given received vector $\boldsymbol{y}_{1}$ at receiver 1, computes

$$(\hat{\boldsymbol{x}}_{1},\hat{\boldsymbol{x}}_{2})=\operatorname*{arg\,min}_{(\tilde{\boldsymbol{x}}_{1},\tilde{\boldsymbol{x}}_{2})\in\mathcal{C}_{1}\times\mathcal{C}_{2}}\hat{H}_{\tilde{\boldsymbol{x}}_{1}\tilde{\boldsymbol{x}}_{2}\boldsymbol{y}_{1}}(X_{1},X_{2}|Y_{1}),$$

and throws away $\hat{\boldsymbol{x}}_{2}$.
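A minimal Python sketch of this minimum joint entropy decoder follows, assuming codebooks are given as lists of equal-length integer sequences; the exhaustive search over codeword pairs is only feasible for toy sizes, and all names are our illustrative choices, not from the paper.

import numpy as np
from collections import Counter
from itertools import product

def _entropy(*seqs):
    """Empirical joint entropy of parallel sequences, in nats."""
    n = len(seqs[0])
    p = np.array(list(Counter(zip(*seqs)).values())) / n
    return -np.sum(p * np.log(p))

def min_entropy_mac_decode(y1, C1, C2):
    """Minimum entropy MAC decoding at receiver 1:
    H-hat(X1,X2|Y1) = H-hat(X1,X2,Y1) - H-hat(Y1).
    The interfering message index j_hat is discarded."""
    score = lambda i, j: _entropy(C1[i], C2[j], y1) - _entropy(y1)
    i_hat, j_hat = min(product(range(len(C1)), range(len(C2))),
                       key=lambda ij: score(*ij))
    return i_hat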

It follows from [13] that for random codebooks of fixed compositions $Q_{1},Q_{2}$, the average probability of decoding both messages in error, where the averaging is done over the random choice of codebooks, satisfies:

$$\Pr(\hat{\boldsymbol{x}}_{1}\neq\boldsymbol{x}_{1},\hat{\boldsymbol{x}}_{2}\neq\boldsymbol{x}_{2})\stackrel{.}{\leq}e^{-nE_{1,2}}$$

where

$$E_{1,2}=\min_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}:\,P_{\hat{X}_{1}}=Q_{1},\,P_{\hat{X}_{2}}=Q_{2}}\Big[D(P_{\hat{Y}_{1}|\hat{X}_{1}\hat{X}_{2}}||q_{1}|P_{\hat{X}_{1},\hat{X}_{2}})+I(\hat{X}_{1};\hat{X}_{2})+\big|I(\hat{X}_{1};\hat{Y}_{1})+I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{1}-R_{2}\big|^{+}\Big]$$

with $|\cdot|^{+}=\max\{\cdot,0\}$.

In addition, the average probability of decoding the message of the interfering transmitter correctly but the message of the desired transmitter incorrectly satisfies:

$$\Pr(\hat{\boldsymbol{x}}_{1}\neq\boldsymbol{x}_{1},\hat{\boldsymbol{x}}_{2}=\boldsymbol{x}_{2})\stackrel{.}{\leq}e^{-nE_{1|2}}$$

where

$$E_{1|2}=\min_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}:\,P_{\hat{X}_{1}}=Q_{1},\,P_{\hat{X}_{2}}=Q_{2}}\Big[D(P_{\hat{Y}_{1}|\hat{X}_{1}\hat{X}_{2}}||q_{1}|P_{\hat{X}_{1},\hat{X}_{2}})+I(\hat{X}_{1};\hat{X}_{2})+\big|I(\hat{X}_{1};\hat{X}_{2},\hat{Y}_{1})-R_{1}\big|^{+}\Big].$$

Therefore, the overall average error performance of this MAC decoder in the IFC satisfies:

$$\Pr(\hat{\boldsymbol{x}}_{1}\neq\boldsymbol{x}_{1})\stackrel{.}{\leq}e^{-n\min\{E_{1,2},E_{1|2}\}}.$$

A second suboptimal decoder that leads to tractable error performance bounds is the single user maximum mutual information decoder (which in this case coincides with the minimum entropy decoder):

$$\hat{\boldsymbol{x}}_{1}=\operatorname*{arg\,max}_{\boldsymbol{x}_{1}\in\mathcal{C}_{1}}\hat{I}_{\boldsymbol{x}_{1}\boldsymbol{y}_{1}}(X_{1};Y_{1}).$$

In this case, standard application of the method of types [11] leads to the following bound on the average error probability under random fixed composition codebooks of types $Q_{1},Q_{2}$:

$$\Pr(\hat{\boldsymbol{x}}_{1}\neq\boldsymbol{x}_{1})\stackrel{.}{\leq}e^{-nE_{1}},$$

where

$$E_{1}=\min_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}:\,P_{\hat{X}_{1}}=Q_{1},\,P_{\hat{X}_{2}}=Q_{2}}\Big[D(P_{\hat{Y}_{1}|\hat{X}_{1}\hat{X}_{2}}||q_{1}|P_{\hat{X}_{1},\hat{X}_{2}})+I(\hat{X}_{1};\hat{X}_{2})+\big|I(\hat{X}_{1};\hat{Y}_{1})-R_{1}\big|^{+}\Big].$$

We can choose whichever of these two decoders leads to the better error performance. Therefore, we obtain that

$$E_{B,1}=\max\{E_{1};\min\{E_{1,2};E_{1|2}\}\} \qquad (1)$$

is an achievable error exponent at receiver 1, with an analogous exponent following for receiver 2.
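The minimizations defining these benchmark exponents are over joint distributions with fixed marginals, and are simple enough to evaluate numerically. The following Python sketch computes $E_{1}$ for binary alphabets, using the Z-channel of Section VI as a test channel; the particular $Q_{1},Q_{2},R_{1}$, the use of scipy's SLSQP solver, and all variable names are our illustrative assumptions, not part of the paper.

import numpy as np
from scipy.optimize import minimize

EPS = 1e-12
Q1 = np.array([0.5, 0.5])   # type of user 1's codebook (placeholder)
Q2 = np.array([0.5, 0.5])   # type of user 2's codebook (placeholder)
R1 = 0.1                    # rate of user 1 in nats (placeholder)
p = 0.01                    # Z-channel of Sec. VI: Y1 = X1*X2 XOR Z, Z ~ Bern(p)
q1 = np.zeros((2, 2, 2))
for x1, x2 in np.ndindex(2, 2):
    s = x1 * x2
    q1[x1, x2, s ^ 1] = p       # flipped output
    q1[x1, x2, s] = 1 - p       # clean output

def objective(v):
    P = v.reshape(2, 2, 2).clip(EPS)           # joint PMF P(x1, x2, y1)
    P12 = P.sum(axis=2)                        # P(x1, x2)
    P1, P2 = P12.sum(axis=1), P12.sum(axis=0)  # marginals of X1, X2
    P1y = P.sum(axis=1)                        # P(x1, y1)
    Py = P.sum(axis=(0, 1))
    D = np.sum(P * np.log(P / (P12[:, :, None] * q1 + EPS)))    # D(P_{Y|X1X2}||q1|P_{X1X2})
    I12 = np.sum(P12 * np.log(P12 / (np.outer(P1, P2) + EPS)))  # I(X1;X2)
    I1y = np.sum(P1y * np.log(P1y / (np.outer(P1, Py) + EPS)))  # I(X1;Y1)
    return D + I12 + max(I1y - R1, 0.0)

cons = [{'type': 'eq', 'fun': lambda v: v.reshape(2, 2, 2).sum(axis=(1, 2)) - Q1},
        {'type': 'eq', 'fun': lambda v: v.reshape(2, 2, 2).sum(axis=(0, 2)) - Q2}]
v0 = (Q1[:, None, None] * Q2[None, :, None] * q1).ravel()  # feasible starting point
res = minimize(objective, v0, method='SLSQP', bounds=[(0, 1)] * 8, constraints=cons)
print('E_1 ~', res.fun)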

IV Main Result

Our main contribution is stated in the following theorem, which presents a new error exponent region for the discrete memoryless two–user IFC. While the full proof appears in Appendix A, we also provide a proof outline below, to give an idea of the main steps.

Theorem 1

For a discrete memoryless two-user IFC as defined in Section II, and for a family of block codes of rates $R_{1}$ and $R_{2}$, a decoding error probability for user 1 satisfying

$$\liminf_{n\to\infty}-\frac{1}{n}\log\overline{P}_{e,1}(n)\geq E_{R,1}(R_{1},R_{2},Q_{1},Q_{2},\rho,\lambda) \qquad (2)$$

can be achieved as the block length $n$ of the codes goes to infinity, where the error exponent $E_{R,1}(R_{1},R_{2},Q_{1},Q_{2},\rho,\lambda)$ is given by

$$E_{R,1}(R_{1},R_{2},Q_{1},Q_{2},\rho,\lambda)=R_{2}-\rho R_{1}+\min\Bigg\{\min_{(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'})\in\mathcal{S}_{1}(Q_{1},Q_{2})}f_{1}\big(\rho,\lambda,P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'}\big);\;\min_{(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'})\in\mathcal{S}_{2}(Q_{1},Q_{2},R_{2})}f_{2}\big(\rho,\lambda,P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'}\big)\Bigg\} \qquad (3)$$

where

$$f_{1}\triangleq g(\rho,\lambda,P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'})-H(\hat{Y}_{1}|\hat{X}_{1})+\rho I(\hat{X}_{1}';\hat{Y}_{1}')\\
+\max\Big\{I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2};\;\overline{\rho\lambda}\big(I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}\big)\Big\}\\
+\max\Big\{\overline{\rho}I(\hat{X}_{2}';\hat{Y}_{1}')+\rho I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2};\;\rho\big(I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2}\big);\;\rho\lambda\big(I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2}\big)\Big\} \qquad (4)$$

$$f_{2}\triangleq g(\rho,\lambda,P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'})-H(\hat{Y}_{1}|\hat{X}_{1})+\rho I(\hat{X}_{1}';\hat{X}_{2}',\hat{Y}_{1}')+I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2} \qquad (5)$$

with

$$g\triangleq-\overline{\rho\lambda}\,\boldsymbol{E}_{\hat{X}_{1},\hat{X}_{2},\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})-\rho\lambda\,\boldsymbol{E}_{\hat{X}_{1}',\hat{X}_{2}',\hat{Y}_{1}'}\log q_{1}(\hat{Y}_{1}'|\hat{X}_{1}',\hat{X}_{2}')$$

and

$$\mathcal{S}_{1}(Q_{1},Q_{2})\triangleq\big\{(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'})\in\mathcal{S}^{2}:\;P_{\hat{Y}_{1}}=P_{\hat{Y}_{1}'},\;P_{\hat{X}_{1}}=P_{\hat{X}_{1}'}=Q_{1},\;P_{\hat{X}_{2}}=P_{\hat{X}_{2}'}=Q_{2}\big\} \qquad (6)$$

$$\mathcal{S}_{2}(Q_{1},Q_{2},R_{2})\triangleq\big\{(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'})\in\mathcal{S}^{2}:\;P_{\hat{X}_{1}}=P_{\hat{X}_{1}'}=Q_{1},\;P_{\hat{X}_{2}}=P_{\hat{X}_{2}'}=Q_{2},\;R_{2}\leq I(\hat{X}_{2};\hat{Y}_{1}),\;P_{\hat{X}_{2},\hat{Y}_{1}}=P_{\hat{X}_{2}',\hat{Y}_{1}'}\big\} \qquad (7)$$

where $\mathcal{S}$ is the probability simplex over $\mathcal{X}_{1}\times\mathcal{X}_{2}\times\mathcal{Y}_{1}$. In the bound (2), $(\rho,\lambda)\in[0,1]^{2}$ can be chosen to maximize the error exponent $E_{R,1}$.

In eqs. (2), (3), (6), and (7), $Q_{1}$ and $Q_{2}$ are probability distributions defined over the alphabets $\mathcal{X}_{1}$ and $\mathcal{X}_{2}$, respectively.

Expressions for the error probability $P_{e,2}$ and error exponent $E_{R,2}$ analogous to (2) and (3) can be stated for the receiver of user 2 by replacing $X_{1}\leftrightarrow X_{2}$, $Y_{1}\rightarrow Y_{2}$, and $q_{1}\rightarrow q_{2}$ in all the expressions. By varying $Q_{1}$ and $Q_{2}$ over all probability distributions on $\mathcal{X}_{1}$ and $\mathcal{X}_{2}$, respectively, we obtain an achievable error exponent region for fixed rates $R_{1}$ and $R_{2}$.

Remark 1

A lower bound to $E_{R,1}^{*}\triangleq\max_{\rho,\lambda}E_{R,1}(R_{1},R_{2},Q_{1},Q_{2},\rho,\lambda)$ is derived in Appendix B (cf. equation (B.4)) that is closer in form to the expressions underlying the benchmark exponent $E_{B,1}$ presented above. In particular, this lower bound allows us to establish analytically (see Appendix B) that $E_{B,1}\leq E_{R,1}^{*}$ at $R_{1}=0$ (and for sufficiently small $R_{1}$). Numerical computations, as presented in Section VI, indicate that this inequality can be strict.

A second application of the lower bound (B.4) is to determine the set of rate pairs $(R_{1},R_{2})$ for which $E_{R,1}^{*}>0$. We show in Appendix B that this region includes

$$\mathcal{R}_{1}=\{R_{1}<I(\hat{X}_{1};\hat{Y}_{1})\}\cup\Big(\{R_{1}+R_{2}<I(\hat{Y}_{1};\hat{X}_{1},\hat{X}_{2})\}\cap\{R_{1}<I(\hat{X}_{1};\hat{Y}_{1}|\hat{X}_{2})\}\Big),$$

with an analogous region following for the set where $E_{R,2}^{*}>0$ (see Fig. 1).


Figure 1: Rate region $\mathcal{R}_{1}$ where $E_{R,1}^{*}>0$.

Furthermore, it is shown in [11] that the error exponent achievable for user 1 with optimal decoding and random fixed composition codebooks is zero outside the closure of the region $\mathcal{R}_{1}$. This result, together with our contribution, characterizes the rate region where the attainable exponents with random constant composition codebooks are positive. Finally, it can be shown that this region is contained in the HK region [11].

Remark 2

Theorem 1 presents an asymptotic upper bound on the average probability of decoding error for fixed composition codebooks, where the averaging is done over the random choice of codebooks. It is straightforward to show (see, e.g., [4]) that there exists a specific (i.e., non-random) sequence of fixed composition codebooks of increasing block length $n$ for which the same asymptotic error performance can be achieved.


Proof Outline. For $n$ non–negative reals $a_{1},\ldots,a_{n}$ and $b\in[0,1]$, the following inequality [5, Problem 4.15(f)] will be frequently used:

$$\left(\sum_{i=1}^{n}a_{i}\right)^{b}\leq\sum_{i=1}^{n}a_{i}^{b}. \qquad (8)$$
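As a quick numerical sanity check of (8), take $n=2$, $a_{1}=a_{2}=1$, and $b=1/2$:

$$(1+1)^{1/2}=\sqrt{2}\approx 1.414\;\leq\;1^{1/2}+1^{1/2}=2.$$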

For a given block length $n$, we generate the codebook of user $i=1,2$ by choosing $M_{i}$ sequences $\boldsymbol{x}_{i}$ of length $n$ independently and uniformly over all the sequences of length $n$ and type $Q_{i}$ in $\mathcal{X}_{i}^{n}$. Note that $Q_{i}$, $i=1,2$, have rational entries with denominator $n$. We will write $\boldsymbol{x}_{i,j}$ to denote the $j$-th codeword of user $i$.

For a given channel output $\boldsymbol{y}_{1}\in\mathcal{Y}_{1}^{n}$, the best decoding rule to minimize the probability of error in decoding the message of user 1 is ML decoding, which consists of picking the message $m$ that maximizes $P(\boldsymbol{y}_{1}|\boldsymbol{x}_{1,m})=\sum_{i=1}^{M_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1,m},\boldsymbol{x}_{2,i})/M_{2}$. Letting

$$q_{1,\mathcal{C}_{2}}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1})\triangleq\frac{1}{M_{2}}\sum_{i=1}^{M_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1},\boldsymbol{x}_{2,i}) \qquad (9)$$

be the "average" channel observed at receiver 1, where the averaging is done over the codewords of user 2 in $\mathcal{C}_{2}$, the decoding error probability at receiver 1 for transmitted codeword $\boldsymbol{x}_{1,m}$ and codebooks $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ is given by:

$$P_{e,1}(\boldsymbol{x}_{1,m},\mathcal{C}_{1},\mathcal{C}_{2})=\sum_{\boldsymbol{y}_{1}\in\mathcal{Y}_{1}^{n}}P_{e,1}(\boldsymbol{x}_{1,m},\mathcal{C}_{1},\mathcal{C}_{2}|\boldsymbol{y}_{1})\,q_{1,\mathcal{C}_{2}}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1,m}). \qquad (10)$$

With the introduction of the average channel (9), and the use of two auxiliary parameters $(\rho,\lambda)\in[0,1]^{2}$, we can follow the approach of [5] to bound the conditional probability of decoding error $P_{e,1}(\boldsymbol{x}_{1,m},\mathcal{C}_{1},\mathcal{C}_{2}|\boldsymbol{y}_{1})$. Taking the expectation over the random choice of codebooks $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$, we obtain an average error probability:

$$\overline{P}_{E_{1}}\leq M_{1}^{\rho}\sum_{\boldsymbol{y}_{1}\in\mathcal{Y}_{1}^{n}}\boldsymbol{E}_{\mathcal{C}_{2}}\bigg\{\boldsymbol{E}_{\boldsymbol{X}_{1}}\Big[\big[q_{1,\mathcal{C}_{2}}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{X}_{1})\big]^{\overline{\rho\lambda}}\Big]\cdot\boldsymbol{E}_{\boldsymbol{X}_{1}}^{\rho}\Big[\big[q_{1,\mathcal{C}_{2}}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{X}_{1})\big]^{\lambda}\Big]\bigg\} \qquad (11)$$

where we used Jensen’s inequality to move the second expectation inside ()ρ(\cdot)^{\rho}.

Equation (11) is hard to handle, mainly due to the correlation introduced by $\mathcal{C}_{2}$ between the two factors inside the outer expectation. Furthermore, the evaluation of the inner expectations over $\boldsymbol{X}_{1}$ is complicated by the powers $\overline{\rho\lambda}$ and $\lambda$ affecting $q_{1,\mathcal{C}_{2}}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{X}_{1})$. Bounding methods based on Jensen's inequality and (8) fail to give good results due to the loss of exponential tightness.

We proceed with a refined bounding technique based on the method of types inspired by [7]. While in this approach we still use (8), we use it to bound sums with a number of terms that only grows polynomially with nn, and as a result, exponential tightness is preserved.

Since the channel is memoryless,

$$q_{1,\mathcal{C}_{2}}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1})=\frac{1}{M_{2}}\sum_{i=1}^{M_{2}}\prod_{t=1}^{n}q_{1}(y_{1}(t)|x_{1}(t),x_{2,i}(t))=\frac{1}{M_{2}}\sum_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}}N_{\boldsymbol{x}_{1},\boldsymbol{y}_{1}}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}})\,e^{n\boldsymbol{E}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}[\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})]} \qquad (12)$$

where we used $N_{\boldsymbol{x}_{1},\boldsymbol{y}_{1}}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}})$ to denote the number of codewords $\boldsymbol{x}_{2}$ in $\mathcal{C}_{2}$ such that $(\boldsymbol{x}_{1},\boldsymbol{x}_{2},\boldsymbol{y}_{1})$ has empirical distribution $P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}$. We also used $\boldsymbol{E}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}(\cdot)$ to denote expectation with respect to the distribution $P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}$.
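To make the enumerator concrete, a direct (exponential-time) Python computation of $N_{\boldsymbol{x}_{1},\boldsymbol{y}_{1}}(P)$ is sketched below; joint types are represented as count tables, and the alphabet sizes and names are our illustrative choices.

import numpy as np

def joint_type(x1, x2, y1, A1=2, A2=2, B=2):
    """Empirical joint PMF of (x1, x2, y1) as an A1 x A2 x B table."""
    T = np.zeros((A1, A2, B))
    for a, b, c in zip(x1, x2, y1):
        T[a, b, c] += 1.0
    return T / len(x1)

def type_enumerator(x1, y1, C2, P):
    """N_{x1,y1}(P): how many codewords x2 in C2 give joint type P with (x1, y1)."""
    return sum(np.allclose(joint_type(x1, x2, y1), P) for x2 in C2)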

Substituting (12) into (11) and using (8) three times, we obtain:

$$\overline{P}_{E_{1}}\leq\frac{M_{1}^{\rho}}{M_{2}}\sum_{\hat{P}}\sum_{\hat{P}'}\sum_{\boldsymbol{y}_{1}\in\mathcal{Y}_{1}^{n}}\boldsymbol{E}_{\mathcal{C}_{2}}\Big\{\boldsymbol{E}_{\boldsymbol{X}_{1}}\big[N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}^{\overline{\rho\lambda}}(\hat{P})\big]\cdot\boldsymbol{E}_{\boldsymbol{X}_{1}}^{\rho}\big[N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}^{\lambda}(\hat{P}')\big]\Big\}\cdot e^{n[\overline{\rho\lambda}E_{\hat{P}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})+\rho\lambda E_{\hat{P}'}\log q_{1}(\hat{Y}_{1}'|\hat{X}_{1}',\hat{X}_{2}')]} \qquad (13)$$

where we used $\hat{P}=P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}$ and $\hat{P}'=P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'}$ to shorten the expression.

We next consider the bounding of

$$E(\boldsymbol{y}_{1},\hat{P},\hat{P}')\triangleq\boldsymbol{E}_{\mathcal{C}_{2}}\Big\{\boldsymbol{E}_{\boldsymbol{X}_{1}}\big[N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}^{\overline{\rho\lambda}}(\hat{P})\big]\,\boldsymbol{E}_{\boldsymbol{X}_{1}}^{\rho}\big[N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}^{\lambda}(\hat{P}')\big]\Big\}, \qquad (14)$$

and note that $N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}(\hat{P})$ and $N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}(\hat{P}')$ are formed by sums of an exponentially large number of indicator functions, each of which takes the value 1 with exponentially small probability. These sums concentrate around their means, which show different behavior depending on how the number of terms in the sum ($e^{nR_{2}}$) compares to the probability of each of the indicator functions taking the value 1 (depending on the case considered, these probabilities take the form $e^{-nI(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})}$, $e^{-nI(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')}$, or $e^{-nI(\hat{X}_{2}';\hat{Y}_{1}')}$). Whenever one of the factors in (14) concentrates around its mean it behaves as a constant, and hence is uncorrelated with the remaining factor. As a result, the correlation between the two factors of (14), which complicates the analysis, can be circumvented. We give the details of this part of the derivation in Appendix A, but note here that the resulting bound on $E(\boldsymbol{y}_{1},\hat{P},\hat{P}')$ depends on $\boldsymbol{y}_{1}$ only through the indicator $1\{\boldsymbol{y}_{1}\in\mathcal{T}_{P_{\hat{Y}_{1}}}\cap\mathcal{T}_{P_{\hat{Y}_{1}'}};\,P_{\hat{X}_{1}}=P_{\hat{X}_{1}'}=Q_{1};\,P_{\hat{X}_{2}}=P_{\hat{X}_{2}'}=Q_{2}\}$. Therefore, the innermost sum in (13) can be evaluated by counting the number of vectors $\boldsymbol{y}_{1}\in\mathcal{Y}_{1}^{n}$ that have empirical types $P_{\hat{Y}_{1}}$ and $P_{\hat{Y}_{1}'}$. Note that this count can only be positive when $P_{\hat{Y}_{1}}=P_{\hat{Y}_{1}'}$, in which case it is approximately equal to $e^{nH(\hat{Y}_{1})}$ to first order in the exponent. Furthermore, the sums over $\hat{P}$ and $\hat{P}'$ in (13) have a number of terms that only grows polynomially with $n$. Therefore, to first order, the exponential growth rate of (13) equals the maximum exponential growth rate of the argument of the outer two sums, where the maximization is performed over the distributions $\hat{P}$ and $\hat{P}'$, which are rational with denominator $n$. We can further upper bound the probability of error by enlarging the optimization region, maximizing over all probability distributions $\hat{P},\hat{P}'$.
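In equation form, the dichotomy just described is the standard large deviations behavior of the enumerator (we state it for $N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}(\hat{P})$; the primed cases are analogous):

$$\boldsymbol{E}\big[N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}(\hat{P})\big]\doteq e^{n(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))},$$

so that when $R_{2}>I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$ the sum of $e^{nR_{2}}$ indicators concentrates around this exponentially large mean, whereas when $R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$ we have, by the union bound, $\Pr\{N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}(\hat{P})\geq 1\}\stackrel{.}{\leq}e^{-n(I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2})}$, and the enumerator is typically zero.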

V Convex Optimization Issues

In order to get a valid evaluation of $E_{R,1}(R_{1},R_{2},Q_{1},Q_{2},\rho,\lambda)$, for any given $Q_{1},Q_{2},\rho,\lambda$ satisfying the constraints of the outer maximization, we need to accurately solve the inner minimization problems. A brute force search may not give accurate enough results in reasonable time. As will be shown below, the first minimization problem in (3) is a convex problem, and as a result, it can be solved efficiently. In addition, convexity allows us to lower-bound the objective function by its supporting hyperplane, which in turn allows us to get a reliable lower bound through the solution of a linear program. (In our implementation we solve the original convex optimization problem using the MATLAB function fmincon.)
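The supporting-hyperplane idea can be sketched in a few lines of Python (an illustrative stand-in for the MATLAB implementation: the scipy usage, the finite-difference gradient, and the toy objective standing in for $f_{1}$ are all our assumptions, not the authors' code):

import numpy as np
from scipy.optimize import linprog

def lp_lower_bound(f, A_eq, b_eq, p0, eps=1e-6):
    """For convex f, f(p) >= f(p0) + g.(p - p0); minimizing this supporting
    hyperplane over the feasible polytope is an LP whose value lower-bounds min f."""
    g = np.array([(f(p0 + eps * e) - f(p0 - eps * e)) / (2 * eps)
                  for e in np.eye(len(p0))])       # central-difference gradient
    res = linprog(g, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * len(p0))
    return f(p0) + res.fun - g @ p0

# toy stand-in for f_1: negative entropy over the 3-point simplex (convex)
f = lambda p: float(np.sum(p * np.log(np.clip(p, 1e-12, None))))
lb = lp_lower_bound(f, A_eq=np.ones((1, 3)), b_eq=np.array([1.0]),
                    p0=np.array([0.5, 0.3, 0.2]))
print(lb, '<=', -np.log(3))   # the bound never exceeds the true minimum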

The second minimization problem is not convex due to the non–convex constraint $R_{2}\leq I(\hat{X}_{2};\hat{Y}_{1})$. If we remove this constraint, we obtain, as will be shown later, a convex problem that can be solved efficiently. There are two possible situations:

The first situation occurs when the optimal solution to the modified problem satisfies $R_{2}\leq I(\hat{X}_{2};\hat{Y}_{1})$: in this case, the solution to the modified problem is also a solution to the original problem.

The second situation is when the optimal solution to the modified problem satisfies $R_{2}>I(\hat{X}_{2};\hat{Y}_{1})$: in this case, a solution to the original problem must satisfy $R_{2}=I(\hat{X}_{2};\hat{Y}_{1})$. We prove this statement by contradiction. Let $P^{*}_{1}$ be the optimal solution to the modified problem, and $P^{*}_{2}$ be an optimal solution to the original problem. Now assume, conversely, that there is no $P^{*}_{2}$ that satisfies $R_{2}=I(\hat{X}_{2};\hat{Y}_{1})$. With this assumption, we have that at $P^{*}_{2}$, $R_{2}<I(\hat{X}_{2};\hat{Y}_{1})$. Let $\mathcal{D}\triangleq\{P=(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'}):P_{\hat{X}_{1}}=P_{\hat{X}_{1}'}=Q_{1},P_{\hat{X}_{2}}=P_{\hat{X}_{2}'}=Q_{2}\}$. Note that $\mathcal{D}$ is a convex set and $P_{1}^{*},P_{2}^{*}\in\mathcal{D}$. Due to the continuity of $I(\hat{X}_{2};\hat{Y}_{1})$, the straight line in $\mathcal{D}$ that joins $P^{*}_{1}$ and $P^{*}_{2}$ must pass through an intermediate point $\overline{P}=\alpha P^{*}_{1}+(1-\alpha)P^{*}_{2}$, $\alpha\in(0,1)$, that satisfies $I(\hat{X}_{2};\hat{Y}_{1})=R_{2}$. Let $f_{2}(\cdot)$ be the objective function of the second minimization problem in (3), restricted to $\mathcal{D}$. It will be shown later that $f_{2}(\cdot)$, restricted to this domain, is a convex function. Since $\overline{P}$ is feasible for the original problem but, by hypothesis, not optimal, $f_{2}(\overline{P})>f_{2}(P_{2}^{*})$, and we have $f_{2}(P_{1}^{*})\leq f_{2}(P_{2}^{*})<f_{2}(\overline{P})$. On the other hand, from the convexity of $f_{2}(\cdot)$, restricted to $\mathcal{D}$, we have $f_{2}(\overline{P})\leq\alpha f_{2}(P_{1}^{*})+(1-\alpha)f_{2}(P_{2}^{*})\leq f_{2}(P_{2}^{*})$, and we get a contradiction. Therefore, it follows that there is a solution $P^{*}_{2}$ to the original problem that satisfies $R_{2}=I(\hat{X}_{2};\hat{Y}_{1})$.

Let $f_{1}(\cdot)$ be the objective function of the first minimization problem in (3). First, we note that $P_{2}^{*}$ satisfies the constraints of the first minimization problem, since they are less restrictive than the constraints of the second minimization problem in (3). We next prove that $f_{1}(P_{2}^{*})=f_{2}(P_{2}^{*})$. As a result, the optimal solution $P^{*}$ of the first minimization problem satisfies $f_{1}(P^{*})\leq f_{1}(P_{2}^{*})=f_{2}(P^{*}_{2})$, and we do not need to know $f_{2}(P_{2}^{*})$ to evaluate the argument of the maximization in (3). Using the fact that at $P_{2}^{*}$, $I(\hat{X}_{2};\hat{Y}_{1})=I(\hat{X}_{2}';\hat{Y}_{1}')=R_{2}$, we have:

$$f_{2}(P_{2}^{*})-f_{1}(P_{2}^{*})=\rho I(\hat{X}_{1}';\hat{X}_{2}',\hat{Y}_{1}')-\rho I(\hat{X}_{1}';\hat{Y}_{1}')-\rho\big(I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2}\big)\\
=\rho\big[I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-I(\hat{X}_{2}';\hat{Y}_{1}')-I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')+R_{2}\big]\\
=0, \qquad (15)$$

where we used the identity $I(\hat{X}_{1}';\hat{X}_{2}',\hat{Y}_{1}')-I(\hat{X}_{1}';\hat{Y}_{1}')=I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-I(\hat{X}_{2}';\hat{Y}_{1}')$ in the second equality.

In summary, if the solution to the second minimization problem in (3), without the constraint on $R_{2}$, satisfies $R_{2}>I(\hat{X}_{2};\hat{Y}_{1})$, then the first minimization problem in (3) dominates the expression. Otherwise, the solution to the second minimization problem in (3) without the constraint $R_{2}\leq I(\hat{X}_{2};\hat{Y}_{1})$ equals the solution to the second minimization problem with this constraint. This case analysis is summarized in the sketch below.
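In procedural form (a hedged Python sketch of ours: solve_first and solve_modified stand in for convex solvers of the two problems, such as fmincon or the scipy routines above, and I_X2_Y1 evaluates $I(\hat{X}_{2};\hat{Y}_{1})$ at a solution; all names are illustrative):

def inner_min_value(R2, solve_first, solve_modified, I_X2_Y1):
    """Evaluate the inner min{...} of (3) using the two-case analysis above."""
    f1_val, P1 = solve_first()          # first (convex) problem in (3)
    f2_val, P_mod = solve_modified()    # second problem without R2 <= I constraint
    if R2 <= I_X2_Y1(P_mod):
        # constraint inactive: the modified solution solves the original problem
        return min(f1_val, f2_val)
    # otherwise the first problem dominates, since f1(P*) <= f2(P2*)
    return f1_val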

It remains to show that the objective functions of the minimization problems in (3), $f_{1}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'})$ and $f_{2}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'})$, restricted to the domain $\mathcal{D}$, are convex functions. Since the sum of convex functions is convex, to prove the convexity of $f_{1}(\cdot)$ on $\mathcal{D}$, we only need to prove that the different terms of

$$f_{1}=-\overline{\rho\lambda}\,\boldsymbol{E}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})-\rho\lambda\,\boldsymbol{E}_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'}\log q_{1}(\hat{Y}_{1}'|\hat{X}_{1}',\hat{X}_{2}')-H(\hat{Y}_{1}|\hat{X}_{1})+\rho I(\hat{X}_{1}';\hat{Y}_{1}')\\
+\max\Big\{I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2};\;\overline{\rho\lambda}\big(I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}\big)\Big\}\\
+\max\Big\{\overline{\rho}I(\hat{X}_{2}';\hat{Y}_{1}')+\rho I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2};\;\rho\big(I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2}\big);\;\rho\lambda\big(I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2}\big)\Big\} \qquad (16)$$

are convex within 𝒟\mathcal{D}.

First, we have that $-\overline{\rho\lambda}\,\boldsymbol{E}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})-\rho\lambda\,\boldsymbol{E}_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'}\log q_{1}(\hat{Y}_{1}'|\hat{X}_{1}',\hat{X}_{2}')$ is linear in $(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'})$ and therefore convex. Also, we have that $-H(\hat{Y}_{1}|\hat{X}_{1})=H(\hat{X}_{1})-H(\hat{X}_{1},\hat{Y}_{1})$ is convex for fixed $P_{\hat{X}_{1}}$ due to the concavity of $H(\hat{X}_{1},\hat{Y}_{1})$.

In addition, $I(\hat{X}_{1}';\hat{Y}_{1}')$ can be written as $D(P_{\hat{X}_{1}'\hat{Y}_{1}'}||P_{\hat{X}_{1}'}\times P_{\hat{Y}_{1}'})$. Let $\overline{P}=\lambda\hat{P}+(1-\lambda)\check{P}$ for any $\hat{P}$, $\check{P}$ such that $\hat{P}_{\hat{X}_{1}'}=\check{P}_{\hat{X}_{1}'}$ and $\lambda\in[0,1]$. We have that $\overline{P}_{\hat{X}_{1}'\hat{Y}_{1}'}=\lambda\hat{P}_{\hat{X}_{1}'\hat{Y}_{1}'}+(1-\lambda)\check{P}_{\hat{X}_{1}'\hat{Y}_{1}'}$ and $\overline{P}_{\hat{X}_{1}'}\times\overline{P}_{\hat{Y}_{1}'}=\hat{P}_{\hat{X}_{1}'}\times(\lambda\hat{P}_{\hat{Y}_{1}'}+(1-\lambda)\check{P}_{\hat{Y}_{1}'})=\lambda(\hat{P}_{\hat{X}_{1}'}\times\hat{P}_{\hat{Y}_{1}'})+(1-\lambda)(\check{P}_{\hat{X}_{1}'}\times\check{P}_{\hat{Y}_{1}'})$. The convexity of $\rho I(\hat{X}_{1}';\hat{Y}_{1}')$ for fixed $P_{\hat{X}_{1}'}$ follows from the convexity of $D(P\|Q)$ in the pair $(P,Q)$:

$$I(\hat{X}_{1}';\hat{Y}_{1}')\Big|_{\overline{P}}=D(\overline{P}_{\hat{X}_{1}'\hat{Y}_{1}'}\|\overline{P}_{\hat{X}_{1}'}\times\overline{P}_{\hat{Y}_{1}'})\\
\leq\lambda D(\hat{P}_{\hat{X}_{1}'\hat{Y}_{1}'}\|\hat{P}_{\hat{X}_{1}'}\times\hat{P}_{\hat{Y}_{1}'})+(1-\lambda)D(\check{P}_{\hat{X}_{1}'\hat{Y}_{1}'}\|\check{P}_{\hat{X}_{1}'}\times\check{P}_{\hat{Y}_{1}'})\\
=\lambda I(\hat{X}_{1}';\hat{Y}_{1}')\Big|_{\hat{P}}+(1-\lambda)I(\hat{X}_{1}';\hat{Y}_{1}')\Big|_{\check{P}}.

Continuing with the next term of (16),

$$\max\Big\{I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2};\;\overline{\rho\lambda}\big(I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}\big)\Big\},$$

we note that it is the maximum of two convex functions, and therefore convex. The convexity of each of the individual functions follows from the convexity of $I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$ for fixed $P_{\hat{X}_{1}}$, $P_{\hat{X}_{2}}$, which can be proved along the same lines as the chain of inequalities displayed above.

Finally, we consider the last term of (16):

$$\max\Big\{\overline{\rho}I(\hat{X}_{2}';\hat{Y}_{1}')+\rho I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2};\;\rho\big(I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2}\big);\;\rho\lambda\big(I(\hat{X}_{2}';\hat{X}_{1}',\hat{Y}_{1}')-R_{2}\big)\Big\}.$$

Each of the arguments of the $\max\{\ldots\}$ can be shown to be a sum of convex functions for fixed $P_{\hat{X}_{1}'}$ and $P_{\hat{X}_{2}'}$, using an argument similar to the one that established the convexity of $I(\hat{X}_{1}';\hat{Y}_{1}')$ above. Since the maximum of convex functions is convex, the convexity of $f_{1}$ restricted to $\mathcal{D}$ follows.

Using similar arguments, it is easy to show that

$$f_{2}=-\overline{\rho\lambda}\,\boldsymbol{E}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})-\rho\lambda\,\boldsymbol{E}_{\hat{X}_{1}'\hat{X}_{2}'\hat{Y}_{1}'}\log q_{1}(\hat{Y}_{1}'|\hat{X}_{1}',\hat{X}_{2}')-H(\hat{Y}_{1}|\hat{X}_{1})\\
+\rho I(\hat{X}_{1}';\hat{X}_{2}',\hat{Y}_{1}')+I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}$$

is convex in 𝒟\mathcal{D}.

VI Numerical Results

In this section, we present a numerical example to show the performance of the error exponent region introduced in Theorem 1. We use as a baseline for comparison the error exponent region of Section III, which is obtained with minor modifications from known results for single user and multiple access channels.

We present results for the binary Z-channel model: $Y_{1}=X_{1}*X_{2}\oplus Z$, $Y_{2}=X_{2}$, where $X_{1},X_{2},Y_{1},Y_{2}\in\{0,1\}$, $Z\sim\text{Bernoulli}(p)$, $*$ is multiplication, and $\oplus$ is modulo-2 addition. This is a modified version of the binary erasure IFC studied in [10], where we add noise $Z$ to the received signal of user 1. In the results presented here, we fix $p=0.01$.
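This channel is simulated in a few lines of Python (a sketch of ours; the block length and input distributions are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, p = 10**5, 0.01
x1 = rng.integers(0, 2, n)        # user 1's input
x2 = rng.integers(0, 2, n)        # user 2's input
z = (rng.random(n) < p).astype(int)  # Bernoulli(p) noise
y1 = (x1 * x2) ^ z                # Y1 = X1*X2 XOR Z
y2 = x2                           # Y2 = X2 (noiseless)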

The boundary of the error exponent region is a surface in the four dimensions $R_{1},R_{2},E_{R,1},E_{R,2}$. This surface can be obtained parametrically by computing $E_{R,1},E_{R,2}$ as functions of $R_{1},R_{2},Q_{1},Q_{2}$, optimizing over $\rho$ and $\lambda$ in (3) and in the corresponding expression for $E_{R,2}$. The parameterization of $E_{R,i}$ in terms of $R_{1},R_{2},Q_{1},Q_{2}$ allows the study of the error performance as a function of the parameters that directly influence it.


Figure 2: Error exponents as a function of $R_{1}$ for two different values of $R_{2}$ and fixed choices of $Q_{1},Q_{2}$. All rates are in nats.


Figure 3: Optimal parameters $\rho$ and $\lambda$ for the $E_{R,1}$ curves of Fig. 2. All rates are in nats.

Fig. 2 shows that the error exponents under optimal decoding derived in this paper can be strictly better than the baseline error exponents of Section III. This suggests that the inequality obtained in Appendix B for $R_{1}=0$ can be strict. Moreover, in all the plots that we computed for the Z-channel, over various values of $Q_{1},Q_{2}$ and $R_{2}$, we were not able to find a single case where the baseline exponent $E_{B,1}$ was larger than $E_{R,1}$.

We see that the curves of $E_{R,1}$ ($E_{B,1}$) for fixed $R_{2},Q_{1},Q_{2}$ have a linear part for $R_{1}$ below a critical value $R_{1c}^{(R)}$ ($R_{1c}^{(B)}$), and a curved part for $R_{1}>R_{1c}^{(R)}$ ($R_{1}>R_{1c}^{(B)}$); note that the critical values depend on the parameters $R_{2},Q_{1}$, and $Q_{2}$. Figure 3 shows the optimal parameters $\rho$ and $\lambda$ for the $E_{R,1}$ curves shown in Fig. 2 for $R_{2}=0.139$ and $R_{2}=0.277$ nats/channel use. We see that for the linear part of the $E_{R,1}$ curves $\rho=1$ and $\lambda=1/2$ are optimal, while for the curved part (i.e., $R_{1}>R_{1c}^{(R)}$) the optimal $\rho$ decreases to 0 and the optimal $\lambda$ increases towards 1. For $R_{1}$ in the interval $(0,\min\{R_{1c}^{(R)};R_{1c}^{(B)}\})$ the gap between the $E_{R,1}$ and $E_{B,1}$ curves remains constant, as both curves are lines of slope $-1$, and this gap is equal to the gap at $R_{1}=0$. In general, any gap between $E_{R,1}$ and $E_{B,1}$ at $R_{1}=0$ will remain constant over the interval where both curves have slope $-1$. We also note that, since the optimal parameters $\rho$ and $\lambda$ vary with the rates, these parameters are indeed active, i.e., they influence the resulting error exponent.

The curves of Fig. 2 are obtained for fixed choices of $Q_{1}$ and $Q_{2}$, which are the distributions used to generate the random fixed composition codebooks. As $Q_{1}$ and $Q_{2}$ vary over the probability simplex $\mathcal{S}$, we obtain the four-dimensional error exponent region $\{R_{1},R_{2},E_{R,1}(R_{1},R_{2},Q_{1},Q_{2}),E_{R,2}(R_{1},R_{2},Q_{1},Q_{2}):Q_{1},Q_{2}\in\mathcal{S}\}$. In order to obtain a two-dimensional plot of the region, we consider a projection: we fix $R_{2}$, vary $R_{1}$, and plot the maximum over $Q_{1}$ and $Q_{2}$ of $\min\{E_{R,1},E_{R,2}\}$ in the error exponent region. This corresponds to choosing $Q_{1}$ and $Q_{2}$ in order to maximize the error exponent simultaneously achievable by both users. Figure 4 shows this projection for $R_{2}=0.139$ and $R_{2}=0.277$ nats/channel use, where, for reference, we include the corresponding curves for the error exponents $E_{B,1},E_{B,2}$ of Section III.


Figure 4: Maximum error exponent simultaneously achievable for both users for fixed R2R_{2} as a function of R1R_{1}.

For the noiseless binary channel of user 2, $E_{R,2}=\max\{H(Q_{2})-R_{2};0\}$, and as a result, $E_{R,2}$ decreases with increasing $\Pr(X_{2}=1)$ for $\Pr(X_{2}=1)\geq 1/2$. On the other hand, because of the multiplication between $X_{1}$ and $X_{2}$ in the received signal $Y_{1}$, increasing $\Pr(X_{2}=1)$ results in less interference for user 1, and a larger value of $E_{R,1}$. It follows that there is a direct trade-off between $E_{R,1}$ and $E_{R,2}$ through the choice of $Q_{2}$, and whenever $\min\{E_{R,1},E_{R,2}\}$ is maximized, $E_{R,1}=E_{R,2}$. Therefore, on the curves of Fig. 4, $E_{R,1}=E_{R,2}$.

From the plots of Figs. 2 and 4, we see that the error exponents obtained from Theorem 1 sometimes outperform and are never worse than the baseline error exponents of Section III.

Appendix A Proof of Theorem 1

It is easy to see that the optimum decoder for user 1 picks the message $m$ ($1\leq m\leq M_{1}$) that maximizes $(1/M_{2})\sum_{\boldsymbol{x}_{2}\in\mathcal{C}_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1},\boldsymbol{x}_{2})$, where $M_{1}=\lceil e^{nR_{1}}\rceil$ and $M_{2}=\lceil e^{nR_{2}}\rceil$. Applying Gallager's general upper bound to the "channel" $P(\boldsymbol{y}_{1}|\boldsymbol{x}_{1})=\frac{1}{M_{2}}\sum_{\boldsymbol{x}_{2}\in\mathcal{C}_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1},\boldsymbol{x}_{2})$, we have for user 1:

$$P_{E_{1}}\leq\sum_{\boldsymbol{y}_{1}}\left[\frac{1}{M_{2}}\sum_{\boldsymbol{x}_{2}\in\mathcal{C}_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1},\boldsymbol{x}_{2})\right]^{\overline{\rho\lambda}}\times\left[\sum_{\boldsymbol{x}_{1}'\neq\boldsymbol{x}_{1}}\left(\frac{1}{M_{2}}\sum_{\boldsymbol{x}_{2}\in\mathcal{C}_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1}',\boldsymbol{x}_{2})\right)^{\lambda}\right]^{\rho}, \qquad (A.1)$$

where $\lambda\geq 0$ and $\rho\geq 0$ are arbitrary parameters to be optimized in the sequel. Thus, the average error probability is upper bounded by the expectation of the above w.r.t. the ensembles of codes of both users. Let us take the expectation w.r.t. the ensemble of user 1 first, denoting this expectation operator by $\boldsymbol{E}_{\mathcal{C}_{1}}\{\cdot\}$. Since the codewords of user 1 are independent, the expectation of the summand in the sum above is given by the product of expectations, namely, the product of

$$A\triangleq\boldsymbol{E}_{\mathcal{C}_{1}}\left\{\left[\frac{1}{M_{2}}\sum_{\boldsymbol{x}_{2}\in\mathcal{C}_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1},\boldsymbol{x}_{2})\right]^{\overline{\rho\lambda}}\right\}=M_{2}^{\rho\lambda-1}\boldsymbol{E}_{\mathcal{C}_{1}}\left\{\left[\sum_{\boldsymbol{x}_{2}\in\mathcal{C}_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1},\boldsymbol{x}_{2})\right]^{\overline{\rho\lambda}}\right\} \qquad (A.2)$$

and

$$B\triangleq\boldsymbol{E}_{\mathcal{C}_{1}}\left\{\left[\sum_{\boldsymbol{x}_{1}'\neq\boldsymbol{x}_{1}}\left(\frac{1}{M_{2}}\sum_{\boldsymbol{x}_{2}\in\mathcal{C}_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1}',\boldsymbol{x}_{2})\right)^{\lambda}\right]^{\rho}\right\}=M_{2}^{-\rho\lambda}\boldsymbol{E}_{\mathcal{C}_{1}}\left\{\left[\sum_{\boldsymbol{x}_{1}'\neq\boldsymbol{x}_{1}}\left(\sum_{\boldsymbol{x}_{2}\in\mathcal{C}_{2}}q_{1}^{(n)}(\boldsymbol{y}_{1}|\boldsymbol{x}_{1}',\boldsymbol{x}_{2})\right)^{\lambda}\right]^{\rho}\right\}.$$

Now, let $N_{\boldsymbol{x}_{1},\boldsymbol{y}_{1}}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}})$ denote the number of codewords $\{\boldsymbol{x}_{2}\}$ that form a joint empirical PMF $P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}$ together with a given $\boldsymbol{x}_{1}$ and $\boldsymbol{y}_{1}$. Then, using (8), $A$ can be bounded by

$$A=M_{2}^{\rho\lambda-1}\boldsymbol{E}_{\boldsymbol{X}_{1}}\Bigg[\sum_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}}N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}})\,e^{n\boldsymbol{E}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})}\Bigg]^{\overline{\rho\lambda}}\\
\leq M_{2}^{\rho\lambda-1}\sum_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}}\boldsymbol{E}_{\boldsymbol{X}_{1}}N_{\boldsymbol{X}_{1},\boldsymbol{y}_{1}}^{\overline{\rho\lambda}}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}})\,e^{n\overline{\rho\lambda}\boldsymbol{E}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})} \qquad (A.3)$$

where $q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})$ is the single-letter transition probability distribution of the IFC, and where $\mbox{\boldmath$E$}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}f(\hat{X}_{1},\hat{X}_{2},\hat{Y}_{1})$, for a generic function $f$, denotes expectation when the RV's $(\hat{X}_{1},\hat{X}_{2},\hat{Y}_{1})$ are understood to be distributed according to $P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}$. Similarly, using Jensen's inequality to push the expectation w.r.t. ${\cal C}_{1}$ into the brackets, we have:

\begin{align}
B \leq M_{2}^{-\rho\lambda}M_{1}^{\rho}\Bigg[\sum_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}}\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\lambda}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}})\cdot e^{n\lambda\mbox{\boldmath$E$}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})}\Bigg]^{\rho}.\tag{A.4}
\end{align}
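For completeness, we spell out the Jensen step just used. Writing $Z_{\mbox{\boldmath$x$}_{1}^{\prime}}$ as a shorthand (introduced only for this remark) for the inner term $\big(\sum_{\mbox{\boldmath$x$}_{2}\in{\cal C}_{2}}q_{1}^{(n)}(\mbox{\boldmath$y$}_{1}|\mbox{\boldmath$x$}_{1}^{\prime},\mbox{\boldmath$x$}_{2})\big)^{\lambda}$, and using $0\leq\rho\leq 1$ (consistent with the parameter range used later in this appendix), so that $t\mapsto t^{\rho}$ is concave, together with the fact that the $M_{1}-1$ competing codewords are i.i.d.,
\[
\mbox{\boldmath$E$}_{{\cal C}_{1}}\Bigg[\sum_{\mbox{\boldmath$x$}_{1}^{\prime}\neq\mbox{\boldmath$x$}_{1}}Z_{\mbox{\boldmath$x$}_{1}^{\prime}}\Bigg]^{\rho}
\leq\Bigg[\mbox{\boldmath$E$}_{{\cal C}_{1}}\sum_{\mbox{\boldmath$x$}_{1}^{\prime}\neq\mbox{\boldmath$x$}_{1}}Z_{\mbox{\boldmath$x$}_{1}^{\prime}}\Bigg]^{\rho}
=\left[(M_{1}-1)\,\mbox{\boldmath$E$}\,Z\right]^{\rho}
\leq M_{1}^{\rho}\left[\mbox{\boldmath$E$}\,Z\right]^{\rho},
\]
which is the origin of the factor $M_{1}^{\rho}$ in (A.4).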

Taking the product of these two expressions, applying (8) to the summation in the bound for $B$, and taking expectations with respect to the codebook ${\cal C}_{2}$, yields

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}}(AB) \leq\ & M_{1}^{\rho}M_{2}^{-1}\sum_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}}\sum_{P_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}}
\mbox{\boldmath$E$}_{{\cal C}_{2}}\left[\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}})\,\mbox{\boldmath$E$}^{\rho}_{\mbox{\boldmath$X$}_{1}}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\lambda}(P_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\right]\nonumber\\
&\times\exp\Big\{n\Big[\overline{\rho\lambda}\,\mbox{\boldmath$E$}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})
+\rho\lambda\,\mbox{\boldmath$E$}_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}\log q_{1}(\hat{Y}_{1}^{\prime}|\hat{X}_{1}^{\prime},\hat{X}_{2}^{\prime})\Big]\Big\}\tag{A.5}
\end{align}

The next step is to bound the term involving the expectation over ${\cal C}_{2}$. As noted, the codewords $\{\mbox{\boldmath$X$}_{1}\}$ and $\{\mbox{\boldmath$X$}_{2}\}$ are selected i.i.d. uniformly over the type classes ${\cal T}_{1}={\cal T}_{Q_{1}}$ and ${\cal T}_{2}={\cal T}_{Q_{2}}$ corresponding to the probability distributions $Q_{1}$ and $Q_{2}$, respectively. To avoid cumbersome notation, we denote hereafter $\hat{P}=P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}$ and $\hat{P}^{\prime}=P_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}$, and assume that $P_{\hat{X}_{1}}=P_{\hat{X}_{1}^{\prime}}=Q_{1}$, $P_{\hat{X}_{2}}=P_{\hat{X}_{2}^{\prime}}=Q_{2}$, $P_{\hat{Y}_{1}}=P_{\hat{Y}_{1}^{\prime}}$, and that $\mbox{\boldmath$y$}_{1}$ lies in the type class corresponding to $P_{\hat{Y}_{1}}$. We will also use the shorthand notation

\begin{equation}
\mbox{\boldmath$E$}_{{\cal C}_{2}}\triangleq\mbox{\boldmath$E$}_{{\cal C}_{2}}\left[\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\,\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}^{\rho}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\lambda}(\hat{P}^{\prime})\right].\tag{A.6}
\end{equation}

The bounding of $\mbox{\boldmath$E$}_{{\cal C}_{2}}$ requires considering multiple cases, which depend on how $R_{2}$ compares to various information quantities, and on properties of the joint types $P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}$ and $P_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}$. To guide the reader through the different steps, Fig. 5 below gives a schematic representation of the cases that arise.

We first consider two ranges of $R_{2}$, according to its comparison with $I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$:


1. The range $R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$. Here we have:

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} =\ & \mbox{\boldmath$E$}_{{\cal C}_{2}}\Bigg\{\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right]\cdot\bigg[\frac{1}{|{\cal T}_{1}|}\sum_{\tilde{\mbox{\boldmath$x$}}\in{\cal T}_{1}}N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}^{\lambda}(\hat{P}^{\prime})\bigg]^{\rho}\Bigg\}\nonumber\\
=\ & \mbox{\boldmath$E$}_{{\cal C}_{2}}\Bigg\{\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right]\cdot\bigg[\frac{1}{|{\cal T}_{1}|}\sum_{\tilde{\mbox{\boldmath$x$}}\in{\cal T}_{1}}N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}^{\lambda}(\hat{P}^{\prime})\bigg]^{\rho}\cdot
1\left[N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\leq e^{n[(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))+\epsilon]}\ \forall\,\tilde{\mbox{\boldmath$x$}}\in{\cal T}_{1}\right]\Bigg\}\nonumber\\
& +\mbox{\boldmath$E$}_{{\cal C}_{2}}\Bigg\{\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right]\cdot\bigg[\frac{1}{|{\cal T}_{1}|}\sum_{\tilde{\mbox{\boldmath$x$}}\in{\cal T}_{1}}N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}^{\lambda}(\hat{P}^{\prime})\bigg]^{\rho}\cdot
1\left[\exists\,\tilde{\mbox{\boldmath$x$}}\in{\cal T}_{1}:\ N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})>e^{n[(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))+\epsilon]}\right]\Bigg\}\nonumber\\
\leq\ & \mbox{\boldmath$E$}_{{\cal C}_{2}}\Bigg\{\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right]\cdot\bigg[e^{-n(H(\hat{X}_{1}^{\prime})-\epsilon)}\sum_{\tilde{\mbox{\boldmath$x$}}\in{\cal T}_{1}}1\left[(\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}^{\prime}\hat{Y}_{1}^{\prime}}}\right]\cdot e^{n\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})+\epsilon)}\bigg]^{\rho}\Bigg\}\nonumber\\
& +e^{nR_{2}}\,\mbox{Pr}\left[\exists\,\tilde{\mbox{\boldmath$x$}}\in{\cal T}_{1}:\ N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})>e^{n[(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))+\epsilon]}\right]\nonumber\\
\stackrel{.}{\leq}\ & \mbox{\boldmath$E$}_{{\cal C}_{2}}\left\{\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right]\right\}\cdot e^{-n\rho[H(\hat{X}_{1}^{\prime})-H(\hat{X}_{1}^{\prime}|\hat{Y}_{1}^{\prime})]}\cdot e^{n\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))},\tag{A.7}
\end{align}

where in the second-to-last inequality we used $N_{\mbox{\boldmath$x$}_{1},\mbox{\boldmath$y$}_{1}}\leq M_{2}$, and in the last inequality we used the fact that

\begin{align}
\mbox{Pr}\left\{\exists\,\tilde{\mbox{\boldmath$x$}}\in{\cal T}_{1}:\ N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})>e^{n[(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))+\epsilon]}\right\}
\leq e^{n(H(\hat{X}_{1}^{\prime})+\epsilon)}\cdot\mbox{Pr}\left\{N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})>e^{n[(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))+\epsilon]}\right\}\tag{A.8}
\end{align}

for any $\tilde{\mbox{\boldmath$x$}}\in{\cal T}_{1}$; the latter probability decays doubly exponentially with $n$ (cf. [7, Appendix]).
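For intuition, here is a rough sketch of the double-exponential decay; the full argument is in [7, Appendix], and the sketch assumes, as in our ensemble, that the $M_{2}$ codewords of user 2 are drawn independently. For a fixed $\tilde{\mbox{\boldmath$x$}}$, $N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})$ is a sum of $M_{2}$ i.i.d. indicators with success probability $p\stackrel{.}{=}e^{-nI(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})}$, so that, by the binomial tail bound,
\[
\mbox{Pr}\left[N_{\tilde{\mbox{\boldmath$x$}},\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq\ell\right]
\leq\binom{M_{2}}{\ell}p^{\ell}\leq\left(\frac{eM_{2}p}{\ell}\right)^{\ell}.
\]
Taking $\ell=e^{n[(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))+\epsilon]}$, which exceeds the mean $M_{2}p$ by a factor of (essentially) $e^{n\epsilon}$, the right-hand side is at most $\exp\{-\ell\,[n(\epsilon-\delta)-1]\}$ for any $\delta\in(0,\epsilon)$ and $n$ large; since $\ell\geq e^{n\epsilon}$ in the present range $R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$, this is doubly exponentially small, and the singly exponential union-bound factor $e^{n(H(\hat{X}_{1}^{\prime})+\epsilon)}$ cannot compensate for it.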

To compute $\mbox{\boldmath$E$}_{{\cal C}_{2}}\left\{\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right]\right\}$, we consider two cases, according to the comparison between $R_{2}$ and $I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$:

The case $R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$. Here, we have:

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}}\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right] &= \mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right]\nonumber\\
&\stackrel{.}{\leq} \mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[1\left((\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}\hat{Y}_{1}}}\right)e^{n\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))}\right]\nonumber\\
&\stackrel{.}{=} e^{-nI(\hat{X}_{1};\hat{Y}_{1})}\,e^{n\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))}.\tag{A.9}
\end{align}
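The last (dotted) equality is the basic type-counting evaluation that recurs throughout this appendix; since $\mbox{\boldmath$X$}_{1}$ is drawn uniformly over ${\cal T}_{1}$ and $\mbox{\boldmath$y$}_{1}$ has type $P_{\hat{Y}_{1}}$, it reads, to first order in the exponent,
\[
\mbox{Pr}\left[(\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}\hat{Y}_{1}}}\right]
=\frac{\left|\{\mbox{\boldmath$x$}_{1}\in{\cal T}_{1}:\ (\mbox{\boldmath$x$}_{1},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}\hat{Y}_{1}}}\}\right|}{|{\cal T}_{1}|}
\stackrel{.}{=}\frac{e^{nH(\hat{X}_{1}|\hat{Y}_{1})}}{e^{nH(\hat{X}_{1})}}
=e^{-nI(\hat{X}_{1};\hat{Y}_{1})}.
\]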

Therefore, when

\[
R_{2}\geq\max\{I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}),\,I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})\}
\]

we have:

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} \stackrel{.}{\leq} \exp\Big\{n\Big[&-I(\hat{X}_{1};\hat{Y}_{1})+\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\nonumber\\
&-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})+\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\Big]\Big\}.\tag{A.10}
\end{align}

The case $R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$. Here we have:

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}}\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right] &\leq \mbox{\boldmath$E$}_{{\cal C}_{2}}\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\right]\nonumber\\
&\stackrel{.}{\leq} e^{-nI(\hat{X}_{1};\hat{Y}_{1})}\cdot e^{n(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))},\tag{A.11}
\end{align}

where we used the fact that $\overline{\rho\lambda}\leq 1$ (so that $N^{\overline{\rho\lambda}}\leq N$ for any nonnegative integer $N$), and then estimated the expectation of $N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})$ as $M_{2}$ times the probability that $\mbox{\boldmath$x$}_{2}$ falls in the corresponding conditional type class; this estimate is spelled out next.
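Explicitly, since $\mbox{\boldmath$x$}_{2}$ is drawn uniformly over ${\cal T}_{2}$, for $(\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}\hat{Y}_{1}}}$ the method of types gives
\[
\mbox{\boldmath$E$}_{{\cal C}_{2}}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})
=M_{2}\,\mbox{Pr}\left[(\mbox{\boldmath$X$}_{1},\mbox{\boldmath$x$}_{2},\mbox{\boldmath$y$}_{1})\in{\cal T}_{\hat{P}}\right]
\stackrel{.}{=}e^{nR_{2}}\cdot e^{-nI(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})},
\]
and averaging over $\mbox{\boldmath$X$}_{1}$, whose probability of being jointly typical with $\mbox{\boldmath$y$}_{1}$ is $\stackrel{.}{=}e^{-nI(\hat{X}_{1};\hat{Y}_{1})}$ as computed after (A.9), yields (A.11). Therefore, when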

\[
I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})\leq R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})
\]

we have:

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} \stackrel{.}{\leq} \exp\Big\{n\Big[&-I(\hat{X}_{1};\hat{Y}_{1})+(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\nonumber\\
&-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})+\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\Big]\Big\}.\tag{A.12}
\end{align}

The exponents for the subcases (A.10) and (A.12), corresponding to $R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$ and $R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$, respectively, differ only in the factors ($\overline{\rho\lambda}$ and $1$, resp.) multiplying the term $R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$. Therefore, we can consolidate these two subcases of the range $R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$ into the expression:

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} \stackrel{.}{\leq} \exp\Big\{n\Big[&-I(\hat{X}_{1};\hat{Y}_{1})+\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\}\nonumber\\
&-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})+\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\Big]\Big\},\tag{A.13}
\end{align}

since $\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\}$ equals $\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))$ when $R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$ and $(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))$ when $R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$.
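This is just the elementary observation that, because $\overline{\rho\lambda}=1-\rho\lambda\in[0,1]$ in the parameter range considered here,
\[
\min\{\overline{\rho\lambda}\,t,\ t\}=
\begin{cases}
\overline{\rho\lambda}\,t, & t\geq 0,\\
t, & t<0,
\end{cases}
\qquad\mbox{with } t=R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}).
\]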

2. The range $R_{2}<I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$. In this range,

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} &= \mbox{\boldmath$E$}_{{\cal C}_{2}}\left\{\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right]\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}^{\rho}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\lambda}(\hat{P}^{\prime})\right]\right\}\nonumber\\
&\leq \mbox{\boldmath$E$}_{{\cal C}_{2}}\left\{\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\right]\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}^{\rho}\left[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\right]\right\}\tag{A.14}
\end{align}

where we assumed $\lambda\leq 1$ in the last step. The second expectation over $\mbox{\boldmath$X$}_{1}$ can be evaluated as

\begin{align}
\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(P_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}) &= \sum_{\mbox{\boldmath$x$}_{2}\in{\cal C}_{2}}\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}1\left((\mbox{\boldmath$X$}_{1},\mbox{\boldmath$x$}_{2},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}}\right)\nonumber\\
&\stackrel{.}{=} e^{-nI(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})}\sum_{\mbox{\boldmath$x$}_{2}\in{\cal C}_{2}}1\left((\mbox{\boldmath$x$}_{2},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}}\right)\nonumber\\
&= e^{-nI(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})}\,N_{\mbox{\boldmath$y$}_{1}}(P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}),\tag{A.15}
\end{align}

where $N_{\mbox{\boldmath$y$}_{1}}(P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$ is the number of codewords $\{\mbox{\boldmath$x$}_{2}\}$ that are jointly typical with $\mbox{\boldmath$y$}_{1}$ according to $P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}$. Thus,

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}}\big[\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\,\mbox{\boldmath$E$}^{\rho}_{\mbox{\boldmath$X$}_{1}}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\big]
&\stackrel{.}{=} e^{-n\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})}\,\mbox{\boldmath$E$}_{{\cal C}_{2}}\big[\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}})\,N_{\mbox{\boldmath$y$}_{1}}^{\rho}(P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\big]\nonumber\\
&= e^{-n\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})}\,\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}})\,N_{\mbox{\boldmath$y$}_{1}}^{\rho}(P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\big].\tag{A.16}
\end{align}

To bound $\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})]$, we consider two cases, depending on how $R_{2}$ compares to $I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})$.

The case $R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})$. Here, we have:

\begin{align}
\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}&\mbox{\boldmath$E$}_{{\cal C}_{2}}[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})]\nonumber\\
=\ & \mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})\cdot 1\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\leq e^{n(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})+\epsilon)}\big]\Big\}\nonumber\\
& +\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})\cdot 1\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})>e^{n(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})+\epsilon)}\big]\Big\}\nonumber\\
\stackrel{.}{\leq}\ & e^{n\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))}\,\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\big]
+e^{n(\overline{\rho\lambda}+\rho)R_{2}}\,\mbox{Pr}\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})>e^{n(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})+\epsilon)}\big]\nonumber\\
\stackrel{.}{\leq}\ & \exp\Big\{n\Big[\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))-I(\hat{X}_{1};\hat{Y}_{1})
+1(R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\,\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\nonumber\\
&\quad\quad +1(R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\,(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\Big]\Big\}\nonumber\\
=\ & \exp\Big\{n\Big[\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))-I(\hat{X}_{1};\hat{Y}_{1})
+\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\}\Big]\Big\}\tag{A.17}
\end{align}

where in the last inequality we used the fact that $\mbox{Pr}\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})>e^{n(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})+\epsilon)}\big]$ decays doubly exponentially, and bounded $\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\big]$ using (A.9) and (A.11).

The case $R_{2}<I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})$. Here, we further split the evaluation into two parts. In the first part, $R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$, and we have:

\begin{align}
\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}&\mbox{\boldmath$E$}_{{\cal C}_{2}}[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})]\nonumber\\
\leq\ & \mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})\cdot 1\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\leq e^{n(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})+\epsilon)}\big]\Big\}\nonumber\\
& +\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})\cdot 1\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})>e^{n(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})+\epsilon)}\big]\Big\}\nonumber\\
\stackrel{.}{\leq}\ & e^{n\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))}\,\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})\,1\big[(\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}\hat{Y}_{1}}}\big]\Big\}\nonumber\\
& +e^{n(\overline{\rho\lambda}+\rho)R_{2}}\,\mbox{Pr}\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})>e^{n(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})+\epsilon)}\big]\nonumber\\
\stackrel{.}{\leq}\ & e^{n[\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))-I(\hat{X}_{1};\hat{Y}_{1})]}\,\mbox{\boldmath$E$}_{{\cal C}_{2}}\big[N_{\mbox{\boldmath$y$}_{1}}^{\rho}(P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\big]\nonumber\\
\stackrel{.}{\leq}\ & \exp\Big\{n\big[\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})\big]\Big\}\tag{A.18}
\end{align}

where in the last inequality we used

\[
\mbox{\boldmath$E$}_{{\cal C}_{2}}\big[N_{\mbox{\boldmath$y$}_{1}}^{\rho}(P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\big]\leq\mbox{\boldmath$E$}_{{\cal C}_{2}}\big[N_{\mbox{\boldmath$y$}_{1}}(P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\big]\stackrel{.}{=}e^{n(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))},
\]

valid for $\rho\leq 1$.
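The first inequality above holds simply because $N_{\mbox{\boldmath$y$}_{1}}(P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$ is a nonnegative integer:
\[
N^{\rho}\leq N\quad\mbox{for all integers }N\geq 0\mbox{ and }\rho\in[0,1],
\]
since $N^{\rho}=0$ for $N=0$ and $N^{\rho}\leq N^{1}=N$ for $N\geq 1$ (the same fact was used with the exponents $\lambda$ and $\overline{\rho\lambda}$ earlier); the dotted equality is again the type-counting estimate $\mbox{\boldmath$E$}_{{\cal C}_{2}}N_{\mbox{\boldmath$y$}_{1}}(P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})=M_{2}\,\mbox{Pr}[(\mbox{\boldmath$x$}_{2},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}}]\stackrel{.}{=}e^{nR_{2}}e^{-nI(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})}$.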

The other part corresponds to $R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$. Here we have:

\begin{align}
\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}&\mbox{\boldmath$E$}_{{\cal C}_{2}}[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})]\nonumber\\
=\ & \mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})\,1\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\leq e^{n\epsilon}\big]\Big\}
+\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})\,1\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})>e^{n\epsilon}\big]\Big\}\nonumber\\
\leq\ & e^{n\rho\epsilon}\,\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\,1\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq 1\big]\Big\}
+e^{n(\overline{\rho\lambda}+\rho)R_{2}}\,\mbox{Pr}\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})>e^{n\epsilon}\big]\nonumber\\
\stackrel{.}{\leq}\ & \mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\cdot 1\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq 1\big]\cdot 1\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\leq e^{n\epsilon}\big]\Big\}\nonumber\\
& +\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})\cdot 1\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq 1\big]\cdot 1\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})>e^{n\epsilon}\big]\Big\}\nonumber\\
\stackrel{.}{\leq}\ & e^{n\overline{\rho\lambda}\epsilon}\,\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}\Big\{1\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq 1\big]\cdot 1\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\geq 1\big]\Big\}
+e^{n\overline{\rho\lambda}R_{2}}\,\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\Big\{\mbox{Pr}\big[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})>e^{n\epsilon}\big]\Big\}\nonumber\\
\stackrel{.}{=}\ & \frac{1}{|{\cal T}_{1}|}\sum_{\tilde{\mbox{\boldmath$x$}}_{1}\in{\cal T}_{1}}1\big[(\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}\hat{Y}_{1}}}\big]\cdot\mbox{Pr}\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq 1,\,N_{\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\geq 1\big]\tag{A.19}
\end{align}

To bound $\mbox{Pr}\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq 1,\,N_{\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\geq 1\big]$, we consider two cases:

The first case is $P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}$: in this case, $\big\{N_{\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\geq 1\big\}\Rightarrow\big\{N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq 1\big\}$. Therefore,

\begin{align}
\mbox{Pr}\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq 1,\,N_{\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\geq 1\big] &= \mbox{Pr}\big[N_{\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\geq 1\big]\nonumber\\
&\stackrel{.}{\leq} e^{n(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))}.\nonumber
\end{align}

Substituting this into (A.19), we get:

\begin{align}
\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})]
\stackrel{.}{\leq}\exp\big\{n\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\big]\big\}.\tag{A.20}
\end{align}

The other case is $P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}$: in this case, the same codeword $\mbox{\boldmath$x$}_{2}$ cannot simultaneously satisfy $(\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$x$}_{2},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}}$ and $(\mbox{\boldmath$x$}_{2},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}}$. Therefore, we have that

\begin{align}
\mbox{Pr}\big[N_{\mbox{\boldmath$y$}_{1}}(\hat{P}^{\prime})\geq 1,\,N_{\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$y$}_{1}}(\hat{P})\geq 1\big]
&= \mbox{Pr}\big[\exists\,\mbox{\boldmath$x$}_{2}^{\prime}\neq\mbox{\boldmath$x$}_{2}:\ (\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$x$}_{2}^{\prime},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}},\ (\mbox{\boldmath$x$}_{2},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}}\big]\nonumber\\
&\leq \sum_{\mbox{\boldmath$x$}_{2}\in{\cal C}_{2}}\sum_{\mbox{\boldmath$x$}_{2}^{\prime}\neq\mbox{\boldmath$x$}_{2}}\mbox{Pr}\big[(\tilde{\mbox{\boldmath$x$}}_{1},\mbox{\boldmath$x$}_{2}^{\prime},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}},\ (\mbox{\boldmath$x$}_{2},\mbox{\boldmath$y$}_{1})\in{\cal T}_{P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}}\big]\nonumber\\
&\stackrel{.}{\leq} e^{2nR_{2}}\,e^{-nI(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})}\,e^{-nI(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})}.\nonumber
\end{align}

Substituting this into (A.19), we get:

\begin{align}
\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}[N_{\mbox{\boldmath$X$}_{1},\mbox{\boldmath$y$}_{1}}^{\overline{\rho\lambda}}(\hat{P})N_{\mbox{\boldmath$y$}_{1}}^{\rho}(\hat{P}^{\prime})]
\stackrel{.}{\leq}\exp\big\{n\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})+R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})\big]\big\}.\tag{A.21}
\end{align}

This completes the decomposition of $\mbox{\boldmath$E$}_{{\cal C}_{2}}$ into the various subcases.


Figure 5: Tree representing the multiple ranges of $R_{2}$ considered in the derivation, and the equations that consolidate the different ranges.

Consolidation. Next, we carry out a consolidation process that merges all of the above subcases into a more compact expression, leading ultimately to the expression in Theorem 1. Figure 5 gives a schematic representation, in terms of a tree, of the various consolidation steps described below. The consolidation of (A.10) and (A.12) into (A.13) was done above, but we include it in Fig. 5 for completeness. Referring to Fig. 5, the consolidation starts at the deepest leaves of the tree and works its way up the nodes until it reaches the root.

We begin with the last set of subsubcases derived, $R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$ and $R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$ (expressions (A.18), (A.20), and (A.21)), for the subcase $R_{2}<I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})$, and consolidate them as follows:

\begin{align}
\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}} \stackrel{.}{\leq} \exp\Bigg\{n\Big\{&1(R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\big[\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})\big]\nonumber\\
&+1(R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})+R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})\big]\nonumber\\
&+1(R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\big]\Big\}\Bigg\}.\tag{A.22}
\end{align}

Next, we would like to decompose the indicator $1(R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))$ appearing in the initial part of this expression as

\begin{align}
1(R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))
=\ & 1(R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})
+1(R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\nonumber\\
=\ & 1(R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}),\nonumber
\end{align}

where in the last step we took into account that in the present subcase $R_{2}<I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})$, we have $1(R_{2}\geq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})=0$: indeed, for $P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}$ we have $R_{2}<I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})=I(\hat{X}_{2};\hat{Y}_{1})\leq I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})$.

Applying this decomposition to (A.22), then combining terms having the same indicators $1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$ and $1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$, and replacing indicators by $\min\{\cdots\}$ as appropriate (similarly to (A.13)), we simplify (A.22) to

\begin{align}
\mbox{\boldmath$E$}_{\mbox{\boldmath$X$}_{1}}\mbox{\boldmath$E$}_{{\cal C}_{2}}
\stackrel{.}{\leq} \exp\Bigg\{n\Big\{&1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\big[-I(\hat{X}_{1};\hat{Y}_{1})
+\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}\nonumber\\
&\quad +R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})\big]\nonumber\\
&+1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})1(R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\big]\Big\}\Bigg\}.\tag{A.23}
\end{align}

This is valid for the subcase $R_{2}<I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})$.

Next, we consolidate (A.17) from the subcase $R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})$ with (A.23), and insert the result into (A.16), to get

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} \stackrel{.}{\leq} \exp\Bigg\{n\Big\{&-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})\nonumber\\
&+1(R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))\Big[-I(\hat{X}_{1};\hat{Y}_{1})+\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))\nonumber\\
&\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\}\Big]\nonumber\\
&+1(R_{2}<I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))\Big[1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\big[-I(\hat{X}_{1};\hat{Y}_{1})\nonumber\\
&\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}+R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})\big]\nonumber\\
&\quad +1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})1(R_{2}<I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\big]\Big]\Big\}\Bigg\},\tag{A.24}
\end{align}

which applies to the range $R_{2}<I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$. Again, expanding all terms against the indicators $1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$ and $1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$, and, as above, replacing indicators by $\min\{\cdots\}$ as appropriate, we obtain

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} \stackrel{.}{\leq} \exp\Bigg\{n\Big\{&1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\Big[-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})-I(\hat{X}_{1};\hat{Y}_{1})\nonumber\\
&\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}\nonumber\\
&\quad +\min\{\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})),\,R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})\}\Big]\nonumber\\
&+1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\Big[-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})\nonumber\\
&\quad +1(R_{2}\geq I(\hat{X}_{2};\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})+\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))\nonumber\\
&\quad\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}\big]\nonumber\\
&\quad +1(R_{2}<I(\hat{X}_{2};\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\big]\Big]\Big\}\Bigg\}.\tag{A.25}
\end{align}

Using the identity (proved via the chain rule)

\[
I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})+I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})=I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})+I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})
\]

twice, we can rewrite the term

\[
-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})+\min\{\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})),\,R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})\}
\]

appearing after the indicator $1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$ in (A.25) as

\[
-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})+\min\{\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})),\,R_{2}-\overline{\rho}I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})-\rho I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})\}.
\]
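To verify this rewriting, note that the identity above gives $I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})=I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})+I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})$, so that, term by term inside the minimum,
\begin{align*}
-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})+\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))
&=-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})+\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})),\\
-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})+R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})
&=-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})+R_{2}-\overline{\rho}I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})-\rho I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}),
\end{align*}
where $\overline{\rho}=1-\rho$; this is the sense in which the identity is applied twice, once for each term in the minimum.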

Similarly, we can decompose the term $-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})$ appearing after the indicator $1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$ against the indicators $1(R_{2}\geq I(\hat{X}_{2};\hat{Y}_{1}))$ and $1(R_{2}<I(\hat{X}_{2};\hat{Y}_{1}))$, and use the above identity to combine it with the term $\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))$ appearing after the indicator $1(R_{2}\geq I(\hat{X}_{2};\hat{Y}_{1}))$. Incorporating these steps, we can rewrite (A.25) as

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} \stackrel{.}{\leq} \exp\Bigg\{n\Big\{&1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\Big[-I(\hat{X}_{1};\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})\nonumber\\
&\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}\nonumber\\
&\quad +\min\{R_{2}-\overline{\rho}I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})-\rho I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}),\,\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\}\Big]\nonumber\\
&+1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\Big[1(R_{2}\geq I(\hat{X}_{2};\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})\nonumber\\
&\quad\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}+\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\big]\nonumber\\
&\quad +1(R_{2}<I(\hat{X}_{2};\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})\big]\Big]\Big\}\Bigg\}.\tag{A.26}
\end{align}

Finally, we consolidate (A.13) from the range $R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$ with the just obtained (A.26) (for the range $R_{2}<I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$) to get

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} \stackrel{.}{\leq} \exp\Bigg\{n\Big\{&1(R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\Big[-I(\hat{X}_{1};\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})\nonumber\\
&\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1}))\}+\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\Big]\nonumber\\
&+1(R_{2}<I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\Big[1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\Big[-I(\hat{X}_{1};\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})\nonumber\\
&\quad\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}\nonumber\\
&\quad\quad +\min\{R_{2}-\overline{\rho}I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})-\rho I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}),\,\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\}\Big]\nonumber\\
&\quad +1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\Big[1(R_{2}\geq I(\hat{X}_{2};\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})\nonumber\\
&\quad\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}+\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\big]\nonumber\\
&\quad\quad +1(R_{2}<I(\hat{X}_{2};\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})\big]\Big]\Big]\Big\}\Bigg\}.\tag{A.27}
\end{align}

As before, after expanding the first indicator $1(R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))$ against $1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$ and $1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$, and combining terms, we obtain

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{2}} \stackrel{.}{\leq} \exp\Bigg\{n\Big\{&1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\Big[-I(\hat{X}_{1};\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})\nonumber\\
&\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}\nonumber\\
&\quad +\min\{R_{2}-\overline{\rho}I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})-\rho I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}),\,\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})),\,\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\}\Big]\nonumber\\
&+1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})\Big[1(R_{2}\geq I(\hat{X}_{2};\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})\nonumber\\
&\quad\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}\nonumber\\
&\quad\quad +\min\{\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})),\,\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\}\big]\nonumber\\
&\quad +1(R_{2}<I(\hat{X}_{2};\hat{Y}_{1}))\big[-I(\hat{X}_{1};\hat{Y}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})\big]\Big]\Big\}\Bigg\},\tag{A.28}
\end{align}

where, in simplifying, we have made use of the identity

\begin{align}
1&(R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\,\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\nonumber\\
&+1(R_{2}<I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\min\{R_{2}-\overline{\rho}I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})-\rho I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}),\,\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\}\nonumber\\
&=\min\{R_{2}-\overline{\rho}I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})-\rho I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}),\,\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})),\,\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\},\nonumber
\end{align}

along with

\begin{align}
1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})1(R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))
=1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})1(R_{2}\geq I(\hat{X}_{2};\hat{Y}_{1}))1(R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})),\nonumber
\end{align}

and finally

\begin{align}
1&(R_{2}\geq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\,\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))
+1(R_{2}<I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\,\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\nonumber\\
&=\min\{\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})),\,\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\}.\nonumber
\end{align}
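These two decomposition identities follow from the same sign analysis as (A.13), using $\rho\leq 1$ and $\lambda\leq 1$ (both assumed earlier), together with $I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})\leq I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$. Writing $t=R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})$, the key observations are
\[
R_{2}-\overline{\rho}I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})-\rho I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})
=\overline{\rho}\,(R_{2}-I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime}))+\rho t\ \geq\ t\quad\mbox{for }t\geq 0,
\]
and $\rho\lambda t\leq\rho t\leq t$ for $t\geq 0$, while $\rho t\leq\rho\lambda t$ for $t<0$; hence each side of either identity evaluates to the same quantity in both sign regimes of $t$.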

We use (A.28) in (A.5), sum over all vectors $\mbox{\boldmath$y$}_{1}$, decompose all joint-type-dependent terms appearing in (A.5), as well as the term $nH(\hat{Y}_{1})$ arising from the per-type summation over $\mbox{\boldmath$y$}_{1}$, against the indicators $1(P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$ and $1(P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}})$, and finally optimize over the types $P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}$, $P_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}$, to obtain:

\begin{align}
\mbox{\boldmath$E$}_{{\cal C}_{1},{\cal C}_{2}}(P_{E_{1}}) \stackrel{.}{\leq} \exp\Bigg\{n\Bigg\{-R_{2}+\rho R_{1}+\max\Bigg\{
&\max_{\begin{subarray}{c}P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}:\\ P_{\hat{X}_{1}}=P_{\hat{X}_{1}^{\prime}}=Q_{1},\\ P_{\hat{X}_{2}}=P_{\hat{X}_{2}^{\prime}}=Q_{2},\\ P_{\hat{Y}_{1}}=P_{\hat{Y}_{1}^{\prime}},\\ P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}\end{subarray}}\Bigg[\overline{\rho\lambda}\,\mbox{\boldmath$E$}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})\nonumber\\
&\quad +\rho\lambda\,\mbox{\boldmath$E$}_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}\log q_{1}(\hat{Y}_{1}^{\prime}|\hat{X}_{1}^{\prime},\hat{X}_{2}^{\prime})+H(\hat{Y}_{1}|\hat{X}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})\nonumber\\
&\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}\nonumber\\
&\quad +\min\{R_{2}-\overline{\rho}I(\hat{X}_{2}^{\prime};\hat{Y}_{1}^{\prime})-\rho I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}),\,\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})),\,\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\}\Bigg];\nonumber\\
&\max_{\begin{subarray}{c}P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}},P_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}:\\ P_{\hat{X}_{1}}=P_{\hat{X}_{1}^{\prime}}=Q_{1},\\ P_{\hat{X}_{2}}=P_{\hat{X}_{2}^{\prime}}=Q_{2},\\ P_{\hat{X}_{2}\hat{Y}_{1}}=P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}\end{subarray}}\Bigg[\overline{\rho\lambda}\,\mbox{\boldmath$E$}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})+\rho\lambda\,\mbox{\boldmath$E$}_{\hat{X}_{1}^{\prime}\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}\log q_{1}(\hat{Y}_{1}^{\prime}|\hat{X}_{1}^{\prime},\hat{X}_{2}^{\prime})\nonumber\\
&\quad +1(R_{2}\geq I(\hat{X}_{2};\hat{Y}_{1}))\big[H(\hat{Y}_{1}|\hat{X}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{Y}_{1}^{\prime})\nonumber\\
&\quad\quad +\min\{\overline{\rho\lambda}(R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})),\,R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})\}\nonumber\\
&\quad\quad +\min\{\rho(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime})),\,\rho\lambda(R_{2}-I(\hat{X}_{2}^{\prime};\hat{X}_{1}^{\prime},\hat{Y}_{1}^{\prime}))\}\big]\nonumber\\
&\quad +1(R_{2}<I(\hat{X}_{2};\hat{Y}_{1}))\big[H(\hat{Y}_{1}|\hat{X}_{1})+R_{2}-I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-\rho I(\hat{X}_{1}^{\prime};\hat{X}_{2}^{\prime},\hat{Y}_{1}^{\prime})\big]\Bigg]\Bigg\}\Bigg\}\Bigg\}\tag{A.29}
\end{align}

Note that the term $H(\hat{Y}_{1})$ mentioned above has been combined with the term $-I(\hat{X}_{1};\hat{X}_{2})$ appearing in all subcases of (A.28) to yield the $H(\hat{Y}_{1}|\hat{X}_{1})$ appearing throughout (A.29).

The expression in Theorem 1 is obtained from (A.29) in three steps. First, we drop the constraint $P_{\hat{X}_{2}\hat{Y}_{1}}\neq P_{\hat{X}_{2}^{\prime}\hat{Y}_{1}^{\prime}}$ from the first maximization (by the continuity of the underlying terms, this is not a binding constraint anyway). Next, we note that if, in the resulting expression, the second maximization is attained when $R_{2}\geq I(\hat{X}_{2};\hat{Y}_{1})$, it is dominated by the first maximization, so that the second maximization can be restricted to the case $R_{2}<I(\hat{X}_{2};\hat{Y}_{1})$. Finally, we negate the resulting exponent, propagating the negation as $-\max\{\cdots\}=\min\{-\cdots\}$ throughout.

Appendix B: A Lower Bound to $E_{R,1}$

We can lower bound the maximization of (3) over $\rho$ and $\lambda$ by applying the min-max theorem twice, as follows.

First, we introduce a new parameter $\theta$ and bound (3) as

\begin{align}
E_{R,1}&\geq\min_{\theta\in[0,1]}\Bigg\{R_{2}-\rho R_{1}+\theta\min_{\substack{(P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}})\\ \in{\cal S}_{1}(Q_{1},Q_{2})}}f_{1}\Big(\rho,\lambda,P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}}\Big)\tag{B.1}\\
&\qquad+\overline{\theta}\min_{\substack{(P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}})\\ \in{\cal S}_{2}(Q_{1},Q_{2})}}f_{2}\Big(\rho,\lambda,P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}}\Big)\Bigg\}\tag{B.2}
\end{align}

where $\overline{\theta}=1-\theta$ and where we have dropped the constraint involving $R_{2}$ from ${\cal S}_{2}$, which both yields a lower bound and makes ${\cal S}_{2}$ convex.

Letting $\gamma=\rho\lambda$, we claim that, for fixed $\theta$, the expression being minimized over $\theta$ in (B.2) is concave in $(\rho,\gamma)$. This follows from the fact that, for fixed $P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}}$, $P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}}$, $P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}}$, $P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}}$, both $f_{1}$ and $f_{2}$ are affine in $(\rho,\gamma)$, so that each inner minimum is a pointwise minimum of affine functions, and hence concave. The only potential obstruction comes from the $\max$'s appearing inside these expressions, but it can be checked that those maximizations are independent of $(\rho,\gamma)$ for fixed $(P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}}$, $P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}}$, $P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}}$, $P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}})$. Letting $\Sigma=\{(x,y):x\in[0,1],\,y\in[0,x]\}$, we can thus apply the min-max theorem of convex analysis (twice) as follows

\begin{align}
E^{*}_{R,1}&\geq\max_{(\rho,\gamma)\in\Sigma}\min_{\theta\in[0,1]}\Bigg\{R_{2}-\rho R_{1}+\theta\min_{\substack{(P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}})\\ \in{\cal S}_{1}(Q_{1},Q_{2})}}f_{1}\Big(\rho,\gamma,P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}}\Big)\nonumber\\
&\qquad+\overline{\theta}\min_{\substack{(P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}})\\ \in{\cal S}_{2}(Q_{1},Q_{2})}}f_{2}\Big(\rho,\gamma,P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}}\Big)\Bigg\}\nonumber\\
&=\min_{\theta\in[0,1]}\max_{(\rho,\gamma)\in\Sigma}\Bigg\{R_{2}-\rho R_{1}+\theta\min_{\substack{(P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}})\\ \in{\cal S}_{1}(Q_{1},Q_{2})}}f_{1}\Big(\rho,\gamma,P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}}\Big)\nonumber\\
&\qquad+\overline{\theta}\min_{\substack{(P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}})\\ \in{\cal S}_{2}(Q_{1},Q_{2})}}f_{2}\Big(\rho,\gamma,P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}}\Big)\Bigg\}\nonumber\\
&=\min_{\theta\in[0,1]}\max_{(\rho,\gamma)\in\Sigma}\min_{\substack{(P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}},\\ P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}})\\ \in{\cal S}_{1}(Q_{1},Q_{2})\times{\cal S}_{2}(Q_{1},Q_{2})}}\Big\{R_{2}-\rho R_{1}+\theta f_{1}\Big(\rho,\gamma,P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}}\Big)+\overline{\theta}f_{2}\Big(\rho,\gamma,P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}}\Big)\Big\}\nonumber\\
&=\min_{\theta\in[0,1]}\min_{\substack{(P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}},\\ P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}})\\ \in{\cal S}_{1}(Q_{1},Q_{2})\times{\cal S}_{2}(Q_{1},Q_{2})}}\max_{(\rho,\gamma)\in\Sigma}\Big\{R_{2}-\rho R_{1}+\theta f_{1}\Big(\rho,\gamma,P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}}\Big)+\overline{\theta}f_{2}\Big(\rho,\gamma,P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}}\Big)\Big\}\tag{B.3}
\end{align}

Since, as noted above, for fixed $(\theta$, $P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}}$, $P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}}$, $P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}}$, $P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}})$, both $f_{1}$ and $f_{2}$ are affine in $(\rho,\gamma)$, the inner maximization in (B.3) is attained at one of the vertices $(\rho,\gamma)\in\{(0,0),(1,0),(1,1)\}$ of $\Sigma$ (a numerical sanity check of this step is given after the next display). After simplification, we obtain

\begin{align}
E_{R,1}^{*}&\geq\min_{\theta\in[0,1]}\min_{\substack{(P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}},\\ P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}})\\ \in{\cal S}_{1}(Q_{1},Q_{2})\times{\cal S}_{2}(Q_{1},Q_{2})}}\max\Bigg\{\nonumber\\
&\theta\Big[-E\big[\log q_{1}(\hat{Y}^{(1)}_{1}|\hat{X}^{(1)}_{1},\hat{X}^{(1)}_{2})\big]-H(\hat{Y}^{(1)}_{1}|\hat{X}^{(1)}_{1})+I(\hat{X}^{(1)}_{2};\hat{X}^{(1)}_{1},\hat{Y}^{(1)}_{1})+|I(\hat{X}^{\prime(1)}_{2};\hat{Y}^{\prime(1)}_{1})-R_{2}|^{+}\Big]\nonumber\\
&\quad+\overline{\theta}\Big[-E\big[\log q_{1}(\hat{Y}^{(2)}_{1}|\hat{X}^{(2)}_{1},\hat{X}^{(2)}_{2})\big]-H(\hat{Y}^{(2)}_{1}|\hat{X}^{(2)}_{1})+I(\hat{X}^{(2)}_{2};\hat{X}^{(2)}_{1},\hat{Y}^{(2)}_{1})\Big];\nonumber\\
&-R_{1}+\theta\Big[-E\big[\log q_{1}(\hat{Y}^{(1)}_{1}|\hat{X}^{(1)}_{1},\hat{X}^{(1)}_{2})\big]-H(\hat{Y}^{(1)}_{1}|\hat{X}^{(1)}_{1})+I(\hat{X}^{\prime(1)}_{1};\hat{Y}^{\prime(1)}_{1})+I(\hat{X}^{(1)}_{2};\hat{X}^{(1)}_{1},\hat{Y}^{(1)}_{1})+|I(\hat{X}^{\prime(1)}_{2};\hat{X}^{\prime(1)}_{1},\hat{Y}^{\prime(1)}_{1})-R_{2}|^{+}\Big]\nonumber\\
&\quad+\overline{\theta}\Big[-E\big[\log q_{1}(\hat{Y}^{(2)}_{1}|\hat{X}^{(2)}_{1},\hat{X}^{(2)}_{2})\big]-H(\hat{Y}^{(2)}_{1}|\hat{X}^{(2)}_{1})+I(\hat{X}^{\prime(2)}_{1};\hat{X}^{\prime(2)}_{2},\hat{Y}^{\prime(2)}_{1})+I(\hat{X}^{(2)}_{2};\hat{X}^{(2)}_{1},\hat{Y}^{(2)}_{1})\Big];\nonumber\\
&-R_{1}+\theta\Big[-E\big[\log q_{1}(\hat{Y}^{\prime(1)}_{1}|\hat{X}^{\prime(1)}_{1},\hat{X}^{\prime(1)}_{2})\big]-H(\hat{Y}^{(1)}_{1}|\hat{X}^{(1)}_{1})+I(\hat{X}^{\prime(1)}_{1};\hat{Y}^{\prime(1)}_{1})+I(\hat{X}^{\prime(1)}_{2};\hat{X}^{\prime(1)}_{1},\hat{Y}^{\prime(1)}_{1})+|I(\hat{X}^{(1)}_{2};\hat{X}^{(1)}_{1},\hat{Y}^{(1)}_{1})-R_{2}|^{+}\Big]\nonumber\\
&\quad+\overline{\theta}\Big[-E\big[\log q_{1}(\hat{Y}^{\prime(2)}_{1}|\hat{X}^{\prime(2)}_{1},\hat{X}^{\prime(2)}_{2})\big]-H(\hat{Y}^{(2)}_{1}|\hat{X}^{(2)}_{1})+I(\hat{X}^{\prime(2)}_{1};\hat{X}^{\prime(2)}_{2},\hat{Y}^{\prime(2)}_{1})+I(\hat{X}^{(2)}_{2};\hat{X}^{(2)}_{1},\hat{Y}^{(2)}_{1})\Big]\Bigg\}\nonumber
\end{align}
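As promised, here is a quick numerical sanity check of the vertex-attainment step, written in Python. It is our own illustration, not part of the derivation: the randomly drawn coefficients $a$, $b$, $c$ are hypothetical stand-ins for the type-dependent affine dependence of the bracketed objective on $(\rho,\gamma)$.

  import random

  random.seed(0)

  VERTICES = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # vertices of Sigma

  def affine(a, b, c, rho, gamma):
      # Generic affine function of (rho, gamma); the coefficients a, b, c are
      # hypothetical stand-ins for the type-dependent terms of the bound.
      return a * rho + b * gamma + c

  for _ in range(5):
      a, b, c = (random.uniform(-2.0, 2.0) for _ in range(3))
      # Brute-force maximum over a fine grid of Sigma = {0 <= gamma <= rho <= 1};
      # the grid contains all three vertices, so equality should be exact.
      grid_max = max(affine(a, b, c, r / 200.0, g / 200.0)
                     for r in range(201) for g in range(r + 1))
      vertex_max = max(affine(a, b, c, r, g) for r, g in VERTICES)
      assert abs(grid_max - vertex_max) < 1e-12
  print("maximum over Sigma attained at a vertex in all trials")

Since an affine function on a polytope attains its maximum at an extreme point, the grid maximum and the vertex maximum agree to machine precision.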

Next, we note the identities

\begin{align}
I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})&=I(\hat{X}_{1};\hat{X}_{2})+H(\hat{Y}_{1}|\hat{X}_{1})-H(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})\nonumber\\
I(\hat{X}_{1};\hat{X}_{2},\hat{Y}_{1})&=I(\hat{X}_{1};\hat{X}_{2})+H(\hat{Y}_{1}|\hat{X}_{2})-H(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})\nonumber\\
D(P_{\hat{Y}_{1}|\hat{X}_{1}\hat{X}_{2}}\|q_{1}|P_{\hat{X}_{1}\hat{X}_{2}})&=-H(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})-\mbox{\boldmath$E$}_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}\left[\log q_{1}(\hat{Y}_{1}|\hat{X}_{1},\hat{X}_{2})\right]\nonumber
\end{align}

and use them, with the shorthand $D^{(m)}=D(P_{\hat{Y}^{(m)}_{1}|\hat{X}^{(m)}_{1}\hat{X}^{(m)}_{2}}\|q_{1}|P_{\hat{X}^{(m)}_{1}\hat{X}^{(m)}_{2}})$ and $D^{\prime(m)}=D(P_{\hat{Y}^{\prime(m)}_{1}|\hat{X}^{\prime(m)}_{1}\hat{X}^{\prime(m)}_{2}}\|q_{1}|P_{\hat{X}^{\prime(m)}_{1}\hat{X}^{\prime(m)}_{2}})$ for $m\in\{1,2\}$, to rewrite the bound as

\begin{align}
E_{R,1}^{*}&\geq\min_{\theta\in[0,1]}\min_{\substack{(P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{\prime(1)}_{1}\hat{X}^{\prime(1)}_{2}\hat{Y}^{\prime(1)}_{1}},\\ P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}},P_{\hat{X}^{\prime(2)}_{1}\hat{X}^{\prime(2)}_{2}\hat{Y}^{\prime(2)}_{1}})\\ \in{\cal S}_{1}(Q_{1},Q_{2})\times{\cal S}_{2}(Q_{1},Q_{2})}}\max\Bigg\{\nonumber\\
&\theta\Big[D^{(1)}+I(\hat{X}^{(1)}_{1};\hat{X}^{(1)}_{2})+|I(\hat{X}^{\prime(1)}_{2};\hat{Y}^{\prime(1)}_{1})-R_{2}|^{+}\Big]+\overline{\theta}\Big[D^{(2)}+I(\hat{X}^{(2)}_{1};\hat{X}^{(2)}_{2})\Big];\nonumber\\
&-R_{1}+\theta\Big[D^{(1)}+I(\hat{X}^{(1)}_{1};\hat{X}^{(1)}_{2})+I(\hat{X}^{\prime(1)}_{1};\hat{Y}^{\prime(1)}_{1})+|I(\hat{X}^{\prime(1)}_{2};\hat{X}^{\prime(1)}_{1},\hat{Y}^{\prime(1)}_{1})-R_{2}|^{+}\Big]+\overline{\theta}\Big[D^{(2)}+I(\hat{X}^{(2)}_{1};\hat{X}^{(2)}_{2})+I(\hat{X}^{\prime(2)}_{1};\hat{X}^{\prime(2)}_{2},\hat{Y}^{\prime(2)}_{1})\Big];\nonumber\\
&-R_{1}+\theta\Big[D^{\prime(1)}+I(\hat{X}^{\prime(1)}_{1};\hat{X}^{\prime(1)}_{2})+I(\hat{X}^{(1)}_{1};\hat{Y}^{(1)}_{1})+|I(\hat{X}^{(1)}_{2};\hat{X}^{(1)}_{1},\hat{Y}^{(1)}_{1})-R_{2}|^{+}\Big]+\overline{\theta}\Big[D^{\prime(2)}+I(\hat{X}^{\prime(2)}_{1};\hat{X}^{\prime(2)}_{2})+I(\hat{X}^{(2)}_{1};\hat{X}^{(2)}_{2},\hat{Y}^{(2)}_{1})\Big]\Bigg\}\tag{B.4}
\end{align}

where, in simplifying the third expression in the maximization, we have also exploited the constraints $H(\hat{Y}_{1}^{(1)})=H(\hat{Y}_{1}^{\prime(1)})$ and $H(\hat{Y}_{1}^{(2)}|\hat{X}_{2}^{(2)})=H(\hat{Y}_{1}^{\prime(2)}|\hat{X}_{2}^{\prime(2)})$.
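Because sign errors are easy to make in manipulations of this kind, the identities above lend themselves to a mechanical check. The following Python sketch is our own verification aid, with a randomly drawn joint distribution and channel standing in for $P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}$ and $q_{1}$; the alphabet sizes are arbitrary.

  import numpy as np

  rng = np.random.default_rng(0)
  A1, A2, B = 2, 3, 2  # hypothetical alphabet sizes for X1, X2, Y1

  # Random joint distribution P(x1, x2, y1) and random channel q1(y1 | x1, x2).
  P = rng.random((A1, A2, B)); P /= P.sum()
  q1 = rng.random((A1, A2, B)); q1 /= q1.sum(axis=2, keepdims=True)

  def H(p):
      # Shannon entropy (in nats) of a probability array of any shape.
      p = p[p > 0]
      return -(p * np.log(p)).sum()

  H_x1, H_x2 = H(P.sum(axis=(1, 2))), H(P.sum(axis=(0, 2)))
  H_y_g_x1x2 = H(P) - H(P.sum(axis=2))        # H(Y1 | X1, X2)
  H_y_g_x1 = H(P.sum(axis=1)) - H_x1          # H(Y1 | X1)
  H_y_g_x2 = H(P.sum(axis=0)) - H_x2          # H(Y1 | X2)
  I_x1_x2 = H_x1 + H_x2 - H(P.sum(axis=2))    # I(X1; X2)
  I_x2_x1y = H_x2 + H(P.sum(axis=1)) - H(P)   # I(X2; X1, Y1)
  I_x1_x2y = H_x1 + H(P.sum(axis=0)) - H(P)   # I(X1; X2, Y1)

  assert np.isclose(I_x2_x1y, I_x1_x2 + H_y_g_x1 - H_y_g_x1x2)
  assert np.isclose(I_x1_x2y, I_x1_x2 + H_y_g_x2 - H_y_g_x1x2)

  # Conditional divergence D(P_{Y1|X1,X2} || q1 | P_{X1,X2}).
  V = P / P.sum(axis=2, keepdims=True)        # P(y1 | x1, x2)
  D = (P * (np.log(V) - np.log(q1))).sum()
  assert np.isclose(D, -H_y_g_x1x2 - (P * np.log(q1)).sum())
  print("all three identities verified")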

For $R_{1}=0$ we can further simplify this expression. In particular, for $R_{1}=0$, the first term in the inner maximization is readily seen to be no larger than the second term. Additionally, the second and third terms are symmetric in the primed and non-primed joint distributions, which, together with the readily established joint convexity of the maximum of these two terms on the constraint set, implies that the inner minimization over the joint types is achieved when the primed and non-primed joint distributions are equal, in which case the two terms coincide. Therefore, at $R_{1}=0$, we have

\begin{align}
E_{R,1}^{*}&\geq\min_{\theta\in[0,1]}\min_{\substack{(P_{\hat{X}^{(1)}_{1}\hat{X}^{(1)}_{2}\hat{Y}^{(1)}_{1}},P_{\hat{X}^{(2)}_{1}\hat{X}^{(2)}_{2}\hat{Y}^{(2)}_{1}}):\\ P_{\hat{X}^{(1)}_{1}}=P_{\hat{X}^{(2)}_{1}}=Q_{1},\ P_{\hat{X}^{(1)}_{2}}=P_{\hat{X}^{(2)}_{2}}=Q_{2}}}\theta\Big[D^{(1)}+I(\hat{X}^{(1)}_{1};\hat{X}^{(1)}_{2})+I(\hat{X}^{(1)}_{1};\hat{Y}^{(1)}_{1})+|I(\hat{X}^{(1)}_{2};\hat{X}^{(1)}_{1},\hat{Y}^{(1)}_{1})-R_{2}|^{+}\Big]\nonumber\\
&\qquad+\overline{\theta}\Big[D^{(2)}+I(\hat{X}^{(2)}_{1};\hat{X}^{(2)}_{2})+I(\hat{X}^{(2)}_{1};\hat{X}^{(2)}_{2},\hat{Y}^{(2)}_{1})\Big]\tag{B.5}
\end{align}

or, since the minimand is affine in $\theta$ and hence minimized over $\theta\in[0,1]$ at an endpoint $\theta\in\{0,1\}$,

\begin{align}
E_{R,1}^{*}&\geq\min\Bigg\{\min_{\substack{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}:\\ P_{\hat{X}_{1}}=Q_{1},\,P_{\hat{X}_{2}}=Q_{2}}}\Big[D+I(\hat{X}_{1};\hat{X}_{2})+I(\hat{X}_{1};\hat{Y}_{1})+|I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}|^{+}\Big];\nonumber\\
&\qquad\min_{\substack{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}:\\ P_{\hat{X}_{1}}=Q_{1},\,P_{\hat{X}_{2}}=Q_{2}}}\Big[D+I(\hat{X}_{1};\hat{X}_{2})+I(\hat{X}_{1};\hat{X}_{2},\hat{Y}_{1})\Big]\Bigg\}\tag{B.6}
\end{align}

where $D=D(P_{\hat{Y}_{1}|\hat{X}_{1}\hat{X}_{2}}\|q_{1}|P_{\hat{X}_{1}\hat{X}_{2}})$.
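To give a feel for how the right-hand side of (B.6) can be evaluated, the following Python sketch estimates it by random search. It is a toy illustration only, under assumed binary alphabets, a randomly drawn channel $q_{1}$, and arbitrary hypothetical values of $Q_{1}$, $Q_{2}$, and $R_{2}$.

  import numpy as np

  rng = np.random.default_rng(1)

  def H(p):
      p = p[p > 0]
      return -(p * np.log(p)).sum()

  # Hypothetical setup: binary alphabets, a random channel q1, and arbitrary
  # illustrative values for the marginals Q1, Q2 and the rate R2.
  q1m, q2m, R2 = 0.5, 0.4, 0.3
  Q1, Q2 = np.array([1 - q1m, q1m]), np.array([1 - q2m, q2m])
  q1 = rng.random((2, 2, 2)); q1 /= q1.sum(axis=2, keepdims=True)

  best1 = best2 = np.inf
  lo, hi = max(0.0, q1m + q2m - 1.0), min(q1m, q2m)
  for _ in range(20000):
      # Couple X1 and X2 with the required marginals: for binary alphabets the
      # coupling has a single free parameter t = P(X1=1, X2=1).
      t = rng.uniform(lo, hi)
      P12 = np.array([[1 - q1m - q2m + t, q2m - t], [q1m - t, t]])
      V = rng.random((2, 2, 2)); V /= V.sum(axis=2, keepdims=True)  # P(y|x1,x2)
      P = P12[:, :, None] * V
      D = (P * (np.log(V) - np.log(q1))).sum()
      I12 = H(Q1) + H(Q2) - H(P12)
      I_x1_y = H(Q1) + H(P.sum(axis=(0, 1))) - H(P.sum(axis=1))
      I_x2_x1y = H(Q2) + H(P.sum(axis=1)) - H(P)
      I_x1_x2y = H(Q1) + H(P.sum(axis=0)) - H(P)
      best1 = min(best1, D + I12 + I_x1_y + max(I_x2_x1y - R2, 0.0))
      best2 = min(best2, D + I12 + I_x1_x2y)
  print("estimated lower bound on E_R,1* at R1 = 0:", min(best1, best2))

Since the minimizations are only sampled, the printed value over-estimates the true right-hand side of (B.6); a grid or descent method over the coupling parameter $t$ and the conditional $P_{\hat{Y}_{1}|\hat{X}_{1}\hat{X}_{2}}$ would tighten it.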

Simplifying $E_{B,1}$ at $R_{1}=0$ gives

\begin{align}
E_{B,1}&=\max\Bigg\{\min_{\substack{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}:\\ P_{\hat{X}_{1}}=Q_{1},\,P_{\hat{X}_{2}}=Q_{2}}}\Big[D+I(\hat{X}_{1};\hat{X}_{2})+I(\hat{X}_{1};\hat{Y}_{1})\Big];\nonumber\\
&\qquad\min\Bigg\{\min_{\substack{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}:\\ P_{\hat{X}_{1}}=Q_{1},\,P_{\hat{X}_{2}}=Q_{2}}}\Big[D+I(\hat{X}_{1};\hat{X}_{2})+|I(\hat{X}_{1};\hat{Y}_{1})+I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}|^{+}\Big];\nonumber\\
&\qquad\quad\min_{\substack{P_{\hat{X}_{1}\hat{X}_{2}\hat{Y}_{1}}:\\ P_{\hat{X}_{1}}=Q_{1},\,P_{\hat{X}_{2}}=Q_{2}}}\Big[D+I(\hat{X}_{1};\hat{X}_{2})+I(\hat{X}_{1};\hat{X}_{2},\hat{Y}_{1})\Big]\Bigg\}\Bigg\}\tag{B.7}
\end{align}

which is seen to be no bigger than the above lower bound on $E^{*}_{R,1}$, since $|I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}|^{+}\geq 0$, $I(\hat{X}_{1};\hat{X}_{2},\hat{Y}_{1})\geq I(\hat{X}_{1};\hat{Y}_{1})$, and $I(\hat{X}_{1};\hat{Y}_{1})+|I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}|^{+}\geq|I(\hat{X}_{1};\hat{Y}_{1})+I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}|^{+}$.
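The first two inequalities are immediate, and the third follows from the fact that $a+|b|^{+}\geq|a+b|^{+}$ whenever $a\geq 0$. They can also be confirmed mechanically; the following Python sketch (our own check, over randomly drawn joint distributions of $(\hat{X}_{1},\hat{X}_{2},\hat{Y}_{1})$ and random values of $R_{2}$) does so.

  import numpy as np

  rng = np.random.default_rng(2)

  def H(p):
      p = p[p > 0]
      return -(p * np.log(p)).sum()

  def pos(x):  # |x|^+ = max(x, 0)
      return max(x, 0.0)

  for _ in range(1000):
      # Random joint distribution of (X1, X2, Y1) and a random rate R2.
      P = rng.random((2, 3, 2)); P /= P.sum()
      R2 = rng.uniform(0.0, 2.0)
      I_x1_y = H(P.sum(axis=(1, 2))) + H(P.sum(axis=(0, 1))) - H(P.sum(axis=1))
      I_x1_x2y = H(P.sum(axis=(1, 2))) + H(P.sum(axis=0)) - H(P)
      I_x2_x1y = H(P.sum(axis=(0, 2))) + H(P.sum(axis=1)) - H(P)
      assert pos(I_x2_x1y - R2) >= 0.0
      assert I_x1_x2y >= I_x1_y - 1e-12
      assert I_x1_y + pos(I_x2_x1y - R2) >= pos(I_x1_y + I_x2_x1y - R2) - 1e-12
  print("all three inequalities hold on 1000 random joint distributions")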

Another application of the lower bound (B.4) is in determining the set of rate pairs $(R_{1},R_{2})$ for which $E_{R,1}^{*}>0$. Let $(\hat{X}_{1},\hat{X}_{2})$ be independent with marginal distributions $Q_{1}$ and $Q_{2}$, and let $\hat{Y}_{1}$ be the result of passing $(\hat{X}_{1},\hat{X}_{2})$ through the channel $q_{1}$. We shall argue that if $R_{1}<I(\hat{X}_{1};\hat{Y}_{1})+|I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}|^{+}=I(\hat{X}_{1};\hat{Y}_{1})+|I(\hat{X}_{2};\hat{Y}_{1}|\hat{X}_{1})-R_{2}|^{+}$ and $R_{1}<I(\hat{X}_{1};\hat{X}_{2},\hat{Y}_{1})=I(\hat{X}_{1};\hat{Y}_{1}|\hat{X}_{2})$, then the expression (B.4) must be strictly positive. Indeed, for the expression (B.4) to equal 0, we see from the first term in the inner maximum that the minimizing $\theta$ and joint distributions must satisfy one of the following: case 1: $\theta=1$, $D^{(1)}=0$, and $I(\hat{X}^{(1)}_{1};\hat{X}^{(1)}_{2})=0$; case 2: $\theta=0$, $D^{(2)}=0$, and $I(\hat{X}^{(2)}_{1};\hat{X}^{(2)}_{2})=0$; or case 3: $0<\theta<1$, $D^{(1)}=D^{(2)}=0$, and $I(\hat{X}^{(1)}_{1};\hat{X}^{(1)}_{2})=I(\hat{X}^{(2)}_{1};\hat{X}^{(2)}_{2})=0$. If case 1 holds, then $(\hat{X}_{1}^{(1)},\hat{X}_{2}^{(1)},\hat{Y}_{1}^{(1)})$ necessarily has the same joint distribution as $(\hat{X}_{1},\hat{X}_{2},\hat{Y}_{1})$, in which case we see from the third term in the maximum in (B.4) that $R_{1}\geq I(\hat{X}_{1};\hat{Y}_{1})+|I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}|^{+}$. Similarly, if case 2 holds, then $(\hat{X}_{1}^{(2)},\hat{X}_{2}^{(2)},\hat{Y}_{1}^{(2)})$ has the same joint distribution as $(\hat{X}_{1},\hat{X}_{2},\hat{Y}_{1})$, in which case it follows, again from the third term in the maximum, that $R_{1}\geq I(\hat{X}_{1};\hat{X}_{2},\hat{Y}_{1})$. Finally, if case 3 holds, then both $(\hat{X}_{1}^{(1)},\hat{X}_{2}^{(1)},\hat{Y}_{1}^{(1)})$ and $(\hat{X}_{1}^{(2)},\hat{X}_{2}^{(2)},\hat{Y}_{1}^{(2)})$ have the same distribution as $(\hat{X}_{1},\hat{X}_{2},\hat{Y}_{1})$, in which case, after writing $R_{1}=\theta R_{1}+\overline{\theta}R_{1}$, we see again that either $R_{1}\geq I(\hat{X}_{1};\hat{Y}_{1})+|I(\hat{X}_{2};\hat{X}_{1},\hat{Y}_{1})-R_{2}|^{+}$ or $R_{1}\geq I(\hat{X}_{1};\hat{X}_{2},\hat{Y}_{1})$ must hold. Thus, the three cases together establish the claim that if $R_{1}<I(\hat{X}_{1};\hat{Y}_{1})+|I(\hat{X}_{2};\hat{Y}_{1}|\hat{X}_{1})-R_{2}|^{+}$ and $R_{1}<I(\hat{X}_{1};\hat{Y}_{1}|\hat{X}_{2})$, then the expression (B.4), and hence $E^{*}_{R,1}$, must be strictly positive. It can be checked that this region is equivalent to

$$\{R_{1}<I(\hat{X}_{1};\hat{Y}_{1})\}\cup\Big\{\{R_{1}+R_{2}<I(\hat{Y}_{1};\hat{X}_{1},\hat{X}_{2})\}\cap\{R_{1}<I(\hat{X}_{1};\hat{Y}_{1}|\hat{X}_{2})\}\Big\}$$

which is represented in Fig. 1 in Section IV. It is shown in [11] that, for the ensemble of constant composition codes comprised of i.i.d. codewords uniformly distributed over the type classes corresponding to $Q_{1}$ and $Q_{2}$, the exponential decay rate of the average probability of error for user 1 is necessarily zero for rate pairs outside of this region, even under optimum, maximum likelihood decoding.
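For a concrete feel for this region, the following Python sketch computes the three mutual informations that delimit it and tests membership of a rate pair. The channel is our own toy example, a binary modulo-additive interference channel $\hat{Y}_{1}=\hat{X}_{1}\oplus\hat{X}_{2}\oplus Z$ with hypothetical crossover probability $\epsilon=0.1$ and independent uniform inputs; it is not a channel analyzed in this paper.

  import numpy as np

  def H(p):
      p = p[p > 0]
      return -(p * np.log2(p)).sum()

  # Toy channel (our assumption): Y1 = X1 xor X2 xor Z, Z ~ Bernoulli(eps),
  # with X1 ~ Q1 and X2 ~ Q2 independent and uniform on {0, 1}.
  eps = 0.1
  Q1 = Q2 = np.array([0.5, 0.5])
  P = np.zeros((2, 2, 2))
  for x1 in range(2):
      for x2 in range(2):
          for y in range(2):
              z = x1 ^ x2 ^ y
              P[x1, x2, y] = Q1[x1] * Q2[x2] * (eps if z else 1.0 - eps)

  I_x1_y = H(P.sum(axis=(1, 2))) + H(P.sum(axis=(0, 1))) - H(P.sum(axis=1))
  I_y_x1x2 = H(P.sum(axis=(0, 1))) + H(P.sum(axis=2)) - H(P)
  I_x2_y = H(P.sum(axis=(0, 1))) + H(P.sum(axis=(0, 2))) - H(P.sum(axis=0))
  I_x1_y_g_x2 = I_y_x1x2 - I_x2_y  # chain rule: I(X1;Y1|X2)

  def exponent_positive(R1, R2):
      # Membership test for the region displayed above.
      return R1 < I_x1_y or (R1 + R2 < I_y_x1x2 and R1 < I_x1_y_g_x2)

  print(f"I(X1;Y1) = {I_x1_y:.4f} bits")        # 0 here: with uniform X2,
                                                # Y1 is independent of X1
  print(f"I(X1;Y1|X2) = {I_x1_y_g_x2:.4f} bits")
  print(f"I(Y1;X1,X2) = {I_y_x1x2:.4f} bits")
  print("(R1, R2) = (0.3, 0.2) in region:", exponent_positive(0.3, 0.2))

In this toy example $I(\hat{X}_{1};\hat{Y}_{1})=0$, so for any $R_{1}>0$ positivity of the exponent rests entirely on the second clause of the union.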

References

  • [1] T. S. Han and K. Kobayashi, “A new achievable rate region for the interference channel,” IEEE Trans. Inform. Theory, vol. IT-27, pp. 49–60, January 1981.
  • [2] A. B. Carleial, “A case where interference does not reduce capacity,” (Corresp.), IEEE Trans. Inform. Theory, vol. IT-21, pp. 569–570, September 1975.
  • [3] R. Etkin, D. Tse, and H. Wang, “Gaussian interference channel capacity to within one bit,” submitted to IEEE Trans. Inform. Theory, Feb. 2007. Also available on-line at: http://arxiv.org/PS_cache/cs/pdf/0702/0702045v2.pdf
  • [4] L. Weng, S. S. Pradhan, and A. Anastasopoulos, “Error exponent regions for Gaussian broadcast and multiple-access channels,” IEEE Trans. Inform. Theory, vol. IT-54, pp. 2919–2942, July 2008.
  • [5] R. G. Gallager, Information Theory and Reliable Communication, John Wiley & Sons, New York, 1968.
  • [6] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Akadémiai Kiadó, Budapest, 1981.
  • [7] N. Merhav, “Relations between random coding exponents and the statistical physics of random codes,” accepted to IEEE Trans. Inform. Theory, Sep. 2008. Also available on-line at: http://www.ee.technion.ac.il/people/merhav/papers/p117.pdf
  • [8] N. Merhav, “Error exponents of erasure/list decoding revisited via moments of distance enumerators,” IEEE Trans. Inform. Theory, vol. 54, no. 10, pp. 4439–4447, Oct. 2008.
  • [9] R. Etkin, N. Merhav, and E. Ordentlich, “Error exponents of optimum decoding for the interference channel,” Proc. IEEE International Symposium on Information Theory, Toronto, Canada, pp. 1523–1527, 6–11 July 2008.
  • [10] R. Etkin and E. Ordentlich, “Discrete memoryless interference channel: new outer bound,” Proc. IEEE International Symposium on Information Theory, Nice, France, pp. 2851–2855, 24–29 June 2007.
  • [11] C. Chang, HP Labs Technical Report, 2008.
  • [12] J. Pokorny and H. Wallmeier, “Random coding bound and codes produced by permutations for the multiple-access channel,” IEEE Trans. Inform. Theory, vol. 31, no. 6, pp. 741–750, Nov. 1985.
  • [13] Y.-S. Liu and B. L. Hughes, “A new universal random coding bound for the multiple-access channel,” IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 376–386, Mar. 1996.