Error Exponents of Optimum Decoding for the Interference Channel
Raul H. Etkin†,
Neri Merhav‡,
and Erik Ordentlich†,
raul.etkin@hp.com, merhav@ee.technion.ac.il, erik.ordentlich@hp.com
†R. Etkin and E. Ordentlich are with Hewlett-Packard Laboratories, Palo Alto, CA 94304, USA.‡N. Merhav is with Technion - Israel Institute of Technology, Haifa 32000, Israel. Part of this work was done while N. Merhav was visiting Hewlett–Packard Laboratories
in the Summers of 2007 and 2008.
Abstract
Exponential error bounds for the finite–alphabet interference channel (IFC) with two transmitter–receiver pairs are investigated under the random coding regime. Our focus is on optimum decoding, as opposed to
heuristic decoding rules that have been used in previous works, like joint typicality
decoding, decoding based on interference cancellation, and decoding that considers the
interference as additional noise. Indeed, the fact that the actual
interfering signal is a codeword and not an i.i.d. noise process complicates the application of conventional techniques to the performance analysis of the optimum decoder.
Using analytical tools rooted in
statistical physics, we derive a single letter expression
for error exponents achievable under optimum decoding
and demonstrate strict improvement over error exponents
obtainable using suboptimal decoding rules, but which are amenable
to more conventional analysis.
Index Terms:
Error exponent region, large deviations, method of types, statistical physics.
I Introduction
The interference channel (IFC) models the communication
between multiple transmitter-receiver pairs, wherein each receiver
must
decode its corresponding transmitter’s message from a signal that is corrupted by interference from the
other transmitters, in addition to channel noise. The information
theoretic analysis of the IFC was initiated over 30 years ago
and has recently witnessed a resurgence of interest, motivated by new
potential applications, such as wireless communication over
unregulated spectrum.
Previous work on the IFC has focused on obtaining inner and outer
bounds on the capacity region for memoryless interference and noise,
with a precise characterization of the capacity region remaining
elusive for most channels, even in the two-user case. The best known
inner bound for the IFC is the Han-Kobayashi (HK) region, established
in [1]. It has been found to be tight in certain special cases
([1, 2]), and was recently found to be tight to within 1 bit for the
two-user Gaussian IFC [3]. No achievable rates that lie outside the HK
region are known for any IFC.
Our aim in this paper is to extend the study of achievable schemes to
the analysis of error exponents, or exponential rates of decay of
error probabilities, that are attainable as a function of user rates.
To our knowledge, there has been no prior treatment of error exponents
for the IFC. In particular, the error bounds underlying the achievability results
in [1] yield vanishing error exponents (though still decaying
error probability) at all rates.
The notion of an error exponent region, or a set of achievable
exponential rates of decay in the error probabilities for different
users at a given operating rate-tuple in a multi-user communication
network, was formalized recently in [4], and studied therein
for Gaussian multiple access and broadcast channels.
Our main result, presented in Section IV, is a single letter
characterization of an achievable error exponent region, as a function
of user rates, for the two-user, finite-alphabet, memoryless
interference channel.
The region is derived by bounding the average
error probability of random codebooks comprised of i.i.d. codewords
uniformly distributed over a type class, under maximum likelihood (ML)
decoding at each user. Unlike the single user setting, in this case,
the effective channel determining each receiver’s ML
decoding rule is induced both by the noise and the interfering
user’s codebook. Our focus on optimal decoding is a departure from
the conventional achievability arguments in [1] and elsewhere,
which are based on joint-typicality decoding, with restrictions on the
decoder to “treat interference as noise” or to “decode the
interference” in part or in whole. However, in this work, we confine our analysis to
codebook ensembles that are simpler than the superposition codebooks of [1].
The analysis of the probability of decoding error under optimal decoding
is complicated due to correlations induced by the interfering signal.
Usual methods for bounding the probability of error based on
Jensen’s inequality and
other related inequalities (see, e.g., (8) below)
fail to give good results.
Our bounding approach combines some of the
classical information theoretic approaches of [5]
and [6] with an analytical technique from statistical
physics that was applied recently to the study of single user channels
in [7, 8]. More specifically,
as in [5], we use two auxiliary parameters
to get an upper bound on the average probability of decoding
error under ML decoding, which we then bound using the method of types
[6]. Key in our derivation is the use of distance enumerators
in the spirit of [7] and [8],
which allows us to avoid using Jensen’s
inequality in some steps, and allows us to maintain exponential
tightness in other inequalities by applying them to only
polynomially few terms (as opposed to exponentially many)
in certain sums that bound the probability of decoding error.
It should be emphasized, in this context, that the use of this technique was pivotal to
our results. Our earlier attempts, which were based on more ‘traditional’
tools, failed to provide meaningful results. In fact, they all turned
out to be inferior to some trivial bounds.
The paper is organized as follows. The notation, various definitions, and the
channel model assumed throughout the paper are detailed in
Section II. In Section III,
we derive an “easy”
set of attainable error exponents which
we shall treat as a benchmark for the exponents of the main section,
Section IV. The “easy” exponents are obtained by simple
extensions of existing error exponent results for single-user and
multiple access channels to the interference channel, based on random constant
composition codebooks and suboptimal decoders.
Then, in Section IV, we derive another set of attainable
exponents by analyzing ML decoding for the
channel induced by the interfering codebook. In Section V,
we show that the minimizations required to evaluate the new error exponents can be
written as convex optimization problems, and, as a result, can be solved efficiently.
We follow this up in
Section VI with a numerical comparison
of the new exponents with the baseline exponents of Section III for a simple
IFC. These numerical results demonstrate that the new exponents are never worse (at least for the chosen channel and parameters) and, for most rates, strictly improve over the baseline exponents.
An earlier version of this work was presented in [9].
II Notation, Definitions, and Channel Model
Unless otherwise stated, we use lowercase and uppercase letters for scalars, boldface
lowercase letters for vectors, uppercase (boldface) letters for random variables (vectors),
and calligraphic letters for sets. For example, $x$ is a scalar, $\boldsymbol{x}$ is a vector,
$X$ is a random variable, $\boldsymbol{X}$ is a random vector, and $\mathcal{X}$ is a set. For a real number we shall, on occasion, let denote .
Also, we use $\log$ to denote the natural logarithm, $E$ to denote
expectation, and $\Pr$ to denote probability.
For independent random variables and
distributed according to ,
, we denote the conditional expectation operator
as for any function . All information quantities (entropy, mutual information, etc.) and rates are in nats.
Finally, we use $\doteq$, $\stackrel{\cdot}{\leq}$, etc., to denote equality or inequality to the first order in the exponent; e.g., $a_n \doteq b_n$ means $\lim_{n\to\infty}\frac{1}{n}\log\frac{a_n}{b_n}=0$.
The empirical probability mass function of a finite-alphabet sequence
$\boldsymbol{x}=(x_1,\ldots,x_n)$ with alphabet $\mathcal{X}$
is denoted by the vector $\hat{P}_{\boldsymbol{x}}=\{\hat{P}_{\boldsymbol{x}}(x),\ x\in\mathcal{X}\}$, where each
$\hat{P}_{\boldsymbol{x}}(x)$ is the relative frequency of the symbol $x$ along $\boldsymbol{x}$. The type class associated
with an empirical probability mass function $\hat{P}$, which will be denoted
by $T(\hat{P})$, is the set of all $n$–vectors whose empirical
probability mass function is equal to $\hat{P}$.
Similar conventions will apply to pairs and triples of vectors
of length $n$, which are defined over the corresponding
product alphabets. Information measures pertaining to empirical distributions will be denoted using the standard notational conventions, except that we add hats as well as subscripts that indicate the sequences from which these empirical distributions were extracted. For example, we write
$\hat{H}_{\boldsymbol{x}\boldsymbol{y}}(X|Y)$ and $\hat{I}_{\boldsymbol{x}\boldsymbol{y}}(X;Y)$ to denote the conditional entropy of $X$ given $Y$ and the mutual information between $X$ and $Y$, respectively, computed with respect to the empirical distribution of the pair $(\boldsymbol{x},\boldsymbol{y})$. We denote the relative entropy or Kullback-Leibler divergence between distributions $P$ and $Q$ as $D(P\|Q)$, and we write $D(V\|W|P)$ for the conditional relative entropy between conditional distributions $V$ and $W$ conditioned on $P$, which is defined as $D(V\|W|P)=\sum_{x}P(x)\sum_{y}V(y|x)\log\frac{V(y|x)}{W(y|x)}$.
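To make the empirical-distribution notation concrete, the following minimal Python sketch computes the joint type of a pair of sequences and the empirical conditional entropy and mutual information derived from it (all logarithms are natural, so the results are in nats). The function and variable names are illustrative and are not part of the paper's notation.

import numpy as np

def joint_type(x, y, nx, ny):
    """Empirical joint PMF of the pair (x, y) over alphabets of sizes nx and ny."""
    n = len(x)
    P = np.zeros((nx, ny))
    for a, b in zip(x, y):
        P[a, b] += 1.0 / n
    return P

def empirical_mutual_information(P):
    """I(X;Y) in nats, computed from a joint PMF matrix P."""
    Px = P.sum(axis=1, keepdims=True)
    Py = P.sum(axis=0, keepdims=True)
    m = P > 0
    return float(np.sum(P[m] * np.log(P[m] / (Px @ Py)[m])))

def empirical_conditional_entropy(P):
    """H(X|Y) in nats, computed as H(X,Y) - H(Y)."""
    Py = P.sum(axis=0)
    Hxy = -float(np.sum(P[P > 0] * np.log(P[P > 0])))
    Hy = -float(np.sum(Py[Py > 0] * np.log(Py[Py > 0])))
    return Hxy - Hy

# Example: joint type of two binary sequences of length 8.
x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [0, 1, 0, 0, 1, 1, 0, 1]
P = joint_type(x, y, nx=2, ny=2)
print(P, empirical_mutual_information(P), empirical_conditional_entropy(P))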
We continue with a formal description of the two–user IFC setting.
Let $\boldsymbol{x}_1 \in \mathcal{X}_1^n$ and $\boldsymbol{x}_2 \in \mathcal{X}_2^n$
denote the channel input signals of the two transmitters, and let
$\boldsymbol{y}_1 \in \mathcal{Y}_1^n$ and $\boldsymbol{y}_2 \in \mathcal{Y}_2^n$ be the corresponding
channel outputs received by decoders 1 and 2, where $\mathcal{X}_i$ and $\mathcal{Y}_i$ denote the input and output alphabets, which we assume
to be finite. Each (random) output symbol pair
is assumed to be conditionally independent of all other outputs, and
all input symbols, given the two corresponding (random) input symbols,
and the corresponding conditional probability
is assumed to be constant from symbol to symbol. An $(n, M_1, M_2)$
code for the IFC consists of two pairs of encoding and
decoding functions, $(f_1, g_1)$ and $(f_2, g_2)$, respectively,
where $f_i : \{1, \ldots, M_i\} \to \mathcal{X}_i^n$ and
$g_i : \mathcal{Y}_i^n \to \{1, \ldots, M_i\}$, $i = 1, 2$. The
performance of the code is characterized by a pair of error
probabilities $P_{e,1}$ and $P_{e,2}$, where
$P_{e,i} = \Pr\{g_i(\boldsymbol{Y}_i) \neq m_i\}$ and $\boldsymbol{Y}_i$ is the random output at receiver $i$ when
user $j$ transmits $f_j(m_j)$, $j = 1, 2$,
assuming the messages $m_1$ and $m_2$ are uniformly distributed on the
sets of indices $\{1, \ldots, M_1\}$ and $\{1, \ldots, M_2\}$, respectively. The per-user
error probabilities depend on the channel only through the
marginal conditional distributions of the channel outputs given the
corresponding channel input pairs, and it is these marginal
distributions that appear in all expressions below.
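As a small illustration of the last point, the Python sketch below (with illustrative names and an arbitrary toy channel, neither taken from the paper) represents a memoryless two-user IFC as a conditional PMF table over output pairs given input pairs and extracts the two marginal channels on which the per-user error probabilities depend.

import numpy as np

rng = np.random.default_rng(0)

# W[x1, x2, y1, y2]: probability of the output pair (y1, y2) given the input
# pair (x1, x2).  All alphabets are binary here purely for illustration.
W = rng.random((2, 2, 2, 2))
W /= W.sum(axis=(2, 3), keepdims=True)   # normalize each conditional PMF

W1 = W.sum(axis=3)   # marginal channel seen by receiver 1: W1[x1, x2, y1]
W2 = W.sum(axis=2)   # marginal channel seen by receiver 2: W2[x1, x2, y2]

# Per-user error probabilities depend on the channel only through W1 and W2.
assert np.allclose(W1.sum(axis=2), 1.0) and np.allclose(W2.sum(axis=2), 1.0)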
A pair of error exponents $(E_1, E_2)$ is attainable at a rate pair $(R_1, R_2)$ if there is a sequence of codes
of increasing block length $n$, with rates $R_i = \frac{1}{n}\log M_i$, satisfying $\liminf_{n\to\infty} -\frac{1}{n}\log P_{e,i} \ge E_i$ for $i = 1, 2$.
The set of all attainable error exponent pairs at $(R_1, R_2)$ comprises
the error exponent region at $(R_1, R_2)$. The main result of this paper is a single letter
characterization of a non–trivial subset of this region for
each rate pair $(R_1, R_2)$.
III Background
In this section, we present achievable error exponents for the interference channel which are based on known error exponent results for single-user and multiple access channels (MAC) with fixed composition codebooks [12, 13, 11]. These exponents will be used as a baseline for comparing the performance of the error exponents that we derive in Section IV.
In the following, we will focus on the error performance of user 1, and as a result, all explanations and expressions will be specialized to receiver 1. Similar expressions also hold for user 2 with the indices 1 and 2 exchanged.
A possibly suboptimal decoder for the interference channel can be obtained from a given multiple access channel decoder by simply ignoring the decoded message of the interfering transmitter. For example, following [13], we can use a minimum entropy decoder that, for a given received vector at receiver 1, jointly estimates both messages by minimizing an empirical entropy metric over pairs of codewords,
and then throws away the estimate of the interfering message.
It follows from [13] that for random codebooks of fixed composition , the average probability of decoding both messages in error, where the averaging is done over the random choice of codebooks, satisfies:
where
with .
In addition, the average probability of decoding the message of the interfering transmitter correctly but the message of the desired transmitter incorrectly satisfies:
where
Therefore, the overall average error performance of this MAC decoder in the IFC satisfies:
A second suboptimal decoder that leads to tractable error performance bounds is the single user maximum mutual information decoder (which in this case coincides with the minimum entropy decoder):
In this case, standard application of the method of types [11] leads to the following bound on the average error probability under random fixed composition codebooks of types :
where
We can choose whichever of these two decoders leads to the better error performance. Therefore, we obtain that
(1)
is an achievable error exponent at receiver 1, with an analogous exponent following for receiver 2.
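For concreteness, a minimal Python sketch of the second (single-user) suboptimal decoder discussed above is given below: receiver 1 scores each of its own codewords by the empirical mutual information with the received vector and ignores the interference entirely. The helper names are illustrative and the sketch is not tied to the paper's notation.

import numpy as np

def empirical_MI(x, y, nx, ny):
    """Empirical mutual information (nats) between equal-length sequences x and y."""
    n = len(x)
    P = np.zeros((nx, ny))
    for a, b in zip(x, y):
        P[a, b] += 1.0 / n
    Px, Py = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)
    m = P > 0
    return float(np.sum(P[m] * np.log(P[m] / (Px @ Py)[m])))

def mmi_decode(y, codebook, nx, ny):
    """Return the index of the codeword with maximum empirical mutual
    information with the received vector y (ties broken arbitrarily)."""
    scores = [empirical_MI(c, y, nx, ny) for c in codebook]
    return int(np.argmax(scores))

# Toy usage example.
y = np.array([0, 1, 1, 0, 1, 0, 0, 1])
codebook = np.array([[0, 1, 1, 0, 1, 0, 0, 1], [1, 1, 0, 0, 0, 1, 1, 0]])
print(mmi_decode(y, codebook, nx=2, ny=2))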
IV Main Result
Our main contribution is stated in the following theorem,
which presents a new error exponent region
for the discrete memoryless two–user IFC.
While the full proof appears in Appendix A, we also provide
a proof outline below, to give an idea of the main steps.
Theorem 1
For a discrete memoryless two-user IFC as defined in Section II and a family of random fixed composition block codes of rates $(R_1, R_2)$, a decoding error probability for user 1 satisfying
(2)
can be achieved as the block length
of the codes goes to infinity, where the
error exponent is given by
(3)
where
(4)
(5)
with
and
(6)
(7)
where the relevant domain is the probability simplex of the appropriate dimension. In the bound (2), the auxiliary parameters can be chosen to maximize the error exponent.
In eqs. (2), (3), (6), and (7), the codeword compositions are probability distributions defined over the input alphabets of users 1 and 2, respectively.
Expressions for the error probability and error exponent equivalent to (2) and (3) can be stated for the receiver of user 2 by exchanging the roles of the two users in all the expressions. By varying the two codeword compositions over all probability distributions on the respective input alphabets, we obtain an achievable error exponent region for the fixed rates $(R_1, R_2)$.
Remark 1
A lower bound to is
derived in Appendix B (cf. equation (B.4)) that is closer in form to the
expressions underlying the benchmark exponent presented
above. In particular, this lower bound allows us to establish
analytically (see Appendix B) that at (and for
sufficiently small ). Numerical computations, as presented in Section VI,
indicate that this inequality can be strict.
A second application of the lower bound (B.4) is to determine the set
of rate pairs for which . We show in Appendix B
that this region includes
with an analogous region following for the set where (see Fig. 1).
Figure 1: Rate region where .
Furthermore, it is shown in [11] that the error exponent
achievable for user no. 1 with optimal decoding and random fixed composition codebooks
is zero outside the closure of this region. This result, together with our contribution,
characterizes the rate region where the attainable exponents with random constant composition codebooks are positive. Finally, it can be shown that this region is contained in the HK region [11].
Remark 2
Theorem 1 presents an asymptotic upper bound on the average probability of decoding error for fixed composition codebooks, where the averaging is done over the random choice of codebooks. It is straightforward to show (see, e.g., [4]) that there exists a specific (i.e. non-random) sequence of fixed composition codebooks of increasing block length for which the same asymptotic error performance can be achieved.
Proof Outline.
For non–negative reals $a_1, \ldots, a_M$ and $0 \le \theta \le 1$,
the following inequality [5, Problem 4.15(f)] will be frequently used:
$\left(\sum_{i=1}^{M} a_i\right)^{\theta} \le \sum_{i=1}^{M} a_i^{\theta}$. (8)
For a given block length $n$, we generate the codebook of user $i$ ($i = 1, 2$) by choosing its codewords independently and uniformly over all the sequences of length $n$ whose type equals the prescribed composition of user $i$. Note that these compositions must have rational entries with denominator $n$. We index the codewords of user $i$ by $m \in \{1, \ldots, M_i\}$.
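The random constant-composition ensemble just described can be sampled as in the following Python sketch: each codeword is an independent, uniformly random permutation of a fixed template whose symbol counts realize the prescribed type (which is why the type must have rational entries with denominator equal to the block length). Names and parameter values are illustrative.

import numpy as np

def random_constant_composition_codebook(counts, M, rng):
    """Draw M codewords i.i.d. uniformly over the type class with the given
    symbol counts.  counts[a] is the number of occurrences of symbol a, so the
    block length is n = sum(counts) and the type is counts / n."""
    template = np.repeat(np.arange(len(counts)), counts)   # fixed composition
    return np.array([rng.permutation(template) for _ in range(M)])

rng = np.random.default_rng(0)
# Example: block length n = 8, binary type (1/2, 1/2), M = 4 codewords.
codebook1 = random_constant_composition_codebook(counts=[4, 4], M=4, rng=rng)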
For a given channel output, the decoding rule that minimizes the probability of error in decoding the message of user 1 is ML decoding, which consists of picking the message whose codeword maximizes the likelihood of the received vector. Letting
(9)
be the “average” channel observed at receiver 1, where the averaging is done over the codewords of user 2 in , the decoding error probability at receiver 1 for transmitted codeword and codebooks and is given by:
(10)
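A sketch of this ML rule is given below: each candidate codeword of user 1 is scored by the likelihood of the received vector under the memoryless channel averaged over user 2's codebook, in the spirit of the average channel (9). Here W1[x1, x2, y1] stands for the marginal single-letter channel of receiver 1 (assumed strictly positive to avoid taking the logarithm of zero); all names are illustrative rather than the paper's notation.

import numpy as np

def ml_decode_user1(y, codebook1, codebook2, W1):
    """ML decoding of user 1's message when user 2's codeword is unknown:
    score each candidate x1 by (1/M2) * sum_j prod_t W1[x1_t, x2_{j,t}, y_t],
    i.e., the likelihood of y under the channel averaged over codebook 2."""
    log_scores = []
    for x1 in codebook1:
        # log-likelihood of y for each possible interfering codeword x2
        ll = np.array([np.sum(np.log(W1[x1, x2, y])) for x2 in codebook2])
        # average over user 2's codebook in a numerically stable way
        log_scores.append(np.logaddexp.reduce(ll) - np.log(len(codebook2)))
    return int(np.argmax(log_scores))

# Toy usage example with random binary codebooks and a random positive channel.
rng = np.random.default_rng(0)
W1 = rng.random((2, 2, 2)) + 0.1
W1 /= W1.sum(axis=2, keepdims=True)
cb1 = rng.integers(0, 2, size=(4, 16))
cb2 = rng.integers(0, 2, size=(4, 16))
y = rng.integers(0, 2, size=16)
print(ml_decode_user1(y, cb1, cb2, W1))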
With the introduction of the average channel (9) and the use of two auxiliary parameters, we can follow the approach of [5] to bound the conditional probability of decoding error. Taking the expectation over the random choice of both codebooks, we obtain an average error probability:
(11)
where we used Jensen’s inequality to move the second expectation
inside the brackets.
Equation (11) is hard to handle, mainly due to the correlation introduced between the two factors inside the outer expectation. Furthermore, the evaluation of the inner expectations is complicated by the powers affecting them. Bounding methods based on Jensen’s inequality and (8)
fail to give good results due to the loss of exponential tightness.
We proceed with a refined bounding technique based on the method of types, inspired by [7]. While in this approach we still use (8), we use it to bound sums with a number of terms that grows only polynomially with the block length, and as a result, exponential tightness is preserved.
Since the channel is memoryless,
(12)
where we used an enumerator to denote the number of codewords in the codebook of user 2 that, together with the sequences involved, have a given joint empirical distribution. We also used a subscripted expectation operator to denote expectation with respect to that distribution.
Substituting (12) into (11) and using (8)
three times, we obtain:
(13)
where we introduced shorthand notation to shorten the expression.
We next consider the bounding of
(14)
and note that and are formed by sums of an exponentially large number of indicator functions, each of which takes value 1 with exponentially small probability. These sums concentrate around their means, which show different behavior depending on how the number of terms in the sum () compares to the probability of each of the indicator functions taking value 1 (depending on the case considered, these probabilities take the form , , or ). Whenever one of the factors in (14) concentrates around its mean it behaves as a constant, and hence is uncorrelated with the remaining factor. As a result, the correlation between the two factors of (14), which complicates the analysis, can be circumvented. We give the details of this part of the derivation in Appendix A, but note here that the resulting bound on depends on
only through a factor determined by the empirical types involved. Therefore, the innermost sum in (13) can be evaluated by counting the number of vectors that have the given empirical types. Note that this count can only be positive for mutually consistent types. This count is approximately equal, to first order in the exponent, to an exponential of the corresponding empirical conditional entropy. Furthermore, the sums over types in (13) have a number of terms that grows only polynomially with the block length. Therefore, to first order, the exponential growth rate of (13) equals the maximum exponential growth rate of the argument of the outer two sums, where the maximization is performed over empirical distributions that are rational, with denominator equal to the block length. We can further upper bound the probability of error by enlarging the optimization region, maximizing over arbitrary probability distributions.
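The type enumerators that drive this argument can also be examined numerically. The Python sketch below counts, for a randomly drawn constant-composition codebook, how many codewords share a prescribed joint type with a fixed vector, and compares this count with a Monte Carlo estimate of its mean; whether the mean is exponentially large or small is what separates the regimes discussed above. All names and parameter values are illustrative.

import numpy as np

def joint_counts(x, y, nx, ny):
    """Integer joint type (count matrix) of the pair (x, y)."""
    C = np.zeros((nx, ny), dtype=int)
    for a, b in zip(x, y):
        C[a, b] += 1
    return C

def type_enumerator(codebook, y, target, nx, ny):
    """Number of codewords whose joint type with y equals the target count matrix."""
    return sum(int(np.array_equal(joint_counts(c, y, nx, ny), target)) for c in codebook)

rng = np.random.default_rng(1)
n, M = 32, 2000
template = np.repeat([0, 1], [n // 2, n // 2])          # binary type (1/2, 1/2)
codebook = np.array([rng.permutation(template) for _ in range(M)])
y = rng.integers(0, 2, size=n)
target = joint_counts(codebook[0], y, 2, 2)             # some feasible joint type
# Estimate the probability that a single random codeword hits this joint type;
# M times this probability is the mean of the enumerator.
p_hat = np.mean([np.array_equal(joint_counts(rng.permutation(template), y, 2, 2), target)
                 for _ in range(5000)])
print(type_enumerator(codebook, y, target, 2, 2), M * p_hat)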
V Convex Optimization Issues
In order to get a valid evaluation of the exponent,
for any given point satisfying the
constraints of the outer maximization, we need to
accurately solve the inner minimization problems. A brute
force search may not give accurate enough results in reasonable time.
As will be shown below, the first minimization problem in
(3) is a convex problem, and, as a result, it
can be solved efficiently. In addition, convexity allows us to
lower bound the objective function by its supporting hyperplane,
which, in turn, allows us to get a reliable lower
bound through the solution of a linear program. (In our
implementation we solve the original convex optimization
problem using the MATLAB function fmincon.)
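As an illustration of how such an inner minimization can be handled numerically (our own implementation, as noted above, used MATLAB's fmincon), the Python sketch below minimizes a generic convex, divergence-plus-linear objective over joint distributions constrained to the probability simplex and to a prescribed marginal, using scipy.optimize. The objective is a stand-in chosen only for illustration and is not the exact functional appearing in (3).

import numpy as np
from scipy.optimize import minimize

nx, ny = 2, 3
rng = np.random.default_rng(0)
P = np.array([0.5, 0.5])                       # prescribed input type
W = rng.random((nx, ny))
W /= W.sum(axis=1, keepdims=True)              # a channel W(y|x)
PW = (P[:, None] * W).ravel()                  # reference joint distribution P x W
c = rng.random(nx * ny)                        # linear term standing in for rate-dependent parts

def objective(q):
    # D(Q || P x W) + <Q, c>: convex in Q (with the convention 0 log 0 = 0)
    q = np.maximum(q, 1e-12)
    return float(np.sum(q * np.log(q / PW)) + q @ c)

constraints = [
    {"type": "eq", "fun": lambda q: q.sum() - 1.0},                      # Q is a PMF
    {"type": "eq", "fun": lambda q: q.reshape(nx, ny).sum(axis=1) - P},  # X-marginal of Q equals P
]
q0 = np.full(nx * ny, 1.0 / (nx * ny))
res = minimize(objective, q0, bounds=[(0.0, 1.0)] * (nx * ny),
               constraints=constraints, method="SLSQP")
print(res.x.reshape(nx, ny), res.fun)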
The second minimization problem is not convex due to a
non–convex constraint.
If we remove this constraint, we obtain a problem that,
as will be shown later, is convex and can be solved efficiently.
There are two possible situations:
The first situation occurs when the optimal solution to the modified problem
satisfies : in this case,
the solution to the modified problem is also a solution to the
original problem.
The second situation is when the optimal solution to the modified problem
satisfies :
in this case, a solution to the original problem must satisfy
. We prove this statement by contradiction.
Let be the optimal solution to the modified problem, and be
an optimal solution to the original problem.
Now assume, conversely, that there is no solution to the original problem
that satisfies this condition. With this assumption, we have
that at , . Let
. Note that is a convex set
and . Due to the continuity of
, the
straight line in that joins and
must pass through an intermediate point , , that satisfies
. Let be the objective function
of the second minimization problem in (3), restricted to
. It will be shown later that , restricted to this
domain, is a convex function. By hypothesis, and
we have . On the other hand,
from the convexity of , restricted to , we have
and we get a contradiction. Therefore, it follows that there is a solution to
the original problem that satisfies .
Let be the objective function of the first minimization
problem in (3). First, we note that satisfies the
constraints of the first minimization problem since they are less restrictive
than the constraints of the second minimization problem in
(3). We next prove that . As a result, the
optimal solution of the first minimization problem satisfies
, and we do not need to know
to evaluate the argument of the maximization in
(3). Using the fact that at ,
, we have:
(15)
where we used the identity
in the second equality.
In summary, if the solution to the second minimization problem in
(3), without the constraint on , satisfies
, then the first minimization problem in
(3) dominates the expression. Otherwise, the solution to the
second minimization problem in (3) without the constraint
, equals the solution to the second
minimization problem with this constraint.
It remains to show that the objective functions of the minimization problems
in (3), , , restricted to the
domain , are convex functions.
Since the sum of convex functions is convex, to prove the convexity of
on , we only need to prove that the different
terms of
(16)
are convex within .
First, we have that
is linear in and therefore convex. Also, we have that
is convex for fixed due to the concavity of
.
In addition, can be written as
. Let for
any , such that
and . We have that and . The convexity of
for fixed follows from the
convexity of in the pair :
we note that it is the maximum of two convex functions, and therefore convex.
The convexity of each of the individual functions follows from the convexity of
for fixed , , which can be proved along
the same lines as the convexity arguments above.
Each of the arguments of the can be shown to be the sum
of convex functions for fixed and , using a
similar argument to the one used above. Since the
maximum of convex functions is convex, the convexity of restricted to
follows.
Using similar arguments, it is easy to show that
is convex in .
VI Numerical Results
In this section, we present a numerical example to show the performance of the error exponent region introduced in Theorem 1. We use as a baseline for comparison the error exponent region of Section III, which is obtained with minor modifications from known results for single user and multiple access channels.
We present results for the binary Z-channel model, in which the received signal of user 1 is the product of the two binary inputs corrupted by modulo-2 additive noise, while user 2 observes its own input through a noiseless binary channel. This is a modified version of the binary erasure IFC studied in [10], where we add noise to the received signal of user 1. In the results presented here, we fix the noise parameter.
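A minimal simulation sketch of this example channel is given below, under the assumption, consistent with the description above and with the discussion of Fig. 4, that user 1 observes the product of the two binary inputs corrupted by modulo-2 additive Bernoulli noise while user 2 observes its own input noiselessly. The noise level q used here is an arbitrary placeholder, not the value fixed in the paper.

import numpy as np

def z_channel(x1, x2, q, rng):
    """One block of the example IFC: y1 = (x1 * x2) XOR z with z ~ Bernoulli(q),
    y2 = x2 (noiseless).  x1, x2 are {0,1} integer arrays of equal length."""
    z = (rng.random(x1.shape) < q).astype(int)
    y1 = (x1 & x2) ^ z
    y2 = x2.copy()
    return y1, y2

rng = np.random.default_rng(0)
n = 16
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)
y1, y2 = z_channel(x1, x2, q=0.05, rng=rng)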
The boundary of the error exponent region is a surface in the four dimensions $(R_1, R_2, E_1, E_2)$. This surface can be obtained parametrically by computing the exponents as a function of the rates, by optimizing over the auxiliary parameters in (3) and in the corresponding expression for user 2. The parameterization of the exponents in terms of the rates allows the study of the error performance as a function of the parameters that directly influence it.
Figure 2: Error exponents as a function of for two different values of and fixed choices . All the rates are in nats.
Figure 3: Optimal parameters and for the curves of Fig. 2. All the rates are in nats.
Fig. 2 shows that the error exponents under optimal decoding derived in this paper can be strictly better than the baseline error exponents of Section III. This suggests that the inequality obtained in Appendix B can be strict. In addition, in all the plots that we computed for the Z-channel, for different parameter values, we were not able to find a single case where the baseline exponent was larger than the exponent of Theorem 1.
We see that the curves of () for fixed have a linear part for below a critical value (), and a curvy part for () (note that the critical values depend on the parameters and ). Figure 3 shows the optimal parameters and for the curves shown in Fig. 2 for and nats/channel use. We see that for the linear part of the curves and are optimal, while for the curvy part (i.e. ) the optimal decreases to 0 and the optimal increases towards 1. For in the interval the gap between the and curves remains constant as both curves are lines with slope , and this gap is equal to the gap at . In general, any gap between and at will remain constant in the interval where both curves have slope . We also note that, since the optimal parameters vary for different rates, these parameters are indeed active, i.e., they influence the resulting error exponent.
The curves of Fig. 2 are obtained for fixed choices of the codeword compositions, which are the distributions used to generate the random fixed composition codebooks. As the compositions vary in the probability simplex, we obtain the four-dimensional error exponent region. In order to obtain a two-dimensional plot of the region, we consider a projection: we fix one rate, vary the other, and plot the maximum value, over the compositions, of the error exponent simultaneously achievable for both users. This corresponds to choosing the compositions in order to maximize the error exponent simultaneously achievable for both users. Figure 4 shows this projection (rates in nats/channel use), where, for reference, we included the corresponding curves for the error exponents of Section III.
Figure 4: Maximum error exponent simultaneously achievable for both users for fixed as a function of .
For the noiseless binary channel of user 2, , and as a result, decreases with increasing for . On the other hand, because of the multiplication between and in the received signal , increasing results in less interference for user 1, and a larger value of . It follows that there is a direct trade-off between and through the choice of , and whenever is maximized, . Therefore, in the curve of Fig. 4, .
From the plots of Figs. 2 and 4, we see that the error exponents obtained from Theorem 1 sometimes outperform and are never worse than the baseline error exponents of Section III.
Appendix A Proof of Theorem 1
It is easy to see that the optimum decoder for user 1
picks the message () that
maximizes , where
and .
Applying Gallager’s general upper bound to the “channel”
,
we have for user no. 1:
(A.1)
where and are arbitrary parameters to be
optimized in the sequel.
Thus, the average error probability
is upper bounded by the expectation of the above
w.r.t. the ensemble of codes of both users.
Let us take the expectation
w.r.t. the ensemble of user 1 first, and we denote this expectation
operator by . Since the codewords of user 1 are
independent, the expectation of the summand in the sum above
is given by the product of expectations, namely, the product of
(A.2)
and
Now, let
denote the number of codewords
that form a joint empirical PMF
together with a given and .
Then, using (8), can be bounded by
(A.3)
where
is the
single–letter transition probability distribution
of the IFC, and
where ,
for a generic function ,
denotes the expectation operator when the
RV’s are understood
to be distributed according to
. Similarly,
(and using Jensen’s inequality to push the expectation
w.r.t. into the brackets), we have:
(A.4)
Taking the product of these two expressions, applying (8)
to the summation in the bound for , and taking
expectations with respect to the codebook yields
(A.5)
The next step is to bound the term involving the expectation over
.
As noted, the codewords and are
randomly selected i.i.d. over the type classes
and
corresponding to probability distributions and , respectively.
To avoid cumbersome notation, we denote hereafter
and
and assume that , ,
and that lies in the type class
corresponding to .
We will also use the shorthand notation
(A.6)
The bounding of this term requires considering multiple cases, which depend on how the rate compares to different information quantities, and also on properties of the joint types. In order to guide the reader through the different steps, we present in Fig. 5 below a schematic representation of the different cases that arise.
We first consider two different ranges of ,
according to its comparison with :
1. The range . Here we have:
(A.7)
where in the second to last inequality we used ,
and in the last inequality we used the fact that
for any ,
which decays doubly exponentially with
(cf. [7, Appendix]).
To compute
we consider two cases, according to the
comparison between and :
The case . Here, we have:
(A.9)
Therefore, when
we have:
(A.10)
The case . Here we have:
(A.11)
where we used the fact that and then
estimated the expectation of as times
the probability would fall into the corresponding
conditional type.
Therefore, when
we have:
(A.12)
The exponents for the subcases and corresponding
to and , respectively, differ only
in the factors ( and , resp.) multiplying the term .
Therefore, we can consolidate these two subcases into the expression:
(A.13)
since is when and when .
2. The range .
In this range,
where we assumed in the last step.
The second expectation over can be evaluated as
(A.15)
where
is the number of codewords that are jointly typical with
according to .
Thus,
(A.16)
To bound , we consider two
cases depending on how compares to .
The case . Here, we have:
(A.17)
where we used the fact that
decays doubly exponentially in the third inequality,
and bounded using (A.9) and (A.11)
in the last inequality.
The case . Here,
we further split the evaluation into two parts.
In the first part, , and we have:
(A.18)
where we used in the last inequality
valid for .
The other part corresponds to .
Here we have:
(A.19)
To bound , we consider two cases:
The first case is when :
in this case, . Therefore,
This completes the decomposition of into the various
subcases.
Figure 5: Tree representing the multiple ranges of considered in the derivation, and the equations that consolidate the different ranges.
Consolidation.
Next, we carry out a consolidation process that merges all of the above
subcases into a more compact expression, leading ultimately to the
expression in Theorem 1. Figure 5 gives a schematic representation, in terms of a tree, of the various consolidation steps described below. The consolidation of (A.10) and (A.12) into (A.13) was done before, but we include it in Fig. 5 for completeness. Referring to Fig. 5, the consolidation starts at the deepest leaves of the tree and works its way up the nodes until it reaches the root.
We begin with the last set of subsubcases derived,
and (expressions (A.18), (A.20), and
(A.21))
for the subcase , and consolidate them as follows:
(A.22)
Next we would like to decompose the indicator
appearing in the initial part of this expression as
where we are taking into account in the last step that for the present
subcase ,
since for
we have
.
Applying this decomposition to (A.22), then
combining terms having the same
indicators , and , and replacing indicators by as appropriate (similar to (A.13)),
we simplify (A.22) to
(A.23)
This is valid for the subcase
.
Next, we consolidate (A.17) from the subcase with (A.23) and insert the result
into (A.16) to get
(A.24)
which applies to the range .
Again, expanding all terms against the
indicators , and , and, as above, replacing indicators by as appropriate, we obtain
Similarly, we can decompose the term appearing after
the indicator
against the indicators and , and
use the above identity to combine it with
appearing after the indicator .
Incorporating these steps, we can rewrite (A.25) as
(A.26)
Finally, we consolidate (A.13) from the range
with the just obtained (A.26)
(for the range ) to get
(A.27)
As before, after expanding the first indicator
against , and , and combining terms, we obtain
(A.28)
where, in simplifying, we have made use of the identity
along with
and finally
We use (A.28) in the bound derived earlier, sum over all vectors, decompose all joint-type-dependent terms appearing there, as well as the term arising from the per-type summation, against the indicators, and finally optimize over the types, to obtain:
(A.29)
Note that the term mentioned above has been combined
with the term appearing in all subcases of (A.28) to
yield the appearing throughout (A.29).
The expression in Theorem 1 is obtained from (A.29)
by dropping the constraint from the first maximization (which, given
the continuity of the underlying terms, is not really a constraint
anyway), by noting that if, in the resulting expression, the second
maximization is attained
when , it will be dominated by the first maximization
so that the second maximization can be restricted to the case ,
and finally by negating the resulting exponent (and propagating the
negation as throughout).
Appendix B A Lower Bound to the Error Exponent of Theorem 1
We can lower bound the maximization of (3) over and
by applying the min-max
theorem twice, as follows.
First we introduce a new parameter
and bound (3) as
(B.1)
(B.2)
where and we have dropped the constraint involving from , resulting in a lower bound, and making convex.
Letting , we claim that for fixed ,
the expression in (B.2) being minimized over above is convex in . This follows from the fact that for fixed , both and are affine
in . The only problem would come from the ’s
appearing in these expressions, but it can be checked that these
maximizations are independent of for fixed . Letting , we can thus apply the min-max
theorem of convex analysis (twice) as follows
(B.3)
Since, as noted above, for fixed ( ), both and are
affine in , the inner maximization
in (B.3) is attained at one of the points . After simplification, we
obtain
Next, we note the identities
and use them, with the shorthand
and , for ,
to rewrite the bound as
(B.4)
where in simplifying the third expression in the maximization we have
also exploited the constraints
and .
For we can further simplify this expression. In
particular, for , the first term in the inner maximization
is readily seen to be always smaller than the second term.
Additionally, the second and third terms are symmetric in the primed
and non-primed joint distributions, which, together with the readily
established joint convexity
of the maximum of these two terms on the constraint set, imply that the inner
minimization over the joint types is achieved when the primed and
non-primed joint distributions are equal, in which case the two terms are equal.
Therefore, at
we have
(B.5)
or
(B.6)
where .
Simplifying at gives
(B.7)
which is seen to be no bigger than the above lower bound on , since
, , and .
Another application of the lower bound (B.4) is in
determining the set of rate pairs for which .
Let be independent with marginal distributions
and and be the result of
passing through the channel . We
shall argue that if
and
then the expression (B.4) must be greater than 0.
Indeed, for the expression (B.4) to equal 0, we see from
the first term in the inner maximum that the
minimizing and joint distributions must satisfy one of the
following:
case 1: , , and ; case 2:
, , and ; or case 3:
, , and . If case 1 holds then
necessarily have the same joint distribution as , in which case, we see from the
third term in the maximum in (B.4) that . Similarly, if case 2 holds then it
follows that
have the same joint distribution as , in which case, it follows again
from the third term in the maximum that . Finally,
if case 3 holds then both and
have the same
distribution as , in which case, after writing , we see again that either or must hold. Thus, the three cases
together establish the above claim that if
and
then the expression (B.4), and hence ,
must be greater than 0. It can be checked that this region is
equivalent to
which is represented in Fig. 1 in Section IV.
It is shown in [11] that for
the ensemble of constant composition codes comprised of i.i.d. codewords
uniformly distributed over the types and ,
the exponential decay rate of the average probability of error for user 1
must necessarily be zero for rate pairs outside of this region, even
for optimum, maximum likelihood decoding.
References
[1] T. S. Han and K. Kobayashi, “A new achievable rate region for
the interference channel,” IEEE Trans. Info. Theory, vol. IT-27,
pp. 49 - 60, January 1981.
[2] A. B. Carleial, “A case where interference does not reduce capacity,”
(Corresp.), IEEE Trans. Info. Theory, vol. IT-21, pp. 569 -
570, September 1975.
[3] R. Etkin, D. Tse, and H. Wang, “Gaussian Interference Channel Capacity to Within One Bit,” submitted to IEEE Transactions on Information Theory, Feb. 2007. Also, available on–line at: [http://arxiv.org/PS_cache/cs/pdf/0702/0702045v2.pdf].
[4] L. Weng, S. S. Pradhan, and A. Anastasopoulos,
“Error exponent regions for Gaussian broadcast and multiple-access
channels,” IEEE Trans. Info. Theory, vol. IT-54, pp. 2919 - 2942,
July 2008.
[5] R. G. Gallager, Information Theory and Reliable Communication, John Wiley & Sons, Inc., New York, 1968.
[6] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Akadémiai Kiadó, Budapest, 1981.
[7] N. Merhav, “Relations between random coding exponents and the statistical physics of random codes,” accepted to IEEE Trans. Inform. Theory, Sep. 2008. Also, available on–line at: [http://www.ee.technion.ac.il/people/merhav/papers/p117.pdf].
[8] N. Merhav, “Error exponents of erasure/list decoding
revisited via moments of distance enumerators,” IEEE Trans. Inform. Theory, Vol. 54, No. 10, pp. 4439-4447, Oct. 2008.
[9] R. Etkin, N. Merhav, and E. Ordentlich, “Error exponents of optimum decoding for the interference channel,” Proceedings of the IEEE International Symposium on Information Theory, Toronto, Canada, pp. 1523–1527, 6-11 July 2008.
[10] R. Etkin and E. Ordentlich, “Discrete Memoryless Interference Channel: New Outer Bound,” Proceedings of the IEEE International Symposium on Information Theory, Nice, France, pp. 2851–2855, 24-29 June 2007.
[11] C. Chang, HP Labs Technical Report, 2008.
[12] J. Pokorny and H. Wallmeier, “Random coding bound and codes produced by permutations for the multiple-access channel,” IEEE Trans. Inform. Theory, Vol. 31, No. 6, pp. 741–750, Nov. 1985.
[13] Y.-S. Liu and B. L. Hughes, “A new universal random coding bound for the multiple-access channel,” IEEE Trans. Inform. Theory, Vol. 42, No. 2, pp. 376–386, Mar. 1996.