
Institute of Mathematical Sciences, HBNI, Chennai
Email: gganesan82@gmail.com

Achieving positive rates with predetermined dictionaries

Ghurumuruhan Ganesan
Abstract

In the first part of the paper we consider binary input channels that are not necessarily stationary and show how positive rates can be achieved using codes constrained to be within predetermined dictionaries. We use a Gilbert-Varshamov-like argument to obtain the desired rate achieving codes. Next we study the corresponding problem for channels with arbitrary alphabets and use conflict-set decoding to show that if the dictionaries are contained within “nice” sets, then positive rates are achievable.

Key words: Positive rates, predetermined dictionaries.

AMS 2000 Subject Classification: Primary: 94A15, 94A24.


1 Introduction

Achieving positive rates with low probability of error in communication channels is an important problem in information theory [3]. In general, a rate R is defined to be achievable if there exist codes of rate R with arbitrarily small error probability as the code length n\rightarrow\infty. The existence of such codes is typically established through the probabilistic method: choose a random code (from the set of all possible codes) and show that the chosen code has small error probability.

In many cases of interest, we would like to select codes satisfying certain constraints or equivalently from a predetermined dictionary (see [2, 7] for examples). For stationary channels, the method of types [4, 5] can be used to study positive rate achievability with the restriction that the dictionary falls within the set of words belonging to a particular type. In this paper, we study achievability of positive rates with arbitrary deterministic dictionaries for both binary and general input channels using counting techniques.

The paper is organized as follows: In Section 2, we study positive rate achievability in binary input channels using predetermined dictionaries. Next, in Section 3, we describe the rate achievability problem for arbitrary stationary channels and state our main result, Theorem 3.1, regarding achieving positive rates using given dictionaries. Finally, in Section 4, we prove Theorem 3.1.

2 Binary Channels

For integer n\geq 1, an element of the set \{0,1\}^{n} is said to be a codeword, or simply a word, of length n. Consider a discrete memoryless symmetric channel with input alphabet \{0,1\}^{n} that corrupts a transmitted word \mathbf{x}=(x_{1},\ldots,x_{n}) as follows. If \mathbf{Y}=(Y_{1},\ldots,Y_{n}) is the received (random) word, then

Y_{i}:=x_{i}\,\mathbf{1}(W_{i}=0)+(1-x_{i})\,\mathbf{1}(W_{i}=1)+\varepsilon\,\mathbf{1}(W_{i}=\varepsilon)   (2.1)

for all 1\leq i\leq n, where \mathbf{1}(\cdot) denotes the indicator function and \varepsilon denotes the erasure symbol. If W_{i}=1, then the bit x_{i} is substituted, and if W_{i}=\varepsilon, then x_{i} is erased. The random variables \{W_{i}\}_{1\leq i\leq n} are independent with

\mathbb{P}(W_{i}=1)=p_{f}(i)\text{ and }\mathbb{P}(W_{i}=\varepsilon)=p_{e}(i)   (2.2)

and so the probability of a bit error (due to either a substituted bit or an erased bit) at “time” index i is p_{f}(i)+p_{e}(i). Letting \mathbf{W}:=(W_{1},\ldots,W_{n}), we also write \mathbf{Y}=:h(\mathbf{x},\mathbf{W}), where h is the deterministic function defined via (2.1). We are interested in communicating through the above described channel, with low probability of error, using words from a predetermined (deterministic) dictionary.
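For intuition, here is a minimal simulation sketch of the channel (2.1); the function name and the use of the string "e" for the erasure symbol are our own choices and not part of the model.

```python
import random

def channel(x, p_f, p_e, erasure="e"):
    """Simulate (2.1): bit x[i] is independently substituted with probability
    p_f[i], erased with probability p_e[i], and received intact otherwise."""
    y = []
    for xi, pf, pe in zip(x, p_f, p_e):
        u = random.random()
        if u < pf:            # W_i = 1: substitution
            y.append(1 - xi)
        elif u < pf + pe:     # W_i = erasure symbol
            y.append(erasure)
        else:                 # W_i = 0: bit received correctly
            y.append(xi)
    return y

# Example with n = 8, constant p_f(i) = 0.05 and p_e(i) = 0.1.
print(channel([0, 1, 1, 0, 1, 0, 0, 1], [0.05] * 8, [0.1] * 8))
```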

Dictionaries

A dictionary of size M is a set {\cal D}\subseteq\{0,1\}^{n} of cardinality \#{\cal D}=M. A subset {\cal C}=\{\mathbf{x}_{1},\ldots,\mathbf{x}_{L}\}\subseteq{\cal D} is said to be an n-length code of size L, contained in the dictionary {\cal D}. Suppose we transmit a word picked from {\cal C} through the channel given by (2.1) and receive the (random) word \mathbf{Y}. Given \mathbf{Y}, we would like an estimate \hat{\mathbf{x}} of the word from {\cal C} that was transmitted. A decoder g:\{0,1,\varepsilon\}^{n}\rightarrow{\cal C} is a deterministic map that uses the received word \mathbf{Y} to obtain an estimate of the transmitted word. The probability of error corresponding to the code {\cal C} and the decoder g is then defined as

q({\cal C},g):=\max_{1\leq i\leq L}\mathbb{P}\left(g\left(h\left(\mathbf{x}_{i},\mathbf{W}\right)\right)\neq\mathbf{x}_{i}\right),   (2.3)

where \mathbf{W}=(W_{1},\ldots,W_{n}) is the additive noise as described in (2.1).

We have the following definition regarding achievable rates using predetermined dictionaries.

Definition 1

Let R>0 and let {\cal F}:=\{{\cal D}_{n}\}_{n\geq 1} be any sequence of dictionaries such that each {\cal D}_{n} has size at least 2^{nR}. We say that R is an {\cal F}-achievable rate if the following holds true for every \epsilon>0: for all n large, there exists a code {\cal C}_{n}\subset{\cal D}_{n} of size \#{\cal C}_{n}=2^{nR} and a decoder g_{n} such that the probability of error satisfies q({\cal C}_{n},g_{n})<\epsilon.

If {\cal D}_{n}=\{0,1\}^{n} for each n, then the above reduces to the usual concept of rate achievability as in [3], and we simply say that R is achievable.

For 0<x<1 we define the entropy function

H(x):=-x\log{x}-(1-x)\log(1-x),   (2.4)

where all logarithms in this section are to the base two. We have the following result.

Theorem 2.1

For integer n\geq 1, let

\mu_{f}=\mu_{f}(n):=\sum_{i=1}^{n}p_{f}(i)\text{ and }\mu_{e}=\mu_{e}(n):=\sum_{i=1}^{n}p_{e}(i)

be the expected numbers of bit substitutions and erasures, respectively, in an n-length codeword, and suppose

\min\left(\mu_{f}(n),\mu_{e}(n)\right)\longrightarrow\infty\text{ and }p:=\limsup_{n}\frac{1}{n}\left(2\mu_{f}(n)+\mu_{e}(n)\right)<\frac{1}{2}.   (2.5)

Let H(p)<\alpha\leq 1 and let {\cal F}:=\{{\cal D}_{n}\}_{n\geq 1} be any sequence of dictionaries satisfying \#{\cal D}_{n}\geq 2^{\alpha n} for each n. Then every R<\alpha-H(p) is {\cal F}-achievable.

For a given \alpha, let p(\alpha) be the largest value of p such that H(p)<\alpha. The above result says that every R<\alpha-H(p) is achievable using arbitrary dictionaries. We use Gilbert-Varshamov-like arguments to prove Theorem 2.1 below.

As a special case, for binary symmetric channels with crossover probability p_{f}, each bit is independently substituted with probability p_{f}. No erasures occur and so

\mu_{f}(n)=np_{f}\text{ and }\mu_{e}(n)=0.

Thus p=2p_{f}, and from Theorem 2.1 we have that if H(2p_{f})<\alpha, then every R<\alpha-H(2p_{f}) is achievable.
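As a quick numerical illustration (the numbers below are our own and not from the paper), the guaranteed rate \alpha-H(2p_{f}) can be evaluated directly:

```python
from math import log2

def H(x):
    """Binary entropy function (2.4), logarithms to the base two."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

# Binary symmetric channel with crossover probability p_f = 0.01 and a
# dictionary of rate alpha = 0.5: any rate below alpha - H(2 * p_f) is achievable.
p_f, alpha = 0.01, 0.5
print(alpha - H(2 * p_f))   # approximately 0.359
```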

Proof of Theorem 2.1

The main idea of the proof is as follows. Using standard deviation estimates, we first obtain an upper bound on the number of possible errors that could occur in a transmitted word. More specifically, if T denotes the number of bit errors in an n-length word and \epsilon>0 is given, we use standard deviation estimates to determine T_{0}=T_{0}(n) such that \mathbb{P}(T>T_{0})\leq\epsilon. We then use a Gilbert-Varshamov argument to obtain a code that can correct up to T_{0} bit errors. The details are described below.

We prove the Theorem in two steps. In the first step, we construct the code {\cal C} and the decoder g, and in the second step, we estimate the probability of decoding error for {\cal C} using g. For \mathbf{x},\mathbf{y}\in\{0,1\}^{n}, we let d_{H}(\mathbf{x},\mathbf{y})=\sum_{i=1}^{n}\mathbf{1}(x_{i}\neq y_{i}) be the Hamming distance between \mathbf{x} and \mathbf{y}, where as before \mathbf{1}(\cdot) denotes the indicator function. The minimum distance of a code is the minimum Hamming distance between any two distinct words in the code.

Step 1: Assume for simplicity that t:=np(1+2\epsilon) is an integer and let d=t+1. For a word \mathbf{x}, let B_{d-1}(\mathbf{x}) be the set of words that are at a Hamming distance of at most d-1 from \mathbf{x}. If {\cal C}\subseteq{\cal D} is a maximum size code with minimum distance at least d, then by the maximality of {\cal C} we must have

\bigcup_{\mathbf{x}\in{\cal C}}B_{d-1}(\mathbf{x})\supseteq{\cal D}.   (2.6)

This is known as the Gilbert-Varshamov argument [6].
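The following sketch (illustrative only; the names are ours) makes the Gilbert-Varshamov argument concrete: a single greedy pass over the dictionary produces a maximal code of minimum distance d, which automatically has the covering property (2.6).

```python
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def gv_code(dictionary, d):
    """Greedily build a maximal code within `dictionary` with minimum
    Hamming distance at least d; by maximality, every dictionary word then
    lies within distance d - 1 of some codeword."""
    code = []
    for word in dictionary:
        if all(hamming(word, c) >= d for c in code):
            code.append(word)
    return code

# Toy example: the full dictionary {0,1}^5 with d = 3; the counting bound
# below gives at least 32 / (1 + 5 + 10) = 2 codewords.
print(len(gv_code(list(product([0, 1], repeat=5)), 3)))
```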

The cardinality of {\cal D} is at least 2^{\alpha n} and the cardinality of B_{d-1}(\mathbf{x}) is \sum_{i=0}^{d-1}{n\choose i}, and so from (2.6) we see that the code {\cal C} has size

\#{\cal C}\geq\frac{2^{\alpha n}}{\sum_{i=0}^{d-1}{n\choose i}}   (2.7)

and minimum distance at least d. Also, since p<\frac{1}{2}, we have for all small \epsilon>0 that {n\choose i}\leq{n\choose d-1}={n\choose np(1+2\epsilon)} for 0\leq i\leq d-1, and so \sum_{i=0}^{d-1}{n\choose i}\leq n\cdot{n\choose np(1+2\epsilon)}. Using the Stirling approximation we get

{n\choose np(1+2\epsilon)}\leq 4en\cdot 2^{nH(p+2p\epsilon)}

and so from (2.7), we get for any \delta>0 that

\#{\cal C}\geq\frac{1}{4en^{2}}\cdot 2^{n(\alpha-H(p+2p\epsilon))}\geq 2^{n(\alpha-H(p)-\delta)}   (2.8)

provided \epsilon>0 is small and n is large.

We now use a two stage decoder described as follows. Suppose the received word is \mathbf{Y} and, for simplicity, suppose that the last e positions in \mathbf{Y} have been erased. For a codeword \mathbf{x}=(x_{1},\ldots,x_{n}), let \mathbf{x}_{red}:=(x_{1},\ldots,x_{n-e}) be the reduced word formed by the first n-e bits. Let {\cal C}_{red}=\{\mathbf{x}_{red}:\mathbf{x}\in{\cal C}\} be the set of all reduced codewords in the code {\cal C} formed by the first n-e bits.

In the first stage of the decoding process, the decoder corrects bit substitutions by collecting the set {\cal S}\subseteq{\cal C}_{red} of all reduced codewords whose Hamming distance from \mathbf{Y}_{red}, the word formed by the first n-e (unerased) bits of \mathbf{Y}, is minimum. If {\cal S} contains exactly one word, say \mathbf{z}_{red}, the decoder outputs \mathbf{z}_{red} as the estimate obtained in the first stage. Otherwise, the decoder outputs “decoding error”. In the second stage of the decoding process, the decoder uses \mathbf{z}_{red} to correct the erasures. Formally, let {\cal S}_{e}:=\{\mathbf{x}\in{\cal C}:\mathbf{x}_{red}=\mathbf{z}_{red}\} be the set of all codewords whose first n-e bits match \mathbf{z}_{red}. If there exists exactly one word \mathbf{z} in {\cal S}_{e}, then the decoder outputs \mathbf{z} as the transmitted word. Else the decoder outputs “decoding error”.
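A minimal sketch of this two-stage decoder (our own variable names; erasures are marked by the placeholder "e"):

```python
def two_stage_decode(y, code, erasure="e"):
    """Two-stage decoder from the proof of Theorem 2.1: minimum Hamming
    distance on the unerased positions, then recovery of the erased bits."""
    keep = [i for i, yi in enumerate(y) if yi != erasure]   # unerased positions

    # Stage 1: reduced codewords at minimum distance from the reduced received word.
    dist = lambda c: sum(c[i] != y[i] for i in keep)
    best = min(dist(c) for c in code)
    reduced = {tuple(c[i] for i in keep) for c in code if dist(c) == best}
    if len(reduced) != 1:
        return None                      # decoding error
    z_red = reduced.pop()

    # Stage 2: exactly one codeword must agree with z_red on the unerased positions.
    matches = [c for c in code if tuple(c[i] for i in keep) == z_red]
    return matches[0] if len(matches) == 1 else None
```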

Step 2: Suppose a word \mathbf{x}\in{\cal C} was transmitted and the received word is \mathbf{Y}. Let \mathbf{W}=(W_{1},\ldots,W_{n}) be the random noise vector as in (2.1) and let

T_{f}:=\sum_{i=1}^{n}\mathbf{1}(W_{i}=1)

be the number of bits that have been substituted so that

\mathbb{E}T_{f}=\sum_{i=1}^{n}p_{f}(i)=\mu_{f}(n),

by (2.2). By standard deviation estimates (Corollary A.1.14, pp. 312, [1]) we have

\mathbb{P}\left(|T_{f}-\mu_{f}(n)|\geq\epsilon\mu_{f}(n)\right)\leq 2e^{-\frac{\epsilon^{2}}{4}\mu_{f}(n)}\leq\frac{\epsilon}{2}   (2.9)

for all n large, by the first condition of (2.5). Similarly, if T_{e}=\sum_{i=1}^{n}\mathbf{1}(W_{i}=\varepsilon) is the number of erased bits, then

\mathbb{P}\left(|T_{e}-\mu_{e}(n)|\geq\epsilon\mu_{e}(n)\right)\leq 2e^{-\frac{\epsilon^{2}}{4}\mu_{e}(n)}\leq\frac{\epsilon}{2}   (2.10)

for all n large.

Next, using the second condition of (2.5) we have that

(2\mu_{f}(n)+\mu_{e}(n))(1+\epsilon)\leq np(1+2\epsilon)=t

for all n large, and so from (2.9) and (2.10) we get that \mathbb{P}\left(2T_{f}+T_{e}\geq t\right)\leq\epsilon for all n large. If 2T_{f}+T_{e}\leq t, then, because the minimum distance of {\cal C} is at least d=t+1, the decoder by construction outputs \mathbf{x} as the estimate of the transmitted word. Therefore a decoding error occurs only if 2T_{f}+T_{e}\geq t, which happens with probability at most \epsilon. Combining with (2.8) and using the fact that \delta>0 is arbitrary, we get that every R<\alpha-H(p) is {\cal F}-achievable.

3 General channels

Consider a discrete memoryless channel with finite input alphabet {\cal X} of size N:=\#{\cal X}, a finite output alphabet {\cal Y} and transition probabilities p_{Y|X}(y|x), x\in{\cal X}, y\in{\cal Y}. The term p_{Y|X}(y|x) denotes the probability that the output y is observed given that the input x is transmitted through the channel.

For n\geq 1 we define a subset {\cal D}_{n}\subseteq{\cal X}^{n} to be a dictionary. A subset {\cal C}=\{x_{1},\ldots,x_{M}\}\subseteq{\cal D}_{n} is defined to be an n-length code contained within the dictionary {\cal D}_{n}. Suppose we transmit the word x_{1} and receive the (random) word \Gamma_{x_{1}}\in{\cal Y}^{n}. Given \Gamma_{x_{1}}, we would like an estimate \hat{x} of the word from {\cal C} that was transmitted. A decoder g:{\cal Y}^{n}\rightarrow{\cal C} is a deterministic map that “guesses” the transmitted word based on the received word \Gamma_{x_{1}}. We denote the probability of error corresponding to the code {\cal C} and the decoder g as

q({\cal C},g):=\max_{x\in{\cal C}}\mathbb{P}\left(g(\Gamma_{x})\neq x\right).   (3.1)

To study positive rate achievability using arbitrary dictionaries, we need a couple of preliminary definitions. Let p_{X}(\cdot) be any probability distribution on the input alphabet {\cal X} and let H(X):=-\sum_{x\in{\cal X}}p_{X}(x)\log{p_{X}(x)} be the entropy of a random variable X with distribution p_{X}, where the logarithm here is to the base N. Let Y be a random variable having joint distribution p_{XY}(x,y) with the random variable X, defined by p_{XY}(x,y):=p_{Y|X}(y|x)\cdot p_{X}(x). Thus Y is the random output of the channel when the input is X. Letting p_{Y}(y):=\sum_{x}p_{XY}(x,y) be the marginal of Y, we have that the joint entropy and conditional entropy [3] are respectively given by

H(X,Y)=-\sum_{x,y}p_{XY}(x,y)\log{p_{XY}(x,y)}

and

H(Y|X)=-\sum_{x,y}p_{XY}(x,y)\log{p_{Y|X}(y|x)}.
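For concreteness, here is a small sketch (our own helper, not from the paper) that computes these entropies from p_{X} and p_{Y|X}, with logarithms to the base N=\#{\cal X} as above:

```python
from math import log

def entropies(p_X, p_YgX):
    """Return H(X), H(Y), H(X,Y), H(Y|X), H(X|Y) with logarithms to the base
    N = |X|, given the input distribution p_X[x] and transitions p_YgX[x][y]."""
    N = len(p_X)
    lg = lambda t: log(t, N)
    p_XY = {(x, y): p_X[x] * p_YgX[x][y]
            for x in p_X for y in p_YgX[x] if p_X[x] * p_YgX[x][y] > 0}
    p_Y = {}
    for (x, y), pr in p_XY.items():
        p_Y[y] = p_Y.get(y, 0.0) + pr
    H_X = -sum(pr * lg(pr) for pr in p_X.values() if pr > 0)
    H_Y = -sum(pr * lg(pr) for pr in p_Y.values())
    H_XY = -sum(pr * lg(pr) for pr in p_XY.values())
    return {"H(X)": H_X, "H(Y)": H_Y, "H(X,Y)": H_XY,
            "H(Y|X)": H_XY - H_X, "H(X|Y)": H_XY - H_Y}

# Binary asymmetric channel (see the example below) with p0 = p1 = 0.1:
print(entropies({0: 0.5, 1: 0.5}, {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}))
```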

The following result obtains positive rates achievable with predetermined dictionaries for the channel described above.

Theorem 3.1

Let p_{X},p_{Y} and p_{XY} be as above and let 0<\alpha\leq H(X). For every \epsilon>0 and for all n large, there is a deterministic set {\cal B}_{n} with size at least N^{n(H(X)-2\epsilon)} and satisfying the following property: if {\cal D}_{n} is any subset of {\cal B}_{n} with cardinality N^{n(\alpha-2\epsilon)} and

R<\alpha-H(Y|X)-H(X|Y)-7\epsilon   (3.2)

is positive, then there exists a code {\cal C}_{n}\subset{\cal D}_{n} containing N^{nR} words and a decoder g_{n} with error probability q\left({\cal C}_{n},g_{n}\right)<\epsilon.

Thus if the sequence of dictionaries {\cal F}:=\{{\cal D}_{n}\}_{n\geq 1} is such that {\cal D}_{n}\subset{\cal B}_{n} for each n, then every R<\alpha-H(Y|X)-H(X|Y) is {\cal F}-achievable. Also, setting \alpha=H(X) and {\cal D}_{n}={\cal B}_{n} gives us that every R<H(X)-H(X|Y)-H(Y|X) is achievable in the usual sense of [3], without any restrictions on the dictionaries. For context, we remark that Theorem 2.1 holds for arbitrary dictionaries.

To prove Theorem 3.1, we use typical sets [3] together with conflict set decoding described in the next section. Before we do so, we present an example to illustrate Theorem 3.1.

Example

Consider a binary asymmetric channel with alphabets {\cal X}={\cal Y}=\{0,1\} and transition probabilities

p(1|0)=p_{0}=1-p(0|0)\text{ and }p(0|1)=p_{1}=1-p(1|1).

To apply Theorem 3.1, we assume that the input has the symmetric distribution \mathbb{P}(X_{i}=0)=\frac{1}{2}=\mathbb{P}(X_{i}=1), so that the entropy H(X) equals its maximum value of 1. The entropy of the output is H(Y)=H(q), where q=\frac{1-p_{0}+p_{1}}{2}, and the conditional entropies equal

H(Y|X)=\frac{1}{2}\left(H(p_{0})+H(p_{1})\right)\text{ and }H(X|Y)=\frac{1}{2}\left(H(p_{0})+H(p_{1})\right)+1-H(q).

Set p_{0}=p and p_{1}=p+\Delta. If both p and \Delta are small, then H(q) is close to one and H(p_{0}) and H(p_{1}) are close to zero. We assume that p and \Delta are such that

\alpha_{0}:=H(Y|X)+H(X|Y)=H(p)+H(p+\Delta)+1-H\left(\frac{1-\Delta}{2}\right)

is strictly less than one and choose \alpha>\alpha_{0}. Every R<\alpha-\alpha_{0} is then {\cal F}-achievable as in the statement following Theorem 3.1, and every R<1-\alpha_{0} is achievable without any dictionary restrictions, in the usual sense of [3].
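The quantity 1-\alpha_{0} is easy to evaluate numerically; the sketch below (our own helper, reproducing the quantity plotted in Figure 1) illustrates the behaviour for \Delta=0.05.

```python
from math import log2

def H(x):
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def rate_bound(p, delta):
    """1 - alpha_0 for the binary asymmetric channel with p0 = p, p1 = p + delta."""
    alpha_0 = H(p) + H(p + delta) + 1 - H((1 - delta) / 2)
    return 1 - alpha_0

print(rate_bound(0.01, 0.05))   # positive: positive rates are guaranteed
print(rate_bound(0.10, 0.05))   # negative: the bound gives no guarantee here
```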

In Figure 1, we plot 1-\alpha_{0} as a function of p for various values of the asymmetry factor \Delta. For example, for an asymmetry factor of \Delta=0.05 we see that positive rates are achievable for p roughly up to 0.08.

Figure 1: The rate 1-\alpha_{0} as a function of p for various values of the asymmetry factor \Delta.

4 Proof of Theorem 3.1

We use conflict set decoding to prove Theorem 3.1. Therefore in the first part of this section, we prove an auxiliary result regarding conflict set decoding that is also of independent interest.

4.1 Conflict set decoding

Consider a discrete memoryless channel with finite input alphabet {\cal X}_{0}, finite output alphabet {\cal Y}_{0} and transition probabilities p_{0}(y|x), x\in{\cal X}_{0}, y\in{\cal Y}_{0}. For convenience, we define the channel by a collection of random variables \theta_{x}, x\in{\cal X}_{0}, with the distribution \mathbb{P}\left(\theta_{x}=y\right):=p_{0}(y|x) for y\in{\cal Y}_{0}. All random variables are defined on the probability space (\Omega,{\cal F},\mathbb{P}). For \epsilon>0, x\in{\cal X}_{0} and y\in{\cal Y}_{0}, we let D(x,\epsilon) and C(y,\epsilon) be deterministic sets such that

\mathbb{P}\left(\theta_{x}\in D(x,\epsilon)\right)\geq 1-\epsilon\text{ and }C(y,\epsilon)=\{x:y\in D(x,\epsilon)\}.   (4.1)

We call D(x,\epsilon) an \epsilon-probable set, or simply a probable output set, corresponding to the input x, and for y\in{\cal Y}_{0} we call C(y,\epsilon) the \epsilon-conflict set, or simply the conflict set, corresponding to the output y. There are many possible choices for D(x,\epsilon); for example, D(x,\epsilon)={\cal Y}_{0} is one choice. In Proposition 1 below, we show however that choosing the \epsilon-probable sets as small as possible allows us to increase the size of the desired code. We also define

d_{L}(\epsilon):=\max_{x\in{\cal X}_{0}}\#D(x,\epsilon)\text{ and }d_{R}(\epsilon):=\max_{y\in{\cal Y}_{0}}\#C(y,\epsilon)   (4.2)

where \#A denotes the cardinality of the set A.

As before, a code {\cal C} of size M is a set of distinct words \{x_{1},\ldots,x_{M}\}\subseteq{\cal X}_{0}. Suppose we transmit the word x_{1} and receive the (random) word \theta_{x_{1}}. Given \theta_{x_{1}}, we would like an estimate \hat{x} of the word from {\cal C} that was transmitted. A decoder g:{\cal Y}_{0}\rightarrow{\cal C} is a deterministic map that guesses the transmitted word based on the received word \theta_{x_{1}}. We denote the probability of error corresponding to the code {\cal C}, the decoder g and the collection of probable sets {\cal D}:=\{D(x,\epsilon)\}_{x\in{\cal X}_{0}} as

q({\cal C},g,{\cal D}):=\max_{x\in{\cal C}}\mathbb{P}\left(g(\theta_{x})\neq x\right).   (4.3)

We have the following Proposition.

Proposition 1

For \epsilon>0, let {\cal D}=\{D(x,\epsilon)\}_{x\in{\cal X}_{0}} be any collection of \epsilon-probable sets. If there exists an integer M satisfying

M<\frac{\#{\cal X}_{0}}{d_{L}(\epsilon)\cdot d_{R}(\epsilon)},   (4.4)

then there exists a code {\cal C}\subseteq{\cal X}_{0} of size M and a decoder g whose decoding error probability satisfies q({\cal C},g,{\cal D})<\epsilon.

Thus as long as the number of words is below a certain threshold, we are guaranteed that the error probability is sufficiently small. Also, from (4.4) we see that it would be better to choose probable sets with as small cardinality as possible.

Proof of Proposition 1

Code construction: We recall that, by definition, given the input x, the output \theta_{x} belongs to the set D(x,\epsilon) with probability at least 1-\epsilon. We first construct a code {\cal C}=\{x_{1},\ldots,x_{M}\} containing M distinct words and satisfying

D(x_{i},\epsilon)\cap D(x_{j},\epsilon)=\emptyset\text{ for all distinct }x_{i},x_{j}\in{\cal C}.   (4.5)

Throughout we assume that M satisfies (4.4). To obtain the desired distinct words, we use the following bipartite graph representation. Let G=G(\epsilon) be a bipartite graph with vertex set {\cal X}_{0}\cup{\cal Y}_{0}. We join x\in{\cal X}_{0} and y\in{\cal Y}_{0} by an edge if and only if y\in D(x,\epsilon). The size of D(x,\epsilon) therefore equals the degree of the vertex x, and the size of C(y,\epsilon) equals the degree of the vertex y. By definition (see (4.2)), d_{L}=\max_{x\in{\cal X}_{0}}\#D(x,\epsilon) and d_{R}=\max_{y\in{\cal Y}_{0}}\#C(y,\epsilon) denote the maximum degree of a left vertex and a right vertex, respectively, in G. We say that a set of vertices \{x_{1},\ldots,x_{M}\}\subseteq{\cal X}_{0} is disjoint if for all i\neq j, the vertices x_{i} and x_{j} have no common neighbour (in {\cal Y}_{0}). Constructing codes with disjoint \epsilon-probable sets satisfying (4.5) is therefore equivalent to finding disjoint sets of vertices in {\cal X}_{0}.

We now use direct counting to get a set of M disjoint vertices \{x_{1},\ldots,x_{M}\} in {\cal X}_{0}. First we pick any vertex x_{1}\in{\cal X}_{0}. The degree of x_{1} is at most d_{L} and moreover, each vertex in D(x_{1},\epsilon)\subseteq{\cal Y}_{0} has at most d_{R} neighbours in {\cal X}_{0}. The total number of (bad) vertices of {\cal X}_{0} adjacent to some vertex in D(x_{1},\epsilon) is therefore at most d_{L}\cdot d_{R}. Removing all these bad vertices, we are left with a bipartite subgraph G_{1} of G whose left vertex set has size at least N_{0}-d_{L}\cdot d_{R}, where N_{0}=\#{\cal X}_{0}. We now pick one vertex in the left vertex set of G_{1} and continue the above procedure. After the i^{th} step, the number of left vertices remaining is at least N_{0}-i\cdot d_{L}\cdot d_{R}, and so from (4.4) we get that this process continues for at least M steps. The words corresponding to the vertices \{x_{1},\ldots,x_{M}\} form our code {\cal C}.

Decoder definition: Let {\cal C} be the code constructed above. For decoding, we use the conflict-set decoder defined as follows: if y\in D(x_{j},\epsilon) for some x_{j}\in{\cal C} and the conflict set C(y,\epsilon) does not contain any word of {\cal C}\setminus\{x_{j}\}, then we set g(y)=x_{j}. Otherwise, we set g(y) to be an arbitrary value; for concreteness, we set g(y)=x_{1}.
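A minimal sketch of the code construction and the conflict-set decoder above (illustrative; `D` is assumed to map each word of {\cal X}_{0} to its \epsilon-probable output set, and the names are ours):

```python
def build_code(X0, D, M):
    """Greedily pick words whose probable sets D[x] are pairwise disjoint,
    as in the counting argument above; stops once M words are found."""
    code, used_outputs = [], set()
    for x in X0:
        if len(code) == M:
            break
        if used_outputs.isdisjoint(D[x]):        # no common neighbour in Y0
            code.append(x)
            used_outputs |= set(D[x])
    return code

def conflict_set_decode(y, code, D):
    """Output x_j if y lies in D[x_j] and the conflict set of y contains no
    other codeword; otherwise fall back to the first codeword."""
    hits = [x for x in code if y in D[x]]
    return hits[0] if len(hits) == 1 else code[0]
```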

We claim that the probability of error of the conflict-set decoder is at most \epsilon. To see this, suppose we transmit the word x_{i}. With probability at least 1-\epsilon, the corresponding output \theta_{x_{i}} belongs to D(x_{i},\epsilon). Because (4.5) holds, we must then have \theta_{x_{i}}\notin D(x_{k},\epsilon) for any k\neq i, so that the conflict set of \theta_{x_{i}} contains no codeword other than x_{i}. This implies that the conflict-set decoder outputs the correct word x_{i} with probability at least 1-\epsilon.

We now prove Theorem 3.1 using typical sets and conflict set decoding.

4.2 Proof of Theorem 3.1

For notational simplicity we prove Theorem 3.1 with {\cal X}={\cal Y}=\{0,1\}. An analogous analysis holds for the general case.

The proof consists of three steps. In the first step, we define and estimate the occurrence of certain typical sets. In the next step, we use the typical sets constructed in Step 1 to determine the set {\cal B}_{n} in the statement of the Theorem. Finally, we use Proposition 1 to obtain the bound (3.2) on the rates.
Step 1: Typical sets: We define the typical set

A_{n}(\epsilon)=\left(A_{n,1}(\epsilon)\times A_{n,2}(\epsilon)\right)\bigcap A_{n,3}(\epsilon)   (4.6)

where

A_{n,1}(\epsilon)=\{x\in{\cal X}^{n}:2^{-n(H(X)+\epsilon)}\leq p(x)\leq 2^{-n(H(X)-\epsilon)}\},
A_{n,2}(\epsilon)=\{y\in{\cal Y}^{n}:2^{-n(H(Y)+\epsilon)}\leq p(y)\leq 2^{-n(H(Y)-\epsilon)}\}

and

A_{n,3}(\epsilon)=\{(x,y)\in{\cal X}^{n}\times{\cal Y}^{n}:2^{-n(H(X,Y)+\epsilon)}\leq p(x,y)\leq 2^{-n(H(X,Y)-\epsilon)}\}

with the notation that if x=(x_{1},\ldots,x_{n}), then p(x):=\prod_{i=1}^{n}p(x_{i}).
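A minimal membership test for these typical sets (our own sketch, using the product distribution p(x)=\prod_{i}p(x_{i}) as above):

```python
from math import prod

def typical(seq, p, H, n, eps):
    """Check 2^{-n(H+eps)} <= p(seq) <= 2^{-n(H-eps)}, where p(seq) is the
    product of the single-letter probabilities p[s]."""
    prob = prod(p[s] for s in seq)
    return 2 ** (-n * (H + eps)) <= prob <= 2 ** (-n * (H - eps))

# A pair (x, y) lies in A_n(eps) of (4.6) if `typical` holds for x with
# (p_X, H(X)), for y with (p_Y, H(Y)), and for zip(x, y) with (p_XY, H(X,Y)).
```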

We estimate \mathbb{P}(A_{n,1}(\epsilon)) as follows. If (X_{1},\ldots,X_{n}) is a random element of {\cal X}^{n} with \{X_{i}\} i.i.d. and each having distribution p(\cdot), then the random variable -\log{p(X_{i})} has mean H(X) and so by Chebychev’s inequality

\mathbb{P}(A^{c}_{n,1}(\epsilon)) = \mathbb{P}\left(\left|\sum_{i=1}^{n}\log{p(X_{i})}+nH(X)\right|\geq n\epsilon\right)
\leq \frac{1}{n^{2}\epsilon^{2}}\mathbb{E}\left(\sum_{i=1}^{n}\log{p(X_{i})}+nH(X)\right)^{2}
= \frac{1}{n\epsilon^{2}}\mathbb{E}\left(\log{p(X_{1})}+H(X)\right)^{2},

which converges to zero as n\rightarrow\infty. Analogous estimates hold for the sets A_{n,2}(\epsilon) and A_{n,3}(\epsilon) and so

\mathbb{P}(A_{n}(\epsilon))\geq 1-\epsilon^{2}   (4.7)

for all n large.

Step 2: Determining the set {\cal B}_{n}: We now use the set A_{n}(\epsilon) defined above to determine the set {\cal B}_{n} in the statement of the Theorem as follows. For x\in A_{n,1}(\epsilon), let

D_{n}(x,\epsilon):=\{y\in A_{n,2}(\epsilon):(x,y)\in A_{n}(\epsilon)\}.

In Figure 2, we illustrate the sets \{A_{n,i}(\epsilon)\}_{1\leq i\leq 3} and the set A_{n}(\epsilon). The rectangle EFGH denotes A_{n,1}(\epsilon)\times A_{n,2}(\epsilon) and the oval denotes A_{n,3}(\epsilon). The hatched region represents A_{n}(\epsilon). The line yz represents the set D_{n}(x,\epsilon) for a point x\in A_{n,1}(\epsilon) shown on the X-axis.

Figure 2: The set D_{n}(x,\epsilon) obtained from the sets A_{n,i}(\epsilon), 1\leq i\leq 3.

From Figure 2 we see that

\sum_{x\in A_{n,1}(\epsilon)}\left(\sum_{y\in D_{n}(x,\epsilon)}p(x,y)\right)=\sum_{(x,y)\in A_{n}(\epsilon)}p(x,y)\geq 1-\epsilon^{2}   (4.8)

by (4.7). Letting

A_{n,4}(\epsilon):=\left\{x\in A_{n,1}(\epsilon):\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\geq 1-\epsilon\right\},   (4.9)

we split the summation on the left hand side of (4.8) as L_{1}+L_{2}, where

L_{1}=\sum_{x\in A_{n,4}(\epsilon)}\left(\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\right)p(x)\leq\sum_{x\in A_{n,4}(\epsilon)}p(x)=\mathbb{P}\left(A_{n,4}(\epsilon)\right)   (4.10)

and

L_{2}=\sum_{x\in A_{n,1}(\epsilon)\setminus A_{n,4}(\epsilon)}\left(\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\right)p(x)
\leq(1-\epsilon)\sum_{x\in A_{n,1}(\epsilon)\setminus A_{n,4}(\epsilon)}p(x)
\leq(1-\epsilon)\mathbb{P}\left(A^{c}_{n,4}(\epsilon)\right).   (4.11)

Substituting (4.11) and (4.10) into (4.8) we get

1-\epsilon\cdot\mathbb{P}\left(A^{c}_{n,4}(\epsilon)\right)\geq L_{1}+L_{2}\geq 1-\epsilon^{2}

and so \mathbb{P}\left(A^{c}_{n,4}(\epsilon)\right)\leq\epsilon. Because A_{n,4}(\epsilon)\subseteq A_{n,1}(\epsilon), we therefore get that

1-\epsilon\leq\mathbb{P}\left(A_{n,4}(\epsilon)\right)=\sum_{x\in A_{n,4}(\epsilon)}p(x)\leq 2^{-n(H(X)-\epsilon)}\#A_{n,4}(\epsilon).

Setting {\cal B}_{n}=A_{n,4}(\epsilon), we then get

\#{\cal B}_{n}\geq 2^{n(H(X)-\epsilon)}\cdot(1-\epsilon)\geq 2^{n(H(X)-2\epsilon)}

for all n large.

Step 3: Using Proposition 1: For \alpha\leq H(X), we let {\cal D}_{n} be any set of size 2^{n(\alpha-2\epsilon)} contained within {\cal B}_{n}. Let G be the bipartite graph with vertex set {\cal X}_{c}\cup{\cal Y}_{c}, where {\cal X}_{c}:={\cal D}_{n} and {\cal Y}_{c}:=A_{n,2}(\epsilon), and an edge is present between x\in{\cal X}_{c} and y\in{\cal Y}_{c} if and only if (x,y)\in A_{n}(\epsilon). We now compute the sizes of the probable sets and the conflict sets, in that order.

For each x\in{\cal X}_{c}, we have by the definition (4.9) of A_{n,4}(\epsilon) that

\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\geq 1-\epsilon   (4.12)

and so we set D_{n}(x,\epsilon) to be the \epsilon-probable set corresponding to x\in{\cal D}_{n}. To estimate the size of D_{n}(x,\epsilon), we use the fact that every y\in D_{n}(x,\epsilon) satisfies (x,y)\in A_{n}(\epsilon), and so

p(y|x)=\frac{p(x,y)}{p(x)}\geq\frac{2^{-n(H(X,Y)+\epsilon)}}{2^{-n(H(X)-\epsilon)}}=2^{-n(H(Y|X)+2\epsilon)}.   (4.13)

Thus

1\geq\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\geq\#D_{n}(x,\epsilon)\cdot 2^{-n(H(Y|X)+2\epsilon)}

and consequently

\#D_{n}(x,\epsilon)\leq 2^{n(H(Y|X)+2\epsilon)}.   (4.14)

Finally, we estimate the size of the conflict set C(y,\epsilon) for each y\in{\cal Y}_{c}. Again we use the fact that if (x,y) is an edge in G, then (x,y)\in A_{n}(\epsilon) and so

p(x|y)=\frac{p(x,y)}{p(y)}\geq\frac{2^{-n(H(X,Y)+\epsilon)}}{2^{-n(H(Y)-\epsilon)}}=2^{-n(H(X|Y)+2\epsilon)}.   (4.15)

Thus

1\geq\sum_{x\in C(y,\epsilon)}p(x|y)\geq\#C(y,\epsilon)\cdot 2^{-n(H(X|Y)+2\epsilon)}

and we get that \#C(y,\epsilon)\leq 2^{n(H(X|Y)+2\epsilon)}. Using this and (4.14), we get that the conditions in Proposition 1 hold with

N_{0}=2^{n(\alpha-2\epsilon)},\quad d_{L}(\epsilon)=2^{n(H(Y|X)+2\epsilon)}\text{ and }d_{R}(\epsilon)=2^{n(H(X|Y)+2\epsilon)}.
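For completeness, we spell out the resulting threshold in (4.4):

\frac{N_{0}}{d_{L}(\epsilon)\cdot d_{R}(\epsilon)}=2^{n(\alpha-2\epsilon)-n(H(Y|X)+2\epsilon)-n(H(X|Y)+2\epsilon)}=2^{n(\alpha-H(Y|X)-H(X|Y)-6\epsilon)},

which exceeds M=2^{nR} whenever R<\alpha-H(Y|X)-H(X|Y)-6\epsilon, and in particular under the slightly stronger condition (3.2).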

If M=2^{nR} with R<\alpha-H(X|Y)-H(Y|X)-7\epsilon, then (4.4) holds and so there exists a code containing M=2^{nR} words from {\cal D}_{n} with an error probability of at most \epsilon under the conflict set decoder.

Acknowledgements

I thank Professors Rajesh Sundaresan, C. R. Subramanian and the referees for crucial comments that led to an improvement of the paper. I also thank IMSc for my fellowships.

References

  • [1] Alon, N. and Spencer, J., The Probabilistic Method, Wiley Interscience (2008).
  • [2] Bandemer, B., El Gamal, A. and Kim, Y-H., Optimal Achievable Rates for Interference Networks with Random Codes, IEEE Transactions on Information Theory, 61, 6536–6549 (2015).
  • [3] Cover, T. and Thomas, J., Elements of Information Theory, Wiley (2006).
  • [4] Csiszár, I. and Körner, J., Graph Decomposition: A New Key to Coding Theorems, IEEE Transactions on Information Theory, 27, 5–12 (1981).
  • [5] Csiszár, I., The Method of Types, IEEE Transactions on Information Theory, 44, pp. 2505–2523 (1998).
  • [6] Huffman, W. C. and Pless, V., Fundamentals of Error Correcting Codes, Cambridge University Press (2003).
  • [7] Zamir, R., Lattice Coding for Signals and Networks, Cambridge University Press (2014).