
Institute of Mathematical Sciences, HBNI, Chennai
Email: gganesan82@gmail.com

Achieving positive rates with predetermined dictionaries

Ghurumuruhan Ganesan
Abstract

In the first part of the paper we consider binary input channels that are not necessarily stationary and show how positive rates can be achieved using codes constrained to be within predetermined dictionaries. We use a Gilbert-Varshamov-like argument to obtain the desired rate achieving codes. Next we study the corresponding problem for channels with arbitrary alphabets and use conflict-set decoding to show that if the dictionaries are contained within “nice” sets, then positive rates are achievable.

Key words: Positive rates, predetermined dictionaries.

AMS 2000 Subject Classification: Primary: 94A15, 94A24.


1 Introduction

Achieving positive rates with low probability of error in communication channels is an important problem in information theory [3]. In general, a rate R is defined to be achievable if there exist codes of rate R with arbitrarily small error probability as the code length n\rightarrow\infty. The existence of such codes is typically established through the probabilistic method: choose a random code (from the set of all possible codes) and show that the chosen code has small error probability.

In many cases of interest, we would like to select codes satisfying certain constraints or equivalently from a predetermined dictionary (see [2, 7] for examples). For stationary channels, the method of types [4, 5] can be used to study positive rate achievability with the restriction that the dictionary falls within the set of words belonging to a particular type. In this paper, we study achievability of positive rates with arbitrary deterministic dictionaries for both binary and general input channels using counting techniques.

The paper is organized as follows: In Section 2, we study positive rate achievability in binary input channels using predetermined dictionaries. Next, in Section 3, we describe the rate achievability problem for arbitrary stationary channels and state our main result, Theorem 3.1, regarding achieving positive rates using given dictionaries. Finally, in Section 4, we prove Theorem 3.1.

2 Binary Channels

For integer n\geq 1, an element of the set \{0,1\}^{n} is said to be a codeword, or simply a word, of length n. Consider a discrete memoryless symmetric channel with input alphabet \{0,1\}^{n} that corrupts a transmitted word \mathbf{x}=(x_{1},\ldots,x_{n}) as follows. If \mathbf{Y}=(Y_{1},\ldots,Y_{n}) is the received (random) word, then

Y_{i}:=x_{i}\,\mathbf{1}(W_{i}=0)+(1-x_{i})\,\mathbf{1}(W_{i}=1)+\varepsilon\,\mathbf{1}(W_{i}=\varepsilon)   (2.1)

for all 1\leq i\leq n, where \mathbf{1}(\cdot) denotes the indicator function and \varepsilon denotes the erasure symbol. If W_{i}=1, then the bit x_{i} is substituted, and if W_{i}=\varepsilon, then x_{i} is erased. The random variables \{W_{i}\}_{1\leq i\leq n} are independent with

\mathbb{P}(W_{i}=1)=p_{f}(i)\text{ and }\mathbb{P}(W_{i}=\varepsilon)=p_{e}(i)   (2.2)

and so the probability of a bit error (due to either a substituted bit or an erased bit) at “time” index i is p_{f}(i)+p_{e}(i). Letting \mathbf{W}:=(W_{1},\ldots,W_{n}), we also write \mathbf{Y}=:h(\mathbf{x},\mathbf{W}), where h is the deterministic function defined via (2.1). We are interested in communicating through the above described channel, with low probability of error, using words from a predetermined (deterministic) dictionary.
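For intuition, here is a minimal simulation sketch of the channel (2.1); the function name and the use of the string "e" for the erasure symbol are our own choices and not part of the model.

```python
import random

def channel(x, p_f, p_e, erasure="e"):
    """Simulate (2.1): bit x[i] is independently substituted with probability
    p_f[i], erased with probability p_e[i], and received intact otherwise."""
    y = []
    for xi, pf, pe in zip(x, p_f, p_e):
        u = random.random()
        if u < pf:            # W_i = 1: substitution
            y.append(1 - xi)
        elif u < pf + pe:     # W_i = erasure symbol
            y.append(erasure)
        else:                 # W_i = 0: bit received correctly
            y.append(xi)
    return y

# Example with n = 8, constant p_f(i) = 0.05 and p_e(i) = 0.1.
print(channel([0, 1, 1, 0, 1, 0, 0, 1], [0.05] * 8, [0.1] * 8))
```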

Dictionaries

A dictionary of size M is a set {\cal D}\subseteq\{0,1\}^{n} of cardinality \#{\cal D}=M. A subset {\cal C}=\{\mathbf{x}_{1},\ldots,\mathbf{x}_{L}\}\subseteq{\cal D} is said to be an n-length code of size L, contained in the dictionary {\cal D}. Suppose we transmit a word picked from {\cal C} through the channel given by (2.1) and receive the (random) word \mathbf{Y}. Given \mathbf{Y}, we would like an estimate \hat{\mathbf{x}} of the word from {\cal C} that was transmitted. A decoder g:\{0,1,\varepsilon\}^{n}\rightarrow{\cal C} is a deterministic map that uses the received word \mathbf{Y} to obtain an estimate of the transmitted word. The probability of error corresponding to the code {\cal C} and the decoder g is then defined as

q({\cal C},g):=\max_{1\leq i\leq L}\mathbb{P}\left(g\left(h\left(\mathbf{x}_{i},\mathbf{W}\right)\right)\neq\mathbf{x}_{i}\right),   (2.3)

where \mathbf{W}=(W_{1},\ldots,W_{n}) is the additive noise as described in (2.1).

We have the following definition regarding achievable rates using predetermined dictionaries.

Definition 1

Let R>0 and let {\cal F}:=\{{\cal D}_{n}\}_{n\geq 1} be any sequence of dictionaries such that each {\cal D}_{n} has size at least 2^{nR}. We say that R is an {\cal F}-achievable rate if the following holds true for every \epsilon>0: for all n large, there exists a code {\cal C}_{n}\subset{\cal D}_{n} of size \#{\cal C}_{n}=2^{nR} and a decoder g_{n} such that the probability of error satisfies q({\cal C}_{n},g_{n})<\epsilon.

If {\cal D}_{n}=\{0,1\}^{n} for each n, then the above reduces to the usual concept of rate achievability as in [3], and we simply say that R is achievable.

For 0<x<1 we define the entropy function

H(x):=-x\log{x}-(1-x)\log(1-x),   (2.4)

where all logarithms in this section are to the base two. We have the following result.

Theorem 2.1

For integer n\geq 1, let

\mu_{f}=\mu_{f}(n):=\sum_{i=1}^{n}p_{f}(i)\text{ and }\mu_{e}=\mu_{e}(n):=\sum_{i=1}^{n}p_{e}(i)

be the expected numbers of bit substitutions and erasures, respectively, in an n-length codeword, and suppose

\min\left(\mu_{f}(n),\mu_{e}(n)\right)\longrightarrow\infty\text{ and }p:=\limsup_{n}\frac{1}{n}\left(2\mu_{f}(n)+\mu_{e}(n)\right)<\frac{1}{2}.   (2.5)

Let H(p)<\alpha\leq 1 and let {\cal F}:=\{{\cal D}_{n}\}_{n\geq 1} be any sequence of dictionaries satisfying \#{\cal D}_{n}\geq 2^{\alpha n} for each n. Then every R<\alpha-H(p) is {\cal F}-achievable.

For a given \alpha, let p(\alpha) be the largest value of p such that H(p)<\alpha. The above result says that every R<\alpha-H(p) is achievable using arbitrary dictionaries. We use Gilbert-Varshamov-like arguments to prove Theorem 2.1 below.

As a special case, for binary symmetric channels with crossover probability p_{f}, each bit is independently substituted with probability p_{f}. No erasures occur and so

\mu_{f}(n)=np_{f}\text{ and }\mu_{e}(n)=0.

Thus p=2p_{f}, and from Theorem 2.1 we have that if H(2p_{f})<\alpha, then every R<\alpha-H(2p_{f}) is achievable.
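As a quick numerical illustration (the numbers below are our own and not from the paper), the guaranteed rate \alpha-H(2p_{f}) can be evaluated directly:

```python
from math import log2

def H(x):
    """Binary entropy function (2.4), logarithms to the base two."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

# Binary symmetric channel with crossover probability p_f = 0.01 and a
# dictionary of rate alpha = 0.5: any rate below alpha - H(2 * p_f) is achievable.
p_f, alpha = 0.01, 0.5
print(alpha - H(2 * p_f))   # approximately 0.359
```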

Proof of Theorem 2.1

The main idea of the proof is as follows. Using standard deviation estimates, we first obtain an upper bound on the number of possible errors that could occur in a transmitted word. More specifically, if T denotes the number of bit errors in an n-length word and \epsilon>0 is given, we use standard deviation estimates to determine T_{0}=T_{0}(n) such that \mathbb{P}(T>T_{0})\leq\epsilon. We then use a Gilbert-Varshamov argument to obtain a code that can correct up to T_{0} bit errors. The details are described below.

We prove the Theorem in two steps. In the first step, we construct the code {\cal C} and the decoder g, and in the second step, we estimate the probability of decoding error for {\cal C} using g. For \mathbf{x},\mathbf{y}\in\{0,1\}^{n}, we let d_{H}(\mathbf{x},\mathbf{y})=\sum_{i=1}^{n}\mathbf{1}(x_{i}\neq y_{i}) be the Hamming distance between \mathbf{x} and \mathbf{y}, where as before \mathbf{1}(\cdot) denotes the indicator function. The minimum distance of a code is the minimum Hamming distance between any two distinct words in the code.

Step 1: Assume for simplicity that t:=np(1+2\epsilon) is an integer and let d=t+1. For a word \mathbf{x}, let B_{d-1}(\mathbf{x}) be the set of words that are at a Hamming distance of at most d-1 from \mathbf{x}. If {\cal C}\subseteq{\cal D} is a maximum size code with minimum distance at least d, then by the maximality of {\cal C} we must have

\bigcup_{\mathbf{x}\in{\cal C}}B_{d-1}(\mathbf{x})\supseteq{\cal D}.   (2.6)

This is known as the Gilbert-Varshamov argument [6].
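The following sketch (illustrative only; the names are ours) makes the Gilbert-Varshamov argument concrete: a single greedy pass over the dictionary produces a maximal code of minimum distance d, which automatically has the covering property (2.6).

```python
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def gv_code(dictionary, d):
    """Greedily build a maximal code within `dictionary` with minimum
    Hamming distance at least d; by maximality, every dictionary word then
    lies within distance d - 1 of some codeword."""
    code = []
    for word in dictionary:
        if all(hamming(word, c) >= d for c in code):
            code.append(word)
    return code

# Toy example: the full dictionary {0,1}^5 with d = 3; the counting bound
# below gives at least 32 / (1 + 5 + 10) = 2 codewords.
print(len(gv_code(list(product([0, 1], repeat=5)), 3)))
```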

The cardinality of {\cal D} is at least 2^{\alpha n} and the cardinality of B_{d-1}(\mathbf{x}) is \sum_{i=0}^{d-1}{n\choose i}, and so from (2.6) we see that the code {\cal C} has size

\#{\cal C}\geq\frac{2^{\alpha n}}{\sum_{i=0}^{d-1}{n\choose i}}   (2.7)

and minimum distance at least d. Also, since p<\frac{1}{2}, we have for all small \epsilon>0 that {n\choose i}\leq{n\choose d-1}={n\choose np(1+2\epsilon)} for 0\leq i\leq d-1, and so \sum_{i=0}^{d-1}{n\choose i}\leq n\cdot{n\choose np(1+2\epsilon)}. Using the Stirling approximation we get

{n\choose np(1+2\epsilon)}\leq 4en\cdot 2^{nH(p+2p\epsilon)}

and so from (2.7), we get for any \delta>0 that

\#{\cal C}\geq\frac{1}{4en^{2}}\cdot 2^{n(\alpha-H(p+2p\epsilon))}\geq 2^{n(\alpha-H(p)-\delta)}   (2.8)

provided \epsilon>0 is small and n is large.

We now use a two stage decoder described as follows. Suppose the received word is \mathbf{Y} and, for simplicity, suppose that the last e positions in \mathbf{Y} have been erased. For a codeword \mathbf{x}=(x_{1},\ldots,x_{n}), let \mathbf{x}_{red}:=(x_{1},\ldots,x_{n-e}) be the reduced word formed by the first n-e bits. Let {\cal C}_{red}=\{\mathbf{x}_{red}:\mathbf{x}\in{\cal C}\} be the set of all reduced codewords in the code {\cal C} formed by the first n-e bits.

In the first stage of the decoding process, the decoder corrects bit substitutions by collecting the set {\cal S}\subseteq{\cal C}_{red} of all reduced codewords whose Hamming distance from \mathbf{Y}_{red}, the word formed by the first n-e (unerased) bits of \mathbf{Y}, is minimum. If {\cal S} contains exactly one word, say \mathbf{z}_{red}, the decoder outputs \mathbf{z}_{red} as the estimate obtained in the first stage. Otherwise, the decoder outputs “decoding error”. In the second stage of the decoding process, the decoder uses \mathbf{z}_{red} to correct the erasures. Formally, let {\cal S}_{e}:=\{\mathbf{x}\in{\cal C}:\mathbf{x}_{red}=\mathbf{z}_{red}\} be the set of all codewords whose first n-e bits match \mathbf{z}_{red}. If there exists exactly one word \mathbf{z} in {\cal S}_{e}, then the decoder outputs \mathbf{z} as the transmitted word. Else the decoder outputs “decoding error”.
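A minimal sketch of this two-stage decoder (our own variable names; erasures are marked by the placeholder "e"):

```python
def two_stage_decode(y, code, erasure="e"):
    """Two-stage decoder from the proof of Theorem 2.1: minimum Hamming
    distance on the unerased positions, then recovery of the erased bits."""
    keep = [i for i, yi in enumerate(y) if yi != erasure]   # unerased positions

    # Stage 1: reduced codewords at minimum distance from the reduced received word.
    dist = lambda c: sum(c[i] != y[i] for i in keep)
    best = min(dist(c) for c in code)
    reduced = {tuple(c[i] for i in keep) for c in code if dist(c) == best}
    if len(reduced) != 1:
        return None                      # decoding error
    z_red = reduced.pop()

    # Stage 2: exactly one codeword must agree with z_red on the unerased positions.
    matches = [c for c in code if tuple(c[i] for i in keep) == z_red]
    return matches[0] if len(matches) == 1 else None
```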

Step 2: Suppose a word \mathbf{x}\in{\cal C} was transmitted and the received word is \mathbf{Y}. Let \mathbf{W}=(W_{1},\ldots,W_{n}) be the random noise vector as in (2.1) and let

T_{f}:=\sum_{i=1}^{n}\mathbf{1}(W_{i}=1)

be the number of bits that have been substituted so that

\mathbb{E}T_{f}=\sum_{i=1}^{n}p_{f}(i)=\mu_{f}(n),

by (2.2). By standard deviation estimates (Corollary A.1.14, pp. 312, [1]) we have

\mathbb{P}\left(|T_{f}-\mu_{f}(n)|\geq\epsilon\mu_{f}(n)\right)\leq 2e^{-\frac{\epsilon^{2}}{4}\mu_{f}(n)}\leq\frac{\epsilon}{2}   (2.9)

for all n large, by the first condition of (2.5). Similarly, if T_{e}=\sum_{i=1}^{n}\mathbf{1}(W_{i}=\varepsilon) is the number of erased bits, then

\mathbb{P}\left(|T_{e}-\mu_{e}(n)|\geq\epsilon\mu_{e}(n)\right)\leq 2e^{-\frac{\epsilon^{2}}{4}\mu_{e}(n)}\leq\frac{\epsilon}{2}   (2.10)

for all n large.

Next, using the second condition of (2.5) we have that

(2\mu_{f}(n)+\mu_{e}(n))(1+\epsilon)\leq np(1+2\epsilon)=t

for all n large, and so from (2.9) and (2.10) we get that \mathbb{P}\left(2T_{f}+T_{e}\geq t\right)\leq\epsilon for all n large. If 2T_{f}+T_{e}\leq t, then, because the minimum distance of {\cal C} is at least d=t+1, the decoder by construction outputs \mathbf{x} as the estimate of the transmitted word. Therefore a decoding error occurs only if 2T_{f}+T_{e}\geq t, which happens with probability at most \epsilon. Combining with (2.8) and using the fact that \delta>0 is arbitrary, we get that every R<\alpha-H(p) is {\cal F}-achievable.

3 General channels

Consider a discrete memoryless channel with finite input alphabet {\cal X} of size N:=\#{\cal X}, a finite output alphabet {\cal Y} and transition probabilities p_{Y|X}(y|x), x\in{\cal X}, y\in{\cal Y}. The term p_{Y|X}(y|x) denotes the probability that the output y is observed given that the input x is transmitted through the channel.

For n\geq 1 we define a subset {\cal D}_{n}\subseteq{\cal X}^{n} to be a dictionary. A subset {\cal C}=\{x_{1},\ldots,x_{M}\}\subseteq{\cal D}_{n} is defined to be an n-length code contained within the dictionary {\cal D}_{n}. Suppose we transmit the word x_{1} and receive the (random) word \Gamma_{x_{1}}\in{\cal Y}^{n}. Given \Gamma_{x_{1}}, we would like an estimate \hat{x} of the word from {\cal C} that was transmitted. A decoder g:{\cal Y}^{n}\rightarrow{\cal C} is a deterministic map that “guesses” the transmitted word based on the received word \Gamma_{x_{1}}. We denote the probability of error corresponding to the code {\cal C} and the decoder g as

q({\cal C},g):=\max_{x\in{\cal C}}\mathbb{P}\left(g(\Gamma_{x})\neq x\right).   (3.1)

To study positive rate achievability using arbitrary dictionaries, we need a couple of preliminary definitions. Let p_{X}(\cdot) be any probability distribution on the input alphabet {\cal X} and let H(X):=-\sum_{x\in{\cal X}}p_{X}(x)\log{p_{X}(x)} be the entropy of a random variable X with distribution p_{X}, where the logarithm here is to the base N. Let Y be a random variable having joint distribution p_{XY}(x,y) with the random variable X, defined by p_{XY}(x,y):=p_{Y|X}(y|x)\cdot p_{X}(x). Thus Y is the random output of the channel when the input is X. Letting p_{Y}(y):=\sum_{x}p_{XY}(x,y) be the marginal of Y, we have that the joint entropy and conditional entropy [3] are respectively given by

H(X,Y)=-\sum_{x,y}p_{XY}(x,y)\log{p_{XY}(x,y)}

and

H(Y|X)=-\sum_{x,y}p_{XY}(x,y)\log{p_{Y|X}(y|x)}.
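For concreteness, here is a small sketch (our own helper, not from the paper) that computes these entropies from p_{X} and p_{Y|X}, with logarithms to the base N=\#{\cal X} as above:

```python
from math import log

def entropies(p_X, p_YgX):
    """Return H(X), H(Y), H(X,Y), H(Y|X), H(X|Y) with logarithms to the base
    N = |X|, given the input distribution p_X[x] and transitions p_YgX[x][y]."""
    N = len(p_X)
    lg = lambda t: log(t, N)
    p_XY = {(x, y): p_X[x] * p_YgX[x][y]
            for x in p_X for y in p_YgX[x] if p_X[x] * p_YgX[x][y] > 0}
    p_Y = {}
    for (x, y), pr in p_XY.items():
        p_Y[y] = p_Y.get(y, 0.0) + pr
    H_X = -sum(pr * lg(pr) for pr in p_X.values() if pr > 0)
    H_Y = -sum(pr * lg(pr) for pr in p_Y.values())
    H_XY = -sum(pr * lg(pr) for pr in p_XY.values())
    return {"H(X)": H_X, "H(Y)": H_Y, "H(X,Y)": H_XY,
            "H(Y|X)": H_XY - H_X, "H(X|Y)": H_XY - H_Y}

# Binary asymmetric channel (see the example below) with p0 = p1 = 0.1:
print(entropies({0: 0.5, 1: 0.5}, {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}))
```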

The following result obtains positive rates achievable with predetermined dictionaries for the channel described above.

Theorem 3.1

Let p_{X},p_{Y} and p_{XY} be as above and let 0<\alpha\leq H(X). For every \epsilon>0 and for all n large, there is a deterministic set {\cal B}_{n} with size at least N^{n(H(X)-2\epsilon)} and satisfying the following property: if {\cal D}_{n} is any subset of {\cal B}_{n} with cardinality N^{n(\alpha-2\epsilon)} and

R<\alpha-H(Y|X)-H(X|Y)-7\epsilon   (3.2)

is positive, then there exists a code {\cal C}_{n}\subset{\cal D}_{n} containing N^{nR} words and a decoder g_{n} with error probability q\left({\cal C}_{n},g_{n}\right)<\epsilon.

Thus if the sequence of dictionaries {\cal F}:=\{{\cal D}_{n}\}_{n\geq 1} is such that {\cal D}_{n}\subset{\cal B}_{n} for each n, then every R<\alpha-H(Y|X)-H(X|Y) is {\cal F}-achievable. Also, setting \alpha=H(X) and {\cal D}_{n}={\cal B}_{n} gives us that every R<H(X)-H(X|Y)-H(Y|X) is achievable in the usual sense of [3], without any restrictions on the dictionaries. For context, we remark that Theorem 2.1 holds for arbitrary dictionaries.

To prove Theorem 3.1, we use typical sets [3] together with conflict set decoding described in the next section. Before we do so, we present an example to illustrate Theorem 3.1.

Example

Consider a binary asymmetric channel with alphabets {\cal X}={\cal Y}=\{0,1\} and transition probabilities

p(1|0)=p_{0}=1-p(0|0)\text{ and }p(0|1)=p_{1}=1-p(1|1).

To apply Theorem 3.1, we assume that the input has the symmetric distribution \mathbb{P}(X_{i}=0)=\frac{1}{2}=\mathbb{P}(X_{i}=1), so that the entropy H(X) equals its maximum value of 1. The entropy of the output is H(Y)=H(q), where q=\frac{1-p_{0}+p_{1}}{2}, and the conditional entropies equal

H(Y|X)=\frac{1}{2}\left(H(p_{0})+H(p_{1})\right)\text{ and }H(X|Y)=\frac{1}{2}\left(H(p_{0})+H(p_{1})\right)+1-H(q).

Set p_{0}=p and p_{1}=p+\Delta. If both p and \Delta are small, then H(q) is close to one and H(p_{0}) and H(p_{1}) are close to zero. We assume that p and \Delta are such that

\alpha_{0}:=H(Y|X)+H(X|Y)=H(p)+H(p+\Delta)+1-H\left(\frac{1-\Delta}{2}\right)

is strictly less than one and choose \alpha>\alpha_{0}. Every R<\alpha-\alpha_{0} is then {\cal F}-achievable as in the statement following Theorem 3.1, and every R<1-\alpha_{0} is achievable without any dictionary restrictions, in the usual sense of [3].
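The quantity 1-\alpha_{0} is easy to evaluate numerically; the sketch below (our own helper, reproducing the quantity plotted in Figure 1) illustrates the behaviour for \Delta=0.05.

```python
from math import log2

def H(x):
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def rate_bound(p, delta):
    """1 - alpha_0 for the binary asymmetric channel with p0 = p, p1 = p + delta."""
    alpha_0 = H(p) + H(p + delta) + 1 - H((1 - delta) / 2)
    return 1 - alpha_0

print(rate_bound(0.01, 0.05))   # positive: positive rates are guaranteed
print(rate_bound(0.10, 0.05))   # negative: the bound gives no guarantee here
```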

In Figure 1, we plot 1-\alpha_{0} as a function of p for various values of the asymmetry factor \Delta. For example, for an asymmetry factor of \Delta=0.05 we see that positive rates are achievable for p roughly up to 0.08.

Figure 1: The rate 1-\alpha_{0} as a function of p for various values of the asymmetry factor \Delta.

4 Proof of Theorem 3.1

We use conflict set decoding to prove Theorem 3.1. Therefore in the first part of this section, we prove an auxiliary result regarding conflict set decoding that is also of independent interest.

4.1 Conflict set decoding

Consider a discrete memoryless channel with finite input alphabet {\cal X}_{0}, finite output alphabet {\cal Y}_{0} and transition probabilities p_{0}(y|x), x\in{\cal X}_{0}, y\in{\cal Y}_{0}. For convenience, we define the channel by a collection of random variables \theta_{x}, x\in{\cal X}_{0}, with the distribution \mathbb{P}\left(\theta_{x}=y\right):=p_{0}(y|x) for y\in{\cal Y}_{0}. All random variables are defined on the probability space (\Omega,{\cal F},\mathbb{P}). For \epsilon>0, x\in{\cal X}_{0} and y\in{\cal Y}_{0}, we let D(x,\epsilon) and C(y,\epsilon) be deterministic sets such that

\mathbb{P}\left(\theta_{x}\in D(x,\epsilon)\right)\geq 1-\epsilon\text{ and }C(y,\epsilon)=\{x:y\in D(x,\epsilon)\}.   (4.1)

We call D(x,\epsilon) an \epsilon-probable set, or simply a probable output set, corresponding to the input x, and for y\in{\cal Y}_{0} we call C(y,\epsilon) the \epsilon-conflict set, or simply the conflict set, corresponding to the output y. There are many possible choices for D(x,\epsilon); for example, D(x,\epsilon)={\cal Y}_{0} is one choice. In Proposition 1 below, we show however that choosing the \epsilon-probable sets as small as possible allows us to increase the size of the desired code. We also define

d_{L}(\epsilon):=\max_{x\in{\cal X}_{0}}\#D(x,\epsilon)\text{ and }d_{R}(\epsilon):=\max_{y\in{\cal Y}_{0}}\#C(y,\epsilon)   (4.2)

where \#A denotes the cardinality of the set A.

As before, a code {\cal C} of size M is a set of distinct words \{x_{1},\ldots,x_{M}\}\subseteq{\cal X}_{0}. Suppose we transmit the word x_{1} and receive the (random) word \theta_{x_{1}}. Given \theta_{x_{1}}, we would like an estimate \hat{x} of the word from {\cal C} that was transmitted. A decoder g:{\cal Y}_{0}\rightarrow{\cal C} is a deterministic map that guesses the transmitted word based on the received word \theta_{x_{1}}. We denote the probability of error corresponding to the code {\cal C}, the decoder g and the collection of probable sets {\cal D}:=\{D(x,\epsilon)\}_{x\in{\cal X}_{0}} as

q({\cal C},g,{\cal D}):=\max_{x\in{\cal C}}\mathbb{P}\left(g(\theta_{x})\neq x\right).   (4.3)

We have the following Proposition.

Proposition 1

For \epsilon>0, let {\cal D}=\{D(x,\epsilon)\}_{x\in{\cal X}_{0}} be any collection of \epsilon-probable sets. If there exists an integer M satisfying

M<\frac{\#{\cal X}_{0}}{d_{L}(\epsilon)\cdot d_{R}(\epsilon)},   (4.4)

then there exists a code {\cal C}\subseteq{\cal X}_{0} of size M and a decoder g whose decoding error probability satisfies q({\cal C},g,{\cal D})<\epsilon.

Thus as long as the number of words is below a certain threshold, we are guaranteed that the error probability is sufficiently small. Also, from (4.4) we see that it would be better to choose probable sets with as small cardinality as possible.

Proof of Proposition 1

Code construction: We recall that, by definition, given the input x, the output \theta_{x} belongs to the set D(x,\epsilon) with probability at least 1-\epsilon. We first construct a code {\cal C}=\{x_{1},\ldots,x_{M}\} containing M distinct words and satisfying

D(x_{i},\epsilon)\cap D(x_{j},\epsilon)=\emptyset\text{ for all distinct }x_{i},x_{j}\in{\cal C}.   (4.5)

Throughout we assume that M satisfies (4.4). To obtain the desired distinct words, we use the following bipartite graph representation. Let G=G(\epsilon) be a bipartite graph with vertex set {\cal X}_{0}\cup{\cal Y}_{0}. We join x\in{\cal X}_{0} and y\in{\cal Y}_{0} by an edge if and only if y\in D(x,\epsilon). The size of D(x,\epsilon) therefore equals the degree of the vertex x, and the size of C(y,\epsilon) equals the degree of the vertex y. By definition (see (4.2)), d_{L}=\max_{x\in{\cal X}_{0}}\#D(x,\epsilon) and d_{R}=\max_{y\in{\cal Y}_{0}}\#C(y,\epsilon) denote the maximum degree of a left vertex and a right vertex, respectively, in G. We say that a set of vertices \{x_{1},\ldots,x_{M}\}\subseteq{\cal X}_{0} is disjoint if for all i\neq j, the vertices x_{i} and x_{j} have no common neighbour (in {\cal Y}_{0}). Constructing codes with disjoint \epsilon-probable sets satisfying (4.5) is therefore equivalent to finding disjoint sets of vertices in {\cal X}_{0}.

We now use direct counting to get a set of M disjoint vertices \{x_{1},\ldots,x_{M}\} in {\cal X}_{0}. First we pick any vertex x_{1}\in{\cal X}_{0}. The degree of x_{1} is at most d_{L} and moreover, each vertex in D(x_{1},\epsilon)\subseteq{\cal Y}_{0} has at most d_{R} neighbours in {\cal X}_{0}. The total number of (bad) vertices of {\cal X}_{0} adjacent to some vertex in D(x_{1},\epsilon) is therefore at most d_{L}\cdot d_{R}. Removing all these bad vertices, we are left with a bipartite subgraph G_{1} of G whose left vertex set has size at least N_{0}-d_{L}\cdot d_{R}, where N_{0}=\#{\cal X}_{0}. We now pick one vertex in the left vertex set of G_{1} and continue the above procedure. After the i^{th} step, the number of left vertices remaining is at least N_{0}-i\cdot d_{L}\cdot d_{R}, and so from (4.4) we get that this process continues for at least M steps. The words corresponding to the vertices \{x_{1},\ldots,x_{M}\} form our code {\cal C}.

Decoder definition: Let {\cal C} be the code constructed above. For decoding, we use the conflict-set decoder defined as follows: if y\in D(x_{j},\epsilon) for some x_{j}\in{\cal C} and the conflict set C(y,\epsilon) does not contain any word of {\cal C}\setminus\{x_{j}\}, then we set g(y)=x_{j}. Otherwise, we set g(y) to be an arbitrary value; for concreteness, we set g(y)=x_{1}.
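A minimal sketch of the code construction and the conflict-set decoder above (illustrative; `D` is assumed to map each word of {\cal X}_{0} to its \epsilon-probable output set, and the names are ours):

```python
def build_code(X0, D, M):
    """Greedily pick words whose probable sets D[x] are pairwise disjoint,
    as in the counting argument above; stops once M words are found."""
    code, used_outputs = [], set()
    for x in X0:
        if len(code) == M:
            break
        if used_outputs.isdisjoint(D[x]):        # no common neighbour in Y0
            code.append(x)
            used_outputs |= set(D[x])
    return code

def conflict_set_decode(y, code, D):
    """Output x_j if y lies in D[x_j] and the conflict set of y contains no
    other codeword; otherwise fall back to the first codeword."""
    hits = [x for x in code if y in D[x]]
    return hits[0] if len(hits) == 1 else code[0]
```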

We claim that the probability of error of the conflict-set decoder is at most \epsilon. To see this, suppose we transmit the word x_{i}. With probability at least 1-\epsilon, the corresponding output \theta_{x_{i}} belongs to D(x_{i},\epsilon). Because (4.5) holds, we must then have \theta_{x_{i}}\notin D(x_{k},\epsilon) for any k\neq i, so that the conflict set of \theta_{x_{i}} contains no codeword other than x_{i}. This implies that the conflict-set decoder outputs the correct word x_{i} with probability at least 1-\epsilon.

We now prove Theorem 3.1 using typical sets and conflict set decoding.

4.2 Proof of Theorem 3.1

For notational simplicity we prove Theorem 3.1 with {\cal X}={\cal Y}=\{0,1\}. An analogous analysis holds for the general case.

The proof consists of three steps. In the first step, we define and estimate the occurrence of certain typical sets. In the next step, we use the typical sets constructed in Step 1 to determine the set {\cal B}_{n} in the statement of the Theorem. Finally, we use Proposition 1 to obtain the bound (3.2) on the rates.
Step 1: Typical sets: We define the typical set

A_{n}(\epsilon)=\left(A_{n,1}(\epsilon)\times A_{n,2}(\epsilon)\right)\bigcap A_{n,3}(\epsilon)   (4.6)

where

A_{n,1}(\epsilon)=\{x\in{\cal X}^{n}:2^{-n(H(X)+\epsilon)}\leq p(x)\leq 2^{-n(H(X)-\epsilon)}\},
A_{n,2}(\epsilon)=\{y\in{\cal Y}^{n}:2^{-n(H(Y)+\epsilon)}\leq p(y)\leq 2^{-n(H(Y)-\epsilon)}\}

and

A_{n,3}(\epsilon)=\{(x,y)\in{\cal X}^{n}\times{\cal Y}^{n}:2^{-n(H(X,Y)+\epsilon)}\leq p(x,y)\leq 2^{-n(H(X,Y)-\epsilon)}\}

with the notation that if x=(x_{1},\ldots,x_{n}), then p(x):=\prod_{i=1}^{n}p(x_{i}).
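A minimal membership test for these typical sets (our own sketch, using the product distribution p(x)=\prod_{i}p(x_{i}) as above):

```python
from math import prod

def typical(seq, p, H, n, eps):
    """Check 2^{-n(H+eps)} <= p(seq) <= 2^{-n(H-eps)}, where p(seq) is the
    product of the single-letter probabilities p[s]."""
    prob = prod(p[s] for s in seq)
    return 2 ** (-n * (H + eps)) <= prob <= 2 ** (-n * (H - eps))

# A pair (x, y) lies in A_n(eps) of (4.6) if `typical` holds for x with
# (p_X, H(X)), for y with (p_Y, H(Y)), and for zip(x, y) with (p_XY, H(X,Y)).
```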

We estimate \mathbb{P}(A_{n,1}(\epsilon)) as follows. If (X_{1},\ldots,X_{n}) is a random element of {\cal X}^{n} with \{X_{i}\} i.i.d. and each having distribution p(\cdot), then the random variable -\log{p(X_{i})} has mean H(X) and so by Chebychev’s inequality

\mathbb{P}(A^{c}_{n,1}(\epsilon)) = \mathbb{P}\left(\left|\sum_{i=1}^{n}\log{p(X_{i})}+nH(X)\right|\geq n\epsilon\right)
\leq \frac{1}{n^{2}\epsilon^{2}}\mathbb{E}\left(\sum_{i=1}^{n}\log{p(X_{i})}+nH(X)\right)^{2}
= \frac{1}{n\epsilon^{2}}\mathbb{E}\left(\log{p(X_{1})}+H(X)\right)^{2},

which converges to zero as n\rightarrow\infty. Analogous estimates hold for the sets A_{n,2}(\epsilon) and A_{n,3}(\epsilon) and so

\mathbb{P}(A_{n}(\epsilon))\geq 1-\epsilon^{2}   (4.7)

for all n large.

Step 2: Determining the set {\cal B}_{n}: We now use the set A_{n}(\epsilon) defined above to determine the set {\cal B}_{n} in the statement of the Theorem as follows. For x\in A_{n,1}(\epsilon), let

D_{n}(x,\epsilon):=\{y\in A_{n,2}(\epsilon):(x,y)\in A_{n}(\epsilon)\}.

In Figure 2, we illustrate the sets \{A_{n,i}(\epsilon)\}_{1\leq i\leq 3} and the set A_{n}(\epsilon). The rectangle EFGH denotes A_{n,1}(\epsilon)\times A_{n,2}(\epsilon) and the oval denotes A_{n,3}(\epsilon). The hatched region represents A_{n}(\epsilon). The line yz represents the set D_{n}(x,\epsilon) for a point x\in A_{n,1}(\epsilon) shown on the X-axis.

Figure 2: The set D_{n}(x,\epsilon) obtained from the sets A_{n,i}(\epsilon), 1\leq i\leq 3.

From Figure 2 we see that

\sum_{x\in A_{n,1}(\epsilon)}\left(\sum_{y\in D_{n}(x,\epsilon)}p(x,y)\right)=\sum_{(x,y)\in A_{n}(\epsilon)}p(x,y)\geq 1-\epsilon^{2}   (4.8)

by (4.7). Letting

A_{n,4}(\epsilon):=\left\{x\in A_{n,1}(\epsilon):\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\geq 1-\epsilon\right\},   (4.9)

we split the summation on the left hand side of (4.8) as L_{1}+L_{2}, where

L_{1}=\sum_{x\in A_{n,4}(\epsilon)}\left(\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\right)p(x)\leq\sum_{x\in A_{n,4}(\epsilon)}p(x)=\mathbb{P}\left(A_{n,4}(\epsilon)\right)   (4.10)

and

L_{2}=\sum_{x\in A_{n,1}(\epsilon)\setminus A_{n,4}(\epsilon)}\left(\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\right)p(x)
\leq(1-\epsilon)\sum_{x\in A_{n,1}(\epsilon)\setminus A_{n,4}(\epsilon)}p(x)
\leq(1-\epsilon)\mathbb{P}\left(A^{c}_{n,4}(\epsilon)\right).   (4.11)

Substituting (4.11) and (4.10) into (4.8) we get

1-\epsilon\cdot\mathbb{P}\left(A^{c}_{n,4}(\epsilon)\right)\geq L_{1}+L_{2}\geq 1-\epsilon^{2}

and so \mathbb{P}\left(A^{c}_{n,4}(\epsilon)\right)\leq\epsilon. Because A_{n,4}(\epsilon)\subseteq A_{n,1}(\epsilon), we therefore get that

1-\epsilon\leq\mathbb{P}\left(A_{n,4}(\epsilon)\right)=\sum_{x\in A_{n,4}(\epsilon)}p(x)\leq 2^{-n(H(X)-\epsilon)}\#A_{n,4}(\epsilon).

Setting {\cal B}_{n}=A_{n,4}(\epsilon), we then get

\#{\cal B}_{n}\geq 2^{n(H(X)-\epsilon)}\cdot(1-\epsilon)\geq 2^{n(H(X)-2\epsilon)}

for all n large.

Step 3: Using Proposition 1: For \alpha\leq H(X), we let {\cal D}_{n} be any set of size 2^{n(\alpha-2\epsilon)} contained within {\cal B}_{n}. Let G be the bipartite graph with vertex set {\cal X}_{c}\cup{\cal Y}_{c}, where {\cal X}_{c}:={\cal D}_{n} and {\cal Y}_{c}:=A_{n,2}(\epsilon), and an edge is present between x\in{\cal X}_{c} and y\in{\cal Y}_{c} if and only if (x,y)\in A_{n}(\epsilon). We now compute the sizes of the probable sets and the conflict sets, in that order.

For each x\in{\cal X}_{c}, we have by the definition (4.9) of A_{n,4}(\epsilon) that

\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\geq 1-\epsilon   (4.12)

and so we set D_{n}(x,\epsilon) to be the \epsilon-probable set corresponding to x\in{\cal D}_{n}. To estimate the size of D_{n}(x,\epsilon), we use the fact that every y\in D_{n}(x,\epsilon) satisfies (x,y)\in A_{n}(\epsilon), and so

p(y|x)=\frac{p(x,y)}{p(x)}\geq\frac{2^{-n(H(X,Y)+\epsilon)}}{2^{-n(H(X)-\epsilon)}}=2^{-n(H(Y|X)+2\epsilon)}.   (4.13)

Thus

1\geq\sum_{y\in D_{n}(x,\epsilon)}p(y|x)\geq\#D_{n}(x,\epsilon)\cdot 2^{-n(H(Y|X)+2\epsilon)}

and consequently

\#D_{n}(x,\epsilon)\leq 2^{n(H(Y|X)+2\epsilon)}.   (4.14)

Finally, we estimate the size of the conflict set C(y,\epsilon) for each y\in{\cal Y}_{c}. Again we use the fact that if (x,y) is an edge in G, then (x,y)\in A_{n}(\epsilon) and so

p(x|y)=\frac{p(x,y)}{p(y)}\geq\frac{2^{-n(H(X,Y)+\epsilon)}}{2^{-n(H(Y)-\epsilon)}}=2^{-n(H(X|Y)+2\epsilon)}.   (4.15)

Thus

1\geq\sum_{x\in C(y,\epsilon)}p(x|y)\geq\#C(y,\epsilon)\cdot 2^{-n(H(X|Y)+2\epsilon)}

and we get that \#C(y,\epsilon)\leq 2^{n(H(X|Y)+2\epsilon)}. Using this and (4.14), we get that the conditions in Proposition 1 hold with

N_{0}=2^{n(\alpha-2\epsilon)},\quad d_{L}(\epsilon)=2^{n(H(Y|X)+2\epsilon)}\text{ and }d_{R}(\epsilon)=2^{n(H(X|Y)+2\epsilon)}.
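For completeness, we spell out the resulting threshold in (4.4):

\frac{N_{0}}{d_{L}(\epsilon)\cdot d_{R}(\epsilon)}=2^{n(\alpha-2\epsilon)-n(H(Y|X)+2\epsilon)-n(H(X|Y)+2\epsilon)}=2^{n(\alpha-H(Y|X)-H(X|Y)-6\epsilon)},

which exceeds M=2^{nR} whenever R<\alpha-H(Y|X)-H(X|Y)-6\epsilon, and in particular under the slightly stronger condition (3.2).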

If M=2^{nR} with R<\alpha-H(X|Y)-H(Y|X)-7\epsilon, then (4.4) holds and so there exists a code containing M=2^{nR} words from {\cal D}_{n} with an error probability of at most \epsilon under the conflict set decoder.

Acknowledgements

I thank Professors Rajesh Sundaresan, C. R. Subramanian and the referees for crucial comments that led to an improvement of the paper. I also thank IMSc for my fellowships.

References

  • [1] Alon, N. and Spencer, J., The Probabilistic Method, Wiley Interscience (2008).
  • [2] Bandemer, B., El Gamal, A. and Kim, Y-H., Optimal Achievable Rates for Interference Networks with Random Codes, IEEE Transactions on Information Theory, 61, 6536–6549 (2015).
  • [3] Cover, T. and Thomas, J., Elements of Information Theory, Wiley (2006).
  • [4] Csiszár, I. and Körner, J., Graph Decomposition: A New Key to Coding Theorems, IEEE Transactions on Information Theory, 27, 5–12 (1981).
  • [5] Csiszár, I., The Method of Types, IEEE Transactions on Information Theory, 44, pp. 2505–2523 (1998).
  • [6] Huffman, W. C. and Pless, V., Fundamentals of Error Correcting Codes, Cambridge University Press (2003).
  • [7] Zamir, R., Lattice Coding for Signals and Networks, Cambridge University Press (2014).