Email: gganesan82@gmail.com
Achieving positive rates with predetermined dictionaries
Abstract
In the first part of the paper we consider binary input channels that are not necessarily stationary and show how positive rates can be achieved using codes constrained to be within predetermined dictionaries. We use a Gilbert-Varshamov-like argument to obtain the desired rate achieving codes. Next we study the corresponding problem for channels with arbitrary alphabets and use conflict-set decoding to show that if the dictionaries are contained within “nice” sets, then positive rates are achievable.
Key words: Positive rates, predetermined dictionaries.
AMS 2000 Subject Classification: Primary: 94A15, 94A24.
1 Introduction
Achieving positive rates with low probability of error in communication channels is an important problem in information theory [3]. In general, a rate is defined to be achievable if there exist codes with that rate having arbitrarily small error probability as the code length grows to infinity. The existence of such codes is typically established through the probabilistic method of choosing a random code (from the set of all possible codes) and showing that the chosen code has small error probability.
In many cases of interest, we would like to select codes satisfying certain constraints or equivalently from a predetermined dictionary (see [2, 7] for examples). For stationary channels, the method of types [4, 5] can be used to study positive rate achievability with the restriction that the dictionary falls within the set of words belonging to a particular type. In this paper, we study achievability of positive rates with arbitrary deterministic dictionaries for both binary and general input channels using counting techniques.
The paper is organized as follows: In Section 2, we study positive rate achievability in binary input channels using predetermined dictionaries. Next in Section 3, we describe the rate achievability problem for arbitrary stationary channels and state our result Theorem 3.1 regarding achieving positive rates using given dictionaries. Finally, in Section 4, we prove Theorem 3.1.
2 Binary Channels
For an integer $n \geq 1$, an element of the set $\{0,1\}^n$ is said to be a codeword, or simply a word, of length $n$. Consider a discrete memoryless symmetric channel with input alphabet $\{0,1\}$ that corrupts a transmitted word as follows. If $y$ is the received (random) word, then
(2.1)
for all where denotes the indicator function and denotes the erasure symbol. If then the bit is substituted and if then is erased. The random variables are independent with
(2.2)
and so the probability of a bit error (due to either a substituted bit or an erased bit) at “time” index is Letting we also denote where is a deterministic function defined via (2.1). We are interested in communicating through the above described channel, with low probability of error, using words from a predetermined (deterministic) dictionary.
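As an illustration, the following minimal Python sketch simulates one plausible reading of the channel in (2.1)–(2.2): at each position the transmitted bit is substituted with probability $p_i$, erased with probability $q_i$, and delivered unchanged otherwise. The notation ($p_i$, $q_i$, the erasure symbol `e`) is assumed here for concreteness and is not taken verbatim from (2.1).

```python
import random

def transmit(word, p_sub, q_er, erasure='e'):
    """Pass a binary word through an (assumed) substitution/erasure channel.

    word  : list of bits (0/1)
    p_sub : per-position substitution probabilities
    q_er  : per-position erasure probabilities
    """
    received = []
    for bit, p, q in zip(word, p_sub, q_er):
        u = random.random()
        if u < p:                    # bit substituted
            received.append(1 - bit)
        elif u < p + q:              # bit erased
            received.append(erasure)
        else:                        # bit delivered correctly
            received.append(bit)
    return received

# Example: a length-10 word with 5% substitution and 5% erasure probability per position.
word = [random.randint(0, 1) for _ in range(10)]
print(transmit(word, [0.05] * 10, [0.05] * 10))
```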
Dictionaries
A dictionary of size is a set of cardinality A subset is said to be an length code of size contained in the dictionary Suppose we transmit a word picked from through the channel given by (2.1) and receive the (random) word Given we would like an estimate of the word from that was transmitted. A decoder is a deterministic map that uses the received word to obtain an estimate of the transmitted word. The probability of error corresponding to the code and the decoder is then defined as
(2.3)
where is the additive noise as described in (2.1).
We have the following definition regarding achievable rates using predetermined dictionaries.
Definition 1
Let and let be any sequence of dictionaries such that each has size at least We say that is an achievable rate if the following holds true for every For all large, there exists a code of size and a decoder such that the probability of error
If for each then the above reduces to the usual concept of rate achievability as in [3] and we simply say that is achievable.
For we define the entropy function
(2.4)
where all logarithms in this section are to the base two. We have the following result.
Theorem 2.1
For integer let
be the expected number of bit substitutions and erasures, respectively, in an $n$-length codeword, and suppose
(2.5)
Let and let be any sequence of dictionaries satisfying for each We have that every is achievable.
For a given let be the largest value of such that The above result says that every is achievable using arbitrary dictionaries. We use Gilbert-Varshamov-like arguments to prove Theorem 2.1 below.
As a special case, for binary symmetric channels with crossover probability $p$, each bit is independently substituted with probability $p$. No erasures occur and so
Thus and from Theorem 2.1 we therefore have that if then every is achievable.
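For illustration, the sketch below evaluates the standard binary entropy function, which is what the entropy function in (2.4) is assumed to be, together with the familiar Gilbert–Varshamov-style rate expression $1 - H(\delta)$ for a code correcting a fraction $\delta$ of errors; the exact threshold appearing in Theorem 2.1 is not reproduced here.

```python
from math import log2

def H(x):
    """Binary entropy in bits; H(0) = H(1) = 0 by convention (assumed form of (2.4))."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

# Illustrative Gilbert-Varshamov style rate 1 - H(delta) for a few error fractions.
for delta in (0.01, 0.05, 0.11):
    print(f"delta = {delta:.2f}:  H(delta) = {H(delta):.3f},  1 - H(delta) = {1 - H(delta):.3f}")
```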
Proof of Theorem 2.1
The main idea of the proof is as follows. Using standard deviation estimates, we first obtain an upper bound on the number of possible errors that could occur in a transmitted word. More specifically, if denotes the number of bit errors in an length word and is given, we use standard deviation estimates to determine such that We then use a Gilbert-Varshamov argument to obtain a code that can correct up to bit errors. The details are described below.
We prove the Theorem in two steps. In the first step, we construct the code and the decoder, and in the second step, we estimate the probability of decoding error when this code is used with this decoder. For words $c_1, c_2 \in \{0,1\}^n$, we let $d(c_1,c_2) := \sum_{i=1}^{n} \mathbb{1}\left(c_1(i) \neq c_2(i)\right)$ be the Hamming distance between $c_1$ and $c_2$, where as before $\mathbb{1}(\cdot)$ denotes the indicator function. The minimum distance of a code is the minimum Hamming distance between any two distinct words of the code.
Step 1: Assume for simplicity that is an integer and let For a word let be the set of words that are at a distance of at most from If is a maximum size code with minimum distance at least then by the maximality of we must have
(2.6)
This is known as the Gilbert-Varshamov argument [6].
The cardinalities of and are and respectively and so from (2.6), we see that the code has size
(2.7)
and minimum distance at least Also since we have for all small that and so Using Stirling's approximation we get
and so from (2.7), we get for that
(2.8)
provided is small.
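The maximality argument behind (2.6) can be mimicked by the following greedy sketch, which picks words one by one while maintaining the required minimum distance. The brute-force search and the toy parameters are illustrative only (the construction is exponential in the block length).

```python
from itertools import product

def hamming(u, v):
    """Hamming distance between two equal-length words."""
    return sum(a != b for a, b in zip(u, v))

def greedy_gv_code(dictionary, min_dist):
    """Greedily select words so that the code has minimum distance >= min_dist.
    A maximal such code has the covering property used in (2.6)."""
    code = []
    for word in dictionary:
        if all(hamming(word, c) >= min_dist for c in code):
            code.append(word)
    return code

# Toy example: the full dictionary of length-8 binary words, target distance 3.
n, d = 8, 3
dictionary = list(product([0, 1], repeat=n))
code = greedy_gv_code(dictionary, d)
print(len(code), "codewords with pairwise distance at least", d)
```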
We now use a two-stage decoder described as follows: Suppose the received word is and for simplicity suppose that the last positions in have been erased. For a codeword let be the reduced word formed by the first bits. Let be the set of all reduced codewords in the code formed by the first bits.
In the first stage of the decoding process, the decoder corrects bit substitutions by collecting all words whose Hamming distance from is minimum. If contains exactly one word, say the decoder outputs as the estimate obtained in the first stage. Otherwise, the decoder outputs “decoding error”. In the second stage of the decoding process, the decoder uses to correct the erasures. Formally, let be the set of all codewords whose first bits match If there exists exactly one word in then the decoder outputs to be the transmitted word. Else the decoder outputs “decoding error”.
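A minimal sketch of the two-stage decoder is given below, assuming the erasure symbol is `e` and that arbitrary positions (not necessarily the last ones) may be erased; tie-breaking is handled by declaring a decoding error, as in the description above.

```python
def two_stage_decode(received, code, erasure='e'):
    """Stage 1: minimum-distance decoding on the unerased positions.
    Stage 2: recover the erased positions from the unique matching codeword.
    Returns the decoded codeword, or None to signal a decoding error."""
    keep = [i for i, s in enumerate(received) if s != erasure]

    def reduced(c):
        return tuple(c[i] for i in keep)

    def dist(c):
        return sum(a != b for a, b in zip(reduced(c), (received[i] for i in keep)))

    # Stage 1: codewords (restricted to unerased positions) at minimum distance.
    best = min(dist(c) for c in code)
    stage1 = {reduced(c) for c in code if dist(c) == best}
    if len(stage1) != 1:
        return None                       # decoding error in stage 1
    target = stage1.pop()

    # Stage 2: the unique codeword whose unerased bits match the stage-1 output.
    matches = [c for c in code if reduced(c) == target]
    return matches[0] if len(matches) == 1 else None

# Example with a small repetition-style code.
code = [(0, 0, 0, 0), (1, 1, 1, 1)]
print(two_stage_decode((0, 'e', 0, 1), code))   # -> (0, 0, 0, 0)
```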
Step 2: Suppose a word was transmitted and the received word is Let be the random noise vector as in (2.1) and let
be the number of bits that have been substituted so that
by (2.2). By standard deviation estimates (Corollary pp. 312, [1]) we have
(2.9)
for all large, by the first condition of (2.5). Similarly if is the number of erased bits, then
(2.10)
for all large.
Next, using the second condition of (2.5) we have that
for all large and so from (2.9) and (2.10) we get that for all large. If then by construction the decoder outputs as the estimate of the transmitted word. Therefore a decoding error occurs only if which happens with probability at most Combining with (2.8) and using the fact that is arbitrary, we get that every is achievable. ∎
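For a purely illustrative check of the deviation estimates used in Step 2 (the channel in Theorem 2.1 need not be stationary), the following sketch computes the exact upper-tail probability that the number of substituted bits in an i.i.d. model exceeds its mean by a factor $(1+\delta)$; all parameters below are made up for the example.

```python
from math import comb

def binom_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, p, delta = 1000, 0.05, 0.2
threshold = int((1 + delta) * n * p) + 1
print(f"P(#substitutions >= {threshold}) = {binom_upper_tail(n, p, threshold):.2e}")
```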
3 General channels
Consider a discrete memoryless channel with finite input alphabet of size a finite output alphabet and a transition probability The term denotes the probability that output is observed given that input is transmitted through the channel.
For we define a subset to be a dictionary. A subset is defined to be an length code contained within the dictionary Suppose we transmit the word and receive the (random) word Given we would like an estimate of the word from that was transmitted. A decoder is a deterministic map that “guesses” the transmitted word based on the received word We denote the probability of error corresponding to the code and the decoder as
(3.1)
To study positive rate achievability using arbitrary dictionaries, we have a couple of preliminary definitions. Let be any probability distribution on the input alphabet and let be the entropy of a random variable where the logarithm is to the base here. Let be a random variable having joint distribution with the random variable defined by Thus is the random output of the channel when the input is Letting be the marginal of we have that the joint entropy and conditional entropy [3] are respectively given by
and
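The entropies above can be computed numerically as in the following sketch; the input distribution and the transition matrix are passed in explicitly, and logarithms are taken to base two purely for concreteness.

```python
from math import log2

def entropies(p_x, channel):
    """Given an input distribution p_x[a] and a transition matrix
    channel[a][b] = P(Y = b | X = a), return H(X), H(Y), H(X,Y), H(X|Y), H(Y|X)."""
    def h(dist):
        return -sum(p * log2(p) for p in dist if p > 0)

    joint = {(a, b): p_x[a] * channel[a][b]
             for a in p_x for b in channel[a] if p_x[a] * channel[a][b] > 0}
    p_y = {}
    for (_, b), pr in joint.items():
        p_y[b] = p_y.get(b, 0.0) + pr

    H_X, H_Y, H_XY = h(p_x.values()), h(p_y.values()), h(joint.values())
    return H_X, H_Y, H_XY, H_XY - H_Y, H_XY - H_X   # last two: H(X|Y), H(Y|X)

# Example: uniform binary input through an asymmetric channel.
p_x = {0: 0.5, 1: 0.5}
channel = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
print([round(v, 4) for v in entropies(p_x, channel)])
```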
The following result obtains positive rates achievable with predetermined dictionaries for the channel described above.
Theorem 3.1
Let and be as above and let For every and for all large, there is a deterministic set with size at least and satisfying the following property: If is any subset of with cardinality and
(3.2)
is positive, then there exists a code containing words and a decoder with error probability
Thus if the sequence of dictionaries is such that for each then every is achievable. Also, setting and gives us that every is achievable in the usual sense of [3], without any restrictions on the dictionaries. For context, we remark that Theorem 2.1 holds for arbitrary dictionaries.
To prove Theorem 3.1, we use typical sets [3] together with conflict set decoding described in the next section. Before we do so, we present an example to illustrate Theorem 3.1.
Example
Consider a binary asymmetric channel with alphabet and transition probability
To apply Theorem 3.1, we assume that the input has the symmetric distribution so that the entropy equals its maximum value of The entropy of the output where and the conditional entropies equal
Set and If both and are small, then is close to one and and are close to zero. We assume that and are such that
is strictly less than one and choose Every is then achievable as in the statement following Theorem 3.1 and every is achievable without any dictionary restrictions, in the usual sense of [3].
In Figure 1, we plot as a function of for various values of the asymmetry factor For example, for an asymmetry factor of we see that positive rates are achievable for roughly up to
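The quantity plotted in Figure 1 depends on the exact expression in (3.2); as a stand-in illustration, the self-contained sketch below sweeps the crossover probability of a binary asymmetric channel with an assumed asymmetry factor and prints the mutual information $H(Y)-H(Y|X)$ under the symmetric input.

```python
from math import log2

def h(dist):
    """Entropy (base two) of a list of probabilities."""
    return -sum(p * log2(p) for p in dist if p > 0)

# Sweep the crossover probability p for a binary asymmetric channel with an
# (assumed) asymmetry factor a: P(flip | X = 0) = p, P(flip | X = 1) = a * p.
# Under the symmetric input, print H(Y) - H(Y|X); this mutual information is
# shown only as a stand-in, since the exact expression (3.2) did not survive
# extraction from the source.
a = 2.0
for p in (0.01, 0.05, 0.10, 0.20):
    q = a * p
    H_Y_given_X = 0.5 * h([p, 1 - p]) + 0.5 * h([q, 1 - q])
    p_y1 = 0.5 * p + 0.5 * (1 - q)        # P(Y = 1) under the symmetric input
    H_Y = h([p_y1, 1 - p_y1])
    print(f"p = {p:.2f}:  H(Y) - H(Y|X) = {H_Y - H_Y_given_X:.3f} bits")
```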

4 Proof of Theorem 3.1
We use conflict set decoding to prove Theorem 3.1. Therefore in the first part of this section, we prove an auxiliary result regarding conflict set decoding that is also of independent interest.
4.1 Conflict set decoding
Consider a discrete memoryless channel with finite input alphabet and finite output alphabet and transition probability For convenience, we define the channel by a collection of random variables with the distribution for All random variables are defined on the probability space For and we let and be deterministic sets such that
(4.1)
We define to be an probable set or simply probable output set corresponding to the input and for we denote to be the conflict set or simply conflict set corresponding to the output There are many possible choices for for example is one choice. In Proposition 1 below, we show however that choosing probable sets as small as possible allows us to increase the size of the desired code. We also define
(4.2)
where denotes the cardinality of the set
As before, a code of size is a set of distinct words Suppose we transmit the word and receive the (random) word Given we would like an estimate of the word from that was transmitted. A decoder is a deterministic map that guesses the transmitted word based on the received word We denote the probability of error corresponding to the code the decoder and the collection of the probable sets as
(4.3)
We have the following Proposition.
Proposition 1
For let be any collection of probable sets. If there exists an integer satisfying
(4.4)
then there exists a code of size and a decoder whose decoding error probability is
Thus as long as the number of words is below a certain threshold, we are guaranteed that the error probability is sufficiently small. Also, from (4.4) we see that it would be better to choose probable sets with cardinality as small as possible.
Proof of Proposition 1
Code construction: We recall that by definition, given input the output belongs to the set with probability at least Therefore we first construct a code containing distinct words and satisfying
(4.5)
Throughout we assume that satisfies (4.4). To obtain the desired distinct words, we use the following bipartite graph representation. Let be a bipartite graph with vertex set We join and by an edge if and only if The size of therefore represents the degree of the vertex and the size of represents the degree of the vertex By definition (see (4.2)) denote the maximum degree of a left vertex and a right vertex, respectively, in We say that a set of vertices is disjoint if for all the vertices and have no common neighbour (in ). Constructing codes with disjoint probable sets satisfying (4.5) is therefore equivalent to finding disjoint sets of vertices in
We now use direct counting to get a set of disjoint vertices in First we pick any vertex The degree of is at most and moreover, each vertex in has at most neighbours in The total number of (bad) vertices of adjacent to some vertex in is at most Removing all these bad vertices, we are left with a bipartite subgraph of whose left vertex set has size at least where We now pick one vertex in the left vertex set of and continue the above procedure. After the step, the number of left vertices remaining is and so from (4.4) we get that this process continues for at least steps. The words corresponding to vertices form our code
Decoder definition: Let be the code as constructed above. For decoding, we use the conflict-set decoder defined as follows: If for some and the conflict set does not contain any word of then we set Otherwise, we set to be any arbitrary value; for concreteness, we set
We claim that the probability of error of the conflict-set decoder is at most To see why this is true, suppose we transmit the word With probability at least
the corresponding output Because (4.5) holds, we must necessarily have that for any This implies that the conflict-set decoder outputs the correct word with probability at least ∎
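A minimal sketch of Proposition 1 in code: probable sets are supplied explicitly, codewords are chosen greedily so that their probable sets are pairwise disjoint (mirroring the bad-vertex removal above), and the decoder declares an error unless exactly one codeword is compatible with the received output. All names and the toy sets are illustrative.

```python
def build_code(words, probable, max_words):
    """Greedily pick words whose probable (output) sets are pairwise disjoint."""
    code, used_outputs = [], set()
    for x in words:
        if len(code) >= max_words:
            break
        if probable[x].isdisjoint(used_outputs):
            code.append(x)
            used_outputs |= probable[x]
    return code

def conflict_set_decode(y, code, probable):
    """Return the unique codeword whose probable set contains y, else None (error)."""
    candidates = [x for x in code if y in probable[x]]
    return candidates[0] if len(candidates) == 1 else None

# Toy example: four inputs, outputs 0..5, hand-picked probable sets.
probable = {'a': {0, 1}, 'b': {1, 2}, 'c': {3, 4}, 'd': {4, 5}}
code = build_code(['a', 'b', 'c', 'd'], probable, max_words=2)
print(code)                                     # ['a', 'c']
print(conflict_set_decode(3, code, probable))   # 'c'
```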
We now prove Theorem 3.1 using typical sets and conflict set decoding.
4.2 Proof of Theorem 3.1
For notational simplicity we prove Theorem 3.1 with An analogous analysis holds for the general case.
The proof consists of three steps. In the first step, we define and estimate the occurrence of certain typical sets. In the next step, we use the typical sets constructed in Step 1 to determine the set in the statement of the Theorem. Finally, we use Proposition 1 to obtain the bound (3.2) on the rates.
Step 1: Typical sets: We define the typical set
(4.6)
where
and
with the notation that if then
We estimate as follows. If is a random element of with i.i.d. and each having distribution then the random variable has mean and so by Chebyshev’s inequality
which converges to zero as Analogous estimates hold for the sets and and so
(4.7)
for all large.
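The Chebyshev estimate above can be checked numerically in the i.i.d. case: the empirical quantity $-\frac{1}{n}\log_2 p(X_1,\ldots,X_n)$ concentrates around the entropy as $n$ grows. The following sketch estimates the probability of a deviation larger than a fixed tolerance; the distribution and tolerance are illustrative.

```python
import random
from math import log2

def atypicality_probability(p, n, tol, trials=2000):
    """Estimate P( | -(1/n) * log2 p(X_1,...,X_n) - H | > tol ) for i.i.d. X_i ~ p."""
    symbols, weights = zip(*p.items())
    H = -sum(w * log2(w) for w in weights)
    bad = 0
    for _ in range(trials):
        sample = random.choices(symbols, weights=weights, k=n)
        empirical = -sum(log2(p[s]) for s in sample) / n
        if abs(empirical - H) > tol:
            bad += 1
    return bad / trials

p = {0: 0.7, 1: 0.3}
for n in (50, 200, 800):
    print(n, atypicality_probability(p, n, tol=0.05))
```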
Step 2: Determining the set : We now use the set defined above to determine the set in the statement of the Theorem as follows. For let
In Figure 2, we illustrate the sets and the set The rectangle denotes and the oval set represents The hatched region represents The line represents the set for shown on the axis.

From Figure 2 we see that
(4.8)
by (4.7). Letting
(4.9)
we split the summation in the first term in (4.8) as where
(4.10)
and
(4.11)
Substituting (4.11) and (4.10) into (4.8) we get
and so Because we therefore get that
Setting we then get
for all large.
Step 3: Using Proposition 1: For we let be any set of size contained within Let be the bipartite graph with vertex set where
and an edge is present between and if and only if
We now compute the sizes of the probable sets and the conflict sets in that order.
For each we have by definition (4.9) of that
(4.12)
and so we set to be the probable set corresponding to To estimate the size of we use the fact that and so
(4.13)
Thus
and consequently
(4.14)
Finally, we estimate the size of the conflict set for each Again we use the fact that if is an edge in then and so
(4.15)
Thus
and we get that Using this and (4.14), we get that the conditions in Proposition 1 hold with
If with then (4.4) holds and so there exists a code containing words from giving an error probability of at most with the conflict set decoder. ∎
Acknowledgements
I thank Professors Rajesh Sundaresan, C. R. Subramanian and the referees for crucial comments that led to an improvement of the paper. I also thank IMSc for my fellowships.
References
- [1] Alon, N. and Spencer, J., The Probabilistic Method, Wiley Interscience (2008).
- [2] Bandemer, B., El Gamal, A. and Kim, Y-H., Optimal Achievable Rates for Interference Networks with Random Codes, IEEE Transactions on Information Theory, 61, 6536–6549 (2015).
- [3] Cover, T. and Thomas, J., Elements of Information Theory, Wiley (2006).
- [4] Csiszár, I. and Körner, J., Graph Decomposition: A New Key to Coding Theorems, IEEE Transactions on Information Theory, 27, 5–12 (1981).
- [5] Csiszár, I., The Method of Types, IEEE Transactions on Information Theory, 44, 2505–2523 (1998).
- [6] Huffman, W. C. and Pless, V., Fundamentals of Error Correcting Codes, Cambridge University Press (2003).
- [7] Zamir, R., Lattice Coding for Signals and Networks, Cambridge University Press (2014).