
A Construction for Balancing Non-Binary Sequences Based on Gray Code Prefixes

Elie N. Mambou and Theo G. Swart

This paper was presented in part at the IEEE International Symposium on Information Theory, Barcelona, Spain, July 2016. The authors are with the Department of Electrical and Electronic Engineering Science, University of Johannesburg, P. O. Box 524, Auckland Park, 2006, South Africa (e-mails: {emambou, tgswart}@uj.ac.za). This work is based on research supported in part by the National Research Foundation of South Africa (UID 77596).

Abstract

We introduce a new construction for the balancing of non-binary sequences that makes use of Gray codes for prefix coding. Our construction provides full encoding and decoding of sequences, including the prefix. This construction is based on a generalization of Knuth’s parallel balancing approach, which can handle very long information sequences. However, the overall sequence, composed of the information sequence together with the prefix, must be balanced. This is reminiscent of Knuth’s serial algorithm. The encoding of our construction does not make use of lookup tables, while the decoding process is simple and can be done in parallel.

Index Terms:
Balanced sequence, DC-free codes, Gray code prefix.

I Introduction

The use of balanced codes is crucial for some information transmission systems. When data is stored on optical devices, errors can occur because of low-frequency interaction between the servo structures and the data written on the disc. This can be avoided by using balanced codes, as no low frequencies are observed. In such systems, balanced codes are also useful for tracking the data on the disc. Balanced codes are also used to counter the cut-off at low frequencies in digital transmission through capacitive coupling or transformers. This cut-off is caused by multiple same-charge bits, and results in a DC level that charges the capacitor in the AC coupler [1]. In general, the low-frequency spectrum can be suppressed with balanced codes.

A large body of work on balanced codes is derived from the simple algorithm for balancing sequences proposed by Knuth [2]. According to Knuth’s parallel algorithm, a binary sequence, $\boldsymbol{x}$, of even length $k$, can always be balanced by complementing its first or last $i$ bits, where $0\leq i\leq k$. The index $i$ is then encoded as a balanced prefix that is appended to the data. The decoder can easily recover $i$ from the prefix, and then complements the first or last $i$ bits again to obtain the original information. For Knuth’s serial (or sequential) algorithm, the prefix is used to provide information regarding the information sequence’s initial weight. Bits are sequentially complemented from one side of the overall sequence, until the information sequence and prefix together are balanced. Since the original weight is indicated by the prefix, the decoder simply has to sequentially complement the bits until this weight is attained.

Al-Bassam [3] presented a generalization of Knuth’s algorithm for binary codes, non-binary codes and semi-balanced codes (the latter occur where the number of 0’s and 1’s differs by at most a certain value in each sequence of the code). The balancing of binary codes with low DC level is based on DC-free coset codes. For the design of non-binary balanced codes, symbols in the information sequence are $q$-ary complemented from one side, but because this process does not guarantee balancing, an extra redundant symbol is added to enforce the balancing (similar to our approach later on). Information regarding how many symbols to complement is sent by using a balanced prefix.

Capocelli et al. [4] proposed using two functions that must satisfy certain properties to encode any $q$-ary sequence into balanced sequences. The first function is similar to Knuth’s serial scheme: it outputs a prefix sequence depending on the original sequence’s weight. Additionally, all the $q$-ary sequences are partitioned into disjoint chains, where each chain’s sequences have unique weights. The second function is then used to select an alternate sequence in the chain containing the original information sequence, such that the chosen prefix and the alternate sequence together are balanced.

Tallini and Vaccaro [8] presented another construction for balanced $q$-ary sequences that makes use of balancing and compression. Sequences that are close to being balanced are encoded with a generalization of Knuth’s serial scheme. Based on the weight of the information sequence, a prefix is chosen. Symbols are then “complemented in stages”, one at a time, until the weight that balances the sequence and prefix together is attained. Other sequences are compressed with a uniquely decodable variable length code and balanced using the saved space.

Swart and Weber [5] extended Knuth’s parallel balancing scheme to $q$-ary sequences with parallel decoding. However, this technique does not provide a prefix code implementation, with the assumption that small lookup tables can be used for this. Our approach aims to implement these prefixes via Gray codes. Swart and Weber’s scheme will be expanded on in Section II-A, as it also forms the basis of our proposed algorithm.

Swart and Immink [6] described a prefixless algorithm for balancing of $q$-ary sequences. By using the scheme from [5] and applying precoding to a very specific error correction code, it was shown that balancing can be achieved without the need for a prefix.

Pelusi et al. [7] presented a refined implementation of Knuth’s algorithm for parallel decoding of $q$-ary balanced codes, similar to [5]. This method significantly improved on [4] and [8] in terms of complexity.

The rest of this paper is structured as follows. In Section II, we present the background for our work, which includes Swart and Weber’s balancing scheme for $q$-ary sequences [5] and non-binary Gray code theory [10]. In Section III, a construction is presented for sequences where $k=q^{t}$. Section IV extends our proposed construction to sequences with $k\neq q^{t}$. Finally, Section V deals with the redundancy and complexity of our construction compared to prior art constructions, and our conclusions are presented in Section VI.

II Preliminaries

Let $\boldsymbol{x}=(x_{1}x_{2}\dots x_{k})$ be a $q$-ary information sequence of length $k$, where $x_{i}\in\{0,1,\dots,q-1\}$ is from a non-binary alphabet. A prefix of length $r$ is appended to $\boldsymbol{x}$. The prefix and information together are denoted by $\boldsymbol{c}=(c_{1}c_{2}\dots c_{n})$ of length $n=k+r$, where $c_{i}\in\{0,1,\dots,q-1\}$. Let $w(\boldsymbol{c})$ refer to the weight of $\boldsymbol{c}$, that is, the algebraic sum of the symbols in $\boldsymbol{c}$. The sequence $\boldsymbol{c}$ is said to be balanced if

\[ w(\boldsymbol{c})=\sum_{i=1}^{n}c_{i}=\frac{n(q-1)}{2}. \]

Let $\beta_{n,q}$ represent this value obtained at the balancing state. For the rest of the paper, the parameters $k$, $n$, $q$ and $r$ are chosen in such a way that the balancing value, $\beta_{n,q}=n(q-1)/2$, is a positive integer.

II-A Balancing of $q$-ary Sequences

Any information sequence, $\boldsymbol{x}$, of length $k$ and alphabet size $q$, can always be balanced by adding (modulo $q$) to that sequence one sequence from a set of balancing sequences [5]. The balancing sequence, $\boldsymbol{b}_{s,p}=(b_{1}b_{2}\dots b_{k})$, is derived as

\[ b_{i}=\begin{cases}s,&i>p,\\ s+1\pmod{q},&i\leq p,\end{cases} \]

where $s$ and $p$ are non-negative integers with $0\leq s\leq q-1$ and $0\leq p\leq k-1$. Let $z$ be the iterator through all possible balancing sequences, such that $z=sk+p$ and $0\leq z\leq kq-1$. Let $\boldsymbol{y}$ refer to the resulting sequence when adding (modulo $q$) the balancing sequence to the information sequence, $\boldsymbol{y}=\boldsymbol{x}\oplus_{q}\boldsymbol{b}_{s,p}$, where $\oplus_{q}$ denotes modulo $q$ addition. The number of balancing sequences equals $kq$ and amongst them, at least one leads to a balanced output $\boldsymbol{y}$.

Since $s$ and $p$ can easily be determined for the $z$-th balancing sequence using $z=sk+p$, we will use the simplified notation $\boldsymbol{b}_{z}$ to denote $\boldsymbol{b}_{s,p}$.

Example 1

Let us consider the balancing of the 3-ary sequence 2101, of length 4. The encoding process is illustrated below, with weights in bold indicating that the sequences are balanced.

\[
\begin{array}{c@{\quad\quad}c@{\;}c@{\;}c@{\;}c@{\;}c@{\quad\quad}c}
z&\boldsymbol{x}&\oplus_{q}&\boldsymbol{b}_{z}&=&\boldsymbol{y}&w(\boldsymbol{y})\\ \hline
0&(2101)&\oplus_{3}&(0000)&=&(2101)&\mathbf{4}\\
1&(2101)&\oplus_{3}&(1000)&=&(0101)&2\\
2&(2101)&\oplus_{3}&(1100)&=&(0201)&3\\
3&(2101)&\oplus_{3}&(1110)&=&(0211)&\mathbf{4}\\
4&(2101)&\oplus_{3}&(1111)&=&(0212)&5\\
5&(2101)&\oplus_{3}&(2111)&=&(1212)&6\\
6&(2101)&\oplus_{3}&(2211)&=&(1012)&\mathbf{4}\\
7&(2101)&\oplus_{3}&(2221)&=&(1022)&5\\
8&(2101)&\oplus_{3}&(2222)&=&(1020)&3\\
9&(2101)&\oplus_{3}&(0222)&=&(2020)&\mathbf{4}\\
10&(2101)&\oplus_{3}&(0022)&=&(2120)&5\\
11&(2101)&\oplus_{3}&(0002)&=&(2100)&3\\
\end{array}
\]

For this example, there are four occurrences of balanced sequences.
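The search in Example 1 can be reproduced with a short script. This is only an illustrative sketch: the function names `balancing_sequence` and `balanced_indices` are hypothetical and not part of the original construction, and the index relation $z=sk+p$ is assumed.

```python
def balancing_sequence(z, k, q):
    """Return b_z for z = s*k + p: the first p symbols are (s+1) mod q, the rest are s."""
    s, p = divmod(z, k)
    return [(s + 1) % q if i < p else s for i in range(k)]


def balanced_indices(x, q):
    """Return every z for which y = x (+) b_z (mod q) has weight k(q-1)/2."""
    k = len(x)
    target, rem = divmod(k * (q - 1), 2)
    assert rem == 0, "k(q-1)/2 must be an integer"
    hits = []
    for z in range(k * q):
        b = balancing_sequence(z, k, q)
        y = [(xi + bi) % q for xi, bi in zip(x, b)]
        if sum(y) == target:
            hits.append(z)
    return hits


print(balanced_indices([2, 1, 0, 1], 3))  # [0, 3, 6, 9]
```

The four indices returned match the four bold rows of the table in Example 1.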

A $(\gamma,\tau)$-random walk refers to a path with random increases of $\gamma$ and decreases of $\tau$. In our case, a random walk graph is the plot of $w(\boldsymbol{y})$ versus $z$. In general, the random walk graph of $w(\boldsymbol{y})$ always forms a $(1,q-1)$-random walk [5]. Fig. 1 presents the $(1,2)$-random walk for Example 1. The dashed line indicates the balancing value $\beta_{4,3}=4$.

Figure 1: Random walk graph of $w(\boldsymbol{y})$ for Example 1
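As a rough check of the $(1,q-1)$-random walk property, the following sketch lists the weights $w(\boldsymbol{y})$ for the sequence of Example 1 and the step between consecutive values of $z$. The helper name `walk_weights` is hypothetical, and $z=sk+p$ is assumed.

```python
def walk_weights(x, q):
    """Weights w(y) for z = 0, ..., kq-1, where y = x (+) b_z (mod q)."""
    k = len(x)
    weights = []
    for z in range(k * q):
        s, p = divmod(z, k)
        # b_z has (s+1) mod q in its first p positions and s elsewhere
        y = [(xi + (s + 1 if i < p else s)) % q for i, xi in enumerate(x)]
        weights.append(sum(y))
    return weights


w = walk_weights([2, 1, 0, 1], 3)
steps = [b - a for a, b in zip(w, w[1:])]
print(w)      # [4, 2, 3, 4, 5, 6, 4, 5, 3, 4, 5, 3]
print(steps)  # [-2, 1, 1, 1, 1, -2, 1, -2, 1, 1, -2]
```

Every step is either $+1$ or $-(q-1)=-2$, because moving from $z$ to $z+1$ adds 1 (modulo $q$) to exactly one symbol.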

This method, as presented in [5], assumed that the $z$ indices can be sent using balanced prefixes, but the actual encoding of these was not taken into account. For instance, in Example 1 indices $z=0$, $3$, $6$ and $9$ must be encoded into balanced prefixes, in order to send overall balanced sequences.

II-B Non-binary Gray Codes

Binary Gray codes were first proposed by Gray [9] for solving problems in pulse code communication, and have been extended to various other applications. The assumption throughout this paper is that a Gray code is mapped from a set of possible sequences appearing in the normal lexicographical order. This ordering results in the main property of binary Gray codes: two adjacent codewords differ in only one bit.

The $(r^{\prime},q)$-Gray code is a set of $q$-ary sequences of length $r^{\prime}$ such that any two adjacent codewords differ in only one symbol position. This set is not unique, as any permutation of a symbol column within the code could also generate a new $(r^{\prime},q)$-Gray code. In this work, a unique set of $(r^{\prime},q)$-Gray codes is considered, as presented by Guan [10]. This set possesses an additional property: the difference between any two consecutive sequences’ weights is $\pm 1$. This same set of Gray codes was already determined in [11] through a recursive method.

Let $\boldsymbol{d}=(d_{1}d_{2}\ldots d_{r^{\prime}})$ be any sequence within the set of all $q$-ary sequences of length $r^{\prime}$, listed in the normal lexicographic order. These sequences are mapped to $(r^{\prime},q)$-Gray code sequences, $\boldsymbol{g}=(g_{1}g_{2}\ldots g_{r^{\prime}})$, such that any two consecutive sequences are different in only one symbol position.

Table I shows a $(3,3)$-Gray code, where $\boldsymbol{d}$ is the 3-ary representation of the index $z\in\{0,1,\ldots,26\}$ and $\boldsymbol{g}$ is the corresponding Gray code sequence. We see that for $\boldsymbol{g}$, the adjacent sequences’ weights differ by $+1$ or $-1$.

TABLE I: Example of $(3,3)$-Gray code

\[
\begin{array}{ccc|ccc|ccc}
z&\boldsymbol{d}&\boldsymbol{g}&z&\boldsymbol{d}&\boldsymbol{g}&z&\boldsymbol{d}&\boldsymbol{g}\\ \hline
0&(000)&(000)&9&(100)&(122)&18&(200)&(200)\\
1&(001)&(001)&10&(101)&(121)&19&(201)&(201)\\
2&(002)&(002)&11&(102)&(120)&20&(202)&(202)\\
3&(010)&(012)&12&(110)&(110)&21&(210)&(212)\\
4&(011)&(011)&13&(111)&(111)&22&(211)&(211)\\
5&(012)&(010)&14&(112)&(112)&23&(212)&(210)\\
6&(020)&(020)&15&(120)&(102)&24&(220)&(220)\\
7&(021)&(021)&16&(121)&(101)&25&(221)&(221)\\
8&(022)&(022)&17&(122)&(100)&26&(222)&(222)\\
\end{array}
\]

We will make use of the following encoding and decoding algorithms from [10].

II-B1 Encoding algorithm for $(r^{\prime},q)$-Gray code

Let $\boldsymbol{d}=(d_{1}d_{2}\ldots d_{r^{\prime}})$ and $\boldsymbol{g}=(g_{1}g_{2}\ldots g_{r^{\prime}})$ denote respectively a $q$-ary sequence of length $r^{\prime}$ and its corresponding Gray code sequence.

Let $S_{i}$ be the sum of the first $i-1$ symbols of $\boldsymbol{g}$, with $2\leq i\leq r^{\prime}$ and $g_{1}=d_{1}$. Then

\[ S_{i}=\sum_{j=1}^{i-1}g_{j},\quad\text{and}\quad g_{i}=\begin{cases}d_{i},&\text{if }S_{i}\text{ is even},\\ q-1-d_{i},&\text{if }S_{i}\text{ is odd}.\end{cases} \]

The parity of $S_{i}$ determines $\boldsymbol{g}$’s symbols from $\boldsymbol{d}$. If $S_{i}$ is even then the symbol stays the same, otherwise the $q$-ary complement of the symbol is taken.
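The encoding rule can be sketched in a few lines. The names `to_digits` and `gray_encode` are hypothetical helpers introduced here for illustration; the running sum plays the role of $S_{i}$.

```python
def to_digits(z, r, q):
    """q-ary representation of z using r digits, most significant digit first."""
    digits = []
    for _ in range(r):
        z, d = divmod(z, q)
        digits.append(d)
    return digits[::-1]


def gray_encode(d, q):
    """Map a q-ary sequence d to its Gray code sequence g using the parity of S_i."""
    g = []
    running = 0  # S_i: sum of the Gray symbols produced so far (0 for the first symbol)
    for di in d:
        gi = di if running % 2 == 0 else q - 1 - di
        g.append(gi)
        running += gi
    return g


print(gray_encode(to_digits(9, 3, 3), 3))   # [1, 2, 2]
print(gray_encode(to_digits(15, 3, 3), 3))  # [1, 0, 2]
```

Both outputs agree with the entries for $z=9$ and $z=15$ in Table I.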

II-B2 Decoding algorithm for $(r^{\prime},q)$-Gray code

Let $\boldsymbol{g}$, $\boldsymbol{d}$ and $S_{i}$ be defined as before, with $2\leq i\leq r^{\prime}$ and $d_{1}=g_{1}$. Then

\[ S_{i}=\sum_{j=1}^{i-1}g_{j},\quad\text{and}\quad d_{i}=\begin{cases}g_{i},&\text{if }S_{i}\text{ is even},\\ q-1-g_{i},&\text{if }S_{i}\text{ is odd}.\end{cases} \]
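A matching sketch of the decoding rule follows. The names `gray_decode` and `digits_to_index` are hypothetical; the only difference from encoding is that the running sum is taken over the received Gray symbols rather than the produced ones.

```python
def gray_decode(g, q):
    """Recover the q-ary sequence d from a Gray code sequence g."""
    d = []
    running = 0  # S_i: sum of the Gray symbols already read
    for gi in g:
        d.append(gi if running % 2 == 0 else q - 1 - gi)
        running += gi
    return d


def digits_to_index(d, q):
    """Interpret d (most significant digit first) as an integer index."""
    z = 0
    for di in d:
        z = z * q + di
    return z


d = gray_decode([1, 0, 2], 3)
print(d, digits_to_index(d, 3))  # [1, 2, 0] 15
```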

III Construction for $k=q^{t}$

For the sake of simplicity, we will briefly explain the construction for information lengths limited to $k=q^{t}$, with $t$ being a positive integer. More details can be found in our conference paper [13]. In the next section we will show how this restriction can be avoided.

The main component of this technique is to encode the balancing indices, $z$, into Gray code prefixes that can easily be encoded and decoded. The prefix together with the information sequence must be balanced.

The condition, $k=q^{t}$, is enforced so that the cardinality of the $(r^{\prime},q)$-Gray code is equal to that of the balancing sequences, making $r^{\prime}=\log_{q}(kq)=\log_{q}(q^{t+1})=t+1$.

III-1 Encoding

Let $\boldsymbol{c}^{\prime}=(\boldsymbol{g}|\boldsymbol{y})=(g_{1}g_{2}\ldots g_{r^{\prime}}y_{1}y_{2}\ldots y_{k})$ be the concatenation of the Gray code prefix with $\boldsymbol{y}$, with $|$ representing concatenation. As stated earlier, for the sequences $\boldsymbol{y}$ we obtain a $(1,q-1)$-random walk, and for the Gray codes $\boldsymbol{g}$ we have a $(1,1)$-random walk. Therefore, when we concatenate the two sequences together, the random walk graph of $\boldsymbol{c}^{\prime}$ forms a $(\{0;2\},\{q-2;q\})$-random walk, i.e. increases of 0 or 2 and decreases of $q-2$ or $q$.

This concatenation of a Gray code prefix, $\boldsymbol{g}$, with an output sequence, $\boldsymbol{y}$, does not guarantee the balancing of the overall sequence, since the increases of 2 in the random walk graph do not guarantee that it will pass through a specific point. An extra symbol $u$ is added to ensure overall balancing, with $u=\beta_{n,q}-w(\boldsymbol{c}^{\prime})$ if $0\leq u\leq q-1$, otherwise $u=0$, thus forcing the random walk graph to a specific point. The overall sequence is the concatenation of $u$, $\boldsymbol{g}$ and $\boldsymbol{y}$, i.e. $\boldsymbol{c}=(u|\boldsymbol{g}|\boldsymbol{y})=(ug_{1}g_{2}\ldots g_{r^{\prime}}y_{1}y_{2}\ldots y_{k})$. The length of $\boldsymbol{c}$ is $n=k+r^{\prime}+1$.

In summary, the balancing of any $q$-ary sequence of length $k$, where $k=q^{t}$, can be achieved by adding (modulo $q$) an appropriate balancing sequence, $\boldsymbol{b}_{z}$, and prefixing a redundant symbol $u$ with a Gray code sequence, $\boldsymbol{g}$. The construction relies on finding a Gray code prefix that describes $z$ and, at the same time, is balanced together with $\boldsymbol{y}$.

Example 2

Let us consider the encoding of the ternary sequence 201, of length 3. Since $t=1$, the length of Gray code prefixes will be $r^{\prime}=2$. The overall length is $n=6$ and the balancing value is $\beta_{6,3}=6$. The encoding process below is followed.

\[
\begin{array}{c@{\quad\quad}c@{\;}c@{\;}c@{\;}c@{\;}c@{\quad\quad}c@{\quad\quad}c}
z&\boldsymbol{x}&\oplus_{q}&\boldsymbol{b}_{z}&=&\boldsymbol{y}&\boldsymbol{c}&w(\boldsymbol{c})\\ \hline
0&(201)&\oplus_{3}&(000)&=&(201)&(\underline{\mathbf{0}00}201)&3\\
1&(201)&\oplus_{3}&(100)&=&(001)&(\underline{\mathbf{0}01}001)&2\\
2&(201)&\oplus_{3}&(110)&=&(011)&(\underline{\mathbf{2}02}011)&\mathbf{6}\\
3&(201)&\oplus_{3}&(111)&=&(012)&(\underline{\mathbf{0}12}012)&\mathbf{6}\\
4&(201)&\oplus_{3}&(211)&=&(112)&(\underline{\mathbf{0}11}112)&\mathbf{6}\\
5&(201)&\oplus_{3}&(221)&=&(122)&(\underline{\mathbf{0}10}122)&\mathbf{6}\\
6&(201)&\oplus_{3}&(222)&=&(120)&(\underline{\mathbf{1}20}120)&\mathbf{6}\\
7&(201)&\oplus_{3}&(022)&=&(220)&(\underline{\mathbf{0}21}220)&7\\
8&(201)&\oplus_{3}&(002)&=&(200)&(\underline{\mathbf{0}22}200)&\mathbf{6}\\
\end{array}
\]

The underlined symbols represent the appended prefix, the bold underlined symbol is $u$, which is chosen such that $\beta_{6,3}$ is obtained whenever possible, and the bold weights indicate that balancing was achieved. Fig. 2 presents the random walk graph for the weight of the overall sequence, $\boldsymbol{c}$, with the shaded area indicating the possible weights as a result of the flexibility in choosing $u$.

Figure 2: Random walk graph of $w(\boldsymbol{c})$ for Example 2
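The encoding for $k=q^{t}$ can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `encode_qt` is hypothetical, and returning the first balanced candidate is an assumption made here, since the text does not prescribe which balanced candidate to transmit.

```python
def encode_qt(x, q):
    """Return (c, z) for the first z whose c = (u | g | y) is balanced; k must equal q^t."""
    k = len(x)
    r = 1
    while q ** r < k * q:
        r += 1
    assert q ** r == k * q, "this sketch only covers k = q^t"
    n = k + r + 1
    assert (n * (q - 1)) % 2 == 0, "n(q-1)/2 must be an integer"
    beta = n * (q - 1) // 2
    for z in range(k * q):
        s, p = divmod(z, k)
        y = [(xi + (s + 1 if i < p else s)) % q for i, xi in enumerate(x)]
        # q-ary digits of z, most significant first
        d = [(z // q ** (r - 1 - j)) % q for j in range(r)]
        # Gray code prefix from the parity rule of Section II-B
        g, running = [], 0
        for di in d:
            gi = di if running % 2 == 0 else q - 1 - di
            g.append(gi)
            running += gi
        u = beta - sum(g) - sum(y)
        if 0 <= u <= q - 1:  # a valid redundant symbol exists
            return [u] + g + y, z
    return None


print(encode_qt([2, 0, 1], 3))  # ([2, 0, 2, 0, 1, 1], 2)
```

For the sequence of Example 2, the first balanced candidate appears at $z=2$.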

III-2 Decoding

The decoding consists of recovering the index $z$ from the Gray code prefix, $\boldsymbol{g}$, and finding $s$ and $p$ to reconstruct $\boldsymbol{b}_{z}$. The original sequence is then obtained as $\boldsymbol{x}=\boldsymbol{y}\ominus_{q}\boldsymbol{b}_{z}$, where $\ominus_{q}$ represents modulo $q$ subtraction.

As an example, Table II shows the decoding of every Gray code sequence into balancing sequences using the $(2,3)$-Gray code set.

TABLE II: Decoding of $(2,3)$-Gray codes for $3$-ary sequences of length 3

\[
\begin{array}{ccccc}
\text{Gray code }(\boldsymbol{g})&\text{Sequence }(\boldsymbol{d})&z&s,p&\boldsymbol{b}_{z}\\ \hline
(00)&(00)&0&0,0&(000)\\
(01)&(01)&1&0,1&(100)\\
(02)&(02)&2&0,2&(110)\\
(12)&(10)&3&1,0&(111)\\
(11)&(11)&4&1,1&(211)\\
(10)&(12)&5&1,2&(221)\\
(20)&(20)&6&2,0&(222)\\
(21)&(21)&7&2,1&(022)\\
(22)&(22)&8&2,2&(002)\\
\end{array}
\]

Example 3

Consider the received ternary sequence $\boldsymbol{c}=(012012)$ of length $n=6$ (one of the balanced sequences from Example 2). The $(2,3)$-Gray code prefixes were used in encoding the original sequence.

The first symbol in $\boldsymbol{c}$, $u=0$, is dropped; the Gray code prefix is then $\boldsymbol{g}=(12)$. This Gray code corresponds to $\boldsymbol{d}=(10)$ as presented in Table II. This implies that $z=3$, leading to $s=1$, $p=0$ and therefore $\boldsymbol{b}_{3}=(111)$. The original sequence is recovered as

\[ \boldsymbol{x}=\boldsymbol{y}\ominus_{q}\boldsymbol{b}_{z}=(012)\ominus_{3}(111)=(201). \]

Thus, the information sequence from Example 2 is recovered.
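The decoding steps of Example 3 can be sketched as below. The function name `decode_qt` is hypothetical, and the sketch assumes the $k=q^{t}$ setting of this section, where the Gray code prefix encodes $z$ directly.

```python
def decode_qt(c, k, q):
    """Recover x from c = (u | g | y) when the prefix encodes z directly (k = q^t)."""
    r = len(c) - k - 1
    g, y = c[1:1 + r], c[1 + r:]  # drop u, then split prefix and payload
    # Gray decoding with the parity rule of Section II-B
    d, running = [], 0
    for gi in g:
        d.append(gi if running % 2 == 0 else q - 1 - gi)
        running += gi
    z = 0
    for di in d:
        z = z * q + di
    s, p = divmod(z, k)
    b = [(s + 1) % q if i < p else s for i in range(k)]
    return [(yi - bi) % q for yi, bi in zip(y, b)]


print(decode_qt([0, 1, 2, 0, 1, 2], 3, 3))  # [2, 0, 1]
```

Each symbol of $\boldsymbol{x}$ depends only on the prefix and on its own $y_{i}$, which is why the subtraction can be carried out in parallel.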

IV Construction for $k\neq q^{t}$

We will now generalize the technique described in the previous section to sequences of any length, i.e. $k\neq q^{t}$.

The idea is to use a subset of the $(r^{\prime},q)$-Gray code with an appropriate length to encode the $z$ indices that represent the $kq$ balancing sequences. Therefore, the cardinality of $(r^{\prime},q)$-Gray code prefixes must be greater than that of the balancing sequences, i.e. $q^{r^{\prime}}>kq$ or $r^{\prime}>\log_{q}k+1$.

However, the challenge is to find the appropriate subset of $(r^{\prime},q)$-Gray code prefixes that can uniquely match the $kq$ balancing sequences, and still guarantee balancing when combined with $u$ and $\boldsymbol{y}$.

IV-A $(r^{\prime},q)$-Gray code prefixes for $q$ odd

When examining the random walk graph for Gray codes with $q$ odd, one notices that the random walk forms an odd function around a specific point. Fig. 3 presents the $(4,3)$-Gray code random walk graph, with $G$ being the intersection point between the horizontal line, $w(\boldsymbol{g})=4$, and the vertical line, $z=40$. The graph forms an odd function around this point $G$. In general, for $(r^{\prime},q)$-Gray codes where $q$ is odd, the random walk of the Gray codes gives an odd function centered around $w(\boldsymbol{g})=\beta_{r^{\prime},q}$ and $z=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor$, where $\lfloor\cdot\rfloor$ represents the floor function.

Figure 3: $(4,3)$-Gray code random walk graph

Lemma 1

The random walk graph of $(r^{\prime},q)$-Gray codes where $q$ is odd forms an odd function around the point $G$.

It was proved in [11] that any $(r^{\prime},q)$-Gray code, where $q$ is odd, is reflected. That is, the random walk graph of the $(r^{\prime},q)$-Gray code forms an odd function centered around the point $G$.

This implies that any subset of an $(r^{\prime},q)$-Gray code around the center of its random walk graph, where the information sequence is such that $kq$ is odd (i.e. $k$ is odd), always has an average weight equal to $\beta_{r^{\prime},q}$. As we need a unique subset of Gray code sequences for any case, we choose $kq$ elements from the “middle” values of $z\in[0,q^{r^{\prime}}-1]$ and call it the $z$-centered subset. The index for this subset is denoted by $z^{\prime}$, with $z^{\prime}\in[z_{1},z_{2}]$. When $kq$ is even (i.e. $k$ is even), it is not guaranteed that the subset of $(r^{\prime},q)$-Gray codes’ average weight around the center equals exactly $\beta_{r^{\prime},q}$. However, it will be very close to it, with a rounded value that is equal to $\beta_{r^{\prime},q}$. We formalize these observations in the subsequent lemma.

Let $\mathcal{G}$ denote the subset of $kq$ Gray code sequences that are used to encode the index $z^{\prime}$, let $\overline{w}(\cdot)$ denote the average weight of a set of sequences and let $\lVert\cdot\rVert$ denote rounding to the nearest integer.

Lemma 2

For an $(r^{\prime},q)$-Gray code subset, $\mathcal{G}$, where $q$ is odd and the $z^{\prime}$-th codewords are chosen with $z^{\prime}\in[z_{1},z_{2}]$, the following holds:

  • if $k$ is odd with $z_{1}=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor-\lfloor\frac{kq}{2}\rfloor$ and $z_{2}=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor+\lfloor\frac{kq}{2}\rfloor$, then $\overline{w}(\mathcal{G})=\beta_{r^{\prime},q}$,

  • if $k$ is even with $z_{1}=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor-\frac{kq}{2}$ and $z_{2}=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor+\frac{kq}{2}-1$, then $\lVert\overline{w}(\mathcal{G})\rVert=\beta_{r^{\prime},q}$.

Proof:

To simplify notation in this proof, we simply use $\beta$ to represent $\beta_{r^{\prime},q}$ throughout.

If $k$ is odd, it follows directly from Lemma 1 that choosing $kq$ sequences (where $kq$ is odd) from $z=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor-\lfloor\frac{kq}{2}\rfloor$ to $z=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor+\lfloor\frac{kq}{2}\rfloor$, centered around $z=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor$, will result in $\overline{w}(\mathcal{G})=\beta$, since the random walk forms an odd function around this point.

In cases where $k$ is even, if $z_{2}$ was chosen as $\lfloor\frac{q^{r^{\prime}}}{2}\rfloor+\frac{kq}{2}$, we would have exactly $\overline{w}(\mathcal{G})=\beta$ (using the same reasoning as for the case where $k$ is odd), as we use $\frac{kq}{2}$ elements to the left of $\lfloor\frac{q^{r^{\prime}}}{2}\rfloor$ and $\frac{kq}{2}$ elements to the right of it. However, this would mean that $kq+1$ elements are being used. Thus, $z_{2}=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor+\frac{kq}{2}-1$ must be used. Let $\alpha$ be the weight of the $(z_{2}+1)$-th Gray code, then

\[ \overline{w}(\mathcal{G})=\frac{(kq+1)\beta-\alpha}{kq}=\beta+\frac{\beta-\alpha}{kq}. \]

The lowest possible value of $\alpha$ is $\alpha_{\min}=0$, and its highest possible value is $\alpha_{\max}=k(q-1)$. Thus,

\[ \beta+\frac{\beta-\alpha_{\max}}{kq}\leq\overline{w}(\mathcal{G})\leq\beta+\frac{\beta-\alpha_{\min}}{kq} \]

and with some manipulations it can be shown that

\[ \beta\left(1-\frac{\frac{2k}{r^{\prime}}-1}{kq}\right)\leq\overline{w}(\mathcal{G})\leq\beta\left(1+\frac{1}{kq}\right). \]

Finally, where $q$ is odd, we have $q\geq 3$, and rounding to the nearest integer results in $\lVert\overline{w}(\mathcal{G})\rVert=\beta$. ∎
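The bounds $z_{1}$ and $z_{2}$ of Lemma 2 and the resulting average weight can be checked numerically with the sketch below. The function names are hypothetical, and the parameters $k=7$ and $k=4$ with $q=3$, $r^{\prime}=3$ are illustrative choices that do not appear in the text.

```python
from fractions import Fraction


def gray_weight(z, r, q):
    """Weight of the z-th (r, q)-Gray codeword under the parity rule of Section II-B."""
    digits = []
    for _ in range(r):
        z, d = divmod(z, q)
        digits.append(d)
    total = 0  # running sum of Gray symbols, i.e. S_i
    for d in reversed(digits):
        total += d if total % 2 == 0 else q - 1 - d
    return total


def centered_subset(k, q, r):
    """Return (z1, z2) as stated in Lemma 2 (q odd)."""
    mid = q ** r // 2
    half = (k * q) // 2
    if k % 2 == 1:
        return mid - half, mid + half
    return mid - half, mid + half - 1


def average_weight(z1, z2, r, q):
    """Exact average weight of the Gray codewords with index in [z1, z2]."""
    ws = [gray_weight(z, r, q) for z in range(z1, z2 + 1)]
    return Fraction(sum(ws), len(ws))


print(centered_subset(7, 3, 3), average_weight(3, 23, 3, 3))  # (3, 23) 3
print(centered_subset(4, 3, 3), average_weight(7, 18, 3, 3))  # (7, 18) 3
```

In both cases the average equals $\beta_{3,3}=3$; for even $k$ the lemma only guarantees this after rounding, although this particular subset happens to hit the value exactly.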

IV-B $(r^{\prime},q)$-Gray code prefixes for $q$ even

For the encoding of sequences that make use of $(r^{\prime},q)$-Gray code prefixes where $q$ is even, a different approach is followed. The subset of Gray code prefixes is obtained by placing a sliding window of length $kq$ over the random walk graph of the $(r^{\prime},q)$-Gray code sequences, and shifting it until we obtain a subset with an average weight value of $\beta_{r^{\prime},q}$. Fig. 4 shows the $(6,2)$-Gray code random walk graph.

However, this process does not always guarantee a subset of Gray code prefixes with an average weight value of exactly $\beta_{r^{\prime},q}$. Since we have flexibility in choosing $u$, we can choose the average weight for the subset to be close to $\beta_{r^{\prime},q}$, and adjust $u$ as necessary to obtain exact balancing.

Figure 4: $(6,2)$-Gray code random walk graph

Lemma 3

An $(r^{\prime},q)$-Gray code subset, $\mathcal{G}$, where $q$ is even, can be chosen such that $\lVert\overline{w}(\mathcal{G})\rVert=\beta_{r^{\prime},q}$.

Proof:

A similar reasoning as in the proof of Lemma 2, where a symbol with weight $\alpha$ is repeatedly removed from the set, can be used to find $\lVert\overline{w}(\mathcal{G})\rVert$. ∎
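One possible way to realize the sliding window search is sketched below. The function names are hypothetical, and picking the earliest window whose average is closest to $\beta_{r^{\prime},q}$ is an assumption made for this sketch; the text only asks for a window whose average equals, or is close to, that value.

```python
from fractions import Fraction


def gray_weight(z, r, q):
    """Weight of the z-th (r, q)-Gray codeword under the parity rule of Section II-B."""
    digits = []
    for _ in range(r):
        z, d = divmod(z, q)
        digits.append(d)
    total = 0  # running sum of Gray symbols, i.e. S_i
    for d in reversed(digits):
        total += d if total % 2 == 0 else q - 1 - d
    return total


def sliding_window_subset(k, q, r):
    """Return (z1, z2) of the earliest length-kq window whose average weight is closest to beta."""
    size = k * q
    beta = Fraction(r * (q - 1), 2)
    weights = [gray_weight(z, r, q) for z in range(q ** r)]
    best = None
    for z1 in range(q ** r - size + 1):
        avg = Fraction(sum(weights[z1:z1 + size]), size)
        gap = abs(avg - beta)
        if best is None or gap < best[0]:  # strict comparison keeps the earliest tie
            best = (gap, z1)
    z1 = best[1]
    return z1, z1 + size - 1


print(sliding_window_subset(3, 4, 2))  # (1, 12)
```

For $q=4$, $k=3$ and $r^{\prime}=2$, the window $[1,12]$ has an average weight of exactly $\beta_{2,4}=3$.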

IV-C Encoding

Having presented all the required components, we now propose our encoding algorithm. The length of the required Gray code prefix is

\[ r^{\prime}=\lceil\log_{q}k\rceil+1, \tag{1} \]

where $\lceil\cdot\rceil$ represents the ceiling function.

The cardinality of $(r^{\prime},q)$-Gray codes equals $q^{r^{\prime}}$. This implies that $q^{r^{\prime}-1}<kq<q^{r^{\prime}}$. The encoding will make use of a subset of $kq$ Gray code sequences from the $q^{r^{\prime}}$ available ones.
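Equation (1) can be evaluated with integer arithmetic only, which avoids floating-point rounding in the logarithm. The function name `prefix_length` is a hypothetical helper.

```python
def prefix_length(k, q):
    """Return r' = ceil(log_q k) + 1 without using floating-point logarithms."""
    t = 0
    power = 1
    while power < k:  # smallest t with q^t >= k
        power *= q
        t += 1
    return t + 1


print(prefix_length(5, 3))  # 3
print(prefix_length(3, 4))  # 2
```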

Theorem 1

Any $q$-ary sequence can be balanced by adding (modulo $q$) an appropriate balancing sequence, $\boldsymbol{b}_{z}$, and prefixing a redundant symbol, $u$, with a Gray code sequence, $\boldsymbol{g}$, taken from the subset of $(r^{\prime},q)$-Gray code prefixes.

Proof:

Let $\mathcal{U}$ denote the set of possible symbols for $u$, i.e. $\mathcal{U}=\{0,1,\ldots,q-1\}$, let $\mathcal{G}$ denote the subset of Gray code sequences, and let $\mathcal{Y}$ denote the set of $kq$ output sequences after the $kq$ balancing sequences are added to the information sequence.

It is easy to see that

\[ \overline{w}(\mathcal{U})=\frac{(q-1)}{2}. \]

From Lemmas 2 and 3, the subset of $(r^{\prime},q)$-Gray code prefixes that corresponds to the $kq$ balancing sequences is chosen such that

\[ \lVert\overline{w}(\mathcal{G})\rVert=\frac{r^{\prime}(q-1)}{2}=\beta_{r^{\prime},q}. \]

It was proved in [5] that the average weight of the $kq$ sequences, $\boldsymbol{y}=\boldsymbol{x}\oplus_{q}\boldsymbol{b}_{z}$, is such that

\[ \overline{w}(\mathcal{Y})=\frac{k(q-1)}{2}=\beta_{k,q}. \]

By considering $\boldsymbol{c}=(u|\boldsymbol{g}|\boldsymbol{y})$, with length $n=k+r^{\prime}+1$, as the overall sequence to be transmitted, it follows that:

\begin{align*}
\overline{w}(\mathcal{U})+\lVert\overline{w}(\mathcal{G})\rVert+\overline{w}(\mathcal{Y}) &=\frac{(q-1)}{2}+\frac{r^{\prime}(q-1)}{2}+\frac{k(q-1)}{2}\\
&=\frac{(k+r^{\prime}+1)(q-1)}{2}\\
&=\beta_{n,q}.
\end{align*}

This implies that there is at least one $\boldsymbol{c}$ for which $w(\boldsymbol{c})\leq\beta_{n,q}$ and at least one other $\boldsymbol{c}$ for which $w(\boldsymbol{c})\geq\beta_{n,q}$. Taking the random walk’s increases into account, as well as the flexibility in choosing $u$, we can conclude that there is at least one $\boldsymbol{c}$ such that $w(\boldsymbol{c})=\beta_{n,q}$. ∎
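The average weight $\overline{w}(\mathcal{Y})=\beta_{k,q}$ used in the proof can be checked numerically for a given information sequence. The sketch below is only a sanity check, with a hypothetical function name and input sequences taken from the examples in this paper.

```python
from fractions import Fraction


def average_output_weight(x, q):
    """Exact average of w(y) over all kq outputs y = x (+) b_z (mod q)."""
    k = len(x)
    total = 0
    for z in range(k * q):
        s, p = divmod(z, k)
        total += sum((xi + (s + 1 if i < p else s)) % q for i, xi in enumerate(x))
    return Fraction(total, k * q)


print(average_output_weight([2, 1, 1, 2, 0], 3))  # 5
print(average_output_weight([2, 1, 0, 1], 3))     # 4
```

The values agree with $\beta_{5,3}=5$ and $\beta_{4,3}=4$. Over the $kq$ balancing sequences, each position receives every symbol of the alphabet exactly $k$ times, so each output symbol averages $(q-1)/2$.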

The encoding algorithm consists of the following steps:

  1. Obtain the correct Gray code length $r^{\prime}$ by using (1). Then find the corresponding subset of $(r^{\prime},q)$-Gray code prefixes, $z^{\prime}\in[z_{1},z_{2}]$, using the methods discussed in Section IV-A where $q$ is odd and in Section IV-B where $q$ is even.

  2. Incrementing through $z$, determine the balancing sequences, $\boldsymbol{b}_{z}$, and add them to the information sequence $\boldsymbol{x}$ to obtain outputs $\boldsymbol{y}$.

  3. For each increment of $z$, append every $\boldsymbol{y}$ with the corresponding Gray code prefix $\boldsymbol{g}$ following the lexicographic order, with $\boldsymbol{g}$ obtained from the $q$-ary representations of the $z^{\prime}$ indices.

  4. Finally, set $u=\beta_{n,q}-w(\boldsymbol{y})-w(\boldsymbol{g})$ if $u\in\{0,1,\ldots,q-1\}$, otherwise set $u=0$.
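The four steps can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the start of the prefix subset, $z_{1}$, is passed in as a parameter rather than derived, and the routine returns every balanced candidate instead of committing to one.

```python
def gray_codeword(z, r, q):
    """z-th (r, q)-Gray codeword under the parity rule of Section II-B."""
    d = [(z // q ** (r - 1 - j)) % q for j in range(r)]
    g, running = [], 0
    for di in d:
        gi = di if running % 2 == 0 else q - 1 - di
        g.append(gi)
        running += gi
    return g


def balanced_candidates(x, q, r, z1):
    """Return [(z, c)] for every z whose c = (u | g | y) reaches weight n(q-1)/2."""
    k = len(x)
    n = k + r + 1
    beta, rem = divmod(n * (q - 1), 2)
    assert rem == 0, "n(q-1)/2 must be an integer"
    found = []
    for z in range(k * q):
        s, p = divmod(z, k)
        y = [(xi + (s + 1 if i < p else s)) % q for i, xi in enumerate(x)]
        g = gray_codeword(z1 + z, r, q)  # z' = z1 + z indexes the prefix subset
        u = beta - sum(y) - sum(g)
        if 0 <= u <= q - 1:
            found.append((z, [u] + g + y))
    return found


hits = balanced_candidates([2, 1, 1, 2, 0], 3, 3, 5)
print([z for z, _ in hits])  # [0, 2, 4, 5, 6, 9, 10, 11, 12, 13]
print(hits[0][1])            # [2, 0, 1, 0, 2, 1, 1, 2, 0]
```

The call above uses the parameters of Example 4 ($q=3$, $r^{\prime}=3$, $z^{\prime}$ starting at 5); the indices returned are the rows with a bold weight in that example.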

We illustrate the encoding algorithm with the following two examples, one for an odd value of qq and the other for an even value of qq.

Example 4

Consider encoding the ternary sequence, $(21120)$, of length 5. Since $r^{\prime}=\lceil\log_{3}5\rceil+1=3$, we require $(3,3)$-Gray code prefixes to encode the $z^{\prime}$ indices. The overall sequence length is $n=k+r^{\prime}+1=9$, and the balancing value is $\beta_{9,3}=9$. The cardinality of the $(3,3)$-Gray code is 27 and the required $z$-centered subset of prefixes containing $kq=15$ elements is such that $z^{\prime}\in[5,19]$.

The following process shows the possible sequences obtained. Again the underlined symbols represent the appended prefix, the bold underlined symbol is $u$, and the bold weights indicate balancing.

\[
\begin{array}{c@{\;}ccc@{\;}c}
z&z^{\prime}&\boldsymbol{x}\oplus_{q}\boldsymbol{b}_{z}=\boldsymbol{y}&\boldsymbol{c}&w(\boldsymbol{c})\\ \hline
0&5&(21120)\oplus_{3}(00000)=(21120)&(\underline{\mathbf{2}010}21120)&\mathbf{9}\\
1&6&(21120)\oplus_{3}(10000)=(01120)&(\underline{\mathbf{0}020}01120)&6\\
2&7&(21120)\oplus_{3}(11000)=(02120)&(\underline{\mathbf{1}021}02120)&\mathbf{9}\\
3&8&(21120)\oplus_{3}(11100)=(02220)&(\underline{\mathbf{0}022}02220)&10\\
4&9&(21120)\oplus_{3}(11110)=(02200)&(\underline{\mathbf{0}122}02200)&\mathbf{9}\\
5&10&(21120)\oplus_{3}(11111)=(02201)&(\underline{\mathbf{0}121}02201)&\mathbf{9}\\
6&11&(21120)\oplus_{3}(21111)=(12201)&(\underline{\mathbf{0}120}12201)&\mathbf{9}\\
7&12&(21120)\oplus_{3}(22111)=(10201)&(\underline{\mathbf{0}110}10201)&6\\
8&13&(21120)\oplus_{3}(22211)=(10001)&(\underline{\mathbf{0}111}10001)&5\\
9&14&(21120)\oplus_{3}(22221)=(10011)&(\underline{\mathbf{2}112}10011)&\mathbf{9}\\
10&15&(21120)\oplus_{3}(22222)=(10012)&(\underline{\mathbf{2}102}10012)&\mathbf{9}\\
11&16&(21120)\oplus_{3}(02222)=(20012)&(\underline{\mathbf{2}101}20012)&\mathbf{9}\\
12&17&(21120)\oplus_{3}(00222)=(21012)&(\underline{\mathbf{2}100}21012)&\mathbf{9}\\
13&18&(21120)\oplus_{3}(00022)=(21112)&(\underline{\mathbf{0}200}21112)&\mathbf{9}\\
14&19&(21120)\oplus_{3}(00002)=(21122)&(\underline{\mathbf{0}201}21122)&11\\
\end{array}
\]

Example 5

Consider encoding the 4-ary sequence (312) of length 3. As before, $r^{\prime}=\lceil\log_{4}3\rceil+1=2$, requiring $(2,4)$-Gray code prefixes to be used. The overall sequence length is $n=6$, and the balancing value is $\beta_{6,4}=9$. The cardinality of the $(2,4)$-Gray code equals 16. The $z^{\prime}$-subset is found by employing a sliding window of length $kq=12$ over the random walk graph of the $(2,4)$-Gray code prefixes, shown in Fig. 5. A suitable subset is found where $z_{1}=1$ and $z_{2}=12$, with an average weight value of 3, which equals $\beta_{2,4}=3$.

Figure 5: $(2,4)$-Gray code random walk graph with chosen subset

The encoding process for the 4-ary sequence is shown next.

$$\begin{array}{c@{\quad}c@{\quad\quad}c@{\;}c@{\;}c@{\;}c@{\;}c@{\quad\quad}c@{\quad\quad}c}
z & z^{\prime} & \boldsymbol{x} & \oplus_{q} & \boldsymbol{b}_{z} & = & \boldsymbol{y} & \boldsymbol{c} & w(\boldsymbol{c})\\ \hline
0 & 1 & (312) & \oplus_{4} & (000) & = & (312) & (\underline{\mathbf{2}01}312) & \mathbf{9}\\
1 & 2 & (312) & \oplus_{4} & (100) & = & (012) & (\underline{\mathbf{0}02}012) & 5\\
2 & 3 & (312) & \oplus_{4} & (110) & = & (022) & (\underline{\mathbf{2}03}022) & \mathbf{9}\\
3 & 4 & (312) & \oplus_{4} & (111) & = & (023) & (\underline{\mathbf{0}13}023) & \mathbf{9}\\
4 & 5 & (312) & \oplus_{4} & (211) & = & (123) & (\underline{\mathbf{0}12}123) & \mathbf{9}\\
5 & 6 & (312) & \oplus_{4} & (221) & = & (133) & (\underline{\mathbf{0}11}133) & \mathbf{9}\\
6 & 7 & (312) & \oplus_{4} & (222) & = & (130) & (\underline{\mathbf{0}10}130) & 5\\
7 & 8 & (312) & \oplus_{4} & (322) & = & (230) & (\underline{\mathbf{2}20}230) & \mathbf{9}\\
8 & 9 & (312) & \oplus_{4} & (332) & = & (200) & (\underline{\mathbf{0}21}200) & 5\\
9 & 10 & (312) & \oplus_{4} & (333) & = & (201) & (\underline{\mathbf{2}22}201) & \mathbf{9}\\
10 & 11 & (312) & \oplus_{4} & (033) & = & (301) & (\underline{\mathbf{0}23}301) & \mathbf{9}\\
11 & 12 & (312) & \oplus_{4} & (003) & = & (311) & (\underline{\mathbf{0}33}311) & 11\\
\end{array}$$
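The table above can be reproduced with a short script. The following is a minimal sketch, not the authors' implementation: it assumes that the prefixes follow the reflected $q$-ary Gray code ordering seen in the examples, that $\boldsymbol{b}_{z}$ has its first $p$ symbols equal to $(s+1) \bmod q$ and the rest equal to $s$, with $z=sk+p$ (as Table III suggests), and that $\beta_{n,q}=\lfloor n(q-1)/2\rfloor$. All function names are hypothetical, and $z_{1}$ is passed in by hand.

```python
def prefix_length(k, q):
    """r' = ceil(log_q k) + 1, computed with integers to avoid float rounding."""
    m = 0
    while q ** m < k:
        m += 1
    return m + 1


def gray_prefix(index, q, length):
    """Reflected q-ary Gray code of `index`, most significant digit first."""
    digits = []
    for _ in range(length):
        digits.append(index % q)
        index //= q
    digits.reverse()
    gray, higher = [], 0  # `higher` is the integer formed by the digits seen so far
    for d in digits:
        gray.append(d if higher % 2 == 0 else q - 1 - d)
        higher = higher * q + d
    return tuple(gray)


def balancing_sequence(z, q, k):
    """b_z: first p symbols are (s+1) mod q, the rest are s, where z = s*k + p."""
    s, p = divmod(z, k)
    return tuple((s + 1) % q if i < p else s % q for i in range(k))


def candidates(x, q, z1):
    """List (z, g, y, u) for every z; u is None when no q-ary symbol balances c."""
    k = len(x)
    r_prime = prefix_length(k, q)
    n = k + r_prime + 1
    beta = n * (q - 1) // 2  # assumed balancing value
    rows = []
    for z in range(k * q):
        b = balancing_sequence(z, q, k)
        y = tuple((xi + bi) % q for xi, bi in zip(x, b))
        g = gray_prefix(z + z1, q, r_prime)
        u = beta - sum(g) - sum(y)  # weight still missing for balance
        rows.append((z, g, y, u if 0 <= u < q else None))
    return beta, rows


# Example 5: x = (312), q = 4, z1 = 1 (found manually in the text).
beta, rows = candidates((3, 1, 2), q=4, z1=1)
for z, g, y, u in rows:
    print(z, g, y, "balanced with u=%d" % u if u is not None else "no valid u")
```

Rows reported as having no valid $u$ correspond to the unbalanced (non-bold) weights in the table.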

IV-D Decoding

Fig. 6 presents the decoding process of our proposed scheme for any $q$-ary information sequence. The decoding algorithm consists of the following steps:

  1. The redundant symbol $u$ is dropped, then the following $r^{\prime}$ symbols are extracted as the Gray code prefix, $\boldsymbol{g}$, which is converted to $\boldsymbol{d}$ and used to find $z^{\prime}$.

  2. From $z^{\prime}$, the corresponding index $z$ is computed as $z=z^{\prime}-z_{1}$.

  3. The index $z$ is used to find the parameters $s$ and $p$, from which $\boldsymbol{b}_{z}$ is derived.

  4. Finally, the original sequence is recovered through $\boldsymbol{x}=\boldsymbol{y}\ominus_{q}\boldsymbol{b}_{z}$.

Figure 6: Decoding process for any $q$-ary information sequence
Example 6

Consider the decoding of the sequence $(\underline{\mathbf{2}100}121200)$ (the underlined symbols are the prefix and the bold underlined symbol is $u$), where $n=10$ and $q=3$, that was encoded using $(3,3)$-Gray code prefixes.

The first symbol, $u=2$, is dropped, then the Gray code prefix is extracted as $(100)$, which corresponds to $z^{\prime}=17$. The $z^{\prime}$-subset of $(3,3)$-Gray code prefixes is $z^{\prime}\in[4,21]$, so $z_{1}=4$ and thus $z=z^{\prime}-z_{1}=13$. This can be seen from Table III, where the decoding of all $(3,3)$-Gray codes is shown.

This implies that $s=2$ and $p=1$, resulting in $\boldsymbol{b}_{13}=(022222)$. Finally, the original information sequence is extracted as $\boldsymbol{x}=\boldsymbol{y}\ominus_{q}\boldsymbol{b}_{z}=(121200)\ominus_{3}(022222)=(102011)$.
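The four decoding steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' code: it assumes the reflected Gray code ordering listed in Table III, takes $z=sk+p$ with $0\leq p<k$ (as the $s,p$ column of Table III suggests), and expects $z_{1}$ to be supplied by the caller. The function names are hypothetical.

```python
def gray_to_index(gray, q):
    """Invert a reflected q-ary Gray code (MSB first) to its integer index z'."""
    index = 0
    for g in gray:
        # A digit is reflected whenever the higher-order part of the index is odd.
        d = g if index % 2 == 0 else q - 1 - g
        index = index * q + d
    return index


def balancing_sequence(z, q, k):
    """b_z: first p symbols are (s+1) mod q, the rest are s, where z = s*k + p."""
    s, p = divmod(z, k)
    return tuple((s + 1) % q if i < p else s % q for i in range(k))


def decode(codeword, q, k, z1):
    """Recover x from c = (u | g | y) following steps 1 to 4."""
    r_prime = len(codeword) - k - 1
    gray = codeword[1:1 + r_prime]      # step 1: drop u, extract the prefix g
    y = codeword[1 + r_prime:]
    z = gray_to_index(gray, q) - z1     # step 2: z = z' - z1
    b = balancing_sequence(z, q, k)     # step 3: s, p and b_z
    return tuple((yi - bi) % q for yi, bi in zip(y, b))  # step 4: x = y - b_z mod q


# Example 6: c = (2 100 121200), q = 3, k = 6, z1 = 4.
print(decode((2, 1, 0, 0, 1, 2, 1, 2, 0, 0), q=3, k=6, z1=4))
```

The symbol-wise subtraction in the last step is independent for each position, which is what allows this part of the decoding to run in parallel.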

TABLE III: Decoding of $(3,3)$-Gray codes for ternary sequences with $k=6$ and $z^{\prime}\in[4,21]$
$$\begin{array}{cccccc}
\mbox{Gray code }(\boldsymbol{g}) & \mbox{Sequence }(\boldsymbol{d}) & z^{\prime} & z & s,p & \boldsymbol{b}_{z}\\ \hline
(000) & (000) & 0 & & & \\
(001) & (001) & 1 & & & \\
(002) & (002) & 2 & & & \\
(012) & (010) & 3 & & & \\
(011) & (011) & 4 & 0 & 0,0 & (000000)\\
(010) & (012) & 5 & 1 & 0,1 & (100000)\\
(020) & (020) & 6 & 2 & 0,2 & (110000)\\
(021) & (021) & 7 & 3 & 0,3 & (111000)\\
(022) & (022) & 8 & 4 & 0,4 & (111100)\\
(122) & (100) & 9 & 5 & 0,5 & (111110)\\
(121) & (101) & 10 & 6 & 1,0 & (111111)\\
(120) & (102) & 11 & 7 & 1,1 & (211111)\\
(110) & (110) & 12 & 8 & 1,2 & (221111)\\
(111) & (111) & 13 & 9 & 1,3 & (222111)\\
(112) & (112) & 14 & 10 & 1,4 & (222211)\\
(102) & (120) & 15 & 11 & 1,5 & (222221)\\
(101) & (121) & 16 & 12 & 2,0 & (222222)\\
(100) & (122) & 17 & 13 & 2,1 & (022222)\\
(200) & (200) & 18 & 14 & 2,2 & (002222)\\
(201) & (201) & 19 & 15 & 2,3 & (000222)\\
(202) & (202) & 20 & 16 & 2,4 & (000022)\\
(212) & (210) & 21 & 17 & 2,5 & (000002)\\
(211) & (211) & 22 & & & \\
(210) & (212) & 23 & & & \\
(220) & (220) & 24 & & & \\
(221) & (221) & 25 & & & \\
(222) & (222) & 26 & & & \\
\end{array}$$

V Redundancy and Complexity

In this section we compare the redundancy and complexity of our proposed scheme with some existing ones.

V-A Redundancy

Let $\mathcal{F}_{q}^{k}$ denote the cardinality of the full set of balanced $q$-ary sequences of length $k$. According to [17],

$$\mathcal{F}_{q}^{k}=q^{k}\sqrt{\frac{6}{\pi k(q^{2}-1)}}\left(1+\mathcal{O}\left(\frac{1}{k}\right)\right).$$
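To get a feel for how tight this approximation is, the exact count can be obtained by dynamic programming over the sequence weight. The sketch below is illustrative only: it takes the balancing value as $\lfloor k(q-1)/2\rfloor$ (an assumption for the case where $k(q-1)$ is odd), uses the sequence length $k$ inside the square root of the leading term, and the function names are hypothetical.

```python
import math


def count_balanced(q, k):
    """Exact number of q-ary sequences of length k with weight floor(k(q-1)/2)."""
    beta = k * (q - 1) // 2
    counts = [1] + [0] * beta  # counts[w]: partial sequences with weight w
    for _ in range(k):
        nxt = [0] * (beta + 1)
        for w, c in enumerate(counts):
            # Append one symbol, ignoring weights that already exceed beta.
            for sym in range(min(q, beta - w + 1)):
                nxt[w + sym] += c
        counts = nxt
    return counts[beta]


def approx_balanced(q, k):
    """Leading term of the asymptotic expression, without the O(1/k) factor."""
    return q ** k * math.sqrt(6.0 / (math.pi * k * (q * q - 1)))


for q, k in [(2, 8), (3, 6), (4, 6)]:
    exact = count_balanced(q, k)
    print(q, k, exact, round(approx_balanced(q, k), 1))
```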

The information sequence length, $k$, in terms of the redundancy, $r$, for the construction in [5] is

$$k\leq\frac{\mathcal{F}_{q}^{r}}{q}\approx q^{r-1}\sqrt{\frac{6}{\pi r(q^{2}-1)}}. \tag{2}$$

In [4], two schemes are presented for $k$ information symbols, where one satisfies the bound

$$k\leq\frac{q^{r}-1}{q-1}, \tag{3}$$

and the other one satisfies

$$k\leq 2\,\frac{q^{r}-1}{q-1}-r. \tag{4}$$

The construction in [8] presents the information sequence length in terms of the redundancy as

$$k\leq\frac{1}{1-2\gamma}\,\frac{q^{r}-1}{q-1}-a_{1}(q,\gamma)\,r-a_{2}(q,\gamma),$$

with $\gamma\in[0,\frac{1}{2})$, where $a_{1}$ and $a_{2}$ are scalars depending on $q$ and $\gamma$. If the compression aspect is ignored, the information sequence length is the same as in (4).

The prefixless scheme presented in [6] has an information sequence length that satisfies

$$k\leq q^{r-1}-r. \tag{5}$$

Two constructions with parallel decoding are presented in [7]. The first construction, where the prefixes are also balanced as in [5], has its information length as a function of $r$ as

$$k\leq\frac{\mathcal{F}_{q}^{r}-\{q\bmod 2+[(q-1)k]\bmod 2\}}{q-1}. \tag{6}$$

The second construction, where the prefixes need not be balanced, is a refinement of the first and has an information length the same as (3).

As presented in Section IV, the redundancy of our new construction is given by $r=\lceil\log_{q}k\rceil+2$. Therefore, the information sequence length in terms of redundancy is

$$k=q^{r-2}. \tag{7}$$

Fig. 7 presents a comparison of the information length, $k$, versus the redundancy, $r$, for various constructions as discussed above. For all $q$, our construction is only comparable to the information lengths from (2) and (6), although it does slightly improve on both.
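The comparison can also be checked numerically for a given $q$ and $r$. The following sketch evaluates the bounds (2) to (7), computing $\mathcal{F}_{q}^{r}$ exactly rather than through its approximation. It is a rough illustration under assumptions: the balancing value is taken as $\lfloor r(q-1)/2\rfloor$, bound (6) is resolved by searching for the largest $k$ that satisfies it (since $k$ appears on both sides), and the function names are hypothetical.

```python
def count_balanced(q, r):
    """Exact number of q-ary sequences of length r with weight floor(r(q-1)/2)."""
    beta = r * (q - 1) // 2
    counts = [1] + [0] * beta
    for _ in range(r):
        nxt = [0] * (beta + 1)
        for w, c in enumerate(counts):
            for sym in range(min(q, beta - w + 1)):
                nxt[w + sym] += c
        counts = nxt
    return counts[beta]


def information_lengths(q, r):
    """Largest information length k allowed by each bound, keyed by equation number."""
    f = count_balanced(q, r)
    geometric = (q ** r - 1) // (q - 1)
    # Bound (6): largest k with (q-1)k + (q mod 2) + ((q-1)k mod 2) <= F.
    k6 = f // (q - 1)
    while k6 > 0 and (q - 1) * k6 + q % 2 + ((q - 1) * k6) % 2 > f:
        k6 -= 1
    return {
        "(2)": f // q,
        "(3)": geometric,
        "(4)": 2 * geometric - r,
        "(5)": q ** (r - 1) - r,
        "(6)": k6,
        "(7)": q ** (r - 2),
    }


print(information_lengths(3, 6))
```

For $q=3$ and $r=6$, this evaluates (2), (6) and (7) to 47, 70 and 81, respectively, while (3), (4) and (5) allow considerably longer information sequences.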

However, the trade-off is that as the redundancy becomes greater, the complexity of our scheme tends to remain constant, as we see in the next section.

Figure 7: Comparison of information sequence length vs. redundancy for various schemes

V-B Complexity

We estimate the complexity of our proposed scheme and compare it to that of existing algorithms.

The techniques in [4] and [8] both require $\mathcal{O}(qk\log_{q}k)$ digit operations for the encoding and decoding. The method from [5] takes $\mathcal{O}(qk\log_{q}k)$ digit operations for the encoding and $\mathcal{O}(1)$ digit operations for the decoding. A refined design of the parallel decoding method is presented in [7], where the complexity equals $\mathcal{O}(k\sqrt{\log_{q}k})$ digit operations for the encoding and $\mathcal{O}(1)$ digit operations for the decoding.

The following pseudo code presents the steps of our encoding method:

Input: Information sequence, x of length k.
Output: Encoded sequence, y of length n=k+r.

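for i=0:kq-1;
   j = z1+i;
   // Gray code index z' paired with balancing index z=i.
   v = [g(j) | x + b(s,p)(i)];
   u = beta - w(v);
   // Weight still needed for a balanced sequence.
   if (0<=u && u<=q-1)
      // u is a valid q-ary symbol, so y is balanced.
      y = [u | v];
      exit();
      // Terminate the program.
   end;
end;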

In the above code, $i$ is the iterator through the $kq$ output sequences and also through the balancing sequences, while $j$ is the iterator through the subset of Gray code sequences, ranging from $z_{1}=\lfloor\frac{q^{r^{\prime}}}{2}\rfloor-\lfloor\frac{kq}{2}\rfloor$ to $z_{2}=z_{1}+kq-1$. The symbol '$\mid$' denotes concatenation.
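As a quick check of these limits, consider the ternary setting of Table III, with $q=3$, $k=6$ and $r^{\prime}=3$. Under those parameters the window evaluates to:

```latex
z_{1}=\left\lfloor\frac{3^{3}}{2}\right\rfloor-\left\lfloor\frac{6\cdot 3}{2}\right\rfloor=13-9=4,
\qquad
z_{2}=z_{1}+kq-1=4+18-1=21,
```

which agrees with the subset $z^{\prime}\in[4,21]$ used in Example 6.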

Our encoding scheme is based on the construction in [5], which has an encoding complexity of $\mathcal{O}(qk\log_{q}k)$, and it takes $\mathcal{O}(\log_{q}k)$ digit operations to encode the Gray code prefixes as presented in [10]. Therefore, the encoding of our algorithm requires $\mathcal{O}(qk\log_{q}k)$ digit operations.

The decoding process consists of very simple steps: the recovery of the index $z^{\prime}$ from the Gray code requires $\mathcal{O}(\log_{q}k)$ digit operations [10]. After obtaining the index $z^{\prime}$ from the Gray code prefix, the balancing sequence $\boldsymbol{b}_{z}$ is found and then the original information sequence is recovered through the operation $\boldsymbol{x}=\boldsymbol{y}\ominus_{q}\boldsymbol{b}_{z}$, which can be performed in parallel, resulting in a complexity of $\mathcal{O}(1)$. Therefore, the overall complexity for the decoding is $\mathcal{O}(\log_{q}k)$ digit operations.

Table IV summarizes the complexities for various constructions, where the orders of digit operations it takes to complete the encoding/decoding are compared.

TABLE IV: Complexities of various schemes (orders are in digit operations)
\begin{tabular}{lcc}
Algorithm & Encoding order & Decoding order\\ \hline
[8] & $\mathcal{O}(qk\log_{q}k)$ & $\mathcal{O}(qk\log_{q}k)$\\
[4] & $\mathcal{O}(qk\log_{q}k)$ & $\mathcal{O}(qk\log_{q}k)$\\
[5] & $\mathcal{O}(qk\log_{q}k)$ & $\mathcal{O}(1)$\\
[7] & $\mathcal{O}(k\sqrt{\log_{q}k})$ & $\mathcal{O}(1)$\\
Our scheme & $\mathcal{O}(qk\log_{q}k)$ & $\mathcal{O}(\log_{q}k)$\\
\end{tabular}

VI Conclusion

An efficient construction has been proposed for balancing non-binary information sequences. By making use of Gray codes for the prefix, no lookup tables are needed; only linear operations are required for the balancing and the Gray code implementation. The encoding scheme has a complexity of $\mathcal{O}(qk\log_{q}k)$ digit operations. For the decoding process, once the Gray code prefix is decoded using $\mathcal{O}(\log_{q}k)$ digit operations, the balancing sequence is determined and the rest of the decoding process is performed in parallel. This makes the decoding fast and efficient.

Possible future research directions include finding a mathematical procedure to determine the subset of Gray code sequences for $q$ even, given that it was found manually, by using a sliding window over the random walk graph. Practically, the redundant symbol $u$ only needs to take on values of zero (when the random walk falls on the balancing value) or one (when the random walk falls just below the balancing value). Thus, unnecessary redundancy is contained in $u$, especially for large values of $q$. However, the flexibility over $u$ increases the occurrences of balanced sequences. These additional balanced outputs could potentially be used to send auxiliary data that could reduce the redundancy. This property was proved for the binary case [12]. Additionally, given that the random walk graph passes through other weights in the region of the balancing value, the scheme can be extended to the construction of constant weight sequences with arbitrary weights.

References

  • [1] K. A. S. Immink, Codes for Mass Data Storage Systems, 2nd ed., Shannon Foundation Publishers, Eindhoven, The Netherlands, 2004.
  • [2] D. E. Knuth, “Efficient balanced codes,” IEEE Transactions on Information Theory, vol. 32, no. 1, pp. 51–53, Jan. 1986.
  • [3] S. Al-Bassam, “Balanced codes,” Ph.D. dissertation, Oregon State University, USA, Jan. 1990.
  • [4] R. M. Capocelli, L. Gargano and U. Vaccaro, “Efficient $q$-ary immutable codes,” Discrete Applied Mathematics, vol. 33, no. 1–3, pp. 25–41, Nov. 1991.
  • [5] T. G. Swart and J. H. Weber, “Efficient balancing of $q$-ary sequences with parallel decoding,” in Proceedings of the IEEE International Symposium on Information Theory, Seoul, Korea, 28 Jun.–3 Jul. 2009, pp. 1564–1568.
  • [6] T. G. Swart and K. A. S. Immink, “Prefixless $q$-ary balanced codes with ECC,” in Proceedings of the IEEE Information Theory Workshop, Seville, Spain, Sep. 9–13, 2013.
  • [7] D. Pelusi, S. Elmougy, L. G. Tallini and B. Bose, “$m$-ary balanced codes with parallel decoding,” IEEE Transactions on Information Theory, vol. 61, no. 6, pp. 3251–3264, Jun. 2015.
  • [8] L. G. Tallini and U. Vaccaro, “Efficient $m$-ary immutable codes,” Discrete Applied Mathematics, vol. 92, no. 1, pp. 17–56, Mar. 1999.
  • [9] F. Gray, “Pulse code communication,” U. S. Patent 2632058, Mar. 1953.
  • [10] D.-J. Guan, “Generalized Gray codes with applications,” in Proceedings of National Science Council, Republic of China, Part A, vol. 22, no. 6, Apr. 1998, pp. 841–848.
  • [11] M. C. Er, “On generating the N-ary reflected Gray codes,” IEEE Transactions on Computers, vol. 33, no. 8, pp. 739–741, Aug. 1984.
  • [12] J. H. Weber and K. A. S. Immink, “Knuth’s balancing of codewords revisited,” IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1673–1679, Apr. 2010.
  • [13] E. N. Mambou and T. G. Swart, “Encoding and decoding of balanced $q$-ary sequences using a Gray code prefix,” in Proceedings of the IEEE International Symposium on Information Theory, Barcelona, Spain, Jul. 10–15, 2016, pp. 380–384.
  • [14] K. A. S. Immink and J. H. Weber, “Very efficient balanced codes,” IEEE Journal on Selected Areas in Communications, vol. 28, no. 2, pp. 188–192, Feb. 2010.
  • [15] L. G. Tallini and B. Bose, “Balanced codes with parallel encoding and decoding,” IEEE Transactions on Computers, vol. 48, no. 8, pp. 794–814, Aug. 1999.
  • [16] B. Bose, “On unordered codes,” Proceedings of the International Symposium on Fault-Tolerant Computing, Pittsburgh, PA, 1987, pp. 102–107.
  • [17] Z. Star, “An asymptotic formula in the theory of compositions,” Aequationes Mathematicae, vol. 13, no. 1, pp. 279–284, Feb. 1975.