
Low-Complexity Near-ML Decoding of Large Non-Orthogonal STBCs using Reactive Tabu Search

N. Srinidhi, Saif K. Mohammed, A. Chockalingam, and B. Sundar Rajan
Department of ECE, Indian Institute of Science, Bangalore 560012, INDIA
Abstract

Non-orthogonal space-time block codes (STBC) with large dimensions are attractive because they can simultaneously achieve both high spectral efficiencies (the same spectral efficiency as V-BLAST for a given number of transmit antennas) and full transmit diversity. Decoding of non-orthogonal STBCs with large dimensions has been a challenge. In this paper, we present a reactive tabu search (RTS) based algorithm for decoding non-orthogonal STBCs from cyclic division algebras (CDA) having large dimensions. Under i.i.d. fading and perfect channel state information at the receiver (CSIR), our simulation results show that RTS based decoding of a $12\times 12$ STBC from CDA with 4-QAM (288 real dimensions) achieves i) $10^{-3}$ uncoded BER at an SNR just 0.5 dB away from SISO AWGN performance, and ii) a coded BER performance within about 5 dB of the theoretical MIMO capacity, using a rate-3/4 turbo code at a spectral efficiency of 18 bps/Hz. RTS is shown to achieve near SISO AWGN performance with fewer dimensions than the LAS algorithm (which we reported recently), at some extra complexity compared to LAS. We also report good BER performance of RTS when the i.i.d. fading and perfect CSIR assumptions are relaxed by considering a spatially correlated MIMO channel model, and by using a training based iterative RTS decoding/channel estimation scheme.

I Introduction

MIMO systems that employ non-orthogonal space-time block codes (STBC) from cyclic division algebras (CDA) for an arbitrary number of transmit antennas, $N_t$, are attractive because they can simultaneously provide both full rate (i.e., $N_t$ complex symbols per channel use, which is the same as in V-BLAST) and full transmit diversity [1],[2]. The $2\times 2$ Golden code is a well known non-orthogonal STBC from CDA for 2 transmit antennas [3]. High spectral efficiencies of the order of tens of bps/Hz can be achieved using large non-orthogonal STBCs. For example, a $16\times 16$ STBC from CDA has 256 complex symbols in it, i.e., 512 real dimensions; with 16-QAM and a rate-3/4 turbo code, this system offers a high spectral efficiency of 48 bps/Hz. Decoding of non-orthogonal STBCs with such large dimensions, however, has been a challenge. The sphere decoder and its low-complexity variants are prohibitively complex for decoding such STBCs with hundreds of dimensions. Recently, we proposed a low-complexity near-ML achieving algorithm to decode large non-orthogonal STBCs from CDA; this algorithm, which is based on a bit-flipping approach, is termed the likelihood ascent search (LAS) algorithm [4]-[6]. In this paper, we present a reactive tabu search (RTS) based approach to near-ML decoding of non-orthogonal STBCs with large dimensions.
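The spectral efficiency and dimension counts quoted above follow from simple arithmetic. The short sketch below reproduces them, assuming a full-rate square STBC ($N_t$ complex symbols per channel use); the function names are ours and purely illustrative.

```python
from math import log2

def spectral_efficiency(n_t, qam_order, code_rate):
    """bps/Hz of a full-rate n_t x n_t STBC: n_t QAM symbols per channel use."""
    return n_t * log2(qam_order) * code_rate

def real_dimensions(n_t):
    """n_t**2 complex symbols per STBC matrix, i.e. 2 * n_t**2 real dimensions."""
    return 2 * n_t * n_t

print(spectral_efficiency(16, 16, 0.75))          # 48.0 (16x16 STBC, 16-QAM, rate 3/4)
print(spectral_efficiency(12, 4, 0.75))           # 18.0 (12x12 STBC, 4-QAM, rate 3/4)
print(real_dimensions(16), real_dimensions(12))   # 512 288
```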

Key attractive features of the proposed RTS based decoding are its low complexity and near-ML performance in systems with large dimensions (e.g., hundreds of dimensions). While creating hundreds of dimensions in space alone (e.g., V-BLAST) requires hundreds of antennas, use of non-orthogonal STBCs from CDA can create hundreds of dimensions with just tens of antennas (space) and tens of channel uses (time). 802.11 smart WiFi products with 12 transmit antennas at 2.5 GHz are now commercially available [7], which establishes that issues related to the placement of many antennas and RF/IF chains can be solved in large aperture communication terminals like set-top boxes and laptops. (The 12 antennas in these products are currently used only for beamforming; single-beam multi-antenna approaches can offer range increase and interference avoidance, but not a spectral efficiency increase.) Given this, large non-orthogonal STBCs (e.g., a $16\times 16$ STBC from CDA) in combination with large dimension near-ML decoding using RTS can enable communications at increased spectral efficiencies of the order of tens of bps/Hz (note that current standards achieve only $<10$ bps/Hz using only up to 4 transmit antennas).

Tabu search (TS), a heuristic originally designed to obtain approximate solutions to combinatorial optimization problems [8]-[11], is increasingly applied in communication problems [12]-[14]. For example, in [12], the design of constellation label maps to maximize asymptotic coding gain is formulated as a quadratic assignment problem (QAP), which is solved using RTS [11]. The RTS approach is shown to be effective in terms of BER performance and efficient in terms of computational complexity in CDMA multiuser detection [13]. In [14], a fixed TS based detection in V-BLAST is presented. In this paper, we establish that RTS based decoding of non-orthogonal STBCs can achieve excellent BER performance (near-ML and near-capacity performance) in large dimensions at practically affordable low complexities. We also present a stopping criterion for the RTS algorithm. RTS for large dimension non-orthogonal STBC decoding has not been reported so far. Our results in this paper can be summarized as follows:

  • Under i.i.d. fading and perfect channel state information at the receiver (CSIR), our simulation results show that RTS based decoding of a $12\times 12$ STBC from CDA with 4-QAM (288 real dimensions) achieves i) $10^{-3}$ uncoded BER at an SNR just 0.5 dB away from SISO AWGN performance, and ii) a coded BER performance within about 5 dB of the theoretical capacity, using a rate-3/4 turbo code at a spectral efficiency of 18 bps/Hz.

  • Compared to the LAS algorithm we reported recently in [4]-[6], RTS achieves near-SISO AWGN performance with fewer dimensions; this is achieved at some extra complexity compared to LAS.

  • We report good BER performance when i.i.d fading and perfect CSIR assumptions are relaxed by adopting a spatially correlated MIMO channel model, and a training based iterative RTS decoding/channel estimation scheme.

II Non-Orthogonal STBC MIMO System Model

Consider an STBC MIMO system with multiple transmit and receive antennas. An $(n,p,k)$ STBC is represented by a matrix ${\bf X}_c\in{\mathbb C}^{n\times p}$, where $n$ and $p$ denote the number of transmit antennas and the number of time slots, respectively, and $k$ denotes the number of complex data symbols sent in one STBC matrix. The $(i,j)$th entry in ${\bf X}_c$ represents the complex number transmitted from the $i$th transmit antenna in the $j$th time slot. The rate of an STBC is $\frac{k}{p}$. Let $N_r$ and $N_t=n$ denote the number of receive and transmit antennas, respectively. Let ${\bf H}_c\in{\mathbb C}^{N_r\times N_t}$ denote the channel gain matrix, where the $(i,j)$th entry in ${\bf H}_c$ is the complex channel gain from the $j$th transmit antenna to the $i$th receive antenna. We assume that the channel gains remain constant over one STBC matrix and vary (i.i.d.) from one STBC matrix to the next. Assuming rich scattering, we model the entries of ${\bf H}_c$ as i.i.d. $\mathcal{CN}(0,1)$. The received space-time signal matrix, ${\bf Y}_c\in{\mathbb C}^{N_r\times p}$, can be written as

$${\bf Y}_c={\bf H}_c{\bf X}_c+{\bf N}_c, \tag{1}$$

where ${\bf N}_c\in{\mathbb C}^{N_r\times p}$ is the noise matrix at the receiver, whose entries are modeled as i.i.d. $\mathcal{CN}\big(0,\sigma^2=\frac{N_tE_s}{\gamma}\big)$, where $E_s$ is the average energy of the transmitted symbols and $\gamma$ is the average received SNR per receive antenna [15]. The $(i,j)$th entry in ${\bf Y}_c$ is the received signal at the $i$th receive antenna in the $j$th time slot. Consider linear dispersion STBCs, where ${\bf X}_c$ can be written in the form [15]

$${\bf X}_c=\sum_{i=1}^{k}x_c^{(i)}{\bf A}_c^{(i)}, \tag{2}$$

where $x_c^{(i)}$ is the $i$th complex data symbol, and ${\bf A}_c^{(i)}\in{\mathbb C}^{N_t\times p}$ is its corresponding weight matrix. The received signal model in (1) can be written in an equivalent V-BLAST form as

$${\bf y}_c=\sum_{i=1}^{k}x_c^{(i)}\,(\widehat{\bf H}_c\,{\bf a}_c^{(i)})+{\bf n}_c=\widetilde{\bf H}_c{\bf x}_c+{\bf n}_c, \tag{3}$$

where ${\bf y}_c=\mathrm{vec}({\bf Y}_c)\in{\mathbb C}^{N_rp\times 1}$, $\widehat{\bf H}_c=({\bf I}\otimes{\bf H}_c)\in{\mathbb C}^{N_rp\times N_tp}$, ${\bf a}_c^{(i)}=\mathrm{vec}({\bf A}_c^{(i)})\in{\mathbb C}^{N_tp\times 1}$, ${\bf n}_c=\mathrm{vec}({\bf N}_c)\in{\mathbb C}^{N_rp\times 1}$, ${\bf x}_c\in{\mathbb C}^{k\times 1}$ is the vector whose $i$th entry is the data symbol $x_c^{(i)}$, and $\widetilde{\bf H}_c\in{\mathbb C}^{N_rp\times k}$ is the matrix whose $i$th column is $\widehat{\bf H}_c\,{\bf a}_c^{(i)}$, $i=1,2,\cdots,k$. Each element of ${\bf x}_c$ is an $M$-PAM/$M$-QAM symbol. Let ${\bf y}_c$, $\widetilde{\bf H}_c$, ${\bf x}_c$, ${\bf n}_c$ be decomposed into real and imaginary parts as:

$$\begin{aligned}{\bf y}_c&={\bf y}_I+j{\bf y}_Q, &\quad {\bf x}_c&={\bf x}_I+j{\bf x}_Q,\\ {\bf n}_c&={\bf n}_I+j{\bf n}_Q, &\quad \widetilde{\bf H}_c&={\bf H}_I+j{\bf H}_Q.\end{aligned} \tag{4}$$

Further, we define ${\bf H}_r\in{\mathbb R}^{2N_rp\times 2k}$, ${\bf y}_r\in{\mathbb R}^{2N_rp\times 1}$, ${\bf x}_r\in{\mathbb R}^{2k\times 1}$, and ${\bf n}_r\in{\mathbb R}^{2N_rp\times 1}$ as

$${\bf H}_r=\left(\begin{array}{cc}{\bf H}_I & -{\bf H}_Q\\ {\bf H}_Q & {\bf H}_I\end{array}\right),\qquad {\bf y}_r=[{\bf y}_I^T\;\;{\bf y}_Q^T]^T, \tag{7}$$
$${\bf x}_r=[{\bf x}_I^T\;\;{\bf x}_Q^T]^T,\qquad {\bf n}_r=[{\bf n}_I^T\;\;{\bf n}_Q^T]^T. \tag{8}$$

Now, (3) can be written as

$${\bf y}_r={\bf H}_r{\bf x}_r+{\bf n}_r. \tag{9}$$

Henceforth, we work with the real-valued system in (9). For notational simplicity, we drop the subscript $r$ in (9) and write

$${\bf y}={\bf H}{\bf x}+{\bf n}, \tag{10}$$

where ${\bf H}={\bf H}_r\in{\mathbb R}^{2N_rp\times 2k}$, ${\bf y}={\bf y}_r\in{\mathbb R}^{2N_rp\times 1}$, ${\bf x}={\bf x}_r\in{\mathbb R}^{2k\times 1}$, and ${\bf n}={\bf n}_r\in{\mathbb R}^{2N_rp\times 1}$. We assume that the channel coefficients are known at the receiver but not at the transmitter. Let ${\mathbb A}\stackrel{\triangle}{=}\{a_q,\ q=1,2,\cdots,M\}$, where $a_q=2q-1-M$, denote the $M$-PAM signal set from which $x_i$ (the $i$th entry of ${\bf x}$) takes values, $i=0,\cdots,2k-1$. The ML solution is given by

$${\bf d}_{ML}=\mathop{\arg\min}_{{\bf d}\in{\mathbb A}^{2k}}\;{\bf d}^T{\bf H}^T{\bf H}{\bf d}-2{\bf y}^T{\bf H}{\bf d}, \tag{11}$$

whose complexity is exponential in $k$.
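To make the real-valued model in (7)-(10) and the exhaustive ML search in (11) concrete, the following is a minimal pure-Python sketch on a toy system. The sizes, the random seed, the noiseless channel, and all function names are illustrative assumptions, not part of the paper's simulations; the brute-force search is only feasible for very small $k$, which is precisely the limitation that motivates RTS.

```python
import itertools
import random

def real_model(H_c, y_c):
    """Stack real/imaginary parts as in (7)-(8): H_r = [[H_I, -H_Q], [H_Q, H_I]]."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H_c]
    bot = [[h.imag for h in row] + [h.real for h in row] for row in H_c]
    y_r = [v.real for v in y_c] + [v.imag for v in y_c]
    return top + bot, y_r

def ml_cost(H, y, d):
    """ML cost of (11): d^T H^T H d - 2 y^T H d, computed via the product H d."""
    Hd = [sum(h * x for h, x in zip(row, d)) for row in H]
    return sum(v * v for v in Hd) - 2 * sum(a * b for a, b in zip(y, Hd))

def ml_decode(H, y, alphabet):
    """Exhaustive search over alphabet^(2k); exponential in k, toy sizes only."""
    n = len(H[0])
    return min(itertools.product(alphabet, repeat=n), key=lambda d: ml_cost(H, y, d))

random.seed(1)
k = 3                                   # complex symbols (toy size, hypothetical)
H_c = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(k)]
       for _ in range(4)]               # 4 complex observations
x_c = [complex(random.choice([-1, 1]), random.choice([-1, 1])) for _ in range(k)]  # 4-QAM
y_c = [sum(h * x for h, x in zip(row, x_c)) for row in H_c]   # noiseless, for the check
H, y = real_model(H_c, y_c)
x_r = [x.real for x in x_c] + [x.imag for x in x_c]
print(list(ml_decode(H, y, [-1, 1])) == x_r)   # True in this noiseless toy case
```

With 4-QAM, each real dimension takes values in the 2-PAM set $\{-1,1\}$, so the search above visits $2^{2k}=64$ candidates; for the $12\times 12$ STBC considered later, the same search would need $2^{288}$ candidates.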

II-A Full-rate Non-orthogonal STBCs from CDA

We focus on the decoding of square (i.e., $n=p=N_t$), full-rate (i.e., $k=pn=N_t^2$), circulant (where the weight matrices ${\bf A}_c^{(i)}$ are of permutation type), non-orthogonal STBCs from CDA [1], whose construction for an arbitrary number of transmit antennas $n$ is given by the matrix in (9.a). In (9.a), $\omega_n=e^{\frac{{\bf j}2\pi}{n}}$, ${\bf j}=\sqrt{-1}$, and $d_{u,v}$, $0\leq u,v\leq n-1$, are the $n^2$ data symbols from a QAM alphabet. When $\delta=t=1$, the code in (9.a) is information lossless (ILL), and when $\delta=e^{\sqrt{5}\,{\bf j}}$ and $t=e^{{\bf j}}$, it is of full diversity and information lossless (FD-ILL) [1]. High spectral efficiencies with large $n$ can be achieved using this code construction. However, since these STBCs are non-orthogonal, ML detection gets increasingly impractical for large $n$. Consequently, a key challenge in realizing the benefits of these large STBCs in practice is that of achieving near-ML performance for large $n$ at low decoding complexities. The BER performance results we report in Sec. IV show that the RTS based decoding algorithm we present in the following section essentially meets this challenge.
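As an illustration of the construction, the sketch below builds the codeword matrix from the data symbols. It rests on our reading of the pattern in (9.a): entry $(r,c)$ is $\sum_{i=0}^{n-1} d_{(r-c)\bmod n,\,i}\,\omega_n^{ci}\,t^i$, multiplied by $\delta$ for entries above the main diagonal. The function name `cda_stbc` is hypothetical.

```python
import cmath

def cda_stbc(d, delta=1.0, t=1.0):
    """Build the n x n STBC matrix from an n x n array d[u][v] of data symbols.

    Assumed pattern: entry (r, c) = sum_i d[(r - c) % n][i] * omega_n**(c*i) * t**i,
    scaled by delta when r < c (above the main diagonal).
    """
    n = len(d)
    omega = cmath.exp(2j * cmath.pi / n)
    X = [[0j] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            u = (r - c) % n
            s = sum(d[u][i] * omega ** (c * i) * t ** i for i in range(n))
            X[r][c] = delta * s if r < c else s
    return X

# 2x2 ILL example (delta = t = 1) with 4-QAM data symbols.
d = [[1 + 1j, 1 - 1j], [-1 + 1j, -1 - 1j]]
for row in cda_stbc(d):
    print([complex(round(v.real, 6), round(v.imag, 6)) for v in row])

# FD-ILL variant of the same codeword: delta = exp(sqrt(5) j), t = exp(j).
X_fd = cda_stbc(d, delta=cmath.exp(1j * 5 ** 0.5), t=cmath.exp(1j))
```

For the $2\times 2$ example, $\omega_2=-1$, so the first column carries $d_{u,0}+d_{u,1}$ and the second column carries $d_{u,0}-d_{u,1}$.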

III RTS Algorithm for Large Non-Orthogonal STBC Decoding

In this section, we present the RTS algorithm for decoding non-orthogonal STBCs. The goal is to get $\widehat{\bf x}$, an estimate of ${\bf x}$, given ${\bf y}$ and ${\bf H}$.

Neighborhood Definitions: For each vector in the solution space, define the neighborhood structure as follows. The symbol neighborhood of a signal point $a_q\in{\mathbb A}$, $q=1,2,\cdots,M$, is defined as a set ${\cal N}(a_q)\subseteq{\mathbb A}\setminus\{a_q\}$; e.g., for 4-PAM, ${\mathbb A}=\{-3,-1,1,3\}$, and ${\cal N}(-3)=\{-1,1\}$, ${\cal N}(1)=\{-1,3\}$, and so on. Then, $N\stackrel{\triangle}{=}|{\cal N}(a_q)|$, $\forall q\in\{1,\cdots,M\}$, is the number of symbol neighbors of $a_q$. Note that the maximum and minimum values $N$ can take are $M-1$ and 1, respectively. Let ${\bf x}^{(m)}=[x_0^{(m)}\;x_1^{(m)}\cdots x_{2k-1}^{(m)}]$ denote the data vector in the $m$th iteration. We refer to the vector

$$\mathbf{z}^{(m)}(u,v)=\big[z_0^{(m)}(u,v)\;\;z_1^{(m)}(u,v)\,\cdots\,z_{2k-1}^{(m)}(u,v)\big], \tag{12}$$

as the $(u,v)$th vector neighbor of $\mathbf{x}^{(m)}$, $u=0,\cdots,2k-1$, $v=0,\cdots,N-1$, if 1) $\mathbf{x}^{(m)}$ differs from $\mathbf{z}^{(m)}(u,v)$ in the $u$th coordinate only, and 2) the $u$th element of $\mathbf{z}^{(m)}(u,v)$ is the $v$th symbol neighbor of $x_u^{(m)}$. That is,

$$z_i^{(m)}(u,v)=\left\{\begin{array}{ll}x_i^{(m)} & \mbox{for}\;\; i\neq u\\ w_v(x_u^{(m)}) & \mbox{for}\;\; i=u,\end{array}\right. \tag{13}$$

where $w_v(a)$, $v=0,1,\cdots,N-1$, is the $v$th element in ${\cal N}(a)$. So we will have $2kN$ vectors which differ from a given vector in the solution space in only one coordinate. These $2kN$ vectors form the neighborhood of the given vector. It is noted that bit-flipping is a special case with $N=1$ and $M=2$.
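The neighborhood definitions can be sketched in a few lines of Python. The rule used here to pick ${\cal N}(a)$, namely the $N$ nearest symbols with ties broken toward the smaller symbol, is an assumption on our part; it reproduces the 4-PAM examples ${\cal N}(-3)=\{-1,1\}$ and ${\cal N}(1)=\{-1,3\}$ given above, but the text does not prescribe a specific rule. All function names are illustrative.

```python
def pam_alphabet(M):
    """M-PAM set {a_q = 2q - 1 - M, q = 1..M}."""
    return [2 * q - 1 - M for q in range(1, M + 1)]

def symbol_neighbors(a, alphabet, N):
    """Assumed rule: the N symbols closest to a (excluding a), smaller symbol first on ties."""
    others = sorted((s for s in alphabet if s != a), key=lambda s: (abs(s - a), s))
    return others[:N]

def vector_neighbors(x, alphabet, N):
    """Yield (u, v, z): z equals x except that coordinate u holds the v-th symbol neighbor."""
    for u, xu in enumerate(x):
        for v, w in enumerate(symbol_neighbors(xu, alphabet, N)):
            z = list(x)
            z[u] = w
            yield u, v, z

A = pam_alphabet(4)
print(A)                               # [-3, -1, 1, 3]
print(symbol_neighbors(-3, A, 2))      # [-1, 1]
print(symbol_neighbors(1, A, 2))       # [-1, 3]
print(len(list(vector_neighbors([1, -3, 3, -1], A, 2))))   # 8, i.e. 2k*N with 2k = 4, N = 2
```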

$$\left[\begin{array}{cccc}
\sum_{i=0}^{n-1}d_{0,i}\,t^i & \delta\sum_{i=0}^{n-1}d_{n-1,i}\,\omega_n^{i}\,t^i & \cdots & \delta\sum_{i=0}^{n-1}d_{1,i}\,\omega_n^{(n-1)i}\,t^i\\
\sum_{i=0}^{n-1}d_{1,i}\,t^i & \sum_{i=0}^{n-1}d_{0,i}\,\omega_n^{i}\,t^i & \cdots & \delta\sum_{i=0}^{n-1}d_{2,i}\,\omega_n^{(n-1)i}\,t^i\\
\sum_{i=0}^{n-1}d_{2,i}\,t^i & \sum_{i=0}^{n-1}d_{1,i}\,\omega_n^{i}\,t^i & \cdots & \delta\sum_{i=0}^{n-1}d_{3,i}\,\omega_n^{(n-1)i}\,t^i\\
\vdots & \vdots & \vdots & \vdots\\
\sum_{i=0}^{n-1}d_{n-2,i}\,t^i & \sum_{i=0}^{n-1}d_{n-3,i}\,\omega_n^{i}\,t^i & \cdots & \delta\sum_{i=0}^{n-1}d_{n-1,i}\,\omega_n^{(n-1)i}\,t^i\\
\sum_{i=0}^{n-1}d_{n-1,i}\,t^i & \sum_{i=0}^{n-1}d_{n-2,i}\,\omega_n^{i}\,t^i & \cdots & \sum_{i=0}^{n-1}d_{0,i}\,\omega_n^{(n-1)i}\,t^i
\end{array}\right] \tag{9.a}$$

Tabu Matrix: A tabu_matrix of size $2kM\times N$ with non-negative integer entries is created; this matrix contains the tabu information for all the moves in the search. A non-zero entry in the tabu_matrix means that the corresponding move is tabu.

RTS Algorithm: Let $\mathbf{g}^{(m)}$ be the vector which has the least ML cost found till the $m$th iteration of the algorithm. Let $l_{rep}$ be the average length (in number of iterations) between two successive occurrences of the same solution vector (repetitions), at the end of an iteration. The tabu period, $P$, is a dynamic non-negative integer parameter: if a move is marked as tabu in an iteration, it will remain tabu for $P$ subsequent iterations. The algorithm starts with an initial solution vector ${\bf x}^{(0)}$, which, for example, could be the MMSE or MF output vector. Set $\mathbf{g}^{(0)}={\bf x}^{(0)}$, $l_{rep}=0$, and $P=P_0$. All the entries of the tabu_matrix are set to zero. The following steps 1) to 3) are performed in each iteration. Consider the $m$th iteration of the algorithm, $m\geq 0$.

Step 1): Define $\mathbf{y}_{mf}\stackrel{\triangle}{=}\mathbf{H}^T\mathbf{y}$, $\mathbf{R}\stackrel{\triangle}{=}\mathbf{H}^T\mathbf{H}$, and $\mathbf{f}^{(m)}\stackrel{\triangle}{=}\mathbf{R}\mathbf{x}^{(m)}-\mathbf{y}_{mf}$. Let $\mathbf{e}^{(m)}(u,v)=\mathbf{z}^{(m)}(u,v)-\mathbf{x}^{(m)}$. The ML costs of the $2kN$ neighbors of $\mathbf{x}^{(m)}$, namely, $\mathbf{z}^{(m)}(u,v)$, $u=0,\cdots,2k-1$, $v=0,\cdots,N-1$, are computed as

$$\begin{aligned}
\phi(\mathbf{z}^{(m)}(u,v)) &= \big(\mathbf{x}^{(m)}+\mathbf{e}^{(m)}(u,v)\big)^T\mathbf{R}\,\big(\mathbf{x}^{(m)}+\mathbf{e}^{(m)}(u,v)\big) - 2\big(\mathbf{x}^{(m)}+\mathbf{e}^{(m)}(u,v)\big)^T\mathbf{y}_{mf}\\
&= \phi(\mathbf{x}^{(m)}) + 2\big(\mathbf{e}^{(m)}(u,v)\big)^T\mathbf{R}\,\mathbf{x}^{(m)} + \big(\mathbf{e}^{(m)}(u,v)\big)^T\mathbf{R}\,\mathbf{e}^{(m)}(u,v) - 2\big(\mathbf{e}^{(m)}(u,v)\big)^T\mathbf{y}_{mf}\\
&= \phi(\mathbf{x}^{(m)}) + \underbrace{2\,e_u^{(m)}(u,v)\,f_u^{(m)} + \big(e_u^{(m)}(u,v)\big)^2\,\mathbf{R}_{u,u}}_{\stackrel{\triangle}{=}\,C\left(e_u^{(m)}(u,v)\right)},
\end{aligned} \tag{14}$$

where $e_u^{(m)}(u,v)$ is the $u$th element of $\mathbf{e}^{(m)}(u,v)$, $f_u^{(m)}$ is the $u$th element of $\mathbf{f}^{(m)}$, and $\mathbf{R}_{u,u}$ is the $(u,u)$th element of $\mathbf{R}$. The term $\phi(\mathbf{x}^{(m)})$ on the RHS in (14) can be dropped since it does not affect the cost minimization. Let

$$(u_1,v_1)=\mathop{\arg\min}_{u,v}\;C\big(e_u^{(m)}(u,v)\big). \tag{15}$$
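The value of (14) is that each neighbor's cost change needs only the scalars $f_u^{(m)}$ and $\mathbf{R}_{u,u}$ rather than a full quadratic form. The sketch below checks this on a toy real-valued system; the numbers in `H` and `y` are hypothetical, and 2-PAM per dimension is assumed so that each coordinate has a single symbol neighbor.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def phi(R, y_mf, x):
    """ML cost x^T R x - 2 x^T y_mf."""
    return sum(a * b for a, b in zip(x, matvec(R, x))) - 2 * sum(a * b for a, b in zip(x, y_mf))

def move_cost(e_u, f_u, R_uu):
    """C(e_u) = 2 e_u f_u + e_u^2 R_uu, the last line of (14)."""
    return 2 * e_u * f_u + e_u * e_u * R_uu

# Toy system (hypothetical numbers): 3 observations, 2k = 2 real dimensions.
H = [[2, 1], [0, 1], [1, -1]]
y = [1, 2, 3]
Ht = [list(col) for col in zip(*H)]
R = [matvec(Ht, col) for col in Ht]       # R = H^T H (symmetric)
y_mf = matvec(Ht, y)                      # y_mf = H^T y
x = [1, -1]
f = [r - b for r, b in zip(matvec(R, x), y_mf)]   # f = R x - y_mf

best = None
for u in range(len(x)):
    z = list(x)
    z[u] = -x[u]                          # the only symbol neighbor when M = 2
    e_u = z[u] - x[u]
    c = move_cost(e_u, f[u], R[u][u])
    assert c == phi(R, y_mf, z) - phi(R, y_mf, x)   # incremental cost equals exact change
    if best is None or c < best[1]:
        best = (u, c)
print(f, best)                            # [-1, -2] (1, 4)
```

In this toy case both candidate moves increase the cost, i.e., the current vector is a local minimum; the best neighbor per (15) is still selected, which is how the search is able to leave local minima.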

The move $(u_1,v_1)$ is accepted if any one of the following two conditions is satisfied:

i) $\phi(\mathbf{z}^{(m)}(u_1,v_1))<\phi(\mathbf{g}^{(m)})$

ii) $\textit{tabu\_matrix}((u_1-1)M+q,v_1)=0$, where $q$ is such that $x_{u_1}^{(m)}=a_q\in\mathbb{A}$.

If move $(u_1,v_1)$ is accepted, then make

$$\mathbf{x}^{(m+1)}=\mathbf{x}^{(m)}+\mathbf{e}^{(m)}(u_1,v_1). \tag{16}$$

If move $(u_1,v_1)$ is not accepted (i.e., neither of conditions i) and ii) is satisfied), find $(u_2,v_2)$ such that

$$(u_2,v_2)=\mathop{\arg\min}_{u,v\,:\,(u,v)\neq(u_1,v_1)}\;C\big(e_u^{(m)}(u,v)\big), \tag{17}$$

and check for acceptance of the $(u_2,v_2)$ move. If this also cannot be accepted, repeat the procedure for $(u_3,v_3)$, and so on. If all the $2kN$ moves are tabu, then all the tabu_matrix entries are decremented by the minimum value in the tabu_matrix; this goes on till one of the moves becomes permissible. Let $(u',v')$ be the index of the neighbor with the minimum cost for which the move is permitted. Let $x_{u'}^{(m)}=a_{q'}=w_{v''}(x_{u'}^{(m+1)})$, and $x_{u'}^{(m+1)}=a_{q''}$, where $a_{q'},a_{q''}\in\mathbb{A}$.
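The acceptance logic of Step 1 can be sketched as follows. This is a simplified illustration, not the authors' implementation: the tabu information is held in a dictionary keyed by the move $(u,v)$ from the current vector (rather than in the $2kM\times N$ tabu_matrix), and when every move is tabu the smallest entry among the current moves is subtracted once, which frees at least one move. The function name and data layout are hypothetical.

```python
def select_move(costs, tabu, phi_x, phi_g):
    """Return the accepted move (u, v).

    costs[(u, v)] : C(e_u(u, v)) for each neighbor of the current vector
    tabu[(u, v)]  : remaining tabu period of that move (0 means permitted)
    phi_x, phi_g  : ML cost of the current vector and of the best vector so far
    """
    ranked = sorted(costs, key=costs.get)          # cheapest neighbor first
    while True:
        for move in ranked:
            aspiration = phi_x + costs[move] < phi_g   # condition i)
            if aspiration or tabu[move] == 0:          # condition ii)
                return move
        # Every move is tabu: subtract the smallest entry so one move is freed.
        least = min(tabu[m] for m in ranked)
        for m in ranked:
            tabu[m] -= least

costs = {(0, 0): -3.0, (1, 0): -1.0, (2, 0): 2.0}
tabu = {(0, 0): 2, (1, 0): 0, (2, 0): 0}
print(select_move(costs, tabu, phi_x=-10.0, phi_g=-12.0))   # (0, 0): tabu, but beats the best cost
print(select_move(costs, tabu, phi_x=-10.0, phi_g=-20.0))   # (1, 0): best non-tabu move
```

Condition i) acts as an aspiration criterion: a tabu move is still taken when it leads to a vector better than any found so far.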

Step 2): After a move is done, the new solution vector is checked for repetition. For the channel model in (10), repetition can be checked by comparing the ML costs of the solutions in the previous iterations. If there is a repetition, the length of the repetition from the previous occurrence is found, the average length, $l_{rep}$, is updated, and the tabu period $P$ is modified as $P=P+1$. If the number of iterations elapsed since the last change of the value of $P$ exceeds $\beta l_{rep}$, for a fixed $\beta>0$, make $P=P-1$. The minimum value of $P$, however, will be 1. Note that this step, if executed, also qualifies as one which changed $P$. After a move $(u',v')$ is accepted, make

$$\begin{aligned}\textit{tabu\_matrix}((u'-1)M+q',v')&=P+1,\\ \textit{tabu\_matrix}((u'-1)M+q'',v'')&=P+1,\end{aligned} \tag{18}$$

and $\mathbf{g}^{(m+1)}=\mathbf{g}^{(m)}$. However, if $\phi(\mathbf{x}^{(m+1)})<\phi(\mathbf{g}^{(m)})$, then make

$$\begin{aligned}\textit{tabu\_matrix}((u'-1)M+q',v')&=0,\\ \textit{tabu\_matrix}((u'-1)M+q'',v'')&=0,\end{aligned} \tag{19}$$

and $\mathbf{g}^{(m+1)}=\mathbf{x}^{(m+1)}$.
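The reactive part of Step 2, i.e., the adaptation of the tabu period $P$, can be sketched as below. This is one possible reading of the text under stated assumptions: repetitions are detected by matching (rounded) ML costs, $l_{rep}$ is kept as a running average of the repetition gaps, and the decrement rule is applied only in iterations without a repetition. The class name and its fields are hypothetical.

```python
class ReactivePeriod:
    """Tracks the tabu period P and the average repetition length l_rep."""

    def __init__(self, P0, beta):
        self.P = P0
        self.beta = beta
        self.l_rep = 0.0        # average gap between repeated solutions
        self.n_rep = 0          # number of repetitions seen so far
        self.last_seen = {}     # ML cost -> iteration of last occurrence
        self.last_change = 0    # iteration at which P last changed

    def update(self, m, cost):
        key = round(cost, 9)    # assumption: equal costs identify equal solutions
        if key in self.last_seen:
            gap = m - self.last_seen[key]
            self.n_rep += 1
            self.l_rep += (gap - self.l_rep) / self.n_rep   # running average
            self.P += 1
            self.last_change = m
        elif m - self.last_change > self.beta * self.l_rep:
            self.P = max(self.P - 1, 1)                     # P never drops below 1
            self.last_change = m
        self.last_seen[key] = m
        return self.P

rp = ReactivePeriod(P0=2, beta=1.0)
costs = [-5.0, -6.0, -5.0, -7.0, -8.0, -9.0]     # hypothetical cost trajectory
print([rp.update(m, c) for m, c in enumerate(costs, start=1)])   # [1, 1, 2, 2, 2, 1]
```

In the trajectory above, the cost $-5$ recurs after two iterations, so $P$ is raised; once more than $\beta l_{rep}$ iterations pass without a change, $P$ is lowered again.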

Step 3): Update the entries of the tabu_matrix as

$$\textit{tabu\_matrix}(r,s)=\max\{\textit{tabu\_matrix}(r,s)-1,\,0\}, \tag{20}$$

for $r=0,\cdots,2kM-1$, $s=0,\cdots,N-1$. $\mathbf{f}^{(m)}$ is updated as

$$\mathbf{f}^{(m+1)}=\mathbf{f}^{(m)}+e_{u'}^{(m)}(u',v')\,\mathbf{R}_{u'}, \tag{21}$$

where $\mathbf{R}_{u'}$ is the $u'$th column of $\mathbf{R}$.
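The update in (21) costs one scaled column addition instead of a full matrix-vector product. A minimal check on a toy example (the values of `R`, `y_mf`, and the chosen move are hypothetical) is:

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def update_f(f, R, u, e_u):
    """f^(m+1) = f^(m) + e_u * (u-th column of R), as in (21)."""
    return [fi + e_u * row[u] for fi, row in zip(f, R)]

R = [[5, 1], [1, 3]]          # toy R = H^T H
y_mf = [5, 0]                 # toy y_mf = H^T y
x = [1, -1]
f = [a - b for a, b in zip(matvec(R, x), y_mf)]     # f = R x - y_mf

u, e_u = 0, -2                # accepted move: coordinate 0 changes from 1 to -1
x_next = list(x)
x_next[u] += e_u
f_next = update_f(f, R, u, e_u)

# The incremental update agrees with recomputing R x - y_mf from scratch.
assert f_next == [a - b for a, b in zip(matvec(R, x_next), y_mf)]
print(f_next)                 # [-11, -4]
```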

Stopping criterion: The algorithm can be stopped based on a fixed number of iterations. Though convergence can be slow at low SNRs (typ. hundreds of iterations), it can be fast (typ. tens of iterations) at moderate to high SNRs. So rather than fixing a large number of iterations to stop the algorithm irrespective of the SNR, we use an efficient stopping criterion which makes use of the knowledge of the best ML cost in a given iteration, as follows.

Since the ML criterion is to minimize ${\|\mathbf{Hx}-\mathbf{y}\|}^2$, the objective function $\mathbf{x}^T\mathbf{H}^T\mathbf{Hx}-2\mathbf{x}^T\mathbf{H}^T\mathbf{y}={\|\mathbf{Hx}-\mathbf{y}\|}^2-\mathbf{y}^T\mathbf{y}$ is lower bounded by $-\mathbf{y}^T\mathbf{y}$, and its minimum over $\mathbf{x}\in\mathbb{R}^{2k}$ is equal to $-\mathbf{y}^T\mathbf{y}$ when $\mathbf{Hx}=\mathbf{y}$ has a solution (e.g., when $\mathbf{H}$ is square and invertible, as with $N_t=N_r$). We stop the algorithm when the least ML cost achieved in an iteration is within a certain range of this global minimum, $-\mathbf{y}^T\mathbf{y}$. We stop the algorithm in the $m$th iteration if the condition

$$\frac{|\phi(\mathbf{g}^{(m)})-(-\mathbf{y}^T\mathbf{y})|}{|-\mathbf{y}^T\mathbf{y}|}<\alpha_1 \tag{22}$$

is met with at least min_iter iterations being completed, to make sure the search algorithm has 'settled.' The bound is gradually relaxed as the number of iterations increases, and the algorithm is terminated when

$$\frac{|\phi(\mathbf{g}^{(m)})-(-\mathbf{y}^T\mathbf{y})|}{|-\mathbf{y}^T\mathbf{y}|}<m\,\alpha_2. \tag{23}$$

In addition, we terminate the algorithm whenever the number of repetitions of solutions exceeds max_rep. Also, the maximum number of iterations is set to max_iter. We have found that use of the following stopping criterion parameters results in low complexity without compromising much on the performance (compared to a fixed number of iterations of 300) for 4-QAM: min_iter = 20, max_iter = 300, max_rep = 75, $\alpha_1=0.05$, and $\alpha_2=0.0005$.
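The stopping rules can be collected into a single predicate, sketched below with the parameter values quoted above as defaults. One point is an assumption on our part: the text attaches the min_iter requirement to (22), and here it is applied to the relaxed bound (23) as well. The function name `should_stop` is hypothetical.

```python
def should_stop(phi_g, yTy, m, n_rep, alpha1=0.05, alpha2=0.0005,
                min_iter=20, max_iter=300, max_rep=75):
    """Decide whether to stop after iteration m.

    phi_g : least ML cost found so far
    yTy   : y^T y, so that -yTy is the reference global minimum
    n_rep : number of solution repetitions observed so far
    """
    if m >= max_iter or n_rep > max_rep:
        return True
    if m < min_iter:                        # let the search 'settle' first
        return False
    gap = abs(phi_g - (-yTy)) / abs(-yTy)   # left-hand side of (22) and (23)
    return gap < alpha1 or gap < m * alpha2

print(should_stop(-98.0, 100.0, m=25, n_rep=0))    # True: gap 0.02 < alpha1
print(should_stop(-98.0, 100.0, m=10, n_rep=0))    # False: fewer than min_iter iterations
print(should_stop(-90.0, 100.0, m=150, n_rep=0))   # False: gap 0.1 >= 150 * alpha2 = 0.075
print(should_stop(-90.0, 100.0, m=250, n_rep=0))   # True: gap 0.1 < 250 * alpha2 = 0.125
```

With these defaults, the relaxed bound $m\alpha_2$ overtakes $\alpha_1$ only beyond $m=100$ iterations, so (22) governs early termination and (23) governs late termination.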

IV Simulation Results

In this section, we present the uncoded/coded BER performance of the RTS algorithm in decoding non-orthogonal STBCs with $\delta=t=1$ (i.e., ILL) and $\delta=e^{\sqrt{5}\,{\bf j}}$, $t=e^{\bf j}$ (i.e., FD-ILL). Our simulation results show that the BER performance of FD-ILL and ILL STBCs with RTS decoding is almost the same. The following RTS parameters are used in all the simulations: MMSE initial vector, $P_0=2$, $\beta=1$ or $0.1$ (as stated in the figure captions), $\alpha_1=5\%$, $\alpha_2=0.05\%$, max_rep = 75, max_iter = 300, min_iter = 20.

IV-A Uncoded BER performance of RTS:

RTS versus LAS Performance: In Fig. 1, we plot the uncoded BER of the RTS algorithm as a function of the average received SNR per receive antenna, $\gamma$ [15], in decoding $4\times 4$ (32 dimensions), $8\times 8$ (128 dimensions) and $12\times 12$ (288 dimensions) non-orthogonal ILL STBCs for 4-QAM and $N_t=N_r$. Perfect CSIR and i.i.d. fading are assumed. For the same settings, the performance of the LAS algorithm in [4]-[6] is also plotted for comparison. The MMSE initial vector is used in both RTS and LAS. As a reference, we have plotted the BER performance on a SISO AWGN channel as well. From Fig. 1, the following interesting observations can be made:

  • The BER of the RTS algorithm improves and approaches SISO AWGN performance as $N_t=N_r$ (i.e., STBC size) is increased; e.g., performance within 0.5 dB of SISO AWGN performance is achieved at $10^{-3}$ uncoded BER in decoding a $12\times 12$ STBC with 288 real dimensions.

  • The RTS algorithm performs better than the LAS algorithm (see the RTS and LAS BER plots for $4\times 4$ and $8\times 8$ STBCs). Further, while both RTS and LAS algorithms exhibit large system behavior (i.e., BER improves as $N_t=N_r$ is increased), RTS is able to achieve nearness to SISO AWGN performance at $10^{-3}$ BER with fewer dimensions than LAS. This is evident by observing that, while LAS requires 512 dimensions ($16\times 16$ STBC) to achieve 1 dB closeness to SISO AWGN performance at $10^{-3}$ BER, RTS is able to achieve even 0.5 dB closeness with just 288 dimensions ($12\times 12$ STBC). RTS is able to achieve this better performance because, while the bit/symbol-flipping strategies are similar in both RTS and LAS, the inherent escape strategy in RTS allows it to move out of local minima and move towards better solutions. Consequently, RTS incurs some extra complexity compared to LAS, without an increase in the order of complexity.

RTS performance in V-BLAST: A similar observation can be made with the uncoded BER of RTS detection in V-BLAST in Fig. 2 for $N_t=N_r$ and 4-QAM. From Fig. 2, it is seen that LAS requires 128 dimensions ($64\times 64$ V-BLAST) to achieve performance within 1 dB of SISO AWGN performance at $10^{-3}$ BER, whereas RTS is able to achieve even better closeness with just 64 dimensions ($32\times 32$ V-BLAST). In summary, the ability to achieve near SISO AWGN performance with fewer dimensions than LAS is an attractive feature of RTS.

Figure 1: Uncoded BER of RTS decoding of $4\times 4$, $8\times 8$ and $12\times 12$ non-orthogonal STBCs from CDA. $N_t=N_r$, ILL STBCs ($\delta=t=1$), 4-QAM. RTS parameters: $P_0=2$, $\beta=1$, $\alpha_1=5\%$, $\alpha_2=0.05\%$, max_iter = 300, min_iter = 20. RTS achieves near SISO AWGN performance for increasing $N_t=N_r$ (i.e., STBC size). RTS performs better than LAS.

IV-B Turbo coded BER performance of RTS

Figure 3 shows the rate-3/4 turbo coded BER of RTS decoding of a $12\times 12$ non-orthogonal ILL STBC with $N_t=N_r$ and 4-QAM (corresponding to a spectral efficiency of 18 bps/Hz), under perfect CSIR and i.i.d. fading. The theoretical minimum SNR required to achieve 18 bps/Hz spectral efficiency on an $N_t=N_r=12$ MIMO channel with perfect CSIR and i.i.d. fading is 4.27 dB (obtained through simulation of the ergodic capacity formula [15]). From Fig. 3, it is seen that RTS decoding is able to achieve a vertical fall in coded BER within about 5 dB of the theoretical minimum SNR, which is good nearness to capacity performance. This nearness to capacity can be further improved by 1 to 1.5 dB if soft decision values, proposed in [5], are fed to the turbo decoder.

Figure 2: Uncoded BER of RTS detection of V-BLAST with $N_t=N_r$ and 4-QAM. RTS parameters: $P_0=2$, $\beta=0.1$, $\alpha_1=5\%$, $\alpha_2=0.05\%$, max_iter = 300, min_iter = 20. RTS achieves near SISO AWGN performance for increasing $N_t=N_r$. RTS performs better than LAS.

IV-C Iterative RTS Decoding/Channel Estimation

Next, we relax the perfect CSIR assumption by considering a training based iterative RTS decoding/channel estimation scheme. Transmission is carried out in frames, where one $N_t\times N_t$ pilot matrix (for training purposes) followed by $N_d$ data STBC matrices are sent in each frame, as shown in Fig. 4. One frame length, $T$ (taken to be the channel coherence time), is $T=(N_d+1)N_t$ channel uses. The proposed scheme works as follows: i) obtain an MMSE estimate of the channel matrix during the pilot phase, ii) use the estimated channel matrix to decode the data STBC matrices using the RTS algorithm, and iii) iterate between channel estimation and RTS decoding a certain number of times. For the $12\times 12$ ILL STBC, in addition to perfect CSIR performance, Fig. 3 also shows the performance with CSIR estimated using the above iterative RTS decoding/channel estimation scheme for $N_d=8$ and $N_d=20$. Two iterations between RTS decoding and channel estimation are used. With $N_d=20$ (which corresponds to large coherence times, i.e., slow fading), the BER and bps/Hz with estimated CSIR get closer to those with perfect CSIR.
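The frame structure implies a simple pilot overhead calculation, sketched below. It assumes that the only throughput loss relative to perfect CSIR is the one pilot matrix per frame, so the effective rate is the nominal rate scaled by $N_d/(N_d+1)$; this scaling is our illustration and not a figure reported in the simulations. The function names are hypothetical.

```python
def frame_length(n_d, n_t):
    """T = (N_d + 1) * N_t channel uses: one pilot matrix plus N_d data matrices."""
    return (n_d + 1) * n_t

def effective_rate(n_d, nominal_bps_hz):
    """Nominal spectral efficiency scaled by the data fraction N_d / (N_d + 1)."""
    return nominal_bps_hz * n_d / (n_d + 1)

for n_d in (8, 20):
    print(n_d, frame_length(n_d, 12), round(effective_rate(n_d, 18.0), 2))
# 8 108 16.0
# 20 252 17.14
```

Under this assumption, a longer coherence time (larger $N_d$) moves the effective rate closer to the nominal 18 bps/Hz, in line with the trend noted above.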

IV-D Effect of MIMO Spatial Correlation

In Figs. 1 to 3, we assumed i.i.d. fading. But spatial correlation at the transmit/receive antennas and the structure of the scattering and propagation environment can affect the rank structure of the MIMO channel, resulting in degraded performance [16],[17]. We relaxed the i.i.d. fading assumption by considering the correlated MIMO channel model proposed by Gesbert et al. in [17], which takes into account the carrier frequency ($f_c$), the spacing between antenna elements ($d_t,d_r$), the distance between tx and rx antennas ($R$), and the scattering environment. In Fig. 5, we plot the uncoded BER of RTS decoding of the $12\times 12$ FD-ILL STBC with perfect CSIR in i) i.i.d. fading, and ii) the correlated MIMO fading model in [17]. It is seen that, compared to i.i.d. fading, there is a loss in diversity order under spatial correlation for $N_t=N_r=12$; further, use of more receive antennas ($N_r=14$, $N_t=12$) alleviates this loss in performance. Finally, we note that we have carried out simulations of RTS decoding for 16-QAM as well, where results similar to those reported here for 4-QAM are observed. RTS decoding can be used to decode perfect codes of large dimensions as well.

Figure 3: Turbo coded BER of RTS decoding of 12 × 12 non-orthogonal ILL STBC with N_t = N_r, 4-QAM, rate-3/4 turbo code, and 18 bps/Hz. RTS parameters: P_0 = 2, β = 1, α_1 = 5%, α_2 = 0.05%, max_iter = 300, min_iter = 20. BER of RTS with estimated CSIR approaches that with perfect CSIR for increasing N_d (i.e., slow fading).
Figure 4: Transmission scheme with one pilot matrix followed by N_d data STBC matrices in each frame.
Figure 5: Effect of spatial correlation on the performance of RTS decoding of 12 × 12 FD-ILL STBC with N_t = 12, N_r = 12, 14, 4-QAM, rate-3/4 turbo code, 18 bps/Hz. Correlated MIMO channel parameters: f_c = 5 GHz, R = 500 m, S = 30, D_t = D_r = 20 m, θ_t = θ_r = 90°, N_r d_r = 72 cm, d_t = d_r. Spatial correlation degrades the achieved diversity order compared to i.i.d. fading. Increasing N_r alleviates this performance loss.

References

  • [1] B. A. Sethuraman, B. Sundar Rajan, and V. Shashidhar, “Full-diversity high-rate space-time block codes from division algebras,” IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2596-2616, October 2003.
  • [2] F. Oggier, J.-C. Belfiore, and E. Viterbo, Cyclic Division Algebras: A Tool for Space-Time Coding, Foundations and Trends in Commun. and Inform. Theory, vol. 4, no. 1, pp. 1-95, Now Publishers, 2007.
  • [3] J.-C. Belfiore, G. Rekaya, and E. Viterbo, “The golden code: A 2 × 2 full-rate space-time code with non-vanishing determinants,” IEEE Trans. Inform. Theory, vol. 51, no. 4, April 2005.
  • [4] K. Vishnu Vardhan, Saif K. Mohammed, A. Chockalingam, and B. Sundar Rajan, “A low-complexity detector for large MIMO systems and multicarrier CDMA systems,” IEEE JSAC Spl. Iss. on Multiuser Detection for Adv. Commun. Systems and Networks, pp. 473-485, April 2008.
  • [5] Saif K. Mohammed, A. Chockalingam, and B. Sundar Rajan, “A low-complexity near-ML performance achieving algorithm for large MIMO detection,” Proc. IEEE ISIT’2008, Toronto, July 2008.
  • [6] Saif K. Mohammed, A. Chockalingam, and B. Sundar Rajan, “High-rate space-time coded large MIMO systems: Low-complexity detection and performance,” Proc. IEEE GLOBECOM’2008, December 2008.
  • [7] http://www.ruckuswireless.com/technology/beamflex.php
  • [8] F. Glover, “Tabu Search - Part I,” ORSA Journal on Computing, vol. 1, no. 3, pp. 190-206, Summer 1989.
  • [9] F. Glover, “Tabu Search - Part II,” ORSA Journal on Computing, vol. 2, no. 1, pp. 4-32, Winter 1990.
  • [10] F. Glover and M. Laguna, “Tabu search,” in Modern Heuristic Techniques for Combinatorial Problems, Colin R. Reeves, Ed., pp. 70-150, Blackwell Scientific Publications, Oxford, 1993.
  • [11] R. Battiti and G. Tecchiolli, “The reactive tabu search,” ORSA Journal on Computing, no. 2, pp. 126-140, 1994.
  • [12] Y. Huang and J. A. Ritcey, “Improved 16-QAM constellation labeling for BI-STCM-ID with the Alamouti scheme,” IEEE Commun. Letters, vol. 9, no. 2, pp. 157-159, February 2005.
  • [13] P. H. Tan and L. K. Rasmussen, “Multiuser detection in CDMA - A comparison of relaxations, exact, and heuristic search methods,” IEEE Trans. Wireless Commun., pp. 1802-1809, September 2004.
  • [14] H. Zhao, H. Long, and W. Wang, “Tabu search detection for MIMO systems,” Proc. IEEE PIMRC’2007, Athens, September 2007.
  • [15] H. Jafarkhani, Space-Time Coding: Theory and Practice, Cambridge University Press, 2005.
  • [16] D. Shiu, G. J. Foschini, M. J. Gans, and J. M. Kahn, “Fading correlation and its effect on the capacity of multi-antenna systems,” IEEE Trans. on Commun., vol. 48, pp. 502-513, March 2000.
  • [17] D. Gesbert, H. Bölcskei, D. A. Gore, and A. J. Paulraj, “Outdoor MIMO wireless channels: Models and performance prediction,” IEEE Trans. on Commun., vol. 50, pp. 1926-1934, December 2002.