This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Low Complexity Iterative Rake Decision Feedback Equalizer for Zero-Padded OTFS systems

Tharaj Thaj and Emanuele Viterbo
A preliminary version of this work will be presented in part at the IEEE Wireless Communications and Networking Conference (WCNC), May 2020, [3]. The authors are with Department of Electrical and Computer Systems Engineering, Monash University, Clayton, VIC 3800, Australia. E-mail: tharaj.thaj@monash.edu, emanuele.viterbo@monash.edu. This work was supported by the Australian Research Council through the Discovery Project under Grant DP200100096. Simulations were undertaken with the assistance of resources and services from the National Computational Infrastructure (NCI), which is supported by the Australian Government. ECSE Department, Monash University, Clayton, VIC 3800, Australia
Email: {tharaj.thaj, emanuele.viterbo}@monash.edu
Abstract

This paper presents a linear complexity iterative rake detector for the recently proposed orthogonal time frequency space (OTFS) modulation scheme. The basic idea is to extract and coherently combine the received multipath components of the transmitted symbols in the delay-Doppler grid using maximal ratio combining (MRC) to improve the SNR of the combined signal. We reformulate the OTFS input-output relation in simple vector form by placing guard null symbols or zero padding (ZP) in the delay-Doppler grid and exploiting the resulting circulant property of the blocks of the channel matrix. Using this vector input-output relation we propose a low complexity iterative decision feedback equalizer (DFE) based on MRC. The performance and complexity of the proposed detector favorably compares with the state of the art message passing detector. An alternative time domain MRC based detector is also proposed for even faster detection. We further propose a Gauss-Seidel based over-relaxation parameter in the rake detector to improve the performance and the convergence speed of the iterative detection. We also show how the MRC detector can be combined with outer error-correcting codes to operate as a turbo DFE scheme to further improve the error performance. All results are compared with a baseline orthogonal frequency division multiplexing (OFDM) scheme employing a single tap minimum mean square error (MMSE) equalizer.

Index Terms:
OTFS, Detector, Decoder, Rake, Maximal Ratio Combining, Delay–Doppler Channel, Turbo, DFE, Gauss Seidel, Successive Over-Relaxation.

I Introduction

Orthogonal time frequency and space (OTFS) is a new two dimensional (2D) modulation technique that transforms information symbols in the delay-Doppler domain to the familiar time-frequency domain by spreading all the information symbols (e.g., QAM) over both time and frequency to achieve maximum effective111Effective diversity introduced for OTFS in [2] is a more meaningful measure of the actual diversity at practical SNR values, when the number of transmitted symbols is large. diversity [1, 2]. As a result, a time-frequency selective channel due to multipath fading and mobility, is converted into a separable and quasi-orthogonal interaction, where all received information symbols experience roughly the same localized impairment [1]. Hence, for each information symbol, the received components in all the delay-Doppler diversity branches can be separated and coherently combined.

OTFS can also be interpreted as a two-dimensional code division multiple access (CDMA) scheme, where information symbols are spread in both time and frequency, differently from conventional CDMA systems [1]. In direct sequence CDMA operating in a multipath fading channel, a rake receiver works by combining the delayed components (or echoes) of the transmitted symbols extracted by using matched filters tuned to the respective delay shifts. Similarly, in the case of OTFS, the received delay shifted and Doppler shifted components of the transmitted information symbols can be extracted and coherently combined using linear diversity combining techniques to improve the SNR of the accumulated signal.

Diversity combining techniques are well studied in the literature starting from Brennan’s paper on linear diversity combining [4]. Rake receivers for time domain combining using a variety of linear combining schemes like maximal ratio combining (MRC), equal gain combining (EGC) and selection combining (SC) are discussed in [5, 6]. MRC is shown to be optimal in the case of correlated and uncorrelated branches, even for unequal noise and interference power in the branches [7]. Moreover, iterative rake combining schemes and its variants are shown to combat inter-symbol interference better and are well investigated in the literature for CDMA systems [8].

In this paper, we propose an iterative rake receiver for the OTFS system using the maximal ratio combining scheme. Following [3], we group the delay-Doppler grid symbols into vectors according to their delay index and reformulate the input-output relation between the transmitted and received frames in terms of these transmitted and received vectors. By placing some null symbols (zero-padding (ZP)) in the delay-Doppler domain we arrive at a reduced input-output relation, which allows the use of the maximal ratio combining to design a low complexity detector for OTFS. The overhead of the null guard symbols, needed for the proposed detection scheme, also allows to insert pilot symbols at no additional cost [10]. These null symbols in the delay-Doppler domain act as interleaved ZP guard bands in the time-domain. Taking advantage of this interleaved time-domain ZP, we further present an alternate low complexity time-domain MRC based detection for OTFS.

OTFS with the ZP guard band as mentioned above is similar to the Doppler-resilient orthogonal signal division multiplexing (D-OSDM) scheme recently proposed in [11] for under water acoustic channels [12] which is modelled as relatively faster time-varying as compared to the vehicular channel model assumption [13]. Even though the information symbols in both schemes are transmitted in the delay-Doppler domain, the main advantage of the general OTFS transceiver structure is the provision to insert arbitrary frequency domain windowing, which is not a part of the D-OSDM scheme. Windowing allows OTFS to select a subset of sub-carriers for transmission and reception, which is particularly useful in multi-user communication schemes.

The rest of the paper is organized as follows. In Section II, we discuss the system model and derive the input-output relation in the vector form. To understand the operation of the proposed detector, we look at the input-output relation in delay-time and time domains in Section III. In Section IV, the proposed MRC based iterative rake detector, its low complexity implementation and the conditions for convergence are described. In Section V, we propose further improvements to the rake detector providing faster convergence and better error performance. The simulation results are given in Section VI followed by a discussion on the complexity of the proposed algorithm in Section VII. Section VIII contains our concluding remarks and future research directions.

II OTFS System Model

II-A Notations

The following notations will be followed in this paper: aa, 𝐚\bf{a}, 𝐀{\bf A} represent a scalar, vector, and matrix, respectively; 𝐚(n){\bf a}(n) and 𝐀(m,n){\bf A}(m,n) represent the nn-th and (m,n)(m,n)-th element of 𝐚{\bf a} and 𝐀{\bf A}, respectively; 𝐀{\bf A}^{\dagger}, 𝐀{\bf A}^{*} and 𝐀n{\bf A}^{n} represent the Hermitian transpose, complex conjugate and nn-th power of 𝐀{\bf A}. The set of M×NM\times N dimensional matrices with complex entries are denoted by N×M{\mathbb{C}}^{N\times M}. Let \circledast represent circular convolution, \otimes, the Kronecker product, \circ, the Hadamard product (i.e., the element wise multiplication) and, \oslash, the Hadamard division (i.e., the element wise division). Let |𝒮||\mathcal{S}| denote the cardinality of the set 𝒮\mathcal{S}, tr(A)\mathrm{tr}(A), the trace of the square matrix 𝐀{\bf A}, vec(𝐀)({\bf A}), the column-wise vectorization of the matrix 𝐀{\bf A} and vecN,M1(𝐚){\rm vec}_{N,M}^{-1}({\bf a}) is the matrix formed by folding a vector 𝐚{\bf a} into a N×MN\times M matrix by filling it column wise. Let 𝐅N{\bf F}_{N} be the normalized NN point discrete Fourier transform (DFT) matrix with elements 𝐅N(i,k)=N1/2ej2πik/N{\bf F}_{N}(i,k)=N^{-1/2}{\rm e}^{-j2\pi ik/N} and 𝐅N{\bf F}_{N}^{{\dagger}} the inverse discrete Fourier transform (IDFT) matrix, 𝐈M{\bf I}_{M}, the M×MM\times M identity matrix. The vectors 𝟎N{\bf 0}_{N} and 𝟏N{\bf 1}_{N} denote a NN length column vector of zeros and ones, respectively. The scalar z=ej2πMNz={\rm e}^{\frac{j2\pi}{MN}}.

II-B Transmitter and Receiver Operation

The transmitter and receiver operations for the general OTFS system are described in [9, 15]. We will be using the following matrix/vector representation throughout the paper. Let 𝐗{\bf X}, 𝐘{\bf Y} M×N\in\mathbb{C}^{M\times N} be the transmitted and received two-dimensional delay-Doppler grid, forming a frame of M×NM\times N 𝒬\mathcal{Q}-QAM symbols, with unit average energy. Let 𝐱m,𝐲mN×1{\bf x}_{m},{\bf y}_{m}\in\mathbb{C}^{N\times 1} be column vectors containing the symbols in the mm-th row of 𝐗{\bf X} and 𝐘{\bf Y}, respectively: 𝐱m{\bf x}_{m} = [𝐗(m,0),𝐗(m,1),,𝐗(m,N1)]T[{\bf X}(m,0),{\bf X}(m,1),\cdots,{\bf X}(m,N-1)]^{\text{T}} and 𝐲m{\bf y}_{m} = [𝐘(m,0),𝐘(m,1),,𝐘(m,N1)]T[{\bf Y}(m,0),{\bf Y}(m,1),\cdots,{\bf Y}(m,N-1)]^{\text{T}}, where mm and nn denote the delay (row) and Doppler (column) indices, respectively, in the two-dimensional grid. The total frame duration and bandwidth of the transmitted OTFS signal frame are Tf=NTT_{f}=NT and B=MΔfB=M\Delta f, respectively. We consider the case where TΔf=1T\Delta f=1, i.e., the OTFS signal is critically sampled for any pulse shaping waveform.

II-B1 Basic OTFS Transmitter and Receiver

The delay-Doppler domain symbols in 𝐗{\bf X} is converted to the time-frequency domain (𝐗tf)({\bf X}_{\rm tf}) using the inverse symplectic fast Fourier transform (ISFFT) operation.

𝐗tf=𝐅M𝐗𝐅N{\bf X}_{\rm tf}={{\bf F}}_{M}\cdot{{\bf X}}\cdot{{\bf F}}_{N}^{{\dagger}} (1)

The “Heisenberg transform modulator” generates the time domain signal from the time-frequency samples using an M-point IFFT along with the pulse-shaping waveform gtx(t)g_{\rm tx}(t). The transmitted signal can be written as

𝐒=𝐆tx(𝐅M𝐗tf)=𝐆tx(𝐗𝐅N){{\bf S}}={{\bf G}}_{\rm tx}\cdot({{\bf F}}_{M}^{{\dagger}}\cdot{\bf X}_{\rm tf})={{\bf G}}_{\rm tx}\cdot({{\bf X}}\cdot{{\bf F}}_{N}^{{\dagger}}) (2)

where the diagonal matrix 𝐆tx{\bf G}_{\rm tx} has the samples of gtx(t)g_{\rm tx}(t) as its entries: 𝐆tx=diag[gtx(0),gtx(T/M),,gtx((M1)T/M)]𝒞M×M{\bf G}_{\rm tx}=\text{diag}[g_{\rm tx}(0),g_{\rm tx}(T/M),\ldots,g_{\rm tx}((M-1)T/M)]\in\mathcal{C}^{M\times M}. Let 𝐗~\mathbf{\tilde{\text{$\bf X$}}} be the matrix containing the delay-time samples before applying pulse shaping waveform and is related to the delay-Doppler domain symbols as

𝐗~T=[𝐱~0,,𝐱~M1]=𝐅N[𝐱0,,𝐱M1]=𝐅N𝐗T.\displaystyle{\mathbf{\tilde{\text{$\bf X$}}}}^{\rm T}=[\mathbf{\tilde{\text{$\bf x$}}}_{0},\ldots,\mathbf{\tilde{\text{$\bf x$}}}_{M-1}]={{\bf F}}_{N}^{{\dagger}}[{\bf x}_{0},\ldots,{\bf x}_{M-1}]={{\bf F}}_{N}^{{\dagger}}\cdot{\bf X}^{\rm T}. (3)

The time domain vector 𝐬𝒞NM×1{\bf s}\in\mathcal{C}^{NM\times 1}, to be transmitted into the physical channel can be written as

𝐬=vec(𝐆tx𝐗~).\displaystyle{\bf s}=\text{vec}({\bf G}_{\rm tx}\cdot\mathbf{\tilde{\text{$\bf X$}}}). (4)

These samples are pulse shaped and transmitted as a continuous time signal s(t)s(t). At the receiver, the delay-time samples are obtained from the sampled received time domain waveform 𝐫NM×1{\bf r}\in\mathbb{C}^{NM\times 1} as

𝐘~=vecN,M1((𝐈M𝐆rx)𝐫),\displaystyle\mathbf{\tilde{\text{$\bf Y$}}}=\text{vec}_{N,M}^{-1}\!\left(({\bf I}_{M}\otimes{\bf G}_{\rm rx})\cdot{\bf r}\right), (5)

where the diagonal matrix 𝐆rx{\bf G}_{\rm rx} has the samples of grx(t)g_{\rm rx}(t) as its entries: 𝐆rx=diag[grx(0),grx(T/M),,grx((M1)T/M)]𝒞M×M{\bf G}_{\rm rx}=\text{diag}[g_{\rm rx}(0),g_{\rm rx}(T/M),\ldots,g_{\rm rx}((M-1)T/M)]\in\mathcal{C}^{M\times M} is the pulse shaping filter at the receiver. The received delay-Doppler and delay-time domain symbols are related as

𝐘T=[𝐲0,,𝐲M1]=𝐅N[𝐲~0,,𝐲~M1]=𝐅N𝐘~T.\displaystyle{{\bf Y}}^{\rm T}=[{\bf y}_{0},\ldots,{\bf y}_{M-1}]={{\bf F}}_{N}[\mathbf{\tilde{\text{$\bf y$}}}_{0},\ldots,\mathbf{\tilde{\text{$\bf y$}}}_{M-1}]={\bf F}_{N}\cdot\mathbf{\tilde{\text{$\bf Y$}}}^{\rm T}. (6)

II-B2 Rectangular pulse shaping waveforms

Refer to caption
Figure 1: Discrete baseband model of the ZP-OTFS system for N=6,M=8N=6,M=8 for (a) transmitter (b) receiver and (c) the discrete delay-Doppler channel at the set of discrete delay tap indices ={0,1,2}.\mathcal{L}=\{0,1,2\}. The samples shown using the same colour in (c) represent the Doppler response in the same delay tap. In (b), two versions of the proposed Rake receiver are presented (see Section IV). The receiver chain on the top part of (b) operates directly in the information symbol domain, i.e., the delay-Doppler domain (see Algorithm 1 in Section IV.A) and the bottom part of (b) is the faster version (see Algorithm 2 in Section IV.B) which operates in the delay-time domain.

In this paper, we consider rectangular transmit and received pulse shaping waveforms which is equivalent to time-domain windowing, i.e., 𝐆tx=𝐆rx=𝐈M{\bf G}_{\rm tx}={\bf G}_{\rm rx}={\bf I}_{M}.222In general, the pulse shaping waveforms (𝐆tx)({\bf G}_{\rm tx}) could be circulant matrices (equivalent to time-domain filtering). The transmitted and received time domain discrete samples 𝐬,𝐫{\bf s,r} can then be written in terms of the delay-time samples 𝐱~m\mathbf{\tilde{\text{$\bf x$}}}_{m} and 𝐲~m\mathbf{\tilde{\text{$\bf y$}}}_{m} as

𝐬(m+nM)=𝐱~m(n),\displaystyle{\bf s}(m+nM)=\mathbf{\tilde{\text{$\bf x$}}}_{m}(n),
𝐫(m+nM)=𝐲~m(n).\displaystyle{\bf r}(m+nM)=\mathbf{\tilde{\text{$\bf y$}}}_{m}(n). (7)

In this case, the transmitted and received discrete time domain signal samples can be related to the delay-Doppler domain information symbols as

𝐬=vec(𝐗𝐅N)and𝐫=vec(𝐘𝐅N).\displaystyle{\bf s}=\text{vec}({\bf X}\cdot{\bf F}_{N}^{{\dagger}})\quad\text{and}\quad{\bf r}=\text{vec}({\bf Y}\cdot{\bf F}_{N}^{{\dagger}}). (8)

The operation in (8) in the literature is known as the inverse discrete Zak transform [16].

The simplified transmitter and receiver baseband equivalent model for rectangular pulse shaping waveforms and two MRC based detection methods (to be discussed in Section IV) are shown in Fig. 1 (a) and (b). The last lmaxl_{\rm max} symbol vectors (rows) of the transmitted delay-Doppler grid, where lmaxl_{\rm max} is the maximum channel delay spread index, are made zero to avoid inter-block interference in the time-domain. These zero vectors aid in reducing the complexity of detection for OTFS (explained in Section III-B) by allowing parallel processing of the NN independent time domain blocks of duration TT.

For the rest of the paper, to differentiate with the basic OTFS scheme, as discussed in [1, 9], we refer to the above scheme including zero padding as the ZP-OTFS. Our main motivation behind adding the delay-Doppler domain ZP is the design of a low complexity detector for OTFS, [3]. Adding a ZP along the delay dimension in the OTFS delay-Doppler grid can be seen as analogous to the time-domain CP or ZP added in orthogonal frequency division multiplexing (OFDM), which allows the design of a single tap equalizer in the time-frequency domain, and hence contribute to reduction in detector complexity. Moreover, in OTFS, the ZP can be used as guard band for the pilot in the delay-Doppler domain [10], and hence reduction in detector complexity can be achieved at little cost, which is convenient for the ZP-OTFS system.

II-C Continuous Time Baseband Channel Model

Consider a baseband equivalent channel model333We do not consider the effects of carrier frequency and antenna gains in this paper. with PP propagation paths, where hih_{i} is the complex path gain, i\ell_{i} and κi\kappa_{i} are the normalized delay shift and normalized Doppler shift, respectively, associated with the ii-th path, where i,κi\ell_{i},\kappa_{i}\in\mathbb{R} are not necessarily integers. The actual delay and Doppler shift for the ii-th path is given by τi=iMΔf<τmax=maxMΔf\tau_{i}=\frac{\ell_{i}}{M\Delta f}<\tau_{\max}=\frac{\ell_{\max}}{M\Delta f}, νi=κiNT\nu_{i}=\frac{\kappa_{i}}{NT} with |νi|<νmax|\nu_{i}|<\nu_{\max}. We assume that the channel is under-spread, i.e., τmaxνmax1\tau_{\max}{\nu_{\max}}\ll 1. Under the under-spread assumption, max<M\ell_{\max}<M and the normalized Doppler shifts N/2<κi<N/2-N/2<\kappa_{i}<N/2. Since the number of channel coefficients PP in the delay-Doppler domain is typically limited, the channel response has a sparse representation [1, 9]:

h(τ,ν)=i=1Phiδ(ττi)δ(ννi).h(\tau,\nu)=\sum_{i=1}^{P}h_{i}\delta(\tau-\tau_{i})\delta(\nu-\nu_{i}). (9)

Alternatively, we can write,

h(τ,ν)=κ𝒦ν(κ)δ(τT/M)δ(νκΔf/N)h(\tau,\nu)=\sum_{\ell\in\mathcal{L}^{\prime}}\sum_{\kappa\in\mathcal{K}_{\ell}}{\nu}_{\ell}(\kappa)\delta(\tau-{\ell}T/M)\delta(\nu-{\kappa}{\Delta f}/N) (10)

where ={i}\mathcal{L}^{\prime}=\{\ell_{i}\} is the set of L=||L^{\prime}=|\mathcal{L}^{\prime}| distinct normalized delay shifts among the PP paths in the delay-Doppler domain, 𝒦={κi=i}\mathcal{K}_{\ell}=\{\kappa_{i}\mid\ell=\ell_{i}\} is the set of normalized Doppler shifts for each path with normalized delay shift i\ell_{i}, and

ν(κ)={hi,if =i and κ=κi0,otherwise.\displaystyle{\nu}_{\ell}(\kappa)=\left.\left\{\begin{array}[]{ll}h_{i},&\text{if }\ell=\ell_{i}\text{ and }\kappa=\kappa_{i}\\ 0,&\text{otherwise.}\end{array}\right.\right. (13)

is the \ell-th delay tap Doppler response. The magnitude of a Doppler response function ν(κ){\nu}_{\ell}(\kappa) evaluated at integer delay and Doppler shifts is shown in Fig. 1.

The corresponding continuous time-varying channel impulse response function can be written, for all \ell\in\mathcal{L}^{\prime}, as

g(τ,t)\displaystyle g(\tau,t) =νh(τ,ν)ej2πν(tτ)𝑑ν.\displaystyle=\int_{\nu}h(\tau,\nu){\rm e}^{j2\pi\nu(t-\tau)}\,d\nu. (14)

Substituting (13) into (14) and evaluating (14) at τ=T/M\tau=\ell T/M, we get,

g(T/M,t)\displaystyle g(\ell T/M,t) =κ𝒦ν(κ)ej2πκΔfN(tT/M)\displaystyle=\sum_{\kappa\in\mathcal{K}_{\ell}}{\nu}_{\ell}(\kappa){\rm e}^{j2\pi\kappa\frac{\Delta f}{N}(t-\ell T/M)} (15)

which represents the delay-time channel response, for all \ell\in\mathcal{L}^{\prime}.

II-D Discrete Time Baseband Channel Model

At the transmitter, the OTFS frame of bandwidth B=MΔfB=M\Delta f is up-converted to a carrier frequency fcf_{c} to occupy a pass band channel, assuming fcBf_{c}\gg B. At the receiver, the channel impaired signal is down-converted to baseband and sampled at MΔfM\Delta f Hz, thereby limiting the received waveform to NMNM complex samples. Therefore, from a communication system design point of view, it is convenient to have a discrete baseband equivalent representation of the system, [14].

In the previous section, we looked at the continuous time model of the channel. The discrete time model is obtained by sampling the received waveform r(t)r(t) at sampling intervals t=qT/Mt=qT/M, where 0qNM10\leq q\leq NM-1, which discretizes the delay-time channel. The set of normalized delay shifts, \mathcal{L}^{\prime} is therefore replaced as \mathcal{L} with the set of L=||L=|\mathcal{L}| discrete delay taps representing delay shifts at integer multiples of the sampling period T/MT/M. Recall that ΔfN\frac{\Delta f}{N} and TM\frac{T}{M} are the Doppler and delay resolution, respectively, of the delay-Doppler grid, given TΔf=1T{\Delta f}=1. Following from the sampling theorem [14], the discrete baseband delay-time channel model of (15) is given as,

gs(l,q)\displaystyle g^{\rm s}(l,q) =(κ𝒦ν(κ)zκ(ql))sinc(l)\displaystyle=\sum_{\ell\in\mathcal{L}^{\prime}}\left(\sum_{\kappa\in\mathcal{K}_{\ell}}{\nu}_{\ell}(\kappa)z^{\kappa(q-l)}\right){\rm sinc}(l-\ell) (16)

where sinc(x)=sin(πx)/(πx){\rm sinc}(x)={\rm sin}(\pi x)/{(\pi x)} and z=ej2πNMz={\rm e}^{\frac{j2\pi}{NM}}.

Note that, due to fractional delays, the sampling at the receiver introduces interference between Doppler responses at different delay shifts. This is due to sinc reconstruction of the delay-time response at fractional delay points (\ell), [14]. However, under the assumption that the channel delay shifts can be modelled as integer delay shifts without loss of accuracy, i.e., when =\mathcal{L}^{\prime}=\mathcal{L} and hence =l\ell=l^{\prime}\in\mathbb{Z}, the sinc function in (16) reduces to

sinc(ll)={1,if l=l0,otherwise.\displaystyle{\rm sinc}(l-l^{\prime})=\left.\left\{\begin{array}[]{ll}1,&\text{if }l^{\prime}=l\\ 0,&\text{otherwise.}\end{array}\right.\right. (19)

Consequently, the relation between the actual Doppler response and the sampled time-domain channel at each integer delay tap ll\in\mathcal{L} in (16) reduces to

gs(l,q)=κ𝒦lνl(κ)zκ(ql).g^{\rm s}(l,q)=\sum_{\kappa\in\mathcal{K}_{l}}{\nu}_{l}(\kappa)z^{\kappa(q-l)}. (20)

Here we want to remind the readers that the effective channel as seen by the receiver depends on the actual channel response as well as the operation parameters (delay and Doppler resolution) of the receiver.

For the rest of the paper, to clearly differentiate between the real continuous channel and the effective discrete channel as seen by the receiver, we use \ell and κ\kappa to denote the normalized delay and Doppler shifts (not necessarily integers) associated with the channel whereas ll and kk is used only to denote integer delay and Doppler shift indices, respectively, associated with the channel sampled on the OTFS delay-Doppler grid.

II-E Input-Output Relations in Delay-Doppler Domain

In this section, we reformulate the input-output relation with rectangular pulse shaping waveforms, for the ZP-OTFS system shown in Fig. 1.

Starting from the received time-domain signal r(t)r(t), the continuous time domain input-output relation can be written as

r(t)=0τmaxg(τ,t)s(tτ)𝑑τ.r(t)=\int_{0}^{\tau_{\rm max}}g(\tau,t)s(t-\tau)\,d\,\tau. (21)

From (16), the corresponding discrete time-domain input-output relation when the transmitted and received time-domain signals are sampled at t=qT/Mt=qT/M can be written as

𝐫(q)=lgs(l,q)𝐬(ql){\bf r}(q)=\sum_{l\in\mathcal{L}}g^{\rm s}(l,q){\bf s}(q-l) (22)

where 𝐫(q)=r(qTM){\bf r}(q)=r(q\frac{T}{M}), 𝐬(q)=s(qTM){\bf s}(q)=s(q\frac{T}{M}). Using the relations in (7), we split the time index q=0,,MN1q=0,\ldots,MN-1 in terms of the delay and Doppler frame indices as q=(m+nM)q=(m+nM), where the m=0,1,,M1m=0,1,\ldots,M-1 and n=0,1,,N1n=0,1,\ldots,N-1. Then replacing 𝝂~m,l(n)=gs(l,m+nM)\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}(n)=g^{\rm s}(l,m+nM), we can rewrite (22) in terms of the delay-time symbol vectors as

𝐲~m(n)\displaystyle{\mathbf{\tilde{\text{$\bf y$}}}}_{m}(n) =l𝝂~m,l(n)𝐱~ml(n)\displaystyle=\sum_{l\in\mathcal{L}}{\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}(n)}{\mathbf{\tilde{\text{$\bf x$}}}}_{m-l}(n) (23)

where 𝝂~m,lN×1\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}\in\mathbb{C}^{N\times 1} is given as

𝝂~m,l(n)\displaystyle{\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}(n)} =(κ𝒦lν(κ)zκ(ml)ej2πκnN)sinc(l).\displaystyle=\sum_{\ell\in\mathcal{L}^{\prime}}\left(\sum_{\kappa\in\mathcal{K}_{l}}{\nu}_{\ell}(\kappa)z^{\kappa(m-l)}{\rm e}^{\frac{j2\pi\kappa n}{N}}\right){\rm sinc}(l-\ell). (24)

For integer delay tap channel assumption, i.e., l=l=\ell\in\mathbb{Z}, (24) becomes,

𝝂~m,l(n)\displaystyle{\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}(n)} =κ𝒦lνl(κ)zκ(ml)ej2πκnN.\displaystyle=\sum_{\kappa\in\mathcal{K}_{l}}{\nu}_{l}(\kappa)z^{\kappa(m-l)}{\rm e}^{\frac{j2\pi\kappa n}{N}}. (25)

We can note from (25) that the discrete delay-time response 𝝂~m,l(n){\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}(n)} for each delay tap ll at time instants t=mMT+nTt=\frac{m}{M}T+nT is related to the inverse Fourier transform of the Doppler response νl(κ){\nu}_{l}(\kappa) of the ll-th delay tap sampled at time t=mMTt=\frac{m}{M}T. We may ignore the case in (23) when ml<0m-l<0 i.e., when there is inter-block interference due to channel delay spread, by making 𝐱~m(n)=0\mathbf{\tilde{\text{$\bf x$}}}_{m}(n)=0 for all nn when ml<0m-l<0 such that,

𝝂~m,l(n)𝐱~ml([nk]N)=0, if m<l\displaystyle{\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}}(n)\mathbf{\tilde{\text{${\bf x}$}}}_{m-l}([n-k]_{N})=0,\text{ if }m<l (26)

This is equivalent to placing null symbol vectors 𝟎N{\bf 0}_{N} in the last lmaxl_{\max} rows of 𝐗{\bf X} (zero padding along the delay dimension of the OTFS grid). Hence, we can set, for n=0,,N1n=0,\ldots,N-1,

𝐱m(n)=𝐱~m(n)=0, if mMlmax\displaystyle{\bf x}_{m}(n)={\mathbf{\tilde{\text{$\bf x$}}}}_{m}(n)=0,\text{ if }m\geq M-l_{\max} (27)

The delay-Doppler domain received symbols can be obtained by taking an NN-point FFT of the delay-time received symbol vectors (6)

𝐲m=𝐅N𝐲~m=\displaystyle{{\bf y}}_{m}={\bf F}_{N}\cdot\mathbf{\tilde{\text{$\bf y$}}}_{m}= l𝐅N(𝝂~m,l𝐱~ml)\displaystyle\sum_{l\in\mathcal{L}}{\bf F}_{N}\cdot({\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}}\circ{\mathbf{\tilde{\text{$\bf x$}}}}_{m-l})
=l(𝐅N𝝂~m,l)(𝐅N𝐱~ml)\displaystyle=\sum_{l\in\mathcal{L}}({\bf F}_{N}\cdot{\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}})\circledast({\bf F}_{N}\cdot{\mathbf{\tilde{\text{$\bf x$}}}}_{m-l})
=l𝝂m,l𝐱ml\displaystyle=\sum_{l\in\mathcal{L}}{\boldsymbol{\nu}}_{m,l}\circledast{\bf x}_{m-l} (28)

where,

𝝂m,l(k)=1Nn=0N1𝝂~m,l(n)ej2πknN\displaystyle{{\boldsymbol{\nu}}_{m,l}(k)}=\frac{1}{\sqrt{N}}\sum_{n=0}^{N-1}\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}(n){\rm e}^{\frac{-j2\pi kn}{N}} (29)

for 0kN10\leq k\leq N-1, 0m<Mlmax0\leq m<M-l_{max}, is the discrete Doppler spread vector in the ll-th channel delay tap, experienced by all the symbols in the (ml)\left(m-l\right)-th row of the M×NM\times N OTFS delay-Doppler grid. Fig. 1 (c) shows the discrete Doppler spread vectors 𝝂l,l{\boldsymbol{\nu}}_{l,l} for 𝐱0{\bf x}_{0}. Substituting (16), (25) and (24) in (29), we can write the discrete Doppler spread vector 𝝂m,lN×1{\boldsymbol{\nu}}_{m,l}\in\mathbb{C}^{N\times 1} in terms of the channel Doppler response ν(κ){\nu}_{\ell}(\kappa), for a channel model assuming:

II-E1 Fractional delay and fractional Doppler shifts

𝝂m,l(k)\displaystyle{{\boldsymbol{\nu}}_{m,l}(k)} =1N(κ𝒦ν(κ)zκ(ml)ζN(κk))sinc(l)\displaystyle=\frac{1}{\sqrt{N}}\sum_{\ell\in\mathcal{L}^{\prime}}\left(\sum_{\kappa\in\mathcal{K}_{\ell}}{\nu}_{\ell}(\kappa)z^{\kappa(m-l)}\zeta_{N}(\kappa-k)\right){\rm sinc}(l-\ell) (30)

where ,κ\ell,\kappa\in\mathbb{R} and the periodic sinc function ζ()\zeta(\cdot) includes the extra phase and magnitude variations in the Doppler spread vectors due to fractional Doppler shifts, given as

ζN(x)=1Nn=0N1ej2πxnN=1Nsin(πx)sin(πx/N)ejπx(N1)N\displaystyle\zeta_{N}(x)=\frac{1}{\sqrt{N}}\sum_{n=0}^{N-1}{\rm e}^{\frac{j2\pi xn}{N}}=\frac{1}{\sqrt{N}}\frac{{\rm sin}\left(\pi x\right)}{{\rm sin}(\pi x/N)}{\rm e}^{\frac{j\pi x(N-1)}{N}} (31)

II-E2 Integer delay and fractional Doppler shifts

For integer values of (l)(l-\ell), the function sinc(l){\rm sinc}(l-\ell) evaluates to 11 when l=l=\ell and zero else where. Hence (30) reduces to

𝝂m,l(k)\displaystyle{{\boldsymbol{\nu}}_{m,l}(k)} =1Nκ𝒦lν(κ)zκ(ml)ζN(κk)\displaystyle=\frac{1}{\sqrt{N}}\sum_{\kappa\in\mathcal{K}_{l}}{\nu}_{\ell}(\kappa)z^{\kappa(m-l)}\zeta_{N}(\kappa-k) (32)

for l=l=\ell\in\mathbb{Z} and κ\kappa\in\mathbb{R}

II-E3 Integer delay and integer Doppler shifts

For integer values of xx, the function ζN(x)\zeta_{N}(x) evaluates to N\sqrt{N} when x=0x=0 and zero else where. Hence (32) reduces to the simple form

𝝂m,l(k)={ν(κ)zκ(ml),if l= and k=[κ]N0,otherwise.\displaystyle{{\boldsymbol{\nu}}_{m,l}(k)}=\left.\left\{\begin{array}[]{ll}{\nu}_{\ell}(\kappa)z^{\kappa(m-l)},&\text{if }l=\ell\text{ and }k=[\kappa]_{N}\\ 0,&\text{otherwise.}\end{array}\right.\right. (35)

for ,κ\ell,\kappa\in\mathbb{Z}.

Remark – The above three cases result in phase changes zκ(ml)z^{\kappa(m-l)} due to the rectangular pulse shaping waveforms. For the ideal pulse shaping waveform assumption, it was shown in [3, 9] that the Doppler spread vectors 𝝂m,l{\boldsymbol{\nu}}_{m,l} are invariant on the 2-D delay-Doppler grid and hence not dependent on the row index mm. The phase variations zκ(ml)z^{\kappa(m-l)} can be ignored in (30), (32) and (35). As a result (28) is a simple time-invariant 2-D circular convolution as shown in [3, 9]. It is important to note that ignoring such phase variations in the detection process results in significant performance degradation. \square

For the rest of the paper and simulations, we assume integer delays and fractional Doppler shifts for rectangular pulse shaping waveforms, i.e., we consider the discrete input-output relation of the form given in (28) and (32) where =\mathcal{L}^{\prime}=\mathcal{L}\in\mathbb{Z}.

The OTFS delay-Doppler domain discrete system for the ZP OTFS system can be expressed in the matrix form as

𝐲=𝐇𝐱+𝐰;{\bf y}={\bf H}\cdot{\bf x}+{\bf w}; (36)

where 𝐱,𝐲,𝐰NM×1{\bf x,y,w}\in{\mathbb{C}}^{NM\times 1} and 𝐇NM×NM{\bf H}\in{\mathbb{C}}^{NM\times NM} is the OTFS channel matrix when transmitted and received symbol-vectors, 𝐱m,𝐲mN×1{\bf x}_{m},{\bf y}_{m}\in{\mathbb{C}}^{N\times 1} are grouped and stacked as 𝐲=[𝐲0T,𝐲1T,,𝐲M1T]T{\bf y}=[{\bf y}_{0}^{\text{T}},{\bf y}_{1}^{\text{T}},\cdots,{\bf y}_{M-1}^{\text{T}}]^{\text{T}}, 𝐱=[𝐱0T,𝐱1T,,𝐱M1T]T{\bf x}=[{\bf x}_{0}^{\text{T}},{\bf x}_{1}^{\text{T}},\cdots,{\bf x}_{M-1}^{\text{T}}]^{\text{T}} and 𝐰=[𝐰0T,𝐰1T,,𝐰M1T]T{\bf w}=[{\bf w}_{0}^{\text{T}},{\bf w}_{1}^{\text{T}},\cdots,{\bf w}_{M-1}^{\text{T}}]^{\text{T}} is independent and identically distributed (iid) additive white guassian noise (AWGN) with variance σw2\sigma_{w}^{2}. Referring to the vectorized form shown in Fig. 2, we convert the circular convolution between two vectors into the product of a matrix and a vector by defining 𝐊m,lN×N{\bf K}_{m,l}\in\mathbb{C}^{N\times N} to be a banded matrix for ll\in\mathcal{L} and an all zero matrix otherwise

𝐊m,l\displaystyle{\bf K}_{m,l} =circ[𝝂m,l(0),,𝝂m,l(N1)]\displaystyle=\text{circ}[{\boldsymbol{\nu}}_{m,l}(0),\cdots,{\boldsymbol{\nu}}_{m,l}(N-1)]
=[𝝂m,l(0)𝝂m,l(N1)𝝂m,l(1)𝝂m,l(1)𝝂m,l(0)𝝂m,l(2)𝝂m,l(N1)𝝂m,l(N2)𝝂m,l(0)].\displaystyle=\left[\begin{array}[]{cccc}{{\boldsymbol{\nu}}_{m,l}}(0)&{{\boldsymbol{\nu}}_{m,l}}(N-1)&\cdots&{{\boldsymbol{\nu}}_{m,l}}(1)\\ {{\boldsymbol{\nu}}_{m,l}}(1)&{{\boldsymbol{\nu}}_{m,l}}(0)&\cdots&{{\boldsymbol{\nu}}_{m,l}}(2)\\ \vdots&\ddots&\ddots&\vdots\\ {{\boldsymbol{\nu}}_{m,l}}(N-1)&{{\boldsymbol{\nu}}_{m,l}}(N-2)&\cdots&{{\boldsymbol{\nu}}_{m,l}}(0)\end{array}\right]~.

We note that the band width of each submatrix 𝐊m,l{\bf K}_{m,l} of 𝐇{\bf H} is equal to the maximum Doppler spread kmaxN/2k_{\max}\leq N/2 and the full channel matrix 𝐇{\bf H} has a band width equal to N(lmax+1)N(l_{\max}+1). We can then write (28) as

𝐲m=l𝐊m,l𝐱ml.\displaystyle{{\bf y}}_{m}=\sum_{l\in\mathcal{L}}{\bf K}_{m,l}\cdot{{\bf x}}_{m-l}. (37)
Refer to caption
Figure 2: The delay-Doppler domain input-output relation 𝐲=𝐇𝐱{\bf y}={\bf H}\cdot{\bf x} after adding null symbols only contains the shaded blocks for N=M=8N=M=8 and lmax=3l_{max}=3.

Note that 𝐊m,l{\bf K}_{m,l} (or 𝝂m,l{\boldsymbol{\nu}}_{m,{l}}) can be considered as the linear time-variant channel between the receiver grid delay index mm and transmitter grid delay index mlm-l in the OTFS delay-Doppler grid. Now (28) and (37) gives us a very simple equation relating the transmitted and received symbol-vectors that we defined at the start of this section.

III Input-Output Relation in Other Domains

In this section, we discuss the ZP-OTFS input-output relation between the transmitted and received delay-time symbol vectors and discuss the advantages of carrying out significant part of the OTFS receiver processing in the delay-time domain. We also highlight some properties of the delay-time and time-domain channel matrices to later analyze the convergence of the proposed detector.

When NN and MM are sufficiently large, considering the channel normalized delay and Doppler shifts (κi\kappa_{i} and i\ell_{i}) as integers has negligible effect on the accuracy of the channel representation. However, the effect of fractional Doppler is more pronounced for short OTFS frames, [22]. When NN is small, a single path with fractional Doppler shift is seen as a cluster of paths with integer Doppler shifts at the receiver. Depending on the resolution, more channel coefficients along the Doppler dimension are required to fully represent the channel state information needed for accurate detection at the receiver, [9]. This increases the total number of paths PP for the discrete channel. To mitigate such problem, the value of NN may be increased, which, in turn, will increase the frame duration NTNT. However, the frame duration is limited by the delay-Doppler coherence time,444This coherence time should not be confused with the traditional notion related to the inverse of the Doppler spread, [3]. i.e, the time over which the delay-Doppler channel coefficients remain constant.

Another way of solving the fractional Doppler issue is by dealing with the delay-Doppler channel coefficients in the delay-time domain. As Doppler shifts cannot be resolved in this domain, the number of delay-time channel coefficients is neither affected by the fractional Doppler shifts nor by the Doppler spread of that delay tap. Therefore, to fully take advantage of the OTFS performance in a rich Doppler spread regime (i.e., large |𝒦l||\mathcal{K}_{l}|’s), it is convenient to design a receiver with low complexity that is independent of the Doppler spread.

III-A Delay-Time Domain

For the purpose of delay-time detection analysis in Section IV, we look at the matrix representation of the delay-time input-output relation. The matrices 𝐊m,l{\bf K}_{m,l} in the delay-Doppler domain can be diagonalized to 𝐊~m,l\mathbf{\tilde{\text{$\bf K$}}}_{m,l} in the corresponding Fourier domain (delay-time domain) as

𝐊m,l=𝐅N𝐊~m,l𝐅N,\displaystyle{\bf K}_{m,l}={\bf F}_{N}\cdot\mathbf{\tilde{\text{$\bf K$}}}_{m,l}\cdot{\bf F}_{N}^{{\dagger}},
\displaystyle\implies 𝐊~m,l=diag[𝝂~m,l(0),,𝝂~m,l(N1)]\displaystyle\mathbf{\tilde{\text{$\bf K$}}}_{m,l}=\text{diag}[\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,{l}}(0),\cdots,\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,{l}}(N-1)]
where 𝝂~m,l=𝐅N𝝂m,l\displaystyle\text{where }\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}={\bf F}_{N}^{{\dagger}}{\boldsymbol{\nu}}_{m,l}
Refer to caption
Figure 3: The delay-time domain input-output relation (𝐲~=𝐇~𝐱~\mathbf{\tilde{\text{$\bf y$}}}=\mathbf{\tilde{\text{$\bf H$}}}\cdot\mathbf{\tilde{\text{$\bf x$}}}) after adding null symbols for N=M=8N=M=8 and lmax=3l_{\max}=3.

thereby transforming the delay-Doppler domain channel matrix 𝐇{\bf H} into the delay-time domain channel matrix 𝐇~\mathbf{\tilde{\text{$\bf H$}}} by replacing the sub-matrices 𝐊m,l{\bf K}_{m,l} in 𝐇{\bf H} with 𝐊~m,l\mathbf{\tilde{\text{$\bf K$}}}_{m,l}. Given the input-output relation in (36) was simplified in (37) by placing null symbols in the delay-Doppler grid as given in (27), the strictly upper triangular blocks of 𝐇~\mathbf{\tilde{\text{$\bf H$}}} can also be set to zero. The input-output relation in the delay-time domain, illustrated in Fig. 3, can then be written in the matrix form as

𝐲~=𝐇~𝐱~+𝐰~;\mathbf{\tilde{\text{$\bf y$}}}=\mathbf{\tilde{\text{$\bf H$}}}\cdot\mathbf{\tilde{\text{$\bf x$}}}+\mathbf{\tilde{\text{$\bf w$}}}; (38)

where

𝐲~=(𝐈M𝐅N)𝐲,𝐱~=(𝐈M𝐅N)𝐱,\displaystyle\mathbf{\tilde{\text{$\bf y$}}}=({\bf I}_{M}\otimes{\bf F}_{N}^{{\dagger}})\cdot{\bf y},\quad\mathbf{\tilde{\text{$\bf x$}}}=({\bf I}_{M}\otimes{\bf F}_{N}^{{\dagger}})\cdot{\bf x},
𝐇~=(𝐈M𝐅N)𝐇(𝐈M𝐅N),\displaystyle\mathbf{\tilde{\text{$\bf H$}}}=({\bf I}_{M}\otimes{\bf F}_{N}^{{\dagger}})\cdot{\bf H}\cdot({\bf I}_{M}\otimes{\bf F}_{N}), (39)

and 𝐰~\mathbf{\tilde{\text{$\bf w$}}} is the time domain AWGN vector. In this domain, the complexity of matrix multiplication is significantly reduced as the sparsity L/NL/N of 𝐇~\mathbf{\tilde{\text{$\bf H$}}} is less than or equal to the sparsity P/NP/N of 𝐇{\bf H}, where LL is the number of unique delay taps and PP is the total number of propagation paths. The delay-time domain channel matrix 𝐇~\mathbf{\tilde{\text{$\bf H$}}} is a banded block matrix (with a bandwidth of Nlmax+1Nl_{\max}+1), where 𝐊~m,l𝒞N×N\mathbf{\tilde{\text{$\bf K$}}}_{m,l}\in\mathcal{C}^{N\times N} are non-zero diagonal matrices for mlm\geq l and ll\in\mathcal{L} and zero matrices otherwise. Consequently, the delay-Doppler domain input-output relation in (28) becomes

𝐲~m=l𝝂~m,l𝐱~ml,𝐱~m=𝟎N for mMlmax.\displaystyle{\mathbf{\tilde{\text{$\bf y$}}}}_{m}=\sum_{l\in\mathcal{L}}\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}\circ{\mathbf{\tilde{\text{$\bf x$}}}}_{m-l},\quad\mathbf{\tilde{\text{$\bf x$}}}_{m}={\bf 0}_{N}\text{ for }m\geq M-l_{\rm max}. (40)

in the delay-time domain, where 𝐱~=[𝐱~0T,,𝐱~M1T]T\mathbf{\tilde{\text{$\bf x$}}}=[\mathbf{\tilde{\text{$\bf x$}}}_{0}^{\text{T}},\cdots,\mathbf{\tilde{\text{$\bf x$}}}_{M-1}^{\text{T}}]^{\text{T}} and 𝐲~=[𝐲~0T,,𝐲~M1T]T\mathbf{\tilde{\text{$\bf y$}}}=[\mathbf{\tilde{\text{$\bf y$}}}_{0}^{\text{T}},\cdots,\mathbf{\tilde{\text{$\bf y$}}}_{M-1}^{\text{T}}]^{\text{T}}.

III-B Time Domain

Refer to caption
Figure 4: The time-domain input-output relation 𝐫=𝐆𝐬{\bf r}={\bf G}\cdot{\bf s} after shuffling the matrix 𝐇~\mathbf{\tilde{\text{$\bf H$}}} as 𝐆=𝐏𝐇~𝐏T{\bf G}={\bf P}\cdot\mathbf{\tilde{\text{$\bf H$}}}\cdot{\bf P}^{\rm T} for N=M=8N=M=8 and lmax=3l_{\max}=3.

Here, we show how the time domain input-output relation is connected to the delay-Doppler and the delay-time domain input-output relations.

From (7), it can be seen that the delay-time vectors 𝐱~\mathbf{\tilde{\text{$\bf x$}}} and 𝐲~\mathbf{\tilde{\text{$\bf y$}}} in (38) are simply shuffled versions of the time domain transmitted and received vectors 𝐬{\bf s} and 𝐫{\bf r}, respectively. Let 𝐬{\bf s} and 𝐫{\bf r} be split into NN blocks each of size MM, such that 𝐬=[𝐬0T,,𝐬N1T]T{\bf s}=[{\bf s}_{0}^{\text{T}},\cdots,{\bf s}_{N-1}^{\text{T}}]^{\text{T}} and 𝐫=[𝐫0T,,𝐫N1T]T{\bf r}=[{\bf r}_{0}^{\text{T}},\cdots,{\bf r}_{N-1}^{\text{T}}]^{\text{T}}. Then 𝐱~m=[𝐬0(m),,𝐬N1(m)]T\mathbf{\tilde{\text{$\bf x$}}}_{m}=[{\bf s}_{0}(m),\cdots,{\bf s}_{N-1}(m)]^{\text{T}} and 𝐲~m=[𝐫0(m),,𝐫N1(m)]T\mathbf{\tilde{\text{$\bf y$}}}_{m}=[{\bf r}_{0}(m),\cdots,{\bf r}_{N-1}(m)]^{\text{T}}.

Let

𝐏\displaystyle{\bf P} =[𝐄1,1𝐄2,1𝐄M,1𝐄1,2𝐄2,2𝐄M,2𝐄1,N𝐄2,N𝐄M,N]NM×NM\displaystyle=\left[\begin{array}[]{cccc}{\bf E}_{1,1}&{\bf E}_{2,1}&\cdots&{\bf E}_{M,1}\\ {\bf E}_{1,2}&{\bf E}_{2,2}&\cdots&{\bf E}_{M,2}\\ \vdots&\ddots&\ddots&\vdots\\ {\bf E}_{1,N}&{\bf E}_{2,N}&\cdots&{\bf E}_{M,N}\end{array}\right]~\in\mathbb{C}^{NM\times NM} (45)

be the row-column interleaver permutation matrix such that 𝐬=𝐏𝐱~{\bf s}={\bf P}\cdot\mathbf{\tilde{\text{$\bf x$}}} and 𝐫=𝐏𝐲~{\bf r}={\bf P}\cdot\mathbf{\tilde{\text{$\bf y$}}} where 𝐄i,jM×N{\bf E}_{i,j}\in\mathbb{C}^{M\times N} is defined as

𝐄i,j(i,j)={1,if i=i and j=j0,otherwise.\displaystyle{\bf E}_{i,j}(i^{\prime},j^{\prime})=\left.\left\{\begin{array}[]{ll}1,&\text{if }i^{\prime}=i\text{ and }j^{\prime}=j\\ 0,&\text{otherwise.}\end{array}\right.\right. (48)

Such permutation is known in the literature as a perfect shuffle, and has the following property [17]: given square matrices 𝐀{\bf A} and 𝐁{\bf B}

𝐀𝐁=𝐏(𝐁𝐀)𝐏T.{\bf A}\otimes{\bf B}={\bf P}\cdot({\bf B}\otimes{\bf A})\cdot{\bf P}^{\text{T}}. (49)

The input-output relation in (38) can now be written as

(𝐏T𝐫)=𝐇~(𝐏T𝐬)+𝐰~.\displaystyle({\bf P}^{\rm T}\cdot{\bf r})=\mathbf{\tilde{\text{$\bf H$}}}\cdot({\bf P}^{\rm T}\cdot{\bf s})+\mathbf{\tilde{\text{$\bf w$}}}. (50)

Multiplying both sides of (50) on the left by 𝐏{\bf P}, the input-output relation can be expressed in terms of the time-domain channel matrix 𝐆=𝐏𝐇~𝐏T{\bf G}={\bf P}\cdot\mathbf{\tilde{\text{$\bf H$}}}\cdot{\bf P}^{\rm T} as

𝐫=𝐆𝐬+𝐰¯.{\bf r}={\bf G}\cdot{\bf s}+\mathbf{\bar{\text{$\bf w$}}}. (51)

We note that 𝐆{\bf G} and 𝐇~\mathbf{\tilde{\text{$\bf H$}}} are similar matrices and hence share the same eigenvalues [18]. From (39) using the perfect shuffle property in (49), the time domain channel matrix 𝐆{\bf G} can be related to the delay-Doppler domain channel matrix 𝐇{\bf H} as

𝐆=(𝐅N𝐈M)(𝐏𝐇𝐏T)(𝐅N𝐈M).\displaystyle{\bf G}=({\bf F}_{N}^{{\dagger}}\otimes{\bf I}_{M})\cdot({\bf P}\cdot{\bf H}\cdot{\bf P}^{\rm T})\cdot({\bf F}_{N}\otimes{\bf I}_{M}). (52)

As shown in Fig. 4 the null symbols added in the delay-Doppler domain act as interleaved guard bands of length lmaxl_{\max} in the time-domain vector 𝐬{\bf s} and thus help in avoiding interference between the time domain blocks 𝐫n{\bf r}_{n} for n=0,,N1n=0,\cdots,N-1. This forces 𝐆{\bf G} to be a block-diagonal matrix. As a result, the large matrix equation in (51) can be split into NN parallel smaller linear matrix equations with the blocks 𝐆0,,𝐆N1M×M{\bf G}_{0},\cdots,{\bf G}_{N-1}\in\mathbb{C}^{M\times M} as the corresponding channel matrices. 𝐆n{\bf G}_{n} are the diagonal blocks of G each with a bandwidth of lmax+1l_{\max}+1. The system equation in (51) can be split and written as

𝐫n=𝐆n𝐬n+𝐰¯nwhere n=0,,N1.{\bf r}_{n}={\bf G}_{n}\cdot{\bf s}_{n}+\mathbf{\bar{\text{$\bf w$}}}_{n}\quad\text{where }~~n=0,\cdots,N-1. (53)

Since 𝐆=𝐏𝐇~𝐏T{\bf G}={\bf P}\cdot\mathbf{\tilde{\text{$\bf H$}}}\cdot{\bf P}^{\rm T}, the non-zero entries of the M×MM\times M time domain channel sub-matrices 𝐆n{\bf G}_{n} are related to the entries of the N×NN\times N delay-time channel sub-matrices 𝐊~m,l\mathbf{\tilde{\text{$\bf K$}}}_{m,l} and the time-varying complex channel gain for each delay tap gs(l,q)g^{\rm s}(l,q) as

gs(l,q)=𝐆n(m,ml)=𝐊~m,l(n,n)=𝝂~m,l(n)g^{\rm s}(l,q)={\bf G}_{n}(m,m-l)=\mathbf{\tilde{\text{$\bf K$}}}_{m,l}(n,n)=\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}(n) (54)

where q=m+nMq=m+nM, m{li<M|l}m\in\{l\leq i<M|l\in\mathcal{L}\} and 0n<N0\leq n<N.

IV Low Complexity Iterative Rake Detector

Refer to caption
Figure 5: MRC delay-Doppler domain operation for M=7M=7 and the set of discrete delay indices =0,1,2\mathcal{L}={0,1,2}.

We can think of the proposed MRC detector as the maximal ratio combining of the channel impaired signal components received at L=||PL=|\mathcal{L}|\leq P different delay branches in the delay-Doppler grid analogous to a CDMA rake receiver as shown in Fig. 5. The noise plus interference (NPI) power in each of these branches is different and depends on the channel response. In each detector iteration, we cancel the estimated inter symbol-vector interference in the branches selected for combining, thereby iteratively improving the post MRC signal to interference plus noise ratio (SINR).

The input output relation between the transmitted and received symbol-vectors 𝐱m{\bf x}_{m} and 𝐲m{\bf y}_{m} in (28) is given by

𝐲m+l=l𝐊m+l,l𝐱m+𝐰m+l{\bf y}_{m+l}={\sum_{l\in\mathcal{L}}{\bf K}_{m+l,l}\cdot{\bf x}_{m}}+{\bf w}_{m+l} (55)

where 𝐰m{\bf w}_{m} is iid AWGN noise with variance σn2\sigma_{n}^{2}. From (55), due to the inter-symbol interference caused by delay spread (lmaxT/Ml_{\max}T/M), all symbol-vectors 𝐱m{\bf x}_{m} have a signal component in LL received symbol-vectors 𝐲m+l{\bf y}_{m+l}, for ll\in\mathcal{L}. Let 𝐛mlN×1{\bf b}_{m}^{l}\in\mathbb{C}^{N\times 1} be the channel impaired signal component of 𝐱m{\bf x}_{m} in the received 𝐲m+l{\bf y}_{m+l} vector at delay index m+lm+l after removing the interference of the other transmitted symbol-vectors 𝐱k{\bf x}_{k} for kmk\neq m. Assuming we have the estimates of symbol-vectors 𝐱m{\bf x}_{m} from previous iterations, we can then write 𝐛ml{\bf b}_{m}^{l} for ll\in\mathcal{L} as

𝐛ml=𝐲m+ll,ll𝐊m+l,l𝐱^m+ll.{\bf b}_{m}^{l}={\bf y}_{m+l}-{\sum_{{{l^{\prime}}}\in\mathcal{L},{l^{\prime}}\neq l}{\bf K}_{m+l,{{l^{\prime}}}}}\cdot\hat{\bf x}_{m+l-{{l^{\prime}}}}. (56)

Then from (55) and (56) for ll\in\mathcal{L}, we have LL equations for the symbol-vector estimates 𝐱^m\hat{\bf x}_{m} given as

𝐛ml=𝐊m+l,l𝐱^m+𝐰m+l+interference{\bf b}_{m}^{l}={{\bf K}_{m+l,l}}\cdot{\hat{\bf x}_{m}}+{\bf w}_{m+l}+\text{interference} (57)

in the delay branch with index ll due to error in the current estimates of the interfering symbol-vectors 𝐱m+lp{\bf x}_{m+l-{p}} for lpl\neq{p}. In the proposed scheme, instead of estimating the transmitted symbol-vector 𝐱^m\hat{\bf x}_{m} separately from each of the LL equations in (57), we perform maximal ratio combining (58) of the estimates 𝐛ml{\bf b}_{m}^{l} followed by symbol-by-symbol QAM demapping using (61). The vector output of the maximal ratio combiner, 𝐜mN×1{\bf c}_{m}\in\mathbb{C}^{N\times 1}, is given by

𝐜m=𝐃m1𝐠m{\bf c}_{m}={\bf D}_{m}^{-1}\cdot{\bf g}_{m} (58)

where

𝐃m=l𝐊m+l,l𝐊m+l,l{\bf D}_{m}={\sum_{l\in\mathcal{L}}{\bf K}_{m+l,l}^{{\dagger}}\cdot{{\bf K}_{m+l,l}}} (59)
𝐠m=l𝐊m+l,l𝐛ml{\bf g}_{m}=\sum_{l\in\mathcal{L}}{\bf K}_{m+l,l}^{{\dagger}}\cdot{\bf b}_{m}^{l} (60)

and the hard estimates are given by

𝐱^m(n)=argminaj𝒬|aj𝐜m(n)|.{\bf\hat{x}}_{m}(n)=\arg\min_{a_{j}\in\mathcal{Q}}\left|{a_{j}-{\bf c}_{m}(n)}\right|. (61)

where aja_{j} is signal from the QAM alphabet 𝒬\mathcal{Q}, with j=1,,|𝒬|j=1,\ldots,|\mathcal{Q}| and n=0,,N1n=0,\ldots,N-1. Let 𝒟(.)\mathcal{D}(.) denote the decision on the estimate 𝐜m{\bf c}_{m} in every iteration such that 𝐱^m(i)=𝒟(𝐜m(i))\hat{\bf x}_{m}^{(i)}=\mathcal{D}({\bf c}_{m}^{(i)}). Hard-decision function 𝒟(c)\mathcal{D}(c) is given by the maximum likelihood (ML) criterion in (61).

1Input: 𝐇{\bf H}, 𝐃m{\bf D}_{m}, 𝐲m{\bf y}_{m}, 𝐱m=𝟎N{\bf x}_{m}={\bf 0}_{N}\forall m=0,,M1m=0,\ldots,M-1
2 for i=1:max iterations do
3 for m=0:M1m=0:M^{\prime}-1 do
4    for ll\in\mathcal{L} do
5       𝐛ml=𝐲m+lpl𝐊m+l,p𝐱^m+lp{\bf b}_{m}^{l}={\bf y}_{m+l}-{\sum_{{p}\neq l}{\bf K}_{m+l,{p}}}\cdot\hat{\bf x}_{m+l-{p}}
6       
7      end for
8    𝐠m=l𝐊m+l,l𝐛ml{\bf g}_{m}=\sum_{l\in\mathcal{L}}{\bf K}_{m+l,l}^{{\dagger}}\cdot{\bf b}_{m}^{l}
9    𝐜m=𝐃m1𝐠m{{\bf c}}_{m}={\bf D}_{m}^{-1}\cdot{\bf g}_{m}
10    𝐱^m=𝒟(𝐜m)(or𝐱^m=𝐜m{\bf\hat{x}}_{m}=\mathcal{D}({\bf c}_{m})\quad(\text{or}\quad\hat{\bf x}_{m}={\bf c}_{m})5
11    
12   end for
13 
14 end for
Output: 𝐱^m{\bf\hat{x}}_{m}
Algorithm 1 MRC in delay-Doppler domain.

Once we update the estimate 𝐱^m\hat{\bf x}_{m}, we increment mm and repeat the same to estimate all M=MlmaxM^{\prime}=M-l_{max} information symbol-vectors 𝐱^m{\bf\hat{x}}_{m} using the updated estimates555Alternatively, a soft estimate can also be used in conjunction with an outer coding scheme as described in Section V-B. of the previously decoded symbol-vectors in the form of a decision feedback equalizer (DFE) as shown in Fig. 5. Note that the DFE action leads to sequential updates whereas alternatively, using only the previous iteration estimates leads to parallel updates. We verified experimentally that parallel updates result in slower convergence. Algorithm 1 shows the delay-Doppler domain MRC operation (also see Fig. 5).

IV-A Reduced complexity delay-time domain implementation

In (56), for each symbol-vector 𝐱m{\bf x}_{m}, we need to compute LL vectors 𝐛ml{\bf b}_{m}^{l}. This operation requires L(L1)L(L-1) products between matrices 𝐊m,l{\bf K}_{m,l} and estimated symbol-vectors 𝐱^ml\hat{\bf x}_{m-l}. We can take advantage of the redundant operations to reduce the complexity. Let us define the residual noise plus interference (RNPI) term in the ii-th iteration

Δ𝐲m(i)=𝐲ml𝐊m,l𝐱^ml(i){\Delta}{\bf y}_{m}^{(i)}={\bf y}_{m}-{\sum_{l\in\mathcal{L}}{\bf K}_{m,l}}\cdot\hat{\bf x}_{m-l}^{(i)} (62)

which can be considered as the residual error in the reconstructed received delay-Doppler domain symbols due to error in estimation of the transmitted symbols. Note that symbol-vectors 𝐱^m\hat{\bf x}_{m} are estimated in increasing order for m=0,,M1m=0,\ldots,M^{\prime}-1. Therefore, for estimating the symbol-vector 𝐱m{\bf x}_{m}, only the symbol-vectors 𝐱^m+p\hat{\bf x}_{m+p}, for p<0p<0, have updated estimates available in the current iteration. For p0p\geq 0, the previous iteration estimates are used. From (56) and (62), 𝐛ml{\bf b}_{m}^{l} computation for estimating the symbol-vector 𝐱m{\bf x}_{m} in the ii-th iteration can be written as

𝐛ml=Δ𝐲m+l(i)+𝐊m+l,l𝐱^m(i1).{\bf b}_{m}^{l}={\Delta}{\bf y}_{m+l}^{(i)}+{\bf K}_{m+l,l}\cdot\hat{\bf x}_{m}^{(i-1)}. (63)

Substituting (63) for 𝐛ml{\bf b}_{m}^{l} in (60), the direct computation of 𝐛ml{\bf b}_{m}^{l} can be avoided by writing 𝐠m(i){\bf g}_{m}^{(i)} for the ii-th iteration as

𝐠m(i)\displaystyle{\bf g}_{m}^{(i)} =l𝐊m+l,lΔ𝐲m+l(i)+(l𝐊m+l,l𝐊m+l,l)𝐱^m(i1)\displaystyle=\sum_{l\in\mathcal{L}}{\bf K}_{m+l,l}^{{\dagger}}\cdot{\Delta}{\bf y}^{(i)}_{m+l}+\left(\sum_{l\in\mathcal{L}}{\bf K}_{m+l,l}^{{\dagger}}\cdot{\bf K}_{m+l,l}\right)\cdot\hat{\bf x}_{m}^{(i-1)}
=l𝐊m+l,lΔ𝐲m+l(i)+𝐃m𝐱^m(i1).\displaystyle=\sum_{l\in\mathcal{L}}{\bf K}_{m+l,l}\cdot{\Delta}{\bf y}^{(i)}_{m+l}+{\bf D}_{m}\cdot\hat{\bf x}_{m}^{(i-1)}. (64)

Then from (58) and (64), the MRC output at the ii-th iteration can be written as

𝐜m(i)\displaystyle{\bf c}_{m}^{(i)} =𝐱^m(i1)+𝐃m1Δ𝐠m(i)\displaystyle=\hat{\bf x}_{m}^{(i-1)}+{\bf D}_{m}^{-1}\cdot{\Delta}{\bf g}_{m}^{(i)} (65)

where

Δ𝐠m(i)=l𝐊m+l,lΔ𝐲m+l(i){\Delta}{\bf g}_{m}^{(i)}=\sum_{l\in\mathcal{L}}{\bf K}_{m+l,l}^{{\dagger}}\cdot{\Delta}{\bf y}_{m+l}^{(i)} (66)

The vector Δ𝐠m(i){\Delta}{\bf g}_{m}^{(i)} in (66) is the maximal ratio combining of the RNPI’s in all the delay branches (𝐲m+l for l)({\bf y}_{m+l}\text{ for }l\in\mathcal{L}) having a component of 𝐱m{\bf x}_{m} in them.

In the ii-th iteration, for every estimated symbol-vector 𝐱m{\bf x}_{m}, LL RNPI vectors Δ𝐲m+l(i){\Delta}{\bf y}_{m+l}^{(i)} need to be updated. which costs L2L^{2} matrix-vector products. However, the complexity of (62) can be reduced by storing and updating the initial RNPI vectors Δ𝐲m(0){\Delta}{\bf y}_{m}^{(0)}. The LL RNPI vectors which have a component of the most recently estimated symbol-vector are updated as follows,

Δ𝐲m+l(i)Δ𝐲m+l(i)𝐊m+l,l(𝐱m(i)𝐱m(i1)).{\Delta}{\bf y}_{m+l}^{(i)}\leftarrow{\Delta}{\bf y}_{m+l}^{(i)}-{\bf K}_{m+l,l}\cdot({{\bf x}}_{m}^{(i)}-{{\bf x}}_{m}^{(i-1)}). (67)

The number of matrix-vector products required to compute Δ𝐲m(i){\Delta}{\bf y}_{m}^{(i)} has now been reduced from L2L^{2} in (62) to LL in (67). Moreover, as described in Section II-E, the matrix-vector products in (66) and (67) are products between circulant matrices 𝐊m,lN×N{\bf K}_{m,l}\in{\mathbb{C}}^{N\times N} and column vectors 𝐱m{\bf x}_{m} or Δ𝐲mN×1\Delta{\bf y}_{m}\in{\mathbb{C}}^{N\times 1} which can be converted to element-wise product of vectors 𝝂~m,l𝐱~m\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}\circ\mathbf{\tilde{\text{$\bf x$}}}_{m} or 𝝂~m,lΔ~𝐲m\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}\circ\mathbf{\tilde{\text{$\Delta$}}}{\bf y}_{m}, respectively, in the delay-time domain with a complexity of NN complex multiplications. Let the superscript \sim denotes the NN-IFFT of a vector (i.e., 𝐚~=𝐅NH𝐚\mathbf{\tilde{\text{$\bf a$}}}={\bf F}_{N}^{\text{H}}\cdot{\bf a}). The equations (65), (66) and (67) can now be written in corresponding delay-time domain as

𝐜~m(i)\displaystyle\mathbf{\tilde{\text{$\bf c$}}}_{m}^{(i)} =𝐱~m(i1)+Δ𝐠~m(i)𝐝~m\displaystyle=\mathbf{\tilde{\text{$\bf x$}}}_{m}^{(i-1)}+{\Delta}\mathbf{\tilde{\text{$\bf g$}}}_{m}^{(i)}\oslash\mathbf{\tilde{\text{$\bf d$}}}_{m} (68)
Δ𝐠~m(i)=l𝝂~m+l,lΔ𝐲~m+l(i){\Delta}\mathbf{\tilde{\text{$\bf g$}}}_{m}^{(i)}=\sum_{l\in\mathcal{L}}\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m+l,l}^{\ast}\circ{\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m+l}^{(i)} (69)
Δ𝐲~m+l(i)Δ𝐲~m+l(i)𝝂~m+l,l(𝐱~m(i)𝐱~m(i1)){\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m+l}^{(i)}\leftarrow{\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m+l}^{(i)}-\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m+l,l}\circ({\mathbf{\tilde{\text{$\bf x$}}}}_{m}^{(i)}-{\mathbf{\tilde{\text{$\bf x$}}}}_{m}^{(i-1)}) (70)

where

𝐝~m=l𝝂~m+l,l𝝂~m+l,l\mathbf{\tilde{\text{$\bf d$}}}_{m}=\sum_{l\in\mathcal{L}}\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m+l,l}^{{\dagger}}\circ\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m+l,l} (71)

which can be computed in only NLNL complex multiplications.

1Input: 𝐇~\mathbf{\tilde{\text{$\bf H$}}}, 𝐝~m\mathbf{\tilde{\text{$\bf d$}}}_{m}, 𝐱~m(0)\mathbf{\tilde{\text{$\bf x$}}}_{m}^{(0)}, 𝐲~m\mathbf{\tilde{\text{$\bf y$}}}_{m} \forall m=0,,M1m=0,\ldots,M-1
2 for m=0:M1m=0:M^{\prime}-1 do
3 Δ𝐲~m(0)=𝐲~ml𝝂~m,l𝐱~ml(0)\Delta\mathbf{\tilde{\text{${\bf y}$}}}^{{(0)}}_{m}=\mathbf{\tilde{\text{${\bf y}$}}}_{m}-{\sum_{l\in\mathcal{L}}\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}\circ{\mathbf{\tilde{\text{$\bf x$}}}_{m-l}^{(0)}}}
4 end for
5for i=1:max iterations do
6 Δ𝐲~(i)=Δ𝐲~(i1)\Delta\mathbf{\tilde{\text{${\bf y}$}}}^{{(i)}}=\Delta\mathbf{\tilde{\text{${\bf y}$}}}^{{(i-1)}}
7 for m=0:M1m=0:M^{\prime}-1 do
8    Δ𝐠~m(i)=l𝝂~m+l,lΔ𝐲~m+l(i){\Delta}\mathbf{\tilde{\text{$\bf g$}}}_{m}^{(i)}=\sum_{l\in\mathcal{L}}\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m+l,l}^{\ast}\circ{\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m+l}^{(i)}
9    𝐜~m(i)=𝐱~m(i1)+Δ𝐠~m(i)𝐝~m\mathbf{\tilde{\text{${\bf c}$}}}_{m}^{(i)}=\mathbf{\tilde{\text{${\bf x}$}}}_{m}^{(i-1)}+{\Delta}\mathbf{\tilde{\text{$\bf g$}}}_{m}^{(i)}\oslash\mathbf{\tilde{\text{$\bf d$}}}_{m}
10    𝐱~m(i)=𝐅N𝒟(𝐅N𝐜~m(i))\mathbf{\tilde{\text{$\bf{x}$}}}_{m}^{(i)}={\bf F}_{N}^{{\dagger}}\cdot\mathcal{D}({\bf F}_{N}\cdot\mathbf{\tilde{\text{${\bf c}$}}}_{m}^{(i)})  (or 𝐱~m(i)=𝐜~m(i)\mathbf{\tilde{\text{$\bf{x}$}}}_{m}^{(i)}=\mathbf{\tilde{\text{$\bf c$}}}_{m}^{(i)})
11    for ll\in\mathcal{L} do
12       Δ𝐲~m+l(i)Δ𝐲~m+l(i)𝝂~m+l,l(𝐱~m(i)𝐱~m(i1)){\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m+l}^{(i)}\leftarrow{\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m+l}^{(i)}-\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m+l,l}\circ({\mathbf{\tilde{\text{$\bf x$}}}}_{m}^{(i)}-{\mathbf{\tilde{\text{$\bf x$}}}}_{m}^{(i-1)})
13       
14      end for
15    
16   end for
17 if (Δ𝐲~(i)Δ𝐲~(i1)||{\Delta}\mathbf{\tilde{\text{$\bf y$}}}^{(i)}||\geq||{\Delta}\mathbf{\tilde{\text{$\bf y$}}}^{(i-1)}||) then EXIT
18 end for
Output: 𝐱^m=𝒟(𝐅N𝐱~m)\hat{\bf x}_{m}={\mathcal{D}}({\bf F}_{N}\cdot\mathbf{\tilde{\text{$\bf x$}}}_{m})
Algorithm 2 Reduced complexity MRC in delay-time domain .

IV-A1 Computational complexity per iteration

Overall complexity per iteration for calculating Δ𝐠~m(i){\Delta}\mathbf{\tilde{\text{$\bf g$}}}_{m}^{(i)}, 𝐜~m(i)\mathbf{\tilde{\text{$\bf c$}}}_{m}^{(i)} and Δ𝐲~m(i){\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m}^{(i)} for all symbol-vectors is M(2L+1)NM^{\prime}(2L+1)N complex multiplications. The redundant FFT computations can be avoided by storing the Fourier transform of the MLM^{\prime}L Doppler spread vectors 𝝂m,l{\boldsymbol{\nu}}_{m,l}, the MM^{\prime} initial symbol-vector estimates 𝐱m(0){\bf x}_{m}^{(0)} and the RNPI vectors Δ𝐲~m(0)\Delta\mathbf{\tilde{\text{$\bf y$}}}^{(0)}_{m} in (67). The hard decision estimates require the delay-time vectors to be transformed into the delay-Doppler domain and back using two NN-IFFT operations (which requires 2Nlog2(N)2N\log_{2}(N) complex multiplications) per symbol-vector. Algorithm 2 shows the low complexity delay-time domain MRC implementation. The detector iterations are stopped when the overall RNPI error Δ𝐲~=[Δ𝐲~0T,Δ𝐲~1T,,Δ𝐲~M1T]T\Delta\mathbf{\tilde{\text{$\bf y$}}}=[\Delta\mathbf{\tilde{\text{$\bf y$}}}_{0}^{\text{T}},\Delta\mathbf{\tilde{\text{$\bf y$}}}_{1}^{\text{T}},\cdots,\Delta\mathbf{\tilde{\text{$\bf y$}}}_{M-1}^{\text{T}}]^{\text{T}} due to the estimation error in symbol-vectors stops reducing.

IV-A2 Initial computational complexity

In the proposed detector, the initial computations include generating all the entries of the matrices 𝐇{\bf H} and 𝐇~\mathbf{\tilde{\text{$\bf H$}}}, which requires computing the vectors 𝝂m,l{\boldsymbol{\nu}}_{m,l} and their Fourier transform 𝝂~m,l\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l} for all m=0,,M1m=0,\ldots,M^{\prime}-1 and ll\in\mathcal{L}. Assuming the integer delay-Doppler channel parameters (hi,ki,li)(h_{i},k_{i},l_{i}) are known for i=1,2,,Pi=1,2,\ldots,P, the channel Doppler spread vectors 𝝂m,l{\boldsymbol{\nu}}_{m,l} can be easily computed using the relations given in (13) and (35).

Let KlK_{l} be the number of non-zero channel coefficients in each vector 𝝂m,l{\boldsymbol{\nu}}_{m,l} (or paths with different Doppler shift in the same delay bin ll\in\mathcal{L}) such that total number of channel coefficients or propagation paths as seen by the OTFS receiver is P=lKlP=\sum_{l\in\mathcal{L}}K_{l}. The number of complex multiplications required to compute the MLM^{\prime}L vectors 𝝂m,l{\boldsymbol{\nu}}_{m,l} using (35) is MlKl=MPM^{\prime}\sum_{l\in\mathcal{L}}K_{l}=M^{\prime}P. The OTFS channel matrix 𝐇{\bf H} (or equivalently the vectors 𝝂m,l{\boldsymbol{\nu}}_{m,l}) can then be generated in MPM^{\prime}P complex multiplications.

For the delay-time domain MRC operation in Algorithm 2, 𝝂~m,l\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l} (NN-IFFT of 𝝂m,l{\boldsymbol{\nu}}_{m,l}) can be computed in min{Nkl,Nlog2(N)}\min\{Nk_{l},N\log_{2}(N)\} complex multiplications, since there are only KlK_{l} non-zero channel coefficients in each delay tap ll. Then, the number of complex multiplications required to compute 𝐇~\mathbf{\tilde{\text{$\bf H$}}} (or equivalently all the 𝝂~m,l\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}) is upper bounded by MNlKl=MNPM^{\prime}N\sum_{l}K_{l}=M^{\prime}NP.

Alternatively, for the fractional Doppler case, the complexity of initial computations remains unaffected for the delay-time domain detector as 𝝂~m,l\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l} can be generated directly from the channel gains, delays, and Doppler shifts (hi,κi,i)(h_{i},\kappa_{i},\ell_{i}) of the PP paths, using (13) and (25) with MNPM^{\prime}NP complex multiplications.

IV-B Low complexity initial estimate

In Algorithm 1 and 2, we initially assume that all the 𝒬\mathcal{Q}-QAM signals aja_{j} are equally likely and the mean of aja_{j}’s is zero and so we initialize 𝐱^m(0)=𝟎N\hat{{\bf x}}_{m}^{(0)}={\bf 0}_{N}, for all mm. The MRC detector complexity per iteration is of the order O(NML)O(NML) and the overall complexity scales linearly with the number of iterations.

However, a better initial estimate of the OTFS symbols instead of 𝐱^m=𝟎N\hat{\bf x}_{m}={\bf 0}_{N} may reduce the required number of MRC iterations and to reach convergence. Assuming ideal pulse shaping waveform, a single tap equalizer in the time-frequency domain can provide an improved low complexity initial estimate.

Following the remark in Section II-E and [3], we define 𝐇ddM×N{\bf H}_{\text{dd}}\in\mathbb{C}^{M\times N}, the delay-Doppler domain channel impulse response matrix for the ideal pulse shaping waveform case,

𝐇dd(m,n)={νl(κ),if m=l,n=[κ]N0,otherwise.\displaystyle{\bf H}_{\text{dd}}(m,n)=\left.\left\{\begin{array}[]{ll}\nu_{l}(\kappa),&\text{if }m=l,n=[\kappa]_{N}\\ 0,&\text{otherwise.}\end{array}\right.\right. (74)

For the fractional Doppler case (when κ\kappa is a real number). the ideal channel response can be written in terms of the Doppler spread vectors as 𝐇dd=[𝝂0,0,𝝂1,1,,𝝂M1,M1]T{\bf H}_{\text{dd}}=[{\boldsymbol{\nu}}_{0,0},{\boldsymbol{\nu}}_{1,1},\cdots,{\boldsymbol{\nu}}_{M-1,M-1}]^{\text{T}}. The corresponding time-frequency channel response for the ideal pulse shaping waveform is obtained by an inverse symplectic finite fourier transform (ISFFT) operation on the delay-Doppler channel as

𝐇tf\displaystyle{\bf H}_{\text{tf}} =𝐅M𝐇dd𝐅NH\displaystyle={\bf F}_{M}\cdot{\bf H}_{\text{dd}}\cdot{\bf F}^{\text{H}}_{N} (75)
=𝐅M[𝝂0,0,𝝂1,1,,𝝂M1,M1]T𝐅NH\displaystyle={\bf F}_{M}\cdot[{\boldsymbol{\nu}}_{0,0},{\boldsymbol{\nu}}_{1,1},\cdots,{\boldsymbol{\nu}}_{M-1,M-1}]^{\text{T}}\cdot{\bf F}^{\text{H}}_{N}
=𝐅M[𝝂~0,0,𝝂~1,1,,𝝂~M1,M1]T.\displaystyle={\bf F}_{M}\cdot[\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{0,0},\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{1,1},\cdots,\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{M-1,M-1}]^{\text{T}}. (76)

Similarly, the received time-frequency samples can be obtained by the ISFFT operation on the received delay-Doppler domain samples as

𝐘tf=𝐅M𝐘𝐅NH=𝐅M[𝐲~0,𝐲~1,𝐲~M1]T.\displaystyle{\bf Y}_{\text{tf}}={\bf F}_{M}\cdot{\bf Y}\cdot{\bf F}^{\text{H}}_{N}={\bf F}_{M}\cdot[\mathbf{\tilde{\text{$\bf y$}}}_{0},\mathbf{\tilde{\text{$\bf y$}}}_{1},\cdots\mathbf{\tilde{\text{$\bf y$}}}_{M-1}]^{\text{T}}. (77)

Since in the ideal pulse shaping waveform case, circular convolution of the channel and transmitted symbols in the delay-Doppler domain transforms to element-wise product in the time-frequency domain, we estimate the transmitted samples in the time-frequency domain by a single tap minimum mean square error (MMSE) equalizer

𝐗^tf(m,n)=𝐇tf(m,n)𝐘tf(m,n)|𝐇tf(m,n)|2+σw2{\bf\hat{X}}_{\text{\text{tf}}}(m,n)=\frac{{\bf H}_{\text{tf}}^{\ast}(m,n)\cdot{\bf Y}_{\text{tf}}(m,n)}{\lvert{\bf H}_{\text{tf}}(m,n)\rvert^{2}+\sigma_{w}^{2}} (78)

for m=0,,M1m=0,\ldots,M-1 and n=0,,N1n=0,\ldots,N-1.

The time-delay domain initial estimates of the OTFS symbol-vectors can then be obtained by the Heisenberg transform operation on the time-frequency domain estimates as

[𝐱~0(0),𝐱~1(0),𝐱~M1(0)]T=𝐅M𝐗^tf.[\mathbf{\tilde{\text{$\bf x$}}}_{0}^{(0)},\mathbf{\tilde{\text{$\bf x$}}}_{1}^{(0)},\cdots\mathbf{\tilde{\text{$\bf x$}}}_{M-1}^{(0)}]^{\text{T}}={\bf F}^{{\dagger}}_{M}\cdot{\bf\hat{X}}_{\text{tf}}. (79)

Note that 𝝂¯m,l=𝟎N\mathbf{\bar{\text{$\boldsymbol{\nu}$}}}_{m,l}={\bf 0}_{N} for ll\not\in\mathcal{L} and hence the operation in (76) can be computed in min{NML,NMlog2(M)}\min\{NML,NM\log_{2}(M)\} complex multiplications. Since we have already computed 𝝂~m,l\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}, and 𝐲~\mathbf{\tilde{\text{$\bf y$}}} is just a shuffled version of the received time-domain samples, the overall number of computations (for the steps in (76), (77), (78) and (79)) required for the initial estimate is upper bounded by NM(L+2log2(M)+3)NM(L+2\log_{2}(M)+3), which is comparable to the complexity of one detector iteration NM(2L+1)NM^{\prime}(2L+1).

IV-C Condition for Detector Convergence

In this section, we cast the delay-time algorithm (Algorithm 2) in the time-domain with the purpose of analysing the detector convergence using the properties of Jacobi and Gauss Seidel iterative methods for solving linear equations [19, 20]. The basic principle of iterative MRC operation in the delay-time domain with sequential updates given in (68)-(70) can be compactly expressed as

𝐱~(i)=𝐱~(i1)+𝐃~1𝐇~(𝐲~𝐇~𝐱~(i1))\mathbf{\tilde{\text{$\bf x$}}}^{(i)}=\mathbf{\tilde{\text{$\bf x$}}}^{(i-1)}+\mathbf{\tilde{\text{$\bf D$}}}^{-1}\mathbf{\tilde{\text{$\bf H$}}}^{{\dagger}}(\mathbf{\tilde{\text{$\bf y$}}}-\mathbf{\tilde{\text{$\bf H$}}}\mathbf{\tilde{\text{$\bf x$}}}^{(i-1)}) (80)

when using parallel updates (i.e. without DFE), where 𝐃~\mathbf{\tilde{\text{$\bf D$}}} is the matrix containing diagonal elements of 𝐇~𝐇~\mathbf{\tilde{\text{$\bf H$}}}^{{\dagger}}\mathbf{\tilde{\text{$\bf H$}}}. The rows and columns of the delay-time channel matrix 𝐇~\mathbf{\tilde{\text{$\bf H$}}} are perfectly shuffled using the permutation matrix 𝐏{\bf P} to obtain a similar, block diagonal time-domain channel matrix 𝐆{\bf G} as explained in Section II-F. This allows the equivalent operation in (80) to be split and executed in parallel for each independent time domain block 𝐆n{\bf G}_{n} as

𝐬n(i)=𝐬n(i1)+𝐃n1𝐆n(𝐫n𝐆n𝐬n(i1)){\bf s}_{n}^{(i)}={\bf s}_{n}^{(i-1)}+{\bf D}_{n}^{-1}{\bf G}_{n}^{{\dagger}}({\bf r}_{n}-{\bf G}_{n}{\bf s}_{n}^{(i-1)}) (81)

where 𝐃n{\bf D}_{n} is the matrix containing the diagonal elements of 𝐆n𝐆n{\bf G}_{n}^{{\dagger}}{\bf G}_{n}. Equation (81) can be written in the form

𝐬n(i)=𝐓nJ𝐬n(i1)+𝐐nJ𝐳n\displaystyle{\bf s}_{n}^{(i)}=-{\bf T}^{\rm J}_{n}\cdot{\bf s}_{n}^{(i-1)}+{\bf Q}^{\rm J}_{n}\cdot{\bf z}_{n}
𝐓nJ=𝐃n1(𝐋n+𝐋n),𝐐nJ=𝐃n1,𝐳n=𝐆n𝐫n\displaystyle{\bf T}^{\rm J}_{n}={\bf D}_{n}^{-1}\cdot({\bf L}_{n}+{\bf L}_{n}^{{\dagger}}),\quad{\bf Q}^{\rm J}_{n}={\bf D}_{n}^{-1},\quad{\bf z}_{n}={\bf G}_{n}^{{\dagger}}{\bf r}_{n} (82)

where 𝐋n{\bf L}_{n} and 𝐋n{\bf L}_{n}^{{\dagger}} are the matrices containing the strictly lower and upper triangular parts of the Hermitian matrix 𝐑n=𝐆n𝐆n{\bf R}_{n}={\bf G}_{n}^{{\dagger}}{\bf G}_{n}. Finally, we observe that the parallel update formulation in (82) matches the classic Jacobi iterative method (hence the superscript ’J’ in 𝐓nJ{\bf T}_{n}^{\rm J}) for solving linear equations, [19].

We now focus on the sequential update method given in Algorithm 1 and 2 based on the DFE operation. Note that, in Algorithm 2, the linear matrix equation in (80) is solved block-wise with low complexity, where the latest estimates of the symbol-vectors calculated in the current iteration are used in estimating the next symbol-vector as in a DFE

𝐬n(i)=𝐬n(i1)+𝐃n1(𝐳n𝐋n𝐬n(i)(a)𝐋n𝐬n(i1)(b)){\bf s}_{n}^{(i)}={\bf s}_{n}^{(i-1)}+{\bf D}_{n}^{-1}({\bf z}_{n}-\underbrace{{\bf L}_{n}{\bf s}_{n}^{(i)}}_{(a)}-\underbrace{{\bf L}_{n}^{{\dagger}}{\bf s}_{n}^{(i-1)}}_{(b)}) (83)

where (a)(a) and (b)(b) denote the contribution of the current and previous-iteration estimates, respectively. We can modify (82) for the DFE iterative method in (83) as

𝐬n(i)=𝐓nGS𝐬n(i1)+𝐐nGS𝐳n\displaystyle{\bf s}_{n}^{(i)}=-{\bf T}^{\rm GS}_{n}\cdot{\bf s}_{n}^{(i-1)}+{\bf Q}^{\rm GS}_{n}\cdot{\bf z}_{n}
𝐓nGS=(𝐃n+𝐋n)1𝐋n,𝐐nGS=(𝐃n+𝐋n)1\displaystyle{\bf T}^{\rm GS}_{n}=({\bf D}_{n}+{\bf L}_{n})^{-1}\cdot{\bf L}_{n}^{{\dagger}},\quad{\bf Q}^{\rm GS}_{n}=({\bf D}_{n}+{\bf L}_{n})^{-1} (84)

and observe that Algorithm 2 coincides with the well studied Gauss Seidel (GS) method available in the literature [19, 20]. Algorithm 3 shows the equivalent time domain GS method implementing Algorithm 2.

1Input: 𝐫{\bf r}, 𝐆{\bf G}
2 for n=0:N1n=0:N-1 do
3 𝐑n=𝐆n𝐆n{\bf R}_{n}={\bf G}_{n}^{{\dagger}}\cdot{\bf G}_{n}
4 𝐳n=𝐆n𝐫n{\bf z}_{n}={\bf G}_{n}^{{\dagger}}\cdot{\bf r}_{n}
5 𝐋n=strictly lower triangular part{𝐑n{\bf L}_{n}=\text{strictly lower triangular part}\{{\bf R}_{n}}
6 𝐓nGS=(𝐃n+𝐋n)1𝐋n{\bf T}^{\rm GS}_{n}=({\bf D}_{n}+{\bf L}_{n})^{-1}\cdot{\bf L}_{n}^{{\dagger}}
7 𝐐nGS=(𝐃n+𝐋n)1{\bf Q}^{\rm GS}_{n}=({\bf D}_{n}+{\bf L}_{n})^{-1}
8 
9 end for
10𝐬^(0)=𝐏(𝐈M𝐅N)𝐱^(0)\hat{\bf s}^{(0)}={\bf P}\cdot({\bf I}_{M}\otimes{\bf F}_{N}^{{\dagger}})\cdot{\hat{\bf x}}^{(0)}
11 for i=1i=1:max iterations do
12 for n=0:N1n=0:N-1 do
13    𝐬^n(i)=𝐓nGS𝐬^n(i1)+𝐐nGS𝐳n\hat{\bf s}_{n}^{(i)}=-{\bf T}^{\rm GS}_{n}\cdot\hat{\bf s}_{n}^{(i-1)}+{\bf Q}^{\rm GS}_{n}\cdot{\bf z}_{n}
14    
15   end for
16 if (𝐫𝐆𝐬^(i)𝐫𝐆𝐬^(i1)||{\bf r}-{\bf G}\cdot\hat{\bf s}^{(i)}||\geq||{\bf r}-{\bf G}\cdot\hat{\bf s}^{(i-1)}||) then EXIT
17 end for
Output: 𝐱^=(𝐈M𝐅N)(𝐏𝐬^(i))\hat{\bf x}=({\bf I}_{M}\otimes{\bf F}_{N})\cdot({\bf P}\cdot{\hat{\bf s}^{(i)}})
Algorithm 3 MRC delay-time domain operation principle in the form of time domain Gauss-Seidel method .

Both Jacobi and GS methods are used to iteratively find the least squares solution

𝐬^n=min𝐬^n𝐳n𝐑n𝐬^n2\hat{\bf s}_{n}=\min_{\hat{\bf s}_{n}}||{\bf z}_{n}-{\bf R}_{n}\hat{\bf s}_{n}||^{2} (85)

of the MM-dimensional linear system of equations

𝐳n=𝐑n𝐬n+𝐰¯n{\bf z}_{n}={\bf R}_{n}\cdot{\bf s}_{n}+\mathbf{\bar{\text{$\bf w$}}}_{n} (86)

where 𝐑nM×M{\bf R}_{n}\in\mathbb{C}^{M\times M} and 𝐬^n,𝐳nM×1\hat{\bf s}_{n},{\bf z}_{n}\in\mathbb{C}^{M\times 1}. We further assume that the time-domain correlation matrix 𝐑n=𝐆n𝐆n{\bf R}_{n}={\bf G}_{n}^{{\dagger}}{\bf G}_{n} is non-singular and hence positive definite Hermitian.

In [19, 20], it is shown that the iteration method (82) for the linear system in (86) is convergent, if ρ(𝐓nGS)<1\rho({\bf T}^{\rm GS}_{n})<1, where ρ(𝐓nGS)\rho({\bf T}^{\rm GS}_{n}) is the spectral radius666Spectral radius of a matrix is the largest absolute value of its eigenvalues. of the square matrix 𝐓nGS{\bf T}^{\rm GS}_{n} [19, 20]. For the Jacobi method, ρ(𝐓nJ)<1\rho({\bf T}^{\rm J}_{n})<1 if 𝐑n{\bf R}_{n} is diagonally dominant, which depends on the channel and cannot be guaranteed. However, the GS method is known to converge faster and convergence is guaranteed under more general conditions than the Jacobi method [19, 20]. In Appendix we prove the following lemma

Lemma 1.

The GS iterative method for the solution of (86) is converging (i.e., ρ(𝐓nGS)<1\rho({\bf T}^{\rm GS}_{n})<1) if 𝐑n{\bf R}_{n} is a positive definite Hermitian matrix. Furthermore, ρ(𝐓nGS)=1\rho({\bf T}^{\rm GS}_{n})=1 if 𝐑n{\bf R}_{n} is a positive semi-definite Hermitian matrix.

We note that the algorithm may still converge even for some channels that result in a positive semi-definite Hermitian matrix 𝐑n{\bf R}_{n} (i.e., ρ(𝐓nGS)=1\rho{({\bf T}^{\rm GS}_{n})}=1), but this is not guaranteed.

Even though the implementation of the iterative MRC detector in Algorithm 3 looks simpler than the one in Algorithm 2, the complexity of initial computations for directly calculating 𝐑n{\bf R}_{n}, 𝐓nGS{\bf T}^{\rm GS}_{n} and 𝐐nGS{\bf Q}^{\rm GS}_{n} is O(NML2)O(NML^{2}) complex multiplications since 𝐆n{\bf G}_{n} is a banded matrix with LL non-zero elements in each row. However, in Algorithm 2, the circulant property of the blocks of the channel matrix 𝐇{\bf H} (due to the placement of null symbols in the OTFS grid as shown in Fig. 2) is utilized to reduce the overall complexity of the initial computations to O(NML)O(NML) complex multiplications as explained in Section III-A.

V Further Improvements

V-A Successive Over Relaxed (SOR) Iterative Rake Detector

Refer to caption
Figure 6: 64-QAM BER performance for different relaxation parameters ω\omega.

In time domain, the proposed iterative Rake detector is similar to doing NN parallel GS iterations on the matched filtered received waveform, as shown in Section III-C. GS and its variants such as successive over-relaxation (SOR) method are well presented in [21, 19, 20]. The SOR method is obtained by introducing a relaxation parameter ω\omega in the GS method (83) as,

𝐬n(i)=𝐬n(i1)+ω𝐃n1(𝐳n𝐋n𝐬n(i)𝐋n𝐬n(i1)).{\bf s}_{n}^{(i)}={\bf s}_{n}^{(i-1)}+\omega{\bf D}_{n}^{-1}({\bf z}_{n}-{\bf L}_{n}{\bf s}_{n}^{(i)}-{\bf L}_{n}^{{\dagger}}{\bf s}_{n}^{(i-1)}). (87)

The corresponding GS iteration matrix 𝐓nGS{\bf T}^{\rm GS}_{n} and 𝐐nGS{\bf Q}^{\rm GS}_{n} in Algorithm 3 can be modified as

𝐓nω=(𝐃n+ω𝐋n)1((ω1)𝐃n+ω𝐋n){\bf T}_{n}^{\omega}=({\bf D}_{n}+\omega{\bf L}_{n})^{-1}\cdot((\omega-1){\bf D}_{n}+\omega{\bf L}_{n}^{{\dagger}}) (88)
𝐐nω=(𝐃n+ω𝐋n)1.{\bf Q}_{n}^{\omega}=({\bf D}_{n}+\omega{\bf L}_{n})^{-1}. (89)

In Appendix we prove the following lemma.

Lemma 2.

The SOR GS iterative method for the solution of (86) is converging (i.e., ρ(𝐓nω)<1\rho({\bf T}^{\omega}_{n})<1) if 𝐑n{\bf R}_{n} is a positive definite Hermitian matrix and 0<ω<20<\omega<2.

We can then simply modify the proposed delay-time detector Algorithm 2 by rewriting (68) as

𝐜~m(i)=𝐜~m(i1)+ω(Δ𝐪~m(i1)𝐝~m).\mathbf{\tilde{\text{${\bf c}$}}}_{m}^{(i)}=\mathbf{\tilde{\text{${\bf c}$}}}_{m}^{(i-1)}+\omega({\Delta}\mathbf{\tilde{\text{$\bf q$}}}_{m}^{(i-1)}\oslash\mathbf{\tilde{\text{$\bf d$}}}_{m}). (90)

Note that when ω=1\omega=1, (90) coincides with (68). The relaxation parameter when ω>1\omega>1 is called the over-relaxation parameter and when ω<1\omega<1 is called the under relaxation parameter. The computation of the optimal SOR parameter ω=ωopt\omega=\omega_{\text{opt}} which minimizes the spectral radius ρ(𝐓nω)\rho({\bf T}_{n}^{\omega}) requires computing the eigenvalues of the iteration matrix 𝐓nω{\bf T}_{n}^{\omega} , [20, 19].

The aim is to find the range of values of ω\omega for which the SOR method converges (see Lemma 2), the set of which denotes the region of convergence, and, if possible, the best value ωopt\omega_{\rm opt}. The optimum SOR parameter can be analytically calculated given the spectral radius of the Jacobi matrix ρ(𝐓nJ)<1\rho({\bf T}^{\rm J}_{n})<1 [21]. However, it is known that ρ(𝐓nJ)<1\rho({\bf T}^{\rm J}_{n})<1 only if 𝐑n{\bf R}_{n} is diagonally dominant, but this is not guaranteed for all channels. In such cases, the numerical calculation of ωopt\omega_{opt} is not practical for large system matrices, rather a region of good performance, within the region of convergence, is easier to find, as suggested by [21]. Further, when the power delay profile statistical model of the channel is given, the good region for the SOR parameter can be optimized offline by simulation.

In this paper, we try to analyse the effect of ω\omega and the range of values of good performance by simulation. Fig. 6 show the BER plot for 64-QAM for different values of ω\omega. In Fig. 7, we plot the required (abbreviated as reqd. in the plot legend) SNR (labelled as ’Q-QAM reqd. SNR’) on the left y-axis alongside the required number of iterations (labelled as ’Q-QAM reqd. iters’) on the right y-axis, to achieve a BER of 10310^{-3} for different modulation sizes, respectively, for different values of ω[1,1.5]\omega\in[1,1.5]. The y-axis of the plot represents the SNR (dB) or the iterations depending on the corresponding curve. The maximum number of iterations is set to 50. It can be seen that the optimum ω\omega for the standard extended vehicular A (EVA) 777The EVA channel power-delay profile (with a maximum speed = 120 km/hr) is given by [0, -1.5, -1.4, -3.6, -0.6, -9.1, -7.0, -12.0, -16.9] dB with excess delay taps =={0,1,2,3,4,5,8,13,19}\mathcal{L}^{\prime}=\mathcal{L}=\{0,1,2,3,4,5,8,13,19\} normalized to the delay resolution 1/(MΔf)1/(M\Delta f) of an OTFS grid with bandwidth MΔfM\Delta f, where M=512M=512 and Δf=15\Delta f=15 kHz. channel model [13] consistently lies in the interval [1.2,1.3][1.2,1.3]. We can observe that there is a 2.5 dB and 17dB gain at a BER of 10310^{-3} for 16-QAM and 64-QAM, respectively, due to just the over-relaxation parameter with almost no extra computational complexity.

Refer to caption
Figure 7: Error performance and convergence speed of different relaxation parameters ω\omega for different modulation sizes |||\mathbb{Q}| at BER 10310^{-3}.

The effect of the SOR parameter on the convergence speed of the MRC detector can be seen in Fig. 7 (right y-axis). It shows the number of iterations required to achieve a BER of 10310^{-3} for different modulation sizes at the corresponding SNR values as given in the plot legend. It can be seen that the biggest reduction in complexity comes at 64-QAM where, the number of iterations required is significantly reduced (by almost 3 times) as compared to the case when SOR parameter ω=1\omega=1. For 4-QAM and 16-QAM, the optimum SOR parameter approximately halves the number of required iterations.

Finally, if no prior knowledge of the channel statistical model is available, we observed by simulation that some performance improvement can still be achieved by setting the value of ω\omega to slightly above 1. The optimization of ω\omega with low complexity, for different SNR, channel profiles and number of multipaths will be investigated in future work.

V-B Iterative Rake Turbo Decoder

Refer to caption
Figure 8: OTFS iterative rake turbo decoder operation.

In order to improve FER performance, the turbo decoder principle shown in Fig. 8 is proposed. The encoded bits are random interleaved in the frame so as to enhance the delay-Doppler diversity.

The detector output bit log likelihood ratios (LLR) after random de-interleaving is fed to the low-density parity check (LDPC) decoder. The hard decision coded bits from the LDPC decoder after interleaving and QAM modulation is then fed back to the MRC detector as the input symbol-vector estimates and the process repeats. Overall, one turbo iteration involves one iteration of MRC detector, de-interleaver, LDPC decoder, interleaver, and the QAM modulator. As shown in Fig. 8, for the first iteration, the initial estimate of the QAM symbols is provided by the low complexity MMSE equalizer as explained in Section III-B, after which the initial estimate comes form the LDPC decoder.

From (65), the soft estimate of the delay-Doppler domain symbol-vector 𝐜m{\bf c}_{m} after MRC combining can be written as

𝐜m=𝐱m+𝐞mm=0,M1{\bf c}_{m}={\bf x}_{m}+{\bf e}_{m}\quad\quad m=0,\ldots M^{\prime}-1 (91)

where 𝐱m{\bf x}_{m} is the transmitted symbol-vector at delay index mm and 𝐞m{\bf e}_{m} denotes the normalized post MRC NPI vector. We assume that 𝐞m{\bf e}_{m} follows a zero mean Gaussian distribution with variance σm2{\sigma}_{m}^{2}. This assumption becomes more accurate as the number of interfering terms increases. Then, the LLR Lm,n,b(i)L_{m,n,b}^{(i)} of bit bb of the nn-th transmitted symbol in the estimated symbol-vector 𝐜m(i){\bf c}^{(i)}_{m} in the ii-th iteration can be obtained by

Lm,n,b(i)\displaystyle L_{m,n,b}^{(i)} =log(Pr(b=0|𝐜m(i)(n))Pr(b=1|𝐜m(i)(n)))\displaystyle=\log\left(\frac{Pr(b=0|{\bf c}^{(i)}_{m}(n))}{Pr(b=1|{\bf c}^{(i)}_{m}(n))}\right)
=log(qQ0exp(|𝐜m(i)(n)q|2/σm2)qQ1exp(|𝐜m(i)(n)q|2/σm2))\displaystyle=\log\left(\frac{\sum_{q\in Q_{0}}\exp({-|{\bf c}^{(i)}_{m}(n)-q|^{2}/\sigma_{m}^{2}})}{\sum_{q^{\prime}\in Q_{1}}\exp({{-|{\bf c}^{(i)}_{m}(n)-q^{\prime}|^{2}}/\sigma_{m}^{2}})}\right) (92)

where 𝒬0\mathcal{Q}_{0} and 𝒬1\mathcal{Q}_{1} are the subsets of QAM symbols, where the bb-th bit of the symbol is 0 and 1, respectively. The complexity of LLR calculation can be reduced by the max-log approximated LLR obtained as

L~m,n,b(i)=1σm2(minq𝒬0|𝐜m(i)(n)q|2minq𝒬1|𝐜m(i)(n)q|2).\displaystyle\mathbf{\tilde{\text{$L$}}}^{(i)}_{m,n,b}=\frac{1}{\sigma^{2}_{m}}\left(\min_{q\in{\mathcal{Q}}_{0}}\left|{{{{\bf c}^{(i)}_{m}(n)}{}}}-q\right|^{2}\!-\!\min_{q^{\prime}\in{\mathcal{Q}}_{1}}\left|{{{\bf c}^{(i)}_{m}(n)}{}}-q^{\prime}\right|^{2}\right). (93)

In order to compute the bit LLRs, an estimate of the post MRC NPI variance σm2\sigma_{m}^{2} is required. Accurate estimation of σm2\sigma_{m}^{2} is not straightforward and requires knowledge of the correlation between all the estimated symbol-vectors and RNPI vectors which changes every iteration as well. Since the entries of channel Doppler spread vectors 𝝂m,l{\boldsymbol{\nu}}_{m,l} can be assumed to be zero mean, i.i.d. and normal distributed [13], the channel Doppler spread for different delay taps can be assumed to be uncorrelated. i.e., E[𝝂m,l𝝂m,p]=0E[{\boldsymbol{\nu}}_{m,l}^{{\dagger}}\cdot{\boldsymbol{\nu}}_{m^{\prime},{p}}]=0 for lpl\neq{p}. Furthermore, for the purpose of a simple estimate of the post MRC NPI variance, we assume that RNPI Δ𝐲m(i){\Delta}{\bf y}_{m}^{(i)} in the different delay branches are uncorrelated (i.e., E[Δ𝐲mΔ𝐲p]=0E[{{\Delta}{\bf y}_{m}^{{\dagger}}\cdot{\Delta}{\bf y}_{p}}]=0 for mpm\neq p in all iterations) and follows Gaussian distribution. The covariance matrix of the delay-time RNPI vector Δ𝐲~m\Delta\mathbf{\tilde{\text{$\bf y$}}}_{m} in the ii-th iteration

𝑪m(i)(j,k)=\displaystyle{\boldsymbol{C}}_{m}^{(i)}(j,k)= (Δ𝐲~m(i)(j)E{Δ𝐲~m(i)})(Δ𝐲~m(i)(k)E{Δ𝐲~m(i)})\displaystyle({\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m}^{(i)}(j)-E\{{\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m}^{(i)}\})({\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m}^{(i)}(k)-E\{{\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m}^{(i)}\})^{\ast} (94)

for j,k=0,,N1j,k=0,\ldots,N-1 and E{Δ𝐲~m(i)}=1Nn=1NΔ𝐲~m(i)(n)E\{{\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m}^{(i)}\}=\frac{1}{N}\sum_{n=1}^{N}{\Delta}\mathbf{\tilde{\text{$\bf y$}}}_{m}^{(i)}(n). Since Fourier transformation is a unitary transformation, the NPI variance remains the same in both domains, and we approximate the post MRC NPI variance for the symbol-vector soft estimate 𝐜m(i){\bf c}_{m}^{(i)} in the ii-th iteration as

σm2(i)=Var(𝐞~m(i))1NlLηm,ltr(𝑪m+l(i))\sigma_{m}^{2{(i)}}=\mathrm{Var}(\mathbf{\tilde{\text{$\bf e$}}}_{m}^{(i)})\approx\frac{1}{N}\sum_{l\in L}{\eta}_{m,l}{\mathrm{tr}}({\boldsymbol{C}}^{(i)}_{m+l}) (95)

where ηm,l=𝝂~m+l,l𝐝~m2{\eta}_{m,l}=||\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m+l,l}\oslash\mathbf{\tilde{\text{$\bf d$}}}_{m}||^{2} is the normalized post MRC channel power in the different delay branches selected for combining. The bit LLR calculation in (93) and NPI variance calculation in (95) has a complexity of 2NMlog2(|𝒬|)2NM\log_{2}(|\mathcal{Q}|) and NMLNML, respectively. The LDPC decoder complexity is of the order CLDPC=O(log2(|𝒬|)NM)C_{\rm LDPC}=O\left(\log_{2}(|\mathcal{Q}|)NM\right). The overall complexity of detection increases by CLDPC+NM(2log2(|𝒬|)+L)+C_{\rm LDPC}+NM(2\log_{2}(|\mathcal{Q}|)+L)+ for every turbo iteration.

Refer to caption
Figure 9: Uncoded 4-QAM BER Plot : MRC vs MPA vs MMSE-OFDM.
Refer to caption
Figure 10: Uncoded 16-QAM BER Plot : MRC vs MPA vs MMSE-OFDM.

VI Simulation Results and Discussion

For simulations we generate OTFS frames for N=128N=128 and M=512M=512. The sub-carrier spacing Δf\Delta f is taken as 15 kHz. The maximum delay spread (in terms of integer taps) is taken to be 32 (lmax=31l_{\max}=31) which is approximately 4 μs{\mu}s. The channel delay model is generated according to the standard EVA model (with a speed of 120 km/h) with the Doppler shift for the ii-th path generated from a uniform distribution U(0,νmax)U(0,\nu_{max}), where νmax\nu_{max} is the maximum Doppler shift [13]. We consider one Doppler shifted path per delay tap with L=9L=9 and kmax=16k_{max}=16. For our simulations, we assume perfect knowledge of the channel state information at the receiver (see [10] for practical channel estimation in OTFS). For BER plots, 10510^{5} frames are send for every point in the BER curve and for FER plots, all simulations run for a minimum of 10510^{5} frames or until 100 OTFS frame errors are encountered. BER is plotted to show uncoded performance, while FER is used when an outer coding scheme is applied.

Refer to caption
Figure 11: Turbo 16-QAM FER Plot: MRC vs BIC-MMSE-OFDM.
Refer to caption
Figure 12: Turbo 64-QAM FER Plot: MRC vs BIC-MMSE-OFDM.

Fig. 9 shows the BER plot for the MRC detector, with and without the initial estimate in Section III-B, for 4-QAM modulated OTFS waveform with a maximum of 10 iterations888Iterations are stopped according to the residual NPI convergence criteria in Algorithm 2.. Performance is compared with the state of the art message passing algorithm (MPA) described in [22, 23] (labeled as OTFS-MPA in Fig. 9 and 10) with a maximum of 10 iterations999The MPA stopping criteria is based on the convergence of the estimated symbol probabilities [22]. and the OFDM single tap MMSE equalizer. It can be seen that with the initial estimate (labeled as OTFS-MRC with Init. Est.101010Init. Est. refers to detection with the Initial Estimate in Section III-B.), there is a \approx1 dB gain over the MPA algorithm at a BER of 10310^{-3}. This gain is contributed by the improved SNR due to the MRC operation (or matched-filtering) at the receiver and the initial time-frequency MMSE estimate, which is more reliable for lower modulation sizes like BPSK and 4-QAM, thereby increasing the convergence speed (due to the initial estimates begin closer to the solution).

Note that the same initial estimates could also be used to improve the performance of MPA. However, the estimates need to be transformed into the delay-Doppler domain and 𝒬\mathcal{Q}-QAM alphabet probabilities for all the information symbols need to be calculated. This would incur a high complexity just to get the improved initial estimate. Moreover, similar to MRC detection, MPA can also be applied on the matched-filtered system matrix 𝐇𝐇{\bf H}^{{\dagger}}{\bf H} instead of 𝐇{\bf H}, but this approximately doubles the MPA complexity, which scales linearly with the number of non-zero elements in the matrix. [22, 23].

Fig. 10 shows the BER plot for the MRC detector for 16-QAM modulation with maximum 15 iterations compared to the MPA-based detector with maximum 30 iterations. It can be seen that with the over-relaxed iterative detection (labeled as OTFS-SOR-MRC with Init. Est. (ω=1.25\omega=1.25)), the BER performance is improved by around 2.5 dB at BER =103=10^{-3}. Moreover, the SOR-iterative algorithm converges on average in less than 8 iterations for SNR>15 dB. We can see from Fig. 6 and 7 that the SOR parameter has more impact at higher modulation schemes, where the initial low complexity estimate is less accurate and the convergence is generally slow without SOR.

Refer to caption
Figure 13: Turbo 64-QAM FER Plot: MRC vs BIC–MMSE-OFDM for codeword lengths: 672, 3840.

Fig. 11 and 12 shows the frame error performance of the plain and SOR-turbo-Rake decoder with initial low complexity estimate for 16 and 64 QAM modulation, respectively, compared with bit interleaved coded OFDM with MMSE detection scheme (labeled as OFDM BICM decoder). A half-rate LDPC code of length Nc=3840N_{c}=3840 bits from [25] is used and every OTFS frame contains NMlog2(|𝒬|)/Nc\lfloor NM\log_{2}(|\mathcal{Q}|)/N_{c}\rfloor codewords.

Turbo iterations are stopped when all the decoded codewords within the frame satisfy the LDPC parity check. It can be observed that just 1 iteration of turbo MRC detector (labeled as Turbo-Rake 1 iter) is required to achieve better error performance than the bit interleaved coded MMSE OFDM. Moreover, with the over-relaxation parameter ω=1.25\omega=1.25 (labeled as SOR-Turbo-Rake), a gain of 0.2\approx 0.2dB (for 16 QAM with 3 turbo iterations) and 1\approx 1dB (for 64 QAM with 3 turbo iterations) is achieved in the FER performance. The overall detector complexity in terms average number of iterations to converge is significantly reduced by using turbo iterations along with the initial estimates from the time-frequency single tap equalizer.

Fig. 13 shows the FER performance of the proposed detector vs BICM-OFDM for different codeword lengths: long (labeled as SOR-Turbo-Rake-3840) and short (labeled as SOR-Turbo-Rake-672). For a fair comparison with the OFDM scheme, the FER plot for a single turbo iteration is also plotted alongside. It can be observed that, the proposed detector with single turbo iteration has a gain of 3\approx 3dB and 4\approx 4dB for codeword length of 3840 and 672, respectively, as compared to the OFDM scheme at a FER of 10210^{-2}. It can be noted that more iterations are required for short codewords to achieve the same performance as long codewords.

VII Detector Complexity

In the table below, we summarize and compare the overall complexity of the iterative Rake receiver (in terms of complex multiplications), including initial computations and Fourier domain transformations as discussed in Section IV.

Computations per iteration (I) NM(2L+1+2log2(N))NM^{\prime}(2L+1+2\log_{2}(N))
Initial computations (II) NM(P+2L)NM^{\prime}(P+2L)
(III) NM[L+2log2(M)+3]NM[L+2\log_{2}(M)+3]

Term (I) accounts for the computations inside each detector iteration, which includes calculating Δ𝐠~m(i){\Delta}\mathbf{\tilde{\text{$\bf g$}}}_{m}^{(i)}, Δ𝐲~m(i)\Delta\mathbf{\tilde{\text{$\bf y$}}}_{m}^{(i)}, 𝐜~m(i)\mathbf{\tilde{\text{$\bf c$}}}_{m}^{(i)}, and the symbol-vector hard decision estimates 𝐱~m(i)\mathbf{\tilde{\text{$\bf x$}}}_{m}^{(i)} in Algorithm 2. Term (II) is for initial computations, which involves calculating MLM^{\prime}L delay-time Doppler spread vectors 𝝂~m,l\mathbf{\tilde{\text{$\boldsymbol{\nu}$}}}_{m,l}, initial MM^{\prime} residual vectors Δ𝐲~m(0)\Delta\mathbf{\tilde{\text{$\bf y$}}}_{m}^{(0)} in (70), and MM^{\prime} vectors 𝐝~m\mathbf{\tilde{\text{$\bf d$}}}_{m} and term (III) is to compute the low complexity initial time-frequency estimate 𝐱^m(0){\hat{\bf x}}_{m}^{(0)} in (78).

The detectors for OTFS with complexity linear in NMNM and with non-ideal pulse shaping waveform (rectangular) are discussed in [22, 24]. The complexity of the MPA detector per iteration scales with the number of paths on the discrete delay-Doppler grid and the alphabet size |𝒬||\mathcal{Q}|, and has a complexity of the order O(P|𝒬|NM)O(P|\mathcal{Q}|NM) [22]. The linear minimum mean square error detector proposed in [24] even though is a non-iterative detector has a computational complexity of O((lmax2+kmaxP2)NM)O\left((l_{\max}^{2}+k_{\max}P^{2})NM\right) whereas the proposed detector has a complexity of O(SLNM)O(SLNM) where LPL\leq P and SS is the number of MRC detector iterations as given in Fig. 7.

The complexity of the proposed detector is compared with other linear complexity OTFS detectors, for different modulation sizes, number of multipaths in Fig. 14. The dashed lines represents the case when there are 5 paths with distinct Doppler shifts in each delay tap i.e., P=5LP=5L. It can be concluded from Fig. 14 that the proposed detector complexity is significantly lower than the one of other OTFS detectors and closer to that of an OFDM single tap MMSE equalizer.

For the iterative operation, the storage requirement for the MRC detector is (L+2)NM(L+2)NM complex numbers as only the LNMLNM delay-time channel coefficients, the MM RNPI vectors, and the MM^{\prime} symbol vector estimates need to be stored for each iteration. For MPA, the storage requirement is much higher and of the order O(P|𝒬|NM)O(P|\mathcal{Q}|NM), [22].

Refer to caption
Figure 14: Complexity comparison with other linear detectors, for different modulation sizes, for an OTFS frame of size N=128,M=512N=128,M=512 for P=LP=L, i.e., for one Doppler path per delay tap (solid lines) and P=5LP=5L, i.e., for five Doppler paths per delay tap (dashed lines).

VIII Conclusion

We reformulated the OTFS input-output relation and proposed two versions of a linear complexity iterative rake detector algorithm for ZP-OTFS modulation based on the maximal ratio combining principle. We show that the MRC detector along with a low complexity initial estimate of symbol-vectors can achieve similar or better BER performance than the MPA detector with lower complexity and storage requirements. Based on the well studied Gauss-Seidel method, we introduced a successive over relaxation parameter to improve error performance and faster convergence of the proposed detector. The MRC detector performance was further improved with the aid of an outer error control coding scheme using turbo iterations. An additional advantage of the MRC detector is that the complexity is linear in LL (number of delay taps) rather than PP (total number of paths), thanks to the vector decomposition of the 2-D convolution with the channel.

Appendix

VIII-1 Proof of Lemma (1)

Consider the MM dimensional linear system of equations 𝐳n=𝐑n𝐬n{\bf z}_{n}={\bf R}_{n}\cdot{\bf s}_{n} without the noise term in (86). The positive definite Hermitian system matrix 𝐑n{\bf R}_{n} can be split as 𝐃n+𝐋n+𝐋n{\bf D}_{n}+{\bf L}_{n}+{{\bf L}}^{{\dagger}}_{n}, where 𝐃n{\bf D}_{n} and 𝐋nM×M{\bf L}_{n}\in\mathbb{C}^{M\times M} are the matrices containing the diagonal and strictly lower-triangular elements, respectively. Pre and post-multiplying both sides of (86) by 𝐃n1/2{\bf D}_{n}^{-1/2} and 𝐃n1/2{\bf D}_{n}^{1/2}, respectively, we get the re-scaled system of equations

𝐳n=𝐑n𝐬n{\bf z}_{n}^{\prime}={\bf R}_{n}^{\prime}\cdot{\bf s}_{n}^{\prime} (96)

where

𝐑n=𝐃n1/2𝐑n𝐃n1/2,𝐳n=𝐃n1/2𝐳n,𝐬n=𝐃n1/2𝐬n{\bf R}_{n}^{\prime}={\bf D}_{n}^{-1/2}\cdot{\bf R}_{n}\cdot{\bf D}_{n}^{-1/2},\quad{\bf z}_{n}^{\prime}={\bf D}_{n}^{-1/2}\cdot{\bf z}_{n},\quad{\bf s}_{n}^{\prime}={\bf D}_{n}^{1/2}\cdot{\bf s}_{n} (97)

𝐑n{\bf R}_{n}^{\prime} is the re-scaled system matrix, which can be split as

𝐑n=𝐈M+𝐋n+Ln{\bf R}^{\prime}_{n}={\bf I}_{M}+{\bf L}^{\prime}_{n}+\textbf{{\bf L}}^{\prime{\dagger}}_{n} (98)

where 𝐋n=𝐃n1/2𝐋n𝐃n1/2{\bf L}_{n}^{\prime}={\bf D}_{n}^{-1/2}\cdot{\bf L}_{n}\cdot{\bf D}_{n}^{-1/2}.

Since 𝐑n{\bf R}_{n}^{\prime} is a positive definite Hermitian matrix, any non-zero vector 𝐮{\bf u} such that 𝐮𝐮=β>0{\bf u}^{{\dagger}}\cdot{\bf u}=\beta>0 satisfies,

𝐮(𝐈M+𝐋n+𝐋n)𝐮>0\displaystyle{\bf u}^{{\dagger}}\cdot({\bf I}_{M}+{\bf L}^{\prime}_{n}+{\bf L}^{\prime{\dagger}}_{n})\cdot{\bf u}>0
\displaystyle\implies β+2[𝐮𝐋n𝐮]>0.\displaystyle\beta+2\Re{[{\bf u}^{{\dagger}}\cdot{\bf L}^{\prime}_{n}\cdot{\bf u}]}>0. (99)

The inequality in (99) can now be written as

a=[𝐮𝐋n𝐮]=[𝐮𝐋n𝐮]>β2a=\Re[{\bf u}^{{\dagger}}\cdot{\bf L}^{\prime}_{n}\cdot{\bf u}]=\Re[{\bf u}^{{\dagger}}\cdot{\bf L}^{\prime{\dagger}}_{n}\cdot{\bf u}]>-\frac{\beta}{2} (100)

where []\Re[\cdot] denotes the real part. Also note that

b=[𝐮𝐋n𝐮]=[𝐮𝐋n𝐮]b=\Im[{\bf u}^{{\dagger}}\cdot{\bf L}^{\prime}_{n}\cdot{\bf u}]=-\Im[{\bf u}^{{\dagger}}\cdot{\bf L}^{\prime{\dagger}}_{n}\cdot{\bf u}] (101)

where []\Im[\cdot] denotes the imaginary part.

Solving (86) is equivalent to solving the linear system of equations in (96) and re-scaling its solution vector as given in (97). The equivalent GS iteration matrix 𝐓nGS{\bf T}^{\rm GS}_{n} for (97) can be written as

𝐓nGS=(𝐈M+𝐋n)1𝐋n.\displaystyle{\bf T}^{\rm GS}_{n}=({\bf I}_{M}+{\bf L}^{\prime}_{n})^{-1}\cdot{\bf L}^{\prime{\dagger}}_{n}. (102)

Now, the GS method for the system equation given in (84) is guaranteed to converge if |λ(𝐓nGS)|<1|\lambda({\bf T}^{\rm GS}_{n})|<1, where λ(𝐓nGS)\lambda({\bf T}^{\rm GS}_{n}) denotes any eigenvalue of 𝐓nGS{\bf T}^{\rm GS}_{n}, which satisfy 𝐓nGS𝐯=λ(𝐓nGS)𝐯{\bf T}^{\rm GS}_{n}\cdot{\bf v}=\lambda({\bf T}^{\rm GS}_{n}){\bf v}, for the corresponding eigenvectors 𝐯{\bf v}, i.e.,

(𝐈M+𝐋n)1𝐋n𝐯=λ(𝐓nGS)𝐯.\displaystyle({\bf I}_{M}+{\bf L}^{\prime}_{n})^{-1}\cdot{\bf L}^{\prime{\dagger}}_{n}\cdot{\bf v}=\lambda({\bf T}^{\rm GS}_{n}){\bf v}. (103)

After multiplying both sides of (103) by 𝐯H(𝐈M+𝐋n){\bf v}^{H}\cdot({\bf I}_{M}+{\bf L}_{n}^{\prime}), we can write λ(𝐓nGS)\lambda({\bf T}^{\rm GS}_{n}) as

λ(𝐓nGS)\displaystyle\lambda({\bf T}^{\rm GS}_{n}) =𝐯n𝐋n𝐯nβ+𝐯n𝐋n𝐯n=|ajb||β+a+jb|\displaystyle=\frac{{\bf v}_{n}^{{\dagger}}\cdot{\bf L}^{\prime{\dagger}}_{n}\cdot{\bf v}_{n}}{\beta+{\bf v}_{n}^{{\dagger}}\cdot{\bf L}_{n}^{\prime}\cdot{\bf v}_{n}}=\frac{|a-jb|}{|\beta+a+jb|} =a2+b2(β+a)2+b2.\displaystyle=\frac{\sqrt{a^{2}+b^{2}}}{\sqrt{(\beta+a)^{2}+b^{2}}}. (104)

From (100), (101) and (104), it can be seen that |λ(𝐓nGS)|<1|\lambda({\bf T}^{\rm GS}_{n})|<1. Similarly for the case when 𝐑n{\bf R}_{n} is positive semi-definite ,i.e., (100) becomes aβ/2a\geq-\beta/2, the eigenvalue inequality becomes |λ(𝐓nGS)|1|\lambda({\bf T}^{\rm GS}_{n})|\leq 1. Since ρ(𝐓nGS)\rho({\bf T}^{\rm GS}_{n}) is equal to the largest absolute value of the eigenvalues of 𝐓nGS{\bf T}^{\rm GS}_{n}, the positive definiteness of 𝐑n{\bf R}_{n} ensures that ρ(𝐓nGS)<1\rho({\bf T}^{\rm GS}_{n})<1.

VIII-2 Proof of Lemma (2)

Following the steps above, (104) can be modified for the eigenvalues of the SOR-GS iteration matrix 𝐓nω{\bf T}_{n}^{\omega} defined in (88) as

λ(𝐓nω)=(ω1)(𝐯𝐯)+ω(𝐯𝐋n𝐯n)𝐯𝐯+ω(𝐯𝐋n𝐯).\displaystyle\lambda({\bf T}_{n}^{\omega})=\frac{(\omega-1)({\bf v}^{{\dagger}}\cdot{\bf v})+\omega({\bf v}^{{\dagger}}\cdot{\bf L}^{\prime{\dagger}}_{n}\cdot{\bf v}_{n})}{{\bf v}^{{\dagger}}\cdot{\bf v}+\omega({\bf v}^{{\dagger}}\cdot{\bf L}_{n}^{\prime}\cdot{\bf v})}. (105)

The condition for eigenvalues λ(𝐓nGS)\lambda({\bf T}^{\rm GS}_{n}) in (104) can then be modified for the SOR case as

|λ(𝐓nω)|\displaystyle|\lambda({\bf T}_{n}^{\omega})| =((ω1)β+ωa)2+(ωb)2(β+ωa)2+(ωb)2.\displaystyle=\frac{\sqrt{((\omega-1)\beta+\omega a)^{2}+(\omega b)^{2}}}{\sqrt{(\beta+\omega a)^{2}+(\omega b)^{2}}}. (106)

It can be seen from (106) that |λ(𝐓nω)|<1|\lambda({\bf T}_{n}^{\omega})|<1, if |(ω1)β+ωa|<|β+ωa||(\omega-1)\beta+\omega a|<|\beta+\omega a|, which is guaranteed if 0<ω<20<\omega<2.

References

  • [1] R. Hadani, S. Rakib, M. Tsatsanis, A. Monk, A. J. Goldsmith, A. F. Molisch, and R. Calderbank, “Orthogonal time frequency space modulation,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), San Francisco, CA, USA, Mar. 2017.
  • [2] P. Raviteja, Y. Hong, E. Viterbo, and E. Biglieri, “Effective Diversity of OTFS Modulation,” IEEE Wireless Commun. Lett., vol. 9, no. 2, pp. 249-253, Feb. 2020.
  • [3] T. Thaj and E. Viterbo, “Low Complexity Iterative Rake Detector for Orthogonal Time Frequency Space Modulation,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Seoul, Korea (South), May 2020.
  • [4] D. G. Brennan, “Linear diversity combining techniques,” in Proc. IRE, vol.47, pp. 1075-1102, June 1959.
  • [5] S. Kondo and B. Milstein, “Performance of multicarrier DS CDMA systems,” IEEE Trans. Commun., vol. 44, no. 2, pp. 238-246, Feb. 1996.
  • [6] S. Imada and T. Ohtsuki, “Pre-RAKE diversity combining for UWB systems in IEEE 802.15 UWB multipath channel,” Int. Workshop on UWBST and IWUWBS, Kyoto, Japan, 2004, pp. 236-240.
  • [7] Xiaofei Dong and N. C. Beaulieu, “Optimal maximal ratio combining with correlated diversity branches,” IEEE Commun. Lett., vol. 6, no. 1, pp. 22-24, Jan. 2002.
  • [8] R. M. Buehrer, S. P. Nicoloso, and S. Gollamudi, “Linear versus non-linear interference cancellation,” J. Commun. Netw., vol. 1, no. 2, pp. 118-133, Jun. 1999.
  • [9] P. Raviteja, Y. Hong, E. Viterbo, and E. Biglieri, “Practical pulse-shaping waveforms for reduced-cyclic-prefix OTFS,” IEEE Trans. Veh. Technol., Oct. 2018.
  • [10] P. Raviteja, K. T. Phan, and Y. Hong, “Embedded Pilot-Aided Channel Estimation for OTFS in Delay–Doppler Channels,” IEEE Trans. Veh. Technol., vol. 68, no. 5, pp. 4906-4917, May 2019.
  • [11] T. Ebihara and G. Leus, “Doppler-Resilient Orthogonal Signal-Division Multiplexing for Underwater Acoustic Communication,” IEEE J. Ocean. Eng., vol. 41, no. 2, pp. 408-427, April 2016.
  • [12] M. Stojanovic and J. Preisig, “Underwater acoustic communication channels: Propagation models and statistical characterization,” IEEE Commun. Mag., vol. 47, no. 1, pp. 84-89, Jan. 2009.
  • [13] “LTE Evolved Universal Terrestrial Radio Access (E-UTRA); Base Station (BS) radio transmission and reception,” 3GPP TS 36.104 version 8.6.0 Release 8, Jul. 2009, ETSI TS. doi: 10.1109/MCOM.2009.4752682
  • [14] D. Tse and P. Viswanath, Fundamentals of Wireless Communication, 3rd ed. Cambridge University Press, ISBN: 9780521845274, 2005.
  • [15] A. Farhang, A. RezazadehReyhani, L. E. Doyle and B. Farhang-Boroujeny, “Low complexity modem structure for OFDM-based orthogonal time frequency space modulation,” in IEEE Wireless Commun. Lett., vol. 7, no. 3, pp. 344-347, Jun. 2018.
  • [16] R. Hadani, S. Rakib, M. Tsatsanis, A. Monk, A. J. Goldsmith, A. F. Molisch, and R. Calderbank, “Orthogonal time frequency space modulation,” Aug. 2018. doi: arXiv:1808.00519
  • [17] C. F. V. Loan, “The ubiquitous Kronecker product,” J. Comput. Appl. Math., vol. 123, issues 1–2, pp. 85-100, ISSN 0377-0427, 2000.
  • [18] R. A. Beezer, A First Course in Linear Algebra, 3rd ed. Washington, DC, USA, Congruent Press, pp. 404-406, ISBN: 978-0-9844175-5-1, 1973.
  • [19] A. Björck, Numerical Methods for Least Squares Problems, SIAM, 1996. doi: 10.1137/1.9781611971484
  • [20] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed. SIAM, 2003. doi: 10.1137/1.9780898718003
  • [21] A. Hadjidimos, “Successive overrelaxation (SOR) and related methods,” J. Comput. Appl. Math., vol. 123, Issues 1–2, pp. 177-199, ISSN 0377-0427, Nov. 2000.
  • [22] P. Raviteja, K.T. Phan, Yi Hong, and E. Viterbo, “Low-complexity iterative detection for orthogonal time frequency space modulation,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Barcelona, Apr. 2018.
  • [23] P. Raviteja, K.T. Phan, Yi Hong, and E. Viterbo, “Interference cancellation and iterative detection for orthogonal time frequency space modulation,” IEEE Trans. Wireless Commun., vol. 17, no. 10, pp. 6501-6515, Oct. 2018.
  • [24] S. Tiwari, S.S. Das and V. Rangamgari, “Low-complexity LMMSE receiver for OTFS,” IEEE Commun. Lett., Oct. 2019.
  • [25] T.T.B. Nguygen, T.N. Tan and H. Lee, “Effecient QC-LDPC Encoder for 5G New Radio,” Electronics, vol. 8, no. 6, p. 668, Jun. 2019.