
Channel Mapping Based on Interleaved Learning with Complex-Domain MLP-Mixer

Zirui Chen, Zhaoyang Zhang, Zhaohui Yang, and Lei Liu

This work was supported in part by the National Natural Science Foundation of China under Grants 62394292 and U20A20158, the Ministry of Industry and Information Technology under Grant TC220H07E, the Zhejiang Provincial Key R&D Program under Grant 2023C01021, and the Fundamental Research Funds for the Central Universities. (Corresponding Author: Zhaoyang Zhang)

Z. Chen, Z. Zhang, Z. Yang, and L. Liu are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China, and also with the Zhejiang Provincial Key Laboratory of Info. Proc., Commun. & Netw. (IPCAN), Hangzhou 310007, China. (E-mails: {ziruichen, ning_ming, yang_zhaohui, lei_liu}@zju.edu.cn)
Abstract

In multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems, representing the whole channel based only on partial subchannels can significantly reduce the channel acquisition overhead. For such a channel mapping task, inspired by the intrinsic coupling across the space and frequency domains, this letter proposes to use interleaved learning with partial antenna and subcarrier characteristics to represent the whole MIMO-OFDM channel. Specifically, we design a complex-domain multilayer perceptron (MLP)-Mixer (CMixer), which utilizes two kinds of complex-domain MLP modules to learn the space and frequency characteristics respectively and then interleaves them to couple the learned properties. The complex-domain computation facilitates learning on the complex-valued channel data, while the interleaving tightens the coupling of the space and frequency domains. These two designs jointly reduce the learning burden, making the physics-inspired CMixer more effective at channel representation learning than existing data-driven approaches. Simulations show that the proposed scheme brings 4.6~10 dB gains in mapping accuracy compared to existing schemes under various settings. Besides, ablation studies show the necessity of the complex-domain computation as well as the extent to which the interleaved learning matches the channel properties.

Index Terms:
Channel mapping, deep learning, physics-inspired learning, MIMO, OFDM.

I Introduction

Multiple-input multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) modulation are two key enabling technologies in wireless communication systems, thanks to their ability to achieve high spectrum efficiency. Meanwhile, unleashing this potential generally requires acquiring real-time channel state information (CSI). However, MIMO and OFDM techniques substantially increase the size of the CSI data, resulting in significant signaling overhead [1].

To address this challenge, the work in [2] proposes channel mapping over space and frequency, i.e., leveraging the implicit correlations within the high-dimensional channel to obtain the whole MIMO-OFDM channel from the known subchannels at a subset of antennas and subcarriers. In this way, the signaling overhead required for channel acquisition can be reduced from the entire high-dimensional channel to a few specific subchannels. Specifically, the authors in [2] leverage the capabilities of deep learning (DL) techniques in mining implicit features and representing high-dimensional data, using a multi-layer perceptron (MLP) to fit the channel mapping function from training data. Moreover, some channel estimation works use similar ideas to estimate the whole channel by mapping from only a small number of subchannels. In [3], the authors use DL techniques to map the information of partial subcarriers to all subcarriers for channel estimation in OFDM systems. Further, the work in [4] improves the network structure in [3] with a residual convolutional neural network (Res_CNN) to enhance the performance of channel mapping and estimation.

Despite these preliminary attempts, existing channel mapping networks are still directly migrated from popular DL schemes and lack task-related design, which makes the learning mainly data-driven. However, data-driven learning performs poorly with limited training data, and the learned models often generalize weakly. Therefore, it is necessary to design the network a priori according to the properties of the data and the task to improve the learning efficiency, analogous to the design of CNNs based on the translation equivariance and smoothness of images. This letter delves into the physical properties of MIMO-OFDM channels and accordingly proposes a physics-inspired learning scheme for channel mapping. The main contributions of this letter are summarized as follows:

  • By revealing the intrinsic coupling between the space and frequency characteristics of a MIMO-OFDM channel, we propose an interleaved learning framework to exploit this cross-domain internal correlation.

  • To facilitate the learning process, we propose a novel complex-domain MLP-Mixer (CMixer) layer module for channel representation, and then construct a CMixer model for channel mapping by stacking multiple CMixer layers.

  • Through comparison experiments with existing methods, we demonstrate the superiority of the proposed scheme. Further ablation studies show the necessity and value of physics-inspired design for improving mapping performance.

II System Model

In this section, we introduce the system model, including the channel model and the channel mapping framework.

II-A Channel Model

We consider a massive MIMO system, where a base station (BS) with $N_{\mathrm{t}} \gg 1$ antennas serves multiple single-antenna users. The considered system adopts OFDM modulation with $N_{\mathrm{c}}$ subcarriers. The channel between one user and the BS is assumed to be composed of $P$ paths, which can be expressed as,

$\mathbf{h}(f)=\sum_{p=1}^{P}\alpha_{p}e^{-j2\pi f\tau_{p}}\,\mathbf{a}(\vec{p}),$ (1)

where $f$ is the carrier frequency, $\alpha_{p}$ is the amplitude attenuation, $\tau_{p}$ is the propagation delay, and $\vec{p}$ is the three-dimensional unit vector of the departure direction of the $p$-th path. Furthermore, $\mathbf{a}(\vec{p})$ is the array response vector defined as,

$\mathbf{a}(\vec{p})=\left[1,\,e^{-j2\pi f\vec{d}_{1}\cdot\vec{p}/c},\,\ldots,\,e^{-j2\pi f\vec{d}_{N_{\mathrm{t}}-1}\cdot\vec{p}/c}\right]^{T},$ (2)

where $[\vec{0},\vec{d}_{1},\ldots,\vec{d}_{N_{\mathrm{t}}-1}]$ is the array of antenna position vectors, $\vec{d}_{i}\ (i=1,2,\ldots,N_{\mathrm{t}}-1)$ represents the three-dimensional space vector between the $i$-th antenna and the first antenna, and $c$ is the speed of light. The transmission distance shift between the $i$-th antenna and the first antenna on the $p$-th path can be written as $\vec{d}_{i}\cdot\vec{p}$. If the antenna array is a uniform linear array, $\vec{d}_{i}\cdot\vec{p}$ simplifies to $id\cos\theta_{p}$, where $\theta_{p}$ is the angle of departure (AoD) of the $p$-th path and $d$ is the antenna spacing. This simplified case is equivalent to the channel model in [5]. Further, the whole CSI matrix $\mathbf{H}\in\mathbb{C}^{N_{\mathrm{t}}\times N_{\mathrm{c}}}$ between the user and the BS can be expressed as,

$\mathbf{H}=[\mathbf{h}(f_{1}),\mathbf{h}(f_{2}),\cdots,\mathbf{h}(f_{N_{\mathrm{c}}})],$ (3)

where $f_{i}=f_{0}+\Delta f_{i}\ (i=1,2,\cdots,N_{\mathrm{c}})$ is the $i$-th subcarrier frequency, $f_{0}$ is the base frequency, and $\Delta f_{i}$ is the frequency shift between the $i$-th subcarrier and the base frequency.
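To make the model concrete, the following minimal Python/NumPy sketch instantiates Eqs. (1)-(3) for the simplified ULA case. All path parameters here are drawn at random purely for illustration; a ray-tracing dataset such as DeepMIMO [7] would supply physically consistent values instead.

```python
import numpy as np

def toy_channel(n_t=32, n_c=32, n_paths=8, f0=3.5e9, bw=40e6, seed=0):
    """Toy MIMO-OFDM channel per Eqs. (1)-(3) for a half-wavelength ULA."""
    rng = np.random.default_rng(seed)
    c = 3e8
    d = c / f0 / 2                                  # antenna spacing (assumed)
    alpha = (rng.standard_normal(n_paths)
             + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    tau = rng.uniform(0.0, 1e-6, n_paths)           # path delays (assumed range)
    theta = rng.uniform(0.0, np.pi, n_paths)        # AoDs

    f = f0 + np.arange(n_c) * bw / n_c              # f_i = f_0 + Δf_i
    i = np.arange(n_t)                              # antenna indices

    H = np.zeros((n_t, n_c), dtype=complex)
    for p in range(n_paths):
        delay = np.exp(-1j * 2 * np.pi * f * tau[p])            # e^{-j2πfτ_p}
        # ULA steering phase: d_i · p reduces to i*d*cos(θ_p), per Eq. (2)
        steer = np.exp(-1j * 2 * np.pi
                       * np.outer(i * d * np.cos(theta[p]), f) / c)
        H += alpha[p] * steer * delay[None, :]
    return H
```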

II-B Channel Mapping

As shown in the channel model, there are significant correlations within the CSI matrix, stemming from the path-similarity of sub-channels. Channel mapping [2] aims to realize the following task by leveraging this correlation: taking the sub-channels of some antennas and subcarriers as input and outputting the whole channel data, which can be written as,

$\mathrm{g}:\mathbf{H}[\Omega]\to\mathbf{H},$ (4)

where $\Omega$ is the selected small subset of antennas and subcarriers in the space and frequency domains, and $\mathbf{H}[\Omega]$ is also written as $\mathbf{H}^{0}$.

The channel mapping task has many potential applications. For example, in high-dimensional channel estimation, one can estimate the CSI of only several antennas and subcarriers and then obtain the whole channel by channel mapping, decreasing the pilot overhead. Furthermore, in high-dimensional CSI feedback, one can upload only partial downlink CSI, and the BS can still reconstruct the whole downlink CSI based on channel mapping, forming a user-friendly CSI feedback scheme without an additional encoder. In Eq. (4), a typical selection of the subset $\Omega$ is the whole channel over certain antennas and subcarriers, i.e.,

$\Omega=\mathrm{A}\otimes\mathrm{B},$ (6)

where $\mathrm{A}$ is the subset of selected antennas, $\mathrm{B}$ is the subset of selected subcarriers, and $\otimes$ denotes that $\forall a\in\mathrm{A},\ b\in\mathrm{B},\ (a,b)\in\Omega$. Besides, $\mathrm{A}$ and $\mathrm{B}$ are typically uniformly spaced and span the space and frequency domains as widely as possible.
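As an illustration, the uniform selection described above (and later used in Table I) can be sketched as follows; the step sizes are assumptions consistent with that table.

```python
import numpy as np

def known_subchannel(H, n_t0=5, n_c0=5):
    """Extract H[Ω] = H[A ⊗ B] with uniformly spaced antennas/subcarriers."""
    n_t, n_c = H.shape
    step_t, step_c = n_t // n_t0, n_c // n_c0
    A = np.arange(n_t0) * step_t          # selected antenna indices
    B = np.arange(n_c0) * step_c          # selected subcarrier indices
    return H[np.ix_(A, B)]                # H^0 of size n_t0 × n_c0
```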

Figure 1: The overall framework of channel mapping.

Since CSI is a complex coupling of multipath channel responses, the internal correlation in the CSI matrix is often implicit and thus difficult to explore through traditional signal processing methods. Therefore, we use DL technology, with its excellent implicit feature learning capabilities, to realize efficient channel mapping. The overall DL-based channel mapping framework is shown in Fig. 1. This learning process mainly relies on the representation learning of channel data, i.e., extracting implicit features from the known channel information and then representing the whole channel based on the obtained features. For the mapping neural network, if some a priori design can be introduced according to the physical properties of the MIMO-OFDM channel, it will undoubtedly improve the representation learning and guide the network to learn more efficiently from limited training data.

III Proposed Schemes and Related Analysis

This section details our proposed learning scheme and related analysis, including the revealed channel characteristic, the learning module inspired by this characteristic, and the proposed channel mapping scheme utilizing this module.

Figure 2: The graphic illustration of the proposed CMixer model. (a) Overall idea of interleaved learning. (b) Stacking interleaved learning modules to form the CMixer model. (c) Interleaved learning module for channel representation. (d) Complex-domain MLP for space and frequency mixing.

III-A Interlaced Space and Frequency Characteristics

This subsection shows that there is a unique interlaced space and frequency characteristic in the MIMO-OFDM channel. Specifically, according to the channel model in Section II-A, we can define the following function,

$q_{\mathbf{H}}(\vec{d},\Delta f)=\sum_{p=1}^{P}\alpha_{p}e^{-j2\pi f_{0}\tau_{p}}e^{-j\left(2\pi f_{0}\vec{d}\cdot\vec{p}/c+2\pi\Delta f\tau_{p}+2\pi\Delta f\vec{d}\cdot\vec{p}/c\right)},$ (7)

where the parameters of $q_{\mathbf{H}}(\cdot)$, namely $\alpha_{p},f_{0},\tau_{p},\vec{p}$, are the large-scale features shared among all sub-channels. Therefore, $q_{\mathbf{H}}(\cdot)$ is independent of the specific antennas and subcarriers. Leveraging $q_{\mathbf{H}}(\cdot)$, the CSI of the $m$-th antenna and $n$-th subcarrier can be written as $q_{\mathbf{H}}(\vec{d}_{m-1},\Delta f_{n})$, and the CSI matrix in Eq. (3) can be rewritten as,

$\mathbf{H}=\begin{bmatrix} q_{\mathbf{H}}(\vec{0},\Delta f_{1}) & q_{\mathbf{H}}(\vec{0},\Delta f_{2}) & \cdots & q_{\mathbf{H}}(\vec{0},\Delta f_{N_{\mathrm{c}}}) \\ q_{\mathbf{H}}(\vec{d}_{1},\Delta f_{1}) & q_{\mathbf{H}}(\vec{d}_{1},\Delta f_{2}) & \cdots & q_{\mathbf{H}}(\vec{d}_{1},\Delta f_{N_{\mathrm{c}}}) \\ \vdots & \vdots & \ddots & \vdots \\ q_{\mathbf{H}}(\vec{d}_{N_{\mathrm{t}}-1},\Delta f_{1}) & q_{\mathbf{H}}(\vec{d}_{N_{\mathrm{t}}-1},\Delta f_{2}) & \cdots & q_{\mathbf{H}}(\vec{d}_{N_{\mathrm{t}}-1},\Delta f_{N_{\mathrm{c}}}) \end{bmatrix}.$ (12)

For any two columns $\mathbf{H}[:,m],\mathbf{H}[:,n]$ ($\forall m,n\in\{0,\ldots,N_{\mathrm{c}}-1\}$), i.e., all antennas' CSI at two different subcarriers, although $\mathbf{H}[:,m]$ and $\mathbf{H}[:,n]$ have different frequency shifts $\Delta f_{m},\Delta f_{n}$, they still share the same space features $[\vec{0},\vec{d}_{1},\ldots,\vec{d}_{N_{\mathrm{t}}-1}]$ and the same large-scale features $q_{\mathbf{H}}(\cdot)$. Similarly, for any two rows $\mathbf{H}[m,:],\mathbf{H}[n,:]$ ($\forall m,n\in\{0,\ldots,N_{\mathrm{t}}-1\}$), i.e., all subcarriers' CSI at two different antennas, although there exist different space shifts $\vec{d}_{m},\vec{d}_{n}$, they still share the same frequency features $[\Delta f_{1},\Delta f_{2},\ldots,\Delta f_{N_{\mathrm{c}}}]$ and the same large-scale features $q_{\mathbf{H}}(\cdot)$. To summarize, the shared large-scale channel features $q_{\mathbf{H}}(\cdot)$, the shared space features $[\vec{0},\vec{d}_{1},\ldots,\vec{d}_{N_{\mathrm{t}}-1}]$, and the shared frequency features $[\Delta f_{1},\Delta f_{2},\ldots,\Delta f_{N_{\mathrm{c}}}]$ couple in an interlaced manner, forming the internal correlation of the whole CSI matrix.
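This derivation can be mirrored directly in code. In the hypothetical helper below, the shared large-scale features are closed over, so the returned function depends only on a space feature $\vec{d}$ and a frequency feature $\Delta f$, exactly as Eq. (7) requires; evaluating it on the grid of shared space and frequency features reproduces Eq. (12).

```python
import numpy as np

def make_qH(alpha, tau, dirs, f0, c=3e8):
    """Build q_H(d, Δf) of Eq. (7); alpha, tau: (P,), dirs: (P, 3) unit vectors."""
    def qH(d_vec, delta_f):
        dp = dirs @ d_vec                 # d · p for every path, shape (P,)
        phase = (2 * np.pi * f0 * dp / c
                 + 2 * np.pi * delta_f * tau
                 + 2 * np.pi * delta_f * dp / c)
        return np.sum(alpha * np.exp(-1j * 2 * np.pi * f0 * tau)
                      * np.exp(-1j * phase))
    return qH

# Eq. (12): H[m, n] = qH(d_vecs[m], delta_f[n]), i.e., q_H evaluated on the
# grid of shared space features d_i and shared frequency features Δf_n.
```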

III-B Interleaved Learning Module for Channel Representation

According to the revealed property, we propose an interleaved space and frequency learning approach for channel representation, as shown in Fig. 2(a). This approach first learns along the space dimension, with the learning process shared among all subcarriers, named space mixing. This sharing of the space learning process utilizes the sharing of space characteristics among all subcarriers. Then, the proposed approach learns along the frequency dimension, with the learning process shared among all antennas, called frequency mixing. Similarly, this sharing of the frequency learning process utilizes the sharing of frequency characteristics among all antennas. The cascading of space mixing and frequency mixing interlaces the learned space and frequency characteristics, closely matching the revealed channel characteristics. In the neural network implementation, similar to [6], we also introduce residual connections and layer normalization to keep the training effective and stable. Fig. 2(c) presents the specific learning module design, and the calculation process is as follows,

$\mathbf{V}[i,:,:]=\mathbf{I}[i,:,:]+\mathrm{SM}(\mathrm{LN}(\mathbf{I}[i,:,:])),\quad \forall i\in[0,1,\ldots,N_{\mathrm{c}}'-1],$ (13a)

$\mathbf{O}[:,j,:]=\mathbf{V}[:,j,:]+\mathrm{FM}(\mathrm{LN}(\mathbf{V}[:,j,:])),\quad \forall j\in[0,1,\ldots,N_{\mathrm{t}}'-1],$ (13b)

where $\mathbf{I}$ is the input, $\mathbf{O}$ is the output, and $\mathbf{V}$ is the intermediate variable matrix; $\mathbf{I},\mathbf{O},\mathbf{V}$ are all of size $N_{\mathrm{c}}'\times N_{\mathrm{t}}'\times 2$; and $\mathrm{SM}$, $\mathrm{FM}$, and $\mathrm{LN}$ abbreviate space mixing, frequency mixing, and layer normalization, respectively. The computation in Eq. (13) completes a nonlinear representation of the input $\mathbf{I}$ through interleaved learning and yields an output $\mathbf{O}$ of the same size as $\mathbf{I}$.
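A minimal PyTorch sketch of this module is given below; the tensor layout and the placement of layer normalization are assumptions consistent with Eq. (13) and Fig. 2(c), and the mixing modules are passed in as arguments (e.g., the CMLP sketched in the next subsection).

```python
import torch
import torch.nn as nn

class CMixerLayer(nn.Module):
    """One interleaved-learning layer per Eq. (13): space mixing shared by
    all subcarriers, then frequency mixing shared by all antennas."""
    def __init__(self, n_t, n_c, space_mix, freq_mix):
        super().__init__()
        self.ln1 = nn.LayerNorm([n_t, 2])   # normalizes each subcarrier slice
        self.ln2 = nn.LayerNorm([n_c, 2])   # normalizes each antenna slice
        self.space_mix, self.freq_mix = space_mix, freq_mix

    def forward(self, x):                        # x: (B, Nc', Nt', 2)
        v = x + self.space_mix(self.ln1(x))      # Eq. (13a), per subcarrier
        v = v.transpose(1, 2)                    # -> (B, Nt', Nc', 2)
        o = v + self.freq_mix(self.ln2(v))       # Eq. (13b), per antenna
        return o.transpose(1, 2)                 # back to (B, Nc', Nt', 2)
```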

Considering that the channel characteristics are embodied in complex-valued data while current mainstream DL platforms are based on real-valued operations, we design a complex-domain MLP (CMLP) module to realize space mixing and frequency mixing. Fig. 2(d) shows the flowchart of the CMLP, and the simplified calculation process can be written as follows,

$\mathcal{C}^{X}/\mathcal{R}^{X\times 2}\ \xrightarrow{\text{reshape}}\ \mathcal{R}^{2X}\ \xrightarrow{\text{MLP}}\ \mathcal{R}^{2X}\ \xrightarrow{\text{reshape}}\ \mathcal{R}^{X\times 2}/\mathcal{C}^{X},$ (14)

where $\mathcal{C}$ represents complex-valued data, $\mathcal{R}$ represents real-valued data, and $X$ refers to $N_{\mathrm{t}}'$ or $N_{\mathrm{c}}'$. The hidden width $S$ in Fig. 2(d) is a constant multiple of $X$. This method places the real and imaginary parts in the same neural layer, which ensures a fuller interaction of real and imaginary information than treating them as separate channels. Moreover, it recovers the complex dimension at the end so that the next mixing can still be performed in the complex domain; thus, the learning of the channel data always remains in the complex domain, even when stacking multiple mixing layers. In summary, this subsection establishes an interleaved learning module, the CMixer layer, consisting of a cascade of a space-learning CMLP and a frequency-learning CMLP. It is suitable for channel representation learning since it is designed according to the channel's physical properties.
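A possible realization of Eq. (14) is sketched below. The GELU nonlinearity is an assumption borrowed from the MLP-Mixer convention [6]; the letter itself does not fix the activation.

```python
class CMLP(nn.Module):
    """Complex-domain MLP of Eq. (14): flatten the trailing (X, 2) real/imag
    pair into one 2X-wide vector, apply a 2X -> S -> 2X MLP so that real and
    imaginary parts interact inside the same layers, then restore the layout."""
    def __init__(self, x_dim, s_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * x_dim, s_dim),   # 2X -> S
            nn.GELU(),
            nn.Linear(s_dim, 2 * x_dim),   # S -> 2X
        )

    def forward(self, x):                  # x: (..., X, 2)
        shape = x.shape
        y = self.net(x.reshape(*shape[:-2], 2 * shape[-2]))
        return y.reshape(shape)
```

A CMixer layer is then assembled as, e.g., `CMixerLayer(n_t=32, n_c=32, space_mix=CMLP(32, 128), freq_mix=CMLP(32, 128))`, matching the $S_{\mathrm{t}}=S_{\mathrm{c}}=128$ setting of Table I.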

III-C CMixer Model for Channel Mapping

By stacking the above representation modules to enhance the nonlinearity and adding the necessary dimension-mapping modules, the CMixer model is built for the channel mapping task. The network structure is shown in Fig. 2(b), and the calculation process is as follows. First, the $N_{\mathrm{c}}^{0}\times N_{\mathrm{t}}^{0}\times 2$-size known channel is input to a $2N_{\mathrm{t}}^{0}\to 2N_{\mathrm{t}}'$ MLP layer and a $2N_{\mathrm{c}}^{0}\to 2N_{\mathrm{c}}'$ MLP layer, which output an $N_{\mathrm{c}}'\times N_{\mathrm{t}}'\times 2$-size variable. This part yields $\mathcal{O}(N_{\mathrm{t}}^{0}N_{\mathrm{t}}'+N_{\mathrm{c}}^{0}N_{\mathrm{c}}')$ parameters and $\mathcal{O}(N_{\mathrm{t}}^{0}N_{\mathrm{t}}'N_{\mathrm{c}}^{0}+N_{\mathrm{c}}^{0}N_{\mathrm{c}}'N_{\mathrm{t}}')$ computational complexity. Then, the $N_{\mathrm{c}}'\times N_{\mathrm{t}}'\times 2$-size variable passes through $K$ stacked CMixer layers, whose output is still of size $N_{\mathrm{c}}'\times N_{\mathrm{t}}'\times 2$. This part yields $\mathcal{O}(KN_{\mathrm{t}}'S_{\mathrm{t}}+KN_{\mathrm{c}}'S_{\mathrm{c}})$ parameters and $\mathcal{O}(KN_{\mathrm{t}}'S_{\mathrm{t}}N_{\mathrm{c}}'+KN_{\mathrm{c}}'S_{\mathrm{c}}N_{\mathrm{t}}')$ computational complexity. Finally, a $2N_{\mathrm{t}}'\to 2N_{\mathrm{t}}$ MLP layer and a $2N_{\mathrm{c}}'\to 2N_{\mathrm{c}}$ MLP layer map the previous output to the $N_{\mathrm{c}}\times N_{\mathrm{t}}\times 2$-size variable, which is the whole MIMO-OFDM CSI. This part yields $\mathcal{O}(N_{\mathrm{t}}'N_{\mathrm{t}}+N_{\mathrm{c}}'N_{\mathrm{c}})$ parameters and $\mathcal{O}(N_{\mathrm{t}}'N_{\mathrm{t}}N_{\mathrm{c}}'+N_{\mathrm{c}}'N_{\mathrm{c}}N_{\mathrm{t}})$ computational complexity. Since $N_{\mathrm{t}}^{0}<N_{\mathrm{t}}$ and $N_{\mathrm{c}}^{0}<N_{\mathrm{c}}$, while $N_{\mathrm{t}}'$ and $S_{\mathrm{t}}$ are constant multiples of $N_{\mathrm{t}}$ and $N_{\mathrm{c}}'$ and $S_{\mathrm{c}}$ are constant multiples of $N_{\mathrm{c}}$, the total parameter scale of this model is $\mathcal{O}(KN_{\mathrm{t}}^{2}+KN_{\mathrm{c}}^{2})$ and the total computational complexity is $\mathcal{O}(KN_{\mathrm{t}}^{2}N_{\mathrm{c}}+KN_{\mathrm{c}}^{2}N_{\mathrm{t}})$. To train the model, the mean square error (MSE) is used as the loss function, which can be written as

$\mathrm{Loss}(\Theta)=\frac{1}{N}\sum_{num=1}^{N}\left\|\mathbf{H}_{num}-\widehat{\mathbf{H}}_{num}\right\|_{2}^{2},$ (15)

where $\Theta$ is the parameter set of the proposed model, $N$ is the number of channel samples in the training set, and $\|\cdot\|_{2}$ is the Euclidean norm. In addition, $\Theta$ can be updated through existing gradient descent-based optimizers, such as the adaptive moment estimation (Adam) optimizer.
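Putting the pieces together, the sketch below stacks the modules from the previous sketches into the full model of Fig. 2(b) and runs one training step with the MSE loss of Eq. (15) and Adam. Using single linear layers for the dimension mappings is an assumption; the letter only calls them MLP layers. Random tensors stand in for real training data.

```python
class CMixer(nn.Module):
    """End-to-end CMixer sketch (Fig. 2(b)), reusing CMixerLayer and CMLP."""
    def __init__(self, n_t0, n_c0, n_t, n_c, n_tp, n_cp, s_t, s_c, k):
        super().__init__()
        self.lift_t = nn.Linear(2 * n_t0, 2 * n_tp)   # 2Nt0 -> 2Nt'
        self.lift_c = nn.Linear(2 * n_c0, 2 * n_cp)   # 2Nc0 -> 2Nc'
        self.layers = nn.Sequential(*[
            CMixerLayer(n_tp, n_cp, CMLP(n_tp, s_t), CMLP(n_cp, s_c))
            for _ in range(k)])
        self.head_t = nn.Linear(2 * n_tp, 2 * n_t)    # 2Nt' -> 2Nt
        self.head_c = nn.Linear(2 * n_cp, 2 * n_c)    # 2Nc' -> 2Nc

    @staticmethod
    def _map(x, lin):
        # apply `lin` over the flattened trailing (dim, 2) pair
        b, a = x.shape[0], x.shape[1]
        return lin(x.reshape(b, a, -1)).reshape(b, a, -1, 2)

    def forward(self, x):                            # x: (B, Nc0, Nt0, 2)
        x = self._map(x, self.lift_t)                # (B, Nc0, Nt', 2)
        x = self._map(x.transpose(1, 2), self.lift_c).transpose(1, 2)
        x = self.layers(x)                           # (B, Nc', Nt', 2)
        x = self._map(x, self.head_t)                # (B, Nc', Nt, 2)
        return self._map(x.transpose(1, 2), self.head_c).transpose(1, 2)

model = CMixer(n_t0=5, n_c0=5, n_t=32, n_c=32,
               n_tp=32, n_cp=32, s_t=128, s_c=128, k=5)   # Table I setting
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

h0 = torch.randn(250, 5, 5, 2)     # known subchannels H^0 (placeholder data)
h = torch.randn(250, 32, 32, 2)    # full channels H (placeholder data)
optimizer.zero_grad()
loss = torch.mean((model(h0) - h) ** 2)   # MSE loss of Eq. (15)
loss.backward()
optimizer.step()
```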

IV Simulation Experiments

In this section, we evaluate the performance and properties of the proposed CMixer through simulation experiments.

IV-A Experiment Settings

In this work, we use the open-source DeepMIMO dataset of the 'O1' scenario at 3.5 GHz [7] for the simulation experiments. The bandwidth is set to 40 MHz and divided into 32 subcarriers, and the BS is equipped with a uniform linear array (ULA) of 32 antennas. The user area is chosen as R701-R1400. 50000 channel samples are randomly sampled from the dataset and divided into a training set and a testing set with a ratio of 4:1. In addition, for intuitive performance evaluation, we use two existing typical DL-based channel mapping schemes as benchmarks: the MLP method [2], which uses a pure MLP network to learn the mapping function, and the Res_CNN method [4], which uses a CNN with residual connections as the network structure. The model and training settings are summarized in Table I.

Table I: Parameter settings of the CMixer model.
Parameter | Value
Batch size | 250
Learning rate | $1\times 10^{-3}$ (multiplied by 0.2 every 500 epochs after the 500th epoch)
Model setting | $K=5$, $N_{\mathrm{c}}=32$, $N_{\mathrm{t}}=32$, $N_{\mathrm{c}}'=32$, $N_{\mathrm{t}}'=32$, $S_{\mathrm{c}}=128$, $S_{\mathrm{t}}=128$
Subset $\Omega=\mathrm{A}\otimes\mathrm{B}$ | $\mathrm{A}=\{0,step_{\mathrm{t}},\ldots,(N_{\mathrm{t}}^{0}-1)\times step_{\mathrm{t}}\}$, $\mathrm{B}=\{0,step_{\mathrm{c}},\ldots,(N_{\mathrm{c}}^{0}-1)\times step_{\mathrm{c}}\}$, where $step_{\mathrm{t}}=\lfloor N_{\mathrm{t}}/N_{\mathrm{t}}^{0}\rfloor$, $step_{\mathrm{c}}=\lfloor N_{\mathrm{c}}/N_{\mathrm{c}}^{0}\rfloor$
Number of model parameters | 0.176 million (for $N_{\mathrm{c}}^{0}=5$, $N_{\mathrm{t}}^{0}=5$)
Floating-point operations (FLOPs) | 5.61 million (for $N_{\mathrm{c}}^{0}=5$, $N_{\mathrm{t}}^{0}=5$)
Training epochs | 2000
Optimizer | Adam

Moreover, we use the normalized MSE (NMSE) and the cosine correlation $\rho$ [8] as performance indices, which are defined as follows:

$\mathrm{NMSE}=\mathbb{E}\left\{\frac{\|\mathbf{H}-\widehat{\mathbf{H}}\|_{2}^{2}}{\|\mathbf{H}\|_{2}^{2}}\right\},$ (16)

and

$\rho=\mathbb{E}\left\{\frac{1}{N_{\mathrm{c}}}\sum_{m=1}^{N_{\mathrm{c}}}\frac{|\widehat{\mathbf{h}}_{m}^{H}\mathbf{h}_{m}|}{\|\widehat{\mathbf{h}}_{m}\|_{2}\,\|\mathbf{h}_{m}\|_{2}}\right\},$ (17)

where $\mathbf{h}_{m}/\widehat{\mathbf{h}}_{m}$ is the CSI of the $m$-th subcarrier, i.e., the $m$-th column of the CSI matrix $\mathbf{H}/\widehat{\mathbf{H}}$.
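For reference, both indices can be computed as in the sketch below, where the arrays are assumed to hold complex CSI matrices of shape (samples, antennas, subcarriers).

```python
import numpy as np

def nmse_db(H, H_hat):
    """NMSE of Eq. (16), averaged over samples and reported in dB."""
    err = np.sum(np.abs(H - H_hat) ** 2, axis=(1, 2))
    ref = np.sum(np.abs(H) ** 2, axis=(1, 2))
    return 10 * np.log10(np.mean(err / ref))

def cosine_corr(H, H_hat):
    """Cosine correlation ρ of Eq. (17), averaged over subcarriers/samples."""
    num = np.abs(np.sum(np.conj(H_hat) * H, axis=1))        # |ĥ_m^H h_m|
    den = np.linalg.norm(H_hat, axis=1) * np.linalg.norm(H, axis=1)
    return np.mean(num / den)
```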

IV-B Performance Evaluation

We first evaluate the mapping performance of the proposed CMixer under various mapping settings to ensure the universality of the evaluation results. The known channel size, i.e., $\Omega$ in Eq. (4), is set to 4×4, 5×5, 6×6, and 7×7 (antennas × subcarriers). Table II shows the mapping performance of the proposed scheme and the benchmarks. Compared with the benchmarks, the proposed CMixer provides 4.6~10 dB gains in NMSE and 5.7~9.6 percentage points gain in $\rho$ under the same known channel size, or reduces the required known channel size by up to $67.3\%$ ($1-16/49$) at the same or even better accuracy.

Table II: NMSE and $\rho$ under various known channel sizes (antennas × subcarriers).
Index | Scheme | 4 × 4 | 5 × 5 | 6 × 6 | 7 × 7
NMSE (dB) | CMixer | -10.49 | -14.11 | -15.38 | -19.24
NMSE (dB) | MLP | -5.85 | -6.88 | -7.53 | -9.22
NMSE (dB) | Res_CNN | -3.37 | -4.49 | -7.23 | -8.94
$\rho$ | CMixer | 0.9555 | 0.9807 | 0.9856 | 0.9940
$\rho$ | MLP | 0.8589 | 0.8903 | 0.9071 | 0.9367
$\rho$ | Res_CNN | 0.7394 | 0.8079 | 0.9017 | 0.9353

Further, Fig. 3 shows the mapping results of the different schemes on a random channel sample using grayscale visualization (known channel size: 5 antennas × 5 subcarriers). It can be intuitively seen that the channel mapped by CMixer is closer to the true channel than those produced by the benchmarks. In summary, the above experimental results reflect the effectiveness and superiority of the proposed CMixer on the channel mapping task.

Figure 3: Grayscale visualization of the original CSI and represented CSI.

IV-C Ablation Studies

Complex-domain computation and interleaved learning are the two defining features of the proposed CMixer model. The performance evaluation in Section IV-B illustrates the superiority of the scheme as a whole, and this subsection evaluates the value of these sub-modules through ablation studies. All ablation experiments are conducted with a known channel of 5 antennas × 5 subcarriers.

We first evaluate the necessity of using CMLPs by replacing them with vanilla MLPs. The specific ablation operation changes the $2X\to S\to 2X$ calculation (CMLP) to a $2(X\to S\to X)$ calculation (MLPs applied to the real and imaginary parts in parallel, as sketched after Table III), retaining the same computational complexity, $4XS$, to ensure fairness. Table III shows the results of this ablation. Not using CMLPs in either space mixing or frequency mixing leads to performance degradation. This phenomenon demonstrates the advantage of CMLPs in the representation learning of complex-valued channel data.

Table III: NMSE (dB) of using/not using CMLPs.
Frequency Mixing \ Space Mixing | Using MLP | Using CMLP
Using MLP | -7.13 | -11.21
Using CMLP | -12.38 | -14.11
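For completeness, the ablation baseline could look like the sketch below (continuing the PyTorch sketches above): two parallel real-valued $X\to S\to X$ branches that match the CMLP's $4XS$ complexity budget while preventing real/imag interaction within a layer. The GELU activation is again an assumption.

```python
import torch
import torch.nn as nn

class ParallelMLP(nn.Module):
    """Ablation baseline: independent MLPs for the real and imaginary parts.
    Drop-in replacement for CMLP inside a CMixerLayer."""
    def __init__(self, x_dim, s_dim):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Linear(x_dim, s_dim), nn.GELU(),
                                 nn.Linear(s_dim, x_dim))
        self.re, self.im = branch(), branch()

    def forward(self, x):                      # x: (..., X, 2)
        return torch.stack([self.re(x[..., 0]),
                            self.im(x[..., 1])], dim=-1)
```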

Then, we evaluate what makes the proposed interleaved learning effective and whether its effectiveness indeed stems from the unique interlaced space and frequency characteristics of the channel. Specifically, we shuffle the element order of the channel matrix in ways that either retain or destroy the interlaced space and frequency characteristics, and use the CMixer to learn the new matrix as the representation output. The first shuffle mode is named 'interlaced shuffle' and written as,

$\mathbf{H}_{\mathrm{interlaced}}=\mathbf{P}^{N_{\mathrm{t}}\times N_{\mathrm{t}}}\,\mathbf{H}^{N_{\mathrm{t}}\times N_{\mathrm{c}}}\,\mathbf{P}^{N_{\mathrm{c}}\times N_{\mathrm{c}}},$ (18)

where each $\mathbf{P}$ is a permutation matrix. In this way, the new channel matrix $\mathbf{H}_{\mathrm{interlaced}}$ has space and frequency characteristics different from those of the original channel $\mathbf{H}$, but these characteristics are still interlaced-coupled, since each row or column still corresponds to a certain antenna or subcarrier, respectively. The other shuffle mode is named 'non-interlaced shuffle' and written as,

$\mathbf{H}_{\mathrm{non\text{-}interlaced}}=\mathrm{vec}^{-1}\left(\mathbf{P}^{N_{\mathrm{t}}N_{\mathrm{c}}\times N_{\mathrm{t}}N_{\mathrm{c}}}\,\mathrm{vec}\left(\mathbf{H}^{N_{\mathrm{t}}\times N_{\mathrm{c}}}\right)\right),$ (19)

where $\mathrm{vec}(\cdot)$ denotes rearranging a matrix into a vector and $\mathrm{vec}^{-1}(\cdot)$ is its reverse process. In this shuffle mode, a row or column of $\mathbf{H}_{\mathrm{non\text{-}interlaced}}$ no longer necessarily corresponds to a certain antenna or subcarrier, so the space and frequency characteristics are no longer coupled in an interlaced manner. Note that the two new channel matrices $\mathbf{H}_{\mathrm{non\text{-}interlaced}}$, $\mathbf{H}_{\mathrm{interlaced}}$ and the original channel matrix $\mathbf{H}$ are identical in element content and differ only in data structure. We use various random permutation matrices for the experiments, and the statistical results are shown in Table IV (both shuffles are sketched in code after the table). The proposed CMixer can effectively represent $\mathbf{H}_{\mathrm{interlaced}}$, which retains the interlaced space and frequency characteristics, while it degrades severely on $\mathbf{H}_{\mathrm{non\text{-}interlaced}}$, which lacks them. This phenomenon illustrates that the excellent performance of the proposed interleaved space and frequency learning indeed draws on the physical properties rather than relying exclusively on data fitting; the CMixer model's performance gains come from the design inspired by channel characteristics.

Table IV: NMSE (dB) of using CMixer to map channels with/without interlaced space and frequency characteristics.
Shuffle mode | NMSE (dB)
Origin | $-14.11$
Interlaced | $-14.25\pm 0.20$
Non-interlaced | $-1.01\pm 0.05$
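The two shuffle modes of Eqs. (18) and (19) amount to simple index permutations, e.g.,

```python
import numpy as np

rng = np.random.default_rng(0)

def interlaced_shuffle(H):
    """Eq. (18): permute whole rows and whole columns, so each row/column
    still corresponds to one antenna/subcarrier."""
    return H[rng.permutation(H.shape[0])][:, rng.permutation(H.shape[1])]

def non_interlaced_shuffle(H):
    """Eq. (19): permute all Nt*Nc entries jointly, destroying the
    row/column-to-antenna/subcarrier correspondence."""
    flat = H.reshape(-1)
    return flat[rng.permutation(flat.size)].reshape(H.shape)
```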

V Conclusion

In this paper, we propose a CMixer model for channel mapping, achieving excellent performance. Modeling analysis and ablation studies show that the complex-domain computation and interleaved learning in this model are well suited to channel representation and, thus, are likely to be applicable to other channel-related tasks as well. We hope that our work can provide inspiration for using DL technologies to solve problems in wireless systems.

References

  • [1] Z. Chen, Z. Zhang, and Z. Yang, “Big AI models for 6G wireless networks: Opportunities, challenges, and research directions,” arXiv preprint arXiv:2308.06250, 2023.
  • [2] M. Alrabeiah and A. Alkhateeb, “Deep learning for TDD and FDD massive MIMO: Mapping channels in space and frequency,” in 2019 53rd Asilomar Conference on Signals, Systems, and Computers, 2019, pp. 1465–1470.
  • [3] M. Soltani, V. Pourahmadi, A. Mirzaei, et al., “Deep learning-based channel estimation,” IEEE Communications Letters, vol. 23, no. 4, pp. 652–655, 2019.
  • [4] L. Li, H. Chen, H.-H. Chang, et al., “Deep residual learning meets OFDM channel estimation,” IEEE Wireless Communications Letters, vol. 9, no. 5, pp. 615–618, 2020.
  • [5] Z. Chen, Z. Zhang, Z. Xiao, et al., “Viewing channel as sequence rather than image: A 2-D Seq2Seq approach for efficient MIMO-OFDM CSI feedback,” IEEE Transactions on Wireless Communications, vol. 22, no. 11, pp. 7393–7407, 2023.
  • [6] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, et al., “MLP-Mixer: An all-MLP architecture for vision,” Advances in Neural Information Processing Systems, vol. 34, pp. 24261–24272, Dec. 2021.
  • [7] A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications,” arXiv preprint arXiv:1902.06435, 2019.
  • [8] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 748–751, Oct. 2018.