Multiple Residual Dense Networks for Reconfigurable Intelligent Surfaces Cascaded Channel Estimation

Yu Jin, Jiayi Zhang, Chongwen Huang, Liang Yang, Huahua Xiao, Bo Ai, and Zhiqin Wang

Y. Jin and J. Zhang are with the School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China, and also with the Frontiers Science Center for Smart High-speed Railway System, Beijing Jiaotong University, Beijing 100044, China (e-mail: jiayizhang@bjtu.edu.cn). C. Huang is with Zhejiang Provincial Key Lab of Information Processing, Communication and Networking, Zhejiang University, Hangzhou 310007, China (e-mail: chongwenhuang@zju.edu.cn). L. Yang is with the College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China (e-mail: liangy@hnu.edu.cn). H. Xiao is with ZTE Corporation, and State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518057, China (e-mail: xiao.huahua@zte.com.cn). B. Ai is with the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China, and also with the Frontiers Science Center for Smart High-speed Railway System, and also with Henan Joint International Research Laboratory of Intelligent Networking and Data Analysis, Zhengzhou University, Zhengzhou 450001, China, and also with Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen 518055, China (e-mail: boai@bjtu.edu.cn). Z. Wang is with China Academy of Information and Communications Technology, Beijing 100191, P. R. China (e-mail: zhiqin.wang@caict.ac.cn).

Abstract

Reconfigurable intelligent surface (RIS) constitutes an essential and promising paradigm that relies on a programmable wireless environment and provides capability for space-intensive communications, owing to the use of massive low-cost reflecting elements over the entire surfaces of man-made structures. However, accurate channel estimation is a fundamental technical prerequisite for achieving the large performance gains promised by RIS. By leveraging the low-rank structure of RIS channels, three practical residual neural networks, named convolutional blind denoising network, convolutional denoising generative adversarial network and multiple residual dense network, are proposed to obtain accurate channel state information. Comparing them reflects the impact of different network structures on the estimation performance. Simulation results reveal the evolution direction of these three methods and their superior performance compared with existing benchmark schemes.

Index Terms:

Channel estimation, deep learning, multiple residual dense network, reconfigurable intelligent surface.

I Introduction

To meet the ultra-high data rate and ubiquitous coverage requirements of the sixth-generation (6G) wireless networks, reconfigurable intelligent surface (RIS) aided massive multiple-input multiple-output (MIMO), one of the promising and innovative techniques, is envisioned to significantly reduce link blocking probability and system energy consumption and to improve link quality with sophisticated beamforming [1, 2, 3]. RIS aided MIMO has been explored with near-passive arrays to obtain green and sustainable communications between the user equipment (UE) and the base station (BS). By appropriately and dynamically adjusting the magnitude and phase response, wireless signals can be coherently combined and steered to desired directions [4, 5]. Each RIS reflective element can individually control the amplitude response and phase shift of the incident electromagnetic waves at the nanosecond level to achieve energy concentration. Through reflection, refraction, absorption, and transmission, the reshaped electromagnetic waves form new paths. Based on these passive and low-cost characteristics of RIS reflective elements, the RIS system requires very low energy consumption to improve the electromagnetic environment and extend coverage.

However, these systematic performance improvements rely on the assumption of perfect channel state information (CSI). Unfortunately, the above works assume perfect CSI but do not consider the difficulty of obtaining it. First of all, it is quite difficult to estimate the RIS-UE and RIS-BS channels separately, unless the RIS is equipped with radio frequency (RF) chains. Secondly, the cascaded channel of the RIS between the BS and the UE can be very high-dimensional due to the massive number of reflecting elements. Currently, assuming that RIS elements are connected to RF chains, channel estimation can be performed with acceptable performance through compressed sensing (CS) based methods. However, owing to their extremely low deployment, hardware and communication costs, purely passive RIS reflecting elements are undoubtedly more attractive.

By assuming active reflection patterns to achieve a smaller active array size and thus reduce hardware complexity, a conventional least squares (LS) method has been proposed. In addition, by using the low-rank characteristics of the MIMO channel, the training overhead can be reduced through sparse matrix decomposition. Considering the sparse representation of cascaded channels, a CS method is proposed in [6]. Furthermore, as the UE and BS are equipped with more antennas, the channel estimation complexity increases sharply. Using the angular-domain channel sparsity, a CS-based channel estimation scheme is proposed in [7]. However, the difference in structured sparsity between different channels causes performance loss. Moreover, deep learning (DL) was proposed to predict the optimal RIS phase shift matrices [8], but obtaining accurate CSI remains important.

In the field of image denoising, convolutional neural network (CNN) structures are typically trained on image pairs constructed by adding synthetic noise to noise-free images [9]. Considering the similarity between image noise reduction and channel estimation, a deep residual learning approach was proposed to learn the cascaded channels from the noisy pilot-based observations [10, 11]. Also, a new architecture called Multiple Residual Dense Network (MRDN) has been proposed and has received great attention [12]. In particular, this architecture uses the Residual Dense Network (RDN) as a component.

In this correspondence, we propose three practical residual neural networks for cascaded channel estimation. The main contributions are as follows. First, generative adversarial networks-based convolutional blind denoising (GAN-CBD) and the convolutional blind denoising network (CBDNet) are proposed to obtain accurate CSI by exploiting offline-trained neural networks. Second, the multiple residual dense network (MRDN) is proposed to flexibly adapt to online cascaded channel estimation. Finally, numerical results confirm that the proposed methods can significantly outperform existing schemes such as ADMM, CV-DnCNN, and CBDNet.

II System Model

Figure 1: MRDN-based channel estimation for RIS system.

We begin by considering the uplink of a time division duplex (TDD) RIS-aided mmWave communication system. As shown in Fig. 1, the mmWave RIS-aided MIMO system comprises one RIS, one controller, one base station (BS) equipped with ${N_{b}}$ antennas and $K$ user equipments (UEs), each equipped with ${N_{u}}$ antennas. We assume that the planar RIS is equipped with $N=N_{\mathrm{v}}N_{\mathrm{h}}$ passive reflecting elements, where $N_{\mathrm{h}}$ and $N_{\mathrm{v}}$ denote the number of unit elements of the RIS in the horizontal and vertical orientations, respectively. We define $\mathbf{h}_{\mathrm{r},u_{k}}\in\mathbb{C}^{{N}\times N_{u}}$ and $\mathbf{h}_{\mathrm{r},b}\in\mathbb{C}^{{N}\times N_{b}}$ as the channels from the $k$ th UE to the RIS and from the BS to the RIS, respectively, and $\mathbf{h}_{u_{k},b}\in\mathbb{C}^{{N_{b}}\times{N_{u}}}$ as the direct channel between the $k$ th UE and the BS. Here, $\mathbb{C}^{{M}\times{N}}$ denotes the set of ${M}\times{N}$ complex-valued matrices. Then we can express the received signal as

\mathbf{y}=\sum\limits_{k=1}^{K}\left(\underbrace{\mathbf{h}_{\mathrm{r},b}^{T}\bm{\Psi}_{k}\mathbf{h}_{\mathrm{r},u_{k}}\bm{\Phi}_{k}^{\mathrm{T}}}_{\text{RIS-assisted link }}+\underbrace{\mathbf{h}_{u_{k},b}\bm{\Phi}_{k}^{\mathrm{T}}}_{\text{Direct link}}\right)+\mathbf{n},

(1)

where $\mathbf{n}\sim\mathcal{CN}\left(\mathbf{0},\sigma_{n}^{2}\mathbf{I}\right)$ denotes the noise vector at the BS, $\sigma_{n}^{2}$ is the noise power at each antenna, $\bm{\Phi}_{k}=[\bm{\phi}_{k,1},\bm{\phi}_{k,2},\ldots,\bm{\phi}_{k,{N_{u}}}]\in\mathbb{C}^{\tau\times{N_{u}}}$ denotes the pilot matrix for the $k$ th UE, and $\bm{\phi}_{k,n}\in\mathbb{C}^{\tau\times 1}$ denotes the orthogonal pilot sequence sent by the $n$ th antenna of the $k$ th UE ( $\bm{\phi}_{k_{1},i}^{H}\bm{\phi}_{k_{2},j}=0$ , if $k_{1}\neq k_{2}$ or $i\neq j$ ; $\bm{\phi}_{k_{1},i}^{H}\bm{\phi}_{k_{2},j}=1$ , if $k_{1}=k_{2}$ and $i=j$ , $\forall k_{1},k_{2}\in\{1,2,\ldots,K\}$ ). For transmitting the pilots, all antennas of each UE adopt different pilot sequences. In particular, a pilot is allocated to only one UE, resulting in an orthogonal pilot matrix. We consider a simple model in which the users served in each slot may have different optimal RIS phase shift matrices. Therefore, the RIS phase shift matrix $\bm{\Psi}_{k}$ represents the phase shift introduced by the RIS to the impinging signal from the transmitter in the $k$ th time slot. In addition, $\bm{\Psi}_{k}\triangleq\operatorname{diag}\{\bm{\psi}_{k}\}\in\mathbb{C}^{N\times N}$ , with $\bm{\psi}_{k}\in\mathbb{C}^{N\times 1}$ representing the effective phase shifts of the RIS reflecting elements, whose $n$ th element is $[\bm{\psi}_{k}]_{n}=\varpi_{n}e^{j\theta_{n}},\forall n\in\{1,2,\ldots,N\}$ . Without loss of generality, we can assume $\bm{\psi}_{k}=\mathbf{1}$ .

Exploiting $\bm{\Phi}_{k}^{\mathrm{T}}\bm{\Phi}_{k}^{*}=\mathbf{I}$ , and assuming, to simplify the design and analysis of the channel estimation algorithms in this work, that there is no direct link between the UE and the BS due to blockages or negligible receive power, the processed received signal of the $k$ th UE at the BS is given by

\displaystyle\mathbf{y}_{k}=\mathbf{y}\bm{\Phi}_{k}^{*}=\mathbf{h}_{\mathrm{r},b}^{T}\bm{\Psi}_{k}\mathbf{h}_{\mathrm{r},u}+\mathbf{n}\bm{\Phi}_{k}^{*}.

(2)
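As a quick numerical check of the decorrelation step in (2), the following minimal sketch builds an orthonormal pilot matrix and verifies that right-multiplying the received block by $\bm{\Phi}_{k}^{*}$ returns the channel when noise is omitted. The DFT-based pilot design, the toy dimensions and all helper names are assumptions made for illustration. The matrix `h` simply stands in for the cascaded channel $\mathbf{h}_{\mathrm{r},b}^{T}\bm{\Psi}_{k}\mathbf{h}_{\mathrm{r},u}$ .

```python
import cmath
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def conjugate(a):
    return [[v.conjugate() for v in row] for row in a]


def dft_pilots(tau, n_u):
    """tau x n_u pilot matrix with orthonormal columns (assumed DFT design)."""
    return [[cmath.exp(-2j * math.pi * t * i / tau) / math.sqrt(tau)
             for i in range(n_u)] for t in range(tau)]


def decorrelate(y, phi):
    """Right-multiply the received block by conj(phi), as in (2)."""
    return matmul(y, conjugate(phi))


random.seed(0)
n_b, n_u, tau = 4, 2, 4  # toy sizes, not the simulation settings
h = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_u)]
     for _ in range(n_b)]
phi = dft_pilots(tau, n_u)
y = matmul(h, transpose(phi))   # noise-free received block, n_b x tau
h_obs = decorrelate(y, phi)     # n_b x n_u observation of the channel
max_err = max(abs(h_obs[i][j] - h[i][j])
              for i in range(n_b) for j in range(n_u))
```

With noise included, `h_obs` would equal the channel plus the projected noise term $\mathbf{n}\bm{\Phi}_{k}^{*}$ , which is the quantity the estimators in the following sections try to denoise.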

Since practical mmWave channels usually have a limited number of scatterers, a line-of-sight (LoS) path is expected in RIS systems. The mmWave channels from the $k$ th UE to the RIS and from the BS to the RIS are, respectively, given as

{\mathbf{h}_{\mathrm{r},u}}=\sum\limits_{l=1}^{L_{\mathrm{T}}}{z_{l}\bm{\alpha}_{\mathrm{R},t}\left({{\phi_{\mathrm{R},l}^{\mathrm{azi}}},{\phi_{\mathrm{R},l}^{\mathrm{ele}}}}\right)\bm{\alpha}_{\mathrm{T},t}^{H}\left({{\phi_{\mathrm{T},l}^{\mathrm{azi}}},{\phi_{\mathrm{T},l}^{\mathrm{ele}}}}\right)},

(3)

{\mathbf{h}_{\mathrm{r},b}}=\sum\limits_{l=1}^{L_{\mathrm{R}}}{z_{l}\bm{\alpha}_{\mathrm{R},r}\left({{\phi_{\mathrm{R},l}^{\mathrm{azi}}},{\phi_{\mathrm{R},l}^{\mathrm{ele}}}}\right)\bm{\alpha}_{\mathrm{T},r}^{H}\left({{\phi_{\mathrm{T},l}^{\mathrm{azi}}},{\phi_{\mathrm{T},l}^{\mathrm{ele}}}}\right)},

(4)

where $L_{\mathrm{T}}$ and $L_{\mathrm{R}}$ denote the numbers of multipaths, which are much smaller than the array dimensions, and $z_{l}\in\mathbb{C}$ denotes the distance-dependent pathloss of the $l$ th path. $\phi_{\mathrm{R},l}^{\mathrm{ele}}$ $(\phi_{\mathrm{R},l}^{\mathrm{azi}})$ denotes the elevation (azimuth) angle-of-arrival of the $l$ th path for both $\mathbf{h}_{\mathrm{r},u}$ and $\mathbf{h}_{\mathrm{r},b}$ . The steering vectors depend on the array geometry. For the typical channels $\mathbf{h}_{\mathrm{r},u}$ and $\mathbf{h}_{\mathrm{r},b}$ , the steering vectors $\bm{\alpha}_{\mathrm{R}}({{\phi_{\mathrm{R},l}^{\mathrm{azi}}},{\phi_{\mathrm{R},l}^{\mathrm{ele}}}})\in\mathbb{C}^{{N_{r}}\times 1}$ and $\bm{\alpha}_{\mathrm{T}}({{\phi_{\mathrm{T},l}^{\mathrm{azi}}},{\phi_{\mathrm{T},l}^{\mathrm{ele}}}})\in\mathbb{C}^{{N_{t}}\times 1}$ are given by

\displaystyle\begin{aligned}\bm{\alpha}_{\mathrm{R},t}\left(\phi_{\mathrm{R},l}^{\mathrm{azi}},\phi_{\mathrm{R},l}^{\mathrm{ele}}\right)=&\left[1,\mathrm{e}^{j2\pi d\sin\phi_{\mathrm{R},l}^{\mathrm{azi}}\sin\phi_{\mathrm{R},l}^{\mathrm{ele}}/\lambda},\cdots,\mathrm{e}^{j2\pi d\left(N_{\mathrm{v}}-1\right)\sin\phi_{\mathrm{R},l}^{\mathrm{azi}}\sin\phi_{\mathrm{R},l}^{\mathrm{ele}}/\lambda}\right]^{T}\\&\otimes\left[1,\mathrm{e}^{j2\pi d\cos\phi_{\mathrm{R},l}^{\mathrm{ele}}/\lambda},\cdots,\mathrm{e}^{j2\pi d\left(N_{\mathrm{h}}-1\right)\cos\phi_{\mathrm{R},l}^{\mathrm{ele}}/\lambda}\right]^{T},\end{aligned}

(5)

\displaystyle\begin{aligned}\bm{\alpha}_{\mathrm{T},t}\left(\phi_{\mathrm{T},l}^{\mathrm{azi}},\phi_{\mathrm{T},l}^{\mathrm{ele}}\right)=&\left[1,\mathrm{e}^{j2\pi d\sin\phi_{\mathrm{T},l}^{\mathrm{azi}}\sin\phi_{\mathrm{T},l}^{\mathrm{ele}}/\lambda},\cdots,\mathrm{e}^{j2\pi d\left(N_{T1}-1\right)\sin\phi_{\mathrm{T},l}^{\mathrm{azi}}\sin\phi_{\mathrm{T},l}^{\mathrm{ele}}/\lambda}\right]^{T}\\&\otimes\left[1,\mathrm{e}^{j2\pi d\cos\phi_{\mathrm{T},l}^{\mathrm{ele}}/\lambda},\cdots,\mathrm{e}^{j2\pi d\left(N_{T2}-1\right)\cos\phi_{\mathrm{T},l}^{\mathrm{ele}}/\lambda}\right]^{T},\end{aligned}

(6)

where $\lambda$ denotes the wavelength, $d$ denotes the antenna spacing, and $\otimes$ is the Kronecker product. $\phi_{\mathrm{T},l}^{\mathrm{ele}}$ $(\phi_{\mathrm{T},l}^{\mathrm{azi}})$ denotes the elevation (azimuth) angle-of-departure of the $l$ th path for both $\mathbf{h}_{\mathrm{r},u}$ and $\mathbf{h}_{\mathrm{r},b}$ . $\bm{\alpha}_{\mathrm{T}}({{\phi_{\mathrm{T},l}^{\mathrm{azi}}},{\phi_{\mathrm{T},l}^{\mathrm{ele}}}})$ and $\bm{\alpha}_{\mathrm{R}}({{\phi_{\mathrm{R},l}^{\mathrm{azi}}},{\phi_{\mathrm{R},l}^{\mathrm{ele}}}})$ denote the steering vectors at the transmit side and the receive side, respectively.
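The planar-array steering vector in (5) is a Kronecker product of two uniform-linear-array phase ramps. The sketch below computes it for given angles. The function names, the default half-wavelength spacing and the angle values in the usage line are illustrative assumptions, not settings taken from this work.

```python
import cmath
import math


def phase_ramp(n, phase_step):
    """[1, e^{j*step}, ..., e^{j*(n-1)*step}] for one array dimension."""
    return [cmath.exp(1j * phase_step * m) for m in range(n)]


def kron(a, b):
    """Kronecker product of two vectors stored as flat lists."""
    return [x * y for x in a for y in b]


def upa_steering(n_v, n_h, azi, ele, d_over_lambda=0.5):
    """Steering vector of an n_v x n_h planar array, following the form of (5)."""
    step_v = 2 * math.pi * d_over_lambda * math.sin(azi) * math.sin(ele)
    step_h = 2 * math.pi * d_over_lambda * math.cos(ele)
    return kron(phase_ramp(n_v, step_v), phase_ramp(n_h, step_h))


# Toy usage: a 2 x 3 array and arbitrary angles in radians.
vec = upa_steering(2, 3, azi=0.3, ele=1.1)
```

Every entry has unit modulus, so the vector only encodes phase differences across the array, which is why a small number of paths leads to a low-rank channel.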

III Proposed Channel Estimation Methods

In this section, we introduce CBDNet, GAN-CBD and MRDN for the cascaded channel estimation of RIS systems. Leveraging the sparsity of the cascaded mmWave channel, we first apply the CBDNet-based method to cascaded channel estimation, in line with previous works. We then use the GAN structure to improve the network structure. Finally, the proposed MRDN combines the residual dense network (RDN) structure with the convolutional block attention module (CBAM) [13], which serves as a building block and can obtain accurate CSI for the sparse cascaded channel. Compared with existing baseline schemes, MRDN can reduce the complexity of RIS hardware. In the following, we present the CBDNet, GAN-CBD and MRDN channel estimators. Throughout this correspondence, $\mathbf{x}$ and $\mathbf{z}$ represent the input and output of generic layers and networks, respectively.

III-A CBDNet-based Channel Estimator

$\textrm{DNN}_{E}$ and $\textrm{DNN}_{D}$ denote the noise level estimation subnetwork and the non-blind denoising subnetwork, respectively. $\Theta_{E}$ and $\Theta_{D}$ are the network parameters of $\textrm{DNN}_{E}$ and $\textrm{DNN}_{D}$ , respectively.

III-A1 Basic Structure

Let $*$ denote the Conv function, and let $\mathbf{x}$ and $\mathbf{z}$ be the input and output of the $k$ th Conv layer. The convolutional layer computes

\displaystyle\mathbf{z}=W_{k}*\mathbf{x}+b_{k},

(7)

where the weight and bias matrices $W_{k}$ and $b_{k}$ are the $k$ th Conv parameters. $\mathbf{z}=c(\mathbf{x})$ , $\Theta_{k,c}=\{W_{k,c},b_{k,c}\}$ for “Conv” layers, $\mathbf{z}=s(\mathbf{x})$ , $\Theta_{k,s}=\{W_{k,s},b_{k,s}\}$ for “SoftMax” layers. Assuming that $\max$ denotes “ReLU” layer function, the mathematical deduction for “ReLU” layer is

\displaystyle\mathbf{z}=\max\left(0,\mathbf{x}\right),

(8)

which we denote as $\mathbf{z}=r(\mathbf{x})$ for “ReLU” layers.

III-A2 Noise Level Estimation Subnetwork

•

Input Layer: As the real and imaginary parts of the received signal matrix ${\mathbf{y}_{k}}\in\mathbb{C}^{N_{b}\times N_{u}}$ are independent at the BS, we first combine them into a super matrix $\mathbf{Y}\in\mathbb{R}^{N_{b}\times 2N_{u}}$ as the input of $\textrm{DNN}_{E}$ . $\mathbb{R}^{{M}\times{N}}$ represents an ${M}\times{N}$ real-valued matrix.

•

Convolutional sensing: The $\textrm{DNN}_{E}$ consists of $\mathrm{B}_{c}$ Conv layers and $\mathrm{K}$ SoftMax layers. The recurrence relation of main body for $\textrm{DNN}_{E}$ is

\displaystyle\sigma=\mathcal{F}_{E}(\mathbf{Y},\Theta_{E})=c\circ\cdots\circ c\circ s\circ\cdots\circ s(\mathbf{Y})=(c)^{\mathrm{B}_{c}}\circ(s)^{\mathrm{K}}(\mathbf{Y}),

(9)

where the operator $\circ$ denotes function composition, $\sigma$ denotes the noise level for the space-invariant AWGN, $\mathbf{M}\in\mathbb{R}^{N_{b}\times 2N_{u}}$ is a uniform map whose elements all equal $\sigma$ , and $\Theta_{E}=\{\Theta_{1,c},\ldots,\Theta_{\mathrm{B}_{c},c},\Theta_{1,s},\ldots,\Theta_{\mathrm{K},s}\}$ . $\mathcal{F}_{E}:\mathbb{R}^{N_{b}\times 2N_{u}}\mapsto\mathbb{R}^{1\times 1}$ is the mapping function for $\textrm{DNN}_{E}$ .
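The input preparation described above can be sketched as follows: the complex observation is turned into a real matrix with twice as many columns, and a scalar noise level is expanded into the uniform map $\mathbf{M}$ . Placing the real part to the left of the imaginary part is an assumed layout, since the text only states that the two parts are combined. The function names are hypothetical.

```python
def stack_real_imag(y):
    """Map a complex n_b x n_u matrix to a real n_b x 2n_u matrix [Re | Im]."""
    return [[v.real for v in row] + [v.imag for v in row] for row in y]


def unstack_real_imag(y_real):
    """Inverse of stack_real_imag: rebuild the complex n_b x n_u matrix."""
    n_u = len(y_real[0]) // 2
    return [[complex(row[j], row[j + n_u]) for j in range(n_u)]
            for row in y_real]


def uniform_noise_map(sigma, n_rows, n_cols):
    """Noise map M whose entries all equal the estimated noise level."""
    return [[sigma] * n_cols for _ in range(n_rows)]


# Toy usage with a 2 x 2 complex observation.
y_k = [[1 + 2j, 3 - 4j], [0.5j, -1 + 0j]]
big_y = stack_real_imag(y_k)                  # 2 x 4 real input
noise_map = uniform_noise_map(0.1, 2, 4)      # same shape as big_y
recovered = unstack_real_imag(big_y)          # reverses the stacking
```

The same pair of helpers covers both ends of a network: stacking before the first layer and unstacking after the last one.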

III-A3 Non-Blind Denoising Subnetwork

•

Input Layer: The $\textrm{DNN}_{D}$ takes both $\mathbf{Y}$ and $\mathbf{M}$ as input to obtain the estimated channel $\widehat{\mathbf{H}}$ .

•

Residual Blocks: The $\textrm{DNN}_{D}$ consists of $\mathrm{B}$ residual blocks $c\circ b\circ r$ , then, the recurrence relation of main body for $\textrm{DNN}_{D}$ is

\displaystyle\mathbf{{H}}_{\mathrm{m}}=\mathcal{F}_{D}(\mathbf{Y},\mathbf{M},\Theta_{D})=c\circ b\circ r\cdots c\circ b\circ r(\mathbf{Y},\mathbf{M})=(c\circ b\circ r)^{\mathrm{B}}(\mathbf{Y},\mathbf{M}).

(10)

The middle output $\mathbf{{H}}_{\mathrm{m}}=\mathcal{F}_{D}(\mathbf{Y},\mathbf{M},\Theta_{D})$ , where $\mathcal{F}_{D}:\mathbb{R}^{N_{b}\times 2N_{u}}\mapsto\mathbb{R}^{N_{b}\times 2N_{u}}$ is the mapping function for stacking residual blocks.

•

Output Layer: By reversing the combining, the middle output of $\textrm{DNN}_{D}$ $\mathbf{{H}}_{\mathrm{m}}\in\mathbb{R}^{N_{b}\times 2N_{u}}$ produces the estimated channel matrix $\mathbf{\widehat{\mathbf{H}}}\in\mathbb{C}^{N_{b}\times N_{u}}$ .
•

Loss Function: In asymmetric learning, the estimated noise level enters the loss function, which quantifies the effectiveness of $\textrm{DNN}_{D}$ . The loss function is denoted as

\displaystyle{\mathcal{L}_{\text{rec}}}={\frac{1}{\sigma}}\|\mathbf{\widehat{\mathbf{H}}}-\mathbf{H}\|^{2}.

(11)

Given the estimated noise level $\sigma(\mathbf{Y})$ and the truth $\sigma(\mathbf{Y_{i}})$ , more penalty is incorporated into their MSE when $\sigma(\mathbf{Y})<\sigma(\mathbf{Y_{i}})$ .
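The two loss terms can be illustrated with a short sketch. The reconstruction term follows (11). The asymmetric term is not written out in the text, so the weighting below is an assumption borrowed from the form used in the original CBDNet image-denoising work, where under-estimation of the noise level is weighted by $1-\alpha$ and over-estimation by $\alpha$ with $0<\alpha<0.5$ . The value `alpha=0.3` and the function names are illustrative only.

```python
def frobenius_sq(a, b):
    """Squared Frobenius norm of the difference of two matrices."""
    return sum(abs(u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def reconstruction_loss(h_hat, h, sigma):
    """Noise-weighted reconstruction loss in the form of (11)."""
    return frobenius_sq(h_hat, h) / sigma


def asymmetric_noise_loss(sigma_hat, sigma_true, alpha=0.3):
    """Assumed asymmetric penalty: under-estimation costs more when alpha < 0.5."""
    under = 1.0 if sigma_hat - sigma_true < 0 else 0.0
    return abs(alpha - under) * (sigma_hat - sigma_true) ** 2


# Same absolute error of 0.1, once below and once above the true level.
loss_under = asymmetric_noise_loss(0.1, 0.2)
loss_over = asymmetric_noise_loss(0.3, 0.2)
```

With this weighting, `loss_under` exceeds `loss_over`, which matches the statement that more penalty is applied when the estimated noise level falls below the true one.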

III-B GAN-based Channel Estimation

Motivated by the development of the generative adversarial network (GAN) structure, we develop GAN-CBD for denoising, using the previous CBDNet as the generator subnetwork. The generator $G$ produces samples through training and fitting, as CBDNet does, and the discriminator $D$ compares the outputs of the generator with the labeled results. $D$ is trained to distinguish the training examples from the samples generated by $G$ , while $G$ is trained against the judgment of $D$ so that its samples are less likely to be identified as generated.

III-B1 Generator Network

To verify the effectiveness of the GAN structure, we use CBDNet as the generator network. The $\textrm{GAN}_{D}$ consists of $\mathrm{B}$ residual blocks. We have

\displaystyle\mathbf{\widehat{\mathbf{H}}}=\mathcal{G}_{d}(\mathbf{Y},\mathbf{M},\Theta_{G_{d}})=c\circ b\circ r\cdots c\circ b\circ r(\mathbf{Y},\mathbf{M})=(c\circ b\circ r)^{\mathrm{B}}(\mathbf{Y},\mathbf{M}),

(12)

where $\mathcal{G}_{d}:\mathbb{R}^{N_{b}\times 2N_{u}}\mapsto\mathbb{C}^{N_{b}\times N_{u}}$ is the mapping function for the generator network. $\mathbf{M}\in\mathbb{R}^{N_{b}\times 2N_{u}}$ is a uniform map from $\textrm{GAN}_{E}$ , $\sigma=\mathcal{G}_{e}(\mathbf{Y},\Theta_{G_{e}})$ .

III-B2 Discriminator Network

In the original formulation, the training procedure defines a continuous minimax game as

\displaystyle\underset{G}{\arg\min}\ \underset{D}{\arg\max}\mathbb{E}[\log D(\mathbf{x})]+\mathbb{E}[\log(1-D(G(\mathbf{n})))]

(13)

where $D$ is a function that maps $\mathbb{R}^{N_{b}\times 2N_{u}}$ to the unit interval, and $G$ is a function that maps a noise vector $\mathbf{n}\in\mathbb{R}^{N_{b}\times 2N_{u}}$ , drawn from a simple distribution $p(\mathbf{n})$ , to the ambient space of the training data $\mathbb{R}^{N_{b}\times 2N_{u}}$ .
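To make the two terms of (13) concrete, the sketch below evaluates the objective from discriminator outputs on a batch of real samples and a batch of generated samples. The function name, the clipping constant `eps` and the probability values are hypothetical. No networks are trained here, only the value of the objective is computed.

```python
import math


def gan_value(d_real, d_fake, eps=1e-12):
    """Empirical value of E[log D(x)] + E[log(1 - D(G(n)))]."""
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# D outputs the probability that its input is a real (labeled) sample.
undecided = gan_value([0.5, 0.5], [0.5, 0.5])   # D cannot tell the difference
sharp_d = gan_value([0.9, 0.9], [0.1, 0.1])     # D separates real from fake
fooled_d = gan_value([0.9, 0.9], [0.9, 0.9])    # G fools D on fake samples
```

The discriminator pushes the value up (`sharp_d` is larger than `undecided`), while the generator pushes it down by making generated samples look real (`fooled_d` is smaller than `sharp_d`).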

III-C MRDN-based Channel Estimation

We define this feature concatenation part of RDN and CBAM in Fig. 2, and use it as a building module of MRDN.

III-C1 Basic Structure

Let $*$ denote the “Conv” layer function and $\max$ denote the “ReLU” layer function. The residual block model is the composition of two cascaded functions:

\displaystyle\mathbf{z}_{-1}=W_{n,r}*\mathbf{x}+b_{n,r},

(14)

\displaystyle\mathbf{z}_{0}=\max\left(0,\mathbf{z}_{-1}\right),

(15)

where the weight and bias matrices of the $n$ th residual block are denoted by $\Theta_{n,r}=\{W_{n,r},b_{n,r}\},n\in\{1,2,\ldots,\mathrm{B}\}$ . $\mathbf{x}$ and $\mathbf{z}_{0}$ are the input and output of the residual block, respectively. Let $g_{n}$ denote the single recursion function of the $n$ th residual block.

III-C2 Residual Dense Network Structure

RDN performs well in addressing image denoising problems. Motivated by many recent image restoration networks including RDN, we include the global residual connection such that the network can focus on learning the difference between the noisy and ground-truth channel matrices. The main body of RDN has $\mathrm{B}$ layers. The recurrence relation of the main body for the $n$ th layer is $F_{1}=g_{1}(\mathbf{Y})$ and

\displaystyle F_{n}=g_{n}(F_{n-1}(\mathbf{Y}),\cdots,F_{1}(\mathbf{Y}),\mathbf{Y}),\forall n\in\{2,\ldots,\mathrm{B}\}.

(16)
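The dense connectivity in (16) means that each block receives the outputs of all earlier blocks together with the input. The sketch below shows only this wiring. The `make_layer` helper is a hypothetical stand-in for a Conv plus ReLU block that sums its inputs, scales them and applies ReLU. It is not the layer used in this work.

```python
def dense_forward(y, layers):
    """Apply F_1 = g_1(Y) and F_n = g_n(F_{n-1}, ..., F_1, Y)."""
    features = [y]                        # newest first: [F_{n-1}, ..., F_1, Y]
    for g in layers:
        features.insert(0, g(features))   # each block sees every earlier output
    return features[0], features


def make_layer(weight, bias):
    """Toy block: ReLU of a scaled elementwise sum of all its inputs."""
    def g(inputs):
        summed = [sum(vals) for vals in zip(*inputs)]
        return [max(0.0, weight * s + bias) for s in summed]
    return g


# Toy usage with a two-element input and two blocks.
y = [1.0, -2.0]
out, feats = dense_forward(y, [make_layer(1.0, 0.0), make_layer(0.5, 0.0)])
```

In a convolutional implementation the earlier outputs would typically be concatenated along the feature dimension rather than summed. The sum is used here only to keep the example short.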

III-C3 Convolutional Block Attention Module

\displaystyle\mathbf{z}_{-1}=W_{-1,a}*\mathbf{x}+b_{-1,a},

(17)

\displaystyle\mathbf{z}_{0}=\max\left(0,\mathbf{z}_{-1}\right),

(18)

\displaystyle\mathbf{z}_{1}=W_{1,a}*\mathbf{z}_{0}+b_{1,a},

(19)

where the weight and bias matrices constitute the CBAM parameters $\Theta_{a}=\{W_{-1,a},W_{1,a},b_{-1,a},b_{1,a}\}$ . $\mathbf{x}$ and $\mathbf{z}_{1}$ are the input and output of the CBAM. The recurrence relation of CBAM for MRDN is $A(\mathbf{x})=c\circ r\circ c(\mathbf{x})$ .

III-C4 Input Layer

As the real and imaginary parts of the received signal matrix ${\mathbf{y}_{k}}\in\mathbb{C}^{N_{b}\times N_{u}}$ are independent at the BS, we first combine them into a super matrix $\mathbf{Y}\in\mathbb{R}^{N_{b}\times 2N_{u}}$ . In this case, the channel matrix can be treated as a 2D image and the super matrix $\mathbf{Y}$ is the input of MRDN.

III-C5 Multiple Residual Dense Network Structure

We take advantage of novel ideas in RDN and RCAN as follows.

•

RDN itself is an image restoration network, but we use it with modifications as a component of our network and construct a cascaded structure of $N_{R}$ RDNs as our image denoising network.
•

The recurrence relation of main body for RDN is

\displaystyle M(\mathbf{x})=F_{n,N_{R}}\circ F_{n,N_{R}-1}\circ\cdots\circ F_{n,1}(\mathbf{x}),

(20)

\displaystyle F(\mathbf{x})=M\circ A(\mathbf{x})=F_{n}^{N_{R}}\circ A(\mathbf{x}),

(21)

where the operator $\circ$ denotes function composition and $F_{n}^{N_{R}}$ denotes the $N_{R}$ -fold composition of $F_{n}$ . The middle output is $\mathbf{{H}}_{\mathrm{m}}=F(\mathbf{Y})$ , where $F:\mathbb{R}^{N_{b}\times 2N_{u}}\mapsto\mathbb{R}^{N_{b}\times 2N_{u}}$ is the mapping function for MRDN.
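The order of operations in (20) and (21) can be checked with a small composition helper: the attention module $A=c\circ r\circ c$ is applied first, followed by $N_{R}$ cascaded RDNs. The scalar functions used for `conv`, `relu` and `rdn` are toy stand-ins chosen so the result is easy to verify by hand. They are assumptions for illustration, not the actual layers.

```python
from functools import reduce


def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))): the rightmost function runs first."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(funcs), x)


def repeat(f, n):
    """n-fold composition of f with itself (identity when n == 0)."""
    return compose(*([f] * n))


# Toy scalar stand-ins for the real layers.
conv = lambda x: 2.0 * x - 1.0
relu = lambda x: max(0.0, x)
rdn = lambda x: x + 0.5

attention = compose(conv, relu, conv)       # A = c o r o c
mrdn = compose(repeat(rdn, 3), attention)   # F = F_n^{N_R} o A with N_R = 3

result = mrdn(1.0)
```

Here `attention(1.0)` evaluates to `1.0`, and three applications of `rdn` then add `1.5`, so the attention module acts on the input before any RDN does.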

III-C6 Computational Complexity Analysis

The computational complexity of the training phase in CBDNet is given by

\displaystyle\mathcal{O}(N^{2}{K^{2}}st({L_{d}}{D_{l}}^{2}+{L_{e}}{E_{l}}^{2})),

(22)

where $s$ denotes the size of the mini-batch, $t$ denotes the number of iterations, and ${K^{2}}$ denotes the size of the kernels. ${L_{d}}$ and ${L_{e}}$ denote the number of “Conv” layers for $\textrm{DNN}_{D}$ and $\textrm{DNN}_{E}$ , and ${D_{l}}$ and ${E_{l}}$ denote the number of features for the $l$ th layer of $\textrm{DNN}_{D}$ and $\textrm{DNN}_{E}$ , respectively. The computational complexities of the training phase in GAN-CBD and MRDN are given by

\displaystyle\mathcal{O}(N^{2}{K^{2}}st({L_{g,d}}{D_{g,l}}^{2}+{L_{g,e}}{E_{g,l}}^{2}+{L_{a}}{E_{a}}^{2})),

(23)

and

\displaystyle\mathcal{O}(N^{2}{K^{2}}st{L_{m}}^{2}{D_{m}}^{2}).

(24)

IV Simulation Result

We consider the RIS-aided mmWave massive MIMO system with 20 UEs, where $N_{b}=64$ , $N_{u}=32$ , $N=4096$ , $L=3$ and $d=\lambda/2$ . In terms of hardware, we use an Intel Core i7-9700K @3.60GHz, 32 GB RAM and an NVIDIA GeForce RTX 2080Ti to implement the above three models with the PyTorch library. From the perspective of normalized mean square error (NMSE) performance, this section illustrates the pros and cons of the three proposed channel estimators in terms of structure. All simulation results are obtained in PyCharm Community Edition (Python 3.8 environment). The learning rate is set to 0.0001 for MRDN and 0.001 for CBDNet and GAN-CBD, and the mini-batch size is 20 for all three methods. The training, validation, and testing sets include 16,000, 6,000, and 8,000 samples, respectively, and the three methods use the same data set samples. MRDN uses 6 RDNs, while CBDNet and GAN-CBD each use 12 residual blocks. MRDN has 80 features, while CBDNet and GAN-CBD have 96 features. Note that NMSE is defined as

\text{NMSE}=\mathbb{E}\left({{{\left\|{\widehat{\mathbf{H}}-\mathbf{H}}\right\|^{2}}}/{{\left\|{\widehat{\mathbf{H}}}\right\|^{2}}}}\right).

(25)
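A minimal sketch of how the metric in (25) can be evaluated over a set of test channels is given below. It follows (25) as written, normalizing by the energy of the estimate. The optional flag for normalizing by the true channel instead, which is another common convention, and all function names are assumptions added for illustration.

```python
import math


def frob_sq(a):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(abs(v) ** 2 for row in a for v in row)


def diff(a, b):
    return [[u - v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def nmse(estimates, truths, normalize_by_truth=False):
    """Average of ||H_hat - H||^2 / ||H_hat||^2 over all samples, as in (25)."""
    ratios = []
    for h_hat, h in zip(estimates, truths):
        denom = frob_sq(h) if normalize_by_truth else frob_sq(h_hat)
        ratios.append(frob_sq(diff(h_hat, h)) / denom)
    return sum(ratios) / len(ratios)


def to_db(value):
    """Convert a linear NMSE value to decibels."""
    return 10.0 * math.log10(value)


# Toy usage with a single 1 x 1 "channel".
value = nmse([[[2 + 0j]]], [[[1 + 0j]]])
value_truth = nmse([[[2 + 0j]]], [[[1 + 0j]]], normalize_by_truth=True)
```

The two conventions differ whenever the estimate and the true channel have different energies, so results should only be compared when the same normalization is used.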

Figure 4 compares the three models, namely MRDN, CBDNet and GAN-CBD. We find that MRDN achieves the best NMSE performance and the fastest convergence. Because GAN-CBD benefits from the discriminator judging the network output, it shows better performance than CBDNet. The average running time of MRDN is 0.0075 s, while those of CBDNet and GAN-CBD are 0.0094 s and 0.0098 s, respectively, so the computational complexity of training and offline operation for MRDN is reduced compared with CBDNet and GAN-CBD. Also, the robustness of the channel estimator to different scenarios is enhanced. For almost the same computational complexity, GAN-CBD achieves better NMSE performance and faster convergence than CBDNet. Compared with MRDN, however, this improvement in network structure is not significant.

Figure 5 compares the NMSE performance of the proposed MRDN-based channel estimator with other network structures (e.g., CBDNet [14], GAN-CBD, CV-DnCNN [15]) and with existing conventional channel estimation methods (e.g., ADMM [16], PAPRFAC [17]). The simulation results are averaged over 300 iterations for the three proposed methods. It can be observed that MRDN achieves better NMSE performance than GAN-CBD and CBDNet, by 5.63 dB and 4.51 dB, respectively. Compared with CV-DnCNN, which is also based on CNN, as well as the conventional ADMM and PAPRFAC, and beyond the NMSE comparison itself, the lower complexity of MRDN makes it more practical to apply.

Figure 6 compares the NMSE performance for different numbers of features and RDNs. With more RDNs for the global residual dense connection, and more comprehensive receptive fields, the MRDN with 80 features and 6 dense connections for RDN performs better. Consequently, the main challenge in accurately describing noise is the limited observational dimensions and modeling capabilities of neural networks, such as the numbers of features and layers.

V Conclusion

We proposed CBDNet, GAN-CBD and MRDN based cascaded channel estimators for RIS-aided mmWave massive MIMO communication systems. Utilizing the sparsity of the cascaded RIS channels and classic image processing techniques, we regard the channel matrix as a two-dimensional image. The proposed residual dense network structure increases the flexibility of the overall network to obtain better generalization and fitting capabilities, while the advantages brought by the GAN structure are not significant. Building on these advantages, the MRDN-based deep learning network is designed to estimate the cascaded RIS channels. The simulation results show that the performance of the proposed MRDN estimator improves as the scale of the network structure increases, under the same order of complexity as CBDNet and GAN-CBD.

Appendix A Forward and Backward Propagation

Assume that the weight matrix $W_{i}\in\mathbb{R}^{n_{i+1}\times n_{i}}$ and the bias vector $b_{i}\in\mathbb{R}^{n_{i+1}}$ are the parameters of the $i$ th “Conv” layer $c_{i}$ . In the multilayer perceptron (MLP), we can explicitly write $c_{i}\left(\mathbf{x}_{i};W_{i},b_{i}\right)=r_{i}\left(W_{i}\cdot\mathbf{x}_{i}+b_{i}\right)$ , where $r_{i}$ is an elementwise function. For ReLU, $r_{i}(\mathbf{x})\equiv\max(0,\mathbf{x})$ and its first derivative is $r_{i}^{\prime}(\mathbf{x})=H(\mathbf{x})$ , where $H$ is the Heaviside step function. For any $\mathbf{x}_{i}\in\mathbb{R}^{n_{i}}$ and any matrix $U_{i}\in\mathbb{R}^{n_{i+1}\times n_{i}}$ in the inner product space,

\displaystyle\nabla_{W_{i}}c_{i}\left(\mathbf{x}_{i}\right)\cdot U_{i}=\mathrm{D}\mit\Psi_{i}\left(\mathbf{z}_{i}\right)\cdot U_{i}\cdot\mathbf{x}_{i},

(26)

\displaystyle\nabla_{b_{i}}c_{i}\left(\mathbf{x}_{i}\right)=\mathrm{D}\mit\Psi_{i}\left(\mathbf{z}_{i}\right),

(27)

where $\mathbf{z}_{i}=W_{i}\cdot\mathbf{x}_{i}+b_{i}$ , $\mit\Psi(v)=\sum_{k=1}^{n}\psi\left(v_{k}\right)e_{k}$ and

\displaystyle\mathrm{D}f(x;\theta)\cdot v=\left.\frac{\mathrm{d}}{\mathrm{d}t}f(x+tv;\theta)\right|_{t=0}.

(28)

The loss function in the MLP is

\displaystyle J(\mathbf{x},\!\mathbf{z};\!\theta)\!=\!\frac{1}{2}\|\mathbf{z}\!\!-\!\!F(\mathbf{x};\!\theta)\|^{2}\!=\!\frac{1}{2}\langle\mathbf{z}\!\!-\!\!F(\mathbf{x};\!\theta),\mathbf{z}\!\!-\!\!F(\mathbf{x};\!\theta)\rangle.

(29)

Let $(\mathbf{x},\mathbf{z})\in E_{1}\times E_{L+1}$ be a network input-output pair,

\displaystyle\nabla_{W_{i}}J(\mathbf{x},\mathbf{z};\theta)=\left[\mit\Psi_{i}^{\prime}\left(\mathbf{z}_{i}\right)\odot\left(\mathrm{D}^{*}\omega_{i+1}\left(\mathbf{x}_{i+1}\right)\cdot e\right)\right]\mathbf{x}_{i}^{T},

(30)

\displaystyle\nabla_{b_{i}}J(\mathbf{x},\mathbf{z};\theta)=\mit\Psi_{i}^{\prime}\left(\mathbf{z}_{i}\right)\odot\left(\mathrm{D}^{*}\omega_{i+1}\left(\mathbf{x}_{i+1}\right)\cdot e\right),

(31)

where $\mathbf{x}_{i}\!=\!\alpha_{i-1}(\mathbf{x})$ , and the prediction error is $e\!=\!F(\mathbf{x};\!\theta)-\mathbf{z}$ ,

\displaystyle F(\mathbf{x};\theta)=\left(c_{L}\circ\cdots\circ c_{1}\right)(\mathbf{x}),

(32)

for all $i\in[L]$ and $\theta\in\{W_{i},b_{i}\}$ ,

\displaystyle\nabla_{\theta_{i}}J(\mathbf{x},\mathbf{z};\theta)=\nabla_{\theta_{i}}^{*}c_{i}\left(\mathbf{x}_{i}\right)\cdot\mathrm{D}^{*}\omega_{i+1}\left(\mathbf{x}_{i+1}\right)\cdot e.

(33)
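For a single ReLU layer, where the backpropagated term reduces to the prediction error $e$ , the gradients in (30) and (31) become $\left[r^{\prime}(\mathbf{z}_{1})\odot e\right]\mathbf{x}^{T}$ and $r^{\prime}(\mathbf{z}_{1})\odot e$ . The sketch below computes them for a toy layer so that they can be compared against finite differences of the loss in (29). The numerical values and function names are illustrative assumptions.

```python
def affine(w, x, b):
    """Pre-activation z = W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, t) for t in v]


def heaviside(v):
    """Derivative of ReLU away from the kink at zero."""
    return [1.0 if t > 0 else 0.0 for t in v]


def loss(w, b, x, target):
    """J = 0.5 * ||target - F(x)||^2 for a single ReLU layer F."""
    out = relu(affine(w, x, b))
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, out))


def gradients(w, b, x, target):
    """Analytic gradients: grad_b = r'(z) * e and grad_W = grad_b x^T."""
    pre = affine(w, x, b)
    err = [o - t for o, t in zip(relu(pre), target)]      # e = F(x) - z
    grad_b = [h * e for h, e in zip(heaviside(pre), err)]
    grad_w = [[d * xj for xj in x] for d in grad_b]
    return grad_w, grad_b


# Toy layer: the first unit is inactive, the second one is active.
w = [[0.5, -0.3], [0.2, 0.8]]
b = [-0.4, -0.2]
x = [1.0, 2.0]
target = [1.0, 1.0]
grad_w, grad_b = gradients(w, b, x, target)
```

The inactive unit receives a zero gradient because its ReLU derivative is zero, while the active unit receives the error scaled by the input.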

The generic layer of a CNN is a parameter-dependent map that takes as input an $m_{1}$ -channeled tensor, where each channel is a matrix of size $n_{1}\times\ell_{1}$ , and outputs an $m_{2}$ -channeled tensor, where each channel is a matrix of size $n_{2}\times\ell_{2}$ . The parameters are $W\in\mathbb{R}^{p\times q}\otimes\mathbb{R}^{m_{2}}$ and the input is $x\in\mathbb{R}^{n_{1}\times\ell_{1}}\otimes\mathbb{R}^{m_{1}}$ . With $\left\{e_{j}\right\}_{j=1}^{m_{1}}$ denoting an orthonormal basis for $\mathbb{R}^{m_{1}}$ and $\left\{\bar{e}_{j}\right\}_{j=1}^{m_{2}}$ denoting an orthonormal basis for $\mathbb{R}^{m_{2}}$ , $\mathbf{x}$ and $W$ can be written as

\displaystyle\mathbf{x}=\sum_{j=1}^{m_{1}}\mathbf{x}_{j}\otimes e_{j},\quad W=\sum_{j=1}^{m_{2}}W_{j}\otimes\bar{e}_{j}.

(34)

The convolution operator $C$ can be written as

\displaystyle C(W,\mathbf{x})=\sum_{j=1}^{m_{2}}c_{j}(W,\mathbf{x})\otimes\bar{e}_{j},

(35)

where $c_{j}$ is a bilinear operator that defines the mechanics of the convolution:

\displaystyle c_{j}(W,\mathbf{x})=\sum_{k=1}^{\widehat{n}_{1}}\sum_{l=1}^{\widehat{\ell}_{1}}\left\langle W_{j},\mathcal{K}_{\gamma(k,l,\Delta)}(\mathbf{x})\right\rangle\widehat{E}_{k,l},

(36)

where $\mathcal{K}$ is the cropping operator that defines the action of the convolution, $\gamma(k,l,\Delta)=(1+(k-1)\Delta,1+(l-1)\Delta)$ indexes the cropped window, and $\Delta$ is the stride of the convolution. The generic layer $c_{i}$ is

\displaystyle c_{i}\left(\mathbf{x}_{i}\right)=\mit\Psi_{i}\left(C_{i}\left(W_{i},\mathbf{x}_{i}\right)\right).

(37)
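The construction in (35)-(37) can be sketched in a few lines of plain Python. The sketch makes several simplifying assumptions that are not stated in the text: a single input channel ($m_{1}=1$), no padding, $\widehat{E}_{k,l}$ read as the basis matrix that places a value at output position $(k,l)$, and ReLU as the activation $\mit\Psi_{i}$. All function names are hypothetical.

```python
def crop(x, top, left, p, q):
    """Cropping operator K: the p-by-q window of x with 0-based corner (top, left)."""
    return [row[left:left + q] for row in x[top:top + p]]

def inner(a, b):
    """Frobenius inner product of two equally sized matrices."""
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))

def conv_channel(W_j, x, stride):
    """One output channel c_j(W, x) as in (36), without padding."""
    p, q = len(W_j), len(W_j[0])
    n_out = (len(x) - p) // stride + 1
    l_out = (len(x[0]) - q) // stride + 1
    # gamma(k, l, stride) = (1 + (k-1)*stride, 1 + (l-1)*stride) in 1-based
    # indexing, which is (k*stride, l*stride) with 0-based k and l.
    return [[inner(W_j, crop(x, k * stride, l * stride, p, q))
             for l in range(l_out)] for k in range(n_out)]

def conv_layer(W, x, stride, act=lambda v: max(v, 0.0)):
    """Generic layer (37): activation applied to every output channel."""
    return [[[act(v) for v in row] for row in conv_channel(W_j, x, stride)]
            for W_j in W]

x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 16.0]]
W = [[[1.0, 0.0], [0.0, -1.0]],   # filter W_1
     [[1.0, 1.0], [1.0, 1.0]]]    # filter W_2
pre = [conv_channel(W_j, x, stride=2) for W_j in W]
out = conv_layer(W, x, stride=2)
print(pre)  # [[[-5.0, -5.0], [-5.0, -5.0]], [[14.0, 22.0], [46.0, 54.0]]]
print(out)  # [[[0.0, 0.0], [0.0, 0.0]], [[14.0, 22.0], [46.0, 54.0]]]
```

With stride 2 and $2\times 2$ filters, the four windows do not overlap, so each output entry depends on a disjoint block of the input.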

For the “Conv” layer, the adjoint derivatives with respect to the weights and the input are

\displaystyle\nabla_{W_{i}}^{*}c_{i}(\mathbf{x}_{i})=(C_{i}\llcorner\mathbf{x}_{i})^{*}\mathrm{D}C_{i}(W_{i},\mathbf{x}_{i})\mathrm{D}^{*}\mit\Psi_{i}(C_{i}(W_{i},\mathbf{x}_{i})).

(38)

\displaystyle\mathrm{D}^{*}c_{i}(\mathbf{x}_{i})=(W_{i}\lrcorner C_{i})^{*}\mathrm{D}\mit\Psi_{i}(C_{i}(W_{i},\mathbf{x}_{i}))\mathrm{D}^{*}\mit\Psi_{i}(C_{i}(W_{i},\mathbf{x}_{i})).

(39)

where $(e_{1}\lrcorner B)\cdot e_{2}=B(e_{1},e_{2})$ and $(B\llcorner e_{2})\cdot e_{1}=B(e_{1},e_{2})$. Given a learning rate $\eta\in\mathbb{R}_{+}$, the gradient used in the gradient descent step of backpropagation is

\displaystyle\nabla_{W_{i}}J(\mathbf{x},\mathbf{z};\theta)\leftarrow(C_{i}\llcorner\mathbf{x}_{i})^{*}\mathrm{D}C_{i}\left(W_{i},\mathbf{x}_{i}\right)\mathrm{D}^{*}\mit\Psi_{i}\left(\mathbf{z}_{i}\right)e_{i},

(40)

and the parameters are updated by $W_{i}\leftarrow W_{i}-\eta\nabla_{W_{i}}J(\mathbf{x},\mathbf{z};\theta)$. Owing to the chain rule and error backpropagation, these gradients can be computed efficiently layer by layer, which allows the high-dimensional neural network to achieve excellent results.
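The update rule above can be illustrated on a single convolutional filter. The sketch below is a simplified instance under stated assumptions, not the training procedure used in the paper: one input channel, one filter, stride 1, no padding, and an identity activation, so that the $\mathrm{D}^{*}\mit\Psi_{i}$ factor drops out and the weight gradient reduces to the adjoint of the convolution applied to the error, that is, the sum of the cropped windows weighted by the error at each output position. The data, the learning rate `eta`, and all function names are hypothetical.

```python
def patches(x, p, q, stride):
    """Yield (k, l, window) for every p-by-q crop of x at the given stride."""
    n_out = (len(x) - p) // stride + 1
    l_out = (len(x[0]) - q) // stride + 1
    for k in range(n_out):
        for l in range(l_out):
            top, left = k * stride, l * stride
            yield k, l, [row[left:left + q] for row in x[top:top + p]]

def conv(W, x, stride=1):
    """Convolution output as a dict mapping (k, l) to a scalar."""
    p, q = len(W), len(W[0])
    return {(k, l): sum(W[a][b] * win[a][b] for a in range(p) for b in range(q))
            for k, l, win in patches(x, p, q, stride)}

def loss(W, x, z, stride=1):
    """J = 1/2 * ||z - C(W, x)||^2 with an identity activation."""
    y = conv(W, x, stride)
    return 0.5 * sum((z[key] - y[key]) ** 2 for key in y)

def grad_W(W, x, z, stride=1):
    """Adjoint of the convolution applied to the error e = C(W, x) - z."""
    p, q = len(W), len(W[0])
    y = conv(W, x, stride)
    g = [[0.0] * q for _ in range(p)]
    for k, l, win in patches(x, p, q, stride):
        e = y[(k, l)] - z[(k, l)]
        for a in range(p):
            for b in range(q):
                g[a][b] += e * win[a][b]
    return g

def descent_step(W, g, eta):
    """W <- W - eta * grad, applied entry by entry."""
    return [[w - eta * gw for w, gw in zip(rw, rg)] for rw, rg in zip(W, g)]

x = [[0.5, -1.0, 0.2], [0.8, 0.1, -0.3], [-0.6, 0.4, 0.9]]
W_true = [[1.0, -0.5], [0.25, 2.0]]
z = conv(W_true, x)                 # targets generated by a known filter
W = [[0.0, 0.0], [0.0, 0.0]]        # initial guess
eta = 0.05                          # learning rate (illustrative value)
losses = [loss(W, x, z)]
for _ in range(50):
    W = descent_step(W, grad_W(W, x, z), eta)
    losses.append(loss(W, x, z))
print(f"loss before: {losses[0]:.4f}, loss after 50 steps: {losses[-1]:.6f}")
```

Because the loss is quadratic in the filter for this linear case, a sufficiently small learning rate guarantees that each step does not increase the loss; with a nonlinear activation the same update applies, but that guarantee no longer holds in general.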

References

[1] C. Huang, A. Zappone, G. C. Alexandropoulos, M. Debbah, and C. Yuen, “Reconfigurable intelligent surfaces for energy efficiency in wireless communication,” IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 4157–4170, 2019.
[2] Q. Wu and R. Zhang, “Towards smart and reconfigurable environment: Intelligent reflecting surface aided wireless network,” IEEE Commun. Mag., vol. 58, no. 1, pp. 106–112, 2019.
[3] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and D. J. Love, “Prospective multiple antenna technologies for beyond 5G,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1637–1660, Aug. 2020.
[4] J. Lin, G. Wang, R. Fan, T. A. Tsiftsis, and C. Tellambura, “Channel estimation for wireless communication systems assisted by large intelligent surfaces,” arXiv preprint arXiv:1911.02158, 2019.
[5] J. Zhang, H. Du, Q. Sun, B. Ai, and D. W. K. Ng, “Physical layer security enhancement with reconfigurable intelligent surface-aided networks,” IEEE Trans. Inf. Forensic Secur., vol. 16, pp. 3480–3495, 2021.
[6] Z.-Q. He and X. Yuan, “Cascaded channel estimation for large intelligent metasurface assisted massive MIMO,” IEEE Wireless Commun. Lett., vol. 9, no. 2, pp. 210–214, Feb. 2019.
[7] J. Chen, Y.-C. Liang, H. V. Cheng, and W. Yu, “Channel estimation for reconfigurable intelligent surface aided multi-user MIMO systems,” arXiv preprint arXiv:1912.03619, 2019.
[8] A. Taha, M. Alrabeiah, and A. Alkhateeb, “Deep learning for large intelligent surfaces in millimeter wave and massive MIMO systems,” in Proc. IEEE Globecom, 2019, pp. 1–6.
[9] J. Yang, C.-K. Wen, S. Jin, and F. Gao, “Beamspace channel estimation in mmWave systems via cosparse image reconstruction technique,” IEEE Trans. Commun., vol. 66, no. 10, pp. 4767–4782, Oct. 2018.
[10] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
[11] C. Liu, X. Liu, D. W. K. Ng, and J. Yuan, “Deep residual network empowered channel estimation for IRS-assisted multi-user communication systems,” in Proc. IEEE ICC, 2021, pp. 1–7.
[12] D. Kim, J. R. Chung, and S. Jung, “GRDN: Grouped residual dense network for real image denoising and GAN-based real-world noise modeling,” in Proc. CVF CVPRW, 2019, pp. 2086–2094.
[13] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proc. ECCV, 2018, pp. 3–19.
[14] Y. Jin, J. Zhang, B. Ai, and X. Zhang, “Channel estimation for mmWave massive MIMO with convolutional blind denoising network,” IEEE Commun. Lett., vol. 24, no. 1, pp. 95–98, Jan. 2019.
[15] S. Liu, Z. Gao, J. Zhang, M. Di Renzo, and M.-S. Alouini, “Deep denoising neural network assisted compressive channel estimation for mmWave intelligent reflecting surfaces,” IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 9223–9228, Aug. 2020.
[16] E. Vlachos, G. C. Alexandropoulos, and J. Thompson, “Massive MIMO channel estimation for millimeter wave systems via matrix completion,” IEEE Signal Process. Lett., vol. 25, no. 11, pp. 1675–1679, Nov. 2018.
[17] L. Wei, C. Huang, G. C. Alexandropoulos, C. Yuen, Z. Zhang, and M. Debbah, “Channel estimation for RIS-empowered multi-user MISO wireless communications,” IEEE Trans. Commun., vol. 69, no. 6, pp. 4144–4157, Jun. 2021.