Multiple Residual Dense Networks for Reconfigurable Intelligent Surfaces Cascaded Channel Estimation
Abstract
Reconfigurable intelligent surface (RIS) is a promising paradigm that relies on a programmable wireless environment and enables space-intensive communications by deploying massive numbers of low-cost reflecting elements over the surfaces of man-made structures. However, accurate channel estimation is a fundamental prerequisite for achieving the large performance gains promised by RIS. By leveraging the low-rank structure of RIS channels, three practical residual neural networks, namely a convolutional blind denoising network, a convolutional denoising generative adversarial network, and a multiple residual dense network, are proposed to obtain accurate channel state information and to show how different network designs affect estimation performance. Simulation results reveal the evolution direction of these three methods and their superior performance compared with existing benchmark schemes.
Index Terms:
Channel estimation, deep learning, multiple residual dense network, reconfigurable intelligent surface.
I Introduction
To meet the ultra-high data rate and ubiquitous coverage requirements of sixth-generation (6G) wireless networks, reconfigurable intelligent surface (RIS) aided massive multiple-input multiple-output (MIMO) is envisioned as a promising and innovative technique that can significantly reduce link blocking probability and system energy consumption while improving link quality through sophisticated beamforming [1, 2, 3]. RIS aided MIMO has been explored with a near-passive array to obtain green and sustainable communications between the user equipment (UE) and the base station (BS). By appropriately and dynamically adjusting the magnitude and phase response, wireless signals can be coherently combined and steered to desired directions [4, 5]. Each RIS reflective element can individually control the amplitude response and phase shift of the incident electromagnetic waves at the nanosecond level to achieve energy concentration. Through reflection, refraction, absorption, and transmission, the reshaped electromagnetic waves form new paths. Owing to these passive and low-cost characteristics of the reflective elements, the RIS system requires very little energy to improve the electromagnetic environment and extend coverage.
However, these performance gains rely on the assumption of perfect channel state information (CSI). Unfortunately, the above works assume perfect CSI but do not consider the difficulty of obtaining it. First, it is quite difficult to estimate the RIS-UE and RIS-BS channels separately unless the RIS is equipped with radio frequency (RF) chains. Second, the cascaded channel of the RIS between the BS and the UE can be very high-dimensional due to the massive number of reflecting elements. When RIS elements are connected to RF chains, channel estimation can be performed with acceptable performance through compressed sensing (CS) based methods. However, owing to their extremely low deployment, hardware, and communication costs, purely passive RIS reflecting elements are undoubtedly more attractive.
By assuming active reflection patterns to achieve a smaller active array size and reduce hardware complexity, a conventional least squares (LS) method has been proposed. In addition, by using the low-rank characteristics of the MIMO channel, the training overhead can be reduced through sparse matrix decomposition. Considering the sparse representation of cascaded channels, a CS method is proposed in [6]. Furthermore, as the UE and BS are equipped with more antennas, the channel estimation complexity increases sharply. Using the angular-domain channel sparsity, a CS-based channel estimation scheme is proposed in [7]. However, the difference in sparsity structure between different channels causes performance loss. Moreover, deep learning (DL) was proposed to predict the optimal RIS phase shift matrices [8], but obtaining accurate CSI remains important.
In the field of image denoising, convolutional neural network (CNN) based methods construct pairs of training images by adding synthetic noise to noise-free images [9]. Considering the similarity between image denoising and channel estimation, a deep residual learning approach was proposed to learn the cascaded channels from noisy pilot-based observations [10, 11]. Also, a new architecture called Multiple Residual Dense Network (MRDN) has been proposed and has received great attention [12]. In particular, this architecture uses the Residual Dense Network (RDN) as a component.
In this correspondence, we propose three practical residual neural networks for cascaded channel estimation. The main contributions are as follows. First, a convolutional blind denoising network (CBDNet) and a generative adversarial network based convolutional blind denoising network (GAN-CBD) are proposed to obtain accurate CSI by exploiting offline-trained neural networks. Second, a multiple residual dense network (MRDN) is proposed to flexibly adapt to online cascaded channel estimation. Finally, numerical results confirm that the proposed methods significantly outperform existing schemes such as ADMM and CV-DnCNN, as well as CBDNet.
II System Model

We begin by considering the uplink of a time division duplex (TDD) RIS-aided mmWave communication system. As shown in Fig. 1, we consider one RIS, one controller, one base station (BS) equipped with antennas and user equipments (UEs) equipped with antennas for the mmWave RIS-aided MIMO system. We assume that the planar RIS is equipped with passive reflecting elements, where and denote the numbers of unit elements of the RIS in the horizontal and vertical orientations. We define as the channels from the th UE to the RIS and the BS to the RIS, and as the direct channel between the th UE and the BS, respectively. represents an complex-valued matrix. Then we can express the received signal as
(1) |
where denotes the noise vector at the BS and is the noise power at each antenna, denotes the pilot matrix for the th UE, denotes the orthogonal pilot sequence sent by the th antenna of the th UE (, if or ; , if and , ). When transmitting the pilots, all antennas of each UE adopt different pilot sequences. In particular, each pilot is allocated to only one UE, resulting in an orthogonal pilot matrix. We consider a simple model in which one or more users in each slot have a different optimal RIS phase shift matrix. Therefore, the RIS phase shift matrix represents the phase shift introduced by the RIS to the impinging signal from the transmitter in the th time slot. In addition, , with representing the effective phase shifts of the RIS reflecting elements and its th element is . Without loss of generality, we can assume .
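As an illustration of the orthogonality requirement on the pilots, the following minimal sketch builds a pilot matrix from rows of a unitary DFT matrix and checks that correlating a noiseless observation with the conjugate pilots recovers the effective channel. The DFT construction, the function name `dft_pilots`, and all dimensions are assumptions made for illustration; the text does not specify how the pilot sequences are generated.

```python
import numpy as np


def dft_pilots(num_streams: int, pilot_len: int) -> np.ndarray:
    """Return a (num_streams x pilot_len) matrix whose rows are orthonormal.

    Rows are taken from a unitary DFT matrix (an assumed, illustrative choice).
    """
    if pilot_len < num_streams:
        raise ValueError("pilot_len must be at least num_streams")
    n = np.arange(pilot_len)
    dft = np.exp(-2j * np.pi * np.outer(n, n) / pilot_len) / np.sqrt(pilot_len)
    return dft[:num_streams, :]


rng = np.random.default_rng(0)
pilots = dft_pilots(num_streams=4, pilot_len=8)

# Orthonormal rows: S S^H equals the identity.
gram = pilots @ pilots.conj().T

# Hypothetical effective channel (16 receive antennas, 4 pilot streams).
h_eff = rng.standard_normal((16, 4)) + 1j * rng.standard_normal((16, 4))
received = h_eff @ pilots                  # noiseless pilot observation
recovered = received @ pilots.conj().T     # correlate with the pilots
```

With noise added to `received`, the same correlation step yields a noisy observation of the effective channel, which is the kind of input the denoising networks operate on.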
By exploiting and for simplifying the designing and analysing of the channel estimation algorithms in this work, we assume that there is no direct link between UE and BS due to blockages or negligible receive power, then, the processed received signal of the th UE at the BS is given by
(2) |
Since practical mmWave channels usually have a limited number of scatterers, a line-of-sight (LoS) path is expected in RIS systems. The mmWave channels from the th UE to the RIS and from the BS to the RIS are, respectively, given as
(3) |
(4) |
where denotes the number of multipaths, denotes the distance-dependent pathloss of the in the th path. denotes the elevation (azimuth) angle-of-arrival of the th path for both and . The steering vectors depend on the array geometry. For the typical channel and , variables and are given by
(5) | ||||
(6) | ||||
where denotes the wavelength, denotes the antenna spacing, and is the Kronecker product. denotes the elevation (azimuth) angle-of-departure of the th path for both and . and denote the steering vectors at the sender side and the receive side, respectively.
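To make the role of the Kronecker product concrete, the sketch below builds a planar-array steering vector as the Kronecker product of two uniform-linear-array steering vectors. The angle convention, the half-wavelength default spacing, the normalization, and the function names are assumptions for illustration and may differ from the exact definitions in (5) and (6).

```python
import numpy as np


def ula_steering(num_elements: int, spatial_freq: float) -> np.ndarray:
    """Unit-norm steering vector of a uniform linear array.

    spatial_freq is the per-element phase increment in cycles, for example
    (d / wavelength) * sin(angle).
    """
    n = np.arange(num_elements)
    return np.exp(1j * 2 * np.pi * spatial_freq * n) / np.sqrt(num_elements)


def upa_steering(n_h: int, n_v: int, azimuth: float, elevation: float,
                 spacing_over_wavelength: float = 0.5) -> np.ndarray:
    """Steering vector of an n_h x n_v planar array (assumed angle convention)."""
    u = spacing_over_wavelength * np.sin(azimuth) * np.cos(elevation)
    v = spacing_over_wavelength * np.sin(elevation)
    # The Kronecker product combines the horizontal and vertical responses.
    return np.kron(ula_steering(n_h, u), ula_steering(n_v, v))


vec = upa_steering(n_h=4, n_v=2, azimuth=0.3, elevation=0.1)
```

Because each linear-array factor has unit norm, the resulting planar-array vector also has unit norm, whatever the angles.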
III Proposed Channel Estimation Methods
In this section, we introduce CBDNet, GAN-CBD and MRDN for the cascaded channel estimation of RIS systems. Leveraging the sparsity of the cascaded mmWave channel, we first apply a CBDNet-based method to cascaded channel estimation in line with previous works. We then use the GAN structure to improve this network. Finally, the proposed MRDN combines the residual dense network (RDN) structure with the convolutional block attention module (CBAM) [13], which serves as a building block and can obtain accurate CSI for the sparse cascaded channel. Compared with existing baseline schemes, MRDN can reduce the complexity of RIS hardware. In the following, we present the CBDNet, GAN-CBD and MRDN channel estimators. Throughout this correspondence, and represent the input and output of the universal layers and networks, respectively.
III-A CBDNet-based Channel Estimator
and denote the noise level estimation subnetwork and the non-blind denoising subnetwork, respectively. and are the network parameters of and , respectively.
III-A1 Basic Structure
Assuming that denotes Conv function, as and are the input and output of the th Conv layer, the mathematical deduction for convolutional layer is
(7) |
where the weight and bias matrices and are the th Conv parameters. , for “Conv” layers, , for “SoftMax” layers. Assuming that denotes “ReLU” layer function, the mathematical deduction for “ReLU” layer is
(8) |
count as for “ReLU” layers.
III-A2 Noise Level Estimation Subnetwork
• Input Layer: As the real and imaginary parts of the received signal matrix are independent at the BS, we first combine them into a super matrix as the input of . represents an real-valued matrix.
• Convolutional sensing: The consists of Conv layers and SoftMax layers. The recurrence relation of main body for is
(9) where the operator denotes a function composition, denotes the noise level for the space-invariant AWGN, is a uniform map where all elements are , . The is the mapping function for .
III-A3 Non-Blind Denoising Subnetwork
• Input Layer: The takes both and as input to obtain the estimated channel .
• Residual Blocks: The consists of residual blocks , then, the recurrence relation of main body for is
(10) The middle output , where is the mapping function for stacking residual blocks.
• Output Layer: By reversing the combining, the middle output of produces the estimated channel matrix .
• Loss Function: In asymmetric learning, the noise level is estimated to improve the loss function, to quantify the effectiveness of criterion. The loss function is denoted as
(11) Given the estimated noise level and the truth , more penalty is incorporated into their MSE when .
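The idea of penalizing one sign of the noise-level error more heavily can be sketched as follows. This follows the asymmetric loss of the original CBDNet image-denoising work, where under-estimation of the noise level is weighted more; whether (11) uses exactly this form, and the value `alpha = 0.3`, are assumptions, and the full training loss may contain additional terms not shown here.

```python
import numpy as np


def asymmetric_loss(sigma_hat: np.ndarray, sigma_true: np.ndarray,
                    alpha: float = 0.3) -> float:
    """Asymmetric squared error on the estimated noise-level map.

    With 0 < alpha < 0.5, entries where the noise level is under-estimated
    (sigma_hat < sigma_true) get weight 1 - alpha, all others get weight alpha.
    """
    diff = sigma_hat - sigma_true
    under = (diff < 0).astype(float)      # indicator of under-estimation
    weights = np.abs(alpha - under)
    return float(np.sum(weights * diff ** 2))


loss_under = asymmetric_loss(np.array([0.0]), np.array([1.0]))  # under-estimate by 1
loss_over = asymmetric_loss(np.array([2.0]), np.array([1.0]))   # over-estimate by 1
```

For the same absolute error, the under-estimated case is penalized more, which biases the noise-level subnetwork towards over-estimation.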
III-B GAN-based Channel Estimation
Motivated by the development of the generative adversarial network (GAN) technique, and using the previous CBDNet as the generator subnetwork, we develop GAN-CBD for denoising. The GAN paradigm generates samples through training and fitting as CBDNet works, and the results of GAN-CBD network compare with and label results, making the discriminator work well. Training can distinguish the training examples from the samples generated by , and undergoes the judgment of to reduce the possibility of samples being misclassified.
III-B1 Generator Network
In addition, in order to verify the effectiveness of GAN structure, we use CBDNet as the generator network. The consists of residual blocks. We have
(12) | ||||
where is the mapping function for the generator network. is a uniform map from , .
III-B2 Discriminator Network
In the original formulation, the training procedure defines a continuous minimax game as
(13) |
where is a function that maps to the unit interval, and is a function that maps a noise vector , drawn from a simple distribution , to the ambient space of the training data .
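As a numerical illustration, the sketch below evaluates an empirical estimate of the value function of the original GAN formulation, namely the mean of log D(x) over training examples plus the mean of log(1 - D(G(z))) over generated samples. The function name, the `eps` stabilizer, and the example discriminator outputs are assumptions for illustration, not the authors' training code.

```python
import numpy as np


def gan_value(d_real: np.ndarray, d_fake: np.ndarray, eps: float = 1e-12) -> float:
    """Empirical minimax value: mean log D(x) + mean log(1 - D(G(z))).

    d_real and d_fake hold discriminator outputs in (0, 1) for training
    examples and generated samples. The discriminator tries to increase this
    value, while the generator tries to decrease it.
    """
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))


# A discriminator that cannot tell real from generated samples outputs 0.5.
undecided = gan_value(np.full(4, 0.5), np.full(4, 0.5))
# A discriminator that separates the two sets well scores a higher value.
confident = gan_value(np.full(4, 0.99), np.full(4, 0.01))
```

The undecided case evaluates to -log 4, which is the value at the equilibrium of the original formulation.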
III-C MRDN-based Channel Estimation
We define this feature concatenation part of RDN and CBAM in Fig. 2, and use it as a building module of MRDN.
III-C1 Basic Structure
Let denote the “Conv” layer function and denote the “ReLU” layer function. The residual block model is the composition of two cascaded functions:
(14) |
(15) |
where the weight and bias matrices of the th residual block parameter are denoted by . and are the input and output of the residual block, respectively. Let denote the single recursion function of the th residual block.
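The residual block described above can be sketched in a few lines. The example assumes a single-channel Conv, ReLU, Conv branch with an identity skip connection and odd-sized kernels; the layer ordering, channel count, and function names are illustrative assumptions rather than the exact block in (14) and (15).

```python
import numpy as np


def conv2d_same(x: np.ndarray, kernel: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Single-channel 2D cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel) + bias
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x, k1, b1, k2, b2):
    """Identity skip connection around a Conv-ReLU-Conv branch."""
    branch = conv2d_same(relu(conv2d_same(x, k1, b1)), k2, b2)
    return x + branch


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
k1 = rng.standard_normal((3, 3)) * 0.1
k2 = rng.standard_normal((3, 3)) * 0.1
y = residual_block(x, k1, 0.0, k2, 0.0)
```

When the branch weights are zero the block reduces to the identity map, which is why stacking such blocks lets the network learn only the residual between the noisy input and the clean target.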


III-C2 Residual Dense Network Structure
RDN performs well on image denoising problems. Motivated by many recent image restoration networks including RDN, we include a global residual connection so that the network can focus on learning the difference between the noisy and ground-truth channel matrices. The main body of RDN has layers. The recurrence relation of main body for the th layer is and
(16) |
III-C3 Convolutional Block Attention Module
(17) |
(18) |
(19) |
where the weight and bias matrices constitute the CBAM parameter . and are the input and output of the CBAM. The recurrence relation of CBAM for MRDN is .


III-C4 Input Layer
As the real and imaginary parts of the received signal matrix are independent at the BS, we first combine them into a super matrix . In this case, the channel matrix can be treated as a 2D image, and the super matrix is the input of MRDN.
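One simple way to form such a real-valued input is to stack the real and imaginary parts as two image channels, as in the sketch below. The channel-first layout and the function names are assumptions; the text does not state whether the two parts are stacked as channels or concatenated along a matrix dimension.

```python
import numpy as np


def complex_to_real(y: np.ndarray) -> np.ndarray:
    """Stack real and imaginary parts as two channels: (M, N) -> (2, M, N)."""
    return np.stack([y.real, y.imag], axis=0)


def real_to_complex(x: np.ndarray) -> np.ndarray:
    """Reverse the stacking: (2, M, N) -> complex (M, N)."""
    return x[0] + 1j * x[1]


rng = np.random.default_rng(0)
y = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
x = complex_to_real(y)        # real-valued network input
y_back = real_to_complex(x)   # applied to the network output in practice
```

The reverse mapping is what an output layer would use to turn the denoised real-valued tensor back into a complex channel estimate.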
III-C5 Multiple Residual Dense Network Structure
We take advantage of novel ideas in RDN and RCAN as follows.
• RDN itself is an image restoration network, but we use it with modifications as a component of our network and construct a cascaded structure of RDNs as our image denoising network.
• The recurrence relation of main body for RDN is
(20) (21) where the operator denotes a function composition and denotes the -fold product of . The middle output , where is the mapping function for MRDN.
III-C6 Computational Complexity Analysis
The computational complexity of the training phase in CBDNet is given by
(22) |
where denotes the size of mini-batch, denotes the number of iterations, denotes the size of kernels. and denote the number of “Conv” for and , and denote the number of features for the th layer of and , respectively. The computational complexities of the training phase in GAN-CBD and MRDN are given by
(23) |
and
(24) |
IV Simulation Results

We consider the RIS-aided mmWave massive MIMO system with 20 UEs, where , , , and . In terms of hardware, we use an Intel Core i7-9700K @3.60GHz, 32 GB RAM and an NVIDIA GeForce RTX 2080Ti to implement the above three models with the PyTorch library. All simulation results are obtained in PyCharm Community Edition (Python 3.8 environment). This section compares the three proposed channel estimators in terms of normalized mean square error (NMSE) performance and discusses the structural pros and cons of each. The learning rate is set to 0.0001 for MRDN and 0.001 for CBDNet and GAN-CBD, and the mini-batch size is 20 for all three methods. The training, validation, and testing sets include 16,000, 6,000, and 8,000 samples, respectively, and the same data set is used for all three methods. The MRDN uses 6 RDNs, while CBDNet and GAN-CBD each use 12 residual blocks. The MRDN has 80 features, and CBDNet and GAN-CBD have 96 features. Note that NMSE is defined as
(25) |
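For reference, NMSE is commonly computed as the squared Frobenius norm of the estimation error divided by the squared Frobenius norm of the true channel, averaged over test samples and reported in dB. The sketch below implements this common form; averaging the per-sample ratios (rather than taking a ratio of averages) and the function name are assumptions, since the exact expression in (25) is not reproduced here.

```python
import numpy as np


def nmse_db(h_hat: np.ndarray, h_true: np.ndarray) -> float:
    """NMSE in dB for a batch of channel matrices with shape (batch, rows, cols)."""
    err = np.sum(np.abs(h_hat - h_true) ** 2, axis=(-2, -1))
    ref = np.sum(np.abs(h_true) ** 2, axis=(-2, -1))
    return float(10.0 * np.log10(np.mean(err / ref)))


h_true = np.ones((3, 4, 4), dtype=complex)
h_hat = 0.9 * h_true              # a 10 percent amplitude error on every entry
value = nmse_db(h_hat, h_true)    # relative error power is 0.01
```

A relative error power of 0.01 corresponds to -20 dB, so lower (more negative) values indicate better estimates.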
Figure 4 compares the three models: MRDN, CBDNet and GAN-CBD. The MRDN achieves the best NMSE performance and the fastest convergence. Because the GAN-CBD benefits from the discriminator network, it performs better than the CBDNet. The average running time of MRDN is 0.0075 s, while those of the CBDNet and the GAN-CBD are 0.0094 s and 0.0098 s, respectively, so the computational complexity of training and offline operation for the MRDN is reduced compared with the CBDNet and the GAN-CBD. The robustness of the channel estimator to different scenarios is also enhanced. For almost the same computational complexity, the GAN-CBD achieves better NMSE performance and faster convergence than the CBDNet. Compared with the MRDN, however, the improvement brought by this change of network structure is not significant.
Figure 5 compares the NMSE performance of the proposed MRDN-based channel estimator with other network structures (e.g., CBDNet [14], GAN-CBD, CV-DnCNN [15]) and with existing conventional channel estimation methods (e.g., ADMM [16], PAPRFAC [17]). The simulation results are averaged over 300 iterations for the three proposed methods. MRDN improves the NMSE over GAN-CBD and CBDNet by 5.63 dB and 4.51 dB, respectively. Compared with CV-DnCNN, which is also based on CNN, as well as the conventional ADMM and PAPRFAC, MRDN not only achieves a significant NMSE gain but also has lower complexity, which makes it easier to apply in practice.
Figure 6 compares the NMSE performance for different numbers of features and RDNs. With more RDNs for the global residual dense connection and more comprehensive receptive fields, the MRDN with 80 features and 6 dense connections for RDN performs better. Consequently, the main challenge in accurately describing noise is the limited observational dimensions and modeling capabilities of neural networks, such as the numbers of features and layers.
V Conclusion
We proposed CBDNet, GAN-CBD and MRDN based cascaded channel estimators for RIS-aided mmWave massive MIMO communication systems. Utilizing the sparsity of the cascaded RIS channels and classic image processing techniques, we regard the channel matrix as a two-dimensional image. The proposed residual dense network structure increases the flexibility of the overall network and yields better generalization and fitting capabilities, while the advantages brought by the GAN structure are not significant. Building on these advantages, the MRDN-based deep learning network is designed to estimate the cascaded RIS channels. The simulation results show that the performance of the proposed MRDN estimator improves as the scale of the network structure increases, under the same order of complexity as the CBDNet and the GAN-CBD.
Appendix A Forward and Backward Propagation
Assume that the weight matrix and the bias vector are the parameters at “Conv” layer . In the multilayer perceptron (MLP), we can explicitly write , where is an elementwise function and the definition for Conv is , as the first derivative is . For any and vectors in inner product space,
(26) |
(27) |
where , and
(28) |
The loss function gradients in MLP are
(29) |
Let be a network input-output pair,
(30) |
(31) |
where , and the prediction error is ,
(32) |
for all and ,
(33) |
The generic layer of a CNN is a parameter-dependent map that takes as input an -channeled tensor, where each channel is a matrix of size , and outputs an -channeled tensor, where each channel is a matrix of size . The parameters , the input . denotes an orthonormal basis for , and denotes an orthonormal basis for , the and is:
(34) |
And the convolution operator can be written as
(35) |
where is a bilinear operator that defines the mechanics of the convolution:
(36) |
where is the cropping operator that defines the action of convolution, , defines the stride of the Conv. The generic layer is
(37) |
For the “Conv” layer,
(38) |
(39) |
where , and . With the learning rate , the gradient descent step that updates the parameter in backpropagation is
(40) |
The parameters can be updated by . Owing to the application of the derivative chain rule and error backpropagation, the high-dimensional neural network demonstrates excellent results.
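The gradient descent update can be illustrated on the smallest possible case, a single linear layer with a squared-error loss, where the chain rule gives the gradients in closed form. The layer sizes, the learning rate of 0.01, and the function names are assumptions chosen for illustration; the networks in this correspondence are trained with PyTorch rather than with hand-written gradients.

```python
import numpy as np


def forward(weight, bias, x):
    return weight @ x + bias


def loss(weight, bias, x, target):
    err = forward(weight, bias, x) - target
    return 0.5 * float(err @ err)


def gradients(weight, bias, x, target):
    """Chain rule: dL/dW = err x^T and dL/db = err, with err = output - target."""
    err = forward(weight, bias, x) - target
    return np.outer(err, x), err


def sgd_step(weight, bias, x, target, lr):
    """One gradient descent step: parameter <- parameter - lr * gradient."""
    grad_w, grad_b = gradients(weight, bias, x, target)
    return weight - lr * grad_w, bias - lr * grad_b


rng = np.random.default_rng(0)
weight = rng.standard_normal((3, 4))
bias = np.zeros(3)
x = np.ones(4)
target = np.zeros(3)

before = loss(weight, bias, x, target)
weight, bias = sgd_step(weight, bias, x, target, lr=0.01)
after = loss(weight, bias, x, target)
```

In this toy case each step scales the error by 1 - lr * (||x||^2 + 1) = 0.95, so the loss shrinks by a factor of 0.9025 per step.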
References
- [1] C. Huang, A. Zappone, G. C. Alexandropoulos, M. Debbah, and C. Yuen, “Reconfigurable intelligent surfaces for energy efficiency in wireless communication,” IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 4157–4170, 2019.
- [2] Q. Wu and R. Zhang, “Towards smart and reconfigurable environment: Intelligent reflecting surface aided wireless network,” IEEE Commun. Mag., vol. 58, no. 1, pp. 106–112, 2019.
- [3] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and D. J. Love, “Prospective multiple antenna technologies for beyond 5G,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1637–1660, Aug. 2020.
- [4] J. Lin, G. Wang, R. Fan, T. A. Tsiftsis, and C. Tellambura, “Channel estimation for wireless communication systems assisted by large intelligent surfaces,” arXiv preprint arXiv:1911.02158, 2019.
- [5] J. Zhang, H. Du, Q. Sun, B. Ai, and D. W. K. Ng, “Physical layer security enhancement with reconfigurable intelligent surface-aided networks,” IEEE Trans. Inf. Forensic Secur., vol. 16, pp. 3480–3495, 2021.
- [6] Z.-Q. He and X. Yuan, “Cascaded channel estimation for large intelligent metasurface assisted massive MIMO,” IEEE Wireless Commun. Lett., vol. 9, no. 2, pp. 210–214, Feb. 2019.
- [7] J. Chen, Y.-C. Liang, H. V. Cheng, and W. Yu, “Channel estimation for reconfigurable intelligent surface aided multi-user MIMO systems,” arXiv:1912.03619, 2019.
- [8] A. Taha, M. Alrabeiah, and A. Alkhateeb, “Deep learning for large intelligent surfaces in millimeter wave and massive MIMO systems,” in Proc. IEEE Globecom, 2019, pp. 1–6.
- [9] J. Yang, C.-K. Wen, S. Jin, and F. Gao, “Beamspace channel estimation in mmWave systems via cosparse image reconstruction technique,” IEEE Trans. Commun., vol. 66, no. 10, pp. 4767–4782, Oct. 2018.
- [10] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
- [11] C. Liu, X. Liu, D. W. K. Ng, and J. Yuan, “Deep residual network empowered channel estimation for IRS-assisted multi-user communication systems,” in Proc. IEEE ICC, 2021, pp. 1–7.
- [12] D. Kim, J. R. Chung, and S. Jung, “GRDN: Grouped residual dense network for real image denoising and GAN-based real-world noise modeling,” in Proc. CVF CVPRW, 2019, pp. 2086–2094.
- [13] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proc. ECCV, 2018, pp. 3–19.
- [14] Y. Jin, J. Zhang, B. Ai, and X. Zhang, “Channel estimation for mmWave massive MIMO with convolutional blind denoising network,” IEEE Commun. Lett., vol. 24, no. 1, pp. 95–98, Jan. 2019.
- [15] S. Liu, Z. Gao, J. Zhang, M. Di Renzo, and M.-S. Alouini, “Deep denoising neural network assisted compressive channel estimation for mmwave intelligent reflecting surfaces,” IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 9223–9228, Aug. 2020.
- [16] E. Vlachos, G. C. Alexandropoulos, and J. Thompson, “Massive MIMO channel estimation for millimeter wave systems via matrix completion,” IEEE Signal Process. Lett., vol. 25, no. 11, pp. 1675–1679, Nov. 2018.
- [17] L. Wei, C. Huang, G. C. Alexandropoulos, C. Yuen, Z. Zhang, and M. Debbah, “Channel estimation for RIS-empowered multi-user MISO wireless communications,” IEEE Trans. Commun., vol. 69, no. 6, pp. 4144–4157, Jun. 2021.