
Conditional Diffusion Model-Driven Generative Channels
for Double RIS-Aided Wireless Systems

Yiyang Ni, Qi Zhang, Guangji Chen, Yan Cai, Jun Li, Shi Jin

Yiyang Ni is with the Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China, and also with the Jiangsu Second Normal University and Jiangsu Institute of Educational Science Research, Nanjing 210013, China (e-mail: niyy@jssnu.edu.cn).

Jun Li and Shi Jin are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China (e-mail: jun.li@seu.edu.cn, jinshi@seu.edu.cn). (Corresponding author: Jun Li.)

Qi Zhang and Yan Cai are with the School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China (e-mail: 1224014636@njupt.edu.cn, caiy@njupt.edu.cn).

Guangji Chen is with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: guangjichen@njust.edu.cn).

Abstract

With the development of the upcoming sixth-generation (6G) networks, reconfigurable intelligent surfaces (RISs) have gained significant attention due to their ability to reconfigure wireless channels via smart reflections. However, traditional channel state information (CSI) acquisition techniques for double-RIS systems face challenges such as high pilot overhead or multipath interference. This paper proposes a new channel generation method for double-RIS communication systems based on a conditional diffusion model (CDM). The CDM is trained on synthetic channel data to capture channel characteristics. It addresses the limitations of traditional CSI generation methods, such as insufficient model understanding capability and poor environmental adaptability. We provide a detailed analysis of the diffusion process for channel generation and validate it through simulations. The simulation results demonstrate that the proposed CDM-based method outperforms traditional channel acquisition methods in terms of normalized mean squared error (NMSE). This method offers a new paradigm for channel acquisition in double-RIS systems and is expected to improve the quality of channel acquisition with low pilot overhead.

Index Terms:
Diffusion model, channel estimation, deep learning, double reconfigurable intelligent surface.

I Introduction

WITH the rapid development of fifth-generation (5G) and the imminent arrival of sixth-generation (6G) wireless communication networks, the demand for innovative technologies that enhance performance, extend coverage, and improve energy efficiency has become increasingly critical [1][2]. Among these emerging solutions, reconfigurable intelligent surfaces (RISs) have gained prominence as a transformative technology that enables intelligent control of the radio propagation environment by dynamically adjusting the phase shifts of the incident signals. Composed of a large array of passive reflecting elements with ultra-low power consumption, RIS offers a new degree of freedom for wireless channel optimization and enhances network adaptability.

Although significant progress has been made in single-RIS systems, their performance is often constrained by the size, placement, and limited adaptability of the RIS in dynamic environments. To address these limitations, the double-RIS architecture has been proposed, wherein two RISs are deployed cooperatively to jointly optimize the transceiver design and the passive beamforming of the RISs. By strategically positioning and controlling both surfaces, double-RIS systems can significantly enhance system flexibility and performance, leading to improved signal-to-noise ratio (SNR) and channel capacity [3]. This architecture has attracted growing attention for its potential to improve spatial diversity, exploit multipath propagation, and extend coverage in next-generation wireless networks.

However, unlocking the high passive beamforming gain provided by the double-RIS architecture relies on accurate estimation of channel state information (CSI). Precise CSI is crucial for optimizing RIS configurations, enhancing transmission efficiency, and improving overall system throughput [4]. Traditional CSI estimation techniques, including pilot-based and model-based methods, often rely on oversimplified assumptions regarding channel characteristics and system dynamics, which may fail in highly dynamic or complex propagation environments [5]. Moreover, compared with the single-RIS architecture, the presence of two RISs introduces a higher dimension of channel coefficients and more complex channel structures. These interactions significantly increase modeling complexity, further limiting the practicality and effectiveness of conventional estimation techniques in real-world scenarios.

Recently, deep learning (DL) has emerged as a promising approach to address the limitations of conventional CSI estimation methods [6]. For example, a convolutional neural network (CNN) was employed in [7] to learn the mapping between received signals and channel responses. Similarly, a conditional generative adversarial network (CGAN)-based method was proposed in [8] to enhance CSI estimation accuracy in RIS-assisted mmWave MIMO systems. However, despite their effectiveness, these approaches share a fundamental limitation: they fail to fully exploit the phase coherence properties of RIS elements.

To overcome the aforementioned issues, we propose a novel channel acquisition framework for double-RIS communication systems based on a conditional diffusion model (CDM) that reconstructs complete CSI from partial observations. By leveraging spatial correlations between channel parameters and conditioning on received pilot signals from only a subset of reflecting elements, our generative model can produce high-fidelity channel realizations that faithfully capture multipath fading, interference patterns, and spatial correlations under diverse RIS configurations. Unlike conventional schemes that require exhaustive, element-wise estimation, this CDM-based approach directly leverages partial channel information, thereby significantly reducing pilot overhead while maintaining estimation accuracy. Simulation results demonstrate that our method consistently outperforms traditional deep learning techniques in normalized mean squared error (NMSE), highlighting its promise for system design, performance optimization, and real-time operation in complex double-RIS deployments where efficient and accurate CSI acquisition is essential.

Figure 1: A double RIS aided uplink wireless communication system.

II Channel Model

We consider a double-RIS-assisted communication system, as illustrated in Fig. 1. In this system, a single-antenna user communicates with a base station (BS) equipped with $N$ antennas via two distributed RISs, denoted as RIS1 and RIS2, respectively. Due to the complexity of estimating the double-reflection link, we focus solely on this path, assuming that the direct user-BS link is blocked by environmental obstructions (e.g., walls or corners in indoor scenarios). To mitigate path loss and blockage effects, RIS1 is deployed near the user, and RIS2 is deployed close to the BS, enabling efficient communication via the double-reflection link. Let $M_1$ and $M_2$ denote the number of passive elements at RIS1 and RIS2, respectively. All channels are assumed to be quasi-static and follow a flat fading model within each coherence interval.

We define the channel from the user to RIS1 as $\mathbf{u} \triangleq [u_1, \cdots, u_m, \cdots, u_{M_1}]^T \in \mathbb{C}^{M_1 \times 1}$, where $u_m$ $(m = 1, \ldots, M_1)$ denotes the channel coefficient from the user to the $m$-th RIS1 element. The channel between RIS1 and RIS2 is denoted by $\mathbf{D} \triangleq [\mathbf{d}_1, \cdots, \mathbf{d}_m, \cdots, \mathbf{d}_{M_1}] \in \mathbb{C}^{M_2 \times M_1}$, where each column $\mathbf{d}_m = [d_{1,m}, \cdots, d_{m',m}, \cdots, d_{M_2,m}]^T \in \mathbb{C}^{M_2 \times 1}$ represents the channel coefficients from the $m$-th RIS1 element to all RIS2 elements. Here, $d_{m',m}$ is the channel coefficient between the $m$-th RIS1 element and the $m'$-th $(m' = 1, \ldots, M_2)$ RIS2 element. In addition, $\mathbf{G}_2 \in \mathbb{C}^{N \times M_2}$ denotes the channel from RIS2 to the BS. The reflection coefficient vector of RIS$\mu$ $(\mu \in \{1, 2\})$ is denoted as $\boldsymbol{\theta}_\mu \triangleq [\theta_{\mu,1}, \ldots, \theta_{\mu,M_\mu}]^T = [\beta_{\mu,1} e^{j\phi_{\mu,1}}, \ldots, \beta_{\mu,M_\mu} e^{j\phi_{\mu,M_\mu}}]^T \in \mathbb{C}^{M_\mu \times 1}$, where $\beta_{\mu,m} = 1$ and $\phi_{\mu,m} \in [0, 2\pi)$ denote the reflection amplitude and phase shift of the $m$-th element of RIS$\mu$, respectively. We consider a challenging scenario in which both the direct link and the single-reflection links are severely blocked by dense obstacles, and we therefore concentrate our analysis on the double-reflection path. Accordingly, the equivalent end-to-end channel between the user and the BS can be expressed as

$$\mathbf{h} = \mathbf{G}_2 \boldsymbol{\Phi}_2 \mathbf{D} \boldsymbol{\Phi}_1 \mathbf{u}, \tag{1}$$

where $\boldsymbol{\Phi}_\mu = \mathrm{diag}(\boldsymbol{\theta}_\mu)$ denotes the diagonal reflection matrix of RIS$\mu$. Since we consider fully passive RISs without signal reception or transmission capabilities, it is infeasible to acquire the CSI between the two RISs. To address this, we define the cascaded user $\rightarrow$ RIS1 $\rightarrow$ RIS2 channel as $\tilde{\mathbf{D}} \triangleq [\tilde{\mathbf{d}}_1, \cdots, \tilde{\mathbf{d}}_m, \cdots, \tilde{\mathbf{d}}_{M_1}] = \mathbf{D}\,\mathrm{diag}(\mathbf{u}) \in \mathbb{C}^{M_2 \times M_1}$, where $\tilde{\mathbf{d}}_m = \mathbf{d}_m u_m \in \mathbb{C}^{M_2 \times 1}$ represents the contribution of the $m$-th RIS1 element weighted by its user-side channel coefficient. Then, the channel model in (1) can be equivalently expressed as

$$
\begin{aligned}
\mathbf{h} &= \mathbf{G}_2 \boldsymbol{\Phi}_2 \tilde{\mathbf{D}} \boldsymbol{\theta}_1 \\
&= \mathbf{G}_2 \left[ \mathrm{diag}(\tilde{\mathbf{d}}_1) \boldsymbol{\theta}_2, \dots, \mathrm{diag}(\tilde{\mathbf{d}}_{M_1}) \boldsymbol{\theta}_2 \right] \boldsymbol{\theta}_1 \\
&= \sum_{m=1}^{M_1} \underbrace{\mathbf{G}_2 \mathrm{diag}(\tilde{\mathbf{d}}_m)}_{\mathbf{B}_m} \boldsymbol{\theta}_2 \theta_{1,m},
\end{aligned} \tag{2}
$$

where $\mathbf{B}_m \in \mathbb{C}^{N \times M_2}$ denotes the effective channel associated with the $m$-th element of RIS1, incorporating the complete path from the user to the BS via both RISs.
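The equivalence between (1) and (2) can be checked numerically. The following is a minimal sketch, assuming $\mathbf{B}_m = \mathbf{G}_2 \mathrm{diag}(\tilde{\mathbf{d}}_m)$ (the form consistent with the stated dimension $N \times M_2$), i.i.d. complex Gaussian channel entries, and small illustrative array sizes that are not the paper's settings. The helper `crandn` is a hypothetical name introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M1, M2 = 4, 6, 8  # illustrative sizes only (assumption)

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

u = crandn(M1)        # user -> RIS1
D = crandn(M2, M1)    # RIS1 -> RIS2
G2 = crandn(N, M2)    # RIS2 -> BS
theta1 = np.exp(1j * rng.uniform(0, 2 * np.pi, M1))  # unit-modulus reflection coefficients
theta2 = np.exp(1j * rng.uniform(0, 2 * np.pi, M2))

# Eq. (1): h = G2 * Phi2 * D * Phi1 * u
h_direct = G2 @ np.diag(theta2) @ D @ np.diag(theta1) @ u

# Eq. (2): h = sum_m B_m * theta2 * theta_{1,m}, with B_m = G2 * diag(d_tilde_m)
D_tilde = D @ np.diag(u)
B = np.stack([G2 @ np.diag(D_tilde[:, m]) for m in range(M1)])  # shape (M1, N, M2)
h_cascaded = sum(theta1[m] * (B[m] @ theta2) for m in range(M1))

print(np.allclose(h_direct, h_cascaded))
```

Only the matrices $\mathbf{B}_m$ are needed to evaluate the end-to-end channel for any pair of reflection vectors, which is why they are the quantities to be acquired.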

III Proposed Channel Generation Method

In large-scale double-RIS systems, the need to estimate CSI for numerous elements introduces substantial pilot overhead, which shortens the time available for data transmission and lowers system throughput. To mitigate this issue, we estimate the CSI of a selected subset of elements, significantly reducing pilot overhead. The full CSI is then inferred by exploiting spatial correlations among channels.

In this section, we propose a channel generation approach based on a CDM, which consists of two key stages including the forward and the reverse processes. The conditional diffusion model offers a robust framework for reconstructing complete CSI from partial channel state observations.

We formulate double-RIS channel generation as a sampling process from a learned latent prior, in which the double-RIS channel is reconstructed iteratively [9]. This process consists of two main stages: a forward process and a reverse process [10]. In the forward process, double-RIS channel acquisition is modeled as a sampling procedure that gradually transforms the initial data into a distribution resembling Gaussian noise. The reverse process then iteratively denoises this data to reconstruct the complete channel. A conditional diffusion model is employed to generate channel realizations that closely match the actual distribution. Let $\mathbf{B}_m^{\text{P}} \in \mathbb{C}^{N \times M_2^{\text{P}}}$ denote the partially estimated cascaded channel corresponding to a subset of $M_2^{\text{P}}$ elements. The mask ratio is defined as $\rho = 1 - M_2^{\text{P}} / M_2$, indicating the fraction of elements with unestimated CSI. Tuning the mask ratio enables a flexible balance between pilot overhead and channel acquisition accuracy.
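As a small illustration of the mask ratio, the sketch below keeps the columns of a placeholder $\mathbf{B}_m$ that correspond to the observed RIS2 elements. The uniform random choice of observed elements and the Gaussian placeholder channel are assumptions made only for this example; the text does not specify how the subset is selected.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M2 = 4, 16
rho = 0.5  # mask ratio: fraction of RIS2 elements whose CSI is NOT estimated

# Placeholder cascaded channel B_m (assumption: i.i.d. complex Gaussian entries).
B_m = rng.standard_normal((N, M2)) + 1j * rng.standard_normal((N, M2))

# Number of observed elements M2^P implied by rho = 1 - M2^P / M2.
M2_P = int(round((1 - rho) * M2))

# Hypothetical selection rule: observed elements drawn uniformly without replacement.
observed_idx = np.sort(rng.choice(M2, size=M2_P, replace=False))
B_m_P = B_m[:, observed_idx]  # partial channel, shape (N, M2_P)

print(B_m_P.shape, 1 - M2_P / M2)
```

A larger `rho` means fewer estimated columns and hence lower pilot overhead, at the cost of less information for the reconstruction.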

The partial cascaded channel $\mathbf{B}_m^{\text{P}}$ is vectorized into a real-valued vector $\mathbf{x}_0 \in \mathbb{R}^{2NM_2^{\text{P}} \times 1}$ by stacking its real and imaginary components [11]. In the forward process, Gaussian noise is gradually added to $\mathbf{x}_0$, resulting in $\mathbf{x}_T$ after $T$ diffusion steps. This process is formally defined as

$$\mathbf{x}_t = \sqrt{1 - \beta_t}\,\mathbf{x}_{t-1} + \sqrt{\beta_t}\,\boldsymbol{\epsilon}_t, \tag{3}$$
Figure 2: Illustration of conditional diffusion model structure. It integrates a conditional diffusion model module, which derives the complete channel state information from the partial channel state information by leveraging spatial correlations. During the reverse process, pilot signals are added as conditional input, thereby significantly improving the generation effect.

where $t \in [0, T]$ denotes the diffusion step, the variance schedule $\beta_t$ increases linearly from $\beta_1$ to $\beta_T$, and $\boldsymbol{\epsilon}_t$ represents standard Gaussian noise. Based on the Markov chain framework, the distribution of $\mathbf{x}_t$ conditioned on the original input $\mathbf{x}_0$ is given by

$$p(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0, (1 - \bar{\alpha}_t)\mathbf{I}\right), \tag{4}$$

where $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$, $\alpha_t = 1 - \beta_t$, and $\mathbf{I}$ denotes the identity matrix. Sampling $\mathbf{x}_t$ from the distribution in (4) is essential for the subsequent training process.
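The vectorization and the closed-form forward sampling in (4) can be sketched as follows. This is a minimal illustration, assuming a linear schedule with the endpoints reported in the simulation section ($10^{-4}$ to $0.02$, $T = 500$) and a Gaussian placeholder for $\mathbf{B}_m^{\text{P}}$. The function names `vectorize`, `unvectorize`, `linear_schedule`, and `q_sample` are hypothetical.

```python
import numpy as np

def vectorize(B):
    """Stack real and imaginary parts of a complex matrix into one real vector."""
    return np.concatenate([B.real.ravel(), B.imag.ravel()])

def unvectorize(x, shape):
    """Inverse of vectorize: rebuild the complex matrix of the given shape."""
    half = x.size // 2
    return (x[:half] + 1j * x[half:]).reshape(shape)

def linear_schedule(T, beta_1=1e-4, beta_T=0.02):
    """Linearly increasing beta_t with alpha_t = 1 - beta_t and cumulative products."""
    betas = np.linspace(beta_1, beta_T, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def q_sample(x0, t, alpha_bars, eps):
    """Draw x_t given x_0 in one shot via (4); t is 1-indexed."""
    a_bar = alpha_bars[t - 1]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

rng = np.random.default_rng(2)
B_m_P = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))  # placeholder
x0 = vectorize(B_m_P)                      # length 2 * N * M2_P
betas, alphas, alpha_bars = linear_schedule(T=500)
x_t = q_sample(x0, t=250, alpha_bars=alpha_bars, eps=rng.standard_normal(x0.shape))
print(x0.shape, x_t.shape)
```

Sampling directly from (4) avoids iterating (3) step by step, so a training sample at any step $t$ costs a single noise draw.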

During the conditional reverse process, the reverse transition $p(\mathbf{x}_{t-1} | \mathbf{x}_t)$ also follows a Gaussian distribution and can be written as

$$p(\mathbf{x}_{t-1} | \mathbf{x}_t) = \mathcal{N}\left(\boldsymbol{\mu}(\mathbf{x}_t, t), \boldsymbol{\Sigma}(\mathbf{x}_t, t)\right), \tag{5}$$

where the mean and covariance are determined by the current state $\mathbf{x}_t$ and the time step $t$. If the reverse distribution $p(\mathbf{x}_{t-1} | \mathbf{x}_t)$ can be obtained during the denoising process, the original data can be progressively reconstructed. To approximate this intractable distribution, we employ a neural network denoted as $p_\theta(\mathbf{x}_{t-1} | \mathbf{x}_t)$. The corresponding mean and variance are respectively expressed as

$$
\begin{aligned}
\mu_\theta(\mathbf{x}_t, t) &= \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \right), \\
\Sigma_\theta(\mathbf{x}_t, t) &= \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t,
\end{aligned} \tag{6}
$$

where $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ denotes the predicted noise generated by the neural network, given the input $\mathbf{x}_t$ at time step $t$.
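The expressions in (6) translate directly into a few lines of code. The sketch below is illustrative only: it assumes the convention $\bar{\alpha}_0 = 1$ (the empty product) for the first step, and the function name `reverse_mean_var` is hypothetical. As a sanity check, when the true noise is supplied at $t = 1$, the mean recovers $\mathbf{x}_0$ and the variance vanishes.

```python
import numpy as np

def reverse_mean_var(x_t, t, eps_pred, betas, alphas, alpha_bars):
    """Mean and variance of the reverse transition in (6); t is 1-indexed."""
    beta_t, alpha_t, a_bar_t = betas[t - 1], alphas[t - 1], alpha_bars[t - 1]
    a_bar_prev = alpha_bars[t - 2] if t > 1 else 1.0  # assumption: alpha_bar_0 = 1
    mean = (x_t - beta_t / np.sqrt(1.0 - a_bar_t) * eps_pred) / np.sqrt(alpha_t)
    var = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t
    return mean, var

# Linear schedule with the endpoints quoted in the simulation section.
betas = np.linspace(1e-4, 0.02, 500)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(5)
x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)
x1 = np.sqrt(alpha_bars[0]) * x0 + np.sqrt(1.0 - alpha_bars[0]) * eps  # one forward step

mean_1, var_1 = reverse_mean_var(x1, 1, eps, betas, alphas, alpha_bars)
print(np.allclose(mean_1, x0), var_1)
```

In practice `eps_pred` comes from the trained network rather than from the true noise, so the recovered mean is only an estimate.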

Algorithm 1 Training Process of the Proposed Method

Input: $\mathbf{B}_m^{\text{P}}$, $T$, $\{\alpha_t\}$, $\{\beta_t\}$, training epochs $K$, and weighting coefficient $\lambda_2$.

$\mathbf{B}_m^{\text{P}}$ is collected from environments 1 to $A$.

Initialization: Vectorize $\mathbf{B}_m^{\text{P}}$ into a real-valued vector $\mathbf{x}_0$.

Initialize U-Net parameters $\theta$ randomly.

for $i = 1 : K$ do

    for $t = 1 : T$ do

        Generate noise-corrupted $\mathbf{x}_t$ using $\{\alpha_t\}$, $\{\beta_t\}$, and $\boldsymbol{\epsilon}_t$ based on (3).

        Extract $\mathbf{y}_t$ for conditioning training data.

        Train the U-Net to get $\tilde{\boldsymbol{\epsilon}}_{\theta,t}$ based on (16).

        Compute loss function $L_\theta$ based on (15).

        Update $\theta$ using gradient descent on $L_\theta$.

    end for

end for

Output: Trained noise prediction network parameters $\theta$.

Algorithm 2 Inference Process of the Proposed Method

Input: $\mathbf{y}_t$, $T$, $\{\alpha_t\}$, $\{\beta_t\}$, weighting coefficient $\lambda_2$, and trained $\theta$.

Initialization: $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.

for $t = T : 1$ do

    Update $\tilde{\boldsymbol{\epsilon}}_{\theta,t}$.

    Compute $\mu_\theta(\mathbf{x}_t, t)$ and $\Sigma_\theta(\mathbf{x}_t, t)$ based on (6).

    Compute $\mathbf{x}_{t-1}$ based on (17).

end for

Convert real-valued vector $\hat{\mathbf{x}}_0$ to matrix $\hat{\mathbf{B}}_m$.

Output: Estimated $\hat{\mathbf{B}}_m$.

Our objective is to train the model to accurately predict channel realizations based on the collected dataset. As shown in [10], the training loss can be simplified to the Kullback–Leibler (KL) divergence between the true reverse distribution $p(\mathbf{x}_{t-1} | \mathbf{x}_t)$ and the approximation $p_\theta(\mathbf{x}_{t-1} | \mathbf{x}_t)$ at each diffusion step:

$$L = \mathcal{D}_{\mathrm{KL}}\left[ p(\mathbf{x}_{t-1} | \mathbf{x}_t) \,\|\, p_\theta(\mathbf{x}_{t-1} | \mathbf{x}_t) \right]. \tag{7}$$

Consequently, the training process minimizes the discrepancy between the noise predicted by the network and the true noise added during the forward process:

$$L = \mathbb{E}\left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \right\|^2 \right]. \tag{8}$$
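A Monte Carlo estimate of the objective in (8) can be sketched as below. This is a simplified illustration rather than the training code of the proposed method: the diffusion step is drawn uniformly at random per sample (a common practice following [10], whereas Algorithm 1 loops over all steps), the data are Gaussian placeholders, and `eps_model` is a hypothetical stand-in for the U-Net that here always predicts zero noise.

```python
import numpy as np

rng = np.random.default_rng(3)

def noise_prediction_loss(eps_model, x0_batch, alpha_bars):
    """Monte Carlo estimate of (8) over a batch of clean vectors x_0."""
    batch, dim = x0_batch.shape
    T = len(alpha_bars)
    t = rng.integers(1, T + 1, size=batch)            # one random step per sample
    eps = rng.standard_normal((batch, dim))           # true noise
    a_bar = alpha_bars[t - 1][:, None]
    x_t = np.sqrt(a_bar) * x0_batch + np.sqrt(1.0 - a_bar) * eps  # closed form (4)
    eps_hat = eps_model(x_t, t)                       # predicted noise
    return np.mean(np.sum((eps - eps_hat) ** 2, axis=1))

alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 500))
x0_batch = rng.standard_normal((256, 64))             # placeholder training vectors

def zero_model(x_t, t):
    """Hypothetical untrained predictor that always outputs zero noise."""
    return np.zeros_like(x_t)

loss = noise_prediction_loss(zero_model, x0_batch, alpha_bars)
print(loss)
```

For a predictor that always returns zero, the loss is close to the vector dimension, since each noise entry has unit variance; a trained network should drive it well below that baseline.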

To generate high-quality channel samples, relying solely on Gaussian noise to reconstruct $\mathbf{x}_0$ may yield random and inaccurate results. Therefore, we incorporate the received signal $\mathbf{y}_t$, obtained from pilot transmissions, as an auxiliary input to the neural network.

In the double-RIS system, the user transmits a pilot symbol $s$, which is typically set to 1 for simplicity. After propagation through the cascaded double-reflection links, the received signal at the BS is given by

$$\mathbf{y}_t = \mathbf{G}_2 \boldsymbol{\Phi}_2 \mathbf{D} \boldsymbol{\Phi}_1 \mathbf{u} s + \mathbf{n}, \tag{9}$$

where $\mathbf{n} \in \mathbb{C}^{N \times 1}$ represents the additive noise. The conditional reverse distribution is then redefined as $p(\mathbf{x}_{t-1} | \mathbf{x}_t, \mathbf{y}_t)$, which can be expressed by Bayes' theorem as

$$
\begin{aligned}
p(\mathbf{x}_{t-1} | \mathbf{x}_t, \mathbf{y}_t) &= \frac{p(\mathbf{x}_t, \mathbf{y}_t | \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1})}{p(\mathbf{x}_t, \mathbf{y}_t)} \\
&= \frac{p(\mathbf{y}_t | \mathbf{x}_{t-1}, \mathbf{x}_t)\, p(\mathbf{x}_t | \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1})}{p(\mathbf{y}_t | \mathbf{x}_t)\, p(\mathbf{x}_t)} \\
&= \frac{p(\mathbf{y}_t | \mathbf{x}_{t-1}, \mathbf{x}_t)\, p(\mathbf{x}_{t-1} | \mathbf{x}_t)}{p(\mathbf{y}_t | \mathbf{x}_t)}.
\end{aligned} \tag{10}
$$

Since both $\mathbf{x}_t$ and $\mathbf{y}_t$ are known during the denoising process, the term $p(\mathbf{y}_t | \mathbf{x}_t)$ is treated as a constant, and its reciprocal is denoted by $\lambda_1$. Hence, (10) can be simplified as

$$p(\mathbf{x}_{t-1} | \mathbf{x}_t, \mathbf{y}_t) = \lambda_1\, p(\mathbf{y}_t | \mathbf{x}_{t-1}, \mathbf{x}_t)\, p(\mathbf{x}_{t-1} | \mathbf{x}_t). \tag{11}$$

Furthermore, the likelihood term $p(\mathbf{y}_t | \mathbf{x}_{t-1}, \mathbf{x}_t)$ can also be expressed as

$$p(\mathbf{y}_t | \mathbf{x}_{t-1}, \mathbf{x}_t) = \frac{p(\mathbf{x}_{t-1} | \mathbf{y}_t, \mathbf{x}_t)\, p(\mathbf{y}_t | \mathbf{x}_t)}{p(\mathbf{x}_{t-1} | \mathbf{x}_t)}. \tag{12}$$

Taking the logarithm of (12) and calculating the gradient, we can obtain

$$
\begin{aligned}
\nabla \log p(\mathbf{y}_t | \mathbf{x}_{t-1}, \mathbf{x}_t) \propto{} & \nabla \log p(\mathbf{x}_t | \mathbf{y}_t, \mathbf{x}_{t-1}) \\
& - \nabla \log p(\mathbf{x}_{t-1} | \mathbf{x}_t).
\end{aligned} \tag{13}
$$

We approximate the conditional distribution $p(\mathbf{x}_{t-1} | \mathbf{x}_t, \mathbf{y}_t)$ using a parameterized neural network $p_\theta(\cdot)$, and thus obtain

$$
\begin{aligned}
\nabla_\theta \log p_\theta(\mathbf{x}_{t-1} | \mathbf{x}_t, \mathbf{y}_t) \approx{} & \lambda_2 \nabla_\theta \log p_\theta(\mathbf{x}_t | \mathbf{y}_t, \mathbf{x}_{t-1}) \\
& + (1 - \lambda_2) \nabla_\theta \log p_\theta(\mathbf{x}_{t-1} | \mathbf{x}_t),
\end{aligned} \tag{14}
$$

where $\lambda_2$ is a weighting coefficient that quantifies the importance of the conditional input $\mathbf{y}_t$. Accordingly, we modify the loss function $L$ as

$$L(\theta) = \mathbb{E}\left[ \left\| \boldsymbol{\epsilon}_t - \tilde{\boldsymbol{\epsilon}}_{\theta,t} \right\|^2 \right]. \tag{15}$$

In (15), $\tilde{\boldsymbol{\epsilon}}_{\theta,t}$ is the modified predicted noise, given by

$$\tilde{\boldsymbol{\epsilon}}_{\theta,t} = \lambda_2 \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t, \mathbf{y}_t) + (1 - \lambda_2) \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t), \tag{16}$$

where $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t, \mathbf{y}_t)$ represents the noise predicted with the additional conditional input $\mathbf{y}_t$. The update from $\mathbf{x}_t$ to $\mathbf{x}_{t-1}$ can be represented as

$$\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \tilde{\boldsymbol{\epsilon}}_{\theta,t} \right) + \sqrt{\Sigma_\theta(\mathbf{x}_t, t)}\, z_t, \tag{17}$$

where $z_t$ is standard Gaussian noise. We encode the diffusion step $t$ and the received signal $\mathbf{y}_t$ through the step and conditional embedding modules, respectively. These embeddings are then integrated with the noisy input $\mathbf{x}_t$ and fed into a U-Net. A convolutional architecture is used to effectively fuse the three types of inputs. Finally, the network is trained by minimizing the loss function in (15). The training and inference procedures are summarized in Algorithms 1 and 2, respectively.
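The guided noise in (16) and the inference loop of Algorithm 2 can be sketched as follows. This is a minimal illustration under stated assumptions: `eps_model(x, t, y)` is a hypothetical interface standing in for the trained U-Net, with `y=None` denoting the unconditional branch; the noise term is scaled by the standard deviation, i.e., the square root of the variance in (6), following the usual convention in [10]; and no noise is added at the final step. The dummy model in the usage lines only exercises the loop and produces no meaningful channel.

```python
import numpy as np

def guided_noise(eps_cond, eps_uncond, lam2):
    """Blend conditional and unconditional noise predictions as in (16)."""
    return lam2 * eps_cond + (1.0 - lam2) * eps_uncond

def sample(eps_model, y, dim, betas, lam2, rng):
    """Reverse process: start from x_T ~ N(0, I) and iterate down to x_0."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    T = len(betas)
    x = rng.standard_normal(dim)                      # x_T
    for t in range(T, 0, -1):
        eps_tilde = guided_noise(eps_model(x, t, y), eps_model(x, t, None), lam2)
        a_bar_t = alpha_bars[t - 1]
        a_bar_prev = alpha_bars[t - 2] if t > 1 else 1.0  # assumption: alpha_bar_0 = 1
        mean = (x - betas[t - 1] / np.sqrt(1.0 - a_bar_t) * eps_tilde) / np.sqrt(alphas[t - 1])
        var = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * betas[t - 1]
        z = rng.standard_normal(dim) if t > 1 else np.zeros(dim)
        x = mean + np.sqrt(var) * z                   # one reverse step
    return x

def dummy_model(x, t, y):
    """Hypothetical placeholder for the trained network: predicts zero noise."""
    return np.zeros_like(x)

betas = np.linspace(1e-4, 0.02, 500)
x0_hat = sample(dummy_model, y=np.zeros(4), dim=64, betas=betas, lam2=0.5,
                rng=np.random.default_rng(6))
print(x0_hat.shape)
```

Setting `lam2 = 1` relies entirely on the pilot-conditioned prediction, while `lam2 = 0` ignores the pilots, so the coefficient controls how strongly the received signal steers the generated channel.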

IV Simulation Results

In this section, we provide the simulation settings and results to evaluate the effectiveness of the proposed method in double-RIS-assisted systems. The BS is equipped with a uniform linear array with $N = 4$ antennas. To enable full CSI reconstruction from partial observations, we incorporate spatial correlation into the channel model [12]. For RIS1 with $M_1$ reflecting elements, the $(f, g)$-th entry of the spatial correlation matrix, $\Omega_{f,g}$, is defined as

$$\Omega_{f,g} = \begin{cases} 1, & k = 0 \\ \dfrac{\sin(k)}{k}, & \text{otherwise}, \end{cases} \tag{20}$$

where $k = \frac{2\pi |f - g| d}{\lambda}$, $f, g = 0, 1, \cdots, M_1 - 1$, $d$ represents the distance between adjacent reflecting elements, and $\lambda$ represents the wavelength at which the system operates. The same correlation model is applied to RIS2.
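The correlation model in (20) can be built compactly with NumPy, because `np.sinc(x)` evaluates $\sin(\pi x)/(\pi x)$ and equals 1 at $x = 0$. The sketch below also draws one spatially correlated channel vector; how the correlation matrix enters the channel generation is not detailed in the text, so the correlated Rayleigh model used here (in the spirit of [12]), the element spacing $d/\lambda = 0.25$, and the function names are assumptions for illustration.

```python
import numpy as np

def ris_correlation(M, d_over_lambda):
    """Correlation matrix of (20) for M elements with spacing d / lambda."""
    idx = np.arange(M)
    diff = np.abs(idx[:, None] - idx[None, :])        # |f - g|
    # sinc(2 |f-g| d / lambda) = sin(k) / k with k = 2 pi |f-g| d / lambda
    return np.sinc(2.0 * diff * d_over_lambda)

def correlated_rayleigh(Omega, rng):
    """Draw a complex Gaussian vector whose covariance is Omega (assumed model)."""
    w, V = np.linalg.eigh(Omega)
    root = V * np.sqrt(np.clip(w, 0.0, None))         # root @ root.T equals Omega
    g = (rng.standard_normal(len(w)) + 1j * rng.standard_normal(len(w))) / np.sqrt(2)
    return root @ g

Omega = ris_correlation(M=16, d_over_lambda=0.25)     # assumed spacing
u = correlated_rayleigh(Omega, np.random.default_rng(4))
print(Omega[0, :3], u.shape)
```

With half-wavelength spacing the off-diagonal entries of (20) vanish, so the partial-to-full reconstruction has correlation to exploit only when the elements are spaced more closely than that.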

We now detail the network configuration for the proposed CDM. The diffusion model is trained with $T = 500$ steps, and the noise variance $\beta_t$ is linearly increased from $10^{-4}$ to $0.02$. During the reverse process, a U-Net-based convolutional neural network is employed to approximate the denoising distribution at each step. Furthermore, a conditional embedding module is utilized to incorporate the partially observed signals as auxiliary input.

NMSE is adopted as the metric of the estimation accuracy, which is defined as

$$\text{NMSE} = \mathbb{E}\left( \frac{\|\hat{\mathbf{B}}_m - \mathbf{B}_m\|_F^2}{\|\mathbf{B}_m\|_F^2} \right), \tag{21}$$

where $\hat{\mathbf{B}}_m$ and $\mathbf{B}_m$ represent the generated channel and the ground truth, respectively.
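The metric in (21) can be estimated by averaging the per-sample ratio over a set of test channels. The following sketch assumes the channels are stacked along the first axis; the function name `nmse` and the synthetic data are for illustration only.

```python
import numpy as np

def nmse(B_hat, B):
    """Empirical NMSE of (21); B_hat and B have shape (num_samples, N, M2)."""
    err = np.sum(np.abs(B_hat - B) ** 2, axis=(1, 2))   # squared Frobenius error
    ref = np.sum(np.abs(B) ** 2, axis=(1, 2))           # squared Frobenius norm
    return np.mean(err / ref)

rng = np.random.default_rng(7)
B_true = rng.standard_normal((100, 4, 16)) + 1j * rng.standard_normal((100, 4, 16))
B_scaled = 1.1 * B_true          # a 10% amplitude error on every entry
print(nmse(B_scaled, B_true))
```

Because the ratio is taken per sample before averaging, channels with small norm contribute as much to the metric as strong ones.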

Figure 3: NMSE versus SNR under different $M_1$ and $M_2$.

Figure 3 presents the NMSE performance of the proposed CDM in the double-RIS system. For comparison, we also include the long short-term memory (LSTM) estimation algorithm and a representative deep learning method, the CGAN. All methods are evaluated over an SNR range from -5 dB to 20 dB and under different numbers of array elements, with the mask ratio fixed at 0.2. In terms of NMSE, the proposed CDM significantly outperforms both the LSTM estimation algorithm and the CGAN-based method. This improvement is attributed to the ability of the proposed method to reconstruct the complete CSI from partial CSI by exploiting spatial correlation, leveraging both the forward and reverse diffusion processes while ensuring strict consistency between the training and inference phases. As a result, the model gradually mitigates the impact of noise during the reconstruction process. Regarding the influence of the array size, the NMSE performance deteriorates as the number of array elements increases. This degradation is caused by the increased complexity of the cascaded channel matrix between the two RISs, which substantially increases the computational burden.

Figure 4: NMSE versus SNR under different mask ratios $\rho$.

Figure 4 compares the NMSE of the proposed CDM under different mask ratios and different numbers of elements. In this method, partial channel information is used during training to generate the complete channel information. We consider mask ratios of 0.2 and 0.5, each with 16 and 64 array elements. As the SNR increases, the NMSE decreases, as expected. In addition, at a fixed SNR level, the NMSE decreases as the ratio of partial channel information increases, because a higher ratio means that more auxiliary information is fed into the diffusion model network. We also compare different methods under the same mask ratio, where the proposed CDM outperforms both the CGAN and the LSTM. This is because the reverse process of the CDM denoises step by step, which enables more complete utilization of the channel information and thus better performance.

V Conclusion

In this paper, we proposed a novel double-RIS channel generation method leveraging a conditional diffusion model. The CDM method utilizes spatial correlation to generate the full channel state information from partial channel state information and incorporates pilot signals as conditional inputs, resulting in a significant performance improvement. The proposed method abandons the traditional channel estimation methods that rely on prior theoretical models, and thus is more efficient in channel acquisition.

References

  • [1] Y. Ni, H. Zhao, Y. Liu, J. Wang, G. Gui and H. Zhang, “Analysis of RIS-aided communications over Nakagami-$m$ fading channels,” IEEE Trans. Veh. Technol., vol. 72, no. 7, pp. 8709-8721, Jul. 2023.
  • [2] J. Zhang, J. Li, L. Shi, Z. Wang, S. Jin, W. Chen, and H. V. Poor, “Decision transformers for wireless communications: A new paradigm of resource management,” IEEE Wireless Commun., vol. 32, no. 2, pp. 180-186, Apr. 2025.
  • [3] P. Zhang, S. Gong and S. Ma, “Double-RIS aided multi-user MIMO communications: Common reflection pattern and joint beamforming design,” IEEE Trans. Veh. Technol., vol. 73, no. 3, pp. 4418-4423, Mar. 2024.
  • [4] Y. Chen, M. Jian and L. Dai, “Channel estimation for RIS assisted wireless communications: Stationary or non-stationary?,” IEEE Trans. Signal Process., vol. 72, pp. 3776-3791, 2024.
  • [5] W. Chen, Y. Han, C.-K. Wen, X. Li and S. Jin, “Channel customization for low-complexity CSI acquisition in multi-RIS-assisted MIMO systems,” IEEE J. Sel. Areas Commun., vol. 43, no. 3, pp. 851-866, Mar. 2025.
  • [6] J. Guo, W. Chen, C.-K. Wen and S. Jin, “Deep learning-based two-timescale CSI feedback for beamforming design in RIS-assisted communications,” IEEE Trans. Veh. Technol., vol. 72, no. 4, pp. 5452-5457, Apr. 2023.
  • [7] Z. Mao, X. Liu and M. Peng, “Channel estimation for intelligent reflecting surface assisted massive MIMO systems—A deep learning approach.” IEEE Commun. Lett., vol. 26, no. 4, pp. 798-802, Apr. 2022.
  • [8] M. Ye, C. Pan, Y. Xu and C. Li, “Generative adversarial networks-based channel estimation for intelligent reflecting surface assisted mmWave MIMO systems,” IEEE Trans. Cogn. Commun. Netw., vol. 26, no. 4, pp. 798-802, Apr. 2022.
  • [9] W. Tong, W. Xu, F. Wang, W. Ni and J. Zhang, “Diffusion model-based channel estimation for RIS-aided communication systems,” IEEE Wireless Commun. Lett., vol. 13, no. 9, pp. 2586-2590, Sep. 2024.
  • [10] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 6840-6851.
  • [11] J. Zhang, J. Li, Z. Wang, Y. Han, L. Shi and B. Cao, “Decision transformer for IRS-assisted systems with diffusion-driven generative channels,” IEEE Int. Conf. Commun. China, ICCC, Hangzhou, China, 2024.
  • [12] E. Björnson and L. Sanguinetti, “Rayleigh fading modeling and channel hardening for reconfigurable intelligent surfaces,” IEEE Wireless Commun. Lett., vol. 10, no. 4, pp. 830-834, Apr. 2021.