
A Novel OTFS-based Massive Random Access Scheme in Cell-Free Massive MIMO Systems for High-Speed Mobility

Yanfeng Hu, Dongming Wang,  Xinjiang Xia, 
Jiamin Li,  Pengcheng Zhu,  and Xiaohu You
This work was supported by the National Key R&D Program of China under Grant 2023YFB2603802, and by the National Natural Science Foundation of China (NSFC) under Grant 62371346. (Corresponding author: Dongming Wang.) Y. Hu, D. Wang, J. Fu, J. Li, P. Zhu and X. You are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, P. R. China, and also with the Purple Mountain Laboratories, Nanjing 210096, P. R. China (e-mail: huyanfeng@seu.edu.cn, wangdm@seu.edu.cn, jiaminli@seu.edu.cn, p.zhu@seu.edu.cn, xhyu@seu.edu.cn). X. Xia is with the Purple Mountain Laboratories, Nanjing 210096, P. R. China (e-mail: xinjiang_xia@aa.seu.edu.cn).
Abstract

In the research of next-generation wireless communication technologies, orthogonal time frequency space (OTFS) modulation is emerging as a promising technique for high-speed mobile environments due to its superior efficiency and robustness in doubly-selective channels. Additionally, the cell-free architecture, which eliminates the issues associated with cell boundaries, offers broader coverage for radio access networks. By combining the cell-free network architecture with OTFS modulation, the system may meet the demands of massive random access required by machine-type communication devices in high-speed scenarios. This paper explores a massive random access scheme based on OTFS modulation within a cell-free architecture. A transceiver model for uplink OTFS signals involving multiple access points (APs) is developed, where channel estimation with fractional channel parameters is approximated as a two-dimensional block sparse matrix recovery problem. Building on existing superimposed and embedded preamble schemes, a hybrid preamble scheme intended for massive random access is proposed. This scheme leverages superimposed and embedded preambles to respectively achieve rough and accurate active user equipment (UE) detection (AUD), as well as precise channel estimation. Moreover, this study introduces a generalized approximate message passing and pattern-coupled sparse Bayesian learning with Laplacian prior (GAMP-PCSBL-La) algorithm, which effectively captures block sparse features after the discrete cosine transform (DCT), delivering precise estimation results with reduced computational complexity. Simulation results demonstrate that the proposed scheme is effective and outperforms existing schemes.

Index Terms:
Massive random access, OTFS, cell-free massive MIMO, active UE detection, channel estimation, block sparse recovery.

I Introduction

Next-generation wireless communication will delve deeper into ubiquitous Internet of Things (IoT) scenarios in the coming decades, encompassing broader coverage areas and a significantly larger number of user equipment (UEs) [1]. Beyond human-type communication devices (HTCDs), numerous machine-type communication devices (MTCDs) need to facilitate data transmission [2]. In high-speed massive machine-type communication (mMTC) scenarios, such as high-speed railways, the Internet of Vehicles (IoV), unmanned aerial vehicle (UAV) communications, and high-speed integrated sensing and communication (ISAC) [3], the vast number of MTCDs face constraints on allocable resources [4]. Due to asynchronous delays and Doppler shifts caused by high-speed mobility, the transmission channel often exhibits doubly-selective features. Traditional coordinated access protocols, which involve multiple handshake processes, not only introduce extra delays but also incur significant signaling overhead [5]. Moreover, coordinated orthogonal resources suffer severe orthogonality degradation in doubly-selective channels, thereby reducing system performance [6]. Unlike coordinated schemes, grant-free non-orthogonal multiple access (NOMA) allows devices to transmit data without pre-allocated resources. The receiver performs active UE detection (AUD) and channel estimation (CE) based on the unique non-orthogonal preamble sequence assigned to each UE [7]. Therefore, grant-free NOMA in uncoordinated access schemes is considered one of the key technologies for mMTC [4].

Emerging machine-type wireless transmission services impose stringent demands on communication quality in high-mobility scenarios. Orthogonal frequency division multiplexing (OFDM), widely used in 4G and 5G, can eliminate inter-symbol interference caused by time dispersion using a cyclic prefix (CP), but struggles to mitigate frequency dispersion caused by Doppler shifts, leading to inter-carrier interference [8]. Hadani et al. proposed a novel two-dimensional modulation known as orthogonal time frequency space (OTFS) [9]. Compared to OFDM, OTFS has been proven to significantly improve transmission performance in doubly-selective channels with only a modest increase in system complexity [10]. Specifically, OTFS uses a two-dimensional inverse symplectic finite Fourier transform (ISFFT) to map signals from the delay-Doppler (DD) domain to the time-frequency (TF) domain. Unlike OFDM, each signal symbol in OTFS spans the entire TF domain channel, fully exploiting channel diversity and enhancing reliability [11]. Additionally, the number of reflectors is considerably smaller than the dimension of transmitted symbols, resulting in sparsity of the channel parameters in the DD domain [9], which simplifies the estimation of channel state information (CSI). Given these advantages, OTFS is considered a promising candidate for next-generation broadband communication modulation technology.

In addition, high-mobility communication inevitably requires wide coverage, as UEs may travel significant distances during communication intervals. Cellular networks necessitate handovers for high-mobility UEs, increasing the complexity of system processing [12]. Moreover, boundary effects limit the transmission efficiency for UEs located at cell edges [13]. Therefore, a concept named cell-free massive MIMO has been proposed to support denser and wider device coverage, significantly enhancing spectral efficiency and reliability [14]. By deploying numerous access points (APs) across the coverage area, boundary effects are eliminated in cell-free massive MIMO systems [15]. Each AP is equipped with an independent signal processing unit and connected to a central processing unit (CPU) via fronthaul, providing flexible networking [16]. Additionally, with UEs being closer to the receiving antennas, signal transmission and processing delays are significantly reduced. Mohammadi et al. theoretically demonstrated that OTFS modulation can achieve superior performance within a cell-free massive MIMO architecture [17]. However, numerous challenges still need to be addressed for massive random access in high-mobility scenarios when integrating cell-free massive MIMO.

Recent discussions on OTFS grant-free access schemes for high-mobility scenarios mainly focus on low earth orbit (LEO) satellite communication [18]. Shen et al. approximated the OTFS channel as a sparse matrix and utilized low-complexity pattern-coupled sparse Bayesian learning (PCSBL) for AUD and sparse CE [19]. Zhou et al. designed a novel training-sequence-aided OTFS (TS-OTFS) transmission protocol for LEO satellite IoT communication and proposed a two-stage AUD and CE method [20]. Besides, a high-speed railway IoT active detection method combining tandem spreading multiple access (TSMA) and OTFS was proposed in [21]. By pre-estimating propagation delays, a preamble transmission method was designed in [22], allowing UEs to perform pre-compensation. However, existing research fails to address schemes for massive high-mobility MTCD access that incorporate cell-free massive MIMO systems. Moreover, current CE methods, including the embedded [23] and superimposed [24] pilot schemes, have their limitations: the former incurs high pilot overhead, while the latter has suboptimal estimation performance. Hence, a balanced scheme is required to ensure accurate estimation while reducing overhead.

To address the aforementioned challenges, this paper investigates AUD and CE schemes for massive random access in cell-free massive MIMO systems. Firstly, we establish the uplink OTFS signal model in a cell-free massive MIMO system. Secondly, a hybrid preamble scheme is designed, where rough AUD is performed using superimposed preambles, and joint accurate AUD and CE are achieved based on embedded preambles. This scheme reduces the overall sparse signal dimension, allowing the system to accommodate more UEs. Finally, we propose a new block sparse matrix recovery algorithm for AUD and CE, named generalized approximate message passing and pattern-coupled sparse Bayesian learning with Laplacian prior (GAMP-PCSBL-La). Simulations demonstrate that this algorithm achieves better estimation performance than existing block recovery algorithms. Our contributions are summarized as follows:

  • We first analyze the massive random access model with OTFS modulation integrated with cell-free massive MIMO. Through mathematical approximation, utilizing a uniform planar array (UPA) of antennas at each AP and selecting appropriate preamble sequence embedding positions, we model the preamble signals as a two-dimensional (2-D) sparse compressed sensing model in the delay-Doppler-UE-beam domains. AUD and CE are then transformed into a 2-D block sparse matrix recovery problem.

  • To address the scale constraints of high-dimensional sparse matrices in compressed sensing (sparse recovery requires meeting the sparsity constraint [25], i.e., $L>C\cdot K_{a}\log K$, where $L$ denotes the length of the observed sequences, $K_{a}$ and $K$ are the number of nonzero and total elements of the sparse sequence, respectively, and $C$ is a small constant) while reducing the overhead of preambles, we propose a hybrid preamble scheme. Rough AUD is performed by the superimposed preamble, followed by accurate AUD and CE based on the embedded preamble. This approach reduces the sparse channel dimension for each estimation, enabling the system to support massive access for numerous UEs.

  • A novel GAMP-PCSBL-La algorithm is designed to recover the two-dimensional block sparse channel matrix. GAMP achieves good estimation performance while reducing computational complexity by avoiding matrix inversion [26]. PCSBL captures the block sparsity of the two-dimensional matrix [27], and the Laplacian prior has been proven to enhance the reconstruction of sparse signals under the discrete cosine transform (DCT) [28]. By combining these features, GAMP-PCSBL-La achieves excellent channel estimation accuracy with low computational complexity. Our simulation results further validate this conclusion.

The remainder of this paper is organized as follows. In Section II, we introduce the system model. Section III discusses the rough AUD and the joint accurate AUD and CE strategies based on the hybrid preamble scheme. In Section IV, we present the novel block sparse matrix recovery algorithm, GAMP-PCSBL-La. Section V provides numerical simulations and the corresponding analysis. Finally, the conclusion is given in Section VI.

Notations: Bold lowercase letters and bold capital letters denote vectors and matrices, respectively. Normal lowercase letters and capital letters represent scalar variables and constants, respectively. $\mathbb{C}$ and $\mathbb{R}$ are the complex and real number sets, respectively. $\mathbf{A}_{a:b,c:d}$ represents the submatrix of $\mathbf{A}$ from the $a$-th to the $b$-th row and the $c$-th to the $d$-th column, while $\mathbf{a}_{a:b}$ is the slice of vector $\mathbf{a}$ from the $a$-th to the $b$-th element. In particular, $\mathbf{A}_{a:b,:}$ denotes the submatrix of $\mathbf{A}$ from the $a$-th to the $b$-th row. $\mathbb{E}$ and $\mathbb{V}$ denote expectation and variance, respectively. $\delta(\cdot)$ denotes the Dirac delta function, and $(\cdot)^{H}$ is the conjugate transpose of a matrix or vector. $\mathbf{A}[a,b]$ is the $(a,b)$-th element of matrix $\mathbf{A}$. $\otimes$ and $\odot$ represent the Kronecker and Hadamard products, respectively. Calligraphic letters denote sets. $\left\|\cdot\right\|_{F}$ is the Frobenius norm. $\lceil\cdot\rceil$ and $[\cdot]_{\text{R}}$ respectively represent the ceiling and rounding operations.

Figure 1: Massive random access in cell-free massive MIMO system.
Figure 2: The system’s signal processing flow.

II System Model

II-A mMTC in Cell-Free Massive MIMO Systems

We consider a cell-free massive MIMO system, as shown in Fig. 1, comprising $B$ APs and $U$ single-antenna UEs, which are randomly distributed over a large area. Assume that each AP is connected to the CPU through fronthaul, allowing lossless data interaction. The UEs move within the area, and only a small portion of UEs transmit uplink data to APs in a specific transmission slot. These UEs are referred to as active UEs, denoted as $\mathcal{K}_{\mathcal{A}}$, while the remaining UEs are silent. The channels between active UEs and APs experience doubly-selective fading. The maximum delay and Doppler shift are set to $\tau_{\max}$ and $\nu_{\max}$, respectively. The signal propagation from an active UE to an AP is characterized by a finite number of paths. The uplink transmission block consists of preamble sequences and data symbols. AUD and CE are performed based on the received preamble sequences.

Figure 3: Structure of UPA equipped at AP.

II-B OTFS Modulation and Channel Model

Consider a typical OTFS transceiver system occupying bandwidth $M\Delta f$ and time duration $NT$, where $M$ denotes the number of subcarriers with spacing $\Delta f$ and $N$ denotes the number of time intervals of length $T$. In the DD domain, the resolutions of the delay and Doppler parameters are $\frac{1}{M\Delta f}$ and $\frac{1}{NT}$, respectively. For a given active UE $u$, the modulated and power-allocated symbols $\{X_{u}^{DD}[k,l],0\leq k\leq N-1,0\leq l\leq M-1\}$ are assigned to the $(k,l)$-th grid points of the $N\times M$ DD grid. Here, $k$ and $l$ represent the Doppler-domain and delay-domain indices, respectively. By applying the ISFFT to $\mathbf{X}_{u}^{DD}\in\mathbb{C}^{N\times M}$ in the DD domain, the $N\times M$ zero-mean symbols are transformed into the TF domain:

$X_{u}^{TF}[n,m]=\frac{1}{\sqrt{NM}}\sum\limits_{k}\sum\limits_{l}X_{u}^{DD}[k,l]\,e^{-j2\pi\left(\frac{ml}{M}-\frac{nk}{N}\right)}.$ (1)

On this basis, the transmitter applies the Heisenberg transform to convert the signal into the time domain:

$s_{u}(t)=\sum\limits_{n}\sum\limits_{m}X_{u}^{TF}[n,m]\,e^{j2\pi m\Delta f(t-nT)}g_{tx}(t-nT),$ (2)

where $g_{tx}(t)$ represents the rectangular window function in the time domain with duration $T$. The delay-Doppler channel response from UE $u$ to the $b$-th AP is defined as:

$h_{u,b}(\tau,\nu)=\sum\limits_{i=1}^{P}h_{u,b,i}\,\delta(\tau-\tau_{u,b,i})\,\delta(\nu-\nu_{u,b,i}).$ (3)

Here, $h_{u,b,i}$, $\tau_{u,b,i}$ and $\nu_{u,b,i}$ represent the gain, delay, and Doppler shift, respectively, of the $i$-th path from UE $u$ to the $b$-th AP, and $P$ is the number of paths. Considering path loss and shadow fading, we have $h_{u,b,i}\sim\mathcal{CN}(0,\lambda_{u,b}/P^{2})$, with $\lambda_{u,b}$ representing the large-scale fading coefficient of the channel from UE $u$ to the $b$-th AP. The corresponding received time-domain signal at the $b$-th AP is given by:

$r_{b}(t)=\iint h_{u,b}(\tau,\nu)\,s_{u}(t-\tau)\,e^{j2\pi(t-\tau)\nu}\,d\tau\,d\nu+n_{b}(t),$ (4)

where $n_{b}(t)$ represents additive Gaussian noise. Each AP is equipped with a UPA, where $N_{z}$ and $N_{y}$ represent the number of antennas in the $z$-direction and $y$-direction, respectively. As shown in Fig. 3, let the elevation and azimuth angles of the $i$-th incident signal from the $u$-th active UE to the $b$-th AP be denoted by $\vartheta_{u,b,i}$ and $\varsigma_{u,b,i}$, respectively. The antenna spacing is half-wavelength, with $\rho_{u,b,i}^{y}=\cos\vartheta_{u,b,i}\cos\varsigma_{u,b,i}$ and $\rho_{u,b,i}^{z}=\sin\vartheta_{u,b,i}$. The response of the UPA is given by:

$\mathbf{a}^{s}_{u,b,i}=\mathbf{a}_{N_{y}}(\rho_{u,b,i}^{y})\otimes\mathbf{a}_{N_{z}}(\rho_{u,b,i}^{z}),$ (5)

where $\mathbf{a}_{N}(\rho)$ represents the spatial steering vector of dimension $N$, expressed as:

$\mathbf{a}_{N}(\rho)=\frac{1}{\sqrt{N}}\left[1,e^{j\pi\rho},\cdots,e^{j\pi(N-1)\rho}\right]^{T}.$ (6)
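To make eqs. (5)-(6) concrete, the following NumPy sketch (array sizes and angles are illustrative, not from the paper) builds the UPA response and applies the beam-domain combining matrix $\mathbf{W}=\mathbf{D}_{N_{y}}\otimes\mathbf{D}_{N_{z}}$ introduced in the next paragraph; since $\mathbf{W}$ is unitary, the beam vector keeps unit norm, and its energy concentrates on a small block of beam-domain bins:

```python
import numpy as np

def steering(N, rho):
    """Spatial steering vector a_N(rho) of eq. (6), unit norm."""
    return np.exp(1j * np.pi * rho * np.arange(N)) / np.sqrt(N)

Ny, Nz = 8, 8                                   # illustrative UPA size
elev, azim = 0.4, -0.9                          # illustrative angles (rad)
rho_y = np.cos(elev) * np.cos(azim)             # rho^y = cos(elev) cos(azim)
rho_z = np.sin(elev)                            # rho^z = sin(elev)
a_s = np.kron(steering(Ny, rho_y), steering(Nz, rho_z))   # UPA response, eq. (5)

D = lambda N: np.fft.fft(np.eye(N)) / np.sqrt(N)          # unitary DFT matrix
W = np.kron(D(Ny), D(Nz))                                 # combining matrix W
a_beam = W.conj().T @ a_s                                 # beam vector W^H a^s

assert np.isclose(np.linalg.norm(a_beam), 1.0)            # W is unitary
assert np.abs(a_beam).max() ** 2 > 0.25   # energy concentrated in few bins
```

The near-block-sparse structure of `a_beam` is what the paper exploits when it treats the beam domain as one axis of the 2-D sparse recovery problem.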

The combining matrix for each AP is denoted as $\mathbf{W}=\mathbf{D}_{N_{y}}\otimes\mathbf{D}_{N_{z}}$, where $\mathbf{D}_{N}$ is the DFT matrix of dimension $N$. We define the beam vector $\mathbf{a}_{u,b,i}=\mathbf{W}^{H}\mathbf{a}_{u,b,i}^{s}$, which has only one non-zero block. The local signal processing unit of each AP performs the Wigner transform on the received time-domain signal, yielding the received signal

$\mathbf{y}_{u,b}^{TF}[n,m]=\frac{1}{T}\sum\limits_{i}h_{u,b,i}\mathbf{a}_{u,b,i}\sum\limits_{m^{\prime}}\Bigg(X_{u}^{TF}[n,m^{\prime}]\,e^{-j2\pi m^{\prime}\Delta f\tau_{u,b,i}}e^{-j2\pi\nu_{u,b,i}\tau_{u,b,i}}e^{j2\pi\nu_{u,b,i}nT}\int_{\tau_{u,b,i}}^{T}e^{-j2\pi\Delta ft\left(m-m^{\prime}-\frac{\nu_{u,b,i}}{\Delta f}\right)}dt$
$\qquad+X_{u}^{TF}[n-1,m^{\prime}]\,e^{-j2\pi m^{\prime}\Delta f\tau_{u,b,i}}e^{j2\pi m^{\prime}\Delta fT}e^{j2\pi\nu_{u,b,i}nT}e^{-j2\pi\nu_{u,b,i}\tau_{u,b,i}}\int_{0}^{\tau_{u,b,i}}e^{-j2\pi\Delta ft\left(m-m^{\prime}-\frac{\nu_{u,b,i}}{\Delta f}\right)}dt\Bigg)$
$\qquad+\mathbf{n}_{u,b}[n,m]\in\mathbb{C}^{N_{z}N_{y}\times 1}.$ (7)

Usually, $M$ is larger than $N$. We assume that each delay parameter is an integer multiple of the resolution, i.e.,

$\tau_{u,b,i}=\frac{l_{u,b,i}}{M\Delta f},$ (8a)
$\nu_{u,b,i}=\frac{k_{u,b,i}+\tilde{k}_{u,b,i}}{NT},$ (8b)

where both $l_{u,b,i}$ and $k_{u,b,i}$ are integers, and $\tilde{k}_{u,b,i}$ is a fractional value between $-0.5$ and $0.5$. Using the symplectic finite Fourier transform (SFFT), the received signal in the TF domain is transformed into the DD domain:

$\mathbf{y}_{u,b}^{DD}[k,l]=\frac{1}{\sqrt{NM}}\sum\limits_{n}\sum\limits_{m}\mathbf{y}_{u,b}^{TF}[n,m]\,e^{j2\pi\left(\frac{ml}{M}-\frac{nk}{N}\right)}\in\mathbb{C}^{N_{z}N_{y}\times 1}.$ (9)
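As a side illustration (not part of the paper's scheme), the ISFFT of eq. (1) and the SFFT of eq. (9) are simply a DFT along the delay axis paired with an IDFT along the Doppler axis, and they are exact inverses of each other. A minimal NumPy sketch with illustrative dimensions:

```python
import numpy as np

def isfft(X_dd):
    """ISFFT of eq. (1): map an N x M delay-Doppler grid to time-frequency.
    DFT along the delay axis (l -> m), IDFT along the Doppler axis (k -> n),
    rescaled so that the overall factor is 1/sqrt(N*M)."""
    N, M = X_dd.shape
    return np.fft.ifft(np.fft.fft(X_dd, axis=1), axis=0) * np.sqrt(N / M)

def sfft(Y_tf):
    """SFFT of eq. (9): the inverse map, applied per receive antenna."""
    N, M = Y_tf.shape
    return np.fft.fft(np.fft.ifft(Y_tf, axis=1), axis=0) * np.sqrt(M / N)

rng = np.random.default_rng(0)
N, M = 16, 64                       # illustrative grid size
X = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
assert np.allclose(sfft(isfft(X)), X)   # SFFT exactly inverts ISFFT
```

Both maps are unitary, so signal energy is preserved across the DD and TF domains, consistent with each DD symbol being spread over the whole TF grid.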

Additionally, we introduce a function $\psi_{N}(x,y)$ defined as

$\psi_{N}(x,y)=\frac{1}{N}\cdot\frac{1-e^{j2\pi x}}{1-e^{-\frac{j2\pi(y-x)}{N}}}.$ (10)
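The behavior of $\psi_{N}(x,y)$ in eq. (10) — no leakage for an on-grid (integer) Doppler, energy spreading over neighboring bins for a fractional Doppler — can be checked numerically. A sketch, where the removable singularity at $y=x$ is handled by its limit value of 1:

```python
import numpy as np

def psi(N, x, y):
    """psi_N(x, y) of eq. (10). As x -> y the ratio tends to 1, so the
    removable singularity at y = x is special-cased."""
    den = 1.0 - np.exp(-2j * np.pi * (y - x) / N)
    if abs(den) < 1e-12:
        return 1.0 + 0.0j
    return (1.0 - np.exp(2j * np.pi * x)) / (N * den)

N = 16
# on-grid Doppler (integer x): all energy stays in the y = x bin
assert abs(psi(N, 3, 3) - 1) < 1e-9
assert abs(psi(N, 3, 4)) < 1e-9
# fractional Doppler: energy leaks into neighboring bins, peaking at y = 3
leak = np.array([abs(psi(N, 3.3, y)) for y in range(N)])
assert leak.argmax() == 3 and leak[4] > 0
```

This leakage is exactly why the neighborhood $k^{\prime\prime}\in[k_{u,b,i}-\varepsilon,k_{u,b,i}+\varepsilon]$ appears in the DD-domain reception model below.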

Combining equations (1), (7), (8a), (8b) and (9), we obtain the received signal model in the DD domain as $\mathbf{y}_{u,b}^{DD}[k,l]=\sum_{i=1}^{P}\mathbf{y}_{u,b,i}^{DD}[k,l]$, where $\mathbf{y}_{u,b,i}^{DD}[k,l]$ is shown in equation (11), $k^{\prime\prime}\in[k_{u,b,i}-\varepsilon,k_{u,b,i}+\varepsilon]$ is defined as the neighborhood of the integer Doppler parameter, and $\varepsilon$ is a very small integer.

Proof:

Please refer to Appendix A in [29]. ∎

$\mathbf{y}_{u,b,i}^{DD}[k,l]\approx\begin{cases}h_{u,b,i}\mathbf{a}_{u,b,i}\sum\limits_{k^{\prime\prime}}X_{u}^{DD}[k-k^{\prime\prime},l-l_{u,b,i}]\,e^{-j2\pi\frac{k-k^{\prime\prime}}{N}}e^{j2\pi\frac{(l-l_{u,b,i})(k_{u,b,i}+\tilde{k}_{u,b,i})}{NM}}\psi_{N}(k_{u,b,i}+\tilde{k}_{u,b,i},k^{\prime\prime}),&l<l_{u,b,i},\\h_{u,b,i}\mathbf{a}_{u,b,i}\sum\limits_{k^{\prime\prime}}X_{u}^{DD}[k-k^{\prime\prime},l-l_{u,b,i}]\,e^{j2\pi\frac{(l-l_{u,b,i})(k_{u,b,i}+\tilde{k}_{u,b,i})}{NM}}\psi_{N}(k_{u,b,i}+\tilde{k}_{u,b,i},k^{\prime\prime}),&l\geq l_{u,b,i}.\end{cases}$ (11)
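To make the integer/fractional split of eqs. (8a)-(8b) concrete, the following sketch (all numbers are illustrative, not from the paper) quantizes a Doppler shift into its grid index $k$ and fractional remainder $\tilde{k}\in[-0.5,0.5)$, with the delay taken on-grid:

```python
# Illustrative quantization of delay/Doppler onto the DD grid (eqs. (8a)-(8b)).
M, N = 512, 128                    # illustrative grid dimensions
delta_f, T = 15e3, 1 / 15e3        # subcarrier spacing (Hz) and symbol time (s)

nu = 1873.4                        # an example Doppler shift in Hz
k = round(nu * N * T)              # integer Doppler index k_{u,b,i}
k_tilde = nu * N * T - k           # fractional part k~_{u,b,i}, in [-0.5, 0.5)

tau = 3 / (M * delta_f)            # an on-grid delay (l = 3), per eq. (8a)
l = round(tau * M * delta_f)

assert l == 3 and -0.5 <= k_tilde < 0.5
```

With these numbers $\nu NT\approx 15.99$, so the path lands at Doppler bin $k=16$ with a small negative fractional offset; it is this nonzero $\tilde{k}$ that activates the leakage term $\psi_{N}(\cdot,\cdot)$ in eq. (11).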

III Hybrid Preamble-based AUD and CE Scheme

III-A Rough Estimation

We first consider a signal model in which both the delay-domain and Doppler-domain dimensions are relatively small. Assume $N^{\prime}=\alpha N$ and $M^{\prime}=\beta M$ are both integers, where $0<\alpha,\beta<1$. The quantized value of the maximum delay $\tau_{\max}$ is $\tilde{l}^{\prime}_{\max}=\tau_{\max}M^{\prime}\Delta f=\beta\tau_{\max}M\Delta f\ll 1$ when $\beta$ is particularly small, which implies that any delay parameter $0<\tilde{l}^{\prime}_{u,b,i}\leq\tilde{l}^{\prime}_{\max}\ll 1$ is a fractional value. The quantized value of the maximum Doppler shift is $k^{\prime}_{\max}=\lceil\nu_{\max}N^{\prime}T\rceil$. Similar to equation (11), we obtain the reception model as:

$\mathbf{y}_{u,b}^{DD}[k^{\prime},l^{\prime}]\mathop{\approx}\limits^{(a)}\sum\limits_{i}h_{u,b,i}\mathbf{a}_{u,b,i}\sum\limits_{k^{\prime\prime}}X_{u}^{DD}[k^{\prime}-k^{\prime\prime},l^{\prime}]\,e^{j2\pi\frac{l^{\prime}k^{\prime}_{u,b,i}}{N^{\prime}M^{\prime}}}e^{-j2\pi\frac{\tilde{l}^{\prime}_{u,b,i}(k^{\prime}_{u,b,i}+\tilde{k}^{\prime}_{u,b,i})}{N^{\prime}M^{\prime}}}\psi_{N^{\prime}}(k^{\prime}_{u,b,i}+\tilde{k}^{\prime}_{u,b,i},k^{\prime\prime})\,\psi_{M^{\prime}}(-\tilde{l}^{\prime}_{u,b,i},0),$ (12)

where $\tilde{l}^{\prime}_{u,b,i}=\tau_{u,b,i}M^{\prime}\Delta f$, $k^{\prime}_{u,b,i}=[\nu_{u,b,i}N^{\prime}T]_{\text{R}}$, $\tilde{k}^{\prime}_{u,b,i}=\nu_{u,b,i}N^{\prime}T-k^{\prime}_{u,b,i}$, and $k^{\prime\prime}\in[k^{\prime}_{u,b,i}-\varepsilon^{\prime},k^{\prime}_{u,b,i}+\varepsilon^{\prime}]$ is defined as the neighborhood of $k^{\prime}_{u,b,i}$, where $\varepsilon^{\prime}$ is a small integer. Since $l^{\prime}<M^{\prime}$ and $-0.5\leq\tilde{k}^{\prime}_{u,b,i}<0.5$, and especially when $M^{\prime}$ is very small and $N^{\prime}$ is large compared to $M^{\prime}$, approximation (a) holds.

Proof:

Please refer to Appendix B in [29]. ∎

For a concise expression, we define the function:

$\mathbf{C}_{\mathbf{x},k,\varepsilon}=\left[\text{circ}(\mathbf{x},0),\text{circ}(\mathbf{x},1),\ldots,\text{circ}(\mathbf{x},k+\varepsilon),\text{circ}(\mathbf{x},-\varepsilon),\text{circ}(\mathbf{x},-\varepsilon+1),\ldots,\text{circ}(\mathbf{x},-1)\right],$ (13)

where $\text{circ}(\mathbf{x},i)$ represents the vector obtained by circularly shifting vector $\mathbf{x}$ by $i$ positions. Based on the above definitions, we transform equation (12) into matrix form:

$\mathbf{Y}_{u,b}^{p1}\approx(\mathbf{X}_{u}^{p1}\odot\mathbf{\Phi}^{\prime})\mathbf{H}_{u,b}^{DD1}+\mathbf{N}_{u,b}^{DD1}=\mathbf{A}_{u}^{p1}\mathbf{H}_{u,b}^{DD1}+\mathbf{N}_{u,b}^{DD1}\in\mathbb{C}^{N^{\prime}M^{\prime}\times N_{y}N_{z}},$ (14)

where $\mathbf{Y}_{b}^{p1}=\left[\mathbf{y}_{b}^{DD}[0,0],\ldots,\mathbf{y}_{b}^{DD}[N^{\prime}-1,M^{\prime}-1]\right]^{T}$, and

$\mathbf{X}_{u}^{p1}=\left[\begin{array}{c}\mathbf{C}_{(\mathbf{X}_{u}^{DD})_{:,1},\,k^{\prime}_{\max},\,\varepsilon^{\prime}}\\\mathbf{C}_{(\mathbf{X}_{u}^{DD})_{:,2},\,k^{\prime}_{\max},\,\varepsilon^{\prime}}\\\vdots\\\mathbf{C}_{(\mathbf{X}_{u}^{DD})_{:,M^{\prime}},\,k^{\prime}_{\max},\,\varepsilon^{\prime}}\end{array}\right],$ (15a)
$\mathbf{\Phi}^{\prime}=(\mathbf{F}_{M^{\prime},N^{\prime}})_{:,\mathbf{p}_{r1}}\otimes\mathbf{1}^{N^{\prime}\times 1}.$ (15b)

Here, $\mathbf{F}_{M^{\prime},N^{\prime}}$ is a matrix whose elements are given by $F_{M^{\prime},N^{\prime}}[k,l]=e^{j2\pi\frac{kl}{M^{\prime}N^{\prime}}}$. $\mathbf{p}_{r1}$ is the index vector that satisfies $\mathbf{p}_{r1}=[1:k^{\prime}_{\max}+\varepsilon^{\prime}+1]\cup[N^{\prime}-\varepsilon^{\prime}+1:N^{\prime}]$. $\mathbf{H}_{u,b}^{DD1}\in\mathbb{C}^{(k^{\prime}_{\max}+2\varepsilon^{\prime}+1)\times N_{y}N_{z}}$ is expressed as

$\mathbf{H}_{u,b}^{DD1}=\sum\limits_{i}\mathbf{H}_{u,b,i}^{DD1}.$ (16)

$(\mathbf{H}_{u,b,i}^{DD1})_{t,:}=\begin{cases}\hat{h}^{\prime}_{u,b,i}\mathbf{a}_{u,b,i}^{T}\psi_{N^{\prime}}(\tilde{k}^{\prime}_{u,b,i},t^{\prime}),&\text{if }k^{\prime}_{u,b,i}+t^{\prime}<0,\ t=(k^{\prime}_{\max}+2\varepsilon^{\prime}+1)+k^{\prime}_{u,b,i}+t^{\prime}+1,\\\hat{h}^{\prime}_{u,b,i}\mathbf{a}_{u,b,i}^{T}\psi_{N^{\prime}}(\tilde{k}^{\prime}_{u,b,i},t^{\prime}),&\text{if }k^{\prime}_{u,b,i}+t^{\prime}\geq 0,\ t=k^{\prime}_{u,b,i}+t^{\prime}+1,\\\mathbf{0}^{1\times N_{y}N_{z}},&\text{otherwise}.\end{cases}$ (17)

$\mathbf{H}_{u,b,i}^{DD1}$ is presented in equation (17), where $-\varepsilon^{\prime}\leq t^{\prime}\leq\varepsilon^{\prime}$ and $\hat{h}^{\prime}_{u,b,i}=h_{u,b,i}\,\psi_{M^{\prime}}(-\tilde{l}^{\prime}_{u,b,i},0)\,e^{-j2\pi\frac{\tilde{l}^{\prime}_{u,b,i}(k^{\prime}_{u,b,i}+\tilde{k}^{\prime}_{u,b,i})}{N^{\prime}M^{\prime}}}$. Combining equations (14) and (17), $\mathbf{A}_{u}^{p1}=\mathbf{X}_{u}^{p1}\odot\mathbf{\Phi}^{\prime}$ is regarded as the measurement matrix known at the AP, $\mathbf{Y}_{b}^{p1}$ as the observed matrix, and $\mathbf{H}_{u,b}^{DD1}$ as an unknown 2-D block sparse matrix. In a multi-user scenario, the dimension of the sparse matrix expands, allowing us to use this model to detect the indices of non-zero entries in the sparse matrix and thereby identify potential active UEs. Given the coarse approximations made during rough AUD, especially when $M^{\prime}$ is notably small, accurate estimation of the channel parameters is challenging. Therefore, further refinement based on the initial rough detection is necessary for accurate AUD and CE, as elaborated in the following subsection.
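The shifted-column dictionary $\mathbf{C}_{\mathbf{x},k,\varepsilon}$ of eq. (13), which the measurement matrix $\mathbf{X}_{u}^{p1}$ stacks column-block by column-block, can be sketched as follows (assuming `np.roll`'s sign convention matches $\text{circ}(\mathbf{x},i)$):

```python
import numpy as np

def circ(x, i):
    """circ(x, i): vector x circularly shifted by i positions (np.roll assumed)."""
    return np.roll(x, i)

def C(x, k, eps):
    """C_{x,k,eps} of eq. (13): columns are the circular shifts
    0, 1, ..., k+eps followed by -eps, ..., -1 (k + 2*eps + 1 columns)."""
    shifts = list(range(0, k + eps + 1)) + list(range(-eps, 0))
    return np.stack([circ(x, s) for s in shifts], axis=1)

x = np.arange(6)
Cm = C(x, 2, 1)                  # shifts 0, 1, 2, 3, -1
assert Cm.shape == (6, 2 + 2 * 1 + 1)
assert np.array_equal(Cm[:, 0], x)               # zero shift
assert np.array_equal(Cm[:, -1], np.roll(x, -1))  # negative-shift tail
```

Each column corresponds to one candidate Doppler offset, so a path at integer Doppler $k$ with leakage width $\varepsilon$ selects a contiguous block of columns, which is the block sparsity the recovery algorithm exploits.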

III-B Accurate Estimation

Given sufficiently large dimensions of the transmission block, the embedded preamble can be adopted for joint accurate AUD and CE. Let $k_{\max}=\lfloor\nu_{\max}NT\rfloor$ and $l_{\max}=\tau_{\max}M\Delta f$. It can be observed from (11) that the $(k,l)$-th DD-domain received symbol is affected by the transmitted symbols in the range $[k-k_{\max}-\varepsilon:k+\varepsilon,\,l-l_{\max}:l]$. Therefore, to avoid interference between preamble and data, a guard interval needs to be established, where symbols within this interval are set to zero, as illustrated in Fig. 4.

Figure 4: Symbols arrangement for data, guard and embedded preamble.
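A hedged sketch of the Fig. 4 arrangement (all sizes are made up for illustration; the guard widths follow the interference range $[k-k_{\max}-\varepsilon:k+\varepsilon,\ l-l_{\max}:l]$ noted above):

```python
import numpy as np

# Illustrative DD-grid layout: preamble block at (kp, lp), zero guard around it,
# data elsewhere. All dimensions below are assumptions for the sketch only.
N, M = 32, 64
k_max, l_max, eps = 4, 6, 1
kp, lp, Kp, Lp = 8, 20, 10, 12

grid = np.full((N, M), 'data', dtype=object)
# guard band wide enough that data cannot leak into the preamble observation
# window along Doppler ([k - k_max - eps, k + eps]) and delay ([l - l_max, l])
grid[kp - k_max - eps : kp + Kp + k_max + eps, lp - l_max : lp + Lp + l_max] = 'guard'
grid[kp : kp + Kp, lp : lp + Lp] = 'preamble'

assert (grid == 'preamble').sum() == Kp * Lp
assert (grid == 'guard').sum() > 0 and grid[0, 0] == 'data'
```

The guard symbols cost overhead, which is exactly the trade-off the hybrid scheme targets: the superimposed preamble needs no guard but estimates coarsely, while this embedded layout pays guard overhead for clean observations.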

Assume the starting coordinates of the preamble are $(k_{p},l_{p})$, with dimension $L_{p}$ on the delay axis and $K_{p}$ on the Doppler axis. If we set $l_{p}-l_{\max}\geq l_{\max}$ and $l_{p}+L_{p}<M$, then for $l\in[l_{p}-l_{\max},l_{p}+L_{p}]$, only the case $l\geq l_{u,b,i}$ (since $l\geq l_{\max}\geq l_{u,b,i}$) in (11) needs to be considered. We denote $M_{p}=L_{p}+l_{\max}$, $N_{p}=K_{p}+k_{\max}$, and let $\mathbf{X}_{u,p}=(\mathbf{X}_{u})_{k_{p}:k_{p}+N_{p}+\varepsilon-1,\,l_{p}:l_{p}+M_{p}-1}$ and $\mathbf{Y}_{u,b}^{p2}=(\mathbf{Y}_{u,b})_{\text{vec}(k_{p}:k_{p}+N_{p}-1,\,l_{p}:l_{p}+M_{p}-1),:}$, where $\mathbf{X}_{u}\in\mathbb{C}^{N\times M}$ and $\mathbf{Y}_{u,b}\in\mathbb{C}^{NM\times N_{z}N_{y}}$ are the DD-domain transmitted symbols of the $u$-th UE and received symbols of the $b$-th AP, respectively, and $\text{vec}(k_{p}:k_{p}+N_{p}-1,\,l_{p}:l_{p}+M_{p}-1)$ denotes the vectorized matrix indices $[k_{p}:k_{p}+N_{p}-1,\,l_{p}:l_{p}+M_{p}-1]$. Similar to (14), the received embedded preamble signal is expressed as:

$\mathbf{Y}_{u,b}^{p2}\approx(\mathbf{X}_{u}^{p2}\odot\mathbf{\Phi})\mathbf{H}_{u,b}^{DD2}+\mathbf{N}_{u,b}^{DD2}=\mathbf{A}_{u}^{p2}\mathbf{H}_{u,b}^{DD2}+\mathbf{N}_{u,b}^{DD2}\in\mathbb{C}^{N_{p}M_{p}\times N_{z}N_{y}},$ (18)

where $\mathbf{X}_{u}^{p2}$ is described in equation (19), and

$\mathbf{X}_{u}^{p2}=\left[\begin{array}{cccc}\mathbf{C}^{c}_{(\mathbf{X}_{u,p})_{:,1},\,k_{\max},\,\varepsilon}&\mathbf{C}^{c}_{(\mathbf{X}_{u,p})_{:,L_{p}+l_{\max}},\,k_{\max},\,\varepsilon}&\cdots&\mathbf{C}^{c}_{(\mathbf{X}_{u,p})_{:,L_{p}+1},\,k_{\max},\,\varepsilon}\\\mathbf{C}^{c}_{(\mathbf{X}_{u,p})_{:,2},\,k_{\max},\,\varepsilon}&\mathbf{C}^{c}_{(\mathbf{X}_{u,p})_{:,1},\,k_{\max},\,\varepsilon}&\cdots&\mathbf{C}^{c}_{(\mathbf{X}_{u,p})_{:,L_{p}+2},\,k_{\max},\,\varepsilon}\\\vdots&\vdots&\ddots&\vdots\\\mathbf{C}^{c}_{(\mathbf{X}_{u,p})_{:,L_{p}+l_{\max}},\,k_{\max},\,\varepsilon}&\mathbf{C}^{c}_{(\mathbf{X}_{u,p})_{:,L_{p}+l_{\max}-1},\,k_{\max},\,\varepsilon}&\cdots&\mathbf{C}^{c}_{(\mathbf{X}_{u,p})_{:,L_{p}},\,k_{\max},\,\varepsilon}\end{array}\right]$ (19)
$\mathbf{C}^{c}_{\mathbf{x},k,\varepsilon}=(\mathbf{C}_{\mathbf{x},k,\varepsilon})_{1:\dim(\mathbf{x})-\varepsilon,:},$ (20a)
$\mathbf{\Phi}=\mathbf{1}^{1\times(l_{\max}+1)}\otimes(\mathbf{F}_{M,N})_{l_{p}:l_{p}+M_{p}-1,\,\mathbf{p}_{r2}}\otimes\mathbf{1}^{N_{p}\times 1}.$ (20b)

𝐩r2{{\mathbf{p}}_{r2}} is the index vector that satisfies 𝐩r2=[1:kmax+ε+1][Npε+1:Np]{{\mathbf{p}}_{r2}}=[1:k_{\max}+\varepsilon+1]\cup[N_{p}-\varepsilon+1:N_{p}]. 𝐇u,bDD2(kmax+2ε+1)(lmax+1)×NzNy{\mathbf{H}}_{u,b}^{DD2}\in{\mathbb{C}^{({k_{\max}}+2\varepsilon+1)({l_{\max}}+1)\times N_{z}N_{y}}} is expressed as

𝐇u,bDD2=i𝐇u,b,iDD2.{\mathbf{H}}_{u,b}^{DD2}=\sum\limits_{i}{{\mathbf{H}}_{u,b,i}^{DD2}}. (21)
(𝐇u,b,iDD2)t,:={h^u,b,i𝐚u,b,iTψN(k~u,b,i,t),{if ku,b,i+t<0t=(li+1)(kmax+2ε+1)+ku,b,i+t+1,if ku,b,i+t0t=li(kmax+2ε+1)+ku,b,i+t+1,𝟎1×NyNz,otherwise.\left(\mathbf{H}_{u,b,i}^{DD2}\right)_{t,:}=\left\{{\begin{array}[]{*{20}{c}}{{\hat{h}_{u,b,i}}\mathbf{a}_{u,b,i}^{T}\psi_{N}({{{\tilde{k}}_{u,b,i}}},t^{\prime}),}&{\left\{{\begin{array}[]{*{20}{c}}{{\text{if }}{k_{u,b,i}}+t^{\prime}<0{\text{, }}t=({l_{i}}+1)({k_{\max}}+2\varepsilon+1)+{k_{u,b,i}}+t^{\prime}+1},\\ {{\text{if }}{k_{u,b,i}}+t^{\prime}\geq 0{\text{, }}t={l_{i}}({k_{\max}}+2\varepsilon+1)+{k_{u,b,i}}+t^{\prime}+1},\end{array}}\right.}\\ \mathbf{0}^{1\times N_{y}N_{z}},&{{\text{otherwise}}}.\end{array}}\right. (22)

 

𝐇u,b,iDD2{{\mathbf{H}}_{u,b,i}^{DD2}} is given in equation (22), where εtε-\varepsilon\leq t^{\prime}\leq\varepsilon and h^u,b,i=hu,b,iej2πlu,b,i(ku,b,i+k~u,b,i)NM{\hat{h}_{u,b,i}}={h_{u,b,i}}{e^{-j2\pi\frac{{{{l}_{u,b,i}}\left({{{k}_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}. According to equation (18), 𝐀up2=𝐗up2𝚽{\mathbf{A}}_{u}^{p2}={{\mathbf{X}}_{u}^{p2}\odot{\mathbf{\Phi}}} can be regarded as the measurement matrix known at the AP, 𝐘u,bp2{\mathbf{Y}}_{u,b}^{p2} as the observed matrix, and 𝐇u,bDD2{\mathbf{H}}_{u,b}^{DD2} as an unknown 2-D block sparse matrix. In accurate estimation, M>MM>M^{\prime} and N>NN>N^{\prime}, which yields higher resolution in delay and Doppler and thus more precise quantization. However, compared to rough estimation, the dimension of the sparse vector in accurate estimation is larger, making it more difficult to recover the sparse vector in multi-UE scenarios. Therefore, a hybrid preamble scheme is designed to achieve precise detection and estimation with lower overhead and complexity.

III-C Hybrid Preamble for Multi-UE Joint Active Detection and Channel Estimation

In the N×MN\times M DD domain, we superimpose the superimposed preamble, denoted as preamble1 𝐗u,p1DD{\mathbf{X}}_{u,p1}^{DD}, onto the block symbols 𝐗u,2DD{\mathbf{X}}_{u,2}^{DD}, which include the embedded preamble, denoted as preamble2 𝐗u,p2{\mathbf{X}}_{u,p2}, and the data symbols 𝐗u,d{\mathbf{X}}_{u,d}. Different power levels are allocated to 𝐗u,p1DD{\mathbf{X}}_{u,p1}^{DD} and 𝐗u,2DD{\mathbf{X}}_{u,2}^{DD}, ensuring a significant difference between these two types of signals in the energy domain. The superimposed result forms the transmission block, structured as shown in Fig. 5.

Refer to caption
Figure 5: The hybrid preamble transmission block structure.

Then we have

𝐗uDD=𝐗u,1DD+𝐗u,2DD,Xu,1DD[k,l]={Xu,p1DD[k,l]k=k,l=lβ,0otherwise,Xu,2DD[k,l]={Xu,p2DD[k,l]k=k+kp,l=l+lp,Xu,dDD[k,l](k,l) not in 𝒫𝒢 area,0otherwise,\begin{gathered}{\mathbf{X}}_{u}^{DD}={\mathbf{X}}_{u,1}^{DD}+{\mathbf{X}}_{u,2}^{DD},\hfill\\ {{X}}_{u,1}^{DD}[k,l]=\left\{{\begin{array}[]{*{20}{c}}{{{X}}_{u,p1}^{DD}[k^{\prime},l^{\prime}]}&{k=k^{\prime},l=\frac{{l^{\prime}}}{\beta}},\\ 0&{{\text{otherwise}}},\end{array}}\right.\hfill\\ {{X}}_{u,2}^{DD}[k,l]=\left\{{\begin{array}[]{*{20}{c}}{{{X}}_{u,p2}^{DD}[k^{\prime},l^{\prime}]}&{k=k^{\prime}+{k_{p}},l=l^{\prime}+{l_{p}}},\\ {{{X}}_{u,d}^{DD}[k,l]}&{\left({k,l}\right){\text{ not in }}{\mathcal{P}\mathcal{G}}{\text{ area}}},\\ 0&{{\text{otherwise}}},\end{array}}\right.\hfill\\ \end{gathered} (23)

where the 𝒫𝒢\mathcal{PG} area represents the grids designated for placing preamble2 and the guard intervals. Preamble1 is placed at intervals of 1β\frac{1}{\beta} along the delay axis and continuously along the Doppler axis. Since the delay dimension M=βMM^{\prime}=\beta M of preamble1 is assumed to be very small, and the Doppler dimension satisfies N=αNN2N^{\prime}=\alpha N\leq\frac{N}{2}, there is sufficient space within the DD domain to place preamble2 and the guard interval. This arrangement ensures that the received signal of preamble1 does not interfere with preamble2.
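To make the placement rule in (23) concrete, the following Python sketch builds the two layers of the transmission block on a small hypothetical grid; all dimensions, scaling factors, the power factor `rho`, and the placeholder symbol values are illustrative assumptions, not the paper's simulation settings.

```python
import numpy as np

# Hypothetical grid sizes and placement parameters (illustrative only)
N, M = 16, 32                      # Doppler x delay grid
alpha, beta = 0.25, 0.25           # preamble1 scaling factors
Np1, Mp1 = int(alpha * N), int(beta * M)   # preamble1 dims N', M'
kp, lp = 2, 12                     # preamble2 start coordinates (k_p, l_p)
Kp, Lp = 4, 6                      # preamble2 dims
k_max, l_max = 2, 2                # guard widths from Doppler/delay spread

X1 = np.zeros((N, M), dtype=complex)   # layer 1: superimposed preamble1
X2 = np.zeros((N, M), dtype=complex)   # layer 2: preamble2 + data

# preamble1: every (1/beta)-th delay bin, contiguous along Doppler
rng = np.random.default_rng(0)
p1 = rng.standard_normal((Np1, Mp1)) + 1j * rng.standard_normal((Np1, Mp1))
X1[:Np1, ::int(1 / beta)] = p1

# PG area: preamble2 plus the surrounding guard grids
pg = np.zeros((N, M), dtype=bool)
pg[kp - k_max:kp + Kp + k_max, lp - l_max:lp + Lp + l_max] = True

X2[kp:kp + Kp, lp:lp + Lp] = 1.0   # placeholder preamble2 symbols
X2[~pg] = 1j                       # placeholder data symbols outside PG

rho = 0.2                          # hypothetical power split between layers
X = np.sqrt(rho) * X1 + np.sqrt(1 - rho) * X2
```

The guard cells inside the PG area stay zero in layer 2, so only preamble1 energy overlaps them.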

Building on this, the received signal in the TF domain can be expressed as:

𝐘bTF=𝐘b1TF+𝐘b2TF+𝐍bTF=𝐘b1TF+𝐙~bTF,\begin{gathered}{\mathbf{Y}^{TF}_{b}}={\mathbf{Y}^{TF}_{b1}}+{\mathbf{Y}^{TF}_{b2}}+\mathbf{N}^{TF}_{b}\\ ={\mathbf{Y}^{TF}_{b1}}+\mathbf{\tilde{Z}}^{TF}_{b},\\ \end{gathered} (24)

where 𝐘b1TF{\mathbf{Y}^{TF}_{b1}} and 𝐘b2TF{\mathbf{Y}^{TF}_{b2}} represent the bb-th AP’s received 𝐗1DD{\mathbf{X}}_{1}^{DD} and 𝐗2DD{\mathbf{X}}_{2}^{DD} signals in the TF domain, respectively. 𝐙~bTF=𝐘b2TF+𝐍bTF\mathbf{\tilde{Z}}^{TF}_{b}={\mathbf{Y}^{TF}_{b2}}+\mathbf{N}^{TF}_{b} is treated as noise. Suppose that applying the ISFFT to 𝐗u,p1DDN×M{\mathbf{X}}_{u,p1}^{DD}\in{\mathbb{C}^{N^{\prime}\times M^{\prime}}} yields the TF domain signal 𝐗u,p1TFN×M{\mathbf{X}}_{u,p1}^{TF}\in{\mathbb{C}^{N^{\prime}\times M^{\prime}}}, and applying the ISFFT to 𝐗u,1DDN×M{\mathbf{X}}_{u,1}^{DD}\in{\mathbb{C}^{N\times M}} yields the TF domain signal 𝐗u,1TFN×M{\mathbf{X}}_{u,1}^{TF}\in{\mathbb{C}^{N\times M}}; both signals pass through the same channel to arrive at the bb-th AP. After performing the Wigner transform, the TF domain received signals are 𝐘u,b,p1TFNM×NzNy{\mathbf{Y}}_{u,b,p1}^{TF}\in{\mathbb{C}^{N^{\prime}M^{\prime}\times N_{z}N_{y}}} and 𝐘u,b,1TFNM×NzNy{\mathbf{Y}}_{u,b,1}^{TF}\in{\mathbb{C}^{NM\times N_{z}N_{y}}}, respectively. Based on equations (1), (9), and (22), we can derive:

Xu,p1TF[n,m]=1αβXu,1TF[nα,m],{{X}}_{u,p1}^{TF}[n^{\prime},m^{\prime}]=\frac{1}{{\sqrt{\alpha\beta}}}{{X}}_{u,1}^{TF}[\frac{{n^{\prime}}}{\alpha},m^{\prime}], (25)
𝐲u,b,1TF[nα,m]=αβ1Tihu,b,i𝐚u,b,im′′Xu,p1TF[n,m′′]ej2πm′′Δfτu,b,iej2πνu,b,iτu,b,iej2πνu,b,inαTτu,b,iTej2πΔft(mm′′νu,b,iΔf)𝑑t+𝐧u,b,1TF[nα,m],\begin{gathered}{\mathbf{y}}_{u,b,1}^{TF}[\frac{{n^{\prime}}}{\alpha},m^{\prime}]=\sqrt{\alpha\beta}\frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime\prime}}{}{{X}}_{u,p1}^{TF}[n^{\prime},m^{\prime\prime}]\hfill\\ {e^{-j2\pi m^{\prime\prime}\Delta f{\tau_{u,b,i}}}}{e^{-j2\pi{\nu_{u,b,i}}{\tau_{u,b,i}}}}{e^{j2\pi{\nu_{u,b,i}}\frac{{n^{\prime}}}{\alpha}T}}\hfill\\ \int_{{\tau_{u,b,i}}}^{T}{}{e^{-j2\pi\Delta ft(m^{\prime}-m^{\prime\prime}-\frac{{{\nu_{u,b,i}}}}{{\Delta f}})}}dt+\mathbf{n}_{u,b,1}^{TF}[\frac{{n^{\prime}}}{\alpha},m^{\prime}],\hfill\\ \end{gathered} (26)
𝐲u,b,p1TF[n,m]=1Tihu,b,i𝐚u,b,im′′Xu,p1TF[n,m′′]ej2πm′′Δfτu,b,iej2πνu,b,iτu,b,iej2πνu,b,inTτu,b,iTej2πΔft(mm′′νu,b,iΔf)𝑑t+𝐧u,b,p1TF[n,m].\begin{gathered}{\mathbf{y}}_{u,b,p1}^{TF}[n^{\prime},m^{\prime}]=\frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime\prime}}{{X}}_{u,p1}^{TF}[n^{\prime},m^{\prime\prime}]\hfill\\ {e^{-j2\pi m^{\prime\prime}\Delta f{\tau_{u,b,i}}}}{e^{-j2\pi{\nu_{u,b,i}}{\tau_{u,b,i}}}}{e^{j2\pi{\nu_{u,b,i}}n^{\prime}T}}\hfill\\ \int_{{\tau_{u,b,i}}}^{T}{}{e^{-j2\pi\Delta ft(m^{\prime}-m^{\prime\prime}-\frac{{{\nu_{u,b,i}}}}{{\Delta f}})}}dt+\mathbf{n}_{u,b,p1}^{TF}[n^{\prime},m^{\prime}].\hfill\\ \end{gathered} (27)
Proof:

Please refer to Appendix C in [29]. ∎

By comparing equations (26) and (27), it is apparent that, for 0n<N0\leq n^{\prime}<N^{\prime} and 0m<M0\leq m^{\prime}<M^{\prime}, 𝐲u,b,1TF[n,m]=αβ𝐲u,b,1TF[nα,m]NzNy×1{\mathbf{y^{\prime}}}_{u,b,1}^{TF}[n^{\prime},m^{\prime}]={\sqrt{\alpha\beta}{\mathbf{y}}_{u,b,1}^{TF}[\frac{{n^{\prime}}}{\alpha},m^{\prime}]}\in{\mathbb{C}^{N_{z}N_{y}\times 1}} can be approximated as the received signal of 𝐗u,p1TF[n,m]{\mathbf{X}}_{u,p1}^{TF}[n^{\prime},m^{\prime}] through a channel with the same parameters, except that the Doppler parameter is 1α\frac{1}{\alpha} times the original one. Therefore, the maximum Doppler quantization parameter kmax{k^{\prime}_{\max}} also becomes 1α\frac{1}{\alpha} times the original value. Based on this inference, we apply an N×MN^{\prime}\times M^{\prime} SFFT to each column of 𝐘u,b,1TF=[𝐲u,b,1TF[0,0],,𝐲u,b,1TF[N1,M1]]T{\mathbf{Y^{\prime}}}_{u,b,1}^{TF}=\left[\mathbf{y^{\prime}}_{u,b,1}^{TF}\left[{0,0}\right],...,\mathbf{y^{\prime}}_{u,b,1}^{TF}\left[{N^{\prime}-1,M^{\prime}-1}\right]\right]^{T} and then perform rough AUD with the maximum Doppler quantization parameter kmax=[1αNΔfνmax]R{k^{\prime}_{\max}}=[\frac{1}{\alpha}N^{\prime}\Delta f{\nu_{\max}}]_{\text{R}}. In the multi-UE scenario, the reception model for preamble1, as described in (14), can be written as:

𝐘bp1𝐀p1𝐇bDD1+𝐍bDDNM×NzNy,{\mathbf{Y}}_{b}^{p1}\approx{{\mathbf{A}}^{p1}}{\mathbf{H}}_{b}^{DD1}+{\mathbf{N}}_{b}^{DD}\in{\mathbb{C}^{N^{\prime}M^{\prime}\times N_{z}N_{y}}}, (28)

where 𝐘bp1{\mathbf{Y}}_{b}^{p1} is the SFFT result of u𝐘u,b,1TF\sum_{u}{\mathbf{Y^{\prime}}}_{u,b,1}^{TF}, and

𝐀p1=[𝐀1p1,𝐀2p1,,𝐀Up1],\displaystyle\qquad\qquad\qquad{{\mathbf{A}}^{p1}}=\left[{\mathbf{A}}_{1}^{p1},{\mathbf{A}}_{2}^{p1},\ldots,{\mathbf{A}}_{U}^{p1}\right], (29a)
𝐇bDD1=[(𝐇b,1DD1)T,(𝐇b,2DD1)T,,(𝐇b,UDD1)T]T.\displaystyle{\mathbf{H}}_{b}^{DD1}={\left[{{{\left({{\mathbf{H}}_{b,1}^{DD1}}\right)}^{T}},{{\left({{\mathbf{H}}_{b,2}^{DD1}}\right)}^{T}},\ldots,{{\left({{\mathbf{H}}_{b,U}^{DD1}}\right)}^{T}}}\right]^{T}}. (29b)

After completing the rough AUD, each AP transmits its detected results, i.e., the set of active UEs, to the CPU. The CPU merges these results to form a system-wide rough active UE set 𝒰¯a=b𝒰¯b,a{\mathcal{\bar{U}}_{a}}=\bigcup\nolimits_{b}{{\mathcal{\bar{U}}_{b,a}}}, where 𝒰¯b,a{\mathcal{\bar{U}}_{b,a}} denotes the set of active UEs detected by the bb-th AP. For 1i|𝒰¯a|1\leq i\leq\left|{{\mathcal{\bar{U}}_{a}}}\right|, let ui𝒰¯au_{i}\in\mathcal{\bar{U}}_{a} denote the ii-th UE in this set. Similarly, for the multi-UE scenario, equation (18) is rewritten as:

𝐘bp2𝐀p2𝐇bDD2+𝐍bDD2NpMp×NzNy,{\mathbf{Y}}_{b}^{p2}\approx{{\mathbf{A}}^{p2}}{\mathbf{H}}_{b}^{DD2}+{\mathbf{N}}_{b}^{DD2}\in{\mathbb{C}^{{N_{p}}{M_{p}}\times N_{z}N_{y}}}, (30)

where 𝐘bp2=u𝐘u,bp2{\mathbf{Y}}_{b}^{p2}=\sum_{u}{\mathbf{Y}}_{u,b}^{p2}, and

𝐀p2=[𝐀u1p2,𝐀u2p2,,𝐀u|𝒰¯a|p2],\displaystyle\qquad\qquad\qquad{{\mathbf{A}}^{p2}}=\left[{\mathbf{A}}_{{u_{1}}}^{p2},{\mathbf{A}}_{{u_{2}}}^{p2},\ldots,{\mathbf{A}}_{{u_{\left|{{\mathcal{\bar{U}}_{a}}}\right|}}}^{p2}\right], (31a)
𝐇bDD2=[(𝐇b,u1DD2)T,(𝐇b,u2DD2)T,,(𝐇b,u|𝒰a|DD2)T]T.\displaystyle{\mathbf{H}}_{b}^{DD2}={\left[{{{\left({{\mathbf{H}}_{b,{u_{1}}}^{DD2}}\right)}^{T}},{{\left({{\mathbf{H}}_{b,{u_{2}}}^{DD2}}\right)}^{T}},\ldots,{{\left({{\mathbf{H}}_{b,{u_{\left|{{\mathcal{U}_{a}}}\right|}}}^{DD2}}\right)}^{T}}}\right]^{T}}. (31b)

Since |𝒰¯a|U\left|{{\mathcal{\bar{U}}_{a}}}\right|\ll U, the dimension of the sparse vector to be recovered is smaller than that of the estimated vector in a scheme that solely performs accurate AUD with the same sparsity (i.e., the same number of non-zero elements) and received signals. Therefore, a more accurate estimation can be achieved by the hybrid preamble scheme.

After obtaining the active UEs and their corresponding channels, the influence of preamble1 on the received signal can be removed by successive interference cancellation (SIC). Based on the residual received signal and the estimated channel parameters, the data signal can be recovered using algorithms such as message passing. The system’s signal processing flow can be seen in Fig. 2. Data recovery is beyond the scope of this paper. The hybrid preamble based AUD and CE scheme is summarized in Algorithm 1.

Algorithm 1 Hybrid Preamble Based AUD and CE Scheme
0:{𝐘bp1}\left\{{{\mathbf{Y}}_{b}^{p1}}\right\}, 𝐀p1{{\mathbf{A}}^{p1}}, {𝐘bp2}\left\{{{\mathbf{Y}}_{b}^{p2}}\right\}
0:𝒰a=b𝒰b,a{{\mathcal{U}_{a}}}=\bigcup\nolimits_{b}{{{{\mathcal{U}}}_{b,a}}}, {𝐇¯bDD2|1bB}\left\{{{\mathbf{\bar{H}}}_{b}^{DD2}|1\leq b\leq B}\right\}
1:% Rough AUD
2:for b=1b=1 to BB do
3:  Recover 𝐇bDD1{\mathbf{H}}_{b}^{DD1} based on 𝐘bp1{\mathbf{Y}}_{b}^{p1} and 𝐀p1{\mathbf{A}}^{p1} by block sparse matrix recovery algorithm (such as GAMP-PCSBL-La proposed in Section IV);
4:  Obtain 𝒰¯b,a{\mathcal{\bar{U}}_{b,a}} based on non-zero entries of estimated 𝐇bDD1{\mathbf{H}}_{b}^{DD1};
5:end for
6: Form 𝐀p2{{\mathbf{A}}^{p2}} based on 𝒰¯a=b𝒰¯b,a{\mathcal{\bar{U}}_{a}}=\bigcup\nolimits_{b}{{\mathcal{\bar{U}}_{b,a}}};
7:% Accurate AUD and CE
8:for b=1b=1 to BB do
9:  Recover 𝐇bDD2{\mathbf{H}}_{b}^{DD2} based on 𝐀p2{\mathbf{A}}^{p2} and 𝐘bp2{\mathbf{Y}}_{b}^{p2} by block sparse matrix recovery algorithm (such as GAMP-PCSBL-La proposed in Section IV);
10:  Obtain accurate detected active UEs’ set 𝒰¯a{\bar{\mathcal{U}}_{a}} and channel matrix {𝐇¯b,iDD2|i𝒰¯a}\left\{{{\mathbf{\bar{H}}}_{b,i}^{DD2}|i\in{{\bar{\mathcal{U}}}_{a}}}\right\} based on non-zero entries of estimated 𝐇bDD2{\mathbf{H}}_{b}^{DD2}.
11:end for
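The two-stage flow of Algorithm 1 can be sketched in Python as follows. The recovery step is deliberately replaced by a least-squares placeholder so the sketch stays self-contained; a real system would call a block sparse recovery routine such as GAMP-PCSBL-La here, and the energy threshold for declaring a UE active is a hypothetical choice.

```python
import numpy as np

def recover_block_sparse(Y, A):
    # Placeholder for a block sparse recovery routine (e.g., GAMP-PCSBL-La);
    # least-squares keeps this sketch runnable on its own.
    return np.linalg.lstsq(A, Y, rcond=None)[0]

def detect_active(blocks, thresh=1e-6):
    # A UE is declared active when its channel block carries non-negligible energy.
    return {u for u, Hu in blocks.items() if np.linalg.norm(Hu) ** 2 > thresh}

def split_blocks(H, A_blocks):
    # Split the stacked estimate back into per-UE blocks (dict order preserved).
    sizes = [A.shape[1] for A in A_blocks.values()]
    parts = np.split(H, np.cumsum(sizes)[:-1], axis=0)
    return dict(zip(A_blocks.keys(), parts))

def hybrid_aud_ce(Y1_per_ap, A1_blocks, Y2_per_ap, A2_blocks_all):
    # Stage 1: rough AUD at every AP from the superimposed preamble.
    rough = set()
    A1 = np.hstack(list(A1_blocks.values()))
    for Y1 in Y1_per_ap:
        H1 = recover_block_sparse(Y1, A1)
        rough |= detect_active(split_blocks(H1, A1_blocks))
    # Stage 2: accurate AUD and CE restricted to the rough active set.
    A2_blocks = {u: A2_blocks_all[u] for u in sorted(rough)}
    A2 = np.hstack(list(A2_blocks.values()))
    out = []
    for Y2 in Y2_per_ap:
        H2 = recover_block_sparse(Y2, A2)
        blocks = split_blocks(H2, A2_blocks)
        out.append((detect_active(blocks), blocks))
    return rough, out
```

Because stage 2 only stacks the measurement blocks of the roughly detected UEs, the second recovery problem is much smaller than a single-stage accurate AUD over all U UEs.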

IV 2-D Block Sparse Matrix Recovery

IV-A Probability Model

As elaborated in [30], the Laplacian distribution, compared to the Gaussian mixture distribution, can better capture the sparsity of signals after undergoing DCT and achieve more precise estimation. Consider an AWGN channel model:

𝐘~=𝐀~𝐗~+𝐍~,{\mathbf{\tilde{Y}}}={\mathbf{\tilde{A}\tilde{X}}}+{\mathbf{\tilde{N}}}, (32)

where 𝐘~L×J{\mathbf{\tilde{Y}}}\in{\mathbb{C}^{L\times J}} is the observed matrix, 𝐀~L×I{\mathbf{\tilde{A}}}\in{\mathbb{C}^{L\times I}} is the measurement matrix (known at the receiver), 𝐗~I×J{\mathbf{\tilde{X}}}\in{\mathbb{C}^{I\times J}} is the block sparse matrix to be estimated, and 𝐍~L×J{\mathbf{\tilde{N}}}\in{\mathbb{C}^{L\times J}} is the additive noise matrix. Since the Laplacian distribution is defined only for real-valued random variables, we convert the complex-valued model (32) into the following real-valued equivalent model:

𝐘=𝐀𝐗+𝐍,𝐘=Δ[{𝐘~}{𝐘~}]2L×J,\displaystyle{\mathbf{Y}}={\mathbf{AX}}+{\mathbf{N}},{\mathbf{Y}}\buildrel\Delta\over{=}\left[{\begin{array}[]{*{20}{c}}{\mathcal{R}\left\{{{\mathbf{\tilde{Y}}}}\right\}}\\ {\mathcal{I}\left\{{{\mathbf{\tilde{Y}}}}\right\}}\end{array}}\right]\in{\mathbb{R}^{2L\times J}}, (35)
𝐀=Δ[{𝐀~}{𝐀~}{𝐀~}{𝐀~}]2L×2I,\displaystyle{\mathbf{A}}\buildrel\Delta\over{=}\left[{\begin{array}[]{*{20}{c}}{\mathcal{R}\left\{{{\mathbf{\tilde{A}}}}\right\}}&{-\mathcal{I}\left\{{{\mathbf{\tilde{A}}}}\right\}}\\ {\mathcal{I}\left\{{{\mathbf{\tilde{A}}}}\right\}}&{\mathcal{R}\left\{{{\mathbf{\tilde{A}}}}\right\}}\end{array}}\right]\in{\mathbb{R}^{2L\times 2I}}, (38)
𝐗=Δ[{𝐗~}{𝐗~}]2I×J,𝐍=Δ[{𝐍~}{𝐍~}]2L×J.\displaystyle{\mathbf{X}}\buildrel\Delta\over{=}\left[{\begin{array}[]{*{20}{c}}{\mathcal{R}\left\{{{\mathbf{\tilde{X}}}}\right\}}\\ {\mathcal{I}\left\{{{\mathbf{\tilde{X}}}}\right\}}\end{array}}\right]\in{\mathbb{R}^{2I\times J}},{\mathbf{N}}\buildrel\Delta\over{=}\left[{\begin{array}[]{*{20}{c}}{\mathcal{R}\left\{{{\mathbf{\tilde{N}}}}\right\}}\\ {\mathcal{I}\left\{{{\mathbf{\tilde{N}}}}\right\}}\end{array}}\right]\in{\mathbb{R}^{2L\times J}}. (43)

Here, {}\mathcal{R}\left\{\cdot\right\} and {}\mathcal{I}\left\{\cdot\right\} represent the operations of taking the real and imaginary parts of a complex matrix, respectively. In practical systems, the noise variance is often unknown. We assume that the communication between the transmitter and receiver occurs over an AWGN channel, i.e.,
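The stacking in (35)-(43) is mechanical and easy to check numerically; a minimal sketch (with arbitrary dimensions) verifies that the real-valued model reproduces the complex product:

```python
import numpy as np

def to_real_model(Y_c, A_c, X_c=None):
    """Stack real and imaginary parts as in (35)-(43):
    Y = [Re(Y); Im(Y)], A = [[Re(A), -Im(A)], [Im(A), Re(A)]], X = [Re(X); Im(X)]."""
    Y = np.vstack([Y_c.real, Y_c.imag])
    A = np.block([[A_c.real, -A_c.imag],
                  [A_c.imag,  A_c.real]])
    X = None if X_c is None else np.vstack([X_c.real, X_c.imag])
    return Y, A, X

# Quick check that A @ X equals the stacked version of A_c @ X_c
rng = np.random.default_rng(1)
A_c = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
X_c = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
Y, A, X = to_real_model(A_c @ X_c, A_c, X_c)
```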

p(𝐘|𝐙)=l,j𝒩(yl,j;zl,j,γ),p({\mathbf{Y}}|{\mathbf{Z}})=\prod\nolimits_{l,j}{\mathcal{N}\left({{y_{l,j}};{z_{l,j}},\gamma}\right)}, (44)

where zl,j{z_{l,j}} is the (l,j)(l,j)-th element of the matrix 𝐙{\mathbf{Z}}, 𝐙=𝐀𝐗{\mathbf{Z}}={\mathbf{AX}}, and γ\gamma denotes the noise variance. Following the two-layer hierarchical probabilistic model of PCSBL [27], we introduce the hyperparameters {αi,j}\left\{{{\alpha_{i,j}}}\right\} and establish the probability distributions of 𝐗{\mathbf{X}} as:

p(𝐗|𝜶)=0<i<I+1j𝒜(xi,j;τi,j1)I<i<2I+1j𝒜(xi,j;τiI,j1),\displaystyle p({\mathbf{X}}|{\bm{\alpha}})={\prod\limits_{\begin{subarray}{c}0<i<I+1\\ j\end{subarray}}{\mathcal{L}\mathcal{A}\left({{x_{i,j}};\tau_{i,j}^{-1}}\right)}}{\prod\limits_{\begin{subarray}{c}I<i<2I+1\\ j\end{subarray}}{\mathcal{L}\mathcal{A}\left({{x_{i,j}};\tau_{i-I,j}^{-1}}\right)}}, (45a)
τi,j=αi,j+ηαi1,j+ηαi+1,j+ηαi,j1+ηαi,j+1,\displaystyle{\tau_{i,j}}={\alpha_{i,j}}+\eta{\alpha_{i-1,j}}+\eta{\alpha_{i+1,j}}+\eta{\alpha_{i,j-1}}+\eta{\alpha_{i,j+1}}, (45b)
p(αi,j)=𝒢𝒜(αi,j;a,b).\displaystyle p({\alpha_{i,j}})=\mathcal{G}\mathcal{A}\left({{\alpha_{i,j}};a,b}\right). (45c)

Among (45a)-(45c), 𝒢𝒜(αi,j;a,b)=Γ(a)1baαi,jaebαi,j\mathcal{G}\mathcal{A}\left({{\alpha_{i,j}};a,b}\right)=\Gamma{\left(a\right)^{-1}}{b^{a}}\alpha_{i,j}^{a}{e^{-b{\alpha_{i,j}}}} denotes the Gamma distribution with shape parameter aa and scale parameter bb. Γ(a)=0ta1et𝑑t\Gamma\left(a\right)=\int_{0}^{\infty}{{t^{a-1}}{e^{-t}}dt} is the Gamma function. η0\eta\geq 0 represents the coupling factor and 𝒜(x;b)=12bexp(|x|b)\mathcal{L}\mathcal{A}(x;b)=\frac{1}{{2b}}\exp\left({-\frac{{\left|x\right|}}{b}}\right) is a Laplacian distribution. (45a) shows that the real and imaginary parts of x~i,j{\tilde{x}_{i,j}} share the same hyperparameter τi,j{\tau_{i,j}}, as defined in (45b).
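The pattern-coupled scale (45b) ties each τi,j to its four neighbors. A minimal sketch, treating out-of-range neighbors as zero (a common boundary convention, assumed here):

```python
import numpy as np

def coupled_tau(alpha, eta):
    """tau[i,j] = alpha[i,j] + eta*(alpha[i-1,j] + alpha[i+1,j]
                                    + alpha[i,j-1] + alpha[i,j+1]), eq. (45b);
    out-of-range neighbors are treated as zero (an assumption of this sketch)."""
    pad = np.pad(alpha, 1)                 # zero border supplies boundary terms
    return (pad[1:-1, 1:-1]
            + eta * (pad[:-2, 1:-1] + pad[2:, 1:-1]
                     + pad[1:-1, :-2] + pad[1:-1, 2:]))
```

Setting eta to zero decouples the hyperparameters and recovers conventional SBL behavior.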

IV-B GAMP Algorithm for Sparse Matrix Recovery

As explained in [26], given a prior distribution, the GAMP algorithm reduces the computational complexity of sparse signal recovery from 𝒪(I3)\mathcal{O}(I^{3}) to 𝒪(IL)\mathcal{O}(IL). Following the sum-product and max-sum forms of the BP algorithm, the GAMP algorithm uses Gaussian and quadratic approximations to provide the minimum mean square error (MMSE) estimation and maximum a posteriori (MAP) estimation of the sparse matrix, respectively. By defining scalar estimation functions gin(){g_{in}}\left(\cdot\right) and gout(){g_{out}}\left(\cdot\right), the GAMP algorithm iteratively performs scalar operations at the input and output nodes to decompose the vector-valued estimation problem. Assume that in the tt-th iteration, the prior distribution of the sparse matrix is expressed as p(𝐗|𝜶(t))p({\mathbf{X}}|{{\bm{\alpha}}{(t)}}), where 𝜶(t){{\bm{\alpha}}{(t)}} is the hyperparameter obtained in the tt-th iteration. For the AWGN channel, the GAMP algorithm is shown from line 3 to line 13 in Algorithm 2.

Algorithm 2 GAMP-PCSBL-La
0:𝐘{\mathbf{Y}}, 𝐀{\mathbf{A}}, p(𝐗|𝜶)p({\mathbf{X}}|{\bm{\alpha}}), p(𝜶)p({\bm{\alpha}}), η\eta, ε\varepsilon
0:𝐗^(t+1){\mathbf{\hat{X}}}\left({t+1}\right), 𝜶(t+1){\bm{\alpha}}(t+1), and γ(t+1)\gamma(t+1)
1:Initialize: 𝜶(1){\bm{\alpha}}(1), 𝐗^(1)=𝟎{\mathbf{\hat{X}}}\left(1\right)=\mathbf{0}, 𝐒(0)=𝟎{\mathbf{S}}\left(0\right)=\mathbf{0}, γ(1)\gamma(1), ui,jx(1){u_{i,j}^{x}(1)};
2:for t=1t=1 to TT do
3:  l,j\forall l,j, ul,jp(t)=i|al,i|2ui,jx(t)u_{l,j}^{p}(t)=\sum\nolimits_{i}{{{\left|{{a_{l,i}}}\right|}^{2}}}u_{i,j}^{x}(t)
4:  l,j\forall l,j, p^l,j(t)=ial,ix^i,j(t)ul,jp(t)s^l,j(t1){\hat{p}_{l,j}}(t)=\sum\nolimits_{i}{{a_{l,i}}{{\hat{x}}_{i,j}}(t)}-u_{l,j}^{p}(t){\hat{s}_{l,j}}(t-1)
5:  l,j\forall l,j, ul,jz(t)=ul,jp(t)γ(t)ul,jp(t)+γ(t)u_{l,j}^{z}(t)=\frac{{u_{l,j}^{p}(t)\gamma(t)}}{{u_{l,j}^{p}(t)+\gamma(t)}}
6:  l,j\forall l,j, z^l,j(t)=ul,jp(t)yl,j+γ(t)p^l,j(t)ul,jp(t)+γ(t){\hat{z}_{l,j}}(t)=\frac{{u_{l,j}^{p}(t){y_{l,j}}+\gamma(t){{\hat{p}}_{l,j}}(t)}}{{u_{l,j}^{p}(t)+\gamma(t)}}
7:  l,j\forall l,j, s^l,j(t)=gout(t,p^l,j(t),yl,j,ul,jp(t)){\hat{s}_{l,j}}(t)={g_{out}}\left({t,{{\hat{p}}_{l,j}}(t),{y_{l,j}},u_{l,j}^{p}(t)}\right)
8:  l,j\forall l,j, ul,js(t)=gout(t,p^l,j(t),yl,j,ul,jp(t))p^l,j(t)u_{l,j}^{s}(t)=-\frac{{\partial{g_{out}}\left({t,{{\hat{p}}_{l,j}}(t),{y_{l,j}},u_{l,j}^{p}(t)}\right)}}{{\partial{{\hat{p}}_{l,j}}(t)}}
9:  i,j\forall i,j, ui,jr(t)=[l|al,i|2ul,js(t)]1u_{i,j}^{r}(t)={\left[{\sum\nolimits_{l}{{{\left|{{a_{l,i}}}\right|}^{2}}u_{l,j}^{s}(t)}}\right]^{-1}}
10:  i,j\forall i,j, r^i,j(t)=x^i,j(t)+ui,jr(t)lal,is^l,j(t){\hat{r}_{i,j}}(t)={\hat{x}_{i,j}}(t)+u_{i,j}^{r}(t)\sum\nolimits_{l}{{a_{l,i}}{{\hat{s}}_{l,j}}(t)}
11:  i,j\forall i,j, τi,j(t)=αi,j(t)+ηαi1,j(t)+ηαi+1,j(t)+ηαi,j1(t)+ηαi,j+1(t){\tau_{i,j}}(t)={\alpha_{i,j}}(t)+\eta{\alpha_{i-1,j}}(t)+\eta{\alpha_{i+1,j}}(t)+\eta{\alpha_{i,j-1}}(t)+\eta{\alpha_{i,j+1}}(t)
12:  i,j\forall i,j, x^i,j(t+1)=gin(t,r^i,j(t),τi,j(t),ui,jr(t)){\hat{x}_{i,j}}(t+1)={g_{in}}\left({t,{{\hat{r}}_{i,j}}(t),{\tau_{i,j}}(t),u_{i,j}^{r}(t)}\right)
13:  i,j\forall i,j, ui,jx(t+1)=ui,jr(t)gin(t,r^i,j(t),τi,j(t),ui,jr(t))r^i,j(t)u_{i,j}^{x}(t+1)=u_{i,j}^{r}(t)\frac{{\partial{g_{in}}\left({t,{{\hat{r}}_{i,j}}(t),{\tau_{i,j}}(t),u_{i,j}^{r}(t)}\right)}}{{\partial{{\hat{r}}_{i,j}}(t)}}
14:  i,j\forall i,j, αi,j(t+1)=ab+ωi,j(t+1)+ωI+i,j(t+1){\alpha_{i,j}}(t+1)=\frac{a}{{b+{\omega_{i,j}}(t+1)+{\omega_{I+i,j}}(t+1)}}
15:  Update γ(t+1)=l,jyl,jz^l,j(t)2+ul,jz(t)2LJ\gamma(t+1)=\frac{{\sum\nolimits_{l,j}{}{{\left\|{{y_{l,j}}-{{\hat{z}}_{l,j}}(t)}\right\|}^{2}}+u_{l,j}^{z}(t)}}{{2LJ}}
16:  If 𝐗^(t+1)𝐗^(t)F2𝐗^(t+1)F2<ε\frac{{\left\|{{\mathbf{\hat{X}}}\left({t+1}\right)-{\mathbf{\hat{X}}}\left(t\right)}\right\|_{F}^{2}}}{{\left\|{{\mathbf{\hat{X}}}\left({t+1}\right)}\right\|_{F}^{2}}}<\varepsilon, break
17:end for

For sum-product GAMP, gout(t,p^l,j(t),yl,j,ul,jp(t)){g_{out}}\left({t,{{\hat{p}}_{l,j}}(t),{y_{l,j}},u_{l,j}^{p}(t)}\right) and ul,js(t)u_{l,j}^{s}(t) are defined as

gout(t,p^l,j(t),yl,j,ul,jp(t))=yl,jp^l,j(t)ul,jp(t)+γ(t),\displaystyle{g_{out}}\left({t,{{\hat{p}}_{l,j}}(t),{y_{l,j}},u_{l,j}^{p}(t)}\right)=\frac{{{y_{l,j}}-{{\hat{p}}_{l,j}}(t)}}{{u_{l,j}^{p}(t)+\gamma(t)}}, (46a)
ul,js(t)=1ul,jp(t)+γ(t).\displaystyle u_{l,j}^{s}(t)=\frac{1}{{u_{l,j}^{p}(t)+\gamma(t)}}. (46b)
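For the AWGN likelihood, the output-node step reduces to the two scalars in (46a)-(46b); a direct sketch:

```python
import numpy as np

def g_out_awgn(p_hat, y, u_p, gamma):
    """Sum-product GAMP output step for an AWGN likelihood, eq. (46a)-(46b):
    s_hat = (y - p_hat) / (u_p + gamma),  u_s = 1 / (u_p + gamma)."""
    s_hat = (y - p_hat) / (u_p + gamma)
    u_s = 1.0 / (u_p + gamma)
    return s_hat, u_s
```

The same function applies elementwise to the whole p̂ matrix, since GAMP decomposes the problem into scalar operations.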

Based on MMSE estimation, at the input node we have

gin(t,r^i,j(t),τi,j(t),ui,jr(t))\displaystyle{g_{in}}\left({t,{{\hat{r}}_{i,j}}(t),{\tau_{i,j}}(t),u_{i,j}^{r}(t)}\right) =𝔼p(x|r,τ,ur){xi,j},\displaystyle={\mathbb{E}}_{p(x|r,\tau,u^{r})}\{{x_{i,j}}\}, (47a)
gin(t,r^i,j(t),τi,j(t),ui,jr(t))r^i,j(t)τi,j(t)\displaystyle\frac{{\partial{g_{in}}\left({t,{{\hat{r}}_{i,j}}(t),{\tau_{i,j}}(t),u_{i,j}^{r}(t)}\right)}}{{\partial{{\hat{r}}_{i,j}}(t)}}{\tau_{i,j}}(t) =𝕍p(x|r,τ,ur){xi,j},\displaystyle={\mathbb{V}}_{p(x|r,\tau,u^{r})}\{{x_{i,j}}\}, (47b)

where p(xi,j|r,τ,ur)p(x_{i,j}|r,\tau,u^{r}) represents the approximate posterior distribution of the (i,j)(i,j)-th element of the matrix to be estimated. In the sum-product derivation, the messages from the factor node p(y|x)p\left({y|x}\right) to the variable node xi,j{x_{i,j}} are approximated as:

mxi,j(t)𝒩(xi,j;r^i,j(t),ui,jr(t)).{\vec{m}_{{x_{i,j}}}}(t)\approx\mathcal{N}\left({{x_{i,j}};{{\hat{r}}_{i,j}}(t),u_{i,j}^{r}(t)}\right). (48)

As previously mentioned, the prior of xi,j{x_{i,j}} is defined as:

p(xi,j|τi~,j(t))=𝒜(xi,j;(τi~,j(t))1),\displaystyle p({x_{i,j}}|{\tau_{\tilde{i},j}}(t))=\mathcal{L}\mathcal{A}\left({{x_{i,j}};{{\left({\tau_{\tilde{i},j}(t)}\right)}^{-1}}}\right),

with

i~={i1iIiII+1i2I.\displaystyle\tilde{i}=\left\{{\begin{array}[]{*{20}{c}}i&{1\leq i\leq I}\\ {i-I}&{I+1\leq i\leq 2I}\end{array}}\right..

Therefore, the approximate posterior distribution of xi,j{x_{i,j}} can be expressed as:

p(xi,j|r^i,j(t),τi,j(t),ui,jr(t))mxi,j(t)p(xi,j|τi~,j(t))\displaystyle p\left({{x_{i,j}}|{{\hat{r}}_{i,j}}(t),{\tau_{i,j}}(t),u_{i,j}^{r}(t)}\right)\propto{{\vec{m}}_{{x_{i,j}}}}(t)p({x_{i,j}}|{\tau_{\tilde{i},j}}(t))\hfill
=𝒩(xi,j;r^i,j(t),ui,jr(t))𝒜(xi,j;(τi~,j(t))1)\displaystyle=\mathcal{N}\left({{x_{i,j}};{{\hat{r}}_{i,j}}(t),u_{i,j}^{r}(t)}\right)\mathcal{L}\mathcal{A}\left({{x_{i,j}};{{\left({\tau_{\tilde{i},j}(t)}\right)}^{-1}}}\right)\hfill (49)
=1ψi,j(t)exp{ξi,j(t)12ui,jr(t)(xi,jφi,j(t))2},\displaystyle=\frac{1}{{{\psi_{i,j}}(t)}}\exp\left\{{-{\xi_{i,j}}\left(t\right)-\frac{1}{{2u_{i,j}^{r}(t)}}{{\left({{x_{i,j}}-{\varphi_{i,j}}\left(t\right)}\right)}^{2}}}\right\},

where

ψi,j(t)=2πui,jr(t)[exp{ξi,j(t)}Q(φi,j(t)/ui,jr(t))\displaystyle{\psi_{i,j}}(t)=\sqrt{2\pi u_{i,j}^{r}(t)}\left[\exp\left\{{-\xi_{i,j}^{-}\left(t\right)}\right\}Q\left({{\varphi_{i,j}^{-}(t)}}/{{\sqrt{u_{i,j}^{r}(t)}}}\right)\right.
+exp{ξi,j+(t)}Q(−φi,j+(t)/ui,jr(t))],\left.+\exp\left\{{-\xi_{i,j}^{+}\left(t\right)}\right\}Q\left({-{{\varphi_{i,j}^{+}(t)}}/{{\sqrt{u_{i,j}^{r}(t)}}}}\right)\right], (50a)
ξi,j(t)=τi~,j(t)r^i,j(t)sign(xi,j)12ui,jr(t)(τi~,j(t))2,\displaystyle{\xi_{i,j}}\left(t\right)={\tau_{\tilde{i},j}}(t){\hat{r}_{i,j}}(t){\text{sign}}\left({{x_{i,j}}}\right)-\frac{1}{2}u_{i,j}^{r}(t){\left({{\tau_{\tilde{i},j}}(t)}\right)^{2}}, (50b)
φi,j(t)=r^i,j(t)ui,jr(t)τi~,j(t)sign(xi,j),\displaystyle{\varphi_{i,j}}\left(t\right)={\hat{r}_{i,j}}(t)-u_{i,j}^{r}(t){\tau_{\tilde{i},j}}(t){\text{sign}}\left({{x_{i,j}}}\right), (50c)
ξi,j(t)=τi~,j(t)r^i,j(t)12ui,jr(t)(τi~,j(t))2,\displaystyle\xi_{i,j}^{-}\left(t\right)=-{\tau_{\tilde{i},j}}(t){\hat{r}_{i,j}}(t)-\frac{1}{2}u_{i,j}^{r}(t){\left({{\tau_{\tilde{i},j}}(t)}\right)^{2}}, (50d)
ξi,j+(t)=τi~,j(t)r^i,j(t)12ui,jr(t)(τi~,j(t))2,\displaystyle\xi_{i,j}^{+}\left(t\right)={\tau_{\tilde{i},j}}(t){\hat{r}_{i,j}}(t)-\frac{1}{2}u_{i,j}^{r}(t){\left({{\tau_{\tilde{i},j}}(t)}\right)^{2}}, (50e)
φi,j(t)=r^i,j(t)+ui,jr(t)τi~,j(t),\displaystyle\varphi_{i,j}^{-}\left(t\right)={\hat{r}_{i,j}}(t)+u_{i,j}^{r}(t){\tau_{\tilde{i},j}}(t), (50f)
φi,j+(t)=r^i,j(t)ui,jr(t)τi~,j(t),\displaystyle\varphi_{i,j}^{+}\left(t\right)={\hat{r}_{i,j}}(t)-u_{i,j}^{r}(t){\tau_{\tilde{i},j}}(t), (50g)
sign(xi,j)={1,xi,j>0,0,xi,j=0,1,xi,j<0.\displaystyle{\text{sign}}({x_{i,j}})=\left\{{\begin{array}[]{*{20}{c}}1,&{{x_{i,j}}>0,}\\ 0,&{{x_{i,j}}=0,}\\ {-1},&{{x_{i,j}}<0.}\end{array}}\right. (50k)

According to (49), the posterior mean and variance of xi,j{x_{i,j}} can be calculated as in (51) and (52), where Q()Q\left(\cdot\right) is the standard QQ-function, representing the tail probability of the standard normal distribution, defined as:

x^i,j(t+1)=2πui,jr(t)ψi,j(t)[eξi,j(t)φi,j(t)Q(φi,j(t)/ui,jr(t))+eξi,j+(t)φi,j+(t)Q(−φi,j+(t)/ui,jr(t))].{\hat{x}_{i,j}}(t+1)=\frac{{\sqrt{2\pi u_{i,j}^{r}(t)}}}{{{\psi_{i,j}}(t)}}\left[{{e^{-\xi_{i,j}^{-}(t)}}\varphi_{i,j}^{-}(t)Q\left({{{\varphi_{i,j}^{-}(t)}}/{{\sqrt{u_{i,j}^{r}(t)}}}}\right)+{e^{-\xi_{i,j}^{+}(t)}}\varphi_{i,j}^{+}(t)Q\left({-{{\varphi_{i,j}^{+}(t)}}/{{\sqrt{u_{i,j}^{r}(t)}}}}\right)}\right]. (51)
ui,jx(t+1)=2πui,jr(t)ψi,j(t)(((φi,j+(t))2+ui,jr(t))eξi,j+(t)Q(φi,j+(t)/ui,jr(t))+((φi,j(t))2+ui,jr(t))eξi,j(t)Q(φi,j(t)/ui,jr(t))2τi,j(t)ui,jr(t)22πui,jr(t)er^i,j(t)2/2ui,jr(t))(x^i,j(t))2.\begin{gathered}u_{i,j}^{x}(t+1)=\frac{{\sqrt{2\pi u_{i,j}^{r}(t)}}}{{{\psi_{i,j}}(t)}}\left(\left({{{\left({\varphi_{i,j}^{+}(t)}\right)}^{2}}+u_{i,j}^{r}(t)}\right){e^{-\xi_{i,j}^{+}(t)}}Q\left({-{{\varphi_{i,j}^{+}(t)}}/{{\sqrt{u_{i,j}^{r}(t)}}}}\right)+\left({{{\left({\varphi_{i,j}^{-}(t)}\right)}^{2}}+u_{i,j}^{r}(t)}\right){e^{-\xi_{i,j}^{-}(t)}}\right.\hfill\\ Q\left({{{\varphi_{i,j}^{-}(t)}}/{{\sqrt{u_{i,j}^{r}(t)}}}}\right)\left.-\frac{{2{\tau_{i,j}}(t)u_{i,j}^{r}{{(t)}^{2}}}}{{\sqrt{2\pi u_{i,j}^{r}(t)}}}{e^{-{{{{\hat{r}}_{i,j}}{{(t)}^{2}}}}/{{2u_{i,j}^{r}(t)}}}}\right)-{\left({{{\hat{x}}_{i,j}}(t)}\right)^{2}}.\hfill\\ \end{gathered} (52)

 

Q(x)=12πxeu22𝑑u.Q\left(x\right)=\frac{1}{{\sqrt{2\pi}}}\int_{x}^{\infty}{{e^{-\frac{{{u^{2}}}}{2}}}du}. (53)
Proof:

Please refer to Appendix D in [29]. ∎

This completes the GAMP portion of Algorithm 2.
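As a sanity check on these posterior moments, the mean of the Laplacian-Gaussian product can also be evaluated by brute-force numerical integration; the sketch below assumes a prior proportional to exp(−τ|x|), consistent with the exponents in (50b)-(50g), and the grid span and resolution are arbitrary choices.

```python
import numpy as np

def laplace_gauss_posterior_mean(r_hat, u_r, tau, span=30.0, grid=200001):
    """Posterior mean of x under the Gaussian message N(x; r_hat, u_r) and a
    Laplacian prior proportional to exp(-tau * |x|), evaluated by brute-force
    numerical integration on a uniform grid. Intended only as a numerical
    cross-check of the closed-form posterior moments."""
    x = np.linspace(-span, span, grid)
    logp = -tau * np.abs(x) - (x - r_hat) ** 2 / (2.0 * u_r)
    p = np.exp(logp - logp.max())          # unnormalized posterior
    return float((x * p).sum() / p.sum())  # normalization cancels
```

For r̂ = 0 the posterior is symmetric and the mean vanishes; for τ = 0 the Laplacian term disappears and the mean reduces to r̂; for τ > 0 the estimate is shrunk toward zero, the behavior expected of a sparsity-promoting prior.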

IV-C Learning Hyperparameters via EM Algorithm

After obtaining the posterior distribution of 𝐗{\mathbf{X}}, our objective shifts to finding appropriate hyperparameters 𝜶{\bm{\alpha}} and γ\gamma that maximize their posterior probability. A direct strategy is to use the EM algorithm, where 𝐗{\mathbf{X}} is treated as a hidden variable. In the E-step, the log-posterior mean is computed, and in the M-step, the log-posterior is maximized. The iterative process of these two steps is summarized as follows.

E Step: Given the posterior distribution of 𝐗{\mathbf{X}} and the observed matrix 𝐘{\mathbf{Y}}, we compute the mean of the log-posterior of the hyperparameters 𝜶{\bm{\alpha}} with respect to the hidden variable 𝐗{\mathbf{X}} in the tt-th iteration. Let 𝜽={𝜶,γ}{\bm{\theta}}=\left\{{{\bm{\alpha}},\gamma}\right\} and define the RR function as:

R(𝜽|𝜽(t))=𝔼p(𝐗|𝐘,𝜽(t)){logp(𝜽|𝐗,𝐘,𝜽(t))}=R(𝜶|𝜽(t))+R(γ|𝜽(t))+c,\begin{gathered}R\left({{\bm{\theta}}|{\bm{\theta}}(t)}\right)={\mathbb{E}_{p({\mathbf{X}}|{\mathbf{Y}},{{\bm{\theta}}{\left(t\right)}})}}\left\{{\log p\left({{\bm{\theta}}|{\mathbf{X}},{\mathbf{Y}},{\bm{\theta}}(t)}\right)}\right\}\hfill\\ =R\left({{\bm{\alpha}}|{\bm{\theta}}(t)}\right)+R\left({\gamma|{\bm{\theta}}(t)}\right)+c,\hfill\\ \end{gathered} (54)

where cc represents a constant that is independent of 𝜽{\bm{\theta}}. Next, we calculate R(𝜶|𝜽(t))R\left({{\bm{\alpha}}|{\bm{\theta}}(t)}\right) and R(γ|𝜽(t))R\left({\gamma|{\bm{\theta}}(t)}\right) as follows.

R(𝜶|𝜽(t))=𝔼p(𝐗|𝐘,𝜽(t)){logp(𝐗|𝜶)+logp(𝜶)}\displaystyle R\left({{\bm{\alpha}}|{\bm{\theta}}(t)}\right)={\mathbb{E}_{p({\mathbf{X}}|{\mathbf{Y}},{\bm{\theta}}(t))}}\left\{{\log p\left({{\mathbf{X}}|{\bm{\alpha}}}\right)+\log p\left({\bm{\alpha}}\right)}\right\}\hfill
=i,j2ln(αi,j+ηαi1,j+ηαi+1,j+ηαi,j1+ηαi,j+1)\displaystyle=\sum\nolimits_{i,j}{}2\ln({\alpha_{i,j}}+\eta{\alpha_{i-1,j}}+\eta{\alpha_{i+1,j}}+\eta{\alpha_{i,j-1}}+\eta{\alpha_{i,j+1}})\hfill
(αi,j+ηαi1,j+ηαi+1,j+ηαi,j1+ηαi,j+1)×\displaystyle-({\alpha_{i,j}}+\eta{\alpha_{i-1,j}}+\eta{\alpha_{i+1,j}}+\eta{\alpha_{i,j-1}}+\eta{\alpha_{i,j+1}})\times\hfill
|xi,j(t)|+|xi+I,j(t)|+alnαi,jbαi,j,\displaystyle\left\langle{\left|{{x_{i,j}}(t)}\right|+\left|{{x_{i+I,j}}(t)}\right|}\right\rangle+a\ln{\alpha_{i,j}}-b{\alpha_{i,j}}, (55)
R(γ|θ(t))=𝔼p(𝐗|𝐘,θ(t)){logp(𝐘|𝐙,γ)}=LJlnγ𝐘𝐙^(t)F2+l,jul,jz(t)2γ,\begin{gathered}R\left({\gamma|{\mathbf{\theta}}(t)}\right)={\mathbb{E}_{p({\mathbf{X}}|{\mathbf{Y}},{\mathbf{\theta}}(t))}}\left\{{\log p\left({{\mathbf{Y}}|{\mathbf{Z}},\gamma}\right)}\right\}\hfill\\ =-LJ\ln\gamma-\frac{{\left\|{{\mathbf{Y}}-{\mathbf{\hat{Z}}}\left(t\right)}\right\|_{F}^{2}+\sum\nolimits_{l,j}{u_{l,j}^{z}(t)}}}{{2\gamma}},\hfill\\ \end{gathered} (56)

where |xi,j(t)|\left\langle{\left|{{x_{i,j}}(t)}\right|}\right\rangle represents the mean of the absolute value of xi,j(t){x_{i,j}}(t), that is

|xi,j(t)|=|xi,j(t)|p(xi,j(t)|𝐘,θ(t))𝑑xi,j(t)=1ψi,j(t)[eξi,j+(t)2πui,jr(t)φi,j+(t)Q(−φi,j+(t)ui,jr(t))eξi,j(t)2πui,jr(t)φi,j(t)Q(φi,j(t)ui,jr(t))+2ui,jr(t)e(r^i,j(t))22ui,jr(t)].\begin{gathered}\left\langle{\left|{{x_{i,j}}(t)}\right|}\right\rangle=\int{\left|{{x_{i,j}}(t)}\right|p({x_{i,j}}(t)|{\mathbf{Y}},{\mathbf{\theta}}(t))d}{x_{i,j}}(t)\hfill\\ =\frac{1}{{{\psi_{i,j}}(t)}}\left[{e^{-\xi_{i,j}^{+}(t)}}\sqrt{2\pi u_{i,j}^{r}(t)}\varphi_{i,j}^{+}(t)Q\left({-\frac{{\varphi_{i,j}^{+}(t)}}{{\sqrt{u_{i,j}^{r}(t)}}}}\right)-\right.\hfill\\ \left.{e^{-\xi_{i,j}^{-}(t)}}\sqrt{2\pi u_{i,j}^{r}(t)}\varphi_{i,j}^{-}(t)Q\left({\frac{{\varphi_{i,j}^{-}(t)}}{{\sqrt{u_{i,j}^{r}(t)}}}}\right)+2u_{i,j}^{r}(t){e^{-\frac{{{{\left({{{\hat{r}}_{i,j}}(t)}\right)}^{2}}}}{{2u_{i,j}^{r}(t)}}}}\right].\hfill\\ \end{gathered} (57)

M Step: We update the hyperparameters 𝜶{\bm{\alpha}} and γ\gamma by maximizing RR function:

𝜶(t+1)\displaystyle{\bm{\alpha}}(t+1) =argmax𝜶R(𝜶|𝜽(t)),\displaystyle=\mathop{\arg\max}\limits_{\bm{\alpha}}R\left({{\bm{\alpha}}|{\bm{\theta}}(t)}\right), (58a)
γ(t+1)\displaystyle\gamma(t+1) =argmaxγR(γ|𝜽(t)).\displaystyle=\mathop{\arg\max}\limits_{\gamma}R\left({\gamma|{\bm{\theta}}(t)}\right). (58b)

First, we consider 𝜶\bm{\alpha}. Unlike conventional SBL, in PCSBL the hyperparameters are interdependent, meaning that element-wise estimation of the parameters cannot be performed independently. Directly solving (58a) is challenging. To address this, we follow the derivation in [28] and consider an alternative suboptimal solution that achieves good estimation accuracy while simplifying the computation. Assuming 𝜶{{\bm{\alpha}}^{*}} is the optimal solution to (58a), the first-order derivative of the RR function with respect to 𝜶{\bm{\alpha}} equals zero at 𝜶{{\bm{\alpha}}^{*}}. That is, for any ii, jj, the following condition holds:

R(α|θ(t))αi,j|α=α=aαi,j+2(υi,j+ηυi1,j+ηυi+1,j+ηυi,j1+ηυi,j+1)bωi,j(t)ωN+i,j(t)=0,\begin{gathered}\left.\frac{{\partial R({{\mathbf{\alpha}}|{{\mathbf{\theta}}^{(t)}}})}}{{\partial{\alpha_{i,j}}}}\right|_{{\mathbf{\alpha}}={{\mathbf{\alpha}}^{*}}}=\frac{a}{{\alpha_{i,j}^{*}}}+2\left({\upsilon_{i,j}}+\eta{\upsilon_{i-1,j}}+\eta{\upsilon_{i+1,j}}+\right.\hfill\\ \left.\eta{\upsilon_{i,j-1}}+\eta{\upsilon_{i,j+1}}\right)-b-{\omega_{i,j}}(t)-{\omega_{N+i,j}}(t)=0,\hfill\\ \end{gathered} (59)

where

ωi,j(t)=Δ|xi,j(t)|+η|xi1,j(t)|+η|xi+1,j(t)|+η|xi,j1(t)|+η|xi,j+1(t)|,\begin{gathered}{\omega_{i,j}}(t)\buildrel\Delta\over{=}\left\langle{\left|{{x_{i,j}}(t)}\right|}\right\rangle+\eta\left\langle{\left|{{x_{i-1,j}}(t)}\right|}\right\rangle\hfill\\ +\eta\left\langle{\left|{{x_{i+1,j}}(t)}\right|}\right\rangle+\eta\left\langle{\left|{{x_{i,j-1}}(t)}\right|}\right\rangle+\eta\left\langle{\left|{{x_{i,j+1}}(t)}\right|}\right\rangle,\hfill\\ \end{gathered} (60)
υi,j=Δ1αi,j+ηαi1,j+ηαi+1,j+ηαi,j1+ηαi,j+1.{\upsilon_{i,j}}\buildrel\Delta\over{=}\frac{1}{{\alpha_{i,j}^{*}+\eta\alpha_{i-1,j}^{*}+\eta\alpha_{i+1,j}^{*}+\eta\alpha_{i,j-1}^{*}+\eta\alpha_{i,j+1}^{*}}}. (61)

In our model, η0\eta\geq 0 and αi,j0{\alpha_{i,j}}\geq 0 hold for any i,ji,j. Based on equation (61), υi,j{\upsilon_{i,j}} therefore satisfies the following inequality constraints:

0υi,j1αi,j,\displaystyle 0\leq{\upsilon_{i,j}}\leq\frac{1}{{\alpha_{i,j}^{*}}},
0υi,j1ηαi1,j,\displaystyle 0\leq{\upsilon_{i,j}}\leq\frac{1}{{\eta\alpha_{i-1,j}^{*}}},
0υi,j1ηαi+1,j,\displaystyle 0\leq{\upsilon_{i,j}}\leq\frac{1}{{\eta\alpha_{i+1,j}^{*}}}, (62)
0υi,j1ηαi,j1,\displaystyle 0\leq{\upsilon_{i,j}}\leq\frac{1}{{\eta\alpha_{i,j-1}^{*}}},
0υi,j1ηαi,j+1.\displaystyle 0\leq{\upsilon_{i,j}}\leq\frac{1}{{\eta\alpha_{i,j+1}^{*}}}.

Substituting the above results into equation (59), we obtain:

aαi,jb+ωi,j(t)+ωN+i,j(t)a+10αi,j.\frac{a}{{\alpha_{i,j}^{*}}}\leq b+{\omega_{i,j}}(t)+{\omega_{N+i,j}}(t)\leq\frac{{a+10}}{{\alpha_{i,j}^{*}}}. (63)

Then αi,j[ab+ωi,j(t)+ωN+i,j(t),a+10b+ωi,j(t)+ωN+i,j(t)]\alpha_{i,j}^{*}\in\left[{\frac{a}{{b+{\omega_{i,j}}(t)+{\omega_{N+i,j}}(t)}},\frac{{a+10}}{{b+{\omega_{i,j}}(t)+{\omega_{N+i,j}}(t)}}}\right] holds. Therefore, a simple suboptimal solution to equation (58a) is given by:

αi,j(t+1)=ab+ωi,j(t)+ωN+i,j(t),{\alpha_{i,j}}(t+1)=\frac{a}{{b+{\omega_{i,j}}(t)+{\omega_{N+i,j}}(t)}}, (64)
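To make the pattern coupling concrete, the sketch below implements the neighbour-weighted sum in (60) and the resulting update (64) for a single observation block (i.e., dropping the ωN+i,j{\omega_{N+i,j}} term); out-of-range neighbours are treated as zero. Function names and the toy sizes are ours:

```python
def coupled_omega(mean_abs, eta):
    """omega[i][j] = <|x_{i,j}|> + eta * (sum of the four neighbouring means),
    cf. (60); out-of-range neighbours contribute zero."""
    n, m = len(mean_abs), len(mean_abs[0])
    def get(i, j):
        return mean_abs[i][j] if 0 <= i < n and 0 <= j < m else 0.0
    return [[get(i, j) + eta * (get(i - 1, j) + get(i + 1, j)
                                + get(i, j - 1) + get(i, j + 1))
             for j in range(m)] for i in range(n)]

def update_alpha(omega, a, b):
    """Suboptimal hyperparameter update alpha_{i,j} = a / (b + omega_{i,j}),
    the single-block form of (64)."""
    return [[a / (b + w) for w in row] for row in omega]
```

Setting the coupling parameter eta to zero recovers the decoupled update of conventional SBL.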

Next, we focus on the noise variance γ\gamma. Suppose γ{\gamma^{*}} is the optimal solution of (58b); then it satisfies:

R(γ|θ(t))γ|γ=γ=MNγ+𝐘𝐙^(t)F2+i,jui,jz(t)2(γ)2=0.\left.\frac{{\partial R\left({\gamma|{\mathbf{\theta}}(t)}\right)}}{{\partial\gamma}}\right|_{\gamma={\gamma^{*}}}=-\frac{{MN}}{{{\gamma^{*}}}}+\frac{{\left\|{{\mathbf{Y}}-{\mathbf{\hat{Z}}}\left(t\right)}\right\|_{F}^{2}+\sum\nolimits_{i,j}{u_{i,j}^{z}(t)}}}{{2{{\left({{\gamma^{*}}}\right)}^{2}}}}=0. (65)

It is easy to obtain the expression of γ(t+1)\gamma(t+1) as:

γ(t+1)=γ=𝐘𝐙^(t)F2+i,jui,jz(t)2MN.\gamma(t+1)={\gamma^{*}}=\frac{{\left\|{{\mathbf{Y}}-{\mathbf{\hat{Z}}}\left(t\right)}\right\|_{F}^{2}+\sum\nolimits_{i,j}{u_{i,j}^{z}(t)}}}{{2MN}}. (66)
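The noise-variance update (66) simply averages the residual energy and the summed posterior variances; a minimal sketch over real-valued toy matrices (names ours):

```python
def update_gamma(Y, Z_hat, u_z):
    """Noise-variance update (66): squared Frobenius residual plus the summed
    posterior variances u^z_{i,j}, divided by 2MN."""
    M, N = len(Y), len(Y[0])
    resid = sum((Y[i][j] - Z_hat[i][j]) ** 2 for i in range(M) for j in range(N))
    return (resid + sum(sum(row) for row in u_z)) / (2.0 * M * N)
```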

This completes the update process for 𝜽{\bm{\theta}}. Equations (64) and (66) serve as the output of the EM algorithm, corresponding to lines 14 and 15 of Algorithm 2. With this, the derivation of the GAMP-PCSBL-La algorithm is complete. In Section VI, we validate that the proposed GAMP-PCSBL-La algorithm can accurately estimate block sparse matrices with DCT sparsity. This algorithm is employed for AUD and CE.

V Computational Complexity Analysis

The scheme proposed in this paper consists of two main stages: rough AUD, and joint accurate AUD and CE. For rough AUD, an additional SFFT for the superimposed preamble is introduced, along with the GAMP-PCSBL-La algorithm for 2-D block sparse matrix recovery. The computational complexity of rough AUD is χs=𝒪(NlogN)+𝒪(MlogM)+𝒪(NzNyNMU|𝐩r1|){\chi_{s}}={\mathcal{O}}\left({N^{\prime}\log N^{\prime}}\right)+{\mathcal{O}}\left({M^{\prime}\log M^{\prime}}\right)+{\mathcal{O}}\left({{N_{z}}{N_{y}}N^{\prime}M^{\prime}U\left|{{{\mathbf{p}}_{r1}}}\right|}\right), where the first two terms correspond to the SFFT, and the last term corresponds to the GAMP-PCSBL-La algorithm. Similarly, for the joint accurate AUD and CE, the GAMP-PCSBL-La algorithm for 2-D block sparse matrix recovery is applied to the embedded preamble, with a computational complexity of χe=𝒪(NzNy|𝒦𝒜||𝐩r2|NpMp){\chi_{e}}={\mathcal{O}}\left({{N_{z}}{N_{y}}\left|{{{\mathcal{K}}_{\mathcal{A}}}}\right|\left|{{{\mathbf{p}}_{r2}}}\right|{N_{p}}{M_{p}}}\right). Additionally, the receiver needs to perform SIC for the superimposed preamble, with a computational complexity of MNLM^{\prime}N^{\prime}L. In summary, the overall computational complexity of the proposed scheme is χh=χs+χe+MNL{\chi_{h}}={\chi_{s}}+{\chi_{e}}+M^{\prime}N^{\prime}L. Among existing schemes, the superimposed scheme has lower complexity but poorer access performance. Furthermore, since U|𝒦𝒜|U\gg\left|{{{\mathcal{K}}_{\mathcal{A}}}}\right|, the embedded scheme can have higher complexity than the hybrid scheme, while occupying more DD resources.
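The scaling argument above can be illustrated numerically. The sketch below (parameter values are ours, loosely following Table I, with constants and lower-order terms dropped) shows that an embedded-only scheme pays the accurate-stage cost for all UU users, whereas the hybrid scheme pays it only for the |𝒦𝒜|\left|{{{\mathcal{K}}_{\mathcal{A}}}}\right| roughly detected users:

```python
import math

def chi_superimposed(N1, M1, Nz, Ny, U, iters):
    """Rough-AUD cost: two SFFT terms plus GAMP-PCSBL-La on the
    superimposed preamble (order-of-magnitude operation count)."""
    return N1 * math.log2(N1) + M1 * math.log2(M1) + Nz * Ny * N1 * M1 * U * iters

def chi_embedded(Nz, Ny, n_users, iters, Np, Mp):
    """GAMP-PCSBL-La cost on the embedded preamble for a candidate-user set."""
    return Nz * Ny * n_users * iters * Np * Mp

# Hybrid: the accurate stage runs only over 30 roughly detected users;
# an embedded-only scheme would run it over all U = 1000 users.
hybrid_accurate = chi_embedded(8, 8, 30, 20, 20, 20)
embedded_only = chi_embedded(8, 8, 1000, 20, 20, 20)
```

The cost ratio between the two accurate stages is exactly U/|𝒦𝒜|U/\left|{{{\mathcal{K}}_{\mathcal{A}}}}\right|, which dominates the extra SFFT and SIC terms of the hybrid scheme once UU is large.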

VI Simulations

Figure 6: Performance comparisons for block sparse matrix recovery algorithms.

To validate the effectiveness and superiority of the proposed scheme, numerical simulations are conducted. The specific simulation parameters are detailed in Table I. We consider the 3GPP extended vehicular A (EVA) channel model, which has 9 paths and τmax=2.5μs{\tau_{\max}}=2.5\mu s [31]. The delay and Doppler parameters are randomly generated within the range of 0 to their respective maximum values. ϑu,b,i{\vartheta_{u,b,i}} and ςu,b,i{\varsigma_{u,b,i}} are uniformly distributed within the ranges [0,π]\left[{0,\pi}\right] and [π2,π2]\left[{-\frac{\pi}{2},\frac{\pi}{2}}\right], respectively.

TABLE I: Simulation Parameters
Parameters Definition Value
NN Doppler dimension for a block 128
MM Delay dimension for a block 512
PP Number of a UE’s paths 9
vmaxv_{\max} Maximum velocity (km/h) 300
τmax\tau_{\max} Maximum path delay (μ\mus) 2.5
fcf_{c} Carrier frequency (GHz) 4
Δf\Delta f Subcarrier interval (kHz) 15
NN^{\prime} Doppler dimension for preamble1 64
MM^{\prime} Delay dimension for preamble1 4
KpK_{p} Doppler dimension for preamble2 20
LpL_{p} Delay dimension for preamble2 20
λ\lambda Large scale fading coefficient (dB) 12837.6logd-128-37.6\log d
n0n_{0} Background noise (dBm/Hz) -174
PtP_{t} Transmission power (dBm) 10

To evaluate the performance of the massive random access scheme, we use the detection error rate (DER) and the normalized mean squared error (NMSE) as performance metrics for AUD and CE, respectively. They are defined as follows:

DER=|𝒦𝒜\𝒰a|+|𝒰a\𝒦𝒜|U,\displaystyle\qquad{\text{DER}}=\frac{{\left|{{{\mathcal{K_{A}}}}\backslash{{{\mathcal{U}}}_{a}}}\right|+\left|{{{{\mathcal{U}}}_{a}}\backslash{{\mathcal{K_{A}}}}}\right|}}{U}, (67a)
NMSE=10log10𝐇¯DDA2𝐇DDA2F2𝐇DDA2F2,\displaystyle{\text{NMSE}}=10{\log_{10}}\frac{{\left\|{{\mathbf{\bar{H}}}^{DDA2}-{\mathbf{H}}^{DDA2}}\right\|_{F}^{2}}}{{\left\|{{\mathbf{H}}^{DDA2}}\right\|_{F}^{2}}}, (67b)

where 𝒜\{\mathcal{A}}\backslash{\mathcal{B}} represents a set whose elements are in 𝒜{\mathcal{A}} but not in {\mathcal{B}}. |𝒜|\left|{\mathcal{A}}\right| denotes the cardinality of set 𝒜\mathcal{A}. A smaller DER or NMSE indicates more accurate detection and estimation results, corresponding to better AUD and CE performance.
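Both metrics in (67) can be computed directly from the detected user set and the channel estimate; a small sketch over real-valued toy inputs (function names ours):

```python
import math

def der(active_true, active_detected, U):
    """DER per (67a): missed detections plus false alarms, normalised by U."""
    missed = len(active_true - active_detected)
    false_alarms = len(active_detected - active_true)
    return (missed + false_alarms) / U

def nmse_db(H_hat, H):
    """NMSE per (67b) in dB: squared Frobenius error over reference energy."""
    rows, cols = len(H), len(H[0])
    err = sum((H_hat[i][j] - H[i][j]) ** 2 for i in range(rows) for j in range(cols))
    ref = sum(H[i][j] ** 2 for i in range(rows) for j in range(cols))
    return 10.0 * math.log10(err / ref)
```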

Initially, the performance of the proposed GAMP-PCSBL-La algorithm is compared with that of other existing algorithms for 2-D block sparse matrix recovery. We set the dimensions of the 2-D block sparse matrix to 256×64256\times 64, the observation matrix to 64×6464\times 64, and the sensing matrix to 64×25664\times 256. A block sparse matrix is generated by randomly creating non-zero values and applying a DCT. The elements of the sensing matrix and noise matrix follow a Gaussian distribution. The compared algorithms include GAMP-PCSBL with Gaussian prior (GAMP-PCSBL-Gs) [32], PCSBL [27], orthogonal matching pursuit (OMP) [33], block OMP (BOMP) [34], turbo variational Bayesian inference with Markov random field (turbo-VBI-MRF) [35], and GAMP-SBL [36].
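Test data of this kind can be generated as follows; the sketch builds one block-sparse column and applies a naive orthonormal DCT-II (sizes and names are ours, and a real experiment would use an FFT-based DCT):

```python
import math
import random

def dct_ii(x):
    """Naive orthonormal DCT-II; O(N^2), adequate for illustration."""
    N = len(x)
    out = []
    for k in range(N):
        c = math.sqrt((1.0 if k == 0 else 2.0) / N)
        out.append(c * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                           for n in range(N)))
    return out

def block_sparse_vector(N=256, n_blocks=5, block_len=8, seed=1):
    """A length-N vector with a few Gaussian non-zero blocks at random offsets."""
    rng = random.Random(seed)
    x = [0.0] * N
    for _ in range(n_blocks):
        start = rng.randrange(0, N - block_len)
        for i in range(start, start + block_len):
            x[i] = rng.gauss(0.0, 1.0)
    return x
```

Since the orthonormal DCT is an orthogonal transform, it preserves the signal energy and only spreads the support, which is exactly the structure the recovery algorithms must exploit.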

Figure 7: Convergence trends of iterative algorithms.

In Fig. 6(a), with the number of non-zero blocks fixed at 5, we compare the performance of the algorithms in recovering the block sparse matrix under different signal-to-noise ratios (SNRs). The simulation curves show that as the SNR increases, the NMSE performance of all algorithms improves. In Fig. 6(b), with the SNR fixed at 12.5 dB, we analyze the impact of varying the column dimension of the block sparse matrix on the performance of each algorithm. It is evident that as the dimensions of the sparse matrix increase, the estimation accuracy of all algorithms gradually declines. Additionally, in Fig. 6(c), with the SNR fixed at 12.5 dB and the sparse matrix dimensions set to 256×64256\times 64, we compare the performance trends of each algorithm under different numbers of non-zero blocks. This figure shows that as the number of non-zero blocks increases, the estimation accuracy decreases across all algorithms. The simulation results in Figs. 6(b) and 6(c) are consistent with the relevant conclusions of compressed sensing theory. These simulation curves also demonstrate that algorithms utilizing PCSBL outperform the others in block sparse matrix recovery. Moreover, the proposed GAMP-PCSBL-La algorithm outperforms all the other algorithms, showcasing its advantage in recovering block sparse matrices formed through a DCT.

Figure 8: SINR versus σp2\sigma_{p}^{2} for the proposed hybrid scheme with |𝒦𝒜|=30|\mathcal{K_{A}}|=30 and Ny=Nz=8N_{y}=N_{z}=8.
Figure 9: Trend of different schemes’ performance with varying UU under fixed |𝒦𝒜|=30|\mathcal{K_{A}}|=30 and Ny=Nz=8N_{y}=N_{z}=8: (a) average DEN for AUD; (b) NMSE for CE.

We compare the convergence trends of several iterative algorithms in Fig. 7, with the simulation settings being consistent with those in Fig. 6. The ‘estimated error’ in the figure is defined as the non-logarithmic form of NMSE, i.e., 10NMSE/1010^{\text{NMSE}/10}. The figure shows that the PCSBL and turbo-VBI-MRF algorithms, which are based on direct matrix inversion, converge faster than GAMP-based algorithms, reaching convergence in approximately 5 iterations. In contrast, the GAMP-based algorithms converge after about 20 iterations. This simulation result demonstrates that the proposed GAMP-PCSBL-La algorithm exhibits good convergence and reliability.

In the hybrid preamble scheme proposed in this paper, the power allocation between the two superimposed signals, 𝐗u,1DD{\mathbf{X}}_{u,1}^{DD} and 𝐗u,2DD{\mathbf{X}}_{u,2}^{DD}, significantly impacts the performance of the receiver. To assess this impact, we define σp2=PX1PX1+PX2\sigma_{p}^{2}=\frac{{{P_{{X_{1}}}}}}{{{P_{{X_{1}}}}+{P_{{X_{2}}}}}} as the power allocation ratio of 𝐗u,1DD{\mathbf{X}}_{u,1}^{DD}, and let σp2\sigma_{p}^{2} vary from 0.1 to 0.9 in steps of 0.1. After SIC, the received signal is expressed as 𝐘^b=𝐘b𝐒𝐗1DD,𝐇^bDD2=𝐒𝐗2DD,𝐇^bDD2+𝐈+𝐍{{\mathbf{\hat{Y}}}_{b}}={{\mathbf{Y}}_{b}}-{{\mathbf{S}}_{{\mathbf{X}}_{1}^{DD},{\mathbf{\hat{H}}}_{b}^{DD2}}}={{\mathbf{S}}_{{\mathbf{X}}_{2}^{DD},{\mathbf{\hat{H}}}_{b}^{DD2}}}+{\mathbf{I}}+{\mathbf{N}}, where 𝐒𝐗,𝐇{{\mathbf{S}}_{{\mathbf{X}},{\mathbf{H}}}} represents the received signal of 𝐗{\mathbf{X}} after passing through the channel 𝐇{\mathbf{H}}. 𝐈{\mathbf{I}} and 𝐍{\mathbf{N}} denote interference and additive Gaussian noise, respectively. The receiver performance is evaluated based on the signal-to-interference-plus-noise ratio (SINR) under different values of σp2\sigma_{p}^{2}:

SINR=b𝐒𝐗2DD,𝐇^bDD2F2b𝐘^b𝐒𝐗2DD,𝐇^bDD2F2.{\text{SINR}}=\frac{{\sum\nolimits_{b}{\left\|{{{\mathbf{S}}_{{\mathbf{X}}_{2}^{DD},{\mathbf{\hat{H}}}_{b}^{DD2}}}}\right\|_{F}^{2}}}}{{\sum\nolimits_{b}{\left\|{{{{\mathbf{\hat{Y}}}}_{b}}-{{\mathbf{S}}_{{\mathbf{X}}_{2}^{DD},{\mathbf{\hat{H}}}_{b}^{DD2}}}}\right\|_{F}^{2}}}}. (68)
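Given the reconstructed desired signal and the post-SIC residual at each AP, (68) is a ratio of Frobenius energies; a sketch with real-valued toy blocks (names ours):

```python
import math

def sinr_db(signal_per_ap, residual_per_ap):
    """SINR per (68): desired-signal energy over residual (interference plus
    noise) energy, each summed over all APs, expressed in dB."""
    sig = sum(v * v for S in signal_per_ap for row in S for v in row)
    res = sum(v * v for R in residual_per_ap for row in R for v in row)
    return 10.0 * math.log10(sig / res)
```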

The simulation results in Fig. 8 indicate that as σp2\sigma_{p}^{2} increases, the receiver performance first improves and then deteriorates, with optimal performance occurring around σp2=0.3\sigma_{p}^{2}=0.3. At this point, the magnitude of the non-zero entries of 𝐗u,1DD{\mathbf{X}}_{u,1}^{DD} is about ten times that of 𝐗u,2DD{\mathbf{X}}_{u,2}^{DD}. The trend of this curve aligns with the conclusion in [24] that there is an optimal power allocation ratio for superimposed pilot signals. In the subsequent simulations, we fix σp2=0.3\sigma_{p}^{2}=0.3.

To demonstrate the advantages of the proposed scheme in massive random access scenarios, we compare its performance under different values of UU. Specifically, we fix the AP antenna array dimension at 8×88\times 8 and the number of active users at 30, while varying UU from 100 to 3000. We use ‘HP’, ‘SP’ and ‘EP’ to denote the hybrid preamble, superimposed preamble and embedded preamble schemes, respectively. For fairness, the average detection error number (DEN), rather than the DER, is used to characterize the AUD performance. Under the GAMP-PCSBL-La algorithm, the performance trends of the three schemes are illustrated in Fig. 9. It can be observed that when UU is relatively small, HP, SP, and EP all achieve satisfactory detection and estimation performance. However, as UU increases, the performance of EP deteriorates significantly, because the sensing matrix dimension in EP grows much faster than in the other two schemes. Initially, HP outperforms SP in active user detection, but as UU increases, the detection performance of the two schemes begins to converge. This is because, in high-UU scenarios, the detection performance of HP is primarily constrained by the rough AUD stage. Nevertheless, HP consistently outperforms SP in channel estimation by at least 5 dB, and this performance gap widens with UU. Overall, HP exhibits the best performance in massive UE scenarios.

Figure 10: Performance comparisons for massive random access schemes versus dimensions of antenna array: (a) DER; (b) NMSE.
Figure 11: Performance comparisons for massive random access schemes versus number of active UEs: (a) DER; (b) NMSE.

In Fig. 10, with the number of active UEs fixed at 30, we simulate the impact of antenna array dimensions on the performance of the massive random access schemes. To avoid the heavy computation of large matrix inversions, we limit the comparison to low-complexity GAMP-based algorithms, thereby verifying the superiority of the proposed scheme. The results indicate that as the number of antennas increases, both the hybrid preamble scheme and the superimposed preamble scheme exhibit improved AUD and CE performance. Because it lacks a rough activity detection step to reduce the dimensionality of the matrix to be estimated, the embedded preamble scheme alone, with its excessively large block-sparse channel matrix, fails to achieve effective AUD and CE results. It is evident that in massive random access, the proposed hybrid preamble scheme significantly outperforms the schemes that use either the superimposed or the embedded preamble alone. Additionally, the simulation curves demonstrate that, compared with the GAMP-PCSBL-Gs and GAMP-SBL algorithms, the proposed GAMP-PCSBL-La algorithm more effectively captures the block sparsity caused by fractional channel parameters, resulting in superior AUD and CE performance.

Similar conclusions can be drawn from the simulation curves in Fig. 11. With the antenna array dimension fixed at 8×88\times 8, we vary the number of active UEs from 10 to 40. The simulation results show that as the number of active UEs increases, the performance of all the massive random access schemes declines. This decline is attributed to the fact that more active UEs correspond to more non-zero elements, making the matrix less sparse. The proposed hybrid preamble scheme achieves significantly better AUD and CE performance compared to the other two preamble schemes when addressing the demands of massive random access. The larger dimension of the channel matrix to be estimated causes the embedded preamble scheme to fail when used alone. Moreover, the simulation curves in Fig. 11 demonstrate that the proposed GAMP-PCSBL-La algorithm outperforms other iterative algorithms in block-sparse matrix recovery.

VII Conclusion

This paper proposes a hybrid preamble scheme for massive machine-type random access in high-mobility scenarios within cell-free massive MIMO systems using OTFS modulation. The scheme employs a superimposed preamble for rough AUD, and then performs accurate AUD and CE based on the roughly detected UE set and the embedded preamble. By leveraging the advantages of both preamble schemes, the proposed hybrid preamble scheme achieves more precise detection and estimation with reduced preamble overhead. Additionally, a GAMP-PCSBL-La algorithm is introduced to estimate the channel matrix, effectively capturing the block-sparse characteristics of the channel caused by fractional channel parameters while maintaining low computational complexity. Simulation results demonstrate that the proposed hybrid preamble scheme better meets the requirements of massive random access in OTFS-modulated cell-free massive MIMO systems, and that the GAMP-PCSBL-La algorithm is particularly well-suited to this scheme.

Appendix A

By substituting equations (1), (8a), and (8b) into equation (II-B), we obtain:

𝐲u,b[n,m]=1Tihu,b,i𝐚u,b,im1NMklXuDD[k,l]ej2π(mlMnkN)ej2πmliMej2πli(ki+k~i)NMej2πn(ki+k~i)Np=lu,b,iM1MΔfej2πpM(mmku,b,i+k~u,b,iN)+1Tihu,b,i𝐚u,b,im1NMklXDD[k,l]ej2π(mlMnkN)ej2πmliMej2πli(ku,b,i+k~u,b,i)NMej2π(n1)(ku,b,i+k~u,b,i)Np=0lu,b,i11MΔfej2πpM(mmku,b,i+k~u,b,iN)+𝐧u,b[n,m].\begin{gathered}\mathbf{y}_{u,b}\left[{n,m}\right]=\frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime}}{}\frac{1}{{\sqrt{NM}}}\sum\limits_{k^{\prime}}{}\sum\limits_{l^{\prime}}{}{X_{u}^{DD}}\left[{k^{\prime},l^{\prime}}\right]\hfill\\ {e^{-j2\pi\left({\frac{{m^{\prime}l^{\prime}}}{M}-\frac{{nk^{\prime}}}{N}}\right)}}{e^{-j2\pi\frac{{m^{\prime}{l_{i}}}}{M}}}{e^{-j2\pi\frac{{{l_{i}}\left({{k_{i}}+{{\tilde{k}}_{i}}}\right)}}{{NM}}}}{e^{j2\pi\frac{{n({k_{i}}+{{\tilde{k}}_{i}})}}{N}}}\hfill\\ \sum\limits_{p={l_{u,b,i}}}^{M}{}\frac{1}{{M\Delta f}}{e^{-j2\pi\frac{p}{M}(m-m^{\prime}-\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}+\hfill\\ \frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime}}{}\frac{1}{{\sqrt{NM}}}\sum\limits_{k^{\prime}}{}\sum\limits_{l^{\prime}}{}{X^{DD}}\left[{k^{\prime},l^{\prime}}\right]{e^{-j2\pi\left({\frac{{m^{\prime}l^{\prime}}}{M}-\frac{{nk^{\prime}}}{N}}\right)}}\hfill\\ {e^{-j2\pi\frac{{m^{\prime}{l_{i}}}}{M}}}{e^{-j2\pi\frac{{{l_{i}}\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}{e^{j2\pi\frac{{\left({n-1}\right)({k_{u,b,i}}+{{\tilde{k}}_{u,b,i}})}}{N}}}\hfill\\ \sum\limits_{p=0}^{{l_{u,b,i}}-1}{}\frac{1}{{M\Delta f}}{e^{-j2\pi\frac{p}{M}(m-m^{\prime}-\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}+\mathbf{n}_{u,b}[n,m].\hfill\\ \end{gathered} (69)

Furthermore, substituting equation (9) into the above equation, we obtain:

𝐲u,bDD[k,l]=nmihu,b,i𝐚u,b,im1NMklXuDD[k,l]\displaystyle{\mathbf{y}_{u,b}^{DD}}\left[{k,l}\right]=\sum\limits_{n}{}\sum\limits_{m}{}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime}}{}\frac{1}{{NM}}\sum\limits_{k^{\prime}}{}{\sum\limits_{l^{\prime}}X_{u}^{DD}}\left[{k^{\prime},l^{\prime}}\right]
ej2π(mlMnkN)ej2πmlu,b,iMej2πlu,b,i(ku,b,i+k~u,b,i)NM\displaystyle{e^{-j2\pi\left({\frac{{m^{\prime}l^{\prime}}}{M}-\frac{{nk^{\prime}}}{N}}\right)}}{e^{-j2\pi\frac{{m^{\prime}{l_{u,b,i}}}}{M}}}{e^{-j2\pi\frac{{{l_{u,b,i}}\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}
ej2πn(ku,b,i+k~u,b,i)Np=lu,b,iM1Mej2πpM(mmku,b,i+k~u,b,iN)\displaystyle{e^{j2\pi\frac{{n({k_{u,b,i}}+{{\tilde{k}}_{u,b,i}})}}{N}}}\sum\limits_{p={l_{u,b,i}}}^{M}{}\frac{1}{M}{e^{-j2\pi\frac{p}{M}(m-m^{\prime}-\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}
ej2π(mlMnkN)+nmihu,b,i𝐚u,b,im1NMej2π(mlMnkN)k\displaystyle{e^{j2\pi\left({\frac{{ml}}{M}-\frac{{nk}}{N}}\right)}}+\sum\limits_{n}{}\sum\limits_{m}{}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime}}\frac{1}{{NM}}{e^{j2\pi\left({\frac{{ml}}{M}-\frac{{nk}}{N}}\right)}}\sum\limits_{k^{\prime}}{}
lXDD[k,l]ej2π(mlM(n1)kN)ej2πlu,b,i(ku,b,i+k~u,b,i)NM\displaystyle{\sum\limits_{l^{\prime}}X^{DD}}\left[{k^{\prime},l^{\prime}}\right]{e^{-j2\pi\left({\frac{{m^{\prime}l^{\prime}}}{M}-\frac{{\left({n-1}\right)k^{\prime}}}{N}}\right)}}{e^{-j2\pi\frac{{{l_{u,b,i}}\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}
ej2πmlu,b,iMej2πn(ku,b,i+k~u,b,i)Np=0lu,b,i1Mej2πpM(mmku,b,i+k~u,b,iN)\displaystyle{e^{-j2\pi\frac{{m^{\prime}{l_{u,b,i}}}}{M}}}{e^{j2\pi\frac{{n({k_{u,b,i}}+{{\tilde{k}}_{u,b,i}})}}{N}}}\sum\limits_{p=0}^{{l_{u,b,i}}}{}\frac{1}{M}{e^{-j2\pi\frac{p}{M}(m-m^{\prime}-\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}
=ihu,b,i𝐚u,b,iklXDD[k,l]ej2πli(ku,b,i+k~u,b,i)NM\displaystyle=\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime}}{}{\sum\limits_{l^{\prime}}X^{DD}}\left[{k^{\prime},l^{\prime}}\right]{e^{-j2\pi\frac{{{l_{i}}\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}
p=lu,b,iMej2πpM(ku,b,i+k~u,b,iN)m1Mej2πmM(l+lu,b,ip)\displaystyle\sum\limits_{p={l_{u,b,i}}}^{M}{}{e^{j2\pi\frac{p}{M}(\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}\sum\limits_{m^{\prime}}{}\frac{1}{M}{e^{-j2\pi\frac{{m^{\prime}}}{M}(l^{\prime}+{l_{u,b,i}}-p)}}
m1Mej2πmM(pl)n1Nej2πnN(kku,b,ik~u,b,ik)+\displaystyle\sum\limits_{m}{}\frac{1}{M}{e^{-j2\pi\frac{m}{M}(p-l)}}\sum\limits_{n}{}\frac{1}{N}{e^{-j2\pi\frac{n}{N}(k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime})}}+
ihu,b,i𝐚u,b,iklXDD[k,l]ej2πkN\displaystyle\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime}}{}{\sum\limits_{l^{\prime}}X^{DD}}\left[{k^{\prime},l^{\prime}}\right]{e^{-j2\pi\frac{{k^{\prime}}}{N}}}
p=0lu,b,i1ej2πplu,b,iM(ku,b,i+k~u,b,iN)m1Mej2πmM(pl)\displaystyle\sum\limits_{p=0}^{{l_{u,b,i}}-1}{}{e^{j2\pi\frac{{p-{l_{u,b,i}}}}{M}(\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}\sum\limits_{m}{}\frac{1}{M}{e^{-j2\pi\frac{m}{M}(p-l)}}
m1Mej2πmM(l+lu,b,ip)n1Nej2πnN(kku,b,ik~u,b,ik).\displaystyle\sum\limits_{m^{\prime}}{}\frac{1}{M}{e^{-j2\pi\frac{{m^{\prime}}}{M}(l^{\prime}+{l_{u,b,i}}-p)}}\sum\limits_{n}{}\frac{1}{N}{e^{-j2\pi\frac{n}{N}(k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime})}}. (70)

We define 𝚷N(xa)=1Nn=0N1ej2πnN(xa){\bm{\Pi}}_{N}\left({x-a}\right)=\frac{1}{N}\sum\limits_{n=0}^{N-1}{}{e^{-j2\pi\frac{n}{N}\left({x-a}\right)}}; thus, the above equation can be expressed as:

𝐲b,bDD[k,l]=ihu,b,i𝐚u,b,iklXuDD[k,l]p=lu,b,iMej2π(plu,b,i)(ku,b,i+k~u,b,i)MNδ(l+lu,b,ip)δ(pl)𝚷N(kku,b,ik~u,b,ik)+ihu,b,i𝐚u,b,iklXuDD[k,l]ej2πkNp=0lu,b,i1ej2π(plu,b,i)(ku,b,i+k~u,b,i)MNδ(l+lu,b,ip)δ(pl)𝚷N(kku,b,ik~u,b,ik).\begin{gathered}{\mathbf{y}_{b,b}^{DD}}\left[{k,l}\right]=\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime}}{}{\sum\limits_{l^{\prime}}X_{u}^{DD}}\left[{k^{\prime},l^{\prime}}\right]\hfill\\ \sum\limits_{p={l_{u,b,i}}}^{M}{}{e^{j2\pi\frac{{\left({p-{l_{u,b,i}}}\right)\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{MN}}}}\delta\left({l^{\prime}+{l_{u,b,i}}-p}\right)\hfill\\ \delta\left({p-l}\right){{\bm{\Pi}}_{N}}\left({k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime}}\right)+\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime}}{}\sum\limits_{l^{\prime}}\hfill\\ X_{u}^{DD}\left[{k^{\prime},l^{\prime}}\right]{e^{-j2\pi\frac{{k^{\prime}}}{N}}}\sum\limits_{p=0}^{{l_{u,b,i}}-1}{}{e^{j2\pi\frac{{\left({p-{l_{u,b,i}}}\right)\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{MN}}}}\hfill\\ \delta\left({l^{\prime}+{l_{u,b,i}}-p}\right)\delta\left({p-l}\right){{\bm{\Pi}}_{N}}\left({k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime}}\right).\hfill\\ \end{gathered} (71)

Analyzing equation (71) and using the sifting property of the Kronecker delta, 𝐲u,bDD[k,l]{\mathbf{y}_{u,b}}^{DD}\left[{k,l}\right] can be written as a piecewise function. When lu,b,il{l_{u,b,i}}\leq l, we have:

𝐲u,bDD[k,l]=ihu,b,i𝐚u,b,ikXuDD[k,llu,b,i]ej2π(llu,b,i)(ku,b,i+k~u,b,i)NM𝚷N(kku,b,ik~u,b,ik)aihu,b,i𝐚u,b,ik′′XuDD[kk′′,llu,b,i]ej2π(llu,b,i)(ku,b,i+k~u,b,i)NM1N1ej2πk~u,b,i1ej2πk′′ku,b,ik~u,b,iN\begin{gathered}{\mathbf{y}_{u,b}}^{DD}\left[{k,l}\right]=\sum_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime}}{}{X_{u}^{DD}}\left[{k^{\prime},l-{l_{u,b,i}}}\right]\hfill\\ {e^{j2\pi\frac{{\left({l-{l_{u,b,i}}}\right)\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}{{\bm{\Pi}}_{N}}\left({k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime}}\right)\hfill\\ \mathop{\approx}\limits^{a}\sum_{i}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime\prime}}{}{X_{u}^{DD}}\left[{k-k^{\prime\prime},l-{l_{u,b,i}}}\right]\hfill\\ {e^{j2\pi\frac{{\left({l-{l_{u,b,i}}}\right)\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}\frac{1}{N}\frac{{1-{e^{j2\pi{{\tilde{k}}_{u,b,i}}}}}}{{1-{e^{-j2\pi\frac{{k^{\prime\prime}-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}}}{N}}}}}\hfill\\ \end{gathered} (72)

Here, the approximate equality aa retains only the 2ε+12\varepsilon+1 integer points near the extremum to approximate 𝚷N(kku,b,ik~u,b,ik){{\bm{\Pi}}_{N}}\left({k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime}}\right), where k′′[kiε,ki+ε]k^{\prime\prime}\in\left[{{k_{i}}-\varepsilon,{k_{i}}+\varepsilon}\right]. Similarly, when lu,b,i>l{l_{u,b,i}}>l, we have

𝐲u,bDD[k,l]=ihu,b,i𝐚u,b,ikXuDD[k,llu,b,i]ej2πkN\displaystyle{{\mathbf{y}_{u,b}}^{DD}}\left[{k,l}\right]=\sum_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime}}{}{X_{u}^{DD}}\left[{k^{\prime},l-{l_{u,b,i}}}\right]{e^{-j2\pi\frac{{k^{\prime}}}{N}}}
ej2π(llu,b,i)(ku,b,i+k~u,b,i)NM𝚷N(kku,b,ik~u,b,ik)\displaystyle{e^{j2\pi\frac{{\left({l-{l_{u,b,i}}}\right)\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}{{\bm{\Pi}}_{N}}\left({k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime}}\right)
ihu,b,i𝐚u,b,ik′′XDD[kk′′,llu,b,i]ej2πkk′′N\displaystyle\approx\sum_{i}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime\prime}}{}{X^{DD}}\left[{k-k^{\prime\prime},l-{l_{u,b,i}}}\right]{e^{-j2\pi\frac{{k-k^{\prime\prime}}}{N}}}
ej2π(llu,b,i)(ku,b,i+k~u,b,i)NM1N1ej2πk~u,b,i1ej2πk′′ku,b,ik~u,b,iN.\displaystyle{e^{j2\pi\frac{{\left({l-{l_{u,b,i}}}\right)\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}\frac{1}{N}\frac{{1-{e^{j2\pi{{\tilde{k}}_{u,b,i}}}}}}{{1-{e^{-j2\pi\frac{{k^{\prime\prime}-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}}}{N}}}}}. (73)
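The approximation step aa used in (72) and (73) can be checked numerically: 𝚷N{{\bm{\Pi}}_{N}} is a geometric sum whose closed form matches the ratio appearing above, it vanishes at non-zero integer offsets, and for fractional offsets its magnitude concentrates on the few integer points nearest the peak. A small sketch (function names ours):

```python
import cmath

def Pi_N(N, d):
    """Pi_N(d) = (1/N) * sum_{n=0}^{N-1} exp(-j 2*pi*n*d / N)."""
    return sum(cmath.exp(-2j * cmath.pi * n * d / N) for n in range(N)) / N

def Pi_N_closed(N, d):
    """Geometric-series closed form (valid when d/N is not an integer),
    matching the ratio used in the approximation step."""
    num = 1.0 - cmath.exp(-2j * cmath.pi * d)
    den = N * (1.0 - cmath.exp(-2j * cmath.pi * d / N))
    return num / den
```

For integer d the sum reduces to a Kronecker delta, while for fractional d (fractional Doppler) the leakage decays away from the peak, which is why retaining only the 2ε+12\varepsilon+1 nearest taps gives a good approximation.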

Combining the above derivations, we obtain equation (11).

Appendix B

In the case where MM is very small, the received signal in the time-frequency domain can be expressed as:

𝐲u,b[n,m]=1Tihu,b,i𝐚u,b,im1NMklXuDD[k,l]ej2π(mlMnkN)ej2πl~i(ku,b,i+k~u,b,i)NMej2πml~u,b,iMej2πn(ku,b,i+k~i)N(p=0M1MΔfej2πpM(mmku,b,i+k~u,b,iN)0τu,b,iej2πΔft(mmνu,b,iΔf)dt)a1Tihu,b,i𝐚u,b,im1NMklXuDD[k,l]ej2πml~u,b,iMej2π(mlMnkN)ej2πl~u,b,i(ku,b,i+k~u,b,i)NMej2πn(ku,b,i+k~u,b,i)Np=0M1MΔfej2πpM(mmku,b,i+k~u,b,iN).\begin{gathered}\mathbf{y}_{u,b}\left[{n,m}\right]=\frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime}}{}\frac{1}{{\sqrt{NM}}}\sum\limits_{k^{\prime}}{}\sum\limits_{l^{\prime}}{}{X_{u}^{DD}}\left[{k^{\prime},l^{\prime}}\right]\hfill\\ {e^{-j2\pi\left({\frac{{m^{\prime}l^{\prime}}}{M}-\frac{{nk^{\prime}}}{N}}\right)}}{e^{-j2\pi\frac{{{{\tilde{l}}_{i}}\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}{e^{-j2\pi\frac{{m^{\prime}{{\tilde{l}}_{u,b,i}}}}{M}}}{e^{j2\pi\frac{{n({k_{u,b,i}}+{{\tilde{k}}_{i}})}}{N}}}\hfill\\ \left(\sum\limits_{p=0}^{M}{}\frac{1}{{M\Delta f}}{e^{-j2\pi\frac{p}{M}(m-m^{\prime}-\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}-\right.\hfill\\ \left.\int_{0}^{{\tau_{u,b,i}}}{}{e^{-j2\pi\Delta ft(m-m^{\prime}-\frac{{{\nu_{u,b,i}}}}{{\Delta f}})}}dt\right)\hfill\\ \mathop{\approx}\limits^{a}\frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime}}{}\frac{1}{{\sqrt{NM}}}\sum\limits_{k^{\prime}}{}\sum\limits_{l^{\prime}}{}{X_{u}^{DD}}\left[{k^{\prime},l^{\prime}}\right]{e^{-j2\pi\frac{{m^{\prime}{{\tilde{l}}_{u,b,i}}}}{M}}}\hfill\\ {e^{-j2\pi\left({\frac{{m^{\prime}l^{\prime}}}{M}-\frac{{nk^{\prime}}}{N}}\right)}}{e^{-j2\pi\frac{{{{\tilde{l}}_{u,b,i}}\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}{e^{j2\pi\frac{{n({k_{u,b,i}}+{{\tilde{k}}_{u,b,i}})}}{N}}}\hfill\\ \sum\limits_{p=0}^{M}{}\frac{1}{{M\Delta f}}{e^{-j2\pi\frac{p}{M}(m-m^{\prime}-\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}.\hfill\\ \end{gathered} (74)

In our system, the delay parameter is assumed to be much smaller than the duration of one symbol, so the approximation aa in the above equation roughly holds. Similar to the derivation in Appendix A, we substitute equations (1), (8a), (8b), and (9) to obtain:

𝐲u,bDD[k,l]nmihu,b,i𝐚u,b,im1NMklXuDD[k,l]\displaystyle{\mathbf{y}_{u,b}^{DD}}\left[{k,l}\right]\approx\sum\limits_{n}{}\sum\limits_{m}{}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime}}{}\frac{1}{{NM}}\sum\limits_{k^{\prime}}{}{\sum\limits_{l^{\prime}}X_{u}^{DD}}\left[{k^{\prime},l^{\prime}}\right]
ej2π(mlMnkN)ej2πml~iMej2πl~i(ku,b,i+k~u,b,i)NMej2πn(ku,b,i+k~u,b,i)N\displaystyle{e^{-j2\pi\left({\frac{{m^{\prime}l^{\prime}}}{M}-\frac{{nk^{\prime}}}{N}}\right)}}{e^{-j2\pi\frac{{m^{\prime}{{\tilde{l}}_{i}}}}{M}}}{e^{-j2\pi\frac{{{{\tilde{l}}_{i}}\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}{e^{j2\pi\frac{{n({k_{u,b,i}}+{{\tilde{k}}_{u,b,i}})}}{N}}}
\displaystyle\sum\limits_{p={l_{u,b,i}}}^{M}{}\frac{1}{M}{e^{j2\pi\left({\frac{{ml}}{M}-\frac{{nk}}{N}}\right)}}{e^{-j2\pi\frac{p}{M}(m-m^{\prime}-\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}
\displaystyle=\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime}}{}{\sum\limits_{l^{\prime}}X^{DD}}\left[{k^{\prime},l^{\prime}}\right]{e^{-j2\pi\frac{{{{\tilde{l}}_{u,b,i}}\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}
\displaystyle\sum\limits_{p={l_{u,b,i}}}^{M}{}{e^{j2\pi\frac{p}{M}(\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}\sum\limits_{m^{\prime}}{}\frac{1}{M}{e^{-j2\pi\frac{{m^{\prime}}}{M}(l^{\prime}+{{\tilde{l}}_{u,b,i}}-p)}}
\displaystyle\sum\limits_{m}{}\frac{1}{M}{e^{-j2\pi\frac{m}{M}(p-l)}}\sum\limits_{n}{}\frac{1}{N}{e^{-j2\pi\frac{n}{N}(k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime})}}
\displaystyle=\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime}}{}{\sum\limits_{l^{\prime}}X^{DD}}\left[{k^{\prime},l^{\prime}}\right]{e^{-j2\pi\frac{{{{\tilde{l}}_{u,b,i}}\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}
\displaystyle\sum\limits_{p={l_{u,b,i}}}^{M}{}{e^{j2\pi\frac{p}{M}(\frac{{{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}}{N})}}{{\bm{\Pi}}_{M}}\left({l^{\prime}+{{\tilde{l}}_{u,b,i}}-p}\right)\delta\left({p-l}\right)
\displaystyle{{\bm{\Pi}}_{N}}\left({k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime}}\right)
\displaystyle=\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime}}{}{\sum\limits_{l^{\prime}}X^{DD}}\left[{k^{\prime},l^{\prime}}\right]{e^{j2\pi\frac{{(l-{{\tilde{l}}_{u,b,i}})\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}
\displaystyle{{\bm{\Pi}}_{M}}\left({l^{\prime}+{{\tilde{l}}_{u,b,i}}-l}\right){{\bm{\Pi}}_{N}}\left({k-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}-k^{\prime}}\right)
\displaystyle\approx\frac{1}{{NM}}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{k^{\prime\prime}}{}{X^{DD}}\left[{k-k^{\prime\prime},l}\right]\frac{{1-{e^{-j2\pi{{\tilde{l}}_{u,b,i}}}}}}{{1-{e^{-j2\pi\frac{{{{\tilde{l}}_{u,b,i}}}}{M}}}}}
\displaystyle{e^{j2\pi\frac{{\left({l-{{\tilde{l}}_{u,b,i}}}\right)\left({{k_{u,b,i}}+{{\tilde{k}}_{u,b,i}}}\right)}}{{NM}}}}\frac{{1-{e^{j2\pi{{\tilde{k}}_{u,b,i}}}}}}{{1-{e^{-j2\pi\frac{{k^{\prime\prime}-{k_{u,b,i}}-{{\tilde{k}}_{u,b,i}}}}{N}}}}}.

Then we can derive equation (III-A) based on the above results.
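The final approximation above replaces each Dirichlet-type sum with the geometric-series closed form \sum_{p=0}^{M-1}e^{-j2\pi px/M}=\frac{1-e^{-j2\pi x}}{1-e^{-j2\pi x/M}}, valid for non-integer x. As a sanity check outside the derivation, this identity can be verified numerically; the values of x and M in the Python sketch below are arbitrary illustrations:

```python
import cmath

def geometric_phase_sum(x, M):
    # Direct evaluation of sum_{p=0}^{M-1} e^{-j*2*pi*p*x/M}
    return sum(cmath.exp(-2j * cmath.pi * p * x / M) for p in range(M))

def closed_form(x, M):
    # Closed form (1 - e^{-j*2*pi*x}) / (1 - e^{-j*2*pi*x/M});
    # requires x/M non-integer so the denominator does not vanish
    num = 1 - cmath.exp(-2j * cmath.pi * x)
    den = 1 - cmath.exp(-2j * cmath.pi * x / M)
    return num / den

M = 64
x = 3.37  # fractional delay/Doppler offset (arbitrary test value)
assert abs(geometric_phase_sum(x, M) - closed_form(x, M)) < 1e-9
```

The same closed form appears twice in the last step, once for the fractional delay \tilde{l}_{u,b,i} and once for the fractional Doppler term.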

Appendix C

For the case where M is small, the derivation of equation (27) can be found in Appendix B. For the case where M is large, according to equation (II-B), we have:

\displaystyle{\mathbf{y}_{u,b}}\left[{\frac{{n^{\prime}}}{\alpha},m^{\prime}}\right]=\frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime\prime}}{}{{X}}_{u,1}^{TF}\left[\frac{{n^{\prime}}}{\alpha},m^{\prime\prime}\right]{e^{-j2\pi m^{\prime\prime}\Delta f{\tau_{u,b,i}}}}
\displaystyle{e^{-j2\pi{\nu_{u,b,i}}{\tau_{u,b,i}}}}{e^{j2\pi{\nu_{u,b,i}}\frac{{n^{\prime}}}{\alpha}T}}\int_{{\tau_{u,b,i}}}^{T}{}{e^{-j2\pi\Delta ft(m^{\prime}-m^{\prime\prime}-\frac{{{\nu_{u,b,i}}}}{{\Delta f}})}}dt
\displaystyle+\frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime\prime}}{}{{X}}_{u,1}^{TF}\left[\frac{{n^{\prime}}}{\alpha}-1,m^{\prime\prime}\right]{e^{-j2\pi m^{\prime\prime}\Delta f{\tau_{u,b,i}}}}
\displaystyle{e^{-j2\pi m^{\prime\prime}\Delta fT}}{e^{-j2\pi{\nu_{u,b,i}}{\tau_{u,b,i}}}}{e^{j2\pi{\nu_{u,b,i}}\frac{{n^{\prime}}}{\alpha}T}}
\displaystyle\int_{0}^{{\tau_{u,b,i}}}{}{e^{-j2\pi\Delta ft(m^{\prime}-m^{\prime\prime}-\frac{{{\nu_{u,b,i}}}}{{\Delta f}})}}dt
\displaystyle\mathop{=}\limits^{a}\frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime\prime}}{}{{X}}_{u,1}^{TF}\left[\frac{{n^{\prime}}}{\alpha},m^{\prime\prime}\right]{e^{-j2\pi m^{\prime\prime}\Delta f{\tau_{u,b,i}}}}
\displaystyle{e^{-j2\pi{\nu_{u,b,i}}{\tau_{u,b,i}}}}{e^{j2\pi{\nu_{u,b,i}}\frac{{n^{\prime}}}{\alpha}T}}\sum\limits_{p=0}^{M-1}{\frac{1}{{M\Delta f}}{e^{-j2\pi\frac{p}{M}(m^{\prime}-m^{\prime\prime}-\frac{{{\nu_{u,b,i}}}}{{\Delta f}})}}}
\displaystyle+\frac{1}{T}\sum\limits_{i}{}{h_{u,b,i}}{\mathbf{a}_{u,b,i}}\sum\limits_{m^{\prime\prime}}{}\sum\limits_{p=0}^{{l_{u,b,i}}}{\frac{1}{{M\Delta f}}{e^{-j2\pi\frac{p}{M}(m^{\prime}-m^{\prime\prime}-\frac{{{\nu_{u,b,i}}}}{{\Delta f}})}}}
\displaystyle{e^{-j2\pi m^{\prime\prime}\Delta f{\tau_{u,b,i}}}}{e^{-j2\pi{\nu_{u,b,i}}{\tau_{u,b,i}}}}{e^{j2\pi{\nu_{u,b,i}}\frac{{n^{\prime}}}{\alpha}T}}\left({{X}}_{u,1}^{TF}\left[\frac{{n^{\prime}}}{\alpha}-1,m^{\prime\prime}\right]\right.
\displaystyle\left.{e^{-j2\pi m^{\prime\prime}\Delta fT}}-{{X}}_{u,1}^{TF}\left[\frac{{n^{\prime}}}{\alpha},m^{\prime\prime}\right]\right).

Assuming the time-frequency domain symbols {\mathbf{X}}_{u,1}^{TF}[n^{\prime},m^{\prime}] follow a zero-mean Gaussian distribution, then by the central limit theorem the ratio of the variances of the first and second terms on the right-hand side of step (a) is \frac{M}{2\bar{l}}, where \bar{l} is the expected value of the quantized delay {l_{u,b,i}}. Typically, the delays are assumed to be uniformly distributed, so 2\bar{l}={l_{\max}} and {l_{\max}}\ll M. The first term on the right-hand side of (a) therefore dominates. Absorbing the second term of (a) into the noise yields equation (27).
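The variance-ratio argument reduces to the fact that a sum of n i.i.d. zero-mean, unit-variance contributions has power n: the first term of (a) aggregates on the order of M such contributions, the second only about 2\bar{l}. A small Monte Carlo sketch (the sizes n1 = 64 and n2 = 8 are arbitrary stand-ins for M and 2\bar{l}) illustrates the scaling:

```python
import random

random.seed(0)
n1, n2, trials = 64, 8, 20000  # n1 ~ M terms, n2 ~ 2*l_bar terms (illustrative sizes)

def avg_power(n_terms):
    # Empirical E[|sum of n_terms i.i.d. zero-mean, unit-variance symbols|^2],
    # which converges to n_terms
    total = 0.0
    for _ in range(trials):
        s = sum(random.gauss(0.0, 1.0) for _ in range(n_terms))
        total += s * s
    return total / trials

ratio = avg_power(n1) / avg_power(n2)
# Mirrors the M / (2*l_bar) variance ratio; here n1 / n2 = 8
assert abs(ratio - n1 / n2) < 0.1 * (n1 / n2)
```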

Appendix D

According to equations (IV-B) and (50), the posterior mean of {x_{i,j}} can be expressed as:

\begin{gathered}{{\hat{x}}_{i,j}}(t+1)=\int{{x_{i,j}}p\left({{x_{i,j}}|{\mathbf{Y}},{\mathbf{\tau}},\gamma}\right)d}{x_{i,j}}\hfill\\ =\frac{{\sqrt{2\pi u_{i,j}^{r}(t)}}}{{{\psi_{i,j}}(t)}}\left[{e^{-\xi_{i,j}^{+}\left(t\right)}}{\psi_{1}}\left({\varphi_{i,j}^{+}\left(t\right),u_{i,j}^{r}(t)}\right)-{e^{-\xi_{i,j}^{-}\left(t\right)}}\right.\hfill\\ \left.{\psi_{1}}\left({-\varphi_{i,j}^{-}\left(t\right),u_{i,j}^{r}(t)}\right)\right],\hfill\\ \end{gathered} (75)

where

\begin{gathered}{\psi_{1}}\left({\varphi,u}\right)=\frac{1}{{\sqrt{2\pi u}}}\int_{0}^{+\infty}{}t\exp\left\{{-\frac{{{{\left({t-\varphi}\right)}^{2}}}}{{2u}}}\right\}dt\hfill\\ =\varphi Q\left({-\frac{\varphi}{{\sqrt{u}}}}\right)+\frac{u}{{\sqrt{2\pi u}}}\exp\left\{{-\frac{{{\varphi^{2}}}}{{2u}}}\right\}.\hfill\\ \end{gathered} (76)
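Since the later steps rely on this closed form for \psi_{1}, a brief numerical cross-check may help; the Python sketch below compares the closed form against brute-force midpoint integration, with arbitrary test values of \varphi and u:

```python
import math

def Q(x):
    # Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def psi1_closed(phi, u):
    # psi1(phi, u) = phi*Q(-phi/sqrt(u)) + u/sqrt(2*pi*u) * exp(-phi^2/(2u))
    return (phi * Q(-phi / math.sqrt(u))
            + u / math.sqrt(2 * math.pi * u) * math.exp(-phi**2 / (2 * u)))

def psi1_numeric(phi, u, t_max=50.0, n=200000):
    # Midpoint-rule evaluation of (1/sqrt(2*pi*u)) * int_0^inf t*exp(-(t-phi)^2/(2u)) dt;
    # the integrand is negligible beyond t_max for moderate phi, u
    dt = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += t * math.exp(-(t - phi)**2 / (2 * u)) * dt
    return total / math.sqrt(2 * math.pi * u)

phi, u = 1.3, 0.8  # arbitrary test point
assert abs(psi1_closed(phi, u) - psi1_numeric(phi, u)) < 1e-6
```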

Then we have

\displaystyle{\hat{x}_{i,j}}(t+1)=\frac{{\sqrt{2\pi u_{i,j}^{r}(t)}}}{{{\psi_{i,j}}(t)}}\left[{e^{-\xi_{i,j}^{+}\left(t\right)}}\varphi_{i,j}^{+}\left(t\right)Q\left({{{-\varphi_{i,j}^{+}\left(t\right)}}/{{\sqrt{u_{i,j}^{r}(t)}}}}\right)\right.
\displaystyle+{e^{-\xi_{i,j}^{-}\left(t\right)}}\varphi_{i,j}^{-}\left(t\right)Q\left({{{\varphi_{i,j}^{-}\left(t\right)}}/{{\sqrt{u_{i,j}^{r}(t)}}}}\right)
\displaystyle\left.+\frac{{u_{i,j}^{r}(t)}}{{\sqrt{2\pi u_{i,j}^{r}(t)}}}\left({{e^{-\xi_{i,j}^{+}\left(t\right)-\frac{{{{\left({\varphi_{i,j}^{+}\left(t\right)}\right)}^{2}}}}{{2u_{i,j}^{r}(t)}}}}-{e^{-\xi_{i,j}^{-}\left(t\right)-\frac{{{{\left({\varphi_{i,j}^{-}\left(t\right)}\right)}^{2}}}}{{2u_{i,j}^{r}(t)}}}}}\right)\right]. (77)

From equations (50d)–(50g), it follows that

\xi_{i,j}^{+}\left(t\right)+\frac{{{{\left({\varphi_{i,j}^{+}\left(t\right)}\right)}^{2}}}}{{2u_{i,j}^{r}(t)}}=\xi_{i,j}^{-}\left(t\right)+\frac{{{{\left({\varphi_{i,j}^{-}\left(t\right)}\right)}^{2}}}}{{2u_{i,j}^{r}(t)}}=\frac{{{{\left({{{\hat{r}}_{i,j}}(t)}\right)}^{2}}}}{{2u_{i,j}^{r}(t)}}. (78)

Thus the last two terms of equation (77) cancel, resulting in equation (51). We define

\begin{gathered}{\chi_{i,j}}(t+1)=\int{}x_{i,j}^{2}p\left({{x_{i,j}}|{\mathbf{Y}},{\mathbf{\tau}},\gamma}\right)d{x_{i,j}}\hfill\\ =\frac{{\sqrt{2\pi u_{i,j}^{r}(t)}}}{{{\psi_{i,j}}(t)}}\left[{e^{-\xi_{i,j}^{+}\left(t\right)}}{\psi_{2}}\left({\varphi_{i,j}^{+}\left(t\right),u_{i,j}^{r}(t)}\right)+{e^{-\xi_{i,j}^{-}\left(t\right)}}\right.\hfill\\ \left.{\psi_{2}}\left({-\varphi_{i,j}^{-}\left(t\right),u_{i,j}^{r}(t)}\right)\right],\hfill\\ \end{gathered} (79)

where

{\psi_{2}}\left({\varphi,u}\right)=\frac{1}{{\sqrt{2\pi u}}}\int_{0}^{+\infty}{}{t^{2}}\exp\left\{{-\frac{{{{\left({t-\varphi}\right)}^{2}}}}{{2u}}}\right\}dt. (80)

First we have

\begin{array}[]{*{20}{c}}{g\left(t\right)=\exp\left\{{-\frac{{{{\left({t-\varphi}\right)}^{2}}}}{{2u}}}\right\}}&\to&{g^{\prime}\left(t\right)=-\frac{{t-\varphi}}{u}g\left(t\right)}\\ {f\left(t\right)=t}&\to&{f^{\prime}\left(t\right)=1}\end{array}, (81)

using the fact that

\int_{0}^{+\infty}{f\left(t\right)g^{\prime}\left(t\right)dt}=\left.{f\left(t\right)g\left(t\right)}\right|_{0}^{+\infty}-\int_{0}^{+\infty}{f^{\prime}\left(t\right)g\left(t\right)dt}, (82)

and \left.{f\left(t\right)g\left(t\right)}\right|_{0}^{+\infty}=0 to get

\int_{0}^{+\infty}{}\frac{{t\left({t-\varphi}\right)}}{u}\exp\left\{{-\frac{{{{\left({t-\varphi}\right)}^{2}}}}{{2u}}}\right\}dt=\int_{0}^{+\infty}{}\exp\left\{{-\frac{{{{\left({t-\varphi}\right)}^{2}}}}{{2u}}}\right\}dt. (83)

On the right-hand side of equation (83), we substitute x=\left({t-\varphi}\right)/\sqrt{u}, and on the left-hand side we substitute the definitions of {\psi_{1}}\left({\varphi,u}\right) and {\psi_{2}}\left({\varphi,u}\right), yielding:

\frac{{\sqrt{2\pi u}}}{u}{\psi_{2}}\left({\varphi,u}\right)-\frac{{\varphi\sqrt{2\pi u}}}{u}{\psi_{1}}\left({\varphi,u}\right)=\sqrt{2\pi u}Q\left({\frac{{-\varphi}}{{\sqrt{u}}}}\right). (84)

Then we get

{\psi_{2}}\left({\varphi,u}\right)=\varphi{\psi_{1}}\left({\varphi,u}\right)+uQ\left({\frac{{-\varphi}}{{\sqrt{u}}}}\right). (85)
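This identity between the first and second half-Gaussian moments can likewise be checked numerically. In the sketch below both moments are computed by brute-force midpoint quadrature (the chosen \varphi and u are arbitrary), and the Q-function argument is read as -\varphi/\sqrt{u}:

```python
import math

def Q(x):
    # Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def half_gauss_moment(k, phi, u, t_max=50.0, n=200000):
    # Midpoint-rule evaluation of (1/sqrt(2*pi*u)) * int_0^inf t^k exp(-(t-phi)^2/(2u)) dt
    dt = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += t**k * math.exp(-(t - phi)**2 / (2 * u)) * dt
    return total / math.sqrt(2 * math.pi * u)

phi, u = 0.7, 1.5  # arbitrary test point
psi1 = half_gauss_moment(1, phi, u)
psi2 = half_gauss_moment(2, phi, u)
# Identity (85): psi2(phi, u) = phi * psi1(phi, u) + u * Q(-phi / sqrt(u))
assert abs(psi2 - (phi * psi1 + u * Q(-phi / math.sqrt(u)))) < 1e-6
```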

With the definition of {\psi_{1}}\left({\varphi,u}\right), we obtain

\begin{gathered}{e^{-\xi_{i,j}^{+}\left(t\right)}}{\psi_{2}}\left({\varphi_{i,j}^{+}\left(t\right),u_{i,j}^{r}(t)}\right)=\hfill\\ \left({{{\left({\varphi_{i,j}^{+}\left(t\right)}\right)}^{2}}+u_{i,j}^{r}(t)}\right){e^{-\xi_{i,j}^{+}\left(t\right)}}Q\left({\frac{{-\varphi_{i,j}^{+}\left(t\right)}}{{\sqrt{u_{i,j}^{r}(t)}}}}\right)\hfill\\ +\frac{{u_{i,j}^{r}(t)\varphi_{i,j}^{+}\left(t\right)}}{{\sqrt{2\pi u_{i,j}^{r}(t)}}}\exp\left\{{-\frac{{{{\left({{{\hat{r}}_{i,j}}(t)}\right)}^{2}}}}{{2u_{i,j}^{r}(t)}}}\right\},\hfill\\ \end{gathered} (86)

\begin{gathered}{e^{-\xi_{i,j}^{-}\left(t\right)}}{\psi_{2}}\left({-\varphi_{i,j}^{-}\left(t\right),u_{i,j}^{r}(t)}\right)=\hfill\\ \left({{{\left({\varphi_{i,j}^{-}\left(t\right)}\right)}^{2}}+u_{i,j}^{r}(t)}\right){e^{-\xi_{i,j}^{-}\left(t\right)}}Q\left({\frac{{-\varphi_{i,j}^{-}\left(t\right)}}{{\sqrt{u_{i,j}^{r}(t)}}}}\right)\hfill\\ +\frac{{u_{i,j}^{r}(t)\varphi_{i,j}^{-}\left(t\right)}}{{\sqrt{2\pi u_{i,j}^{r}(t)}}}\exp\left\{{-\frac{{{{\left({{{\hat{r}}_{i,j}}(t)}\right)}^{2}}}}{{2u_{i,j}^{r}(t)}}}\right\}.\hfill\\ \end{gathered} (87)

Combining equations (50f) and (50g), substituting (86) and (87) into (79), and finally using the variance definition u_{i,j}^{x}(t+1)={\chi_{i,j}}(t+1)-{\left({{{\hat{x}}_{i,j}}(t+1)}\right)^{2}}, we obtain equation (52).
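To make the moment bookkeeping concrete, the sketch below assembles \psi_{1}, \psi_{2}, and the variance step \chi-\hat{x}^{2} for the simplest standalone case: a Gaussian with mean \hat{r} and variance u restricted to the nonnegative axis. The function names and input values are illustrative assumptions, not the paper's (50)-series quantities:

```python
import math

def Q(x):
    # Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def psi1(phi, u):
    # Closed form (76)
    return phi * Q(-phi / math.sqrt(u)) + math.sqrt(u / (2 * math.pi)) * math.exp(-phi**2 / (2 * u))

def psi2(phi, u):
    # Identity (85)
    return phi * psi1(phi, u) + u * Q(-phi / math.sqrt(u))

def truncated_moments(r_hat, u):
    """Posterior mean and variance of x ~ N(r_hat, u) restricted to x >= 0,
    mirroring the chi - x_hat^2 variance step (hypothetical standalone case)."""
    z = Q(-r_hat / math.sqrt(u))       # normalizing probability mass on [0, inf)
    mean = psi1(r_hat, u) / z          # first posterior moment
    second = psi2(r_hat, u) / z        # second posterior moment
    return mean, second - mean**2

mean, var = truncated_moments(0.4, 1.0)   # arbitrary test point
assert mean > 0.0
assert 0.0 < var < 1.0  # truncation can only reduce the variance below u
```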

References

  • [1] M. Matthaiou, O. Yurduseven, H. Q. Ngo, D. Morales-Jimenez, S. L. Cotton, and V. F. Fusco, “The road to 6G: Ten physical layer challenges for communications engineers,” IEEE Commun. Mag., vol. 59, pp. 64–69, Jan. 2021.
  • [2] Y. Wu, X. Gao, S. Zhou, W. Yang, Y. Polyanskiy, and G. Caire, “Massive access for future wireless communication systems,” IEEE Wirel. Commun., vol. 27, pp. 148–156, Aug. 2020.
  • [3] B. Ai, A. F. Molisch, M. Rupp, and Z.-D. Zhong, “5G key technologies for smart railways,” Proc. IEEE, vol. 108, pp. 856–893, June 2020.
  • [4] M. B. Shahab, R. Abbas, M. Shirvanimoghaddam, and S. J. Johnson, “Grant-free non-orthogonal multiple access for IoT: A survey,” IEEE Commun. Surv. Tutor., vol. 22, pp. 1805–1838, May 2020.
  • [5] H. Chen, R. Abbas, P. Cheng, M. Shirvanimoghaddam, W. Hardjawana, W. Bao, Y. Li, and B. Vucetic, “Ultra-reliable low latency cellular networks: Use cases, challenges and approaches,” IEEE Commun. Mag., vol. 56, pp. 119–125, Dec. 2018.
  • [6] O. Kodheli, E. Lagunas, N. Maturo, S. K. Sharma, B. Shankar, J. F. M. Montoya, J. C. M. Duncan, D. Spano, S. Chatzinotas, S. Kisseleff, et al., “Satellite communications in the new space era: A survey and future challenges,” IEEE Commun. Surv. Tutor., vol. 23, pp. 70–109, Oct. 2020.
  • [7] Y. Liu, S. Zhang, X. Mu, Z. Ding, R. Schober, N. Al-Dhahir, E. Hossain, and X. Shen, “Evolution of NOMA toward next generation multiple access (NGMA) for 6G,” IEEE J. Sel. Areas Commun., vol. 40, pp. 1037–1071, Apr. 2022.
  • [8] W. C. Jakes and D. C. Cox, Microwave mobile communications. New York, NY, USA: Wiley-IEEE press, 1994.
  • [9] R. Hadani, S. Rakib, M. Tsatsanis, A. Monk, A. J. Goldsmith, A. F. Molisch, and R. Calderbank, “Orthogonal time frequency space modulation,” in Proc. IEEE Wireless Commun. Netw. Conf (WCNC), pp. 1–6, IEEE, Mar. 2017.
  • [10] A. Farhang, A. RezazadehReyhani, L. E. Doyle, and B. Farhang-Boroujeny, “Low complexity modem structure for OFDM-based orthogonal time frequency space modulation,” IEEE Wirel. Commun. Lett., vol. 7, pp. 344–347, Nov. 2017.
  • [11] Z. Wei, W. Yuan, S. Li, J. Yuan, G. Bharatula, R. Hadani, and L. Hanzo, “Orthogonal time-frequency space modulation: A promising next-generation waveform,” IEEE Wirel. Commun., vol. 28, pp. 136–144, Aug. 2021.
  • [12] J. Wang, C. Jiang, and L. Kuang, “High-mobility satellite-UAV communications: Challenges, solutions, and future research trends,” IEEE Commun. Mag., vol. 60, pp. 38–43, May 2022.
  • [13] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and D. J. Love, “Prospective multiple antenna technologies for beyond 5G,” IEEE J. Sel. Areas Commun., vol. 38, pp. 1637–1660, Aug. 2020.
  • [14] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wirel. Commun., vol. 16, pp. 1834–1850, Mar. 2017.
  • [15] E. Björnson and L. Sanguinetti, “Scalable cell-free massive MIMO systems,” IEEE Trans. Commun., vol. 68, pp. 4247–4261, July 2020.
  • [16] D. Wang, X. You, Y. Huang, W. Xu, J. Li, P. Zhu, Y. Jiang, Y. Cao, X. Xia, Z. Zhang, et al., “Full-spectrum cell-free RAN for 6G systems: system design and experimental results,” Sci. China-Inf. Sci., vol. 66, p. 130305, Feb. 2023.
  • [17] M. Mohammadi, H. Q. Ngo, and M. Matthaiou, “Cell-free massive MIMO meets OTFS modulation,” IEEE Trans. Commun., vol. 70, pp. 7728–7747, Nov. 2022.
  • [18] Z. Gao, X. Zhou, J. Zhao, J. Li, C. Zhu, C. Hu, P. Xiao, S. Chatzinotas, D. W. K. Ng, and B. Ottersten, “Grant-free NOMA-OTFS paradigm: Enabling efficient ubiquitous access for LEO satellite Internet-of-Things,” IEEE Netw., vol. 37, pp. 18–26, Jan. 2023.
  • [19] B. Shen, Y. Wu, J. An, C. Xing, L. Zhao, and W. Zhang, “Random access with massive MIMO-OTFS in LEO satellite communications,” IEEE J. Sel. Areas Commun., vol. 40, pp. 2865–2881, Oct. 2022.
  • [20] X. Zhou, K. Ying, Z. Gao, Y. Wu, Z. Xiao, S. Chatzinotas, J. Yuan, and B. Ottersten, “Active terminal identification, channel estimation, and signal detection for grant-free NOMA-OTFS in LEO satellite Internet-of-Things,” IEEE Trans. Wirel. Commun., vol. 22, pp. 2847–2866, Apr. 2022.
  • [21] Y. Ma, G. Ma, N. Wang, Z. Zhong, and B. Ai, “OTFS-TSMA for massive Internet of Things in high-speed railway,” IEEE Trans. Wirel. Commun., vol. 21, pp. 519–531, Jan. 2021.
  • [22] A. K. Sinha, S. K. Mohammed, P. Raviteja, Y. Hong, and E. Viterbo, “OTFS based random access preamble transmission for high mobility scenarios,” IEEE Trans. Veh. Technol., vol. 69, pp. 15078–15094, Dec. 2020.
  • [23] P. Raviteja, K. T. Phan, and Y. Hong, “Embedded pilot-aided channel estimation for OTFS in delay–Doppler channels,” IEEE Trans. Veh. Technol., vol. 68, pp. 4906–4917, May 2019.
  • [24] H. B. Mishra, P. Singh, A. K. Prasad, and R. Budhiraja, “OTFS channel estimation and data detection designs with superimposed pilots,” IEEE Trans. Wirel. Commun., vol. 21, pp. 2258–2274, Apr. 2021.
  • [25] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, pp. 489–509, Feb. 2006.
  • [26] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proc. IEEE Int. Symp. Inf. Theory, pp. 2168–2172, IEEE, July 2011.
  • [27] J. Fang, Y. Shen, H. Li, and P. Wang, “Pattern-coupled sparse Bayesian learning for recovery of block-sparse signals,” IEEE Trans. Signal Process., vol. 63, pp. 360–372, Jan. 2014.
  • [28] F. Bellili, F. Sohrabi, and W. Yu, “Generalized approximate message passing for massive MIMO mmWave channel estimation with Laplacian prior,” IEEE Trans. Commun., vol. 67, pp. 3205–3219, May 2019.
  • [29] Y. Hu, D. Wang, X. Xia, J. Li, P. Zhu, and X. You, “A novel massive random access in cell-free massive MIMO systems for high-speed mobility with OTFS modulation,” arXiv preprint arXiv:2409.01111, 2025. https://arxiv.org/abs/2409.01111.
  • [30] S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Bayesian compressive sensing using Laplace priors,” IEEE Trans. Image Process., vol. 19, pp. 53–63, Jan. 2009.
  • [31] M. Series, “Guidelines for evaluation of radio interface technologies for IMT-Advanced,” Report ITU, vol. 638, no. 31, 2009.
  • [32] J. Fang, L. Zhang, and H. Li, “Two-dimensional pattern-coupled sparse Bayesian learning via generalized approximate message passing,” IEEE Trans. Image Process., vol. 25, pp. 2920–2930, June 2016.
  • [33] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, pp. 4655–4666, Dec. 2007.
  • [34] L. Lu, W. Xu, Y. Wang, and Z. Tian, “Compressive spectrum sensing using sampling-controlled block orthogonal matching pursuit,” IEEE Trans. Commun., vol. 71, pp. 1096–1111, Feb. 2022.
  • [35] M. Zhang, X. Yuan, and Z.-Q. He, “Variance state propagation for structured sparse Bayesian learning,” IEEE Trans. Signal Process., vol. 68, pp. 2386–2400, Mar. 2020.
  • [36] X. Zhang, P. Fan, L. Hao, and X. Quan, “Generalized approximate message passing based Bayesian learning detectors for uplink grant-free NOMA,” IEEE Trans. Veh. Technol., vol. 72, pp. 15057–15061, Nov. 2023.