
Cell-Free Massive MIMO Detection: A Distributed Expectation Propagation Approach

Hengtao He, Xianghao Yu, Jun Zhang, S. H. Song, and Khaled B. Letaief. This paper has been presented in part at the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain [1]. This work was supported by the Research Grants Council under Grant No. 16212120. H. He, X. Yu, J. Zhang, S. H. Song, and K. B. Letaief are with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong (e-mail: {eehthe, eexyu, eejzhang, eeshsong, eekhaled}@ust.hk).
Abstract

Cell-free massive MIMO is one of the core technologies for future wireless networks. It is expected to bring enormous benefits, including ultra-high reliability, high data throughput, high energy efficiency, and uniform coverage. As a radically distributed system, the performance of cell-free massive MIMO critically relies on efficient distributed processing algorithms. In this paper, we propose a distributed expectation propagation (EP) detector for cell-free massive MIMO, which consists of two modules: a nonlinear module at the central processing unit (CPU) and a linear module at each access point (AP). The turbo principle in iterative channel decoding is utilized to compute and pass the extrinsic information between the two modules. An analytical framework is provided to characterize the asymptotic performance of the proposed EP detector with a large number of antennas. Furthermore, a distributed iterative channel estimation and data detection (ICD) algorithm is developed to handle the practical setting with imperfect channel state information (CSI). Simulation results show that the proposed method outperforms existing detectors for cell-free massive MIMO systems in terms of bit-error rate and that the developed theoretical analysis accurately predicts the system performance. Finally, it is shown that with imperfect CSI, the proposed ICD algorithm improves the system performance significantly and enables non-orthogonal pilots to reduce the pilot overhead.

Index Terms:
6G, cell-free massive MIMO, distributed MIMO detection, expectation propagation.

I Introduction

The fifth-generation (5G) wireless networks have been commercialized since 2019 to support a wide range of services, including enhanced mobile broadband, ultra-reliable and low-latency communications, and massive machine-type communications [2]. However, the pursuit of faster and more reliable wireless communications never stops. This trend is reinforced by the recent emergence of several innovative applications, including the Internet of Everything, the Tactile Internet, and seamless virtual and augmented reality. Future sixth-generation (6G) wireless networks are expected to provide ubiquitous coverage, enhanced spectral efficiency (SE), connected intelligence, etc. [3, 4]. Such diverse service requirements create daunting challenges for system design. With the commercialization of massive MIMO technologies [5] in 5G, it is time to consider new MIMO-based network architectures to support the continuous exponential growth of mobile data traffic and a plethora of applications.

As a promising solution, cell-free massive MIMO was proposed [6]. It is a disruptive technology and has been recognized as a core enabler for beyond-5G and 6G networks [7, 8, 9, 10, 11, 12]. Cell-free massive MIMO can be interpreted as a combination of massive MIMO [5], distributed antenna systems (DAS) [13], and network MIMO [14]. It is expected to bring important benefits, including huge data throughput, ultra-low latency, ultra-high reliability, greatly improved energy efficiency, and ubiquitous and uniform coverage. In cell-free massive MIMO systems, a very large number of distributed access points (APs) are connected to a central processing unit (CPU) via a fronthaul network in order to cooperate and jointly serve a large number of users on the same time-frequency resources over a wide coverage area. In contrast to current cellular systems, there are no cells or cell boundaries in cell-free massive MIMO networks. As a result, this approach can relieve one of the major bottlenecks and inherent limitations of wireless networks, i.e., strong inter-cell interference. Compared to conventional co-located massive MIMO, cell-free networks offer more uniform connectivity for all users thanks to the macro-diversity gain provided by the distributed antennas.

Investigations on cell-free massive MIMO started with initial attempts at analyzing the SE [6], where single-antenna APs, single-antenna users, and Rayleigh fading channels were considered. The analysis has been extended to multi-antenna APs with Rayleigh fading, Rician fading, and correlated channels [15, 16, 17], showing that cell-free massive MIMO networks can achieve high SE. The energy efficiency of cell-free massive MIMO systems was then investigated [18, 19]. It was shown that cell-free massive MIMO systems can improve the energy efficiency by approximately ten times compared to cellular massive MIMO. Although cell-free massive MIMO has shown huge potential, its deployment critically depends on effective and scalable algorithms. According to [20], cell-free massive MIMO is considered scalable if the signal processing tasks for channel estimation, precoder and combiner design, fronthaul overhead, and power optimization per AP can be kept within finite complexity when the number of served users goes to infinity. Unfortunately, conventional cell-free massive MIMO is not scalable. To tackle the scalability issue, the user-centric concept has been introduced into cell-free massive MIMO systems [20, 21, 22, 23, 24, 25]. In particular, a user-centric dynamic cooperation clustering (DCC) scheme was introduced [20], where each user is served by only a subset of APs. This scheme is called scalable cell-free massive MIMO.

In this paper, we focus on effective data detection algorithms for cell-free massive MIMO systems. In this aspect, some early attempts were made on centralized algorithms, where the detection is implemented at the CPU with the received pilots and data signals reported from all APs [6, 26]. However, the computational and fronthaul overhead of such a centralized detection scheme is prohibitively high when the network size becomes large. To address this challenge, distributed detectors have recently been investigated. In [27], one centralized and three distributed receivers with different levels of cooperation among APs were compared in terms of SE. Radio stripes were then incorporated into cell-free massive MIMO in [28]. In this case, the APs are sequentially connected and share the same fronthaul link in a radio stripe network, thus reducing the cabling substantially. Based on this structure, a novel uplink sequential processing algorithm was developed that is optimal in terms of both SE and mean-square error (MSE). Furthermore, it achieves the same performance as centralized minimum MSE (MMSE) processing while requiring much lower fronthaul overhead, by making full use of the computational resources at the APs. However, the distributed detectors investigated in [27] are linear and therefore highly suboptimal, or may even be ill-conditioned, in terms of bit-error rate (BER) performance. To address this problem, local per-bit soft detection was performed at each AP by a partial marginalization detector, with the bit log-likelihoods shared over the fronthaul link [29]. However, this soft detection is still very complex, as the approximate posterior density for each received data bit has to be computed at each AP. Therefore, it is of great importance to develop a distributed non-linear receiver that achieves better BER performance with considerably lower complexity.

To fill this gap, we propose a distributed non-linear detector for cell-free massive MIMO networks in this paper, which is derived based on the expectation propagation (EP) principle [30]. The EP algorithm, proposed in [30], provides an iterative method to recover the transmitted data from the received signal and has recently attracted extensive research interest in massive MIMO detection [31, 32, 33, 34]. It has also been extended to cell-free massive MIMO systems [35, 36]. The algorithm is derived from a factor graph, with messages updated and passed between different pairs of nodes. Specifically, with the linear MMSE estimator, the APs first detect the symbols with the local channel state information (CSI) and transfer the posterior mean and variance estimates to the CPU. Then, the extrinsic information for each AP is computed and integrated at the CPU by maximum-ratio combining (MRC). Subsequently, the CPU uses the posterior mean estimator to refine the detection, and the extrinsic information is transferred back to each AP via the fronthaul.

The main contributions of this paper are summarized as follows:

  • Different from the existing linear detectors in cell-free massive MIMO[27], we propose a distributed and non-linear detector. The detection performance is improved at the cost of slightly increasing the computation overhead at the computationally-powerful CPU. Simulation results will demonstrate that the proposed method outperforms existing distributed detectors and even the centralized MMSE detector in terms of BER performance.

  • To be applicable in practical scenarios with imperfect CSI, we further develop a distributed iterative channel estimation and data detection (ICD) algorithm for cell-free massive MIMO systems, where the detector takes the channel estimation error and channel statistics into consideration while the channel estimation is refined by the detected data. Simulation results will demonstrate that the proposed ICD algorithm outperforms existing distributed detectors and enables non-orthogonal pilots to reduce the pilot overhead.

  • We develop an analytical framework to analyze the performance of the distributed EP algorithm, which can precisely predict the performance of the proposed detector. Based on the theoretical analysis, key performance metrics of the system, such as the MSE and BER, can be analytically determined without time-consuming Monte Carlo simulation.

Notations: For any matrix $\mathbf{A}$, $\mathbf{A}^{H}$ and $\mathrm{tr}(\mathbf{A})$ denote the conjugate transpose and trace of $\mathbf{A}$, respectively. In addition, $\mathbf{I}$ is the identity matrix and $\mathbf{0}$ is the zero matrix. We use $\mathrm{D}z$ to denote the real Gaussian integration measure. That is,

\mathrm{D}z=\phi(z)dz,\quad\text{where}\quad\phi(z)\triangleq\frac{1}{\sqrt{2\pi}}e^{-\frac{z^{2}}{2}}.

A complex Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Omega}$ is described by the probability density function

\mathcal{N}_{\mathbb{C}}(\mathbf{z};\boldsymbol{\mu},\boldsymbol{\Omega})=\frac{1}{\det(\pi\boldsymbol{\Omega})}e^{-(\mathbf{z}-\boldsymbol{\mu})^{H}\boldsymbol{\Omega}^{-1}(\mathbf{z}-\boldsymbol{\mu})}.
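As a quick numerical sanity check of the density above, the following sketch (our own helper, not from the paper) evaluates $\mathcal{N}_{\mathbb{C}}(\mathbf{z};\boldsymbol{\mu},\boldsymbol{\Omega})$ and verifies that in the scalar case the density at $z=\mu$ equals $1/(\pi\Omega)$.

```python
import numpy as np

def complex_gaussian_pdf(z, mu, Omega):
    """Density N_C(z; mu, Omega) of a circularly-symmetric complex Gaussian."""
    z = np.atleast_1d(z).astype(complex)
    mu = np.atleast_1d(mu).astype(complex)
    Omega = np.atleast_2d(Omega)
    d = z - mu
    expo = -d.conj() @ np.linalg.solve(Omega, d)  # -(z-mu)^H Omega^{-1} (z-mu)
    return float(np.real(np.exp(expo) / np.linalg.det(np.pi * Omega)))

# Scalar sanity check: at z = mu the density equals 1 / (pi * variance).
v = 2.0
p = complex_gaussian_pdf(1 + 1j, 1 + 1j, v)
```

Note that, unlike the real Gaussian, no factor of 2 appears in the exponent, which is why the peak value is $1/(\pi\Omega)$ rather than $1/\sqrt{2\pi\Omega}$.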

The remainder of this paper is organized as follows. Section II introduces the system model and formulates the cell-free massive MIMO detection problem. The distributed EP detector is proposed in Section III, and the distributed ICD receiver is investigated in Section IV. Furthermore, an analytical framework is provided in Section V. Numerical results are presented in Section VI, and Section VII concludes the paper.

II System Model

In this section, we first present the system model and formulate the cell-free massive MIMO detection problem. Four commonly-adopted receivers are then briefly introduced.

II-A Cell-Free Massive MIMO

As illustrated in Fig. 1, we consider a cell-free massive MIMO network with $L$ distributed APs, each equipped with $N$ antennas, serving $K$ single-antenna users. All APs are connected to a CPU that has abundant computing resources. Denote $\mathbf{h}_{kl}\in\mathbb{C}^{N\times 1}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathbf{R}_{kl})$ as the channel between the $k$-th user and the $l$-th AP, where $\mathbf{R}_{kl}\in\mathbb{C}^{N\times N}$ is the spatial correlation matrix, and $\beta_{kl}=\mathrm{tr}(\mathbf{R}_{kl})/N$ as the large-scale fading coefficient capturing the geometric path loss and shadowing. In the uplink data transmission phase, we consider $\mathcal{M}_{k}\subset\{1,\ldots,L\}$ as the subset of APs that serve the $k$-th user and define the DCC matrices $\mathbf{D}_{kl}\in\mathbb{C}^{N\times N}$ based on $\mathcal{M}_{k}$ as

\mathbf{D}_{kl}=\begin{cases}\mathbf{I}_{N} & \mathrm{if}\ l\in\mathcal{M}_{k}\\ \mathbf{0}_{N\times N} & \mathrm{if}\ l\notin\mathcal{M}_{k}.\end{cases} \qquad (1)

Specifically, $\mathbf{D}_{kl}$ is an identity matrix if the $l$-th AP serves the $k$-th user, and a zero matrix otherwise. Furthermore, we define $\mathcal{D}_{l}$ as the set of user indices served by the $l$-th AP:

\mathcal{D}_{l}=\left\{k:\mathrm{tr}(\mathbf{D}_{kl})\geq 1,\ k\in\{1,\ldots,K\}\right\}, \qquad (2)

where the cardinality of $\mathcal{D}_{l}$ is denoted as $|\mathcal{D}_{l}|$. When each AP serves all users, $|\mathcal{D}_{l}|=K$. Assuming that perfect CSI is available at the local APs, the received signal $\mathbf{y}_{l}\in\mathbb{C}^{N\times 1}$ at the $l$-th AP is given by

\mathbf{y}_{l}=\sum_{k\in\mathcal{D}_{l}}\sqrt{p_{k}}\mathbf{h}_{kl}x_{k}+\mathbf{n}_{l}, \qquad (3)

where $x_{k}\in\mathbb{C}$ is the transmitted symbol drawn from an $M$-QAM constellation, and $p_{k}>0$ is the transmit power of the $k$-th user. The additive noise at the $l$-th AP is denoted as $\mathbf{n}_{l}\in\mathbb{C}^{N\times 1}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0}_{N},\sigma^{2}\mathbf{I}_{N})$. Let $\mathbf{x}=[x_{1},\ldots,x_{K}]^{T}$ denote the transmitted vector from all users and $\mathbf{H}_{l}=[\mathbf{h}_{1l},\ldots,\mathbf{h}_{Kl}]\in\mathbb{C}^{N\times K}$ the channel from all users to AP $l$. If the $k$-th user is not associated with the $l$-th AP, the corresponding channel vector is $\mathbf{h}_{kl}=\mathbf{0}$. Furthermore, we denote $\mathbf{H}=[\mathbf{H}_{1}^{T},\ldots,\mathbf{H}_{L}^{T}]^{T}\in\mathbb{C}^{LN\times K}$ as the channel matrix between all users and all APs. The uplink detection problem for cell-free massive MIMO is to detect the transmitted data $\mathbf{x}$ based on the received signals $\mathbf{y}_{l}\ (l=1,2,\ldots,L)$, the channel matrix $\mathbf{H}$, and the noise power $\sigma^{2}$.
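The uplink model of (1)-(3) can be simulated in a few lines. The sketch below uses illustrative dimensions and simplifications of our own choosing (i.i.d. Rayleigh fading without spatial correlation, QPSK symbols, and every AP serving all users); it is meant only to make the signal model concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
L_aps, N, K = 4, 2, 3                      # APs, antennas per AP, users
p = np.ones(K)                             # transmit powers p_k
sigma2 = 0.01                              # noise variance

# DCC: serving_sets[l] lists the users served by AP l (here: all users).
serving_sets = [list(range(K)) for _ in range(L_aps)]

# i.i.d. Rayleigh channels h_kl ~ CN(0, I_N); zero vector if user k is not served by AP l.
H = np.zeros((L_aps, N, K), dtype=complex)
for l in range(L_aps):
    for k in serving_sets[l]:
        H[l][:, k] = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

# QPSK symbols x_k and received signals y_l = sum_k sqrt(p_k) h_kl x_k + n_l, per (3).
x = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)
n = (rng.standard_normal((L_aps, N)) + 1j * rng.standard_normal((L_aps, N))) * np.sqrt(sigma2 / 2)
y = np.einsum('lnk,k->ln', H, np.sqrt(p) * x) + n
```

Stacking the per-AP blocks `H[l]` and `y[l]` recovers the centralized model used in Section III.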

(a) Conventional cell-free massive MIMO system. (b) Scalable cell-free massive MIMO system.

Figure 1: Conventional and scalable cell-free massive MIMO systems.

II-B Linear Receivers

For cell-free massive MIMO detection, there are four commonly-adopted linear receivers [27] with different levels of cooperation among APs, elaborated as follows.

  • Fully centralized receiver: The pilot and data signals received at all APs are sent to the CPU for channel estimation and data detection. The CPU may perform the MMSE or MRC detection.

  • Partially distributed receiver with large-scale fading decoding: First, each AP estimates the channels and uses the linear MMSE detector to detect the received signals. Then, the detected signals are collected at the CPU for joint detection for all users by utilizing the large-scale fading decoding (LSFD) method. Compared to the fully centralized receiver, only the channel statistics are utilized at the CPU but the pilot signals are not required to be sent to the CPU.

  • Partially distributed receiver with average decoding: It is a special case of the partially distributed receiver with LSFD. The CPU performs joint detection for all users by simply taking the average of the local estimates. Thus, no channel statistics are required to be transmitted to the CPU via the fronthaul.

  • Fully distributed receiver: It is a fully distributed approach in which the data detection is performed at the APs based on the local channel estimates. No information is required to be transferred to the CPU. Each AP may perform MMSE detection.

All four of the above-mentioned receivers achieve performance that is far from optimal due to their linear processing. In contrast, non-linear receivers have shown great advantages in terms of BER in cellular massive MIMO systems, but at the cost of higher computational complexity [31]. Thanks to the relatively high computing capabilities of the CPU and the large number of APs in cell-free massive MIMO systems, we can offload the computation-intensive operations to the CPU and distribute the remaining computation tasks to the APs. Following this idea, we will propose a distributed non-linear detector for cell-free massive MIMO systems.

III Proposed Distributed EP Detector

In this section, we first formulate the MIMO detection problem from a Bayesian inference perspective. Then, we introduce the factor graph and propose the distributed EP detector. Finally, we analyze the computational complexity and fronthaul overhead of the proposed detector.

III-A Distributed Bayesian Inference

We first apply Bayesian inference to recover the signal $\mathbf{x}$ from the received signal $\mathbf{y}$ in the data detection stage with the linear model $\mathbf{y}=\mathbf{H}\mathbf{x}+\mathbf{n}$, where $\mathbf{y}=[\mathbf{y}_{1}^{T},\ldots,\mathbf{y}_{L}^{T}]^{T}\in\mathbb{C}^{LN\times 1}$, $\mathbf{H}=[\mathbf{H}_{1}^{T},\ldots,\mathbf{H}_{L}^{T}]^{T}\in\mathbb{C}^{LN\times K}$, and $\mathbf{x}=[x_{1},\ldots,x_{K}]^{T}\in\mathbb{C}^{K\times 1}$. Based on Bayes' theorem, the posterior probability is given by

\mathtt{P}(\mathbf{x}|\mathbf{y},\mathbf{H})=\frac{\mathtt{P}(\mathbf{y}|\mathbf{x},\mathbf{H})\mathtt{P}(\mathbf{x})}{\mathtt{P}(\mathbf{y}|\mathbf{H})}=\frac{\mathtt{P}(\mathbf{y}|\mathbf{x},\mathbf{H})\mathtt{P}(\mathbf{x})}{\int\mathtt{P}(\mathbf{y}|\mathbf{x},\mathbf{H})\mathtt{P}(\mathbf{x})d\mathbf{x}}, \qquad (4)

where $\mathtt{P}(\mathbf{y}|\mathbf{x},\mathbf{H})$ is the likelihood function with the known channel matrix and $\mathtt{P}(\mathbf{x})$ is the prior distribution of $\mathbf{x}$. Given the posterior probability $\mathtt{P}(\mathbf{x}|\mathbf{y},\mathbf{H})$, the Bayesian MMSE estimate is obtained by

\hat{\mathbf{x}}=\int\mathbf{x}\,\mathtt{P}(\mathbf{x}|\mathbf{y},\mathbf{H})d\mathbf{x}. \qquad (5)

However, the Bayesian MMSE estimator is intractable because the marginal posterior probability in (5) involves a high-dimensional integral, which motivates us to develop an efficient method to approximate (4). Furthermore, different from conventional cellular MIMO systems [31], the posterior probability in (4) has to be rewritten in a distributed manner because of the inherently distributed architecture of cell-free massive MIMO systems, given by

\mathtt{P}(\mathbf{x}|\mathbf{y},\mathbf{H})\propto\mathtt{P}(\mathbf{x})\prod_{l=1}^{L}\exp\left(-\|\mathbf{y}_{l}-\mathbf{H}_{l}\mathbf{x}\|^{2}/\sigma^{2}\right). \qquad (6)

As a result, we can develop an efficient distributed MIMO detector that approximates the posterior probability (6) by resorting to message passing on a factor graph, while taking the characteristics of cell-free massive MIMO into consideration. The factor graph and the proposed detector are elaborated in the following subsections.
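The factorization in (6) states that, for Gaussian noise, the centralized log-likelihood splits exactly into a sum of per-AP terms. This identity is what licenses the distributed processing, and it is easy to check numerically (dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
L_aps, N, K = 3, 2, 2
H_l = [rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K)) for _ in range(L_aps)]
x = rng.standard_normal(K) + 1j * rng.standard_normal(K)
y_l = [Hl @ x for Hl in H_l]

# Centralized model: stack the per-AP blocks.
H = np.vstack(H_l)
y = np.concatenate(y_l)

# ||y - H x||^2 equals the sum of the per-AP residuals for any candidate x.
x_test = rng.standard_normal(K) + 1j * rng.standard_normal(K)
lhs = np.linalg.norm(y - H @ x_test) ** 2
rhs = sum(np.linalg.norm(yl - Hl @ x_test) ** 2 for yl, Hl in zip(y_l, H_l))
```

Each factor in (6) therefore depends only on data local to one AP, so each AP can contribute its own message about $\mathbf{x}$.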

III-B Factor Graph for Cell-free Massive MIMO Detection

The factor graph is a powerful tool for developing efficient inference algorithms by applying different message-passing rules. We first introduce several concepts of the factor graph [37], as illustrated in Fig. 2, where the hollow circles represent the variable nodes and the solid squares represent the factor nodes. For a factor graph with factor nodes $\{f_{0},f_{1},\ldots,f_{L}\}$, all connected to a set of variable nodes $\{\mathbf{x}_{i}\}$, the messages are computed and updated iteratively by performing the following rules.

Figure 2: Factor graphs for message-passing algorithms.

1) Approximate beliefs: The approximate belief $b_{\mathrm{app}}(\mathbf{x})$ on variable node $\mathbf{x}$ is $\mathcal{N}_{\mathbb{C}}(\mathbf{x};\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}})$, where $\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}=\mathtt{E}[\mathbf{x}|b_{\mathrm{sp}}]$ and $v_{\mathrm{A}}^{\mathrm{ext}}=\mathrm{mean}(\mathrm{diag}(\mathrm{Cov}[\mathbf{x}|b_{\mathrm{sp}}]))$ are the mean and average variance of the corresponding belief $b_{\mathrm{sp}}(\mathbf{x})\propto\prod_{l}\mu_{f_{l}\rightarrow\mathbf{x}}(\mathbf{x})$.

2) Variable-to-factor messages: The message from a variable node $\mathbf{x}$ to a connected factor node $f_{l}$ is given by

\mu_{\mathbf{x}\rightarrow f_{l}}(\mathbf{x})=\prod_{j\neq l}^{L}\mu_{f_{j}\rightarrow\mathbf{x}}(\mathbf{x}), \qquad (7)

which is the product of the messages from all the other factor nodes.

3) Factor-to-variable messages: The message from a factor node $f_{l}$ to a connected variable node $\mathbf{x}_{i}$ is $\mu_{f_{l}\rightarrow\mathbf{x}_{i}}(\mathbf{x}_{i})\propto\int f(\mathbf{x}_{i},\{\mathbf{x}_{j}\}_{j\neq i})\prod_{j\neq i}\mu_{\mathbf{x}_{j}\rightarrow f_{l}}(\mathbf{x}_{j})\,\mathrm{d}\mathbf{x}_{j}$.

If we treat the $L$ APs and the CPU as factor nodes, we can construct the factor graph for cell-free massive MIMO detection. By applying the above message-passing rules to the constructed factor graph, we develop the distributed EP detector for cell-free massive MIMO systems. The detailed derivation is given in Appendix A-A.
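A useful fact underlying these rules is that a product of Gaussian messages is again Gaussian, with precisions adding and precision-weighted means adding. This is exactly the operation behind rule 2) for Gaussian messages, and it reappears in the MRC combining of the proposed detector. A minimal illustration with two scalar messages (values of our own choosing):

```python
# Two Gaussian messages about the same scalar complex variable x:
m1, v1 = 1.0 + 0.5j, 0.2   # mean, variance of message 1
m2, v2 = 0.8 + 0.7j, 0.1   # mean, variance of message 2

# Product rule: precisions (1/v) add; precision-weighted means (m/v) add.
v = 1.0 / (1.0 / v1 + 1.0 / v2)
m = v * (m1 / v1 + m2 / v2)
```

The combined message is sharper than either input ($v < \min(v_1, v_2)$) and its mean is pulled toward the more confident message.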

Figure 3: Block diagrams of the proposed distributed EP detector, as well as the distributed EP-based iterative channel estimation and data detection. The “EXT.” blocks represent the extrinsic information computation. The channel estimator and signal detector exchange information iteratively until convergence.

III-C Distributed EP Detector

The distributed EP-based detector is summarized in Algorithm 1. Compared with the centralized EP detector, it deploys partial calculation modules at each AP based on the local information and then sends the estimated posterior means and variances to the CPU for combining. In particular, the inputs of the algorithm are the received signal $\mathbf{y}_{l}$, the channel matrix $\mathbf{H}_{l}$, and the noise level $\sigma^{2}$, while the output is the recovered signal $\mathbf{x}_{\mathrm{B}}^{\mathrm{post},T}$ after the $T$-th iteration. The initial parameters are $\boldsymbol{\gamma}_{l}^{(0)}=\mathbf{0}$ and $\lambda_{l}^{(0)}=\frac{1}{E_{x}}$, where

E_{x}=\mathtt{E}\{\|\mathbf{x}\|^{2}\}/K. \qquad (8)

Note that $\boldsymbol{\gamma}_{l}^{(0)}$ and $\lambda_{l}^{(0)}$ are the initialized extrinsic information, and $E_{x}$ is the power of the transmitted symbol $x_{k}$. The block diagram of the proposed distributed EP detector is illustrated in Fig. 3; it is composed of two modules, A and B. The detector follows the turbo principle in iterative decoding, where each module passes extrinsic messages to the other and this process repeats until convergence. It is derived from the factor graph with the messages, updated and passed between different pairs of nodes, assumed to follow Gaussian distributions. As a Gaussian distribution is fully characterized by its mean and variance, only the means and variances need to be calculated and passed between the modules. To better understand the distributed EP detection algorithm, we elaborate the details of each equation in Algorithm 1. Specifically, module A is the linear MMSE (LMMSE) estimator performed at the APs according to the following linear model

\mathbf{y}_{l}=\mathbf{H}_{l}\mathbf{x}+\mathbf{n}_{l}. \qquad (9)

In the $t$-th iteration, the explicit expressions for the posterior covariance matrix $\boldsymbol{\Sigma}_{l}^{t}$ and mean vector $\boldsymbol{\mu}_{l}^{t}$ are given by (14) and (15), respectively. Note that each AP only uses the local channel $\mathbf{H}_{l}$ to detect the transmitted signal $\mathbf{x}$. For ease of notation, we omit the iteration index $t$ for all mean and variance estimates in Algorithm 1. Then, the variance estimate $v_{\mathrm{A},l}^{\mathrm{post}}=\mathrm{tr}(\boldsymbol{\Sigma}_{l}^{t})/|\mathcal{D}_{l}|$ and the mean estimate $\mathbf{x}_{\mathrm{A},l}^{\mathrm{post}}=\boldsymbol{\mu}_{l}^{t}$ are transferred to the CPU to compute the extrinsic information $v_{\mathrm{A},l}^{\mathrm{ext}}$ in (16) and $\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}$ in (17), respectively. The extrinsic information $\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}$ can be regarded as an AWGN observation given by

\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}=\mathbf{x}+\mathbf{n}_{\mathrm{A},l}^{\mathrm{ext}}, \qquad (10)

where $\mathbf{n}_{\mathrm{A},l}^{\mathrm{ext}}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},v_{\mathrm{A},l}^{\mathrm{ext}}\mathbf{I})$ [33]. As a result, the linear model in (3) is decoupled into $K$ parallel and independent AWGN channels with equivalent noise variance $v_{\mathrm{A},l}^{\mathrm{ext}}$ for each AP. Subsequently, the CPU collects all extrinsic means $\{\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}\}_{l=1}^{L}$ and variances $\{v_{\mathrm{A},l}^{\mathrm{ext}}\}_{l=1}^{L}$ and performs MRC. (In this paper, we focus on linear combining of the local estimates from the APs, $\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}=\sum_{l=1}^{L}a_{l}\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}$. The optimal combining is MRC, which maximizes the post-combination SNR of the final AWGN observation; the detailed derivation is given in Appendix A-B.) The MRC expressions (18) and (19) are obtained by maximizing the post-combination signal-to-noise ratio (SNR) of the AWGN observation $\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}$ at the CPU, given by

\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}=\mathbf{x}+\mathbf{n}_{\mathrm{A}}^{\mathrm{ext}}, \qquad (11)

where $\mathbf{n}_{\mathrm{A}}^{\mathrm{ext}}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},v_{\mathrm{A}}^{\mathrm{ext}}\mathbf{I})$. The CPU uses the posterior mean estimator to detect the signal $\mathbf{x}$ from the equivalent AWGN model (11). The posterior mean and variance are computed by the posterior MMSE estimator for the equivalent AWGN model in (20) and (21). As the transmitted symbols are drawn from the $M$-QAM set $\mathcal{S}=\{s_{1},s_{2},\ldots,s_{M}\}$, the corresponding expressions for each element in (20) and (21) are given by

[x_{\mathrm{B}}^{\mathrm{post}}]_{k}=\mathtt{E}\{x_{k}|[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}}\}=\frac{\sum_{s_{i}\in\mathcal{S}}s_{i}\mathcal{N}_{\mathbb{C}}(s_{i};[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}})p(s_{i})}{\sum_{s_{i}\in\mathcal{S}}\mathcal{N}_{\mathbb{C}}(s_{i};[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}})p(s_{i})}, \qquad (12)

[v_{\mathrm{B}}^{\mathrm{post}}]_{k}=\mathtt{var}\{x_{k}|[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}}\}=\frac{\sum_{s_{i}\in\mathcal{S}}|s_{i}|^{2}\mathcal{N}_{\mathbb{C}}(s_{i};[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}})p(s_{i})}{\sum_{s_{i}\in\mathcal{S}}\mathcal{N}_{\mathbb{C}}(s_{i};[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}})p(s_{i})}-|[x_{\mathrm{B}}^{\mathrm{post}}]_{k}|^{2}, \qquad (13)

where $[x_{\mathrm{B}}^{\mathrm{post}}]_{k}$, $[v_{\mathrm{B}}^{\mathrm{post}}]_{k}$, and $[x_{\mathrm{A}}^{\mathrm{ext}}]_{k}$ are the $k$-th elements of $\mathbf{x}_{\mathrm{B}}^{\mathrm{post}}$, $\mathbf{v}_{\mathrm{B}}^{\mathrm{post}}$, and $\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}$, respectively. The posterior mean $\mathbf{x}_{\mathrm{B}}^{\mathrm{post}}$ and variance $\mathbf{v}_{\mathrm{B}}^{\mathrm{post}}$ are then utilized to compute the extrinsic information $\lambda_{l}^{(t)}$ and $\boldsymbol{\gamma}_{l}^{(t)}$ for each AP in (22) and (23), where the function $\mathrm{mean}(\cdot)$ computes the mean of a vector. Finally, the extrinsic information $\lambda_{l}^{(t)}$ and $\boldsymbol{\gamma}_{l}^{(t)}$ is transferred to each AP for the next iteration. The whole procedure is executed iteratively until it is terminated by a stopping criterion or a maximum number of iterations.
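The per-element posterior statistics (12)-(13) can be computed by a small vectorized denoiser. The sketch below assumes QPSK with a uniform prior (so the $p(s_i)$ factors cancel) and adds a row-wise exponent shift for numerical stability, which also cancels in the normalization; the function name and test values are our own.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def awgn_denoiser(x_ext, v_ext, constellation=QPSK):
    """Per-element posterior mean/variance of (12)-(13) for a uniform prior."""
    d2 = np.abs(np.asarray(x_ext)[:, None] - constellation[None, :]) ** 2
    # Subtract the row minimum before exponentiating; the common factor and the
    # uniform prior p(s_i) both cancel in the normalization.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / v_ext)
    w /= w.sum(axis=1, keepdims=True)
    x_post = w @ constellation                                # eq. (12)
    v_post = w @ np.abs(constellation) ** 2 - np.abs(x_post) ** 2  # eq. (13)
    return x_post, v_post

obs = np.array([0.8 + 0.6j, -0.7 - 0.8j])
x_post, v_post = awgn_denoiser(obs, 1e-3)  # low noise: snaps to nearest symbols
```

At low noise the weights become one-hot, so the posterior mean collapses onto the nearest constellation point and the posterior variance approaches zero; at high noise the posterior mean shrinks toward the prior mean instead.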

Input: Received signal 𝐲l{\mathbf{y}}_{l}, channel matrix 𝐇l{\mathbf{H}}_{l}, and noise level σ2\sigma^{2}.
Output: Recovered signal 𝐱Bpost{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}}.
Initialize: 𝜸l(0)𝟎\boldsymbol{\gamma}_{l}^{(0)}\leftarrow\mathbf{0}, λl(0)1Ex\lambda_{l}^{(0)}\leftarrow\frac{1}{E_{x}}
for t=1,,Tt=1,\cdots,T  do
         Module A in APs:
      (1) Compute the posterior mean and variance of 𝐱A,l{\mathbf{x}}_{\mathrm{A},l}:
vA,lpost𝚺lt=(σ2𝐇lH𝐇l+λl(t1)𝐈)1v_{\mathrm{A},l}^{\mathrm{post}}\leftarrow\boldsymbol{\Sigma}_{l}^{t}=\bigg{(}\sigma^{-2}{\mathbf{H}}^{H}_{l}{\mathbf{H}}_{l}+\lambda_{l}^{(t-1)}{\mathbf{I}}\bigg{)}^{-1} (14)
𝐱A,lpost𝝁lt=𝚺lt(σ2𝐇lH𝐲l+𝜸l(t1)).{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{post}}\leftarrow\boldsymbol{\mu}_{l}^{t}=\boldsymbol{\Sigma}_{l}^{t}\bigg{(}\sigma^{-2}{\mathbf{H}}_{l}^{H}{\mathbf{y}}_{l}+\boldsymbol{\gamma}_{l}^{(t-1)}\bigg{)}. (15)
Module B in CPU:
      (2) Compute the extrinsic mean and variance of 𝐱A,l{\mathbf{x}}_{\mathrm{A},l}:
vA,lext(1vA,lpostλl(t1))1v_{\mathrm{A},l}^{\mathrm{ext}}\leftarrow\bigg{(}\frac{1}{v_{\mathrm{A},l}^{\mathrm{post}}}-\lambda_{l}^{(t-1)}\bigg{)}^{-1} (16)
𝐱A,lextvA,lext(𝝁ltvA,lpost𝜸l(t1))1.{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}}\leftarrow v_{\mathrm{A},l}^{\mathrm{ext}}\bigg{(}\frac{\boldsymbol{\mu}_{l}^{t}}{v_{\mathrm{A},l}^{\mathrm{post}}}-\boldsymbol{\gamma}_{l}^{(t-1)}\bigg{)}^{-1}. (17)
(3) MRC combining of 𝐱A,l{\mathbf{x}}_{\mathrm{A},l}:
1vAext=l=1L1vA,lext\frac{1}{v_{\mathrm{A}}^{\mathrm{ext}}}=\sum_{l=1}^{L}\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext}}} (18)
𝐱Aext=vAextl=1L𝐱A,lextvA,lext.{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}}=v_{\mathrm{A}}^{\mathrm{ext}}\sum_{l=1}^{L}\frac{{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}}}{v_{\mathrm{A},l}^{\mathrm{ext}}}. (19)
(4) Compute the posterior mean and variance of 𝐱B{\mathbf{x}}_{\mathrm{B}}:
𝐱Bpost=𝙴{𝐱|𝐱Aext,vAext}{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}}=\mathtt{E}\{{\mathbf{x}}|{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}}\} (20)
𝐯Bpost=𝚟𝚊𝚛{𝐱|𝐱Aext,vAext}.{\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}}=\mathtt{var}\{{\mathbf{x}}|{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}}\}. (21)
(5) Compute the extrinsic mean and variance of ${\mathbf{x}}_{\mathrm{B},l}$:
1vB,lextλl(t)=1mean(𝐯Bpost)1vA,lext\frac{1}{v_{\mathrm{B},l}^{\mathrm{ext}}}\leftarrow\lambda_{l}^{(t)}=\frac{1}{\mathrm{mean}({\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}})}-\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext}}} (22)
$\frac{{\mathbf{x}}_{\mathrm{B},l}^{\mathrm{ext}}}{v_{\mathrm{B},l}^{\mathrm{ext}}}\leftarrow\boldsymbol{\gamma}_{l}^{(t)}=\frac{{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}}}{\mathrm{mean}({\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}})}-\frac{{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}}}{v_{\mathrm{A},l}^{\mathrm{ext}}}.$ (23)
Algorithm 1 Distributed EP for cell-free massive MIMO detection
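To make the message flow of Algorithm 1 concrete, the following NumPy sketch implements one iteration for unit-energy QPSK, assuming each AP serves all $K$ users. The per-AP loop would run locally at the APs, while the combining and the posterior step run at the CPU. Function names and the numerical floor on the posterior variance are our additions for illustration and stability, not part of the algorithm statement; the extrinsic mean update subtracts the cavity message ${\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}}/v_{\mathrm{A},l}^{\mathrm{ext}}$, the standard EP form.

```python
import numpy as np

def qpsk_posterior(x_ext, v_ext):
    """Posterior mean/variance of QPSK symbols (+-1+-1j)/sqrt(2) observed
    through the decoupled AWGN channel x_ext = x + n, n ~ CN(0, v_ext)."""
    a = 1.0 / np.sqrt(2.0)
    mr = a * np.tanh(2.0 * a * x_ext.real / v_ext)  # per-dimension tanh rule
    mi = a * np.tanh(2.0 * a * x_ext.imag / v_ext)
    mean = mr + 1j * mi
    var = 1.0 - mr**2 - mi**2                       # E|x|^2 = 1 for unit energy
    return mean, var

def deep_iteration(Y, H, sigma2, gamma, lam):
    """One iteration of the distributed EP detector. Y[l] and H[l] are the
    per-AP received signal and channel; gamma[l] (vector) and lam[l]
    (scalar) are the extrinsic messages from the previous iteration."""
    L, K = len(H), H[0].shape[1]
    x_ext, v_ext = [], []
    for l in range(L):                       # Module A, executed at each AP
        Sigma = np.linalg.inv(H[l].conj().T @ H[l] / sigma2
                              + lam[l] * np.eye(K))                      # (14)
        mu = Sigma @ (H[l].conj().T @ Y[l] / sigma2 + gamma[l])          # (15)
        v_post = np.real(np.trace(Sigma)) / K
        v = 1.0 / (1.0 / v_post - lam[l])                                # (16)
        x_ext.append(v * (mu / v_post - gamma[l]))                       # (17)
        v_ext.append(v)
    vA = 1.0 / sum(1.0 / v for v in v_ext)                               # (18)
    xA = vA * sum(x / v for x, v in zip(x_ext, v_ext))                   # (19)
    xB, vB = qpsk_posterior(xA, vA)                                      # (20)-(21)
    vB_mean = max(np.mean(vB), 1e-8)         # numerical floor (our addition)
    new_lam = [1.0 / vB_mean - 1.0 / v_ext[l] for l in range(L)]         # (22)
    new_gamma = [xB / vB_mean - x_ext[l] / v_ext[l] for l in range(L)]   # (23)
    return xB, new_gamma, new_lam
```

In a deployment, only the scalars $(\lambda_l, v_{\mathrm{A},l}^{\mathrm{ext}})$ and the $K$-dimensional vectors $(\boldsymbol{\gamma}_l, {\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}})$ would cross the fronthaul, which is what the overhead accounting in Table II reflects.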

III-D Computational Complexity and Fronthaul Overhead

In the following, we analyze the computational complexity and fronthaul overhead of different detectors, summarized in Tables I and II, respectively. For the proposed distributed EP detector, the computational complexity at each AP is dominated by the LMMSE estimation of the signal ${\mathbf{x}}_{\mathrm{A},l}^{\mathrm{post}}$, which costs $\mathcal{O}(N|\mathcal{D}_{l}|^{2})$ because of the matrix inversion required in (14), while the computational complexity at the CPU is $\mathcal{O}(L|\mathcal{D}_{l}|)$ per iteration. Furthermore, if the number of antennas $N$ is smaller than $|\mathcal{D}_{l}|$, we can apply the matrix inversion lemma to carry out the matrix inversion in (14) as follows:

(σ2𝐇lH𝐇l+𝐃)1=𝐃1σ2𝐃1𝐇lH(𝐈+σ2𝐇l𝐃1𝐇lH)1𝐇l𝐃1,(\sigma^{-2}{\mathbf{H}}_{l}^{H}{\mathbf{H}}_{l}+{\mathbf{D}})^{-1}={\mathbf{D}}^{-1}-\sigma^{-2}{\mathbf{D}}^{-1}{\mathbf{H}}_{l}^{H}({\mathbf{I}}+\sigma^{-2}{\mathbf{H}}_{l}{\mathbf{D}}^{-1}{\mathbf{H}}_{l}^{H})^{-1}{\mathbf{H}}_{l}{\mathbf{D}}^{-1},

where ${\mathbf{D}}=\lambda_{l}^{(t-1)}{\mathbf{I}}$, which reduces the computational complexity to $\mathcal{O}(|\mathcal{D}_{l}|N^{2})$. Therefore, for $T$ iterations, the overall computational complexity is $\mathcal{O}(T|\mathcal{D}_{l}|N^{2})$ at each AP and $\mathcal{O}(TL|\mathcal{D}_{l}|)$ at the CPU. As observed in Table I, compared with other distributed linear receivers, the distributed EP detector mainly increases the computational complexity at the CPU.
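The matrix inversion lemma above can be checked numerically. A small NumPy sanity check (our own illustration): both sides agree, while the right-hand side only requires an $N\times N$ inverse instead of a $|\mathcal{D}_{l}|\times|\mathcal{D}_{l}|$ one.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 4, 16                       # N antennas < |D_l| users: the regime where the lemma pays off
H = (rng.standard_normal((N, D)) + 1j * rng.standard_normal((N, D))) / np.sqrt(2)
sigma2, lam = 0.5, 2.0

# Direct |D_l| x |D_l| inversion.
direct = np.linalg.inv(H.conj().T @ H / sigma2 + lam * np.eye(D))

# Matrix inversion lemma: only an N x N inverse is needed (D is diagonal here).
Dinv = np.eye(D) / lam
small = np.linalg.inv(np.eye(N) + H @ Dinv @ H.conj().T / sigma2)
woodbury = Dinv - Dinv @ H.conj().T @ small @ H @ Dinv / sigma2
```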

TABLE I: Complexity of different detectors.
Detector | AP | CPU
Distributed EP | $8NT|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+1)+6T|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+2)$ | $LT(1+2|\mathcal{D}_{l}|)+|\mathcal{D}_{l}|(T-1)(7M+2)$
Centralized MMSE | $0$ | $|\mathcal{D}_{l}|[8NL(|\mathcal{D}_{l}|+1)+2(4|\mathcal{D}_{l}|+3)]$
Partially distributed MMSE | $8N|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+1)+6|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+2)$ | $L(1+2|\mathcal{D}_{l}|)$
Distributed MMSE | $8N|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+1)+6|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+2)$ | $0$

We further compare the number of complex scalars that need to be transmitted from the APs to the CPU via the fronthauls of different detectors in Table II, where $\tau_{c}$ and $\tau_{p}$ denote the coherence time and pilot length, respectively. The results for the four baseline detectors are from [27]. With the proposed detector, $\sum\limits_{l=1}^{L}(\tau_{c}-\tau_{p})2T(|\mathcal{D}_{l}|+1)$ scalars need to be passed from the APs to the CPU, where $T$ denotes the total number of iterations, and no statistical parameters need to be passed. The fronthaul overhead of the proposed distributed EP detector is similar to those of the centralized MMSE and EP detectors; which is lower depends on the specific system parameters.

TABLE II: Fronthaul overhead of different detectors.
Detector | Coherence block | Statistical parameters
Distributed EP | $\sum_{l=1}^{L}(\tau_{c}-\tau_{p})2T(|\mathcal{D}_{l}|+1)$ | $0$
Centralized MMSE | $\tau_{c}NL$ | $KLN^{2}/2$
Partially distributed MMSE with LSFD | $(\tau_{c}-\tau_{p})KL$ | $KL+(L^{2}K^{2}+KL)/2$
Partially distributed MMSE with average decoding | $(\tau_{c}-\tau_{p})KL$ | $0$
Distributed MMSE | $0$ | $0$
Centralized EP | $\tau_{c}NL$ | $KLN^{2}/2$

IV Distributed Iterative Channel Estimation and Data Detection

In Section III, we developed a distributed EP detector assuming perfect CSI. However, the channel matrix is typically estimated at the receiver with uplink training, and thus channel estimation errors should be considered in the detection stage. In this section, we will propose a distributed ICD receiver for cell-free massive MIMO to handle imperfect CSI.

IV-A ICD Algorithm

ICD has been explored for MIMO [38, 39], OFDM [40], and massive MIMO [41] systems with low-resolution analog-to-digital converters (ADCs) [42]. Recently, it has been investigated in cell-free massive MIMO to improve the BER performance and reduce the pilot overhead. Unlike [43], which solves a complex biconvex optimization problem with high complexity, we develop a low-complexity turbo-like ICD architecture. The iterative procedure is summarized in Algorithm 2.

As illustrated in Fig. 3, the proposed ICD scheme employs an idea similar to iterative decoding. At each AP, the channel estimator and Module A of the proposed distributed EP detector are executed, and they exchange extrinsic information iteratively. In the ICD processing stage, channel estimation is first performed based on the received pilot signal and the transmitted pilot. Then, the data detector performs signal detection by taking both the channel estimation error and the channel statistics into consideration. It then feeds back the detected data and the detection error to the channel estimator. Finally, data-aided channel estimation is performed with the help of the detected data. The whole ICD procedure is executed iteratively until a stopping criterion is met or a maximum number of ICD iterations is reached.

Similar to Algorithm 1, the inputs of the ICD scheme for each AP in each time slot are the pilot signal matrix $\mathbf{X}^{{\mathrm{p}}}$, the received signal matrix corresponding to the pilot matrix, ${\mathbf{Y}}^{{\mathrm{p}}}$, and that corresponding to the data matrix, ${\mathbf{Y}}^{{\mathrm{d}}}$. In the $r$-th ICD iteration (we use $r$ to count the data feedbacks from the data detector to the channel estimator; e.g., $r=1$ means no data feedback, while $r=4$ means three data feedbacks have been performed), $\hat{{\mathbf{H}}}^{(r)}$ is the estimated channel matrix, $\hat{{\mathbf{X}}}^{({\mathrm{d}},r)}$ is the estimated data matrix, and $\hat{{\mathbf{V}}}_{\mathrm{est},l}$ and $\hat{{\mathbf{V}}}_{\mathrm{det},l}$ are used to compute the covariance matrices of the equivalent noise in the signal detector and channel estimator, respectively. The final output of the signal detector is the detected data matrix $\hat{{\mathbf{X}}}^{(\mathrm{d},R)}$, where $R$ is the total number of ICD iterations. Compared with the conventional receiver, where the channel estimator and signal detector are designed separately, the ICD scheme improves the BER performance by taking the characteristics of the channel estimation error and the channel statistics into account. After signal detection, the detected payload data are utilized for channel estimation. Next, we elaborate on the whole procedure.

Input: Received signal 𝐘p{\mathbf{Y}}^{{\mathrm{p}}} and 𝐘d{\mathbf{Y}}^{{\mathrm{d}}}, pilot signal matrix, 𝐗p\mathbf{X}^{{\mathrm{p}}}, spatial correlation matrix 𝐑kl{\mathbf{R}}_{kl} , and noise level σ2\sigma^{2}.
Output: Recovered signal $\hat{{\mathbf{X}}}^{(\mathrm{d},R)}$ (whose columns are the per-slot posterior means ${\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}}$).
Initialize: 𝜸l(0)𝟎\boldsymbol{\gamma}_{l}^{(0)}\leftarrow\mathbf{0}, λl(0)1Ex\lambda_{l}^{(0)}\leftarrow\frac{1}{E_{x}}
for r=1,,Rr=1,\cdots,R  do
      
      (1) Perform LMMSE channel estimation to obtain 𝐇^(r)\hat{{\mathbf{H}}}^{(r)}
      (2) Perform distributed EP-based MIMO detection to obtain 𝐗^(d,r)\hat{{\mathbf{X}}}^{(\mathrm{d},r)}
      (3) Data feedback to channel estimator
Algorithm 2 Distributed EP-based ICD for cell-free massive MIMO

IV-B LMMSE Estimator

We adopt the classical LMMSE estimator in the channel estimation stage at each AP. To facilitate the formulation of the channel estimation problem, we apply vectorization to (9) and rewrite it as

𝐲lp=𝐀p𝐡¯l+𝐧lp,{\mathbf{y}}^{{\mathrm{p}}}_{l}={\mathbf{A}}^{{\mathrm{p}}}\bar{{\mathbf{h}}}_{l}+{\mathbf{n}}^{{\mathrm{p}}}_{l}, (24)

where 𝐀p=(𝐗p)T𝐈NτpN×KN{\mathbf{A}}^{{\mathrm{p}}}=({\mathbf{X}}^{{\mathrm{p}}})^{T}\otimes{\mathbf{I}}_{N}\in\mathbb{C}^{\tau_{p}N\times KN}, 𝐲lp=vec(𝐘lp)τpN×1{\mathbf{y}}^{{\mathrm{p}}}_{l}=\mathrm{vec}({\mathbf{Y}}^{{\mathrm{p}}}_{l})\in\mathbb{C}^{\tau_{p}N\times 1}, 𝐡¯l=vec(𝐇l)NK×1\bar{{\mathbf{h}}}_{l}=\mathrm{vec}({\mathbf{H}}_{l})\in\mathbb{C}^{NK\times 1}, and 𝐧lp=vec(𝐍lp)τpN×1{\mathbf{n}}^{{\mathrm{p}}}_{l}=\mathrm{vec}(\mathbf{N}^{{\mathrm{p}}}_{l})\in\mathbb{C}^{\tau_{p}N\times 1}. We denote \otimes as the matrix Kronecker product and vec()\mathrm{vec}(\cdot) as the vectorization operation. In the pilot-only based channel estimation stage, the LMMSE estimate of 𝐡¯l\bar{{\mathbf{h}}}_{l} is given by

𝐡¯^lp=𝐑𝐡¯l𝐡¯l(𝐀p)H(𝐀p𝐑𝐡¯l𝐡¯l(𝐀p)H+σ2𝐈τpN)1𝐲lp,\hat{\bar{{\mathbf{h}}}}^{\mathrm{p}}_{l}={\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}({\mathbf{A}}^{{\mathrm{p}}})^{H}({\mathbf{A}}^{{\mathrm{p}}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}({\mathbf{A}}^{{\mathrm{p}}})^{H}+\sigma^{2}{\mathbf{I}}_{\tau_{p}N})^{-1}{\mathbf{y}}_{l}^{{\mathrm{p}}}, (25)

where 𝐑𝐡¯l𝐡¯l{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}} is the channel covariance matrix and depends on the spatial correlation matrix 𝐑kl{\mathbf{R}}_{kl}. Based on the property of the LMMSE estimator, 𝐡¯^lp\hat{\bar{{\mathbf{h}}}}_{l}^{{\mathrm{p}}} is a Gaussian random vector, and the channel estimation error vector Δ𝐡lp=𝐡¯^lp𝐡l\Delta{{\mathbf{h}}^{{\mathrm{p}}}_{l}}=\hat{\bar{{\mathbf{h}}}}^{{\mathrm{p}}}_{l}-{\mathbf{h}}_{l} is also a Gaussian random vector with zero-mean and covariance matrix 𝐑Δ𝐡lp{\mathbf{R}}_{\Delta{{\mathbf{h}}_{l}^{\mathrm{p}}}} given by

𝐑Δ𝐡lp=𝐑𝐡¯l𝐡¯l𝐑𝐡¯l𝐡¯l(𝐀p)H(𝐀p𝐑𝐡¯l𝐡¯l(𝐀p)H+σ2𝐈τpN)1𝐀p𝐑𝐡¯l𝐡¯l.{\mathbf{R}}_{\Delta{{\mathbf{h}}_{l}^{\mathrm{p}}}}={\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}-{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}({\mathbf{A}}^{{\mathrm{p}}})^{H}({\mathbf{A}}^{{\mathrm{p}}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}({\mathbf{A}}^{{\mathrm{p}}})^{H}+\sigma^{2}{\mathbf{I}}_{\tau_{p}N})^{-1}{\mathbf{A}}^{{\mathrm{p}}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}. (26)
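A minimal NumPy sketch of the pilot-only LMMSE estimator (25) and its error covariance (26) for the vectorized model (24). The function name and interface are ours; the channel covariance $R$ is passed in generically.

```python
import numpy as np

def lmmse_channel_estimate(Yp, Xp, R, sigma2):
    """Pilot-only LMMSE estimate of h = vec(H_l) from Y_p = H_l X_p + N_p,
    via the vectorized model (24): y = (X_p^T kron I_N) h + n."""
    N = Yp.shape[0]
    A = np.kron(Xp.T, np.eye(N))               # A^p, (tau_p N) x (K N)
    y = Yp.reshape(-1, order='F')              # vec(Y_p), column stacking
    G = A @ R @ A.conj().T + sigma2 * np.eye(A.shape[0])
    W = R @ A.conj().T @ np.linalg.inv(G)      # LMMSE filter, (25)
    h_hat = W @ y
    R_err = R - W @ A @ R                      # error covariance, (26)
    return h_hat, R_err
```

With orthogonal pilots and vanishing noise, the estimate approaches the true vectorized channel and the error covariance approaches zero.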

IV-C Signal Detection with Channel Estimation Errors

After performing channel estimation, the estimated channel is used for data detection at each AP. In the data transmission stage, the received signal vector ${\mathbf{y}}^{{\mathrm{d}}}[n]$ corresponding to the $n$-th data symbol in each coherence block can be expressed as ${\mathbf{y}}^{{\mathrm{d}}}[n]={\mathbf{H}}{\mathbf{x}}^{{\mathrm{d}}}[n]+{\mathbf{n}}^{{\mathrm{d}}}[n]$, where ${\mathbf{n}}^{{\mathrm{d}}}[n]\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\sigma^{2}{\mathbf{I}}_{N})$ is the AWGN vector. (As each AP has a similar signal model, we drop the AP index $l$ and write ${\mathbf{y}}^{{\mathrm{d}}}[n]$, ${\mathbf{H}}$, ${\mathbf{x}}^{{\mathrm{d}}}[n]$, and ${\mathbf{n}}^{{\mathrm{d}}}[n]$ for ${\mathbf{y}}_{l}^{{\mathrm{d}}}[n]$, ${\mathbf{H}}_{l}$, ${\mathbf{x}}_{l}^{{\mathrm{d}}}[n]$, and ${\mathbf{n}}_{l}^{{\mathrm{d}}}[n]$, respectively; the same convention applies to the symbols below.) We can rewrite it in matrix form as ${\mathbf{Y}}^{{\mathrm{d}}}={\mathbf{H}}{\mathbf{X}}^{{\mathrm{d}}}+{\mathbf{N}}^{{\mathrm{d}}}$, where ${\mathbf{N}}^{{\mathrm{d}}}=\left[{\mathbf{n}}^{{\mathrm{d}}}[1],\ldots,{\mathbf{n}}^{{\mathrm{d}}}[\tau_{d}]\right]\in{\mathbb{C}}^{N\times\tau_{d}}$ is the AWGN matrix in the data transmission stage. Denote the estimated channel $\hat{{\mathbf{H}}}\in\mathbb{C}^{N\times K}$ as $\hat{{\mathbf{H}}}={\mathbf{H}}+\Delta{\mathbf{H}}$, where $\Delta{\mathbf{H}}$ is the channel estimation error. The signal detection problem can then be formulated as

𝐲d[n]=𝐇𝐱d[n]+𝐧d[n]=𝐇^𝐱d[n]+𝐧^d[n],\displaystyle{\mathbf{y}}^{{\mathrm{d}}}[n]={\mathbf{H}}{\mathbf{x}}^{{\mathrm{d}}}[n]+{\mathbf{n}}^{{\mathrm{d}}}[n]=\hat{{\mathbf{H}}}{\mathbf{x}}^{{\mathrm{d}}}[n]+\hat{{\mathbf{n}}}^{{\mathrm{d}}}[n], (27)

where 𝐧^d[n]=𝐧d[n]Δ𝐇𝐱d[n]\hat{{\mathbf{n}}}^{{\mathrm{d}}}[n]={\mathbf{n}}^{{\mathrm{d}}}[n]-\Delta{\mathbf{H}}{\mathbf{x}}^{{\mathrm{d}}}[n] is the equivalent noise in the signal detector and the statistical characterization of 𝐧^d[n]\hat{{\mathbf{n}}}^{{\mathrm{d}}}[n] is given in Appendix B-A.

IV-D Data-Aided Channel Estimation

To further improve the BER performance, a data-aided channel estimation approach is adopted in the channel estimation stage. First, conventional pilot-only channel estimation is performed and the transmitted symbols are detected. Then, the detected symbols are fed back to the channel estimator as additional pilot symbols to refine the channel estimate. In the channel training stage, the pilot matrix ${\mathbf{X}}^{{\mathrm{p}}}$ is transmitted. The received signal matrix corresponding to the pilot matrix, ${\mathbf{Y}}^{{\mathrm{p}}}\in\mathbb{C}^{N\times\tau_{p}}$, is expressed as ${\mathbf{Y}}^{{\mathrm{p}}}={\mathbf{H}}{\mathbf{X}}^{{\mathrm{p}}}+{\mathbf{N}}^{{\mathrm{p}}}$, where ${\mathbf{N}}^{{\mathrm{p}}}=\left[{\mathbf{n}}^{{\mathrm{p}}}[1],\ldots,\mathbf{n}^{{\mathrm{p}}}[\tau_{p}]\right]\in\mathbb{C}^{N\times\tau_{p}}$ is the AWGN matrix with columns $\mathbf{n}^{{\mathrm{p}}}[n]\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\sigma^{2}\mathbf{I}_{N})$ for $n=1,\ldots,\tau_{p}$. The estimated data matrix $\hat{\mathbf{X}}^{{\mathrm{d}}}$ can be expressed as $\hat{{\mathbf{X}}}^{{\mathrm{d}}}={\mathbf{X}}^{{\mathrm{d}}}+{\mathbf{E}}^{{\mathrm{d}}}$, where $\mathbf{E}^{{\mathrm{d}}}$ is the signal detection error matrix. In the data-aided channel estimation stage, the estimate $\hat{{\mathbf{X}}}^{{\mathrm{d}}}$ is fed back to the channel estimator as additional pilots. Then, the received signal matrix ${\mathbf{Y}}^{{\mathrm{d}}}$ corresponding to $\hat{{\mathbf{X}}}^{{\mathrm{d}}}$ can be expressed as

𝐘d=𝐇𝐗d+𝐍d=𝐇𝐗^d+𝐍^d,\displaystyle{\mathbf{Y}}^{{\mathrm{d}}}={\mathbf{H}}{\mathbf{X}}^{{\mathrm{d}}}+{\mathbf{N}}^{{\mathrm{d}}}={\mathbf{H}}\hat{{\mathbf{X}}}^{{\mathrm{d}}}+\hat{{\mathbf{N}}}^{{\mathrm{d}}}, (28)

where 𝐍^d=𝐍d𝐇𝐄d\hat{{\mathbf{N}}}^{{\mathrm{d}}}={\mathbf{N}}^{{\mathrm{d}}}-{\mathbf{H}}{\mathbf{E}}^{{\mathrm{d}}} is the equivalent noise for the data-aided pilot 𝐗^d\hat{{\mathbf{X}}}^{{\mathrm{d}}}. The statistical information of the nn-th column of 𝐍^d\hat{{\mathbf{N}}}^{{\mathrm{d}}} is 𝐧^d[n]𝒩(𝟎,𝐕^d[n])\hat{\mathbf{n}}^{{\mathrm{d}}}[n]\sim\mathcal{N}_{{\mathbb{C}}}(\mathbf{0},\hat{\mathbf{V}}^{{\mathrm{d}}}[n]) for n=1,,τdn=1,\ldots,\tau_{d}, where 𝐕^d[n]\hat{\mathbf{V}}^{{\mathrm{d}}}[n] will be utilized for data-aided channel estimation. We denote 𝐘=[𝐘p,𝐘d]{\mathbf{Y}}=[{\mathbf{Y}}^{{\mathrm{p}}},{\mathbf{Y}}^{{\mathrm{d}}}] as the received signal matrix corresponding to the overall transmitted signal. By concatenating the received pilot and data signals, we have

𝐘=[𝐘p𝐘d]=𝐇𝐗+𝐍,\displaystyle{\mathbf{Y}}=\left[{\mathbf{Y}}^{{\mathrm{p}}}\ {\mathbf{Y}}^{{\mathrm{d}}}\right]={\mathbf{H}}{\mathbf{X}}+{\mathbf{N}}, (29)

where ${\mathbf{X}}=[{\mathbf{X}}^{{\mathrm{p}}},\hat{{\mathbf{X}}}^{{\mathrm{d}}}]$ and ${\mathbf{N}}=[{\mathbf{N}}^{{\mathrm{p}}},\hat{{\mathbf{N}}}^{{\mathrm{d}}}]$ can be interpreted as the equivalent pilot signal and noise in the data-aided channel estimation stage, respectively. The statistical characterization of the equivalent noise and the corresponding LMMSE estimator are derived in Appendix B-B.
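To illustrate how the concatenated model (29) can be used, the following sketch refines the LMMSE channel estimate with detected data treated as extra pilots. The white-noise approximation of the per-column equivalent noise (variance $\sigma^{2}+g\,v_{\mathrm{det}}$, with $g$ the average aggregate channel gain) is a simplification of ours; the exact per-column covariance $\hat{\mathbf{V}}^{\mathrm{d}}[n]$ is derived in Appendix B-B.

```python
import numpy as np

def data_aided_lmmse(Yp, Xp, Yd, Xd_hat, R, sigma2, v_det):
    """Sketch of data-aided refinement: re-run LMMSE on the concatenated
    block (29), with detected symbols Xd_hat acting as extra pilots.
    v_det is the (scalar) variance of the detection error entries."""
    N = Yp.shape[0]
    g = np.real(np.trace(R)) / N           # average aggregate channel gain
    X = np.hstack([Xp, Xd_hat])            # equivalent pilots, (29)
    Y = np.hstack([Yp, Yd])
    # Per-column noise variances: sigma2 on true pilots, inflated on data.
    w = np.concatenate([np.full(Xp.shape[1], sigma2),
                        np.full(Xd_hat.shape[1], sigma2 + g * v_det)])
    A = np.kron(X.T, np.eye(N))
    Wn = np.kron(np.diag(w), np.eye(N))    # equivalent noise covariance
    G = A @ R @ A.conj().T + Wn
    return R @ A.conj().T @ np.linalg.solve(G, Y.reshape(-1, order='F'))
```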

V Performance Analysis

In this section, we provide an asymptotic performance analysis of the proposed distributed EP detector. We consider $L,K\rightarrow\infty$ with the fixed ratios

αl=KN,α=KLN.\alpha_{l}=\frac{K}{N},\alpha=\frac{K}{LN}. (30)

After proposing a state evolution analysis framework, we will characterize the fixed points of the proposed detector.

V-A State Evolution Analysis

We first provide a state evolution analysis framework to predict the asymptotic performance of the distributed EP detector in the large-system limit, i.e., as $L,K\rightarrow\infty$.

Proposition 1.

In the large-system limit, the asymptotic behavior (such as MSE and BER) of Algorithm 1 can be described by the following equations:

$v_{\mathrm{A},l}^{\mathrm{ext},t}=\Bigg{(}\bigg{(}\frac{1}{K_{l}}\sum_{k=1}^{K_{l}}\frac{1}{\sigma^{-2}\tau_{k,l}+\lambda_{l}^{(t-1)}}\bigg{)}^{-1}-\lambda_{l}^{(t-1)}\Bigg{)}^{-1}$ (31a)
vAext,t\displaystyle v_{\mathrm{A}}^{\mathrm{ext},t} =(l=1L1vA,lext,t)1\displaystyle=\bigg{(}\sum_{l=1}^{L}\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},t}}\bigg{)}^{-1} (31b)
λl(t)\displaystyle\lambda_{l}^{(t)} =1MSE(vAext,t)1vA,lext,t.\displaystyle=\frac{1}{\mathrm{MSE}(v_{\mathrm{A}}^{\mathrm{ext},t})}-\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},t}}. (31c)

\blacksquare

The function MSE()\mathrm{MSE}(\cdot) is given by

$\mathrm{MSE}(v_{\mathrm{A},l}^{\mathrm{ext},t})\triangleq\mathtt{E}\big{\{}|x-\mathtt{E}\{x|x_{\mathrm{A},l}^{\mathrm{ext},t},v_{\mathrm{A},l}^{\mathrm{ext},t}\}|^{2}\big{\}},$ (32)

where the expectation is taken with respect to $x$, and $\tau_{k,l}$ denotes the $k$-th eigenvalue of ${\mathbf{H}}^{H}_{l}{\mathbf{H}}_{l}$.

The state equations illustrated in Proposition 1 can be proved rigorously in a similar way as [44]. To better understand the state evolution analysis, we give an intuitive derivation here. In Algorithm 1, we have explicit expressions for computing the mean and variance for each module. By substituting (14) into (16), we have the following expression

$v_{\mathrm{A},l}^{\mathrm{ext},t}=\Bigg{(}\bigg{(}\frac{1}{K}\mathrm{tr}\Big{(}\big{(}\sigma^{-2}{\mathbf{H}}^{H}_{l}{\mathbf{H}}_{l}+\lambda_{l}^{(t-1)}{\mathbf{I}}\big{)}^{-1}\Big{)}\bigg{)}^{-1}-\lambda_{l}^{(t-1)}\Bigg{)}^{-1}.$ (33)

The asymptotic expressions for vAext,tv_{\mathrm{A}}^{\mathrm{ext},t} and λl(t)\lambda_{l}^{(t)} can then be derived from (18) and (22). Finally, the asymptotic MSE can be interpreted as the MSE of the decoupled scalar AWGN channels (11) and is related to the distribution of the transmitted signal 𝐱{\mathbf{x}}. Next, we will give a specific example for Proposition 1.

Example 1: If each channel matrix ${\mathbf{H}}_{l}\,(l=1,2,\ldots,L)$ has i.i.d. Gaussian entries, then $v_{\mathrm{A},l}^{\mathrm{ext}}$ converges to the following asymptotic expression, obtained via the $\mathcal{R}$-transform of the average empirical eigenvalue distribution (the $\mathcal{R}$-transform [45] enables characterizing the asymptotic spectrum of a sum of suitable matrices from their individual asymptotic spectra):

vA,lext,t=αlσ2+(αl1)λl(t1)2+(αlσ2+(αl1)λl(t1))2+4αlσ2λl(t1)2.v_{\mathrm{A},l}^{\mathrm{ext},t}=\frac{\alpha_{l}\sigma^{2}+(\alpha_{l}-1)\lambda_{l}^{(t-1)}}{2}+\frac{\sqrt{\bigg{(}\alpha_{l}\sigma^{2}+(\alpha_{l}-1)\lambda_{l}^{(t-1)}\bigg{)}^{2}+4\alpha_{l}\sigma^{2}\lambda_{l}^{(t-1)}}}{2}. (34)

Furthermore, if the data symbol is drawn from a quadrature phase-shift keying (QPSK) constellation, the MSE\mathrm{MSE} is expressed by

MSE(vA,lext,t)=1Dztanh(1vA,lext,t+1vA,lext,tz).\mathrm{MSE}(v_{\mathrm{A},l}^{\mathrm{ext},t})=1-\int\mathrm{D}z\tanh(\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},t}}+\sqrt{\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},t}}}z). (35)

Furthermore, the BER w.r.t. 𝐱{\mathbf{x}} can be evaluated through the equivalent AWGN channel (11) with an equivalent SNR=1/vA,lext,T\mathrm{SNR}=1/v_{\mathrm{A},l}^{\mathrm{ext},T} and is given by

BER=2Q(1vA,lext,T)[Q(1vA,lext,T)]2,\mathrm{BER}=2Q(\sqrt{\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},T}}})-[Q(\sqrt{\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},T}}})]^{2}, (36)

where Q(x)=xDzQ(x)=\int_{x}^{\infty}\mathrm{D}z is the QQ-function. In fact, the MSE and BER are determined given the knowledge of the AWGN channel (11) with SNR=1/vA,lext,T\mathrm{SNR}=1/v_{\mathrm{A},l}^{\mathrm{ext},T}, which is known as the decoupling principle. Thus, if the data symbol is drawn from other MM-QAM constellations, the corresponding BER can be easily obtained using a closed-form BER expression [46].
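The state evolution of Example 1 is simple to evaluate numerically. The sketch below is a direct transcription of (31b), (31c), (34), (35), and (36) for identical APs and QPSK, with the Gaussian integral in (35) evaluated by Gauss-Hermite quadrature; function names are ours, and in practice damping (not discussed here) may be needed for the recursion to settle.

```python
import math
import numpy as np

def qpsk_mse(v):
    """MSE of the QPSK posterior-mean estimator over an AWGN channel with
    noise variance v, i.e. the integral in (35), via Gauss-Hermite rules."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(61)  # weight exp(-x^2/2)
    integ = np.sum(weights * np.tanh(1.0 / v + np.sqrt(1.0 / v) * nodes))
    return 1.0 - integ / np.sqrt(2.0 * np.pi)

def state_evolution(alpha_l, sigma2, L, T):
    """Run (31b)-(31c) for L identical APs, using the closed-form
    R-transform expression (34) for v_{A,l}^{ext,t}."""
    lam = 1.0                                  # lambda^{(0)} = 1/E_x, E_x = 1
    v = None
    for _ in range(T):
        b = alpha_l * sigma2 + (alpha_l - 1.0) * lam
        v_l = 0.5 * b + 0.5 * np.sqrt(b**2 + 4.0 * alpha_l * sigma2 * lam)  # (34)
        v = v_l / L                            # (31b) with identical APs
        lam = 1.0 / qpsk_mse(v) - 1.0 / v_l    # (31c)
    return v

def predicted_ber(v):
    """BER of QPSK over the decoupled AWGN channel with SNR = 1/v, (36)."""
    Q = 0.5 * math.erfc(math.sqrt(1.0 / v) / math.sqrt(2.0))
    return 2.0 * Q - Q**2
```

Such a recursion replaces Monte Carlo simulation when predicting the theoretical behavior of the detector.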

V-B Fixed Point Characterization

The fixed points of the distributed EP detection algorithm are the stationary points of a relaxed version of the Kullback-Leibler (KL) divergence minimization for approximating the posterior distribution (4), given by

b(𝐱)=argminb(𝐱)D[b(𝐱)𝙿(𝐱|𝐲,𝐇)].b({\mathbf{x}})=\arg\min\limits_{b({\mathbf{x}})\in\mathcal{F}}D\left[b({\mathbf{x}})\|\mathtt{P}({\mathbf{x}}|{\mathbf{y}},{\mathbf{H}})\right]. (37)

The KL divergence between two (possibly unnormalized) distributions is defined as

D[p(x)q(x)]=p(x)logp(x)q(x)dx+(q(x)p(x))dx.{D}\left[{p\left(x\right)\left\|{q\left(x\right)}\right.}\right]=\int{p\left(x\right)\log\frac{{p\left(x\right)}}{{q\left(x\right)}}}{{\mathrm{d}}}x+\int{\left({q\left(x\right)-p\left(x\right)}\right)}{{\mathrm{d}}}x. (38)
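As a small numerical illustration of this definition (ours, not from the paper), evaluating (38) on a grid for two normalized unit-variance Gaussians recovers the closed-form KL value $(\mu_{1}-\mu_{2})^{2}/2$, and the correction term vanishes:

```python
import numpy as np

def kl_unnormalized(p, q, dx):
    """Numerically evaluate (38) on a grid: the usual KL integral plus the
    correction term int (q - p) dx, which keeps the divergence well defined
    (and nonnegative) for unnormalized densities."""
    return np.sum(p * np.log(p / q)) * dx + np.sum(q - p) * dx

xs = np.linspace(-10.0, 10.0, 20001)
dx = xs[1] - xs[0]
p = np.exp(-0.5 * xs**2) / np.sqrt(2.0 * np.pi)          # N(0, 1)
q = np.exp(-0.5 * (xs - 1.0)**2) / np.sqrt(2.0 * np.pi)  # N(1, 1)
kl = kl_unnormalized(p, q, dx)                            # close to 0.5
```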

Generally, it is difficult to approximate the complex posterior distribution (4) directly in a tractable manner. Moment matching is therefore utilized to relax the KL divergence minimization problem [47]. Solving the relaxed KL divergence minimization problem is equivalent to optimizing the Bethe free energy subject to moment-matching constraints. That is,

minb,b1,,bLmaxqJ(b,b1,,bL,q)\displaystyle\min\limits_{b,b_{1},\ldots,b_{L}}\max\limits_{q}~{}J\left(b,b_{1},\ldots,b_{L},q\right) (39a)
s.t.𝙴bl(𝐱)=𝙴q(𝐱),\displaystyle~{}~{}~{}~{}~{}\text{s.t.}~{}~{}~{}~{}\mathtt{E}_{b_{l}}({\mathbf{x}})=\mathtt{E}_{q}({\mathbf{x}}), (39b)
1Kk𝒦𝙴bl(|xk|2)=1Kk𝒦𝙴q(|xk|2),\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{b_{l}}(|x_{k}|^{2})=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{q}(|x_{k}|^{2}), (39c)

where $J\left(b,b_{1},\ldots,b_{L},q\right)=\mathrm{KL}\left(b\|e^{\log f({\mathbf{x}})}\right)+\sum\limits_{l=1}^{L}\mathrm{KL}\left(b_{l}\|e^{\log f_{l}({\mathbf{x}})}\right)+H(q)$ and $H(\cdot)$ is the differential entropy. In addition, $b({\mathbf{x}})$, $\{b_{l}({\mathbf{x}})\}_{l=1}^{L}$, and $q({\mathbf{x}})$ are given by

b(𝐱):=1Z0(𝐱Aext)p(𝐱)exp[𝐱𝐱Aext2/vAext],\displaystyle b({\mathbf{x}}):=\frac{1}{Z_{0}\left({\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}}\right)}p({\mathbf{x}})\exp\left[-\|{\mathbf{x}}-{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}}\|^{2}/v_{\mathrm{A}}^{\mathrm{ext}}\right], (40a)
$b_{l}({\mathbf{x}}):=\frac{1}{Z_{l}\left({\mathbf{x}}_{\mathrm{B},l}^{\mathrm{ext}}\right)}\exp\left[-\frac{1}{\sigma^{2}}\|{\mathbf{y}}_{l}-{\mathbf{H}}_{l}{\mathbf{x}}\|^{2}-\|{\mathbf{x}}-{\mathbf{x}}_{\mathrm{B},l}^{\mathrm{ext}}\|^{2}/v_{\mathrm{B},l}^{\mathrm{ext}}\right],$ (40b)
q(𝐱):=1Zq(𝐱^)exp[𝐱𝐱^2/v],\displaystyle q({\mathbf{x}}):=\frac{1}{Z_{q}\left(\hat{{\mathbf{x}}}\right)}\exp\left[-\|{\mathbf{x}}-\hat{{\mathbf{x}}}\|^{2}/v\right], (40c)

respectively, where $Z_{0}\left({\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}}\right)$, $\{Z_{l}\left({\mathbf{x}}_{\mathrm{B},l}^{\mathrm{ext}}\right)\}_{l=1}^{L}$, and $Z_{q}\left(\hat{{\mathbf{x}}}\right)$ are the normalization factors of the corresponding density functions. Based on the above optimization process, we have

$\hat{{\mathbf{x}}}=\mathtt{E}_{b}({\mathbf{x}})=\mathtt{E}_{b_{1}}({\mathbf{x}})=\cdots=\mathtt{E}_{b_{L}}({\mathbf{x}})=\mathtt{E}_{q}({\mathbf{x}}),$ (41a)
v=1Kk𝒦𝙴b(|xk|2)=1Kk𝒦𝙴b1(|xk|2)==1Kk𝒦𝙴bL(|xk|2)=1Kk𝒦𝙴q(|xk|2),\displaystyle\begin{aligned} v&=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{b}(|x_{k}|^{2})=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{b_{1}}(|x_{k}|^{2})=\cdots&=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{b_{L}}(|x_{k}|^{2})=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{q}(|x_{k}|^{2}),\end{aligned} (41b)

for any fixed point $(v,\hat{{\mathbf{x}}})$ of Algorithm 1. This fixed-point characterization is helpful for convergence analysis and for understanding the algorithm from an optimization perspective.

VI Simulation Results

In this section, we will provide extensive simulation results to demonstrate the performance of the proposed distributed EP detector for cell-free massive MIMO. After presenting the simulation results of the distributed EP detector in conventional and scalable cell-free massive MIMO systems, we verify the accuracy of the proposed analytical framework. Furthermore, the performance of the distributed EP-based ICD scheme is also investigated. The code for the simulations is available at https://github.com/hehengtao/DeEP-cell-free-mMIMO.

VI-A Conventional Cell-Free Massive MIMO

In this subsection, we consider the BER performance of the proposed detector in conventional cell-free massive MIMO, where each AP serves all users. We assume that perfect CSI is available at each AP. The simulation parameters are set as $N=8$, $K=32$, and $L=8$. Fig. 4 compares the achievable BER of the proposed distributed EP detector, the centralized EP detector [31], and other baselines [27] using QPSK and $16$-QAM modulation symbols. The results are obtained by Monte Carlo simulations with $10,000$ independent channel realizations. We denote the distributed EP detector proposed in Section III by "deEP". It can be observed that the proposed distributed EP detector outperforms the distributed MMSE detector with only one EP iteration. This is because deEP incorporates the prior information of the symbol ${\mathbf{x}}$ into the posterior mean estimator for the equivalent AWGN channel (10). Furthermore, with $T=5$ iterations, it outperforms the centralized MMSE detector for both modulation schemes. Compared with the centralized EP detector [31], the distributed EP detector achieves comparable BER performance.

Figure 4: BER performance comparison of different detectors in the conventional cell-free massive MIMO system. (a) QPSK; (b) 16-QAM.

VI-B Accuracy of the Analytical Results

We next verify the accuracy of the analytical framework in Fig. 5 with different system settings and modulation schemes. As shown in the figure, the BERs of the proposed detector match the derived analytical results well, which demonstrates the accuracy of the analytical framework developed in Section V-A. Although the analytical framework is derived in the large-system limit, it can still predict the BER performance of small-sized MIMO systems, as illustrated in Fig. 5(b). Therefore, instead of performing time-consuming Monte Carlo simulations to obtain the corresponding performance metrics, we can predict the theoretical behavior with the state evolution equations. The analytical framework can be further utilized to optimize the system design.

Figure 5: BER performance comparison of the analytical and simulation results for the distributed EP detector with different modulation schemes. (a) $N=16$, $K=64$, $L=16$; (b) $N=8$, $K=8$, $L=4$.

VI-C Scalable Cell-Free Massive MIMO

As the user-centric approach is more attractive for cell-free massive MIMO, we investigate the distributed EP detector under the DCC framework. Each user first appoints a master AP according to the large-scale fading factors and is assigned a pilot ${\mathbf{p}}_{\tau}$ by that AP. Then, other neighboring APs decide whether to serve the user according to the assigned pilot ${\mathbf{p}}_{\tau}$. Finally, the cluster for the $k$-th user is constructed. Fig. 6 shows that the distributed EP detector outperforms both the centralized and distributed MMSE detectors. Furthermore, the performance loss compared to the conventional cell-free massive MIMO system is limited, while the computational complexity is significantly decreased from $\mathcal{O}(KN^{2})$ to $\mathcal{O}(|\mathcal{D}_{l}|N^{2})$.

Figure 6: BER performance comparison of different detectors in the scalable cell-free massive MIMO system. (a) QPSK; (b) 16-QAM.

VI-D Performance of ICD with 3GPP Urban Channel

In Sections VI-A to VI-C, all detectors were investigated assuming perfect CSI. In this section, we consider the distributed EP-based ICD scheme for cell-free massive MIMO with the 3GPP Urban channel model [48] and imperfect CSI. We assume that $N=8$, $K=8$, and $L=4$. The APs are deployed in a $1\times 1$ $\mathrm{km}^{2}$ urban area to match the system settings in [27]. The large-scale fading factor is given by

βkl[dB]=30.536.7log10(dkl1m)+gkl,\beta_{kl}\,[\textrm{dB}]=-30.5-36.7\log_{10}\left(\frac{d_{kl}}{1\,\textrm{m}}\right)+g_{kl}, (42)

where dkld_{kl} is the distance between the kk-th user and ll-th AP and gkl𝒩(0,42)g_{kl}\sim\mathcal{N}(0,4^{2}) represents shadow fading. The shadowing terms from an AP to different users are correlated as

𝔼{gklgij}={422δki/9ml=j0lj,\mathbb{E}\{g_{kl}g_{ij}\}=\begin{cases}4^{2}2^{-\delta_{ki}/9\,\textrm{m}}&l=j\\ 0&l\neq j,\end{cases} (43)

where $\delta_{ki}$ is the distance between the $k$-th user and the $i$-th user. The second case in (43) states that the shadowing terms associated with two different APs are assumed uncorrelated. Furthermore, the multi-antenna APs are equipped with half-wavelength-spaced uniform linear arrays, and the spatial correlation is generated using the Gaussian local scattering model with a $15^{\circ}$ angular standard deviation. The transmit power for each user is $p_{k}=100$ mW and the bandwidth is $20$ MHz.
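A sketch of how (42) and (43) can be implemented to generate the large-scale fading coefficients. The function name and the 1 m distance floor are our assumptions, and, as stated above, correlation across different APs is ignored.

```python
import numpy as np

def large_scale_fading(ap_pos, ue_pos, rng):
    """Generate beta_kl in dB per (42), with shadowing that is correlated
    across users seen by the same AP per (43) and independent across APs.
    ap_pos: (L, 2) AP coordinates in meters; ue_pos: (K, 2) user coords."""
    L, K = len(ap_pos), len(ue_pos)
    d_ue = np.linalg.norm(ue_pos[:, None, :] - ue_pos[None, :, :], axis=-1)
    C = 4.0**2 * 2.0**(-d_ue / 9.0)            # shadowing covariance, (43)
    Lc = np.linalg.cholesky(C + 1e-9 * np.eye(K))
    beta = np.zeros((K, L))
    for l in range(L):                         # independent shadowing per AP
        g = Lc @ rng.standard_normal(K)        # correlated N(0, 4^2) terms
        d = np.linalg.norm(ue_pos - ap_pos[l], axis=-1)
        beta[:, l] = -30.5 - 36.7 * np.log10(np.maximum(d, 1.0)) + g  # (42)
    return beta
```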

Figure 7: BER performance comparison of different detectors in the cell-free massive MIMO system with the 3GPP Urban channel model and imperfect CSI. (a) QPSK; (b) 16-QAM.

Fig. 7 compares the achievable BER of the proposed distributed EP detector with those of other detectors under the 3GPP Urban channel model and imperfect CSI. The channel is estimated with the LMMSE channel estimator. As can be observed from the figure, the proposed ICD scheme outperforms the centralized and distributed MMSE detectors, and shows a performance similar to that of the distributed EP detector with perfect CSI for $16$-QAM symbols.

VI-E Non-Orthogonal Pilots

In this section, we consider the performance of ICD with different pilots in the channel training stage. In particular, the orthogonal pilot matrix $\mathbf{X}^{{\mathrm{p}}}\in\mathbb{C}^{K\times\tau_{p}}$ is constructed from the discrete Fourier transform (DFT) matrix ${\mathbf{F}}\in\mathbb{C}^{\tau_{p}\times\tau_{p}}$. For the non-orthogonal pilot, each element of the pilot matrix is drawn from a $64$-QAM constellation. Furthermore, we set $\tau_{p}=K$ and $\tau_{d}=16K$ for both pilot types. Fig. 8 shows the BERs of the distributed detectors in the ICD architecture, where $r=1$ indicates no data feedback to the channel estimator while $r=2$ refers to the scenario where the detected data are fed back to the channel estimator once. As can be observed from the figure, the data feedback improves the system performance significantly. Specifically, if we target a BER of $10^{-2}$ with QPSK symbols and non-orthogonal pilots, data feedback brings performance gains of $3.3$ dB with $r=2$ and $4.5$ dB with $r=4$, respectively. Furthermore, the performance gain is enlarged if orthogonal DFT pilots are used. Interestingly, non-orthogonal pilots achieve a performance similar to that of DFT pilots, which demonstrates that the ICD scheme is an efficient way to enable non-orthogonal pilots for reducing the pilot overhead in cell-free massive MIMO systems.

Figure 8: BER performance comparisons of the distributed EP-based ICD scheme with orthogonal and non-orthogonal pilots: (a) QPSK; (b) 16-QAM.

VII Conclusion

In this paper, we proposed a distributed EP detector for cell-free massive MIMO. It was shown that the proposed detector achieves better performance than linear detectors for both conventional and scalable cell-free massive MIMO networks. Compared to other distributed detectors, it achieves a better BER performance at the cost of increased computational overhead at the CPU. A distributed ICD algorithm was then proposed for cell-free massive MIMO systems to handle imperfect CSI. An analytical framework was also provided to characterize the asymptotic performance of the proposed detector in the large-system limit. Simulation results demonstrated that the proposed method outperforms existing distributed detectors for cell-free massive MIMO in terms of BER performance. Furthermore, the proposed ICD architecture significantly improves the system performance and enables non-orthogonal pilots to reduce the pilot overhead. For future research, it would be interesting to extend the distributed EP algorithm to precoding and channel estimation for cell-free massive MIMO systems.

Appendix A Message-Passing Derivation of Algorithm 1

A-A Derivation of Distributed EP Detector

In this appendix, we present the derivation of Algorithm 1 from the message-passing perspective. (Algorithm 1 can also be derived by directly applying the EP principle in [49], which extends belief propagation by imposing an exponential-family constraint on the messages; both BP and EP follow from the generic message-passing rules introduced in Section III-B.) By applying these rules to the factor graph in Fig. 2, we obtain Algorithm 1. First, we construct the factor graph for cell-free massive MIMO detection. As illustrated in Fig. 2, factor graphs describe both the centralized processing model (4) and the distributed processing model (6). Specifically, for the distributed processing model (6), the factor graph contains L+1 factor nodes \{f,f_{1},\ldots,f_{L}\} and one variable node {\mathbf{x}}, where the factor f represents the prior information of {\mathbf{x}} and f_{l} corresponds to the l-th AP. After constructing the factor graph for (6), we approximate the posterior distribution by computing and passing messages among the nodes in an iterative manner according to the aforementioned rules. The messages from the variable node {\mathbf{x}} to the factors \{f_{1},\ldots,f_{L}\} are initialized as \mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}})=\mathcal{N}_{\mathbb{C}}({\mathbf{x}};\boldsymbol{\gamma}_{l}^{(0)}/\lambda_{l}^{(0)},\lambda_{l}^{(0)}{\mathbf{I}}) for l=1,2,\ldots,L. Then, according to the factor-to-variable message rule, we have \mu_{f_{l}\rightarrow{\mathbf{x}}}({\mathbf{x}})=b_{f_{l}}({\mathbf{x}})/\mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}}). (As each AP serves all users, only one variable node {\mathbf{x}} is involved in the message passing, so the rule reduces to \mu_{f_{l}\rightarrow{\mathbf{x}}}({\mathbf{x}})\propto\int f_{l}({\mathbf{x}})\prod\mu_{{\mathbf{x}}\rightarrow f_{l}}\,{\mathrm{d}}{\mathbf{x}}\propto b_{f_{l}}({\mathbf{x}})/\mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}}).) Hence, the key step is to compute the belief b_{f_{l}}({\mathbf{x}}), which is given by

b_{f_{l}}({\mathbf{x}})=\arg\min\limits_{b({\mathbf{x}})\in\mathcal{F}}D\left[{{\mu_{{\mathbf{x}}\to{f_{l}}}}\left({\mathbf{x}}\right){f_{l}}\left({\mathbf{x}}\right)}\|b({\mathbf{x}})\right]. (44)

The explicit expression for the KL divergence is given in (38). Since directly minimizing the KL divergence over all distributions is intractable, we restrict b_{f_{l}}({\mathbf{x}}) to be a Gaussian distribution with the same mean and covariance matrix as the distribution \mu_{{\mathbf{x}}\to{f_{l}}}\left({\mathbf{x}}\right){f_{l}}\left({\mathbf{x}}\right). Therefore, we have b_{f_{l}}({\mathbf{x}})=\mathcal{N}_{\mathbb{C}}({\mathbf{x}};\boldsymbol{\mu}_{l},\boldsymbol{\Sigma}_{l}) with \boldsymbol{\Sigma}_{l}=(\sigma^{-2}{\mathbf{H}}^{H}_{l}{\mathbf{H}}_{l}+\lambda_{l}{\mathbf{I}})^{-1} and \boldsymbol{\mu}_{l}=\boldsymbol{\Sigma}_{l}(\sigma^{-2}{\mathbf{H}}^{H}_{l}{\mathbf{y}}_{l}+\boldsymbol{\gamma}_{l}), which yields (14) and (15) in Algorithm 1. Then, we have

\mu_{f_{l}\rightarrow{\mathbf{x}}}({\mathbf{x}})=\frac{b_{f_{l}}({\mathbf{x}})}{\mu_{{\mathbf{x}}\to{f_{l}}}({\mathbf{x}})}\cong\frac{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{post}},v_{\mathrm{A},l}^{\mathrm{post}}{\mathbf{I}})}{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};\boldsymbol{\gamma}_{l}/\lambda_{l},\lambda_{l}{\mathbf{I}})}\propto{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}},v_{\mathrm{A},l}^{\mathrm{ext}}{\mathbf{I}})}, (45)

where we approximate b_{f_{l}}({\mathbf{x}}) by \mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{post}},v_{\mathrm{A},l}^{\mathrm{post}}{\mathbf{I}}). The principles behind this approximation are moment matching and self-averaging [49, 50]. According to the Gaussian product lemma, v_{\mathrm{A},l}^{\mathrm{ext}} and {\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}} in Algorithm 1 are obtained from (16) and (17), respectively. Next, the message \mu_{{\mathbf{x}}\rightarrow f}({\mathbf{x}}) is the product of the messages passed from the factors \{f_{1},f_{2},\ldots,f_{L}\} (i.e., the APs) to {\mathbf{x}}, and we have

\mu_{{\mathbf{x}}\rightarrow f}({\mathbf{x}})=\prod_{l=1}^{L}\mu_{f_{l}\rightarrow{\mathbf{x}}}({\mathbf{x}})=\prod_{l=1}^{L}\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}},v_{\mathrm{A},l}^{\mathrm{ext}}{\mathbf{I}})=\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}}{\mathbf{I}}), (46)

where v_{\mathrm{A}}^{\mathrm{ext}} and {\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}} are obtained from (18) and (19). We then approximate the belief on f as b_{f}({\mathbf{x}})\propto\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}},\mathrm{diag}({\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}})). The mean {\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}} and variance {\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}} are obtained by moment matching and self-averaging, as given in (20) and (21), respectively. As the transmitted symbols are assumed to be drawn from an M-QAM constellation, the explicit expressions for (20) and (21) are given by (12) and (13). Similar to (45), we set the message \mu_{f\rightarrow{\mathbf{x}}}({\mathbf{x}}) as

\mu_{f\rightarrow{\mathbf{x}}}({\mathbf{x}})=\frac{b_{f}({\mathbf{x}})}{\mu_{{\mathbf{x}}\rightarrow f}({\mathbf{x}})}=\frac{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}},v_{\mathrm{B}}^{\mathrm{post}}{\mathbf{I}})}{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}}{\mathbf{I}})}\propto\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{B}}^{\mathrm{ext}},v_{\mathrm{B}}^{\mathrm{ext}}{\mathbf{I}}). (47)
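To make these operations concrete, the following sketch (illustrative dimensions; the scalar moment-matched variance and the precision-form cavity parameters \boldsymbol{\gamma}_{l}, \lambda_{l} are our reading of the paper's parameterization) computes a per-AP belief as in (14)-(15) and then divides out the cavity message as in (45):

```python
import numpy as np

# Sketch of the per-AP linear module (14)-(15) and the extrinsic update (45):
# compute the Gaussian belief b_{f_l}(x), then divide out the cavity message
# by subtracting precisions. Dimensions and parameter values are illustrative.
rng = np.random.default_rng(1)
N_l, K, sigma2 = 16, 4, 0.1
H = (rng.standard_normal((N_l, K)) + 1j * rng.standard_normal((N_l, K))) / np.sqrt(2)
x = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)   # QPSK
y = H @ x + np.sqrt(sigma2 / 2) * (rng.standard_normal(N_l) + 1j * rng.standard_normal(N_l))

# Cavity message in precision form: gamma is the precision-weighted mean,
# lam the precision (an assumption about the parameterization).
gamma, lam = np.zeros(K, dtype=complex), 1.0

# Belief b_{f_l}(x) = N(x; mu, Sigma), as in (14)-(15).
Sigma = np.linalg.inv(H.conj().T @ H / sigma2 + lam * np.eye(K))
mu = Sigma @ (H.conj().T @ y / sigma2 + gamma)

# Moment matching to an isotropic Gaussian, then Gaussian division (45).
v_post = np.real(np.trace(Sigma)) / K
v_ext = 1.0 / (1.0 / v_post - lam)             # extrinsic variance
x_ext = v_ext * (mu / v_post - gamma)          # extrinsic mean
```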

Finally, according to the variable-to-factor message rule, we compute the message \mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}}) for the next iteration, which is given by

\mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}})=\mu_{f\rightarrow{\mathbf{x}}}({\mathbf{x}})\prod_{l^{\prime}\neq l}\mu_{f_{l^{\prime}}\rightarrow{\mathbf{x}}}({\mathbf{x}})\propto\mathcal{N}_{\mathbb{C}}({\mathbf{x}};\boldsymbol{\gamma}_{l}/\lambda_{l},\lambda_{l}{\mathbf{I}}), (48)

and the explicit expressions for \boldsymbol{\gamma}_{l} and \lambda_{l} are given by (22) and (23), respectively. Algorithm 1 is obtained by repeating the above message-passing procedure.
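Putting the messages together, the following is a heavily simplified end-to-end sketch of the resulting iteration: scalar variances, a clamped QPSK posterior variance, and an undamped cavity update stand in for the exact expressions in (12)-(13) and (20)-(23), and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
L, N_l, K, sigma2, T = 4, 8, 4, 0.05, 5
a = 1 / np.sqrt(2)                                    # QPSK amplitude per dimension
x = a * (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K))
Hs = [(rng.standard_normal((N_l, K)) + 1j * rng.standard_normal((N_l, K))) / np.sqrt(2)
      for _ in range(L)]
ys = [H @ x + np.sqrt(sigma2 / 2) * (rng.standard_normal(N_l) + 1j * rng.standard_normal(N_l))
      for H in Hs]

gam = [np.zeros(K, dtype=complex) for _ in range(L)]  # cavity: precision-weighted means
lam = [1.0] * L                                       # cavity precisions
for _ in range(T):
    xs, vs = [], []
    for l in range(L):                                # linear module at each AP, cf. (14)-(17)
        Sig = np.linalg.inv(Hs[l].conj().T @ Hs[l] / sigma2 + lam[l] * np.eye(K))
        mu = Sig @ (Hs[l].conj().T @ ys[l] / sigma2 + gam[l])
        v_post = np.real(np.trace(Sig)) / K
        v_l = 1.0 / (1.0 / v_post - lam[l])           # extrinsic variance
        xs.append(v_l * (mu / v_post - gam[l]))       # extrinsic mean
        vs.append(v_l)
    v_A = 1.0 / sum(1.0 / v for v in vs)              # MRC combining, cf. (18)-(19)
    x_A = v_A * sum(xe / v for xe, v in zip(xs, vs))
    # Nonlinear module at the CPU: QPSK posterior mean/variance, cf. (12)-(13)
    x_hat = a * (np.tanh(2 * a * x_A.real / v_A) + 1j * np.tanh(2 * a * x_A.imag / v_A))
    v_B = max(float(np.mean(1.0 - np.abs(x_hat) ** 2)), 1e-6)
    v_Bext = 1.0 / max(1.0 / v_B - 1.0 / v_A, 1e-6)   # denoiser extrinsic (clamped)
    x_Bext = v_Bext * (x_hat / v_B - x_A / v_A)
    for l in range(L):                                # cavity update, cf. (22)-(23)
        lam[l] = 1.0 / v_Bext + sum(1.0 / vs[m] for m in range(L) if m != l)
        gam[l] = x_Bext / v_Bext + sum(xs[m] / vs[m] for m in range(L) if m != l)
```

The clamps on the variances are numerical safeguards, not part of the derivation; the paper's per-element variances and exact extrinsic bookkeeping are collapsed to scalars here.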

A-B Derivation of MRC Combining

For distributed EP-based MIMO detection, the challenge is how to combine the messages from different APs. In this paper, we focus on a linear combination of the local estimates {\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}} with variances v_{\mathrm{A},l}^{\mathrm{ext}} from the APs as follows:

\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}=\sum_{l=1}^{L}a_{l}\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}. (49)

As the linear model (3) is decoupled into equivalent AWGN models, we focus on the scalar form of (49), given by

x_{k,\mathrm{A}}^{\mathrm{ext}}=\sum_{l=1}^{L}a_{l,k}x_{\mathrm{A},l,k}^{\mathrm{ext}},\quad k=1,\ldots,K, (50)

where a_{l,k} is the combining weight for the local estimate of the k-th user from the l-th AP, which depends on the variance v_{\mathrm{A},l,k}^{\mathrm{ext}}. As in (10) and (11), we write each local estimate as x_{\mathrm{A},l,k}^{\mathrm{ext}}=x_{k}+n_{\mathrm{A},l,k}^{\mathrm{ext}}, where n_{\mathrm{A},l,k}^{\mathrm{ext}} represents the equivalent noise with known variance v_{\mathrm{A},l,k}^{\mathrm{ext}}. Substituting into (50) gives x_{k,\mathrm{A}}^{\mathrm{ext}}=\sum_{l=1}^{L}a_{l,k}x_{k}+\sum_{l=1}^{L}a_{l,k}n_{\mathrm{A},l,k}^{\mathrm{ext}}. Because x_{k,\mathrm{A}}^{\mathrm{ext}} should be an unbiased estimate of x_{k}, we require \sum_{l=1}^{L}a_{l,k}=1. After combining, the equivalent SNR is given by

\mathrm{SNR}=\frac{|x_{k}|^{2}}{|\sum_{l=1}^{L}a_{l,k}n_{\mathrm{A},l,k}^{\mathrm{ext}}|^{2}}=\frac{E_{s}}{\sum_{l=1}^{L}a_{l,k}^{2}v_{\mathrm{A},l,k}^{\mathrm{ext}}}. (51)

As a result, the combining problem becomes the optimization problem

\max_{\{a_{l,k}\}}\quad\frac{E_{s}}{\sum_{l=1}^{L}a_{l,k}^{2}v_{\mathrm{A},l,k}^{\mathrm{ext}}}\qquad\mathrm{s.t.}\quad\sum_{l=1}^{L}a_{l,k}=1. (52)

By using the method of Lagrange multipliers, it is easy to obtain

a_{l,k}=\frac{1}{v_{\mathrm{A},l,k}^{\mathrm{ext}}}\bigg{(}\sum_{l^{\prime}=1}^{L}\frac{1}{v_{\mathrm{A},l^{\prime},k}^{\mathrm{ext}}}\bigg{)}^{-1},\quad l=1,\ldots,L. (53)

Thus, we obtain the MRC combining in (18) and (19).
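The optimality of the weights in (53) can also be checked numerically: among all weight vectors summing to one, the inverse-variance (MRC) weights attain the largest SNR in (52). A small sketch with illustrative variances:

```python
import numpy as np

# Numerical check of (52)-(53): inverse-variance (MRC) weights maximize the
# combined SNR among all weight vectors summing to one. The variances are
# illustrative stand-ins for v_{A,l,k}^{ext}.
rng = np.random.default_rng(3)
L_aps = 5
v = rng.uniform(0.1, 2.0, L_aps)           # extrinsic variances from L APs
Es = 1.0

def snr(a):
    """Combined SNR (51) for weights a with sum(a) = 1."""
    return Es / np.sum(a ** 2 * v)

a_mrc = (1.0 / v) / np.sum(1.0 / v)        # weights from (53)
no_better = True
for _ in range(1000):                      # random competitors with sum(a) = 1
    a = rng.standard_normal(L_aps)
    a /= a.sum()
    no_better &= snr(a) <= snr(a_mrc) + 1e-9
```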

Appendix B Derivation of ICD

In this appendix, we present the derivation of the equivalent channel estimation error in the data detection stage, as well as the derivation of the data detection error in the channel estimation stage.

B-A Derivation of Equivalent Channel Estimation Error

The equivalent noise \hat{{\mathbf{n}}}^{{\mathrm{d}}}[n] in the signal detector is assumed to be Gaussian distributed with zero mean and covariance matrix \hat{{\mathbf{V}}}_{\mathrm{est}}[n], which can be obtained from the statistical properties of the channel estimation error. For simplicity, we omit the time index n. The detailed expression for \hat{\mathbf{V}}_{\mathrm{est}} is then given by

\hat{\mathbf{V}}_{\mathrm{est}}=\mathrm{diag}\left(\sum_{j=1}^{K}\sigma_{\Delta h_{1,j}}^{2}+\sigma^{2},\ldots,\sum_{j=1}^{K}\sigma_{\Delta h_{N,j}}^{2}+\sigma^{2}\right), (54)

where \sigma_{\Delta h_{i,j}}^{2} is the variance of the (i,j)-th element of the channel estimation error matrix \Delta{\mathbf{H}}, obtained from {\mathbf{R}}_{\Delta{{\mathbf{h}}_{l}^{\mathrm{p}}}}.
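As a minimal sketch of (54), with placeholder error variances since {\mathbf{R}}_{\Delta{{\mathbf{h}}_{l}^{\mathrm{p}}}} is not constructed here:

```python
import numpy as np

# Sketch of (54): the equivalent-noise covariance at the detector collects the
# per-row sums of the channel-estimation-error variances plus the AWGN power.
# The error-variance values are illustrative placeholders.
rng = np.random.default_rng(5)
N, K, sigma2 = 6, 4, 0.05
var_dH = rng.uniform(0.0, 0.1, (N, K))         # sigma_{Delta h_{i,j}}^2
V_est = np.diag(var_dH.sum(axis=1) + sigma2)   # diagonal matrix in (54)
```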

B-B Derivation of Data Detection Error

Owing to the data feedback from the signal detector to the channel estimator, the LMMSE channel estimate in the data-aided stage is given by

\hat{\bar{{\mathbf{h}}}}_{l}={\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}{\mathbf{A}}^{H}({\mathbf{A}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}{\mathbf{A}}^{H}+{\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}})^{-1}{\mathbf{y}}, (55)

where {\mathbf{A}}={\mathbf{X}}^{T}\otimes\mathbf{I}_{N}\in\mathbb{C}^{\tau_{c}N\times KN}, {\mathbf{y}}=\mathrm{vec}(\mathbf{Y})\in\mathbb{C}^{\tau_{c}N\times 1}, {\mathbf{n}}=\mathrm{vec}(\mathbf{N})\in\mathbb{C}^{\tau_{c}N\times 1}, and \tau_{c}=\tau_{p}+\tau_{d}. The covariance matrix {\mathbf{R}}_{\Delta{\mathbf{h}}_{l}} of the channel estimation error vector \Delta{{\mathbf{h}}_{l}}=\hat{\bar{{\mathbf{h}}}}_{l}-\bar{{\mathbf{h}}}_{l} can be computed as

{\mathbf{R}}_{\Delta{\mathbf{h}}_{l}}={\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}-{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}{\mathbf{A}}^{H}({\mathbf{A}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}{\mathbf{A}}^{H}+{\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}})^{-1}{\mathbf{A}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}. (56)

The covariance matrix {\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}} accounts for the equivalent noise power of both the actual pilots \mathbf{X}^{{\mathrm{p}}} and the additional pilots \hat{\mathbf{X}}^{{\mathrm{d}}} formed by the detected data. The explicit expression for {\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}} is given by

{\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}}=\begin{bmatrix}\sigma^{2}{\mathbf{I}}_{\tau_{p}N}&\\&\hat{\mathbf{V}}_{\mathrm{det}}\end{bmatrix}, (59)

where

\hat{\mathbf{V}}_{\mathrm{det}}=\begin{bmatrix}\hat{\mathbf{V}}^{{\mathrm{d}}}[1]&&\\&\ddots&\\&&\hat{\mathbf{V}}^{{\mathrm{d}}}[{\tau_{d}}]\end{bmatrix} (63)

and

\hat{\mathbf{V}}^{{\mathrm{d}}}[n]=\left(\sum_{j=1}^{K}\sigma_{e_{j,n}}^{2}+\sigma^{2}\right)\mathbf{I}_{K}. (64)

Let {\mathbf{z}}_{n}=\mathbf{H}{\mathbf{e}}_{n} with z_{i,n}=\sum_{j=1}^{K}h_{i,j}e_{j,n}, where {\mathbf{e}}_{n} is the n-th column of the signal detection error matrix {\mathbf{E}}^{{\mathrm{d}}} and \sigma_{e_{j,n}}^{2} is the variance of the j-th element of {\mathbf{e}}_{n}. The signal detection error matrix {\mathbf{E}}^{{\mathrm{d}}} can be obtained from the posterior variance {\mathbf{v}}_{\mathrm{A},l}^{\mathrm{post}} in Module A.
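The data-aided estimator (55)-(56) with the equivalent noise covariance (59)-(64) can be sketched for a single-antenna AP as follows; the i.i.d. channel prior, the common per-symbol detection-error variance, and the use of the true transmitted symbols in place of detected ones are all simplifying assumptions:

```python
import numpy as np

# Sketch of the data-aided LMMSE estimator (55)-(56) for a single-antenna AP:
# detected symbols act as extra pilots, and their residual detection-error
# variance enters R_nn as in (59)-(64). All sizes are illustrative.
rng = np.random.default_rng(4)
K, tau_p, tau_d, sigma2 = 4, 4, 16, 0.05
sigma_e2 = 0.02                               # per-symbol detection-error variance

h = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
X = (rng.standard_normal((K, tau_p + tau_d)) +
     1j * rng.standard_normal((K, tau_p + tau_d))) / np.sqrt(2)
y = X.T @ h + np.sqrt(sigma2 / 2) * (
    rng.standard_normal(tau_p + tau_d) + 1j * rng.standard_normal(tau_p + tau_d))

A = X.T                                       # stacked pilot and detected-data symbols
R_h = np.eye(K)                               # prior channel covariance (i.i.d.)
# AWGN on the pilot part; AWGN plus aggregate detection error on the data part (64).
R_nn = np.diag(np.r_[np.full(tau_p, sigma2), np.full(tau_d, sigma2 + K * sigma_e2)])

G = R_h @ A.conj().T @ np.linalg.inv(A @ R_h @ A.conj().T + R_nn)
h_hat = G @ y                                 # LMMSE estimate, cf. (55)
R_err = R_h - G @ A @ R_h                     # error covariance, cf. (56)
```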

References

  • [1] H. He, H. Wang, X. Yu, J. Zhang, S. H. Song, and K. B. Letaief, “Distributed expectation propagation detection for cell-free massive MIMO,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Madrid, Spain, 2021.
  • [2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, C. K. Soong, and J. C. Zhang, “What will 5G be?” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065-1082, 2014.
  • [3] K. B. Letaief, Y. Shi, J. Lu, and J. Lu, “Edge artificial intelligence for 6G: Vision, enabling technologies, and applications,” IEEE J. Sel. Areas Commun., vol. 40, no. 1, pp. 5-36, 2022.
  • [4] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J.-A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Commun. Mag., vol. 57, no. 8, pp. 84-90, Aug. 2019.
  • [5] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
  • [6] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
  • [7] J. Zhang, S. Chen, Y. Lin, J. Zheng, B. Ai, and L. Hanzo, “Cell-free massive MIMO: A new next-generation paradigm,” IEEE Access, vol. 7, pp. 99878–99888, Sep. 2019.
  • [8] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and D. J. Love, “Prospective multiple antenna technologies for beyond 5G,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1637-1660, Aug. 2020.
  • [9] M. Matthaiou, O. Yurduseven, H. Q. Ngo, D. Morales-Jimenez, S. L. Cotton, and V. F. Fusco, “The road to 6G: Ten physical layer challenges for communications engineers,” IEEE Commun. Mag., vol. 59, no. 1, pp. 64-69, Jan. 2021.
  • [10] H. He, X. Yu, J. Zhang, S.H. Song, K. B. Letaief, “Cell-Free Massive MIMO for 6G Wireless Communication Networks,” Journal of Communications and Information Networks, vol. 6, no. 4, pp. 321-335, Dec. 2021.
  • [11] G. Interdonato, E. Björnson, H. Q. Ngo, P. Frenger, and E. G. Larsson, “Ubiquitous cell-free massive MIMO communications,” EURASIP J. Wireless Commun. and Netw., vol. 2019, no. 1, pp. 197-206, 2019.
  • [12] D. Wang, C. Zhang, Y. Du, J. Zhao, M. Jiang, and X. You, “Implementation of a cloud-based cell-free distributed massive MIMO system,” IEEE Commun. Mag., vol. 58, no. 8, pp. 61-67, 2020.
  • [13] W. Choi, J. G. Andrews, “Downlink performance and capacity of distributed antenna systems in a multicell environment,” IEEE Trans. Wireless Commun., vol. 6, no. 1, pp. 69-73, Jan. 2007.
  • [14] J. Zhang, R. Chen, J. G. Andrews, A. Ghosh, and R. W. Heath Jr, “Networked MIMO with clustered linear precoding,” IEEE Trans. Wireless Commun., vol. 8, no. 4, pp. 1910-1921, Apr. 2009.
  • [15] Ö. Özdogan, E. Björnson, and J. Zhang, “Performance of cell-free massive MIMO with rician fading and phase shifts,” IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5299-5315, Nov. 2019.
  • [16] Z. Wang, J. Zhang, E. Björnson, and B. Ai, “Uplink performance of cell-free massive MIMO over spatially correlated Rician fading channels,” IEEE Commun. Lett., vol. 25, no. 4, pp. 1348-1352, Apr. 2020.
  • [17] S.-N. Jin, D.-W. Yue, and H. H. Nguyen, “Spectral and energy efficiency in cell-free massive MIMO systems over correlated Rician fading,” IEEE Syst. J., pp. 1-12, May. 2020.
  • [18] H. Q. Ngo, L. Tran, T. Q. Duong, M. Matthaiou, and E. G. Larsson, “On the total energy efficiency of cell-free massive MIMO,” IEEE Trans. Green Commun. Netw, vol. 2, no. 1, pp. 25-39, 2018.
  • [19] H. V. Nguyen, V. D. Nguyen, O. A. Dobre, S. K. Sharma, S. Chatzinotas, B. Ottersten, and O. S. Shin, “On the spectral and energy efficiencies of full-duplex cell-free massive MIMO,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1698-1718, Aug. 2020.
  • [20] E. Björnson and L. Sanguinetti, “Scalable cell-free massive MIMO systems,” IEEE Trans. Commun., vol. 68, no. 7, pp. 4247–4261, Jul. 2020.
  • [21] S. Buzzi, C. D’Andrea, “Cell-free massive MIMO: User-centric approach,” IEEE Wireless Commun. Lett., vol. 6, no. 6, pp. 706-709, Dec. 2017.
  • [22] S. Buzzi, C. D’Andrea, A. Zappone, and C. D’Elia, “User-centric 5G cellular networks: Resource allocation and comparison with the cell-free massive MIMO approach,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 1250-1264, Feb. 2020.
  • [23] H. A. Ammar, R. Adve, S. Shahbazpanahi, G. Boudreau, and K. V. Srinivas, “User-centric cell-free massive MIMO networks: A survey of opportunities, challenges and solutions,” IEEE Commun. Surv. Tutor., vol. 24, no. 1, pp. 611-652, 2022.
  • [24] A. A. Polegre, F. Riera-Palou, G. Femenias, and A. G. Armada, “Channel hardening in cell-free and user-centric massive MIMO networks with spatially correlated ricean fading,” IEEE Access, vol. 8, pp. 139827-139845, 2020.
  • [25] F. Riera-Palou, G. Femenias, A. G. Armada, and A. Pérez-Neira, “Clustered cell-free massive MIMO,” in Proc. IEEE Globecom Workshops (GC Wkshps), Dec. 2018, pp. 1-6.
  • [26] E. Nayebi, A. Ashikhmin, T. L. Marzetta, and B. D. Rao, “Performance of cell-free massive MIMO systems with MMSE and LSFD receivers,” in Proc. 50th Asilomar Conf. Signals, Syst. Comput., Nov. 2016, pp. 203–207.
  • [27] E. Björnson and L. Sanguinetti, “Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 77–90, Jan. 2020.
  • [28] Z. H. Shaik, E. Björnson, and E. G. Larsson, “MMSE-optimal sequential processing for cell-free massive MIMO with radio stripes,” IEEE Trans. Commun., vol. 69, no. 11, pp. 7775-7789, Nov. 2021.
  • [29] C. D’Andrea, E. Björnson, and E. G. Larsson, “Improving cell-free massive MIMO by local per-bit soft detection,” IEEE Commun. Lett., vol. 25, no. 7, pp. 2400-2404, Apr. 2021.
  • [30] T. P. Minka, “A family of algorithms for approximate Bayesian Inference,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., MIT, Cambridge, MA, USA, 2001.
  • [31] J. Céspedes, P. M. Olmos, M. Sánchez-Fernández, and F. Perez-Cruz, “Expectation propagation detection for high-order high-dimensional MIMO systems,” IEEE Trans. Commun., vol. 62, no. 8, pp. 2840–2849, Aug. 2014.
  • [32] C. Jeon, K. Li, J. R. Cavallaro, and C. Studer, “Decentralized equalization with feedforward architectures for massive MU-MIMO,” IEEE Trans. Signal Process., vol. 67, no. 17, pp. 4418–4432, Sep. 2019.
  • [33] H. Wang, A. Kosasih, C. Wen, S. Jin, and W. Hardjawana, “Expectation propagation detector for extra-large scale massive MIMO,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2036–2051, Mar. 2020.
  • [34] K. Ghavami and M. Naraghi-Pour, “MIMO detection with imperfect channel state information using expectation propagation,” IEEE Trans. Veh. Technol., vol. 66, no. 9, pp. 8129-8138, Sep. 2017.
  • [35] R. Gholami, L. Cottatellucci, and D. Slock, “Message passing for a Bayesian semi-blind approach to cell-free massive MIMO,” in Proc. 55th Asilomar Conf. Signals Syst. Comput., Nov. 2021, pp. 1237-1241.
  • [36] A. Kosasih, et al. “Improving Cell-Free Massive MIMO Detection Performance via Expectation Propagation,” in Proc. IEEE 94th Veh. Technol. Conf. (VTC-Fall), Sep. 2021, pp. 1-6.
  • [37] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498-519, Feb. 2001.
  • [38] M. Coldrey and P. Bohlin, “Training-based MIMO systems: Part II:Improvements using detected symbol information,” IEEE Trans. Signal Process., vol. 56, no. 1, pp. 296-303, Jan. 2008.
  • [39] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Model-driven deep learning for MIMO detection,” IEEE Trans. Signal Process., vol. 68, pp. 1702–1715, Mar. 2020.
  • [40] R. Prasad, C. R. Murthy, and B. D. Rao, “Joint channel estimation and data detection in MIMO-OFDM systems: A sparse Bayesian learning approach,” IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5369-5382, Oct. 2015.
  • [41] J. Ma and L. Ping, “Data-aided channel estimation in large antenna systems,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3111-3124, Jun. 2014.
  • [42] C.-K. Wen, C.-J. Wang, S. Jin, K.-K. Wong, and P. Ting, “Bayes-optimal joint channel-and-data estimation for massive MIMO with low-precision ADCs,” IEEE Trans. Signal Process., vol. 64, no. 10, pp. 2541-2556, Jul. 2015.
  • [43] H. Song, X. You, C. Zhang, O. Tirkkonen, C. Studer, “Minimizing pilot overhead in cell-Free massive MIMO systems via joint estimation and detection,” in Proc. IEEE 21st Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Atlanta, GA, USA, May. 2020, pp. 1-5.
  • [44] K. Takeuchi, “Rigorous dynamics of expectation-propagation-based signal recovery from unitarily invariant measurements,” IEEE Trans. Inf. Theory, vol. 66, no. 1, pp. 368-386, Jan. 2020.
  • [45] A. M. Tulino and S. Verdu, Random Matrix Theory and Wireless Communications. Hanover, MA USA: Now Publishers Inc., 2004.
  • [46] J. G. Proakis, Digital Communications.   Boston, USA: McGraw-Hill Companies, 2007.
  • [47] A. Fletcher, M. Sahraee-Ardakan, S. Rangan, and P. Schniter, “Expectation consistent approximate inference: Generalizations and convergence,” in Proc. IEEE Int. Symp. Inf. Theory, Barcelona, Spain, Jul. 2016, pp. 190-194.
  • [48] 3GPP, “Further advancements for E-UTRA physical layer aspects (Release 9),” 3GPP TS 36.814, Tech. Rep., Mar. 2017.
  • [49] T. Minka, “Divergence measures and message passing,” Microsoft Research Cambridge, Tech. Rep., 2005.
  • [50] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” IEEE Trans. Inf. Theory, vol. 65, no. 10, pp. 6664-6684, Oct. 2019.