
Cell-Free Massive MIMO Detection: A Distributed Expectation Propagation Approach

Hengtao He, Xianghao Yu, Jun Zhang, S. H. Song, and Khaled B. Letaief. This paper has been presented in part at the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain [1]. This work was supported by the Research Grants Council under Grant No. 16212120. H. He, X. Yu, J. Zhang, S. H. Song, and K. B. Letaief are with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong (e-mail: {eehthe, eexyu, eejzhang, eeshsong, eekhaled}@ust.hk).
Abstract

Cell-free massive MIMO is one of the core technologies for future wireless networks. It is expected to bring enormous benefits, including ultra-high reliability, high data throughput, high energy efficiency, and uniform coverage. As a radically distributed system, the performance of cell-free massive MIMO critically relies on efficient distributed processing algorithms. In this paper, we propose a distributed expectation propagation (EP) detector for cell-free massive MIMO, which consists of two modules: a nonlinear module at the central processing unit (CPU) and a linear module at each access point (AP). The turbo principle in iterative channel decoding is utilized to compute and pass the extrinsic information between the two modules. An analytical framework is provided to characterize the asymptotic performance of the proposed EP detector with a large number of antennas. Furthermore, a distributed iterative channel estimation and data detection (ICD) algorithm is developed to handle the practical setting with imperfect channel state information (CSI). Simulation results show that the proposed method outperforms existing detectors for cell-free massive MIMO systems in terms of bit-error rate and that the developed theoretical analysis accurately predicts the system performance. Finally, it is shown that with imperfect CSI, the proposed ICD algorithm improves the system performance significantly and enables non-orthogonal pilots to reduce the pilot overhead.

Index Terms:
6G, cell-free massive MIMO, distributed MIMO detection, expectation propagation.

I Introduction

The fifth-generation (5G) wireless networks have been commercialized since 2019 to support a wide range of services, including enhanced mobile broadband, ultra-reliable and low-latency communications, and massive machine-type communications [2]. However, the pursuit of faster and more reliable wireless communications never stops. This trend is reinforced by the recent emergence of several innovative applications, including the Internet of Everything, the Tactile Internet, and seamless virtual and augmented reality. Future sixth-generation (6G) wireless networks are expected to provide ubiquitous coverage, enhanced spectral efficiency (SE), connected intelligence, etc. [3, 4]. Such diverse service requirements create daunting challenges for system design. With the commercialization of massive MIMO technologies [5] in 5G, it is time to consider new MIMO-based network architectures to support the continuous exponential growth of mobile data traffic and a plethora of applications.

As a promising solution, cell-free massive MIMO was proposed [6]. It is a disruptive technology and has been recognized as a core enabler for beyond-5G and 6G networks [7, 8, 9, 10, 11, 12]. Cell-free massive MIMO can be interpreted as a combination of massive MIMO [5], distributed antenna systems (DAS) [13], and network MIMO [14]. It is expected to bring important benefits, including huge data throughput, ultra-low latency, ultra-high reliability, greatly improved energy efficiency, and ubiquitous and uniform coverage. In cell-free massive MIMO systems, a very large number of distributed access points (APs) are connected to a central processing unit (CPU) via a fronthaul network in order to cooperate and jointly serve a large number of users on the same time-frequency resources over a wide coverage area. In contrast to current cellular systems, there are no cells or cell boundaries in cell-free massive MIMO networks. As a result, this approach can relieve one of the major bottlenecks and inherent limitations of wireless networks, i.e., strong inter-cell interference. Compared to conventional co-located massive MIMO, cell-free networks offer more uniform connectivity for all users thanks to the macro-diversity gain provided by the distributed antennas.

Investigations on cell-free massive MIMO started with initial attempts at analyzing the SE [6], where single-antenna APs, single-antenna users, and Rayleigh fading channels were considered. The analysis has been extended to multi-antenna APs with Rayleigh fading, Rician fading, and correlated channels [15, 16, 17], showing that cell-free massive MIMO networks can achieve high SE. The energy efficiency of cell-free massive MIMO systems was then investigated [18, 19]. It was shown that cell-free massive MIMO systems can improve the energy efficiency by approximately ten times compared to cellular massive MIMO. Although cell-free massive MIMO has shown huge potential, its deployment critically depends on effective and scalable algorithms. According to [20], cell-free massive MIMO is considered scalable if the signal processing tasks for channel estimation, precoder and combiner design, fronthaul overhead, and power optimization per AP can be kept within finite complexity when the number of served users goes to infinity. Unfortunately, conventional cell-free massive MIMO is not scalable. To tackle the scalability issue, the user-centric concept has been introduced into cell-free massive MIMO systems [20, 21, 22, 23, 24, 25]. In particular, a user-centric dynamic cooperation clustering (DCC) scheme was introduced [20], where each user is served by only a subset of APs. This scheme is called scalable cell-free massive MIMO.

In this paper, we focus on effective data detection algorithms for cell-free massive MIMO systems. In this aspect, some early attempts were made on centralized algorithms, where the detection is implemented at the CPU with the received pilots and data signals reported from all APs [6, 26]. However, the computational and fronthaul overhead of such a centralized detection scheme is prohibitively high when the network size becomes large. To address this challenge, distributed detectors have recently been investigated. In [27], one centralized and three distributed receivers with different levels of cooperation among APs were compared in terms of SE. Radio stripes were then incorporated into cell-free massive MIMO in [28]. In this case, the APs are sequentially connected and share the same fronthaul link in a radio stripe network, thus reducing the cabling substantially. Based on this structure, a novel uplink sequential processing algorithm was developed that is optimal in terms of both SE and mean-square error (MSE). Furthermore, it achieves the same performance as centralized minimum MSE (MMSE) processing while requiring much lower fronthaul overhead, by making full use of the computational resources at the APs. However, the distributed detectors investigated in [27] are linear and therefore highly suboptimal, or may even be ill-conditioned, in terms of bit-error rate (BER) performance. To address this problem, local per-bit soft detection was performed at each AP by a partial marginalization detector, with the bit log-likelihoods shared over the fronthaul link [29]. However, this soft detection is still very complex, as the approximate posterior density for each received data bit has to be computed at each AP. Therefore, it is of great importance to develop a distributed non-linear receiver that achieves better BER performance with considerably lower complexity.

To fill this gap, we propose a distributed non-linear detector for cell-free massive MIMO networks in this paper, which is derived based on the expectation propagation (EP) principle [30]. The EP algorithm, proposed in [30], provides an iterative method to recover the transmitted data from the received signal and has recently attracted extensive research interest in massive MIMO detection [31, 32, 33, 34]. It has also been extended to cell-free massive MIMO systems [35, 36]. The algorithm is derived from a factor graph, with messages updated and passed between different pairs of nodes. Specifically, with the linear MMSE estimator, the APs first detect the symbols with the local channel state information (CSI) and transfer the posterior mean and variance estimates to the CPU. Then, the extrinsic information for each AP is computed and integrated at the CPU by maximum-ratio combining (MRC). Subsequently, the CPU uses the posterior mean estimator to refine the detection, and the extrinsic information is transferred back to each AP via the fronthaul.

The main contributions of this paper are summarized as follows:

  • Different from the existing linear detectors in cell-free massive MIMO[27], we propose a distributed and non-linear detector. The detection performance is improved at the cost of slightly increasing the computation overhead at the computationally-powerful CPU. Simulation results will demonstrate that the proposed method outperforms existing distributed detectors and even the centralized MMSE detector in terms of BER performance.

  • To be applicable in practical scenarios with imperfect CSI, we further develop a distributed iterative channel estimation and data detection (ICD) algorithm for cell-free massive MIMO systems, where the detector takes the channel estimation error and channel statistics into consideration while the channel estimation is refined by the detected data. Simulation results will demonstrate that the proposed ICD algorithm outperforms existing distributed detectors and enables non-orthogonal pilots to reduce the pilot overhead.

  • We develop an analytical framework to analyze the performance of the distributed EP algorithm, which can precisely predict the performance of the proposed detector. Based on the theoretical analysis, key performance metrics of the system, such as the MSE and BER, can be analytically determined without time-consuming Monte Carlo simulation.

Notations: For any matrix $\mathbf{A}$, $\mathbf{A}^{H}$ and $\mathrm{tr}(\mathbf{A})$ denote the conjugate transpose and trace of $\mathbf{A}$, respectively. In addition, $\mathbf{I}$ is the identity matrix and $\mathbf{0}$ is the zero matrix. We use $\mathrm{D}z$ to denote the real Gaussian integration measure. That is,

\mathrm{D}z=\phi(z)dz,\quad\text{where}\quad\phi(z)\triangleq\frac{1}{\sqrt{2\pi}}e^{-\frac{z^{2}}{2}}.

A complex Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Omega}$ is described by the probability density function

\mathcal{N}_{\mathbb{C}}(\mathbf{z};\boldsymbol{\mu},\boldsymbol{\Omega})=\frac{1}{\det(\pi\boldsymbol{\Omega})}e^{-(\mathbf{z}-\boldsymbol{\mu})^{H}\boldsymbol{\Omega}^{-1}(\mathbf{z}-\boldsymbol{\mu})}.
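As a quick numerical sanity check of the density above, the following sketch (our own helper, not from the paper) evaluates $\mathcal{N}_{\mathbb{C}}(\mathbf{z};\boldsymbol{\mu},\boldsymbol{\Omega})$ and verifies that in the scalar case the density at $z=\mu$ equals $1/(\pi\Omega)$.

```python
import numpy as np

def complex_gaussian_pdf(z, mu, Omega):
    """Density N_C(z; mu, Omega) of a circularly-symmetric complex Gaussian."""
    z = np.atleast_1d(z).astype(complex)
    mu = np.atleast_1d(mu).astype(complex)
    Omega = np.atleast_2d(Omega)
    d = z - mu
    expo = -d.conj() @ np.linalg.solve(Omega, d)  # -(z-mu)^H Omega^{-1} (z-mu)
    return float(np.real(np.exp(expo) / np.linalg.det(np.pi * Omega)))

# Scalar sanity check: at z = mu the density equals 1 / (pi * variance).
v = 2.0
p = complex_gaussian_pdf(1 + 1j, 1 + 1j, v)
```

Note that, unlike the real Gaussian, no factor of 2 appears in the exponent, which is why the peak value is $1/(\pi\Omega)$ rather than $1/\sqrt{2\pi\Omega}$.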

The remainder of this paper is organized as follows. Section II introduces the system model and formulates the cell-free massive MIMO detection problem. The distributed EP detector is proposed in Section III, and the distributed ICD receiver is investigated in Section IV. Furthermore, an analytical framework is provided in Section V. Numerical results are presented in Section VI, and Section VII concludes the paper.

II System Model

In this section, we first present the system model and formulate the cell-free massive MIMO detection problem. Four commonly-adopted receivers are then briefly introduced.

II-A Cell-Free Massive MIMO

As illustrated in Fig. 1, we consider a cell-free massive MIMO network with $L$ distributed APs, each equipped with $N$ antennas, serving $K$ single-antenna users. All APs are connected to a CPU that has abundant computing resources. Denote $\mathbf{h}_{kl}\in\mathbb{C}^{N\times 1}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\mathbf{R}_{kl})$ as the channel between the $k$-th user and the $l$-th AP, where $\mathbf{R}_{kl}\in\mathbb{C}^{N\times N}$ is the spatial correlation matrix, and $\beta_{kl}=\mathrm{tr}(\mathbf{R}_{kl})/N$ as the large-scale fading coefficient capturing the geometric path loss and shadowing. In the uplink data transmission phase, we consider $\mathcal{M}_{k}\subset\{1,\ldots,L\}$ as the subset of APs that serve the $k$-th user and define the DCC matrices $\mathbf{D}_{kl}\in\mathbb{C}^{N\times N}$ based on $\mathcal{M}_{k}$ as

\mathbf{D}_{kl}=\begin{cases}\mathbf{I}_{N} & \mathrm{if}\ l\in\mathcal{M}_{k}\\ \mathbf{0}_{N\times N} & \mathrm{if}\ l\notin\mathcal{M}_{k}.\end{cases} \qquad (1)

Specifically, $\mathbf{D}_{kl}$ is an identity matrix if the $l$-th AP serves the $k$-th user, and a zero matrix otherwise. Furthermore, we define $\mathcal{D}_{l}$ as the set of user indices served by the $l$-th AP:

\mathcal{D}_{l}=\left\{k:\mathrm{tr}(\mathbf{D}_{kl})\geq 1,\ k\in\{1,\ldots,K\}\right\}, \qquad (2)

where the cardinality of $\mathcal{D}_{l}$ is denoted as $|\mathcal{D}_{l}|$. When each AP serves all users, $|\mathcal{D}_{l}|=K$. Assuming that perfect CSI is available at the local APs, the received signal $\mathbf{y}_{l}\in\mathbb{C}^{N\times 1}$ at the $l$-th AP is given by

\mathbf{y}_{l}=\sum_{k\in\mathcal{D}_{l}}\sqrt{p_{k}}\mathbf{h}_{kl}x_{k}+\mathbf{n}_{l}, \qquad (3)

where $x_{k}\in\mathbb{C}$ is the transmitted symbol drawn from an $M$-QAM constellation, and $p_{k}>0$ is the transmit power of the $k$-th user. The additive noise at the $l$-th AP is denoted as $\mathbf{n}_{l}\in\mathbb{C}^{N\times 1}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0}_{N},\sigma^{2}\mathbf{I}_{N})$. Let $\mathbf{x}=[x_{1},\ldots,x_{K}]^{T}$ denote the transmitted vector from all users and $\mathbf{H}_{l}=[\mathbf{h}_{1l},\ldots,\mathbf{h}_{Kl}]\in\mathbb{C}^{N\times K}$ the channel from all users to AP $l$. If the $k$-th user is not associated with the $l$-th AP, the corresponding channel vector is $\mathbf{h}_{kl}=\mathbf{0}$. Furthermore, we denote $\mathbf{H}=[\mathbf{H}_{1}^{T},\ldots,\mathbf{H}_{L}^{T}]^{T}\in\mathbb{C}^{LN\times K}$ as the channel matrix between all users and all APs. The uplink detection problem for cell-free massive MIMO is to detect the transmitted data $\mathbf{x}$ based on the received signals $\mathbf{y}_{l}\ (l=1,2,\ldots,L)$, the channel matrix $\mathbf{H}$, and the noise power $\sigma^{2}$.
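The uplink model of (1)-(3) can be simulated in a few lines. The sketch below uses illustrative dimensions and simplifications of our own choosing (i.i.d. Rayleigh fading without spatial correlation, QPSK symbols, and every AP serving all users); it is meant only to make the signal model concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
L_aps, N, K = 4, 2, 3                      # APs, antennas per AP, users
p = np.ones(K)                             # transmit powers p_k
sigma2 = 0.01                              # noise variance

# DCC: serving_sets[l] lists the users served by AP l (here: all users).
serving_sets = [list(range(K)) for _ in range(L_aps)]

# i.i.d. Rayleigh channels h_kl ~ CN(0, I_N); zero vector if user k is not served by AP l.
H = np.zeros((L_aps, N, K), dtype=complex)
for l in range(L_aps):
    for k in serving_sets[l]:
        H[l][:, k] = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

# QPSK symbols x_k and received signals y_l = sum_k sqrt(p_k) h_kl x_k + n_l, per (3).
x = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)
n = (rng.standard_normal((L_aps, N)) + 1j * rng.standard_normal((L_aps, N))) * np.sqrt(sigma2 / 2)
y = np.einsum('lnk,k->ln', H, np.sqrt(p) * x) + n
```

Stacking the per-AP blocks `H[l]` and `y[l]` recovers the centralized model used in Section III.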

(a) Conventional cell-free massive MIMO system. (b) Scalable cell-free massive MIMO system.

Figure 1: Conventional and scalable cell-free massive MIMO systems.

II-B Linear Receivers

For cell-free massive MIMO detection, there are four commonly-adopted linear receivers [27] with different levels of cooperation among APs, elaborated as follows.

  • Fully centralized receiver: The pilot and data signals received at all APs are sent to the CPU for channel estimation and data detection. The CPU may perform the MMSE or MRC detection.

  • Partially distributed receiver with large-scale fading decoding: First, each AP estimates the channels and uses the linear MMSE detector to detect the received signals. Then, the detected signals are collected at the CPU for joint detection for all users by utilizing the large-scale fading decoding (LSFD) method. Compared to the fully centralized receiver, only the channel statistics are utilized at the CPU but the pilot signals are not required to be sent to the CPU.

  • Partially distributed receiver with average decoding: It is a special case of the partially distributed receiver with LSFD. The CPU performs joint detection for all users by simply taking the average of the local estimates. Thus, no channel statistics are required to be transmitted to the CPU via the fronthaul.

  • Fully distributed receiver: It is a fully distributed approach in which the data detection is performed at the APs based on the local channel estimates. No information is required to be transferred to the CPU. Each AP may perform MMSE detection.

All four of the above-mentioned receivers achieve performance that is far from optimal due to their linear processing. In contrast, non-linear receivers have shown great advantages in terms of BER in cellular massive MIMO systems, but at the cost of higher computational complexity [31]. Thanks to the relatively high computing capabilities of the CPU and the large number of APs in cell-free massive MIMO systems, we can offload the computation-intensive operations to the CPU and distribute the remaining computation tasks to the APs. Following this idea, we will propose a distributed non-linear detector for cell-free massive MIMO systems.

III Proposed Distributed EP Detector

In this section, we first formulate the MIMO detection problem from a Bayesian inference perspective. Then, we introduce the factor graph and propose the distributed EP detector. Finally, we analyze the computational complexity and fronthaul overhead of the proposed detector.

III-A Distributed Bayesian Inference

We first apply Bayesian inference to recover the signal $\mathbf{x}$ from the received signal $\mathbf{y}$ in the data detection stage with the linear model $\mathbf{y}=\mathbf{H}\mathbf{x}+\mathbf{n}$, where $\mathbf{y}=[\mathbf{y}_{1}^{T},\ldots,\mathbf{y}_{L}^{T}]^{T}\in\mathbb{C}^{LN\times 1}$, $\mathbf{H}=[\mathbf{H}_{1}^{T},\ldots,\mathbf{H}_{L}^{T}]^{T}\in\mathbb{C}^{LN\times K}$, and $\mathbf{x}=[x_{1},\ldots,x_{K}]^{T}\in\mathbb{C}^{K\times 1}$. Based on Bayes' theorem, the posterior probability is given by

\mathtt{P}(\mathbf{x}|\mathbf{y},\mathbf{H})=\frac{\mathtt{P}(\mathbf{y}|\mathbf{x},\mathbf{H})\mathtt{P}(\mathbf{x})}{\mathtt{P}(\mathbf{y}|\mathbf{H})}=\frac{\mathtt{P}(\mathbf{y}|\mathbf{x},\mathbf{H})\mathtt{P}(\mathbf{x})}{\int\mathtt{P}(\mathbf{y}|\mathbf{x},\mathbf{H})\mathtt{P}(\mathbf{x})d\mathbf{x}}, \qquad (4)

where $\mathtt{P}(\mathbf{y}|\mathbf{x},\mathbf{H})$ is the likelihood function with the known channel matrix and $\mathtt{P}(\mathbf{x})$ is the prior distribution of $\mathbf{x}$. Given the posterior probability $\mathtt{P}(\mathbf{x}|\mathbf{y},\mathbf{H})$, the Bayesian MMSE estimate is obtained by

\hat{\mathbf{x}}=\int\mathbf{x}\,\mathtt{P}(\mathbf{x}|\mathbf{y},\mathbf{H})d\mathbf{x}. \qquad (5)

However, the Bayesian MMSE estimator is intractable because the marginal posterior probability in (5) involves a high-dimensional integral, which motivates us to develop an efficient method to approximate (4). Furthermore, different from conventional cellular MIMO systems [31], the posterior probability in (4) has to be rewritten in a distributed manner because of the inherently distributed architecture of cell-free massive MIMO systems, given by

\mathtt{P}(\mathbf{x}|\mathbf{y},\mathbf{H})\propto\mathtt{P}(\mathbf{x})\prod_{l=1}^{L}\exp\left(-\|\mathbf{y}_{l}-\mathbf{H}_{l}\mathbf{x}\|^{2}/\sigma^{2}\right). \qquad (6)

As a result, we can develop an efficient distributed MIMO detector that approximates the posterior probability (6) by resorting to message passing on a factor graph, while taking the characteristics of cell-free massive MIMO into consideration. The factor graph and the proposed detector are elaborated in the following subsections.
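The factorization in (6) states that, for Gaussian noise, the centralized log-likelihood splits exactly into a sum of per-AP terms. This identity is what licenses the distributed processing, and it is easy to check numerically (dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
L_aps, N, K = 3, 2, 2
H_l = [rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K)) for _ in range(L_aps)]
x = rng.standard_normal(K) + 1j * rng.standard_normal(K)
y_l = [Hl @ x for Hl in H_l]

# Centralized model: stack the per-AP blocks.
H = np.vstack(H_l)
y = np.concatenate(y_l)

# ||y - H x||^2 equals the sum of the per-AP residuals for any candidate x.
x_test = rng.standard_normal(K) + 1j * rng.standard_normal(K)
lhs = np.linalg.norm(y - H @ x_test) ** 2
rhs = sum(np.linalg.norm(yl - Hl @ x_test) ** 2 for yl, Hl in zip(y_l, H_l))
```

Each factor in (6) therefore depends only on data local to one AP, so each AP can contribute its own message about $\mathbf{x}$.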

III-B Factor Graph for Cell-free Massive MIMO Detection

The factor graph is a powerful tool for developing efficient inference algorithms by applying different message-passing rules. We first introduce several concepts of the factor graph [37], as illustrated in Fig. 2, where the hollow circles represent the variable nodes and the solid squares represent the factor nodes. For a factor graph with factor nodes $\{f_{0},f_{1},\ldots,f_{L}\}$, all connected to a set of variable nodes $\{\mathbf{x}_{i}\}$, the messages are computed and updated iteratively by performing the following rules.

Figure 2: Factor graphs for message-passing algorithms.

1) Approximate beliefs: The approximate belief $b_{\mathrm{app}}(\mathbf{x})$ on variable node $\mathbf{x}$ is $\mathcal{N}_{\mathbb{C}}(\mathbf{x};\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}})$, where $\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}=\mathtt{E}[\mathbf{x}|b_{\mathrm{sp}}]$ and $v_{\mathrm{A}}^{\mathrm{ext}}=\mathrm{mean}(\mathrm{diag}(\mathrm{Cov}[\mathbf{x}|b_{\mathrm{sp}}]))$ are the mean and average variance of the corresponding belief $b_{\mathrm{sp}}(\mathbf{x})\propto\prod_{l}\mu_{f_{l}\rightarrow\mathbf{x}}(\mathbf{x})$.

2) Variable-to-factor messages: The message from a variable node $\mathbf{x}$ to a connected factor node $f_{l}$ is given by

\mu_{\mathbf{x}\rightarrow f_{l}}(\mathbf{x})=\prod_{j\neq l}^{L}\mu_{f_{j}\rightarrow\mathbf{x}}(\mathbf{x}), \qquad (7)

which is the product of the messages from all the other factor nodes.

3) Factor-to-variable messages: The message from a factor node $f_{l}$ to a connected variable node $\mathbf{x}_{i}$ is $\mu_{f_{l}\rightarrow\mathbf{x}_{i}}(\mathbf{x}_{i})\propto\int f(\mathbf{x}_{i},\{\mathbf{x}_{j}\}_{j\neq i})\prod_{j\neq i}\mu_{\mathbf{x}_{j}\rightarrow f_{l}}(\mathbf{x}_{j})\,\mathrm{d}\mathbf{x}_{j}$.

If we treat the $L$ APs and the CPU as factor nodes, we can construct the factor graph for cell-free massive MIMO detection. By applying the above message-passing rules to the constructed factor graph, we develop the distributed EP detector for cell-free massive MIMO systems. The detailed derivation is given in Appendix A-A.
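A useful fact underlying these rules is that a product of Gaussian messages is again Gaussian, with precisions adding and precision-weighted means adding. This is exactly the operation behind rule 2) for Gaussian messages, and it reappears in the MRC combining of the proposed detector. A minimal illustration with two scalar messages (values of our own choosing):

```python
# Two Gaussian messages about the same scalar complex variable x:
m1, v1 = 1.0 + 0.5j, 0.2   # mean, variance of message 1
m2, v2 = 0.8 + 0.7j, 0.1   # mean, variance of message 2

# Product rule: precisions (1/v) add; precision-weighted means (m/v) add.
v = 1.0 / (1.0 / v1 + 1.0 / v2)
m = v * (m1 / v1 + m2 / v2)
```

The combined message is sharper than either input ($v < \min(v_1, v_2)$) and its mean is pulled toward the more confident message.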

Figure 3: Block diagrams of the proposed distributed EP detector, as well as the distributed EP-based iterative channel estimation and data detection. The “EXT.” blocks represent the extrinsic information computation. The channel estimator and signal detector exchange information iteratively until convergence.

III-C Distributed EP Detector

The distributed EP-based detector is summarized in Algorithm 1. Compared with the centralized EP detector, it deploys partial calculation modules at each AP based on the local information and then sends the estimated posterior means and variances to the CPU for combining. In particular, the inputs of the algorithm are the received signal $\mathbf{y}_{l}$, the channel matrix $\mathbf{H}_{l}$, and the noise level $\sigma^{2}$, while the output is the recovered signal $\mathbf{x}_{\mathrm{B}}^{\mathrm{post},T}$ after the $T$-th iteration. The initial parameters are $\boldsymbol{\gamma}_{l}^{(0)}=\mathbf{0}$ and $\lambda_{l}^{(0)}=\frac{1}{E_{x}}$, where

E_{x}=\mathtt{E}\{\|\mathbf{x}\|^{2}\}/K. \qquad (8)

Note that $\boldsymbol{\gamma}_{l}^{(0)}$ and $\lambda_{l}^{(0)}$ are the initialized extrinsic information, and $E_{x}$ is the power of the transmitted symbol $x_{k}$. The block diagram of the proposed distributed EP detector is illustrated in Fig. 3; it is composed of two modules, A and B. The detector follows the turbo principle in iterative decoding, where each module passes extrinsic messages to the other and this process repeats until convergence. It is derived from the factor graph with the messages, updated and passed between different pairs of nodes, assumed to follow Gaussian distributions. As a Gaussian distribution is fully characterized by its mean and variance, only the means and variances need to be calculated and passed between the modules. To better understand the distributed EP detection algorithm, we elaborate the details of each equation in Algorithm 1. Specifically, module A is the linear MMSE (LMMSE) estimator performed at the APs according to the following linear model

\mathbf{y}_{l}=\mathbf{H}_{l}\mathbf{x}+\mathbf{n}_{l}. \qquad (9)

In the $t$-th iteration, the explicit expressions for the posterior covariance matrix $\boldsymbol{\Sigma}_{l}^{t}$ and mean vector $\boldsymbol{\mu}_{l}^{t}$ are given by (14) and (15), respectively. Note that each AP only uses the local channel $\mathbf{H}_{l}$ to detect the transmitted signal $\mathbf{x}$. For ease of notation, we omit the iteration index $t$ for all mean and variance estimates in Algorithm 1. Then, the variance estimate $v_{\mathrm{A},l}^{\mathrm{post}}=\mathrm{tr}(\boldsymbol{\Sigma}_{l}^{t})/|\mathcal{D}_{l}|$ and the mean estimate $\mathbf{x}_{\mathrm{A},l}^{\mathrm{post}}=\boldsymbol{\mu}_{l}^{t}$ are transferred to the CPU to compute the extrinsic information $v_{\mathrm{A},l}^{\mathrm{ext}}$ in (16) and $\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}$ in (17), respectively. The extrinsic information $\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}$ can be regarded as an AWGN observation given by

\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}=\mathbf{x}+\mathbf{n}_{\mathrm{A},l}^{\mathrm{ext}}, \qquad (10)

where $\mathbf{n}_{\mathrm{A},l}^{\mathrm{ext}}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},v_{\mathrm{A},l}^{\mathrm{ext}}\mathbf{I})$ [33]. As a result, the linear model in (3) is decoupled into $K$ parallel and independent AWGN channels with equivalent noise variance $v_{\mathrm{A},l}^{\mathrm{ext}}$ for each AP. Subsequently, the CPU collects all extrinsic means $\{\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}\}_{l=1}^{L}$ and variances $\{v_{\mathrm{A},l}^{\mathrm{ext}}\}_{l=1}^{L}$ and performs MRC. (In this paper, we focus on linear combining of the local estimates from the APs, $\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}=\sum_{l=1}^{L}a_{l}\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}$. The optimal combining is MRC, which maximizes the post-combination SNR of the final AWGN observation; the detailed derivation is given in Appendix A-B.) The MRC expressions (18) and (19) are obtained by maximizing the post-combination signal-to-noise ratio (SNR) of the AWGN observation $\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}$ at the CPU, given by

\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}=\mathbf{x}+\mathbf{n}_{\mathrm{A}}^{\mathrm{ext}}, \qquad (11)

where $\mathbf{n}_{\mathrm{A}}^{\mathrm{ext}}\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},v_{\mathrm{A}}^{\mathrm{ext}}\mathbf{I})$. The CPU uses the posterior mean estimator to detect the signal $\mathbf{x}$ from the equivalent AWGN model (11). The posterior mean and variance are computed by the posterior MMSE estimator for the equivalent AWGN model in (20) and (21). As the transmitted symbols are drawn from the $M$-QAM set $\mathcal{S}=\{s_{1},s_{2},\ldots,s_{M}\}$, the corresponding expressions for each element in (20) and (21) are given by

[x_{\mathrm{B}}^{\mathrm{post}}]_{k}=\mathtt{E}\{x_{k}|[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}}\}=\frac{\sum_{s_{i}\in\mathcal{S}}s_{i}\mathcal{N}_{\mathbb{C}}(s_{i};[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}})p(s_{i})}{\sum_{s_{i}\in\mathcal{S}}\mathcal{N}_{\mathbb{C}}(s_{i};[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}})p(s_{i})}, \qquad (12)

[v_{\mathrm{B}}^{\mathrm{post}}]_{k}=\mathtt{var}\{x_{k}|[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}}\}=\frac{\sum_{s_{i}\in\mathcal{S}}|s_{i}|^{2}\mathcal{N}_{\mathbb{C}}(s_{i};[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}})p(s_{i})}{\sum_{s_{i}\in\mathcal{S}}\mathcal{N}_{\mathbb{C}}(s_{i};[x_{\mathrm{A}}^{\mathrm{ext}}]_{k},v_{\mathrm{A}}^{\mathrm{ext}})p(s_{i})}-|[x_{\mathrm{B}}^{\mathrm{post}}]_{k}|^{2}, \qquad (13)

where $[x_{\mathrm{B}}^{\mathrm{post}}]_{k}$, $[v_{\mathrm{B}}^{\mathrm{post}}]_{k}$, and $[x_{\mathrm{A}}^{\mathrm{ext}}]_{k}$ are the $k$-th elements of $\mathbf{x}_{\mathrm{B}}^{\mathrm{post}}$, $\mathbf{v}_{\mathrm{B}}^{\mathrm{post}}$, and $\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}$, respectively. The posterior mean $\mathbf{x}_{\mathrm{B}}^{\mathrm{post}}$ and variance $\mathbf{v}_{\mathrm{B}}^{\mathrm{post}}$ are then utilized to compute the extrinsic information $\lambda_{l}^{(t)}$ and $\boldsymbol{\gamma}_{l}^{(t)}$ for each AP in (22) and (23), where the function $\mathrm{mean}(\cdot)$ computes the mean of a vector. Finally, the extrinsic information $\lambda_{l}^{(t)}$ and $\boldsymbol{\gamma}_{l}^{(t)}$ is transferred to each AP for the next iteration. The whole procedure is executed iteratively until it is terminated by a stopping criterion or a maximum number of iterations.
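The per-element posterior statistics (12)-(13) can be computed by a small vectorized denoiser. The sketch below assumes QPSK with a uniform prior (so the $p(s_i)$ factors cancel) and adds a row-wise exponent shift for numerical stability, which also cancels in the normalization; the function name and test values are our own.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def awgn_denoiser(x_ext, v_ext, constellation=QPSK):
    """Per-element posterior mean/variance of (12)-(13) for a uniform prior."""
    d2 = np.abs(np.asarray(x_ext)[:, None] - constellation[None, :]) ** 2
    # Subtract the row minimum before exponentiating; the common factor and the
    # uniform prior p(s_i) both cancel in the normalization.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / v_ext)
    w /= w.sum(axis=1, keepdims=True)
    x_post = w @ constellation                                # eq. (12)
    v_post = w @ np.abs(constellation) ** 2 - np.abs(x_post) ** 2  # eq. (13)
    return x_post, v_post

obs = np.array([0.8 + 0.6j, -0.7 - 0.8j])
x_post, v_post = awgn_denoiser(obs, 1e-3)  # low noise: snaps to nearest symbols
```

At low noise the weights become one-hot, so the posterior mean collapses onto the nearest constellation point and the posterior variance approaches zero; at high noise the posterior mean shrinks toward the prior mean instead.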

Input: Received signal 𝐲l{\mathbf{y}}_{l}, channel matrix 𝐇l{\mathbf{H}}_{l}, and noise level σ2\sigma^{2}.
Output: Recovered signal 𝐱Bpost{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}}.
Initialize: 𝜸l(0)𝟎\boldsymbol{\gamma}_{l}^{(0)}\leftarrow\mathbf{0}, λl(0)1Ex\lambda_{l}^{(0)}\leftarrow\frac{1}{E_{x}}
for t=1,,Tt=1,\cdots,T  do
         Module A in APs:
      (1) Compute the posterior mean and variance of 𝐱A,l{\mathbf{x}}_{\mathrm{A},l}:
vA,lpost𝚺lt=(σ2𝐇lH𝐇l+λl(t1)𝐈)1v_{\mathrm{A},l}^{\mathrm{post}}\leftarrow\boldsymbol{\Sigma}_{l}^{t}=\bigg{(}\sigma^{-2}{\mathbf{H}}^{H}_{l}{\mathbf{H}}_{l}+\lambda_{l}^{(t-1)}{\mathbf{I}}\bigg{)}^{-1} (14)
𝐱A,lpost𝝁lt=𝚺lt(σ2𝐇lH𝐲l+𝜸l(t1)).{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{post}}\leftarrow\boldsymbol{\mu}_{l}^{t}=\boldsymbol{\Sigma}_{l}^{t}\bigg{(}\sigma^{-2}{\mathbf{H}}_{l}^{H}{\mathbf{y}}_{l}+\boldsymbol{\gamma}_{l}^{(t-1)}\bigg{)}. (15)
Module B in CPU:
      (2) Compute the extrinsic mean and variance of 𝐱A,l{\mathbf{x}}_{\mathrm{A},l}:
vA,lext(1vA,lpostλl(t1))1v_{\mathrm{A},l}^{\mathrm{ext}}\leftarrow\bigg{(}\frac{1}{v_{\mathrm{A},l}^{\mathrm{post}}}-\lambda_{l}^{(t-1)}\bigg{)}^{-1} (16)
𝐱A,lextvA,lext(𝝁ltvA,lpost𝜸l(t1))1.{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}}\leftarrow v_{\mathrm{A},l}^{\mathrm{ext}}\bigg{(}\frac{\boldsymbol{\mu}_{l}^{t}}{v_{\mathrm{A},l}^{\mathrm{post}}}-\boldsymbol{\gamma}_{l}^{(t-1)}\bigg{)}^{-1}. (17)
(3) MRC combining of 𝐱A,l{\mathbf{x}}_{\mathrm{A},l}:
1vAext=l=1L1vA,lext\frac{1}{v_{\mathrm{A}}^{\mathrm{ext}}}=\sum_{l=1}^{L}\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext}}} (18)
𝐱Aext=vAextl=1L𝐱A,lextvA,lext.{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}}=v_{\mathrm{A}}^{\mathrm{ext}}\sum_{l=1}^{L}\frac{{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}}}{v_{\mathrm{A},l}^{\mathrm{ext}}}. (19)
(4) Compute the posterior mean and variance of 𝐱B{\mathbf{x}}_{\mathrm{B}}:
𝐱Bpost=𝙴{𝐱|𝐱Aext,vAext}{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}}=\mathtt{E}\{{\mathbf{x}}|{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}}\} (20)
𝐯Bpost=𝚟𝚊𝚛{𝐱|𝐱Aext,vAext}.{\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}}=\mathtt{var}\{{\mathbf{x}}|{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}}\}. (21)
(5) Compute the extrinsic mean and variance of ${\mathbf{x}}_{\mathrm{B},l}$:
1vB,lextλl(t)=1mean(𝐯Bpost)1vA,lext\frac{1}{v_{\mathrm{B},l}^{\mathrm{ext}}}\leftarrow\lambda_{l}^{(t)}=\frac{1}{\mathrm{mean}({\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}})}-\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext}}} (22)
$\frac{{\mathbf{x}}_{\mathrm{B},l}^{\mathrm{ext}}}{v_{\mathrm{B},l}^{\mathrm{ext}}}\leftarrow\boldsymbol{\gamma}_{l}^{(t)}=\frac{{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}}}{\mathrm{mean}({\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}})}-\frac{{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}}}{v_{\mathrm{A},l}^{\mathrm{ext}}}.$ (23)
Algorithm 1 Distributed EP for cell-free massive MIMO detection
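To make the message flow of Algorithm 1 concrete, the following NumPy sketch implements one iteration for unit-energy QPSK, assuming each AP serves all $K$ users. The per-AP loop would run locally at the APs, while the combining and the posterior step run at the CPU. Function names and the numerical floor on the posterior variance are our additions for illustration and stability, not part of the algorithm statement; the extrinsic mean update subtracts the cavity message ${\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}}/v_{\mathrm{A},l}^{\mathrm{ext}}$, the standard EP form.

```python
import numpy as np

def qpsk_posterior(x_ext, v_ext):
    """Posterior mean/variance of QPSK symbols (+-1+-1j)/sqrt(2) observed
    through the decoupled AWGN channel x_ext = x + n, n ~ CN(0, v_ext)."""
    a = 1.0 / np.sqrt(2.0)
    mr = a * np.tanh(2.0 * a * x_ext.real / v_ext)  # per-dimension tanh rule
    mi = a * np.tanh(2.0 * a * x_ext.imag / v_ext)
    mean = mr + 1j * mi
    var = 1.0 - mr**2 - mi**2                       # E|x|^2 = 1 for unit energy
    return mean, var

def deep_iteration(Y, H, sigma2, gamma, lam):
    """One iteration of the distributed EP detector. Y[l] and H[l] are the
    per-AP received signal and channel; gamma[l] (vector) and lam[l]
    (scalar) are the extrinsic messages from the previous iteration."""
    L, K = len(H), H[0].shape[1]
    x_ext, v_ext = [], []
    for l in range(L):                       # Module A, executed at each AP
        Sigma = np.linalg.inv(H[l].conj().T @ H[l] / sigma2
                              + lam[l] * np.eye(K))                      # (14)
        mu = Sigma @ (H[l].conj().T @ Y[l] / sigma2 + gamma[l])          # (15)
        v_post = np.real(np.trace(Sigma)) / K
        v = 1.0 / (1.0 / v_post - lam[l])                                # (16)
        x_ext.append(v * (mu / v_post - gamma[l]))                       # (17)
        v_ext.append(v)
    vA = 1.0 / sum(1.0 / v for v in v_ext)                               # (18)
    xA = vA * sum(x / v for x, v in zip(x_ext, v_ext))                   # (19)
    xB, vB = qpsk_posterior(xA, vA)                                      # (20)-(21)
    vB_mean = max(np.mean(vB), 1e-8)         # numerical floor (our addition)
    new_lam = [1.0 / vB_mean - 1.0 / v_ext[l] for l in range(L)]         # (22)
    new_gamma = [xB / vB_mean - x_ext[l] / v_ext[l] for l in range(L)]   # (23)
    return xB, new_gamma, new_lam
```

In a deployment, only the scalars $(\lambda_l, v_{\mathrm{A},l}^{\mathrm{ext}})$ and the $K$-dimensional vectors $(\boldsymbol{\gamma}_l, {\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}})$ would cross the fronthaul, which is what the overhead accounting in Table II reflects.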

III-D Computational Complexity and Fronthaul Overhead

In the following, we analyze the computational complexity and fronthaul overhead of different detectors, summarized in Tables I and II, respectively. For the proposed distributed EP detector, the computational complexity at each AP is dominated by the LMMSE estimation of the signal ${\mathbf{x}}_{\mathrm{A},l}^{\mathrm{post}}$, which costs $\mathcal{O}(N|\mathcal{D}_{l}|^{2})$ because of the matrix inversion required in (14), while the computational complexity at the CPU is $\mathcal{O}(L|\mathcal{D}_{l}|)$ per iteration. Furthermore, if the number of antennas $N$ is smaller than $|\mathcal{D}_{l}|$, we can apply the matrix inversion lemma to carry out the matrix inversion in (14) as follows:

(σ2𝐇lH𝐇l+𝐃)1=𝐃1σ2𝐃1𝐇lH(𝐈+σ2𝐇l𝐃1𝐇lH)1𝐇l𝐃1,(\sigma^{-2}{\mathbf{H}}_{l}^{H}{\mathbf{H}}_{l}+{\mathbf{D}})^{-1}={\mathbf{D}}^{-1}-\sigma^{-2}{\mathbf{D}}^{-1}{\mathbf{H}}_{l}^{H}({\mathbf{I}}+\sigma^{-2}{\mathbf{H}}_{l}{\mathbf{D}}^{-1}{\mathbf{H}}_{l}^{H})^{-1}{\mathbf{H}}_{l}{\mathbf{D}}^{-1},

where ${\mathbf{D}}=\lambda_{l}^{(t-1)}{\mathbf{I}}$, which reduces the computational complexity to $\mathcal{O}(|\mathcal{D}_{l}|N^{2})$. Therefore, for $T$ iterations, the overall computational complexity is $\mathcal{O}(T|\mathcal{D}_{l}|N^{2})$ at each AP and $\mathcal{O}(TL|\mathcal{D}_{l}|)$ at the CPU. As observed in Table I, compared with other distributed linear receivers, the distributed EP detector mainly increases the computational complexity at the CPU.
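The matrix inversion lemma above can be checked numerically. A small NumPy sanity check (our own illustration): both sides agree, while the right-hand side only requires an $N\times N$ inverse instead of a $|\mathcal{D}_{l}|\times|\mathcal{D}_{l}|$ one.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 4, 16                       # N antennas < |D_l| users: the regime where the lemma pays off
H = (rng.standard_normal((N, D)) + 1j * rng.standard_normal((N, D))) / np.sqrt(2)
sigma2, lam = 0.5, 2.0

# Direct |D_l| x |D_l| inversion.
direct = np.linalg.inv(H.conj().T @ H / sigma2 + lam * np.eye(D))

# Matrix inversion lemma: only an N x N inverse is needed (D is diagonal here).
Dinv = np.eye(D) / lam
small = np.linalg.inv(np.eye(N) + H @ Dinv @ H.conj().T / sigma2)
woodbury = Dinv - Dinv @ H.conj().T @ small @ H @ Dinv / sigma2
```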

TABLE I: Complexity of different detectors.
Detector | AP | CPU
Distributed EP | $8NT|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+1)+6T|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+2)$ | $LT(1+2|\mathcal{D}_{l}|)+|\mathcal{D}_{l}|(T-1)(7M+2)$
Centralized MMSE | $0$ | $|\mathcal{D}_{l}|[8NL(|\mathcal{D}_{l}|+1)+2(4|\mathcal{D}_{l}|+3)]$
Partially distributed MMSE | $8N|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+1)+6|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+2)$ | $L(1+2|\mathcal{D}_{l}|)$
Distributed MMSE | $8N|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+1)+6|\mathcal{D}_{l}|(|\mathcal{D}_{l}|+2)$ | $0$

We further compare the number of complex scalars that need to be transmitted from the APs to the CPU via the fronthauls of different detectors in Table II, where $\tau_{c}$ and $\tau_{p}$ denote the coherence time and pilot length, respectively. The results for the four baseline detectors are from [27]. With the proposed detector, $\sum\limits_{l=1}^{L}(\tau_{c}-\tau_{p})2T(|\mathcal{D}_{l}|+1)$ scalars need to be passed from the APs to the CPU, where $T$ denotes the total number of iterations, and no statistical parameters need to be passed. The fronthaul overhead of the proposed distributed EP detector is similar to those of the centralized MMSE and EP detectors; which is lower depends on the specific system parameters.

TABLE II: Fronthaul overhead of different detectors.
Detector | Coherence block | Statistical parameters
Distributed EP | $\sum_{l=1}^{L}(\tau_{c}-\tau_{p})2T(|\mathcal{D}_{l}|+1)$ | $0$
Centralized MMSE | $\tau_{c}NL$ | $KLN^{2}/2$
Partially distributed MMSE with LSFD | $(\tau_{c}-\tau_{p})KL$ | $KL+(L^{2}K^{2}+KL)/2$
Partially distributed MMSE with average decoding | $(\tau_{c}-\tau_{p})KL$ | $0$
Distributed MMSE | $0$ | $0$
Centralized EP | $\tau_{c}NL$ | $KLN^{2}/2$

IV Distributed Iterative Channel Estimation and Data Detection

In Section III, we developed a distributed EP detector assuming perfect CSI. However, the channel matrix is typically estimated at the receiver with uplink training, and thus channel estimation errors should be considered in the detection stage. In this section, we will propose a distributed ICD receiver for cell-free massive MIMO to handle imperfect CSI.

IV-A ICD Algorithm

ICD has been explored for MIMO [38, 39], OFDM [40], and massive MIMO [41] systems with low-resolution analog-to-digital converters (ADCs) [42]. Recently, it has been investigated in cell-free massive MIMO to improve the BER performance and reduce the pilot overhead. Unlike [43], which solves a complex biconvex optimization problem with high complexity, we develop a low-complexity turbo-like ICD architecture. The iterative procedure is summarized in Algorithm 2.

As illustrated in Fig. 3, the proposed ICD scheme employs an idea similar to iterative decoding. At each AP, the channel estimator and Module A of the proposed distributed EP detector are executed, and they exchange extrinsic information iteratively. In the ICD processing stage, channel estimation is first performed based on the received pilot signal and the transmitted pilot. Then, the data detector performs signal detection by taking both the channel estimation error and the channel statistics into consideration. It then feeds back the detected data and the detection error to the channel estimator. Finally, data-aided channel estimation is performed with the help of the detected data. The whole ICD procedure is executed iteratively until a stopping criterion is met or a maximum number of ICD iterations is reached.

Similar to Algorithm 1, the inputs of the ICD scheme for each AP in each time slot are the pilot signal matrix $\mathbf{X}^{{\mathrm{p}}}$, the received signal matrix corresponding to the pilot matrix, ${\mathbf{Y}}^{{\mathrm{p}}}$, and that corresponding to the data matrix, ${\mathbf{Y}}^{{\mathrm{d}}}$. In the $r$-th ICD iteration (we use $r$ to count the data feedbacks from the data detector to the channel estimator; e.g., $r=1$ means no data feedback, while $r=4$ means three data feedbacks have been performed), $\hat{{\mathbf{H}}}^{(r)}$ is the estimated channel matrix, $\hat{{\mathbf{X}}}^{({\mathrm{d}},r)}$ is the estimated data matrix, and $\hat{{\mathbf{V}}}_{\mathrm{est},l}$ and $\hat{{\mathbf{V}}}_{\mathrm{det},l}$ are used to compute the covariance matrices of the equivalent noise in the signal detector and channel estimator, respectively. The final output of the signal detector is the detected data matrix $\hat{{\mathbf{X}}}^{(\mathrm{d},R)}$, where $R$ is the total number of ICD iterations. Compared with the conventional receiver, where the channel estimator and signal detector are designed separately, the ICD scheme improves the BER performance by taking the characteristics of the channel estimation error and the channel statistics into account. After signal detection, the detected payload data are utilized for channel estimation. Next, we elaborate on the whole procedure.

Input: Received signal 𝐘p{\mathbf{Y}}^{{\mathrm{p}}} and 𝐘d{\mathbf{Y}}^{{\mathrm{d}}}, pilot signal matrix, 𝐗p\mathbf{X}^{{\mathrm{p}}}, spatial correlation matrix 𝐑kl{\mathbf{R}}_{kl} , and noise level σ2\sigma^{2}.
Output: Recovered signal $\hat{{\mathbf{X}}}^{(\mathrm{d},R)}$ (whose columns are the per-slot posterior means ${\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}}$).
Initialize: 𝜸l(0)𝟎\boldsymbol{\gamma}_{l}^{(0)}\leftarrow\mathbf{0}, λl(0)1Ex\lambda_{l}^{(0)}\leftarrow\frac{1}{E_{x}}
for r=1,,Rr=1,\cdots,R  do
      
      (1) Perform LMMSE channel estimation to obtain 𝐇^(r)\hat{{\mathbf{H}}}^{(r)}
      (2) Perform distributed EP-based MIMO detection to obtain 𝐗^(d,r)\hat{{\mathbf{X}}}^{(\mathrm{d},r)}
      (3) Data feedback to channel estimator
Algorithm 2 Distributed EP-based ICD for cell-free massive MIMO

IV-B LMMSE Estimator

We adopt the classical LMMSE estimator in the channel estimation stage at each AP. To facilitate the formulation of the channel estimation problem, we apply vectorization to (9) and rewrite it as

𝐲lp=𝐀p𝐡¯l+𝐧lp,{\mathbf{y}}^{{\mathrm{p}}}_{l}={\mathbf{A}}^{{\mathrm{p}}}\bar{{\mathbf{h}}}_{l}+{\mathbf{n}}^{{\mathrm{p}}}_{l}, (24)

where 𝐀p=(𝐗p)T𝐈NτpN×KN{\mathbf{A}}^{{\mathrm{p}}}=({\mathbf{X}}^{{\mathrm{p}}})^{T}\otimes{\mathbf{I}}_{N}\in\mathbb{C}^{\tau_{p}N\times KN}, 𝐲lp=vec(𝐘lp)τpN×1{\mathbf{y}}^{{\mathrm{p}}}_{l}=\mathrm{vec}({\mathbf{Y}}^{{\mathrm{p}}}_{l})\in\mathbb{C}^{\tau_{p}N\times 1}, 𝐡¯l=vec(𝐇l)NK×1\bar{{\mathbf{h}}}_{l}=\mathrm{vec}({\mathbf{H}}_{l})\in\mathbb{C}^{NK\times 1}, and 𝐧lp=vec(𝐍lp)τpN×1{\mathbf{n}}^{{\mathrm{p}}}_{l}=\mathrm{vec}(\mathbf{N}^{{\mathrm{p}}}_{l})\in\mathbb{C}^{\tau_{p}N\times 1}. We denote \otimes as the matrix Kronecker product and vec()\mathrm{vec}(\cdot) as the vectorization operation. In the pilot-only based channel estimation stage, the LMMSE estimate of 𝐡¯l\bar{{\mathbf{h}}}_{l} is given by

𝐡¯^lp=𝐑𝐡¯l𝐡¯l(𝐀p)H(𝐀p𝐑𝐡¯l𝐡¯l(𝐀p)H+σ2𝐈τpN)1𝐲lp,\hat{\bar{{\mathbf{h}}}}^{\mathrm{p}}_{l}={\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}({\mathbf{A}}^{{\mathrm{p}}})^{H}({\mathbf{A}}^{{\mathrm{p}}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}({\mathbf{A}}^{{\mathrm{p}}})^{H}+\sigma^{2}{\mathbf{I}}_{\tau_{p}N})^{-1}{\mathbf{y}}_{l}^{{\mathrm{p}}}, (25)

where 𝐑𝐡¯l𝐡¯l{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}} is the channel covariance matrix and depends on the spatial correlation matrix 𝐑kl{\mathbf{R}}_{kl}. Based on the property of the LMMSE estimator, 𝐡¯^lp\hat{\bar{{\mathbf{h}}}}_{l}^{{\mathrm{p}}} is a Gaussian random vector, and the channel estimation error vector Δ𝐡lp=𝐡¯^lp𝐡l\Delta{{\mathbf{h}}^{{\mathrm{p}}}_{l}}=\hat{\bar{{\mathbf{h}}}}^{{\mathrm{p}}}_{l}-{\mathbf{h}}_{l} is also a Gaussian random vector with zero-mean and covariance matrix 𝐑Δ𝐡lp{\mathbf{R}}_{\Delta{{\mathbf{h}}_{l}^{\mathrm{p}}}} given by

𝐑Δ𝐡lp=𝐑𝐡¯l𝐡¯l𝐑𝐡¯l𝐡¯l(𝐀p)H(𝐀p𝐑𝐡¯l𝐡¯l(𝐀p)H+σ2𝐈τpN)1𝐀p𝐑𝐡¯l𝐡¯l.{\mathbf{R}}_{\Delta{{\mathbf{h}}_{l}^{\mathrm{p}}}}={\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}-{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}({\mathbf{A}}^{{\mathrm{p}}})^{H}({\mathbf{A}}^{{\mathrm{p}}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}({\mathbf{A}}^{{\mathrm{p}}})^{H}+\sigma^{2}{\mathbf{I}}_{\tau_{p}N})^{-1}{\mathbf{A}}^{{\mathrm{p}}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}. (26)
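A minimal NumPy sketch of the pilot-only LMMSE estimator (25) and its error covariance (26) for the vectorized model (24). The function name and interface are ours; the channel covariance $R$ is passed in generically.

```python
import numpy as np

def lmmse_channel_estimate(Yp, Xp, R, sigma2):
    """Pilot-only LMMSE estimate of h = vec(H_l) from Y_p = H_l X_p + N_p,
    via the vectorized model (24): y = (X_p^T kron I_N) h + n."""
    N = Yp.shape[0]
    A = np.kron(Xp.T, np.eye(N))               # A^p, (tau_p N) x (K N)
    y = Yp.reshape(-1, order='F')              # vec(Y_p), column stacking
    G = A @ R @ A.conj().T + sigma2 * np.eye(A.shape[0])
    W = R @ A.conj().T @ np.linalg.inv(G)      # LMMSE filter, (25)
    h_hat = W @ y
    R_err = R - W @ A @ R                      # error covariance, (26)
    return h_hat, R_err
```

With orthogonal pilots and vanishing noise, the estimate approaches the true vectorized channel and the error covariance approaches zero.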

IV-C Signal Detection with Channel Estimation Errors

After performing channel estimation, the estimated channel is used for data detection at each AP. In the data transmission stage, the received signal vector ${\mathbf{y}}^{{\mathrm{d}}}[n]$ corresponding to the $n$-th data symbol in each coherence block can be expressed as ${\mathbf{y}}^{{\mathrm{d}}}[n]={\mathbf{H}}{\mathbf{x}}^{{\mathrm{d}}}[n]+{\mathbf{n}}^{{\mathrm{d}}}[n]$, where ${\mathbf{n}}^{{\mathrm{d}}}[n]\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\sigma^{2}{\mathbf{I}}_{N})$ is the AWGN vector. (As each AP has a similar signal model, we drop the AP index $l$ and write ${\mathbf{y}}^{{\mathrm{d}}}[n]$, ${\mathbf{H}}$, ${\mathbf{x}}^{{\mathrm{d}}}[n]$, and ${\mathbf{n}}^{{\mathrm{d}}}[n]$ for ${\mathbf{y}}_{l}^{{\mathrm{d}}}[n]$, ${\mathbf{H}}_{l}$, ${\mathbf{x}}_{l}^{{\mathrm{d}}}[n]$, and ${\mathbf{n}}_{l}^{{\mathrm{d}}}[n]$, respectively; the same convention applies to the symbols below.) We can rewrite it in matrix form as ${\mathbf{Y}}^{{\mathrm{d}}}={\mathbf{H}}{\mathbf{X}}^{{\mathrm{d}}}+{\mathbf{N}}^{{\mathrm{d}}}$, where ${\mathbf{N}}^{{\mathrm{d}}}=\left[{\mathbf{n}}^{{\mathrm{d}}}[1],\ldots,{\mathbf{n}}^{{\mathrm{d}}}[\tau_{d}]\right]\in{\mathbb{C}}^{N\times\tau_{d}}$ is the AWGN matrix in the data transmission stage. Denote the estimated channel $\hat{{\mathbf{H}}}\in\mathbb{C}^{N\times K}$ as $\hat{{\mathbf{H}}}={\mathbf{H}}+\Delta{\mathbf{H}}$, where $\Delta{\mathbf{H}}$ is the channel estimation error. The signal detection problem can then be formulated as

𝐲d[n]=𝐇𝐱d[n]+𝐧d[n]=𝐇^𝐱d[n]+𝐧^d[n],\displaystyle{\mathbf{y}}^{{\mathrm{d}}}[n]={\mathbf{H}}{\mathbf{x}}^{{\mathrm{d}}}[n]+{\mathbf{n}}^{{\mathrm{d}}}[n]=\hat{{\mathbf{H}}}{\mathbf{x}}^{{\mathrm{d}}}[n]+\hat{{\mathbf{n}}}^{{\mathrm{d}}}[n], (27)

where 𝐧^d[n]=𝐧d[n]Δ𝐇𝐱d[n]\hat{{\mathbf{n}}}^{{\mathrm{d}}}[n]={\mathbf{n}}^{{\mathrm{d}}}[n]-\Delta{\mathbf{H}}{\mathbf{x}}^{{\mathrm{d}}}[n] is the equivalent noise in the signal detector and the statistical characterization of 𝐧^d[n]\hat{{\mathbf{n}}}^{{\mathrm{d}}}[n] is given in Appendix B-A.

IV-D Data-Aided Channel Estimation

To further improve the BER performance, a data-aided channel estimation approach is adopted in the channel estimation stage. First, conventional pilot-only channel estimation is performed and the transmitted symbols are detected. Then, the detected symbols are fed back to the channel estimator as additional pilot symbols to refine the channel estimate. In the channel training stage, the pilot matrix ${\mathbf{X}}^{{\mathrm{p}}}$ is transmitted. The received signal matrix corresponding to the pilot matrix, ${\mathbf{Y}}^{{\mathrm{p}}}\in\mathbb{C}^{N\times\tau_{p}}$, is expressed as ${\mathbf{Y}}^{{\mathrm{p}}}={\mathbf{H}}{\mathbf{X}}^{{\mathrm{p}}}+{\mathbf{N}}^{{\mathrm{p}}}$, where ${\mathbf{N}}^{{\mathrm{p}}}=\left[{\mathbf{n}}^{{\mathrm{p}}}[1],\ldots,\mathbf{n}^{{\mathrm{p}}}[\tau_{p}]\right]\in\mathbb{C}^{N\times\tau_{p}}$ is the AWGN matrix with columns $\mathbf{n}^{{\mathrm{p}}}[n]\sim\mathcal{N}_{\mathbb{C}}(\mathbf{0},\sigma^{2}\mathbf{I}_{N})$ for $n=1,\ldots,\tau_{p}$. The estimated data matrix $\hat{\mathbf{X}}^{{\mathrm{d}}}$ can be expressed as $\hat{{\mathbf{X}}}^{{\mathrm{d}}}={\mathbf{X}}^{{\mathrm{d}}}+{\mathbf{E}}^{{\mathrm{d}}}$, where $\mathbf{E}^{{\mathrm{d}}}$ is the signal detection error matrix. In the data-aided channel estimation stage, the estimate $\hat{{\mathbf{X}}}^{{\mathrm{d}}}$ is fed back to the channel estimator as additional pilots. Then, the received signal matrix ${\mathbf{Y}}^{{\mathrm{d}}}$ corresponding to $\hat{{\mathbf{X}}}^{{\mathrm{d}}}$ can be expressed as

𝐘d=𝐇𝐗d+𝐍d=𝐇𝐗^d+𝐍^d,\displaystyle{\mathbf{Y}}^{{\mathrm{d}}}={\mathbf{H}}{\mathbf{X}}^{{\mathrm{d}}}+{\mathbf{N}}^{{\mathrm{d}}}={\mathbf{H}}\hat{{\mathbf{X}}}^{{\mathrm{d}}}+\hat{{\mathbf{N}}}^{{\mathrm{d}}}, (28)

where 𝐍^d=𝐍d𝐇𝐄d\hat{{\mathbf{N}}}^{{\mathrm{d}}}={\mathbf{N}}^{{\mathrm{d}}}-{\mathbf{H}}{\mathbf{E}}^{{\mathrm{d}}} is the equivalent noise for the data-aided pilot 𝐗^d\hat{{\mathbf{X}}}^{{\mathrm{d}}}. The statistical information of the nn-th column of 𝐍^d\hat{{\mathbf{N}}}^{{\mathrm{d}}} is 𝐧^d[n]𝒩(𝟎,𝐕^d[n])\hat{\mathbf{n}}^{{\mathrm{d}}}[n]\sim\mathcal{N}_{{\mathbb{C}}}(\mathbf{0},\hat{\mathbf{V}}^{{\mathrm{d}}}[n]) for n=1,,τdn=1,\ldots,\tau_{d}, where 𝐕^d[n]\hat{\mathbf{V}}^{{\mathrm{d}}}[n] will be utilized for data-aided channel estimation. We denote 𝐘=[𝐘p,𝐘d]{\mathbf{Y}}=[{\mathbf{Y}}^{{\mathrm{p}}},{\mathbf{Y}}^{{\mathrm{d}}}] as the received signal matrix corresponding to the overall transmitted signal. By concatenating the received pilot and data signals, we have

𝐘=[𝐘p𝐘d]=𝐇𝐗+𝐍,\displaystyle{\mathbf{Y}}=\left[{\mathbf{Y}}^{{\mathrm{p}}}\ {\mathbf{Y}}^{{\mathrm{d}}}\right]={\mathbf{H}}{\mathbf{X}}+{\mathbf{N}}, (29)

where ${\mathbf{X}}=[{\mathbf{X}}^{{\mathrm{p}}},\hat{{\mathbf{X}}}^{{\mathrm{d}}}]$ and ${\mathbf{N}}=[{\mathbf{N}}^{{\mathrm{p}}},\hat{{\mathbf{N}}}^{{\mathrm{d}}}]$ can be interpreted as the equivalent pilot signal and noise in the data-aided channel estimation stage, respectively. The statistical characterization of the equivalent noise and the corresponding LMMSE estimator are derived in Appendix B-B.
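To illustrate how the concatenated model (29) can be used, the following sketch refines the LMMSE channel estimate with detected data treated as extra pilots. The white-noise approximation of the per-column equivalent noise (variance $\sigma^{2}+g\,v_{\mathrm{det}}$, with $g$ the average aggregate channel gain) is a simplification of ours; the exact per-column covariance $\hat{\mathbf{V}}^{\mathrm{d}}[n]$ is derived in Appendix B-B.

```python
import numpy as np

def data_aided_lmmse(Yp, Xp, Yd, Xd_hat, R, sigma2, v_det):
    """Sketch of data-aided refinement: re-run LMMSE on the concatenated
    block (29), with detected symbols Xd_hat acting as extra pilots.
    v_det is the (scalar) variance of the detection error entries."""
    N = Yp.shape[0]
    g = np.real(np.trace(R)) / N           # average aggregate channel gain
    X = np.hstack([Xp, Xd_hat])            # equivalent pilots, (29)
    Y = np.hstack([Yp, Yd])
    # Per-column noise variances: sigma2 on true pilots, inflated on data.
    w = np.concatenate([np.full(Xp.shape[1], sigma2),
                        np.full(Xd_hat.shape[1], sigma2 + g * v_det)])
    A = np.kron(X.T, np.eye(N))
    Wn = np.kron(np.diag(w), np.eye(N))    # equivalent noise covariance
    G = A @ R @ A.conj().T + Wn
    return R @ A.conj().T @ np.linalg.solve(G, Y.reshape(-1, order='F'))
```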

V Performance Analysis

In this section, we provide an asymptotic performance analysis of the proposed distributed EP detector. We consider $L,K\rightarrow\infty$ with the fixed ratios

αl=KN,α=KLN.\alpha_{l}=\frac{K}{N},\alpha=\frac{K}{LN}. (30)

After proposing a state evolution analysis framework, we will characterize the fixed points of the proposed detector.

V-A State Evolution Analysis

We first provide a state evolution analysis framework to predict the asymptotic performance of the distributed EP detector in the large-system limit, i.e., as $L,K\rightarrow\infty$.

Proposition 1.

In the large-system limit, the asymptotic behavior (such as MSE and BER) of Algorithm 1 can be described by the following equations:

$v_{\mathrm{A},l}^{\mathrm{ext},t}=\Bigg{(}\bigg{(}\frac{1}{K_{l}}\sum_{k=1}^{K_{l}}\frac{1}{\sigma^{-2}\tau_{k,l}+\lambda_{l}^{(t-1)}}\bigg{)}^{-1}-\lambda_{l}^{(t-1)}\Bigg{)}^{-1}$ (31a)
vAext,t\displaystyle v_{\mathrm{A}}^{\mathrm{ext},t} =(l=1L1vA,lext,t)1\displaystyle=\bigg{(}\sum_{l=1}^{L}\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},t}}\bigg{)}^{-1} (31b)
λl(t)\displaystyle\lambda_{l}^{(t)} =1MSE(vAext,t)1vA,lext,t.\displaystyle=\frac{1}{\mathrm{MSE}(v_{\mathrm{A}}^{\mathrm{ext},t})}-\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},t}}. (31c)

\blacksquare

The function MSE()\mathrm{MSE}(\cdot) is given by

$\mathrm{MSE}(v_{\mathrm{A},l}^{\mathrm{ext},t})\triangleq\mathtt{E}\big{\{}|x-\mathtt{E}\{x|x_{\mathrm{A},l}^{\mathrm{ext},t},v_{\mathrm{A},l}^{\mathrm{ext},t}\}|^{2}\big{\}},$ (32)

where the expectation is taken with respect to $x$, and $\tau_{k,l}$ denotes the $k$-th eigenvalue of ${\mathbf{H}}^{H}_{l}{\mathbf{H}}_{l}$.

The state equations illustrated in Proposition 1 can be proved rigorously in a similar way as [44]. To better understand the state evolution analysis, we give an intuitive derivation here. In Algorithm 1, we have explicit expressions for computing the mean and variance for each module. By substituting (14) into (16), we have the following expression

$v_{\mathrm{A},l}^{\mathrm{ext},t}=\Bigg{(}\bigg{(}\frac{1}{K}\mathrm{tr}\Big{(}\big{(}\sigma^{-2}{\mathbf{H}}^{H}_{l}{\mathbf{H}}_{l}+\lambda_{l}^{(t-1)}{\mathbf{I}}\big{)}^{-1}\Big{)}\bigg{)}^{-1}-\lambda_{l}^{(t-1)}\Bigg{)}^{-1}.$ (33)

The asymptotic expressions for vAext,tv_{\mathrm{A}}^{\mathrm{ext},t} and λl(t)\lambda_{l}^{(t)} can then be derived from (18) and (22). Finally, the asymptotic MSE can be interpreted as the MSE of the decoupled scalar AWGN channels (11) and is related to the distribution of the transmitted signal 𝐱{\mathbf{x}}. Next, we will give a specific example for Proposition 1.

Example 1: If each channel matrix ${\mathbf{H}}_{l}\,(l=1,2,\ldots,L)$ has i.i.d. Gaussian entries, then $v_{\mathrm{A},l}^{\mathrm{ext}}$ converges to the following asymptotic expression, obtained via the $\mathcal{R}$-transform of the average empirical eigenvalue distribution (the $\mathcal{R}$-transform [45] enables characterizing the asymptotic spectrum of a sum of suitable matrices from their individual asymptotic spectra):

vA,lext,t=αlσ2+(αl1)λl(t1)2+(αlσ2+(αl1)λl(t1))2+4αlσ2λl(t1)2.v_{\mathrm{A},l}^{\mathrm{ext},t}=\frac{\alpha_{l}\sigma^{2}+(\alpha_{l}-1)\lambda_{l}^{(t-1)}}{2}+\frac{\sqrt{\bigg{(}\alpha_{l}\sigma^{2}+(\alpha_{l}-1)\lambda_{l}^{(t-1)}\bigg{)}^{2}+4\alpha_{l}\sigma^{2}\lambda_{l}^{(t-1)}}}{2}. (34)

Furthermore, if the data symbol is drawn from a quadrature phase-shift keying (QPSK) constellation, the MSE\mathrm{MSE} is expressed by

MSE(vA,lext,t)=1Dztanh(1vA,lext,t+1vA,lext,tz).\mathrm{MSE}(v_{\mathrm{A},l}^{\mathrm{ext},t})=1-\int\mathrm{D}z\tanh(\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},t}}+\sqrt{\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},t}}}z). (35)

Furthermore, the BER w.r.t. 𝐱{\mathbf{x}} can be evaluated through the equivalent AWGN channel (11) with an equivalent SNR=1/vA,lext,T\mathrm{SNR}=1/v_{\mathrm{A},l}^{\mathrm{ext},T} and is given by

BER=2Q(1vA,lext,T)[Q(1vA,lext,T)]2,\mathrm{BER}=2Q(\sqrt{\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},T}}})-[Q(\sqrt{\frac{1}{v_{\mathrm{A},l}^{\mathrm{ext},T}}})]^{2}, (36)

where Q(x)=xDzQ(x)=\int_{x}^{\infty}\mathrm{D}z is the QQ-function. In fact, the MSE and BER are determined given the knowledge of the AWGN channel (11) with SNR=1/vA,lext,T\mathrm{SNR}=1/v_{\mathrm{A},l}^{\mathrm{ext},T}, which is known as the decoupling principle. Thus, if the data symbol is drawn from other MM-QAM constellations, the corresponding BER can be easily obtained using a closed-form BER expression [46].
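The state evolution of Example 1 is simple to evaluate numerically. The sketch below is a direct transcription of (31b), (31c), (34), (35), and (36) for identical APs and QPSK, with the Gaussian integral in (35) evaluated by Gauss-Hermite quadrature; function names are ours, and in practice damping (not discussed here) may be needed for the recursion to settle.

```python
import math
import numpy as np

def qpsk_mse(v):
    """MSE of the QPSK posterior-mean estimator over an AWGN channel with
    noise variance v, i.e. the integral in (35), via Gauss-Hermite rules."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(61)  # weight exp(-x^2/2)
    integ = np.sum(weights * np.tanh(1.0 / v + np.sqrt(1.0 / v) * nodes))
    return 1.0 - integ / np.sqrt(2.0 * np.pi)

def state_evolution(alpha_l, sigma2, L, T):
    """Run (31b)-(31c) for L identical APs, using the closed-form
    R-transform expression (34) for v_{A,l}^{ext,t}."""
    lam = 1.0                                  # lambda^{(0)} = 1/E_x, E_x = 1
    v = None
    for _ in range(T):
        b = alpha_l * sigma2 + (alpha_l - 1.0) * lam
        v_l = 0.5 * b + 0.5 * np.sqrt(b**2 + 4.0 * alpha_l * sigma2 * lam)  # (34)
        v = v_l / L                            # (31b) with identical APs
        lam = 1.0 / qpsk_mse(v) - 1.0 / v_l    # (31c)
    return v

def predicted_ber(v):
    """BER of QPSK over the decoupled AWGN channel with SNR = 1/v, (36)."""
    Q = 0.5 * math.erfc(math.sqrt(1.0 / v) / math.sqrt(2.0))
    return 2.0 * Q - Q**2
```

Such a recursion replaces Monte Carlo simulation when predicting the theoretical behavior of the detector.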

V-B Fixed Point Characterization

The fixed points of the distributed EP detection algorithm are the stationary points of a relaxed version of the Kullback-Leibler (KL) divergence minimization for approximating the posterior distribution (4), given by

b(𝐱)=argminb(𝐱)D[b(𝐱)𝙿(𝐱|𝐲,𝐇)].b({\mathbf{x}})=\arg\min\limits_{b({\mathbf{x}})\in\mathcal{F}}D\left[b({\mathbf{x}})\|\mathtt{P}({\mathbf{x}}|{\mathbf{y}},{\mathbf{H}})\right]. (37)

The KL divergence between two (possibly unnormalized) distributions is defined as

D[p(x)q(x)]=p(x)logp(x)q(x)dx+(q(x)p(x))dx.{D}\left[{p\left(x\right)\left\|{q\left(x\right)}\right.}\right]=\int{p\left(x\right)\log\frac{{p\left(x\right)}}{{q\left(x\right)}}}{{\mathrm{d}}}x+\int{\left({q\left(x\right)-p\left(x\right)}\right)}{{\mathrm{d}}}x. (38)
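As a small numerical illustration of this definition (ours, not from the paper), evaluating (38) on a grid for two normalized unit-variance Gaussians recovers the closed-form KL value $(\mu_{1}-\mu_{2})^{2}/2$, and the correction term vanishes:

```python
import numpy as np

def kl_unnormalized(p, q, dx):
    """Numerically evaluate (38) on a grid: the usual KL integral plus the
    correction term int (q - p) dx, which keeps the divergence well defined
    (and nonnegative) for unnormalized densities."""
    return np.sum(p * np.log(p / q)) * dx + np.sum(q - p) * dx

xs = np.linspace(-10.0, 10.0, 20001)
dx = xs[1] - xs[0]
p = np.exp(-0.5 * xs**2) / np.sqrt(2.0 * np.pi)          # N(0, 1)
q = np.exp(-0.5 * (xs - 1.0)**2) / np.sqrt(2.0 * np.pi)  # N(1, 1)
kl = kl_unnormalized(p, q, dx)                            # close to 0.5
```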

Generally, it is difficult to approximate the complex posterior distribution (4) directly in a tractable manner. Moment matching is therefore utilized to relax the KL divergence minimization problem [47]. Solving the relaxed KL divergence minimization problem is equivalent to optimizing the Bethe free energy subject to moment-matching constraints. That is,

minb,b1,,bLmaxqJ(b,b1,,bL,q)\displaystyle\min\limits_{b,b_{1},\ldots,b_{L}}\max\limits_{q}~{}J\left(b,b_{1},\ldots,b_{L},q\right) (39a)
s.t.𝙴bl(𝐱)=𝙴q(𝐱),\displaystyle~{}~{}~{}~{}~{}\text{s.t.}~{}~{}~{}~{}\mathtt{E}_{b_{l}}({\mathbf{x}})=\mathtt{E}_{q}({\mathbf{x}}), (39b)
1Kk𝒦𝙴bl(|xk|2)=1Kk𝒦𝙴q(|xk|2),\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{b_{l}}(|x_{k}|^{2})=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{q}(|x_{k}|^{2}), (39c)

where $J\left(b,b_{1},\ldots,b_{L},q\right)=\mathrm{KL}\left(b\|e^{\log f({\mathbf{x}})}\right)+\sum\limits_{l=1}^{L}\mathrm{KL}\left(b_{l}\|e^{\log f_{l}({\mathbf{x}})}\right)+H(q)$ and $H(\cdot)$ is the differential entropy. In addition, $b({\mathbf{x}})$, $\{b_{l}({\mathbf{x}})\}_{l=1}^{L}$, and $q({\mathbf{x}})$ are given by

b(𝐱):=1Z0(𝐱Aext)p(𝐱)exp[𝐱𝐱Aext2/vAext],\displaystyle b({\mathbf{x}}):=\frac{1}{Z_{0}\left({\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}}\right)}p({\mathbf{x}})\exp\left[-\|{\mathbf{x}}-{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}}\|^{2}/v_{\mathrm{A}}^{\mathrm{ext}}\right], (40a)
$b_{l}({\mathbf{x}}):=\frac{1}{Z_{l}\left({\mathbf{x}}_{\mathrm{B},l}^{\mathrm{ext}}\right)}\exp\left[-\frac{1}{\sigma^{2}}\|{\mathbf{y}}_{l}-{\mathbf{H}}_{l}{\mathbf{x}}\|^{2}-\|{\mathbf{x}}-{\mathbf{x}}_{\mathrm{B},l}^{\mathrm{ext}}\|^{2}/v_{\mathrm{B},l}^{\mathrm{ext}}\right],$ (40b)
q(𝐱):=1Zq(𝐱^)exp[𝐱𝐱^2/v],\displaystyle q({\mathbf{x}}):=\frac{1}{Z_{q}\left(\hat{{\mathbf{x}}}\right)}\exp\left[-\|{\mathbf{x}}-\hat{{\mathbf{x}}}\|^{2}/v\right], (40c)

respectively, where $Z_{0}\left({\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}}\right)$, $\{Z_{l}\left({\mathbf{x}}_{\mathrm{B},l}^{\mathrm{ext}}\right)\}_{l=1}^{L}$, and $Z_{q}\left(\hat{{\mathbf{x}}}\right)$ are the normalization factors of the corresponding density functions. Based on the above optimization process, we have

$\hat{{\mathbf{x}}}=\mathtt{E}_{b}({\mathbf{x}})=\mathtt{E}_{b_{1}}({\mathbf{x}})=\cdots=\mathtt{E}_{b_{L}}({\mathbf{x}})=\mathtt{E}_{q}({\mathbf{x}}),$ (41a)
v=1Kk𝒦𝙴b(|xk|2)=1Kk𝒦𝙴b1(|xk|2)==1Kk𝒦𝙴bL(|xk|2)=1Kk𝒦𝙴q(|xk|2),\displaystyle\begin{aligned} v&=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{b}(|x_{k}|^{2})=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{b_{1}}(|x_{k}|^{2})=\cdots&=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{b_{L}}(|x_{k}|^{2})=\frac{1}{K}\sum\limits_{k\in\mathcal{K}}\mathtt{E}_{q}(|x_{k}|^{2}),\end{aligned} (41b)

for any fixed point $(v,\hat{{\mathbf{x}}})$ of Algorithm 1. This fixed-point characterization is helpful for convergence analysis and for understanding the algorithm from an optimization perspective.

VI Simulation Results

In this section, we will provide extensive simulation results to demonstrate the performance of the proposed distributed EP detector for cell-free massive MIMO. After presenting the simulation results of the distributed EP detector in conventional and scalable cell-free massive MIMO systems, we verify the accuracy of the proposed analytical framework. Furthermore, the performance of the distributed EP-based ICD scheme is also investigated. The code for the simulations is available at https://github.com/hehengtao/DeEP-cell-free-mMIMO.

VI-A Conventional Cell-Free Massive MIMO

In this subsection, we consider the BER performance of the proposed detector in conventional cell-free massive MIMO, where each AP serves all users. We assume that perfect CSI is available at each AP. The simulation parameters are set as $N=8$, $K=32$, and $L=8$. Fig. 4 compares the achievable BER of the proposed distributed EP detector, the centralized EP detector [31], and other baselines [27] using QPSK and $16$-QAM modulation symbols. The results are obtained by Monte Carlo simulations with $10,000$ independent channel realizations. We denote the distributed EP detector proposed in Section III by "deEP". It can be observed that the proposed distributed EP detector outperforms the distributed MMSE detector with only one EP iteration. This is because deEP incorporates the prior information of the symbol ${\mathbf{x}}$ into the posterior mean estimator for the equivalent AWGN channel (10). Furthermore, with $T=5$ iterations, it outperforms the centralized MMSE detector for both modulation schemes. Compared with the centralized EP detector [31], the distributed EP detector achieves comparable BER performance.

Figure 4: BER performance comparison of different detectors in the conventional cell-free massive MIMO system. (a) QPSK; (b) 16-QAM.

VI-B Accuracy of the Analytical Results

We next verify the accuracy of the analytical framework in Fig. 5 with different system settings and modulation schemes. As shown in the figure, the BERs of the proposed detector match the derived analytical results well, which demonstrates the accuracy of the analytical framework developed in Section V-A. Although the analytical framework is derived in the large-system limit, it can still predict the BER performance of small-sized MIMO systems, as illustrated in Fig. 5(b). Therefore, instead of performing time-consuming Monte Carlo simulations to obtain the corresponding performance metrics, we can predict the theoretical behavior with the state evolution equations. The analytical framework can be further utilized to optimize the system design.

Figure 5: BER performance comparison of the analytical and simulation results for the distributed EP detector with different modulation schemes. (a) $N=16$, $K=64$, $L=16$; (b) $N=8$, $K=8$, $L=4$.

VI-C Scalable Cell-Free Massive MIMO

As the user-centric approach is more attractive for cell-free massive MIMO, we investigate the distributed EP detector under the DCC framework. Each user first appoints a master AP according to the large-scale fading factors and is assigned a pilot ${\mathbf{p}}_{\tau}$ by that AP. Then, other neighboring APs decide whether to serve the user according to the assigned pilot ${\mathbf{p}}_{\tau}$. Finally, the cluster for the $k$-th user is constructed. Fig. 6 shows that the distributed EP detector outperforms both the centralized and distributed MMSE detectors. Furthermore, the performance loss compared to the conventional cell-free massive MIMO system is limited, while the computational complexity is significantly decreased from $\mathcal{O}(KN^{2})$ to $\mathcal{O}(|\mathcal{D}_{l}|N^{2})$.

Figure 6: BER performance comparison of different detectors in the scalable cell-free massive MIMO system. (a) QPSK; (b) 16-QAM.

VI-D Performance of ICD with 3GPP Urban Channel

In Sections VI-A to VI-C, all detectors were investigated assuming perfect CSI. In this section, we consider the distributed EP-based ICD scheme for cell-free massive MIMO with the 3GPP Urban channel model [48] and imperfect CSI. We assume that $N=8$, $K=8$, and $L=4$. The APs are deployed in a $1\times 1$ $\mathrm{km}^{2}$ urban area to match the system settings in [27]. The large-scale fading factor is given by

βkl[dB]=30.536.7log10(dkl1m)+gkl,\beta_{kl}\,[\textrm{dB}]=-30.5-36.7\log_{10}\left(\frac{d_{kl}}{1\,\textrm{m}}\right)+g_{kl}, (42)

where dkld_{kl} is the distance between the kk-th user and ll-th AP and gkl𝒩(0,42)g_{kl}\sim\mathcal{N}(0,4^{2}) represents shadow fading. The shadowing terms from an AP to different users are correlated as

𝔼{gklgij}={422δki/9ml=j0lj,\mathbb{E}\{g_{kl}g_{ij}\}=\begin{cases}4^{2}2^{-\delta_{ki}/9\,\textrm{m}}&l=j\\ 0&l\neq j,\end{cases} (43)

where $\delta_{ki}$ is the distance between the $k$-th user and the $i$-th user. The second case in (43) states that the shadowing terms associated with two different APs are assumed uncorrelated. Furthermore, the multi-antenna APs are equipped with half-wavelength-spaced uniform linear arrays, and the spatial correlation is generated using the Gaussian local scattering model with a $15^{\circ}$ angular standard deviation. The transmit power for each user is $p_{k}=100$ mW and the bandwidth is $20$ MHz.
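A sketch of how (42) and (43) can be implemented to generate the large-scale fading coefficients. The function name and the 1 m distance floor are our assumptions, and, as stated above, correlation across different APs is ignored.

```python
import numpy as np

def large_scale_fading(ap_pos, ue_pos, rng):
    """Generate beta_kl in dB per (42), with shadowing that is correlated
    across users seen by the same AP per (43) and independent across APs.
    ap_pos: (L, 2) AP coordinates in meters; ue_pos: (K, 2) user coords."""
    L, K = len(ap_pos), len(ue_pos)
    d_ue = np.linalg.norm(ue_pos[:, None, :] - ue_pos[None, :, :], axis=-1)
    C = 4.0**2 * 2.0**(-d_ue / 9.0)            # shadowing covariance, (43)
    Lc = np.linalg.cholesky(C + 1e-9 * np.eye(K))
    beta = np.zeros((K, L))
    for l in range(L):                         # independent shadowing per AP
        g = Lc @ rng.standard_normal(K)        # correlated N(0, 4^2) terms
        d = np.linalg.norm(ue_pos - ap_pos[l], axis=-1)
        beta[:, l] = -30.5 - 36.7 * np.log10(np.maximum(d, 1.0)) + g  # (42)
    return beta
```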

Figure 7: BER performance comparison of different detectors in the cell-free massive MIMO system with the 3GPP Urban channel model and imperfect CSI. (a) QPSK; (b) 16-QAM.

Fig. 7 compares the achievable BER of the proposed distributed EP detector with those of other detectors under the 3GPP Urban channel model and imperfect CSI. The channel is estimated with the LMMSE channel estimator. As can be observed from the figure, the proposed ICD scheme outperforms the centralized and distributed MMSE detectors, and shows a performance similar to that of the distributed EP detector with perfect CSI for $16$-QAM symbols.

VI-E Non-Orthogonal Pilots

In this section, we consider the performance of ICD with different pilots in the channel training stage. In particular, the orthogonal pilot matrix $\mathbf{X}^{{\mathrm{p}}}\in\mathbb{C}^{K\times\tau_{p}}$ is constructed from the discrete Fourier transform (DFT) matrix ${\mathbf{F}}\in\mathbb{C}^{\tau_{p}\times\tau_{p}}$. For the non-orthogonal pilot, each element of the pilot matrix is drawn from a $64$-QAM constellation. Furthermore, we set $\tau_{p}=K$ and $\tau_{d}=16K$ for both pilot types. Fig. 8 shows the BERs of the distributed detectors in the ICD architecture, where $r=1$ indicates no data feedback to the channel estimator while $r=2$ refers to the scenario where the detected data are fed back to the channel estimator once. As can be observed from the figure, the data feedback improves the system performance significantly. Specifically, if we target a BER of $10^{-2}$ with QPSK symbols and non-orthogonal pilots, data feedback brings performance gains of $3.3$ dB with $r=2$ and $4.5$ dB with $r=4$, respectively. Furthermore, the performance gain is enlarged if orthogonal DFT pilots are used. Interestingly, non-orthogonal pilots achieve a performance similar to that of DFT pilots, which demonstrates that the ICD scheme is an efficient way to enable non-orthogonal pilots for reducing the pilot overhead in cell-free massive MIMO systems.

Figure 8: BER performance comparisons of the distributed EP-based ICD scheme with orthogonal and non-orthogonal pilots: (a) QPSK; (b) 16-QAM.

VII Conclusion

In this paper, we proposed a distributed EP detector for cell-free massive MIMO. It was shown that the proposed detector achieves better performance than linear detectors for both conventional and scalable cell-free massive MIMO networks. Compared to other distributed detectors, it achieves a better BER performance at the cost of increased computational overhead at the CPU. A distributed ICD algorithm was then proposed for cell-free massive MIMO systems to handle imperfect CSI. An analytical framework was also provided to characterize the asymptotic performance of the proposed detector in the large-system limit. Simulation results demonstrated that the proposed method outperforms existing distributed detectors for cell-free massive MIMO in terms of BER performance. Furthermore, the proposed ICD architecture significantly improves the system performance and enables non-orthogonal pilots to reduce the pilot overhead. For future research, it would be interesting to extend the distributed EP algorithm to precoding and channel estimation for cell-free massive MIMO systems.

Appendix A Message-Passing Derivation of Algorithm 1

A-A Derivation of Distributed EP Detector

In this appendix, we present the derivation of Algorithm 1 from the message-passing perspective. (Algorithm 1 can also be derived by directly applying the EP principle in [49], which extends belief propagation by imposing an exponential-family constraint on the messages; both BP and EP follow from the generic message-passing rules introduced in Section III-B.) By applying these rules to the factor graph in Fig. 2, we obtain Algorithm 1. First, we construct the factor graph for cell-free massive MIMO detection. As illustrated in Fig. 2, factor graphs describe both the centralized processing model (4) and the distributed processing model (6). Specifically, for the distributed processing model (6), the factor graph contains L+1 factor nodes \{f,f_{1},\ldots,f_{L}\} and one variable node {\mathbf{x}}, where the factor f represents the prior information of {\mathbf{x}} and f_{l} corresponds to the l-th AP. After constructing the factor graph for (6), we approximate the posterior distribution by computing and passing messages among the nodes in an iterative manner according to the aforementioned rules. The messages from the variable node {\mathbf{x}} to the factors \{f_{1},\ldots,f_{L}\} are initialized as \mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}})=\mathcal{N}_{\mathbb{C}}({\mathbf{x}};\boldsymbol{\gamma}_{l}^{(0)}/\lambda_{l}^{(0)},\lambda_{l}^{(0)}{\mathbf{I}}) for l=1,2,\ldots,L. Then, according to the factor-to-variable message rule, we have \mu_{f_{l}\rightarrow{\mathbf{x}}}({\mathbf{x}})=b_{f_{l}}({\mathbf{x}})/\mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}}). (As each AP serves all users, only one variable node {\mathbf{x}} is involved in the message passing, so the rule reduces to \mu_{f_{l}\rightarrow{\mathbf{x}}}({\mathbf{x}})\propto\int f_{l}({\mathbf{x}})\prod\mu_{{\mathbf{x}}\rightarrow f_{l}}\,{\mathrm{d}}{\mathbf{x}}\propto b_{f_{l}}({\mathbf{x}})/\mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}}).) Hence, the key step is to compute the belief b_{f_{l}}({\mathbf{x}}), which is given by

b_{f_{l}}({\mathbf{x}})=\arg\min\limits_{b({\mathbf{x}})\in\mathcal{F}}D\left[{{\mu_{{\mathbf{x}}\to{f_{l}}}}\left({\mathbf{x}}\right){f_{l}}\left({\mathbf{x}}\right)}\|b({\mathbf{x}})\right]. (44)

The explicit expression for the KL divergence is given in (38). Since directly minimizing the KL divergence over all distributions is intractable, we restrict b_{f_{l}}({\mathbf{x}}) to be a Gaussian distribution with the same mean and covariance matrix as the distribution \mu_{{\mathbf{x}}\to{f_{l}}}\left({\mathbf{x}}\right){f_{l}}\left({\mathbf{x}}\right). Therefore, we have b_{f_{l}}({\mathbf{x}})=\mathcal{N}_{\mathbb{C}}({\mathbf{x}};\boldsymbol{\mu}_{l},\boldsymbol{\Sigma}_{l}) with \boldsymbol{\Sigma}_{l}=(\sigma^{-2}{\mathbf{H}}^{H}_{l}{\mathbf{H}}_{l}+\lambda_{l}{\mathbf{I}})^{-1} and \boldsymbol{\mu}_{l}=\boldsymbol{\Sigma}_{l}(\sigma^{-2}{\mathbf{H}}^{H}_{l}{\mathbf{y}}_{l}+\boldsymbol{\gamma}_{l}), which yields (14) and (15) in Algorithm 1. Then, we have

\mu_{f_{l}\rightarrow{\mathbf{x}}}({\mathbf{x}})=\frac{b_{f_{l}}({\mathbf{x}})}{\mu_{{\mathbf{x}}\to{f_{l}}}({\mathbf{x}})}\cong\frac{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{post}},v_{\mathrm{A},l}^{\mathrm{post}}{\mathbf{I}})}{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};\boldsymbol{\gamma}_{l}/\lambda_{l},\lambda_{l}{\mathbf{I}})}\propto{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}},v_{\mathrm{A},l}^{\mathrm{ext}}{\mathbf{I}})}, (45)

where we approximate b_{f_{l}}({\mathbf{x}}) by \mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{post}},v_{\mathrm{A},l}^{\mathrm{post}}{\mathbf{I}}). The principles behind this approximation are moment matching and self-averaging [49, 50]. According to the Gaussian product lemma, v_{\mathrm{A},l}^{\mathrm{ext}} and {\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}} in Algorithm 1 are obtained from (16) and (17), respectively. Next, the message \mu_{{\mathbf{x}}\rightarrow f}({\mathbf{x}}) is the product of the messages passed from the factors \{f_{1},f_{2},\ldots,f_{L}\} (i.e., the APs) to {\mathbf{x}}, and we have

\mu_{{\mathbf{x}}\rightarrow f}({\mathbf{x}})=\prod_{l=1}^{L}\mu_{f_{l}\rightarrow{\mathbf{x}}}({\mathbf{x}})=\prod_{l=1}^{L}\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}},v_{\mathrm{A},l}^{\mathrm{ext}}{\mathbf{I}})=\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}}{\mathbf{I}}), (46)

where v_{\mathrm{A}}^{\mathrm{ext}} and {\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}} are obtained from (18) and (19). We then approximate the belief on f as b_{f}({\mathbf{x}})\propto\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}},\mathrm{diag}({\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}})). The mean {\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}} and variance {\mathbf{v}}_{\mathrm{B}}^{\mathrm{post}} are obtained by moment matching and self-averaging, as given in (20) and (21), respectively. As the transmitted symbols are assumed to be drawn from an M-QAM constellation, the explicit expressions for (20) and (21) are given by (12) and (13). Similar to (45), we set the message \mu_{f\rightarrow{\mathbf{x}}}({\mathbf{x}}) as

\mu_{f\rightarrow{\mathbf{x}}}({\mathbf{x}})=\frac{b_{f}({\mathbf{x}})}{\mu_{{\mathbf{x}}\rightarrow f}({\mathbf{x}})}=\frac{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{B}}^{\mathrm{post}},v_{\mathrm{B}}^{\mathrm{post}}{\mathbf{I}})}{\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{A}}^{\mathrm{ext}},v_{\mathrm{A}}^{\mathrm{ext}}{\mathbf{I}})}\propto\mathcal{N}_{\mathbb{C}}({\mathbf{x}};{\mathbf{x}}_{\mathrm{B}}^{\mathrm{ext}},v_{\mathrm{B}}^{\mathrm{ext}}{\mathbf{I}}). (47)
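To make these operations concrete, the following sketch (illustrative dimensions; the scalar moment-matched variance and the precision-form cavity parameters \boldsymbol{\gamma}_{l}, \lambda_{l} are our reading of the paper's parameterization) computes a per-AP belief as in (14)-(15) and then divides out the cavity message as in (45):

```python
import numpy as np

# Sketch of the per-AP linear module (14)-(15) and the extrinsic update (45):
# compute the Gaussian belief b_{f_l}(x), then divide out the cavity message
# by subtracting precisions. Dimensions and parameter values are illustrative.
rng = np.random.default_rng(1)
N_l, K, sigma2 = 16, 4, 0.1
H = (rng.standard_normal((N_l, K)) + 1j * rng.standard_normal((N_l, K))) / np.sqrt(2)
x = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)   # QPSK
y = H @ x + np.sqrt(sigma2 / 2) * (rng.standard_normal(N_l) + 1j * rng.standard_normal(N_l))

# Cavity message in precision form: gamma is the precision-weighted mean,
# lam the precision (an assumption about the parameterization).
gamma, lam = np.zeros(K, dtype=complex), 1.0

# Belief b_{f_l}(x) = N(x; mu, Sigma), as in (14)-(15).
Sigma = np.linalg.inv(H.conj().T @ H / sigma2 + lam * np.eye(K))
mu = Sigma @ (H.conj().T @ y / sigma2 + gamma)

# Moment matching to an isotropic Gaussian, then Gaussian division (45).
v_post = np.real(np.trace(Sigma)) / K
v_ext = 1.0 / (1.0 / v_post - lam)             # extrinsic variance
x_ext = v_ext * (mu / v_post - gamma)          # extrinsic mean
```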

Finally, according to the variable-to-factor message rule, we compute the message \mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}}) for the next iteration, which is given by

\mu_{{\mathbf{x}}\rightarrow f_{l}}({\mathbf{x}})=\mu_{f\rightarrow{\mathbf{x}}}({\mathbf{x}})\prod_{l^{\prime}\neq l}\mu_{f_{l^{\prime}}\rightarrow{\mathbf{x}}}({\mathbf{x}})\propto\mathcal{N}_{\mathbb{C}}({\mathbf{x}};\boldsymbol{\gamma}_{l}/\lambda_{l},\lambda_{l}{\mathbf{I}}), (48)

and the explicit expressions for \boldsymbol{\gamma}_{l} and \lambda_{l} are given by (22) and (23), respectively. Algorithm 1 is obtained by repeating the above message-passing procedure.
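Putting the messages together, the following is a heavily simplified end-to-end sketch of the resulting iteration: scalar variances, a clamped QPSK posterior variance, and an undamped cavity update stand in for the exact expressions in (12)-(13) and (20)-(23), and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
L, N_l, K, sigma2, T = 4, 8, 4, 0.05, 5
a = 1 / np.sqrt(2)                                    # QPSK amplitude per dimension
x = a * (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K))
Hs = [(rng.standard_normal((N_l, K)) + 1j * rng.standard_normal((N_l, K))) / np.sqrt(2)
      for _ in range(L)]
ys = [H @ x + np.sqrt(sigma2 / 2) * (rng.standard_normal(N_l) + 1j * rng.standard_normal(N_l))
      for H in Hs]

gam = [np.zeros(K, dtype=complex) for _ in range(L)]  # cavity: precision-weighted means
lam = [1.0] * L                                       # cavity precisions
for _ in range(T):
    xs, vs = [], []
    for l in range(L):                                # linear module at each AP, cf. (14)-(17)
        Sig = np.linalg.inv(Hs[l].conj().T @ Hs[l] / sigma2 + lam[l] * np.eye(K))
        mu = Sig @ (Hs[l].conj().T @ ys[l] / sigma2 + gam[l])
        v_post = np.real(np.trace(Sig)) / K
        v_l = 1.0 / (1.0 / v_post - lam[l])           # extrinsic variance
        xs.append(v_l * (mu / v_post - gam[l]))       # extrinsic mean
        vs.append(v_l)
    v_A = 1.0 / sum(1.0 / v for v in vs)              # MRC combining, cf. (18)-(19)
    x_A = v_A * sum(xe / v for xe, v in zip(xs, vs))
    # Nonlinear module at the CPU: QPSK posterior mean/variance, cf. (12)-(13)
    x_hat = a * (np.tanh(2 * a * x_A.real / v_A) + 1j * np.tanh(2 * a * x_A.imag / v_A))
    v_B = max(float(np.mean(1.0 - np.abs(x_hat) ** 2)), 1e-6)
    v_Bext = 1.0 / max(1.0 / v_B - 1.0 / v_A, 1e-6)   # denoiser extrinsic (clamped)
    x_Bext = v_Bext * (x_hat / v_B - x_A / v_A)
    for l in range(L):                                # cavity update, cf. (22)-(23)
        lam[l] = 1.0 / v_Bext + sum(1.0 / vs[m] for m in range(L) if m != l)
        gam[l] = x_Bext / v_Bext + sum(xs[m] / vs[m] for m in range(L) if m != l)
```

The clamps on the variances are numerical safeguards, not part of the derivation; the paper's per-element variances and exact extrinsic bookkeeping are collapsed to scalars here.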

A-B Derivation of MRC Combining

For distributed EP-based MIMO detection, the challenge is how to combine the messages from different APs. In this paper, we focus on a linear combination of the local estimates {\mathbf{x}}_{\mathrm{A},l}^{\mathrm{ext}} with variances v_{\mathrm{A},l}^{\mathrm{ext}} from the APs as follows:

\mathbf{x}_{\mathrm{A}}^{\mathrm{ext}}=\sum_{l=1}^{L}a_{l}\mathbf{x}_{\mathrm{A},l}^{\mathrm{ext}}. (49)

As the linear model (3) is decoupled into equivalent AWGN models, we focus on the scalar form of (49), given by

x_{k,\mathrm{A}}^{\mathrm{ext}}=\sum_{l=1}^{L}a_{l,k}x_{\mathrm{A},l,k}^{\mathrm{ext}},\quad k=1,\ldots,K, (50)

where a_{l,k} is the combining weight for the local estimate of the k-th user from the l-th AP, which depends on the variance v_{\mathrm{A},l,k}^{\mathrm{ext}}. As in (10) and (11), we write each local estimate as x_{\mathrm{A},l,k}^{\mathrm{ext}}=x_{k}+n_{\mathrm{A},l,k}^{\mathrm{ext}}, where n_{\mathrm{A},l,k}^{\mathrm{ext}} represents the equivalent noise with known variance v_{\mathrm{A},l,k}^{\mathrm{ext}}. Substituting into (50) gives x_{k,\mathrm{A}}^{\mathrm{ext}}=\sum_{l=1}^{L}a_{l,k}x_{k}+\sum_{l=1}^{L}a_{l,k}n_{\mathrm{A},l,k}^{\mathrm{ext}}. Because x_{k,\mathrm{A}}^{\mathrm{ext}} should be an unbiased estimate of x_{k}, we require \sum_{l=1}^{L}a_{l,k}=1. After combining, the equivalent SNR is given by

\mathrm{SNR}=\frac{|x_{k}|^{2}}{|\sum_{l=1}^{L}a_{l,k}n_{\mathrm{A},l,k}^{\mathrm{ext}}|^{2}}=\frac{E_{s}}{\sum_{l=1}^{L}a_{l,k}^{2}v_{\mathrm{A},l,k}^{\mathrm{ext}}}. (51)

As a result, the combining problem becomes the optimization problem

\max_{\{a_{l,k}\}}\quad\frac{E_{s}}{\sum_{l=1}^{L}a_{l,k}^{2}v_{\mathrm{A},l,k}^{\mathrm{ext}}}\qquad\mathrm{s.t.}\quad\sum_{l=1}^{L}a_{l,k}=1. (52)

By using the method of Lagrange multipliers, it is easy to obtain

a_{l,k}=\frac{1}{v_{\mathrm{A},l,k}^{\mathrm{ext}}}\bigg{(}\sum_{l^{\prime}=1}^{L}\frac{1}{v_{\mathrm{A},l^{\prime},k}^{\mathrm{ext}}}\bigg{)}^{-1},\quad l=1,\ldots,L. (53)

Thus, we obtain the MRC combining in (18) and (19).
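The optimality of the weights in (53) can also be checked numerically: among all weight vectors summing to one, the inverse-variance (MRC) weights attain the largest SNR in (52). A small sketch with illustrative variances:

```python
import numpy as np

# Numerical check of (52)-(53): inverse-variance (MRC) weights maximize the
# combined SNR among all weight vectors summing to one. The variances are
# illustrative stand-ins for v_{A,l,k}^{ext}.
rng = np.random.default_rng(3)
L_aps = 5
v = rng.uniform(0.1, 2.0, L_aps)           # extrinsic variances from L APs
Es = 1.0

def snr(a):
    """Combined SNR (51) for weights a with sum(a) = 1."""
    return Es / np.sum(a ** 2 * v)

a_mrc = (1.0 / v) / np.sum(1.0 / v)        # weights from (53)
no_better = True
for _ in range(1000):                      # random competitors with sum(a) = 1
    a = rng.standard_normal(L_aps)
    a /= a.sum()
    no_better &= snr(a) <= snr(a_mrc) + 1e-9
```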

Appendix B Derivation of ICD

In this appendix, we present the derivation of the equivalent channel estimation error in the data detection stage, as well as the derivation of the data detection error in the channel estimation stage.

B-A Derivation of Equivalent Channel Estimation Error

The equivalent noise \hat{{\mathbf{n}}}^{{\mathrm{d}}}[n] in the signal detector is assumed to be Gaussian distributed with zero mean and covariance matrix \hat{{\mathbf{V}}}_{\mathrm{est}}[n], which can be obtained from the statistical properties of the channel estimation error. For simplicity, we omit the time index n. The detailed expression for \hat{\mathbf{V}}_{\mathrm{est}} is then given by

\hat{\mathbf{V}}_{\mathrm{est}}=\mathrm{diag}\left(\sum_{j=1}^{K}\sigma_{\Delta h_{1,j}}^{2}+\sigma^{2},\ldots,\sum_{j=1}^{K}\sigma_{\Delta h_{N,j}}^{2}+\sigma^{2}\right), (54)

where \sigma_{\Delta h_{i,j}}^{2} is the variance of the (i,j)-th element of the channel estimation error matrix \Delta{\mathbf{H}}, obtained from {\mathbf{R}}_{\Delta{{\mathbf{h}}_{l}^{\mathrm{p}}}}.
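As a minimal sketch of (54), with placeholder error variances since {\mathbf{R}}_{\Delta{{\mathbf{h}}_{l}^{\mathrm{p}}}} is not constructed here:

```python
import numpy as np

# Sketch of (54): the equivalent-noise covariance at the detector collects the
# per-row sums of the channel-estimation-error variances plus the AWGN power.
# The error-variance values are illustrative placeholders.
rng = np.random.default_rng(5)
N, K, sigma2 = 6, 4, 0.05
var_dH = rng.uniform(0.0, 0.1, (N, K))         # sigma_{Delta h_{i,j}}^2
V_est = np.diag(var_dH.sum(axis=1) + sigma2)   # diagonal matrix in (54)
```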

B-B Derivation of Data Detection Error

Owing to the data feedback from the signal detector to the channel estimator, the LMMSE channel estimate in the data-aided stage is given by

\hat{\bar{{\mathbf{h}}}}_{l}={\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}{\mathbf{A}}^{H}({\mathbf{A}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}{\mathbf{A}}^{H}+{\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}})^{-1}{\mathbf{y}}, (55)

where {\mathbf{A}}={\mathbf{X}}^{T}\otimes\mathbf{I}_{N}\in\mathbb{C}^{\tau_{c}N\times KN}, {\mathbf{y}}=\mathrm{vec}(\mathbf{Y})\in\mathbb{C}^{\tau_{c}N\times 1}, {\mathbf{n}}=\mathrm{vec}(\mathbf{N})\in\mathbb{C}^{\tau_{c}N\times 1}, and \tau_{c}=\tau_{p}+\tau_{d}. The covariance matrix {\mathbf{R}}_{\Delta{\mathbf{h}}_{l}} of the channel estimation error vector \Delta{{\mathbf{h}}_{l}}=\hat{\bar{{\mathbf{h}}}}_{l}-\bar{{\mathbf{h}}}_{l} can be computed as

{\mathbf{R}}_{\Delta{\mathbf{h}}_{l}}={\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}-{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}{\mathbf{A}}^{H}({\mathbf{A}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}{\mathbf{A}}^{H}+{\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}})^{-1}{\mathbf{A}}{\mathbf{R}}_{\bar{{\mathbf{h}}}_{l}\bar{{\mathbf{h}}}_{l}}. (56)

The covariance matrix {\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}} accounts for the equivalent noise power of both the actual pilots \mathbf{X}^{{\mathrm{p}}} and the additional pilots \hat{\mathbf{X}}^{{\mathrm{d}}} formed by the detected data. The explicit expression for {\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}} is given by

{\mathbf{R}}_{{\mathbf{n}}{\mathbf{n}}}=\begin{bmatrix}\sigma^{2}{\mathbf{I}}_{\tau_{p}N}&\\&\hat{\mathbf{V}}_{\mathrm{det}}\end{bmatrix}, (59)

where

\hat{\mathbf{V}}_{\mathrm{det}}=\begin{bmatrix}\hat{\mathbf{V}}^{{\mathrm{d}}}[1]&&\\&\ddots&\\&&\hat{\mathbf{V}}^{{\mathrm{d}}}[{\tau_{d}}]\end{bmatrix} (63)

and

\hat{\mathbf{V}}^{{\mathrm{d}}}[n]=\left(\sum_{j=1}^{K}\sigma_{e_{j,n}}^{2}+\sigma^{2}\right)\mathbf{I}_{K}. (64)

Let {\mathbf{z}}_{n}=\mathbf{H}{\mathbf{e}}_{n} with z_{i,n}=\sum_{j=1}^{K}h_{i,j}e_{j,n}, where {\mathbf{e}}_{n} is the n-th column of the signal detection error matrix {\mathbf{E}}^{{\mathrm{d}}} and \sigma_{e_{j,n}}^{2} is the variance of the j-th element of {\mathbf{e}}_{n}. The signal detection error matrix {\mathbf{E}}^{{\mathrm{d}}} can be obtained from the posterior variance {\mathbf{v}}_{\mathrm{A},l}^{\mathrm{post}} in Module A.
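The data-aided estimator (55)-(56) with the equivalent noise covariance (59)-(64) can be sketched for a single-antenna AP as follows; the i.i.d. channel prior, the common per-symbol detection-error variance, and the use of the true transmitted symbols in place of detected ones are all simplifying assumptions:

```python
import numpy as np

# Sketch of the data-aided LMMSE estimator (55)-(56) for a single-antenna AP:
# detected symbols act as extra pilots, and their residual detection-error
# variance enters R_nn as in (59)-(64). All sizes are illustrative.
rng = np.random.default_rng(4)
K, tau_p, tau_d, sigma2 = 4, 4, 16, 0.05
sigma_e2 = 0.02                               # per-symbol detection-error variance

h = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
X = (rng.standard_normal((K, tau_p + tau_d)) +
     1j * rng.standard_normal((K, tau_p + tau_d))) / np.sqrt(2)
y = X.T @ h + np.sqrt(sigma2 / 2) * (
    rng.standard_normal(tau_p + tau_d) + 1j * rng.standard_normal(tau_p + tau_d))

A = X.T                                       # stacked pilot and detected-data symbols
R_h = np.eye(K)                               # prior channel covariance (i.i.d.)
# AWGN on the pilot part; AWGN plus aggregate detection error on the data part (64).
R_nn = np.diag(np.r_[np.full(tau_p, sigma2), np.full(tau_d, sigma2 + K * sigma_e2)])

G = R_h @ A.conj().T @ np.linalg.inv(A @ R_h @ A.conj().T + R_nn)
h_hat = G @ y                                 # LMMSE estimate, cf. (55)
R_err = R_h - G @ A @ R_h                     # error covariance, cf. (56)
```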

References

  • [1] H. He, H. Wang, X. Yu, J. Zhang, S. H. Song, and K. B. Letaief, “Distributed expectation propagation detection for cell-free massive MIMO,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Madrid, Spain, 2021.
  • [2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, C. K. Soong, and J. C. Zhang, “What will 5G be?” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065-1082, 2014.
  • [3] K. B. Letaief, Y. Shi, J. Lu, and J. Lu, “Edge artificial intelligence for 6G: Vision, enabling technologies, and applications,” IEEE J. Sel. Areas Commun., vol. 40, no. 1, pp. 5-36, 2022.
  • [4] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J.-A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Commun. Mag., vol. 57, no. 8, pp. 84-90, Aug. 2019.
  • [5] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
  • [6] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
  • [7] J. Zhang, S. Chen, Y. Lin, J. Zheng, B. Ai, and L. Hanzo, “Cell-free massive MIMO: A new next-generation paradigm,” IEEE Access, vol. 7, pp. 99878–99888, Sep. 2019.
  • [8] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and D. J. Love, “Prospective multiple antenna technologies for beyond 5G,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1637-1660, Aug. 2020.
  • [9] M. Matthaiou, O. Yurduseven, H. Q. Ngo, D. Morales-Jimenez, S. L. Cotton, and V. F. Fusco, “The road to 6G: Ten physical layer challenges for communications engineers,” IEEE Commun. Mag., vol. 59, no. 1, pp. 64-69, Jan. 2021.
  • [10] H. He, X. Yu, J. Zhang, S.H. Song, K. B. Letaief, “Cell-Free Massive MIMO for 6G Wireless Communication Networks,” Journal of Communications and Information Networks, vol. 6, no. 4, pp. 321-335, Dec. 2021.
  • [11] G. Interdonato, E. Björnson, H. Q. Ngo, P. Frenger, and E. G. Larsson, “Ubiquitous cell-free massive MIMO communications,” EURASIP J. Wireless Commun. and Netw., vol. 2019, no. 1, pp. 197-206, 2019.
  • [12] D. Wang, C. Zhang, Y. Du, J. Zhao, M. Jiang, and X. You, “Implementation of a cloud-based cell-free distributed massive MIMO system,” IEEE Commun. Mag., vol. 58, no. 8, pp. 61-67, 2020.
  • [13] W. Choi, J. G. Andrews, “Downlink performance and capacity of distributed antenna systems in a multicell environment,” IEEE Trans. Wireless Commun., vol. 6, no. 1, pp. 69-73, Jan. 2007.
  • [14] J. Zhang, R. Chen, J. G. Andrews, A. Ghosh, and R. W. Heath Jr, “Networked MIMO with clustered linear precoding,” IEEE Trans. Wireless Commun., vol. 8, no. 4, pp. 1910-1921, Apr. 2009.
  • [15] Ö. Özdogan, E. Björnson, and J. Zhang, “Performance of cell-free massive MIMO with rician fading and phase shifts,” IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5299-5315, Nov. 2019.
  • [16] Z. Wang, J. Zhang, E. Björnson, and B. Ai, “Uplink performance of cell-free massive MIMO over spatially correlated Rician fading channels,” IEEE Commun. Lett., vol. 25, no. 4, pp. 1348-1352, Apr. 2020.
  • [17] S.-N. Jin, D.-W. Yue, and H. H. Nguyen, “Spectral and energy efficiency in cell-free massive MIMO systems over correlated Rician fading,” IEEE Syst. J., pp. 1-12, May. 2020.
  • [18] H. Q. Ngo, L. Tran, T. Q. Duong, M. Matthaiou, and E. G. Larsson, “On the total energy efficiency of cell-free massive MIMO,” IEEE Trans. Green Commun. Netw, vol. 2, no. 1, pp. 25-39, 2018.
  • [19] H. V. Nguyen, V. D. Nguyen, O. A. Dobre, S. K. Sharma, S. Chatzinotas, B. Ottersten, and O. S. Shin, “On the spectral and energy efficiencies of full-duplex cell-free massive MIMO,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1698-1718, Aug. 2020.
  • [20] E. Björnson and L. Sanguinetti, “Scalable cell-free massive MIMO systems,” IEEE Trans. Commun., vol. 68, no. 7, pp. 4247–4261, Jul. 2020.
  • [21] S. Buzzi, C. D’Andrea, “Cell-free massive MIMO: User-centric approach,” IEEE Wireless Commun. Lett., vol. 6, no. 6, pp. 706-709, Dec. 2017.
  • [22] S. Buzzi, C. D’Andrea, A. Zappone, and C. D’Elia, “User-centric 5G cellular networks: Resource allocation and comparison with the cell-free massive MIMO approach,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 1250-1264, Feb. 2020.
  • [23] H. A. Ammar, R. Adve, S. Shahbazpanahi, G. Boudreau, and K. V. Srinivas, “User-centric cell-free massive MIMO networks: A survey of opportunities, challenges and solutions,” IEEE Commun. Surv. Tutor., vol. 24, no. 1, pp. 611-652, 2022.
  • [24] A. A. Polegre, F. Riera-Palou, G. Femenias, and A. G. Armada, “Channel hardening in cell-free and user-centric massive MIMO networks with spatially correlated ricean fading,” IEEE Access, vol. 8, pp. 139827-139845, 2020.
  • [25] F. Riera-Palou, G. Femenias, A. G. Armada, and A. Pérez-Neira, “Clustered cell-free massive MIMO,” in Proc. IEEE Globecom Workshops (GC Wkshps), Dec. 2018, pp. 1-6.
  • [26] E. Nayebi, A. Ashikhmin, T. L. Marzetta, and B. D. Rao, “Performance of cell-free massive MIMO systems with MMSE and LSFD receivers,” in Proc. 50th Asilomar Conf. Signals, Syst. Comput., Nov. 2016, pp. 203–207.
  • [27] E. Björnson and L. Sanguinetti, “Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 77–90, Jan. 2020.
  • [28] Z. H. Shaik, E. Björnson, and E. G. Larsson, “MMSE-optimal sequential processing for cell-free massive MIMO with radio stripes,” IEEE Trans. Commun., vol. 69, no. 11, pp. 7775-7789, Nov. 2021.
  • [29] C. D’Andrea, E. Björnson, and E. G. Larsson, “Improving cell-free massive MIMO by local per-bit soft detection,” IEEE Commun. Lett., vol. 25, no. 7, pp. 2400-2404, Apr. 2021.
  • [30] T. P. Minka, “A family of algorithms for approximate Bayesian Inference,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., MIT, Cambridge, MA, USA, 2001.
  • [31] J. Céspedes, P. M. Olmos, M. Sánchez-Fernández, and F. Perez-Cruz, “Expectation propagation detection for high-order high-dimensional MIMO systems,” IEEE Trans. Commun., vol. 62, no. 8, pp. 2840–2849, Aug. 2014.
  • [32] C. Jeon, K. Li, J. R. Cavallaro, and C. Studer, “Decentralized equalization with feedforward architectures for massive MU-MIMO,” IEEE Trans. Signal Process., vol. 67, no. 17, pp. 4418–4432, Sep. 2019.
  • [33] H. Wang, A. Kosasih, C. Wen, S. Jin, and W. Hardjawana, “Expectation propagation detector for extra-large scale massive MIMO,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2036–2051, Mar. 2020.
  • [34] K. Ghavami and M. Naraghi-Pour, “MIMO detection with imperfect channel state information using expectation propagation,” IEEE Trans. Veh. Technol., vol. 66, no. 9, pp. 8129-8138, Sep. 2017.
  • [35] R. Gholami, L. Cottatellucci, and D. Slock, “Message passing for a Bayesian semi-blind approach to cell-free massive MIMO,” in Proc. 55th Asilomar Conf. Signals Syst. Comput., Nov. 2021, pp. 1237-1241.
  • [36] A. Kosasih, et al. “Improving Cell-Free Massive MIMO Detection Performance via Expectation Propagation,” in Proc. IEEE 94th Veh. Technol. Conf. (VTC-Fall), Sep. 2021, pp. 1-6.
  • [37] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498-519, Feb. 2001.
  • [38] M. Coldrey and P. Bohlin, “Training-based MIMO systems: Part II:Improvements using detected symbol information,” IEEE Trans. Signal Process., vol. 56, no. 1, pp. 296-303, Jan. 2008.
  • [39] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Model-driven deep learning for MIMO detection,” IEEE Trans. Signal Process., vol. 68, pp. 1702–1715, Mar. 2020.
  • [40] R. Prasad, C. R. Murthy, and B. D. Rao, “Joint channel estimation and data detection in MIMO-OFDM systems: A sparse Bayesian learning approach,” IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5369-5382, Oct. 2015.
  • [41] J. Ma and L. Ping, “Data-aided channel estimation in large antenna systems,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3111-3124, Jun. 2014.
  • [42] C.-K. Wen, C.-J. Wang, S. Jin, K.-K. Wong, and P. Ting, “Bayes-optimal joint channel-and-data estimation for massive MIMO with low-precision ADCs,” IEEE Trans. Signal Process., vol. 64, no. 10, pp. 2541-2556, Jul. 2015.
  • [43] H. Song, X. You, C. Zhang, O. Tirkkonen, C. Studer, “Minimizing pilot overhead in cell-Free massive MIMO systems via joint estimation and detection,” in Proc. IEEE 21st Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Atlanta, GA, USA, May. 2020, pp. 1-5.
  • [44] K. Takeuchi, “Rigorous dynamics of expectation-propagation-based signal recovery from unitarily invariant measurements,” IEEE Trans. Inf. Theory, vol. 66, no. 1, pp. 368-386, Jan. 2020.
  • [45] A. M. Tulino and S. Verdu, Random Matrix Theory and Wireless Communications. Hanover, MA USA: Now Publishers Inc., 2004.
  • [46] J. G. Proakis, Digital Communications.   Boston, USA: McGraw-Hill Companies, 2007.
  • [47] A. Fletcher, M. Sahraee-Ardakan, S. Rangan, and P. Schniter, “Expectation consistent approximate inference: Generalizations and convergence,” in Proc. IEEE Int. Symp. Inf. Theory, Barcelona, Spain, Jul. 2016, pp. 190-194.
  • [48] 3GPP, “Further advancements for E-UTRA physical layer aspects (Release 9),” 3GPP TS 36.814, Tech. Rep., Mar. 2017.
  • [49] T. Minka, “Divergence measures and message passing,” Microsoft Research Cambridge, Tech. Rep., 2005.
  • [50] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” IEEE Trans. Inf. Theory, vol. 65, no. 10, pp. 6664-6684, Oct. 2019.