Variational Bayes Inference for Data Detection in Cell-Free Massive MIMO

Ly V. Nguyen¹, Hien Quoc Ngo², Le-Nam Tran³, A. Lee Swindlehurst⁴, and Duy H. N. Nguyen⁵
Email: vnguyen6@sdsu.edu, hien.ngo@qub.ac.uk, nam.tran@ucd.ie, swindle@uci.edu, duy.nguyen@sdsu.edu
¹Computational Science Research Center, San Diego State University, CA, USA
²Institute of Electronics, Communications, and Information Technology (ECIT), Queen’s University Belfast, UK
³School of Electrical and Electronic Engineering, University College Dublin, Ireland
⁴Department of Electrical Engineering and Computer Science, University of California, Irvine, CA, USA
⁵Department of Electrical and Computer Engineering, San Diego State University, CA, USA

Abstract

Cell-free massive MIMO is a promising technology for beyond-5G networks. Through the deployment of many cooperating access points (APs), the technology can significantly enhance user coverage and spectral efficiency compared to traditional cellular systems. Since the APs are distributed over a large area, the level of favorable propagation in cell-free massive MIMO is lower than that in colocated massive MIMO. As a result, current linear processing schemes are not close to optimal when the number of AP antennas is not very large. The aim of this paper is to develop nonlinear variational Bayes (VB) methods for data detection in cell-free massive MIMO systems. In contrast to existing work in the literature, which only attains point estimates of the transmit data symbols, the proposed methods aim to obtain the posterior distribution and the Bayes estimate of the data symbols. We develop the VB methods according to the levels of cooperation among the APs. Simulation results show significant performance advantages of the developed VB methods over the linear processing techniques.

Index Terms:
Cell-free, inference, massive MIMO, variational Bayes.

I Introduction

Cell-free massive multiple-input multiple-output (MIMO) is considered a promising technology for powering beyond-5G networks. The key idea of a cell-free massive MIMO system is to deploy a large number of distributed access points (APs) that coherently serve all users in the system. As illustrated in Fig. 1, the APs in a cell-free system can be randomly located all over the coverage area and are connected to one or several central processing units (CPUs). Due to this distributed deployment, any user is highly likely to be close to at least one AP. A cell-free system can therefore effectively resolve the poor coverage issue in cell-edge areas of conventional cellular systems [1, 2, 3]. In addition, a cell-free system enables different levels of cooperation among the APs with certain levels of joint signal processing at the CPU, ranging from fully centralized processing (Level 4), to partially distributed processing (Levels 3 and 2), and to fully distributed processing (Level 1) [4]. Joint signal processing at the system’s CPU allows a cell-free system to better address the inter-cell interference, which becomes more severe in cellular systems with small cell deployments. Therefore, cell-free massive MIMO systems can offer significant enhancements in user coverage and energy efficiency compared to traditional cellular systems [1, 5, 4].

Figure 1: Diagram of a cell-free massive MIMO system with multiple distributed APs connected to a CPU.

The majority of existing research on uplink cell-free massive MIMO has focused on spectral and energy efficiency analysis with linear signal processing methods, such as maximum-ratio combining (MRC) [1], zero-forcing (ZF) [1], and linear minimum mean-squared error (LMMSE) [4]. While such approaches have relatively low complexity, linear methods do not perform well in systems with a low level of favorable propagation (e.g., when the number of AP antennas is small or is not much larger than the number of users, or when the channels are highly correlated). Nonlinear signal processing is thus a promising alternative approach that can offer higher spectral efficiency [4] or lower bit error rate (BER) [6]. The recent work in [6] proposed a nonlinear optimization-based algorithm for joint channel estimation and data detection in cell-free massive MIMO. However, the approach in [6] can only provide point estimates of the data symbols of interest. Different from these papers, the focus of this paper is on devising efficient algorithms to obtain Bayesian estimates of the data symbols. Unfortunately, computing the exact posterior distributions of the data symbols is intractable, even in a conventional single-cell MIMO system. We therefore develop variational Bayes (VB) inference methods for approximating the intractable posterior distributions of the data symbols, which are then used to detect the symbols. We investigate the VB methods for joint data detection with fully centralized processing at the CPU, as well as for distributed data detection at the APs. For fully centralized processing, we assume that full knowledge of the channel state information (CSI) is available at the CPU. Likewise, for distributed processing at each AP, we assume that CSI knowledge for the channel from the users to that AP is locally available. Simulation results show significant performance advantages of the developed VB methods over the LMMSE processing techniques in [4].

Notation: Upper-case and lower-case boldface letters denote matrices and column vectors, respectively. The transpose and conjugate transpose are denoted by $[\cdot]^{T}$ and $[\cdot]^{H}$, respectively. $\mathcal{CN}(\boldsymbol{\mu},\boldsymbol{\Sigma})$ represents a complex Gaussian random vector with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$; $\mathcal{CN}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma})=\big(1/(\pi^{K}|\boldsymbol{\Sigma}|)\big)\exp\big(-(\mathbf{x}-\boldsymbol{\mu})^{H}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\big)$ denotes the probability density function (PDF) of a length-$K$ random vector $\mathbf{x}\sim\mathcal{CN}(\boldsymbol{\mu},\boldsymbol{\Sigma})$. $\mathbb{E}_{p(x)}[x]$ and $\mathrm{Var}_{p(x)}[x]$ are the mean and the variance of $x$ with respect to its distribution $p(x)$; $\langle x\rangle$ and $\sigma_{x}^{2}$ denote the mean and variance of $x$ with respect to a variational distribution $q(x)$.

II System Model

We consider an uplink cell-free massive MIMO system with $L$ distributed APs, each equipped with $N$ antennas, serving $K$ randomly located single-antenna users. It is assumed that $N\leq K\leq NL$. Denote $\mathbf{h}_{i\ell}\in\mathbb{C}^{N}$ as the uplink channel from the $i$-th user to the $\ell$-th AP and $\mathbf{H}_{\ell}=[\mathbf{h}_{1\ell},\ldots,\mathbf{h}_{K\ell}]$. We assume a block Rayleigh fading scenario in which the channel $\mathbf{h}_{i\ell}$ remains constant for $T$ time slots and is distributed as $\mathcal{CN}(\mathbf{0},\beta_{i\ell}\mathbf{R}_{i\ell})$. Here, $\beta_{i\ell}$ is the large-scale fading coefficient and $\mathbf{R}_{i\ell}$ is the normalized spatial correlation matrix whose diagonal elements are equal to one. Due to the random user deployment, the large-scale fading coefficient $\beta_{i\ell}$ differs from one user to another, resulting in a non-i.i.d. channel matrix $\mathbf{H}_{\ell}$. We assume that the channel vectors $\{\mathbf{h}_{i\ell}\}$ are independent of each other across user-AP pairs.

Let $\mathbf{x}_{t}=[x_{1,t},\ldots,x_{K,t}]^{T}$ be the transmitted symbol vector at time slot $t$, in which the transmitted symbol $x_{i,t}$ from the $i$-th user is drawn from a complex-valued discrete constellation $\mathcal{S}$ such that $\mathbb{E}[x_{i,t}]=0$ and $\mathbb{E}[|x_{i,t}|^{2}]=\rho_{i}$. The prior distribution of $x_{i,t}$ is thus given by

$$p(x_{i,t})=\sum_{a\in\mathcal{S}}p_{a}\,\delta(x_{i,t}-a), \tag{1}$$

where $p_{a}$ corresponds to the known prior probability of the constellation point $a\in\mathcal{S}$. The received signal vector $\mathbf{y}_{\ell,t}\in\mathbb{C}^{N}$ at the $\ell$-th AP can be modeled as

$$\mathbf{y}_{\ell,t}=\sum_{i=1}^{K}\mathbf{h}_{i\ell}x_{i,t}+\mathbf{n}_{\ell,t}=\mathbf{H}_{\ell}\mathbf{x}_{t}+\mathbf{n}_{\ell,t}, \tag{2}$$

where $\mathbf{n}_{\ell,t}$ is the noise vector whose elements are independent and identically distributed (i.i.d.) as $\mathcal{CN}(0,N_{0})$. The goal of this paper is to obtain an estimate $\hat{\mathbf{x}}_{t}$ of $\mathbf{x}_{t}$ from the observed signal vectors $\{\mathbf{y}_{\ell,t}\}$ across the $L$ distributed APs with minimum mean squared detection error $\mathbb{E}\big[\|\mathbf{x}_{t}-\hat{\mathbf{x}}_{t}\|^{2}\big]$.

III Four Levels of Cell-Free Massive MIMO Signal Processing Using LMMSE Filtering

To frame the discussion on the developed VB methods, we revisit the four levels of signal processing in cell-free systems using LMMSE filtering as studied in [4]. Since the processing is carried out on a per-time-slot basis, without loss of generality, we drop the time index $t$.

III-A Level 4: Fully Centralized Processing

At this level, the APs do not process their received signals. Instead, the received signals are forwarded to the CPU for fully centralized processing, including the data detection task. The signals forwarded from the $L$ APs can be stacked into

$$\mathbf{y}=\mathbf{Hx}+\mathbf{n}, \tag{3}$$

where $\mathbf{y}=[\mathbf{y}_{1}^{T},\ldots,\mathbf{y}_{L}^{T}]^{T}$, $\mathbf{H}=[\mathbf{H}_{1}^{T},\ldots,\mathbf{H}_{L}^{T}]^{T}$, and $\mathbf{n}=[\mathbf{n}_{1}^{T},\ldots,\mathbf{n}_{L}^{T}]^{T}$. The processing for cell-free massive MIMO at this level is similar to the processing at a conventional co-located MIMO receiver. The CPU detects $\mathbf{x}=[x_{1},\ldots,x_{K}]^{T}$ using the received signal vector $\mathbf{y}$ and the channel matrix $\mathbf{H}$. Among the linear detectors, the LMMSE detector maximizes the signal-to-interference-and-noise ratio (SINR) and also achieves the best detection performance [4]. With full knowledge of $\mathbf{H}$, the LMMSE estimate $\hat{\mathbf{x}}$ is formed as

$$\hat{\mathbf{x}}=\big(\mathbf{H}^{H}\mathbf{H}+N_{0}\mathbf{I}_{K}\big)^{-1}\mathbf{H}^{H}\mathbf{y}, \tag{4}$$

which is then element-wise projected onto $\mathcal{S}$. We note that the LMMSE filter in the presented form requires the inverse of a $K\times K$ matrix.
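
As an illustration only, the centralized LMMSE detector in (4) and the subsequent projection onto $\mathcal{S}$ might be sketched in NumPy as follows. The function name `lmmse_detect`, the unit-energy QPSK constellation, and the i.i.d. Rayleigh toy channel are assumptions made for this sketch and are not the channel model used in the paper.

```python
import numpy as np

def lmmse_detect(H, y, N0, constellation):
    """Centralized LMMSE estimate as in (4), then element-wise projection onto S."""
    K = H.shape[1]
    # Solve (H^H H + N0 I_K) x_hat = H^H y rather than forming the K x K inverse.
    A = H.conj().T @ H + N0 * np.eye(K)
    x_soft = np.linalg.solve(A, H.conj().T @ y)
    # Project every entry onto the nearest constellation point.
    idx = np.argmin(np.abs(x_soft[:, None] - constellation[None, :]), axis=1)
    return x_soft, constellation[idx]

# Hypothetical toy setup (assumption): unit-energy QPSK, i.i.d. Rayleigh channel.
rng = np.random.default_rng(0)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
K, NL, N0 = 4, 16, 0.01
H = (rng.standard_normal((NL, K)) + 1j * rng.standard_normal((NL, K))) / np.sqrt(2)
x = rng.choice(qpsk, size=K)
n = np.sqrt(N0 / 2) * (rng.standard_normal(NL) + 1j * rng.standard_normal(NL))
y = H @ x + n
x_soft, x_hard = lmmse_detect(H, y, N0, qpsk)
```

Solving the linear system instead of inverting the matrix is a numerical design choice of the sketch; the cost is still dominated by a $K\times K$ factorization.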

III-B Level 3: Local Processing & Large-Scale Fading Decoding

At this level, each AP pre-processes its received signal by computing a local estimate of $\mathbf{x}$ that is forwarded to the CPU for final decoding [4]. Assuming full knowledge of the channel matrix $\mathbf{H}_{\ell}$ at the $\ell$-th AP, the local LMMSE estimate $\check{\mathbf{x}}_{\ell}=[\check{x}_{1\ell},\ldots,\check{x}_{K\ell}]^{T}$ of $\mathbf{x}$ can be found as

$$\check{\mathbf{x}}_{\ell}=\mathbf{H}_{\ell}^{H}\big(\mathbf{H}_{\ell}\mathbf{H}_{\ell}^{H}+N_{0}\mathbf{I}_{N}\big)^{-1}\mathbf{y}_{\ell}. \tag{5}$$

We note that the LMMSE filter in this form requires the inverse of an $N\times N$ matrix. The CPU can then linearly combine the local estimates $\{\check{x}_{i\ell}\,:\,\ell=1,\ldots,L\}$ to obtain the estimate

$$\hat{x}_{i}=\sum_{\ell=1}^{L}a_{i\ell}\check{x}_{i\ell}, \tag{6}$$

which is eventually used to decode $x_{i}$. Here, the weighting coefficient vector $\mathbf{a}_{i}=[a_{i1},\ldots,a_{iL}]^{T}$ relies only on channel statistics and can be optimized by the CPU. This combining method is also known as the large-scale fading decoding (LSFD) strategy in the context of cellular massive MIMO. We note that no instantaneous CSI of any channel is required at the CPU.

III-C Level 2: Local Processing & Simple Centralized Decoding

At this level, the CPU forms an estimate of $x_{i}$ by simply taking the average of the local estimates [4]. This yields the estimate

$$\hat{x}_{i}=\frac{1}{L}\sum_{\ell=1}^{L}\check{x}_{i\ell}. \tag{7}$$

We note that no statistical parameters of CSI are needed at the CPU at this level of centralized signal processing.
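
The local filtering in (5) and the two linear combining rules in (6) and (7) can be sketched as below. This is a minimal illustration: the function names and the array layout (local estimates stored as an $L\times K$ array, weights as a $K\times L$ array) are assumptions of the sketch, and the optimization of the weights $a_{i\ell}$ from channel statistics is not shown.

```python
import numpy as np

def local_lmmse(H_l, y_l, N0):
    """Local LMMSE estimate (5) at one AP; only an N x N system is solved."""
    N = H_l.shape[0]
    A = H_l @ H_l.conj().T + N0 * np.eye(N)
    return H_l.conj().T @ np.linalg.solve(A, y_l)

def combine_level3(local_estimates, a):
    """Weighted combining (6): x_hat[i] = sum_l a[i, l] * local_estimates[l, i]."""
    return np.einsum("il,li->i", a, local_estimates)

def combine_level2(local_estimates):
    """Plain averaging (7) over the L APs (axis 0 of the L x K array)."""
    return np.mean(local_estimates, axis=0)
```

With uniform weights $a_{i\ell}=1/L$, the Level 3 rule reduces to the Level 2 average, which is a convenient sanity check.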

III-D Level 1: Small-Cell Network

At this level, each user signal is decoded by only the one AP that gives the highest spectral efficiency to the user, i.e., the highest SINR [4]. LMMSE filtering can be applied to obtain the local estimate of the user signal. Since only one estimate per user is forwarded to the CPU, no centralized decoding is required.

IV Variational Bayes for Cell-Free Detection

In this paper, we focus on developing VB-based methods for data detection in cell-free massive MIMO systems that require certain levels of centralized processing, i.e., Levels 4, 3, and 2. For Level 4 processing, we assume that the symbol vectors are estimated independently at each time slot. However, for Levels 3 and 2 processing, we assume that the symbol vectors are first estimated locally over the whole fading block. As explained later in the section, this method of processing helps reduce the amount of signaling to the CPU, where the local estimates are aggregated to obtain the final estimate.

IV-A Background on VB

We first present the background on VB for approximate inference that will be exploited for solving the data detection problem in cell-free systems. VB inference is a powerful framework from machine learning that approximates intractable posterior distributions of latent variables with a known family of simpler distributions through optimization. The goal of VB inference is to find an approximation for a computationally intractable posterior distribution $p(\mathbf{x}|\mathbf{y})$ given a probabilistic model that specifies the joint distribution $p(\mathbf{x},\mathbf{y})$, where $\mathbf{y}$ represents the set of all observed variables and $\mathbf{x}$ is a set of $m$ latent variables and parameters. The VB inference method aims at finding a density function $q(\mathbf{x})$ with its own setting of variational parameters within a family $\mathcal{Q}$ of density functions that makes $q(\mathbf{x})$ close to the posterior distribution of interest $p(\mathbf{x}|\mathbf{y})$. VB inference amounts to solving the following optimization problem:

$$\begin{aligned} q(\mathbf{x}) &=\arg\min_{q(\mathbf{x})\in\mathcal{Q}}\;\mathrm{KL}\big(q(\mathbf{x})\,\|\,p(\mathbf{x}|\mathbf{y})\big) \\ &=\arg\min_{q(\mathbf{x})\in\mathcal{Q}}\;\mathbb{E}_{q(\mathbf{x})}\big[\ln q(\mathbf{x})\big]-\mathbb{E}_{q(\mathbf{x})}\big[\ln p(\mathbf{x}|\mathbf{y})\big], \end{aligned} \tag{8}$$

where $\mathrm{KL}\big(q(\mathbf{x})\,\|\,p(\mathbf{x}|\mathbf{y})\big)$ is the Kullback-Leibler (KL) divergence from $q(\mathbf{x})$ to $p(\mathbf{x}|\mathbf{y})$. Minimizing the KL divergence is equivalent to maximizing the evidence lower bound (ELBO) [7], which is defined as

$$\mathrm{ELBO}(q)=\mathbb{E}_{q(\mathbf{x})}\big[\ln p(\mathbf{x},\mathbf{y})\big]-\mathbb{E}_{q(\mathbf{x})}\big[\ln q(\mathbf{x})\big]. \tag{9}$$
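
The equivalence between the KL objective and the ELBO can be made explicit with a short worked step. The derivation below is a standard identity written in the notation of this section; it only uses $p(\mathbf{x}|\mathbf{y})=p(\mathbf{x},\mathbf{y})/p(\mathbf{y})$ and the fact that $\ln p(\mathbf{y})$ does not depend on $q$.

```latex
\begin{align*}
\mathrm{KL}\big(q(\mathbf{x})\,\|\,p(\mathbf{x}|\mathbf{y})\big)
  &= \mathbb{E}_{q(\mathbf{x})}\big[\ln q(\mathbf{x})\big]
     - \mathbb{E}_{q(\mathbf{x})}\big[\ln p(\mathbf{x}|\mathbf{y})\big] \\
  &= \mathbb{E}_{q(\mathbf{x})}\big[\ln q(\mathbf{x})\big]
     - \mathbb{E}_{q(\mathbf{x})}\big[\ln p(\mathbf{x},\mathbf{y})\big]
     + \ln p(\mathbf{y}) \\
  &= \ln p(\mathbf{y}) - \mathrm{ELBO}(q).
\end{align*}
```

Because the KL divergence is nonnegative, this also shows that $\mathrm{ELBO}(q)\leq\ln p(\mathbf{y})$, which is where the name "evidence lower bound" comes from.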

The maximum of $\mathrm{ELBO}(q)$ occurs when $q(\mathbf{x})=p(\mathbf{x}|\mathbf{y})$. Since working with the true posterior distribution is often intractable, it is more convenient to consider a restricted family of distributions $q(\mathbf{x})$. Among VB inference methods, the mean-field approximation enables efficient optimization of the variational distribution over a partition of the latent variables, while keeping the variational distributions over other partitions fixed [7]. The mean-field variational family is constructed such that

$$q(\mathbf{x})=\prod_{i=1}^{m}q_{i}(x_{i}), \tag{10}$$

where the latent variables are mutually independent and each is governed by a distinct factor in the variational density. Among all mean-field distributions $q(\mathbf{x})$, the general expression for the optimal solution of the variational density $q_{i}(x_{i})$ that maximizes the ELBO can be obtained as [7]

$$q_{i}(x_{i})\propto\exp\big\{\big\langle\ln p(\mathbf{y}|\mathbf{x})+\ln p(\mathbf{x})\big\rangle\big\}, \tag{11}$$

where $\langle\cdot\rangle$ denotes the expectation with respect to all latent variables except $x_{i}$ using the currently fixed variational density $q_{-i}(\mathbf{x}_{-i})=\prod_{j\neq i}q_{j}(x_{j})$. By iterating the update of $q_{i}(x_{i})$ sequentially over all $i$, the $\mathrm{ELBO}(q)$ objective function can be monotonically improved. This is the basis of the coordinate ascent variational inference (CAVI) algorithm, which guarantees convergence to at least a local optimum of $\mathrm{ELBO}(q)$ [7, 8]. To this end, we examine how the mean-field VB framework can be exploited for data detection at different levels of cooperation in a cell-free system.

IV-B Level 4: Fully Centralized Processing

At this level, the signals forwarded from the APs can be stacked into a single large-scale MIMO system as shown in (3). In a recent work [9], we developed several VB-based methods for MIMO data detection. Among them, the LMMSE-VB algorithm showed superior performance in MIMO systems with non-i.i.d. channels. The algorithm can therefore be adopted for data detection in cell-free systems with fully centralized processing. In the following, we present the key operations of the algorithm. For details of the algorithm, we refer the reader to [9].

The LMMSE-VB algorithm treats the background noise covariance matrix as an unknown random variable, instead of treating the noise variance $N_{0}$ as known. The postulated noise covariance matrix $\mathbf{C}^{\mathrm{post}}$ is estimated by the algorithm itself. For ease of computation, we use $\mathbf{W}=(\mathbf{C}^{\mathrm{post}})^{-1}$ to denote the precision matrix and assume a conjugate prior complex Wishart distribution $\mathcal{CW}(\mathbf{W}_{0},n)$ for $\mathbf{W}$, where $\mathbf{W}_{0}\succeq\mathbf{0}$ is the scale matrix and $n\geq NL$ indicates the degrees of freedom. The PDF of $\mathbf{W}\sim\mathcal{CW}(\mathbf{W}_{0},n)$ satisfies

$$p(\mathbf{W})\propto|\mathbf{W}|^{n-M}\exp\big(-\operatorname{tr}\{\mathbf{W}_{0}^{-1}\mathbf{W}\}\big), \tag{12}$$

where $M=NL$ is the dimension of $\mathbf{W}$.

The joint distribution $p(\mathbf{y},\mathbf{x},\mathbf{W};\mathbf{H})$ can be factored as

$$p(\mathbf{y},\mathbf{x},\mathbf{W};\mathbf{H})=p(\mathbf{y}|\mathbf{x},\mathbf{W};\mathbf{H})\,p(\mathbf{x})\,p(\mathbf{W}), \tag{13}$$

where $p(\mathbf{y}|\mathbf{x},\mathbf{W};\mathbf{H})=\mathcal{CN}(\mathbf{y};\mathbf{Hx},\mathbf{W}^{-1})$. Given the observation $\mathbf{y}$, we aim at obtaining the mean-field variational distribution $q(\mathbf{x},\mathbf{W})$ such that

$$p(\mathbf{x},\mathbf{W}|\mathbf{y};\mathbf{H})\approx q(\mathbf{x},\mathbf{W})=\prod_{i=1}^{K}q_{i}(x_{i})\,q(\mathbf{W}). \tag{14}$$

The optimization of $q(\mathbf{x},\mathbf{W})$ is executed by iteratively updating $\{x_{i}\}$ and $\mathbf{W}$ as follows.

a) Updating $x_{i}$. The variational distribution $q_{i}(x_{i})$ is obtained by expanding the conditional in (13) and taking the expectation with respect to all latent variables except $x_{i}$ using the variational distribution $\prod_{j\neq i}^{K}q_{j}(x_{j})\,q(\mathbf{W})$:

$$q_{i}(x_{i})\propto p(x_{i})\,\mathcal{CN}\big(z_{i};x_{i},1/\big(\mathbf{h}_{i}^{H}\langle\mathbf{W}\rangle\mathbf{h}_{i}\big)\big), \tag{15}$$

where $z_{i}$ is a linear estimate of $x_{i}$ that is defined as

$$z_{i}=\langle x_{i}\rangle+\frac{\mathbf{h}_{i}^{H}\langle\mathbf{W}\rangle}{\mathbf{h}_{i}^{H}\langle\mathbf{W}\rangle\mathbf{h}_{i}}\big(\mathbf{y}-\mathbf{H}\langle\mathbf{x}\rangle\big). \tag{16}$$

It is observed in (15) that $\mathcal{CN}\big(z_{i};x_{i},\hat{\sigma}_{i}^{2}\big)$ with $\hat{\sigma}_{i}^{2}=1/\big(\mathbf{h}_{i}^{H}\langle\mathbf{W}\rangle\mathbf{h}_{i}\big)$ can be interpreted as the likelihood function $p\big(z_{i}|x_{i};\hat{\sigma}_{i}^{2}\big)$. In other words, the mean-field VB approximation decouples the linear MIMO system into $K$ parallel AWGN channels $z_{i}=x_{i}+\mathcal{CN}\big(0,\hat{\sigma}_{i}^{2}\big)$.

The variational distribution $q_{i}(x_{i})$ is realized by normalizing $p(x_{i})\,\mathcal{CN}\big(z_{i};x_{i},\hat{\sigma}_{i}^{2}\big)$. The variational mean $\langle x_{i}\rangle=\mathbb{E}[x_{i}|z_{i}]$ and variance $\sigma_{x_{i}}^{2}$ are then computed accordingly.
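
Since the prior in (1) is discrete, the normalization of $p(x_{i})\,\mathcal{CN}(z_{i};x_{i},\hat{\sigma}_{i}^{2})$ reduces to a finite sum over the constellation. A minimal sketch is given below; the function name `symbol_posterior`, the default uniform prior, and the log-domain stabilization are assumptions of this illustration.

```python
import numpy as np

def symbol_posterior(z, sigma2, constellation, prior=None):
    """Return (q, mean, var) of q_i(x_i) proportional to p(a) * CN(z; a, sigma2), a in S."""
    if prior is None:
        # Assumption: equiprobable constellation points.
        prior = np.full(len(constellation), 1.0 / len(constellation))
    # Log-weights; factors that do not depend on a are dropped.
    log_w = np.log(prior) - np.abs(z - constellation) ** 2 / sigma2
    q = np.exp(log_w - log_w.max())  # subtract the maximum for numerical stability
    q /= q.sum()
    mean = np.sum(q * constellation)                          # <x_i> = E[x_i | z_i]
    second_moment = np.sum(q * np.abs(constellation) ** 2)
    var = max(second_moment - np.abs(mean) ** 2, 0.0)         # clip tiny negative round-off
    return q, mean, var

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
q, mean, var = symbol_posterior(0.6 + 0.5j, 0.2, qpsk)
```

When $\hat{\sigma}_{i}^{2}$ is small, the posterior concentrates on the constellation point nearest to $z_{i}$ and the variance approaches zero; when $z_{i}$ is equidistant from all points, the posterior falls back to the prior.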

b) Updating $\mathbf{W}$. The variational distribution $q(\mathbf{W})$ is obtained by taking the expectation of the conditional in (13) with respect to $q(\mathbf{x})$:

$$q(\mathbf{W})\propto\exp\big\{\big\langle\ln p(\mathbf{y}|\mathbf{x},\mathbf{W};\mathbf{H})+\ln p(\mathbf{W})\big\rangle\big\}. \tag{17}$$

The variational distribution $q(\mathbf{W})$ is also complex Wishart with $n+1$ degrees of freedom [9]. The variational mean $\langle\mathbf{W}\rangle$ can be computed accordingly. In [9], we also proposed to use the estimator

$$\langle\mathbf{W}\rangle=\bigg(\frac{\|\mathbf{y}-\mathbf{H}\langle\mathbf{x}\rangle\|^{2}}{NL}\mathbf{I}_{NL}+\mathbf{H}\boldsymbol{\Sigma}_{\mathbf{x}}\mathbf{H}^{H}\bigg)^{-1}, \tag{18}$$

where $\boldsymbol{\Sigma}_{\mathbf{x}}=\mathrm{diag}(\sigma_{x_{1}}^{2},\ldots,\sigma_{x_{K}}^{2})$.

By iteratively optimizing $\big\{q_{i}(x_{i})\big\}$ and $q(\mathbf{W})$ via the updates of $\{\langle x_{i}\rangle\}$ and $\langle\mathbf{W}\rangle$, we obtain the CAVI algorithm for estimating $\mathbf{x}$ and the precision matrix $\mathbf{W}$. We refer to this scheme as the LMMSE-VB algorithm since $z_{i}$ resembles an LMMSE estimate of $x_{i}$ due to the cancellation of the inter-user interference and the whitening with the postulated noise covariance matrix $\mathbf{C}^{\mathrm{post}}$.
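
To make the alternating updates concrete, the following is a rough sketch of one possible CAVI loop built from (15), (16), and the estimator in (18), with the residual evaluated at the current variational mean. It is not the reference implementation of [9]: the function names, the uniform prior, the zero initialization, the fixed iteration count `n_iter`, and the small regularizer `eps` (added only to keep the matrix invertible when the residual vanishes) are all assumptions of this illustration.

```python
import numpy as np

def symbol_posterior(z, sigma2, constellation):
    """Mean and variance of q_i(x_i) under a uniform prior (assumption)."""
    log_w = -np.abs(z - constellation) ** 2 / sigma2
    q = np.exp(log_w - log_w.max())
    q /= q.sum()
    mean = np.sum(q * constellation)
    var = max(np.sum(q * np.abs(constellation) ** 2) - np.abs(mean) ** 2, 0.0)
    return mean, var

def lmmse_vb(H, y, constellation, n_iter=20, eps=1e-9):
    """Sketch of the alternating updates of <x_i> and <W> for y = Hx + n."""
    M, K = H.shape
    x_mean = np.zeros(K, dtype=complex)                        # initial <x_i> = 0
    x_var = np.full(K, np.mean(np.abs(constellation) ** 2))    # prior variance
    for _ in range(n_iter):
        # Update <W> with the estimator (18); eps is a numerical safeguard only.
        r = y - H @ x_mean
        C = (np.linalg.norm(r) ** 2 / M + eps) * np.eye(M) + (H * x_var) @ H.conj().T
        W = np.linalg.inv(C)
        # Sequentially update each q_i(x_i) using (15) and (16).
        for i in range(K):
            hW = H[:, i].conj() @ W                 # h_i^H <W>
            gain = np.real(hW @ H[:, i])            # h_i^H <W> h_i
            z = x_mean[i] + hW @ (y - H @ x_mean) / gain
            x_mean[i], x_var[i] = symbol_posterior(z, 1.0 / gain, constellation)
    idx = np.argmin(np.abs(x_mean[:, None] - constellation[None, :]), axis=1)
    return x_mean, constellation[idx]

# Hypothetical toy run (assumption): i.i.d. Rayleigh channel, unit-energy QPSK.
rng = np.random.default_rng(0)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
M, K, N0 = 16, 4, 0.01
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
x = rng.choice(qpsk, size=K)
y = H @ x + np.sqrt(N0 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
x_mean, x_hard = lmmse_vb(H, y, qpsk)
```

The sketch recomputes the residual after every symbol update so that each $z_{i}$ uses the most recent means of the other symbols, in line with the sequential coordinate ascent described above.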

IV-C Level 3: Local Processing & Nonlinear Decoding

At this level, our proposed VB-based method involves two operations: 1) executing the LMMSE-VB algorithm independently at each AP to compute local estimates of $\mathbf{x}_{t}$, and 2) aggregating the local estimates at the CPU for joint nonlinear decoding of $\mathbf{x}_{t}$. However, we make a minor modification to the LMMSE-VB algorithm that allows it to operate over the whole block of $T$ time slots.

IV-C1 AP Processing

The signal processing at an AP, say the $\ell$-th AP, is to generate a coarse estimate $\hat{\mathbf{x}}_{t}$ of $\mathbf{x}_{t}$ from the observation $\mathbf{y}_{\ell,t}$. We treat the background noise covariance matrix at the $\ell$-th AP as an unknown random variable. The postulated noise covariance matrix $\mathbf{C}_{\ell}^{\mathrm{post}}$ has to be estimated as well. We denote the precision matrix $\mathbf{W}_{\ell}=(\mathbf{C}_{\ell}^{\mathrm{post}})^{-1}$, $\mathbf{Y}_{\ell}=[\mathbf{y}_{\ell,1},\ldots,\mathbf{y}_{\ell,T}]$, and $\mathbf{X}=[\mathbf{x}_{1},\ldots,\mathbf{x}_{T}]$. The joint distribution $p(\mathbf{Y}_{\ell},\mathbf{X},\mathbf{W}_{\ell};\mathbf{H}_{\ell})$ can be factorized as

$$p(\mathbf{Y}_{\ell},\mathbf{X},\mathbf{W}_{\ell};\mathbf{H}_{\ell})=p(\mathbf{Y}_{\ell}|\mathbf{X},\mathbf{W}_{\ell};\mathbf{H}_{\ell})\,p(\mathbf{X})\,p(\mathbf{W}_{\ell}), \tag{19}$$

where $p(\mathbf{Y}_{\ell}|\mathbf{X},\mathbf{W}_{\ell};\mathbf{H}_{\ell})=\prod_{t=1}^{T}p(\mathbf{y}_{\ell,t}|\mathbf{x}_{t},\mathbf{W}_{\ell};\mathbf{H}_{\ell})$ with $p(\mathbf{y}_{\ell,t}|\mathbf{x}_{t},\mathbf{W}_{\ell};\mathbf{H}_{\ell})=\mathcal{CN}\big(\mathbf{y}_{\ell,t};\mathbf{H}_{\ell}\mathbf{x}_{t},\mathbf{W}_{\ell}^{-1}\big)$. Given the observation $\mathbf{Y}_{\ell}$, we aim at obtaining the mean-field variational distribution $q_{\ell}(\mathbf{X},\mathbf{W}_{\ell})$ such that

$$\begin{aligned} p(\mathbf{X},\mathbf{W}_{\ell}|\mathbf{Y}_{\ell};\mathbf{H}_{\ell}) &\approx q_{\ell}(\mathbf{X},\mathbf{W}_{\ell}) \\ &=\prod_{i=1}^{K}\prod_{t=1}^{T}q_{i\ell,t}(x_{i,t})\,q(\mathbf{W}_{\ell}). \end{aligned} \tag{20}$$

The optimization of $q_{\ell}(\mathbf{X},\mathbf{W}_{\ell})$ is executed by iteratively updating $\{x_{i,t}\}$ and $\mathbf{W}_{\ell}$ as follows.

a) Update $x_{i,t}$: The variational distribution $q_{i\ell,t}(x_{i,t})$ is obtained by expanding the conditional in (19) and taking the expectation with respect to all latent variables except $x_{i,t}$ using the variational distribution $\prod_{(j,r)\neq(i,t)}q_{j\ell,r}(x_{j,r})\,q(\mathbf{W}_{\ell})$:

$$\begin{aligned} q_{i\ell,t}(x_{i,t}) &\propto\exp\big\{\big\langle\ln p(\mathbf{y}_{\ell,t}|\mathbf{x}_{t},\mathbf{W}_{\ell};\mathbf{H}_{\ell})+\ln p(\mathbf{x}_{t})\big\rangle\big\} \\ &\propto p(x_{i,t})\exp\big\{\big\langle-(\mathbf{y}_{\ell,t}-\mathbf{H}_{\ell}\mathbf{x}_{t})^{H}\mathbf{W}_{\ell}(\mathbf{y}_{\ell,t}-\mathbf{H}_{\ell}\mathbf{x}_{t})\big\rangle\big\} \\ &\propto p(x_{i,t})\exp\big\{-\mathbf{h}_{i\ell}^{H}\langle\mathbf{W}_{\ell}\rangle\mathbf{h}_{i\ell}\,|x_{i,t}-z_{i\ell,t}|^{2}\big\} \\ &\propto p(x_{i,t})\,\mathcal{CN}\big(z_{i\ell,t};x_{i,t},1/(\mathbf{h}_{i\ell}^{H}\langle\mathbf{W}_{\ell}\rangle\mathbf{h}_{i\ell})\big), \end{aligned} \tag{21}$$

where

$$\begin{aligned} z_{i\ell,t} &=\frac{\mathbf{h}_{i\ell}^{H}\langle\mathbf{W}_{\ell}\rangle}{\mathbf{h}_{i\ell}^{H}\langle\mathbf{W}_{\ell}\rangle\mathbf{h}_{i\ell}}\Big(\mathbf{y}_{\ell,t}-\sum_{j\neq i}^{K}\mathbf{h}_{j\ell}\langle x_{j,t}\rangle\Big) \\ &=\langle x_{i,t}\rangle+\frac{\mathbf{h}_{i\ell}^{H}\langle\mathbf{W}_{\ell}\rangle(\mathbf{y}_{\ell,t}-\mathbf{H}_{\ell}\langle\mathbf{x}_{t}\rangle)}{\mathbf{h}_{i\ell}^{H}\langle\mathbf{W}_{\ell}\rangle\mathbf{h}_{i\ell}}. \end{aligned} \tag{22}$$

It is observed in (21) that $\mathcal{CN}\big(z_{i\ell,t};x_{i,t},\check{\sigma}_{i\ell}^{2}\big)$ with $\check{\sigma}_{i\ell}^{2}=1/\big(\mathbf{h}_{i\ell}^{H}\langle\mathbf{W}_{\ell}\rangle\mathbf{h}_{i\ell}\big)$ can be interpreted as the likelihood function $p\big(z_{i\ell,t}|x_{i,t};\check{\sigma}_{i\ell}^{2}\big)$. In this case, the mean-field VB approximation decouples the uplink MIMO channel to the $\ell$-th AP into $K$ parallel AWGN channels $z_{i\ell,t}=x_{i,t}+\mathcal{CN}\big(0,\check{\sigma}_{i\ell}^{2}\big)$. It is also observed that $z_{i\ell,t}$ is the local LMMSE estimate of $x_{i,t}$, while the variance $\check{\sigma}_{i\ell}^{2}$ indicates the reliability of this estimate.

The variational distribution $q_{i\ell,t}(x_{i,t})$ is realized by normalizing $p(x_{i,t})\,\mathcal{CN}\big(z_{i\ell,t};x_{i,t},\check{\sigma}_{i\ell}^{2}\big)$. The variational mean $\langle x_{i,t}\rangle=\mathbb{E}[x_{i,t}|z_{i\ell,t}]$ and variance $\sigma_{x_{i,t}}^{2}$ can be computed accordingly. Hereafter, we use $\check{x}_{i\ell,t}$ instead of $\langle x_{i,t}\rangle$ or $\mathbb{E}[x_{i,t}|z_{i\ell,t}]$ to indicate the nonlinear MMSE estimate of $x_{i,t}$ at the $\ell$-th AP.

b) Update $\mathbf{W}_{\ell}$: The variational distribution $q(\mathbf{W}_{\ell})$ is obtained by taking the expectation of the conditional in (19) with respect to $\prod_{i=1}^{K}\prod_{t=1}^{T}q_{i\ell,t}(x_{i,t})$:

$$q(\mathbf{W}_{\ell})\propto\exp\big\{\big\langle\ln p(\mathbf{Y}_{\ell}|\mathbf{X},\mathbf{W}_{\ell};\mathbf{H}_{\ell})+\ln p(\mathbf{W}_{\ell})\big\rangle\big\}. \tag{23}$$

$$\langle\mathbf{W}_{\ell}\rangle=(n+T)\Bigg(\mathbf{W}_{0,\ell}^{-1}+\big(\mathbf{Y}_{\ell}-\mathbf{H}_{\ell}\langle\mathbf{X}\rangle\big)\big(\mathbf{Y}_{\ell}-\mathbf{H}_{\ell}\langle\mathbf{X}\rangle\big)^{H}+\sum_{t=1}^{T}\mathbf{H}_{\ell}\boldsymbol{\Sigma}_{\mathbf{x},t}\mathbf{H}_{\ell}^{H}\Bigg)^{-1}. \tag{24}$$

Assuming a conjugate prior complex Wishart distribution $\mathcal{CW}(\mathbf{W}_{0,\ell},n)$ for $\mathbf{W}_{\ell}$, parameterized as in (12), the variational distribution $q(\mathbf{W}_{\ell})$ is also complex Wishart with $n+T$ degrees of freedom. The variational mean $\langle\mathbf{W}_{\ell}\rangle$ is given in (24), where $\langle\mathbf{X}\rangle$ collects the variational means $\langle x_{i,t}\rangle$ and $\boldsymbol{\Sigma}_{\mathbf{x},t}=\operatorname{diag}(\sigma_{x_{1,t}}^{2},\ldots,\sigma_{x_{K,t}}^{2})$.

The LMMSE-VB algorithm is executed at the $\ell$-th AP by iteratively optimizing $\{q_{i\ell,t}(x_{i,t})\}$ and $q(\mathbf{W}_{\ell})$ via the updates of $\{\langle x_{i,t}\rangle\}$ and $\langle\mathbf{W}_{\ell}\rangle$. The $\ell$-th AP then sends the LMMSE estimate $z_{i\ell,t}$ and the variance $\check{\sigma}_{i\ell}^{2}$ to the CPU for centralized decoding. By pre-processing the whole block of $T$ time slots, $\check{\sigma}_{i\ell}^{2}$ is sent only once for each channel realization. In contrast, if the LMMSE-VB algorithm is executed on a per-time-slot basis, the variance of the LMMSE estimate $z_{i\ell,t}$ has to be computed and sent for each time slot.

IV-C2 CPU Processing

After collecting the local estimates $z_{i\ell,t}$ and the variances $\check{\sigma}_{i\ell}^{2}$ from the $L$ APs, the CPU can proceed to decode each of the $K$ symbols independently. Since $z_{i\ell,t}=x_{i,t}+\mathcal{CN}\big(0,\check{\sigma}_{i\ell}^{2}\big)$, an approximate posterior distribution $p(x_{i,t}|\{z_{i\ell,t}\};\{\check{\sigma}_{i\ell}^{2}\})$ can be easily derived. The MAP estimate $\hat{x}_{i,t}$ of $x_{i,t}$ is obtained as

$$\hat{x}_{i,t}=\arg\max_{x_{i,t}\in\mathcal{S}}\left(\ln p(x_{i,t})-\sum_{\ell=1}^{L}\frac{|z_{i\ell,t}-x_{i,t}|^{2}}{\check{\sigma}_{i\ell}^{2}}\right). \tag{25}$$

We note that the above nonlinear combination of local estimates and reliability information is significantly different from the linear combination of local estimates in (6).
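
The MAP rule in (25) amounts to evaluating one metric per constellation point, in which each AP's squared distance is weighted by the inverse of its reported variance. A minimal sketch for a single symbol follows; the function name `cpu_map_combine`, the default uniform prior, and the example numbers are assumptions of this illustration.

```python
import numpy as np

def cpu_map_combine(z, sigma2, constellation, prior=None):
    """MAP decision (25) for one symbol from L local estimates z and variances sigma2."""
    if prior is None:
        # Assumption: equiprobable constellation points.
        prior = np.full(len(constellation), 1.0 / len(constellation))
    # dist2[l, a] = |z_l - a|^2 for AP l and constellation point a.
    dist2 = np.abs(z[:, None] - constellation[None, :]) ** 2
    metric = np.log(prior) - np.sum(dist2 / sigma2[:, None], axis=0)
    return constellation[np.argmax(metric)]

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
# Hypothetical example: AP 1 is reliable, AP 2 reports an unreliable, conflicting estimate.
z = np.array([qpsk[0], qpsk[3]])
sigma2 = np.array([0.1, 10.0])
x_hat = cpu_map_combine(z, sigma2, qpsk)
```

In this example a plain average of the two local estimates would land at the origin and carry no information, whereas the variance-weighted metric follows the reliable AP.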

IV-D Level 2: Local Processing & Simple Linear Combining

At this level, only local estimates are fed back to the CPU. The LMMSE-VB algorithm described for Level 3 signal processing can be used to generate the coarse local estimates. However, the local nonlinear MMSE estimates $\check{x}_{i\ell,t}$ are sent, instead of the LMMSE estimate $z_{i\ell,t}$ and the variance $\check{\sigma}_{i\ell}^{2}$. We note that $\check{x}_{i\ell,t}$ can be computed using $z_{i\ell,t}$ and $\check{\sigma}_{i\ell}^{2}$, but not the reverse.

A simple estimate of $x_{i,t}$ can be obtained by taking the average of all the estimates $\check{x}_{i\ell,t}$ as

$$\hat{x}_{i,t}=\frac{1}{L}\sum_{\ell=1}^{L}\check{x}_{i\ell,t}. \tag{26}$$

The final detected symbol of $x_{i,t}$ is the constellation point that is closest to $\hat{x}_{i,t}$.

V Numerical Results

This section presents the numerical results comparing the developed VB-based methods for data detection in cell-free systems with the LMMSE filtering methods in [4]. We use a simulation setting and a channel model in urban environments similar to the work in [4]. In particular, a network area of $1\times 1$ km is considered where the APs are deployed on a square grid and users are randomly distributed. The large-scale fading coefficient of the channel between user-$i$ and AP-$\ell$ (in dB) is given as

$$\beta_{i\ell}=-30.5-36.7\log_{10}(d_{i\ell})+F_{i\ell}, \tag{27}$$

where $d_{i\ell}$ (in m) is the distance between user-$i$ and AP-$\ell$ and $F_{i\ell}\sim\mathcal{N}(0,16)$ is the shadow fading. The correlation between the shadowing terms from an AP to different users is modeled as

$$\mathbb{E}[F_{i\ell}F_{i^{\prime}\ell^{\prime}}]=\begin{cases}16\times 2^{-\delta_{ii^{\prime}}/9},&\ell=\ell^{\prime}\\ 0,&\ell\neq\ell^{\prime}\end{cases} \tag{28}$$

where $\delta_{ii^{\prime}}$ (in m) is the distance between user-$i$ and user-$i^{\prime}$. Receive antennas at each AP are arranged in a uniform linear array with half-wavelength spacing. For spatial correlation, we use the Gaussian local scattering model with a $15^{\circ}$ angular standard deviation [10]. We set the noise as $\mathcal{CN}(0,1)$ and vary the transmit power of users.
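To make the large-scale fading model concrete, the Python sketch below draws the coefficients in (27) with shadowing that is correlated across users according to (28) and independent across APs. It is a minimal illustration rather than the simulation code behind the reported results. The function names, the use of planar coordinates in metres, the Cholesky-based sampling, and the example positions are all assumptions.

```python
import math
import random


def shadowing_cov(user_xy, var=16.0, decorr=9.0):
    """Covariance of the shadowing terms across users for one AP, as in (28)."""
    K = len(user_xy)
    return [[var * 2.0 ** (-math.dist(user_xy[i], user_xy[j]) / decorr)
             for j in range(K)] for i in range(K)]


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    Lo = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(Lo[i][k] * Lo[j][k] for k in range(j))
            if i == j:
                Lo[i][j] = math.sqrt(A[i][i] - s)
            else:
                Lo[i][j] = (A[i][j] - s) / Lo[j][j]
    return Lo


def large_scale_fading_db(user_xy, ap_xy, rng):
    """Return beta[i][l] in dB from (27); assumes every distance is positive."""
    K = len(user_xy)
    Lo = cholesky(shadowing_cov(user_xy))
    beta = [[0.0] * len(ap_xy) for _ in range(K)]
    for l, ap in enumerate(ap_xy):
        # Fresh standard normal draws per AP make shadowing independent across APs.
        g = [rng.gauss(0.0, 1.0) for _ in range(K)]
        for i, user in enumerate(user_xy):
            F = sum(Lo[i][k] * g[k] for k in range(i + 1))  # correlated across users
            d = math.dist(user, ap)
            beta[i][l] = -30.5 - 36.7 * math.log10(d) + F
    return beta


# Hypothetical layout: two users and three APs inside a 1000 m x 1000 m area.
users = [(100.0, 200.0), (400.0, 650.0)]
aps = [(0.0, 0.0), (500.0, 500.0), (1000.0, 0.0)]
beta_db = large_scale_fading_db(users, aps, random.Random(0))
```

With the 9 m decorrelation distance in (28), the shadowing of two users becomes nearly uncorrelated once they are a few tens of metres apart, so for widely separated users the covariance matrix is close to diagonal.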

In this work, we compare different data detection methods assuming perfect CSI and QPSK signalling. We assume that each AP is equipped with 4 antennas, i.e., $N=4$. Fig. 2 presents the symbol error rate (SER) performance of the two types of methods in a relatively small cell-free system with $K=16$ and $L=16$. As the user transmit power is increased, the VB-based methods attain much lower SER than the MMSE filtering methods. A gain of up to 2 dB is observed at Level 4, and a gain of up to 4 dB is observed at Levels 3 and 2.

Fig. 3 presents the SER performance of a cell-free system with $K=40$ and $L=64$. The figure clearly indicates the superior performance of the proposed VB-based methods over the MMSE filtering methods. It is also observed from both figures that the more centralized the signal processing carried out at the CPU, the better the SER performance that can be achieved, especially in systems with a large number of users, e.g., $K=40$.

Figure 2: SER performance of the VB-based methods (in solid lines) and LMMSE methods (in dashed lines) versus the user transmit power, with $K=16$, $L=16$, and $N=4$.
Figure 3: SER performance of the VB-based methods (in solid lines) and LMMSE methods (in dashed lines) versus the user transmit power, with $K=40$, $L=64$, and $N=4$.

VI Conclusion

In this paper, we have proposed VB-based methods for data detection in cell-free systems at three different levels of AP cooperation. The proposed methods can achieve much lower SER than the linear MMSE signal processing methods. We note that the presented study only considers the case of perfect CSI available at the CPU (for Level 4) and at the APs (for Levels 3 and 2). As an extension of this paper, we are developing novel VB-based methods for data detection with imperfect CSI and for joint channel estimation and data detection in cell-free systems.

Acknowledgment

This work was supported by the U.S. National Science Foundation under Grants ECCS-2146436 and CCF-2225576.

References

  • [1] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
  • [2] G. Interdonato, E. Björnson, H. Quoc Ngo, P. Frenger, and E. G. Larsson, EURASIP J. Wireless Commun. and Networking, 2019. [Online]. Available: https://doi.org/10.1186/s13638-019-1507-0
  • [3] E. Björnson and L. Sanguinetti, “Scalable cell-free massive MIMO systems,” IEEE Trans. Commun., vol. 68, no. 7, pp. 4247–4261, July 2020.
  • [4] ——, “Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 77–90, Jan. 2020.
  • [5] E. Nayebi, A. Ashikhmin, T. L. Marzetta, H. Yang, and B. D. Rao, “Precoding and power optimization in cell-free massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 7, pp. 4445–4459, July 2017.
  • [6] H. Song, T. Goldstein, X. You, C. Zhang, O. Tirkkonen, and C. Studer, “Joint channel estimation and data detection in cell-free massive MU-MIMO systems,” IEEE Trans. Wireless Commun. (Early Access), 2021.
  • [7] C. M. Bishop, Pattern Recognition and Machine Learning.   Springer, 2006.
  • [8] M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference.   Now Publishers Inc, 2008.
  • [9] D. H. N. Nguyen, I. Atzeni, A. Tölli, and A. L. Swindlehurst, “A variational Bayesian perspective on massive MIMO detection,” 2022. [Online]. Available: http://engineering.sdsu.edu/~nguyen/downloads/VB_for_MIMO_detection.pdf
  • [10] E. Björnson, J. Hoydis, and L. Sanguinetti, “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations and Trends in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017.