Improving Sum-Rate of Cell-Free Massive MIMO with Expanded Compute-and-Forward

Jiayi Zhang, Jing Zhang, Derrick Wing Kwan Ng, Shi Jin, and Bo Ai

J. Zhang and J. Zhang are with the School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China, and also with the Frontiers Science Center for Smart High-speed Railway System, Beijing Jiaotong University, Beijing 100044, China (e-mail: jiayizhang@bjtu.edu.cn). D. W. K. Ng is with the School of Electrical Engineering and Telecommunications, University of New South Wales, NSW 2052, Australia (e-mail: w.k.ng@unsw.edu.au). S. Jin is with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China (e-mail: jinshi@seu.edu.cn). B. Ai is with the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China, and also with the Frontiers Science Center for Smart High-speed Railway System, and also with Henan Joint International Research Laboratory of Intelligent Networking and Data Analysis, Zhengzhou University, Zhengzhou 450001, China, and also with Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen, China (e-mail: boai@bjtu.edu.cn).

Abstract

Cell-free massive multiple-input multiple-output (MIMO) employs a large number of distributed access points (APs) to serve a small number of user equipments (UEs) via the same time/frequency resource. Due to the strong macro diversity gain, cell-free massive MIMO can considerably improve the achievable sum-rate compared to conventional cellular massive MIMO. However, the performance of cell-free massive MIMO is limited by inter-user interference (IUI) when employing simple maximum ratio combining (MRC) at receivers. To harness IUI, the expanded compute-and-forward (ECF) framework is adopted. In particular, we propose power control algorithms for the parallel computation and successive computation in the ECF framework, respectively, to improve the system performance. Furthermore, we propose an AP selection scheme and the application of different decoding orders for the successive computation. Finally, numerical results demonstrate that the ECF frameworks outperform the conventional compute-and-forward (CF) and MRC frameworks in terms of achievable sum-rate.

Index Terms:

Cell-free massive MIMO, expanded compute-and-forward, power control, sum rate.

I Introduction

Massive multiple-input multiple-output (MIMO) is a promising physical-layer technology to keep up with the exponential traffic growth of future wireless communication systems. More specifically, massive MIMO can provide tremendous beamforming gains and spatial multiplexing gains to multiple user equipments (UEs) and increase the achievable sum-rate of the system [2, 3, 4]. Despite the potential performance gain brought by massive MIMO, cell-edge UEs may experience poor channel conditions and suffer from strong inter-cell interference (ICI). To alleviate this performance bottleneck, distributed massive MIMO has been proposed to combat ICI and to improve the performance of cell-edge UEs. However, there is a fundamental performance limitation for distributed massive MIMO with full cooperation between different transmitters [5].

Recently, the authors in [6] proposed a practical network infrastructure for distributed massive MIMO, under the name of cell-free massive MIMO [7, 8, 9]. In cell-free massive MIMO systems, a large number of access points (APs) are distributed over a large area and are connected to a central processing unit (CPU) via a fronthaul network. In particular, a small number of UEs are served by all APs with the same time/frequency resource [6, 10, 11]. Since there are no cells or cell boundaries, ICI does not exist. Indeed, cell-free massive MIMO is a specific realization of distributed massive MIMO [6].

The most outstanding aspect of cell-free massive MIMO is that many APs simultaneously serve a much smaller number of UEs, which yields a high degree of macro-diversity and can offer a huge spectral efficiency. Besides, some studies have reported that favorable propagation is also a potential advantage for cell-free massive MIMO which can be exploited to eliminate inter-user interference (IUI) [6]. Note that favorable propagation refers to the property that when the number of AP antennas is sufficiently large, the channels between the UEs and APs become asymptotically orthogonal [12]. However, the favorable propagation property does not always hold in practical systems. The non-negligible IUI is highly undesirable and leads to a considerable loss in achievable sum-rate. As a result, how to harness the IUI has triggered many new coding and signal processing techniques.

I-A Related Works

As a new approach of linear physical-layer network coding that allows intermediate nodes to send out functions of their received packets [13, 14, 15, 16], the compute-and-forward (CF) scheme has recently been employed in cell-free massive MIMO systems to offer protection against noise and to reduce IUI with cooperation gain [17]. For the uplink transmission, UEs employ a nested lattice coding strategy to encode data that takes values in a prime-size finite field before transmission. Then, the CF scheme enables APs to decode the integer linear equations of UEs’ codewords using the noisy linear combinations provided by the channels. Relying on nested lattice codes, the linear combination of UEs’ codewords is still a regular codeword [18, 19]. Next, each AP forwards the decoded combination to the CPU through the fronthaul link. After receiving sufficient linear combinations, the CPU can recover every UE’s original data by performing AP selection and solving the received equations [20, 21, 17].

However, the CF scheme requires all UEs to transmit with equal power, which is generally not the optimal strategy for improving the achievable sum-rate. Due to the different propagation conditions between APs and UEs, the performance can be improved by performing appropriate power control [12]. Moreover, with power control for UEs, the effective noise variance across all APs whose linear combinations involve the message can be reduced. Then, the achievable sum-rate can be further improved.

Motivated by the discussion above, we adopt the expanded compute-and-forward (ECF) framework, which was proposed in [22], for the uplink transmission in cell-free massive MIMO systems. The ECF framework is able to distribute transmit powers unequally and retains the connection between the finite field data and the lattice codeword. We note that the coordinated multipoint (CoMP) framework can also be implemented with interference alignment at the transmitter side [23, 24, 25]. However, the distinction between CoMP and ECF is that CoMP, as conventionally defined, does not involve the CF strategy.

There are two types of ECF framework, named parallel computation and successive computation, respectively. The distinction between these schemes is that in parallel computation the CPU recovers UEs’ data independently, while in successive computation the CPU decodes the linear combinations by using successive cancellation. Specifically, in successive computation, the combinations which have been decoded can be used as side information in the subsequent decoding steps to decrease both the effective noise variance and the number of UEs that need to tolerate the effective noise. Applying successive computation helps improve the achievable sum-rate. In terms of processing delay, however, parallel computation has some advantages. In other words, there is a trade-off between parallel computation and successive computation.

Besides, two key aspects dominate the performance of the ECF framework: coefficient vector selection and AP selection. Since the performance of ECF is captured by the computation rate, and that rate is highest when the equation coefficients closely approximate the effective channel coefficients, carefully designing the coefficient vector improves the achievable sum-rate. As for AP selection, it is performed at the CPU when recovering UEs’ original data in both parallel computation and successive computation. With the help of AP selection, the computational complexity of power optimization is reduced. Furthermore, the noise tolerance on UEs’ data can also be relaxed, which contributes to the improvement of the achievable sum-rate.

I-B Contributions

In this paper, we consider the application of ECF framework in cell-free massive MIMO systems to increase the achievable sum-rate, including both parallel computation and successive computation. The main contributions of this paper are as follows:

• We apply a quadratic programming relaxation based coefficient vector selection method and a large-scale fading based low-complexity AP selection algorithm to improve the achievable sum-rate of the cell-free massive MIMO system.

• We design efficient power control algorithms for parallel and successive computation schemes, respectively. For the successive computation scheme, we further derive a sub-optimal decoding order of combinations and develop three assignment algorithms to find a sub-optimal decoding order of UEs.

• We quantitatively compare the performance of conventional combining and ECF frameworks under a practical channel model and practical scenarios, which shows that the ECF framework is an effective approach for fronthaul reduction. In particular, the successive computation scheme outperforms the parallel computation scheme with a larger fronthaul load.

Compared with our related conference paper [1], which focused only on parallel computation with power control based on uplink-downlink duality, this paper provides a thorough analysis of the successive computation scheme with power control for improving the achievable sum-rate. Besides, the methodology for determining the sub-optimal decoding orders of combinations and UEs is investigated. Furthermore, the results from [1] are not applicable to the case considered in this paper, since a different power control method and an additional AP selection algorithm are applied. More importantly, we also provide practical insights into the achievable sum-rate of the MRC, CF, centralized MMSE, parallel computation, and successive computation schemes.

The rest of this paper is organized as follows. In Section II, we describe the cell-free massive MIMO system model. A detailed introduction for ECF framework is given in Section III. Furthermore, AP selection methods and power control algorithm for parallel computation are introduced in Section III-B. In Section III-C, we investigate different decoding order methods of combinations for successive computation. Finally, numerical results and discussions are given in Section IV while Section V concludes the paper.

Table I shows the notations. Unless further specified, plain letters, boldface lowercase letters, and boldface uppercase letters denote scalars, column vectors, and matrices, respectively.

TABLE I: Notations

| Notation | Description |
| --- | --- |
| $p$ | A prime number |
| $\mathbb{R}$, $\mathbb{C}$, $\mathbb{Z}_{p}$ | Reals, complex field, finite field of size $p$ |
| ${q_{1}},{q_{2}},{w_{1}},{w_{2}},r$ | Elements in $\mathbb{Z}_{p}$ |
| $\mathbb{Z}[i]=\left\{a+bi \mid a,b\in\mathbb{Z}\right\}$ | Set of Gaussian integers, whose real and imaginary parts are both integers |
| $\sum$ | Addition over the real or complex field |
| $\oplus$ | Addition over the finite field |
| $a\bmod p=r$ | Computing the remainder of dividing $a$ by $p$ |
| ${q_{1}}{w_{1}}\oplus{q_{2}}{w_{2}}$ | $\left({q_{1}}{w_{1}}+{q_{2}}{w_{2}}\right)\bmod p$ |
| $\lVert{\bf{a}}\rVert$ | 2-norm of vector ${\bf{a}}$ |
| ${{\bf{a}}^{T}}$, ${{\bf{a}}^{H}}$ | Transpose of ${\bf{a}}$, conjugate transpose of ${\bf{a}}$ |
| $\left\lfloor a\right\rfloor$ | Floor function of $a$ |
| ${\bf{I}}$ | Identity matrix |
| ${\mathbb{E}}\left\{a\right\}$ | Expectation of $a$ |
| ${\log^{+}}\left(a\right)$ | $\max\left(\log\left(a\right),0\right)$, where the log function is with respect to base 2 |

II System Model

We consider an uplink cell-free massive MIMO system. $M$ single-antenna APs and $L$ ( $M>L$ ) single-antenna UEs are randomly distributed in a wide geographical area [6, 10, 11]. The APs serve the UEs via the same time/frequency resource. In particular, each AP exchanges information with the CPU via a fronthaul link. As the practical number of APs is finite, we assume that the IUI can still have a significant impact on the achievable sum-rate.

Figure 1: ECF framework based cell-free massive MIMO systems.

First, we provide some necessary definitions on nested lattice codes. An $n$ -dimensional lattice, $\Lambda$ , is a set of points in ${\mathbb{R}^{n}}$ such that if ${\mathbf{s}},{\mathbf{t}}\in\Lambda$ , then ${\mathbf{s}}+{\mathbf{t}}\in\Lambda$ and if ${\mathbf{s}}\in\Lambda$ , then $-{\mathbf{s}}\in\Lambda$ . Note that a lattice can always be written in terms of a lattice generator matrix ${\mathbf{B}}\in{\mathbb{R}^{n\times n}}$ , i.e., $\Lambda=\left\{{{\mathbf{s}}={\mathbf{Bc}}:{\mathbf{c}}\in{\mathbb{Z}^{n}}}\right\}$ . Besides, a lattice $\Lambda$ is said to be nested in a lattice ${\Lambda_{1}}$ if $\Lambda\subseteq{\Lambda_{1}}$ . As shown in Fig. 1, without loss of generality, the $l$ th UE maps the original length- $k$ data ${\bf{w}}_{l}\in\mathbb{Z}^{k}_{p}$ into a length- $n$ complex-valued lattice codeword ${{\bf{x}}_{l}}$ with encoder ${\phi_{l}}:\mathbb{Z}_{p}^{k}\to\mathbb{Z}{[i]^{n}}$ . The specific choices of $n$ and $p$ are studied in [22, Theorem 8]. The blocklength $n$ needs to be large enough to create the generator matrices that encode the original data into nested lattice codewords; therefore, a longer blocklength is preferable. Note that only $k_{l}$ of the $k$ symbols carry information. The remaining $k-k_{l}$ symbols are set to zero to meet the power constraint and the effective noise tolerance. The lattice codeword is subject to the power constraint $\mathbb{E}{\left\|{{{\bf{x}}_{l}}}\right\|^{2}}\leq n{P_{l}}$ , where ${P_{l}}$ is the transmit power of the $l$ th UE.
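To make the lattice definitions concrete, the following minimal sketch builds a small two-dimensional real lattice from a generator matrix and checks closure and nesting on a finite window. The generator matrix `B`, the helper `lattice_points`, and the choice of the coarse lattice as $p\Lambda_{1}$ with $p=3$ are illustrative assumptions and are not taken from the paper.

```python
import itertools
import numpy as np

def lattice_points(B, radius):
    """Return the points B @ c for integer vectors c with |c_i| <= radius."""
    n = B.shape[1]
    coords = itertools.product(range(-radius, radius + 1), repeat=n)
    return {tuple(int(v) for v in B @ np.array(c)) for c in coords}

# Hypothetical integer generator matrix of the fine lattice Lambda_1.
B = np.array([[2, 1],
              [0, 1]])
p = 3  # scaling that defines the coarse lattice Lambda = p * Lambda_1

fine = lattice_points(B, radius=6)
coarse = lattice_points(p * B, radius=2)

# Closure checks on a small window: sums and negatives stay in the lattice.
window = lattice_points(B, radius=3)
closed_under_addition = all(
    tuple(u + v for u, v in zip(s, t)) in fine for s in window for t in window
)
closed_under_negation = all(tuple(-v for v in s) in fine for s in window)

# Nesting: every coarse-lattice point is also a fine-lattice point.
is_nested = coarse <= fine
```

The window sizes are chosen so that every sum of two window points still falls inside the enumerated part of the fine lattice; an actual nested lattice code would use far larger dimensions.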

Let $g_{ml}$ represent the channel coefficient between the $m$ th AP and the $l$ th UE, which is given by

{g_{ml}}=\beta_{ml}^{1/2}{h_{ml}},

(1)

where $\beta_{ml}$ denotes the large-scale fading and $h_{ml}\in{\mathbb{C}}$ denotes the small-scale fading. With the help of [11, Eq. (17)], the large-scale fading is modeled as

{\beta_{ml}}\left[{{\rm{dB}}}\right]=-30.5-36.7{\log_{10}}\left({{d_{ml}}/1{\text{m}}}\right)+{F_{ml}},

(2)

where $d_{ml}$ represents the distance between the $m$ th AP and the $l$ th UE and ${F_{ml}}\sim{\mathcal{N}}(0,{4^{2}})$ is the shadow fading. We assume that ${h_{ml}},m=1,\ldots,M,l=1,\ldots,L$ , are independent and identically distributed (i.i.d.) ${\cal C}{\cal N}\left({0,1}\right)$ random variables (RVs).

The length- $n$ vector received signal at the $m$ th AP is

{{\bf{y}}_{m}}=\sum\nolimits_{l=1}^{L}{{g_{ml}}}{{\bf{x}}_{l}}+{{\bf{z}}_{m}},

(3)

where the thermal noise ${{\bf{z}}_{m}}\in{\mathbb{C}^{n}}$ is elementwise independent and identically distributed (i.i.d.) $\mathcal{CN}\left({0,{\sigma^{2}}}\right)$ .
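The channel model in (1)-(3) can be simulated with a few lines of NumPy. This is only a sketch under stated assumptions: the network sizes, the square deployment area `area`, the 1 m floor on distances, the unit noise variance, and the use of random Gaussian integers as placeholder codewords are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

M, L, n = 8, 3, 16          # APs, UEs, blocklength (illustrative sizes)
area = 1000.0               # hypothetical square side length in metres
noise_var = 1.0             # sigma^2 in (3), assumed value

# Random AP and UE positions; distances d_ml in metres (floored at 1 m).
ap_pos = rng.uniform(0, area, size=(M, 2))
ue_pos = rng.uniform(0, area, size=(L, 2))
d = np.linalg.norm(ap_pos[:, None, :] - ue_pos[None, :, :], axis=2)
d = np.maximum(d, 1.0)

# Large-scale fading (2): path loss plus shadowing F_ml ~ N(0, 4^2), in dB.
F = rng.normal(0.0, 4.0, size=(M, L))
beta_db = -30.5 - 36.7 * np.log10(d) + F
beta = 10.0 ** (beta_db / 10.0)

# Small-scale fading h_ml ~ CN(0, 1) and channel coefficients (1).
h = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)
g = np.sqrt(beta) * h

# Placeholder codewords: random Gaussian integers stand in for lattice codewords.
x = rng.integers(-2, 3, size=(L, n)) + 1j * rng.integers(-2, 3, size=(L, n))

# Received signals (3): row m holds the length-n vector y_m.
z = np.sqrt(noise_var / 2) * (
    rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))
)
y = g @ x + z
```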

The ECF framework manipulates the algebraic structure such that any Gaussian integer combination of lattice codewords is still a lattice point. In cell-free massive MIMO, each AP endeavours to represent the received length- $n$ signal vector ${{\bf{y}}_{m}}$ with a Gaussian integer linear combination of UEs’ codewords. By applying an equalization factor $b_{m}$ and selecting the coefficient vector ${{\bf{a}}_{m}}={\left[{{a_{m1}},{a_{m2}},\ldots,{a_{mL}}}\right]^{T}}\in\mathbb{Z}{\left[i\right]^{L}}$ , the scaled received signal can be expressed as

{b_{m}}{{\bf{y}}_{m}}=\sum\nolimits_{l=1}^{L}{{a_{ml}}}{{\bf{x}}_{l}}+\underbrace{\sum\nolimits_{l=1}^{L}{\left({{b_{m}}{g_{ml}}-{a_{ml}}}\right)}{{\bf{x}}_{l}}+{b_{m}}{{\bf{z}}_{m}}}_{{\rm{effective\,noise}}}.

(4)

Each AP is equipped with a decoder, ${\varphi_{m}}:\mathbb{Z}{\left[i\right]^{{n}}}\to\mathbb{Z}_{p}^{{k}}$ . Then, the $m$ th AP decodes the received signal ${{\bf{y}}_{m}}$ into the finite field as ${{{\bf{\hat{u}}}}_{m}}={\varphi_{m}}\left({{{\bf{y}}_{m}}}\right)$ , where ${{{\bf{\hat{u}}}}_{m}}$ is an estimate of the linear combination of original data ${{\bf{u}}_{m}}=\mathop{\oplus}\limits_{l=1}^{L}{q_{ml}}{{\bf{w}}_{l}}={\sum\nolimits_{l=1}^{L}{{a_{ml}}{{\bf{x}}_{l}}}}\bmod p$ . [Footnote 1: If the codeword spacing for a given data from the $l$ th UE can tolerate the maximum effective noise across the APs whose linear combinations involve that data, the probability of decoding error is given as $\Pr\left({\mathop{\cup}\limits_{m=1}^{{M_{l}}}\left\{{{{{\bf{\hat{u}}}}_{m}}\neq{{\bf{u}}_{m}}}\right\}}\right)<\varepsilon$ , where $M_{l}$ represents the number of APs whose combinations contain the data and $\varepsilon$ is a small positive number that tends to zero.] The specific procedure for recovering messages for UEs is stated in [18]. Given $L$ linear combinations of messages with real and imaginary coefficient matrices ${{\bf{Q}}^{R}}=\left\{{q_{ml}^{R}}\right\}$ , ${{\bf{Q}}^{I}}=\left\{{q_{ml}^{I}}\right\}$ , the CPU can recover message ${{\bf{w}}_{l}}$ if there exists a vector ${\bf{c}}\in{\mathbb{Z}}_{p}^{M\times L}$ such that

{{\bf{c}}^{T}}\left[{\begin{array}[]{*{20}{c}}{{{\bf{Q}}^{R}}}&{-{{\bf{Q}}^{I}}}\\ {{{\bf{Q}}^{I}}}&{{{\bf{Q}}^{R}}}\end{array}}\right]={{\bm{\delta}}_{l}^{T}},

(5)

where ${{\bm{\delta}}_{l}}$ denotes a unit column vector with 1 in the $l$ th entry and 0 elsewhere. [Footnote 2: The two decoders adopted at the APs and the CPU have different functionalities. The decoders at the APs decode the received signal into a linear combination of the UEs’ original data. These decoded linear combinations are then transmitted from the APs to the CPU. In contrast, the decoder at the CPU is responsible for recovering each UE’s data from those combinations. Specifically, with successive computation, the interference cancellation procedure takes place at the decoder at the CPU.] For traditional multiuser MIMO systems, where $M=L$ , data recovery is a major challenge due to the high probability of rank deficiency. However, the number of APs is far larger than that of the UEs in cell-free massive MIMO systems. Since the probability of selecting $L$ APs that provide $L$ independent linear combinations increases with the number of APs [17], the extra APs ensure a much higher probability of avoiding rank deficiency and thus of recovering the desired messages.
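The recovery condition in (5) asks for a vector ${\bf{c}}$ that maps the stacked coefficient matrix to a unit vector over $\mathbb{Z}_{p}$. Such a vector exists for every UE exactly when the stacked matrix has full column rank over $\mathbb{Z}_{p}$, which the sketch below tests by Gaussian elimination modulo $p$. The helper names `rank_mod_p` and `real_embedding` and the example coefficients are hypothetical; the procedure actually used for recovery is the one described in [18].

```python
def rank_mod_p(matrix, p):
    """Rank of an integer matrix over the finite field Z_p (p prime)."""
    rows = [[v % p for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        # Find a pivot row with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(n_rows):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p
                           for a, b in zip(rows[r], rows[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank

def real_embedding(QR, QI):
    """Stack [[Q^R, -Q^I], [Q^I, Q^R]] as in (5)."""
    top = [rq + [-v for v in iq] for rq, iq in zip(QR, QI)]
    bottom = [iq + rq for rq, iq in zip(QR, QI)]
    return top + bottom

p = 5
QR = [[1, 2], [3, 1]]   # hypothetical real parts of the coefficients
QI = [[0, 1], [1, 0]]   # hypothetical imaginary parts
stacked = real_embedding(QR, QI)
recoverable = rank_mod_p(stacked, p) == len(stacked[0])
```

Note that rank over $\mathbb{Z}_{p}$ can differ from rank over the reals: the matrix `[[1, 2], [3, 1]]` is invertible over the reals but has determinant $-5 \equiv 0 \pmod 5$, so it is rank deficient over $\mathbb{Z}_{5}$.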

III Expanded Compute-and-Forward

One of the major challenges in cell-free massive MIMO is the IUI in the uplink. In particular, the CF scheme can achieve a large gain by decoding linear functions of transmitted signals with nested lattice codes. The performance of the CF scheme for cell-free massive MIMO has been compared with MRC in [17], which shows that with equal power transmission at all UEs, the CF scheme can offer a throughput improvement. Furthermore, the ECF framework can improve the achievable sum-rate by exploiting power control.

In this section, two practical ECF frameworks are considered for cell-free massive MIMO systems. The first one is parallel computation, in which the CPU decodes each of the integer linear combinations independently. The second one is successive computation, which decodes the received combinations one by one and employs the side information to reduce the effective noise. We begin with the parallel computation.

III-A Coefficient Vector Selection

The goal of this paper is to evaluate the performance of the ECF framework for cell-free massive MIMO systems by deriving its computation rate region [22], which is defined as the set of achievable rates $R_{l}$ ensuring successful data recovery:

\begin{aligned}{\mathcal{R}_{\text{ECF}}}\left({\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}\right)\triangleq\bigg\{&\left({R_{1}},{R_{2}},\dots,{R_{L}}\right)\in\mathbb{R}_{+}^{L}:\\ &{R_{l}}\leq{\log^{+}}\left(\frac{{P_{l}}}{{\sigma^{2}}\left({\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}\right)}\right)\;\;\forall\left({m,l}\right)\;\;{\rm{s.t.}}\;\;{a_{ml}}\neq 0\bigg\},\end{aligned}

(6)

where ${\sigma^{2}}\left({\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}\right)$ refers to the effective noise variance at the $m$ th AP and ${\bf{P}}\buildrel\Delta\over{=}{\text{diag}}\left({{P_{1}},{P_{2}},\ldots,{P_{L}}}\right)$ is the diagonal matrix collecting the power constraints of the UEs. In order to maximize the computation rate region, we need to find the optimal coefficient vector ${{{\bf{a}}_{m}}}$ and equalization factor $b_{m}$ .

According to [22, Lemma 2], the equalization factor $b_{m}$ that minimizes the effective noise variance from (4) is the MMSE projection. With the noise variance normalized to one, we have

{b_{m}}={\bf{g}}_{m}^{H}{{\bf{P}}}{{\bf{a}}_{m}}{\left({1+{\bf{g}}_{m}^{H}{{\bf{P}}}{{\bf{g}}_{m}}}\right)^{-1}}.

(7)

Hence, the effective noise is given by

\begin{aligned}{\sigma^{2}}\left({{\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)&\buildrel\Delta\over{=}\frac{1}{n}{\mathbb{E}}\left\{{\left\|{{{\bf{X}}^{T}}\left({{b_{m}}{{\bf{g}}_{m}}-{{\bf{a}}_{m}}}\right)+{b_{m}}{{\bf{z}}_{m}}}\right\|^{2}}\right\}\\ &={\bf{a}}_{m}^{H}{\left({{{\bf{P}}^{-1}}+{{\bf{g}}_{m}}{\bf{g}}_{m}^{H}}\right)^{-1}}{{\bf{a}}_{m}},\end{aligned}

(8)

where ${\bf{X}}={\left[{{{\bf{x}}_{1}},{{\bf{x}}_{2}},\ldots,{{\bf{x}}_{L}}}\right]^{T}}$ represents the codeword matrix. For the $m$ th AP, the aim is to find its optimal coefficient vector that maximizes the computation rate region as

\begin{aligned}{{\bf{a}}_{m,\text{opt}}}&=\mathop{\arg\max}\limits_{{{\bf{a}}_{m}}\in\mathbb{Z}{{\left[i\right]}^{L}}}{{\mathcal{R}}_{\text{ECF}}}\left({{\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)\\ &=\mathop{\arg\min}\limits_{{{\bf{a}}_{m}}\in\mathbb{Z}{{\left[i\right]}^{L}}}{\sigma^{2}}\left({{\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right).\end{aligned}

(9)
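As a numerical sanity check of (7) and (8), the sketch below computes the MMSE equalization factor and compares the closed-form effective noise variance with the variance evaluated directly from the effective-noise term in (4). It assumes unit noise variance, consistent with the form of (7); the function names and the example values of ${\bf{P}}$, ${{\bf{g}}_{m}}$, and ${{\bf{a}}_{m}}$ are hypothetical.

```python
import numpy as np

def mmse_equalizer(P, g, a):
    """Equalization factor b_m from (7), assuming unit noise variance."""
    return (g.conj() @ P @ a) / (1.0 + np.real(g.conj() @ P @ g))

def effective_noise(P, g, a):
    """Closed-form effective noise variance from (8)."""
    Ginv = np.linalg.inv(np.linalg.inv(P) + np.outer(g, g.conj()))
    return float(np.real(a.conj() @ Ginv @ a))

def effective_noise_direct(P, g, a, b):
    """Per-symbol variance of the effective noise in (4) for a given b."""
    mismatch = b * g - a
    return float(np.real(mismatch.conj() @ P @ mismatch) + abs(b) ** 2)

rng = np.random.default_rng(1)
L = 3
P = np.diag([1.0, 2.0, 0.5])                       # example transmit powers
g = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
a = np.array([1 + 0j, 0, 1j])                      # example Gaussian-integer vector

b = mmse_equalizer(P, g, a)
sigma2_closed = effective_noise(P, g, a)
sigma2_direct = effective_noise_direct(P, g, a, b)
```

The two values agree up to floating-point error, and any other choice of $b_{m}$ passed to `effective_noise_direct` yields a larger variance, which is what makes (9) a pure minimization over ${{\bf{a}}_{m}}$.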

Since the channel coefficient between the $m$ th AP and the $l$ th UE is complex-valued, the received signal ${{\bf{y}}_{m}}$ can be divided into the real part and the imaginary part:

\begin{aligned}{\mathop{\rm Re}\nolimits}\left({{\bf{y}}_{m}}\right)&=\sum\limits_{l=1}^{L}\left({\mathop{\rm Re}\nolimits}\left({g_{ml}}\right){\mathop{\rm Re}\nolimits}\left({{\bf{x}}_{l}}\right)-{\mathop{\rm Im}\nolimits}\left({g_{ml}}\right){\mathop{\rm Im}\nolimits}\left({{\bf{x}}_{l}}\right)\right)+{\mathop{\rm Re}\nolimits}\left({{\bf{z}}_{m}}\right),\\ {\mathop{\rm Im}\nolimits}\left({{\bf{y}}_{m}}\right)&=\sum\limits_{l=1}^{L}\left({\mathop{\rm Im}\nolimits}\left({g_{ml}}\right){\mathop{\rm Re}\nolimits}\left({{\bf{x}}_{l}}\right)+{\mathop{\rm Re}\nolimits}\left({g_{ml}}\right){\mathop{\rm Im}\nolimits}\left({{\bf{x}}_{l}}\right)\right)+{\mathop{\rm Im}\nolimits}\left({{\bf{z}}_{m}}\right).\end{aligned}

Therefore, we can transform the complex-valued network with $L$ UEs and $M$ APs into a real-valued network with $2L$ UEs and $2M$ APs. It is convenient to calculate the real and imaginary parts of the coefficient vector ${{\bf{a}}_{m}}$ separately. [Footnote 3: Reducing the problem of developing coefficient algorithms for complex channels to an equivalent real-only channel is generally suboptimal. Some solutions for finding the optimal coefficients in polynomial time over complex-integer-based lattices and complex channels were proposed in [26]; however, they require a substantially higher complexity. Therefore, a low-complexity method that explicitly addresses the complex channel is left for future work.] Without loss of generality, we only consider ${\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)$ for a given real-valued channel coefficient ${\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)$ in the following.
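The real-valued decomposition above can be checked numerically. The sketch below stacks real and imaginary parts and verifies that the resulting $2M\times 2L$ real system reproduces the complex received signal. The helper names `real_embedding` and `stack_real`, the sizes, and the random test data are assumptions made for illustration.

```python
import numpy as np

def real_embedding(G):
    """Map a complex M x L channel matrix to its 2M x 2L real equivalent."""
    return np.block([[G.real, -G.imag],
                     [G.imag,  G.real]])

def stack_real(V):
    """Stack real parts on top of imaginary parts along the first axis."""
    return np.concatenate([V.real, V.imag], axis=0)

rng = np.random.default_rng(2)
M, L, n = 4, 2, 5
G = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
X = rng.integers(-2, 3, size=(L, n)) + 1j * rng.integers(-2, 3, size=(L, n))
Z = rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))

Y = G @ X + Z                                            # complex model
Y_real = real_embedding(G) @ stack_real(X) + stack_real(Z)  # real model
```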

For each channel vector ${\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)$ , we can find a signed permutation matrix ${\bf{S}}$ , which is unimodular and orthogonal, such that ${\bf{S}}{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)$ is nonnegative and its elements are in nondecreasing order [27, Lemma 1]. Suppose ${\mathop{\rm Re}\nolimits}{\left({{{\bf{a}}_{m}}}\right)_{\text{opt}}}$ is the optimal coefficient vector for the specific power constraint ${\bf{P}}$ and channel coefficient ${\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)$ . Then, we have $\mathcal{R}\left({{\mathop{\rm Re}\nolimits}{\left({{{\bf{g}}_{m}}}\right)},{\mathop{\rm Re}\nolimits}{\left({{{\bf{a}}_{m}}}\right)_{\text{opt}}}}\right)=\mathcal{R}\left({{\bf{S}}{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right),{\bf{S}}{\mathop{\rm Re}\nolimits}{\left({{{\bf{a}}_{m}}}\right)_{\text{opt}}}}\right)$ [27, Lemma 3]. Define $\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}$ as the nonnegative and nondecreasing-ordered vector, i.e., $\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}={\bf{S}}{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)$ . Therefore, we can recover the desired coefficient vector through ${\mathop{\rm Re}\nolimits}{\left({{{\bf{a}}_{m}}}\right)_{\text{opt}}}={{\bf{S}}^{-1}}{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{\text{opt}}}$ .
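One possible way to construct such a signed permutation matrix is to flip the sign of negative entries and sort by magnitude, as in the sketch below. The function name `signed_permutation` and the example vector are hypothetical; [27, Lemma 1] only guarantees that such a matrix exists.

```python
import numpy as np

def signed_permutation(g):
    """Return S such that S @ g is nonnegative and in nondecreasing order."""
    g = np.asarray(g, dtype=float)
    L = len(g)
    order = np.argsort(np.abs(g), kind="stable")   # sort by magnitude
    signs = np.where(g < 0, -1.0, 1.0)             # flip negative entries
    S = np.zeros((L, L))
    S[np.arange(L), order] = signs[order]
    return S

g = np.array([0.7, -1.5, 0.2, -0.4])   # example real-valued channel vector
S = signed_permutation(g)
g_bar = S @ g                          # nonnegative, nondecreasing
g_back = S.T @ g_bar                   # S is orthogonal, so S^{-1} = S^T
```

Because ${\bf{S}}$ is orthogonal, the inverse needed to map the coefficient vector back is simply its transpose.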

In the following, we concentrate on acquiring ${\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{\text{opt}}}$ for $\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}$ by relaxing the optimization problem stated in (9) based on the quadratic programming (QP) method [28]. Recall that $\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}$ is in nondecreasing order; therefore, the maximum element is ${\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{L}}$ . According to [22], the search space for ${\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{L}}$ can be restricted to

{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{L}}\leq{\lambda_{\max}}\left({{\bf{I}}+\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}{{\bf{P}}}{{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}}^{T}}}\right),

(10)

where ${\lambda_{\max}}\left({{\bf{I}}+\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}{{\bf{P}}}{{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}}^{T}}}\right)$ denotes the maximum eigenvalue of $\left({{\bf{I}}+\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}{{\bf{P}}}{{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}}^{T}}}\right)$ . Then, the problem stated in (9) can be rewritten as a series of QP problems

\begin{aligned}\mathop{{\rm{minimize}}}\limits_{\overline{{\mathop{\rm Re}\nolimits}\left({{\bf{a}}_{m}}\right)}}\quad&{\overline{{\mathop{\rm Re}\nolimits}\left({{\bf{a}}_{m}}\right)}^{T}}{{\bf{G}}_{m}}\overline{{\mathop{\rm Re}\nolimits}\left({{\bf{a}}_{m}}\right)}\\ {\rm{subject\,to}}\quad&\overline{{\mathop{\rm Re}\nolimits}\left({{\bf{a}}_{m}}\right)}\in{\mathbb{R}^{L}},\\ &{\overline{{\mathop{\rm Re}\nolimits}\left({{\bf{a}}_{m}}\right)}_{L}}=k,\;\;\;k=1,2,\ldots,K,\end{aligned}

(11)

where ${{\bf{G}}_{m}}={\left({{{\bf{P}}^{-1}}+\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}\,{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}^{T}}}\right)}^{-1}$ , consistent with the quadratic form in (8), and $K=\left\lfloor{{\lambda_{\max}}\left({{\bf{I}}+\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}{{\bf{P}}}{{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}}^{T}}}\right)}\right\rfloor$ . Let $\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{k}^{+}$ represent the solution to the problem in (11) with the constraint ${\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{L}}=k$ . The $K$ solutions can be obtained from $\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{k}^{+}=k\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{1}^{+}$ . With the Lagrange multiplier method [28], we have

\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{1}^{+}=\left[{\bf{r}},1\right]^{T},

(12)

where ${\bf{r}}=-{\left({{{\bf{G}}_{m}}\left({1:L-1,1:L-1}\right)}\right)^{-1}}{{\bf{G}}_{m}}\left({1:L-1,L}\right)$ . With the help of [27, Algorithm 1], the $K$ real-valued solutions to the problem in (11), $\left\{{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{k}^{+}}\right\}$ , can be quantized to the integer-valued $\left\{{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{k}^{{\mathop{\rm int}}}}\right\}$ . We select a sub-optimal coefficient vector ${\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{\text{opt}}}$ for $\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)}$ with

{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{\text{opt}}}=\arg\mathop{\min}\limits_{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}\in\left\{{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{k}^{{\mathop{\rm int}}}}\right\}}{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}^{T}}{{\bf{G}}_{m}}\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}.

Finally, the selected coefficient vector ${\mathop{\rm Re}\nolimits}{\left({{{\bf{a}}_{m}}}\right)_{\text{opt}}}$ associated with the channel coefficient ${\mathop{\rm Re}\nolimits}\left({{{\bf{g}}_{m}}}\right)$ is recovered with ${\mathop{\rm Re}\nolimits}{\left({{{\bf{a}}_{m}}}\right)_{\text{opt}}}={{\bf{S}}^{-1}}{\overline{{\mathop{\rm Re}\nolimits}\left({{{\bf{a}}_{m}}}\right)}_{\text{opt}}}$ . Following a similar line of reasoning, the imaginary part of the coefficient vector can be derived.
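The QP-relaxation steps of this subsection can be sketched as follows. Several simplifications are assumed and should not be read as the authors' implementation: the matrix ${{\bf{G}}_{m}}$ is taken as the quadratic form of the effective noise in (8), the bound `K` is passed in as a parameter rather than computed from (10), and plain element-wise rounding stands in for the quantization of [27, Algorithm 1]. The function name and example values are hypothetical.

```python
import numpy as np

def qp_relaxed_coefficients(g_bar, P, K):
    """Pick an integer coefficient vector via the relaxed QP in (11)-(12).

    g_bar : nonnegative, nondecreasing real channel vector (length L)
    P     : diagonal power matrix (L x L)
    K     : largest value tried for the last coefficient
    """
    L = len(g_bar)
    # Quadratic-form matrix of the effective noise, matching (8).
    G = np.linalg.inv(np.linalg.inv(P) + np.outer(g_bar, g_bar))
    # Real-valued minimizer with the last entry fixed to 1, as in (12).
    r = -np.linalg.solve(G[:L - 1, :L - 1], G[:L - 1, L - 1])
    a1 = np.append(r, 1.0)
    candidates = []
    for k in range(1, K + 1):
        # Plain rounding is a stand-in for the quantization of [27, Algorithm 1].
        candidates.append(np.rint(k * a1).astype(int))
    noise = [float(a @ G @ a) for a in candidates]
    best = int(np.argmin(noise))
    return candidates[best], noise[best], G

g_bar = np.array([0.3, 0.9, 1.4])      # example sorted channel vector
P = np.diag([2.0, 1.0, 4.0])           # example transmit powers
a_best, noise_best, G = qp_relaxed_coefficients(g_bar, P, K=4)
```

Since the last entry of every candidate equals $k\geq 1$, the selected vector is never the all-zero vector, and its effective noise can never fall below the value attained by the real-valued relaxation.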

III-B Parallel Computation

For parallel computation, the integer linear combinations of UEs’ data are decoded independently. On this basis, we first introduce the computation rate region. Then, we provide a detailed description of the proposed power control algorithm, which improves the achievable sum-rate. To reduce the effective noise variance and computational complexity, we further propose an AP selection algorithm based on large-scale fading.

III-B1 Computation Rate Region

Let us suppose that all APs have full channel state information. To obtain the estimate ${{{\bf{\hat{u}}}}_{m}}$ of the integer combination of UEs’ original data, the $m$ th AP multiplies the received signal by an equalization factor $b_{m}$ to obtain the effective channel as

\begin{aligned}{\widetilde{\bf{y}}_{m}}&={b_{m}}{{\bf{y}}_{m}}={b_{m}}{{\bf{X}}^{T}}{{\bf{g}}_{m}}+{b_{m}}{{\bf{z}}_{m}}\\ &={{\bf{X}}^{T}}{{\bf{a}}_{m}}+\underbrace{{{\bf{X}}^{T}}\left({{b_{m}}{{\bf{g}}_{m}}-{{\bf{a}}_{m}}}\right)+{b_{m}}{{\bf{z}}_{m}}}_{{\rm{effective\,noise}}}.\end{aligned}

(13)

After choosing $b_{m}$ to be the minimum mean-square error (MMSE) coefficient adopted at the $m$ th AP, the minimum effective noise variance for parallel computation is given by

\sigma_{\text{para}}^{2}\left({{\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)\buildrel\Delta\over{=}{\bf{a}}_{m}^{H}{\left({{{{{\bf{P}}}}^{-1}}+{{\bf{g}}_{m}}{\bf{g}}_{m}^{H}}\right)^{-1}}{{\bf{a}}_{m}}.

(14)

We denote ${\bf{A}}$ as the matrix of the coefficient vectors, ${\bf{A}}=\left[{{{\bf{a}}_{1}},{{\bf{a}}_{2}},\ldots,{{\bf{a}}_{M}}}\right]$ . Specifically, if the $m$ th column of ${\bf{A}}$ is a zero vector, the $m$ th AP does not serve any UE; if the $l$ th row of ${\bf{A}}$ is a zero vector, the $l$ th UE is not served by any AP. When we remove such columns and rows from ${\bf{A}}$ , we obtain ${\bf{A}}\in{\mathbb{Z}}{\left[i\right]^{L^{\prime}\times M^{\prime}}}$ , where $L^{\prime}$ and $M^{\prime}$ refer to the numbers of effective UEs and APs, respectively. Due to the array gain, the sum-rate increases with $M^{\prime}$ . However, there is a trade-off between $L^{\prime}$ and the sum-rate performance, since an increase in the number of effective UEs does not always lead to an increase in sum-rate [6]. According to the discussion of coefficient vector selection in Section III-A, the values of $M^{\prime}$ and $L^{\prime}$ are determined by the locations of APs and UEs; therefore, $M^{\prime}$ and $L^{\prime}$ can take the optimal value when the locations of APs and UEs are optimal. Define the rank of ${\bf{A}}$ by ${{M^{\prime}}_{{\rm{rank}}}}\buildrel\Delta\over{=}{\rm{Rank}}\left({\bf{A}}\right)$ . According to [18], all effective UEs’ data can be recovered if ${{M^{\prime}}_{{\rm{rank}}}}=L^{\prime}$ . Therefore, we only need $L^{\prime}$ integer linear combinations among the whole $M^{\prime}$ combinations. In other words, only $L^{\prime}$ APs need to transmit signals to the CPU through fronthaul links. The computation rate region for the parallel computation is given by

	$\displaystyle\!\!{\mathcal{R}_{{\rm{para}}}}\left({{\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)\buildrel\Delta\over{=}\Bigg{\{}{\left({{R_{1}},{R_{2}},\ldots,{R_{L^{\prime}}}}\right)\in\mathbb{R}_{+}^{L^{\prime}}:}$
	$\displaystyle\!\!{{R_{l}}\!\leq\!{{\log}^{+}}\!\left(\!{\frac{{{P_{l}}}}{{{\sigma_{\text{para}}^{2}}\left({{{{\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}}}\!\right)}}}\right)\;\;\forall\!\left({m,l}\right)\;{\rm{s.}}{\rm{t.}}\;{a_{ml}}\neq 0}\Bigg{\}},$		(15)

where $a_{ml}=0$ means that the $m$th AP does not serve the $l$th UE.

III-B2 Power Optimization

If ${a_{ml}}\neq 0$ , the computed achievable rate for the $l$ th UE at the $m$ th AP is given as

{R^{\prime}_{\left({l,m}\right)}}={\log^{+}}\left({\frac{{{P_{l}}}}{{\sigma_{{\rm{para}}}^{2}\left({{\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)}}}\right)

(16)

However, when recovering the data of the $l$th UE, the codeword spacing for that data should tolerate the maximum effective noise variance across all APs whose linear combinations involve that data. Therefore, the actual achievable rate of the $l$th UE is

\!{R_{l}}\!=\!\mathop{\min}\limits_{{a_{ml}}\neq 0}\!{{R^{\prime}}_{\left({l,m}\right)}}\!=\!\mathop{\min}\limits_{{a_{ml}}\neq 0}{\log^{+}}\left({\frac{{{P_{l}}}}{{\sigma_{{\rm{para}}}^{2}\left({{\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)}}}\right).

(17)

Hence, the achievable sum-rate of $L^{\prime}$ UEs is

\sum\limits_{l=1}^{L^{\prime}}{{R_{l}}}=\sum\limits_{l=1}^{L^{\prime}}\mathop{\min}\limits_{{a_{ml}}\neq 0}{{{\log}^{+}}\left({\frac{{P_{l}}}{{\sigma_{{\rm{para}}}^{2}\left({{{\bf{P}}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)}}}\right)}.

(18)
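As an illustration of (16)-(18), the following minimal sketch evaluates the per-UE rate as the minimum over the serving APs and then sums over the UEs. It assumes a diagonal ${\bf{P}}$ with strictly positive entries (stored as a list), a base-2 logarithm for ${\log^{+}}$, and uses the closed form of $\sigma_{\rm{para}}^{2}$ that follows from the rank-one inversion identity. The function names and the toy numbers are hypothetical and not taken from the simulations.

```python
import math

def sigma_para(P, g, a):
    """Effective noise variance a^H (P^{-1} + g g^H)^{-1} a for diagonal P (list of powers)."""
    aPa = sum(p * abs(x) ** 2 for p, x in zip(P, a))
    aPg = sum(p * complex(x).conjugate() * y for p, x, y in zip(P, a, g))
    gPg = sum(p * abs(y) ** 2 for p, y in zip(P, g))
    return aPa - abs(aPg) ** 2 / (1.0 + gPg)

def log_plus(x):
    """log^+(x) = max(log2(x), 0); the base-2 logarithm is an assumption of this sketch."""
    return max(math.log2(x), 0.0)

def ue_rates(P, G, A):
    """G[m] and A[m] are the channel and coefficient vectors of AP m (length L')."""
    sig = [sigma_para(P, G[m], A[m]) for m in range(len(A))]
    rates = []
    for l in range(len(P)):
        serving = [m for m in range(len(A)) if A[m][l] != 0]
        # Rate of UE l: worst case over all APs whose combination involves UE l.
        rates.append(min((log_plus(P[l] / sig[m]) for m in serving), default=0.0))
    return rates

def sum_rate(P, G, A):
    return sum(ue_rates(P, G, A))

# Toy example: two APs, two UEs (hypothetical numbers).
P = [100.0, 100.0]
G = [[1.0, 0.5], [0.3, 1.2]]
A = [[1, 0], [1, 1]]
print(ue_rates(P, G, A), sum_rate(P, G, A))
```

In this toy example both UEs are limited by the second AP, whose combination has the larger effective noise variance, which is exactly the bottleneck effect described above.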

Recall that all UEs transmit with equal power in the CF scheme. For fairness, we compare the performance of CF and ECF under the constraint of equal total transmit power. We aim to optimize the power allocation so as to maximize the achievable sum-rate under a constraint on the total power consumption $P_{t}$. The optimization problem is formulated as follows:

$\displaystyle\mathop{{\rm{maximize}}}\limits_{\bf{P}}$	$\displaystyle\sum\limits_{l=1}^{L^{\prime}}{{R_{l}}}$
$\displaystyle{\rm{subject\,to}}$	$\displaystyle\sum\nolimits_{l=1}^{L^{\prime}}{{P_{l}}}={P_{t}},$
	$\displaystyle{P_{l}}\geq 0,\;\;\;l=1,2,\ldots,L^{\prime}.$	(19)

Letting the UEs share a total power budget, set to the sum of their maximum allowable transmit powers, provides an upper bound on the performance under individual per-UE power constraints [29]. Besides, (19) is handled at the CPU since the global information ${{\bf{a}}_{1}},\cdots,{{\bf{a}}_{M}}$ and ${{\bf{g}}_{1}},\cdots,{{\bf{g}}_{M}}$ is required.

As mentioned above, each AP decodes $\sum\nolimits_{l=1}^{L}{{a_{ml}}{{\mathbf{x}}_{l}}}$ as one regular codeword due to the lattice algebraic structure. All UEs served by the $m$th AP need to tolerate the same effective noise. If the integer linear combination $\sum\nolimits_{l=1}^{L}{{a_{ml}}{{\mathbf{x}}_{l}}}$ can tolerate the effective noise variance ${{{\sigma_{{\rm{para}}}^{2}\left({{{\bf{P}}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)}}}$, then the data of all UEs served by the $m$th AP, i.e., those with ${a_{ml}}\neq 0$, can be successfully recovered from the linear combination with integer coefficient vector ${{\bf{a}}_{m}}$. In cell-free massive MIMO, a good quality-of-service for all users is always emphasized. However, directly maximizing the achievable sum-rate cannot achieve a good balance of quality-of-service among the users [12]. Therefore, minimizing the maximum effective noise variance, which generally improves the achievable rate of most UEs, is a more suitable goal for our model. In other words, we minimize the maximum effective noise variance as

$\displaystyle\mathop{{\rm{minimize}}}\limits_{{{\bf{P}}}}\mathop{\max}\limits_{m=1,\ldots,M^{\prime}}$	$\displaystyle\left\{{{{\sigma_{{\rm{para}}}^{2}\left({{{\bf{P}}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)}}}\right\}$
$\displaystyle{\rm{subject\,to}}$	$\displaystyle\sum\nolimits_{l=1}^{L^{\prime}}{{P_{l}}}=P_{t},$
	$\displaystyle{P_{l}}\geq 0,\;\;\;l=1,2,\ldots,L^{\prime}.$	(20)

According to (14), (20) is equivalent to

$\displaystyle\mathop{{\rm{minimize}}}\limits_{\bf{P}}\mathop{\max}\limits_{m=1,\ldots,M^{\prime}}$	$\displaystyle\left\{{{\bf{a}}_{m}^{H}{\left({{{{{\bf{P}}}}^{-1}}+{{\bf{g}}_{m}}{\bf{g}}_{m}^{H}}\right)^{-1}}{{\bf{a}}_{m}}}\right\}$
$\displaystyle{\rm{subject\,to}}$	$\displaystyle\sum\nolimits_{l=1}^{L^{\prime}}{{P_{l}}}={P_{t}},$
	$\displaystyle{P_{l}}\geq 0,\;\;\;l=1,\ldots,L^{\prime}.$	(21)

According to [12, Lemma B. 4], the matrix inversion can be equivalently represented by

{\left({{{{{{\bf{P}}}}}^{-1}}+{{\bf{g}}_{m}}{\bf{g}}_{m}^{H}}\right)^{-1}}={{\bf{P}}}-\frac{1}{{1+{\bf{g}}_{m}^{H}{{\bf{P}}}{{\bf{g}}_{m}}}}{{\bf{P}}}{{\bf{g}}_{m}}{\bf{g}}_{m}^{H}{{\bf{P}}}.

(22)

Therefore, the effective noise variance for the $m$ th AP is

\sigma_{{\rm{para}}}^{2}\left({{\bf{P}},{{\bf{a}}_{m}},{{\bf{g}}_{m}}}\right)={\bf{a}}_{m}^{H}{\bf{P}}{{\bf{a}}_{m}}-\frac{{{\bf{a}}_{m}^{H}{\bf{P}}{{\bf{g}}_{m}}{\bf{g}}_{m}^{H}{\bf{P}}{{\bf{a}}_{m}}}}{{1+{\bf{g}}_{m}^{H}{\bf{P}}{{\bf{g}}_{m}}}}.

(23)
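The equivalence between (14) and (23) can be checked numerically. The sketch below is only a sanity check under stated assumptions: ${\bf{P}}$ is diagonal and stored as a list of powers, the example is restricted to two UEs so that the matrix in (14) can be inverted explicitly, and the function names and numbers are hypothetical.

```python
def sigma_closed_form(P, g, a):
    """Closed form (23): a^H P a - |a^H P g|^2 / (1 + g^H P g), with diagonal P."""
    aPa = sum(p * abs(x) ** 2 for p, x in zip(P, a))
    aPg = sum(p * complex(x).conjugate() * y for p, x, y in zip(P, a, g))
    gPg = sum(p * abs(y) ** 2 for p, y in zip(P, g))
    return aPa - abs(aPg) ** 2 / (1.0 + gPg)

def sigma_direct_2x2(P, g, a):
    """Definition (14) for two UEs, inverting the 2x2 matrix P^{-1} + g g^H explicitly."""
    m00 = 1.0 / P[0] + abs(g[0]) ** 2
    m11 = 1.0 / P[1] + abs(g[1]) ** 2
    m01 = complex(g[0]) * complex(g[1]).conjugate()   # entry (0, 1) of g g^H
    m10 = m01.conjugate()                             # entry (1, 0) of g g^H
    det = m00 * m11 - m01 * m10
    inv = [[m11 / det, -m01 / det], [-m10 / det, m00 / det]]
    quad = sum(complex(a[i]).conjugate() * inv[i][j] * a[j]
               for i in range(2) for j in range(2))
    return quad.real

# Hypothetical complex channel and Gaussian-integer coefficient vector.
P = [2.0, 0.5]
g = [0.8 + 0.3j, -0.2 + 1.1j]
a = [1 + 0j, 1 - 1j]
print(sigma_closed_form(P, g, a), sigma_direct_2x2(P, g, a))
```

Both evaluations agree up to floating-point error, while the closed form avoids any matrix inversion and costs only a few inner products per AP.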

As a result, (21) can be rewritten as

$\displaystyle\mathop{{\rm{minimize}}}\limits_{\bf{P}}\mathop{\max}\limits_{m=1,\ldots,M^{\prime}}$	$\displaystyle{\left\{{{\bf{a}}_{m}^{H}{\bf{P}}{{\bf{a}}_{m}}-\frac{{{\bf{a}}_{m}^{H}{\bf{P}}{{\bf{g}}_{m}}{\bf{g}}_{m}^{H}{\bf{P}}{{\bf{a}}_{m}}}}{{1+{\bf{g}}_{m}^{H}{\bf{P}}{{\bf{g}}_{m}}}}}\right\}}$
$\displaystyle{\rm{subject\,to}}$	$\displaystyle\sum\limits_{l=1}^{L^{\prime}}{{P_{l}}}={P_{t}},$
	$\displaystyle{P_{l}}\geq 0,\;\;\;l=1,\ldots,L^{\prime}.$	(24)

However, (24) is NP-hard. To tackle this challenge, we introduce three auxiliary variables $r$, $s$, and $t$ and build the following optimization problem:

\left\{\begin{aligned} \mathop{\min}\limits_{{\bf{P}},r,s,t}\;\;\;&t\\ {\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;&t\geq{r}-{s},\;\;\;\forall m,\\ &{r}\geq{\bf{a}}_{m}^{H}{\bf{P}}{{\bf{a}}_{m}},\;\;\;\forall m,\\ &{s}\leq\frac{{{{\left\|{{\bf{a}}_{m}^{H}{\bf{P}}{{\bf{g}}_{m}}}\right\|}^{2}}}}{{1+{\bf{g}}_{m}^{H}{\bf{P}}{{\bf{g}}_{m}}}},\;\;\;\forall m,\\ &\sum\limits_{l=1}^{L^{\prime}}{{P_{l}}}={P_{t}},\\ &{P_{l}}\geq 0,\;\;\;\forall l.\end{aligned}\right.

(25)

In particular, the variables $r$ and $s$ each have a limited search space. For a given value of $r$, the variable $s$ should be smaller than $r$. Therefore, we employ a two-dimensional brute-force search over these two scalars. For fixed $r$ and $s$, the optimization problem in (25) can be rewritten as the feasibility problem

\left\{\begin{aligned} &{\bf{a}}_{m}^{H}{\bf{P}}{{\bf{a}}_{m}}\leq r,\;\;\;\forall m,\\ &\left({1/2}\right){{\bf{p}}^{T}}{\bf{Jp}}-s{{\bf{v}}^{T}}{\bf{p}}-{{s}}\geq{{0}},\;\;\;\forall m,\\ &\sum\limits_{l=1}^{L^{\prime}}{{P_{l}}}={P_{t}},\\ &{P_{l}}\geq 0,\;\;\;\forall l,\end{aligned}\right.

(26)

where ${\bf{p}}={\left[{{P_{1}},\ldots,{P_{L^{\prime}}}}\right]^{T}}$ and ${\bf{v}}={\left[{{{\left|{{g_{m1}}}\right|}^{2}},\ldots,{{\left|{{g_{mL^{\prime}}}}\right|}^{2}}}\right]^{T}}$. ${\bf{J}}$ is an $L^{\prime}\times L^{\prime}$ matrix whose ${\left({{l_{1}},{l_{2}}}\right)}$th element is given as ${{\bf{J}}_{\left({{l_{1}},{l_{2}}}\right)}}=2{\left|{{a_{m{l_{1}}}}}\right|\left|{{g_{m{l_{1}}}}}\right|\left|{{a_{m{l_{2}}}}}\right|\left|{{g_{m{l_{2}}}}}\right|}$. Clearly, ${\bf{J}}$ is a positive semi-definite matrix. Consequently, (25) can be solved by performing a brute-force search over the two scalars $r$ and $s$, where the feasibility problem (26) needs to be solved in each step. Since transforming a nonconvex problem into an equivalent convex form is quite difficult, if not impossible, an off-the-shelf optimization solver, e.g., fmincon in Matlab, is adopted to obtain a suboptimal solution. Simulations show that solving the nonconvex problem in this way still brings an obvious performance improvement, while the computational cost is tolerable. Strictly speaking, (24) is equivalent to (25) only if the two terms in the objective of (24) are independent. However, at the optimal point, we only care about minimizing the final maximum term. More specifically, Algorithm 1 can solve (25). The parameters in Step 1, i.e., ${r_{\max}}$ and ${r_{\min}}$, can be determined by solving the following two optimization problems:

\begin{cases}\begin{array}[]{l}\mathop{\max}\limits_{\bf{P}}\;\;\;{\bf{a}}_{m}^{H}{\bf{P}}{{\bf{a}}_{m}}\\ {\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;\sum\limits_{l=1}^{L^{\prime}}{{P_{l}}}={P_{t}},\\ \;\;\;\;\;\;\;{P_{l}}\geq 0,\;\;\;\forall l,\end{array}\end{cases}\quad\begin{cases}\begin{array}[]{l}\mathop{\min}\limits_{\bf{P}}\;\;\;{\bf{a}}_{m}^{H}{\bf{P}}{{\bf{a}}_{m}}\\ {\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;\sum\limits_{l=1}^{L^{\prime}}{{P_{l}}}={P_{t}},\\ \;\;\;\;\;\;{P_{l}}\geq 0,\;\;\;\forall l,\end{array}\end{cases}

(27)

respectively. When the search is completed, all APs achieve the same minimal effective noise variance $t$ . The corresponding values of $r$ and $s$ can be denoted as $r_{\text{opt}}$ and $s_{\text{opt}}$ . Finally, utilizing (18) and (23), the achievable sum-rate can be obtained. In Algorithm 1, there are at most

\left\lfloor{\frac{{\left({2{r_{\max}}-{r_{sl}}\left\lfloor{\frac{{{r_{\max}}-{r_{\min}}}}{{{r_{sl}}}}-1}\right\rfloor}\right)\left\lfloor{\frac{{{r_{\max}}-{r_{\min}}}}{{{r_{sl}}}}}\right\rfloor}}{{2{s_{sl}}}}}\right\rfloor\left\lfloor{\frac{{{r_{\max}}-{r_{\min}}}}{{{r_{sl}}}}}\right\rfloor

(28)

feasibility problems that need to be solved, where ${r_{sl}}$ and ${s_{sl}}$ refer to the step sizes for searching $r$ and $s$, respectively.

Algorithm 1 Brute Force Search on Two Scalars for Solving Problem (25)

1: Initialization: Define the range of the values of $r$: ${r_{\min}}$ and ${r_{\max}}$. Choose the step sizes for $r$ and $s$: $r_{sl}$ and $s_{sl}$, respectively. Set $t=\infty$ and $r={r_{\max}}$.

2: Set $s=r-s_{sl}$. If $s\geq 0$, solve the feasibility problem in (26); otherwise go to Step 4.

3: If (26) is feasible, set $t=\min\left({t,r-s}\right)$; otherwise go back to Step 2.

4: Update $r$ with $r=r-r_{sl}$. Stop if $r\leq{r_{\min}}$.
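The two-scalar search of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the implementation used in the simulations: the feasibility test of (26) is abstracted into a hypothetical callback `is_feasible(r, s)`, which in practice would wrap a numerical solver, and the inner loop is assumed to lower $s$ by one step $s_{sl}$ each time the test fails. For a fixed $r$, the first feasible $s$ found in this way gives the smallest $t=r-s$ on the grid.

```python
import math

def brute_force_search(r_min, r_max, r_step, s_step, is_feasible):
    """Grid search over (r, s); returns (t, r, s) with the smallest t = r - s, or None."""
    best = None
    n_r = int(math.floor((r_max - r_min) / r_step + 1e-9))
    for i in range(n_r + 1):                 # r runs from r_max down to r_min
        r = r_max - i * r_step
        n_s = int(math.floor(r / s_step + 1e-9))
        for k in range(1, n_s + 1):          # s runs from r - s_step down to 0
            s = r - k * s_step
            if is_feasible(r, s):
                if best is None or r - s < best[0]:
                    best = (r - s, r, s)
                break                        # smaller s only increases t for this r
    return best

# Toy oracle standing in for the feasibility problem (26): hypothetical thresholds.
def toy_oracle(r, s):
    return r >= 2.0 and s <= 1.0

print(brute_force_search(1.0, 4.0, 0.5, 0.25, toy_oracle))
```

With the toy oracle, the search returns $t=1$ at $r=2$ and $s=1$. The number of oracle calls grows with the product of the two grid sizes, which is why the step sizes govern the complexity of the search.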

III-B3 AP Selection

As mentioned above, recovering the original data of $L^{\prime}$ UEs only requires $L^{\prime}$ integer linear combinations. Therefore, we propose a low-complexity AP selection algorithm for two purposes. First, only $L^{\prime}$ effective noise variances participate in the brute-force search, which reduces the computational complexity. Second, the noise tolerance on UEs’ data can be relaxed, which contributes to the improvement of the achievable sum-rate. Recall that (10) restricts the maximum value in the coefficient vector ${{{\bf{a}}_{m}}}$ and that the search space is generally small. Hence, for different APs, it is the difference in the second term of (23) that leads to a significant deviation in the effective noise variance.

Note that the average channel gain is -70 dB while the noise power is -130 dBW. Therefore, the denominator of that term is close to 1 and is several orders of magnitude smaller than the numerator. Consequently, the main factor that affects the effective noise variance across different APs is the sum of all elements in ${\bf{J}}$ . In other words, decoding the estimations ${{{\bf{\hat{u}}}}_{m}}$ of APs with a high sum value of ${\bf{J}}$ will obtain a higher achievable sum-rate. Therefore, we prefer selecting APs with a high sum value of ${\bf{J}}$ . Furthermore, according to (1), $g_{ml}$ is a function of $\beta_{ml}$ . Then, we propose an algorithm for AP selection based on large-scale fading coefficient ${\beta_{ml}}$ , which generally stays constant for several coherence intervals.

We construct the matrix ${\bf{J}}$ for each AP by replacing the channel coefficient $g_{ml}$ with ${\beta_{ml}}$. For each ${\bf{J}}$, we first sum all the elements and sort the sum values in ascending order. Then, we apply the greedy AP selection for message recovery stated in [17, Algorithm 1]. Compared with the AP selection in [17], we sort the APs by first replacing the channel coefficient $g_{ml}$ with $\beta_{ml}$ and then calculating the sum of all the elements in the matrix ${\bf{J}}$, which is independent of the power allocation. Following [17], we check the columns of ${\bf{A}}$ one by one, where ${\bf{A}}={\left[{{\bf{a}}_{1},\ldots,{\bf{a}}_{M}}\right]}$, until the rank requirement is satisfied. Therefore, the computational complexity of the proposed AP selection method is no more than ${\cal O}\left({M^{\prime}+M^{\prime}{{\log}_{2}}\left({M^{\prime}}\right)+M^{\prime}{{\left({M^{\prime}-1}\right)}^{3}}}\right)$.
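The selection procedure can be sketched as below. This is a hedged illustration, not the algorithm of [17]: the score of each AP is the sum of all entries of ${\bf{J}}$ with $g_{ml}$ replaced by $\beta_{ml}$, which equals $2{\left({\sum\nolimits_{l}{\left|{a_{ml}}\right|\left|{\beta_{ml}}\right|}}\right)^{2}}$, and the APs are visited from the highest score downwards, which is an assumption reflecting the stated preference for a high sum value. The rank is computed by floating-point Gaussian elimination, and all names and numbers are hypothetical.

```python
def rank(rows, tol=1e-9):
    """Rank of a small complex matrix via Gaussian elimination with partial pivoting."""
    M = [[complex(x) for x in row] for row in rows]
    rk = 0
    n_cols = len(M[0]) if M else 0
    for col in range(n_cols):
        pivot = max(range(rk, len(M)), key=lambda i: abs(M[i][col]), default=None)
        if pivot is None or abs(M[pivot][col]) < tol:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(rk + 1, len(M)):
            f = M[i][col] / M[rk][col]
            M[i] = [x - f * y for x, y in zip(M[i], M[rk])]
        rk += 1
    return rk

def select_aps(A, beta, n_ue):
    """A[m], beta[m]: coefficient vector and large-scale fading coefficients of AP m."""
    def score(m):
        w = [abs(a) * abs(b) for a, b in zip(A[m], beta[m])]
        return 2.0 * sum(w) ** 2             # sum of all entries of J = 2 w w^T
    order = sorted(range(len(A)), key=score, reverse=True)
    chosen = []
    for m in order:
        # Keep AP m only if its coefficient vector increases the rank.
        if rank([A[i] for i in chosen] + [A[m]]) > len(chosen):
            chosen.append(m)
        if len(chosen) == n_ue:
            break
    return chosen

# Toy example with four APs and two UEs (hypothetical values).
A = [[1, 0], [2, 0], [1, 1], [0, 1]]
beta = [[1e-7, 1e-9], [3e-7, 1e-9], [2e-7, 2e-7], [1e-9, 5e-8]]
print(select_aps(A, beta, 2))
```

Because the score depends only on the coefficient vectors and the large-scale fading, the ordering can be reused over several coherence intervals and does not depend on the power allocation.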

III-C Successive Computation

It is beneficial to remove the codewords that have been decoded successfully from the channel observation. In that case, subsequent decoding stages encounter less interference. This well-known technique is referred to as successive interference cancellation (SIC). In the ECF framework, we apply an analog of SIC for cell-free massive MIMO, named successive computation, which can be viewed as the combination of ECF and SIC. Compared with parallel computation, successive computation reduces both the effective noise variance and the number of users that need to tolerate that effective noise in each decoding step [22]. Hence, it can further improve the system performance. The SIC technique is also applied in [30], which proposes a hybrid deep reinforcement learning (DRL) model to design the IUI-aware receive diversity combining scheme. Compared with [30], our successive computation scheme benefits from the nested lattice coding strategy, which can effectively reduce the fronthaul load. In this subsection, the expression of the computation rate region for successive computation is introduced first. Since the decoding orders of both the integer linear combinations ${{{\bf{\tilde{u}}}}_{m}}$ and the UEs have a significant impact on the performance of successive computation, we present different methods to find suboptimal decoding orders with power control.

III-C1 Computation Rate Region

In successive computation, AP $m$ sends the received signal ${\bf{y}}_{m}$ and the integer linear combinations of codewords ${\bf{a}}_{m}^{T}{\bf{X}}$ to the CPU, instead of decoding the received signal ${{\bf{y}}_{m}}$ into the combinations of UEs’ original data ${{{\bf{\hat{u}}}}_{m}}$. (Footnote 4: With successive computation, the interference cancellation procedure takes place at the decoder of the CPU. Note that the signaling exchanged per coherence interval, and hence the fronthaul load, amounts to $2M^{\prime}n$ and $4M^{\prime}n$ for parallel computation and successive computation, respectively. Besides, the data of UEs are conveyed from the CPU to the APs through fronthaul links and then distributed to the UEs with precoding.) Define ${{\bf{A}}_{m-1}}\buildrel\Delta\over{=}{\left[{{{\bf{a}}_{1}},\ldots,{{\bf{a}}_{m-1}}}\right]^{T}}$. The CPU applies the equalization factor $b_{m}$ and the vector ${\bf{c}}$ to the $m$th combination as

		$\displaystyle{{{\bf{\tilde{y}}}}_{m}}={b_{m}}{{\bf{y}}_{m}}+{{\bf{X}}^{T}}{{\bf{A}}_{m-1}}{\bf{c}}$
		$\displaystyle={{\bf{X}}^{T}}{{\bf{a}}_{m}}+\underbrace{{{\bf{X}}^{T}}\left({{b_{m}}{{\bf{g}}_{m}}+{{\bf{A}}_{m-1}}{\bf{c}}-{{\bf{a}}_{m}}}\right)+{b_{m}}{{\bf{z}}_{m}}}_{{\rm{effective\;noise}}},$		(29)

where ${\bf{c}}={\left[{{c_{1}},\ldots,{c_{m-1}}}\right]^{T}}$. Equation (29) shows that the decoded linear combinations $\left\{{{{\bf{y}}}_{1}},\ldots,{{{\bf{y}}}_{m-1}}\right\}$ can be used as side information for reducing the effective noise experienced in the later decoding stages for other UEs.

After choosing $b_{m}$ and ${\bf{c}}$ to be the MMSE projection scalar and vector, the minimum effective noise variance with transmit power matrix ${\bf{P}}$ is given by

\sigma_{{\rm{succ}}}^{2}\left({{\bf{P}},{{\bf{g}}_{m}},\left.{{{\bf{a}}_{m}}}\right|{{\bf{A}}_{m-1}}}\right)\buildrel\Delta\over{=}{\bf{a}}_{m}^{H}{{\bf{F}}^{T}}{{\bf{N}}_{m-1}}{\bf{F}}{{\bf{a}}_{m}},

(30)

where

\displaystyle{{\bf{F}}^{T}}{\bf{F}}={\left({{{\bf{P}}^{-1}}+{{\bf{g}}_{m}}{\bf{g}}_{m}^{H}}\right)^{-1}},

(31)

and

\displaystyle{{\bf{N}}_{m-1}}\!=\!{\bf{I}}\!-\!{\bf{FA}}_{m-1}^{T}{\left({{{\bf{A}}_{m-1}}{{\bf{F}}^{T}}{\bf{FA}}_{m-1}^{T}}\right)^{-1}}{{\bf{A}}_{m-1}}{{\bf{F}}^{T}}.

(32)
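The expression in (30)-(32) can be read as the squared norm of ${\bf{F}}{{\bf{a}}_{m}}$ after removing its projection onto the span of ${\bf{F}}{{\bf{a}}_{1}},\ldots,{\bf{F}}{{\bf{a}}_{m-1}}$. The sketch below evaluates this residual with a Gram-Schmidt procedure in the inner product $\left\langle{{\bf{x}},{\bf{y}}}\right\rangle={{\bf{x}}^{H}}{\left({{{\bf{P}}^{-1}}+{{\bf{g}}_{m}}{\bf{g}}_{m}^{H}}\right)^{-1}}{\bf{y}}$, so that ${\bf{F}}$ never has to be formed. It assumes a diagonal ${\bf{P}}$ and, for complex channels, Hermitian transposes in place of the plain transposes of (31) and (32); the function names and the numbers are hypothetical.

```python
def m_inner(x, y, P, g):
    """x^H (P^{-1} + g g^H)^{-1} y via the rank-one inversion identity (22); P is diagonal."""
    xPy = sum(p * complex(u).conjugate() * v for p, u, v in zip(P, x, y))
    xPg = sum(p * complex(u).conjugate() * w for p, u, w in zip(P, x, g))
    gPy = sum(p * complex(w).conjugate() * v for p, w, v in zip(P, g, y))
    gPg = sum(p * abs(w) ** 2 for p, w in zip(P, g))
    return xPy - xPg * gPy / (1.0 + gPg)

def sigma_succ(P, g, a, side):
    """Residual of a after projecting out the rows of the side information matrix `side`."""
    basis = []
    for v in side:                           # orthogonalize the side information
        v = [complex(x) for x in v]
        for b in basis:
            c = m_inner(b, v, P, g) / m_inner(b, b, P, g)
            v = [x - c * y for x, y in zip(v, b)]
        if m_inner(v, v, P, g).real > 1e-12: # skip linearly dependent rows
            basis.append(v)
    r = [complex(x) for x in a]
    for b in basis:                          # remove the part explained by side information
        c = m_inner(b, r, P, g) / m_inner(b, b, P, g)
        r = [x - c * y for x, y in zip(r, b)]
    return m_inner(r, r, P, g).real

# Hypothetical three-UE example.
P = [2.0, 0.5, 1.0]
g = [0.8 + 0.3j, -0.2 + 1.1j, 0.5 - 0.4j]
a1, a2 = [1, 0, 1], [1, 1, 0]
print(sigma_succ(P, g, a2, []), sigma_succ(P, g, a2, [a1]))
```

With an empty side information matrix the function reduces to the parallel-computation variance, and adding rows to the side information can only lower the result, which is the gain that successive computation exploits.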

Then, the computation rate region for successive computation is given as

		$\displaystyle{\mathcal{R}_{{\rm{succ}}}}\left({{\bf{P}},{{\bf{g}}_{m}},\left.{{{\bf{a}}_{m}}}\right|{{\bf{A}}_{m-1}}}\right)\buildrel\Delta\over{=}\Bigg{\{}{\left({{R_{1}},\ldots,{R_{L^{\prime}}}}\right)\in\mathbb{R}_{+}^{L^{\prime}}:}$
		$\displaystyle{{R_{l}}\!\leq\!{{\log}^{+}}\!\left(\!{\frac{{{P_{l}}}}{{\sigma_{{\rm{succ}}}^{2}\left({{\bf{P}},{{\bf{g}}_{m}},\left.{{{\bf{a}}_{m}}}\right|{{\bf{A}}_{m-1}}}\right)}}}\!\right)\forall\!\left({m,l}\right)\;{\text{s.t.}}\;{a_{ml}}\!\neq\!0}\!\Bigg{\}}.$		(33)

III-C2 Searching Decoding Order for Combinations

Among $M^{\prime}$ APs, we first select the effective UEs and APs that participate in the uplink transmission. Then, we try to find the candidate integer combinations with small effective noise utilizing the AP selection algorithm. The detailed procedure has been introduced in Section III-B3. According to (III-C1), the effective noise variance is related to the decoding order of integer linear combinations. Therefore, we propose an efficient method for determining the side information matrix ${{\bf{A}}_{m-1}}$ with power control.

The main idea of successive computation is to utilize the side information and the decoded integer linear combinations to reduce the effective noise. Note that the integer linear combination decoded first does not have any side information to exploit. Hence, the effective noise expression for the first decoding step is similar to that of parallel computation, which is given by

		$\displaystyle\sigma_{{\rm{succ}},\delta\left(1\right)}^{2}\left({{\bf{P}},{{\bf{g}}_{\delta\left(1\right)}},{{\bf{a}}_{\delta\left(1\right)}}}\right)$
		$\displaystyle={\bf{a}}_{\delta\left(1\right)}^{H}{\left({{{\bf{P}}^{-1}}+{{\bf{g}}_{\delta\left(1\right)}}{\bf{g}}_{\delta\left(1\right)}^{H}}\right)^{-1}}{{\bf{a}}_{\delta\left(1\right)}},$		(34)

where $\delta\left(m\right)$ denotes the decoding order. As the remaining combinations can reduce their effective noise with the help of side information, we select the integer linear combination with the minimum effective noise to be decoded first. To determine ${\bf{a}}_{\delta\left(1\right)}^{T}{\bf{X}}$, we solve the optimization problem (35) to obtain a locally suboptimal power allocation for each candidate. As problem (35) is non-convex and transforming it into a convex problem is quite difficult, an off-the-shelf optimization solver, e.g., fmincon in Matlab, is adopted to obtain a suboptimal solution. Note that although the obtained solution is suboptimal, the performance of the proposed framework is still superior to that of CF and MRC, which will be verified in the simulation section. Specifically, for each ${\bf{a}}_{m}^{T}{\bf{X}}$, $m=1,\ldots,L^{\prime}$, we have

$\displaystyle\mathop{\min}\limits_{\bf{P}}\,\,$	$\displaystyle\sigma_{{\rm{succ}},\delta\left(1\right)}^{2}\left({{\bf{P}},{{\bf{g}}_{m}},{{\bf{a}}_{m}}}\right)$
$\displaystyle{\rm{s}}{\rm{.t}}{\rm{.}}\,\,$	$\displaystyle\sum\nolimits_{l=1}^{L^{\prime}}{{P_{l}}}={P_{t}},$
	$\displaystyle{P_{l}}\geq 0,l=1,\ldots,L^{\prime}.$	(35)

Then, we calculate the effective noise $\sigma_{{\rm{succ}},\delta\left(1\right)}^{2}$ for each ${\bf{a}}_{m}^{T}{\bf{X}}$ with its own power allocation matrix and select the combination which has the minimal effective noise as ${\bf{a}}_{\delta\left(1\right)}^{T}{\bf{X}}$ .

After determining ${\bf{a}}_{\delta\left(1\right)}^{T}{\bf{X}}$, we determine the remaining decoding order. At the $m$th step, we determine ${{\bf{a}}^{T}_{\delta\left(m\right)}}{\bf{X}}$ based on the effective noise and the rank of the side information matrix. More specifically, we first calculate the effective noise variance for each of the remaining integer linear combinations according to (30) and sort them in ascending order. Then, following this order, we add one coefficient vector at a time to the side information matrix ${{\bf{A}}_{m-1}}$, which is known from steps 1 through $m-1$, to form ${{\bf{A}}_{m}}=\left[{{{\bf{A}}_{m-1}};{\bf{a}}_{m^{\prime}}^{T}}\right]$, $m^{\prime}=1,\ldots,M^{\prime}-m+1$. Finally, we select the first integer linear combination that meets the constraint

{\rm{Rank}}\left({{{\bf{A}}_{m}}}\right)=m,

(36)

to update the side information matrix. The procedure for finding ${{\bf{a}}_{\delta\left(m\right)}^{T}}{\bf{X}}$ terminates when every integer linear combination has been assigned a decoding order. The detailed procedure for searching the decoding order of combinations in successive computation is summarized in Algorithm 2. To determine ${\bf{a}}_{\delta\left(1\right)}^{T}{\bf{X}}$, we need to solve the optimization problem in (35) $L^{\prime}$ times. Then, the complexity of searching the decoding order for the remaining $L^{\prime}-1$ combinations, in terms of the number of complex multiplications, is

	$\displaystyle\sum\limits_{m=2}^{L^{\prime}}{\frac{{\left({L^{\prime}-m+2}\right)\left({L^{\prime}-m+1}\right)}}{2}}{\left[{\frac{{{{\left({L^{\prime}}\right)}^{3}}-L^{\prime}}}{3}}+{{\left({m-1}\right)}^{3}}L^{\prime}\right.}$
	$\displaystyle{\left.{+\frac{{{{\left({m-1}\right)}^{2}}L^{\prime}\!\!+\!\!\left({m-1}\right)L^{\prime}}}{2}\!\!+\!\!\left({m-1}\right){{\left({L^{\prime}}\right)}^{2}}\!\!+\!\!2{{\left({L^{\prime}}\right)}^{2}}}\right]}.$		(37)
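To get a feeling for how quickly (37) grows, the count can be evaluated directly. The short sketch below transcribes (37) term by term using integer arithmetic only; both divisions are exact because ${{\left({L^{\prime}}\right)}^{3}}-L^{\prime}$ is divisible by 3 and $\left({m-1}\right)mL^{\prime}$ is even. The function name is hypothetical.

```python
def decoding_order_multiplications(L):
    """Number of complex multiplications given by (37) for L = L' effective UEs."""
    total = 0
    for m in range(2, L + 1):
        pairs = (L - m + 2) * (L - m + 1) // 2
        per_candidate = ((L ** 3 - L) // 3
                         + (m - 1) ** 3 * L
                         + ((m - 1) ** 2 * L + (m - 1) * L) // 2
                         + (m - 1) * L ** 2
                         + 2 * L ** 2)
        total += pairs * per_candidate
    return total

for L in (2, 3, 5, 10):
    print(L, decoding_order_multiplications(L))
```

For example, the expression evaluates to 18 for $L^{\prime}=2$ and to 200 for $L^{\prime}=3$.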

Algorithm 2 Searching Decoding Order for Combinations in Successive Computation

1: Input: $L^{\prime}$ candidate integer linear combinations and the corresponding coefficient vectors, which can be obtained through AP selection.

2: Initialization: $m=2$, $n=1$.

3: Solve the optimization problem (35) for all $L^{\prime}$ combinations and calculate the effective noise variance for each combination using (34). Select the combination with the minimal effective noise as ${\bf{a}}_{\delta\left(1\right)}^{T}{\bf{X}}$ and remove the corresponding coefficient vector from the candidate set. The side information is obtained as ${{\bf{A}}_{1}}=\left[{\bf{a}}_{\delta\left(1\right)}^{T}\right]$. The power constraint matrix ${\bf{P}}$ is determined by the power allocation for ${\bf{a}}_{\delta\left(1\right)}^{T}{\bf{X}}$.

4: Calculate the effective noise variance for each of the remaining combinations based on ${{\bf{A}}_{m-1}}$ using (30) and sort them in ascending order. Form ${{\bf{A}}_{m}}=\left[{{\bf{A}}_{m-1}};{{\bf{a}}_{n}^{T}}\right]$. If ${\rm{Rank}}\left({{{\bf{A}}_{m}}}\right)=m$, remove ${{\bf{a}}_{n}}$ from the candidate set, update the side information matrix, and set $m:=m+1$, $n:=1$; otherwise set $n:=n+1$.

5: Stop if ${\rm{Rank}}\left({{{\bf{A}}_{L^{\prime}}}}\right)=L^{\prime}$; otherwise go back to Step 4.

6: Output: ${{\bf{A}}_{L^{\prime}}}$.
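The control flow of Algorithm 2 can be sketched as follows. To keep the example self-contained, the two noise evaluations are passed in as hypothetical callbacks: `first_noise(a)` stands for the value of (34) after solving (35) for candidate `a`, and `succ_noise(a, side)` stands for (30) given the current side information. The rank test uses floating-point Gaussian elimination. This is an illustrative sketch under these assumptions, not the implementation used for the numerical results.

```python
def rank(rows, tol=1e-9):
    """Rank of a small complex matrix via Gaussian elimination with partial pivoting."""
    M = [[complex(x) for x in row] for row in rows]
    rk = 0
    n_cols = len(M[0]) if M else 0
    for col in range(n_cols):
        pivot = max(range(rk, len(M)), key=lambda i: abs(M[i][col]), default=None)
        if pivot is None or abs(M[pivot][col]) < tol:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(rk + 1, len(M)):
            f = M[i][col] / M[rk][col]
            M[i] = [x - f * y for x, y in zip(M[i], M[rk])]
        rk += 1
    return rk

def decoding_order(cands, first_noise, succ_noise):
    """Return the indices of the candidate coefficient vectors in decoding order."""
    remaining = list(range(len(cands)))
    first = min(remaining, key=lambda i: first_noise(cands[i]))
    order = [first]                          # Step 3: smallest noise is decoded first
    remaining.remove(first)
    while remaining:
        side = [cands[i] for i in order]
        ranked = sorted(remaining, key=lambda i: succ_noise(cands[i], side))
        # Step 4: take the lowest-noise candidate that increases the rank.
        nxt = next((i for i in ranked
                    if rank(side + [cands[i]]) == len(side) + 1), None)
        if nxt is None:                      # no remaining candidate raises the rank
            break
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy callbacks (hypothetical): they only serve to exercise the control flow.
cands = [[1, 0], [2, 0], [0, 1]]
print(decoding_order(cands,
                     lambda a: sum(abs(x) for x in a),
                     lambda a, side: -abs(a[0])))
```

In the toy run the second candidate has the smallest successive noise but is skipped because it is a multiple of the first, so the rank constraint (36) steers the search to the third candidate.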

III-C3 Searching Decoding Order for UEs

In successive computation, the effective noise of UEs whose data is decoded in later decoding stages can be reduced. At the $m$th decoding step, it is possible to use the side information matrix ${{\bf{A}}_{m-1}}$ to cancel some known individual codewords and remove them from the integer linear combination ${\bf{a}}_{\delta\left(m\right)}^{T}{\bf{X}}$ without changing the effective noise variance. In particular, if the $l$th UE’s data has been recovered at the $m$th step, the data only needs to tolerate the maximum effective noise among $\left\{{{\bf{a}}_{\delta\left(1\right)}^{T}{\bf{X}},\ldots,{\bf{a}}_{\delta\left(m\right)}^{T}{\bf{X}}}\right\}$. Therefore, the decoding order of UEs also affects the achievable sum-rate. Searching the decoding order with SIC has been studied in [31], [32]. However, our problem is generally intractable, and convex optimization cannot be used to obtain the optimal solution. Therefore, we propose several methods to determine the decoding order of UEs and select the best one as a suboptimal solution.

Received-Power-Based Algorithm

In general successive interference cancellation, the decoding order of UEs is determined by their received power. We calculate the received power of the $l$th UE with respect to the $m$th integer linear combination as ${P_{r,\left({l,m}\right)}}={P_{l}}{\left|{{g_{ml}}}\right|^{2}}$. Then, we obtain the achievable rate of each UE with ${\bf{a}}_{\delta\left(m\right)}^{T}{\bf{X}}$ as

{R_{{\delta\left(m\right)},l}}={\log^{+}}\left({\frac{{{P_{l}}}}{{\sigma_{{\rm{succ}},\delta\left(m\right)}^{2}\left({{\bf{P}},{{\bf{g}}_{\delta\left(m\right)}},\left.{{{\bf{a}}_{\delta\left(m\right)}}}\right|{{\bf{A}}_{m-1}}}\right)}}}\right),\;{a_{{\delta\left(m\right)},l}}\neq 0,

(38)

and sort the UEs in descending order of received power. The signal from the UE whose rate with respect to the combination ranks first in this order is decoded at the ${\delta\left(m\right)}$th step.

Channel-Coefficient-Based Algorithm

As stated in Section III-B3, a better channel condition leads to a smaller effective noise variance. Therefore, UEs with good channel conditions contribute to the small effective noise of the selected APs. These UEs can be decoded first to relax their effective noise tolerance and thus obtain a large achievable rate. For the $l$th UE, let us define ${{\bf{g}}_{l}}$ as the vector of channel coefficients associated with the $L^{\prime}$ integer linear combinations, ${{\bf{g}}_{l}}={\left[{{g_{1l}},\ldots,{g_{L^{\prime}l}}}\right]}^{T}$. We calculate the 2-norm of ${{\bf{g}}_{l}}$ for all UEs and sort the UEs in descending order.
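The channel-coefficient-based ordering amounts to one norm computation per UE followed by a sort, as the following minimal sketch shows. The layout `G[m][l]` (selected combination $m$, UE $l$) and the numerical values are assumptions made for illustration.

```python
import math

def order_by_channel_norm(G):
    """G[m][l]: channel coefficient between selected combination m and UE l.
    Returns the UE indices sorted by the 2-norm of g_l, strongest first."""
    n_ue = len(G[0])
    norms = [math.sqrt(sum(abs(G[m][l]) ** 2 for m in range(len(G))))
             for l in range(n_ue)]
    return sorted(range(n_ue), key=lambda l: norms[l], reverse=True)

# Two selected combinations, three UEs (hypothetical values).
G = [[1 + 1j, 0.1, 0.5],
     [0.2, 0.1j, 2.0]]
print(order_by_channel_norm(G))
```

When the power control is based on large-scale fading only, the same routine can be applied with the entries of `G` replaced by the corresponding large-scale fading coefficients.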

Hungarian Algorithm

With the effective noise variance for $L^{\prime}$ integer linear combinations and the power allocation for $L^{\prime}$ UEs, we can find the assignment for each UE with $P_{l}$. It has been shown in [33] that the Hungarian algorithm may be the best solution to this combinatorial optimization problem. Therefore, we first construct an $L^{\prime}\times L^{\prime}$ matrix ${\bf{C}}$ whose element $C_{ml}$ represents the achievable rate of the $l$th UE with the $m$th integer linear combination from (38), i.e., $C_{ml}=R_{ml}$. The conventional Hungarian algorithm finds $L^{\prime}$ elements located in different rows and columns of ${\bf{C}}$ such that their sum is minimum. However, we need to maximize the achievable sum-rate. Therefore, we first find the maximum value ${C_{\max}}$ in ${\bf{C}}$ and replace each element with ${C_{ml}}={C_{\max}}-{C_{ml}}$, so that minimizing the sum of the transformed elements maximizes the sum-rate. The detailed procedure of the Hungarian algorithm is summarized in Algorithm 3.

Algorithm 3 Hungarian Algorithm for Finding Optimal Decoding Order of UEs

1: Perform row operations on ${\bf{C}}$: the minimum element of each row is selected and subtracted from each element in that row.

2: Repeat the procedure stated in Step 1 for all columns.

3: Count the minimum number of rows and columns (lines), ${N_{l}}$, that cover all zeros and test the optimality. If the number of counted lines is equal to $L^{\prime}$, i.e., ${N_{l}}=L^{\prime}$, stop the procedure.

4: Find the minimum value that is not covered by the lines and add it to the intersection points of the lines. Subtract that minimum value from the elements that are not covered by the counted lines.

5: Repeat Step 3 to check the optimality condition. If ${N_{l}}<L^{\prime}$, repeat Step 4.
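For reference, the assignment step can be sketched in code. The implementation below is not a line-by-line transcription of Algorithm 3: it uses the standard potential-based (shortest augmenting path) form of the Hungarian method, which solves the same minimum-cost assignment problem in ${\cal O}\left({{{\left({L^{\prime}}\right)}^{3}}}\right)$ time. The rate maximization is converted into a minimization by subtracting every entry from the largest one. The function names and the rate matrix are hypothetical.

```python
def hungarian_min(cost):
    """Minimum-cost assignment for a square cost matrix; returns col index per row."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (n + 1)          # column potentials
    p = [0] * (n + 1)            # p[j]: row currently matched to column j (1-based)
    way = [0] * (n + 1)
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:           # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign

def max_rate_assignment(C):
    """Maximize the sum of selected entries of C (one per row and per column)."""
    c_max = max(max(row) for row in C)
    cost = [[c_max - x for x in row] for row in C]
    assign = hungarian_min(cost)
    return assign, sum(C[i][j] for i, j in enumerate(assign))

# Hypothetical 3x3 rate matrix.
C = [[3.0, 1.0, 2.0],
     [2.0, 4.0, 6.0],
     [5.0, 2.0, 1.0]]
print(max_rate_assignment(C))
```

For the toy matrix, the best assignment selects the entries 1, 6, and 5, giving a sum of 12, which can be confirmed by enumerating all six permutations.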

To determine which UE’s data should be recovered at the $m$th decoding step, we need to compute the achievable rates of $L^{\prime}+1-m$ UEs. Therefore, the computational complexity of the received-power-based algorithm and the channel-coefficient-based algorithm is ${\cal O}\left({\frac{{\left({L^{\prime}+1}\right)L^{\prime}}}{2}}\right)$. Besides, the complexity of the Hungarian algorithm is ${\cal O}\left({{{\left({L^{\prime}}\right)}^{3}}}\right)$ [34]. After determining the decoding orders of the integer linear combinations and the UEs, the achievable sum-rate can be obtained. Using (38) to calculate the achievable rate of the $l$th UE whose data is recovered with ${\bf{a}}_{\delta\left(m\right)}^{T}{\bf{X}}$, the achievable sum-rate is given as ${R_{\text{sum}}}=\sum\nolimits_{l=1}^{L^{\prime}}{{R_{ml}}}$.

IV Numerical Results

IV-A Parameters Setup

We adopt parameter settings similar to those in [6] as the basis of our simulation model. More specifically, all UEs and APs are randomly located within a square of 1 $\times$ 1 km. In each simulation setup, the APs and UEs are uniformly distributed at random locations within the simulation area. The square is wrapped around at the edges to avoid boundary effects. The Hata-COST231 model is employed to characterize the large-scale propagation.

IV-B Results and Discussion

IV-B1 Parallel Computation

First, we evaluate the performance of the proposed parallel computation (PARA) scheme in terms of the achievable sum-rate with power control. The APS-PARA scheme refers to PARA with AP selection. Fig. 2 shows the achievable sum-rate obtained via the CF, PARA, and APS-PARA schemes versus the number of APs with $L=10$ and ${P_{t}=200}$ mW. Owing to the array gain, the performance of all considered schemes improves as the number of APs $M$ increases. Moreover, the PARA scheme with the proposed power control method outperforms the conventional CF scheme. For example, compared with the CF scheme, the PARA and APS-PARA schemes improve the achievable sum-rate by factors of more than 1.24 and 1.36, respectively, for $M=100$. This is because the ECF framework allows the transmit powers of the UEs to be optimized, which facilitates the exploitation of the performance gain. Furthermore, it can be seen from Fig. 2 that the APS-PARA scheme outperforms the PARA scheme. This is attributed to the low IUI brought by the proposed AP selection. Since the effective noise variance that the UEs’ data need to tolerate decreases considerably, AP selection is beneficial for improving the achievable sum-rate. Besides, AP selection also reduces the computational complexity: to recover the UEs’ original information, only $L$ integer linear combinations instead of $M$ need to be used in the power control. Finally, when the number of APs is 60, the performance degradation caused by imperfect CSI, estimated at the APs with the MMSE estimation method [12], is only 4%.

Assuming that the power control is performed at the CPU based on the large-scale fading, we need to replace ${g_{ml}}$ with ${\beta}_{ml}$ when solving the optimization problem (III-B2). The PARA scheme using power control based on the large-scale fading is referred to as the LSF-PARA scheme. Fig. 3 shows the achievable sum-rate obtained with the PARA, LSF-PARA, APS-PARA, and APS-LSF-PARA schemes against the number of APs. As expected, the achievable sum-rate of all schemes improves as the number of APs increases, and applying AP selection does help enhance the performance. Furthermore, the impact of neglecting the small-scale fading on the achievable sum-rate is not critical, especially when the ratio of APs to UEs becomes large. In particular, the performance gap due to ignoring the small-scale fading vanishes for $M=100$. This is due to the property of channel hardening [12]. As the number of antennas becomes sufficiently large, the variance of the channel gain decreases and the fading channel behaves almost as a deterministic channel.

IV-B2 Successive Computation

Next, we examine the performance of successive computation (SUCC) schemes. We denote the successive computation scheme based on large-scale fading with AP selection, power allocation, and the Hungarian algorithm by APS-LSF-PA-Hungarian-SUCC. Fig. 4 shows the performance of the APS-LSF-PA-SUCC schemes with different algorithms for determining the decoding order of UEs, namely the Hungarian algorithm, the received-power-based (RP) algorithm, and the channel-coefficient-based algorithm. Since we assume that the power control is performed at the CPU, which means that the channel coefficient is replaced with the large-scale fading coefficient, the APS-LSF-PA-SUCC scheme that searches the decoding order of UEs through the 2-norm of the large-scale fading coefficients is named the APS-LSF-PA-NLSF-SUCC scheme. As shown in Fig. 4, the APS-LSF-PA-Hungarian-SUCC scheme achieves the best result among the considered schemes. Furthermore, the performance of the APS-LSF-PA-RP-SUCC scheme is similar to that of APS-LSF-PA-NLSF-SUCC. This is due to the fact that the denominator of the second term in (23) is several orders of magnitude smaller than the numerator, as the transmit power normalized by the noise power is huge. Note that the effect of the first term in (23) on the effective noise variance is not significant. Therefore, according to (23), UEs with good channel states should be allocated more transmit power to reduce the effective noise, so that the system obtains a larger achievable sum-rate. In this way, the RP and NLSF algorithms lead to the same result.

In the previous simulation results, we have shown that instantaneous channel state information helps the parallel computation scheme improve the achievable sum-rate, at the cost of higher complexity. A similar conclusion holds for successive computation. In Fig. 5, we compare the achievable sum-rate of the APS-PA-Hungarian-SUCC and APS-LSF-PA-Hungarian-SUCC schemes. Although the performance gap induced by replacing the channel coefficient grows with the number of UEs, it remains very small. At the same time, transmitting instantaneous channel state information substantially increases the fronthaul load. Note that the small-scale fading coefficient is static only during one coherence block, while the large-scale fading coefficient stays constant for a duration of at least 40 small-scale fading coherence intervals [6]. Therefore, using statistical channel state information works well for successive computation schemes. Besides, the benefit of employing AP selection is also evident in successive computation. When searching the decoding order of combinations (36), in the $m$th step we only need to evaluate $L^{\prime}-m$ candidates to find the minimal effective noise that increases the rank of the side information matrix. Furthermore, we can use ${{\bf{u}}_{1}},\ldots,{{\bf{u}}_{m-1}}$ to eliminate certain symbols from the combination and thus remove the constraint on them. Determining the complete decoding order of combinations has a computational complexity of $\mathcal{O}\left({{{\left({L^{\prime}}\right)}^{2}}}\right)$, whereas it requires $\mathcal{O}\left({{{\left({M^{\prime}}\right)}^{2}}}\right)$ without AP selection. Therefore, AP selection not only improves the performance but also decreases the computational complexity when the number of APs is larger than the number of UEs.
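The quadratic complexity stated above follows from summing the per-step evaluation counts. The sketch below simply counts $L^{\prime}-m$ evaluations for each step $m$ and compares the total with the closed form $L^{\prime}(L^{\prime}-1)/2$; the function names are hypothetical, and the count ignores the cost of each individual evaluation.

```python
def search_evaluations(num_combinations):
    """Total candidate evaluations when step m checks (num_combinations - m) options."""
    return sum(num_combinations - m for m in range(1, num_combinations + 1))


def evaluations_closed_form(num_combinations):
    """Closed form of the same sum: L'(L' - 1) / 2, i.e. quadratic growth."""
    return num_combinations * (num_combinations - 1) // 2


for size in (5, 20, 100):
    print(size, search_evaluations(size), evaluations_closed_form(size))
```

Replacing the argument by a larger number of candidates, as happens without AP selection, increases the count quadratically.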

IV-B3 Comparison of centralized MMSE, ECF, CF, and MRC schemes

In Fig. 6, we compare the achievable sum-rate of the APS-LSF-PA-Hungarian-SUCC, APS-LSF-PARA, CF, and MRC schemes. The MRC scheme is a simple linear strategy in cell-free massive MIMO that has been widely used in previous works [6]. In the uplink data transmission, the received signal at the $m$th AP can be expressed as ${{\bf{y}}_{m}}=\sum\nolimits_{l=1}^{L}{{g_{ml}}}\sqrt{{{P_{l}}}}{{\bf{x}}_{l}}+{{\bf{z}}_{m}}$. Then, the $m$th AP multiplies the received signal by the conjugate of its channel coefficient vector ${{{\bf{g}}}_{m}}$ and forwards ${{\bf{y}}_{m}}{\bf{g}}_{m}^{*}$ to the CPU, which combines the signals from all $M$ APs. Therefore, the achievable rate of the $l$th UE is given by

{R_{{\rm{mrc,}}l}}={\log_{2}}\left({1+\frac{{{P_{l}}{{\left|{{\bf{g}}_{l}^{H}{{\bf{g}}_{l}}}\right|}^{2}}}}{{{{\left\|{{{\bf{g}}_{l}}}\right\|}^{2}}+\sum\nolimits_{l^{\prime}\neq l}{{P_{l^{\prime}}}{{\left|{{\bf{g}}_{l^{\prime}}^{H}{{\bf{g}}_{l}}}\right|}^{2}}}}}}\right).

(39)
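As a rough numerical illustration of the MRC rate in (39), the following sketch evaluates the expression for given channel vectors. It assumes per-UE transmit powers normalized by a unit noise variance, which reduces to the equal-power case when all entries of `powers` coincide; the function names and the toy channels are hypothetical.

```python
import math


def inner(a, b):
    """Hermitian inner product a^H b for two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def mrc_rates(channels, powers):
    """Per-UE MRC rates in bits per channel use.

    channels[l] is the length-M channel vector g_l of UE l and
    powers[l] its transmit power normalized by the noise power.
    """
    rates = []
    for l, g_l in enumerate(channels):
        signal = powers[l] * abs(inner(g_l, g_l)) ** 2
        noise = abs(inner(g_l, g_l))  # ||g_l||^2 with unit noise variance
        interference = sum(
            powers[k] * abs(inner(g_l, g_k)) ** 2
            for k, g_k in enumerate(channels)
            if k != l
        )
        rates.append(math.log2(1.0 + signal / (noise + interference)))
    return rates


# Hypothetical toy channels: two orthogonal UEs, then two partially aligned UEs.
orthogonal = mrc_rates([[1 + 0j, 0j], [0j, 1 + 0j]], [3.0, 1.0])
aligned = mrc_rates([[1 + 0j, 0j], [1 + 0j, 1 + 0j]], [3.0, 1.0])
print(orthogonal, aligned)
```

With orthogonal channels the interference term vanishes, whereas the partially aligned channels yield a lower rate for the first UE at the same transmit power.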

It is clear that the IUI limits the achievable sum-rate of the MRC scheme. In contrast, the CF and ECF schemes can harness and even exploit the interference for cooperative gain, which increases the achievable sum-rate. This can be verified from Fig. 6. When the number of APs is not very large, so that the IUI affects the performance significantly, the advantage of applying the CF and ECF schemes is evident. For example, when $M=100$, the achievable sum-rates of the CF and ECF schemes are more than 1.5 and 2.5 times that of the MRC scheme, respectively.

Although the ECF framework effectively improves the system performance, it is not the optimal choice for maximizing the achievable rate. Fig. 7 shows the cumulative distribution function (CDF) of the achievable sum-rate for the centralized MMSE, APS-LSF-PA-Hungarian-SUCC, APS-LSF-PARA, CF, and MRC schemes with $M=100$ and $L=20$. From Fig. 7, we first observe that our parallel ECF scheme with the power control method that solves (26) outperforms both CF and MRC. Second, compared with the local MR and zero-forcing (ZF) schemes using quantized signals under the same fronthaul limit, our proposed ECF schemes, including parallel and successive computation, achieve superior performance. Specifically, compared with local ZF, the successive computation scheme yields a 60.4% improvement in the average achievable sum-rate. Besides, the performance gap between the APS-LSF-PA-Hungarian-SUCC scheme and the centralized MMSE scheme is noticeable. This is because the centralized MMSE scheme adopts the optimal combining for maximizing the instantaneous signal-to-interference-and-noise ratio [12]. Specifically, applying the centralized MMSE scheme leads to a 55% improvement in the average achievable sum-rate. However, compared with centralized MMSE, ECF is an efficient approach for fronthaul reduction, and hence a large achievable sum-rate can still be realized even if the fronthaul capacity is limited. In particular, each AP decodes the received signal into the finite field by applying the equalization factor and then forwards an integer combination of the transmitted symbols of all UEs. The cardinality of the signals transmitted over the fronthaul link is the same as that of the UEs' original data, which is the theoretical minimum fronthaul load required to achieve lossless transmission [17].
When the fronthaul capacity is restricted to $R_{0}$, the actual achievable rate is $R=\min\left\{{{R_{0}},{R_{\text{sum}}}}\right\}$ [35], where ${R_{\text{sum}}}$ represents the achievable sum-rate without considering the fronthaul load constraint.
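The fronthaul-limited rate above is a simple clipping operation, sketched below. The function name and the numerical values are hypothetical and serve only to show the two regimes (fronthaul-limited and not fronthaul-limited).

```python
def fronthaul_limited_rate(sum_rate, fronthaul_capacity):
    """Actual achievable rate R = min{R_0, R_sum}."""
    return min(fronthaul_capacity, sum_rate)


# Hypothetical values in bits per channel use.
unconstrained_sum_rate = 25.0
limited = fronthaul_limited_rate(unconstrained_sum_rate, fronthaul_capacity=15.0)
unlimited = fronthaul_limited_rate(unconstrained_sum_rate, fronthaul_capacity=30.0)
print(limited, unlimited)
```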

IV-B4 Trade-off between the performance and the complexity of ECF schemes

TABLE II: The computational complexity and performance of various versions of the ECF framework

| Scheme | Computational complexity: AP selection | Computational complexity: power optimization (the number of feasibility problems) | Computational complexity: searching decoding order | Sum rate [bits per channel use] |
| --- | --- | --- | --- | --- |
| Parallel computation | ${\cal O}\left({M^{\prime}+M^{\prime}{{\log}_{2}}\left({M^{\prime}}\right)+M^{\prime}{{\left({M^{\prime}-1}\right)}^{3}}}\right)$ | $\left\lfloor{\left({{r_{\max}}+{r_{\max}}-{r_{sl}}\left\lfloor{\frac{{{r_{\max}}-{r_{\min}}}}{{{r_{sl}}}}-1}\right\rfloor}\right)\times\left\lfloor{\frac{{{r_{\max}}-{r_{\min}}}}{{{r_{sl}}}}}\right\rfloor/2{s_{sl}}}\right\rfloor\left\lfloor{\frac{{{r_{\max}}-{r_{\min}}}}{{{r_{sl}}}}}\right\rfloor$ | N/A | 19.57 |
| Successive computation using received-power-based algorithm | ${\cal O}\left({M^{\prime}+M^{\prime}{{\log}_{2}}\left({M^{\prime}}\right)+M^{\prime}{{\left({M^{\prime}-1}\right)}^{3}}}\right)$ | $\sum\limits_{m=2}^{L^{\prime}}{\frac{{\left({L^{\prime}-m+2}\right)\left({L^{\prime}-m+1}\right)}}{2}}\left[{\frac{{{{\left({L^{\prime}}\right)}^{3}}-L^{\prime}}}{3}+\frac{{{{\left({m-1}\right)}^{2}}L^{\prime}+\left({m-1}\right)L^{\prime}}}{2}+{{\left({m-1}\right)}^{3}}L^{\prime}+\left({m-1}\right){{\left({L^{\prime}}\right)}^{2}}+2{{\left({L^{\prime}}\right)}^{2}}}\right]$ | ${\cal O}\left({\frac{{\left({L^{\prime}+1}\right)L^{\prime}}}{2}}\right)$ | 23.58 |
| Successive computation using channel-coefficient-based algorithm | ${\cal O}\left({M^{\prime}+M^{\prime}{{\log}_{2}}\left({M^{\prime}}\right)+M^{\prime}{{\left({M^{\prime}-1}\right)}^{3}}}\right)$ | $\sum\limits_{m=2}^{L^{\prime}}{\frac{{\left({L^{\prime}-m+2}\right)\left({L^{\prime}-m+1}\right)}}{2}}\left[{\frac{{{{\left({L^{\prime}}\right)}^{3}}-L^{\prime}}}{3}+\frac{{{{\left({m-1}\right)}^{2}}L^{\prime}+\left({m-1}\right)L^{\prime}}}{2}+{{\left({m-1}\right)}^{3}}L^{\prime}+\left({m-1}\right){{\left({L^{\prime}}\right)}^{2}}+2{{\left({L^{\prime}}\right)}^{2}}}\right]$ | ${\cal O}\left({\frac{{\left({L^{\prime}+1}\right)L^{\prime}}}{2}}\right)$ | 22.82 |
| Successive computation using Hungarian algorithm | ${\cal O}\left({M^{\prime}+M^{\prime}{{\log}_{2}}\left({M^{\prime}}\right)+M^{\prime}{{\left({M^{\prime}-1}\right)}^{3}}}\right)$ | $\sum\limits_{m=2}^{L^{\prime}}{\frac{{\left({L^{\prime}-m+2}\right)\left({L^{\prime}-m+1}\right)}}{2}}\left[{\frac{{{{\left({L^{\prime}}\right)}^{3}}-L^{\prime}}}{3}+\frac{{{{\left({m-1}\right)}^{2}}L^{\prime}+\left({m-1}\right)L^{\prime}}}{2}+{{\left({m-1}\right)}^{3}}L^{\prime}+\left({m-1}\right){{\left({L^{\prime}}\right)}^{2}}+2{{\left({L^{\prime}}\right)}^{2}}}\right]$ | ${\cal O}\left({{{\left({L^{\prime}}\right)}^{3}}}\right)$ | 25.51 |

In Table II, we summarize the sum-rate performance and the computational complexity of parallel computation and of the various versions of successive computation. It can be observed that there is a trade-off between performance and computational complexity. Specifically, the successive computation with the Hungarian algorithm for searching the decoding order of UEs has higher computational complexity and superior performance compared with the other two methods, i.e., the received-power-based algorithm and the channel-coefficient-based algorithm.
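To get a feeling for the orders of magnitude in Table II, the expressions for the AP selection, the successive-computation power optimization, and the decoding-order search can be evaluated numerically. The sketch below transcribes these expressions directly; the function names and the chosen value of $L^{\prime}$ are hypothetical, and the big-O terms are evaluated without their hidden constants.

```python
import math


def ap_selection_ops(num_aps):
    """M' + M' log2(M') + M'(M' - 1)^3 from the AP-selection column."""
    return num_aps + num_aps * math.log2(num_aps) + num_aps * (num_aps - 1) ** 3


def successive_feasibility_problems(num_ues):
    """Number of feasibility problems for successive computation (Table II)."""
    total = 0
    for m in range(2, num_ues + 1):
        pairs = (num_ues - m + 2) * (num_ues - m + 1) // 2
        bracket = (
            (num_ues ** 3 - num_ues) // 3
            + ((m - 1) ** 2 * num_ues + (m - 1) * num_ues) // 2
            + (m - 1) ** 3 * num_ues
            + (m - 1) * num_ues ** 2
            + 2 * num_ues ** 2
        )
        total += pairs * bracket
    return total


def decoding_order_ops(num_ues, algorithm):
    """Leading-order cost of the decoding-order search for each algorithm."""
    if algorithm == "hungarian":
        return num_ues ** 3
    return (num_ues + 1) * num_ues // 2  # RP and channel-coefficient-based


num_ues = 20  # hypothetical L'
print(successive_feasibility_problems(num_ues))
print(decoding_order_ops(num_ues, "rp"), decoding_order_ops(num_ues, "hungarian"))
```

The integer divisions are exact here because $(L^{\prime})^{3}-L^{\prime}$ is always divisible by 3 and the other halved terms are products of consecutive integers.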

IV-B5 Scalability Issue

To realize scalability [36], [37], our ECF framework needs to limit the number of UEs that each AP serves. Specifically, according to the large-scale-fading-based AP selection criterion proposed in [38], APs are first selected to form a UE-centric cluster for each UE. Then, each AP sorts the UEs that it needs to serve according to the large-scale fading information and serves only the first several UEs with the best channel quality. Fig. 8 compares the performance of the original non-scalable ECF scheme and the scalable ECF scheme. We observe that the performance loss is small and decreases as the number of APs increases.
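The second step of the procedure above, in which each AP keeps only its strongest UEs, can be sketched as follows. The UE-centric clusters are taken as given (the criterion of [38] is not reproduced here), and the function name, the cap `max_ues_per_ap`, and the example coefficients are hypothetical.

```python
def limit_ues_per_ap(clusters, beta, max_ues_per_ap):
    """Keep, for every AP, only its strongest UEs by large-scale fading.

    clusters[l] is the set of AP indices initially selected for UE l,
    beta[m][l] is the large-scale fading coefficient between AP m and UE l.
    Returns a dict mapping each AP index to the UEs it ends up serving.
    """
    served = {}
    for m in range(len(beta)):
        candidates = [l for l, aps in enumerate(clusters) if m in aps]
        candidates.sort(key=lambda l: beta[m][l], reverse=True)
        served[m] = candidates[:max_ues_per_ap]
    return served


# Hypothetical example: 2 APs, 3 UEs, each AP may serve at most 2 UEs.
beta_example = [
    [0.9, 0.5, 0.1],
    [0.2, 0.6, 0.8],
]
clusters_example = [{0, 1}, {0, 1}, {0, 1}]
served_example = limit_ues_per_ap(clusters_example, beta_example, max_ues_per_ap=2)
print(served_example)
```

Because the cap is fixed, the per-AP processing load does not grow with the total number of UEs in the network, which is the property required for scalability.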

V Conclusions

In this work, we investigate the achievable sum-rate of the ECF framework for cell-free massive MIMO systems. Two types of ECF framework, namely parallel computation and successive computation, are proposed to improve the achievable sum-rate in cell-free massive MIMO. An AP selection scheme is proposed to reduce the effective noise tolerance of UEs, which further improves the performance and reduces the computational complexity. We show that the proposed power control algorithms for parallel computation and successive computation with AP selection can improve the achievable sum-rate significantly. To obtain better system performance, methods for determining the decoding order of combinations and UEs are also presented. Numerical results show that, compared with the CF and MRC schemes, the ECF framework remarkably improves the achievable sum-rate of cell-free massive MIMO systems.

References

[1] J. Zhang, J. Zhang, J. Zheng, S. Jin, and B. Ai, “Expanded compute-and-forward for backhaul-limited cell-free massive MIMO,” in Proc. IEEE ICC Wkshps, 2019, pp. 1–6.
[2] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
[3] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and D. J. Love, “Prospective multiple antenna technologies for beyond 5G,” IEEE J. Sel. Areas in Commun., vol. 38, no. 8, pp. 1637–1660, Aug. 2020.
[4] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas in Commun., vol. 39, no. 3, pp. 615–637, Mar. 2021.
[5] A. Lozano, R. W. Heath, and J. G. Andrews, “Fundamental limits of cooperation,” IEEE Trans. Inf. Theory, vol. 59, no. 9, pp. 5213–5226, Sep. 2013.
[6] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
[7] E. Nayebi, A. Ashikhmin, T. L. Marzetta, H. Yang, and B. D. Rao, “Precoding and power optimization in cell-free massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 7, pp. 4445–4459, Jul. 2017.
[8] M. Karlsson, E. Björnson, and E. G. Larsson, “Techniques for system information broadcast in cell-free massive MIMO,” IEEE Trans. Commun., vol. 67, no. 1, pp. 244–257, Jan. 2019.
[9] M. Bashar, K. Cumanan, A. G. Burr, M. Debbah, and H. Q. Ngo, “On the uplink max–min SINR of cell-free massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 18, no. 4, pp. 2021–2036, Apr. 2019.
[10] G. Interdonato, E. Björnson, H. Q. Ngo, P. Frenger, and E. G. Larsson, “Ubiquitous cell-free massive MIMO communications,” EURASIP J. Wireless Commun. Netw., vol. 2019, no. 1, p. 197, Dec. 2019.
[11] E. Björnson and L. Sanguinetti, “Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 77–90, Jan. 2020.
[12] E. Björnson, J. Hoydis, L. Sanguinetti et al., “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations and Trends® in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017.
[13] R. Ahlswede, N. Cai, S.-Y. Li, and R. W. Yeung, “Network information flow,” IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1204–1216, Apr. 2000.
[14] S.-Y. Li, R. W. Yeung, and N. Cai, “Linear network coding,” IEEE Trans. Inf. Theory, vol. 49, no. 2, pp. 371–381, Feb. 2003.
[15] T. Yang and I. B. Collings, “On the optimal design and performance of linear physical-layer network coding for fading two-way relay channels,” IEEE Trans. Wireless Commun., vol. 13, no. 2, pp. 956–967, May 2014.
[16] B. Nazer and M. Gastpar, “Reliable physical layer network coding,” Proc. IEEE, vol. 99, no. 3, pp. 438–460, Mar. 2011.
[17] Q. Huang and A. Burr, “Compute-and-forward in cell-free massive MIMO: Great performance with low backhaul load,” in Proc. IEEE ICC Wkshps, May 2017, pp. 601–606.
[18] B. Nazer and M. Gastpar, “Compute-and-forward: Harnessing interference through structured codes,” IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6463–6486, Oct. 2011.
[19] S.-N. Hong and G. Caire, “Compute-and-forward strategies for cooperative distributed antenna systems,” IEEE Trans. Inf. Theory, vol. 59, no. 9, pp. 5227–5243, Sep. 2013.
[20] C. Feng, D. Silva, and F. R. Kschischang, “An algebraic approach to physical-layer network coding,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7576–7596, Nov. 2013.
[21] M. Nokleby and B. Aazhang, “Lattice coding over the relay channel,” in Proc. IEEE ICC, 2011, pp. 1–5.
[22] B. Nazer, V. R. Cadambe, V. Ntranos, and G. Caire, “Expanding the compute-and-forward framework: Unequal powers, signal levels, and multiple linear combinations,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 4879–4909, Sep. 2016.
[23] Z. Li, J. Chen, L. Zhen, S. Cui, K. G. Shin, and J. Liu, “Coordinated multi-point transmissions based on interference alignment and neutralization,” IEEE Trans. Wireless Commun., vol. 18, no. 7, pp. 3347–3365, Jul. 2019.
[24] P. Marsch and G. P. Fettweis, Coordinated Multi-Point in Mobile Communications: from theory to practice. Cambridge University Press, 2011.
[25] V. V. Veeravalli and A. El Gamal, Interference management in wireless networks: Fundamental bounds and the role of cooperation. Cambridge University Press, 2018.
[26] Q. Huang and A. Burr, “Low complexity coefficient selection algorithms for compute-and-forward,” IEEE Access, 2017, pp. 19 182–19 193.
[27] B. Zhou and W. H. Mow, “A quadratic programming relaxation approach to compute-and-forward network coding design,” in Proc. IEEE ISIT, Jun. 2014, pp. 2296–2300.
[28] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[29] W. He, B. Nazer, and S. S. Shitz, “Uplink-downlink duality for integer-forcing,” IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1992–2011, Mar. 2018.
[30] Y. Al-Eryani, M. Akrout, and E. Hossain, “Multiple access in cell-free networks: Outage performance, dynamic clustering, and deep reinforcement learning-based design,” IEEE J. Sel. Areas in Commun., vol. 39, no. 4, pp. 1028–1042, 2020.
[31] W. Mesbah and H. Alnuweiri, “Joint rate, power, and decoding order optimization of MIMO-MAC with MMSE-SIC,” in Proc. IEEE Globecom Wkshps, 2009.
[32] Z. Zhou, T. Jiang, H. Bai, S. Sun, and H. Long, “Joint optimization of power and decoding order in CDMA based cognitive radio systems with successive interference cancellation,” in Proc. IEEE ICCT, 2011, pp. 187–191.
[33] R. R. Patel, T. T. Desai, and S. J. Patel, “Scheduling of jobs based on hungarian method in cloud computing,” in Proc. IEEE ICICCT, 2017, pp. 6–9.
[34] B.-L. Xu and Z.-Y. Wu, “A study on two measurements-to-tracks data assignment algorithms,” Inf. Sci., vol. 177, no. 19, pp. 4176–4187, 2007.
[35] B. Nazer, A. Sanderovich, M. Gastpar, and S. Shamai, “Structured superposition for backhaul constrained cellular uplink,” in Proc. IEEE ISIT, 2009, pp. 1530–1534.
[36] E. Björnson and L. Sanguinetti, “Scalable cell-free massive MIMO systems,” IEEE Trans. Commun., vol. 68, no. 7, pp. 4247–4261, Jul. 2020.
[37] G. Interdonato, P. Frenger, and E. G. Larsson, “Scalability aspects of cell-free massive MIMO,” in Proc. IEEE ICC, 2019, pp. 1–6.
[38] H. Q. Ngo, L.-N. Tran, T. Q. Duong, M. Matthaiou, and E. G. Larsson, “On the total energy efficiency of cell-free massive MIMO,” IEEE Trans. Green Commun. Netw., vol. 2, no. 1, pp. 25–39, Mar. 2018.