
Interference Cancellation Information Geometry Approach for Massive MIMO Channel Estimation

An-An Lu, Bingyan Liu, and Xiqi Gao. A.-A. Lu, B. Liu, and X. Q. Gao are with the National Mobile Communications Research Laboratory (NCRL), Southeast University, Nanjing 210096, China, and also with Purple Mountain Laboratories, Nanjing 211111, China, e-mail: aalu@seu.edu.cn, xqgao@seu.edu.cn.
Abstract

In this paper, interference cancellation information geometry approaches (IC-IGAs) for massive MIMO channel estimation are proposed. The proposed algorithms are low-complexity approximations of the minimum mean square error (MMSE) estimation. To illustrate the proposed algorithms, a unified framework of the information geometry approach for channel estimation and its geometric explanation are described first. Then, a modified form that has the same mean as the MMSE estimation is constructed. Based on this, the IC-IGA algorithm and the interference cancellation simplified information geometry approach (IC-SIGA) are derived by applying the information geometry framework. The a posteriori means at the equilibria of the proposed algorithms are proved to be equal to the mean of the MMSE estimation, and the complexity of the IC-SIGA algorithm in practical massive MIMO systems is further reduced by considering the beam-based statistical channel model (BSCM) and the fast Fourier transform (FFT). Simulation results show that the proposed methods achieve performance similar to that of the existing information geometry approach (IGA) with lower complexity.

Index Terms:
Massive MIMO, interference cancellation (IC), information geometry approach (IGA), beam-based statistical channel model (BSCM), channel estimation.

I Introduction

Massive multiple-input multiple-output (MIMO) [1, 2, 3] is a core enabling technology of the 5th generation (5G) mobile communications. It has further evolved into extra large-scale MIMO (XL-MIMO) [4, 5, 6], which has become a research hotspot of the 6th generation (6G) mobile communications. By increasing the number of antennas at the base station (BS), massive MIMO significantly enhances the spatial multiplexing and diversity gains and achieves a substantial increase in energy and spectral efficiency. To realize these potential gains, the key prerequisite is acquiring accurate channel state information (CSI). In this paper, we focus on the channel estimation problem for massive MIMO.

The objective of channel estimation is to obtain the a posteriori information of the channel from the received signals. When the a priori probability density function (PDF) is Gaussian, the minimum mean square error (MMSE) estimator, which is also the a posteriori mean, is the optimal estimator. To fully exploit the sparsity of massive MIMO, analytical channel models with joint space-frequency representation, such as the beam-based statistical channel model (BSCM) [7], have been established, and the channel estimation problem can be transformed into the angle-delay domain. As the number of antennas increases, the pilot resources in massive MIMO are no longer sufficient [8, 9], and non-orthogonal pilots [10, 11] are often used. Thus, the channel estimator in massive MIMO usually needs to perform joint estimation of the channels of different users in the angle-delay domain, which makes the complexity of the matrix inversion in the MMSE estimation prohibitive.

Due to the complexity issue, low-complexity channel estimators that can achieve near-MMSE performance have been widely investigated in the literature. In [12], a polynomial expansion (PE) channel estimator is proposed for massive MIMO with arbitrary statistics. In [13], a low-complexity channel estimator with low-dimensional channel gain estimation is proposed for massive MIMO with a uniform planar array (UPA) by first estimating the angle of arrival (AOA). Deep learning based channel estimation approaches are proposed for beamspace mmWave massive MIMO and multi-cell massive MIMO systems in [14] and [15], respectively. In [16], a generalized approximate message passing (GAMP) method is proposed for channel estimation of a massive MIMO mmWave channel. Among these approaches, the GAMP might be the most promising one for implementation in practical systems. However, the derivation of the GAMP is not easy to follow since it lacks a rigorous and concise mathematical explanation. In [11], an information geometry approach (IGA) that achieves performance and complexity similar to those of the GAMP is proposed for massive MIMO channel estimation.

Information geometry theory arises from the study of invariant geometric structures in statistical inference [17] and provides a mathematical foundation of statistics [18]. It views the space of probability distributions as a manifold and tackles problems in information science by using the concepts of differential geometry with tensor calculus [19]. Information geometry theory also plays an important role in machine learning, signal processing, optimization theory [20], and fields such as neuroscience [21] and quantum physics [22]. The information geometry explanation of the belief propagation (BP) algorithm [23] is given in [24], which also shows that the concave-convex procedure (CCCP) [25] for computing marginal distributions can be interpreted through information geometry. In [26], the decoding algorithms for Turbo codes and low-density parity-check (LDPC) codes are derived from the viewpoint of information geometry, and the equilibria and errors of the algorithms are analyzed.

Research on MIMO and massive MIMO based on information geometry is still rare in the literature. In [27], an information geometric approach is proposed to approximate maximum likelihood (ML) estimation in semi-blind MIMO channel identification. For massive MIMO, information geometry is introduced in [11] and [28] to derive information geometry approaches (IGAs) for channel estimation and detection, respectively. Moreover, a simplified IGA (S-IGA) algorithm is proposed in [29] by using the constant envelope property of the channel measurement matrix. Information geometry provides a unified framework for understanding belief propagation and message passing based algorithms. The detailed relation between the IGA and the AMP algorithm is provided in [In preparation].

The information geometry approach defines auxiliary probability density functions (PDFs) based on the original PDF to obtain a low-complexity algorithm. In [11], the auxiliary PDFs are defined based on the elements of the received signal, and each auxiliary PDF computes the messages of all the channel elements. Thus, a natural question is whether we can derive a new channel estimation algorithm that is different from, and has lower complexity than, that in [11]. To answer this question, we propose the interference cancellation information geometry approach (IC-IGA) for massive MIMO channel estimation in this paper. In the new algorithm, each auxiliary PDF focuses on the message for one element of the channel vector, and both the time and space complexities are much lower than those of the IGA algorithm. To derive this new algorithm, we first provide a unified framework of the information geometry approach for channel estimation and explain the geometric meaning of its equilibrium. Then, we construct a modified channel estimation form that has the same mean as the MMSE estimation and apply the unified framework to obtain the new algorithm. To further reduce the complexity, the interference cancellation simplified information geometry approach (IC-SIGA) is proposed. Finally, the a posteriori means at the equilibria of the proposed algorithms are proved to be equal to the mean of the MMSE estimation, and the complexity analysis is provided.

The rest of this paper is organized as follows. The preliminaries about the manifold of complex Gaussian distributions are provided in Section II. The general information geometry framework for massive MIMO channel estimation is presented in Section III. The derivations of IC-IGA and IC-SIGA are presented in Sections IV and V, respectively. Simulation results are provided in Section VI. The conclusion is drawn in Section VII.

Notations: Throughout this paper, uppercase and lowercase boldface letters are used for matrices and vectors, respectively. The superscripts $(\cdot)^{*}$, $(\cdot)^{T}$, and $(\cdot)^{H}$ denote the conjugate, transpose, and conjugate transpose operations, respectively. The mathematical expectation operator is denoted by $\mathbb{E}\{\cdot\}$. The operator $\det(\cdot)$ represents the matrix determinant, and $\|\cdot\|_{2}$ is the $\ell_{2}$ norm. The operators $\odot$ and $\otimes$ denote the Hadamard and Kronecker products, respectively. The $N\times N$ identity matrix is denoted by $\mathbf{I}_{N}$, and $\mathbf{I}_{N,M}$ is used to denote $[\mathbf{I}_{N}~\mathbf{0}_{N,(M-N)}]$ when $N<M$ and $[\mathbf{I}_{M}~\mathbf{0}_{M,(N-M)}]^{T}$ when $N>M$. A vector composed of the diagonal elements of $\mathbf{X}$ is denoted by $\mathrm{diag}(\mathbf{X})$, and a diagonal matrix with $\mathbf{x}$ along its diagonal is denoted by $\mathrm{diag}(\mathbf{x})$. We use $h_{n}$ or $[\mathbf{h}]_{n}$, $a_{mn}$ or $[\mathbf{A}]_{mn}$, $[\mathbf{A}]_{:,n}$, and $[\mathbf{A}]_{m,:}$ to denote the $n$-th element of the vector $\mathbf{h}$, the $(m,n)$-th element of the matrix $\mathbf{A}$, the $n$-th column, and the $m$-th row of the matrix $\mathbf{A}$, respectively. The symbol $\lceil x\rceil$ denotes the smallest integer not less than $x$. Define $\mathbb{Z}_{N}^{+}=\{0,1,\cdots,N\}$. The operation $a\bmod b$ denotes the integer $a$ modulo the integer $b$.

II Preliminaries

In this section, we present an information geometric perspective on the space of multivariate complex Gaussian distributions by using the concepts from [17].

II-A Affine and Dual Affine Coordinate Systems

From information geometry theory, we have that the manifold of complex Gaussian distributions is a dually flat manifold, which has an affine coordinate system and a dual affine coordinate system. The two coordinate systems are also called the natural parameters and the expectation parameters in the literature. In the following, we describe the two affine coordinate systems in detail.

The Gaussian distributions belong to the exponential family of distributions [30]. Let $\bm{\theta},\bm{\Theta}$ be the natural parameters of a complex Gaussian distribution of a random vector $\mathbf{x}\in\mathbb{C}^{N\times 1}$; then the PDF $p(\mathbf{x};\bm{\theta},\bm{\Theta})$ is defined as

$$p(\mathbf{x};\bm{\theta},\bm{\Theta})=\exp\left\{\mathbf{x}^{H}\bm{\theta}+\bm{\theta}^{H}\mathbf{x}+\mathbf{x}^{H}\bm{\Theta}\mathbf{x}-\psi(\bm{\theta},\bm{\Theta})\right\} \quad (1)$$

where $\psi(\bm{\theta},\bm{\Theta})$ is the normalization factor, called the free energy function, given by

$$\psi(\bm{\theta},\bm{\Theta})=N\log(\pi)-\log\det(-\bm{\Theta})-\bm{\theta}^{H}\bm{\Theta}^{-1}\bm{\theta}. \quad (2)$$

Let $\mathcal{M}=\{p(\mathbf{x};\bm{\theta},\bm{\Theta})\}$ be the manifold of multivariate complex Gaussian distributions, for which $\bm{\theta},\bm{\Theta}$ is a coordinate system. The free energy function $\psi(\bm{\theta},\bm{\Theta})$ is a convex function of $\bm{\theta},\bm{\Theta}$ and introduces an affine flat structure, which means that $\bm{\theta},\bm{\Theta}$ is an affine coordinate system and each of its coordinate axes is a straight line. Furthermore, the Bregman divergence [31] from $\bm{\theta},\bm{\Theta}$ to $\bm{\theta}^{\prime},\bm{\Theta}^{\prime}$ derived from $\psi(\bm{\theta},\bm{\Theta})$ is the same as the Kullback-Leibler (KL) divergence from $p(\mathbf{x};\bm{\theta}^{\prime},\bm{\Theta}^{\prime})$ to $p(\mathbf{x};\bm{\theta},\bm{\Theta})$.

The dual coordinate system of $\mathcal{M}$ is obtained by the Legendre transformation [32], i.e., from the gradients of the free energy function $\psi(\bm{\theta},\bm{\Theta})$. From (2), we can obtain the dual coordinates $\bm{\mu},\mathbf{M}$ as

$$\frac{\partial\psi}{\partial\bm{\theta}^{*}}=\mathbb{E}\{\mathbf{x}\}=\bm{\mu}=-\bm{\Theta}^{-1}\bm{\theta} \quad (3a)$$
$$\frac{\partial\psi}{\partial\bm{\Theta}}=\mathbb{E}\{\mathbf{x}\mathbf{x}^{H}\}=\mathbf{M}=\bm{\Theta}^{-1}\bm{\theta}\bm{\theta}^{H}\bm{\Theta}^{-1}-\bm{\Theta}^{-1}=\bm{\mu}\bm{\mu}^{H}+\bm{\Sigma} \quad (3b)$$

where $\bm{\Sigma}$ is the covariance matrix of $\mathbf{x}$. The dual coordinates combine the first- and second-order moments of $\mathbf{x}$ and are also called the expectation parameters. The dual function of $\psi$ is given by

$$\phi=\psi^{*}=\int p(\mathbf{x};\bm{\theta},\bm{\Theta})\log p(\mathbf{x};\bm{\theta},\bm{\Theta})\,\mathrm{d}\mathbf{x} \quad (4)$$

which is the negative entropy of the PDF $p(\mathbf{x};\bm{\theta},\bm{\Theta})$. By using the dual coordinate system, we have that

$$\phi(\bm{\mu},\mathbf{M})=c-\log\det(\mathbf{M}-\bm{\mu}\bm{\mu}^{H}) \quad (5)$$

where $c$ is a constant. The dual function $\phi$ is a convex function of $\bm{\mu},\mathbf{M}$ and induces the dual affine flat structure. The Bregman divergence from $\bm{\mu},\mathbf{M}$ to $\bm{\mu}^{\prime},\mathbf{M}^{\prime}$ derived from $\phi$ is the KL divergence from $p(\mathbf{x};\bm{\mu},\mathbf{M})$ to $p(\mathbf{x};\bm{\mu}^{\prime},\mathbf{M}^{\prime})$.

The transformations from expectation parameters to natural parameters can also be obtained from the Legendre transformation as

$$\bm{\theta}=\frac{\partial\phi}{\partial\bm{\mu}^{*}}=\bm{\Sigma}^{-1}\bm{\mu} \quad (6a)$$
$$\bm{\Theta}=\frac{\partial\phi}{\partial\mathbf{M}}=-\bm{\Sigma}^{-1} \quad (6b)$$

where $\bm{\Sigma}=\mathbf{M}-\bm{\mu}\bm{\mu}^{H}$ is the covariance matrix and is used here for brevity. By using the expectation parameters, the PDF can also be written in the familiar form

$$p(\mathbf{x};\bm{\mu},\bm{\Sigma})=\exp\left\{\mathbf{x}^{H}\bm{\Sigma}^{-1}\bm{\mu}+\bm{\mu}^{H}\bm{\Sigma}^{-1}\mathbf{x}-\mathbf{x}^{H}\bm{\Sigma}^{-1}\mathbf{x}-\psi(\bm{\mu},\bm{\Sigma})\right\} \quad (7)$$

where $\psi(\bm{\mu},\bm{\Sigma})=\log(\pi^{N})+\log\det(\bm{\Sigma})+\bm{\mu}^{H}\bm{\Sigma}^{-1}\bm{\mu}$.
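As a concrete check of the Legendre-transform pairs (3) and (6), the following Python/NumPy sketch (with arbitrary illustrative values, not code from the paper) converts a complex Gaussian between its natural and expectation parameters and verifies the round trip.

```python
import numpy as np

def to_natural(mu, Sigma):
    # (6): theta = Sigma^{-1} mu,  Theta = -Sigma^{-1}
    Sinv = np.linalg.inv(Sigma)
    return Sinv @ mu, -Sinv

def to_expectation(theta, Theta):
    # (3): mu = -Theta^{-1} theta,  M = mu mu^H + Sigma  with  Sigma = -Theta^{-1}
    Sigma = -np.linalg.inv(Theta)
    mu = Sigma @ theta
    return mu, np.outer(mu, mu.conj()) + Sigma

rng = np.random.default_rng(0)
N = 4
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Sigma = B @ B.conj().T + N * np.eye(N)      # Hermitian positive definite covariance
mu = rng.standard_normal(N) + 1j * rng.standard_normal(N)

theta, Theta = to_natural(mu, Sigma)
mu_rt, M_rt = to_expectation(theta, Theta)  # round trip back to (mu, M)
```

Note that $\bm{\Theta}=-\bm{\Sigma}^{-1}$ is Hermitian negative definite, so both directions are plain matrix inversions.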

II-B $e$-flat Submanifold and $m$-Projection

After introducing the affine and dual affine coordinate systems, we present the definitions of $e$-flat submanifolds and the $m$-projection, which are central to describing the information geometry approach for channel estimation.

A submanifold $\mathcal{M}_{1}\subset\mathcal{M}$ is called $e$-flat if it is defined by a linear constraint in the affine coordinates $\bm{\theta},\bm{\Theta}$. The term $m$-flat is defined similarly by using the dual affine coordinates $\bm{\mu},\mathbf{M}$. An example of an $e$-flat submanifold is the manifold of independent complex Gaussian distributions, defined as

$$\mathcal{M}_{0}=\{p(\mathbf{x};\bm{\theta}_{0},\bm{\Theta}_{0})\} \quad (8)$$

where $\bm{\Theta}_{0}\in\mathbb{R}^{N\times N}$ is a diagonal matrix. Since $\bm{\Theta}_{0}$ is diagonal, the expectation parameters are easy to obtain as

$$\bm{\mu}_{0}=-\bm{\Theta}_{0}^{-1}\bm{\theta}_{0},\qquad \bm{\Sigma}_{0}=-\bm{\Theta}_{0}^{-1}. \quad (9)$$

The projection to an $e$-flat manifold is called the $m$-projection because the projection can be realized linearly in the dual affine coordinate system. Let $p(\mathbf{x};\bm{\theta}_{0},\bm{\Theta}_{0})$ and $p(\mathbf{x};\bm{\theta}_{1},\bm{\Theta}_{1})$ be two points in the manifold $\mathcal{M}$, referred to as $P_{0}$ and $P_{1}$, with $P_{0}\in\mathcal{M}_{0}$ and $P_{1}\notin\mathcal{M}_{0}$.

The $m$-projection is unique and minimizes the KL divergence. Specifically, the $m$-projection from $P_{1}$ to $\mathcal{M}_{0}$ is defined as

$$p(\mathbf{x};\bm{\theta}_{1}^{0},\bm{\Theta}_{1}^{0})=\Pi_{\mathcal{M}_{0}}^{m}\{p(\mathbf{x};\bm{\theta}_{1},\bm{\Theta}_{1})\}=\mathop{\arg\min}\limits_{p(\mathbf{x};\bm{\theta}_{0},\bm{\Theta}_{0})\in\mathcal{M}_{0}}D_{KL}\left(p(\mathbf{x};\bm{\theta}_{1},\bm{\Theta}_{1});p(\mathbf{x};\bm{\theta}_{0},\bm{\Theta}_{0})\right). \quad (10)$$

By using the affine and dual affine coordinate systems, the $m$-projection is easy to obtain. First, rewrite the KL divergence as [17]

$$D_{KL}(P_{1};P_{0})=\phi(\bm{\mu}_{1},\mathbf{M}_{1})+\psi(\bm{\theta}_{0},\bm{\Theta}_{0})-\bm{\mu}_{1}^{H}\bm{\theta}_{0}-\bm{\theta}_{0}^{H}\bm{\mu}_{1}-{\rm tr}(\mathbf{M}_{1}\bm{\Theta}_{0}). \quad (11)$$

Then, the minimum can be obtained from the first-order optimality conditions

$$\frac{\partial D_{KL}(P_{1};P_{0})}{\partial\bm{\theta}_{0}^{*}}=\frac{\partial\psi}{\partial\bm{\theta}_{0}^{*}}-\bm{\mu}_{1}=\bm{\mu}_{0}-\bm{\mu}_{1} \quad (12a)$$
$$\frac{\partial D_{KL}(P_{1};P_{0})}{\partial\bm{\Theta}_{0}}=\frac{\partial\psi}{\partial\bm{\Theta}_{0}}-\frac{\partial{\rm tr}(\mathbf{M}_{1}\bm{\Theta}_{0})}{\partial\bm{\Theta}_{0}}=\mathbf{I}\odot\mathbf{M}_{0}-\mathbf{I}\odot\mathbf{M}_{1} \quad (12b)$$

where the term $\mathbf{I}\odot\mathbf{M}_{1}$ appears because $\bm{\Theta}_{0}$ is a diagonal matrix. Thus, we have $\bm{\mu}_{0}=\bm{\mu}_{1}$ and $\mathbf{I}\odot\mathbf{M}_{0}=\mathbf{I}\odot\mathbf{M}_{1}$ at the projection point, and $\bm{\theta}_{1}^{0},\bm{\Theta}_{1}^{0}$ are the corresponding dual coordinates. The projection is simple in the dual affine coordinate system. Let $P_{1}^{0}$ be the projection point; the dual straight line connecting $P_{1}$ and $P_{1}^{0}$ is the shortest one among the dual straight lines from $P_{1}$ to $\mathcal{M}_{0}$.
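The $m$-projection onto $\mathcal{M}_{0}$ therefore amounts to keeping the mean and the diagonal of the covariance. A small numerical sanity check of this KL-minimizing property (Python/NumPy; the KL helper for circularly symmetric complex Gaussians is an illustrative addition, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Sigma1 = B @ B.conj().T + N * np.eye(N)   # covariance of P1 (non-diagonal)
mu1 = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# m-projection onto the independent Gaussians: mu0 = mu1, Sigma0 = I .* Sigma1
mu0 = mu1.copy()
Sigma0 = np.diag(np.diag(Sigma1))

def kl_cgauss(mu_a, S_a, mu_b, S_b):
    """D_KL(CN(mu_a, S_a); CN(mu_b, S_b)) for circularly symmetric
    complex Gaussians (no 1/2 factor, unlike the real-valued case)."""
    Sb_inv = np.linalg.inv(S_b)
    dm = mu_b - mu_a
    logdets = np.linalg.slogdet(S_b)[1] - np.linalg.slogdet(S_a)[1]
    return np.real(np.trace(Sb_inv @ S_a) + dm.conj() @ Sb_inv @ dm) \
        - len(mu_a) + logdets

kl_star = kl_cgauss(mu1, Sigma1, mu0, Sigma0)  # KL to the projection point
```

Perturbing the projection point (different diagonal covariance or shifted mean) can only increase the divergence, which is what the assertions below probe.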

III Information Geometry Framework for Massive MIMO Channel Estimation

In this section, we provide a framework of information geometry methods for channel estimation in massive MIMO systems, inspired by Sections 11.3.3 and 11.3.4 of [17].

III-A Problem Formulation

In massive MIMO channel estimation, a general received signal model is given by

$$\mathbf{y}=\mathbf{A}\mathbf{h}+\mathbf{z} \quad (13)$$

where $\mathbf{A}\in\mathbb{C}^{M\times N}$ is the deterministic measurement matrix, $\mathbf{h}$ is a random Gaussian vector distributed as $\mathbf{h}\sim\mathcal{CN}(\mathbf{0},\mathbf{D})$, and $\mathbf{z}$ is the complex Gaussian noise vector distributed as $\mathbf{z}\sim\mathcal{CN}(\mathbf{0},\sigma_{z}^{2}\mathbf{I})$.

For the received signal model in (13), the posterior distribution can be written as

$$p(\mathbf{h}|\mathbf{y})=\exp\left\{\sigma_{z}^{-2}\mathbf{h}^{H}\mathbf{A}^{H}\mathbf{y}+\sigma_{z}^{-2}\mathbf{y}^{H}\mathbf{A}\mathbf{h}-\mathbf{h}^{H}(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1})\mathbf{h}-\psi\right\}. \quad (14)$$

According to (1), the natural parameters of $p(\mathbf{h}|\mathbf{y})$ are

$$\bm{\theta}=\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y} \quad (15a)$$
$$\bm{\Theta}=-(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}) \quad (15b)$$

whereas the dual affine coordinates can be obtained from (3) as

$$\bm{\mu}=-\bm{\Theta}^{-1}\bm{\theta}=(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1})^{-1}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y} \quad (16a)$$
$$\mathbf{M}=\bm{\mu}\bm{\mu}^{H}-\bm{\Theta}^{-1}=\bm{\mu}\bm{\mu}^{H}+(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1})^{-1}. \quad (16b)$$

The dual affine coordinate $\bm{\mu}$ is the posterior mean of $\mathbf{h}$ and thus is also the MMSE estimate. For massive MIMO systems, the complexity of computing $\bm{\mu}$ is often prohibitive due to the inversion of a large-dimensional matrix. Thus, one of the most important problems for massive MIMO is to derive low-complexity channel estimation methods.

III-B Information Geometry Framework

From Section II-B, we know that if we can $m$-project $p(\mathbf{h};\bm{\theta},\bm{\Theta})$ onto the $e$-flat submanifold $\mathcal{M}_{0}$, then $\bm{\mu}$ equals the mean at the projection point, which is easy to read off. However, the $m$-projection itself still involves the inversion of a large-dimensional matrix, and thus cannot provide a low-complexity solution.

Instead of computing the $m$-projection directly, information geometry provides other low-complexity ways to find a point in $\mathcal{M}_{0}$ whose dual coordinates approximate, or coincide with, those of the $m$-projection point. The IGA algorithm proposed in [11] is one specific algorithm derived from information geometry, but its complexity can be reduced further. To extend the information geometry approach to derive new low-complexity algorithms, we provide a framework of information geometry for massive MIMO channel estimation.

We call the submanifold $\mathcal{M}_{0}$ the target manifold since we want to find a target point in it. Since the target point cannot be obtained directly by the $m$-projection, auxiliary manifolds and PDFs are needed to approximate the $m$-projection. The natural parameters, or affine coordinates, of the original posterior PDF are $\bm{\theta}_{or}=\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}$ and $\bm{\Theta}_{or}=-(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1})$. With the auxiliary manifolds, the process of the information geometry framework for channel estimation is summarized below:

(1) The natural parameters $\bm{\theta}_{or}$ and $\bm{\Theta}_{or}$ are split to construct $Q$ auxiliary manifolds of PDFs and one target manifold of PDFs;

(2) Initialize the auxiliary points and the target point in the auxiliary manifolds and the target manifold, respectively;

(3) Calculate the $m$-projections of the auxiliary points onto the target manifold and compute the beliefs in the affine coordinate system;

(4) Update the natural parameters of the auxiliary and target points;

(5) Repeat (3) and (4) until the algorithm converges or a fixed number of iterations is reached; then output the mean and variance of the target point.

With this framework, the $m$-projection of the original point onto the target manifold is approximated by the $m$-projections from the auxiliary points onto the target manifold. To make the approximation good enough, the two conditions described in the following subsections are needed.

III-C Split of the Natural Parameters and the $e$-Condition

In the general information geometry framework, the most important step is to define the auxiliary points and manifolds. These definitions depend on how the natural parameters are split and determine which specific algorithm is derived.

The split of $\bm{\theta}_{or}$ and $\bm{\Theta}_{or}$ into $Q$ terms is given as

$$\bm{\theta}_{or}=\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}=\sum_{q=1}^{Q}\mathbf{b}_{q} \quad (17a)$$
$$\bm{\Theta}_{or}=-(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1})=-\Big(\sum_{q=1}^{Q}\mathbf{C}_{q}+\bm{\Lambda}_{c}\Big) \quad (17b)$$

where the settings of $\mathbf{b}_{q}$, $\mathbf{C}_{q}$, $Q$, and $\bm{\Lambda}_{c}$ depend on the specific algorithm, and $\bm{\Lambda}_{c}$ is usually a diagonal matrix. Let $\mathbf{a}_{q}=[\mathbf{A}^{H}]_{:,q}$ be the $q$-th column of $\mathbf{A}^{H}$. In the IGA algorithm proposed in [11], the split $\mathbf{b}_{q}=\mathbf{a}_{q}y_{q}$, $\mathbf{C}_{q}=\mathbf{a}_{q}\mathbf{a}_{q}^{H}$, $Q=M$, and $\bm{\Lambda}_{c}=\mathbf{D}^{-1}$ is used.
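As an illustration, the IGA split above can be checked numerically. In this sketch the factor $\sigma_{z}^{-2}$ is folded into $\mathbf{b}_{q}$ and $\mathbf{C}_{q}$ so that (17) holds exactly; that scaling choice is an assumption of this example, as the notation above leaves it implicit.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, sigma2 = 6, 4, 0.2
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
d = rng.uniform(0.5, 2.0, N)                        # D = diag(d)

theta_or = A.conj().T @ y / sigma2                  # theta_or = sigma^{-2} A^H y
Theta_or = -(A.conj().T @ A / sigma2 + np.diag(1 / d))

# IGA split of [11] with Q = M and Lambda_c = D^{-1}; a_q is the q-th column of A^H
AH = A.conj().T
b = [AH[:, q] * y[q] / sigma2 for q in range(M)]    # rank-1 pieces of theta_or
C = [np.outer(AH[:, q], AH[:, q].conj()) / sigma2 for q in range(M)]
```

Summing the $M$ rank-1 pieces recovers the original natural parameters, which is exactly the constraint (17) places on any valid split.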

Based on the split, we define $Q$ auxiliary points or PDFs. Let $\bm{\Lambda}_{q}$ be a diagonal matrix. The natural parameters of the $q$-th auxiliary point $p(\mathbf{h};\bm{\theta}_{q},\bm{\Theta}_{q})$ are defined by

$$\bm{\theta}_{q}=\bm{\lambda}_{q}+\mathbf{b}_{q} \quad (18a)$$
$$\bm{\Theta}_{q}=-(\bm{\Lambda}_{q}+\mathbf{C}_{q}+\bm{\Lambda}_{c}) \quad (18b)$$

where $\bm{\lambda}_{q}$ and $\bm{\Lambda}_{q}$ are the variable parts of the natural parameters. The corresponding auxiliary PDF and manifold are then given as

$$p(\mathbf{h};\bm{\theta}_{q},\bm{\Theta}_{q})=\exp\left\{\mathbf{h}^{H}(\bm{\lambda}_{q}+\mathbf{b}_{q})+(\bm{\lambda}_{q}+\mathbf{b}_{q})^{H}\mathbf{h}-\mathbf{h}^{H}(\bm{\Lambda}_{q}+\mathbf{C}_{q}+\bm{\Lambda}_{c})\mathbf{h}-\psi_{q}\right\} \quad (19)$$

and

$$\mathcal{M}_{q}=\left\{p(\mathbf{h};\bm{\theta}_{q},\bm{\Theta}_{q})\right\}. \quad (20)$$

These auxiliary manifolds $\mathcal{M}_{q}$ are parallel to each other when the $\mathbf{C}_{q}$ are not diagonal, since different auxiliary manifolds share no common points.

The target manifold is still the manifold of independent complex Gaussian distributions $\mathcal{M}_{0}=\{p(\mathbf{h};\bm{\theta}_{0},\bm{\Theta}_{0})\}$. To be consistent with the auxiliary points, the natural parameters of the target point are defined as

$$\bm{\theta}_{0}=\bm{\lambda}_{0} \quad (21a)$$
$$\bm{\Theta}_{0}=-(\bm{\Lambda}_{0}+\bm{\Lambda}_{c}) \quad (21b)$$

where $\bm{\Lambda}_{0}$ is diagonal. The corresponding PDF $p(\mathbf{h};\bm{\theta}_{0},\bm{\Theta}_{0})$ is

$$p(\mathbf{h};\bm{\theta}_{0},\bm{\Theta}_{0})=\exp\left\{\mathbf{h}^{H}\bm{\lambda}_{0}+\bm{\lambda}_{0}^{H}\mathbf{h}-\mathbf{h}^{H}(\bm{\Lambda}_{0}+\bm{\Lambda}_{c})\mathbf{h}-\psi_{0}\right\}. \quad (22)$$

The target manifold is also parallel to all the auxiliary manifolds.

Figure 1: $e$-condition

Now, we introduce the first condition of the information geometry framework as

$$\sum_{q=1}^{Q}(\bm{\theta}_{q},\bm{\Theta}_{q})+(1-Q)(\bm{\theta}_{0},\bm{\Theta}_{0})=(\bm{\theta}_{or},\bm{\Theta}_{or}). \quad (23)$$

It means that the original point, the target point, and the auxiliary points lie on a hyperplane in the affine coordinate system. This is called the $e$-condition since it is a linear condition in the coordinates $\bm{\theta},\bm{\Theta}$, as shown in Fig. 1. Because of the split (17), the $e$-condition always holds if

$$\sum_{q=1}^{Q}(\bm{\lambda}_{q},\bm{\Lambda}_{q})+(1-Q)(\bm{\lambda}_{0},\bm{\Lambda}_{0})=0. \quad (24)$$

Thus, the above formula is also referred to as the $e$-condition. The $e$-condition ensures that the target point and the auxiliary points remain tied to the original point, which is essential for finally obtaining an approximation of the $m$-projection point.

III-D The $m$-Condition and the General Algorithm

The $e$-condition only relates the natural parameters of the original point, the auxiliary points, and the target point, but what we need is part of the expectation parameters of the original point. To obtain an approximation of the $m$-projection point, we need another condition, i.e., the $m$-condition.

The $m$-condition is so named because it is a linear condition in the dual affine coordinates $\bm{\mu},\mathbf{M}$, and is given by

$$(\bm{\mu}_{q}^{\star},\mathbf{I}\odot\mathbf{M}_{q}^{\star})=(\bm{\mu}_{0}^{\star},\mathbf{I}\odot\mathbf{M}_{0}^{\star})\quad\forall q\in\mathbb{Z}_{Q}^{+}. \quad (25)$$

From Section II-B, it means that all the $m$-projection points of the auxiliary points coincide and are equal to the target point. By combining the $m$-condition with the $e$-condition, a good approximation of the $m$-projection of the original point onto the target manifold can be obtained.

From (3), the expectation parameters $\bm{\mu}_{q},\mathbf{M}_{q}$ of the auxiliary points can be obtained as

$$\bm{\mu}_{q}=(\bm{\Lambda}_{q}+\mathbf{C}_{q}+\bm{\Lambda}_{c})^{-1}(\bm{\lambda}_{q}+\mathbf{b}_{q}) \quad (26a)$$
$$\mathbf{M}_{q}=\bm{\mu}_{q}\bm{\mu}_{q}^{H}+(\bm{\Lambda}_{q}+\mathbf{C}_{q}+\bm{\Lambda}_{c})^{-1}. \quad (26b)$$

Then, the expectation parameters of the $m$-projection points on the target manifold $\mathcal{M}_{0}$ satisfy

$$\bm{\mu}_{q}^{0}=\bm{\mu}_{q} \quad (27a)$$
$$\mathbf{I}\odot\mathbf{M}_{q}^{0}=\mathbf{I}\odot\mathbf{M}_{q} \quad (27b)$$

as shown in Section II-B, and further the covariance of the $m$-projection is $\bm{\Sigma}_{q}^{0}=\mathbf{I}\odot\bm{\Sigma}_{q}$. The natural parameters of the $m$-projection points can be obtained as

$$\bm{\theta}_{q}^{0}=-\bm{\Theta}_{q}^{0}(\bm{\Lambda}_{q}+\mathbf{C}_{q}+\bm{\Lambda}_{c})^{-1}(\bm{\lambda}_{q}+\mathbf{b}_{q}) \quad (28a)$$
$$\bm{\Theta}_{q}^{0}=-(\mathbf{I}\odot(\bm{\Lambda}_{q}+\mathbf{C}_{q}+\bm{\Lambda}_{c})^{-1})^{-1}. \quad (28b)$$

To make both the ee-condition and the mm-condition hold, the auxiliary points need to exchange beliefs. Define 𝝀q0{\bm{\lambda}}_{q}^{0} and 𝚲q0{\bm{\Lambda}}_{q}^{0} to make 𝜽q0=𝝀q0{\bm{\theta}}_{q}^{0}={\bm{\lambda}}_{q}^{0} and 𝚯q0=(𝚲q0+𝚲c){\bm{\Theta}}_{q}^{0}=-({\bm{\Lambda}}_{q}^{0}+\bm{\Lambda}_{c}) hold. The beliefs are defined as

\Hy@raisedlink\hyper@anchorstart\@IEEEtheHrefequation\Hy@raisedlink\hyper@anchorend𝝃q\displaystyle{\Hy@raisedlink{\hyper@anchorstart{\@IEEEtheHrefequation}}\Hy@raisedlink{\hyper@anchorend}}\bm{\xi}_{q} =𝝀q0𝝀q\displaystyle={\bm{\lambda}}_{q}^{0}-{\bm{\lambda}}_{q} (29a)
𝚵q\displaystyle\bm{\Xi}_{q} =𝚲q0𝚲q.\displaystyle={\bm{\Lambda}}_{q}^{0}-{\bm{\Lambda}}_{q}. (29b)

Then, the natural parameters of the mm-projection points can be expressed as

\bm{\theta}_{q}^{0} = \bm{\lambda}_{q}+\bm{\xi}_{q} \qquad (30a)
\bm{\Theta}_{q}^{0} = -\left(\bm{\Lambda}_{q}+\bm{\Xi}_{q}+\bm{\Lambda}_{c}\right). \qquad (30b)

By comparing them with the natural parameters of the auxiliary point $p(\mathbf{h};\bm{\theta}_{q},\bm{\Theta}_{q})$, it can be observed that $\bm{\xi}_{q}$ and $\bm{\Xi}_{q}$ are approximations of $\mathbf{b}_{q}$ and $\mathbf{C}_{q}$ in $\bm{\theta}_{q}$ and $\bm{\Theta}_{q}$, respectively, where $\bm{\Xi}_{q}$ is also diagonal.

After defining the beliefs, the iterative updates of the natural parameters of the target point and the auxiliary points are constructed as

\bm{\lambda}_{0}^{t+1} = \sum_{q}\bm{\xi}_{q}^{t} \qquad (31a)
\bm{\Lambda}_{0}^{t+1} = \sum_{q}\bm{\Xi}_{q}^{t} \qquad (31b)

and

\bm{\lambda}_{q}^{t+1} = \sum_{q^{\prime}\neq q}\bm{\xi}_{q^{\prime}}^{t}=\bm{\lambda}_{0}^{t+1}-\bm{\xi}_{q}^{t} \qquad (32a)
\bm{\Lambda}_{q}^{t+1} = \sum_{q^{\prime}\neq q}\bm{\Xi}_{q^{\prime}}^{t}=\bm{\Lambda}_{0}^{t+1}-\bm{\Xi}_{q}^{t}. \qquad (32b)

From the above two equations, it is observed that the $e$-condition always holds. An information geometry based algorithm is obtained by iteratively calculating (28), (29), (31) and (32). When the algorithm converges, it is easy to obtain that

(\bm{\theta}_{q}^{0})^{\star} = \bm{\lambda}_{0}^{\star} \qquad (33a)
(\bm{\Theta}_{q}^{0})^{\star} = -\left(\bm{\Lambda}_{0}^{\star}+\bm{\Lambda}_{c}\right)=\bm{\Theta}_{0}^{\star} \qquad (33b)

which means all the $m$-projection points are equal to the target point; this is equivalent to the $m$-condition. In conclusion, both the $e$-condition and the $m$-condition are satisfied when the algorithm converges.

Finally, the output of this algorithm is the mean and covariance of the target point

\bm{\mu}_{0} = -\bm{\Theta}_{0}^{-1}\bm{\theta}_{0}=\bm{\Lambda}_{0}^{-1}\bm{\lambda}_{0} \qquad (34a)
\bm{\Sigma}_{0} = -\bm{\Theta}_{0}^{-1}=\bm{\Lambda}_{0}^{-1}. \qquad (34b)

It is regarded as the approximated mean and covariance of the marginal PDF corresponding to the original PDF. When the algorithm does not converge, damping can be introduced in the updating of the beliefs to ensure convergence without changing the equilibrium. Although the $m$-projection in (28a) still involves the matrix inversion $(\bm{\Lambda}_{q}+\mathbf{C}_{q}+\bm{\Lambda}_{c})^{-1}$, it can be implemented with low complexity by a proper choice of $\mathbf{C}_{q}$, since $\bm{\Lambda}_{q}$ and $\bm{\Lambda}_{c}$ are diagonal matrices. For example, in the IGA algorithm proposed in [11], the matrix $\mathbf{C}_{q}$ is a rank-$1$ matrix.
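To illustrate why a rank-$1$ $\mathbf{C}_{q}$ keeps the $m$-projection cheap: with $\mathbf{C}_{q}=\mathbf{b}\mathbf{b}^{H}$, the inverse in (28a) reduces to the Sherman-Morrison identity applied to a diagonal matrix, costing only $\mathcal{O}(N)$ extra work beyond inverting the diagonal. A minimal numerical sketch (the matrices below are random placeholders, not quantities from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6

# Diagonal parts Lambda_q and Lambda_c, and a rank-1 perturbation C_q = b b^H.
Lam_q = np.diag(rng.uniform(1.0, 2.0, N))
Lam_c = np.diag(rng.uniform(0.5, 1.0, N))
b = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(N)

D = Lam_q + Lam_c                  # diagonal part, invertible element-wise in O(N)
Dinv = np.diag(1.0 / np.diag(D))

# Sherman-Morrison: (D + b b^H)^{-1} = D^{-1} - D^{-1} b b^H D^{-1} / (1 + b^H D^{-1} b)
u = Dinv @ b
inv_sm = Dinv - np.outer(u, u.conj()) / (1.0 + np.real(b.conj() @ u))

# Compare against direct inversion of the full matrix.
inv_direct = np.linalg.inv(D + np.outer(b, b.conj()))
assert np.allclose(inv_sm, inv_direct)
```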

III-E Geometrical Explanation

This iterative process and its stabilization point can be explained geometrically. Define the $m$-flat manifold $\mathcal{M}^{\star}$ and the $e$-flat manifold $\mathcal{E}^{\star}$ as

\mathcal{M}^{\star} = \left\{p(\mathbf{h};\bm{\theta},\bm{\Theta}) \,\middle|\, (\bm{\mu},\mathbf{I}\odot\mathbf{M})=(\bm{\mu}_{q}^{\star},\mathbf{I}\odot\mathbf{M}_{q}^{\star})=(\bm{\mu}_{0}^{\star},\mathbf{I}\odot\mathbf{M}_{0}^{\star}),\ \forall q\in\mathbb{Z}_{Q}^{+}\right\} \qquad (35a)
\mathcal{E}^{\star} = \left\{p(\mathbf{h};\bm{\theta},\bm{\Theta}) \,\middle|\, (\bm{\theta},\bm{\Theta})=\sum_{q=1}^{Q}c_{q}(\bm{\theta}_{q}^{\star},\bm{\Theta}_{q}^{\star})+\Big(1-\sum_{q=1}^{Q}c_{q}\Big)(\bm{\theta}_{0}^{\star},\bm{\Theta}_{0}^{\star})\right\} \qquad (35b)

where the $c_{q}$ are positive coefficients. The geometric interpretation of the $m$-condition is given in Fig. 2. The $e$-flat auxiliary and target manifolds are perpendicular to the $m$-flat manifold $\mathcal{M}^{\star}$ under the metric induced by the KL divergence.

Figure 2: The $m$-condition.

From the $e$-condition and the $m$-condition shown in Figs. 1 and 2, it can be seen that when the algorithm converges, all the auxiliary points, the target point and the original point are on the same $e$-flat manifold $\mathcal{E}^{\star}$. Meanwhile, all the auxiliary points and the target point are on the same $m$-flat manifold $\mathcal{M}^{\star}$. The $m$-condition ensures that the $m$-projection points of the auxiliary points and the target point coincide, while the $e$-condition relates the auxiliary points and the target point to the original point. By combining these two conditions, the original point is close to the manifold $\mathcal{M}^{\star}$. Theorem 11.8 of [17] proves that $\mathcal{M}^{\star}$ contains the original point when the factor graph is acyclic, which means the exact mean of the original distribution is obtained. This property is not guaranteed when the factor graph is cyclic, which is the common case for the channel estimation problem. Fortunately, for the channel estimation problem, one can usually prove that the mean $\hat{\bm{\mu}}_{0}^{\star}$ is equal to the mean $\bm{\mu}_{or}$ when the algorithm converges, as shown in [11].

IV IC-IGA for Massive MIMO Channel Estimation

In this section, an IC-IGA for massive MIMO channel estimation is proposed. First, a modified form equivalent to the MMSE estimation is given to define the auxiliary manifolds. Then, the framework of the information geometry approach is applied to derive this new IGA algorithm. Finally, the equilibrium and complexity analysis are provided.

IV-A Definition of Auxiliary Manifolds with Modified MMSE Form

In this subsection, we provide a new way of splitting the natural parameters based on a modified MMSE form. Then, the corresponding auxiliary manifolds are constructed.

To derive a new low-complexity IGA algorithm, we want each auxiliary manifold to focus on the computation of one element of $\mathbf{h}$. According to the framework of information geometry methods, the $n$-th auxiliary PDF can be defined as

p(\mathbf{h};\bm{\theta}_{n},\bm{\Theta}_{n})=\exp\{\mathbf{h}^{H}(\bm{\lambda}_{n}+\mathbf{b}_{n})+(\bm{\lambda}_{n}^{H}+\mathbf{b}_{n}^{H})\mathbf{h}-\mathbf{h}^{H}(\bm{\Lambda}_{n}+\mathbf{C}_{n})\mathbf{h}-\psi_{n}\} \qquad (36)

where $\bm{\theta}_{n}=\bm{\lambda}_{n}+\mathbf{b}_{n}$, $\bm{\Theta}_{n}=-(\bm{\Lambda}_{n}+\mathbf{C}_{n})$, and $\bm{\Lambda}_{c}$ is set to be a zero matrix.

Let $\mathbf{K}=\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}$. To make the $n$-th auxiliary PDF only compute the information of the $n$-th element of $\mathbf{h}$, a natural idea is to set $\mathbf{C}_{n}$ and $\mathbf{b}_{n}$ as

\mathbf{P}_{1n}\mathbf{C}_{n}\mathbf{P}_{1n}^{H}=\begin{pmatrix} c_{n} & \frac{1}{2}\bar{\mathbf{k}}_{n}^{H}\\ \frac{1}{2}\bar{\mathbf{k}}_{n} & \mathbf{0}\end{pmatrix},\qquad \mathbf{P}_{1n}\mathbf{b}_{n}=\begin{pmatrix}\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}\\ \mathbf{0}\end{pmatrix} \qquad (41)

where $\mathbf{a}_{n}$ is the $n$-th column of $\mathbf{A}$, $\bar{\mathbf{k}}_{n}$ is the vector obtained by deleting the $n$-th element $k_{nn}$ from the $n$-th column of $\mathbf{K}$, $c_{n}=\sigma_{z}^{-2}\mathbf{a}_{n}^{H}\mathbf{a}_{n}+d_{n}^{-1}$, and $\mathbf{P}_{1n}\in\mathbb{R}^{N\times N}$ is the ordering matrix obtained by extracting the $n$-th row of $\mathbf{I}_{N}$ and putting it in the first row. The matrix $\mathbf{P}_{1n}$ satisfies $\mathbf{P}_{1n}\mathbf{P}_{1n}^{H}=\mathbf{P}_{1n}^{H}\mathbf{P}_{1n}=\mathbf{I}_{N}$. By left-multiplying by $\mathbf{P}_{1n}$ or right-multiplying by $\mathbf{P}_{1n}^{H}$, one can extract the $n$-th row or column of a given matrix and put it in the first row or column.
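The ordering matrix $\mathbf{P}_{1n}$ is just a row permutation of $\mathbf{I}_{N}$. A small sketch verifying its stated properties (0-based indexing; function name is ours):

```python
import numpy as np

def ordering_matrix(N: int, n: int) -> np.ndarray:
    """P_{1n}: move row n of I_N to the first row, keeping the other rows in order."""
    order = [n] + [i for i in range(N) if i != n]
    return np.eye(N)[order, :]

N, n = 5, 2
P = ordering_matrix(N, n)
assert np.allclose(P @ P.T, np.eye(N))        # orthogonality: P P^H = I

M = np.arange(N * N, dtype=float).reshape(N, N)
assert np.allclose((P @ M)[0], M[n])          # left-multiplying extracts row n
assert np.allclose((M @ P.T)[:, 0], M[:, n])  # right-multiplying by P^H extracts column n
```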

However, there is a flaw in this setup. The terms associated with the elements of $\mathbf{h}$ other than $h_{n}$ are missing in the auxiliary function, which thus might not form a Gaussian PDF, since $\bm{\Lambda}_{n}+\mathbf{C}_{n}$ is not always positive semidefinite. To overcome this issue, we present the following theorem.

Theorem 1.

Let the matrices $\mathbf{T}$ and $\bm{\Upsilon}$ be defined as $\mathbf{T}=\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}-\mathbf{I}\odot(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A})$ and $\bm{\Upsilon}=\left(\mathbf{I}\odot(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A})+\mathbf{D}^{-1}\right)^{-1}$. The estimator

\hat{\mathbf{h}}=\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}\right)^{-1}\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right) \qquad (42)

is equivalent to the MMSE estimator.

Proof.

The proof is provided in Appendix A. ∎
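Theorem 1 is easy to check numerically: for a random instance, the modified estimator (42) coincides with the classical MMSE estimate $(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1})^{-1}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}$. A sketch with illustrative dimensions and noise variance:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, sz2 = 12, 6, 0.5   # illustrative sizes and noise variance

A = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) / np.sqrt(2)
d = rng.uniform(0.5, 2.0, N)              # a priori variances, D = diag(d)
h = np.sqrt(d / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))
z = np.sqrt(sz2 / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))
y = A @ h + z

K = A.conj().T @ A / sz2                  # K = sigma^{-2} A^H A
Dinv = np.diag(1.0 / d)
g = A.conj().T @ y / sz2                  # sigma^{-2} A^H y

# Classical MMSE estimate.
h_mmse = np.linalg.solve(K + Dinv, g)

# Modified form of Theorem 1.
T = K - np.diag(np.diag(K))               # off-diagonal part of K
Ups = np.diag(1.0 / (np.real(np.diag(K)) + 1.0 / d))
h_mod = np.linalg.solve(K + Dinv + T + T @ Ups @ T.conj().T, g + T @ Ups @ g)

assert np.allclose(h_mmse, h_mod)
```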

Based on Theorem 1, we can use the following PDF

p(\mathbf{h}|\mathbf{y}) = \exp\big\{\mathbf{h}^{H}(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y})+(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y})^{H}\mathbf{h}-\mathbf{h}^{H}\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}\right)\mathbf{h}-\tilde{\psi}\big\} \qquad (43)

to compute the original mean. With the modified MMSE form, we propose a new way of splitting the natural parameters as

\bm{\theta}_{or}=\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}=\sum_{n=1}^{N}\mathbf{b}_{n} \qquad (44a)
\bm{\Theta}_{or}=-(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H})=-\sum_{n=1}^{N}\mathbf{C}_{n} \qquad (44b)

where $\mathbf{b}_{n}$ and $\mathbf{C}_{n}$ are defined as

\mathbf{P}_{1n}\mathbf{C}_{n}\mathbf{P}_{1n}^{H}=\begin{pmatrix} c_{n} & \bar{\mathbf{k}}_{n}^{H}\\ \bar{\mathbf{k}}_{n} & \frac{1}{c_{n}}\bar{\mathbf{k}}_{n}\bar{\mathbf{k}}_{n}^{H}\end{pmatrix},\qquad \mathbf{P}_{1n}\mathbf{b}_{n}=\begin{pmatrix}\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}\\ \frac{1}{\sigma_{z}^{2}}\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}\mathbf{a}_{n}^{H}\mathbf{y}\end{pmatrix}. \qquad (49)
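The splitting (44) can likewise be verified numerically: summing the $\mathbf{b}_{n}$ and $\mathbf{C}_{n}$ of (49) over $n$ reproduces $\bm{\theta}_{or}$ and $-\bm{\Theta}_{or}$. A sketch with illustrative random quantities, assembling each $\mathbf{C}_{n}$ directly in the natural index order:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, sz2 = 10, 5, 1.0
A = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) / np.sqrt(2)
d = rng.uniform(0.5, 2.0, N)
y = rng.normal(size=M) + 1j * rng.normal(size=M)

K = A.conj().T @ A / sz2
c = np.real(np.diag(K)) + 1.0 / d        # c_n = sigma^{-2} a_n^H a_n + d_n^{-1}
g = A.conj().T @ y / sz2                 # sigma^{-2} A^H y

C_sum = np.zeros((N, N), dtype=complex)
b_sum = np.zeros(N, dtype=complex)
for n in range(N):
    kbar = K[:, n].copy()
    kbar[n] = 0.0                                # n-th column of K without k_nn
    Cn = np.outer(kbar, kbar.conj()) / c[n]      # block (1/c_n) kbar kbar^H
    Cn[n, :] = kbar.conj()                       # row n: kbar^H
    Cn[:, n] = kbar                              # column n: kbar
    Cn[n, n] = c[n]
    C_sum += Cn
    bn = kbar * g[n] / c[n]
    bn[n] = g[n]
    b_sum += bn

T = K - np.diag(np.diag(K))
Ups = np.diag(1.0 / c)
assert np.allclose(C_sum, K + np.diag(1.0 / d) + T + T @ Ups @ T.conj().T)
assert np.allclose(b_sum, g + T @ Ups @ g)
```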

The natural parameter of the $n$-th auxiliary PDF is defined as

\bm{\theta}_{n} = \bm{\lambda}_{-n}+\mathbf{b}_{n} \qquad (50a)
\bm{\Theta}_{n} = -(\bm{\Lambda}_{-n}+\mathbf{C}_{n}) \qquad (50b)

where $\bm{\lambda}_{-n}$ and $\bm{\Lambda}_{-n}$ are given by

\bm{\lambda}_{-n}=(\lambda_{1},\cdots,\lambda_{n-1},0,\lambda_{n+1},\cdots,\lambda_{N})^{T} \qquad (51a)
\bm{\Lambda}_{-n}=\mathrm{diag}(\Lambda_{1},\cdots,\Lambda_{n-1},0,\Lambda_{n+1},\cdots,\Lambda_{N}). \qquad (51b)

The subscript $-n$ instead of $n$ is used because more constraints are imposed on $\bm{\lambda}_{n}$ and $\bm{\Lambda}_{n}$: the $n$-th element is set to zero. The auxiliary PDF $p(\mathbf{h};\bm{\theta}_{n},\bm{\Theta}_{n})$ can be constructed as

p(\mathbf{h};\bm{\theta}_{n},\bm{\Theta}_{n})=\exp\{\mathbf{h}^{H}(\bm{\lambda}_{-n}+\mathbf{b}_{n})+(\bm{\lambda}_{-n}^{H}+\mathbf{b}_{n}^{H})\mathbf{h}-\mathbf{h}^{H}(\bm{\Lambda}_{-n}+\mathbf{C}_{n})\mathbf{h}-\psi_{n}\} \qquad (52)

and the corresponding auxiliary manifolds are $\mathcal{M}_{n}=\left\{p(\mathbf{h};\bm{\theta}_{n},\bm{\Theta}_{n})\right\}$, $\forall n\in\mathbb{Z}_{N}^{+}$.

The natural parameter of the target point is defined as $\bm{\theta}_{0}=\bm{\lambda}$, $\bm{\Theta}_{0}=-\bm{\Lambda}$, where

\bm{\lambda} = (\lambda_{1},\lambda_{2},\cdots,\lambda_{N})^{T} \qquad (53a)
\bm{\Lambda} = \mathrm{diag}(\Lambda_{1},\Lambda_{2},\cdots,\Lambda_{N}). \qquad (53b)

The target PDF becomes

p_{0}(\mathbf{h};\bm{\theta}_{0},\bm{\Theta}_{0})=\exp\{\mathbf{h}^{H}\bm{\lambda}+\bm{\lambda}^{H}\mathbf{h}-\mathbf{h}^{H}\bm{\Lambda}\mathbf{h}-\psi_{0}\} \qquad (54)

and the target manifold is still $\mathcal{M}_{0}=\left\{p(\mathbf{h};\bm{\theta}_{0},\bm{\Theta}_{0})\right\}$. It is easy to check that the $e$-condition always holds due to

\sum_{n=1}^{N}(\bm{\lambda}_{-n},\bm{\Lambda}_{-n})+(1-N)(\bm{\lambda},\bm{\Lambda})=0. \qquad (55)

IV-B Derivation of IC-IGA

After constructing the auxiliary manifolds, the estimation problem can be solved by applying the information geometry framework of Section III. For convenience, we define $\bm{\lambda}_{n}$ and $\bm{\Lambda}_{n}$ as

\bm{\lambda}_{n}=(\lambda_{1},\cdots,\lambda_{n-1},\lambda_{n+1},\cdots,\lambda_{N})^{T}\in\mathbb{C}^{(N-1)\times 1} \qquad (56a)
\bm{\Lambda}_{n}=\mathrm{diag}(\Lambda_{1},\cdots,\Lambda_{n-1},\Lambda_{n+1},\cdots,\Lambda_{N})\in\mathbb{C}^{(N-1)\times(N-1)}. \qquad (56b)

The following theorem gives the beliefs $\bm{\xi}_{n}$ and $\bm{\Xi}_{n}$ corresponding to the $n$-th auxiliary point at the $t$-th iteration.

Theorem 2.

Let the beliefs $\bm{\xi}_{n}$ and $\bm{\Xi}_{n}$ be defined as $\bm{\xi}_{n}=\bm{\lambda}_{-n}^{0}-\bm{\lambda}_{-n}$ and $\bm{\Xi}_{n}=\bm{\Lambda}_{-n}^{0}-\bm{\Lambda}_{-n}$. Then, their elements are given by

[\bm{\xi}_{n}]_{i}=\begin{cases}r_{n}\mu_{n}, & i=n\\ 0, & \text{otherwise}\end{cases} \qquad (57a)
[\bm{\Xi}_{n}]_{ii}=\begin{cases}r_{n}, & i=n\\ 0, & \text{otherwise}\end{cases} \qquad (57b)

where $\mu_{n}=\frac{1}{\sigma_{z}^{2}}c_{n}^{-1}\left(\mathbf{a}_{n}^{H}\mathbf{y}-\mathbf{a}_{n}^{H}\mathbf{A}\bm{\Lambda}^{-1}\bm{\lambda}+\mathbf{a}_{n}^{H}\mathbf{a}_{n}\Lambda_{n}^{-1}\lambda_{n}\right)$, $r_{n}=c_{n}(1+e_{n})^{-1}$, and $e_{n}=\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}$.

Proof.

The proof is provided in Appendix B. ∎

Before proceeding to the derivation of IC-IGA, we present more insight into Theorem 2. Let $\bar{\mathbf{h}}_{n}$ and $s_{n}$ be defined as $\bar{\mathbf{h}}_{n}=(h_{1},\cdots,h_{n-1},h_{n+1},\cdots,h_{N})^{T}$ and

s_{n} = \frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}-\bar{\mathbf{k}}_{n}^{H}\bar{\mathbf{h}}_{n} \qquad (58)

we have that

\frac{1}{Z_{n}}\exp\{\mathbf{h}^{H}\mathbf{b}_{n}+\mathbf{b}_{n}^{H}\mathbf{h}-\mathbf{h}^{H}\mathbf{C}_{n}\mathbf{h}\} = \frac{1}{Z_{n}}\exp\left\{h_{n}^{*}s_{n}+s_{n}^{*}h_{n}-h_{n}^{*}c_{n}h_{n}-\frac{s_{n}^{*}s_{n}}{c_{n}}\right\} \qquad (59)

where $Z_{n}$ is the normalization constant. This means that this part of $p(\mathbf{h};\bm{\theta}_{n},\bm{\Theta}_{n})$ can be viewed as a conditional PDF of $h_{n}$, namely $p(h_{n}|h_{1},\cdots,h_{n-1},h_{n+1},\cdots,h_{N},\mathbf{y})$. Furthermore, the remaining part of $p(\mathbf{h};\bm{\theta}_{n},\bm{\Theta}_{n})$ can be viewed as

\exp\{\mathbf{h}^{H}\bm{\lambda}_{-n}+\bm{\lambda}_{-n}^{H}\mathbf{h}-\mathbf{h}^{H}\bm{\Lambda}_{-n}\mathbf{h}-\psi\}=p(h_{1}|\mathbf{y})\cdots p(h_{n-1}|\mathbf{y})p(h_{n+1}|\mathbf{y})\cdots p(h_{N}|\mathbf{y}). \qquad (60)

The corresponding PDF $p(\mathbf{h};\bm{\theta}_{n},\bm{\Theta}_{n})$ can be rewritten as

p(\mathbf{h};\bm{\theta}_{n},\bm{\Theta}_{n})=p(h_{n}|h_{1},\cdots,h_{n-1},h_{n+1},\cdots,h_{N},\mathbf{y})\,p(h_{1}|\mathbf{y})\cdots p(h_{n-1}|\mathbf{y})p(h_{n+1}|\mathbf{y})\cdots p(h_{N}|\mathbf{y}). \qquad (61)

Thus, the marginal PDFs of the other elements are the same as those provided by $\bm{\lambda}_{-n}$ and $\bm{\Lambda}_{-n}$, and each auxiliary manifold only updates the marginal PDF of one element. This explains why the $n$-th auxiliary manifold has a belief of $0$ for the other elements, as shown in Theorem 2.

From Theorem 2, the $n$-th auxiliary manifold only computes the mean $\mu_{n}$ and variance $r_{n}^{-1}$ of the $n$-th element of $\mathbf{h}$. The corresponding natural parameters are $\lambda_{n}=r_{n}\mu_{n}$ and $\Lambda_{n}=r_{n}$. In the computation, the means and variances of the other elements are those of the other auxiliary manifolds in the previous iteration, which is consistent with the idea of interference cancellation (IC); this method is therefore called IC-IGA.

After calculating the beliefs, the parameters $\bm{\lambda}$, $\bm{\Lambda}$ and $\bm{\lambda}_{-n}$, $\bm{\Lambda}_{-n}$ corresponding to the target and auxiliary points are updated based on (31) and (32). However, this update might cause the algorithm to diverge. By introducing damping, the convergence of the algorithm can be enhanced without changing its equilibrium. Let $0<\alpha\leq 1$ be the damping coefficient; the natural parameters $\lambda_{n}^{t+1}$ and $\Lambda_{n}^{t+1}$ in $\bm{\lambda}^{t+1}$, $\bm{\Lambda}^{t+1}$, $\bm{\lambda}_{-n}^{t+1}$ and $\bm{\Lambda}_{-n}^{t+1}$ are updated as

\lambda_{n}^{t+1} = \alpha r_{n}^{t+1}\mu_{n}^{t+1}+(1-\alpha)\lambda_{n}^{t} \qquad (62a)
\Lambda_{n}^{t+1} = \alpha r_{n}^{t+1}+(1-\alpha)\Lambda_{n}^{t} \qquad (62b)

where $\mu_{n}^{t+1}$ and $r_{n}^{t+1}$ are computed as

\mu_{n}^{t+1} = \frac{1}{\sigma_{z}^{2}}c_{n}^{-1}\left(\mathbf{a}_{n}^{H}\mathbf{y}-\mathbf{a}_{n}^{H}\mathbf{A}(\bm{\Lambda}^{t})^{-1}\bm{\lambda}^{t}+\mathbf{a}_{n}^{H}\mathbf{a}_{n}(\Lambda_{n}^{t})^{-1}\lambda_{n}^{t}\right) \qquad (63a)
r_{n}^{t+1} = \frac{c_{n}}{1+e_{n}^{t}} \qquad (63b)

where $e_{n}^{t}=\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}(\bm{\Lambda}_{n}^{t})^{-1}\bar{\mathbf{k}}_{n}$ and $c_{n}=\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{a}_{n}+d_{n}^{-1}$. Besides, define the matrix $\mathbf{L}=(\mathbf{A}^{H}\mathbf{A})\odot(\mathbf{A}^{H}\mathbf{A})^{*}$ and the vector $\mathbf{v}=(\Lambda_{1}^{-1},\Lambda_{2}^{-1},\cdots,\Lambda_{N}^{-1})^{T}$; the computation of $e_{n}^{t}$ can then be rewritten as

e_{n}^{t}=\frac{1}{\sigma_{z}^{4}c_{n}}\left(\mathbf{e}_{n}^{T}\mathbf{L}\mathbf{v}^{t}-\mathbf{a}_{n}^{H}\mathbf{a}_{n}(\Lambda_{n}^{t})^{-1}\mathbf{a}_{n}^{H}\mathbf{a}_{n}\right) \qquad (64)

where $\mathbf{e}_{n}=[\mathbf{I}_{N}]_{:,n}$ is the vector whose $n$-th element is $1$ and whose other elements are $0$. The IC-IGA channel estimation procedure is summarized in Algorithm 1.

Input: The received signal $\mathbf{y}$, the a priori covariance $\mathbf{D}$ of $\mathbf{h}$ and the maximal iteration number $T$.
Output: The approximated posterior mean $\bm{\mu}^{t}$ and covariance $(\mathbf{r}^{t})^{-1}$ of the beam domain channel $\mathbf{h}$.
1 Initialization: $t=0$, $\bm{\mu}^{t}=\mathbf{0}$, $\bm{\lambda}^{t}=\mathbf{0}$, $\bm{\Lambda}^{t}=\mathbf{I}$, $\mathbf{v}^{t}=\mathbf{1}$; calculate $\mathbf{A}^{H}\mathbf{y}$, $\mathbf{A}^{H}\mathbf{A}$, and $\mathbf{L}=(\mathbf{A}^{H}\mathbf{A})\odot(\mathbf{A}^{H}\mathbf{A})^{*}$.
2 while $t\leq T$ do
3       Calculate the $m$-projections of the $N$ auxiliary points onto the target manifold, with the corresponding means $\mu_{n}^{t+1}$ and covariances $(r_{n}^{t+1})^{-1}$ given by
\mu_{n}^{t+1} = \frac{1}{\sigma_{z}^{2}}c_{n}^{-1}\left(\mathbf{a}_{n}^{H}\mathbf{y}-\mathbf{a}_{n}^{H}\mathbf{A}(\bm{\Lambda}^{t})^{-1}\bm{\lambda}^{t}+\mathbf{a}_{n}^{H}\mathbf{a}_{n}(\Lambda_{n}^{t})^{-1}\lambda_{n}^{t}\right)
r_{n}^{t+1} = \frac{c_{n}}{1+e_{n}^{t}}
e_{n}^{t} = \frac{1}{\sigma_{z}^{4}c_{n}}\left(\mathbf{e}_{n}^{T}\mathbf{L}\mathbf{v}^{t}-\mathbf{a}_{n}^{H}\mathbf{a}_{n}(\Lambda_{n}^{t})^{-1}\mathbf{a}_{n}^{H}\mathbf{a}_{n}\right).
4       Update the parameters of the target and auxiliary points as
\lambda_{n}^{t+1} = \alpha r_{n}^{t+1}\mu_{n}^{t+1}+(1-\alpha)\lambda_{n}^{t}
\Lambda_{n}^{t+1} = \alpha r_{n}^{t+1}+(1-\alpha)\Lambda_{n}^{t}.
5       $t=t+1$.
6 end while
7 When the algorithm converges or $t>T$, output the posterior mean $\bm{\mu}^{t}$ and covariance $(\mathbf{r}^{t})^{-1}$ of $\mathbf{h}$.
Algorithm 1 IC-IGA for channel estimation
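The iterations of Algorithm 1 vectorize naturally. Below is a minimal sketch (function and variable names are ours, and the random test problem is illustrative). Note that the matrix computed here is $\mathbf{K}\odot\mathbf{K}^{*}$, i.e., the paper's $\mathbf{L}$ scaled by $\sigma_{z}^{-4}$, which absorbs the $\sigma_{z}^{-4}$ factor of (64); the final mean is compared against the exact MMSE solution, consistent with Theorem 3:

```python
import numpy as np

def ic_iga(y, A, d, sz2, T=2000, alpha=0.5):
    """IC-IGA sketch: approximate posterior mean and variances of h."""
    N = A.shape[1]
    AHy = A.conj().T @ y
    K = A.conj().T @ A / sz2                 # sigma^{-2} A^H A
    kdiag = np.real(np.diag(K))              # sigma^{-2} a_n^H a_n
    c = kdiag + 1.0 / d                      # c_n
    L = np.abs(K) ** 2                       # K ⊙ K^*  (paper's L times sigma^{-4})
    lam = np.zeros(N, dtype=complex)         # lambda
    Lam = np.ones(N)                         # diag(Lambda)
    for _ in range(T):
        mu_prev = lam / Lam
        # m-projection means (63a); the diagonal term cancels self-interference.
        mu = (AHy / sz2 - K @ mu_prev + kdiag * mu_prev) / c
        # variances via (63b) and (64); sigma^{-4} already absorbed into L.
        v = 1.0 / Lam
        e = (L @ v - np.diag(L) * v) / c
        r = c / (1.0 + e)
        # damped belief update (62)
        lam = alpha * r * mu + (1 - alpha) * lam
        Lam = alpha * r + (1 - alpha) * Lam
    return lam / Lam, 1.0 / Lam

# Illustrative random test problem.
rng = np.random.default_rng(3)
M, N, sz2 = 64, 8, 0.1
A = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) / np.sqrt(2)
d = rng.uniform(0.5, 2.0, N)
h = np.sqrt(d / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))
y = A @ h + np.sqrt(sz2 / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))

mu, var = ic_iga(y, A, d, sz2)
mmse = np.linalg.solve(A.conj().T @ A / sz2 + np.diag(1.0 / d), A.conj().T @ y / sz2)
assert np.linalg.norm(mu - mmse) / np.linalg.norm(mmse) < 1e-5
```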

IV-C Equilibrium and Complexity Analysis

In this subsection, we analyze the equilibrium, i.e., the fixed point or limit point, and the complexity of IC-IGA.

From (25) and (24), Algorithm 1 satisfies the following conditions at the equilibrium

\bm{\mu}^{\star}=\bm{\mu}_{n}^{\star},\quad \bm{\Sigma}^{\star}=\mathbf{I}\odot\bm{\Sigma}_{n}^{\star} \qquad (66)
\bm{\lambda}^{\star}=\frac{1}{N-1}\sum_{n=1}^{N}\bm{\lambda}_{-n}^{\star},\quad \bm{\Lambda}^{\star}=\frac{1}{N-1}\sum_{n=1}^{N}\bm{\Lambda}_{-n}^{\star} \qquad (67)

which leads to the following theorem.

Theorem 3.

At the equilibrium of IC-IGA, the mean $\bm{\mu}^{\star}$ of $p(\mathbf{h};\bm{\theta}_{0},\bm{\Theta}_{0})$ is equal to the mean $\bm{\mu}_{\text{MMSE}}$ of the posterior distribution $p(\mathbf{h}|\mathbf{y})$.

Proof.

The proof is provided in Appendix C. ∎

We now analyze the complexity of the IC-IGA algorithm in terms of time (computational) complexity and space complexity. In each iteration, the time complexity is dominated by the multiplication of the matrix $\mathbf{L}\in\mathbb{R}^{N\times N}$ and the vector $\mathbf{v}\in\mathbb{R}^{N\times 1}$, whose cost is $\mathcal{O}(N^{2})$. The overall time complexity is therefore $\mathcal{O}(TN^{2})$, where $T$ is the number of iterations. The free variables of the algorithm are the $N$-dimensional vectors $\bm{\mu}$, $\mathbf{r}$, $\mathbf{e}$, $\bm{\lambda}$ and $\mathrm{diag}(\bm{\Lambda})$, i.e., $5N$ free variables in total, so the space complexity is $\mathcal{O}(N)$. When $T$ is small, this compares favorably with the time complexity of $\mathcal{O}(N^{3}+N^{2}M)$ and the space complexity of $\mathcal{O}(N^{3})$ of the MMSE algorithm. In addition, compared with the time complexity of $\mathcal{O}(TMN)$ and the space complexity of $\mathcal{O}(MN)$ of the existing IGA algorithm [11], the IC-IGA algorithm has comparable time complexity and lower space complexity when $M$ is comparable to $N$. In practice, the channel dimension $N$ is often smaller than the received signal dimension $M$ due to the sparsity of the channel, in which case the IC-IGA algorithm has lower time and space complexity than the IGA algorithm.

V IC-SIGA for Massive MIMO Channel Estimation with BSCM and ZC Sequences

In this section, an IC-SIGA with lower complexity is proposed; it is derived from the IC-IGA by directly constructing the iterative update of the mean. To further reduce the complexity of IC-SIGA in practical systems, a massive MIMO system with a UPA is considered. Using the BSCM and ZC sequences, a practical receive model is established, and the complexity of IC-SIGA is then reduced by exploiting the FFT and the sparsity of the beam domain channel.

V-A Derivation of IC-SIGA

In this subsection, we provide the derivation of IC-SIGA, which can further reduce the complexity of channel estimation.

From the previous section, it is clear that the time complexity of the IC-IGA algorithm is dominated by the multiplication $\mathbf{L}\mathbf{v}$. This computation is related to the calculation of $r_{n}$ and $e_{n}$, which involves the computation of the variance of the $m$-projection of each auxiliary point. In the following, we reconsider the computation of the mean corresponding to the $m$-projection of each auxiliary point. From (63a) and $\bm{\mu}^{\star}=(\bm{\Lambda}^{\star})^{-1}\bm{\lambda}^{\star}$, we can obtain

\bm{\mu}^{\star} = \frac{1}{\sigma_{z}^{2}}\left(\mathbf{A}^{H}\mathbf{y}-\mathbf{A}^{H}\mathbf{A}\bm{\mu}^{\star}+(\mathbf{I}\odot\mathbf{A}^{H}\mathbf{A})\bm{\mu}^{\star}\right)./\mathbf{c} \qquad (68)

where $\mathbf{c}=(c_{1},\cdots,c_{N})^{T}$, $c_{n}=\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{a}_{n}+d_{n}^{-1}$, and $./$ denotes element-wise division of vectors. Equation (68) indicates that $\bm{\mu}$ can be updated without $r_{n}$ and $e_{n}$.

However, iterating (68) directly might not converge either. Thus, we again introduce a damping coefficient $0<\alpha\leq 1$. Let $\bm{\mu}^{temp}$ be defined as

\bm{\mu}^{temp} = \frac{1}{\sigma_{z}^{2}}\left(\mathbf{A}^{H}\mathbf{y}-\mathbf{A}^{H}\mathbf{A}\bm{\mu}^{t}+(\mathbf{I}\odot\mathbf{A}^{H}\mathbf{A})\bm{\mu}^{t}\right)./\mathbf{c} \qquad (69)

where $\bm{\mu}^{t}$ is the mean at the $t$-th iteration. Then, the mean of the target point $\bm{\mu}^{t+1}$ can be updated as

\bm{\mu}^{t+1}=\alpha\bm{\mu}^{temp}+(1-\alpha)\bm{\mu}^{t}. \qquad (70)

When the algorithm converges, the mean of the target point $\bm{\mu}^{t}$ is output. The IC-SIGA for channel estimation is summarized in Algorithm 2.

Input: The received signal $\mathbf{y}$, the a priori covariance $\mathbf{D}$ of $\mathbf{h}$ and the maximal iteration number $T$.
Output: The approximated posterior mean $\bm{\mu}^{t}$ of the beam domain channel $\mathbf{h}$.
1 Initialization: $t=0$, $\bm{\mu}^{t}=\mathbf{0}$; calculate $\mathbf{A}^{H}\mathbf{y}$ and $\mathbf{I}\odot\mathbf{A}^{H}\mathbf{A}$.
2 while $t\leq T$ do
3       Calculate the $m$-projections of the $N$ auxiliary points onto the target manifold and the corresponding mean $\bm{\mu}^{temp}$ as
\bm{\mu}^{temp} = \frac{1}{\sigma_{z}^{2}}\left(\mathbf{A}^{H}\mathbf{y}-\mathbf{A}^{H}\mathbf{A}\bm{\mu}^{t}+(\mathbf{I}\odot\mathbf{A}^{H}\mathbf{A})\bm{\mu}^{t}\right)./\mathbf{c}.
4       Update the mean of the target point as
\bm{\mu}^{t+1}=\alpha\bm{\mu}^{temp}+(1-\alpha)\bm{\mu}^{t}.
5       $t=t+1$.
6 end while
7 When the algorithm converges or $t>T$, output the posterior mean $\bm{\mu}^{t}$ of $\mathbf{h}$.
Algorithm 2 IC-SIGA for channel estimation
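Algorithm 2 amounts to a damped Jacobi iteration for the MMSE normal equations $(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1})\bm{\mu}=\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}$, which is consistent with Theorem 4 below. A compact sketch (names and the random test problem are illustrative):

```python
import numpy as np

def ic_siga(y, A, d, sz2, T=2000, alpha=0.5):
    """IC-SIGA sketch: iterate (69)-(70) and return the approximate posterior mean."""
    N = A.shape[1]
    AHy = A.conj().T @ y
    G = A.conj().T @ A
    gdiag = np.real(np.diag(G))              # a_n^H a_n
    c = gdiag / sz2 + 1.0 / d                # c_n
    mu = np.zeros(N, dtype=complex)
    for _ in range(T):
        # (69): interference of the other elements is cancelled via -G mu + diag(G) mu.
        mu_temp = (AHy - G @ mu + gdiag * mu) / (sz2 * c)
        mu = alpha * mu_temp + (1 - alpha) * mu   # damping (70)
    return mu

# Illustrative random test: the fixed point should match the MMSE mean.
rng = np.random.default_rng(4)
M, N, sz2 = 64, 8, 0.1
A = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) / np.sqrt(2)
d = rng.uniform(0.5, 2.0, N)
h = np.sqrt(d / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))
y = A @ h + np.sqrt(sz2 / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))

mu = ic_siga(y, A, d, sz2)
mmse = np.linalg.solve(A.conj().T @ A / sz2 + np.diag(1.0 / d), A.conj().T @ y / sz2)
assert np.linalg.norm(mu - mmse) / np.linalg.norm(mmse) < 1e-5
```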

For the equilibrium, the following theorem can be obtained using methods similar to those for IC-IGA.

Theorem 4.

At the equilibrium of IC-SIGA, the mean $\bm{\mu}^{\star}$ is equal to the mean $\bm{\mu}_{\text{MMSE}}$ of the posterior distribution $p(\mathbf{h}|\mathbf{y})$.

Proof.

The proof is provided in Appendix D. ∎

V-B System Configuration and Channel Model of Massive MIMO

In this subsection, we consider a 3D massive MIMO system equipped with a UPA. The system configuration and the BSCM are introduced to show the properties of the channel.

Consider a massive MIMO-OFDM system working in time division duplexing (TDD) mode. The antenna array at the BS is a UPA with $M_{r}=M_{z}M_{x}$ antennas, where $M_{z}$ and $M_{x}$ are the numbers of vertical and horizontal antennas, respectively. All users are equipped with a single antenna. The carrier frequency is $f_{c}$, and the wavelength is $\lambda_{c}$. The vertical and horizontal antenna spacings $d_{z}$ and $d_{x}$ are both set to half a wavelength. The number of OFDM subcarriers is $N_{c}$, of which $M_{p}$ training subcarriers are used to transmit the uplink pilot signal. Let the set of training subcarrier indexes be defined as $\mathcal{L}=\{\ell_{0},\ell_{1},\cdots,\ell_{M_{p}-1}\}$. Let $M_{g}$ and $T_{s}$ be the length of the cyclic prefix (CP) and the sampling interval, respectively. The subcarrier spacing is $\Delta f=\frac{1}{N_{c}T_{s}}$ and the transmission bandwidth is $B=M_{p}\Delta f$.

The directional cosines along the $z$ and $x$ axes are defined as $u_{r}=\sin\theta_{r}$ and $v_{r}=\cos\theta_{r}\sin\phi_{r}$, where $\theta_{r}$ and $\phi_{r}$ are the polar and azimuthal angles of arrival (AoA) at the BS. The space steering vector at the BS side is given by [7]

\mathbf{v}(u_{r},v_{r})=\mathbf{v}_{z}(u_{r})\otimes\mathbf{v}_{x}(v_{r})\in\mathbb{C}^{M_{r}\times 1} \qquad (71)

where

\mathbf{v}_{z}(u)=[1~~e^{-j2\pi\frac{d_{z}}{\lambda_{c}}u}~~\cdots~~e^{-j2\pi\frac{(M_{z}-1)d_{z}}{\lambda_{c}}u}]^{T} \qquad (72)
\mathbf{v}_{x}(v)=[1~~e^{-j2\pi\frac{d_{x}}{\lambda_{c}}v}~~\cdots~~e^{-j2\pi\frac{(M_{x}-1)d_{x}}{\lambda_{c}}v}]^{T}. \qquad (73)

Let $u_{i}$ and $v_{j}$ be the sampled directional cosines, defined as

u_{i} = \frac{2(i-1)-N_{z}}{N_{z}},~~i\in\mathbb{Z}_{N_{z}}^{+} \qquad (74)
v_{j} = \frac{2(j-1)-N_{x}}{N_{x}},~~j\in\mathbb{Z}_{N_{x}}^{+}. \qquad (75)

Based on the space steering vector, the matrix of sampled space steering vectors is defined by

\mathbf{V}=\mathbf{V}_{z}\otimes\mathbf{V}_{x}\in\mathbb{C}^{M_{r}\times N_{r}} \qquad (76)

where

\mathbf{V}_{z} = [\mathbf{v}_{z}(u_{1})~~\mathbf{v}_{z}(u_{2})~~\cdots~~\mathbf{v}_{z}(u_{N_{z}})] \qquad (77a)
\mathbf{V}_{x} = [\mathbf{v}_{x}(v_{1})~~\mathbf{v}_{x}(v_{2})~~\cdots~~\mathbf{v}_{x}(v_{N_{x}})]. \qquad (77b)

The symbols $N_{z}=F_{z}M_{z}$ and $N_{x}=F_{x}M_{x}$ are the numbers of sampled vertical and horizontal cosines, respectively, where $F_{z}$ and $F_{x}$ are fine factors. The frequency steering vector is defined similarly as

𝐮(τ)=[1ej2πΔfτej2π(Mp1)Δfτ]T.\displaystyle\mathbf{u}(\tau)=\left[1~{}~{}e^{-j2\pi\Delta f\tau}~{}~{}\cdots~{}~{}e^{-j2\pi(M_{p}-1)\Delta f\tau}\right]^{T}. (78)

The matrix of sampled frequency steering vectors is then given by

𝐔=[𝐮(τ1)𝐮(τ2)𝐮(τNf)]Mp×Nf\displaystyle\mathbf{U}=[\mathbf{u}(\tau_{1})~{}~{}\mathbf{u}(\tau_{2})~{}~{}\cdots~{}~{}\mathbf{u}(\tau_{N_{f}})]\in\mathbb{C}^{M_{p}\times N_{f}} (79)

where Np=FpMpN_{p}=F_{p}M_{p} is the number of sampled delays, FpF_{p} is the fine factor, Nf=NpMgNcN_{f}=\left\lceil\frac{N_{p}M_{g}}{N_{c}}\right\rceil, and τr\tau_{r} are sampled delays, given by

τr\displaystyle\tau_{r} =r1NpΔf,rNf+.\displaystyle=\frac{r-1}{N_{p}\Delta f},~{}~{}r\in\mathbb{Z}_{N_{f}}^{+}. (80)

Finally, the space-frequency domain channel matrix of user kk can be expressed as [11, 7]

𝐆k=𝐕𝐇k𝐔TMr×Mp\displaystyle\mathbf{G}_{k}=\mathbf{V}\mathbf{H}_{k}\mathbf{U}^{T}\in\mathbb{C}^{M_{r}\times M_{p}} (81)

where 𝐇kNr×Nf\mathbf{H}_{k}\in\mathbb{C}^{N_{r}\times N_{f}} is the beam domain channel matrix. This model is called the BSCM. The beam domain channel matrix 𝐇k\mathbf{H}_{k} exhibits improved sparsity compared with that of the traditional beam domain stochastic channel model based on discrete Fourier transform (DFT) matrices [33]. The elements of 𝐇k\mathbf{H}_{k} are assumed to be independent and follow complex Gaussian distributions with zero means and different variances. The beam domain channel power matrix is defined as 𝛀k\bm{\Omega}_{k}, where [𝛀k]ij[\bm{\Omega}_{k}]_{ij} is the variance of [𝐇k]ij[\mathbf{H}_{k}]_{ij}. This matrix constitutes the statistical CSI and remains constant over a much longer period than the instantaneous CSI.
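To make the statistical model concrete, the following sketch draws a beam domain channel with independent zero-mean complex Gaussian entries whose variances are given by the power matrix and maps it to the space-frequency domain via (81); the small random V and U below are only toy stand-ins for the steering matrices, not their actual construction:

```python
import numpy as np

def sample_bscm_channel(Omega, V, U, rng):
    # draw H with independent CN(0, Omega[i, j]) entries, then G = V H U^T (eq. (81))
    Nr, Nf = Omega.shape
    H = np.sqrt(Omega / 2) * (rng.standard_normal((Nr, Nf))
                              + 1j * rng.standard_normal((Nr, Nf)))
    return V @ H @ U.T, H

rng = np.random.default_rng(0)
Mr, Mp, Nr, Nf = 4, 5, 6, 3                       # toy sizes
V = rng.standard_normal((Mr, Nr)) + 1j * rng.standard_normal((Mr, Nr))
U = rng.standard_normal((Mp, Nf)) + 1j * rng.standard_normal((Mp, Nf))
Omega = rng.random((Nr, Nf))                      # beam domain power matrix
G, H = sample_bscm_channel(Omega, V, U, rng)
```

Averaging the squared magnitudes of many such draws of H recovers Omega, which is the sense in which Omega is statistical CSI.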

V-C Receive Model with BSCM and ZC Sequences

In this subsection, we describe a specific receive model in practical massive MIMO systems, which shows that the multiplication of 𝐀\mathbf{A} with a vector can be realized by the FFT.

The objective of channel estimation is to obtain the a posteriori information of the space-frequency channel 𝐆k\mathbf{G}_{k}, which can be calculated from the beam domain channel 𝐇k\mathbf{H}_{k} with the deterministic matrices 𝐔\mathbf{U} and 𝐕\mathbf{V}. Thus, we focus on the estimation of 𝐇k\mathbf{H}_{k}. We adopt the pilot signal sequence in [34] as

𝐱k=𝐱~q𝐮(τ(p1)Nf+1)\displaystyle\mathbf{x}_{k}=\tilde{\mathbf{x}}_{q}\odot\mathbf{u}(\tau_{(p-1)N_{f}+1}) (82)

where 𝐱~qMp×1\tilde{\mathbf{x}}_{q}\in\mathbb{C}^{M_{p}\times 1} is the Zadoff-Chu (ZC) sequence defined as

[𝐱~q]l=ejπ(q1)l(l1)Nl,l=1,,Mp\displaystyle[\tilde{\mathbf{x}}_{q}]_{l}=e^{-j\frac{\pi(q-1)l(l-1)}{N_{l}}},\ l=1,\cdots,M_{p} (83)

where NlN_{l} is the largest prime number satisfying Nl<MpN_{l}<M_{p}, q=(k1)/P+1q=\lfloor(k-1)/P\rfloor+1 and p=((k1)modP)+1p=\left((k-1)\bmod P\right)+1 denote the root coefficient and cyclic shift of user kk, respectively, Q=K/PQ=\lceil K/P\rceil is the number of roots, PP is the number of UEs sharing each root, and τ(p1)Nf+1\tau_{(p-1)N_{f}+1} is the sampled delay defined in (80).
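A sketch of this pilot construction in numpy; the prime search and helper names are ours, and the root/shift mapping follows the formulas above:

```python
import numpy as np

def largest_prime_below(n):
    # largest prime N_l with N_l < n, by simple trial division
    for m in range(n - 1, 1, -1):
        if all(m % d for d in range(2, int(m ** 0.5) + 1)):
            return m
    raise ValueError("no prime below n")

def zc_sequence(q, Mp):
    # ZC-type pilot of length Mp with root coefficient q - 1, as in eq. (83)
    Nl = largest_prime_below(Mp)
    l = np.arange(1, Mp + 1)
    return np.exp(-1j * np.pi * (q - 1) * l * (l - 1) / Nl)

def pilot_indices(k, P):
    # root index q and cyclic-shift index p of user k (1-based)
    return (k - 1) // P + 1, (k - 1) % P + 1
```

The constant-modulus property of the ZC sequence is what makes the diagonal pilot matrix well conditioned.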

Let 𝐗k=diag(𝐱k)Mp×Mp\mathbf{X}_{k}=\text{diag}(\mathbf{x}_{k})\in\mathbb{C}^{M_{p}\times M_{p}} be the pilot matrix. The received pilot signal matrix 𝐘tMr×Mp\mathbf{Y}_{t}\in\mathbb{C}^{M_{r}\times M_{p}} at the tt-th OFDM symbol is expressed as

𝐘t=k=1K𝐆k,t𝐗k+𝐙t\displaystyle\mathbf{Y}_{t}=\sum\limits_{k=1}^{K}\mathbf{G}_{k,t}\mathbf{X}_{k}+\mathbf{Z}_{t} (84)

where the noise matrix 𝐙t\mathbf{Z}_{t} consists of i.i.d. complex Gaussian elements with zero mean and variance σz2\sigma_{z}^{2}.

For convenience, the subscript tt of the OFDM symbol is omitted hereafter. Substituting (81) and (82) into (84), we have

𝐘\displaystyle\mathbf{Y} =q=1Q𝐕(p=1P𝐇q,p𝐔Tdiag(𝐮(τ(p1)Nf+1)))𝐗~q+𝐙\displaystyle=\sum\limits_{q=1}^{Q}\mathbf{V}\left(\sum\limits_{p=1}^{P}\mathbf{H}_{q,p}\mathbf{U}^{T}\text{diag}\left(\mathbf{u}(\tau_{(p-1)N_{f}+1})\right)\right)\tilde{\mathbf{X}}_{q}+\mathbf{Z} (85)

where 𝐗~q=diag(𝐱~q)\tilde{\mathbf{X}}_{q}=\text{diag}\left(\tilde{\mathbf{x}}_{q}\right). Let 𝐅Np\mathbf{F}_{N_{p}} be an NpN_{p}-dimensional DFT matrix and define the partial DFT matrix 𝐔F=[𝐮(τ1),,𝐮(τNp)]Mp×Np\mathbf{U}_{F}=[\mathbf{u}(\tau_{1}),\cdots,\mathbf{u}(\tau_{N_{p}})]\in\mathbb{C}^{M_{p}\times N_{p}}. We then have 𝐔=𝐔F𝐈Np,Nf\mathbf{U}=\mathbf{U}_{F}\mathbf{I}_{N_{p},N_{f}} and 𝐔F=𝐈Mp,Np𝐅Np\mathbf{U}_{F}=\mathbf{I}_{M_{p},N_{p}}\mathbf{F}_{N_{p}}.

In (85), 𝐔Tdiag(𝐮(τ(p1)Nf+1))\mathbf{U}^{T}\text{diag}\left(\mathbf{u}(\tau_{(p-1)N_{f}+1})\right) can be calculated as

𝐔Tdiag(𝐮(τ(p1)Nf+1))\displaystyle\mathbf{U}^{T}\text{diag}\left(\mathbf{u}(\tau_{(p-1)N_{f}+1})\right) =𝐈Nf,Np𝚷Np(p1)Nf𝐔FT\displaystyle=\mathbf{I}_{N_{f},N_{p}}\bm{\Pi}_{N_{p}}^{(p-1)N_{f}}\mathbf{U}_{F}^{T} (86)

where

𝚷Nn=(𝟎𝐈Nn𝐈n𝟎)\displaystyle\bm{\Pi}_{N}^{n}=\left(\begin{array}[]{cc}\mathbf{0}&\mathbf{I}_{N-n}\\ \mathbf{I}_{n}&\mathbf{0}\end{array}\right) (89)

is the permutation matrix. Then, (85) can be reexpressed as

𝐘\displaystyle\mathbf{Y} =q=1Q𝐕(p=1P𝐇q,p𝐈Nf,Np𝚷Np(p1)Nf)𝐔FT𝐗~q+𝐙.\displaystyle=\sum\limits_{q=1}^{Q}\mathbf{V}(\sum\limits_{p=1}^{P}\mathbf{H}_{q,p}\mathbf{I}_{N_{f},N_{p}}\bm{\Pi}_{N_{p}}^{(p-1)N_{f}})\mathbf{U}_{F}^{T}\tilde{\mathbf{X}}_{q}+\mathbf{Z}. (90)
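The permutation matrix in (89) and the identity (86) can be checked numerically. Below is a small sketch with toy sizes standing in for MpM_{p}, NpN_{p}, and NfN_{f}, using Δfτr=(r1)/Np\Delta f\tau_{r}=(r-1)/N_{p} from (80):

```python
import numpy as np

def perm_matrix(N, n):
    # Pi_N^n = [[0, I_{N-n}], [I_n, 0]], eq. (89); acts on a vector as np.roll(x, -n)
    P = np.zeros((N, N))
    P[:N - n, n:] = np.eye(N - n)
    P[N - n:, :n] = np.eye(n)
    return P

Mp, Np, Nf, p = 5, 8, 2, 3                        # toy sizes, p within floor(Np/Nf)
m = np.arange(Mp)
UF = np.exp(-2j * np.pi * np.outer(m, np.arange(Np)) / Np)  # [u(tau_1) ... u(tau_Np)]
U = UF[:, :Nf]                                    # U = U_F I_{Np,Nf}
n = (p - 1) * Nf
lhs = U.T @ np.diag(np.exp(-2j * np.pi * m * n / Np))       # U^T diag(u(tau_{n+1}))
rhs = (perm_matrix(Np, n) @ UF.T)[:Nf]            # I_{Nf,Np} Pi_{Np}^n U_F^T
```

The check works because multiplying by diag(u(τn+1))\text{diag}(\mathbf{u}(\tau_{n+1})) shifts the delay index of each row of 𝐔T\mathbf{U}^{T} cyclically by nn.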

When PNp/NfP\leq\lfloor N_{p}/N_{f}\rfloor, mutual aliasing among the UEs sharing the same root is avoided, and we define

𝐇~q\displaystyle\tilde{\mathbf{H}}_{q} =[𝐇q,1,,𝐇q,P,𝟎Nr,NpPNf]Nr×Np.\displaystyle=\left[\mathbf{H}_{q,1},\cdots,\mathbf{H}_{q,P},\mathbf{0}_{N_{r},N_{p}-PN_{f}}\right]\in\mathbb{C}^{N_{r}\times N_{p}}. (91)

Finally, the uplink received signal model is given as

𝐘=𝐕𝐇𝐏+𝐙\displaystyle\mathbf{Y}=\mathbf{V}\mathbf{H}\mathbf{P}+\mathbf{Z} (92)

where

𝐇\displaystyle\mathbf{H} =[𝐇~1𝐇~2𝐇~Q]\displaystyle=[\tilde{\mathbf{H}}_{1}~{}~{}\tilde{\mathbf{H}}_{2}~{}~{}\cdots~{}~{}\tilde{\mathbf{H}}_{Q}] (93)
𝐏\displaystyle\mathbf{P} =[𝐗~1𝐔F𝐗~2𝐔F𝐗~Q𝐔F]T.\displaystyle=[\tilde{\mathbf{X}}_{1}\mathbf{U}_{F}~{}~{}\tilde{\mathbf{X}}_{2}\mathbf{U}_{F}~{}~{}\cdots~{}~{}\tilde{\mathbf{X}}_{Q}\mathbf{U}_{F}]^{T}. (94)

By vectorizing (92), the received signal model in vector form is

𝐲=𝐀~𝐡~+𝐳\displaystyle\mathbf{y}=\tilde{\mathbf{A}}\tilde{\mathbf{h}}+\mathbf{z} (95)

where 𝐲=vec(𝐘)M×1\mathbf{y}=\text{vec}(\mathbf{Y})\in\mathbb{C}^{M\times 1}, 𝐀~=𝐏T𝐕M×N~\tilde{\mathbf{A}}=\mathbf{P}^{T}\otimes\mathbf{V}\in\mathbb{C}^{M\times\tilde{N}}, 𝐡~=vec(𝐇)N~×1\tilde{\mathbf{h}}=\text{vec}(\mathbf{H})\in\mathbb{C}^{\tilde{N}\times 1}, 𝐳=vec(𝐙)𝒞𝒩(𝟎,σz2𝐈)\mathbf{z}=\text{vec}(\mathbf{Z})\sim\mathcal{CN}(\mathbf{0},\sigma_{z}^{2}\mathbf{I}), M=MrMpM=M_{r}M_{p}, and N~=QNpNr\tilde{N}=QN_{p}N_{r}. Since the beam domain channels are sparse, we can obtain a low-dimensional 𝐡=𝐄~𝐡~N×1\mathbf{h}=\tilde{\mathbf{E}}\tilde{\mathbf{h}}\in\mathbb{C}^{N\times 1} from 𝐡~\tilde{\mathbf{h}} by removing the elements with zero variance, where 𝐄~\tilde{\mathbf{E}} is the extraction matrix. Then, the general received signal model 𝐲=𝐀𝐡+𝐳\mathbf{y}={\mathbf{A}}{\mathbf{h}}+\mathbf{z} in (13) can be obtained, where 𝐀M×N\mathbf{A}\in\mathbb{C}^{M\times N} is the matrix obtained by removing the corresponding columns of 𝐀~\tilde{\mathbf{A}}, and 𝐃\mathbf{D} can be obtained from 𝛀k\bm{\Omega}_{k} of all users.
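The equivalence between the matrix model (92) and the vectorized model (95) rests on the identity vec(𝐕𝐇𝐏)=(𝐏T𝐕)vec(𝐇)\text{vec}(\mathbf{V}\mathbf{H}\mathbf{P})=(\mathbf{P}^{T}\otimes\mathbf{V})\text{vec}(\mathbf{H}) with a column-major vec; a toy-size numerical check (all dimensions are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
vec = lambda M: M.flatten(order="F")              # column-major vec(.)
cplx = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)

Mr, Nr, Npc, Mp = 3, 4, 5, 6                      # toy stand-ins for the dimensions
V, H, P, Z = cplx(Mr, Nr), cplx(Nr, Npc), cplx(Npc, Mp), cplx(Mr, Mp)

Y = V @ H @ P + Z                                 # matrix model, eq. (92)
y = np.kron(P.T, V) @ vec(H) + vec(Z)             # vector model, eq. (95)
```

Note that the identity requires the column-major (Fortran-order) flattening, which is not numpy's default.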

V-D Complexity Analysis

The time complexity of the IC-SIGA algorithm is dominated by the multiplications 𝐀H𝐛\mathbf{A}^{H}\mathbf{b} and 𝐀𝐬\mathbf{A}\mathbf{s}, where 𝐬\mathbf{s} and 𝐛\mathbf{b} are two arbitrary vectors. In the following, we analyze the fast implementation and complexity of the operation 𝐀H𝐛\mathbf{A}^{H}\mathbf{b}. The computation of 𝐀𝐬\mathbf{A}\mathbf{s} can be implemented similarly.

The computation of 𝐀H𝐛\mathbf{A}^{H}\mathbf{b} can be written as 𝐀H𝐛=𝐀~H𝐛~\mathbf{A}^{H}\mathbf{b}=\tilde{\mathbf{A}}^{H}\tilde{\mathbf{b}}. From 𝐀~=𝐏T𝐕\tilde{\mathbf{A}}=\mathbf{P}^{T}\otimes\mathbf{V}, the vector 𝐀~H𝐛~\tilde{\mathbf{A}}^{H}\tilde{\mathbf{b}} can be transformed into matrix form as 𝐀~H𝐛~=vec(𝐕H𝐁𝐏H)\tilde{\mathbf{A}}^{H}\tilde{\mathbf{b}}=\text{vec}(\mathbf{V}^{H}\mathbf{B}\mathbf{P}^{H}), where 𝐁Mr×Mp\mathbf{B}\in\mathbb{C}^{M_{r}\times M_{p}}, vec(𝐁)=𝐛~\text{vec}(\mathbf{B})=\tilde{\mathbf{b}}. Substituting the expression (94) for 𝐏\mathbf{P} yields 𝐁𝐏H=𝐁(𝐗~1𝐈Mp,Np𝐅Np,,𝐗~Q𝐈Mp,Np𝐅Np)\mathbf{B}\mathbf{P}^{H}=\mathbf{B}(\tilde{\mathbf{X}}_{1}^{*}\mathbf{I}_{M_{p},N_{p}}\mathbf{F}_{N_{p}}^{*},\cdots,\tilde{\mathbf{X}}_{Q}^{*}\mathbf{I}_{M_{p},N_{p}}\mathbf{F}_{N_{p}}^{*}). Define 𝐁~q=[𝐁𝐗~q,𝟎Mr,NpMp]Mr×Np\tilde{\mathbf{B}}_{q}=[\mathbf{B}\tilde{\mathbf{X}}_{q}^{*},\mathbf{0}_{M_{r},N_{p}-M_{p}}]\in\mathbb{C}^{M_{r}\times N_{p}}; then 𝐁𝐗~q𝐈Mp,Np𝐅Np=𝐁~q𝐅Np\mathbf{B}\tilde{\mathbf{X}}_{q}^{*}\mathbf{I}_{M_{p},N_{p}}\mathbf{F}_{N_{p}}^{*}=\tilde{\mathbf{B}}_{q}\mathbf{F}_{N_{p}}^{*}, where 𝐁~q𝐅Np\tilde{\mathbf{B}}_{q}\mathbf{F}_{N_{p}}^{*} can be realized by FFT. Since 𝐗~q\tilde{\mathbf{X}}_{q}^{*} is a diagonal matrix, the complexity of 𝐁𝐗~q\mathbf{B}\tilde{\mathbf{X}}_{q}^{*} is negligible. Thus, the time complexity of 𝐁𝐏H\mathbf{B}\mathbf{P}^{H} is 𝒪(QMrNplog2Np)\mathcal{O}(QM_{r}N_{p}\log_{2}N_{p}).
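The row-wise conjugate DFT 𝐁~q𝐅Np\tilde{\mathbf{B}}_{q}\mathbf{F}_{N_{p}}^{*} equals NpN_{p} times the IFFT applied along the rows, which is how the FFT realization works in practice; a numerical sketch with toy sizes, where an arbitrary unit-modulus vector stands in for the ZC pilot:

```python
import numpy as np

rng = np.random.default_rng(2)
Mr, Mp, Np = 3, 5, 8                              # toy sizes with Np >= Mp

B = rng.standard_normal((Mr, Mp)) + 1j * rng.standard_normal((Mr, Mp))
xq = np.exp(-1j * np.pi * rng.random(Mp))         # unit-modulus stand-in for x~_q

# explicit computation of B X~_q^* I_{Mp,Np} F_{Np}^*
F = np.exp(-2j * np.pi * np.outer(np.arange(Np), np.arange(Np)) / Np)
pad = np.hstack([np.eye(Mp), np.zeros((Mp, Np - Mp))])      # I_{Mp,Np}
explicit = (B * xq.conj()) @ pad @ F.conj()

# FFT realization: zero-pad B X~_q^* to Mr x Np, then B~_q F_{Np}^* = Np * ifft(B~_q)
B_pad = np.hstack([B * xq.conj(), np.zeros((Mr, Np - Mp))])
fast = Np * np.fft.ifft(B_pad, axis=1)
```

The factor NpN_{p} compensates for numpy's 1/N1/N normalization of the inverse FFT.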

From (76) and (77), we have 𝐕=(𝐈Mz,Nz𝐈Mx,Nx)(𝐅Nz𝐅Nx).\mathbf{V}=(\mathbf{I}_{M_{z},N_{z}}\otimes\mathbf{I}_{M_{x},N_{x}})(\mathbf{F}_{N_{z}}\otimes\mathbf{F}_{N_{x}}). Then, 𝐕H(𝐁𝐏H)\mathbf{V}^{H}(\mathbf{B}\mathbf{P}^{H}) can be realized by FFT with a time complexity of 𝒪(QNpNrlog2Nr)\mathcal{O}(QN_{p}N_{r}\log_{2}N_{r}). In summary, the time complexity of IC-SIGA is 𝒪(TQ(MrNplog2Np+NpNrlog2Nr))\mathcal{O}\left(TQ(M_{r}N_{p}\log_{2}N_{p}+N_{p}N_{r}\log_{2}N_{r})\right), where TT is the iteration number.

Next, we analyze the space complexity. In IC-SIGA, the free variables are the NN-dimensional vectors 𝝁temp\bm{\mu}^{temp} and 𝝁t\bm{\mu}^{t}, so the number of free variables is 2N2N. Therefore, the space complexity of IC-SIGA is 𝒪(N)\mathcal{O}(N).

VI Simulation Results

TABLE I: Parameter Setting of the QuaDRiGa
Parameter Value
Number of BS antennas Mr=Mz×MxM_{r}=M_{z}\times M_{x} 128 == 8 ×\times 16
UT number KK 12, 24
Center frequency fcf_{c} 4.8 GHz
Number of subcarriers NcN_{c} 2048
Subcarrier spacing Δf\Delta f 30 kHz
Number of training subcarriers MpM_{p} 120
CP length MgM_{g} 144
Figure 3: The layout of the massive MIMO-OFDM system.

In this section, we provide simulation results to show the performance of the proposed IC-IGA and IC-SIGA for MIMO-OFDM channel estimation. We use the widely used QuaDRiGa channel model, with the simulation scenario set to “3GPP_\_38.901_\_UMa_\_NLOS”. The main simulation parameters are summarized in Table I. The layout of the massive MIMO-OFDM system is plotted in Fig. 3, where the BS is located at (0,0,25)(0,0,25), and the users are randomly generated in a 120120^{\circ} sector with radius r=500r=500 m around (0,0,0)(0,0,0) at a height of 1.51.5 m. The channel is normalized as 𝔼{𝐆kF2}=MrMp\mathbb{E}\{\|\mathbf{G}_{k}\|_{F}^{2}\}=M_{r}M_{p}. The fine factors are set as Fz=Fx=Fp=2F_{z}=F_{x}=F_{p}=2. The SNR is defined as SNR =1/σz2=1/\sigma_{z}^{2}. We use the algorithm proposed in [7] to obtain the beam domain channel power matrices 𝛀k,k\bm{\Omega}_{k},\forall k. The normalized mean-squared error (NMSE) is used as the performance metric for channel estimation and is defined as

NMSE=1KNsamk=1Kn=1Nsam𝐆¯k,n𝐆k,nF2𝐆k,nF2\displaystyle\text{NMSE}=\frac{1}{KN_{sam}}\sum\limits_{k=1}^{K}\sum\limits_{n=1}^{N_{sam}}\frac{\|\bar{\mathbf{G}}_{k,n}-\mathbf{G}_{k,n}\|_{F}^{2}}{\|\mathbf{G}_{k,n}\|_{F}^{2}} (96)

where NsamN_{sam} is the number of received pilot signals, 𝐆k,n\mathbf{G}_{k,n} is the exact space-frequency domain channel matrix, and 𝐆¯k,n\bar{\mathbf{G}}_{k,n} is the estimated space-frequency domain channel matrix. We set Nsam=200N_{sam}=200 in our simulations.
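For reference, the NMSE metric (96) is straightforward to implement; a minimal sketch, where the array layout (users and samples as leading axes) is our own choice:

```python
import numpy as np

def nmse(G_hat, G):
    # eq. (96): per-sample normalized squared error, averaged over users and samples;
    # G_hat and G have shape (K, Nsam, Mr, Mp)
    err = np.sum(np.abs(G_hat - G) ** 2, axis=(-2, -1))
    pwr = np.sum(np.abs(G) ** 2, axis=(-2, -1))
    return np.mean(err / pwr)
```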

TABLE II: Complexities of algorithms
Algorithm Time Complexity Space Complexity
IC-IGA 𝒪(TN2)\mathcal{O}(TN^{2}) 𝒪(N)\mathcal{O}(N)
IC-SIGA 𝒪(TQ(MrNplog2Np+NpNrlog2Nr))\mathcal{O}\left(TQ(M_{r}N_{p}\log_{2}N_{p}+N_{p}N_{r}\log_{2}N_{r})\right) 𝒪(N)\mathcal{O}(N)
IGA 𝒪(TMN)\mathcal{O}(TMN) 𝒪(MN)\mathcal{O}(MN)
MMSE 𝒪(N3+N2M)\mathcal{O}(N^{3}+N^{2}M) 𝒪(N3)\mathcal{O}(N^{3})
Figure 4: Time Complexity.
Figure 5: Space Complexity.

First, we compare the complexities of the different algorithms. The complexities of the IC-IGA, IC-SIGA, IGA, and MMSE algorithms are summarized in Table II. The time and space complexities of the four algorithms are presented in Figs. 4 and 5 for comparison, where the number of iterations is T=100T=100, the number of pilot roots is Q=K/PQ=\lceil K/P\rceil, and the simulation parameters are configured as in Table I. From the figures, it can be found that the time complexities of these algorithms are ordered as IC-SIGA<<IC-IGA<<IGA<<MMSE, and the space complexities are ordered as IC-SIGA==IC-IGA<<IGA<<MMSE. Note that the time complexity of IC-SIGA depends on the number of users only through the number of pilot roots QQ. The time complexity of IC-IGA is much lower than that of the MMSE algorithm and also lower than that of the IGA, whereas the space complexity of IC-IGA is much lower than those of the IGA and MMSE algorithms. The time complexity of IC-SIGA is much lower than those of all the other algorithms, and its space complexity is much lower than those of the IGA and MMSE algorithms.

Figure 6: NMSE performance of IC-IGA and IC-SIGA compared with MMSE

Fig. 6 shows the NMSE performance of IC-IGA and IC-SIGA channel estimation compared with IGA and MMSE. The user number is set to K=12K=12 or K=24K=24. The iteration numbers of IC-IGA and IC-SIGA are set to 100. The damping coefficients of IC-IGA and IC-SIGA are αIC-IGA=0.45\alpha_{\text{IC-IGA}}=0.45 and αIC-SIGA=0.25\alpha_{\text{IC-SIGA}}=0.25, respectively. From the figure, it can be seen that both IC-IGA and IC-SIGA achieve almost the same performance as the MMSE estimation at all SNRs for both user numbers. In addition, the estimation with orthogonal pilots (K=12K=12) is more accurate, because non-orthogonal pilots introduce interference from the pilots of users with other roots.

Figure 7: Convergence performance of IC-IGA and IC-SIGA at SNR=10dB

Fig. 7 plots the convergence performance of the IC-IGA and IC-SIGA for K=24K=24 at SNR=1010 dB. The results of the MMSE estimation and the IGA in [11] are given for comparison. The damping coefficients of IGA, IC-IGA and IC-SIGA are αIGA=0.05\alpha_{\text{IGA}}=0.05, αIC-IGA=0.45\alpha_{\text{IC-IGA}}=0.45 and αIC-SIGA=0.25\alpha_{\text{IC-SIGA}}=0.25, respectively. As shown in the figure, the NMSE performance of IC-IGA and IC-SIGA converges close to that of the MMSE estimation, which is consistent with Theorems 3 and 4. Furthermore, the NMSE of the IC-IGA decreases rapidly at the beginning, while the convergence speeds of the three algorithms, IC-IGA, IC-SIGA, and IGA, become nearly the same after about 1010 iterations.

Figure 8: Convergence performance of IC-IGA
Figure 9: Convergence performance of IC-SIGA

Fig. 8 and Fig. 9 show the convergence performance curves of IC-IGA and IC-SIGA for the orthogonal pilot case (K=12K=12) and the non-orthogonal pilot case (K=24K=24) at different SNRs. We consider a low-SNR scenario with SNR=10dB\text{SNR}=-10\text{dB} and a high-SNR scenario with SNR=30dB\text{SNR}=30\text{dB}. From the figures, it can be seen that IC-IGA and IC-SIGA approach convergence within about 1010 iterations in the low-SNR scenario and within about 6060 iterations in the high-SNR scenario.

VII Conclusion

In this paper, manifolds of complex Gaussian distributions are described from the perspective of information geometry theory, and a unified information geometry framework for channel estimation is presented. To obtain an interference cancellation style algorithm, a modified MMSE form that has the same mean as the original MMSE estimator is constructed. Based on the unified framework and the modified form, the IC-IGA is then proposed for massive MIMO. The form of IC-IGA is simpler than that of IGA. Then, IC-SIGA is proposed to further reduce the complexity. The equilibria and complexities of the proposed algorithms are analyzed. Simulation results show that the proposed methods obtain similar performance to the IGA algorithm with fewer iterations and lower complexity.

Appendix A Proof of Theorem 1

It is easy to verify that 𝐈+𝐓𝚼\mathbf{I}+\mathbf{T}\bm{\Upsilon} is invertible. Multiplying by 𝐈+𝐓𝚼\mathbf{I}+\mathbf{T}\bm{\Upsilon} both inside and outside the matrix inversion in the MMSE estimator yields

(σz2𝐀H𝐀+𝐃1)1(σz2𝐀H𝐲)\displaystyle\quad\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}\right)^{-1}\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right)
=((𝐈+𝐓𝚼)(σz2𝐀H𝐀+𝐃1))1(𝐈+𝐓𝚼)(σz2𝐀H𝐲)\displaystyle=\left((\mathbf{I}+\mathbf{T}\bm{\Upsilon})(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1})\right)^{-1}(\mathbf{I}+\mathbf{T}\bm{\Upsilon})\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right)
=(σz2𝐀H𝐀+𝐃1+𝐓𝚼σz2𝐀H𝐀+𝐓𝚼𝐃1)1(σz2𝐀H𝐲+𝐓𝚼σz2𝐀H𝐲).\displaystyle=\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{T}\bm{\Upsilon}\mathbf{D}^{-1}\right)^{-1}\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right). (97)

By using σz2𝐀H𝐀=𝐓H+𝐈(σz2𝐀H𝐀)\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}=\mathbf{T}^{H}+\mathbf{I}\odot(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}), it follows that

(σz2𝐀H𝐀+𝐃1)1(σz2𝐀H𝐲)\displaystyle\quad\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}\right)^{-1}\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right)
=(σz2𝐀H𝐀+𝐃1+𝐓𝚼𝐓H+𝐓𝚼(𝐈(σz2𝐀H𝐀))+𝐓𝚼𝐃1)1(σz2𝐀H𝐲+𝐓𝚼σz2𝐀H𝐲)\displaystyle=\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}+\mathbf{T}\bm{\Upsilon}(\mathbf{I}\odot(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}))+\mathbf{T}\bm{\Upsilon}\mathbf{D}^{-1}\right)^{-1}\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right)
=(σz2𝐀H𝐀+𝐃1+𝐓𝚼𝐓H+𝐓𝚼𝚼1)1(σz2𝐀H𝐲+𝐓𝚼σz2𝐀H𝐲)\displaystyle=\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}+\mathbf{T}\bm{\Upsilon}\bm{\Upsilon}^{-1}\right)^{-1}\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right)
=(σz2𝐀H𝐀+𝐃1+𝐓+𝐓𝚼𝐓H)1(σz2𝐀H𝐲+𝐓𝚼σz2𝐀H𝐲)\displaystyle=\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}\right)^{-1}\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right) (98)

where the second equality is due to the definition of 𝚼\bm{\Upsilon}. The above result means (42) is equivalent to the MMSE estimator.
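The chain of equalities above can also be verified numerically. In the sketch below, following the derivation, 𝐓\mathbf{T} is taken as the off-diagonal part of σz2𝐀H𝐀\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A} (Hermitian here, so 𝐓=𝐓H\mathbf{T}=\mathbf{T}^{H}) and 𝚼=(𝐈(σz2𝐀H𝐀)+𝐃1)1\bm{\Upsilon}=(\mathbf{I}\odot(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A})+\mathbf{D}^{-1})^{-1}; the toy dimensions and random data are ours:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, sz2 = 8, 5, 0.5                             # toy sizes and noise variance

A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
Dinv = np.diag(1.0 / (rng.random(N) + 0.5))       # D^{-1}, diagonal prior
b = A.conj().T @ (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / sz2

AHA = A.conj().T @ A / sz2                        # sigma_z^{-2} A^H A
T = AHA - np.diag(np.diag(AHA))                   # off-diagonal part, T = T^H
Ups = np.linalg.inv(np.diag(np.diag(AHA)) + Dinv) # Upsilon

mmse = np.linalg.solve(AHA + Dinv, b)             # MMSE mean
mod = np.linalg.solve(AHA + Dinv + T + T @ Ups @ T.conj().T,
                      b + T @ Ups @ b)            # modified form, eq. (98)
```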

Appendix B Proof of Theorem 2

The subscript tt is omitted here for convenience. After moving the nn-th row and nn-th column of (𝚲n+𝐂n)(\bm{\Lambda}_{-n}+\mathbf{C}_{n}) to the first row and first column by left-multiplying by 𝐏1n\mathbf{P}_{1n} and right-multiplying by 𝐏1nH\mathbf{P}_{1n}^{H}, its inverse can be computed by applying the block matrix inversion formula, i.e.,

(𝐏1n(𝚲n+𝐂n)𝐏1nH)1=(cn𝐤¯nH𝐤¯n1cn𝐤¯n𝐤¯nH+𝚲n)1\displaystyle\left(\mathbf{P}_{1n}(\bm{\Lambda}_{-n}+\mathbf{C}_{n})\mathbf{P}_{1n}^{H}\right)^{-1}=\left(\begin{array}[]{cc}c_{n}&\bar{\mathbf{k}}_{n}^{H}\\ \bar{\mathbf{k}}_{n}&\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}\bar{\mathbf{k}}_{n}^{H}+\bm{\Lambda}_{n}\end{array}\right)^{-1} (101)
=(rn11cn𝐤¯nH𝚲n11cn𝚲n1𝐤¯n𝚲n1)\displaystyle=\left(\begin{array}[]{cc}r_{n}^{-1}&-\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\\ -\frac{1}{c_{n}}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}&\bm{\Lambda}_{n}^{-1}\end{array}\right) (104)

where

rn\displaystyle r_{n} =cn𝐤¯nH(1cn𝐤¯n𝐤¯nH+𝚲n)1𝐤¯n.\displaystyle=c_{n}-\bar{\mathbf{k}}_{n}^{H}\left(\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}\bar{\mathbf{k}}_{n}^{H}+\bm{\Lambda}_{n}\right)^{-1}\bar{\mathbf{k}}_{n}. (105)

By using the Sherman-Morrison formula for matrix inversion, we obtain

rn\displaystyle r_{n} =cn𝐤¯nH(𝚲n1𝚲n11cn𝐤¯n𝐤¯nH𝚲n11+1cn𝐤¯nH𝚲n1𝐤¯n)𝐤¯n\displaystyle=c_{n}-\bar{\mathbf{k}}_{n}^{H}\left(\bm{\Lambda}_{n}^{-1}-\frac{\bm{\Lambda}_{n}^{-1}\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}}{1+\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}}\right)\bar{\mathbf{k}}_{n} (106)
=cn(𝐤¯nH𝚲n1𝐤¯n1cn𝐤¯nH𝚲n1𝐤¯n𝐤¯nH𝚲n1𝐤¯n1+1cn𝐤¯nH𝚲n1𝐤¯n)\displaystyle=c_{n}-\left(\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}-\frac{\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}}{1+\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}}\right)
=cn𝐤¯nH𝚲n1𝐤¯n1+1cn𝐤¯nH𝚲n1𝐤¯n\displaystyle=c_{n}-\frac{\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}}{1+\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}}
=cn1+1cn𝐤¯nH𝚲n1𝐤¯n\displaystyle=\frac{c_{n}}{1+\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}}
=cn1+en\displaystyle=\frac{c_{n}}{1+e_{n}}

where en=1cn𝐤¯nH𝚲n1𝐤¯ne_{n}=\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}. Hence, cn=rn(1+en)c_{n}=r_{n}(1+e_{n}).
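This Sherman-Morrison simplification of rnr_{n} can be checked numerically; a small sketch with a toy dimension and a random positive-definite diagonal 𝚲n\bm{\Lambda}_{n}:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
Lam = np.diag(rng.random(n) + 0.5)                # positive diagonal Lambda_n
k = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # bar{k}_n
c = 2.0                                           # c_n

# r_n computed directly from eq. (105)
direct = c - k.conj() @ np.linalg.solve(np.outer(k, k.conj()) / c + Lam, k)
# closed form r_n = c_n / (1 + e_n) with e_n = (1/c_n) k^H Lambda_n^{-1} k
e = (k.conj() @ np.linalg.solve(Lam, k)) / c
closed = c / (1 + e)
```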

Further, the natural parameters of the mm-projection of the auxiliary point to the target manifold are

𝚯n0\displaystyle\bm{\Theta}_{n}^{0} =(𝚺n0)1=(𝐈𝚺n)1\displaystyle=-(\bm{\Sigma}_{n}^{0})^{-1}=-(\mathbf{I}\odot\bm{\Sigma}_{n})^{-1} (109)
=(𝐈(𝚲n+𝐂n)1)1\displaystyle=-(\mathbf{I}\odot(\bm{\Lambda}_{-n}+\mathbf{C}_{n})^{-1})^{-1}
=𝐏1nH(rn𝟎H𝟎𝚲n)𝐏1n.\displaystyle=-\mathbf{P}_{1n}^{H}\left(\begin{array}[]{cc}r_{n}&\mathbf{0}^{H}\\ \mathbf{0}&\bm{\Lambda}_{n}\end{array}\right)\mathbf{P}_{1n}.

Since 𝚯n0=𝚲n0\bm{\Theta}_{n}^{0}=-\bm{\Lambda}_{-n}^{0}, then the belief 𝚵n\bm{\Xi}_{n} is

𝚵n\displaystyle\bm{\Xi}_{n} =𝚲n0𝚲n\displaystyle=\bm{\Lambda}_{-n}^{0}-\bm{\Lambda}_{-n} (114)
=𝐏1nH(rn𝟎H𝟎𝚲n)𝐏1n𝐏1nH(0𝟎H𝟎𝚲n)𝐏1n\displaystyle=\mathbf{P}_{1n}^{H}\left(\begin{array}[]{cc}r_{n}&\mathbf{0}^{H}\\ \mathbf{0}&\bm{\Lambda}_{n}\end{array}\right)\mathbf{P}_{1n}-\mathbf{P}_{1n}^{H}\left(\begin{array}[]{cc}0&\mathbf{0}^{H}\\ \mathbf{0}&\bm{\Lambda}_{n}\end{array}\right)\mathbf{P}_{1n}
=𝐏1nH(rn𝟎H𝟎𝟎)𝐏1n.\displaystyle=\mathbf{P}_{1n}^{H}\left(\begin{array}[]{cc}r_{n}&\mathbf{0}^{H}\\ \mathbf{0}&\mathbf{0}\end{array}\right)\mathbf{P}_{1n}. (117)

The mean 𝝁n0\bm{\mu}_{n}^{0} of mm-projection is

𝝁n0\displaystyle\bm{\mu}_{n}^{0} =𝝁n=(𝚲n+𝐂n)1(𝝀n+𝐛n)\displaystyle=\bm{\mu}_{n}=(\bm{\Lambda}_{-n}+\mathbf{C}_{n})^{-1}(\bm{\lambda}_{-n}+\mathbf{b}_{n}) (122)
=𝐏1nH(rn11cn𝐤¯nH𝚲n11cn𝚲n1𝐤¯n𝚲n1)(1σz2𝐚nH𝐲1σz21cn𝐤¯n𝐚nH𝐲+𝝀n).\displaystyle=\mathbf{P}_{1n}^{H}\left(\begin{array}[]{cc}r_{n}^{-1}&-\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\\ -\frac{1}{c_{n}}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}&\bm{\Lambda}_{n}^{-1}\end{array}\right)\left(\begin{array}[]{c}\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}\\ \frac{1}{\sigma_{z}^{2}}\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}\mathbf{a}_{n}^{H}\mathbf{y}+\bm{\lambda}_{n}\end{array}\right).

From

1cn𝚲n1𝐤¯n1σz2𝐚nH𝐲+𝚲n1(1σz21cn𝐤¯n𝐚nH𝐲+𝝀n)=𝚲n1𝝀n\displaystyle\quad-\frac{1}{c_{n}}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}+\bm{\Lambda}_{n}^{-1}\left(\frac{1}{\sigma_{z}^{2}}\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}\mathbf{a}_{n}^{H}\mathbf{y}+\bm{\lambda}_{n}\right)=\bm{\Lambda}_{n}^{-1}\bm{\lambda}_{n} (123)

we have

𝝁n0\displaystyle\bm{\mu}_{n}^{0} =𝝁n=𝐏1nH(μn𝚲n1𝝀n)\displaystyle=\bm{\mu}_{n}=\mathbf{P}_{1n}^{H}\left(\begin{array}[]{c}\mu_{n}\\ \bm{\Lambda}_{n}^{-1}\bm{\lambda}_{n}\end{array}\right) (126)

where

μn\displaystyle\mu_{n} =rn11σz2𝐚nH𝐲1cn𝐤¯nH𝚲n11σz21cn𝐤¯n𝐚nH𝐲1cn𝐤¯nH𝚲n1𝝀n.\displaystyle=r_{n}^{-1}\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}-\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\cdot\frac{1}{\sigma_{z}^{2}}\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}\mathbf{a}_{n}^{H}\mathbf{y}-\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bm{\lambda}_{n}. (127)

The second and third terms can be further simplified as

1cn𝐤¯nH𝚲n11σz21cn𝐤¯n𝐚nH𝐲\displaystyle\quad-\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\cdot\frac{1}{\sigma_{z}^{2}}\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}\mathbf{a}_{n}^{H}\mathbf{y}
=1σz21(cn)2𝐤¯nH𝚲n1𝐤¯n𝐚nH𝐲\displaystyle=-\frac{1}{\sigma_{z}^{2}}\frac{1}{(c_{n})^{2}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}\mathbf{a}_{n}^{H}\mathbf{y}
=(a)1σz21cnen𝐚nH𝐲\displaystyle\overset{(a)}{=}-\frac{1}{\sigma_{z}^{2}}\frac{1}{c_{n}}e_{n}\mathbf{a}_{n}^{H}\mathbf{y}
=(b)1σz2rn1(1+en)1en𝐚nH𝐲\displaystyle\overset{(b)}{=}-\frac{1}{\sigma_{z}^{2}}r_{n}^{-1}(1+e_{n})^{-1}e_{n}\mathbf{a}_{n}^{H}\mathbf{y} (128)

and

1cn𝐤¯nH𝚲n1𝝀n=(c)rn1(1+en)1𝐤¯nH𝚲n1𝝀n\displaystyle-\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bm{\lambda}_{n}\overset{(c)}{=}-r_{n}^{-1}(1+e_{n})^{-1}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bm{\lambda}_{n} (129)

where the equality =(a)\overset{(a)}{=} holds because en=1cn𝐤¯nH𝚲n1𝐤¯ne_{n}=\frac{1}{c_{n}}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bar{\mathbf{k}}_{n}, and the equalities =(b)\overset{(b)}{=} and =(c)\overset{(c)}{=} hold because cn=rn(1+en)c_{n}=r_{n}(1+e_{n}). By using these simplified results, we further obtain

μn\displaystyle\mu_{n} =rn11σz2𝐚nH𝐲rn1(1+en)1en1σz2𝐚nH𝐲rn1(1+en)1𝐤¯nH𝚲n1𝝀n\displaystyle=r_{n}^{-1}\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}-r_{n}^{-1}(1+e_{n})^{-1}e_{n}\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}-r_{n}^{-1}(1+e_{n})^{-1}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bm{\lambda}_{n} (130)
=rn1(1+en)11σz2𝐚nH𝐲rn1(1+en)1𝐤¯nH𝚲n1𝝀n\displaystyle=r_{n}^{-1}(1+e_{n})^{-1}\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}-r_{n}^{-1}(1+e_{n})^{-1}\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bm{\lambda}_{n}
=rn1(1+en)1(1σz2𝐚nH𝐲𝐤¯nH𝚲n1𝝀n)\displaystyle=r_{n}^{-1}(1+e_{n})^{-1}\left(\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}-\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bm{\lambda}_{n}\right)
=(d)cn1(1σz2𝐚nH𝐲𝐤¯nH𝚲n1𝝀n).\displaystyle\overset{(d)}{=}c_{n}^{-1}\left(\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{y}-\bar{\mathbf{k}}_{n}^{H}\bm{\Lambda}_{n}^{-1}\bm{\lambda}_{n}\right).

where the equality =(d)\overset{(d)}{=} holds because cn=rn(1+en)c_{n}=r_{n}(1+e_{n}). Here, μn\mu_{n} and rn1r_{n}^{-1} are the mean and variance of the nn-th element of 𝐡\mathbf{h} computed from the nn-th auxiliary manifold, respectively.

Further, the natural parameter 𝜽n0\bm{\theta}_{n}^{0} of the mm-projection is

𝜽n0\displaystyle\bm{\theta}_{n}^{0} =𝚯n0𝝁n0=𝐏1nH(rnμn𝝀n).\displaystyle=-\bm{\Theta}_{n}^{0}\bm{\mu}_{n}^{0}=\mathbf{P}_{1n}^{H}\left(\begin{array}[]{c}r_{n}\mu_{n}\\ \bm{\lambda}_{n}\end{array}\right). (133)

According to 𝜽n0=𝝀n0\bm{\theta}_{n}^{0}=\bm{\lambda}_{-n}^{0}, the belief 𝝃n\bm{\xi}_{n} is

𝝃n\displaystyle\bm{\xi}_{n} =𝝀n0𝝀n\displaystyle=\bm{\lambda}_{-n}^{0}-\bm{\lambda}_{-n} (138)
=𝐏1nH(rnμn𝝀n)𝐏1nH(0𝝀n)\displaystyle=\mathbf{P}_{1n}^{H}\left(\begin{array}[]{c}r_{n}\mu_{n}\\ \bm{\lambda}_{n}\end{array}\right)-\mathbf{P}_{1n}^{H}\left(\begin{array}[]{c}0\\ \bm{\lambda}_{n}\end{array}\right)
=𝐏1nH(rnμn𝟎).\displaystyle=\mathbf{P}_{1n}^{H}\left(\begin{array}[]{c}r_{n}\mu_{n}\\ \mathbf{0}\end{array}\right). (141)

Appendix C Proof of Theorem 3

From (26a) and (49), at the equilibrium, we have

n=1N𝝀n\displaystyle\sum_{n=1}^{N}\bm{\lambda}_{-n}^{\star} =n=1N(𝚲n+𝐂n)𝝁nn=1N𝐛n\displaystyle=\sum_{n=1}^{N}(\bm{\Lambda}_{-n}+\mathbf{C}_{n})\bm{\mu}_{n}^{\star}-\sum_{n=1}^{N}\mathbf{b}_{n} (142)
=(n=1N𝚲n+n=1N𝐂n)𝝁n=1N𝐛n\displaystyle=\left(\sum_{n=1}^{N}\bm{\Lambda}_{-n}^{\star}+\sum_{n=1}^{N}\mathbf{C}_{n}\right)\bm{\mu}^{\star}-\sum_{n=1}^{N}\mathbf{b}_{n}
=(n=1N𝚲n+σz2𝐀H𝐀+𝐃1+𝐓+𝐓𝚼𝐓H)𝝁\displaystyle=\left(\sum_{n=1}^{N}\bm{\Lambda}_{-n}^{\star}+\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}\right)\bm{\mu}^{\star}
(σz2𝐀H𝐲+𝐓𝚼σz2𝐀H𝐲).\displaystyle\quad-\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right).

Substituting n=1N𝚲n=(N1)𝚲\sum_{n=1}^{N}\bm{\Lambda}_{-n}^{\star}=(N-1)\bm{\Lambda}^{\star} from (67) and 𝝁=(𝚲)1𝝀\bm{\mu}^{\star}=\left(\bm{\Lambda}^{\star}\right)^{-1}\bm{\lambda}^{\star} into the above equation, we have

n=1N𝝀n\displaystyle\sum_{n=1}^{N}\bm{\lambda}_{-n}^{\star} =((N1)𝚲+σz2𝐀H𝐀+𝐃1+𝐓+𝐓𝚼𝐓H)𝝁\displaystyle=\left((N-1)\bm{\Lambda}^{\star}+\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}\right)\bm{\mu}^{\star} (143)
(σz2𝐀H𝐲+𝐓𝚼σz2𝐀H𝐲)\displaystyle\quad-\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right)
=(N1)𝝀+(σz2𝐀H𝐀+𝐃1+𝐓+𝐓𝚼𝐓H)𝝁\displaystyle=(N-1)\bm{\lambda}^{\star}+\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}\right)\bm{\mu}^{\star}
(σz2𝐀H𝐲+𝐓𝚼σz2𝐀H𝐲).\displaystyle\quad-\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right).

According to n=1N𝝀n=(N1)𝝀\sum_{n=1}^{N}\bm{\lambda}_{-n}^{\star}=(N-1)\bm{\lambda}^{\star} in (67), it further follows that

0=(σz2𝐀H𝐀+𝐃1+𝐓+𝐓𝚼𝐓H)𝝁(σz2𝐀H𝐲+𝐓𝚼σz2𝐀H𝐲)\displaystyle 0=\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}\right)\bm{\mu}^{\star}-\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right) (144)

which means

𝝁=(σz2𝐀H𝐀+𝐃1+𝐓+𝐓𝚼𝐓H)1(σz2𝐀H𝐲+𝐓𝚼σz2𝐀H𝐲).\displaystyle\bm{\mu}^{\star}=\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}+\mathbf{T}+\mathbf{T}\bm{\Upsilon}\mathbf{T}^{H}\right)^{-1}\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}+\mathbf{T}\bm{\Upsilon}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}\right). (145)

From Theorem 1, it follows that this expression is equal to the mean 𝝁MMSE\bm{\mu}_{\text{MMSE}} of the MMSE estimation, i.e.,

𝝁=(σz2𝐀H𝐀+𝐃1)1σz2𝐀H𝐲.\displaystyle\bm{\mu}^{\star}=\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}\right)^{-1}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}. (146)

Appendix D Proof of Theorem 4

From (68), at the equilibrium, we have

𝝁\displaystyle\bm{\mu}^{\star} =1σz2(𝐀H𝐲𝐀H𝐀𝝁+(𝐈𝐀H𝐀)𝝁)./𝐜.\displaystyle=\frac{1}{\sigma_{z}^{2}}\left(\mathbf{A}^{H}\mathbf{y}-\mathbf{A}^{H}\mathbf{A}\bm{\mu}^{\star}+(\mathbf{I}\odot\mathbf{A}^{H}\mathbf{A})\bm{\mu}^{\star}\right)./\mathbf{c}. (147)

It can be reexpressed as

(diag(𝐜)𝐈+σz2𝐀H𝐀σz2(𝐈𝐀H𝐀))𝝁\displaystyle\left(\text{diag}(\mathbf{c})\cdot\mathbf{I}+\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}-\sigma_{z}^{-2}\left(\mathbf{I}\odot\mathbf{A}^{H}\mathbf{A}\right)\right)\bm{\mu}^{\star} =σz2𝐀H𝐲.\displaystyle=\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}. (148)

From cn=1σz2𝐚nH𝐚n+dn1c_{n}=\frac{1}{\sigma_{z}^{2}}\mathbf{a}_{n}^{H}\mathbf{a}_{n}+d_{n}^{-1}, it follows that diag(𝐜)=σz2(𝐈𝐀H𝐀)+𝐃1\text{diag}(\mathbf{c})=\sigma_{z}^{-2}\left(\mathbf{I}\odot\mathbf{A}^{H}\mathbf{A}\right)+\mathbf{D}^{-1}, and hence

(σz2𝐀H𝐀+𝐃1)𝝁\displaystyle\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}\right)\bm{\mu}^{\star} =σz2𝐀H𝐲\displaystyle=\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y} (149)

which means

𝝁=(σz2𝐀H𝐀+𝐃1)1σz2𝐀H𝐲.\displaystyle\bm{\mu}^{\star}=\left(\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{A}+\mathbf{D}^{-1}\right)^{-1}\sigma_{z}^{-2}\mathbf{A}^{H}\mathbf{y}. (150)
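As a sanity check of Theorem 4, the fixed-point update (147) can be evaluated at the MMSE mean, which the theorem says leaves it unchanged; a toy numerical sketch with our own random data:

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, sz2 = 8, 5, 0.5                             # toy sizes and noise variance

A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
d = rng.random(N) + 0.5                           # diagonal of D
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)

AHA = A.conj().T @ A
mu = np.linalg.solve(AHA / sz2 + np.diag(1 / d), A.conj().T @ y / sz2)  # MMSE mean
c = np.diag(AHA).real / sz2 + 1 / d               # c_n = sigma_z^{-2} a_n^H a_n + 1/d_n

# one fixed-point update of eq. (147), evaluated at mu
update = (A.conj().T @ y - AHA @ mu + np.diag(np.diag(AHA)) @ mu) / sz2 / c
```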

References

  • [1] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 742–758, 2014.
  • [2] E. Björnson, J. Hoydis, M. Kountouris, and M. Debbah, “Massive MIMO systems with non-ideal hardware: Energy efficiency, estimation, and capacity limits,” IEEE Trans. Inf. Theory, vol. 60, no. 11, pp. 7112–7139, 2014.
  • [3] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO.   Cambridge University Press, 2016.
  • [4] E. De Carvalho, A. Ali, A. Amiri, M. Angjelichinoski, and R. W. Heath, “Non-stationarities in extra-large-scale massive MIMO,” IEEE Wireless Communications, vol. 27, no. 4, pp. 74–80, 2020.
  • [5] J. C. Marinello, T. Abrão, A. Amiri, E. De Carvalho, and P. Popovski, “Antenna selection for improving energy efficiency in XL-MIMO systems,” IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 13 305–13 318, 2020.
  • [6] Z. Wang, J. Zhang, H. Du, D. Niyato, S. Cui, B. Ai, M. Debbah, K. B. Letaief, and H. V. Poor, “A tutorial on extremely large-scale MIMO for 6G: Fundamentals, signal processing, and applications,” IEEE Communications Surveys & Tutorials, 2024.
  • [7] A.-A. Lu, Y. Chen, and X. Gao, “2D beam domain statistical CSI estimation for massive MIMO uplink,” IEEE Trans. Wireless Commun., vol. 23, no. 1, pp. 749–761, 2024.
  • [8] O. Elijah, C. Y. Leow, T. A. Rahman, S. Nunoo, and S. Z. Iliya, “A comprehensive survey of pilot contamination in massive MIMO—5G system,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 905–923, 2015.
  • [9] L. You, X. Gao, X.-G. Xia, N. Ma, and Y. Peng, “Pilot reuse for massive MIMO transmission over spatially correlated Rayleigh fading channels,” IEEE Trans. Wireless Commun., vol. 14, no. 6, pp. 3352–3366, 2015.
  • [10] H. Wang, W. Zhang, Y. Liu, Q. Xu, and P. Pan, “On design of non-orthogonal pilot signals for a multi-cell massive MIMO system,” IEEE Wireless Commun. Lett., vol. 4, no. 2, pp. 129–132, 2014.
  • [11] J. Yang, A.-A. Lu, Y. Chen, X. Gao, X.-G. Xia, and D. T. Slock, “Channel estimation for massive MIMO: An information geometry approach,” IEEE Trans. Signal Process., vol. 70, pp. 4820–4834, 2022.
  • [12] N. Shariati, E. Björnson, M. Bengtsson, and M. Debbah, “Low-complexity polynomial channel estimation in large-scale MIMO with arbitrary statistics,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 815–830, 2014.
  • [13] A. Wang, R. Yin, and C. Zhong, “Channel estimation for uniform rectangular array based massive MIMO systems with low complexity,” IEEE Trans. Veh. Technol., vol. 68, no. 3, pp. 2545–2556, 2019.
  • [14] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, 2018.
  • [15] E. Balevi, A. Doshi, and J. G. Andrews, “Massive MIMO channel estimation with an untrained deep neural network,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2079–2090, 2020.
  • [16] F. Bellili, F. Sohrabi, and W. Yu, “Generalized approximate message passing for massive MIMO mmWave channel estimation with Laplacian prior,” IEEE Trans. Commun., vol. 67, no. 5, pp. 3205–3219, 2019.
  • [17] S.-i. Amari, Information Geometry and Its Applications.   Springer, 2016.
  • [18] N. Ay, J. Jost, H. Vân Lê, and L. Schwachhöfer, Information Geometry.   Springer, 2017, vol. 64.
  • [19] F. Nielsen, “The many faces of information geometry,” Notices Amer. Math. Soc., vol. 69, no. 1, pp. 36–45, 2022.
  • [20] S.-i. Amari, “Information geometry in optimization, machine learning and statistical inference,” Frontiers of Electrical and Electronic Engineering in China, vol. 5, pp. 241–260, 2010.
  • [21] M. Oizumi, N. Tsuchiya, and S.-i. Amari, “Unified framework for information integration based on information geometry,” Proceedings of the National Academy of Sciences, vol. 113, no. 51, pp. 14 817–14 822, 2016.
  • [22] L. Banchi, P. Giorda, and P. Zanardi, “Quantum information-geometry of dissipative quantum phase transitions,” Physical Review E, vol. 89, no. 2, p. 022102, 2014.
  • [23] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.   Morgan Kaufmann, 1988.
  • [24] S. Ikeda, T. Tanaka, and S.-i. Amari, “Stochastic reasoning, free energy, and information geometry,” Neural Computation, vol. 16, no. 9, pp. 1779–1810, 2004.
  • [25] A. L. Yuille, “CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation,” Neural Computation, vol. 14, no. 7, pp. 1691–1722, 2002.
  • [26] S. Ikeda, T. Tanaka, and S.-i. Amari, “Information geometry of turbo and low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 1097–1114, 2004.
  • [27] A. Zia, J. P. Reilly, J. Manton, and S. Shirani, “An information geometric approach to ML estimation with incomplete data: application to semiblind MIMO channel identification,” IEEE Trans. Signal Process., vol. 55, no. 8, pp. 3975–3986, 2007.
  • [28] J. Yang, Y. Chen, X. Gao, D. Slock, and X.-G. Xia, “Signal detection for ultra-massive MIMO: An information geometry approach,” IEEE Trans. Signal Process., 2024.
  • [29] J. Yang, Y. Chen, A.-A. Lu, W. Zhong, X. Gao, X. You, X.-G. Xia, and D. Slock, “Channel estimation for massive MIMO-OFDM: simplified information geometry approach,” in 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall).   IEEE, 2023, pp. 1–6.
  • [30] O. Simeone, Machine Learning for Engineers.   Cambridge University Press, 2022.
  • [31] L. M. Bregman, “The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming,” USSR Computational Mathematics and Mathematical Physics, vol. 7, no. 3, pp. 200–217, 1967.
  • [32] S.-i. Amari and H. Nagaoka, Methods of information geometry.   American Mathematical Soc., 2000, vol. 191.
  • [33] C. Sun, X. Gao, S. Jin, M. Matthaiou, Z. Ding, and C. Xiao, “Beam division multiple access transmission for massive MIMO communications,” IEEE Trans. Commun., vol. 63, no. 6, pp. 2170–2184, 2015.
  • [34] 3GPP TS 36.211, “Evolved universal terrestrial radio access (E-UTRA); physical channels and modulation (Release 15),” 2019.