
Memory Approximate Message Passing

Lei Liu, Shunqi Huang and Brian M. Kurkoski
School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), Japan
Abstract

Approximate message passing (AMP) is a low-cost iterative parameter-estimation technique for certain high-dimensional linear systems with non-Gaussian distributions. However, AMP only applies to independent identically distributed (IID) transform matrices and may become unreliable for other matrix ensembles, especially ill-conditioned ones. To handle this difficulty, orthogonal/vector AMP (OAMP/VAMP) was proposed for general right-unitarily-invariant matrices. However, the Bayes-optimal OAMP/VAMP requires a high-complexity linear minimum mean square error (LMMSE) estimator. To overcome the drawbacks of AMP and OAMP/VAMP, this paper proposes a memory AMP (MAMP), in which a long-memory matched filter is used for interference suppression. The complexity of MAMP is comparable to that of AMP. The asymptotic Gaussianity of the estimation errors in MAMP is guaranteed by the orthogonality principle. A state evolution is derived to asymptotically characterize the performance of MAMP. Based on the state evolution, the relaxation parameters and damping vector in MAMP are optimized. For all right-unitarily-invariant matrices, the optimized MAMP converges to OAMP/VAMP, and is thus Bayes-optimal if it has a unique fixed point. Finally, simulations are provided to verify the validity and accuracy of the theoretical results.

footnotetext: A full version of this paper is available at arXiv (see [1]). The source code of this work is publicly available at sites.google.com/site/leihomepage/researc. L. Liu was supported in part by the Japan Society for the Promotion of Science (JSPS) Kakenhi under Grant JP 21K14156, and in part by JSPS Kakenhi Grant JP 19H02137.

I Introduction

Consider the problem of signal reconstruction for a noisy linear system:

\bm{y}=\bm{A}\bm{x}+\bm{n},   (1)

where $\bm{y}\in\mathbb{C}^{M\times 1}$ is a vector of observations, $\bm{A}\in\mathbb{C}^{M\times N}$ is a transform matrix, $\bm{x}$ is the vector to be estimated, and $\bm{n}\sim\mathcal{CN}(\bm{0},\sigma^2\bm{I}_M)$ is a vector of additive Gaussian noise samples. The entries of $\bm{x}$ are independent and identically distributed (IID) with zero mean and unit variance, i.e., $x_i\sim P_x$. In this paper, we consider a large system with $M,N\to\infty$ and a fixed compression ratio $\delta=M/N$. In the special case when $\bm{x}$ is Gaussian, the optimal solution can be obtained using standard linear minimum mean square error (MMSE) methods. Otherwise, the problem is in general NP-hard [2, 3].
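For concreteness, the following is a minimal sketch of how a synthetic instance of (1) can be generated; the dimensions, noise level, and the complex Gaussian prior used here for $\bm{x}$ are illustrative assumptions only.

```python
import numpy as np

# Generate a synthetic instance of y = A x + n in (1).
rng = np.random.default_rng(0)
M, N = 256, 512                # compression ratio delta = M/N = 0.5
sigma2 = 1e-2                  # noise variance
# IID Gaussian A with (approximately) unit-norm columns
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
# Unit-variance signal (complex Gaussian purely for illustration)
x = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = A @ x + n                  # observations in (1)
```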

I-A Background

Approximate message passing (AMP) has attracted extensive research interest for this problem [4, 5]. AMP adopts a low-complexity matched filter (MF), so its complexity is as low as $\mathcal{O}(MN)$ per iteration. Remarkably, the asymptotic performance of AMP can be described by a scalar recursion called state evolution, derived heuristically in [5] and proved rigorously in [4]. The state evolution analysis in [4] implies that AMP is Bayes-optimal for zero-mean IID sub-Gaussian sensing matrices when the compression rate is larger than a certain threshold [7]. Spatial coupling [7, 8, 9, 10] can be used to achieve the optimality of AMP for any compression rate.

A basic assumption of AMP is that $\bm{A}$ has IID Gaussian entries [5, 4]. For matrices with correlated entries, AMP may perform poorly or even diverge [11, 12, 13]. It was discovered in [14, 15] that a variant of AMP based on a unitary transformation, called UTAMP, performs well for difficult (e.g., correlated) matrices $\bm{A}$. Independently, orthogonal AMP (OAMP) was proposed in [16] for unitarily invariant $\bm{A}$. OAMP is related to a variant of the expectation propagation algorithm [17] (called diagonally-restricted expectation consistent inference in [18] or scalar expectation propagation in [19]), as observed in [20, 21]. A closely related algorithm, the MMSE-based vector AMP (VAMP) [21], is equivalent to expectation propagation in its diagonally-restricted form [18]. The accuracy of state evolution for such expectation-propagation-type algorithms (including VAMP and OAMP) was conjectured in [16] and proved in [21, 20]. The Bayes optimality of OAMP is derived in [21, 20, 16] when the compression rate is larger than a certain threshold, and the advantages of AMP-type algorithms over conventional turbo receivers [23, 22] are demonstrated in [25, 24].

The main weakness of OAMP/VAMP is the high complexity $\mathcal{O}(M^3+M^2N)$ incurred by the linear MMSE (LMMSE) estimator. Singular-value decomposition (SVD) was used to avoid the high-complexity LMMSE in each iteration [21], but the complexity of the SVD itself is as high as that of the LMMSE estimator. The performance of OAMP/VAMP degrades significantly when the LMMSE estimator is replaced by the low-complexity MF [16] used in AMP. This limits the application of OAMP/VAMP to large-scale systems for which LMMSE is too complex.

In summary, the existing Bayes-optimal AMP-type algorithms are either limited to IID matrices (e.g., AMP) or require a high-complexity LMMSE estimator (e.g., OAMP/VAMP). Hence, a low-complexity Bayes-optimal message passing algorithm for unitarily invariant matrices is desired.

A long-memory AMP algorithm was originally constructed in [26] to solve the Thouless-Anderson-Palmer equations for Ising models with general invariant random matrices. The results in [26] were rigorously justified via state evolution in [27]. Recently, Takeuchi proposed convolutional AMP (CAMP), in which AMP is modified by replacing the Onsager term with a convolution of all preceding messages [28]. CAMP has low complexity and applies to unitarily invariant matrices. It is proved that CAMP is Bayes-optimal if it converges to a unique fixed point [28]. However, CAMP converges slowly and may fail to converge, particularly for matrices with high condition numbers. In addition, a heuristic damping was used to improve the convergence of CAMP; since this damping is performed on the a-posteriori outputs, it breaks the orthogonality and the asymptotic Gaussianity of the estimation errors [28].

I-B Contributions

To overcome the difficulties of AMP, OAMP/VAMP and CAMP, we propose a memory AMP (MAMP) using a low-complexity long-memory MF. Due to the correlated long memory, a stricter orthogonality is required for MAMP to guarantee the asymptotic Gaussianity of its estimation errors [20, 28]. Specifically, the step-by-step orthogonalization between current input and output estimation errors in OAMP/VAMP is not sufficient; instead, the current output estimation error is required to be orthogonal to all preceding input estimation errors. A covariance-matrix state evolution is established for MAMP. Based on the state evolution, relaxation parameters and a damping vector, which preserve the orthogonality (and hence the asymptotic Gaussianity of the estimation errors), are analytically optimized to guarantee and improve the convergence of MAMP. The main properties of MAMP are summarized as follows.

  • MAMP has comparable complexity to AMP and much lower complexity than OAMP/VAMP.

  • MAMP converges to the same fixed point as that of OAMP/VAMP for all unitarily invariant matrices. As a result, it is Bayes-optimal if it has a unique fixed point.

II Preliminaries

II-A Problem Formulation

Figure 1: Graphical illustrations of (a) the system model with two constraints $\Gamma$ and $\Phi$, and (b) a non-memory iterative process (NMIP) involving two local processors $\gamma_t$ and $\phi_t$.

Fig. 1(a) illustrates the system in (1) with two constraints:

\Gamma:\ \bm{y}=\bm{A}\bm{x}+\bm{n}, \qquad \Phi:\ x_i\sim P_x,\ \forall i.   (2)

Our aim is to use the AMP-type iterative approach in Fig. 1(b) to find the MMSE estimate of $\bm{x}$, i.e., an estimate whose MSE converges to

\mathrm{mmse}\{\bm{x}|\bm{y},\bm{A},\Gamma,\Phi\} \equiv \tfrac{1}{N}\mathrm{E}\{\|\hat{\bm{x}}_{\rm post}-\bm{x}\|^2\},   (3)

where $\hat{\bm{x}}_{\rm post}=\mathrm{E}\{\bm{x}|\bm{y},\bm{A},\Gamma,\Phi\}$ is the a-posteriori mean of $\bm{x}$.

Definition 1 (Bayes Optimality)

An iterative approach is said to be Bayes optimal if its MSE converges to the MMSE of the system in (1).

II-B Assumptions

Let the singular value decomposition of $\bm{A}$ be $\bm{A}=\bm{U}\bm{\Sigma}\bm{V}$, where $\bm{U}\in\mathbb{C}^{M\times M}$ and $\bm{V}\in\mathbb{C}^{N\times N}$ are unitary matrices, and $\bm{\Sigma}$ is a diagonal matrix. We assume that $\bm{A}$ is known and is right-unitarily-invariant, i.e., $\bm{U}$, $\bm{V}$ and $\bm{\Sigma}$ are independent, and $\bm{V}$ is Haar distributed. Let $\lambda_t=\tfrac{1}{N}\mathrm{E}\{\mathrm{tr}[(\bm{A}\bm{A}^{\rm H})^t]\}$ and $\lambda^\dagger\equiv(\lambda_{\max}+\lambda_{\min})/2$, where $\lambda_{\min}$ and $\lambda_{\max}$ denote the minimal and maximal eigenvalues of $\bm{A}\bm{A}^{\rm H}$, respectively. We assume that $\lambda_{\min}$, $\lambda_{\max}$ and $\{\lambda_t\}$ are known. This assumption can be relaxed using specific approximations (see [1]).

II-C Non-memory Iterative Process and Orthogonality

Non-memory Iterative Process (NMIP): Fig. 1(b) illustrates an NMIP consisting of a linear estimator (LE) and a non-linear estimator (NLE). Starting with $t=1$,

\mathrm{LE:}\quad \bm{r}_t=\gamma_t(\bm{x}_t), \qquad \mathrm{NLE:}\quad \bm{x}_{t+1}=\phi_t(\bm{r}_t),   (4)

where $\gamma_t(\cdot)$ and $\phi_t(\cdot)$ process the two constraints $\Gamma$ and $\Phi$ separately, based only on their current inputs $\bm{x}_t$ and $\bm{r}_t$, respectively. Let

\bm{r}_t=\bm{x}+\bm{g}_t, \qquad \bm{x}_t=\bm{x}+\bm{f}_t,   (5a)

where $\bm{g}_t$ and $\bm{f}_t$ denote the estimation errors, with zero means and variances

v_t^\gamma = \tfrac{1}{N}\mathrm{E}\{\|\bm{g}_t\|^2\}, \qquad v_t^\phi = \tfrac{1}{N}\mathrm{E}\{\|\bm{f}_t\|^2\}.   (5b)

The asymptotic IID Gaussian property of an NMIP was conjectured in [16] and proved in [20, 21].

Lemma 1 (Orthogonality and Asymptotic IID Gaussianity)

Assume that $\bm{A}$ is unitarily invariant with $M,N\to\infty$ and that the following orthogonality holds for all $t\geq 1$:

\tfrac{1}{N}\bm{g}_t^{\rm H}\bm{f}_t \overset{\rm a.s.}{=} 0, \qquad \tfrac{1}{N}\bm{f}_{t+1}^{\rm H}\bm{g}_t \overset{\rm a.s.}{=} 0.   (6)

Then, for Lipschitz-continuous [29] $\{\gamma_t(\cdot)\}$ and separable-and-Lipschitz-continuous $\{\phi_t(\cdot)\}$, we have, $\forall t\geq 1$:

v_t^\gamma \overset{\rm a.s.}{=} \tfrac{1}{N}\mathrm{E}\big\{\|\gamma_t(\bm{x}+\bm{\eta}_t^\phi)-\bm{x}\|^2\big\},   (7a)
v_{t+1}^\phi \overset{\rm a.s.}{=} \tfrac{1}{N}\mathrm{E}\big\{\|\phi_t(\bm{x}+\bm{\eta}_t^\gamma)-\bm{x}\|^2\big\},   (7b)

where $\bm{\eta}_t^\phi\sim\mathcal{CN}(\bm{0},v_t^\phi\bm{I})$ and $\bm{\eta}_t^\gamma\sim\mathcal{CN}(\bm{0},v_t^\gamma\bm{I})$ are independent of $\bm{x}$.

II-D Overview of OAMP/VAMP

OAMP/VAMP [16, 21]: Let $\rho_t=\sigma^2/v_t^\phi$, let $\hat{\phi}_t(\cdot)$ be an MMSE estimator, and let $\hat{\gamma}_t(\cdot)$ be an estimator of $\bm{x}$ defined as

\hat{\gamma}_t(\bm{x}_t) \equiv \bm{A}^{\rm H}\big(\rho_t\bm{I}+\bm{A}\bm{A}^{\rm H}\big)^{-1}(\bm{y}-\bm{A}\bm{x}_t).   (8)

OAMP/VAMP is then defined as follows: starting with $t=1$, $v_1^\phi=1$ and $\bm{x}_1=\bm{0}$,

\mathrm{LE:}\quad \bm{r}_t=\gamma_t(\bm{x}_t) \equiv \tfrac{1}{\epsilon_t^\gamma}\hat{\gamma}_t(\bm{x}_t)+\bm{x}_t,   (9a)
\mathrm{NLE:}\quad \bm{x}_{t+1}=\phi_t(\bm{r}_t) \equiv \tfrac{1}{\epsilon_{t+1}^\phi}\big[\hat{\phi}_t(\bm{r}_t)+(\epsilon_{t+1}^\phi-1)\bm{r}_t\big],   (9b)

where

\epsilon_t^\gamma = \tfrac{1}{N}\mathrm{tr}\big\{\bm{A}^{\rm H}\big(\rho_t\bm{I}+\bm{A}\bm{A}^{\rm H}\big)^{-1}\bm{A}\big\},   (9c)
v_t^\gamma = \gamma_{\rm SE}(v_t^\phi) \equiv v_t^\phi\big[(\epsilon_t^\gamma)^{-1}-1\big],   (9d)
\epsilon_{t+1}^\phi = 1-\tfrac{1}{Nv_t^\gamma}\|\hat{\phi}_t(\bm{x}+\sqrt{v_t^\gamma}\bm{\eta})-\bm{x}\|^2,   (9e)
v_{t+1}^\phi = \phi_{\rm SE}(v_t^\gamma) \equiv v_t^\gamma\big[(\epsilon_{t+1}^\phi)^{-1}-1\big],   (9f)

and $\bm{\eta}\sim\mathcal{CN}(\bm{0},\bm{I})$ is independent of $\bm{x}$.

We assume that $\hat{\phi}_t(\cdot)$ is an MMSE estimator given by

\hat{\phi}_t(\bm{r}_t) \equiv \mathrm{E}\{\bm{x}|\bm{r}_t\}.   (10)

It was proved in [20] that OAMP/VAMP satisfies the orthogonality in (6). Hence, the IID Gaussian property in (7) holds for OAMP/VAMP.
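To make the recursion (9) concrete, the following is a minimal sketch of one possible implementation; the function name oamp_vamp, the denoiser interface phi_hat (returning the posterior mean and its average MSE), and the fixed iteration count are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

def oamp_vamp(y, A, sigma2, phi_hat, n_iter=30):
    """A sketch of OAMP/VAMP per (8)-(10); phi_hat(r, v) -> (E{x|r}, MSE)."""
    M, N = A.shape
    x_t = np.zeros(N, dtype=complex)
    v_phi = 1.0                                       # v_1^phi = 1
    AAH = A @ A.conj().T
    for _ in range(n_iter):
        # LE (8), (9a): de-biased LMMSE step
        rho = sigma2 / v_phi
        W = A.conj().T @ np.linalg.inv(rho * np.eye(M) + AAH)
        eps_gamma = np.real(np.trace(W @ A)) / N      # (9c)
        r_t = (W @ (y - A @ x_t)) / eps_gamma + x_t   # (9a)
        v_gamma = v_phi * (1.0 / eps_gamma - 1.0)     # (9d)
        # NLE (9b): orthogonalized MMSE denoiser
        x_post, v_post = phi_hat(r_t, v_gamma)
        eps_phi = 1.0 - v_post / v_gamma              # (9e)
        x_t = (x_post + (eps_phi - 1.0) * r_t) / eps_phi   # (9b)
        v_phi = v_gamma * (1.0 / eps_phi - 1.0)       # (9f)
    return x_t, v_phi
```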

Lemma 2 (Bayes Optimality [32, 30, 16, 31])

Assume that $M,N\to\infty$ with a fixed $\delta=M/N$, and that OAMP/VAMP satisfies the unique fixed point condition. Then, OAMP/VAMP is Bayes optimal for right-unitarily-invariant matrices.

Complexity: The NLE in OAMP/VAMP is a symbol-by-symbol estimator whose time complexity is as low as $\mathcal{O}(N)$. The complexity of OAMP/VAMP is dominated by the LMMSE LE, which costs $\mathcal{O}(M^2N+M^3)$ time per iteration for matrix multiplication and matrix inversion.

III Memory AMP

III-A Memory Iterative Process and Orthogonality

Figure 2: Graphical illustration of a memory iterative process (MIP).

Memory Iterative Process (MIP): Fig. 2 illustrates an MIP based on a long-memory linear estimator (LMLE) and a non-linear estimator (NLE), defined as follows: starting with $t=1$,

\mathrm{LMLE:}\quad \bm{r}_t=\gamma_t(\bm{x}_1,\cdots,\bm{x}_t),   (11a)
\mathrm{NLE:}\quad \bm{x}_{t+1}=\phi_t(\bm{r}_t).   (11b)

We call (11) an MIP since $\gamma_t(\cdot)$ contains the long memory $\{\bm{x}_i, i<t\}$. It should be emphasized that the step-by-step orthogonalization between current input and output estimation errors is not sufficient to guarantee the asymptotic IID Gaussianity for an MIP. Thus, a stricter orthogonality is required, i.e., the estimation error of $\gamma_t(\cdot)$ is required to be orthogonal to all preceding estimation errors [28, 33].

Lemma 3 (Orthogonality and Asymptotic IID Gaussianity [28, 33])

Assume that $\bm{A}$ is unitarily invariant with $M,N\to\infty$ and that the following orthogonality holds for all $1\leq t'\leq t$:

\tfrac{1}{N}\bm{g}_t^{\rm H}\bm{f}_{t'} \overset{\rm a.s.}{=} 0, \qquad \tfrac{1}{N}\bm{f}_{t+1}^{\rm H}\bm{g}_{t'} \overset{\rm a.s.}{=} 0.   (12)

Then, for Lipschitz-continuous [29] $\{\gamma_t(\cdot)\}$ and separable-and-Lipschitz-continuous $\{\phi_t(\cdot)\}$, we have, $\forall\, 1\leq t'\leq t$:

v_{t,t'}^\gamma \overset{\rm a.s.}{=} \tfrac{1}{N}\mathrm{E}\big\{\big[\gamma_t(\bm{x}+\bm{\eta}_1^\phi,\dots,\bm{x}+\bm{\eta}_t^\phi)-\bm{x}\big]^{\rm H}\big[\gamma_{t'}(\bm{x}+\bm{\eta}_1^\phi,\dots,\bm{x}+\bm{\eta}_{t'}^\phi)-\bm{x}\big]\big\},   (13a)
v_{t+1,t'+1}^\phi \overset{\rm a.s.}{=} \tfrac{1}{N}\mathrm{E}\big\{\big[\phi_t(\bm{x}+\bm{\eta}_t^\gamma)-\bm{x}\big]^{\rm H}\big[\phi_{t'}(\bm{x}+\bm{\eta}_{t'}^\gamma)-\bm{x}\big]\big\},   (13b)

where $\bm{\eta}_t^\phi\sim\mathcal{CN}(\bm{0},v_{t,t}^\phi\bm{I})$ with $\mathrm{E}\{\bm{\eta}_\tau^\phi(\bm{\eta}_{\tau'}^\phi)^{\rm H}\}=v_{\tau,\tau'}^\phi\bm{I}$, and $\bm{\eta}_t^\gamma\sim\mathcal{CN}(\bm{0},v_{t,t}^\gamma\bm{I})$ with $\mathrm{E}\{\bm{\eta}_\tau^\gamma(\bm{\eta}_{\tau'}^\gamma)^{\rm H}\}=v_{\tau,\tau'}^\gamma\bm{I}$. Moreover, $\{\bm{\eta}_t^\phi\}$ and $\{\bm{\eta}_t^\gamma\}$ are independent of $\bm{x}$.

III-B Memory AMP (MAMP)

Memory AMP: Let $\bm{B}=\lambda^\dagger\bm{I}-\bm{A}\bm{A}^{\rm H}$, and consider

\hat{\bm{r}}_t = \theta_t\bm{B}\hat{\bm{r}}_{t-1} + \xi_t(\bm{y}-\bm{A}\bm{x}_t).   (14)

The MAMP process is defined as follows: starting with $t=1$ and $\bm{x}_1=\hat{\bm{r}}_0=\bm{0}$,

\mathrm{LMLE:}\quad \bm{r}_t = \gamma_t(\bm{x}_1,\cdots,\bm{x}_t) \equiv \tfrac{1}{\varepsilon_t}\big(\bm{A}^{\rm H}\hat{\bm{r}}_t + \textstyle\sum_{i=1}^t p_{ti}\bm{x}_i\big),   (15a)
\mathrm{NLE:}\quad \bm{x}_{t+1} = \bar{\phi}_t(\bm{r}_t) \equiv \zeta_{t,l_t}\,\phi_t(\bm{r}_t) + \textstyle\sum_{i=1}^{l_t-1}\zeta_{ti}\,\bm{x}_{t-l_t+1+i},   (15b)

where $\phi_t(\cdot)$ is the same as that in OAMP/VAMP (see (9)).

The intuitions behind MAMP are as follows.

  • In the LMLE, all preceding messages are utilized in $\sum_{i=1}^t p_{ti}\bm{x}_i$ to guarantee the orthogonality in (12). In the NLE, at most $l_t-1$ preceding messages are utilized in $\sum_{i=1}^{l_t-1}\zeta_{ti}\bm{x}_{t-l_t+1+i}$ (i.e., damping) to guarantee and improve the convergence of MAMP.

  • $\bm{B}=\lambda^\dagger\bm{I}-\bm{A}\bm{A}^{\rm H}$ ensures that MAMP has the same fixed point as OAMP/VAMP.

  • $\{\varepsilon_t\}$ and $\{p_{ti}\}$, given in Subsection IV-A (see (15q)), guarantee the orthogonality in (12) (see Theorem 1).

  • $\{\theta_t\}$ and $\{\xi_t\}$ improve the convergence speed of MAMP. The optimizations of $\theta_t$ and $\xi_t$ are given in Subsections V-A (see (15ae)) and V-B (see (15ag)), respectively.

  • $\bm{\zeta}_t=[\zeta_{t1},\cdots,\zeta_{t,l_t}]^{\rm T}$, optimized in Subsection V-C (see (15ai)), is a damping vector with $\sum_{i=1}^{l_t}\zeta_{ti}=1$. In particular, no damping is applied when $\bm{\zeta}_t=[0,\cdots,0,1]^{\rm T}$. We set $l_t=\min\{L,t+1\}$, where $L$ is the maximum damping length (in general, $L=3$). Damping guarantees and improves the convergence of MAMP.

We call (15) memory AMP since its LMLE involves the long memory $\{\bm{x}_i, i<t\}$, unlike the memoryless LE in OAMP/VAMP. Only matrix-vector multiplications are involved, with no matrix inversion. Thus, the complexity of MAMP is comparable to that of AMP, i.e., as low as $\mathcal{O}(MN)$ per iteration. A minimal sketch of the recursion is given below.
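The following sketch illustrates one possible implementation of (14)-(15); the interfaces for the precomputed weights $p_{ti}$, $\varepsilon_t$ (see (15q)) and the parameters $\theta_t$, $\xi_t$, $\bm{\zeta}_t$ (see Section V), as well as the denoiser phi, are illustrative assumptions.

```python
import numpy as np

def mamp(y, A, lam_dag, theta, xi, p, eps, zeta, phi, n_iter=30):
    """A sketch of MAMP: theta/xi/eps are dicts keyed by t, p[t][i] the
    weights in (15q), zeta[t] a length-l_t damping vector, phi(r, t) the
    OAMP/VAMP denoiser in (9b)."""
    M, N = A.shape
    xs = [np.zeros(N, dtype=complex)]    # x_1 = 0
    r_hat = np.zeros(M, dtype=complex)   # r_hat_0 = 0
    for t in range(1, n_iter + 1):
        x_t = xs[-1]
        # (14): memory matched filter; B r = lam_dag*r - A(A^H r)
        Br = lam_dag * r_hat - A @ (A.conj().T @ r_hat)
        r_hat = theta[t] * Br + xi[t] * (y - A @ x_t)
        # (15a): orthogonalized LMLE output
        r_t = (A.conj().T @ r_hat
               + sum(p[t][i] * xs[i - 1] for i in range(1, t + 1))) / eps[t]
        # (15b): damped NLE; zeta[t] entries sum to one
        l_t = len(zeta[t])
        x_next = zeta[t][-1] * phi(r_t, t)
        for i in range(1, l_t):          # preceding messages x_{t-l_t+1+i}
            x_next += zeta[t][i - 1] * xs[t - l_t + i]
        xs.append(x_next)
    return xs[-1]
```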

IV Main Properties of MAMP

IV-A Orthogonality and Asymptotic IID Gaussianity

Let $\bm{W}_t=\bm{A}^{\rm H}\bm{B}^t\bm{A}$. For $t\geq 0$, we define

b_t \equiv \tfrac{1}{N}\mathrm{tr}\{\bm{B}^t\} = \textstyle\sum_{i=0}^t\binom{t}{i}(-1)^i(\lambda^\dagger)^{t-i}\lambda_i,   (15pa)
w_t \equiv \tfrac{1}{N}\mathrm{tr}\{\bm{W}_t\} = \lambda^\dagger b_t - b_{t+1}.   (15pb)

For $1\leq i\leq t$,

\vartheta_{ti} \equiv \xi_i\textstyle\prod_{\tau=i+1}^t\theta_\tau, \qquad p_{ti} \equiv \vartheta_{ti}w_{t-i}, \qquad \varepsilon_t = \textstyle\sum_{i=1}^t p_{ti}.   (15q)

Furthermore, $\vartheta_{ti}=1$ if $i>t$. These quantities can be computed directly from the spectral moments, as sketched below.
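The following sketch computes $b_t$, $w_t$ in (15p) and $p_{ti}$, $\varepsilon_t$ in (15q) from the moments $\lambda_t$; the container layout (lists and dicts indexed from 1) is an illustrative choice, compatible with the MAMP sketch above.

```python
import numpy as np
from math import comb

def mamp_weights(lam, lam_dag, theta, xi, T):
    """lam[t] = (1/N) E{tr[(A A^H)^t]} for t = 0..T+1; theta, xi keyed 1..T."""
    b = [sum(comb(t, i) * (-1) ** i * lam_dag ** (t - i) * lam[i]
             for i in range(t + 1)) for t in range(T + 2)]     # (15pa)
    w = [lam_dag * b[t] - b[t + 1] for t in range(T + 1)]      # (15pb)
    p, eps = {}, {}
    for t in range(1, T + 1):
        p[t] = {}
        for i in range(1, t + 1):
            # vartheta_ti = xi_i * prod_{tau=i+1}^t theta_tau
            vartheta = xi[i] * np.prod([theta[tau] for tau in range(i + 1, t + 1)])
            p[t][i] = vartheta * w[t - i]                      # (15q)
        eps[t] = sum(p[t].values())
    return p, eps, w
```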

Proposition 1

The $\{\bm{r}_t\}$ in (15) and the corresponding errors can be expanded as

\bm{r}_t = \tfrac{1}{\varepsilon_t}\big[\bm{F}_t\bm{y} + \textstyle\sum_{i=1}^t\bm{H}_{ti}\bm{x}_i\big], \qquad \bm{g}_t = \tfrac{1}{\varepsilon_t}\big(\bm{F}_t\bm{n} + \textstyle\sum_{i=1}^t\bm{H}_{ti}\bm{f}_i\big),   (15ra)

where

\bm{F}_t \equiv \textstyle\sum_{i=1}^t\vartheta_{ti}\bm{A}^{\rm H}\bm{B}^{t-i}, \qquad \bm{H}_{ti} \equiv \vartheta_{ti}(w_{t-i}\bm{I} - \bm{W}_{t-i}).   (15rb)

The following is based on Lemma 3 and Proposition 1.

Theorem 1 (Orthogonality and Asymptotic IID Gaussianity)

Assume that $\bm{A}$ is right-unitarily-invariant with $M,N\to\infty$. Then the orthogonality in (12) holds for MAMP, and therefore the IID Gaussianity in (13) holds for MAMP.

IV-B State Evolution

Using the IID Gaussianity in Theorem 1, we establish a state evolution for the dynamics of the MSE of MAMP. The main challenge is the correlation between the long-memory inputs of the LMLE, which requires a covariance-matrix state evolution to track the dynamics of the MSE.

Define the covariance matrices

\bm{V}_t^\gamma = [v_{ij}^\gamma]_{t\times t}, \qquad \bm{V}_t^{\bar{\phi}} = [v_{ij}^{\bar{\phi}}]_{t\times t},   (15sa)

where

v_{tt'}^\gamma \equiv \tfrac{1}{N}\mathrm{E}\{\bm{g}_t^{\rm H}\bm{g}_{t'}\}, \qquad v_{tt'}^{\bar{\phi}} \equiv \tfrac{1}{N}\mathrm{E}\{\bm{f}_t^{\rm H}\bm{f}_{t'}\}.   (15sb)
Proposition 2 (State Evolution)

The covariance matrices of MAMP can be tracked by the following state evolution: starting with $v_{11}^\phi=1$,

\mathrm{LMLE:}\ \bm{V}_t^\gamma = \gamma_{\rm SE}(\bm{V}_t^{\bar{\phi}}), \qquad \mathrm{NLE:}\ \bm{V}_{t+1}^{\bar{\phi}} = \bar{\phi}_{\rm SE}(\bm{V}_t^\gamma).   (15t)

The details of $\gamma_{\rm SE}(\cdot)$ and $\bar{\phi}_{\rm SE}(\cdot)$ are provided in [1].

IV-C Convergence and Bayes Optimality

The following theorem gives the convergence and Bayes optimality of the optimized MAMP. The proof is provided in Appendix F in [1].

Theorem 2 (Convergence and Bayes Optimality)

Assume that $M,N\to\infty$ with a fixed $\delta=M/N$ and that $\bm{A}$ is right-unitarily-invariant. MAMP with optimized $\{\theta_t,\bm{\zeta}_t\}$ (see Section V) converges to the same fixed point as OAMP/VAMP, i.e., it is Bayes optimal if it has a unique fixed point.

IV-D Complexity Comparison

Table I compares the time and space complexity of MAMP, CAMP, AMP and OAMP/VAMP, where $T$ is the number of iterations. MAMP and CAMP have similar time and space complexity. OAMP/VAMP has higher complexity than AMP, CAMP and MAMP, while MAMP and CAMP have complexity comparable to AMP for $T\ll N$. For more details, refer to Section IV-D in [1].

TABLE I: Time and Space Complexity Comparison Between AMP-Type Algorithms

Algorithms | Time complexity | Space complexity
AMP [5] | $\mathcal{O}(MNT)$ | $\mathcal{O}(MN)$
OAMP/VAMP [21, 16] (SVD) | $\mathcal{O}(M^2N + MNT)$ | $\mathcal{O}(N^2 + MN)$
OAMP/VAMP [21, 16] (matrix inverse) | $\mathcal{O}\big((M^2N + M^3)T\big)$ | $\mathcal{O}(MN)$
CAMP [28] | $\mathcal{O}(MNT + MT^2 + T^4)$ | $\mathcal{O}(MN + MT + T^2)$
MAMP | $\mathcal{O}\big(MNT + (N+M)T^2 + T^3\big)$ | $\mathcal{O}\big(MN + (N+M)T + T^2\big)$

V Parameter Optimization

The parameters $\{\theta_t,\xi_t,\bm{\zeta}_t\}$ are optimized step-by-step for each iteration, assuming that the parameters $\{\theta_{t'},\xi_{t'},\bm{\zeta}_{t'},\,t'\leq t-1\}$ of the previous iterations are fixed. More specifically, we first optimize $\theta_t$. Then, given $\theta_t$, we optimize $\xi_t$. Finally, given $\theta_t$ and $\xi_t$, we optimize $\bm{\zeta}_t$.

V-A Optimization of θt\theta_{t}

The optimal θt\theta_{t} is given by

\theta_t = (\lambda^\dagger + \sigma^2/v_{tt}^\phi)^{-1},   (15ae)

which minimizes the spectral radius of $\theta_t\bm{B}$. From (15ae), the spectral radius of $\theta_t\bm{B}$ satisfies

\rho(\theta_t\bm{B}) = \tfrac{\lambda_{\max}-\lambda_{\min}}{\lambda_{\max}+\lambda_{\min}+2\rho_t} < 1,   (15af)

i.e., the convergence condition is satisfied. In addition, (15ae) also optimizes the convergence speed of MAMP. A sketch of this computation follows.
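A minimal sketch of (15ae)-(15af), with illustrative variable names:

```python
def theta_opt(lam_dag, sigma2, v_phi_tt, lam_min, lam_max):
    """Optimal theta_t in (15ae) and the resulting spectral radius (15af)."""
    rho_t = sigma2 / v_phi_tt
    theta = 1.0 / (lam_dag + rho_t)                                 # (15ae)
    radius = (lam_max - lam_min) / (lam_max + lam_min + 2 * rho_t)  # (15af)
    assert radius < 1.0  # spectral radius of theta_t * B is below 1
    return theta
```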

V-B Optimization of ξt\xi_{t}

Proposition 3

For fixed $\theta_t$, the optimal $\xi_t$ minimizing $v_{tt}^\gamma$ is given by $\xi_1^{\rm opt}=1$ and, for $t\geq 2$,

\xi_t^{\rm opt} = \frac{c_{t2}c_{t0}+c_{t3}}{c_{t1}c_{t0}+c_{t2}},   (15ag)

where

c_{t0} = \textstyle\sum_{i=1}^{t-1} p_{ti}/w_0, \qquad c_{t1} = \sigma^2 w_0 + v_{tt}^\phi\bar{w}_{00},   (15aha)
c_{t2} = -\textstyle\sum_{i=1}^{t-1}\vartheta_{ti}\big(\sigma^2 w_{t-i} + v_{ti}^\phi\bar{w}_{0,t-i}\big),   (15ahb)
c_{t3} = \textstyle\sum_{i=1}^{t-1}\textstyle\sum_{j=1}^{t-1}\vartheta_{ti}\vartheta_{tj}\big(\sigma^2 w_{2t-i-j} + v_{ij}^\phi\bar{w}_{t-i,t-j}\big).   (15ahc)
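A one-line sketch of (15ag); the coefficients $c_{t0},\dots,c_{t3}$ are assumed precomputed via (15ah) (the second-order terms $\bar{w}$ are defined in [1] and omitted here):

```python
def xi_opt(t, c_t0=None, c_t1=None, c_t2=None, c_t3=None):
    """Optimal xi_t in (15ag); xi_1^opt = 1 by convention."""
    if t == 1:
        return 1.0
    return (c_t2 * c_t0 + c_t3) / (c_t1 * c_t0 + c_t2)  # (15ag)
```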

V-C Optimization of \scaleto𝛇8ptt\scaleto{\bm{\zeta}}{8pt}_{t}

Let $\bm{\mathcal{V}}_{t+1}^\phi$ be the covariance matrix of the input errors of $\bar{\phi}_t$, i.e., of $\{\bm{f}_{t-l_t+2},\cdots,\bm{f}_t,\phi_t(\bm{r}_t)-\bm{x}\}$.

Proposition 4 (Optimal damping)

For fixed $\theta_t$ and $\xi_t$, the optimal $\bm{\zeta}_t$ minimizing $v_{t+1,t+1}^{\bar{\phi}}$ is given by

\bm{\zeta}_t^{\rm opt} = \frac{[\bm{\mathcal{V}}_{t+1}^\phi]^{-1}\bm{1}}{\bm{1}^{\rm T}[\bm{\mathcal{V}}_{t+1}^\phi]^{-1}\bm{1}}.   (15ai)

It is easy to see that the MSE of the current iteration with optimized damping is no worse than that of the previous iteration, which corresponds to the special case $\bm{\zeta}_t=[0,\cdots,1,0]^{\rm T}$. That is, the MSE of MAMP with optimized damping is monotonically decreasing over the iterations. A sketch of (15ai) follows.
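A minimal sketch of the optimal damping (15ai), assuming the covariance matrix of the $l_t$ input errors is available (in practice it is tracked by the state evolution):

```python
import numpy as np

def zeta_opt(V):
    """Optimal damping vector (15ai): minimizes the damped error
    variance subject to the entries of zeta summing to one."""
    ones = np.ones(V.shape[0])
    u = np.linalg.solve(V, ones)   # [V]^{-1} 1
    return u / (ones @ u)          # normalize: 1^T zeta = 1
```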

VI Simulation Results

We study a compressed sensing problem in which $\bm{x}$ follows a symbol-wise Bernoulli-Gaussian distribution, i.e., $\forall i$,

x_i \sim \begin{cases} 0, & \text{with probability } 1-\mu,\\ \mathcal{N}(0,\mu^{-1}), & \text{with probability } \mu. \end{cases}   (15al)

The variance of $x_i$ is normalized to 1. The signal-to-noise ratio (SNR) is defined as ${\rm SNR}=1/\sigma^2$. A sketch of the sampler is given below.
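A minimal sketch sampling (15al); the real-valued signal and the generator seed are illustrative choices:

```python
import numpy as np

def sample_x(N, mu, rng=np.random.default_rng(0)):
    """Sample a Bernoulli-Gaussian signal as in (15al); Var{x_i} = 1."""
    x = np.zeros(N)
    support = rng.random(N) < mu                 # nonzero with probability mu
    x[support] = rng.normal(0.0, np.sqrt(1.0 / mu), size=support.sum())
    return x
```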

Let the SVD of $\bm{A}$ be $\bm{A}=\bm{U\Sigma V}$. The system model in (1) can then be rewritten as [16, 21]

\bm{y} = \bm{U\Sigma V}\bm{x} + \bm{n}.   (15am)

Note that $\bm{U}^{\rm H}\bm{n}$ has the same distribution as $\bm{n}$. Thus, we can assume $\bm{U}=\bm{I}$ without loss of generality. To reduce the computational cost of OAMP/VAMP, we approximate a large random unitary matrix by $\bm{V}=\bm{\Pi F}$, where $\bm{\Pi}$ is a random permutation matrix and $\bm{F}$ is a discrete Fourier transform (DFT) matrix. Note that all the algorithms involved here admit fast implementations for this matrix model. The singular values $\{d_i\}$ are generated as $d_i/d_{i+1}=\kappa^{1/J}$ for $i=1,\dots,J-1$ and $\sum_{i=1}^J d_i^2=N$, where $J=\min\{M,N\}$. Here, $\kappa\geq 1$ controls the condition number of $\bm{A}$. Note that MAMP does not require the SVD structure of $\bm{A}$; it only needs $\bm{A}$ to be right-unitarily-invariant. A sketch of this construction follows.
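A dense sketch of this matrix model (large-scale simulations would use FFTs and index permutations instead of explicit matrices; all names are illustrative):

```python
import numpy as np

def make_A(M, N, kappa, rng=np.random.default_rng(0)):
    """Build A = Sigma @ (Pi @ F): U = I, V = Pi F, geometric singular values."""
    J = min(M, N)
    r = kappa ** (1.0 / J)
    d = r ** np.arange(J - 1, -1, -1.0)     # d_i / d_{i+1} = kappa^(1/J)
    d *= np.sqrt(N / np.sum(d ** 2))        # normalize: sum_i d_i^2 = N
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)  # unitary DFT matrix
    Pi = np.eye(N)[rng.permutation(N)]      # random permutation matrix
    Sigma = np.zeros((M, N))
    Sigma[np.arange(J), np.arange(J)] = d
    return Sigma @ Pi @ F
```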

VI-A Influence of Relaxation Parameters and Damping

Fig. 3 shows the influence of the relaxation parameters $\{\lambda^\dagger,\theta_t,\xi_t\}$ and damping. Without damping (i.e., $L=1$), the convergence of MAMP is not guaranteed. In addition, the optimization of $\{\lambda^\dagger,\theta_t,\xi_t\}$ significantly improves the MSE of MAMP. That is,

  • (i) damping guarantees the convergence of MAMP, and

  • (ii) the relaxation parameters $\{\lambda^\dagger,\theta_t,\xi_t\}$ do not change the fixed point of MAMP, but they can be optimized to improve the convergence speed.

Figure 3: MSE versus the number of iterations for MAMP with different parameters $\{\lambda^\dagger,\theta_t,\xi_t,L\}$. $M=4096$, $N=8192$, $\mu=0.1$, $\kappa=10$ and ${\rm SNR}=30$ dB. "optimal" denotes MAMP with optimized $\{\lambda^\dagger,\theta_t,\xi_t\}$ and damping length $L=3$. The other curves show MAMP with the same parameters as "optimal", except for the one marked on each curve.

VI-B Comparison with AMP and CAMP

Fig. 4 shows the MSE versus the number of iterations for AMP, CAMP, OAMP/VAMP and MAMP. To improve their convergence, both AMP and CAMP are damped. As can be seen, for an ill-conditioned matrix with $\kappa=10$, the MSE performance of AMP is poor. CAMP converges to the same performance as OAMP/VAMP; however, the state evolution (SE) of CAMP is inaccurate, since its damping is applied to the a-posteriori outputs, which breaks the Gaussianity of the estimation errors. MAMP converges to OAMP/VAMP faster than CAMP does. Furthermore, the state evolution of MAMP is accurate, since its damping is applied to the orthogonal outputs, which preserves the Gaussianity of the estimation errors.

Figure 4: MSE versus the number of iterations for AMP, CAMP, OAMP/VAMP and MAMP. $M=8192$, $N=16384$, $\mu=0.1$, $\kappa=10$, $L=3$ and ${\rm SNR}=30$ dB. The curves of AMP and CAMP are from Fig. 2 in [28].

VI-C Influence of High Condition Number and Damping Length

Fig. 5 shows the MSE versus the number of iterations for MAMP with different damping lengths. As can be seen, MAMP converges to OAMP/VAMP for a matrix with the high condition number $\kappa=100$, and the state evolution (SE) of MAMP matches the simulated MSE well. Note that CAMP diverges when $\kappa>15$ (see Fig. 4 in [28]). In addition, when the condition number is large, MAMP with damping length $L=3$ converges significantly faster than with $L=2$. It should be mentioned that $L=3$ is generally sufficient for MAMP, since the MSEs of MAMP are almost the same for all $L\geq 3$.

Figure 5: MSE versus the number of iterations for MAMP and OAMP/VAMP with different damping lengths. $M=4096$, $N=8192$, $\mu=0.1$, ${\rm SNR}=30$ dB, $\kappa=100$, and $L=2$ and $3$.

VII Conclusions

This paper proposes a low-cost MAMP for high-dimensional linear systems with right-unitarily-invariant transform matrices. The proposed MAMP is not only Bayes-optimal (when it has a unique fixed point), but also has complexity comparable to that of AMP. Specifically, the techniques of long memory and orthogonalization are used to achieve the Bayes-optimal solution with a low-complexity MF. The convergence of MAMP is optimized via relaxation parameters and a damping vector. The optimized MAMP is guaranteed to converge to the fixed point of the high-complexity OAMP/VAMP for all right-unitarily-invariant matrices.

References

  • [1] L. Liu, S. Huang, and B. M. Kurkoski, “Memory approximate message passing,” arXiv preprint arXiv:2012.10861, Dec. 2020. [Online] Available: arxiv.org/abs/2012.10861.
  • [2] D. Micciancio, “The hardness of the closest vector problem with preprocessing,” IEEE Trans. Inf. Theory, vol. 47, no. 3, pp. 1212-1215, Mar. 2001.
  • [3] S. Verdú, “Optimum multi-user signal detection,” Ph.D. dissertation, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, Aug. 1984.
  • [4] M. Bayati and A. Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 764–785, Feb. 2011.
  • [5] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” in Proc. Nat. Acad. Sci., vol. 106, no. 45, Nov. 2009.
  • [6] T. Richardson and R. Urbanke, Modern Coding Theory. New York: Cambridge University Press, 2008.
  • [7] K. Takeuchi, T. Tanaka, and T. Kawabata, “Performance improvement of iterative multiuser detection for large sparsely-spread CDMA systems by spatial coupling,” IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1768-1794, Apr. 2015.
  • [8] S. Kudekar, T. Richardson, and R. Urbanke, “Threshold saturation via spatial coupling: Why convolutional LDPC ensembles perform so well over the BEC,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 803–834, Feb. 2011.
  • [9] F. Krzakala, M. Mézard, F. Sausset, Y. F. Sun, and L. Zdeborová, “Statistical-physics-based reconstruction in compressed sensing,” Phys. Rev. X, vol. 2, pp. 021 005–1–18, May 2012.
  • [10] D. L. Donoho, A. Javanmard, and A. Montanari, “Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7434-7464, Nov. 2013.
  • [11] A. Manoel, F. Krzakala, E. W. Tramel, and L. Zdeborová, “Sparse estimation with the swept approximated message-passing algorithm,” arXiv preprint arXiv:1406.4311, 2014.
  • [12] S. Rangan, A. K. Fletcher, P. Schniter, and U. S. Kamilov, “Inference for generalized linear models via alternating directions and bethe free energy minimization,” IEEE Trans. Inf. Theory, vol. 63, no. 1, pp. 676–697, Jan 2017.
  • [13] J. Vila, P. Schniter, S. Rangan, F. Krzakala, and L. Zdeborová, “Adaptive damping and mean removal for the generalized approximate message passing algorithm,” in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, 2015, pp. 2021–2025.
  • [14] Q. Guo and J. Xi, “Approximate message passing with unitary transformation,” CoRR, vol. abs/1504.04799, 2015. [Online]. Available: http://arxiv.org/abs/1504.04799
  • [15] Z. Yuan, Q. Guo, and M. Luo, “Approximate message passing with unitary transformation for robust bilinear recovery,” IEEE Trans. Signal Process., doi: 10.1109/TSP.2020.3044847.
  • [16] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, 2017, preprint arXiv:1602.06509, 2016.
  • [17] T. P. Minka, “Expectation propagation for approximate bayesian inference,” in Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence, 2001, pp. 362–369.
  • [18] M. Opper and O. Winther, “Expectation consistent approximate inference,” Journal of Machine Learning Research, vol. 6, no. Dec, pp. 2177–2204, 2005.
  • [19] B. Çakmak and M. Opper, “Expectation propagation for approximate inference: Free probability framework,” arXiv preprint arXiv:1801.05411, 2018.
  • [20] K. Takeuchi, “Rigorous dynamics of expectation-propagation-based signal recovery from unitarily invariant measurements,” IEEE Trans. Inf. Theory, vol. 66, no. 1, pp. 368-386, Jan. 2020.
  • [21] S. Rangan, P. Schniter, and A. Fletcher, “Vector approximate message passing,” IEEE Trans. Inf. Theory, vol. 65, no. 10, pp. 6664-6684, Oct. 2019.
  • [22] M. Tuchler, A. C. Singer and R. Koetter, “Minimum mean squared error equalization using a priori information,” IEEE Trans. Signal Process., vol. 50, no. 3, pp. 673-683, March 2002.
  • [23] L. Liu, Y. Chi, C. Yuen, Y. L. Guan, and Y. Li, “Capacity-achieving MIMO-NOMA: Iterative LMMSE detection,” IEEE Trans. Signal Process., vol. 67, no. 7, pp. 1758-1773, April 2019.
  • [24] J. Ma, L. Liu, X. Yuan, and L. Ping, “On orthogonal AMP in coded linear vector systems,” IEEE Trans. Wireless Commun., vol. 18, no. 12, pp. 5658-5672, Dec. 2019.
  • [25] L. Liu, C. Liang, J. Ma, and L. Ping, “Capacity optimality of AMP in coded systems,” Submitted to IEEE Trans. Inf. Theory, 2019. [Online] Available: arxiv.org/pdf/1901.09559.pdf
  • [26] M. Opper, B. Çakmak, and O. Winther, “A theory of solving TAP equations for Ising models with general invariant random matrices,” J.Phys. A: Math. Theor., vol. 49, no. 11, p. 114002, Feb. 2016.
  • [27] Z. Fan, “Approximate message passing algorithms for rotationally invariant matrices,” arXiv:2008.11892, 2020.
  • [28] K. Takeuchi, “Bayes-optimal convolutional AMP,” IEEE Trans. Inf. Theory, 2021, DOI: 10.1109/TIT.2021.3077471. [Online] arXiv preprint arXiv:2003.12245v3, 2020.
  • [29] R. Berthier, A. Montanari, and P. M. Nguyen, “State evolution for approximate message passing with non-separable functions,” arXiv preprint arXiv:1708.03950, 2017.
  • [30] A. M. Tulino, G. Caire, S. Verdú, and S. Shamai (Shitz), “Support recovery with sparsely sampled free random matrices,” IEEE Trans. Inf. Theory, vol. 59, no. 7, pp. 4243–4271, Jul. 2013.
  • [31] K. Takeda, S. Uda, and Y. Kabashima, “Analysis of CDMA systems that are characterized by eigenvalue spectrum,” Europhys. Lett., vol. 76, no. 6, pp. 1193-1199, 2006.
  • [32] J. Barbier, N. Macris, A. Maillard, and F. Krzakala, “The mutual information in random linear estimation beyond i.i.d. matrices,” arXiv preprint arXiv:1802.08963, 2018.
  • [33] K. Takeuchi, “A unified framework of state evolution for message-passing algorithms,” arXiv:1901.03041, 2019.