
Abbreviations:
NLMS: normalized least mean square
AEC: acoustic echo cancellation
FIR: finite impulse response
RIR: room impulse response
EM: expectation-maximization
MMSE: minimum mean square error
PDF: probability density function
KLMS: Kalman-based LMS algorithm

The NLMS algorithm with time-variant optimum
stepsize derived from a Bayesian network perspective

Christian Huemmer, Roland Maas, Walter Kellermann
Abstract

In this article, we derive a new stepsize adaptation for the normalized least mean square (NLMS) algorithm by describing the task of linear acoustic echo cancellation from a Bayesian network perspective. Similar to the well-known Kalman filter equations, we model the acoustic wave propagation from the loudspeaker to the microphone by a latent state vector and define a linear observation equation (to model the relation between the state vector and the observation) as well as a linear process equation (to model the temporal progress of the state vector). Based on additional assumptions on the statistics of the random variables in the observation and process equations, we apply the expectation-maximization (EM) algorithm to derive an NLMS-like filter adaptation. By exploiting the conditional independence rules for Bayesian networks, we reveal that the resulting EM-NLMS algorithm has a stepsize update equivalent to the optimal-stepsize calculation proposed by Yamamoto and Kitayama in 1982, which has been adopted in many textbooks. As the main difference, the instantaneous stepsize value is estimated in the M step of the EM algorithm (instead of being approximated by artificially extending the acoustic echo path). The EM-NLMS algorithm is experimentally verified for synthesized scenarios with both white noise and male speech as input signals.

Index Terms:
Adaptive stepsize, NLMS, Bayesian network, machine learning, EM algorithm

I Introduction

Machine learning techniques have been widely applied to signal processing tasks for decades [1, 2]. For example, directed graphical models, termed Bayesian networks, have been shown to provide a powerful framework for modeling causal probabilistic relationships between random variables [3, 4, 5, 6, 7]. In previous work, the update equations of the Kalman filter and the normalized least mean square (NLMS) algorithm have already been derived from a Bayesian network perspective based on a linear relation between the latent room impulse response (RIR) vector and the observation [8, 9].
The NLMS algorithm is one of the most widely used adaptive algorithms in speech signal processing, and a variety of stepsize adaptation schemes have been proposed to improve its system identification performance [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. In this article, we derive a novel NLMS-like filter adaptation (termed EM-NLMS algorithm) by applying the expectation-maximization (EM) algorithm to a probabilistic model for linear system identification. Based on the conditional independence rules for Bayesian networks, it is shown that the normalized stepsize of the EM-NLMS algorithm is equivalent to the one proposed in [10], which is now commonly accepted as the optimum NLMS stepsize rule (see, e.g., [22]). As the main difference relative to [10], the normalized stepsize is here estimated as part of the EM algorithm instead of being approximated by artificially extending the acoustic echo path. For a valid comparison, we review the algorithm of [10] for the linear acoustic echo cancellation (AEC) scenario shown in Fig. 1.

Figure 1: System model for linear AEC with RIR vector $\mathbf{h}_{n}$

The acoustic path between loudspeaker and microphone at time $n$ is modeled by the linear finite impulse response (FIR) filter

\mathbf{h}_{n}=[h_{0,n},h_{1,n},\dots,h_{M-1,n}]^{T}   (1)

with time-variant coefficients $h_{\kappa,n}$, where $\kappa=0,\dots,M-1$. The observation equation models the microphone sample $d_{n}$:

d_{n}=\mathbf{x}^{T}_{n}\mathbf{h}_{n}+v_{n},   (2)

with the additive variable $v_{n}$ modeling near-end interferences and the observed input signal vector $\mathbf{x}_{n}=[x_{n},x_{n-1},\dots,x_{n-M+1}]^{T}$ capturing the time-domain samples $x_{n}$. The iterative estimation of the RIR vector by the adaptive FIR filter $\mathbf{\hat{h}}_{n}$ is realized by the update rule

\mathbf{\hat{h}}_{n}=\mathbf{\hat{h}}_{n-1}+\lambda_{n}\mathbf{x}_{n}e_{n},   (3)

with the stepsize $\lambda_{n}$ and the error signal

e_{n}=d_{n}-\mathbf{x}_{n}^{T}\mathbf{\hat{h}}_{n-1}   (4)

relating the observation $d_{n}$ and its estimate $\hat{d}_{n}=\mathbf{x}_{n}^{T}\mathbf{\hat{h}}_{n-1}$. In [10], the optimal choice of $\lambda_{n}$ has been approximated as:

\lambda_{n}\approx\frac{1}{M}\frac{\mathcal{E}\{\|\mathbf{h}_{n}-\mathbf{\hat{h}}_{n-1}\|^{2}_{2}\}}{\mathcal{E}\{e_{n}^{2}\}},   (5)

where $\|\cdot\|_{2}$ denotes the Euclidean norm and $\mathcal{E}\{\cdot\}$ the expectation operator. As the true echo path $\mathbf{h}_{n}$ is unobservable, the numerator in (5) cannot be computed, and $\lambda_{n}$ is further approximated by introducing a delay of $N_{T}$ coefficients into the echo path $\mathbf{h}_{n}$. Moreover, a recursive approximation of the denominator in (5) is applied using the forgetting factor $\eta$ [22, 23]. The resulting stepsize approximation

\lambda_{n}\approx\frac{1}{N_{T}}\frac{\sum_{\kappa=0}^{N_{T}-1}\hat{h}^{2}_{\kappa,n-1}}{(1-\eta)e_{n}^{2}+\eta\,\mathcal{E}\{e_{n-1}^{2}\}}   (6)

leads to oscillations, which have to be addressed by limiting the absolute value of $\lambda_{n}$ [24].
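For illustration, the following Python sketch implements the filter update (3), the error signal (4), and the stepsize approximation (6) together with the limiting of $|\lambda_{n}|$ mentioned above. The buffering of the input vector, the regularization constant, and the clipping threshold of 0.5 are assumptions of this example and are not prescribed by [10].

    import numpy as np

    def adaptive_nlms(x, d, M, N_T=5, eta=0.9, lam_max=0.5, eps=1e-2):
        """Sketch of the NLMS update (3)-(4) with the stepsize approximation (6).

        x, d : loudspeaker and microphone samples (1-D arrays of equal length)
        M    : adaptive filter length
        N_T  : artificial delay of the echo path used in (6)
        eta  : forgetting factor of the recursive error-power estimate
        """
        h_hat = np.zeros(M)          # adaptive FIR filter \hat{h}_{n-1}
        x_buf = np.zeros(M)          # input vector x_n (most recent sample first)
        e_pow = 0.1                  # recursive estimate of E{e_n^2}
        errors = np.zeros(len(d))
        for n in range(len(d)):
            x_buf = np.roll(x_buf, 1)
            x_buf[0] = x[n]
            e = d[n] - x_buf @ h_hat                          # error signal (4)
            e_pow = (1 - eta) * e**2 + eta * e_pow            # denominator of (6)
            lam = np.sum(h_hat[:N_T]**2) / (N_T * (e_pow + eps))  # stepsize (6)
            lam = np.clip(lam, -lam_max, lam_max)             # limit |lambda_n| against oscillations
            h_hat = h_hat + lam * x_buf * e                   # filter update (3)
            errors[n] = e
        return h_hat, errors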

TABLE I: Relation between the NLMS algorithm following [10] and the proposed EM-NLMS algorithm
NLMS algorithm [10]                                                 EM-NLMS algorithm
Norm. stepsize $\lambda_{n}$ (5)   equivalent to (Section III)      E step
Estimation of $\lambda_{n}$ (6)    replaced by (Subsec. II-C)       M step

In this article, we derive the EM-NLMS algorithm, which applies the filter update of (3) using the stepsize in (5), where $\lambda_{n}$ is estimated in the M step of the EM algorithm instead of being approximated using (6).

This article is structured as follows: In Section II, we propose a probabilistic model for the linear AEC scenario of Fig. 1 and derive the EM-NLMS algorithm, which is revealed in Section III to be similar to the NLMS algorithm proposed in [10]. As the main difference (cf. Table I), the stepsize is estimated in the M step of the EM algorithm instead of being approximated by artificially extending the acoustic echo path. In Section IV, the EM-NLMS algorithm is experimentally verified for synthesized scenarios with both white noise and male speech as input signals. Finally, conclusions are drawn in Section V.

II The EM-NLMS algorithm for linear AEC

Throughout this article, the Gaussian probability density function (PDF) of a real-valued length-$M$ vector $\mathbf{z}_{n}$ with mean vector $\boldsymbol{\mu}_{\mathbf{z},n}$ and covariance matrix $\mathbf{C}_{\mathbf{z},n}$ is denoted as

\mathbf{z}_{n}\sim\mathcal{N}\{\mathbf{z}_{n}|\boldsymbol{\mu}_{\mathbf{z},n},\mathbf{C}_{\mathbf{z},n}\}=\frac{|\mathbf{C}_{\mathbf{z},n}|^{-1/2}}{(2\pi)^{M/2}}\exp\left\{-\frac{(\mathbf{z}_{n}-\boldsymbol{\mu}_{\mathbf{z},n})^{T}\mathbf{C}_{\mathbf{z},n}^{-1}(\mathbf{z}_{n}-\boldsymbol{\mu}_{\mathbf{z},n})}{2}\right\},   (7)

where $|\cdot|$ represents the determinant of a matrix. Furthermore, $\mathbf{C}_{\mathbf{z},n}=C_{\mathbf{z},n}\mathbf{I}$ (with identity matrix $\mathbf{I}$) implies the elements of $\mathbf{z}_{n}$ to be mutually statistically independent and of equal variance $C_{\mathbf{z},n}$.
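As a small illustration of this notation (not part of the algorithm itself), the logarithm of the isotropic Gaussian PDF in (7) with $\mathbf{C}_{\mathbf{z},n}=C_{\mathbf{z},n}\mathbf{I}$ can be evaluated without forming the full covariance matrix; the function below is a sketch with illustrative names.

    import numpy as np

    def log_gauss_iso(z, mu, C):
        """log N{z | mu, C*I} for an isotropic covariance, cf. (7)."""
        M = z.size
        diff = z - mu
        # |C*I| = C^M, so the log-determinant term becomes M*log(C)
        return -0.5 * (M * np.log(2 * np.pi * C) + diff @ diff / C)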

II-A Probabilistic AEC model

To describe the linear AEC scenario of Fig. 1 from a Bayesian network perspective, we model the acoustic echo path as a latent state vector $\mathbf{h}_{n}$, defined identically to (1), and capture uncertainties (e.g., due to the limitation to a linear system with a finite set of coefficients) by the additive uncertainty $\mathbf{w}_{n}$. Consequently, the linear process equation and the linear observation equation,

\mathbf{h}_{n}=\mathbf{h}_{n-1}+\mathbf{w}_{n}\quad\text{and}\quad d_{n}=\mathbf{x}^{T}_{n}\mathbf{h}_{n}+v_{n},   (8)

can be jointly represented by the graphical model shown in Fig. 2. The directed links express statistical dependencies between the nodes, and random variables, such as $v_{n}$, are marked as circles. We make the following assumptions on the PDFs of the random variables in Fig. 2:

  • The uncertainty $\mathbf{w}_{n}$ is normally distributed with mean vector $\mathbf{0}$ (of zero-valued entries) and variance $C_{\mathbf{w},n}$:

    \mathbf{w}_{n}\sim\mathcal{N}\{\mathbf{w}_{n}|\mathbf{0},\mathbf{C}_{\mathbf{w},n}\},\quad\mathbf{C}_{\mathbf{w},n}=C_{\mathbf{w},n}\mathbf{I}.   (9)
  • The microphone signal uncertainty $v_{n}$ is assumed to be normally distributed with variance $C_{v,n}$ and zero mean:

    v_{n}\sim\mathcal{N}\{v_{n}|0,C_{v,n}\}.   (10)
  • The posterior distribution $p(\mathbf{h}_{n}|d_{1:n})$ is defined with mean vector $\boldsymbol{\mu}_{\mathbf{h},n}$, variance $C_{\mathbf{h},n}$, and $d_{1:n}=d_{1},\dots,d_{n}$:

    p(\mathbf{h}_{n}|d_{1:n})=\mathcal{N}\{\mathbf{h}_{n}|\boldsymbol{\mu}_{\mathbf{h},n},\mathbf{C}_{\mathbf{h},n}\},\quad\mathbf{C}_{\mathbf{h},n}=C_{\mathbf{h},n}\mathbf{I}.   (11)
Figure 2: Bayesian network for linear AEC with latent state vector $\mathbf{h}_{n}$

Based on this probabilistic AEC model, we apply the EM algorithm consisting of two parts: In the E step, the filter update is derived based on minimum mean square error (MMSE) estimation (Subsection II-B). In the M step, we predict the model parameters $C_{v,n+1}$ and $C_{\mathbf{w},n+1}$ to estimate the adaptive stepsize value $\lambda_{n+1}$ (Subsection II-C).
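To make the generative model concrete, the following sketch draws synthetic data from the process and observation equations (8) under the Gaussian assumptions (9) and (10). The loudspeaker signal, the variances, and the initial echo path are arbitrary illustrative choices of this example.

    import numpy as np

    def simulate_aec_model(x, h0, C_w, C_v, seed=0):
        """Draw microphone samples d_n from the process/observation model (8)-(10).

        x   : loudspeaker samples, h0 : initial echo path (length M)
        C_w : process-noise variance (9), C_v : observation-noise variance (10)
        """
        rng = np.random.default_rng(seed)
        M = h0.size
        h = h0.copy()
        x_buf = np.zeros(M)
        d = np.zeros(len(x))
        for n in range(len(x)):
            h = h + np.sqrt(C_w) * rng.standard_normal(M)              # process equation in (8)
            x_buf = np.roll(x_buf, 1)
            x_buf[0] = x[n]
            d[n] = x_buf @ h + np.sqrt(C_v) * rng.standard_normal()    # observation equation in (8)
        return d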

II-B E step: Inference of the state vector

The MMSE estimation of the state vector identifies the mean vector of the posterior distribution as the estimate $\mathbf{\hat{h}}_{n}$:

\mathbf{\hat{h}}_{n}=\underset{\mathbf{\tilde{h}}_{n}}{\operatorname{argmin}}\;\mathcal{E}\{\|\mathbf{\tilde{h}}_{n}-\mathbf{h}_{n}\|_{2}^{2}\}=\mathcal{E}\{\mathbf{h}_{n}|d_{1:n}\}=\boldsymbol{\mu}_{\mathbf{h},n}.   (12)

Due to the linear relations between the variables in (2) and (8), and under the restriction to a linear estimator of $\mathbf{\hat{h}}_{n}$ and normally distributed random variables, the MMSE estimation is analytically tractable [9]. Exploiting the product rules for linear Gaussian models and the conditional independence properties of the Bayesian network in Fig. 2, the filter update can be derived as a special case of the Kalman filter equations [9, p. 639]:

\mathbf{\hat{h}}_{n}=\mathbf{\hat{h}}_{n-1}+\boldsymbol{\Lambda}_{n}\mathbf{x}_{n}e_{n},   (13)

with the stepsize matrix

\boldsymbol{\Lambda}_{n}=\frac{\mathbf{C}_{\mathbf{h},n-1}+\mathbf{C}_{\mathbf{w},n}}{\mathbf{x}^{T}_{n}(\mathbf{C}_{\mathbf{h},n-1}+\mathbf{C}_{\mathbf{w},n})\mathbf{x}_{n}+C_{v,n}}   (14)

and the update of the covariance matrix given as

\mathbf{C}_{\mathbf{h},n}=\left(\mathbf{I}-\boldsymbol{\Lambda}_{n}\mathbf{x}_{n}\mathbf{x}^{T}_{n}\right)(\mathbf{C}_{\mathbf{h},n-1}+\mathbf{C}_{\mathbf{w},n}).   (15)

By inserting (9) and (11), we can rewrite the filter update of (13) as the filter update defined in (3) with the scalar stepsize

\lambda_{n}=\frac{C_{\mathbf{h},n-1}+C_{\mathbf{w},n}}{\mathbf{x}^{T}_{n}\mathbf{x}_{n}(C_{\mathbf{h},n-1}+C_{\mathbf{w},n})+C_{v,n}}.   (16)

Finally, the update of $C_{\mathbf{h},n}$ is approximated following (11) as

C_{\mathbf{h},n}\stackrel{(11)}{=}\frac{\text{diag}\{\mathbf{C}_{\mathbf{h},n}\}}{M}\stackrel{(15)}{=}\left(1-\lambda_{n}\frac{\mathbf{x}^{T}_{n}\mathbf{x}_{n}}{M}\right)(C_{\mathbf{h},n-1}+C_{\mathbf{w},n}),   (17)

where $\text{diag}\{\cdot\}$ adds up the diagonal elements of a matrix.
Before showing the equality of the stepsize updates in (16) and (5) in Section III, we propose a new alternative for estimating $\lambda_{n}$ in (16) by deriving the updates of the model parameters $C_{\mathbf{w},n}$ and $C_{v,n}$ in the following subsection.
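In compact form, the E step therefore amounts to the scalar stepsize (16), the filter update (3), and the variance update (17). The following Python sketch (with illustrative variable names and an assumed regularization constant) summarizes one such step.

    import numpy as np

    def em_nlms_e_step(x_n, d_n, h_hat, C_h, C_w, C_v, eps=1e-2):
        """One E step: stepsize (16), filter update (3), variance update (17)."""
        M = x_n.size
        energy = x_n @ x_n
        lam = (C_h + C_w) / (energy * (C_h + C_w) + C_v + eps)   # scalar stepsize (16)
        e = d_n - x_n @ h_hat                                    # error signal (4)
        h_hat = h_hat + lam * x_n * e                            # filter update (3)
        C_h = (1.0 - lam * energy / M) * (C_h + C_w)             # posterior variance (17)
        return h_hat, C_h, e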

II-C M step: Online learning of the model parameters

In the M step, we predict the model parameters for the following time instant. Although the maximum likelihood estimation is analytically tractable, we apply the EM algorithm to derive an online estimator: In order to update $\theta_{n}=\{C_{v,n},C_{\mathbf{w},n}\}$ to the new parameters $\theta^{\text{new}}_{n}=\{C^{\text{new}}_{v,n},C^{\text{new}}_{\mathbf{w},n}\}$, the lower bound

\mathcal{E}_{\mathbf{h}_{1:n}|\theta_{1:n}}\{\ln\left(p(d_{1:n},\mathbf{h}_{1:n}|\theta_{1:n})\right)\}\leq\ln p(d_{1:n}|\theta_{1:n}),   (18)

is maximized, where $\theta_{1:n}=\{C_{v,1:n},C_{\mathbf{w},1:n}\}$. For this, the PDF $p(d_{1:n},\mathbf{h}_{1:n}|\theta_{1:n})$ is determined by applying the decomposition rules for Bayesian networks [9]:

p(d_{1:n},\mathbf{h}_{1:n}|\theta_{1:n})=p(\mathbf{h}_{n}|\mathbf{h}_{n-1},C_{\mathbf{w},n}\mathbf{I})\,p(d_{n}|\mathbf{h}_{n},C_{v,n})\prod_{m=1}^{n-1}p(\mathbf{h}_{m}|\mathbf{h}_{m-1},C_{\mathbf{w},m}\mathbf{I})\,p(d_{m}|\mathbf{h}_{m},C_{v,m}).   (19)

Next, we take the natural logarithm $\ln(\cdot)$ of $p(d_{1:n},\mathbf{h}_{1:n}|\theta_{1:n})$, replace $\theta_{n}$ by $\theta^{\text{new}}_{n}$, and maximize the lower bound in (18) with respect to $\theta^{\text{new}}_{n}$:

\theta^{\text{new}}_{n}=\underset{C^{\text{new}}_{\mathbf{w},n}}{\operatorname{argmax}}\;\mathcal{E}_{\mathbf{h}_{1:n}|\theta_{n}}\{\ln\left(p(\mathbf{h}_{n}|\mathbf{h}_{n-1},C^{\text{new}}_{\mathbf{w},n}\mathbf{I})\right)\}+\underset{C^{\text{new}}_{v,n}}{\operatorname{argmax}}\;\mathcal{E}_{\mathbf{h}_{1:n}|\theta_{n}}\{\ln\left(p(d_{n}|\mathbf{h}_{n},C^{\text{new}}_{v,n})\right)\},   (20)

where we apply two separate maximizations, starting with the estimation of $C^{\text{new}}_{v,n}$ by inserting

\ln(p(d_{n}|\mathbf{h}_{n},C^{\text{new}}_{v,n}))\stackrel{(8)}{=}-\frac{\ln(2\pi C^{\text{new}}_{v,n})}{2}-\frac{(d_{n}-\mathbf{x}^{T}_{n}\mathbf{h}_{n})^{2}}{2C^{\text{new}}_{v,n}}   (21)

into (20). This leads to the instantaneous estimate:

C^{\text{new}}_{v,n}=\mathcal{E}_{\mathbf{h}_{1:n}|\theta_{n}}\{(d_{n}-\mathbf{x}^{T}_{n}\mathbf{h}_{n})^{2}\}   (22)
\phantom{C^{\text{new}}_{v,n}}=d_{n}^{2}+\mathbf{x}^{T}_{n}(C_{\mathbf{h},n}\mathbf{I}+\mathbf{\hat{h}}_{n}\mathbf{\hat{h}}^{T}_{n})\mathbf{x}_{n}-2d_{n}\mathbf{x}^{T}_{n}\mathbf{\hat{h}}_{n}   (23)
\phantom{C^{\text{new}}_{v,n}}=(d_{n}-\mathbf{x}^{T}_{n}\mathbf{\hat{h}}_{n})^{2}+\mathbf{x}^{T}_{n}\mathbf{x}_{n}C_{\mathbf{h},n}.   (24)

The variance (of the microphone signal uncertainty) $C^{\text{new}}_{v,n}$ in (24) consists of two components, which can be interpreted as follows [25]: The first term in (24) is given as the squared error signal after filter adaptation and is influenced by near-end interferences like background noise. The second term in (24) depends on the signal energy $\mathbf{x}^{T}_{n}\mathbf{x}_{n}$ and the variance $C_{\mathbf{h},n}$, which implies that it considers uncertainties in the linear echo path model. Similar to the derivation for $C^{\text{new}}_{v,n}$, we insert

\ln(p(\mathbf{h}_{n}|\mathbf{h}_{n-1},C^{\text{new}}_{\mathbf{w},n}\mathbf{I}))\stackrel{(8)}{=}-\frac{M\ln(2\pi C^{\text{new}}_{\mathbf{w},n})}{2}-\frac{(\mathbf{h}_{n}-\mathbf{h}_{n-1})^{T}(\mathbf{h}_{n}-\mathbf{h}_{n-1})}{2C^{\text{new}}_{\mathbf{w},n}}   (25)

into (20) to derive the instantaneous estimate of $C^{\text{new}}_{\mathbf{w},n}$:

C^{\text{new}}_{\mathbf{w},n}=\frac{1}{M}\mathcal{E}_{\mathbf{h}_{1:n}|\theta_{n}}\{(\mathbf{h}_{n}-\mathbf{h}_{n-1})^{T}(\mathbf{h}_{n}-\mathbf{h}_{n-1})\}   (26)
\phantom{C^{\text{new}}_{\mathbf{w},n}}\stackrel{(11)}{=}C_{\mathbf{h},n}-C_{\mathbf{h},n-1}+\frac{1}{M}\left(\mathbf{\hat{h}}_{n}^{T}\mathbf{\hat{h}}_{n}-\mathbf{\hat{h}}_{n-1}^{T}\mathbf{\hat{h}}_{n-1}\right),   (27)

where we employed the statistical independence between $\mathbf{w}_{n}$ and $\mathbf{h}_{n-1}$. Equation (27) implies the estimation of $C^{\text{new}}_{\mathbf{w},n}$ as the difference of the filter-tap autocorrelations between the time instants $n$ and $n-1$. Finally, the updated values in $\theta^{\text{new}}_{n}$ are used as initialization for the following time step, so that

\theta_{n+1}:=\theta^{\text{new}}_{n}\;\rightarrow\;C_{\mathbf{w},n+1}:=C^{\text{new}}_{\mathbf{w},n},\;C_{v,n+1}:=C^{\text{new}}_{v,n}.   (28)
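Accordingly, one M step reduces to the two instantaneous estimates (24) and (27), carried over to the next time instant via (28). The sketch below summarizes this with illustrative variable names; the flooring remark in the comment is a practical assumption and not part of the derivation.

    import numpy as np

    def em_nlms_m_step(x_n, d_n, h_hat, h_hat_prev, C_h, C_h_prev):
        """One M step: instantaneous estimates (24), (27) used as C_{v,n+1}, C_{w,n+1} per (28)."""
        M = x_n.size
        e_post = d_n - x_n @ h_hat                      # error signal after filter adaptation
        C_v_next = e_post**2 + (x_n @ x_n) * C_h        # (24)
        C_w_next = C_h - C_h_prev + (h_hat @ h_hat - h_hat_prev @ h_hat_prev) / M   # (27)
        # Note: (27) can become negative in finite precision; a practical implementation
        # might floor C_w_next at a small nonnegative value (an assumption beyond the text).
        return C_v_next, C_w_next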

III Comparison between the EM-NLMS algorithm and the NLMS algorithm proposed in [10]

In this section, we compare the proposed EM-NLMS algorithm to the NLMS algorithm reviewed in Section I and show the equality between the adaptive stepsizes in (5) and (16). We reformulate the stepsize update in (16) by applying the conditional independence rules for Bayesian networks [9]: First, we exploit the equalities

\mathbf{C}_{\mathbf{h},n}\stackrel{(11)}{=}C_{\mathbf{h},n}\mathbf{I}\stackrel{(12)}{=}\mathcal{E}\{(\mathbf{h}_{n}-\mathbf{\hat{h}}_{n})(\mathbf{h}_{n}-\mathbf{\hat{h}}_{n})^{T}\},\qquad\mathbf{C}_{\mathbf{w},n}\stackrel{(9)}{=}C_{\mathbf{w},n}\mathbf{I}=\mathcal{E}\{\mathbf{w}_{n}\mathbf{w}_{n}^{T}\},   (29)

which lead to the following relations:

C_{\mathbf{h},n}=\frac{\mathcal{E}\{(\mathbf{h}_{n}-\mathbf{\hat{h}}_{n})^{T}(\mathbf{h}_{n}-\mathbf{\hat{h}}_{n})\}}{M}=\frac{\mathcal{E}\{\|\mathbf{h}_{n}-\mathbf{\hat{h}}_{n}\|^{2}_{2}\}}{M},\qquad C_{\mathbf{w},n}=\frac{\mathcal{E}\{\mathbf{w}_{n}^{T}\mathbf{w}_{n}\}}{M}=\frac{\mathcal{E}\{\|\mathbf{w}_{n}\|^{2}_{2}\}}{M}.   (30)

Second, it can be seen in Fig. 2 that the state vector $\mathbf{h}_{n-1}$ and the uncertainty $\mathbf{w}_{n}$ are statistically independent, as they share a head-to-head relationship with respect to the latent vector $\mathbf{h}_{n}$. As a consequence, the numerator in (16) can be rewritten as

C_{\mathbf{h},n-1}+C_{\mathbf{w},n}\stackrel{(30)}{=}\frac{\mathcal{E}\{\|\mathbf{h}_{n-1}-\mathbf{\hat{h}}_{n-1}\|^{2}_{2}\}}{M}+\frac{\mathcal{E}\{\|\mathbf{w}_{n}\|^{2}_{2}\}}{M}\stackrel{(8)}{=}\frac{\mathcal{E}\{\|\mathbf{h}_{n}-\mathbf{\hat{h}}_{n-1}\|^{2}_{2}\}}{M}.   (31)

Finally, we consider the mean of the squared error signal

\mathcal{E}\{e_{n}^{2}\}\stackrel{(2),(4)}{=}\mathcal{E}\{(\mathbf{x}_{n}^{T}(\mathbf{h}_{n}-\mathbf{\hat{h}}_{n-1})+v_{n})^{2}\},   (32)

which is not conditioned on the microphone signal $d_{n}$. By applying the conditional independence rules to the Bayesian network in Fig. 2, the head-to-head relationship with respect to $d_{n}$ implies that $v_{n}$ is statistically independent of $\mathbf{h}_{n-1}$ and $\mathbf{w}_{n}$, respectively. Consequently, we can rewrite (32) as:

\mathcal{E}\{e_{n}^{2}\}\stackrel{(10)}{=}\mathbf{x}_{n}^{T}\mathcal{E}\{(\mathbf{h}_{n}-\mathbf{\hat{h}}_{n-1})(\mathbf{h}_{n}-\mathbf{\hat{h}}_{n-1})^{T}\}\mathbf{x}_{n}+C_{v,n}\stackrel{(8),(29)}{=}\mathbf{x}^{T}_{n}\mathbf{x}_{n}(C_{\mathbf{h},n-1}+C_{\mathbf{w},n})+C_{v,n}.   (33)

The insertion of (31) and (33) into the stepsize defined in (16) yields the identical expression for $\lambda_{n}$ as in (5). The main difference of the proposed EM-NLMS algorithm is that the model parameters $C_{\mathbf{h},n}$ and $C_{\mathbf{w},n}$ (and consequently the normalized stepsize $\lambda_{n}$) are estimated in the M step of the EM algorithm instead of being approximated using (6).

IV Experimental results

This section focuses on the experimental verification of the EM-NLMS algorithm ("EM-NLMS") in comparison to the adaptive-stepsize NLMS algorithm described in Section I ("Adapt. NLMS") and the conventional NLMS algorithm ("Conv. NLMS") with a fixed stepsize. An overview of the algorithms, including the individually tuned model parameters, is shown in Table II. Note the regularization of all three stepsize updates by the additive constant $\epsilon=0.01$ to avoid a division by zero. For the evaluation, we synthesize the microphone signal by convolving the loudspeaker signal with an RIR vector measured in a room with $T_{60}=100$ ms (filter length $M=512$ at a sampling rate of 16 kHz). This is realized for both white noise and a male speech signal as loudspeaker signals. Furthermore, background noise is simulated by adding Gaussian white noise at a global signal-to-noise ratio of 20 dB. The comparison is realized in terms of the stepsize $\alpha_{n}$ and the system distance $\Delta h_{n}$ as a measure for the system identification performance:

\Delta h_{n}=10\log_{10}\frac{\|\mathbf{\hat{h}}_{n}-\mathbf{h}_{n}\|_{2}^{2}}{\|\mathbf{h}_{n}\|_{2}^{2}}\;\text{dB},\qquad\alpha_{n}=\lambda_{n}\,(\mathbf{x}^{T}_{n}\mathbf{x}_{n}).   (34)
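Both evaluation quantities in (34) translate directly into code; the small sketch below assumes that the true RIR vector $\mathbf{h}_{n}$ is available, as it is in the synthesized scenarios.

    import numpy as np

    def system_distance_db(h_hat, h):
        """System distance Delta h_n in dB, cf. (34)."""
        return 10.0 * np.log10(np.sum((h_hat - h)**2) / np.sum(h**2))

    def normalized_stepsize(lam, x_n):
        """Normalized stepsize alpha_n = lambda_n * x_n^T x_n, cf. (34)."""
        return lam * (x_n @ x_n)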

The results for white noise as input signal are illustrated in Fig. 3. Note that in Fig. 3a) the EM-NLMS shows the best system identification compared to the Adapt. NLMS and the Conv. NLMS. As depicted in Fig. 3b), the stepsize $\alpha_{n}$ of the EM-NLMS and the Adapt. NLMS decreases from a value of 0.5, with the stepsize of the EM-NLMS decaying more slowly.
For male speech as input signal, we improve the convergence of the Conv. NLMS by setting a fixed threshold to stop adaptation ($\alpha_{n}=0$) in speech pauses. Furthermore, the absolute value of $\lambda_{n}$ for the Adapt. NLMS is limited to 0.5 (for a heuristic justification see [24]). As illustrated in Fig. 4a), the EM-NLMS again shows the best system identification compared to the Adapt. NLMS and the Conv. NLMS. By focusing on a small time frame, we can see in Fig. 4b) that the stepsize $\alpha_{n}$ of the EM-NLMS algorithm is not restricted to the values 0 and 0.5 (as for the Conv. NLMS) and is not affected by oscillations (as for the Adapt. NLMS).
Note that the only relevant increase in computational complexity of the EM-NLMS relative to the Conv. NLMS is caused by the scalar product $\mathbf{\hat{h}}_{n}^{T}\mathbf{\hat{h}}_{n}$ for the calculation of $C_{\mathbf{w},n}$ (cf. Table II), which seems relatively small compared to other sophisticated stepsize adaptation algorithms.

TABLE II: Realizations of the EM-NLMS algorithm ("EM-NLMS"), the NLMS algorithm due to [10] ("Adapt. NLMS") and the conventional NLMS algorithm ("Conv. NLMS")
Filter update (all algorithms):  \mathbf{\hat{h}}_{n}=\mathbf{\hat{h}}_{n-1}+\lambda_{n}\mathbf{x}_{n}e_{n}
EM-NLMS:
  \lambda_{n}=\frac{C_{\mathbf{h},n-1}+C_{\mathbf{w},n}}{\mathbf{x}^{T}_{n}\mathbf{x}_{n}(C_{\mathbf{h},n-1}+C_{\mathbf{w},n})+C_{v,n}+\epsilon}
  C_{\mathbf{h},n}=\left(1-\lambda_{n}\frac{\mathbf{x}^{T}_{n}\mathbf{x}_{n}}{M}\right)(C_{\mathbf{h},n-1}+C_{\mathbf{w},n})
  C_{v,n+1}=\left(d_{n}-\mathbf{x}^{T}_{n}\mathbf{\hat{h}}_{n}\right)^{2}+\mathbf{x}^{T}_{n}\mathbf{x}_{n}C_{\mathbf{h},n}
  C_{\mathbf{w},n+1}=C_{\mathbf{h},n}-C_{\mathbf{h},n-1}+\frac{\mathbf{\hat{h}}_{n}^{T}\mathbf{\hat{h}}_{n}-\mathbf{\hat{h}}_{n-1}^{T}\mathbf{\hat{h}}_{n-1}}{M}
  Initialization: C_{\mathbf{h},0}=C_{\mathbf{w},0}=C_{v,0}=0.1,\;\epsilon=0.01
Adapt. NLMS:
  \lambda_{n}\approx\frac{1}{N_{T}}\frac{\sum_{\kappa=0}^{N_{T}-1}\hat{h}^{2}_{\kappa,n-1}}{(1-\eta)e_{n}^{2}+\eta\,\mathcal{E}\{e_{n-1}^{2}\}+\epsilon}
  Parameters: N_{T}=5,\;\eta=0.9,\;e_{0}^{2}=0.1,\;\epsilon=0.01
Conv. NLMS:
  \lambda_{n}=\frac{0.5}{\mathbf{x}^{T}_{n}\mathbf{x}_{n}+\epsilon},\;\epsilon=0.01
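Read together, the rows of Table II define one EM-NLMS iteration per sample. The following Python sketch combines them into a minimal per-sample loop under the initializations listed in the table; the input-vector buffering and the synthetic usage example at the end are assumptions of this illustration, not the experimental setup of Section IV.

    import numpy as np

    def em_nlms(x, d, M, C_h=0.1, C_w=0.1, C_v=0.1, eps=1e-2):
        """EM-NLMS per Table II: E step (stepsize, filter, variance), then M step (C_v, C_w)."""
        h_hat = np.zeros(M)
        x_buf = np.zeros(M)
        for n in range(len(d)):
            x_buf = np.roll(x_buf, 1)
            x_buf[0] = x[n]
            energy = x_buf @ x_buf
            # E step: stepsize (16), filter update (3), variance update (17)
            lam = (C_h + C_w) / (energy * (C_h + C_w) + C_v + eps)
            e = d[n] - x_buf @ h_hat
            h_hat_new = h_hat + lam * x_buf * e
            C_h_new = (1.0 - lam * energy / M) * (C_h + C_w)
            # M step: predict C_{v,n+1} via (24) and C_{w,n+1} via (27), re-initialize per (28)
            C_v = (d[n] - x_buf @ h_hat_new)**2 + energy * C_h_new
            C_w = C_h_new - C_h + (h_hat_new @ h_hat_new - h_hat @ h_hat) / M
            h_hat, C_h = h_hat_new, C_h_new
        return h_hat

    # Illustrative usage with a random "true" echo path (not the measured RIR of Section IV)
    rng = np.random.default_rng(0)
    M = 512
    h_true = rng.standard_normal(M) * np.exp(-np.arange(M) / 100.0)
    x = rng.standard_normal(16000)
    d = np.convolve(x, h_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))
    h_est = em_nlms(x, d, M)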

V Conclusion

In this article, we derive the EM-NLMS algorithm from a Bayesian network perspective and show its equivalence to the NLMS algorithm initially proposed in [10]. As the main difference, the stepsize is estimated in the M step of the EM algorithm instead of being approximated by artificially extending the acoustic echo path. For the derivation of the EM-NLMS algorithm, which is experimentally shown to be promising for the task of linear AEC, we define a probabilistic model for linear system identification and exploit the product and conditional independence rules of Bayesian networks. Altogether, this article exemplifies the benefit of applying machine learning techniques to classical signal processing tasks.

Figure 3: Comparison of the EM-NLMS algorithm ("EM-NLMS"), the NLMS algorithm due to [10] ("Adapt. NLMS") and the conventional NLMS algorithm ("Conv. NLMS") in terms of the system distance $\Delta h_{n}$ and the stepsize $\alpha_{n}$ for white Gaussian noise as input signal.
Figure 4: Comparison of the EM-NLMS algorithm ("EM-NLMS"), the NLMS algorithm due to [10] ("Adapt. NLMS") and the conventional NLMS algorithm ("Conv. NLMS") in terms of the system distance $\Delta h_{n}$ and the stepsize $\alpha_{n}$ (short time frame for visualization purposes) for male speech as input signal (see the microphone signal $d_{n}$ in Fig. 4c)).

References

  • [1] B. J. Frey and N. Jojic, “A comparison of algorithms for inference and learning in probabilistic graphical models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 9, pp. 1392–1416, Sept. 2005.
  • [2] T. Adali, D. Miller, K. Diamantaras, and J. Larsen, “Trends in machine learning for signal processing [in the spotlight],” IEEE Signal Processing Mag., vol. 28, no. 6, pp. 193–196, Nov. 2011.
  • [3] J.A. Bilmes and C. Bartels, “Graphical model architectures for speech recognition,” IEEE Signal Processing Mag., vol. 22, no. 5, pp. 89–100, Sept. 2005.
  • [4] S.J. Rennie, P. Aarabi, and B.J. Frey, “Variational probabilistic speech separation using microphone arrays,” IEEE Trans. Audio, Speech and Lang. Process., vol. 15, no. 1, pp. 135–149, Jan. 2007.
  • [5] M.J. Wainwright and M.I. Jordan, “Graphical models, exponential families, and variational inference,” Found. Trends Mach. Learning, vol. 1, no. 1–2, pp. 1–305, Dec. 2008.
  • [6] D. Barber and A. Cemgil, “Graphical models for time-series,” IEEE Signal Processing Mag., Nov. 2010.
  • [7] C.W. Maina and J.M. Walsh, “Joint speech enhancement and speaker identification using approximate Bayesian inference,” IEEE Trans. Audio, Speech and Lang. Process., vol. 19, no. 6, pp. 1517–1529, Aug. 2011.
  • [8] R. Maas, C. Huemmer, A. Schwarz, C. Hofmann, and W. Kellermann, “A Bayesian network view on linear and nonlinear acoustic echo cancellation,” in IEEE China Summit Int. Conf. Signal Inform. Process. (ChinaSIP), Xi’an, China, July 2014, pp. 495–499.
  • [9] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, 8th printing 2009.
  • [10] S. Yamamoto and S. Kitayama, “An adaptive echo canceller with variable step gain method,” Trans. IECE Japan, vol. E65, no. 1, pp. 1–8, Jan. 1982.
  • [11] H.-C. Huang and J. Lee, “A variable step size LMS algorithm,” IEEE Trans. Signal Process., vol. 40, no. 7, pp. 1633–1642, July 1992.
  • [12] T. Aboulnasr and K. Mayyas, “A robust variable step-size LMS-type algorithm: analysis and simulations,” IEEE Trans. Signal Process., vol. 45, no. 3, pp. 631–639, Mar. 1997.
  • [13] A. Mader, H. Puder, and G. U. Schmidt, “Step-size control for acoustic echo cancellation filters – an overview,” Signal Process., vol. 80, no. 9, pp. 1697–1719, Sept. 2000.
  • [14] H.-C. Shin, A.H. Sayed, and W.-J. Song, “Variable step-size NLMS and affine projection algorithms,” IEEE Signal Process. Lett., vol. 11, no. 2, pp. 132–135, Feb. 2004.
  • [15] J. Benesty, H. Rey, L. R. Vega, and S. Tressens, “A nonparametric VSS NLMS algorithm,” IEEE Trans. Signal Process., vol. 13, no. 10, pp. 581–584, Oct. 2006.
  • [16] P.A.C. Lopes and J.B. Gerald, “New normalized LMS algorithms based on the Kalman filter,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), May 2007, pp. 117–120.
  • [17] M. Asif Iqbal and S. L. Grant, “Novel variable step size NLMS algorithms for echo cancellation,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Apr. 2008, pp. 241–244.
  • [18] C. Paleologu, J. Benesty, S. L. Grant, and C. Osterwise, “Variable step-size NLMS algorithms designed for echo cancellation,” in IEEE Rec. 43rd Asilomar Conf. Signals, Syst. and Comput., Nov. 2009, pp. 633–637.
  • [19] J.-K. Hwang and Y.-P. Li, “Variable step-size LMS algorithm with a gradient-based weighted average,” IEEE Signal Process. Lett., vol. 16, no. 12, pp. 1043–1046, Dec. 2009.
  • [20] H.-C. Huang and J. Lee, “A new variable step-size NLMS algorithm and its performance analysis,” IEEE Trans. Signal Process., vol. 60, no. 4, pp. 2055–2060, Apr. 2012.
  • [21] H. Zhao and Y. Yu, “Novel adaptive VSS-NLMS algorithm for system identification,” in IEEE 4th Int. Conf. Intell. Control Inform. Process. (ICICIP), June 2013, pp. 760–764.
  • [22] S. Haykin, Adaptive Filter Theory, Prentice Hall, 2002.
  • [23] C. Breining, P. Dreiseitel, E. Hänsler, A. Mader, B. Nitsch, H. Puder, T. Schertler, G. Schmidt, and J. Tilp, “Acoustic echo control,” Signal Process., vol. 16, no. 4, pp. 42–69, July 1999.
  • [24] U. Schultheiß, Über die Adaption eines Kompensators für akustische Echos, Ph.D. thesis, 1988.
  • [25] R. Maas, C. Huemmer, C. Hofmann, and W. Kellermann, “On Bayesian networks in speech signal processing,” in ITG Conf. Speech Commun., Erlangen, Germany, Sept. 2014.