The NLMS algorithm with time-variant optimum
stepsize derived from a Bayesian network perspective
Abstract
In this article, we derive a new stepsize adaptation for the normalized least mean square (NLMS) algorithm by describing the task of linear acoustic echo cancellation from a Bayesian network perspective. Similar to the well-known Kalman filter equations, we model the acoustic wave propagation from the loudspeaker to the microphone by a latent state vector and define a linear observation equation (to model the relation between the state vector and the observation) as well as a linear process equation (to model the temporal progress of the state vector). Based on additional assumptions on the statistics of the random variables in the observation and process equations, we apply the expectation-maximization (EM) algorithm to derive an NLMS-like filter adaptation. By exploiting the conditional independence rules for Bayesian networks, we reveal that the resulting EM-NLMS algorithm has a stepsize update equivalent to the optimal-stepsize calculation proposed by Yamamoto and Kitayama in 1982, which has been adopted in many textbooks. As the main difference, the instantaneous stepsize value is estimated in the M step of the EM algorithm (instead of being approximated by artificially extending the acoustic echo path). The EM-NLMS algorithm is experimentally verified for synthesized scenarios with both white noise and male speech as input signals.
Index Terms: Adaptive stepsize, NLMS, Bayesian network, machine learning, EM algorithm

I Introduction
Machine learning techniques have been widely applied to signal processing tasks for decades [1, 2].
For example, directed graphical models, termed Bayesian networks, have been shown to provide a powerful framework for modeling causal probabilistic relationships between random variables [3, 4, 5, 6, 7]. In previous work,
the update equations of the Kalman filter and the normalized least mean square (NLMS) algorithm have already been
derived from a Bayesian network perspective based on a linear relation between the latent room impulse response (RIR) vector and the observation [8, 9].
The NLMS algorithm is one of the most widely used adaptive algorithms in speech signal processing, and a variety of stepsize adaptation
schemes have been proposed to
improve its system identification performance [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21].
In this article, we derive a novel NLMS-like filter adaptation (termed EM-NLMS algorithm)
by applying the expectation-maximization (EM) algorithm to a probabilistic model for linear system identification.
Based on the conditional independence rules for Bayesian networks, it is shown that the normalized stepsize of the EM-NLMS algorithm
is equivalent to the one proposed in [10], which is now commonly accepted as the optimum NLMS stepsize rule, see e.g. [22].
As the main difference relative to [10], the normalized stepsize is here estimated as part of the EM algorithm instead of being approximated by artificially extending the acoustic echo path.
For a valid comparison, we review the algorithm of [10] for the linear acoustic echo cancellation (AEC) scenario shown in Fig. 1.
The acoustic path between the loudspeaker and the microphone at time $n$ is modeled by the linear finite impulse response (FIR) filter

$\mathbf{w}_n = [w_{n,0}, w_{n,1}, \dots, w_{n,M-1}]^{\mathrm{T}}$ (1)

with time-variant coefficients $w_{n,\kappa}$, where $\kappa \in \{0, \dots, M-1\}$. The observation equation models the microphone sample $y_n$:

$y_n = \mathbf{x}_n^{\mathrm{T}} \mathbf{w}_n + u_n$ (2)

with the additive variable $u_n$ modeling near-end interferences and the observed input signal vector $\mathbf{x}_n = [x_n, x_{n-1}, \dots, x_{n-M+1}]^{\mathrm{T}}$ capturing the $M$ most recent time-domain samples. The iterative estimation of the RIR vector $\mathbf{w}_n$ by the adaptive FIR filter $\hat{\mathbf{w}}_n$ is realized by the update rule
$\hat{\mathbf{w}}_n = \hat{\mathbf{w}}_{n-1} + \beta_n \dfrac{\mathbf{x}_n}{\|\mathbf{x}_n\|_2^2} e_n$ (3)

with the stepsize $\beta_n$ and the error signal

$e_n = y_n - \hat{y}_n = y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_{n-1}$ (4)

relating the observation $y_n$ and its estimate $\hat{y}_n$. In [10], the optimal choice of $\beta_n$ has been approximated as:

$\beta_n^{\mathrm{opt}} \approx \dfrac{\mathcal{E}\{\|\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\|_2^2\} \, \|\mathbf{x}_n\|_2^2}{M \, \mathcal{E}\{e_n^2\}},$ (5)
where $\|\cdot\|_2$ denotes the Euclidean norm and $\mathcal{E}\{\cdot\}$ the expectation operator. As the true echo path $\mathbf{w}_n$ is unobservable, the numerator in (5) cannot be computed; it is approximated by introducing a delay of $D$ coefficients to the echo path $\mathbf{w}_n$, so that the leading $D$ coefficients of $\hat{\mathbf{w}}_n$ provide an estimate of the coefficient error. Moreover, a recursive approximation of the denominator in (5) is applied using the forgetting factor $\lambda$ [22, 23]. The resulting stepsize approximation

$\beta_n \approx \dfrac{\|\mathbf{x}_n\|_2^2 \sum_{\kappa=0}^{D-1} \hat{w}_{n-1,\kappa}^2}{D \, \overline{e_n^2}}, \qquad \overline{e_n^2} = \lambda \, \overline{e_{n-1}^2} + (1-\lambda) \, e_n^2,$ (6)

leads to oscillations, which have to be addressed by limiting the absolute value of $\beta_n$ [24].
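For illustration, the following minimal NumPy sketch implements one sample of this adaptive-stepsize NLMS update, assuming the reconstructed forms of (3), (4) and (6) above; the delay length `D`, forgetting factor `lam` and regularization constant `delta` are illustrative choices, not values from [10].

```python
import numpy as np

def adapt_nlms_step(w_hat, x_vec, y, e2_avg, D=30, lam=0.99, delta=1e-6):
    """One sample of the adaptive-stepsize NLMS of [10] (sketch).

    w_hat: current filter estimate; the echo path is assumed to be
    artificially delayed, so the leading D taps estimate the coefficient error."""
    e = y - x_vec @ w_hat                        # error signal, eq. (4)
    e2_avg = lam * e2_avg + (1.0 - lam) * e**2   # recursive mean of e^2
    x_energy = x_vec @ x_vec
    beta = x_energy * np.sum(w_hat[:D]**2) / (D * e2_avg + delta)  # eq. (6)
    beta = np.clip(beta, 0.0, 0.5)               # limit the stepsize (cf. [24])
    w_hat = w_hat + beta * e * x_vec / (x_energy + delta)          # eq. (3)
    return w_hat, e2_avg
```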
In this article, we derive the EM-NLMS algorithm, which applies the filter update of (3) using the stepsize in (5), where $\beta_n$ is estimated in the M step of the EM algorithm instead of being approximated by using (6).
This article is structured as follows: In Section II, we propose a probabilistic model for the linear AEC scenario of Fig. 1 and derive the EM-NLMS algorithm, which is revealed in Section III to be similar to the NLMS algorithm proposed in [10]. As the main difference (cf. Table I), the stepsize is estimated in the M step of the EM algorithm instead of being approximated by artificially extending the acoustic echo path. In Section IV, the EM-NLMS algorithm is experimentally verified for synthesized scenarios with both white noise and male speech as input signals. Finally, conclusions are drawn in Section V.
II The EM-NLMS algorithm for linear AEC
Throughout this article, the Gaussian probability density function (PDF) of a real-valued length-$M$ vector $\mathbf{a}$ with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is denoted as

$\mathcal{N}(\mathbf{a}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \dfrac{1}{(2\pi)^{M/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\tfrac{1}{2} (\mathbf{a}-\boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{a}-\boldsymbol{\mu})\right),$ (7)

where $|\boldsymbol{\Sigma}|$ represents the determinant of the matrix $\boldsymbol{\Sigma}$. Furthermore, $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_M$ (with the $M \times M$ identity matrix $\mathbf{I}_M$) implies the elements of $\mathbf{a}$ to be mutually statistically independent and of equal variance $\sigma^2$.
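As a small self-check of this notation, the following minimal NumPy sketch evaluates the logarithm of (7) for the isotropic case $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_M$ used throughout the model (function name and interface are illustrative):

```python
import numpy as np

def log_gauss_iso(a, mu, sigma2):
    """Logarithm of the Gaussian PDF in eq. (7) for Sigma = sigma2 * I_M,
    where the determinant reduces to sigma2**M."""
    M = a.size
    diff = a - mu
    return -0.5 * (M * np.log(2.0 * np.pi * sigma2) + diff @ diff / sigma2)
```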
II-A Probabilistic AEC model
To describe the linear AEC scenario of Fig. 1 from a Bayesian network perspective, we model the acoustic echo path as a latent state vector $\mathbf{w}_n$, identically defined as in (1), and capture uncertainties (e.g., due to the limitation to a linear system with a finite set of coefficients) by the additive uncertainty $\Delta\mathbf{w}_n$. Consequently, the linear process equation and the linear observation equation,

$\mathbf{w}_n = \mathbf{w}_{n-1} + \Delta\mathbf{w}_n, \qquad y_n = \mathbf{x}_n^{\mathrm{T}} \mathbf{w}_n + u_n,$ (8)

can be jointly represented by the graphical model shown in Fig. 2. The directed links express statistical dependencies between the nodes, and random variables, such as $\mathbf{w}_n$, are marked as circles. We make the following assumptions on the PDFs of the random variables in Fig. 2:
- The uncertainty $\Delta\mathbf{w}_n$ is normally distributed with zero mean vector $\mathbf{0}_M$ and variance $\sigma_{\Delta,n}^2$:

  $p(\Delta\mathbf{w}_n) = \mathcal{N}(\Delta\mathbf{w}_n; \mathbf{0}_M, \sigma_{\Delta,n}^2 \mathbf{I}_M)$ (9)

- The microphone signal uncertainty $u_n$ is assumed to be normally distributed with variance $\sigma_{u,n}^2$ and zero mean:

  $p(u_n) = \mathcal{N}(u_n; 0, \sigma_{u,n}^2)$ (10)

- The posterior distribution of the state vector is defined with mean vector $\hat{\mathbf{w}}_n$, variance $\phi_n^2$ and $y_{1:n} = [y_1, \dots, y_n]^{\mathrm{T}}$:

  $p(\mathbf{w}_n | y_{1:n}) = \mathcal{N}(\mathbf{w}_n; \hat{\mathbf{w}}_n, \phi_n^2 \mathbf{I}_M)$ (11)
Based on this probabilistic AEC model, we apply the EM algorithm consisting of two parts: In the E step, the filter update is derived based on minimum mean square error (MMSE) estimation (Subsection II-B). In the M step, we predict the model parameters $\sigma_{u,n}^2$ and $\sigma_{\Delta,n}^2$ to estimate the adaptive stepsize value (Subsection II-C).
II-B E step: Inference of the state vector
The MMSE estimation of the state vector $\mathbf{w}_n$ identifies the mean vector of the posterior distribution as the estimate $\hat{\mathbf{w}}_n$:

$\hat{\mathbf{w}}_n = \mathcal{E}\{\mathbf{w}_n | y_{1:n}\}$ (12)
Due to the linear relations between the variables in (2) and (8), and under the restriction to a linear estimator of $\mathbf{w}_n$ and normally distributed random variables, the MMSE estimation is analytically tractable [9]. Exploiting the product rules for linear Gaussian models and the conditional independence properties of the Bayesian network in Fig. 2, the filter update can be derived as a special case of the Kalman filter equations [9, p. 639]:

$\hat{\mathbf{w}}_n = \hat{\mathbf{w}}_{n-1} + \mathbf{K}_n (y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_{n-1})$ (13)

with the stepsize vector

$\mathbf{K}_n = \dfrac{\boldsymbol{\Phi}_{n|n-1} \mathbf{x}_n}{\mathbf{x}_n^{\mathrm{T}} \boldsymbol{\Phi}_{n|n-1} \mathbf{x}_n + \sigma_{u,n}^2}, \qquad \boldsymbol{\Phi}_{n|n-1} = \boldsymbol{\Phi}_{n-1} + \sigma_{\Delta,n}^2 \mathbf{I}_M,$ (14)

and the update of the state covariance matrix given as

$\boldsymbol{\Phi}_n = (\mathbf{I}_M - \mathbf{K}_n \mathbf{x}_n^{\mathrm{T}}) \, \boldsymbol{\Phi}_{n|n-1}$ (15)
By inserting (9) and (11), i.e., $\boldsymbol{\Phi}_{n-1} = \phi_{n-1}^2 \mathbf{I}_M$, we can rewrite the filter update of (13) to the filter update defined in (3) with the scalar stepsize

$\beta_n = \dfrac{(\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \|\mathbf{x}_n\|_2^2}{(\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \|\mathbf{x}_n\|_2^2 + \sigma_{u,n}^2}.$ (16)

Finally, the update of the posterior variance $\phi_n^2$ is approximated following (11) as

$\phi_n^2 \approx \frac{1}{M} \, \mathrm{tr}\{\boldsymbol{\Phi}_n\} = \left(1 - \frac{\beta_n}{M}\right) (\phi_{n-1}^2 + \sigma_{\Delta,n}^2),$ (17)

where $\mathrm{tr}\{\cdot\}$ adds up the diagonal elements of a matrix.
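The E step thus reduces to a few scalar operations per sample. The following minimal NumPy sketch follows the reconstructed equations (3), (16) and (17); the regularizer `delta` is an illustrative addition to avoid division by zero, not part of the derivation.

```python
import numpy as np

def em_nlms_e_step(w_hat, phi2, x_vec, y, sigma2_u, sigma2_dw, delta=1e-6):
    """E step: MMSE filter update with the scalar stepsize of eq. (16)."""
    M = x_vec.size
    p = phi2 + sigma2_dw                       # predicted state variance
    x_energy = x_vec @ x_vec
    beta = p * x_energy / (p * x_energy + sigma2_u + delta)  # stepsize, eq. (16)
    e = y - x_vec @ w_hat                      # error signal, eq. (4)
    w_hat = w_hat + beta * e * x_vec / (x_energy + delta)    # filter update, eq. (3)
    phi2 = (1.0 - beta / M) * p                # posterior variance, eq. (17)
    return w_hat, phi2, e
```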
Before showing the equality of the stepsize updates in (16) and (5) in Section III, we propose a new alternative for estimating the stepsize $\beta_n$ in (16) by deriving the updates of the model parameters $\sigma_{u,n}^2$ and $\sigma_{\Delta,n}^2$ in the following subsection.
II-C M step: Online learning of the model parameters
In the M step, we predict the model parameters for the following time instant. Although the maximum likelihood estimation is analytically tractable, we apply the EM algorithm to derive an online estimator: In order to update the parameters $\Theta_n = \{\sigma_{u,n}^2, \sigma_{\Delta,n}^2\}$ to the new parameters $\Theta_{n+1}$, the lower bound

$\ln p(y_{1:n} | \Theta) \geq \mathcal{L}(q, \Theta) = \int q(\mathbf{w}_n) \ln \dfrac{p(y_{1:n}, \mathbf{w}_n | \Theta)}{q(\mathbf{w}_n)} \, \mathrm{d}\mathbf{w}_n$ (18)

is maximized, where $q(\mathbf{w}_n) = p(\mathbf{w}_n | y_{1:n}, \Theta_n)$. For this, the PDF $p(y_{1:n}, \mathbf{w}_n | \Theta)$ is determined by applying the decomposition rules for Bayesian networks [9]:

$p(y_{1:n}, \mathbf{w}_n | \Theta) = p(y_n | \mathbf{w}_n, \sigma_u^2) \, p(\mathbf{w}_n | y_{1:n-1}, \sigma_\Delta^2) \, p(y_{1:n-1})$ (19)
Next, we take the natural logarithm of (19), replace $\Theta$ by $\Theta_{n+1}$ and maximize the right-hand side of (18) with respect to $\Theta_{n+1}$:

$\Theta_{n+1} = \arg\max_{\Theta} \, \mathcal{E}\{\ln p(y_n | \mathbf{w}_n, \sigma_u^2) + \ln p(\mathbf{w}_n | y_{1:n-1}, \sigma_\Delta^2) \,|\, y_{1:n}\},$ (20)

where we apply two separate maximizations, starting with the estimation of $\sigma_{u,n+1}^2$ by inserting

$p(y_n | \mathbf{w}_n, \sigma_u^2) = \mathcal{N}(y_n; \mathbf{x}_n^{\mathrm{T}} \mathbf{w}_n, \sigma_u^2)$ (21)

into (20). This leads to the instantaneous estimate:

$\hat{\sigma}_{u,n+1}^2 = \mathcal{E}\{(y_n - \mathbf{x}_n^{\mathrm{T}} \mathbf{w}_n)^2 \,|\, y_{1:n}\}$ (22)
$\;\;\; = (y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_n)^2 + \mathbf{x}_n^{\mathrm{T}} \, \mathrm{cov}\{\mathbf{w}_n | y_{1:n}\} \, \mathbf{x}_n$ (23)
$\;\;\; = (y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_n)^2 + \phi_n^2 \, \|\mathbf{x}_n\|_2^2$ (24)
The variance $\hat{\sigma}_{u,n+1}^2$ (of the microphone signal uncertainty) in (24) consists of two components, which can be interpreted as follows [25]: The first term in (24) is given as the squared error signal after filter adaptation and is influenced by near-end interferences like background noise. The second term in (24) depends on the signal energy $\|\mathbf{x}_n\|_2^2$ and the variance $\phi_n^2$, which implies that it considers uncertainties in the linear echo path model. Similar to the derivation for $\hat{\sigma}_{u,n+1}^2$, we insert
$p(\mathbf{w}_n | y_{1:n-1}, \sigma_\Delta^2) = \mathcal{N}(\mathbf{w}_n; \hat{\mathbf{w}}_{n-1}, (\phi_{n-1}^2 + \sigma_\Delta^2) \, \mathbf{I}_M)$ (25)

into (20) to derive the instantaneous estimate of $\sigma_{\Delta,n+1}^2$:

$\hat{\sigma}_{\Delta,n+1}^2 = \frac{1}{M} \, \mathcal{E}\{\|\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\|_2^2 \,|\, y_{1:n}\} - \phi_{n-1}^2$ (26)
$\;\;\; \approx \frac{1}{M} \left(\hat{\mathbf{w}}_n^{\mathrm{T}} \hat{\mathbf{w}}_n - \hat{\mathbf{w}}_{n-1}^{\mathrm{T}} \hat{\mathbf{w}}_{n-1}\right),$ (27)

where we employed the statistical independence between $\mathbf{w}_{n-1}$ and $\Delta\mathbf{w}_n$. Equation (27) implies the estimation of $\hat{\sigma}_{\Delta,n+1}^2$ as the difference of the filter tap autocorrelations between the time instants $n$ and $n-1$. Finally, the updated values in $\Theta_{n+1}$ are used as initialization for the following time step, so that

$\sigma_{u,n+1}^2 = \hat{\sigma}_{u,n+1}^2, \qquad \sigma_{\Delta,n+1}^2 = \hat{\sigma}_{\Delta,n+1}^2.$ (28)
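Under the same reconstructed notation, the M step reduces to two instantaneous parameter estimates per sample. A minimal NumPy sketch follows; the clamping of the variance estimate to non-negative values is a practical safeguard added here, not part of the derivation.

```python
import numpy as np

def em_nlms_m_step(w_hat, w_hat_prev, phi2, x_vec, y):
    """M step: instantaneous estimates of sigma_u^2, eq. (24), and
    sigma_dw^2, eq. (27), used as initialization for the next sample, eq. (28)."""
    M = x_vec.size
    e_post = y - x_vec @ w_hat                       # error after adaptation
    sigma2_u = e_post**2 + phi2 * (x_vec @ x_vec)    # eq. (24)
    sigma2_dw = (w_hat @ w_hat - w_hat_prev @ w_hat_prev) / M  # eq. (27)
    return sigma2_u, max(sigma2_dw, 0.0)             # clamp: practical safeguard
```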
III Comparison between the EM-NLMS algorithm and the NLMS algorithm proposed in [10]
In this part, we compare the proposed EM-NLMS algorithm to the NLMS algorithm reviewed in Section I and show the equality between the adaptive stepsizes in (5) and (16). We reformulate the stepsize update in (16) by applying the conditional independence rules for Bayesian networks [9]: First, we exploit the equalities

$\mathcal{E}\{\mathbf{w}_n | y_{1:n-1}\} = \hat{\mathbf{w}}_{n-1}, \qquad \mathrm{cov}\{\mathbf{w}_n | y_{1:n-1}\} = (\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \mathbf{I}_M,$ (29)

which lead to the following relation:

$\mathcal{E}\{\|\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\|_2^2\} = M \, (\phi_{n-1}^2 + \sigma_{\Delta,n}^2)$ (30)
Second, it can be seen in Fig. 2 that the state vector $\mathbf{w}_{n-1}$ and the uncertainty $\Delta\mathbf{w}_n$ are statistically independent, as they share a head-to-head relationship with respect to the latent vector $\mathbf{w}_n$. As a consequence, the numerator in (16) can be rewritten as

$(\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \|\mathbf{x}_n\|_2^2 = \dfrac{\|\mathbf{x}_n\|_2^2}{M} \, \mathcal{E}\{\|\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\|_2^2\}$ (31)
Finally, we consider the mean of the squared error signal

$\mathcal{E}\{e_n^2\} = \mathcal{E}\{(y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_{n-1})^2\},$ (32)

which is not conditioned on the microphone signal $y_n$. By applying the conditional independence rules to the Bayesian network in Fig. 2, the head-to-head relationship with respect to $y_n$ implies the uncertainty $u_n$ to be statistically independent from $\mathbf{w}_n$ and $\hat{\mathbf{w}}_{n-1}$, respectively. Consequently, we can rewrite (32) as:

$\mathcal{E}\{e_n^2\} = (\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \|\mathbf{x}_n\|_2^2 + \sigma_{u,n}^2$ (33)
The insertion of (31) and (33) into the stepsize defined in (16) yields the identical expression for $\beta_n$ as in (5). The main difference of the proposed EM-NLMS algorithm is that the model parameters $\sigma_{u,n}^2$ and $\sigma_{\Delta,n}^2$ (and consequently the normalized stepsize $\beta_n$) are estimated in the M step of the EM algorithm instead of being approximated using (6).
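In the reconstructed notation of this article, the chain of substitutions can be summarized in one line (a sketch of the argument, not a verbatim reproduction of the original derivation):

```latex
\beta_n
= \frac{(\phi_{n-1}^2 + \sigma_{\Delta,n}^2)\,\lVert\mathbf{x}_n\rVert_2^2}
       {(\phi_{n-1}^2 + \sigma_{\Delta,n}^2)\,\lVert\mathbf{x}_n\rVert_2^2 + \sigma_{u,n}^2}
\;\overset{(31),(33)}{=}\;
\frac{\mathcal{E}\{\lVert\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\rVert_2^2\}\,
      \lVert\mathbf{x}_n\rVert_2^2 \,/\, M}{\mathcal{E}\{e_n^2\}}
\;=\; \beta_n^{\mathrm{opt}}
```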
IV Experimental results
This section focuses on the experimental verification of the EM-NLMS algorithm ("EM-NLMS") in comparison to the adaptive-stepsize NLMS algorithm described in Section I ("Adapt. NLMS")
and the conventional NLMS algorithm ("Conv. NLMS") with a fixed stepsize.
An overview of the algorithms, including the individually tuned model parameters, is shown in Table II.
Note the regularization of all three stepsize updates by a small additive constant to avoid division by zero.
For the evaluation, we synthesize the microphone signal by convolving the loudspeaker signal with an RIR vector measured in a real room, using a matching FIR filter length at the given sampling rate.
This is realized for both white noise and a male speech signal as loudspeaker signals.
Furthermore, background noise is simulated by adding Gaussian white noise at a fixed global signal-to-noise ratio.
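For concreteness, a minimal NumPy sketch of this synthesis follows; the filter length, signal duration, and SNR below are hypothetical placeholders, since the paper's actual values are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 512                                    # hypothetical filter length
w_true = rng.standard_normal(M) * np.exp(-np.arange(M) / 100.0)  # RIR-like decay
x = rng.standard_normal(16000)             # loudspeaker signal (white noise)
d = np.convolve(x, w_true)[:x.size]        # echo component at the microphone
snr_db = 30.0                              # hypothetical global SNR
noise = rng.standard_normal(x.size)
noise *= np.sqrt(np.mean(d**2) / (np.mean(noise**2) * 10**(snr_db / 10)))
y = d + noise                              # synthesized microphone signal
```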
The comparison is realized in terms of the stepsize $\beta_n$ and the relative system distance

$d_n = 10 \log_{10} \dfrac{\|\mathbf{w}_n - \hat{\mathbf{w}}_n\|_2^2}{\|\mathbf{w}_n\|_2^2}$ (34)

as a measure for the system identification performance.
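A direct implementation of (34) as reconstructed above (a minimal NumPy sketch):

```python
import numpy as np

def system_distance_db(w_true, w_hat):
    """Relative system distance in dB, eq. (34) as reconstructed."""
    err = w_true - w_hat
    return 10.0 * np.log10((err @ err) / (w_true @ w_true))
```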
The results for white noise as input signal are illustrated in Fig. 3.
Note that in Fig. 3a) the EM-NLMS shows the best system identification performance compared to the Adapt. NLMS and the Conv. NLMS.
As depicted in Fig. 3b), the stepsizes of the EM-NLMS and the Adapt. NLMS decrease from their initial values, with the stepsize of the EM-NLMS decaying more slowly.
For male speech as input signal, we improve the convergence of the Conv. NLMS by stopping the adaptation in speech pauses using a fixed threshold.
Furthermore, the absolute value of the stepsize for the Adapt. NLMS is limited to 0.5 (for a heuristic justification see [24]).
As illustrated in Fig. 4a), the EM-NLMS again shows the best system identification performance compared to the Adapt. NLMS and the Conv. NLMS. By focusing on a small time frame, we can see in Fig. 4b)
that the stepsize of the EM-NLMS algorithm
is not restricted to two fixed values, zero and the constant stepsize (as for the Conv. NLMS), and not affected by oscillations (as for the Adapt. NLMS).
Note that the only relevant increase in computational complexity of the EM-NLMS relative to the Conv. NLMS is caused by the scalar product $\hat{\mathbf{w}}_n^{\mathrm{T}} \hat{\mathbf{w}}_n$ for the calculation of $\hat{\sigma}_{\Delta,n+1}^2$ (cf. Table II), which is relatively small compared to other sophisticated stepsize adaptation algorithms.
Table II: Overview of the investigated algorithms and their individually tuned model parameters: the EM-NLMS algorithm ("EM-NLMS"), the NLMS algorithm due to [10] ("Adapt. NLMS"), and the conventional NLMS algorithm ("Conv. NLMS").
V Conclusion
In this article, we derive the EM-NLMS algorithm from a Bayesian network perspective and show its equivalence to the NLMS algorithm initially proposed in [10].
As the main difference, the stepsize is estimated in the M step of the EM algorithm instead of being approximated by artificially extending the acoustic echo path.
For the derivation of the EM-NLMS algorithm, which is experimentally shown to be promising for the task of linear AEC, we define a probabilistic model for linear system identification and exploit the product and conditional
independence rules of Bayesian networks.
Altogether, this article exemplifies the benefit of applying machine learning techniques to classical signal processing tasks.
References
- [1] B. J. Frey and N. Jojic, “A comparison of algorithms for inference and learning in probabilistic graphical models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 9, pp. 1392–1416, Sept. 2005.
- [2] T. Adali, D. Miller, K. Diamantaras, and J. Larsen, “Trends in machine learning for signal processing [in the spotlight],” IEEE Signal Processing Mag., vol. 28, no. 6, pp. 193–196, Nov. 2011.
- [3] J.A. Bilmes and C. Bartels, “Graphical model architectures for speech recognition,” IEEE Signal Processing Mag., vol. 22, no. 5, pp. 89–100, Sept. 2005.
- [4] S.J. Rennie, P. Aarabi, and B.J. Frey, “Variational probabilistic speech separation using microphone arrays,” IEEE Trans. Audio, Speech and Lang. Process., vol. 15, no. 1, pp. 135–149, Jan. 2007.
- [5] M.J. Wainwright and M.I. Jordan, “Graphical models, exponential families, and variational inference,” Found. Trends Mach. Learning, vol. 1, no. 1–2, pp. 1–305, Dec. 2008.
- [6] D. Barber and A. Cemgil, “Graphical models for time-series,” IEEE Signal Processing Mag., Nov. 2010.
- [7] C.W. Maina and J.M. Walsh, “Joint speech enhancement and speaker identification using approximate Bayesian inference,” IEEE Trans. Audio, Speech and Lang. Process., vol. 19, no. 6, pp. 1517–1529, Aug. 2011.
- [8] R. Maas, C. Huemmer, A. Schwarz, C. Hofmann, and W. Kellermann, “A Bayesian network view on linear and nonlinear acoustic echo cancellation,” in IEEE China Summit Int. Conf. Signal Inform. Process. (ChinaSIP), Xi’an, China, July 2014, pp. 495–499.
- [9] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, 8th printing 2009.
- [10] S. Yamamoto and S. Kitayama, “An adaptive echo canceller with variable step gain method,” Trans. IECE Japan, vol. E65, no. 1, pp. 1–8, Jan. 1982.
- [11] H.-C. Huang and J. Lee, “A variable step size LMS algorithm,” IEEE Trans. Signal Process., vol. 40, no. 7, pp. 1633–1642, July 1992.
- [12] T. Aboulnasr and K. Mayyas, “A robust variable step-size LMS-type algorithm: analysis and simulations,” IEEE Trans. Signal Process., vol. 45, no. 3, pp. 631–639, Mar. 1997.
- [13] A. Mader, H. Puder, and G. U. Schmidt, “Step-size control for acoustic echo cancellation filters – an overview,” Signal Process., vol. 80, no. 9, pp. 1697–1719, Sept. 2000.
- [14] H.-C. Shin, A.H. Sayed, and W.-J. Song, “Variable step-size NLMS and affine projection algorithms,” IEEE Signal Process. Lett., vol. 11, no. 2, pp. 132–135, Feb. 2004.
- [15] J. Benesty, H. Rey, L. R. Vega, and S. Tressens, “A nonparametric VSS NLMS algorithm,” IEEE Signal Process. Lett., vol. 13, no. 10, pp. 581–584, Oct. 2006.
- [16] P.A.C. Lopes and J.B. Gerald, “New normalized LMS algorithms based on the Kalman filter,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), May 2007, pp. 117–120.
- [17] M. Asif Iqbal and S. L. Grant, “Novel variable step size NLMS algorithms for echo cancellation,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Apr. 2008, pp. 241–244.
- [18] C. Paleologu, J. Benesty, S. L. Grant, and C. Osterwise, “Variable step-size NLMS algorithms designed for echo cancellation,” in IEEE Rec. 43rd Asilomar Conf. Signals, Syst. and Comput., Nov. 2009, pp. 633–637.
- [19] J.-K. Hwang and Y.-P. Li, “Variable step-size LMS algorithm with a gradient-based weighted average,” IEEE Signal Process. Lett., vol. 16, no. 12, pp. 1043–1046, Dec. 2009.
- [20] H.-C. Huang and J. Lee, “A new variable step-size NLMS algorithm and its performance analysis,” IEEE Trans. Signal Process., vol. 60, no. 4, pp. 2055–2060, Apr. 2012.
- [21] H. Zhao and Y. Yu, “Novel adaptive VSS-NLMS algorithm for system identification,” in IEEE 4th Int. Conf. Intell. Control Inform. Process. (ICICIP), June 2013, pp. 760–764.
- [22] S. Haykin, Adaptive Filter Theory, Prentice Hall, 2002.
- [23] C. Breining, P. Dreiseitel, E. Hänsler, A. Mader, B. Nitsch, H. Puder, T. Schertler, G. Schmidt, and J. Tilp, “Acoustic echo control,” IEEE Signal Processing Mag., vol. 16, no. 4, pp. 42–69, July 1999.
- [24] U. Schultheiß, Über die Adaption eines Kompensators für akustische Echos, Ph.D. thesis, 1988.
- [25] R. Maas, C. Huemmer, C. Hofmann, and W. Kellermann, “On Bayesian networks in speech signal processing,” in ITG Conf. Speech Commun., Erlangen, Germany, Sept. 2014.