The NLMS algorithm with time-variant optimum
stepsize derived from a Bayesian network perspective
Abstract
In this article, we derive a new stepsize adaptation for the normalized least mean square (NLMS) algorithm by describing the task of linear acoustic echo cancellation from a Bayesian network perspective. Similar to the well-known Kalman filter equations, we model the acoustic wave propagation from the loudspeaker to the microphone by a latent state vector and define a linear observation equation (to model the relation between the state vector and the observation) as well as a linear process equation (to model the temporal progress of the state vector). Based on additional assumptions on the statistics of the random variables in the observation and process equations, we apply the expectation-maximization (EM) algorithm to derive an NLMS-like filter adaptation. By exploiting the conditional independence rules for Bayesian networks, we reveal that the resulting EM-NLMS algorithm has a stepsize update equivalent to the optimal-stepsize calculation proposed by Yamamoto and Kitayama in 1982, which has been adopted in many textbooks. As the main difference, the instantaneous stepsize value is estimated in the M step of the EM algorithm (instead of being approximated by artificially extending the acoustic echo path). The EM-NLMS algorithm is experimentally verified for synthesized scenarios with both white noise and male speech as input signals.
Index Terms: Adaptive stepsize, NLMS, Bayesian network, machine learning, EM algorithm

I Introduction
Machine learning techniques have been widely applied to signal processing tasks for decades [1, 2].
For example, directed graphical models, termed Bayesian networks, have been shown to provide a powerful framework for modeling causal probabilistic relationships between random variables [3, 4, 5, 6, 7]. In previous work,
the update equations of the Kalman filter and the normalized least mean square (NLMS) algorithm have already been
derived from a Bayesian network perspective based on a linear relation between the latent room impulse response (RIR) vector and the observation [8, 9].
The NLMS algorithm is one of the most widely used adaptive algorithms in speech signal processing, and a variety of stepsize adaptation
schemes have been proposed to
improve its system identification performance [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21].
In this article, we derive a novel NLMS-like filter adaptation (termed EM-NLMS algorithm)
by applying the expectation-maximization (EM) algorithm to a probabilistic model for linear system identification.
Based on the conditional independence rules for Bayesian networks, it is shown that the normalized stepsize of the EM-NLMS algorithm
is equivalent to the one proposed in [10], which is now commonly accepted as the optimum NLMS stepsize rule, see e.g. [22].
As the main difference relative to [10], the normalized stepsize is here estimated as part of the EM algorithm instead of being approximated by artificially extending the acoustic echo path.
For a valid comparison, we review the algorithm of [10] for the linear acoustic echo cancellation (AEC) scenario shown in Fig. 1.
The acoustic path between the loudspeaker and the microphone at time $n$ is modeled by the linear finite impulse response (FIR) filter

$\mathbf{w}_n = [w_{n,0}, w_{n,1}, \dots, w_{n,M-1}]^{\mathrm{T}}$ (1)

with time-variant coefficients $w_{n,\kappa}$, where $\kappa \in \{0, \dots, M-1\}$. The observation equation models the microphone sample $y_n$:

$y_n = \mathbf{x}_n^{\mathrm{T}} \mathbf{w}_n + u_n$ (2)

with the additive variable $u_n$ modeling near-end interferences and the observed input signal vector $\mathbf{x}_n = [x_n, x_{n-1}, \dots, x_{n-M+1}]^{\mathrm{T}}$ capturing the $M$ most recent time-domain samples. The iterative estimation of the RIR vector $\mathbf{w}_n$ by the adaptive FIR filter $\hat{\mathbf{w}}_n$ is realized by the update rule
$\hat{\mathbf{w}}_n = \hat{\mathbf{w}}_{n-1} + \beta_n \dfrac{\mathbf{x}_n}{\|\mathbf{x}_n\|_2^2} e_n$ (3)

with the stepsize $\beta_n$ and the error signal

$e_n = y_n - \hat{y}_n = y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_{n-1}$ (4)

relating the observation $y_n$ and its estimate $\hat{y}_n$. In [10], the optimal choice of $\beta_n$ has been approximated as:

$\beta_n^{\mathrm{opt}} \approx \dfrac{\mathcal{E}\{\|\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\|_2^2\} \, \|\mathbf{x}_n\|_2^2}{M \, \mathcal{E}\{e_n^2\}},$ (5)
where $\|\cdot\|_2$ denotes the Euclidean norm and $\mathcal{E}\{\cdot\}$ the expectation operator. As the true echo path $\mathbf{w}_n$ is unobservable, the numerator in (5) cannot be computed; it is approximated by introducing a delay of $D$ coefficients to the echo path $\mathbf{w}_n$, so that the leading $D$ coefficients of $\hat{\mathbf{w}}_n$ provide an estimate of the coefficient error. Moreover, a recursive approximation of the denominator in (5) is applied using the forgetting factor $\lambda$ [22, 23]. The resulting stepsize approximation

$\beta_n \approx \dfrac{\|\mathbf{x}_n\|_2^2 \sum_{\kappa=0}^{D-1} \hat{w}_{n-1,\kappa}^2}{D \, \overline{e_n^2}}, \qquad \overline{e_n^2} = \lambda \, \overline{e_{n-1}^2} + (1-\lambda) \, e_n^2,$ (6)

leads to oscillations, which have to be addressed by limiting the absolute value of $\beta_n$ [24].
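For illustration, the following minimal NumPy sketch implements one sample of this adaptive-stepsize NLMS update, assuming the reconstructed forms of (3), (4) and (6) above; the delay length `D`, forgetting factor `lam` and regularization constant `delta` are illustrative choices, not values from [10].

```python
import numpy as np

def adapt_nlms_step(w_hat, x_vec, y, e2_avg, D=30, lam=0.99, delta=1e-6):
    """One sample of the adaptive-stepsize NLMS of [10] (sketch).

    w_hat: current filter estimate; the echo path is assumed to be
    artificially delayed, so the leading D taps estimate the coefficient error."""
    e = y - x_vec @ w_hat                        # error signal, eq. (4)
    e2_avg = lam * e2_avg + (1.0 - lam) * e**2   # recursive mean of e^2
    x_energy = x_vec @ x_vec
    beta = x_energy * np.sum(w_hat[:D]**2) / (D * e2_avg + delta)  # eq. (6)
    beta = np.clip(beta, 0.0, 0.5)               # limit the stepsize (cf. [24])
    w_hat = w_hat + beta * e * x_vec / (x_energy + delta)          # eq. (3)
    return w_hat, e2_avg
```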
In this article, we derive the EM-NLMS algorithm, which applies the filter update of (3) using the stepsize in (5), where $\beta_n$ is estimated in the M step of the EM algorithm instead of being approximated by using (6).
This article is structured as follows: In Section II, we propose a probabilistic model for the linear AEC scenario of Fig. 1 and derive the EM-NLMS algorithm, which is revealed in Section III to be similar to the NLMS algorithm proposed in [10]. As the main difference (cf. Table I), the stepsize is estimated in the M step of the EM algorithm instead of being approximated by artificially extending the acoustic echo path. In Section IV, the EM-NLMS algorithm is experimentally verified for synthesized scenarios with both white noise and male speech as input signals. Finally, conclusions are drawn in Section V.
II The EM-NLMS algorithm for linear AEC
Throughout this article, the Gaussian probability density function (PDF) of a real-valued length-$M$ vector $\mathbf{a}$ with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is denoted as

$\mathcal{N}(\mathbf{a}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \dfrac{1}{(2\pi)^{M/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\tfrac{1}{2} (\mathbf{a}-\boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{a}-\boldsymbol{\mu})\right),$ (7)

where $|\boldsymbol{\Sigma}|$ represents the determinant of the matrix $\boldsymbol{\Sigma}$. Furthermore, $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_M$ (with the $M \times M$ identity matrix $\mathbf{I}_M$) implies the elements of $\mathbf{a}$ to be mutually statistically independent and of equal variance $\sigma^2$.
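As a small self-check of this notation, the following minimal NumPy sketch evaluates the logarithm of (7) for the isotropic case $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_M$ used throughout the model (function name and interface are illustrative):

```python
import numpy as np

def log_gauss_iso(a, mu, sigma2):
    """Logarithm of the Gaussian PDF in eq. (7) for Sigma = sigma2 * I_M,
    where the determinant reduces to sigma2**M."""
    M = a.size
    diff = a - mu
    return -0.5 * (M * np.log(2.0 * np.pi * sigma2) + diff @ diff / sigma2)
```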
II-A Probabilistic AEC model
To describe the linear AEC scenario of Fig. 1 from a Bayesian network perspective, we model the acoustic echo path as a latent state vector $\mathbf{w}_n$, identically defined as in (1), and capture uncertainties (e.g., due to the limitation to a linear system with a finite set of coefficients) by the additive uncertainty $\Delta\mathbf{w}_n$. Consequently, the linear process equation and the linear observation equation,

$\mathbf{w}_n = \mathbf{w}_{n-1} + \Delta\mathbf{w}_n, \qquad y_n = \mathbf{x}_n^{\mathrm{T}} \mathbf{w}_n + u_n,$ (8)

can be jointly represented by the graphical model shown in Fig. 2. The directed links express statistical dependencies between the nodes, and random variables, such as $\mathbf{w}_n$, are marked as circles. We make the following assumptions on the PDFs of the random variables in Fig. 2:
- The uncertainty $\Delta\mathbf{w}_n$ is normally distributed with zero mean vector $\mathbf{0}_M$ and variance $\sigma_{\Delta,n}^2$:

  $p(\Delta\mathbf{w}_n) = \mathcal{N}(\Delta\mathbf{w}_n; \mathbf{0}_M, \sigma_{\Delta,n}^2 \mathbf{I}_M)$ (9)

- The microphone signal uncertainty $u_n$ is assumed to be normally distributed with variance $\sigma_{u,n}^2$ and zero mean:

  $p(u_n) = \mathcal{N}(u_n; 0, \sigma_{u,n}^2)$ (10)

- The posterior distribution of the state vector is defined with mean vector $\hat{\mathbf{w}}_n$, variance $\phi_n^2$ and $y_{1:n} = [y_1, \dots, y_n]^{\mathrm{T}}$:

  $p(\mathbf{w}_n | y_{1:n}) = \mathcal{N}(\mathbf{w}_n; \hat{\mathbf{w}}_n, \phi_n^2 \mathbf{I}_M)$ (11)
Based on this probabilistic AEC model, we apply the EM algorithm consisting of two parts: In the E step, the filter update is derived based on minimum mean square error (MMSE) estimation (Subsection II-B). In the M step, we predict the model parameters $\sigma_{u,n}^2$ and $\sigma_{\Delta,n}^2$ to estimate the adaptive stepsize value (Subsection II-C).
II-B E step: Inference of the state vector
The MMSE estimation of the state vector $\mathbf{w}_n$ identifies the mean vector of the posterior distribution as the estimate $\hat{\mathbf{w}}_n$:

$\hat{\mathbf{w}}_n = \mathcal{E}\{\mathbf{w}_n | y_{1:n}\}$ (12)
Due to the linear relations between the variables in (2) and (8), and under the restriction to a linear estimator of $\mathbf{w}_n$ and normally distributed random variables, the MMSE estimation is analytically tractable [9]. Exploiting the product rules for linear Gaussian models and the conditional independence properties of the Bayesian network in Fig. 2, the filter update can be derived as a special case of the Kalman filter equations [9, p. 639]:

$\hat{\mathbf{w}}_n = \hat{\mathbf{w}}_{n-1} + \mathbf{K}_n (y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_{n-1})$ (13)

with the stepsize vector

$\mathbf{K}_n = \dfrac{\boldsymbol{\Phi}_{n|n-1} \mathbf{x}_n}{\mathbf{x}_n^{\mathrm{T}} \boldsymbol{\Phi}_{n|n-1} \mathbf{x}_n + \sigma_{u,n}^2}, \qquad \boldsymbol{\Phi}_{n|n-1} = \boldsymbol{\Phi}_{n-1} + \sigma_{\Delta,n}^2 \mathbf{I}_M,$ (14)

and the update of the state covariance matrix given as

$\boldsymbol{\Phi}_n = (\mathbf{I}_M - \mathbf{K}_n \mathbf{x}_n^{\mathrm{T}}) \, \boldsymbol{\Phi}_{n|n-1}$ (15)
By inserting (9) and (11), i.e., $\boldsymbol{\Phi}_{n-1} = \phi_{n-1}^2 \mathbf{I}_M$, we can rewrite the filter update of (13) to the filter update defined in (3) with the scalar stepsize

$\beta_n = \dfrac{(\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \|\mathbf{x}_n\|_2^2}{(\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \|\mathbf{x}_n\|_2^2 + \sigma_{u,n}^2}.$ (16)

Finally, the update of the posterior variance $\phi_n^2$ is approximated following (11) as

$\phi_n^2 \approx \frac{1}{M} \, \mathrm{tr}\{\boldsymbol{\Phi}_n\} = \left(1 - \frac{\beta_n}{M}\right) (\phi_{n-1}^2 + \sigma_{\Delta,n}^2),$ (17)

where $\mathrm{tr}\{\cdot\}$ adds up the diagonal elements of a matrix.
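The E step thus reduces to a few scalar operations per sample. The following minimal NumPy sketch follows the reconstructed equations (3), (16) and (17); the regularizer `delta` is an illustrative addition to avoid division by zero, not part of the derivation.

```python
import numpy as np

def em_nlms_e_step(w_hat, phi2, x_vec, y, sigma2_u, sigma2_dw, delta=1e-6):
    """E step: MMSE filter update with the scalar stepsize of eq. (16)."""
    M = x_vec.size
    p = phi2 + sigma2_dw                       # predicted state variance
    x_energy = x_vec @ x_vec
    beta = p * x_energy / (p * x_energy + sigma2_u + delta)  # stepsize, eq. (16)
    e = y - x_vec @ w_hat                      # error signal, eq. (4)
    w_hat = w_hat + beta * e * x_vec / (x_energy + delta)    # filter update, eq. (3)
    phi2 = (1.0 - beta / M) * p                # posterior variance, eq. (17)
    return w_hat, phi2, e
```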
Before showing the equality of the stepsize updates in (16) and (5) in Section III, we propose a new alternative for estimating the stepsize $\beta_n$ in (16) by deriving the updates of the model parameters $\sigma_{u,n}^2$ and $\sigma_{\Delta,n}^2$ in the following subsection.
II-C M step: Online learning of the model parameters
In the M step, we predict the model parameters for the following time instant. Although the maximum likelihood estimation is analytically tractable, we apply the EM algorithm to derive an online estimator: In order to update the parameters $\Theta_n = \{\sigma_{u,n}^2, \sigma_{\Delta,n}^2\}$ to the new parameters $\Theta_{n+1}$, the lower bound

$\ln p(y_{1:n} | \Theta) \geq \mathcal{L}(q, \Theta) = \int q(\mathbf{w}_n) \ln \dfrac{p(y_{1:n}, \mathbf{w}_n | \Theta)}{q(\mathbf{w}_n)} \, \mathrm{d}\mathbf{w}_n$ (18)

is maximized, where $q(\mathbf{w}_n) = p(\mathbf{w}_n | y_{1:n}, \Theta_n)$. For this, the PDF $p(y_{1:n}, \mathbf{w}_n | \Theta)$ is determined by applying the decomposition rules for Bayesian networks [9]:

$p(y_{1:n}, \mathbf{w}_n | \Theta) = p(y_n | \mathbf{w}_n, \sigma_u^2) \, p(\mathbf{w}_n | y_{1:n-1}, \sigma_\Delta^2) \, p(y_{1:n-1})$ (19)
Next, we take the natural logarithm of (19), replace $\Theta$ by $\Theta_{n+1}$ and maximize the right-hand side of (18) with respect to $\Theta_{n+1}$:

$\Theta_{n+1} = \arg\max_{\Theta} \, \mathcal{E}\{\ln p(y_n | \mathbf{w}_n, \sigma_u^2) + \ln p(\mathbf{w}_n | y_{1:n-1}, \sigma_\Delta^2) \,|\, y_{1:n}\},$ (20)

where we apply two separate maximizations, starting with the estimation of $\sigma_{u,n+1}^2$ by inserting

$p(y_n | \mathbf{w}_n, \sigma_u^2) = \mathcal{N}(y_n; \mathbf{x}_n^{\mathrm{T}} \mathbf{w}_n, \sigma_u^2)$ (21)

into (20). This leads to the instantaneous estimate:

$\hat{\sigma}_{u,n+1}^2 = \mathcal{E}\{(y_n - \mathbf{x}_n^{\mathrm{T}} \mathbf{w}_n)^2 \,|\, y_{1:n}\}$ (22)
$\;\;\; = (y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_n)^2 + \mathbf{x}_n^{\mathrm{T}} \, \mathrm{cov}\{\mathbf{w}_n | y_{1:n}\} \, \mathbf{x}_n$ (23)
$\;\;\; = (y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_n)^2 + \phi_n^2 \, \|\mathbf{x}_n\|_2^2$ (24)
The variance $\hat{\sigma}_{u,n+1}^2$ (of the microphone signal uncertainty) in (24) consists of two components, which can be interpreted as follows [25]: The first term in (24) is given as the squared error signal after filter adaptation and is influenced by near-end interferences like background noise. The second term in (24) depends on the signal energy $\|\mathbf{x}_n\|_2^2$ and the variance $\phi_n^2$, which implies that it considers uncertainties in the linear echo path model. Similar to the derivation for $\hat{\sigma}_{u,n+1}^2$, we insert
$p(\mathbf{w}_n | y_{1:n-1}, \sigma_\Delta^2) = \mathcal{N}(\mathbf{w}_n; \hat{\mathbf{w}}_{n-1}, (\phi_{n-1}^2 + \sigma_\Delta^2) \, \mathbf{I}_M)$ (25)

into (20) to derive the instantaneous estimate of $\sigma_{\Delta,n+1}^2$:

$\hat{\sigma}_{\Delta,n+1}^2 = \frac{1}{M} \, \mathcal{E}\{\|\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\|_2^2 \,|\, y_{1:n}\} - \phi_{n-1}^2$ (26)
$\;\;\; \approx \frac{1}{M} \left(\hat{\mathbf{w}}_n^{\mathrm{T}} \hat{\mathbf{w}}_n - \hat{\mathbf{w}}_{n-1}^{\mathrm{T}} \hat{\mathbf{w}}_{n-1}\right),$ (27)

where we employed the statistical independence between $\mathbf{w}_{n-1}$ and $\Delta\mathbf{w}_n$. Equation (27) implies the estimation of $\hat{\sigma}_{\Delta,n+1}^2$ as the difference of the filter tap autocorrelations between the time instants $n$ and $n-1$. Finally, the updated values in $\Theta_{n+1}$ are used as initialization for the following time step, so that

$\sigma_{u,n+1}^2 = \hat{\sigma}_{u,n+1}^2, \qquad \sigma_{\Delta,n+1}^2 = \hat{\sigma}_{\Delta,n+1}^2.$ (28)
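Under the same reconstructed notation, the M step reduces to two instantaneous parameter estimates per sample. A minimal NumPy sketch follows; the clamping of the variance estimate to non-negative values is a practical safeguard added here, not part of the derivation.

```python
import numpy as np

def em_nlms_m_step(w_hat, w_hat_prev, phi2, x_vec, y):
    """M step: instantaneous estimates of sigma_u^2, eq. (24), and
    sigma_dw^2, eq. (27), used as initialization for the next sample, eq. (28)."""
    M = x_vec.size
    e_post = y - x_vec @ w_hat                       # error after adaptation
    sigma2_u = e_post**2 + phi2 * (x_vec @ x_vec)    # eq. (24)
    sigma2_dw = (w_hat @ w_hat - w_hat_prev @ w_hat_prev) / M  # eq. (27)
    return sigma2_u, max(sigma2_dw, 0.0)             # clamp: practical safeguard
```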
III Comparison between the EM-NLMS algorithm and the NLMS algorithm proposed in [10]
In this part, we compare the proposed EM-NLMS algorithm to the NLMS algorithm reviewed in Section I and show the equality between the adaptive stepsizes in (5) and (16). We reformulate the stepsize update in (16) by applying the conditional independence rules for Bayesian networks [9]: First, we exploit the equalities

$\mathcal{E}\{\mathbf{w}_n | y_{1:n-1}\} = \hat{\mathbf{w}}_{n-1}, \qquad \mathrm{cov}\{\mathbf{w}_n | y_{1:n-1}\} = (\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \mathbf{I}_M,$ (29)

which lead to the following relation:

$\mathcal{E}\{\|\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\|_2^2\} = M \, (\phi_{n-1}^2 + \sigma_{\Delta,n}^2)$ (30)
Second, it can be seen in Fig. 2 that the state vector $\mathbf{w}_{n-1}$ and the uncertainty $\Delta\mathbf{w}_n$ are statistically independent, as they share a head-to-head relationship with respect to the latent vector $\mathbf{w}_n$. As a consequence, the numerator in (16) can be rewritten as

$(\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \|\mathbf{x}_n\|_2^2 = \dfrac{\|\mathbf{x}_n\|_2^2}{M} \, \mathcal{E}\{\|\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\|_2^2\}$ (31)
Finally, we consider the mean of the squared error signal

$\mathcal{E}\{e_n^2\} = \mathcal{E}\{(y_n - \mathbf{x}_n^{\mathrm{T}} \hat{\mathbf{w}}_{n-1})^2\},$ (32)

which is not conditioned on the microphone signal $y_n$. By applying the conditional independence rules to the Bayesian network in Fig. 2, the head-to-head relationship with respect to $y_n$ implies the uncertainty $u_n$ to be statistically independent from $\mathbf{w}_n$ and $\hat{\mathbf{w}}_{n-1}$, respectively. Consequently, we can rewrite (32) as:

$\mathcal{E}\{e_n^2\} = (\phi_{n-1}^2 + \sigma_{\Delta,n}^2) \, \|\mathbf{x}_n\|_2^2 + \sigma_{u,n}^2$ (33)
The insertion of (31) and (33) into the stepsize defined in (16) yields the identical expression for $\beta_n$ as in (5). The main difference of the proposed EM-NLMS algorithm is that the model parameters $\sigma_{u,n}^2$ and $\sigma_{\Delta,n}^2$ (and consequently the normalized stepsize $\beta_n$) are estimated in the M step of the EM algorithm instead of being approximated using (6).
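In the reconstructed notation of this article, the chain of substitutions can be summarized in one line (a sketch of the argument, not a verbatim reproduction of the original derivation):

```latex
\beta_n
= \frac{(\phi_{n-1}^2 + \sigma_{\Delta,n}^2)\,\lVert\mathbf{x}_n\rVert_2^2}
       {(\phi_{n-1}^2 + \sigma_{\Delta,n}^2)\,\lVert\mathbf{x}_n\rVert_2^2 + \sigma_{u,n}^2}
\;\overset{(31),(33)}{=}\;
\frac{\mathcal{E}\{\lVert\mathbf{w}_n - \hat{\mathbf{w}}_{n-1}\rVert_2^2\}\,
      \lVert\mathbf{x}_n\rVert_2^2 \,/\, M}{\mathcal{E}\{e_n^2\}}
\;=\; \beta_n^{\mathrm{opt}}
```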
IV Experimental results
This section focuses on the experimental verification of the EM-NLMS algorithm ("EM-NLMS") in comparison to the adaptive-stepsize NLMS algorithm described in Section I ("Adapt. NLMS")
and the conventional NLMS algorithm ("Conv. NLMS") with a fixed stepsize.
An overview of the algorithms, including the individually tuned model parameters, is shown in Table II.
Note the regularization of all three stepsize updates by a small additive constant to avoid division by zero.
For the evaluation, we synthesize the microphone signal by convolving the loudspeaker signal with an RIR vector measured in a real room, using a matching FIR filter length at the given sampling rate.
This is realized for both white noise and a male speech signal as loudspeaker signals.
Furthermore, background noise is simulated by adding Gaussian white noise at a fixed global signal-to-noise ratio.
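For concreteness, a minimal NumPy sketch of this synthesis follows; the filter length, signal duration, and SNR below are hypothetical placeholders, since the paper's actual values are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 512                                    # hypothetical filter length
w_true = rng.standard_normal(M) * np.exp(-np.arange(M) / 100.0)  # RIR-like decay
x = rng.standard_normal(16000)             # loudspeaker signal (white noise)
d = np.convolve(x, w_true)[:x.size]        # echo component at the microphone
snr_db = 30.0                              # hypothetical global SNR
noise = rng.standard_normal(x.size)
noise *= np.sqrt(np.mean(d**2) / (np.mean(noise**2) * 10**(snr_db / 10)))
y = d + noise                              # synthesized microphone signal
```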
The comparison is realized in terms of the stepsize $\beta_n$ and the relative system distance

$d_n = 10 \log_{10} \dfrac{\|\mathbf{w}_n - \hat{\mathbf{w}}_n\|_2^2}{\|\mathbf{w}_n\|_2^2}$ (34)

as a measure for the system identification performance.
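A direct implementation of (34) as reconstructed above (a minimal NumPy sketch):

```python
import numpy as np

def system_distance_db(w_true, w_hat):
    """Relative system distance in dB, eq. (34) as reconstructed."""
    err = w_true - w_hat
    return 10.0 * np.log10((err @ err) / (w_true @ w_true))
```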
The results for white noise as input signal are illustrated in Fig. 3.
Note that in Fig. 3a) the EM-NLMS shows the best system identification performance compared to the Adapt. NLMS and the Conv. NLMS.
As depicted in Fig. 3b), the stepsizes of the EM-NLMS and the Adapt. NLMS decrease from their initial values, with the stepsize of the EM-NLMS decaying more slowly.
For male speech as input signal, we improve the convergence of the Conv. NLMS by stopping the adaptation in speech pauses using a fixed threshold.
Furthermore, the absolute value of the stepsize for the Adapt. NLMS is limited to 0.5 (for a heuristic justification see [24]).
As illustrated in Fig. 4a), the EM-NLMS again shows the best system identification performance compared to the Adapt. NLMS and the Conv. NLMS. By focusing on a small time frame, we can see in Fig. 4b)
that the stepsize of the EM-NLMS algorithm
is not restricted to two fixed values, zero and the constant stepsize (as for the Conv. NLMS), and not affected by oscillations (as for the Adapt. NLMS).
Note that the only relevant increase in computational complexity of the EM-NLMS relative to the Conv. NLMS is caused by the scalar product $\hat{\mathbf{w}}_n^{\mathrm{T}} \hat{\mathbf{w}}_n$ for the calculation of $\hat{\sigma}_{\Delta,n+1}^2$ (cf. Table II), which is relatively small compared to other sophisticated stepsize adaptation algorithms.
Table II: Overview of the investigated algorithms and their individually tuned model parameters: the EM-NLMS algorithm ("EM-NLMS"), the NLMS algorithm due to [10] ("Adapt. NLMS"), and the conventional NLMS algorithm ("Conv. NLMS").
V Conclusion
In this article, we derive the EM-NLMS algorithm from a Bayesian network perspective and show its equivalence to the NLMS algorithm initially proposed in [10].
As the main difference, the stepsize is estimated in the M step of the EM algorithm instead of being approximated by artificially extending the acoustic echo path.
For the derivation of the EM-NLMS algorithm, which is experimentally shown to be promising for the task of linear AEC, we define a probabilistic model for linear system identification and exploit the product and conditional
independence rules of Bayesian networks.
Altogether, this article exemplifies the benefit of applying machine learning techniques to classical signal processing tasks.
References
- [1] B. J. Frey and N. Jojic, “A comparison of algorithms for inference and learning in probabilistic graphical models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 9, pp. 1392–1416, Sept. 2005.
- [2] T. Adali, D. Miller, K. Diamantaras, and J. Larsen, “Trends in machine learning for signal processing [in the spotlight],” IEEE Signal Processing Mag., vol. 28, no. 6, pp. 193–196, Nov. 2011.
- [3] J.A. Bilmes and C. Bartels, “Graphical model architectures for speech recognition,” IEEE Signal Processing Mag., vol. 22, no. 5, pp. 89–100, Sept. 2005.
- [4] S.J. Rennie, P. Aarabi, and B.J. Frey, “Variational probabilistic speech separation using microphone arrays,” IEEE Trans. Audio, Speech and Lang. Process., vol. 15, no. 1, pp. 135–149, Jan. 2007.
- [5] M.J. Wainwright and M.I. Jordan, “Graphical models, exponential families, and variational inference,” Found. Trends Mach. Learning, vol. 1, no. 1–2, pp. 1–305, Dec. 2008.
- [6] D. Barber and A. Cemgil, “Graphical models for time-series,” IEEE Signal Processing Mag., Nov. 2010.
- [7] C.W. Maina and J.M. Walsh, “Joint speech enhancement and speaker identification using approximate Bayesian inference,” IEEE Trans. Audio, Speech and Lang. Process., vol. 19, no. 6, pp. 1517–1529, Aug. 2011.
- [8] R. Maas, C. Huemmer, A. Schwarz, C. Hofmann, and W. Kellermann, “A Bayesian network view on linear and nonlinear acoustic echo cancellation,” in IEEE China Summit Int. Conf. Signal Inform. Process. (ChinaSIP), Xi’an, China, July 2014, pp. 495–499.
- [9] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, 8th printing 2009.
- [10] S. Yamamoto and S. Kitayama, “An adaptive echo canceller with variable step gain method,” Trans. IECE Japan, vol. E65, no. 1, pp. 1–8, Jan. 1982.
- [11] H.-C. Huang and J. Lee, “A variable step size LMS algorithm,” IEEE Trans. Signal Process., vol. 40, no. 7, pp. 1633–1642, July 1992.
- [12] T. Aboulnasr and K. Mayyas, “A robust variable step-size LMS-type algorithm: analysis and simulations,” IEEE Trans. Signal Process., vol. 45, no. 3, pp. 631–639, Mar. 1997.
- [13] A. Mader, H. Puder, and G. U. Schmidt, “Step-size control for acoustic echo cancellation filters – an overview,” Signal Process., vol. 80, no. 9, pp. 1697–1719, Sept. 2000.
- [14] H.-C. Shin, A.H. Sayed, and W.-J. Song, “Variable step-size NLMS and affine projection algorithms,” IEEE Signal Process. Lett., vol. 11, no. 2, pp. 132–135, Feb. 2004.
- [15] J. Benesty, H. Rey, L. R. Vega, and S. Tressens, “A nonparametric VSS NLMS algorithm,” IEEE Signal Process. Lett., vol. 13, no. 10, pp. 581–584, Oct. 2006.
- [16] P.A.C. Lopes and J.B. Gerald, “New normalized LMS algorithms based on the Kalman filter,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), May 2007, pp. 117–120.
- [17] M. Asif Iqbal and S. L. Grant, “Novel variable step size NLMS algorithms for echo cancellation,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Apr. 2008, pp. 241–244.
- [18] C. Paleologu, J. Benesty, S. L. Grant, and C. Osterwise, “Variable step-size NLMS algorithms designed for echo cancellation,” in IEEE Rec. 43rd Asilomar Conf. Signals, Syst. and Comput., Nov. 2009, pp. 633–637.
- [19] J.-K. Hwang and Y.-P. Li, “Variable step-size LMS algorithm with a gradient-based weighted average,” IEEE Signal Process. Lett., vol. 16, no. 12, pp. 1043–1046, Dec. 2009.
- [20] H.-C. Huang and J. Lee, “A new variable step-size NLMS algorithm and its performance analysis,” IEEE Trans. Signal Process., vol. 60, no. 4, pp. 2055–2060, Apr. 2012.
- [21] H. Zhao and Y. Yu, “Novel adaptive VSS-NLMS algorithm for system identification,” in IEEE 4th Int. Conf. Intell. Control Inform. Process. (ICICIP), June 2013, pp. 760–764.
- [22] S. Haykin, Adaptive Filter Theory, Prentice Hall, 2002.
- [23] C. Breining, P. Dreiseitel, E. Hänsler, A. Mader, B. Nitsch, H. Puder, T. Schertler, G. Schmidt, and J. Tilp, “Acoustic echo control,” IEEE Signal Processing Mag., vol. 16, no. 4, pp. 42–69, July 1999.
- [24] U. Schultheiß, Über die Adaption eines Kompensators für akustische Echos, Ph.D. thesis, 1988.
- [25] R. Maas, C. Huemmer, C. Hofmann, and W. Kellermann, “On Bayesian networks in speech signal processing,” in ITG Conf. Speech Commun., Erlangen, Germany, Sept. 2014.