

Block Coordinate Descent Algorithms for Auxiliary-Function-Based Independent Vector Extraction

Rintaro Ikeshita, Tomohiro Nakatani, and Shoko Araki

Rintaro Ikeshita, Tomohiro Nakatani, and Shoko Araki are with NTT Corporation, Kyoto, 619-0237, Japan (e-mail: ikeshita@ieee.org).
Abstract

In this paper, we address the problem of extracting all super-Gaussian source signals from a linear mixture in which (i) the number of super-Gaussian sources $K$ is less than that of sensors $M$, and (ii) there are up to $M-K$ stationary Gaussian noises that do not need to be extracted. To solve this problem, independent vector extraction (IVE) using majorization-minimization (MM) and block coordinate descent (BCD) algorithms has been developed, attaining robust source extraction and low computational cost. We here improve the conventional BCDs for IVE by carefully exploiting the stationarity of the Gaussian noise components. We also newly develop a BCD for a semiblind IVE in which the transfer functions for several super-Gaussian sources are given a priori. Both algorithms consist of a closed-form formula and a generalized eigenvalue decomposition. In a numerical experiment of extracting speech signals from noisy mixtures, we show that when $K=1$ in the blind case, or when at least $K-1$ transfer functions are given in the semiblind case, the convergence of our proposed BCDs is significantly faster than that of the conventional ones.

Index Terms:
Blind source extraction, independent component analysis, independent vector analysis, block coordinate descent method, generalized eigenvalue problem

I Introduction

Blind source separation (BSS) is a problem concerned with estimating the original source signals from a mixture signal captured by multiple sensors. When the number of sources is no greater than that of sensors, i.e., in the (over-)determined case, independent component analysis (ICA [1, 2, 3, 4]) has been a popular approach to BSS because it is simple and effective. When each original source is a vector and a multivariate random variable, independent vector analysis (IVA [5, 6]), also termed joint BSS [7, 8, 9], has been widely studied as an extension of ICA.

In this paper, we will only focus on the problem of extracting all the super-Gaussian sources from a linear mixture signal under the following assumptions and improve the computational efficiency of IVA (ICA can be improved in the same way as IVA, and so we only deal with IVA):

  1. The number of super-Gaussian sources $K$ is known and smaller than that of sensors $M$, i.e., $K<M$.

  2. There can be up to $M-K$ stationary Gaussian noises, and thus the problem remains (over-)determined.

The second assumption, which concerns the rigorous development of efficient algorithms, can be violated to some extent when applied in practice (see the numerical experiments in Section VII). To distinguish it from general BSS, this problem is called blind source extraction (BSE [2]). The BSE problem is often encountered in such applications as speaker source enhancement, biomedical signal processing, and passive radar/sonar. In speaker source enhancement, for instance, the maximum number of speakers (super-Gaussian sources) can often be predetermined as a certain number (e.g., two or three) while an audio device is equipped with more microphones. In real-world applications, the observed signal is contaminated with background noises, and most noise signals are more stationary than speaker signals.

BSE can be solved by simply applying IVA as if there were $M$ super-Gaussian sources and then selecting the top $K$ $(<M)$ super-Gaussian signals from the $M$ separated signals in some way. However, this approach is computationally intensive if $M$ gets too large. To reduce the computing cost, as preprocessing for IVA, the number of channels (or sensors) can be reduced to $K$ by using principal component analysis or by selecting $K$ sensors with high SNRs. These channel reductions, however, often degrade the separation performance due to the presence of the background noises.

BSE methods for efficiently extracting just one or several non-Gaussian signals (not restricted to super-Gaussian signals) have already been studied, mainly in the context of ICA [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. The natural gradient algorithm and the FastICA method with deflation techniques [16, 17, 18] can sequentially extract non-Gaussian sources one by one. In FastICA, the deflation is based on the orthogonal constraint, under which the sample correlation between every pair of separated signals equals zero. This constraint was also used to develop FastICA with symmetric orthonormalization [17, 18, 19, 20], which can simultaneously extract $K$ non-Gaussian signals.

I-A Independent vector extraction (IVE)

Recently, maximum likelihood approaches have been proposed for BSE in which the background noise components are modeled as stationary Gaussians. These methods include independent vector extraction (IVE [21, 22, 23]) and overdetermined IVA (OverIVA [24, 25, 26, 27]), which will be collectively referred to as IVE in this paper. When the $K$ non-Gaussian sources are all super-Gaussian (as defined in Assumption 2), IVE can use a majorization-minimization (MM [28]) optimization algorithm developed for an auxiliary-function-based ICA (AuxICA [29]) and an auxiliary-function-based IVA (AuxIVA [30, 31, 32]). In this paper, we only deal with an IVE based on the MM algorithm [24, 25, 26, 27, 23] and always assume that the super-Gaussian distributions are defined by Assumption 2 in Section III-A.

In the MM-based IVE, each separation matrix $W\in\mathbb{C}^{M\times M}$ is optimized by iteratively minimizing a surrogate function of the following form:

\begin{align}
g(W) &= \sum_{i=1}^{K}\bm{w}_{i}^{h}V_{i}\bm{w}_{i}+\operatorname{tr}\left(W_{\mathrm{z}}^{h}V_{\mathrm{z}}W_{\mathrm{z}}\right)-\log|\det W|^{2},\\
W &= [\bm{w}_{1},\ldots,\bm{w}_{K},W_{\mathrm{z}}]\in\mathbb{C}^{M\times M},\\
W_{\mathrm{z}} &= [\bm{w}_{K+1},\ldots,\bm{w}_{M}]\in\mathbb{C}^{M\times(M-K)}.
\end{align}

Here, $\bm{w}_{1},\ldots,\bm{w}_{K}\in\mathbb{C}^{M}$ are filters that extract the target-source signals, and $W_{\mathrm{z}}$ is a filter that extracts the $M-K$ noise components. $V_{1},\ldots,V_{K},V_{\mathrm{z}}\in\mathbb{C}^{M\times M}$ are Hermitian positive definite matrices that are updated in each iteration of the MM algorithm. Interestingly, when $K=M$, or when viewing the second term of $g(W)$ as $\operatorname{tr}(W_{\mathrm{z}}^{h}V_{\mathrm{z}}W_{\mathrm{z}})=\sum_{i=K+1}^{M}\bm{w}_{i}^{h}V_{\mathrm{z}}\bm{w}_{i}$, the problem of minimizing $g(W)$ has been discussed in the ICA/IVA literature [33, 34, 35, 36, 37]. Among the algorithms developed so far, block coordinate descent (BCD [38]) methods with simple analytical solutions have attracted much attention in the field of audio source separation because they have been experimentally shown to give stable and fast convergence. A family of these BCD algorithms [29, 30, 31, 32], summarized in Table I, is currently called an iterative projection (IP) method. The IP method has been widely applied not only to ICA and IVA but also to audio source separation methods using more advanced source models (see [39, 40, 41, 42, 43] for details).
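As a concrete illustration, here is a minimal numpy sketch, entirely our own, that evaluates the surrogate $g(W)$ at a single frequency bin; the function name and the convention that the weighted covariances are passed as a list are assumptions of this sketch, not notation from the paper.

```python
import numpy as np

def surrogate(W, Vs, Vz, K):
    """Evaluate g(W) = sum_i w_i^h V_i w_i + tr(Wz^h Vz Wz) - log|det W|^2.

    W  : (M, M) complex separation matrix, columns [w_1, ..., w_K, Wz]
    Vs : length-K list of (M, M) Hermitian PD matrices V_1, ..., V_K
    Vz : (M, M) Hermitian PD matrix V_z
    """
    val = sum(np.real(W[:, i].conj() @ Vs[i] @ W[:, i]) for i in range(K))
    Wz = W[:, K:]
    val += np.real(np.trace(Wz.conj().T @ Vz @ Wz))
    val -= 2.0 * np.log(np.abs(np.linalg.det(W)))
    return val
```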

I-B Conventional BCD (or IP) algorithms for IVE

When we consider directly applying the IP methods developed for AuxIVA to the BSE problem of minimizing $g(W)$ with respect to $W$, AuxIVA-IP1 [30] in Table I, for instance, will cyclically update $\bm{w}_{1}\rightarrow\cdots\rightarrow\bm{w}_{K}\rightarrow\cdots\rightarrow\bm{w}_{M}$ one by one. However, in the BSE problem, the signals of interest are the $K$ non-Gaussian sources, and most of the optimization of $W_{\mathrm{z}}$ should be skipped.

Therefore, in a previous work [24], an algorithm (IVE-OC in Table I) was proposed that cyclically updates $\bm{w}_{1}\rightarrow W_{\mathrm{z}}\rightarrow\cdots\rightarrow\bm{w}_{K}\rightarrow W_{\mathrm{z}}$ one by one with a computationally cheap formula for $W_{\mathrm{z}}$. In that work [24], the updating equation for $W_{\mathrm{z}}$ was derived solely from the (weak) orthogonal constraint (OC [22, 21, 23]), under which the sample correlation between the $K$ separated non-Gaussian sources and the $M-K$ background noises equals zero. (Note that the $K$ non-Gaussian sources are not constrained to be mutually orthogonal in the OC.) Although the algorithm has successfully reduced the computational cost of IVA by nearly a factor of $K/M$, the validity of imposing the heuristic OC remains unclear.

After that, IVE-OC was extended by removing the OC from the model. One such extension is a direct method that obtains a global minimizer of $g(W)$ when $K=1$ [25, 27]. The other is a BCD algorithm for $K\geq 2$ that cyclically updates the pairs $(\bm{w}_{1},W_{\mathrm{z}})\rightarrow\cdots\rightarrow(\bm{w}_{K},W_{\mathrm{z}})$ one by one [26], but its computational cost is not so cheap due to the full update of $W_{\mathrm{z}}$ in each iteration. These algorithms are called IVE-IP2 in this paper (see Table I).

I-C Contributions

In this work, we propose BCD algorithms for IVE, which are summarized in Table I with comparisons to the previous BCDs (IPs). The following are the contributions of this paper.

(i) We speed up the previous IVE-IP2 for $K\geq 2$ by showing that the update of $W_{\mathrm{z}}$ can be omitted in each iteration of BCD without changing the behavior of the algorithm (Section IV-F). This is attained by carefully exploiting the stationary Gaussian assumption for the noise components. In an experiment of speaker source enhancement, we confirmed that the computational cost of the conventional IVE-IP2 is consistently reduced.

(ii) We provide a comprehensive explanation of IVE-IP2 for $K=1$ (Sections IV-D and IV-H). Interestingly, it turns out to be a method that iteratively updates the maximum signal-to-noise-ratio (MaxSNR [44, 45]) beamformer and the power spectrum for the unique ($K=1$) target-source signal. The experimental results show that this algorithm converges much faster than conventional algorithms. Note that IVE-IP2 for $K=1$ was developed independently and simultaneously in our conference paper [27] and by Scheibler–Ono [25].

(iii) We reveal that IVE-OC [24], which was developed with the help of the heuristic orthogonal constraint (OC), can also be obtained as a BCD algorithm for our proposed IVE without the OC (Section IV-G). (This interpretation of IVE-OC as a BCD algorithm was described in our conference paper [27], but not in a mathematically rigorous way. We here provide a rigorous proof of this interpretation.)

(iv) We further extend the proposed IVE-IP2 to the semiblind case where the transfer functions for $L$ ($1\leq L\leq K$) sources of interest are given a priori (Section V). We call the proposed semiblind method Semi-IVE. In Semi-IVE, the $L$ separation filters, e.g., $\bm{w}_{1},\ldots,\bm{w}_{L}$, that correspond to the known transfer functions are optimized by iteratively solving the linearly constrained minimum variance (LCMV [44]) beamforming problem, and the remaining $\bm{w}_{L+1},\ldots,\bm{w}_{K}$ (and $W_{\mathrm{z}}$) are optimized by the fully blind IVE-IP2 algorithm. We experimentally show that when $L\geq K-1$ transfer functions are given, Semi-IVE yields surprisingly fast convergence.

Organization: The rest of this paper is organized as follows. The BSE and semi-BSE problems are defined in Section II. The probabilistic model for the proposed IVE is compared with related methods in Section III. The optimization algorithms are developed separately for the BSE and semi-BSE problems in Sections IV and V. The computational time complexity for these methods is discussed in Section VI. In Sections VII and VIII, experimental results and concluding remarks are described.

I-D Notations

Let $\mathcal{S}_{+}^{d}$ and $\mathcal{S}_{++}^{d}$ denote the sets of all Hermitian positive semidefinite (PSD) and positive definite (PD) matrices of size $d\times d$, respectively. Let $\operatorname{GL}(d)$ denote the set of all (complex) nonsingular matrices of size $d\times d$. Also, let $\bm{0}_{d}\in\mathbb{C}^{d}$ be the zero vector, $O_{i,j}\in\mathbb{C}^{i\times j}$ the zero matrix, $I_{d}\in\mathbb{C}^{d\times d}$ the identity matrix, and $\bm{e}_{k}$ a vector whose $k$-th element equals one and whose other elements are zero. For a vector or matrix $A$, $A^{\top}$ and $A^{h}$ denote the transpose and the conjugate transpose of $A$. The element-wise conjugate is denoted by $A^{\ast}=(A^{\top})^{h}$. For a matrix $A\in\mathbb{C}^{i\times j}$, $\operatorname{Im}A$ is defined as the subspace $\{A\bm{x}\mid\bm{x}\in\mathbb{C}^{j}\}\subset\mathbb{C}^{i}$. For a vector $\bm{v}$, $\|\bm{v}\|=\sqrt{\bm{v}^{h}\bm{v}}$ denotes the Euclidean norm.

TABLE I: Optimization process of BCD algorithms for problem (29)

| Method | Algorithm | Reference | Assumption | Optimization process of BCD |
| --- | --- | --- | --- | --- |
| Conventional (Aux)IVA | IP1 | [30] | — | $\bm{w}_{1}\rightarrow\cdots\rightarrow\bm{w}_{K}\rightarrow\cdots\rightarrow\bm{w}_{M}$ |
| Conventional (Aux)IVA | IP2 | [31] | $K=M=2$ | $W=[\bm{w}_{1},\bm{w}_{2}]$ (direct optimization) |
| Conventional (Aux)IVA | IP2 | [32] | (if $M$ is even) | $(\bm{w}_{1},\bm{w}_{2})\rightarrow\cdots\rightarrow(\bm{w}_{M-1},\bm{w}_{M})$ |
| Conventional IVE | IVE-OC 1) | [24], §III-C | Orthogonal constraint (OC) | $\bm{w}_{1}\rightarrow W_{\mathrm{z}}^{1}\rightarrow\cdots\rightarrow\bm{w}_{K}\rightarrow W_{\mathrm{z}}^{1}$ |
| Conventional IVE | IP2 3) | [26], §IV-E | $K\geq 2$ | $(\bm{w}_{1},W_{\mathrm{z}})\rightarrow\cdots\rightarrow(\bm{w}_{K},W_{\mathrm{z}})$ |
| Proposed IVE | IP1 1) | §IV-G | — | $\bm{w}_{1}\rightarrow(W_{\mathrm{z}},\Omega_{\mathrm{z}})\rightarrow\cdots\rightarrow\bm{w}_{K}\rightarrow(W_{\mathrm{z}},\Omega_{\mathrm{z}})$ |
| Proposed IVE | IP2 2) | §IV-D | $K=1$ | $W=[\bm{w}_{1},(W_{\mathrm{z}})]$ (direct optimization) |
| Proposed IVE | IP2 3) | §IV-F | $K\geq 2$ | $(\bm{w}_{1},(W_{\mathrm{z}}))\rightarrow\cdots\rightarrow(\bm{w}_{K},(W_{\mathrm{z}}))$ |
| Proposed Semi-IVE | IP2 4) | §V | Given $\bm{a}_{1},\ldots,\bm{a}_{L}$ ($L\geq K-1$) | $W=[\bm{w}_{1},\ldots,\bm{w}_{K},(W_{\mathrm{z}})]$ (direct optimization) |
| Proposed Semi-IVE | IP2 4) | §V | Given $\bm{a}_{1},\ldots,\bm{a}_{L}$ ($L\leq K-2$) | $(\bm{w}_{L+1},(W_{\mathrm{z}}))\rightarrow\cdots\rightarrow(\bm{w}_{K},(W_{\mathrm{z}}))$ |

1) The two algorithms are identical, as shown in Section IV-G.
2) Developed independently and simultaneously by Scheibler–Ono [25] and the authors [27] in the Proceedings of ICASSP 2020.
3) The proposed IVE-IP2 for $K\geq 2$ is an acceleration of the conventional IVE-IP2 developed in [26].
4) $\bm{w}_{1},\ldots,\bm{w}_{L}$, which correspond to $\bm{a}_{1},\ldots,\bm{a}_{L}$, are directly and globally optimized as LCMV beamformers (see Section V-B).
2,3,4) In the proposed IVE-IP2 and Semi-IVE, the optimizations of $W_{\mathrm{z}}$ are skipped (see Sections IV-D, IV-F, and V-C).

II Blind and semiblind source extraction

II-A Blind source extraction (BSE) problem

Throughout this paper, we discuss IVA and IVE using the terminology from audio source separation in the short-term Fourier transform (STFT) domain.

Suppose that $K$ super-Gaussian target-source signals and a stationary Gaussian noise signal of dimension $M-K$ are transmitted and observed by $M$ sensors. In this paper, we only consider the case where $1\leq K<M$. The observed signal in the STFT domain is modeled at each frequency bin $f=1,\ldots,F$ and time frame $t=1,\ldots,T$ as

\begin{align}
\bm{x}(f,t) &= A_{\mathrm{s}}(f)\bm{s}(f,t)+A_{\mathrm{z}}(f)\bm{z}(f,t)\in\mathbb{C}^{M}, \tag{1}\\
A_{\mathrm{s}}(f) &= [\,\bm{a}_{1}(f),\ldots,\bm{a}_{K}(f)\,]\in\mathbb{C}^{M\times K}, \tag{2}\\
\bm{s}(f,t) &= [\,s_{1}(f,t),\ldots,s_{K}(f,t)\,]^{\top}\in\mathbb{C}^{K}, \tag{3}\\
A_{\mathrm{z}}(f) &\in\mathbb{C}^{M\times(M-K)},\quad \bm{z}(f,t)\in\mathbb{C}^{M-K}, \tag{4}
\end{align}

where $s_{i}(f,t)\in\mathbb{C}$ and $\bm{z}(f,t)\in\mathbb{C}^{M-K}$ are the STFT coefficients of target source $i\in\{1,\ldots,K\}$ and the noise signal, respectively. $\bm{a}_{i}(f)\in\mathbb{C}^{M}$ is the (time-independent) acoustic transfer function (or steering vector) of source $i$ to the sensors, and $A_{\mathrm{z}}(f)\in\mathbb{C}^{M\times(M-K)}$ is that of the noise. The source signals are assumed to be statistically mutually independent.

In the blind source extraction (BSE) problem, we are given an observed mixture $\bm{x}=\{\bm{x}(f,t)\}_{f,t}$ and the number of target sources $K$. From these inputs, we seek to estimate the spatial images $\bm{x}_{1},\ldots,\bm{x}_{K}$ of the target sources, defined as $\bm{x}_{i}(f,t)\coloneqq\bm{a}_{i}(f)s_{i}(f,t)\in\mathbb{C}^{M}$, $i=1,\ldots,K$.
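To make the data model concrete, the following minimal numpy sketch (ours; all sizes and distributions are purely illustrative) simulates the mixing model (1) at a single frequency bin and forms the spatial image of source 1:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, T = 4, 2, 1000  # illustrative sizes (hypothetical)

# Mixing model (1) at one frequency bin (the frequency index f is dropped):
A_s = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))          # a_1, ..., a_K, Eq. (2)
A_z = rng.standard_normal((M, M - K)) + 1j * rng.standard_normal((M, M - K))  # noise mixing, Eq. (4)
s = rng.laplace(size=(K, T)) * np.exp(2j * np.pi * rng.random((K, T)))        # super-Gaussian targets
z = (rng.standard_normal((M - K, T)) + 1j * rng.standard_normal((M - K, T))) / np.sqrt(2)  # CN(0, I)
x = A_s @ s + A_z @ z                # observed mixture, Eq. (1)
x_1 = np.outer(A_s[:, 0], s[0])      # spatial image x_1(f,t) = a_1(f) s_1(f,t)
```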

II-B Semiblind source extraction (Semi-BSE) problem

In the semiblind source extraction (Semi-BSE) problem, in addition to the BSE inputs, we are given $L$ transfer functions $\bm{a}_{1},\ldots,\bm{a}_{L}$ for $L$ super-Gaussian sources, where $1\leq L\leq K$. From these inputs, we estimate all the target-source spatial images $\bm{x}_{1},\ldots,\bm{x}_{K}$. If $L=K$, then Semi-BSE is known as a beamforming problem.

Motivation to address Semi-BSE: In some applications of audio source extraction, such as meeting diarization [46], the locations of some (but not necessarily all) point sources are available or can be estimated accurately, and their acoustic transfer functions can be obtained from, e.g., a sound propagation model [47]. For instance, in a conference situation, some attendees may be sitting in chairs at fixed, known locations, while the locations of moderators, panel speakers, or the audience may change from time to time and cannot be determined in advance. In such a case, by using this partial prior knowledge of transfer functions, Semi-BSE methods can improve the computational efficiency and separation performance of BSE methods. In addition, since there are many effective methods for estimating the transfer function of (at least) a dominant source [48, 49, 50], there is a wide range of applications in which Semi-BSE can improve the performance of BSE.

III Probabilistic models

We start by presenting the probabilistic model for the proposed auxiliary-function-based independent vector extraction (IVE), which is almost the same as those of related methods such as IVA [5], AuxIVA [30, 31, 32], and the conventional IVE [22, 21, 23, 24, 25, 26].

III-A Probabilistic model of proposed IVE

In the mixing model (1), the mixing matrix

$$A(f)=[\,\bm{a}_{1}(f),\ldots,\bm{a}_{K}(f),A_{\mathrm{z}}(f)\,]\in\mathbb{C}^{M\times M} \tag{5}$$

is square and generally invertible, and hence the problem can be converted into one of estimating a separation matrix $W(f)\in\operatorname{GL}(M)$ satisfying $W(f)^{h}A(f)=I_{M}$, or equivalently, satisfying

\begin{align}
s_{i}(f,t) &= \bm{w}_{i}(f)^{h}\bm{x}(f,t)\in\mathbb{C},\quad i=1,\ldots,K, \tag{6}\\
\bm{z}(f,t) &= W_{\mathrm{z}}(f)^{h}\bm{x}(f,t)\in\mathbb{C}^{M-K}, \tag{7}
\end{align}

where we define

\begin{align}
W(f) &= [\,\bm{w}_{1}(f),\ldots,\bm{w}_{K}(f),W_{\mathrm{z}}(f)\,]\in\operatorname{GL}(M), \tag{8}\\
\bm{w}_{i}(f) &\in\mathbb{C}^{M},\quad i=1,\ldots,K, \tag{9}\\
W_{\mathrm{z}}(f) &\in\mathbb{C}^{M\times(M-K)}. \tag{10}
\end{align}

Denote by $\bm{s}_{i}(t)=[\,s_{i}(1,t),\ldots,s_{i}(F,t)\,]^{\top}\in\mathbb{C}^{F}$ the vector of all the frequency components of source $i$ at time frame $t$. The proposed IVE exploits the following three assumptions. Note that Assumption 2 was introduced for developing AuxICA [29] and AuxIVA [30, 31, 32].

Assumption 1 (Independence of sources).

The random variables $\{\bm{s}_{i}(t),\bm{z}(f,t)\}_{i,f,t}$ are mutually independent:

$$p(\{\bm{s}_{i}(t),\bm{z}(f,t)\}_{i,f,t})=\prod_{i,t}p(\bm{s}_{i}(t))\cdot\prod_{f,t}p(\bm{z}(f,t)).$$

Assumption 2 (Super-Gaussian distributions for the target sources [29, 30]).

The target-source signal $\bm{s}_{i}(t)$ follows a circularly symmetric super-Gaussian distribution:

$$-\log p(\bm{s}_{i}(t))=G(\|\bm{s}_{i}(t)\|)+\mathrm{const.}, \tag{11}$$

where $G\colon\mathbb{R}_{\geq 0}\to\mathbb{R}$ is differentiable and such that $\frac{G^{\prime}(r)}{r}$ is nonincreasing on $r\in\mathbb{R}_{>0}$. Here, $G^{\prime}$ is the first derivative of $G$. Candidates for $G$ (or for the probability density function) include the $\log\cosh$ function and the circularly symmetric generalized Gaussian distribution (GGD) with scale parameter $\alpha_{i}\in\mathbb{R}_{>0}$ and shape parameter $0<\beta<2$, which is also known as the exponential power distribution [51]:

$$G(\|\bm{s}_{i}(t)\|)=\left(\frac{\|\bm{s}_{i}(t)\|}{\alpha_{i}}\right)^{\beta}+2F\log\alpha_{i}. \tag{12}$$

The GGD is a parametric family of symmetric distributions; when $\beta=1$, it is nothing but the complex Laplace distribution. Many studies have experimentally shown that ICA-type methods, including IVA and IVE, work effectively for audio source separation tasks when audio signals such as speech are modeled by super-Gaussian distributions (see, e.g., [5, 6, 21, 22, 23, 24, 25, 26, 27, 30, 31, 32]).
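For reference, a minimal sketch (our illustration; the flooring constant is our own safeguard) of the MM weight $\phi_{i}(t)=G^{\prime}(r)/(2r)$ induced by the GGD contrast (12), which appears later in Eq. (33):

```python
import numpy as np

def ggd_weight(S_i, alpha=1.0, beta=1.0):
    """MM weight phi_i(t) for the GGD contrast (12):
    G(r) = (r/alpha)^beta + 2F log(alpha) gives G'(r) = beta r^(beta-1) / alpha^beta,
    hence phi_i(t) = G'(r)/(2r) = beta r^(beta-2) / (2 alpha^beta).

    S_i : (F, T) STFT coefficients of source i, so r_i(t) = ||S_i[:, t]||.
    """
    r = np.linalg.norm(S_i, axis=0)   # r_i(t)
    r = np.maximum(r, 1e-12)          # numerical flooring (our choice)
    return beta * r ** (beta - 2.0) / (2.0 * alpha ** beta)
```

For $\beta=1$ (the complex Laplace case), this reduces to $\phi_{i}(t)=1/(2\alpha_{i}r_{i}(t))$.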

Assumption 3 (Stationary Gaussian distribution for the background noise).

The noise signal $\bm{z}(f,t)\in\mathbb{C}^{M-K}$ follows a circularly symmetric complex Gaussian distribution with zero mean and identity covariance matrix:

\begin{align}
\bm{z}(f,t) &\sim \mathbb{C}\mathcal{N}\left(\bm{0}_{M-K},I_{M-K}\right), \tag{13}\\
p(\bm{z}(f,t)) &= \frac{1}{\pi^{M-K}}\exp\left(-\|\bm{z}(f,t)\|^{2}\right). \tag{14}
\end{align}

Assumption 3 plays a central role in deriving several efficient algorithms for IVE. Despite this assumption, as we experimentally confirm in Section VII, the proposed IVE can extract speech signals even in a diffuse noise environment, where the noise signal is considered super-Gaussian or nonstationary and has an arbitrarily large spatial rank.

With the model defined by (6)–(13), the negative loglikelihood $g_{0}(W)\coloneqq-\frac{1}{T}\log p(\bm{x}\mid W)$ can be computed as

\begin{align}
g_{0}(W) ={}& \frac{1}{T}\sum_{i=1}^{K}\sum_{t=1}^{T}G(\|\bm{s}_{i}(t)\|)+\frac{1}{T}\sum_{f=1}^{F}\sum_{t=1}^{T}\|\bm{z}(f,t)\|^{2}\nonumber\\
&-2\sum_{f=1}^{F}\log|\det W(f)|+\mathrm{const.}, \tag{15}
\end{align}

where $W\coloneqq\{W(f)\}_{f=1}^{F}$ are the variables to be optimized.

Remark 1.

The stationary Gaussian noise model (13) with the identity covariance matrix does not sacrifice generality. At first glance, it might seem better to employ

$$\bm{z}(f,t)\sim\mathbb{C}\mathcal{N}\left(\bm{0},\Omega_{\mathrm{z}}(f)\right) \tag{16}$$

with a general covariance matrix $\Omega_{\mathrm{z}}(f)\in\mathcal{S}_{++}^{M-K}$. However, we can freely change the variables to satisfy (13) using the ambiguity between $A_{\mathrm{z}}$ and $\bm{z}$, given by

$$A_{\mathrm{z}}(f)\bm{z}(f,t) = \left(A_{\mathrm{z}}(f)\Omega_{\mathrm{z}}(f)^{\frac{1}{2}}\right)\left(\Omega_{\mathrm{z}}(f)^{-\frac{1}{2}}\bm{z}(f,t)\right). \tag{17}$$

III-B Relation to IVA and AuxIVA

If we assume that the $M-K$ noise components also independently follow super-Gaussian distributions, then the IVE model coincides with that of IVA [5] or AuxIVA [30, 31, 32]. The following are two advantages of assuming the stationary Gaussian model (13).

(i) As we confirm experimentally in Section VII, when we optimize the IVE model, the separation filters $\bm{w}_{1},\ldots,\bm{w}_{K}$ extract the top $K$ highly super-Gaussian (or nonstationary) signals, such as speech, from the observed mixture, while $W_{\mathrm{z}}$ extracts only the background noises, which are more stationary and approximately follow Gaussian distributions. In IVA, by contrast, which assumes super-Gaussian noise models, the $K$ ($<M$) target-source signals need to be chosen from the $M$ separated signals after optimizing the model.

(ii) As we reveal in Section IV-G, in the proposed IVE it suffices to optimize $\operatorname{Im}W_{\mathrm{z}}=\{W_{\mathrm{z}}\bm{v}\mid\bm{v}\in\mathbb{C}^{M-K}\}\subset\mathbb{C}^{M}$, the subspace spanned by the columns of $W_{\mathrm{z}}\in\mathbb{C}^{M\times(M-K)}$, instead of $W_{\mathrm{z}}$ itself. Because $\operatorname{Im}W_{\mathrm{z}}$ can be optimized very efficiently (Section IV-G), IVE can reduce the computation time of IVA, especially when $K\ll M$.

III-C Relation to IVE with orthogonal constraint

The proposed IVE is inspired by OverIVA with an orthogonal constraint (OC) [24]. This conventional method will be called IVE-OC in this paper.

IVE-OC [24] was proposed as an acceleration of IVA [30] for the case where $K<M$ while maintaining its separation performance. The IVE-OC model is defined as the proposed IVE with the noise model (13) replaced by (16) and with two additional constraints:

\begin{align}
W_{\mathrm{z}}(f) &= \begin{bmatrix}W_{\mathrm{z}}^{1}(f)\\ -I_{M-K}\end{bmatrix},\quad W_{\mathrm{z}}^{1}(f)\in\mathbb{C}^{K\times(M-K)}, \tag{18}\\
&\frac{1}{T}\sum_{t=1}^{T}\bm{s}(f,t)\bm{z}(f,t)^{h}=O_{K,M-K}. \tag{19}
\end{align}

The first constraint (18) can be imposed because there is no need to extract the noise components (see [24, 26] for details). The second constraint (19), called an orthogonal constraint (OC [21]), was introduced to help the model distinguish between the target-source and noise signals.

The OC, which forces the sample correlation between the separated target-source and noise signals to be zero, can equivalently be expressed as

\begin{align}
W_{\mathrm{s}}(f)^{h}V_{\mathrm{z}}(f)W_{\mathrm{z}}(f) &= O_{K,M-K}, \tag{20}\\
V_{\mathrm{z}}(f) &= \frac{1}{T}\sum_{t=1}^{T}\bm{x}(f,t)\bm{x}(f,t)^{h}\in\mathcal{S}_{+}^{M}, \tag{21}\\
W_{\mathrm{s}}(f) &= [\,\bm{w}_{1}(f),\ldots,\bm{w}_{K}(f)\,]\in\mathbb{C}^{M\times K}, \tag{22}
\end{align}

which together with (18) implies

$$W_{\mathrm{z}}^{1}(f)=(W_{\mathrm{s}}(f)^{h}V_{\mathrm{z}}(f)E_{\mathrm{s}})^{-1}(W_{\mathrm{s}}(f)^{h}V_{\mathrm{z}}(f)E_{\mathrm{z}}), \tag{23}$$

where we define

\begin{align}
E_{\mathrm{s}} &\coloneqq [\,\bm{e}_{1},\ldots,\bm{e}_{K}\,]=\begin{bmatrix}I_{K}\\ O_{M-K,K}\end{bmatrix}\in\mathbb{C}^{M\times K}, \tag{24}\\
E_{\mathrm{z}} &\coloneqq [\,\bm{e}_{K+1},\ldots,\bm{e}_{M}\,]=\begin{bmatrix}O_{K,M-K}\\ I_{M-K}\end{bmatrix}\in\mathbb{C}^{M\times(M-K)}. \tag{25}
\end{align}

From (18) and (23), it turns out that $W_{\mathrm{z}}$ is uniquely determined by $W_{\mathrm{s}}$ in the IVE-OC model. Hence, in the paper on IVE-OC [24], an algorithm was proposed in which $W_{\mathrm{z}}$ is updated based on (18) and (23) immediately after updating any of the other variables $\bm{w}_{1},\ldots,\bm{w}_{K}$, so that the OC is always imposed on the model. Although the algorithm was experimentally shown to work well, its validity from a theoretical point of view is unclear because the update rule for $W_{\mathrm{z}}$ is derived solely from the constraints (18)–(19) and does not reflect an objective of the optimization problem for parameter estimation, such as minimizing the negative loglikelihood.
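For concreteness, a minimal numpy sketch (ours; the helper name is hypothetical) of this OC-based noise filter update, Eqs. (18) and (23), at one frequency bin:

```python
import numpy as np

def update_Wz_oc(Ws, Vz):
    """Update Wz from Ws via the OC, Eqs. (18) and (23).

    Ws : (M, K) target filters [w_1, ..., w_K]
    Vz : (M, M) sample covariance of the mixture, Eq. (21)
    """
    M, K = Ws.shape
    Es = np.eye(M, K)            # Eq. (24)
    Ez = np.eye(M, M - K, k=-K)  # Eq. (25)
    WsVz = Ws.conj().T @ Vz      # (K, M)
    Wz1 = np.linalg.solve(WsVz @ Es, WsVz @ Ez)  # Eq. (23)
    return np.vstack([Wz1, -np.eye(M - K)])      # Eq. (18)
```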

In this paper, we develop BCD algorithms for the maximum likelihood estimation of the proposed IVE that do not rely on the OC, and identify one such algorithm (IVE-IP1, developed in Section IV-G) that exactly coincides with the conventional algorithm for IVE-OC. This means that the OC is not essential for developing fast algorithms such as IVE-OC. By removing the OC from the model, we can also provide other, more computationally efficient algorithms for IVE, which is the main contribution of this paper.

IV Algorithms for the BSE problem

We develop iterative algorithms for the maximum likelihood estimation of IVE. The proposed and some conventional algorithms [24, 25, 26, 27] are based on the following two methods.

  • One is the conventional majorization-minimization (MM) algorithm developed for AuxICA [29] and AuxIVA [30]. In MM, instead of dealing with the original objective function $g_{0}$ (Eq. (15)), a surrogate function of $g_{0}$ that is easier to minimize is addressed (Section IV-A).

  • The other is a family of block coordinate descent (BCD) algorithms. In each iteration of a BCD, several columns of $W$ are updated so as to globally minimize the above surrogate function with respect to those variables. In this paper, we propose several BCDs that improve upon the conventional ones.

Our proposed algorithms are summarized in Algorithm 1. The optimization processes of all the BCDs are summarized in Table I and detailed in the following subsections. The computational time complexities of the algorithms are discussed in Section VI.

IV-A Majorization-minimization (MM) approach

We briefly describe how to apply the conventional MM technique developed for AuxICA [29] to the proposed IVE.

In IVE, as in AuxICA/AuxIVA, a surrogate function $g$ of $g_{0}$ is designed with an auxiliary variable $r$ that satisfies

$$g_{0}(W)=\min_{r}g(W,r). \tag{26}$$

Then, the variables $W$ and $r$ are alternately updated by iteratively solving

\begin{align}
r^{(l)} &\in \operatorname*{argmin}_{r} g(W^{(l-1)},r), \tag{27}\\
W^{(l)} &\in \operatorname*{argmin}_{W} g(W,r^{(l)}) \tag{28}
\end{align}

for $l=1,2,\ldots$ until convergence. In the same way as in AuxIVA [30], by applying Proposition 5 in Appendix A to the first term of $g_{0}(W)$, problem (28) in IVE comes down to solving the following $F$ subproblems:

$$W(f)\in\operatorname*{argmin}_{W(f)}~g(W(f),r^{(l)}),\quad f=1,\ldots,F, \tag{29}$$

where $r^{(l)}\coloneqq\{r_{i}^{(l)}(t)\in\mathbb{R}_{\geq 0}\}_{i,t}$ and

\begin{align}
g(W(f),r^{(l)}) ={}& \sum_{i=1}^{K}\bm{w}_{i}(f)^{h}V_{i}(f)\bm{w}_{i}(f)\nonumber\\
&+\operatorname{tr}\left(W_{\mathrm{z}}(f)^{h}V_{\mathrm{z}}(f)W_{\mathrm{z}}(f)\right)-2\log|\det W(f)|, \tag{30}\\
V_{i}(f) ={}& \frac{1}{T}\sum_{t=1}^{T}\phi_{i}(t)\,\bm{x}(f,t)\bm{x}(f,t)^{h}, \tag{31}\\
V_{\mathrm{z}}(f) ={}& \frac{1}{T}\sum_{t=1}^{T}\bm{x}(f,t)\bm{x}(f,t)^{h}, \tag{32}\\
\phi_{i}(t) ={}& \frac{G^{\prime}(r_{i}^{(l)}(t))}{2r_{i}^{(l)}(t)},\quad r_{i}^{(l)}(t)=\|\bm{s}_{i}^{(l)}(t)\|, \tag{33}\\
s_{i}^{(l)}(f,t) ={}& \bm{w}_{i}^{(l-1)}(f)^{h}\bm{x}(f,t). \tag{34}
\end{align}

Here, the computation of (33)–(34) corresponds to the optimization of (27). Recall that $G^{\prime}$ in (33) is the first derivative of $G\colon\mathbb{R}_{\geq 0}\to\mathbb{R}$ (see Assumption 2). To efficiently solve (29), we propose BCD algorithms in the following subsections. From the derivation, it is guaranteed that the objective function is monotonically nonincreasing at each iteration of the MM algorithm.¹

¹ Due to space limitations, the convergence rate and other convergence properties of the proposed algorithms are left as future work.
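To illustrate one outer MM step, the following numpy sketch (ours; the array layout and function name are our own conventions) evaluates Eqs. (31)–(34), i.e., the separated signals, the weights $\phi_{i}(t)$, and the covariance matrices for all frequency bins at once:

```python
import numpy as np

def mm_statistics(X, Ws, G_prime):
    """Evaluate Eqs. (31)-(34) for the current filters.

    X       : (F, M, T) multichannel mixture STFT
    Ws      : (F, M, K) current target filters w_1, ..., w_K per frequency bin
    G_prime : callable returning G'(r) elementwise (see Assumption 2)
    """
    F, M, T = X.shape
    S = np.einsum('fmk,fmt->fkt', Ws.conj(), X)        # s_i(f,t), Eq. (34)
    r = np.linalg.norm(S, axis=0)                      # r_i(t) = ||s_i(t)||, shape (K, T)
    phi = G_prime(r) / (2.0 * np.maximum(r, 1e-12))    # phi_i(t), Eq. (33)
    V = np.einsum('kt,fmt,fnt->fkmn', phi, X, X.conj()) / T  # V_i(f), Eq. (31)
    Vz = np.einsum('fmt,fnt->fmn', X, X.conj()) / T          # V_z(f), Eq. (32)
    return V, Vz
```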

IV-B Block coordinate descent (BCD) algorithms and the stationary condition for problem (29)

No algorithm is known that obtains a globally optimal solution of problem (29) for general $K,M\in\mathbb{N}$. Thus, iterative algorithms have been proposed to find a locally optimal solution. Among them, a family of BCD algorithms has attracted much attention (e.g., [29, 30, 31, 32, 24, 26, 25, 27, 39, 40, 41, 42]) because they have been experimentally shown to work faster and more robustly than other algorithms such as the natural gradient method [52]. This family of BCD algorithms (specialized to solve (29)) is currently called an iterative projection (IP) method.

As we will see in the following subsections, all the IP algorithms summarized in Table I can be derived by exploiting the stationary condition, which is also called the first-order necessary optimality condition [53]. To simplify the notation, when we discuss (29), we omit the frequency bin index $f$; for instance, $W(f)$ and $V_{i}(f)$ are simply denoted as $W$ and $V_{i}$ (without confusion).

Lemma 1 (See, e.g., [33, 34, 35]).

The stationary condition for problem (29) is expressed as ($i=1,\ldots,K$)

\begin{align}
\frac{\partial g}{\partial\bm{w}_{i}^{\ast}} = \bm{0}_{M} \quad&\Longleftrightarrow\quad W^{h}V_{i}\bm{w}_{i} = \bm{e}_{i}, \tag{35}\\
\frac{\partial g}{\partial W_{\mathrm{z}}^{\ast}} = O_{M,M-K} \quad&\Longleftrightarrow\quad W^{h}V_{\mathrm{z}}W_{\mathrm{z}} = E_{\mathrm{z}}, \tag{36}
\end{align}

where $E_{\mathrm{z}}$ is given by (25).

To rigorously derive the algorithms, we always assume the following two technical but mild conditions (C1) and (C2) for problem (29):²

(C1) $V_{1},\ldots,V_{K},V_{\mathrm{z}}\in\mathcal{S}_{+}^{M}$ are positive definite.

(C2) Estimates of $W\in\mathbb{C}^{M\times M}$ are always nonsingular during optimization.

² If (C1) is violated, problem (29) has no optimal solution and algorithms should diverge to infinity (see [54, Proposition 1] for the proof). Conversely, if (C1) is satisfied, problem (29) is guaranteed to have an optimal solution by Proposition 6 in Appendix A. In practice, the number of frames $T$ exceeds that of sensors $M$, and (C1) holds in general. Condition (C2) is satisfied automatically if we initialize $W$ as nonsingular. Intuitively, a singular $W$ implies $-\log|\det W|=+\infty$, which will never occur during optimization.

IV-C Conventional methods: IVA-IP1, IVA-IP2, and IVE-OC

IV-C1 IVA-IP1

Let $W_{\mathrm{z}}=[\bm{w}_{K+1},\ldots,\bm{w}_{M}]$. As shown in Table I, IVA-IP1 [30] cyclically updates each separation filter $\bm{w}_{1},\ldots,\bm{w}_{M}$ by solving the following subproblem for each $i=1,\ldots,M$ one by one:

$$\bm{w}_{i}\in\operatorname*{argmin}_{\bm{w}_{i}}g(\bm{w}_{1},\ldots,\bm{w}_{M},r). \tag{37}$$

This can be solved under (C1) and (C2) by

\begin{align}
\bm{u}_{i} &\leftarrow (W^{h}V_{i})^{-1}\bm{e}_{i}\in\mathbb{C}^{M}, \tag{38}\\
\bm{w}_{i} &\leftarrow \bm{u}_{i}\left(\bm{u}_{i}^{h}V_{i}\bm{u}_{i}\right)^{-\frac{1}{2}}\in\mathbb{C}^{M}. \tag{39}
\end{align}

Here, we define $V_{i}\coloneqq V_{\mathrm{z}}\in\mathcal{S}_{++}^{M}$ for $i=K+1,\ldots,M$, and the $i$-th column of $W$ in (38), i.e., $\bm{w}_{i}$, is set to its current value before the update. When applied to the BSE problem, the main drawback of IVA-IP1 is that the computation time increases significantly as $M$ gets larger, since it updates $W_{\mathrm{z}}$ even though there is no need to extract the background noises.
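A minimal numpy sketch (ours) of the IP1 update (38)–(39) of a single filter $\bm{w}_{i}$ at one frequency bin:

```python
import numpy as np

def ip1_update(W, Vi, i):
    """IP1 update of w_i by Eqs. (38)-(39); W is modified in place.

    W  : (M, M) current separation matrix [w_1, ..., w_M]
    Vi : (M, M) Hermitian PD matrix V_i (with V_i := V_z for i > K)
    """
    M = W.shape[0]
    e_i = np.zeros(M)
    e_i[i] = 1.0
    u = np.linalg.solve(W.conj().T @ Vi, e_i)          # Eq. (38)
    W[:, i] = u / np.sqrt(np.real(u.conj() @ Vi @ u))  # Eq. (39)
    return W
```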

IV-C2 IVE-OC

To accelerate IVA-IP1 when it is applied to the BSE problem, IVE-OC [24] updates $W_{\mathrm{z}}$ using (18) and (23) (see Section III-C and Algorithm 2). Although this update rule seems heuristic, we reveal in Section IV-G that IVE-OC can be understood in terms of BCD for the proposed IVE, which does not rely on the OC.

IV-C3 IVA-IP2

When $(K,M)=(1,2)$ or $(2,2)$, i.e., when $W=[\bm{w}_{1},\bm{w}_{2}]$, problem (29) can be solved directly (not iteratively) through a generalized eigenvalue problem [35, 31], which is more efficient than IVA-IP1. We extend this direct method to the case where $K=1$ and general $M$ ($\geq 2$) in Section IV-D.

In a previous work [32], IVA-IP2 was extended to the case where $K=M\geq 3$. This algorithm, which is also called IVA-IP2, updates two separation filters, e.g., $\bm{w}_{i}$ and $\bm{w}_{j}$, in each iteration by exactly solving the following subproblem:

$$(\bm{w}_{i},\bm{w}_{j})\in\operatorname*{argmin}_{\bm{w}_{i},\,\bm{w}_{j}}g(\bm{w}_{1},\ldots,\bm{w}_{M},r). \tag{40}$$

We extend this algorithm in Section IV-F.

IV-D Proposed IVE-IP2 for the case of $K=1$

We focus on the case where $K=1$ and derive the proposed algorithm IVE-IP2,³ which is expected to be more efficient than IVE-IP1. The algorithm is summarized in Algorithm 3.

³ IVE-IP2 for $K=1$ was developed independently and simultaneously by Scheibler–Ono (under the name fast IVE or FIVE [25]) and by the authors [27] in the Proceedings of ICASSP 2020. In this paper, we give a complete proof of the derivation of IVE-IP2 and add some remarks (see also Section IV-H for the projection back operation).

When $(K,M)=(1,2)$ or $(2,2)$, problem (29) can be solved directly through a generalized eigenvalue problem [35, 31, 32]. The following proposition extends this direct method to the case where $K=1$ and $M\geq 2$.

Proposition 2.

Let $K=1$, $M\geq 2$, and $V_{1},V_{\mathrm{z}}\in\mathcal{S}_{++}^{M}$. A matrix $W=[\bm{w}_{1},W_{\mathrm{z}}]\in\operatorname{GL}(M)$ with $\bm{w}_{1}\in\mathbb{C}^{M}$ satisfies the stationary conditions (35) and (36) if and only if

\begin{align}
\bm{w}_{1} &= \bm{u}_{1}\left(\bm{u}_{1}^{h}V_{1}\bm{u}_{1}\right)^{-\frac{1}{2}}Q_{1},\quad Q_{1}\in\mathcal{U}(1), \tag{41}\\
W_{\mathrm{z}} &= U_{\mathrm{z}}\left(U_{\mathrm{z}}^{h}V_{\mathrm{z}}U_{\mathrm{z}}\right)^{-\frac{1}{2}}Q_{\mathrm{z}},\quad Q_{\mathrm{z}}\in\mathcal{U}(M-1), \tag{42}\\
U_{\mathrm{z}} &\in\mathbb{C}^{M\times(M-1)}\quad\text{with}\quad U_{\mathrm{z}}^{h}V_{\mathrm{z}}\bm{u}_{1}=\bm{0}_{M-1}, \tag{43}\\
\bm{u}_{1} &\in\mathbb{C}^{M\times 1}\quad\text{with}\quad V_{\mathrm{z}}\bm{u}_{1}=\lambda V_{1}\bm{u}_{1}, \tag{44}
\end{align}

where $\mathcal{U}(d)$ is the set of all unitary matrices of size $d\times d$. Also, (44) is the generalized eigenvalue problem for $(V_{\mathrm{z}},V_{1})$ with eigenvalue $\lambda\in\mathbb{R}_{>0}$ and eigenvector $\bm{u}_{1}\in\mathbb{C}^{M}$.

Moreover, if the generalized eigenvalue $\lambda\in\mathbb{R}_{>0}$ in (44) is chosen as the largest one, then any $W\in\operatorname{GL}(M)$ obtained by (41)–(44) is a globally optimal solution of problem (29).

Proof.

We first show that $W$ satisfies (35)–(36) if and only if it is computed by (41)–(44).

For the “if” part, observe that $U_{\mathrm{z}}^{h}V_{1}\bm{u}_{1}=\bm{0}_{M-1}$ holds by (43)–(44) and $\lambda\neq 0$. Thus, $W$ surely satisfies (35)–(36).

We prove the “only if” part. The stationary conditions (35)–(36) imply that the vectors $V_{1}\bm{w}_{1}$ and $V_{\mathrm{z}}\bm{w}_{1}$ are orthogonal to the subspace $\operatorname{Im}W_{\mathrm{z}}$ of dimension $M-1$. Hence, $\bm{w}_{1}=c\bm{u}_{1}$ holds for some $c\in\mathbb{C}$, where $\bm{u}_{1}$ is given by (44). This $c$ is restricted by $\bm{w}_{1}^{h}V_{1}\bm{w}_{1}=1$, and we obtain (41). In a similar manner, (36) implies that the vector $V_{\mathrm{z}}\bm{u}_{1}$ is orthogonal to $\operatorname{Im}W_{\mathrm{z}}$ of dimension $M-1$. Hence, $W_{\mathrm{z}}=U_{\mathrm{z}}R$ holds for some $R\in\mathbb{C}^{(M-1)\times(M-1)}$, where $U_{\mathrm{z}}$ is given by (43). This $R$ is restricted by $W_{\mathrm{z}}^{h}V_{\mathrm{z}}W_{\mathrm{z}}=I_{M-1}$, and we have (42).

We next show the latter statement. By Proposition 6 in Appendix A, globally optimal solutions exist, and they must satisfy the stationary conditions, which are equivalent to (41)–(44). Since (41)–(44) satisfy (35)–(36), the sum of the first and second terms of $g$ equals $M$, which is constant. On the other hand, for the $\log\det$ term, it holds that

\begin{align}
|\det W| &= \left(\bm{u}_{1}^{h}V_{1}\bm{u}_{1}\right)^{-\frac{1}{2}}\cdot\det\left(U_{\mathrm{z}}^{h}V_{\mathrm{z}}U_{\mathrm{z}}\right)^{-\frac{1}{2}}\cdot|\det U|\nonumber\\
&= \sqrt{\lambda}\,\det\left(U^{h}V_{\mathrm{z}}U\right)^{-\frac{1}{2}}\cdot|\det U| = \sqrt{\lambda}\,\det(V_{\mathrm{z}})^{-\frac{1}{2}},
\end{align}

where we define $U\coloneqq[\bm{u}_{1},U_{\mathrm{z}}]$ and use $U_{\mathrm{z}}^{h}V_{\mathrm{z}}\bm{u}_{1}=\bm{0}_{M-1}$ in the second equality. Hence, the largest $\lambda$ leads to the smallest $g$, which concludes the proof. ∎

By Proposition 2, under condition (C1), a globally optimal solution of problem (29) can be obtained by updating $W=[\bm{w}_{1},W_{\mathrm{z}}]$ using (41)–(44) with $Q_{1}=1$ and $Q_{\mathrm{z}}=I_{M-1}$ and choosing the generalized eigenvalue $\lambda$ in (44) as the largest one. Moreover, this algorithm can be accelerated by omitting the computation of (42)–(43) and updating only $\bm{w}_{1}$ according to (41) and (44). This is applicable for extracting the unique target-source signal because the formulas for computing $\bm{w}_{1}$, $V_{1}$, and $V_{\mathrm{z}}$ are independent of $W_{\mathrm{z}}$. The obtained algorithm, IVE-IP2, is shown in Algorithm 3.

Interestingly, because $V_{\mathrm{z}}$ and $V_{1}$ can be viewed as the covariance matrices of the mixture and noise signals, respectively, the update rules (41) and (44) turn out to form a MaxSNR beamformer [44, 45]. Hence, IVE-IP2 can be understood as a method that iteratively updates the MaxSNR beamformer $\bm{w}_{1}$ and the target-source signal $\bm{s}_{1}$.
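A minimal sketch (ours) of this update using scipy's generalized Hermitian eigensolver; as noted above, $W_{\mathrm{z}}$ need not be touched:

```python
import numpy as np
from scipy.linalg import eigh

def ive_ip2_k1(V1, Vz):
    """Update w_1 by Eqs. (41) and (44) for K = 1 (MaxSNR-type update)."""
    lam, U = eigh(Vz, V1)   # generalized eigenvalues of Vz u = lambda V1 u, ascending
    u1 = U[:, -1]           # eigenvector of the largest eigenvalue, Eq. (44)
    return u1 / np.sqrt(np.real(u1.conj() @ V1 @ u1))  # Eq. (41) with Q1 = 1
```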

IV-E Drawback of conventional IVE-IP2 for the case of $K\geq 2$

Suppose $2\leq K<M$. The conventional IVE-IP2 developed in [26, Algorithm 2] is a BCD algorithm that cyclically updates the pairs $(\bm{w}_{1},W_{\mathrm{z}}),\ldots,(\bm{w}_{K},W_{\mathrm{z}})$ by solving the following subproblem for each $i=1,\ldots,K$ one by one:

$$(\bm{w}_{i},W_{\mathrm{z}})\in\operatorname*{argmin}_{\bm{w}_{i},\,W_{\mathrm{z}}}g(\bm{w}_{1},\ldots,\bm{w}_{K},W_{\mathrm{z}},r). \tag{45}$$

It was shown in [26, Theorem 3] that a globally optimal solution of problem (45) can be obtained through a generalized eigenvalue problem. In this paper, we simplify the result of [26, Theorem 3] in the next proposition, based on which we speed up the conventional IVE-IP2 in Section IV-F.

Proposition 3.

Let $2\leq K<M$ and $V_{i},V_{\mathrm{z}}\in\mathcal{S}_{++}^{M}$. A globally optimal solution of problem (45) is obtained as

\begin{align}
\bm{w}_{i} &= P_{i}\bm{b}_{i}\left(\bm{b}_{i}^{h}G_{i}\bm{b}_{i}\right)^{-\frac{1}{2}}\in\mathbb{C}^{M\times 1}, \tag{46}\\
W_{\mathrm{z}} &= P_{\mathrm{z}}B_{\mathrm{z}}\left(B_{\mathrm{z}}^{h}G_{\mathrm{z}}B_{\mathrm{z}}\right)^{-\frac{1}{2}}\in\mathbb{C}^{M\times(M-K)}, \tag{47}\\
P_{\ell} &= \left((W^{\prime})^{h}V_{\ell}\right)^{-1}[\,\bm{e}_{i},E_{\mathrm{z}}\,]\in\mathbb{C}^{M\times(M-K+1)}, \tag{48}\\
G_{\ell} &= P_{\ell}^{h}V_{\ell}P_{\ell}\in\mathcal{S}_{++}^{M-K+1},\quad \ell\in\{i,\mathrm{z}\}, \tag{49}\\
W^{\prime} &= [\,\bm{w}_{1},\ldots,\bm{w}_{i}^{\prime},\ldots,\bm{w}_{K},W^{\prime}_{\mathrm{z}}\,]\in\operatorname{GL}(M), \tag{50}\\
B_{\mathrm{z}} &\in\mathbb{C}^{(M-K+1)\times(M-K)}\quad\text{with}\quad B_{\mathrm{z}}^{h}G_{\mathrm{z}}\bm{b}_{i}=\bm{0}_{M-K}, \tag{51}\\
\bm{b}_{i} &\in\mathbb{C}^{(M-K+1)\times 1}\quad\text{with}\quad G_{i}\bm{b}_{i}=\lambda_{\max}G_{\mathrm{z}}\bm{b}_{i}, \tag{52}
\end{align}

where $E_{\mathrm{z}}$ is defined by (25), and $\bm{w}_{i}^{\prime}\in\mathbb{C}^{M\times 1}$ and $W^{\prime}_{\mathrm{z}}\in\mathbb{C}^{M\times(M-K)}$ in (50) are set arbitrarily as long as $W^{\prime}$ is nonsingular (for instance, they can be set to the current values under condition (C2)). Also, (52) is the generalized eigenvalue problem for $(G_{i},G_{\mathrm{z}})$, and $\bm{b}_{i}$ is the eigenvector corresponding to the largest generalized eigenvalue $\lambda_{\max}\in\mathbb{R}_{>0}$.

Proof.

The proof is given in Appendix B. ∎

In the conventional IVE-IP2, under condition (C2), problem (45) is solved by (46)–(50), where $B_{i,\mathrm{z}}\coloneqq[\bm{b}_{i},B_{\mathrm{z}}]$ is computed through the following full generalized eigenvalue problem instead of through (51)–(52):

$$G_{i}B_{i,\mathrm{z}}=G_{\mathrm{z}}B_{i,\mathrm{z}}\operatorname{diag}\{\lambda_{\max},\lambda_{2},\ldots,\lambda_{M-K+1}\}. \tag{53}$$

Since the computational cost of solving (53) is greater than that of computing (51)–(52), we can speed up the conventional IVE-IP2, as shown in the next subsection.

IV-F Proposed IVE-IP2 for $K\geq 2$

The proposed IVE-IP2 for $K\geq 2$, which is summarized in Algorithm 4, is an acceleration of the conventional IVE-IP2 described in the previous subsection. We here provide another, more efficient formula for solving (45) without changing the behavior of the conventional IVE-IP2.

Thanks to Proposition 3, we can update the pair $(\bm{w}_{i},W_{\mathrm{z}})$ according to (46)–(52), in which only the first generalized eigenvector has to be computed. Moreover, for the purpose of optimizing $\bm{w}_{1},\ldots,\bm{w}_{K}$, we do not have to update $W_{\mathrm{z}}$ using Eqs. (47) and (51) when solving problem (45) for each $i=1,\ldots,K$. To see this, observe the following two facts:

  1. In the MM algorithm (Eqs. (29)–(34)), the auxiliary variable $r$, and by extension the covariance matrices $V_{1},\ldots,V_{K},V_{\mathrm{z}}$, are independent of $W_{\mathrm{z}}$; and hence,

  2. $W_{\mathrm{z}}$ never contributes to the construction of the surrogate function $g$ during iterative optimization.

This observation implies that problem (45) remains the same during iterative optimization regardless of whether we update $W_{\mathrm{z}}$ or not. Hence, in IVE-IP2, it is sufficient to update only $\bm{w}_{i}$ in (45) using Eqs. (46), (48)–(50), and (52), as sketched below.

This advantage stems from the stationary Gaussian assumption for the noise components. By contrast, suppose that the noise components were nonstationary or non-Gaussian. Then, the surrogate function $g$ as well as $V_{\mathrm{z}}$ would depend on $W_{\mathrm{z}}$ in the same way that $V_{i}$ depends on $\bm{w}_{i}$, meaning that $W_{\mathrm{z}}$ would have to be optimized in subproblem (45). In this way, the stationary Gaussian assumption is essential for deriving a computationally efficient algorithm for IVE.
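The resulting update of $\bm{w}_{i}$ alone can be sketched as follows (our numpy/scipy illustration; the function name is hypothetical):

```python
import numpy as np
from scipy.linalg import eigh

def ive_ip2_update_wi(W, Vi, Vz, i, K):
    """Proposed IVE-IP2: update only w_i via Eqs. (46), (48)-(50), and (52).

    W  : (M, M) current separation matrix W' = [w_1, ..., w_K, Wz]
    Vi : (M, M) weighted covariance V_i;  Vz : (M, M) covariance V_z
    """
    M = W.shape[0]
    E = np.zeros((M, M - K + 1), dtype=complex)
    E[i, 0] = 1.0              # e_i
    E[K:, 1:] = np.eye(M - K)  # E_z, Eq. (25)
    Pi = np.linalg.solve(W.conj().T @ Vi, E)  # P_i, Eq. (48)
    Pz = np.linalg.solve(W.conj().T @ Vz, E)  # P_z, Eq. (48)
    Gi = Pi.conj().T @ Vi @ Pi                # G_i, Eq. (49)
    Gz = Pz.conj().T @ Vz @ Pz                # G_z, Eq. (49)
    lam, B = eigh(Gi, Gz)                     # Eq. (52): G_i b = lambda G_z b
    b = B[:, -1]                              # eigenvector of the largest eigenvalue
    W[:, i] = Pi @ b / np.sqrt(np.real(b.conj() @ Gi @ b))  # Eq. (46)
    return W
```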

IV-G IVE-IP1: Reinterpretation of the conventional IVE-OC

In this subsection, we model the noise components by (16) with a general time-independent covariance matrix $\Omega_{\mathrm{z}}$, i.e.,

$$-\log p(\bm{z}(t))=\bm{z}(t)^{h}\Omega_{\mathrm{z}}^{-1}\bm{z}(t)+\log\det\Omega_{\mathrm{z}}+\mathrm{const.}, \tag{54}$$

and develop a BCD algorithm, called IVE-IP1, that cyclically updates $\bm{w}_{1},(W_{\mathrm{z}},\Omega_{\mathrm{z}}),\bm{w}_{2},(W_{\mathrm{z}},\Omega_{\mathrm{z}}),\ldots,\bm{w}_{K},(W_{\mathrm{z}},\Omega_{\mathrm{z}})$ by solving the following subproblems one by one:

\begin{align}
\bm{w}_{i} &\in \operatorname*{argmin}_{\bm{w}_{i}} g(\bm{w}_{1},\ldots,\bm{w}_{K},W_{\mathrm{z}},\Omega_{\mathrm{z}},r), \tag{55}\\
(W_{\mathrm{z}},\Omega_{\mathrm{z}}) &\in \operatorname*{argmin}_{W_{\mathrm{z}},\,\Omega_{\mathrm{z}}} g(\bm{w}_{1},\ldots,\bm{w}_{K},W_{\mathrm{z}},\Omega_{\mathrm{z}},r). \tag{56}
\end{align}

Here, the frequency bin index $f$ is omitted. Because of the noise covariance $\Omega_{\mathrm{z}}$, the objective function $g$ changes slightly to

\begin{align}
g(W,\Omega_{\mathrm{z}},r) ={}& \sum_{i=1}^{K}\bm{w}_{i}^{h}V_{i}\bm{w}_{i}+\operatorname{tr}\left(W_{\mathrm{z}}^{h}V_{\mathrm{z}}W_{\mathrm{z}}\Omega_{\mathrm{z}}^{-1}\right)\nonumber\\
&+\log\det\Omega_{\mathrm{z}}-2\log|\det W|. \tag{57}
\end{align}

It will turn out that this BCD algorithm, IVE-IP1, is identical to the conventional IVE-OC summarized in Algorithm 2. In other words, IVE-OC, which relies on the orthogonal constraint (OC), can be interpreted as a BCD algorithm for the proposed IVE without the OC.

Since problem (55) is the same as problem (37), the update rules for $\bm{w}_{1},\ldots,\bm{w}_{K}$ are given by (38)–(39). An algorithm for problem (56), on the other hand, can be obtained from the next proposition.

Proposition 4.

Let $K,M\in\mathbb{N}$ with $K<M$, let $V_{\mathrm{z}}\in\mathcal{S}_{++}^{M}$, and let $W_{\mathrm{s}}=[\bm{w}_{1},\ldots,\bm{w}_{K}]\in\mathbb{C}^{M\times K}$ have full column rank. Then, a pair $(W_{\mathrm{z}},\Omega_{\mathrm{z}})$ is a globally optimal solution of problem (56) if and only if it is computed by

\begin{align}
W_{\mathrm{z}} &\in\mathbb{C}^{M\times(M-K)}\quad\text{with}\quad W_{\mathrm{s}}^{h}V_{\mathrm{z}}W_{\mathrm{z}}=O_{K,M-K}, \tag{58}\\
\Omega_{\mathrm{z}} &= W_{\mathrm{z}}^{h}V_{\mathrm{z}}W_{\mathrm{z}}\in\mathcal{S}_{++}^{M-K}. \tag{59}
\end{align}

(Note that $W_{\mathrm{z}}$ must have full column rank to guarantee the positive definiteness of $\Omega_{\mathrm{z}}$.)

Proof.

The stationary condition, which is a necessary optimality condition of problem (56), is expressed as

\begin{align}
\frac{\partial g}{\partial W_{\mathrm{z}}^{\ast}} = O \quad&\Longleftrightarrow\quad \begin{cases}W_{\mathrm{s}}^{h}V_{\mathrm{z}}W_{\mathrm{z}}=O_{K,M-K},\\ W_{\mathrm{z}}^{h}V_{\mathrm{z}}W_{\mathrm{z}}=\Omega_{\mathrm{z}},\end{cases} \tag{60}\\
\frac{\partial g}{\partial\Omega_{\mathrm{z}}^{-1}} = O \quad&\Longleftrightarrow\quad W_{\mathrm{z}}^{h}V_{\mathrm{z}}W_{\mathrm{z}}=\Omega_{\mathrm{z}}. \tag{61}
\end{align}

Hence, the “only if” part is obvious.

To see the “if” part, we show that all stationary points are globally optimal. To this end, it suffices to prove that

(i) $g$ takes the same value at all stationary points; and

(ii) $g$ attains its minimum at some $(W_{\mathrm{z}},\Omega_{\mathrm{z}})$.

We first check (i). By (59), the second term of $g$ becomes $\operatorname{tr}(W_{\mathrm{z}}^{h}V_{\mathrm{z}}W_{\mathrm{z}}\Omega_{\mathrm{z}}^{-1})=M-K$, which is constant. Let $(W_{\mathrm{z}},\Omega_{\mathrm{z}})$ and $(W_{\mathrm{z}}^{\prime},\Omega_{\mathrm{z}}^{\prime})$ be two stationary points. Since $\operatorname{Im}W_{\mathrm{z}}=\operatorname{Im}W^{\prime}_{\mathrm{z}}$ by the first part of (60), we have $W_{\mathrm{z}}^{\prime}=W_{\mathrm{z}}Q$ for some $Q\in\operatorname{GL}(M-K)$. Then, by (61), $\Omega_{\mathrm{z}}^{\prime}=Q^{h}\Omega_{\mathrm{z}}Q$. Hence, for the $\log\det$ terms of $g$, we have

\begin{align}
&\log\det\Omega_{\mathrm{z}}^{\prime}-2\log|\det[W_{\mathrm{s}},W_{\mathrm{z}}^{\prime}]|\nonumber\\
={}&\log\det(Q^{h}\Omega_{\mathrm{z}}Q)-2\log|\det[W_{\mathrm{s}},W_{\mathrm{z}}]\cdot\det Q|\nonumber\\
={}&\log\det\Omega_{\mathrm{z}}-2\log|\det[W_{\mathrm{s}},W_{\mathrm{z}}]|,
\end{align}

which concludes (i).

We next show (ii). Let us change the variables from $(W_{\mathrm{z}},\Omega_{\mathrm{z}})$ to $(W_{\mathrm{z}}^{\prime},\Omega_{\mathrm{z}})$ with $W_{\mathrm{z}}^{\prime}=W_{\mathrm{z}}\Omega_{\mathrm{z}}^{-\frac{1}{2}}$. Then, the objective function $g$ with respect to $W_{\mathrm{z}}^{\prime}$ and $\Omega_{\mathrm{z}}$ is expressed as

$$g(W_{\mathrm{z}}^{\prime},\Omega_{\mathrm{z}})=\operatorname{tr}(W_{\mathrm{z}}^{\prime h}V_{\mathrm{z}}W_{\mathrm{z}}^{\prime})-2\log|\det[W_{\mathrm{s}},W_{\mathrm{z}}^{\prime}]|+\mathrm{const.},$$

which is independent of $\Omega_{\mathrm{z}}$. By Proposition 6, the problem of minimizing $g$ with respect to $W_{\mathrm{z}}^{\prime}$ attains its minimum at some $W_{\mathrm{z}}^{\prime}$, which in turn implies that problem (56) also attains its minimum at some $(W_{\mathrm{z}},\Omega_{\mathrm{z}})$. ∎

Proposition 4 implies that in problem (56) it is sufficient to optimize $\operatorname{Im}W_{\mathrm{z}}$, instead of $W_{\mathrm{z}}$ itself, so as to satisfy the orthogonal constraint (OC), i.e., Eq. (58), which is part of the stationary condition (60).⁴ Since the update formulas (18) and (23) for $W_{\mathrm{z}}$ in IVE-OC surely satisfy the OC, they are applicable for solving problem (56). Hence, it turns out that the conventional IVE-OC summarized in Algorithm 2 can be obtained as a BCD for the proposed IVE.

⁴ The observation that the OC appears as part of the stationary condition was pointed out in [26].
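Putting the pieces together, one IVE-IP1 sweep at a frequency bin could then look as follows; this sketch reuses the hypothetical helpers ip1_update and update_Wz_oc from the earlier sketches and assumes W, the list V of matrices $V_{i}$, Vz, and K are already defined:

```python
# One IVE-IP1 sweep: w_1 -> (Wz, Omega_z) -> ... -> w_K -> (Wz, Omega_z).
for i in range(K):
    W = ip1_update(W, V[i], i)        # Eqs. (38)-(39)
    Wz = update_Wz_oc(W[:, :K], Vz)   # Eq. (58) via Eqs. (18) and (23)
    Omega_z = Wz.conj().T @ Vz @ Wz   # Eq. (59)
    W[:, K:] = Wz
```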

IV-H Source extraction and projection back

After optimizing the separation matrix $W$ in IVE, we can achieve source extraction by computing the minimum mean square error (MMSE) estimate of the source spatial image $\bm{x}_{i}$ for each $i=1,\ldots,K$:

$$\hat{\bm{x}}_{i}(f,t)\coloneqq\underbrace{\left(W(f)^{-h}\bm{e}_{i}\right)}_{\text{projection back}}\underbrace{\left(\bm{w}_{i}(f)^{h}\bm{x}(f,t)\right)}_{\text{source extraction}}. \tag{62}$$

The projection back operation [55] adjusts the amplitude ambiguity of the extracted signals between frequency bins.
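As a reference, the following NumPy sketch illustrates (62) for a single source index; the function name, 0-based indexing, and array shapes are our illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def extract_with_projection_back(W, X, i):
    """Estimate the i-th source spatial image via Eq. (62).

    W : (F, M, M) complex separation matrices with columns w_1, ..., w_M.
    X : (F, M, T) complex STFT of the observed mixture.
    i : 0-based source index.
    """
    # Source extraction: s_i(f, t) = w_i(f)^h x(f, t).
    s_i = np.einsum('fm,fmt->ft', W[:, :, i].conj(), X)
    # Projection back: a_i(f) = W(f)^{-h} e_i, i.e., the i-th column of W^{-h}.
    a_i = np.linalg.inv(W.conj().transpose(0, 2, 1))[:, :, i]
    # Spatial image estimate: x_hat_i(f, t) = a_i(f) s_i(f, t).
    return a_i[:, :, None] * s_i[:, None, :]
```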

We here show that the projection back operation, i.e., W^{-h}\bm{e}_{i} for i=1,\ldots,K, requires only \bm{w}_{1},\ldots,\bm{w}_{K} and \operatorname*{Im}W_{\operatorname*{z}}, but not W_{\operatorname*{z}} itself (the frequency bin index f is dropped for simplicity). To see this, let

W(i)=[𝒘1,,𝒘K,Wz(i)]GL(M),i{1,2}\displaystyle W^{(i)}=[\bm{w}_{1},\ldots,\bm{w}_{K},W_{\operatorname*{z}}^{(i)}]\in\operatorname*{GL}(M),\quad i\in\{1,2\} (63)

satisfy ImWz(1)=ImWz(2)\operatorname*{Im}W_{\operatorname*{z}}^{(1)}=\operatorname*{Im}W_{\operatorname*{z}}^{(2)}. Then, we have

W^{(1)}=W^{(2)}\begin{bmatrix}I_{K}&O\\ O&D\end{bmatrix}\quad\text{for some }D\in\operatorname*{GL}(M-K).

This implies \left(W^{(1)}\right)^{-h}\bm{e}_{i}=\left(W^{(2)}\right)^{-h}\bm{e}_{i} for i=1,\ldots,K, meaning that the projection back operation depends only on \operatorname*{Im}W_{\operatorname*{z}}, as claimed.

Remark 2.

Since the proposed IVE-IP2 never updates W_{\operatorname*{z}} from its initial value, we update \operatorname*{Im}W_{\operatorname*{z}} using (18) and (23) after the optimization, which corresponds to Step 1 in Algorithm 1. This update rule is valid for this purpose because of the stationary condition (36), which is equivalent to (60) with \Omega_{\operatorname*{z}}=I_{M-K}.

V Algorithm for semiblind BSE problem

V-A Proposed Semi-IVE

We next address the semiblind source extraction problem (Semi-BSE) defined in Section II. In Semi-BSE, we are given a priori the LL transfer functions (steering vectors) for target sources l=1,,Ll=1,\ldots,L (1LK1\leq L\leq K), denoted as

A1(f)[𝒂1(f),,𝒂L(f)]M×L.\displaystyle A_{1}(f)\coloneqq[\,\bm{a}_{1}(f),\ldots,\bm{a}_{L}(f)\,]\in\mathbb{C}^{M\times L}. (64)

In this situation, it is natural, in the context of the linearly constrained minimum variance (LCMV [44]) beamforming algorithm, to constrain the separation matrix as W(f)^{h}A_{1}(f)=E_{1}, where E_{1} is defined by (71) below. We thus propose the Semi-BSE method, called Semi-IVE, as IVE with these linear constraints, i.e.,

\left.\begin{array}{cl}\underset{W}{\operatorname*{minimize}}&g_{0}(W)\quad\text{(defined by (15))}\\ \operatorname*{subject~to}&W(f)^{h}A_{1}(f)=E_{1},\quad f=1,\ldots,F.\end{array}\right\}\quad(67)

Here, W{W(f)}f=1FW\coloneqq\{W(f)\}_{f=1}^{F}.

The optimization of Semi-IVE is also based on the MM algorithm described in Section IV-A. The surrogate function for Semi-IVE is the same as that of IVE, i.e., Eq. (30), and the problem of minimizing the surrogate function becomes the following constrained optimization problem for each frequency bin f=1,,Ff=1,\ldots,F:

\left.\begin{array}{cl}\underset{W(f)}{\operatorname*{minimize}}&g(W(f),r)\quad\text{(defined by (30))}\\ \operatorname*{subject~to}&W(f)^{h}A_{1}(f)=E_{1}.\end{array}\right\}\quad(70)

The goal of this section is to develop an efficient algorithm to obtain a (local) optimal solution for problem (70).

Hereafter, we omit frequency bin index ff and use the following notations for simplicity:

E1\displaystyle E_{1} =[𝒆1(M),,𝒆L(M)]=[ILOML,L]M×L,\displaystyle=[\,\bm{e}_{1}^{(M)},\ldots,\bm{e}_{L}^{(M)}\,]=\begin{bmatrix}I_{L}\\ O_{M-L,L}\end{bmatrix}\in\mathbb{C}^{M\times L}, (71)
E2\displaystyle E_{2} =[𝒆L+1(M),,𝒆M(M)]=[OL,MLIML]M×(ML),\displaystyle=[\,\bm{e}_{L+1}^{(M)},\ldots,\bm{e}_{M}^{(M)}\,]=\begin{bmatrix}O_{L,M-L}\\ I_{M-L}\end{bmatrix}\in\mathbb{C}^{M\times(M-L)}, (72)
W2\displaystyle W_{2} =[𝒘L+1,,𝒘K,Wz]M×(ML).\displaystyle=[\,\bm{w}_{L+1},\ldots,\bm{w}_{K},W_{\operatorname*{z}}\,]\in\mathbb{C}^{M\times(M-L)}. (73)

Here, the superscript (d) in \bm{e}_{i}^{(d)}\in\mathbb{C}^{d} denotes the dimension of the unit vector.

V-B LCMV beamforming algorithm for 𝐰1,,𝐰L\bm{w}_{1},\ldots,\bm{w}_{L}

By applying Proposition 7 in Appendix A, the objective function can be expressed as

g(W)\displaystyle g(W) =\displaystyle= i=1K𝒘ihVi𝒘i+tr(WzhVzWz)\displaystyle\sum_{i=1}^{K}\bm{w}_{i}^{h}V_{i}\bm{w}_{i}+\operatorname*{tr}(W_{\operatorname*{z}}^{h}V_{\operatorname*{z}}W_{\operatorname*{z}})
+logdet(A1hA1)logdet(W2hW2).\displaystyle+\log\det\left(A_{1}^{h}A_{1}\right)-\log\det\left(W_{2}^{h}W_{2}\right). (74)

Hence, in problem (70), separation filters 𝒘1,,𝒘L\bm{w}_{1},\ldots,\bm{w}_{L} that correspond to the given transfer functions 𝒂1,,𝒂L\bm{a}_{1},\ldots,\bm{a}_{L} can be globally optimized by solving

\left.\begin{array}{cl}\underset{\bm{w}_{i}}{\operatorname*{minimize}}&\bm{w}_{i}^{h}V_{i}\bm{w}_{i}\\ \operatorname*{subject~to}&\bm{w}_{i}^{h}A_{1}=(\bm{e}_{i}^{(L)})^{\top}\in\mathbb{C}^{1\times L}.\end{array}\right\}\quad(77)

This problem is nothing but the LCMV beamforming problem, whose closed-form solution is

𝒘i=Vi1A1(A1hVi1A1)1𝒆i(L)M.\displaystyle\bm{w}_{i}=V_{i}^{-1}A_{1}\left(A_{1}^{h}V_{i}^{-1}A_{1}\right)^{-1}\bm{e}_{i}^{(L)}\in\mathbb{C}^{M}. (78)
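As an illustration, here is a minimal NumPy sketch of the closed-form LCMV solution (78) at one frequency bin; the function name and array conventions are illustrative assumptions.

```python
import numpy as np

def lcmv_filter(V_i, A1, i):
    """Closed-form LCMV solution (78) at a single frequency bin.

    V_i : (M, M) Hermitian positive-definite weighted covariance.
    A1  : (M, L) matrix of the given transfer functions a_1, ..., a_L.
    i   : 0-based target index in {0, ..., L-1}.
    """
    L = A1.shape[1]
    ViA = np.linalg.solve(V_i, A1)            # V_i^{-1} A_1
    gram = A1.conj().T @ ViA                  # A_1^h V_i^{-1} A_1, (L, L)
    # w_i = V_i^{-1} A_1 (A_1^h V_i^{-1} A_1)^{-1} e_i
    return ViA @ np.linalg.solve(gram, np.eye(L)[:, i])
```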

V-C Block coordinate descent (BCD) algorithm for W2W_{2}

We develop a BCD algorithm for optimizing the remaining variables W_{2}. Since W_{2}\in\mathbb{C}^{M\times(M-L)} is restricted by the (M-L)L linear constraints W_{2}^{h}A_{1}=O_{M-L,L}, it can be parameterized by (M-L)^{2} free variables. One such parameterization is given by

W2\displaystyle W_{2} =W2W¯M×(ML),W¯(ML)×(ML),\displaystyle=W_{2}^{\prime}\overline{W}\in\mathbb{C}^{M\times(M-L)},\quad\overline{W}\in\mathbb{C}^{(M-L)\times(M-L)}, (79)
W2\displaystyle W_{2}^{\prime} =[A1,E2]hE2M×(ML),\displaystyle=[A_{1},E_{2}]^{-h}E_{2}\in\mathbb{C}^{M\times(M-L)}, (80)

where E_{2} is defined in (72), and [A_{1},E_{2}] is assumed to be nonsingular. It is easy to verify that this W_{2} satisfies the linear constraints. By substituting (79)–(80) into (74), we can reformulate problem (70) as the unconstrained problem of minimizing

g¯(W¯,r)=i=L+1K𝒘¯ihV¯i𝒘¯i+tr(W¯zhV¯zW¯z)2log|detW¯|\displaystyle\overline{g}(\overline{W},r)=\sum_{i=L+1}^{K}\overline{\bm{w}}_{i}^{h}\overline{V}_{\!i}\overline{\bm{w}}_{i}+\operatorname*{tr}(\overline{W}_{\!\operatorname*{z}}^{h}\overline{V}_{\!\operatorname*{z}}\overline{W}_{\!\operatorname*{z}})-2\log|\det\overline{W}|

with respect to W¯\overline{W}, where we define

W¯\displaystyle\overline{W} =[𝒘¯L+1,,𝒘¯K,W¯z](ML)×(ML),\displaystyle=[\,\overline{\bm{w}}_{L+1},\ldots,\overline{\bm{w}}_{K},\overline{W}_{\!\operatorname*{z}}\,]\in\mathbb{C}^{(M-L)\times(M-L)}, (81)
𝒘¯i\displaystyle\overline{\bm{w}}_{i} (ML)×1,i=L+1,,K,\displaystyle\in\mathbb{C}^{(M-L)\times 1},\quad i=L+1,\ldots,K, (82)
W¯z\displaystyle\overline{W}_{\!\operatorname*{z}} (ML)×(MK),\displaystyle\in\mathbb{C}^{(M-L)\times(M-K)}, (83)
V¯i\displaystyle\overline{V}_{\!i} =(W2)hViW2𝒮++ML,i{L+1,,K,z}.\displaystyle=\left(W_{2}^{\prime}\right)^{h}V_{i}W_{2}^{\prime}\in\mathcal{S}_{++}^{M-L},~{}~{}~{}i\in\{L+1,\ldots,K,\operatorname*{z}\}. (84)

Interestingly, this problem is nothing but the BSE problem already discussed in Section IV.
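As a quick sanity check, the following NumPy sketch (with illustrative names and random data) confirms that the parameterization (79)–(80) builds the constraint W_{2}^{h}A_{1}=O into W_{2} by construction.

```python
import numpy as np

M, L = 6, 2
rng = np.random.default_rng(0)
A1 = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
E2 = np.eye(M)[:, L:]                     # (72): last M-L unit vectors

B = np.concatenate([A1, E2], axis=1)      # [A_1, E_2], assumed nonsingular
W2p = np.linalg.inv(B.conj().T) @ E2      # (80): W_2' = [A_1, E_2]^{-h} E_2

# Any W2 = W2' @ Wbar satisfies W2^h A1 = O, so the constraint holds for free.
Wbar = rng.standard_normal((M - L, M - L))
W2 = W2p @ Wbar
print(np.allclose(W2.conj().T @ A1, 0))   # True up to round-off
```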

In a similar manner to IVE-IP2, our proposed Semi-IVE algorithm updates 𝒘¯L+1,,𝒘¯K\overline{\bm{w}}_{L+1},\ldots,\overline{\bm{w}}_{K} one by one by solving the following subproblem for each i=L+1,,Ki=L+1,\ldots,K:

(\overline{\bm{w}}_{i},\overline{W}_{\!\operatorname*{z}})\in\operatorname*{argmin}_{\overline{\bm{w}}_{i},\,\overline{W}_{\!\operatorname*{z}}}\overline{g}(\overline{\bm{w}}_{L+1},\ldots,\overline{\bm{w}}_{K},\overline{W}_{\!\operatorname*{z}},r).\quad(85)
  • When L=K1L=K-1 and W¯=[𝒘¯K,W¯z]\overline{W}=[\,\overline{\bm{w}}_{K},\overline{W}_{\!\operatorname*{z}}\,], a global optimal solution for problem (85) can be obtained by applying Proposition 2.

  • When 1LK21\leq L\leq K-2, a global optimal solution of (85) can be obtained by applying Proposition 3.

  • Note that for the same reason as in the proposed IVE-IP2, updating W¯z\overline{W}_{\!\operatorname*{z}} in (85) is not necessary.

In summary, Semi-IVE is presented in Algorithm 5, in which we use the following notations:

IML\displaystyle I_{M-L} =[𝒆¯L+1,,𝒆¯K,E¯z]GL(ML),\displaystyle=[\,\overline{\bm{e}}_{L+1},\ldots,\overline{\bm{e}}_{K},\overline{E}_{\operatorname*{z}}\,]\in\operatorname*{GL}(M-L), (86)
𝒆¯i\displaystyle\overline{\bm{e}}_{i} =𝒆iL(ML)ML,i=L+1,,K,\displaystyle=\bm{e}_{i-L}^{(M-L)}\in\mathbb{C}^{M-L},\quad i=L+1,\ldots,K, (87)
E¯z\displaystyle\overline{E}_{\operatorname*{z}} =[𝒆KL+1(ML),,𝒆ML(ML)](ML)×(MK).\displaystyle=[\,\bm{e}_{K-L+1}^{(M-L)},\ldots,\bm{e}_{M-L}^{(M-L)}\,]\in\mathbb{C}^{(M-L)\times(M-K)}. (88)
Remark 3.

Semi-IVE does not update W¯z\overline{W}_{\!\operatorname*{z}} and WzW_{\operatorname*{z}} from the initial values. Hence, for the same reason as in Remark 2, we need to optimize ImW¯z\operatorname*{Im}\overline{W}_{\!\operatorname*{z}} and ImWz\operatorname*{Im}W_{\operatorname*{z}} after the optimization and before performing the projection back operation. For this purpose, we adopt the following formula:

Wz=W2W¯z,W¯z=[(W¯shV¯zE¯s)1(W¯shV¯zE¯z)IMK].\displaystyle W_{\operatorname*{z}}=W_{2}^{\prime}\overline{W}_{\!\operatorname*{z}},\quad\overline{W}_{\!\operatorname*{z}}=\begin{bmatrix}(\overline{W}_{\!\operatorname*{s}}^{h}\overline{V}_{\!\operatorname*{z}}\overline{E}_{\operatorname*{s}})^{-1}(\overline{W}_{\!\operatorname*{s}}^{h}\overline{V}_{\!\operatorname*{z}}\overline{E}_{\operatorname*{z}})\\ -I_{M-K}\end{bmatrix}. (89)

Here, we use the following notations:

W¯\displaystyle\overline{W} =[W¯s,W¯z],\displaystyle=[\,\overline{W}_{\!\operatorname*{s}},\overline{W}_{\!\operatorname*{z}}\,], W¯s\displaystyle\quad\overline{W}_{\!\operatorname*{s}} =[𝒘¯L+1,,𝒘¯K],\displaystyle=[\,\overline{\bm{w}}_{L+1},\ldots,\overline{\bm{w}}_{K}\,],
IML\displaystyle I_{M-L} =[E¯s,E¯z],\displaystyle=[\,\overline{E}_{\operatorname*{s}},\overline{E}_{\operatorname*{z}}\,], E¯s\displaystyle\quad\overline{E}_{\operatorname*{s}} =[𝒆¯L+1,,𝒆¯K].\displaystyle=[\,\overline{\bm{e}}_{L+1},\ldots,\overline{\bm{e}}_{K}\,].

The formula (89) optimizes ImW¯z\operatorname*{Im}\overline{W}_{\!\operatorname*{z}} (and hence ImWz\operatorname*{Im}W_{\operatorname*{z}}) as described in Remark 2.

VI Computational time complexity

We compare the computational time complexity per iteration of the algorithms presented in Table I. In IVE-IP1, IVE-IP2, and Semi-IVE, the runtime is dominated by

  • the computation of V1,,VK𝒮++MV_{1},\ldots,V_{K}\in\mathcal{S}_{++}^{M} at Step 1 in Algorithm 1, which costs O(KM2FT)\mathrm{O}(KM^{2}FT); or

  • the matrix inversions and generalized eigenvalue decompositions in Algorithms 25, which cost O(KM3F)\mathrm{O}(KM^{3}F).

Thus, the time complexity of these algorithms is

O(KM2FT+KM3F).\displaystyle\mathrm{O}(KM^{2}FT+KM^{3}F). (90)

On the other hand, the time complexity of IVA-IP1 and IVA-IP2 is

O(M3FT+M4F)\displaystyle\mathrm{O}(M^{3}FT+M^{4}F) (91)

due to the computation of MM covariance matrices V1,,VMV_{1},\ldots,V_{M}. Consequently, IVE reduces the computational time complexity of IVA by a factor of K/MK/M.

1 Data: Observed signal 𝒙\bm{x};
         Number of sources KK; and
         LL transfer functions A1=[𝒂1,,𝒂L]A_1=[\bm{a}_{1},\ldots,\bm{a}_{L}],
         where 0LK0\leq L\leq K.
2 Result: Separated spatial images 𝒙1,,𝒙K\bm{x}_{1},\ldots,\bm{x}_{K}.
3 begin
       /* Initialization */
      4 W(f)IMW(f)\leftarrow-I_{M}
      5 Vz(f)1Tt=1T𝒙(f,t)𝒙(f,t)hV_{\operatorname*{z}}(f)\leftarrow\frac{1}{T}\sum_{t=1}^{T}\bm{x}(f,t)\bm{x}(f,t)^{h}
      6 if using IVE-IP1 or IVE-IP2 then
            7 Update Wz(f)W_{\operatorname*{z}}(f) using Step 2 in Algorithm 2
      8if using Semi-IVE then
            9 W2(f)[A1(f),E2]hE2W_{2}^{\prime}(f)\leftarrow[\,A_{1}(f),E_{2}\,]^{-h}E_{2}
            10 V¯z(f)W2(f)hVz(f)W2(f)\overline{V}_{\!\operatorname*{z}}(f)\leftarrow W_{2}^{\prime}(f)^{h}V_{\operatorname*{z}}(f)W_{2}^{\prime}(f)
            11 Update Wz(f)W_{\operatorname*{z}}(f) using (89)
            
      /* Optimization */
      12 repeat
            13 for i=1,,Ki=1,\ldots,K do
                  14 si(f,t)𝒘i(f)h𝒙(f,t)s_{i}(f,t)\leftarrow\bm{w}_{i}(f)^{h}\bm{x}(f,t)
                  15 ri(t)𝒔i(t)r_{i}(t)\leftarrow\|\bm{s}_{i}(t)\|
                  16 αiβ=β2F(1Ttri(t)β)\alpha_{i}^{\beta}=\frac{\beta}{2F}(\frac{1}{T}\sum_{t}r_{i}(t)^{\beta})
                  17 ϕi(t)G(ri(t),αi)2ri(t)=β21αiβri(t)2β\phi_{i}(t)\leftarrow\frac{G^{\prime}(r_{i}(t),\alpha_{i})}{2r_{i}(t)}=\frac{\beta}{2}\frac{1}{\alpha_{i}^{\beta}r_{i}(t)^{2-\beta}}
                  18 ϕi(t)min{ϕi(t),105×min{ϕi(t)}t=1T}\phi_{i}(t)\leftarrow\min\{\phi_{i}(t),10^{5}\times\min\{\phi_{i}(t)\}_{t=1}^{T}\} // for numerical stability
                  19 Vi(f)1Ttϕi(t)𝒙(f,t)𝒙(f,t)hV_{i}(f)\leftarrow\frac{1}{T}\sum_{t}\phi_{i}(t)\bm{x}(f,t)\bm{x}(f,t)^{h}
                  20 Vi(f)Vi(f)+103tr{Vi(f)}IMV_{i}(f)\leftarrow V_{i}(f)+10^{-3}\operatorname*{tr}\{V_{i}(f)\}I_{M} // for numerical stability
                  
            21Update W(f)W(f) for each ff using IVE-IP1, IVE-IP2, or Semi-IVE.
            
      until convergence
      /* Source extraction */
      22 if using IVE-IP2 then
            23 Update Wz(f)W_{\operatorname*{z}}(f) using Step 2 in Algorithm 2
      24if using Semi-IVE then
            25 Update Wz(f)W_{\operatorname*{z}}(f) using (89)
            
      26𝒙i(f,t)(W(f)h𝒆i)𝒘i(f)h𝒙(f,t)\bm{x}_{i}(f,t)\leftarrow\left(W(f)^{-h}\bm{e}_{i}\right)\bm{w}_{i}(f)^{h}\bm{x}(f,t)
      
Algorithm 1 IVE (L=0L=0) and Semi-IVE (1LK1\leq L\leq K) based on generalized Gaussian distributions in Assumption 2: G(𝒔i(t),αi)=(𝒔i(t)αi)β+2FlogαiG(\|\bm{s}_{i}(t)\|,\alpha_{i})=\left(\frac{\|\bm{s}_{i}(t)\|}{\alpha_{i}}\right)^{\beta}+2F\log\alpha_{i}. Here, α1,,αK\alpha_{1},\ldots,\alpha_{K} are parameters to be optimized.
1 for i=1,,Ki=1,\ldots,K do
      2 𝒖i(f)(W(f)hVi(f))1𝒆i\bm{u}_{i}(f)\leftarrow\left(W(f)^{h}V_{i}(f)\right)^{-1}\bm{e}_{i}
      3 𝒘i(f)𝒖i(f)(𝒖i(f)hVi(f)𝒖i(f))12\bm{w}_{i}(f)\leftarrow\bm{u}_{i}(f)\left(\bm{u}_{i}(f)^{h}V_{i}(f)\bm{u}_{i}(f)\right)^{-\frac{1}{2}}
      4 Wz(f)[(Ws(f)hVz(f)Es)1(Ws(f)hVz(f)Ez)IMK]W_{\operatorname*{z}}(f)\leftarrow{\small\begin{bmatrix}(W_{\operatorname*{s}}(f)^{h}V_{\operatorname*{z}}(f)E_{\operatorname*{s}})^{-1}(W_{\operatorname*{s}}(f)^{h}V_{\operatorname*{z}}(f)E_{\operatorname*{z}})\\ -I_{M-K}\end{bmatrix}}
      
Algorithm 2 IVE-IP1 (proposed in [24])
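To make the update order of Algorithm 2 concrete, here is a minimal single-frequency-bin NumPy sketch; the function name, 0-based indexing, and in-place update of W are illustrative assumptions.

```python
import numpy as np

def ive_ip1_update(W, V_list, V_z, K):
    """One sweep of Algorithm 2 (IVE-IP1) at a single frequency bin.

    W      : (M, M) complex separation matrix [w_1, ..., w_K, W_z].
    V_list : list of K (M, M) weighted covariances V_1, ..., V_K.
    V_z    : (M, M) noise covariance.
    """
    M = W.shape[0]
    Es, Ez = np.eye(M)[:, :K], np.eye(M)[:, K:]
    for i in range(K):
        # Step 2: u_i = (W^h V_i)^{-1} e_i.
        u = np.linalg.solve(W.conj().T @ V_list[i], np.eye(M)[:, i])
        # Step 3: normalize so that w_i^h V_i w_i = 1.
        W[:, i] = u / np.sqrt((u.conj() @ V_list[i] @ u).real)
        # Step 4: closed-form update of W_z.
        Ws = W[:, :K]
        top = np.linalg.solve(Ws.conj().T @ V_z @ Es, Ws.conj().T @ V_z @ Ez)
        W[:, K:] = np.concatenate([top, -np.eye(M - K)], axis=0)
    return W
```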
1 Solve Vz(f)𝒖=λmaxV1(f)𝒖V_{\operatorname*{z}}(f)\bm{u}=\lambda_{\max}V_{1}(f)\bm{u} to obtain the eigenvector 𝒖\bm{u} corresponding to the largest eigenvalue λmax\lambda_{\max}.
2 𝒘1(f)𝒖(𝒖hV1(f)𝒖)12\bm{w}_{1}(f)\leftarrow\bm{u}\left(\bm{u}^{h}V_{1}(f)\bm{u}\right)^{-\frac{1}{2}}
Algorithm 3 IVE-IP2 for K=1K=1
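For reference, a single-bin sketch of Algorithm 3, assuming SciPy's generalized Hermitian eigensolver is available; names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def ive_ip2_k1_update(V1, Vz):
    """Algorithm 3: update w_1 at one frequency bin via the GEVD
    V_z u = lambda_max V_1 u (V1, Vz Hermitian, V1 positive definite)."""
    _, vecs = eigh(Vz, V1)           # generalized EVD, ascending eigenvalues
    u = vecs[:, -1]                  # eigenvector for lambda_max
    # Normalize so that w_1^h V_1 w_1 = 1, as in Step 2.
    return u / np.sqrt((u.conj() @ V1 @ u).real)
```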
1 for i=1,,Ki=1,\ldots,K do
      2 for {i,z}\ell\in\{i,\operatorname*{z}\} do
            3 P(f)(W(f)hV(f))1[𝒆i,Ez]P_{\ell}(f)\leftarrow\left(W(f)^{h}V_{\ell}(f)\right)^{-1}[\,\bm{e}_{i},E_{\operatorname*{z}}\,]
            4 G(f)P(f)hV(f)P(f)G_{\ell}(f)\leftarrow P_{\ell}(f)^{h}V_{\ell}(f)P_{\ell}(f)
            
      5Solve Gi(f)𝒃=λmaxGz(f)𝒃G_{i}(f)\bm{b}=\lambda_{\max}G_{\operatorname*{z}}(f)\bm{b} to obtain 𝒃\bm{b} corresponding to the largest eigenvalue λmax\lambda_{\max}.
      6 𝒘i(f)Pi(f)𝒃(𝒃hGi(f)𝒃)12\bm{w}_{i}(f)\leftarrow P_{i}(f)\bm{b}\left(\bm{b}^{h}G_{i}(f)\bm{b}\right)^{-\frac{1}{2}}
Algorithm 4 IVE-IP2 for K2K\geq 2
/* LCMV beamforming */
1 for i=1,,Li=1,\ldots,L do
      2 𝒘i(f)Vi(f)1A1(f)(A1(f)hVi(f)1A1(f))1𝒆i\bm{w}_{i}(f)\leftarrow V_{i}(f)^{-1}A_{1}(f)\left(A_{1}(f)^{h}V_{i}(f)^{-1}A_{1}(f)\right)^{-1}\bm{e}_{i}
      
3if L=KL=K then
       return
/* BCD */
4 for i=L+1,,Ki=L+1,\ldots,K do
      5 V¯i(f)W2(f)hVi(f)W2(f)\overline{V}_{\!i}(f)\leftarrow W_{2}^{\prime}(f)^{h}V_{i}(f)W_{2}^{\prime}(f)
      
6if L=K1L=K-1 then
      7 Solve V¯z(f)𝒖¯=λmaxV¯K(f)𝒖¯\overline{V}_{\!\operatorname*{z}}(f)\overline{\bm{u}}=\lambda_{\max}\overline{V}_{\!K}(f)\overline{\bm{u}} to obtain 𝒖¯\overline{\bm{u}} corresponding to the largest eigenvalue λmax\lambda_{\max}.
      8 𝒘K(f)W2(f)𝒖¯(𝒖¯hV¯K(f)𝒖¯)12\bm{w}_{K}(f)\leftarrow W_{2}^{\prime}(f)\overline{\bm{u}}\left(\overline{\bm{u}}^{h}\overline{V}_{\!K}(f)\overline{\bm{u}}\right)^{-\frac{1}{2}}
9else
      10 for i=L+1,,Ki=L+1,\ldots,K do
            11 for {i,z}\ell\in\{i,\operatorname*{z}\} do
                  12 P¯(f)(W¯(f)hV¯(f))1[𝒆¯i,E¯z]\overline{P}_{\ell}(f)\leftarrow\left(\overline{W}(f)^{h}\overline{V}_{\ell}(f)\right)^{-1}[\,\overline{\bm{e}}_{i},\overline{E}_{\operatorname*{z}}\,]
                  13 G¯(f)P¯(f)hV¯(f)P¯(f)\overline{G}_{\ell}(f)\leftarrow\overline{P}_{\ell}(f)^{h}\overline{V}_{\ell}(f)\overline{P}_{\ell}(f)
                  
            14Solve G¯i(f)𝒃¯=λmaxG¯z(f)𝒃¯\overline{G}_{i}(f)\overline{\bm{b}}=\lambda_{\max}\overline{G}_{\operatorname*{z}}(f)\overline{\bm{b}} to obtain 𝒃¯\overline{\bm{b}} corresponding to the largest eigenvalue λmax\lambda_{\max}.
            15 𝒘i(f)W2(f)P¯i(f)𝒃¯(𝒃¯hG¯i(f)𝒃¯)12\bm{w}_{i}(f)\leftarrow W_{2}^{\prime}(f)\overline{P}_{i}(f)\overline{\bm{b}}\left(\overline{\bm{b}}^{h}\overline{G}_{i}(f)\overline{\bm{b}}\right)^{-\frac{1}{2}}
      
Algorithm 5 Semi-IVE

VII Experiments

In this numerical experiment, we evaluated the following properties of the IVE and Semi-IVE algorithms:

  • The source extraction performance for the speech signals in terms of the signal-to-distortion ratio (SDR [56]) between the estimated and oracle source spatial images;

  • the runtime performance.

We compared the performance of the following five methods whose optimization procedures are summarized in Table I:

  1. 1.

    IVA-IP1-old: The conventional AuxIVA with IP1 [30], followed by picking KK signals in an oracle manner.

  2. 2.

    IVE-IP1-old: The conventional IVE-IP1 or IVE-OC [24] (Algorithms 1 and 2).

  3. 3.

    IVE-IP2-old: The conventional IVE-IP2 [26] (see [26, Algorithm 2] for the implementation when K2K\geq 2).

  4. 4.

    IVE-IP2-new: The proposed IVE-IP2 (Algorithms 1, 3, and 4), where the generalized eigenvalue problems are solved using the power method with 30 iterations (see Section VII-C for details).⁵ Note that in the case of K=1, IVE-IP2-old (called FIVE [25]) and IVE-IP2-new are the same algorithm (see Section IV-D). However, we distinguish between them because we propose to use the power method to efficiently obtain the first generalized eigenvector in the proposed IVE-IP2-new.

  5. 5.

    Semi-IVE-(L)(L)-new: The proposed semiblind IVE algorithm, where the transfer functions of LL super-Gaussian sources are given as an oracle (Algorithms 1 and 5).

VII-A Dataset

As evaluation data, we generated synthesized convolutive noisy mixtures of speech signals.

Room impulse response (RIR) data: We used the RIR data recorded in room E2A from the RWCP Sound Scene Database in Real Acoustical Environments [57]. The reverberation time (RT60\mathrm{RT}_{60}) of room E2A is 300 ms. These data consist of nine RIRs from nine different directions.

Speech data: We used point-source speech signals from the test set of the TIMIT corpus [58]. We concatenated the speech signals from the same speaker so that the length of each signal exceeded ten seconds. We prepared 168 speech signals in total.

Noise data: We used a background noise signal recorded in a cafe (CAF) from the third ‘CHiME’ Speech Separation and Recognition Challenge [59]. We chose a monaural signal captured at ch1 on a tablet and used it as a point-source noise signal. The noise signal was about 35 minutes long.

Mixture signals: We generated 100 mixtures consisting of K{1,2,3}K\in\{1,2,3\} speech signals and J=5J=5 noise signals:

  1. 1.

    We selected KK speech signals at random from the 168 speech data. We selected JJ non-overlapping segments at random from the noise data and prepared JJ noise signals. We selected K+JK+J RIRs at random from the original nine RIRs.

  2. 2.

    We convolved the KK speech and JJ noise signals with the selected K+JK+J RIRs to create K+JK+J spatial images.

  3. 3.

    We added the obtained K+JK+J spatial images in such a way that SNR10log101Ki=1Kλi(s)j=1Jλj(n)\mathrm{SNR}\coloneqq 10\log_{10}\frac{\frac{1}{K}\sum_{i=1}^{K}\lambda_{i}^{(\mathrm{s})}}{\sum_{j=1}^{J}\lambda_{j}^{(\mathrm{n})}} becomes 0 dB if K=1,2K=1,2 and 5 dB if K=3K=3, where λi(s)\lambda_{i}^{(\mathrm{s})} and λj(n)\lambda_{j}^{(\mathrm{n})} denote the sample variances of the iith speech-source and jjth noise-source spatial images.

VII-B Experimental conditions

For all the methods, we initialized W(f)=-I_{M} and assumed that each super-Gaussian source \bm{s}_{i}(t)\in\mathbb{C}^{F} follows the generalized Gaussian distribution (GGD). More concretely, we set G(r_{i}(t),\alpha_{i})=(\frac{r_{i}(t)}{\alpha_{i}})^{\beta}+2F\log\alpha_{i} with shape parameter \beta=0.1 in Assumption 2. The scale parameters \alpha_{1},\ldots,\alpha_{K}\in\mathbb{R}_{>0} are optimized as \alpha_{i}^{\beta}=\frac{\beta}{2F}(\frac{1}{T}\sum_{t}r_{i}(t)^{\beta}) each time W is updated (Step 1 in Algorithm 1).
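For concreteness, the following NumPy sketch (with illustrative names and shapes) computes the GGD scale and weight updates of Algorithm 1 exactly as written above, including the clipping for numerical stability.

```python
import numpy as np

def ggd_updates(S, beta=0.1):
    """Scale and weight updates for the GGD contrast in Algorithm 1.

    S : (K, F, T) complex separated sources s_i(f, t).
    Returns alpha_i^beta of shape (K,) and weights phi of shape (K, T).
    """
    K, F, T = S.shape
    r = np.linalg.norm(S, axis=1)                     # r_i(t) = ||s_i(t)||
    alpha_beta = beta / (2 * F) * np.mean(r ** beta, axis=1)
    phi = (beta / 2) / (alpha_beta[:, None] * r ** (2 - beta))
    # Clipping for numerical stability, as in Algorithm 1
    # (zero-norm frames would need extra guarding in practice).
    phi = np.minimum(phi, 1e5 * phi.min(axis=1, keepdims=True))
    return alpha_beta, phi
```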

In Semi-IVE-(L)-new, we prepared oracle transfer functions \bm{a}_{1},\ldots,\bm{a}_{L} using the oracle spatial images \bm{x}_{1},\ldots,\bm{x}_{L}. We set \bm{a}_{\ell}(f) to the first eigenvector of the sample covariance matrix R_{\ell}(f)=\frac{1}{T}\sum_{t=1}^{T}\bm{x}_{\ell}(f,t)\bm{x}_{\ell}(f,t)^{h} for each source \ell\in\{1,\ldots,L\}, normalized so that \|\bm{a}_{\ell}(f)\|=1.
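A minimal NumPy sketch of this oracle steering-vector computation, under our illustrative shape conventions:

```python
import numpy as np

def oracle_steering_vector(X_img):
    """First eigenvector of the sample covariance R_l(f), per frequency.

    X_img : (F, M, T) complex STFT of the oracle spatial image x_l.
    Returns unit-norm steering vectors a_l(f) of shape (F, M).
    """
    T = X_img.shape[2]
    R = np.einsum('fmt,fnt->fmn', X_img, X_img.conj()) / T
    # eigh returns ascending eigenvalues; take the last (principal) eigenvector.
    _, vecs = np.linalg.eigh(R)
    a = vecs[:, :, -1]
    return a / np.linalg.norm(a, axis=1, keepdims=True)
```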

The performance was tested for one to three speakers and two to eight microphones. The sampling rate was 16 kHz, the reverberation time was 300 ms, the frame length was 4096 (256 ms), and the frame shift was 1024 (64 ms).

VII-C Implementation notes

We implemented all the algorithms in Python 3.7.1.

In IVE-IP2-new and Semi-IVE, we need to solve generalized eigenvalue problems of the form A\bm{x}=\lambda_{\mathrm{max}}B\bm{x}, where only the first generalized eigenvector is required. To avoid ‘for loops’ in the Python implementation, we solved them (i) by first transforming the problem into the eigenvalue problem \left(B^{-1}A\right)\bm{x}=\lambda_{\mathrm{max}}\bm{x}, and then (ii) by using power iteration (also known as the power method [60]) to obtain the required first eigenvector. The number of iterations in the power method was set to 30 in this experiment.
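A minimal batched NumPy sketch of this procedure (illustrative names; convergence of power iteration to the dominant eigenvector is assumed, as in our experiments):

```python
import numpy as np

def first_gev_power_method(A, B, n_iter=30):
    """First generalized eigenvector of A x = lambda_max B x for each bin.

    A, B : (F, M, M) Hermitian matrices with B positive definite. The problem
    is recast as (B^{-1} A) x = lambda_max x and solved by power iteration,
    which avoids per-frequency Python loops inside the eigensolver.
    """
    C = np.linalg.solve(B, A)                    # B^{-1} A, batched over f
    x = np.ones(C.shape[:2], dtype=C.dtype)      # (F, M) initial vectors
    for _ in range(n_iter):
        x = np.einsum('fmn,fn->fm', C, x)
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x
```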

On the other hand, in IVE-IP2-old for K\geq 2, all the generalized eigenvectors have to be obtained. To avoid ‘for loops,’ we implemented it (i) by first transforming the generalized eigenvalue problem into an ordinary eigenvalue problem in the same way as above, and then (ii) by calling the numpy.linalg.eig function. IVE-IP2-old for K=1 (i.e., FIVE [25]) was implemented in the same way.

These implementations are not recommended from the viewpoint of numerical stability, but we adopted them to make the algorithms run fast.

Figure 1: SDR [dB] performance as a function of runtime, averaged over 100 mixtures. (a) One target source (K=1), SNR = 0 dB, average signal length 11.46 s. (b) Two target sources (K=2), SNR = 0 dB, average signal length 12.85 s. (c) Three target sources (K=3), SNR = 5 dB, average signal length 12.92 s.
TABLE II: Realtime factors of algorithms when number of iterations is set to 50
Number of sources K=1K=1 K=2K=2 K=3K=3
Number of microphones M=2M=2 M=6M=6 M=8M=8 M=3M=3 M=6M=6 M=8M=8 M=4M=4 M=6M=6 M=8M=8
Signal length [sec] 11.77 11.77 11.77 12.43 12.43 12.43 12.81 12.81 12.81
IVA-IP1-old 0.08 0.58 0.99 0.18 0.61 1.05 0.31 0.70 1.20
IVE-IP1-old 0.05 0.13 0.18 0.14 0.27 0.36 0.28 0.44 0.67
IVE-IP2-old 0.07 0.32 0.48 0.21 0.59 0.93 0.38 0.81 1.45
IVE-IP2-new 0.06 0.16 0.22 0.19 0.40 0.59 0.35 0.59 1.00
Semi-IVE-(1)(1)-new 0.04 0.12 0.17 0.15 0.29 0.41 0.32 0.54 0.89
Semi-IVE-(2)(2)-new - - - 0.14 0.26 0.35 0.28 0.44 0.70
Semi-IVE-(3)(3)-new - - - - - - 0.28 0.42 0.65

VII-D Experimental results

Figure 1 shows the SDR performance of each algorithm as a function of the runtime, and Table II presents the realtime factor (RTF) of the algorithms when the number of iterations is set to 50 for all the algorithms. These results were obtained by running the algorithms on a PC with Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz using a single thread.

VII-D1 Effectiveness of IVE-IP2-new for extracting unique target source (K=1K=1)

In Fig. 1(a), IVE-IP2-new showed faster convergence than the conventional algorithms while achieving similar SDR performance. Interestingly, the fully blind IVE algorithms with M\geq 6 produced even better SDRs than the informed Semi-IVE-(1)-new with M=2. This result motivates the use of more microphones at the expense of increased runtime. As desired, the increase in the runtime of the IVE algorithms, especially the proposed IVE-IP2-new, was shown to be very small compared to IVA-IP1-old.

VII-D2 Effectiveness of IVE-IP2-new compared to IVE-IP2-old

From Fig. 1 and Table II, it can be confirmed for M\geq 6 that the computational cost of the proposed IVE-IP2-new is consistently smaller than that of IVE-IP2-old while maintaining the separation performance, which shows the effectiveness of IVE-IP2-new. On the other hand, when M is small, IVE-IP2-new gave almost the same performance as IVE-IP2-old.

VII-D3 Comparison of IVA-IP1-old, IVE-IP1-old, and IVE-IP2-new

When K=1 (Fig. 1(a)), the proposed IVE-IP2-new gave the fastest convergence among the blind algorithms, which clearly shows its superiority. On the other hand, when K\in\{2,3\} (Fig. 1(b)–(c)), the convergence of IVE-IP2-new was slower than that of IVE-IP1-old, mainly because the computational cost per iteration of IVE-IP2-new exceeds that of IVE-IP1-old (see Table II). Therefore, IVE-IP1-old remains important when extracting two or more sources.

VII-D4 Effectiveness of Semi-IVE-(L)(L)-new

In Fig. 1, the proposed Semi-IVE algorithms naturally outperformed all of the fully blind IVA and IVE algorithms. Surprisingly, when L=K-1 transfer functions were given a priori (and M\geq 4), the Semi-IVE algorithms (Semi-IVE-(1)-new if K=2 and Semi-IVE-(2)-new if K=3) achieved comparable or sometimes better performance than Semi-IVE-(K)-new. The convergence of Semi-IVE-(L)-new with L\geq K-1 was also extremely fast. These results clearly show the effectiveness of the proposed Semi-IVE algorithms.

VIII Concluding remarks

We presented two new efficient BCD algorithms for IVE: (i) IVE-IP2, which extracts all the super-Gaussian sources from a linear mixture in a fully blind manner, and (ii) Semi-IVE, which improves IVE-IP2 in the semiblind scenario where the transfer functions for several super-Gaussian sources are available as prior knowledge. We also argued that the conventional IVE relying on the orthogonal constraint (IVE-OC) can be interpreted as a BCD for IVE (IVE-IP1). Due to the stationary Gaussian noise assumption, these BCD (or IP) algorithms can skip most of the optimization of the filters that separate the noise components, which plays a central role in achieving a low computational cost.

Our numerical experiment on extracting speech signals from noisy reverberant mixtures showed that when K=1 in the blind case, or when at least K-1 transfer functions are given in the semiblind case, the proposed IVE-IP2 and Semi-IVE converge significantly faster than the conventional algorithms, where K is the number of target-source signals. The new IVE-IP2 consistently sped up the old IVE-IP2, while the conventional IVE-IP1 remains important for extracting multiple sources.

Appendix A

We prepare several propositions that are needed to rigorously develop the algorithms in Sections IV and V.

Proposition 5 below gives an inequality which is the basis of the MM algorithm for AuxICA [29], AuxIVA [30], and (the auxiliary-function-based) IVE.

Proposition 5 (See [29, Theorem 1]).

Let G\colon\mathbb{R}_{\geq 0}\to\mathbb{R} be differentiable and suppose that \frac{G^{\prime}(r)}{r} is nonincreasing on r\in\mathbb{R}_{>0}. Then, for arbitrary r,\widetilde{r}\in\mathbb{R}_{>0}, it holds that

G(r)G(r~)2r~r2+(G(r~)r~G(r~)2).\displaystyle G(r)\leq\frac{G^{\prime}(\widetilde{r})}{2\widetilde{r}}\cdot r^{2}+\left(G(\widetilde{r})-\frac{\widetilde{r}\cdot G^{\prime}(\widetilde{r})}{2}\right). (92)

The inequality holds with equality if r=r~r=\widetilde{r}.
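For instance, for the GGD contrast G(r)=(r/\alpha)^{\beta} with 0<\beta\leq 2 used in Algorithm 1, G^{\prime}(r)/r=(\beta/\alpha^{\beta})\,r^{\beta-2} is nonincreasing on r>0, and (92) becomes

G(r)\leq\frac{\beta}{2\,\alpha^{\beta}\,\widetilde{r}^{\,2-\beta}}\,r^{2}+\operatorname*{const.},

whose quadratic coefficient is exactly the weight \phi_{i}(t) computed in Algorithm 1.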

To prove Propositions 2, 3, and 4, we need the following proposition, which provides a sufficient condition for the existence of globally optimal solutions.

Proposition 6.

Suppose V1,,Vn𝒮++MV_{1},\ldots,V_{n}\in\mathcal{S}_{++}^{M} and 1nM1\leq n\leq M. Let W2M×(Mn)W_{2}\in\mathbb{C}^{M\times(M-n)} be full column rank. The function gg with respect to W1=[𝒘1,,𝒘n]M×nW_{1}=[\bm{w}_{1},\ldots,\bm{w}_{n}]\in\mathbb{C}^{M\times n}, defined by

g(W1)\displaystyle g(W_{1}) =i=1n𝒘ihVi𝒘ilog|det[W1,W2]|2,\displaystyle=\sum_{i=1}^{n}\bm{w}_{i}^{h}V_{i}\bm{w}_{i}-\log|\det[W_{1},W_{2}]|^{2},

is lower bounded and attains its minimum at some W1W_{1}.

Proof.

The statement for n=M was proved in [34, 35, 36, 37]. Since the statement for 1\leq n<M can be proved in the same way, we omit the proof. ∎

Proposition 7 below gives a useful expression for |\det W|^{2}. The same statement for L=1 can be found in [61], and we here extend it to general 1\leq L<M.

Proposition 7.

Let 1L<M1\leq L<M and

W\displaystyle W =[W1,W2]M×M,\displaystyle=\begin{bmatrix}W_{1},W_{2}\end{bmatrix}\in\mathbb{C}^{M\times M},
W1\displaystyle W_{1} M×L,W2M×(ML),\displaystyle\in\mathbb{C}^{M\times L},\quad W_{2}\in\mathbb{C}^{M\times(M-L)},
A1\displaystyle A_{1} M×L,W1hA1=IL,W2hA1=OML,L.\displaystyle\in\mathbb{C}^{M\times L},\quad W_{1}^{h}A_{1}=I_{L},\quad W_{2}^{h}A_{1}=O_{M-L,L}.

Then, |detW|2=det(A1hA1)1det(W2hW2)|\det W|^{2}=\det\left(A_{1}^{h}A_{1}\right)^{-1}\det\left(W_{2}^{h}W_{2}\right) holds.

Proof.

The orthogonal projection onto ImA1\operatorname*{Im}A_{1} is given by P=A1(A1hA1)1A1hM×MP=A_{1}(A_{1}^{h}A_{1})^{-1}A_{1}^{h}\in\mathbb{C}^{M\times M}. Then, it holds that

|detW|2\displaystyle|\det W|^{2} =|det([PW1+(IP)W1,W2])|2\displaystyle=\left|\det([\,PW_{1}+(I-P)W_{1},W_{2}\,])\right|^{2}
=|det([PW1,W2])|2\displaystyle=\left|\det([\,PW_{1},W_{2}\,])\right|^{2}
=det([PW1,W2]h[PW1,W2])\displaystyle=\det([\,PW_{1},W_{2}\,]^{h}[\,PW_{1},W_{2}\,])
=det([W1hPhPW1OL,MLOML,LW2hW2])\displaystyle=\det\left(\begin{bmatrix}W_{1}^{h}P^{h}PW_{1}&O_{L,M-L}\\ O_{M-L,L}&W_{2}^{h}W_{2}\end{bmatrix}\right)
=det(A1hA1)1det(W2hW2),\displaystyle=\det(A_{1}^{h}A_{1})^{-1}\cdot\det(W_{2}^{h}W_{2}),

where we use Im(IP)ImW2\operatorname*{Im}(I-P)\supseteq\operatorname*{Im}W_{2} in the second equality. ∎
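As a quick numeric sanity check of Proposition 7 (illustrative, not part of the proof), one can verify the identity on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
M, L = 5, 2
A1 = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
# Build W = [W1, W2] with W1^h A1 = I_L and W2^h A1 = O.
W1 = np.linalg.pinv(A1.conj().T)              # W1 = A1 (A1^h A1)^{-1}
Q, _ = np.linalg.qr(A1)                       # orthonormal basis of Im(A1)
P = np.eye(M) - Q @ Q.conj().T                # projector onto Im(A1)^perp
W2 = P @ (rng.standard_normal((M, M - L)) + 1j * rng.standard_normal((M, M - L)))
W = np.concatenate([W1, W2], axis=1)

lhs = np.abs(np.linalg.det(W)) ** 2
rhs = np.linalg.det(W2.conj().T @ W2).real / np.linalg.det(A1.conj().T @ A1).real
print(np.isclose(lhs, rhs))                   # True up to round-off
```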

Appendix B Proof of Proposition 3

Proof.

The proof is very similar to that of [26, Theorem 3]. By Proposition 6, problem (45) has global optimal solutions, and they satisfy the stationary conditions (35)–(36). In Eqs. (35)–(36), the first K rows except for the i-th row are linear with respect to \bm{w}_{i} and W_{\operatorname*{z}}, and so they can be solved as

𝒘i\displaystyle\bm{w}_{i} =PiGi1𝒄i,\displaystyle=P_{i}G_{i}^{-1}\bm{c}_{i}, 𝒄i\displaystyle\quad\bm{c}_{i} (MK+1)×1,\displaystyle\in\mathbb{C}^{(M-K+1)\times 1}, (93)
Wz\displaystyle W_{\operatorname*{z}} =PzGz1Cz,\displaystyle=P_{\operatorname*{z}}G_{\operatorname*{z}}^{-1}C_{\operatorname*{z}}, Cz\displaystyle\quad C_{\operatorname*{z}} (MK+1)×(MK),\displaystyle\in\mathbb{C}^{(M-K+1)\times(M-K)}, (94)

where PP_{\ell} and GG_{\ell} for {i,z}\ell\in\{i,\operatorname*{z}\} are defined by (48)–(49), and 𝒄i\bm{c}_{i} and CzC_{\operatorname*{z}} are free variables (the roles of Gi1G_{i}^{-1} and Gz1G_{\operatorname*{z}}^{-1} will be clear below). Substituting (93)–(94) into the objective function gg, we have

g=\bm{c}_{i}^{h}G_{i}^{-1}\bm{c}_{i}+\operatorname*{tr}\{C_{\operatorname*{z}}^{h}G_{\operatorname*{z}}^{-1}C_{\operatorname*{z}}\}-\log|\det[\,W_{0}\mid P_{i}G_{i}^{-1}\bm{c}_{i}\mid P_{\operatorname*{z}}G_{\operatorname*{z}}^{-1}C_{\operatorname*{z}}\,]|^{2}+\operatorname*{const.},\quad(95)

where W0[𝒘1,,𝒘i1,𝒘i+1,,𝒘K]M×(K1)W_{0}\coloneqq[\,\bm{w}_{1},\ldots,\bm{w}_{i-1},\bm{w}_{i+1},\ldots,\bm{w}_{K}\,]\in\mathbb{C}^{M\times(K-1)}.

Now, let us consider minimizing gg with respect to 𝒄i\bm{c}_{i} and CzC_{\operatorname*{z}}. In (95), the logdet\log\det term can be simplified as

2log|det[W0PiGi1𝒄iPzGz1Cz]|\displaystyle 2\log|\det\begin{bmatrix}W_{0}\mid P_{i}G_{i}^{-1}\bm{c}_{i}\mid P_{\operatorname*{z}}G_{\operatorname*{z}}^{-1}C_{\operatorname*{z}}\end{bmatrix}|
=\displaystyle= 2log|det[W0hPihVi]1[W0hPihVi][W0PiGi1𝒄iPzGz1Cz]|\displaystyle 2\log|\det\begin{bmatrix}W_{0}^{h}\\ P_{i}^{h}V_{i}\end{bmatrix}^{-1}\begin{bmatrix}W_{0}^{h}\\ P_{i}^{h}V_{i}\end{bmatrix}\begin{bmatrix}W_{0}\mid P_{i}G_{i}^{-1}\bm{c}_{i}\mid P_{\operatorname*{z}}G_{\operatorname*{z}}^{-1}C_{\operatorname*{z}}\end{bmatrix}|
=\displaystyle= 2log|det[W0hW0OMK+1,K1𝒄iCz]|+const.\displaystyle 2\log|\det\begin{bmatrix}W_{0}^{h}W_{0}&*&*\\ O_{M-K+1,K-1}&\bm{c}_{i}&C_{\operatorname*{z}}\end{bmatrix}|+\operatorname*{const.}
=\displaystyle= 2log|det[𝒄i,Cz]|+const.\displaystyle 2\log|\det[\,\bm{c}_{i},C_{\operatorname*{z}}\,]|+\operatorname*{const.}

Here, we used ViPi=(W)h[𝒆i,Ez]=VzPzV_{i}P_{i}=\left(W^{\prime}\right)^{-h}[\,\bm{e}_{i},E_{\operatorname*{z}}\,]=V_{\operatorname*{z}}P_{\operatorname*{z}} twice in the second equality. Hence, by applying Proposition 2, gg attains its minimum when

𝒄i\displaystyle\bm{c}_{i} =𝒖i(𝒖ihGi1𝒖i)12,Cz=Uz(UzhGz1Uz)12,\displaystyle=\bm{u}_{i}\left(\bm{u}_{i}^{h}G_{i}^{-1}\bm{u}_{i}\right)^{-\frac{1}{2}},\quad C_{\operatorname*{z}}=U_{\operatorname*{z}}\left(U_{\operatorname*{z}}^{h}G_{\operatorname*{z}}^{-1}U_{\operatorname*{z}}\right)^{-\frac{1}{2}}, (96)
Uz\displaystyle U_{\operatorname*{z}} (MK+1)×(MK)withUzhGz1𝒖i=𝟎,\displaystyle\in\mathbb{C}^{(M-K+1)\times(M-K)}\quad\text{with}\quad U_{\operatorname*{z}}^{h}G_{\operatorname*{z}}^{-1}\bm{u}_{i}=\bm{0}, (97)
𝒖i\displaystyle\bm{u}_{i} (MK+1)×1withGz1𝒖i=λmaxGi1𝒖i,\displaystyle\in\mathbb{C}^{(M-K+1)\times 1}\quad\text{with}\quad G_{\operatorname*{z}}^{-1}\bm{u}_{i}=\lambda_{\max}G_{i}^{-1}\bm{u}_{i}, (98)

where λmax\lambda_{\max} in (98) denotes the largest generalized eigenvalue. Because (97)–(98) are equivalent to (51)–(52) through 𝒃i=Gi1𝒖i\bm{b}_{i}=G_{i}^{-1}\bm{u}_{i} and Bz=Gz1UzB_{\operatorname*{z}}=G_{\operatorname*{z}}^{-1}U_{\operatorname*{z}}, a global optimal solution for (45) can be obtained by

𝒘i\displaystyle\bm{w}_{i} =PiGi1𝒖i(𝒖ihGi1𝒖i)12=Pi𝒃i(𝒃ihGi𝒃i)12,\displaystyle=P_{i}G_{i}^{-1}\bm{u}_{i}\left(\bm{u}_{i}^{h}G_{i}^{-1}\bm{u}_{i}\right)^{-\frac{1}{2}}=P_{i}\bm{b}_{i}(\bm{b}_{i}^{h}G_{i}\bm{b}_{i})^{-\frac{1}{2}},
Wz\displaystyle W_{\operatorname*{z}} =PzGz1Uz(UzhGz1Uz)12=PzBz(BzhGzBz)12\displaystyle=P_{\operatorname*{z}}G_{\operatorname*{z}}^{-1}U_{\operatorname*{z}}\left(U_{\operatorname*{z}}^{h}G_{\operatorname*{z}}^{-1}U_{\operatorname*{z}}\right)^{-\frac{1}{2}}=P_{\operatorname*{z}}B_{\operatorname*{z}}\left(B_{\operatorname*{z}}^{h}G_{\operatorname*{z}}B_{\operatorname*{z}}\right)^{-\frac{1}{2}}

as desired. ∎

References

  • [1] P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent component analysis and applications.   Academic press, 2010.
  • [2] A. Cichocki and S. Amari, Adaptive blind signal and image processing: learning algorithms and applications.   John Wiley & Sons, 2002.
  • [3] T.-W. Lee, “Independent component analysis,” in Independent component analysis.   Springer, 1998, pp. 27–66.
  • [4] J. V. Stone, Independent component analysis: a tutorial introduction.   MIT press, 2004.
  • [5] T. Kim, H. T. Attias, S.-Y. Lee, and T.-W. Lee, “Blind source separation exploiting higher-order frequency dependencies,” IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 1, pp. 70–79, 2007.
  • [6] A. Hiroe, “Solution of permutation problem in frequency domain ICA, using multivariate probability density functions,” in Proc. ICA, 2006, pp. 601–608.
  • [7] Y.-O. Li, T. Adali, W. Wang, and V. D. Calhoun, “Joint blind source separation by multiset canonical correlation analysis,” IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3918–3929, 2009.
  • [8] X.-L. Li, T. Adalı, and M. Anderson, “Joint blind source separation by generalized joint diagonalization of cumulant matrices,” Signal Processing, vol. 91, no. 10, pp. 2314–2322, 2011.
  • [9] M. Anderson, T. Adali, and X.-L. Li, “Joint blind source separation with multivariate Gaussian model: Algorithms and performance analysis,” IEEE Trans. Signal Process., vol. 60, no. 4, pp. 1672–1683, 2011.
  • [10] J. H. Friedman and J. W. Tukey, “A projection pursuit algorithm for exploratory data analysis,” IEEE Trans. Comput., vol. 100, no. 9, pp. 881–890, 1974.
  • [11] P. J. Huber, “Projection pursuit,” The annals of Statistics, pp. 435–475, 1985.
  • [12] J. H. Friedman, “Exploratory projection pursuit,” Journal of the American statistical association, vol. 82, no. 397, pp. 249–266, 1987.
  • [13] J.-F. Cardoso and A. Souloumiac, “Blind beamforming for non-Gaussian signals,” in IEE proceedings F (radar and signal processing), vol. 140, no. 6, 1993, pp. 362–370.
  • [14] N. Delfosse and P. Loubaton, “Adaptive blind separation of independent sources: A deflation approach,” Signal processing, vol. 45, no. 1, pp. 59–83, 1995.
  • [15] S. A. Cruces-Alvarez, A. Cichocki, and S. Amari, “From blind signal extraction to blind instantaneous signal separation: Criteria, algorithms, and stability,” IEEE Trans. Neural Netw., vol. 15, no. 4, pp. 859–873, 2004.
  • [16] S. Amari and A. Cichocki, “Adaptive blind signal processing-neural network approaches,” Proceedings of the IEEE, vol. 86, no. 10, pp. 2026–2048, 1998.
  • [17] A. Hyvärinen and E. Oja, “A fast fixed-point algorithm for independent component analysis,” Neural Comput., vol. 9, no. 7, pp. 1483–1492, 1997.
  • [18] A. Hyvarinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 626–634, 1999.
  • [19] E. Oja and Z. Yuan, “The FastICA algorithm revisited: Convergence analysis,” IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1370–1381, 2006.
  • [20] T. Wei, “A convergence and asymptotic analysis of the generalized symmetric FastICA algorithm,” IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6445–6458, 2015.
  • [21] Z. Koldovský and P. Tichavský, “Gradient algorithms for complex non-Gaussian independent component/vector extraction, question of convergence,” IEEE Trans. Signal Process., vol. 67, no. 4, pp. 1050–1064, 2018.
  • [22] Z. Koldovský, P. Tichavský, and V. Kautský, “Orthogonally constrained independent component extraction: Blind MPDR beamforming,” in Proc. EUSIPCO, 2017, pp. 1155–1159.
  • [23] J. Janský, J. Málek, J. Čmejla, T. Kounovský, Z. Koldovský, and J. Žd’ánský, “Adaptive blind audio source extraction supervised by dominant speaker identification using x-vectors,” in Proc. ICASSP, 2020, pp. 676–680.
  • [24] R. Scheibler and N. Ono, “Independent vector analysis with more microphones than sources,” in Proc. WASPAA, 2019, pp. 185–189.
  • [25] ——, “Fast independent vector extraction by iterative SINR maximization,” in Proc. ICASSP, 2020, pp. 601–605.
  • [26] ——, “MM algorithms for joint independent subspace analysis with application to blind single and multi-source extraction,” arXiv:2004.03926v1, 2020.
  • [27] R. Ikeshita, T. Nakatani, and S. Araki, “Overdetermined independent vector analysis,” in Proc. ICASSP, 2020, pp. 591–595.
  • [28] K. Lange, MM optimization algorithms.   SIAM, 2016, vol. 147.
  • [29] N. Ono and S. Miyabe, “Auxiliary-function-based independent component analysis for super-Gaussian sources,” in Proc. LVA/ICA, 2010, pp. 165–172.
  • [30] N. Ono, “Stable and fast update rules for independent vector analysis based on auxiliary function technique,” in Proc. WASPAA, 2011, pp. 189–192.
  • [31] ——, “Fast stereo independent vector analysis and its implementation on mobile phone,” in Proc. IWAENC, 2012, pp. 1–4.
  • [32] ——, “Fast algorithm for independent component/vector/low-rank matrix analysis with three or more sources,” in Proc. ASJ Spring Meeting, 2018, (in Japanese).
  • [33] D.-T. Pham and J.-F. Cardoso, “Blind separation of instantaneous mixtures of nonstationary sources,” IEEE Trans. Signal Process., vol. 49, no. 9, pp. 1837–1848, 2001.
  • [34] S. Dégerine and A. Zaïdi, “Separation of an instantaneous mixture of Gaussian autoregressive sources by the exact maximum likelihood approach,” IEEE Trans. Signal Process., vol. 52, no. 6, pp. 1499–1512, 2004.
  • [35] ——, “Determinant maximization of a nonsymmetric matrix with quadratic constraints,” SIAM J. Optim., vol. 17, no. 4, pp. 997–1014, 2006.
  • [36] A. Yeredor, “On hybrid exact-approximate joint diagonalization,” in Proc. CAMSAP, 2009, pp. 312–315.
  • [37] A. Yeredor, B. Song, F. Roemer, and M. Haardt, “A “sequentially drilled” joint congruence (SeDJoCo) transformation with applications in blind source separation and multiuser MIMO systems,” IEEE Trans. Signal Process., vol. 60, no. 6, pp. 2744–2757, 2012.
  • [38] P. Tseng, “Convergence of a block coordinate descent method for nondifferentiable minimization,” Journal of optimization theory and applications, vol. 109, no. 3, pp. 475–494, 2001.
  • [39] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 24, no. 9, pp. 1622–1637, 2016.
  • [40] H. Kameoka, L. Li, S. Inoue, and S. Makino, “Supervised determined source separation with multichannel variational autoencoder,” Neural computation, vol. 31, no. 9, pp. 1891–1914, 2019.
  • [41] N. Makishima, S. Mogami, N. Takamune, D. Kitamura, H. Sumino, S. Takamichi, H. Saruwatari, and N. Ono, “Independent deeply learned matrix analysis for determined audio source separation,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 27, no. 10, pp. 1601–1615, 2019.
  • [42] K. Sekiguchi, A. A. Nugraha, Y. Bando, and K. Yoshii, “Fast multichannel source separation based on jointly diagonalizable spatial covariance matrices,” in Proc. EUSIPCO, 2019, pp. 1–5.
  • [43] N. Ito and T. Nakatani, “FastMNMF: Joint diagonalization based accelerated algorithms for multichannel nonnegative matrix factorization,” in Proc. ICASSP, 2019, pp. 371–375.
  • [44] H. L. Van Trees, Optimum array processing: Part IV of detection, estimation, and modulation theory.   John Wiley & Sons, 2004.
  • [45] E. Warsitz and R. Haeb-Umbach, “Blind acoustic beamforming based on generalized eigenvalue decomposition,” IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 5, pp. 1529–1539, 2007.
  • [46] N. Ito, S. Araki, and T. Nakatani, “Data-driven and physical model-based designs of probabilistic spatial dictionary for online meeting diarization and adaptive beamforming,” in Proc. EUSIPCO, 2017, pp. 1165–1169.
  • [47] D. H. Johnson and D. E. Dudgeon, Array signal processing: concepts and techniques.   Simon & Schuster, Inc., 1992.
  • [48] E. Warsitz and R. Haeb-Umbach, “Blind acoustic beamforming based on generalized eigenvalue decomposition,” IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 5, pp. 1529–1539, 2007.
  • [49] A. Khabbazibasmenj, S. A. Vorobyov, and A. Hassanien, “Robust adaptive beamforming based on steering vector estimation with as little as possible prior information,” IEEE Trans. Signal Process., vol. 60, no. 6, pp. 2974–2987, 2012.
  • [50] S. A. Vorobyov, “Principles of minimum variance robust adaptive beamforming design,” Signal Processing, vol. 93, no. 12, pp. 3264–3277, 2013.
  • [51] E. Gómez, M. Gómez-Villegas, and J. Marin, “A multivariate generalization of the power exponential family of distributions,” Communications in Statistics-Theory and Methods, vol. 27, no. 3, pp. 589–600, 1998.
  • [52] S. Amari, A. Cichocki, and H. H. Yang, “A new learning algorithm for blind signal separation,” in Proc. NIPS, 1996, pp. 757–763.
  • [53] J. Nocedal and S. Wright, Numerical optimization.   Springer Science & Business Media, 2006.
  • [54] R. Ikeshita, N. Ito, T. Nakatani, and H. Sawada, “Independent low-rank matrix analysis with decorrelation learning,” in Proc. WASPAA, 2019, pp. 288–292.
  • [55] N. Murata, S. Ikeda, and A. Ziehe, “An approach to blind source separation based on temporal structure of speech signals,” Neurocomputing, vol. 41, no. 1-4, pp. 1–24, 2001.
  • [56] E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Trans. Audio, Speech, Language Process., vol. 14, no. 4, pp. 1462–1469, 2006.
  • [57] S. Nakamura, K. Hiyane, F. Asano, T. Nishiura, and T. Yamada, “Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition,” in LREC, 2000.
  • [58] J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, N. Dahlgren, and V. Zue, “TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. Web Download. Philadelphia: Linguistic Data Consortium,” 1993.
  • [59] J. Barker, R. Marxer, E. Vincent, and S. Watanabe, “The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines,” in Proc. ASRU, 2015, pp. 504–511.
  • [60] K. E. Atkinson, An introduction to numerical analysis.   John Wiley & Sons, 2008.
  • [61] T. Nakatani and K. Kinoshita, “Maximum likelihood convolutional beamformer for simultaneous denoising and dereverberation,” in Proc. EUSIPCO, 2019, pp. 1–5.