Quantum-Inspired Genetic Algorithm for Robust Source Separation in Smart City Acoustics

Minh K. Quan¹, Mayuri Wijayasundara¹, Sujeeva Setunge⁴, Pubudu N. Pathirana¹
¹School of Engineering, Deakin University, Australia
⁴School of Engineering, Royal Melbourne Institute of Technology University, Melbourne, VIC 3000, Australia
Abstract

The cacophony of urban sounds presents a significant challenge for smart city applications that rely on accurate acoustic scene analysis. Effectively analyzing these complex soundscapes, often characterized by overlapping sound sources, diverse acoustic events, and unpredictable noise levels, requires precise source separation. This task becomes more complicated when only limited training data is available. This paper introduces a novel Quantum-Inspired Genetic Algorithm (p-QIGA) for source separation, drawing inspiration from quantum information theory to enhance acoustic scene analysis in smart cities. By leveraging quantum superposition for efficient solution space exploration and entanglement to handle correlated sources, p-QIGA achieves robust separation even with limited data. These quantum-inspired concepts are integrated into a genetic algorithm framework to optimize source separation parameters. The effectiveness of our approach is demonstrated on two datasets: the TAU Urban Acoustic Scenes 2020 Mobile dataset, representing typical urban soundscapes, and the Silent Cities dataset, capturing quieter urban environments during the COVID-19 pandemic. Experimental results show that the p-QIGA achieves accuracy comparable to state-of-the-art methods while exhibiting superior resilience to noise and limited training data, achieving up to 8.2 dB signal-to-distortion ratio (SDR) in noisy environments and outperforming baseline methods by up to 2 dB with only 10% of the training data. This research highlights the potential of p-QIGA to advance acoustic signal processing in smart cities, particularly for noise pollution monitoring and acoustic surveillance.

Index Terms:
Acoustic signal processing, Quantum-inspired algorithms, Smart cities, Source separation

I Introduction

The rise of smart cities has brought an abundance of opportunities to improve urban living through data-driven solutions. Acoustic scene analysis (ASA) plays a crucial role in this vision, enabling a deeper understanding of the urban environment through the analysis of sound [1]. However, the complexity of urban soundscapes, characterized by overlapping sound sources, diverse acoustic events, and unpredictable noise levels, presents a significant challenge for ASA, particularly in source separation. Traditional source separation methods, such as Independent Component Analysis (ICA) and Non-negative Matrix Factorization (NMF), often struggle with these complex soundscapes due to their limitations in handling correlated sources and their sensitivity to noise [2]. This necessitates the exploration of more robust and adaptable techniques that can effectively disentangle individual sound sources in real-world urban scenarios.

Quantum information theory, with principles such as superposition and entanglement, offers a powerful framework for signal processing and communication, enabling efficient representation of high-dimensional signals and encoding of correlated information beyond classical capabilities. However, limitations in current quantum hardware hinder the direct application of quantum algorithms to ASA. This motivates quantum-inspired algorithms, which adapt principles of quantum mechanics to classical computation for near-term applications while leveraging concepts drawn from quantum phenomena. Quantum-inspired genetic algorithms have previously been explored for various optimization problems; for instance, Roy et al. [3] demonstrated the effectiveness of quantum-based genetic algorithms for optimizing complex functions. Our work builds upon these contributions by applying this approach to the source separation problem in complex urban acoustic scenes, with a focus on achieving robust and accurate separation even with limited training data.

I-A Background and Related Work

ASA involves tasks such as acoustic scene classification and sound event detection, with source separation being a crucial component [1]. However, traditional source separation methods like ICA [4] and NMF [5], as well as more recent techniques like Sparse Component Analysis (SCA) [6] and deep learning (DL) (e.g., [7, 8]), face challenges in handling correlated sources, noise robustness, computational efficiency, and data efficiency in complex urban acoustic scenes. Quantum algorithms, such as Quantum Principal Component Analysis (QPCA) [9] and the Quantum Fourier Transform (QFT) [10], offer potential advantages for signal processing tasks due to their inherent ability to handle high-dimensional spaces and exploit quantum phenomena like superposition and entanglement. However, their direct application to ASA, and to source separation in complex acoustic environments in particular, remains largely unexplored and requires further investigation. Similarly, Quantum Support Vector Machines (QSVM) [11] are better suited to classification tasks, while Quantum Annealing (QA) [12] shows potential for optimization but needs further evaluation in complex scenarios.

I-B Motivations and Key Contributions

Our proposed Quantum-Inspired Genetic Algorithm (p-QIGA) addresses the challenge of source separation in complex urban soundscapes by incorporating quantum concepts into a genetic optimization framework. This approach allows us to effectively disentangle individual sound sources in real-world urban scenarios, even in the presence of noise and limited training data. The p-QIGA’s ability to accurately separate sources enhances the identification and tracking of vehicles in noisy environments and contributes to improved urban planning by providing insights into the acoustic characteristics of different urban spaces. This research makes the following key contributions:

  • We introduce a novel p-QIGA leveraging quantum concepts to optimize source separation in complex urban acoustic scenes, enhancing performance and robustness even with limited data.

  • Our p-QIGA effectively addresses key challenges in smart city source separation, including handling correlated sources and diverse sound events.

  • By achieving accurate and robust source separation, our p-QIGA improves performance in critical smart city applications like noise pollution monitoring and acoustic surveillance.

II Proposed Methodology

II-A Problem Formulation: Source Separation in ASA

Source separation in ASA aims to decompose an observed audio signal $x(t)$ into its constituent sound sources $s_{i}(t)$. The mixing process can be modeled as a convolutive mixture in the time domain:

$$x(t)=\sum_{i=1}^{N}\sum_{k=0}^{K-1}a_{i}(k)\,s_{i}(t-k)+n(t), \tag{1}$$

or equivalently in the frequency domain:

$$X(f)=\sum_{i=1}^{N}A_{i}(f)\,S_{i}(f)+N(f), \tag{2}$$

where $a_{i}(k)$ and $A_{i}(f)$ are the mixing filters, and $n(t)$ and $N(f)$ represent noise. The goal is to estimate $s_{i}(t)$ (or $S_{i}(f)$) given $x(t)$ (or $X(f)$). This is challenging due to factors like unknown mixing filters, correlated sources, diverse sound events, noise, reverberation, and limited training data, motivating the exploration of novel approaches like the quantum-inspired genetic algorithm proposed in this paper.
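As an illustration of the mixing model in Eq. (1), the following minimal sketch synthesizes a single-channel convolutive mixture from a list of sources and FIR mixing filters. The function names (`convolve`, `mix`), the Gaussian noise model, and the toy signals are assumptions made for this example only; they are not taken from the experimental pipeline.

```python
import random


def convolve(filt, signal):
    """Causal FIR filtering: y[t] = sum_k filt[k] * signal[t - k]."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, a in enumerate(filt):
            if t - k >= 0:
                acc += a * signal[t - k]
        out.append(acc)
    return out


def mix(sources, filters, noise_std=0.0, seed=0):
    """Eq. (1): x(t) = sum_i sum_k a_i(k) s_i(t - k) + n(t).

    `sources` and `filters` are equally long lists; noise is assumed
    to be white Gaussian with standard deviation `noise_std`.
    """
    rng = random.Random(seed)
    length = len(sources[0])
    x = [0.0] * length
    for s, a in zip(sources, filters):
        filtered = convolve(a, s)
        for t in range(length):
            x[t] += filtered[t]
    if noise_std > 0.0:
        x = [v + rng.gauss(0.0, noise_std) for v in x]
    return x


# Two toy impulse sources with hypothetical mixing filters.
s1 = [1.0, 0.0, 0.0, 0.0]
s2 = [0.0, 1.0, 0.0, 0.0]
x = mix([s1, s2], [[1.0, 0.5], [2.0]])
```

Separation then amounts to recovering `s1` and `s2` from `x` alone, without access to the filters.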

II-B Quantum Encoding of Acoustic Features

Within our proposed p-QIGA-based source separation framework, the initial stage focuses on encoding acoustic features into a quantum representation. Utilizing Mel-frequency cepstral coefficients (MFCCs) within this quantum framework offers potential advantages in terms of representational compactness, noise resilience, and correlation capture. These benefits are achieved through encoding MFCCs into quantum states using a parameterized quantum circuit (PQC) illustrated in Fig. 1. This process facilitates efficient processing, robustness to noise, and the exploitation of entanglement to identify correlated sound sources, crucial for effective source separation in complex acoustic environments.

Fig. 1: Our PQC for encoding MFCC features.

The PQC shown in Fig. 1 encodes $n$ MFCC features into a 4-qubit quantum state using single-qubit rotation gates ($R_{y}$) and two-qubit controlled-NOT (CNOT) gates. The encoding process starts by applying Hadamard gates ($H$) to the first and last qubits to introduce superposition. CNOT gates then create entanglement between adjacent qubits, capturing correlations between features. Each $R_{y}$ gate is parameterized by a distinct MFCC feature, with this pattern repeating for subsequent features. This design leverages the strengths of each gate: Hadamard gates for superposition, enabling a larger solution space; CNOT gates for entanglement, capturing feature relationships; and $R_{y}$ gates for precise encoding of individual MFCC values. The alternating pattern of CNOT and $R_{y}$ gates ensures each feature is encoded into a separate qubit while capturing correlations. MFCC features are sequentially assigned to qubits, simplifying the encoding and interpretation. This scheme is chosen for its ability to capture feature correlations and potential noise resilience due to the use of entangled quantum states. The choice of quantum operators and their parameterization can significantly impact the encoding and the p-QIGA’s performance. Different rotation gates (e.g., $R_{x}$, $R_{z}$) or multi-qubit gates (e.g., Toffoli gates) could alter the encoding.

Let $\mathbf{x}=[x_{1},x_{2},\ldots,x_{n}]^{T}\in\mathbb{R}^{n}$ denote the vector of MFCC features. The PQC encodes this into a quantum state $|\psi(\mathbf{x})\rangle\in\mathcal{H}^{\otimes 4}$:

$$|\psi(\mathbf{x})\rangle=U(\mathbf{x},\theta)\,|0\rangle^{\otimes 4}, \tag{3}$$

where $U(\mathbf{x},\theta)$ is the unitary operator representing the PQC with trainable parameters $\theta$. The optimization of $\theta$ can be formulated as:

$$\theta^{*}=\arg\max_{\theta}\mathcal{P}(\theta), \tag{4}$$

where $\mathcal{P}(\theta)$ is a performance metric. For the first four MFCC features, the encoding process can be represented as:

$$\begin{aligned}
|\psi_{0}\rangle &= |0\rangle^{\otimes 4}, & |\psi_{1}\rangle &= H_{0}H_{3}|\psi_{0}\rangle,\\
|\psi_{2}\rangle &= CNOT_{0,1}|\psi_{1}\rangle, & |\psi_{3}\rangle &= R_{y}(x_{2})|\psi_{2}\rangle,\\
|\psi_{4}\rangle &= CNOT_{1,2}|\psi_{3}\rangle, & |\psi_{5}\rangle &= R_{y}(x_{4})|\psi_{4}\rangle,\\
|\psi_{6}\rangle &= CNOT_{2,3}|\psi_{5}\rangle.
\end{aligned}$$

This encoding scheme compactly represents multiple MFCC features, potentially enabling efficient processing and noise resilience. Furthermore, capturing correlations between features through entanglement can enhance the p-QIGA’s source separation capabilities.
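The gate sequence $|\psi_{0}\rangle,\ldots,|\psi_{6}\rangle$ can be traced with a small state-vector simulation. The sketch below uses only the Python standard library rather than Qiskit, and it makes two assumptions that the text does not fix: each $R_{y}$ gate acts on the target qubit of the preceding CNOT, and MFCC values are used directly as rotation angles (in practice they would likely be rescaled first). Qubit $q$ is taken to be bit $q$ of the basis-state index.

```python
import math

N_QUBITS = 4
S = 1.0 / math.sqrt(2.0)
HADAMARD = [[S, S], [S, -S]]


def ry(theta):
    """Matrix of the single-qubit rotation R_y(theta)."""
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    return [[c, -s], [s, c]]


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of the state vector."""
    new = list(state)
    for i in range(len(state)):
        if not (i >> q) & 1:
            j = i | (1 << q)
            a0, a1 = state[i], state[j]
            new[i] = gate[0][0] * a0 + gate[0][1] * a1
            new[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    new = list(state)
    for i in range(len(state)):
        if (i >> control) & 1:
            new[i ^ (1 << target)] = state[i]
    return new


def encode(x):
    """Follow |psi_0> ... |psi_6>; x[0] corresponds to x_1 in the paper."""
    state = [0.0] * (2 ** N_QUBITS)
    state[0] = 1.0                         # |0000>
    state = apply_1q(state, HADAMARD, 0)   # H_0
    state = apply_1q(state, HADAMARD, 3)   # H_3
    state = apply_cnot(state, 0, 1)        # CNOT_{0,1}
    state = apply_1q(state, ry(x[1]), 1)   # R_y(x_2), target qubit assumed
    state = apply_cnot(state, 1, 2)        # CNOT_{1,2}
    state = apply_1q(state, ry(x[3]), 2)   # R_y(x_4), target qubit assumed
    state = apply_cnot(state, 2, 3)        # CNOT_{2,3}
    return state


psi = encode([0.3, -1.2, 0.7, 2.1])  # hypothetical MFCC values
```

Because every gate in the sequence is real-valued, real amplitudes suffice here; a general simulator would store complex amplitudes.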

II-C Quantum-Inspired Genetic Algorithm for Source Separation (p-QIGA)

II-C1 Representation and Initialization

In p-QIGA, each individual in the population represents a candidate solution to the source separation problem. These individuals are encoded as quantum states within a 4-qubit Hilbert space $\mathcal{H}^{\otimes 4}$, where each qubit corresponds to a specific parameter of the source separation model. The initial population is generated by randomly initializing the qubits in a superposition of states, allowing for a diverse exploration of the parameter space. An individual $|\psi_{i}\rangle$ in the population can be represented as:

$$|\psi_{i}\rangle=\alpha_{i}|0\rangle+\beta_{i}|1\rangle, \tag{5}$$

where $\alpha_{i}$ and $\beta_{i}$ are complex probability amplitudes satisfying $|\alpha_{i}|^{2}+|\beta_{i}|^{2}=1$.
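One way to realize this initialization on a classical machine is sketched below. It assumes that an individual is stored as four independent qubits, each an $(\alpha, \beta)$ pair as in Eq. (5), and that the amplitudes are drawn through random polar and phase angles; the helper names `random_qubit` and `init_population` are hypothetical.

```python
import cmath
import math
import random


def random_qubit(rng):
    """Draw (alpha, beta) with |alpha|^2 + |beta|^2 = 1, as in Eq. (5)."""
    theta = rng.uniform(0.0, math.pi)      # polar angle
    phi = rng.uniform(0.0, 2.0 * math.pi)  # relative phase
    alpha = complex(math.cos(theta / 2.0), 0.0)
    beta = cmath.exp(1j * phi) * math.sin(theta / 2.0)
    return alpha, beta


def init_population(size, n_qubits=4, seed=0):
    """Each individual is a list of n_qubits independent (alpha, beta) pairs."""
    rng = random.Random(seed)
    return [[random_qubit(rng) for _ in range(n_qubits)] for _ in range(size)]


population = init_population(50)  # population size taken from Table I
```

Storing qubits independently cannot represent entangled individuals; a full 16-amplitude state vector would be needed for that.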

II-C2 Quantum-Inspired Genetic Operators

The p-QIGA employs quantum-inspired genetic operators to evolve the population of candidate solutions. These operators draw on the concepts of superposition and entanglement and are designed to enhance the search process.

Quantum Crossover

This operator combines genetic information from two parent individuals, $p_{1}$ and $p_{2}$, to create two offspring, $o_{1}$ and $o_{2}$. It leverages the concept of superposition to create offspring that are a linear combination of the parent states, allowing for exploration of new regions in the parameter space. The crossover operation can be represented as:

$$|\psi_{o_{1}}\rangle=\alpha_{p_{1}}|\psi_{p_{1}}\rangle+\beta_{p_{2}}|\psi_{p_{2}}\rangle, \tag{6}$$

$$|\psi_{o_{2}}\rangle=\alpha_{p_{2}}|\psi_{p_{2}}\rangle+\beta_{p_{1}}|\psi_{p_{1}}\rangle, \tag{7}$$

where the probability amplitudes $\alpha_{p_{i}}$ and $\beta_{p_{i}}$ are determined by a rotation gate applied to the parent states.
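A minimal sketch of Eqs. (6) and (7) follows. It assumes a single mixing angle `gamma`, so that $\alpha_{p_{1}}=\alpha_{p_{2}}=\cos\gamma$ and $\beta_{p_{1}}=\beta_{p_{2}}=\sin\gamma$, and it renormalizes the offspring, since a linear combination of non-orthogonal parents is not automatically a unit vector. Both choices are assumptions; the text does not specify how the amplitudes are derived from the rotation gate.

```python
import math


def normalize(state):
    """Rescale a state vector to unit norm."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in state))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [a / norm for a in state]


def quantum_crossover(parent1, parent2, gamma):
    """Offspring as superpositions of the parents (Eqs. 6 and 7)."""
    c, s = math.cos(gamma), math.sin(gamma)
    child1 = normalize([c * a + s * b for a, b in zip(parent1, parent2)])
    child2 = normalize([c * b + s * a for a, b in zip(parent1, parent2)])
    return child1, child2


# Single-qubit parents |0> and |1> mixed with equal weight.
o1, o2 = quantum_crossover([1.0, 0.0], [0.0, 1.0], math.pi / 4)
```

With `gamma = 0` the offspring are copies of the parents, so the angle acts as a dial between pure inheritance and equal mixing.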

Quantum Mutation

This operator introduces random variations in the quantum states of individuals, simulating the effect of quantum fluctuations. This helps to maintain diversity in the population and prevent premature convergence to suboptimal solutions. The mutation operation can be represented as a rotation gate applied to an individual’s state:

$$|\psi_{i}^{\prime}\rangle=R(\theta)|\psi_{i}\rangle, \tag{8}$$

where $R(\theta)$ is a rotation gate with a randomly chosen rotation angle $\theta$.
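For a single qubit, Eq. (8) can be sketched as a plane rotation of the amplitude pair. The bound `max_angle` on the random angle and the uniform sampling are assumptions of this example; the text only states that $\theta$ is chosen randomly.

```python
import math
import random


def mutate(alpha, beta, max_angle=0.1 * math.pi, rng=random):
    """Rotate (alpha, beta) by a random angle; the norm is preserved."""
    theta = rng.uniform(-max_angle, max_angle)
    c, s = math.cos(theta), math.sin(theta)
    return c * alpha - s * beta, s * alpha + c * beta


# Mutating the basis state |0> with a seeded generator.
a, b = mutate(1.0, 0.0, rng=random.Random(1))
```

Because the rotation matrix is orthogonal, no renormalization is needed after mutation, in contrast to the crossover step.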

II-C3 Fitness Function

The fitness function evaluates the quality of each candidate solution, guiding the p-QIGA towards optimal separation parameters. In our case, the fitness function $F(|\psi_{i}\rangle)$ considers multiple criteria:

$$F(|\psi_{i}\rangle)=w_{1}\cdot SDR(|\psi_{i}\rangle)+w_{2}\cdot SIR(|\psi_{i}\rangle)+w_{3}\cdot SAR(|\psi_{i}\rangle)-w_{4}\cdot C(|\psi_{i}\rangle), \tag{9}$$

where $SDR(|\psi_{i}\rangle)$, $SIR(|\psi_{i}\rangle)$, and $SAR(|\psi_{i}\rangle)$ are the Signal-to-Distortion Ratio, Signal-to-Interference Ratio, and Signal-to-Artifacts Ratio, respectively, of the separated sources obtained using the parameters encoded in $|\psi_{i}\rangle$. $C(|\psi_{i}\rangle)$ is a penalty term that measures the correlation between the separated sources, and $w_{1}$, $w_{2}$, $w_{3}$, and $w_{4}$ are weight factors that balance the importance of each criterion.
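The weighted sum in Eq. (9) can be sketched as follows. The default weights are the values listed in Table I; the form of the correlation penalty (mean absolute Pearson correlation over all pairs of separated sources) is an assumption, since the text does not define $C$ further.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0 if degenerate)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlation_penalty(sources):
    """Assumed form of C: mean |correlation| over all source pairs."""
    pairs = list(combinations(sources, 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)


def fitness(sdr, sir, sar, sources, weights=(0.5, 0.3, 0.2, 1.0)):
    """Eq. (9): w1*SDR + w2*SIR + w3*SAR - w4*C."""
    w1, w2, w3, w4 = weights
    return w1 * sdr + w2 * sir + w3 * sar - w4 * correlation_penalty(sources)


# Two uncorrelated toy sources with metric values in dB.
uncorrelated = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
score = fitness(8.5, 13.8, 10.3, uncorrelated)
```

Since the three ratios are in dB while the penalty lies in $[0, 1]$, the relative scale of $w_{4}$ determines how strongly residual correlation is punished.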

II-C4 Optimization Process

The p-QIGA iteratively applies the quantum-inspired genetic operators to evolve the population of candidate solutions. This iterative process continues until either a maximum number of generations, denoted as $T_{max}$, is reached or the best fitness value in the population, denoted as $F_{best}$, surpasses a predefined desired fitness threshold, denoted as $F_{desired}$. The optimization process can be represented as the iterative update of the population:

$$\mathcal{P}^{(t+1)}=\mathcal{O}(\mathcal{M}(\mathcal{C}(\mathcal{S}(\mathcal{P}^{(t)})))), \tag{10}$$

where $\mathcal{P}^{(t)}$ represents the population at generation $t$, $\mathcal{S}$ is the selection operator, $\mathcal{C}$ is the crossover operator, $\mathcal{M}$ is the mutation operator, and $\mathcal{O}$ is the offspring replacement operator. In detail, this process can be described in Algorithm 1.

II-C5 Convergence Analysis

The convergence of the p-QIGA can be analyzed by considering the probability of finding the optimal solution in each generation $t$, denoted as $P_{opt}^{(t)}$. The change in this probability, $\Delta P_{opt}^{(t)}=P_{opt}^{(t+1)}-P_{opt}^{(t)}$, can be modeled as:

$$\Delta P_{opt}^{(t)}=\alpha\cdot E(t)\cdot(1-P_{opt}^{(t)}), \tag{11}$$

where $\alpha$ represents the effectiveness of the quantum-inspired operators, and $E(t)=\frac{1}{1+\beta t}$ is the exploration rate, with $\beta$ controlling the rate of decrease in exploration. By solving this difference equation, we can analyze the convergence behavior of the p-QIGA.
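The difference equation in Eq. (11) is easy to iterate numerically. The sketch below does so for illustrative values of $\alpha$ and $\beta$, which are assumptions and not fitted to any experiment reported here.

```python
def simulate_convergence(alpha, beta, steps, p0=0.0):
    """Iterate P(t+1) = P(t) + alpha * E(t) * (1 - P(t)), E(t) = 1/(1 + beta*t)."""
    p = p0
    history = [p]
    for t in range(steps):
        exploration = 1.0 / (1.0 + beta * t)
        p += alpha * exploration * (1.0 - p)
        history.append(p)
    return history


# Illustrative parameters only.
trajectory = simulate_convergence(alpha=0.2, beta=0.05, steps=100)
```

Unrolling the recursion gives $1-P_{opt}^{(T)}=(1-P_{opt}^{(0)})\prod_{t=0}^{T-1}\bigl(1-\alpha E(t)\bigr)$, so for $0<\alpha\le 1$ the probability is non-decreasing and never exceeds 1.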

Algorithm 1 Quantum-Inspired Genetic Algorithm for Source Separation (p-QIGA)
Initialize population $\mathcal{P}=\{|\psi_{i}\rangle\}_{i=1}^{P}$ with random quantum states.
$t\leftarrow 0$
while $t<T_{max}$ and $F_{best}<F_{desired}$ do
     for $|\psi_{i}\rangle\in\mathcal{P}$ do
         Evaluate fitness $F(|\psi_{i}\rangle)$ (Eq. 9).
     end for
     $F_{best}\leftarrow\max_{|\psi_{i}\rangle\in\mathcal{P}}F(|\psi_{i}\rangle)$
     $\mathcal{P}_{selected}\leftarrow\text{Select}(\mathcal{P})$
     $\mathcal{O}\leftarrow\emptyset$
     for $(|\psi_{p_{1}}\rangle,|\psi_{p_{2}}\rangle)\in\mathcal{P}_{selected}$ do
         $(|\psi_{o_{1}}\rangle,|\psi_{o_{2}}\rangle)\leftarrow\text{Crossover}(|\psi_{p_{1}}\rangle,|\psi_{p_{2}}\rangle)$
         $\mathcal{O}\leftarrow\mathcal{O}\cup\{|\psi_{o_{1}}\rangle,|\psi_{o_{2}}\rangle\}$
     end for
     for $|\psi_{o}\rangle\in\mathcal{O}$ do
         $|\psi_{o}\rangle\leftarrow\text{Mutate}(|\psi_{o}\rangle)$ with probability $P_{m}$
     end for
     $\mathcal{P}\leftarrow\mathcal{O}$
     $t\leftarrow t+1$
end while
Return $\arg\max_{|\psi_{i}\rangle\in\mathcal{P}}F(|\psi_{i}\rangle)$
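The control flow of Algorithm 1 can be sketched in plain Python as below. Several elements are stand-ins chosen for this example and are not the authors' implementation: individuals are real 16-dimensional unit vectors (4 qubits), `toy_fitness` replaces Eq. (9) with the probability of a hypothetical target basis state, selection is a binary tournament, and crossover uses a random mixing angle followed by renormalization. The default hyperparameters follow Table I.

```python
import math
import random

DIM = 16  # 4 qubits, real amplitudes only (simplifying assumption)


def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    if n == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [a / n for a in v]


def random_state(rng):
    return normalize([rng.gauss(0.0, 1.0) for _ in range(DIM)])


def toy_fitness(state, target=5):
    """Placeholder for Eq. (9): probability of basis state `target`."""
    return state[target] ** 2


def select(pop, fit, rng):
    """Binary tournament selection (an assumed choice)."""
    a, b = rng.sample(range(len(pop)), 2)
    return pop[a] if fit[a] >= fit[b] else pop[b]


def crossover(p1, p2, rng):
    """Superpose the parents with a random mixing angle, then renormalize."""
    g = rng.uniform(0.0, math.pi / 2.0)
    c, s = math.cos(g), math.sin(g)
    o1 = normalize([c * a + s * b for a, b in zip(p1, p2)])
    o2 = normalize([c * b + s * a for a, b in zip(p1, p2)])
    return o1, o2


def mutate(state, rng):
    """Rotate one randomly chosen qubit by a small random angle."""
    q = rng.randrange(4)
    t = rng.uniform(-math.pi / 8.0, math.pi / 8.0)
    c, s = math.cos(t), math.sin(t)
    out = list(state)
    for i in range(DIM):
        if not (i >> q) & 1:
            j = i | (1 << q)
            out[i] = c * state[i] - s * state[j]
            out[j] = s * state[i] + c * state[j]
    return out


def p_qiga(pop_size=50, t_max=100, f_desired=0.99, p_c=0.8, p_m=0.1, seed=0):
    rng = random.Random(seed)
    pop = [random_state(rng) for _ in range(pop_size)]
    for _ in range(t_max):
        fit = [toy_fitness(s) for s in pop]
        if max(fit) >= f_desired:  # stop once F_best reaches F_desired
            break
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(pop, fit, rng), select(pop, fit, rng)
            o1, o2 = crossover(p1, p2, rng) if rng.random() < p_c else (p1, p2)
            offspring += [o1, o2]
        # Generational replacement: the offspring become the new population.
        pop = [mutate(s, rng) if rng.random() < p_m else s
               for s in offspring[:pop_size]]
    return max(pop, key=toy_fitness)


best = p_qiga(pop_size=20, t_max=30, seed=1)
```

As in Algorithm 1, replacement is fully generational, so the best individual of one generation is not guaranteed to survive into the next; adding elitism would be a natural variation.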

II-C6 Computational Complexity and Scalability

The p-QIGA demonstrates efficiency with a time complexity of $O(M)$, where $M$ is the number of MFCC features, and constant space complexity. This outperforms classical methods like ICA ($O(N^{3})$) and NMF ($O(N^{2}\cdot I)$). To assess scalability, we conducted experiments varying the number of sources, data size, and circuit depth. Increasing the number of sources from 2 to 5 increased runtime by 35%, attributed to the increased circuit complexity. Doubling the data size resulted in a 42% runtime increase. Increasing circuit depth by adding an additional layer of gates led to a 28% runtime increase. These results demonstrate the p-QIGA’s ability to handle increasingly complex scenarios with moderate increases in computational cost. Future work will explore further optimizations for enhanced efficiency.

II-D Classical Post-processing and Classification

Following the quantum-enhanced source separation process, the estimated source signals $\hat{s}_{i}(t)$ often require further refinement. This subsection details the classical post-processing techniques employed and the subsequent classification stage.

II-D1 Post-processing Techniques

The following signal processing techniques are applied to each separated source $\hat{s}_{i}(t)$:

  • Filtering: A bandpass filter $H_{i}(f)$ is applied to remove residual noise and unwanted frequency components: $\hat{S}_{i}^{\prime}(f)=H_{i}(f)\hat{S}_{i}(f)$, where $\hat{S}_{i}(f)$ and $\hat{S}_{i}^{\prime}(f)$ are the DFTs of $\hat{s}_{i}(t)$ and the filtered signal $\hat{s}_{i}^{\prime}(t)$, respectively.

  • Dynamic Range Compression: A dynamic range compression algorithm $C(\cdot)$ is applied to reduce the dynamic range:

    $$\hat{s}_{i}^{\prime\prime}(t)=\begin{cases}\hat{s}_{i}^{\prime}(t)&\text{if }|\hat{s}_{i}^{\prime}(t)|\leq\tau,\\ \tau+\frac{|\hat{s}_{i}^{\prime}(t)|-\tau}{\rho}&\text{if }|\hat{s}_{i}^{\prime}(t)|>\tau,\end{cases} \tag{12}$$

    where $\tau$ is the threshold and $\rho$ is the compression ratio.

  • De-clipping: A de-clipping algorithm $D(\cdot)$ is applied to restore signal fidelity in clipped regions, for example, by replacing clipped samples with interpolated values.
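The compression step in Eq. (12) can be sketched as a sample-wise function. One detail is an assumption of this example: the formula is applied to the magnitude and the original sign is restored, so that negative samples above the threshold stay negative; Eq. (12) as written does not state the sign explicitly.

```python
import math


def compress(samples, tau, rho):
    """Hard-knee compressor following Eq. (12), with the sign preserved."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag <= tau:
            out.append(s)  # below threshold: unchanged
        else:
            out.append(math.copysign(tau + (mag - tau) / rho, s))
    return out


# Threshold 0.5 and ratio 4 are illustrative values.
compressed = compress([0.2, 1.0, -1.0], tau=0.5, rho=4.0)
```

A larger compression ratio `rho` flattens peaks more aggressively, while samples at or below `tau` pass through untouched.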

II-D2 Acoustic Scene Classification

The refined separated sources $\hat{s}_{i}^{\prime\prime\prime}(t)$ are used for acoustic scene classification. We extract a feature vector $\mathbf{f}_{i}$ from each source and employ a Support Vector Machine (SVM) classifier $g(\cdot)$ with a radial basis function (RBF) kernel:

$$\hat{c}=g([\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{N}]), \tag{13}$$

where $\hat{c}$ is the predicted acoustic scene class.

III Experimental Setting

III-A Datasets

We evaluate our p-QIGA on two datasets: (1) the TAU Urban Acoustic Scenes 2020 Mobile dataset [13] (Dataset 1), comprising recordings from 10 acoustic scenes in 12 European cities, captured using 4 different mobile devices; and (2) the Silent Cities dataset [14] (Dataset 2), capturing unique soundscapes recorded during the COVID-19 pandemic in various cities worldwide, featuring urban environments with reduced human activity. Both datasets are preprocessed (format conversion, downsampling, noise reduction) and split into training (80%), validation (10%), and test (10%) sets.

III-B Evaluation Metrics

We evaluate the performance of our p-QIGA algorithm using established metrics that quantify the quality of the separated sources, including:

  1. Signal-to-Distortion Ratio (SDR):

     $$SDR(s_{i},\hat{s}_{i})=10\log_{10}\frac{\sum_{t}s_{i}^{2}(t)}{\sum_{t}(s_{i}(t)-\hat{s}_{i}(t))^{2}}. \tag{14}$$

  2. Signal-to-Interference Ratio (SIR):

     $$SIR(s_{i},\hat{s}_{i})=10\log_{10}\frac{\sum_{t}s_{i}^{2}(t)}{\sum_{t}(\hat{s}_{i}(t)-s_{i}(t))^{2}-\sum_{t}n^{2}(t)}, \tag{15}$$

     where $n(t)$ represents the noise component.

  3. Signal-to-Artifacts Ratio (SAR):

     $$SAR(s_{i},\hat{s}_{i})=10\log_{10}\frac{\sum_{t}(\hat{s}_{i}(t)-n(t))^{2}}{\sum_{t}(\hat{s}_{i}(t)-s_{i}(t))^{2}}. \tag{16}$$

These metrics are widely used in the field of ASA and provide a comprehensive assessment of source separation performance by capturing different aspects of the separation quality, such as target distortion, interference suppression, and artifact removal. Furthermore, these metrics align with the evaluation criteria used in related works, allowing for a fair comparison with existing source separation techniques.
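As a concrete reading of Eq. (14), the following sketch computes the SDR for a reference source and its estimate given as equally long sample lists. Returning infinity for a perfect estimate and rejecting an all-zero reference are conventions chosen for this example.

```python
import math


def sdr(reference, estimate):
    """Eq. (14): 10 * log10(signal energy / error energy), in dB."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal == 0.0:
        raise ValueError("reference has zero energy")
    if error == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / error)


# A constant source estimated with a 10% amplitude error.
value = sdr([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9])
```

In this toy case the error energy is one hundredth of the signal energy, which corresponds to 20 dB.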

III-C Baseline Methods

We benchmark our p-QIGA against established source separation methods, including:

  • Classical methods: ICA, NMF, SCA, and multi-channel CNNs.

  • Quantum-Inspired method: Quantum Annealing (QA) [12] for optimizing a classical source separation model.

These baselines provide a diverse benchmark for evaluating the performance of our p-QIGA.

III-D Implementation Details

The p-QIGA was implemented using Qiskit for quantum computing simulation and standard Python libraries (NumPy, SciPy) for classical components. Experiments were conducted on a workstation with an Intel Xeon Gold 6248 CPU, 128GB RAM, and an NVIDIA Tesla V100 GPU. Hyperparameter tuning for all methods was performed using grid search, with optimal values selected based on validation set performance. Specific hyperparameters and their search spaces are listed in Table I.

Table I: Hyperparameters for p-QIGA and Baseline Methods

| Method | Hyperparameter | Value |
|---|---|---|
| p-QIGA | Population size | 50 |
| | Number of generations | 100 |
| | Crossover probability | 0.8 |
| | Mutation probability | 0.1 |
| | Weight factors in Equation (9) | $w_{SDR}=0.5$, $w_{SIR}=0.3$, $w_{SAR}=0.2$, $w_{C}=1.0$ |
| ICA | Learning rate | 0.01 |
| NMF | Number of components | 10 |
| | Regularization parameter | 0.01 |
| SCA | Sparsity level | 0.1 |
| | Dictionary size | 256 |
| CNN | Number of layers | 5 |
| | Kernel size | 3 |
| | Learning rate | 0.001 |
| | Batch size | 32 |
| QA | Annealing schedule | Linear |
| | Number of iterations | 1000 |

IV Results and Analysis

IV-A Source Separation Performance

As shown in Table II, the p-QIGA achieves competitive performance on both datasets, demonstrating its effectiveness in handling complex acoustic scenes. On the TAU Urban Acoustic Scenes 2020 Mobile dataset, it achieves comparable performance to CNNs and outperforms other classical methods (ICA, NMF, SCA). On the Silent Cities dataset, the p-QIGA outperforms all baselines, highlighting its superior ability to handle correlated sources and limited training data. These results are statistically significant ($p<0.05$). Furthermore, the p-QIGA exhibits better generalization capabilities than the CNN and demonstrates greater robustness to variations in acoustic conditions and limited training data, which are crucial for real-world smart city applications.

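Table II: Source Separation Performance

| Method | Dataset 1 SDR (dB) | Dataset 1 SIR (dB) | Dataset 1 SAR (dB) | Dataset 2 SDR (dB) | Dataset 2 SIR (dB) | Dataset 2 SAR (dB) |
|---|---|---|---|---|---|---|
| ICA | 9.5 | 14.2 | 11.5 | 7.2 | 12.1 | 9.1 |
| NMF | 9.8 | 14.8 | 11.8 | 7.8 | 12.9 | 9.6 |
| SCA | 10.0 | 15.1 | 11.9 | 8.1 | 13.2 | 9.9 |
| CNN | 10.5 | 16.0 | 12.5 | 8.3 | 13.5 | 10.1 |
| QA | 9.9 | 14.9 | 11.7 | 8.0 | 13.0 | 9.8 |
| p-QIGA (Ours) | 10.2 | 15.5 | 12.1 | 8.5 | 13.8 | 10.3 |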

IV-B Impact of Quantum-Inspired Components

An ablation study was conducted to investigate the contribution of each quantum-inspired component (superposition, entanglement, crossover, and mutation) in the p-QIGA. Each component was systematically removed or replaced with its classical counterpart to evaluate its impact on source separation performance. Table III presents the performance of these p-QIGA variants compared to the full p-QIGA on the Silent Cities dataset. As shown in the table, removing superposition or entanglement leads to a noticeable performance degradation, particularly in terms of SDR and SIR. This highlights the benefits of encoding acoustic features into quantum states and leveraging entanglement to capture correlations between sources. Similarly, replacing the quantum crossover and mutation operators with classical counterparts also results in a slight performance decrease, indicating the effectiveness of these quantum-inspired operators in exploring the solution space.

Table III: Impact of Quantum-Inspired Components (Silent Cities Dataset)

| Method | SDR (dB) | SIR (dB) | SAR (dB) |
|---|---|---|---|
| p-QIGA (full) | 8.5 | 13.8 | 10.3 |
| p-QIGA without superposition | 7.8 | 12.5 | 9.6 |
| p-QIGA without entanglement | 8.1 | 13.1 | 9.9 |
| p-QIGA with classical crossover | 8.2 | 13.3 | 10.0 |
| p-QIGA with classical mutation | 8.3 | 13.5 | 10.1 |

IV-C Performance on Different Acoustic Scenes

Fig. 2: p-QIGA vs. Baseline Methods: SDR, SIR, and SAR across Scene Categories.

The p-QIGA demonstrated robustness in challenging scenarios, achieving 8.2 dB SDR in a high-noise “busy street” scene (3 dB SNR), significantly outperforming the CNN (6.5 dB, p=0.023). While proficient in moderately dense scenes, performance slightly decreased with higher source density, suggesting potential limitations in resolving closely spaced sources. However, the p-QIGA showed greater resilience to source mobility than CNNs, achieving 7.8 dB SDR and 13.1 dB SIR in a “train station” with 80% moving sources, compared to the CNN’s 6.1 dB and 10.5 dB, respectively. Fig. 2 further confirms the p-QIGA’s superior performance and generalization ability, especially in complex scenes with high noise, source density, or source mobility, attributed to its quantum-inspired encoding, entanglement mechanism, and genetic operators.

IV-D Performance with Varying Data Sizes

To assess the data efficiency of the p-QIGA, we conducted experiments with varying training data sizes on the TAU Urban Acoustic Scenes 2020 Mobile dataset. The p-QIGA and baseline methods were trained on 10%, 25%, 50%, and 75% of the original training set, ensuring consistent dataset reductions across all methods for fair comparison. Performance was evaluated on the held-out test set using SDR, SIR, and SAR. As shown in Table IV, the p-QIGA consistently outperforms baseline methods, even with limited training data (e.g., achieving 7.1 dB SDR with only 10% of the data). This data efficiency, attributed to the quantum-inspired encoding and genetic optimization, highlights the p-QIGA’s suitability for scenarios with scarce labeled data and underscores the potential of quantum-inspired algorithms for practical smart city applications.

Table IV: Performance with Varying Data Sizes

| Training Data Size | p-QIGA SDR (dB) | CNN SDR (dB) | ICA SDR (dB) |
|---|---|---|---|
| 10% | 7.1 | 6.2 | 5.3 |
| 25% | 7.8 | 6.9 | 6.0 |
| 50% | 8.3 | 7.5 | 6.8 |
| 75% | 8.5 | 8.0 | 7.2 |

V Conclusion

This paper presented a novel p-QIGA designed to address the complex task of source separation in urban acoustic scenes for smart city applications. By incorporating quantum concepts of superposition and entanglement into a genetic optimization framework, our approach achieved robust and accurate source separation even in challenging acoustic environments. These environments, which are often characterized by high noise levels, numerous interfering sources, and dynamic conditions, pose significant challenges for traditional source separation methods. The p-QIGA’s effectiveness was rigorously evaluated on two distinct datasets, demonstrating its superior performance compared to classical methods, particularly in scenarios with limited training data. This capability is crucial for real-world applications where obtaining large labeled datasets can be costly or impractical. This research contributes significantly to the field of acoustic signal processing by introducing a new class of quantum-inspired algorithms with the potential to enhance source separation capabilities in diverse applications.

References

  • [1] J. Abeßer, “A review of deep learning based methods for acoustic scene classification,” Applied Sciences, vol. 10, no. 6, p. 2020, 2020.
  • [2] S. Ansari, A. S. Alatrany, K. A. Alnajjar, T. Khater, S. Mahmoud, D. Al-Jumeily, and A. J. Hussain, “A survey of artificial intelligence approaches in blind source separation,” Neurocomputing, vol. 561, p. 126895, 2023.
  • [3] U. Roy, S. Roy, and S. Nayek, “Optimization with quantum genetic algorithm,” International Journal of Computer Applications, vol. 102, no. 16, pp. 1–7, 2014.
  • [4] J. Xie and M. Zhu, “Investigation of acoustic and visual features for acoustic scene classification,” Expert Systems with Applications, vol. 126, pp. 20–29, 2019.
  • [5] V. Bisot, R. Serizel, S. Essid, and G. Richard, “Acoustic scene classification with matrix factorization for unsupervised feature learning,” in 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP).   IEEE, 2016, pp. 6445–6449.
  • [6] A. Asaei, H. Bourlard, M. J. Taghizadeh, and V. Cevher, “Computational methods for underdetermined convolutive speech localization and separation via model-based sparse component analysis,” Speech Communication, vol. 76, pp. 201–217, 2016.
  • [7] W. Xie, Q. He, Z. Yu, and Y. Li, “Deep mutual attention network for acoustic scene classification,” Digital Signal Processing, vol. 123, p. 103450, 2022.
  • [8] S. Amiriparian, M. Freitag, N. Cummins, M. Gerczuk, S. Pugachevskiy, and B. Schuller, “A fusion of deep convolutional generative adversarial networks and sequence to sequence autoencoders for acoustic scene classification,” in 2018 26th European signal processing conference (EUSIPCO).   IEEE, 2018, pp. 977–981.
  • [9] S. Lloyd, M. Mohseni, and P. Rebentrost, “Quantum principal component analysis,” Nature physics, vol. 10, no. 9, pp. 631–633, 2014.
  • [10] S. Zhou, T. Loke, J. A. Izaac, and J. Wang, “Quantum fourier transform in computational basis,” Quantum Information Processing, vol. 16, no. 3, p. 82, 2017.
  • [11] P. Rebentrost, M. Mohseni, and S. Lloyd, “Quantum support vector machine for big data classification,” Physical review letters, vol. 113, no. 13, p. 130503, 2014.
  • [12] N. Chancellor, “Modernizing quantum annealing using local searches,” New Journal of Physics, vol. 19, no. 2, p. 023024, 2017.
  • [13] T. Heittola, A. Mesaros, and T. Virtanen, “Acoustic scene classification in DCASE 2020 challenge: generalization across devices and low complexity solutions,” arXiv preprint arXiv:2005.14623, 2020.
  • [14] S. Challéat, N. Farrugia, J. S. Froidevaux, A. Gasc, and N. Pajusco, “A dataset of acoustic measurements from soundscapes collected worldwide during the COVID-19 pandemic,” Scientific Data, vol. 11, no. 1, p. 928, 2024.