Quantum-Inspired Genetic Algorithm for Robust Source Separation in Smart City Acoustics
Abstract
The cacophony of urban sounds presents a significant challenge for smart city applications that rely on accurate acoustic scene analysis. Effectively analyzing these complex soundscapes, often characterized by overlapping sound sources, diverse acoustic events, and unpredictable noise levels, requires precise source separation. This task becomes more complicated when only limited training data is available. This paper introduces a novel Quantum-Inspired Genetic Algorithm (p-QIGA) for source separation, drawing inspiration from quantum information theory to enhance acoustic scene analysis in smart cities. By leveraging quantum superposition for efficient solution space exploration and entanglement to handle correlated sources, p-QIGA achieves robust separation even with limited data. These quantum-inspired concepts are integrated into a genetic algorithm framework to optimize source separation parameters. The effectiveness of our approach is demonstrated on two datasets: the TAU Urban Acoustic Scenes 2020 Mobile dataset, representing typical urban soundscapes, and the Silent Cities dataset, capturing quieter urban environments during the COVID-19 pandemic. Experimental results show that the p-QIGA achieves accuracy comparable to state-of-the-art methods while exhibiting superior resilience to noise and limited training data, achieving up to 8.2 dB signal-to-distortion ratio (SDR) in noisy environments and outperforming baseline methods by up to 2 dB with only 10% of the training data. This research highlights the potential of p-QIGA to advance acoustic signal processing in smart cities, particularly for noise pollution monitoring and acoustic surveillance.
Index Terms:
Acoustic signal processing, Quantum-inspired algorithms, Smart cities, Source separation
I Introduction
The rise of smart cities has brought an abundance of opportunities to improve urban living through data-driven solutions. Acoustic scene analysis (ASA) plays a crucial role in this vision, enabling a deeper understanding of the urban environment through the analysis of sound [1]. However, the complexity of urban soundscapes, characterized by overlapping sound sources, diverse acoustic events, and unpredictable noise levels, presents a significant challenge for ASA, particularly in source separation. Traditional source separation methods, such as Independent Component Analysis (ICA) and Non-negative Matrix Factorization (NMF), often struggle with these complex soundscapes due to their limitations in handling correlated sources and their sensitivity to noise [2]. This necessitates the exploration of more robust and adaptable techniques that can effectively disentangle individual sound sources in real-world urban scenarios.
Quantum information theory, with principles such as superposition and entanglement, offers a powerful framework for signal processing and communication, enabling efficient representation of high-dimensional signals and encoding of correlated information beyond classical capabilities. However, limitations in current quantum hardware hinder the direct application of quantum algorithms to ASA. This motivates quantum-inspired algorithms, which adapt principles of quantum mechanics to classical computation for near-term use. Quantum-inspired genetic algorithms have previously been explored for various optimization problems. For instance, Roy et al. [3] demonstrated the effectiveness of quantum-based genetic algorithms for optimizing complex functions. Our work builds on these contributions by applying the approach to source separation in complex urban acoustic scenes, with a focus on robust and accurate separation even with limited training data.
I-A Background and Related Work
ASA involves tasks such as acoustic scene classification and sound event detection, with source separation being a crucial component [1]. However, traditional source separation methods like ICA [4] and NMF [5], as well as more recent techniques like Sparse Component Analysis (SCA) [6] and deep learning (DL) (e.g., [7, 8]), face challenges in handling correlated sources, noise robustness, computational efficiency, and data efficiency in complex urban acoustic scenes. Although quantum algorithms are promising for signal processing, their direct application to ASA, particularly source separation, remains largely unexplored. Quantum algorithms such as Quantum Principal Component Analysis (QPCA) [9] and the Quantum Fourier Transform (QFT) [10] offer potential advantages for signal processing tasks because they can handle high-dimensional spaces and exploit quantum phenomena like superposition and entanglement. However, their application to source separation in complex acoustic environments requires further investigation. Similarly, Quantum Support Vector Machines (QSVM) [11] are better suited to classification tasks, while Quantum Annealing (QA) [12] shows potential for optimization but needs further evaluation in complex scenarios.
I-B Motivations and Key Contributions
Our proposed Quantum-Inspired Genetic Algorithm (p-QIGA) addresses the challenge of source separation in complex urban soundscapes by incorporating quantum concepts into a genetic optimization framework. This approach allows us to effectively disentangle individual sound sources in real-world urban scenarios, even in the presence of noise and limited training data. The p-QIGA’s ability to accurately separate sources enhances the identification and tracking of vehicles in noisy environments and contributes to improved urban planning by providing insights into the acoustic characteristics of different urban spaces. This research makes the following key contributions:
• We introduce a novel p-QIGA leveraging quantum concepts to optimize source separation in complex urban acoustic scenes, enhancing performance and robustness even with limited data.
• Our p-QIGA effectively addresses key challenges in smart city source separation, including handling correlated sources and diverse sound events.
• By achieving accurate and robust source separation, our p-QIGA improves performance in critical smart city applications like noise pollution monitoring and acoustic surveillance.
II Proposed Methodology
II-A Problem Formulation: Source Separation in ASA
Source separation in ASA aims to decompose an observed audio signal $x(t)$ into its constituent sound sources $s_i(t)$, $i = 1, \dots, K$. The mixing process can be modeled as a convolutive mixture in the time domain:
$$x(t) = \sum_{i=1}^{K} (h_i * s_i)(t) + n(t) \tag{1}$$
or equivalently in the frequency domain:
$$X(f) = \sum_{i=1}^{K} H_i(f)\, S_i(f) + N(f) \tag{2}$$
where $h_i(t)$ and $H_i(f)$ are the mixing filters, and $n(t)$ and $N(f)$ represent noise. The goal is to estimate $s_i(t)$ (or $S_i(f)$) given $x(t)$ (or $X(f)$). This is challenging due to factors like unknown mixing filters, correlated sources, diverse sound events, noise, reverberation, and limited training data, motivating the exploration of novel approaches like the quantum-inspired genetic algorithm proposed in this paper.
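To make the convolutive mixing model concrete, the following minimal sketch builds a single-channel mixture from two toy sources. The function names, the filter taps, and the Gaussian noise model are illustrative assumptions and are not taken from the paper.

```python
import random

def convolve(signal, kernel):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def convolutive_mixture(sources, filters, noise_std=0.0, seed=0):
    """Sum each source convolved with its mixing filter, then add noise."""
    rng = random.Random(seed)
    length = max(len(s) + len(h) - 1 for s, h in zip(sources, filters))
    mix = [0.0] * length
    for s, h in zip(sources, filters):
        for t, value in enumerate(convolve(s, h)):
            mix[t] += value
    # Assumed noise model: i.i.d. zero-mean Gaussian samples.
    return [m + rng.gauss(0.0, noise_std) for m in mix]

# Two toy sources and hypothetical two-tap mixing filters.
sources = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
filters = [[1.0, 0.5], [0.5, 0.25]]
mixture = convolutive_mixture(sources, filters)
```

Source separation is the inverse problem: recovering the entries of `sources` when only `mixture` is observed and `filters` is unknown.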
II-B Quantum Encoding of Acoustic Features
Within our proposed p-QIGA-based source separation framework, the initial stage focuses on encoding acoustic features into a quantum representation. Utilizing Mel-frequency cepstral coefficients (MFCCs) within this quantum framework offers potential advantages in terms of representational compactness, noise resilience, and correlation capture. These benefits are achieved through encoding MFCCs into quantum states using a parameterized quantum circuit (PQC) illustrated in Fig. 1. This process facilitates efficient processing, robustness to noise, and the exploitation of entanglement to identify correlated sound sources, crucial for effective source separation in complex acoustic environments.
The PQC shown in Fig. 1 encodes MFCC features into a 4-qubit quantum state using single-qubit rotation gates and two-qubit controlled-NOT (CNOT) gates. The encoding starts by applying Hadamard (H) gates to the first and last qubits to introduce superposition. CNOT gates then create entanglement between adjacent qubits, capturing correlations between features. Each rotation gate is parameterized by a distinct MFCC feature, and this pattern repeats for subsequent features. The design assigns each gate type a role: Hadamard gates provide superposition, which enlarges the explored solution space; CNOT gates provide entanglement, which captures feature relationships; and rotation gates encode individual MFCC values. The alternating pattern of rotation and CNOT gates encodes each feature into a separate qubit while capturing correlations. MFCC features are assigned to qubits sequentially, which simplifies encoding and interpretation. This scheme is chosen for its ability to capture feature correlations and for the potential noise resilience of entangled quantum states. The choice of quantum operators and their parameterization can significantly affect the encoding and the p-QIGA’s performance; rotations about a different axis or multi-qubit gates (e.g., Toffoli gates) could alter the encoding.
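Let $\mathbf{x}$ denote the vector of MFCC features. The PQC encodes this into a quantum state $|\psi(\mathbf{x})\rangle$:
$$|\psi(\mathbf{x})\rangle = U(\mathbf{x}; \boldsymbol{\theta})\, |0\rangle^{\otimes 4} \tag{3}$$
where $U(\mathbf{x}; \boldsymbol{\theta})$ is the unitary operator representing the PQC with trainable parameters $\boldsymbol{\theta}$. The optimization of $\boldsymbol{\theta}$ can be formulated as:
$$\boldsymbol{\theta}^{*} = \arg\max_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta}) \tag{4}$$
where $\mathcal{L}$ is a performance metric. For the first four MFCC features, the encoding process can be represented as: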
This encoding scheme compactly represents multiple MFCC features, potentially enabling efficient processing and noise resilience. Furthermore, capturing correlations between features through entanglement can enhance the p-QIGA’s source separation capabilities.
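The encoding described in this subsection can be illustrated with a small pure-Python statevector simulation. This is a sketch under stated assumptions rather than the circuit of Fig. 1: the rotation axis (here $R_y$), the gate ordering, the little-endian qubit indexing, and the scaling of MFCC values to angles are all choices made for illustration.

```python
import math

N_QUBITS = 4
INV_SQRT2 = 1.0 / math.sqrt(2.0)
H = [[INV_SQRT2, INV_SQRT2], [INV_SQRT2, -INV_SQRT2]]

def ry(theta):
    """Single-qubit rotation about the y axis (assumed rotation axis)."""
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    return [[c, -s], [s, c]]

def apply_single(state, gate, q):
    """Apply a 2x2 gate to qubit q (bit q of the basis-state index)."""
    new = state[:]
    bit = 1 << q
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[i | bit] = gate[1][0] * a + gate[1][1] * b
    return new

def apply_cnot(state, control, target):
    """Swap target amplitudes wherever the control bit is set."""
    new = state[:]
    c, t = 1 << control, 1 << target
    for i in range(len(state)):
        if i & c and not i & t:
            new[i], new[i | t] = state[i | t], state[i]
    return new

def encode_mfcc(features):
    """Encode four MFCC values (assumed pre-scaled to angles) into 4 qubits."""
    state = [0.0] * (1 << N_QUBITS)
    state[0] = 1.0
    for q in (0, N_QUBITS - 1):       # Hadamards on the first and last qubits
        state = apply_single(state, H, q)
    for q in range(N_QUBITS - 1):     # CNOT chain between adjacent qubits
        state = apply_cnot(state, q, q + 1)
    for q, x in enumerate(features):  # one feature-dependent rotation per qubit
        state = apply_single(state, ry(x), q)
    return state

state = encode_mfcc([0.3, 1.2, 2.0, 0.7])  # hypothetical scaled MFCC values
norm = sum(abs(a) ** 2 for a in state)
```

Because every gate is unitary, `norm` stays at 1 up to rounding error, and the 16 amplitudes in `state` jointly depend on all four features.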
II-C Quantum-Inspired Genetic Algorithm for Source Separation (p-QIGA)
II-C1 Representation and Initialization
In p-QIGA, each individual in the population represents a candidate solution to the source separation problem. These individuals are encoded as quantum states within a 4-qubit Hilbert space $\mathcal{H}$, where each qubit corresponds to a specific parameter of the source separation model. The initial population is generated by randomly initializing the qubits in a superposition of states, allowing for a diverse exploration of the parameter space. An individual in the population can be represented as:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle \tag{5}$$
where $\alpha$ and $\beta$ are complex probability amplitudes satisfying $|\alpha|^2 + |\beta|^2 = 1$.
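A minimal sketch of the random initialization follows. It assumes each individual is stored as four independent qubits, each a pair of complex amplitudes; this per-qubit layout, the function names, and the sampling of angles are illustrative assumptions. The population size of 50 follows Table I.

```python
import cmath
import math
import random

def random_qubit(rng):
    """Return amplitudes (alpha, beta) of a random single-qubit superposition."""
    theta = rng.uniform(0.0, math.pi)        # mixing angle between |0> and |1>
    phase = rng.uniform(0.0, 2.0 * math.pi)  # relative phase
    alpha = complex(math.cos(theta / 2.0))
    beta = cmath.exp(1j * phase) * math.sin(theta / 2.0)
    return alpha, beta

def init_population(pop_size=50, n_qubits=4, seed=0):
    """Build pop_size individuals, each a list of n_qubits amplitude pairs."""
    rng = random.Random(seed)
    return [[random_qubit(rng) for _ in range(n_qubits)]
            for _ in range(pop_size)]

population = init_population()
```

Each amplitude pair is normalized by construction, since $\cos^2(\theta/2) + \sin^2(\theta/2) = 1$ regardless of the phase.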
II-C2 Quantum-Inspired Genetic Operators
The p-QIGA employs quantum-inspired genetic operators to evolve the population of candidate solutions. These operators draw on the concepts of superposition and entanglement to enhance the search process.
Quantum Crossover
This operator combines genetic information from two parent individuals to create two offspring. It leverages the concept of superposition to create offspring that are linear combinations of the parent states, allowing for exploration of new regions in the parameter space. The crossover operation can be represented as:
(6) | ||||
(7) |
where the probability amplitudes and are determined by a rotation gate applied to the parent states.
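One possible reading of the crossover operator is sketched below. It assumes individuals are lists of per-qubit amplitude pairs and that the mixing coefficients come from a single rotation angle `phi`; the exact form of Eqs. (6) and (7) may differ, so the orthogonal mixing and the renormalization step are assumptions.

```python
import math

def normalize(alpha, beta, fallback):
    """Rescale an amplitude pair to unit norm; keep the parent if it vanishes."""
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    if norm < 1e-12:
        return fallback
    return alpha / norm, beta / norm

def quantum_crossover(parent1, parent2, phi):
    """Create two offspring as rotation-weighted combinations of the parents."""
    c, s = math.cos(phi), math.sin(phi)
    child1, child2 = [], []
    for (a1, b1), (a2, b2) in zip(parent1, parent2):
        child1.append(normalize(c * a1 + s * a2, c * b1 + s * b2, (a1, b1)))
        child2.append(normalize(-s * a1 + c * a2, -s * b1 + c * b2, (a2, b2)))
    return child1, child2

# Parents in the basis states |0> and |1> on every qubit.
p1 = [(1.0, 0.0)] * 4
p2 = [(0.0, 1.0)] * 4
c1, c2 = quantum_crossover(p1, p2, math.pi / 4)
```

With `phi = 0` the offspring equal the parents, and with `phi = pi / 4` each offspring qubit is an equal-weight superposition, so `phi` acts as a knob for how strongly the parents are blended.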
Quantum Mutation
This operator introduces random variations in the quantum states of individuals, simulating the effect of quantum fluctuations. This helps to maintain diversity in the population and prevent premature convergence to suboptimal solutions. The mutation operation can be represented as a rotation gate applied to an individual’s state:
$$|\psi'\rangle = R(\theta_m)\, |\psi\rangle \tag{8}$$
where $R(\theta_m)$ is a rotation gate with a randomly chosen rotation angle $\theta_m$.
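The mutation step can be sketched as a random rotation applied to individual qubits. The per-qubit amplitude-pair layout, the angle range `max_angle`, and the per-qubit application of the mutation probability are assumptions made for illustration; the default rate of 0.1 follows Table I.

```python
import math
import random

def rotate(qubit, theta):
    """Apply the 2x2 rotation [[cos, -sin], [sin, cos]] to (alpha, beta)."""
    alpha, beta = qubit
    c, s = math.cos(theta), math.sin(theta)
    return c * alpha - s * beta, s * alpha + c * beta

def quantum_mutation(individual, rate=0.1, max_angle=0.1 * math.pi, rng=None):
    """Rotate each qubit by a random angle with probability `rate`."""
    rng = rng or random.Random()
    mutated = []
    for qubit in individual:
        if rng.random() < rate:
            qubit = rotate(qubit, rng.uniform(-max_angle, max_angle))
        mutated.append(qubit)
    return mutated

individual = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8), (0.8, -0.6)]
mutant = quantum_mutation(individual, rate=1.0, rng=random.Random(0))
```

Since the rotation matrix is orthogonal, mutation changes the measurement probabilities of a qubit without breaking the normalization of its amplitudes.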
II-C3 Fitness Function
The fitness function evaluates the quality of each candidate solution, guiding the p-QIGA towards optimal separation parameters. In our case, the fitness function considers multiple criteria:
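$$F(|\psi\rangle) = w_1\,\mathrm{SDR}(|\psi\rangle) + w_2\,\mathrm{SIR}(|\psi\rangle) + w_3\,\mathrm{SAR}(|\psi\rangle) - w_4\,C(|\psi\rangle) \tag{9}$$
where $\mathrm{SDR}$, $\mathrm{SIR}$, and $\mathrm{SAR}$ are the Signal-to-Distortion Ratio, Signal-to-Interference Ratio, and Signal-to-Artifacts Ratio, respectively, of the separated sources obtained using the parameters encoded in $|\psi\rangle$. $C$ is a penalty term that measures the correlation between the separated sources, and $w_1$, $w_2$, $w_3$, and $w_4$ are weight factors that balance the importance of each criterion.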
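The multi-criteria fitness can be sketched as a weighted sum with a correlation penalty. The weight values below are hypothetical placeholders, and the use of the mean absolute Pearson correlation as the penalty is an assumption; the text only states that the penalty measures correlation between separated sources. The metric values in the example are the p-QIGA results on the Silent Cities dataset from Table II.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def correlation_penalty(sources):
    """Mean absolute pairwise correlation between separated sources."""
    pairs = [(i, j) for i in range(len(sources))
             for j in range(i + 1, len(sources))]
    if not pairs:
        return 0.0
    total = sum(abs(pearson(sources[i], sources[j])) for i, j in pairs)
    return total / len(pairs)

def fitness(sdr, sir, sar, penalty, weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted sum of separation metrics minus the correlation penalty."""
    w1, w2, w3, w4 = weights  # hypothetical values, not taken from the paper
    return w1 * sdr + w2 * sir + w3 * sar - w4 * penalty

separated = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
score = fitness(8.5, 13.8, 10.3, correlation_penalty(separated))
```

The penalty enters with a negative sign, so candidates whose separated outputs remain strongly correlated are ranked lower even when their SDR, SIR, and SAR are similar.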
II-C4 Optimization Process
The p-QIGA iteratively applies the quantum-inspired genetic operators to evolve the population of candidate solutions. This process continues until either a maximum number of generations, $G_{\max}$, is reached or the best fitness value in the population, $F_{\text{best}}$, surpasses a predefined fitness threshold, $F_{\text{th}}$. The optimization process can be represented as the iterative update of the population:
$$P(t+1) = \mathcal{R}\big(\mathcal{M}\big(\mathcal{C}\big(\mathcal{S}(P(t))\big)\big)\big) \tag{10}$$
where $P(t)$ represents the population at generation $t$, $\mathcal{S}$ is the selection operator, $\mathcal{C}$ is the crossover operator, $\mathcal{M}$ is the mutation operator, and $\mathcal{R}$ is the offspring replacement operator. This process is described in detail in Algorithm 1.
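The select, cross over, mutate, replace loop with its two stopping conditions can be sketched as follows. This is not Algorithm 1: the real-angle encoding of qubits, the tournament selection, the blend crossover, the Gaussian mutation, the elitist replacement, and the toy objective are all assumptions chosen to keep the example short and self-contained. The default population size, generation limit, and operator probabilities follow Table I.

```python
import math
import random

def evolve(fitness, pop_size=50, n_qubits=4, max_generations=100,
           threshold=None, p_cross=0.8, p_mut=0.1, seed=0):
    """Evolve qubit angles; each angle t stands for amplitudes (cos t, sin t)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0.0, math.pi / 2) for _ in range(n_qubits)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    generation = 0
    while generation < max_generations:
        if threshold is not None and fitness(best) > threshold:
            break  # desired fitness reached
        # Selection: binary tournaments keep the fitter of two random picks.
        parents = [max(rng.sample(pop, 2), key=fitness)
                   for _ in range(pop_size)]
        # Crossover: blend the angles of consecutive parents.
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < p_cross:
                w = rng.random()
                a, b = ([w * x + (1 - w) * y for x, y in zip(a, b)],
                        [(1 - w) * x + w * y for x, y in zip(a, b)])
            offspring += [a, b]
        # Mutation: small random rotation of individual qubit angles.
        offspring = [[t + rng.gauss(0.0, 0.1) if rng.random() < p_mut else t
                      for t in ind] for ind in offspring]
        # Replacement: offspring form the next population, keeping the best.
        challenger = max(offspring, key=fitness)
        if fitness(challenger) > fitness(best):
            best = challenger
        else:
            offspring[0] = best
        pop = offspring
        generation += 1
    return best, generation

def toy_fitness(individual):
    """Toy objective: total probability of measuring |1> across qubits."""
    return sum(math.sin(t) ** 2 for t in individual)

best, generations = evolve(toy_fitness)
```

In a separation setting, `toy_fitness` would be replaced by a function that decodes the individual into separation parameters, runs the separation, and scores the result.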
II-C5 Convergence Analysis
The convergence of the p-QIGA can be analyzed by considering the probability of finding the optimal solution in each generation , denoted as . The change in this probability, , can be modeled as:
(11) |
where represents the effectiveness of the quantum-inspired operators, and is the exploration rate, with controlling the rate of decrease in exploration. By solving this difference equation, we can analyze the convergence behavior of the p-QIGA.
II-C6 Computational Complexity and Scalability
The p-QIGA demonstrates efficiency with a time complexity of , where is the number of MFCC features, and constant space complexity. This outperforms classical methods like ICA () and NMF (). To assess scalability, we conducted experiments varying the number of sources, data size, and circuit depth. Increasing the number of sources from 2 to 5 increased runtime by 35%, attributed to the increased circuit complexity. Doubling the data size resulted in a 42% runtime increase. Increasing circuit depth by adding an additional layer of gates led to a 28% runtime increase. These results demonstrate the p-QIGA’s ability to handle increasingly complex scenarios with moderate increases in computational cost. Future work will explore further optimizations for enhanced efficiency.
II-D Classical Post-processing and Classification
Following the quantum-enhanced source separation process, the estimated source signals often require further refinement. This subsection details the classical post-processing techniques employed and the subsequent classification stage.
II-D1 Post-processing Techniques
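The following signal processing techniques are applied to each separated source $\hat{s}_i(t)$:
• Filtering: A bandpass filter $B(f)$ is applied to remove residual noise and unwanted frequency components: $\tilde{S}_i(f) = B(f)\,\hat{S}_i(f)$, where $\hat{S}_i(f)$ and $\tilde{S}_i(f)$ are the DFTs of $\hat{s}_i(t)$ and the filtered signal $\tilde{s}_i(t)$, respectively.
• Dynamic Range Compression: A dynamic range compression algorithm is applied to reduce the dynamic range:
$$y_i(t) = \begin{cases} \tilde{s}_i(t), & |\tilde{s}_i(t)| \le T \\ \operatorname{sgn}\big(\tilde{s}_i(t)\big)\left(T + \dfrac{|\tilde{s}_i(t)| - T}{\rho}\right), & |\tilde{s}_i(t)| > T \end{cases} \tag{12}$$
where $T$ is the threshold and $\rho$ is the compression ratio.
• De-clipping: A de-clipping algorithm is applied to restore signal fidelity in clipped regions, for example, by replacing clipped samples with interpolated values.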
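Two of the post-processing steps listed above can be sketched in a few lines. The hard-knee compressor form, the default threshold and ratio, and the linear interpolation used for de-clipping are illustrative assumptions; the paper does not specify which variants were used.

```python
import math

def compress(signal, threshold=0.5, ratio=4.0):
    """Hard-knee compressor: scale the part of |x| above the threshold."""
    out = []
    for x in signal:
        mag = abs(x)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, x))
    return out

def declip(signal, clip_level=1.0):
    """Replace clipped samples by linear interpolation between clean neighbors."""
    out = list(signal)
    good = [i for i, x in enumerate(signal) if abs(x) < clip_level]
    if not good:
        return out  # nothing to interpolate from
    for i, x in enumerate(signal):
        if abs(x) < clip_level:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out[i] = signal[right]
        elif right is None:
            out[i] = signal[left]
        else:
            w = (i - left) / (right - left)
            out[i] = (1 - w) * signal[left] + w * signal[right]
    return out

compressed = compress([0.2, 0.9, -0.9])
repaired = declip([0.0, 1.0, 1.0, 0.6])
```

Linear interpolation is the simplest choice and tends to underestimate the true peak in a clipped region; smoother interpolants are a common alternative.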
II-D2 Acoustic Scene Classification
The refined separated sources are used for acoustic scene classification. We extract a feature vector from each source and employ a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel:
(13) |
where is the predicted acoustic scene class.
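For reference, the RBF kernel used by the SVM classifier measures similarity between two feature vectors as $K(\mathbf{u}, \mathbf{v}) = \exp(-\gamma \lVert \mathbf{u} - \mathbf{v} \rVert^2)$. The short sketch below computes it directly; the value of `gamma` and the example vectors are hypothetical.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

# Hypothetical feature vectors extracted from two separated sources.
similarity = rbf_kernel([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

The kernel equals 1 for identical vectors and decays toward 0 as they move apart, with `gamma` controlling how quickly the similarity falls off.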
III Experimental Setting
III-A Datasets
We evaluate our p-QIGA on two datasets: (1) the TAU Urban Acoustic Scenes 2020 Mobile dataset [13] (Dataset 1), comprising recordings from 10 acoustic scenes in 12 European cities, captured using 4 different mobile devices; and (2) the Silent Cities dataset [14] (Dataset 2), capturing unique soundscapes recorded during the COVID-19 pandemic in various cities worldwide, featuring urban environments with reduced human activity. Both datasets are preprocessed (format conversion, downsampling, noise reduction) and split into training (80%), validation (10%), and test (10%) sets.
III-B Evaluation Metrics
We evaluate the performance of our p-QIGA algorithm using established metrics that quantify the quality of the separated sources, including:
1. Signal-to-Distortion Ratio (SDR):
(14)
2. Signal-to-Interference Ratio (SIR):
(15) where represents the noise component.
3. Signal-to-Artifacts Ratio (SAR):
(16)
These metrics are widely used in the field of ASA and provide a comprehensive assessment of source separation performance by capturing different aspects of the separation quality, such as target distortion, interference suppression, and artifact removal. Furthermore, these metrics align with the evaluation criteria used in related works, allowing for a fair comparison with existing source separation techniques.
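As a worked illustration, the sketch below computes the three metrics using the widely used BSS Eval energy-ratio definitions, which may differ in detail from Eqs. (14) to (16). It assumes the estimated source has already been decomposed into a target component and interference, noise, and artifact error terms; that decomposition step is not shown, and the example values are made up.

```python
import math

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def db_ratio(numerator, denominator):
    """Energy ratio expressed in decibels."""
    return 10.0 * math.log10(numerator / denominator)

def bss_metrics(s_target, e_interf, e_noise, e_artif):
    """Return (SDR, SIR, SAR) in dB from a decomposed source estimate."""
    total_error = [i + n + a for i, n, a in zip(e_interf, e_noise, e_artif)]
    clean_part = [s + i + n for s, i, n in zip(s_target, e_interf, e_noise)]
    sdr = db_ratio(energy(s_target), energy(total_error))
    sir = db_ratio(energy(s_target), energy(e_interf))
    sar = db_ratio(energy(clean_part), energy(e_artif))
    return sdr, sir, sar

# Toy decomposition with small interference and artifact terms.
sdr, sir, sar = bss_metrics(
    s_target=[1.0, 0.0, 0.0, 0.0],
    e_interf=[0.0, 0.1, 0.0, 0.0],
    e_noise=[0.0, 0.0, 0.0, 0.0],
    e_artif=[0.0, 0.0, 0.1, 0.0],
)
```

SDR is the most conservative of the three because its denominator collects every error term, so it can never exceed SIR for the same decomposition when the error terms do not cancel.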
III-C Baseline Methods
We benchmark our p-QIGA against established source separation methods, including:
• Classical methods: ICA, NMF, SCA, and multi-channel CNNs.
• Quantum-Inspired method: Quantum Annealing (QA) [12] for optimizing a classical source separation model.
These baselines provide a diverse benchmark for evaluating the performance of our p-QIGA.
III-D Implementation Details
The p-QIGA was implemented using Qiskit for quantum computing simulation and standard Python libraries (NumPy, SciPy) for classical components. Experiments were conducted on a workstation with an Intel Xeon Gold 6248 CPU, 128GB RAM, and an NVIDIA Tesla V100 GPU. Hyperparameter tuning for all methods was performed using grid search, with optimal values selected based on validation set performance. The selected hyperparameter values are listed in Table I.
Method | Hyperparameter | Value |
p-QIGA | Population size | 50 |
p-QIGA | Number of generations | 100 |
p-QIGA | Crossover probability | 0.8 |
p-QIGA | Mutation probability | 0.1 |
p-QIGA | Weight factors in Equation (9) | , , , |
ICA | Learning rate | 0.01 |
NMF | Number of components | 10 |
NMF | Regularization parameter | 0.01 |
SCA | Sparsity level | 0.1 |
SCA | Dictionary size | 256 |
CNN | Number of layers | 5 |
CNN | Kernel size | 3 |
CNN | Learning rate | 0.001 |
CNN | Batch size | 32 |
QA | Annealing schedule | Linear |
QA | Number of iterations | 1000 |
IV Results and Analysis
IV-A Source Separation Performance
As shown in Table II, the p-QIGA achieves competitive performance on both datasets, demonstrating its effectiveness in handling complex acoustic scenes. On the TAU Urban Acoustic Scenes 2020 Mobile dataset, it achieves comparable performance to CNNs and outperforms other classical methods (ICA, NMF, SCA). On the Silent Cities dataset, the p-QIGA outperforms all baselines, highlighting its superior ability to handle correlated sources and limited training data. These results are statistically significant (p < 0.05). Furthermore, the p-QIGA exhibits better generalization capabilities than the CNN and demonstrates greater robustness to variations in acoustic conditions and limited training data, which are crucial for real-world smart city applications.
Method | Dataset 1 SDR (dB) | Dataset 1 SIR (dB) | Dataset 1 SAR (dB) | Dataset 2 SDR (dB) | Dataset 2 SIR (dB) | Dataset 2 SAR (dB) |
ICA | 9.5 | 14.2 | 11.5 | 7.2 | 12.1 | 9.1 |
NMF | 9.8 | 14.8 | 11.8 | 7.8 | 12.9 | 9.6 |
SCA | 10.0 | 15.1 | 11.9 | 8.1 | 13.2 | 9.9 |
CNN | 10.5 | 16.0 | 12.5 | 8.3 | 13.5 | 10.1 |
QA | 9.9 | 14.9 | 11.7 | 8.0 | 13.0 | 9.8 |
p-QIGA (Ours) | 10.2 | 15.5 | 12.1 | 8.5 | 13.8 | 10.3 |
IV-B Impact of Quantum-Inspired Components
An ablation study was conducted to investigate the contribution of each quantum-inspired component (superposition, entanglement, crossover, and mutation) in the p-QIGA. Each component was systematically removed or replaced with its classical counterpart to evaluate its impact on source separation performance. Table III presents the performance of these p-QIGA variants compared to the full p-QIGA on the Silent Cities dataset. As shown in the table, removing superposition or entanglement leads to a noticeable performance degradation, particularly in terms of SDR and SIR. This highlights the benefits of encoding acoustic features into quantum states and leveraging entanglement to capture correlations between sources. Similarly, replacing the quantum crossover and mutation operators with classical counterparts also results in a slight performance decrease, indicating the effectiveness of these quantum-inspired operators in exploring the solution space.
Method | SDR (dB) | SIR (dB) | SAR (dB) |
p-QIGA (full) | 8.5 | 13.8 | 10.3 |
p-QIGA without superposition | 7.8 | 12.5 | 9.6 |
p-QIGA without entanglement | 8.1 | 13.1 | 9.9 |
p-QIGA with classical crossover | 8.2 | 13.3 | 10.0 |
p-QIGA with classical mutation | 8.3 | 13.5 | 10.1 |
IV-C Performance on Different Acoustic Scenes
The p-QIGA demonstrated robustness in challenging scenarios, achieving 8.2 dB SDR in a high-noise “busy street” scene (3 dB SNR), significantly outperforming the CNN (6.5 dB, p=0.023). While proficient in moderately dense scenes, performance slightly decreased with higher source density, suggesting potential limitations in resolving closely spaced sources. However, the p-QIGA showed greater resilience to source mobility than CNNs, achieving 7.8 dB SDR and 13.1 dB SIR in a “train station” with 80% moving sources, compared to the CNN’s 6.1 dB and 10.5 dB, respectively. Fig. 2 further confirms the p-QIGA’s superior performance and generalization ability, especially in complex scenes with high noise, source density, or source mobility, attributed to its quantum-inspired encoding, entanglement mechanism, and genetic operators.
IV-D Performance with Varying Data Sizes
To assess the data efficiency of the p-QIGA, we conducted experiments with varying training data sizes on the TAU Urban Acoustic Scenes 2020 Mobile dataset. The p-QIGA and baseline methods were trained on 10%, 25%, 50%, and 75% of the original training set, ensuring consistent dataset reductions across all methods for fair comparison. Performance was evaluated on the held-out test set using SDR, SIR, and SAR. As shown in Table IV, the p-QIGA consistently outperforms baseline methods, even with limited training data (e.g., achieving 7.1 dB SDR with only 10% of the data). This data efficiency, attributed to the quantum-inspired encoding and genetic optimization, highlights the p-QIGA’s suitability for scenarios with scarce labeled data and underscores the potential of quantum-inspired algorithms for practical smart city applications.
Training Data Size | p-QIGA (SDR) | CNN (SDR) | ICA (SDR) |
10% | 7.1 dB | 6.2 dB | 5.3 dB |
25% | 7.8 dB | 6.9 dB | 6.0 dB |
50% | 8.3 dB | 7.5 dB | 6.8 dB |
75% | 8.5 dB | 8.0 dB | 7.2 dB |
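One way to keep the dataset reductions consistent across methods is to shuffle the training items once with a fixed seed and take nested prefixes, so that every method sees the same 10%, 25%, 50%, and 75% subsets. The sketch below illustrates this; the nesting of subsets and the function name are assumptions, since the paper does not describe how the subsets were drawn.

```python
import random

def nested_subsets(items, fractions=(0.10, 0.25, 0.50, 0.75), seed=0):
    """Return one fixed, nested training subset per fraction."""
    order = list(items)
    random.Random(seed).shuffle(order)  # single shuffle shared by all subsets
    return {f: order[:round(f * len(order))] for f in fractions}

# Hypothetical training set of 200 clip indices.
subsets = nested_subsets(range(200))
```

Passing the same `subsets` to every method removes sampling differences as a source of variation between the columns of Table IV.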
V Conclusion
This paper presented p-QIGA, a quantum-inspired genetic algorithm for source separation in urban acoustic scenes for smart city applications. By incorporating the quantum concepts of superposition and entanglement into a genetic optimization framework, the approach achieved robust source separation in challenging acoustic environments characterized by high noise levels, numerous interfering sources, and dynamic conditions, which pose significant challenges for traditional methods. Evaluated on two distinct datasets, the p-QIGA performed comparably to the CNN baseline on the TAU Urban Acoustic Scenes 2020 Mobile dataset, outperformed all baselines on the Silent Cities dataset, and retained its advantage when trained on as little as 10% of the training data. This capability is crucial for real-world applications where obtaining large labeled datasets can be costly or impractical. The results suggest that quantum-inspired algorithms can enhance source separation capabilities in diverse acoustic signal processing applications.
References
- [1] J. Abeßer, “A review of deep learning based methods for acoustic scene classification,” Applied Sciences, vol. 10, no. 6, p. 2020, 2020.
- [2] S. Ansari, A. S. Alatrany, K. A. Alnajjar, T. Khater, S. Mahmoud, D. Al-Jumeily, and A. J. Hussain, “A survey of artificial intelligence approaches in blind source separation,” Neurocomputing, vol. 561, p. 126895, 2023.
- [3] U. Roy, S. Roy, and S. Nayek, “Optimization with quantum genetic algorithm,” International Journal of Computer Applications, vol. 102, no. 16, pp. 1–7, 2014.
- [4] J. Xie and M. Zhu, “Investigation of acoustic and visual features for acoustic scene classification,” Expert Systems with Applications, vol. 126, pp. 20–29, 2019.
- [5] V. Bisot, R. Serizel, S. Essid, and G. Richard, “Acoustic scene classification with matrix factorization for unsupervised feature learning,” in 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2016, pp. 6445–6449.
- [6] A. Asaei, H. Bourlard, M. J. Taghizadeh, and V. Cevher, “Computational methods for underdetermined convolutive speech localization and separation via model-based sparse component analysis,” Speech Communication, vol. 76, pp. 201–217, 2016.
- [7] W. Xie, Q. He, Z. Yu, and Y. Li, “Deep mutual attention network for acoustic scene classification,” Digital Signal Processing, vol. 123, p. 103450, 2022.
- [8] S. Amiriparian, M. Freitag, N. Cummins, M. Gerczuk, S. Pugachevskiy, and B. Schuller, “A fusion of deep convolutional generative adversarial networks and sequence to sequence autoencoders for acoustic scene classification,” in 2018 26th European signal processing conference (EUSIPCO). IEEE, 2018, pp. 977–981.
- [9] S. Lloyd, M. Mohseni, and P. Rebentrost, “Quantum principal component analysis,” Nature physics, vol. 10, no. 9, pp. 631–633, 2014.
- [10] S. Zhou, T. Loke, J. A. Izaac, and J. Wang, “Quantum Fourier transform in computational basis,” Quantum Information Processing, vol. 16, no. 3, p. 82, 2017.
- [11] P. Rebentrost, M. Mohseni, and S. Lloyd, “Quantum support vector machine for big data classification,” Physical review letters, vol. 113, no. 13, p. 130503, 2014.
- [12] N. Chancellor, “Modernizing quantum annealing using local searches,” New Journal of Physics, vol. 19, no. 2, p. 023024, 2017.
- [13] T. Heittola, A. Mesaros, and T. Virtanen, “Acoustic scene classification in DCASE 2020 challenge: generalization across devices and low complexity solutions,” arXiv preprint arXiv:2005.14623, 2020.
- [14] S. Challéat, N. Farrugia, J. S. Froidevaux, A. Gasc, and N. Pajusco, “A dataset of acoustic measurements from soundscapes collected worldwide during the COVID-19 pandemic,” Scientific Data, vol. 11, no. 1, p. 928, 2024.