Are Microphone Signals Alone Sufficient for Self-Positioning?
Abstract
Asynchronous environments pose challenges to traditional self-positioning methods. Traditionally, time of arrival (TOA) measurements require both microphone signals and source signals, which limits their applicability when the emission times of human voices or other sources and the recording start times of independent microphones are unknown. To address this issue, we propose a mapping function that transforms both the TOA and the time difference of arrival (TDOA) formulas, and we show, for the first time, that the two transformations are identical to one another. This implies that microphone signals alone are sufficient for self-positioning, without the need for the waveform of the source signals, which expands the applicability of self-positioning techniques in challenging environments. Supported by a mathematical proof and by experimental results on simulated and real-life data, this work contributes to signal and audio processing.
Index Terms:
Time of arrival, time difference of arrival, self-positioning, mapping function
I Introduction
The ability to accurately localize distributed microphones and sound sources is a fundamental requirement in various acoustic tasks, including noise reduction, source signal enhancement, and separation [1, 2, 3]. This is conventionally achieved by using time of arrival (TOA) and time difference of arrival (TDOA) measurements [4]. However, these techniques have significant limitations, particularly in asynchronous environments where the timing of signal emission and recording is unknown in advance.
In scenarios where the waveform of the source signals is available, including information on frequency, amplitude, and duration, TOA measurements can be estimated through cross-correlation methods [5]. This has led to the development of various self-positioning methodologies, such as probabilistic generative models [6], maximum likelihood estimation [7], Gram matrix and semi-definite relaxation [8], and techniques that exploit the low-rank property (LRP) [9] with alternating minimization [10, 11] and structured total least squares [12, 13].
Alternatively, when the waveform of the source signals is hard to obtain, self-positioning techniques pivot towards TDOA measurements, which can be estimated from the audio signals of a pair of microphones [5]. This shift has led to a range of methodologies, such as maximum likelihood estimation [7, 17], the auxiliary function method [14], LRP with truncated nuclear norm minimization [15, 16], and distributed damped Newton optimization [4, 18].
Yet, amidst these developments, a significant and pressing question has lingered: Can microphone signals alone be sufficient for self-positioning, thereby negating the need for source signals? The answer to this question carries profound implications for the field, as it can streamline and make the self-positioning process more efficient and adaptable. Moreover, it presents a timely advancement, given the growing complexity of audio environments and the increasing need for flexible and efficient localization methods.
This research takes a step towards answering this question. We introduce a mapping function that transforms both the TOA and TDOA formulas into an identical form. In other words, our findings show that the two transformations coincide exactly, confirming that microphone signals alone are sufficient for self-positioning tasks. Our approach therefore reveals, for the first time, the exact relationship between TOA and TDOA measurements, challenging the long-standing assumption that TOA necessitates both microphone signals and the source signal waveform.
This insight does not merely simplify the self-positioning process by eliminating the need for additional information from source signals. It also broadens its applicability, as properties initially designed for TOA-based localization, such as rank 3 [3] and rank 5 [19], can now be applied to TDOA-based localization. In essence, our work has the potential to reshape self-positioning techniques in asynchronous environments and to catalyze further advances in signal and audio processing.
II Problem Formulation
Consider a setup with asynchronous microphones and asynchronous sound sources, all located in three-dimensional space. After the sources have emitted their audio signals and the microphones have received the corresponding signals, we can encounter two possible scenarios.
In the first scenario, where the waveform of the source signals can be acquired, the TOA between a microphone and a source can be calculated from the recording start time of the microphone, the emission time of the source, and the speed of sound as [8]
(1) |
where the indices run over all microphones and all sources, and the norm is the Euclidean norm. In addition, without loss of generality, the location of the first source can be fixed at the origin because the geometry of microphones and sources is invariant to translation and rotation [3].
The second scenario arises when it is challenging to obtain the waveform of the source signals. Here, we define the TDOA of a source between a pair of microphones as [8]
(2) |
Upon inspection of Eq. (2), it can be observed that, after the source emits its audio signal and both microphones of the pair receive it, the signal at each microphone contains information about the recording start time of that microphone, the emission time of the source, and the propagation time from the source to that microphone. Thus, using the generalized cross-correlation with phase transform method [20], the TDOA can be estimated from the audio signals of the two microphones alone, which demonstrates that the TDOA is independent of the source signal waveform. Moreover, by definition, the TDOA measures the time difference between a pair of microphones receiving the same source signal; therefore, the TDOA of a source in Eq. (2) can also be measured using a given microphone signal and any one of the remaining microphone signals.
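The estimation step described above can be illustrated with a minimal sketch of generalized cross-correlation with phase transform. This is a generic textbook-style version, not the implementation used in this letter; the function name `gcc_phat`, the sampling rate, the white-noise test signal, and the 25-sample delay are all assumptions chosen for illustration.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT."""
    n = sig.size + ref.size                      # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-15               # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)                # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Hypothetical test: the same noise burst reaches microphone B 25 samples after microphone A.
rng = np.random.default_rng(0)
fs = 16000                                       # assumed sampling rate in Hz
true_delay = 25                                  # assumed delay in samples
source = rng.standard_normal(4096)
mic_a = source
mic_b = np.concatenate((np.zeros(true_delay), source))[:source.size]

estimated = gcc_phat(mic_b, mic_a, fs)
print(f"estimated TDOA: {estimated * 1e3:.4f} ms, true: {true_delay / fs * 1e3:.4f} ms")
```

Only the two microphone signals enter the estimator; the emitted waveform itself is never used, which is the point made in the paragraph above.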
Interestingly, Eq. (2) shares the same structural form as the TOA formula in Eq. (1). However, the exact relationship between the TOA formula in Eq. (1) and the TDOA formula in Eq. (2) remains elusive. No existing work has demonstrated this relationship, and as a result, it is still unknown whether microphone signals alone are sufficient for self-positioning. Our research objective, therefore, is to investigate the feasibility of using microphone signals alone for self-positioning when the waveform of the source signals is unavailable. The results of our study have the potential to challenge the long-standing assumption that acquiring the source signal waveform is a necessity for TOA-based self-positioning. This can lead to an expansion of self-positioning techniques, enhancing their utility in challenging environments.
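To make the structural similarity concrete, the following sketch simulates TOA and TDOA matrices and checks that the TDOA can be written as a distance term plus a source-only offset plus a microphone-only offset. The sign convention, the choice of the first microphone as the reference, the variable names, the 10 m room, and the speed of sound of 343 m/s are assumptions made for illustration and may differ from the exact definitions in Eqs. (1) and (2).

```python
import numpy as np

def simulate_toa(mics, srcs, start, emit, c=343.0):
    """TOA matrix under an ASSUMED convention:
    t[i, j] = ||r_i - s_j|| / c + emit[j] - start[i]."""
    dist = np.linalg.norm(mics[:, None, :] - srcs[None, :, :], axis=2)
    return dist / c + emit[None, :] - start[:, None]

def tdoa_from_toa(toa, ref=0):
    """TDOA of every source between microphone i and an assumed reference microphone."""
    return toa - toa[ref:ref + 1, :]

rng = np.random.default_rng(0)
M, N, c = 6, 8, 343.0                            # hypothetical sizes and speed of sound
mics = rng.uniform(0.0, 10.0, size=(M, 3))       # microphone positions in a 10 m cube
srcs = rng.uniform(0.0, 10.0, size=(N, 3))       # source positions
start = rng.uniform(-1.0, 1.0, size=M)           # unknown recording start times
emit = rng.uniform(-1.0, 1.0, size=N)            # unknown emission times

toa = simulate_toa(mics, srcs, start, emit, c)
tdoa = tdoa_from_toa(toa)

# Rewrite the TDOA as distance / c + (source-only offset) + (microphone-only offset).
dist = np.linalg.norm(mics[:, None, :] - srcs[None, :, :], axis=2)
src_offset = -dist[0, :] / c                     # depends only on the source index
mic_offset = start[0] - start                    # depends only on the microphone index
structured = dist / c + src_offset[None, :] + mic_offset[:, None]

print("max deviation:", np.abs(tdoa - structured).max())
```

Under these assumptions the emission times cancel in the TDOA, and what remains has the same "distance plus two offsets" shape as the TOA model.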
III Mapping function for TOA and TDOA Formulas
In this section, a novel mapping function is derived for the TOA formula in Eq. (1) and the TDOA formula in Eq. (2). We first present the mapping function in Subsection A, then prove it in Subsection B, and finally show a property of the mapping function in Subsection C.
III-A Mapping Function
TOA measurements are unavailable when the waveform of the source signals is missing, and only TDOA measurements can then be used for localization. Since no existing work has investigated the relationship between TOA and TDOA measurements, we present a novel mapping function to show the sufficiency of using microphone signals alone for both TOA-based and TDOA-based self-positioning. The proposed mapping function for the TOA formula in Eq. (1) and the TDOA formula in Eq. (2) is defined as
(4) |
and
(5) |
respectively. Then, by applying the mapping function to the TOA formula in Eq. (1) and the TDOA formula in Eq. (2) and defining two variables
(6) |
we state that
(7) |
where the indices run over all microphones and all sources. From the statement in Eq. (7), we can see that, once this relationship is proved, the mapping function has the same structure as the TOA formula in Eq. (1), so the locations of both microphones and sources can be obtained from the transformed measurements with the same methods that were designed for TOA-based self-positioning. More importantly, it reveals that microphone signals alone are sufficient for self-positioning, which challenges the long-standing assumption that TOA necessitates both microphone signals and the waveform of the source signals. Thus, the process of self-positioning becomes more adaptable and efficient, and self-positioning techniques can be extended to challenging environments.
III-B Proof for mapping function
We first derive the transformation of the TOA formula in Eq. (4), and then the transformation of the TDOA formula in Eq. (5). Finally, we validate the statement in Eq. (7) by comparing the two transformations.
III-B1 Transformation of TOA formula
From the TOA formula in Eq. (1), we have
(8) |
Then, with Eqs. (1) and (8), the difference between the two TOA expressions can be written as
(9) |
From Eq. (9), the mean value of this difference, taken over the corresponding index, is
(10) |
III-B2 Transformation of TDOA formula
III-B3 Validation of statement
Based on the definitions of the two variables in Eq. (6), comparing the transformation of the TOA formula derived in Subsection III-B1 with the transformation of the TDOA formula derived in Subsection III-B2 shows that the two are identical to one another. This completes the proof of the statement in Eq. (7).
With the statement in Eq. (7) proved, the transformations of the TOA and TDOA formulas are identical to one another. This reveals that microphone signals are sufficient for both TOA-based and TDOA-based self-positioning, and it challenges the long-standing assumption that TOA necessitates both microphone signals and the waveform of the source signals. In addition, the statement in Eq. (7) indicates that many properties used for TOA-based localization, such as rank 3 [3] and rank 5 [19], can also be used for TDOA-based localization, which makes self-positioning tasks more efficient and adaptable. Furthermore, because self-positioning is important for applications such as noise reduction and source signal enhancement and separation [1, 2, 3], eliminating the need for additional information from source signals can also facilitate those applications.
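The argument above can be sketched in a few lines of algebra. The notation below is assumed for illustration only, since it is not recoverable from the text: $r_i$ and $s_j$ are the microphone and source positions, $\delta_i$ and $\tau_j$ the recording start and emission times, $c$ the speed of sound, $M$ the number of microphones, and the first microphone is taken as the TDOA reference. The map $g$ is one plausible transformation consistent with the description in this section, not necessarily the exact mapping function of Eqs. (4) and (5).

```latex
\begin{align*}
t_{i,j} &= \frac{\lVert r_i - s_j \rVert}{c} + \tau_j - \delta_i,
\qquad
T_{i,j} = t_{i,j} - t_{1,j}, \\[4pt]
g(x)_{i,j} &= \bigl(x_{i,j} - x_{i,1}\bigr)
  - \frac{1}{M}\sum_{k=1}^{M}\bigl(x_{k,j} - x_{k,1}\bigr), \\[4pt]
g(t)_{i,j} &= \frac{\lVert r_i - s_j \rVert - \lVert r_i - s_1 \rVert}{c}
  - \frac{1}{M}\sum_{k=1}^{M}
    \frac{\lVert r_k - s_j \rVert - \lVert r_k - s_1 \rVert}{c}, \\[4pt]
T_{i,j} - T_{i,1} &= \bigl(t_{i,j} - t_{i,1}\bigr) - \bigl(t_{1,j} - t_{1,1}\bigr)
\;\Longrightarrow\;
g(T)_{i,j} = g(t)_{i,j}.
\end{align*}
```

In this sketch, subtracting the first-source column removes $\delta_i$, and subtracting the mean over microphones removes every term that depends only on $j$, including $\tau_j - \tau_1$ and the reference-microphone term $t_{1,j} - t_{1,1}$. Both transformed quantities therefore depend on the geometry alone and coincide.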
III-C Property for Mapping Function
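Since the location of the first source is fixed (see the text below Eq. (1)), and with shorthand notation for the resulting terms, the proposed mapping function can be summarized, based on Eq. (7) and the transformations derived in Subsections III-B1 and III-B2, in closed form as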
(15) |
Finally, from the closed form of the proposed mapping function in Eq. (15), we can see an interesting property of this mapping function: its mean value with respect to the microphone index is 0, which can be summarized as
(16) |
Eq. (16) indicates that, for any source, the mean value of the transformation of both the TOA and TDOA formulas over all microphones is 0.
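A small numerical check can illustrate both the equality of the two transformations and the zero-mean property. The function `mapping` below is a hypothetical stand-in for the proposed mapping function: it subtracts the first-source column and then removes the mean over microphones, which is one form consistent with the description in this section. The TOA sign convention, the reference microphone, the room size, and the speed of sound are likewise assumptions.

```python
import numpy as np

def mapping(meas):
    """Hypothetical mapping: subtract the first-source column, then remove the
    per-source mean over microphones (rows = microphones, columns = sources)."""
    diff = meas - meas[:, :1]                    # cancels microphone-only offsets
    return diff - diff.mean(axis=0, keepdims=True)  # cancels source-only offsets

rng = np.random.default_rng(1)
M, N, c = 12, 65, 343.0                          # sizes borrowed from the real-life setup; c assumed
mics = rng.uniform(0.0, 10.0, size=(M, 3))
srcs = rng.uniform(0.0, 10.0, size=(N, 3))
start = rng.uniform(-1.0, 1.0, size=M)           # unknown recording start times
emit = rng.uniform(-1.0, 1.0, size=N)            # unknown emission times

dist = np.linalg.norm(mics[:, None, :] - srcs[None, :, :], axis=2)
toa = dist / c + emit[None, :] - start[:, None]  # assumed TOA convention
tdoa = toa - toa[:1, :]                          # TDOA against the first microphone

f_toa = mapping(toa)
f_tdoa = mapping(tdoa)

max_diff = np.abs(f_toa - f_tdoa).max()          # difference between the two transformations
max_mean = np.abs(f_toa.mean(axis=0)).max()      # mean over microphones, per source

print("max |f(TOA) - f(TDOA)|:", max_diff)
print("max |mean over microphones|:", max_mean)
```

Both printed values should sit at floating-point rounding level, and the transformed matrix depends only on the geometry: applying `mapping` to `dist / c` gives the same result.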
IV Experimental Validations
In this section, experimental results are shown to validate the proposed mapping function. The experimental setups are illustrated in subsection A first, then the evaluation metric is defined and the validations of both the proposed mapping function and the property of the proposed mapping function are shown in subsection B.
IV-A Setups
IV-A1 Simulation data
All the simulation data is randomly generated by MATLAB with uniform distribution, both the start time of microphones and emission time of sources are in the range of , the locations of microphone and source are distributed in the room with size of [8] and the speed of sound is set to be . In addition, both the number of microphones and the number of sources are set to , and the number of configurations is set to be . Besides, since the number of both microphones and sources is and the number of configurations is , there are data points for simulated data.
IV-A2 Real-Life data
The real data [21] was collected in an office of size of , where most of the furniture had been removed. There are 12 fixed microphones, and a chirp was played by a loudspeaker from 65 positions. The TOA matrix of this real-life data can be downloaded from GitHub (https://github.com/swing-research/xtdoa/tree/master/matlab) [8, 21], and the TDOA matrix is calculated by Eq. (2). For more details of this real-life data, readers can refer to references [8, 21]. Also, both the start time of microphones and the emission time of sources are in the range of . In addition, the number of data points for real-life data is 780 since there are 12 microphones and 65 sources.
IV-B Evaluations and Results
We first show the values of the proposed mapping function for the transformations of both TOA and TDOA measurements on both the simulation data and the real-life dataset; then the zero-mean property of the proposed mapping function in Eq. (16) is validated. Finally, the statement for the proposed mapping function in Eq. (7) is evaluated by measuring the difference between the transformations of the TOA and TDOA formulas
(17) |
where the indices run over all microphones and all sources. As can be seen from Eq. (17), when this difference is equal to zero, the transformations of the TOA and TDOA formulas take the same values, which validates the proposed mapping function.
Fig. 1 shows the experimental results with both simulated data and real data. From Fig. 1(a), it can be observed that the values of the mapping function for both TOA and TDOA measurements in simulated data are in the range of while the corresponding values in real data are in the range of ; this is because rooms of different sizes are used for the simulation data and the real-life data. In addition, from Fig. 1(b), we can see that the mean values of the mapping function in both simulation and real data always have a magnitude of , and it should be noted that these errors are introduced by the numerical precision of MATLAB. Therefore, the property of the proposed mapping function is validated. Besides, from Fig. 1(c), we can see that the difference in Eq. (17) also has a magnitude of due to the numerical precision of MATLAB, so the equality of the two transformations is validated. This implies that the transformations of the TOA formula in Eq. (1) and the TDOA formula in Eq. (2) are identical to one another, so the statement for the proposed mapping function in Eq. (7) is validated. TOA measurements are obtained with both the signals received by the microphones and the source signals, while TDOA measurements are obtained with microphone signals only. Since Fig. 1(c) shows that the transformations of TOA and TDOA measurements are the same, our mapping function shows that microphone signals alone are sufficient for self-positioning, negating the need for source signals.


V Conclusion
This letter investigated whether microphone signals alone are sufficient for self-positioning, a question that has not previously been investigated in the literature. For the case where both the emission times of the source signals and the recording start times of the microphones are unknown, we presented a novel mapping function that transforms both the TOA and TDOA formulas, and demonstrated that the two transformations are identical to one another. This shows that microphone signals alone are sufficient for self-positioning, making self-positioning tasks more flexible and adaptable. Therefore, the proposed mapping function can be regarded as a timely advancement for self-positioning.
For future work, building on existing TOA-based and TDOA-based methods, it would be interesting to apply this mapping function to estimate the unknown emission times and start times as well as the locations of microphones and sources. It might also be interesting to apply this mapping function to other applications, such as noise reduction and source signal enhancement and separation.
References
- [1] X. Dang, Q. Cheng, and H. Zhu, “Indoor multiple sound source localization via multi-dimensional assignment data association,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 12, pp. 1944–1956, 2019.
- [2] Q. Zhang, Z. Chen, and F. Yin, “Distributed marginalized auxiliary particle filter for speaker tracking in distributed microphone networks,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 11, pp. 1921–1934, 2016.
- [3] T. K. Le and N. Ono, “Closed-form and near closed-form solutions for TOA-based joint source and sensor localization,” IEEE Trans. Signal Process., vol. 64, no. 18, pp. 4751–4766, 2016.
- [4] D. Hu, Z. Chen, and F. Yin, “Geometry calibration for acoustic transceiver networks based on network Newton distributed optimization,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 29, pp. 1023–1032, 2021.
- [5] X. Wang and D. Hu, “Distributed self-localization for acoustic transceiver networks,” IEEE Signal Process. Lett., 2023.
- [6] R. Biswas and S. Thrun, “A passive approach to microphone network localization,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., pp. 1544–1549, 2004.
- [7] V. C. Raykar, I. V. Kozintsev, and R. Lienhart, “Position calibration of microphones and loudsources in distributed computing platforms,” IEEE Trans. Speech, Audio Process., vol. 13, no. 1, pp. 70–83, 2004.
- [8] D. E. Badawy, V. Larsson, M. Pollefeys, and I. Dokmanic, “Localizing unsynchronized sensors with unknown sources,” IEEE Trans. Signal Process., vol. 71, pp. 641–654, 2023.
- [9] P. H. Schönemann, “On metric multidimensional unfolding,” Psychometrika, vol. 35, no. 3, pp. 349–366, 1970.
- [10] N. D. Gaubitch, W. B. Kleijn, and R. Heusdens, “Calibration of distributed sound acquisition systems using TOA measurements from a moving acoustic source,” in Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., pp. 7455–7459, 2014.
- [11] N. D. Gaubitch, W. B. Kleijn, and R. Heusdens, “Auto-localization in ad-hoc microphone arrays,” in Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., pp. 106–110, 2013.
- [12] J. Zhang, R. C. Hendriks, and R. Heusdens, “Structured total least squares based internal delay estimation for distributed microphone auto-localization,” in Proc. Int. Workshop Acoustic Signal Enhancement, pp. 1–5, 2016.
- [13] R. Heusdens and N. Gaubitch, “Time-delay estimation for TOA-based localization of multiple microphones,” in Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., pp. 609–613, 2014.
- [14] N. Ono, H. Kohno, N. Ito, and S. Sagayama, “Blind alignment of asynchronously recorded signals for distributed microphone array,” in Proc. WASPAA, pp. 161–164, 2009.
- [15] F. Jiang and Y. Kuang, “Time delay estimation for TDOA self-calibration using truncated nuclear norm regularization,” in Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., pp. 3885–3889, 2013.
- [16] Y. Kuang and K. Åström, “Stratified microphone network self-calibration from TDOA measurements,” in Proc. EUSIPCO, pp. 1–5, 2013.
- [17] S. Woźniak and K. Kowalczyk, “Passive joint localization and synchronization of distributed microphone arrays,” IEEE Signal Process. Lett., vol. 26, no. 2, pp. 292–296, 2018.
- [18] D. Hu, Z. Chen, and F. Yin, “Passive geometry calibration for microphone arrays based on distributed damped Newton optimization,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 29, pp. 118–131, 2020.
- [19] M. Pollefeys and D. Nister, “Direct computation of sound and microphone locations from time-difference-of-arrival data,” in Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., pp. 2445–2448, 2008.
- [20] M. S. Brandstein and H. F. Silverman, “A robust method for speech signal time-delay estimation in reverberant rooms,” in Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., vol. 1, pp. 375–378, 1997.
- [21] K. Batstone, G. Flood, T. Beleyur, V. Larsson, H. R. Goerlitz, M. Oskarsson, and K. Åström, “Robust self-calibration of constant offset time-difference-of-arrival,” in Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., pp. 4410–4414, 2019.