Abstract
In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have been proposed in the last decades. In this paper, we propose several extensions to a recently presented joint direction of arrival (DOA) and pitch estimation method, increasing its robustness in multi-speaker scenarios, noise, and reverberation. First, a spectral comb filter is added to the original algorithm to better cope with concurrent speakers. Second, the well-known generalized cross-correlation with phase transform (GCC-PHAT) is used as an additional weighting function to improve the DOA estimation accuracy in terms of correct hits. Third, using multiple microphone pairs, the multi-channel cross-correlation approach is incorporated to improve the robustness against noise and reverberation. To improve tracking for moving and even intersecting speakers, a particle filter is used. Experiments with real-world recordings in realistic acoustic conditions show that the proposed extensions increase the DOA hit rate by about 33% compared to the original algorithm for two step-wise moving sources at a signal-to-noise ratio (SNR) of 15 dB and a reverberation time RT60 of 560 ms.
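For readers unfamiliar with GCC-PHAT, the following is a minimal sketch of the textbook technique for a single microphone pair. It is not the joint DOA and pitch estimator described in the abstract, and it does not show how the authors use GCC-PHAT as a weighting function. The function names, the far-field assumption, the microphone spacing, and the speed of sound of 343 m/s are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT."""
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)            # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

def tdoa_to_doa(tau, mic_distance, c=343.0):
    """Convert a time delay to a far-field DOA in degrees (assumed geometry)."""
    sin_theta = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Hypothetical usage: white noise delayed by 5 samples on the second channel.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = np.concatenate((np.zeros(5), x[:-5]))
tau = gcc_phat(x, y, fs)
doa = tdoa_to_doa(tau, mic_distance=0.2)
```

The PHAT weighting whitens the cross-power spectrum so that the correlation peak depends only on phase, which is the usual reason given for its robustness in reverberant rooms.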
Cite
Gerlach, S., Bitzer, J., Goetze, S., & Doclo, S. (2014). Joint estimation of pitch and direction of arrival: Improving robustness and accuracy for multi-speaker scenarios. Eurasip Journal on Audio, Speech, and Music Processing, 2014(1), 1–17. https://doi.org/10.1186/s13636-014-0031-8