Deep learning-based direction-of-arrival estimation for multiple speech sources using a small scale array

  • Zhang M
  • Pan X
  • Shen Y
  • et al.

Abstract

A high-resolution direction-of-arrival (DOA) estimation approach based on deep neural networks (DNNs) is presented for localizing multiple speech sources with a small-scale array. First, three invariant features are extracted from the time-frequency spectrum of the input signal: generalized cross-correlation (GCC) coefficients, GCC coefficients in mel-scaled subbands, and the combination of GCC coefficients with the logarithmic mel spectrogram. Then, the DNN labels are designed to follow a Gaussian distribution, which resembles the spatial spectrum of multiple signal classification (MUSIC). Finally, DOAs are predicted by performing peak detection on the DNN outputs, where the maximum values correspond to the speech signals of interest. In numerical simulations, the DNN-based DOA estimation method outperforms existing high-resolution beamforming techniques. Implemented with a four-element microphone array, the proposed framework can effectively localize multiple speech sources in an indoor environment.
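The label design and peak-detection steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-degree angular grid, the Gaussian width `sigma_deg`, the use of a pointwise maximum to merge overlapping sources, and the `min_height` threshold are all assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def gaussian_labels(doas_deg, n_classes=360, sigma_deg=5.0):
    """Build a label vector with a Gaussian bump centred on each source DOA.

    Assumptions (not from the paper): a uniform grid over 360 degrees,
    a fixed width sigma_deg, and a pointwise maximum across sources.
    """
    step = 360.0 / n_classes
    labels = [0.0] * n_classes
    for k in range(n_classes):
        angle = k * step
        for doa in doas_deg:
            # Wrap the angular difference into [-180, 180) degrees.
            diff = (angle - doa + 180.0) % 360.0 - 180.0
            value = math.exp(-diff ** 2 / (2.0 * sigma_deg ** 2))
            labels[k] = max(labels[k], value)
    return labels


def pick_peaks(spectrum, n_sources, min_height=0.5):
    """Return the grid indices of the n_sources highest local maxima.

    The spectrum is treated as circular, so index 0 neighbours the last index.
    """
    n = len(spectrum)
    peaks = []
    for k in range(n):
        left = spectrum[(k - 1) % n]
        right = spectrum[(k + 1) % n]
        if spectrum[k] >= min_height and spectrum[k] > left and spectrum[k] >= right:
            peaks.append(k)
    # Keep the strongest candidates, then report them in angular order.
    peaks.sort(key=lambda k: spectrum[k], reverse=True)
    return sorted(peaks[:n_sources])


# Two hypothetical sources at 30 and 200 degrees.
labels = gaussian_labels([30.0, 200.0])
estimated = pick_peaks(labels, n_sources=2)
```

In practice `pick_peaks` would be applied to the network output rather than to the labels themselves; running it on the labels here only checks that the two steps are consistent with each other.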

Cite

Zhang, M., Pan, X., Shen, Y., & Qiu, J. (2021). Deep learning-based direction-of-arrival estimation for multiple speech sources using a small scale array. The Journal of the Acoustical Society of America, 149(6), 3841–3850. https://doi.org/10.1121/10.0005127
