Deep learning-based direction-of-arrival estimation for multiple speech sources using a small scale array

Min Zhang; Xiang Pan; Yining Shen; Jianjun Qiu

Journal Article (Open Access)

The Journal of the Acoustical Society of America (2021) 149(6) 3841-3850

DOI: 10.1121/10.0005127

Abstract

A high-resolution direction-of-arrival (DOA) estimation approach based on deep neural networks (DNNs) is presented for localizing multiple speech sources with a small-scale array. First, three invariant features are extracted from the time-frequency spectrum of the input signal: generalized cross correlation (GCC) coefficients, GCC coefficients in mel-scaled subbands, and a combination of GCC coefficients and the logarithmic mel spectrogram. Then the DNN labels are designed to fit a Gaussian distribution, which resembles the spatial spectrum produced by multiple signal classification (MUSIC). Finally, DOAs are predicted by performing peak detection on the DNN outputs, where the maxima correspond to the speech signals of interest. In numerical simulations, the DNN-based DOA estimation method outperforms existing high-resolution beamforming techniques. Implemented with a four-element microphone array, the proposed framework can effectively localize multiple speech sources in an indoor environment.
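The three stages named in the abstract (GCC features, Gaussian-shaped labels, peak detection on the network output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The PHAT weighting, the 1-degree grid over 360 degrees, the label width `sigma_deg`, the threshold `min_height`, the assumption that the number of sources is known, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC between two channels for lags -max_lag..max_lag (in samples).

    PHAT weighting is an assumption; the abstract only says "GCC coefficients".
    """
    n = len(x) + len(y)                    # zero-pad so the correlation is linear
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12         # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index max_lag corresponds to zero lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def gaussian_label(doas_deg, grid_deg, sigma_deg=5.0):
    """Target vector with one Gaussian bump per source on an angular grid.

    Assumes a full 360-degree grid, so angular distance wraps around.
    """
    label = np.zeros_like(grid_deg, dtype=float)
    for doa in doas_deg:
        diff = np.abs(grid_deg - doa)
        diff = np.minimum(diff, 360.0 - diff)            # wrap-around distance
        bump = np.exp(-0.5 * (diff / sigma_deg) ** 2)
        label = np.maximum(label, bump)                  # overlapping bumps keep the max
    return label


def pick_peaks(spectrum, num_sources, min_height=0.3):
    """Indices of the num_sources largest local maxima above min_height.

    Assumes the number of sources is known and the grid is circular.
    """
    prev = np.roll(spectrum, 1)
    nxt = np.roll(spectrum, -1)
    is_peak = (spectrum > prev) & (spectrum >= nxt) & (spectrum >= min_height)
    idx = np.flatnonzero(is_peak)
    strongest = np.argsort(spectrum[idx])[::-1][:num_sources]
    return np.sort(idx[strongest])
```

In this sketch, `gaussian_label([40, 200], np.arange(360))` builds a two-source training target, and applying `pick_peaks` to a network output of the same shape returns the grid indices of the estimated DOAs. The GCC vectors from each microphone pair would be stacked to form the network input; the mel-scaled variants mentioned in the abstract are not shown.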

Citation (APA)

Zhang, M., Pan, X., Shen, Y., & Qiu, J. (2021). Deep learning-based direction-of-arrival estimation for multiple speech sources using a small scale array. The Journal of the Acoustical Society of America, 149(6), 3841–3850. https://doi.org/10.1121/10.0005127
