Improved MVDR beamforming using single-channel mask prediction networks

Hakan Erdogan; John Hershey; Shinji Watanabe; Michael Mandel; Jonathan Le Roux

Conference Proceedings

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, 8-12 September 2016, pp. 1981-1985

DOI: 10.21437/Interspeech.2016-552

Abstract

Recent studies on multi-microphone speech databases indicate that beamforming improves speech recognition accuracy, especially when there is a high level of background noise. Minimum variance distortionless response (MVDR) beamforming is an important beamforming method that performs well for speech recognition, especially when the steering vector is known. However, steering the beamformer toward speech in unknown acoustic conditions remains a challenging problem. In this study, we use deep networks trained for single-channel speech enhancement to predict masks that can be used for noise spatial covariance estimation, which steers the MVDR beamformer toward the speech. We analyze how mask prediction affects performance and discuss various ways to use masks to obtain reliable estimates of the speech and noise spatial covariances. We show that using a single mask across microphones for covariance estimation, combined with minima-limited post-masking, yields the best results in terms of signal-level quality measures and speech recognition word error rates in a mismatched training condition.
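
The mask-based pipeline outlined in the abstract (mask-weighted spatial covariance estimation, MVDR weights, beamforming, post-masking) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The MVDR form used here is the reference-channel formulation of Souden et al., which avoids an explicit steering vector and may differ from the paper's exact choice. Pooling per-channel masks with a median, taking the noise mask as one minus the speech mask, and reading "minima-limited post-masking" as flooring the mask at a hypothetical value of 0.1 are all assumptions. Every function name is hypothetical.

```python
import numpy as np


def combine_masks(channel_masks):
    """Collapse per-channel masks (C, T, F) into one shared mask (T, F).

    Assumption: the median is one plausible way to pool masks across mics.
    """
    return np.median(channel_masks, axis=0)


def masked_covariance(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance for every frequency bin.

    stft: complex STFT, shape (C, T, F); mask: real, shape (T, F).
    Returns shape (F, C, C): sum_t m(t,f) y(t,f) y(t,f)^H / sum_t m(t,f).
    """
    weighted = np.einsum("tf,ctf,dtf->fcd", mask, stft, stft.conj())
    return weighted / (mask.sum(axis=0)[:, None, None] + eps)


def mvdr_weights(phi_s, phi_n, ref=0, load=1e-6, eps=1e-10):
    """Souden-style MVDR (assumed form): w(f) = Phi_N^-1 Phi_S u / tr(Phi_N^-1 Phi_S).

    u selects the reference microphone `ref`. Returns shape (F, C).
    """
    num_f, num_c, _ = phi_s.shape
    weights = np.zeros((num_f, num_c), dtype=complex)
    eye = np.eye(num_c)
    for f in range(num_f):
        # Diagonal loading keeps the noise covariance well conditioned.
        phi_n_f = phi_n[f] + load * np.trace(phi_n[f]).real / num_c * eye
        ratio = np.linalg.solve(phi_n_f, phi_s[f])  # Phi_N^-1 Phi_S
        weights[f] = ratio[:, ref] / (np.trace(ratio) + eps)
    return weights


def beamform(stft, weights):
    """Apply w(f)^H y(t,f); returns a single-channel STFT of shape (T, F)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)


def post_mask(enhanced, mask, floor=0.1):
    """Assumed reading of minima-limited post-masking: floor the mask, then apply it."""
    return enhanced * np.maximum(mask, floor)
```

In use, a shared speech mask `m = combine_masks(channel_masks)` gives `phi_s = masked_covariance(stft, m)` and `phi_n = masked_covariance(stft, 1.0 - m)`; the output is then `post_mask(beamform(stft, mvdr_weights(phi_s, phi_n)), m)`. Because a single mask weights all channels, every microphone contributes to the covariance estimates with the same time-frequency weighting.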

Citation (APA)

Erdogan, H., Hershey, J., Watanabe, S., Mandel, M., & Le Roux, J. (2016). Improved MVDR beamforming using single-channel mask prediction networks. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 08-12-September-2016, pp. 1981–1985). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2016-552