Improved MVDR beamforming using single-channel mask prediction networks

Abstract

Recent studies on multi-microphone speech databases indicate that beamforming can improve speech recognition accuracy, especially when the level of background noise is high. Minimum variance distortionless response (MVDR) beamforming is an important beamforming method that performs well for speech recognition, especially when the steering vector is known. However, steering the beamformer toward speech in unknown acoustic conditions remains a challenging problem. In this study, we use single-channel speech enhancement deep networks to predict masks that can be used for noise spatial covariance estimation, which in turn steers the MVDR beamformer toward the speech. We analyze how mask prediction affects performance and discuss various ways to use masks to obtain reliable speech and noise spatial covariance estimates. We show that using a single mask across microphones for covariance estimation, combined with minima-limited post-masking, yields the best results in terms of signal-level quality measures and speech recognition word error rates in a mismatched training condition.
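The abstract gives no formulas, so the following NumPy sketch only illustrates one common way a time-frequency mask can drive an MVDR beamformer; it is not necessarily the estimator used in the paper. It forms mask-weighted speech and noise spatial covariance matrices, then computes reference-microphone MVDR weights w(f) = Phi_N^{-1} Phi_S u / trace(Phi_N^{-1} Phi_S), a formulation that needs no explicit steering vector. Several choices are assumptions labeled as hypothetical: the function names, median pooling of per-channel masks into the single shared mask, taking the noise mask as one minus the speech mask, the small diagonal load on Phi_N, and reading "minima-limited post-masking" as flooring the post-mask at a value `floor`.

```python
import numpy as np


def combine_masks(channel_masks):
    """Pool per-channel masks, shape (M, F, T), into one (F, T) mask.

    Hypothetical choice: median pooling across microphones.
    """
    return np.median(channel_masks, axis=0)


def masked_spatial_covariance(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance for every frequency bin.

    stft: complex STFT, shape (M, F, T) for M microphones.
    mask: real weights in [0, 1], shape (F, T), shared by all microphones.
    Returns a complex array of shape (F, M, M).
    """
    weighted = stft * mask[np.newaxis, :, :]
    # cov[f, m, n] = sum_t mask[f, t] * x_m(f, t) * conj(x_n(f, t))
    cov = np.einsum("mft,nft->fmn", weighted, stft.conj())
    # Normalize by the total mask weight in each frequency bin.
    return cov / (mask.sum(axis=1)[:, np.newaxis, np.newaxis] + eps)


def mvdr_weights(phi_s, phi_n, ref_mic=0, load=1e-6):
    """Reference-microphone MVDR weights, shape (F, M).

    w(f) = Phi_N^{-1} Phi_S u / trace(Phi_N^{-1} Phi_S), u one-hot at ref_mic.
    The small diagonal load (an assumption) keeps Phi_N invertible.
    """
    num_mics = phi_n.shape[-1]
    phi_n = phi_n + load * np.eye(num_mics)
    # Solve Phi_N X = Phi_S per frequency instead of inverting Phi_N explicitly.
    num = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(num, axis1=1, axis2=2)
    return num[:, :, ref_mic] / trace[:, np.newaxis]


def apply_beamformer(weights, stft):
    """Beamformer output y(f, t) = w(f)^H x(f, t), shape (F, T)."""
    return np.einsum("fm,mft->ft", weights.conj(), stft)


def minima_limited_post_mask(output, mask, floor=0.1):
    """Post-mask the beamformer output, never attenuating below `floor`.

    Hypothetical reading of "minima-limited": the mask is floored at `floor`.
    """
    return output * np.maximum(mask, floor)


def mask_based_mvdr(stft, channel_speech_masks, ref_mic=0, floor=0.1):
    """End-to-end sketch: single shared mask -> covariances -> MVDR -> post-mask."""
    speech_mask = combine_masks(channel_speech_masks)
    noise_mask = 1.0 - speech_mask  # assumes masks lie in [0, 1]
    phi_s = masked_spatial_covariance(stft, speech_mask)
    phi_n = masked_spatial_covariance(stft, noise_mask)
    weights = mvdr_weights(phi_s, phi_n, ref_mic)
    enhanced = apply_beamformer(weights, stft)
    return minima_limited_post_mask(enhanced, speech_mask, floor)
```

In use, `stft` would hold the per-channel STFTs (for example computed with `scipy.signal.stft`) and `channel_speech_masks` the outputs of a single-channel enhancement network run on each microphone; both are outside this sketch. Sharing one pooled mask keeps the speech and noise covariance estimates consistent across microphones, which is the property the abstract highlights.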

Citation

Erdogan, H., Hershey, J., Watanabe, S., Mandel, M., & Le Roux, J. (2016). Improved MVDR beamforming using single-channel mask prediction networks. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 08-12-September-2016, pp. 1981–1985). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2016-552
