Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR

Felix Weninger; Hakan Erdogan; Shinji Watanabe; Emmanuel Vincent; Jonathan Le Roux; John R. Hershey; Björn Schuller

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9237 91-99

DOI: 10.1007/978-3-319-22482-4_11

448 Citations

156 Readers

Abstract

We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used ‘naïvely’ as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76% average word error rate, which is, to our knowledge, the best score to date.
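The headline figure in the abstract is a word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; this is not the scoring tool used in the paper, and the abstract does not say how its average over test conditions is formed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, chosen only to illustrate the computation.
    wer = word_error_rate("a b c d", "a x c")
    print(f"WER: {wer:.2%}")
```

In this toy case one word is substituted and one is deleted out of four reference words, so the WER is 50%. A reported value of 13.76% would correspond to roughly one word error per seven reference words.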

Citation (APA)

Weninger, F., Erdogan, H., Watanabe, S., Vincent, E., Le Roux, J., Hershey, J. R., & Schuller, B. (2015). Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9237, pp. 91–99). Springer Verlag. https://doi.org/10.1007/978-3-319-22482-4_11
