Word-level permutation and improved lower frame rate for RNN-based acoustic modeling

2Citations
Citations of this article
4Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Recently, the RNN-based acoustic model has shown promising performance. However, its generalization ability to multiple scenarios is not powerful enough for two reasons. Firstly, it encodes inter-word dependency, which conflicts with the nature that an acoustic model should model the pronunciation of words only. Secondly, the RNN-based acoustic model depicting the inner-word acoustic trajectory frame-by-frame is too precise to tolerate small distortions. In this work, we propose two variants to address aforementioned two problems. One is the word-level permutation, i.e. the order of input features and corresponding labels is shuffled with a proper probability according to word boundaries. It aims to eliminate inter-word dependencies. The other one is the improved LFR (iLFR) model, which equidistantly splits the original sentence into N utterances to overcome the discarding data in LFR model. Results based on LSTM RNN demonstrate 7% relative performance improvement by jointing the word-level permutation and iLFR.

Cite

CITATION STYLE

APA

Zhao, Y., Zhou, S., Xu, S., & Xu, B. (2017). Word-level permutation and improved lower frame rate for RNN-based acoustic modeling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10639 LNCS, pp. 859–869). Springer Verlag. https://doi.org/10.1007/978-3-319-70136-3_91

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free