Improving Russian LVCSR Using Deep Neural Networks for Acoustic and Language Modeling

6Citations
Citations of this article
2Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In the paper, we present our very large vocabulary continuous Russian speech recognition system based on various neural networks. We employed neural networks on both acoustic and language modeling stages. For training hybrid acoustic models, we experimented with several types of neural networks: feedforward deep neural network, time-delay neural network, Long Short-Term Memory, bidirectional Long Short-Term Memory. We created neural networks with various numbers of hidden layers and units in hidden layers. Language modeling was performed using recurrent neural network. At first, experiments on Russian speech recognition were carried out using hybrid acoustic models and 3-gram language model. Then 500-best list was rescored with recurrent neural network language model. The lowest word error rate equal to 15.13% was achieved using time-delay neural network for acoustic modeling and recurrent neural network language model interpolated with 3-gram model for 500-best list rescoring.

Cite

CITATION STYLE

APA

Kipyatkova, I. (2018). Improving Russian LVCSR Using Deep Neural Networks for Acoustic and Language Modeling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11096 LNAI, pp. 291–300). Springer Verlag. https://doi.org/10.1007/978-3-319-99579-3_31

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free