In this paper, we present our very large vocabulary continuous Russian speech recognition system based on several types of neural networks. We employed neural networks at both the acoustic and language modeling stages. For training hybrid acoustic models, we experimented with several types of neural networks: a feedforward deep neural network, a time-delay neural network, Long Short-Term Memory, and bidirectional Long Short-Term Memory. We created neural networks with various numbers of hidden layers and of units per hidden layer. Language modeling was performed using a recurrent neural network. First, experiments on Russian speech recognition were carried out using hybrid acoustic models and a 3-gram language model. Then the 500-best list was rescored with the recurrent neural network language model. The lowest word error rate, 15.13%, was achieved using a time-delay neural network for acoustic modeling and the recurrent neural network language model interpolated with the 3-gram model for 500-best list rescoring.
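The rescoring step described above, in which an N-best list is re-ranked with a recurrent neural network language model interpolated with an n-gram model, can be illustrated with a minimal sketch. It is not the system's actual implementation: the per-word linear interpolation, the interpolation weight `lam`, the language model scale `lm_weight`, and the hypothesis fields (`words`, `am_logp`, `rnn_word_logps`, `ngram_word_logps`) are all assumptions made for illustration.

```python
import math


def interpolate_logprob(logp_rnn, logp_ngram, lam=0.5):
    """Log of lam * p_rnn + (1 - lam) * p_ngram, for 0 < lam < 1.

    Inputs and output are natural-log probabilities. The sum is
    computed with the log-sum-exp trick to avoid underflow.
    """
    a = math.log(lam) + logp_rnn
    b = math.log(1.0 - lam) + logp_ngram
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rescore_nbest(nbest, lam=0.5, lm_weight=10.0):
    """Return the hypothesis with the highest combined score.

    Each hypothesis is a dict (hypothetical layout) with:
      'words'            : list of word strings
      'am_logp'          : acoustic log-likelihood of the hypothesis
      'rnn_word_logps'   : per-word log-probabilities from the RNN LM
      'ngram_word_logps' : per-word log-probabilities from the n-gram LM
    """
    def total_score(hyp):
        # Interpolate the two models word by word, then sum over the sentence.
        lm_logp = sum(
            interpolate_logprob(r, n, lam)
            for r, n in zip(hyp["rnn_word_logps"], hyp["ngram_word_logps"])
        )
        return hyp["am_logp"] + lm_weight * lm_logp

    return max(nbest, key=total_score)


# Toy two-hypothesis list with made-up scores (not data from the paper).
toy_nbest = [
    {
        "words": ["hyp", "one"],
        "am_logp": -100.0,
        "rnn_word_logps": [math.log(0.1), math.log(0.1)],
        "ngram_word_logps": [math.log(0.1), math.log(0.1)],
    },
    {
        "words": ["hyp", "two"],
        "am_logp": -105.0,
        "rnn_word_logps": [math.log(0.5), math.log(0.5)],
        "ngram_word_logps": [math.log(0.3), math.log(0.3)],
    },
]
best = rescore_nbest(toy_nbest, lam=0.5, lm_weight=10.0)
```

In this toy list the second hypothesis has the worse acoustic score but is selected after rescoring, because the interpolated language model assigns it a much higher probability. In practice the interpolation weight and the language model scale would be tuned on held-out data.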
CITATION STYLE
Kipyatkova, I. (2018). Improving Russian LVCSR Using Deep Neural Networks for Acoustic and Language Modeling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11096 LNAI, pp. 291–300). Springer Verlag. https://doi.org/10.1007/978-3-319-99579-3_31