Investigating Joint CTC-Attention Models for End-to-End Russian Speech Recognition

Abstract

We propose an application of attention-based models to automatic recognition of continuous Russian speech. We experimented with three types of attention mechanism, data augmentation based on tempo and pitch perturbations, and a beam search pruning method. Moreover, we propose using the sparsemax function as a probability distribution generator for the attention mechanism. We experimented with joint CTC-Attention encoder-decoders that use deep convolutional networks to compress input features or waveform spectrograms, and we also tested a Highway LSTM model as an encoder. Experiments were performed on a small dataset of Russian speech with a total duration of more than 60 h. The proposed methods improved recognition accuracy, and the beam search optimization method yielded faster speech decoding.
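The abstract proposes replacing the usual softmax in the attention mechanism with the sparsemax function, which projects attention scores onto the probability simplex and can assign exactly zero weight to irrelevant encoder frames. As a rough illustration only (not the authors' implementation), a minimal NumPy sketch of sparsemax is:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of a score vector z onto the
    probability simplex. Unlike softmax, it can produce exact zeros,
    giving a sparse attention distribution over encoder frames."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]               # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum       # indices kept in the support
    k_max = k[support][-1]                    # size of the support set
    tau = (cumsum[k_max - 1] - 1.0) / k_max   # threshold
    return np.maximum(z - tau, 0.0)

# One dominant score suppresses the rest entirely, unlike softmax:
print(sparsemax([2.0, 0.0, -1.0]))   # [1. 0. 0.]
```

In an attention layer, `z` would be the vector of alignment scores between the current decoder state and the encoder outputs; sparsemax then concentrates the attention weights on a few frames instead of spreading small mass everywhere.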

Citation (APA)

Markovnikov, N., & Kipyatkova, I. (2019). Investigating joint CTC-attention models for end-to-end russian speech recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11658 LNAI, pp. 337–347). Springer Verlag. https://doi.org/10.1007/978-3-030-26061-3_35
