Investigating Joint CTC-Attention Models for End-to-End Russian Speech Recognition

Abstract

We propose an application of attention-based models to automatic recognition of continuous Russian speech. We experimented with three types of attention mechanism, data augmentation based on tempo and pitch perturbations, and a beam search pruning method. Moreover, we propose using the sparsemax function as a probability distribution generator for the attention mechanism. We experimented with joint CTC-Attention encoder-decoders that use deep convolutional networks to compress input features or waveform spectrograms, and we also tested a Highway LSTM model as an encoder. Experiments were performed on a small dataset of Russian speech with a total duration of more than 60 h. The proposed methods improved recognition accuracy, and the beam search optimization method yielded faster speech decoding.
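The abstract proposes replacing the usual softmax in the attention mechanism with the sparsemax function, which projects attention scores onto the probability simplex and can assign exactly zero weight to irrelevant encoder frames. As a rough illustration only (not the authors' implementation), a minimal NumPy sketch of sparsemax is:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of a score vector z onto the
    probability simplex. Unlike softmax, it can produce exact zeros,
    giving a sparse attention distribution over encoder frames."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]               # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum       # indices kept in the support
    k_max = k[support][-1]                    # size of the support set
    tau = (cumsum[k_max - 1] - 1.0) / k_max   # threshold
    return np.maximum(z - tau, 0.0)

# One dominant score suppresses the rest entirely, unlike softmax:
print(sparsemax([2.0, 0.0, -1.0]))   # [1. 0. 0.]
```

In an attention layer, `z` would be the vector of alignment scores between the current decoder state and the encoder outputs; sparsemax then concentrates the attention weights on a few frames instead of spreading small mass everywhere.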

Citation (APA)

Markovnikov, N., & Kipyatkova, I. (2019). Investigating joint CTC-attention models for end-to-end russian speech recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11658 LNAI, pp. 337–347). Springer Verlag. https://doi.org/10.1007/978-3-030-26061-3_35
