Towards end-to-end speech recognition with transfer learning

Abstract

We present a transfer learning-based approach to end-to-end speech recognition that operates at two levels. First, a feature extraction method that combines multilingual deep neural network (DNN) training with a matrix factorization algorithm is introduced to extract high-level features. Second, the advantage of connectionist temporal classification (CTC) is transferred to the target attention-based model through a joint CTC-attention model composed of shallow recurrent neural networks (RNNs) built on top of the proposed features. Experimental results show that the proposed transfer learning approach achieves the best performance among all end-to-end methods and, when further jointly decoded with an RNN language model, is comparable to the state-of-the-art speech recognition system on TIMIT.
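The abstract does not state the training objective or the decoding rule. As a rough illustration only, the sketch below assumes the formulation commonly used for joint CTC-attention models: the two losses are interpolated during training, and the two log-probabilities are interpolated during decoding, with an optional language-model term added. The function names, `ctc_weight`, and `lm_weight` are hypothetical and are not taken from the paper.

```python
def joint_ctc_attention_loss(ctc_loss: float, att_loss: float,
                             ctc_weight: float = 0.5) -> float:
    """Interpolate a CTC loss and an attention-decoder loss.

    Assumed form: ctc_weight * L_ctc + (1 - ctc_weight) * L_att.
    The weight is an illustrative hyperparameter, not a value from the paper.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def joint_decoding_score(log_p_ctc: float, log_p_att: float,
                         log_p_lm: float = 0.0,
                         ctc_weight: float = 0.5,
                         lm_weight: float = 0.0) -> float:
    """Score one hypothesis by combining CTC, attention, and LM log-probabilities.

    Assumed form: the same interpolation as in training, plus a weighted
    language-model log-probability (lm_weight = 0 disables the LM).
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    acoustic = ctc_weight * log_p_ctc + (1.0 - ctc_weight) * log_p_att
    return acoustic + lm_weight * log_p_lm
```

With `ctc_weight=0` both functions reduce to a pure attention-based model, and with `ctc_weight=1` to pure CTC, so a single parameter controls how strongly the CTC branch influences the attention model.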

Cite

Qin, C. X., Qu, D., & Zhang, L. H. (2018). Towards end-to-end speech recognition with transfer learning. EURASIP Journal on Audio, Speech, and Music Processing, 2018(1). https://doi.org/10.1186/s13636-018-0141-9
