A hybrid CTC+Attention model based on end-to-end framework for multilingual speech recognition

Sendong Liang; Wei Qi Yan

Journal Article (Open Access)

Multimedia Tools and Applications (2022) 81(28) 41295-41308

DOI: 10.1007/s11042-022-12136-3

Abstract

Speech recognition is an important field in natural language processing. In this paper, we propose an end-to-end framework for speech recognition on multilingual datasets. End-to-end methods are promising because they require neither complicated alignment nor the construction of a pronunciation dictionary. We implement a hybrid model that combines CTC and attention (CTC+Attention) in PyTorch. To compare speech recognition methods across languages, we design and create three datasets: Chinese, English, and Code-Switch. We evaluate the proposed hybrid CTC+Attention model in a multilingual environment. In our experiments, the proposed model outperforms the HMM-DNN model in both single-language and Code-Switch speaking environments. We also compare the speech recognition results across the different languages. On the Chinese dataset, the proposed hybrid CTC+Attention model reaches a character error rate (CER) of 10.22%, outperforming the traditional model.
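The abstract reports its headline result as a character error rate. As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), CER can be taken as the character-level edit distance between the reference transcript and the recognized text, divided by the number of reference characters. The function names below are hypothetical, and ignoring spaces is an assumption that the paper may not share.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    """
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j items of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference item missing from hypothesis
                curr[j - 1] + 1,     # extra item in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters.

    Assumption: spaces are dropped before scoring, so English and
    Chinese transcripts are compared on characters only.
    """
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For example, `character_error_rate("今天天气很好", "今天天汽很好")` scores one substituted character against a six-character reference, giving a CER of 1/6. Because the denominator is the reference length, a hypothesis with many extra characters can produce a CER above 1.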

Cite

Liang, S., & Yan, W. Q. (2022). A hybrid CTC+Attention model based on end-to-end framework for multilingual speech recognition. Multimedia Tools and Applications, 81(28), 41295–41308. https://doi.org/10.1007/s11042-022-12136-3
