Research on Khalkha Dialect Mongolian Speech Recognition Acoustic Model Based on Weight Transfer

1Citations
Citations of this article
3Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Due to the lack of labeled training data, the performance of acoustic models in low-resource speech recognition systems such as Khalkha dialect Mongolian is poor. Transfer Learning can solve the data-sparse problem by learning the source domain (high resource) knowledge to guides the training of the target domain (low resource) model. In this paper, we investigate the modeling method of using different transfer learning ways in the Khalkha dialect Mongolian ASR system. First, the English and Chahar dialect are used as the source domains, and the trained acoustic model on the above source domains are conducted to initialize the Khalkha acoustic model parameter. Furthermore, the different training strategies, the portability of different hidden layers, and the impact of the pre-training model on the transfer model were applied to validate their effectiveness in the Khalkha dialect ASR task. The experimental results show that the optimal acoustic model is chain TDNN based on weight transfer method with Chahar dialect as the source domain. The final WER is 15.67%, which is relatively reduced by 38% compared to the random initialization model.

Cite

CITATION STYLE

APA

Shi, L., Bao, F., Wang, Y., & Gao, G. (2019). Research on Khalkha Dialect Mongolian Speech Recognition Acoustic Model Based on Weight Transfer. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11839 LNAI, pp. 519–528). Springer. https://doi.org/10.1007/978-3-030-32236-6_47

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free