Improving the performance of Uyghur speech recognition based on Factorized Time-Delay Neural Network

This article is free to access.

Abstract

Speech recognition systems achieve low accuracy on Uyghur, a low-resource language, because of its strong language-specific characteristics and the scarcity of public training datasets. To address this problem, and taking the characteristics of Uyghur into account, we build the language model on morpheme units and expand the training data with a mixture of data augmentation methods. We apply a 9-layer TDNN-F (Factorized Time-Delay Neural Network), which can effectively exploit contextual information. In experiments on the open-source dataset THUYG-20, the best system achieves a WER (Word Error Rate) of 9.88%. Compared with the baseline system for this dataset, the WER is reduced by 6.7%, a significant improvement in Uyghur speech recognition accuracy that also provides a reference for speech recognition in other low-resource languages.
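
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); tokens are split on whitespace,
    which is an assumption and may differ from the paper's scoring setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words and returns 0.5. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.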

Cite

Zhang, L., Liu, L., Huang, G., Zhu, H., Zha, H., Qiu, Z., … Lu, W. (2022). Improving the performance of Uyghur speech recognition based on Factorized Time-Delay Neural Network. In Journal of Physics: Conference Series (Vol. 2400). Institute of Physics. https://doi.org/10.1088/1742-6596/2400/1/012059
