Improving the performance of Uyghur speech recognition based on Factorized Time-Delay Neural Network

Luzhuo Zhang; Lanjiao Liu; Gaoce Huang; Haipeng Zhu; Haipeng Zha; Zicheng Qiu; Xia Zhang; Wenqiang Lu

Conference Proceedings, Open Access

Journal of Physics: Conference Series (2022) 2400(1)

DOI: 10.1088/1742-6596/2400/1/012059

Abstract

Speech recognition systems achieve low accuracy on Uyghur, a low-resource language, because of its strong language specificity and the scarcity of public training datasets. To address this problem, and taking the characteristics of Uyghur into account, we use morpheme units to build the language model and use mixed data augmentation methods to expand the training data. A 9-layer Factorized Time-Delay Neural Network (TDNN-F), which can effectively utilize contextual information, is applied. The best result, a 9.88% WER (Word Error Rate), is achieved in experiments on the open-source dataset THUYG-20. Compared to the baseline system of this dataset, the WER is reduced by 6.7%, which significantly improves the accuracy of Uyghur speech recognition and provides a reference for speech recognition in other low-resource languages.
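
For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance between a reference transcript and the recognizer output (substitutions, deletions, and insertions), divided by the number of reference words. The Python sketch below illustrates that generic definition only. It is not the scoring pipeline used in the paper, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Generic WER: word-level Levenshtein distance / number of reference words.

    Assumption: words are separated by whitespace. This is an illustrative
    sketch, not the evaluation code used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Toy example with placeholder tokens: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 9.88% corresponds to roughly ten word errors per hundred reference words.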

Cite

Zhang, L., Liu, L., Huang, G., Zhu, H., Zha, H., Qiu, Z., … Lu, W. (2022). Improving the performance of Uyghur speech recognition based on Factorized Time-Delay Neural Network. In Journal of Physics: Conference Series (Vol. 2400). Institute of Physics. https://doi.org/10.1088/1742-6596/2400/1/012059
