Abstract
Speech recognition systems achieve low accuracy on Uyghur, a low-resource language, because of its strong language-specific characteristics and the scarcity of public training datasets. To address this problem, and taking the characteristics of Uyghur into account, we build the language model on morpheme units and expand the training data with a mixture of data augmentation methods. We apply a 9-layer TDNN-F (Factorized Time-Delay Neural Network), which can effectively exploit contextual information. In experiments on the open-source dataset THUYVG-20, the best system achieves a WER (Word Error Rate) of 9.88%. Compared to the baseline system for this dataset, the WER is reduced by 6.7%, which significantly improves the accuracy of Uyghur speech recognition and provides a reference for speech recognition in other low-resource languages.
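The WER reported above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used by the authors; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokens are obtained by whitespace splitting, which
    is an assumption and not necessarily how the paper scores its output.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, a WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.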
Cite
Zhang, L., Liu, L., Huang, G., Zhu, H., Zha, H., Qiu, Z., … Lu, W. (2022). Improving the performance of Uyghur speech recognition based on Factorized Time-Delay Neural Network. In Journal of Physics: Conference Series (Vol. 2400). Institute of Physics. https://doi.org/10.1088/1742-6596/2400/1/012059