Abstract
Automatic speech recognition systems usually require a large annotated speech corpus for training, and manually annotating such a corpus is very expensive. It can therefore be helpful to combine supervised learning with unsupervised and semi-supervised methods. In this work, we focus on a semi-supervised training approach for Bangla speech recognition that can exploit large amounts of unpaired audio and text data. We encode speech and text data in a shared intermediate domain and propose a novel loss function, based on the global distance between the encoded representations, to guide the semi-supervised training. Our proposed method reduces the Word Error Rate (WER) of the system from 37% to 31.9%.
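The abstract does not specify how the global encoding distance is computed. As a minimal illustrative sketch only, assuming each modality's encoder output is mean-pooled over time into a single global vector and the loss is the Euclidean distance between the two pooled vectors, the idea could look like this (function names and pooling choice are assumptions, not the paper's actual implementation):

```python
import numpy as np

def global_encoding_distance(speech_enc: np.ndarray, text_enc: np.ndarray) -> float:
    """Distance between global (mean-pooled) speech and text encodings.

    speech_enc: array of shape (T_speech, D), frame-level speech encodings
    text_enc:   array of shape (T_text, D), token-level text encodings
    Both sequences may differ in length; pooling maps each to one D-dim vector.
    """
    # Mean-pool over the time/sequence axis to obtain one global vector
    # per modality (an assumed pooling choice for this sketch).
    speech_global = speech_enc.mean(axis=0)
    text_global = text_enc.mean(axis=0)
    # Euclidean distance between the two global encodings serves as the
    # semi-supervised alignment loss in this sketch.
    return float(np.linalg.norm(speech_global - text_global))
```

During semi-supervised training, such a term would be minimized alongside the supervised loss so that unpaired speech and text are pushed toward a common intermediate representation.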
Citation
Sadeq, N., Chowdhury, N. T., Utshaw, F. T., Ahmed, S., & Adnan, M. A. (2020). Improving end-to-end Bangla speech recognition with semi-supervised training. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 1875–1883). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.findings-emnlp.169