Improving End-to-End Bangla Speech Recognition with Semi-Supervised Training

2 citations · 64 Mendeley readers

Abstract

Automatic speech recognition systems usually require a large annotated speech corpus for training, and manually annotating such a corpus is very difficult. Unsupervised and semi-supervised learning methods can therefore be valuable complements to supervised learning. In this work, we focus on a semi-supervised training approach for Bangla speech recognition that exploits large amounts of unpaired audio and text data. We encode speech and text data in a shared intermediate domain and propose a novel loss function, based on the global encoding distance between the encoded data, to guide the semi-supervised training. Our proposed method reduces the Word Error Rate (WER) of the system from 37% to 31.9%.
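The abstract does not give the exact form of the global encoding distance loss, but the idea of comparing pooled speech and text encodings in a shared space can be sketched as follows. This is a hypothetical illustration, assuming mean pooling and a squared Euclidean distance; the function name and toy data are invented for the example and may differ from the paper's actual formulation.

```python
import numpy as np

def global_encoding_distance(speech_enc, text_enc):
    """Squared distance between the global (mean-pooled) encodings of
    speech frames and text tokens in a shared intermediate space.
    Hypothetical sketch: the paper's actual loss may pool or compare
    the encodings differently."""
    speech_global = speech_enc.mean(axis=0)  # average over time frames
    text_global = text_enc.mean(axis=0)      # average over tokens
    return float(np.sum((speech_global - text_global) ** 2))

# Toy encodings in a shared 4-dimensional intermediate space:
# two speech frames and one text token (invented values).
speech = np.array([[0.2, 0.1, 0.0, 0.3],
                   [0.4, 0.1, 0.2, 0.1]])
text = np.array([[0.3, 0.1, 0.1, 0.2]])
loss = global_encoding_distance(speech, text)
```

Minimizing a loss of this shape would pull the pooled speech and text representations together, which is one plausible way unpaired audio and text could guide training.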

Citation (APA)

Sadeq, N., Chowdhury, N. T., Utshaw, F. T., Ahmed, S., & Adnan, M. A. (2020). Improving end-to-end Bangla speech recognition with semi-supervised training. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 1875–1883). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.findings-emnlp.169
