Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

Martijn Bartelds; Nay San; Bradley McDonnell; Dan Jurafsky; Martijn Wieling

Conference Proceedings (Open Access)

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2023), Vol. 1, pp. 715–729

DOI: 10.18653/v1/2023.acl-long.42

Abstract

The performance of automatic speech recognition (ASR) systems has advanced substantially in recent years, particularly for languages for which a large amount of transcribed speech is available. Unfortunately, for low-resource languages, such as minority languages, regional languages or dialects, ASR performance generally remains much lower. In this study, we investigate whether data augmentation techniques could help improve low-resource ASR performance, focusing on four typologically diverse minority languages or language variants (West Germanic: Gronings, West-Frisian; Malayo-Polynesian: Besemah, Nasal). For all four languages, we examine the use of self-training, where an ASR system trained with the available human-transcribed data is used to generate transcriptions, which are then combined with the original data to train a new ASR system. For Gronings, for which a preexisting text-to-speech (TTS) system was available, we also examine the use of TTS to generate ASR training data from text-only sources. We find that using a self-training approach consistently yields improved performance (a relative word error rate (WER) reduction of up to 20.5% compared to using an ASR system trained on 24 minutes of manually transcribed speech). The performance gain from TTS augmentation for Gronings was even stronger (a relative WER reduction of up to 25.5% compared to a system based on 24 minutes of manually transcribed speech). In sum, our results show the benefit of using self-training or (if possible) TTS-generated data as an efficient solution to overcome the limitations of data availability for resource-scarce languages in order to improve ASR performance.
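The self-training step and the relative WER figures described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`word_error_rate`, `relative_wer_reduction`, `build_self_training_set`) and the `transcribe` callable are hypothetical, the WER here is a plain word-level edit distance, and the numbers in the usage note are illustrative rather than taken from the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (audio identifier, transcript).
Utterance = Tuple[str, str]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming Levenshtein distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def build_self_training_set(
    labeled: List[Utterance],
    unlabeled_audio_ids: List[str],
    transcribe: Callable[[str], str],
) -> List[Utterance]:
    """Combine human-transcribed data with machine-generated transcriptions.

    `transcribe` stands in for an ASR system trained on `labeled`
    (an assumption of this sketch); the returned set would then be
    used to train a new ASR system.
    """
    pseudo_labeled = [(audio_id, transcribe(audio_id))
                      for audio_id in unlabeled_audio_ids]
    return list(labeled) + pseudo_labeled
```

As an illustration with made-up values, a baseline WER of 0.40 that drops to 0.30 after retraining on the combined set corresponds to `relative_wer_reduction(0.40, 0.30)`, a relative reduction of 25%.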

Cite

Bartelds, M., San, N., McDonnell, B., Jurafsky, D., & Wieling, M. (2023). Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 715–729). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.42
