Exploring Native and Non-Native English Child Speech Recognition With Whisper

This article is free to access.

Abstract

Modern end-to-end Automatic Speech Recognition (ASR) systems struggle to recognise children's speech. This challenge is due to the high acoustic variability in children's voices and the scarcity of child speech training data, particularly for accented or low-resource languages. This study focuses on improving the performance of ASR on native and non-native English child speech using publicly available datasets. We evaluate how the large-scale Whisper models (trained with a large amount of adult speech data) perform with child speech. In addition, we perform fine-tuning experiments using different child speech datasets to investigate the performance of Whisper ASR on non-native English-speaking children's speech. Our findings indicate relative Word Error Rate (WER) improvements ranging from 29% to 89% over previous benchmarks on the same datasets. Notably, these gains were achieved by fine-tuning with only a 10% sample of unseen non-native datasets. These results demonstrate the potential of Whisper for improving ASR in a low-resource scenario for non-native child speech.
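As a brief illustration of the WER metric behind the reported numbers, the following is a minimal word-level edit-distance sketch, together with the relative-improvement formula implied by the "29% to 89%" figures. The function names are illustrative; the paper itself would use a standard ASR evaluation toolkit.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match / substitution
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_wer_improvement(baseline: float, finetuned: float) -> float:
    """Relative reduction in WER, as reported against prior benchmarks."""
    return (baseline - finetuned) / baseline
```

For example, one substituted word out of four gives a WER of 0.25, and halving a baseline WER corresponds to a 50% relative improvement.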

Citation (APA)

Jain, R., Barcovschi, A., Yiwere, M. Y., Corcoran, P., & Cucu, H. (2024). Exploring Native and Non-Native English Child Speech Recognition With Whisper. IEEE Access, 12, 41601–41610. https://doi.org/10.1109/ACCESS.2024.3378738
