While the deep learning revolution has led to significant performance improvements in speech recognition, accented speech remains a challenge. Current approaches to this challenge typically do not seek to understand or explain the variations found in accented speech, whether they stem from native regional variation or from non-native error patterns. This paper addresses non-native speaker variations from both a knowledge-based and a data-driven perspective. We propose to approximate non-native accented-speech pronunciation patterns by means of two approaches: one based on phonetic and phonological knowledge, the other inferred from a text-to-speech system. Artificial speech is then generated with a range of variants that have been captured in confusion matrices representing phoneme similarities. We then show that non-native accent confusions propagate to the ASR transcription, which suggests that accent-specific phoneme confusions can be inferred from artificial speech.
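As a rough illustration of the confusion matrices mentioned above, the following minimal sketch builds a row-normalised phoneme confusion matrix from aligned pairs of intended and realised phonemes. The function name `confusion_matrix`, the input format, and the toy data are all assumptions made for illustration; they are not taken from the paper, which may construct and use its matrices differently.

```python
from collections import Counter, defaultdict


def confusion_matrix(aligned_pairs):
    """Return a row-normalised confusion matrix as nested dicts.

    aligned_pairs: iterable of (intended, realised) phoneme pairs.
    This input format is a hypothetical one chosen for the sketch.
    """
    counts = defaultdict(Counter)
    for intended, realised in aligned_pairs:
        counts[intended][realised] += 1

    matrix = {}
    for intended, row in counts.items():
        total = sum(row.values())
        # Each row holds the relative frequency of every realisation
        # observed for one intended phoneme, so it sums to 1.
        matrix[intended] = {realised: n / total for realised, n in row.items()}
    return matrix


# Invented toy data, not results from the paper.
pairs = [
    ("θ", "s"), ("θ", "θ"), ("θ", "s"), ("θ", "t"),
    ("ɪ", "i"), ("ɪ", "ɪ"),
]
matrix = confusion_matrix(pairs)
```

Under these assumptions, each row can be read as an estimate of how often an intended phoneme surfaces as each variant, which is one plausible way to rank candidate substitutions when generating accented variants.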
Masson, M., & Carson-Berndsen, J. (2023). Investigating Phoneme Similarity with Artificially Accented Speech. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 49–57). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.sigmorphon-1.6