The paper describes a novel approach to automated accent identification: a speech recogniser is trained to distinguish between accent-specific versions of the phonemes that make up the language. A standard speech recogniser is trained on data in which the critical phonemes (vowels and a small set of consonants) are marked as coming in several varieties, one per accent. The output of the recogniser is then inspected to determine which versions predominate in a given utterance. Put simply, if a speaker produces phonemes that match the Levantine versions of those phonemes, they are characterised as speaking with a Levantine accent; if they produce phonemes that match the Egyptian versions, they are characterised as speaking with an Egyptian accent, and so on. The accuracy of this approach to classifying speakers’ accents varies from 79% to 86% when tested on speakers from the five main Arabic accent groups (Gulf, Iraqi, Egyptian, Levantine, Maghrebi), depending on a range of conditions discussed in the paper. These results are an improvement on the state of the art for accent recognition for Arabic, i.e. for classifying spoken, rather than written, material on the basis of the speaker’s geographical origin.
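The decision step described in the abstract, choosing the accent whose phoneme versions predominate in the recogniser output, can be illustrated with a minimal sketch. This is not the authors' implementation: the label format (a base phoneme followed by `_` and an accent tag such as `aa_LEV`), the tag names, and the function `identify_accent` are all assumptions made for illustration, and a simple majority count stands in for whatever weighting the paper actually uses.

```python
from collections import Counter

# Hypothetical accent tags for the five groups named in the abstract.
ACCENTS = ("GLF", "IRQ", "EGY", "LEV", "MGR")


def identify_accent(phonemes):
    """Return the accent tag that occurs most often in the recognised
    phoneme sequence, or None if no accent-tagged phoneme is present.

    Assumes (hypothetically) that accent-specific phonemes are written as
    "<phoneme>_<ACCENT>" and that accent-neutral phonemes carry no tag.
    """
    votes = Counter()
    for label in phonemes:
        _base, sep, tag = label.rpartition("_")
        if sep and tag in ACCENTS:
            votes[tag] += 1
    if not votes:
        return None
    # Ties are resolved in favour of the tag seen first in the utterance.
    return votes.most_common(1)[0][0]


# Example: two Levantine variants against one Egyptian variant.
utterance = ["b", "aa_LEV", "t", "ii_LEV", "uu_EGY"]
print(identify_accent(utterance))  # LEV
```

Untagged phonemes are ignored in this sketch, which reflects the abstract's statement that only the critical phonemes (vowels and a small set of consonants) are marked per accent.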
CITATION STYLE
Alsharhan, E., & Ramsay, A. (2023). Robust automatic accent identification based on the acoustic evidence. International Journal of Speech Technology, 26(3), 665–680. https://doi.org/10.1007/s10772-023-10031-2