Effect of Language Mixture on Speaker Verification: An Investigation with Amharic, English, and Mandarin Chinese

0Citations
Citations of this article
2Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Speaker verification (SV) tasks with low-resource language corpora naturally face technical difficulties and often require language mixture processing. In this paper, the LibriSpeech ASR corpus, the AISHELL-I Mandarin Speech corpus, and the Yegna2021 corpus were used for training the x-vector model. The Yegna2021 is a bilingual speech corpus consisting of Amharic and English languages. We designed and collected the Yegna2021 corpus to facilitate SV experimentation. Over 200 native Ethiopian speakers who are bilingual in both languages have participated in the creation of the corpus. To the best of our knowledge, this is the first study of SV systems in Amharic language. This study proposes that improving SV performance degradation, caused by language mismatch between training and testing utterances, requires not only combining two or more languages for training, but also considering the phonetic similarities and differences between languages that impact on obtaining better SV performance. The varied effects of language combinations have been examined on Mandarin Chinese, Amharic, and English languages. In this paper, we investigate the impact of language mismatches between training and testing on SV performance using only the Yegna2021corpus. The experimental results show that a language variability between training and testing utterances significantly degrades SV performance (between 6.5% to 9.0%). The combination of Amharic and Mandarin yields better SV performance than English and Mandarin, achieving an Equal error rate (EER) of 8.3% as compared to 9.8%, with relative performance degradation of 17.1%. To verify these results, we paired Mandarin with data from the LibriSpeech, and the result shows 18.2% relative performance degradation, with an EER of 9.9% for English and Mandarin.

Cite

CITATION STYLE

APA

Tadele, F., Wei, J., Honda, K., Zhang, R., & Yang, W. (2022). Effect of Language Mixture on Speaker Verification: An Investigation with Amharic, English, and Mandarin Chinese. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13340 LNCS, pp. 243–256). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-06791-4_20

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free