Effect of Language Mixture on Speaker Verification: An Investigation with Amharic, English, and Mandarin Chinese

Firew Tadele; Jianguo Wei; Kiyoshi Honda; Ruiteng Zhang; Wenhao Yang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022) 13340 LNCS 243-256

DOI: 10.1007/978-3-031-06791-4_20

Abstract

Speaker verification (SV) with low-resource language corpora faces inherent technical difficulties and often requires language-mixture processing. In this paper, the LibriSpeech ASR corpus, the AISHELL-I Mandarin Speech corpus, and the Yegna2021 corpus were used to train the x-vector model. Yegna2021 is a bilingual speech corpus of Amharic and English. We designed and collected the Yegna2021 corpus to facilitate SV experimentation. Over 200 native Ethiopian speakers who are bilingual in both languages participated in the creation of the corpus. To the best of our knowledge, this is the first study of SV systems for the Amharic language. This study proposes that mitigating the SV performance degradation caused by language mismatch between training and testing utterances requires not only combining two or more languages for training, but also considering the phonetic similarities and differences between those languages, which affect the SV performance obtained. The varied effects of language combinations were examined for Mandarin Chinese, Amharic, and English. We also investigate the impact of language mismatch between training and testing on SV performance using only the Yegna2021 corpus. The experimental results show that language variability between training and testing utterances significantly degrades SV performance (between 6.5% and 9.0%). The combination of Amharic and Mandarin yields better SV performance than English and Mandarin, achieving an equal error rate (EER) of 8.3% compared with 9.8%, a relative performance degradation of 17.1% for English and Mandarin. To verify these results, we paired Mandarin with English data from LibriSpeech; the result shows an 18.2% relative performance degradation, with an EER of 9.9% for English and Mandarin.
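
The abstract reports results as equal error rates and relative degradations without stating how either is computed. The following is a minimal Python sketch of one common way to obtain both, not the authors' evaluation code. The function names, the toy scores, and the definition of relative degradation as (degraded - baseline) / baseline are assumptions made for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: higher scores mean "more likely the same speaker".
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_degradation(baseline_eer, degraded_eer):
    """Relative EER increase with respect to the baseline (assumed definition)."""
    return (degraded_eer - baseline_eer) / baseline_eer


# Toy scores, invented purely for illustration.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))

# Rounded EERs quoted in the abstract (Amharic+Mandarin vs. English+Mandarin).
print(relative_degradation(8.3, 9.8))
```

Because the second call uses the rounded EERs from the abstract, its result need not reproduce the reported 17.1% exactly; the paper's figure was presumably computed from unrounded values or under a slightly different definition.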

Cite

Tadele, F., Wei, J., Honda, K., Zhang, R., & Yang, W. (2022). Effect of Language Mixture on Speaker Verification: An Investigation with Amharic, English, and Mandarin Chinese. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13340 LNCS, pp. 243–256). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-06791-4_20
