I-vectors and Deep Convolutional Neural Networks for Language Identification in Clean and Reverberant Environments

Abstract

This study proposes a method for automatic language identification based on deep convolutional neural networks (DCNN) and the i-vector paradigm. Convolutional neural networks (CNN) have been applied successfully to image classification, speech emotion recognition, and facial expression recognition. Here, a variant of the typical CNN is applied to spoken language identification and investigated experimentally. Evaluated on the NIST 2015 i-vector Machine Learning Challenge task, which involves recognizing 50 in-set languages, the proposed method achieved a 3.9% equal error rate (EER) and outperformed two baseline methods. These results are promising and indicate that DCNNs are effective for spoken language identification. The study also reports a front-end feature enhancement and dereverberation approach based on a deep convolutional autoencoder.
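For readers unfamiliar with the metric, the equal error rate is the operating point at which the false-acceptance rate and the false-rejection rate coincide. The sketch below is a generic illustration and not the scoring procedure used in the paper, which the abstract does not describe; the function name `equal_error_rate` and the toy scores are assumptions made for the example.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from two sets of detection scores.

    Hypothetical helper for illustration only; higher scores are
    assumed to mean "more likely the target language".
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False-rejection rate: target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False-acceptance rate: non-target trials scored at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two rates are closest and average them.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

# Toy scores (made up): one target trial overlaps the non-target range.
eer = equal_error_rate([0.9, 0.8, 0.4, 0.7], [0.1, 0.5, 0.2, 0.3])
print(f"EER = {eer:.2%}")
```

With a finite set of trials the two rates rarely cross exactly, so this sketch averages them at the nearest threshold; evaluation toolkits often interpolate instead.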

Cite

Heracleous, P., Mohammad, Y., Takai, K., Yasuda, K., & Yoneyama, A. (2023). I-vectors and Deep Convolutional Neural Networks for Language Identification in Clean and Reverberant Environments. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13396 LNCS, pp. 30–40). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-23793-5_3
