Vietnamese Speaker Verification With Mel-Scale Filter Bank Energies and Deep Learning

Thi Thanh Mai Nguyen; Duc Dung Nguyen; Chi Mai Luong

Journal Article, Open Access

IEEE Access, vol. 12, pp. 150114–150122 (2024)

DOI: 10.1109/ACCESS.2024.3479092

Abstract

Mel-Frequency Cepstral Coefficients (MFCCs) have been used extensively as input to both traditional and modern speech processing systems. Their strength lies in a compact representation of the speech signal that captures its essential phonetic content. However, most of the MFCC energy is concentrated in the low-order coefficients, and the flat distribution of the high-order MFCC values makes convolutional operators less sensitive to transient details in those coefficients, details that may matter in tasks such as speaker recognition. In this paper, we analyze the differences between Mel-scale filter bank energies (MFBEs) and MFCCs and show that MFBEs are more effective inputs for deep learning-based Vietnamese speaker verification. MFBEs help deep learning models learn a better speaker representation, with a more compact distribution of embedding vectors. Experiments on two Vietnamese speaker verification datasets show that MFBEs consistently outperform MFCCs as input features for some state-of-the-art deep learning models. The equal error rate (EER) was reduced by 1.14% on the Vietnam-Celeb test dataset with the ResNetSE-34 model, and by 2.36% (a 51.6% relative improvement) on the VLSP2021 test dataset with the ECAPA-TDNN model and transfer learning.
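
The relationship between the two feature types can be illustrated with a minimal sketch. In the standard formulation, MFCCs are obtained by applying a discrete cosine transform (DCT) to the log Mel-scale filter bank energies of a frame. The code below is not the authors' pipeline: the 24-band frame, its values, and the helper name `dct_ii_orthonormal` are hypothetical, chosen only to show how a smooth log-MFBE envelope ends up with most of its energy in the first few cepstral coefficients.

```python
import math


def dct_ii_orthonormal(x):
    """Orthonormal DCT-II of a list of floats (pure Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        # Orthonormal scaling so that the transform preserves total energy.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
        out.append(scale * acc)
    return out


# Hypothetical log filter bank energies for one frame (assumed 24 Mel bands):
# a smooth downward spectral tilt plus a small band-to-band ripple.
N_MELS = 24
log_mfbe = [10.0 - 0.3 * m + 0.2 * (-1) ** m for m in range(N_MELS)]

# Cepstral coefficients of this frame: DCT of the log energies.
mfcc = dct_ii_orthonormal(log_mfbe)

# Fraction of the total squared magnitude held by the first four coefficients.
total_energy = sum(c * c for c in mfcc)
low_order_share = sum(c * c for c in mfcc[:4]) / total_energy
print(f"share of energy in the first 4 of {N_MELS} coefficients: {low_order_share:.3f}")
```

Because the transform is orthonormal, the MFCC vector carries the same total energy as the log-MFBE vector; only its distribution across coefficients changes, which is the property the abstract contrasts.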

Cite

Nguyen, T. T. M., Nguyen, D. D., & Luong, C. M. (2024). Vietnamese Speaker Verification With Mel-Scale Filter Bank Energies and Deep Learning. IEEE Access, 12, 150114–150122. https://doi.org/10.1109/ACCESS.2024.3479092
