A comparative study on speaker gender identification using MFCC and statistical learning methods

Abstract

In this study, we built databases of Mandarin speech recorded in quiet and noisy environments. After extracting feature vectors from the speech recordings with mel-frequency cepstral coefficients (MFCC), we performed speaker gender identification using three statistical learning methods: K-nearest neighbor (KNN), probabilistic neural network (PNN), and support vector machine (SVM). We also analyzed the influence of frame size, normalization, and noise on the identification results. The experiments showed that (1) the most appropriate frame size is 2,048; (2) feature normalization increased overall accuracy by about 3%; and (3) SVM achieved higher accuracy than KNN and PNN, reaching 100%, 97.8%, and 95.8% on the quiet, noisy, and hybrid databases, respectively.
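To illustrate why feature normalization can change a KNN decision, here is a minimal sketch in plain Python. It is not the author's implementation: the two-dimensional feature vectors are invented toy values standing in for MFCC features, and the helper names (`zscore_fit`, `zscore_apply`, `knn_predict`) and the choice of z-score normalization with k = 3 are assumptions made only for this example.

```python
import math
from collections import Counter

def zscore_fit(rows):
    """Return per-dimension means and standard deviations of the training rows."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 if variance is zero
    return means, stds

def zscore_apply(row, means, stds):
    """Scale one feature vector with statistics learned from the training set."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest in Euclidean distance."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): dimension 0 has a large scale but carries no gender
# information; dimension 1 has a small scale and separates the two classes.
train_x = [[100.0, 0.10], [300.0, 0.20], [500.0, 0.15],
           [120.0, 0.90], [320.0, 0.80], [520.0, 0.85]]
train_y = ["male", "male", "male", "female", "female", "female"]
query = [200.0, 0.85]

# Without normalization the large-scale dimension dominates the distance.
raw_prediction = knn_predict(train_x, train_y, query)

# With normalization both dimensions contribute on a comparable scale.
means, stds = zscore_fit(train_x)
norm_train = [zscore_apply(r, means, stds) for r in train_x]
norm_prediction = knn_predict(norm_train, train_y, zscore_apply(query, means, stds))

print(raw_prediction)   # male
print(norm_prediction)  # female
```

In this toy case the unnormalized distance is driven almost entirely by the first dimension, so the vote goes to the wrong class. After scaling, the discriminative second dimension decides the outcome. The 3% gain reported in the abstract comes from the paper's own data and is not reproduced here.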

Cite

Xiao, H. (2014). A comparative study on speaker gender identification using MFCC and statistical learning methods. Advances in Intelligent Systems and Computing, 255, 715–723. https://doi.org/10.1007/978-81-322-1759-6_82
