In this study, we built databases of Mandarin speech recorded in quiet and noisy environments. After extracting feature vectors from the speech recordings with mel-frequency cepstral coefficients (MFCC), we performed speaker gender identification using three statistical learning methods: K-nearest neighbor (KNN), probabilistic neural network (PNN), and support vector machine (SVM). We also analyzed the influence of frame size, normalization, and noise on the identification results. The experiments showed that (1) the most appropriate frame size is 2,048; (2) feature normalization increased the overall accuracy by about 3%; and (3) SVM was more accurate than KNN and PNN, reaching accuracies of 100%, 97.8%, and 95.8% on the quiet, noisy, and hybrid databases, respectively.
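As a rough illustration of the steps named in the abstract, the sketch below frames a signal, normalizes feature vectors, and classifies them with KNN. It is not the study's implementation. The function names are hypothetical, the frame size of 2,048 is assumed to be counted in samples, the frames are assumed not to overlap, per-dimension z-score scaling is only one possible reading of "feature normalization", and the two-dimensional toy vectors stand in for real MFCC features. PNN and SVM are not shown.

```python
import math


def split_frames(signal, frame_size=2048):
    """Split a signal into non-overlapping frames, dropping the incomplete tail.

    Assumption: frame_size is a sample count; the abstract gives no unit.
    """
    n_frames = len(signal) // frame_size
    return [signal[i * frame_size:(i + 1) * frame_size] for i in range(n_frames)]


def zscore_fit(rows):
    """Return the per-dimension mean and standard deviation of the training rows."""
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero
    return means, stds


def zscore_apply(rows, means, stds):
    """Scale every row with statistics estimated on the training set."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


if __name__ == "__main__":
    # Toy stand-ins for MFCC vectors; the second dimension has a larger scale.
    train_x = [[1.0, 100.0], [1.2, 300.0], [3.0, 120.0], [3.1, 280.0]]
    train_y = ["male", "male", "female", "female"]
    means, stds = zscore_fit(train_x)
    scaled_train = zscore_apply(train_x, means, stds)
    scaled_query = zscore_apply([[2.9, 290.0]], means, stds)[0]
    print(knn_predict(scaled_train, train_y, scaled_query, k=1))
```

Without scaling, the large-valued second dimension would dominate the Euclidean distance used by KNN, which is one plausible reason normalization can change accuracy for distance-based classifiers.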
CITATION STYLE
Xiao, H. (2014). A comparative study on speaker gender identification using MFCC and statistical learning methods. Advances in Intelligent Systems and Computing, 255, 715–723. https://doi.org/10.1007/978-81-322-1759-6_82