A comparative study on speaker gender identification using MFCC and statistical learning methods

Hanguang Xiao

Journal Article

Advances in Intelligent Systems and Computing (2014) 255 715-723

DOI: 10.1007/978-81-322-1759-6_82

Abstract

In this study, we built databases of Mandarin speech recorded in quiet and noisy environments. After extracting feature vectors from the speech recordings with mel-frequency cepstral coefficients (MFCC), we performed speaker gender identification using three statistical learning methods, K-nearest neighbor (KNN), probabilistic neural network (PNN), and support vector machine (SVM), and analyzed the influence of frame size, normalization, and noise on the identification results. The experiments showed that (1) the most appropriate frame size is 2,048; (2) feature normalization increased overall accuracy by about 3%; and (3) SVM achieved higher accuracy than KNN and PNN, reaching 100%, 97.8%, and 95.8% on the quiet, noisy, and hybrid databases, respectively.
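
The role of feature normalization ahead of a distance-based classifier such as KNN can be illustrated with a minimal sketch. This is not the paper's implementation: the helper names (`zscore_fit`, `zscore_apply`, `knn_predict`), the choice of z-score normalization, the value of k, and the two-dimensional toy vectors are all assumptions made for illustration. The toy vectors merely stand in for MFCC feature vectors and carry no acoustic meaning.

```python
import math
from collections import Counter

def zscore_fit(vectors):
    """Return per-dimension means and standard deviations of the training vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((v[d] - means[d]) ** 2 for v in vectors) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 if a dimension is constant
    return means, stds

def zscore_apply(vector, means, stds):
    """Scale one feature vector with previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query (Euclidean)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: dimension 0 separates the classes,
# dimension 1 is a large-scale nuisance feature.
train_x = [[1.0, 200.0], [1.2, 900.0], [0.9, 500.0],
           [3.0, 250.0], [3.1, 850.0], [2.9, 550.0]]
train_y = ["female", "female", "female", "male", "male", "male"]
query = [1.1, 860.0]

# Without normalization, the large-scale dimension dominates the distance.
raw_pred = knn_predict(train_x, train_y, query)

# With normalization, both dimensions contribute on a comparable scale.
means, stds = zscore_fit(train_x)
norm_train = [zscore_apply(v, means, stds) for v in train_x]
norm_pred = knn_predict(norm_train, train_y, zscore_apply(query, means, stds))

print(raw_pred, norm_pred)
```

In this toy setup the two predictions differ, because the unscaled second dimension decides which neighbors count as nearest. This gives one plausible intuition for why normalization could change accuracy; it does not reproduce the roughly 3% gain reported in the abstract.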

Citation (APA)

Xiao, H. (2014). A comparative study on speaker gender identification using MFCC and statistical learning methods. Advances in Intelligent Systems and Computing, 255, 715–723. https://doi.org/10.1007/978-81-322-1759-6_82
