Speaker identification made easy with pruned reassigned spectrograms

Abstract

One common scenario for speaker identification is the task of identifying samples of speech from members of a previously enrolled group. One recent (and typical) set of results used 36 seconds of speech from each speaker to train Gaussian models by expectation-maximization during enrollment, and 20 seconds of speech for the test samples. This procedure has three major problems: 1) it is sensitive to noise; 2) it requires impractical amounts of speech; 3) it requires computationally expensive training. In our study, the reassigned spectrogram is pruned using phase-derivative indicator functions to provide a sparse time-frequency matrix of very small (40 ms) samples of speech. The pruning eliminates Gaussian noise at signal-to-noise ratios (SNR) down to at least 6 dB. Principal component analysis provided a set of 30 features from each spectrogram. Using an enrolled group of 24 speakers recorded under low-fidelity conditions, 83% identification accuracy (comparable to state-of-the-art results at 6 dB SNR) was achieved with real-time classification methods (e.g., support vector machines) without the need for extensive training. Moreover, these results extend below 6 dB SNR, where standard techniques break down. Our methodology can thus better address the three main problems of speaker identification. © 2013 Acoustical Society of America.
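
The feature-extraction and classification stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the spectrogram reassignment and pruning steps are not reproduced, the data are synthetic stand-ins (sparse random templates plus noise), and a nearest-centroid classifier is swapped in for the support vector machine to keep the example dependent on NumPy alone. All function names and parameters (`fit_pca`, `make_synthetic`, `n_bins`, and so on) are hypothetical. Only the 30 PCA features and the 24 enrolled speakers are taken from the abstract.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return (mean, components) from an SVD of the mean-centered data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def transform(X, mean, components):
    """Project rows of X onto the principal components."""
    return (X - mean) @ components.T


def fit_centroids(features, labels):
    """Compute one mean feature vector per class (stand-in for an SVM)."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(features, classes, centroids):
    """Assign each row to the class with the nearest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def make_synthetic(rng, n_speakers=24, per_speaker=20, n_bins=400):
    """Assumed toy data: a sparse template per speaker plus Gaussian noise.

    Each row plays the role of one flattened time-frequency matrix.
    """
    mask = rng.random((n_speakers, n_bins)) < 0.1
    templates = rng.normal(size=(n_speakers, n_bins)) * mask
    labels = np.repeat(np.arange(n_speakers), per_speaker)
    X = templates[labels] + 0.1 * rng.normal(size=(labels.size, n_bins))
    return X, labels


rng = np.random.default_rng(0)
X, y = make_synthetic(rng)

# Alternate samples between an enrollment (training) set and a test set.
train = np.arange(y.size) % 2 == 0

# Reduce each flattened matrix to 30 features, as in the abstract.
mean, comps = fit_pca(X[train], n_components=30)
train_feats = transform(X[train], mean, comps)
test_feats = transform(X[~train], mean, comps)

classes, centroids = fit_centroids(train_feats, y[train])
accuracy = float((predict(test_feats, classes, centroids) == y[~train]).mean())
print(f"identification accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only how separable the synthetic templates are and says nothing about the 83% figure reported for real speech.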

Cite

Fulop, S. A., & Kim, Y. (2013). Speaker identification made easy with pruned reassigned spectrograms. In Proceedings of Meetings on Acoustics (Vol. 19). https://doi.org/10.1121/1.4798949
