Improvement of speaker identification by combining prosodic features with acoustic features

Abstract

In this paper, we study prosodic features derived from pitch parameters to improve the performance of a speaker identification (SID) system. To deal with the problem of missing pitch in telephone speech, we estimate pitch for every frame, even in unvoiced regions. After removing silence frames, we also improve prosodic modeling by using a weighted form of the logarithm of pitch. The new prosodic features are then combined with MFCC parameters. Using our Gaussian Mixture Model-Universal Background Model (GMM-UBM) recognizer, we conduct SID experiments on the NIST 2001 cellular telephone corpus. Compared with MFCC features alone, the combined features yield a 7.0% relative error reduction for male speakers and 2.5% for female speakers. We also discuss advanced pitch extraction and modeling approaches for improving SID systems. © Springer-Verlag Berlin Heidelberg 2004.
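
The abstract does not give the exact weighting or the way the features are joined, so the following is only a minimal sketch under stated assumptions: frame-level concatenation of a scaled log-pitch value onto each MFCC vector, after dropping silence frames. The function names, the scalar `weight`, and the toy data are all hypothetical and are not taken from the paper. The second helper shows the standard definition of relative error reduction, which is how figures such as 7.0% are normally computed.

```python
import numpy as np

def combine_features(mfcc, pitch_hz, is_silence, weight=1.0):
    """Append a weighted log-pitch column to MFCC frames.

    Assumption: pitch is available (and positive) for every frame,
    including unvoiced ones, as the abstract describes. The scalar
    `weight` is a hypothetical stand-in for the paper's weighting.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    keep = ~np.asarray(is_silence, dtype=bool)   # drop silence frames
    log_pitch = weight * np.log(pitch_hz[keep])  # weighted log-pitch
    return np.hstack([mfcc[keep], log_pitch[:, None]])

def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_error - new_error) / baseline_error

# Hypothetical toy data: 5 frames of 13-dimensional MFCCs.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(5, 13))
pitch_hz = np.array([110.0, 120.0, 95.0, 130.0, 100.0])
is_silence = np.array([False, True, False, False, True])

combined = combine_features(mfcc, pitch_hz, is_silence, weight=0.5)
print(combined.shape)  # 3 non-silence frames, 13 MFCCs + 1 pitch column
```

In a GMM-UBM setup, vectors like `combined` would be the input to both background-model training and per-speaker adaptation; that part is omitted here because the abstract gives no configuration details.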

Cite

Zheng, R., Zhang, S., & Xu, B. (2004). Improvement of speaker identification by combining prosodic features with acoustic features. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3338, 569–576. https://doi.org/10.1007/978-3-540-30548-4_65
