Improvement of speaker identification by combining prosodic features with acoustic features

Rong Zheng; Shuwu Zhang; Bo Xu

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3338 569-576

DOI: 10.1007/978-3-540-30548-4_65

2 Citations

9 Readers

Abstract

In this paper, we study prosodic features derived from pitch parameters to improve the performance of a speaker identification (SID) system. To deal with the problem of missing pitch in telephone speech, we estimate pitch for every frame, even in unvoiced regions. After removing silence frames, we also improve prosodic modeling by using a weighted form of the logarithm of pitch. The new prosodic features are then combined with MFCC parameters. Using our Gaussian Mixture Model-Universal Background Model (GMM-UBM) recognizer, we conduct SID experiments on the NIST 2001 cellular telephone corpus. Compared to MFCC features alone, the combined features yield a 7.0% relative error reduction for male speakers and 2.5% for female speakers. We also discuss advanced pitch extraction and modeling approaches for improving SID systems. © Springer-Verlag Berlin Heidelberg 2004.
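The feature combination described in the abstract can be illustrated with a rough sketch. The abstract does not give the exact weighting, the silence detector, or the pitch estimator, so everything below is an assumption made for illustration: the function names, the energy-threshold silence decision, the scalar `pitch_weight`, and the choice to append log-pitch as one extra column per frame. It is not the authors' implementation. The second helper only encodes the standard definition of relative error reduction.

```python
import numpy as np


def combine_features(mfcc, pitch_hz, energy, energy_floor=1e-3, pitch_weight=1.0):
    """Append a weighted log-pitch column to MFCC frames (hypothetical sketch).

    mfcc:     (n_frames, n_mfcc) array of MFCC vectors
    pitch_hz: (n_frames,) pitch estimate for EVERY frame, unvoiced ones included
    energy:   (n_frames,) frame energies, used here for a simple silence decision
    energy_floor and pitch_weight are illustrative assumptions, not paper values.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    energy = np.asarray(energy, dtype=float)

    if not (len(mfcc) == len(pitch_hz) == len(energy)):
        raise ValueError("mfcc, pitch_hz and energy must have the same frame count")
    if np.any(pitch_hz <= 0):
        raise ValueError("a positive pitch estimate is required for every frame")

    # Drop silence frames first (assumed: plain energy threshold).
    keep = energy > energy_floor

    # Weighted logarithm of pitch for the remaining frames.
    log_pitch = pitch_weight * np.log(pitch_hz[keep])

    # One combined vector per frame: MFCCs followed by the prosodic term.
    return np.hstack([mfcc[keep], log_pitch[:, None]])


def relative_error_reduction(baseline_error, new_error):
    """Relative error reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

The combined vectors would then be passed to whatever model scores speakers (a GMM-UBM in the paper). As a reading aid for the reported numbers: with purely hypothetical error rates of 20.0% for the baseline and 18.6% for the combined features, `relative_error_reduction(20.0, 18.6)` corresponds to a 7.0% relative reduction. The abstract reports only relative figures, not the absolute rates.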

Cite

Zheng, R., Zhang, S., & Xu, B. (2004). Improvement of speaker identification by combining prosodic features with acoustic features. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3338, 569–576. https://doi.org/10.1007/978-3-540-30548-4_65