Whispering is an alternative way of speech communication, used especially when a speaker does not want to reveal information to anyone other than the intended listeners. Generally, speaker-specific information is present in both the excitation source and the vocal tract system. However, whispered speech does not contain significant source characteristics, as there is almost no excitation by the vocal folds, and the speaker information in the vocal tract system is also lower than in normal speech. Hence, it is difficult to recognize a speaker from his/her whispered speech. To address this, features based on vocal tract system characteristics, namely the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) and the recently developed Cochlear Frequency Cepstral Coefficients (CFCC), are proposed. Experiments are conducted on the CHAINS (Characterizing Individual Speakers) whispered speech database using the GMM-UBM (Gaussian Mixture Modeling-Universal Background Modeling) approach. The experiments show that the fusion of CFCC and MFCC improves the % IR (Identification Rate) and % EER (Equal Error Rate) over MFCC alone, indicating that the proposed features and their score-level fusion capture complementary speaker-specific information.
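The score-level fusion mentioned above can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the weighted sum, the z-normalization of each score stream, and the weight `alpha` below are assumptions chosen for illustration, not the authors' method. The function names are hypothetical, and the inputs are assumed to be matrices of per-trial, per-speaker scores (for example, GMM-UBM log-likelihood ratios) computed separately from MFCC and CFCC features.

```python
import numpy as np

def znorm(scores):
    """Scale a score matrix to zero mean and unit variance (assumed normalization)."""
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean()
    std = scores.std()
    return centered / std if std > 0 else centered

def fuse_scores(mfcc_scores, cfcc_scores, alpha=0.5):
    """Weighted-sum fusion of two (trials x speakers) score matrices.

    alpha is a hypothetical fusion weight: 1.0 uses MFCC scores only,
    0.0 uses CFCC scores only.
    """
    return alpha * znorm(mfcc_scores) + (1.0 - alpha) * znorm(cfcc_scores)

def identification_rate(scores, true_ids):
    """Percentage of trials whose highest-scoring speaker is the true speaker."""
    predicted = np.argmax(np.asarray(scores, dtype=float), axis=1)
    return 100.0 * np.mean(predicted == np.asarray(true_ids))

if __name__ == "__main__":
    # Toy scores for 3 trials and 3 enrolled speakers (made-up values).
    true_ids = [0, 1, 2]
    mfcc = [[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [1.5, 0.0, 1.0]]
    cfcc = [[1.0, 1.5, 0.0], [0.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
    print("MFCC  %IR:", identification_rate(mfcc, true_ids))
    print("CFCC  %IR:", identification_rate(cfcc, true_ids))
    print("Fused %IR:", identification_rate(fuse_scores(mfcc, cfcc), true_ids))
```

In the toy data each feature stream misidentifies a different trial, so combining the scores can recover both; this is what complementary information means at the score level. In practice the weight would be tuned on held-out data.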
CITATION STYLE
Raikar, A., Gandhi, A., & Patil, H. A. (2015). Combining evidences from mel Cepstral and cochlear Cepstral features for speaker recognition using whispered speech. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9302, pp. 405–413). Springer Verlag. https://doi.org/10.1007/978-3-319-24033-6_46