Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition

Kun Ching Wang

Journal Article, Open Access

Sensors (Switzerland) (2015) 15(1) 1458-1478

DOI: 10.3390/s150101458

29 Citations

51 Readers

Abstract

The classification of emotional speech is a common topic in speech-related research on human-computer interaction (HCI). This paper presents a novel feature extraction method based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis and is used to characterize and classify different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture properties of an emotional speech spectrogram at multiple resolutions should be a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can discriminate between emotions more clearly than uniform-resolution texture analysis. To provide high accuracy of emotional discrimination, especially in real-life conditions, an acoustic activity detection (AAD) algorithm must be incorporated into the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features also improve the correct classification rates of the proposed systems across databases in different languages. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification performance for real-life emotion recognition in speech.
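The idea of treating the spectrogram as an image and describing its texture at several resolutions can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the texture descriptors, the multi-resolution decomposition, or the AAD algorithm, so the 2x2 average-pooling pyramid, the three texture statistics, and all function names and parameters below (`log_spectrogram`, `downsample2`, `texture_stats`, `multires_texture_features`, `frame_len`, `hop`, `levels`) are assumptions chosen only for illustration. The sketch covers the multi-resolution texture step only.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram as a 2-D image (frequency x time)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T  # shape: (frame_len // 2 + 1, n_frames)

def downsample2(image):
    """Halve both axes by 2x2 average pooling (assumed pyramid step)."""
    rows, cols = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    trimmed = image[:rows, :cols]  # drop an odd trailing row/column
    return trimmed.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))

def texture_stats(image):
    """Assumed stand-in texture descriptors: variance plus the mean
    absolute gradient along the frequency and time axes."""
    return [image.var(),
            np.abs(np.diff(image, axis=0)).mean(),
            np.abs(np.diff(image, axis=1)).mean()]

def multires_texture_features(signal, levels=3):
    """Concatenate texture statistics computed at each pyramid level."""
    image = log_spectrogram(signal)
    features = []
    for _ in range(levels):
        features.extend(texture_stats(image))
        image = downsample2(image)
    return np.array(features)

# Usage on a synthetic signal: a 440 Hz tone in noise, 1 s at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(8000)
features = multires_texture_features(signal)  # 3 statistics x 3 levels
```

In a full system, a vector like `features` would be computed only over frames that an activity detector marks as active, and then passed to a classifier; both steps are outside this sketch.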

Citation (APA)

Wang, K. C. (2015). Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition. Sensors (Switzerland), 15(1), 1458–1478. https://doi.org/10.3390/s150101458
