Convolutional neural networks (CNNs) have demonstrated great power in mining deep information from spectrograms for speech emotion recognition. However, perceptual features such as low-level descriptors (LLDs) and their statistical values have not been sufficiently utilized in CNN-based emotion recognition. To address this problem, we propose novel features that combine the spectrogram and perceptual features at different levels. First, frame-level LLDs are arranged as time-sequence LLDs. Then, the spectrogram and time-sequence LLDs are fused into compositional spectrographic features (CSF). To fully utilize perceptual features and global information, statistical values of LLDs are added to CSF to generate rich-compositional spectrographic features (RSF). Finally, the proposed features are individually fed to a CNN to extract deep features for emotion recognition. A bi-directional long short-term memory network was employed to identify emotions, and experiments were conducted on EmoDB. Compared with the spectrogram alone, CSF and RSF improve the unweighted accuracy by relative error reductions of 32.04% and 36.91%, respectively.
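The feature construction described above can be sketched as a simple array fusion. This is a minimal illustration, not the paper's implementation: the frame count, number of mel bins, number of LLDs, and the choice of mean/std as the statistical values are all assumptions made here for concreteness.

```python
import numpy as np

# Hypothetical dimensions: 100 frames, 128 spectrogram bins, 20 LLDs per frame.
T, F, D = 100, 128, 20
rng = np.random.default_rng(0)
spectrogram = rng.random((T, F))  # spectrogram, shape (T, F)
llds = rng.random((T, D))         # frame-level LLDs arranged as a time sequence, shape (T, D)

# CSF: fuse spectrogram and time-sequence LLDs along the feature axis.
csf = np.concatenate([spectrogram, llds], axis=1)  # shape (T, F + D)

# RSF: append utterance-level statistics of the LLDs (mean and std assumed here),
# broadcast to every frame so global information accompanies each time step.
stats = np.concatenate([llds.mean(axis=0), llds.std(axis=0)])  # shape (2D,)
rsf = np.concatenate([csf, np.tile(stats, (T, 1))], axis=1)    # shape (T, F + 3D)

print(csf.shape, rsf.shape)
```

Each fused matrix keeps the frame axis intact, so it can be fed to a CNN like an ordinary single-channel spectrogram image.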
Zhang, L., Wang, L., Dang, J., Guo, L., & Guan, H. (2018). Convolutional neural network with spectrogram and perceptual features for speech emotion recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11304 LNCS, pp. 62–71). Springer Verlag. https://doi.org/10.1007/978-3-030-04212-7_6