End-to-End Speech Emotion Recognition with Gender Information

Ting Wei Sun

Journal Article, Open Access

IEEE Access (2020) 8 152423-152438

DOI: 10.1109/ACCESS.2020.3017462

Abstract

Many works have focused on speech emotion recognition algorithms. However, most rely on the proper selection of speech acoustic features. In this paper, we propose a novel emotion recognition algorithm that does not rely on any hand-selected speech acoustic features and that incorporates speaker gender information. We aim to benefit from the rich information in raw speech data, without any manual intervention. In general, speech emotion recognition systems require manual selection of appropriate traditional acoustic features as classifier input. With deep learning, the network instead automatically selects the important information from the raw speech signal and passes it to the classification layer to accomplish emotion recognition. This prevents the omission of emotion information that cannot be directly modeled mathematically as a speech acoustic characteristic. We also add speaker gender information to the proposed algorithm to further improve recognition accuracy. The proposed algorithm combines a Residual Convolutional Neural Network (R-CNN) and a gender information block. The raw speech data is sent to these two blocks simultaneously. The R-CNN obtains the necessary emotional information from the speech data and classifies the emotional category. The proposed algorithm is evaluated on three public databases in different languages. Experimental results show that the proposed algorithm improves accuracy by 5.6%, 7.3%, and 1.5% in Mandarin, English, and German, respectively, compared with the existing highest-accuracy algorithms. To verify the generalization of the proposed algorithm, we also use the FAU and eNTERFACE databases; on these two independent databases, the proposed algorithm achieves 85.8% and 71.1% accuracy, respectively.
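
The two-branch data flow described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not give layer sizes, how the gender information block is built, or how the two blocks are combined. The sketch assumes a single residual block over raw samples, a small convolutional gender branch, and fusion by concatenation. All names (`residual_block`, `classify`, `make_params`, the `EMOTIONS` label set) are hypothetical, and the weights are random rather than trained.

```python
import math
import random

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical label set


def conv1d_same(signal, kernel):
    """1-D convolution with zero padding; output length equals input length (odd kernel)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(signal, kernel_a, kernel_b):
    """Two convolutions plus an identity skip connection: y = relu(x + F(x))."""
    hidden = relu(conv1d_same(signal, kernel_a))
    branch = conv1d_same(hidden, kernel_b)
    return relu([x + f for x, f in zip(signal, branch)])


def linear(features, weights, bias):
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_params(seed=0):
    """Random, untrained weights; sizes are illustrative assumptions."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "emo_a": vec(3), "emo_b": vec(3),           # residual-block kernels
        "gen_k": vec(3),                            # gender-branch kernel
        "gen_w": [vec(1), vec(1)], "gen_b": [0.0, 0.0],
        "out_w": [vec(4) for _ in EMOTIONS], "out_b": [0.0] * len(EMOTIONS),
    }


def classify(waveform, params):
    # Emotion branch: residual convolution directly on raw samples,
    # then global average and max pooling (no hand-picked acoustic features).
    emo = residual_block(waveform, params["emo_a"], params["emo_b"])
    emo_features = [sum(emo) / len(emo), max(emo)]

    # Gender branch: receives the same raw waveform (assumed structure).
    gen = relu(conv1d_same(waveform, params["gen_k"]))
    gender_probs = softmax(
        linear([sum(gen) / len(gen)], params["gen_w"], params["gen_b"]))

    # Fusion by concatenation is an assumption, not taken from the paper.
    fused = emo_features + gender_probs
    return softmax(linear(fused, params["out_w"], params["out_b"]))


# Synthetic 220 Hz tone standing in for a raw speech clip sampled at 16 kHz.
waveform = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(400)]
params = make_params()
probs = classify(waveform, params)
print(dict(zip(EMOTIONS, (round(p, 3) for p in probs))))
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows that the same raw waveform feeds both branches and that the classifier sees their combined output.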

Cite

Sun, T. W. (2020). End-to-End Speech Emotion Recognition with Gender Information. IEEE Access, 8, 152423–152438. https://doi.org/10.1109/ACCESS.2020.3017462
