Parallelized Convolutional Recurrent Neural Network with Spectral Features for Speech Emotion Recognition

84Citations
Citations of this article
69Readers
Mendeley users who have this article in their library.

Abstract

Speech is the most effective way for people to exchange complex information. Recognition of emotional information contained in speech is one of the important challenges in the field of artificial intelligence. To better acquire emotional features in speech signals, a parallelized convolutional recurrent neural network (PCRN) with spectral features is proposed for speech emotion recognition. First, frame-level features are extracted from each utterance and, a long short-term memory is employed to learn these features frame by frame. At the same time, the deltas and delta-deltas of the log Mel-spectrogram are calculated and reconstructed into three channels (static, delta, and delta-delta); these 3-D features are learned by a convolutional neural network (CNN). Then, the two learned high-level features are fused and batch normalized. Finally, a SoftMax classifier is used to classify emotions. Our PCRN model simultaneously processes two different types of features in parallel to better learn the subtle changes in emotion. The experimental results on four public datasets show the superiority of our proposed method, which is better than the previous works.

Cite

CITATION STYLE

APA

Jiang, P., Fu, H., Tao, H., Lei, P., & Zhao, L. (2019). Parallelized Convolutional Recurrent Neural Network with Spectral Features for Speech Emotion Recognition. IEEE Access, 7, 90368–90377. https://doi.org/10.1109/ACCESS.2019.2927384

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free