Speech is one of the most effective ways for people to exchange complex information, and recognizing the emotional information it carries is an important challenge in artificial intelligence. To better capture emotional features in speech signals, a parallelized convolutional recurrent neural network (PCRN) with spectral features is proposed for speech emotion recognition. First, frame-level features are extracted from each utterance, and a long short-term memory (LSTM) network learns these features frame by frame. In parallel, the deltas and delta-deltas of the log Mel-spectrogram are computed and stacked with the static spectrogram into three channels (static, delta, and delta-delta); a convolutional neural network (CNN) learns these 3-D features. The two learned high-level representations are then fused and batch normalized. Finally, a softmax classifier predicts the emotion. The PCRN processes the two feature types simultaneously in parallel, allowing it to better capture subtle emotional variations. Experimental results on four public datasets show that the proposed method outperforms previous works.
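To make the described architecture concrete, the following is a minimal PyTorch sketch of the parallel structure outlined in the abstract: an LSTM branch over frame-level features, a CNN branch over the 3-channel (static, delta, delta-delta) log Mel-spectrogram, feature fusion with batch normalization, and a softmax classifier. All layer sizes, kernel sizes, the number of Mel bands, and the choice of framework are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class PCRN(nn.Module):
    """Sketch of a parallelized convolutional recurrent network:
    an LSTM branch over frame-level features and a CNN branch over a
    3-channel (static / delta / delta-delta) log Mel-spectrogram,
    fused, batch-normalized, and classified. Hyperparameters are
    illustrative, not taken from the paper."""

    def __init__(self, frame_dim=39, n_classes=4, lstm_hidden=128, cnn_out=128):
        super().__init__()
        # Recurrent branch: learns frame-level features frame by frame.
        self.lstm = nn.LSTM(frame_dim, lstm_hidden, batch_first=True)
        # Convolutional branch: learns the 3-D spectral representation.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                      # -> (B, 64, 1, 1)
        )
        self.cnn_fc = nn.Linear(64, cnn_out)
        # Fuse the two high-level features, then batch-normalize.
        self.bn = nn.BatchNorm1d(lstm_hidden + cnn_out)
        self.classifier = nn.Linear(lstm_hidden + cnn_out, n_classes)

    def forward(self, frames, spec3):
        # frames: (B, T, frame_dim); spec3: (B, 3, n_mels, T)
        _, (h_n, _) = self.lstm(frames)
        h_rnn = h_n[-1]                                   # (B, lstm_hidden)
        h_cnn = self.cnn_fc(self.cnn(spec3).flatten(1))   # (B, cnn_out)
        fused = self.bn(torch.cat([h_rnn, h_cnn], dim=1))
        return self.classifier(fused)                     # logits; softmax applied at loss/inference time

# Example forward pass with random tensors standing in for one batch.
model = PCRN()
frames = torch.randn(8, 300, 39)      # 8 utterances, 300 frames, 39-dim frame features (assumed)
spec3 = torch.randn(8, 3, 64, 300)    # static / delta / delta-delta channels, 64 Mel bands (assumed)
probs = torch.softmax(model(frames, spec3), dim=1)   # emotion posteriors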
Citation:
Jiang, P., Fu, H., Tao, H., Lei, P., & Zhao, L. (2019). Parallelized Convolutional Recurrent Neural Network with Spectral Features for Speech Emotion Recognition. IEEE Access, 7, 90368–90377. https://doi.org/10.1109/ACCESS.2019.2927384