A novel heterogeneous parallel convolution Bi-LSTM for speech emotion recognition

Huiyun Zhang; Heming Huang; Henry Han

Journal Article (Open Access)

Applied Sciences (Switzerland) (2021) 11(21)

DOI: 10.3390/app11219897

32 Citations

22 Readers

Abstract

Speech emotion recognition is a substantial component of natural language processing (NLP). It places strict demands on both feature extraction and the acoustic model. To address these challenges, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed. It consists of two heterogeneous branches: the left branch contains two dense layers and a Bi-LSTM layer, while the right branch contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model exploits spatiotemporal information more effectively and achieves unweighted average recalls of 84.65%, 79.67%, and 56.50% on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previously reported results, the proposed model performs consistently better.
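
The unweighted average recall (UAR) cited in the abstract is conventionally the mean of the per-class recalls, so each emotion class counts equally no matter how many utterances it contains. The following is a minimal sketch under that standard definition; the function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls (assumed standard UAR definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # utterances per true class
    correct = defaultdict(int)  # correctly recognized utterances per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed only over classes that occur in y_true.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels: four "angry" utterances and one "happy" utterance.
y_true = ["angry", "angry", "angry", "angry", "happy"]
y_pred = ["angry", "angry", "angry", "angry", "sad"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the plain accuracy is 0.80 while the UAR is 0.50, because the single misrecognized "happy" utterance pulls that class's recall to zero. This is why UAR is often preferred on emotion corpora with imbalanced classes.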

Citation (APA)

Zhang, H., Huang, H., & Han, H. (2021). A novel heterogeneous parallel convolution Bi-LSTM for speech emotion recognition. Applied Sciences (Switzerland), 11(21). https://doi.org/10.3390/app11219897
