Cross-corpus speech emotion recognition based on deep domain-adaptive convolutional neural network

Jiateng Liu; Wenming Zheng; Yuan Zong; Cheng Lu; Chuangao Tang

Journal Article, Open Access

IEICE Transactions on Information and Systems (2020) E103D(2) 459-463

DOI: 10.1587/transinf.2019EDL8136

Abstract

In this letter, we propose a novel deep domain-adaptive convolutional neural network (DDACNN) model to handle the challenging cross-corpus speech emotion recognition (SER) problem. The DDACNN framework consists of two components: a feature extraction model based on a deep convolutional neural network (DCNN), and a domain-adaptive (DA) layer added to the DCNN that uses the maximum mean discrepancy (MMD) criterion. We feed labeled spectrograms from the source speech corpus, together with unlabeled spectrograms from the target speech corpus, into two classic DCNNs to extract the emotional features of speech, and train the model with a mixed loss that combines a cross-entropy loss and an MMD loss. Compared with other classic cross-corpus SER methods, the major advantage of the DDACNN model is that it extracts robust time-frequency speech features from spectrograms and narrows the discrepancy between the feature distributions of the source and target corpora, which yields better cross-corpus performance. In several cross-corpus SER experiments, DDACNN achieved state-of-the-art performance on three public emotional speech corpora and is shown to handle the cross-corpus SER problem efficiently.
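
The mixed loss described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the kernel, the weighting between the two terms, or which layer's features are compared. The Gaussian kernel, its bandwidth `sigma`, the trade-off weight `lam`, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    # Biased estimate of the squared MMD between two feature batches.
    # The Gaussian kernel is an assumption; the abstract names only "MMD".
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def mixed_loss(src_logits, src_labels, src_feats, tgt_feats, lam=1.0, sigma=1.0):
    # Classification loss uses labeled source samples only; the MMD term
    # compares source and target features and needs no target labels.
    # lam is a hypothetical trade-off weight, not a value from the letter.
    return cross_entropy(src_logits, src_labels) + lam * mmd2(src_feats, tgt_feats, sigma)
```

In this sketch, `src_feats` and `tgt_feats` stand for the activations of the domain-adaptive layer for a source batch and a target batch. Minimizing the MMD term pulls the two feature distributions together, while the cross-entropy term keeps the source features discriminative for emotion labels.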

Liu, J., Zheng, W., Zong, Y., Lu, C., & Tang, C. (2020). Cross-corpus speech emotion recognition based on deep domain-adaptive convolutional neural network. IEICE Transactions on Information and Systems, E103D(2), 459–463. https://doi.org/10.1587/transinf.2019EDL8136
