Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition


Abstract

In this paper, we investigate the problem of cross-corpus speech emotion recognition (SER), in which the training (source) and testing (target) speech samples belong to different corpora. As a result, the feature distributions of the source and target samples do not match, and the performance of most existing SER methods drops sharply. To solve this problem, we propose a simple yet effective transfer subspace learning method called joint distribution implicitly aligned subspace learning (JIASL). The basic idea of JIASL is straightforward: build an emotion-discriminative and corpus-invariant linear regression model under an implicit distribution alignment strategy. Following this idea, we first use the source speech features and emotion labels to give the regression model emotion-discriminative ability. Then, a well-designed reconstruction regularization term, which jointly considers the marginal and conditional distribution alignments between the speech samples of both corpora, is adopted to implicitly enable the regression model to predict the emotion labels of the target speech samples. To evaluate JIASL, we carry out extensive cross-corpus SER experiments, and the results demonstrate its promising performance on cross-corpus SER tasks.
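The abstract does not state JIASL's objective function, so the following is only a rough illustration of the ingredients it names: a linear regression from speech features to emotion labels, plus marginal and conditional distribution alignment between the source and target corpora. It is a minimal NumPy sketch under stated assumptions, not the authors' method. In particular, it swaps in an explicit, JDA-style mean-matching penalty (with target pseudo-labels for the conditional part) in place of JIASL's implicit reconstruction regularizer. The function names `fit_aligned_regression` and `predict`, the weights `lam` and `mu`, and the pseudo-labeling loop are all hypothetical.

```python
import numpy as np


def fit_aligned_regression(Xs, ys, Xt, n_classes, lam=1.0, mu=10.0, n_iter=5):
    """Hypothetical sketch (NOT the JIASL objective).

    Minimizes ||Ys - Xs U||_F^2 + lam ||U||_F^2 + mu * sum_k ||delta_k^T U||^2,
    where delta_0 is the source/target mean difference (marginal alignment) and
    delta_k are per-class mean differences computed with target pseudo-labels
    (conditional alignment). Setting the gradient to zero gives the closed form
    (Xs^T Xs + lam I + mu M) U = Xs^T Ys, with M = sum_k delta_k delta_k^T.
    Rows of Xs / Xt are samples; ys holds integer emotion labels.
    """
    d = Xs.shape[1]
    Ys = np.eye(n_classes)[ys]          # one-hot source labels, shape (n_s, c)
    A = Xs.T @ Xs + lam * np.eye(d)     # data-fit term plus ridge term
    B = Xs.T @ Ys
    U = np.linalg.solve(A, B)           # source-only initialization
    for _ in range(n_iter):
        yt = np.argmax(Xt @ U, axis=1)  # current target pseudo-labels
        # Marginal alignment: difference of the overall feature means.
        deltas = [Xs.mean(axis=0) - Xt.mean(axis=0)]
        # Conditional alignment: per-class mean differences (skip empty classes).
        for k in range(n_classes):
            if np.any(ys == k) and np.any(yt == k):
                deltas.append(Xs[ys == k].mean(axis=0) - Xt[yt == k].mean(axis=0))
        M = sum(np.outer(v, v) for v in deltas)
        U = np.linalg.solve(A + mu * M, B)
    return U


def predict(U, X):
    """Assign each sample to the emotion class with the largest regression output."""
    return np.argmax(X @ U, axis=1)
```

Setting `mu=0.0` recovers plain ridge regression on the source corpus, which is a convenient baseline for checking how much the alignment penalty shrinks the projected corpus gap `(Xs.mean(0) - Xt.mean(0)) @ U`.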

Citation (APA)

Lu, C., Zong, Y., Tang, C., Lian, H., Chang, H., Zhu, J., … Zhao, Y. (2022). Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition. Electronics (Switzerland), 11(17). https://doi.org/10.3390/electronics11172745
