Along with automatic speech recognition, many researchers have been actively studying speech emotion recognition, since emotion information is as crucial as textual information for effective interactions. Emotion can be described either categorically or dimensionally. Although categorical emotion is widely used, dimensional emotion, typically represented as arousal and valence, provides more detailed information about the emotional state. Therefore, in this paper, we propose a Conformer-based model for arousal and valence recognition. Our model uses a Conformer as the encoder, a fully connected layer as the decoder, and statistical pooling layers as the connector. In addition, we adopt multi-task learning and multi-feature combination, which have shown remarkable performance for speech emotion recognition and time-series analysis, respectively. The proposed model achieves a state-of-the-art recognition accuracy of 70.0 ± 1.5% for arousal in terms of unweighted accuracy on the IEMOCAP dataset.
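A minimal sketch of the architecture described in the abstract is given below, using PyTorch and torchaudio's Conformer implementation: a Conformer encoder, statistical pooling as the connector, and separate fully connected heads for arousal and valence as the multi-task decoder. The feature dimensions, number of layers, number of classes, choice of input features, and loss weighting are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a multi-task Conformer for arousal/valence recognition.
# Dimensions, hyperparameters, and the multi-feature combination scheme
# are assumptions for illustration; they are not specified in the abstract.
import torch
import torch.nn as nn
from torchaudio.models import Conformer


class MultiTaskConformerSER(nn.Module):
    def __init__(self, input_dim=120, num_classes=3):
        super().__init__()
        # Encoder: Conformer blocks over frame-level acoustic features.
        self.encoder = Conformer(
            input_dim=input_dim,
            num_heads=4,
            ffn_dim=256,
            num_layers=4,
            depthwise_conv_kernel_size=31,
        )
        # Decoder: one fully connected head per task (arousal, valence).
        self.arousal_head = nn.Linear(2 * input_dim, num_classes)
        self.valence_head = nn.Linear(2 * input_dim, num_classes)

    @staticmethod
    def _stat_pool(x):
        # Connector: statistical pooling (mean + std over time) turns the
        # variable-length sequence into a fixed-size utterance embedding.
        # x: (batch, time, dim) -> (batch, 2 * dim)
        return torch.cat([x.mean(dim=1), x.std(dim=1)], dim=-1)

    def forward(self, features, lengths):
        # features: (batch, time, input_dim). The multi-feature combination
        # is assumed here to be a concatenation of per-frame features
        # (e.g., MFCCs and mel-spectrogram bins) along the last axis.
        encoded, _ = self.encoder(features, lengths)
        pooled = self._stat_pool(encoded)
        return self.arousal_head(pooled), self.valence_head(pooled)


# Toy usage: 8 utterances, 300 frames, 120-dim combined features,
# 3 classes per emotional dimension (e.g., low / neutral / high).
model = MultiTaskConformerSER()
feats = torch.randn(8, 300, 120)
lengths = torch.full((8,), 300)
arousal_logits, valence_logits = model(feats, lengths)

# Multi-task objective: an (assumed) equal-weighted sum of two
# cross-entropy losses, one per task.
criterion = nn.CrossEntropyLoss()
arousal_labels = torch.randint(0, 3, (8,))
valence_labels = torch.randint(0, 3, (8,))
loss = criterion(arousal_logits, arousal_labels) + criterion(valence_logits, valence_labels)
```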
CITATION
Seo, J., & Lee, B. (2022). Multi-Task Conformer with Multi-Feature Combination for Speech Emotion Recognition. Symmetry, 14(7). https://doi.org/10.3390/sym14071428