Abstract
Speech emotion recognition (SER) has long been a challenging task due to the complexity of emotional expression. In this paper, we propose a multitask deep learning approach based on a cascaded attention network and a self-adaption loss for SER. First, non-personalized features are extracted to represent the process of emotion change while reducing the influence of external variables. Second, to highlight salient speech emotion features, a cascaded attention network is proposed, where spatial-temporal attention effectively locates the regions of speech that express emotion, while self-attention reduces the dependence on external information. Finally, the influence of differences in gender and in human perception of external information is alleviated by a multitask learning strategy, where a self-adaption loss is introduced to dynamically determine the weights of the different tasks. Experimental results on the IEMOCAP dataset demonstrate that our method gains an absolute improvement of 1.97% and 0.91% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
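The abstract does not give the exact form of the self-adaption loss, but dynamic task weighting in multitask learning is commonly realized with learned per-task log-variance terms (uncertainty weighting). The sketch below is a hypothetical illustration of that idea, not the paper's actual formulation; the function name and the example loss values are invented for demonstration.

```python
import math

def self_adaption_loss(task_losses, log_vars):
    """Combine per-task losses with dynamically weighted terms.

    Hypothetical sketch: each task i contributes exp(-s_i) * L_i + s_i,
    where s_i is a learnable log-variance. Tasks whose s_i grows are
    down-weighted, so the balance between tasks adapts during training.
    """
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + s
    return total

# Example: an emotion-recognition loss and an auxiliary gender-classification
# loss, with both log-variances initialized to zero (weights of 1).
combined = self_adaption_loss([1.2, 0.4], [0.0, 0.0])
print(combined)  # 1.6
```

In a real training loop the `log_vars` would be trainable parameters updated by backpropagation alongside the network weights, so the task weighting adjusts itself rather than being hand-tuned.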
Liu, Y., Xia, Y., Sun, H., Meng, X., Bai, J., Guan, W., … Li, Y. (2023). A Multitask Learning Approach Based on Cascaded Attention Network and Self-Adaption Loss for Speech Emotion Recognition. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E106–A(6), 876–885. https://doi.org/10.1587/transfun.2022EAP1091