This paper proposes a new task of Multi-modal Multi-emotion Emotional Support Conversation (MMESC), which has great value in applications such as counseling, daily chatting, and elderly companionship. The task aims to fully perceive users' emotional states from multiple modalities and to generate appropriate responses that comfort users and improve their feelings. Prior work focuses mainly on textual conversation, but a single modality cannot accurately reflect a user's emotions — for example, a user may say "fine" while showing an inconsistent expression of disgust. To address this problem, we formulate the task over multiple modalities and propose a new method called FEAT. FEAT integrates fine-grained emotional knowledge from multiple modalities: it first recognizes the user's mental state with an emotion-aware transformer, and then generates supportive responses using a hybrid method with multiple comfort strategies. To evaluate our method, we construct a large-scale dataset named MMESConv, which is almost twice the size of existing single-modal datasets. It covers three modalities (text, audio, and video) with fine-grained emotion annotations and strategy labels. Extensive experiments on this dataset demonstrate the advantages of our proposed framework.
CITATION STYLE
Liu, G., Dong, X., Wang, M., Yu, J., Gan, M., Liu, W., & Yin, J. (2023). Multi-modal Multi-emotion Emotional Support Conversation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14176 LNAI, pp. 293–308). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-46661-8_20