Improving multi-party social interaction with artificial companions such as robots or virtual agents requires real-time Emotion Recognition in Conversation (ERC). In this context, ERC is a challenging task: it involves processing multimodal data over time, accounting for a multi-party context with any number of participants, capturing the commonsense knowledge implied during interaction, and modeling each participant's emotional attitude. To address these challenges, we design a multimodal off-the-shelf model that meets the requirements of real-life scenarios, specifically dyadic and multi-party interactions. We propose a Knowledge Aware Multi-Headed Network that integrates various information sources, including the dialog history and commonsense knowledge about the speaker and the other participants. The contributions of these sources are weighted by a multi-head attention mechanism. The model is trained in a Multi-Task Learning framework that combines the ERC task with a Dialogue Act (DA) recognition task and an Emotion Shift (ES) detection task through a joint learning strategy. Our approach achieves competitive and stable results on several benchmark datasets that vary in number of participants and conversation length, and outperforms the state of the art on one of these datasets. We also investigate the importance of DA and ES prediction in determining the speaker's current emotional state.
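The abstract describes two key ingredients: a multi-head attention mechanism that weights the different information sources (dialog history, speaker and participant commonsense knowledge) for the current utterance, and a joint loss combining ERC with DA recognition and ES detection. The sketch below is only an illustration of that general scheme in PyTorch; the dimensions, loss weights, class counts, and all module names (e.g. `KnowledgeAwareFusion`, `joint_loss`) are assumptions and do not come from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeAwareFusion(nn.Module):
    """Illustrative sketch: the current utterance attends over the other
    information sources (dialog history, commonsense knowledge about the
    speaker and the other participants), and task-specific heads predict
    emotion, dialogue act and emotion shift."""

    def __init__(self, d_model=256, n_heads=4, n_emotions=7, n_dialog_acts=12):
        super().__init__()
        # Multi-head attention modulates the weight given to each source.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.erc_head = nn.Linear(d_model, n_emotions)    # emotion recognition
        self.da_head = nn.Linear(d_model, n_dialog_acts)  # dialogue act
        self.es_head = nn.Linear(d_model, 2)               # emotion shift (yes/no)

    def forward(self, utterance, sources):
        # utterance: (batch, 1, d_model); sources: (batch, n_sources, d_model)
        fused, _ = self.attn(utterance, sources, sources)
        h = fused.squeeze(1)
        return self.erc_head(h), self.da_head(h), self.es_head(h)

def joint_loss(erc_logits, da_logits, es_logits, erc_y, da_y, es_y,
               w_erc=1.0, w_da=0.5, w_es=0.5):
    # Joint multi-task objective; the task weights here are illustrative.
    return (w_erc * F.cross_entropy(erc_logits, erc_y)
            + w_da * F.cross_entropy(da_logits, da_y)
            + w_es * F.cross_entropy(es_logits, es_y))
```

In this reading, the auxiliary DA and ES heads share the fused representation with the ERC head, so gradients from all three tasks shape how the attention weights the dialog history and commonsense sources.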
CITATION STYLE
Rasendrasoa, S., Pauchet, A., Saunier, J., & Adam, S. (2022). Real-Time Multimodal Emotion Recognition in Conversation for Multi-Party Interactions. In ACM International Conference Proceeding Series (pp. 395–403). Association for Computing Machinery. https://doi.org/10.1145/3536221.3556601