Emotion detection in conversation has many applications, including humanizing chatbots, gauging public opinion on social media, medical counseling, security systems, and interactive computer simulations. Because humans express emotion not only through their words but also through their tone and facial expressions, we use features from three modalities (text, audio and video) and evaluate different fusion techniques for combining the models. We propose a new architecture designed specifically for dyadic conversation, in which each participant is modelled by a separate network; the two networks exchange emotion context, in effect holding a conversation with each other. We refine this model using teacher forcing.
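The dyadic design described above can be sketched in a few lines of pure Python. This is a hypothetical illustration, not the authors' implementation: the feature sizes, the random weights, and the simple `tanh` recurrent update are all assumptions chosen to keep the example self-contained. It shows the two ideas the abstract names: feature-level fusion of the three modalities by concatenation, and two per-speaker cells that each read the partner's hidden state (the exchanged "emotion context") at every turn.

```python
import math
import random

random.seed(0)

def fuse(text_feat, audio_feat, video_feat):
    # Feature-level fusion: concatenate the per-modality vectors.
    return text_feat + audio_feat + video_feat

DIM = 4   # hypothetical hidden-state size per speaker
IN = 6    # fused feature size in this toy example (2 dims per modality)

# Toy random weights; a real model would learn these by backpropagation.
W = [[random.uniform(-0.5, 0.5) for _ in range(IN + 2 * DIM)]
     for _ in range(DIM)]

def step(fused, h_self, h_other):
    # Each speaker's cell conditions on its own state AND the partner's
    # state -- the "emotion context" exchanged between the two networks.
    x = fused + h_self + h_other
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]

# Toy dyadic conversation: alternating utterances with 2-dim
# text / audio / video features each (made-up numbers).
utterances = [("A", [0.1, 0.2], [0.0, 0.1], [0.3, 0.0]),
              ("B", [0.5, 0.1], [0.2, 0.2], [0.1, 0.4])]

h = {"A": [0.0] * DIM, "B": [0.0] * DIM}
for speaker, t, a, v in utterances:
    other = "B" if speaker == "A" else "A"
    h[speaker] = step(fuse(t, a, v), h[speaker], h[other])

print(len(h["A"]), len(h["B"]))  # each speaker carries a 4-dim emotion state
```

In a full model, each per-speaker state would feed a classifier over emotion labels, and during training the true label of the previous utterance could be fed back in place of the prediction (teacher forcing).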
Shah, P., Raj, P. P., Suresh, P., & Das, B. (2021). Contextually Aware Multimodal Emotion Recognition. In Advances in Intelligent Systems and Computing (Vol. 1245, pp. 745–753). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-7234-0_71