Understanding human emotion is vital for communicating effectively with others, monitoring patients, analysing behaviour, and keeping an eye on those who are vulnerable. Emotion recognition is essential to achieving a complete human-machine interoperability experience. In recent years, artificial intelligence, mainly machine learning (ML), has been used to improve models that recognise emotions from a single type of data. In this work, a multimodal system is proposed that uses text, facial expressions, and speech signals to identify emotions. In the proposed model, the MobileNet architecture is used to predict emotion from facial expressions, and different ML classifiers are used to predict emotion from text and speech signals. The Facial Expression Recognition 2013 (FER2013) dataset was used to recognise emotion from facial expressions, whilst the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset was used for both text and speech emotion recognition. The proposed ensemble technique, consisting of random forest, extreme gradient boosting, and a multi-layer perceptron, achieves an accuracy of 70.67%, which is better than the unimodal approaches used.
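The ensemble described above can be sketched with scikit-learn's soft-voting combiner. This is a minimal illustration, not the authors' implementation: synthetic features stand in for the real text, speech, and facial features, and `GradientBoostingClassifier` is used as a stand-in for extreme gradient boosting (XGBoost) to avoid an extra dependency.

```python
# Hedged sketch of a random forest + gradient boosting + MLP ensemble,
# analogous to the one described in the abstract. All data here is
# synthetic; GradientBoostingClassifier substitutes for XGBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Placeholder 4-class problem standing in for fused multimodal features.
X, y = make_classification(n_samples=400, n_features=20, n_classes=4,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                              random_state=0)),
    ],
    voting="soft",  # average the predicted class probabilities
)
ensemble.fit(X_tr, y_tr)
print(f"held-out accuracy: {ensemble.score(X_te, y_te):.2f}")
```

Soft voting averages each classifier's predicted class probabilities, so a confident model can outvote two uncertain ones; hard voting (majority of predicted labels) is the common alternative.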
Citation
Shahriar, M. F., Arnab, M. S. A., Khan, M. S., Rahman, S. S., Mahmud, M., & Kaiser, M. S. (2023). Towards Machine Learning-Based Emotion Recognition from Multimodal Data. In Lecture Notes in Networks and Systems (Vol. 519 LNNS, pp. 99–109). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-5191-6_9