Multi-modal fusion emotion recognition based on HMM and ANN

Abstract

Emotional states play an important role in Human-Computer Interaction. An emotion recognition framework is proposed that extracts and fuses features from both video sequences and speech signals. The framework combines two Hidden Markov Models (HMMs), one estimating emotional states from video and one from audio, with an Artificial Neural Network (ANN) serving as the overall fusion mechanism. Two key stages feed the HMMs: extraction of Facial Animation Parameters (FAPs) from video sequences using an Active Appearance Model (AAM), and extraction of pitch and energy features from the speech signal. Experiments indicate that the proposed approach achieves better performance and robustness than methods using video or audio alone. © Springer-Verlag Berlin Heidelberg 2012.

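As a concrete illustration of the pipeline the abstract describes, below is a minimal sketch of likelihood-level fusion: one Gaussian HMM per emotion per modality scores each observation sequence, and an ANN maps the concatenated scores to a final label. The library choices (hmmlearn, scikit-learn), the emotion label set, the HMM topology, and all hyperparameters are assumptions for illustration, not details from the paper; FAP extraction via AAM and pitch/energy extraction are assumed to happen upstream.

```python
# Hypothetical sketch of HMM-per-modality scoring with ANN fusion.
# Library choices and hyperparameters are assumptions, not from the paper.
import numpy as np
from hmmlearn import hmm
from sklearn.neural_network import MLPClassifier

EMOTIONS = ["anger", "happiness", "sadness", "surprise"]  # placeholder label set


def train_modality_hmms(sequences_by_emotion, n_states=4):
    """Train one Gaussian HMM per emotion for a single modality.

    sequences_by_emotion: dict emotion -> list of (T_i, D) feature arrays,
    e.g. per-frame FAP vectors (video) or per-frame pitch/energy (audio).
    """
    models = {}
    for emotion, seqs in sequences_by_emotion.items():
        X = np.vstack(seqs)               # stack all frames into one matrix
        lengths = [len(s) for s in seqs]  # per-sequence frame counts
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[emotion] = m
    return models


def likelihood_vector(models, seq):
    """Per-emotion log-likelihoods of one observation sequence."""
    return np.array([models[e].score(seq) for e in EMOTIONS])


def fusion_features(video_models, audio_models, video_seq, audio_seq):
    """Concatenate the two modality score vectors as ANN input."""
    return np.concatenate([likelihood_vector(video_models, video_seq),
                           likelihood_vector(audio_models, audio_seq)])


# The fusion ANN maps the 2 * len(EMOTIONS) likelihoods to a final label.
fusion_ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
# fusion_ann.fit(np.stack(training_vectors), training_labels)
# predicted = fusion_ann.predict(fusion_features(...).reshape(1, -1))
```

Scoring each modality separately and fusing at the likelihood level (rather than concatenating raw features) lets the ANN weigh how informative each channel is per utterance, which is one plausible reading of the claimed robustness over single-modality methods.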
Citation (APA)

Xu, C., Cao, T., Feng, Z., & Dong, C. (2013). Multi-modal fusion emotion recognition based on HMM and ANN. Communications in Computer and Information Science, 332, 541–550. https://doi.org/10.1007/978-3-642-34447-3_48
