Emotional states play an important role in Human-Computer Interaction. An emotion recognition framework is proposed that extracts and fuses features from both video sequences and speech signals. The framework comprises two Hidden Markov Models (HMMs), one trained to recognize emotional states from video and one from audio, with an Artificial Neural Network (ANN) serving as the fusion mechanism. Two key stages feed the HMMs: extraction of Facial Animation Parameters (FAPs) from video sequences based on an Active Appearance Model (AAM), and extraction of pitch and energy features from speech signals. Experiments indicate that the proposed approach achieves better performance and robustness than methods using video or audio alone. © Springer-Verlag Berlin Heidelberg 2012.
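The abstract's pipeline, two modality-specific HMMs whose per-class scores are fused by an ANN, can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the emotion label set, the mock HMM log-likelihoods, and the fixed fusion weights are all assumptions (a real ANN would learn its weights from training data, and real HMMs would be trained on FAP and pitch/energy feature sequences).

```python
import math

# Hypothetical emotion classes; the abstract does not list the actual label set.
EMOTIONS = ["happy", "sad", "angry", "neutral"]

def softmax(scores):
    """Convert per-class HMM log-likelihoods into a probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ann_fuse(video_probs, audio_probs, weights, bias):
    """Single-layer ANN fusion: one sigmoid output unit per emotion over the
    concatenated video+audio probability vectors. Weights are assumed to be
    pre-trained; here they are illustrative constants."""
    x = video_probs + audio_probs
    out = []
    for w_row, b in zip(weights, bias):
        z = sum(wi * xi for wi, xi in zip(w_row, x)) + b
        out.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation
    return out

# Mock per-class log-likelihoods from the two modality-specific HMMs.
video_ll = [-10.2, -12.5, -11.0, -13.1]   # video HMM (FAP features via AAM)
audio_ll = [-9.8, -11.9, -12.4, -12.0]    # audio HMM (pitch/energy features)

v = softmax(video_ll)
a = softmax(audio_ll)

# Illustrative fusion weights that simply sum the two modalities per class
# (a trained ANN would instead weight modalities by their reliability).
n = len(EMOTIONS)
weights = [[2.0 if j % n == i else 0.0 for j in range(2 * n)] for i in range(n)]
bias = [0.0] * n

scores = ann_fuse(v, a, weights, bias)
predicted = EMOTIONS[scores.index(max(scores))]
print(predicted)
```

Because fusion happens at the decision level, a modality that is noisy in a given segment (e.g., occluded face, silent audio) contributes a flatter probability vector and naturally carries less weight in the fused score.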
Xu, C., Cao, T., Feng, Z., & Dong, C. (2013). Multi-modal fusion emotion recognition based on HMM and ANN. Communications in Computer and Information Science, 332, 541–550. https://doi.org/10.1007/978-3-642-34447-3_48