In spontaneous face-to-face communication, emotional expressions are often mixed with speech-related facial movements, which makes emotion recognition difficult. This article introduces methods for reducing the influence of the utterance on visual parameters in audio-visual emotion recognition. The audio and visual channels are first combined under a Multistream Hidden Markov Model (MHMM). Utterance reduction is then performed by computing the residual between the observed visual parameters and the utterance-related visual parameters predicted from the audio. To obtain these predictions, the article introduces a Fused Hidden Markov Model inversion method trained on a neutral-expression audio-visual corpus. To reduce computational complexity, the inversion model is further simplified to a Gaussian Mixture Model (GMM) mapping. Compared with traditional bimodal emotion recognition methods (e.g., SVM, CART, boosting), the utterance reduction method yields better emotion recognition results. Experiments also show the effectiveness of the emotion recognition system when used in a live environment.

Keywords: bimodal emotion recognition, utterance independence, Multistream Hidden Markov Model, Fused Hidden Markov Model inversion
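To make the residual step concrete, the sketch below illustrates a GMM-based audio-to-visual mapping of the kind the abstract describes as a simplification of fused HMM inversion. It is a minimal illustration, not the authors' implementation: the feature dimensions, the number of mixture components, and the use of scikit-learn are all assumptions. A joint GMM is fit on paired audio-visual vectors from a neutral-expression corpus; the utterance-related visual parameters are then predicted as the GMM conditional expectation E[v | a], and subtracting this prediction from the observed visual parameters leaves a residual that carries mainly expression-related variation.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Hypothetical dimensions: MFCC-like audio features and low-dimensional
# visual (e.g. lip/face) parameters. These are illustrative choices,
# not the paper's actual feature set.
DA, DV = 13, 10

def fit_joint_gmm(audio, visual, n_components=8):
    """Fit a GMM on joint [audio, visual] vectors drawn from a
    neutral-expression corpus (the training setup the abstract
    describes for the mapping)."""
    joint = np.hstack([audio, visual])  # shape (N, DA + DV)
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(joint)

def predict_visual(gmm, audio):
    """Conditional expectation E[v | a] under the joint GMM: the
    utterance-related component of the visual parameters."""
    n = audio.shape[0]
    log_w = np.zeros((n, gmm.n_components))
    cond = []
    for k in range(gmm.n_components):
        mu_a, mu_v = gmm.means_[k, :DA], gmm.means_[k, DA:]
        S = gmm.covariances_[k]
        S_aa, S_va = S[:DA, :DA], S[DA:, :DA]
        # Per-component conditional mean: mu_v + S_va S_aa^{-1} (a - mu_a)
        gain = S_va @ np.linalg.inv(S_aa)
        cond.append(mu_v + (audio - mu_a) @ gain.T)
        # Responsibility of component k given the audio observation
        log_w[:, k] = (np.log(gmm.weights_[k]) +
                       multivariate_normal.logpdf(audio, mu_a, S_aa))
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    pred = np.zeros((n, DV))
    for k in range(gmm.n_components):
        pred += w[:, k:k + 1] * cond[k]
    return pred

# Utterance reduction: subtract the speech-driven prediction so the
# residual is used as the expression cue for emotion recognition.
# residual = visual - predict_visual(gmm, audio)
```

The design choice this captures is the one the abstract motivates: the closed-form GMM conditional mean replaces the iterative fused-HMM inversion, trading some modeling fidelity for much lower computational cost at recognition time.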
CITATION STYLE
Tao, J., Pan, S., Yang, M., Li, Y., Mu, K., & Che, J. (2011). Utterance independent bimodal emotion recognition in spontaneous communication. EURASIP Journal on Advances in Signal Processing, 2011(1). https://doi.org/10.1186/1687-6180-2011-4