Audio visual emotion recognition based on triple-stream dynamic Bayesian network models


Abstract

We present a triple-stream DBN model (T-AsyDBN) for audio-visual emotion recognition, in which the two audio feature streams are synchronous with each other but asynchronous with the visual feature stream, within controllable constraints. MFCC features and the principal component analysis (PCA) coefficients of local prosodic features are used for the audio streams. For the visual stream, 2D facial features and 3D facial animation unit features are defined and concatenated, and their dimensionality is reduced by PCA. Emotion recognition experiments on the eNTERFACE'05 database show that by adjusting the asynchrony constraint, the proposed T-AsyDBN model achieves a correct recognition rate 18.73% higher than the traditional multi-stream state-synchronous HMM (MSHMM) and 10.21% higher than the two-stream asynchronous DBN model (Asy-DBN). © 2011 Springer-Verlag.
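
The abstract gives no implementation details, so the following is only a minimal, hypothetical sketch of the feature preparation it describes, not the authors' code. The use of librosa and scikit-learn, the shared hop size, the stand-in prosodic set (pitch, energy, and their deltas), and all PCA dimensionalities are assumptions for illustration.

```python
# Hypothetical sketch of the T-AsyDBN feature streams described in the abstract.
# Assumptions: librosa for audio features, scikit-learn for PCA, placeholder
# prosodic and facial feature sets. This is NOT the authors' implementation.
import numpy as np
import librosa
from sklearn.decomposition import PCA

def audio_streams(wav_path, n_mfcc=13):
    """Two frame-synchronous audio streams: MFCCs and PCA-reduced prosody."""
    y, sr = librosa.load(wav_path, sr=None)
    hop = 512  # shared hop length keeps the two audio streams synchronous
    # Stream 1: frame-level MFCC features, shape (frames, n_mfcc).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop).T
    # Stand-in "local prosodic features": pitch, energy, and their deltas
    # (the abstract does not list the exact prosodic set).
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr, hop_length=hop)
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    n = min(mfcc.shape[0], len(f0), len(rms))  # guard against off-by-one frames
    prosody = np.column_stack([f0[:n], rms[:n],
                               np.gradient(f0[:n]), np.gradient(rms[:n])])
    # Stream 2: PCA coefficients of the local prosodic features.
    prosody_pca = PCA(n_components=2).fit_transform(prosody)
    return mfcc[:n], prosody_pca

def visual_stream(feats_2d, feats_3d_fau, n_components=10):
    """Visual stream: concatenate 2D facial and 3D facial animation unit
    features per video frame, then reduce the dimensionality by PCA."""
    concat = np.hstack([feats_2d, feats_3d_fau])
    return PCA(n_components=n_components).fit_transform(concat)
```

In the model itself, the two audio streams would then share one state sequence while the visual stream keeps its own, with the DBN's asynchrony constraint bounding how far the two state sequences may drift apart.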


CITATION STYLE

APA

Jiang, D., Cui, Y., Zhang, X., Fan, P., Gonzalez, I., & Sahli, H. (2011). Audio visual emotion recognition based on triple-stream dynamic Bayesian network models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6974 LNCS, pp. 609–618). https://doi.org/10.1007/978-3-642-24600-5_64
