Audio visual emotion recognition based on triple-stream dynamic Bayesian network models


Abstract

We present a triple-stream DBN model (T-AsyDBN) for audio-visual emotion recognition, in which the two audio feature streams are synchronous with each other but asynchronous with the visual feature stream, within controllable constraints. MFCC features and the principal component analysis (PCA) coefficients of local prosodic features are used for the audio streams. For the visual stream, 2D facial features and 3D facial animation unit features are defined and concatenated, and their dimensionality is reduced by PCA. Emotion recognition experiments on the eNTERFACE'05 database show that by adjusting the asynchrony constraint, the proposed T-AsyDBN model achieves a correct recognition rate 18.73% higher than the traditional multi-stream state-synchronous HMM (MSHMM) and 10.21% higher than the two-stream asynchronous DBN model (Asy-DBN). © 2011 Springer-Verlag.
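
The abstract gives no implementation details, so the following is only a minimal, hypothetical sketch of the feature preparation it describes, not the authors' code. The use of librosa and scikit-learn, the shared hop size, the stand-in prosodic set (pitch, energy, and their deltas), and all PCA dimensionalities are assumptions for illustration.

```python
# Hypothetical sketch of the T-AsyDBN feature streams described in the abstract.
# Assumptions: librosa for audio features, scikit-learn for PCA, placeholder
# prosodic and facial feature sets. This is NOT the authors' implementation.
import numpy as np
import librosa
from sklearn.decomposition import PCA

def audio_streams(wav_path, n_mfcc=13):
    """Two frame-synchronous audio streams: MFCCs and PCA-reduced prosody."""
    y, sr = librosa.load(wav_path, sr=None)
    hop = 512  # shared hop length keeps the two audio streams synchronous
    # Stream 1: frame-level MFCC features, shape (frames, n_mfcc).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop).T
    # Stand-in "local prosodic features": pitch, energy, and their deltas
    # (the abstract does not list the exact prosodic set).
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr, hop_length=hop)
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    n = min(mfcc.shape[0], len(f0), len(rms))  # guard against off-by-one frames
    prosody = np.column_stack([f0[:n], rms[:n],
                               np.gradient(f0[:n]), np.gradient(rms[:n])])
    # Stream 2: PCA coefficients of the local prosodic features.
    prosody_pca = PCA(n_components=2).fit_transform(prosody)
    return mfcc[:n], prosody_pca

def visual_stream(feats_2d, feats_3d_fau, n_components=10):
    """Visual stream: concatenate 2D facial and 3D facial animation unit
    features per video frame, then reduce the dimensionality by PCA."""
    concat = np.hstack([feats_2d, feats_3d_fau])
    return PCA(n_components=n_components).fit_transform(concat)
```

In the model itself, the two audio streams would then share one state sequence while the visual stream keeps its own, with the DBN's asynchrony constraint bounding how far the two state sequences may drift apart.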


CITATION STYLE

APA

Jiang, D., Cui, Y., Zhang, X., Fan, P., Gonzalez, I., & Sahli, H. (2011). Audio visual emotion recognition based on triple-stream dynamic Bayesian network models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6974 LNCS, pp. 609–618). https://doi.org/10.1007/978-3-642-24600-5_64
