In this study, we present a fusion model for emotion recognition based on visual data. The proposed model takes video as input and generates an emotion label for each video sample. From the video frames, we first select the most significant face regions through a face detection and selection step. We then employ three CNN-based architectures to extract high-level features from the face image sequence. In addition, we attach one extra module to each CNN-based architecture to capture the sequential information across the entire video. Combining the three CNN-based models in a late-fusion approach yields results competitive with the baseline on two public datasets: AFEW 2016 and SAVEE.
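The abstract does not name the three CNN backbones, the per-branch sequential modules, or the fusion rule, so the Python sketch below is only an illustration of the described pipeline under stated assumptions, not the authors' implementation: it uses a torchvision ResNet-18 as a stand-in backbone, an LSTM as the sequential module, averaged softmax scores as the late fusion, and seven emotion classes, and it assumes face detection and selection have already produced the cropped face sequence.

import torch
import torch.nn as nn
from torchvision import models

class CnnLstmBranch(nn.Module):
    # One branch: a CNN extracts per-frame features, an LSTM summarizes them.
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)   # placeholder backbone (assumption)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # keep the feature vector, drop the classifier
        self.cnn = backbone
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W) -- a sequence of cropped face images
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (last_hidden, _) = self.rnn(feats)      # temporal summary of the clip
        return self.head(last_hidden[-1])          # per-branch class scores

class LateFusionModel(nn.Module):
    # Three CNN+LSTM branches whose softmax scores are averaged (late fusion).
    def __init__(self, num_classes: int = 7, num_branches: int = 3):
        super().__init__()
        self.branches = nn.ModuleList(
            CnnLstmBranch(num_classes) for _ in range(num_branches)
        )

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        probs = [branch(clips).softmax(dim=-1) for branch in self.branches]
        return torch.stack(probs).mean(dim=0)      # fused emotion probabilities

if __name__ == "__main__":
    model = LateFusionModel()
    faces = torch.randn(2, 16, 3, 224, 224)        # 2 clips of 16 face crops each
    print(model(faces).shape)                      # torch.Size([2, 7])

In the paper's setting each branch would use a different CNN architecture; sharing one placeholder backbone here only keeps the sketch short.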
Do, L. N., Yang, H. J., Nguyen, H. D., Kim, S. H., Lee, G. S., & Na, I. S. (2021). Deep neural network-based fusion model for emotion recognition using visual data. Journal of Supercomputing, 77(10), 10773–10790. https://doi.org/10.1007/s11227-021-03690-y