We investigated emotion classification from brief video recordings from the GEMEP database wherein actors portrayed 18 emotions. Vocal features consisted of acoustic parameters related to frequency, intensity, spectral distribution, and durations. Facial features consisted of facial action units. We first performed a series of person-independent supervised classification experiments. Best performance (AUC = 0.88) was obtained by merging the output from the best unimodal vocal (Elastic Net, AUC = 0.82) and facial (Random Forest, AUC = 0.80) classifiers using a late fusion approach and the product rule method. All 18 emotions were recognized with above-chance recall, although recognition rates varied widely across emotions (e.g., high for amusement, anger, and disgust; and low for shame). Multimodal feature patterns for each emotion are described in terms of the vocal and facial features that contributed most to classifier performance. Next, a series of exploratory unsupervised classification experiments were performed to gain more insight into how emotion expressions are organized. Solutions from traditional clustering techniques were interpreted using decision trees in order to explore which features underlie clustering. Another approach utilized various dimensionality reduction techniques paired with inspection of data visualizations. Unsupervised methods did not cluster stimuli in terms of emotion categories, but several explanatory patterns were observed. Some could be interpreted in terms of valence and arousal, but actor- and gender-specific aspects also contributed to clustering. Identifying explanatory patterns holds great potential as a meta-heuristic when unsupervised methods are used in complex classification tasks.
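The late fusion step with the product rule can be illustrated with a minimal sketch. It assumes each unimodal classifier outputs one row of class probabilities per clip; the function name `product_rule_fusion`, the smoothing constant `eps`, and the toy probabilities are hypothetical and are not taken from the study.

```python
import numpy as np

def product_rule_fusion(prob_a, prob_b, eps=1e-12):
    """Fuse two (n_samples, n_classes) probability arrays with the product rule."""
    prob_a = np.asarray(prob_a, dtype=float)
    prob_b = np.asarray(prob_b, dtype=float)
    if prob_a.shape != prob_b.shape:
        raise ValueError("both classifiers must score the same samples and classes")
    # Multiply per-class probabilities; eps (an assumption) keeps a single
    # zero from one modality from producing an all-zero row.
    fused = (prob_a + eps) * (prob_b + eps)
    # Renormalize so each row sums to 1 again.
    return fused / fused.sum(axis=1, keepdims=True)

# Hypothetical posteriors for 2 clips and 3 emotion classes (illustrative only).
vocal = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3]])
facial = np.array([[0.5, 0.4, 0.1],
                   [0.1, 0.2, 0.7]])

fused = product_rule_fusion(vocal, facial)
predicted = fused.argmax(axis=1)  # index of the most probable class per clip
```

In this toy case the two modalities disagree on the second clip, and the fused score follows the more confident one. Because fusion acts only on classifier outputs, the vocal and facial models can be of different types, as with the Elastic Net and Random Forest classifiers reported here.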
Carbonell, M. F., Boman, M., & Laukka, P. (2021). Comparing supervised and unsupervised approaches to multimodal emotion recognition. PeerJ Computer Science, 7. https://doi.org/10.7717/PEERJ-CS.804