How Good Is Your Model ‘Really’? On ‘Wildness’ of the In-the-Wild Speech-Based Affect Recognisers

Abstract

We evaluate, for the first time, the generalisability of in-the-wild speech-based affect tracking models using two corpora: the ‘Automatic Sentiment Analysis in the Wild (SEWA)’ database, used in the ‘Affect Recognition’ sub-challenge of the Audio/Visual Emotion Challenge and Workshop (AVEC 2017), and the ‘Graz Real-life Affect in the Street and Supermarket (GRAS²)’ corpus. The GRAS² corpus is the only corpus to date featuring audiovisual recordings and time-continuous affect labels of random participants recorded surreptitiously in a public place. The SEWA database was likewise collected in an in-the-wild paradigm, in that it features spontaneous affective behaviours and real-life acoustic disruptions due to connectivity and hardware problems. The SEWA participants, however, were well aware of being recorded throughout, and thus the data potentially suffers from the ‘observer’s paradox’. In this paper, we evaluate how a model trained on typical data suffering from the observer’s paradox (SEWA) fares on real-life data that is relatively free from this psychological effect (GRAS²), and vice versa. Because of the drastically different recording conditions and recording equipment, the feature spaces of the two databases differ extremely. The in-the-wild nature of these real-life databases and the extreme disparity between their feature spaces are the key challenges tackled in this paper, a problem of high practical relevance. We extract bag-of-audio-words features using, for the very first time, a randomised, database-independent codebook. True to our hypothesis, the Support Vector Regression model trained on GRAS² had better generalisability, as this model could reasonably predict the SEWA arousal labels.
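The cross-corpus pipeline sketched in the abstract can be illustrated compactly. The Python sketch below is an assumption-laden stand-in rather than the authors' implementation: it draws a random, database-independent codebook, quantises frame-level acoustic descriptors into bag-of-audio-words histograms, trains a linear Support Vector Regression on one (synthetic) corpus, and predicts arousal on another. The codebook size, the 23-dimensional feature assumption, scikit-learn's LinearSVR and its hyper-parameters, and the dummy data are all illustrative choices, not the paper's configuration.

import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)

def random_codebook(n_words, n_dims):
    # Codewords drawn uniformly at random, independent of any training corpus
    # (the 'database-independent' idea described in the abstract).
    return rng.uniform(-1.0, 1.0, size=(n_words, n_dims))

def boaw_histogram(lld_frames, codebook):
    # Quantise each frame (row) to its nearest codeword and return a
    # normalised bag-of-audio-words histogram for the whole recording.
    dists = np.linalg.norm(lld_frames[:, None, :] - codebook[None, :, :], axis=-1)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook)).astype(float)
    return counts / counts.sum()

# Stand-in data: each recording is an (n_frames x n_dims) matrix of frame-level
# low-level descriptors; labels are dummy arousal values in [-1, 1].
n_dims = 23                                    # assumed LLD dimensionality
codebook = random_codebook(n_words=500, n_dims=n_dims)
train_seqs = [rng.normal(size=(rng.integers(50, 200), n_dims)) for _ in range(40)]
test_seqs = [rng.normal(size=(rng.integers(50, 200), n_dims)) for _ in range(10)]
y_train = rng.uniform(-1.0, 1.0, size=len(train_seqs))

X_train = np.stack([boaw_histogram(s, codebook) for s in train_seqs])
X_test = np.stack([boaw_histogram(s, codebook) for s in test_seqs])

svr = LinearSVR(C=0.1, epsilon=0.0, max_iter=10000).fit(X_train, y_train)
arousal_pred = svr.predict(X_test)             # predictions for the unseen corpus

In the paper's setting, the training and test recordings would come from GRAS² and SEWA (or vice versa), and the randomly drawn codebook is what keeps the intermediate representation independent of either database.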

Citation (APA)

Pandit, V., Schmitt, M., Cummins, N., Graf, F., Paletta, L., & Schuller, B. (2018). How Good Is Your Model ‘Really’? On ‘Wildness’ of the In-the-Wild Speech-Based Affect Recognisers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11096 LNAI, pp. 490–500). Springer Verlag. https://doi.org/10.1007/978-3-319-99579-3_51
