How Good Is Your Model ‘Really’? On ‘Wildness’ of the In-the-Wild Speech-Based Affect Recognisers

Abstract

We evaluate, for the first time, the generalisability of in-the-wild speech-based affect tracking models using two corpora: the ‘Automatic Sentiment Analysis in the Wild (SEWA)’ database, used in the ‘Affect Recognition’ sub-challenge of the Audio/Visual Emotion Challenge and Workshop (AVEC 2017), and the ‘Graz Real-life Affect in the Street and Supermarket (GRAS²)’ corpus. The GRAS² corpus is the only corpus to date featuring audiovisual recordings and time-continuous affect labels of random participants recorded surreptitiously in a public place. The SEWA database was likewise collected in an in-the-wild paradigm, in that it features spontaneous affective behaviours and real-life acoustic disruptions due to connectivity and hardware problems. The SEWA participants, however, were well aware of being recorded throughout, and thus the data potentially suffers from the ‘observer’s paradox’. In this paper, we evaluate how a model trained on typical data suffering from the observer’s paradox (SEWA) fares on real-life data that is relatively free from this psychological effect (GRAS²), and vice versa. Because of the drastically different recording conditions and recording equipment, the feature spaces of the two databases differ extremely. The in-the-wild nature of these real-life databases and the extreme disparity between their feature spaces are the key challenges tackled in this paper, a problem of high practical relevance. We extract bag-of-audio-words features using, for the very first time, a randomised, database-independent codebook. True to our hypothesis, the Support Vector Regression model trained on GRAS² had better generalisability, as this model could reasonably predict the SEWA arousal labels.
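The cross-corpus pipeline sketched in the abstract can be illustrated compactly. The Python sketch below is an assumption-laden stand-in rather than the authors' implementation: it draws a random, database-independent codebook, quantises frame-level acoustic descriptors into bag-of-audio-words histograms, trains a linear Support Vector Regression on one (synthetic) corpus, and predicts arousal on another. The codebook size, the 23-dimensional feature assumption, scikit-learn's LinearSVR and its hyper-parameters, and the dummy data are all illustrative choices, not the paper's configuration.

import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)

def random_codebook(n_words, n_dims):
    # Codewords drawn uniformly at random, independent of any training corpus
    # (the 'database-independent' idea described in the abstract).
    return rng.uniform(-1.0, 1.0, size=(n_words, n_dims))

def boaw_histogram(lld_frames, codebook):
    # Quantise each frame (row) to its nearest codeword and return a
    # normalised bag-of-audio-words histogram for the whole recording.
    dists = np.linalg.norm(lld_frames[:, None, :] - codebook[None, :, :], axis=-1)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook)).astype(float)
    return counts / counts.sum()

# Stand-in data: each recording is an (n_frames x n_dims) matrix of frame-level
# low-level descriptors; labels are dummy arousal values in [-1, 1].
n_dims = 23                                    # assumed LLD dimensionality
codebook = random_codebook(n_words=500, n_dims=n_dims)
train_seqs = [rng.normal(size=(rng.integers(50, 200), n_dims)) for _ in range(40)]
test_seqs = [rng.normal(size=(rng.integers(50, 200), n_dims)) for _ in range(10)]
y_train = rng.uniform(-1.0, 1.0, size=len(train_seqs))

X_train = np.stack([boaw_histogram(s, codebook) for s in train_seqs])
X_test = np.stack([boaw_histogram(s, codebook) for s in test_seqs])

svr = LinearSVR(C=0.1, epsilon=0.0, max_iter=10000).fit(X_train, y_train)
arousal_pred = svr.predict(X_test)             # predictions for the unseen corpus

In the paper's setting, the training and test recordings would come from GRAS² and SEWA (or vice versa), and the randomly drawn codebook is what keeps the intermediate representation independent of either database.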

Citation (APA)

Pandit, V., Schmitt, M., Cummins, N., Graf, F., Paletta, L., & Schuller, B. (2018). How Good Is Your Model ‘Really’? On ‘Wildness’ of the In-the-Wild Speech-Based Affect Recognisers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11096 LNAI, pp. 490–500). Springer Verlag. https://doi.org/10.1007/978-3-319-99579-3_51
