DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding

Abstract

Social intelligence is essential for understanding and reasoning about human expressions, intents, and interactions. One representative benchmark for its study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice questions on videos of complex social interactions. Because the soundness of such benchmark datasets is crucial to investigating the underlying research problem, we define a comprehensive methodology for studying the soundness of Social-IQ. Our analysis reveals that Social-IQ contains substantial biases: a moderately strong language model can exploit them by learning spurious correlations, achieving perfect performance without being given the context or even the question. We introduce DeSIQ, a new, challenging dataset constructed by applying simple perturbations to Social-IQ. Our empirical analysis shows that DeSIQ significantly reduces the biases in the original Social-IQ dataset. Furthermore, we examine the effects of model size, model style, learning settings, commonsense knowledge, and multi-modality on performance on the new benchmark. Our new dataset, observations, and findings open up important research questions for the study of social intelligence.

Cite

Guo, X. Y., Li, Y. F., & Haffari, G. (2023). DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 3169–3180). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.191
