DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding

Xiao Yu Guo; Yuan Fang Li; Gholamreza Haffari

Conference Proceedings


EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (2023) 3169-3180

DOI: 10.18653/v1/2023.emnlp-main.191


Abstract

Social intelligence is essential for understanding and reasoning about human expressions, intents, and interactions. One representative benchmark for its study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice questions on videos of complex social interactions. Because the soundness of such benchmark datasets is crucial to investigating the underlying research problem, we define a comprehensive methodology for studying the soundness of Social-IQ. Our analysis reveals that Social-IQ contains substantial biases: a moderately strong language model can exploit them, learning spurious correlations that let it achieve perfect performance without being given the context or even the question. We introduce DeSIQ, a new, challenging dataset constructed by applying simple perturbations to Social-IQ. Our empirical analysis shows that DeSIQ significantly reduces the biases present in the original Social-IQ dataset. Furthermore, we examine and shed light on the effects of model size, model style, learning settings, commonsense knowledge, and multi-modality on performance on the new benchmark. Our new dataset, observations, and findings open up important research questions for the study of social intelligence.

Cite (APA)

Guo, X. Y., Li, Y. F., & Haffari, G. (2023). DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 3169–3180). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.191
