Abstract
Warning: This paper contains offensive material in its examples and case studies, which is unavoidable given the nature of the work. In this paper, we describe our work on social bias detection in a low-resource multilingual setting in which the languages come from two very divergent families: Indo-European (English, Hindi, and Italian) and Altaic (Korean). Currently, the majority of available social bias datasets are in English, which inhibits progress on social bias detection in low-resource languages. To address this problem, we introduce a new dataset for social bias detection in Hindi and investigate multilingual transfer learning using publicly available English, Italian, and Korean datasets. The Hindi dataset contains ∼9k social media posts annotated for (i) binary bias labels (bias/neutral), (ii) binary sentiment labels (positive/negative), (iii) target groups for each bias category, and (iv) a rationale for each annotated bias label (a short piece of text). We benchmark our Hindi dataset using different multilingual models, with XLM-R achieving the best performance, a macro-F1 score of 80.8. Our results show that the detection of social biases in resource-constrained languages such as Hindi and Korean may be improved by using a similar dataset in English. We also show that translating all datasets into English does not work effectively for detecting social bias, since the nuances of the source language are lost in translation.
Cite
Sahoo, N. R., Mallela, N., & Bhattacharyya, P. (2023). With Prejudice to None: A Few-Shot, Multilingual Transfer Learning Approach to Detect Social Bias in Low Resource Languages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 13316–13330). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.842