To widen the scope of bias studies in natural language processing beyond American English, we introduce material for measuring social bias in language models against demographic groups in France. We extend the CrowS-pairs dataset with 1,677 sentence pairs in French that cover stereotypes in ten types of bias. Of these, 1,467 pairs are translated from CrowS-pairs, and 210 are newly crowdsourced and translated back into English. Each pair contrasts a sentence expressing a stereotype about a disadvantaged group with the same sentence about an advantaged group. We find that four widely used language models favor the sentences that express stereotypes in most bias categories. We report on the translation process and offer guidelines for extending the dataset to other languages.
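What it means for a model to "favor" stereotyping sentences can be made concrete with a small sketch. It assumes that each sentence has already been given a score by the model. In the original CrowS-pairs work this score is a masked-language-model pseudo-log-likelihood. Here, `score` is a hypothetical stand-in, and so are the names `SentencePair` and `stereotype_preference_rate`. The sketch reports the percentage of pairs in which the stereotyping sentence scores higher. A model with no preference would land near 50%.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SentencePair:
    """One minimal pair (hypothetical structure for illustration)."""
    more_stereotypical: str  # sentence expressing the stereotype
    less_stereotypical: str  # same sentence about an advantaged group


def stereotype_preference_rate(
    pairs: Sequence[SentencePair], score: Callable[[str], float]
) -> float:
    """Percentage of pairs where `score` rates the stereotyping sentence strictly higher.

    `score` is an assumed placeholder for a model-based sentence score;
    ties count as not favoring the stereotype.
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    favored = sum(
        1 for p in pairs if score(p.more_stereotypical) > score(p.less_stereotypical)
    )
    return 100.0 * favored / len(pairs)


# Toy, made-up scores standing in for model outputs (not real data).
toy_scores = {"s1": -12.0, "a1": -14.5, "s2": -9.0, "a2": -8.0, "s3": -20.0, "a3": -21.0}
pairs = [SentencePair("s1", "a1"), SentencePair("s2", "a2"), SentencePair("s3", "a3")]
rate = stereotype_preference_rate(pairs, toy_scores.__getitem__)
print(f"{rate:.1f}%")  # 2 of 3 pairs favor the stereotype -> 66.7%
```

Keeping the scorer as a plain callable separates the pairwise comparison from the choice of model. The same summary could also be computed per bias category by grouping the pairs first.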
Citation
Névéol, A., Dupont, Y., Bezançon, J., & Fort, K. (2022). French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English. In Traitement Automatique des Langues Naturelles, TALN 2022 - Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles: Conférence principale (Vol. 1, pp. 355–364). Association pour le traitement automatique des langues. https://doi.org/10.18653/v1/2022.acl-long.583