Gender Bias and Under-Representation in Natural Language Processing across Human Languages

Abstract

Natural Language Processing (NLP) systems sit at the heart of many critical automated decision-making systems that make crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to bias in which voices they represent. In this paper, a team including speakers of 9 languages (Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof) reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages such as Spanish, Arabic, German, French, and Urdu that have grammatically gendered nouns, with distinct feminine, masculine, and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in the different languages, as well as across corpora of more traditional literature.
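
The abstract refers to profession-level and corpus-level gender bias metrics but this record does not reproduce the paper's exact formulation. Below is a minimal sketch of what such metrics can look like, assuming a simple co-occurrence-based approach: each profession word is scored by how often gendered words appear near it, and the corpus-level score aggregates over professions. The word lists, window size, and function names here are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical English word lists for illustration; the paper builds
# per-language resources for all 9 languages, including grammatically
# gendered profession forms.
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "female"}
MALE_WORDS = {"he", "him", "his", "man", "men", "male"}
PROFESSIONS = {"nurse", "engineer", "doctor", "teacher"}

WINDOW = 10  # tokens of context on each side of a profession word


def profession_bias(tokens, professions=PROFESSIONS, window=WINDOW):
    """Return {profession: bias}, with bias in [-1, 1]:
    +1 means the word co-occurs only with male words, -1 only with female words."""
    counts = defaultdict(lambda: [0, 0])  # profession -> [female_count, male_count]
    for i, tok in enumerate(tokens):
        if tok in professions:
            context = tokens[max(0, i - window): i + window + 1]
            counts[tok][0] += sum(w in FEMALE_WORDS for w in context)
            counts[tok][1] += sum(w in MALE_WORDS for w in context)
    bias = {}
    for prof, (f, m) in counts.items():
        if f + m:  # skip professions with no gendered context at all
            bias[prof] = (m - f) / (m + f)
    return bias


def corpus_bias(tokens):
    """Corpus-level bias as the mean of the profession-level biases."""
    per_profession = profession_bias(tokens)
    if not per_profession:
        return 0.0
    return sum(per_profession.values()) / len(per_profession)


if __name__ == "__main__":
    text = ("she is a nurse and he is an engineer ; "
            "the doctor said he would call her back").split()
    print(profession_bias(text))  # e.g. {'nurse': ..., 'engineer': ..., 'doctor': ...}
    print(corpus_bias(text))
```

For a language with grammatically gendered profession nouns (Spanish, Arabic, German, French, Urdu), one would extend the profession set to cover each inflected form and decide how to combine the forms into a single profession-level score, which is the kind of extension the abstract describes.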

Citation (APA)

Chen, Y., Mahoney, C., Grasso, I., Wali, E., Matthews, A., Middleton, T., … Matthews, J. (2021). Gender Bias and Under-Representation in Natural Language Processing across Human Languages. In AIES 2021 - Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 24–34). Association for Computing Machinery, Inc. https://doi.org/10.1145/3461702.3462530
