Abstract
Easy access to pre-trained transformers has enabled developers to leverage large-scale language models to build exciting applications for their users. While such pre-trained models offer convenient starting points for researchers and developers, little consideration is given to the societal biases captured within these models, which risks perpetuating racial, gender, and other harmful biases when the models are deployed at scale. In this paper, we investigate gender and racial bias across widely used pre-trained language models, including GPT-2, XLNet, BERT, RoBERTa, ALBERT, and DistilBERT. We evaluate bias within pre-trained transformers using three metrics: WEAT, sequence likelihood, and pronoun ranking. We conclude with an experiment demonstrating the ineffectiveness of word-embedding techniques, such as WEAT, signaling the need for more robust bias testing in transformers.
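For readers unfamiliar with WEAT, the following is a minimal sketch of the commonly used WEAT effect-size computation. It is not the authors' code. The function names, the toy 2-dimensional vectors, and the use of the sample standard deviation are assumptions made for illustration; how embeddings would be extracted from a transformer is not shown.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y, divided by
    the standard deviation of associations over X and Y combined.
    Assumption: sample standard deviation; some implementations differ."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Hypothetical toy embeddings: X leans toward A, Y leans toward B.
X = [[1.0, 0.1], [0.9, 0.2]]
Y = [[0.1, 1.0], [0.2, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]

d = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {d:.3f}")
```

A positive value indicates that the X targets are more associated with A than the Y targets are; swapping X and Y flips the sign.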
Cite
Silva, A., Tambwekar, P., & Gombolay, M. (2021). Towards a Comprehensive Understanding and Accurate Evaluation of Societal Biases in Pre-Trained Transformers. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 2383–2389). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-main.189