Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?


Abstract

Word embeddings are widely used across a vast range of Natural Language Processing (NLP) applications. However, it has been consistently shown that these embeddings reflect the human biases present in the data used to train them. Most bias indicators proposed to reveal bias in word embeddings are average-based measures built on the cosine similarity measure. In this study, we examine the impact of different similarity measures, as well as descriptive statistics other than the average, on measuring the biases of contextual and non-contextual word embeddings. We show that the extent of bias revealed in word embeddings depends on the descriptive statistics and similarity measures used to measure it. Across the ten categories of Word Embedding Association Tests (WEAT), Mahalanobis distance reveals the smallest bias and Euclidean distance the largest. In addition, contextual models reveal less severe biases than non-contextual word embedding models, with GPT showing the fewest WEAT biases.
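To make the setup concrete, below is a minimal Python sketch (not the authors' code) of a WEAT-style effect size parameterized by both the similarity measure and the descriptive statistic. The function names, the negation of distances so that larger values mean "more similar" like cosine, and the way the covariance matrix is estimated for Mahalanobis distance are all illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def neg_euclidean(u, v):
    """Negated Euclidean distance, so larger still means more similar."""
    return -np.linalg.norm(u - v)

def make_neg_mahalanobis(sample_vecs):
    """Build a negated Mahalanobis similarity from a sample of embeddings.

    The inverse covariance is estimated from `sample_vecs`; pinv guards
    against a singular covariance matrix (an illustrative choice).
    """
    cov_inv = np.linalg.pinv(np.cov(np.asarray(sample_vecs).T))
    def neg_mahalanobis(u, v):
        d = u - v
        return -np.sqrt(max(d @ cov_inv @ d, 0.0))
    return neg_mahalanobis

def weat_effect_size(X, Y, A, B, sim=cosine, stat=np.mean):
    """WEAT effect size for target word sets X, Y and attribute sets A, B.

    `sim` is the similarity measure and `stat` the descriptive statistic
    summarizing a word's association with each attribute set
    (the mean, as in the original WEAT, by default).
    """
    def assoc(w):
        return stat([sim(w, a) for a in A]) - stat([sim(w, b) for b in B])
    sX = [assoc(x) for x in X]
    sY = [assoc(y) for y in Y]
    pooled = np.array(sX + sY)
    return (np.mean(sX) - np.mean(sY)) / pooled.std(ddof=1)
```

Swapping `sim` between cosine, negated Euclidean, and a Mahalanobis-based measure, or `stat` between the mean and another statistic such as the median, yields different effect sizes for the same embeddings, which is the dependence the study examines.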

Citation (APA)

Azarpanah, H., & Farhadloo, M. (2021). Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use? In TrustNLP 2021 - 1st Workshop on Trustworthy Natural Language Processing, Proceedings of the Workshop (pp. 8–14). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.trustnlp-1.2
