While organizations such as Amazon have used machine learning-based algorithms in their hiring processes, diverse employees are not hired equitably because the underlying datasets are biased. Current approaches to debiasing machine learning algorithms are expensive and difficult to implement. This research uses a Generative Adversarial Network (GAN) to debias a multi-class machine learning classifier's prediction of a person's income with respect to their race and gender. First, the multi-class classifier, which is trained on California census data, is shown to be biased through a quantitative score and high misclassification rates. Next, taking inspiration from the classical GAN architecture, two neural networks are created: a predictive network that takes in a person's features, excluding race and gender, to predict their income; and an adversarial network that tries to infer the person's race and gender from the predictive network's output. To demonstrate the generalizability of the approach, the GAN is also used to debias a Natural Language Processing (NLP) task: a word vector association task trained on 1,000 random Wikipedia articles. A decrease in bias is observed when the GAN is applied to both the multi-class classifier and the word vector association task. The classifier's p-% scores, originally 39% for race and 30% for gender, increased to 76% and 82%, respectively, after applying the GAN. This shows that artificial intelligence, and GANs in particular, can be used to decrease the bias in machine learning algorithm outputs; the algorithm can readily be applied to real-world situations such as hiring employees or approving loans.

Why Eliminating Bias Is Important:

For the purposes of this paper, bias is defined as a condition in which an algorithm, when queried, returns one result at a higher rate than another based on its reading of 'sensitive attributes' such as race and gender. In other words, an algorithm is biased if it unfairly prefers some groups of data over others. The GAN algorithm proposed in this research debiases existing algorithms that prioritize men over women, and White people over people who are Black, Asian, or Native American. A more detailed and quantifiable definition of fairness can be found in the "Meaning of Fairness" section of this paper.

Machine learning consists of algorithms that are exposed to training data and improve their abilities through experience. Unfortunately, this process often absorbs biases. For example, the word "doctor" may be associated primarily with men because of bias in the input dataset. Because machine learning is trained on data, any bias present in that data is paralleled in the machine learning algorithm.

There are disparities in life that cannot be debiased: for example, women are more likely to live longer than men. A bias exists wherever there is a disparity in results that does not necessarily reflect reality. Disparities in data lead to biased results from the algorithms that are trained on that data. For instance, a woman may be statistically more likely to be a nurse than a man; if this fact prevents men from becoming nurses, it is an example of bias, whereas if it does not, it is merely an example of disparity.
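One common way to quantify how strongly an algorithm prefers one group over another, and the form in which the p-% scores in the abstract are most naturally read, is the p-% rule. The paper's own formalization appears in the "Meaning of Fairness" section; the formulation below is a standard one, given here only for illustration:

\[
p\% = 100 \times \min_{a,\,b} \frac{P(\hat{Y} = 1 \mid A = a)}{P(\hat{Y} = 1 \mid A = b)},
\]

where \(\hat{Y} = 1\) denotes the favorable outcome and \(A\) ranges over the groups of a sensitive attribute such as race or gender. A score of 100% means every group receives the favorable outcome at the same rate, so the reported increases from 39% to 76% (race) and from 30% to 82% (gender) correspond to a substantially fairer classifier.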
A biased result can easily occur in machine learning. Considering the prior example, here is a scenario that could occur. First, an algorithm takes data containing a disparity, such as a dataset showing that more women are nurses than men, and trains itself to implicitly conclude that women are better nurses than men because of that original disparity. Over time, the algorithm perpetuates its skewed understanding by recommending that a company hire more women than men for nursing roles. This project does not create a solution for algorithms that are biased by design. Rather, this approach attempts to decrease the bias that algorithms learn from disparities in their training data.
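To make the predictor and adversary networks described in the abstract concrete, the following is a minimal PyTorch sketch of this kind of adversarial debiasing. The feature counts, layer sizes, adversary input, loss weight, and training loop are illustrative assumptions rather than the paper's actual implementation; the key idea is that the predictor is trained to classify income correctly while preventing the adversary from recovering race or gender from its output.

import torch
import torch.nn as nn

# Illustrative sizes (assumed, not taken from the paper).
N_FEATURES = 10        # census features, excluding race and gender
N_INCOME_CLASSES = 5   # income brackets for the multi-class classifier
N_RACE = 5             # number of race categories
N_GENDER = 2           # number of gender categories
LAMBDA_ADV = 1.0       # weight of the adversarial penalty

# Predictive network: features (without race/gender) -> income class logits.
predictor = nn.Sequential(
    nn.Linear(N_FEATURES, 64), nn.ReLU(),
    nn.Linear(64, N_INCOME_CLASSES),
)

# Adversarial network: income logits -> race and gender logits.
class Adversary(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(N_INCOME_CLASSES, 32), nn.ReLU())
        self.race_head = nn.Linear(32, N_RACE)
        self.gender_head = nn.Linear(32, N_GENDER)

    def forward(self, income_logits):
        h = self.body(income_logits)
        return self.race_head(h), self.gender_head(h)

adversary = Adversary()
ce = nn.CrossEntropyLoss()
opt_pred = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

def train_step(x, income, race, gender):
    # 1) Update the adversary: learn to recover race and gender
    #    from the (frozen) predictor's output.
    with torch.no_grad():
        income_logits = predictor(x)
    race_logits, gender_logits = adversary(income_logits)
    adv_loss = ce(race_logits, race) + ce(gender_logits, gender)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) Update the predictor: classify income well while *increasing*
    #    the adversary's error, i.e. hiding race and gender.
    income_logits = predictor(x)
    race_logits, gender_logits = adversary(income_logits)
    pred_loss = ce(income_logits, income) - LAMBDA_ADV * (
        ce(race_logits, race) + ce(gender_logits, gender)
    )
    opt_pred.zero_grad()
    pred_loss.backward()
    opt_pred.step()
    return pred_loss.item(), adv_loss.item()

# Example call on a random batch (shapes assumed).
x = torch.randn(128, N_FEATURES)
income = torch.randint(0, N_INCOME_CLASSES, (128,))
race = torch.randint(0, N_RACE, (128,))
gender = torch.randint(0, N_GENDER, (128,))
train_step(x, income, race, gender)

As in a classical GAN, the two networks are trained in alternation: the adversary improves at detecting the sensitive attributes, and the predictor improves at making them undetectable, which is the mechanism intended to drive the p-% scores upward.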