Training Data Debugging for the Fairness of Machine Learning Software

Abstract

With the widespread application of machine learning (ML) software, especially in high-risk tasks, concerns about its unfairness have been raised by both developers and users of ML software. Unfairness in ML software refers to software behavior that is affected by sensitive features (e.g., sex), which leads to biased and even illegal decisions and has become a pressing problem for the whole software engineering community. Given the 'data-driven' programming paradigm of ML software, we consider the root cause of unfairness to be biased features in the training data. Inspired by software debugging, we propose a novel method, Linear-regression based Training Data Debugging (LTDD), to debug feature values in training data, i.e., (a) identify which features, and which parts of them, are biased, and (b) exclude the biased parts of such features so as to retain as much valuable and unbiased information as possible for building fair ML software. We conduct an extensive study on nine data sets and three classifiers to evaluate the effect of LTDD compared with four baseline methods. Experimental results show that (a) LTDD improves the fairness of ML software with less or comparable damage to predictive performance, and (b) LTDD is more actionable for fairness improvement in realistic scenarios.
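The abstract describes the core idea at a high level: for each non-sensitive feature, identify the part that can be linearly predicted from a sensitive feature and exclude it, keeping the remainder for training. The sketch below is one possible reading of that idea, not the authors' released implementation; the function name debug_features, the significance threshold, and the use of scipy/scikit-learn are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's implementation): remove from each
# feature the part that a linear regression can explain from the sensitive
# feature, keeping only the residual as the "unbiased" information.
import numpy as np
from scipy.stats import linregress
from sklearn.linear_model import LogisticRegression


def debug_features(X, sensitive, alpha=0.05):
    """Return a debugged copy of X.

    X         : (n_samples, n_features) array of non-sensitive features
    sensitive : (n_samples,) array holding the sensitive feature (e.g., sex)
    alpha     : significance level used here to decide whether a feature
                counts as biased (an assumption for this sketch)
    """
    X_debugged = X.astype(float)
    for j in range(X.shape[1]):
        slope, intercept, _, p_value, _ = linregress(sensitive, X[:, j])
        if p_value < alpha:  # feature is significantly related to the sensitive feature
            predicted = intercept + slope * sensitive
            X_debugged[:, j] = X[:, j] - predicted  # keep only the residual part
    return X_debugged


# Usage with hypothetical training data:
# X_train, s_train, y_train = ...
# clf = LogisticRegression().fit(debug_features(X_train, s_train), y_train)
```

In practice the regressions fitted on the training data would also be applied to transform the test data before prediction, so that training and deployment use the same debugged feature space.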

Citation (APA)

Li, Y., Meng, L., Chen, L., Yu, L., Wu, D., Zhou, Y., & Xu, B. (2022). Training Data Debugging for the Fairness of Machine Learning Software. In Proceedings - International Conference on Software Engineering (Vol. 2022-May, pp. 2215–2227). IEEE Computer Society. https://doi.org/10.1145/3510003.3510091
