Training Data Debugging for the Fairness of Machine Learning Software

Abstract

With the widespread application of machine learning (ML) software, especially in high-risk tasks, concerns about its unfairness have been raised by both developers and users of ML software. Unfairness in ML software refers to software behavior that is affected by sensitive features (e.g., sex), which leads to biased and even illegal decisions and has become a pressing problem for the whole software engineering community. Given the 'data-driven' programming paradigm of ML software, we consider the root cause of unfairness to be biased features in the training data. Inspired by software debugging, we propose a novel method, Linear-regression based Training Data Debugging (LTDD), to debug feature values in training data, i.e., (a) identify which features, and which parts of them, are biased, and (b) exclude the biased parts of such features so as to retain as much valuable and unbiased information as possible for building fair ML software. We conduct an extensive study on nine data sets and three classifiers to evaluate the effect of LTDD compared with four baseline methods. Experimental results show that (a) LTDD improves the fairness of ML software with less or comparable damage to predictive performance, and (b) LTDD is more actionable for fairness improvement in realistic scenarios.
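The abstract describes the core idea at a high level: for each non-sensitive feature, identify the part that can be linearly predicted from a sensitive feature and exclude it, keeping the remainder for training. The sketch below is one possible reading of that idea, not the authors' released implementation; the function name debug_features, the significance threshold, and the use of scipy/scikit-learn are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's implementation): remove from each
# feature the part that a linear regression can explain from the sensitive
# feature, keeping only the residual as the "unbiased" information.
import numpy as np
from scipy.stats import linregress
from sklearn.linear_model import LogisticRegression


def debug_features(X, sensitive, alpha=0.05):
    """Return a debugged copy of X.

    X         : (n_samples, n_features) array of non-sensitive features
    sensitive : (n_samples,) array holding the sensitive feature (e.g., sex)
    alpha     : significance level used here to decide whether a feature
                counts as biased (an assumption for this sketch)
    """
    X_debugged = X.astype(float)
    for j in range(X.shape[1]):
        slope, intercept, _, p_value, _ = linregress(sensitive, X[:, j])
        if p_value < alpha:  # feature is significantly related to the sensitive feature
            predicted = intercept + slope * sensitive
            X_debugged[:, j] = X[:, j] - predicted  # keep only the residual part
    return X_debugged


# Usage with hypothetical training data:
# X_train, s_train, y_train = ...
# clf = LogisticRegression().fit(debug_features(X_train, s_train), y_train)
```

In practice the regressions fitted on the training data would also be applied to transform the test data before prediction, so that training and deployment use the same debugged feature space.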

Citation (APA)

Li, Y., Meng, L., Chen, L., Yu, L., Wu, D., Zhou, Y., & Xu, B. (2022). Training Data Debugging for the Fairness of Machine Learning Software. In Proceedings - International Conference on Software Engineering (Vol. 2022-May, pp. 2215–2227). IEEE Computer Society. https://doi.org/10.1145/3510003.3510091
