A survey on datasets for fairness-aware machine learning

267Citations
Citations of this article
257Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining Fundamental Concepts of Data and Knowledge > Data Preprocessing.

Cite

CITATION STYLE

APA

Le Quy, T., Roy, A., Iosifidis, V., Zhang, W., & Ntoutsi, E. (2022, May 1). A survey on datasets for fairness-aware machine learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. John Wiley and Sons Inc. https://doi.org/10.1002/widm.1452

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free