Despite the wide adoption of classification algorithms in many fields, their predictions can disadvantage certain groups of people because of pervasive bias with respect to sensitive features such as race, gender, and age. To avoid biased predictions, extensive research efforts have been devoted to training fair classification models under a variety of fairness definitions. However, we observe that, under existing fairness definitions, recent fair classification methods may still base their predictions on sensitive features implicitly, because the non-sensitive features these models rely on can still predict the values of the sensitive features. To overcome this limitation, we introduce a new fairness definition named “Fairness Through Strict Unawareness” for deep neural networks (DNNs), which requires that the sensitive features be unpredictable from the fair classification model. Accordingly, we propose a bi-level optimization-based approach that prevents the encoded features of a DNN classifier from relying on any sensitive information, explicitly or implicitly. We show that the proposed framework satisfies fairness through strict unawareness while maintaining prediction accuracy. Experimental results on two benchmark datasets support this claim, showing that the proposed framework significantly degrades the model’s ability to infer sensitive features without sacrificing its general predictive capability.
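The abstract does not spell out the training procedure, but the bi-level idea it describes (an inner problem that probes how well the encoded representation predicts the sensitive feature, and an outer problem that trains the encoder and classifier to stay accurate while defeating that probe) can be sketched roughly as follows. This is a minimal PyTorch illustration under stated assumptions, not the authors' algorithm: the auditor network, the entropy penalty, the synthetic data, and all hyperparameters are assumptions made for illustration only.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data (assumption): 8 non-sensitive features, binary label y,
# binary sensitive attribute s correlated with one of the features.
X = torch.randn(512, 8)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()
s = (X[:, 2] > 0).long()

encoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
classifier = nn.Linear(8, 2)   # predicts the task label from the encoded features
auditor = nn.Linear(8, 2)      # tries to recover the sensitive attribute from them

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-2)
opt_aud = torch.optim.Adam(auditor.parameters(), lr=1e-2)
ce = nn.CrossEntropyLoss()

for step in range(200):
    # Inner problem: fit the auditor on the current (frozen) representation.
    for _ in range(5):
        z = encoder(X).detach()
        opt_aud.zero_grad()
        ce(auditor(z), s).backward()
        opt_aud.step()

    # Outer problem: keep task predictions accurate while making the
    # representation uninformative about s. Pushing the auditor toward
    # chance-level (maximum-entropy) outputs is one common heuristic here.
    opt_main.zero_grad()
    z = encoder(X)
    task_loss = ce(classifier(z), y)
    probs = torch.softmax(auditor(z), dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    (task_loss - 1.0 * entropy).backward()
    opt_main.step()

print("task accuracy:", (classifier(encoder(X)).argmax(1) == y).float().mean().item())
print("auditor accuracy on s:", (auditor(encoder(X)).argmax(1) == s).float().mean().item())

In this sketch the auditor's post-training accuracy on s serves as a rough proxy for how much sensitive information the encoded features still carry; the paper's actual formulation, losses, and evaluation protocol may differ.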
CITATION STYLE
Wang, H., Zhang, H., Wang, Y., & Gao, J. (2021). Fair classification under strict unawareness. In SIAM International Conference on Data Mining, SDM 2021 (pp. 199–207). Society for Industrial and Applied Mathematics (SIAM). https://doi.org/10.1137/1.9781611976700.23