Fooling Partial Dependence via Data Poisoning

Abstract

Many methods have been developed to understand complex predictive models, and high expectations are placed on post-hoc model explainability. It turns out that such explanations are neither robust nor trustworthy, and they can be fooled. This paper presents techniques for attacking Partial Dependence (plots, profiles, PDP), which is among the most popular methods of explaining any predictive model trained on tabular data. We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability has become a must-have trait supporting black-box machine learning. The fooling is performed by poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms. We believe this to be the first work using a genetic algorithm for manipulating explanations, and it is transferable as it generalizes both ways: in a model-agnostic and an explanation-agnostic manner.
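To make the idea concrete, the following is a minimal, self-contained sketch (not the authors' implementation) of how a Partial Dependence profile is computed and how poisoning the background data can shift it toward an attacker-chosen target. All names, the toy data-generating process, and the simple (1+λ)-style evolutionary search below are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy data with an interaction: y depends on x0 * x1,
# so the PD profile of x0 is tilted by the mean of x1.
X = np.column_stack([rng.normal(size=300), rng.normal(loc=1.0, size=300)])
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=300)
model = GradientBoostingRegressor(n_estimators=50).fit(X, y)

def pd_profile(model, X, feature, grid):
    """Partial dependence of `model` on `feature`, averaged over the dataset X."""
    profile = []
    for z in grid:
        Xz = X.copy()
        Xz[:, feature] = z          # fix the explained feature at grid value z
        profile.append(model.predict(Xz).mean())
    return np.array(profile)

grid = np.linspace(-2, 2, 11)
original_pd = pd_profile(model, X, feature=0, grid=grid)

# Attacker's target: a flat profile that makes feature 0 look irrelevant.
target = np.full_like(original_pd, original_pd.mean())

def attack_loss(X_poisoned):
    """Distance between the poisoned-data PD profile and the attacker's target."""
    return np.mean((pd_profile(model, X_poisoned, 0, grid) - target) ** 2)

# Simple evolutionary search over the background data: mutate only the
# *other* feature (x1) and keep the best candidate each generation.
best, best_loss = X.copy(), attack_loss(X)
for _ in range(60):
    candidates = []
    for _ in range(8):
        cand = best.copy()
        cand[:, 1] += rng.normal(scale=0.2, size=len(cand))  # perturb x1 only
        candidates.append((attack_loss(cand), cand))
    loss, cand = min(candidates, key=lambda c: c[0])
    if loss < best_loss:
        best, best_loss = cand, loss

print(f"distance to target PD: before={attack_loss(X):.4f}, after={best_loss:.4f}")
```

Because the model is left untouched and only the dataset used to compute the explanation is perturbed, the same loop would work for other explanation methods that average over a background sample, which is the sense in which such an attack can be explanation-agnostic.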

Citation (APA)

Baniecki, H., Kretowicz, W., & Biecek, P. (2023). Fooling Partial Dependence via Data Poisoning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13715 LNAI, pp. 121–136). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-26409-2_8
