Abstract
Many decision-making systems deployed in the real world are not static: they change and adapt over time, a phenomenon known as model adaptation. Because of their wide-reaching influence and potentially serious consequences, the need for transparency and interpretability of AI-based decision-making systems is widely accepted, and these topics have therefore been studied extensively. A prominent class of explanations is contrasting explanations, which try to mimic human explanations. However, explanation methods usually assume that the system to be explained is static. Explaining non-static systems is still an open research question, which poses the challenge of how to explain model differences, adaptations, and changes. In this contribution, we propose and (empirically) evaluate a general framework for explaining model adaptations and differences by contrasting explanations. We also propose a method for automatically finding regions in data space that are affected by a given model adaptation, i.e. regions where the internal reasoning of the other (e.g. adapted) model changed, and which should therefore be explained. Finally, we propose a regularization for model adaptations to ensure that the internal reasoning of the adapted model does not change in an unwanted way.
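The idea of explaining a model adaptation by contrasting explanations can be illustrated with a small sketch. It is not the authors' implementation: it assumes two linear classifiers (an original and an adapted one), uses the closest point with a different predicted label as the contrasting explanation, and measures how far apart the two explanations of the same sample are. All function names and the example models are hypothetical.

```python
import math


def predict(w, b, x):
    """Label (+1 or -1) of the linear classifier sign(w.x + b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def closest_counterfactual(w, b, x, margin=1e-3):
    """Closest point to x (Euclidean) that the linear model labels differently.

    For a linear model this is x moved along w until it lies just past
    the decision boundary; `margin` controls how far past.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    target = -margin if score >= 0 else margin
    step = (target - score) / norm_sq
    return [xi + step * wi for wi, xi in zip(w, x)]


def explanation_difference(model_a, model_b, x):
    """Distance between the counterfactuals of x under two models (w, b)."""
    cf_a = closest_counterfactual(model_a[0], model_a[1], x)
    cf_b = closest_counterfactual(model_b[0], model_b[1], x)
    return math.dist(cf_a, cf_b)


# Hypothetical original and adapted models that agree on the sample's label
# but rely on different features.
original = ([1.0, 0.0], 0.0)
adapted = ([0.0, 1.0], 0.0)
sample = [2.0, 2.0]

difference = explanation_difference(original, adapted, sample)
```

In this sketch both models assign the same label to the sample, yet their counterfactuals differ, which signals that the adapted model reaches its decision for a different reason. Samples with a large difference would be candidates for the regions that need explaining.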
Cite
Artelt, A., Hinder, F., Vaquet, V., Feldhans, R., & Hammer, B. (2023). Contrasting Explanations for Understanding and Regularizing Model Adaptations. Neural Processing Letters, 55(5), 5273–5297. https://doi.org/10.1007/s11063-022-10826-5