Interpreting deep learning models with marginal attribution by conditioning on quantiles

Michael Merz; Ronald Richman; Andreas Tsanakas; Mario V. Wüthrich

Journal article (open access)

Data Mining and Knowledge Discovery (2022) 36(4) 1335-1370

DOI: 10.1007/s10618-022-00841-4

Citations: 6

Readers: 19

Abstract

A vast and growing literature on explaining deep learning models has emerged. This paper contributes to that literature by introducing a global gradient-based model-agnostic method, which we call Marginal Attribution by Conditioning on Quantiles (MACQ). Our approach is based on analyzing the marginal attribution of predictions (outputs) to individual features (inputs). Specifically, we consider variable importance by fixing (global) output levels, and explaining how features marginally contribute to these fixed global output levels. MACQ can be seen as a marginal attribution counterpart to approaches such as accumulated local effects, which study the sensitivities of outputs by perturbing inputs. Furthermore, MACQ allows us to separate marginal attribution of individual features from interaction effects and to visualize the 3-way relationship between marginal attribution, output level, and feature value.
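The idea of attributing fixed output levels to individual features can be illustrated with a minimal sketch. It assumes, as one plausible reading of the abstract, that the marginal attribution of feature j is the gradient-times-input term x_j * df/dx_j, averaged over the samples whose output lies near a chosen quantile level. The toy model, the rank-window estimator, and all names (`toy_model`, `toy_gradient`, `attribution_by_quantile`, `window`) are hypothetical and not taken from the paper, which may use a different estimator and further terms (for example, for interaction effects).

```python
import numpy as np

def toy_model(x):
    """Hypothetical smooth model f(x) = x0^2 + 2*x1, standing in for a network."""
    return x[:, 0] ** 2 + 2.0 * x[:, 1]

def toy_gradient(x):
    """Analytic gradient of toy_model with respect to each input feature."""
    grad = np.empty_like(x)
    grad[:, 0] = 2.0 * x[:, 0]
    grad[:, 1] = 2.0
    return grad

def attribution_by_quantile(x, outputs, grads, alphas, window=0.05):
    """Average gradient-times-input over samples whose output rank is near alpha.

    Assumption: a simple rank window replaces whatever smoothing the
    original method uses to condition on an output quantile.
    """
    n = len(outputs)
    # Empirical quantile level of each output, in (0, 1].
    ranks = (np.argsort(np.argsort(outputs)) + 1) / n
    # Per-sample, per-feature first-order contribution x_j * df/dx_j.
    contrib = x * grads
    result = np.full((len(alphas), x.shape[1]), np.nan)
    for i, alpha in enumerate(alphas):
        mask = np.abs(ranks - alpha) <= window
        if mask.any():
            result[i] = contrib[mask].mean(axis=0)
    return result

rng = np.random.default_rng(0)
x = rng.normal(size=(5000, 2))
outputs = toy_model(x)
alphas = np.array([0.1, 0.5, 0.9])
attributions = attribution_by_quantile(x, outputs, toy_gradient(x), alphas)
print(attributions)  # rows: quantile levels, columns: features
```

Each row of the result corresponds to one output quantile level and each column to one feature, so plotting a column against the quantile levels shows how that feature's attribution changes across the output distribution.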

Citation (APA)

Merz, M., Richman, R., Tsanakas, A., & Wüthrich, M. V. (2022). Interpreting deep learning models with marginal attribution by conditioning on quantiles. Data Mining and Knowledge Discovery, 36(4), 1335–1370. https://doi.org/10.1007/s10618-022-00841-4
