Perturbation-based techniques are a promising approach to explaining black-box machine learning models owing to their effectiveness and ease of implementation. However, prior work suffers from an Out-of-Distribution (OoD) problem: randomly perturbed data can become inconsistent with the original dataset, degrading the reliability of the generated explanations. To the best of our knowledge, this issue remains under-explored. This work addresses the OoD issue by designing a simple yet effective module that quantifies the affinity between the perturbed data and the original dataset distribution. Specifically, we penalize the influence of unreliable OoD perturbed samples by integrating their inlier scores with the prediction results of the target models, thereby making the final explanations more robust. Our solution is compatible with the most popular perturbation-based XAI algorithms: RISE, OCCLUSION, and LIME. Extensive experiments confirm that our methods exhibit superior performance in most cases on both computational and cognitive metrics. In particular, we point out the degradation problem of the RISE algorithm for the first time; with our design, the performance of RISE can be boosted significantly. Our solution also resolves a fundamental problem with the faithfulness indicator, a commonly used evaluation metric for XAI algorithms that turns out to be sensitive to the OoD issue.
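The core idea of down-weighting OoD perturbed samples can be illustrated with a minimal sketch in the style of RISE, which aggregates random binary masks weighted by the model's predictions. This is an assumption-laden illustration, not the paper's implementation: the function name `weighted_rise_saliency` and the multiplicative rule combining inlier scores with prediction scores are hypothetical choices made here for clarity.

```python
import numpy as np

def weighted_rise_saliency(masks, model_scores, inlier_scores):
    """Aggregate random masks into a saliency map, penalizing
    out-of-distribution (OoD) perturbed samples.

    masks         : (N, H, W) binary perturbation masks
    model_scores  : (N,) target-class scores for each masked input
    inlier_scores : (N,) affinity of each perturbed sample to the
                    original data distribution (1 = inlier, 0 = OoD)
    """
    # Hypothetical combination rule: scale each sample's prediction
    # by its inlier score so unreliable OoD samples contribute less.
    weights = model_scores * inlier_scores
    # Weighted sum of masks gives the (H, W) saliency map.
    saliency = np.tensordot(weights, masks, axes=1)
    # Normalize by the total weight (epsilon guards against all-OoD batches).
    return saliency / (weights.sum() + 1e-8)
```

A perturbed sample whose inlier score is zero is fully excluded from the aggregation, so the resulting saliency map reflects only in-distribution evidence.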
Citation:
Qiu, L., Yang, Y., Cao, C. C., Zheng, Y., Ngai, H., Hsiao, J., & Chen, L. (2022). Generating Perturbation-based Explanations with Robustness to Out-of-Distribution Data. In WWW 2022 - Proceedings of the ACM Web Conference 2022 (pp. 3594–3605). Association for Computing Machinery, Inc. https://doi.org/10.1145/3485447.3512254