We study the evaluation of graph explanation methods. The state-of-the-art protocol for evaluating explanation methods is to first train a graph neural network (GNN), then generate explanations, and finally compare those explanations against a ground truth. We identify five pitfalls that undermine this pipeline: in each case, the trained GNN does not actually use the ground-truth edges, so even a faithful explanation method cannot recover the ground truth. We propose three novel benchmarks that avoid these pitfalls: (i) pattern detection, (ii) community detection, and (iii) handling negative evidence and gradient saturation. In a re-evaluation of state-of-the-art explanation methods, we show how existing methods can be improved and highlight open directions for GNN explanation research.
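For illustration, here is a minimal Python sketch of the evaluation pipeline the abstract describes. The helper names `train_gnn` and `explain`, and the per-graph attribute `ground_truth_edge_mask`, are hypothetical placeholders, and ROC-AUC is assumed here only as one common choice of comparison metric; none of these are prescribed by the paper.

```python
# Sketch of the standard "train -> explain -> compare to ground truth" pipeline.
# `train_gnn` and `explain` are hypothetical callables supplied by the user.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_explainer(dataset, train_gnn, explain):
    """Train a GNN, explain its predictions, and score explanations
    against annotated ground-truth edges (one ROC-AUC per graph)."""
    model = train_gnn(dataset)  # step 1: train a GNN on the task
    aucs = []
    for graph in dataset:
        # step 2: the explanation method assigns an importance score
        # to every edge of the input graph, shape [num_edges]
        edge_scores = explain(model, graph)
        # step 3: compare the scores to the binary ground-truth edge mask
        aucs.append(roc_auc_score(graph.ground_truth_edge_mask, edge_scores))
    return float(np.mean(aucs))
```

The pitfalls discussed in the paper arise in step 3: if the trained model never relied on the ground-truth edges, a low AUC indicts the benchmark rather than the explanation method.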
Citation:
Faber, L., Moghaddam, A. K., & Wattenhofer, R. (2021). When Comparing to Ground Truth is Wrong: On Evaluating GNN Explanation Methods. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 332–341). Association for Computing Machinery. https://doi.org/10.1145/3447548.3467283