Re-thinking model robustness from stability: a new insight to defend adversarial examples

Abstract

We study model robustness against adversarial examples, i.e., inputs with small perturbations that can nevertheless fool many state-of-the-art deep learning models. Unlike previous research, we establish a novel theory that addresses the robustness issue from the perspective of the stability of the loss function in a small neighborhood of natural examples. We propose an energy function that describes the total variation of the loss within such a neighborhood and prove that reducing this energy guarantees robustness against adversarial examples. We also show that traditional training methods, including adversarial training and virtual adversarial training, tend to minimize a lower bound of our proposed energy function. Importantly, we prove that minimizing the energy function yields a better generalization bound than traditional adversarial training approaches. Through a series of experiments, we demonstrate the superiority of our model on different datasets for defending against adversarial attacks. In particular, our proposed adversarial framework achieves the best performance among adversarial training methods on the benchmark datasets CIFAR-10, CIFAR-100 and SVHN, demonstrating much stronger robustness against adversarial examples than all compared methods.
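The abstract does not give the exact form of the energy function, so the following is only a minimal sketch of the general idea: regularize training with a term that penalizes how much the loss varies inside a small neighborhood of each natural example. The function name, the hyperparameters `epsilon` and `lam`, and the random-neighbor approximation are illustrative assumptions, not the authors' formulation.

```python
# Sketch (not the authors' released code): cross-entropy plus a local-stability
# penalty that approximates the "energy" (loss variation) in an epsilon-ball.
import torch
import torch.nn.functional as F

def energy_regularized_loss(model, x, y, epsilon=8 / 255, lam=1.0):
    """Clean loss plus an estimate of the loss variation near x.

    The energy term is approximated by the squared change of the loss at a
    randomly perturbed point inside the epsilon-ball -- a crude stand-in for
    the total-variation energy described in the abstract.
    """
    x = x.detach()
    clean_loss = F.cross_entropy(model(x), y)

    # Sample a neighbor inside the epsilon-ball (uniform noise here; an
    # adversarially chosen neighbor would give a tighter estimate).
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    neighbor_loss = F.cross_entropy(model(x + delta), y)

    # Energy term: squared change of the loss within the neighborhood.
    energy = (neighbor_loss - clean_loss) ** 2
    return clean_loss + lam * energy
```

In a standard training loop this loss would simply replace the plain cross-entropy objective; the weight `lam` trades off clean accuracy against local stability.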

Cite (APA)
Zhang, S., Huang, K., & Xu, Z. (2022). Re-thinking model robustness from stability: a new insight to defend adversarial examples. Machine Learning, 111(7), 2489–2513. https://doi.org/10.1007/s10994-022-06186-9
