In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks: small modifications of the input that change the predictions. Besides the rigorously studied ℓp-bounded additive perturbations, semantic perturbations (e.g., rotation, translation) raise serious concerns about deploying ML systems in the real world. It is therefore important to provide provable guarantees for deep learning models against semantically meaningful input transformations. In this paper, we propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds that can be used in general attack settings. We estimate the probability that a model fails if the attack is sampled from a certain distribution. Our theoretical findings are supported by experimental results on different datasets.
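To make the setting concrete, the sketch below illustrates the general idea of estimating a model's failure probability when the attack is sampled from a fixed distribution (here, random rotations) and pairing the estimate with a concentration bound. This is a minimal Monte Carlo illustration assuming a PyTorch image classifier, not the paper's CC-Cert procedure: the paper relies on Chernoff-Cramer bounds, whereas this sketch uses a simple Hoeffding bound on the empirical failure rate. The function name, the rotation range `max_angle`, and the confidence parameter `delta` are hypothetical.

```python
import math
import torch
import torchvision.transforms.functional as TF

def estimate_failure_probability(model, x, label, n_samples=1000,
                                 max_angle=30.0, delta=1e-3):
    """Monte Carlo estimate of the probability that a random rotation
    (sampled uniformly from [-max_angle, max_angle] degrees) changes the
    model's prediction, plus a Hoeffding-style upper confidence bound.

    Assumptions (illustrative, not from the paper): `model` maps an image
    batch to logits; `x` is a single image tensor of shape (C, H, W).
    """
    model.eval()
    failures = 0
    with torch.no_grad():
        for _ in range(n_samples):
            # Sample a semantic perturbation: a rotation angle in degrees.
            angle = (2 * torch.rand(1).item() - 1) * max_angle
            x_rot = TF.rotate(x.unsqueeze(0), angle)
            pred = model(x_rot).argmax(dim=1).item()
            failures += int(pred != label)

    p_hat = failures / n_samples
    # Hoeffding bound: with probability >= 1 - delta, the true failure
    # probability is at most p_hat + sqrt(log(1/delta) / (2 * n_samples)).
    upper_bound = p_hat + math.sqrt(math.log(1.0 / delta) / (2 * n_samples))
    return p_hat, min(upper_bound, 1.0)
```

The returned upper bound can serve as a probabilistic certificate in the same spirit as the abstract describes: if it is small, then with high confidence the model is unlikely to fail under attacks drawn from the chosen distribution.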
CITATION STYLE
Pautov, M., Tursynbek, N., Munkhoeva, M., Muravev, N., Petiushko, A., & Oseledets, I. (2022). CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 7975–7983). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i7.20768