Adversarial attacks have cast a shadow on the massive success of deep neural networks. Despite being almost visually indistinguishable from clean data, adversarial images can fool deep neural networks into wrong predictions with very high confidence. Adversarial training, the most prevalent defense technique, suffers from class-wise unfairness and model dependence. In this paper, we propose to detect and eliminate adversarial data in databases prior to data processing, in support of robust and secure AI workloads. We empirically show that a binary classifier can separate adversarial data from clean data with high accuracy. We also show that this binary classifier is robust to a second-round adversarial attack; in other words, it is difficult to disguise adversarial samples so that they bypass the binary classifier. Furthermore, we empirically investigate the generalization limitation that lingers over all current defensive methods, including the binary-classifier approach, and we hypothesize that it results from an intrinsic property of adversarial crafting algorithms. Our experiments ascertain that adversarial and clean data are two different datasets that can be separated by a binary classifier, which can serve as a portable component for detecting and eliminating adversarial data in an end-to-end data management pipeline.
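As a rough illustration of the detection idea, the sketch below trains a small binary classifier to separate clean images from adversarial ones crafted with FGSM. The dataset (MNIST), the CNN architecture, the attack, the epsilon value, and the training loop are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal sketch (not the authors' exact setup): train a binary "detector"
# that separates clean MNIST images from FGSM adversarial ones.
# Assumptions: MNIST, a small CNN as both victim and detector, FGSM with
# eps=0.25, PyTorch. In practice the victim model would be trained first.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

def fgsm(model, x, y, eps=0.25):
    """Craft FGSM adversarial examples for a batch (x, y)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

class SmallCNN(nn.Module):
    """Small CNN reused as the (assumed) victim model and as the detector."""
    def __init__(self, n_out):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(64 * 7 * 7, n_out)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

victim = SmallCNN(10).to(device)    # assumed pre-trained 10-class classifier
detector = SmallCNN(2).to(device)   # binary classifier: clean vs. adversarial
opt = torch.optim.Adam(detector.parameters(), lr=1e-3)

for epoch in range(3):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = fgsm(victim, x, y)                 # adversarial counterparts
        inputs = torch.cat([x, x_adv])             # clean + adversarial batch
        labels = torch.cat([torch.zeros(len(x)),   # 0 = clean
                            torch.ones(len(x_adv))]).long().to(device)
        opt.zero_grad()
        F.cross_entropy(detector(inputs), labels).backward()
        opt.step()
```

Once trained, such a detector could sit in front of a data pipeline and flag incoming samples it classifies as adversarial before they reach downstream models.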
Gong, Z., & Wang, W. (2023). Adversarial and Clean Data Are Not Twins. In Proceedings of the 6th International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM 2023 - In conjunction with the 2023 ACM SIGMOD/PODS Conference. Association for Computing Machinery, Inc. https://doi.org/10.1145/3593078.3593935