NIC: Detecting Adversarial Samples with Neural Network Invariant Checking

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial samples, which are generated by perturbing correctly classified inputs to cause DNN models to misbehave (e.g., misclassify). This can lead to disastrous consequences, especially in security-sensitive applications. Existing defense and detection techniques work well for specific attacks under various assumptions (e.g., that the set of possible attacks is known beforehand). However, they are not sufficiently general to protect against a broader range of attacks. In this paper, we analyze the internals of DNN models under various attacks and identify two common exploitation channels: the provenance channel and the activation value distribution channel. We then propose a novel technique to extract DNN invariants and use them to perform runtime adversarial sample detection. Our experimental results on 11 different kinds of attacks, 13 models, and popular datasets including ImageNet show that our technique can effectively detect all these attacks (over 90% accuracy) with limited false positives. We also compare it with three state-of-the-art techniques: the Local Intrinsic Dimensionality (LID) based method, denoiser based methods (i.e., MagNet and HGD), and the prediction inconsistency based approach (i.e., feature squeezing). Our experiments show promising results.
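To make the activation value distribution channel concrete, the sketch below (an illustrative assumption, not the authors' exact implementation) fits one one-class model per layer on the activations of benign inputs and flags a sample at runtime if any layer's activations fall outside the learned distribution. The layer names, synthetic activations, and model hyperparameters are all hypothetical.

```python
# Hedged sketch of per-layer "value invariants": one-class SVMs fit on
# benign activations, checked at runtime. Not the paper's exact method.
import numpy as np
from sklearn.svm import OneClassSVM

def fit_layer_invariants(benign_activations):
    """Fit one one-class model per layer on activations of benign inputs.

    benign_activations: dict mapping layer name -> array of shape (n, d).
    """
    return {
        layer: OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(acts)
        for layer, acts in benign_activations.items()
    }

def is_adversarial(models, sample_activations):
    """Flag a sample if any layer's activations violate that layer's invariant."""
    for layer, model in models.items():
        # OneClassSVM.predict returns -1 for outliers, +1 for inliers.
        if model.predict(sample_activations[layer].reshape(1, -1))[0] == -1:
            return True
    return False

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for per-layer activations of benign training inputs.
    benign = {
        "conv1": rng.normal(0, 1, (500, 64)),
        "fc1": rng.normal(0, 1, (500, 128)),
    }
    models = fit_layer_invariants(benign)

    benign_sample = {"conv1": rng.normal(0, 1, 64), "fc1": rng.normal(0, 1, 128)}
    shifted_sample = {"conv1": rng.normal(5, 1, 64), "fc1": rng.normal(5, 1, 128)}
    print(is_adversarial(models, benign_sample))   # likely False
    print(is_adversarial(models, shifted_sample))  # likely True
```

In practice, the activations would come from the deployed model's hidden layers rather than synthetic data; the one-class SVM here simply stands in for whatever distribution model is used to encode each layer's invariant.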

Citation

Ma, S., Liu, Y., Tao, G., Lee, W. C., & Zhang, X. (2019). NIC: Detecting Adversarial Samples with Neural Network Invariant Checking. In 26th Annual Network and Distributed System Security Symposium, NDSS 2019. The Internet Society. https://doi.org/10.14722/ndss.2019.23415
