HotFlip: White-box adversarial examples for text classification

Javid Ebrahimi; Anyi Rao; Daniel Lowd; Dejing Dou

Conference Proceedings (Open Access)

ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (2018), Vol. 2, pp. 31–36

DOI: 10.18653/v1/p18-2006

566 Citations

535 Readers

Abstract

We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease the accuracy. Our method relies on an atomic flip operation, which swaps one token for another, based on the gradients of the one-hot input vectors. Due to efficiency of our method, we can perform adversarial training which makes the model more robust to attacks at test time. With the use of a few semantics-preserving constraints, we demonstrate that HotFlip can be adapted to attack a word-level classifier as well.
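The flip operation described in the abstract can be illustrated with a minimal sketch. It assumes the gradient of the loss with respect to each one-hot input vector is already available, and it scores a flip from token a to token b at position i by the first-order estimate grad[i][b] - grad[i][a]. The function name `best_flip`, the argument layout, and the toy gradient values are hypothetical and chosen only for illustration; the abstract does not specify these details.

```python
def best_flip(tokens, grad):
    """Pick the single token swap with the largest estimated loss increase.

    tokens: list of current token ids, one per position (hypothetical layout).
    grad:   grad[i][b] is the loss gradient w.r.t. the one-hot entry for
            token b at position i (assumed to be computed elsewhere).
    Returns (position, new_token, estimated_gain).
    """
    best = None
    for i, a in enumerate(tokens):
        for b, g in enumerate(grad[i]):
            if b == a:
                continue  # swapping a token for itself is not a flip
            # First-order estimate of the loss change for flipping a -> b.
            gain = g - grad[i][a]
            if best is None or gain > best[2]:
                best = (i, b, gain)
    return best


# Toy example: 3 positions, vocabulary of 3 tokens, made-up gradient values.
toy_grad = [
    [0.1, 0.5, -0.2],
    [0.3, 0.0, 0.9],
    [-0.4, 0.2, 0.1],
]
toy_tokens = [0, 2, 1]
position, new_token, gain = best_flip(toy_tokens, toy_grad)
```

Because every candidate flip is scored from one gradient computation, the search needs a single backward pass rather than one forward pass per candidate, which is one plausible reading of the efficiency the abstract refers to.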

Cite

Citation style: APA

Ebrahimi, J., Rao, A., Lowd, D., & Dou, D. (2018). HotFlip: White-box adversarial examples for text classification. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (Vol. 2, pp. 31–36). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p18-2006
