Generating Adversarial Texts for Recurrent Neural Networks

Abstract

Adversarial examples have received increasing attention recently due to their significant value in evaluating and improving the robustness of deep neural networks. Existing adversarial attack algorithms have achieved good results on most images. However, those algorithms cannot be applied directly to text, as text data is discrete in nature. In this paper, we extend two state-of-the-art attack algorithms, PGD and C&W, to craft adversarial text examples for RNN-based models. The Extend-PGD attack identifies the words that are most important for classification by computing the Jacobian matrix of the classifier, which allows it to generate adversarial text examples efficiently. The Extend-C&W attack uses regularization to minimize the alteration of the original input text. We conduct comparison experiments on two recurrent neural networks trained to classify texts from two real-world datasets. Experimental results show that Extend-PGD achieves a higher attack success rate, while Extend-C&W better preserves the semantics of the original text.
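
The paper's own implementation is not reproduced here; as a rough sketch of the Jacobian-based word ranking described in the abstract, the following PyTorch code scores each word by the gradient of the target-class logit with respect to that word's embedding, i.e. one row of the classifier's Jacobian. All names (TextRNN, word_saliency, the toy dimensions) are illustrative assumptions, not the authors' code. The Extend-C&W side would additionally add a regularization term to the attack objective, for example an L2 penalty on the embedding perturbation, to keep the adversarial text close to the original; that step is omitted in this sketch.

# Hypothetical sketch: Jacobian-based word saliency for an RNN text
# classifier, in the spirit of the Extend-PGD step described above.
import torch
import torch.nn as nn

class TextRNN(nn.Module):
    """Toy LSTM classifier: embeddings -> LSTM -> linear logits."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden=128, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, classes)

    def forward_from_embeddings(self, emb):
        out, _ = self.lstm(emb)
        return self.fc(out[:, -1])  # logits from the final time step

def word_saliency(model, token_ids, target_class):
    """Rank words by the gradient of the target-class logit with
    respect to each word's embedding (one row of the Jacobian)."""
    emb = model.embed(token_ids).detach().requires_grad_(True)
    logits = model.forward_from_embeddings(emb)
    logits[0, target_class].backward()
    # L2 norm of the gradient per word approximates its importance
    return emb.grad.norm(dim=-1).squeeze(0)

model = TextRNN()
tokens = torch.randint(0, 1000, (1, 12))   # one 12-word example
scores = word_saliency(model, tokens, target_class=1)
print(scores.argsort(descending=True))     # most influential words first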

Citation (APA)

Liu, C., Lin, W., & Yang, Z. (2020). Generating Adversarial Texts for Recurrent Neural Networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12396 LNCS, pp. 39–51). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-61609-0_4
