Generating Content-Preserving and Semantics-Flipping Adversarial Text

Abstract

Natural Language Processing (NLP) models are often vulnerable to semantics-preserving adversarial attacks: they make different semantic predictions on input instances with similar content and semantics. It remains unclear, however, to what extent modern NLP models are vulnerable to content-preserving and semantics-flipping (CPSF) adversarial attacks, in which a model makes the same semantic prediction on input instances with similar content but flipped semantics. Attackers can use either semantics-preserving or CPSF adversarial examples to create misunderstanding between humans and models, with severe consequences in real-world applications. Despite being equally important, CPSF adversarial examples have not yet been studied. In this paper, we perform the first study of CPSF adversarial examples and propose CPSF adversarial attacks to reveal this new type of vulnerability in NLP models. We develop a two-stage approach to generate CPSF adversarial examples. Our experiments on two types of NLP tasks, sentiment analysis and textual entailment, demonstrate that CPSF adversarial examples can successfully fool victim models while, to human readers, preserving the content and flipping the semantics. We further validate that CPSF adversarial examples transfer well to NLP services of Microsoft and Google. Moreover, we demonstrate that adversarial training can mitigate CPSF adversarial attacks to a meaningful extent. Overall, our work implies that researchers need to improve NLP models' robustness against CPSF adversarial attacks, which uniquely exploit the blind spots where NLP models are insensitive even to large changes in semantics.
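
To make the CPSF definition concrete, the following is a minimal sketch of the three conditions the abstract describes: similar content, flipped semantics, and an unchanged model prediction. It is not the authors' two-stage generation approach, which the abstract does not detail. The names `token_overlap`, `is_cpsf_success`, and `toy_predict`, the word-overlap similarity measure, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from typing import Callable


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a crude, assumed proxy for content similarity)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def is_cpsf_success(
    original: str,
    candidate: str,
    human_label_original: str,
    human_label_candidate: str,
    predict: Callable[[str], str],
    min_overlap: float = 0.6,  # assumed threshold, not taken from the paper
) -> bool:
    """Return True if the candidate meets all three CPSF conditions."""
    content_preserved = token_overlap(original, candidate) >= min_overlap
    semantics_flipped = human_label_original != human_label_candidate
    prediction_unchanged = predict(original) == predict(candidate)
    return content_preserved and semantics_flipped and prediction_unchanged


def toy_predict(text: str) -> str:
    """Hypothetical victim model: keyword-based sentiment that ignores negation."""
    return "positive" if "good" in text.lower().split() else "negative"


original = "the movie was good"
candidate = "the movie was not good"  # humans read this as negative
result = is_cpsf_success(original, candidate, "positive", "negative", toy_predict)
print(result)
```

In this toy case the candidate shares most of its words with the original, a human would assign it the opposite sentiment, and the keyword-based model still predicts "positive", so the check reports a successful CPSF example. A semantics-preserving attack would test the opposite pattern: unchanged human label, changed prediction.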

Citation

Pei, W., & Yue, C. (2022). Generating Content-Preserving and Semantics-Flipping Adversarial Text. In ASIA CCS 2022 - Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security (pp. 975–989). Association for Computing Machinery, Inc. https://doi.org/10.1145/3488932.3517397
