Generating Content-Preserving and Semantics-Flipping Adversarial Text

Weiping Pei; Chuan Yue

Conference Proceedings, Open Access

ASIA CCS 2022 - Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security (2022) 975-989

DOI: 10.1145/3488932.3517397

Abstract

Natural Language Processing (NLP) models are often vulnerable to semantics-preserving adversarial attacks. That is, they make different semantic predictions on input instances with similar content and semantics. However, it remains unclear to what extent modern NLP models are vulnerable to content-preserving and semantics-flipping (CPSF) adversarial attacks. That is, they would make the same semantic prediction on input instances with similar content but flipped semantics. Attackers can use either semantics-preserving or CPSF adversarial examples to create misunderstanding between humans and models, and incur severe consequences in real-world applications. However, this equally important problem of CPSF adversarial examples has not yet been studied by researchers. In this paper, we perform the first study to investigate CPSF adversarial examples and propose CPSF adversarial attacks to reveal this new type of vulnerability of NLP models. We develop a two-stage approach to generate CPSF adversarial examples. Our experiments on two types of NLP tasks, sentiment analysis and textual entailment, demonstrate that CPSF adversarial examples can successfully fool victim models while, to humans, preserving the same content with flipped semantics. We further validate the good transferability of CPSF adversarial examples on NLP services of Microsoft and Google. Moreover, we demonstrate that adversarial training can mitigate CPSF adversarial attacks to a meaningful extent. Overall, our work implies that researchers need to improve NLP models' robustness against CPSF adversarial attacks, which uniquely exploit the blind spots where NLP models are too insensitive to even large changes in semantics.
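
The two attack types contrasted in the abstract can be stated as simple success conditions. The following is a minimal Python sketch of that distinction only, not the paper's two-stage generation approach, which the abstract does not detail. All names (`Pair`, `classify_attack`, `toy_predict`) are hypothetical, the keyword classifier is a deliberately naive stand-in for a victim model, and content similarity between the two texts is assumed to have been judged by a human rather than computed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pair:
    """An original text and a perturbed text with similar content (assumed)."""
    original: str
    perturbed: str
    original_label: str   # human-judged label of the original text
    perturbed_label: str  # human-judged label of the perturbed text


def classify_attack(pair: Pair, predict: Callable[[str], str]) -> str:
    """Say which kind of adversarial example, if any, `pair` is for `predict`."""
    pred_orig = predict(pair.original)
    pred_pert = predict(pair.perturbed)
    if pred_orig != pair.original_label:
        # The model is already wrong on the original, so nothing was "attacked".
        return "not_applicable"
    semantics_flipped = pair.original_label != pair.perturbed_label
    prediction_changed = pred_orig != pred_pert
    if not semantics_flipped and prediction_changed:
        # Same meaning to humans, different prediction: oversensitive model.
        return "semantics_preserving"
    if semantics_flipped and not prediction_changed:
        # Flipped meaning to humans, same prediction: insensitive model.
        return "cpsf"
    return "none"


def toy_predict(text: str) -> str:
    """Hypothetical victim model: keyword matching that ignores negation."""
    return "positive" if "good" in text.lower() else "negative"


cpsf_pair = Pair("The food was good.", "The food was not good.",
                 "positive", "negative")
sp_pair = Pair("The food was good.", "The food was great.",
               "positive", "positive")

print(classify_attack(cpsf_pair, toy_predict))
print(classify_attack(sp_pair, toy_predict))
```

In this toy setting, the first pair exposes insensitivity to a negation that flips the meaning, while the second exposes oversensitivity to a synonym that leaves the meaning unchanged.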

Citation (APA)

Pei, W., & Yue, C. (2022). Generating Content-Preserving and Semantics-Flipping Adversarial Text. In ASIA CCS 2022 - Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security (pp. 975–989). Association for Computing Machinery, Inc. https://doi.org/10.1145/3488932.3517397
