VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations

Abstract

Adversarial attacks reveal serious flaws in deep learning models. More dangerously, adversarial text preserves the original meaning while escaping human recognition. Existing methods for detecting these attacks must be trained on original/adversarial data. In this paper, we propose detection without training by voting on hard labels from predictions of transformations, namely, VoteTRANS. Specifically, VoteTRANS detects adversarial text by comparing the hard label of the input text with those of its transformations. The evaluation demonstrates that VoteTRANS effectively detects adversarial text across various state-of-the-art attacks, models, and datasets.
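The core idea from the abstract — flag an input as adversarial when the majority hard label over its transformations disagrees with the label of the original text — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the classifier and the word-drop transformation here are toy stand-ins for the models and transformations used in the paper.

```python
from collections import Counter

def vote_trans_detect(text, classify, transform):
    """Flag text as adversarial when the majority hard label over its
    transformed variants disagrees with the label of the original text.
    (Illustrative sketch of the voting idea; not the paper's exact algorithm.)"""
    original_label = classify(text)
    labels = [classify(variant) for variant in transform(text)]
    if not labels:
        return False  # no transformations available; nothing to vote on
    majority_label, _ = Counter(labels).most_common(1)[0]
    return majority_label != original_label

# Toy hard-label classifier: predicts 1 iff the word "good" appears.
def toy_classify(text):
    return 1 if "good" in text.split() else 0

# Toy transformation: all variants obtained by dropping one word.
def toy_transform(text):
    words = text.split()
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]
```

On this toy setup, an input whose label is robust to the transformations ("fine good movie") keeps its majority label and is not flagged, while a brittle input ("good", whose label flips under every variant) is flagged as suspicious.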

Citation (APA)

Nguyen-Son, H. Q., Hidano, S., Fukushima, K., Kiyomoto, S., & Echizen, I. (2023). VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 5090–5104). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.315
