Enhancing Neural Text Detector Robustness with μAttacking and RR-Training

Gongbo Liang; Jesus Guerrero; Fengbo Zheng; Izzat Alsmadi

Journal Article, Open Access

Electronics (Switzerland) (2023) 12(8)

DOI: 10.3390/electronics12081948

Abstract

Advanced neural network techniques allow language models to generate content that appears to have been written by humans. This progress benefits society in numerous ways, but it may also bring threats that we have not seen before. A neural text detector is a classification model that separates machine-generated text from human-written text. Unfortunately, a pretrained neural text detector may be vulnerable to adversarial attacks, which aim to fool the detector into making wrong classification decisions. In this work, we propose μAttacking, a mutation-based general framework for systematically evaluating the robustness of neural text detectors. Our experiments demonstrate that μAttacking identifies a detector's flaws effectively. Inspired by the information revealed by μAttacking, we also propose RR-training, a straightforward but effective strategy for improving the robustness of neural text detectors through finetuning. Compared with normal finetuning, our experiments demonstrate that RR-training increases model robustness by up to (Formula presented.) without much additional effort when finetuning a neural text detector. We believe that μAttacking and RR-training are useful tools for developing and evaluating neural language models.

Citation (APA)

Liang, G., Guerrero, J., Zheng, F., & Alsmadi, I. (2023). Enhancing Neural Text Detector Robustness with μAttacking and RR-Training. Electronics (Switzerland), 12(8). https://doi.org/10.3390/electronics12081948
