A Textual Backdoor Defense Method Based on Deep Feature Classification

Kun Shao; Junan Yang; Pengjiang Hu; Xiaoshuai Li

Journal article, open access

Entropy (2023) 25(2)

DOI: 10.3390/e25020220

Abstract

Natural language processing (NLP) models based on deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor defense methods have limited effectiveness and cover only a limited range of scenarios. We propose a textual backdoor defense method based on deep feature classification. The method consists of two parts: deep feature extraction and classifier construction. It exploits the fact that the deep features of poisoned data and benign data are distinguishable. The defense is implemented in both offline and online scenarios. We conducted defense experiments against a variety of backdoor attacks on two datasets and two models. The experimental results demonstrate the effectiveness of this defense approach, which outperforms the baseline defense method.
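The abstract does not say which layer supplies the deep features or which classifier is used, so the following is only a hedged sketch of the general idea, not the authors' implementation. It assumes each input text has already been mapped to a feature vector (for example, a hidden-layer representation of the victim model) and swaps in a plain logistic-regression classifier trained to separate poisoned (label 1) from benign (label 0) features. The function names, the 0.5 threshold, and the synthetic 2-D "features" are all hypothetical. The offline case filters a possibly poisoned training set; the online case screens individual inputs at inference time.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def poison_score(w: Vector, b: float, x: Vector) -> float:
    # Estimated probability that feature vector x comes from poisoned text.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_classifier(features: Sequence[Vector], labels: Sequence[int],
                     lr: float = 0.1, epochs: int = 200) -> Tuple[Vector, float]:
    # Full-batch gradient descent on the logistic loss (1 = poisoned, 0 = benign).
    # Logistic regression is an assumed stand-in for the paper's classifier.
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = poison_score(w, b, x) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def offline_filter(w: Vector, b: float, dataset: Sequence[Vector],
                   threshold: float = 0.5) -> List[Vector]:
    # Offline scenario: keep only training samples whose features look benign.
    return [x for x in dataset if poison_score(w, b, x) < threshold]


def online_accept(w: Vector, b: float, x: Vector, threshold: float = 0.5) -> bool:
    # Online scenario: accept an incoming input only if its features look benign.
    return poison_score(w, b, x) < threshold


def synthetic_features(n: int, center: float, seed: int) -> List[Vector]:
    # Hypothetical stand-in for DNN hidden-layer features (2-D for brevity).
    rng = random.Random(seed)
    return [[rng.gauss(center, 0.3), rng.gauss(center, 0.3)] for _ in range(n)]


if __name__ == "__main__":
    benign = synthetic_features(50, -1.0, seed=0)
    poisoned = synthetic_features(50, 1.0, seed=1)
    w, b = train_classifier(benign + poisoned, [0] * 50 + [1] * 50)
    kept = offline_filter(w, b, benign + poisoned)
    print("samples kept by offline filter:", len(kept))
    print("accept benign-like input:", online_accept(w, b, [-1.0, -1.0]))
    print("accept poisoned-like input:", online_accept(w, b, [1.0, 1.0]))
```

The synthetic clusters are deliberately well separated, which mirrors the abstract's premise that poisoned and benign deep features are distinguishable; real features would be higher-dimensional and the threshold would trade false rejections of benign inputs against missed poisoned ones.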

Cite (APA)

Shao, K., Yang, J., Hu, P., & Li, X. (2023). A Textual Backdoor Defense Method Based on Deep Feature Classification. Entropy, 25(2). https://doi.org/10.3390/e25020220