Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers

Shanshan Xu; Irina Broda; Rashid Haddad; Marco Negrini; Matthias Grabmair

Conference Proceedings

NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop (2022) 238-245

DOI: 10.18653/v1/2022.nllp-1.21


Abstract

Recent work has demonstrated that natural language processing techniques can support consumer protection by automatically detecting unfair clauses in Terms of Service (ToS) agreements. This work shows that transformer-based ToS analysis systems are vulnerable to adversarial attacks. We conduct experiments attacking an unfair-clause detector with universal adversarial triggers. The experiments show that a minor perturbation of the text can considerably reduce detection performance. Moreover, to measure the detectability of the triggers, we conduct a detailed human evaluation study, collecting both answer accuracy and response time from the participants. The results show that the naturalness of the triggers remains key to tricking readers.

Citation (APA)

Xu, S., Broda, I., Haddad, R., Negrini, M., & Grabmair, M. (2022). Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers. In NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop (pp. 238–245). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.nllp-1.21
