Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning

Abstract

Neural machine translation systems are known to be vulnerable to adversarial test inputs; however, as we show in this paper, these systems are also vulnerable to training attacks. Specifically, we propose a poisoning attack in which a malicious adversary inserts a small poisoned sample of monolingual text into the training set of a system trained using back-translation. This sample is designed to induce a specific, targeted translation behaviour, such as peddling misinformation. We present two methods for crafting poisoned examples, and show that a tiny handful of instances, amounting to only 0.02% of the training set, is sufficient to enact a successful attack. We outline a defence method against these attacks, which partly ameliorates the problem. However, we stress that this is a blind spot in modern NMT, demanding immediate attention.
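
To give a sense of scale for the 0.02% figure quoted in the abstract, the following is a minimal arithmetic sketch. The function name `poison_budget` and the training-set sizes are illustrative assumptions and are not taken from the paper; only the 0.02% rate comes from the abstract.

```python
import math
from fractions import Fraction

# 0.02% is the rate quoted in the abstract; stored as an exact fraction
# (2 / 10,000) to avoid floating-point rounding surprises.
POISON_RATE = Fraction(2, 10_000)


def poison_budget(training_set_size: int, rate: Fraction = POISON_RATE) -> int:
    """Return how many sentences `rate` of the training set amounts to, rounded up.

    Hypothetical helper for illustration only; not part of the paper.
    """
    if training_set_size < 0:
        raise ValueError("training_set_size must be non-negative")
    return math.ceil(training_set_size * rate)


if __name__ == "__main__":
    # Assumed corpus sizes, chosen only to illustrate the order of magnitude.
    for size in (100_000, 1_000_000, 5_000_000):
        print(f"{size:>9,} training sentences -> {poison_budget(size):,} at 0.02%")
```

Under these assumed sizes, the rate corresponds to tens to a thousand sentences, which is why auditing monolingual data sources matters even when the suspect share looks negligible.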

Cite

Wang, J., Xu, C., Guzmán, F., El-Kishky, A., Tang, Y., Rubinstein, B. I. P., & Cohn, T. (2021). Putting words into the system’s mouth: A targeted attack on neural machine translation using monolingual data poisoning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 1463–1473). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.127
