Pulling Out All The Full Stops: Punctuation Sensitivity in Neural Machine Translation and Evaluation

Abstract

Much of the work testing machine translation systems for robustness and sensitivity has been adversarial or has tended towards testing noisy input, such as spelling errors, or non-standard input, such as dialects. In this work, we take a step back to investigate a sensitivity problem that can seem trivial and is often overlooked: punctuation. We perform basic sentence-final insertion and deletion perturbation tests with full stops, exclamation marks and question marks across source languages and demonstrate a concerning finding: commercial, production-level machine translation systems are vulnerable to the insertion or deletion of a single punctuation mark, resulting in unreliable translations. Moreover, we demonstrate that both string-based and model-based evaluation metrics suffer from the same vulnerability, producing significantly different scores when translations differ only in a single punctuation mark, with model-based metrics penalizing each punctuation mark differently. Our work calls into question the reliability of machine translation systems and their evaluation metrics, particularly for real-world use cases, where inconsistent punctuation is often the most common and the least disruptive form of noise.
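The perturbation test described in the abstract can be illustrated with a minimal sketch: for each source sentence, produce variants that differ only in sentence-final punctuation and compare the system's translations of the original and perturbed inputs. The helper names below (`delete_final_punct`, `insert_final_punct`, `perturbation_pairs`) are hypothetical and do not correspond to the paper's actual code.

```python
# Minimal sketch, assuming the perturbations are limited to sentence-final
# full stops, exclamation marks and question marks, as in the abstract.

PUNCT = (".", "!", "?")


def delete_final_punct(sentence: str) -> str:
    """Remove a sentence-final full stop, exclamation or question mark, if present."""
    stripped = sentence.rstrip()
    if stripped.endswith(PUNCT):
        return stripped[:-1].rstrip()
    return stripped


def insert_final_punct(sentence: str, mark: str = ".") -> str:
    """Append a final punctuation mark to a sentence that ends without one."""
    stripped = sentence.rstrip()
    if not stripped.endswith(PUNCT):
        return stripped + mark
    return stripped


def perturbation_pairs(sentence: str):
    """Yield (original, perturbed) pairs differing only in final punctuation."""
    yield sentence, delete_final_punct(sentence)
    base = delete_final_punct(sentence)
    for mark in PUNCT:
        yield sentence, insert_final_punct(base, mark)


if __name__ == "__main__":
    src = "The meeting starts at noon."
    for original, perturbed in perturbation_pairs(src):
        # In the actual test, both versions would be translated by the MT system
        # under study and the outputs compared with string-based and model-based
        # metrics to measure sensitivity to the single-character change.
        print(repr(original), "->", repr(perturbed))
```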

Cite

APA

Jwalapuram, P. (2023). Pulling Out All The Full Stops: Punctuation Sensitivity in Neural Machine Translation and Evaluation. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 6116–6130). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.381
