Abstract
Much of the work testing machine translation systems for robustness and sensitivity has been adversarial, or has focused on noisy input such as spelling errors and non-standard input such as dialects. In this work, we take a step back to investigate a sensitivity problem that can seem trivial and is often overlooked: punctuation. We perform basic sentence-final insertion and deletion perturbation tests with full stops, exclamation marks, and question marks across source languages and demonstrate a concerning finding: commercial, production-level machine translation systems are vulnerable to the mere insertion or deletion of a single punctuation mark, resulting in unreliable translations. Moreover, we demonstrate that both string-based and model-based evaluation metrics suffer from this vulnerability, producing significantly different scores when translations differ only by a single punctuation mark, with model-based metrics penalizing each punctuation mark differently. Our work calls into question the reliability of machine translation systems and their evaluation metrics, particularly for real-world use cases, where inconsistent punctuation is often the most common and the least disruptive form of noise.
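The perturbation tests described above are straightforward to reproduce. Below is a minimal sketch in Python, assuming sacrebleu is installed for the string-based metric; the function names and example sentences are illustrative and not taken from the paper.

```python
from sacrebleu.metrics import CHRF  # string-based MT metric (pip install sacrebleu)

# Sentence-final punctuation marks considered in the perturbation tests.
FINAL_PUNCT = (".", "!", "?")

def delete_final_punct(sentence: str) -> str:
    """Delete a single sentence-final punctuation mark, if one is present."""
    stripped = sentence.rstrip()
    return stripped[:-1] if stripped.endswith(FINAL_PUNCT) else stripped

def insert_final_punct(sentence: str, mark: str = ".") -> str:
    """Append a single punctuation mark if the sentence ends without one."""
    stripped = sentence.rstrip()
    return stripped if stripped.endswith(FINAL_PUNCT) else stripped + mark

# Source-side perturbations: a robust MT system should translate all
# variants near-identically.
source = "the cat sat on the mat"
variants = [source] + [insert_final_punct(source, m) for m in FINAL_PUNCT]
print(variants)

# Metric-side sensitivity: scoring two hypotheses that differ only by a
# final full stop against the same reference yields different scores.
chrf = CHRF()
reference = ["The cat sat on the mat."]
print(chrf.sentence_score("The cat sat on the mat.", reference).score)
print(chrf.sentence_score("The cat sat on the mat", reference).score)
```

The same probe extends to model-based metrics: score each punctuation-perturbed hypothesis against a fixed reference and compare the resulting scores.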
Citation
Jwalapuram, P. (2023). Pulling Out All The Full Stops: Punctuation Sensitivity in Neural Machine Translation and Evaluation. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 6116–6130). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.381