Treatment of markup in statistical machine translation

Abstract

We present work on handling XML markup in Statistical Machine Translation (SMT). The methods we propose can be used to effectively preserve markup (for instance inline formatting or structure) and to place markup correctly in a machine-translated segment. We evaluate our approaches with parallel data that naturally contains markup or where markup was inserted to create synthetic examples. In our experiments, hybrid reinsertion has proven the most accurate method to handle markup, while alignment masking and alignment reinsertion should be regarded as viable alternatives. We provide implementations of all the methods described and they are freely available as an open-source framework.
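As a rough illustration of the general idea behind masking markup before translation and restoring it afterwards, the sketch below replaces each inline XML tag with a numbered placeholder token and later substitutes the original tags back. It is not taken from the paper's framework: the function names `mask` and `unmask`, the `__xml_N__` token format, and the simple regular expression are all assumptions made for this example, and it does not attempt the alignment-based placement the abstract refers to.

```python
import re

# Simplified tag pattern (an assumption): anything between '<' and '>'.
TAG = re.compile(r"<[^>]+>")

def mask(segment):
    """Replace each tag with a numbered placeholder; return text and mapping."""
    mapping = {}

    def repl(match):
        token = f"__xml_{len(mapping)}__"   # hypothetical token format
        mapping[token] = match.group(0)
        return token

    return TAG.sub(repl, segment), mapping

def unmask(segment, mapping):
    """Substitute the original tags back in place of their placeholders."""
    for token, tag in mapping.items():
        segment = segment.replace(token, tag)
    return segment

masked, mapping = mask("Click <b>here</b> now")
restored = unmask(masked, mapping)
```

In a real pipeline the masked segment would pass through the translation system between the two calls, and the placeholders would only survive if the system leaves them untouched, which is one reason alignment-based strategies may be needed.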

Cite

Müller, M. (2017). Treatment of markup in statistical machine translation. In DiscoMT 2017 - Discourse in Machine Translation, Proceedings of the Workshop (pp. 36–46). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4804
