An unsupervised method for automatic translation memory cleaning

2 citations · 87 Mendeley readers

Abstract

We address the problem of automatically cleaning a large-scale Translation Memory (TM) in a fully unsupervised fashion, i.e. without human-labelled data. We approach the task by: i) designing a set of features that capture the similarity between two text segments in different languages, ii) using them to induce reliable training labels for a subset of the translation units (TUs) contained in the TM, and iii) using the automatically labelled data to train an ensemble of binary classifiers. We apply our method to clean a test set composed of 1,000 TUs randomly extracted from the English-Italian version of MyMemory, the world's largest public TM. Our results show competitive performance not only against a strong baseline that exploits machine translation, but also against a state-of-the-art method that relies on human-labelled data.
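To make the pipeline concrete, the following Python/scikit-learn sketch walks through the three steps on a toy English-Italian example. The two similarity features, the score thresholds used to induce labels, and the classifiers in the voting ensemble are illustrative assumptions only; they are not the features or models described in the paper.

```python
# A minimal, hypothetical sketch of the three-step pipeline: the two similarity
# features, the labelling thresholds, and the classifiers in the ensemble are
# illustrative assumptions, not the feature set or models used in the paper.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def similarity_features(src, tgt):
    """Step i): cheap, language-agnostic similarity features for one TU."""
    len_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt), 1)
    src_tok, tgt_tok = set(src.lower().split()), set(tgt.lower().split())
    # Overlap of identical tokens (numbers, URLs, etc. survive translation).
    overlap = len(src_tok & tgt_tok) / max(len(src_tok | tgt_tok), 1)
    return [len_ratio, overlap]


def induce_labels(features, lo=0.25, hi=0.5):
    """Step ii): keep only TUs whose mean feature score is confidently low
    (label 0 = noisy) or confidently high (label 1 = clean); drop the rest."""
    score = features.mean(axis=1)
    keep = (score <= lo) | (score >= hi)
    return features[keep], (score[keep] >= hi).astype(int)


def train_ensemble(X, y):
    """Step iii): ensemble of binary classifiers trained on induced labels."""
    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("svm", SVC()),
            ("dt", DecisionTreeClassifier(max_depth=3)),
        ],
        voting="hard",
    )
    return ensemble.fit(X, y)


if __name__ == "__main__":
    # Toy English-Italian TUs: (source, target) pairs, the last two clearly noisy.
    tus = [
        ("release 2.0 available at http://example.com",
         "versione 2.0 disponibile su http://example.com"),
        ("click the OK button", "fare clic sul pulsante OK"),
        ("good morning", "si prega di riavviare il server immediatamente"),
        ("the quick brown fox", "12345"),
    ]
    X_all = np.array([similarity_features(s, t) for s, t in tus])
    X_train, y_train = induce_labels(X_all)   # labels only the confident TUs
    model = train_ensemble(X_train, y_train)
    print(model.predict(X_all))               # 1 = keep TU, 0 = flag as noisy
```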

Citation (APA)

Sabet, M. J., Negri, M., Turchi, M., & Barbu, E. (2016). An unsupervised method for automatic translation memory cleaning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 287–292). Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-2047
