Coherence in Machine Translation (MT) has received little attention to date. One of the main obstacles to work in this area is the lack of labelled data. Coherent (human-authored) texts are abundant, and incoherent texts could be taken from MT output, but MT output also contains other errors that are not specifically related to coherence. This makes it difficult to identify and quantify coherence problems in those texts. We introduce an initiative to create a corpus of data artificially manipulated to contain the kinds of coherence errors common in MT output. Such a corpus could serve as a benchmark for coherence models in MT and, potentially, as training data for coherence models in supervised settings.
Smith, K. S., Aziz, W., & Specia, L. (2015). A Proposal for a Coherence Corpus in Machine Translation. In DiscoMT 2015 - Discourse in Machine Translation, Proceedings of the Workshop (pp. 52–58). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-2507