A Proposal for a Coherence Corpus in Machine Translation

Abstract

Coherence in Machine Translation (MT) has received little attention to date. One of the main obstacles to work in this area is the lack of labelled data. While coherent (human-authored) texts are abundant and incoherent texts could be taken from MT output, MT output also contains other errors that are not specifically related to coherence. This makes it difficult to identify and quantify coherence issues in those texts. We introduce an initiative to create a corpus of data artificially manipulated to contain coherence errors common in MT output. Such a corpus could then be used as a benchmark for coherence models in MT, and potentially as training data for coherence models in supervised settings.

Smith, K. S., Aziz, W., & Specia, L. (2015). A Proposal for a Coherence Corpus in Machine Translation. In DiscoMT 2015 - Discourse in Machine Translation, Proceedings of the Workshop (pp. 52–58). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-2507
