A Proposal for a Coherence Corpus in Machine Translation

Abstract

Coherence in Machine Translation (MT) has received little attention to date. One of the main obstacles to work in this area is the lack of labelled data. While coherent (human-authored) texts are abundant and incoherent texts could be taken from MT output, MT output also contains other errors that are not specifically related to coherence, which makes it difficult to identify and quantify coherence issues in those texts. We introduce an initiative to create a corpus of data artificially manipulated to contain the coherence errors common in MT output. Such a corpus could then be used as a benchmark for coherence models in MT, and potentially as training data for coherence models in supervised settings.

Cite

Smith, K. S., Aziz, W., & Specia, L. (2015). A Proposal for a Coherence Corpus in Machine Translation. In DiscoMT 2015 - Discourse in Machine Translation, Proceedings of the Workshop (pp. 52–58). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-2507
