For several reasons, machine translation systems are today ill-suited to processing long texts in one shot. In particular, statistical machine translation employs heuristic search algorithms whose level of approximation depends on the length of the input. Moreover, processing time can become a bottleneck with long sentences, whereas multiple text chunks can be quickly processed in parallel. Hence, in real working conditions the problem arises of how to optimally split the input text. In this work, we investigate several text segmentation criteria and verify their impact on translation performance by means of a statistical phrase-based translation system. Experiments are reported on a popular and difficult task, namely the Chinese-to-English translation of news-agency texts as proposed by the NIST MT evaluation workshops. Results reveal that the best performance is achieved by taking into account both linguistic and input-length constraints. © Springer-Verlag Berlin Heidelberg 2006.
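The abstract's core idea, combining a linguistic boundary criterion with an input-length constraint when splitting text into translatable chunks, can be illustrated with a minimal sketch. This is not the paper's actual algorithm; the punctuation-based boundary test and the `max_tokens` parameter are simplifying assumptions made here for illustration.

```python
import re

def segment(text: str, max_tokens: int = 25) -> list[str]:
    """Split text into chunks for independent translation.

    Hypothetical criterion: cut whenever a token ends in clause/sentence
    punctuation (a crude linguistic constraint), and force a cut once the
    chunk reaches max_tokens (the length constraint).
    """
    tokens = text.split()
    chunks, cur = [], []
    for tok in tokens:
        cur.append(tok)
        at_boundary = re.search(r"[.,;:!?]$", tok) is not None
        if at_boundary or len(cur) >= max_tokens:
            chunks.append(" ".join(cur))
            cur = []
    if cur:  # flush any trailing tokens
        chunks.append(" ".join(cur))
    return chunks
```

Each resulting chunk respects the length bound, so the chunks can be dispatched to parallel decoder instances and their translations concatenated in order.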
CITATION STYLE
Cettolo, M., & Federico, M. (2006). Text segmentation criteria for statistical machine translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4139 LNAI, pp. 664–673). Springer Verlag. https://doi.org/10.1007/11816508_66