Automatic long sentence segmentation for neural machine translation

Shaohui Kuang; Deyi Xiong

Book Chapter

Springer Verlag, (2016), 162-174

DOI: 10.1007/978-3-319-50496-4_14

3 Citations

20 Readers

Abstract

Neural machine translation (NMT) is an emerging machine translation paradigm that translates texts with an encoder-decoder neural architecture. Very recent studies find that translation quality drops significantly when NMT translates long sentences. In this paper, we propose a novel method to deal with this issue by segmenting long sentences into several clauses. We introduce a split and reordering model to collectively detect the optimal sequence of segmentation points for a long source sentence. Each segmented clause is translated by the NMT system independently into a target clause. The translated target clauses are then concatenated without reordering to form the final translation for the long sentence. On NIST Chinese-English translation tasks, our segmentation method achieves a substantial improvement of 2.94 BLEU points over the NMT baseline on translating long sentences with more than 30 words, and 5.43 BLEU points on sentences of over 40 words.
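The segment, translate, and concatenate pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the paper's learned split and reordering model is not specified here, so a naive punctuation-based splitter stands in for it, and the names `segment`, `translate_long`, `SPLIT_TOKENS`, `max_len`, and `min_clause` are all hypothetical. The default length threshold of 30 is borrowed from the evaluation buckets in the abstract and is not necessarily the threshold the authors use.

```python
from typing import Callable, List

# Hypothetical stand-in for the learned split and reordering model:
# candidate segmentation points are clause-level punctuation marks only.
SPLIT_TOKENS = {",", ";", ":", "，", "；", "："}


def segment(tokens: List[str], max_len: int = 30, min_clause: int = 3) -> List[List[str]]:
    """Split a long tokenized sentence into clauses; leave short ones intact."""
    if len(tokens) <= max_len:
        return [tokens]
    clauses: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        # Close the clause at a punctuation mark once it is long enough.
        if tok in SPLIT_TOKENS and len(current) >= min_clause:
            clauses.append(current)
            current = []
    if current:
        clauses.append(current)  # trailing tokens after the last split point
    return clauses


def translate_long(
    tokens: List[str],
    translate_clause: Callable[[List[str]], str],
    max_len: int = 30,
) -> str:
    """Translate each clause independently, then concatenate in source order."""
    clauses = segment(tokens, max_len=max_len)
    return " ".join(translate_clause(clause) for clause in clauses)
```

Here `translate_clause` would wrap whatever NMT system is available. Because the target clauses are joined in source order, the quality of the result depends on choosing split points that do not require cross-clause reordering, which is the role the abstract assigns to the split and reordering model.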

Cite

APA

Kuang, S., & Xiong, D. (2016). Automatic long sentence segmentation for neural machine translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10102, pp. 162–174). Springer Verlag. https://doi.org/10.1007/978-3-319-50496-4_14
