Automatic long sentence segmentation for neural machine translation

Abstract

Neural machine translation (NMT) is an emerging machine translation paradigm that translates texts with an encoder-decoder neural architecture. Very recent studies find that translation quality drops significantly when NMT translates long sentences. In this paper, we propose a novel method to deal with this issue by segmenting long sentences into several clauses. We introduce a split and reordering model to collectively detect the optimal sequence of segmentation points for a long source sentence. Each segmented clause is translated by the NMT system independently into a target clause. The translated target clauses are then concatenated without reordering to form the final translation for the long sentence. On NIST Chinese-English translation tasks, our segmentation method achieves a substantial improvement of 2.94 BLEU points over the NMT baseline on translating long sentences with more than 30 words, and 5.43 BLEU points on sentences of over 40 words.
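The segment, translate, and concatenate pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The segmentation points are assumed to be given, because the split and reordering model that predicts them in the paper is not reproduced here. `translate_clause` is a hypothetical stand-in for the NMT system, and the names `split_at` and `translate_long_sentence` are also illustrative.

```python
from typing import Callable, List, Sequence


def split_at(tokens: Sequence[str], points: Sequence[int]) -> List[List[str]]:
    """Split a tokenized sentence into clauses at the given boundary indices.

    A point p means a clause ends after tokens[p - 1]. Points must be strictly
    increasing and lie strictly between 0 and len(tokens).
    (Assumption: points come from some external segmentation model.)
    """
    bounds = [0, *points, len(tokens)]
    for start, end in zip(bounds, bounds[1:]):
        if start >= end:
            raise ValueError("segmentation points must be strictly increasing and inside the sentence")
    return [list(tokens[start:end]) for start, end in zip(bounds, bounds[1:])]


def translate_long_sentence(
    tokens: Sequence[str],
    points: Sequence[int],
    translate_clause: Callable[[List[str]], List[str]],
) -> List[str]:
    """Translate each clause independently, then join results in source order."""
    clauses = split_at(tokens, points)
    # Each clause is handed to the (hypothetical) translator on its own.
    translated = [translate_clause(clause) for clause in clauses]
    # Concatenate target clauses in the same order as the source clauses,
    # with no reordering across clause boundaries.
    return [tok for clause in translated for tok in clause]


if __name__ == "__main__":
    # Toy "translator" that reverses the tokens of a clause.
    print(translate_long_sentence("a b c d e".split(), [2, 4], lambda c: c[::-1]))
    # prints ['b', 'a', 'd', 'c', 'e']
```

Because target clauses are joined in source order, any reordering done by the translator stays inside a single clause: in the toy run, the reversal never crosses the boundaries at positions 2 and 4.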

Citation (APA)

Kuang, S., & Xiong, D. (2016). Automatic long sentence segmentation for neural machine translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10102, pp. 162–174). Springer Verlag. https://doi.org/10.1007/978-3-319-50496-4_14
