Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation

Abstract

Neural machine translation (NMT) has recently gained widespread attention because of its high translation accuracy. However, it performs poorly on long sentences, which is a major issue in low-resource languages. We assume that this issue is caused by an insufficient number of long sentences in the training data. Therefore, this study proposes a simple data augmentation method to handle long sentences. The method uses only the given parallel corpora as training data and generates long sentences by concatenating two sentences. Experimental results confirm that, despite its simplicity, the proposed data augmentation method improves the translation of long sentences. Moreover, combining the proposed method with back-translation further improves translation quality.
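The abstract describes the augmentation only at a high level: two sentence pairs from the given parallel corpus are joined into one longer pair. Below is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. It assumes that pairs are sampled uniformly at random, that the two sides are joined with a single space (`sep`), and that the synthetic pairs are appended to the original corpus. The function name `concat_augment` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def concat_augment(
    corpus: List[Pair],
    num_new: int,
    sep: str = " ",
    seed: int = 0,
) -> List[Pair]:
    """Return the original corpus plus `num_new` synthetic long pairs.

    Hypothetical helper (an assumption of this sketch): each synthetic pair
    joins two pairs taken from different positions of the corpus. The source
    sides and the target sides are concatenated in the same order, so the
    result is still a translation pair. Random pairing and the `sep` joiner
    are assumptions, not details confirmed by the abstract.
    """
    if len(corpus) < 2:
        raise ValueError("need at least two sentence pairs to concatenate")
    rng = random.Random(seed)  # seeded for reproducible augmentation
    augmented: List[Pair] = []
    for _ in range(num_new):
        # sample() picks two different positions without replacement.
        (src_a, tgt_a), (src_b, tgt_b) = rng.sample(corpus, 2)
        augmented.append((src_a + sep + src_b, tgt_a + sep + tgt_b))
    # Keep the original pairs first, then add the synthetic long ones.
    return corpus + augmented


if __name__ == "__main__":
    # Toy parallel corpus, for illustration only.
    toy = [
        ("Guten Morgen.", "Good morning."),
        ("Wie geht es dir?", "How are you?"),
        ("Danke.", "Thank you."),
    ]
    data = concat_augment(toy, num_new=2)
    print(len(data))  # 5
```

The one property the sketch must preserve is ordering. If the source halves were joined as A+B while the target halves were joined as B+A, the synthetic pair would no longer be a faithful translation.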

Citation (APA)

Kondo, S., Hotate, K., Hirasawa, T., Kaneko, M., & Komachi, M. (2021). Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop (pp. 143–149). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-srw.18
