Character-level Chinese-English Translation through ASCII Encoding

Abstract

Character-level Neural Machine Translation (NMT) models have recently achieved impressive results on many language pairs. They mainly do well for Indo-European language pairs, where the languages share the same writing system. However, for translating between Chinese and English, the gap between the two different writing systems poses a major challenge because of a lack of systematic correspondence between the individual linguistic units. In this paper, we enable character-level NMT for Chinese by breaking down Chinese characters into linguistic units similar to those of Indo-European languages. We use the Wubi encoding scheme, which preserves the original shape and semantic information of the characters, while also being reversible. We show promising results from training Wubi-based models on the character and subword levels with recurrent as well as convolutional models.
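
As a rough illustration of what a reversible ASCII encoding of Chinese text could look like, here is a minimal Python sketch. It is not the authors' implementation. The names `TOY_TABLE`, `encode`, and `decode` are hypothetical, and the three table entries are illustrative placeholders that have not been checked against a real Wubi dictionary. The sketch also assumes a one-to-one table and input text without spaces.

```python
# Toy character-to-code table. The entries are illustrative placeholders,
# not verified against a real Wubi dictionary (assumption).
TOY_TABLE = {"我": "trnt", "你": "wqiy", "好": "vbg"}

# The table is assumed to be one-to-one, so it can simply be inverted.
REVERSE_TABLE = {code: char for char, code in TOY_TABLE.items()}


def encode(text: str) -> str:
    """Map each known character to its ASCII code, separated by spaces.

    Characters missing from the table are passed through unchanged.
    """
    return " ".join(TOY_TABLE.get(ch, ch) for ch in text)


def decode(encoded: str) -> str:
    """Invert encode(): map each space-separated code back to its character."""
    return "".join(REVERSE_TABLE.get(tok, tok) for tok in encoded.split(" "))


if __name__ == "__main__":
    sentence = "你好"
    codes = encode(sentence)
    print(codes)
    print(decode(codes) == sentence)
```

After encoding, a character-level model would see a sequence of Latin letters rather than thousands of distinct Chinese characters, and the output can be mapped back to characters because the table is invertible.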

Cite

Nikolov, N. I., Hu, Y., Tan, M. X., & Hahnloser, R. H. R. (2018). Character-level Chinese-English Translation through ASCII Encoding. In WMT 2018 - 3rd Conference on Machine Translation, Proceedings of the Conference (Vol. 1, pp. 10–16). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-6302
