Abstract
Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences. However, in cross-lingual scenarios, e.g., machine translation, the PEs of source and target sentences are modeled independently. Due to word order divergences in different languages, modeling the cross-lingual positional relationships might help SANs tackle this problem. In this paper, we augment SANs with cross-lingual position representations to model the bilingually aware latent structure for the input sentence. Specifically, we utilize bracketing transduction grammar (BTG)-based reordering information to encourage SANs to learn bilingual diagonal alignments. Experimental results on WMT'14 English-German, WAT'17 Japanese-English, and WMT'17 Chinese-English translation tasks demonstrate that our approach significantly and consistently improves translation quality over strong baselines. Extensive analyses confirm that the performance gains come from the cross-lingual information.
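To make the idea concrete, below is a minimal sketch of how cross-lingual position information could be injected into a Transformer-style encoder. It assumes the reordered (target-order) indices are produced by an external BTG-based reorderer, and it fuses them additively with token and standard position embeddings; the module name, the additive fusion, and the hyperparameters are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class CrossLingualPositionEmbedding(nn.Module):
    """Sketch: combine monolingual and cross-lingual (reordered) position
    embeddings before feeding tokens to a self-attention encoder.
    The reordered indices are assumed to come from an external BTG-based
    reorderer; additive fusion is an assumption for illustration."""

    def __init__(self, vocab_size, d_model, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)   # standard source-side positions
        self.xpos = nn.Embedding(max_len, d_model)  # cross-lingual (reordered) positions

    def forward(self, tokens, reordered_positions):
        # tokens: (batch, seq_len) source token ids
        # reordered_positions: (batch, seq_len) target-order indices from the reorderer
        seq_len = tokens.size(1)
        positions = torch.arange(seq_len, device=tokens.device).unsqueeze(0)
        return self.tok(tokens) + self.pos(positions) + self.xpos(reordered_positions)


# Usage: suppose the BTG reorderer maps source order [0, 1, 2, 3] to [0, 2, 3, 1]
emb = CrossLingualPositionEmbedding(vocab_size=32000, d_model=512)
tokens = torch.randint(0, 32000, (1, 4))
reordered = torch.tensor([[0, 2, 3, 1]])
x = emb(tokens, reordered)  # (1, 4, 512), ready for a Transformer encoder
```

The intuition is that the extra embedding lets each source token carry a signal about where it would land in the target order, which is what encourages the attention heads to learn roughly diagonal bilingual alignments.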
Citation
Ding, L., Wang, L., & Tao, D. (2020). Self-attention with cross-lingual position representation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1679–1685). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.153