Vecalign: Improved sentence alignment in linear time and space


Abstract

We introduce Vecalign, a novel bilingual sentence alignment method which is linear in time and space with respect to the number of sentences being aligned and which requires only bilingual sentence embeddings. On a standard German-French test set, Vecalign outperforms the previous state-of-the-art method (which has quadratic time complexity and requires a machine translation system) by 5 F1 points. It substantially outperforms the popular Hunalign toolkit at recovering Bible verse alignments in medium- to low-resource language pairs, and it improves downstream MT quality by 1.7 and 1.6 BLEU in Sinhala→English and Nepali→English, respectively, compared to the Hunalign-based Paracrawl pipeline.
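
To make the idea of aligning sentences from embeddings alone more concrete, here is a minimal sketch. It is not the Vecalign algorithm: it is a plain quadratic dynamic program over cosine similarities that handles only 1-1 matches and skipped sentences, without the linear-time approximation the abstract describes. The function names `cosine` and `align`, the `skip_cost` parameter, and the toy two-dimensional vectors are all assumptions made for illustration; real bilingual sentence embeddings would come from a multilingual encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align(src_vecs, tgt_vecs, skip_cost=0.5):
    """Monotonic 1-1 alignment by dynamic programming (illustrative only).

    Matching source i with target j costs 1 - cosine similarity; leaving a
    sentence unaligned costs `skip_cost` (a hypothetical tuning parameter).
    Runs in O(n * m) time and space, i.e. quadratic, unlike Vecalign.
    """
    n, m = len(src_vecs), len(tgt_vecs)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                c = cost[i - 1][j - 1] + 1.0 - cosine(src_vecs[i - 1], tgt_vecs[j - 1])
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "match"
            if i > 0:
                c = cost[i - 1][j] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "skip_src"
            if j > 0:
                c = cost[i][j - 1] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "skip_tgt"

    # Walk the back-pointers from the end to recover the aligned index pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy example: three source sentences, two target sentences.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[1.0, 0.1], [0.9, 1.1]]
print(align(src, tgt))
```

In this toy input the second source vector has no close counterpart on the target side, so the sketch leaves it unaligned and pairs the remaining sentences in order.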

Cite

Thompson, B., & Koehn, P. (2019). Vecalign: Improved sentence alignment in linear time and space. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 1342–1348). Association for Computational Linguistics. https://doi.org/10.18653/v1/d19-1136
