Learning bilingual sentence embeddings via autoencoding and computing similarities with a multilayer perceptron

1 citation · 86 Mendeley readers

Abstract

We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model.
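The similarity classifier described in the abstract, a multilayer perceptron that scores a pair of sentence embeddings from the shared space, lends itself to a short sketch. The code below is a minimal illustration in PyTorch, not the authors' implementation: the embedding size (512), the pair-feature combination (concatenation, element-wise product, absolute difference), the hidden layer size, and the sigmoid output are all illustrative assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class PairSimilarityMLP(nn.Module):
    """Sketch of an MLP that scores a (source, target) embedding pair.

    Feature combination and layer sizes are illustrative assumptions,
    not the configuration reported in the paper.
    """

    def __init__(self, emb_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        # Input features: [src; tgt; src * tgt; |src - tgt|] -> 4 * emb_dim
        self.net = nn.Sequential(
            nn.Linear(4 * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, src_emb: torch.Tensor, tgt_emb: torch.Tensor) -> torch.Tensor:
        feats = torch.cat(
            [src_emb, tgt_emb, src_emb * tgt_emb, (src_emb - tgt_emb).abs()],
            dim=-1,
        )
        # Score in (0, 1): probability that the pair is a good parallel pair.
        return torch.sigmoid(self.net(feats)).squeeze(-1)

# Usage: score a batch of 8 embedding pairs drawn from the shared space.
src = torch.randn(8, 512)
tgt = torch.randn(8, 512)
scores = PairSimilarityMLP()(src, tgt)  # shape (8,)
```

Such a scorer could then rank candidate sentence pairs for filtering, keeping only pairs above a threshold; the paper evaluates this idea on sentence alignment recovery and the WMT 2018 parallel corpus filtering task.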

Citation (APA)

Kim, Y., Rosendahl, H., Rossenbach, N., Rosendahl, J., Khadivi, S., & Ney, H. (2019). Learning bilingual sentence embeddings via autoencoding and computing similarities with a multilayer perceptron. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP 2019) (pp. 61–71). Association for Computational Linguistics. https://doi.org/10.18653/v1/w19-4309
