Translation invariant word embeddings

Citations: 22 · Mendeley readers: 136

Abstract

This work focuses on the task of finding latent vector representations of the words in a corpus. In particular, we address the issue of what to do when there are multiple languages in the corpus. Prior work has, among other techniques, used canonical correlation analysis to project pre-trained vectors in two languages into a common space. We propose a simple and scalable method that is inspired by the notion that the learned vector representations should be invariant to translation between languages. We show empirically that our method outperforms prior work on multilingual tasks, matches the performance of prior work on monolingual tasks, and scales linearly with the size of the input data (and thus the number of languages being embedded).
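The CCA baseline the abstract mentions can be illustrated with a minimal sketch: given pre-trained vectors for word pairs that are translations of each other, canonical correlation analysis finds linear projections of each language's space whose images are maximally correlated, yielding a common space. The function below is a simple SVD-based CCA written from scratch on toy, randomly generated data; it is not the paper's proposed method, and all names and dimensions are illustrative assumptions.

```python
import numpy as np


def cca_project(X, Y, k):
    """Project paired embedding matrices X, Y (one row per translation
    pair) into a shared k-dimensional space via an SVD-based CCA."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    def whiten(M):
        # Inverse square root of the (regularized) covariance matrix.
        C = M.T @ M / len(M) + 1e-6 * np.eye(M.shape[1])
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Wx, Wy = whiten(Xc), whiten(Yc)
    # SVD of the whitened cross-covariance gives the canonical directions.
    U, _, Vt = np.linalg.svd(Wx @ (Xc.T @ Yc / len(Xc)) @ Wy)
    A = Wx @ U[:, :k]      # projection for language 1
    B = Wy @ Vt.T[:, :k]   # projection for language 2
    return Xc @ A, Yc @ B


# Toy data: 200 hypothetical translation pairs, 50-dim vectors per language;
# the second language is a linear transform of the first plus noise.
rng = np.random.default_rng(0)
en_vecs = rng.normal(size=(200, 50))
fr_vecs = en_vecs @ rng.normal(size=(50, 50)) + 0.1 * rng.normal(size=(200, 50))

en_shared, fr_shared = cca_project(en_vecs, fr_vecs, k=20)
```

After projection, translation pairs should lie close together in the shared space, so the per-component correlation between `en_shared` and `fr_shared` is high. The paper's contribution, per the abstract, is a different, translation-invariance-based formulation that scales linearly with input size, which a pairwise CCA like this does not address for many languages.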

Citation (APA)
Gardner, M., Huang, K., Papalexakis, E., Fu, X., Talukdar, P., Faloutsos, C., … Mitchell, T. (2015). Translation invariant word embeddings. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 1084–1088). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1127
