We present a matrix factorization model for learning cross-lingual sentence representations. Using sentence-aligned corpora, the model learns distributed representations by factoring the data into language-dependent factors and one shared factor. As a result, sentences from either language can be mapped to fixed-length vectors and compared directly using cosine similarity. This approach achieves a Pearson correlation of 0.8 on Spanish-English semantic textual similarity.
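The idea of one shared factor plus language-dependent factors can be illustrated with a minimal sketch. It assumes bag-of-words count matrices for the two sides of an aligned corpus and uses a plain truncated SVD as a stand-in for the factorization; the abstract does not specify the model at this level of detail, so the function names (`factorize`, `embed`, `cosine`), the toy data, and the choice of SVD are all hypothetical and not the authors' implementation.

```python
import numpy as np

def factorize(X_a, X_b, k):
    """Factor [X_a | X_b] ~= U @ [V_a | V_b] with a rank-k truncated SVD.

    X_a, X_b: (n_pairs, vocab) count matrices whose rows are aligned sentences.
    Returns the shared factor U (n_pairs, k) and the language-dependent
    factors V_a (k, vocab_a) and V_b (k, vocab_b).
    """
    X = np.hstack([X_a, X_b])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k] * s[:k]          # one row per aligned sentence pair
    V = Vt[:k, :]
    split = X_a.shape[1]
    return U_k, V[:, :split], V[:, split:]

def embed(x, V_lang):
    """Map one monolingual count vector to a k-dimensional vector.

    Solves min_u ||x - u @ V_lang|| by least squares, so a sentence from
    either language lands in the same k-dimensional space.
    """
    u, *_ = np.linalg.lstsq(V_lang.T, x, rcond=None)
    return u

def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

# Toy aligned corpus (assumption): 4 sentence pairs, 3 word types per language.
# For simplicity the "translation" has the same counts as the original.
X_a = np.array([[2, 1, 0],
                [0, 1, 2],
                [1, 0, 1],
                [1, 1, 1]], dtype=float)
X_b = X_a.copy()

U, V_a, V_b = factorize(X_a, X_b, k=2)

# Embed a sentence from language A and its translation from language B,
# then compare them directly in the shared space.
u_a = embed(X_a[0], V_a)
u_b = embed(X_b[0], V_b)
similarity = cosine(u_a, u_b)
```

In this toy setting a sentence and its translation map to nearly identical vectors, so `similarity` is close to 1. With real corpora the two vocabularies differ and the count matrices are large and sparse, so a weighted or regularized factorization would likely be preferable to a dense SVD.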
Aldarmaki, H., & Diab, M. (2016). GWU NLP at SemEval-2016 shared task 1: Matrix factorization for crosslingual STS. In SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings (pp. 663–667). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s16-1101