On cross-lingual text similarity using neural translation models


Abstract

Accurately computing the similarity between two texts written in different languages has tremendous value in many applications, such as cross-lingual information retrieval and cross-lingual text mining/analytics. This paper studies this important problem using neural networks, focusing on neural machine translation (NMT) models. Although translation models are utilized, we pay special attention not to the translation itself but to the intermediate states that the models compute for given texts. Our assumption is that these intermediate states capture the syntactic and semantic meaning of the input texts and provide a good representation of them, avoiding inevitable translation errors. To test the validity of this assumption, we investigate the utility of the intermediate states and their effectiveness in computing cross-lingual text similarity, in comparison with other neural network-based distributed representations of texts, including word and paragraph embedding-based approaches. We demonstrate that an approach using the intermediate states outperforms not only these approaches but also a strong machine translation-based one. Furthermore, we find that the intermediate states and the translated texts complement each other, even though they are generated from the same NMT models.
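The abstract does not spell out how the intermediate states are pooled or compared. As one illustrative reading (not the paper's exact setup), the following sketch mean-pools the encoder hidden states of a publicly available multilingual NMT model and scores two texts with cosine similarity; the model name (facebook/m2m100_418M), the mean-pooling choice, and the example sentences are assumptions for demonstration only.

```python
# Minimal sketch: represent each text by the intermediate (encoder hidden)
# states of a multilingual NMT model, then compare with cosine similarity.
# The model and pooling strategy are illustrative assumptions, not the
# configuration used in the paper.
import torch
from transformers import M2M100Tokenizer, M2M100ForConditionalGeneration

MODEL_NAME = "facebook/m2m100_418M"  # assumed model, chosen for availability
tokenizer = M2M100Tokenizer.from_pretrained(MODEL_NAME)
model = M2M100ForConditionalGeneration.from_pretrained(MODEL_NAME)
model.eval()

def embed(text: str, lang: str) -> torch.Tensor:
    """Mean-pool the NMT encoder's hidden states into a single text vector."""
    tokenizer.src_lang = lang
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        enc = model.get_encoder()(**inputs)
    hidden = enc.last_hidden_state                      # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean pool

# Hypothetical cross-lingual pair for demonstration.
en = embed("Neural translation models capture sentence meaning.", "en")
ja = embed("ニューラル翻訳モデルは文の意味を捉える。", "ja")
sim = torch.nn.functional.cosine_similarity(en, ja).item()
print(f"cross-lingual similarity: {sim:.3f}")
```

Mean pooling is only the simplest aggregation; the paper may pool differently or draw on other layers of the translation model, and, per the abstract, combining these state-based scores with scores computed over the translated texts is what works best.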

Cite (APA)
Seki, K. (2019). On cross-lingual text similarity using neural translation models. Journal of Information Processing, 27, 315–321. https://doi.org/10.2197/ipsjjip.27.315
