Semantic space transformations for cross-lingual document classification

Jiří Martínek; Ladislav Lenc; Pavel Král

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018), Vol. 11139 LNCS, pp. 608–616

DOI: 10.1007/978-3-030-01418-6_60

Abstract

Cross-lingual document representation can be achieved by training monolingual semantic spaces and then using bilingual dictionaries with a transformation method to project word vectors into a unified space. The main goal of this paper is to evaluate three promising transformation methods on the cross-lingual document classification task. We also propose, evaluate, and compare two cross-lingual document classification approaches. We use a popular convolutional neural network (CNN) and compare its performance with a standard maximum entropy classifier. The proposed methods are evaluated on four languages from the Reuters corpus: English, German, Spanish, and Italian. We demonstrate that the results of all transformation methods are close to each other; however, the orthogonal transformation generally gives slightly better results when a CNN with trained embeddings is used. The experimental results also show that the convolutional network achieves better results than the maximum entropy classifier. We further show that the proposed methods are competitive with the state of the art.
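
The abstract does not spell out how the orthogonal transformation is computed. One common formulation, assumed here purely for illustration and not necessarily the authors' exact method, is the orthogonal Procrustes solution: given source-language vectors X and target-language vectors Y for the word pairs of a bilingual dictionary, find the orthogonal matrix W that minimizes ||XW - Y||. The function name `orthogonal_map` and the synthetic data below are hypothetical.

```python
import numpy as np


def orthogonal_map(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y||_F.

    X, Y: arrays of shape (n_pairs, dim) holding source- and
    target-language vectors for the same dictionary entries.
    """
    # Orthogonal Procrustes: take the SVD of X^T Y and drop the singular values.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Synthetic stand-in for dictionary-aligned embeddings (an assumption, not real data):
# the "target" space is an exact rotation of the "source" space.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))                   # 50 word pairs, 4 dimensions
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
Y = X @ Q

W = orthogonal_map(X, Y)
projected = X @ W                                  # source vectors in the target space
```

With real embeddings the two spaces are not exact rotations of each other, so the projected vectors only approximate their dictionary translations. Because W is orthogonal, the projection preserves vector lengths and angles within the source space.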

Cite

Martínek, J., Lenc, L., & Král, P. (2018). Semantic space transformations for cross-lingual document classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11139 LNCS, pp. 608–616). Springer Verlag. https://doi.org/10.1007/978-3-030-01418-6_60
