Cross-lingual text classification via model translation with limited dictionaries

Abstract

Cross-lingual text classification (CLTC) refers to the task of classifying documents in different languages into the same taxonomy of categories. An open challenge in CLTC is to classify documents in languages for which labeled training data are not available. Existing approaches rely on the availability of either high-quality machine translation of documents (into languages where large amounts of training data are available) or rich bilingual dictionaries for effective translation of trained classification models (into languages where labeled training data are lacking). This paper studies the CLTC challenge under the assumption that neither condition is met. That is, we focus on the problem of translating classification models with highly incomplete bilingual dictionaries. Specifically, we propose two new approaches that combine unsupervised word embedding in different languages, supervised mapping of embedded words across languages, and probabilistic translation of classification models. The approaches show significant performance improvement in CLTC on a benchmark corpus of Reuters news stories (RCV1/RCV2) in English, Spanish, German, French and Chinese and on an internal dataset in Uzbek, compared to representative baseline methods using conventional bilingual dictionaries or highly incomplete ones.
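The abstract names three ingredients without giving details: word embeddings per language, a supervised mapping between the embedding spaces, and probabilistic translation of a classification model. The sketch below is one possible reading of how such pieces could fit together on toy data, and is not the authors' method. Everything in it is an assumption made for illustration: the function names (fit_linear_map, translation_probs, translate_weights), the least-squares map, the softmax over cosine similarities, the temperature value, and the random vectors standing in for trained embeddings.

```python
import numpy as np

def fit_linear_map(src_vecs, tgt_vecs):
    """Least-squares W such that src_vecs @ W approximates tgt_vecs.

    The rows are the embedding pairs from a small seed dictionary
    (assumed here; the abstract does not specify the mapping).
    """
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

def translation_probs(src_emb, tgt_emb, W, temperature=0.1):
    """Row-stochastic matrix with P[i, j] ~ P(target word j | source word i).

    Hypothetical choice: softmax over cosine similarity between the mapped
    source vector and every target vector.
    """
    mapped = src_emb @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    logits = (mapped @ tgt.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def translate_weights(src_weights, P):
    """Spread each source word's classifier weight over target words
    in proportion to its translation probabilities."""
    return P.T @ src_weights

# Toy data: random vectors stand in for trained embeddings, and the target
# space is an exact rotation of the source space so the map is recoverable.
rng = np.random.default_rng(0)
dim, vocab = 4, 6
src_emb = rng.normal(size=(vocab, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt_emb = src_emb @ rotation

# "Highly incomplete" dictionary: only the first 4 of 6 word pairs are known.
seed = np.arange(4)
W = fit_linear_map(src_emb[seed], tgt_emb[seed])

P = translation_probs(src_emb, tgt_emb, W)
src_weights = rng.normal(size=vocab)  # weights of a linear source-language classifier
tgt_weights = translate_weights(src_weights, P)
```

In this toy setting the two word pairs left out of the seed dictionary are still matched correctly, because the learned map generalizes to them. With real embeddings the relation between spaces is only approximately linear, so the probabilities would be softer and the translated model noisier.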

Cite

Xu, R., Yang, Y., Liu, H., & Hsi, A. (2016). Cross-lingual text classification via model translation with limited dictionaries. In International Conference on Information and Knowledge Management, Proceedings (Vol. 24-28-October-2016, pp. 95–104). Association for Computing Machinery. https://doi.org/10.1145/2983323.2983732