This paper explores bridging the content of two different languages via latent topics. Specifically, we propose a unified probabilistic model that jointly learns latent topics from bilingual corpora discussing comparable content, and we use the learned topics as features in a cross-lingual text categorization task that requires no dictionary. Experimental results on multilingual Wikipedia data show that the proposed topic model effectively discovers topic information from the bilingual corpora, and that the learned topics successfully transfer classification knowledge to other languages for which no labeled training data are available. © 2011 Springer-Verlag.
De Smet, W., Tang, J., & Moens, M. F. (2011). Knowledge transfer across multilingual corpora via latent topics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6634 LNAI, pp. 549–560). Springer Verlag. https://doi.org/10.1007/978-3-642-20841-6_45