A co-classification approach to learning from multilingual corpora


Abstract

We address the problem of learning text categorization from a corpus of multilingual documents. We propose a multiview learning, co-regularization approach in which we consider each language as a separate source and minimize a joint loss that combines the monolingual classification losses in each language while ensuring that the categorization is consistent across languages. We derive training algorithms for logistic regression and boosting, and show that the resulting categorizers outperform models trained independently on each language and, most of the time, even models trained on the joint bilingual data. Experiments are carried out on a multilingual extension of the RCV2 corpus, which is available for benchmarking. © The Author(s) 2009.
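To make the joint loss concrete, here is a minimal Python sketch for two languages, one binary category, and logistic regression. It assumes every training document is available in both languages (for example an original and its translation), and it measures cross-language consistency with a squared difference between the two views' predicted probabilities, weighted by a hypothetical parameter `lam`. The paper's own disagreement measure, its boosting variant, and its optimization procedure may differ; the names `train_co_regularized`, `joint_loss`, and `predict` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def log_loss(p: float, t: int) -> float:
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))


def joint_loss(w1, w2, x1, x2, y, lam: float) -> float:
    """Mean of: loss in language 1 + loss in language 2 + lam * disagreement."""
    total = 0.0
    for a, b, t in zip(x1, x2, y):
        p1, p2 = sigmoid(dot(w1, a)), sigmoid(dot(w2, b))
        total += log_loss(p1, t) + log_loss(p2, t) + lam * (p1 - p2) ** 2
    return total / len(y)


def train_co_regularized(
    x1: List[Vector], x2: List[Vector], y: List[int],
    lam: float = 1.0, lr: float = 0.5, epochs: int = 500,
) -> Tuple[Vector, Vector]:
    """Batch gradient descent on joint_loss (assumed objective, see lead-in).

    x1[i] and x2[i] are the feature vectors of document i in language 1 and
    language 2; y[i] is its binary label (0 or 1).
    """
    n = len(y)
    w1, w2 = [0.0] * len(x1[0]), [0.0] * len(x2[0])
    for _ in range(epochs):
        g1, g2 = [0.0] * len(w1), [0.0] * len(w2)
        for a, b, t in zip(x1, x2, y):
            p1, p2 = sigmoid(dot(w1, a)), sigmoid(dot(w2, b))
            # d(log loss)/d(logit) = p - t. The disagreement term
            # lam * (p1 - p2)^2 adds +2*lam*(p1 - p2)*p1*(1 - p1) for view 1
            # and -2*lam*(p1 - p2)*p2*(1 - p2) for view 2.
            r1 = (p1 - t) + 2 * lam * (p1 - p2) * p1 * (1 - p1)
            r2 = (p2 - t) - 2 * lam * (p1 - p2) * p2 * (1 - p2)
            for j, v in enumerate(a):
                g1[j] += r1 * v
            for j, v in enumerate(b):
                g2[j] += r2 * v
        w1 = [w - lr * g / n for w, g in zip(w1, g1)]
        w2 = [w - lr * g / n for w, g in zip(w2, g2)]
    return w1, w2


def predict(w1, w2, a, b) -> int:
    """Combine the two views by averaging their probabilities (illustrative choice)."""
    return int((sigmoid(dot(w1, a)) + sigmoid(dot(w2, b))) / 2 >= 0.5)


# Toy paired data (made up for illustration); the first feature is a bias term.
x1 = [[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]
x2 = [[1.0, 1.0], [1.0, 2.0], [1.0, -2.0], [1.0, -1.5]]
y = [1, 1, 0, 0]
w1, w2 = train_co_regularized(x1, x2, y, lam=1.0)
print([predict(w1, w2, a, b) for a, b in zip(x1, x2)])
```

Setting `lam=0` decouples the objective into two independently trained monolingual models; larger values push the two languages toward agreeing predictions. In this sketch the disagreement is computed on the labeled pairs only; a co-regularization setup can also evaluate it on unlabeled paired documents.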


Amini, M. R., & Goutte, C. (2010). A co-classification approach to learning from multilingual corpora. Machine Learning, 79(1–2), 105–121. https://doi.org/10.1007/s10994-009-5151-5
