Exploiting comparable corpora and bilingual dictionaries for Cross-language Text Categorization

Alfio Gliozzo; Carlo Strapparava

Conference Proceedings (Open Access)

COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (2006) 1 553-560

DOI: 10.3115/1220175.1220245

Abstract

Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g., English) while the system is trained using labeled documents in a source language (e.g., Italian). In this work we present many solutions according to the availability of bilingual resources, and we show that it is possible to deal with the problem even when no such resources are accessible. The core technique relies on the automatic acquisition of Multilingual Domain Models from comparable corpora. Experiments show the effectiveness of our approach, providing a low-cost solution for the Cross-language Text Categorization task. In particular, when bilingual dictionaries are available, the performance of the categorization gets close to that of monolingual text categorization. © 2006 Association for Computational Linguistics.

Citation (APA)

Gliozzo, A., & Strapparava, C. (2006). Exploiting comparable corpora and bilingual dictionaries for Cross-language Text Categorization. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 553–560). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220175.1220245
