Learning-free text categorization

Patrick Ruch; Robert Baud; Antoine Geissbühler

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2003) 2780 LNAI 199-208

DOI: 10.1007/978-3-540-39907-0_28

Abstract

In this paper, we report on the fusion of simple retrieval strategies with thesaural resources in order to perform large-scale text categorization tasks. Unlike most related systems, which rely on training data to infer text-to-concept relationships, our approach can be applied with any controlled vocabulary and does not use any training data. The first classification module uses a traditional vector-space retrieval engine, which has been fine-tuned for the task, while the second classifier is based on regular variations of the concept list. For evaluation purposes, the system uses a sample of MedLine and the Medical Subject Headings (MeSH) terminology as the collection of concepts. Preliminary results show that the hybrid system performs significantly better than either single system. For top returned concepts, the system reaches performance comparable to machine learning systems, while genericity and scalability clearly favor the learning-free approach. We draw conclusions on the importance of hybrid strategies combining data-poor classifiers and knowledge-based terminological resources for general text mapping tasks. © Springer-Verlag Berlin Heidelberg 2003.
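
The hybrid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the tf-idf cosine scorer, the regular-expression variations (optional plural "s", flexible spacing or hyphenation), the linear combination weight `alpha`, and every function name and sample concept below are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vector_space_scores(document, concepts):
    """Cosine similarity between the document and each concept term.

    Assumption: IDF is computed over the concept list itself, so no
    training corpus is needed.
    """
    concept_tokens = [tokenize(c) for c in concepts]
    n = len(concepts)
    df = Counter(tok for toks in concept_tokens for tok in set(toks))
    idf = {tok: math.log(1 + n / df[tok]) for tok in df}

    def weigh(tokens):
        # Keep only tokens that occur in the controlled vocabulary.
        tf = Counter(t for t in tokens if t in idf)
        return {t: count * idf[t] for t, count in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    doc_vec = weigh(tokenize(document))
    return [cosine(doc_vec, weigh(toks)) for toks in concept_tokens]


def pattern_scores(document, concepts):
    """Score 1.0 when a simple regular variation of the concept occurs.

    Assumed variations: optional plural "s" on each word, and any run
    of whitespace or hyphens between words.
    """
    text = document.lower()
    scores = []
    for concept in concepts:
        words = [re.escape(w) + r"s?" for w in tokenize(concept)]
        pattern = r"\b" + r"[\s-]+".join(words) + r"\b"
        scores.append(1.0 if re.search(pattern, text) else 0.0)
    return scores


def rank_concepts(document, concepts, alpha=0.5):
    """Linearly combine both scorers and rank concepts, best first."""
    vs = vector_space_scores(document, concepts)
    pm = pattern_scores(document, concepts)
    combined = [alpha * v + (1 - alpha) * p for v, p in zip(vs, pm)]
    return sorted(zip(concepts, combined), key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical three-term vocabulary standing in for MeSH.
    vocabulary = ["Myocardial Infarction", "Aspirin", "Liver Neoplasms"]
    abstract = "Aspirin reduces the risk of myocardial infarctions in adult patients."
    for concept, score in rank_concepts(abstract, vocabulary):
        print(f"{score:.3f}  {concept}")
```

Neither scorer requires labeled examples, which is the property the abstract emphasizes: replacing the vocabulary list is enough to target a different terminology.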

Cite

Ruch, P., Baud, R., & Geissbühler, A. (2003). Learning-free text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2780 LNAI, pp. 199–208). https://doi.org/10.1007/978-3-540-39907-0_28
