Learning-free text categorization

Abstract

In this paper, we report on the fusion of simple retrieval strategies with thesaural resources in order to perform large-scale text categorization tasks. Unlike most related systems, which rely on training data to infer text-to-concept relationships, our approach can be applied with any controlled vocabulary and does not use any training data. The first classification module uses a traditional vector-space retrieval engine, which has been fine-tuned for the task, while the second classifier is based on regular variations of the concept list. For evaluation purposes, the system uses a sample of MedLine, with the Medical Subject Headings (MeSH) terminology as the collection of concepts. Preliminary results show that the hybrid system performs significantly better than either single system. For the top returned concepts, the system reaches performance comparable to machine learning systems, while genericity and scalability clearly favor the learning-free approach. We draw conclusions on the importance of hybrid strategies combining data-poor classifiers and knowledge-based terminological resources for general text mapping tasks. © Springer-Verlag Berlin Heidelberg 2003.
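The two-module design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tf-idf weighting, the plural-tolerant regular expressions, the linear combination and its weight `alpha`, and all function names are assumptions chosen only to show how a vector-space ranker and a pattern matcher over a controlled vocabulary could be fused without training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vector_space_scores(document, concepts):
    """Cosine similarity between tf-idf vectors of the document and each concept.

    Assumption: idf is estimated from the concept list itself, so no
    training corpus is needed.
    """
    concept_tokens = [tokenize(c) for c in concepts]
    n = len(concepts)
    df = Counter()
    for toks in concept_tokens:
        df.update(set(toks))

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * (math.log((n + 1) / (df[t] + 1)) + 1.0) for t in tf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    doc_vec = vec(tokenize(document))
    return [cosine(doc_vec, vec(toks)) for toks in concept_tokens]


def pattern_scores(document, concepts):
    """Score 1.0 when a regular-expression variant of the concept occurs in the text.

    Assumption: the only variations allowed are an optional plural "s" on
    each word and whitespace or hyphens between words.
    """
    text = document.lower()
    scores = []
    for concept in concepts:
        words = [re.escape(w) + r"s?" for w in tokenize(concept)]
        if not words:
            scores.append(0.0)
            continue
        pattern = r"\b" + r"[\s\-]+".join(words) + r"\b"
        scores.append(1.0 if re.search(pattern, text) else 0.0)
    return scores


def rank_concepts(document, concepts, alpha=0.5, top_k=3):
    """Fuse both scores linearly (alpha is a hypothetical weight) and rank."""
    vs = vector_space_scores(document, concepts)
    pm = pattern_scores(document, concepts)
    combined = [alpha * v + (1.0 - alpha) * p for v, p in zip(vs, pm)]
    ranked = sorted(zip(concepts, combined), key=lambda pair: -pair[1])
    return ranked[:top_k]


if __name__ == "__main__":
    doc = "Patients with heart failures were treated with beta blockers."
    vocabulary = ["Heart Failure", "Kidney Failure", "Asthma"]
    for concept, score in rank_concepts(doc, vocabulary):
        print(f"{concept}: {score:.3f}")
```

In this toy example the pattern matcher catches the plural "heart failures", which exact token overlap alone would only partly reward; this is the kind of complementarity a hybrid system might exploit.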

Ruch, P., Baud, R., & Geissbühler, A. (2003). Learning-free text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2780 LNAI, pp. 199–208). https://doi.org/10.1007/978-3-540-39907-0_28
