Abstract
Advanced techniques for accessing information distributed on the Web often use automatic text categorization to filter out irrelevant data before running specific search procedures. The drawback of this approach is that the target classifiers need a large number of training documents. One way to reduce this number is to use more effective document similarities based on prior knowledge. Unfortunately, previous work has shown that such knowledge (e.g. WordNet) decreases retrieval accuracy. In this paper, we propose kernel functions that bring prior knowledge into learning algorithms for document classification. These kernels implement balanced and statistically coherent document similarities in a vector space by means of a term similarity based on the WordNet hierarchy. Cross-validation results show the benefit of the approach for Support Vector Machines when few training examples are available.
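To make the idea of a document similarity built from term similarity concrete, the following is a minimal sketch and not the kernel defined in the paper. It assumes documents are bags of weighted terms and that a term similarity is available. The names `TERM_SIM`, `term_similarity`, `semantic_kernel` and `normalized_kernel` are hypothetical, and the similarity values are toy numbers rather than scores derived from the WordNet hierarchy.

```python
import math

# Hypothetical toy similarities between distinct terms (NOT WordNet values).
TERM_SIM = {("car", "automobile"): 0.9}

def term_similarity(t1, t2):
    """Return 1.0 for identical terms, a table value for known pairs, else 0.0."""
    if t1 == t2:
        return 1.0
    return TERM_SIM.get((t1, t2), TERM_SIM.get((t2, t1), 0.0))

def semantic_kernel(doc1, doc2):
    """Sum w1 * w2 * sim(t1, t2) over all term pairs of the two documents."""
    return sum(
        w1 * w2 * term_similarity(t1, t2)
        for t1, w1 in doc1.items()
        for t2, w2 in doc2.items()
    )

def normalized_kernel(doc1, doc2):
    """Scale the kernel by each document's self-similarity so lengths cancel out."""
    denom = math.sqrt(semantic_kernel(doc1, doc1) * semantic_kernel(doc2, doc2))
    return semantic_kernel(doc1, doc2) / denom if denom > 0 else 0.0

# Documents as {term: weight} maps; the weights here are illustrative.
doc_car = {"car": 1.0}
doc_auto = {"automobile": 1.0}
doc_fruit = {"banana": 1.0}

related = normalized_kernel(doc_car, doc_auto)
unrelated = normalized_kernel(doc_car, doc_fruit)
```

A plain bag-of-words dot product would score `doc_car` and `doc_auto` as unrelated because they share no term, whereas this sketch credits the related pair. An arbitrary similarity table does not guarantee a valid (positive semi-definite) kernel, so a real implementation would need to check or enforce that property before using it with a Support Vector Machine.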
Cite
Basili, R., Cammisa, M., & Moschitti, A. (2006). A semantic kernel to classify texts with very few training examples. Informatica (Ljubljana), 30(2), 163–172.