An unsupervised hierarchical approach to document categorization

  • Wetzker R
  • Alpcan T
  • Bauckhage C
 et al. 
  • 26

    Readers

    Mendeley users who have this article in their library.
  • 7

    Citations

    Citations of this article.

Abstract

We propose a hierarchical approach to document categorization that requires no pre-configuration and maps the semantic document space to a predefined taxonomy. The utilization of search engines to train a hierarchical classifier makes our approach more flexible than existing solutions which rely on (human) labeled data and are bound to a specific domain. We show that the structural information given by the taxonomy allows for a context aware construction of search queries and leads to higher tagging accuracy. We test our approach on different benchmark datasets and evaluate its performance on the single- and multi-tag assignment tasks. The experimental results show that our solution is as accurate as supervised classifiers for web page classification and still performs well when categorizing domain specific documents. 2007 IEEE.

Get free article suggestions today

Mendeley saves you time finding and organizing research

Sign up here
Already have an account ?Sign in

Find this document

Authors

Cite this document

Choose a citation style from the tabs below

Save time finding and organizing research with Mendeley

Sign up for free