Hierarchical text classification using methods from machine learning

  • Granitzer M

Abstract

Due to the permanently growing amount of textual data, automatic methods for organizing the data are needed. Automatic text classification is one of these methods. It automatically assigns documents to a set of classes based on the textual content of the document. Normally, the set of classes is hierarchically structured, but today's classification approaches ignore hierarchical structures, thereby losing valuable human knowledge. This thesis exploits the hierarchical organization of classes to improve accuracy and reduce computational complexity. Classification methods from machine learning, namely BoosTexter and the newly introduced Centroid-Boosting algorithm, are used for learning hierarchies. In doing so, two problems are considered: error propagation from higher-level nodes, and comparing decisions between independently trained leaf nodes. Experiments are performed on the Reuters-21578, the Reuters Corpus Volume 1 and the Ohsumed data sets, which are well known in the literature. Rocchio and Support Vector Machines, which are state-of-the-art algorithms in the field of text classification, serve as baseline classifiers. Algorithms are compared by applying statistical significance tests. Results show that, depending on the structure of a hierarchy, accuracy improves and computational complexity decreases due to hierarchical classification. Also, the introduced model for comparing leaf nodes yields an increase in performance.
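To make the idea of classifying along a hierarchy concrete, the following is a minimal sketch of greedy top-down classification. It is not the thesis's method: it swaps in a simple nearest-centroid classifier (loosely in the spirit of the Rocchio baseline) for BoosTexter and Centroid-Boosting, and the hierarchy, the training texts, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical two-level class hierarchy: parent -> children.
HIERARCHY = {
    "root": ["sports", "finance"],
    "sports": ["football", "tennis"],
    "finance": ["stocks", "banking"],
}

# Hypothetical toy training documents, attached to leaf classes only.
TRAIN = {
    "football": ["goal striker penalty match", "striker scored goal league match"],
    "tennis": ["serve racket set match", "racket serve ace set"],
    "stocks": ["shares market index trading", "market shares price trading"],
    "banking": ["loan interest bank deposit", "bank loan credit interest"],
}

def vectorize(text):
    """Bag-of-words term counts, scaled to unit L2 length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {term: v / norm for term, v in counts.items()}

def centroid(vectors):
    """Component-wise mean of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for term, v in vec.items():
            total[term] += v
    return {term: v / len(vectors) for term, v in total.items()}

def dot(a, b):
    """Dot product of two sparse vectors (used as the similarity score)."""
    return sum(v * b.get(term, 0.0) for term, v in a.items())

def leaves_under(node):
    """All leaf classes in the subtree rooted at node."""
    children = HIERARCHY.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves_under(child)]

def train():
    """One centroid per non-root node, built from all documents below it."""
    model = {}
    for children in HIERARCHY.values():
        for child in children:
            docs = [vectorize(d) for leaf in leaves_under(child) for d in TRAIN[leaf]]
            model[child] = centroid(docs)
    return model

def classify(text, model):
    """Walk from the root to a leaf, picking the most similar child each time."""
    vec = vectorize(text)
    node, path = "root", []
    while node in HIERARCHY:
        node = max(HIERARCHY[node], key=lambda child: dot(vec, model[child]))
        path.append(node)
    return path

model = train()
print(classify("the striker scored a penalty goal", model))
```

Each document is compared only against the children of the node it has reached, rather than against every leaf class, which is where the reduction in computational cost comes from. The sketch also exposes the error-propagation problem named in the abstract: a wrong choice at the first level cannot be corrected further down.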

Cite

Granitzer, M. (2003). Hierarchical text classification using methods from machine learning. Master’s Thesis, Graz University of Technology. Retrieved from http://know-center.tugraz.at/wp-content/uploads/2010/12/2004_Dip_MGranitzer1.pdf
