Ontology-based Text Document Clustering.

Steffen Staab; Andreas Hotho

Conference Proceedings

Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'03 Conference held in Zakopane (2003) 451-452

Abstract

Text clustering typically involves clustering in a high-dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We preprocess our input data applying ontology-based heuristics for feature selection and feature aggregation. Thus, we construct a number of alternative text representations. Based on these representations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy.
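
The pipeline described in the abstract (ontology-based feature aggregation, several alternative representations, one K-Means run per representation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy ontology `PARENT`, the helper names `generalize`, `represent` and `kmeans`, the sample documents, and the simplification of seeding K-Means with the first k vectors. The authors' actual heuristics are not reproduced here.

```python
from collections import Counter

# Hypothetical toy ontology: each term or concept maps to its parent concept.
PARENT = {
    "beef": "meat", "pork": "meat",
    "apple": "fruit", "pear": "fruit",
    "meat": "food", "fruit": "food",
    "hotel": "accommodation", "hostel": "accommodation",
}

def generalize(term, levels):
    """Walk at most `levels` steps from a term towards the ontology root."""
    for _ in range(levels):
        if term not in PARENT:
            break
        term = PARENT[term]
    return term

def represent(docs, levels):
    """Aggregate term counts into concept counts; return (vectors, features)."""
    bags = [Counter(generalize(t, levels) for t in doc.split()) for doc in docs]
    features = sorted(set().union(*bags))
    return [[bag[f] for f in features] for bag in bags], features

def kmeans(vectors, k, iterations=10):
    """Plain K-Means with squared Euclidean distance.

    Simplifying assumption: the first k vectors serve as initial centroids,
    which keeps the sketch deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = ["beef pork beef", "hotel hostel", "apple pear apple", "hostel hotel hotel"]

# One clustering per aggregation level; each result can be explained by
# the concepts that make up its feature list.
for levels in (0, 1, 2):
    vectors, features = represent(docs, levels)
    print(levels, features, kmeans(vectors, k=2))
```

On this toy data, only the coarsest representation (two aggregation steps, with features `accommodation` and `food`) places the two food documents in the same cluster. The finer representations group the fruit document with the accommodation documents. The difference between the results can be read directly off the concepts selected as features.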

Author supplied keywords

2003 clustering myown ontology text

Cite

Staab, S., & Hotho, A. (2003). Ontology-based Text Document Clustering. In Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM’03 Conference held in Zakopane (pp. 451–452). Retrieved from http://dblp.uni-trier.de/db/conf/iis/iis2003.html#StaabH03
