Abstract
Text clustering typically involves clustering in a high-dimensional space, which is difficult in virtually all practical settings. In addition, given a particular clustering result, it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We preprocess our input data using ontology-based heuristics for feature selection and feature aggregation. Thus, we construct a number of alternative text representations. Based on these representations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy.
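The pipeline described in the abstract (aggregate term features into ontology concepts, build several alternative representations, then cluster each one with K-Means) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy ontology `PARENT`, the helper names `generalize`, `represent` and `kmeans`, the example documents, and the use of "number of generalization steps" as the knob that produces alternative representations. The paper's actual heuristics for feature selection and aggregation are not specified in the abstract and are not reproduced here.

```python
from collections import Counter
import math

# Hypothetical toy ontology (not from the paper): each entry maps a term
# or concept to its more general parent concept.
PARENT = {
    "beef": "meat", "pork": "meat",
    "apple": "fruit", "pear": "fruit",
    "meat": "food", "fruit": "food",
    "car": "vehicle", "truck": "vehicle",
}

def generalize(term, levels):
    """Walk up the toy ontology by at most `levels` steps."""
    for _ in range(levels):
        if term not in PARENT:
            break
        term = PARENT[term]
    return term

def represent(docs, levels):
    """Build concept-count vectors for all documents at one aggregation level."""
    bags = [Counter(generalize(t, levels) for t in doc.split()) for doc in docs]
    features = sorted(set().union(*bags))
    return features, [[bag[f] for f in features] for bag in bags]

def kmeans(vectors, k, iterations=10):
    """Plain K-Means with deterministic initialisation (the first k vectors)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = ["beef pork beef", "apple pear", "car truck car", "pork apple"]

# One clustering result per alternative representation.
results = {}
for levels in (0, 1, 2):
    features, vectors = represent(docs, levels)
    results[levels] = (features, kmeans(vectors, k=2))
```

Each entry of `results` pairs a clustering with the concept features it was computed on, so a grouping can be read against the concepts that define its feature space. This loosely mirrors the abstract's point that results may be distinguished and explained by the corresponding selection of concepts.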
Cite
Staab, S., & Hotho, A. (2003). Ontology-based Text Document Clustering. In Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM’03 Conference held in Zakopane (pp. 451–452). Retrieved from http://dblp.uni-trier.de/db/conf/iis/iis2003.html#StaabH03