Analysis of structural relationships for hierarchical cluster labeling

Markus Muhr; Roman Kern; Michael Granitzer

Conference Proceedings


SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2010) 178-185

DOI: 10.1145/1835449.1835481


Abstract

Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence labeling accuracy. However, most labeling algorithms ignore such structural properties, and therefore the impact of hierarchical structure on labeling accuracy is still unclear. In our work we integrate hierarchical information, i.e. sibling and parent-child relations, into the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, the χ² test, and Information Gain, to make use of these relationships, and we evaluate their impact on four different datasets: the Open Directory Project, Wikipedia, TREC Ohsumed, and the CLEF IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes. © 2010 ACM.
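The abstract does not give the scoring formulas, so the following is only a rough Python sketch of how two of the named baselines differ once sibling relations are taken into account. Maximum Term Frequency ranks terms by their raw frequency inside a node. The χ² variant contrasts the node's documents against the documents of its sibling nodes using a 2x2 document-frequency table. Using the siblings as the contrast set is an assumption made here for illustration and is not necessarily the formulation used in the paper. The function names `max_tf_labels` and `chi2_sibling_labels` and the toy documents are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Doc = List[str]  # hypothetical representation: a document as a list of tokens


def max_tf_labels(node_docs: List[Doc], k: int = 3) -> List[str]:
    """Maximum Term Frequency: rank terms by total frequency inside the node.
    Ties are broken alphabetically so the output is deterministic."""
    counts = Counter(t for d in node_docs for t in d)
    ranked = sorted(counts.items(), key=lambda x: (-x[1], x[0]))
    return [t for t, _ in ranked[:k]]


def chi2_sibling_labels(node_docs: List[Doc], sibling_docs: List[Doc],
                        k: int = 3) -> List[str]:
    """Chi-square test of independence between term presence and membership
    (this node vs. its siblings). Assumption: siblings form the contrast set."""
    n_in, n_out = len(node_docs), len(sibling_docs)
    n = n_in + n_out
    # Document frequencies: count each term at most once per document.
    df_in = Counter(t for d in node_docs for t in set(d))
    df_out = Counter(t for d in sibling_docs for t in set(d))
    scores: Dict[str, float] = {}
    for t in df_in:
        a = df_in[t]      # node docs containing t
        b = df_out[t]     # sibling docs containing t
        c = n_in - a      # node docs without t
        d = n_out - b     # sibling docs without t
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        # Keep only terms over-represented in the node relative to its siblings.
        if a * n_out > b * n_in:
            scores[t] = chi2
    ranked = sorted(scores.items(), key=lambda x: (-x[1], x[0]))
    return [t for t, _ in ranked[:k]]


# Hypothetical toy node and sibling documents.
node = [["heart", "disease", "treatment"], ["heart", "surgery"], ["heart", "disease"]]
siblings = [["lung", "disease"], ["lung", "cancer", "treatment"], ["skin", "cancer"]]

print(max_tf_labels(node, k=2))                  # ['heart', 'disease']
print(chi2_sibling_labels(node, siblings, k=2))  # ['heart', 'surgery']
```

In this toy example the frequency baseline picks "disease", which also occurs in a sibling node, while the sibling-contrasted χ² score demotes it in favour of a term that separates the node from its siblings.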


Cite (APA)

Muhr, M., Kern, R., & Granitzer, M. (2010). Analysis of structural relationships for hierarchical cluster labeling. In SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 178–185). https://doi.org/10.1145/1835449.1835481
