Cluster It! Semiautomatic splitting and naming of classification concepts

Dominik Stork; Kai Eckert; Heiner Stuckenschmidt

Conference Proceedings

Studies in Classification, Data Analysis, and Knowledge Organization (2013) 365-373

DOI: 10.1007/978-3-319-00035-0_37

Abstract

In this paper, we present a semiautomatic approach for splitting overpopulated classification concepts (i.e., classes) into subconcepts and proposing suitable names for the new concepts. Our approach consists of three steps. First, meaningful term clusters are created and presented to the user for further curation and selection of possible new subconcepts; a graph representation and simple tf-idf weighting are used to create the cluster suggestions. Second, the term clusters are used as seeds for the subsequent content-based clustering of the documents using k-Means. Finally, the resulting clusters are evaluated based on their correlation with the preselected term clusters, and suitable terms for naming the clusters are proposed. We show that this approach efficiently supports the maintainer while avoiding the usual quality problems of fully automatic clustering approaches, especially with respect to the handling of outliers and the determination of the number of target clusters. The documents of the parent concept are directly assigned to the new subconcepts, favoring high precision. © Springer International Publishing Switzerland 2013.
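The second step described in the abstract, using curated term clusters as seeds for k-Means over tf-idf document vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tfidf_vectors`, `seeded_kmeans`), the use of cosine similarity, the plain `log(n / df)` idf variant, and the toy documents are all assumptions made for illustration. The graph-based creation of term clusters and the final evaluation and naming step are not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed weighting: raw term count times log(n / df).
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def seeded_kmeans(vectors, seeds, iterations=10):
    """k-Means where each initial centroid is built from one seed term cluster.

    The number of clusters equals the number of seeds, so it does not
    have to be guessed separately.
    """
    centroids = [{t: 1.0 for t in seed} for seed in seeds]
    labels = []
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        new_labels = [max(range(len(centroids)),
                          key=lambda k: cosine(v, centroids[k]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each centroid from its members; keep it if it has none.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels

# Toy documents of an overpopulated parent concept (hypothetical data).
docs = [
    ["neural", "network", "training"],
    ["neural", "network", "layers"],
    ["graph", "clustering", "nodes"],
    ["graph", "clustering", "edges"],
]
# Term clusters as a maintainer might have curated them in the first step.
seeds = [{"neural", "network"}, {"graph", "clustering"}]

print(seeded_kmeans(tfidf_vectors(docs), seeds))  # prints [0, 0, 1, 1]
```

In this sketch a document that shares no terms with any centroid falls back to the first cluster. A variant aiming at the high precision mentioned in the abstract might instead leave such documents unassigned, for example by requiring a minimum similarity; that threshold would be a further assumption.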

Cite

Stork, D., Eckert, K., & Stuckenschmidt, H. (2013). Cluster It! Semiautomatic splitting and naming of classification concepts. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 365–373). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-319-00035-0_37
