Efficient prediction-based validation for document clustering

Derek Greene; Pádraig Cunningham

Conference proceedings (open access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006), Vol. 4212 LNAI, pp. 663–670

DOI: 10.1007/11871842_65

Abstract

Recently, stability-based techniques have emerged as a very promising solution to the problem of cluster validation. An inherent drawback of these approaches is the computational cost of generating and assessing multiple clusterings of the data. In this paper we present an efficient prediction-based validation approach suitable for application to large, high-dimensional datasets such as text corpora. We use kernel clustering to isolate the validation procedure from the original data. Furthermore, we employ a prototype reduction strategy that allows us to work on a reduced kernel matrix, leading to significant computational savings. To ensure that this condensed representation accurately reflects the cluster structures in the data, we propose a density-biased strategy to select the reduced prototypes. This novel validation process is evaluated on real-world text datasets, where it is shown to consistently produce good estimates for the optimal number of clusters. © Springer-Verlag Berlin Heidelberg 2006.
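The abstract describes the pipeline only at a high level. The Python/NumPy sketch below illustrates the general idea under several assumptions:

- The kernel is a cosine kernel over L2-normalised document vectors.
- Prototypes are chosen by density-biased sampling: each point is drawn with probability proportional to its mean similarity to its nearest neighbours. This is a hypothetical stand-in, because the abstract does not give the paper's selection rule.
- Kernel k-means runs on the reduced prototype × prototype kernel matrix.
- Tibshirani and Walther's prediction strength serves as the prediction-based validation criterion. It is swapped in for illustration and may differ from the authors' measure.

All function names and defaults (`n_neighbors=5`, `threshold=0.8`, `n_repeats=5`) are hypothetical.

```python
import numpy as np


def center_distances(K_xy, diag_x, K_yy, labels, k):
    """Squared feature-space distance from each query point x to each cluster
    mean of the clustered points y, computed from kernel values only."""
    D = np.full((K_xy.shape[0], k), np.inf)  # empty clusters stay at +inf
    for c in range(k):
        members = labels == c
        m = members.sum()
        if m == 0:
            continue
        cross = K_xy[:, members].sum(axis=1) / m
        within = K_yy[np.ix_(members, members)].sum() / m**2
        D[:, c] = diag_x - 2 * cross + within
    return D


def kernel_kmeans(K, k, rng, n_iter=50):
    """Kernel k-means with k-means++-style seeding in feature space."""
    n, diag = K.shape[0], np.diag(K)
    seeds = [int(rng.integers(n))]
    for _ in range(1, k):
        # Squared distance of every point to its nearest seed chosen so far.
        d2 = (diag[:, None] - 2 * K[:, seeds] + diag[seeds][None, :]).min(axis=1)
        d2 = np.clip(d2, 0, None)
        total = d2.sum()
        if total > 0:
            seeds.append(int(rng.choice(n, p=d2 / total)))
        else:  # every point coincides with a seed; fall back to uniform
            seeds.append(int(rng.integers(n)))
    labels = (diag[:, None] - 2 * K[:, seeds] + diag[seeds][None, :]).argmin(axis=1)
    for _ in range(n_iter):
        new = center_distances(K, diag, K, labels, k).argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels


def density_biased_prototypes(K, n_proto, n_neighbors, rng):
    """Hypothetical density-biased selection: sample prototypes without
    replacement, weighted by mean kernel similarity to the n_neighbors most
    similar other points (requires n_neighbors < number of points)."""
    S = np.array(K, dtype=float)                 # copy, so K is untouched
    np.fill_diagonal(S, -np.inf)                 # ignore self-similarity
    top = np.sort(S, axis=1)[:, -n_neighbors:]   # most similar neighbours
    density = top.mean(axis=1)
    w = density - density.min() + 1e-12          # strictly positive weights
    return rng.choice(K.shape[0], size=n_proto, replace=False, p=w / w.sum())


def prediction_strength(K, k, rng):
    """Prediction strength (Tibshirani & Walther style, swapped in here) of
    kernel k-means on a random half-split of the points in K."""
    perm = rng.permutation(K.shape[0])
    tr, te = perm[: len(perm) // 2], perm[len(perm) // 2 :]
    K_tr, K_te = K[np.ix_(tr, tr)], K[np.ix_(te, te)]
    lab_tr = kernel_kmeans(K_tr, k, rng)
    lab_te = kernel_kmeans(K_te, k, rng)
    # Transfer the training clustering to the test half (nearest training centre).
    pred = center_distances(K[np.ix_(te, tr)], np.diag(K_te), K_tr, lab_tr, k).argmin(axis=1)
    scores = []
    for c in range(k):
        idx = np.flatnonzero(lab_te == c)
        m = len(idx)
        if m < 2:
            continue
        same = pred[idx][:, None] == pred[idx][None, :]
        # Fraction of ordered pairs in test cluster c also co-assigned by transfer.
        scores.append((same.sum() - m) / (m * (m - 1)))
    return min(scores) if scores else 0.0


def estimate_k(K, k_max, n_proto, rng, n_neighbors=5, threshold=0.8, n_repeats=5):
    """Largest k whose mean prediction strength on the reduced kernel matrix
    reaches `threshold` (an assumed default); returns 1 if none does."""
    proto = density_biased_prototypes(K, n_proto, n_neighbors, rng)
    K_red = K[np.ix_(proto, proto)]  # reduced prototype x prototype kernel
    best = 1
    for k in range(2, k_max + 1):
        ps = np.mean([prediction_strength(K_red, k, rng) for _ in range(n_repeats)])
        if ps >= threshold:
            best = k
    return best
```

Consider a hypothetical matrix `X` of L2-normalised TF-IDF rows. The call `estimate_k(X @ X.T, k_max=10, n_proto=500, rng=np.random.default_rng(0))` would validate k = 2..10 using only a 500 × 500 reduced kernel. For brevity, the sketch forms the full kernel matrix to score densities, which is O(n²) in memory. The saving it demonstrates is therefore confined to the repeated clustering runs, which operate on the reduced matrix.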

Citation (APA):

Greene, D., & Cunningham, P. (2006). Efficient prediction-based validation for document clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4212 LNAI, pp. 663–670). Springer Verlag. https://doi.org/10.1007/11871842_65
