Abstract
Clustering evaluation generally relies on some desirable properties of clustering solutions (partitions, in particular): cluster compactness and separation, as well as stability, are often considered indicators of clustering quality. Since the real clustering is unknown (clustering being an unsupervised process), one should focus on obtaining good enough partitions. Clustering quality is, however, a difficult concept to put into practice. Furthermore, when aiming for cluster compactness and separation, one does not necessarily recover the real clusters (e.g. Brun et al. 2007). Similarly, when focusing on stability, one may find solutions that are more stable but do not necessarily fit the real solution better (e.g. Cardoso et al. 2008). In the present paper we consider a clustering solution's reproducibility in other data sets drawn from the same source as an indicator of stability. We use a new cross-validation procedure and measure the agreement between the clustering solutions obtained and the real partitions (real data sets from the UCI repository, Asunción and Newman 2007, are used). Next, we study the association between indicators of stability and agreement with the real partition. We conclude with a discussion of the bias-variability trade-off, which we believe is a relevant issue to investigate within unsupervised learning, and clustering in particular. © Springer-Verlag Berlin Heidelberg 2010.
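Both quantities discussed in the abstract, stability and agreement with the real partition, come down to comparing two partitions of the same items. The abstract does not name the agreement index used, so the following is only a minimal sketch that assumes the adjusted Rand index as a stand-in; the function name and the toy label vectors are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items.

    Assumption: the adjusted Rand index stands in for whichever agreement
    index the paper actually uses.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on
    # Pairs of items grouped together in both partitions at once.
    together_both = sum(
        comb(count, 2) for count in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs of items grouped together within each partition separately.
    together_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    together_b = sum(comb(count, 2) for count in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are all-in-one or all singletons
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy data: known classes and two clustering solutions.
real_partition = [0, 0, 0, 1, 1, 1]
solution_from_sample_1 = [1, 1, 1, 0, 0, 0]  # same grouping, labels swapped
solution_from_sample_2 = [0, 0, 1, 1, 1, 1]  # one item assigned differently

# Stability-style comparison: do two solutions agree with each other?
stability = adjusted_rand_index(solution_from_sample_1, solution_from_sample_2)
# Agreement with the real partition for each solution.
agreement_1 = adjusted_rand_index(solution_from_sample_1, real_partition)
agreement_2 = adjusted_rand_index(solution_from_sample_2, real_partition)
```

In this sketch the index is invariant to how clusters are labelled, so a solution that recovers the real groups under different label names still counts as full agreement. A stable pair of solutions can nonetheless both disagree with the real partition, which is the tension the abstract describes.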
Cite
Cardoso, M. G. M. S., Faceli, K., & De Carvalho, A. C. P. L. F. (2010). Evaluation of clustering results: The trade-off bias-variability. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 201–208). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-642-10745-0_21