Towards a simple clustering criterion based on minimum length encoding

Marcus Christopher Ludl; Gerhard Widmer

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2002) 2430 258-270

DOI: 10.1007/3-540-36755-1_22

3 Citations

2 Readers

Abstract

We propose a simple and intuitive clustering evaluation criterion based on the minimum description length principle, which yields a particularly simple way of describing and encoding a set of examples. The basic idea is to view a clustering as a restriction of the attribute domains, given an example’s cluster membership. As a special operational case, we develop the so-called rectangular uniform message length measure, which can be used to evaluate clusterings described as sets of hyper-rectangles. We theoretically prove that this measure punishes cluster boundaries in regions of uniform instance distribution (i.e., unintuitive clusterings), and we experimentally compare a simple clustering algorithm using this measure with the well-known algorithms KMeans and AutoClass.
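The abstract does not state the formula of the rectangular uniform message length measure, so the following is only a rough sketch of the general idea and not the authors' measure. It assumes a two-part code: each cluster is described by its bounding hyper-rectangle, and each example is encoded by its cluster label plus its position inside that rectangle on a grid of cell width `eps`. The function name `rect_message_length`, the parameter `eps`, and the exact bit counts are all hypothetical choices made for illustration.

```python
import math


def rect_message_length(points, labels, eps=0.01):
    """Illustrative two-part code length in bits (an assumption, not the paper's formula).

    points: list of equal-length tuples of floats
    labels: one cluster label per point
    eps:    hypothetical grid resolution used to discretise each attribute
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    k = len(clusters)
    dims = len(points[0])

    def cells(width):
        # Number of grid cells needed to cover an interval of this width.
        return max(1, math.ceil(width / eps))

    # Model part: every rectangle needs a lower and an upper bound per
    # dimension, each located on the grid of the full attribute range.
    model_bits = 0.0
    for d in range(dims):
        full_width = max(p[d] for p in points) - min(p[d] for p in points)
        model_bits += k * 2 * math.log2(cells(full_width))

    # Data part: cluster label, then a uniform code over the cells of the
    # restricted attribute range given that cluster membership.
    data_bits = 0.0
    for members in clusters.values():
        bits_per_point = math.log2(k)
        for d in range(dims):
            width = max(m[d] for m in members) - min(m[d] for m in members)
            bits_per_point += math.log2(cells(width))
        data_bits += len(members) * bits_per_point

    return model_bits + data_bits


# Uniformly spread 1-D examples: a boundary in the middle should not pay off.
uniform = [(i / 100,) for i in range(101)]
one_cluster = [0] * 101
split_in_half = [0 if i <= 50 else 1 for i in range(101)]

# Two well-separated groups: describing them as two rectangles should pay off.
separated = [(i / 100,) for i in range(11)] + [(0.9 + i / 100,) for i in range(11)]
merged = [0] * 22
two_groups = [0] * 11 + [1] * 11

print(rect_message_length(uniform, one_cluster), rect_message_length(uniform, split_in_half))
print(rect_message_length(separated, merged), rect_message_length(separated, two_groups))
```

Under these assumptions, splitting a uniform region saves roughly one bit per example in the position code but spends one bit per example on the label and adds the cost of a second rectangle, so the split comes out longer. For well-separated groups the narrower rectangles save more than the labels and the extra rectangle cost.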

Citation (APA)

Ludl, M. C., & Widmer, G. (2002). Towards a simple clustering criterion based on minimum length encoding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2430, pp. 258–270). Springer Verlag. https://doi.org/10.1007/3-540-36755-1_22
